Careers
We are a next-generation collaboration platform that unites AI agents and humans to revolutionize the way enterprises work, innovate, and thrive.
Current Openings
Senior Site Reliability Engineer
About The Position
Build the Future Workforce
Wand turns AI into labor. It enables humans and AI agents to operate together as a unified, hybrid workforce, with comprehensive management and oversight. And it’s already operating at scale inside some of the world’s largest organizations.
Wand built the world’s first Agentic Labor Infrastructure, enabling governments and global enterprises to create, manage, and scale digital workforces.
Our mission is to integrate agent ecosystems into the core of work and business, unlocking a generational leap in the global economy. We’re building the infrastructure that lets humans and AI agents operate together safely, transparently, and at scale.
Join Wand in leading the Agentic Shift
Wand is building a high-performing global team who take full ownership of what they build. We lead by example, move fast, make data-aware decisions, and continuously push for more, always with a focus on delivering real value to customers.
You would be joining a world-class team that combines deep research expertise and real-world product execution, with experience spanning DeepMind, Google, Amazon, Miro, Elise AI, IBM and Accern.
Requirements
Position Summary
We are hiring an experienced Senior Site Reliability Engineer to help build, operate, and scale our production infrastructure and reliability practices.
This is a hands-on engineering role focused on owning reliability, automation, and operational excellence across our platform. You will work closely with the SRE, engineering, QA, and data teams to improve system stability, strengthen deployment pipelines, and ensure our AI-driven products operate reliably at scale.
You will contribute to the design and operation of our cloud infrastructure, help evolve Kubernetes-based systems, and improve observability and incident management practices. The role requires someone comfortable operating production systems, troubleshooting complex issues, and driving practical improvements to reliability and delivery.
Responsibilities
- Build, maintain, and operate scalable production infrastructure.
- Own reliability and availability for key services and environments.
- Contribute to the design and operation of Kubernetes-based infrastructure.
- Develop and maintain Infrastructure-as-Code frameworks (e.g., Terraform).
- Improve monitoring, alerting, and observability across systems.
- Participate in on-call rotations and respond to production incidents.
- Investigate root causes of incidents and contribute to postmortems and reliability improvements.
- Improve system performance, availability, and fault tolerance.
- Contribute to CI/CD pipeline improvements to increase release safety and predictability.
- Support the deployment and operation of data platforms and ML workloads.
- Help standardise environments and infrastructure across internal systems and customer deployments.
- Troubleshoot issues across infrastructure, services, and deployment pipelines.
- Work closely with QA and engineering teams to improve production readiness and release stability.
- Contribute to automation efforts that reduce operational toil.
Key Requirements
- Strong hands-on experience in Site Reliability Engineering or DevOps roles.
- Experience working with cloud infrastructure (AWS preferred).
- Experience operating production systems and responding to incidents.
- Experience with Kubernetes in production environments.
- Strong experience with Infrastructure-as-Code (Terraform or similar).
- Experience working with CI/CD pipelines and deployment automation.
- Experience with monitoring, logging, and observability tooling.
- Strong troubleshooting and debugging skills in distributed systems.
- Experience supporting data platforms or ML workloads in production environments.
- Strong collaboration and communication skills.
Preferred Experience
- Experience with large-scale global B2B/B2C products.
- Experience working with AI, ML, or data-intensive systems.
- Exposure to MLOps workflows (model deployment, monitoring, retraining).
- Experience supporting customer-hosted or multi-tenant environments.
- Experience working in regulated or security-conscious environments.
- Experience scaling infrastructure in high-growth product environments.
- Experience collaborating with large-scale enterprise customers to deploy and operate environments within their accounts and VPCs.
Personal Characteristics
- Strong sense of ownership and accountability for systems in production.
- Practical and hands-on approach to solving operational problems.
- Calm and methodical during incidents and troubleshooting situations.
- Strong collaborator who works effectively across engineering teams.
- Curious and eager to continuously improve systems and processes.
- Bias toward automation and reducing manual operational work.
- Comfortable working in fast-moving, evolving technical environments.