Careers
We are a next-generation collaboration platform that unites AI agents and humans to revolutionize the way enterprises work, innovate, and thrive.
Current Openings
Head of SRE
About The Position
Build the Future Workforce
Wand turns AI into labor. It enables humans and AI agents to operate together as a unified, hybrid workforce, with comprehensive management and oversight. And it’s already operating at scale inside some of the world’s largest organizations.
Wand built the world’s first Agentic Labor Infrastructure, which enables governments and global enterprises to create, manage, and scale digital workforces.
Our mission is to integrate agent ecosystems into the core of work and business, unlocking a generational leap in the global economy. We’re building the infrastructure that lets humans and AI agents operate together safely, transparently, and at scale.
Join Wand in leading the Agentic Shift
Wand is building a high-performing global team whose members take full ownership of what they build. We lead by example, move fast, make data-aware decisions, and continuously push for more, always with a focus on delivering real value to customers.
You would be joining a world-class team that combines deep research expertise and real-world product execution, with experience spanning DeepMind, Google, Amazon, Miro, Elise AI, IBM and Accern.
Requirements
Position Summary
We are hiring a hands-on Head of SRE to establish, lead, and scale our Site Reliability Engineering function. This role combines strategic ownership with deep technical execution.
You will be responsible for defining reliability standards, building operational processes, and ensuring production stability, while actively architecting infrastructure, improving automation, and embedding SRE best practices across the engineering organisation.
There is significant scope to review, improve, and rebuild our systems, infrastructure and processes where necessary. You will be instrumental in designing, developing, and maintaining scalable backend systems, ensuring our AI products meet the highest standards.
You will also become part of the product-engineering leadership team, contribute to scaling the organization, and report directly to the CPTO.
Responsibilities
- Own and lead all SRE-related strategy, standards, and execution. Embed SRE culture and operational excellence across engineering teams.
- Review the current infrastructure and operational model; redesign and rebuild where needed.
- Architect, deploy, and maintain scalable, secure production environments.
- Define and implement SLIs, SLOs, and uptime targets.
- Establish robust monitoring, alerting, and observability practices.
- Design and implement incident management, RCA and postmortem processes.
- Build and manage sustainable on-call frameworks and escalation models.
- Automate the software delivery lifecycle to improve release predictability and safety.
- Create reproducible environments and infrastructure-as-code (IaC) provisioning templates.
- Improve system performance, availability, and reliability.
- Support and productionise data platforms and ML workloads.
- Partner closely with QA and Engineering leadership to improve release quality and stability.
- Ensure infrastructure meets enterprise-grade security and regulatory requirements.
- Hire, manage, and mentor a team of site reliability engineers.
Key Requirements
- Proven hands-on experience in Site Reliability Engineering, Production Engineering, or a similar role.
- Strong hands-on expertise in cloud infrastructure (AWS or Azure preferred), infrastructure as code (Terraform), and Kubernetes.
- Experience building or maturing SRE practices within an organisation.
- Demonstrated ability to improve uptime, reliability, and operational processes.
- Deep understanding of CI/CD, developer experience, infrastructure-as-code, and automation.
- Experience designing on-call processes and incident response frameworks.
- Experience managing at least one team of site reliability engineers.
- Strong communication skills, with the ability to influence across teams.
- Experience supporting data platforms and ML systems in production environments.
- MLOps experience (model deployment, monitoring, retraining workflows).
Preferred Experience
- Background in large-scale global B2B/B2C products.
- Background in enterprise environments with security and compliance requirements.
- Expertise in ML, AI, LLMs.
- Experience implementing regulatory controls within cloud infrastructure.
- Experience evaluating and managing infrastructure vendors and tooling.
- Experience scaling systems in high-growth environments.
- Experience collaborating with large-scale enterprise customers to deploy and operate environments within their accounts and VPCs.
Personal Characteristics
- Practical and hands-on; willing to lead from the front.
- Strong operational mindset with clear opinions on best practices.
- Structured thinker who can build processes from ambiguity.
- High ownership mentality and accountability.
- Learning-oriented with a continuous improvement mindset.
- Excellent communication and interpersonal skills.
- Continuous drive for improvement and innovation.