Manager, SRE
Description
WHO WE ARE
Choreograph is WPP’s global data products and technology company. We’re on a mission to transform marketing by building the fastest, most connected data platform that bridges marketing strategy to scaled activation.
We work with agencies and clients to transform the value of data by bringing together technology, data and analytics capabilities. We deliver this through the Open Media Studio, an AI-enabled media and data platform for the next era of advertising.
We’re endlessly curious. Our team of thinkers, builders, creators and problem solvers is over 1,000 strong, across 20 markets around the world.
WHO WE ARE LOOKING FOR
We are seeking an experienced and motivated Manager, Site Reliability Engineering (SRE) to lead and grow our team of SREs. This role is critical in ensuring the reliability, scalability, and performance of our systems and applications. As a Manager of SRE, you will collaborate closely with engineering, product, and operations teams to design, build, and maintain highly available and resilient infrastructure. You will drive best practices, mentor team members, and lead efforts to continuously improve system reliability and operational efficiency.
(Please note that this is a UK-based role and requires applicants to have the right to work in the UK.)
WHAT YOU WILL DO
- Recruit, mentor, and retain top SRE talent.
- Provide guidance and technical leadership to the SRE team.
- Foster a culture of ownership, collaboration, and continuous learning.
- Manage team performance, set clear goals, and conduct regular performance reviews.
- Define and implement reliability standards.
- Develop and improve incident management processes in alignment with engineering support, ensuring effective resolution and root cause analysis.
- Drive proactive monitoring, alerting, and automation to minimize downtime and improve system reliability.
- Lead efforts to eliminate single points of failure.
- Collaborate with the DevOps practice to apply best practices that accelerate delivery.
- Drive post-incident reviews and implement preventative measures.
WHAT YOU WILL NEED
- Proven experience in SRE, DevOps, or related roles, with some experience in a leadership or managerial position.
- Strong knowledge of cloud platforms (AWS, GCP, Azure) and modern infrastructure technologies (Kubernetes, Docker, Terraform).
- Expertise in monitoring, logging, and observability tools (e.g., Prometheus, Grafana, Datadog, Splunk).
- Proficiency in at least one programming or scripting language (e.g., Python, Go, Bash).
- Deep understanding of networking, databases, and distributed systems.
- Strong communication, collaboration, and problem-solving skills.
- Experience with incident response, on-call rotations, and post-mortem processes.
If you are ready to be at the forefront of the AdTech industry, shaping its future, and driving success for both Choreograph and our clients, we encourage you to apply and join our team.
Choreograph is the beating heart of data inside WPP’s media investment group, GroupM, the world’s leading media investment company responsible for more than $60 billion in annual media investment. Discover more about Choreograph at www.choreograph.com.
GroupM and all its affiliates embrace and celebrate diversity, inclusivity, and equal opportunity. We are committed to building a team that represents a variety of backgrounds, perspectives, and skills. We are a worldwide media agency network that represents global clients. The more inclusive we are, the more great work we can create together.