Site Reliability Engineering Manager
An exciting opportunity has come up in our Site Reliability Engineering team! We are looking for a talented Manager. As part of this team, you will manage a group of highly skilled Site Reliability Engineers and drive automated, observable and highly effective service availability strategies.
We use our own tools, built on top of state-of-the-art technology, to set up and manage an infrastructure of thousands of cloud servers that supports our production (live) services and our teams of software engineers.
What you’ll do
- Lead a team of Software/Systems Engineers on infrastructure projects and be directly responsible for the uptime of the systems they deliver
- Define SLIs, SLOs and SLAs
- Plan and prioritize projects
- Improve the on-call process
- Open up communication channels
- Improve service observability
- Own end-to-end availability and performance of key services and build automation to prevent problem recurrence
- Automate the response to all non-exceptional service conditions
Who you are
- A degree or higher in Computer Science or another technical discipline, or equivalent practical experience
- You have 3+ years of experience in a similar management role
- Experienced in managing an engineering team on projects that involve technical deep-dives into code, networking, operating systems and/or storage
- You are a compelling and clear communicator, able to represent your team to internal and external audiences with differing levels of technical fluency
- You have 2+ years of programming experience in one or more of Python, Go, Java or C
- Proficient in English, with strong written and verbal communication skills