Site Reliability Engineer (multiple locations)
An exciting opportunity has come up in our Site Reliability Engineering team! We are looking for talented engineers based in PT, CN, IN, UK or BR: people with a healthy obsession with automation, scalability and infrastructure reliability. As part of this team, you will join a group of highly skilled engineers and automate observable, highly effective service availability strategies.
We use our own tools, built on top of state-of-the-art technology, to set up and manage an infrastructure of thousands of cloud servers that support our production/live services and our teams of software engineers.
What you’ll do
- Engage in and improve the whole lifecycle of services, from inception and design through deployment, operation and refinement
- Support services before they go live through activities such as system design consulting, developing software platforms and frameworks, capacity planning and launch reviews
- Give teams the means to maintain services once they are live by measuring and monitoring availability, latency and overall system health
- Scale systems sustainably through mechanisms like automation, and evolve systems by pushing for changes that improve reliability and velocity
- Practice sustainable incident response and blameless postmortems
Who you are
- Degree or higher in Computer Science or another technical discipline, or equivalent practical experience
- Experienced with algorithms, data structures, complexity analysis and software design
- Experienced in one or more of the following: Python, Go, Java or C
- Interested in designing, analyzing and troubleshooting large-scale distributed systems
- Systematic in your approach to problem solving, with strong communication skills and a sense of ownership and drive
- Skilled at identifying and addressing toil
- Able to debug and optimize code
- Proficient in written and spoken English