• Salary: £50,000 to £72,200 plus up to £5K allowance depending on skills and experience

  • Job type: Permanent

  • Grade: Senior - G7 / Lead - G6

  • Number of open roles: 6

  • Hours: 37 hours (42 hours incl. 1 hour lunch)

  • Working pattern: Full-time, Part-time, Flexible working

  • Closing date for applications: 31 August 2021

Lead & Senior Site Reliability Engineers / WebOps Engineers

Location: London or National hub (commute to the London office up to 8 days a month)

The role 

Our Site Reliability Engineers (SREs) ensure that all our services keep working and stay responsive, despite the Internet's erratic nature.

The journey starts with ensuring that development teams have the right tools for their day-to-day work, including APM, exception aggregation, log aggregation, autoscaling, performance metrics aggregation, dashboards and software-defined CI/CD pipelines.

You'll evaluate technology, solve cross-service problems, promote Service Level Indicators, Service Level Objectives and Error Budgets to product teams, and negotiate them with those teams. You'll be representing the modern, impatient Internet user.
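
For candidates less familiar with these terms, here is a minimal sketch in Python of how an error budget can be derived from a Service Level Indicator and a Service Level Objective. The function name `error_budget` and the request counts are hypothetical, chosen for illustration only; this is not a description of DIT's actual tooling.

```python
def error_budget(slo_target: float, total_requests: int, failed_requests: int) -> dict:
    """Summarise an availability SLO for one measurement window.

    slo_target      -- agreed objective as a fraction, e.g. 0.999 for 99.9%
    total_requests  -- requests served in the window
    failed_requests -- requests that did not meet the success criterion
    """
    if not 0 < slo_target < 1:
        raise ValueError("slo_target must be strictly between 0 and 1")
    if total_requests <= 0:
        raise ValueError("total_requests must be positive")
    if not 0 <= failed_requests <= total_requests:
        raise ValueError("failed_requests must be between 0 and total_requests")

    # SLI: the measured fraction of successful requests.
    sli = (total_requests - failed_requests) / total_requests
    # Error budget: how many failures the SLO tolerates in this window.
    allowed_failures = (1 - slo_target) * total_requests
    return {
        "sli": sli,
        "slo_met": sli >= slo_target,
        "allowed_failures": allowed_failures,
        "remaining_failures": allowed_failures - failed_requests,
        "budget_consumed": failed_requests / allowed_failures,
    }


# Hypothetical window: 1,000,000 requests, 400 failures, 99.9% objective.
report = error_budget(0.999, 1_000_000, 400)
print(report)
```

In this hypothetical window the objective tolerates roughly 1,000 failed requests, so 400 failures consume about 40% of the budget. A negative `remaining_failures` would mean the budget is exhausted, which is the usual trigger for negotiating priorities with the product team.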

As part of a small core team that sets DIT's stack direction, you'll work with project teams across Digital, Data and Technology to help build and scale the global platform our product sits on.

We're looking for several Senior SREs and one Lead SRE to lead the team. As a Senior SRE, you'll be responsible for planning and designing large groups of stories. As a Lead SRE, you'll act as a technical product owner for multiple SRE services. You will contribute to the development of the strategic direction of the DIT technology stack.

If you're currently an experienced senior developer who has just moved into, or is seeking to move into, an SRE role, please do apply.

Work from any of our Hubs across the UK.


  • Ensure service reliability 
  • Work at least 50% of time developing / maintaining code 
  • Negotiate and measure Service Level Objectives for supported projects, and help achieve them
  • Make sure deployment strategies for products are repeatable, scalable and highly available  
  • Apply your deep technical knowledge to support delivery teams and solve complex problems
  • Be involved in day-to-day service desk support 
  • Help grow our teams 
  • Coach and mentor more junior colleagues

Our Tech Stack includes
  • GOV.UK PaaS (Cloud Foundry) 
  • PostgreSQL 
  • Sentry 
  • Elasticsearch 
  • Redis 
  • Python 
  • Jenkins 
  • Airflow 
  • Terraform & CloudFormation
  • K8s 

You’ll have 

  • Demonstrable experience and fluency in Python, writing clean and effective code  
  • Ability to build code-defined, reliable and well-tested infrastructure on top of cloud computing systems (Terraform, CloudFormation)
  • Experience in designing, analysing, and troubleshooting distributed systems  
  • Knowledge of Unix fundamentals and TCP/IP networking 
  • Ability to see the user impact of infrastructure changes
  • Experience in defining and measuring Service Level Objectives  
  • Experience in observability-driven development
  • Experience in prototyping through reuse of existing Open Source components

Desirable skills

  • GOV.UK PaaS, Kubernetes
  • Django
  • OAuth2/SAML2 integrations in Python
  • gevent, Python asyncio
  • GCP, AWS, Azure

For your application, we would like to see your CV and a letter or personal statement outlining your experience, skills and fit for the role.

During the CV review and interview stages we use Success Profiles, a flexible framework, to assess applicants against a range of elements, giving you the opportunity to demonstrate the elements required to be successful in the role.

At the interview for this role we assess your technical/specialist experience, as outlined in the role description above, test your ability through relevant assessments and ask you questions about behaviours.

The behaviours we assess are: 

  • Communicating and influencing 
  • Developing self and others 
  • Changing and improving 
  • Making effective decisions