Site Reliability Delivery Engineer

Production Operations Budapest, Hungary


Description

Position at Diligent Corporation

The Company

Diligent is the pioneer in modern governance. We empower leaders to turn governance into a competitive advantage through unparalleled insight and highly secure, integrated SaaS applications, helping organizations thrive and endure in today’s complex, global landscape. The largest global network of corporate directors and executives, Diligent is relied on by more than 19,000 organizations and nearly 700,000 leaders in over 90 countries. With award-winning customer service, Diligent serves more than 50% of the Fortune 1000, 70% of the FTSE 100, and 65% of the ASX. Our passionate, smart, and creative group of more than 1,000 employees supports customers around the globe.

Position Overview

Diligent is looking for a Site Reliability Engineer, Delivery Automation. You will be responsible for system availability, infrastructure, and process automation. Your focus will be to reduce toil by measuring repetitive, manual tasks and automating them where possible. Strong communication skills are required, as you will work with multiple development teams to implement more efficient and reliable procedures. The ideal candidate is someone who is always looking to eliminate manual Customer Service requests.

You Are

  • A team player with excellent communication and collaboration skills.
  • Always ready to learn and experiment with leading, innovative technologies.
  • Not afraid of change and passionate about process improvement.
  • Someone who cares about reliability, availability, and security, and wants to develop autonomous solutions.

Key Responsibilities

  • Design, code, test, and deliver software and procedures to eliminate toil.
  • Define automated playbooks that integrate with ServiceNow and our incident management systems.
  • Produce documents that outline the incident resolution process and the runbook requirements for the incident resolution workflow.
  • Lead incident response efforts.
  • Deploy new infrastructure with automation.
  • Define and measure SLIs, SLOs, and SLAs.
  • Improve application performance and availability via log and metric analysis and anomaly detection.

Required Experience/Skills

  • Software Engineering, SRE, or DevOps experience.
  • Knowledge of configuration management, release automation, and orchestration technologies such as Ansible, Terraform, Octopus, Jenkins, Puppet.
  • Good working experience with a wide range of technologies, frameworks, and platforms such as AWS/Azure Cloud, Windows, Linux, and Kubernetes.
  • HashiCorp Vault and database management are a plus.
  • Hands-on experience with Python, PowerShell, .NET, or Go.
  • Working knowledge (or willingness to learn) of automating infrastructure components (e.g. routers, load balancers, cloud products, container systems, compute, storage, and networks).
  • Ability to participate in architectural sessions and provide solutions to complex problems.
  • Strong problem-solving, analytical, and time management skills.