Site Reliability Engineer, Observability
Diligent is the world’s largest GRC SaaS provider, serving nearly 1 million users from 25,000 organizations around the world. Our software enables holistic and informed conversations about governance, risk and compliance and ensures CEOs, CFOs and the board have an integrated view of audit, risk, information security, ethics and compliance from across the organization. Our world-changing idea is to bring technology, insights and confidence to leaders so they can build more effective, equitable, and successful organizations – and create lasting, positive impact on the world. We seek to empower organizations to be better for their stakeholders and communities, for their customers and employees, for their bottom line. Headquartered in New York, Diligent also has offices in Washington D.C., London, Galway, Budapest, Vancouver, Bengaluru, Munich, and Sydney.
Diligent is looking for a Site Reliability Engineer for the Observability team. The Observability team manages the platforms that provide observability across a large number of applications for a global organization. The platform includes tools such as SignalFX, Site24x7, the ELK stack, Telegraf, Grafana, Terraform, Ansible, and more. The ideal candidate will be able to learn our toolset and assist in onboarding applications onto the platform through automation. Building and supporting the infrastructure that underpins our platform in all environments, from development through to production, will be essential, as will communicating effectively, both verbally and through documentation.
Responsibilities:
- Develop new capabilities that interact with the APIs of the tools we use.
- Configure and optimize the observability platforms to streamline metric gathering, monitoring, alerting and issue identification.
- Automate manual tasks (existing or new features)
- Educate the organization on the use of the observability platform, its existing features, and new features as they roll out.
- Write documentation (architecture, overviews, troubleshooting steps)
Required Experience/Skills:
- 3+ years of experience working in any combination of Linux, Windows, Kubernetes, and Docker environments
- Experience scripting/coding in one or more of the following languages: Python, Go, Bash, PowerShell
- 3+ years of experience identifying and automating manual tasks
- Experience with Git
- Good communication skills
Nice to Have:
- Terraform – writing modules and/or implementing
- Ansible experience
- SignalFX or equivalent
- Site24x7 or equivalent
- Opsgenie or equivalent