Site Reliability Engineer
Ekata, a Mastercard company, is the global standard in identity verification, giving businesses worldwide the ability to link any digital transaction to the human behind it. Our Ekata Identity Engine, the first and only of its kind, uses complex machine learning to combine features derived from the billions of transactions within our proprietary network with the data from our graph to deliver industry-leading risk assessment solutions.
As the Site Reliability Engineer, you will manage our production environment, providing a highly available and scalable platform for Ekata to serve our customers. Our Infrastructure team serves as a resource for Engineering, helping diagnose production issues and providing guidance on improving the availability and performance of our applications. You will also develop systems, automation, and tools that make it easier for Engineering teams to deploy services quickly, automatically, and reliably.
As the Site Reliability Engineer, you will:
- Build, scale, and support high-availability Linux systems in a public cloud environment
- Develop and deploy tools and automation to replace manual tasks and improve efficiency
- Improve security practices and procedures within the Infrastructure team while providing guidance to the broader organization
- Manage Kubernetes clusters for container orchestration across multiple clouds
- Collaborate with Engineering to help them deploy systems that are highly available, secure, and performant
- Ensure backup methods for critical data are well defined
- Participate in the on-call rotation
- Manage load balancing platforms
- Manage security and availability monitoring for all services
- Maintain quality documentation for systems owned by the Infrastructure team
- Use monitoring tools to identify and resolve issues before they affect customers
- Help other teams troubleshoot and solve failures and performance problems
- Ensure security policies and procedures are consistently implemented to secure production data
- Participate in code reviews with the Infrastructure team
- Develop and maintain standards and procedures that keep the environment compliant with information security policy
Our ideal Site Reliability Engineer will have:
- Experience working in one or more cloud environments, such as AWS, Azure, or Google Cloud
- Experience with Ansible, CloudFormation, Terraform, or other configuration management and infrastructure-as-code tools
- 3+ years of proven experience with Linux or UNIX systems and related protocols and software
- A command of Linux systems including troubleshooting, memory management, tuning, I/O subsystem, and security
- Experience with Jenkins, ArgoCD or other CI/CD tools
- Programming aptitude in Ruby, Python, Go, or similar languages
- Experience with monitoring solutions such as Nagios, Prometheus, or Zabbix
- Working knowledge of database systems such as MySQL or PostgreSQL
- Experience with containers and orchestration platforms, such as Docker and Kubernetes
- Excellent written and spoken English skills
This position is located at our headquarters in Seattle, WA.
Unwavering in our pursuit of standardizing global identity data, we are approachable, real people who genuinely care about the success of those we partner with. With a commitment to service, innovation, and ownership, Ekata is a dynamic place to work for people who want to make an impact on a global scale. We provide learning and development opportunities for each employee and promote work-life flexibility through self-managed time off. Headquartered in downtown Seattle, Ekata is growing internationally, with offices in Budapest, Hungary; Amsterdam; and Singapore.
To learn more about the experience of working at Ekata, visit: https://ekata.com/careers/.
Ekata prides itself on celebrating diversity and inclusivity and on being an equal-opportunity employer.