Site Reliability Engineer
Role & Responsibilities:
- Build and maintain EKS clusters and Lambda functions, including automated failure remediation, application and systems deployment, and capacity planning.
- Develop and maintain Cloud Infrastructure (networking, load balancers, K8s clusters, EC2) through Infrastructure as Code and other automated approaches.
- Create automation tools for deployment and infrastructure monitoring.
- Test applications and infrastructure (functional, performance, reliability, and failover).
- Be a champion of best practices and help develop the skills of less experienced or differently skilled engineers within our Cloud Engineering team and in the broader SRE organization.
- Share operational burden with other teams, assisting in deployment efforts, troubleshooting, incident response, and proactive engineering efforts.
- Work within industry-standard team collaboration and development tools (e.g. issue tracking, source code management).
- Collaborate with supporting teams (cloud engineers, database administrators, network engineers, security office, and more) to develop production systems.
- Collaborate with Software Engineering teams to optimize the availability, reliability, and performance of production services.
Requirements:
· 5-6 years of experience developing or managing services in a distributed, internet-scale production environment.
· Deep knowledge and experience with containerization technologies (Docker, Kubernetes, Helm) and managed Kubernetes solutions (EKS, GKE, AKS).
· AWS experience: VPC, EC2, Auto Scaling, ELB, SNS, RDS, EBS, S3, Route 53, CloudWatch, CloudFormation, IAM, and security.
· Sound knowledge of Kubernetes
· In-depth knowledge of DevOps implementation
· Working knowledge of monitoring and observability (SLI/SLO/SLA methodology)
· Strong knowledge of scripting languages (shell scripting, Python)
· Strong debugging skills
· Monitoring tools: Splunk, New Relic
· Strong Linux/UNIX troubleshooting skills, with end-to-end Linux operating systems experience from prior roles