Experience/Background: 2-3 years in a Cloud SRE, Architect or Senior Support Expert role for a SaaS solution using the technologies listed below. Should hold certifications in 2 or more of the related technologies (AWS, CKA).
Qualification: BE/BTech/MCA
Technology Skills Required for SRE:
- AWS (must have): VPC, EC2, Load Balancers, Auto Scaling, EBS, Kinesis, S3, Lambda, CloudFormation/Terraform, CloudWatch, EKS, ECS, AWS Config, CloudFront, AWS WAF, RDS, DynamoDB
- Hands-on experience with cloud migrations
- In-depth understanding of CI/CD pipelines and hands-on experience with Jenkins or Harness
- Configuration Management Tools: Ansible
- OS: Primarily Linux, with some Windows Server
- Scripting: Python and Bash; Ruby and PowerShell are good to have
- Monitoring: Monitoring tools such as CloudWatch, Datadog, Sumo Logic, Grafana, Prometheus, Application Insights, OMS, LogicMonitor, ELK, Sysdig, Nagios, etc.
- Container Orchestration: Docker or Kubernetes (EKS)
Other Important Requirements: This role requires 24x7 availability (on call and in office) because it involves managing highly critical, customer-facing global production systems. The individual must therefore be flexible in adapting to roster/shift arrangements as required.
Primary SRE responsibilities include:
- Maintain a 24x7 availability mindset to ensure 99.97% uptime on customer platforms and applications.
- Bring excellent communication and teamwork skills to relay information effectively to other engineers and document work properly.
- Own major technical issues and drive them to resolution and service restoration to meet the uptime commitment. Expected to be available on call at any time (24x7).
- Develop, deploy and continually improve the telemetry, monitoring and automation (self-heal, self-help, self-service) of the SaaS platform and the applications
- Ensure the cloud infrastructure, platform components and applications are secured via strong controls, monitoring and security incident management.
- Own end-to-end root cause analysis (RCA) of incidents and demonstrate quantifiable technology, stability and process improvements to customer infrastructure, SaaS platforms and applications. Should be able to write post-mortem reports for incidents.
- Enable technology support teams, customers and business users by building and continually developing a knowledge base driven by analysis of practical usage, issues and related challenges.
- Serve as the highest technical escalation point and act as a guide, coach and mentor for first- and second-level application/infrastructure support teams.
- Act as the bridge between the Support and Product Engineering teams, and proactively engage with customers and business users as required.
- Own and drive end-to-end technical resolution of critical incidents that may involve multiple parties, ensuring the right collaboration and communication are maintained so the issue is resolved quickly through the shortest and most efficient path.