Junior Site Reliability Engineer [India]

Site Reliability; Mumbai, India or Remote, India; Bangalore, India; Hyderabad, India


Description

Job Duties and Responsibilities:

  • Maintain and monitor our environments in a 24/7 rotation system
  • Develop and improve our monitoring systems, automate repetitive tasks
  • Cooperate with international teams
  • Identify and address performance challenges
  • Document and communicate progress on resolving issues

Must-Have:

  • 3 years of experience in a production environment
  • Working knowledge of Linux and networking
  • Track record of improving and maintaining monitoring tools (Icinga, Prometheus, Grafana, OpenTSDB)
  • Incident management skills: must be able to own, cooperate on, and resolve large-scale incidents under time pressure
  • Troubleshooting skills to hunt down the root causes of issues and persistence in preventing them from happening again
  • Experience handling large numbers of diverse systems with configuration management systems like Puppet, Ansible, Terraform
  • Knowledge of both self-hosted and cloud environments (preferably the Google Cloud Platform)
  • Ability to work effectively in a globally distributed team structure
  • Good English skills (B2+) to effectively communicate about technical matters

Nice to Have:

  • Coding experience in Python/Golang/Perl/Ruby
  • GCP certificate, CCNA certificate, RHCE, or equivalent
  • Kubernetes knowledge
  • Experience using CI/CD tools
  • Operational knowledge of the ELK stack