Site Reliability Engineer
Description
Role & Responsibilities:
- Build and maintain data analysis workloads and support the data needs of the aggregation product.
- Create automation tools for data analysis, infrastructure monitoring, and data monitoring.
- Be a champion of best practices and help develop the skills of less experienced or differently skilled engineers within our team and in the broader SRE organization.
- Share the operational burden with other teams by assisting with troubleshooting, incident response, and proactive engineering efforts.
- Work within industry-standard team collaboration and development tools (e.g., issue tracking, source code management).
- Collaborate with supporting teams (cloud engineers, database administrators, network engineers, security office, and more) to maintain production systems.
- Collaborate with Software Engineering teams to optimize the availability, reliability, and performance of production services and to resolve data-related issues.
Technical Skills:
- 2-5 years of experience developing or managing services in a distributed, internet-scale production environment.
- Deep knowledge of and experience with SQL, PL/SQL, and Splunk, and good knowledge of UI development.
- Working knowledge of monitoring and observability.
- Strong knowledge of scripting languages (shell scripting, Python).
- Strong debugging skills.
- Monitoring tools: Splunk, Datadog
- Strong Linux/UNIX troubleshooting skills, with prior end-to-end working experience on Linux operating systems.