Senior Cloud Site Reliability Engineer
For over 10 years, Zscaler has been disrupting and transforming the security industry. Our 100% purpose-built cloud platform delivers the entire gateway security stack as a service through 150 global data centers, securely connecting users to their applications regardless of device, location, or network in over 185 countries. It protects over 3,900 companies and detects 100 million threats per day.
We work in a fast-paced, dynamic, make-it-happen culture. Our people are some of the brightest and most passionate in the industry, and they thrive on being the first to solve problems. We are always looking to hire highly passionate, collaborative, and humble people who want to make a difference.
Position: Senior Cloud Site Reliability Engineer
Location: Pune / Bangalore, India
The Senior Cloud SRE will report to the Director, CSPM Engineering. Zscaler is looking for an exceptional Site Reliability Engineer to maintain and continually improve our cloud-based applications. Site Reliability Engineering (SRE) is what you get when you treat system operations as a software engineering problem. The SRE's mission is to ensure uninterrupted service for Zscaler CSPM customers, to act as a force multiplier that helps CSPM product teams deliver stable, scalable, high-quality software, and to sustain our focus on giving customers the highest-quality product experience.
The SRE team builds foundational backend services as well as tooling and automation to allow product teams to release and scale their software reliably and predictably. SREs are team players who embed themselves within product teams as needed to advance the architecture and performance of software systems and train their peers in topics such as troubleshooting, monitoring, building self-healing applications, and ensuring availability to fuel the company’s growth.
If you love designing, engineering, and running systems and infrastructure that will help millions of customers, then this is the place for you.
Responsibilities/What You’ll Do:
- Own cloud service provider issues and proactively communicate their current status and resolution.
- Own the overall health and site reliability of the Production Environments
- Collaborate with the DevOps engineers to optimize deployment practices and to ensure a highly resilient deployment strategy, ideally with zero downtime.
- Work with development teams to help engineer scalable, reliable, and resilient software running on multi-cloud environments
- Ensure that systems run reliably at scale and that any indicators of instability are dealt with as a priority.
- Address concerns about availability and latency under peak demand, and drive improvements in both.
- Increase the effectiveness, reliability, and performance of the SaaS platform by identifying and measuring key indicators, making automated changes to production systems, and evaluating the results.
- Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting, and root cause analysis
- Maintain a traceability matrix that maps customer issues to cloud service provider issues.
- Create and use tools to monitor our applications and services in the cloud, including system health indicators, trend identification, and anomaly detection.
- Provide analytics and forecasts for cloud capacity, troubleshooting analysis, and uptime.
- Work with our customer support team to ensure customer SLAs are being met.
- Implement security best practices to keep the environment safe and secure.
- Participate in the on-call rotation; during incidents, lead the incident response and help write detailed, brutally honest, blameless post-mortem reports.
Qualifications/Your Background:
- Technical degree (Computer Science or Math) or equivalent professional experience.
- 5+ years of experience managing large-scale, high-availability SaaS platforms, with strong cloud SRE and DevOps experience.
- Expert-level experience with cloud deployments and zero-downtime scenarios on one or more cloud providers (AWS, Azure, GCP).
- Experience with site reliability of production environments.
- Strong sense of ownership and integrity demonstrated through clear communication and collaboration
- Solid understanding of development, debugging, administration, and automation tools and frameworks: PowerShell, Python, Azure Resource Manager (ARM) templates, AWS CloudFormation templates, Git, and Visual Studio.
- Strong communication skills – must be able to communicate effectively with technical staff and executives
- Ability to think strategically about business, product, and technical challenges
- Excellent troubleshooting and problem-solving skills
- Experience with scale testing, disaster recovery, and capacity planning
- Strong operations background understanding concepts such as alerting, monitoring, logging, and incident management.
- Hands-on experience leading the design, development, and deployment of business software at scale, or current hands-on experience with infrastructure, networking, compute, storage, and virtualization.
- Familiarity with multitenant cloud SaaS microservices architectures.
- Experience working with Git and Visual Studio.
- Proficiency with one or more cloud providers, e.g., AWS, Azure, or GCP.
- Experience with automation and configuration management.
Why Zscaler?
- Zscaler is the world’s leading software-as-a-service security platform
- We deliver best-of-breed security services at unprecedented scale
- 100 million threats detected a day across 185+ countries
- Glassdoor rating of 4.7/5.0 + 98% CEO Approval = Exceptional place to work!
People who excel at Zscaler are smart, motivated, and share our values. Ask yourself: Do you want to team with the best talent in the industry? Do you want to work on disruptive technology? Do you thrive in a fluid work environment? Do you appreciate a company culture that enables individual and group success and celebrates achievement? If you said yes, we’d love to talk to you about joining our award-winning team.
Learn more at zscaler.com or follow us on Twitter @zscaler. Additional information about Zscaler (NASDAQ: ZS) is available at http://www.zscaler.com. All qualified applicants will receive consideration for employment without regard to race, sex, color, religion, sexual orientation, gender identity, national origin, protected veteran status, or disability.