Principal/Lead Site Reliability Engineer (Wildfire)

Engineering Santa Clara, California


Your Career

Palo Alto Networks is moving rapidly toward a future in which cloud-based applications are increasingly common. As a Site Reliability Engineer, you will develop the frameworks and pathways to help move our internal applications to microservices. You will be a critical link between engineering and the Infrastructure Platform, building Infrastructure as Code and working in partnership with application developers to deploy applications in GCP, AWS, and data centers across the globe.

As a Principal member of the SRE team, you will build mission-critical platforms, tools, and processes that ensure the highest levels of availability and reliability for all our applications. We need creative and innovative problem solvers who can partner with our application development teams to make their services more usable. Our SRE team has a standout opportunity to build tools, frameworks, and cloud platforms that will support our company's growth over the next decade. If you are a self-starter who jumps on new ideas to make the platform more stable, secure, and feature-rich, this is your new career.

Your Impact

  • Write automation code for provisioning and operating infrastructure at massive scale
  • Design, build and operate Cloud infrastructure to enable reliable and rapid deployment of microservices with effective monitoring and resilient operations
  • Work with development teams to make sure the applications are production-ready, scalable, and reliable from the ground up
  • Identify and drive opportunities to improve automation for code deployment, management, and visibility of application services
  • Develop tools and frameworks to automate operational tasks and the deployment of machines, services, and applications
  • Establish end-to-end monitoring and alerting on all critical components of the application
  • Participate in the on-call rotation supporting the platform and/or the production application
  • Direct root cause analysis of critical business and production issues
  • Develop and mentor other SREs on standard methodology, from infrastructure orchestration to troubleshooting application services in production
  • Represent SRE in design reviews and work cross-functionally with Engineering teams on operational readiness

Your Experience

  • Expertise in configuration management with a framework such as Terraform, Helm charts, Salt, or Ansible
  • Experience in infrastructure engineering, DevOps, or Site Reliability
  • Expertise in Google Cloud Platform (GCP) and its related services
  • Strong experience with Linux
  • Proficiency with a programming language such as Python, plus shell scripting, to automate tasks
  • Familiarity with CI/CD pipelines, GitHub, Jenkins, and Artifactory
  • Subject matter expertise in one or more technologies, e.g. Pub/Sub, Elasticsearch, Kafka, Hadoop, GCP, or databases
  • Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
  • Strong fundamentals in HTTP, including HTTP headers and web servers
  • Development experience with big data infrastructure is a plus
  • Proficiency in Python, Golang, Java, or Node.js is a plus
  • Experience managing a small group using agile practices is a plus
  • BS or MS in Computer Science, a related field, or equivalent professional experience
  • Excellent problem solving, critical thinking, communication, and teamwork skills
  • Excellent written and verbal communication, able to collaborate and rally support
  • Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
  • Passion for automation and monitoring instrumentation in the code
  • Excellent interpersonal skills and the ability to work well in a team
  • Passion for learning, understanding, and dissecting new technology stacks quickly and independently

The Team

Our engineering team is at the core of our products and connected directly to the mission of preventing cyberattacks. We are constantly innovating — challenging the way we, and the industry, think about cybersecurity. Our engineers don’t shy away from building products to solve problems no one has pursued before.

We define the industry instead of waiting for directions. We need individuals who feel comfortable in ambiguity, are excited by the prospect of a challenge, and feel empowered to take on the unknown risks facing our everyday lives, which depend on a secure digital environment.

Our Commitment

We’re trailblazers who dream big, take risks, and challenge cybersecurity’s status quo. It’s simple: we can’t accomplish our mission without diverse teams innovating together.

We are committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.