Principal Site Reliability Engineer
At Shutterfly, we make life’s experiences unforgettable. We believe there is extraordinary power in self-expression. That’s why our family of brands helps customers create products and capture moments that reflect who they uniquely are.
What You’ll Do Here:
Shutterfly is looking for a Principal Site Reliability Engineer to help us build a brand-new SRE team.
Initially, our SRE team will focus on automation and instrumentation. Over time, the team will support a broad spectrum of platforms and systems. The automation-related work will focus on Shutterfly's cloud platform and expand to other areas as opportunities present themselves. Shutterfly uses several homegrown and commercial monitoring tools, so the SRE team will leverage best practices to identify gaps and opportunities within our monitoring tool portfolio. The SRE team will also work closely with our Service Engineering and Cloud Platform teams to achieve automation and instrumentation goals. Finally, the team will play a crucial role in Shutterfly's root cause analysis (RCA) process by leading discussions and implementing changes to prevent future outages.
SREs should have an inherent curiosity to understand how applications and platforms work and should continually stay in front of emerging technologies. They should also be able to work with a broad range of individuals across the organization, including Executive Management, Product Owners, Engineering, and Support personnel.
- Developing and maintaining monitoring and alerting systems to quickly detect and respond to problems
- Incident management process, tooling, and automation (runbooks, dashboards, alerting, engagement, etc.)
- Automating routine operations tasks to reduce manual intervention and improve efficiency
- Participating in on-call rotations to respond to incidents and ensure system availability
- Developing and implementing performance testing and capacity planning strategies to ensure systems can handle expected loads
The ideal candidate for an SRE role should have a strong background in software engineering and operations, with a deep understanding of distributed systems, networking, and cloud technologies. They should also have excellent communication skills, as they will often be required to collaborate across teams to identify and address issues.
The Skills You’ll Bring:
- The ability to break down complex problems into solvable components
- Cloud platforms: Amazon Web Services, Microsoft Azure, Google Cloud Platform (GCP)
- Terraform or CloudFormation
- Experience as an SRE, Software Engineer, or Production Engineer
- Experience with log aggregation solutions: Splunk, ELK, Sumo Logic
- Experience with metrics monitoring platforms: SignalFx, Datadog, Dynatrace, AppDynamics, or other enterprise APM tools
- Strong desire to learn and grow
- Strong interest in SRE topics like SLIs, SLOs, resilience, scaling, and performance
It is helpful, but not required, to have:
- Experience supporting large-scale distributed systems
- Experience with infrastructure configuration and automation tools: Terraform, Puppet, Ansible
- Good working knowledge of the build automation and continuous integration/delivery ecosystem: Git, Gerrit, Maven/Gradle, Jenkins, Docker, Nexus, Artifactory, Selenium
- Experience with security in the cloud: intrusion detection, penetration testing, and vulnerability scanning
Supporting a diverse and inclusive workforce is important to Shutterfly not only because it directly reflects our value of Embracing our Differences, but also because it’s the right thing to do for our business and for our people. Learn more about our commitment to Diversity, Equity and Inclusion at Shutterfly DE&I.
The compensation package for this role is based on multiple factors, such as job level, responsibilities, location, and candidate experience. The base pay ranges included below are specific to the locations listed and may not be applicable to other locations.
California: [$119,400-169,750]
Connecticut, New York, and Rhode Island: [$119,400-155,800]
Colorado and Washington: [$119,400-143,800]
Nevada: [$112,100-155,800]