Director, Automation Engineering

Internet Operations, Redwood City, California


Description

Shutterfly is looking for a Director to manage and expand our Site Reliability Engineering (SRE) team. Based in our Redwood City (flexible) office and reporting to the Senior Director of Internet Operations, you will be responsible for our Load and Performance program, site monitoring (infrastructure and synthetic), site performance improvements, and implementing and improving automation across the stack. Your team is growing, and SRE is a new function at the company. You will be a key leader in setting up this new team, defining its operating processes, and onboarding new team members. While your initial focus is on our consumer site (shutterfly.com), you will leverage the work you do there across our enterprise and our brands. You will work very closely with both our development and infrastructure teams, with a maniacal focus on improving our website availability and performance and, in turn, delighting our customers when they shop with us. You are a strong believer in KPIs and metrics, and you use the data you gather both to improve your own team’s performance and to “make your case” when you need to convince others to take action. You’ve been around technology most of your career and, while you may not be a hands-on expert, you are eager to learn how things work and know the right questions to ask when things aren’t working as expected.

Key Accountabilities

  • Oversee a multifaceted team of engineers who are responsible for site reliability, load and performance testing, site performance improvements and site monitoring.
  • Define, implement or improve associated processes within these areas.
  • Define, implement or improve metrics and KPIs within these areas.
  • Develop and implement reporting that provides tracking and pattern recognition (common problem categories, resolution techniques) and measures the effectiveness of the SRE team.
  • Advocate for and drive adoption of the SRE concept across the enterprise, beginning with shutterfly.com and leveraging those lessons to then onboard additional areas of the company.
  • Nurture a close working partnership with development teams and our infrastructure teams to help ensure that we have clear engagement processes and adequate instrumentation in place to properly support our portfolio of systems.
  • Oversee a growing team of engineers. Establish goals, conduct training, perform annual reviews, conduct team meetings and generally provide the needed guidance and support to your team.
  • Lead by example to encourage a culture of customer focus, flexibility and improvement.

Skills

  • Clear and concise written and oral communication skills
  • Ability to bring clarity to ambiguous situations
  • Excellent problem-solving skills
  • Experience developing, implementing and tuning processes over a range of areas
  • Strong organizational and time-management skills

Education

  • MBA or BA/BS degree (in lieu of a degree, 7+ years of relevant work experience)

Required

  • 5+ years of general technology experience across a broad range of technologies, ideally in a large ecommerce organization
  • 5+ years of management experience
  • 4+ years’ experience managing an SRE or production support team
  • 4+ years’ experience managing load and performance testing against an ecommerce platform
  • 4+ years of experience utilizing a range of monitoring tools in a production environment with a focus on infrastructure tools
  • 3+ years leading engineers focused on site performance improvements
  • 3+ years of contract and budget management experience
  • 2+ years’ experience with process analysis, development and implementation

Desirable

  • Neustar, Splunk, Nagios, SignalFx experience
  • Willing to travel up to 5%
  • Experience working with and on the AWS cloud