Cloud Operations Incident Commander

Engineering Krakow, Poland


Join us as we pursue our disruptive new vision to make machine data accessible, usable and valuable to everyone. We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we’re committed to our work, to our customers, to having fun and, most importantly, to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!

Role

As a member of the Splunk Cloud Network Operations Center (CNOC) Incident Command Team, you will lead the response to high-profile, customer-impacting incidents as part of a global team of Incident Commanders. This is a senior role at Splunk requiring an individual who can take charge in high-stress situations and give direction to Splunk incident response teams to drive expeditious resolution of incidents. We are looking for a natural leader with proven knowledge of incident management frameworks, a demonstrable understanding of distributed systems environments and the ability to communicate clearly and effectively to technical and business audiences. By successfully leading and facilitating incident triage and remediation activities, this leader should consistently demonstrate effective communication, delegation, and prioritization skills.

Responsibilities

  • Take command of incidents by setting up or taking over a cross-functional technical bridge call composed of internal teams
  • Work with SMEs to interpret key metrics from monitoring tools and facilitate a discussion aimed at building an incident action plan (and a backup plan if appropriate)
  • Ensure that the partners have a deep understanding of the issue, the action plan and the path to resolution
  • Be accountable for the efficient and effective execution of the Incident Management process, and ensure participants understand their roles and responsibilities in the process
  • Set clear incident resolution objectives (exit criteria) and timings.
  • Provide direction and time management to keep the resolution effort on track and moving forward
  • Drive the technical root cause analysis process by assembling the right technical teams and driving the technical remediation plan
  • Operate as part of a 24x7 global team of Incident Commanders and ensure seamless handover of critical issues to other regions
  • Actively participate in and drive incremental improvements to our Incident Runbooks through process creation, tool building and post-incident reviews
  • Ensure internal readiness at all times by leading training sessions, simulations and drills

Requirements

  • Extensive experience in escalations management or technical support for an enterprise software company
  • Strong leadership skills
  • Knowledge of escalations management frameworks (e.g., ITIL)
  • Demonstrable understanding of distributed systems concepts
  • Ability to work multi-functionally and to influence and execute across groups
  • Strong financial and business sense, critical thinking and decision-making abilities
  • Good communication and interpersonal skills, both verbal and written
  • Executive presentation skills.
  • Works well in a dynamic, changing environment and is comfortable with ambiguity
  • Is not afraid of conflict, can influence at all levels and can work in stressful situations

Does this sound like you? We would love to hear from you!

We value diversity at our company. All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, or any other applicable legally protected characteristics in the location in which the candidate is applying.

For job positions in San Francisco, CA, and other locations where required, we will consider for employment qualified applicants with arrest and conviction records.

Thank you for your interest in Splunk!