Cloud Operations Incident Commander

Engineering Sydney, Australia


Join us as we pursue our disruptive new vision to make machine data accessible, usable and valuable to everyone. 

We are a company filled with people who are passionate about our product and seek to deliver the best experience for our customers. At Splunk, we’re committed to our work, customers, having fun and most importantly to each other’s success. Learn more about Splunk careers and how you can become a part of our journey!

Diversity matters: we harness our unique strengths to deliver the best outcomes. We work closely with several different parts of the organization to learn from and educate each other, then we build the right solution for all parties; transparency is at our core. We have fun doing phenomenal work with great people. Come and become one of us.

Want to work in a dynamic environment with the latest cloud technologies? Want to learn Splunk from the inside and grow your career in exciting ways? Splunk Cloud is looking for self-starting individuals to be a part of the Splunk>Cloud Network Operations Center (CNOC). Splunk CNOC manages incidents that affect the availability and performance of the Splunk>Cloud service for our customers globally. The Splunk CNOC is an always-on / always-active team making sure that each of our customers has an outstanding experience.

We’re looking for an Incident Commander to join our team in supporting and supervising our ever-expanding Cloud platform.

Responsibilities:

  • Use the Splunk Incident Management System (SIMS) to restore normal service operations as quickly as possible to minimize impact to Customer business operations
  • Proactively respond to drops in microservice levels, and escalate as needed
  • Assemble the response team, which includes the incident owner, problem owner, and other professionals in the specified area of expertise
  • Establish accurate expectations from response procedures to ensure customer satisfaction throughout the process
  • Supervise, manage, and guide peers throughout incidents to ensure accurate information is captured
  • Assemble and lead conference calls for diagnosis and remediation of customer impacting outages
  • Craft clear and concise Problem Statements, Status Reports, and Final Summaries that Engineers and Executives can easily understand
  • Coordinate Unified Command for Multi-Bridge and breakout-room Incidents
  • Attend Customer Calls when needed
  • Carry out Incident Commander responsibilities, run post-incident reviews, and assign and follow through on action plans
  • Keep vendors on track for deliverables to Splunk in third-party incidents
  • Write Customer-facing communications in partnership with Customer Success
  • Develop positive, strong, and collaborative relationships with multiple cross-functional partners across Splunk to improve the team's efficiency and ability to deliver on sophisticated tasks that have broad impact
  • Lead process improvements and improve operational efficiency
  • Participate in team meetings and discussions, lead trainings, and give and seek out feedback

Qualifications:

  • You have 4+ years of major incident response and management experience
  • You have knowledge of incident management frameworks (e.g., ITIL)
  • You understand multi-functional teams and are able to speak across organizations to drive influence
  • You enjoy problem solving and analyzing global-scale distributed systems
  • You have outstanding interpersonal and communication skills
  • You don’t shy away from conflict, can influence at most levels, and can work in stressful situations
  • You have a strong attention to detail
  • You can participate in on-call rotations for some business use cases

Thank you for your interest in Splunk!