CNOC Incident Manager

Engineering San Francisco, California


It is the responsibility of the Splunk CNOC to monitor and resolve issues that affect the availability and performance of Splunk for our cloud customers. As the authority on our customers’ experience, the CNOC is the front line of defense in making sure each of our customers has an extraordinary experience.

We’re looking for an experienced Incident Manager to join our team in supporting and monitoring our ever-expanding Cloud platform.

 

Responsibilities:

  • Provide 24/7 support & incident management.
  • Use the Splunk Incident Management System (SIMS) to restore normal service operations as quickly as possible to minimize the impact to business operations.
  • Assemble the incident management team, which includes the incident owner, the problem owner, and other professionals in the specified area of expertise.
  • Establish accurate expectations from the escalation procedures to ensure customer satisfaction throughout the escalation process.
  • Track and manage incidents fully to ensure proper information is captured.
  • Create and run conference calls for diagnosis and remediation of incidents.
  • Fully document and communicate incidents for shift-handoff reports.
  • Participate in Post Incident Reviews and make suggestions to prevent incident recurrence.
  • Build effective working relationships with cross-functional team members.
  • Make suggestions for incident process improvements and enhanced operational efficiencies.
  • During Business as Usual, assist in alert management and runbook automation.

 

Qualifications:

  • You have 2+ years of experience in Systems Administration in a cloud environment.
  • You have 1+ years in incident response and major incident management.
  • You have Unix/Linux systems experience with scripting experience in Shell, Perl or Python.
  • You are able to think outside the box and work on multiple tasks simultaneously while dynamically adjusting priorities based on changing conditions.
  • US Citizenship required.
  • You enjoy problem solving and analyzing global-scale distributed systems.
  • You are team-oriented with exceptional interpersonal and communication skills.
  • You are able to remain calm and collected in stressful situations, such as a major service outage.
  • You have excellent problem-solving skills with a strong attention to detail.
  • You are willing to work weekends, overnight shifts and holidays with flexibility to work additional shifts on short notice.

Thank you for your interest in Splunk!