At Nutanix, we’re passionate about building software and delivering highly available Enterprise Cloud services. The Nutanix suite of services and Enterprise Cloud Platform blends white-glove support, web-scale engineering, and consumer-grade design to natively converge server, storage, virtualization, and networking into a resilient, software-defined solution with rich machine intelligence for customers. Today, our industry-leading Enterprise Cloud Platform is enabling thousands of companies to run their apps and workloads with unparalleled performance in whichever cloud makes sense (private, public, or edge), knowing it’s working seamlessly with minimal intervention. Learn more at www.nutanix.com or follow us on Twitter @nutanix.
Founded in 2009, with our first product sold in 2011, Nutanix is now publicly listed (NASDAQ: NTNX) with FY2019 revenues of over $1.2 billion. Nutanix has more than 5,300 employees in over 50 countries and is growing rapidly.
At Nutanix, we believe that our Cloud Engineers are the beating heart and soul of our superb service levels and our industry-leading customer satisfaction, and are core to our continued growth. We empower our Cloud Engineers to deliver and manage our services with high availability and stellar performance levels at cloud speed and at cloud scale! As we rapidly extend our customer base with an expanding suite of cloud services, we are currently seeking high-energy, experienced Reliability Engineers.
Specifically, we are searching for someone who has enthusiasm for cloud services, brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to develop real-world solutions and positive customer experiences. We want individuals who constantly look for ways to improve services, design solutions, and automate responses to events.
- Run the delivery of services via the production environment through effective monitoring and by taking a holistic end-to-end perspective of system health across the global live site environment.
- Drive efficiency and take ownership of the end-to-end workflow, response time, relief time, and long-term resolution for each incident that impacts, degrades, or otherwise affects our customers or the underlying infrastructure or application.
- Initiate and lead cross-team technical and troubleshooting bridges for complex and impacting incidents to drive immediate customer resolution.
- Lead and drive forensic investigations and root-cause eradication. Identify short- and long-term action items to mitigate and eliminate faults.
- Build software and systems that manage platform infrastructure and applications.
- Improve reliability, quality, and time-to-market across our suite of software solutions.
- Measure service performance and build analytics with an eye toward pushing our capabilities forward, getting ahead of customer needs, and innovating to continually improve.
- Provide primary operational oversight and direction for multiple large distributed software applications across internationally located datacenters hosting Nutanix infrastructure.
- Analyze and respond to alerts in real time, promote (or create automation rules to promote) alerts to incidents, and drive immediate relief for high-priority issues.
- As lead Crisis Manager, drive and lead Major Incident bridges to quickly resolve complex, high-impact, or highly visible incidents.
- Author and edit knowledge base articles for frequent symptoms and alerts. Automate common response actions.
- Regularly review, tune, and regulate alerts from disparate systems. Author event rules to build hierarchies, correlate across configuration items, and reduce noise to ensure fidelity of signal and, in turn, service levels.
- Gather and analyze metrics from both operating systems and applications to assist in performance tuning and fault finding.
- Collaborate across Engineering and Development teams for Security, Disaster Recovery, Virtual Desktop, Desktop as a Service, Hybrid Cloud and Distributed Storage requirements and buildout as we add and extend services.
- Partner with development teams to improve services through rigorous testing and release procedures
- Participate in system design consulting, platform management, and capacity planning
- Create sustainable systems and services through automation and orchestration
- Balance feature development speed and reliability with well-defined service level objectives
- Act as gatekeeper to ensure rigor, such that no planned changes are permitted during a service-impacting event on a common configuration item.
- Work in a 24x7 environment, which includes weekday and weekend shift rotations in each geo.
Required Skills and Qualifications
- A proactive approach to spotting problems, areas for improvement, and performance bottlenecks
- Familiarity with core internet services and infrastructure (DNS, FTP, SMTP, TCP/UDP, database technologies, CDN, hypervisor, storage, VPN, and application servers).
- Experience with distributed storage technologies such as NFS, HDFS, Ceph, and S3, as well as dynamic resource management frameworks (Mesos, Kubernetes, YARN)
- Experience with cloud-based incident, change, and problem management processes, and familiarity with operating at cloud speed.
- Direct experience with, and enthusiasm for, working in an interrupt-driven environment.
- Bachelor's degree in Computer Science or Information Technology; other technical or scientific disciplines considered.
- Bachelor's or master's degree in Engineering
- Previous success in engineering and service support
- Previous experience with monitoring and management tools such as Prometheus, ELK, and Grafana
- Coding experience beyond simple scripts
- Previous experience in large-scale or hyper-growth cloud environments
Nutanix is an equal opportunity employer.
The Equal Employment Opportunity Policy is to provide fair and equal employment opportunity for all associates and job applicants regardless of race, color, religion, national origin, gender, sexual orientation, age, marital status, or disability. Nutanix hires and promotes individuals solely on the basis of their qualifications for the job to be filled.
Nutanix believes that associates should be provided with a working environment that enables each associate to be productive and to work to the best of his or her ability. We do not condone or tolerate an atmosphere of intimidation or harassment based on race, color, religion, national origin, gender, sexual orientation, age, marital status or disability.
We expect and require the cooperation of all associates in maintaining an atmosphere free of discrimination and harassment.