Lead Site Reliability Engineer
At New Relic, we’re building and scaling one of the largest and fastest-growing data platforms in the world. Our systems record billions of new data points per minute and handle an order of magnitude more bytes per second than Twitter. We’re looking for an architect-level engineer with deep DevOps or SRE experience to play a key role in guiding our teams in delivering resilient, scalable, low-toil systems in production.
In this role, you’ll be responsible for mentoring teams in reliability best practices, influencing design decisions, and getting hands-on to tackle key infrastructure, cluster automation, and scaling challenges. You’ll contribute to conversations like how we evolve our growing multi-region footprint, how we manage a customer workload that touches over 300,000,000,000,000 data points per minute, and how we build a culture that manages technical debt without sacrificing velocity.
In addition to technical depth, this role calls for great interpersonal skills, comfort in a fast-paced setting, and the ability to work with diverse stakeholders in an agile, highly collaborative environment.
On the New Relic team, you’ll work with some of the best engineers and tackle some of the most complex problems in the industry. Our massively scaled cloud platform comprises hundreds of services and processes billions of data points every minute, and we invest heavily in our internal tooling and developer infrastructure. Culturally, we value teamwork, personal respect and autonomy, work/life balance, and innovation.
What You’ll Do
- Analyze complex distributed systems to identify scaling bottlenecks, risks, and operational pain points, and influence design decisions to address these issues.
- Mentor teams on reliability, scaling, and operational best practices like gamedays, monitoring and alerting, workload distribution, and fault tolerance.
- Partner with teams to provide hands-on support in diagnosing and resolving complex scaling and resilience issues.
- Communicate roadmaps, pain points, and priorities across technical and non-technical stakeholders to drive alignment and develop execution plans.
- Collaborate with other architects to define a reliability and scaling roadmap, and to influence the direction of our internal runtime platform and tooling.
- Help design and build the software that operates our software.
What You’ll Need
- 7+ years of experience writing software in Go, Java, Python or C
- Familiarity with distributed systems and/or data pipeline design patterns
- Experience working in a SaaS environment and/or with microservices architectures
- Experience with at least some of the following: Kubernetes, Mesos/Marathon, Kafka, and high-scale datastores
- Experience being part of an on-call rotation for production support of critical services
- Familiarity with Site Reliability Engineering best practices
- Experience working with multiple teams that each have a heterogeneous mix of technologies
- A collaborative work style that includes colleagues in important decisions and leads to shared code ownership
- Great communication and interpersonal skills
- Streaming data processing experience
- An abiding intellectual curiosity and passion for solving puzzles
Please note that visa sponsorship is not available for this position.
New Relic (NYSE: NEWR) is the industry’s largest and most comprehensive cloud-based instrumentation platform built to create more perfect software. The world’s best software and DevOps teams rely on New Relic to move faster, make better decisions and create best-in-class digital experiences. If you run software, you need to run New Relic. We’re proudly trusted by more than 50% of the Fortune 100.
Founded in 2008, we’re a global company focused on building a culture where all employees feel a deep sense of belonging, where every ‘Relic’ can bring their whole self to work and feel supported and empowered to thrive. We’re consistently recognized as a distinguished employer and are committed to building world-class products and an award-winning culture. For more information, visit newrelic.com.
Our Hiring Process
In compliance with applicable law, all persons hired will be required to verify identity and eligibility to work and to complete employment eligibility verification. Note: Our stewardship of the data of thousands of customers means that a criminal background check is required to join New Relic.
We will consider qualified applicants with arrest and conviction records based on individual circumstances and in accordance with applicable law including, but not limited to, the San Francisco Fair Chance Ordinance.
Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. New Relic does not accept unsolicited headhunter and agency resumes, and will not pay fees to any third-party agency or company that does not have a signed agreement with New Relic.