Lead Site Reliability Engineer
At New Relic, we’re building and scaling one of the largest and fastest-growing data platforms in the world. Our systems record billions of new data points per minute and handle an order of magnitude more bytes per second than Twitter. We’re looking for an architect-level engineer with deep DevOps or SRE experience to lead the SRE and reliability practice across multiple teams in our growing Barcelona office.
In this role, you’ll be responsible for mentoring teams in reliability best practices, influencing design decisions, and getting hands-on to tackle key infrastructure, cluster automation, and scaling challenges. You’ll contribute to conversations like how we evolve our growing multi-region footprint, how we manage a customer workload that touches over 300,000,000,000,000 data points per minute, and how we build a culture that manages technical debt without sacrificing velocity.
In addition to technical depth, this role calls for great interpersonal skills, comfort in a fast-paced setting, and the ability to work with diverse stakeholders in an agile, highly collaborative environment.
On the New Relic team, you’ll work with some of the best engineers and tackle some of the most complex problems in the industry. Our massively scaled cloud platform is composed of hundreds of services and processes billions of data points every minute, and we invest heavily in our internal tooling and developer infrastructure. Culturally, we value teamwork, personal respect and autonomy, work/life balance, and innovation.
What You’ll Do
- Analyze complex distributed systems to identify scaling bottlenecks, risks, and operational pain points, and influence design decisions to address these issues.
- Mentor teams on reliability, scaling, and operational best practices like gamedays, monitoring and alerting, workload distribution, and fault tolerance.
- Partner with teams to provide hands-on support in diagnosing and resolving complex scaling and resilience issues.
- Communicate roadmaps, pain points, and priorities across technical and non-technical stakeholders to drive alignment and develop execution plans.
- Collaborate with other architects to define a reliability and scaling roadmap, and to influence the direction of our internal runtime platform and tooling.
- Help design and build the software that operates our software.
What You’ll Need
- 7+ years of experience writing software in Go, Java, Python or C
- Experience being part of an on-call rotation for production support of critical services
- Familiarity with distributed systems and/or data pipeline design patterns
- Experience working in a SaaS environment and/or with microservices architectures
- Familiarity with Site Reliability Engineering best practices
- Experience working with multiple teams that each have a heterogeneous mix of technologies
- A collaborative work style that includes colleagues in important decisions and leads to shared code ownership
- Great communication and interpersonal skills in both Spanish and English
- Experience with UI architectures and best practices
- Experience with at least some of the following: Kubernetes, Mesos/Marathon, Kafka, and high-scale datastores
- Streaming data processing experience
- An abiding intellectual curiosity and passion for solving puzzles
Please note that visa sponsorship is not available for this position.
This position is in our Barcelona office, which was established in October 2014 with our acquisition of Ducksboard, a privately held startup. We provide challenging work, opportunities to learn, high-quality teammates, a standard-setting product, and a company on the move. We offer:
- Competitive salary.
- Equity compensation plan.
- Performance reviews twice a year.
- Work-life balance and flexible schedule.
- Amazing and fun work environment.
- Private health insurance for you and your family, including dental coverage.
- Retirement fund and life insurance.
- English and Spanish language classes.
- Office located in the center of Barcelona, very close to public transportation.
- We provide ergonomic furniture (chairs, desks) to keep you healthy and comfy.
- Fresh fruits, snacks and beverages.
- We support technical meetups, both local and international.
- We help with relocation.
We are passionate about real-time data visualizations, APIs, intuitive UX, and beautiful design. We have no dogma; we do whatever makes sense to deliver state-of-the-art products.
New Relic (NYSE: NEWR) is the industry’s largest and most comprehensive cloud-based instrumentation platform built to create more perfect software. The world’s best software and DevOps teams rely on New Relic to move faster, make better decisions and create best-in-class digital experiences. If you run software, you need to run New Relic. We’re proudly trusted by more than 50% of the Fortune 100.
Founded in 2008, we’re a global company focused on building a culture where all employees feel a deep sense of belonging, where every ‘Relic’ can bring their whole self to work and feel supported and empowered to thrive. We’re consistently recognized as a distinguished employer and are committed to building world-class products and an award-winning culture. For more information, visit newrelic.com.
Our Hiring Process
In compliance with applicable law, all persons hired will be required to verify identity and eligibility to work and to complete employment eligibility verification.
Headhunters and recruitment agencies may not submit resumes/CVs through this website or directly to managers. New Relic does not accept unsolicited headhunter and agency resumes, and will not pay fees to any third-party agency or company that does not have a signed agreement with New Relic.