Manager, Data Reliability Engineering
The Data Reliability Engineering team for Disney Streaming Services (DSS), a segment of Disney Media & Entertainment Distribution (DMED), is responsible for maintaining and improving the reliability of DSS’ big data platform, which processes hundreds of terabytes of data and billions of events daily. We are looking for a Data Reliability Engineering Manager to help us in our ongoing mission of delivering outstanding service to our users and making DSS more data-driven. You will help lead the expansion of a high-performance reliability engineering team that is growing in scope and responsibilities. Are you a leader who is passionate about reliability engineering and software delivery excellence? If so, we believe this is the role for you.
WHAT YOU'LL DO
- Lead a team of reliability engineers focused on end-to-end reliability improvements, software delivery excellence, and operational automation.
- Set team timelines and priorities for both short- and long-term objectives, and help build out the vision for reliability engineering in a complex, high-impact data organization.
- Implement processes to plan service capacity, design business continuity and disaster recovery plans, and work with partner engineering teams on their implementation.
- Help DSS to meet and maintain legal compliance by implementing SOX-compliant change management and software delivery processes and procedures.
- Own the incident management process to ensure that incidents are resolved quickly and that root cause analysis is performed and understood to prevent repeat occurrences.
WHAT TO BRING
- 2+ years of relevant management experience.
- 5+ years of hands-on experience in reliability engineering for high-performance, scalable, distributed data systems, with a focus on automation.
- Exposure to cloud-native infrastructure and best practices.
- Detailed problem-solving approach, coupled with a strong sense of drive and ownership.
- A strong bias for action and a passion for delivering high-quality data and software delivery solutions.
- Experience working in a Linux environment and proficiency with cloud technologies (AWS).
- Experience coding in one or more of the following programming languages: Python, Java, or Scala.
- Experience with automation and orchestration of scalable infrastructure.
- Deep understanding of CI/CD principles, version control systems, and industry best practices.
- Attention to detail and quality, with excellent problem-solving and interpersonal skills.
- BS/MS in Computer Science, Information Management or related field.