DevOps Lead

Production Technology, Core Services | Mumbai, India | Chennai, India | Bengaluru, Karnataka


Position at DNEG

DevOps Lead

Core Services is looking for a Staff Software Developer to join our Product Reliability and Operations domain as its lead. The successful candidate will build and lead a team responsible for developing software solutions that help achieve operational excellence.

The Core Services group provides foundational technologies to other technology groups within DNEG. We are a team of software developers who architect, build and deliver geographically spread, enterprise-level, mission-critical infrastructure. We create services, frameworks, and products that other teams rely upon to build, deliver and operate their technology solutions. We are the core of all things technology at DNEG.

The Product Reliability and Operations domain within Core Services is responsible for creating software libraries and frameworks that enable other teams to develop observable software products. We are also responsible for the monitoring and operational support of those products. We define software development methodologies and rules geared towards reducing operational problems, and we provide the tools to develop software that complies with those rules. DNEG is currently transitioning to a DevOps culture, and we are the ones architecting, designing, and building the software infrastructure necessary to realize that vision.

The Product Reliability and Operations domain is at a very early stage. We are seeking a Staff Software Developer with extensive experience in designing, building, and deploying enterprise-grade software infrastructure for automation, monitoring, and logging, as well as in setting up and running manual operational support teams. In collaboration with experts in DNEG's proprietary systems, the successful candidate will guide the design, implementation, and maintenance of the company's reliability and operations set-up as it grows into vital, mission-critical infrastructure.

You will build a team to:

        Devise and implement:

                Libraries and frameworks that help make software observable in a manner suitable to DNEG's requirements.

                Standard workflows for software developers and operations engineers.

                24/7 global Ops strategy.

        Engage with software development stakeholders from various departments to establish reliability and operations requirements.

        Develop automation tools for alerting.

        Report on the amount of toil handled and identify systems with potential technical debt.

        Collaborate with the IT Systems department to plan and administer the use of automation, monitoring, and logging servers, systems, etc. that relate to the reliability and operations requirements.

        Provide best-practice advice and direct consultancy for other technology teams regarding the use of the reliability and operations platform.


You will also:

        Determine the necessary global team structure, skills, capacity, roster, etc. in order to meet reliability and operations requirements.

        Plan and execute the initial creation and subsequent expansion of the team.

        Appropriately communicate high-level strategy, requirements, and technical details to various teams.

        In collaboration with assigned project managers, arrange for and facilitate the refinement, prioritization, and assignment of team members’ tasks.

        Act as the escalation point for issues arising related to the reliability and operations platform.

        Motivate, encourage and mentor team members.

        In collaboration with the Core Services Manager, provide day-to-day leadership and support, regular feedback, training, etc. to team members.

Must Have

        10+ years of non-intern experience in professional software development.

        5+ years of experience in designing and implementing enterprise-scale reliability and operations systems, both automated and manual.

        3+ years of experience in a team leadership/mentorship role.

        Proven expertise in the design, implementation, maintenance, and operation of systems involving the ELK stack, Prometheus, Graphite, Grafana, etc.

        Expertise in using Ansible, Puppet, Chef or SaltStack.

        Expertise in using Jenkins.

        Experience in setting up Sensu.

        Experience in analyzing logs and identifying sources of errors.

        Expertise in software development processes including coding standards, version control, automated testing, and CI/CD practices.

        Excellent communication skills, including effective handling of complex, long-distance technical discussions.

        Experience organizing complex development and support workloads.

        Experience using issue-tracking software such as Atlassian Jira.

        Eagerness to tackle problems head-on.

        Bachelor’s or Master’s degree in Computer Science.

Nice to Have

        Certifications in DevOps practices.

        Experience with, or certification in, Red Hat's OpenShift platform.

        Experience in the support of service-oriented software.

        Experience in supporting engineers and their managers in the uptake of new technologies.

        Agile development experience using the Scrum framework.



About Us
We are DNEG, one of the world’s leading visual effects and animation companies for the creation of award-winning feature film, television, and multiplatform content. We employ more than 10,000 people in offices and studios across North America (Los Angeles, Montréal, Toronto, Vancouver), Europe (London), Asia (Bangalore, Mohali, Chennai, Mumbai) and Australia (Sydney).