Lead Data Integration Engineer

Engineering Systems Remote, United States


About O’Reilly Media
O’Reilly’s mission is to change the world by sharing the knowledge of innovators. For over 45 years, we’ve inspired companies and individuals to do new things, and do things better, by providing them with the skills and understanding necessary for success.
At the heart of our business is a unique network of experts and innovators who share their knowledge through us. O’Reilly Learning offers exclusive live training, interactive learning, a certification experience, books, videos, and more, making it easier for our customers to develop the expertise they need to get ahead. And our books have been heralded for decades as the definitive place to learn about the technologies that are shaping the future. Everything we do is to help professionals from a variety of fields learn best practices and discover emerging trends that will shape the future of the tech industry.          
Our customers are hungry to build the innovations that propel the world forward. And we help them do just that.
Learn more: https://www.oreilly.com/about/      
At O’Reilly, we believe that true innovation depends on hearing from, and listening to, people with a variety of perspectives. We want our whole organization to recognize, include, and encourage people of all races, ethnicities, genders, ages, abilities, religions, sexual orientations, and professional roles.       
About the Team          
Our data platform team is dedicated to establishing a robust data infrastructure, facilitating easy access to quality, reliable, and timely data for reporting, analytics, and actionable insights. We focus on designing and building a sustainable and scalable data architecture, treating data as a core corporate asset. Our efforts also include process improvement, governance enhancement, and addressing application, functional, and reporting needs. We value teammates who are helpful, respectful, communicate openly, and prioritize the best interests of our users. Operating across various cities and time zones in the US, our team fosters collaboration to deliver work that brings pride and fulfillment.        
About the Job          
We are seeking a skilled and thoughtful Lead Data Integration Engineer to contribute to the design and development of a modern data platform. The ideal candidate will possess a deep understanding of modern data platform concepts and will develop and support data integration strategies that align with the organization's goals. The candidate will work hand in hand with the data architect and lead team members. Responsibilities include overseeing implementation of a data framework covering data integration services such as profiling, ingestion, transformation, quality, and data operations management.
The Lead Data Integration Engineer will be comfortable with building software that interacts with a diverse range of data. Additionally, the Lead Data Integration Engineer will create tools for delivering analytics data within O’Reilly, aiding decision-making, and enhancing product features. These tools encompass RESTful web services, custom analytics dashboards, and data visualization.                            

Our ETL platform primarily uses BigQuery, Pub/Sub, Talend, Python, and PostgreSQL. We develop and support RESTful web applications in Django, and we use Redshift, Hadoop, and Spark for higher-volume data ETL and analysis. Containerization is integral to our approach: we use Docker, Jenkins, and Kubernetes to build, deploy, and manage a diverse range of services. As part of our ongoing initiatives, we are migrating our data platform and services to the GCP cloud environment. The candidate will oversee both legacy and new data platform initiatives.

Salary Range: $155,000-$170,000          
What You'll Do    
The Lead Data Integration Engineer will: 
  • Develop and implement data integration strategies aligned with the organization’s goals and objectives
  • Identify opportunities to streamline data workflows and enhance data quality        
  • Oversee the integration of various data sources and ensure seamless data flow across different systems within the organization     
  • Collaborate with the Architect to enforce ETL best practices and oversee the team's code reviews
  • Collaborate with cross-functional teams, including data engineers, data analysts and business stakeholders, to understand data requirements and deliver integrated solutions that meet business needs        
  • Monitor and optimize data integration processes for performance, scalability and efficiency
  • Lead a team of data integration and data support professionals. Provide guidance, set priorities, mentor and support team members to ensure successful project delivery            
  • Oversee and maintain documentation for data integration processes, including data mappings, transformations and data lineage     
What You'll Have
  • Bachelor’s degree in Computer Science or related field              
  • In lieu of degree, equivalent education and/or experience may be considered
  • 4 years of experience leading teams in data warehousing and/or data engineering
  • Proven experience in data integration, ETL development and data warehousing             
  • Strong technical skills in GCP, Talend, BigQuery             
  • Deep understanding of SQL, Shell scripting, and Python             
  • Experience with Agile software development lifecycle              
  • Experience with Django, Pub/Sub, Hadoop, Spark and Kubernetes             
  • Knowledge of Postgres, Redshift, RabbitMQ, Jenkins and Docker
  • Knowledge of BI tools such as Qlik Sense or Looker
  • Knowledge of data governance principles and regulatory requirements             
  • A high level of comfort with DevOps processes
  • Excellent leadership and communication skills
  • Ability to work effectively in a fast-paced, dynamic environment     
  • Experience with BI tools such as Qlik Sense or Looker
  • Relevant certifications in data integration (e.g., Talend Data Integration) and cloud technologies (e.g., GCP, AWS, Azure) are a plus