Mid+ Data Engineer
Description
Company Overview
Lean Tech is a data-centric organization within the logistics sector, dedicated to establishing and maintaining the highest standards of data integrity, reliability, and consistency. We foster a culture of collaboration, continuous learning, and individual ownership, empowering passionate professionals to drive data excellence. Our approach to Master Data Management is built upon a sophisticated, modern architectural framework, leveraging technologies such as Databricks and AWS RDS (PostgreSQL) to create bespoke solutions. In our dynamic, remote-first environment, we prioritize robust governance processes and disciplined modeling to ensure the quality of our data assets across the enterprise.
Position Overview
We are seeking a Mid+ Data Engineer to join our Data Engineering team and advance our master and reference data capabilities. This pivotal role is responsible for the end-to-end design, development, and implementation of our internal master data platform. You will ensure the reliability, consistency, and integrity of master data across the organization by building robust dimensional data models and optimizing data pipelines within our Databricks lakehouse.
A key aspect of this position is managing the transport and exposure of master data from the lakehouse to a downstream AWS RDS (PostgreSQL) data hub. You will build and manage the API endpoints that enable internal applications to consume this critical data, owning the data exposure logic from start to finish. This is a SQL-first role within an architectural Master Data Management (MDM) framework: you will collaborate closely with cross-functional teams, data stewards, and platform engineering teams, applying best practices and disciplined data modeling to enforce data quality standards.
What You Will Be Doing
- Design, develop, and manage robust dimensional models for master and reference data within the Databricks lakehouse, utilizing SQL notebooks and views to ensure data integrity.
- Build and optimize batch and streaming data pipelines that integrate data into the lakehouse, delivering reliable and efficient data flows for consumption by the master data models.
- Develop and manage the transport of master data from the Databricks lakehouse to the AWS RDS (PostgreSQL) data hub, building and extending REST API endpoints to ensure seamless data consumption by internal applications.
- Implement and enforce data quality standards through rigorous modeling discipline and governance processes, partnering with data stewards to research and resolve data exceptions.
- Conduct comprehensive source system analysis, data profiling, and data mapping to inform data modeling and integration strategies.
- Collaborate with cross-functional teams to develop an in-depth understanding of master data and align on data strategies, processes, policies, and controls.
- Share master data management best practices and provide training and recommendations to data stewards and other stakeholders across the organization.
- Create and maintain comprehensive documentation for master data flows, lineage, and standards to enhance organizational data literacy.
Required Skills & Experience
- A minimum of 4 years of professional experience in data engineering, with at least 2 years specifically focused on master and reference data.
- Advanced expertise in Master Data Management (MDM) principles and best practices, with proven experience designing and implementing robust dimensional data models and architectural MDM solutions.
- Proficiency in a SQL-first development environment, with expert-level SQL skills for data modeling and transformations.
- Hands-on experience with modern cloud data platforms, particularly Databricks (for SQL notebooks, views, and dimensional modeling) and AWS RDS for PostgreSQL.
- Experience building and optimizing robust batch and streaming data pipelines.
- Proficiency in building and extending REST API endpoints backed by a PostgreSQL database to facilitate data synchronization and consumption by internal applications.
- Working knowledge of Python and Spark, utilized selectively for workloads where they are a better fit than SQL.
- Demonstrated ability in source system analysis, data profiling, and data mapping to ensure data integrity.
- Working knowledge of Agile/Scrum methodologies and familiarity with lifecycle management tools such as Jira and Confluence.
- Familiarity with software engineering best practices, including code reviews and CI/CD workflows (e.g., GitHub), and experience collaborating with platform teams on deployments.
- Strong business acumen, analytical skills, and intellectual curiosity, with high attention to detail.
Nice to Have
- Professional experience within the logistics or supply chain industry.
- Experience implementing data quality frameworks and tools, such as dbt tests or Great Expectations.
- A foundational understanding of Generative AI concepts and their application in improving development efficiency, such as using built-in Databricks AI features.
- Hands-on experience with advanced Databricks features, including Unity Catalog for governance or Delta Live Tables (DLT) and Workflows for orchestration.
- Practical experience applying Generative AI for advanced data management tasks, such as PII detection or NLP-based entity resolution.
- Relevant professional certifications, such as Databricks Certified Data Engineer or AWS Certified Database - Specialty.
- Proven experience in mentoring other engineers or training business stakeholders and data stewards on data management best practices.
Soft Skills
- Exceptional Communication and Collaboration: Effectively articulate complex data concepts to diverse stakeholders, leading requirements gathering and collaborating closely with cross-functional teams, including platform and governance teams, to align on data strategies and processes.
- Leadership and Mentorship: Guide the design and development of master data solutions while actively sharing expertise, providing training, and promoting data management best practices to enhance data literacy across the organization.
- Analytical Acumen and Attention to Detail: Employ strong analytical and organizational skills to conduct meticulous source system analysis, data profiling, and mapping, with an intellectual curiosity to identify patterns and ensure the highest standards of data integrity.
- Autonomy and Adaptability: Demonstrate a proactive, self-starting approach to problem-solving, comfortably navigating ambiguity and adapting quickly to changing priorities in a dynamic, fast-paced environment.
- Strong Sense of Ownership: Take full accountability for the end-to-end lifecycle of master data solutions, from initial design and modeling to managing the API endpoints that ensure data synchronization and integrity across systems.
Why You Will Love Working with Lean Tech
Join a powerful tech workforce and help us change the world through technology
- Collaborative work environment
- Professional development opportunities with international customers
- Career paths and mentorship programs that help you reach the next level
Join Lean Tech and contribute to shaping the data landscape within a dynamic and growing organization. Your skills will be honed, and your contributions will play a vital role in our continued success. Lean Tech is an equal opportunity employer. We celebrate diversity and are committed to creating an inclusive environment for all employees.