Senior Data Engineer
Description
Title: Senior Data Engineer
Department: IT Data
Open Lending is seeking an experienced Senior Data Engineer to drive the continued evolution of our modern data platform, built on Databricks, Unity Catalog, and Lakehouse architecture, while also supporting the ongoing development of our data warehouse. As a Senior Data Engineer, you will build scalable, secure, and high-performance data pipelines that support analytics, business intelligence, and machine learning operations. As part of our modernization initiatives, you will design and deliver end-to-end data solutions with an emphasis on Databricks Workflows, PySpark-based processing, and Spark Structured Streaming pipelines. In this role, you will be engaged in all phases of the data management lifecycle, including requirements gathering, data processing, storage, and security, as well as communication and documentation. The ideal candidate is passionate about data engineering, data quality, governance, and automation, and brings experience in ML Ops or a strong desire to expand into it.
Responsibilities:
- Design, build, and maintain scalable data ingestion and transformation pipelines using Databricks (PySpark, Delta Lake, Workflows) and Azure Data Factory.
- Implement Lakehouse architecture best practices with Delta Lake for performance, reliability, and ACID compliance.
- Utilize Unity Catalog for centralized data governance, lineage tracking, and fine-grained access controls.
- Develop and maintain data validation frameworks within Databricks to ensure accuracy, completeness, and compliance across all data domains.
- Build both batch and streaming data pipelines using Spark Structured Streaming, Kafka, or Azure Event Hubs.
- Drive performance tuning, cost optimization, and observability of data pipelines and compute clusters.
- Improve resiliency and maintainability through testing, alerting, and automation with Databricks Workflows and Lakehouse Monitoring.
- Partner with architects and data consumers to deliver curated, high-quality datasets optimized for downstream analytics and machine learning.
- Champion CI/CD and DevOps practices for data pipelines, leveraging Git-based workflows and automation within Databricks and GitHub Actions.
- Identify and implement process improvements to enhance speed of delivery while maintaining governance and compliance.
- Enable downstream analytics and reporting platforms such as Power BI through reliable, governed data delivery.
- Support MLOps processes for model deployment, feature management, and pipeline integration within Databricks.
Education & Experience:
- Bachelor’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in Data Engineering, ETL, or Data Warehousing.
- 3+ years of hands-on experience with Azure Databricks, including PySpark, Delta Lake, Unity Catalog, and Databricks Workflows.
- 2+ years designing and maintaining data pipelines using Azure services such as Data Factory, Event Hubs, Key Vault, and Logic Apps.
- Expertise in Python/PySpark for large-scale transformations, orchestration, and automation.
- Strong knowledge of distributed data systems and Lakehouse design principles.
- Experience with streaming architectures (Spark Structured Streaming, Kafka, Event Hubs).
- Proficiency with SQL and familiarity with document databases.
- Working knowledge of Unity Catalog governance, lineage, and security controls.
- Familiarity with MLOps frameworks (MLflow, feature stores) or an eagerness to expand into automated ML lifecycle management.
- Experience using Git and Azure DevOps for CI/CD automation.
- Excellent collaboration, communication, and problem-solving skills.
Skillsets Used:
Azure Databricks (PySpark, Delta Lake, Unity Catalog, Workflows, MLflow, feature stores), ADLS Gen2 Storage, Microsoft SQL Server, Kafka/Event Hubs, Spark SQL, Python, Azure Key Vault, GitHub, CI/CD, Infrastructure-as-Code (Databricks Asset Bundles).
Physical Requirements:
- Prolonged periods sitting at a desk and working on a computer.
- Must be able to lift 15 pounds at times.