Data Engineer

Data Services Durbanville, South Africa


Description

 

This role will suit an all-round Data Engineer experienced in every step of the data flow, from configuring data sources to integrating analytical tools. The Data Engineer is a technical person who will contribute to architecting, building, testing, and maintaining the data platform as a whole.

 

Mission

Deliver accurate data reliably within the required processing time, ready for processing by analytics applications.

 

You will work across multiple business teams to implement new systems and improve existing ones for:

  1. Extraction: Pulling data from current sources.
  2. Storage and transfer: Storing and moving all data gathered for analytical purposes.
  3. Transformation: Cleaning, structuring, and formatting data sets so that they are ready for processing or analysis.

Responsibilities:

  1. Lead/contribute to designing the architecture of a data platform.
  2. Develop data system tools; customize and manage integration tools, databases, warehouses, and analytical systems.
  3. Maintain and test data pipelines.
  4. Deploy machine learning models designed by our data scientists into production environments, manage computing resources, and set up monitoring tools.
  5. Manage data and metadata storage, structured for efficiency, quality, and performance.
  6. Track pipeline stability. Monitor the overall performance and stability of the systems.
  7. Keep track of related infrastructure costs and manage them as efficiently as possible, continuously balancing performance against cost.

 

Important characteristics:

 

  • Dynamic, driven by results. Enjoy getting things done.
  • A proactive, results-oriented, can-do attitude is highly desirable.
  • Self-starter with the ability to successfully plan, organize, and execute assigned initiatives with minimal guidance and direction.
  • Strong listening, problem-solving and analytical skills.
  • Ability to conduct systems analysis and prepare requirement specifications concerning data-related business processes and systems.
  • Willing to take the initiative.
  • Excellent written and verbal communication skills.
  • Exhibits close attention to detail and is a champion for accuracy.
  • High level of integrity.
  • Proven ability to meet deadlines and to multitask effectively.
  • The ability and willingness to do a variety of different tasks.

 

Preferred qualifications and experience:

  • A Bachelor's degree in computer science or an engineering-related field.
  • 5-10 years of relevant experience.
  • Intermediate to advanced skills in SQL optimization and in developing ETL strategies.
  • Intermediate to advanced knowledge of database and data warehousing principles (e.g. OLAP, data marts, star schema, lambda/kappa architectures).
  • Knowledge of streaming data pipeline frameworks or solutions (e.g. Apache Flink, Beam, Dataflow, Databricks) is an advantage.
  • Knowledge of cloud data platforms is an advantage.
  • Experience in the automotive or manufacturing industry is a plus.
  • Experience with Agile development methodologies such as Kanban or Scrum.

 

Preferred technical experience:

  • Implementing data pipelines using cloud infrastructure and services.
  • Implementing event-based systems using tools like Confluent Kafka or Kinesis.
  • Knowledge of CDC (change data capture) pipelines, e.g. binlogs, Debezium, AWS Database Migration Service (DMS).
  • API integration and development using Python (FastAPI, Flask).
  • DevOps: Docker (ECS), Kubernetes (EKS), Spark clusters, etc.
  • Database analysis, design & administration.
  • SQL query optimization and data architecture improvements.
  • DB: PostgreSQL, MS-SQL
  • ETL: Python, Bash
  • Infrastructure: AWS (Kinesis, API Gateway, S3, DMS)
  • Dev Tools: Git, Docker, Elastic Container Service (ECS), Elastic Kubernetes Service (EKS)
  • OS: Ubuntu, Windows Server