Data Engineer (Python, SQL), AdTech

Big Data Development: Kharkiv (Ukraine), Saratov (Russia), St. Petersburg (Russia), Lviv (Ukraine), Kraków (Poland), Belgrade (Serbia), Kyiv (Ukraine)


Description

We’re looking for a Data Engineer to build innovative data pipelines for processing and analyzing the client’s large user datasets (250+ billion events per month). Our client is one of the fastest-growing and most promising AdTech companies in the US. Our Data team is building a Data Warehouse that gathers ad performance data and develops data-related features aimed at improving conversions.

Responsibilities: 

  • Develop ETL (Extract, Transform, Load) data pipelines in Spark, Kinesis, Kafka, and custom Python apps to transfer massive amounts of data (over 20 TB/month) between systems as efficiently as possible (a minimal sketch follows this list)
  • Engineer complex, efficient, and distributed data transformation solutions using Python, Java, Scala, and SQL
  • Productionize Machine Learning models, efficiently utilizing resources in a clustered environment
  • Research, plan, design, develop, document, test, implement, and support proprietary software applications
  • Validate analytical data for accuracy and completeness of reported business metrics
  • Take on, learn, and implement engineering projects outside of your core competency
  • Monitor system performance after implementation and iteratively devise solutions to improve performance and user experience
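
To give a flavor of the batch side of this work, below is a minimal PySpark ETL sketch. It is illustrative only: the bucket, paths, and column names (event_type, event_ts, campaign_id) are hypothetical placeholders, not the client’s actual schema.

# Minimal batch ETL sketch: read raw ad events, aggregate, write columnar output.
# All paths, column names, and the S3 bucket are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ad-events-etl").getOrCreate()

# Extract: raw JSON events landed by an upstream collector
events = spark.read.json("s3://example-bucket/raw/ad_events/2024-01-01/")

# Transform: keep valid events and roll up impressions/clicks per campaign
daily = (
    events
    .filter(F.col("event_type").isin("impression", "click"))
    .groupBy("campaign_id", F.to_date("event_ts").alias("event_date"))
    .agg(
        F.count(F.when(F.col("event_type") == "impression", 1)).alias("impressions"),
        F.count(F.when(F.col("event_type") == "click", 1)).alias("clicks"),
    )
    .withColumn("ctr", F.col("clicks") / F.col("impressions"))
)

# Load: write partitioned Parquet for downstream warehouse ingestion
daily.write.mode("overwrite").partitionBy("event_date").parquet(
    "s3://example-bucket/warehouse/campaign_daily/"
)

Partitioning the output by event_date is one way to keep downstream warehouse loads incremental rather than full rewrites.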

Requirements: 

  • 3+ years of experience developing in Python to transform large datasets on distributed and clustered infrastructure
  • 3+ years of experience engineering ETL data pipelines for Big Data systems
  • Proficiency in SQL, including experience performing data transformations and data analysis (a sketch follows this list)
  • Comfortable juggling multiple technologies and high-priority tasks
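
As a rough illustration of the SQL expectation, here is a hedged sketch of a completeness check run through Spark SQL. The table and column names are hypothetical, and both tables are assumed to have been registered as views beforehand.

# Hedged sketch of a typical SQL validation query, executed via Spark SQL.
# Assumes raw_ad_events and campaign_daily are already registered as views;
# all table and column names are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-validation").getOrCreate()

# Validate completeness: compare raw event counts against warehouse rollups
report = spark.sql("""
    SELECT r.event_date,
           r.raw_events,
           w.loaded_events,
           r.raw_events - w.loaded_events AS missing_events
    FROM (SELECT to_date(event_ts) AS event_date, COUNT(*) AS raw_events
          FROM raw_ad_events GROUP BY to_date(event_ts)) r
    LEFT JOIN (SELECT event_date, SUM(impressions + clicks) AS loaded_events
               FROM campaign_daily GROUP BY event_date) w
    ON r.event_date = w.event_date
""")
report.show()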

Nice to have:
  • BS or higher degree in Computer Science, Engineering, or a related field
  • 5+ years of Object-Oriented Programming experience in languages such as Java, Scala, or C++
  • Prior experience designing and building ETL infrastructure involving streaming systems such as Kafka, Spark, or AWS Kinesis (a minimal streaming sketch follows this list)
  • Experience implementing clustered/distributed/multi-threaded infrastructure to support Machine Learning processing on Spark or SageMaker
  • Experience with distributed columnar databases such as Vertica, Greenplum, Redshift, or Snowflake
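
To illustrate the streaming side, here is a minimal Spark Structured Streaming sketch that consumes events from Kafka and lands them as Parquet. The broker address, topic, and paths are hypothetical, and running it requires the spark-sql-kafka connector package on the classpath.

# Minimal Spark Structured Streaming sketch: consume ad events from Kafka.
# Broker address, topic name, and paths are hypothetical placeholders.
# Requires the spark-sql-kafka-0-10 connector package to be available.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("ad-events-stream").getOrCreate()

stream = (
    spark.readStream.format("kafka")
    .option("kafka.bootstrap.servers", "broker1:9092")
    .option("subscribe", "ad_events")
    .load()
)

# Kafka delivers key/value as binary; decode the value payload to a string
decoded = stream.select(F.col("value").cast("string").alias("raw_json"))

# Sink: append micro-batches to Parquet, with checkpointing for recovery
query = (
    decoded.writeStream.format("parquet")
    .option("path", "s3://example-bucket/stream/ad_events/")
    .option("checkpointLocation", "s3://example-bucket/checkpoints/ad_events/")
    .outputMode("append")
    .start()
)
query.awaitTermination()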

We offer:

  • Opportunity to work on bleeding-edge projects
  • Work with a highly motivated and dedicated team
  • Competitive salary
  • Flexible schedule
  • Medical insurance
  • Benefits program
  • Corporate social events

About us:

Grid Dynamics is an engineering services company known for transformative, mission-critical cloud solutions for the retail, finance, and technology sectors. We architected some of the busiest e-commerce services on the Internet and have never had an outage during peak season. Founded in 2006 and headquartered in San Ramon, California, with offices throughout the US and Eastern Europe, we focus on big data analytics, scalable omnichannel services, DevOps, and cloud enablement.