Data Operations Engineer

Data, Analytics & Insight
Warsaw, Poland


Description

Position at Choreograph

Choreograph is a new global data company built to help clients realize the value of their first-party data, consult on and implement their data and technology strategies, and advise on privacy-first approaches to navigate the fast-changing data landscape. Choreograph brings together the specialist data units of GroupM and Wunderman Thompson into a single company with global reach, accessible to all WPP clients and companies.

Choreograph’s core belief is that marketers own their first-party data with consumer permission; respect for privacy and the intentional use of data are at the heart of its approach. Guided by this philosophy, Choreograph will continue to create market-leading tools to support clients in the appropriate and responsible application of data in advertising.

WHO WE ARE LOOKING FOR

As a Data Operations Engineer at Choreograph, you will work with a team of developers, product owners, and support engineers to deliver high-quality data to fuel our applications and AI algorithms.

You will be responsible for communicating with stakeholders and data providers, identifying opportunities to automate the validation process and participating in its design, performing manual data transformation during proof-of-concept (PoC) phases of data onboarding, and designing validation processes.

WHAT YOU’LL DO

  • Validating and loading datasets into software systems across multiple environments
  • Creating and maintaining scripts and tools to automate data management
  • Monitoring data consistency and completeness
  • Monitoring dataset onboarding status
  • Creating ad hoc data analyses and reports
  • Supporting development teams
  • Communicating with internal and external data providers
  • Identifying and resolving data issues in collaboration with users, partners, and stakeholders

WHAT YOU’LL NEED

At least 1 year of experience in data manipulation using Python

Skills

  • Fluency in SQL
  • Good knowledge of Python data manipulation tools (e.g. NumPy, Pandas)
  • Understanding of basic statistical concepts
  • Comfortable working with Git
  • Understanding of HTTP APIs

Nice to have

  • Knowledge of PySpark and the Hadoop ecosystem
  • Experience with cloud services such as BigQuery, Google Cloud Storage, or AWS S3
  • Work experience as a researcher