Data Engineer

IT, Warsaw, Poland


Description

We are looking for a Data Engineer to help build and expand our RTB platform.


Your role:

  • Developing a solution that draws conclusions from millions of web pages and the content of tens of millions of articles in dozens of languages from around the world
  • Analyzing and categorizing the collected content using NLP techniques
  • Developing and refining new solutions that improve the effectiveness of the tool
  • Quality analysis of collected data - including categorization and detection of incorrectly collected content

You will fit in perfectly with us if:
  • You are not afraid of cloud computing and Big Data (Google Cloud Platform, BigQuery)
  • You are ready to quickly acquire new knowledge and skills; you are open to experimenting with new solutions
  • You enjoy finding the best way to solve a problem, whether through flexible use of programming languages, frameworks and tools or through solutions built from scratch
  • You know Python well enough for scripting, data flow orchestration (Airflow or similar tools) and data processing
  • You are comfortable with analytical libraries/frameworks (Pandas, NumPy) and understand the optimization and performance problems that can arise with larger datasets
  • You can not only work with data but also present your conclusions to the team
  • You are committed to the tasks entrusted to you and care about the quality of the product you build

Sample tasks:
  • Construct an algorithm to detect incompletely downloaded content (e.g. only the first paragraph was downloaded, without the rest of the article)
  • Detect, at scale, content that does not contextually match the examined document (e.g. artifacts incorrectly "stuck" to the content, such as advertisements or related content)
  • Propose and design optimizations to the data pipeline that reduce end-to-end processing time

We offer:
  • Attractive salary
  • Work in a team of enthusiasts who are willing to share their knowledge and experience
  • High degree of autonomy
  • Extremely flexible working conditions - we do not have core hours, we do not have holiday limits, you can work fully remotely
  • Access to the latest technologies and the opportunity to use them in practice in a large-scale, highly dynamic project
  • The hardware and software you need

Do you have questions about the project, team or work style? Visit our tech blog: http://techblog.rtbhouse.com/jobs/