Data Engineer

IT Warsaw, Poland


Your role:

  • Developing a solution that draws conclusions from millions of web pages and tens of millions of articles worldwide, in dozens of languages.
  • Analyzing and categorizing the collected content using NLP techniques.
  • Developing and refining new solutions that improve the effectiveness of the tool.
  • Analyzing the quality of collected content, including categorization and detection of incorrectly collected content.
You will fit in well with us if:
  • You are not afraid of Cloud computing and Big Data (Google Cloud Platform, BigQuery).
  • You are ready to quickly acquire new knowledge and skills; you are open to experimenting with new solutions.
  • You enjoy finding optimal means to solve a problem - through flexible use of programming languages, frameworks, tools and solutions prepared from scratch.
  • You know Python well enough for scripting, data flow orchestration (Airflow or similar tools) and data processing.
  • You are comfortable with some analytical libraries / frameworks (Pandas, NumPy); you understand the optimization / performance problems that may occur with larger datasets.
  • You can not only work with data but also present your conclusions to the team.
  • You commit to the tasks entrusted to you and care about the quality of the product.
Sample tasks:
  • Constructing an algorithm to detect incompletely downloaded content (e.g. only the first paragraph, with the rest of the content missing).
  • Detecting, at scale, content that does not contextually match the examined document (e.g. artifacts incorrectly "stuck" to the content, such as advertisements or related content).
We offer:
  • Attractive salary.
  • Work in a team of enthusiasts who are willing to share their knowledge and experience.
  • Extremely flexible working conditions - we do not have core hours, we do not have holiday limits, you can work fully remotely.
  • Access to the latest technologies and the opportunity to put them to real use in a large-scale and highly dynamic project.
  • The hardware and software you need.