Big Data Engineer
Description
Your tasks:
- Development and maintenance of distributed systems that process large amounts of data (mostly in real time) for our RTB platform;
- Optimization of the software we develop for efficiency and resource consumption;
- Ensuring the reliability and scalability of the solutions we build;
- Creating performance and correctness tests for new system components;
- Evaluating new technologies for their applicability in production;
- Development of tools for monitoring and analyzing the operation of the production system;
- Continuous optimization of existing tools and processes.
Our tech stack:
- Java, Python;
- Hadoop, Kafka;
- Kafka Streams, Flume, Logstash;
- Docker, Jenkins, Graphite;
- Aerospike, PostgreSQL;
- Google BigQuery, Elasticsearch.
Recent projects:
- Replacing the framework in the data processing component (migrating from Storm to Kafka Streams);
- Creating a data stream merger based on the Kafka Client API;
- Creating a user profile synchronizer between DCs based on Kafka Streams (see the first sketch after this list);
- Creating a component that calculates aggregates based on the Kafka Client API and Bloom filters (see the second sketch after this list);
- Introducing Logstash for loading and Elasticsearch for querying indexed data (migrating from Flume + Solr);
- Creating end-to-end monitoring of data correctness and latency;
- Replacing the component that streams data to BigQuery and HDFS (from Flume to an in-house solution based on the Kafka Client API);
- Continuous system maintenance, detecting and resolving performance problems, and scaling to keep pace with growing data volumes.
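For illustration, here is a minimal sketch of the kind of Kafka Streams topology a stream merger or cross-DC profile synchronizer might use. The topic names, application id, and broker address are hypothetical placeholders, not the actual production setup:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class ProfileSyncSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Application id and bootstrap servers are placeholders.
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "profile-sync");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Merge profile updates replicated from the remote DC with the
        // local update stream; topic names are hypothetical.
        KStream<String, String> local = builder.stream("profiles-local");
        KStream<String, String> remote = builder.stream("profiles-remote");
        local.merge(remote)
             .to("profiles-merged");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

A production synchronizer would additionally need conflict resolution between DCs and non-trivial serialization, which this sketch deliberately omits.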
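And a minimal sketch of how Bloom filters can support aggregate computation, here approximate unique-event counting with Guava's BloomFilter; the class name, capacity, and error rate are illustrative assumptions, not the production component:

```java
import java.nio.charset.StandardCharsets;
import com.google.common.hash.BloomFilter;
import com.google.common.hash.Funnels;

public class UniqueCounterSketch {
    // Expected insertions and false-positive rate are illustrative.
    private final BloomFilter<String> seen = BloomFilter.create(
            Funnels.stringFunnel(StandardCharsets.UTF_8), 10_000_000, 0.01);
    private long uniqueCount = 0;

    // Count an event only the first time its id is seen. The Bloom filter
    // can report rare false positives, so the result is a slight undercount,
    // in exchange for a memory footprint far below an exact set of ids.
    public void observe(String eventId) {
        if (seen.put(eventId)) {  // put() returns true if the filter's bits
            uniqueCount++;        // changed, i.e. the id was likely unseen
        }
    }

    public long uniques() { return uniqueCount; }
}
```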
Our requirements:
- Proficiency in programming;
- Excellent understanding of how complex IT systems work (from the hardware level, through software, up to algorithms);
- Good knowledge of the basic methods of building concurrent programs and distributed systems (from the thread level to the continental level);
- Practical ability to observe, monitor, and analyze the operation of production systems (and to draw valuable conclusions from doing so);
- Ability to critically analyze the solutions you build for efficiency (from estimating the theoretical performance of a design to detecting and removing actual performance problems in production);
- Readiness to work in the DevOps model.
Nice to have:
- Experience in building distributed systems;
- Good knowledge of selected Big Data technologies such as Hadoop, Kafka, Storm, Spark, or Flink;
- Knowledge of application profiling methods and tools (preferably for Java, at both the JVM and Linux levels).
We offer:
- An attractive salary;
- Work in a team of enthusiasts who are eager to share their knowledge and experience;
- Extremely flexible working conditions - no core hours, no holiday limits, and the option to work fully remotely;
- Access to the latest technologies and the opportunity to put them to real use in a large-scale, highly dynamic project;
- The hardware and software you need.