Site Reliability Engineer

IT Warsaw, Poland Cracow, Poland


Description

What do we do?

  • We select, evaluate and personalize advertisements. One million times per second, with a latency below 50 ms.
Your tasks:
  • Ensuring that all our services are available 24/7.
  • Automation of routine operations (e.g. deploying new machines / scaling services).
  • Detailed monitoring of all production systems.
  • Detecting (and fixing!) problems before they cause big trouble.
  • Minimizing the duration and consequences of failures.
  • Carrying out non-standard operations (e.g. data migration, implementation of a new solution) quickly and efficiently, but in a controlled way, without affecting the operation of the system, the work of developers, etc.
  • Searching for the best possible solutions and technologies to achieve the above-mentioned goals.
Our expectations:
  • You perfectly understand how computers work (at every level: from hardware, through system, to software).
  • You can observe, monitor and analyze the performance of production systems (and draw valuable conclusions from these observations).
  • You know various tools that let you dig into the guts of systems (e.g. top, strace and VisualVM).
  • You really know (and are not merely familiar with) a lot of technologies, and you are not afraid to learn new ones.
  • You can quickly understand how the components we use work (e.g. how exactly does Aerospike arrange data on disk?).
  • You feel comfortable with distributed systems (not only within a LAN, but also spread around the world).
  • You know perfectly well what the theoretical performance of the equipment is ...
  • ... and you can explain why we do not (yet!) observe it in practice.
  • You are interested in both the lowest layers of the technology stack (hardware, network, OS ...) and the business role of individual components.
  • You are fluent in the code of the systems you take care of. You can easily find something there, check it and modify it if necessary.
  • Not only are you able to efficiently extinguish a fire, but you can also identify its fundamental cause and propose a way to prevent similar events in the future.
  • You can automate your work with Python and Bash. You are not afraid to look at any code.
  • You comprehensively cover the topics you are working on. 
  • You understand that from proof-of-concept to production-grade software there is sometimes some work to be done.
  • You solve real problems instead of endlessly analyzing completely irrelevant issues.
Selected technologies:
  • Java, Python...
  • Aerospike, Memcached...
  • HAProxy
  • Jenkins
  • Grafana, Graphite, ClickHouse
  • PostgreSQL
  • Kafka
  • Kibana, Elasticsearch...
  • Jetty
  • Docker
  • Google Cloud…
We offer:
  • A very attractive salary.
  • Interesting projects and technologies (https://github.com/RTBHOUSE).
  • A challenging and constantly growing scale (currently: 4 DCs / several dozen equipment cabinets).
  • Access to the latest technologies and the opportunity to actually use them in a large-scale and highly dynamic project.
  • Work in a team of enthusiasts who are willing to share their knowledge and experience.
  • Collaboration with people who understand technology (no PMs, analysts or managers).
  • Extremely flexible working conditions - we do not have core hours, we do not have holiday limits, you can work fully remotely.
  • The hardware and software you need.