Site Reliability Engineer
We are looking for Site Reliability Engineers to join our RTB platform development team.
What do we do? We select, evaluate and personalize advertisements: one million times per second, with latency below 50 ms.
- Ensuring that all our services are available 24/7
- Automation of routine operations (e.g. deploying new machines / scaling services)
- Detailed monitoring of all production systems
- Detecting (and fixing!) problems before they cause serious trouble
- Minimizing the duration and consequences of failures
- Carrying out non-standard operations (e.g. data migration, rollout of a new solution) quickly and efficiently, but in a controlled way, without affecting the operation of the system, the work of developers, etc.
- Searching for the best possible solutions and technologies to achieve the above-mentioned goals
- You thoroughly understand how computers work (at every level: from hardware, through the operating system, to software)
- You can observe, monitor and analyze the performance of production systems (and draw valuable conclusions from these observations)
- You know various low-level diagnostic tools that let you dig into the guts of systems (e.g. top, strace, VisualVM)
- You really know (and are not merely familiar with) a lot of technologies, and you are not afraid to learn new ones
- You can quickly understand how the components we use work (e.g. how exactly Aerospike arranges data on disk)
- You feel comfortable with distributed systems (not only within a LAN, but also spread around the world)
- You know perfectly well what the theoretical performance of the equipment is...
- ...and you can explain why we do not (yet!) observe it in practice
- You are interested in both the lowest layers of the technology stack (hardware, network, OS) and the business role of individual components
- You are fluent in the code of the systems you take care of; you can easily find your way around it, check things and modify them if necessary
- Not only can you efficiently put out a fire, but you can also identify its root cause and propose a way to prevent similar events in the future
- You can automate your work with Python and Bash; you are not afraid to look at any code
- You comprehensively cover the topics you are working on
- You understand that from proof-of-concept to production-grade software there is sometimes some work to be done
- You solve real problems instead of endlessly analyzing completely irrelevant issues
- Java, Python
- Aerospike, Memcached
- Grafana, Graphite, ClickHouse
- Kibana, Elasticsearch
- Google Cloud
- A very attractive salary
- Interesting project and technologies (https://github.com/RTBHOUSE)
- A challenging and constantly growing scale (currently: 4 DCs / several dozen equipment cabinets)
- Access to the latest technologies and the chance to actually use them in a large-scale, highly dynamic project
- Work in a team of enthusiasts who are willing to share their knowledge and experience
- Collaboration with people who understand technology (no PMs, analysts or managers)
- Extremely flexible working conditions - no core hours, no holiday limits, and you can work fully remotely
- The hardware and software you need
Do you have questions about the project, the team or our work style? Visit our tech blog: http://techblog.rtbhouse.com/jobs/