Site Reliability Engineer, Cloud Platform

Operations Requisition ID 5312 Pune, India

Description

We are seeking a highly motivated and talented Site Reliability Engineer to work on Qualys’ Cloud Platform & Middleware technologies. Working with a team of engineers and architects, you will combine software development and systems engineering skills to build and run scalable, distributed and fault-tolerant systems.

The ideal candidate will write software to optimize day-to-day work through better automation, monitoring, alerting, testing and deployment.

Responsibilities:

  • Co-develop and participate in the full lifecycle development of cloud platform services, from inception and design through deployment, operation and improvement, by applying scientific principles
  • Increase the effectiveness, reliability and performance of cloud platform technologies by identifying and measuring key indicators, making changes to production systems in an automated way and evaluating the results
  • Support the cloud platform team before technologies are pushed for production release through activities such as system design, capacity planning, automation of key deployments, building a strategy for production monitoring and alerting, and participating in the testing/verification process
  • Ensure that the cloud platform technologies are maintained properly by measuring and monitoring availability, latency, performance and system health
  • Advise the cloud platform team on improving the reliability of systems in production and scaling them based on need
  • Participate in the development process by supporting new features, services and releases, and hold an ownership mindset for the cloud platform technologies
  • Develop tools and automate processes for large-scale provisioning and deployment of cloud platform technologies
  • Participate in the on-call rotation for cloud platform technologies
  • During incidents, lead the incident response and contribute to detailed, blameless postmortem reports that are brutally honest
  • Propose improvements and drive efficiencies in systems and processes related to capacity planning, configuration management, scaling services, performance tuning, monitoring, alerting and root cause analysis

Requirements:

  • Fresh college graduates are welcome to apply
  • BS/MS degree in Computer Science, Applied Math, or related field
  • Knowledge of one of the following programming languages: Java, Python or Go
  • Proficiency in writing Bash scripts
  • Good understanding of the Linux operating system and its commands
  • Good understanding of SQL and NoSQL systems
  • Good understanding of systems programming (network stack, file system, OS services)
  • Knowledge of network elements such as firewalls, load balancers, DNS, NAT, TLS/SSL and VLANs
  • Knowledge of JVM concepts such as garbage collection, heap, stack, profiling and class loading
  • Able to work under pressure
  • Able to drive results and set priorities independently

EEO Employer/Vet/Disabled