Cloud Databases: Site Reliability Engineer I

Customer Relationship & Support San Antonio, Texas


This is an SRE position on the Cloud Database product in OpenStack Public Cloud. You'll be responsible for deploying and managing the various environments that run the Cloud Database product in multiple regions around the world. Familiarity with deploying and managing OpenStack in a production environment is a plus.

PRIMARY RESPONSIBILITY: Site Reliability Engineering (SRE) is an engineering discipline that combines software and systems engineering to build and run large-scale, massively distributed, fault-tolerant systems. SRE ensures that Rackspace's managed service offerings & customer deployments have reliability and up-time appropriate to users' needs and a fast rate of improvement while monitoring and validating capacity and performance. The role is focused on reliability, scalability and the development of automation to manage repetitive tasks at scale.

  • KNOWLEDGE/SKILLS/ABILITY: Experience in one or more of: C, C++, Java, Perl, Python, Bash or Go. Intermediate experience working with Unix/Linux systems from kernel to shell and beyond, with experience working with system libraries, file systems, and client-server protocols. Networking knowledge, e.g. TCP/IP, UDP, ICMP, MAC addresses, IP packets, DNS, SDN, OSI layers, and load balancing. Expertise in designing, analyzing and troubleshooting large-scale distributed systems. Intermediate knowledge of operating systems. Familiarity with algorithms, data structures and complexity analysis. Intermediate experience designing complex SaaS applications for cloud reliability and scalability. Strong experience with GCP, AWS or OpenStack APIs. Intermediate experience with cloud infrastructure automation and CI/CD pipeline design. Expertise in operational monitoring and management tools (Nagios, Datadog, etc.). Intermediate written & verbal communication skills, both highly technical and non-technical. Ability to work closely with non-technical stakeholders and executives. Systematic problem-solving approach, coupled with a strong sense of ownership and drive. Additional skills may be required depending on role, for example Kubernetes, Docker, Terraform, Ceph and other modern tools/technologies.
  • JOB COMPLEXITY: Supports high-complexity deployments and internal teams on an as-needed basis. Collaborates with other teams on tools for systems automation. Works in conjunction with multiple teams to ensure up-time and reliability of customer deployments.
  • SUPERVISION: Works under detailed instruction and/or supervision, with guidance from senior Developers & SREs.
  • EXPERIENCE/EDUCATION: High school diploma or equivalent required. Bachelor's degree in Computer Science or equivalent experience preferred. Usually requires 2+ years of information systems design/architecture/development experience. May require additional certifications depending on specialization.
  • PHYSICAL DEMANDS: General office environment. May require long periods sitting and viewing a computer monitor. Some lifting up to 20 pounds required. Moderate to considerable stress may occur at times. No other special physical demands required.