Site Reliability Engineer

Engineering Santa Ana, San Jose

Position at Backcountry

Backcountry’s engineers are aggressive about finding new technologies to make systems better, faster, more stable, and more interesting to work on. Any developer in the company can get passionate about something new, start using it, and rally others around putting it into production.

SREs at Backcountry play a crucial role in helping facilitate and provide structure to this innovation. We vet technologies for production readiness, provide deployment guidelines and support the operational environments. We strive to do this in a collaborative way with our Engineering counterparts as we work toward migrating the organization to a DevOps culture.

About the Job

  • Full stack ownership - load balancers, web servers, application servers and caching services
  • Design and develop applications to manage, maintain and support our infrastructure (Infrastructure as Code)
  • Participate in code reviews with other SREs
  • Debug hard problems on production systems - data, hardware, software, application and network
  • Monitor system health and capacity and take proactive action to prevent problems before they occur
  • Participate in an on-call rotation and serve as an escalation contact for incidents as needed
  • Collaborate with Development Engineering teams to build, deploy and support new features

About You

  • 5+ years of experience managing Linux/Unix systems, with demonstrable knowledge of operating system internals, file systems and full-stack troubleshooting
  • Ability to code in at least one high-level language (preferably Python)
  • Familiarity with containerization (Docker) and orchestration (Kubernetes)
  • Familiarity with continuous integration and continuous delivery principles
  • Experience with cloud computing on AWS using Terraform/Ansible
  • Experience with virtualization on VMware using Terraform/Ansible
  • Knowledge of basic configuration and maintenance of common applications used in internet infrastructure such as: Apache, Tomcat, Nginx, Varnish, Redis, RabbitMQ
  • Knowledge of NoSQL databases is a plus
  • Ability to rapidly learn new development languages, software, technologies, frameworks and APIs
  • Ability to identify problems and implement solutions in 24/7 production environments
  • Strong communication and collaboration skills
  • Ability to prioritize tasks and work independently as well as in a team