Senior Site Reliability Engineer

Engineering Austin, Texas Remote - United States


Description

Position at RetailMeNot

We are looking for a Senior Site Reliability Engineer to join our SRE team. We define the systems and services that give RetailMeNot development teams a low-friction deployment experience, always-on monitoring, and fault-tolerant infrastructure. The SRE team defines Service-Level Agreements (SLAs), leads incident response, practices blameless post-mortems, and automates processes to reduce operational toil.
As a member of the team, you'll be responsible for delivering highly scalable and resilient cloud deployment and provisioning solutions. You will be exposed to technology like Kubernetes, Prometheus, Helm, Terraform, and others. You will help automate and streamline the company's operations and processes through innovations in application monitoring and alerting, CI/CD modernization, and other efforts.
Are you a software engineer with a real passion for delivery and automation? Are you looking to join an organization with a well-established SRE culture? Start here!
RetailMeNot is headquartered in Austin, TX! This position is fully remote, and we encourage applicants nationwide!

Who You Are

    • You have a Bachelor's degree in computer science or an equivalent STEM field, or equivalent work experience.
    • You have at least 4 years of SRE or DevOps experience.
    • You have a deep understanding of AWS and cloud architectures/services.
    • You have experience writing code with Python, Shell, Go, Java, or similar languages.
    • You have expertise within the container and container orchestration space (Docker, Kubernetes, etc.).
    • You have worked with infrastructure provisioning tools like CloudFormation, Terraform, Chef, Puppet, or others.
    • You have enabled CI/CD pipelines using tools such as Jenkins, AWS CodePipeline, GitLab, or others.
    • You bring a deep understanding and application of computer science fundamentals: data structures, algorithms, and design patterns.
    • You understand networking protocols (TCP/IP, HTTP, DNS, etc.).
    • You have a track record of delivering successful solutions and collaborating with others.
    • You have strong interpersonal skills and can explain SRE concepts to a wider audience.

What You'll Do

    • You'll be working closely with Docker and Kubernetes to containerize applications and provision infrastructure.
    • You will investigate new technologies and tools and recommend those that best fit the team and organization.
    • You will champion SRE methodologies around monitoring, distributed tracing, deployment strategies (e.g. canary, sandbox), and logging.
    • You will identify and execute on opportunities to optimize existing systems, improve infrastructure, and eliminate work through automation.
    • You will educate other engineering teams and advocate for scalable and maintainable architectural decisions.
    • You will participate in our on-call rotation for production services.

Who We Are

    • We have an open environment where engineers are given a lot of responsibility and the freedom to make a huge impact.
    • We have lots of smart people to work with and learn from.
    • We work on large-scale challenges with a variety of technologies and believe in an ever-growing diversity of technology platforms.
    • We believe in giving prizes, bonuses, and recognition for doing what you enjoy.
    • We have a phenomenal open vacation policy.
This is a remote/office-based position that may be performed anywhere in the United States except the state of Colorado.