Site Reliability Engineer II - Remote / WFH

Technical Operations United States

Position at Smarsh

Smarsh empowers its customers to manage the risk and unleash the intelligence in their digital communications. Our growing community of over 6,500 organizations in regulated industries counts on Smarsh every day to help them spot compliance, legal or reputational risks in 80+ communication channels, before those risks become regulatory fines or headlines. Relentless innovation has fueled our journey to consistent leadership recognition from analysts like Gartner and Forrester, and our sustained, aggressive growth has landed Smarsh in the annual Inc. 5000 list of fastest-growing American companies since 2008.

If you are passionate about working with vast amounts of data, obsess over performance and the quality of code, and have a desire to grow professionally while having a significant impact on a business, then Smarsh is the right company for you. Our Professional Archive product is constantly growing, and with that growth, we are rapidly expanding our teams to increase our rate of innovation and ensure an excellent customer experience. This product ingests data from our customers 24/7, and working here, you will deal with over 20 PB of raw data, database tables and Solr indexes with tens of billions of records, and a data ingestion pipeline that can process thousands of communications per second.

The right engineer for Smarsh is someone who views their code as more than simply completing a story and pushing up a unit of work; we are looking for engineers who are passionate about their craft. We foster a culture of learning, collaboration, and knowledge sharing and look for engineers who want to be part of a team where they can learn and grow while also investing in others’ development. We have exciting engineering roles across teams for people who want to do a mix of everything and for engineers who would rather specialize in areas such as legacy code refactoring, performance optimization, or cloud-native integration.

If you have ever dreamed of making an enormous impact through your contributions or aspired to take on additional responsibilities and grow professionally, then Smarsh would love to have you join us. We are large enough to have the benefits of a big company but small enough that every engineer matters to us.


As a Site Reliability Engineer, you will be responsible for the daily operation of the Smarsh SaaS Platform. You will be passionate about uptime metrics, automating away toil, and creating pipelines to help our engineers get code into production. You’ll work closely with our Engineering, QA and Technical Operations groups to manage our current on-premise deployments and cloud-native infrastructure. Our stack runs on-premise and in the cloud on .NET and Java, with backend components such as MSSQL/MySQL, Redis, Kafka and Zookeeper.

Technical Requirements

  • Strong experience working on a high-volume SaaS application managed with modern Infrastructure-as-Code methodologies/tooling.
  • Experience with container technologies and orchestration platforms (Docker, Kubernetes, Rancher, Cloud Foundry)
  • Experience with running/managing systems and services on a cloud platform (AWS, Google Cloud, Azure)
  • Experience managing and using CI/CD systems (CircleCI, Concourse, Jenkins, Travis CI)
  • Strong experience working with configuration management tools (Terraform, Ansible, Puppet)
  • Experience deploying and/or operating a centralized logging system (ELK stack or Splunk)
  • Experience working with monitoring and observability tools (we use Datadog and New Relic)
  • Familiarity with relational databases (MSSQL or MySQL)
  • Background working in a multi-platform environment (Linux, Windows)
  • Familiarity with Agile/Scrum/Kanban methodologies
  • Strong knowledge of programming/scripting languages (e.g., Python, Bash, PowerShell, Go). Software Engineers looking to get into SRE/DevOps are encouraged to apply.

Core Responsibilities

  • Manage day-to-day operations of our on-prem SaaS platform, ensuring its health and performance.
  • Creatively solve problems in the DevOps space, collaborating with Development, DBA, and QA team members
  • Communicate and coordinate effectively with Product, Customer Success and Integration teams on operations tasks and deployments.
  • Listen to our internal customers/teams, understand their pain points, and coach/mentor them to work smarter
  • Execute with modern container and cloud native best practices
  • Document decisions regarding technology choices, best practices and process flow
  • Help create and manage continuous integration systems.
  • Mentor and up-level other SRE team members and champion best practices within the team.
  • Automate builds and deployments across multi-platform environments

Additional Requirements

  • Strong interpersonal skills
  • A can-do attitude and sense of urgency for a high-growth, fast-paced environment
  • Proven track record of owning a complex technical project from inception to completion.
  • BS in Computer Science or equivalent experience
  • Curious mind, wanting to learn new technologies and share with others.
  • The ability to think outside of the box to resolve issues and create long-lasting solutions
  • Participation in an on-call schedule

About our culture

Smarsh hires lifelong learners with a passion for innovating with purpose, humility and humor. Collaboration is at the heart of everything we do. We work closely with the most popular communications platforms and the world’s leading cloud infrastructure platforms. We use the latest in AI/ML technology to help our customers break new ground at scale. We are a global organization that values diversity, and we believe that providing opportunities for everyone to be their authentic self is key to our success. Smarsh leadership, culture, and commitment to developing our people have all garnered Best Places to Work Awards. Come join us and find out what the best work of your career looks like.