Sr. Site Reliability Engineer

Technical Operations Open, Any


Position at Smarsh

Senior Site Reliability Engineer

Smarsh is the leading provider of archiving & compliance solutions for companies in regulated and litigious industries. The solutions are delivered using the Smarsh product suite, which processes, controls, manages and stores a very wide variety of electronic communication channels (e.g. social networks, group chat, instant messaging, email, blogs, wikis, SMS/MMS, and voice) at cloud scale.

About Connected Capture

Connected Capture is a suite of highly successful products (deployed on-premises and in the cloud) that supports 80+ communication channels across Email, Mobile/Text, IM & Collaboration, Social, Voice and Web. Once captured, communications can be sent to the seamlessly integrated Connected Archive for compliant storage, or to an existing archive. All communications are captured continuously, in their native form, directly from the source channel, with full conversational context preserved. Policy control capabilities, including message blocking and ethical walls, are applied in real time for certain channels. The Capture Suite also includes a new-generation strategic platform for Capture.

Roles & Responsibilities

· Join us as a Senior Site Reliability Engineer (SRE) in the Capture SaaS Operations team; you'll be part of a team that measures and improves production performance and reliability through sustainable engineering practices

· Identify toil in operational workflows and automate it away

· Design and build observability and monitoring dashboards

· Establish, enforce, and participate in on-call rotation procedures; troubleshoot application issues

· Define and set up intelligent monitoring and alerting so we’re aware of incidents before they become outages

· Identify, establish, and enforce service level objectives, and train teams on them

· Build and maintain a knowledge base that can be leveraged to handle future incidents effectively

· Mentor junior engineers in the team

Required Experience/Skills

· Minimum 5 years of industry experience in Site Reliability Engineering

· BS in CS or equivalent

· Experience delivering with CI/CD pipelines (Concourse, Bamboo, Jenkins)

· Experience working inside modern observability platforms like:

o Centralized logging (ELK, Splunk)

o APM (New Relic, AppDynamics, Dynatrace)

o Platform telemetry (DataDog, Nagios)

· Strong coding experience in any of the following: Python, Shell, Java

· Experience developing and maintaining infrastructure as code (IaC) using Terraform and Ansible

· Experience with storage systems in both SQL and NoSQL contexts

· Proven ability to debug and optimize code and to automate routine tasks

· Excellent troubleshooting and problem-solving skills

· Experience working with containers and container orchestration platforms (Docker, Kubernetes, PCF, and similar)

· Experience working with cloud platforms is a plus

· Deep understanding of DevOps principles

· A can-do attitude and a sense of urgency suited to a high-growth, fast-paced environment

· A curious mind, wanting to learn new technologies

Why Smarsh?

Ready to join a thriving tech company that’s redefining digital archiving and business intelligence?

Smarsh is the leading comprehensive archiving platform. Recognized as one of today’s fastest growing companies in the U.S., Smarsh delivers innovative cloud-based solutions that help organizations manage and enforce flexible and secure records retention and compliance strategies for electronic communications, including social media and enterprise social networks (Yammer, Chatter, Facebook, LinkedIn and more).

Our motto is ‘People First. Inspire Confidence. Embrace the Impossible.’ We hire lifelong learners who have a passion for their discipline and a track record of excellence. To learn more about us, visit www.smarsh.com/careers.

#LI-Remote