Site Reliability Engineer (SRE)

Information Technology Opportunities available throughout U.S.


Description

Position at Gainbridge

Group 1001 is a consumer-centric, technology-driven family of insurance companies on a mission to deliver outstanding value and operational performance by combining financial strength and stability with deep insurance expertise and a can-do culture. Group 1001’s culture emphasizes the importance of collaboration, communication, core business focus, risk management, and striving for outcomes. This emphasis extends to how we hire and onboard our most valuable assets – our employees.

Company Overview:

Gainbridge, a part of the Group 1001 enterprise, is a self-managed, innovative digital platform that gives its clients direct access to trusted financial products to smartly grow their savings over time. Gainbridge strives to offer products through its platform that are simple, intuitive, and backed by smart technology, with no complexity or hidden fees. Gainbridge empowers clients to take control of their financial future with simple solutions that are accessible to everyone, no matter their budget.

 

Job Summary:

We are looking for an experienced Site Reliability Engineer to partner with our growing Engineering team as we deliver our core platform to market. You will work with full-stack development teams on the design, development, testing, and deployment of the Gainbridge B2B / B2C platform. You are comfortable with greenfield environments and look forward to defining what it means to be an SRE at Gainbridge.

You enjoy solving seemingly intractable puzzles in innovative yet pragmatic ways. You hold yourself and others accountable for routinely delivering mission-critical outcomes. You are viewed as a trusted, intellectually honest partner within your team and across functional boundaries. You continually ask yourself, “What is accurate and what should be done about it?” This role will report to the AVP of Engineering Operations.

 

Main Accountabilities:

  • Support and optimize 24/7 cloud operations.
  • Work with the Product team to define SLAs, SLOs, and critical KPIs.
  • Provide incident management to systematically limit business disruption.
  • Maintain the infrastructure through patching, compliance audits, and responses to maintenance alerts.
  • Implement monitoring and provide rapid response to alerts to reduce MTTR.
  • Work with Development and Testing teams to confirm Go/No-Go status on launches and incremental deployments.
  • Integrate new tools and services for observability and automate runbooks to accelerate incident response.
  • Conduct diagnostic postmortems to prevent repeat incidents and improve platform robustness.
  • Recommend and implement best-practice solutions, machines, processes, and tools to ensure systematic delivery of great outcomes.
  • Build and maintain strong working relationships with both internal teams and third-party vendors.

 

Qualifications:

  • Experience supporting a financial system at enterprise scale.
  • Excellent analytical skills for evaluating information carefully and solving complex problems.
  • Strong communication skills, with the ability to distill complex issues into a meaningful synthesis.
  • Demonstrated ability to partner with Engineering teams and leaders to actualize vision.
  • Track record of balancing the coordination of fast-paced daily priorities with important, longer-term strategic efforts.
  • 5-7 years of experience as an SRE.
  • Experience partnering with product, business, and program management teams.
  • Experience working in leading financial services organizations.
  • BS degree in computer science or another technical field, or equivalent experience.