Site Reliability Engineer, Technical and Cloud Services

Software Engineering Plano, Texas


Description

This Site Reliability Engineer position is a technical role within the Technical and Cloud Services group. The role helps ensure the reliability, scalability, and performance of our infrastructure while driving automation and efficiency in our development process. The engineer provides technical guidance on Cloud Services initiatives to team members and other development teams, and helps ensure those initiatives improve continuously. The role requires a proactive problem-solving mindset, excellent communication skills, and the ability to collaborate effectively with our cross-functional teams, which help courts and public safety organizations of all sizes better protect and serve the public. By providing solutions that improve efficiency and response time, you can help serve our citizens and make communities safer.

NOTE: This is a hybrid position that requires the candidate to be in the Plano, TX office at least 2 days per week.

SHIFT EXPECTATION:
8 AM – 5 PM Central Time (with a one-hour break)

Responsibilities

  • Design, build, and maintain highly available, scalable, and secure infrastructure components to support our applications and services.
  • Implement and maintain monitoring, alerting, and logging systems to proactively identify and 
    resolve issues before they impact users.
  • Collaborate with development teams to automate deployment pipelines, infrastructure 
    provisioning, and configuration management using tools like Jenkins, Terraform, and Kubernetes.
  • Lead incident response and post-mortem activities to identify root causes and implement 
    preventive measures to minimize future incidents.
  • Conduct regular performance tuning and optimization of system resources to ensure optimal 
    efficiency and cost-effectiveness.
  • Drive continuous improvement initiatives to streamline processes, enhance reliability, and 
    increase productivity across the organization.
  • Stay current with industry trends, best practices, and emerging technologies in SRE, DevOps, 
    cloud computing, and automation.

Qualifications

  • BS/BA degree in Computer Science, Computer Engineering, Information Systems, or a similar field.
  • 5-7 years of experience in a Site Reliability and/or DevOps role, with a strong understanding of both disciplines.
  • Proficiency in cloud computing platforms such as AWS, Azure, or GCP, including infrastructure as code (IaC) tools like Terraform.
  • Hands-on experience with containerization and orchestration tools such as Docker and 
    Kubernetes.
  • Expertise in scripting and programming languages such as PowerShell, Python, Bash, or Go for automation and tooling.
  • Strong understanding of database concepts and administration, including T-SQL scripting, 
    indexing, and performance tuning.
  • Solid understanding of networking concepts, security best practices, and system administration in Windows and Linux environments.
  • Strong analytical and problem-solving skills, with the ability to troubleshoot complex issues and 
    drive resolution in a timely manner.
  • Excellent communication and interpersonal skills, with the ability to collaborate effectively with
    cross-functional teams and stakeholders.
  • Creative problem solving, with a demonstrated pattern of applying creative solutions to bridge the gap between product development and environmental considerations.

Required:

  • AWS
  • MS SQL Server and/or PostgreSQL
  • Windows Server OS
  • Linux OS
  • PowerShell
  • Python
  • IIS

Preferred:

  • Knowledge of .NET Framework and Languages (C#, VB.NET)
  • Experience with monitoring tools, such as Datadog, AWS CloudWatch, and SolarWinds 
    Database Performance Analyzer
  • Experience with PagerDuty
  • Knowledge of Agile Development, including experience with tools such as Jira
  • Knowledge of Web Development Practices and Technologies
  • Experience with ticketing systems, such as Microsoft CRM
  • Octopus Deploy
  • Hangfire
  • Elasticsearch
  • RabbitMQ
  • Apache
  • Advanced knowledge of Microsoft Hosting Technologies
  • Advanced knowledge of DevOps practices