In support of the Avalara SaaS suite of services, the Site Reliability Engineer is responsible for the security, compliance, reliability, and efficiency of our production systems. The team is responsible for maintaining service uptime in 24x7 environments, meeting client expectations of accessible data services, and analyzing data in support of Tier 2 client support tickets when requested.
The Site Reliability Engineer will enhance observability, troubleshoot Windows and Linux applications, create AWS infrastructure, write tools and automation, and influence the design of the future platform architecture. Additionally, the Senior Site Reliability Engineer develops support plans for all applications and adheres to those plans to meet the required production SLOs.
The Avalara product suite is a mix of Windows/SQL environments and a wide variety of additional open-source applications running in a Linux environment. The role supports multiple projects and requires interaction and partnership with teams and personnel from multiple business locations, skill sets, and backgrounds. As such, the position requires strong communication skills in addition to a solid foundation of technical skills, analytical abilities, and end-to-end troubleshooting techniques.
Essential Operational Engineering Skills:
- Recommend application changes to improve application performance
- Work with Development to transition applications from one platform to another when called for by the business
- Cloud services experience with AWS, Azure, or GCP
- Experience troubleshooting applications running on the Microsoft .NET Framework and .NET Core stacks, or on the LAMP stack. You must have strong experience in one and a willingness to learn the other.
- Provide technical escalation and share in the on-call rotation
- Work independently and as part of a team, as required for the project at hand
- Review existing processes and recommend changes or institute new processes as necessary, including in areas such as monitoring, upgrades, and tuning
- Generate high-quality project documentation, such as architecture designs, implementation plans, design documents, and test plans
- At least 5 years' experience in a SaaS operating environment
- Deep expertise in the mentality, processes, and tools needed to deliver five-nines (99.999%) SLAs
- Strong working knowledge of Windows or Linux and its underlying components, system statistics, performance tuning, file systems, and I/O
- Experience with either .NET or Linux in AWS.
- Solid scripting skills in PowerShell, Python, Bash, or Perl and a history of automating workloads
- Experience with production deployment, monitoring and operational support for enterprise-class applications
- Experience in performance diagnostics, capacity planning, performance architecture design, performance tuning, and performance monitoring
- Experience working with load-balanced, high-traffic solutions and services
- Experience in SQL and database services
- BS in Computer Science or equivalent with 5-8 years of relevant work experience
- Good verbal and written communication skills
Technologies you are likely to be working with: AWS, GitLab, Atlassian suite, PowerShell, Python, Terraform and the HashiCorp suite, Jenkins, Packer, C#, Sumo Logic, CloudHealth, Grafana, Prometheus, HAProxy, RabbitMQ, SQL, Windows, Linux, Docker, and other containerization technologies.
- Can-do attitude; no problem is too big or too small.
- A desire to delight customers.
- A systematic problem solver, with the ability to think outside the box.
- Good data analysis skills to pick up trends before they become major problems.
- Previous experience as an enterprise-class Site Reliability Engineer
- A strong mix of software engineering and operations support skills.
- Desire to learn new technologies and programming languages.