Sr. Site Reliability Engineer - Vipre

Operations | Vancouver, British Columbia


Position at j2 Cloud Services

The Systems Engineer reports to the Director of Network Operations as a key team member adding to our collective DevSecOps culture on the Site Reliability Engineering (SRE) team. This position will have a direct impact on production operations for a large multi-site Linux infrastructure providing internet security solutions for domestic and international markets. 


As a Systems Engineer you will work closely with members of the Systems Operations team to build, manage and maintain a suite of online security products.


This is a high-profile opportunity within a highly profitable and very dynamic high-tech company. We've established a strong foundation to make us the undisputed leader in our space. Now we need to add an experienced, collaborative, and passionate DevOps professional to help us grow, support, and maintain our services for our customers.

Job Duties:

  • Responsible for maintaining and continually exceeding our published SLAs in the production environment.
  • Design and improve system monitoring (including instrumentation, observability, and alerting).
  • Review and improve CI/CD pipelines and system configuration.
  • Oversee software and configuration changes to the production environment.
  • Work closely with development teams to ensure new features and changes meet technical design and operational friendliness goals.
  • Work closely with systems administration teams to ensure production issues are escalated and triaged appropriately. Participate in retrospective meetings.
  • Execute project work required for new business initiatives, such as gathering and implementing non-functional requirements, reviewing system security requirements, and planning new product and feature rollouts. Respond to unplanned events such as urgent system security upgrades.
  • Maintain and enhance our platform of internet and email security solutions by creating tooling and automation.
  • Interface with Systems Administration and Development teams to maintain SLA, uptime, and security requirements.
  • Troubleshoot, manage, and resolve problem tickets for application errors and emerging security threats.
  • Perform capacity analysis and provide forecasts for required infrastructure capacity increases.

Job or Project Requirements and Experience:


  • At least 5 years of experience working in a high-traffic, managed, scalable, and fault-tolerant Linux environment.
  • Proficiency with Bash, Python, or Golang.
  • Experience with systems automation and configuration management tools such as Ansible, Puppet, or Chef.
  • Experience with change management policies and procedures.
  • Experience with container-based virtualization (Docker/LXC).
  • In-depth experience with MySQL, including replication and high availability.
  • Solid understanding of networking fundamentals and protocols, with hands-on experience configuring VPNs, firewalls, switches, routers, etc.
  • Must have excellent written and verbal communication skills in English.
  • You are service-oriented and enjoy working with engineers to manage, deploy and tune high volume applications in a cloud environment.
  • You are motivated, keep on top of current technical trends, and have deep knowledge of your tools of choice.
  • You are reliable, and we can count on you in times of need.
  • Candidates should be able to demonstrate strong problem-solving methodology and the ability to work with individuals at all levels of the organization as well as external vendors.


  • Experience with metrics and monitoring tools such as Prometheus, Grafana, InfluxDB, and Nagios.
  • Experience with the HashiCorp stack including Nomad, Consul, Vault, and Terraform.
  • Experience with cloud technologies such as AWS, Google Compute, or Azure.
  • Experience supporting systems with compliance requirements such as SOX, HIPAA, or GDPR.