Site Reliability Engineer / Deployment Engineer
About the role:
The Cloud Operations function is concerned with all aspects of operations management of the FINEOS Cloud for our customers’ pre-production and production environments. This includes uptime, deployment, administration and technology selection. We have an opening for a Site Reliability Engineer within the Cloud Operations team reporting to the Regional Head of Cloud Operations.
The successful candidate will be part of a global team responsible for the availability of the FINEOS stack in our customer-facing data centres and for responding to our daily challenges. The key characteristics we are looking for are strong troubleshooting skills, AWS technical knowledge and excellent communication skills. Responsibilities include building, deploying, operating, monitoring and maintaining large-scale, multi-system SaaS environments in public/private clouds that are secure, stable, auto-scalable and always on. This role generally works a standard business week (Irish business hours), but occasional weekend work and/or on-call rotations may be required.
Summary Job Description/responsibilities
- The Site Reliability Engineer will be responsible for the 24x7 oversight and management of FINEOS's customer-facing applications, servers and network within the production and pre-production environments, including their provisioning.
- Incident management as a second-level responder to monitoring alerts
- Proactively managing the environments to avoid incidents using the monitoring tools and alerts.
- Effective cost management of the AWS infrastructure
- Capacity and demand management for the environments
Skills and requirements
- Ability to own and drive Customer-related issues to resolution, working with the Customers' technical teams.
- Excellent communication and analytical skills for incident and problem management as well as stakeholder management (FINEOS Engineers, DevOps and Customer)
- Work well in a distributed, fast-paced and dynamic team environment.
- Strong commitment to great customer service
- 6+ years of experience providing system support in a technical environment working with customers, with a special focus on system performance and stability (Software as a Service companies preferred), i.e. Site Reliability Engineering.
- Proven technical background in web applications and applications system performance
- Familiarity with AWS, web applications, networking fundamentals, Unix and a scripting language
- B.Sc. / M.Sc. in Computer Science / Information Systems / IT Operations Management
- AWS Certified Solutions Architect, AWS Certified SysOps Administrator or AWS Certified DevOps Engineer
Detailed Job Description/responsibilities
- Recommend and implement changes to the Customer environment that improve the stability, performance and security of the FINEOS Cloud platform, ensuring the uptime and availability of FINEOS solutions needed to meet the necessary SLAs, including deployment and disaster recovery activities.
- Drive technical excellence and improvement across the organization and its offerings, and participate in projects and initiatives to improve processes and drive efficiency gains (for example in the areas of automation, system integrity and infrastructure costs).
- Enforcing the appropriate security compliance in Customer environments as agreed with the FINEOS Information Security Council.
- Problem and Incident Management
- Collaborating with Customer Support to resolve complex customer issues that span application and cloud infrastructure.
- Monitoring and responding to alerts generated by the components of our infrastructure and systems, including investigating root causes and troubleshooting errors/issues.
- Work directly with customers, operations and development to research, troubleshoot, and resolve performance issues in a timely manner.
- Environment Maintenance and Management
- Performing regular maintenance requests in the environments as part of continuous delivery of new functionality and services.
- Installing updates and patches as needed to ensure the health and integrity of our infrastructure and systems (environments).
- Provide input on automation as well as continuing to improve the performance and reliability of the network and the overall service.
- Oversee operational activities across all aspects of AWS, including VPCs, EC2 instances, RDS, volume snapshots and other key technologies
Detailed Skills and requirements
- 6+ years of experience providing system support in a technical environment working with customers, with a special focus on system performance and stability (Software as a Service companies preferred).
- Familiarity & Understanding of AWS public and private cloud architecture
- Familiarity & Understanding of Networking concepts like Routing, SNMP, Web Application Firewalls, Load Balancing, and VPN
- Strong knowledge of AWS networking and architecture
- Strong knowledge of networking fundamentals and common Internet protocols, with in-depth knowledge of the TCP/IP stack, DNS and SMTP, and the ability to troubleshoot issues.
- At least five years’ equivalent work experience in a senior technical position responsible for managing infrastructure operations and application support
- Familiarity & Understanding of Oracle databases
- Ability to automate repeatable tasks via scripting using tools such as Chef or Puppet
- Familiarity & Understanding of Unix operating systems and command line with ability to administer common Unix services (HTTPD, DNS, DHCP, SMTP, LDAP)
- Preferred: experience working with web applications, scripting languages, network technologies and/or relational databases (for example Apache, Tomcat, OpenSSL, Oracle, Postgres).
- Strong knowledge of Service Oriented Architecture, Web services and distributed systems.
- Strong knowledge of at least one scripting language (Bash, awk, Python, Ruby, etc.)