HPC Systems Architect
The Enterprise Technology Services team works with our internal customers to help define and deliver high-impact solutions based on VMware, Linux, and Windows platforms to support our own software products effectively. This HPC Systems Architect role involves interacting with our Model Developers as well as other business units, gathering comprehensive requirements, and delivering highly effective virtualized and physical solutions. Working closely with internal groups and vendor partners, you will have the opportunity to combine your technical skills, creativity, and customer focus to deliver great solutions that ensure our customers get the best out of these technologies.
Objective of the HPC Systems Architect:
The Application and Model developers who create our scientific models rely on several HPC clusters for their compute, research, and analysis needs. This position has primary responsibility for maintaining the HPC compute clusters, managing how they are used and accessed across our Model development teams, and assisting the Model developers in optimizing their compute/storage needs. The HPC Systems Architect will also be part of the team responsible for creating and executing the technology roadmap to support future company direction for Model development.
This is an exciting and challenging position for someone who is looking for more than just another IT role: a role that offers autonomy, growth, and the knowledge that you are contributing significantly to the creation of the next generation of Catastrophe Models used by many of the world's leading insurers, reinsurers, and brokers.
Key Accountabilities & Deliverables:
- Perform system administration of company-wide Linux and HPC environments – maintenance, performance, and issue resolution.
- Development and alignment of the infrastructure with Application Development Roadmaps.
- Implementation of performance metrics to measure system and application demands / bottlenecks.
- Perform and monitor server backup and off-site data replication.
- Researching and recommending equipment and software solutions to enable effective use of budgets.
- Provide support for various virtual and cloud environments in AWS and Azure and maintain file storage.
- Direct hands-on experience managing and supporting HPC environments, including maintaining consistency between UK and US data centers.
- Experience of designing, integrating, and maintaining production Linux-based IT services.
- Strong understanding of IBM GPFS or another distributed filesystem, Sun Grid Engine, and TSM.
- Expertise with one or more scripting/programming languages (Python, Go, Shell, Bash/csh, Julia, Fortran, C) and libraries.
- Experience using Public Cloud Services in AWS, Azure or GCP.
- Experience in the financial services or academia fields is a plus.
- Experience configuring and maintaining secure, large-scale, complex, distributed applications/platforms.
- Experience implementing configuration management, monitoring, and deployment tools such as xCAT, Foreman, and Nagios.
- Strong expertise in SAN storage, Fibre Channel, InfiniBand, NFS, and PostgreSQL.
- Installing and configuring new hardware and software, including but not limited to MPI/OpenMP with GPU and CPU support.
- This role requires the ability to report in person to the RMS London Office and Data Center.
Other Skills required:
- Industry Knowledge/Business Acumen
- Client Focused
- Solutions Driven
- Creativity & Innovation
- Results & Action Oriented
- Decision Making
- Self-Knowledge & Development
- Teamwork & Collaboration
- Planning and Organizing
- Coaching and Mentoring
- Managing and Measuring Work
- Strategic Agility
There is a 1% chance that an earthquake will cause $50 billion of insured losses within the next 12 months and a 5% chance that a hurricane will cause $60 billion of insured losses next year. At RMS, we turn risks into real numbers. How? By building simulation models that allow insurers and investors to understand and manage their global risks--from hurricanes, quakes, and wildfires, to cyber-attacks, terror attacks, and pandemics. Why? We want to build a more resilient world, and we’re on a mission to help make every risk known.
Insurers, reinsurers, investors, financial institutions, governments, and NGOs trust RMS solutions to better understand and manage catastrophe risks. RMS was founded in 1989 by Stanford scientists who created our first model for California Earthquake. Today, RMS has some 1,300 employees across 13 offices in the US, London, Bermuda, Zurich, India, China, Japan, Singapore, and Australia, and over 1,000 products and models now covering six continents.
RMS helped pioneer the natural catastrophe model market we now lead – and we continue to innovate. In May 2019, we announced RMS Risk Intelligence™ (RI), an open-standard platform for strategic risk management. Through this purpose-built platform, clients can tap into RMS HD models, rich data layers, intuitive applications and APIs that simply integrate into existing enterprise systems to support business decisions across underwriting, risk selection, mitigation, and portfolio management.
How we understand and manage risk affects everyone, and our passion is nothing less than creating a more resilient world through a better understanding of catastrophic events. Join our team of leading scientists, developers, industry experts, and world-class professionals. Together, RMSers make a difference on a truly global scale.
RMS is proud to be an equal opportunity workplace. We are committed to equal employment opportunity without regard to race, color, creed, gender, religion, marital status, registered domestic partner status, age, national origin or ancestry, physical or mental disability, genetic characteristics, sexual orientation, or any other classification protected by applicable local, state, or federal law.
To all recruitment agencies: RMS does not accept unsolicited agency resumes and will not be responsible for the payment of placement fees related to unsolicited resumes submitted to open positions, job aliases, or to our employees.