High Performance Computing Engineer

High Performance Computing (HPC) McLean, Virginia


 

HPC Engineer

What you will be doing

You will play a key role in defining and operating some of the most complex compute platforms the client brings to bear on its hardest problems. These systems enable complex analysis, simulation, and modeling by combining massively parallel computing with disparate holdings of very large data sets to answer difficult questions. You will help users deploy jobs to these systems and harness their capabilities to produce answers in the form of analytic products, models, and simulations. This mission enablement is at the heart of solving the hardest problems.

       Responsible for the normal day-to-day HPC operations and maintenance of the HPC systems

       Provide day-to-day systems administration for NVIDIA GPUs, commodity cluster systems, and Cray HPC environments

       Perform system monitoring, software installations, debugging, upgrades, health checks, and identification/implementation of automated business processes

       Provide assessments, on-going performance analysis and recommendations for future architectures

       Responsible for operating all the host systems for the analysis

       Work in a liaison role, linking the analysts and their specialty codes and applications to the computing systems that are focused on yielding in-depth, technically sound results

       Oversee analytic applications running on a clustered HPC fabric including CPU and GPU systems

       Manage job submission for clients' applications and codes using MPI/OpenMPI

       Provide in-depth analytic results to achieve a best-tool-for-the-job approach

       Partner with data scientists, engineers, and analysts conducting specialized scientific and engineering analysis

       Escalate issues and problems to hardware support and/or engineering management as necessary

       Responsible for continuous performance analysis and tuning the HPC environment

       Assist with the identification, troubleshooting, and repair of software problems impacting performance of implemented HPC solutions

       Perform installation of software patches including upgrades to operating systems and firmware

       Assist with the resolution of trouble tickets and software problems identified by system users

       Identify and expand services and functionalities offered in HPC environment

       Be a primary point of contact to resolve any hardware or software malfunctions, including working with service personnel as necessary

       Review system logs to identify and resolve software- and systems-related issues

       Prepare reports related to the operational efficiency of the hardware and the execution of users' jobs

       Experience with MPI/OpenMPI, SLURM, and Linux operating systems essential

       Prior experience as a Systems Administrator essential, with a preference for experience working with clustered systems including GPUs in the hardware stack

       Experience with high-speed networking and CUDA preferred

       Software integration experience a plus

       Other duties could be required to support the customer’s mission

 

What you will need

       Minimum of 6 years demonstrated on-the-job experience

       Demonstrated on-the-job experience with integrating functionality from disparate systems via scripting/tooling/automation

       Demonstrated on-the-job experience with the Sponsor's system security environment and requirements

       Demonstrated experience leading systems architecture, operations, maintenance and administration

 

Clearance:
Active TS/SCI with an appropriate polygraph is required to be considered for this role 

Who You Are

You are a talented, multidiscipline engineer versed in getting the best performance out of systems. You are familiar with High Performance Computing using both CPU and GPU based systems. You understand scheduling using SLURM, computing using MPI, and operating software at scale.

Who are we?
Praxis Engineering* was founded in 2002 and is headquartered in Annapolis Junction, MD, with growing offices in Chantilly, VA, and Aberdeen, MD.

Praxis Engineering is a consulting, product, and solutions firm dedicated to the practical application of software and system engineering technologies to solve complex problems. 

With over 300 employees supporting more than 50 contracts, Praxis brings together world-class engineers with proven engineering best practices, domain expertise, commercial technologies, and proven agile management approaches to create high-value solutions aimed at helping our customers meet their most critical business and mission objectives.

*Praxis Engineering is a wholly owned subsidiary of General Dynamics IT.

Why Praxis? 
We are focused on continual learning and evolution. We don’t do things because “that’s the way we’ve always done things”; we listen to our employees and adapt to the changing marketplace. We look at the big picture and encourage our engineers to get training and certifications in emerging technologies that will help shape our customers' missions. We've been profitable year after year. We're always on the lookout for great engineers to join the team, and we recognize that our employees are the heart and soul of what we do. We focus on recruiting talented people, treating them right, and then allowing them to do what they do best. No red tape. No micromanagement. Smart people want to work with smart people, and we love people who are passionate about what they do and who find ways to do it better.

And then there is the...

 

Benefits

  • Attractive total compensation package to include competitive salary and medical benefits with an option for FREE employee HSA medical plan!
  • Office perks such as free soft drinks and snacks (both healthy and not-so-healthy)
  • Praxis swag (annual gift certificate to purchase top brand Praxis apparel)
  • 401k contribution/match: combination of profit share/contribution (3.5%) and employer match (up to 4.5%) for a total of 8%.
  • Annual bonus plan
  • 4 weeks of Paid Time Off + 10 holidays + comp time eligibility. (30+ days of leave to start!)
    • We reward longevity! On your 5th work anniversary, you will receive an additional week of PTO, for a total of 5 weeks. That makes 35+ days of leave altogether!
    • On your 10th work anniversary, you will receive an additional week of PTO, for a total of 6 weeks. That makes 40+ days of leave altogether!
    • At any time, your unused PTO can be traded in for $$$!
  • Carry over a maximum of 380 hours of leave from year to year. You can choose to take a sabbatical one year or trade in your unused PTO for something nice!
  • Student loan repayment assistance: COMING SOON!
  • Training is a priority! Take advantage of our endless in-house training opportunities, or seek out vendor-offered (paid) training opportunities like conferences, certification courses, and seminars.
    • Conferences (recently attended by Praxis employees): AWS Summit, IoT World, Black Hat and DefCon.
    • Training & Certifications: Splunk, AWS, Big Data/Cloudera, VMware, Scrum Master...the list of certifications goes on and on!
    • Praxis University: Cyber Research, Data Analytics, IoT, AWS, and Red Hat course offerings and hands-on training.
  • We truly believe the right work-life balance can exist, and it's here at Praxis. Our work is extremely important, but your job is just a part of who you are. When you enjoy your life outside of our walls, you're at your best the next time you walk through our doors. We do all we can to ensure that happens every day.

Praxis Engineering provides equal employment opportunities (EEO) to all employees and applicants for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status, or any other protected class.