HPC Engineer
Who You Are
You are a talented, multidisciplinary engineer skilled at getting the best performance out of systems. You are familiar with High Performance Computing on both CPU- and GPU-based systems. You understand scheduling with SLURM, parallel computing with MPI, and operating software at scale.
What you will be doing:
You will play a key role in defining and operating some of the most complex computing platforms that the client brings to bear against hard problems. These systems enable complex analysis, simulation, and modeling by leveraging massively parallel computing and disparate holdings of very large data sets to answer difficult questions. You will help users deploy jobs to these systems and harness their capabilities to produce answers in the form of analytic products, models, and simulations. This mission enablement is at the heart of solving the hardest problems.
Specific duties include:
Responsible for the normal day-to-day HPC operations and maintenance of the HPC systems
Provide day-to-day systems administration for NVIDIA GPUs, commodity cluster systems, and Cray HPC environments
Perform system monitoring, software installations, debugging, upgrades, health checks, and identification/implementation of automated business processes
Provide assessments, on-going performance analysis and recommendations for future architectures
Responsible for operating all the host systems for analysis
Work in a liaison role, linking analysts and their specialty codes and applications to the computing systems focused on yielding in-depth, technically sound results
Oversee analytic applications running on a clustered HPC fabric including CPU and GPU systems
Manage job submission for client applications and codes using MPI/OpenMPI
Provide in-depth analytic results to achieve a best-tool-for-the-job approach
Partner with data scientists, engineers, and analysts conducting specialized scientific and engineering analysis
Escalate issues and problems to hardware support and/or engineering management as necessary
Responsible for continuous performance analysis and tuning of the HPC environment
Assist with the identification, troubleshooting, and repair of software problems impacting performance of implemented HPC solutions
Perform installation of software patches including upgrades to operating systems and firmware
Assist with the resolution of trouble tickets and software problems identified by systems users
Identify and expand services and functionalities offered in HPC environment
Be a primary point of contact to resolve any hardware or software malfunctions, including working with service personnel as necessary
Review system logs to identify and resolve software and systems related issues
Prepare reports related to the operational efficiency of the hardware and execution of users' jobs
Experience with MPI/OpenMPI, SLURM, and Linux Operating Systems essential
Prior experience as a Systems Administrator essential, with a preference for experience working with clustered systems including GPUs in the hardware stack
Experience with high-speed networking and CUDA preferred
Software integration experience a plus
Other duties may be required to support the customer's mission
What you will need to succeed:
Minimum of 6 years demonstrated on-the-job experience
Security clearance required: TS/SCI w/ Polygraph
Demonstrated on-the-job experience with integrating functionality from disparate systems via scripting/tooling/automation
Demonstrated on-the-job experience with the Sponsor's system security environment and requirements
Demonstrated experience leading systems architecture, operations, maintenance and administration
GDIT IS YOUR PLACE
At GDIT, the mission is our purpose, and our people are at the center of everything we do.
- Growth: AI-powered career tool that identifies career steps and learning opportunities
- Support: An internal mobility team focused on helping you achieve your career goals
- Rewards: Comprehensive benefits and wellness packages, 401(k) with company match, competitive pay, and paid time off
- Community: Award-winning culture of innovation and a military-friendly workplace