High Performance Computing (HPC) System Engineer
Information Technology/Computing | Livermore, CA | 08/23/2022
Job Code: SES.2 Science & Engineering MTS 2 / SES.3 Science & Engineering MTS 3
Position Type: Career Indefinite
Security Clearance: Anticipated DOE Q clearance (requires U.S. citizenship and a federal background investigation)
Drug Test: Required for external applicant(s) selected for this position (includes testing for use of marijuana)
Medical Exam: Not applicable
Join us and make YOUR mark on the World!
Are you interested in joining some of the brightest talent in the world to strengthen the United States’ security? Come join Lawrence Livermore National Laboratory (LLNL) where our employees apply their expertise to create solutions for BIG ideas that make our world a better place.
We are committed to a diverse and equitable workforce with an inclusive culture that values and celebrates the diversity of our people, talents, ideas, experiences, and perspectives. This is essential to innovation and creativity for continued success of the Laboratory’s mission.
We have an opening for a High Performance Computing (HPC) System Engineer to support one of the largest supercomputer centers in the world. The selected candidate will work in a challenging and team-oriented environment supporting Livermore Computing’s (LC) high performance computing clusters. You will apply fundamental knowledge of HPC systems and contribute to technical projects using creativity and imagination. The position requires the ability to serve periodically on a rotating off-hours on-call list. This position is in the Livermore Computing Division within the Computation Directorate.
This position will be filled at either the SES.2 or SES.3 level based on knowledge and related experience as assessed by the hiring team. Additional job responsibilities (outlined below) will be assigned if hired at the higher level.
In this role you will
- Provide system administration support for Linux-based HPC, Network Attached Storage (NAS), infrastructure, and parallel file system servers and clusters.
- Participate in the design and implementation of multiple Linux-based HPC, infrastructure, and parallel file system servers and clusters.
- Build, configure, and maintain multiple RAID controllers and disk enclosure systems.
- Deploy and maintain high-speed cluster fabrics for compute and storage networks.
- Monitor and conduct installations of software releases, patches of the operating system, and third-party utilities with emphasis on overall system security.
- Improve the quality of service for end users, working with system engineers, Hotline, and Operations staff.
- Troubleshoot and determine root cause of moderately complex system issues.
- Respond to system problems and user questions in person, via email, and via a trouble ticket system.
- Perform other duties as assigned.
Additional job responsibilities at the SES.3 level
- Analyze and tune performance of complex computer, network, file system and disk sub-systems.
- Investigate, evaluate, test, and recommend technical solutions for future systems.
- Develop tools and procedures to monitor and automate system tasks on servers and clusters.
Qualifications
- Ability to secure and maintain a U.S. DOE Q-level security clearance, which requires U.S. citizenship.
- Bachelor’s degree in computer science or related field or the equivalent combination of education and related experience.
- Broad experience with Linux systems including installation, configuration, networking, backups, updates and patching, and system security.
- Broad experience with or knowledge of HPC environments and technologies such as high-speed cluster fabrics (InfiniBand), job scheduling (Slurm), and parallel file systems (Lustre and GPFS).
- Comprehensive knowledge of scripting and programming languages, such as Perl, Python, and bash/csh/ksh.
- Proficient with disk and storage systems, such as host-based RAID controllers, software RAID, and vendor RAID systems.
- Comprehensive experience with version control and configuration management systems, such as Git, Ansible, and CFEngine.
- Demonstrated ability to work with limited direction in a dynamic environment with competing priorities.
- Ability to work off-hours and on-call (intermittently either as needed or as part of a rotation).
- Proficient communication and interpersonal skills, and the ability to work and communicate with other technical staff and end users.
Additional qualifications at the SES.3 level
- Significant experience with Linux system administration in support of several independent but interrelated systems and software packages, and knowledge of container technologies, Kubernetes, and other virtual machine software environments.
- Advanced knowledge of and significant experience providing innovative solutions to broadly defined tasks and problems.
- Advanced communication and interpersonal skills, and the ability to effectively interact with system developers and vendors with minimal direction.
Qualifications We Desire
- Master’s degree in computer science or related field.
- Experience with local, parallel, and distributed file systems, such as XFS, ZFS, GPFS, and Lustre, and with NAS platforms, such as NetApp FAS systems running ONTAP 9.x.
- Design and deployment experience with container technologies (Singularity, Docker, Podman), Kubernetes (OpenShift), and other virtualization environments, such as KVM and VMware ESXi 6.7/7.x.
Additional Information
All your information will be kept confidential according to EEO guidelines.
This is a Career Indefinite position. Lab employees and external candidates may be considered for this position.
Why Lawrence Livermore National Laboratory?
- Included in 2022 Best Places to Work by Glassdoor!
- Work for a premier, innovative national laboratory
- Comprehensive Benefits Package
- Flexible schedules (*depending on project needs)
- Collaborative, creative, inclusive, and fun team environment
Learn more about our company, selection process, position types and security clearances by visiting our Career site.
This position requires a Department of Energy (DOE) Q-level clearance. If you are selected, we will initiate a Federal background investigation to determine if you meet eligibility requirements for access to classified information or matter. In addition, all L or Q cleared employees are subject to random drug testing. Q-level clearance requires U.S. citizenship. For additional information, please see DOE Order 472.2.
Pre-Employment Drug Test
External applicant(s) selected for this position will be required to pass a post-offer, pre-employment drug test. This includes testing for use of marijuana as Federal Law applies to us as a Federal Contractor.
Equal Employment Opportunity
LLNL is an equal opportunity employer that is committed to providing candidates and employees with a work environment free of discrimination and harassment. We value and hire a diverse workforce as it is a vital component of our culture and success. All qualified applicants will receive consideration for employment without regard to race, color, religion, marital status, national origin, ancestry, sex, sexual orientation, gender identity, disability, medical condition, pregnancy, protected veteran status, age, citizenship, or any other characteristic protected by applicable laws.
At LLNL, our goal is to create an accessible and inclusive experience for all candidates applying and interviewing at the Laboratory. If you need a reasonable accommodation during the application or the recruiting process, please submit a request via our online form.
California Privacy Notice
The California Consumer Privacy Act (CCPA) grants privacy rights to all California residents. The law also entitles job applicants, employees, and non-employee workers to be notified of what personal information LLNL collects and for what purpose. The Employee Privacy Notice can be accessed here.