HPC Storage Systems Administrator (DAOS) – National Lab Support

Remote

Posted 4 months ago

Myticas Consulting is seeking an experienced HPC Storage Systems Administrator on behalf of a confidential national laboratory client. This is a senior, 100% remote contract role focused on maintaining reliable object storage for demanding scientific workloads in a high-performance computing (HPC) environment.

Location: Remote (Supporting client near Argonne, Illinois)
Contract Type: Full-time (40 hours/week)
Experience: 3–7 years administering Linux in production; 2+ years operating high-performance distributed storage systems at scale.
Focus: Day-to-day operations, maintenance, and incident resolution for large-scale HPC storage clusters, specifically DAOS.

Key Responsibilities: Distributed Storage Operations and Automation

The administrator is responsible for the health, stability, and security of the client’s distributed storage systems.

Daily Operations: Provide daily operations support, maintenance, and issue resolution for HPC storage clusters, with a focus on DAOS.
Diagnostics & Vendor Coordination: Monitor system health, perform diagnostics and root-cause analysis, and coordinate with internal teams and hardware/software vendors (e.g., HPE) to resolve storage incidents.
Maintenance: Perform upgrades, patches, and configuration changes to maintain system stability and security.
Automation: Automate routine administration tasks using scripting (Bash, Python) and/or configuration management tools (Ansible).
Documentation: Create or follow operational runbooks and documentation to ensure the availability and reliability of large-scale distributed object storage.

Required Skills & Expertise: Storage at Scale

Success in this role requires experience managing petabyte-scale storage systems and hands-on skill in troubleshooting both hardware and software.

Linux Systems: 3–7 years administering Linux systems in production environments, including command-line administration.
Distributed Storage (2+ years): Experience with large-scale distributed object or parallel file systems (e.g., HPE DAOS, Lustre, GPFS/Spectrum Scale, Ceph) and coordinating with vendors.
Hardware: 2+ years hands-on experience with server and storage hardware troubleshooting and maintenance.
Automation: 1+ year automating system administration tasks with scripting (Bash, Python) or configuration management tools (Ansible).
Residency: Must reside in the United States and be authorized to work without sponsorship.

Preferred Skills:

Hands-on production experience administering HPE DAOS.
Experience supporting HPC clusters, supercomputing environments, or scientific workflows.
Experience creating operational runbooks, monitoring dashboards, and documentation.

Job Category: Cloud Engineering, Software Engineering

