HPC Storage Systems Administrator (DAOS) – National Lab Support

Remote
Posted 1 month ago

Myticas Consulting is seeking an experienced HPC Storage Systems Administrator on behalf of a confidential national laboratory client. This is a senior, 100% remote contract role focused on maintaining reliable object storage for demanding scientific workloads in a high-performance computing (HPC) environment.

  • Location: Remote (supporting a client near Argonne, Illinois)
  • Contract Type: Full-time (40 hours/week)
  • Experience: 3–7 years administering Linux in production; 2+ years operating high-performance distributed storage systems at scale.
  • Focus: Day-to-day operations, maintenance, and incident resolution for large-scale HPC storage clusters, specifically DAOS.

Key Responsibilities: Distributed Storage Operations and Automation

The administrator is responsible for the health, stability, and security of the client’s cutting-edge distributed storage technology.

  • Daily Operations: Provide daily operations support, maintenance, and issue resolution for HPC storage clusters, with a focus on DAOS.
  • Diagnostics & Vendor Coordination: Monitor system health, perform diagnostics and root-cause analysis, and coordinate with internal teams and hardware/software vendors (e.g., HPE) to resolve storage incidents.
  • Maintenance: Perform upgrades, patches, and configuration changes to maintain system stability and security.
  • Automation: Automate routine administration tasks using scripting (Bash, Python) and/or configuration management tools (Ansible).
  • Documentation: Create or follow operational runbooks and documentation to ensure the availability and reliability of large-scale distributed object storage.

Required Skills & Expertise: Storage at Scale

Success requires experience managing petabyte-scale storage systems and hands-on hardware/software troubleshooting abilities.

  • Linux Systems: 3–7 years administering Linux systems in production environments, including command-line administration.
  • Distributed Storage (2+ years): Experience with large-scale distributed object or parallel file systems (e.g., HPE DAOS, Lustre, GPFS/Spectrum Scale, Ceph) and coordinating with vendors.
  • Hardware: 2+ years hands-on experience with server and storage hardware troubleshooting and maintenance.
  • Automation: 1+ years automating system administration tasks with scripting (Bash, Python) or configuration management tools (Ansible).
  • Residency: Must reside in the United States and be authorized to work without sponsorship.

Preferred Skills:

  • Hands-on production experience administering HPE DAOS.
  • Experience supporting HPC clusters, supercomputing environments, or scientific workflows.
  • Experience creating operational runbooks, monitoring dashboards, and documentation.

Job Features

Job Category: Cloud Engineering, Software Engineering
