Infrastructure/DevOps Engineer – AI Platform (Remote, US/CA)

Posted 1 month ago

Abnormal Security is seeking an Infrastructure/DevOps Engineer to join their IT team, focusing on supporting their advanced AI platforms. This is a high-impact, full-time role that is fully remote for candidates residing in the United States or Canada. The core mission is to enable AI software engineers to move fast by building and maintaining reliable, scalable, and secure foundational infrastructure.

This role sits at the critical intersection of systems engineering and AI enablement. The base salary range is $127,500 to $150,000 USD for most US remote locations, with a higher range for candidates based in San Francisco or New York. The compensation package includes eligibility for a bonus and restricted stock units (RSUs).


Key Responsibilities and AI Enablement

As a key partner to IT, Security, and AI/ML engineering teams, you will solve complex operational challenges to unlock innovation across the company.

  • AI Infrastructure Architecture: Architect and manage infrastructure that supports AI/ML pipelines, tools, and data platforms, including ML workloads and GPU-based infrastructure.
  • Containerization & Orchestration: Implement and maintain containerization (Docker) and orchestration (Kubernetes) environments.
  • Automation & IaC: Automate provisioning and deployment using Infrastructure as Code (IaC) tools like Terraform or Pulumi.
  • CI/CD for ML: Develop CI/CD systems that integrate with ML workflows and ensure reproducible AI experiments.
  • Observability & Optimization: Monitor and troubleshoot infrastructure issues with tools like Prometheus, Grafana, and the ELK stack, and partner with engineers to optimize platform performance and resource utilization.
  • Security & Documentation: Collaborate with security and compliance teams to ensure infrastructure meets data protection standards, and maintain clear, accessible documentation to scale platform knowledge.

Required Experience and Technical Skills

The ideal candidate is a skilled engineer with a customer-first mindset, prioritizing automation, self-service tools, and operational excellence.

  • Experience: 4+ years of experience in DevOps, SRE, or Infrastructure Engineering roles.
  • Cloud & Containers: Proficiency with cloud providers (AWS preferred), Kubernetes, and Docker.
  • IaC & Scripting: Experience with Infrastructure as Code tools (Terraform, Ansible, or Pulumi), and strong scripting skills in Python, Bash, or similar.
  • CI/CD: Familiarity with CI/CD systems such as GitHub Actions, Jenkins, or CircleCI.
  • Foundational Knowledge: Understanding of networking, security, and identity management in cloud environments.
  • Problem Solving: Ability to troubleshoot complex system issues in a distributed environment.

Nice-to-haves include familiarity with MLOps tools (MLflow, Kubeflow, or SageMaker), experience with AI platform infrastructure (model serving, feature stores), knowledge of logging frameworks (Fluentd, Loki), and a background supporting data platforms (Snowflake, Databricks, Hadoop).

Job Features

Job Category: DevOps
