Canonical, the publisher of Ubuntu, is a pioneer in globally distributed work. As a Site Reliability Engineer, you won't just be managing cloud tools; you will be perfecting enterprise infrastructure using a model-driven approach. You will manage hundreds of private clouds and Kubernetes clusters across both physical hardware (bare metal) and public clouds. At Canonical, automation is treated as a software engineering problem, requiring deep Python fluency and a scientific mindset to manage open-source operations at a massive scale.
- Location: Globally Remote (Americas/Pittsburgh focus)
- Experience: Strong background in Linux, Python, and Networking.
- Core Tech: Ubuntu, OpenStack, Kubernetes, Kubeflow, Kafka, OpenSearch.
- Travel: Ability to travel internationally twice a year for team sprints.
Full-Stack Open Source Operations
You will work across the entire technology stack, from bare-metal networking and the Linux kernel up to orchestration layers. Canonical’s approach centers on "model-driven" operations, where complex software such as OpenStack and Kubernetes is deployed and managed as reusable code models. This reduces the manual "toil" of traditional sysadmin work and makes it possible to manage massive distributed estates.
Kubernetes & Application Ecosystem
You will be responsible for the lifecycle of Kubernetes clusters and the open-source applications running on them, such as Kubeflow for AI, Kafka for streaming, and various databases. Your role involves monitoring these applications with an observability-first mindset, identifying incidents before they impact global customers, and ensuring that the entire open-source portfolio meets rigorous enterprise standards.
Automation as Software Engineering
To succeed at Canonical, you must be a software engineer first. You will use Python to build the automation that drives infrastructure. This includes creating recovery pipelines, automating security standards, and implementing metrics-driven scaling. You will move beyond "scripting" to build robust, maintainable software that handles the deployment and maintenance of mission-critical services for global brand-name customers.
Summary: You are at the heart of the open-source world. By applying high-level Python engineering to the challenges of bare-metal and cloud infrastructure, you ensure that Ubuntu-based environments remain the gold standard for enterprise innovation, AI, and IoT.
Job Features
| Job Category | DevOps |
Tekmetric is a high-growth, mission-driven platform providing all-in-one shop management software for the auto repair industry. As an SRE, you will be a "builder" in a culture that values extreme ownership and curiosity. You will architect the scalable AWS/GCP infrastructure that powers everything from digital vehicle inspections to payment processing, moving away from manual "grind" toward a philosophy of "automate everything."
- Location: Remote / Hybrid (Requires attendance at periodic in-person offsites)
- Experience: 5+ years in SRE or DevOps roles.
- Core Tech: AWS (or GCP), Kubernetes, Docker, Terraform.
- Stack Focus: CI/CD Pipelines, Prometheus, Grafana, ELK Stack.
- Culture: High-impact, direct communication, and a "winning together" mindset.
Scalable Infrastructure & Orchestration
You will design and maintain the cloud infrastructure (primarily AWS) that supports Tekmetric’s rapid scaling. This involves deep expertise in containerization (Docker) and orchestration via Kubernetes. By using Terraform for Infrastructure as Code (IaC), you will ensure that the environment is modular, repeatable, and secure-by-default, supporting a seamless user experience for auto repair shops nationwide.
Observability & "Automate Everything"
A core responsibility is the development of a comprehensive observability stack using Prometheus, Grafana, and the ELK Stack. You will move beyond simple monitoring to create intelligent alerting and automated incident response practices. By building robust CI/CD pipelines, you will improve the speed and consistency of code deployments, ensuring that "winning" for the customer is backed by a highly reliable system.
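The "intelligent alerting" described above usually means firing only on sustained breaches rather than one-sample blips. A minimal plain-Python sketch of that logic (field names are hypothetical; Prometheus expresses the same idea with an alert rule's `for:` clause):

```python
from dataclasses import dataclass

@dataclass
class AlertRule:
    """One rule: fire only after the condition has held for `for_seconds`."""
    threshold: float   # e.g. error ratio above which we page
    for_seconds: int   # sustained-duration requirement, like Prometheus `for:`

def evaluate(rule: AlertRule, samples: list) -> bool:
    """samples: (unix_ts, value) pairs, oldest first."""
    pending_since = None
    firing = False
    for ts, value in samples:
        if value > rule.threshold:
            if pending_since is None:
                pending_since = ts      # condition just started holding
            if ts - pending_since >= rule.for_seconds:
                firing = True
        else:
            pending_since = None        # a single healthy sample resets
            firing = False
    return firing
```

Suppressing transient spikes this way is what keeps on-call pages actionable rather than noisy.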
High Availability & Disaster Recovery
To ensure business continuity for thousands of repair shops, you will implement and manage advanced Disaster Recovery (DR) and failover processes. This includes designing backup solutions and recovery pipelines that meet strict recovery time objectives. You will also provide mentorship to junior team members, fostering a culture of continuous learning and technical excellence within the engineering organization.
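The two objectives behind any DR drill can be scored mechanically: downtime against the RTO, and the age of the newest backup at failure time against the RPO. A small illustrative helper (plain Python; names and values are examples, not Tekmetric's tooling):

```python
from datetime import datetime, timedelta

def dr_drill_report(failure_at: datetime, restored_at: datetime,
                    last_backup_at: datetime,
                    rto: timedelta, rpo: timedelta) -> dict:
    """Score a disaster-recovery drill against its objectives.
    RTO bounds how long the service was down; RPO bounds how much
    data could be lost (age of the newest backup at failure time)."""
    downtime = restored_at - failure_at
    data_loss_window = failure_at - last_backup_at
    return {
        "downtime": downtime,
        "data_loss_window": data_loss_window,
        "rto_met": downtime <= rto,
        "rpo_met": data_loss_window <= rpo,
    }
```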
Summary: You are the architect of reliability for the auto repair industry’s leading cloud platform. By replacing manual tasks with sophisticated automation and building a transparent, observable infrastructure, you empower shop owners to move "above the grind" and focus on their own customers.
Job Features
| Job Category | DevOps |
SitusAMC is a leader in the real estate financial services industry. In this role, you will be the technical anchor for products that have recently transitioned from on-premise data centers to the AWS Cloud. Your mission is to move these products from a "lift-and-shift" state to a fully optimized SaaS offering, implementing the AWS Well-Architected Framework while balancing a workload of roughly 70% operations and 30% development.
- Salary Range: $110,000 – $130,000 USD
- Location: Fully Remote (United States)
- Experience: 5+ years in DevOps/SRE roles.
- Core Tech: AWS (EKS, RDS, Elastic Beanstalk), Terraform, Azure DevOps.
- Methodology: GitOps (ArgoCD/FluxCD), Agile (Scrum/Kanban).
Cloud Optimization & Well-Architected Framework
Since these products are newly transitioned to the cloud, you will lead strategic initiatives to re-engineer them for stability and cost-efficiency. This involves applying AWS Well-Architected pillars to existing infrastructure, ensuring that EC2 instances, AMIs, and Load Balancers are tuned for high availability. You will also manage the AWS Transfer Family for large file movements, a critical component in real estate financial data flows.
GitOps & Container Orchestration
You will manage the container lifecycle using Amazon EKS (Kubernetes). A standout requirement for this role is proficiency in GitOps methodologies, specifically using ArgoCD or FluxCD. This ensures that the state of your Kubernetes clusters is always synchronized with your version control in Azure DevOps, providing a declarative and automated path for code and infrastructure changes.
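The GitOps guarantee described here boils down to a reconcile loop: diff the declared state in Git against the live cluster and emit corrective actions. A toy illustration of that idea (not the actual ArgoCD/FluxCD implementation; resource names are hypothetical):

```python
def reconcile(desired: dict, live: dict) -> list:
    """Core GitOps loop, conceptually: the Git repo is the source of
    truth, and anything the cluster does differently is drift to fix."""
    actions = []
    for name, spec in desired.items():
        if name not in live:
            actions.append(("create", name))     # declared but absent
        elif live[name] != spec:
            actions.append(("update", name))     # drifted from Git
    for name in live:
        if name not in desired:
            actions.append(("delete", name))     # no longer declared
    return sorted(actions)
```

Because every change flows through this loop, manual cluster edits are automatically reverted, which is what makes the deployment path declarative and auditable.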
Database & Network Management
As an SRE, you will provide deep support for the data layer, primarily Amazon RDS (SQL). Your responsibilities include managing schemas and users and troubleshooting performance. On the networking side, you will use diagnostic tools such as curl and wget to analyze HTTP traffic and optimize API performance. Experience with service mesh technologies (Linkerd or Istio) is a significant plus for managing these microservices.
Summary: You are the specialist who turns migrated applications into true cloud-native SaaS platforms. By combining your expertise in Kubernetes GitOps with a rigorous focus on AWS reliability and SQL performance, you ensure that SitusAMC's real estate tech is the most stable and scalable in the industry.
Job Features
| Job Category | Cloud Engineering |
NMI is a leading payment gateway provider, and this Senior DevOps Engineer role is the primary GCP specialist within a seasoned CloudOps team. You will lead the design, security, and networking architecture of Google Cloud Platform environments that support mission-critical, high-volume workloads. The role is heavily focused on "security-by-default" and complex networking patterns to ensure reliability across a multi-cloud footprint.
- Compensation: $125,000 – $160,000 USD + Bonus
- Location: Fully Remote (US or Canada)
- Experience: 5+ years in Cloud Engineering, SRE, or DevOps.
- Core Tech: GCP, Terraform, GKE, BigQuery, Bigtable, GitLab.
- Focus: Shared VPC Networking, IAM Least Privilege, and Data Platform Enablement.
Advanced GCP Networking & Security
You will architect sophisticated networking patterns, specifically focusing on GCP Shared VPC design. This involves managing service projects, cross-project routing, firewall policies, and private service access. Your goal is to implement a "least privilege" IAM strategy using custom roles and service accounts, backed by organizational policy guardrails to ensure production environments remain hardened against unauthorized access.
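Least privilege is ultimately a set comparison between the permissions an identity holds and the permissions it actually needs. A small illustrative helper (the permission strings are examples only, not a specific GCP role definition):

```python
def least_privilege_gap(granted: set, required: set) -> dict:
    """Compare what a service account holds against what it needs.
    `excess` lists the least-privilege violations to strip; `missing`
    lists what a custom role must add for the workload to function."""
    return {
        "missing": sorted(required - granted),
        "excess": sorted(granted - required),
        "compliant": granted == required,
    }
```

Running a check like this against every service account is one way to back an IAM strategy with measurable guardrails rather than policy intent alone.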
Infrastructure as Code & Reliability
Using Terraform, you will build low-toil, automated infrastructure. You will be responsible for the high availability of production services, utilizing an observability-first mindset with metrics, logs, and traces. You will participate in on-call rotations and lead blameless post-mortems to drive a culture of continuous improvement and system resilience across the platform.
Data Platform & GKE Enablement
A key differentiator for this role is the support of high-scale data services like BigQuery and Bigtable. You will optimize these services for performance and security, ensuring scalable data workflows and governance. Additionally, you will manage containerized workloads on Google Kubernetes Engine (GKE), implementing reusable modules and templates to improve the developer experience and release velocity.
Summary: You are the GCP authority for NMI’s cloud operations. By mastering Shared VPC networking, hardened IAM controls, and the automation of large-scale data platforms, you provide the secure and reliable foundation necessary for a global leader in payment processing.
Job Features
| Job Category | Cloud Engineering, DevOps |
CereCore provides IT services specifically tailored for the healthcare industry. This Sr. Cloud Engineer role is a specialized position focusing on the intersection of Microsoft 365 (M365), hybrid identity, and enterprise-grade data protection. You will be the technical lead for backup/recovery and disaster recovery strategies, ensuring that critical healthcare data remains available and secure through the use of Rubrik Cloud Data Management.
- Location: Remote (United States)
- Experience: 5+ years in Cloud Systems Engineering.
- Core Tech: M365, Azure AD, Rubrik, PowerShell.
- Specialty: Data Protection, Disaster Recovery (DR), and Identity Integration.
M365 & Azure AD Identity Integration
You will provide expert-level administration for the M365 suite, with a heavy focus on Azure Active Directory (Azure AD) integration. This involves managing hybrid identity environments, ensuring seamless authentication between on-premise workloads and cloud services. You will use PowerShell and Azure CLI to automate complex administrative tasks, user provisioning, and security configurations across the enterprise.
Rubrik Cloud Data Management
A primary responsibility is the administration of Rubrik for data protection. You will design and implement backup and recovery strategies for both cloud-native M365 data (Exchange, SharePoint, OneDrive) and on-premise workloads. This includes managing Rubrik clusters, configuring "SLA Domains" for automated protection, and ensuring that the organization meets its Recovery Point Objectives (RPOs) and Recovery Time Objectives (RTOs).
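The link between an SLA Domain's backup frequency and the RPO can be sketched directly: worst-case data loss equals the backup interval, so any interval at or under the RPO is compliant. A plain-Python sketch (domain names and interval values are hypothetical, not Rubrik's API):

```python
def pick_sla_domain(domains: dict, rpo_minutes: int):
    """domains maps name -> backup interval in minutes. Worst-case data
    loss equals the interval, so any interval <= RPO is compliant; we
    return the least aggressive (cheapest) compliant option."""
    eligible = {name: interval for name, interval in domains.items()
                if interval <= rpo_minutes}
    if not eligible:
        return None     # no configured domain can meet this RPO
    return max(eligible, key=eligible.get)
```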
Disaster Recovery & Operational Excellence
You will lead the development of Disaster Recovery (DR) blueprints to protect against data loss and ransomware. By leveraging Rubrik’s immutable snapshots and Azure’s scalable infrastructure, you will ensure that the platform can recover quickly from catastrophic failures. Your role involves continuous performance tuning, analytical troubleshooting of distributed systems, and maintaining clear documentation for all recovery procedures.
Summary: You are the guardian of data resilience for CereCore’s healthcare clients. By mastering the integration between M365 and Rubrik, you provide a secure, automated, and highly recoverable cloud environment that allows healthcare providers to focus on patient care without the fear of data loss.
Job Features
| Job Category | Cloud Engineering, Information Technology |
SAIC is seeking a results-driven Java Backend Developer to support a high-priority IT modernization effort for a large Federal Agency. This role focuses on breaking down monolithic legacy services into modular, cloud-native Spring Boot applications. You will be a key player in migrating on-premise systems to AWS, utilizing DevSecOps and Lean best practices to ensure the new architecture is scalable and fault-tolerant.
- Location: Remote (HQ in Alexandria, VA)
- Experience: 5+ years in Java Development.
- Clearance: Must be able to obtain a Public Trust clearance.
- Core Tech: Java, Spring Boot, Hibernate, AWS, Oracle, REST.
- Focus: Cloud Migration, Monolith Decomposition, and Agile Delivery.
Cloud-Native Java Development
You will design and code microservices using the Spring Boot framework and Hibernate/JPA for ORM. The role requires a strong grasp of object-oriented principles to ensure code is maintainable and secure. You will transition legacy logic into RESTful web services, utilizing JSON and XML for data exchange and ensuring that all new modules are optimized for a cloud environment.
AWS Migration & Serverless
A primary responsibility is migrating on-premise applications to AWS. You will utilize a wide array of AWS resources, including ECS Fargate for containerized workloads, Lambda for serverless functions, and API Gateway for service orchestration. You will also manage data persistence using RDS (Oracle) and handle asynchronous messaging through SQS and SNS.
DevSecOps & Observability
Working in a fast-paced Agile environment, you will use GitLab for source control and Maven for build automation. To ensure high availability and performance, you will analyze logs using Splunk and monitor data flow performance with Instana. You will participate in the full Agile lifecycle—from story elaboration in JIRA/Rally to sprint reviews and retrospectives—ensuring that every deployment meets federal quality standards.
Summary: You are the architect of modularity for federal IT systems. By decomposing monolithic services into agile, cloud-native Java applications on AWS, you provide the government with the high-performance, secure, and maintainable software needed to serve the public effectively.
Job Features
| Job Category | DevOps, Information Technology |
Peraton is a major national security partner providing mission-critical IT solutions across the federal government. In this role, you will support a federal financial customer by migrating complex data flows into a secure AWS environment. A unique aspect of this position is the integration of legacy file transfer and messaging middleware—such as IBM MQ and Connect:Direct—into modern, cloud-native architectures.
- Salary Range: $86,000 – $138,000 USD
- Location: Remote (home-based) with hybrid flexibility/travel as required.
- Experience: 5+ years (with a degree) or 9+ years (with a high school diploma).
- Clearance: Must be U.S. Citizen / Public Trust.
- Core Tech: AWS (Lambda, Glue, EKS), Python, Ansible, CDK/CloudFormation.
Data Migration & Legacy Integration
You will migrate large-scale data flows from on-premise systems to AWS. This includes managing IBM MQ and Connect:Direct services, ensuring that high-volume file transfers for financial systems remain reliable and secure. You will utilize AWS data services like Glue for ETL processes and S3 for durable storage, bridging the gap between traditional enterprise messaging and cloud-native serverless logic.
Infrastructure as Code (IaC) & Serverless
You will build the migration target environments using AWS CDK, CloudFormation, or Terraform. The role involves writing logic for Lambda and Step Functions to automate complex workflows. By applying Ansible and Python scripting, you will ensure that the infrastructure is version-controlled and deployed through robust CI/CD pipelines, adhering to federal security best practices.
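A Step Functions workflow is essentially a graph of named states linked by "Next" pointers. A toy runner makes the idea concrete (plain Python with illustrative state names, not the AWS SDK or Amazon States Language):

```python
def run_state_machine(states: dict, start: str, data: dict) -> dict:
    """Minimal Step Functions-style runner: each state is a handler
    function plus the name of the next state (None ends execution);
    the payload dict flows from state to state."""
    name = start
    while name is not None:
        handler, next_state = states[name]
        data = handler(data)            # each state transforms the payload
        name = next_state               # follow the 'Next' link
    return data
```

Real Step Functions add retries, branching, and error catching on top, but the payload-through-a-state-graph model is the same.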
Containerization & Orchestration
The project leverages modern container patterns, requiring proficiency in Docker and orchestration via Amazon EKS (Kubernetes) or ECS. You will be responsible for containerizing legacy components where possible and managing their lifecycle, performance tuning, and observability using CloudWatch and CloudTrail to meet strict federal compliance standards.
Summary: You are the bridge between legacy financial systems and the future of cloud computing. By mastering the integration of enterprise MQ services with modern AWS serverless and container technologies, you ensure that vital national financial data flows are secure, resilient, and highly automated.
Job Features
| Job Category | Information Technology, Software Engineering |
GovCIO is a prominent government IT contractor focused on digital transformation and cloud modernization. In this role, you will lead the migration and architectural evolution of a critical federal application into the AWS cloud. This is a senior-level position requiring a deep blend of Java development and DevSecOps engineering to ensure high-scale government systems remain secure, compliant, and highly available.
- Location: Fully Remote (HQ in Alexandria, VA)
- Experience: 8–12+ years in solutions design and engineering.
- Clearance: Must be able to obtain a Public Trust clearance.
- Core Tech: AWS, Java, GitLab CI/CD, Terraform, Ansible, Docker.
- Focus: Cloud Migration, Disaster Recovery, and Blue-Green Deployments.
Cloud Migration & Hybrid Architecture
You will be responsible for the end-to-end cloud transformation of legacy government systems. This involves designing "blueprints" for hybrid cloud infrastructures, focusing on VPC networking, storage, and security topologies. You will advise federal clients on architectural decisions, ensuring that the move to AWS follows industry best practices for service decomposition and microservices.
Automated CI/CD & Infrastructure as Code
You will manage the Infrastructure as Code (IaC) baseline using Terraform and Ansible. Your goal is to build a fully automated software delivery lifecycle—from code commit to production—using GitLab CI/CD. By implementing Blue-Green deployment environments, you ensure that updates can be released with zero downtime and minimal risk to critical government operations.
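The zero-downtime property of Blue-Green comes from staging the release on the idle environment and then flipping a single traffic pointer, so rollback is the same flip in reverse. A toy model of the mechanism (hypothetical class, not a real load balancer):

```python
class BlueGreenRouter:
    """Two identical environments; traffic points at one. A release is
    staged on the idle color, and the cutover is one atomic flip."""
    def __init__(self):
        self.versions = {"blue": "v1", "green": "v1"}
        self.live = "blue"

    @property
    def idle(self):
        return "green" if self.live == "blue" else "blue"

    def deploy(self, version: str):
        self.versions[self.idle] = version   # stage on the idle color only

    def cutover(self):
        self.live = self.idle                # atomic traffic switch

    def serving(self) -> str:
        return self.versions[self.live]
```

Because users keep hitting the old color until the flip, a bad release never serves traffic until it has been verified on the idle side.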
DevSecOps & Compliance
Given the government context, security is integrated into every stage of the project. You will embed security controls into the CI/CD pipelines and work closely with Java and Angular developers to ensure the application meets federal compliance standards. Your responsibilities also include designing Disaster Recovery protocols and maintaining a "secure-by-default" IAM and network posture.
Summary: You are the technical lead for a major federal cloud modernization effort. By combining expert Java coding skills with advanced AWS orchestration and DevSecOps automation, you provide the resilient and secure foundation necessary for government applications to thrive in the cloud era.
Job Features
| Job Category | DevOps, Information Technology |
This role represents the cutting edge of SRE, moving beyond traditional scripting toward Agentic Workflows and Autonomous Infrastructure. You will be responsible for building self-sustaining systems that use AI to eliminate operational toil. This involves integrating Large Language Models (LLMs) and orchestration frameworks directly into the production lifecycle to automate incident response and system scaling.
- Focus: AI Operations (AIOps), Autonomous Agents, and Predictive Observability.
- Core Frameworks: LangChain, LangGraph, n8n, CrewAI, AutoGPT.
- Automation Tools: Airplane.dev, Custom AI Flow Builders.
- Key Metric: Elimination of toil through self-healing systems.
Autonomous Agent Orchestration
You will design and deploy agentic workflows using frameworks like LangGraph or CrewAI. Unlike standard linear automation, these autonomous agents can reason through complex infrastructure alerts, interact with APIs, and execute remediation steps independently. You will be tasked with integrating these "AI Copilots" into production systems to handle routine maintenance and complex multi-step recoveries.
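Stripped of any framework, what separates an agentic workflow from linear automation is that each step is chosen from the latest observation rather than from a fixed script. A deliberately tiny sketch (plain Python; real deployments would drive this with LangGraph or CrewAI, and the playbook and symptoms here are hypothetical):

```python
def run_agent(alert: dict, playbook: dict, execute) -> list:
    """Toy observe-act-replan loop. `playbook` maps a symptom to an
    action; `execute` performs the action and returns the new state.
    The agent keeps re-planning until nothing matches or the cap hits."""
    history = []
    state = dict(alert)
    for _ in range(5):  # hard step cap so an agent can never loop forever
        action = playbook.get(state.get("symptom"))
        if action is None:
            break               # healthy, or beyond the playbook: stop
        history.append(action)
        state = execute(action, state)   # observe the result, then re-plan
    return history
```

The step cap and the audit trail in `history` are the kind of guardrails that make autonomous remediation safe to attach to production.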
AI-Driven Observability & Predictive SLOs
A major component of this role is evolving traditional monitoring into Predictive Observability. You will build LLM-based assistants that help engineers query system state using natural language and design dashboards that predict Service Level Objective (SLO) breaches before they occur. By measuring "everything," you will create the data loops necessary for AI to understand and maintain system health.
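Predicting an SLO breach usually reduces to error-budget burn-rate arithmetic: at burn rate 1.0 the budget lasts exactly the SLO window, at 6.0 it lasts a sixth of it. A minimal sketch of the forecast (the numbers in the test are illustrative):

```python
def hours_until_breach(window_hours: float, budget_consumed: float,
                       burn_rate: float) -> float:
    """burn_rate = observed error rate / allowed error rate.
    budget_consumed is the fraction of the error budget already spent.
    Returns hours until the SLO is breached at the current rate."""
    remaining = max(0.0, 1.0 - budget_consumed)
    if burn_rate <= 0:
        return float("inf")     # no errors: the budget never runs out
    return remaining * window_hours / burn_rate
```

For a 30-day (720-hour) window with half the budget spent and a burn rate of 6, the breach lands in 60 hours, which is exactly the kind of lead time that lets a team act before customers notice.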
Documentation & Communication
Clarity is critical when automating high-stakes infrastructure. You will be responsible for documenting complex AI flow logic and communicating technical resolutions to partners and customers. This ensures that even as the systems become more autonomous, the human operators maintain full visibility and "precision of understanding" regarding how the AI is managing the platform.
Summary: You are at the forefront of the "SRE 2.0" movement. By replacing manual toil with intelligent, agent-driven automation and predictive analytics, you ensure that enterprise-scale systems are not just reliable, but inherently self-sustaining.
Job Features
| Job Category | AI (Artificial Intelligence) |
Tempo is the leading time management and resource planning provider within the Atlassian ecosystem, serving over 30,000 customers and a third of the Fortune 500. As a Senior SRE, you will join the team responsible for the stable foundation upon which all other engineering departments build. This is a "Remote First" role focused on scaling a high-traffic SaaS platform on AWS, championing DevOps culture, and ensuring enterprise-grade availability.
- Location: Remote (United States or Canada)
- Experience: 5+ years in a SaaS environment.
- Core Tech: AWS, Kubernetes, CI/CD, Infrastructure as Code.
- Focus: Observability, Database Administration, and Platform Scalability.
Cloud-Native Infrastructure & Kubernetes
You will own the evolution of Tempo's AWS-based platform, ensuring it scales alongside a rapidly growing customer base. A primary focus is working with Kubernetes and modern cloud-native tools to improve platform extensibility. You will act as a "build champion," implementing architectural changes that increase deployment speed while maintaining the high quality expected by enterprise clients.
Observability & Performance Metrics
A critical part of this role involves deep-diving into system performance. You will implement and manage Real User Monitoring (RUM), distributed tracing, and advanced monitoring pipelines. By analyzing these metrics, you will create automated alerting and recovery systems that minimize downtime and improve the end-user experience across Tempo's suite of integrated solutions.
Database Administration & Automation
Beyond standard infrastructure, you will be responsible for the health of Tempo's data layer. This includes database administration tasks such as provisioning, performance tuning, and troubleshooting complex storage issues. You will automate these key processes—alongside build and release cycles—to ensure that manual intervention is minimized and recovery is predictable.
Summary: You are the architect of stability for Tempo’s global infrastructure. By combining deep AWS and Kubernetes expertise with a mentor-led approach to DevOps, you empower modern teams worldwide to deliver value through highly available and cost-efficient tech solutions.
Job Features
| Job Category | DevOps, Software Engineering |
Vultr is the world’s largest privately held cloud infrastructure company, recently valued at $3.5 billion. Unlike typical DevOps roles that manage a company's internal apps, this position involves building the actual cloud products that Vultr's customers use. You will be working on the "engine room" of the cloud, developing and operating services like Vultr Kubernetes Engine (VKE), Load Balancers (VLB), and AI Inference platforms.
- Compensation: $75,000 – $100,000 USD
- Location: 100% Remote (United States)
- Experience: 3–5+ years in DevOps, SRE, or Cloud Engineering.
- Core Tech: Go (Golang), Kubernetes Internals, Terraform, Ansible.
- Focus: Cloud Provider Infrastructure and Container Runtimes.
Cloud Product Engineering & Go Development
This is a Go-first engineering role. You won't just be using cloud tools; you will be building them. You will contribute directly to Vultr’s open-source ecosystem, including their Terraform Provider, Crossplane integrations, and the vultr-cli. Your work will involve writing code to manage Vultr’s global footprint of Cloud GPUs, Bare Metal, and Cloud Storage.
Deep Kubernetes & Container Internals
You will move beyond high-level orchestration into the internals of the Kubernetes ecosystem. This includes working with the kubelet, custom controllers, and CRDs. A key part of the role involves designing integrations for container runtimes like containerd and runc, ensuring that OCI images deploy reliably and securely across Vultr's 32 global data center locations.
Networking & Load Balancing
You will help develop and enhance Vultr Load Balancers (VLB) and NAT Gateways. This requires a solid understanding of HAProxy, Envoy, or NGINX, and the ability to troubleshoot complex distributed systems. You'll work on the networking layer (CNI) to ensure high-performance connectivity for thousands of active customers worldwide.
Summary: You are building the cloud itself. By mastering Go and the deep internals of Kubernetes and container runtimes, you provide the high-performance infrastructure that powers the next generation of AI innovators and global enterprises.
Job Features
| Job Category | Cloud Engineering, DevOps |
This role is with a fast-growing SaaS provider specializing in high-volume data processing. As a Senior DevOps Engineer, you will be the primary architect of a reliable and scalable Google Cloud Platform (GCP) environment. You will bridge the gap between development and operations by leading initiatives in infrastructure automation, cost optimization, and observability, ensuring that the global platform remains secure and performant.
- Compensation: $100,000 – $140,000 USD
- Location: Remote Local (Boston, MA area)
- Experience: 5+ years in DevOps, SRE, or Platform Engineering.
- Core Tech: Google Cloud Platform (GCP), Terraform, Ansible, Docker, Jenkins.
- Focus: Cloud-Native Scalability, CI/CD, and Observability.
Google Cloud Infrastructure & Automation
You will be responsible for the design and operation of a highly available GCP environment. Using Terraform, you will manage the infrastructure-as-code (IaC) lifecycle, ensuring that all cloud resources are versioned and reproducible. You will also utilize Ansible for configuration management across your Linux fleet, maintaining a standardized and secure server environment for the company’s containerized workloads.
Release Engineering & CI/CD
You will lead the development and maintenance of automated deployment pipelines using Jenkins. Your goal is to enable "paved roads" for software engineers, allowing for fast and reliable deployments of Docker containers. This includes integrating automated testing and security guardrails into the CI/CD flow, reducing the manual effort required for high-frequency SaaS releases.
Observability & Incident Response
To support a global customer base, you will implement and optimize a modern observability stack using tools like Prometheus and Grafana (or GCP Cloud Monitoring). You will define critical alerts and dashboards to monitor system health, participate in on-call rotations, and lead post-incident reviews to drive long-term reliability improvements. You will also mentor junior engineers in these SRE practices to foster a culture of technical excellence.
Summary: You are the technical leader ensuring the stability of a high-growth SaaS platform. By mastering GCP internals and driving automation through Terraform and Ansible, you provide the resilient foundation necessary for global data processing at scale.
Job Features
| Job Category | Cloud Engineering, DevOps |
Cyera is a fast-growing startup reinventing data security for the cloud era. As a DevOps Engineer, you will be a high-impact contributor responsible for the infrastructure and automation that powers their data security platform. You will work across multi-cloud environments, focusing on "DevSecOps" to ensure that as the company scales its Fortune 1000 client base, the platform remains resilient, automated, and compliant with global security standards.
- Location: R&D US Remote
- Experience: 3–5+ years in DevOps, SRE, or Platform Engineering.
- Core Tech: Kubernetes, Docker, Terraform, CI/CD (GitHub Actions/GitLab).
- Cloud Platforms: AWS, GCP, and Azure.
- Focus: Data Security, Infrastructure as Code, and Observability.
Multi-Cloud Infrastructure & Kubernetes
You will design and maintain highly available infrastructure across AWS, GCP, and Azure. Central to this is Kubernetes orchestration, where you will manage containerized workloads to ensure they scale dynamically. By using Terraform for Infrastructure as Code (IaC), you will automate the provisioning of these multi-cloud environments, ensuring consistency across development, staging, and production.
DevSecOps & CI/CD Pipelines
As a security company, Cyera requires security to be "shifted left." You will build CI/CD pipelines using tools like GitHub Actions or GitLab CI, embedding automated security scanning and compliance checks (SAST/DAST) directly into the deployment workflow. Your goal is to increase release velocity without compromising the rigorous requirements of SOC2 or ISO 27001.
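Shift-left gating ultimately means the pipeline fails when scanner findings cross a severity bar. A minimal illustrative gate (the severity names and finding shape are assumptions, not a specific SAST/DAST tool's schema):

```python
SEVERITY = {"low": 1, "medium": 2, "high": 3, "critical": 4}

def gate(findings: list, fail_at: str = "high") -> bool:
    """Return True (pipeline passes) only when every finding sits
    strictly below the threshold severity."""
    bar = SEVERITY[fail_at]
    return all(SEVERITY[f["severity"]] < bar for f in findings)
```

Embedding a check like this in the CI workflow is what turns compliance requirements such as SOC2 controls into an automated, per-commit decision instead of a release-time review.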
Reliability & Observability
You will own the uptime and performance of production services. This involves implementing comprehensive observability stacks using tools like Prometheus, Grafana, and Datadog. You will be responsible for defining alerting thresholds, conducting root cause analysis (RCA) for incidents, and creating automated runbooks to ensure the platform can handle the data security needs of modern enterprises.
Summary: You are the architect of the automated systems that protect enterprise data. By bridging the gap between high-speed software delivery and iron-clad cloud security, you enable Cyera to remain the leader in the next generation of cybersecurity.
Job Features
| Job Category | Cloud Engineering, DevOps, Security |
eSimplicity is a digital services firm that partners with federal agencies to modernize public health systems. This Senior DevOps Engineer role is specifically focused on supporting the Centers for Medicare and Medicaid Services (CMS). You will operate in a large-scale AWS environment, integrating DevSecOps principles into pipelines that handle massive healthcare data sets and critical government reporting tools.
- Salary Range: $106,300 – $136,600 USD
- Location: Fully Remote (Operating on Eastern Time)
- Experience: 8+ years in DevOps, DevSecOps, or Security Engineering.
- Clearance: Must be able to obtain a Public Trust clearance.
- Core Tech: AWS, Terraform, Terragrunt, GitHub Actions, Docker.
- Data Stack: Databricks, Redshift.
Federal DevSecOps & Compliance
A primary focus of this role is ensuring that all systems meet strict federal compliance standards, including FISMA and NIST. You will embed security controls—such as SAST, DAST, and SCA—directly into the software development lifecycle. Working with Java and Python/Django teams, you will automate the remediation of vulnerabilities to protect the data of millions of Americans.
Infrastructure as Code (IaC) with Terragrunt
You will manage cloud infrastructure using Terraform, specifically utilizing Terragrunt to keep your code DRY (Don't Repeat Yourself) and manage remote state across multiple AWS accounts. This ensures that the infrastructure supporting Databricks and Redshift clusters is repeatable, auditable, and secure-by-default, adhering to strict IAM and network security policies.
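Terragrunt keeps configuration DRY by layering per-environment overrides onto a shared root config. The merge semantics can be sketched in a few lines (keys are hypothetical, and this is a conceptual sketch, not Terragrunt's actual implementation):

```python
def merge_config(base: dict, override: dict) -> dict:
    """Deep merge in the spirit of Terragrunt's include/inputs mechanism:
    shared settings live once in a root config, and each environment
    overrides only the keys that differ."""
    out = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = merge_config(out[key], value)   # recurse into nested maps
        else:
            out[key] = value                           # scalar: override wins
    return out
```

Keeping the shared settings in one place is what makes a multi-account AWS estate auditable: an environment's full config is always the root plus a small, reviewable diff.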
Big Data & Container Security
Supporting CMS involves managing high-performance data platforms. You will be responsible for the security and scaling of Databricks and Redshift clusters, ensuring that big data processing remains performant and compliant. Additionally, you will manage Docker container security, implementing hardened images and secure registry workflows to protect the application layer.
Summary: You are a critical guardian of federal healthcare infrastructure. By combining deep AWS expertise with advanced DevSecOps automation and a firm understanding of government compliance, you ensure that CMS can process vital data securely and efficiently to serve the public good.
Job Features
| Job Category | Data, DevOps, Healthcare |
This role is a unique hybrid position within a lean, growing startup environment. Despite the "DevOps" title, the responsibilities lean heavily toward Full Stack Development with a strong emphasis on functional programming and distributed systems. You will be a "generalist" engineer, moving seamlessly between frontend features, backend logic, and the Microsoft Azure infrastructure that supports it.
- Location: Remote - Work from Home
- Experience: 5+ years in software development.
- Core Tech: F#, .NET Ecosystem, Microsoft Orleans, Microsoft Azure.
- Focus: Full-stack development, Distributed Systems, and Architecture.
Functional Programming & The .NET Stack
The most distinct aspect of this role is the use of F# across the entire stack. You will be writing functional-first code within the .NET ecosystem. This approach prioritizes immutability and type safety, both critical for the complex logic required by clinical product leaders and business stakeholders.
Distributed Systems with Microsoft Orleans
You will work with Microsoft Orleans, a "virtual actor" framework designed for building massive-scale distributed systems. This allows the team to handle complex state management and concurrency without the usual overhead of distributed locking. You will be responsible for ensuring these distributed components are performant and resilient within a cloud-native architecture.
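The "virtual actor" idea is that a grain logically always exists and is activated on first use, so callers never create, place, or lock actors themselves. A toy analogue of that activation behavior (Orleans itself is .NET; this plain-Python sketch only mirrors the concept, and the grain names are hypothetical):

```python
class GrainRuntime:
    """Toy virtual-actor runtime: grains are addressed by ID and
    materialized lazily, so the caller's view is 'it always exists'."""
    def __init__(self, grain_class):
        self._grain_class = grain_class
        self._active = {}

    def get(self, grain_id):
        if grain_id not in self._active:              # activate on demand
            self._active[grain_id] = self._grain_class(grain_id)
        return self._active[grain_id]

class CounterGrain:
    """Example grain: per-ID state with no shared locking needed,
    because each grain owns its own state exclusively."""
    def __init__(self, grain_id):
        self.grain_id = grain_id
        self.count = 0

    def increment(self):
        self.count += 1
        return self.count
```

Because each grain's state is touched only through its own activation, concurrency control collapses to per-grain sequencing instead of distributed locks, which is the overhead savings the paragraph describes.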
Cloud Infrastructure & Architecture
While the majority of your time is spent on hands-on development, you will own the Microsoft Azure infrastructure. This includes designing the architecture for new services, conducting code reviews with a focus on cloud best practices, and providing technical support for third-party vendor integrations. Your goal is to ensure the infrastructure scales alongside the growing business needs.
Summary: You are a key technical pillar in a collaborative startup. By leveraging the power of F# and the scalability of Microsoft Orleans on Azure, you bridge the gap between high-level clinical requirements and robust, distributed software architecture.
Job Features
| Job Category | Full Stack Developer |