Site Reliability Engineer (Remote, USA)

Remote
United States
Posted 1 month ago

Frontdoor, the parent company of American Home Shield and the cutting-edge home maintenance app, is seeking an experienced Site Reliability Engineer (SRE). This is a full-time, virtual/remote position in the USA, focused on applying software engineering principles to solve operational challenges, automate toil, and keep its home service platform resilient and always available.

The salary range for this role is $123,000 to $150,000, based on experience, qualifications, and location.


Key Responsibilities and Technical Focus

This SRE role is responsible for the full lifecycle of infrastructure reliability, from automation design to incident response and capacity planning.

  • Automation & Toil Reduction: Research, design, and implement solutions to build resilient services, reducing manual toil through extensive automation. This includes integrating and automating existing manual solutions and processes.
  • Infrastructure & Orchestration: Build and maintain automation tooling for infrastructure, CI/CD, and observability (monitoring, alerting, logging, tracing). You will also build and maintain cloud and container orchestration infrastructure.
  • Collaboration & Best Practices: Collaborate with software engineering, security, and systems teams to streamline operations and implement DevOps best practices across the organization, improving performance and efficiency.
  • Operations & Support: Participate in an on-call rotation for production issue escalations, troubleshoot and support production issues, and investigate anomalies/outages to determine root cause.
  • Capacity Planning: Assist with planning for the growth and capacity of the infrastructure.

Required Experience and Technical Skills

The ideal candidate is a seasoned DevOps professional with 5+ years of hands-on experience and deep knowledge across cloud, networking, and modern automation tools.

  • Experience: 5+ years of hands-on DevOps experience required, including 2+ years managing production infrastructure on any cloud and 2+ years developing code.
  • Core Systems: Good understanding of Unix/Linux operating systems and internals, and core concepts of computer networking (TCP/UDP, IP Routing, DNS).
  • Cloud & IaC: Hands-on experience with at least one cloud service provider (GCP, AWS, or Azure). Deep understanding of, and experience with, Infrastructure as Code (IaC) using tools such as Terraform, CloudFormation, or Ansible.
  • Scripting & Coding: Proficient with the Linux CLI and shell scripting (sh/bash), plus at least one programming language such as Python or Go.
  • Containers & CI/CD: Working knowledge of containers and at least one container orchestration platform (Kubernetes, Nomad, Mesos, or Swarm). Experience with at least one CI/CD tool (Jenkins, Travis CI, CircleCI, or GitLab).
  • Security & Networking: Experience with Palo Alto, F5, cloud firewalls, load balancers, WAF, Akamai, and related products/technologies.

Preferred skills include specific experience with AWS and GCP, Terraform, Kafka, Git/GitLab, and Kubernetes; good working knowledge of the Istio service mesh; and experience with Infoblox Grid Manager.

Job Features

Job Category: Software Engineering
