Site Reliability Engineer – Tekmetric (Remote / Hybrid)

Posted 3 weeks ago

Tekmetric is a high-growth, mission-driven platform providing all-in-one shop management software for the auto repair industry. As an SRE, you will be a “builder” in a culture that values extreme ownership and curiosity. You will architect the scalable AWS/GCP infrastructure that powers everything from digital vehicle inspections to payment processing, moving away from manual “grind” toward a philosophy of “automate everything.”

  • Location: Remote / Hybrid (Requires attendance at periodic in-person offsites)
  • Experience: 5+ years in SRE or DevOps roles.
  • Core Tech: AWS (or GCP), Kubernetes, Docker, Terraform.
  • Stack Focus: CI/CD Pipelines, Prometheus, Grafana, ELK Stack.
  • Culture: High-impact, direct communication, and a “winning together” mindset.

Scalable Infrastructure & Orchestration

You will design and maintain the cloud infrastructure (primarily AWS) that supports Tekmetric’s rapid scaling. This requires deep expertise in containerization (Docker) and orchestration with Kubernetes. Using Terraform for Infrastructure as Code (IaC), you will keep the environment modular, repeatable, and secure by default, supporting a seamless user experience for auto repair shops nationwide.

Observability & “Automate Everything”

A core responsibility is developing a comprehensive observability stack using Prometheus, Grafana, and the ELK Stack. You will move beyond simple monitoring to create intelligent alerting and automated incident response practices. By building robust CI/CD pipelines, you will improve the speed and consistency of code deployments, so that “winning” for the customer is backed by a highly reliable system.

High Availability & Disaster Recovery

To ensure business continuity for thousands of repair shops, you will implement and manage advanced Disaster Recovery (DR) and failover processes. This includes designing backup solutions and recovery pipelines that meet strict recovery time objectives. You will also provide mentorship to junior team members, fostering a culture of continuous learning and technical excellence within the engineering organization.

Summary: You are the architect of reliability for the auto repair industry’s leading cloud platform. By replacing manual tasks with sophisticated automation and building a transparent, observable infrastructure, you empower shop owners to move “above the grind” and focus on their own customers.

Job Features

Job Category: DevOps
