{"id":462,"date":"2025-10-27T20:41:28","date_gmt":"2025-10-27T20:41:28","guid":{"rendered":"https:\/\/skillbasedmatching.com\/jobs\/?post_type=jobpost&#038;p=462"},"modified":"2025-10-27T20:41:32","modified_gmt":"2025-10-27T20:41:32","slug":"site-reliability-engineer-sre-reliability-platform","status":"publish","type":"jobpost","link":"https:\/\/skillbasedmatching.com\/jobs\/current-jobs\/site-reliability-engineer-sre-reliability-platform\/","title":{"rendered":"Site Reliability Engineer (SRE) \u2013 Reliability Platform"},"content":{"rendered":"\n<p><strong>Zapier<\/strong>, a company building a platform for automation and AI that helps millions of businesses around the world scale, is seeking a <strong>Site Reliability Engineer (SRE)<\/strong>. This high-impact role is on the <strong>Reliability Platform team<\/strong>, which owns observability, incident response, and service ownership, with the mission of strengthening Zapier&#8217;s reliability posture at scale.<\/p>\n\n\n\n<p>This is a <strong>Full-time, Remote<\/strong> position, specifically for the <strong>NAMER (West Coast)<\/strong> region. The salary range is <strong>$141,000 \u2013 $211,700 annually<\/strong>.<\/p>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Role Summary and Observability Mandate<\/h3>\n\n\n\n<p>This SRE role goes beyond typical infrastructure work, focusing heavily on <strong>observability, incident response, and coding<\/strong> to build systems that make Zapier more resilient. 
You&#8217;re expected to thrive at writing production-grade code and to proactively find ways to reduce toil and automate repetitive work.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Things You\u2019ll Do:<\/h4>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Platform Tooling:<\/strong> Build and improve platform tooling that helps Zapier engineers <strong>observe and operate their services<\/strong>.<\/li>\n\n\n\n<li><strong>Observability Evolution:<\/strong> Operate and evolve core observability systems, including <strong>logging, metrics, alerting, and dashboards<\/strong>, using tools like <strong>Grafana, Datadog, OpenSearch, and Prometheus<\/strong>.<\/li>\n\n\n\n<li><strong>Incident Response:<\/strong> Participate in the team\u2019s <strong>on-call rotation<\/strong> and contribute to the broader incident response program by improving the processes, tooling, and practices used to detect, respond to, and learn from incidents.<\/li>\n\n\n\n<li><strong>Automation &amp; Infra:<\/strong> Write code to automate operations, improve developer experience, and contribute to infrastructure reliability using <strong>AWS, Kubernetes, and Terraform<\/strong>.<\/li>\n\n\n\n<li><strong>Best Practices:<\/strong> Review instrumentation designs, suggest improvements, and advocate for effective alerting to raise the bar on observability and reliability across product teams.<\/li>\n\n\n\n<li><strong>AI Exploration:<\/strong> Explore and pilot <strong>AI-augmented tools<\/strong> (e.g., debugging agents, alert correlation) to improve reliability workflows.<\/li>\n<\/ul>\n\n\n\n<hr class=\"wp-block-separator has-alpha-channel-opacity\"\/>\n\n\n\n<h3 class=\"wp-block-heading\">Required Experience and Technical Qualifications<\/h3>\n\n\n\n<p>The ideal candidate is an experienced engineer with a strong coding background, deep familiarity with the cloud-native SRE stack, and a proactive, problem-solving mindset.<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li><strong>Experience (Mandatory):<\/strong>\n<ul 
class=\"wp-block-list\">\n<li><strong>4+ years<\/strong> in systems, infrastructure, or backend software roles (SaaS, cloud-native environments preferred).<\/li>\n\n\n\n<li><strong>Hands-on experience with observability<\/strong> (metrics, logging, dashboards, alerts) and the ability to reason about instrumentation and alert design.<\/li>\n\n\n\n<li>Comfortable jumping into incidents, diagnosing issues across telemetry, coordinating the response, and contributing to postmortems.<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Core Technical Stack:<\/strong>\n<ul class=\"wp-block-list\">\n<li>Thrives at writing production-grade code in <strong>Go, Python, or equivalent<\/strong>.<\/li>\n\n\n\n<li>Experience with <strong>Infrastructure-as-Code (Terraform or equivalent)<\/strong>.<\/li>\n\n\n\n<li>Experience with cloud (<strong>AWS<\/strong>) and container orchestration (<strong>Kubernetes<\/strong>).<\/li>\n<\/ul>\n<\/li>\n\n\n\n<li><strong>Attitude:<\/strong> Thinks proactively about <strong>reducing toil<\/strong> and is comfortable influencing peers by suggesting better practices and driving cross-team improvements. Approaches new tools and ideas (especially <strong>AI in reliability<\/strong>) with curiosity and openness.<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Zapier, a company building a platform for automation and AI that helps millions of businesses around the world scale, is seeking a Site Reliability Engineer (SRE). This high-impact role is on the Reliability Platform team, which owns observability, incident response, and service ownership, with the mission of strengthening Zapier&#8217;s reliability posture at scale. 
This is a Full-time, [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"menu_order":0,"template":"","jobpost_category":[1294,46],"jobpost_job_type":[39],"jobpost_location":[],"jobpost_tag":[2082,2081,188,1144,997,1263,1949,1261,1232,1002,1259,24,2080,2083,1018,1020,1004],"class_list":["post-462","jobpost","type-jobpost","status-publish","hentry","jobpost_category-cloud-engineering","jobpost_category-data","jobpost_job_type-remote","jobpost_tag-ai-in-reliability","jobpost_tag-automation-platform","jobpost_tag-aws","jobpost_tag-cloud-native","jobpost_tag-datadog","jobpost_tag-go","jobpost_tag-grafana","jobpost_tag-incident-response","jobpost_tag-kubernetes","jobpost_tag-observability","jobpost_tag-prometheus","jobpost_tag-python","jobpost_tag-remote-west-coast","jobpost_tag-service-ownership","jobpost_tag-site-reliability-engineer","jobpost_tag-sre","jobpost_tag-terraform"],"_links":{"self":[{"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/jobpost\/462","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/jobpost"}],"about":[{"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/types\/jobpost"}],"author":[{"embeddable":true,"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/users\/1"}],"wp:attachment":[{"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/media?parent=462"}],"wp:term":[{"taxonomy":"jobpost_category","embeddable":true,"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/jobpost_category?post=462"},{"taxonomy":"jobpost_job_type","embeddable":true,"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/jobpost_job_type?post=462"},{"taxonomy":"jobpost_location","embeddable":true,"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/jobpost_location?post=462"},{"taxonomy":"jobpost_tag","embeddable":true,"href":"https:\/\/skillbasedmatching.com\/jobs\/wp-json\/wp\/v2\/jobpost_tag?post=462"}],"c
uries":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}