Site Reliability Engineer III – Cloud Infrastructure & Platform
Vimeo, a major video platform that forms part of the internet’s infrastructure, is seeking a Site Reliability Engineer III to join its Site Reliability & Infrastructure Engineering team. The role spans platform engineering, database administration, release engineering, and internal tools, with a focus on designing, developing, deploying, maintaining, and optimizing the platform that powers Vimeo.
This is a full-time position, available in New York, NY, or remotely within the US. The base salary range is $130,000 – $178,750 in major metro areas, or $117,000 – $160,875 in all other US cities.
Role Summary and Reliability/Platform Mandate
You will work with cloud infrastructure at massive scale, focusing on optimizing performance, driving down outages, and building robust internal toolkits. The goal is to make Vimeo faster, simpler, more scalable, more reliable, and more efficient to operate.
What You’ll Do:
- Platform Evolution: Build, secure, and evolve platforms that power Vimeo’s applications.
- Tooling & Automation: Build and maintain tooling that makes manual infrastructure work obsolete and enables self-service for hundreds of engineers.
- Reliability: Improve the observability and reliability of applications to minimize outages and reduce MTTA and MTTR.
- Internal Platform: Contribute to an internal self-service infrastructure platform used by all engineers for application development and deployment.
- On-Call: Participate in a weekly on-call rotation shared between US and India offices, responding to production incidents and providing internal support.
- Documentation: Write and maintain thorough documentation to ensure the global team functions as a cohesive unit.
Required Experience and Technical Qualifications
The ideal candidate possesses deep expertise in distributed systems architecture, expert-level Kubernetes administration, database management (MySQL), and significant experience with major cloud providers.
- Experience (Mandatory):
  - At least 3 years of professional experience in software development or DevOps.
  - High proficiency in at least one general-purpose programming language (C/C++, Go, Java, Ruby, PHP, Python, etc.).
  - Significant experience with major cloud providers (Google Cloud, AWS).
  - Experience with “Infrastructure as Code” platforms such as Terraform.
  - Experience with observability systems (e.g., Datadog, Grafana, Prometheus).
- Core Technical Expertise:
  - Deep understanding of the architectural patterns of high-scalability distributed systems.
  - Expert-level proficiency in maintaining, optimizing, and administering Kubernetes deployments.
  - Significant experience with deploying and administering MySQL.
  - Strong knowledge of container orchestration, Linux system internals, networking, and secure computing.
- Bonus Skills (Nice to Have):
  - Knowledge of ArgoCD, Atlantis, Varnish, Memcached, and/or Chef.
  - Experience with generalized or language-specific build systems (make, bazel, etc.).
Job Features
Job Category: Cloud Engineering