Site Reliability / GitOps Engineer
An opportunity is available for a Site Reliability / GitOps Engineer to join the Information Systems (IS) team at Canonical, the leading provider of Ubuntu and open-source software to global enterprise and technology markets. The role suits an "automation-first" technologist who wants to manage and evolve the core IT production services used by over 60 million Ubuntu users worldwide.
This is a full-time, remote position, available globally in any timezone.
Role Summary and Automation Leadership Mandate
The Site Reliability / GitOps Engineer will take operations automation further across Canonical's private and public clouds. The role combines deep hands-on expertise in infrastructure as code (IaC) with software development practices to ensure the reliability and scalability of Canonical's services and products.
As a Site Reliability / GitOps Engineer, you will:
- IaC & Automation: Apply your IaC experience to develop the infrastructure-as-code practice within IS, continually increasing automation and improving IaC processes. Automate software operations for reusability and consistency across private and public clouds.
- Resilience & Development: Develop new features and improve the resilience and scalability of the existing cloud and container portfolio. You’ll be given uninterrupted development time to focus on large-scale projects and automation of manual tasks.
- Operational Responsibility: Maintain operational responsibility for all of Canonical’s core services, networks, and infrastructure. Carry final responsibility for time-critical escalations.
- Observability & Troubleshooting: Develop your skills in troubleshooting, capacity planning, and performance investigation. Set up, maintain, and use observability tools such as Prometheus, Grafana, and Elasticsearch.
- Collaboration & Improvement: Collaborate with development teams on service architecture, documentation, playbooks, and operational procedures. You will also help improve Canonical products and open-source technologies by providing critical feedback: filing bug reports and, where appropriate, submitting pull requests.
- GitOps Practice: Use version control, peer review, and CI/CD to roll out changes to both applications and infrastructure, defining operations entirely in code.
Required Experience and Technical Qualifications
The ideal candidate is a Linux and automation expert with a strong modern engineering background, capable of operating distributed systems and solving complex, full-stack problems.
- IaC & GitOps Expertise: Deep experience defining operations in code, using version control, peer review, and CI/CD to roll out changes.
- Engineering Background: Strong modern engineering background (peer review, unit testing, SCM, CI/CD, Agile).
- Programming: Python software development experience, ideally on large projects.
- Linux & Networking: Practical knowledge of Linux networking, routing, and firewalls. Hands-on experience administering enterprise Linux servers.
- Systems Knowledge: Affinity with various forms of Linux storage (from Ceph to databases). Proficiency with cloud computing concepts and technologies.
- Education: Bachelor’s degree or greater, preferably in computer science or a related engineering field.
- Attributes: Motivated and able to troubleshoot from kernel to web. Passionate about and familiar with open source, especially Ubuntu or Debian.
Job Features
Job Category: Information Technology, Product Management, Software Engineering