Summary: The Site Reliability Engineer position is based in London, UK, and operates on a hybrid model requiring 2-3 days per week onsite. The role involves working with the Monitoring and Observability team to manage observability solutions and legacy monitoring stacks within Citi. Candidates should possess essential skills in OpenShift/Kubernetes, Grafana, and Prometheus.
Job Title - Site Reliability Engineer
Location - London, UK
Mode - Hybrid (2-3 days/week onsite)
Type - Contract (inside IR35)
Job Details :- The Monitoring and Observability team operates with a global footprint and is responsible for:
- Collaborating across various organizations to understand and develop observability solutions for enterprise-wide deployment at scale.
- Managing the legacy monitoring stack across the Production Management organization within Citi.
- Driving the strategic delivery of end-to-end observability solutions.
Essential Skills :-
- OpenShift/Kubernetes Administration.
- Grafana & Observability Stack: Working knowledge of Grafana backend components: Mimir (metrics), Loki (logs), and Tempo (traces). Experience with Prometheus for metric collection and PromQL for querying.
- Helm Chart Management: Experience with Helm for deploying applications, including creating, modifying, and managing Helm charts, library charts, and dependencies.