Vurke Inc.

Site Reliability Engineer (SRE)

Contractor in Technology (Software, IT, AI, Internet)
  • Post Date: March 24, 2026
  • Apply Before: March 24, 2027

Job Detail

  • Job ID: 12002
  • Employment type: Full Time
  • Location: Onsite

Job Description

Job Title: Site Reliability Engineer
Department: Infrastructure / Platform Engineering
Reports To: Head of Engineering / Cloud Architect

Role Overview

We are hiring a Site Reliability Engineer to ensure system stability, performance, observability, and resilience across production environments. This role will focus on uptime, incident response, automation, performance tuning, and service reliability.

Key Responsibilities

  • Monitor system availability, reliability, latency, and performance
  • Build and improve observability across logs, metrics, and tracing
  • Participate in incident management, root cause analysis, and recovery planning
  • Improve system resilience, failover readiness, and operational maturity
  • Define service level objectives (SLOs), service level indicators (SLIs), and related reliability metrics
  • Partner with DevOps and development teams to improve production readiness
  • Automate operational tasks and reduce manual toil
  • Support capacity planning and performance optimization

Required Qualifications

  • Strong experience with Linux systems, cloud environments, and production operations
  • Experience with monitoring and observability tools
  • Knowledge of incident response, reliability practices, and service health monitoring
  • Experience with scripting or coding for automation
  • Strong troubleshooting and systems-thinking skills

Preferred Qualifications

  • Experience with Kubernetes, Prometheus, Grafana, ELK, Datadog, or similar platforms
  • Familiarity with distributed systems and high-availability design
  • Experience working in always-on or client-critical production environments
