As the Lead for Site Reliability Engineering at Sun International, you'll ensure our iGaming platform runs smoothly and reliably. You'll lead a team, define key performance metrics, and promote a culture of automation and continuous improvement.
Dynamic and performance-driven, with a focus on innovation and teamwork.
The Lead: Site Reliability Engineering is responsible for establishing and driving the SRE discipline within Sun International. The role owns platform reliability, availability, and performance across all production environments, ensuring the iGaming platform delivers a seamless, always-on experience for players and driving shared ownership of reliability across engineering and product teams.

The role will define SLOs and error budgets, build observability and incident management capabilities, drive infrastructure-as-code adoption, and lead a team of Site Reliability Engineers. The role bridges development and operations, embedding reliability practices into the software delivery lifecycle and championing a culture of automation, resilience, and continuous improvement.

**Core behavioural & technical / proficiency competencies:**

* SRE Principles (SLOs, SLIs, error budgets, toil reduction)
* Azure Cloud Platform (compute, networking, storage, PaaS, security)
* Infrastructure as Code (Terraform, Bicep, ARM templates)
* Observability & Monitoring (Grafana, Prometheus, Azure Monitor, Application Insights)
* Containerisation & Orchestration (Docker, Kubernetes / AKS)
* Incident Management & Disaster Recovery
* CI/CD & Deployment Strategies (Azure DevOps, blue-green, canary)
* Scripting & Automation (PowerShell, Bash, Python)
* Security Hardening & Compliance (Defender for Cloud, PCI-DSS)
* Cost Optimisation & FinOps

**Qualifications:**

* Bachelor’s degree in Computer Science, Software Engineering, Information Technology, or a related field (required)

**Experience:**

* 10 years’ experience in software development, DevOps, or site reliability engineering, inclusive of 3 years in a leadership or senior technical role
* Proven experience defining and operating SLOs, SLIs, and error budgets in a production environment
* Deep knowledge of Azure cloud services (App Services, AKS, Azure SQL, Azure Monitor, Application Insights, Key Vault, Front Door, Traffic Manager)