How SRE Teams Use Game Days to Build Incident Muscle Memory

In this episode of The Site Reliability Podcast, Lucas and Luna explore how SRE teams use Game Days (simulated incidents run in a controlled environment) to build muscle memory and improve real-world incident response. They explain why Netflix's Chaos Monkey was just the beginning and how modern teams now rehearse everything from network partitions to database failovers. The conversation covers the key elements of a successful Game Day: a blameless culture, clear objectives, and a 'no surprises' wrap-up. Lucas cites a concrete example: a 2025 study by Gremlin found that teams running quarterly Game Days reduced mean time to resolution (MTTR) by 34 percent. They also discuss common pitfalls, such as over-engineering scenarios and failing to include non-engineering stakeholders. Listeners walk away with a practical template for starting their own Game Day program, including the three questions every drill should answer: What did we learn? What broke? What do we fix next?

Keep every episode free: buymeacoffee.com/fexingo