Prompt #203
Incident Response Runbook Template
- Variables: service, platform, incident_type, alert_name, dashboard_url, triage_commands
- Tags: sre, incident, runbook, on-call, devops, reliability
- Source: https://sre.google/sre-book/managing-incidents/
- Use count: 0
- Created: 2026-05-01T18:34:49.745451+00:00
- Updated: 2026-05-01T18:34:49.745451+00:00
Content
You are a site reliability engineer. Create an incident response runbook for: {{service}} on {{platform}}
## Runbook: {{incident_type}} Incident
### Severity Classification
| Severity | Definition | Response Time | Examples |
| --- | --- | --- | --- |
| P0 | Total outage | 15 min | service unreachable |
| P1 | Major degradation | 1 hour | >50% error rate |
| P2 | Minor degradation | 4 hours | elevated latency |
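
The table above can be read as a simple decision rule. Below is a minimal Python sketch of that reading, not a definitive implementation. The function name and its parameters (`reachable`, `error_rate`, `latency_elevated`) are hypothetical and do not come from the template. Only the 50% threshold and the response times are taken from the table rows.

```python
from typing import Optional

# Response times from the table, converted to minutes.
RESPONSE_TIME_MINUTES = {"P0": 15, "P1": 60, "P2": 240}


def classify_severity(
    reachable: bool, error_rate: float, latency_elevated: bool
) -> Optional[str]:
    """Map observed symptoms to a severity label from the table.

    error_rate is a fraction in [0, 1]. All parameter names are
    hypothetical; adapt them to whatever your monitoring exposes.
    """
    if not reachable:
        return "P0"  # total outage: service unreachable
    if error_rate > 0.5:
        return "P1"  # major degradation: >50% error rate
    if latency_elevated:
        return "P2"  # minor degradation: elevated latency
    return None  # no example row in the table matches


severity = classify_severity(reachable=True, error_rate=0.62, latency_elevated=False)
print(severity, RESPONSE_TIME_MINUTES[severity])
```

The table gives no example for an error rate at or below 50% without elevated latency, so the sketch returns `None` in that case rather than guessing a severity.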
### Detection
- Alert name: {{alert_name}}
- Dashboard link: {{dashboard_url}}
- First symptom a user is likely to report: [describe]
### Initial Triage (first 5 minutes)
```bash
# Status checks: adapt to {{platform}}
{{triage_commands}}
```
### Mitigation Playbook
1. Immediate: [rollback / scale up / circuit-break / redirect traffic]
2. Secondary: [restart service / clear cache / fail over]
3. Nuclear option: [maintenance mode / full rollback to last stable]
### Communication
Status page template: "We are investigating [symptoms] affecting [user segment]. Next update in 30 minutes."
### Resolution Criteria
- Error rate < 0.1% for 10 consecutive minutes
- All health checks green
- No new user reports
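
The first criterion above can be checked mechanically. The following Python sketch assumes one error-rate sample per minute, expressed as a fraction and ordered oldest first. The function and parameter names are hypothetical. The health-check and user-report criteria still need their own checks.

```python
from typing import Sequence


def error_rate_resolved(
    per_minute_error_rates: Sequence[float],
    threshold: float = 0.001,  # 0.1% expressed as a fraction
    window_minutes: int = 10,
) -> bool:
    """Return True if the most recent window of samples is all below threshold.

    Assumes one sample per minute, oldest first. With fewer samples than
    the window, the criterion cannot be confirmed, so this returns False.
    """
    if len(per_minute_error_rates) < window_minutes:
        return False
    recent = per_minute_error_rates[-window_minutes:]
    return all(rate < threshold for rate in recent)


# A spike followed by ten quiet minutes satisfies the criterion.
samples = [0.4] + [0.0002] * 10
print(error_rate_resolved(samples))
```

Only the trailing window is inspected, so an earlier spike does not block resolution once ten consecutive quiet minutes have passed.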
### Post-Incident Review (within 48h)
- [ ] Timeline reconstructed
- [ ] Root cause identified
- [ ] Action items assigned with owners and due dates
- [ ] Runbook updated with learnings