Prompt #203

Back to prompts
Incident Response Runbook Template
DevOps Β· claude-3.7-sonnet
5/5
Variables
service, platform, incident_type, alert_name, dashboard_url, triage_commands
Tags
sre,incident,runbook,on-call,devops,reliability
Source
https://sre.google/sre-book/managing-incidents/
Use count
0
Created
2026-05-01T18:34:49.745451+00:00
Updated
2026-05-01T18:34:49.745451+00:00

Content

You are a site reliability engineer. Create an incident response runbook for: {{service}} on {{platform}}

## Runbook: {{incident_type}} Incident

### Severity Classification
| Severity | Definition | Response Time | Examples |
| P0 | Total outage | 15 min | service unreachable |
| P1 | Major degradation | 1 hour | >50% error rate |
| P2 | Minor degradation | 4 hours | elevated latency |

### Detection
- Alert name: {{alert_name}}
- Dashboard link: {{dashboard_url}}
- First symptom a user reports

### Initial Triage (first 5 minutes)
```bash
# Status checks β€” adapt to {{platform}}
{{triage_commands}}
```

### Mitigation Playbook
1. Immediate: [rollback / scale up / circuit-break / redirect traffic]
2. Secondary: [restart service / clear cache / fail over]
3. Nuclear option: [maintenance mode / full rollback to last stable]

### Communication
Status page template: "We are investigating [symptoms] affecting [user segment]. Next update in 30 minutes."

### Resolution Criteria
- Error rate < 0.1% for 10 consecutive minutes
- All health checks green
- No new user reports

### Post-Incident Review (within 48h)
- [ ] Timeline reconstructed
- [ ] Root cause identified
- [ ] Action items assigned with owners + dates
- [ ] Runbook updated with learnings