Runbook — Security Incident Response
Use this in the moment. This is the first-responder checklist; the policy framework lives at
docs/security-compliance/incident-response-plan.md.
Severity decision (one sentence)
If PHI may have been accessed by an unauthorized party OR the site is fully down to customers OR there is active exploitation in progress → SEV-0. Page the IC immediately.
For the full SEV-0/1/2/3 matrix see the IRP severity classification.
Page the IC (Incident Commander)
| Method | Use when |
|---|---|
| Text Taha at his personal phone | First contact for SEV-0 / SEV-1, day or night |
| Email [email protected] + [email protected] simultaneously | First contact for SEV-2 |
| Google Chat in the team space | SEV-3 |
The Incident Commander acknowledges within the notification window: 15 min for SEV-0, 1 h for SEV-1, 4 h for SEV-2, 24 h for SEV-3.
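The notification windows can be expressed as a small lookup, which may be useful in a paging script or reminder bot. This is a minimal sketch, assuming GNU `date`; the function names `ack_window_minutes` and `ack_deadline` are hypothetical and not part of any existing tooling.

```bash
# Acknowledgment window in minutes for a severity tier (0-3),
# taken from the notification windows stated in this runbook.
ack_window_minutes() {
  case "$1" in
    0) echo 15 ;;
    1) echo 60 ;;
    2) echo 240 ;;
    3) echo 1440 ;;
    *) echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# Print the UTC time by which the IC should have acknowledged.
# $1 = severity tier, $2 = page time in UTC (any format GNU date accepts).
ack_deadline() {
  local minutes paged_epoch
  minutes="$(ack_window_minutes "$1")" || return 1
  paged_epoch="$(date -u -d "$2" +%s)" || return 1
  date -u -d "@$((paged_epoch + minutes * 60))" +%Y-%m-%dT%H:%M:%SZ
}
```

For example, `ack_deadline 0 "2025-01-01 00:00:00"` prints the time 15 minutes after the page was sent.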
Step 1 — Detect + open incident channel
When the IC acks:
- Open a private incident channel: Google Chat space named `🚨 sev-N incident YYYY-MM-DD <short slug>`.
- Invite: IC + Compliance Liaison (Asad) + Comms Lead (Ian) + the team member who detected the incident.
- Pin the incident summary at the top: 1 sentence on what was detected + 1 sentence on initial impact estimate.
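To keep channel names consistent under pressure, the naming convention can be generated rather than typed. A minimal sketch; the function name `incident_channel_name` is hypothetical, and defaulting to today's UTC date is an assumption (the runbook does not say which time zone the date uses).

```bash
# Build the incident channel name in the runbook's convention:
#   🚨 sev-N incident YYYY-MM-DD <short slug>
# $1 = severity tier (0-3), $2 = short slug, $3 = date (optional; defaults to today, UTC).
incident_channel_name() {
  local sev="$1" slug="$2" day="${3:-$(date -u +%F)}"
  case "$sev" in
    [0-3]) ;;
    *) echo "severity must be 0-3" >&2; return 1 ;;
  esac
  printf '🚨 sev-%s incident %s %s\n' "$sev" "$day" "$slug"
}
```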
Step 2 — Contain
The IC + Engineering Responder execute, in this order:
Stop the bleeding. Pick the smallest action that stops the immediate damage:
- If a credential is suspected compromised → rotate it: `aws secretsmanager update-secret --secret-id <id> --secret-string <new>`, then deploy a task-def update.
- If an Atlas user is suspected compromised → reset the user's password (`atlas dbusers update <username> --password <random>`) or disable the user.
- If a route is exposing data → disable the route with a feature-flag toggle or a temporary `503` deploy.
- If a source IP is hostile → block it at AWS WAF and remove it from the Atlas IP allowlist.
- If an ECS task is compromised → stop the task (`aws ecs stop-task --task <arn>`). ECS replaces it automatically; the replacement task does NOT inherit the suspect task's state.
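The credential-rotation and task-stop actions above can be wrapped in small functions so the responder only supplies identifiers. This is a sketch under stated assumptions, not the team's actual tooling: the function names are hypothetical, the cluster and service names are placeholders you pass in, and forcing a new deployment is only one way to get tasks to pick up the new value (it assumes the task definition references the secret so that tasks read it at start). The new secret value is read from a file via the AWS CLI's `file://` syntax so it does not land in shell history.

```bash
# Rotate a suspected-compromised secret, then force a new ECS deployment so
# replacement tasks start with the new value.
# $1 = secret id, $2 = file holding the new value, $3 = cluster, $4 = service.
rotate_secret_and_redeploy() {
  aws secretsmanager update-secret \
    --secret-id "$1" \
    --secret-string "file://$2" &&
  aws ecs update-service \
    --cluster "$3" --service "$4" --force-new-deployment
}

# Stop a compromised task and record why.
# $1 = cluster, $2 = task ARN or id, $3 = reason (e.g. the incident slug).
stop_compromised_task() {
  aws ecs stop-task --cluster "$1" --task "$2" --reason "$3"
}
```

Record each command and its timestamp in the incident channel as you run it, since these actions become "Containment action" rows in the post-mortem timeline.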
Preserve evidence. Do NOT delete logs. Do NOT clean up. Snapshot first:
- Atlas: `atlas backups snapshots create askflorence-prod-01 --desc "incident-<slug>"`.
- S3: ensure bucket versioning is on (default for stateful buckets); take an inventory of suspect objects via `aws s3api list-objects-v2`.
- CloudWatch: log groups have 90-day default retention; capture relevant streams via `aws logs filter-log-events` and save to the incident channel.
- GuardDuty: capture finding ARNs in the incident channel.
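The CloudWatch and S3 capture steps can be sketched as follows. `aws logs filter-log-events` takes `--start-time` and `--end-time` in epoch milliseconds, so a small conversion helper is included. The function names, log group, bucket, and prefix are hypothetical placeholders, and the timestamp conversion assumes GNU `date`.

```bash
# Convert a UTC timestamp to epoch milliseconds, the unit that
# `aws logs filter-log-events` expects for --start-time / --end-time.
to_epoch_ms() {
  local secs
  secs="$(date -u -d "$1" +%s)" || return 1
  echo "${secs}000"
}

# Save CloudWatch log events for the incident window to a local file.
# $1 = log group, $2 = window start (UTC), $3 = window end (UTC), $4 = output file.
capture_log_events() {
  aws logs filter-log-events \
    --log-group-name "$1" \
    --start-time "$(to_epoch_ms "$2")" \
    --end-time "$(to_epoch_ms "$3")" \
    --output json > "$4"
}

# Report the bucket's versioning status, then inventory objects under a prefix.
# $1 = bucket, $2 = prefix, $3 = output file.
inventory_suspect_objects() {
  aws s3api get-bucket-versioning --bucket "$1" &&
  aws s3api list-objects-v2 --bucket "$1" --prefix "$2" --output json > "$3"
}
```

A call might look like `capture_log_events <log-group> "2025-01-01 00:00:00" "2025-01-01 06:00:00" incident-logs.json`, with the resulting file attached to the incident channel.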
Stand up a war-room cadence for SEV-0/1: 30-min IC updates in the incident channel until status is "stable."
Step 3 — Assess
The IC + Engineering Responder + Compliance Liaison collaborate:
- What data was accessed? Read the audit log (`agent_audit_log` for app-layer, CloudTrail for AWS-layer, Atlas database audit for DB-layer).
- Whose data? Build a candidate-affected-individuals list. If PHI / PII is in scope, the list goes into the Compliance Liaison's regulatory-clock tracker.
- HIPAA breach definition (45 CFR §164.402): unauthorized acquisition, access, use, or disclosure of unsecured PHI. If the incident meets this definition, the 60-day notification clock starts at discovery, so record the discovery timestamp prominently.
- Time-bound the assessment: SEV-0/1 assessment within 24h; SEV-2 within 72h.
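For the AWS-layer part of the assessment, one possible starting point is to pull the CloudTrail events tied to a suspect access key within the incident window. A minimal sketch: the function name is hypothetical, and `lookup-events` only covers recent management events, so data-plane activity (for example S3 object reads) would need the trail's own log storage instead.

```bash
# List CloudTrail management events for one IAM access key in a time window.
# $1 = access key id, $2 = window start, $3 = window end (ISO 8601 timestamps).
cloudtrail_events_for_key() {
  aws cloudtrail lookup-events \
    --lookup-attributes "AttributeKey=AccessKeyId,AttributeValue=$1" \
    --start-time "$2" --end-time "$3" \
    --output json
}
```

Redirect the output to a file and attach it to the incident channel alongside the other evidence.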
Step 4 — Notify
The Compliance Liaison owns the regulatory clock; the Comms Lead owns the messaging.
| Recipient | Trigger | Deadline | Owner | How |
|---|---|---|---|---|
| Affected individuals | HIPAA breach involving their PHI | 60 days from discovery (CA = 30 days for residents; check each state) | Comms Lead drafts; Compliance Liaison reviews | Per-individual letter or email with HHS-required content |
| HHS OCR | HIPAA breach (any number of individuals) | 60 days from discovery (500 or more affected); within 60 days after the end of the calendar year of discovery (fewer than 500) | Compliance Liaison | OCR breach portal |
| Media | HIPAA breach affecting >500 individuals in a state | 60 days | Comms Lead | "Prominent media outlet" in the state |
| State AG | Per state-specific law | Varies; default 30 days | Compliance Liaison | Per state-specific procedure |
| CMS EDE program contact | EDE program-eligibility-relevant incident (post-submission) | Per program | Compliance Liaison | EDE program portal |
| Affected vendor (BAA partner) | Incident involves their data flow | Per BAA terms (typically 30 days) | Compliance Liaison | Per vendor contract |
| Investors + advisors | SEV-0 customer-facing | Same business day | Founder (Taha) | Email + scheduled brief |
| Internal team | All SEV-0/1 | Immediate | IC | Incident channel |
Each notification's sent date is recorded in the incident channel + the post-mortem file.
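Because the deadlines in the table are counted in calendar days from discovery, the outer date can be computed as soon as the discovery timestamp is recorded. A minimal sketch, assuming GNU `date`; the function name `notification_deadline` is hypothetical. Note that under the HIPAA rule the 60 days is an outer limit, and notification is due without unreasonable delay.

```bash
# Outer notification deadline: discovery date plus N calendar days (default 60).
# $1 = discovery date (YYYY-MM-DD, UTC), $2 = number of days (optional).
notification_deadline() {
  local discovered_epoch days="${2:-60}"
  discovered_epoch="$(date -u -d "$1" +%s)" || return 1
  date -u -d "@$((discovered_epoch + days * 86400))" +%F
}
```

For a state with a 30-day rule, pass the shorter window explicitly, for example `notification_deadline 2025-03-01 30`.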
Step 5 — Remediate + post-mortem
- Implement the durable fix. Document the fix's deploy timestamp in the incident channel.
- Verify remediation. Run the relevant CI guards (`staging-collections-guard`, `staging-cluster-drift`, `validate-secrets`) + a synthetic exercise of the previously-vulnerable path.
- Close the incident when (a) the immediate vector is closed, (b) all required notifications are sent, (c) the durable fix is deployed, (d) the post-mortem placeholder is open.
- File the post-mortem within 5 business days at `docs/session-log/<date>-incident-<slug>.md`. Use the template below.
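The post-mortem due date and file path can be derived mechanically. This is a sketch with several assumptions: the 5 business days are counted from the incident close date, weekends are skipped but holidays are not, `<date>` in the path is formatted as YYYY-MM-DD, and both function names are hypothetical. It relies on GNU `date`.

```bash
# Add N business days (Mon-Fri; holidays are NOT considered) to a date.
# $1 = start date (YYYY-MM-DD), $2 = number of business days.
add_business_days() {
  local epoch remaining="$2" dow
  epoch="$(date -u -d "$1" +%s)" || return 1
  while [ "$remaining" -gt 0 ]; do
    epoch=$((epoch + 86400))
    dow="$(date -u -d "@$epoch" +%u)"   # 1 = Monday ... 7 = Sunday
    if [ "$dow" -le 5 ]; then
      remaining=$((remaining - 1))
    fi
  done
  date -u -d "@$epoch" +%F
}

# Build the post-mortem path. $1 = date (YYYY-MM-DD), $2 = incident slug.
postmortem_path() {
  printf 'docs/session-log/%s-incident-%s.md\n' "$1" "$2"
}
```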
Post-mortem template
```markdown
---
title: "Incident post-mortem — <slug>"
date: YYYY-MM-DD
severity: SEV-N
status: closed
---
# Incident — <slug>
## Timeline
| When (UTC) | What |
|---|---|
| YYYY-MM-DD HH:MM | Detection — `<source>` |
| YYYY-MM-DD HH:MM | IC acknowledged + incident channel opened |
| YYYY-MM-DD HH:MM | Containment action — `<action>` |
| YYYY-MM-DD HH:MM | Assessment complete |
| YYYY-MM-DD HH:MM | Notifications sent (per regulatory clock) |
| YYYY-MM-DD HH:MM | Durable fix deployed |
| YYYY-MM-DD HH:MM | Incident closed |
## Impact
- Data exposure: `<scope>`
- Affected individuals: `<count, or N/A>`
- Customer-visible impact: `<duration, or N/A>`
- Financial impact: `<estimate, or N/A>`
## Root cause
`<plain language; 1-3 paragraphs>`
## Contributing factors
`<bulleted list>`
## What worked
`<bulleted list>`
## What didn't
`<bulleted list>`
## Preventive measures
| Owner | Action | Due |
|---|---|---|
| | | |
## Regulatory notifications
| Recipient | Sent | Confirmation # |
|---|---|---|
| | | |
```

Preventive-measure rows feed into the next quarterly access review until they close.
When in doubt
- Classify severity higher (go one tier up if you're unsure).
- Stop the bleeding first; root-cause analysis can wait.
- Don't clean up — preserve evidence.
- Ack early to the team — silence on a suspected incident is worse than a false alarm.
Reference
- Incident Response Plan — full policy framework + worked examples
- Break-Glass Root Login — when standard auth is unavailable
- Atlas user provisioning runbook — credential rotation specifics
- Risk Assessment — known risks the IRP must handle
- HIPAA Breach Notification Rule: 45 CFR §§164.400-414
- HHS OCR Breach Portal: https://ocrportal.hhs.gov/ocr/breach/wizard_breach.jsf