Outage playbook

Anthropic's public status page shows substantial instability in recent months: ~15 incidents in ~10 days in mid-April 2026 alone, including a ~13-hour Opus 4.6 outage on 2026-04-20, repeated Sonnet 4.6 error spikes, and multiple 1–2 hour API/login incidents. 90-day uptime has ranged 98.8 %–99.91 % — at the lower bound, that is ~105 hours/year of customer-facing unavailability. Unacceptable for a product where the LLM is the interface.
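The downtime figure follows directly from the uptime percentage. A quick check of the arithmetic, assuming a 365-day year (8,760 hours); the function name is illustrative:

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def annual_downtime_hours(uptime_pct: float) -> float:
    """Hours per year of unavailability implied by an uptime percentage."""
    return (100.0 - uptime_pct) / 100.0 * HOURS_PER_YEAR


low = annual_downtime_hours(98.8)    # lower bound of the observed range
high = annual_downtime_hours(99.91)  # upper bound of the observed range
```

At the lower bound this gives roughly 105 hours per year; at the upper bound, roughly 8.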

This playbook specifies exactly how Florence stays up, and stays at quality, when a provider degrades or fails. The design goal is explicit:

Quality-preserving failover first, quality-regressing failover only as a last resort. Most Claude outages are regional or model-specific — a same-weights fallback (different region, different transport) preserves full quality and is invisible to users.

Provider risk & portability frames the strategic risks and the four tiers of vendor switch. This doc is the operational companion: exact thresholds, concrete runbook steps, graceful degradation modes, and the chaos-drill cadence that keeps all of it rehearsed.

Failure-domain analysis — what is actually independent?

Not all "Claude" endpoints share a failure domain. This determines which failovers preserve quality.

Endpoint	Failure domain	Correlation with other endpoints
Anthropic direct API (`api.anthropic.com`) — us-east	Anthropic infra	Correlates with claude.ai; uncorrelated with Bedrock
Anthropic direct API — other regions	Anthropic infra	Same as above
AWS Bedrock Claude — `us-east-1`	AWS region	Uncorrelated with Anthropic direct; correlated with other AWS services in region
AWS Bedrock Claude — `us-west-2`	AWS region	Uncorrelated with us-east-1
AWS Bedrock Claude — `eu-west-1`, `ap-northeast-1`	AWS region	Uncorrelated with other regions
OpenAI via Azure	Microsoft / OpenAI	Fully uncorrelated
Vertex AI Gemini	Google	Fully uncorrelated
Self-hosted SageMaker	Our AWS region	Correlates with own region

The useful insight: Anthropic's Apr 20 Opus outage was at Anthropic's infra. Bedrock Claude Opus in us-east-1 was likely unaffected. A product that routes only via Anthropic direct feels the full 13-hour outage; a product with Bedrock as a parallel transport feels nothing.
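One way to make the table actionable is to encode it as data and ask which endpoints serve the same model family from a different failure domain. This is a minimal sketch; the endpoint labels and domain names below are hypothetical shorthand for the rows above, not identifiers from the codebase.

```python
# Endpoint label -> (failure domain, model family), transcribed from the table above.
# All labels are illustrative shorthand.
ENDPOINTS = {
    "anthropic-direct": ("anthropic", "claude"),
    "bedrock:us-east-1": ("aws:us-east-1", "claude"),
    "bedrock:us-west-2": ("aws:us-west-2", "claude"),
    "bedrock:eu-west-1": ("aws:eu-west-1", "claude"),
    "azure-openai": ("microsoft-openai", "openai"),
    "vertex-gemini": ("google", "gemini"),
}


def same_model_fallbacks(failed_endpoint: str) -> list[str]:
    """Endpoints serving the same model family from a different failure domain."""
    failed_domain, family = ENDPOINTS[failed_endpoint]
    return [
        name
        for name, (domain, fam) in ENDPOINTS.items()
        if fam == family and domain != failed_domain
    ]
```

For an Anthropic-infra incident, `same_model_fallbacks("anthropic-direct")` returns the three Bedrock entries, which is the quality-preserving set.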

The six-tier quality-preserving cascade

At every turn, Florence attempts providers in order. Each tier is attempted with a tight per-tier timeout; failure or timeout promotes to the next tier. Tiers 1–3 preserve full quality (same model family, same weights). Tiers 4 and 5 accept measured quality drift. Tier 6 is graceful degradation, not a Florence turn.
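The promotion logic can be sketched as a simple ordered loop. This is a minimal illustration, not the runtime's actual code: `Tier`, `run_cascade`, and `AllTiersFailed` are hypothetical names, and each `call` is assumed to enforce its own per-tier timeout and raise on failure.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Tier:
    number: int
    name: str
    # Hypothetical provider call; assumed to raise on error or on its own timeout.
    call: Callable[[str], str]


class AllTiersFailed(Exception):
    """No LLM tier answered; the caller should enter deterministic-only mode."""


def run_cascade(prompt: str, tiers: Sequence[Tier]) -> tuple[int, str]:
    """Try tiers in priority order; return (tier_number, response) from the first success."""
    errors = []
    for tier in tiers:
        try:
            return tier.number, tier.call(prompt)
        except Exception as exc:  # timeouts and provider errors both promote
            errors.append((tier.number, exc))
    raise AllTiersFailed(errors)
```

Returning the tier number alongside the response lets the caller record which tier served the turn.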

Tier 1 — Primary transport, primary region

Whatever is configured as FLORENCE_LLM_PRIMARY — e.g., bedrock-claude-sonnet-4-6@us-east-1. Timeout budget: 8 s for first token; 30 s total. Retries on 5xx: 1 retry with jittered backoff.
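A single retry with jittered backoff might look like the following sketch. `ServerError` is a hypothetical stand-in for a provider 5xx, and the 0.25 s base delay is an assumed value; neither comes from the existing runtime.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ServerError(Exception):
    """Hypothetical stand-in for a provider 5xx response."""


def call_with_retry(
    fn: Callable[[], T],
    retries: int = 1,             # the playbook allows one retry on 5xx
    base_delay_s: float = 0.25,   # assumed value, not from the playbook
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn; on ServerError retry up to `retries` times with full-jitter backoff."""
    attempt = 0
    while True:
        try:
            return fn()
        except ServerError:
            if attempt >= retries:
                raise  # budget exhausted: let the cascade promote to the next tier
            # Full jitter: sleep a random amount up to the exponential cap.
            sleep(random.uniform(0, base_delay_s * (2 ** attempt)))
            attempt += 1
```

The jitter keeps many instances from retrying in lockstep against a provider that is already struggling.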

Tier 2 — Same model, alternate region

Same Claude model, different Bedrock region. bedrock-claude-sonnet-4-6@us-west-2 or @eu-west-1. Zero quality drift — identical weights, identical behavior. The only variable is cross-region latency (+50–150 ms typically).

This is the single most effective failover step. AWS region outages are rare and regionally scoped; we are not aware of any incident in which every Bedrock region was down at once.

Tier 3 — Alternate transport, same model

Bedrock → Anthropic direct, or Anthropic direct → Bedrock. Same Claude model, same prompt, same tools — just a different vendor's infrastructure carrying the request. Zero quality drift.

Catches Anthropic-infra-specific outages such as the Apr 15 login/API incident and the Apr 20 Opus outage, during which Bedrock was likely unaffected.

Tier 4 — Alternate model within Claude family

Sonnet failing? Try Haiku. Opus failing? Try Sonnet. Per-model circuit breaker trips only that model.

Measured quality drift. For ~85 % of turns (lookup + response synthesis, already Haiku-default) this is zero-impact. For the 14 % of turns that run on Sonnet, falling back to Haiku is measurably weaker on complex plan comparison and SEP triage, but still better than "Florence is unavailable."

Eval harness maintains a "downshift" eval set that measures exactly how much quality we lose when Sonnet turns are forcibly run on Haiku; published to the outage-readiness dashboard.

Tier 5 — Cross-vendor (Claude → OpenAI / Gemini)

Invokes the provider-risk Tier 2 switch. Adaptation prompts engage. Measured quality drift, possibly in both directions (some tasks OpenAI is stronger on, some Claude is).

Only engaged when Tiers 1–4 are all failing — i.e., broad Claude unavailability across transports. Rare, but Apr 15 (Anthropic login/API) + a hypothetical AWS Bedrock regional failure same-day would qualify.

Tier 6 — Graceful deterministic-only mode

All LLM providers unreachable. Florence UI switches to a clearly-labeled degraded state:

"Florence is temporarily unavailable. You can still browse plans, check drug coverage, and use the full platform — just without the conversational assistant."
All UI flows (/plans, /agents, member dashboard) remain fully functional via the deterministic API — no LLM required.
Authenticated members see a banner offering to leave a message that Florence will answer when she's back; message queued to the audit-log collection.
Voice is disabled with a specific message; text-mode basic forms stay available.

Florence is fully degraded, but the product is not down. The deterministic platform exists for exactly this.

Circuit breakers and thresholds

Per-tier, per-model, per-region breakers. Separate state so one model failing doesn't trip an unrelated one.

Signal	Window	Threshold	Action
5xx / timeout rate on (tier, model, region)	60 s rolling	> 10 %	Open breaker for that combo; promote to next tier automatically
Latency p95 on (tier, model, region)	60 s rolling	> 3 × baseline	Same — open breaker, promote
Tool-call failure rate	5 min rolling	> 5 %	Alert on-call; evaluate breaker
Grounding-check failure spike	10 min rolling	> 3 × baseline	Alert on-call; possible model regression — consider Tier 4 downshift
Broad Claude unavailable (all tiers 1–4 tripped)	30 s	All open	Auto-engage Tier 5; page on-call
All providers unavailable	30 s	Tiers 1–5 open	Auto-engage Tier 6; page founder

After a cooldown, an open breaker moves to half-open and receives a small canary-probe share of traffic. The cooldown backs off exponentially from 60 s up to 5 min on repeated failures. The breaker closes fully on sustained success.
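The error-rate row and the half-open cooldown can be sketched as a small state machine. This is an illustrative simplification under stated assumptions: the `min_requests` guard is an added assumption, a single successful probe closes the breaker (the playbook calls for sustained success), and limiting half-open traffic to a canary share is left to the caller.

```python
import time
from collections import deque


class CircuitBreaker:
    """Breaker for one (tier, model, region) combo; states: closed, open, half_open."""

    def __init__(self, window_s=60.0, error_threshold=0.10, min_requests=20,
                 base_cooldown_s=60.0, max_cooldown_s=300.0, clock=time.monotonic):
        self.window_s = window_s
        self.error_threshold = error_threshold
        self.min_requests = min_requests  # assumption: do not trip on tiny samples
        self.base_cooldown_s = base_cooldown_s
        self.max_cooldown_s = max_cooldown_s
        self.clock = clock
        self.events = deque()  # (timestamp, ok) pairs inside the rolling window
        self.state = "closed"
        self.cooldown_s = base_cooldown_s
        self.opened_at = 0.0

    def allow(self) -> bool:
        """Return True if a request may be sent to this combo right now."""
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "half_open"  # cooldown elapsed: let probes through
        return self.state != "open"

    def record(self, ok: bool) -> None:
        """Record the outcome of one request and update the breaker state."""
        now = self.clock()
        if self.state == "open":
            return  # ignore late results while open
        if self.state == "half_open":
            if ok:
                self._set("closed", now, self.base_cooldown_s)
            else:
                # Probe failed: reopen with a doubled cooldown, capped at the maximum.
                self._set("open", now, min(self.cooldown_s * 2, self.max_cooldown_s))
            return
        self.events.append((now, ok))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        failures = sum(1 for _, good in self.events if not good)
        if (len(self.events) >= self.min_requests
                and failures / len(self.events) > self.error_threshold):
            self._set("open", now, self.base_cooldown_s)

    def _set(self, state, now, cooldown_s):
        self.state, self.opened_at, self.cooldown_s = state, now, cooldown_s
        self.events.clear()
```

The injectable `clock` exists so that drills and unit tests can advance time without sleeping.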

Request hedging — for turns where user experience dominates cost

For a small fraction of high-value turns, send the same request to two providers simultaneously and accept the first valid response. Costs ~2 × for hedged turns; kills tail latency during partial outages.

Default hedging policy:

Hedge always: first turn of a conversation (user's first-impression experience). ~1 % of turns at typical usage.
Hedge conditionally: when primary breaker is half-open (during recovery). < 0.1 % of turns in normal operation; higher during recovery.
Never hedge: routine mid-conversation turns. Keeps steady-state cost overhead to ~1 %.

Hedging is a weapon against partial degradation (some requests slow, others fast) which is harder to circuit-break on than a clean failure.
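A hedged request can be sketched with `asyncio`: fire the same prompt at several providers, take the first one that succeeds, and cancel the rest. The function names are hypothetical, "valid" is reduced here to "did not raise", and the `"half_open"` state string is an assumed convention.

```python
import asyncio
from typing import Awaitable, Callable


def should_hedge(is_first_turn: bool, primary_breaker_state: str) -> bool:
    """Default policy: always hedge first turns; hedge while the primary breaker recovers."""
    return is_first_turn or primary_breaker_state == "half_open"


async def hedged_call(prompt: str,
                      providers: list[Callable[[str], Awaitable[str]]]) -> str:
    """Send prompt to all providers concurrently; return the first successful response."""
    tasks = [asyncio.create_task(p(prompt)) for p in providers]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # this provider failed; wait for the next one
        raise RuntimeError("all hedged providers failed")
    finally:
        for task in tasks:
            task.cancel()  # stop the slower request(s); no-op for finished tasks
```

Cancelling the losing request limits, but does not eliminate, the extra spend: tokens already generated by the slower provider may still be billed.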

Response cache — the freebie during outages

Anthropic prompt caching is about prefix hits on the Anthropic side. Separately, Florence maintains a response cache keyed on (prompt_version, tool_results_hash, user_context_hash) for deterministic-grade queries — "what's the cheapest Silver plan for a family of 4 in Miami at $45k income?" is the same answer for any user who asks it.

Sizes:

Cache size target: ~10 000 most-common deterministic-grade answers
TTL: 24 hours (tool results themselves cache-invalidate on deterministic-data updates)
Hit rate in normal operation: ~15 % (bonus on cost)
Behavior during LLM outage: every query already in the cache is still answered; when a user asks something we've answered before, Florence serves it from cache with a freshness marker

During a Tier 6 outage, cache serves what it can before the deterministic-only banner engages.
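The cache key and TTL behavior can be sketched as follows. This is an in-memory illustration only: the class and function names are hypothetical, least-recently-used eviction is an assumption (the target above is "most common" answers), and a shared store would replace the dictionary in practice.

```python
import hashlib
import json
import time
from collections import OrderedDict


def cache_key(prompt_version: str, tool_results: dict, user_context: dict) -> str:
    """Build a deterministic key from the three components named above."""
    def digest(obj) -> str:
        # sort_keys makes the hash independent of dict insertion order
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    return f"{prompt_version}:{digest(tool_results)}:{digest(user_context)}"


class ResponseCache:
    """In-memory sketch with a 24-hour TTL and a 10,000-entry cap."""

    def __init__(self, ttl_s: float = 24 * 3600, max_entries: int = 10_000,
                 clock=time.time):
        self.ttl_s, self.max_entries, self.clock = ttl_s, max_entries, clock
        self._entries = OrderedDict()  # key -> (stored_at, answer)

    def put(self, key: str, answer: str) -> None:
        self._entries[key] = (self.clock(), answer)
        self._entries.move_to_end(key)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # evict least recently used (assumption)

    def get(self, key: str):
        """Return (answer, age_seconds) for a fresh entry, else None."""
        entry = self._entries.get(key)
        if entry is None:
            return None
        stored_at, answer = entry
        age = self.clock() - stored_at
        if age > self.ttl_s:
            del self._entries[key]  # expired
            return None
        self._entries.move_to_end(key)
        return answer, age
```

Returning the entry's age alongside the answer gives the UI what it needs to render the freshness marker.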

Multi-region Bedrock — the operational detail

Currently staging has a bedrock-runtime VPC endpoint only in us-east-1. For the outage posture:

Provision Bedrock access in at least three regions pre-launch: us-east-1, us-west-2, and one of eu-west-1 / ap-northeast-1. Region-specific Bedrock access must be explicitly enabled per model (AWS Bedrock model-access UI).
Private endpoints (VPC interface endpoints for bedrock-runtime) in each of those regions, scoped to the ECS security group. Avoids relying on public internet egress during a regional degradation where routing is flaky.
Automatic region selection by the FlorenceLLMProvider adapter — starts with nearest region, promotes to alternate regions on breaker trip.
Region-level evals — daily eval suite runs against each region; a region failing eval (rare but possible during a regional incident) auto-excludes from the selection pool until it recovers.
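Region selection with breaker and eval exclusion reduces to a filtered priority list. A minimal sketch, assuming the adapter is handed the current set of tripped regions and the set of regions failing the daily eval; the function name and signature are hypothetical.

```python
from typing import Optional


def select_region(priority: list[str], breaker_open: set[str],
                  eval_failed: set[str]) -> Optional[str]:
    """Return the first region in priority order that is neither tripped nor failing evals."""
    for region in priority:
        if region not in breaker_open and region not in eval_failed:
            return region
    return None  # no Bedrock region available: promote to the next tier
```

With `priority = ["us-east-1", "us-west-2", "eu-west-1"]`, a tripped `us-east-1` breaker moves traffic to `us-west-2`, and the region rejoins the pool as soon as it leaves both exclusion sets.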

Chaos drill schedule

Outage plans atrophy if not rehearsed. Cadence:

Frequency	Drill
Weekly	Tier 2 region swap: force the `us-east-1` breaker open for 10 min; verify seamless failover to `us-west-2`. Automated; runs in staging.
Monthly	Tier 3 transport swap — flip Anthropic direct ↔ Bedrock in production for 30 min. Monitor parity.
Quarterly	Tier 4 downshift — force Sonnet circuit open for 60 min; measure quality delta on live traffic against the downshift eval set.
Quarterly	Tier 5 cross-vendor — route 1–2 % of traffic through OpenAI / Gemini for 1 hour; post quality-comparison report.
Semi-annual	Tier 6 "nuclear drill" — in staging only, disable all LLM providers; verify deterministic-only mode engages cleanly, cache serves, UI banner renders, member messages queue correctly.
On any sustained production outage	Post-incident review within 48 hours — did the cascade engage as designed? Any tier skip? What to tune?

Drills are not optional. A cascade that has never been exercised is a cascade that doesn't work when you need it.

SLO targets during outages

Outage behavior has its own SLO, distinct from normal operation:

Metric	Normal SLO	Outage SLO (any tier engaged)	Hard floor (Tier 6 engaged)
Florence turn success rate	≥ 99.5 %	≥ 99 %	N/A (Florence unavailable, deterministic-only)
First-token latency p95	≤ 500 ms	≤ 1000 ms	N/A
Grounding check pass rate	≥ 99.5 %	≥ 99 %	N/A
Deterministic platform availability	≥ 99.9 %	≥ 99.9 %	≥ 99.9 %
User-visible degradation banner when Tier 6 engaged	—	—	Shown within 30 s

The deterministic platform SLO never drops. Whatever happens to the LLM, the user can still browse plans and see accurate prices.
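The table translates directly into a threshold check that an alerting job could run. A sketch only: the metric keys are hypothetical names for the first three rows, and the Tier 6 column is omitted because those metrics do not apply there.

```python
# Targets transcribed from the SLO table; metric key names are illustrative.
SLO = {
    "normal": {"turn_success_rate": 0.995, "first_token_p95_ms": 500,
               "grounding_pass_rate": 0.995},
    "outage": {"turn_success_rate": 0.99, "first_token_p95_ms": 1000,
               "grounding_pass_rate": 0.99},
}


def slo_violations(mode: str, metrics: dict) -> list[str]:
    """Return the names of metrics that miss their target for the given mode."""
    violations = []
    for name, target in SLO[mode].items():
        value = metrics[name]
        # Latency is an upper bound; the rates are lower bounds.
        ok = value <= target if name.endswith("_ms") else value >= target
        if not ok:
            violations.append(name)
    return violations
```

The same measurements can pass the outage SLO while failing the normal one, which is why the mode has to be an explicit input rather than inferred.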

Customer-visible behavior across the cascade

Consistent framing across tiers; users should never see raw errors.

Tier engaged	User-facing behavior
1	Normal operation
2, 3	Normal operation. No indicator. Latency may be slightly higher (~100–200 ms).
4	Normal operation. No indicator unless the downshift materially affects a complex comparison, in which case Florence offers "Want me to dig deeper? This might take a moment," a natural-sounding way to buy time for a retry on a full-quality tier (Tiers 1–3) or to ask for patience.
5	Normal operation. No indicator. Voice/style may feel very slightly different.
6	Explicit banner: "Florence is temporarily unavailable. The rest of the platform works normally — browse plans, check coverage, manage your account. [Leave a message for Florence]"

Never show raw "provider error 503" to a user. Ever.

What the FlorenceRuntime needs to own

The runtime is where the cascade executes. Concrete additions to runtime beyond what's already documented:

Multi-region Bedrock discovery. Not a single LLM_PROVIDER env var — a config array of (provider, region, model) tuples in priority order.
Per-(provider, region, model) circuit breaker state. Stored in-memory per server instance; shared via Redis or equivalent for coordinated ingress behavior. Breaker decisions propagate within ~1 second.
Hedging policy module. Declarative: { always: [firstTurn], conditional: [breakerHalfOpen] }.
Response-cache lookup layer. Sits before Tier 1 attempt; populates on successful response.
Tier 6 "deterministic-only" mode toggle. Server-side flag; UI polls it and renders the banner.
Outage-readiness dashboard. Rolls up: current tier in use, breaker state per combo, eval pass rate per region, downshift-quality-delta trend.
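The priority-ordered config and per-combo breaker keys might take the following shape. This is a hedged sketch: it assumes the `provider-model@region` string format seen in FLORENCE_LLM_PRIMARY splits on the first hyphen and the `@`, and the `anthropic-...@direct` entry and the `breaker:` key prefix are placeholders, not existing conventions.

```python
from typing import NamedTuple


class Target(NamedTuple):
    provider: str
    region: str
    model: str


def parse_target(spec: str) -> Target:
    """Parse 'provider-model@region', e.g. 'bedrock-claude-sonnet-4-6@us-east-1'."""
    head, region = spec.split("@", 1)
    provider, model = head.split("-", 1)
    return Target(provider, region, model)


# Priority order: primary region, alternate region, alternate transport.
CASCADE = [
    parse_target(spec)
    for spec in (
        "bedrock-claude-sonnet-4-6@us-east-1",
        "bedrock-claude-sonnet-4-6@us-west-2",
        "anthropic-claude-sonnet-4-6@direct",  # placeholder region label
    )
]

# Declarative hedging policy, mirroring the shape given above.
HEDGING_POLICY = {"always": ["firstTurn"], "conditional": ["breakerHalfOpen"]}


def breaker_key(target: Target) -> str:
    """Shared-store key for one combo's breaker state (key format is an assumption)."""
    return f"breaker:{target.provider}:{target.region}:{target.model}"
```

Keying breaker state on the full tuple is what keeps a Sonnet failure in one region from tripping Haiku or another region.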

What this costs vs. what it buys

Ongoing cost:

Multi-region Bedrock: VPC endpoints in 3 regions (~$30–$40/region/month for endpoints + minor cross-region latency when failed over)
Hedging: ~1 % LLM spend overhead at baseline
Response cache infra: Redis / DynamoDB TTL-based; low hundreds of dollars per month at 10 k member scale
Chaos drills: engineering time monthly; production impact negligible
Daily region-level evals: ~$10–20/day at the full cascade

Estimated total: ~3–4 % of steady-state LLM spend as outage-preparedness overhead.

What it buys:

Zero user-visible impact on the majority of Anthropic-infra incidents (the Apr 15 login/API incident would have been transparent; the Apr 20 Opus outage would have been transparent via Tier 3, or handled by a Tier 4 downshift if Bedrock had also been affected)
Measurable-but-bounded quality impact on broad Claude outages (Tier 4 / 5)
Graceful product-survives mode on total-LLM-failure scenarios (Tier 6)
Defensible uptime story when members, agents, or EDE auditors ask "what happens when your AI goes down?"

Related

Provider risk & portability: strategic framing + the four tiers of vendor switch (Tiers 4–5 here correspond to those)
Runtime — where the cascade executes
Evals & observability — downshift eval set, outage-readiness dashboard
Voice — vendor strategy — the same cascade discipline applies to ASR + TTS
AWS Bedrock status: status.aws.amazon.com
Anthropic status: status.claude.com

Tracking

Open items to land before Florence text launch:

[ ] Provision Bedrock access in ≥ 3 regions with private VPC endpoints
[ ] Implement multi-region / multi-transport FlorenceLLMProvider adapter
[ ] Circuit-breaker module with per-combo state
[ ] Response-cache layer + Tier 6 mode toggle
[ ] Downshift eval set authored and integrated into CI
[ ] Chaos drill #1 (weekly region swap in staging) scheduled
[ ] Outage-readiness dashboard live before first user sees Florence