Outage playbook
Anthropic's public status page shows substantial instability in recent months: ~15 incidents in ~10 days in mid-April 2026 alone, including a ~13-hour Opus 4.6 outage on 2026-04-20, repeated Sonnet 4.6 error spikes, and multiple 1–2 hour API/login incidents. 90-day uptime has ranged 98.8 %–99.91 % — at the lower bound, that is ~105 hours/year of customer-facing unavailability. Unacceptable for a product where the LLM is the interface.
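The hours-per-year figure follows directly from the uptime percentage. A minimal sketch of the arithmetic, assuming a 365-day year (the function name is illustrative only):

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def annual_downtime_hours(uptime_percent: float) -> float:
    """Hours per year of unavailability implied by an uptime percentage."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for uptime in (98.8, 99.91):
    print(f"{uptime} % uptime -> ~{annual_downtime_hours(uptime):.1f} hours/year down")
```

At 98.8 % this gives roughly 105 hours per year; at 99.91 % it gives roughly 8.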
This playbook specifies exactly how Florence stays up, and stays at quality, when a provider degrades or fails. The design goal is explicit:
Quality-preserving failover first, quality-regressing failover only as a last resort. Most Claude outages are regional or model-specific — a same-weights fallback (different region, different transport) preserves full quality and is invisible to users.
Provider risk & portability frames the strategic risks and the four tiers of vendor switch. This doc is the operational companion: exact thresholds, concrete runbook steps, graceful degradation modes, and the chaos-drill cadence that keeps all of it rehearsed.
Failure-domain analysis — what is actually independent?
Not all "Claude" endpoints share a failure domain. This determines which failovers preserve quality.
| Endpoint | Failure domain | Correlation with other endpoints |
|---|---|---|
| Anthropic direct API (api.anthropic.com) — us-east | Anthropic infra | Correlates with claude.ai; uncorrelated with Bedrock |
| Anthropic direct API — other regions | Anthropic infra | Same as above |
| AWS Bedrock Claude — us-east-1 | AWS region | Uncorrelated with Anthropic direct; correlated with other AWS services in region |
| AWS Bedrock Claude — us-west-2 | AWS region | Uncorrelated with us-east-1 |
| AWS Bedrock Claude — eu-west-1, ap-northeast-1 | AWS region | Uncorrelated with other regions |
| OpenAI via Azure | Microsoft / OpenAI | Fully uncorrelated |
| Vertex AI Gemini | Google Cloud | Fully uncorrelated |
| Self-hosted SageMaker | Our AWS region | Correlates with own region |
The useful insight: Anthropic's Apr 20 Opus outage was at Anthropic's infra. Bedrock Claude Opus in us-east-1 was likely unaffected. A product that routes only via Anthropic direct feels the full 13-hour outage; a product with Bedrock as a parallel transport feels nothing.
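One way to make the failure-domain reasoning mechanical is to tag each endpoint with its domain and filter on it. The sketch below is illustrative only: the `Endpoint` type, the endpoint names, and the domain labels are assumptions, not existing Florence code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    name: str
    failure_domain: str  # infrastructure whose outage takes this endpoint down
    model_family: str


# Hypothetical registry mirroring a few rows of the table above.
ENDPOINTS = [
    Endpoint("anthropic-direct", "anthropic-infra", "claude"),
    Endpoint("bedrock@us-east-1", "aws:us-east-1", "claude"),
    Endpoint("bedrock@us-west-2", "aws:us-west-2", "claude"),
    Endpoint("azure-openai", "microsoft-openai", "openai"),
]


def quality_preserving_fallbacks(failed: Endpoint) -> list[Endpoint]:
    """Endpoints serving the same model family from a different failure domain."""
    return [
        e for e in ENDPOINTS
        if e.model_family == failed.model_family
        and e.failure_domain != failed.failure_domain
    ]


print([e.name for e in quality_preserving_fallbacks(ENDPOINTS[0])])
```

For an Anthropic-direct failure this returns the two Bedrock endpoints and excludes the cross-vendor option, which is the distinction between same-weights and quality-regressing failover.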
The six-tier quality-preserving cascade
At every turn, Florence attempts providers in order. Each tier is attempted with a tight per-tier timeout; failure or timeout promotes to the next tier. Tiers 1–3 preserve full quality (same model family, same weights). Tiers 4 and 5 accept measured quality drift. Tier 6 is a graceful degradation, not a Florence turn.
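The promotion logic can be sketched as a loop over an ordered tier list. This is a simplified illustration under stated assumptions: `run_cascade`, the tier labels, and the stub providers are hypothetical, and a real adapter would enforce the timeout inside each call and catch narrower exception types.

```python
from typing import Callable, Optional

# Each tier is a (label, call) pair; `call` takes a prompt and a timeout in seconds.
Tier = tuple[str, Callable[[str, float], str]]


def run_cascade(prompt: str, tiers: list[Tier],
                timeout_s: float = 8.0) -> tuple[Optional[str], str]:
    """Try each tier in priority order; return (response, label of the tier used)."""
    for label, call in tiers:
        try:
            return call(prompt, timeout_s), label
        except Exception:
            continue  # failure or timeout promotes to the next tier
    # Nothing answered: the caller should switch to deterministic-only mode.
    return None, "deterministic-only"


# Stub providers standing in for real transports.
def down(prompt: str, timeout_s: float) -> str:
    raise TimeoutError("simulated outage")


def up(prompt: str, timeout_s: float) -> str:
    return f"answer to: {prompt}"


tiers = [("tier1:bedrock@us-east-1", down), ("tier2:bedrock@us-west-2", up)]
print(run_cascade("hello", tiers))
```

Returning the tier label alongside the response lets the caller log which tier served the turn, which the dashboard and post-incident reviews depend on.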
Tier 1 — Primary transport, primary region
Whatever is configured as FLORENCE_LLM_PRIMARY — e.g., bedrock-claude-sonnet-4-6@us-east-1. Timeout budget: 8 s for first token; 30 s total. Retries on 5xx: 1 retry with jittered backoff.
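A single retry with jittered backoff might look like the following. This is a sketch, not the runtime's code: `ServerError`, `call_with_retry`, and the 0.2 s base delay are assumptions, and the first-token and total timeouts are left to the underlying client.

```python
import random
import time


class ServerError(Exception):
    """Hypothetical stand-in for a provider 5xx response."""


def call_with_retry(call, retries=1, base_delay_s=0.2, sleep=time.sleep):
    """Run `call()`; on ServerError, retry up to `retries` times with full jitter."""
    for attempt in range(retries + 1):
        try:
            return call()
        except ServerError:
            if attempt == retries:
                raise  # retries exhausted: let the cascade promote to the next tier
            # Full jitter: sleep a uniform random time in [0, base * 2^attempt].
            sleep(random.uniform(0, base_delay_s * 2 ** attempt))
```

The jitter keeps many server instances from retrying in lockstep against a provider that is already struggling. The `sleep` parameter exists so tests can inject a stub instead of waiting.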
Tier 2 — Same model, alternate region
Same Claude model, different Bedrock region. bedrock-claude-sonnet-4-6@us-west-2 or @eu-west-1. Zero quality drift — identical weights, identical behavior. The only variable is cross-region latency (+50–150 ms typically).
This is the single most effective failover step. AWS region outages are rare and regionally scoped; "all Bedrock regions down" has never happened.
Tier 3 — Alternate transport, same model
Bedrock → Anthropic direct, or Anthropic direct → Bedrock. Same Claude model, same prompt, same tools — just a different vendor's infrastructure carrying the request. Zero quality drift.
Catches the Anthropic-infra-specific outages, such as the Apr 15 login/API incident and the Apr 20 Opus outage at Anthropic, during which Bedrock was likely unaffected.
Tier 4 — Alternate model within Claude family
Sonnet failing? Try Haiku. Opus failing? Try Sonnet. Per-model circuit breaker trips only that model.
Measured quality drift. For ~85 % of turns (lookup + response synthesis, already Haiku-default) this is zero-impact. For the 14 % of Sonnet turns, Haiku-on-Sonnet-task is measurably weaker on complex plan comparison and SEP triage — but still better than "Florence is unavailable."
Eval harness maintains a "downshift" eval set that measures exactly how much quality we lose when Sonnet turns are forcibly run on Haiku; published to the outage-readiness dashboard.
Tier 5 — Cross-vendor (Claude → OpenAI / Gemini)
Invokes the provider-risk Tier 2 switch. Adaptation prompts engage. Measured quality drift, possibly in both directions (some tasks OpenAI is stronger on, some Claude is).
Only engaged when Tiers 1–4 are all failing — i.e., broad Claude unavailability across transports. Rare, but Apr 15 (Anthropic login/API) + a hypothetical AWS Bedrock regional failure same-day would qualify.
Tier 6 — Graceful deterministic-only mode
All LLM providers unreachable. Florence UI switches to a clearly-labeled degraded state:
- "Florence is temporarily unavailable. You can still browse plans, check drug coverage, and use the full platform — just without the conversational assistant."
- All UI flows (/plans, /agents, member dashboard) remain fully functional via the deterministic API — no LLM required.
- Authenticated members see a banner offering to leave a message that Florence will answer when she's back; message queued to the audit-log collection.
- Voice is disabled with a specific message; text-mode basic forms stay available.
Florence is fully degraded, but the product is not down. The deterministic platform exists for exactly this.
Circuit breakers and thresholds
Per-tier, per-model, per-region breakers. Separate state so one model failing doesn't trip an unrelated one.
| Signal | Window | Threshold | Action |
|---|---|---|---|
| 5xx / timeout rate on (tier, model, region) | 60 s rolling | > 10 % | Open breaker for that combo; promote to next tier automatically |
| Latency p95 on (tier, model, region) | 60 s rolling | > 3 × baseline | Same — open breaker, promote |
| Tool-call failure rate | 5 min rolling | > 5 % | Alert on-call; evaluate breaker |
| Grounding-check failure spike | 10 min rolling | > 3 × baseline | Alert on-call; possible model regression — consider Tier 4 downshift |
| Broad Claude unavailable (all tiers 1–4 tripped) | 30 s | All open | Auto-engage Tier 5; page on-call |
| All providers unavailable | 30 s | Tiers 1–5 open | Auto-engage Tier 6; page founder |
Breakers half-open after a cooldown (60 s → 5 min exponential) with a small canary-probe traffic share. Fully close on sustained success.
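A per-combo breaker using the error-rate row and the cooldown rule above could be sketched as follows. Several simplifications are assumptions of this sketch rather than part of the design: `min_samples` is invented to avoid tripping on a handful of requests, the half-open state admits all traffic instead of a small canary share, and one successful probe closes the breaker rather than requiring sustained success.

```python
import time
from collections import deque


class CircuitBreaker:
    def __init__(self, window_s=60.0, threshold=0.10, min_samples=20,
                 cooldown_s=60.0, max_cooldown_s=300.0, clock=time.monotonic):
        self.window_s, self.threshold, self.min_samples = window_s, threshold, min_samples
        self.base_cooldown_s, self.max_cooldown_s = cooldown_s, max_cooldown_s
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.events = deque()  # (timestamp, ok) pairs inside the rolling window
        self.state = "closed"
        self.opened_at = 0.0

    def allow(self) -> bool:
        """True if a request may be sent to this (tier, model, region) combo."""
        if self.state == "open" and self.clock() - self.opened_at >= self.cooldown_s:
            self.state = "half_open"  # cooldown elapsed: let probes through
        return self.state != "open"

    def record(self, ok: bool) -> None:
        """Record the outcome of one request."""
        now = self.clock()
        if self.state == "open":
            return  # ignore late results from requests sent before the trip
        if self.state == "half_open":
            if ok:
                self._close()
            else:
                # Probe failed: reopen and double the cooldown, capped at 5 min.
                self.cooldown_s = min(self.cooldown_s * 2, self.max_cooldown_s)
                self._open(now)
            return
        self.events.append((now, ok))
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()
        failures = sum(1 for _, good in self.events if not good)
        if len(self.events) >= self.min_samples and failures / len(self.events) > self.threshold:
            self._open(now)

    def _open(self, now: float) -> None:
        self.state, self.opened_at = "open", now
        self.events.clear()

    def _close(self) -> None:
        self.state, self.cooldown_s = "closed", self.base_cooldown_s
        self.events.clear()
```

One instance would be kept per (tier, model, region) key so that a failing combo never trips an unrelated one. The latency-based trigger from the table is not covered here.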
Request hedging — for turns where user experience dominates cost
For a small fraction of high-value turns, send the same request to two providers simultaneously and accept the first valid response. Costs ~2 × for hedged turns; kills tail latency during partial outages.
Default hedging policy:
- Hedge always: first turn of a conversation (user's first-impression experience). ~1 % of turns at typical usage.
- Hedge conditionally: when primary breaker is half-open (during recovery). < 0.1 % of turns in normal operation; higher during recovery.
- Never hedge: routine mid-conversation turns. Keeps steady-state cost overhead to ~1 %.
Hedging is a weapon against partial degradation (some requests slow, others fast) which is harder to circuit-break on than a clean failure.
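The "first valid response wins" behavior maps naturally onto `asyncio`. The sketch below is an illustration only: `hedged_call` and the two stub providers are hypothetical, and any response that does not raise is treated as valid, whereas a real implementation would also validate the content.

```python
import asyncio


async def hedged_call(prompt, primary, secondary):
    """Send `prompt` to both providers; return the first successful response."""
    tasks = [asyncio.create_task(primary(prompt)),
             asyncio.create_task(secondary(prompt))]
    try:
        for finished in asyncio.as_completed(tasks):
            try:
                return await finished
            except Exception:
                continue  # this provider failed; wait for the other one
        raise RuntimeError("both hedged providers failed")
    finally:
        for task in tasks:
            task.cancel()  # no-op for finished tasks; stops the slower request


# Stub providers standing in for two real transports.
async def slow(prompt):
    await asyncio.sleep(0.2)
    return "slow:" + prompt


async def fast(prompt):
    await asyncio.sleep(0.01)
    return "fast:" + prompt


print(asyncio.run(hedged_call("hi", slow, fast)))
```

Cancelling the losing request matters for cost: without it, every hedged turn pays for two full completions even when the first answer arrives quickly.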
Response cache — the freebie during outages
Anthropic prompt caching is about prefix hits on the Anthropic side. Separately, Florence maintains a response cache keyed on (prompt_version, tool_results_hash, user_context_hash) for deterministic-grade queries — "what's the cheapest Silver plan for a family of 4 in Miami at $45k income?" is the same answer for any user who asks it.
Sizes:
- Cache size target: ~10 000 most-common deterministic-grade answers
- TTL: 24 hours (tool results themselves cache-invalidate on deterministic-data updates)
- Hit rate in normal operation: ~15 % (bonus on cost)
- Hit rate during LLM outage: effectively 100 % for any cached query — user asks something we've answered before, Florence serves from cache with a freshness marker
During a Tier 6 outage, cache serves what it can before the deterministic-only banner engages.
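A minimal in-memory sketch of the cache key and TTL behavior described above. The class and function names are assumptions, hashing canonical JSON is one possible way to derive the two hashes, and the 10 000-entry size cap and shared storage are omitted.

```python
import hashlib
import json
import time

TTL_SECONDS = 24 * 60 * 60  # 24-hour TTL from the sizing list above


def cache_key(prompt_version: str, tool_results: dict, user_context: dict) -> tuple:
    """Build the (prompt_version, tool_results_hash, user_context_hash) key."""
    def digest(obj) -> str:
        # sort_keys makes the hash independent of dict insertion order
        return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()
    return (prompt_version, digest(tool_results), digest(user_context))


class ResponseCache:
    def __init__(self, ttl_s=TTL_SECONDS, clock=time.time):
        self.ttl_s, self.clock, self.entries = ttl_s, clock, {}

    def put(self, key: tuple, answer: str) -> None:
        self.entries[key] = (answer, self.clock())

    def get(self, key: tuple):
        """Return (answer, age_seconds), or None if missing or expired."""
        hit = self.entries.get(key)
        if hit is None:
            return None
        answer, stored_at = hit
        age = self.clock() - stored_at
        if age > self.ttl_s:
            del self.entries[key]
            return None
        return answer, age
```

The returned age is what a freshness marker could be rendered from. Because the user context is hashed from its content, two different users asking about the same household size, location, and income share one entry.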
Multi-region Bedrock — the operational detail
Currently staging has a bedrock-runtime VPC endpoint only in us-east-1. For the outage posture:
- Provision Bedrock access in at least three regions pre-launch: us-east-1, us-west-2, and one of eu-west-1 / ap-northeast-1. Region-specific Bedrock access must be explicitly enabled per model (AWS Bedrock model-access UI).
- Private endpoints (VPC interface endpoints for bedrock-runtime) in each of those regions, scoped to the ECS security group. Avoids relying on public internet egress during a regional degradation where routing is flaky.
- Automatic region selection by the FlorenceLLMProvider adapter — starts with nearest region, promotes to alternate regions on breaker trip.
- Region-level evals — daily eval suite runs against each region; a region failing eval (rare but possible during a regional incident) auto-excludes from the selection pool until it recovers.
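The last two items combine into a simple selection rule: walk the regions in priority order and skip any that has an open breaker or a failed daily eval. A sketch with a hypothetical `pick_region` helper (the real adapter's interface is not specified here):

```python
from typing import Optional


def pick_region(priority: list[str], open_breakers: set[str],
                failed_eval: set[str]) -> Optional[str]:
    """First region in priority order that is neither tripped nor eval-excluded."""
    for region in priority:
        if region not in open_breakers and region not in failed_eval:
            return region
    return None  # no healthy region: promote to the next tier


regions = ["us-east-1", "us-west-2", "eu-west-1"]
print(pick_region(regions, open_breakers={"us-east-1"}, failed_eval=set()))
```

Keeping eval exclusions separate from breaker state means a region that answers quickly but answers badly is still removed from the pool.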
Chaos drill schedule
Outage plans atrophy if not rehearsed. Cadence:
| Frequency | Drill |
|---|---|
| Weekly | Tier 2 region swap — force us-east-1 breaker open for 10 min; verify seamless failover to us-west-2. Automated; runs in staging. |
| Monthly | Tier 3 transport swap — flip Anthropic direct ↔ Bedrock in production for 30 min. Monitor parity. |
| Quarterly | Tier 4 downshift — force Sonnet circuit open for 60 min; measure quality delta on live traffic against the downshift eval set. |
| Quarterly | Tier 5 cross-vendor — route 1–2 % of traffic through OpenAI / Gemini for 1 hour; post quality-comparison report. |
| Semi-annual | Tier 6 "nuclear drill" — in staging only, disable all LLM providers; verify deterministic-only mode engages cleanly, cache serves, UI banner renders, member messages queue correctly. |
| On any sustained production outage | Post-incident review within 48 hours — did the cascade engage as designed? Any tier skip? What to tune? |
Drills are not optional. A cascade that has never been exercised is a cascade that doesn't work when you need it.
SLO targets during outages
Outage behavior has its own SLO, distinct from normal operation:
| Metric | Normal SLO | Outage SLO (any tier engaged) | Hard floor (Tier 6 engaged) |
|---|---|---|---|
| Florence turn success rate | ≥ 99.5 % | ≥ 99 % | N/A (Florence unavailable, deterministic-only) |
| First-token latency p95 | ≤ 500 ms | ≤ 1000 ms | N/A |
| Grounding check pass rate | ≥ 99.5 % | ≥ 99 % | N/A |
| Deterministic platform availability | ≥ 99.9 % | ≥ 99.9 % | ≥ 99.9 % |
| User-visible degradation banner when Tier 6 engaged | — | — | Shown within 30 s |
The deterministic platform SLO never drops. Whatever happens to the LLM, the user can still browse plans and see accurate prices.
Customer-visible behavior across the cascade
Consistent framing across tiers; users should never see raw errors.
| Tier engaged | User-facing behavior |
|---|---|
| 1 | Normal operation |
| 2, 3 | Normal operation. No indicator. Latency may be slightly higher (~100–200 ms). |
| 4 | Normal operation. No indicator unless the downshift materially affects a complex comparison — in which case Florence offers "Want me to dig deeper? This might take a moment," which is a natural-sounding way to ask for patience while a Tier 3 retry is attempted. |
| 5 | Normal operation. No indicator. Voice/style may feel very slightly different. |
| 6 | Explicit banner: "Florence is temporarily unavailable. The rest of the platform works normally — browse plans, check coverage, manage your account. [Leave a message for Florence]" |
Never show raw "provider error 503" to a user. Ever.
What the FlorenceRuntime needs to own
The runtime is where the cascade executes. Concrete additions to runtime beyond what's already documented:
- Multi-region Bedrock discovery. Not a single LLM_PROVIDER env var — a config array of (provider, region, model) tuples in priority order.
- Per-(provider, region, model) circuit breaker state. Stored in-memory per server instance; shared via Redis or equivalent for coordinated ingress behavior. Breaker decisions propagate within ~1 second.
- Hedging policy module. Declarative: { always: [firstTurn], conditional: [breakerHalfOpen] }.
- Response-cache lookup layer. Sits before Tier 1 attempt; populates on successful response.
- Tier 6 "deterministic-only" mode toggle. Server-side flag; UI polls it and renders the banner.
- Outage-readiness dashboard. Rolls up: current tier in use, breaker state per combo, eval pass rate per region, downshift-quality-delta trend.
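The config array and the declarative hedging policy could take roughly this shape. Everything here is a sketch under assumptions: the `Target` type, the `provider-model@region` spec format (inferred from the example value of FLORENCE_LLM_PRIMARY), and the signal names passed to `should_hedge` are illustrative, not the runtime's actual schema.

```python
from typing import NamedTuple


class Target(NamedTuple):
    provider: str
    region: str
    model: str


def parse_target(spec: str) -> Target:
    """Parse 'provider-model@region', e.g. 'bedrock-claude-sonnet-4-6@us-east-1'."""
    name, region = spec.split("@", 1)
    provider, model = name.split("-", 1)
    return Target(provider, region, model)


# Priority-ordered cascade: same model, primary region first, then alternate region.
CASCADE = [parse_target(s) for s in (
    "bedrock-claude-sonnet-4-6@us-east-1",
    "bedrock-claude-sonnet-4-6@us-west-2",
)]

HEDGING_POLICY = {"always": ["firstTurn"], "conditional": ["breakerHalfOpen"]}


def should_hedge(policy: dict, signals: set[str]) -> bool:
    """Hedge when any trigger named in the policy is currently active."""
    triggers = policy["always"] + policy["conditional"]
    return any(trigger in signals for trigger in triggers)


print(CASCADE[0])
print(should_hedge(HEDGING_POLICY, {"firstTurn"}))
```

Because `Target` is a hashable tuple, it can double as the key for per-(provider, region, model) breaker state.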
What this costs vs. what it buys
Ongoing cost:
- Multi-region Bedrock: VPC endpoints in 3 regions (~$30–$40/region/month for endpoints + minor cross-region latency when failed over)
- Hedging: ~1 % LLM spend overhead at baseline
- Response cache infra: Redis / DynamoDB TTL-based; low $100s/month at 10 k member scale
- Chaos drills: engineering time monthly; production impact negligible
- Daily region-level evals: ~$10–20/day at the full cascade
Estimated total: ~3–4 % of steady-state LLM spend as outage-preparedness overhead.
What it buys:
- Zero user-visible impact on the majority of Anthropic-infra incidents (Apr 15 login/API would have been transparent; Apr 20 Opus outage would have been transparent via Tier 3, or via a Tier 4 downshift if Bedrock had also been affected)
- Measurable-but-bounded quality impact on broad Claude outages (Tier 4 / 5)
- Graceful product-survives mode on total-LLM-failure scenarios (Tier 6)
- Defensible uptime story when members, agents, or EDE auditors ask "what happens when your AI goes down?"
Related
- Provider risk & portability — strategic framing + the four tiers of vendor switch (Tiers 4–5 here correspond to those)
- Runtime — where the cascade executes
- Evals & observability — downshift eval set, outage-readiness dashboard
- Voice — vendor strategy — the same cascade discipline applies to ASR + TTS
- AWS Bedrock status: status.aws.amazon.com
- Anthropic status: status.claude.com
Tracking
Open items to land before Florence text launch:
- [ ] Provision Bedrock access in ≥ 3 regions with private VPC endpoints
- [ ] Implement multi-region / multi-transport FlorenceLLMProvider adapter
- [ ] Circuit-breaker module with per-combo state
- [ ] Response-cache layer + Tier 6 mode toggle
- [ ] Downshift eval set authored and integrated into CI
- [ ] Chaos drill #1 (weekly region swap in staging) scheduled
- [ ] Outage-readiness dashboard live before first user sees Florence