Adding (or removing) a tool

The tool surface will grow and change continuously. Drug coverage ships (Phase C, #17). Provider directory ships (Phase D, #18). Appointment booking, claims, bill review, renewal analysis, carrier integrations, and agent-side drafting tools will all arrive as tools. Some earlier tools will be renamed, refactored, or deprecated.

This playbook is the standard path for adding, versioning, and removing a Florence tool. Follow it every time.

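Lifecycle at a glance

Proposal → design review → implementation → eval coverage → security review → beta → stable → deprecated → removed (archived).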

Adding a tool: checklist

1. Proposal (before writing code)

One-paragraph proposal in the PR description or a linked issue:

  • What member / agent scenario does this tool unlock?
  • What deterministic endpoint does it wrap? (If the endpoint doesn't exist yet, this is two PRs.)
  • What data classes flow through it (input + output)?
  • Which auth contexts should be allowed?
  • What failure modes matter most (slow API, empty result, auth edge cases)?

2. Design review (required before implementation)

Confirm with the Florence AI architect (Taha by default):

  • [ ] Tool name follows conventions: api_verb_noun or ui_verb_noun.
  • [ ] Input / output shapes are reviewed for LLM-friendliness (stable field names, compact, self-describing).
  • [ ] Data classes assessed against the data classification policy. If any FTI or ApplicationPayload is in the output, additional egress-control review is required.
  • [ ] Auth contexts reviewed: is anonymous OK? Agent cross-member access intentional?
  • [ ] Cacheability decided: TTL set, invalidation rules clear.
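
The naming convention in the checklist above can be checked mechanically. The following is a minimal sketch, assuming names are lowercase snake_case with an api or ui prefix followed by at least a verb and a noun segment; the helper name and the exact pattern are illustrative, not part of the codebase.

```typescript
// Hypothetical helper: checks the api_verb_noun / ui_verb_noun convention.
// Assumption: segments are lowercase alphanumerics and at least two
// segments (verb + noun) follow the prefix.
const TOOL_NAME_PATTERN = /^(api|ui)_[a-z][a-z0-9]*(_[a-z][a-z0-9]*)+$/;

function isValidToolName(name: string): boolean {
  return TOOL_NAME_PATTERN.test(name);
}
```

A check like this could run in design review or CI so that a misnamed tool is caught before it reaches the registry.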

3. Implementation

  • [ ] Write the tool in src/lib/florence/tools/api/<tool-name>.ts (or ui/<tool-name>.ts). One tool per file.
  • [ ] Export a FlorenceTool<Input, Output> object with every field populated.
  • [ ] Zod schemas for input and output.
  • [ ] Wire it into src/lib/florence/tools/registry.ts.
  • [ ] Add it to the tool registry doc.
  • [ ] If the tool reads plan / drug / provider data: handle SBE vs FFM correctly. Thread the user's state into the deterministic call, and confirm the discovery identifier matches the coverage identifier for owned-data states (NY, CA). CA providers key on Symphony providerId, not NPPES NPI — a tool that discovers via NPPES but checks coverage against CA data will silently return "not covered." See the SBE vs FFM data lookup section in the tool-surface contract + docs/data-sources/sbe-state-watchouts.md. Add at least one eval case per owned-data state you claim to support.
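
As a rough illustration of the implementation steps above, here is a sketch of what a tool file and its registry entry might look like. The real FlorenceTool contract is defined in the tool-surface doc and is not reproduced here, so every field other than description and acceptsAuthContexts is an assumption, as are the tool name api_get_plan_summary, the "authenticated_member" context, and the in-memory plan table. A hand-written validator stands in for the Zod input schema to keep the sketch dependency-free, and the output schema is omitted.

```typescript
// Assumed auth contexts; only the agent and admin values appear in this playbook.
type AuthContext =
  | "anonymous"
  | "authenticated_member"
  | "authenticated_agent"
  | "authenticated_admin";

type ToolStatus = "beta" | "stable" | "deprecated";

// Hypothetical shape; the real contract lives in the tool-surface doc.
interface FlorenceTool<Input, Output> {
  name: string;
  description: string;
  status: ToolStatus;
  acceptsAuthContexts: AuthContext[];
  parseInput: (raw: unknown) => Input; // stand-in for a Zod schema
  execute: (input: Input) => Output;   // a real wrapper would be async
}

interface PlanSummaryInput { state: string; planId: string; }
interface PlanSummaryOutput { state: string; planId: string; found: boolean; }

function parsePlanSummaryInput(raw: unknown): PlanSummaryInput {
  const r = raw as Partial<PlanSummaryInput> | null;
  if (!r || typeof r.state !== "string" || !/^[A-Z]{2}$/.test(r.state)) {
    throw new Error("state must be a two-letter uppercase code");
  }
  if (typeof r.planId !== "string" || r.planId.length === 0) {
    throw new Error("planId is required");
  }
  return { state: r.state, planId: r.planId };
}

// Stand-in for the deterministic endpoint: plan IDs keyed by state.
const PLANS_BY_STATE: Record<string, string[]> = {
  NY: ["ny-plan-1"],
  CA: ["ca-plan-1"],
};

export const apiGetPlanSummary: FlorenceTool<PlanSummaryInput, PlanSummaryOutput> = {
  name: "api_get_plan_summary",
  description: "Looks up whether a plan exists in the member's state.",
  status: "beta",
  acceptsAuthContexts: ["authenticated_member", "authenticated_agent"],
  parseInput: parsePlanSummaryInput,
  // The member's state is threaded into the lookup, never defaulted.
  execute: ({ state, planId }) => ({
    state,
    planId,
    found: (PLANS_BY_STATE[state] ?? []).includes(planId),
  }),
};

// Registry wiring: one entry per tool, keyed by tool name.
const toolRegistry = new Map<string, FlorenceTool<any, any>>();
toolRegistry.set(apiGetPlanSummary.name, apiGetPlanSummary);
```

Passing the state explicitly through the input type makes it harder for a wrapper to fall back silently to a default data source, which is the failure mode the SBE vs FFM item warns about.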

4. Eval coverage (blocking on merge)

Minimum eval bundle in scripts/florence-evals/tools/<tool-name>/:

  • [ ] At least three factual cases where the expected tool output is known and the response must include those facts.
  • [ ] At least one adversarial case: a question that looks computational but should route to the tool.
  • [ ] At least one auth-boundary case: call from a disallowed auth context must be rejected without the underlying endpoint being hit.
  • [ ] At least one hallucination-dragnet case if the tool returns numeric data.
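
The auth-boundary case is the easiest of these to get subtly wrong, because a rejection that still calls the endpoint looks correct from the outside. The sketch below shows one way to assert both halves: the call is rejected and the endpoint counter stays at zero. The invokeTool guard, the result shape, and the hit counter are all hypothetical; the eval harness itself is described in evals & observability and may look quite different.

```typescript
type AuthContext =
  | "anonymous"
  | "authenticated_member" // assumed value, not named in this playbook
  | "authenticated_agent"
  | "authenticated_admin";

type InvokeResult<O> =
  | { ok: true; output: O }
  | { ok: false; reason: "auth_denied" };

interface GuardedTool<I, O> {
  name: string;
  acceptsAuthContexts: AuthContext[];
  execute: (input: I) => O;
}

// Hypothetical guard: the allowlist is checked before execute is touched.
function invokeTool<I, O>(
  tool: GuardedTool<I, O>,
  ctx: AuthContext,
  input: I,
): InvokeResult<O> {
  if (!tool.acceptsAuthContexts.includes(ctx)) {
    return { ok: false, reason: "auth_denied" };
  }
  return { ok: true, output: tool.execute(input) };
}

// Counter standing in for "the underlying endpoint was hit".
let endpointHits = 0;

const draftOutreach: GuardedTool<{ memberId: string }, { draft: string }> = {
  name: "api_draft_renewal_outreach",
  acceptsAuthContexts: ["authenticated_agent", "authenticated_admin"],
  execute: ({ memberId }) => {
    endpointHits += 1;
    return { draft: `Renewal reminder for ${memberId}` };
  },
};

// Disallowed context first, then an allowed one for contrast.
const deniedResult = invokeTool(draftOutreach, "anonymous", { memberId: "m-1" });
const hitsAfterDenied = endpointHits;
const allowedResult = invokeTool(draftOutreach, "authenticated_agent", { memberId: "m-1" });
```

An eval case built this way would fail if someone moved the allowlist check after the endpoint call, even though the caller-visible result would still be a denial.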

5. Security review (blocking on merge for any tool touching PHI, PII, FTI, or authenticated endpoints)

  • [ ] Data classification sign-off: output class declared correctly.
  • [ ] Auth context allowlist reviewed.
  • [ ] Adapter-sink compatibility confirmed (the deterministic endpoint's destination vendor accepts the declared class).
  • [ ] Audit-log payload reviewed — does it capture what an auditor would need without over-retaining PII?
  • [ ] Cache semantics reviewed — member-specific outputs must not be cached across members.
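
For the cache-semantics item above, a minimal sketch of a cache-key builder that refuses to produce a key for member-specific output unless a member ID is supplied. The function name, the request shape, and the key format are assumptions for illustration; the actual cacheKey contract is defined on the tool object.

```typescript
type CacheableInput = Record<string, string | number | boolean>;

interface CacheKeyRequest {
  toolName: string;
  memberSpecific: boolean;
  memberId?: string;
  input: CacheableInput;
}

// Hypothetical key builder: member-specific results are always scoped
// to a member, and input keys are sorted so key order cannot split entries.
function buildCacheKey(req: CacheKeyRequest): string {
  if (req.memberSpecific && !req.memberId) {
    throw new Error(`${req.toolName}: member-specific output needs a memberId in the cache key`);
  }
  const scope = req.memberSpecific ? `member:${req.memberId}` : "shared";
  const stableInput = JSON.stringify(
    Object.keys(req.input)
      .sort()
      .map((key) => [key, req.input[key]]),
  );
  return `${req.toolName}|${scope}|${stableInput}`;
}
```

Failing loudly when the member ID is missing is a deliberate choice in this sketch: a thrown error during development is cheaper than one member seeing another member's cached result.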

6. Ship as beta

  • [ ] Feature flag on: FLORENCE_TOOL_<NAME>=beta.
  • [ ] Announced to the LLM in the tool-definitions block only when flag is beta or stable.
  • [ ] Staging deploy verified end-to-end (see evals & observability).
  • [ ] Monitor dashboard includes this tool's latency, cost, error rate, auth-denial count, cache-hit rate.
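
The flag rule above (announce only when the flag is beta or stable) could be sketched as follows. This assumes the <NAME> part of FLORENCE_TOOL_<NAME> is the uppercased tool name and that any other value, including a missing flag, means the tool is off; neither detail is stated in this playbook, and the tool names used are illustrative. The environment is passed in as a plain object so the logic can be tested without touching the real process environment.

```typescript
type FlagValue = "off" | "beta" | "stable";
type Env = Record<string, string | undefined>;

// Assumption: the flag suffix is the tool name in upper case.
function flagNameFor(toolName: string): string {
  return `FLORENCE_TOOL_${toolName.toUpperCase()}`;
}

// Anything other than "beta" or "stable" is treated as off.
function readToolFlag(toolName: string, env: Env): FlagValue {
  const raw = env[flagNameFor(toolName)];
  return raw === "beta" || raw === "stable" ? raw : "off";
}

// Only beta and stable tools are included in the tool-definitions block.
function announcedTools(toolNames: string[], env: Env): string[] {
  return toolNames.filter((name) => readToolFlag(name, env) !== "off");
}
```

Treating unknown values as off means a typo in a flag value hides a tool rather than exposing it, which seems the safer default for a beta rollout.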

7. Graduate to stable

Flip to stable once beta has sustained all of:

  • [ ] Zero unexplained auth denials.
  • [ ] p95 latency within budget (set per tool; default ≤ 800 ms for API-wrapper tools).
  • [ ] Eval pass rate ≥ 98 %.
  • [ ] Cost per invocation within 20 % of estimate.
  • [ ] Cache-hit rate within target (if cacheable).

Update the tool's status in the tool registry.
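
The graduation criteria above are all numeric, so they lend themselves to a simple gate. The sketch below is one possible reading, with two interpretations flagged as assumptions: "within 20 % of estimate" is taken as an absolute deviation in either direction, and "within target" for cache-hit rate is taken as at or above a per-tool target. The metric and budget field names are hypothetical.

```typescript
interface BetaMetrics {
  unexplainedAuthDenials: number;
  p95LatencyMs: number;
  evalPassRate: number; // fraction between 0 and 1
  costPerInvocationUsd: number;
  estimatedCostPerInvocationUsd: number;
  cacheHitRate?: number; // only for cacheable tools
}

interface ToolBudget {
  p95LatencyBudgetMs: number;
  cacheHitRateTarget?: number;
}

// Default from the checklist: 800 ms for API-wrapper tools.
const DEFAULT_API_WRAPPER_BUDGET: ToolBudget = { p95LatencyBudgetMs: 800 };

// Returns the list of unmet criteria; an empty list means ready for stable.
function graduationBlockers(
  m: BetaMetrics,
  budget: ToolBudget = DEFAULT_API_WRAPPER_BUDGET,
): string[] {
  const blockers: string[] = [];
  if (m.unexplainedAuthDenials > 0) blockers.push("unexplained auth denials");
  if (m.p95LatencyMs > budget.p95LatencyBudgetMs) blockers.push("p95 latency over budget");
  if (m.evalPassRate < 0.98) blockers.push("eval pass rate below 98%");
  // Assumption: deviation is measured in both directions.
  const costDeviation =
    Math.abs(m.costPerInvocationUsd - m.estimatedCostPerInvocationUsd) /
    m.estimatedCostPerInvocationUsd;
  if (costDeviation > 0.2) blockers.push("cost per invocation off estimate by more than 20%");
  if (
    budget.cacheHitRateTarget !== undefined &&
    (m.cacheHitRate ?? 0) < budget.cacheHitRateTarget
  ) {
    blockers.push("cache-hit rate below target");
  }
  return blockers;
}
```

Returning the list of blockers rather than a boolean makes the output usable directly in a PR comment or dashboard annotation.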

Versioning a tool

Tools evolve. Two rules:

Minor (additive, non-breaking). New optional input field. New output field. Relaxed validation. No version bump. Deploy straight to stable once evals pass.

Major (breaking). Input renamed, removed, retyped. Output shape changed. Behavior changed.

  1. Create src/lib/florence/tools/api/<tool-name>-v2.ts. Register as api_<tool_name>_v2.
  2. Keep the v1 tool in the registry marked deprecated for one audit window (one eval cycle minimum).
  3. The system prompt announces only stable + beta tools — the v1 deprecated tool is still callable by in-flight conversations that already have it in context, but Florence no longer volunteers it.
  4. Full eval bundle for v2 (independent of v1).
  5. After the audit window, v1 is removed in a PR that also archives its eval bundle (kept for auditor traceability; not run in CI).
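
The audit-window behavior in step 3 separates two questions: which tools are announced to the LLM, and which are callable. A minimal sketch of that split follows, assuming a flat registry of name and status pairs and a per-conversation set of tool names already in context. Both structures, the helper names, and the example tool names are hypothetical.

```typescript
type ToolStatus = "beta" | "stable" | "deprecated";

interface RegistryEntry {
  name: string;
  status: ToolStatus;
}

// Only stable and beta tools are announced for new conversations.
function announcedToolNames(registry: RegistryEntry[]): string[] {
  return registry
    .filter((entry) => entry.status !== "deprecated")
    .map((entry) => entry.name);
}

// A deprecated tool stays callable only for conversations that already
// have it in context; unknown tools are never callable.
function isCallable(
  registry: RegistryEntry[],
  name: string,
  toolsInConversationContext: ReadonlySet<string>,
): boolean {
  const entry = registry.find((e) => e.name === name);
  if (!entry) return false;
  if (entry.status === "deprecated") return toolsInConversationContext.has(name);
  return true;
}

// Example registry during an audit window: v1 deprecated, v2 stable.
const auditWindowRegistry: RegistryEntry[] = [
  { name: "api_check_drug_coverage", status: "deprecated" },
  { name: "api_check_drug_coverage_v2", status: "stable" },
];
```

Under this sketch, a conversation started before the v2 rollout can still complete a v1 call, while a new conversation never sees v1 in its tool list.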

Deprecating / removing a tool

Tools go away. Most often because:

  • Deterministic endpoint was replaced.
  • Scope of Florence shifted (e.g. Ian decides agent-side Florence doesn't need api_draft_renewal_outreach because the marketing team owns that copy).
  • Data classification changed and the tool must be removed from a specific auth context.

Steps:

  1. Mark the tool deprecated in its file. Registry reflects this automatically.
  2. Announce in the sprint notes; flag the downstream code / prompts / UI that depended on it.
  3. One audit window of dual-running (tool still callable, not announced to LLM for new conversations).
  4. Remove the tool file + registry entry in a PR that:
    • [ ] Deletes src/lib/florence/tools/{api,ui}/<tool-name>.ts.
    • [ ] Removes registry entry.
    • [ ] Updates tool registry doc — removed tools are moved to an "Archived" section with the removal date, so the auditor-facing trail is preserved.
    • [ ] Archives the eval bundle (move scripts/florence-evals/tools/<tool-name>/ to scripts/florence-evals/tools/_archived/<tool-name>/).
    • [ ] Bumps the system-prompt version (see runtime) — removing tools is a cache-invalidating change.

Anti-patterns

Things that look fine and aren't.

  • Hand-editing the LLM-visible tool description in one place and the Zod schema in another. Single source of truth: the Zod schema + the description field on the tool object. Regenerate the LLM-visible block from those.
  • Skipping eval coverage "because the API is tested." The API being tested doesn't tell us Florence calls it correctly. Eval coverage is non-negotiable.
  • Catching and swallowing deterministic-API errors inside the wrapper. If the API is slow or errors, Florence needs to know — she'll tell the user, offer to retry, or escalate. A wrapper that returns a faked "empty" result on error causes silent hallucinations downstream.
  • Per-user prompt caching. The Anthropic prompt cache matches on an exact prompt prefix, so per-user data in the cached prefix prevents that prefix from being reused across users. Keep user_profile as its own prompt slot, cached separately; do not bake user-specific content into the stable prefix.
  • Adding a tool "just for agents" without thinking about the prompt. Agent-mode Florence is a different system prompt + tool surface; see principles §9. Agent-only tools declare acceptsAuthContexts: ["authenticated_agent", "authenticated_admin"] and are registered only in the agent-mode tool list.
  • Caching member-specific outputs across members. The cache key must include the member ID (or any identifier that makes the result user-specific). A static-analysis check on cacheKey implementations flags violations.
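
To make the error-swallowing anti-pattern concrete, here is a sketch of a wrapper that surfaces an endpoint failure as an explicit error result instead of a faked empty one. The ToolResult shape, the wrapEndpoint helper, and the hard-coded retryable flag are assumptions for illustration; real wrappers would be async and would classify errors more carefully.

```typescript
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: { kind: "upstream_error"; message: string; retryable: boolean } };

// Hypothetical wrapper: failures become a typed error the model can see,
// never an empty success.
function wrapEndpoint<I, O>(call: (input: I) => O): (input: I) => ToolResult<O> {
  return (input) => {
    try {
      return { ok: true, data: call(input) };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      // Assumption: every upstream failure is treated as retryable here.
      return { ok: false, error: { kind: "upstream_error", message, retryable: true } };
    }
  };
}

// Two stand-in endpoints: one that fails, one that legitimately returns nothing.
const failingSearch = wrapEndpoint((_query: string): string[] => {
  throw new Error("provider directory timed out");
});
const emptySearch = wrapEndpoint((_query: string): string[] => []);

const failedLookup = failingSearch("cardiology");
const emptyLookup = emptySearch("cardiology");
```

The point of the two cases is that "no results" and "could not check" stay distinguishable: the first is a true empty answer Florence can report, the second is a failure she can explain, retry, or escalate.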

Related

  • Tool surface — the shape every tool conforms to
  • Tool registry — living inventory
  • Evals & observability — eval harness detail
  • Data classification — the broader compliance-in-code plan

AskFlorence Internal Documentation. Not for public distribution.
