Adding (or removing) a tool
The tool surface will grow and change continuously. Drug coverage ships (Phase C, #17). Provider directory ships (Phase D, #18). Appointment booking, claims, bill review, renewal analysis, carrier integrations, agent-side drafting tools — all arrive as tools. Some earlier tools will be renamed, refactored, deprecated.
This playbook is the standard path for adding, versioning, and removing a Florence tool. Follow it every time.
Lifecycle at a glance
Adding a tool — checklist
1. Proposal (before writing code)
One-paragraph proposal in the PR description or a linked issue:
- What member / agent scenario does this tool unlock?
- What deterministic endpoint does it wrap? (If the endpoint doesn't exist yet, this is two PRs.)
- What data classes flow through it (input + output)?
- Which auth contexts should be allowed?
- What failure modes matter most (slow API, empty result, auth edge cases)?
2. Design review (required before implementation)
Confirm with the Florence AI architect (Taha by default):
- [ ] Tool name follows conventions: `api_verb_noun` or `ui_verb_noun`.
- [ ] Input / output shapes are reviewed for LLM-friendliness (stable field names, compact, self-describing).
- [ ] Input and output data classes assessed against the data classification doc. If any FTI or ApplicationPayload is in the output, additional egress-control review is required.
- [ ] Auth contexts reviewed: is anonymous OK? Agent cross-member access intentional?
- [ ] Cacheability decided: TTL set, invalidation rules clear.
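The naming convention in the first checklist item can be checked mechanically. The following is a minimal sketch, assuming the convention means lowercase snake_case with an `api_` or `ui_` prefix followed by at least a verb segment and a noun segment; the pattern and the `isValidToolName` helper are hypothetical, not part of the Florence codebase.

```typescript
// Hypothetical check for the api_verb_noun / ui_verb_noun convention.
// Assumption: names are lowercase snake_case with at least two segments
// (verb + noun) after the prefix; later segments may contain digits.
const TOOL_NAME_PATTERN = /^(api|ui)_[a-z]+(_[a-z0-9]+)+$/;

function isValidToolName(name: string): boolean {
  return TOOL_NAME_PATTERN.test(name);
}
```

Under this assumed pattern, `api_draft_renewal_outreach` passes, while `api_lookup` (no noun segment) and `draftRenewalOutreach` (no prefix, camelCase) are rejected.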
3. Implementation
- [ ] Write the tool in `src/lib/florence/tools/api/<tool-name>.ts` (or `ui/<tool-name>.ts`). One tool per file.
- [ ] Export a `FlorenceTool<Input, Output>` object with every field populated.
- [ ] Zod schemas for input and output.
- [ ] Wire it into `src/lib/florence/tools/registry.ts`.
- [ ] Add to tool registry doc.
- [ ] If the tool reads plan / drug / provider data: handle SBE vs FFM correctly. Thread the user's `state` into the deterministic call, and confirm the discovery identifier matches the coverage identifier for owned-data states (NY, CA). CA providers key on Symphony `providerId`, not NPPES NPI: a tool that discovers via NPPES but checks coverage against CA data will silently return "not covered." See the SBE vs FFM data lookup section in the tool-surface contract + `docs/data-sources/sbe-state-watchouts.md`. Add at least one eval case per owned-data state you claim to support.
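To make the implementation checklist concrete, here is a rough sketch of what a single tool file might export. Only `FlorenceTool<Input, Output>`, `description`, and `acceptsAuthContexts` come from this playbook; every other field name, the `authenticated_member` context, the `api_check_drug_coverage` tool, and the stub endpoint are assumptions for illustration. Hand-written parse functions stand in for the Zod schemas so the sketch has no dependencies.

```typescript
// Assumed shape; the authoritative contract is the tool-surface doc.
type AuthContext =
  | "anonymous"
  | "authenticated_member" // assumed name
  | "authenticated_agent"
  | "authenticated_admin";

type ToolStatus = "beta" | "stable" | "deprecated";

interface FlorenceTool<Input, Output> {
  name: string;
  description: string;
  status: ToolStatus;
  acceptsAuthContexts: AuthContext[];
  parseInput: (raw: unknown) => Input; // stand-in for a Zod input schema
  parseOutput: (raw: unknown) => Output; // stand-in for a Zod output schema
  execute: (input: Input) => Promise<Output>;
}

interface CoverageInput { state: string; drugName: string; }
interface CoverageOutput { covered: boolean; }

function parseCoverageInput(raw: unknown): CoverageInput {
  const r = raw as Partial<CoverageInput> | null;
  if (!r || typeof r.state !== "string" || typeof r.drugName !== "string") {
    throw new Error("invalid input: expected { state, drugName }");
  }
  return { state: r.state, drugName: r.drugName };
}

function parseCoverageOutput(raw: unknown): CoverageOutput {
  const r = raw as Partial<CoverageOutput> | null;
  if (!r || typeof r.covered !== "boolean") {
    throw new Error("invalid output: expected { covered }");
  }
  return { covered: r.covered };
}

// Stub standing in for the deterministic endpoint the tool wraps.
async function fetchCoverageStub(input: CoverageInput): Promise<unknown> {
  return { covered: input.state === "NY" && input.drugName === "metformin" };
}

// Hypothetical tool: note that the user's state is threaded into the call.
const apiCheckDrugCoverage: FlorenceTool<CoverageInput, CoverageOutput> = {
  name: "api_check_drug_coverage",
  description: "Checks whether a drug is covered for a member's state.",
  status: "beta",
  acceptsAuthContexts: ["authenticated_member", "authenticated_agent"],
  parseInput: parseCoverageInput,
  parseOutput: parseCoverageOutput,
  execute: async (input) => parseCoverageOutput(await fetchCoverageStub(input)),
};
```

Validating the endpoint's response through `parseOutput` before returning it keeps the declared output shape honest even if the upstream API drifts.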
4. Eval coverage (blocking on merge)
Minimum eval bundle in scripts/florence-evals/tools/<tool-name>/:
- [ ] At least three factual cases where the expected tool output is known and the response must include those facts.
- [ ] At least one adversarial case: question that looks computational but should route to the tool.
- [ ] At least one auth-boundary case: call from a disallowed auth context must be rejected without the underlying endpoint being hit.
- [ ] At least one hallucination-dragnet case if the tool returns numeric data.
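The auth-boundary case is the easiest of these to get subtly wrong, because a rejection that happens after the endpoint call still counts as a failure. One possible shape for such a case is sketched below; the `invokeTool` dispatcher, the `GuardedTool` type, and the hit counter are hypothetical stand-ins for whatever the real eval harness provides.

```typescript
type AuthContext = "anonymous" | "authenticated_agent" | "authenticated_admin";

interface GuardedTool {
  acceptsAuthContexts: AuthContext[];
  callEndpoint: () => string;
}

// Hypothetical dispatcher: the allowlist is checked before the endpoint runs.
function invokeTool(tool: GuardedTool, ctx: AuthContext): string {
  if (!tool.acceptsAuthContexts.includes(ctx)) {
    throw new Error(`auth_denied: ${ctx}`);
  }
  return tool.callEndpoint();
}

// Counts how often the (stubbed) deterministic endpoint is actually reached.
let endpointHits = 0;

const agentOnlyTool: GuardedTool = {
  acceptsAuthContexts: ["authenticated_agent", "authenticated_admin"],
  callEndpoint: () => {
    endpointHits += 1;
    return "ok";
  },
};

// The eval case: a disallowed context must be rejected AND leave the hit count untouched.
function runAuthBoundaryCase(): { rejected: boolean; endpointHits: number } {
  let rejected = false;
  try {
    invokeTool(agentOnlyTool, "anonymous");
  } catch {
    rejected = true;
  }
  return { rejected, endpointHits };
}
```

Asserting on the hit counter, not only on the rejection, is what distinguishes this from an ordinary error-path test.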
5. Security review (blocking on merge for any tool touching PHI, PII, FTI, or authenticated endpoints)
- [ ] Data classification sign-off: output class declared correctly.
- [ ] Auth context allowlist reviewed.
- [ ] Adapter-sink compatibility confirmed (the deterministic endpoint's destination vendor accepts the declared class).
- [ ] Audit-log payload reviewed — does it capture what an auditor would need without over-retaining PII?
- [ ] Cache semantics reviewed — member-specific outputs must not be cached across members.
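The cache-semantics item comes down to how the cache key is built. A minimal sketch follows, assuming a string key assembled from the tool name, a member scope, and the sorted input; the `buildCacheKey` helper, its `CacheKeyParts` type, and the `api_get_claims` tool name are illustrative, not the real `cacheKey` implementation.

```typescript
interface CacheKeyParts {
  toolName: string;
  // null only for outputs that are genuinely identical for every member.
  memberId: string | null;
  input: Record<string, string | number | boolean>;
}

// Hypothetical key builder: member-specific results always carry the member ID.
function buildCacheKey(parts: CacheKeyParts): string {
  const scope = parts.memberId === null ? "shared" : `member:${parts.memberId}`;
  // Sort keys so that logically equal inputs map to the same cache entry.
  const sortedInput = Object.keys(parts.input)
    .sort()
    .map((key) => `${key}=${String(parts.input[key])}`)
    .join("&");
  return `${parts.toolName}|${scope}|${sortedInput}`;
}
```

With this scheme, two members asking the same question can never share an entry, because the member scope is part of the key.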
6. Ship as beta
- [ ] Feature flag on: `FLORENCE_TOOL_<NAME>=beta`.
- [ ] Announced to the LLM in the tool-definitions block only when flag is `beta` or `stable`.
- [ ] Staging deploy verified end-to-end (see evals & observability).
- [ ] Monitor dashboard includes this tool's latency, cost, error rate, auth-denial count, cache-hit rate.
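The flag gating described above might be read as follows. This is a sketch under stated assumptions: that `<NAME>` is the upper-cased tool name, that any value other than `beta` or `stable` means the tool is off, and that the environment is passed in as a plain object; `flagNameFor`, `readToolFlag`, and `isAnnounced` are hypothetical helpers.

```typescript
type FlagValue = "off" | "beta" | "stable";

// Assumption: <NAME> in FLORENCE_TOOL_<NAME> is the upper-cased tool name.
function flagNameFor(toolName: string): string {
  return `FLORENCE_TOOL_${toolName.toUpperCase()}`;
}

// Unset or unrecognized values fall back to "off", so a typo never exposes a tool.
function readToolFlag(
  env: Record<string, string | undefined>,
  toolName: string,
): FlagValue {
  const raw = env[flagNameFor(toolName)];
  return raw === "beta" || raw === "stable" ? raw : "off";
}

// Only beta and stable tools are included in the tool-definitions block.
function isAnnounced(flag: FlagValue): boolean {
  return flag === "beta" || flag === "stable";
}
```

In a Node runtime the first argument would typically be `process.env`; taking it as a parameter keeps the function easy to exercise in evals.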
7. Graduate to stable
Flip to stable once beta has sustained all of:
- [ ] Zero unexplained auth denials.
- [ ] p95 latency within budget (set per tool; default ≤ 800 ms for API-wrapper tools).
- [ ] Eval pass rate ≥ 98 %.
- [ ] Cost per invocation within 20 % of estimate.
- [ ] Cache-hit rate within target (if cacheable).
Update tool registry status.
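The graduation criteria above can be expressed as a single check that returns whatever is still blocking the flip. The sketch below is illustrative only: the metric names are invented, "within 20 % of estimate" is read as an absolute deviation in either direction, and "within target" is read as at or above the target rate.

```typescript
interface BetaMetrics {
  unexplainedAuthDenials: number;
  p95LatencyMs: number;
  evalPassRate: number; // fraction between 0 and 1
  costPerInvocation: number;
  estimatedCostPerInvocation: number;
  cacheHitRate: number | null; // null when the tool is not cacheable
}

interface ToolBudgets {
  p95LatencyMs: number;
  cacheHitRateTarget: number | null; // null when the tool is not cacheable
}

// Default p95 budget for API-wrapper tools, per the criteria above.
const DEFAULT_API_WRAPPER_P95_MS = 800;

// Returns the list of unmet criteria; an empty list means the tool can go stable.
function graduationBlockers(m: BetaMetrics, b: ToolBudgets): string[] {
  const blockers: string[] = [];
  if (m.unexplainedAuthDenials > 0) blockers.push("unexplained auth denials");
  if (m.p95LatencyMs > b.p95LatencyMs) blockers.push("p95 latency over budget");
  if (m.evalPassRate < 0.98) blockers.push("eval pass rate below 98%");
  const costDrift =
    Math.abs(m.costPerInvocation - m.estimatedCostPerInvocation) /
    m.estimatedCostPerInvocation;
  if (costDrift > 0.2) blockers.push("cost more than 20% off estimate");
  if (
    b.cacheHitRateTarget !== null &&
    (m.cacheHitRate === null || m.cacheHitRate < b.cacheHitRateTarget)
  ) {
    blockers.push("cache-hit rate below target");
  }
  return blockers;
}
```

Returning the blockers instead of a boolean gives the registry update a ready-made explanation when a tool stays in beta.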
Versioning a tool
Tools evolve. Two rules:
Minor (additive, non-breaking). New optional input field. New output field. Relaxed validation. No version bump. Deploy straight to stable once evals pass.
Major (breaking). Input renamed, removed, retyped. Output shape changed. Behavior changed.
- Create `src/lib/florence/tools/api/<tool-name>-v2.ts`. Register as `api_<tool_name>_v2`.
- Keep the v1 tool in the registry marked `deprecated` for one audit window (one eval cycle minimum).
- The system prompt announces only `stable` + `beta` tools; the v1 deprecated tool is still callable by in-flight conversations that already have it in context, but Florence no longer volunteers it.
- Full eval bundle for v2 (independent of v1).
- After the audit window, v1 is removed in a PR that also archives its eval bundle (kept for auditor traceability; not run in CI).
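The distinction between "announced" and "callable" during the audit window can be sketched as two separate lookups over the registry. The `RegistryEntry` shape and the `api_lookup_plan` tool names are hypothetical; only the rule (deprecated tools stay callable but are not announced) comes from the steps above.

```typescript
type ToolStatus = "beta" | "stable" | "deprecated";

interface RegistryEntry {
  name: string;
  status: ToolStatus;
}

// Hypothetical registry state in the middle of a v1 -> v2 audit window.
const registry: RegistryEntry[] = [
  { name: "api_lookup_plan", status: "deprecated" },
  { name: "api_lookup_plan_v2", status: "beta" },
];

// Names offered to the LLM in new conversations: deprecated tools are left out.
function announcedTools(entries: RegistryEntry[]): string[] {
  return entries
    .filter((entry) => entry.status !== "deprecated")
    .map((entry) => entry.name);
}

// Any registered tool can still be dispatched, so in-flight conversations keep working.
function isCallable(entries: RegistryEntry[], name: string): boolean {
  return entries.some((entry) => entry.name === name);
}
```

Once the audit window closes and the v1 entry is deleted, `isCallable` starts returning false for it without any further code change.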
Deprecating / removing a tool
Tools go away. Most often because:
- Deterministic endpoint was replaced.
- Scope of Florence shifted (e.g. Ian decides agent-side Florence doesn't need `api_draft_renewal_outreach` because the marketing team owns that copy).
- Data classification changed and the tool must be removed from a specific auth context.
Steps:
- Mark the tool `deprecated` in its file. Registry reflects this automatically.
- Announce in the sprint notes; flag the downstream code / prompts / UI that depended on it.
- One audit window of dual-running (tool still callable, not announced to LLM for new conversations).
- Remove the tool file + registry entry in a PR that:
  - [ ] Deletes `src/lib/florence/tools/{api,ui}/<tool-name>.ts`.
  - [ ] Removes registry entry.
  - [ ] Updates tool registry doc: removed tools are moved to an "Archived" section with the removal date, so the auditor-facing trail is preserved.
  - [ ] Archives the eval bundle (move `scripts/florence-evals/tools/<tool-name>/` to `scripts/florence-evals/tools/_archived/<tool-name>/`).
  - [ ] Bumps the system-prompt version (see runtime); removing tools is a cache-invalidating change.
Anti-patterns
Things that look fine and aren't.
- Hand-editing the LLM-visible tool description in one place and the Zod schema in another. Single source of truth: the Zod schema + the `description` field on the tool object. Regenerate the LLM-visible block from those.
- Skipping eval coverage "because the API is tested." The API being tested doesn't tell us Florence calls it correctly. Eval coverage is non-negotiable.
- Catching and swallowing deterministic-API errors inside the wrapper. If the API is slow or errors, Florence needs to know: she'll tell the user, offer to retry, or escalate. A wrapper that returns a faked "empty" result on error causes silent hallucinations downstream.
- Per-user prompt caching. The Anthropic prompt cache cannot include per-user data in the cached prefix. Keep `user_profile` as its own prompt slot, cached separately; do not bake user-specific content into the stable prefix.
- Adding a tool "just for agents" without thinking about the prompt. Agent-mode Florence is a different system prompt + tool surface; see principles §9. Agent-only tools declare `acceptsAuthContexts: ["authenticated_agent", "authenticated_admin"]` and are registered only in the agent-mode tool list.
- Caching member-specific outputs across members. The cache key must include the member ID (or any identifier that makes the result user-specific). A static analysis check on `cacheKey` implementations flags this.
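The error-swallowing anti-pattern is easiest to see next to an alternative. One option, sketched here with a hypothetical `ToolResult` type and `wrapEndpointCall` helper, is to return an explicit error variant so that a failed call can never be mistaken for an empty result. The sketch is synchronous for brevity; a real wrapper around a network call would be async.

```typescript
// Hypothetical result type: failure is a distinct variant, never an empty payload.
type ToolResult<T> =
  | { ok: true; data: T }
  | { ok: false; error: { kind: "upstream_error"; message: string } };

// Surfaces endpoint failures to the caller instead of faking an empty success.
function wrapEndpointCall<T>(call: () => T): ToolResult<T> {
  try {
    return { ok: true, data: call() };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { ok: false, error: { kind: "upstream_error", message } };
  }
}
```

A genuinely empty list comes back as `{ ok: true, data: [] }`, while a timeout comes back as `{ ok: false, ... }`, which gives Florence what she needs to tell the user, offer a retry, or escalate.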
Related
- Tool surface — the shape every tool conforms to
- Tool registry — living inventory
- Evals & observability — eval harness detail
- Data classification — the broader compliance-in-code plan