CloudFront + WAFv2 — Edge front door
Status: Active in prod since 2026-04-23 (Phase 8) and staging since 2026-04-21 (Phase 6).
Purpose: SOC 2 CC6.1 / CC6.6 (boundary protection), HIPAA §164.308(a)(1)(ii)(B) (risk management) + §164.312(b) (audit controls), NIST 800-53 R4 SC-7 (boundary protection) + SI-4 (system monitoring), CMS EDE Phase 3 perimeter defense.
Summary
Every viewer request to askflorence.health, www.askflorence.health, or stage.askflorence.health lands on CloudFront first. CloudFront enforces TLS 1.2+ at the viewer edge, attaches a response-headers policy for security + IP-opacity headers, and runs every request through a WAFv2 web ACL before forwarding to the origin ALB. WAF logs ship to a CloudWatch log group in each environment account; the prod log group has 90-day hot retention, with a cross-account export to log-archive S3 (object-lock COMPLIANCE, 7-year) planned for Phase 11.
The configuration is Terraform-managed via infra/modules/cloudfront-waf/, wired into each environment from infra/envs/{staging,prod}/cloudfront.tf. There is exactly one place to change rule structure for both environments — the module — which is how scoped exemptions stay consistent across envs.
Resources
| Resource | Staging | Prod |
|---|---|---|
| Web ACL name | askflorence-staging-web-acl | askflorence-prod-web-acl |
| Web ACL ID | 4d7e1072-04b4-466b-b67a-5ce03036757d | e05c650b-4dec-456a-af42-3ec0a7c3dcdc |
| Account | 549136075525 | 039624954211 |
| Region (CLOUDFRONT scope) | us-east-1 | us-east-1 |
| WAF log group | aws-waf-logs-askflorence-staging-web-acl | aws-waf-logs-askflorence-prod-web-acl |
| Log retention (CloudWatch) | 14 days | 90 days |
| KMS encryption (logs) | alias/askflorence-staging-data | alias/askflorence-prod-data |
| Default action | Allow | Allow |
| Aliases served | stage.askflorence.health | askflorence.health, www.askflorence.health, prod-canary.askflorence.health |
Rule stack (priority order)
| Priority | Rule | Vendor | Mode | Scope-down |
|---|---|---|---|---|
| 0 | AWSManagedRulesCommonRuleSet | AWS managed | Enforce (vendor default Block) | URI does NOT start with /ingest/ — see Scoped exemptions |
| 10 | AWSManagedRulesKnownBadInputsRuleSet | AWS managed | Enforce | None — runs on all requests |
| 20 | AWSManagedRulesSQLiRuleSet | AWS managed | Enforce | URI does NOT start with /ingest/ — see Scoped exemptions |
| 30 | AWSManagedRulesAmazonIpReputationList | AWS managed | Enforce | User-Agent does NOT match the documented social-crawler allowlist — see Scoped exemptions |
| 40 | AWSManagedRulesAnonymousIpList | AWS managed | Enforce | Same UA-allowlist exemption as priority 30 |
| 100 | RateBasedBlanket | Custom | Block | None — 2000 req/5min/IP applies to ALL requests including exempted-from-managed-rule traffic |
The default WAF action is Allow. Any rule whose statement matches a request applies its rule action (Block for managed groups in Enforce mode, Block for the rate-based rule). A request that matches no rule passes through.
Scoped exemptions
Two scoped exemptions remediate documented false positives observed after the Phase 10 cutover. Both are narrow, documented, IaC-managed, and preserve audit logging, which is the model the migration plan calls for under "documented, risk-based deviations from default managed-rule posture."
Exemption 1 — PostHog analytics proxy /ingest/*
Rules exempted: AWSManagedRulesCommonRuleSet (priority 0) and AWSManagedRulesSQLiRuleSet (priority 20). Scope: request URI starts with /ingest/.
Why: the path is a first-party Next.js rewrite to PostHog (/ingest/static/* → us-assets.i.posthog.com/static/* and /ingest/* → us.i.posthog.com/*). Browsers POST gzip-compressed event payloads to it. The compressed bodies pattern-match Common (size/format/encoding) and SQLi signatures, so every legitimate analytics event was rejected with HTTP 403. The endpoint does not surface SQL or user-controllable input; it forwards opaque event blobs to PostHog, so Common and SQLi inspection adds no value on this path.
Residual coverage on /ingest/* requests:
- AWSManagedRulesKnownBadInputsRuleSet (priority 10): still active.
- AWSManagedRulesAmazonIpReputationList (priority 30): still active.
- AWSManagedRulesAnonymousIpList (priority 40): still active.
- RateBasedBlanket (priority 100): still active. The 2000 req/5min/IP cap applies to /ingest/* traffic the same as everywhere else.
Exemption 2 — Social-media link-unfurl crawlers
Rules exempted: AWSManagedRulesAmazonIpReputationList (priority 30) and AWSManagedRulesAnonymousIpList (priority 40). Scope: User-Agent contains (case-insensitive) any of:
| User-Agent substring | Crawler |
|---|---|
| telegrambot | Telegram link previews |
| facebookexternalhit | Facebook + Instagram OG fetch |
| facebookcatalog | Facebook product catalog |
| linkedinbot | LinkedIn link previews |
| slackbot | Slack link unfurl (matches Slackbot-LinkExpanding) |
| discordbot | Discord link previews |
| twitterbot | Twitter/X card validator |
| whatsapp | WhatsApp link previews |
| skypeuripreview | Skype/Teams link previews |
| redditbot | Reddit link previews |
| applebot | Apple Spotlight/Siri previews (partial iMessage coverage) |
The allowlist is in infra/modules/cloudfront-waf/variables.tf under social_crawler_user_agents. Variable validation enforces a minimum of 2 entries (WAFv2 or_statement requires ≥2 sub-statements).
Why: these crawlers operate from cloud datacenter CIDR ranges (Telegram 149.154.0.0/16, Meta AS32934, Microsoft Azure ranges for LinkedIn/Skype, etc.) that the AWS-managed IP-reputation feeds flag wholesale based on activity from other actors in the same range. Pre-fix, every social share of an askflorence.health link returned a broken preview — material funnel drag for the consumer + agent acquisition flows.
Residual coverage on crawler-UA requests:
- AWSManagedRulesCommonRuleSet (priority 0): still active.
- AWSManagedRulesKnownBadInputsRuleSet (priority 10): still active.
- AWSManagedRulesSQLiRuleSet (priority 20): still active.
- RateBasedBlanket (priority 100): still active.
A UA-spoofing attacker from a flagged IP must therefore still bypass payload-inspection rules and the rate-based cap to make any progress. The exemption only neuters IP-reputation gating for the documented allowlist, not the rest of the rule stack.
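The combined effect of the two scope-downs can be sketched as a small shell model that reports which rule priorities still inspect a given request. This is only an illustration of the logic described above, not the WAFv2 evaluation engine: the function names is_crawler_ua and rules_for_request are hypothetical, and the allowlist is copied by hand from the Exemption 2 table rather than read from Terraform.

```shell
# Illustrative model of the scope-downs in this section (not the real WAF engine).
# Allowlist substrings copied from the Exemption 2 table, already lowercase.
CRAWLER_UAS="telegrambot facebookexternalhit facebookcatalog linkedinbot slackbot discordbot twitterbot whatsapp skypeuripreview redditbot applebot"

# Return 0 if the User-Agent contains any allowlisted substring (case-insensitive).
is_crawler_ua() {
  local ua needle
  ua=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for needle in $CRAWLER_UAS; do
    case "$ua" in
      *"$needle"*) return 0 ;;
    esac
  done
  return 1
}

# Print the rule priorities that still inspect a request with this URI and User-Agent.
rules_for_request() {
  local uri ua rules
  uri=$1
  ua=$2
  case "$uri" in
    /ingest/*) rules="10" ;;        # Common (0) and SQLi (20) are scoped out
    *)         rules="0 10 20" ;;
  esac
  if ! is_crawler_ua "$ua"; then
    rules="$rules 30 40"            # IP reputation + anonymous IP lists apply
  fi
  echo "$rules 100"                 # the rate-based rule always applies
}

rules_for_request /ingest/e/ "Mozilla/5.0"                  # prints: 10 30 40 100
rules_for_request / "TelegramBot (like TwitterBot)"         # prints: 0 10 20 100
```

A request that matches both scopes (a crawler User-Agent on /ingest/*) is inspected only by priorities 10 and 100 in this model, which is worth keeping in mind when extending either exemption.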
Compliance posture (both exemptions)
| Framework | Control | How this configuration satisfies it |
|---|---|---|
| HIPAA Security Rule | §164.308(a)(1)(ii)(B) Risk management | Documented risk-based exception with compensating controls. |
| HIPAA Security Rule | §164.312(b) Audit controls | All /ingest/* and crawler-UA requests still logged to the WAF log group with action field showing whether the exempted rule(s) were skipped. Forensics intact. |
| SOC 2 TSC | CC6.1 Logical access controls | Boundary controls remain in BLOCK mode for all rules; exemptions are payload-class- and identity-scoped, not blanket allows. |
| SOC 2 TSC | CC6.6 Boundary protection | Defense in depth preserved — every request still hits ≥4 enforcement layers. |
| SOC 2 TSC | CC7.1 / CC7.2 System monitoring | CloudWatch metrics on every managed rule group expose pre/post-exemption block rates. CloudTrail records wafv2:UpdateWebACL for every IaC change. |
| SOC 2 TSC | CC8.1 Change management | Terraform-managed; commit history + reviewed PR + dated change-log entry. |
| NIST 800-53 R4 (MARS-E 2.2) | SC-7 Boundary protection | "Risk-commensurate" boundary protection on a public-data path. |
| NIST 800-53 R4 (MARS-E 2.2) | SI-4 System monitoring | WAF logging to the CloudWatch log group + CloudWatch metrics unchanged. |
| NIST 800-53 R4 (MARS-E 2.2) | AU-2 / AU-3 Audit events | Audit records still generated for every request including the action, terminatingRuleId, and ruleGroupList fields that show the exemption fired. |
| CMS EDE Phase 3 | MARS-E 2.2 inheritance | Both exemptions apply only to public consumer + first-party analytics paths that carry no PHI / PII / FTI / application / cms_hub data class today. |
When to re-evaluate
| Trigger | Action |
|---|---|
| Phase 5 cutover (agent portal + member dashboard ship) | Confirm authenticated/PHI-bearing routes are not addressable from the social-crawler UA allowlist (crawlers can't authenticate, so this is naturally self-limiting). Confirm /ingest/* is not carrying any new property-class data — the SDK property allowlist + outbound-egress wire-level guard are the actual defenses there, not WAF. |
| EDE Phase 3 audit prep (~Sept 2026) | Include both exemptions in the NIST 800-53 control mapping document under SC-7. Each gets a one-page entry citing this section. |
| PostHog vendor decision (Phase 11) | If PostHog migrates to self-hosted in our AWS FedRAMP Moderate env or is replaced with CloudWatch RUM, re-assess whether /ingest/* exemption is still needed. |
| Crawler list drift | Add new entries to social_crawler_user_agents only when (a) a real partner/share path is breaking AND (b) the crawler's documentation lists a stable User-Agent string. Avoid generic browser strings. |
Response-headers policy
Attached to both default + /_next/static/* cache behaviors. Two jobs:
- Security headers:
  - Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
  - Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; font-src 'self' data:; connect-src 'self' https://us.i.posthog.com https://us-assets.i.posthog.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self'
  - X-Content-Type-Options: nosniff
  - X-Frame-Options: DENY
  - Referrer-Policy: strict-origin-when-cross-origin
- IP opacity (per migration plan):
  - Server: AskFlorence (override)
  - Strip X-Powered-By, X-AspNet-Version, X-AspNetMvc-Version
  - Via is intentionally not stripped: CloudFront refuses to suppress it via RemoveHeaders, and CloudFront's own Via identifies the CDN, not the origin stack, so this is OK for IP-opacity goals.
CSP unsafe-inline + unsafe-eval will tighten in Phase 11 once Next.js inline scripts move to nonces or hashes.
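One way to spot-check the policy is to pipe a response's headers through a small checker. The sketch below is an assumption-laden example, not part of the existing tooling: check_security_headers is a hypothetical helper, it only tests for header presence (plus the Server override value), and it does not validate the CSP or HSTS values themselves.

```shell
# Hypothetical helper: read raw response headers on stdin, report deviations from
# the response-headers policy, and return non-zero if any are found.
check_security_headers() {
  local headers name status
  # Normalise CRLF line endings and header-name case before matching.
  headers=$(tr -d '\r' | tr '[:upper:]' '[:lower:]')
  status=0
  # Headers the policy is expected to add.
  for name in strict-transport-security content-security-policy \
              x-content-type-options x-frame-options referrer-policy; do
    if ! printf '%s\n' "$headers" | grep -q "^$name:"; then
      echo "missing: $name"
      status=1
    fi
  done
  # Headers the policy is expected to strip.
  for name in x-powered-by x-aspnet-version x-aspnetmvc-version; do
    if printf '%s\n' "$headers" | grep -q "^$name:"; then
      echo "leaked: $name"
      status=1
    fi
  done
  # The Server header should carry the override value.
  if ! printf '%s\n' "$headers" | grep -q '^server: askflorence$'; then
    echo "unexpected or missing Server header"
    status=1
  fi
  return "$status"
}
```

Typical use would be `curl -sI https://askflorence.health/ | check_security_headers`, repeated against a /_next/static/* asset since the policy is attached to both cache behaviors.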
Verification
Run after any rule change:
bash
# A. PostHog /ingest/e/ — should NOT be 403 (scope-down working)
curl -s -o /dev/null -w "%{http_code}\n" -X POST -H "Content-Type: application/json" \
-d '{"api_key":"x","event":"test"}' \
"https://askflorence.health/ingest/e/?ip=0&_=test&ver=1.367.0"
# Expect: 400 from PostHog (rejecting test body), NOT 403.
# B. SQLi probe on a NON-/ingest path — should STILL be 403
curl -s -o /dev/null -w "%{http_code}\n" \
"https://askflorence.health/api/counties?zip=84094%27%20UNION%20SELECT%201,2,3--"
# Expect: 403. Confirms SQLi rule still enforces general traffic.
# C. Crawler UA — should be 200 (exemption + normal serve)
curl -s -o /dev/null -A "TelegramBot (like TwitterBot)" -w "%{http_code}\n" \
"https://askflorence.health/"
# Expect: 200. Repeat for facebookexternalhit, LinkedInBot, Slackbot, etc.
# D. Health endpoint
curl -s "https://askflorence.health/api/health"
# Expect: {"status":"ok","commit":"...","env":"prod"}
Operational runbook
Inspect what the WAF blocked recently
bash
# Recent BLOCK actions (last 30 minutes), prod
aws --profile askflorence-prod logs filter-log-events \
--log-group-name aws-waf-logs-askflorence-prod-web-acl \
--filter-pattern '{ $.action = "BLOCK" }' \
--start-time $(($(date +%s)*1000 - 1800000)) \
--max-items 20 \
--query 'events[].message' \
--output text | jq '{timestamp: .timestamp | tonumber | (./1000) | strftime("%Y-%m-%dT%H:%M:%SZ"), uri: .httpRequest.uri, ip: .httpRequest.clientIp, ua: ([.httpRequest.headers[] | select(.name | ascii_downcase == "user-agent") | .value] | first), rule: .terminatingRuleId, group: .ruleGroupList[0].ruleGroupId}'
Confirm an exempted crawler UA is no longer being blocked
bash
# Count BLOCKs in the last hour matching a crawler UA — should be 0 after the fix
aws --profile askflorence-prod logs filter-log-events \
--log-group-name aws-waf-logs-askflorence-prod-web-acl \
--filter-pattern '{ $.action = "BLOCK" && ($.httpRequest.headers[*].value = "*TelegramBot*" || $.httpRequest.headers[*].value = "*facebookexternalhit*" || $.httpRequest.headers[*].value = "*LinkedInBot*" || $.httpRequest.headers[*].value = "*Slackbot*" || $.httpRequest.headers[*].value = "*Discordbot*" || $.httpRequest.headers[*].value = "*Twitterbot*" || $.httpRequest.headers[*].value = "*WhatsApp*") }' \
--start-time $(($(date +%s)*1000 - 3600000)) \
--query 'length(events[])' \
--output text
Add a new crawler to the allowlist
- Edit social_crawler_user_agents in infra/modules/cloudfront-waf/variables.tf.
- Append a row to Exemption 2's allowlist table above with the crawler name + UA substring + reason.
- terraform apply against staging, then prod.
- Append an entry to change-log.md with timestamp + commit SHA.
Remove a scope-down (re-enforce a rule on the exempted scope)
Set the controlling variable to its disabling value (empty string for posthog_proxy_uri_prefix, empty list for social_crawler_user_agents) in the relevant env's cloudfront.tf. Plan + apply. The dynamic block disappears from the resource, and the managed rule resumes inspecting the previously-exempted requests.
Adjust the rate-based limit
Edit web_acl_rate_limit_per_5min on the relevant module.cloudfront_* block in infra/envs/{staging,prod}/cloudfront.tf. Default is 2000.
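When picking a new value, it can help to translate between the per-5-minute limit and an average requests-per-second figure. The helpers below are a rough sketch with hypothetical names (limit_to_rps, rps_to_limit); they assume a 300-second window and describe only the sustained average, since a single IP can spend its whole budget in a short burst.

```shell
# Average sustained requests per second allowed by a per-5-minute limit.
limit_to_rps() {
  awk -v limit="$1" 'BEGIN { printf "%.2f\n", limit / 300 }'
}

# Per-5-minute limit needed to allow a target sustained requests-per-second rate.
rps_to_limit() {
  awk -v rps="$1" 'BEGIN { printf "%d\n", rps * 300 }'
}

limit_to_rps 2000   # prints 6.67 (the current default)
rps_to_limit 10     # prints 3000
```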
Related
- infra/modules/cloudfront-waf/: Terraform module
- infra/envs/staging/cloudfront.tf: staging instantiation
- infra/envs/prod/cloudfront.tf: prod instantiation
- next.config.ts: /ingest/* rewrite to PostHog
- Change Log: every WAF change is recorded here
- Issue #47: AWS migration parent issue (where the WAF false-positive observations originated)