CloudFront + WAFv2 — Edge front door

Status: Active in prod since 2026-04-23 (Phase 8) and staging since 2026-04-21 (Phase 6).

Purpose: SOC 2 CC6.1 / CC6.6 (boundary protection), HIPAA §164.308(a)(1)(ii)(B) (risk management) + §164.312(b) (audit controls), NIST 800-53 R4 SC-7 (boundary protection) + SI-4 (system monitoring), CMS EDE Phase 3 perimeter defense.

Summary

Every viewer request to askflorence.health, www.askflorence.health, or stage.askflorence.health lands on CloudFront first. CloudFront enforces TLS 1.2+ at the viewer edge, attaches a response-headers policy for security + IP-opacity headers, and runs every request through a WAFv2 web ACL before forwarding to the origin ALB. WAF logs ship to a CloudWatch log group in each environment account; the prod log group is on a 90-day hot retention with a planned cross-account export to log-archive S3 (object-lock COMPLIANCE 7-year) at Phase 11.

The configuration is Terraform-managed via infra/modules/cloudfront-waf/, wired into each environment from infra/envs/{staging,prod}/cloudfront.tf. There is exactly one place to change rule structure for both environments — the module — which is how scoped exemptions stay consistent across envs.

Resources

Resource	Staging	Prod
Web ACL name	`askflorence-staging-web-acl`	`askflorence-prod-web-acl`
Web ACL ID	`4d7e1072-04b4-466b-b67a-5ce03036757d`	`e05c650b-4dec-456a-af42-3ec0a7c3dcdc`
Account	549136075525	039624954211
Region (CLOUDFRONT scope)	us-east-1	us-east-1
WAF log group	`aws-waf-logs-askflorence-staging-web-acl`	`aws-waf-logs-askflorence-prod-web-acl`
Log retention (CloudWatch)	14 days	90 days
KMS encryption (logs)	`alias/askflorence-staging-data`	`alias/askflorence-prod-data`
Default action	Allow	Allow
Aliases served	`stage.askflorence.health`	`askflorence.health`, `www.askflorence.health`, `prod-canary.askflorence.health`

Rule stack (priority order)

Priority	Rule	Vendor	Mode	Scope-down
0	`AWSManagedRulesCommonRuleSet`	AWS managed	Enforce (vendor default Block)	URI does NOT start with `/ingest/` — see Scoped exemptions
10	`AWSManagedRulesKnownBadInputsRuleSet`	AWS managed	Enforce	None — runs on all requests
20	`AWSManagedRulesSQLiRuleSet`	AWS managed	Enforce	URI does NOT start with `/ingest/` — see Scoped exemptions
30	`AWSManagedRulesAmazonIpReputationList`	AWS managed	Enforce	User-Agent does NOT match the documented social-crawler allowlist — see Scoped exemptions
40	`AWSManagedRulesAnonymousIpList`	AWS managed	Enforce	Same UA-allowlist exemption as priority 30
100	`RateBasedBlanket`	Custom	Block	None — 2000 req/5min/IP applies to ALL requests including exempted-from-managed-rule traffic

The default WAF action is Allow. A request that matches a rule's statement receives that rule's action: Block for the managed rule groups in Enforce mode, and Block for the rate-based rule. A request that matches no rule is allowed through to the origin.

Scoped exemptions

Two scoped exemptions are applied to remediate documented false-positives observed post-Phase-10 cutover. Both are narrow, documented, IaC-managed, and preserve audit logging — the model the migration plan calls for under "documented, risk-based deviations from default managed-rule posture."

Exemption 1 — PostHog analytics proxy `/ingest/*`

Rules exempted: AWSManagedRulesCommonRuleSet (priority 0) and AWSManagedRulesSQLiRuleSet (priority 20). Scope: request URI starts with /ingest/.

Why: the path is a first-party Next.js rewrite to PostHog (/ingest/static/* → us-assets.i.posthog.com/static/* and /ingest/* → us.i.posthog.com/*). Browsers POST gzip-compressed event payloads to it. The compressed bodies pattern-match Common (size/format/encoding) and SQLi signatures, so before the exemption WAF returned HTTP 403 for every legitimate analytics event. The endpoint does not surface SQL or user-controllable input; it forwards opaque event blobs to PostHog, so managed-rule inspection adds no value here.

Residual coverage on /ingest/* requests:

- AWSManagedRulesKnownBadInputsRuleSet (priority 10) — still active.
- AWSManagedRulesAmazonIpReputationList (priority 30) — still active.
- AWSManagedRulesAnonymousIpList (priority 40) — still active.
- RateBasedBlanket (priority 100) — still active. 2000 req/5min/IP cap applies to /ingest/* traffic the same as everywhere else.

Exemption 2 — Social-crawler User-Agent allowlist

Rules exempted: AWSManagedRulesAmazonIpReputationList (priority 30) and AWSManagedRulesAnonymousIpList (priority 40). Scope: User-Agent contains (case-insensitive) any of:

User-Agent substring	Crawler
`telegrambot`	Telegram link previews
`facebookexternalhit`	Facebook + Instagram OG fetch
`facebookcatalog`	Facebook product catalog
`linkedinbot`	LinkedIn link previews
`slackbot`	Slack link unfurl (matches `Slackbot-LinkExpanding`)
`discordbot`	Discord link previews
`twitterbot`	Twitter/X card validator
`whatsapp`	WhatsApp link previews
`skypeuripreview`	Skype/Teams link previews
`redditbot`	Reddit link previews
`applebot`	Apple Spotlight/Siri previews (partial iMessage coverage)

The allowlist is in infra/modules/cloudfront-waf/variables.tf under social_crawler_user_agents. Variable validation enforces a minimum of 2 entries (WAFv2 or_statement requires ≥2 sub-statements).

Why: these crawlers operate from cloud datacenter CIDR ranges (Telegram 149.154.0.0/16, Meta AS32934, Microsoft Azure ranges for LinkedIn/Skype, etc.) that the AWS-managed IP-reputation feeds flag wholesale based on activity from other actors in the same range. Pre-fix, every social share of an askflorence.health link returned a broken preview — material funnel drag for the consumer + agent acquisition flows.

Residual coverage on crawler-UA requests:

- AWSManagedRulesCommonRuleSet (priority 0) — still active.
- AWSManagedRulesKnownBadInputsRuleSet (priority 10) — still active.
- AWSManagedRulesSQLiRuleSet (priority 20) — still active.
- RateBasedBlanket (priority 100) — still active.

A UA-spoofing attacker from a flagged IP must therefore still bypass payload-inspection rules and the rate-based cap to make any progress. The exemption only neuters IP-reputation gating for the documented allowlist, not the rest of the rule stack.
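
The combined effect of the two scope-downs can be sketched as a small lookup. The snippet below is an illustrative model only, not the Terraform source: the function names `is_crawler_ua` and `rules_for_request` are hypothetical, the substrings are copied from the allowlist table above, and the matching logic is assumed to be "URI starts with `/ingest/`" and "lowercased User-Agent contains a substring", as the Rule stack table describes.

```bash
# Allowlist substrings copied from the Exemption 2 table (all lowercase).
CRAWLER_UAS="telegrambot facebookexternalhit facebookcatalog linkedinbot slackbot discordbot twitterbot whatsapp skypeuripreview redditbot applebot"

# is_crawler_ua UA -> exit 0 if UA contains an allowlisted substring (case-insensitive).
is_crawler_ua() (
  ua=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  for needle in $CRAWLER_UAS; do
    case "$ua" in
      *"$needle"*) exit 0 ;;
    esac
  done
  exit 1
)

# rules_for_request URI UA -> space-separated priorities of the rules that
# still inspect the request under the assumed scope-down logic.
rules_for_request() (
  uri=$1
  ua=$2
  rules=""
  case "$uri" in
    /ingest/*) ingest=1 ;;
    *)         ingest=0 ;;
  esac
  if is_crawler_ua "$ua"; then crawler=1; else crawler=0; fi

  [ "$ingest" -eq 0 ]  && rules="$rules 0"    # CommonRuleSet skips /ingest/*
  rules="$rules 10"                           # KnownBadInputs: always runs
  [ "$ingest" -eq 0 ]  && rules="$rules 20"   # SQLiRuleSet skips /ingest/*
  [ "$crawler" -eq 0 ] && rules="$rules 30"   # IP reputation skips crawler UAs
  [ "$crawler" -eq 0 ] && rules="$rules 40"   # Anonymous IP skips crawler UAs
  rules="$rules 100"                          # RateBasedBlanket: always runs
  echo "${rules# }"
)

rules_for_request "/ingest/e/" "Mozilla/5.0"
rules_for_request "/" "TelegramBot (like TwitterBot)"
```

Under these assumptions the two calls print `10 30 40 100` and `0 10 20 100`, matching the residual-coverage lists for each exemption. A request that matches both scopes at once (a crawler UA on `/ingest/*`) is left with priorities 10 and 100 only.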

Compliance posture (both exemptions)

Framework	Control	How this configuration satisfies it
HIPAA Security Rule	§164.308(a)(1)(ii)(B) Risk management	Documented risk-based exception with compensating controls.
HIPAA Security Rule	§164.312(b) Audit controls	All `/ingest/*` and crawler-UA requests are still logged to the WAF log group; the `action`, `terminatingRuleId`, and `ruleGroupList` fields show how each request was evaluated. Forensics intact.
SOC 2 TSC	CC6.1 Logical access controls	Boundary controls remain in BLOCK mode for all rules; exemptions are payload-class- and identity-scoped, not blanket allows.
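SOC 2 TSC	CC6.6 Boundary protection	Defense in depth preserved: a request matching one exemption is still evaluated by 4 of the 6 rules. A request matching both (a crawler UA on `/ingest/*`) is evaluated by 2: `AWSManagedRulesKnownBadInputsRuleSet` and `RateBasedBlanket`.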
SOC 2 TSC	CC7.1 / CC7.2 System monitoring	CloudWatch metrics on every managed rule group expose pre/post-exemption block rates. CloudTrail records `wafv2:UpdateWebACL` for every IaC change.
SOC 2 TSC	CC8.1 Change management	Terraform-managed; commit history + reviewed PR + dated change-log entry.
NIST 800-53 R4 (MARS-E 2.2)	SC-7 Boundary protection	"Risk-commensurate" boundary protection on a public-data path.
NIST 800-53 R4 (MARS-E 2.2)	SI-4 System monitoring	WAF logs to the CloudWatch log group + CloudWatch metrics unchanged; the export to log-archive S3 remains planned for Phase 11.
NIST 800-53 R4 (MARS-E 2.2)	AU-2 / AU-3 Audit events	Audit records still generated for every request including the `action`, `terminatingRuleId`, and `ruleGroupList` fields that show the exemption fired.
CMS EDE Phase 3	MARS-E 2.2 inheritance	Both exemptions apply only to public consumer + first-party analytics paths that carry no PHI / PII / FTI / `application` / `cms_hub` data class today.

When to re-evaluate

Trigger	Action
Phase 5 cutover (agent portal + member dashboard ship)	Confirm authenticated/PHI-bearing routes are not addressable from the social-crawler UA allowlist (crawlers can't authenticate, so this is naturally self-limiting). Confirm `/ingest/*` is not carrying any new property-class data — the SDK property allowlist + outbound-egress wire-level guard are the actual defenses there, not WAF.
EDE Phase 3 audit prep (~Sept 2026)	Include both exemptions in the NIST 800-53 control mapping document under SC-7. Each gets a one-page entry citing this section.
PostHog vendor decision (Phase 11)	If PostHog migrates to self-hosted in our AWS FedRAMP Moderate env or is replaced with CloudWatch RUM, re-assess whether `/ingest/*` exemption is still needed.
Crawler list drift	Add new entries to `social_crawler_user_agents` only when (a) a real partner/share path is breaking AND (b) the crawler's documentation lists a stable User-Agent string. Avoid generic browser strings.

Response-headers policy

Attached to both the default and the /_next/static/* cache behaviors. It does two jobs:

Security headers:
- Strict-Transport-Security: max-age=31536000; includeSubDomains; preload
- Content-Security-Policy: default-src 'self'; script-src 'self' 'unsafe-inline' 'unsafe-eval'; style-src 'self' 'unsafe-inline'; img-src 'self' data: https:; font-src 'self' data:; connect-src 'self' https://us.i.posthog.com https://us-assets.i.posthog.com; frame-ancestors 'none'; base-uri 'self'; form-action 'self'
- X-Content-Type-Options: nosniff
- X-Frame-Options: DENY
- Referrer-Policy: strict-origin-when-cross-origin

IP opacity (per migration plan):
- Server: AskFlorence (override)
- Strip X-Powered-By, X-AspNet-Version, X-AspNetMvc-Version
- Via is intentionally not stripped — CloudFront refuses to suppress it via RemoveHeaders; CloudFront's own Via identifies the CDN, not the origin stack, so this is OK for IP-opacity goals.

The CSP's 'unsafe-inline' and 'unsafe-eval' allowances are scheduled to be tightened in Phase 11, once Next.js inline scripts move to nonces or hashes.
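
One way to confirm the policy is attached is to check a live response for the five security headers listed above. The helper below is a minimal sketch: `missing_security_headers` and `REQUIRED_HEADERS` are hypothetical names, and the check only tests for presence, not for the exact values in the policy.

```bash
# Header names from the Security headers list above, lowercased.
REQUIRED_HEADERS="strict-transport-security content-security-policy x-content-type-options x-frame-options referrer-policy"

# missing_security_headers: reads raw response headers on stdin, prints each
# required header that is absent, and exits 1 if any are missing.
missing_security_headers() (
  # Reduce the input to header names only, lowercased, one per line.
  names=$(tr -d '\r' | sed -n 's/^\([A-Za-z0-9-]*\):.*/\1/p' | tr '[:upper:]' '[:lower:]')
  status=0
  for h in $REQUIRED_HEADERS; do
    if ! printf '%s\n' "$names" | grep -qx "$h"; then
      echo "$h"
      status=1
    fi
  done
  exit "$status"
)

# Example wiring (requires network access, so it is left commented out):
#   curl -sI https://askflorence.health/ | missing_security_headers
#   curl -sI https://askflorence.health/_next/static/ | missing_security_headers
```

An empty output with exit status 0 means every required header was present. The second example URL is only a placeholder for any object served by the /_next/static/* behavior.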

Verification

Run after any rule change:

bash

# A. PostHog /ingest/e/ — should NOT be 403 (scope-down working)
curl -s -o /dev/null -w "%{http_code}\n" -X POST -H "Content-Type: application/json" \
  -d '{"api_key":"x","event":"test"}' \
  "https://askflorence.health/ingest/e/?ip=0&_=test&ver=1.367.0"
# Expect: 400 from PostHog (rejecting test body), NOT 403.

# B. SQLi probe on a NON-/ingest path — should STILL be 403
curl -s -o /dev/null -w "%{http_code}\n" \
  "https://askflorence.health/api/counties?zip=84094%27%20UNION%20SELECT%201,2,3--"
# Expect: 403. Confirms SQLi rule still enforces general traffic.

# C. Crawler UA — should be 200 (exemption + normal serve)
curl -s -o /dev/null -A "TelegramBot (like TwitterBot)" -w "%{http_code}\n" \
  "https://askflorence.health/"
# Expect: 200. Repeat for facebookexternalhit, LinkedInBot, Slackbot, etc.

# D. Health endpoint
curl -s "https://askflorence.health/api/health"
# Expect: {"status":"ok","commit":"...","env":"prod"}
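
The checks above print bare status codes that have to be compared by eye. A minimal sketch of a pass/fail wrapper follows; `check_status` and `http_code` are hypothetical helper names, and the `!403` notation for "anything except 403" is a convention invented for this example.

```bash
# http_code ARGS... -> prints only the HTTP status code of a curl request.
http_code() {
  curl -s -o /dev/null -w '%{http_code}' "$@"
}

# check_status LABEL EXPECTED ACTUAL
# EXPECTED is a status code ("403") or a negated code ("!403").
# Prints PASS/FAIL and exits non-zero on a mismatch.
check_status() (
  label=$1 expected=$2 actual=$3
  case "$expected" in
    '!'*) if [ "$actual" != "${expected#?}" ]; then ok=1; else ok=0; fi ;;
    *)    if [ "$actual" = "$expected" ]; then ok=1; else ok=0; fi ;;
  esac
  if [ "$ok" -eq 1 ]; then
    echo "PASS $label (got $actual)"
  else
    echo "FAIL $label (expected $expected, got $actual)"
    exit 1
  fi
)

# Example wiring (requires network access, so it is left commented out):
#   check_status "B: SQLi probe blocked" 403 \
#     "$(http_code 'https://askflorence.health/api/counties?zip=84094%27%20UNION%20SELECT%201,2,3--')"
#   for ua in TelegramBot facebookexternalhit LinkedInBot Slackbot; do
#     check_status "C: $ua served" 200 "$(http_code -A "$ua" https://askflorence.health/)"
#   done
```

Check A maps to `check_status "A: ingest not blocked" '!403' ...`, since the expectation there is only that WAF does not return 403.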

Operational runbook

Inspect what the WAF blocked recently

bash

# Recent BLOCK actions (last 30 minutes), prod
aws --profile askflorence-prod logs filter-log-events \
  --log-group-name aws-waf-logs-askflorence-prod-web-acl \
  --filter-pattern '{ $.action = "BLOCK" }' \
  --start-time $(($(date +%s)*1000 - 1800000)) \
  --max-items 20 \
  --query 'events[].message' \
  --output text | jq '{timestamp: .timestamp | tonumber | (./1000) | strftime("%Y-%m-%dT%H:%M:%SZ"), uri: .httpRequest.uri, ip: .httpRequest.clientIp, ua: ([.httpRequest.headers[] | select(.name | ascii_downcase == "user-agent") | .value] | first), rule: .terminatingRuleId, group: .ruleGroupList[0].ruleGroupId}'
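
The `--start-time` value is epoch time in milliseconds, which is why the command multiplies `date +%s` by 1000 and subtracts 1800000 (30 minutes x 60 seconds x 1000). A small helper makes the window explicit; `window_start_ms` is a hypothetical name used only for this sketch.

```bash
# window_start_ms MINUTES -> epoch milliseconds for "MINUTES ago".
window_start_ms() {
  echo $(( ($(date +%s) - $1 * 60) * 1000 ))
}

# 30 minutes back, equivalent to $(($(date +%s)*1000 - 1800000)):
window_start_ms 30
# 24 hours back, for a longer look-back:
window_start_ms 1440
```

It can be dropped into the command as `--start-time "$(window_start_ms 30)"`.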

Confirm an exempted crawler UA is no longer being blocked

bash

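# Count BLOCKs in the last hour matching a crawler UA — should be 0 after the fix
# CloudWatch Logs filter patterns are case-sensitive, and this pattern covers
# 7 of the 11 allowlisted UAs; add the remaining substrings when checking those.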
aws --profile askflorence-prod logs filter-log-events \
  --log-group-name aws-waf-logs-askflorence-prod-web-acl \
  --filter-pattern '{ $.action = "BLOCK" && ($.httpRequest.headers[*].value = "*TelegramBot*" || $.httpRequest.headers[*].value = "*facebookexternalhit*" || $.httpRequest.headers[*].value = "*LinkedInBot*" || $.httpRequest.headers[*].value = "*Slackbot*" || $.httpRequest.headers[*].value = "*Discordbot*" || $.httpRequest.headers[*].value = "*Twitterbot*" || $.httpRequest.headers[*].value = "*WhatsApp*") }' \
  --start-time $(($(date +%s)*1000 - 3600000)) \
  --query 'length(events[])' \
  --output text

Add a new crawler to the allowlist

1. Edit social_crawler_user_agents in infra/modules/cloudfront-waf/variables.tf.
2. Append a row to Exemption 2's allowlist table above with the crawler name + UA substring + reason.
3. terraform apply against staging, then prod.
4. Append an entry to change-log.md with timestamp + commit SHA.
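
Before editing the variable, a candidate substring can be sanity-checked against the "avoid generic browser strings" rule. The sketch below is an assumption-laden illustration: `check_crawler_candidate`, `contains`, and the `GENERIC_TOKENS` list are hypothetical, the lowercase requirement mirrors the lowercase entries in the allowlist table, and `examplebot` is a made-up crawler name.

```bash
# Tokens found in ordinary browser User-Agents (illustrative, not exhaustive).
GENERIC_TOKENS="mozilla chrome safari firefox edge applewebkit gecko windows android iphone"

# contains HAYSTACK NEEDLE -> exit 0 if NEEDLE is a substring of HAYSTACK.
contains() {
  case "$1" in
    *"$2"*) return 0 ;;
    *)      return 1 ;;
  esac
}

# check_crawler_candidate CANDIDATE EXISTING...
# Exits 0 if CANDIDATE looks safe to append; otherwise prints a reason, exits 1.
check_crawler_candidate() (
  candidate=$1
  shift
  if [ -z "$candidate" ]; then echo "empty entry"; exit 1; fi
  lowered=$(printf '%s' "$candidate" | tr '[:upper:]' '[:lower:]')
  if [ "$candidate" != "$lowered" ]; then echo "use lowercase: $lowered"; exit 1; fi
  for token in $GENERIC_TOKENS; do
    if contains "$token" "$candidate" || contains "$candidate" "$token"; then
      echo "too generic: overlaps browser token '$token'"
      exit 1
    fi
  done
  for existing in "$@"; do
    if [ "$existing" = "$candidate" ]; then echo "already present"; exit 1; fi
  done
  exit 0
)

# Hypothetical new crawler checked against two existing entries:
check_crawler_candidate examplebot telegrambot slackbot && echo "ok to add"
```

This does not replace the two conditions in the re-evaluation table (a real share path is breaking, and the crawler documents a stable User-Agent); it only catches obviously unsafe entries.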

Remove a scope-down (re-enforce a rule on the exempted scope)

Set the controlling variable to its disabling value (empty string for posthog_proxy_uri_prefix, empty list for social_crawler_user_agents) in the relevant env's cloudfront.tf. Plan + apply. The dynamic block disappears from the resource, and the managed rule resumes inspecting the previously-exempted requests.

Rotate the rate-based limit

Edit web_acl_rate_limit_per_5min on the relevant module.cloudfront_* block in infra/envs/{staging,prod}/cloudfront.tf. Default is 2000.
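
When choosing a new value it helps to convert between the per-window limit and a sustained per-second rate. The arithmetic below assumes the 5-minute (300-second) window implied by "2000 req/5min/IP"; `limit_for_rps` and `avg_rps_for_limit` are hypothetical helper names.

```bash
# limit_for_rps RPS -> requests per 5-minute window at a sustained RPS.
limit_for_rps() {
  echo $(( $1 * 300 ))
}

# avg_rps_for_limit LIMIT -> sustained requests/second (two decimals)
# that a given 5-minute limit allows from a single IP.
avg_rps_for_limit() {
  awk -v limit="$1" 'BEGIN { printf "%.2f\n", limit / 300 }'
}

avg_rps_for_limit 2000   # the default: 2000 / 300
limit_for_rps 10         # limit needed to allow 10 req/s sustained
```

The default of 2000 works out to roughly 6.67 requests per second per IP when averaged over the window, and a sustained 10 requests per second would need a limit of 3000.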

Related

- infra/modules/cloudfront-waf/ — Terraform module
- infra/envs/staging/cloudfront.tf — staging instantiation
- infra/envs/prod/cloudfront.tf — prod instantiation
- next.config.ts — /ingest/* rewrite to PostHog
- Change Log — every WAF change is recorded here
- Issue #47 — AWS migration parent issue (where the WAF false-positive observations originated)