Secure LLM Gateway & AI Control Plane

An edge gateway that fronts every LLM call with hard token budgets, cost metering, PII & prompt-injection guardrails, online evals, and full observability — running on Cloudflare Workers, provisioned entirely by Terraform, and shipped through a security-scanned CI/CD pipeline.

  • Hard budget caps (not soft alerts)
  • Atomic spend ledger · D1
  • Guardrails in & out
  • Workers AI · Llama 3
  • Terraform IaC
  • Semgrep · Trivy · Checkov · SBOM
  • NIST 800-53 mapped

Overview

last 24h
  • Requests (with success rate)
  • Tokens (in + out)
  • Spend (across tenants)
  • p95 Latency (with p50)
  • Requests over time
  • Budget burn-down
  • Guardrail hits by reason
  • Online eval score: mean(safety, latency-SLO) over window · CI gate at 0.6
  • Reliability: Errors · Guardrail blocks · Cache hits · SLO p95 < 3000ms
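
The online eval score is described as mean(safety, latency-SLO) over a window, gated in CI at 0.6. A minimal sketch of one way such a score could be computed follows. It assumes a per-request safety score in [0, 1] and a binary latency component (1 when the request beats the 3000 ms target, 0 otherwise); the type and function names are hypothetical, and the project's actual eval harness may weight or aggregate differently.

```typescript
// Hypothetical shape of one evaluated request; field names are assumptions.
interface EvalSample {
  safety: number;    // assumed to lie in [0, 1]
  latencyMs: number; // observed end-to-end latency
}

const SLO_MS = 3000; // from the dashboard's "SLO p95 < 3000ms"
const CI_GATE = 0.6; // from the dashboard's "CI gate at 0.6"

// Per-request score: mean of the safety score and a binary latency-SLO score.
function requestScore(s: EvalSample): number {
  const latencyScore = s.latencyMs < SLO_MS ? 1 : 0;
  return (s.safety + latencyScore) / 2;
}

// Window score: mean of the per-request scores; an empty window has no score.
function windowScore(samples: EvalSample[]): number | null {
  if (samples.length === 0) return null;
  const total = samples.reduce((sum, s) => sum + requestScore(s), 0);
  return total / samples.length;
}

// Assumption: an empty window fails the gate rather than passing by default.
function passesGate(samples: EvalSample[]): boolean {
  const score = windowScore(samples);
  return score !== null && score >= CI_GATE;
}
```

Treating an empty window as a failure is a design choice made here so that a broken eval run cannot pass silently.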

Live request log

most recent 20
Time · Status · Tenant · Model · Tokens · Cost · Latency · Eval · Guardrail
No requests yet — send one below ↓

▶ Try the gateway

live · rate-limited · no data retained
The model response and its per-request metadata — tokens, cost, guardrail verdict, eval score, budget impact — appear here.

Architecture

self-documenting · how it's built & shipped
  client / curl / this dashboard
            │  HTTPS
            ▼
  ┌──────────── Cloudflare edge ─────────────┐
  │  WAF · per-IP rate-limit ruleset (TF)     │
  │                ▼                          │
  │   Aegis Worker  (single artifact)         │
  │   1 rate-limit   (KV sliding window)      │
  │   2 guardrails-in  (PII · injection)──▶ block 402/200
  │   3 budget pre-charge (atomic D1)  ──▶ 402 hard cap
  │   4 inference  ─────────────────────▶ AI Gateway ▶ Workers AI (Llama 3)
  │   5 reconcile cost + tokens (D1 ledger)   │
  │   6 guardrails-out (PII redaction)        │
  │   7 online eval (safety · latency SLO)    │
  │   8 audit log (D1 requests table)         │
  └───────┬──────────────────┬────────────────┘
       KV (rate, cache)    D1 (ledger, audit, evals)

  provisioned by:  Terraform (Worker + bindings + KV + D1 + AI Gateway + DNS + WAF)
  shipped by:      GitHub Actions → typecheck · tests · esbuild · Semgrep · Trivy(SCA+IaC) · Checkov · hadolint · SBOM → gated terraform apply

Cloudflare runtime

  • Workers — gateway logic + UI, one URL
  • Workers AI — Llama 3 inference on-net
  • AI Gateway — caching + analytics
  • KV — rate-limit windows
  • D1 (SQLite) — spend ledger + audit log

Terraform (IaC)

  • Every resource declared as code
  • Worker script + AI/KV/D1 bindings
  • DNS + custom domain + rate-limit rule
  • fmt / validate / plan in CI
  • No click-ops, reproducible

DevSecOps

  • Trivy config + Checkov — IaC scanning
  • Semgrep (SAST) · Trivy (SCA)
  • Syft SBOM + cosign signing
  • Least-privilege scoped token
  • Gated apply · NIST 800-53 mapped

Component models

how the system is built — request flow, runtime topology, control plane, data

1 · Request pipeline

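Every /v1/chat request runs the eight stages from the Architecture diagram in order (src/gateway.ts). Some stages, such as guardrails-in and budget pre-charge, can reject the request before inference runs.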
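
The ordered, short-circuiting flow can be sketched as below. This is an illustration only: the types, the `runPipeline` helper, and the placeholder stage bodies are hypothetical and not taken from src/gateway.ts. The stage names follow the architecture diagram. Stages are synchronous here to keep the sketch short; real stages that call KV, D1, or Workers AI would be async.

```typescript
// Hypothetical request context; the real gateway's context may differ.
interface Ctx {
  tenant: string;
  prompt: string;
  trace: string[]; // names of the stages that ran, kept for illustration
}

type StageResult = { ok: true } | { ok: false; status: number; reason: string };

interface Stage {
  name: string;
  run: (ctx: Ctx) => StageResult;
}

// Run stages in order and stop at the first rejection.
function runPipeline(stages: Stage[], ctx: Ctx): StageResult {
  for (const stage of stages) {
    ctx.trace.push(stage.name);
    const result = stage.run(ctx);
    if (!result.ok) return result;
  }
  return { ok: true };
}

// Placeholder stage body that always lets the request through.
const pass = (): StageResult => ({ ok: true });

const demoStages: Stage[] = [
  { name: "rate-limit", run: pass },
  { name: "guardrails-in", run: pass },
  {
    name: "budget pre-charge",
    // Placeholder rule: reject one tenant to show the short-circuit.
    run: (ctx) =>
      ctx.tenant === "over-budget"
        ? { ok: false, status: 402, reason: "hard cap" }
        : { ok: true },
  },
  { name: "inference", run: pass },
  { name: "reconcile", run: pass },
  { name: "guardrails-out", run: pass },
  { name: "online eval", run: pass },
  { name: "audit log", run: pass },
];
```

With this shape, a request rejected at the budget stage never reaches inference, which is the property that makes a budget cap hard rather than advisory.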

2 · Runtime topology

A single Worker artifact mediates all traffic; each store is chosen for the consistency model its data needs.
Client: curl · dashboard
Cloudflare edge: WAF · per-IP rate-limit
Aegis Worker: gateway + UI
KV: rate windows · cache
D1 (SQLite): budget ledger · audit
AI Gateway → Workers AI: Llama 3 · cache · analytics
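
The KV-backed rate windows can be illustrated with a sliding-window counter. The sketch below is an assumption about how such a limiter might work, not the project's code: it uses an in-memory `Map` in place of a KV binding, the key format `rl:{tenant}:{min}` from the data model, and hypothetical function names. Because Workers KV is eventually consistent, a KV-backed version of this counter would be approximate rather than exact.

```typescript
// In-memory stand-in for the KV namespace; a real Worker would use a KV binding.
const counters = new Map<string, number>();

const WINDOW_MS = 60000; // one-minute buckets

// Key format taken from the data model: rl:{tenant}:{min}
function bucketKey(tenant: string, minute: number): string {
  return `rl:${tenant}:${minute}`;
}

// Sliding-window counter: weight the previous minute's count by the share of
// that minute still covered by the trailing 60-second window.
function allowRequest(tenant: string, ratePerMin: number, nowMs: number): boolean {
  const minute = Math.floor(nowMs / WINDOW_MS);
  const elapsed = (nowMs % WINDOW_MS) / WINDOW_MS; // fraction of current minute used
  const current = counters.get(bucketKey(tenant, minute)) ?? 0;
  const previous = counters.get(bucketKey(tenant, minute - 1)) ?? 0;
  const estimated = previous * (1 - elapsed) + current;
  if (estimated >= ratePerMin) return false;
  counters.set(bucketKey(tenant, minute), current + 1);
  return true;
}
```

The weighted estimate avoids the burst that a plain fixed window allows at a minute boundary, at the cost of reading two keys per request.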

3 · Control plane — IaC + CI/CD

Terraform owns the topology; two mirrored pipelines build, scan, and gate the deploy.
Terraform: import → plan → apply
Cloudflare: Worker · KV · D1 · AI GW · domain · WAF
build: bun · tsc · test · esbuild · eval gate
scan: Semgrep · Trivy · Checkov · hadolint
attest: SBOM · SLSA provenance
gated apply: protected env · review

4 · Data model

D1 is the transactional source of truth; KV holds ephemeral counters.
D1 · tenants: id · api_key · budget_usd · spent_usd · rate_per_min
D1 · requests: ts · model · tokens · cost · status · flags · eval
KV: rl:{tenant}:{min} · cache:{sha256}
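
An atomic budget pre-charge against the tenants table can be sketched as a single conditional update followed by a reconcile step. Everything below is an assumption for illustration: the SQL is one plausible statement, not the project's actual query, and the `preCharge` and `reconcile` functions emulate it over an in-memory `Map` so the logic can be run without D1.

```typescript
// One statement a D1-backed version might run (an assumption). Doing the
// check and the charge in a single UPDATE avoids a read-then-write race.
const PRECHARGE_SQL = `
  UPDATE tenants
     SET spent_usd = spent_usd + ?1
   WHERE id = ?2
     AND spent_usd + ?1 <= budget_usd`;

interface Tenant {
  id: string;
  budget_usd: number;
  spent_usd: number;
}

// In-memory stand-in for the tenants table.
const tenants = new Map<string, Tenant>();

// Mirrors the UPDATE above: true when a row was charged, false when the
// tenant is unknown or the charge would exceed the budget.
function preCharge(id: string, estimateUsd: number): boolean {
  const t = tenants.get(id);
  if (!t || t.spent_usd + estimateUsd > t.budget_usd) return false;
  t.spent_usd += estimateUsd;
  return true;
}

// After inference, replace the estimate with the actual cost.
function reconcile(id: string, estimateUsd: number, actualUsd: number): void {
  const t = tenants.get(id);
  if (t) t.spent_usd += actualUsd - estimateUsd;
}
```

A caller would treat a `false` result as the 402 hard-cap response and skip inference entirely.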

Standards & best practices

what it's built against, and how — verified against the code
Legend: Implemented (present in code) · Partial (real but incomplete) · Planned (named cheap addition)
AI / LLM security & governance // OWASP · NIST · MITRE
OWASP LLM Top 10 (2025): 8/10 ✓
Guardrails (LLM01/02/05/07), hard budgets & rate limits (LLM10), supply-chain scanning (LLM03). LLM04/06/08 scoped out by design.
src/guardrails.ts · src/gateway.ts
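
The page does not say how src/guardrails.ts detects PII, so the following is only a stand-in showing the redaction shape: two simple regular expressions (email addresses and US Social Security numbers) and a hypothetical `redactPii` function that returns the cleaned text plus the reasons that fired. A production guardrail would need broader patterns or a model-based classifier.

```typescript
// Illustrative patterns only; not a complete PII detector.
const PII_PATTERNS: { label: string; pattern: RegExp }[] = [
  { label: "EMAIL", pattern: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { label: "SSN", pattern: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// Replace each match with a bracketed label and record which labels fired.
function redactPii(text: string): { text: string; hits: string[] } {
  const hits: string[] = [];
  let out = text;
  for (const { label, pattern } of PII_PATTERNS) {
    out = out.replace(pattern, () => {
      hits.push(label);
      return `[${label}]`;
    });
  }
  return { text: out, hits };
}
```

The returned `hits` array is the kind of value that could feed a "guardrail hits by reason" counter or the per-request flags column.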
NIST AI RMF 1.0: Govern/Map/Measure/Manage
MEASURE + MANAGE evidenced by the eval harness, audit log, budgets, and fallback; GOVERN/MAP via this mapping + model card.
docs/STANDARDS.md
MITRE ATLAS: cost/DoS/auth ✓
Cost Harvesting (AML.T0034) and Denial of ML Service (AML.T0029) mitigated by budgets and rate limits; inference access gated by virtual keys.
budgets · rate limits · API keys
ISO 42001 · EU AI Act: transparency
Model/transparency card; structured audit logging (Art. 12); model-id provenance per response (Art. 13). No certification claimed.
docs/MODEL-CARD.md
Software supply chain & secure SDLC // SLSA · SSDF · OpenSSF · CIS
SLSA v1.0: Build L2
Signed build provenance via keyless Sigstore attestation (OIDC). Verify: gh attestation verify.
.github/workflows/ci.yml
NIST SSDF (800-218): PW · RV
SAST (PW.7), SCA/IaC/container tests (PW.8), secure defaults (PW.9), continuous vuln ID (RV.1).
ci.yml · docker.yml
OpenSSF Scorecard: high
Token-Permissions, SAST, Dependency-Update, Dangerous-Workflow pass; full action SHA-pinning & branch protection tracked.
permissions: {} · Dependabot
CIS Docker Benchmark: 4.x ✓
Non-root, minimal pinned base, no secrets, multi-stage, HEALTHCHECK, COPY-not-ADD. Base-by-digest planned.
Dockerfile · .hadolint.yaml
Supply-chain scanning: SBOM + 5 scanners
Syft SBOM, Semgrep (SAST), Trivy (SCA/image/secret/IaC), Checkov (IaC), hadolint (Docker) → SARIF.
SARIF → code scanning
Sigstore / cosign: image signing
Keyless image + SBOM signing documented; provenance attestation already wired.
roadmap
Application / web / API security // OWASP · 12-Factor · W3C · IETF
OWASP API Top 10 (2023): 9/10 ✓
BOLA/auth via Bearer keys, resource limits (API4), SSRF-immune by construction, untrusted-upstream handling (API10).
src/index.ts · gateway.ts
OWASP ASVS: L1 · L2
Deny-by-default access control, no-stack-trace errors, redacted-only cache, full header set.
V1/V4/V7/V8/V13/V14
OWASP Secure Headers: 9 headers
CSP, HSTS, nosniff, frame-deny, Referrer-Policy, Permissions-Policy, COOP, CORP, no-store — on every response.
SECURITY_HEADERS
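
The nine headers listed above can be sketched as a single map merged into every response. Only the name `SECURITY_HEADERS` and the list of headers come from this page; the header values and the `withSecurityHeaders` helper are assumptions chosen as reasonable defaults, and the project's actual policy strings may differ.

```typescript
// Header values are illustrative defaults, not the project's actual policy.
const SECURITY_HEADERS: Record<string, string> = {
  "Content-Security-Policy": "default-src 'self'",
  "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "Referrer-Policy": "no-referrer",
  "Permissions-Policy": "geolocation=(), camera=(), microphone=()",
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Resource-Policy": "same-origin",
  "Cache-Control": "no-store",
};

// Merge the security headers over a handler's own headers; on a name clash
// the security value wins. Plain-object keys are case-sensitive, so callers
// are assumed to use the canonical casing shown above.
function withSecurityHeaders(headers: Record<string, string>): Record<string, string> {
  return { ...headers, ...SECURITY_HEADERS };
}
```

In a Worker, the merged object can be passed as the `headers` option when constructing a `Response`, so every code path goes through the same helper.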
The Twelve-Factor App: 12/12
Config in env, backing services as bindings, stateless isolates, build/release/run, logs as a stream, mock for dev parity.
edge-reinterpreted
WCAG 2.2 AA: AA
Skip link, landmarks, focus-visible, chart aria-labels, keyboard nav, reduced-motion, aria-live, AA contrast.
public/index.html
RFC 9116 security.txt: published
Machine-readable disclosure contact at /.well-known/security.txt; robots.txt for API hygiene.
/.well-known/security.txt
Compliance frameworks // NIST · SOC 2 · ISO
NIST 800-53 Rev 5: 24 controls
AC / AU / SC / SI / CM / RA / SR families mapped to concrete features.
docs/NIST-800-53-mapping.md
SOC 2 / ISO 27001: change · vuln · access
Change management (gated IaC apply), vuln management (scanning), access control (least-privilege tokens).
CC7.1 · CC8.1 · A.8.28