Secure LLM Gateway & AI Control Plane

An edge gateway that fronts every LLM call with hard token budgets, cost metering, PII & prompt-injection guardrails, online evals, and full observability — running on Cloudflare Workers, provisioned entirely by Terraform, and shipped through a security-scanned CI/CD pipeline.

Hard budget caps (not soft alerts) | Atomic spend ledger · D1 | Guardrails in & out | Workers AI · Llama 3 | Terraform IaC | Semgrep · Trivy · Checkov · SBOM | NIST 800-53 mapped

Overview

last 24h

Requests · success rate
Tokens · in + out
Spend · across tenants
p95 Latency · p50
Requests over time
Budget burn-down
Guardrail hits by reason
Online eval score · mean(safety, latency-SLO) over window · CI gate at 0.6
Reliability · errors · guardrail blocks · cache hits · SLO p95 < 3000ms
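
The online eval score is described as a mean of a safety component and a latency-SLO component, gated at 0.6 in CI. A minimal sketch of that arithmetic follows. It assumes the safety score lies in [0, 1], that the latency component is binary per request (1 when under 3000 ms), and that the gate passes at or above 0.6; the `EvalSample` shape and all function names are hypothetical and not taken from the repository.

```typescript
// Hypothetical shape of one scored request; field names are illustrative.
interface EvalSample {
  safety: number;    // assumed to lie in [0, 1], higher is safer
  latencyMs: number; // end-to-end latency of the request
}

const SLO_MS = 3000; // from the dashboard: SLO p95 < 3000ms
const CI_GATE = 0.6; // from the dashboard: CI gate at 0.6

// Assumption: the latency-SLO component is binary per request.
function latencySloScore(latencyMs: number): number {
  return latencyMs < SLO_MS ? 1 : 0;
}

// Per-request score: mean of the two components.
function requestScore(s: EvalSample): number {
  return (s.safety + latencySloScore(s.latencyMs)) / 2;
}

// Window score: mean of the per-request scores; an empty window has no score.
function windowScore(samples: EvalSample[]): number | null {
  if (samples.length === 0) return null;
  const total = samples.reduce((sum, s) => sum + requestScore(s), 0);
  return total / samples.length;
}

// Assumption: the gate passes when the score is at or above the threshold.
function passesGate(score: number | null): boolean {
  return score !== null && score >= CI_GATE;
}
```

Under these assumptions, one fast and fully safe request averaged with one slow request that scored 0.2 on safety gives a window score of 0.55, which would fail the gate.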

Live request log

most recent 20

Time	Status	Tenant	Model	Tokens	Cost	Latency	Eval	Guardrail

▶ Try the gateway

live · rate-limited · no data retained

The model response and its per-request metadata — tokens, cost, guardrail verdict, eval score, budget impact — appear here.

Architecture

self-documenting · how it's built & shipped

  client / curl / this dashboard
            │  HTTPS
            ▼
  ┌──────────── Cloudflare edge ─────────────┐
  │  WAF · per-IP rate-limit ruleset (TF)     │
  │                ▼                          │
  │   Aegis Worker  (single artifact)         │
  │   1 rate-limit   (KV sliding window)      │
  │   2 guardrails-in  (PII · injection)──▶ block 402/200
  │   3 budget pre-charge (atomic D1)  ──▶ 402 hard cap
  │   4 inference  ─────────────────────▶ AI Gateway ▶ Workers AI (Llama 3)
  │   5 reconcile cost + tokens (D1 ledger)   │
  │   6 guardrails-out (PII redaction)        │
  │   7 online eval (safety · latency SLO)    │
  │   8 audit log (D1 requests table)         │
  └───────┬──────────────────┬────────────────┘
       KV (rate, cache)    D1 (ledger, audit, evals)

  provisioned by:  Terraform (Worker + bindings + KV + D1 + AI Gateway + DNS + WAF)
  shipped by:      GitHub Actions → typecheck · tests · esbuild · Semgrep · Trivy(SCA+IaC) · Checkov · hadolint · SBOM → gated terraform apply

Cloudflare runtime

Workers — gateway logic + UI, one URL
Workers AI — Llama 3 inference on-net
AI Gateway — caching + analytics
KV — rate-limit windows
D1 (SQLite) — spend ledger + audit log

Terraform (IaC)

Every resource declared as code
Worker script + AI/KV/D1 bindings
DNS + custom domain + rate-limit rule
fmt / validate / plan in CI
No click-ops, reproducible

DevSecOps

Trivy config + Checkov — IaC scanning
Semgrep (SAST) · Trivy (SCA)
Syft SBOM + cosign signing
Least-privilege scoped token
Gated apply · NIST 800-53 mapped

Component models

how the system is built — request flow, runtime topology, control plane, data

1 · Request pipeline

Every /v1/chat request runs these stages in order (src/gateway.ts). Stages that can reject a request (highlighted in red on the live page) list the status they return.

1. Rate limit: KV window · 429
2. Guardrails-in: PII · injection · 200/blocked
3. Budget pre-charge: atomic D1 · 402
4. Cache check: exact-match KV · free hit
5. Inference: ModelRouter → Workers AI
6. Reconcile: tokens · cost → ledger
7. Guardrails-out: PII + prompt-leak redact
8. Eval + audit: score → D1 log
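
The budget pre-charge (stage 3) and reconcile (stage 6) pair can be sketched as a reserve-then-settle ledger. This is an illustration only: it uses an in-memory `Map` in place of D1, the `Ledger` class and its method names are hypothetical, and the SQL in the comment is an assumed form of the atomic statement rather than the project's actual query. Only the column names come from the data model.

```typescript
// Columns taken from the tenants table in the data model.
interface Tenant {
  id: string;
  budget_usd: number;
  spent_usd: number;
}

type PreCharge =
  | { ok: true; reserved: number }
  | { ok: false; status: 402 };

// In-memory stand-in for D1. In D1 the check and the increment would be one
// conditional statement so concurrent requests cannot overspend, e.g.
//   UPDATE tenants SET spent_usd = spent_usd + ?1
//   WHERE id = ?2 AND spent_usd + ?1 <= budget_usd
// (assumed SQL, not taken from the repository).
class Ledger {
  private tenants = new Map<string, Tenant>();

  add(t: Tenant): void {
    this.tenants.set(t.id, { ...t });
  }

  spent(id: string): number {
    return this.tenants.get(id)?.spent_usd ?? 0;
  }

  // Stage 3: reserve the worst-case cost up front, or reject with 402.
  preCharge(id: string, maxCostUsd: number): PreCharge {
    const t = this.tenants.get(id);
    if (!t) throw new Error(`unknown tenant: ${id}`);
    if (t.spent_usd + maxCostUsd > t.budget_usd) {
      return { ok: false, status: 402 };
    }
    t.spent_usd += maxCostUsd;
    return { ok: true, reserved: maxCostUsd };
  }

  // Stage 6: replace the reservation with the actual metered cost.
  reconcile(id: string, reserved: number, actualCostUsd: number): void {
    const t = this.tenants.get(id);
    if (!t) throw new Error(`unknown tenant: ${id}`);
    t.spent_usd += actualCostUsd - reserved;
  }
}
```

Reserving the worst case before inference is what makes the cap hard: a request that could exceed the budget is rejected before any tokens are generated, and the unused part of the reservation is returned at reconcile time.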

2 · Runtime topology

A single Worker artifact mediates all traffic; each store is chosen for the consistency model it provides.

Client (curl · dashboard) → Cloudflare edge (WAF · per-IP rate-limit) → Aegis Worker (gateway + UI) →

KV: rate windows · cache
D1 (SQLite): budget ledger · audit
AI Gateway → Workers AI: Llama 3 · cache · analytics

3 · Control plane — IaC + CI/CD

Terraform owns the topology; two mirrored pipelines build, scan, and gate the deploy.

Terraform (import → plan → apply) → Cloudflare (Worker · KV · D1 · AI GW · domain · WAF)

build (bun · tsc · test · esbuild · eval gate) → scan (Semgrep · Trivy · Checkov · hadolint) → attest (SBOM · SLSA provenance) → gated apply (protected env · review)

4 · Data model

D1 is the transactional source of truth; KV holds ephemeral counters.

D1 · tenants: id · api_key · budget_usd · spent_usd · rate_per_min
D1 · requests: ts · model · tokens · cost · status · flags · eval
KV: rl:{tenant}:{min} · cache:{sha256}
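
The two KV key shapes above can be illustrated with a short sketch. Several things here are assumptions: a `Map` stands in for the KV namespace, the limiter is a simple per-minute bucket (the architecture diagram calls the real one a sliding window), the cache key hashes the model name plus the prompt, and `node:crypto` is used for brevity where a Worker might use Web Crypto instead. Function names are hypothetical.

```typescript
import { createHash } from "node:crypto";

// Minimal stand-in for a KV namespace (hypothetical; Workers KV is async
// and eventually consistent, so real counters are approximate).
const kv = new Map<string, number>();

// rl:{tenant}:{min}: one counter per tenant per wall-clock minute.
function rateLimitKey(tenant: string, nowMs: number): string {
  const minute = Math.floor(nowMs / 60_000);
  return `rl:${tenant}:${minute}`;
}

// Returns true when the request is allowed; false maps to a 429.
function allow(tenant: string, ratePerMin: number, nowMs: number): boolean {
  const key = rateLimitKey(tenant, nowMs);
  const count = kv.get(key) ?? 0;
  if (count >= ratePerMin) return false;
  kv.set(key, count + 1);
  return true;
}

// cache:{sha256}: exact-match cache key.
// Assumption: the digest covers the model name and the prompt text.
function cacheKey(model: string, prompt: string): string {
  const digest = createHash("sha256")
    .update(`${model}\n${prompt}`)
    .digest("hex");
  return `cache:${digest}`;
}
```

Because the minute number is part of the key, a new window starts with a fresh counter and old keys can simply expire. Hashing the prompt keeps raw text out of key names, which fits the "redacted-only cache" note in the standards section.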

Standards & best practices

what it's built against, and how — verified against the code

Legend: Implemented (present in code) · Partial (real but incomplete) · Planned (named cheap addition)

AI / LLM security & governance // OWASP · NIST · MITRE

OWASP LLM Top 10 (2025): 8/10 ✓

Guardrails (LLM01/02/05/07), hard budgets & rate limits (LLM10), supply-chain scanning (LLM03). LLM04/06/08 scoped out by design.

src/guardrails.ts · src/gateway.ts
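
To make the guardrail idea concrete, here is a minimal sketch of pattern-based PII redaction and prompt-injection detection. The patterns, flag names, and function names are illustrative assumptions and are not the contents of src/guardrails.ts; a production guardrail would need far broader coverage than two regexes per category.

```typescript
interface GuardrailResult {
  text: string;    // input with PII replaced by placeholders
  flags: string[]; // which PII categories were found
}

// Illustrative patterns only (assumed, not from the repository).
const PII_PATTERNS: Array<{ flag: string; re: RegExp }> = [
  { flag: "email", re: /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/g },
  { flag: "ssn", re: /\b\d{3}-\d{2}-\d{4}\b/g },
];

// No global flag here, so RegExp.test stays stateless between calls.
const INJECTION_PATTERNS: RegExp[] = [
  /ignore (all )?(previous|prior) instructions/i,
  /reveal (your )?system prompt/i,
];

// Replace each PII match with a labeled placeholder and record the category.
function redactPii(text: string): GuardrailResult {
  const flags: string[] = [];
  let out = text;
  for (const { flag, re } of PII_PATTERNS) {
    const replaced = out.replace(re, `[REDACTED:${flag}]`);
    if (replaced !== out) flags.push(flag);
    out = replaced;
  }
  return { text: out, flags };
}

// True when the text matches any known injection phrasing.
function looksLikeInjection(text: string): boolean {
  return INJECTION_PATTERNS.some((re) => re.test(text));
}
```

The same `redactPii` helper could plausibly serve both directions: on input before the prompt reaches the model, and on output before the response reaches the client.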

NIST AI RMF 1.0: Govern/Map/Measure/Manage

MEASURE + MANAGE evidenced by the eval harness, audit log, budgets, and fallback; GOVERN/MAP via this mapping + model card.

docs/STANDARDS.md

MITRE ATLAS: cost/DoS/auth ✓

Cost Harvesting (AML.T0034) and Denial of ML Service (AML.T0029) mitigated by budgets and rate limits; inference access gated by virtual keys.

budgets · rate limits · API keys

ISO 42001 · EU AI Act: transparency

Model/transparency card; structured audit logging (Art. 12); model-id provenance per response (Art. 13). No certification claimed.

docs/MODEL-CARD.md

Software supply chain & secure SDLC // SLSA · SSDF · OpenSSF · CIS

SLSA v1.0: Build L2

Signed build provenance via keyless Sigstore attestation (OIDC). Verify: gh attestation verify.

.github/workflows/ci.yml

NIST SSDF (800-218): PW · RV

SAST (PW.7), SCA/IaC/container tests (PW.8), secure defaults (PW.9), continuous vuln ID (RV.1).

ci.yml · docker.yml

OpenSSF Scorecard: high

Token-Permissions, SAST, Dependency-Update, Dangerous-Workflow pass; full action SHA-pinning & branch protection tracked.

permissions: {} · Dependabot

CIS Docker Benchmark: 4.x ✓

Non-root, minimal pinned base, no secrets, multi-stage, HEALTHCHECK, COPY-not-ADD. Base-by-digest planned.

Dockerfile · .hadolint.yaml

Supply-chain scanning: SBOM + 5 scanners

Syft SBOM, Semgrep (SAST), Trivy (SCA/image/secret/IaC), Checkov (IaC), hadolint (Docker) → SARIF.

SARIF → code scanning

Sigstore / cosign: image signing

Keyless image + SBOM signing documented; provenance attestation already wired.

roadmap

Application / web / API security // OWASP · 12-Factor · W3C · IETF

OWASP API Top 10 (2023): 9/10 ✓

BOLA/auth via Bearer keys, resource limits (API4), SSRF-immune by construction, untrusted-upstream handling (API10).

src/index.ts · gateway.ts
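
The Bearer-key access control mentioned above can be sketched as a deny-by-default lookup. The `KeyStore` type, the `authenticate` function, and the 401 status are assumptions made for illustration; only the idea of an `api_key` per tenant comes from the data model.

```typescript
// Hypothetical key store: api_key → tenant id.
type KeyStore = Map<string, string>;

type Auth =
  | { ok: true; tenant: string }
  | { ok: false; status: 401 };

// Deny by default: anything other than a well-formed, known Bearer key fails.
function authenticate(
  authorization: string | undefined,
  keys: KeyStore,
): Auth {
  const match = /^Bearer\s+(\S+)$/.exec(authorization ?? "");
  const key = match?.[1];
  if (!key) return { ok: false, status: 401 };
  const tenant = keys.get(key);
  return tenant ? { ok: true, tenant } : { ok: false, status: 401 };
}
```

Deriving the tenant from the key, and never from a field in the request body, is one way to keep a caller from addressing another tenant's budget or logs, which is the object-level authorization concern behind BOLA.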

OWASP ASVS: L1 · L2

Deny-by-default access control, no-stack-trace errors, redacted-only cache, full header set.

V1/V4/V7/V8/V13/V14

OWASP Secure Headers: 9 headers

CSP, HSTS, nosniff, frame-deny, Referrer-Policy, Permissions-Policy, COOP, CORP, no-store — on every response.

SECURITY_HEADERS
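
A sketch of how the nine headers listed above might be held in one constant and merged into every response. The header names are the standard ones for each listed policy; the values are conservative assumptions and not the project's actual settings. `withSecurityHeaders` is a hypothetical helper that works on plain objects so the example runs anywhere.

```typescript
// Assumed values; the real SECURITY_HEADERS constant may differ.
const SECURITY_HEADERS: Record<string, string> = {
  "Content-Security-Policy": "default-src 'self'",
  "Strict-Transport-Security": "max-age=31536000; includeSubDomains",
  "X-Content-Type-Options": "nosniff",
  "X-Frame-Options": "DENY",
  "Referrer-Policy": "no-referrer",
  "Permissions-Policy": "geolocation=(), camera=(), microphone=()",
  "Cross-Origin-Opener-Policy": "same-origin",
  "Cross-Origin-Resource-Policy": "same-origin",
  "Cache-Control": "no-store",
};

// Merge so that the security headers always win over handler-supplied ones.
function withSecurityHeaders(
  headers: Record<string, string>,
): Record<string, string> {
  return { ...headers, ...SECURITY_HEADERS };
}
```

Applying the merge at a single exit point, rather than in each handler, is what makes "on every response" easy to verify in a test.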

The Twelve-Factor App: 12/12

Config in env, backing services as bindings, stateless isolates, build/release/run, logs as a stream, mock for dev parity.

edge-reinterpreted

WCAG 2.2 AA: AA

Skip link, landmarks, focus-visible, chart aria-labels, keyboard nav, reduced-motion, aria-live, AA contrast.

public/index.html

RFC 9116 security.txt: published

Machine-readable disclosure contact at /.well-known/security.txt; robots.txt for API hygiene.

/.well-known/security.txt

Compliance frameworks // NIST · SOC 2 · ISO

NIST 800-53 Rev 5: 24 controls

AC / AU / SC / SI / CM / RA / SR families mapped to concrete features.

docs/NIST-800-53-mapping.md

SOC 2 / ISO 27001: change · vuln · access

Change management (gated IaC apply), vuln management (scanning), access control (least-privilege tokens).

CC7.1 · CC8.1 · A.8.28