Architecture Overview
AI Cost Firewall is a lightweight LLM infrastructure component.
Components
AI Cost Firewall
The Rust + Axum gateway that validates requests, checks caches, forwards misses upstream, stores cache entries, and exposes metrics.
Redis / Valkey
Stores exact cache entries.
Qdrant
Stores semantic cache entries and performs vector search. AI Cost Firewall uses Qdrant gRPC on port 6334.
Upstream LLM API
Receives cache misses.
Prometheus and Grafana
Prometheus scrapes /metrics; Grafana visualizes cache performance, savings, and diagnostics.