Skip to main content

Architecture Overview

AI Cost Firewall is a lightweight LLM infrastructure component.

Components

AI Cost Firewall

The Rust + Axum gateway that validates requests, checks caches, forwards misses upstream, stores cache entries, and exposes metrics.

Redis / Valkey

Stores exact cache entries.

Qdrant

Stores semantic cache entries and performs vector search. AI Cost Firewall uses Qdrant gRPC on port 6334.

Upstream LLM API

Receives cache misses.

Prometheus and Grafana

Prometheus scrapes /metrics; Grafana visualizes cache performance, savings, and diagnostics.