Architecture Overview

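AI Cost Firewall is a lightweight LLM infrastructure component: a gateway that sits in front of an OpenAI-compatible chat upstream, answers requests from cache where it can, and forwards the rest.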

Components

AI Cost Firewall

The Rust + Axum gateway that validates requests, checks caches, forwards misses upstream, stores cache entries, and exposes metrics.
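The request path described above (validate, check the cache, forward a miss, store the result) can be sketched roughly as follows. This is a minimal illustration and not the project's code: `Gateway`, `Source`, and `handle` are hypothetical names, an in-memory `HashMap` stands in for Redis / Valkey, the upstream is a plain closure, and the semantic cache lookup is omitted for brevity.

```rust
use std::collections::HashMap;

/// Where a response came from; a real gateway would feed this into metrics.
#[derive(Debug, PartialEq)]
pub enum Source {
    ExactCache,
    Upstream,
}

/// Hypothetical in-memory stand-in for the gateway state.
pub struct Gateway {
    exact_cache: HashMap<String, String>,
    pub upstream_calls: u64,
}

impl Gateway {
    pub fn new() -> Self {
        Gateway { exact_cache: HashMap::new(), upstream_calls: 0 }
    }

    /// Validate the prompt, try the cache, and only call `upstream` on a miss.
    pub fn handle<F>(&mut self, prompt: &str, mut upstream: F) -> Result<(String, Source), String>
    where
        F: FnMut(&str) -> String,
    {
        // 1. Validate the request (here: reject empty prompts).
        let key = prompt.trim();
        if key.is_empty() {
            return Err("empty prompt".to_string());
        }
        // 2. Check the exact cache.
        if let Some(hit) = self.exact_cache.get(key) {
            return Ok((hit.clone(), Source::ExactCache));
        }
        // 3. Forward the miss upstream.
        let answer = upstream(key);
        self.upstream_calls += 1;
        // 4. Store the response so the next identical request is a hit.
        self.exact_cache.insert(key.to_string(), answer.clone());
        Ok((answer, Source::Upstream))
    }
}
```

Calling `handle` twice with the same prompt invokes the upstream closure only once; the second call is served from the map.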

Redis / Valkey

Stores exact cache entries.

Qdrant

Stores semantic cache entries and performs vector search. AI Cost Firewall uses Qdrant gRPC on port 6334.
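To make "vector search" concrete, the sketch below shows the kind of lookup a semantic cache performs: find the stored embedding closest to the query embedding and accept it only above a similarity threshold. It is an in-memory illustration, not Qdrant's client API; the function names, the choice of cosine similarity, and the threshold value are all assumptions.

```rust
/// Cosine similarity between two equal-length vectors; 0.0 if either has zero norm.
pub fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 {
        0.0
    } else {
        dot / (na * nb)
    }
}

/// Return the cached response whose embedding is most similar to `query`,
/// but only if that similarity reaches `threshold`.
pub fn semantic_lookup<'a>(
    entries: &'a [(Vec<f32>, String)],
    query: &[f32],
    threshold: f32,
) -> Option<&'a str> {
    entries
        .iter()
        .map(|(vec, resp)| (cosine(vec, query), resp.as_str()))
        .filter(|(score, _)| *score >= threshold)
        .max_by(|a, b| a.0.total_cmp(&b.0))
        .map(|(_, resp)| resp)
}
```

A stricter threshold trades hit rate for fewer mismatched answers; the right value depends on the embedding model in use.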

OpenAI-compatible chat upstream

Receives the requests that miss the caches.

OpenAI-compatible embedding provider

Semantic caching uses embeddings generated by the configured embedding provider. The embedding provider may be the same service as the chat upstream or a separate OpenAI-compatible endpoint.
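One way to model "same service or separate endpoint" is an optional embedding URL that falls back to the chat upstream. The struct and field names below are hypothetical, and the fallback behaviour itself is an assumption made for illustration rather than a documented setting.

```rust
/// Hypothetical provider settings; field names are illustrative only.
pub struct ProviderSettings {
    pub chat_base_url: String,
    /// `None` means "reuse the chat upstream for embeddings" (an assumption).
    pub embedding_base_url: Option<String>,
}

impl ProviderSettings {
    /// Base URL to use for embedding requests.
    pub fn embedding_url(&self) -> &str {
        self.embedding_base_url
            .as_deref()
            .unwrap_or(self.chat_base_url.as_str())
    }
}
```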

OpenAI-Compatible Provider Flexibility

AI Cost Firewall works with common OpenAI-compatible providers without requiring provider-specific configuration blocks, including:

OpenAI
Ollama
LM Studio
vLLM
LiteLLM
OpenRouter
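Because these providers expose the same OpenAI-style paths, switching between them can come down to changing a base URL. The helper below is a hypothetical sketch of that idea, and the URLs in the usage note are commonly documented defaults that should be checked against each provider's own documentation.

```rust
/// Join an OpenAI-compatible base URL with an API path, tolerating stray slashes.
pub fn endpoint(base_url: &str, path: &str) -> String {
    format!(
        "{}/{}",
        base_url.trim_end_matches('/'),
        path.trim_start_matches('/')
    )
}
```

For example, `endpoint("http://localhost:11434/v1", "chat/completions")` would target a local Ollama instance, while `endpoint("https://openrouter.ai/api/v1", "chat/completions")` would target OpenRouter, with no other code change.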

The deployment stack also includes:

Prometheus
Grafana
Overview dashboard
Diagnostics dashboard

See:

deploy/examples/

Prometheus and Grafana

Prometheus scrapes /metrics; Grafana visualizes cache performance, savings, and diagnostics.
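As an illustration of what a `/metrics` endpoint returns, the sketch below renders cache counters in the Prometheus text exposition format. The metric names, the `layer` label, and the `CacheMetrics` struct are assumptions made for the example and are not the names the gateway actually exports.

```rust
/// Hypothetical cache counters; the real gateway's metric set may differ.
pub struct CacheMetrics {
    pub exact_hits: u64,
    pub semantic_hits: u64,
    pub misses: u64,
}

impl CacheMetrics {
    /// Fraction of requests answered from either cache (0.0 when idle).
    pub fn hit_ratio(&self) -> f64 {
        let hits = self.exact_hits + self.semantic_hits;
        let total = hits + self.misses;
        if total == 0 {
            0.0
        } else {
            hits as f64 / total as f64
        }
    }

    /// Render the counters in the Prometheus text exposition format.
    pub fn render(&self) -> String {
        let mut out = String::new();
        out.push_str("# TYPE firewall_cache_hits_total counter\n");
        out.push_str(&format!(
            "firewall_cache_hits_total{{layer=\"exact\"}} {}\n",
            self.exact_hits
        ));
        out.push_str(&format!(
            "firewall_cache_hits_total{{layer=\"semantic\"}} {}\n",
            self.semantic_hits
        ));
        out.push_str("# TYPE firewall_cache_misses_total counter\n");
        out.push_str(&format!("firewall_cache_misses_total {}\n", self.misses));
        out
    }
}
```

Prometheus would scrape this text on its configured interval, and a Grafana panel could derive the hit ratio from the two counters instead of having the gateway export it directly.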
