What AI Cost Firewall Does
AI Cost Firewall acts as a smart gateway for OpenAI-compatible chat-completion requests.
Instead of sending every request directly to an LLM provider, it checks whether the same or a similar request has already been answered.
Responsibilities
- validate requests
- normalize requests
- check exact cache
- check semantic cache
- forward upstream when needed
- store cache entries
- estimate token and cost savings
- expose metrics
- handle readiness, shutdown, and reload behavior
OpenAI-compatible gateway
Supported endpoint:
POST /v1/chat/completions
Existing applications can point their OpenAI-compatible client to AI Cost Firewall.
Two cache layers
| Layer | Backend | Purpose |
|---|---|---|
| Exact cache | Redis / Valkey | Reuse identical normalized requests |
| Semantic cache | Qdrant | Reuse semantically similar prompts |
Exact cache hits are fastest and have no embedding cost. Semantic cache hits require embedding lookup but can reuse answers for similar prompts.