# What is AI Cost Firewall?
AI Cost Firewall is an OpenAI-compatible API gateway for caching, cost control, and operational visibility.
It sits between your application and an upstream LLM provider. Applications send requests to AI Cost Firewall instead of calling the provider directly. The firewall checks whether a response can be reused from cache and forwards only necessary requests upstream.
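The routing pattern above means an application only changes where it sends requests, not what it sends. A minimal sketch, assuming a hypothetical gateway address (the path mirrors the OpenAI API; the host and port depend on your deployment):

```python
import json

# Hypothetical gateway address; in practice this is wherever
# AI Cost Firewall is deployed. The path mirrors the OpenAI API.
GATEWAY_BASE_URL = "http://localhost:8080/v1"

def build_chat_request(model: str, prompt: str) -> tuple[str, bytes]:
    """Build the URL and JSON body for an OpenAI-compatible chat
    completion request routed through the gateway instead of the
    provider directly."""
    url = f"{GATEWAY_BASE_URL}/chat/completions"
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }).encode("utf-8")
    return url, body

url, body = build_chat_request("gpt-4o-mini", "Hello!")
print(url)  # http://localhost:8080/v1/chat/completions
```

Because the gateway is OpenAI-compatible, existing SDKs can typically be pointed at it by overriding their base URL rather than hand-building requests like this.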
## Why it exists
LLM applications often send repeated or semantically similar prompts. Without caching, every one of those requests becomes an upstream API call, incurring token usage, added latency, and additional cost.
AI Cost Firewall reduces this waste with two cache layers:
- Exact cache — reuses responses for identical normalized requests.
- Semantic cache — reuses responses for similar prompts when similarity is high enough.
## Core capabilities
- OpenAI-compatible `/v1/chat/completions` endpoint
- Redis / Valkey exact caching
- Qdrant semantic caching
- Prometheus metrics and Grafana dashboards
- Strict configuration validation
- Model allowlist behavior through `model_price`
- Request size limits
- Readiness and liveness endpoints
- Graceful shutdown and hot reload via `SIGHUP`
- Semantic cache lifecycle control
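Several of these capabilities are configuration-driven. A hedged sketch of what such a configuration might look like follows; every key name and value here is a hypothetical illustration, not the firewall's actual schema:

```yaml
# Hypothetical configuration sketch -- key names are illustrative only.
model_price:                  # models listed here are allowed; others are rejected
  gpt-4o-mini:
    input_per_1k: 0.00015
    output_per_1k: 0.0006
limits:
  max_request_bytes: 65536    # request size limit
semantic_cache:
  enabled: true
  similarity_threshold: 0.95
```

Because the firewall validates configuration strictly and reloads it on `SIGHUP`, a file like this can be corrected and re-applied without restarting the process.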