What is AI Cost Firewall?

AI Cost Firewall is an OpenAI-compatible API gateway for caching, cost control, and operational visibility.

It sits between your application and an upstream LLM provider. Applications send requests to AI Cost Firewall instead of calling the provider directly. The firewall checks whether a response can be reused from cache and forwards only necessary requests upstream.
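
Because the gateway exposes the same API surface as OpenAI, switching an existing client over is typically a one-line change. The sketch below uses the official OpenAI Python client; the base URL, port, and key handling are illustrative assumptions, not values fixed by AI Cost Firewall:

    from openai import OpenAI

    # Point the standard OpenAI client at the firewall instead of the provider.
    # The base URL and key below are illustrative; use your deployment's values.
    client = OpenAI(
        base_url="http://localhost:8080/v1",  # AI Cost Firewall, not the provider
        api_key="YOUR_UPSTREAM_KEY",
    )

    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": "What is our refund policy?"}],
    )
    print(response.choices[0].message.content)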

Why it exists

LLM applications often send repeated or semantically similar prompts. Without caching, every such request triggers an upstream API call, consuming tokens and adding latency and cost.

AI Cost Firewall reduces this waste with two cache layers (a lookup sketch follows this list):

  1. Exact cache — reuses responses for identical normalized requests.
  2. Semantic cache — reuses responses for similar prompts when similarity is high enough.
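
Conceptually, a request is checked against the exact layer first, then the semantic layer, and only then forwarded upstream. The sketch below illustrates that lookup order using toy in-memory stores and a placeholder embedding; the hashing scheme, similarity cutoff, and data structures are illustrative assumptions, not the firewall's actual internals:

    import hashlib
    import json

    # Toy in-memory stand-ins for the Redis/Valkey exact layer and the
    # Qdrant semantic layer; both are illustrative, not the real stores.
    exact_cache: dict[str, str] = {}
    semantic_cache: list[tuple[list[float], str]] = []

    SIMILARITY_THRESHOLD = 0.92  # illustrative similarity cutoff

    def embed(text: str) -> list[float]:
        # Placeholder embedding; a real deployment uses an embedding model.
        return [text.count(c) / max(len(text), 1) for c in "aeiou "]

    def cosine(a: list[float], b: list[float]) -> float:
        dot = sum(x * y for x, y in zip(a, b))
        na = sum(x * x for x in a) ** 0.5
        nb = sum(x * x for x in b) ** 0.5
        return dot / (na * nb) if na and nb else 0.0

    def handle(request: dict, call_upstream) -> str:
        # 1. Exact layer: key on a hash of the normalized (sorted-key) request.
        key = hashlib.sha256(
            json.dumps(request, sort_keys=True).encode()
        ).hexdigest()
        if key in exact_cache:
            return exact_cache[key]

        # 2. Semantic layer: reuse a response if a stored prompt is similar enough.
        vector = embed(request["messages"][-1]["content"])
        for stored_vector, stored_response in semantic_cache:
            if cosine(vector, stored_vector) >= SIMILARITY_THRESHOLD:
                return stored_response

        # 3. Miss on both layers: call the provider and populate both caches.
        response = call_upstream(request)
        exact_cache[key] = response
        semantic_cache.append((vector, response))
        return response

The ordering reflects cost: the exact layer is a constant-time key lookup, so it runs before the more expensive vector similarity search.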

Core capabilities

  • OpenAI-compatible /v1/chat/completions endpoint
  • Redis / Valkey exact caching
  • Qdrant semantic caching
  • Prometheus metrics and Grafana dashboards
  • Strict configuration validation
  • Model allowlist enforcement via model_price
  • Request size limits
  • Readiness and liveness endpoints (see the probe sketch after this list)
  • Graceful shutdown and hot reload via SIGHUP
  • Semantic cache lifecycle control
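
As a sketch of that operational surface, the snippet below probes readiness, liveness, and metrics endpoints and sends SIGHUP to trigger a hot reload. The endpoint paths, port, and the FIREWALL_PID environment variable are hypothetical; consult your deployment for the actual routes and process management:

    import os
    import signal
    import urllib.request

    BASE = "http://localhost:8080"  # illustrative gateway address

    def probe(path: str) -> int:
        # Request an operational endpoint and return its HTTP status code.
        with urllib.request.urlopen(f"{BASE}{path}") as resp:
            return resp.status

    # Endpoint paths are hypothetical; check your deployment's actual routes.
    print("ready:", probe("/readyz"))
    print("alive:", probe("/livez"))
    print("metrics:", probe("/metrics"))

    # Hot reload: send SIGHUP to the gateway process. The FIREWALL_PID
    # environment variable is an illustrative way to locate that process.
    os.kill(int(os.environ["FIREWALL_PID"]), signal.SIGHUP)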