Skip to main content

What AI Cost Firewall Does

AI Cost Firewall acts as a smart gateway for OpenAI-compatible chat-completion requests.

Instead of sending every request directly to an LLM provider, it checks whether the same or a similar request has already been answered.

Responsibilities

  • validate requests
  • normalize requests
  • check exact cache
  • check semantic cache
  • forward upstream when needed
  • store cache entries
  • estimate token and cost savings
  • expose metrics
  • handle readiness, shutdown, and reload behavior

OpenAI-compatible gateway

Supported endpoint:

POST /v1/chat/completions

Existing applications can point their OpenAI-compatible client to AI Cost Firewall.

Two cache layers

LayerBackendPurpose
Exact cacheRedis / ValkeyReuse identical normalized requests
Semantic cacheQdrantReuse semantically similar prompts

Exact cache hits are fastest and have no embedding cost. Semantic cache hits require embedding lookup but can reuse answers for similar prompts.