
Request Flow

AI Cost Firewall processes each request in four stages: validation, cache lookup (exact first, then semantic), upstream forwarding, and cache storage.

Exact cache

The firewall checks Redis / Valkey for an identical normalized request.
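As a rough illustration of what an "identical normalized request" lookup key could look like, the sketch below canonicalizes a request body and hashes it. The normalization rules shown (which fields are kept, canonical JSON with sorted keys) and the names `normalize_request` and `exact_cache_key` are assumptions made for this example, not the firewall's documented behavior.

```python
import hashlib
import json

# Hypothetical: fields assumed to influence the upstream response.
RELEVANT_FIELDS = ("model", "messages", "temperature")


def normalize_request(body: dict) -> str:
    """Return a canonical JSON string for the fields assumed to matter."""
    relevant = {k: body[k] for k in RELEVANT_FIELDS if k in body}
    # Sorted keys and fixed separators make the output independent of
    # key order and whitespace in the incoming JSON.
    return json.dumps(relevant, sort_keys=True, separators=(",", ":"))


def exact_cache_key(body: dict) -> str:
    """Derive a Redis / Valkey key from the normalized request (assumed scheme)."""
    digest = hashlib.sha256(normalize_request(body).encode("utf-8")).hexdigest()
    return f"exact:{digest}"
```

With this scheme, two requests that differ only in key order or in ignored fields map to the same key, while any change to the prompt produces a different one.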

Semantic cache

If the exact cache misses, the semantic cache can search Qdrant for similar prompts.

A candidate is reusable only if:

similarity_score >= semantic_similarity_threshold
AND
expires_at > now
AND
cached response payload is valid

In v0.1.6, expired entries are filtered before similarity ranking.
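The reuse rule above can be sketched as a small selection function. This is a minimal illustration, not the firewall's implementation: the `Candidate` type, the `payload_is_valid` check (a non-empty `choices` list, as in OpenAI-compatible responses), and the function names are assumptions for this example. Expired entries are dropped before ranking, matching the v0.1.6 behavior described above.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


@dataclass
class Candidate:
    """Hypothetical shape of a semantic-cache search result."""
    similarity_score: float
    expires_at: datetime
    payload: Optional[dict]


def payload_is_valid(payload: Optional[dict]) -> bool:
    # Assumed validity check: a dict with a non-empty "choices" list.
    return (
        isinstance(payload, dict)
        and isinstance(payload.get("choices"), list)
        and len(payload["choices"]) > 0
    )


def pick_reusable(
    candidates: List[Candidate],
    semantic_similarity_threshold: float,
    now: datetime,
) -> Optional[Candidate]:
    # Step 1: filter expired entries before any similarity ranking.
    live = [c for c in candidates if c.expires_at > now]
    # Step 2: keep only candidates that meet the threshold and carry a valid payload.
    reusable = [
        c for c in live
        if c.similarity_score >= semantic_similarity_threshold
        and payload_is_valid(c.payload)
    ]
    # Step 3: return the most similar remaining candidate, or None on a miss.
    return max(reusable, key=lambda c: c.similarity_score, default=None)
```

Filtering before ranking means an expired entry with a very high score cannot crowd out a still-valid candidate that also meets the threshold.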

Upstream request

If no valid cache hit exists, the request is forwarded to the upstream OpenAI-compatible provider.

Cache storage

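The upstream response can then be stored in Redis / Valkey (for the exact cache) and in Qdrant (for the semantic cache) so that later requests can reuse it.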
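The four stages on this page can be put together in one hedged sketch. Everything here is illustrative: `handle_request`, the in-memory `dict` standing in for Redis / Valkey, and the `semantic_lookup`, `call_upstream`, and `semantic_store` callables are hypothetical stand-ins, and the validation rule is an assumed minimum rather than the firewall's actual checks.

```python
import json
from typing import Callable, Optional, Tuple


def handle_request(
    body: dict,
    exact_cache: dict,
    semantic_lookup: Callable[[dict], Optional[dict]],
    call_upstream: Callable[[dict], dict],
    semantic_store: Callable[[dict, dict], None],
) -> Tuple[dict, str]:
    """Return (response, source), where source names the stage that answered."""
    # 1. Validation (assumed rule: a non-empty "messages" list is required).
    messages = body.get("messages")
    if not isinstance(messages, list) or not messages:
        raise ValueError("request must contain a non-empty 'messages' list")

    # 2a. Exact cache: look up the canonical form of the request.
    key = json.dumps(body, sort_keys=True, separators=(",", ":"))
    cached = exact_cache.get(key)
    if cached is not None:
        return cached, "exact"

    # 2b. Semantic cache: consulted only after an exact miss.
    similar = semantic_lookup(body)
    if similar is not None:
        return similar, "semantic"

    # 3. Upstream: no valid cache hit, so forward the request.
    response = call_upstream(body)

    # 4. Cache storage: make the response available to later requests.
    exact_cache[key] = response
    semantic_store(body, response)
    return response, "upstream"
```

Calling `handle_request` twice with the same body and an initially empty `exact_cache` should reach the upstream callable only once; the second call is answered from the exact cache.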