Request Flow
AI Cost Firewall processes requests through validation, cache lookup, upstream forwarding, and cache storage.
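The four stages can be pictured as one function that stops at the first stage able to answer. This is a minimal sketch. The stage callables (`validate`, `exact_lookup`, `semantic_lookup`, `forward_upstream`, `store`) are hypothetical names passed in as parameters; they are not the firewall's actual internal API.

```python
from typing import Callable, Optional

# Hypothetical type aliases; the firewall's real request/response types are not documented here.
Request = dict
Response = dict


def handle_request(
    request: Request,
    validate: Callable[[Request], None],
    exact_lookup: Callable[[Request], Optional[Response]],
    semantic_lookup: Callable[[Request], Optional[Response]],
    forward_upstream: Callable[[Request], Response],
    store: Callable[[Request, Response], None],
) -> Response:
    """Run one request through validation, cache lookup, forwarding, and storage."""
    validate(request)  # assumed to raise on an invalid request

    cached = exact_lookup(request)  # exact cache first
    if cached is not None:
        return cached

    cached = semantic_lookup(request)  # then semantic cache
    if cached is not None:
        return cached

    response = forward_upstream(request)  # no valid hit: call the provider
    store(request, response)  # cache the fresh response for later requests
    return response
```

Passing the stages in as callables keeps the control flow visible and lets each stage be replaced by a fake in tests.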
Exact cache
The firewall checks Redis / Valkey for an identical normalized request.
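The source does not specify how a request is normalized. One common approach, assumed here, is to serialize the request as canonical JSON (sorted keys, no insignificant whitespace) and hash it, so two requests that differ only in key order map to the same cache key. The `exact:` prefix is hypothetical.

```python
import hashlib
import json


def exact_cache_key(request: dict) -> str:
    # Assumed normalization: canonical JSON with sorted keys and compact separators,
    # so key order and formatting in the incoming payload do not affect the key.
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return f"exact:{digest}"  # hypothetical key prefix
```

The resulting string could then be used as the key of a Redis / Valkey `GET`; any change to the model name or message content produces a different key.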
Semantic cache
If the exact cache misses, the semantic cache can search Qdrant for similar prompts.
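Similarity search compares the embedding of the incoming prompt with embeddings of previously cached prompts. The source does not name the metric or the embedding model; assuming cosine similarity (a common choice for text embeddings), the score for two embeddings can be sketched in plain Python:

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity in [-1, 1]; 1.0 means the vectors point in the same direction."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as dissimilar to everything
    return dot / (norm_a * norm_b)
```

A score like this is what would be compared against `semantic_similarity_threshold`; Qdrant computes it server-side, so the firewall would not need to do this arithmetic itself.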
A candidate is reusable only if all three conditions hold:
similarity_score >= semantic_similarity_threshold
AND expires_at > now
AND the cached response payload is valid
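The reuse rule can be written as a single predicate. This sketch assumes `expires_at` is a Unix timestamp in seconds and that "valid" means the payload decodes to a JSON object with a `choices` list, as an OpenAI-style chat completion does; the source does not define either, and `SemanticCandidate` and `payload_is_valid` are hypothetical names.

```python
import json
import time
from dataclasses import dataclass
from typing import Optional


@dataclass
class SemanticCandidate:
    # Hypothetical shape of a semantic-cache hit.
    similarity_score: float
    expires_at: float  # assumed: Unix timestamp in seconds
    payload: Optional[str]  # serialized cached response


def payload_is_valid(payload: Optional[str]) -> bool:
    # Assumed validity check: non-empty JSON object containing a "choices" list.
    if not payload:
        return False
    try:
        data = json.loads(payload)
    except json.JSONDecodeError:
        return False
    return isinstance(data, dict) and isinstance(data.get("choices"), list)


def is_reusable(
    candidate: SemanticCandidate,
    semantic_similarity_threshold: float,
    now: Optional[float] = None,
) -> bool:
    now = time.time() if now is None else now
    return (
        candidate.similarity_score >= semantic_similarity_threshold  # similar enough
        and candidate.expires_at > now  # not expired
        and payload_is_valid(candidate.payload)  # usable response body
    )
```

The threshold comparison is inclusive (`>=`), so a score exactly equal to the threshold still qualifies, while an entry whose `expires_at` equals `now` is already treated as expired.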
In v0.1.6, expired entries are filtered out before similarity ranking, so an expired entry cannot push a valid candidate out of the top results.
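Why the order matters can be shown with a hypothetical comparison of the two orderings over entries with precomputed scores. With `top_k=1`, ranking first can select a stale entry and then discard it, leaving no result even though a fresh, slightly less similar entry exists. `Entry`, `rank_then_filter`, and `filter_then_rank` are illustrative names, not the firewall's code.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    # Hypothetical semantic-cache entry with a precomputed similarity score.
    key: str
    similarity_score: float
    expires_at: float


def rank_then_filter(entries: List[Entry], now: float, top_k: int) -> List[Entry]:
    # Alternative ordering: keep the top_k by score, then drop expired ones.
    ranked = sorted(entries, key=lambda e: e.similarity_score, reverse=True)[:top_k]
    return [e for e in ranked if e.expires_at > now]


def filter_then_rank(entries: List[Entry], now: float, top_k: int) -> List[Entry]:
    # Ordering described for v0.1.6: drop expired entries first, then rank.
    live = [e for e in entries if e.expires_at > now]
    return sorted(live, key=lambda e: e.similarity_score, reverse=True)[:top_k]
```

With Qdrant, the equivalent would typically be a payload filter (for example a range condition on an `expires_at` field) passed with the search request, so that expired points are excluded during the vector search instead of afterwards.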
Upstream request
If no valid cache hit exists, the request is forwarded to the upstream OpenAI-compatible provider.
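For an OpenAI-compatible provider, forwarding a chat request usually means a JSON `POST` to `/chat/completions` under the provider's base URL with a bearer token. The sketch below builds such a request with the standard library but does not send it; the endpoint path, header set, and function name are assumptions based on the OpenAI convention, since the source does not say which endpoints the firewall proxies or how it authenticates upstream.

```python
import json
import urllib.request


def build_upstream_request(base_url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build (but do not send) a chat-completions call to an OpenAI-compatible upstream."""
    url = base_url.rstrip("/") + "/chat/completions"  # assumed endpoint path
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",  # OpenAI-style bearer auth (assumed)
            "Content-Type": "application/json",
        },
    )
```

Sending it would be a matter of `urllib.request.urlopen(req, timeout=...)`; a production proxy would more likely reuse a pooled HTTP client and stream the response.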
Cache storage
The upstream response can then be stored in Redis / Valkey (exact cache) and Qdrant (semantic cache).
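Storing one response in both caches can be sketched with in-memory stand-ins. The sketch assumes both stores share one `expires_at` computed from a TTL, that the exact key is a hash of canonical request JSON, and that semantic storage is optional (the `semantic_enabled` flag); `InMemoryCaches` and `store_response` are hypothetical names, not the firewall's storage layer.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Sequence


@dataclass
class InMemoryCaches:
    # Hypothetical stand-ins: `exact` plays Redis / Valkey, `semantic` plays a Qdrant collection.
    exact: Dict[str, dict] = field(default_factory=dict)
    semantic: List[dict] = field(default_factory=list)


def store_response(
    caches: InMemoryCaches,
    request: dict,
    embedding: Sequence[float],
    response: dict,
    ttl_seconds: float,
    semantic_enabled: bool = True,  # hypothetical switch for the optional semantic cache
    now: Optional[float] = None,
) -> None:
    now = time.time() if now is None else now
    expires_at = now + ttl_seconds  # assumed: same expiry for both stores
    payload = json.dumps(response)

    # Exact cache: keyed by a hash of the canonical request JSON.
    # Against real Redis / Valkey this would be a SET with an expiry (EX ttl).
    canonical = json.dumps(request, sort_keys=True, separators=(",", ":"))
    key = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    caches.exact[key] = {"payload": payload, "expires_at": expires_at}

    # Semantic cache: the prompt embedding plus a payload carrying expires_at,
    # which is what allows expired entries to be filtered out at lookup time.
    if semantic_enabled:
        caches.semantic.append(
            {"vector": list(embedding), "payload": payload, "expires_at": expires_at}
        )
```

Writing `expires_at` into the semantic payload is what makes the expiry check in the reuse rule possible, because a vector store does not expire points on its own the way a Redis key with a TTL does.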