Request Flow
AI Cost Firewall processes each request in four stages: validation, cache lookup (exact, then semantic), upstream forwarding on a miss, and cache storage.
Exact cache
The firewall first checks Redis / Valkey for an entry whose normalized request is identical to the incoming one.
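One way to picture "identical normalized request" is a canonical serialization hashed into a cache key. The sketch below is illustrative only: the function name `exact_cache_key`, the `exact:` key prefix, and the choice of which fields take part in normalization are assumptions, not the firewall's documented behavior.

```python
import hashlib
import json

# Hypothetical: fields assumed to affect the model output. The real
# normalization rules are not described in this document.
RELEVANT_FIELDS = ("model", "messages", "temperature")


def exact_cache_key(request: dict) -> str:
    """Build a deterministic key from the fields that matter."""
    relevant = {k: request[k] for k in RELEVANT_FIELDS if k in request}
    # sort_keys makes the serialization independent of dict key order,
    # including inside nested objects such as individual messages.
    canonical = json.dumps(relevant, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "exact:" + digest
```

With a key like this, two requests that differ only in key order or in ignored fields map to the same Redis / Valkey entry, while any change to the prompt produces a different key.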
Semantic cache
If the exact cache misses, the semantic cache can search Qdrant for similar prompts.
Semantic cache entries include:
- inserted_at
- expires_at
Expired entries are skipped during lookup and never reused.
A candidate is reusable only if all of the following hold:
similarity_score >= semantic_similarity_threshold
AND
expires_at > now
AND
cached response payload is valid
Expired entries are filtered before similarity ranking.
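The reuse rule above can be sketched as a small selection function. This is a minimal illustration, not the firewall's implementation: the `Candidate` type, `pick_reusable`, and especially the payload check in `is_valid_payload` (a dict with a non-empty `choices` list) are assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    similarity_score: float
    inserted_at: float
    expires_at: float
    payload: Optional[dict]


def is_valid_payload(payload: Optional[dict]) -> bool:
    # Assumption: "valid" means a dict with a non-empty "choices" list.
    return isinstance(payload, dict) and bool(payload.get("choices"))


def pick_reusable(candidates: List[Candidate], threshold: float,
                  now: float) -> Optional[Candidate]:
    # Expired entries are dropped before any ranking takes place.
    live = [c for c in candidates if c.expires_at > now]
    live.sort(key=lambda c: c.similarity_score, reverse=True)
    for c in live:
        if c.similarity_score < threshold:
            break  # everything after this point scores lower still
        if is_valid_payload(c.payload):
            return c
    return None
```

Filtering before ranking means an expired entry can never displace a live one, even if it has the highest similarity score.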
When the following is set:
semantic_cache_fail_open true;
runtime failures during semantic lookup are treated as cache misses, and the request continues upstream normally.
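The fail-open behavior can be sketched as a thin wrapper around the lookup call. The names `lookup_with_fail_open` and `search` are hypothetical. This document does not describe what happens when the setting is false, so re-raising the error in that case is an assumption of the sketch.

```python
from typing import Callable, Optional


def lookup_with_fail_open(search: Callable[[str], Optional[dict]],
                          prompt: str,
                          fail_open: bool) -> Optional[dict]:
    """Return a cached response, or None to signal a cache miss."""
    try:
        return search(prompt)
    except Exception:
        if fail_open:
            # Treat the failure like a miss so the caller goes upstream.
            return None
        # Assumption: with fail-open disabled, the error propagates.
        raise
```

The caller does not need to distinguish "nothing similar was found" from "the lookup failed": in both cases it receives `None` and forwards the request upstream.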
Upstream request
If no valid cache hit exists, the request is forwarded to the upstream OpenAI-compatible provider.
The chat provider and embedding provider may use separate OpenAI-compatible endpoints.
Cache storage
The upstream response can be stored in Redis / Valkey (exact cache) and Qdrant (semantic cache).
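Putting the stages together, the whole flow might look roughly like the sketch below. Everything here is illustrative: `handle_request` and its parameters are hypothetical, a plain dict stands in for Redis / Valkey, callables stand in for Qdrant and the upstream provider, and the validation rule (a non-empty `messages` field) is an assumption.

```python
from typing import Callable, Optional, Tuple


def handle_request(request: dict,
                   key: str,
                   exact_cache: dict,
                   semantic_lookup: Callable[[dict], Optional[dict]],
                   call_upstream: Callable[[dict], dict],
                   store_semantic: Callable[[dict, dict], None],
                   ) -> Tuple[dict, str]:
    # 1. Validation (hypothetical rule for this sketch).
    if not request.get("messages"):
        raise ValueError("request has no messages")

    # 2. Exact cache lookup.
    cached = exact_cache.get(key)
    if cached is not None:
        return cached, "exact"

    # 3. Semantic cache lookup; None means miss (or fail-open).
    hit = semantic_lookup(request)
    if hit is not None:
        return hit, "semantic"

    # 4. No valid cache hit: forward upstream, then store the result.
    response = call_upstream(request)
    exact_cache[key] = response
    store_semantic(request, response)
    return response, "upstream"
```

The second element of the returned tuple records which stage served the response, which is convenient for checking that a repeated request is answered from the exact cache without a second upstream call.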