Skip to main content

Request Flow

AI Cost Firewall processes requests through validation, cache lookup, upstream forwarding, and cache storage.

Exact cache

The firewall checks Redis / Valkey for an identical normalized request.

Semantic cache

If exact cache misses, semantic cache can search Qdrant for similar prompts.

Semantic cache entries include:

  • inserted_at
  • expires_at

Expired entries are skipped during lookup and never reused.

A candidate is reusable only if:

similarity_score >= semantic_similarity_threshold
AND
expires_at > now
AND
cached response payload is valid

Expired entries are filtered before similarity ranking.

When:

semantic_cache_fail_open true;

runtime semantic lookup failures behave like cache misses and requests continue upstream normally.

Upstream request

If no valid cache hit exists, the request is forwarded to the upstream OpenAI-compatible provider.

The chat provider and embedding provider may use separate OpenAI-compatible endpoints.

Cache storage

The upstream response can be stored in Redis and Qdrant.