
Caching Strategy

AI Cost Firewall evaluates cache reuse in stages:

  1. exact cache (Redis / Valkey)
  2. semantic cache (Qdrant)
  3. upstream request
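The staged order above can be sketched as a simple fall-through lookup. This is a minimal illustration, not the firewall's actual code: `staged_lookup` and the three callables passed to it are hypothetical names standing in for the Redis lookup, the Qdrant lookup, and the upstream call.

```python
from typing import Callable, Optional, Tuple


def staged_lookup(
    prompt: str,
    exact_get: Callable[[str], Optional[str]],     # hypothetical exact-cache lookup
    semantic_get: Callable[[str], Optional[str]],  # hypothetical semantic-cache lookup
    upstream_call: Callable[[str], str],           # hypothetical upstream request
) -> Tuple[str, str]:
    """Return (response, source), trying each stage in order."""
    cached = exact_get(prompt)
    if cached is not None:
        return cached, "exact"

    cached = semantic_get(prompt)
    if cached is not None:
        return cached, "semantic"

    # Both caches missed, so the request goes upstream.
    return upstream_call(prompt), "upstream"
```

Each stage runs only when the previous one misses, so the cheapest lookup is always tried first.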

Exact cache

Backend:

Redis / Valkey

Exact cache stores responses for identical normalized requests.

Benefits:

  • very low latency
  • no embedding lookup cost
  • predictable matching
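One way to picture "identical normalized requests" is a deterministic key derived from the request body. The sketch below is an assumption for illustration only: the actual normalization rules are not described here, and `exact_cache_key`, the `exact:` prefix, and the choice of canonical JSON plus SHA-256 are all hypothetical.

```python
import hashlib
import json


def exact_cache_key(request: dict) -> str:
    """Build a deterministic cache key from a request body.

    Assumed normalization: serialize as JSON with sorted keys and
    compact separators, so field order and spacing do not affect the key.
    """
    canonical = json.dumps(
        request, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return "exact:" + digest
```

Under this assumption, two requests that differ only in field order map to the same key, while any change to the prompt text produces a different key.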

Semantic cache

Backend:

Qdrant

Semantic cache stores prompt embeddings together with their response payloads, so a semantically similar prompt can reuse an earlier response.

Example similar prompts:

"Explain Redis briefly"
"What is Redis used for?"

Semantic cache lookup requires an embedding request, which adds embedding overhead. When embedding_price is configured, this cost is included in net savings calculations.

AI Cost Firewall therefore distinguishes:

  • gross savings
  • embedding overhead
  • net savings
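The relationship between the three figures can be sketched as a subtraction. The unit of embedding_price is not stated here, so the example assumes a per-token price; `Savings` and `semantic_hit_savings` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Savings:
    gross: float               # upstream cost avoided by the cache hit
    embedding_overhead: float  # cost of the embedding request

    @property
    def net(self) -> float:
        # Net savings is what remains after paying for the embedding.
        return self.gross - self.embedding_overhead


def semantic_hit_savings(
    avoided_upstream_cost: float,
    embedding_tokens: int,
    embedding_price_per_token: float,  # assumed unit: price per token
) -> Savings:
    overhead = embedding_tokens * embedding_price_per_token
    return Savings(gross=avoided_upstream_cost, embedding_overhead=overhead)
```

With these assumptions, net savings can be negative when the embedding cost exceeds the upstream cost that was avoided, which is why the overhead is tracked separately.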

Similarity threshold

semantic_similarity_threshold 0.92;

Typical values:

Value   Behavior
0.85    aggressive reuse
0.92    balanced default
0.97    strict reuse
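The effect of the threshold can be illustrated with a small similarity check. This sketch assumes cosine similarity and an inclusive comparison (score >= threshold); neither is confirmed here, and in practice the vector store computes the score. `cosine_similarity` and `is_reusable` are hypothetical helper names.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_reusable(score: float, threshold: float = 0.92) -> bool:
    # Assumed rule: reuse the cached response when the score meets the threshold.
    return score >= threshold
```

Under this rule, a candidate scoring 0.90 would be reused at a threshold of 0.85 but rejected at 0.92 or 0.97, so lower thresholds reuse more responses at a higher risk of mismatched answers.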

Freshness

Semantic entries include inserted_at and expires_at. Expired entries are not reused.
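The freshness rule can be sketched as a timestamp comparison. The example assumes inserted_at and expires_at are Unix timestamps in seconds and that an entry counts as expired once the current time reaches expires_at; the storage format is not specified here, and `is_fresh` is a hypothetical helper.

```python
import time
from typing import Optional


def is_fresh(entry: dict, now: Optional[float] = None) -> bool:
    """Return True while the entry has not reached its expires_at time.

    Assumes entry["expires_at"] is a Unix timestamp in seconds.
    """
    current = time.time() if now is None else now
    return current < entry["expires_at"]
```

For example, an entry with inserted_at 1000 and expires_at 1600 would be reusable at time 1500 and skipped at time 1600 or later.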

Runnable deployment examples are available under:

deploy/examples/