Skip to main content

Caching Strategy

AI Cost Firewall uses two cache layers.

Exact cache

Backend:

Redis / Valkey

Exact cache stores responses for identical normalized requests.

Benefits:

  • very low latency
  • no embedding cost
  • predictable matching

Semantic cache

Backend:

Qdrant

Semantic cache stores embeddings and response payloads for semantically similar prompts.

Example similar prompts:

"Explain Redis briefly"
"What is Redis used for?"

Similarity threshold

semantic_similarity_threshold 0.92;

Typical values:

ValueBehavior
0.85aggressive reuse
0.92balanced default
0.97strict reuse

Freshness

Semantic entries include inserted_at and expires_at. Expired entries are not reused.