Caching Strategy
AI Cost Firewall uses two cache layers.
Exact cache
Backend:
Redis / Valkey
Exact cache stores responses for identical normalized requests.
Benefits:
- very low latency
- no embedding cost
- predictable matching
Semantic cache
Backend:
Qdrant
Semantic cache stores embeddings and response payloads for semantically similar prompts.
Example similar prompts:
"Explain Redis briefly"
"What is Redis used for?"
Similarity threshold
semantic_similarity_threshold 0.92;
Typical values:
| Value | Behavior |
|---|---|
0.85 | aggressive reuse |
0.92 | balanced default |
0.97 | strict reuse |
Freshness
Semantic entries include inserted_at and expires_at. Expired entries are not reused.