Caching Strategy
AI Cost Firewall evaluates cache reuse in stages, moving to the next stage only on a miss:
- exact cache (Redis / Valkey)
- semantic cache (Qdrant)
- upstream request
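The staged evaluation above can be sketched as a simple fall-through. This is a minimal illustration, not the firewall's actual code: `staged_lookup` and the three callables are hypothetical stand-ins for the Redis, Qdrant, and upstream clients.

```python
from typing import Callable, Optional, Tuple


def staged_lookup(
    request_key: str,
    exact_get: Callable[[str], Optional[str]],
    semantic_get: Callable[[str], Optional[str]],
    upstream_call: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (source, response), trying the cheapest stage first."""
    cached = exact_get(request_key)
    if cached is not None:
        return "exact", cached
    cached = semantic_get(request_key)
    if cached is not None:
        return "semantic", cached
    # Both caches missed, so the request goes upstream.
    return "upstream", upstream_call(request_key)


# Hypothetical in-memory stand-ins for the real backends.
exact_store = {"ping": "pong"}
result_hit = staged_lookup("ping", exact_store.get, lambda k: None, lambda k: "fresh")
result_miss = staged_lookup("hello", exact_store.get, lambda k: None, lambda k: "fresh")
```

In this sketch the upstream callable runs only when both cache stages return nothing, which is what keeps the cheaper stages in front.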
Exact cache
Backend:
Redis / Valkey
Exact cache stores responses for identical normalized requests.
Benefits:
- very low latency
- no embedding lookup cost
- predictable matching
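One way to picture "identical normalized requests" is a deterministic cache key derived from the request. The sketch below is an assumption, since the normalization rules are not specified here: it collapses whitespace in the prompt, serializes the request as canonical JSON, and hashes it with SHA-256. The function name, the `exact:` prefix, and the field names are hypothetical.

```python
import hashlib
import json


def exact_cache_key(model: str, prompt: str, params: dict) -> str:
    """Build a deterministic key from a normalized request (illustrative rules)."""
    normalized = {
        "model": model.strip(),
        # Collapse runs of whitespace so trivially different prompts match.
        "prompt": " ".join(prompt.split()),
        "params": params,
    }
    # sort_keys and fixed separators make the serialization canonical.
    canonical = json.dumps(normalized, sort_keys=True, separators=(",", ":"))
    return "exact:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


key_a = exact_cache_key("example-model", "Explain  Redis briefly", {"temperature": 0})
key_b = exact_cache_key("example-model", " Explain Redis briefly ", {"temperature": 0})
key_c = exact_cache_key("example-model", "What is Redis used for?", {"temperature": 0})
```

Here `key_a` and `key_b` are equal because the prompts differ only in whitespace, while `key_c` differs: an exact cache does not match a rephrased prompt.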
Semantic cache
Backend:
Qdrant
Semantic cache stores prompt embeddings together with response payloads, so that a semantically similar prompt can reuse an earlier response.
Example similar prompts:
"Explain Redis briefly"
"What is Redis used for?"
Semantic cache lookup requires an embedding request, which adds embedding overhead. When embedding_price is configured, this cost is included in net savings calculations.
AI Cost Firewall therefore distinguishes:
- gross savings
- embedding overhead
- net savings
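The relationship between the three figures is net savings = gross savings - embedding overhead. The sketch below works through that arithmetic under stated assumptions: the pricing unit (per 1,000 embedding tokens) and the function and parameter names are hypothetical, and an unconfigured embedding_price is treated as zero overhead.

```python
from typing import Optional


def savings_breakdown(
    gross_savings: float,
    embedding_tokens: int,
    embedding_price_per_1k: Optional[float],
) -> dict:
    """Split savings into gross, embedding overhead, and net (assumed units)."""
    if embedding_price_per_1k is None:
        # Assumption: with no embedding_price configured, no overhead is counted.
        overhead = 0.0
    else:
        overhead = embedding_tokens / 1000 * embedding_price_per_1k
    return {
        "gross_savings": gross_savings,
        "embedding_overhead": overhead,
        "net_savings": gross_savings - overhead,
    }


priced = savings_breakdown(1.0, 2000, 0.125)
unpriced = savings_breakdown(1.0, 2000, None)
```

With these illustrative numbers, `priced` has an overhead of 0.25 and net savings of 0.75, while `unpriced` reports net savings equal to gross savings.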
Similarity threshold
semantic_similarity_threshold 0.92;
Typical values:
| Value | Behavior |
|---|---|
| 0.85 | aggressive reuse |
| 0.92 | balanced default |
| 0.97 | strict reuse |
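To illustrate how the threshold changes reuse decisions, the sketch below assumes the score is a cosine similarity and that a match is reused when the score is greater than or equal to the threshold. Both are assumptions about the comparison, and the helper names are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def is_reusable(score: float, threshold: float = 0.92) -> bool:
    """Assumed rule: reuse a cached response when score >= threshold."""
    return score >= threshold


# The same candidate score evaluated against the three typical thresholds.
score = 0.90
decisions = {t: is_reusable(score, t) for t in (0.85, 0.92, 0.97)}
```

A candidate scoring 0.90 would be reused only under the aggressive 0.85 setting in this sketch. Lower thresholds raise the hit rate at the cost of more loosely related matches.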
Freshness
Semantic entries include inserted_at and expires_at. Expired entries are not reused.
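A freshness check along these lines might look like the following. It is a sketch only: it assumes inserted_at and expires_at are Unix timestamps in seconds and that an entry counts as expired once the current time reaches expires_at. The `SemanticEntry` class and `is_fresh` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SemanticEntry:
    response: str
    inserted_at: float  # assumed: Unix timestamp, seconds
    expires_at: float   # assumed: Unix timestamp, seconds


def is_fresh(entry: SemanticEntry, now: float) -> bool:
    """An entry is reusable only while `now` is before its expiry time."""
    return now < entry.expires_at


# Illustrative entry with a one-hour lifetime.
entry = SemanticEntry("cached answer", inserted_at=1000.0, expires_at=1000.0 + 3600)
fresh_now = is_fresh(entry, 1500.0)
fresh_later = is_fresh(entry, 5000.0)
```

Passing `now` explicitly, rather than reading the clock inside the helper, keeps the check easy to test.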
Runnable deployment examples are available under:
deploy/examples/