Caching Strategy

AI Cost Firewall evaluates cache reuse in stages: the exact cache, then the semantic cache.

Exact cache

Backend: Redis / Valkey

Exact cache stores responses for identical normalized requests.
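One way to picture "identical normalized requests" is a deterministic key derived from a canonical form of the request. The sketch below is illustrative only: the normalization rules (collapsing whitespace, keeping only `model` and `messages`), the `exact:` key prefix, and the function names are assumptions, not AI Cost Firewall's actual scheme.

```python
import hashlib
import json


def normalize_request(request: dict) -> dict:
    """Hypothetical normalization: collapse whitespace in each message."""
    messages = [
        {"role": m["role"], "content": " ".join(m["content"].split())}
        for m in request.get("messages", [])
    ]
    return {"model": request.get("model"), "messages": messages}


def exact_cache_key(request: dict) -> str:
    """Hash the canonical JSON so identical normalized requests share one key."""
    canonical = json.dumps(
        normalize_request(request), sort_keys=True, separators=(",", ":")
    )
    return "exact:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, two requests that differ only in incidental whitespace map to the same key, so the second one could be served from Redis / Valkey without calling the model.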

Benefits:

Semantic cache

Backend: Qdrant

Semantic cache stores embeddings and response payloads for semantically similar prompts.

Example similar prompts:

"Explain Redis briefly"
"What is Redis used for?"

Semantic cache lookup requires an embedding request. When embedding_price is configured, this cost is included in net savings calculations.
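As a rough illustration of how embedding cost could enter a net savings figure, the sketch below subtracts the embedding cost from the upstream cost avoided by a cache hit. The unit of `embedding_price` (here, price per 1,000,000 embedding tokens), the function name, and the formula itself are assumptions for illustration; the project's actual calculation may differ.

```python
from typing import Optional


def net_savings(
    avoided_cost: float,
    embedding_tokens: int,
    embedding_price: Optional[float],
) -> float:
    """Return avoided upstream cost minus the embedding cost of the lookup.

    Assumption: embedding_price is a price per 1,000,000 embedding tokens.
    When it is not configured (None), no embedding cost is subtracted.
    """
    if embedding_price is None:
        return avoided_cost
    embedding_cost = embedding_tokens / 1_000_000 * embedding_price
    return avoided_cost - embedding_cost
```

With this shape, a semantic hit on a cheap request can yield small or even negative net savings when the embedding call costs more than the response it replaced.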

Because each lookup needs an embedding request, the semantic cache may introduce embedding overhead.

AI Cost Firewall therefore distinguishes:

semantic_similarity_threshold 0.92;
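The threshold above can be read as a cutoff on the similarity score between the incoming prompt's embedding and a cached one. The following is a minimal sketch that assumes cosine similarity and a "greater than or equal" comparison; the metric, the comparison, and the function names are assumptions, not confirmed behavior of AI Cost Firewall or its Qdrant configuration.

```python
import math
from typing import Sequence

SEMANTIC_SIMILARITY_THRESHOLD = 0.92  # value taken from the directive above


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat zero vectors as dissimilar to everything
    return dot / (norm_a * norm_b)


def is_semantic_hit(
    query: Sequence[float],
    cached: Sequence[float],
    threshold: float = SEMANTIC_SIMILARITY_THRESHOLD,
) -> bool:
    """Assumed rule: reuse a cached response only at or above the threshold."""
    return cosine_similarity(query, cached) >= threshold
```

A higher threshold trades hit rate for precision: fewer prompts match, but matched prompts are closer in meaning to the cached one.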

Typical values:

Semantic entries include inserted_at and expires_at. Expired entries are not reused.
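The expiry rule can be sketched as a simple timestamp comparison. In the example below, only the field names `inserted_at` and `expires_at` come from the text above; the use of timezone-aware `datetime` values, the TTL-based construction, the helper names, and the choice to treat an entry as expired exactly at `expires_at` are assumptions.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def make_entry(response: str, ttl_seconds: int, now: Optional[datetime] = None) -> dict:
    """Build a hypothetical semantic cache entry with both timestamps set."""
    inserted_at = now or datetime.now(timezone.utc)
    return {
        "response": response,
        "inserted_at": inserted_at,
        "expires_at": inserted_at + timedelta(seconds=ttl_seconds),
    }


def is_reusable(entry: dict, now: Optional[datetime] = None) -> bool:
    """An entry is reusable only while the current time is before expires_at."""
    current = now or datetime.now(timezone.utc)
    return current < entry["expires_at"]
```

Checking expiry at read time, as here, means a stale entry is never served even if the backend has not yet removed it.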

Runnable deployment examples are available under:

deploy/examples/