Performance Characteristics

Performance depends on infrastructure, Redis latency, Qdrant latency, embedding provider latency, upstream model latency, and cache hit rate.

| Scenario | Expected behavior |
| --- | --- |
| Exact cache hit | Fastest path; no upstream call; no embedding lookup |
| Semantic cache hit | Avoids upstream chat call; requires embedding and Qdrant lookup |
| Cache miss | Dominated by upstream LLM latency |
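
The three scenarios above can be read as a tiered lookup. The following is a minimal sketch of that idea, not the actual implementation: the function `answer` and its parameters `exact_get`, `semantic_get`, and `call_upstream` are hypothetical stand-ins, and the lookup order (exact cache first, then semantic cache, then upstream) is an assumption inferred from the scenario descriptions.

```python
from typing import Callable, Optional, Tuple


def answer(
    prompt: str,
    exact_get: Callable[[str], Optional[str]],
    semantic_get: Callable[[str], Optional[str]],
    call_upstream: Callable[[str], str],
) -> Tuple[str, str]:
    """Return (response, path), where path names the scenario that served it.

    All three callables are hypothetical stand-ins:
    - exact_get: key lookup (for example, backed by Redis)
    - semantic_get: embedding plus vector search (for example, backed by Qdrant)
    - call_upstream: the upstream LLM call
    """
    # Exact hit: cheapest path, no embedding and no upstream call.
    cached = exact_get(prompt)
    if cached is not None:
        return cached, "exact_hit"

    # Semantic hit: pays for embedding and vector lookup, skips upstream.
    cached = semantic_get(prompt)
    if cached is not None:
        return cached, "semantic_hit"

    # Miss: latency is dominated by the upstream model.
    return call_upstream(prompt), "miss"
```

With in-memory dictionaries standing in for the caches, `answer("hi", {"hi": "hello"}.get, lambda p: None, upstream)` is served from the exact tier and never invokes `upstream`.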

Useful metrics

aif_upstream_request_duration_seconds
aif_upstream_timeouts_total
aif_upstream_calls_total
aif_cache_exact_hits
aif_cache_semantic_hits
aif_cache_misses
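
Because cache hit rate is one of the main performance factors, it can help to derive it from the three cache counters. The sketch below assumes the values of `aif_cache_exact_hits`, `aif_cache_semantic_hits`, and `aif_cache_misses` have already been read from your metrics backend over the same time window, and that every request increments exactly one of them; the helper name `cache_hit_rates` is hypothetical.

```python
from typing import Dict


def cache_hit_rates(exact_hits: float, semantic_hits: float, misses: float) -> Dict[str, float]:
    """Compute hit-rate fractions from the three cache counters.

    Assumes the counters cover the same window and that each request
    increments exactly one of them.
    """
    total = exact_hits + semantic_hits + misses
    if total == 0:
        # No traffic in the window; report zeros instead of dividing by zero.
        return {"exact": 0.0, "semantic": 0.0, "overall": 0.0}
    return {
        "exact": exact_hits / total,
        "semantic": semantic_hits / total,
        "overall": (exact_hits + semantic_hits) / total,
    }
```

For example, 60 exact hits, 20 semantic hits, and 20 misses give an overall hit rate of 0.8, meaning only one request in five pays the upstream LLM latency.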