Performance Characteristics
Performance depends on the deployment infrastructure; on the latency of Redis, Qdrant, the embedding provider, and the upstream model; and on the cache hit rate.
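To see how these factors combine, the following is a minimal back-of-the-envelope model. It is not part of the gateway. It assumes that every request checks the exact cache (Redis) first, that requests missing it pay for one embedding request and one Qdrant search, and that only full misses call the upstream model. `LatencyProfile`, `expected_latency`, and the example latencies are hypothetical, illustrative inputs, not measurements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LatencyProfile:
    """Per-component latencies in seconds (illustrative inputs, not measurements)."""

    redis: float      # exact-cache lookup
    embedding: float  # embedding provider request
    qdrant: float     # vector similarity search
    upstream: float   # upstream model call


def expected_latency(p: LatencyProfile, exact_hit_rate: float, semantic_hit_rate: float) -> float:
    """Expected per-request latency under a simple additive model.

    Assumptions (not taken from the gateway itself): every request checks the
    exact cache first; requests that miss it pay for one embedding request and
    one Qdrant search; only full misses call the upstream model. Gateway
    overhead, cache writes, and queueing are ignored.
    """
    if exact_hit_rate < 0 or semantic_hit_rate < 0 or exact_hit_rate + semantic_hit_rate > 1:
        raise ValueError("hit rates must be non-negative and sum to at most 1")
    miss_rate = 1.0 - exact_hit_rate - semantic_hit_rate

    exact_path = p.redis
    semantic_path = p.redis + p.embedding + p.qdrant
    miss_path = semantic_path + p.upstream

    # Weight each path's cost by the fraction of requests that take it.
    return (exact_hit_rate * exact_path
            + semantic_hit_rate * semantic_path
            + miss_rate * miss_path)


profile = LatencyProfile(redis=0.001, embedding=0.05, qdrant=0.01, upstream=2.0)
print(f"{expected_latency(profile, exact_hit_rate=0.3, semantic_hit_rate=0.2):.3f} s")  # 1.043 s
```

With these made-up numbers the miss path contributes almost all of the expected latency, so under this model raising the combined hit rate matters far more than shaving milliseconds off Redis or Qdrant.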
| Scenario | Expected behavior |
|---|---|
| Exact cache hit | Fastest path: no embedding lookup and no upstream call |
| Semantic cache hit | Skips the upstream chat call, but still requires an embedding request and a Qdrant search |
| Cache miss | Latency is dominated by the upstream LLM call |
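The table implies a fixed lookup order: exact cache, then semantic cache, then the upstream model. The sketch models that order under stated assumptions and is not the gateway's actual code. A Python dict stands in for Redis, a list scanned with cosine similarity stands in for Qdrant, and `TieredCache`, `embed`, `call_upstream`, and the 0.9 `similarity_threshold` are hypothetical names and values. It counts which path each request took, mirroring the three cache metrics.

```python
import hashlib
import math
from typing import Callable, Optional


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class TieredCache:
    """Hypothetical sketch: a dict stands in for Redis, a list for Qdrant."""

    def __init__(self, embed: Callable[[str], list[float]],
                 call_upstream: Callable[[str], str],
                 similarity_threshold: float = 0.9) -> None:
        self.embed = embed
        self.call_upstream = call_upstream
        self.threshold = similarity_threshold
        self.exact: dict[str, str] = {}                    # stand-in for Redis
        self.vectors: list[tuple[list[float], str]] = []   # stand-in for Qdrant
        self.counts = {"exact_hits": 0, "semantic_hits": 0, "misses": 0}

    @staticmethod
    def _key(prompt: str) -> str:
        return hashlib.sha256(prompt.encode("utf-8")).hexdigest()

    def complete(self, prompt: str) -> str:
        key = self._key(prompt)
        # 1. Exact hit: one key lookup, no embedding request, no upstream call.
        if key in self.exact:
            self.counts["exact_hits"] += 1
            return self.exact[key]

        # 2. Semantic lookup: one embedding request plus one similarity search.
        vec = self.embed(prompt)
        best: Optional[tuple[float, str]] = None
        for stored_vec, response in self.vectors:
            score = cosine(vec, stored_vec)
            if best is None or score > best[0]:
                best = (score, response)
        if best is not None and best[0] >= self.threshold:
            self.counts["semantic_hits"] += 1
            return best[1]

        # 3. Miss: pay the upstream latency, then populate both tiers.
        self.counts["misses"] += 1
        response = self.call_upstream(prompt)
        self.exact[key] = response
        self.vectors.append((vec, response))
        return response
```

With a toy embedding that ignores letter case, sending `"What is Redis?"` twice and then `"what is redis?"` yields one miss, one exact hit, and one semantic hit, with only a single upstream call and no embedding request on the exact hit.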
Useful metrics
- `aif_upstream_request_duration_seconds`
- `aif_upstream_timeouts_total`
- `aif_upstream_calls_total`
- `aif_cache_exact_hits`
- `aif_cache_semantic_hits`
- `aif_cache_misses`
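The counters above are most useful as ratios over a time window. The sketch computes them from two scrapes of counter values. It assumes, without confirmation from the source, that all five names are monotonic counters, that the three cache counters are mutually exclusive outcomes of one lookup each, and that timeouts are a subset of upstream calls. `counter_delta`, `cache_outcome_ratios`, and `upstream_timeout_ratio` are hypothetical helpers. In Prometheus itself the same ratios are usually written in PromQL with `increase()` or `rate()`, which also account for counter resets.

```python
from typing import Mapping

# Metric names from the list above; treating them as monotonic counters is an assumption.
CACHE_OUTCOMES = ("aif_cache_exact_hits", "aif_cache_semantic_hits", "aif_cache_misses")


def counter_delta(before: float, after: float) -> float:
    """Increase of a counter between two scrapes, tolerating a process restart."""
    # A counter that went down was reset to zero, so everything it now holds is new.
    return after if after < before else after - before


def _delta(before: Mapping[str, float], after: Mapping[str, float], name: str) -> float:
    return counter_delta(before.get(name, 0.0), after.get(name, 0.0))


def cache_outcome_ratios(before: Mapping[str, float],
                         after: Mapping[str, float]) -> dict[str, float]:
    """Share of lookups in the window that ended as exact hit, semantic hit, or miss."""
    deltas = {name: _delta(before, after, name) for name in CACHE_OUTCOMES}
    total = sum(deltas.values())
    if total == 0:
        return {name: 0.0 for name in CACHE_OUTCOMES}
    return {name: value / total for name, value in deltas.items()}


def upstream_timeout_ratio(before: Mapping[str, float],
                           after: Mapping[str, float]) -> float:
    """Fraction of upstream calls in the window that timed out."""
    calls = _delta(before, after, "aif_upstream_calls_total")
    timeouts = _delta(before, after, "aif_upstream_timeouts_total")
    return timeouts / calls if calls else 0.0
```

A rising miss share alongside a rising timeout ratio points at the upstream model rather than at Redis or Qdrant, since only misses reach the upstream.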