Release Notes v0.1.8
v0.1.8 adds cost and savings intelligence to make the financial impact of AI Cost Firewall easier to measure, explain, and demonstrate.
Cost and savings intelligence
- Added per-model request cost visibility.
- Added per-model input and output token metrics.
- Added structured gross savings metrics by model and cache type.
- Added structured net savings metrics by model and cache type.
- Added embedding overhead metrics for semantic lookup and store operations.
- Added request-related cost metrics split by chat and embedding cost type.
Exact vs semantic savings
Exact and semantic cache savings are now easier to evaluate separately:
- Exact cache savings represent avoided upstream chat-completion cost.
- Semantic cache savings represent avoided upstream chat-completion cost minus embedding overhead.
- Gross savings, embedding overhead, and net savings are exposed separately.
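The accounting described above can be sketched in a few lines of Python. This is an illustration only, not the firewall's implementation: the `CacheSavings` class, its field names, and the sample amounts are hypothetical, and it assumes that embedding overhead is charged only against semantic savings, as the definitions above state.

```python
from dataclasses import dataclass


@dataclass
class CacheSavings:
    """Savings for one model and one cache type, in micro-USD (hypothetical type)."""

    cache_type: str  # "exact" or "semantic"
    gross_micro_usd: int  # avoided upstream chat-completion cost
    embedding_overhead_micro_usd: int = 0  # embedding cost for lookup and store

    def net_micro_usd(self) -> int:
        # Per the definitions above, exact savings carry no embedding overhead,
        # so net equals gross for the exact cache.
        if self.cache_type == "exact":
            return self.gross_micro_usd
        # Semantic savings are reduced by the embedding overhead.
        return self.gross_micro_usd - self.embedding_overhead_micro_usd


# Made-up amounts, for illustration only.
exact = CacheSavings("exact", gross_micro_usd=1200)
semantic = CacheSavings("semantic", gross_micro_usd=900, embedding_overhead_micro_usd=150)

for entry in (exact, semantic):
    print(entry.cache_type, entry.gross_micro_usd, entry.net_micro_usd())
```

Keeping gross, overhead, and net as separate values makes it possible to see when semantic caching for a model costs more in embeddings than it saves in chat completions.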
New metrics
aif_model_cost_micro_usd_total{model="..."}
aif_model_requests_total{model="..."}
aif_model_input_tokens_total{model="..."}
aif_model_output_tokens_total{model="..."}
aif_gross_saved_micro_usd_total{model="...", cache_type="exact|semantic"}
aif_net_saved_micro_usd_total{model="...", cache_type="exact|semantic"}
aif_embedding_overhead_micro_usd_total{model="...", operation="lookup|store"}
aif_request_cost_micro_usd_total{model="...", cost_type="chat|embedding"}
aif_cache_hits_total{model="...", cache_type="exact|semantic"}
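As a rough sketch of how these counters might be consumed outside Grafana, the Python below parses a few lines in the Prometheus text exposition format and sums them in USD. The sample values, the `model-a` label value, and the helper names `parse_samples` and `total_usd` are all hypothetical. The parser is deliberately minimal and handles only labeled samples without timestamps.

```python
import re

# Hypothetical scrape output; values and label values are made up.
SAMPLE = """\
aif_gross_saved_micro_usd_total{model="model-a",cache_type="exact"} 1200
aif_gross_saved_micro_usd_total{model="model-a",cache_type="semantic"} 900
aif_net_saved_micro_usd_total{model="model-a",cache_type="exact"} 1200
aif_net_saved_micro_usd_total{model="model-a",cache_type="semantic"} 750
aif_embedding_overhead_micro_usd_total{model="model-a",operation="lookup"} 100
aif_embedding_overhead_micro_usd_total{model="model-a",operation="store"} 50
"""

LINE_RE = re.compile(r'^(?P<name>[A-Za-z_:][A-Za-z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return a list of (metric_name, labels_dict, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE_RE.match(line)
        if match is None:
            continue  # this sketch ignores unlabeled or timestamped samples
        labels = dict(LABEL_RE.findall(match["labels"]))
        samples.append((match["name"], labels, float(match["value"])))
    return samples


def total_usd(samples, metric, **label_filter):
    """Sum one metric across matching label sets and convert micro-USD to USD."""
    total = sum(
        value
        for name, labels, value in samples
        if name == metric
        and all(labels.get(key) == wanted for key, wanted in label_filter.items())
    )
    return total / 1_000_000


samples = parse_samples(SAMPLE)
print(total_usd(samples, "aif_gross_saved_micro_usd_total"))
print(total_usd(samples, "aif_net_saved_micro_usd_total", cache_type="semantic"))
```

Because the counters are in micro-USD, the division by 1,000,000 is applied only at the point of display, which keeps the sums in whole units for as long as possible.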
Grafana dashboards
The updated Overview dashboard now shows:
- estimated chat cost
- gross savings
- embedding overhead
- net savings
- net savings percentage
- savings by model
- savings by cache type
- exact vs semantic hit rate
- cost per upstream request
- net saved per cache hit
- top models by spend
- top models by savings
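Several of these panels are ratios derived from the counters. The Python sketch below shows one plausible way to compute three of them. The formulas are assumptions, not the dashboard's actual queries: in particular, net savings percentage is taken here relative to the cost that would have been paid without caching (actual chat cost plus gross savings). The function and parameter names are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (e.g. no traffic yet)."""
    return numerator / denominator if denominator else 0.0


def overview_ratios(chat_cost, gross_saved, embedding_overhead, cache_hits, upstream_requests):
    """Compute illustrative Overview ratios; money inputs are in micro-USD."""
    net_saved = gross_saved - embedding_overhead
    # Assumption: the baseline is what would have been spent with no cache at all.
    baseline_cost = chat_cost + gross_saved
    return {
        "net_savings_pct": 100 * safe_ratio(net_saved, baseline_cost),
        "cost_per_upstream_request": safe_ratio(chat_cost, upstream_requests),
        "net_saved_per_cache_hit": safe_ratio(net_saved, cache_hits),
    }


# Made-up totals, for illustration only.
ratios = overview_ratios(
    chat_cost=8000,
    gross_saved=2100,
    embedding_overhead=150,
    cache_hits=30,
    upstream_requests=70,
)
for name, value in ratios.items():
    print(f"{name}: {value:.2f}")
```

Guarding against a zero denominator matters for freshly deployed instances, where a panel would otherwise show a division error until the first request or cache hit arrives.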
The updated Diagnostics dashboard now shows:
- embedding overhead over time
- gross vs net semantic savings
- exact vs semantic savings
- semantic cache misses vs passes
- semantic net savings by model
- potentially low-value semantic models
- semantic threshold and expiration behavior
- provider and semantic latency diagnostics
Documentation
- Added explanation of the cost and savings accounting model.
- Clarified the difference between gross savings, embedding overhead, and net savings.
- Clarified why exact and semantic cache savings should be evaluated separately.
- Updated metrics and dashboard documentation for the new v0.1.8 observability model.