Release Notes v0.1.8

v0.1.8 adds cost and savings intelligence to make the financial impact of AI Cost Firewall easier to measure, explain, and demonstrate.

Cost and savings intelligence

Added per-model request cost visibility.
Added per-model input and output token metrics.
Added structured gross savings metrics by model and cache type.
Added structured net savings metrics by model and cache type.
Added embedding overhead metrics for semantic lookup and store operations.
Added request-related cost metrics split by chat and embedding cost type.

Exact vs semantic savings

Exact and semantic cache savings are now easier to evaluate separately:

Exact cache savings represent avoided upstream chat-completion cost.
Net semantic cache savings represent avoided upstream chat-completion cost minus embedding overhead.
Gross savings, embedding overhead, and net savings are exposed separately.
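The accounting above can be illustrated with a small Python sketch. It is not AI Cost Firewall's implementation. It assumes that savings are tracked as integer micro-USD, as the metric names suggest, and that exact cache hits carry no embedding overhead. The `CacheSavings` class is hypothetical.

```python
from dataclasses import dataclass

MICRO_USD_PER_USD = 1_000_000


@dataclass(frozen=True)
class CacheSavings:
    """Hypothetical savings totals for one cache type, in integer micro-USD."""
    cache_type: str                        # "exact" or "semantic"
    gross_micro_usd: int                   # avoided upstream chat-completion cost
    embedding_overhead_micro_usd: int = 0  # embedding cost of lookup + store

    @property
    def net_micro_usd(self) -> int:
        # Net savings = gross savings minus the embedding cost spent to get them.
        return self.gross_micro_usd - self.embedding_overhead_micro_usd

    def net_usd(self) -> float:
        return self.net_micro_usd / MICRO_USD_PER_USD


# Assumption: exact hits need no embedding call, so gross == net.
exact = CacheSavings("exact", gross_micro_usd=4_200_000)
semantic = CacheSavings("semantic", gross_micro_usd=2_000_000,
                        embedding_overhead_micro_usd=350_000)
# A semantic cache whose embedding overhead exceeds what it saves.
costly = CacheSavings("semantic", gross_micro_usd=100_000,
                      embedding_overhead_micro_usd=250_000)

print(exact.net_micro_usd, semantic.net_micro_usd, costly.net_micro_usd)
```

The last case shows why the two cache types are reported separately. Semantic gross savings can look healthy while net savings are small or even negative once embedding overhead is subtracted.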

New metrics

aif_model_cost_micro_usd_total{model="..."}
aif_model_requests_total{model="..."}
aif_model_input_tokens_total{model="..."}
aif_model_output_tokens_total{model="..."}
aif_gross_saved_micro_usd_total{model="...", cache_type="exact|semantic"}
aif_net_saved_micro_usd_total{model="...", cache_type="exact|semantic"}
aif_embedding_overhead_micro_usd_total{model="...", operation="lookup|store"}
aif_request_cost_micro_usd_total{model="...", cost_type="chat|embedding"}
aif_cache_hits_total{model="...", cache_type="exact|semantic"}
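As a rough illustration of how these labeled counters can be aggregated, the Python sketch below parses samples in the Prometheus text exposition format and sums one metric by a chosen label. The model names and values in the sample are made up. In Grafana the same aggregation would normally be a PromQL query such as `sum by (cache_type) (aif_net_saved_micro_usd_total)`, not hand parsing.

```python
import re
from collections import defaultdict

# One sample line, e.g.
# aif_net_saved_micro_usd_total{model="model-a",cache_type="exact"} 4200000
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>[^}]*)\}\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def sum_by_label(exposition: str, metric: str, label: str) -> dict[str, float]:
    """Sum every sample of `metric`, grouped by the value of `label`."""
    totals: dict[str, float] = defaultdict(float)
    for line in exposition.splitlines():
        match = SAMPLE_RE.match(line.strip())
        if not match or match.group("name") != metric:
            continue  # skip comments, blank lines, and other metrics
        labels = dict(LABEL_RE.findall(match.group("labels")))
        if label in labels:
            totals[labels[label]] += float(match.group("value"))
    return dict(totals)


sample = """
# TYPE aif_net_saved_micro_usd_total counter
aif_net_saved_micro_usd_total{model="model-a",cache_type="exact"} 4200000
aif_net_saved_micro_usd_total{model="model-a",cache_type="semantic"} 1650000
aif_net_saved_micro_usd_total{model="model-b",cache_type="semantic"} 300000
aif_cache_hits_total{model="model-a",cache_type="exact"} 120
"""

by_type = sum_by_label(sample, "aif_net_saved_micro_usd_total", "cache_type")
by_model = sum_by_label(sample, "aif_net_saved_micro_usd_total", "model")
print(by_type)
print(by_model)
```

Values are in micro-USD, so divide by 1,000,000 to report dollars.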

Grafana dashboards

The updated Overview dashboard now shows:

estimated chat cost
gross savings
embedding overhead
net savings
net savings percentage
savings by model
savings by cache type
exact vs semantic hit rate
cost per upstream request
net saved per cache hit
top models by spend
top models by savings
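Several of these panels are ratios of the counters listed earlier. The Python sketch below shows one plausible way to derive them. The formulas are assumptions, and the dashboard's own queries may define them differently. In particular, the sketch takes the net savings percentage as net savings divided by what the traffic would have cost without the cache, that is, chat cost actually spent plus gross savings. The `overview_ratios` function and its parameter names are hypothetical.

```python
from typing import Optional


def safe_ratio(numerator: float, denominator: float) -> Optional[float]:
    """Return numerator / denominator, or None when there is no data yet."""
    return numerator / denominator if denominator else None


def overview_ratios(chat_cost: int, gross_saved: int, net_saved: int,
                    cache_hits: int, upstream_requests: int) -> dict[str, Optional[float]]:
    """Derive Overview-style ratios from counter totals (all costs in micro-USD)."""
    return {
        "cost_per_upstream_request_micro_usd": safe_ratio(chat_cost, upstream_requests),
        "net_saved_per_cache_hit_micro_usd": safe_ratio(net_saved, cache_hits),
        # Assumed baseline: cost without the cache = spent + gross savings.
        "net_savings_percent": safe_ratio(100 * net_saved, chat_cost + gross_saved),
    }


ratios = overview_ratios(chat_cost=8_000_000, gross_saved=2_000_000,
                         net_saved=1_650_000, cache_hits=150,
                         upstream_requests=400)
print(ratios)
```

Returning `None` rather than dividing by zero mirrors how a panel would show "no data" before any traffic has been recorded.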

The updated Diagnostics dashboard now shows:

embedding overhead over time
gross vs net semantic savings
exact vs semantic savings
semantic cache misses vs passes
semantic net savings by model
potentially low-value semantic models
semantic threshold and expiration behavior
provider and semantic latency diagnostics
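The release notes do not say how "potentially low-value semantic models" are identified. The Python sketch below shows one possible heuristic. It flags a model when its semantic net savings are zero or negative, or when embedding overhead consumes more than a hypothetical `max_overhead_ratio` share of gross savings. Both the rule and the threshold are illustrative assumptions, not the dashboard's actual criteria.

```python
def low_value_semantic_models(gross_by_model: dict[str, int],
                              overhead_by_model: dict[str, int],
                              max_overhead_ratio: float = 0.5) -> list[str]:
    """Return models whose semantic caching looks unprofitable (hypothetical rule)."""
    flagged = []
    for model in sorted(set(gross_by_model) | set(overhead_by_model)):
        gross = gross_by_model.get(model, 0)
        overhead = overhead_by_model.get(model, 0)
        if gross == 0 and overhead == 0:
            continue  # no semantic activity for this model
        net = gross - overhead
        too_costly = gross > 0 and overhead / gross > max_overhead_ratio
        if net <= 0 or too_costly:
            flagged.append(model)
    return flagged


gross = {"model-a": 2_000_000, "model-b": 100_000, "model-c": 500_000}
overhead = {"model-a": 350_000, "model-b": 250_000,
            "model-c": 300_000, "model-d": 40_000}
print(low_value_semantic_models(gross, overhead))
```

Here `model-b` loses money outright, `model-c` spends 60% of its gross savings on embeddings, and `model-d` pays embedding overhead without any recorded semantic savings.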

Documentation

Added an explanation of the cost and savings accounting model.
Clarified the difference between gross savings, embedding overhead, and net savings.
Clarified why exact and semantic cache savings should be evaluated separately.
Updated metrics and dashboard documentation for the new v0.1.8 observability model.