
Release Notes v0.1.8

v0.1.8 adds cost and savings intelligence to make the financial impact of AI Cost Firewall easier to measure, explain, and demonstrate.

Cost and savings intelligence

  • Added per-model request cost visibility.
  • Added per-model input and output token metrics.
  • Added structured gross savings metrics by model and cache type.
  • Added structured net savings metrics by model and cache type.
  • Added embedding overhead metrics for semantic lookup and store operations.
  • Added request-related cost metrics split by chat and embedding cost type.

Exact vs semantic savings

Exact and semantic cache savings are now easier to evaluate separately:

  • Exact cache savings represent avoided upstream chat-completion cost.
  • Semantic cache net savings represent avoided upstream chat-completion cost minus embedding overhead.
  • Gross savings, embedding overhead, and net savings are exposed separately.
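The accounting rules above can be sketched in a few lines of Python. This is a minimal illustration, not AI Cost Firewall's implementation. The event kinds (exact_hit, semantic_hit, semantic_lookup, semantic_store) and the summarize function are hypothetical. The sketch also assumes two things: every semantic lookup pays embedding cost whether it hits or misses, and exact hits pay no embedding cost.

```python
from collections import defaultdict
from typing import Iterable, Tuple

# Hypothetical event kinds for this sketch, NOT AI Cost Firewall's API.
# Each event is (kind, micro_usd):
#   "exact_hit"       -> upstream chat cost avoided by an exact-cache hit
#   "semantic_hit"    -> upstream chat cost avoided by a semantic-cache hit
#   "semantic_lookup" -> embedding cost of a semantic lookup (hit or miss)
#   "semantic_store"  -> embedding cost of storing a new semantic entry
Event = Tuple[str, int]


def summarize(events: Iterable[Event]) -> dict:
    """Return gross savings, embedding overhead, and net savings per cache type."""
    gross = defaultdict(int)
    overhead = defaultdict(int)
    for kind, micro_usd in events:
        if kind == "exact_hit":
            gross["exact"] += micro_usd
        elif kind == "semantic_hit":
            gross["semantic"] += micro_usd
        elif kind in ("semantic_lookup", "semantic_store"):
            # Assumption: embedding overhead is charged to the semantic cache only.
            overhead["semantic"] += micro_usd
        else:
            raise ValueError(f"unknown event kind: {kind}")

    summary = {}
    for cache_type in ("exact", "semantic"):
        g, o = gross[cache_type], overhead[cache_type]
        summary[cache_type] = {"gross": g, "overhead": o, "net": g - o}
    return summary


events = [
    ("exact_hit", 1200),
    ("semantic_lookup", 20),  # miss: overhead paid, nothing saved
    ("semantic_lookup", 20),  # hit
    ("semantic_hit", 900),
    ("semantic_store", 20),
]
print(summarize(events))
```

Because lookup overhead accrues on misses too, a semantic cache with a low hit rate can show positive gross savings but small or even negative net savings, which is why the two cache types are reported separately.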

New metrics

aif_model_cost_micro_usd_total{model="..."}
aif_model_requests_total{model="..."}
aif_model_input_tokens_total{model="..."}
aif_model_output_tokens_total{model="..."}
aif_gross_saved_micro_usd_total{model="...", cache_type="exact|semantic"}
aif_net_saved_micro_usd_total{model="...", cache_type="exact|semantic"}
aif_embedding_overhead_micro_usd_total{model="...", operation="lookup|store"}
aif_request_cost_micro_usd_total{model="...", cost_type="chat|embedding"}
aif_cache_hits_total{model="...", cache_type="exact|semantic"}
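The money metrics above are read here as counters in millionths of a dollar, as the micro_usd suffix suggests (1 USD = 1,000,000 micro-USD). The sketch below assumes the standard Prometheus text exposition format. It parses a scrape, sums one counter by a single label, and converts the result to dollars. The sample values and model names are invented for illustration. The parser is deliberately simplified and does not handle escaped quotes or "}" inside label values.

```python
import re
from collections import defaultdict

# One sample line: metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')

MICRO_USD_PER_USD = 1_000_000


def parse_samples(text):
    """Yield (name, labels, value) for each sample line, skipping comments."""
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        yield match.group("name"), labels, float(match.group("value"))


def sum_usd_by(text, metric, label):
    """Sum a *_micro_usd_total counter by one label and convert it to USD."""
    totals = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name == metric:
            totals[labels.get(label, "")] += value
    return {key: micro / MICRO_USD_PER_USD for key, micro in totals.items()}


# Invented sample scrape for illustration only.
SAMPLE = """\
# TYPE aif_net_saved_micro_usd_total counter
aif_net_saved_micro_usd_total{model="model-a",cache_type="exact"} 1500000
aif_net_saved_micro_usd_total{model="model-a",cache_type="semantic"} 250000
aif_net_saved_micro_usd_total{model="model-b",cache_type="semantic"} 50000
"""

print(sum_usd_by(SAMPLE, "aif_net_saved_micro_usd_total", "cache_type"))
# -> {'exact': 1.5, 'semantic': 0.3}
```

These are cumulative totals since the counters were last reset. A dashboard would normally apply increase() or rate() over a time range to the same series instead.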

Grafana dashboards

The updated Overview dashboard now shows:

  • estimated chat cost
  • gross savings
  • embedding overhead
  • net savings
  • net savings percentage
  • savings by model
  • savings by cache type
  • exact vs semantic hit rate
  • cost per upstream request
  • net saved per cache hit
  • top models by spend
  • top models by savings
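Several Overview panels are ratios of the counters listed under New metrics. The following sketch shows one plausible way to derive them. The formulas are assumptions, not the dashboard's actual queries. In particular, the sketch assumes that aif_model_requests_total counts every request including cache hits, so upstream requests are total requests minus cache hits. It also treats the no-cache baseline as actual chat spend plus gross savings. The Totals type and overview_ratios function are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Totals:
    # All money values are in micro-USD (1 USD = 1_000_000 micro-USD).
    chat_cost: float           # actual upstream chat spend
    gross_saved: float         # avoided chat cost across all cache hits
    embedding_overhead: float  # semantic lookup + store embedding cost
    requests: float            # all requests (assumed to include cache hits)
    cache_hits: float          # exact + semantic hits


def _safe_div(num: float, den: float) -> float:
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def overview_ratios(t: Totals) -> dict:
    net_saved = t.gross_saved - t.embedding_overhead
    # Assumed baseline: what chat would have cost with no cache at all.
    baseline = t.chat_cost + t.gross_saved
    # Assumption: requests served from cache never reached the provider.
    upstream_requests = t.requests - t.cache_hits
    return {
        "net_saved": net_saved,
        "net_savings_pct": _safe_div(100 * net_saved, baseline),
        "cost_per_upstream_request": _safe_div(t.chat_cost, upstream_requests),
        "net_saved_per_cache_hit": _safe_div(net_saved, t.cache_hits),
    }


print(overview_ratios(Totals(
    chat_cost=3_000_000,
    gross_saved=1_000_000,
    embedding_overhead=100_000,
    requests=1000,
    cache_hits=200,
)))
```

With these invented numbers, net savings are 900,000 micro-USD (0.90 USD), or 22.5% of the assumed 4.00 USD no-cache baseline.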

The updated Diagnostics dashboard now shows:

  • embedding overhead over time
  • gross vs net semantic savings
  • exact vs semantic savings
  • semantic cache misses vs passes
  • semantic net savings by model
  • potentially low-value semantic models
  • semantic threshold and expiration behavior
  • provider and semantic latency diagnostics
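The "potentially low-value semantic models" panel suggests a per-model check like the following hypothetical sketch. It takes two per-model inputs. The first is semantic gross savings, for example from aif_gross_saved_micro_usd_total with cache_type="semantic". The second is embedding overhead, for example aif_embedding_overhead_micro_usd_total summed over operation. It flags models whose net savings are negative or below an arbitrary 20% share of gross. The function name and threshold are illustrative, not part of the product.

```python
def low_value_semantic_models(gross_by_model, overhead_by_model, min_net_ratio=0.2):
    """Return (model, net_micro_usd) for models whose semantic cache barely pays off.

    A model is flagged when net savings (gross minus embedding overhead) are
    zero or negative, or when net savings are below min_net_ratio of gross.
    Results are sorted from worst (lowest net) to best.
    """
    flagged = []
    for model in sorted(set(gross_by_model) | set(overhead_by_model)):
        gross = gross_by_model.get(model, 0)
        overhead = overhead_by_model.get(model, 0)
        net = gross - overhead
        # If net > 0 then gross > 0 (overhead is non-negative), so the
        # division below cannot divide by zero.
        if net <= 0 or net / gross < min_net_ratio:
            flagged.append((model, net))
    return sorted(flagged, key=lambda item: item[1])


# Invented per-model totals in micro-USD.
gross = {"model-a": 500_000, "model-b": 100_000, "model-c": 0}
overhead = {"model-a": 50_000, "model-b": 90_000, "model-c": 30_000}
print(low_value_semantic_models(gross, overhead))
# -> [('model-c', -30000), ('model-b', 10000)]
```

Here model-a keeps 90% of its gross savings and is not flagged. model-b keeps only 10%, and model-c pays for lookups without ever saving anything.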

Documentation

  • Added an explanation of the cost and savings accounting model.
  • Clarified the difference between gross savings, embedding overhead, and net savings.
  • Clarified why exact and semantic cache savings should be evaluated separately.
  • Updated metrics and dashboard documentation for the new v0.1.8 observability model.