Release Notes v0.1.7

v0.1.7 hardens support for real-world OpenAI-compatible upstream and embedding endpoints.

OpenAI-compatible provider hardening

  • upstream_base_url and embedding_base_url now accept either a provider root URL or its /v1 base path.
  • AI Cost Firewall builds /v1/chat/completions and /v1/embeddings internally.
  • Full endpoint URLs such as http://ollama:11434/v1/chat/completions are rejected with clearer validation errors.
  • Chat upstreams and embedding providers may use different base URLs.
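
The base URL rules above can be illustrated with a small sketch. This is not the AI Cost Firewall implementation: the helper names (normalize_base_url, chat_completions_url, embeddings_url) and the error messages are hypothetical, chosen only to show how a root URL and a /v1 base path could resolve to the same endpoints while a full endpoint URL is rejected.

```python
from urllib.parse import urlsplit

# Hypothetical suffixes that mark a full endpoint URL rather than a base URL.
ENDPOINT_SUFFIXES = ("/chat/completions", "/embeddings")


def normalize_base_url(raw: str) -> str:
    """Return a base URL ending in /v1, or raise ValueError (illustrative only)."""
    parts = urlsplit(raw.strip())
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"invalid base URL: {raw!r}")

    path = parts.path.rstrip("/")
    if path.endswith(ENDPOINT_SUFFIXES):
        # A full endpoint URL was supplied where a base URL is expected.
        raise ValueError(f"expected a base URL, got a full endpoint URL: {raw!r}")

    # Accept either the provider root or its /v1 base path.
    if not path.endswith("/v1"):
        path += "/v1"
    return f"{parts.scheme}://{parts.netloc}{path}"


def chat_completions_url(upstream_base_url: str) -> str:
    return normalize_base_url(upstream_base_url) + "/chat/completions"


def embeddings_url(embedding_base_url: str) -> str:
    return normalize_base_url(embedding_base_url) + "/embeddings"
```

Under these assumptions, http://ollama:11434 and http://ollama:11434/v1 both resolve to the same chat endpoint, and the chat and embedding base URLs are normalized independently, so they can point at different providers.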

Local provider compatibility

The following placeholder API key values do not produce an upstream bearer auth header:

dummy
none
null
-
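
As a rough illustration of the placeholder behavior, the sketch below builds upstream headers only for real keys. The function name upstream_auth_headers is hypothetical, and treating an empty or missing key like a placeholder is an assumption of this example, not something the release notes state.

```python
from typing import Dict, Optional

# Placeholder values listed in these release notes.
PLACEHOLDER_KEYS = {"dummy", "none", "null", "-"}


def upstream_auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Return auth headers for the upstream request (illustrative only)."""
    key = (api_key or "").strip()
    # Assumption: an empty or missing key is treated like a placeholder.
    if not key or key in PLACEHOLDER_KEYS:
        return {}
    return {"Authorization": f"Bearer {key}"}
```

This is mainly useful for local providers that require no authentication but whose client configuration still expects some value in the API key field.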

Response compatibility

  • Missing model fields can be normalized using the requested model.
  • Partial usage fields are handled more safely.
  • Extra provider-specific response fields are tolerated.
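
One way to picture this kind of response normalization is sketched below. It is an assumption-laden example, not the project's code: the function name normalize_response is hypothetical, the usage field names follow the OpenAI chat completions response format, and defaulting missing token counts to zero is a choice made for this example.

```python
from typing import Any, Dict


def normalize_response(body: Dict[str, Any], requested_model: str) -> Dict[str, Any]:
    """Fill in missing fields without dropping unknown ones (illustrative only)."""
    out = dict(body)  # shallow copy keeps extra provider-specific fields

    # Fall back to the requested model when the provider omits the field.
    if not out.get("model"):
        out["model"] = requested_model

    raw_usage = out.get("usage")
    usage = raw_usage if isinstance(raw_usage, dict) else {}

    # Assumption: missing or null counts default to 0.
    prompt = int(usage.get("prompt_tokens") or 0)
    completion = int(usage.get("completion_tokens") or 0)
    total = int(usage.get("total_tokens") or (prompt + completion))

    out["usage"] = {
        **usage,
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": total,
    }
    return out
```

For example, a response that carries only prompt_tokens and no model would come back with the requested model filled in and a total_tokens value derived from the counts that are present.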

Diagnostics

Improved diagnostics for:

  • invalid base URLs
  • unsupported endpoint paths
  • authentication failures
  • rate limits
  • upstream timeouts
  • embedding timeouts
  • DNS/connect errors
  • TLS/certificate failures

New metrics

aif_embedding_request_duration_seconds
aif_embedding_timeouts_total

Documentation

  • Added OpenAI-compatible provider documentation.
  • Added examples for OpenAI, Ollama, LM Studio, vLLM, LiteLLM, and OpenRouter-style setups.
  • Updated Grafana dashboards for provider and embedding diagnostics.