
AI Cost Firewall v0.1.9 — Pilot Deployment Polish

v0.1.9 focuses on improving the operator and evaluator experience before the v0.2.0 milestone.

This release avoids heavy internal architectural changes and instead concentrates on deployment clarity, onboarding flow, observability guidance, provider compatibility documentation, and runnable deployment examples.

The goal of v0.1.9 is to make AI Cost Firewall significantly easier to:

  • install
  • evaluate
  • deploy
  • troubleshoot
  • observe
  • understand

especially for pilot users and infrastructure evaluators.


Release Theme

Pilot Deployment Polish

AI Cost Firewall is becoming easier for operators and pilot customers to install, test, and understand.


Highlights

Runnable Deployment Examples

v0.1.9 introduces structured deployment examples under:

deploy/examples/

Included deployment patterns:

Example                            Description
openai-cloud/                      Fastest cloud evaluation path
local-ollama/                      Fully local Ollama deployment
hybrid-openai-local-embeddings/    OpenAI chat + local embeddings
openrouter/                        OpenRouter upstream + OpenAI embeddings
local-full-stack/                  Fully local stack with dashboards

Each example includes:

  • docker-compose.yml
  • minimal firewall configuration
  • example requests
  • expected behavior
  • expected metrics
  • optional observability overlays

Improved Observability Guidance

Deployment examples now clearly document:

  • Prometheus integration
  • Grafana dashboards
  • metrics endpoints
  • semantic diagnostics
  • expected cache metrics
  • dashboard provisioning behavior

Examples now distinguish between:

  • built-in observability stacks
  • optional observability overlays

Improved README Structure

The README has been reorganized around:

  • deployment-first onboarding
  • deployment patterns
  • operational visibility
  • observability
  • provider compatibility
  • quick evaluation flow

The updated structure is designed to improve:

  • GitHub onboarding
  • Docker Hub readability
  • evaluator experience
  • deployment clarity

Expanded Documentation

Documentation was significantly reorganized and expanded.

New or improved documents include:

Document                     Purpose
provider-compatibility.md    OpenAI-compatible provider guidance
troubleshooting.md           Operational troubleshooting
quickstart.md                Deployment-first onboarding
operation.md                 Runtime behavior and lifecycle
how-it-works.md              Request lifecycle and cache flow
architecture.md              System architecture overview
config-reference.md          Directive reference
faq.md                       Expanded operational FAQ

OpenAI-Compatible Provider Improvements

v0.1.9 further clarifies compatibility with OpenAI-compatible providers including:

  • OpenAI
  • Ollama
  • LM Studio
  • vLLM
  • LiteLLM
  • OpenRouter

The release emphasizes:

  • flat provider configuration
  • provider portability
  • separate chat and embedding endpoints
  • OpenAI-compatible deployment patterns

without introducing provider-specific configuration blocks.
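
As an illustration of the flat layout, the sketch below models a configuration as a single record with two independent base URLs, one for chat and one for embeddings. The `FlatProviderConfig` class and its field names are hypothetical and are not the firewall's actual directives (those are listed in config-reference.md). The base URLs shown are the standard OpenAI endpoint and the OpenAI-compatible endpoint of a default local Ollama install.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FlatProviderConfig:
    """Hypothetical flat config: two base URLs, no provider-specific blocks."""

    chat_base_url: str
    embedding_base_url: str

    def chat_url(self) -> str:
        # OpenAI-compatible chat completions route
        return self.chat_base_url.rstrip("/") + "/chat/completions"

    def embeddings_url(self) -> str:
        # OpenAI-compatible embeddings route
        return self.embedding_base_url.rstrip("/") + "/embeddings"


# Hybrid pattern: OpenAI for chat, local Ollama for embeddings.
hybrid = FlatProviderConfig(
    chat_base_url="https://api.openai.com/v1",
    embedding_base_url="http://localhost:11434/v1",
)

print(hybrid.chat_url())
print(hybrid.embeddings_url())
```

Because both fields are plain URLs, switching providers in this model means changing a value rather than restructuring the configuration, which is the portability property the flat approach aims for.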


Deployment Improvements

Deployment examples now better demonstrate:

  • OpenAI cloud deployments
  • fully local deployments
  • hybrid deployments
  • local embedding patterns
  • OpenRouter deployments
  • observability-enabled deployments

Together, these examples are intended to shorten the time a new operator needs to reach a working evaluation setup.


Dashboard & Metrics Improvements

Documentation and deployment examples now better explain:

  • exact cache metrics
  • semantic cache metrics
  • semantic threshold diagnostics
  • embedding overhead
  • gross savings
  • net savings
  • semantic lookup latency

Dashboard provisioning guidance has also been improved.
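
To make the relationship between the savings metrics concrete, here is a minimal arithmetic sketch. It assumes that net savings is gross savings minus embedding overhead; that relationship is inferred from the metric names, and the function name and dollar figures are hypothetical, not taken from the firewall's code or dashboards.

```python
def net_savings(gross_savings_usd: float, embedding_overhead_usd: float) -> float:
    """Assumed relationship: net = gross savings - embedding overhead."""
    return gross_savings_usd - embedding_overhead_usd


# Illustrative figures only.
gross = 12.50     # upstream cost avoided through cache hits
overhead = 0.75   # cost of the embedding calls used for semantic lookups
net = net_savings(gross, overhead)

print(f"gross={gross:.2f} overhead={overhead:.2f} net={net:.2f}")
```

Under this assumption, net savings can be negative when embedding costs exceed what the cache avoids, which is why the two figures are worth reading together.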


Operational Improvements

Documentation now more clearly explains:

  • readiness vs liveness
  • graceful shutdown
  • runtime dependency validation
  • hot reload behavior
  • semantic cache lifecycle
  • vector-size validation
  • runtime fail-open behavior
  • logging behavior
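
The fail-open item above can be sketched as follows. This is a generic illustration of the pattern, assuming that "fail-open" means a cache failure falls through to the upstream provider instead of failing the request. The function names are hypothetical and do not come from the firewall's code; operation.md describes the actual behavior.

```python
from typing import Callable, Optional


def answer_with_fail_open(
    prompt: str,
    cache_lookup: Callable[[str], Optional[str]],
    call_upstream: Callable[[str], str],
) -> str:
    """Serve from cache when possible; on any cache error, forward upstream."""
    try:
        cached = cache_lookup(prompt)
    except Exception:
        # Fail open: a broken cache must not block the request.
        cached = None
    if cached is not None:
        return cached
    return call_upstream(prompt)


def broken_cache(prompt: str) -> Optional[str]:
    # Simulates an unreachable cache backend.
    raise ConnectionError("cache backend unreachable")


def fake_upstream(prompt: str) -> str:
    # Stand-in for a real provider call.
    return f"upstream:{prompt}"


print(answer_with_fail_open("hi", broken_cache, fake_upstream))
```

The trade-off is that a cache outage costs money (every request goes upstream) rather than availability.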

Troubleshooting Improvements

v0.1.9 significantly expands troubleshooting guidance for:

  • wrong provider base URLs
  • Qdrant vector mismatch
  • TLS certificate issues
  • empty dashboards
  • provider connectivity
  • semantic cache misses
  • embedding failures
  • Docker Compose path problems

Included Dashboards

AI Cost Firewall continues to include two dashboards:

Overview Dashboard

Shows:

  • request traffic
  • cache hit rates
  • gross savings
  • embedding overhead
  • net savings

Diagnostics Dashboard

Shows:

  • semantic threshold pass/fail behavior
  • semantic lookup latency
  • semantic candidate activity
  • runtime cache diagnostics
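
For readers unfamiliar with semantic threshold checks, the sketch below shows one common form: a candidate counts as a hit when its similarity to the query embedding meets a configured threshold. Cosine similarity and the `>=` comparison are assumptions made for illustration; the release notes do not specify the metric the firewall uses, and all names here are hypothetical.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def passes_threshold(
    query: Sequence[float], candidate: Sequence[float], threshold: float
) -> bool:
    """Pass when the candidate is at least `threshold` similar to the query."""
    return cosine_similarity(query, candidate) >= threshold


print(passes_threshold([1.0, 0.0], [1.0, 0.0], 0.9))  # identical vectors
print(passes_threshold([1.0, 0.0], [0.0, 1.0], 0.9))  # orthogonal vectors
```

A pass/fail panel then reduces to counting how often this check succeeds or fails for the candidates returned by the vector search.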

Example Provider Patterns

Typical supported deployment combinations:

Chat Provider    Embedding Provider
OpenAI           OpenAI
OpenAI           Ollama
OpenRouter       OpenAI
Ollama           Ollama
LiteLLM          Ollama

Upgrade Notes

No major configuration migration is required from v0.1.8.

Recommended actions:

  • review new deployment examples
  • update documentation references
  • verify dashboard provisioning paths
  • review provider compatibility guidance
  • verify vector-size configuration for embeddings
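
The vector-size check in the last item can be illustrated with a small sketch: compare the length of an embedding returned by the provider with the size the vector store expects. The function name is hypothetical and this is not the firewall's own validation code. The sizes used are the default dimensions of OpenAI's text-embedding-3-small (1536) and the nomic-embed-text model commonly run under Ollama (768).

```python
from typing import Sequence


def validate_vector_size(embedding: Sequence[float], expected_size: int) -> None:
    """Raise ValueError when the embedding length differs from the configured size."""
    actual = len(embedding)
    if actual != expected_size:
        raise ValueError(
            f"vector size mismatch: embedding has {actual} dimensions, "
            f"configuration expects {expected_size}"
        )


# A collection sized for a 1536-dimension model receives a 768-dimension vector.
configured_size = 1536
local_embedding = [0.0] * 768

try:
    validate_vector_size(local_embedding, configured_size)
except ValueError as err:
    print(err)
```

This kind of mismatch typically appears after switching embedding providers without recreating or resizing the vector collection.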

Recommended Starting Points

Fastest Cloud Evaluation

deploy/examples/openai-cloud/

Fully Local Evaluation

deploy/examples/local-full-stack/

Hybrid Deployment Evaluation

deploy/examples/hybrid-openai-local-embeddings/

Compatibility Notes

v0.1.9 continues to focus on OpenAI-compatible APIs only.

Provider-specific configuration blocks remain intentionally deferred until after v0.2.0.


Looking Ahead to v0.2.0

The v0.2.0 roadmap continues to focus on:

  • gateway maturity
  • operational reliability
  • deployment flexibility
  • observability
  • provider interoperability

while remaining within the OpenAI-compatible ecosystem.


Full Source Code

GitHub:

https://github.com/vcal-project/ai-firewall

Documentation:

https://ai-firewall.docs.vcal-project.com/