AI Cost Firewall v0.1.9 — Pilot Deployment Polish

v0.1.9 focuses on improving the operator and evaluator experience before the v0.2.0 milestone.

This release avoids heavy internal architectural changes and instead concentrates on deployment clarity, onboarding flow, observability guidance, provider compatibility documentation, and runnable deployment examples.

The goal of v0.1.9 is to make AI Cost Firewall significantly easier to:

install
evaluate
deploy
troubleshoot
observe
understand

especially for pilot users and infrastructure evaluators.

Release Theme

Pilot Deployment Polish

AI Cost Firewall is becoming easier for operators and pilot customers to install, test, and understand.

Highlights

Runnable Deployment Examples

v0.1.9 introduces structured deployment examples under:

deploy/examples/

Included deployment patterns:

Example	Description
`openai-cloud/`	Fastest cloud evaluation path
`local-ollama/`	Fully local Ollama deployment
`hybrid-openai-local-embeddings/`	OpenAI chat + local embeddings
`openrouter/`	OpenRouter upstream + OpenAI embeddings
`local-full-stack/`	Fully local stack with dashboards

Each example includes:

docker-compose.yml
minimal firewall configuration
example requests
expected behavior
expected metrics
optional observability overlays

Improved Observability Guidance

Deployment examples now clearly document:

Prometheus integration
Grafana dashboards
metrics endpoints
semantic diagnostics
expected cache metrics
dashboard provisioning behavior

Examples now distinguish between:

built-in observability stacks
optional observability overlays

Improved README Structure

The README has been reorganized around:

deployment-first onboarding
deployment patterns
operational visibility
observability
provider compatibility
quick evaluation flow

The updated structure is designed to improve:

GitHub onboarding
Docker Hub readability
evaluator experience
deployment clarity

Expanded Documentation

Documentation was significantly reorganized and expanded.

New or improved documents include:

Document	Purpose
`provider-compatibility.md`	OpenAI-compatible provider guidance
`troubleshooting.md`	Operational troubleshooting
`quickstart.md`	Deployment-first onboarding
`operation.md`	Runtime behavior and lifecycle
`how-it-works.md`	Request lifecycle and cache flow
`architecture.md`	System architecture overview
`config-reference.md`	Directive reference
`faq.md`	Expanded operational FAQ

OpenAI-Compatible Provider Improvements

v0.1.9 further clarifies compatibility with OpenAI-compatible providers including:

OpenAI
Ollama
LM Studio
vLLM
LiteLLM
OpenRouter

The release emphasizes:

flat provider configuration
provider portability
separate chat and embedding endpoints
OpenAI-compatible deployment patterns

without introducing provider-specific configuration blocks.

Deployment Improvements

Deployment examples now better demonstrate:

OpenAI cloud deployments
fully local deployments
hybrid deployments
local embedding patterns
OpenRouter deployments
observability-enabled deployments

This significantly improves evaluation speed for new operators.

Dashboard & Metrics Improvements

Documentation and deployment examples now better explain:

exact cache metrics
semantic cache metrics
semantic threshold diagnostics
embedding overhead
gross savings
net savings
semantic lookup latency

Dashboard provisioning guidance has also been improved.

Operational Improvements

Documentation now more clearly explains:

readiness vs liveness
graceful shutdown
runtime dependency validation
hot reload behavior
semantic cache lifecycle
vector-size validation
runtime fail-open behavior
logging behavior

Troubleshooting Improvements

v0.1.9 significantly expands troubleshooting guidance for:

wrong provider base URLs
Qdrant vector mismatch
TLS certificate issues
empty dashboards
provider connectivity
semantic cache misses
embedding failures
Docker Compose path problems

Included Dashboards

AI Cost Firewall continues to include:

Overview Dashboard

Shows:

request traffic
cache hit rates
gross savings
embedding overhead
net savings

Diagnostics Dashboard

Shows:

semantic threshold pass/fail behavior
semantic lookup latency
semantic candidate activity
runtime cache diagnostics

Example Provider Patterns

Typical supported deployment combinations:

Chat Provider	Embedding Provider
OpenAI	OpenAI
OpenAI	Ollama
OpenRouter	OpenAI
Ollama	Ollama
LiteLLM	Ollama

Upgrade Notes

No major configuration migration is required from v0.1.8.

Recommended actions:

review new deployment examples
update documentation references
verify dashboard provisioning paths
review provider compatibility guidance
verify vector-size configuration for embeddings

Recommended Starting Points

Fastest Cloud Evaluation

deploy/examples/openai-cloud/

Fully Local Evaluation

deploy/examples/local-full-stack/

Hybrid Deployment Evaluation

deploy/examples/hybrid-openai-local-embeddings/

Compatibility Notes

v0.1.9 continues to focus on:

OpenAI-compatible APIs only

Provider-specific configuration blocks remain intentionally deferred until after v0.2.0.

Looking Ahead to v0.2.0

The v0.2.0 roadmap continues focusing on:

gateway maturity
operational reliability
deployment flexibility
observability
provider interoperability

while remaining within the OpenAI-compatible ecosystem.

Full Source Code

GitHub:

https://github.com/vcal-project/ai-firewall

Documentation:

https://ai-firewall.docs.vcal-project.com/