AI Cost Firewall v0.1.9 — Pilot Deployment Polish
v0.1.9 focuses on improving the operator and evaluator experience before the v0.2.0 milestone.
This release avoids heavy internal architectural changes and instead concentrates on deployment clarity, onboarding flow, observability guidance, provider compatibility documentation, and runnable deployment examples.
The goal of v0.1.9 is to make AI Cost Firewall significantly easier to:
- install
- evaluate
- deploy
- troubleshoot
- observe
- understand
especially for pilot users and infrastructure evaluators.
Release Theme
Pilot Deployment Polish
AI Cost Firewall is becoming easier for operators and pilot customers to install, test, and understand.
Highlights
Runnable Deployment Examples
v0.1.9 introduces structured deployment examples under:
deploy/examples/
Included deployment patterns:
| Example | Description |
|---|---|
openai-cloud/ | Fastest cloud evaluation path |
local-ollama/ | Fully local Ollama deployment |
hybrid-openai-local-embeddings/ | OpenAI chat + local embeddings |
openrouter/ | OpenRouter upstream + OpenAI embeddings |
local-full-stack/ | Fully local stack with dashboards |
Each example includes:
docker-compose.yml- minimal firewall configuration
- example requests
- expected behavior
- expected metrics
- optional observability overlays
Improved Observability Guidance
Deployment examples now clearly document:
- Prometheus integration
- Grafana dashboards
- metrics endpoints
- semantic diagnostics
- expected cache metrics
- dashboard provisioning behavior
Examples now distinguish between:
- built-in observability stacks
- optional observability overlays
Improved README Structure
The README has been reorganized around:
- deployment-first onboarding
- deployment patterns
- operational visibility
- observability
- provider compatibility
- quick evaluation flow
The updated structure is designed to improve:
- GitHub onboarding
- Docker Hub readability
- evaluator experience
- deployment clarity
Expanded Documentation
Documentation was significantly reorganized and expanded.
New or improved documents include:
| Document | Purpose |
|---|---|
provider-compatibility.md | OpenAI-compatible provider guidance |
troubleshooting.md | Operational troubleshooting |
quickstart.md | Deployment-first onboarding |
operation.md | Runtime behavior and lifecycle |
how-it-works.md | Request lifecycle and cache flow |
architecture.md | System architecture overview |
config-reference.md | Directive reference |
faq.md | Expanded operational FAQ |
OpenAI-Compatible Provider Improvements
v0.1.9 further clarifies compatibility with OpenAI-compatible providers including:
- OpenAI
- Ollama
- LM Studio
- vLLM
- LiteLLM
- OpenRouter
The release emphasizes:
- flat provider configuration
- provider portability
- separate chat and embedding endpoints
- OpenAI-compatible deployment patterns
without introducing provider-specific configuration blocks.
Deployment Improvements
Deployment examples now better demonstrate:
- OpenAI cloud deployments
- fully local deployments
- hybrid deployments
- local embedding patterns
- OpenRouter deployments
- observability-enabled deployments
This significantly improves evaluation speed for new operators.
Dashboard & Metrics Improvements
Documentation and deployment examples now better explain:
- exact cache metrics
- semantic cache metrics
- semantic threshold diagnostics
- embedding overhead
- gross savings
- net savings
- semantic lookup latency
Dashboard provisioning guidance has also been improved.
Operational Improvements
Documentation now more clearly explains:
- readiness vs liveness
- graceful shutdown
- runtime dependency validation
- hot reload behavior
- semantic cache lifecycle
- vector-size validation
- runtime fail-open behavior
- logging behavior
Troubleshooting Improvements
v0.1.9 significantly expands troubleshooting guidance for:
- wrong provider base URLs
- Qdrant vector mismatch
- TLS certificate issues
- empty dashboards
- provider connectivity
- semantic cache misses
- embedding failures
- Docker Compose path problems
Included Dashboards
AI Cost Firewall continues to include:
Overview Dashboard
Shows:
- request traffic
- cache hit rates
- gross savings
- embedding overhead
- net savings
Diagnostics Dashboard
Shows:
- semantic threshold pass/fail behavior
- semantic lookup latency
- semantic candidate activity
- runtime cache diagnostics
Example Provider Patterns
Typical supported deployment combinations:
| Chat Provider | Embedding Provider |
|---|---|
| OpenAI | OpenAI |
| OpenAI | Ollama |
| OpenRouter | OpenAI |
| Ollama | Ollama |
| LiteLLM | Ollama |
Upgrade Notes
No major configuration migration is required from v0.1.8.
Recommended actions:
- review new deployment examples
- update documentation references
- verify dashboard provisioning paths
- review provider compatibility guidance
- verify vector-size configuration for embeddings
Recommended Starting Points
Fastest Cloud Evaluation
deploy/examples/openai-cloud/
Fully Local Evaluation
deploy/examples/local-full-stack/
Hybrid Deployment Evaluation
deploy/examples/hybrid-openai-local-embeddings/
Compatibility Notes
v0.1.9 continues to focus on:
OpenAI-compatible APIs only
Provider-specific configuration blocks remain intentionally deferred until after v0.2.0.
Looking Ahead to v0.2.0
The v0.2.0 roadmap continues focusing on:
- gateway maturity
- operational reliability
- deployment flexibility
- observability
- provider interoperability
while remaining within the OpenAI-compatible ecosystem.
Full Source Code
GitHub:
https://github.com/vcal-project/ai-firewall
Documentation:
https://ai-firewall.docs.vcal-project.com/