Quick Start with Docker
Docker Compose is the fastest way to run AI Cost Firewall.
The Compose stack includes AI Cost Firewall, Redis, Qdrant, Prometheus, and Grafana.
Prerequisites
docker --version
docker compose version
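If you want to script the prerequisite check, a minimal sketch could look like the following. The helper name `require_cmds` is hypothetical and not part of AI Cost Firewall; it only tests whether the named commands are on `PATH`.

```shell
# require_cmds CMD [CMD...]
# Hypothetical helper: returns 0 if every named command is on PATH,
# otherwise prints the missing ones to stderr and returns 1.
require_cmds() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_cmds docker curl && docker compose version` stops early if Docker or curl is not installed, and otherwise confirms that the Compose plugin responds.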
Choose a deployment pattern
For v0.2.0, the recommended starting point remains deploy/examples/.
| Pattern | Use case |
|---|---|
| openai-cloud/ | Fastest cloud evaluation |
| local-ollama/ | Local Ollama chat + embeddings |
| hybrid-openai-local-embeddings/ | OpenAI chat + local embeddings |
| openrouter/ | OpenRouter upstream + OpenAI embeddings |
| local-full-stack/ | Full local stack with dashboards |
Example, run from the repository root after cloning (see Clone and configure):
cd deploy/examples/openai-cloud
docker compose up -d
Clone and configure
git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
cp configs/ai-firewall.conf.example configs/ai-firewall.conf
nano configs/ai-firewall.conf
OpenAI-compatible examples are available under configs/examples/ for OpenAI, Ollama, LM Studio, vLLM, LiteLLM, and OpenRouter-style setups. v0.2.0 keeps a flat configuration model and does not add provider-specific configuration blocks.
Set the upstream provider, its API key (or a placeholder), the embedding provider if the semantic cache is enabled, and the exact price for each model:
model_price gpt-4o-mini-2024-07-18 0.15 0.60;
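To sanity-check the numbers you enter, the sketch below estimates the cost of a single request. It assumes the two values after the model name are USD per 1,000,000 input tokens and USD per 1,000,000 output tokens; the unit is not stated here, so confirm it against the configuration reference. The function name `estimate_cost` is hypothetical.

```shell
# estimate_cost PROMPT_TOKENS COMPLETION_TOKENS INPUT_PRICE OUTPUT_PRICE
# Assumption: both prices are USD per 1,000,000 tokens.
# Prints the estimated request cost in USD with six decimal places.
estimate_cost() {
  awk -v p="$1" -v c="$2" -v ip="$3" -v op="$4" \
    'BEGIN { printf "%.6f\n", (p * ip + c * op) / 1000000 }'
}
```

Under that assumption, `estimate_cost 1000 500 0.15 0.60` prints `0.000450`: 1,000 input tokens cost $0.000150 and 500 output tokens cost $0.000300.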
For local providers without authentication, use placeholder keys:
upstream_api_key dummy;
embedding_api_key dummy;
Start the stack
docker compose pull
docker compose up -d
Check services
docker compose ps
docker compose logs -f firewall
| Service | URL |
|---|---|
| Firewall API | http://localhost:8080 |
| Prometheus | http://localhost:9090 |
| Grafana | http://localhost:3000 |
Health, readiness, and version
curl -i http://localhost:8080/healthz
curl -i http://localhost:8080/readyz
curl -s http://localhost:8080/version
Expected healthy result for /healthz and /readyz:
HTTP/1.1 200 OK
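Right after `docker compose up -d` the endpoints may not answer yet, so scripts usually poll instead of checking once. A minimal retry sketch is shown below; `wait_until` is a hypothetical helper, and the attempt count and delay are arbitrary values you should tune.

```shell
# wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or ATTEMPTS is exhausted.
# Returns 0 on the first success, 1 if every attempt failed.
wait_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Pause between attempts, but not after the last one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}
```

For example, `wait_until 30 2 curl -fsS -o /dev/null http://localhost:8080/readyz` polls the readiness endpoint for up to about a minute. The `-f` flag makes curl exit non-zero on HTTP error statuses, so any response other than success counts as "not ready".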
The /version response reports the running release and the compatibility model, for example:
{
  "version": "0.2.0",
  "release_title": "Pilot-Ready OpenAI-Compatible LLM Gateway",
  "supported_api_style": "openai_compatible",
  "provider_specific_config_blocks": false
}
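If you want to assert the running version in a script, the sketch below pulls one string field out of the response with `sed`. This is a deliberately simple text match, not a JSON parser: it assumes the field value is a plain string without escaped quotes, as in the example output. The helper name `json_string_field` is hypothetical; if `jq` is installed, `jq -r .version` is the more robust choice.

```shell
# json_string_field NAME
# Hypothetical helper: reads JSON on stdin and prints the first string value
# found for the key NAME. Text-based match only; does not handle escaped quotes.
json_string_field() {
  sed -n 's/.*"'"$1"'"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, `curl -s http://localhost:8080/version | json_string_field version` prints `0.2.0` for the response shown above.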
Validate configuration
--test-config performs static validation only.
docker compose run --rm firewall \
  --config /configs/ai-firewall.conf \
  --test-config
Expected output:
configuration OK
Static validation does not connect to Redis, Qdrant, embedding providers, or upstream LLM providers, so a passing check does not confirm that those services are reachable.
Print masked configuration
docker compose run --rm firewall \
  --config /configs/ai-firewall.conf \
  --print-config
Send a test request
curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'
View metrics
curl http://localhost:8080/metrics
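The raw output is easier to scan once the comment lines are removed. The sketch below assumes the endpoint serves the standard Prometheus text exposition format, where `# HELP` and `# TYPE` lines start with `#`. The helper name `metric_samples` is hypothetical, and no specific metric names are implied; check the actual output for the names this release exposes.

```shell
# metric_samples [PREFIX]
# Hypothetical helper: reads Prometheus text format on stdin and prints only
# sample lines, optionally limited to those starting with PREFIX.
metric_samples() {
  awk -v prefix="${1:-}" '
    /^#/ || /^[ \t]*$/ { next }                    # skip comments and blank lines
    prefix == "" || index($0, prefix) == 1 { print }  # keep matching samples
  '
}
```

For example, `curl -s http://localhost:8080/metrics | metric_samples` lists every sample, and passing a prefix as the first argument narrows the list to one metric family.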
The root Docker Compose stack includes Prometheus and Grafana. Most deployment examples provide an optional docker-compose.observability.yml overlay. local-full-stack/ includes observability directly.