Quick Start with Docker

Docker Compose is the fastest way to run AI Cost Firewall.

The Compose stack includes AI Cost Firewall, Redis, Qdrant, Prometheus, and Grafana.

Prerequisites

Docker with the Compose plugin must be installed. Verify both with:

docker --version
docker compose version

Choose a deployment pattern

For v0.2.0, the recommended starting point remains deploy/examples/.

Pattern                          Use case
openai-cloud/                    Fastest cloud evaluation
local-ollama/                    Local Ollama chat + embeddings
hybrid-openai-local-embeddings/  OpenAI chat + local embeddings
openrouter/                      OpenRouter upstream + OpenAI embeddings
local-full-stack/                Full local stack with dashboards

Example, run from the repository root:

cd deploy/examples/openai-cloud
docker compose up -d

Clone and configure

git clone https://github.com/vcal-project/ai-firewall.git
cd ai-firewall
cp configs/ai-firewall.conf.example configs/ai-firewall.conf
nano configs/ai-firewall.conf

OpenAI-compatible examples are available under configs/examples/ for OpenAI, Ollama, LM Studio, vLLM, LiteLLM, and OpenRouter-style setups. v0.2.0 keeps a flat configuration model and does not add provider-specific configuration blocks.

Configure your upstream provider, API key or placeholder, embedding provider if semantic cache is enabled, and exact model pricing:

model_price gpt-4o-mini-2024-07-18 0.15 0.60;
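As a rough illustration of what these prices imply, the sketch below estimates the cost of a single request. It assumes the two numbers in model_price are USD per one million input tokens and per one million output tokens, in that order; this page does not state the unit, so treat that as an assumption. estimate_cost is a hypothetical helper and is not part of AI Cost Firewall.

```shell
# Hypothetical helper: estimate the cost of one request.
# $1 = input tokens, $2 = output tokens
# $3 = input price, $4 = output price (ASSUMED to be USD per 1M tokens)
estimate_cost() {
  awk -v in_tok="$1" -v out_tok="$2" -v in_price="$3" -v out_price="$4" \
    'BEGIN { printf "%.6f\n", (in_tok * in_price + out_tok * out_price) / 1000000 }'
}
```

Under that assumption, estimate_cost 1000 500 0.15 0.60 prints 0.000450 for a request with 1000 input tokens and 500 output tokens.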

For local providers without authentication, use placeholder keys:

upstream_api_key dummy;
embedding_api_key dummy;

Start the stack

docker compose pull
docker compose up -d

Check services

docker compose ps
docker compose logs -f firewall

Service       URL
Firewall API  http://localhost:8080
Prometheus    http://localhost:9090
Grafana       http://localhost:3000

Health, readiness, and version

curl -i http://localhost:8080/healthz
curl -i http://localhost:8080/readyz
curl -s http://localhost:8080/version

Expected healthy result for /healthz and /readyz:

HTTP/1.1 200 OK
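In scripts it can help to wait until /readyz succeeds before sending traffic. The following is a minimal sketch, assuming curl is installed. The names probe and wait_for_ready are hypothetical, and the default attempt count and delay are arbitrary choices.

```shell
# Hypothetical helper: succeeds if the URL answers with a 2xx status.
probe() {
  curl -fsS -o /dev/null "$1"
}

# Poll a URL until probe succeeds or the attempts run out.
# $1 = URL, $2 = max attempts (default 30), $3 = delay in seconds (default 2)
wait_for_ready() {
  url=$1
  attempts=${2:-30}
  delay=${3:-2}
  i=1
  while [ "$i" -le "$attempts" ]; do
    if probe "$url"; then
      echo "ready after $i attempt(s)"
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  echo "not ready after $attempts attempt(s)" >&2
  return 1
}
```

A typical call would be wait_for_ready http://localhost:8080/readyz 30 2, placed after docker compose up -d.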

Expected /version output includes the running release and compatibility model, for example:

{
  "version": "0.2.0",
  "release_title": "Pilot-Ready OpenAI-Compatible LLM Gateway",
  "supported_api_style": "openai_compatible",
  "provider_specific_config_blocks": false
}
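To confirm the running release from a script, the version field can be pulled out of that response. This is a sketch that uses sed rather than a JSON parser, so it assumes the field appears as a plain quoted string as in the example above. version_of is a hypothetical helper name.

```shell
# Hypothetical helper: read a /version JSON document on stdin and
# print the value of its "version" field.
version_of() {
  sed -n 's/.*"version"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}
```

For example, curl -s http://localhost:8080/version | version_of should print the release string, which a deployment script could compare against the expected value.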

Validate configuration

--test-config performs static validation only.

docker compose run --rm firewall \
  --config /configs/ai-firewall.conf \
  --test-config

Expected output:

configuration OK

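Static validation does not connect to Redis, Qdrant, embedding providers, or upstream LLM providers, so a passing check does not confirm that those services are reachable.

To print the configuration, run the same command with --print-config: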

docker compose run --rm firewall \
  --config /configs/ai-firewall.conf \
  --print-config
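The validation step can gate startup in a deployment script. The sketch below checks the validator output for the "configuration OK" line shown above instead of relying on an exit code, because this page does not document the exit status. config_is_ok is a hypothetical helper name.

```shell
# Hypothetical helper: read validator output on stdin and succeed
# only if it contains the "configuration OK" line.
config_is_ok() {
  grep -q 'configuration OK'
}

# Possible usage (-T disables pseudo-TTY allocation so the output pipes cleanly):
#   docker compose run --rm -T firewall \
#     --config /configs/ai-firewall.conf --test-config | config_is_ok \
#     && docker compose up -d
```

Matching on output text is brittle if the message changes between releases, so this is best treated as a stopgap.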

Send a test request

curl http://localhost:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini-2024-07-18",
    "messages": [
      {"role": "user", "content": "Say hello."}
    ]
  }'
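When sending several test prompts, building the request body by hand gets error-prone. The sketch below generates the same payload shape as the request above. json_escape and chat_payload are hypothetical helpers, and the escaping covers only backslashes and double quotes, so prompts containing newlines or control characters need a real JSON tool.

```shell
# Hypothetical helper: escape backslashes and double quotes for a JSON string.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# Hypothetical helper: print a chat completions request body.
# $1 = model name, $2 = user prompt
chat_payload() {
  printf '{"model":"%s","messages":[{"role":"user","content":"%s"}]}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")"
}
```

The output can be passed to curl with -d "$(chat_payload gpt-4o-mini-2024-07-18 'Say hello.')" in place of the inline JSON.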

View metrics

curl http://localhost:8080/metrics

The root Docker Compose stack includes Prometheus and Grafana. Most deployment examples provide an optional docker-compose.observability.yml overlay. local-full-stack/ includes observability directly.
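For a quick check without Grafana, the /metrics output can be summed on the command line. This sketch assumes the endpoint serves the Prometheus text exposition format. metric_sum is a hypothetical helper, and example_requests_total is a made-up metric name, since this page does not list the metrics the firewall exports.

```shell
# Hypothetical helper: read Prometheus text format on stdin and print the
# sum of all samples for one metric name across its label sets.
# Exits non-zero if the metric is not present.
metric_sum() {
  awk -v name="$1" '
    /^#/ { next }                       # skip HELP and TYPE comment lines
    {
      line = $0
      sub(/[{].*[}]/, "", line)         # drop the label block, if any
      n = split(line, f, " ")
      if (n >= 2 && f[1] == name) { sum += f[2]; found = 1 }
    }
    END { if (found) printf "%g\n", sum; else exit 1 }
  '
}
```

For example, curl -s http://localhost:8080/metrics | metric_sum example_requests_total, with the placeholder replaced by a metric name that actually appears in the output.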