Configuration Overview
AI Cost Firewall uses an nginx-style configuration format. Each setting is a directive name followed by its value:
directive value;
Example:
listen_addr 0.0.0.0:8080;
Directives are case-sensitive and must end with a semicolon.
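The syntax rules above can be illustrated with a small parser. This is a minimal Python sketch, not the firewall's actual parser: the function name `parse_directives` is hypothetical, and it assumes one directive per line, `#` comments on their own lines, and whitespace-separated values.

```python
def parse_directives(text):
    """Parse 'name value ...;' lines into a list of (name, [values]) tuples."""
    directives = []
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        # Skip blank lines and full-line comments.
        if not line or line.startswith("#"):
            continue
        if not line.endswith(";"):
            raise ValueError(f"line {lineno}: directive must end with a semicolon")
        parts = line[:-1].split()
        if not parts:
            raise ValueError(f"line {lineno}: empty directive")
        # Names are kept exactly as written because directives are case-sensitive.
        directives.append((parts[0], parts[1:]))
    return directives
```

Under these assumptions, a line such as `listen_addr 0.0.0.0:8080;` yields the name `listen_addr` with a single value, while the same line without the trailing semicolon raises an error.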
Configuration model
AI Cost Firewall keeps configuration intentionally flat. OpenAI-compatible cloud services, local model servers, and proxy gateways are configured with the same directive style instead of provider-specific configuration blocks.
Use:
- upstream_* directives for chat-completion traffic
- embedding_* directives for semantic-cache embeddings
- cache directives for exact-cache and semantic-cache behavior
- optional guard directives for VCAL Guard integrations
upstream_base_url and embedding_base_url each accept either a provider root URL (for example https://api.openai.com) or its /v1 base path (for example https://api.openai.com/v1). Do not configure full endpoint paths such as /v1/chat/completions or /v1/embeddings.
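The base URL rule can be sketched as a small helper that accepts both permitted forms and rejects full endpoint paths. This is an illustration only: the function name `endpoint_url` is hypothetical, and how AI Cost Firewall resolves endpoint URLs internally is an assumption here, not something this document specifies.

```python
def endpoint_url(base_url, endpoint):
    """Join a provider base URL with an endpoint such as 'chat/completions'."""
    base = base_url.rstrip("/")
    # Full endpoint paths are not valid base URLs.
    if base.endswith(("/chat/completions", "/embeddings")):
        raise ValueError("configure a base URL, not a full endpoint path")
    # Add the /v1 prefix only when the base URL does not already end with it.
    if not base.endswith("/v1"):
        base += "/v1"
    return f"{base}/{endpoint.lstrip('/')}"
```

With this sketch, `https://api.openai.com` and `https://api.openai.com/v1` both resolve to the same chat-completions URL, which is the behavior the rule above implies.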
Example configuration
listen_addr 0.0.0.0:8080;
# Redis exact cache
#
# Local/dev example:
redis_url redis://redis:6379;
# Production reminder:
# Protect Redis with authentication and private networking.
# Example with password:
# redis_url redis://:your-redis-password@redis:6379;
# Chat upstream provider
# Default: openai_compatible
# Supports OpenAI-compatible /v1/chat/completions endpoints.
upstream_provider openai_compatible;
upstream_base_url https://api.openai.com;
upstream_api_key sk-your-api-key;
# Embedding provider
# Default: openai_compatible
# Supports OpenAI-compatible /v1/embeddings endpoints.
embedding_provider openai_compatible;
embedding_base_url https://api.openai.com;
embedding_api_key sk-your-api-key;
embedding_model text-embedding-3-small;
# Qdrant semantic cache
#
# Local/dev example:
qdrant_url http://qdrant:6334;
# Production reminder:
# Protect Qdrant with an API key and private networking.
# Example:
# qdrant_api_key your-qdrant-api-key;
qdrant_collection aif_semantic_cache;
# Must match the dimension of the configured embedding_model.
qdrant_vector_size 1536;
# Backward-compatible default for both cache layers.
cache_ttl_seconds 2592000;
# Optional explicit lifecycle controls.
# exact_cache_ttl_seconds 86400;
# semantic_cache_retention_seconds 604800;
# Semantic cache lifecycle behavior:
# - Entries include inserted_at and expires_at metadata
# - Expired entries are skipped during lookup
# - Entries are NOT automatically deleted from Qdrant
#
# To clean up expired entries manually:
# ai-firewall --prune-expired-semantic-cache
# Request and provider timeouts.
#
# request_timeout_seconds is kept for compatibility.
# Prefer the more specific upstream and embedding timeouts where available.
request_timeout_seconds 120;
upstream_timeout_seconds 120;
embedding_timeout_seconds 30;
# Request protection limits.
max_request_body_bytes 1M;
max_prompt_chars 200000;
# Exact cache controls.
exact_cache_enabled true;
exact_cache_store_enabled true;
# If Redis is unavailable:
# - true: skip exact cache and continue to upstream
# - false: fail the request
exact_cache_fail_open true;
# Semantic cache controls.
semantic_cache_enabled true;
semantic_cache_store_enabled true;
semantic_similarity_threshold 0.92;
# Higher threshold = stricter similarity and fewer semantic hits.
# Lower threshold = more reuse, but potentially less precise matches.
# If the embedding/semantic provider is unavailable:
# - true: skip semantic cache and continue to upstream
# - false: fail the request
semantic_cache_fail_open true;
# Optional cache bypass header.
# Requests with this header bypass cache lookup and go directly upstream.
cache_bypass_header X-AIF-Cache-Bypass;
# Model validation behavior.
# By default, only models defined via `model_price` are allowed.
# Unknown models will be rejected with 400.
allow_unknown_models_pass_through false;
# Chat-completion pricing (USD per 1M tokens).
# model_price <model> <input_usd_per_1m_tokens> <output_usd_per_1m_tokens>;
model_price gpt-4o-mini-2024-07-18 0.15 0.60;
model_price gpt-4.1-mini-2025-04-14 0.30 1.20;
# Embedding pricing.
# Optional. Used for net cost estimation only.
embedding_price 0.020;
# Optional VCAL Privacy Guard integration.
#
# When enabled, AI Cost Firewall can call VCAL Privacy Guard before sending
# prompts upstream and can restore placeholders in assistant responses.
#
# This is normally used in enterprise/private deployments.
privacy_guard_enabled false;
privacy_guard_url http://vcal-privacy-guard:8090;
privacy_guard_api_key your-privacy-guard-api-key;
privacy_guard_mode anonymize;
privacy_guard_restore_enabled true;
privacy_guard_tenant_id default;
privacy_guard_policy_id default;
privacy_guard_timeout_seconds 10;
# Guard failure behavior:
# - true: continue when the guard is unavailable
# - false: fail closed when the guard is unavailable
guard_fail_open false;
Core directives
| Directive | Purpose |
|---|---|
| listen_addr | Address and port where AI Cost Firewall listens. |
| redis_url | Redis connection URL for the exact cache. |
| upstream_provider | Chat upstream provider type. Currently uses OpenAI-compatible behavior. |
| upstream_base_url | Base URL for the chat-completion provider. |
| upstream_api_key | API key for the chat-completion provider. |
| embedding_provider | Embedding provider type. Currently uses OpenAI-compatible behavior. |
| embedding_base_url | Base URL for the embedding provider. |
| embedding_api_key | API key for the embedding provider. |
| embedding_model | Embedding model used for semantic-cache vectors. |
| qdrant_url | Qdrant URL for the semantic cache. |
| qdrant_api_key | Optional Qdrant API key. |
| qdrant_collection | Qdrant collection used by AI Cost Firewall. |
| qdrant_vector_size | Vector size. Must match the configured embedding model. |
Cache controls
| Directive | Purpose |
|---|---|
| exact_cache_enabled | Enables or disables exact-cache lookup. |
| exact_cache_store_enabled | Controls whether new exact-cache entries are stored. |
| exact_cache_fail_open | Controls whether Redis failures are skipped or treated as request failures. |
| semantic_cache_enabled | Enables or disables semantic-cache lookup. |
| semantic_cache_store_enabled | Controls whether new semantic-cache entries are stored. |
| semantic_similarity_threshold | Minimum similarity score required for a semantic-cache hit. |
| semantic_cache_fail_open | Controls whether embedding/Qdrant failures are skipped or treated as request failures. |
| cache_ttl_seconds | Backward-compatible TTL default for cache layers. |
| exact_cache_ttl_seconds | Optional explicit TTL for exact-cache entries. |
| semantic_cache_retention_seconds | Optional retention period for semantic-cache entries. |
| cache_bypass_header | Optional request header used to bypass cache lookup. |
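The relationship between the three lifetime directives can be sketched as follows. This is a hedged illustration rather than the firewall's implementation: the helper names are hypothetical, it assumes that the explicit directives override cache_ttl_seconds when present, and it assumes expires_at is stored as a Unix timestamp in seconds.

```python
import time

def effective_ttls(config):
    """Return (exact_ttl, semantic_retention) in seconds from a parsed config dict."""
    default = config.get("cache_ttl_seconds")
    # Assumption: an explicit per-layer value wins over the shared default.
    exact = config.get("exact_cache_ttl_seconds", default)
    semantic = config.get("semantic_cache_retention_seconds", default)
    return exact, semantic

def is_expired(entry, now=None):
    """True when a semantic-cache entry's expires_at (Unix seconds) has passed."""
    now = time.time() if now is None else now
    return entry["expires_at"] <= now
```

A lookup that skips expired entries would call `is_expired` on each candidate and ignore those that return True. The entries themselves remain in Qdrant until they are pruned.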
Request limits and timeouts
| Directive | Purpose |
|---|---|
| request_timeout_seconds | Backward-compatible request timeout. |
| upstream_timeout_seconds | Timeout for upstream chat-completion requests. |
| embedding_timeout_seconds | Timeout for embedding provider requests. |
| max_request_body_bytes | Maximum accepted HTTP request body size. |
| max_prompt_chars | Maximum accepted combined prompt size. |
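Size values such as 1M can be interpreted with a short helper. The sketch below is an assumption, not documented behavior: it follows the nginx convention of binary multiples (K = 1024, M = 1024 * 1024), and the exact set of suffixes that AI Cost Firewall accepts is not stated in this document. The names `parse_size` and `_UNITS` are hypothetical.

```python
# Assumed binary multipliers, following the nginx size-suffix convention.
_UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3}

def parse_size(value):
    """Convert a size such as '1M' or '512' into a number of bytes."""
    value = value.strip()
    suffix = value[-1:].upper()
    if suffix in _UNITS:
        return int(value[:-1]) * _UNITS[suffix]
    # No recognized suffix: treat the value as a plain byte count.
    return int(value)
```

Under these assumptions, the example value `max_request_body_bytes 1M;` corresponds to 1,048,576 bytes.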
Model and pricing controls
| Directive | Purpose |
|---|---|
| allow_unknown_models_pass_through | Allows or rejects models not defined with model_price. |
| model_price | Defines chat-completion model pricing for savings and cost metrics. |
| embedding_price | Optional embedding price used for net cost estimation. |
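Because model_price values are expressed in USD per 1M tokens, the cost of a single request follows from simple arithmetic. The function below is a minimal sketch of that calculation. Its name and signature are hypothetical and do not reflect the firewall's internal metrics code.

```python
def chat_cost_usd(input_tokens, output_tokens, input_price, output_price):
    """Estimate request cost in USD from token counts and per-1M-token prices."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Prices taken from the example line: model_price gpt-4o-mini-2024-07-18 0.15 0.60;
cost = chat_cost_usd(2000, 500, 0.15, 0.60)
```

For 2,000 input tokens and 500 output tokens at those prices, the estimate is approximately 0.0006 USD. A cache hit that avoids the upstream call would count roughly that amount as savings.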
Optional VCAL Privacy Guard directives
| Directive | Purpose |
|---|---|
| privacy_guard_enabled | Enables or disables VCAL Privacy Guard integration. |
| privacy_guard_url | URL of the VCAL Privacy Guard service. |
| privacy_guard_api_key | API key used by AI Cost Firewall to call VCAL Privacy Guard. |
| privacy_guard_mode | Guard mode, for example anonymize. |
| privacy_guard_restore_enabled | Restores placeholders in assistant responses before returning to the client. |
| privacy_guard_tenant_id | Optional tenant identifier passed to VCAL Privacy Guard. |
| privacy_guard_policy_id | Optional policy identifier passed to VCAL Privacy Guard. |
| privacy_guard_timeout_seconds | Timeout for calls to VCAL Privacy Guard. |
| guard_fail_open | Controls whether guard failures are skipped or treated as request failures. |
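The fail-open and fail-closed behavior shared by guard_fail_open, exact_cache_fail_open, and semantic_cache_fail_open can be sketched generically. This is an illustration under stated assumptions: the helper name `call_with_fail_open` is hypothetical, and treating only connection and timeout errors as "unavailable" is a simplification.

```python
def call_with_fail_open(operation, fail_open, fallback=None):
    """Run operation(); on a connection failure, return fallback or re-raise."""
    try:
        return operation()
    except (ConnectionError, TimeoutError):
        if fail_open:
            # Fail open: skip the dependency and let the request continue.
            return fallback
        # Fail closed: surface the failure so the request is rejected.
        raise
```

With fail_open set to true, an unavailable dependency behaves like a cache miss or a skipped guard step. With false, the error propagates and the request fails, which is the safer choice for the privacy guard because prompts are never sent upstream unprocessed.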
Environment variables
Most deployment examples use configuration files. Containerized deployments may also provide selected settings through environment variables.
Common environment variables include:
AIF_PRIVACY_GUARD_ENABLED
AIF_PRIVACY_GUARD_URL
AIF_PRIVACY_GUARD_API_KEY
AIF_PRIVACY_GUARD_MODE
AIF_PRIVACY_GUARD_RESTORE_ENABLED
AIF_GUARD_FAIL_OPEN
For production deployments, prefer secrets management for API keys instead of hard-coding credentials in committed configuration files.
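One way to interpret the boolean variables listed above is shown below. This is a sketch only: the helper name `env_bool` and the accepted spellings of true and false are assumptions, and how AI Cost Firewall itself parses these variables or resolves conflicts with the configuration file is not stated here.

```python
import os

# Assumed spellings; the firewall's accepted values may differ.
_TRUE = {"1", "true", "yes", "on"}
_FALSE = {"0", "false", "no", "off"}

def env_bool(name, default, environ=None):
    """Read a boolean environment variable, falling back to default when unset."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    value = raw.strip().lower()
    if value in _TRUE:
        return True
    if value in _FALSE:
        return False
    raise ValueError(f"{name}: expected a boolean, got {raw!r}")
```

For example, `env_bool("AIF_GUARD_FAIL_OPEN", False)` returns the default when the variable is unset, which keeps the guard fail-closed unless a deployment explicitly opts out.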
Default paths
configs/ai-firewall.conf
/etc/ai-firewall/ai-firewall.conf
Example configurations
Configuration-only examples are available under:
configs/examples/
Runnable deployment examples are available under:
deploy/examples/
Use configs/examples/ for reusable snippets and deploy/examples/ for full Docker Compose evaluation patterns.