Skip to main content

Configuration Overview

AI Cost Firewall uses nginx-style configuration.

directive value;

Example:

listen_addr 0.0.0.0:8080;

Directives are case-sensitive and must end with a semicolon.

Configuration model

AI Cost Firewall keeps configuration intentionally flat. OpenAI-compatible cloud services, local model servers, and proxy gateways are configured with the same directive style instead of provider-specific configuration blocks.

Use:

  • upstream_* directives for chat-completion traffic
  • embedding_* directives for semantic-cache embeddings
  • cache directives for exact-cache and semantic-cache behavior
  • optional guard directives for VCAL Guard integrations

upstream_base_url and embedding_base_url may use either a provider root URL or its /v1 base path. Do not configure full endpoint paths such as /v1/chat/completions or /v1/embeddings.

Example configuration

listen_addr 0.0.0.0:8080;

# Redis exact cache
#
# Local/dev example:
redis_url redis://redis:6379;

# Production reminder:
# Protect Redis with authentication and private networking.
# Example with password:
# redis_url redis://:your-redis-password@redis:6379;

# Chat upstream provider
# Default: openai_compatible
# Supports OpenAI-compatible /v1/chat/completions endpoints.
upstream_provider openai_compatible;
upstream_base_url https://api.openai.com;
upstream_api_key sk-your-api-key;

# Embedding provider
# Default: openai_compatible
# Supports OpenAI-compatible /v1/embeddings endpoints.
embedding_provider openai_compatible;
embedding_base_url https://api.openai.com;
embedding_api_key sk-your-api-key;
embedding_model text-embedding-3-small;

# Qdrant semantic cache
#
# Local/dev example:
qdrant_url http://qdrant:6334;

# Production reminder:
# Protect Qdrant with an API key and private networking.
# Example:
# qdrant_api_key your-qdrant-api-key;

qdrant_collection aif_semantic_cache;

# Must match the dimension of the configured embedding_model.
qdrant_vector_size 1536;

# Backward-compatible default for both cache layers.
cache_ttl_seconds 2592000;

# Optional explicit lifecycle controls.
# exact_cache_ttl_seconds 86400;
# semantic_cache_retention_seconds 604800;

# Semantic cache lifecycle behavior:
# - Entries include inserted_at and expires_at metadata
# - Expired entries are skipped during lookup
# - Entries are NOT automatically deleted from Qdrant
#
# To clean up expired entries manually:
# ai-firewall --prune-expired-semantic-cache

# Request and provider timeouts.
#
# request_timeout_seconds is kept for compatibility.
# Prefer the more specific upstream and embedding timeouts where available.
request_timeout_seconds 120;
upstream_timeout_seconds 120;
embedding_timeout_seconds 30;

# Request protection limits.
max_request_body_bytes 1M;
max_prompt_chars 200000;

# Exact cache controls.
exact_cache_enabled true;
exact_cache_store_enabled true;

# If Redis is unavailable:
# - true: skip exact cache and continue to upstream
# - false: fail the request
exact_cache_fail_open true;

# Semantic cache controls.
semantic_cache_enabled true;
semantic_cache_store_enabled true;
semantic_similarity_threshold 0.92;

# Higher threshold = stricter similarity and fewer semantic hits.
# Lower threshold = more reuse, but potentially less precise matches.

# If the embedding/semantic provider is unavailable:
# - true: skip semantic cache and continue to upstream
# - false: fail the request
semantic_cache_fail_open true;

# Optional cache bypass header.
# Requests with this header bypass cache lookup and go directly upstream.
cache_bypass_header X-AIF-Cache-Bypass;

# Model validation behavior.
# By default, only models defined via `model_price` are allowed.
# Unknown models will be rejected with 400.
allow_unknown_models_pass_through false;

# Chat-completion pricing (USD per 1M tokens).
# model_price <model> <input_usd_per_1m_tokens> <output_usd_per_1m_tokens>;

model_price gpt-4o-mini-2024-07-18 0.15 0.60;
model_price gpt-4.1-mini-2025-04-14 0.30 1.20;

# Embedding pricing.
# Optional. Used for net cost estimation only.
embedding_price 0.020;

# Optional VCAL Privacy Guard integration.
#
# When enabled, AI Cost Firewall can call VCAL Privacy Guard before sending
# prompts upstream and can restore placeholders in assistant responses.
#
# This is normally used in enterprise/private deployments.
privacy_guard_enabled false;
privacy_guard_url http://vcal-privacy-guard:8090;
privacy_guard_api_key your-privacy-guard-api-key;
privacy_guard_mode anonymize;
privacy_guard_restore_enabled true;
privacy_guard_tenant_id default;
privacy_guard_policy_id default;
privacy_guard_timeout_seconds 10;

# Guard failure behavior:
# - true: continue when the guard is unavailable
# - false: fail closed when the guard is unavailable
guard_fail_open false;

Core directives

DirectivePurpose
listen_addrAddress and port where AI Cost Firewall listens.
redis_urlRedis connection URL for the exact cache.
upstream_providerChat upstream provider type. Currently uses OpenAI-compatible behavior.
upstream_base_urlBase URL for the chat-completion provider.
upstream_api_keyAPI key for the chat-completion provider.
embedding_providerEmbedding provider type. Currently uses OpenAI-compatible behavior.
embedding_base_urlBase URL for the embedding provider.
embedding_api_keyAPI key for the embedding provider.
embedding_modelEmbedding model used for semantic-cache vectors.
qdrant_urlQdrant URL for the semantic cache.
qdrant_api_keyOptional Qdrant API key.
qdrant_collectionQdrant collection used by AI Cost Firewall.
qdrant_vector_sizeVector size. Must match the configured embedding model.

Cache controls

DirectivePurpose
exact_cache_enabledEnables or disables exact-cache lookup.
exact_cache_store_enabledControls whether new exact-cache entries are stored.
exact_cache_fail_openControls whether Redis failures are skipped or treated as request failures.
semantic_cache_enabledEnables or disables semantic-cache lookup.
semantic_cache_store_enabledControls whether new semantic-cache entries are stored.
semantic_similarity_thresholdMinimum similarity score required for a semantic-cache hit.
semantic_cache_fail_openControls whether embedding/Qdrant failures are skipped or treated as request failures.
cache_ttl_secondsBackward-compatible TTL default for cache layers.
exact_cache_ttl_secondsOptional explicit TTL for exact-cache entries.
semantic_cache_retention_secondsOptional retention period for semantic-cache entries.
cache_bypass_headerOptional request header used to bypass cache lookup.

Request limits and timeouts

DirectivePurpose
request_timeout_secondsBackward-compatible request timeout.
upstream_timeout_secondsTimeout for upstream chat-completion requests.
embedding_timeout_secondsTimeout for embedding provider requests.
max_request_body_bytesMaximum accepted HTTP request body size.
max_prompt_charsMaximum accepted combined prompt size.

Model and pricing controls

DirectivePurpose
allow_unknown_models_pass_throughAllows or rejects models not defined with model_price.
model_priceDefines chat-completion model pricing for savings and cost metrics.
embedding_priceOptional embedding price used for net cost estimation.

Optional VCAL Privacy Guard directives

DirectivePurpose
privacy_guard_enabledEnables or disables VCAL Privacy Guard integration.
privacy_guard_urlURL of the VCAL Privacy Guard service.
privacy_guard_api_keyAPI key used by AI Cost Firewall to call VCAL Privacy Guard.
privacy_guard_modeGuard mode, for example anonymize.
privacy_guard_restore_enabledRestores placeholders in assistant responses before returning to the client.
privacy_guard_tenant_idOptional tenant identifier passed to VCAL Privacy Guard.
privacy_guard_policy_idOptional policy identifier passed to VCAL Privacy Guard.
privacy_guard_timeout_secondsTimeout for calls to VCAL Privacy Guard.
guard_fail_openControls whether guard failures are skipped or treated as request failures.

Environment variables

Most deployment examples use configuration files. Containerized deployments may also provide selected settings through environment variables.

Common environment variables include:

AIF_PRIVACY_GUARD_ENABLED
AIF_PRIVACY_GUARD_URL
AIF_PRIVACY_GUARD_API_KEY
AIF_PRIVACY_GUARD_MODE
AIF_PRIVACY_GUARD_RESTORE_ENABLED
AIF_GUARD_FAIL_OPEN

For production deployments, prefer secrets management for API keys instead of hard-coding credentials in committed configuration files.

Default paths

configs/ai-firewall.conf
/etc/ai-firewall/ai-firewall.conf

Example configurations

Configuration-only examples are available under:

configs/examples/

Runnable deployment examples are available under:

deploy/examples/

Use configs/examples/ for reusable snippets and deploy/examples/ for full Docker Compose evaluation patterns.