Configuration Directives

Core

listen_addr 0.0.0.0:8080;
redis_url redis://redis:6379;

Upstream

upstream_provider openai_compatible;
upstream_base_url https://api.openai.com;
upstream_api_key sk-xxxx;

upstream_base_url may be the provider root URL or its /v1 base path.

Correct:

upstream_base_url http://ollama:11434/v1;

Wrong (a full endpoint path instead of a base URL):

upstream_base_url http://ollama:11434/v1/chat/completions;
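The rule above can be illustrated with a small sketch. It assumes the proxy appends /v1/chat/completions to a root URL and /chat/completions to a URL that already ends in /v1. The function name chat_completions_url is hypothetical and is not part of this configuration.

```python
# Hypothetical helper illustrating how upstream_base_url might be resolved.
# Assumption: a root URL gets "/v1/chat/completions" appended, and a URL that
# already ends in "/v1" gets "/chat/completions" appended.


def chat_completions_url(base_url: str) -> str:
    """Return the chat completions endpoint for a configured base URL."""
    base = base_url.rstrip("/")  # tolerate a trailing slash
    if base.endswith("/chat/completions"):
        # A full endpoint path is the documented "Wrong" form.
        raise ValueError(
            f"upstream_base_url must not include the endpoint path: {base_url}"
        )
    if base.endswith("/v1"):
        return base + "/chat/completions"
    return base + "/v1/chat/completions"
```

Under these assumptions, both https://api.openai.com and http://ollama:11434/v1 resolve to a .../v1/chat/completions endpoint, while the wrong form is rejected instead of producing a doubled path.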

Placeholder values for providers that do not require an API key:

dummy
none
null
-

Embeddings

embedding_provider openai_compatible;
embedding_base_url https://api.openai.com;
embedding_api_key sk-xxxx;
embedding_model text-embedding-3-small;
embedding_price 0.020;

For local providers without authentication, set the API key to dummy, none, null, or -. With a placeholder key, no Authorization: Bearer header is sent upstream.
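A minimal sketch of the placeholder rule, assuming the key is trimmed and then compared exactly (case-sensitive) against the four documented values, and assuming an empty or missing key is treated the same way. PLACEHOLDER_KEYS and upstream_auth_headers are hypothetical names.

```python
from typing import Dict, Optional

# The four documented placeholder values. Assumption: exact, case-sensitive match.
PLACEHOLDER_KEYS = {"dummy", "none", "null", "-"}


def upstream_auth_headers(api_key: Optional[str]) -> Dict[str, str]:
    """Return the auth headers to attach to an upstream request."""
    if api_key is None:
        return {}
    key = api_key.strip()
    if not key or key in PLACEHOLDER_KEYS:
        # Placeholder (or empty, by assumption): send no Authorization header.
        return {}
    return {"Authorization": f"Bearer {key}"}
```

With this sketch, upstream_api_key sk-xxxx yields a bearer header, while embedding_api_key dummy yields none.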

Qdrant

qdrant_url http://qdrant:6334;
qdrant_api_key your-qdrant-key;
qdrant_collection aif_semantic_cache;
qdrant_vector_size 1536;

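qdrant_vector_size must equal the number of dimensions in the vectors produced by the embedding model (1536 for text-embedding-3-small). If the collection already exists, it is validated at startup.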
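The startup check can be sketched as follows. This assumes the caller has already measured the length of one vector returned by the embedding model and looked up the vector size of the existing collection (None if the collection does not exist yet). validate_vector_size is a hypothetical name, not the actual startup routine.

```python
from typing import Optional


def validate_vector_size(
    configured_size: int,
    embedding_dim: int,
    existing_collection_size: Optional[int],
) -> None:
    """Raise ValueError if qdrant_vector_size disagrees with the model or collection."""
    if embedding_dim != configured_size:
        raise ValueError(
            f"qdrant_vector_size is {configured_size}, "
            f"but the embedding model returns {embedding_dim} dimensions"
        )
    # Only an already existing collection is checked; a new one would be
    # created with the configured size.
    if existing_collection_size is not None and existing_collection_size != configured_size:
        raise ValueError(
            f"existing collection uses vectors of size {existing_collection_size}, "
            f"but qdrant_vector_size is {configured_size}"
        )
```

For example, qdrant_vector_size 1536 with text-embedding-3-small passes, but switching to a model with a different output dimension while keeping the same collection fails at startup rather than at query time.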

Cache lifecycle

cache_ttl_seconds 86400;
exact_cache_ttl_seconds 86400;
semantic_cache_retention_seconds 604800;

Request behavior

request_timeout_seconds 120;

Request limits

max_request_body_bytes 1M;

Supported formats:

Semantic cache

semantic_cache_enabled true;
semantic_cache_fail_open true;
semantic_similarity_threshold 0.92;

semantic_cache_fail_open applies only to lookup failures at runtime; it does not cover failures during startup initialization.
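The interaction between semantic_similarity_threshold and semantic_cache_fail_open can be sketched as a lookup function. This assumes cosine similarity, a hit when the best score is greater than or equal to the threshold, and a fail-open failure being treated as a cache miss. The search callable is a hypothetical stand-in for the vector-store query, and semantic_lookup is a hypothetical name.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_lookup(
    query: Vector,
    search: Callable[[Vector], List[Tuple[Vector, str]]],
    threshold: float = 0.92,
    fail_open: bool = True,
) -> Optional[str]:
    """Return a cached response, or None on a miss.

    `search` returns (stored_vector, cached_response) candidates.
    """
    try:
        candidates = search(query)
    except Exception:
        if fail_open:
            return None  # treat the failure as a miss; the request goes upstream
        raise  # fail closed: surface the error to the caller
    best_score, best_response = -1.0, None
    for vector, response in candidates:
        score = cosine_similarity(query, vector)
        if score > best_score:
            best_score, best_response = score, response
    # Assumption: a score equal to the threshold counts as a hit.
    return best_response if best_score >= threshold else None
```

Raising the threshold makes hits stricter (fewer, closer matches); lowering it returns cached answers for looser paraphrases.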

Model pricing

model_price gpt-4o-mini-2024-07-18 0.15 0.60;
allow_unknown_models_pass_through false;

For providers with variable model names, such as OpenRouter, use:

allow_unknown_models_pass_through true;
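The units for model_price are not stated here. The sketch below assumes the two numbers are the input and output prices in USD per million tokens (0.15 and 0.60 match OpenAI's published list price for gpt-4o-mini), and assumes an unknown model is rejected when allow_unknown_models_pass_through is false. MODEL_PRICES and request_cost are hypothetical names.

```python
from typing import Dict, Optional, Tuple

# model -> (input price, output price). Assumption: USD per 1M tokens.
MODEL_PRICES: Dict[str, Tuple[float, float]] = {
    "gpt-4o-mini-2024-07-18": (0.15, 0.60),
}


def request_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    allow_unknown_models_pass_through: bool = False,
) -> Optional[float]:
    """Return the estimated cost in USD, or None for a passed-through unknown model."""
    prices = MODEL_PRICES.get(model)
    if prices is None:
        if allow_unknown_models_pass_through:
            return None  # forwarded upstream, but no price is known
        raise KeyError(f"no model_price configured for {model!r}")
    input_price, output_price = prices
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```

Under this assumption, 2,000 input tokens and 500 output tokens on gpt-4o-mini-2024-07-18 cost (2000 * 0.15 + 500 * 0.60) / 1,000,000 = 0.0006 USD. With pass-through enabled, requests for unpriced models still go upstream, but their cost cannot be computed.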
