Configuration Directives
Core
listen_addr 0.0.0.0:8080;
redis_url redis://redis:6379;
Upstream
upstream_provider openai_compatible;
upstream_base_url https://api.openai.com;
upstream_api_key sk-xxxx;
upstream_base_url may be the provider root URL or its /v1 base path. Do not include an endpoint path such as /chat/completions.
Correct:
upstream_base_url http://ollama:11434/v1;
Wrong:
upstream_base_url http://ollama:11434/v1/chat/completions;
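The rule above can be sketched as a small validation step. This is an illustration only, not the gateway's actual code: the helper name chat_completions_url, the list of rejected endpoint suffixes, and the choice to append /v1 when only the root URL is given are all assumptions.

```python
from urllib.parse import urlsplit

# Hypothetical list of endpoint paths that should not appear in a base URL.
ENDPOINT_SUFFIXES = ("/chat/completions", "/completions", "/embeddings")


def chat_completions_url(base_url: str) -> str:
    """Build a chat completions URL from a provider root or /v1 base path."""
    parts = urlsplit(base_url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an absolute http(s) URL: {base_url!r}")

    path = parts.path.rstrip("/")
    if path.endswith(ENDPOINT_SUFFIXES):
        # The "Wrong" form: a full endpoint was given instead of a base URL.
        raise ValueError("upstream_base_url must not include an endpoint path")

    # Assumption: a bare root URL is treated as if /v1 had been written.
    if not path.endswith("/v1"):
        path += "/v1"
    return f"{parts.scheme}://{parts.netloc}{path}/chat/completions"
```

Under these assumptions, both https://api.openai.com and http://ollama:11434/v1 resolve to a single /v1/chat/completions endpoint, while the full endpoint form is rejected early instead of producing a doubled path.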
Placeholder upstream_api_key values for providers without authentication:
dummy
none
null
-
Embeddings
embedding_provider openai_compatible;
embedding_base_url https://api.openai.com;
embedding_api_key sk-xxxx;
embedding_model text-embedding-3-small;
embedding_price 0.020;
For local providers without authentication, use dummy, none, null, or -. Placeholder keys do not create upstream bearer auth headers.
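A minimal sketch of the placeholder behavior, assuming an exact, case-sensitive match against the four documented values. The function name auth_headers is hypothetical, and whether the gateway trims whitespace or ignores case is not stated here.

```python
# The four placeholder values listed in this document.
PLACEHOLDER_KEYS = {"dummy", "none", "null", "-"}


def auth_headers(api_key: str) -> dict[str, str]:
    """Return upstream auth headers, or no headers for a placeholder key."""
    if api_key in PLACEHOLDER_KEYS:
        # Placeholder key: no bearer Authorization header is created.
        return {}
    return {"Authorization": f"Bearer {api_key}"}
```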
Qdrant
qdrant_url http://qdrant:6334;
qdrant_api_key your-qdrant-key;
qdrant_collection aif_semantic_cache;
qdrant_vector_size 1536;
qdrant_vector_size must match the output dimension of the embedding model (1536 for text-embedding-3-small). Existing collections are validated at startup.
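The kind of check involved can be sketched as follows. This is an assumption about what a startup validation might compare, not the gateway's actual logic. The helper check_vector_size and the KNOWN_DIMENSIONS table are hypothetical; the dimensions listed are the default output sizes of those OpenAI models.

```python
from typing import Optional

# Default output dimensions for two OpenAI embedding models.
KNOWN_DIMENSIONS = {
    "text-embedding-3-small": 1536,
    "text-embedding-3-large": 3072,
}


def check_vector_size(
    model: str,
    configured_size: int,
    existing_collection_size: Optional[int] = None,
) -> list[str]:
    """Return a list of mismatch messages; an empty list means consistent."""
    problems = []
    expected = KNOWN_DIMENSIONS.get(model)
    if expected is not None and expected != configured_size:
        problems.append(
            f"{model} produces {expected}-dim vectors, "
            f"but qdrant_vector_size is {configured_size}"
        )
    if existing_collection_size is not None and existing_collection_size != configured_size:
        problems.append(
            f"existing collection uses {existing_collection_size}-dim vectors, "
            f"but qdrant_vector_size is {configured_size}"
        )
    return problems
```

Models missing from the table are not checked against a known dimension in this sketch; only the comparison with an existing collection applies to them.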
Cache lifecycle
cache_ttl_seconds 86400;
exact_cache_ttl_seconds 86400;
semantic_cache_retention_seconds 604800;
Request behavior
request_timeout_seconds 120;
Request limits
max_request_body_bytes 1M;
Supported formats (a plain byte count, or a number with a K or M suffix):
1024
512K
1M
2M
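The formats above could be parsed along these lines. This is a sketch that assumes binary multipliers (K = 1024 bytes, M = 1024 * 1024 bytes) and uppercase suffixes only. Neither detail is stated in this document, and the name parse_size is hypothetical.

```python
import re

# A run of digits, optionally followed by a single K or M suffix.
_SIZE_RE = re.compile(r"(\d+)([KM]?)")

# Assumption: suffixes are binary multiples.
_MULTIPLIERS = {"": 1, "K": 1024, "M": 1024 * 1024}


def parse_size(value: str) -> int:
    """Convert a size such as '1024', '512K', or '1M' to a byte count."""
    match = _SIZE_RE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"unsupported size format: {value!r}")
    number, suffix = match.groups()
    return int(number) * _MULTIPLIERS[suffix]
```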
Semantic cache
semantic_cache_enabled true;
semantic_cache_fail_open true;
semantic_similarity_threshold 0.92;
semantic_cache_fail_open applies to runtime lookup failures only, not startup initialization.
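The runtime half of this behavior can be illustrated as follows. The sketch assumes that failing open means treating a failed lookup as a cache miss, so the request continues to the upstream provider. The function semantic_lookup and its lookup callable are hypothetical names, not part of the gateway.

```python
from typing import Callable, Optional


def semantic_lookup(
    lookup: Callable[[str], Optional[str]],
    prompt: str,
    fail_open: bool,
) -> Optional[str]:
    """Return a cached response, or None on a miss."""
    try:
        return lookup(prompt)
    except Exception:
        if fail_open:
            # Assumed fail-open behavior: report a miss and let the
            # caller forward the request upstream.
            return None
        # Fail closed: surface the lookup error to the caller.
        raise
```

Startup initialization sits outside this path, which is consistent with the setting not covering it.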
Model pricing
model_price gpt-4o-mini-2024-07-18 0.15 0.60;
allow_unknown_models_pass_through false;
For providers with variable model names, such as OpenRouter, use:
allow_unknown_models_pass_through true;
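How the two pricing directives might interact is sketched below. Several points are assumptions rather than documented behavior: that the two numbers in model_price are input and output prices in USD per 1M tokens, that an unknown model is rejected when pass-through is off, and that a passed-through model has no computed cost. The names MODEL_PRICES and request_cost are hypothetical.

```python
from typing import Optional

# Assumed meaning: (input price, output price) in USD per 1M tokens.
MODEL_PRICES = {"gpt-4o-mini-2024-07-18": (0.15, 0.60)}


def request_cost(
    model: str,
    input_tokens: int,
    output_tokens: int,
    allow_unknown: bool = False,
) -> Optional[float]:
    """Estimate the cost of a request, or None for an unpriced model."""
    price = MODEL_PRICES.get(model)
    if price is None:
        if allow_unknown:
            # Pass-through: the request is allowed, but no cost is known.
            return None
        raise KeyError(f"no model_price configured for {model!r}")
    input_price, output_price = price
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000
```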