OpenAI-Compatible Providers

AI Cost Firewall works with OpenAI-compatible chat and embedding endpoints without changing its flat configuration model.

Supported patterns

Provider	Typical role
OpenAI	Chat and embeddings
Ollama	Local chat and embeddings
LM Studio	Local desktop inference
vLLM	Self-hosted GPU inference
LiteLLM	Aggregation/proxy layer
OpenRouter	Multi-provider upstream

The chat upstream and the embedding backend are each configured with a provider type, a base URL, and an API key:

upstream_provider openai_compatible;
upstream_base_url <base-url>;
upstream_api_key <key-or-placeholder>;

embedding_provider openai_compatible;
embedding_base_url <base-url>;
embedding_api_key <key-or-placeholder>;

The base URL may be either the provider root URL or its /v1 base path:

https://api.openai.com
https://api.openai.com/v1
http://ollama:11434
http://ollama:11434/v1
http://lmstudio:1234/v1
http://vllm:8000/v1
http://litellm:4000/v1

Do not configure the full endpoint path:

# Wrong
upstream_base_url http://ollama:11434/v1/chat/completions;

# Correct
upstream_base_url http://ollama:11434/v1;
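
The base URL rule above can be illustrated with a small Python sketch. It is not AI Cost Firewall's actual implementation; the function name `resolve_endpoint` and the decision to reject full endpoint paths with an error are assumptions made for illustration. It shows one way a root URL and a /v1 URL could resolve to the same request endpoint:

```python
from urllib.parse import urlsplit, urlunsplit

# Hypothetical helper for illustration only; not part of AI Cost Firewall.
# Endpoint suffixes that must not appear in a configured base URL.
ENDPOINT_SUFFIXES = ("/chat/completions", "/embeddings")


def resolve_endpoint(base_url: str, endpoint: str = "chat/completions") -> str:
    """Build a full request URL from a provider root URL or a /v1 base path."""
    parts = urlsplit(base_url.strip())
    path = parts.path.rstrip("/")

    # A full endpoint path is a configuration mistake (the "Wrong" case above).
    if path.endswith(ENDPOINT_SUFFIXES):
        raise ValueError(f"base URL must not include the endpoint path: {base_url}")

    # Accept both forms: append /v1 only when it is not already present.
    if not path.endswith("/v1"):
        path += "/v1"

    return urlunsplit((parts.scheme, parts.netloc, f"{path}/{endpoint}", "", ""))


if __name__ == "__main__":
    print(resolve_endpoint("http://ollama:11434"))
    print(resolve_endpoint("http://ollama:11434/v1"))
    print(resolve_endpoint("https://api.openai.com", "embeddings"))
```

In this sketch both `http://ollama:11434` and `http://ollama:11434/v1` resolve to the same chat completions URL, while a base URL that already ends in `/chat/completions` is rejected early instead of producing a doubled path.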

For local providers without authentication, use a placeholder key:

upstream_api_key dummy;
embedding_api_key dummy;

Accepted placeholder values are dummy, none, null, and -.
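
As a rough illustration of how placeholder keys might be treated, the Python sketch below checks a configured key against the four accepted values and only attaches an Authorization header for real keys. The helper names, the exact-match comparison, and the choice to omit the header for placeholders are assumptions; the firewall's actual behavior may differ.

```python
# Hypothetical helpers for illustration only; not part of AI Cost Firewall.
# The four placeholder values listed in the documentation.
PLACEHOLDER_KEYS = frozenset({"dummy", "none", "null", "-"})


def is_placeholder_key(api_key: str) -> bool:
    """Return True when the configured key is one of the accepted placeholders."""
    # Assumption: exact, case-sensitive match after trimming whitespace.
    return api_key.strip() in PLACEHOLDER_KEYS


def build_headers(api_key: str) -> dict[str, str]:
    """Build request headers, skipping Authorization for placeholder keys."""
    headers = {"Content-Type": "application/json"}
    if not is_placeholder_key(api_key):
        # Standard bearer-token header used by OpenAI-compatible APIs.
        headers["Authorization"] = f"Bearer {api_key}"
    return headers


if __name__ == "__main__":
    print(build_headers("dummy"))
    print(build_headers("sk-example"))
```

Many local servers ignore the Authorization header entirely, so sending a placeholder value would usually be harmless as well; omitting it is simply the more conservative choice in this sketch.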

Deployment examples

Runnable examples are available under:

deploy/examples/

Recommended examples:

openai-cloud/
local-ollama/
hybrid-openai-local-embeddings/
openrouter/
local-full-stack/
