OpenAI-Compatible Providers

AI Cost Firewall supports OpenAI-compatible chat and embedding endpoints without departing from its flat configuration model: each provider is described by a provider type, a base URL, and an API key.

Supported patterns

Provider      Typical role
OpenAI        Chat and embeddings
Ollama        Local chat and embeddings
LM Studio     Local desktop inference
vLLM          Self-hosted GPU inference
LiteLLM       Aggregation/proxy layer
OpenRouter    Multi-provider upstream

upstream_provider openai_compatible;
upstream_base_url <base-url>;
upstream_api_key <key-or-placeholder>;

embedding_provider openai_compatible;
embedding_base_url <base-url>;
embedding_api_key <key-or-placeholder>;
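
When the same six directives have to be produced for several environments, a small generator can keep them consistent. The following is a minimal sketch and not part of AI Cost Firewall: render_provider_config is a hypothetical helper name, and the only thing it does is fill the placeholders of the directive template above.

```python
def render_provider_config(
    upstream_base_url: str,
    upstream_api_key: str,
    embedding_base_url: str,
    embedding_api_key: str,
) -> str:
    """Fill the six-directive template with concrete values.

    Hypothetical helper for illustration; it performs no validation.
    """
    lines = [
        "upstream_provider openai_compatible;",
        f"upstream_base_url {upstream_base_url};",
        f"upstream_api_key {upstream_api_key};",
        "",
        "embedding_provider openai_compatible;",
        f"embedding_base_url {embedding_base_url};",
        f"embedding_api_key {embedding_api_key};",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Assumed hybrid setup: cloud chat upstream, local embeddings.
    print(render_provider_config(
        "https://api.openai.com/v1", "<key-or-placeholder>",
        "http://ollama:11434/v1", "dummy",
    ))
```

Because the chat upstream and the embedding provider are configured separately, the two halves may point at different services, as in the assumed hybrid setup printed by the script.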

The base URL may be either the provider root URL or its /v1 base path:

https://api.openai.com
https://api.openai.com/v1
http://ollama:11434
http://ollama:11434/v1
http://lmstudio:1234/v1
http://vllm:8000/v1
http://litellm:4000/v1

Do not configure the full endpoint path:

# Wrong
upstream_base_url http://ollama:11434/v1/chat/completions;

# Correct
upstream_base_url http://ollama:11434/v1;
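
The base URL rules can be checked before a configuration is deployed. The sketch below is an illustration under stated assumptions, not the firewall's actual resolution logic: normalize_base_url and endpoint_url are hypothetical names, and it assumes that a provider root URL is treated as if /v1 had been appended.

```python
from urllib.parse import urlsplit

# Endpoint paths that must not appear in a base URL.
ENDPOINT_SUFFIXES = ("/chat/completions", "/embeddings")


def normalize_base_url(base_url: str) -> str:
    """Return the /v1 base path for a root URL or a /v1 URL.

    Raises ValueError for a full endpoint path or a malformed URL.
    Assumption: a root URL is equivalent to the same URL plus /v1.
    """
    url = base_url.strip().rstrip("/")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"not an http(s) URL: {base_url!r}")
    if parts.path.endswith(ENDPOINT_SUFFIXES):
        raise ValueError(f"base URL must not include the endpoint path: {base_url!r}")
    if parts.path.endswith("/v1"):
        return url
    return url + "/v1"


def endpoint_url(base_url: str, endpoint: str) -> str:
    """Join a normalized base URL with an endpoint such as 'chat/completions'."""
    return normalize_base_url(base_url) + "/" + endpoint.lstrip("/")


if __name__ == "__main__":
    for candidate in ("http://ollama:11434", "http://ollama:11434/v1"):
        print(candidate, "->", endpoint_url(candidate, "chat/completions"))
```

Under this assumption both accepted forms resolve to the same endpoint, while the full endpoint path is rejected early instead of producing a doubled path at request time.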

For local providers without authentication, use a placeholder key:

upstream_api_key dummy;
embedding_api_key dummy;

Accepted placeholder values are dummy, none, null, and -.
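
A client or config linter could recognize these placeholders as follows. This is a sketch under assumptions: the helper names are hypothetical, the match is assumed to be exact and case-sensitive, and omitting the Authorization header for a placeholder is one plausible behavior rather than documented firewall behavior.

```python
# Placeholder values listed in the documentation.
PLACEHOLDER_KEYS = frozenset({"dummy", "none", "null", "-"})


def is_placeholder_key(api_key: str) -> bool:
    """True if the key is one of the documented placeholders.

    Assumption: exact, case-sensitive match after trimming whitespace.
    """
    return api_key.strip() in PLACEHOLDER_KEYS


def auth_headers(api_key: str) -> dict[str, str]:
    """Build request headers, skipping Authorization for placeholder keys.

    Assumption: an unauthenticated local provider needs no Authorization
    header; real keys use the standard Bearer scheme.
    """
    if is_placeholder_key(api_key):
        return {}
    return {"Authorization": f"Bearer {api_key}"}


if __name__ == "__main__":
    print(auth_headers("dummy"))
    print(auth_headers("sk-example"))
```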

Deployment examples

Runnable examples are available under:

deploy/examples/

Recommended examples:

openai-cloud/
local-ollama/
hybrid-openai-local-embeddings/
openrouter/
local-full-stack/