Troubleshooting
Confirm the running release
Before debugging a pilot deployment, confirm the running binary and compatibility model:
curl -s http://localhost:8080/version
For v0.2.0, the response should report version as 0.2.0, supported_api_style as openai_compatible, and provider_specific_config_blocks as false.
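The field check above can be scripted. The following is a minimal sketch, assuming /version returns a JSON body in which the three fields appear as top-level keys; check_version_body is a hypothetical helper name, not part of the firewall.

```shell
# Hypothetical helper: reads a /version response body on stdin and checks
# the three fields expected for v0.2.0.
# Assumption: the body is JSON; the real response format may differ.
check_version_body() {
  body=$(cat)
  ok=0
  for pattern in \
    '"version"[[:space:]]*:[[:space:]]*"0\.2\.0"' \
    '"supported_api_style"[[:space:]]*:[[:space:]]*"openai_compatible"' \
    '"provider_specific_config_blocks"[[:space:]]*:[[:space:]]*false'
  do
    if ! printf '%s' "$body" | grep -Eq "$pattern"; then
      echo "missing or unexpected: $pattern" >&2
      ok=1
    fi
  done
  return "$ok"
}

# Usage:
#   curl -s http://localhost:8080/version | check_version_body && echo "v0.2.0 confirmed"
```

A non-zero exit status means at least one field did not match, and the failing pattern is printed to stderr.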
Low cache hit rate
Check:
aif_cache_exact_hits
aif_cache_semantic_hits
aif_cache_misses
aif_semantic_threshold_results_total{result="fail"}
Common causes: the semantic threshold is set too high, prompts are not similar enough to match, the semantic cache is disabled, or cache retention is too short.
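To turn the counters above into a single number, the following sketch sums them from /metrics output and prints a hit ratio. It assumes the standard Prometheus text format with the sample value as the last field on each line (no timestamps); cache_hit_ratio is a hypothetical helper name.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the combined exact + semantic cache hit ratio.
# Assumption: the sample value is the last field on each line.
cache_hit_ratio() {
  awk '
    /^#/ { next }                        # skip HELP and TYPE comments
    {
      split($1, parts, "{")              # drop any label set from the name
      name = parts[1]
      if (name == "aif_cache_exact_hits")    exact += $NF
      if (name == "aif_cache_semantic_hits") semantic += $NF
      if (name == "aif_cache_misses")        misses += $NF
    }
    END {
      total = exact + semantic + misses
      if (total == 0) { print "no cache lookups recorded"; exit }
      printf "exact=%d semantic=%d misses=%d hit_ratio=%.2f\n", exact, semantic, misses, (exact + semantic) / total
    }
  '
}

# Usage:
#   curl -s http://localhost:8080/metrics | cache_hit_ratio
```

A low ratio combined with a high aif_semantic_threshold_results_total{result="fail"} count points at the threshold rather than at retention.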
Startup diagnostics fail before the server is ready
Check the firewall logs first. v0.2.0 startup diagnostics should identify whether the failure is related to configuration, Redis, Qdrant, vector-size compatibility, upstream connectivity, embedding provider behavior, DNS, connection errors, or TLS/certificate validation.
Use /healthz, /readyz, and /version together:
curl -i http://localhost:8080/healthz
curl -i http://localhost:8080/readyz
curl -s http://localhost:8080/version
/healthz only shows that the process is alive. /readyz shows whether it is ready to serve traffic. /version confirms which release is running.
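The distinction between the two probes can be sketched as a small decision helper. It assumes both endpoints answer with HTTP 200 on success, which the text above does not state explicitly; interpret_probe is a hypothetical name.

```shell
# Hypothetical helper: takes the HTTP status codes of /healthz and /readyz
# and prints what the combination suggests.
# Assumption: 200 means success for both endpoints.
interpret_probe() {
  healthz_code=$1
  readyz_code=$2
  if [ "$healthz_code" != "200" ]; then
    echo "not alive: /healthz returned $healthz_code"
  elif [ "$readyz_code" != "200" ]; then
    echo "alive but not ready: /readyz returned $readyz_code"
  else
    echo "alive and ready"
  fi
}

# Usage (curl reports 000 when the connection itself fails):
#   h=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/healthz)
#   r=$(curl -s -o /dev/null -w '%{http_code}' http://localhost:8080/readyz)
#   interpret_probe "$h" "$r"
```

An "alive but not ready" result is the case where the startup diagnostics in the firewall logs are most useful.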
Redis connection failure
Docker Compose:
redis_url redis://redis:6379;
Local source run:
redis_url redis://127.0.0.1:6379;
Qdrant connection failure
Docker Compose:
qdrant_url http://qdrant:6334;
Local source run:
qdrant_url http://127.0.0.1:6334;
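For either the Redis or the Qdrant URL, a quick TCP check can confirm whether the host and port are reachable from where the firewall runs. The sketch below splits a URL into host and port; url_host_port is a hypothetical helper, and the nc flags in the usage comment are an assumption about the netcat variant installed. Service names such as redis and qdrant resolve only inside the Compose network, so run the check from a container on that network when testing the Docker Compose values.

```shell
# Hypothetical helper: prints "host port" for a URL such as
# redis://redis:6379 or http://127.0.0.1:6334.
# Limitation: bracketed IPv6 literals are not handled.
url_host_port() {
  rest=${1#*://}      # drop the scheme
  rest=${rest%%/*}    # drop any path
  rest=${rest##*@}    # drop any user:password@ prefix
  case "$rest" in
    *:*) echo "${rest%:*} ${rest##*:}" ;;
    *)   echo "$rest" ;;   # no explicit port in the URL
  esac
}

# Usage (nc -z -w is an assumption about the local netcat):
#   set -- $(url_host_port redis://127.0.0.1:6379)
#   nc -z -w 3 "$1" "$2" && echo "reachable" || echo "unreachable"
```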
Qdrant vector size mismatch
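This error means the vector size of the existing Qdrant collection does not match the embedding model's output dimension or the configured qdrant_vector_size. Fix by recreating the collection, using a matching embedding model, or updating qdrant_vector_size.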
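Before choosing a fix, it can help to read the size the collection was created with. The sketch below extracts the first "size" value from a Qdrant collection-info response. It assumes the Qdrant REST API is reachable on its default port 6333 (the URLs in this guide use 6334) and a collection with a single unnamed vector; the COLLECTION variable is a placeholder, since the collection name is not given here.

```shell
# Hypothetical helper: reads a Qdrant collection-info JSON body on stdin
# and prints the first "size" value found.
# Assumption: a single unnamed vector configuration.
collection_vector_size() {
  grep -Eo '"size"[[:space:]]*:[[:space:]]*[0-9]+' | head -n 1 | grep -Eo '[0-9]+$'
}

# Usage (port 6333 and $COLLECTION are assumptions):
#   curl -s "http://127.0.0.1:6333/collections/$COLLECTION" | collection_vector_size
```

Compare the printed value with qdrant_vector_size and with the output dimension of the embedding model. All three need to agree.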
ai-firewall: command not found
After source build, use:
./target/release/ai-firewall
Or install it:
sudo install -m 0755 target/release/ai-firewall /usr/local/bin/ai-firewall
/metrics shows one in-flight request
This is normal: the /metrics request is itself in flight when the value is reported.
Wrong OpenAI-compatible base URL
Set upstream_base_url to the provider root URL or its /v1 base path, not to a full endpoint path.
Correct:
upstream_base_url http://ollama:11434/v1;
Wrong:
upstream_base_url http://ollama:11434/v1/chat/completions;
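The rule above can be checked mechanically before the value goes into the config. This is a minimal sketch; check_base_url is a hypothetical helper, and the list of endpoint suffixes it rejects is an assumption based on common OpenAI-compatible paths.

```shell
# Hypothetical helper: rejects a base URL that ends in a full endpoint path.
# Assumption: the suffixes below cover the common OpenAI-compatible endpoints.
check_base_url() {
  url=${1%/}    # ignore one trailing slash
  case "$url" in
    */chat/completions|*/completions|*/embeddings)
      echo "wrong: full endpoint path, use the root or /v1 base instead"
      return 1 ;;
    *)
      echo "ok"
      return 0 ;;
  esac
}

# Usage:
#   check_base_url http://ollama:11434/v1
#   check_base_url http://ollama:11434/v1/chat/completions
```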
Upstream provider errors
Check:
aif_errors_total{class="upstream_authentication_error"}
aif_errors_total{class="upstream_not_found"}
aif_errors_total{class="upstream_rate_limited"}
aif_errors_total{class="upstream_tls_error"}
aif_errors_total{class="upstream_dns_error"}
aif_errors_total{class="upstream_connect_error"}
Common causes:
- wrong provider API key
- full endpoint path configured instead of base URL
- provider hostname cannot be resolved
- provider port is unreachable
- self-signed or hostname-mismatched TLS certificate
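To see which of the aif_errors_total classes listed above is actually firing, the counters can be ranked from /metrics output. The sketch below assumes the standard Prometheus text format with the sample value as the last field; error_classes is a hypothetical helper name.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints non-zero aif_errors_total counts per class, highest first.
# Assumption: the sample value is the last field on each line.
error_classes() {
  awk '
    index($0, "aif_errors_total{") == 1 {
      if (match($0, /class="[^"]*"/)) {
        # skip the 7 characters of class=" and the closing quote
        cls = substr($0, RSTART + 7, RLENGTH - 8)
        count[cls] += $NF
      }
    }
    END {
      for (c in count) if (count[c] > 0) print count[c], c
    }
  ' | sort -rn
}

# Usage:
#   curl -s http://localhost:8080/metrics | error_classes
```

A dominant upstream_tls_error or upstream_dns_error class maps directly onto the certificate and hostname causes listed above.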
Embedding provider timeouts
Check:
aif_embedding_timeouts_total
aif_embedding_request_duration_seconds
Common causes:
- wrong provider API key
- full endpoint path configured instead of base URL
- provider hostname cannot be resolved
- provider port is unreachable
- self-signed or hostname-mismatched TLS certificate
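The two embedding metrics above can be summarized in one line. The sketch assumes aif_embedding_request_duration_seconds is exposed as a Prometheus histogram or summary, so that _sum and _count series exist; that is an assumption about the metric type, and embedding_summary is a hypothetical helper name.

```shell
# Hypothetical helper: reads Prometheus text-format metrics on stdin and
# prints the timeout count, request count, and average request duration.
# Assumption: the duration metric exposes _sum and _count series.
embedding_summary() {
  awk '
    { split($1, parts, "{"); name = parts[1] }   # strip any label set
    name == "aif_embedding_timeouts_total"                 { timeouts += $NF }
    name == "aif_embedding_request_duration_seconds_sum"   { sum += $NF }
    name == "aif_embedding_request_duration_seconds_count" { count += $NF }
    END {
      printf "timeouts=%d requests=%d", timeouts, count
      if (count > 0) printf " avg_seconds=%.3f", sum / count
      printf "\n"
    }
  '
}

# Usage:
#   curl -s http://localhost:8080/metrics | embedding_summary
```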
Dashboards are empty
Common causes:
- no traffic has been sent yet
- Prometheus is not scraping the firewall
- Grafana datasource is not connected
- wrong Compose working directory
- dashboard provisioning paths are wrong
Check:
curl http://localhost:8080/metrics
and open:
http://localhost:9090/targets
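The targets page can also be checked from the command line through the Prometheus HTTP API, which reports a health value of up, down, or unknown for each active target. The sketch below counts those values in the JSON response; target_health and count_health are hypothetical helper names.

```shell
# Hypothetical helpers: count scrape-target health states in the JSON body
# returned by the Prometheus /api/v1/targets endpoint.
count_health() {
  # $1 = response body, $2 = health state to count
  printf '%s' "$1" | grep -Eo "\"health\"[[:space:]]*:[[:space:]]*\"$2\"" | wc -l | tr -d ' '
}

target_health() {
  body=$(cat)
  echo "up=$(count_health "$body" up) down=$(count_health "$body" down) unknown=$(count_health "$body" unknown)"
}

# Usage:
#   curl -s http://localhost:9090/api/v1/targets | target_health
```

If every count is zero, Prometheus has no scrape targets configured at all, which points at the Compose working directory or provisioning paths rather than at the firewall.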