
Runtime Overview

AI Cost Firewall supports:

  • health and readiness endpoints
  • graceful shutdown
  • request draining
  • upstream and embedding timeout handling
  • hot reload through SIGHUP
  • runtime metrics
  • semantic cache fail-open behavior
  • OpenAI-compatible provider diagnostics
  • embedding provider timeout visibility
  • release and compatibility introspection through /version

Startup dependencies

Redis is required for exact caching.

Qdrant is required when the semantic cache is enabled:

semantic_cache_enabled true;

semantic_cache_fail_open applies to runtime semantic lookup failures only.
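The distinction can be illustrated with a minimal sketch. It assumes that failing open means treating a failed semantic lookup as a cache miss so the request can continue to the upstream. The `SemanticBackendError` class and the `search` callable are hypothetical stand-ins for the embedding and Qdrant lookup, not names from AI Cost Firewall.

```python
class SemanticBackendError(Exception):
    """Hypothetical error for a failed or timed-out semantic lookup."""


def semantic_lookup(prompt, search, fail_open):
    """Return a cached response, or None to signal a cache miss.

    `search` is a hypothetical callable that embeds the prompt and
    queries the vector store; it raises SemanticBackendError on failure.
    """
    try:
        return search(prompt)
    except SemanticBackendError:
        if fail_open:
            # Assumed fail-open behavior: degrade to a cache miss so the
            # request can still be forwarded to the upstream provider.
            return None
        # Fail-closed: surface the lookup failure to the caller.
        raise
```

Nothing in this path runs at startup, which is consistent with the setting having no effect on startup validation.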

AI Cost Firewall validates runtime dependencies during startup and reload. v0.2.0 startup diagnostics are intended to make pilot deployment failures easier to identify before traffic is sent.

Startup diagnostics include:

  • loaded configuration summary
  • Redis connectivity
  • Qdrant connectivity
  • semantic cache configuration completeness
  • vector-size compatibility
  • OpenAI-compatible upstream and embedding provider configuration
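
As a rough illustration of the semantic cache completeness and vector-size checks, the sketch below collects problems into a list instead of stopping at the first one. Apart from `semantic_cache_enabled` and `semantic_cache_fail_open`, every setting name here (`qdrant_url`, `embedding_model`, `embedding_dimensions`) is a hypothetical placeholder, and the real checks may be structured differently.

```python
def check_semantic_cache(config, collection_vector_size):
    """Return a list of problems; an empty list means the checks passed.

    `config` is a plain dict of settings. `collection_vector_size` is the
    vector size reported by the vector store, or None if it is unknown.
    """
    problems = []
    if not config.get("semantic_cache_enabled"):
        # Qdrant is only required when the semantic cache is enabled.
        return problems

    # Hypothetical setting names, used for illustration only.
    for key in ("qdrant_url", "embedding_model", "embedding_dimensions"):
        if not config.get(key):
            problems.append(f"missing setting: {key}")

    dims = config.get("embedding_dimensions")
    if dims and collection_vector_size is not None and dims != collection_vector_size:
        problems.append(
            f"vector size mismatch: embeddings produce {dims}, "
            f"collection expects {collection_vector_size}"
        )
    # semantic_cache_fail_open is deliberately not consulted here: it only
    # affects runtime lookups, not startup validation.
    return problems
```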

During graceful shutdown:

  • readiness becomes unavailable
  • new requests are rejected
  • in-flight requests continue
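
The three behaviors above can be sketched with a small in-flight counter. This is one generic way to implement request draining, not a description of AI Cost Firewall's internals; the `DrainGate` class and its method names are hypothetical.

```python
import threading


class DrainGate:
    """Tracks in-flight requests and refuses new ones once shutdown begins."""

    def __init__(self):
        self._cond = threading.Condition(threading.Lock())
        self._in_flight = 0
        self._draining = False

    def ready(self):
        # Readiness becomes unavailable as soon as shutdown starts.
        with self._cond:
            return not self._draining

    def try_enter(self):
        # New requests are rejected while draining.
        with self._cond:
            if self._draining:
                return False
            self._in_flight += 1
            return True

    def leave(self):
        # Called when a request finishes, whether it succeeded or failed.
        with self._cond:
            self._in_flight -= 1
            if self._in_flight == 0:
                self._cond.notify_all()

    def begin_shutdown(self):
        with self._cond:
            self._draining = True

    def wait_drained(self, timeout=None):
        # In-flight requests continue; block until they finish or time out.
        with self._cond:
            return self._cond.wait_for(lambda: self._in_flight == 0, timeout)
```

A server built this way would call `begin_shutdown()` on receiving its termination signal, then `wait_drained()` with a deadline before exiting.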

AI Cost Firewall supports nginx-style configuration reload using:

SIGHUP
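
A minimal sketch of how a process might handle such a reload follows. Validation on reload comes from the text above; keeping the previous configuration when validation fails is an assumption borrowed from nginx and is not confirmed for AI Cost Firewall. The `load` and `validate` callables are hypothetical.

```python
import signal
import threading

reload_requested = threading.Event()


def on_sighup(signum, frame):
    # Keep the handler tiny: only record that a reload was requested.
    reload_requested.set()


def install_handler():
    # Call once from the main thread at startup (POSIX only).
    signal.signal(signal.SIGHUP, on_sighup)


def process_pending_reload(active, load, validate):
    """Return the configuration to serve with after a pending reload.

    `load()` re-reads the configuration and `validate(cfg)` returns a list
    of problems; both are hypothetical callables supplied by the caller.
    """
    if not reload_requested.is_set():
        return active
    reload_requested.clear()
    candidate = load()
    if validate(candidate):
        # Assumption: an invalid configuration is discarded and the
        # previously active one stays in use, as nginx does.
        return active
    return candidate
```

Doing the actual reload outside the signal handler keeps the handler safe to run at any point in the main loop.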

semantic_cache_fail_open affects runtime semantic lookup behavior only. It does not bypass the dependency validation performed at startup or on reload.

Version endpoint

The /version endpoint reports the running release and compatibility model. For v0.2.0, it confirms that AI Cost Firewall is a pilot-ready OpenAI-compatible gateway and that provider-specific configuration blocks are intentionally not part of this release.