Data Relationships

This document is schema-oriented. It describes the persistent relationships that are hard to infer from a single file, but it does not try to restate every runtime rule owned by neighboring docs.

Source of Truth

Core Entity Graph

teams 0..N team_memberships
users 0..1 team_memberships
teams 0..N service_accounts
api_keys belongs to exactly one principal owner: user or service account
api_keys N..N gateway_models through api_key_model_grants
Optional restriction overlays:
- user_model_allowlist
- team_model_allowlist
budgets stores user, service-account, and user-model budgets with one active row per canonical scope key
usage_cost_events records request ownership, model attribution, pricing status, and computed cost
request_logs records the final user-visible request outcome
request_log_payloads stores sanitized request and response bodies separately from the summary row
request_log_attempts stores ordered upstream provider execution attempts for a request log
mcp_tool_invocations stores individual tool-call audit rows correlated by request_id
Request-log purge removes old request-log parents and their payload, tag, and attempt children without touching spend ledger rows
pricing_catalog_cache stores normalized pricing snapshots used by runtime pricing resolution
model_pricing stores effective-dated pricing rows used for historical charging
usage_cost_event_duplicates_archive preserves duplicate-ledger migration/archive context
external_mcp_servers stores user-added MCP server registry rows and soft-disable state
external_mcp_tools stores discovered MCP tools, stable tool ids, schema hashes, schema versions, and active/inactive state
external_mcp_discovery_runs stores immutable discovery attempt diagnostics

Table Catalog

Foundation Tables

providers: upstream provider config and secret references
gateway_models: gateway model registry; rows can be provider-backed or alias-backed
model_routes: execution targets for provider-backed models only
api_key_model_grants: model grants attached to an API key
audit_logs: control-plane audit baseline

model_routes stores two distinct route execution metadata documents:

capabilities_json controls whether the route may execute a request
compatibility_json controls declared provider API compatibility transforms after route selection

capabilities_json includes API-family gates such as chat_completions, responses, and embeddings.

Compatibility metadata is not a provider config fallback and is not an extra_body convention.

Identity and Access Tables

teams
- Key columns: team_id, team_key, status, model_access_mode
- Notes: team_key is the durable identifier; model_access_mode is all|restricted
users
- Key columns: user_id, email, global_role, auth_mode, request_logging_enabled, model_access_mode
- Notes: case-insensitive uniqueness is enforced through email_normalized
team_memberships
- Key columns: team_id, user_id, role
- Notes: one-team-per-user is enforced by a unique user_id
service_accounts
- Key columns: service_account_id, team_id, name, status, created_by_user_id, deactivated_at
- Notes: service accounts are team-owned non-human principals; deletion is deactivation
oidc_providers
- Key columns: oidc_provider_id, provider_key, provider_type, issuer_url, client_id, enabled
- Notes: the current schema supports okta|generic_oidc
user_password_auth
- Key columns: user_id, password_hash, password_updated_at
user_oidc_auth
- Key columns: user_id, oidc_provider_id, subject, email_claim
- Notes: unique (oidc_provider_id, subject)
user_oauth_auth
- Key columns: user_id, oauth_provider, subject
user_oidc_links
- Purpose: pre-provisioned relationship between a user and the OIDC provider they are allowed to activate against

Authorization Overlay Tables

user_model_allowlist
- Relationship: user_id + model_id
team_model_allowlist
- Relationship: team_id + model_id

Ownership, Accounting, and Logging Tables

api_keys
- Key columns: id, public_id, secret_hash, owner_kind, owner_user_id, owner_service_account_id
- Constraint: exactly one owner column must be set consistently with owner_kind
- Notes: direct team-owned runtime keys and system-legacy ownership are not supported
budgets
- Key columns: budget_id, scope_kind, scope_key, user_id, service_account_id, model_id, upstream_model, cadence, amount_10000, hard_limit, timezone, is_active
- Constraint: one active budget per canonical scope_key
- Notes: supported scope kinds are user, service_account, and user_model; teams are not budget principals
usage_cost_events
- Key columns: usage_event_id, request_id, ownership_scope_key, api_key_id, user_id, team_id, actor_user_id, model_id, provider_key, upstream_model, pricing_status, unpriced_reason, pricing_row_id, pricing_provider_id, computed_cost_10000, provider_usage, occurred_at
- Notes: this is the canonical spend ledger used for enforcement and reporting
request_logs
- Key columns: request_log_id, request_id, api_key_id, user_id, team_id, model_key, resolved_model_key, provider_key, caller_service, caller_component, caller_env, status_code, referenced_mcp_server_count, exposed_tool_count, invoked_tool_count, filtered_tool_count, user_agent_raw, agent_harness_key, agent_harness_label, metadata_json, occurred_at
- Notes: one summary row per final request outcome; metadata_json.payload_policy records the capture mode and limits used for the row when request logging is enabled; tool-cardinality columns are nullable typed facts, with historical or not-yet-observable dimensions left null; agent harness usage groups by agent_harness_key while preserving bounded raw User-Agent detail evidence
request_log_payloads
- Key columns: request_log_id, request_json, response_json
- Notes: summary and payload are intentionally split; rows exist only when the payload policy captures redacted payloads
request_log_tags
- Key columns: request_log_id, tag_key, tag_value
- Notes: bounded bespoke caller tags for request-log filtering and attribution
request_log_attempts
- Key columns: request_attempt_id, request_log_id, request_id, attempt_number, route_id, provider_key, upstream_model, status, retryable, terminal, produced_final_response, stream, started_at, completed_at, latency_ms
- Notes: attempt rows are metadata-only children of request_logs; they are ordered by attempt_number and describe provider execution, not pre-provider gateway rejections
mcp_tool_invocations
- Key columns: mcp_tool_invocation_id, request_log_id, request_id, api_key_id, user_id, team_id, owner_kind, server_id, server_display_key, tool_id, tool_display_key, status, policy_result, latency_ms, error_code, has_payload, arguments_payload_truncated, result_payload_truncated, arguments_payload_redacted, result_payload_redacted, metadata_json, occurred_at
- Notes: MCP invocation rows are correlated by request_id; request_log_id is nullable and non-owning because request-log summaries are written at final outcome and may be absent or purged independently. Server/tool stable IDs are nullable until registry records exist, but display keys are required.
mcp_tool_invocation_payloads
- Key columns: mcp_tool_invocation_id, arguments_json, result_json
- Notes: payload rows exist only when MCP invocation payload policy captures sanitized payloads; summary rows are still recorded when payload capture is disabled.

External MCP Registry Tables

external_mcp_servers
- Key columns: mcp_server_id, server_key, display_name, transport, server_url, auth_mode, auth_config_json, timeout_ms, status, last_discovery_status, last_discovery_at, last_successful_discovery_at, last_error_summary, last_tool_count, created_at, updated_at, disabled_at
- Notes: rows are user-added registry records. Recommended catalog entries are not seeded into this table. Disabling a server is the archive/delete path; hard delete is not exposed.
external_mcp_tools
- Key columns: mcp_tool_id, mcp_server_id, upstream_name, display_name, description, input_schema_json, schema_hash, schema_version, is_active, first_discovered_at, last_discovered_at, deactivated_at
- Notes: (mcp_server_id, upstream_name) is unique. Rediscovery preserves mcp_tool_id for stable upstream names and increments schema_version only when schema_hash changes.
external_mcp_discovery_runs
- Key columns: discovery_run_id, mcp_server_id, status, started_at, finished_at, discovered_tool_count, active_tool_count, schema_set_hash, error_summary, details_json
- Notes: diagnostics are bounded and must not contain raw tokens or user credentials.

Request-log purge treats request_logs as the parent retention boundary. Purging a parent row also removes matching request_log_payloads, request_log_tags, and request_log_attempts rows. The purge does not remove usage_cost_events; spend reporting and budget enforcement stay tied to the ledger.

Pricing Catalog Cache

pricing_catalog_cache
- Key columns: catalog_key, source, etag, fetched_at, snapshot_json
- Notes: runtime uses the cached snapshot together with the vendored fallback in the repo
model_pricing
- Key columns: model_pricing_id, pricing_provider_id, pricing_model_id, effective_start_at, effective_end_at
- Notes: effective-dated pricing rows are the durable historical charging source
usage_cost_event_duplicates_archive
- Purpose: preserves duplicate-ledger rows during pricing/ledger migration cleanup and audit backfill flows

Authorization Semantics

Effective model access is the intersection of:

API key grants from api_key_model_grants
Team allowlist, only when teams.model_access_mode='restricted'
User allowlist, only when users.model_access_mode='restricted'

If neither the team nor the user is restricted, grants remain unchanged.

Service-account credentials use API-key grants plus the owning team's allowlist. User allowlists do not apply to service accounts.

Budget and Pricing Notes

Pricing lookup comes from the internal pricing catalog layer, not from provider /v1/models responses
Supported pricing source ids in this slice are openai, google-vertex, and google-vertex-anthropic
openai_compat providers must declare pricing_provider_id
gcp_vertex derives pricing source from the upstream_model publisher prefix
Pricing is exact-only in this slice; unsupported billing modifiers, unsupported publisher/location combinations, and unknown model ids resolve as unpriced
Chargeable requests write usage ledger rows and participate in budget enforcement
Unpriced requests are not charged and must not be budget-blocked
Supported budget cadence values are daily|weekly|monthly
Current UTC window semantics:
- Daily windows start at 00:00:00 UTC
- Weekly windows start at Monday 00:00:00 UTC
- Monthly windows start at the first day of the month at 00:00:00 UTC
- Sunday 23:59:59 UTC remains in the previous weekly window
Budget threshold alerts persist audit rows plus per-recipient delivery rows and currently deliver owner-only email alerts when remaining budget crosses to 20% or less
Service-account budget alerts are delivered to active owners and admins of the owning team
Service-account spend remains attributable to the service account and its owning team

Requested vs Resolved Model Identity

gateway_models can either point directly to provider routes or alias another model
request_logs.model_key stores the requested gateway model
request_logs.resolved_model_key stores the canonical execution model after alias resolution

This distinction matters for admin-facing observability and historical debugging. See model-routing-and-api-behavior.md.

Route Viability Note

Schema alone does not determine whether a model can execute.

Operational viability also depends on:

provider existence
route enabled state
positive route weights
capability filtering

Those rules are owned by configuration-reference.md and model-routing-and-api-behavior.md.

Ownership Notes

User-owned and service-account-owned API keys share the same api_keys table
Service-account usage and request logs exist without an acting user
Direct team-owned runtime API keys are removed from the schema contract
There is no reserved system-legacy ownership path

That ownership model is explained operationally in identity-and-access.md, service-accounts.md, and budgets-and-spending.md.

PostgreSQL and libsql Parity

Both runtime backends are expected to stay logically aligned for:

schema shape
migrations
seed behavior
aliases and request-log model identity
spend ledger behavior
request-log summary, payload, tag, and attempt persistence

See ../crates/gateway-store/README.md for the storage-layer overview.

Data Relationships ​

Source of Truth ​

Core Entity Graph ​

Table Catalog ​

Foundation Tables ​

Identity and Access Tables ​

Authorization Overlay Tables ​

Ownership, Accounting, and Logging Tables ​

External MCP Registry Tables ​

Pricing Catalog Cache ​

Authorization Semantics ​

Budget and Pricing Notes ​

Requested vs Resolved Model Identity ​

Route Viability Note ​

Ownership Notes ​

PostgreSQL and libsql Parity ​