Observability and Request Logs

This document describes the live observability contract for the gateway.

Observability Pages

Request Logs: request-scoped admin list/detail, payload policy state, provider attempts, and request-level tool cardinality.
MCP Invocations: per-tool MCP audit rows, request correlation, owner context, authorization policy result, latency, and redacted argument/result metadata.
MCP Registry and Discovery: platform-admin registry management, recommended catalog import flow, Streamable HTTP discovery, auth declarations, and stable MCP server/tool ids.

Source of Truth

observability bootstrap:
- ../crates/gateway/src/observability.rs
HTTP request instrumentation:
- ../crates/gateway/src/http/handlers.rs
request-log lifecycle:
- ../crates/gateway-service/src/request_logging.rs
request-attempt persistence:
- ../crates/gateway-store/migrations/V19__request_log_attempts.sql
redaction policy:
- ../crates/gateway-service/src/redaction.rs
admin APIs:
- ../crates/gateway/src/http/observability.rs
- ../crates/gateway/src/http/mcp_registry.rs

OTLP-First Model

The gateway exports tracing spans and metrics through OpenTelemetry.

Current config knobs:

server.otel_endpoint
server.otel_metrics_endpoint
server.otel_export_interval_secs

The intended deploy path is collector-friendly OTLP export rather than an in-process Prometheus endpoint.

Kubernetes Helm Wiring

The Helm chart keeps observability generic. It does not install an OpenTelemetry Collector, Datadog Agent, or any other collector.

Kubernetes installs can use:

gateway.config.server.otel_endpoint
gateway.config.server.otel_metrics_endpoint
gateway.config.server.otel_export_interval_secs
observability.env
observability.podLabels
observability.podAnnotations
observability.volumes
observability.volumeMounts
observability.sidecars

Common shapes:

existing in-cluster collector Service: set OTLP endpoints in gateway.config.server.*
DaemonSet collector or vendor agent: point OTLP endpoints at the node-local or service DNS address documented by that deployment
sidecar collector: add the collector container with observability.sidecars and point OTLP endpoints at localhost
annotation-driven scraping or injection: set the required labels and annotations through observability.podLabels and observability.podAnnotations

See the Kubernetes and Helm page for chart-level values and examples.

OpenTelemetry Collector service example:

```yaml
gateway:
  config:
    server:
      otel_endpoint: http://otel-collector.observability.svc:4317
      otel_metrics_endpoint: http://otel-collector.observability.svc:4317
observability:
  env:
    - name: OTEL_SERVICE_NAME
      value: oceans-llm-gateway
```

Datadog Agent OTLP example:

```yaml
gateway:
  config:
    server:
      otel_endpoint: http://datadog-agent.datadog.svc:4317
      otel_metrics_endpoint: http://datadog-agent.datadog.svc:4317
observability:
  podLabels:
    tags.datadoghq.com/service: oceans-llm-gateway
    tags.datadoghq.com/env: production
```

What Gets Recorded

The runtime emits bounded request-level signals for:

API request totals
request latency
request outcomes
token totals
priced spend metric totals
usage-record totals by pricing status
request tool-cardinality histograms
caller request tags for filtering and attribution; see Tagging

Request correlation is anchored on x-request-id. The HTTP middleware boundary owns request-id generation and propagation: caller-provided values are preserved, and missing values are generated once before handlers run.
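The middleware rule (preserve caller values, generate missing ones once) can be sketched as a small function. This is a minimal Rust sketch, not the gateway's middleware: the function name, the lowercased HashMap header store, the generator closure, and treating an empty header value as missing are all assumptions.

```rust
use std::collections::HashMap;

// Header name from the correlation contract; header keys are assumed to be
// stored lowercased in this sketch.
const REQUEST_ID_HEADER: &str = "x-request-id";

/// Returns the request id for this request. A non-empty caller value is
/// preserved as-is; otherwise `generate` runs exactly once and its value is
/// written back so downstream handlers see the same id.
fn ensure_request_id<F: FnOnce() -> String>(
    headers: &mut HashMap<String, String>,
    generate: F,
) -> String {
    // Clone first so the immutable borrow ends before any insert.
    let existing = headers
        .get(REQUEST_ID_HEADER)
        .cloned()
        .filter(|value| !value.is_empty());
    if let Some(id) = existing {
        return id;
    }
    let id = generate();
    headers.insert(REQUEST_ID_HEADER.to_string(), id.clone());
    id
}
```

Passing the generator as a closure keeps the "generated once" property visible at the call site: the closure is consumed by the call, so it cannot run twice.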

Request outcomes are emitted once per request with bounded labels. Notable examples include:

budget_error for pre-provider hard-limit rejection
invalid_request for capability mismatch
upstream_error for upstream execution or stream failure
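Keeping outcome labels bounded is simplest when they come from a closed enum and the emit path refuses a second write. The sketch below is hypothetical: it models only the three labels listed here (the real label set is larger), and the OutcomeRecorder type only illustrates the once-per-request rule; it is not the gateway's metrics code.

```rust
/// Hypothetical closed set of outcome labels; only the three documented
/// examples are modeled here.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum RequestOutcome {
    BudgetError,
    InvalidRequest,
    UpstreamError,
}

impl RequestOutcome {
    /// Stable, bounded label string for metrics.
    fn label(self) -> &'static str {
        match self {
            RequestOutcome::BudgetError => "budget_error",
            RequestOutcome::InvalidRequest => "invalid_request",
            RequestOutcome::UpstreamError => "upstream_error",
        }
    }
}

/// Accepts the first outcome for a request and ignores any later ones.
#[derive(Default)]
struct OutcomeRecorder {
    recorded: Option<RequestOutcome>,
}

impl OutcomeRecorder {
    /// Returns true only when this call actually recorded the outcome.
    fn record(&mut self, outcome: RequestOutcome) -> bool {
        if self.recorded.is_some() {
            return false;
        }
        self.recorded = Some(outcome);
        true
    }
}
```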

Tagging and Attribution

The request-log surface records caller-supplied request tags for filtering and attribution. The tag header contract, validation rules, examples, and identity tag guidance live in Tagging.

Request tags are request-scoped. User and team tags are durable identity metadata managed by admins. Keep those two surfaces distinct when exporting or reconciling observability data.

Request Log Storage Shape

Request logs are intentionally split:

request_logs
- hot summary row
request_log_payloads
- sanitized request and response bodies
request_log_tags
- bounded bespoke caller tags
request_log_attempts
- ordered upstream provider execution attempts

The summary row stores:

request identity
owner identity
service-account identity for non-human callers
requested and resolved model identity
provider key
universal caller tags
status, latency, and usage totals
typed MCP and tool cardinality counts
bounded raw User-Agent and normalized agent harness key/label
truncation flags
metadata such as operation, stream, and payload_policy

operation is the public API family. Current values include chat_completions, responses, and embeddings.

Request-attempt rows describe upstream provider execution only. Pre-provider failures such as authentication rejection, capability mismatch, route unavailability, or budget hard-limit rejection have zero attempts. In the current runtime, successful provider-backed requests record one terminal attempt. Retry and fallback execution remain disabled until the configurable policy tracked in issue #118 is implemented.

Native Vertex embeddings use the same request-log surfaces. Both successful and failed provider-backed embedding requests record operation: embeddings. Provider execution details appear as request-attempt rows when a request-log summary is written, and sanitized request and response payloads follow the configured payload policy. Embedding inputs are text-only for native Vertex routes, but the redaction and byte-limit policy still applies before storage.

Tool-cardinality fields are explicit nullable columns on request_logs.

exposed_tool_count: shallow count of OpenAI-compatible request tools.
invoked_tool_count: count of tool-call artifacts observed in normalized provider output.
referenced_mcp_server_count: nullable until an MCP access/filtering layer records server exposure.
filtered_tool_count: nullable until an MCP access/filtering layer records filtered or denied tools.

New Chat Completions and Responses rows record 0 for exposed and invoked counts when no tools are present. Historical rows and unavailable MCP-specific dimensions remain null. Admin surfaces render null as n/a and preserve real zeroes.
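A hypothetical sketch of the null-versus-zero rule for the tool-cardinality columns: Option<u32> stands in for a nullable column, a tool-free Chat Completions or Responses row records Some(0) for the exposed and invoked counts, and the renderer prints n/a only for None. The ToolCounts struct and both function names are illustrative assumptions.

```rust
/// Hypothetical in-memory view of the nullable tool-cardinality columns.
struct ToolCounts {
    exposed_tool_count: Option<u32>,
    invoked_tool_count: Option<u32>,
    referenced_mcp_server_count: Option<u32>,
    filtered_tool_count: Option<u32>,
}

/// Admin-style rendering: NULL becomes "n/a", a recorded zero stays "0".
fn render_count(value: Option<u32>) -> String {
    match value {
        Some(n) => n.to_string(),
        None => "n/a".to_string(),
    }
}

/// Counts for a new row with no tools: exposed and invoked are recorded as 0,
/// while the MCP-specific dimensions stay NULL until a filtering layer exists.
fn counts_for_toolless_request() -> ToolCounts {
    ToolCounts {
        exposed_tool_count: Some(0),
        invoked_tool_count: Some(0),
        referenced_mcp_server_count: None,
        filtered_tool_count: None,
    }
}
```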

Streaming requests persist a bounded transcript payload rather than raw transport bytes.

The stream payload contract is incremental rather than chunk-local:

UTF-8 is reassembled across transport chunk boundaries
SSE data: frames are reassembled across chunk boundaries
both data:value and data: value forms (without or with a space after the colon) are accepted
the latest coherent usage object is retained for request-log and ledger work
Responses streams also retain usage from response.usage on completed response events
streaming tool-call artifacts increment invoked_tool_count while SSE frames are parsed for request logging
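The incremental contract matters because a transport chunk can end in the middle of an SSE frame or in the middle of a multi-byte UTF-8 character. A minimal sketch, assuming LF-delimited frames and a hypothetical SseReassembler type: it buffers raw bytes, decodes only complete frames, and accepts data: with or without one following space. Usage extraction and tool-call counting are left out.

```rust
/// Hypothetical incremental reassembler for SSE `data:` payloads.
#[derive(Default)]
struct SseReassembler {
    buf: Vec<u8>,
}

impl SseReassembler {
    /// Feeds one transport chunk and returns the data payloads of every frame
    /// completed by it, in order. Incomplete trailing bytes stay buffered.
    fn push(&mut self, chunk: &[u8]) -> Vec<String> {
        self.buf.extend_from_slice(chunk);
        let mut payloads = Vec::new();
        // A frame ends at a blank line; this sketch assumes LF line endings.
        while let Some(pos) = self.buf.windows(2).position(|w| w == b"\n\n") {
            // Remove the frame plus its blank-line terminator from the buffer.
            let frame: Vec<u8> = self.buf.drain(..pos + 2).collect();
            // The frame is complete, so no UTF-8 sequence is cut in half here.
            let text = String::from_utf8_lossy(&frame[..pos]);
            let data: Vec<&str> = text
                .lines()
                .filter_map(|line| line.strip_prefix("data:"))
                // Accept both `data:x` and `data: x` by dropping one optional space.
                .map(|rest| rest.strip_prefix(' ').unwrap_or(rest))
                .collect();
            if !data.is_empty() {
                // Multiple data lines in one frame join with newlines.
                payloads.push(data.join("\n"));
            }
        }
        payloads
    }
}
```

Because the frame terminator is plain ASCII, splitting on it can never cut a UTF-8 sequence, which is why decoding waits until a whole frame has arrived.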

Request-log payloads are user-visible artifacts. They do not persist the transformed outbound provider request body produced by route compatibility profiles.

Provider stream transcripts can include normalized compatibility output, such as promoted usage or canonical reasoning deltas, because that normalized stream is what the gateway returns to callers. Responses streams preserve response.* event names and payloads rather than being rewritten into Chat Completions chunks.

Request Log Retention and Purge

Request-log retention is admin-controlled. The supported retention windows are intentionally small and explicit:

1d
3d
7d

The default retention window is 7d. Admins can run the purge command manually before enabling any recurring cleanup:

```bash
mise run gateway-purge-request-logs-dry-run
mise run gateway-purge-request-logs
```

Use --dry-run first in production-shaped environments. A dry run reports how many parent request-log rows are older than the selected retention cutoff without deleting data.
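The retention windows and the dry-run count reduce to a cutoff comparison. A minimal sketch, assuming Unix-second timestamps and a strict older-than comparison; parse_retention, is_purgeable, and dry_run_count are hypothetical names, not the purge command's internals.

```rust
use std::time::Duration;

/// Accepts only the three supported retention windows.
fn parse_retention(value: &str) -> Result<Duration, String> {
    let days = match value {
        "1d" => 1,
        "3d" => 3,
        "7d" => 7,
        other => return Err(format!("unsupported retention window: {other}")),
    };
    Ok(Duration::from_secs(days * 24 * 60 * 60))
}

/// A parent row is purgeable when it was created strictly before
/// `now - retention` (timestamps in Unix seconds, an assumption).
fn is_purgeable(created_at_secs: u64, now_secs: u64, retention: Duration) -> bool {
    let cutoff = now_secs.saturating_sub(retention.as_secs());
    created_at_secs < cutoff
}

/// Dry-run style count: how many parent rows a purge would delete.
/// Nothing is deleted here.
fn dry_run_count(created_at: &[u64], now_secs: u64, retention: Duration) -> usize {
    created_at
        .iter()
        .filter(|&&t| is_purgeable(t, now_secs, retention))
        .count()
}
```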

When the command runs without --dry-run, it deletes matching request_logs rows and their request-log children:

request_log_payloads
request_log_tags
request_log_attempts

Admins should not hand-delete only one request-log table. Manual partial deletion can leave observability detail misleading even when database constraints prevent direct orphan rows.

Recurring purge is disabled by default and must be opted into from config. Use a standard cron expression and keep the schedule daily or less frequent:

```yaml
request_logging:
  purge:
    enabled: false
    retention: 7d
    schedule: "0 0 * * *"
```

Runtime safety rules:

enabled defaults to false
retention defaults to 7d
only 1d, 3d, and 7d are valid windows
schedule uses standard 5-field cron syntax
recurring schedules must not be more frequent than daily
each gateway process starts its own recurring worker when enabled
the runtime keeps a UTC-day guard so a recurring worker cannot purge more than once per day even if a bad schedule is supplied
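Two of these safety rules can be sketched directly: a deliberately conservative daily-or-less check for a 5-field schedule, and an in-memory UTC-day guard. Both are assumptions for illustration. The real validator may accept more schedule shapes, and because each gateway process starts its own worker, a per-process guard like this one only limits purges within that process.

```rust
/// Conservative sketch of "no more often than daily": exactly 5 fields, and
/// the minute and hour fields are each a single fixed number, so the schedule
/// can fire at most once per calendar day. Lists, ranges, steps, and `*` in
/// those two fields are rejected. This is an assumption, not the real rule set.
fn is_at_most_daily(schedule: &str) -> bool {
    let fields: Vec<&str> = schedule.split_whitespace().collect();
    if fields.len() != 5 {
        return false;
    }
    let fixed = |f: &str, max: u32| f.parse::<u32>().map_or(false, |n| n <= max);
    fixed(fields[0], 59) && fixed(fields[1], 23)
}

/// Hypothetical UTC-day guard: allows at most one purge per UTC calendar day,
/// even if a bad schedule fires more often.
#[derive(Default)]
struct DailyPurgeGuard {
    last_utc_day: Option<u64>,
}

impl DailyPurgeGuard {
    fn try_acquire(&mut self, now_unix_secs: u64) -> bool {
        let day = now_unix_secs / 86_400; // whole days since the Unix epoch, UTC
        if self.last_utc_day == Some(day) {
            return false;
        }
        self.last_utc_day = Some(day);
        true
    }
}
```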

Retention only affects operational request-log tables. It does not delete spend ledger rows in usage_cost_events, budget history, provider config, model config, users, teams, or API keys.

Payload Policy

Chat-completion request-log payload persistence is controlled by request_logging.payloads in gateway.yaml.

Default config:

```yaml
request_logging:
  payloads:
    capture_mode: redacted_payloads
    request_max_bytes: 65536
    response_max_bytes: 65536
    stream_max_events: 128
    redaction_paths: []
```

Capture modes:

disabled: skip request-log persistence for chat completions
summary_only: write request_logs summary rows with has_payload=false; do not write request_log_payloads
redacted_payloads: write summary rows and sanitized payload rows

The policy is read from YAML only. The admin UI displays the policy used for each row, but does not edit it.

Owner behavior also matters:

user-owned API keys honor users.request_logging_enabled
service-account credentials always persist request-log summary rows

This is why a user-owned request can be absent from request logs while a service-account request with the same payload policy is still visible.

Validation rules:

request_max_bytes must be greater than zero
response_max_bytes must be greater than zero
stream_max_events must be greater than zero
redaction_paths must use dot-separated object keys, with * as a full-segment wildcard
paths are anchored from the wrapped payload root, for example body.messages.*.content.*.image_url.url
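The redaction_paths rules can be sketched as a validator plus a matcher. This is a hypothetical sketch: representing a concrete location as one string per level with array indices written as decimal strings, and letting * match exactly one segment, are assumptions consistent with the body.messages.*.content.*.image_url.url example.

```rust
/// Valid when the path is non-empty, every dot-separated segment is
/// non-empty, and `*` appears only as an entire segment.
fn is_valid_redaction_path(path: &str) -> bool {
    !path.is_empty()
        && path
            .split('.')
            .all(|seg| !seg.is_empty() && (seg == "*" || !seg.contains('*')))
}

/// True when a concrete location, anchored at the wrapped payload root, is
/// matched by the pattern. `*` matches exactly one segment (no recursion).
fn path_matches(pattern: &str, location: &[&str]) -> bool {
    let segs: Vec<&str> = pattern.split('.').collect();
    segs.len() == location.len()
        && segs.iter().zip(location).all(|(p, l)| *p == "*" || p == l)
}
```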

Each request-log row persists lightweight policy metadata in request_logs.metadata_json:

```json
{
  "operation": "chat_completions",
  "stream": false,
  "payload_policy": {
    "capture_mode": "redacted_payloads",
    "request_max_bytes": 65536,
    "response_max_bytes": 65536,
    "stream_max_events": 128,
    "version": "builtin:v1"
  }
}
```

Redaction and Truncation Boundaries

Payloads are wrapped before policy application:

requests: { "headers": ..., "body": ... }
responses: { "body": ... }
streams: { "stream": true, "events": ..., "usage": ..., "error": ... }

Redaction applies one explicit built-in policy plus additive admin-configured paths from request_logging.payloads.redaction_paths.

Sensitive built-in headers include:

authorization
anthropic-api-key
cookie
set-cookie
x-goog-api-key
x-api-key

Sensitive built-in JSON keys include:

token
access_token
refresh_token
api_key
anthropic_api_key
client_secret
credentials
private_key
secret
password
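The built-in header and JSON-key lists can be applied with a recursive walk. A minimal sketch, not the gateway's redaction.rs: the Json enum is a std-only stand-in for a real JSON type, the [REDACTED] placeholder is assumed, header names are compared case-insensitively, and JSON keys are matched exactly at any depth.

```rust
use std::collections::BTreeMap;

/// Minimal std-only stand-in for a JSON value.
#[derive(Debug, Clone, PartialEq)]
enum Json {
    Str(String),
    Arr(Vec<Json>),
    Obj(BTreeMap<String, Json>),
}

const SENSITIVE_HEADERS: &[&str] = &[
    "authorization", "anthropic-api-key", "cookie", "set-cookie", "x-goog-api-key", "x-api-key",
];
const SENSITIVE_KEYS: &[&str] = &[
    "token", "access_token", "refresh_token", "api_key", "anthropic_api_key",
    "client_secret", "credentials", "private_key", "secret", "password",
];
const REDACTED: &str = "[REDACTED]"; // placeholder text is an assumption

/// Header names are case-insensitive, so compare their lowercased form.
fn redact_headers(headers: &mut BTreeMap<String, String>) {
    for (name, value) in headers.iter_mut() {
        if SENSITIVE_HEADERS.contains(&name.to_ascii_lowercase().as_str()) {
            *value = REDACTED.to_string();
        }
    }
}

/// Replaces the value of any sensitive key at any depth; other values are
/// walked recursively.
fn redact_keys(value: &mut Json) {
    match value {
        Json::Obj(map) => {
            for (key, child) in map.iter_mut() {
                if SENSITIVE_KEYS.contains(&key.as_str()) {
                    *child = Json::Str(REDACTED.to_string());
                } else {
                    redact_keys(child);
                }
            }
        }
        Json::Arr(items) => items.iter_mut().for_each(redact_keys),
        Json::Str(_) => {}
    }
}
```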

Known bulky provider fields are truncated in a shape-preserving way before the whole-payload byte budget is applied. Built-ins cover OpenAI-compatible image/audio/file payloads, Vertex Gemini inline data, and Vertex Anthropic base64 source data.

Processing order:

wrap the payload
apply built-in and admin-configured redaction rules
truncate known bulky fields while preserving JSON shape where possible
apply request_max_bytes or response_max_bytes as a final guardrail
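The final step, the byte budget, needs care with UTF-8. A minimal sketch, assuming the payload has already been serialized: apply_byte_budget is a hypothetical helper that backs off to a character boundary and reports whether it cut anything. This page does not describe how the real store marks a cut payload, so the sketch simply keeps a prefix, which is generally no longer valid JSON.

```rust
/// Caps an already redacted, shape-truncated serialized payload at
/// `max_bytes`, backing off to a UTF-8 character boundary so the kept text
/// stays valid. Returns the kept text and whether anything was cut.
fn apply_byte_budget(serialized: &str, max_bytes: usize) -> (&str, bool) {
    if serialized.len() <= max_bytes {
        return (serialized, false);
    }
    let mut end = max_bytes;
    // Index 0 is always a boundary, so this loop terminates.
    while !serialized.is_char_boundary(end) {
        end -= 1;
    }
    (&serialized[..end], true)
}
```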

For streams, the gateway keeps parsing every frame for usage and provider errors. Only stored event payloads are capped by stream_max_events; if the cap is hit, response_payload_truncated=true.
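The split between parsing and storing can be sketched as a capture type that always records usage and errors but stops appending events at the cap. StreamCapture and its fields are hypothetical; usage and error are passed in as strings to keep the sketch independent of any frame parser.

```rust
/// Hypothetical stream capture with a stored-event cap.
struct StreamCapture {
    max_events: usize,
    events: Vec<String>,
    latest_usage: Option<String>,
    error: Option<String>,
    response_payload_truncated: bool,
}

impl StreamCapture {
    fn new(max_events: usize) -> Self {
        Self {
            max_events,
            events: Vec::new(),
            latest_usage: None,
            error: None,
            response_payload_truncated: false,
        }
    }

    /// Called for every parsed frame. Usage and errors are always kept, even
    /// after the cap; only the stored event list stops growing.
    fn observe(&mut self, event: &str, usage: Option<&str>, error: Option<&str>) {
        if let Some(u) = usage {
            self.latest_usage = Some(u.to_string()); // latest usage wins
        }
        if let Some(e) = error {
            self.error = Some(e.to_string());
        }
        if self.events.len() < self.max_events {
            self.events.push(event.to_string());
        } else {
            self.response_payload_truncated = true;
        }
    }
}
```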

Recent Contract Cleanup

Recent cleanup changed the contract in a few important ways.

fallback-era request metadata is gone
provider execution attempts now live in request_log_attempts instead of summary metadata
missing request-log detail rows return strict 404 not_found
stream payload parsing is more boundary-safe than the earlier chunk-by-chunk behavior
budget-rejected chat requests record a budget_error request outcome without executing the provider

Admins and maintainers should stop expecting:

fallback metadata columns to appear in new request rows
nullable detail lookups for missing rows

Admin Observability APIs

Platform admins can inspect request logs and usage aggregates through:

GET /api/v1/admin/observability/leaderboard
GET /api/v1/admin/observability/harness-usage
GET /api/v1/admin/observability/request-logs
GET /api/v1/admin/observability/request-logs/{request_log_id}

Request-log list and detail responses include the row metadata, so admins can see the public operation for each row, such as chat_completions, responses, or embeddings, alongside the typed payload policy and truncation fields.

The MCP invocation admin UI consumes these generated admin API endpoints:

GET /api/v1/admin/observability/mcp-invocations
GET /api/v1/admin/observability/mcp-invocations/{mcp_tool_invocation_id}

Validate documentation-only edits to this page with mise run docs:check before handoff.

Usage Leaderboard

The leaderboard is a separate admin observability surface from spend reporting.

Endpoint:

GET /api/v1/admin/observability/leaderboard?range=7d|31d

Current semantics:

ranked by total spend over the selected range
ties sort by request count, then user name
chart cohort is the top 5 ranked users
table is the top 30 ranked users
per-user tool-cardinality averages use only rows where each dimension was recorded, so historical nulls do not dilute averages
time buckets are 12-hour UTC buckets and are zero-filled for chart stability
dominant model is chosen by request count, then spend, then model key
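The leaderboard ordering, the zero-filled 12-hour UTC buckets, and the null-aware averages can be sketched as follows. Assumptions: spend is held as an integer (spend_micros), spend and request count sort descending while user name sorts ascending, buckets align to 00:00 and 12:00 UTC, and timestamps are Unix seconds. UserUsage, rank, zero_filled_buckets, and average_recorded are illustrative names.

```rust
/// Hypothetical per-user aggregate over the selected range.
struct UserUsage {
    user_name: String,
    spend_micros: u64,
    request_count: u64,
}

/// Higher spend first, then more requests, then user name ascending.
fn rank(users: &mut [UserUsage]) {
    users.sort_by(|a, b| {
        b.spend_micros
            .cmp(&a.spend_micros)
            .then_with(|| b.request_count.cmp(&a.request_count))
            .then_with(|| a.user_name.cmp(&b.user_name))
    });
}

const BUCKET_SECS: u64 = 12 * 60 * 60;

/// Every 12-hour bucket start in [start, end) gets a count, even when no rows
/// fall in it, so the chart has no gaps.
fn zero_filled_buckets(start_secs: u64, end_secs: u64, row_times: &[u64]) -> Vec<(u64, u64)> {
    let first = start_secs / BUCKET_SECS * BUCKET_SECS;
    let mut buckets: Vec<(u64, u64)> = (first..end_secs)
        .step_by(BUCKET_SECS as usize)
        .map(|b| (b, 0))
        .collect();
    for &t in row_times {
        if t >= first && t < end_secs {
            buckets[((t - first) / BUCKET_SECS) as usize].1 += 1;
        }
    }
    buckets
}

/// Averages only recorded values, so historical NULLs do not dilute it.
fn average_recorded(values: &[Option<u32>]) -> Option<f64> {
    let recorded: Vec<u32> = values.iter().flatten().copied().collect();
    if recorded.is_empty() {
        return None;
    }
    Some(recorded.iter().map(|&v| f64::from(v)).sum::<f64>() / recorded.len() as f64)
}
```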

Use the leaderboard to identify recent high-usage users. Use spend reporting when the question is about owner totals, budgets, or pricing status counts.

Agent Harness Usage

Harness usage is a separate admin observability surface from the user leaderboard.

Endpoint:

GET /api/v1/admin/observability/harness-usage?range=7d|31d

Current semantics:

ranked by request count over the selected range
chart cohort is the top 5 ranked harnesses
table is the top 30 ranked harnesses
time buckets are 12-hour UTC buckets and are zero-filled for chart stability
aggregation groups by agent_harness_key, not raw User-Agent
bounded raw User-Agent values remain available in request-log detail for debugging
harness classification is self-reported from User-Agent and is not authenticated client identity

See the dedicated Agent Harness Usage page for the classifier contract and page behavior.

Request-log list filters for GET /api/v1/admin/observability/request-logs:

page
page_size
request_id
model_key
provider_key
status_code
user_id
team_id
service
component
env
tag_key
tag_value
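A hypothetical client-side helper for the list endpoint: it appends any subset of these filters as percent-encoded query parameters. The base URL is an example, request_logs_url and encode are illustrative names, and authentication is deliberately left out because this page does not describe it.

```rust
/// Builds a request-log list URL from (filter, value) pairs.
fn request_logs_url(base: &str, filters: &[(&str, &str)]) -> String {
    let mut url = format!(
        "{}/api/v1/admin/observability/request-logs",
        base.trim_end_matches('/')
    );
    let mut sep = '?';
    for (key, value) in filters {
        url.push(sep);
        url.push_str(&encode(key));
        url.push('=');
        url.push_str(&encode(value));
        sep = '&';
    }
    url
}

/// Percent-encodes every byte outside the RFC 3986 unreserved set.
fn encode(s: &str) -> String {
    let mut out = String::new();
    for b in s.bytes() {
        match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                out.push(b as char)
            }
            _ => out.push_str(&format!("%{b:02X}")),
        }
    }
    out
}
```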

Current Gaps

deploy examples do not ship an OTLP collector by default

Relationship to Spend Reporting

Request logs and spend accounting are related, but intentionally separate.

request logs describe the user-visible request outcome
usage_cost_events is the canonical spend ledger

For the full request path across both systems, use request-lifecycle-and-failure-modes.md.
