Observability and Request Logs
See also: Request Logs, MCP Invocations, MCP Registry and Discovery, Tagging, Agent Harness Usage, Data Relationships, Service Accounts, Model Routing and API Behavior, Provider API Compatibility, Request Lifecycle and Failure Modes, Admin Control Plane, Deploy and Operations, ADR: Team Service Accounts for Non-Human Gateway Access, ADR: OTLP-First Observability and Payload-Backed Request Logs, ADR: Route-Level Provider API Compatibility Profiles, ADR: MCP Tool Cardinality Observability, ADR: External MCP Registry and Discovery Boundary
This document describes the live observability contract for the gateway.
Observability Pages
- Request Logs: request-scoped admin list/detail, payload policy state, provider attempts, and request-level tool cardinality.
- MCP Invocations: per-tool MCP audit rows, request correlation, owner context, authorization policy result, latency, and redacted argument/result metadata.
- MCP Registry and Discovery: platform-admin registry management, recommended catalog import flow, Streamable HTTP discovery, auth declarations, and stable MCP server/tool ids.
Source of Truth
- observability bootstrap:
- HTTP request instrumentation:
- request-log lifecycle:
- request-attempt persistence:
- redaction policy:
- admin APIs:
OTLP-First Model
The gateway exports tracing spans and metrics through OpenTelemetry.
Current config knobs:
- server.otel_endpoint
- server.otel_metrics_endpoint
- server.otel_export_interval_secs
The intended deploy path is collector-friendly OTLP export rather than an in-process Prometheus endpoint.
Kubernetes Helm Wiring
The Helm chart keeps observability generic. It does not install an OpenTelemetry Collector, Datadog Agent, or any other collector.
Kubernetes installs can use:
- gateway.config.server.otel_endpoint
- gateway.config.server.otel_metrics_endpoint
- gateway.config.server.otel_export_interval_secs
- observability.env
- observability.podLabels
- observability.podAnnotations
- observability.volumes
- observability.volumeMounts
- observability.sidecars
Common shapes:
- existing in-cluster collector Service: set OTLP endpoints in gateway.config.server.*
- DaemonSet collector or vendor agent: point OTLP endpoints at the node-local or service DNS address documented by that deployment
- sidecar collector: add the collector container with observability.sidecars and point OTLP endpoints at localhost
- annotation-driven scraping or injection: set the required labels and annotations through observability.podLabels and observability.podAnnotations
Use Kubernetes and Helm for chart-level values and examples.
OpenTelemetry Collector service example:
yaml
gateway:
  config:
    server:
      otel_endpoint: http://otel-collector.observability.svc:4317
      otel_metrics_endpoint: http://otel-collector.observability.svc:4317
observability:
  env:
    - name: OTEL_SERVICE_NAME
      value: oceans-llm-gateway
Datadog Agent OTLP example:
yaml
gateway:
  config:
    server:
      otel_endpoint: http://datadog-agent.datadog.svc:4317
      otel_metrics_endpoint: http://datadog-agent.datadog.svc:4317
observability:
  podLabels:
    tags.datadoghq.com/service: oceans-llm-gateway
    tags.datadoghq.com/env: production
What Gets Recorded
The runtime emits bounded request-level signals for:
- API request totals
- request latency
- request outcomes
- token totals
- priced spend metric totals
- usage-record totals by pricing status
- request tool-cardinality histograms
- caller request tags for filtering and attribution; see Tagging
Request correlation is anchored on x-request-id. The HTTP middleware boundary owns request-id generation and propagation: caller-provided values are preserved, and missing values are generated once before handlers run.
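The request-id rule can be pictured with a minimal sketch. The function name and the UUID-based id format are assumptions made for illustration; the gateway's real middleware and id generator may differ.

```python
import uuid

REQUEST_ID_HEADER = "x-request-id"


def ensure_request_id(headers: dict) -> dict:
    """Return normalized headers that always carry x-request-id."""
    # Header names are case-insensitive, so normalize keys before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    if not normalized.get(REQUEST_ID_HEADER, "").strip():
        # Hypothetical id format: a random UUID rendered as 32 hex characters.
        normalized[REQUEST_ID_HEADER] = uuid.uuid4().hex
    return normalized
```

Calling the function a second time on its own output leaves the id unchanged, which mirrors the "generated once" behavior described above.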
Request outcomes are emitted once per request with bounded labels. Important examples in this slice are:
- budget_error for pre-provider hard-limit rejection
- invalid_request for capability mismatch
- upstream_error for upstream execution or stream failure
Tagging and Attribution
The request-log surface records caller-supplied request tags for filtering and attribution. The tag header contract, validation rules, examples, and identity tag guidance live in Tagging.
Request tags are request-scoped. User and team tags are durable identity metadata managed by admins. Keep those two surfaces distinct when exporting or reconciling observability data.
Request Log Storage Shape
Request logs are intentionally split:
- request_logs: hot summary row
- request_log_payloads: sanitized request and response bodies
- request_log_tags: bounded bespoke caller tags
- request_log_attempts: ordered upstream provider execution attempts
The summary row stores:
- request identity
- owner identity
- service-account identity for non-human callers
- requested and resolved model identity
- provider key
- universal caller tags
- status, latency, and usage totals
- typed MCP and tool cardinality counts
- bounded raw User-Agent and normalized agent harness key/label
- truncation flags
- metadata such as operation, stream, and payload_policy
operation is the public API family. Current values include chat_completions, responses, and embeddings.
Request-attempt rows describe upstream provider execution only. Pre-provider failures such as authentication rejection, capability mismatch, route unavailability, or budget hard-limit rejection have zero attempts. In the current runtime, successful provider-backed requests record one terminal attempt. Retry and fallback execution remain disabled until the configurable policy tracked in issue #118 is implemented.
Tool-cardinality fields are explicit nullable columns on request_logs.
- exposed_tool_count: shallow count of OpenAI-compatible request tools.
- invoked_tool_count: count of tool-call artifacts observed in normalized provider output.
- referenced_mcp_server_count: nullable until an MCP access/filtering layer records server exposure.
- filtered_tool_count: nullable until an MCP access/filtering layer records filtered or denied tools.
New Chat Completions and Responses rows record 0 for exposed and invoked counts when no tools are present. Historical rows and unavailable MCP-specific dimensions remain null. Admin surfaces render null as n/a and preserve real zeroes.
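The null-versus-zero distinction matters when rendering these columns. A minimal sketch, with a hypothetical helper name, of the display rule:

```python
from typing import Optional


def render_tool_count(value: Optional[int]) -> str:
    """Render a nullable tool-cardinality column for an admin surface."""
    # None means the dimension was never recorded; 0 is a real measurement.
    return "n/a" if value is None else str(value)
```

Testing `value is None` rather than truthiness is the point of the sketch: a truthiness check would collapse a recorded zero into n/a.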
Streaming requests persist a bounded transcript payload rather than raw transport bytes.
The stream payload contract is incremental rather than chunk-local:
- UTF-8 is reassembled across transport chunk boundaries
- SSE data: frames are reassembled across chunk boundaries
- both data: forms are accepted, with or without a space after the colon
- the latest coherent usage object is retained for request-log and ledger work
- Responses streams also retain usage from response.usage on completed response events
- streaming tool-call artifacts increment invoked_tool_count while SSE frames are parsed for request logging
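The incremental contract above can be sketched as a small parser. This is an illustration under stated assumptions, not the gateway's implementation: the class name is hypothetical, only newline-terminated lines are handled, and tool-call counting is left out.

```python
import codecs
import json
from typing import List, Optional


class SseTranscript:
    """Reassemble SSE data frames from arbitrarily split byte chunks."""

    def __init__(self) -> None:
        # The incremental decoder buffers partial multi-byte UTF-8 sequences.
        self._decoder = codecs.getincrementaldecoder("utf-8")()
        self._pending = ""  # decoded text not yet terminated by a newline
        self._data_lines: List[str] = []
        self.events: List[str] = []
        self.usage: Optional[dict] = None

    def feed(self, chunk: bytes) -> None:
        self._pending += self._decoder.decode(chunk)
        while "\n" in self._pending:
            line, self._pending = self._pending.split("\n", 1)
            self._handle_line(line.rstrip("\r"))

    def _handle_line(self, line: str) -> None:
        if line == "":
            # A blank line terminates the current event.
            if self._data_lines:
                self._finish_event("\n".join(self._data_lines))
                self._data_lines = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Accept the field with or without a space after the colon.
            self._data_lines.append(value[1:] if value.startswith(" ") else value)

    def _finish_event(self, data: str) -> None:
        self.events.append(data)
        try:
            payload = json.loads(data)
        except json.JSONDecodeError:
            return  # non-JSON frames such as the [DONE] sentinel
        if not isinstance(payload, dict):
            return
        usage = payload.get("usage")
        response = payload.get("response")
        if not isinstance(usage, dict) and isinstance(response, dict):
            # Responses-style events carry usage under response.usage.
            usage = response.get("usage")
        if isinstance(usage, dict):
            self.usage = usage  # keep the latest coherent usage object
```

Feeding the same stream one byte at a time or as a single chunk yields the same events and usage, which is the property the chunk-boundary rules are meant to guarantee.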
Request-log payloads are user-visible artifacts. They do not persist the transformed outbound provider request body produced by route compatibility profiles.
Provider stream transcripts can include normalized compatibility output, such as promoted usage or canonical reasoning deltas, because that normalized stream is what the gateway returns to callers. Responses streams preserve response.* event names and payloads rather than being rewritten into Chat Completions chunks.
Request Log Retention and Purge
Request-log retention is admin-controlled. The supported retention windows are intentionally small and explicit:
- 1d
- 3d
- 7d
The default retention window is 7d. Admins can run the purge command manually before enabling any recurring cleanup:
bash
mise run gateway-purge-request-logs-dry-run
mise run gateway-purge-request-logs
Use --dry-run first in production-shaped environments. A dry run reports how many parent request-log rows are older than the selected retention cutoff without deleting data.
When the command runs without --dry-run, it deletes matching request_logs rows and their request-log children:
- request_log_payloads
- request_log_tags
- request_log_attempts
Admins should not hand-delete only one request-log table. Manual partial deletion can leave observability detail misleading even when database constraints prevent direct orphan rows.
Recurring purge is disabled by default and must be opted into from config. Use a standard cron expression and keep the schedule daily or less frequent:
yaml
request_logging:
  purge:
    enabled: false
    retention: 7d
    schedule: "0 0 * * *"
Runtime safety rules:
- enabled defaults to false
- retention defaults to 7d
- only 1d, 3d, and 7d are valid windows
- schedule uses standard 5-field cron syntax
- recurring schedules must not be more frequent than daily
- each gateway process starts its own recurring worker when enabled
- the runtime keeps a UTC-day guard so a recurring worker cannot purge more than once per day even if a bad schedule is supplied
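Two of the rules above, the fixed retention windows and the UTC-day guard, can be sketched as follows. The names are hypothetical, and the guard here is held in process memory, which is an assumption; the runtime may track the last purge day differently.

```python
from datetime import date, datetime, timedelta, timezone
from typing import Optional

RETENTION_WINDOWS = {
    "1d": timedelta(days=1),
    "3d": timedelta(days=3),
    "7d": timedelta(days=7),
}


def retention_cutoff(retention: str, now: datetime) -> datetime:
    """Return the UTC instant before which request-log rows are purgeable."""
    if retention not in RETENTION_WINDOWS:
        raise ValueError(f"unsupported retention window: {retention!r}")
    return now.astimezone(timezone.utc) - RETENTION_WINDOWS[retention]


class UtcDayGuard:
    """Allow at most one purge per UTC calendar day."""

    def __init__(self) -> None:
        self._last_run_day: Optional[date] = None

    def try_acquire(self, now: datetime) -> bool:
        today = now.astimezone(timezone.utc).date()
        if self._last_run_day == today:
            return False  # already purged today, even if the schedule fired again
        self._last_run_day = today
        return True
```

With a guard like this, a schedule that fires every minute still results in a single purge per UTC day.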
Retention only affects operational request-log tables. It does not delete spend ledger rows in usage_cost_events, budget history, provider config, model config, users, teams, or API keys.
Payload Policy
Chat-completion request-log payload persistence is controlled by request_logging.payloads in gateway.yaml.
Default config:
yaml
request_logging:
  payloads:
    capture_mode: redacted_payloads
    request_max_bytes: 65536
    response_max_bytes: 65536
    stream_max_events: 128
    redaction_paths: []
Capture modes:
- disabled: skip request-log persistence for chat completions
- summary_only: write request_logs summary rows with has_payload=false; do not write request_log_payloads
- redacted_payloads: write summary rows and sanitized payload rows
The policy is read from YAML only. The admin UI displays the policy used for each row, but does not edit it.
Owner behavior also matters:
- user-owned API keys honor users.request_logging_enabled
- service-account credentials always persist request-log summary rows
This is why a user-owned request can be absent from request logs while a service-account request with the same payload policy is still visible.
Validation rules:
- request_max_bytes must be greater than zero
- response_max_bytes must be greater than zero
- stream_max_events must be greater than zero
- redaction_paths must use dot-separated object keys, with * as a full-segment wildcard
- paths are anchored from the wrapped payload root, for example body.messages.*.content.*.image_url.url
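A minimal sketch of how a redaction path could be applied to a wrapped payload. The function names and the placeholder value are hypothetical. The sketch also assumes that * steps through list elements as well as object keys, which the example path implies but this page does not state outright.

```python
from typing import Any, List

REDACTED = "[REDACTED]"  # hypothetical placeholder value


def redact_path(payload: Any, path: str) -> None:
    """Redact, in place, every value reached by a dot-separated path."""
    _walk(payload, path.split("."))


def _walk(node: Any, segments: List[str]) -> None:
    head, rest = segments[0], segments[1:]
    if isinstance(node, dict):
        if head == "*":
            keys = list(node)          # wildcard: every key of the object
        elif head in node:
            keys = [head]              # exact key match only
        else:
            keys = []
    elif isinstance(node, list) and head == "*":
        keys = list(range(len(node)))  # assumption: wildcard spans list items
    else:
        return                         # scalars and unmatched shapes are left alone
    for key in keys:
        if rest:
            _walk(node[key], rest)
        else:
            node[key] = REDACTED
```

Because * only ever matches a whole segment, a path such as body.mess* matches nothing unless a key with that literal name exists.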
Each request-log row persists lightweight policy metadata in request_logs.metadata_json:
json
{
  "operation": "chat_completions",
  "stream": false,
  "payload_policy": {
    "capture_mode": "redacted_payloads",
    "request_max_bytes": 65536,
    "response_max_bytes": 65536,
    "stream_max_events": 128,
    "version": "builtin:v1"
  }
}
Redaction and Truncation Boundaries
Payloads are wrapped before policy application:
- requests: { "headers": ..., "body": ... }
- responses: { "body": ... }
- streams: { "stream": true, "events": ..., "usage": ..., "error": ... }
Redaction applies one explicit built-in policy plus additive admin-configured paths from request_logging.payloads.redaction_paths.
Sensitive built-in headers include:
- authorization
- anthropic-api-key
- cookie
- set-cookie
- x-goog-api-key
- x-api-key
Sensitive built-in JSON keys include:
- token
- access_token
- refresh_token
- api_key
- anthropic_api_key
- client_secret
- credentials
- private_key
- secret
- password
Known bulky provider fields are shape-preserving truncated before the whole-payload byte budget is applied. Built-ins cover OpenAI-compatible image/audio/file payloads, Vertex Gemini inline data, and Vertex Anthropic base64 source data.
Processing order:
- wrap the payload
- apply built-in and admin-configured redaction rules
- truncate known bulky fields while preserving JSON shape where possible
- apply request_max_bytes or response_max_bytes as a final guardrail
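The four steps can be sketched for a request payload as follows. Everything here is illustrative: the key sets are subsets of the built-ins, the data-URL heuristic stands in for the real bulky-field rules, and the over-budget fallback marker is an assumption.

```python
import json
from typing import Any, Tuple

SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}
SENSITIVE_KEYS = {"token", "api_key", "password", "secret"}
REDACTED = "[REDACTED]"  # hypothetical placeholder value
BULKY_LIMIT = 64         # hypothetical per-field character limit


def redact(node: Any) -> Any:
    """Replace values stored under sensitive JSON keys at any depth."""
    if isinstance(node, dict):
        return {
            key: REDACTED if key.lower() in SENSITIVE_KEYS else redact(value)
            for key, value in node.items()
        }
    if isinstance(node, list):
        return [redact(item) for item in node]
    return node


def truncate_bulky(node: Any) -> Tuple[Any, bool]:
    """Shorten long data URLs; a stand-in for the real bulky-field rules."""
    if isinstance(node, dict):
        cut = {key: truncate_bulky(value) for key, value in node.items()}
        return ({k: v for k, (v, _) in cut.items()},
                any(flag for _, flag in cut.values()))
    if isinstance(node, list):
        items = [truncate_bulky(item) for item in node]
        return [v for v, _ in items], any(flag for _, flag in items)
    if isinstance(node, str) and node.startswith("data:") and len(node) > BULKY_LIMIT:
        return node[:BULKY_LIMIT] + "...[truncated]", True
    return node, False


def sanitize_request(headers: dict, body: Any, request_max_bytes: int) -> Tuple[Any, bool]:
    # 1. Wrap the payload.
    wrapped = {"headers": headers, "body": body}
    # 2. Redact sensitive headers and JSON keys.
    wrapped = {
        "headers": {
            name: REDACTED if name.lower() in SENSITIVE_HEADERS else value
            for name, value in wrapped["headers"].items()
        },
        "body": redact(wrapped["body"]),
    }
    # 3. Truncate bulky fields while keeping the JSON shape.
    wrapped["body"], truncated = truncate_bulky(wrapped["body"])
    # 4. Enforce the whole-payload byte budget as the last guardrail.
    size = len(json.dumps(wrapped).encode("utf-8"))
    if size > request_max_bytes:
        # Hypothetical fallback: store a marker instead of the oversized payload.
        return {"truncated": True, "original_bytes": size}, True
    return wrapped, truncated
```

Running redaction before truncation means a secret is never partially preserved by a truncation step, and running the byte budget last means it only triggers for payloads that are still too large after the shape-preserving pass.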
For streams, the gateway keeps parsing every frame for usage and provider errors. Only stored event payloads are capped by stream_max_events; if the cap is hit, response_payload_truncated=true.
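A minimal sketch of that split between parsing and storage, assuming events have already been parsed into dictionaries. The function name and event shapes are hypothetical.

```python
from typing import Any, Dict, List, Tuple


def record_stream(events: List[Dict[str, Any]], stream_max_events: int) -> Tuple[Dict[str, Any], bool]:
    """Build the stored stream payload and report whether it was truncated."""
    stored: List[Dict[str, Any]] = []
    usage = None
    error = None
    truncated = False
    for event in events:
        # Every frame is inspected, including frames past the storage cap.
        if isinstance(event.get("usage"), dict):
            usage = event["usage"]
        if event.get("error") is not None:
            error = event["error"]
        if len(stored) < stream_max_events:
            stored.append(event)
        else:
            truncated = True  # maps to response_payload_truncated=true
    payload = {"stream": True, "events": stored, "usage": usage, "error": error}
    return payload, truncated
```

In this sketch a usage object that arrives in the final frame is still captured even when the stored transcript stopped at the cap.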
Recent Contract Cleanup
Recent cleanup changed the contract in a few important ways.
- fallback-era request metadata is gone
- provider execution attempts now live in request_log_attempts instead of summary metadata
- missing request-log detail rows return strict 404 not_found
- stream payload parsing is more boundary-safe than the earlier chunk-by-chunk behavior
- budget-rejected chat requests record a budget_error request outcome without executing the provider
Admins and maintainers should stop expecting:
- fallback metadata columns to appear in new request rows
- nullable detail lookups for missing rows
Admin Observability APIs
Platform admins can inspect request logs through:
- GET /api/v1/admin/observability/leaderboard
- GET /api/v1/admin/observability/harness-usage
- GET /api/v1/admin/observability/request-logs
- GET /api/v1/admin/observability/request-logs/{request_log_id}
Request-log list and detail responses include the row metadata, so admins can see the public operation for each row, such as chat_completions, responses, or embeddings, alongside the typed payload policy and truncation fields.
The MCP invocation admin UI consumes these generated admin API endpoints:
- GET /api/v1/admin/observability/mcp-invocations
- GET /api/v1/admin/observability/mcp-invocations/{mcp_tool_invocation_id}
Validate documentation-only edits to this page with mise run docs:check before handoff.
Usage Leaderboard
The leaderboard is a separate admin observability surface from spend reporting.
Endpoint:
GET /api/v1/admin/observability/leaderboard?range=7d|31d
Current semantics:
- ranked by total spend over the selected range
- ties sort by request count, then user name
- chart cohort is the top 5 ranked users
- table is the top 30 ranked users
- per-user tool-cardinality averages use only rows where each dimension was recorded, so historical nulls do not dilute averages
- time buckets are 12-hour UTC buckets and are zero-filled for chart stability
- dominant model is chosen by request count, then spend, then model key
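The ranking and null-aware averaging rules can be sketched as below. The record type and function names are hypothetical, and the sort directions are assumptions: spend and request count descending, user name ascending.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class UserUsage:
    user_name: str
    total_spend: float
    request_count: int


def rank_users(rows: Sequence[UserUsage]) -> Tuple[List[UserUsage], List[UserUsage]]:
    """Return (table rows, chart cohort) using the leaderboard ordering."""
    ranked = sorted(
        rows,
        key=lambda r: (-r.total_spend, -r.request_count, r.user_name),
    )
    return ranked[:30], ranked[:5]


def mean_recorded(values: Sequence[Optional[int]]) -> Optional[float]:
    """Average only recorded values so historical nulls do not dilute it."""
    recorded = [v for v in values if v is not None]
    return sum(recorded) / len(recorded) if recorded else None
```

Returning None when nothing was recorded keeps the average consistent with the n/a rendering used for null tool-cardinality columns.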
Use the leaderboard to identify recent high-usage users. Use spend reporting when the question is about owner totals, budgets, or pricing status counts.
Agent Harness Usage
Harness usage is a separate admin observability surface from the user leaderboard.
Endpoint:
GET /api/v1/admin/observability/harness-usage?range=7d|31d
Current semantics:
- ranked by request count over the selected range
- chart cohort is the top 5 ranked harnesses
- table is the top 30 ranked harnesses
- time buckets are 12-hour UTC buckets and are zero-filled for chart stability
- aggregation groups by agent_harness_key, not raw User-Agent
- bounded raw User-Agent values remain available in request-log detail for debugging
- harness classification is self-reported from User-Agent and is not authenticated client identity
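The 12-hour zero-filled bucketing used by the chart can be sketched as follows. The function names are hypothetical, and aligning buckets to 00:00 and 12:00 UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, Iterable

BUCKET = timedelta(hours=12)


def bucket_start(ts: datetime) -> datetime:
    """Floor a timestamp to the start of its 12-hour UTC bucket."""
    ts = ts.astimezone(timezone.utc)
    return ts.replace(hour=0 if ts.hour < 12 else 12, minute=0, second=0, microsecond=0)


def zero_filled_counts(
    timestamps: Iterable[datetime], range_start: datetime, range_end: datetime
) -> Dict[datetime, int]:
    """Count requests per bucket, emitting every bucket in the range."""
    counts: Dict[datetime, int] = {}
    cursor = bucket_start(range_start)
    while cursor < range_end:
        counts[cursor] = 0  # zero-fill so charts keep a stable x-axis
        cursor += BUCKET
    for ts in timestamps:
        key = bucket_start(ts)
        if key in counts:
            counts[key] += 1
    return counts
```

A two-day range always produces four buckets under this sketch, whether or not any requests fall inside them.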
Use Agent Harness Usage for the classifier contract and page behavior.
Request-log list filters:
- page
- page_size
- request_id
- model_key
- provider_key
- status_code
- user_id
- team_id
- service
- component
- env
- tag_key
- tag_value
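A minimal sketch of building a request-log list URL from these filters. It assumes the filters are passed as query-string parameters, which this page does not state explicitly; the helper name and the example host are hypothetical.

```python
from urllib.parse import urlencode

REQUEST_LOGS_PATH = "/api/v1/admin/observability/request-logs"
ALLOWED_FILTERS = (
    "page", "page_size", "request_id", "model_key", "provider_key",
    "status_code", "user_id", "team_id", "service", "component",
    "env", "tag_key", "tag_value",
)


def request_logs_url(base_url: str, **filters) -> str:
    """Build a list URL, rejecting filter names this page does not document."""
    unknown = sorted(set(filters) - set(ALLOWED_FILTERS))
    if unknown:
        raise ValueError(f"unsupported filters: {unknown}")
    # Drop unset filters, then percent-encode the remaining values.
    query = urlencode({k: v for k, v in filters.items() if v is not None})
    url = base_url.rstrip("/") + REQUEST_LOGS_PATH
    return f"{url}?{query}" if query else url
```

For example, `request_logs_url("https://gateway.example.com", status_code=429, tag_key="team")` yields a URL that filters on both fields.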
Current Gaps
- deploy examples do not ship an OTLP collector by default
Relationship to Spend Reporting
Request logs and spend accounting are related, but intentionally separate.
- request logs describe the user-visible request outcome
- usage_cost_events is the canonical spend ledger
For the full request path across both systems, use request-lifecycle-and-failure-modes.md.
