INTG-STD-029 · v1.1.0 · MANDATORY · INTEGRATION · standard

Observability

Purpose

Every integration action - API call, event, file transfer, or agent handshake - MUST be traceable from origin to destination. This standard establishes mandatory requirements for distributed tracing (W3C Trace Context), structured logging (JSON), and metrics collection (OpenTelemetry) across all integration touchpoints. It also codifies what MUST NOT appear in logs to prevent data leakage per the OWASP Logging Cheat Sheet.

Rules

R-1: W3C Trace Context Propagation

Key W3C Trace Context terms:

  • traceparent — carries the trace ID (globally unique ID for the entire request chain), the parent span ID (ID of the immediate upstream operation), and trace flags (sampling state). Format: {version}-{trace-id}-{parent-id}-{trace-flags}.
  • tracestate — carries vendor-specific tracing metadata in key=value pairs, forwarded alongside traceparent through the chain.
  • Trace ID — 32 hex characters identifying the entire distributed operation across all services.
  • Span ID — 16 hex characters identifying a single unit of work within the trace.

All integration endpoints MUST propagate the traceparent HTTP header per the W3C Trace Context specification:

traceparent: {version}-{trace-id}-{parent-id}-{trace-flags}
Example: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01
  • Services MUST NOT generate all-zero trace-id or parent-id values.
  • If a request arrives without traceparent, the receiving service MUST generate a new trace context.
  • Services SHOULD propagate the tracestate header alongside traceparent.
  • Services MUST NOT modify or strip tracestate entries they do not own.
  • For non-HTTP transports, trace context MUST be propagated via the protocol's native metadata mechanism (gRPC metadata keys, Kafka message headers, AMQP application properties, batch file manifest metadata).
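The propagation rules above can be sketched as a minimal, dependency-free Python illustration. It is not a replacement for the OpenTelemetry SDK's W3C propagator, and the names `parse_traceparent`, `start_span`, and `SpanContext` are hypothetical. The sketch accepts only version `00` in lowercase hex (matching the traceparent validation pattern under Enforcement Rules), rejects all-zero IDs, and never generates them. As a design choice it also treats a malformed header like a missing one, which mirrors the W3C specification's guidance to restart the trace when `traceparent` cannot be parsed. `tracestate` is not parsed here; it would be forwarded verbatim.

```python
import re
import secrets
from dataclasses import dataclass
from typing import Optional, Tuple

# Version 00 only, lowercase hex, exact field widths.
TRACEPARENT_RE = re.compile(r"00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})")


def parse_traceparent(header: Optional[str]) -> Optional[Tuple[str, str, str]]:
    """Return (trace_id, parent_id, trace_flags), or None if absent or invalid."""
    if not header:
        return None
    match = TRACEPARENT_RE.fullmatch(header.strip())
    if match is None:
        return None
    trace_id, parent_id, flags = match.groups()
    if trace_id == "0" * 32 or parent_id == "0" * 16:
        return None  # all-zero IDs are invalid
    return trace_id, parent_id, flags


def _random_hex_id(n_bytes: int) -> str:
    """Random lowercase hex ID that is never all zeros."""
    while True:
        value = secrets.token_hex(n_bytes)
        if value.strip("0"):
            return value


@dataclass(frozen=True)
class SpanContext:
    trace_id: str
    span_id: str              # this service's span
    parent_id: Optional[str]  # upstream span; None when this service started the trace
    trace_flags: str

    def traceparent(self) -> str:
        """Header value for outbound calls: this span becomes the downstream parent."""
        return f"00-{self.trace_id}-{self.span_id}-{self.trace_flags}"


def start_span(incoming: Optional[str], sampled: bool = True) -> SpanContext:
    """Continue the upstream trace if the header is usable, otherwise start a new one."""
    upstream = parse_traceparent(incoming)
    if upstream is None:
        flags = "01" if sampled else "00"
        return SpanContext(_random_hex_id(16), _random_hex_id(8), None, flags)
    trace_id, parent_id, flags = upstream
    return SpanContext(trace_id, _random_hex_id(8), parent_id, flags)
```

For example, `start_span("00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01")` keeps the trace ID and records `b9c7c989f97918e1` as the parent, while `start_span(None)` begins a fresh trace, which is also what a gateway would do for external requests that arrive without the header.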

R-2: Correlation IDs

All API responses MUST include an X-Request-ID header containing a UUID v4 or ULID.

What is a ULID? A ULID (Universally Unique Lexicographically Sortable Identifier) is an alternative to UUID v4 that encodes a 48-bit millisecond timestamp in its first 10 characters, making ULIDs sortable by creation time. Format: 26 uppercase characters from Crockford's Base32 alphabet, which omits I, L, O, and U (e.g., 01HZX3KQVB8E72GQJHF5RM6YWN). ULIDs are preferable to UUID v4 where time-ordered identifiers are useful (e.g., tracing request sequences in logs).
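To make the ULID layout concrete, here is a minimal Python sketch of generation and timestamp decoding, assuming the standard ULID layout of a 48-bit Unix-millisecond timestamp followed by 80 random bits. `new_ulid` and `ulid_timestamp_ms` are hypothetical helper names; production code would normally use a maintained ULID library, and this sketch does not implement the monotonic mode for IDs created within the same millisecond.

```python
import os
import time
from typing import Optional

# Crockford Base32: digits then letters, skipping I, L, O, U. The alphabet is in
# ASCII order, so comparing ULID strings compares their underlying 128-bit values.
CROCKFORD = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def new_ulid(timestamp_ms: Optional[int] = None) -> str:
    """48-bit Unix time in ms followed by 80 random bits, as 26 Base32 characters."""
    if timestamp_ms is None:
        timestamp_ms = time.time_ns() // 1_000_000
    if not 0 <= timestamp_ms < 2**48:
        raise ValueError("timestamp out of ULID range")
    value = (timestamp_ms << 80) | int.from_bytes(os.urandom(10), "big")
    chars = []
    for _ in range(26):  # 26 x 5 bits = 130 bits; the top 2 bits are always zero
        chars.append(CROCKFORD[value & 0x1F])
        value >>= 5
    return "".join(reversed(chars))


def ulid_timestamp_ms(ulid: str) -> int:
    """Decode the creation time (ms since the Unix epoch) from the first 10 characters."""
    value = 0
    for ch in ulid[:10]:
        value = value * 32 + CROCKFORD.index(ch)
    return value
```

Because the first 10 characters carry the timestamp, ULIDs created in different milliseconds sort in creation order as plain strings.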

  • If the incoming request includes X-Request-ID, the service MUST echo that value.
  • If the incoming request does not include X-Request-ID, the service MUST generate one.
  • The X-Request-ID MUST be included in all log entries for that request via the request_id field.
  • The X-Request-ID is distinct from trace-id: it is a business-level identifier that MAY be shared with API consumers for support purposes, while trace-id is an internal tracing concern that MAY be regenerated at trust boundaries.
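The echo-or-generate rule for X-Request-ID can be sketched framework-agnostically in Python, with request headers modelled as a plain mapping. `resolve_request_id` and `response_headers` are hypothetical names. The standard only requires echoing an incoming value, so whether a service should also validate it (for example against the X-Request-ID patterns under Enforcement Rules) is left open here.

```python
import uuid
from typing import Mapping


def resolve_request_id(incoming_headers: Mapping[str, str]) -> str:
    """Echo the caller's X-Request-ID if present, otherwise generate a UUID v4."""
    for name, value in incoming_headers.items():
        # HTTP header names are case-insensitive.
        if name.lower() == "x-request-id" and value.strip():
            return value.strip()
    return str(uuid.uuid4())


def response_headers(request_id: str) -> dict:
    """Every response carries the same ID that was attached to the request's logs."""
    return {"X-Request-ID": request_id}


if __name__ == "__main__":
    rid = resolve_request_id({"x-request-id": "f47ac10b-58cc-4372-a567-0e02b2c3d479"})
    log_fields = {"request_id": rid}  # included in every log entry for this request
    print(response_headers(rid), log_fields)
```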

R-3: Structured Logging Format

All integration components MUST emit logs in JSON format. Unstructured log output MUST NOT be used beyond local development.

Required fields:

| Field | Type | Description | Example |
|---|---|---|---|
| timestamp | string | ISO 8601 with UTC (Z), microsecond precision | "2026-03-28T14:32:01.482319Z" |
| level | string | Log severity (uppercase) | "INFO" |
| trace_id | string | W3C trace identifier (32 hex chars) | "0af7651916cd43dd8448eb211c80319c" |
| span_id | string | Current span identifier (16 hex chars) | "b9c7c989f97918e1" |
| service | string | Service name (lowercase, hyphenated) | "order-service" |
| message | string | Human-readable event description | "Payment authorization completed" |

Optional fields SHOULD be included when applicable: request_id, environment, version, operation, duration_ms, http.method, http.status_code, http.url (sensitive parameters redacted), error.type, error.message.

The service field MUST match the OpenTelemetry service.name resource attribute.
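One way to emit the R-3 shape from Python's standard `logging` module is a custom formatter, sketched below under stated assumptions. `SERVICE_NAME` is a hypothetical value that would have to equal the OpenTelemetry `service.name` resource attribute, and `trace_id`/`span_id` are expected to arrive via `extra` (in a real service they would typically be copied from the active span). Python's stdlib names two levels differently from R-5, so `WARNING` and `CRITICAL` are mapped to `WARN` and `FATAL`.

```python
import json
import logging
from datetime import datetime, timezone

# Hypothetical; per R-3 it must equal the OpenTelemetry service.name resource attribute.
SERVICE_NAME = "order-service"

# Stdlib level names that differ from the R-5 vocabulary.
LEVEL_NAMES = {"WARNING": "WARN", "CRITICAL": "FATAL"}

OPTIONAL_FIELDS = ("request_id", "environment", "version", "operation", "duration_ms",
                   "http.method", "http.status_code", "http.url", "error.type", "error.message")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line with the R-3 required fields."""

    def format(self, record: logging.LogRecord) -> str:
        created = datetime.fromtimestamp(record.created, tz=timezone.utc)
        entry = {
            # %f yields microseconds; the literal Z marks UTC.
            "timestamp": created.strftime("%Y-%m-%dT%H:%M:%S.%fZ"),
            "level": LEVEL_NAMES.get(record.levelname, record.levelname),
            # Supplied by the caller via `extra`; None (JSON null) if missing.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
            "service": SERVICE_NAME,
            "message": record.getMessage(),
        }
        for field in OPTIONAL_FIELDS:
            value = getattr(record, field, None)
            if value is not None:
                entry[field] = value
        return json.dumps(entry, ensure_ascii=False)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("order-service")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info(
        "Order created successfully",
        extra={
            "trace_id": "0af7651916cd43dd8448eb211c80319c",
            "span_id": "b9c7c989f97918e1",
            "request_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
            "operation": "createOrder",
        },
    )
```

Emitting `null` for a missing trace ID, rather than inventing one, keeps the gap visible to the validation described under Enforcement Rules.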

R-4: Prohibited Log Content

Log entries MUST NOT contain:

| Category | Examples |
|---|---|
| PII | Names, emails, phone numbers, government IDs, dates of birth |
| Authentication credentials | Passwords, API keys, bearer tokens, JWTs, session IDs |
| Cryptographic material | Private keys, certificates, encryption keys |
| Financial data | Full card numbers, bank account numbers, CVV codes |
| Health data | Medical records, diagnoses, treatment information |
| Full request/response bodies | Use truncated or summarized representations instead |

If debugging requires logging data in any of these categories, it MUST be masked before writing (e.g., "d***@example.com", "****-****-****-4242", "sk-prod-****") or reduced to its schema shape only (e.g., "body_keys: [\"name\", \"address\"]").
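The masking formats listed above can be produced by small helpers such as the Python sketch below. The function names and the 8-character visible prefix for secrets are hypothetical choices; the sketch masks values that are already known to be sensitive and does not attempt to detect sensitive data inside free text.

```python
import re
from typing import Any, Dict


def mask_email(email: str) -> str:
    """Keep the first character of the local part plus the domain: d***@example.com."""
    local, at, domain = email.partition("@")
    if not at or not local:
        return "***"
    return f"{local[0]}***@{domain}"


def mask_card(number: str) -> str:
    """Keep only the last four digits: ****-****-****-4242."""
    digits = re.sub(r"\D", "", number)
    return "****-****-****-" + digits[-4:]


def mask_secret(secret: str, visible_prefix: int = 8) -> str:
    """Keep a short identifying prefix (sk-prod-****); hide short secrets entirely."""
    if len(secret) < 2 * visible_prefix:
        return "****"
    return secret[:visible_prefix] + "****"


def body_shape(body: Dict[str, Any]) -> Dict[str, Any]:
    """Log only the top-level keys of a payload, never its values."""
    return {"body_keys": list(body.keys())}
```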

Log entries MUST be sanitized against log injection - newlines, control characters, and ANSI escape sequences MUST be escaped or stripped from user-supplied values (ref: CWE-117, Improper Output Neutralization for Logs). Stack traces SHOULD only appear at ERROR or FATAL level and MUST be reviewed for leaked secrets.
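The sanitization requirement can be sketched as a single Python function applied to user-supplied strings before they are logged; `sanitize_log_value` is a hypothetical name. It removes ANSI CSI sequences (colours, cursor control), escapes CR/LF visibly so a forged second entry cannot start a new line, and strips the remaining C0/C1 control characters, including a stray ESC byte left by non-CSI sequences. Note that `json.dumps` already escapes control characters inside string values, so the main exposure is in non-JSON sinks and terminal viewers; the standard requires sanitization regardless.

```python
import re

# CSI escape sequences such as "\x1b[31m" (colour) or "\x1b[2J" (clear screen).
ANSI_CSI_RE = re.compile(r"\x1b\[[0-9;?]*[ -/]*[@-~]")
# Remaining C0 controls, DEL, and C1 controls.
CONTROL_RE = re.compile(r"[\x00-\x1f\x7f-\x9f]")


def sanitize_log_value(value: str) -> str:
    """Neutralize user-supplied text before it is written to a log (CWE-117)."""
    value = ANSI_CSI_RE.sub("", value)
    # Escape line breaks visibly instead of silently dropping them.
    value = value.replace("\r", "\\r").replace("\n", "\\n")
    # Strip every other control character, including any leftover ESC byte.
    return CONTROL_RE.sub("", value)
```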

R-5: Log Levels

Services MUST use the following log levels consistently:

| Level | When to Use |
|---|---|
| FATAL | Unrecoverable failure; service cannot continue |
| ERROR | Operation failed; requires attention but service continues |
| WARN | Unexpected condition that does not prevent operation |
| INFO | Normal operational events worth recording |
| DEBUG | Diagnostic detail for troubleshooting |
| TRACE | Protocol-level verbosity |

Production environments MUST default to INFO. DEBUG and TRACE MUST be activatable at runtime without redeployment. ERROR MUST be reserved for conditions requiring investigation - client 4xx errors SHOULD be logged at WARN, not ERROR.
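Two details of R-5 need attention in Python's stdlib `logging`: it has no TRACE level, and it names two levels differently (WARNING, CRITICAL). The sketch below registers TRACE at numeric level 5, maps the standard's names onto stdlib levels, picks a level from an HTTP status (treating 5xx as ERROR is an assumption; the 4xx-as-WARN rule comes from R-5), and exposes a hypothetical `set_runtime_level` hook that an admin endpoint or config watcher could call to enable DEBUG or TRACE without redeploying.

```python
import logging

# Python's stdlib has no TRACE level; register one below DEBUG (10).
TRACE = 5
logging.addLevelName(TRACE, "TRACE")

# R-5 level names mapped onto stdlib numeric levels.
STANDARD_LEVELS = {
    "TRACE": TRACE,
    "DEBUG": logging.DEBUG,
    "INFO": logging.INFO,
    "WARN": logging.WARNING,
    "ERROR": logging.ERROR,
    "FATAL": logging.CRITICAL,
}


def level_for_status(status_code: int) -> int:
    """Log level for a completed HTTP exchange: 5xx -> ERROR, 4xx -> WARN, else INFO."""
    if status_code >= 500:
        return logging.ERROR
    if status_code >= 400:
        return logging.WARNING
    return logging.INFO


def set_runtime_level(level_name: str, logger_name: str = "") -> None:
    """Change verbosity in a running process (empty logger_name = root logger)."""
    try:
        level = STANDARD_LEVELS[level_name.upper()]
    except KeyError:
        raise ValueError(f"unknown log level: {level_name!r}") from None
    logging.getLogger(logger_name).setLevel(level)
```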

R-6: Metrics

All custom integration metrics MUST follow OpenTelemetry semantic naming conventions: dot-separated namespaces, lowercase, no units in names.

Required metrics for every integration endpoint:

| Metric Name | Type | Unit | Description |
|---|---|---|---|
| integration.request.duration | Histogram | s | Request-to-response time |
| integration.request.count | Counter | {request} | Total requests |
| integration.request.error.count | Counter | {request} | Failed requests |
| integration.request.active | UpDownCounter | {request} | In-flight requests |

Unit notation: {request} and {event} use the curly-brace annotation syntax (borrowed from UCUM) that OpenTelemetry uses for dimensionless counts of domain-specific items. The OpenTelemetry Semantic Conventions use it to distinguish counts of specific things from raw numbers: {request} means "count of requests" rather than a bare dimensionless value, which lets observability backends display units meaningfully (e.g., "142 requests" rather than "142").

Additional metrics for event-driven integrations:

| Metric Name | Type | Unit | Description |
|---|---|---|---|
| integration.event.publish.count | Counter | {event} | Events published |
| integration.event.consume.count | Counter | {event} | Events consumed |
| integration.event.consume.duration | Histogram | s | Event processing time |
| integration.event.consume.lag | Gauge | {event} | Consumer lag |
| integration.event.dlq.count | Counter | {event} | Dead-letter queue events |

All metrics MUST include resource attributes service.name, service.version, and deployment.environment.name. Common attributes MUST include integration.type, integration.target, network.protocol.name, and error.type where applicable.
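The R-6 naming rules (lowercase, dot-separated namespaces, no units in names) lend themselves to a lint check in CI. The Python sketch below is one possible form; `metric_name_problems` is a hypothetical name, the restriction of segments to `[a-z0-9_]` is a stricter-than-OpenTelemetry assumption, and `UNIT_TOKENS` is a heuristic, non-exhaustive list. Units belong in the instrument's unit field (for example `s` or `{request}`), not in the name.

```python
import re
from typing import List

# Dot-separated namespaces; each segment lowercase letters, digits, underscores.
NAME_RE = re.compile(r"[a-z][a-z0-9_]*(\.[a-z][a-z0-9_]*)+")

# Heuristic list of unit tokens that should live in the unit field instead.
UNIT_TOKENS = {"s", "sec", "seconds", "ms", "milliseconds", "us", "ns", "bytes", "kb", "mb"}


def metric_name_problems(name: str) -> List[str]:
    """Return R-6 naming problems for a custom metric name (empty list means OK)."""
    problems = []
    if name != name.lower():
        problems.append("must be lowercase")
    if NAME_RE.fullmatch(name.lower()) is None:
        problems.append("must be dot-separated namespaces, e.g. integration.request.count")
    for segment in name.lower().split("."):
        # Catches suffixes such as duration_ms or payload_bytes.
        if segment.split("_")[-1] in UNIT_TOKENS:
            problems.append(f"segment {segment!r} looks like it carries a unit")
    return problems
```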

R-7: Span Attributes

All integration operations MUST be instrumented as OpenTelemetry spans. Span names MUST follow protocol conventions (e.g., GET /api/v2/orders for HTTP, orders.created publish for messaging).

HTTP spans MUST include: http.request.method, http.response.status_code, url.path, server.address. Messaging spans MUST include: messaging.system, messaging.destination.name, messaging.operation.type.

Spans MUST set appropriate SpanKind: SERVER/CLIENT for HTTP/gRPC, PRODUCER/CONSUMER for messaging, INTERNAL for local processing. Span status MUST be set to ERROR on failure; HTTP 5xx MUST set span error status, 4xx SHOULD NOT.
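The R-7 decisions (span naming, SpanKind selection, error status, required attributes) can be sketched without the OpenTelemetry SDK. `SpanKind` below is a local enum that only mirrors the names of `opentelemetry.trace.SpanKind` so the example runs standalone; the helper names are hypothetical. Passing a route template such as `/api/v2/orders/{id}` rather than the raw path keeps span-name cardinality low, consistent with OpenTelemetry's HTTP conventions. The error rule follows R-7 exactly: 5xx sets error status, 4xx does not.

```python
from enum import Enum
from typing import Mapping, Set


class SpanKind(Enum):
    """Local mirror of OpenTelemetry's SpanKind names, to keep the sketch dependency-free."""
    SERVER = "SERVER"
    CLIENT = "CLIENT"
    PRODUCER = "PRODUCER"
    CONSUMER = "CONSUMER"
    INTERNAL = "INTERNAL"


REQUIRED_ATTRIBUTES = {
    "http": {"http.request.method", "http.response.status_code", "url.path", "server.address"},
    "messaging": {"messaging.system", "messaging.destination.name", "messaging.operation.type"},
}


def span_kind_for(protocol: str, outbound: bool) -> SpanKind:
    """SERVER/CLIENT for HTTP and gRPC, PRODUCER/CONSUMER for messaging, else INTERNAL."""
    if protocol in ("http", "grpc"):
        return SpanKind.CLIENT if outbound else SpanKind.SERVER
    if protocol == "messaging":
        return SpanKind.PRODUCER if outbound else SpanKind.CONSUMER
    return SpanKind.INTERNAL


def http_span_name(method: str, route: str) -> str:
    return f"{method.upper()} {route}"           # e.g. "GET /api/v2/orders"


def messaging_span_name(destination: str, operation: str) -> str:
    return f"{destination} {operation}"          # e.g. "orders.created publish"


def http_status_is_span_error(status_code: int) -> bool:
    """Per R-7: HTTP 5xx sets span error status; 4xx does not."""
    return status_code >= 500


def missing_attributes(category: str, attributes: Mapping[str, object]) -> Set[str]:
    """Required attribute keys for the category that are absent from the span."""
    return REQUIRED_ATTRIBUTES[category] - set(attributes)
```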

R-8: Audit Traceability

Every state-changing integration operation MUST produce an INFO log entry including at minimum: trace_id, span_id, request_id, operation, service, and outcome. It MUST be possible to reconstruct the complete execution path of any transaction using trace_id across all participating services. Audit-relevant entries MUST be retained per the organization's data retention policy.
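The two R-8 obligations, complete audit entries and path reconstruction by trace_id, can be sketched over JSON log lines collected from several services. The helper names are hypothetical. Ordering by timestamp is only an approximation: it relies on the R-3 timestamp format (UTC, fixed precision) sorting lexicographically and is subject to clock skew between hosts, so exact causal order would come from span parent links in the tracing backend.

```python
import json
from typing import Dict, Iterable, List

AUDIT_FIELDS = ("trace_id", "span_id", "request_id", "operation", "service", "outcome")


def missing_audit_fields(entry: Dict[str, object]) -> List[str]:
    """R-8 fields that are absent or empty in a state-changing log entry."""
    return [field for field in AUDIT_FIELDS if not entry.get(field)]


def execution_path(log_lines: Iterable[str], trace_id: str) -> List[Dict[str, object]]:
    """All entries for one trace across services, ordered by timestamp."""
    entries = []
    for line in log_lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # non-JSON output is a separate validation failure
        if isinstance(entry, dict) and entry.get("trace_id") == trace_id:
            entries.append(entry)
    # Same-format UTC timestamps sort lexicographically in time order.
    return sorted(entries, key=lambda e: str(e.get("timestamp", "")))
```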

Examples

Structured Log Entry

{
  "timestamp": "2026-03-28T14:32:01.482319Z",
  "level": "INFO",
  "trace_id": "0af7651916cd43dd8448eb211c80319c",
  "span_id": "b9c7c989f97918e1",
  "service": "order-service",
  "request_id": "f47ac10b-58cc-4372-a567-0e02b2c3d479",
  "operation": "createOrder",
  "http.method": "POST",
  "http.status_code": 201,
  "duration_ms": 142.7,
  "message": "Order created successfully"
}

Trace Context Headers

GET /api/v2/orders/12345 HTTP/1.1
Host: order-service.internal
traceparent: 00-0af7651916cd43dd8448eb211c80319c-b9c7c989f97918e1-01
tracestate: rojo=00f067aa0ba902b7,congo=t61rcWkgMzE
X-Request-ID: f47ac10b-58cc-4372-a567-0e02b2c3d479

Enforcement Rules

  • Gateway enforcement: API gateways MUST generate traceparent and X-Request-ID for incoming external requests that lack them. Internal service-to-service requests missing traceparent SHOULD be flagged.
  • Build-time enforcement: CI/CD pipelines MUST validate that all log output is valid JSON with required fields (timestamp, level, trace_id, span_id, service, message), ISO 8601 timestamps, and valid log levels. Non-JSON log output MUST fail validation.
  • Runtime enforcement: Log aggregation systems SHOULD reject or quarantine entries missing required fields.
  • Security enforcement: Log pipelines SHOULD include automated PII/credential pattern detection. Matches MUST trigger security team alerts. Repeated violations MAY result in deployment blocks.
  • Correlation ID check: API gateways or integration test suites MUST verify all responses include X-Request-ID.

Validation patterns:

  • traceparent: ^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$
  • X-Request-ID (UUID v4): ^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$
  • X-Request-ID (ULID): ^[0-9A-HJKMNP-TV-Z]{26}$
  • Timestamp (ISO 8601 UTC): ^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$
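The build-time checks and validation patterns above can be combined into a small Python validator, sketched here as one possible CI step; the function names are hypothetical. It uses the patterns exactly as listed, but applies them with `re.fullmatch`, because in Python `$` also matches just before a trailing newline and `re.match` would therefore accept a value ending in "\n".

```python
import json
import re
from typing import List

TIMESTAMP_RE = re.compile(r"^\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}(\.\d+)?Z$")
TRACEPARENT_RE = re.compile(r"^00-[0-9a-f]{32}-[0-9a-f]{16}-[0-9a-f]{2}$")
UUID_V4_RE = re.compile(r"^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$")
ULID_RE = re.compile(r"^[0-9A-HJKMNP-TV-Z]{26}$")

REQUIRED_FIELDS = ("timestamp", "level", "trace_id", "span_id", "service", "message")
LEVELS = {"FATAL", "ERROR", "WARN", "INFO", "DEBUG", "TRACE"}


def validate_log_line(line: str) -> List[str]:
    """Problems found in one line of log output (empty list means it passes)."""
    try:
        entry = json.loads(line)
    except json.JSONDecodeError:
        return ["not valid JSON"]
    if not isinstance(entry, dict):
        return ["not a JSON object"]
    problems = [f"missing field: {f}" for f in REQUIRED_FIELDS if f not in entry]
    if "timestamp" in entry and not TIMESTAMP_RE.fullmatch(str(entry["timestamp"])):
        problems.append("timestamp is not ISO 8601 UTC")
    level = entry.get("level")
    if "level" in entry and (not isinstance(level, str) or level not in LEVELS):
        problems.append(f"invalid level: {level!r}")
    return problems


def valid_request_id(value: str) -> bool:
    return bool(UUID_V4_RE.fullmatch(value) or ULID_RE.fullmatch(value))


def valid_traceparent(value: str) -> bool:
    return bool(TRACEPARENT_RE.fullmatch(value))
```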


Rationale

W3C Trace Context over proprietary headers - Vendor-neutral W3C Recommendation supported by all major observability platforms, preventing lock-in and ensuring partner interoperability.

JSON structured logging - Machine-parseable without custom grammars, natively supported by all major log aggregation platforms, and enables field-level indexing for correlation across services.

Separate X-Request-ID from trace-id - The trace-id is an internal tracing concern that may be regenerated at trust boundaries; X-Request-ID is a business-facing identifier consumers can reference in support tickets.

Prohibit PII in logs - Logs are typically accessible to more people and systems than production databases, making aggregated log stores high-value targets (ref: CWE-532). Prevention is far more effective than post-hoc redaction.

OpenTelemetry naming conventions - CNCF-backed industry standard ensuring metrics and spans from different teams, languages, and frameworks are consistent and correlatable without manual mapping.

Version History

| Version | Date | Change |
|---|---|---|
| 1.0.0 | 2026-03-28 | Initial definition |
| 1.1.0 | 2026-04-10 | R-1: added W3C Trace Context term definitions; R-2: defined ULID and converted to bullet format; R-6: added explanation of OTel curly-brace unit notation |