title | subtitle | description | context |
---|---|---|---|
Router Telemetry | Collect observable data to monitor your router and supergraph | Observe and monitor the health and performance of GraphQL operations in the Apollo GraphOS Router or Apollo Router Core by collecting and exporting telemetry logs, metrics, and traces. | |
Since the router is the single access point for all traffic to and from your graph, router telemetry is the most comprehensive way to observe your supergraph. By implementing telemetry, you can:
- Monitor your supergraph's health and performance
- Diagnose issues and deduce root causes
- Optimize resource usage and system reliability
To understand how router telemetry fits into the broader set of GraphOS observability tooling, see the observability overview.
By default, the router doesn't collect or export any telemetry beyond the operation and field usage metrics it sends to GraphOS. You configure which additional telemetry data to collect and where to export it via your router's configuration file.
The router request lifecycle is the primary data source for telemetry data or signals. Telemetry signals include logs, metrics, and traces. The section on router telemetry signals explains these data types and gives basic configuration examples. Exporters are responsible for sending telemetry data to your application performance monitoring (APM) and observability tools for storage, visualization, and analysis.
flowchart LR
subgraph Router
lifecycle("Request Lifecycle<br/>(telemetry sources)")
exporters("Logs, Metrics,<br/>Traces Exporters")
lifecycle-->exporters
end
apms["APM, agent,<br/>or collector"]
exporters--"OTLP"-->apms
The router emits telemetry in the industry-standard OpenTelemetry Protocol (OTLP) format and is therefore compatible with many APM tools, including:
- Prometheus
- OpenTelemetry Collector
- Datadog
- New Relic
- Jaeger
- Zipkin
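As a minimal sketch of the OTLP path described above, the following configuration sends both metrics and traces to a collector. It assumes an OpenTelemetry Collector (or compatible agent) is listening on `localhost:4317`, the conventional OTLP gRPC port; the endpoint is an assumption for illustration, so substitute the address of your own collector.

```yaml
telemetry:
  exporters:
    metrics:
      otlp:
        enabled: true
        # Assumed collector address; replace with your own
        endpoint: "http://localhost:4317"
    tracing:
      otlp:
        enabled: true
        # Same assumed collector receives traces
        endpoint: "http://localhost:4317"
```

Pointing both exporters at one collector keeps the router configuration small and lets the collector decide where each signal is forwarded.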
Attributes and selectors are key-value pairs that add contextual information from the router request lifecycle to telemetry data. You can use attributes and selectors to annotate events, metrics, and spans so they can help you filter and group data in your APMs.
The router supports a set of standard attributes from OpenTelemetry semantic conventions. Example attributes include:
- HTTP status code
- GraphQL operation name
- Subgraph name
Selectors allow you to define custom data points based on the router's request lifecycle.
Type | Description |
---|---|
Attribute | Standard data points that can be attached to spans, instruments, and events. |
Selector | Custom data points extracted from the router's request lifecycle, tailored to specific needs. |
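To make the distinction concrete, here is a hedged sketch that attaches standard attributes and one selector-based custom attribute to spans. The attribute name `acme.client.name` and the header `x-client-name` are hypothetical and chosen only for illustration; check the attribute names against the reference for your router version.

```yaml
telemetry:
  instrumentation:
    spans:
      router:
        attributes:
          # Standard attribute: HTTP status code
          http.response.status_code: true
          # Custom attribute (hypothetical name) filled by a selector
          "acme.client.name":
            request_header: "x-client-name"
      supergraph:
        attributes:
          # Standard attribute: GraphQL operation name
          graphql.operation.name: true
      subgraph:
        attributes:
          # Standard attribute: subgraph name
          subgraph.name: true
```

Standard attributes are switched on with `true`, while a custom attribute is a name you choose whose value comes from a selector such as `request_header`.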
The router supports three signal types for collecting and exporting telemetry:
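Signal | Description |
---|---|
Logs and events | Logs record events in the router's request lifecycle, such as warnings about misconfiguration and errors that occurred during a request. |
Metrics and instruments | Metrics are measurements of the router's behavior collected over time. Instruments define how to collect and report them. |
Traces and spans | Traces follow a request through the router. Each trace is composed of spans, which capture a request's duration as it flows through the request lifecycle. |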
These mechanisms let you collect data about the inner workings of your router and graph and export it to your observability tools.
Logs record events in the router's request lifecycle. Examples of logged events include:
- Information about the router lifecycle
- Warnings about misconfiguration
- Errors that occurred during a request
You can log events to standard output in either text or JSON format. Logs can also be consumed by logging exporters and as part of spans via tracing exporters.
flowchart LR
Router --"Emits logs in<br/>text or JSON format"--> stdout
stdout --"Exports logs"--> log_store
log_store[("Log store")]
This configuration snippet enables stdout logging in JSON:
telemetry:
  exporters:
    logging:
      stdout:
        enabled: true
        format: json
Metrics are measurements of the router's behavior that are collected and often analyzed over time to identify trends. Examples of router metrics include the number of incoming HTTP requests and the time spent processing a request.
Instruments define how to collect and report metrics. Different kinds of instruments include counters, gauges, and histograms. For example, given the metric "number of incoming HTTP requests," a counter records the total number of requests, a histogram captures the distribution of request counts over time, and a gauge provides a snapshot of the current request count at a given moment.
Metric instruments fall into three categories:
Instrument type | Description |
---|---|
OTEL instruments | Standard OpenTelemetry instruments around the HTTP lifecycle, such as http.server.request.body.size. |
Router instruments | Standard instruments for the router request lifecycle. |
Custom instruments | Custom instruments defined in the router request lifecycle. |
This configuration snippet enables OTEL instrumentation for a histogram of request body sizes:
telemetry:
  instrumentation:
    instruments:
      router:
        http.server.request.body.size: true
See Instruments for an overview of available instruments and a guide for configuring and customizing instruments.
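For comparison with the standard instrument above, a custom instrument might look like the following sketch. The instrument name `acme.request.count`, its unit, and its description are hypothetical; the example assumes a counter that increments once per router request and is annotated with the HTTP status code.

```yaml
telemetry:
  instrumentation:
    instruments:
      router:
        # Hypothetical custom counter; the name is for illustration only
        acme.request.count:
          value: unit          # add 1 for each request
          type: counter
          unit: request
          description: "Requests received by the router"
          attributes:
            # Standard attribute used to group the counter by status code
            http.response.status_code: true
```

Each attribute you add multiplies the number of time series the counter produces, so keep the attribute set small.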
In addition to the operation metrics and field usage metrics that GraphOS Router sends to GraphOS, you can configure the router with metric exporters for other observability tools and APMs.
flowchart LR
Router --"OTEL<br/>metrics"--> APM
Router --"Usage/Performance<br/>metrics"--> GraphOS
This configuration snippet enables exporting metrics to Prometheus:
telemetry:
  exporters:
    metrics:
      prometheus:
        enabled: true
        listen: 127.0.0.1:9090
        path: /metrics
Learn more about sending metrics to Prometheus and metric exporters in general.
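On the Prometheus side, a scrape job has to point at the address and path the router listens on. The following is a minimal sketch of a Prometheus configuration fragment (not router configuration) that matches the `listen` and `path` values above; the job name `apollo-router` is an arbitrary choice for illustration.

```yaml
# prometheus.yml fragment
scrape_configs:
  - job_name: apollo-router        # arbitrary label for this scrape job
    metrics_path: /metrics         # matches the router's `path`
    static_configs:
      - targets: ["127.0.0.1:9090"]  # matches the router's `listen`
```

If Prometheus runs on a different host than the router, the router's `listen` address must be reachable from that host rather than bound to the loopback interface.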
Traces help you monitor the flow of a request through the router. A trace is composed of spans. A span captures a request's duration as it flows through the router request lifecycle. Spans may include contextual information about the request, such as the HTTP status code or the name of the subgraph being queried.
Examples of spans include:
- router - Wraps an entire request from the HTTP perspective
- supergraph - Wraps a request once GraphQL parsing has taken place
- subgraph - Wraps a request to a subgraph
If you've enabled federated tracing (also known as FTV1 tracing) in your subgraph libraries, the router sends field-level traces to GraphOS. Additionally, trace exporters can consume and report traces to your APM.
flowchart LR
Router --"OTEL<br/>traces"--> APM
Router --"FTV1 Data"--> GraphOS
This configuration snippet sets attributes that Datadog uses to organize its APM view and exports traces to a Datadog agent:
telemetry:
  instrumentation:
    spans:
      mode: spec_compliant
      router:
        attributes:
          otel.name: router
          operation.name: "router"
          resource.name:
            request_method: true
      supergraph:
        attributes:
          otel.name: supergraph
          operation.name: "supergraph"
          resource.name:
            operation_name: string
      subgraph:
        attributes:
          otel.name: subgraph
          operation.name: "subgraph"
          resource.name:
            subgraph_operation_name: string
  exporters:
    tracing:
      otlp:
        enabled: true
        endpoint: "${env.DATADOG_AGENT_HOST}:4317"
Learn more about sending traces to Datadog and trace exporters in general.
Effective telemetry provides just the right amount and granularity of information to maintain your graph. Too much data, such as high-cardinality metrics, can overwhelm your system. Too little may leave you without enough information to debug issues.
Specific events that need to be captured—and the conditions under which they need to be captured—can change as client applications and graphs change. Different environments, such as production and development, can have different observability requirements.
Router telemetry is customizable to meet the observability needs of different graphs. Keep in mind your particular environments' and graphs' requirements when configuring your telemetry.
You can set conditions for instruments and events to collect telemetry data only when necessary. This configuration snippet collects the configured telemetry data only when the request_header selector for x-req-header is equal to "example-value":
eq:
  - "example-value"
  - request_header: x-req-header
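To show where such a condition sits, here is a sketch of a custom event that is logged only when the condition holds. The event name `acme.flagged.request` and its message are hypothetical; the example assumes the condition is placed under the event's `condition` key at the router stage.

```yaml
telemetry:
  instrumentation:
    events:
      router:
        # Hypothetical custom event; the name is for illustration only
        acme.flagged.request:
          message: "request carried the expected header"
          level: info
          on: request
          # Emit the event only when x-req-header equals "example-value"
          condition:
            eq:
              - "example-value"
              - request_header: x-req-header
```

Requests without the header, or with a different value, produce no event, which keeps log volume proportional to the traffic you care about.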
You can use the metric exporters' views property with the drop aggregation to prevent certain metrics from being sent to your APM. This configuration snippet drops all instruments whose names begin with apollo_router:
telemetry:
  exporters:
    metrics:
      common:
        service_name: apollo-router
        views:
          - name: apollo_router*
            aggregation: drop
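Dropping a metric entirely is the bluntest option. As a hedged sketch of a gentler alternative for the high-cardinality problem mentioned earlier, a view can also reshape a metric instead of removing it. This example assumes that views accept a histogram aggregation with explicit buckets and an `allowed_attribute_keys` list; confirm both options, and the instrument name, against the configuration schema of your router version.

```yaml
telemetry:
  exporters:
    metrics:
      common:
        service_name: apollo-router
        views:
          # Instrument name is illustrative; use one your router emits
          - name: http.server.request.duration
            aggregation:
              histogram:
                # Fewer, coarser buckets reduce the data sent per series
                buckets: [0.05, 0.1, 0.5, 1, 5]
            # Keep only this attribute; all others are discarded
            allowed_attribute_keys:
              - http.response.status_code
```

Restricting the attribute keys caps the number of distinct time series, while the bucket list controls how much detail each series carries.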
Consult the following documentation for details on how to configure the various telemetry mechanisms and exporters: