Skip to content

Commit a0a4bc5

Browse files
feat: add option to enable ApiTracer (#3095)
* feat: add option to enable ApiTracer Adds an option to enable an ApiTracer for the client. An ApiTracer will add traces for all RPCs that are being executed. The traces will only be exported if an OpenTelemetry or OpenCensus trace exporter has been configured. An ApiTracer adds additional detail information about the time that a single RPC took. It can also be used to determine whether an RPC was retried or not. * feat: add gfe-latency as attribute to span * fix: get span at request start * test: add tests for ApiTracer * test: fix tests when using mux sessions * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * chore: cleanup * build: declare test dependency that is being used * chore: remove TODOs * chore: add env var for enabling api tracing * chore: add clirr ignored diff * chore: address review comments * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * test: add OpenCensus ApiTracer tests * 🦉 Updates from OwlBot post-processor See https://github.com/googleapis/repo-automation-bots/blob/main/packages/owl-bot/README.md * fix: clear spans after test * fix: ignore test traces coming from a previous test * test: skip OpenCensus ApiTracer test as OpenCensus is way too intrusive * test: increase timeout to reduce flakiness --------- Co-authored-by: Owl Bot <gcf-owl-bot[bot]@users.noreply.github.com>
1 parent 34edbd9 commit a0a4bc5

17 files changed

+1702
-21
lines changed

.readme-partials.yaml

+16
Original file line numberDiff line numberDiff line change
@@ -166,6 +166,22 @@ custom_content: |
166166
This option can also be enabled by setting the environment variable
167167
`SPANNER_ENABLE_EXTENDED_TRACING=true`.
168168
169+
#### OpenTelemetry API Tracing
170+
You can enable tracing of each API call that the Spanner client executes with the `enableApiTracing`
171+
option. These traces also include any retry attempts for an API call:
172+
173+
```
174+
SpannerOptions options = SpannerOptions.newBuilder()
175+
.setOpenTelemetry(openTelemetry)
176+
.setEnableApiTracing(true)
177+
.build();
178+
```
179+
180+
This option can also be enabled by setting the environment variable
181+
`SPANNER_ENABLE_API_TRACING=true`.
182+
183+
> Note: The attribute keys that are used for additional information about retry attempts and the number of requests might change in a future release.
184+
169185
### Instrument with OpenCensus
170186
171187
> Note: OpenCensus project is deprecated. See [Sunsetting OpenCensus](https://opentelemetry.io/blog/2023/sunsetting-opencensus/).

README.md

+19-3
Original file line numberDiff line numberDiff line change
@@ -57,13 +57,13 @@ implementation 'com.google.cloud:google-cloud-spanner'
5757
If you are using Gradle without BOM, add this to your dependencies:
5858

5959
```Groovy
60-
implementation 'com.google.cloud:google-cloud-spanner:6.68.0'
60+
implementation 'com.google.cloud:google-cloud-spanner:6.68.1'
6161
```
6262

6363
If you are using SBT, add this to your dependencies:
6464

6565
```Scala
66-
libraryDependencies += "com.google.cloud" % "google-cloud-spanner" % "6.68.0"
66+
libraryDependencies += "com.google.cloud" % "google-cloud-spanner" % "6.68.1"
6767
```
6868
<!-- {x-version-update-end} -->
6969

@@ -272,6 +272,22 @@ SpannerOptions options = SpannerOptions.newBuilder()
272272
This option can also be enabled by setting the environment variable
273273
`SPANNER_ENABLE_EXTENDED_TRACING=true`.
274274

275+
#### OpenTelemetry API Tracing
276+
You can enable tracing of each API call that the Spanner client executes with the `enableApiTracing`
277+
option. These traces also include any retry attempts for an API call:
278+
279+
```
280+
SpannerOptions options = SpannerOptions.newBuilder()
281+
.setOpenTelemetry(openTelemetry)
282+
.setEnableApiTracing(true)
283+
.build();
284+
```
285+
286+
This option can also be enabled by setting the environment variable
287+
`SPANNER_ENABLE_API_TRACING=true`.
288+
289+
> Note: The attribute keys that are used for additional information about retry attempts and the number of requests might change in a future release.
290+
275291
### Instrument with OpenCensus
276292

277293
> Note: OpenCensus project is deprecated. See [Sunsetting OpenCensus](https://opentelemetry.io/blog/2023/sunsetting-opencensus/).
@@ -671,7 +687,7 @@ Java is a registered trademark of Oracle and/or its affiliates.
671687
[kokoro-badge-link-5]: http://storage.googleapis.com/cloud-devrel-public/java/badges/java-spanner/java11.html
672688
[stability-image]: https://img.shields.io/badge/stability-stable-green
673689
[maven-version-image]: https://img.shields.io/maven-central/v/com.google.cloud/google-cloud-spanner.svg
674-
[maven-version-link]: https://central.sonatype.com/artifact/com.google.cloud/google-cloud-spanner/6.68.0
690+
[maven-version-link]: https://central.sonatype.com/artifact/com.google.cloud/google-cloud-spanner/6.68.1
675691
[authentication]: https://github.com/googleapis/google-cloud-java#authentication
676692
[auth-scopes]: https://developers.google.com/identity/protocols/oauth2/scopes
677693
[predefined-iam-roles]: https://cloud.google.com/iam/docs/understanding-roles#predefined_roles

google-cloud-spanner/clirr-ignored-differences.xml

+7
Original file line numberDiff line numberDiff line change
@@ -688,6 +688,13 @@
688688
<method>boolean isEnableExtendedTracing()</method>
689689
</difference>
690690

691+
<!-- Added API tracing option -->
692+
<difference>
693+
<differenceType>7012</differenceType>
694+
<className>com/google/cloud/spanner/SpannerOptions$SpannerEnvironment</className>
695+
<method>boolean isEnableApiTracing()</method>
696+
</difference>
697+
691698
<!-- Added ExcludeTxnFromChangeStreams -->
692699
<difference>
693700
<differenceType>7012</differenceType>
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,274 @@
1+
/*
2+
* Copyright 2024 Google LLC
3+
*
4+
* Licensed under the Apache License, Version 2.0 (the "License");
5+
* you may not use this file except in compliance with the License.
6+
* You may obtain a copy of the License at
7+
*
8+
* https://www.apache.org/licenses/LICENSE-2.0
9+
*
10+
* Unless required by applicable law or agreed to in writing, software
11+
* distributed under the License is distributed on an "AS IS" BASIS,
12+
* WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13+
* See the License for the specific language governing permissions and
14+
* limitations under the License.
15+
*/
16+
17+
package com.google.cloud.spanner;
18+
19+
import com.google.api.gax.tracing.ApiTracer;
20+
import com.google.api.gax.tracing.ApiTracerFactory.OperationType;
21+
import com.google.common.base.Preconditions;
22+
import io.opentelemetry.api.common.AttributeKey;
23+
import io.opentelemetry.api.common.Attributes;
24+
import io.opentelemetry.api.common.AttributesBuilder;
25+
import io.opentelemetry.api.trace.Span;
26+
import io.opentelemetry.api.trace.StatusCode;
27+
import java.util.concurrent.atomic.AtomicLong;
28+
import javax.annotation.Nonnull;
29+
import javax.annotation.Nullable;
30+
import org.threeten.bp.Duration;
31+
32+
/**
33+
* {@link com.google.api.gax.tracing.ApiTracer} for use with OpenTelemetry. Based on {@link
34+
* com.google.api.gax.tracing.OpencensusTracer}.
35+
*/
36+
class OpenTelemetryApiTracer implements ApiTracer {
37+
/** The attribute keys that are used by this tracer might change in a future release. */
38+
private final AttributeKey<Long> ATTEMPT_COUNT_KEY = AttributeKey.longKey("attempt.count");
39+
40+
private final AttributeKey<Long> TOTAL_REQUEST_COUNT_KEY =
41+
AttributeKey.longKey("total_request_count");
42+
private final AttributeKey<Long> TOTAL_RESPONSE_COUNT_KEY =
43+
AttributeKey.longKey("total_response_count");
44+
private final AttributeKey<String> EXCEPTION_MESSAGE_KEY =
45+
AttributeKey.stringKey("exception.message");
46+
private final AttributeKey<Long> ATTEMPT_NUMBER_KEY = AttributeKey.longKey("attempt.number");
47+
private final AttributeKey<Long> ATTEMPT_REQUEST_COUNT_KEY =
48+
AttributeKey.longKey("attempt.request_count");
49+
private final AttributeKey<Long> ATTEMPT_RESPONSE_COUNT_KEY =
50+
AttributeKey.longKey("attempt.response_count");
51+
private final AttributeKey<String> CONNECTION_ID_KEY = AttributeKey.stringKey("connection");
52+
private final AttributeKey<Long> RETRY_DELAY_KEY = AttributeKey.longKey("delay_ms");
53+
private static final AttributeKey<Long> BATCH_SIZE_KEY = AttributeKey.longKey("batch.size");
54+
private static final AttributeKey<Long> BATCH_COUNT_KEY = AttributeKey.longKey("batch.count");
55+
56+
private final Span span;
57+
private final OperationType operationType;
58+
59+
private volatile String lastConnectionId;
60+
private volatile long currentAttemptId;
61+
private final AtomicLong attemptSentMessages = new AtomicLong(0);
62+
private long attemptReceivedMessages = 0;
63+
private final AtomicLong totalSentMessages = new AtomicLong(0);
64+
private long totalReceivedMessages = 0;
65+
66+
OpenTelemetryApiTracer(@Nonnull Span span, @Nonnull OperationType operationType) {
67+
this.span = Preconditions.checkNotNull(span);
68+
this.operationType = Preconditions.checkNotNull(operationType);
69+
}
70+
71+
Span getSpan() {
72+
return this.span;
73+
}
74+
75+
@Override
76+
public Scope inScope() {
77+
final io.opentelemetry.context.Scope openTelemetryScope = span.makeCurrent();
78+
return openTelemetryScope::close;
79+
}
80+
81+
@Override
82+
public void operationSucceeded() {
83+
span.setAllAttributes(baseOperationAttributes());
84+
span.setStatus(StatusCode.OK);
85+
span.end();
86+
}
87+
88+
@Override
89+
public void operationCancelled() {
90+
span.setAllAttributes(baseOperationAttributes());
91+
span.setStatus(StatusCode.ERROR, "Cancelled by caller");
92+
span.end();
93+
}
94+
95+
@Override
96+
public void operationFailed(Throwable error) {
97+
span.setAllAttributes(baseOperationAttributes());
98+
span.setStatus(StatusCode.ERROR, error.getMessage());
99+
span.end();
100+
}
101+
102+
@Override
103+
public void lroStartFailed(Throwable error) {
104+
span.addEvent(
105+
"Operation failed to start", Attributes.of(EXCEPTION_MESSAGE_KEY, error.getMessage()));
106+
span.setStatus(StatusCode.ERROR, error.getMessage());
107+
span.end();
108+
}
109+
110+
@Override
111+
public void lroStartSucceeded() {
112+
span.addEvent("Operation started");
113+
}
114+
115+
@Override
116+
public void connectionSelected(String id) {
117+
lastConnectionId = id;
118+
}
119+
120+
@Override
121+
public void attemptStarted(int attemptNumber) {
122+
attemptStarted(null, attemptNumber);
123+
}
124+
125+
@Override
126+
public void attemptStarted(@Nullable Object request, int attemptNumber) {
127+
currentAttemptId = attemptNumber;
128+
attemptSentMessages.set(0);
129+
attemptReceivedMessages = 0;
130+
131+
// Attempts start counting a zero, so more than zero indicates a retry.
132+
if (attemptNumber > 0 && operationType != OperationType.LongRunning) {
133+
// Add an event if the RPC retries, as this is otherwise transparent to the user. Retries
134+
// would then show up as higher latency without any logical explanation.
135+
span.addEvent("Starting RPC retry " + attemptNumber);
136+
} else if (operationType == OperationType.LongRunning) {
137+
span.addEvent("Starting poll attempt " + attemptNumber);
138+
}
139+
}
140+
141+
@Override
142+
public void attemptSucceeded() {
143+
Attributes attributes = baseAttemptAttributes();
144+
145+
// Same infrastructure is used for both polling and retries, so need to disambiguate it here.
146+
if (operationType == OperationType.LongRunning) {
147+
span.addEvent("Polling completed", attributes);
148+
} else {
149+
span.addEvent("Attempt succeeded", attributes);
150+
}
151+
}
152+
153+
@Override
154+
public void attemptCancelled() {
155+
Attributes attributes = baseAttemptAttributes();
156+
157+
// Same infrastructure is used for both polling and retries, so need to disambiguate it here.
158+
if (operationType == OperationType.LongRunning) {
159+
span.addEvent("Polling was cancelled", attributes);
160+
} else {
161+
span.addEvent("Attempt cancelled", attributes);
162+
}
163+
lastConnectionId = null;
164+
}
165+
166+
@Override
167+
public void attemptFailed(Throwable error, Duration delay) {
168+
AttributesBuilder builder = baseAttemptAttributesBuilder();
169+
if (delay != null) {
170+
builder.put(RETRY_DELAY_KEY, delay.toMillis());
171+
}
172+
if (error != null) {
173+
builder.put(EXCEPTION_MESSAGE_KEY, error.getMessage());
174+
}
175+
Attributes attributes = builder.build();
176+
177+
// Same infrastructure is used for both polling and retries, so need to disambiguate it here.
178+
if (operationType == OperationType.LongRunning) {
179+
// The poll RPC was successful, but it indicated that the operation is still running.
180+
span.addEvent("Scheduling next poll", attributes);
181+
} else {
182+
span.addEvent("Attempt failed, scheduling next attempt", attributes);
183+
}
184+
lastConnectionId = null;
185+
}
186+
187+
@Override
188+
public void attemptFailedRetriesExhausted(@Nonnull Throwable error) {
189+
AttributesBuilder builder = baseAttemptAttributesBuilder();
190+
builder.put(EXCEPTION_MESSAGE_KEY, error.getMessage());
191+
Attributes attributes = builder.build();
192+
193+
// Same infrastructure is used for both polling and retries, so need to disambiguate it here.
194+
if (operationType == OperationType.LongRunning) {
195+
span.addEvent("Polling attempts exhausted", attributes);
196+
} else {
197+
span.addEvent("Attempts exhausted", attributes);
198+
}
199+
lastConnectionId = null;
200+
}
201+
202+
@Override
203+
public void attemptPermanentFailure(@Nonnull Throwable error) {
204+
AttributesBuilder builder = baseAttemptAttributesBuilder();
205+
builder.put(EXCEPTION_MESSAGE_KEY, error.getMessage());
206+
Attributes attributes = builder.build();
207+
208+
// Same infrastructure is used for both polling and retries, so need to disambiguate it here.
209+
if (operationType == OperationType.LongRunning) {
210+
span.addEvent("Polling failed", attributes);
211+
} else {
212+
span.addEvent("Attempt failed, error not retryable", attributes);
213+
}
214+
lastConnectionId = null;
215+
}
216+
217+
@Override
218+
public void responseReceived() {
219+
attemptReceivedMessages++;
220+
totalReceivedMessages++;
221+
}
222+
223+
@Override
224+
public void requestSent() {
225+
attemptSentMessages.incrementAndGet();
226+
totalSentMessages.incrementAndGet();
227+
}
228+
229+
@Override
230+
public void batchRequestSent(long elementCount, long requestSize) {
231+
span.setAllAttributes(
232+
Attributes.of(BATCH_COUNT_KEY, elementCount, BATCH_SIZE_KEY, requestSize));
233+
}
234+
235+
private Attributes baseOperationAttributes() {
236+
AttributesBuilder builder = Attributes.builder();
237+
builder.put(ATTEMPT_COUNT_KEY, currentAttemptId + 1);
238+
long localTotalSentMessages = totalSentMessages.get();
239+
if (localTotalSentMessages > 0) {
240+
builder.put(TOTAL_REQUEST_COUNT_KEY, localTotalSentMessages);
241+
}
242+
if (totalReceivedMessages > 0) {
243+
builder.put(TOTAL_RESPONSE_COUNT_KEY, totalReceivedMessages);
244+
}
245+
return builder.build();
246+
}
247+
248+
private Attributes baseAttemptAttributes() {
249+
return baseAttemptAttributesBuilder().build();
250+
}
251+
252+
private AttributesBuilder baseAttemptAttributesBuilder() {
253+
AttributesBuilder builder = Attributes.builder();
254+
populateAttemptNumber(builder);
255+
256+
long localAttemptSentMessages = attemptSentMessages.get();
257+
if (localAttemptSentMessages > 0) {
258+
builder.put(ATTEMPT_REQUEST_COUNT_KEY, localAttemptSentMessages);
259+
}
260+
if (attemptReceivedMessages > 0) {
261+
builder.put(ATTEMPT_RESPONSE_COUNT_KEY, attemptReceivedMessages);
262+
}
263+
String localLastConnectionId = lastConnectionId;
264+
if (localLastConnectionId != null) {
265+
builder.put(CONNECTION_ID_KEY, localLastConnectionId);
266+
}
267+
268+
return builder;
269+
}
270+
271+
private void populateAttemptNumber(AttributesBuilder builder) {
272+
builder.put(ATTEMPT_NUMBER_KEY, currentAttemptId);
273+
}
274+
}

0 commit comments

Comments
 (0)