Aspecto blog

On microservices, OpenTelemetry, and anything in between

Context Management in OpenTelemetry Node.js

context management opentelemetry nodejs

Share this post

Share on facebook
Share on twitter
Share on linkedin

This post shares some of our experience gained by writing Node.js plugins and debugging context issues at Aspecto.

You might find it interesting if:

  • You are developing an instrumentation plugin for OpenTelemetry in Node.
  • You get broken-traces or the structure of the trace tree is not as expected.
  • You would like to have a deeper understanding of how the magic works under the hood.

So you’ve started using OpenTelemetry in your javascript application to collect traces. Since you are reading this article about context management, we assume that you understand what OpenTelemetry is all about and the basics of how to set up tracing for your application.

You have probably run some example application such as the basic-tracer-node under the “examples” directory in the Opentelemetry-js repository, and got a trace in Jaeger.

a basic-tracer-node application in Jaeger UI from the “examples” directory in the Opentelemetry-js repository

In the trace above, we have one main span, with 10 doWork child spans.

But how does doWork know to link to its parent?

This child-parent connection between main and doWork is what we will cover in this article.

What is Span Context?

A trace is a collection of spans, with some relation between the spans (parent <> children). In the example above we have 11 different spans which all belong to the same trace.

In OpenTelemetry, each span is handled autonomously — it starts, ends, and processed orthogonally to other spans. Jaeger received each of the above spans individually and might receive many other spans from other traces in between. Since those traces share the same traceId, Jaeger can fuse them into one trace. It draws the spans as a tree because each span records the span id of its parent. We’ll call these two properties SpanContext. They are captured each time a span is created via the tracer.startSpan() function.

Span context is the glue that connects those separate spans into a trace. The SpanContext interface is defined in @opentelemetry/api:

export interface SpanContext {
   traceId: string;
   spanId: string;
   // and few other context attributes
}

The Goal

The goal is to have a complete, well-structured trace tree.

If everything works properly, all you need to do is to install OpenTelemetry plugins for the libraries you are using and want to auto instrument, setup and configure OpenTelemetry tracing, and should not bother yourself with context.

When things don’t work well (for various reasons), you might encounter:

  • Broken traces — where you expect to see one trace for a single execution, but receive multiple traces where each one is a sub-part of the full trace. For example: If your application receives an HTTP call and writes something to the database, and you get two traces: one with HTTP span, and another with database span.

  • Wrong tree hierarchy — when you logically expect a span A to be the child of some span B, but instead it resides under another span C, giving the impression that the flow of the code is different from what it really is.

How a Span Gets its Context?

The most straightforward way to link a span to its parent is to set the parent option on SpanOptionswhen starting the span:

const span = tracer.startSpan('doWork', { parent });






The parent span stores its own id and the trace id it belongs to, which becomes the parentSpanId and traceId of the new span. This method is sometimes useful, but it doesn’t fit for general instrumentation plugins that patch module functions and cannot usually pass around the parent span object to other plugins or places where it is needed.

Let’s illustrate with an example – we’ll take the basic-trace-node example that we used earlier and change it a bit:

  • Replace the doWork function with a more realistic example that uses async-await semantic and Axios to GETsome data from an external HTTP endpoint.

  • Install the HTTPS plugin, so outgoing HTTP traffic from thedoWork function will create a span per request.

  • Reduce the work spans from 10 to 2, so Jaeger will be less noisy, and await for their completion with `Promise.all()`

The code after those modifications becomes:

"use strict";

const opentelemetry = require("@opentelemetry/api");
const { NodeTracerProvider } = require("@opentelemetry/node");
const { SimpleSpanProcessor } = require("@opentelemetry/tracing");
const { JaegerExporter } = require("@opentelemetry/exporter-jaeger");

// Use NodeTracerProvider which instruments https module automatically
const provider = new NodeTracerProvider();
const axios = require("axios").default;

const exporter = new JaegerExporter({ serviceName: "basic-service" });
provider.addSpanProcessor(new SimpleSpanProcessor(exporter));
provider.register();
const tracer = opentelemetry.trace.getTracer("example-basic-tracer-node");

const parentSpan = tracer.startSpan("main");
// call the doWork async function twice and await it's result
Promise.all([doWork(1), doWork(2)]).then(() => {
  parentSpan.end();
  exporter.shutdown();
});

async function doWork(index) {
  await axios.get(`https://jsonplaceholder.typicode.com/todos/${index}`);
}

If we run the above example and check Jaeger, we’ll see broken-traces — there are 3 traces instead of one!

Our old friend “main” span as one-span trace, and two additional traces with single-span each, for the two HTTP operations.

The issue is that the two HTTP spans were created internally in the @opentelemetry/https plugin and can’t tell their parent is the “main” span. To fix the issue, we need to set a scope for the “main” span with the `tracer.withSpan` function:

const parentSpan = tracer.startSpan("main");
tracer.withSpan(parentSpan, () => {
  // call the doWork async function twice and await it's result
  Promise.all([doWork(1), doWork(2)]).then(() => {
    parentSpan.end();
    exporter.shutdown();
  });
});

The withSpan function executes its callback parameter in such a way that when a new span is started inside that function or one of the cascading sync or async calls, the new span context is set correctly to parentSpan. Since we called doWork() from within the scope of withSpan , and it calls axios.get() which calls the instrumented HTTPS module – the HTTP spans are now created with the right traceId and parentSpan:

Jaeger UI one complete trace with two basic-service requests

We can see one complete trace, as well as that the two requests are executed simultaneously since they run with Promise.all and async function. We fixed the broken-traces issue by setting the scope for the span.

ContextManager

ContextManager is a global, singleton service of the OpenTelemetry js framework which tracks the currently active context at any time. It is used by the tracer to query for the current active Context using the active()function, and to set scoped Context for following function calls using the with() and bind() functions.

AsyncHooksContextManager

OpenTelemetry will take care of setting the correct context for any function called from inside the with() scope, even for async function and promises which start new spans after being awaited.

For example, the following code will produce a correct trace where the three HTTP spans are children of the main span:

const parentSpan = tracer.startSpan("main");
tracer.withSpan(parentSpan, async () => {
  setTimeout(() => {
    axios.get(`https://jsonplaceholder.typicode.com/todos/1`);
  }, 1000);
  await axios.get(`https://jsonplaceholder.typicode.com/todos/2`);
  axios.get(`https://jsonplaceholder.typicode.com/todos/3`);
});

parentSpan.end();

The context will propagate correctly to the timer callback being executed after one second, to the awaited `axios.get` call, and to the non-awaited second axios.get call. This is done by subscribing to various lifecycle events of asynchronous resources using the async_hooks node module.

  • To see the current active context, you can examine the return value from opentelemetry.context.active() imported as const opentelemetry = require("@opentelemetry/api"). This function will return a Context object, which is an empty object if there is no active context, and an object with a key such as “Symbol(OpenTelemetry Context Key ACTIVE_SPAN)”, if a context is set.

  • If the code is using `thenables` objects (objects that has a then()method, and behave like a promise, but are not a real promise), then due to a bug in V8, context will not propagate across the ‘thenable’ call. The following example code will create child_span as a new trace, and will not link it correctly under the main parent, even though it is awaited from inside the withSpan call.
const parentSpan = tracer.startSpan("main");
tracer.withSpan(parentSpan, async () => {
  const thenable = {
    then: (onFulfilled) => {
      const span = tracer.startSpan("child span");
      onFulfilled(42);
      span.end();
    },
  };
  await thenable;
});

parentSpan.end();
  • The withSpan function is a common practice in js plugins. It scopes the part of code that should run under the context of a specific span. Another common practice is to use the bind() function, which binds a context to a later run of an arbitrary function or EventEmitter. It can sometimes be surprising to run a function and receive a context that is different from what’s stated in withSpan. So keep that in mind in case you see some voodoo context being set and not sure where it is coming from.

  • Make sure your spans are scoped correctly. In many modules, there are complicated callbacks logic and interactions with other modules. If your trace is not structured correctly, check which parts of the code resides under the withSpan() and bind() functions.

If you have any questions on how to leverage OpenTelemetry in your environment feel free to reach out to us.

Spread the word

Share on facebook
Share on twitter
Share on linkedin
Subscribe for more distributed applications tutorials and insights that will help you boost microservices troubleshooting.