Aspecto blog

On microservices, OpenTelemetry, and anything in between

Checklist for Troubleshooting OpenTelemetry NodeJS Tracing Issues



I’ll try to make this one short and to the point. You are probably here because you installed OpenTelemetry in your NodeJS application and either did not see any traces or found that some expected spans were missing.

There can be many reasons for that, but some are more common than others. In this post, I will try to enumerate the common ones, along with some diagnostic methods and tips.

If you prefer a video version, we also hosted a live workshop on the topic.


Requirements

I assume that you already have a basic understanding of what OpenTelemetry is and how it works, and that you have tried to set it up in your NodeJS application.

If you’re just getting started with OpenTelemetry, start with this OpenTelemetry guide.

Enable Logging

By default, OpenTelemetry JS does not log anything to its diagnostic logger. Most of the SDK issues below are easy to detect once a logger is enabled.

You can log everything to the console by adding the following code as early as possible in your service:

// tracing.ts or main index.ts
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';
diag.setLogger(new DiagConsoleLogger(), DiagLogLevel.DEBUG);
// rest of your otel initialization code

This is useful for debugging. Logging everything to the console in production is not a good idea, so remember to remove or disable it when your issues are resolved.

Pro tip: At Aspecto we use the OTEL_LOG_LEVEL environment variable to set DiagLogLevel so we can easily turn it off and on.
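A minimal sketch of that pattern, assuming OTEL_LOG_LEVEL holds one of the DiagLogLevel names (NONE, ERROR, WARN, INFO, DEBUG, VERBOSE, ALL). The logLevelFromEnv helper is our own illustration, not part of the OpenTelemetry API; unset or unknown values simply leave diagnostic logging off:

```typescript
import { diag, DiagConsoleLogger, DiagLogLevel } from '@opentelemetry/api';

// Map an OTEL_LOG_LEVEL value such as "debug" or "INFO" to a DiagLogLevel.
// Unset or unrecognized values fall back to NONE (no diagnostic logging).
export function logLevelFromEnv(value: string | undefined): DiagLogLevel {
  const name = (value ?? '').trim().toUpperCase();
  // DiagLogLevel is a numeric enum, so its members can be looked up by name.
  const level = (DiagLogLevel as unknown as Record<string, unknown>)[name];
  return typeof level === 'number' ? (level as DiagLogLevel) : DiagLogLevel.NONE;
}

const level = logLevelFromEnv(process.env.OTEL_LOG_LEVEL);
if (level !== DiagLogLevel.NONE) {
  // Run this before the rest of your OpenTelemetry initialization code.
  diag.setLogger(new DiagConsoleLogger(), level);
}
```

With this in place, running the service with OTEL_LOG_LEVEL=debug turns diagnostics on, and leaving the variable unset keeps production logs clean.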

Auto Instrumentation Libraries

Many users choose auto instrumentation libraries, which automatically create spans for interesting operations in popular, widely used packages (DB drivers, HTTP frameworks, cloud service SDKs, etc.).

Some initialization patterns and configuration options can prevent your service from creating these spans in the first place.

To rule out issues with auto instrumentation libraries, try to create a manual span first. If you see the manual span but not spans from the installed auto instrumentation libraries, continue reading this section.

import { trace } from '@opentelemetry/api';
trace.getTracerProvider().getTracer('debug').startSpan('test manual span').end();

Install and Enable

To use an auto instrumentation library in your service, you’ll need to:

  1. Install it with npm: npm install @opentelemetry/instrumentation-foo. You can search the OpenTelemetry Registry to find available instrumentations.
  2. Create the instrumentation object: new FooInstrumentation(config)
  3. Make sure instrumentation is enabled: call registerInstrumentations(...)
  4. Verify you are using the right TracerProvider

For most users, the following should cover it:

// First run `npm install @opentelemetry/instrumentation-foo @opentelemetry/instrumentation-bar`
// Replace foo and bar with the actual packages you need to instrument (http/mysql/redis etc)
import { FooInstrumentation } from '@opentelemetry/instrumentation-foo';
import { BarInstrumentation } from '@opentelemetry/instrumentation-bar';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
// create TracerProvider, SpanProcessors and SpanExporters
registerInstrumentations({
  instrumentations: [new FooInstrumentation(), new BarInstrumentation()],
});

For advanced users who use the low-level API instead of calling “registerInstrumentations”: make sure each instrumentation is set to use the right tracer provider, and call “enable()” on it if it is not already enabled.
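A sketch of the low-level route for the http instrumentation, assuming the 1.x SDK API used throughout this post (provider.addSpanProcessor). Creating the instrumentation with enabled: false keeps it from patching anything before it has been pointed at the provider; registerInstrumentations otherwise does this wiring for you:

```typescript
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { ConsoleSpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
// provider.register() is deliberately not called: this provider is NOT the global one,
// so the instrumentation has to be told about it explicitly.

// Start disabled so nothing is patched before the provider is set.
const httpInstrumentation = new HttpInstrumentation({ enabled: false });
// Without this call, the instrumentation would use the global (possibly no-op) provider.
httpInstrumentation.setTracerProvider(provider);
// Patch the http/https modules; their spans now go to `provider`.
httpInstrumentation.enable();
```

Forgetting either setTracerProvider or enable() in this setup is enough to produce an instrumentation that is installed but never emits spans.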

Enable Before Require

All instrumentations are designed such that you first need to enable them and only then require the instrumented package. A common mistake is to require packages before enabling the instrumentation libraries for them.

Here is a bad example:

import { NodeTracerProvider } from "@opentelemetry/sdk-trace-node";
import { registerInstrumentations } from "@opentelemetry/instrumentation";
import { HttpInstrumentation } from "@opentelemetry/instrumentation-http";
import { SimpleSpanProcessor, ConsoleSpanExporter } from "@opentelemetry/sdk-trace-base";
import http from "http"; // ⇐ BAD - at this point instrumentation is not registered yet
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
provider.register();
registerInstrumentations({ instrumentations: [new HttpInstrumentation()] });
// your application code which uses http

In most cases, the instrumentation code resides in a different file or package than the application code, which makes this mistake tricky to spot. Some frameworks, such as serverless, can import packages before the instrumentation code has a chance to run, which is easy to miss.

To diagnose this issue, enable logging and verify you are seeing your instrumentation package being loaded. For example:

“@opentelemetry/instrumentation-http Applying patch for https@12.22.9”

If this line is missing, chances are your auto instrumentation library is not being applied.
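For comparison, here is a sketch of the bad example above rearranged so that http is loaded only after the instrumentation is registered. It assumes your TypeScript compiles to CommonJS, where a static import of http at the top of the file would be hoisted above the registration code, so http is loaded with require() afterwards instead. It also uses isWrapped from @opentelemetry/instrumentation as an extra check that http.request was actually patched:

```typescript
// tracing.ts: OpenTelemetry setup only; it must run before any instrumented package is loaded.
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { registerInstrumentations, isWrapped } from '@opentelemetry/instrumentation';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { SimpleSpanProcessor, ConsoleSpanExporter } from '@opentelemetry/sdk-trace-base';

const provider = new NodeTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
provider.register();
registerInstrumentations({ instrumentations: [new HttpInstrumentation()] });

// GOOD: http is loaded only now, after the instrumentation is registered.
// A static `import http from 'http'` would be hoisted above the code above.
// eslint-disable-next-line @typescript-eslint/no-var-requires
const http: typeof import('http') = require('http');

// Extra check: true if the instrumentation has wrapped http.request.
console.log('http.request patched:', isWrapped(http.request));
```

In a real service, keep this setup in its own file and load it before the application, for example with node --require ./tracing.js app.js.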

Library Configuration

Some auto instrumentation libraries include custom configuration that controls when instrumentation is skipped. For example, the http instrumentation has options such as ignoreIncomingRequestHook and requireParentforOutgoingSpans.

In specific cases, some libraries do not create spans by default, and you have to explicitly opt in. For example, the ioredis and redis instrumentations should be configured with requireParentSpan = false to create spans for operations that have no parent span.

If you don’t see spans for a library, maybe you need to tweak the configuration to make them appear.
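As a sketch, here is how those options might be passed. The /health route is a hypothetical example, and setting requireParentSpan to false opts the ioredis instrumentation into creating spans that have no parent:

```typescript
import type { IncomingMessage } from 'http';
import { registerInstrumentations } from '@opentelemetry/instrumentation';
import { HttpInstrumentation } from '@opentelemetry/instrumentation-http';
import { IORedisInstrumentation } from '@opentelemetry/instrumentation-ioredis';

const httpInstrumentation = new HttpInstrumentation({
  // Returning true skips span creation for that incoming request (hypothetical health check route).
  ignoreIncomingRequestHook: (request: IncomingMessage) => request.url === '/health',
  // If true, outgoing requests made with no active parent span are not traced.
  requireParentforOutgoingSpans: false,
});

const ioredisInstrumentation = new IORedisInstrumentation({
  // Also trace commands issued outside any parent span (startup code, timers, workers).
  requireParentSpan: false,
});

registerInstrumentations({ instrumentations: [httpInstrumentation, ioredisInstrumentation] });
```

When spans are missing, a hook like ignoreIncomingRequestHook is worth checking first, since an overly broad condition silently drops whole routes.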

Instrumented Library Version

Auto instrumentation libraries usually don’t support all versions of the library they instrument. If the version you are using is too old or very recent, it might not be supported and thus no spans will be created.

Consult the documentation of the instrumentation you are using to verify that your version is supported. The supported versions are usually listed in the instrumentation’s README (for example, the ioredis instrumentation README).
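To check which version you actually have installed, a small helper like the following can print it for comparison with the README’s supported range. installedVersion is a hypothetical name, the sketch assumes CommonJS (it uses require.resolve), and it returns undefined when the package is missing or its package.json is hidden by an exports map:

```typescript
// Hypothetical helper: look up the installed version of a package, resolved from the
// application's working directory, so it can be compared with the README's supported range.
export function installedVersion(pkg: string): string | undefined {
  try {
    const manifestPath = require.resolve(`${pkg}/package.json`, { paths: [process.cwd()] });
    const manifest: { version?: string } = require(manifestPath);
    return manifest.version;
  } catch {
    // Not installed, or its "exports" map does not expose package.json.
    return undefined;
  }
}

for (const pkg of ['ioredis', 'redis', 'pg', 'mongodb']) {
  console.log(`${pkg}: ${installedVersion(pkg) ?? 'not found'}`);
}
```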

No Recording and Non-Sampled Spans

Not all spans created in your application are exported. Spans can be marked as “Not Sampled” or “Non-Recorded”, in which case you will not see them in your backend.

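To rule out these issues, you can hook in a “debug span processor” that only prints the sampling decision. If “span sampled: false” is printed to the console, continue reading this section. If nothing is printed at all, also continue: non-recorded spans are never passed to span processors, so the processor below never sees them.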

import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { ReadableSpan } from '@opentelemetry/sdk-trace-base';
import { Span, Context, TraceFlags } from '@opentelemetry/api';
const provider = new NodeTracerProvider();
provider.addSpanProcessor({ 
    forceFlush: async () => {},
    onStart: (_span: Span, _parentContext: Context) => {},
    onEnd: (span: ReadableSpan) => { 
        const sampled = !!(span.spanContext().traceFlags & TraceFlags.SAMPLED);
        console.log(`span sampled: ${sampled}`);
    },
    shutdown: async () => {},
});
provider.register();
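Because non-recorded spans never reach a span processor, a complementary check is to start a throwaway span through the global API and inspect it directly. A minimal sketch (probeTracing is our own hypothetical helper) that distinguishes a missing SDK, a non-recorded span, a recorded but unsampled span, and a sampled one:

```typescript
import { trace, isSpanContextValid, TraceFlags } from '@opentelemetry/api';

type ProbeResult = 'no-sdk' | 'not-recorded' | 'recorded-not-sampled' | 'sampled';

// Hypothetical helper: start a throwaway span through the global API and classify it.
export function probeTracing(): ProbeResult {
  const span = trace.getTracer('debug').startSpan('probe span');
  try {
    const ctx = span.spanContext();
    // The default no-op provider returns spans with an invalid (all-zero) span context.
    if (!isSpanContextValid(ctx)) return 'no-sdk';
    // Sampling decision NOT_RECORD: span processors never see this span.
    if (!span.isRecording()) return 'not-recorded';
    // Recorded spans reach processors, but the built-in Simple/Batch processors export only sampled ones.
    return (ctx.traceFlags & TraceFlags.SAMPLED) === TraceFlags.SAMPLED ? 'sampled' : 'recorded-not-sampled';
  } finally {
    span.end();
  }
}

console.log('tracing probe:', probeTracing());
```

Call it after your tracing setup has run. With only the default no-op provider in place it reports no-sdk; with a registered NodeTracerProvider and the default parent-based sampler, a span started outside any request would typically report sampled.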

NoopTracerProvider

If you don’t create and register a valid TracerProvider, your app runs with the default no-op TracerProvider, which creates every span in your app as a NonRecordingSpan, so nothing is ever exported.

You need to have code similar to this as early as possible in your application:

import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { ConsoleSpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
provider.register();

Remote Sampling Decision

The default (and very popular) sampling behavior is that each span inherits the sampling decision of its parent. If the component that invoked your service is configured not to sample, you will not see spans from your service either.

Examples:

  • An API Gateway can be configured with sampling logic or have tracing turned off, in which case it can affect all downstream tracing (including your innocent service, which you do want sampled).

  • External clients calling your service can also be instrumented and make their own sampling decisions, which you have no control over. These decisions are then propagated to your service and affect it.

  • Other services in your system can make sampling decisions based on their local needs and viewpoint. It is easy to configure an upstream service not to sample an uninteresting endpoint without realizing that this endpoint calls a very interesting and important endpoint downstream (which we do want to sample).
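When you suspect an upstream decision, it helps to look at what actually arrives. Assuming the default W3C Trace Context propagation, the caller’s decision travels in the last field of the traceparent header (bit 0x01 of trace-flags). A minimal parser, with isTraceparentSampled as a hypothetical helper name:

```typescript
// Hypothetical helper: read the caller's sampling decision from a W3C traceparent header,
// formatted as "version-traceid-parentid-traceflags", e.g.
// "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01".
export function isTraceparentSampled(header: string | undefined): boolean | undefined {
  if (!header) return undefined; // no upstream decision: the local root sampler decides
  const fields = header.trim().split('-');
  const flags = fields[3];
  if (fields.length < 4 || !/^[0-9a-f]{2}$/i.test(flags)) return undefined; // malformed header
  // Bit 0x01 of trace-flags is the "sampled" flag.
  return (parseInt(flags, 16) & 0x01) === 0x01;
}
```

For example, logging isTraceparentSampled(req.headers['traceparent'] as string | undefined) in an incoming request handler shows whether the caller asked you not to sample. A result of undefined means no usable header arrived, so your own root sampler decides.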

Local Sampler

You can configure your local sampler to sample only some spans, or none at all. If the configuration was written by someone else a long time ago, or if it is complex or non-intuitive, spans may be legitimately left unsampled and unexported, which is easy to miss.
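To find out whether sampling is the culprit, one option is to temporarily replace the sampler with an explicit, permissive one. This sketch uses ParentBasedSampler from @opentelemetry/sdk-trace-base with AlwaysOnSampler for root spans and for remote parents that were not sampled, so upstream “do not sample” decisions are overridden. It is meant for debugging only, since in production it would ignore your callers’ sampling decisions:

```typescript
import { ROOT_CONTEXT, SpanKind, TraceFlags, trace } from '@opentelemetry/api';
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { AlwaysOnSampler, ParentBasedSampler, SamplingDecision } from '@opentelemetry/sdk-trace-base';

// DEBUG ONLY: sample every root span, and override "not sampled" decisions from remote callers.
export const debugSampler = new ParentBasedSampler({
  root: new AlwaysOnSampler(),
  remoteParentNotSampled: new AlwaysOnSampler(),
});

// A sampler passed explicitly here takes precedence over OTEL_TRACES_SAMPLER.
const provider = new NodeTracerProvider({ sampler: debugSampler });
provider.register();

// Sanity check: simulate an incoming request whose caller decided NOT to sample.
const traceId = '4bf92f3577b34da6a3ce929d0e0e4736';
const unsampledRemoteParent = trace.setSpanContext(ROOT_CONTEXT, {
  traceId,
  spanId: '00f067aa0ba902b7',
  traceFlags: TraceFlags.NONE,
  isRemote: true,
});
export const decision = debugSampler.shouldSample(
  unsampledRemoteParent, traceId, 'probe', SpanKind.SERVER, {}, [],
).decision;
console.log('decision for unsampled remote parent:', SamplingDecision[decision]);
```

If spans appear with this sampler but not with your usual configuration, the problem is the sampling setup rather than instrumentation or exporting.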

Exporting Issues

It is possible that the service is generating spans, but they are not exported correctly to your backend, or they are being dropped in the collector for some reason.

To rule out exporting issues, try adding a ConsoleSpanExporter. If you see spans printed to the console but not in the backend you export to, continue reading this section.

import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { ConsoleSpanExporter, SimpleSpanProcessor } from '@opentelemetry/sdk-trace-base';
const provider = new NodeTracerProvider();
provider.addSpanProcessor(new SimpleSpanProcessor(new ConsoleSpanExporter()));
provider.register();

Configuring an Exporter

Your service should have span exporting code similar to this:

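import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
const provider = new NodeTracerProvider();
// With no options, the endpoint comes from the OTEL_EXPORTER_OTLP_* environment variables,
// or falls back to a local collector (http://localhost:4318/v1/traces in recent versions)
const exporter = new OTLPTraceExporter();
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();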

In this example, I used @opentelemetry/exporter-trace-otlp-proto, but there are other exporters to choose from, and each has a few configuration options. A mistake in any of these options makes the export fail, and by default that failure is silently ignored.
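One way to surface those failures is to wrap the exporter. The LoggingExporter below is a hypothetical helper, not an OpenTelemetry class: it implements the SpanExporter interface from @opentelemetry/sdk-trace-base, delegates to the real exporter, and logs any result whose code is not ExportResultCode.SUCCESS:

```typescript
import { ExportResult, ExportResultCode } from '@opentelemetry/core';
import { ReadableSpan, SpanExporter } from '@opentelemetry/sdk-trace-base';

// Hypothetical wrapper: forwards everything to the real exporter, but reports failed exports
// instead of letting them disappear silently.
export class LoggingExporter implements SpanExporter {
  constructor(
    private readonly inner: SpanExporter,
    private readonly log: (message: string) => void = (message) => console.error(message),
  ) {}

  export(spans: ReadableSpan[], resultCallback: (result: ExportResult) => void): void {
    this.inner.export(spans, (result) => {
      if (result.code !== ExportResultCode.SUCCESS) {
        this.log(`Exporting ${spans.length} span(s) failed: ${result.error?.message ?? 'unknown error'}`);
      }
      resultCallback(result); // always hand the original result back to the span processor
    });
  }

  shutdown(): Promise<void> {
    return this.inner.shutdown();
  }
}
```

You would then register it with provider.addSpanProcessor(new BatchSpanProcessor(new LoggingExporter(new OTLPTraceExporter()))). Enabling the diagnostic logger described earlier should also surface these errors, since the batch processor reports failed exports through the global error handler.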

Here are a few common configuration errors:

OTLP exporters

  • Format — OTLP supports the http/json, http/protobuf, and gRPC formats. You need to choose an exporter package that matches the format your OTLP collector supports.

  • Path — If you set the HTTP collector endpoint in code (the url option) or through the OTEL_EXPORTER_OTLP_TRACES_ENDPOINT environment variable, you must include the path: “http://my-collector-host:4318/v1/traces”. If you forget the path, the export will fail. (The generic OTEL_EXPORTER_OTLP_ENDPOINT variable is the exception: the exporter appends /v1/traces to it for you.) In gRPC, you must not add a path: “grpc://localhost:4317”. This can be a bit confusing to get right at first.

  • Secure Connection — Check if your collector expects a secure or insecure connection. In http, this is determined by the URL scheme (http: / https:). In grpc, the scheme has no effect and the connection security is set exclusively by the credentials parameter: grpc.credentials.createSsl(), grpc.credentials.createInsecure(), etc. The default security for both HTTP and gRPC is Insecure.
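A sketch of both transports side by side. It assumes the @opentelemetry/exporter-trace-otlp-proto and @opentelemetry/exporter-trace-otlp-grpc packages plus @grpc/grpc-js for the credentials helpers (ideally the same @grpc/grpc-js version the exporter depends on, otherwise the credentials type may not match). Host names and ports are placeholders:

```typescript
import { credentials } from '@grpc/grpc-js';
import { OTLPTraceExporter as OTLPProtoTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
import { OTLPTraceExporter as OTLPGrpcTraceExporter } from '@opentelemetry/exporter-trace-otlp-grpc';

// http/protobuf: the URL must include the /v1/traces path.
// The scheme picks the transport security: http: is plaintext, https: is TLS.
export const httpExporter = new OTLPProtoTraceExporter({
  url: 'http://my-collector-host:4318/v1/traces',
});

// gRPC: host and port only, no path. Security comes from the credentials option.
export const grpcExporter = new OTLPGrpcTraceExporter({
  url: 'http://localhost:4317',
  credentials: credentials.createInsecure(), // use credentials.createSsl() for a TLS collector
});
```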

Jaeger Exporter

The Jaeger exporter can work in “Agent” mode (over UDP) or “Collector” mode (over TCP). The logic that decides which one to use is a bit confusing and poorly documented. If you pass the endpoint parameter in the exporter config or set the OTEL_EXPORTER_JAEGER_ENDPOINT environment variable, the exporter uses the “Collector” HTTP sender. Otherwise, it exports in “Agent” mode with the UDP sender, to the host given in the host parameter, else in OTEL_EXPORTER_JAEGER_AGENT_HOST, else to localhost:6832.
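A sketch of both modes with @opentelemetry/exporter-jaeger. The collector URL below (port 14268, path /api/traces) is the usual Jaeger collector HTTP endpoint but is an assumption about your deployment. For agent mode, also make sure OTEL_EXPORTER_JAEGER_ENDPOINT is not set, or the exporter will switch to collector mode:

```typescript
import { JaegerExporter } from '@opentelemetry/exporter-jaeger';

// "Collector" mode: passing endpoint (or setting OTEL_EXPORTER_JAEGER_ENDPOINT) selects the HTTP sender.
export const collectorModeExporter = new JaegerExporter({
  endpoint: 'http://localhost:14268/api/traces',
});

// "Agent" mode: no endpoint anywhere, so spans are sent over UDP to host:port.
// If host is omitted, OTEL_EXPORTER_JAEGER_AGENT_HOST or localhost is used instead.
export const agentModeExporter = new JaegerExporter({
  host: 'localhost',
  port: 6832,
});
```

Being explicit about the mode in code removes the guesswork: if you intend to talk to a collector, pass endpoint; if you intend to talk to an agent, pass host and port and keep the endpoint variable unset.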

Setting Vendor Credentials

If you are using a vendor as your tracing backend, you might need to add extra information such as authentication headers. For example, if you send traces to Aspecto, you’ll need to add your Aspecto token as an Authorization header, like this:

import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
import { BatchSpanProcessor } from '@opentelemetry/sdk-trace-base';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';
const provider = new NodeTracerProvider();
const exporter = new OTLPTraceExporter({
  url: 'https://otelcol.aspecto.io/v1/trace',
  headers: {
    Authorization: 'YOUR_API_KEY_HERE',
  },
});
provider.addSpanProcessor(new BatchSpanProcessor(exporter));
provider.register();

Without this header, you will not see any data in your vendor’s account.
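To avoid hard-coding the token, and to fail loudly when it is missing rather than sending unauthenticated exports, you could build the exporter from an environment variable. Both createVendorExporter and TRACING_API_KEY are hypothetical names chosen for this sketch:

```typescript
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-proto';

// Hypothetical helper: TRACING_API_KEY is an illustrative variable name, not one defined
// by OpenTelemetry or by any vendor.
export function createVendorExporter(url: string, env: NodeJS.ProcessEnv = process.env): OTLPTraceExporter {
  const apiKey = env.TRACING_API_KEY;
  if (!apiKey) {
    // Fail at startup instead of exporting requests the backend will likely reject.
    throw new Error('TRACING_API_KEY is not set; traces would not show up in your vendor account');
  }
  return new OTLPTraceExporter({ url, headers: { Authorization: apiKey } });
}
```

The OTLP exporters can also read headers from the standard OTEL_EXPORTER_OTLP_HEADERS environment variable (comma-separated key=value pairs), which avoids code changes entirely.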

Flush and Shutdown

When your service goes down or your lambda function ends, it is possible that not all spans are successfully exported to your collector yet. You need to call the shutdown function on your tracer provider and await the returned promise to ensure all data has been sent.

import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';
const provider = new NodeTracerProvider();
provider.register();
// When you terminate your service, await shutdown() so pending spans are flushed and exported
process.on('SIGTERM', async () => {
  await provider.shutdown();
  process.exit(0);
});
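Serverless functions need extra care: on AWS Lambda, for example, the execution environment is typically frozen between invocations rather than shut down, so spans buffered by a BatchSpanProcessor can sit unexported. A common pattern is to flush after every invocation with the provider’s forceFlush(). A sketch, where withFlush is a hypothetical wrapper name:

```typescript
import { NodeTracerProvider } from '@opentelemetry/sdk-trace-node';

// Hypothetical wrapper: run the handler, then flush buffered spans before returning,
// so nothing is left in memory when the environment is frozen.
export function withFlush<A extends unknown[], R>(
  provider: NodeTracerProvider,
  handler: (...args: A) => Promise<R>,
): (...args: A) => Promise<R> {
  return async (...args: A): Promise<R> => {
    try {
      return await handler(...args);
    } finally {
      // A failed flush is logged rather than thrown, so it never replaces the handler's result.
      await provider.forceFlush().catch((err) => console.error('Flushing spans failed:', err));
    }
  };
}
```

Usage would look like export const handler = withFlush(provider, async (event) => { /* your logic */ }). Flushing adds a little latency to each invocation, which is usually an acceptable price for not losing spans.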

Package Versions Compatibility

Some issues can be a result of incompatible or old versions of SDK and instrumentation packages.

SDK versions

Check that your SDK and API packages are not outdated and are compatible with each other. Make sure you don’t get any peer dependency warnings when you run npm install.

Other APM libraries

OpenTelemetry is not guaranteed to be compatible with other APM libraries that use monkey patching to do their magic. If you have such a package installed, try to remove or disable it and check if the problem goes away.

What’s Next?

Where to Get Help

If none of the above solved your problem, you can ask for help in the following channels:


Should I Use a Vendor?

Another alternative is to use a vendor’s distribution of OpenTelemetry. These distributions can save you time and effort:

  • Technical support
  • Preconfigured with popular features for common and advanced users
  • Up to date with the latest OpenTelemetry versions
  • Implementing best practices and avoiding the pitfalls mentioned above

You can try Aspecto’s SDK here or any other tracing vendor of your choice.
