TL;DR – A step-by-step tutorial video + a GitHub demo repo are below.
When it comes to complex microservices architectures, using message brokers is a no-brainer.
Communicating over a message broker reduces how much services need to know about each other in order to exchange messages. This decoupling is the root idea of microservices.
However, asynchronous (async) communication in microservices environments can be complicated to handle. Working out what happened in each transaction using only logs can take hours.
Traces provide observability into microservices and async communication, helping us identify issues when something breaks.
OpenTelemetry is commonly used to instrument applications and to generate, collect, and manage traces, and it provides an out-of-the-box auto instrumentation experience.
When it comes to Redis Pub/Sub, however, one of the challenges we’re facing is our inability to automatically send the metadata required for auto instrumentation.
In this post, we will go through how to overcome this challenge by manually instrumenting Redis Pub/Sub with OpenTelemetry, to get complete end-to-end visibility into the path, process, and actions that a single Redis Pub/Sub message went through.
This kind of granular view can be highly valuable when trying to troubleshoot specific cases in an environment that uses Redis Pub/Sub.
This blog post is based on the talk our CTO, Michael Haberman, gave at RedisConf 2021. You can view the full session here.
Why do we need traces?
Systems have a lot of components that communicate with each other, and we need to understand the interactions between them.
This is true for our day-to-day work, but it’s especially true when something breaks, like when there’s an issue in the production environment – which is when you need that understanding the most.
Usually, what we developers do to understand that interaction is just add a ton of logs. When something breaks, we begin the debugging process by hoping we have enough data in the logs.
But hoping is not enough, and we often discover that we either added too many logs or not enough in the right places.
Traces provide us with that missing piece of visualization. They let us see how different components communicate with each other, which allows us to identify issues.
Let’s look at the example trace below from the Aspecto platform.
It shows a local host communicating with the microservice /user, which also communicates with the microservice /openapi/envs as well as with an AWS Lambda function, which in turn sends an API call to DynamoDB.
This trace gives us visualization.
We can see the HTTP call that invoked the Lambda, along with the three microservices, the Lambda, and DynamoDB. We have the visibility to understand how they communicate.
Now let’s say we had a failure here instead of a 200 OK.
This visualization provides the first layer of understanding how things are operating and what didn’t work as expected.
From there, we can dig deeper into the issue.
Instrumenting with OpenTelemetry
How can we instrument what’s happening within our application? Cue OpenTelemetry.
OpenTelemetry, an open-source observability framework for cloud-native software and a project of the Cloud Native Computing Foundation (CNCF, the same folks that host Kubernetes), provides the ability to instrument our application.
It does this by attaching a unique identifier to every operation that happens. These identifiers let us generate traces and spans.
A span is an operation that takes a span of time, for example, getting the response from an API call. A trace is a set of correlated spans.
Each span contains metadata about the operation, such as its name, start and end timestamps, attributes, and more.
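To make the span and trace vocabulary concrete, here is a minimal sketch in Python. The `Span` class, its field names, and `group_into_traces` are illustrative assumptions, not the OpenTelemetry SDK's own types. It only shows the idea that spans sharing a trace ID are correlated into one trace.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Span:
    """One timed operation. Field names are illustrative assumptions."""
    trace_id: str                      # shared by every span in the same trace
    span_id: str                       # unique to this operation
    name: str
    start_ms: int
    end_ms: int
    parent_span_id: Optional[str] = None
    attributes: Dict[str, str] = field(default_factory=dict)

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


def group_into_traces(spans: List[Span]) -> Dict[str, List[Span]]:
    """Correlate spans by trace ID, ordering each trace by start time."""
    traces: Dict[str, List[Span]] = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return {tid: sorted(group, key=lambda s: s.start_ms)
            for tid, group in traces.items()}


if __name__ == "__main__":
    # Hypothetical spans: two belong to one trace, one to another.
    spans = [
        Span("t1", "b", "dynamodb.query", 30, 80, parent_span_id="a"),
        Span("t1", "a", "GET /user", 0, 120, attributes={"http.method": "GET"}),
        Span("t2", "c", "GET /health", 5, 9),
    ]
    for trace_id, trace_spans in group_into_traces(spans).items():
        print(trace_id, [(s.name, s.duration_ms) for s in trace_spans])
```

The parent span ID is what lets a trace viewer draw the spans as a tree rather than a flat list.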
In the image above, trace #31 is shared between service A and service B because their spans communicate through the same HTTP call.
With HTTP, this happens automatically through auto instrumentation.
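In the HTTP case, the shared trace ID travels in a request header. The sketch below assumes the W3C Trace Context `traceparent` format, which OpenTelemetry uses by default for propagation. The helper names are hypothetical and the code is a simplified illustration, not the OpenTelemetry SDK itself.

```python
import re
import secrets
from typing import Dict, Optional

# Version 00 of the W3C header: 00-<32 hex trace id>-<16 hex span id>-<2 hex flags>
_TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_trace_id() -> str:
    return secrets.token_hex(16)   # 16 random bytes -> 32 hex characters


def new_span_id() -> str:
    return secrets.token_hex(8)    # 8 random bytes -> 16 hex characters


def format_traceparent(trace_id: str, span_id: str, sampled: bool = True) -> str:
    flags = "01" if sampled else "00"
    return f"00-{trace_id}-{span_id}-{flags}"


def parse_traceparent(header: str) -> Optional[Dict[str, object]]:
    """Return the parsed context, or None if the header is not valid."""
    match = _TRACEPARENT.match(header.strip())
    if match is None:
        return None
    trace_id, span_id, flags = match.groups()
    # All-zero IDs are defined as invalid by the specification.
    if set(trace_id) == {"0"} or set(span_id) == {"0"}:
        return None
    return {"trace_id": trace_id, "span_id": span_id,
            "sampled": bool(int(flags, 16) & 1)}


if __name__ == "__main__":
    # Service A starts a trace and sends its context with the HTTP request.
    headers = {"traceparent": format_traceparent(new_trace_id(), new_span_id())}

    # Service B reads the header, keeps the trace ID, and creates a child span.
    parent = parse_traceparent(headers["traceparent"])
    child_span_id = new_span_id()
    print("same trace:", parent["trace_id"], "child span:", child_span_id)
```

Because service B reuses the incoming trace ID and records the incoming span ID as its parent, both services' spans end up in the same trace.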
Redis Pub/Sub + Traces
Redis Pub/Sub is a great choice of message broker. We love it because it’s easy to use, reliable, the code is simple to understand, and it’s easy to run locally.
How traces work with Redis Pub/Sub
Every time we get individual requests, we store them in Redis. After a minute or so, we aggregate them so they tell a story of communication between microservices.
This story becomes the trace that tells us what started the action, its operations, and more.
Then, we announce that the trace is ready by publishing a message through Redis Pub/Sub.
This trace can provide visibility by alerting about a breaking change, contract testing issues, performance issues, and more.
The Challenge: Auto Instrumentation with Redis Pub/Sub
But the challenge is that Redis Pub/Sub doesn’t support sending metadata: a published message is just a channel and a payload, with no headers to carry trace context. What happens when you can’t send metadata? You need to add it yourself, manually.
Manual instrumentation in Redis Pub/Sub
In this part of our RedisConf 2021 session, Michael takes you through the process of manually instrumenting Redis Pub/Sub.
In addition, we created a GitHub demo repo that has everything you need to do it yourself: https://github.com/aspecto-io/redis-pub-sub-demo.
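For a quick feel of what manual instrumentation involves, one common approach when a broker has no headers is to wrap the payload in an envelope that carries the trace context next to it. The sketch below is an illustration of that idea in Python and is not necessarily how the demo repo implements it. The envelope keys and the `wrap_message` / `unwrap_message` helpers are assumptions made for this example.

```python
import json
from typing import Any, Optional, Tuple, Union


def wrap_message(payload: Any, traceparent: str) -> str:
    """Publisher side: put the trace context next to the payload."""
    return json.dumps({"traceparent": traceparent, "payload": payload})


def unwrap_message(raw: Union[str, bytes]) -> Tuple[Any, Optional[str]]:
    """Subscriber side: return (payload, traceparent or None).

    Messages that are not envelopes (for example, from publishers that
    are not instrumented yet) are passed through unchanged.
    """
    try:
        decoded = json.loads(raw)
    except ValueError:
        return raw, None  # not JSON at all, so treat it as a bare payload
    if isinstance(decoded, dict) and "traceparent" in decoded and "payload" in decoded:
        return decoded["payload"], decoded["traceparent"]
    return decoded, None


if __name__ == "__main__":
    # Hypothetical context value in W3C traceparent format.
    context = "00-" + "ab" * 16 + "-" + "cd" * 8 + "-01"

    # With a Redis client this string would be the message body, e.g.
    # client.publish("trace-ready", message) -- channel name is made up.
    message = wrap_message({"traceId": 31, "status": "ready"}, context)

    payload, incoming_context = unwrap_message(message)
    print(payload, incoming_context)
```

On the subscriber side, the extracted context is what lets the consumer's span join the publisher's trace instead of starting a new one.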
If you followed all the steps in the tutorial correctly, you should now have end-to-end visibility. You can see who consumed the message, who published it, and have access to the raw data.
Async communication is not super easy to manage. Getting that end-to-end visibility takes some work. When using message brokers such as Kafka or AWS SQS, you can use OpenTelemetry to auto instrument your messaging solution, since these support metadata.
When it comes to Redis Pub/Sub, despite its many benefits, you cannot auto instrument, so to get that granular visibility, you’ll have to go the extra mile and manually instrument it.
To do that, we highly recommend using OpenTelemetry – many vendors work with OpenTelemetry, and its community keeps growing.
Related GitHub Repositories
Each repo was created with ❤️
OpenTelemetry instrumentations for Node.js.
How to get end to end visibility with redis pub/sub using OpenTelemetry demo repo