In this blog post, we’ll cover how to get started with the OpenTelemetry aws-sdk instrumentation for Ruby and walk through its main use cases.
(Link to the instrumentation repository below).
Before we jump in, and for those new to OpenTelemetry, let’s cover two main concepts.
What is OpenTelemetry?
OpenTelemetry is a CNCF project, which, among other things, enables the collection of traces, logs, and metrics (also known as the three pillars of observability). OpenTelemetry is an incredible collection of tools that help you understand your software’s performance and behavior.
What does instrumentation mean?
In the context of OpenTelemetry, instrumenting means providing the ability to monitor and measure the performance of libraries and applications in different languages. For example, the OpenTelemetry aws-sdk instrumentation for Ruby lets us collect telemetry data from Ruby applications that call AWS services.
About this instrumentation
Our goal in creating this instrumentation was to provide more granular data to the growing OpenTelemetry community. It gives more specific insight into how Ruby applications interact with AWS, which helps anyone using it troubleshoot the complex relationships within their services.
What problems does this instrumentation handle?
This instrumentation enables us to track and ultimately visualize traces that represent the interactions of various AWS components with different services and with each other (components such as SQS, SNS, S3, EC2, DynamoDB, and more).
Traces enable us to understand our system and the relationships between its components, which are often the hardest part to understand when debugging, especially at scale.
(Btw, if you’re unfamiliar with terms like spans and traces, we wrote an article covering the topic of distributed tracing and why you’d want to use it: Logging vs Distributed Tracing: Why Logs Aren’t Enough to Debug Your Microservices)
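To make the idea of a trace more concrete, here is a minimal sketch in plain Ruby. It does not use the OpenTelemetry API: the Span struct, the render_trace helper, and the span names are all hypothetical, made up for illustration. It only shows that a trace is a set of spans linked by parent IDs, which is what lets a tracing UI draw the relationships between components.

```ruby
# Hypothetical, simplified span record; real spans carry many more fields
# (trace ID, timestamps, attributes, status, and so on).
Span = Struct.new(:name, :span_id, :parent_id, keyword_init: true)

# Illustrative span names only, not the names the instrumentation emits.
spans = [
  Span.new(name: 'POST /orders',       span_id: 'a1', parent_id: nil),
  Span.new(name: 'DynamoDB.PutItem',   span_id: 'b2', parent_id: 'a1'),
  Span.new(name: 'SNS.Publish',        span_id: 'c3', parent_id: 'a1'),
  Span.new(name: 'SQS.ReceiveMessage', span_id: 'd4', parent_id: 'c3')
]

# Group spans by parent ID, then walk from the root span (parent_id nil)
# downward, indenting each level to show which operation caused which.
def render_trace(spans, parent_id = nil, depth = 0, children = spans.group_by(&:parent_id))
  children.fetch(parent_id, []).flat_map do |span|
    ["#{'  ' * depth}#{span.name}"] + render_trace(spans, span.span_id, depth + 1, children)
  end
end

puts render_trace(spans)
```

Printing the result gives an indented outline of the request, which is a rough text version of what a trace view shows graphically.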
Messaging services visibility
When it comes to distributed systems, some of the most used AWS services out there are SQS (a message queuing service) and SNS (a pub/sub service), which essentially allow you to decouple microservices.
Now imagine a scenario where you have dozens of SQS queues and SNS topics, with services communicating through them asynchronously.
Visualizing the complete journey any message has gone through, who consumed it, and what happened to it right after, is crucial. This is an important function for both understanding your system structure and troubleshooting it as fast as possible.
The ability to correlate events across different services and transfer metadata from one service to the next is a key concept in the OpenTelemetry realm and it’s called context propagation.
Using this instrumentation, when a message is sent to a queue, you will be able to see the full trace: which services consume the message and what other cascading operations happen downstream.
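To illustrate the idea behind context propagation, here is a minimal sketch in plain Ruby. It is not the OpenTelemetry API and not how the instrumentation is implemented: the TraceContextSketch module and its inject and extract methods are hypothetical names. The sketch assumes the trace context travels in a message attribute named traceparent, formatted like the W3C Trace Context header (version, trace ID, parent span ID, flags), and it treats the flags field as a simple sampled/not-sampled value.

```ruby
require 'securerandom'

# Hypothetical helper, for illustration only (not part of OpenTelemetry).
module TraceContextSketch
  # version "00", 32 hex chars of trace ID, 16 hex chars of span ID, 2 hex chars of flags
  TRACEPARENT = /\A00-(\h{32})-(\h{16})-(\h{2})\z/

  # Producer side: return a copy of the attributes with the trace context added.
  def self.inject(attributes, trace_id:, span_id:, sampled: true)
    flags = sampled ? '01' : '00'
    attributes.merge('traceparent' => "00-#{trace_id}-#{span_id}-#{flags}")
  end

  # Consumer side: read the context back, or return nil if it is absent or malformed.
  def self.extract(attributes)
    match = TRACEPARENT.match(attributes['traceparent'].to_s)
    return nil unless match

    { trace_id: match[1], parent_span_id: match[2], sampled: match[3] == '01' }
  end
end

# The producer starts a trace and attaches its context to the outgoing message.
trace_id = SecureRandom.hex(16) # 32 hex characters
span_id  = SecureRandom.hex(8)  # 16 hex characters
message  = {
  body: 'order-created',
  attributes: TraceContextSketch.inject({}, trace_id: trace_id, span_id: span_id)
}

# The consumer continues the same trace, with the producer's span as the parent.
context = TraceContextSketch.extract(message[:attributes])
puts "same trace: #{context[:trace_id] == trace_id}"
puts "parent is producer span: #{context[:parent_span_id] == span_id}"
```

Because the consumer reuses the producer's trace ID and records the producer's span as its parent, both sides end up in one trace even though they never call each other directly.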
A word on visualization: end-to-end visibility into these messaging systems (whether you’re using SQS, Kafka, or anything else) depends heavily on which visualization layer you’re using, since not all open-source tools or vendors provide this ability. This is what it looks like in the Aspecto UI.
Here’s a quick getting-started guide:
To install the instrumentation, call use with the name of the instrumentation:
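require 'opentelemetry/sdk'
require 'opentelemetry/instrumentation/aws_sdk'

# Enable only the aws-sdk instrumentation.
OpenTelemetry::SDK.configure do |c|
  c.use 'OpenTelemetry::Instrumentation::AwsSdk'
end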
Alternatively, you can also call use_all to install all the available instrumentation.
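require 'opentelemetry/sdk'
require 'opentelemetry/instrumentation/all'

# Enable every instrumentation that is installed and loaded.
OpenTelemetry::SDK.configure do |c|
  c.use_all
end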
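The configuration above assumes the relevant gems are already in your bundle. A minimal Gemfile sketch might look like the following. The gem names are the ones published on RubyGems as far as we know; version constraints are left out on purpose, and aws-sdk-sqs is only an example of an AWS client gem, so swap in whichever AWS gems your application actually uses.

```ruby
source 'https://rubygems.org'

# OpenTelemetry SDK, which provides OpenTelemetry::SDK.configure
gem 'opentelemetry-sdk'

# The aws-sdk instrumentation on its own (used with c.use)
gem 'opentelemetry-instrumentation-aws_sdk'

# Or the bundle of all available instrumentations (used with c.use_all)
# gem 'opentelemetry-instrumentation-all'

# Example AWS client gem (assumption: your application talks to SQS)
gem 'aws-sdk-sqs'
```

After running bundle install, the configure block can go in an initializer or wherever your application boots, before the first AWS client call is made.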
On a final note, here’s a take-home point for you.
Messaging systems are a critical part of any distributed system, and they have many moving parts. The main idea is to reduce, as much as possible, the guesswork around what happened, in what order, and when.
Keep your message broker game strong.