In this short guide, you will learn about the unexplored world of OpenTelemetry and eBPF. What does eBPF mean for observability? Why are we even discussing it in the context of OpenTelemetry? Let’s get started.
What to expect?
- What is eBPF?
- Why is eBPF relevant to observability?
- What do OpenTelemetry and eBPF have to do with each other?
- Bottom line
What is eBPF?
eBPF is a technology that allows us to run lightweight, sandboxed programs in a privileged context such as the operating system kernel. We use it to extend the kernel’s capabilities without changing kernel source code or loading kernel modules.
Traditionally, the kernel’s ability to oversee and control the entire operating system has made the kernel an ideal place to implement observability, security, and networking functionality.
But the kernel’s central role and its high demands for stability and security have historically made it difficult to evolve. As a result, innovation at the operating system level has been slower than innovation outside of it (e.g., in virtualization).
eBPF has changed the playing field. Our ability to run lightweight, sandboxed programs within the operating system enables application developers to extend the capabilities of the operating system without directly modifying it.
The result has been a wave of eBPF-based projects covering a range of use cases, including observability and security.
Anyone familiar with the topic can’t think of observability without thinking of OpenTelemetry. So what does eBPF mean for OpenTelemetry, and why is it relevant to observability?
Keep on reading.
Why is eBPF relevant to observability?
We define observability (o11y) as the ability to understand what’s happening within our app from the outside, without having to look inside it.
As mentioned, eBPF programs run outside the app, in the kernel. They let us add capabilities at the kernel layer without changing the app or the kernel source code, and what they observe there tells us about the state of the application running on top of it.
Without getting into the kernel’s nasty technicalities, we know it’s a crucial part of the operating system, connecting the application and the core resources of the computer (CPU, memory, etc.).
With eBPF, we get a closer look at the operations going through the kernel, without “interfering” with the app or the kernel itself. It operates as a powerful tool to gain observability in these areas.
So rather than relying solely on static indicators and gauges exposed by the operating system, with eBPF, we can collect and aggregate custom metrics from within the kernel and generate visibility events based on a wide range of possible sources.
This way, we extend the depth of visibility into the kernel and can even reduce the overall system overhead significantly, by collecting only the visibility data required and by aggregating it at the source of the event instead of exporting every sample.
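To make the overhead argument concrete, here is a minimal sketch in plain Python. It is not eBPF code and it does not touch the kernel. It only simulates the idea of folding events into a small histogram at the source, roughly as an eBPF program might do with a map, rather than shipping every raw event. The latency values and the helper names `log2_bucket` and `aggregate` are made up for illustration.

```python
from collections import Counter


def log2_bucket(value_us: int) -> int:
    """Return the power-of-two bucket index for a latency in microseconds.

    Bucket k holds values in [2**k, 2**(k + 1)); values below 1 land in bucket 0.
    """
    return max(value_us, 1).bit_length() - 1


def aggregate(latencies_us):
    """Fold raw events into a histogram, keeping only a count per bucket."""
    histogram = Counter()
    for latency in latencies_us:
        histogram[log2_bucket(latency)] += 1
    return histogram


# Hypothetical raw events: one latency sample (in microseconds) per operation.
raw_events = [3, 5, 9, 17, 18, 250, 260, 1000, 4, 6]

histogram = aggregate(raw_events)

for bucket in sorted(histogram):
    low, high = 2 ** bucket, 2 ** (bucket + 1) - 1
    print(f"{low:>5} - {high:<5} us : {histogram[bucket]}")

print(f"{len(raw_events)} raw events folded into {len(histogram)} buckets")
```

The point of the sketch is that the histogram’s size depends on the number of buckets, not on the number of events, so what gets exported stays small even when the event rate grows.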
What do OpenTelemetry and eBPF have to do with each other?
Let’s start with OpenTelemetry.
OpenTelemetry is an open-source project by the CNCF (Cloud Native Computing Foundation, the same foundation responsible for Kubernetes).
It provides a single set of APIs, SDKs, and tools that generate and capture logs, metrics, and traces in a unified way and under a single specification, then ship them to a destination of your choice (backend, collector, etc.).
Traces, logs, and metrics are known as the three pillars of observability.
OpenTelemetry enables us to instrument our distributed services, meaning, to gather data from the events that happen in our systems, which ultimately help us understand our software’s performance and behavior. In other words, to gain observability.
So similar to eBPF, we know that OpenTelemetry also allows us to gain observability (and is becoming today’s standard). But the two are not entirely interchangeable.
We first need to distinguish between two layers:
- Telemetry collection
- Telemetry specification
OpenTelemetry and eBPF overlap in their telemetry collection functionality
Both OpenTelemetry and eBPF allow us to collect telemetry (OpenTelemetry does so through its SDKs).
Each has a different way of doing it, and there are trade-offs for each project.
Thanks to its kernel proximity, eBPF shines at collecting operating system metrics, generating deep profiles, and any task that requires deep packet visibility.
The OpenTelemetry SDK is deployed within our code. It is easier to manage and use for achieving end-to-end tracing in a distributed system.
For developers, it’s part of their ecosystem. Modifications are made within the code in their native programming language – it’s a familiar environment. For the uninitiated, eBPF is a different creature altogether.
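End-to-end tracing works because each service passes trace context on to the next one. The sketch below uses only the Python standard library, not the OpenTelemetry SDK, to show the shape of the W3C Trace Context `traceparent` header that carries this context between services. The helper names (`new_traceparent`, `parse_traceparent`, `child_traceparent`) are hypothetical and exist only for this illustration; in real code, the SDK’s propagators handle this for you.

```python
import re
import secrets

# Version "00", 16-byte trace ID, 8-byte parent span ID, 1-byte flags,
# all encoded as lowercase hex and separated by dashes.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent(sampled: bool = True) -> str:
    """Start a new trace: fresh trace ID and fresh span ID."""
    flags = "01" if sampled else "00"
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-{flags}"


def parse_traceparent(header: str) -> dict:
    """Split a traceparent header into its fields, or raise ValueError."""
    match = TRACEPARENT_RE.match(header)
    if match is None:
        raise ValueError(f"not a valid version-00 traceparent: {header!r}")
    trace_id, span_id, flags = match.groups()
    return {
        "trace_id": trace_id,
        "parent_span_id": span_id,
        "sampled": (int(flags, 16) & 1) == 1,  # lowest bit is the sampled flag
    }


def child_traceparent(incoming: str) -> str:
    """Continue a trace: keep the trace ID, mint a new span ID."""
    ctx = parse_traceparent(incoming)
    flags = "01" if ctx["sampled"] else "00"
    return f"00-{ctx['trace_id']}-{secrets.token_hex(8)}-{flags}"


# Service A starts a trace; service B continues it from the incoming header.
header_from_a = new_traceparent()
header_from_b = child_traceparent(header_from_a)
print(header_from_a)
print(header_from_b)
```

Both headers share the same trace ID, which is what lets a backend stitch spans from different services into one trace.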
More on the OpenTelemetry SDK, here.
Telemetry specification: OpenTelemetry is the standard
While there are many ways to collect telemetry, the “observability standard” is determined by OpenTelemetry.
For example, we can use eBPF to collect telemetry in an OpenTelemetry format. This extends the type and scope of data available through OpenTelemetry to a wide range of application and kernel telemetry.
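As a rough illustration of that idea, the sketch below takes a counter value that an eBPF-based agent might have read from the kernel and wraps it in a structure modeled on the OTLP/JSON metrics payload. It is simplified (for instance, it omits the start timestamp), and the metric name, attribute key, scope name, service name, and the `to_otlp_sum` helper are all assumptions made for this example. A real agent would typically rely on an OpenTelemetry exporter rather than hand-building the payload.

```python
import json
import time


def _attributes(pairs: dict) -> list:
    """Encode a flat dict as a list of OTLP-style key/value attributes."""
    return [{"key": k, "value": {"stringValue": str(v)}} for k, v in pairs.items()]


def to_otlp_sum(name, value, attributes, service_name, time_unix_nano=None):
    """Wrap one cumulative counter reading in an OTLP/JSON-shaped document."""
    if time_unix_nano is None:
        time_unix_nano = time.time_ns()
    data_point = {
        "asInt": str(value),                  # 64-bit integers travel as strings in JSON
        "timeUnixNano": str(time_unix_nano),
        "attributes": _attributes(attributes),
    }
    metric = {
        "name": name,
        "unit": "1",
        "sum": {
            "aggregationTemporality": 2,      # 2 = cumulative
            "isMonotonic": True,
            "dataPoints": [data_point],
        },
    }
    return {
        "resourceMetrics": [{
            "resource": {"attributes": _attributes({"service.name": service_name})},
            "scopeMetrics": [{
                "scope": {"name": "example-ebpf-agent"},  # hypothetical scope name
                "metrics": [metric],
            }],
        }]
    }


# Hypothetical reading: 42 TCP retransmits observed on eth0 by a kernel probe.
payload = to_otlp_sum(
    name="example.tcp.retransmits",
    value=42,
    attributes={"network.interface.name": "eth0"},
    service_name="checkout",
)
print(json.dumps(payload, indent=2))
```

Once kernel data is in this shape, it can be handled by the same collector and backend as the telemetry coming from the SDKs.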
OpenTelemetry + eBPF: Bottom line
OpenTelemetry and eBPF have some overlapping telemetry collection functionalities. Which tool to use depends on your goal.
If you wish to achieve distributed tracing, we recommend using the OpenTelemetry SDKs.
eBPF is extremely powerful for some use cases, and we believe its integration with the OpenTelemetry standard will tighten in the future, so users can collect data from different sources and still handle it all under the same roof.
It’s important to mention that both tools have much more to offer than collecting telemetry. OpenTelemetry is a set of layers (SDKs, collector, specification). And while eBPF has observability capabilities, it is also used for security purposes and much more.
That’s it, folks. A short overview of OpenTelemetry and eBPF. There’s so much more to dive into when it comes to both of these technologies.
If you wish to learn more about eBPF, the website offers a well-written deep dive into more use cases, usage, and more.
To learn more about OpenTelemetry, visit the OpenTelemetry Bootcamp. It’s a free, six-episode YouTube series in which we cover everything OpenTelemetry, from the very basics to sampling, security, and production deployment.
You can use it as your OpenTelemetry playbook.