Open-Source Tracing Tools: Jaeger Vs. Zipkin Vs. Grafana Tempo

Distributed tracing is crucial for monitoring complex systems. This article covers the three most popular open-source tracing tools: Jaeger, Zipkin, and Grafana Tempo.

Kentaro Wakayama

25 January 2023

The rise of internet-scale enterprises like Amazon, Netflix, Microsoft, and Google has ushered in a new era of distributed systems and microservice applications.

Distributed architectures make it possible to build and operate highly resilient applications, comprising a suite of small independent services. However, they're inherently complex. And this complexity can create major headaches when debugging systems, detecting bottlenecks, tracking down failures, or simply attempting to understand what's happening in the systems.

In distributed architectures, user requests travel through dozens of services implemented in different programming languages. It’s practically impossible to attach debuggers to different processes and step through them to troubleshoot issues. Because of this, traditional ways of debugging monolithic systems have proven ineffective in distributed systems. Distributed tracing aims to address these challenges.

Tracing Fundamentals

Distributed tracing is a diagnostic technique that lets development teams follow requests as they traverse distributed services, applications, and databases. It helps software teams profile, monitor, troubleshoot, and optimize distributed applications.

Tracing begins with a user request at the entry point of an application. As the request travels through the system, each component it touches records the operations it performs on the request, and together these records form a trace.

Each trace has a unique ID and is made up of one or more spans. A span represents an individual unit of work performed while fulfilling the user request. Spans carry information such as a unique ID, a name, metadata, and timestamps, which are useful for analytics and debugging.
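To make the relationship between a trace and its spans concrete, here is a minimal sketch in Python. The `Span` class, its field names, and the `render_tree` helper are hypothetical and simplified for illustration; real tracing libraries define their own, richer data models.

```python
import secrets
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Span:
    """One unit of work inside a trace (hypothetical, simplified model)."""
    trace_id: str                   # shared by every span in the same trace
    span_id: str                    # unique per span
    name: str                       # operation name, e.g. "GET /checkout"
    start_us: int                   # start time in microseconds
    duration_us: int                # how long the operation took
    parent_id: Optional[str] = None            # None marks the root span
    tags: dict = field(default_factory=dict)   # free-form metadata


def render_tree(spans: list) -> list:
    """Return one indented line per span, parents before their children."""
    lines = []

    def walk(parent_id, depth):
        level = [s for s in spans if s.parent_id == parent_id]
        for span in sorted(level, key=lambda s: s.start_us):
            lines.append(f"{'  ' * depth}{span.name} ({span.duration_us} us)")
            walk(span.span_id, depth + 1)

    walk(None, 0)
    return lines


# Three spans that belong to one request, so they share one trace ID.
trace_id = secrets.token_hex(16)
root = Span(trace_id, secrets.token_hex(8), "GET /checkout", 0, 1200)
auth = Span(trace_id, secrets.token_hex(8), "auth.verify", 50, 200,
            parent_id=root.span_id)
db = Span(trace_id, secrets.token_hex(8), "db.query", 300, 700,
          parent_id=root.span_id, tags={"db.system": "postgres"})
example_trace = [db, root, auth]   # arrival order does not matter

print("\n".join(render_tree(example_trace)))
```

The parent ID is what turns a flat list of spans into a tree, which is the structure tracing UIs typically render as a timeline.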

By tracking the flow of requests from end to end, software teams can compare well-performing traces against anomalous ones. Differences in timing, structure, and behavior point to where a problem lies, so teams can troubleshoot and resolve system issues faster.
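As a rough illustration of comparing a healthy trace with an anomalous one, the sketch below ranks operations by how much slower they became. The function name, the service names, and the timings are all made up for the example, and each value is assumed to be the time spent in that operation alone, in milliseconds.

```python
def slowest_regressions(baseline: dict, anomalous: dict, top: int = 3) -> list:
    """Rank operations by how much slower they are in the anomalous trace.

    Both arguments map an operation name to its duration in milliseconds.
    Operations missing from the baseline are treated as taking 0 ms there.
    """
    deltas = {name: dur - baseline.get(name, 0) for name, dur in anomalous.items()}
    ranked = sorted(deltas.items(), key=lambda item: item[1], reverse=True)
    return ranked[:top]


# Hypothetical per-operation timings taken from two traces of the same request.
baseline = {"gateway": 5, "auth": 15, "inventory": 40, "payment": 60}
anomalous = {"gateway": 6, "auth": 16, "inventory": 42, "payment": 845}

for name, delta in slowest_regressions(baseline, anomalous, top=2):
    print(f"{name}: +{delta} ms")
```

In this made-up data the payment operation accounts for almost all of the added latency, which is the kind of signal a tracing tool surfaces visually.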

With so many distributed tracing tools available, deciding which to adopt can be difficult. The remainder of this article covers the capabilities of Jaeger, Zipkin, and Grafana Tempo, and where each falls short.

OpenTelemetry

There are two options for application instrumentation. You can either build a custom tracing solution to fit your needs, or leverage existing tracing frameworks and tools. In general, existing tools offer far more benefits than custom ones, and your team may lack the time and resources required to implement its own telemetry collector.

This is where a community-driven observability framework known as OpenTelemetry comes in. The aim of OpenTelemetry is to provide a universal set of APIs, libraries, agents, and collectors you can use to instrument your code.

OpenTelemetry is the result of the 2019 merger of OpenTracing and OpenCensus. It is, therefore, still common to see applications instrumented with OpenTracing or OpenCensus. We’ll cover both briefly.

OpenTracing

OpenTracing is a Cloud Native Computing Foundation (CNCF) project and open-source framework that allows developers to instrument their codebases using APIs that don't lock them into one particular vendor or product. It provides distributed tracing standards for open-source packages and applications. It also offers a set of libraries for multiple programming languages. This allows you to collect distributed traces and either visualize traces in a UI or transfer the data to various backends.

OpenTracing integrates with commercial (e.g. Datadog or Instana) and open-source (e.g. Zipkin, Jaeger) tracing solutions.

OpenCensus

OpenCensus is a Google-funded, open-source framework that allows organizations adopting distributed architectures to instrument their software for distributed tracing. OpenCensus, like OpenTracing, is vendor-agnostic. It provides instrumentation and APIs that allow developers to collect, manipulate, and export distributed traces and time-series metrics to the backend(s) of their choice in real-time.

OpenCensus offers custom exporters, local debugging, and a gRPC bridge. It also provides integrations with popular frameworks, libraries, and products like MongoDB and Redis. The framework is designed to work across a range of software systems, from highly distributed microservices to large monolithic or client applications.

OpenTracing and OpenCensus provide APIs for implementing distributed tracing. After adopting either framework, you still need a distributed tracing tool to process the trace data and monitor the performance of your applications.

Of course, it’s impossible to know the right tool to use if you don't understand the strengths and weaknesses of each. In the next sections, we'll investigate the two major distributed tracing tools, Zipkin and Jaeger, and introduce you to Grafana Tempo.

The Battle of the Open-Source Tracing Tools

On their own, trace data are just raw data, much like log files or time-series metrics. Collecting them isn’t enough to get the most benefit from them. To understand the global behavior of a distributed system, you need a tracing tool. When an application fails or becomes unresponsive, tracing tools like Zipkin, Jaeger, or Grafana Tempo can help you track down the root cause of a problem and identify the problematic service among myriad interconnected services.

1. Jaeger

Jaeger is a distributed tracing tool for monitoring and troubleshooting microservices-based systems, and a graduated CNCF project. Jaeger was inspired by OpenZipkin and Dapper, and made open source by Uber Engineering. It’s handy for performance and latency optimization, distributed transaction monitoring, service dependency analysis, and root cause analysis.

Jaeger is written in Golang and, like Zipkin, is vendor-agnostic. It supports Elasticsearch and Cassandra as scalable storage backends, and can use Kafka as a buffer in front of them. Jaeger tracing provides several capabilities, including:

  • High scalability
  • Integration with Kiali
  • OpenTracing compatibility
  • Backward compatibility with Zipkin
  • Distributed context propagation
  • Adaptive sampling
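Distributed context propagation, listed above, is what lets spans created in different services end up in the same trace. The sketch below shows the idea using the W3C Trace Context `traceparent` header format; choosing that format is an assumption made for illustration (Jaeger clients have also used their own `uber-trace-id` header), and the `inject` and `extract` helpers are hypothetical, not part of any Jaeger library.

```python
import re
import secrets

# W3C Trace Context header: version-traceid-parentid-flags, all lowercase hex.
TRACEPARENT = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def inject(headers: dict, trace_id: str, span_id: str, sampled: bool = True) -> None:
    """Caller side: write the current trace context into outgoing headers."""
    flags = "01" if sampled else "00"
    headers["traceparent"] = f"00-{trace_id}-{span_id}-{flags}"


def extract(headers: dict):
    """Callee side: return (trace_id, parent_span_id, sampled), or None."""
    match = TRACEPARENT.match(headers.get("traceparent", ""))
    if match is None:
        return None
    trace_id, parent_span_id, flags = match.groups()
    return trace_id, parent_span_id, bool(int(flags, 16) & 0x01)


# Service A starts a trace and calls service B.
trace_id = secrets.token_hex(16)   # 32 hex characters
span_a = secrets.token_hex(8)      # 16 hex characters
outgoing = {}
inject(outgoing, trace_id, span_a)

# Service B reads the header and continues the same trace with its own span.
context = extract(outgoing)
span_b = secrets.token_hex(8)
print(outgoing["traceparent"], "->", context)
```

This simplified parser only checks the shape of the header; a production implementation would also need to handle details such as rejecting all-zero IDs.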

2. Zipkin

Zipkin is a Java-based, vendor-agnostic, and open-source distributed tracing tool. Based on Google's Dapper paper, Zipkin allows software teams to send, receive, store, and visualize trace data in a distributed architecture.

Zipkin has an accessible UI and a dependency diagram that shows how many traced requests pass through each application. You can filter traces by application, timestamp, and trace duration to identify aggregate behaviors, like calls to deprecated services or error paths.

Under the hood, Zipkin can store trace data in Elasticsearch, Apache Cassandra, or MySQL. Applications most commonly report spans to Zipkin over HTTP or Kafka, although other transports, such as RabbitMQ, gRPC, and Apache ActiveMQ, are also available.
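To give a feel for the HTTP transport, here is a minimal sketch that builds a span in Zipkin's v2 JSON format and posts it to a collector. The URL assumes a Zipkin server running locally on its default port, and `make_span`, `encode`, and `report` are hypothetical helper names. In practice you would normally let an instrumentation library do this for you.

```python
import json
import secrets
import time
import urllib.request

# Assumption: a Zipkin server is listening locally on its default port.
ZIPKIN_URL = "http://localhost:9411/api/v2/spans"


def make_span(name, service, duration_us, trace_id=None, parent_id=None):
    """Build one span as a dict in Zipkin's v2 JSON shape."""
    span = {
        "traceId": trace_id or secrets.token_hex(16),
        "id": secrets.token_hex(8),
        "name": name,
        "timestamp": int(time.time() * 1_000_000),  # epoch microseconds
        "duration": duration_us,                    # microseconds
        "localEndpoint": {"serviceName": service},
    }
    if parent_id:
        span["parentId"] = parent_id
    return span


def encode(spans):
    """Serialize a list of spans into the JSON array the endpoint expects."""
    return json.dumps(spans).encode("utf-8")


def report(spans, url=ZIPKIN_URL):
    """POST the spans to the collector and return the HTTP status code."""
    request = urllib.request.Request(
        url,
        data=encode(spans),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


span = make_span("get /users", "frontend", duration_us=1500)
print(encode([span]).decode("utf-8"))
# report([span])  # uncomment when a Zipkin server is reachable at ZIPKIN_URL
```

The call to `report` is left commented out so that the example runs without a server.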

3. Grafana Tempo

Released by Grafana Labs in 2020, Grafana Tempo is a high-volume, minimal-dependency, open-source distributed tracing backend. According to Grafana Tempo’s creators, it’s an easy, cost-efficient, and scalable alternative to Zipkin and Jaeger.

Other tracing backends typically require data stores like Cassandra or Elasticsearch. Grafana Tempo requires only object storage, such as Google Cloud Storage or Amazon S3, to operate. Using object storage means software teams can collect and store a higher volume of traces from distributed applications without the need for sampling.

Grafana Tempo integrates with Grafana, Grafana Loki, and Prometheus. To learn more about Loki, see our article: https://codersociety.com/blog/articles/loki-kubernetes-logging. Tempo is also compatible with Zipkin, Jaeger, OpenTelemetry, and OpenCensus. It is designed to rely on metrics (exemplars) and logs for trace discovery, and it only supports key/value lookup by trace ID.
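The discovery workflow described above can be sketched as follows: a log line carries a trace ID, and that ID is the only key used to fetch the trace. Everything here is hypothetical, including the in-memory `TRACE_STORE` standing in for an object storage bucket, the log line format, and the helper names; it does not reflect Tempo's actual storage layout or API.

```python
import re

# Hypothetical stand-in for an object storage bucket: trace ID -> stored spans.
TRACE_STORE = {
    "4bf92f3577b34da6a3ce929d0e0e4736": [
        {"name": "POST /pay", "duration_ms": 845},
        {"name": "card.authorize", "duration_ms": 790},
    ],
}

# Hypothetical log line that carries the trace ID of the failing request.
LOG_LINE = 'level=error msg="payment failed" traceID=4bf92f3577b34da6a3ce929d0e0e4736'

TRACE_ID_IN_LOG = re.compile(r"traceID=([0-9a-f]{32})")


def trace_id_from_log(line):
    """Pull the trace ID out of a log line, or return None if there is none."""
    match = TRACE_ID_IN_LOG.search(line)
    return match.group(1) if match else None


def get_trace(store, trace_id):
    """Key/value lookup: the trace ID is the only way into the store."""
    return store.get(trace_id, [])


found = get_trace(TRACE_STORE, trace_id_from_log(LOG_LINE))
for stored_span in found:
    print(stored_span["name"], stored_span["duration_ms"], "ms")
```

The design trade-off is that the store never needs a search index, but something else (logs or metric exemplars) has to supply the trace ID first.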

Comparison between Jaeger, Zipkin and Grafana Tempo

Each open-source tracing tool has its own strengths and weaknesses. Deciding which to use depends on a number of factors, including:

  • OpenTracing compatibility
  • Storage support
  • Deployment options
  • Language and library support

Table 1: Comparison between Jaeger, Zipkin and Grafana Tempo

Which Tracing Tool Is Right for You?

Jaeger, Zipkin, and Grafana Tempo are all powerful tools for tracing, collecting, and analyzing requests or system events to understand what’s happening in your applications, but certain characteristics make Grafana Tempo stand out.

Grafana Tempo is less expensive to run and allows you to store a massive number of traces at a low cost. It is compatible with OpenTelemetry and OpenTracing, and integrates with other tools like Grafana Loki and Prometheus. In addition, it’s able to accept spans from Jaeger and Zipkin.

Although Grafana Tempo is younger than Jaeger and Zipkin and has less community support, its ease of use and low cost make it a good fit for teams looking to increase the number of distributed traces they collect and store.

The battle of distributed tracing tools is tough, but Grafana Tempo is a young, modern tool with a promising future. I wouldn't be surprised if Grafana Tempo wins by a large margin in the coming years.

Kentaro Wakayama

Managing Director, CEO

Kentaro leads Coder Society as CEO, bringing hands-on expertise in software development, cloud technologies, and building high-performing engineering teams.
