Given the complex nature of distributed systems and microservice apps, DevOps teams need a dependable way to find, diagnose and fix application performance issues and delivery pipeline bottlenecks.
One way for programmers to map and isolate data across a system's critical pathways is to trace user-initiated requests, which helps identify bottlenecks and improve execution flow. Distributed tracing is a critical function that helps ensure distributed systems run efficiently.
Let's look at the history of end-to-end tracing and see how tools like OpenTracing, Zipkin and Jaeger form a foundation for consistent application performance.
From X-Trace to Google's Dapper: How did we get here?
Tracing tells the story of a transaction or workflow as it propagates through a system. In the monolith era, teams used the term tracing to describe all of the processes involved in handling a request within a single mainframe. Tracing meant tracking requests as they passed back and forth within a system, typically by writing log statements tagged with request IDs, thread identifiers and user IDs. However, these older tracing frameworks were often limited by proprietary language requirements and by the specific systems they targeted.
Introduced in 2007, X-Trace was a true forerunner to today's distributed tracing tools. It provided a concise framework for correlating the events in one component to the events in other arbitrary components -- a key element for microservices monitoring. Google described Dapper, its internal tracing system, in a paper published in 2010. By 2015, the rapid adoption of tracing and a new reliance on distributed services by well-known companies like Netflix, Uber and Facebook had brought Dapper-style tracing into the mainstream.
Dapper's success as a tracing framework is based on ubiquitous deployment across a targeted system along with continuous monitoring. Dapper provides capabilities similar to X-Trace when it comes to correlating events between different components, but it does so at a far larger scale within complex distributed systems and environments. Dapper also became increasingly important for tracking transactions across process boundaries, especially in systems where teams couldn't install application performance monitoring (APM) agents.
The ability to monitor distributed trace data helps you maintain coherence in a microservice architecture and accurately identify data silo problems. In contrast to monolith-based monitoring, distributed tracing can record all the different processes involved in handling a single, user-initiated request.
This tracing process also logs every operation called to handle a request. Each logged operation is known as a span. Since every span is tagged with its own unique ID and with the ID of the request that triggered it, a distributed tracing engine can identify and analyze all of the processing data correlated to the original request.
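To make the span idea concrete, here is a minimal Python sketch of how spans from one request can be tied back together. The `Span` class, the `new_id` helper and the operation names are all hypothetical and chosen for illustration; real tracers use their own data models and ID formats.

```python
import uuid
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Optional


def new_id() -> str:
    """Return a random 16-character hex identifier (illustrative format)."""
    return uuid.uuid4().hex[:16]


@dataclass
class Span:
    trace_id: str                      # shared by every span of one request
    operation: str                     # the operation this span records
    parent_id: Optional[str] = None    # span that triggered this one, if any
    span_id: str = field(default_factory=new_id)  # unique per span


def group_by_trace(spans):
    """Collect the spans that belong to the same original request."""
    traces = defaultdict(list)
    for span in spans:
        traces[span.trace_id].append(span)
    return dict(traces)


# One user-initiated request fans out into three operations.
trace_id = new_id()
root = Span(trace_id, "GET /checkout")
auth = Span(trace_id, "auth.verify", parent_id=root.span_id)
pay = Span(trace_id, "payment.charge", parent_id=root.span_id)

# A span from an unrelated request arrives interleaved with the others.
other = Span(new_id(), "GET /health")

traces = group_by_trace([root, auth, other, pay])
checkout_ops = [s.operation for s in traces[trace_id]]
```

Grouping on the shared trace ID recovers the three checkout operations even though an unrelated span arrived in between, and the parent IDs preserve which operation triggered which.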
Option 1: OpenTracing
OpenTracing offers a set of standards and techniques for distributed tracing that avoids vendor lock-in because it traces transactions through the use of vendor-neutral APIs and instrumentation. In the past, reworking code instrumentation to accurately trace requests across different frameworks and layers slowed down the tracing process and burdened developers. Because instrumentation code was tightly coupled to the underlying tracing platform, switching tracing systems meant refactoring that code each time.
OpenTracing abstracts the differences in distributed tracer deployments so that all tracers can coexist within one system. This abstraction makes it easier for developers to swap out tracer instances without having to constantly change instrumentation. Moreover, as the number of services and clouds grows, trace information passes through more points where it could be lost, which strengthens the case for a general-purpose tracing API.
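The value of that abstraction can be sketched in a few lines of Python. This is not the OpenTracing API itself; the `Tracer` interface, the two implementations and the `handle_order` function are hypothetical names used only to show application code that is instrumented once and runs unchanged against different tracer back ends.

```python
from abc import ABC, abstractmethod
from contextlib import contextmanager


class Tracer(ABC):
    """Hypothetical vendor-neutral interface the application codes against."""

    @abstractmethod
    def record(self, operation: str, event: str) -> None:
        """Send one span event to whatever back end this tracer wraps."""

    @contextmanager
    def span(self, operation: str):
        # Emit a start event, run the wrapped code, then always emit finish.
        self.record(operation, "start")
        try:
            yield
        finally:
            self.record(operation, "finish")


class InMemoryTracer(Tracer):
    """Stand-in for one vendor's tracer: keeps events in a list."""

    def __init__(self):
        self.events = []

    def record(self, operation, event):
        self.events.append((operation, event))


class NoopTracer(Tracer):
    """Stand-in for a second tracer: discards everything."""

    def record(self, operation, event):
        pass


def handle_order(tracer: Tracer, order_id: int) -> str:
    """Instrumented once; unchanged when the tracer is swapped."""
    with tracer.span("handle_order"):
        with tracer.span("load_order"):
            result = f"order-{order_id}"
    return result


memory_tracer = InMemoryTracer()
first = handle_order(memory_tracer, 7)
second = handle_order(NoopTracer(), 7)  # same code, different back end
```

Only the object passed into `handle_order` changes between the two calls, which is the kind of swap a shared tracing API is meant to allow.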
Option 2: Zipkin
Zipkin is a distributed tracing system that was first developed at Twitter and is now available as open source code. Zipkin visualizes trace data between and within services. Its Java-based architecture consists of four components: a collector, a storage service, a search service and a web UI. The collector validates incoming data and passes it along to the storage of choice, such as Cassandra, Elasticsearch or MySQL.
Users can then query Zipkin and retrieve traces from the database via the search service API and the UI. By defining how trace context propagates between services, users gain a holistic view of an entire service, which helps them pinpoint slowdowns and debug problems. Zipkin also enables a more effective way to perform forensics without reassembling application flows from log data.
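Context propagation in Zipkin-instrumented services is commonly done with B3 HTTP headers such as `X-B3-TraceId`, `X-B3-SpanId` and `X-B3-ParentSpanId`. The following Python sketch shows the idea under simplifying assumptions: the function names and the dictionary-based context are hypothetical, header lookup is case-sensitive, and the receiving service always starts a new child span rather than sharing the caller's span ID.

```python
import secrets
from typing import Dict, Optional


def new_hex_id() -> str:
    """Return a 64-bit identifier as 16 lower-hex characters."""
    return secrets.token_hex(8)


def start_trace() -> Dict[str, Optional[str]]:
    """Begin a new trace with a root span that has no parent."""
    return {"trace_id": new_hex_id(), "span_id": new_hex_id(), "parent_id": None}


def inject(ctx: Dict[str, Optional[str]], headers: Dict[str, str]) -> None:
    """Copy the current trace context into outgoing request headers."""
    headers["X-B3-TraceId"] = ctx["trace_id"]
    headers["X-B3-SpanId"] = ctx["span_id"]
    if ctx["parent_id"]:
        headers["X-B3-ParentSpanId"] = ctx["parent_id"]


def extract_child(headers: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Continue an incoming trace with a new span parented on the caller's span."""
    if "X-B3-TraceId" not in headers or "X-B3-SpanId" not in headers:
        return start_trace()  # no usable upstream context: start a fresh trace
    return {
        "trace_id": headers["X-B3-TraceId"],
        "span_id": new_hex_id(),
        "parent_id": headers["X-B3-SpanId"],
    }


# The front-end service starts a trace and calls the back end.
frontend = start_trace()
outgoing: Dict[str, str] = {}
inject(frontend, outgoing)

# The back-end service picks up the context from the request headers.
backend = extract_child(outgoing)

# A request that arrives without headers simply starts its own trace.
orphan = extract_child({})
```

Because the back end reuses the caller's trace ID and records the caller's span as its parent, the two services' spans can later be stitched into one trace.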
Option 3: Jaeger
While Jaeger and Zipkin are both compatible with OpenTracing, Jaeger's architecture is focused on scalability and parallelism. With a back end written in Go, Jaeger uses components such as a collector, datastore, query API and UI. It also accepts Zipkin span requests, which makes it easy for Zipkin users to switch to Jaeger.
Jaeger agents listen for incoming requests and route those requests to a collector that validates, transforms and stores each span to persistent storage. The query service then exposes a REST API to access tracing data from storage for analysis via the React-based UI. This simple and direct process is one of the reasons for Jaeger's surging popularity. Its low resource overhead and a wealth of features like dynamic sampling also contribute to that popularity.
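The agent-to-collector-to-storage-to-query flow described above can be modeled with a toy, in-memory Python sketch. It does not use Jaeger's actual APIs or wire formats; the class names, the required span fields and the normalization step are assumptions made purely to illustrate the validate, transform, store and query stages.

```python
from collections import defaultdict

# Fields this toy collector insists on (an assumption, not Jaeger's schema).
REQUIRED_FIELDS = ("trace_id", "span_id", "operation", "duration_us")


class Collector:
    """Validates, transforms and stores each span it receives."""

    def __init__(self, storage):
        self.storage = storage
        self.rejected = 0

    def validate(self, span) -> bool:
        return all(key in span for key in REQUIRED_FIELDS)

    def transform(self, span) -> dict:
        # Normalize the operation name so queries see consistent values.
        normalized = dict(span)
        normalized["operation"] = span["operation"].strip().lower()
        return normalized

    def receive(self, span) -> bool:
        if not self.validate(span):
            self.rejected += 1
            return False
        stored = self.transform(span)
        self.storage[stored["trace_id"]].append(stored)
        return True


class Agent:
    """Listens for spans from the application and routes them to a collector."""

    def __init__(self, collector):
        self.collector = collector

    def report(self, spans) -> int:
        # Return how many spans the collector accepted.
        return sum(1 for span in spans if self.collector.receive(span))


class QueryService:
    """Reads stored spans back out for analysis or display."""

    def __init__(self, storage):
        self.storage = storage

    def get_trace(self, trace_id):
        return list(self.storage.get(trace_id, []))


storage = defaultdict(list)  # stand-in for persistent storage
collector = Collector(storage)
agent = Agent(collector)
query = QueryService(storage)

accepted = agent.report([
    {"trace_id": "t1", "span_id": "a", "operation": " GET /Cart ", "duration_us": 1200},
    {"trace_id": "t1", "span_id": "b", "operation": "db.Query", "duration_us": 800},
    {"trace_id": "t2", "span_id": "c"},  # incomplete span: fails validation
])
operations = [span["operation"] for span in query.get_trace("t1")]
```

Keeping each stage behind its own small class mirrors the separation of concerns in the pipeline: the application only talks to the agent, and readers only talk to the query service.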
Processing and monitoring the ever-increasing volumes of data generated by large-scale distributed systems remains a key hurdle. This challenge calls for tracing approaches that scale better than the systems they monitor, as well as greater standardization and interoperability among tracing tools. That gap still exists, and an extensible format is needed to share trace data across open source and commercial tools for further interpretation. However, the tools mentioned above can take you one step closer to filling it.