Application logs simplify problem-solving in support, but teams face several challenges when they try to execute microservices logging, which requires a centralized view of multiple, distributed services.
An application log is an essential component of any application, regardless of whether it's monolithic or microservices-based, but the fundamental architecture of microservices-based applications makes logging a complicated endeavor. With microservices, several services constantly run and communicate among themselves, generating their own logs along the way. When one or more services fail, the team needs to know which service experienced an issue and why. It's also difficult to decipher the complete request flow in microservices. For instance, which services were called, in what sequence, and how often was each one called?
But there are a few ways to combat the inherent complications of microservices logging. These include meaningful logs, a unique ID to correlate requests, and a way to track errors that span several services. Keep in mind, however, that these practices are easier said than done, and that these issues only get more complex as businesses increase their array of microservices across multiple apps. That's what makes it so important to plan ahead.
Use a correlation ID
Microservices logging has to account for the path of a transaction through multiple services. If services are calling services that call other services, how do you figure out which service failed, and during which call? And chances are that the log will contain millions of log messages. This is where a correlation ID comes to the rescue.
A correlation ID is a unique identifier that developers use to segregate sets of operations and track individual requests. It really doesn't matter how the correlation ID is generated, so long as it is unique, accessible to downstream services and diligently logged along with other important service call data.
If the transaction passing through multiple services has a correlation ID, troubleshooters can do an ID search in the logs and view data about the individual service calls, including the number of times a service was used. This way, the correlation ID can identify which service the transaction failure stemmed from.
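As a minimal sketch of that idea, the Python below reuses an incoming correlation ID or generates one at the first service, then searches log entries by ID to find where a transaction failed. The header name `X-Correlation-ID`, the function names and the log entry fields (`correlation_id`, `service`, `level`) are assumptions made for illustration, not part of any particular framework.

```python
import uuid

# Assumed header name; any name works as long as every service agrees on it.
CORRELATION_HEADER = "X-Correlation-ID"


def ensure_correlation_id(headers):
    """Return a copy of the headers that carries a correlation ID.

    An incoming ID is reused so the whole transaction shares one value;
    a new ID is generated only when none has been set yet.
    """
    outgoing = dict(headers)
    if not outgoing.get(CORRELATION_HEADER):
        outgoing[CORRELATION_HEADER] = str(uuid.uuid4())
    return outgoing


def calls_for(log_entries, correlation_id):
    """Return every logged service call that belongs to one transaction."""
    return [e for e in log_entries if e.get("correlation_id") == correlation_id]


def first_failing_service(log_entries, correlation_id):
    """Return the service of the first ERROR entry for the transaction, if any."""
    for entry in calls_for(log_entries, correlation_id):
        if entry.get("level") == "ERROR":
            return entry.get("service")
    return None
```

Each service would pass the result of `ensure_correlation_id` on to any downstream call it makes and write the same ID into every log entry it produces.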
Structure logs appropriately
Microservices-based applications can incorporate several technology stacks. And if the services in a microservices-based application use different structures to log data in their stacks, it can hamper log standardization. For example, one service might use a pipe symbol as a delimiter for fields, while another service uses a comma as a delimiter. This means troubleshooters can't analyze the individual logs in the same manner.
Developers should structure an application's log data to simplify parsing, and log only relevant information. This structured logging helps create simple and flexible microservices logs.
You can even take this a step further with data visualization. Teams can add a dashboard feature to microservices logging that provides a visual depiction of the information carried in the application logs.
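One common way to get a uniform structure is to emit every log line as a JSON object with the same keys, so no service needs its own delimiter convention. The sketch below does this with Python's standard `logging` module; the `JsonFormatter` class, the `build_logger` helper and the field names are hypothetical choices for this example.

```python
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object with a fixed set of keys."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            # "service" and "correlation_id" are assumed field names that
            # callers supply through the logger's `extra` argument.
            "service": getattr(record, "service", "unknown"),
            "correlation_id": getattr(record, "correlation_id", None),
            "message": record.getMessage(),
        }
        return json.dumps(payload)


def build_logger(name, stream=None):
    """Create a logger that writes JSON lines to the stream (stderr by default)."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```

A call such as `build_logger("orders").info("order created", extra={"service": "orders", "correlation_id": "abc"})` would then produce a line that any JSON-aware tool can parse the same way, regardless of which service wrote it.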
Informative application logs
When an error occurs, the log should include all the information needed to diagnose the issue. The more information troubleshooters have from the microservices' logs, the more easily and quickly they can ascertain what went wrong.
Logs should, at a bare minimum, include the following information:
- Service name
- IP address
- Correlation ID
- Message received time in UTC, not local time
- Time taken
- Method name
- Call stack
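To make the list concrete, here is a rough Python sketch that assembles one log entry with those fields. The function name `build_log_entry`, the key names and the millisecond unit for time taken are assumptions for this example; the received time is expected to be a timezone-aware `datetime` so it can be converted to UTC reliably.

```python
import traceback
from datetime import timezone


def build_log_entry(service_name, ip_address, correlation_id, method_name,
                    received_at, time_taken_ms, error=None):
    """Build one log entry that carries the minimum fields listed above.

    `received_at` must be a timezone-aware datetime; it is stored in UTC.
    `error` is an optional exception whose call stack should be recorded.
    """
    entry = {
        "service_name": service_name,
        "ip_address": ip_address,
        "correlation_id": correlation_id,
        "received_at_utc": received_at.astimezone(timezone.utc).isoformat(),
        "time_taken_ms": time_taken_ms,
        "method_name": method_name,
        "call_stack": None,
    }
    if error is not None:
        # Format the exception together with its traceback as a single string.
        entry["call_stack"] = "".join(
            traceback.format_exception(type(error), error, error.__traceback__)
        )
    return entry
```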
Use centralized log storage
Implementing log storage for each distributed microservice is a daunting task, if only because each service requires its own event logging mechanism. As the number of microservices grows, this approach becomes increasingly difficult to carry out.
Individual log storage adds a lot of complexity to log retrieval and analytics. Instead, send the logs to a single centralized location for easy accessibility. Aggregation makes it easier for teams to manage log data and to correlate it to solve problems.
The ability to query logs efficiently is an essential part of finding failures that occur across multiple microservices. Using the correlation ID, a developer or tester should be able to access the complete request flow within the application.
Teams can query the log data to find out the percentage of failed requests, the time taken by each request and the frequency of each service call. One way to supplement this is to use a tool that can aggregate log data, such as the ELK Stack, Splunk or Sumo Logic.
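Those three queries can be illustrated on a small scale in plain Python, assuming the aggregated entries are available as dictionaries. The field names (`correlation_id`, `service_name`, `status`, `time_taken_ms`) and the `summarize` function are hypothetical; a log aggregation tool would express the same questions in its own query language.

```python
from collections import Counter, defaultdict


def summarize(entries):
    """Compute failure rate, logged time per request and call counts per service."""
    # Group the individual service calls by the request they belong to.
    by_request = defaultdict(list)
    for entry in entries:
        by_request[entry["correlation_id"]].append(entry)

    # A request counts as failed if any of its service calls logged an error.
    failed = sum(
        1 for calls in by_request.values()
        if any(call["status"] == "error" for call in calls)
    )
    failed_pct = 100.0 * failed / len(by_request) if by_request else 0.0

    # Sum of the time logged by each call; nested calls overlap, so this is
    # an upper bound on elapsed time rather than an exact end-to-end figure.
    logged_ms_per_request = {
        cid: sum(call["time_taken_ms"] for call in calls)
        for cid, calls in by_request.items()
    }

    calls_per_service = dict(Counter(entry["service_name"] for entry in entries))

    return {
        "failed_pct": failed_pct,
        "logged_ms_per_request": logged_ms_per_request,
        "calls_per_service": calls_per_service,
    }
```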
Application deployments should also rely on an automated alert system that can analyze the logs and send out alerts whenever something goes wrong with one or more services. Developers should also consider the timing of failures, because it's possible that the logging component might be down at a certain time of day due to a high volume of logins or automated processes. Finally, applications should always include a fallback service that's adept at handling logging failures and restoring log data, if needed.
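One possible shape for such a fallback, sketched in Python under simplifying assumptions: log lines that cannot be delivered to the central store are held in a bounded in-memory buffer and replayed in order once delivery works again. The `FallbackLogShipper` class and its `send` callable are hypothetical, and a production setup would more likely buffer to disk so that data survives a restart.

```python
from collections import deque


class FallbackLogShipper:
    """Ship log lines to a central store, buffering them while it is down."""

    def __init__(self, send, buffer_limit=1000):
        # `send` is an assumed callable that delivers one log line to the
        # central store and raises an exception when delivery fails.
        self._send = send
        # Bounded buffer: once full, the oldest lines are dropped first.
        self._buffer = deque(maxlen=buffer_limit)

    def flush(self):
        """Replay buffered lines in their original order; stop at the first failure."""
        while self._buffer:
            self._send(self._buffer[0])
            self._buffer.popleft()

    def ship(self, line):
        """Try to deliver the line; return False if it had to be buffered."""
        try:
            self.flush()
            self._send(line)
            return True
        except Exception:
            # Broad catch is a simplification for this sketch.
            self._buffer.append(line)
            return False
```

Replaying the buffer before sending the newest line keeps the restored log data in the order it was produced.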