Chaos Monkey is a popular open source tool developed by Netflix that takes reliability testing to new heights....
The idea behind Chaos Monkey testing is to deliberately kill random nodes across the system at regular intervals to assess whether the system can survive despite these failures. The Chaos Monkey testing principle can help evaluate the reliability of microservice-based applications, but rather than intentionally kill nodes, architects should focus on the interruption of services.
As one service fails, other dependent services could stall or fail in a ripple effect. This delivers a bad user experience. There are many ways to deal with this issue, and when used in combination, they can help architects design more resilient systems.
Fallback services matter
Replicas and fallbacks are not just for infrastructure components, but for mission-critical services as well. Consider creating an alternate, bare-bones service that can take on the load if the default service fails. This is especially useful for services such as billing and transaction processing for e-commerce applications.
Setting up a fallback service could be as simple as routing traffic away from the failed service. Before flipping the Chaos Monkey testing switch, though, you need to ensure backup services for critical processes are ready to take over.
Building resilient infrastructure
With modern container stacks, it is now possible -- even easy -- to automatically restart failed containers and set up autoscaling container clusters. This automation helps ensure resiliency in the infrastructure layer.
At the networking level, shortening timeout limits will ensure services are quickly rerouted to a fallback service after a failure. This helps better optimize the system for performance.
Having persistent data storage for containers is also essential. When a container fails, its stored data is lost unless you configure persistent storage volumes. Persistent storage ensures that, even if a service fails, it can be easily resumed with the old, stored data.
When all efforts still result in lost data, a disaster recovery (DR) tool can ensure you always have access to your data in the event of mass failures. DR tools and services are worth purchasing if you need true data resilience.
Whether it's the service layer or the infrastructure layer, you can build more resilient applications by employing the Chaos Monkey testing principle in your microservice applications.
Dig Deeper on Application development and management
Related Q&A from Twain Taylor
See how event-driven architecture patterns can set up an enterprise application for adaptability in a way that monolithic architectures can't. Continue Reading
Explore the pros and cons that differentiate SDKs and APIs from each other and come to understand what types of applications call for SDKs, APIs or ... Continue Reading
The more distributed and complex a modern application is, the more likely an API mapping tool should join the developers' arsenal for efficient ... Continue Reading