
App vs. infrastructure: Designing for horizontal scaling, availability

As the cloud and virtualization become host application platforms, questions are arising about how to manage horizontal scalability and availability.

Software architects and CIOs want applications that consistently provide a quality experience. As virtualization and the cloud enter the repertoire of host application platforms, questions arise on how scalability and availability management should be divided between applications and platforms. To answer these questions, architects have to:

  • Consider the effectiveness and cost of both approaches
  • Assess the ways applications and platforms exercise control
  • Ensure the evolutionary risks of both strategies are understood

The evolution of application development to componentized deployment created an opportunity for architects to design in greater horizontal scalability and availability by providing for component replication or replacement to augment or restore performance. If a resource fails or is overloaded, spin up a copy of the affected components on a different resource and rely on the workflow-directing capability of the application to restore proper operation.

Virtualization and the cloud make this process easier, but because virtual platforms have some native ability to manage horizontal scalability and availability, architects have to assess whether to build in these features, exploit what the platforms offer, or both. The problem is that scalability and availability are similar in how they are addressed by either platform or application, but different in how effectively they can be detected.

An application component can be made to know whether it's overloaded simply by providing it a mechanism to monitor the work waiting for it. However, it's much more difficult to know whether something has failed, since a failed application component is clearly in no position to make determinations.
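The self-monitoring idea can be sketched in a few lines. This is a minimal illustration, not a prescribed design: the `MonitoredWorker` class, its `overload_threshold` parameter and the threshold value are all hypothetical names chosen for the example.

```python
import queue


class MonitoredWorker:
    """Hypothetical component that watches the work waiting for it."""

    def __init__(self, overload_threshold: int = 100):
        self.inbox: queue.Queue = queue.Queue()
        # Assumed tuning knob: backlog size above which the component reports overload.
        self.overload_threshold = overload_threshold

    def submit(self, item) -> None:
        self.inbox.put(item)

    def backlog(self) -> int:
        # qsize() is only approximate under concurrency, which is acceptable for a load signal.
        return self.inbox.qsize()

    def is_overloaded(self) -> bool:
        return self.backlog() > self.overload_threshold


# Usage: five queued jobs against a threshold of three.
worker = MonitoredWorker(overload_threshold=3)
for job in range(5):
    worker.submit(job)
overloaded = worker.is_overloaded()
```

Note what the sketch cannot do: if the worker process itself dies, nothing is left to call `is_overloaded()`, which is exactly the failure-detection gap described above.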

In contrast, a platform like a virtualization manager or cloud manager can detect failures at the resource level or in connectivity, both of which are hard for applications to detect. That means the first step in managing scalability and availability is to decide how to detect the problem. At the minimum, this entails exploring the ways hosting software (virtualization, the cloud, or traditional server operating systems and middleware) will be able to recognize a problem in the first place.
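One common way to detect failure from outside the component is a heartbeat watchdog, sketched below. It assumes components report in periodically and that silence beyond a timeout means trouble; `HeartbeatWatchdog`, `timeout_s` and the component names are hypothetical, and real virtualization or cloud managers expose their own health-check mechanisms rather than this one.

```python
import time
from typing import Callable, Dict, List


class HeartbeatWatchdog:
    """Hypothetical external monitor: a component is suspect if it goes silent."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the example can run without sleeping
        self.last_seen: Dict[str, float] = {}

    def beat(self, component_id: str) -> None:
        # Called by (or on behalf of) a healthy component.
        self.last_seen[component_id] = self.clock()

    def failed(self) -> List[str]:
        now = self.clock()
        return sorted(
            cid for cid, seen in self.last_seen.items()
            if now - seen > self.timeout_s
        )


# Usage with a fake clock: "orders" stops reporting, "billing" keeps going.
fake_now = [0.0]
watchdog = HeartbeatWatchdog(timeout_s=5.0, clock=lambda: fake_now[0])
watchdog.beat("orders")
watchdog.beat("billing")
fake_now[0] = 4.0
watchdog.beat("billing")
fake_now[0] = 8.0
suspected = watchdog.failed()
```

The detection logic lives outside the monitored component, so it keeps working when the component cannot.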


As the assessment is made, pay close attention to whether the identified mechanisms are platform-independent. If features can't be replicated across all platform choices, there is a risk of getting locked into a specific hosting choice.

When assessing the detection of scalability or availability events, be sure to look at how they can be actioned. When a platform detects a problem, it may not be easy to identify which applications are impacted and thus hard to know what components to scale in or out. This connection must be made before proceeding.
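Making that connection usually means keeping a record of where components run. The sketch below assumes a simple in-memory placement map; `PlacementMap`, the host names and the application names are hypothetical, and a real deployment would likely draw this information from its orchestration or inventory tooling.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple


class PlacementMap:
    """Hypothetical record of which (application, component) pairs run on which host."""

    def __init__(self) -> None:
        self._by_host: Dict[str, Set[Tuple[str, str]]] = defaultdict(set)

    def place(self, host: str, application: str, component: str) -> None:
        self._by_host[host].add((application, component))

    def impacted(self, host: str) -> List[Tuple[str, str]]:
        # Translate a platform-level event ("host X has a problem") into application terms.
        return sorted(self._by_host.get(host, set()))


placement = PlacementMap()
placement.place("host-a", "erp", "order-entry")
placement.place("host-a", "crm", "contact-sync")
placement.place("host-b", "erp", "billing")
hit = placement.impacted("host-a")
```

With a lookup like this, a resource-level alert can be turned into a list of components to replace or scale.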

Once it's been established that overloaded components can be detected, the process shifts to remediation. The prevailing practice is to replicate or replace components as needed to assure quality of experience (QoE), but that may not be a simple matter.

The question of how the new component will be integrated into current workflows needs to be addressed. Here, scalability presents a greater challenge. A workflow that includes a variable number of component instances will have to divide work among them. That work division has to expand and contract as the number of component copies changes. If the components being scaled or replaced are stateful, load-sharing and workflow steering will have to be stateful in turn.
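For stateful components, one option is to route by a session key so the same session keeps landing on the same copy, while moving as few sessions as possible when the pool grows or shrinks. The sketch below uses rendezvous (highest-random-weight) hashing for this. That technique is a choice made for the example, not something the approach above requires, and the function and instance names are hypothetical.

```python
import hashlib
from typing import List


def _score(key: str, instance: str) -> int:
    # Deterministic score for a (key, instance) pair; sha256 keeps results stable across runs.
    digest = hashlib.sha256(f"{instance}|{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route(key: str, instances: List[str]) -> str:
    """Pick the instance with the highest score for this key."""
    if not instances:
        raise ValueError("no instances available")
    return max(instances, key=lambda inst: _score(key, inst))


# Usage: scale from two copies to three and see which sessions move.
sessions = [f"session-{i}" for i in range(200)]
before = {s: route(s, ["copy-a", "copy-b"]) for s in sessions}
after = {s: route(s, ["copy-a", "copy-b", "copy-c"]) for s in sessions}
moved = [s for s in sessions if before[s] != after[s]]
```

Every session in `moved` goes to the new copy and none are shuffled between the existing two, so only the state for those sessions has to be migrated or rebuilt.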

Explicit load balancing isn't needed for a single copy of a component or even to replace a failed component. It will, however, be essential when a second copy is added or when scaling back from two copies to one.

If there are a few specific places where scalability might be essential, it may be a good idea to design the workflow for load balancing even if only one component is in use. This will make it easier and faster to add component instances later.
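A minimal sketch of designing for load balancing up front: callers always go through a dispatcher, even when the pool holds a single copy. The `Dispatcher` class, the round-robin policy and the handler names are assumptions for illustration only.

```python
from typing import Callable, List

Handler = Callable[[str], str]


class Dispatcher:
    """Hypothetical round-robin front end that works the same with one copy or many."""

    def __init__(self, handlers: List[Handler]):
        if not handlers:
            raise ValueError("at least one handler is required")
        self._handlers = list(handlers)
        self._next = 0

    def add(self, handler: Handler) -> None:
        self._handlers.append(handler)

    def dispatch(self, request: str) -> str:
        handler = self._handlers[self._next % len(self._handlers)]
        self._next = (self._next + 1) % len(self._handlers)
        return handler(request)


def make_handler(name: str) -> Handler:
    return lambda request: f"{name}:{request}"


# Start with one copy; callers already use the dispatcher.
dispatcher = Dispatcher([make_handler("copy-1")])
first = [dispatcher.dispatch("r1"), dispatcher.dispatch("r2")]

# Scaling out later is a one-line change that callers never see.
dispatcher.add(make_handler("copy-2"))
second = [dispatcher.dispatch("r3"), dispatcher.dispatch("r4")]
```

The cost is one extra hop while only a single copy exists. The benefit is that adding a second copy requires no change to the calling code.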

Application-based detection and response to failures or overload conditions has the advantage of being under the control of the architect. Even so, it's wise to use platform tools for replacement (including moving a machine image to another VM) where they are available.

To reflect different capabilities in detecting and addressing horizontal scalability or availability events, consider creating a generic function or functions and adding a plug-in component for the specific platform. This will not only accommodate things like cloud bursting into a different environment, it will also support migration to a different platform later on.
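The generic-function-plus-plug-in idea might look like the sketch below. Everything here is hypothetical: `PlatformPlugin` is an illustrative interface, `InMemoryPlugin` stands in for a real platform adapter, and `rescale` uses a deliberately simple sizing rule based on backlog.

```python
from abc import ABC, abstractmethod
from typing import Dict


class PlatformPlugin(ABC):
    """Hypothetical interface each hosting platform would implement."""

    @abstractmethod
    def instance_count(self, component: str) -> int: ...

    @abstractmethod
    def set_instance_count(self, component: str, count: int) -> None: ...


class InMemoryPlugin(PlatformPlugin):
    """Stand-in plug-in for the example; a real one would call the platform's own tools."""

    def __init__(self) -> None:
        self._counts: Dict[str, int] = {}

    def instance_count(self, component: str) -> int:
        return self._counts.get(component, 1)

    def set_instance_count(self, component: str, count: int) -> None:
        self._counts[component] = count


def rescale(plugin: PlatformPlugin, component: str, backlog: int,
            per_instance_capacity: int, max_instances: int = 10) -> int:
    """Generic, platform-neutral policy: size the pool to the backlog."""
    needed = max(1, -(-backlog // per_instance_capacity))  # ceiling division
    target = min(needed, max_instances)
    if target != plugin.instance_count(component):
        plugin.set_instance_count(component, target)
    return target


plugin = InMemoryPlugin()
scaled_out = rescale(plugin, "orders", backlog=250, per_instance_capacity=100)
scaled_in = rescale(plugin, "orders", backlog=0, per_instance_capacity=100)
```

Because `rescale` only talks to the interface, moving to a different platform or bursting into a second one means writing another plug-in rather than touching the scaling policy.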

Architects need to accept that platform tools for scaling and availability management are going to improve over time, which is likely to erode the benefits of an application-centric approach. At the very least, applications are going to have to accommodate and use these tools to maximize user QoE, possibly stranding application-specific solutions. It's critical that architects and planners track the progress of DevOps and cloud orchestration tools and standards so these can be integrated into applications properly. Doing so will also allow component management, workflow management and even component design to be optimized for the future.

About the author:
Tom Nolle is president of CIMI Corp., a strategic consulting firm specializing in telecommunications and data communications since 1982.
