Until fairly recently, there were really only two types of networked storage available: large, high-end, high-performance storage arrays, and modular arrays built on a dual-controller architecture. Scale-out systems were few and far between and were used mostly in niche cases. What a difference a few years can make.
Now, thanks to software-defined storage, there is an enormous variety of scale-out storage solutions built on a similarly wide range of architectures, and they are being rolled out for almost every type of use case.
Because modern scale-out systems are flexible and easy to implement, high-end arrays and dual-controller setups are seen less often now. Situations that call for extreme reliability or performance still warrant high-end arrays, and dual-controller arrays remain useful for lighter workloads and as building blocks of larger systems, but scale-out storage is becoming ever more ubiquitous.
That ubiquity makes it all the more important to understand how scale-out storage works, in particular the two aspects of its design that most affect efficiency and performance: redundancy and network usage.
Redundancy and Data Safety
The most common type of scale-out storage system is the shared-nothing cluster. In this architecture, each node in the system has exclusive access to its own piece of the total storage. This makes a shared-nothing cluster easy to build from common, commodity hardware, but it complicates data efficiency.
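A shared-nothing cluster needs some rule for deciding which nodes own which pieces of data. The sketch below uses rendezvous (highest-random-weight) hashing purely as an illustration; it is one well-known technique, not the method of any particular product. The function names `_score` and `place`, the node names, and the key format are all hypothetical. Each key is assigned to several distinct nodes, so no single node failure can take away every copy.

```python
import hashlib


def _score(key: str, node: str) -> int:
    """Deterministic pseudo-random weight for a (key, node) pair."""
    digest = hashlib.sha256(f"{key}:{node}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(key: str, nodes: list[str], replicas: int) -> list[str]:
    """Return the `replicas` distinct nodes that should hold copies of `key`."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    # Rendezvous (highest-random-weight) hashing: rank every node by its
    # score for this key and keep the top `replicas` entries. Because each
    # copy lands on a different node, losing one node never loses the data.
    ranked = sorted(nodes, key=lambda n: _score(key, n), reverse=True)
    return ranked[:replicas]


nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
owners = place("volume-7/block-42", nodes, replicas=3)
print(owners)  # three distinct node names; which three depends on the hash
```

A convenient property of this choice is that removing a node only moves the keys that node owned; every other key keeps its placement, which limits rebalancing traffic when the cluster shrinks.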
Because each node owns an independent piece of the total storage, data must be copied to multiple nodes so that it survives the loss of any one of them. A common choice is to keep three copies, which requires three times as much raw storage as data. Erasure coding, which splits data into fragments and adds computed parity fragments, improves efficiency but still carries overhead: a layout of 10 data fragments plus 2 parity fragments, for example, needs a storage-to-data ratio of 1.2 to 1.
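The storage-to-data ratios above follow from simple arithmetic: keeping N copies costs N times the data, while an erasure code with k data fragments and m parity fragments costs (k + m) / k, so a 10 + 2 layout is one way to arrive at 1.2 to 1. The sketch below computes both and then demonstrates the idea behind erasure coding with single XOR parity (k + 1), which tolerates only one lost fragment; production systems usually use Reed-Solomon or similar codes to support m greater than 1. All function names here are hypothetical.

```python
from functools import reduce


def replication_overhead(copies: int) -> float:
    """Raw bytes stored per byte of data when every byte is kept `copies` times."""
    return float(copies)


def erasure_overhead(data_fragments: int, parity_fragments: int) -> float:
    """Raw bytes per data byte for a stripe of k data plus m parity fragments."""
    return (data_fragments + parity_fragments) / data_fragments


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length fragments."""
    return bytes(x ^ y for x, y in zip(a, b))


# Capacity planning for 100 TB of usable data.
print(100 * replication_overhead(3))   # 3 copies: 300.0 TB raw
print(100 * erasure_overhead(10, 2))   # 10 + 2: about 120 TB raw

# Single-parity (k + 1) stripe: the simplest erasure code, tolerating one loss.
data = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # k = 4 equal-sized fragments
parity = reduce(xor_bytes, data)             # m = 1 parity fragment

# Suppose the node holding fragment 2 fails: XOR the parity with the
# surviving fragments to recover it.
survivors = data[:2] + data[3:]
rebuilt = reduce(xor_bytes, survivors, parity)
print(rebuilt)  # b'CCCC'
```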
Network Usage
Scale-out storage systems use the network in two ways: node-to-node traffic and host access. In early systems, nodes were often interconnected by a dedicated secondary network used for load balancing and data mirroring when nodes were added or removed, and for rebuilding the data store after a node failure.
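One rough way to see why node-to-node traffic matters is to count how many bytes cross the cluster network for each byte a host writes. The sketch below assumes a simple scheme in which the node that receives a write keeps one copy or fragment locally and sends the rest to other nodes; real systems differ in the details, and the function names are hypothetical.

```python
def backend_bytes_replication(write_bytes: int, copies: int) -> int:
    """Node-to-node bytes for one host write under N-way replication.

    Assumes the receiving node stores one copy locally and forwards the
    remaining `copies - 1` copies to other nodes.
    """
    return write_bytes * (copies - 1)


def backend_bytes_erasure(write_bytes: int, k: int, m: int) -> float:
    """Node-to-node bytes for one full-stripe host write under k + m coding.

    Assumes the receiving node splits the write into k data fragments,
    computes m parity fragments of the same size (write_bytes / k each),
    keeps one fragment and sends the other k + m - 1 to different nodes.
    """
    return write_bytes * (k + m - 1) / k


one_mib = 1024 * 1024
print(backend_bytes_replication(one_mib, 3) / one_mib)  # 2.0
print(backend_bytes_erasure(one_mib, 10, 2) / one_mib)  # about 1.1
```

Writes smaller than a full stripe cost more under erasure coding, because existing parity has to be updated, so the erasure-coding figure is best read as a best case. The host-to-node traffic for the write itself comes on top of these numbers.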
These days, with 10 Gbps Ethernet increasingly common, a separate dedicated network is seldom necessary. Many corporate networks provide enough bandwidth at low enough latency to carry both types of traffic. Where only a slower network is available, however, a dedicated back-end network may still be worth considering.
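Network speed matters most when a node fails and the cluster has to re-create its data elsewhere. The back-of-the-envelope sketch below estimates rebuild time from the amount of lost data, the link speed, the share of the link left over after host traffic, and a read-amplification factor (1 for replication; k for a k + m Reed-Solomon-style code, which reads k fragments to rebuild one). It deliberately treats the rebuild as if it flowed through a single link; real scale-out clusters spread rebuild work across many nodes and often throttle it, so the numbers are only illustrative. The function name and the 20 TB figure are hypothetical.

```python
def rebuild_hours(lost_bytes: float, link_gbps: float,
                  usable_fraction: float = 1.0,
                  read_amplification: float = 1.0) -> float:
    """Rough time to restore redundancy after a node failure.

    lost_bytes:          data that must be re-created elsewhere
    link_gbps:           network speed in gigabits per second
    usable_fraction:     share of the link left over after host traffic
    read_amplification:  bytes read per byte rebuilt (1 for replication,
                         k for a k + m Reed-Solomon-style code)
    """
    bits = lost_bytes * 8 * read_amplification
    seconds = bits / (link_gbps * 1e9 * usable_fraction)
    return seconds / 3600


lost = 20e12  # hypothetical: the failed node held 20 TB
for gbps in (1, 10, 25):
    hours = rebuild_hours(lost, gbps, usable_fraction=0.5)
    print(f"{gbps} Gbps: {hours:.1f} h")
```

Because the estimate scales inversely with link speed, a rebuild that finishes in hours on 10 Gbps can stretch to days on 1 Gbps, which is exactly the case where a dedicated network starts to pay off.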
Vast Flexibility
Scale-out storage systems are, by their very nature, flexible. They are designed to let the storage system grow and shrink, and to run at a range of performance levels and capacities, as the organization's needs change. The constraints described above, chiefly redundancy overhead and network bandwidth, still apply, but within them the sky is the limit.