Scalability

When talking about a cluster, scalability can be viewed from different angles. From a workload point of view:

Scale-up
is the ability to cope with changes in the workload (e.g. an increase in the number of users, the number of transactions or the database size) by using additional hardware to maintain service quality.
Speed-up
lets a cluster decrease the processing time (e.g. response or turnaround time) for a constant workload by applying more parallelism (see the sketch after these definitions for the limits of this approach).
From a hardware perspective:
vertical scalability
is the ability to increase the capacity of a single node by adding CPUs, memory, network cards, etc. This is common ground for most servers today. Some use the term scale-up instead.
horizontal scalability
is unique to clusters, which allow you to increase capacity by adding nodes. Very often this is possible on the fly, i.e. without shutting down the existing nodes. It is also called scale-out.
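
A useful rule of thumb for the speed-up case is Amdahl's law: if a fraction p of a job can run in parallel, then N nodes can never deliver a speed-up of more than 1 / ((1 - p) + p/N). The short C program below simply evaluates that formula; the parallel fraction of 95 percent is an assumed example value, not a measurement.

    /* Amdahl's law: the best possible speed-up on N nodes when a fraction p
     * of the work can be parallelized. p = 0.95 is an assumed example value. */
    #include <stdio.h>

    int main(void)
    {
        double p = 0.95;
        int nodes[] = { 1, 2, 4, 8, 16, 32, 64, 128 };
        int count = sizeof(nodes) / sizeof(nodes[0]);

        for (int i = 0; i < count; i++) {
            double speedup = 1.0 / ((1.0 - p) + p / nodes[i]);
            printf("%4d nodes: speed-up %.1f\n", nodes[i], speedup);
        }
        return 0;
    }

Even with 95 percent of the work parallelized, 128 nodes yield a speed-up of only about 17, which is why the serial and communication parts of a problem matter so much on clusters.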

CPU and memory

In the 70s and 80s, high performance computing was the domain of supercomputers like the ones built by Cray, IBM and others. But the high cost and long development cycles of supercomputers led people to look for alternatives, which they found in clusters.

The strategy behind HPC on clusters is partitioning: if your problem is too big for a single computer, divide it into smaller chunks and run them on a cluster. In reality this is more complex than it sounds at first. Supercomputers remain superior for problems in which each calculation can have a ripple effect on others and where inter-node communication has a strong negative impact on overall performance. Have a look at [Foster 95] to get an idea of the problems and solutions.
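
To make the partitioning idea concrete, here is a minimal sketch using MPI, the message-passing library commonly used on clusters. The workload (summing a series) and the chunk layout are illustrative assumptions rather than a real application; production codes partition grids, matrices or particle sets in the same spirit.

    /* Partitioning sketch: each node works on its own slice of the index
     * space and only the final reduction crosses the network. */
    #include <mpi.h>
    #include <stdio.h>

    #define N 100000000L   /* total problem size, split across all nodes */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* each rank gets a contiguous chunk [lo, hi) of the work */
        long chunk = N / size;
        long lo = rank * chunk;
        long hi = (rank == size - 1) ? N : lo + chunk;

        double local = 0.0, total = 0.0;
        for (long i = lo; i < hi; i++)
            local += 1.0 / (1.0 + (double)i);   /* stand-in for real work */

        /* the only inter-node communication: combine the partial results */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("result = %f (computed on %d nodes)\n", total, size);

        MPI_Finalize();
        return 0;
    }

Each node touches only its own slice, and the single MPI_Reduce call at the end is the only traffic that has to cross the interconnect. Problems with such loose coupling scale out well; the tightly coupled problems mentioned above spend far more of their time in communication calls like this one.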

Vertical scalability is supported by other architectures too. SMP and cc-NUMA systems are scalable designs, and in particular the larger systems offer respectable capacity increases from the smallest to the largest model. So what is the advantage of clusters, and why do so many of them occupy top spots at www.top500.org, which lists the world's fastest computers?

A lot has to do with memory. CPU performance has seen a dramatic improvement over the last 20 years: from the single-instruction-per-cycle units driven at single-digit megahertz in 1980 to today's super-pipelined, super-scalar monsters with multiple ALUs, capable of executing several instructions per cycle and driven in the gigahertz range. As a result, today's CPUs are not only extremely powerful but also quite memory hungry, and the crippling bottleneck of memory latency has to be overcome. While today's processors run at extremely high frequencies, they sit idle as much as 75 percent of the time because they must wait for the much slower memory to retrieve needed data.

In contrast to CPUs, memory performance has not seen the same improvement. A simple graph (borrowed from Dr. John McCalpin's STREAM benchmark site) that plots CPU vs. memory performance shows the dilemma. Depending on the workload and on the number, size and placement of cache memories, one or two of today's hot chips can easily saturate the memory subsystem. And putting 16, 32 or more CPUs together in front of the same memory doesn't make things better.
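
The effect is easy to demonstrate. The following is a minimal sketch in the spirit of the STREAM triad kernel (not the benchmark itself): it streams three large arrays through memory and reports the achieved bandwidth. The array size and the use of OpenMP threads are illustrative assumptions.

    /* Triad-style kernel: two loads and one store per element, with arrays
     * large enough to defeat the caches, so the memory bus sets the pace. */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 20000000L   /* ~160 MB per array */

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        double *c = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++) { b[i] = 1.0; c[i] = 2.0; }

        double t0 = omp_get_wtime();
        #pragma omp parallel for
        for (long i = 0; i < N; i++)
            a[i] = b[i] + 3.0 * c[i];
        double t1 = omp_get_wtime();

        double bytes = 3.0 * N * sizeof(double);
        printf("%d threads: %.1f MB/s\n",
               omp_get_max_threads(), bytes / (t1 - t0) / 1e6);

        free(a); free(b); free(c);
        return 0;
    }

Running it with OMP_NUM_THREADS set to 1, 2, 4 and so on typically shows the bandwidth levelling off once the memory bus saturates, no matter how many CPUs are still waiting for data.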

And this is where clusters come in. They allow you to use vertical scalability up to the SMP sweet spot and horizontal scalability from then on. This approach lets you put thousands of CPUs to work on a single problem, giving CPU horsepower in the multi-teraflop range.
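
In code, this combination often shows up as a hybrid of the two sketches above: OpenMP threads exploit the CPUs inside one SMP node (vertical), while MPI distributes node-sized chunks across the cluster (horizontal). As before, the workload is a made-up placeholder.

    /* Hybrid sketch: MPI across nodes, OpenMP inside each node. */
    #include <mpi.h>
    #include <omp.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, provided;
        /* FUNNELED: only the main thread makes MPI calls */
        MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* vertical: all CPUs of this node share the node's slice of work */
        double local = 0.0;
        #pragma omp parallel for reduction(+:local)
        for (long i = 1; i <= 1000000; i++)
            local += 1.0 / (double)(i + (long)rank * 1000000);

        /* horizontal: combine the per-node results across the interconnect */
        double total = 0.0;
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("%d nodes x %d threads per node, result %.6f\n",
                   size, omp_get_max_threads(), total);

        MPI_Finalize();
        return 0;
    }

The SMP sweet spot mentioned above determines how many threads each node runs; beyond that point you add nodes rather than CPUs, so each memory subsystem serves only its own processors.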

Storage

Scalability doesn't stop here. Many compute-intensive applications, e.g. oil exploration and high energy physics, have to handle gigantic amounts of data too. Like commercial applications, they require data scalability. New storage architectures have triggered the development of cluster file systems, which exploit the parallelism in CPUs, memory and disks to deliver unprecedented throughput. The next chapter on storage has more details.

Network

Serving a larger user population or higher transaction rates typically requires not only more CPU, memory and disk capacity but also more network connections and bandwidth. The network section sheds some light on parallel networking.