Cluster Computing
Introduction
A computer cluster consists of a set of loosely connected computers that work together so that
in many respects they can be viewed as a single system.
The components of a cluster are usually connected to each other through fast local area networks,
each node (computer used as a server) running its own instance of an operating system.
Computer clusters emerged as a result of the convergence of a number of computing trends
including the availability of low-cost microprocessors, high-speed networks, and software for
high-performance distributed computing.
Clusters are usually deployed to improve performance and availability over that of a single
computer, while typically being much more cost-effective than single computers of comparable
speed or availability.
Computer clusters have a wide range of applicability and deployment, ranging from small
business clusters with a handful of nodes to some of the fastest supercomputers in the world.
Basic concepts
The desire to get more computing horsepower and better reliability by orchestrating a
number of low cost commercial off-the-shelf computers has given rise to a variety of
architectures and configurations.
The computer clustering approach usually (but not always) connects a number of readily
available computing nodes (e.g. personal computers used as servers) via a fast local area
network. The activities of the computing nodes are orchestrated by "clustering middleware", a
software layer that sits atop the nodes and allows the users to treat the cluster as by and large one
cohesive computing unit, e.g. via a single system image concept.
Computer clustering relies on a centralized management approach which makes the nodes
available as orchestrated shared servers. It is distinct from other approaches such as peer-to-peer
or grid computing, which also use many nodes but with a far more distributed nature.
A computer cluster may be a simple two-node system which just connects two personal
computers, or may be a very fast supercomputer. A basic approach to building a cluster is that of
a Beowulf cluster which may be built with a few personal computers to produce a cost-effective
alternative to traditional high performance computing. An early project that showed the viability
of the concept was the 133-node Stone Soupercomputer.[4] The developers used Linux, the
Parallel Virtual Machine toolkit and the Message Passing Interface library to achieve high
performance at a relatively low cost.
Although a cluster may consist of just a few personal computers connected by a simple
network, the cluster architecture may also be used to achieve very high levels of performance.
The TOP500 organization's semiannual list of the 500 fastest supercomputers often includes
many clusters, e.g. the world's fastest machine in 2011 was the K computer, which has a
distributed-memory cluster architecture.[6][7]
Attributes of clusters
Computer clusters may be configured for different purposes ranging from general purpose
business needs such as web-service support, to computation-intensive scientific calculations. In
either case, the cluster may use a high-availability approach. Note that the attributes described
below are not exclusive and a "compute cluster" may also use a high-availability approach, etc.
By contrast, the special-purpose 144-node DEGIMA cluster is tuned to running astrophysical
N-body simulations using the Multiple-Walk parallel treecode, rather than general purpose
scientific computations.
Due to the increasing computing power of each generation of game consoles, a novel use has
emerged where they are repurposed into High-performance computing (HPC) clusters. Some
examples of game console clusters are Sony PlayStation clusters and Microsoft Xbox clusters.
Another example of a consumer product used in this way is the Nvidia Tesla Personal
Supercomputer workstation, which uses multiple graphics accelerator processor chips.
Computer clusters have historically run on separate physical computers with the same
operating system. With the advent of virtualization, the cluster nodes may run on separate
physical computers with different operating systems that are overlaid with a virtual layer so as
to appear uniform. The cluster may also be virtualized across various configurations as
maintenance takes place. An example implementation is Xen as the virtualization manager with Linux-HA.[11]
Coordinating access to shared storage can limit performance in these systems because of the
overhead of using a lock manager and the potential bottlenecks of shared hardware generally.
Shared-disk clusters make up for this shortcoming with relatively good scaling properties: OPS
and HACMP, for example, support eight-node systems.
MPI emerged in the early 1990s out of discussions among some 40 organizations. The initial effort
was supported by ARPA and the National Science Foundation. Rather than starting anew, the design
of MPI drew on various features available in commercial systems of the time. The MPI
specifications then gave rise to specific implementations. MPI implementations typically use
TCP/IP and socket connections. MPI is now a widely available communications model that
enables parallel programs to be written in languages such as C, Fortran, Python, etc. Thus, unlike
PVM which provides a concrete implementation, MPI is a specification which has been
implemented in systems such as MPICH and Open MPI.
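The message-passing style that PVM and MPI embody can be sketched with nothing but the Python standard library. The sketch below is not MPI: it only mimics the shape of point-to-point send/receive calls using multiprocessing pipes (in real Python code one would use the mpi4py binding, whose send/recv calls look similar). The `worker` and `ping` names are illustrative.

```python
# Sketch of MPI-style point-to-point messaging using only the
# standard library. Each "rank" runs in its own OS process, as an
# MPI rank would; a Pipe stands in for the MPI communicator.
from multiprocessing import Process, Pipe

def worker(rank, conn):
    # The receiving rank blocks until a message arrives,
    # analogous to MPI_Recv, then replies, analogous to MPI_Send.
    msg = conn.recv()
    conn.send("rank %d got: %s" % (rank, msg))
    conn.close()

def ping(msg):
    # Rank 0 sends a message to rank 1 and waits for the reply.
    parent, child = Pipe()
    p = Process(target=worker, args=(1, child))
    p.start()
    parent.send(msg)          # analogous to MPI_Send
    reply = parent.recv()     # analogous to MPI_Recv
    p.join()
    return reply
```

Unlike this sketch, MPI programs launch all ranks at once (e.g. via `mpirun`) and address each other by rank number rather than by explicit pipe objects.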
Cluster management
One of the challenges in the use of a computer cluster is the cost of administering it, which can at
times be as high as the cost of administering N independent machines if the cluster has N
nodes. In some cases this gives an advantage to shared-memory architectures, which have lower
administration costs. This has also made virtual machines popular, due to their ease of
administration.
Task scheduling
When a large multi-user cluster needs to access very large amounts of data, task scheduling
becomes a challenge. The MapReduce approach was proposed by Google in 2004, and frameworks
such as Hadoop have since implemented it.
However, given that in a complex application environment the performance of each job
depends on the characteristics of the underlying cluster, mapping tasks onto CPU cores and GPU
devices provides significant challenges.[17] This is an area of ongoing research, and algorithms
that combine and extend MapReduce and Hadoop have been proposed and studied.[17]
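The MapReduce idea can be sketched in a few lines of Python, assuming a word-count job: the map phase emits (word, 1) pairs and the reduce phase groups them by key and sums each group. The function names are illustrative; real MapReduce/Hadoop jobs additionally handle partitioning the data across nodes, scheduling, and fault tolerance.

```python
# Minimal single-machine sketch of the MapReduce pattern
# (word count). Real frameworks run the map and reduce tasks
# in parallel across the cluster's nodes.
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word occurrence,
    # as each mapper task would for its share of the input.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle: group the emitted pairs by key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    # Reduce: sum each group's counts.
    return {key: sum(values) for key, values in groups.items()}

def word_count(documents):
    return reduce_phase(map_phase(documents))
```

The appeal for clusters is that both phases decompose naturally: mappers need only their own slice of the data, and reducers need only the pairs for their assigned keys.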
Price performance
Clustering can provide significant performance benefits versus price. The System X
supercomputer at Virginia Tech, the 28th most powerful supercomputer in the world as of June
2006, is a 12.25-TFlops computer cluster of 1100 Apple XServe G5 2.3 GHz dual-processor
machines (4 GB RAM, 80 GB SATA HD) running Mac OS X and using an InfiniBand
interconnect. The cluster initially consisted of Power Mac G5s; the rack-mountable XServes are
denser than desktop Macs, reducing the aggregate size of the cluster. The total cost of the
previous Power Mac system was $5.2 million, a tenth of the cost of slower mainframe-class
supercomputers. (The Power Mac G5s were sold off.)
Cluster middleware aims to provide a unified system image (single system image) and availability
out of a collection of independent but interconnected computers.
Programming environments can offer portable, efficient, and easy-to-use tools for the development
of applications. They include message-passing libraries, debuggers, and profilers. It should not be
forgotten that clusters can be used for the execution of sequential as well as parallel applications.
Cluster Classifications
Clusters offer the following features at a relatively low cost:
High Performance
Expandability and Scalability
High Throughput
High Availability
Cluster technology permits organizations to boost their processing power using standard
technology (commodity hardware and software components) that can be acquired at a
relatively low cost. This provides expandability, an affordable upgrade path that lets organizations
increase their computing power while preserving their existing investment and without incurring
a lot of extra expense. The performance of applications also improves with the support of a
scalable software environment. Another benefit of clustering is a failover capability that allows
a backup computer to take over the tasks of a failed computer located in its cluster.
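The failover capability just described can be sketched as a heartbeat check: a node that has not reported within a timeout is treated as failed, and the first healthy node in the priority list takes over. This is a toy model with hypothetical names (`Node`, `pick_active`); real cluster managers also fence the failed node and migrate its workload.

```python
# Toy heartbeat-based failover: nodes report liveness by updating
# a timestamp; the selector fails over past any node whose last
# heartbeat is older than the timeout.
import time

class Node:
    def __init__(self, name):
        self.name = name
        self.last_heartbeat = time.monotonic()

    def beat(self):
        # Called periodically by a healthy node.
        self.last_heartbeat = time.monotonic()

def pick_active(nodes, timeout, now=None):
    # Return the first node that heartbeated within `timeout`
    # seconds; nodes earlier in the list have priority.
    now = time.monotonic() if now is None else now
    for node in nodes:
        if now - node.last_heartbeat <= timeout:
            return node
    return None  # every node appears to have failed
```

In practice the timeout is a tuning trade-off: too short and a busy node is declared dead prematurely; too long and clients wait needlessly before the backup takes over.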
Clusters are classified into many categories based on various factors as indicated below.
1. Application Target - Computational science or mission-critical applications.
High Performance (HP) Clusters
High Availability (HA) Clusters
2. Node Ownership - Owned by an individual or dedicated as a cluster node.
Dedicated Clusters
Nondedicated Clusters
The distinction between these two cases is based on the ownership of the nodes in a cluster. In
the case of dedicated clusters, a particular individual does not own a workstation; the resources
are shared so that parallel computing can be performed across the entire cluster. The alternative
nondedicated case is where individuals own workstations and applications are executed by
stealing idle CPU cycles. The motivation for this scenario is based on the fact that most
workstation CPU cycles are unused, even during peak hours. Parallel computing on a
dynamically changing set of nondedicated workstations is called adaptive parallel computing.
In nondedicated clusters, a tension exists between the workstation owners and the remote users
who need the workstations to run their applications. The former expect fast interactive response
from their workstation, while the latter are only concerned with fast application turnaround by utilizing
any spare CPU cycles. This emphasis on sharing the processing resources erodes the concept of
node ownership and introduces the need for complexities such as process migration and load
balancing strategies. Such strategies allow clusters to deliver adequate interactive performance as
well as to provide shared resources to demanding sequential and parallel applications.
3. Node Hardware - PC, Workstation, or SMP.
Clusters of PCs (CoPs) or Piles of PCs (PoPs)
Clusters of Workstations (COWs)
Clusters of SMPs (CLUMPs)
Processors
Over the past two decades, phenomenal progress has taken place in microprocessor
architecture (for example RISC, CISC, VLIW, and Vector), and this is making single-chip
CPUs almost as powerful as the processors used in supercomputers. More recently, researchers
have been trying to integrate the processor and memory, or a network interface, onto a single chip. The
Berkeley Intelligent RAM (IRAM) project is exploring the entire spectrum of issues involved in
designing general purpose computer systems that integrate a processor and DRAM onto a single
chip, from circuits, VLSI design, and architectures to compilers and operating systems. Digital,
with its Alpha 21364 processor, is trying to integrate processing, memory controller, and
network interface into a single chip.
Several DRAM variants exist, including Extended Data Out (EDO) and fast page mode. EDO allows
the next access to begin while the previous data is still being read, and fast page mode allows
multiple adjacent accesses to be made more efficiently. The amount of memory needed for the
cluster is likely to be determined by the cluster's target applications. Programs that are parallelized should be distributed
by the cluster target applications. Programs that are parallelized should be distributed
such that the memory, as well as the processing, is distributed between processors for scalability.
Thus, it is not necessary for each system to have enough RAM to hold the entire problem in
memory, but there should be enough to avoid too much swapping of memory blocks
(page misses) to disk, since disk access has a large impact on performance.
Access to DRAM is extremely slow compared to the speed of the processor, taking up to orders
of magnitude more time than a CPU clock cycle. Caches are used to keep recently used blocks of
memory for very fast access if the CPU references a word from that block again. However, the
very fast memory used for cache is expensive and cache control circuitry becomes more complex
as the size of the cache grows. Because of these limitations, the total size of a cache is usually in
the range of 8KB to 2MB. Within Pentium-based machines it is not uncommon to have a 64-bit
wide memory bus as well as a chip set that supports 2 MBytes of external cache. These
improvements were necessary to exploit the full power of the Pentium and to make the memory
architecture very similar to that of UNIX workstations.
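The benefit of a cache can be illustrated with a toy direct-mapped model: a reference hits when the tag stored in its cache line matches, and otherwise the block is loaded, evicting whatever occupied that line. The parameters and function name are illustrative; real caches add associativity, write policies, and multiple levels.

```python
def simulate_cache(addresses, num_lines=4, block_size=16):
    # Tiny direct-mapped cache model. Each address maps to exactly
    # one cache line (block index modulo the number of lines); a
    # hit occurs when that line already holds the block's tag.
    lines = [None] * num_lines
    hits = misses = 0
    for addr in addresses:
        block = addr // block_size
        index = block % num_lines
        tag = block // num_lines
        if lines[index] == tag:
            hits += 1
        else:
            misses += 1
            lines[index] = tag   # load the block, evicting the old one
    return hits, misses
```

Running it on addresses 0, 4, 8 (all in block 0) followed by 64 shows the locality effect: the second and third references hit because they fall in the block the first reference loaded.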
Cluster Applications
Clusters have been employed as an execution platform for a range of application classes, from
supercomputing and mission-critical applications through to e-commerce and database-based ones.
Clusters are being used as execution environments for Grand Challenge Applications (GCAs)
[57] such as weather modeling, automobile crash simulations, life sciences, computational fluid
dynamics, nuclear simulations, image processing, electromagnetics, data mining, aerodynamics
and astrophysics. These applications are generally considered intractable without the use of state-of-the-art parallel supercomputers. The scale of their resource requirements, such as processing
time, memory, and communication needs distinguishes GCAs from other applications. For
example, the execution of scientific applications used in predicting life-threatening situations
such as earthquakes or hurricanes requires enormous computational power and storage resources.
In the past, these applications would be run on vector or parallel supercomputers costing millions
of dollars in order to calculate predictions well in advance of the actual events. Such applications
can be migrated to run on commodity off-the-shelf-based clusters and deliver comparable
performance at a much lower cost.
In fact, in many situations expensive parallel supercomputers have been replaced by low-cost
commodity Linux clusters in order to reduce maintenance costs and increase overall
computational resources. Clusters are increasingly being used for running commercial
applications. In a business environment, for example in a bank, many of its activities are
automated. However, a problem will arise if the server that is handling customer transactions
fails. The bank's activities could come to a halt and customers would not be able to deposit or
withdraw money from their accounts. Such situations can cause a great deal of inconvenience and
result in loss of business and confidence in a bank. This is where clusters can be useful. A bank
could continue to operate even after the failure of a server by automatically isolating failed
components and migrating activities to alternative resources as a means of offering an
uninterrupted service.
With the increasing popularity of the Web, computer system availability is becoming critical,
especially for e-commerce applications. Clusters are used to host many new Internet service
sites. For example, free email sites like Hotmail and search sites like Hotbot use clusters.
Cluster-based systems can be used to execute many Internet applications:
Web servers
Search engines
Email
Security
Proxy
Database servers
In the commercial arena these servers can be consolidated to create what is known as an
enterprise server. The servers can be optimized, tuned, and managed for increased efficiency and
responsiveness depending on the workload through various load-balancing techniques. A large
number of low-end machines (PCs) can be clustered along with storage and applications for
scalability, high availability, and performance.
The Linux Virtual Server [66] is a cluster of servers connected by a fast network. It provides a
viable platform for building scalable, cost-effective, and more reliable Internet services than a
tightly coupled multiprocessor system, since failed components can be easily isolated and the
system can continue to operate without disruption. The Linux Virtual Server directs clients'
network connection requests to the different servers according to a scheduling algorithm,
making the parallel services of the cluster appear as a single virtual service with a single IP
address. Prototypes of the Linux Virtual Server have already been used to build many sites that
cope with heavy loads. Client applications interact with the cluster as if it were a single server
and need no modification. Scalability is achieved by adding one or more nodes to the cluster,
and high availability by automatically detecting node or daemon failures and reconfiguring the
system appropriately.
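The dispatching behaviour just described can be sketched as a round-robin scheduler, one of the simplest algorithms a virtual server can use: clients see a single address, and each incoming request is handed to the next real server in turn. This is a schematic model, not the Linux Virtual Server implementation, and the class name is hypothetical.

```python
# Schematic virtual server: one front-end address, many real
# servers behind it, round-robin scheduling of requests.
from itertools import cycle

class VirtualServer:
    def __init__(self, backends):
        self.backends = list(backends)
        self._next = cycle(self.backends)   # endless round-robin iterator

    def dispatch(self, request):
        # Pick the next real server and hand it the request.
        server = next(self._next)
        return server, request
```

The Linux Virtual Server also offers weighted and least-connection variants of scheduling, which matter when the real servers have unequal capacity.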
Clusters have proved themselves to be effective for a variety of data mining applications. The
data mining process involves both compute and data intensive operations. Clusters provide two
fundamental roles:
Data-clusters that provide the storage and data management services for the data sets being
mined.
Compute-clusters that provide the computational resources required by the data filtering,
preparation and mining tasks.
The Terabyte Challenge [69] is an open, distributed testbed for experimental work related to
managing, mining and modelling large, massive and distributed data sets.
The Terabyte Challenge is sponsored by the National Scalable Cluster Project (NSCP) [70], the
National Center for Data Mining (NCDM) [71], and the Data Mining Group (DMG) [72]. The
Terabyte Challenge's testbed is organized into workgroup clusters connected with a mixture of
traditional and high-performance networks. The project defines a meta-cluster as a 100-node
workgroup of clusters connected via TCP/IP, and a super-cluster as a cluster connected via a
high-performance network such as Myrinet. The main applications of the Terabyte Challenge include:
A high energy physics data mining system called EventStore;
A distributed health care data mining system called MedStore;
A web documents data mining system called Alexa Crawler [73];
Other applications such as distributed BLAST search, textual data mining, and economic data
mining.
An underlying technology of the NSCP is a distributed data mining system called Papyrus.
Papyrus has a layered infrastructure for high performance, wide area data mining and predictive
modelling. Papyrus is built over a data-warehousing layer, which can move data over both
commodity and proprietary networks. Papyrus is specifically designed to support various cluster
configurations; it is the first distributed data mining system to be designed with the flexibility of
moving data, moving predictive models, or moving the results of local computations.
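Papyrus's choice between moving data, moving models, and moving results can be sketched as a tiny cost model: ship whichever artifact is cheapest to transfer over the network. This is a hypothetical illustration of the trade-off, not Papyrus's actual decision logic, and the function name is invented.

```python
def plan_transfer(data_mb, model_mb, results_mb):
    # Pick the cheapest thing to move between clusters: the raw
    # data set, the predictive model, or the local results.
    options = {"data": data_mb, "model": model_mb, "results": results_mb}
    return min(options, key=options.get)
```

Typically the raw data dwarfs both the model and the per-site results, which is why distributed data mining systems prefer to move computation (models) to the data rather than the reverse.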