www.bull.com
Introduction
By the end of 2012, over 50% of applications running on x86 platforms will be virtualized.
This figure illustrates the massive interest that businesses are showing in a technology which
guarantees more flexible and lightweight infrastructure, and is indispensable in a smart
switch-over to Cloud computing.
However, given the number of applications in use (a figure that is set to expand four
or five times by 2015 according to Gartner) and the fact that only 20% of mission-critical
applications have been virtualized so far, there is a long way to go.
Indeed, it is gradually becoming clear that the most commonly used virtualization technologies,
which depend on large numbers of blade servers, only partially meet the demand for virtualization
on a massive scale, as well as the high availability needs of critical applications and power-hungry
applications such as ERP implementations and large databases.
Moreover, the deployment of multiple physical servers for virtualization purposes may increase
the instability and the complexity of virtualized infrastructures, thus creating new problems for IT
Departments. Consequently, the choice of the target architecture for virtualization is essential for the
enterprise.
There are several types of server architecture dedicated to virtualization, and each one has its own
advantages and drawbacks.
This white paper focuses on the architecture designed by Bull and
implemented in its bullion server to virtualize business-critical applications on a massive scale.
Scale-out and scale-up
The move to the second wave of virtualization, on the road to private cloud computing, brings the agility of virtualization to
business-critical applications and is placing today's enterprise virtualization clusters under enormous stress. Challenges include:
• Inadequate scaling of compute capabilities: scalable solutions are required to effectively handle increasingly complex
  and demanding workloads and support exponential data growth over the life of business-critical applications.
• Insufficient reliability: resilient solutions are required, in particular to handle high-density virtualization. The ability to
  ensure availability can have a clear economic benefit. Downtime can result in revenue loss, damage to company
  reputation, and lower employee productivity.
• Increased management complexity and operating cost: underutilized IT resources consume too much space, and cost
  too much to power, cool, maintain, administer, and service. Efficient solutions exist to enable you to free up and redirect
  operational dollars into business innovations that will strengthen competitiveness.
Methods of adding more resources for a particular application fall into two broad categories: scale-out (horizontal scaling)
and scale-up (vertical scaling) architectures.
Scale-out
Scale-up
To handle increasingly demanding workloads, sockets are gradually and seamlessly added within a single server.
The sockets are connected together, as well as to the memory and the I/O boards. Applications are thus able to benefit from
more and more compute power, memory, I/O and networking capabilities. However, no bottleneck should slow down the
applications: the performance obtained must be in line with the amount of resources added. Otherwise the scale-up
architecture is useless.
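One way to check whether performance stays "in line with" the added resources is to compare the measured speed-up with the ideal linear speed-up. The sketch below is only an illustration: the function name and the throughput figures are hypothetical and are not measurements from any server.

```python
def scaling_efficiency(baseline_perf, baseline_sockets, perf, sockets):
    """Return the measured speed-up divided by the ideal (linear) speed-up."""
    ideal_speedup = sockets / baseline_sockets
    measured_speedup = perf / baseline_perf
    return measured_speedup / ideal_speedup


# Hypothetical throughput figures in arbitrary units (not measurements).
measurements = {2: 100.0, 4: 196.0, 8: 300.0}

for sockets, perf in measurements.items():
    eff = scaling_efficiency(measurements[2], 2, perf, sockets)
    print(f"{sockets} sockets: {eff:.0%} of linear scaling")
```

A value close to 100% means the added sockets translate almost entirely into application performance; a falling value points to a bottleneck in the interconnect, memory or I/O.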
There are two broad scale-up server architectures: the glueless architecture and the glued architecture.
Cache coherency
To achieve cache coherency, a read request
must be reflected to all other processor caches
as a snoop. This can be compared to a
broadcast on an IP network.
Each processor must check for the requested
memory line and provide the data if it has
the most up-to-date version. If the read was
for exclusive access, all other caches
must also invalidate their copies. When the
modified line is available in another cache, this
source snooping protocol provides the minimum
latency for copying the line from one cache
to the next. However, this solution has limited
scalability for many workloads, especially
when virtualizing Java applications, running
large databases (Big Data) or running
latency-sensitive applications.
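The read sequence described above can be sketched as a toy model. This is a simplification under stated assumptions, not the actual QPI protocol: the `Cache` class and `snoop_read` function are hypothetical names, each cache line carries only a "modified" flag, and a modified copy is written back to memory when it is forwarded.

```python
from dataclasses import dataclass, field


@dataclass
class Cache:
    # line address -> (data, modified flag)
    lines: dict = field(default_factory=dict)


def snoop_read(requester, caches, memory, addr, exclusive=False):
    """Broadcast a read for `addr` to all other caches; return (data, snoops_sent)."""
    data = memory[addr]
    snoops = 0
    for i, cache in enumerate(caches):
        if i == requester:
            continue
        snoops += 1  # every other cache is snooped, whether or not it holds the line
        entry = cache.lines.get(addr)
        if entry is None:
            continue
        value, modified = entry
        if modified:
            data = value                        # the modified copy is the most recent
            memory[addr] = value                # simplification: write back on forward
            cache.lines[addr] = (value, False)
        if exclusive:
            del cache.lines[addr]               # exclusive read invalidates other copies
    caches[requester].lines[addr] = (data, False)
    return data, snoops


caches = [Cache() for _ in range(4)]
memory = {0x40: "old"}
caches[2].lines[0x40] = ("new", True)           # socket 2 holds a modified copy

data, snoops = snoop_read(0, caches, memory, 0x40, exclusive=True)
print(data, snoops)
```

Even in this small model, the number of snoops per read grows with the number of sockets, regardless of how many caches actually hold the line.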
In a source snooping coherency protocol, all
reads result in snoops to all other caches.
This consumes link and cache bandwidth, as
these snoop packets use cache cycles and link
resources that could otherwise be used for
data transfers. Source snoops can also
impact memory latency when snoops or snoop
responses require multiple hops. A
source-snooping memory controller cannot return the
memory data until it has collected all the snoop
responses and is sure that no cache has provided a
more recent copy of the memory line. Accessing
local memory is fast enough that the multi-hop
snoops and snoop responses, rather than the memory
access itself, delay the delivery of data from the
memory read. In 8-socket glueless systems, snoops
can consume up to 65% of the bandwidth.
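The latency effect can be illustrated with a small model in which the data is returned only once both the memory access and the slowest snoop round trip have completed. All timings and hop counts below are hypothetical placeholders chosen for illustration, and `read_latency_ns` is an assumed helper name.

```python
def read_latency_ns(memory_ns, snoop_hops, hop_ns, cache_lookup_ns):
    """Latency of a local memory read under source snooping (toy model).

    snoop_hops lists the number of link hops from the requester to each
    other socket.
    """
    # A snoop travels `hops` links out, is looked up in the remote cache,
    # and its response travels the same number of links back.
    round_trips = [2 * hops * hop_ns + cache_lookup_ns for hops in snoop_hops]
    # Data cannot be returned before the slowest snoop response has arrived.
    return max([memory_ns] + round_trips)


# Hypothetical timings: 70 ns local memory, 40 ns per hop, 20 ns cache lookup.
four_socket = read_latency_ns(70, [1, 1, 1], 40, 20)
eight_socket = read_latency_ns(70, [1, 1, 1, 2, 2, 2, 2], 40, 20)
print(four_socket, eight_socket)
```

With these placeholder values the local memory access is never the limiting factor: the read completes only when the farthest snoop response arrives, so adding two-hop sockets lengthens every read.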
Performance benefits
In a ccNUMA system, the hardware ensures
cache coherency by tracking where the most
up-to-date data is for every cache line held
in a processor cache. Latencies between
processor and memory in a ccNUMA system
vary depending on the location of these two
components in relation to each other and the
quality of the node controller.
When scaling to eight processors, the glued
(node controller) implementation provides
performance benefits beyond those offered by
a glueless implementation.
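The idea of tracking where the most up-to-date copy of each line lives can be sketched with a toy directory. This is an illustrative model only, not a description of how any particular node controller is implemented; `Directory` and `broadcast_snoops` are hypothetical names.

```python
class Directory:
    """Toy directory: records which socket holds the latest copy of each line."""

    def __init__(self):
        self.owner = {}  # line address -> socket id holding the latest copy

    def record_write(self, addr, socket):
        self.owner[addr] = socket

    def snoops_for_read(self, addr, requester):
        """Return the sockets that must be snooped to serve a read."""
        owner = self.owner.get(addr)
        if owner is None or owner == requester:
            return []       # memory (or the requester itself) has the latest data
        return [owner]      # one targeted snoop instead of a broadcast


def broadcast_snoops(requester, sockets):
    """Source snooping for comparison: every other socket is snooped."""
    return [s for s in range(sockets) if s != requester]


directory = Directory()
directory.record_write(0x80, socket=5)

print(directory.snoops_for_read(0x80, requester=0))   # targeted snoop
print(len(broadcast_snoops(requester=0, sockets=8)))  # broadcast snoops
```

The comparison shows why tracking matters at higher socket counts: the directory answers a read with at most one targeted snoop, whereas a broadcast scheme snoops every other socket.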
Glued architecture
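                                             Glueless     Glued
Maximum number of processors supported       (missing)    16
Processor socket configurations supported    4, 8         2, 3, 4, 6, 7, 8, 10, 11, 12, 14, 15, 16
Number of processor cores                    80           160
Bandwidth consumed by snoops                 65%          5-10%
Maximum memory                               2TB          4TB
(row label missing)                          1100         1100
(row label missing)                          < 2000       2200
(row label missing)                          NA           4100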
The Bull Coherence Switch (BCS) Architecture is Bull's implementation of the glued node controller architecture.
The BCS architecture is the design foundation for servers that need to deliver more scalability, resiliency, and efficiency to
meet the requirements of the most demanding applications in high performance computing as well as in business computing.
In business computing, the BCS technology is the foundation of bullion servers dedicated to virtualization and critical
applications (see chapter 5 for more information on bullion servers). In high performance computing, the BCS technology is
the foundation of bullx Supernodes series designed to run HPC applications that require huge volumes of shared resources,
in particular shared memory.
BCS Architecture
The BCS enables two key functionalities: CPU
caching and the resilient node controller fabric.
These features reduce communication
and coordination overhead and provide
availability features consistent with the Intel Xeon
processor E7-4800 series.
The BCS meets the most demanding
requirements in terms of performance, RAS
features and ease of use. Servers based on the
BCS Architecture scale to sixteen processors,
supporting up to 160 processor cores and up
to 320 logical processors with hyper-threading
enabled. The server processing capacity is
balanced with 256 DDR3 DIMM slots for
a maximum of 4TB of memory using 16GB
DIMMs, and up to 24 I/O slots.
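The capacity figures above follow from simple arithmetic, reproduced here as a quick check. The value of ten cores per socket is an assumption derived from the stated totals (160 cores over sixteen processors); the other constants are taken from the text.

```python
SOCKETS = 16
CORES_PER_SOCKET = 10   # assumption: derived from 160 cores / 16 sockets
THREADS_PER_CORE = 2    # hyper-threading enabled
DIMM_SLOTS = 256
DIMM_SIZE_GB = 16

cores = SOCKETS * CORES_PER_SOCKET
logical_processors = cores * THREADS_PER_CORE
memory_tb = DIMM_SLOTS * DIMM_SIZE_GB / 1024

print(cores, logical_processors, memory_tb)  # 160 320 4.0
```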
BCS architecture
Uncontested performance
Vendor            System                    Score    Architecture
Bull              bullion                   4110     x86
Hewlett-Packard   ProLiant DL980 G7         2180     x86
Fujitsu           PRIMEQUEST 1800E2         1890     x86
Oracle            (missing)                 1380     x86
IBM               System x3850 X5           1250     x86
Cisco             UCS C460 M2               1160     x86
Fujitsu Oracle    Sparc Enterprise          3150     RISC
Hewlett-Packard   HP Integrity Superdome    1650     EPIC
Reliability
Resilient system fabric
The Bull BCS Architecture extends the advanced
reliability of the Intel Xeon processor E7-4800
series in bullion with a resilient eXtended-QPI
fabric. This interconnect fabric provides higher
interconnect bandwidth to improve performance
and scaling, and availability features consistent
with the QPI fabric. The BCS X-QPI fabric
enables:
• No more hops to reach the information
  inside any of the other processor caches.
• Redundant data paths: the X-QPI fabric's
  provision of 100% more interconnect links
  (eight here versus four in most competitive
  8-socket systems without a node controller)
  improves system performance by providing
  more bisection bandwidth and dynamically
  balancing the traffic on the links.
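As a rough illustration of dynamic balancing over redundant links, the sketch below sends each packet over the least-loaded healthy link and simply skips a link marked as failed. It is a toy model with hypothetical function names, and it does not describe the actual X-QPI routing logic.

```python
def pick_link(load_by_link, failed=frozenset()):
    """Return the id of the least-loaded link that has not failed."""
    healthy = {link: load for link, load in load_by_link.items() if link not in failed}
    if not healthy:
        raise RuntimeError("no healthy link available")
    return min(healthy, key=healthy.get)


def send(load_by_link, packets, failed=frozenset()):
    """Distribute `packets` one at a time over the healthy links."""
    for _ in range(packets):
        load_by_link[pick_link(load_by_link, failed)] += 1
    return load_by_link


# Eight links, all healthy: traffic spreads evenly.
all_up = send({link: 0 for link in range(8)}, packets=16)

# Same fabric with link 3 failed: the remaining seven links absorb the traffic.
one_down = send({link: 0 for link in range(8)}, packets=14, failed={3})
print(all_up, one_down)
```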
The fabric redundancy also helps reduce
unscheduled downtime. A failure of an X-QPI
link is automatically resolved by using the
redundant link. Should, in the most extreme
case, a complete module fail, the automatic
Reliability
Memory management
One of the unique reliability features of the
bullion server is its memory management.
Memory protection mechanisms guarantee up
to 100% memory reliability on bullion. Over
and above traditional memory correction
mechanisms, such as ECC memory, which
maintains a memory system effectively free
from single-bit errors, bullion provides much
more sophisticated mechanisms such as DDDC
(Double Device Data Correction), which corrects
dual errors.
The commonly available DIMM sparing is
now enhanced to provide rank sparing.
With rank sparing of dual rank DIMMs, only
12.5% of the memory capacity is used to
enhance the Quality of Service. If, for example,
bullion servers are equipped with 32GB dual
rank DIMM memory kits, each kit consisting of
two 16GB DIMMs, a 32GB
dual rank memory kit provides 28GB of
usable space, while the rest is reserved for
fail-over if the level of ECC errors becomes too
high.
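The rank-sparing figures above can be checked with a short calculation. The 12.5% spare fraction is taken from the text; the function name is just an illustrative helper.

```python
SPARE_FRACTION = 0.125  # share of capacity reserved for rank sparing, as stated above


def usable_gb(capacity_gb, spare_fraction=SPARE_FRACTION):
    """Capacity left for applications once the spare share is set aside."""
    return capacity_gb * (1 - spare_fraction)


print(usable_gb(32))  # 28.0
```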
Another example of a mechanism to improve
memory reliability is MCA recovery, which
ensures that memory errors detected are
forwarded to the VMware hypervisor, to make
sure that the hypervisor does not use this
deficient memory address space any more.
These two features limit the impact of memory
crashes just to the affected VMs, without having
to provide the memory DIMMs needed for
memory mirroring.
Finally, for 100% memory reliability, bullion
Ultra-Efficient cooling
Each bullion module contains eight strategically
located hot-swap fans in an N+N configuration,
combined with the efficient airflow paths defined
by the unique positioning of the memory DIMM
modules. This provides highly efficient system
cooling. The fans are arranged to cool four
separate zones, each with its own pair of
fans for optimum redundancy and superior
availability levels.
The fans automatically adjust their speed in
response to changing thermal requirements,
depending on the zone, redundancy and
internal temperatures. When the temperature
inside the server increases, the fans speed up to
maintain the proper ambient temperature. When
the temperature returns to a normal operating
level, the fans return to their default speed. This
solution significantly reduces the ambient noise,
the wear and tear on the fans and the server's
electricity consumption.
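The behaviour described above resembles a per-zone thermal control loop. The sketch below uses a simple clamped linear ramp; the thresholds, the default speed and the function name are hypothetical, and the actual bullion fan control algorithm is not documented here.

```python
def fan_speed_percent(temp_c, normal_c=25.0, max_c=45.0, default_pct=30.0):
    """Map a zone temperature to a fan duty cycle (linear ramp, clamped)."""
    if temp_c <= normal_c:
        return default_pct          # normal operating level: default speed
    if temp_c >= max_c:
        return 100.0                # hottest case: full speed
    span = (temp_c - normal_c) / (max_c - normal_c)
    return default_pct + span * (100.0 - default_pct)


# Four cooling zones with hypothetical temperature readings in degrees Celsius.
zones = {"zone 1": 24.0, "zone 2": 35.0, "zone 3": 41.0, "zone 4": 50.0}

for zone, temp in zones.items():
    print(f"{zone}: {temp:.0f} C -> {fan_speed_percent(temp):.0f}% fan speed")
```

Running fans at the default speed whenever a zone is at its normal temperature is what keeps noise, wear and power draw low in such a scheme.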
This works together with the optimized front bezel,
which combines the unique form and shape of the
ventilation holes with an innovative design,
underlining the reconciliation of energy
consumption and management efficiency.
Availability
Active/passive power supplies to reconcile
availability and energy efficiency
The bullion servers are equipped with two
1600W common slot power supplies, which are
80+ Platinum level certified. These two 1600W
power supplies provide full grid N+N
redundancy as standard for maximum availability.
To increase efficiency even further, Bull has
developed a patented solution based on an
active/passive power supply principle.
Active/passive power supplies provide the
highest efficiency rate possible, regardless of the
requirement, and still provide a maximum uptime
Serviceability
Maintainability and Availability
With bullion, a new simplified path is taken to
ease the replacement of the most frequently
failing motorized components, such as the
fans, power supplies and disk drives.
These three components are responsible for over
80% of hardware failures, but their failure has no
impact whatsoever on production on bullion servers.
Bull engineers have done excellent
work to ease the maintenance of these
components. Thanks to these efforts,
these components are now Customer
Replaceable Units (CRUs). This program
empowers you to repair your own machine, and it is easier than you may think. In situations
where a computer failure can be attributed to
an easily replaceable part (a CRU), Bull sends
you the new part. Without needing any special
tools or skills, you swap the old part for the new
one. It is simple. The major advantage: really
fast service for you and reduced support and
maintenance fees.
iCare
bullion can also interface with the iCare
software package developed by Bull. The iCare
package facilitates the maintenance of the
bullion system by collecting error events and
log files transmitted by the bullion systems into
a central database. It provides a suite of tools
to aid in the analysis and diagnosis of system
events, and assistance in identifying possible
preventive maintenance actions. It can also
serve as an autocall concentrator, allowing
rules and actions for autocalls to be defined for
events.
VMware vSphere 5 is a NUMA-aware hypervisor that allocates memory local to or close to a requesting core or thread to
minimize memory latency and link bandwidth consumption. VMware vSphere 5 automatically optimizes the Virtual Machines
deployed and can further be tuned with parameters and other mechanisms to adapt the default NUMA behaviour for various
workloads to achieve improved performance scalability.
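The principle of allocating memory local to, or close to, the requesting processor can be sketched as follows. This is a toy placement rule with hypothetical names and figures; it is not VMware's scheduler and does not use any vSphere API.

```python
def place_vm(vm_mem_gb, free_gb_by_node, preferred_node, distance):
    """Pick a NUMA node for a VM's memory.

    The preferred node is chosen if it has room; otherwise the closest
    node (by `distance`) with enough free memory is chosen. Returns None
    when no single node can hold the VM.
    """
    candidates = [node for node, free in free_gb_by_node.items() if free >= vm_mem_gb]
    if not candidates:
        return None
    return min(candidates, key=lambda node: distance[preferred_node][node])


# Hypothetical three-node system: free memory per node and hop distances.
free_gb = {0: 4, 1: 32, 2: 64}
distance = {
    0: {0: 0, 1: 1, 2: 2},
    1: {0: 1, 1: 0, 2: 1},
    2: {0: 2, 1: 1, 2: 0},
}

print(place_vm(2, free_gb, preferred_node=0, distance=distance))    # fits locally
print(place_vm(16, free_gb, preferred_node=0, distance=distance))   # nearest node with room
print(place_vm(128, free_gb, preferred_node=0, distance=distance))  # no node can hold it
```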
Conclusion
This white paper is printed on paper combining 40% eco-certified fibers from sustainable forest management and 60% recycled fibers in line with current environmental
standards (ISO 14001).
W-BCS-en1
Bull SAS - 2012 - Bull acknowledges the rights of proprietors of trademarks mentioned herein. Bull reserves the right to modify this document at any time without notice. Some offers or parts of
offers described in this document may not be available in your country. Please consult your local Bull correspondent for information regarding the offers which may be available in your country.
This document has no contractual significance. Intel and Intel Xeon are trademarks or registered trademarks of Intel Corporation in the US and other countries.