
TERM PAPER

OF
COMPUTER ARCHITECTURE AND
ORGANIZATION

TOPIC – CRAY FAMILY


Submitted by: Avinash Manhas
Roll no.: RE2801B46
Reg. no.: 10809450
Course: B Tech-M.Tech (IT)

Submitted to: Lect. Ruchika Dhall

CRAY FAMILY

INTRODUCTION

The first Cray-1 system was installed at Los Alamos National Laboratory in 1976 for
$8.8 million. It boasted a world-record speed of 160 million floating-point operations per
second (160 megaflops) and an 8 megabyte (1 million word) main memory. No wire in
the system was more than four feet long. To handle the intense heat generated by the
computer, Cray developed an innovative refrigeration system using Freon.

In 1988, Cray Research introduced the Cray Y-MP, the world's first supercomputer to
sustain over 1 gigaflop on many applications. Multiple 333 MFLOPS processors powered
the system to a record sustained speed of 2.3 gigaflops.

The 1990s brought a number of transforming events to Cray Research. The company
continued its leadership in providing the most powerful supercomputers for production
applications. The Cray C90 featured a new central processor with industry-leading
sustained performance of 1 gigaflop. Using 16 of these powerful processors and 256
million words of central memory, the system boasted unrivaled total performance. The
company also produced its first "minisupercomputer," the Cray XMS system, followed
by the Cray Y-MP EL series and the subsequent Cray J90.

In 1993, Cray Research offered its first massively parallel processing (MPP) system, the
Cray T3D supercomputer, and quickly captured MPP market leadership from early MPP
companies such as Thinking Machines and MasPar. The Cray T3D proved to be
exceptionally robust, reliable, sharable and easy-to-administer, compared with competing
MPP systems.
In another technological landmark, the Cray T90 became the world's first wireless
supercomputer when it was unveiled in 1994. Also introduced that year, the Cray J90
series has since become the world's most popular supercomputer, with over 400 systems
sold.

Cray Research merged with SGI (Silicon Graphics, Inc.) in February 1996. In August
1999, SGI created a separate Cray Research business unit to focus exclusively on the
unique requirements of high-end supercomputing customers. Assets of this business unit
were sold to Tera Computer Company in March 2000.

Cray provides two types of dedicated nodes — compute nodes and service nodes.
Compute nodes are optimized to run parallel MPI and/or OpenMP tasks with
maximum efficiency. Service nodes provide scalable system and I/O connectivity and
can serve as login nodes from which applications are compiled and launched. Cray
provides fully integrated networking, using an efficient, low-contention three-
dimensional (3D) torus architecture, designed for superior application performance
for large-scale, massively parallel applications.
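
As a minimal illustration of the kind of work a compute node is optimized for, the sketch below shows an OpenMP data-parallel loop in C. The array size and names are arbitrary choices for the example, not anything Cray-specific.

/* Minimal OpenMP sketch of a data-parallel compute-node task.
 * Compile with e.g.  cc -fopenmp saxpy.c  */
#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void) {
    static double x[N], y[N];
    double a = 2.0;

    for (int i = 0; i < N; i++) { x[i] = 1.0; y[i] = 2.0; }

    /* Each thread takes a slice of the iteration space. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        y[i] = a * x[i] + y[i];

    printf("y[0] = %f, threads available = %d\n", y[0], omp_get_max_threads());
    return 0;
}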

Contents:

* Cray supercomputer families

* Assessing a supercomputer

* Cray-1

a. Address Component

b. Scalar Component

c. Vector Component

d. I/O Component

* PVP Generations, XMP, YMP, C90, T90

a. Parallel Vector Processors, the core product line of Cray Research.

b. Inside a Vector CPU


* The Cray-2

* Cray Superserver systems

* Cray Operating systems

* OWS and other support equipment

Cray supercomputer families

1972 <---------------- very approximate time line ----------------> 1996
(dates not to scale)

           *T3D* --> T3E --> T3E/1200                  = MPP
            *  *
C1 --> XMP --> YMP --> C90 ----> T90                   = PVP
 \       \       \
  C2      \       \--> C90M                            = Large memory
  .        XMS --> ELs --> J90 --> J90se --> SV1       = Air-cooled vector supermini
  .        APP --> SMP --> CS6400                      = SPARC superserver
  ......... C3 --> C4                                  = Cray Computer Corp.

Key:
-->  Direct architectural descendant
\    Similar architecture but different technology
...  Family resemblance
*    Hosted by

Computers that proudly carried the Cray name can be divided into groups of related
architectural families. The first incarnation was the Cray-1, designed and built by Cray
Research founder Seymour Cray. This machine incorporated a number of novel
architectural concepts that became the foundation for the line of vector supercomputers
which made Cray Research (1972..1996) legendary in the scientific and technical
computing market.
The Cray-1 evolved through a number of often one-of-a-kind sub-variants before being
replaced by the evolutionary XMP and the substantially different Cray-2. This split between
the XMP and Cray-2 marked the first divide in the Cray architectural line, which together
came to define and dominate the supercomputing market for the best part of 20 years.
The machines designated C1, C2, C3 and C4 were the particular developments of
Seymour Cray, who split on friendly financial terms from Cray Research in 1989 to
pursue the Cray-3 project and found Cray Computer Corporation (1989..1995).

In parallel with this, the main body of Cray Research evolved the original Cray-1
concept through four technological generations, culminating in the 32-CPU T90.
Alongside this, a line of compatible mini-supercomputers, an enterprise-class
scaled-up SMP SPARC architecture and a line of massively parallel machines
were developed and brought to market.

Cray machines were never cheap to buy or own but provided demanding customers with
the most powerful computers of the time. Used by a select group of research labs and the
top flight of industry, they defined the very nature of supercomputing in the 1980s and
1990s.

Assessing a supercomputer

It is easy to describe the power of these supercomputers in terms of CPU MHz and
Mflops, but the numbers fail to quantify the real difference between Cray computers and
the other machines available at the time. It is easier to think in terms of an analogy. If you
compare computers to cars and lorries, the Cray machines are the big dumper trucks and
land graders that you see lumbering round a quarry. They are certainly not the fastest to
change direction, and you can't use them for the weekly supermarket shop, but when it
comes to moving rocks in large quantities there is nothing to touch them. The speed of
the CPU in a computer is only one measure of its usefulness in solving problems; the
other important metrics for effective problem solution are capacity and balance.

When looking at a high-performance computer you have to examine many aspects:
specifically CPU speed, memory bandwidth, parallelism, I/O capacity and, finally, ease of
use. Running briefly through this list, we can see that Cray machines had features in each
area that combined to deliver unmatched time to solution.

CPU speed: all arithmetic was done on 64-bit words, and lists of numbers (vectors) could
be processed as efficiently as single values. Special instructions could be used to short-circuit
more complex operations (gather/scatter, population count, leading-zero count, vector
arithmetic). The CPUs also happened to be implemented in very fast logic.
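
The population-count and leading-zero operations mentioned above survive in modern instruction sets. The short C sketch below, using GCC/Clang builtins, shows what those single Cray opcodes compute; it is illustrative only, not Cray code.

/* What the population-count and leading-zero instructions compute,
 * expressed with compiler builtins that typically map to single
 * instructions on modern CPUs. */
#include <stdio.h>
#include <stdint.h>

int main(void) {
    uint64_t word = 0x00F0000000000001ULL;

    int pop = __builtin_popcountll(word);  /* number of set bits   */
    int lz  = __builtin_clzll(word);       /* leading zero count   */

    printf("population = %d, leading zeros = %d\n", pop, lz);
    return 0;
}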

Memory bandwidth: Cray memory was fast, real memory; no page faults or translation
tables slowed memory access. Memory was subdivided into independent banks so that a
list of numbers could be fetched to a CPU faster than the per-bank memory delay. In the
vector machines the memory was globally and equally accessible by all CPUs, while the
MPP systems had physically distributed but globally accessible memory.
I/O capacity: provided by separate subsystems that read and wrote directly to memory
without diverting the CPU from computational tasks. The disks had to be the best
available, as they would often receive a pounding far in excess of most disk duty cycles.
Heavy-duty networking was provided initially by proprietary protocols over proprietary
hardware; later, as TCP/IP became established, open standards were adopted. The
machines often sat at the centre of large, diverse networks.

Ease of use: achieved on two fronts, for both programmers and administrators. By
providing compilers that could automatically detect and optimise the parallelism and
vectorisation within a program, as well as highly efficient numerical libraries, programs
could be developed that achieved very high percentages of the peak speed of the machines.
Unicos, an extended Unix variant, gave system administrators an operating
system with big-system facilities behind a familiar interface. Unicos was Unix tuned for big
multi-user environments, providing mainframe-class resource control and security
features.
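
To make the vectorisation point concrete, here is an illustrative C fragment (not from any Cray compiler manual): the first loop has independent iterations and is exactly the shape an auto-vectorising compiler turns into vector instructions, while the second has a loop-carried dependence and must run serially.

/* Two loops a vectorizing compiler treats very differently. */
void scale(double *a, const double *b, double s, int n) {
    for (int i = 0; i < n; i++)
        a[i] = s * b[i];          /* independent -> vectorizable */
}

void prefix(double *a, int n) {
    for (int i = 1; i < n; i++)
        a[i] = a[i] + a[i - 1];   /* a[i] needs a[i-1] -> serial */
}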

The Cray 1

The Cray 1 was first delivered in 1976. This was around the same time that 8-bit
microprocessors were beginning to gain popularity; typical memory components were
1 Kbit SRAM and 4 Kbit DRAM. Most machines operated at about a 1 MHz clock
rate, had 32-bit words, and large mainframes had 1 MB to 8 MB of RAM.

The Cray 1 had (per Baron and Higbie):

• 64-bit words
• 8 MB of RAM
• 16-way interleaving on low-order bits
• 50 ns memory cycle
• 12.5 ns clock cycle (80 MHz)
• 12 pipelined functional units

The Cray 1 has 3 basic data types: addresses (24-bit integer), integers (64-bit), and floating
point (64-bit, with a 48-bit mantissa).

The 12 functional units are divided into four groups.

Group 1 -- Vector units

Vector (integer) Add: 3 stages
Vector Logical: 2 stages
Vector Shift: 4 stages

Group 2 -- Vector and scalar units

Floating Add: 6 stages
Floating Multiply: 7 stages
Floating Reciprocal Approximation: 14 stages

Group 3 -- Scalar units

Integer Add: 3 stages
Logical: 1 stage
Shift: 2 stages
Scalar population count and leading zero count: 3 stages

Group 4 -- Address units

Add: 2 stages
Multiply: 6 stages

The machine itself is divided into six major subsystems:

• Memory
• Instruction component
• Address component
• Scalar component
• Vector component
• I/O component

Instruction Component

Cray 1 instructions are 16 or 32 bits, so from two to four instructions can be packed into a
word. Instructions are thus addressed on 16-bit boundaries while data is addressed on 64-
bit boundaries.

The instruction unit has four 16-word instruction buffers, three instruction registers, and
one instruction counter. Each 16-bit field in a word is called an instruction parcel.

The three instruction registers are

• Next Instruction Parcel -- holds first parcel of the next instruction, prefetched
from buffer
• Current Instruction Parcel -- holds the high-order portion of the instruction to be
issued
• Lower Instruction Parcel -- holds low-order portion of instruction to be issued

For a 32-bit instruction, the low-order portion is fetched to the NIP and then moved to the
LIP. There is no mechanism for discarding instructions in the pipe: once in the CIP/LIP,
they will be issued, although they may be delayed for some time.
The instruction buffers are tied to the memory via the 16-way interleaving, so it is
possible to fill a buffer in 4 clock cycles (recall that the clock is 12.5 ns and memory is 50
ns). Buffers are filled on a demand basis in a round-robin pattern. They thus act as an
instruction cache of 256 instruction parcels, organized into four lines of 64 parcels. Each
buffer has its own address comparator, so we would call this a fully associative cache
(easy to implement when there are only 4 lines). The buffers cannot be written to -- a
write bypasses the instruction cache and only goes to main memory.
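
The following C sketch is a simplified model, not Cray's actual logic, of that fully associative lookup: each buffer's base address is compared in parallel, and on a miss a buffer is refilled on a round-robin basis.

/* Simplified model of 4 fully associative instruction buffers. */
#include <stdint.h>
#include <stdbool.h>

#define NBUF  4
#define WORDS 16   /* 16 64-bit words per buffer on the Cray-1 */

struct ibuf { uint32_t base; bool valid; };
static struct ibuf bufs[NBUF];
static int next_victim = 0;        /* round-robin fill pointer */

/* Returns the buffer holding word address addr, filling one on a miss. */
int ibuf_lookup(uint32_t addr) {
    for (int i = 0; i < NBUF; i++)          /* all comparators "at once" */
        if (bufs[i].valid && addr - bufs[i].base < WORDS)
            return i;                        /* hit */
    int v = next_victim;                     /* miss: demand fill */
    next_victim = (next_victim + 1) % NBUF;
    bufs[v].base  = addr & ~(uint32_t)(WORDS - 1);
    bufs[v].valid = true;
    return v;
}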

Scalar instruction issue requires that all of the instruction's required resources be free --
otherwise the instruction waits. Vector instruction issue in the Cray involves reserving
functional units, including memory, operand registers and result registers, and then
releasing an instruction once all of its resources are available. In addition, some data
paths are shared between the vector and scalar components, and these must be available.

The control unit is able to detect when a result register for one vector operation is an
operand for another vector operation and, if the two vector instructions do not conflict in
any other resource requirements, it sets up a vector chaining operation between the two
instructions.
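
Conceptually, chaining turns two dependent vector instructions into one streamed pass. The illustrative C fragment below contrasts the unchained view (the multiply completes before the add begins) with the chained view (each product feeds the adder as it emerges); the loops are a sketch of the timing behaviour, not of the hardware.

#define VL 64   /* Cray vector register length */

/* Unchained view: two separate vector instructions, two passes. */
void unchained(double *d, const double *a, const double *b, const double *c) {
    double t[VL];
    for (int i = 0; i < VL; i++) t[i] = a[i] * b[i];  /* V1 = V2 * V3 */
    for (int i = 0; i < VL; i++) d[i] = t[i] + c[i];  /* V4 = V1 + V5 */
}

/* Chained view: element i of the multiply feeds the add immediately. */
void chained(double *d, const double *a, const double *b, const double *c) {
    for (int i = 0; i < VL; i++)
        d[i] = a[i] * b[i] + c[i];   /* multiply and add overlap in time */
}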

Address Component

There are 8 24-bit address registers, 64 24-bit spill registers, an adder, and a multiplier in
this component. Its purpose is to perform index arithmetic and send the results to the
scalar and vector components so that they can fetch the appropriate operands.

Arithmetic is performed on the address registers directly. The spill registers are used to
hold address values that do not fit into the address registers. A set of 8 addresses can be
transferred between the address registers and their spill registers in a single cycle. Thus,
they bear a certain similarity to the register windows of the SPARC (or vice versa). The
spill registers can be thought of as an explicitly managed data cache with 8 lines. Their
value is that they reduce the traffic to main memory, freeing that resource for vector
operations.

Scalar Component

Similar to the address component, the scalar component has 8 64-bit registers and 64 64-
bit spill registers. It has sole access to four functional units: Integer Add, Logical, Shift,
and Population Count. The Scalar Component also has access to three functional units
that are shared with the Vector Component: Floating Add, Multiply, and Reciprocal
Approximation.

Because the scalar component has its own integer units, it can always execute integer
operations in parallel with a vector operation. However, for floating point, the vector unit
takes priority.

Vector Component

There are 8 64-word vector registers in the vector component. It takes four memory loads
to fill a vector register. Normally, this would require 16 instruction cycles; however,
careful pipelining in the memory unit reduces the time to just 11 cycles.

A vector mask register contains a bit-map of the elements in a register operand that will
participate in an instruction. A vector length register determines whether fewer than 64
operands are contained in a set of vector operands. Manipulating these values is the
primary reason for the population and leading zeros counter. Vector loads and stores
specify the first location, the length, and the stride.
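
A rough software model of these three controls might look like the C sketch below, where vl stands in for the vector length register, mask for the vector mask register (bit i set means element i participates) and stride for the load stride. The names are illustrative, not a Cray API.

#include <stdint.h>

#define VL_MAX 64

/* Emulation of a masked, strided vector add: dst[i] = src[i*stride] + n */
void masked_strided_add(double *dst, const double *src, long stride,
                        int vl, uint64_t mask, double n) {
    for (int i = 0; i < vl && i < VL_MAX; i++)
        if (mask & (1ULL << i))            /* element selected by mask? */
            dst[i] = src[i * stride] + n;  /* strided fetch, then add   */
}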

I/O Component

The I/O component has 24 programmable I/O channel units. I/O has the lowest priority
for memory access.

Cray X-MP

• Extended the Cray-1 architecture to 4-way multiprocessing.
• Cycle time reduced to 8.5 ns (117 MHz).
• Increased instruction buffers to 32 words.
• Added a multiport memory system.
• Redesigned the vector unit to support arbitrary chaining.
• Added gather/scatter to support sparse arrays (see the sketch after this list).
• Increased memory to 16 M words, 32-way interleaved.
• Provided a set of shared registers to support fine-grained (loop-level)
multiprocessing. There are N+1 sets of these registers for an N-processor system.
They include 8 address registers, 8 scalar registers, and 32 binary semaphores.
• The I/O system was improved and a solid-state disk cache was added.
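
The gather/scatter sketch promised above: an index vector selects which memory elements a vector register reads (gather) or writes (scatter), which is what makes sparse-array code vectorisable. This is an illustrative C model, not actual X-MP code.

/* Gather: load selected elements; scatter: store to selected elements. */
void gather(double *vreg, const double *mem, const long *index, int vl) {
    for (int i = 0; i < vl; i++)
        vreg[i] = mem[index[i]];      /* V1 = mem[V0], elementwise */
}

void scatter(const double *vreg, double *mem, const long *index, int vl) {
    for (int i = 0; i < vl; i++)
        mem[index[i]] = vreg[i];      /* mem[V0] = V1, elementwise */
}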

Cray Y-MP

• Extends the X-MP architecture to 8 processors.
• Cycle time reduced to 6 ns (166 MHz).
• Extends memory to 128 M words.

PVP Generations, XMP, YMP, C90, T90

Parallel Vector Processors, the core product line of Cray Research.

The XMP range evolved from the Cray 1 and introduced dual processing to the vector
line. Originally limited to 16 MWd of memory, the later "Extended Memory Architecture"
variants grew the address registers from 24 to 32 bits, increasing the maximum program size
to 2 Gbytes. The XMP was brought to market whilst the Cray 2 was still in development
and was a huge success that proved hard to repeat in later years. The line of big-iron
vector supercomputers continued with the YMP, C90 and T90, each generation of which
approximately doubled the CPU speed, CPU count and memory size, along with
improving a host of other details.

Date    Model    Max. CPUs    Approx. peak per CPU
1976    C1           1        0.160 GFlop
1982    XMP          4        0.235 GFlop
1988    YMP          8        0.333 GFlop
1991    C90         16        1 GFlop
1995    T90         32        2 GFlop

The YMP range of machines started by utilising new board and cooling technology for
just the CPUs, using XMP-style boards for the IOS and SSD. Eventually the IOS and
SSD also came to use YMP-style, internally cooled boards in a range of chassis sizes.
The final leg, which balanced the extreme performance of the memory and CPUs, was
the Model E I/O subsystem, which provided high-speed, sustained, parallel
input/output capacity for the system.

Inside a Vector CPU

Just what was it that made the Cray CPUs so fast? Putting aside the fact that the logic was
implemented in fast bipolar hardware, there were a number of features that, combined
with clever compiler technology, made the processors speed through the type of scientific
and engineering problems that were the heartland of Cray customers. Described in this
section are some of the features that made the difference in both speed and price.

Registers: lots of them; in a YMP CPU, for example:

8 V registers, each 64 words long, each word 64 bits
64 T registers, each 64 bits
8 S registers, each 64 bits
64 B registers, each 32 bits
8 A registers, each 32 bits
4 instruction buffers of 32 64-bit words (128 16-bit parcels)

YMP functional units were:

address: add, multiply
scalar: add, shift, logical, pop/parity/leading zero
vector: add, shift, logical, pop/parity
floating: add, multiply, reciprocal approximation
Other sundry CPU registers include the vector mask, vector length, instruction issue
registers, performance monitors, the programmable clock, status bits and, finally, the
exchange parameters and I/O control registers. The quantity and types of registers evolved
and expanded through the life of these CPU types. The C90 added more functional units
to the YMP design, and the T90 more still.

Memory interface: CPUs are faster than memory, so the speed at which a processor can
exchange information with memory limits its effectiveness. This can strangle the
performance of an architecture, and a simple way to halve the memory delay is to have
two independent banks of memory. Taking this further, having enough memory banks to
match the ratio of memory speed to CPU speed removes the bank-recovery delay
altogether. For example, if a CPU has an 8.5 nanosecond clock cycle, the memory banks
have a recovery time of 68 nanoseconds, and there are 16 memory banks, an operation
such as

do i = 1, 60000
   c(i) = a(i) + n
end do

can run at full speed. Even on modern processors the above operation would become
memory-bound as soon as the processor's cache was exhausted. As well as multiple banks,
there were multiple ports to memory from each CPU to prevent bus contention. Looked
at from another angle, sequential memory locations come from separate memory
banks. As the architecture developed, the number of banks and ports increased along with
the vector length.

Location: 0 1 2 3 4 5 6 7 8 9 A B C ...
Bank:     0 1 2 3 4 5 6 7 0 1 2 3 4 ...   (8-bank example)
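
The arithmetic behind the "full speed" claim can be sketched in a few lines of C. The constants are the ones from the example above; bank_of is an illustrative helper, not a real instruction.

/* Why 16 banks hide a 68 ns bank-busy time behind an 8.5 ns clock:
 * consecutive word addresses map to consecutive banks, so a stride-1
 * stream returns to any given bank only after NBANKS cycles, by which
 * time that bank has recovered. */
#include <stdio.h>

#define NBANKS       16
#define CLOCK_NS     8.5
#define BANK_BUSY_NS 68.0

int bank_of(unsigned long addr) { return addr % NBANKS; }

int main(void) {
    /* cycles a bank needs between accesses (rounded up) */
    int busy_cycles = (int)(BANK_BUSY_NS / CLOCK_NS + 0.999);  /* = 8 */

    /* a stride-1 stream revisits a bank every NBANKS cycles */
    printf("revisit interval %d cycles, bank busy %d cycles -> %s\n",
           NBANKS, busy_cycles,
           NBANKS >= busy_cycles ? "full speed" : "stalls");
    return 0;
}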

This memory bank architecture also accounted for machines with identical CPUs but
different memory sizes having different peak speeds. It also explains why a memory
board failure could not be worked around by shrinking memory: in the above example,
removing a memory board would remove every 8th memory location, which is
impossible to code around. C90 systems had the ability to map in spare memory chips to
cover failing memory locations. Later T90s did have the ability to deconfigure half the
memory, or some CPUs, in the event of a failure.

The Cray-2

The Cray 2 sits on the Cray time line at a position after the XMP had become well
established but before the YMP range was delivered. It is however in a class of its own
for a number of reasons. The Cray 2 had a huge memory starting at 512 Mb and rising to
4 Gbytes, a size that was not matched by other production systems for a decade. The
system had a very small foot print sitting on a 1.35m diameter circle and rising to just
1.15m. This very compact arrangement was made possible by the other major innovation,
total immersion cooling. The processor case was filled in a circulating inert fluid that
cooled the boards allowing a much higher packing density than other arrangements.

Cray 2 highlights:

• One foreground and four background processors.
• 4.1 ns cycle (244 MHz).
• Up to 256 M words of memory.
• 64- or 128-way interleave, depending on configuration.
• Eliminates the spill registers in favor of a 16K word cache.
• The cache feeds all three computational components with a 4-cycle access time.
• Has 8 16-word instruction buffers.
• The foreground processor controls the I/O subsystem, which has up to 4 high-speed
communication channels (4 Gbit/s).

Massively Parallel Processing systems, T3d, T3e

During the late 1980s a variety of companies were researching and selling a
new class of machines that threatened to topple the supercomputing crown held by Cray
for so long. These machines derived their compute power from using large numbers of
smaller processors. Pioneered by Thinking Machines, Kendall Square, nCUBE, MasPar
and Meiko, these systems had begun to catch the eye of academics and scientists as
an alternative to the expensive and often over-subscribed Cray machines. At
first Cray reacted with scorn, emphasising how hard these machines were to program;
indeed, many problems simply will not subdivide enough to allow more than a handful of
CPUs to work efficiently in parallel. A whole new "message passing" programming
method was developed to overcome the communication and coordination problems
inherent in such a loosely bound architecture. Some people likened it to lifting a brick by
harnessing a sack full of wasps.
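
For flavour, here is a minimal message-passing example in modern MPI, the standard that grew out of the early, often proprietary, message-passing libraries of this era. The T3D itself predates MPI's dominance, so this is illustrative rather than period code.

/* Rank 0 sends one value to rank 1.  Run with e.g. mpirun -np 2 ./a.out */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv) {
    int rank;
    double value = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 3.14;
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %f\n", value);
    }

    MPI_Finalize();
    return 0;
}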

However, the writing was on the wall, and the requirement for MPP machines that could
grow in small increments, together with the programming techniques for utilising them,
forced a change of heart at Cray Research. In 1991 a three-stage programme was announced
that would, within five years, turn Cray Research into the leading, and as it turned out
only, MPP supercomputing vendor.

Target               Original plan    Actual delivery
300 GFlops max       1993             1993 (T3D)
1 TFlop peak         mid-90s          1996 (T3E/600)
1 TFlop sustained    1997             1998 (T3E/900)
The project was not without some pain along the way: essentially the company had to
reinvent all its crown jewels (compiler technology, operating system internals, the I/O
subsystem) and even get used to someone else's CPUs.

By joining the MPP market after the first round of machines, Cray was in a position to
learn from the mistakes of others. It convened a steering committee of computer scientists
and MPP users to set the design parameters for the MPP development programme.

The first fruit of the MPP development programme was the T3D. A DEC Alpha 21064
chip served as the core processor, surrounded by local memory and attached to a
high-speed CPU interconnection network. The T3D had no I/O capability of its own;
instead it was attached to, and hosted by, a YMP or C90 front-end. Using the VHISP
channels of a Model E IOS system, the T3D dumped its I/O load on the host at a
phenomenal rate that could swamp a smaller YMP.

Each IOS is shared between 4 compute nodes. Each compute node interconnects in three
bidirectional dimensions with its nearest neighbours. At the edges of the 3D cube of nodes,
each direction loops over to join the other side, thus placing each node on three
bidirectional network loops.
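
The "loops over to join the other side" behaviour is just modular arithmetic on node coordinates, as the illustrative C helper below shows. The 8x8x8 dimensions and the linear numbering scheme are assumptions for the example, not T3D specifics.

/* 3D torus addressing: each node (x,y,z) has two neighbours per
 * dimension; the modulo provides the wraparound at the edges. */
#include <stdio.h>

#define DX 8
#define DY 8
#define DZ 8

/* linear node id of the +1 neighbour of (x,y,z) in the x dimension */
int xplus_neighbor(int x, int y, int z) {
    int nx = (x + 1) % DX;          /* wraps from DX-1 back to 0 */
    return (nx * DY + y) * DZ + z;
}

int main(void) {
    printf("+x neighbour of (7,3,2) is node %d\n", xplus_neighbor(7, 3, 2));
    return 0;
}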

Cray Superserver systems

The range of Cray Superserver systems, designated APP, SMP and CS6400, started with
the acquisition of some of the assets of Floating Point Systems of Beaverton, Oregon in
1991. These machines ran a modified version of Sun Microsystems' Solaris OS, providing
scalability well beyond that of any available Sun equipment. Using a system of domains,
the machines, which could have up to 64 SuperSPARC (later UltraSPARC) CPUs at
60 MHz, could be reconfigured to appear as a group of smaller machines.

Cray never managed to sell very many of these systems despite their industry-leading
performance. When Cray was bought by SGI the whole project was sold to Sun
Microsystems, who developed the idea into the E10000 or "Starfire" range. A press
release early in 1999 announced the 1000th system sold. Unlike the vector
and MPP systems, the superservers could be split into separate independent domains to
provide resilience and failure-isolation capability.

Benchmark          24 CPU    32 CPU
SPECrate_int92     41,967    54,186
SPECrate_fp92      55,734    72,177

The CS6400 was available with 4..64 SuperSPARC CPUs and 256 Mbytes..16 Gbytes of
memory, with 1.8 Gbytes/s peak memory bandwidth. It could have over 5 terabytes
(Tbytes) of on-line disk storage.

Cray operating systems


The first operating system produced by Cray Research for its machines was the Cray
Operating System (COS). However, some customers chose to roll their own; one US
government lab wrote and used the Cray Time Sharing System (CTSS). COS ran on all
systems up to 1985 and continued on many machines for several years after that.

Unicos 1.0, known initially as CX-OS, was a Unix derivative developed in 1986 for the
Cray-2 architecture line. It was decided that it would be cheaper and faster to port Unix to
the new processor architecture than to port COS. Later, Unicos was made available on the
rest of the Cray line, probably from customer demand. There were also long-term
maintenance economics to consider: COS contained a lot of assembler, and it is easier to
maintain, port and extend the C code base that forms the heart of any Unix derivative.

Unicos shipped as source code plus binaries; the release 1.0 licensing note reads: "If
only the binary is licenced, the source will be kept under the control of the site analyst
who will build the product from source." Unicos 1.0 shipped with TCP/IP, C, Pascal,
Fortran, CAL (assembler), and the SCM and SCCS source control packages.

According to the Unicos 1.0 software release notice March 1986 "The Unicos operating
system can execute on the Cray-1M, Cray-1S, Cray XMP and Cray-2 computer systems.
Cray-1 systems must have an I/O Subsystem (IOS) to use Unicos."

Over the years, and 10 major releases, Unicos matured into a full mainframe-class Unix
with full resource (and user) control, multilevel security (Orange Book C2), a
comprehensive tape subsystem, high-performance math and science libraries
and Posix compliance. Along the way ISO networking, DCE, the TotalView debugger, a
GUI called XFilingManager and lots of performance-measuring tools put in appearances.
The file system technology (CFS and NC1FS) remained focused on performance and
scalability at a cost in flexibility. Multi-file-system disks and multi-disk file systems were
standard, but until the Model E IOS arrived any file system change required a reboot.

The Cray-3 utilised Unicos as did the YMP-EL and J90s but the Cray Superserver
systems used a modified version of Solaris.

The introduction of the MPP range with the T3D saw the start of major work in the
operating system area. The T3D was hosted by a Unicos mainframe but ran a microkernel
OS called Unicos MAX. For the T3D, all the physical I/O system calls were performed by
the host PVP system.

In a computer system with a modest number of CPUs, say 2..8, it is possible to
have all OS services provided by a single CPU time-slicing between the kernel and
user application work. As the number of CPUs in the system increases, say to 8..32, the
amount of OS service work grows past the point where it is possible to have just a
single service thread in the kernel. Unless the kernel is modified to handle separate OS
service requests in parallel, the system will lose efficiency by forcing processes on multiple
CPUs to wait until there is a service slot in the kernel CPU. Prior to Unicos 8, as much as
7% of a 16-CPU C90 system could be wasted waiting for OS requests. After the
introduction of the multithreaded Unicos 8.0, C90 systems were seen to spend as little as
2..3% of their time servicing OS requests. However, while this multithreading answer
allows many CPU threads to be active in the kernel, the kernel still has to exist on a single
CPU. In an MPP system, where there could be hundreds of CPUs demanding services from
the kernel, that kernel has to be able to run across multiple CPUs as well as execute
multiple threads. This requires a complete rethink of traditional operating system
implementation.

The solution provided in Unicos/mk was to slice the OS into a number of services, then
allow multiple instances of those services to run on separate CPUs. In one 850+ CPU
system there were 17 OS CPUs, with the disk, file system, process management, resource
control, network and tape servers split across them. The exact number of CPUs dedicated
to each OS task varied with the size, workload and configuration of the system, but was
typically in the ratio of one OS PE per 32 worker CPUs.

Practical Considerations in Supercomputer Design

To achieve such high speeds, high-power (i.e. hot) drivers are employed, signals are
detected with specialized analog circuits, conductors are all shielded and precisely tuned
in both impedance and length, and data is encoded with error-correcting codes so that
losses can be recovered.

In addition, the circuits are usually designed to operate in balanced mode so that there is
no change in power drawn as drivers switch: as one driver switches from low to high,
another switches from high to low, so the power supply sees a DC load and no switching
noise couples back into the logic via the power supply. Balanced signal lines can also
improve the signal-to-noise ratio by 6 dB, although they are not often used for that
purpose. In a design such as the Cray-1, roughly 40% of the transistors supposedly do
nothing but balance the power loading.

Even so, these machines dissipate large amounts of heat. The IBM 3090 uses special
thermal conduction modules in which a multichip substrate is mounted in a carrier with
built-in plumbing for a chilled-water jacket. CDC used a similar system in its designs,
and in one instance a maintenance crew pumped live steam through the building's air
conditioning system, which crossed over to the processor, with predictable results. This
raises the issue that these machines usually need thermal shutdown systems, and
possibly even fire suppression gear.

The Cray-1 series uses piped Freon, and each board has a copper sheet to conduct heat to
the edges of the cage, where Freon lines draw it away. The first Cray-1 was in fact
delayed six months by problems in the cooling system: lubricant that is normally
mixed with the Freon to keep the compressor running would leak through the seals as a
mist and eventually coat the boards with oil until they shorted out.
The Cray-2 is unique in that it uses a liquid bath to cool the processor boards. A special
nonconductive liquid (Fluorinert) is pumped through the system and the chips are
immersed in it.

Special fountains aerate the liquid, and reservoirs are provided for storing it when it is
pumped out for service. This is somewhat reminiscent of the oil cooling bath that was
sometimes used in magnetic core memory units.
