Workstation and System Configurations
Brian Bramer,
Faculty of Computing and Engineering Sciences
De Montfort University, Leicester, UK
Contents
1 Introduction
2 Performance requirements due to system and application software
2.1 Outline of a typical small to medium sized configuration
2.2 Operating system and system software requirements
2.2.1 Support for a Multi-Programming Environment
2.2.2 Support for Virtual Memory
2.2.3 Main Memory Requirements
2.2.4 Disk Requirements
2.3 Application Dependent Performance Factors
8 Conclusions
9 References
1 Introduction
When considering the acquisition of a computer system the first task undertaken is to carry out
a feasibility study. The concept of installing a new or upgrading an existing system is analysed
to determine cost effectiveness in terms of end-user requirements and advantages gained, e.g.
increased productivity of skilled staff, reduced product development times, a more viable
product, etc. The result of the feasibility study will be a report to be submitted to senior
management to request funds to implement the proposed system.
The feasibility study will generate system requirements not only in terms of software (to solve the
end-users' problems) but also hardware to support that software. The hardware requirements will
be in terms of computer processor power (do you need a £1000 office PC or a
£20000 professional workstation with real-time 3D graphics capability?), memory size
(do you need 32Mbytes or 256Mbytes of RAM?), disk space (even individual PC based
packages often need 1Gbyte each), network support (to communicate with servers or other
users), etc. In addition, many end-users often forget the requirements of the system software
(operating system, compilers, etc.). These notes consider hardware requirements to support
software and discuss what factors affect overall system performance.
4. User I/O interface which controls the display screen and the keyboard.
5. Input/output interface devices (for connecting external devices such as printers), e.g.
serial or parallel I/O interfaces.
In Fig 1 an information highway or bus system connects the various components of the system:
Address Bus carries the address of the memory location or I/O device being accessed
Data Bus carries the data being transferred between the CPU, memory and I/O devices
Control Bus carries the control signals between the CPU and the other components
of the system, e.g. signals to indicate when a valid address is on the address bus and if
data is to be read or written.
A multi-programming environment could well require a large portion of disk to be set aside for the swap area. For example, a typical
professional workstation running UNIX could require a swap area of between 200 and
500Mbytes depending upon application, and allowance must be made for this. In addition,
modern multiprogramming environments also support virtual memory.
2.2.2 Support for virtual memory
Over the past 40 years sophisticated large scale computer based applications (e.g. engineering
CAD) have always required more main memory than was physically available (or affordable) on
the computers of the time. To overcome this problem virtual memory techniques evolved in the
late 1960's (Denning 1970).
Virtual memory makes use of a phenomenon known as locality of reference in which memory
references of both instructions and data tend to cluster. Over short periods of time a significant
amount of:
(a) instruction execution is localized either within loops or heavily used subroutines, and
(b) data manipulation is on local variables or upon tables or arrays of information.
Most virtual memory systems use a technique called paging in which the program and data is
broken down into 'pages' (typical size 4Kbytes) which are held on disk. Pages are then brought
into main memory as required and 'swapped' out when main memory is full. This technique
allows program size to be much larger than the physical main memory size (typically a modern
professional workstation may have 64 to 512Mbytes of main memory but a virtual memory size
of 4Gbyte). As the number and/or size of concurrent programs increases a phenomenon known as
thrashing can occur, in which the system spends all its time swapping pages to and from disk and
doing nothing else. It is therefore important to configure sufficient physical memory even under
a virtual memory environment. This problem often becomes apparent over a period of time as
new releases of software (including the operating system) are mounted on a system. New
versions of software are always larger (sometimes two or three times) and users experience a
sudden reduction in response times and extended program run times. This often necessitates the
upgrading of main memory on existing systems every year or two.
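The effect is easy to demonstrate with a small simulation. The following sketch (Python; the page counts and frame counts are invented purely for illustration) applies a least-recently-used replacement policy, a common approximation to what paging systems do, and shows the page fault rate climbing steeply once a program's working set exceeds the physical page frames available:

    import random
    from collections import OrderedDict

    def fault_rate(working_set_pages, frames, accesses=20000):
        # Simulate LRU page replacement: return the fraction of memory
        # accesses which fault, i.e. require a page transfer from disk.
        resident = OrderedDict()                  # resident pages in LRU order
        faults = 0
        for _ in range(accesses):
            page = random.randrange(working_set_pages)
            if page in resident:
                resident.move_to_end(page)        # hit: now most recently used
            else:
                faults += 1                       # miss: fetch the page from disk
                if len(resident) >= frames:
                    resident.popitem(last=False)  # evict least recently used page
                resident[page] = None
        return faults / accesses

    # With 64 physical page frames the fault rate stays near zero until the
    # working set outgrows physical memory, then climbs steeply (thrashing).
    for pages in (32, 64, 128, 256):
        print(pages, "pages:", round(fault_rate(pages, frames=64), 2))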
2.2.3 Main memory requirements
Sufficient main memory is required to hold the operating system kernel (those functions
permanently in main memory) and those functions which will be loaded as required. If window
managers and/or network managers are also being used allowance should be made for their
requirements. Typically on a PC a simple command line operating system (e.g. MS-DOS)
required between 80 and 200Kbytes depending upon the functions loaded, while a more
sophisticated environment such as UNIX or Windows 2000 would require between 8 and
32Mbytes. The following are minimum recommendations for IBM PC compatible
microcomputer operating systems (large scale applications such as a large database could require
more):
MS-DOS                   5.8 Mbytes
plus CD-ROM driver       6.9 Mbytes
plus Windows 3.1        16.3 Mbytes
plus Win32S             18.5 Mbytes
plus Windows 95         41 Mbytes
One would then need to allow another 20 to 200Mbytes for swap space (depending upon
application). Other examples of PC operating system requirements are:
OS/2
Windows 98
Windows NT/2000
Some operating systems (e.g. certain versions of Linux) require swap space to be allocated when
the disk is initialized (by setting up a swap partition). Others (e.g. Windows 95/98) have a swap
file which extends and contracts as required (which will cause problems if the disk fills up!).
Typical disk requirements of common packages are:
Wordstar 7
Borland C++ 5
Visual C++ 2
Oracle
Java JDK1.2.2
Viewlogic CAD           800/1000 Mbytes
It is worth noting that although Java is not particularly large in disk requirements it needs
powerful processors and lots of memory to run complex Java applications using sophisticated
APIs, e.g. minimum Pentium 400 with 64/128Mbytes of memory. In a recent experiment Sun's
Java IDE Forte was mounted on a 5 year old DEC Alpha with 64Mbytes of memory and took 15
minutes to load!
Generally software houses or package sales documentation will provide guidance on processor
and memory requirements, e.g. so much memory and disk space for the base system plus so
much per user giving an immediate guide to the size of system required (one then needs to add
operating system requirements).
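This sizing arithmetic can be sketched directly (all figures below are invented placeholders, not vendor recommendations): total the base and per-user requirements of the package, then add the operating system and swap allowances discussed above:

    def required_config(users, base_mem, per_user_mem, os_mem,
                        base_disk, per_user_disk, os_disk, swap):
        # All figures in Mbytes; returns the (RAM, disk) to configure.
        ram = os_mem + base_mem + users * per_user_mem
        disk = os_disk + swap + base_disk + users * per_user_disk
        return ram, disk

    # A hypothetical 10-user database package.
    ram, disk = required_config(users=10, base_mem=32, per_user_mem=4, os_mem=24,
                                base_disk=500, per_user_disk=50, os_disk=200, swap=300)
    print("configure at least", ram, "Mbytes of RAM and", disk, "Mbytes of disk")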
clock (MHz)       8      10     12.5   16.67  25     33     50
68008             0.5    0.6    0.8    -      -      -      -
68000, 68010      0.65   0.8    1.1    1.3    -      -      -
68020             -      -      1.7    2.2    3.0    -      -
68030             -      -      -      -      5.0    6.0    12.0
68040             -      -      -      -      22.0   29.0   -

Table 1 Relative performance (in Mips) of the Motorola MC68000 family against clock speed
(figures are a guide - results depend on clock speed, memory access time, cache hit rate, etc.)
The Intel 80486DX2, 80486DX4 and Pentium processors have on-chip clock multipliers which
typically multiply the clock by two, three or four times, i.e. on-chip operations are performed at
two, three or four times the external clock speed making a particular improvement in processor
bound jobs. This has little effect on I/O bound jobs (e.g. a database server or a file server) where
a large data bus and fast I/O devices are more important.
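The benefit can be estimated with a simple weighted sum (a sketch in Python; the workload splits are made up for illustration). Only the fraction of the work performed on-chip speeds up, so a job which is mostly I/O sees little gain from a clock doubler:

    def speedup(cpu_fraction, clock_multiplier):
        # Overall speedup when only the CPU-bound fraction of the work
        # benefits from the faster internal clock (Amdahl's law).
        return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / clock_multiplier)

    print(round(speedup(0.9, 2), 2))   # processor bound job: about 1.8 times faster
    print(round(speedup(0.2, 2), 2))   # I/O bound job (e.g. file server): about 1.1 times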
3.2.3 Memory speed
Main memory speed should match the speed of the processor. A 25MHz MC68020 requires
faster (hence more expensive) memory than a 12.5MHz version. If necessary, memory attached
to a MC68020 can delay the processor on a memory read/write by using WAIT states, which
makes the processor idle for one or more clock periods and hence slows the overall execution
speed. A common tactic in the early 1990's was to build machines with a fast processor and clock
but with slow (and cheap) memory, e.g. the unwary could be caught by a machine advertised as
having a 25MHz CPU but which could execute programs slower than a 12.5MHz machine.
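The cost of wait states can be estimated directly, as in the following sketch (assuming, for illustration, a fixed three clock periods per memory access): each wait state adds one idle clock period to every access, so a fast processor with slow memory can indeed lose to a slower machine with matched memory:

    def memory_cycle_ns(clock_mhz, cycles_per_access, wait_states):
        # Time for one memory access: base cycles plus idle wait-state clocks.
        clock_period_ns = 1000.0 / clock_mhz
        return (cycles_per_access + wait_states) * clock_period_ns

    print(memory_cycle_ns(25.0, 3, 4))   # 25MHz CPU, 4 wait states: 280ns per access
    print(memory_cycle_ns(12.5, 3, 0))   # 12.5MHz CPU, no wait states: 240ns per access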
3.2.4 Address Bus size
The number of address lines determines the memory address space of a processor, i.e. both the
maximum amount of physical main memory which can be accessed (if fitted) and the maximum
logical memory size in a virtual memory environment. Therefore the address bus size affects
maximum program/data size and/or the amount of swapping and paging in a
multiprogramming/virtual memory environment. For example, 16 address lines can access a
maximum of 64Kbytes, 20 lines 1Mbyte, 24 lines 16Mbyte and 32 lines 4Gbyte.
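The relationship is simply 2 to the power of the number of address lines, which a two-line check confirms:

    # Maximum physical/logical address space for a given number of address lines.
    for lines in (16, 20, 24, 32):
        print(lines, "address lines ->", 2 ** lines, "bytes")
    # 16 -> 65536 (64K), 20 -> 1048576 (1M), 24 -> 16777216 (16M), 32 -> 4294967296 (4G)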
It must be noted that even though a processor has a particular address space this does not mean
that a computer system will be or can be fitted with the maximum amount. For example, a
processor with 32 address lines has an address space of 4Gbyte but typical 32-bit machines are
fitted with anything between 4Mbyte and 256Mbyte of physical memory. The 4Gbyte address
space becomes important under a virtual memory environment where very large programs can be
executed on machines with much smaller physical memory. In practice there is a maximum
amount of memory which can be fitted to a particular model of machine (determined by the
layout of the machine in terms of bus slots, physical space available, etc.). One of the major
differences between personal workstations and mini/mainframe computer systems is that the
latter can generally be fitted with much larger physical memory.
3.2.5 Data bus size
The width of the data bus determines how many memory read/write cycles are required to access
instructions/data and has a major effect on I/O bandwidth, e.g. if a processor has a 16-bit data
bus it will require two memory accesses to read a 32-bit number while a processor with a 32-bit
data bus would require a single access. A question often asked is why a multi-user minicomputer
can be up to ten times the cost of a personal workstation with similar processor performance. The
answer is that when purchasing minicomputers and mainframe systems one is buying, to a large
extent, I/O bandwidth and physical memory capacity. An example (from the mid 1980's) is the
comparison between an Apollo DN3000 workstation (based on a MC68020 12MHz
microprocessor) and the DEC VAX 8200 minicomputer:

                    Apollo DN3000   DEC VAX 8200
processor rating    1.2 Mips        1.2 Mips
I/O bandwidth       1 Mbyte/sec     13 Mbytes/sec
cost                £20,000         £200,000
The figures are order of magnitude guides but do give an indication of different areas of
application of the systems. The Apollo was a single user workstation used for highly interactive
computational tasks and the VAX was typically used by a number of concurrent users (e.g.
five to ten) to run tasks which are not heavy in computational terms but which require a system
capable of supporting the I/O of a number of users (e.g. multi-user databases, sales/stock control
packages, accounting packages, etc.)
Microprocessor manufacturer    address bus     maximum         data bus
& type                         size in bits    memory bytes    size in bits
Intel 8080                     16              64K             8
Zilog Z80                      16              64K             8
Motorola 6800                  16              64K             8
Intel 8088 (IBM/PC)            20              1M              8
Intel 8086 (IBM/PC XT)         20              1M              16
Motorola 68008                 20              1M              8
Motorola 68000, 68010          24              16M             16
Intel 80186, 80286             24              16M             16
Motorola 68020/30/40           32              4G              32
Intel 80386SX                  24              16M             16
Intel 80386DX                  32              4G              32
Intel 80486DX                  32              4G              32
Intel 80486SX                  32              4G              32
Intel 80486DX2                 32              4G              32
Intel 80486DX4                 32              4G              32
Intel Pentium                  32              4G              32/64 PCI

Table 2 Address bus size, maximum memory and data bus size of a range of microprocessors
The width of the data bus has a major effect on
performance, i.e. a 32-bit value can be accessed with a single memory read operation on a 32-bit
bus but requires two memory reads with a 16-bit bus. In practice the more powerful the
processor the larger the data and address busses.
The size of the address and data busses has a major impact on the overall cost of a system, i.e.
the larger the bus the more complex the interface circuits and the more 'wires' interconnecting
system components. Table 2 shows that there are versions of some processors with smaller data
and address busses, e.g. the Intel 80386SX is (from a programmer's viewpoint) internally
identical to the 80386DX but has a 24-bit address bus and a 16-bit external data bus (the
internal data bus is 32-bits). These are used to build low cost systems which are able to run
application programs written for the full processors (but with reduced performance).
Table 2a shows the Intel processors with address, data bus sizes (internal and external), internal
cache size, presence of internal co-processor and internal clock speed.
IBM PC compatibles:

processor   address bus    maximum        internal data   external data   internal cache   internal       internal
model       size in bits   memory bytes   bus in bits     bus in bits     in bytes         co-processor   clock
8088        20             1M             16              8               none             no             x1
8086        20             1M             16              16              none             no             x1
80286       24             16M            16              16              none             no             x1
80386DX     32             4G             32              32              none             no             x1
80386SX     24             16M            32              16              none             no             x1
80486DX     32             4G             32              32              8K               yes            x1
80486SX     32             4G             32              32              8K               no             x1
80486DX2    32             4G             32              32              8K               yes            x2
80486DX4    32             4G             32              32              16K              yes            x2 or x3
Pentium     32             4G             64              32/64 PCI       16K              yes            x4

Table 2a Characteristics of the Intel processors used in IBM PC compatibles
Data bus size
determines how many memory read/write cycles are required to access instructions/data
has a major effect on input/output bandwidth (important in file servers and database
servers)
Cache memory
a fast memory logically positioned between the processor and bus/main memory - can be
on chip (as in 80486) and/or external
Floating point co-processor
is important in real number calculations (twenty times speed up over normal CPU)
important in mathematical, scientific and engineering applications
Clock Speed
The clock times events within the computer - the higher the clock the faster the system
goes (assuming memory, bus, etc. match the speed)
Internal clock speed
the 80486DX2, 80486DX4 and Pentium processors contain clock
doublers/triplers/quadruplers, etc.
on-chip operations are performed at 2/3/4 times the external clock speed - external
operations are the same
Cache memory makes use of the locality of reference phenomenon described in section 2.2.2, in
which memory references tend to cluster. The cache is a fast memory (matched to CPU speed), typically between 4K and
256Kbytes in size, which is logically positioned between the processor and bus/main memory.
When the CPU requires a word (instruction or data) a check is made to see if it is in the cache
and if so it is delivered to the CPU. If it is not in the cache a block of main memory is fetched
into the cache and it is likely that future memory references will be to other words in the block
(typically a hit ratio of 75% or better can be achieved). Memory writes clearly have to be catered
for, as must the replacement of blocks when a new block is to be read in. Modern microprocessors
(Intel 80486 and Motorola MC68040) have separate on-chip instruction and data cache
memories - additional external caches may also be used, see Fig 2. Cache memory is particularly
important in RISC machines where the one instruction execution per cycle makes heavy
demands on main memory.
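The value of a cache can be summarized as an effective access time, as the following sketch shows (the 10ns cache and 80ns main memory timings are illustrative only): with a 75% hit ratio most references complete at cache speed:

    def effective_access_ns(hit_ratio, cache_ns, main_ns):
        # Average memory access time: hits are served at cache speed,
        # misses at main memory speed.
        return hit_ratio * cache_ns + (1.0 - hit_ratio) * main_ns

    for hit in (0.0, 0.75, 0.95):
        print("hit ratio", hit, "->", effective_access_ns(hit, 10.0, 80.0), "ns average")
    # 0.0 -> 80ns (no cache), 0.75 -> 27.5ns, 0.95 -> 13.5ns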
The concept of a cache has been extended to disk I/O. When a program requests a block or
blocks, several more are read into the cache where they are immediately available for future disk
access requests. Disk caches may take two forms:
Software disk cache
in which the operating system or disk driver maintains the cache in main memory, i.e.
using the main CPU of the system to carry out the caching operations.
Hardware disk cache
in which the disk interface contains its own cache RAM memory (typically 4 to
16Mbytes) and control circuits, i.e. the disk cache is independent of the main CPU.
Hardware disk caches are more effective but require a more complex (and expensive) disk
controller and tend to be used with fast disks in I/O bound applications, e.g. databases.
Fig 2 Showing CPU (with ALU, Control Unit and internal cache), external cache, RAM memory
and busses
2. The microprogram of the control unit becomes very complex and difficult to debug.
3. Studies of typical programs have shown that the majority of computation uses only a
small subset of the instruction set, i.e. a large percentage of the chip area allocated to the
processor is used very little. Table 3 (Tanenbaum 1990) presents the results of studies of
five programming languages (SAL is a Pascal-like language and XPL a PL/1-like
language) and presents the percentages of various statement types in a sample of
programs. It can be seen that assignments, IFs and procedure CALLs account for
typically 85% of program statements. Further analysis (Tanenbaum 1990) has shown that
80% of assignments are of the form variable:=value, 15% involve a single operator
(variable:=a+b) and only 5% of expressions involve two or more operators.
Statement     SAL   XPL   Fortran   C     Pascal   Average
Assignment    47    55    51        38    45       47
IF            17    17    10        43    29       23
CALL          25    17    5         12    15       15
LOOP          6     5     9         3     5        6
GOTO          0     1     9         3     0        3
other         5     5     16        1     6        7

Table 3 Percentages of various statement types in a sample of programs (Tanenbaum 1990)
CPU              Transistors   Design            Layout
                               (person-months)   (person-months)
RISC I           44,000        15                12
RISC II          41,000        18                12
MC68000          68,000        100               70
Z8000            18,000        60                70
Intel iAPX-432   110,000       170               90
Table 4 Design and layout effort for some microprocessors (Stallings 2000)
Graphics processor
to control the graphics display. This can range from a fairly simple graphics controller
chip which provides basic text, pixel and line drawing capabilities up to specialised
processors which support advanced graphics standards such as X windows.
Input/Output control processors
which carry out complex I/O tasks without the intervention of the CPU, e.g. network,
disk, intelligent terminal I/O, etc. For example, consider a sophisticated network where
the network communications and protocols are handled by a dedicated processor
(sometimes the network processor and associated circuits are more powerful and complex
than the main CPU of the system).
In a 'simple' system all the above tasks would be carried out by sequences of instructions
executed by the CPU. Implementing functions in specialised hardware has the following
advantages which enhance overall system performance:
(a) the specialised hardware can execute functions much faster than the equivalent
instruction sequence executed by the general purpose CPU; and
(b) it is often possible for the CPU to do other processing while a specialist processor is
carrying out a function (at the request of the CPU), e.g. overlapping a floating point
calculation with the execution of further instructions by the CPU (assuming the further
instructions are not dependent upon the result of the floating point calculation).
4.5.2 Multi-processors and Parallel Processors
John von Neuman in 1949 (Foster 1978, Tanenbaum 1990) developed EDSAC, the first
electronic stored program computer, in which a single CPU sent sequential requests over a bus to
memory for instructions and data. The vast majority of computer systems (CISC and RISC) built
since that time are essentially developments of the basic von Neuman machine.
One of the major limitations when increasing processor clock rate is the speed, approximately
20cm/nsec, at which the electrical signals travel around the system. Therefore to build a
computer with 1nsec instruction timing, signals must travel less than 20cm to and from memory.
Attempting to reduce signal path lengths by making systems very compact leads to cooling
problems which require large mainframe and supercomputers to have complex cooling systems
(often the downtime of such systems is not caused by failure of the computer but a fault in the
cooling system). In addition, many of the latest 32-bit microprocessors have experienced overheating problems. It therefore becomes harder and harder to make single processor systems go
faster and an alternative is to have a number of slower CPUs working together. In general
modern computer systems can be categorised as follows:
SISD (Single Instruction, Single Data): a single processor executing one instruction stream on one data stream.
SIMD (Single Instruction, Multiple Data): a single instruction stream applied in parallel to many data streams, e.g. array processors.
MIMD (Multiple Instruction, Multiple Data): a number of processors executing independent instruction streams on separate data, e.g. multi-processor and parallel systems.
The von Neumann machine is an SISD architecture in which some parallel processing is possible, e.g. by overlapping instruction fetch and execution.
components per chip:
SSI     2-64
MSI     64-2,000
LSI     2,000-64,000
VLSI    64,000-2,000,000
ULSI    2,000,000-64,000,000
6 System configurations
6.1 Personal computers, workstations, minis, distributed, etc.
In the late 1970s computer systems could be classified into microcomputers, minicomputers and
mainframe computers:
A microcomputer:
a single user computer system (cost £2000 to £5000) based on an 8-bit
microprocessor (Intel 8080, Zilog Z80, Motorola 6800). These were used for small
industrial (e.g. small control systems), office (e.g. word-processing, spreadsheets) and
program development (e.g. schools, colleges) applications.
A minicomputer:
a medium sized multi-user system (cost £20000 to £200000) used within a
department or a laboratory. Typically it would support 4 to 16 concurrent users depending
upon its size and area of application, e.g. CAD in a design office.
A mainframe computer:
a large multi-user computer system (cost £500000 upwards) used as the central
computer service of a large organization, e.g. Gas Board customer accounts. Large
organizations could have several mainframe and minicomputer systems, possibly on
different sites, linked by a communications network.
As technology advanced the classifications have become blurred and modern microcomputers
are as powerful as the minicomputers of ten years ago or the mainframes of twenty years ago.
Fig. 8 shows the rate of CPU performance growth since the 1960's (Hennessy & Jouppi 1991) as
measured by a general purpose benchmark such as SPEC (these trends still continue - see Fig. 3).
Microprocessor based systems have been increasing in performance by 1.5 to 2.5 times per year
during the past six to seven years whereas mini and mainframe improvement is about 25% per
year (Hennessy & Jouppi 1991). It must be emphasized that Fig. 8 only compares CPU
performance and no account is taken of other factors such as the larger I/O bandwidth and
memory capacity of mini and mainframe systems and the special applications which require
supercomputers.
Today system configurations may be summarized as PCs (personal computers), professional
workstations, multi-user mini/mainframe computers and distributed environments.
6.1.1 Personal computers
PC - Personal Computer:
a generic term for a small (relatively) personal microcomputer system (cost £500
to £5000) used for a wide range of relatively low-level computer applications (see
Table 6 for a summary of the features of a typical PC). The most common PCs are the
IBM PC and compatible machines (based on the Intel 8086/80286/80386/80486/Pentium
family of microprocessors).
Bus size:
Until the late 1980's the major factor which limited the overall performance of IBM PC
compatible computers was the widespread use of the 16 bit IBM PC/AT bus (the 16 bit
refers to the data bus size) developed in the mid 1980s to support the 80286 based IBM
PC/AT microcomputer. This bus system was widely accepted and became known as the
ISA bus (Industry Standard Architecture). Unfortunately, for faster 80386/80486
computer systems the ISA bus was very slow, having a maximum I/O bandwidth of 8
Mbytes/sec. This caused a severe I/O bottleneck within 80486 systems when accessing
disk controllers and video displays via the bus, see Fig 9.
Some IBM PC compatibles were available with the IBM Microchannel bus or the EISA
(Extended Industry Standard Architecture) bus, both of which are 32 bit bus systems
having I/O bandwidths of 20 to 30 Mbytes/sec or greater. An EISA bus machine,
however, could cost £500 to £1000 more than the equivalent ISA bus
system with corresponding increases in the cost of the I/O boards (typically two to three
times the cost of an equivalent ISA bus card). The EISA bus maintains compatibility with
ISA enabling existing ISA cards to be used with it.
The problem with EISA was that it made the PC quite expensive and this led to the
development of local busses which are cheaper and have similar or better performance.
There were two major contenders:
1. VESA a 32-bit local bus which was the first to appear
2. PCI a 32/64-bit local bus which is supported by Microsoft and Intel
Because VESA was the first to appear it became popular in the early/mid nineties. Since
that time PCI has taken over - mainly because it was supported by Microsoft and Intel
and could be used to support the Pentium which has a 64-bit data bus (Intel quote peak
bandwidths of 132Mbytes/sec). Early Pentium systems had a PCI local bus used for high
performance devices (video, disk, etc.) plus an ISA bus for slower devices (serial and
parallel I/O, etc.), see Fig. 10. Many of today's Pentium systems do not have ISA bus
slots, which can cause problems if one wishes to interface with old devices, e.g. specialist
hardware boards.
PCI bus The original PCI bus was rated at 32 bits at 33MHz giving a maximum throughput
of 132Mbytes per second. Since then PCI-2 has appeared, rated at 32/64 bits at 66MHz
giving a maximum throughput of 528Mbytes per second. Unfortunately the PCI bus is
now quite dated and is becoming a performance bottleneck in modern Pentium systems -
see http://www.intel.com/network/performance_brief/pc_bus.htm and
http://www.pcguide.com/ref/mbsys/buses/func.htm for a discussion of PC busses.
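These peak figures follow directly from the bus width and clock, as a quick sketch reproduces (assuming one transfer per clock cycle):

    def peak_bandwidth_mbytes(bus_bits, clock_mhz):
        # Peak throughput in Mbytes/sec: bus width in bytes times transfer
        # rate, assuming one transfer per clock.
        return (bus_bits / 8) * clock_mhz

    print(peak_bandwidth_mbytes(32, 33))   # original PCI: 132 Mbytes/sec
    print(peak_bandwidth_mbytes(64, 66))   # PCI-2 at 64 bits and 66MHz: 528 Mbytes/sec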
For example, many Pentium motherboards are also equipped with an AGP (Accelerated
Graphics Port) which was developed to support high performance graphics cards for 2D
and 3D applications - see http://developer.intel.com/technology/agp/tutorial/,
http://agpforum.org/ and http://www.pcguide.com/ref/mbsys/buses/types/agp.htm
Display
The main problem with running sophisticated graphics applications on a PC is that the screen
quality in terms of addressable pixels and physical size is deficient:
1. PC VGA graphics is only 640*480 pixels (compared with a workstation 'norm' of greater
than 1000*1000 pixels). The super VGA graphics (1024*768 pixels by 256 colours) of
modern PCs is much better.
2. Screen updating can be very slow (relative to a workstation) on a machine with an ISA
bus (see discussion above on PC bus systems).
3. Cheaper PCs sometimes use an interlaced display to reduce overall system cost, i.e.:
Non-interlaced display: every line of pixels is displayed 50 or 60 times per second.
Interlaced display: alternate lines of pixels are displayed 25 or 30 times per second thus
horizontal lines of one pixel thickness flicker.
4. The physical screen size of a PC is typically 14/15/17 inches against the workstation
norm of 19/21 inches.
Operating system
The most common operating system of IBM PC compatibles is generally some variety of
Windows (95/98/NT/2000). Although these are adequate for many application environments,
UNIX is still preferred for high-performance, robust application areas.
6.1.2 Professional workstations
Professional workstation: a generic term applied to the (relatively) high powered personal
computing systems, operating in a distributed environment, evolved by Apollo (Nelson & Leach
1984) and Sun (Pratt 1984) in the early 1980's. The main advantages of professional
workstations over PCs are:
a. Computing power is an order of magnitude higher: the early machines were based
on the Motorola MC68000 family of microprocessors, today the tendency is to
use RISC based architectures. Main memory and disk sizes are correspondingly
higher.
b. Bus system (Stallings 2000): in the past professional workstations used 32 bit bus
systems, e.g. VME with an I/O bandwidth of 40 Mbytes/sec. Modern
workstations have moved to 64 bit or greater buses, or to independent memory and
I/O bus systems.
c. UNIX operating system: the de facto industry standard for medium sized
computer systems.
d. Integrated environment: the workstations are designed to operate in a
sophisticated multiprogramming networked distributed environment. The
operating system is integrated with the window manager, network file system, etc.
e. Multiprogramming/virtual memory operating system: the workstations are
designed to run large highly interactive computational tasks requiring a
sophisticated environment.
f. High quality display screen: a large high quality non-interlaced display with
mouse input is used to interact with the window managed multiprogramming
environment.
A modern high-performance PC, equipped with a high-performance graphics card and high quality
display, can compete with low end workstations (at similar cost). More specialised applications
such as real-time 3D graphics still require professional workstations.
6.1.3 Multi-user minicomputer and mainframe computer systems.
The terms mini and mainframe are becoming very blurred but in general refer to large multi-user
configurations with good I/O bandwidth and main memory capacity, i.e. a number of users (100
to 100000) concurrently running relatively straightforward applications on a common computer
system. High powered multi-user systems typically have an I/O bandwidth and physical memory
capacity at least an order of magnitude greater than PCs and workstations of similar CPU power.
Such multi-user environments may be accessed via a network by X terminals, PCs or
professional workstations. PCs or professional workstations can be used as multi-user machines
so long as the amount of concurrent I/O does not reduce overall performance.
2. The number of user workstations, their distribution over the network(s) together with
support fileservers and nodal processors.
3. The size of main memory and disks on the user workstations. The network traffic can be
reduced if the operating system and commonly used software is held on local disks
(needs careful management of new software releases). A cheaper alternative used to be to
have a small disk on the user workstation which held the operating system swap space so
at least the operating system did not have to page over the network (also see diskless
nodes below).
4. The number of fileservers and their power in terms of processor performance, main
memory size and disk I/O performance. The distribution of software packages and user
files around the fileservers is critical:
(a) complex intensive centralized tasks could well require a dedicated fileserver, e.g.
an advanced database environment or the analysis of large engineering structures
using finite element mesh techniques;
(b) spreading the end-user files around the fileservers prevents overloading of
particular fileservers (and if a fileserver breaks down some users can still do their
work).
5. The number (if any) of diskless nodes. One of the general rules when purchasing disks is
the larger the disk the less the cost per byte. One way to reduce costs in a distributed
system is to equip selected machines with large disks which then act as 'hosts' to a number
of diskless nodes. On start up a diskless node 'boots' the operating system over the
network from the 'host' and carries out all disk access over the network. In practice the
'host' machines may be other user workstations or fileservers. An additional advantage
was that management of the overall system was made easier with less disks to maintain
and update. Diskless systems work provided the following factors are taken into
consideration:
(a) The network does not become overloaded with traffic from too many diskless
nodes;
(b) The ratio of diskless to disked host nodes does not become too high, i.e. placing
excessive load on the hosts. In practice a ratio of three to one gives reasonable
performance, however, systems with ratios of ten to one (or greater) have been
implemented with correspondingly poor performance.
(c) There is sufficient main memory in the diskless nodes such that excessive
swapping/paging (over the network) does not occur in a multiprogramming/virtual
memory environment. Sudden degradations in performance can often be observed
when new software releases cause the problem of excessive paging as programs
increase in size.
(d) The network speed is sufficiently high to cope with the overall demands of the
workstations. Until the late 1980's this was not a problem, with typical network
speeds of 10Mbit/sec and typical professional workstations having a power of 1 to 5
Mips. However, modern machines make diskless nodes impossible without very fast
networks.
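A rough feasibility check for a proposed diskless configuration (a sketch only; the per-node traffic figure and the usable fraction of the raw network bandwidth are invented for illustration) is to compare the aggregate paging and file traffic of the diskless nodes against the network capacity, alongside the diskless-to-host ratio guideline above:

    def diskless_check(nodes, hosts, mbit_per_node, network_mbit, usable=0.6):
        # Returns estimated network utilisation (of the usable bandwidth)
        # and the diskless-to-host ratio.
        utilisation = nodes * mbit_per_node / (network_mbit * usable)
        return utilisation, nodes / hosts

    util, ratio = diskless_check(nodes=9, hosts=3, mbit_per_node=0.5, network_mbit=10)
    print("network utilisation:", round(util * 100), "%  ratio:", ratio, "to 1")
    # 9 nodes at 0.5Mbit/sec each on 10Mbit/sec Ethernet: 75% of usable
    # capacity and a 3:1 ratio - workable, but with little headroom.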
Clearly great care is needed in configuring a distributed environment; a slight error can give
the impression of 'clockwork' powered machines. Common problems (often due to lack of funds)
are:
1. too few fileservers for the number of user workstations and/or poor distribution of
fileservers across the network;
2. too little main memory on fileservers causing bottlenecks in the accessing of centralized
file systems;
3. too high a ratio of diskless to disked nodes and/or too little main memory in diskless nodes.
8 Conclusions
This paper reviewed a range of issues critical in system performance evaluation:
1. The effect of system and end-user software on the overall requirements of a computer
system.
2. Factors affecting overall system performance in terms of CPU power, memory size, data
bus size, etc.
3. The techniques used to improve processor performance and how modern integrated
circuits have enabled these to be implemented in low to medium cost systems.
4. The range of system configurations (PCs, workstations, multi-user, distributed) with
particular attention to factors which are critical in a distributed system.
9 References
Bramer, B, 1989, 'Selection of computer systems to meet end-user requirements', IEEE
Computer Aided Engineering Journal, Vol. 6 No. 2, April, pp. 52-58.
Denning, P J, 1970, 'Virtual memory', ACM Computing Surveys, Vol. 2 No. 3, September.
Foster, C C, 1976, 'Computer Architecture', Van Nostrand Reinhold.
Gelsinger, P P, Gargini, P A, Parker, G H, & Yu, A Y C, 1989, 'Microprocessors circa 2000', IEEE
Spectrum, Vol. 26 No. 10, October, pp 43-47.
Hennessy, J L & Jouppi, N P, 1991, 'Computer technology and architecture: an evolving
interaction', IEEE Computer, Vol. 24, No. 9, September, pp 18-28.
Nelson, D L & Leach, P J, 1984, 'The Architecture and Applications of the Apollo Domain',
IEEE CG&A, April, pp 58-66.
Pratt, V R, 1984, 'Standards and Performance Issues in the Workstations Market', IEEE CG&A,
April, pp 71-76.
Stallings, W, 2000, 'Computer organization and architecture', Fifth Edition, Prentice Hall, ISBN
0-130085263-5.
Tanenbaum, A S, 1990, 'Structured Computer Organisation', Prentice-Hall.