Workstation and System Configurations
Brian Bramer,
Faculty of Computing and Engineering Sciences
De Montfort University, Leicester, UK
Contents
1 Introduction
2 Performance requirements due to system and application software
2.1 Outline of a typical small to medium sized configuration
2.2 Operating system and system software requirements
2.2.1 Support for a Multi-Programming Environment
2.2.2 Support for Virtual Memory
2.2.3 Main Memory Requirements
2.2.4 Disk Requirements
2.3 Application Dependent Performance Factors
8 Conclusions
9 References
1 Introduction
When considering the acquisition of a computer system the first task undertaken is to carry out
a feasibility study. The concept of installing a new or upgrading an existing system is analysed
to determine cost effectiveness in terms of end-user requirements and advantages gained, e.g.
increased productivity of skilled staff, reduced product development times, a more viable
product, etc. The result of the feasibility study will be a report to be submitted to senior
management to request funds to implement the proposed system.
The feasibility study will generate system requirements not only in terms of software (to solve the
end-users' problems) but also hardware to support that software. The hardware requirements will
be in terms of computer processor power (do you need a £1000 office PC or a
£20000 professional workstation with real-time 3D graphics capability?), memory size
(do you need 32Mbytes or 256Mbytes of RAM?), disk space (even individual PC based
packages often need 1Gbyte each), network support (to communicate with servers or other
users), etc. In addition, many end-users often forget the requirements of the system software
(operating system, compilers, etc.). These notes consider hardware requirements to support
software and discuss what factors affect overall system performance.
4. User I/O interface which controls the display screen and the keyboard.
5. Input/output interface devices (for connecting external devices such as printers), e.g.
serial or parallel I/O interfaces.
In Fig 1 an information highway or bus system connects the various components of the system:
Address Bus carries the address of the memory location or I/O device being accessed
Data Bus carries the data being transferred between the CPU, memory and I/O devices
Control Bus carries the control signals between the CPU and the other components
of the system, e.g. signals to indicate when a valid address is on the address bus and if
data is to be read or written.
A multi-programming environment could well require a large portion of disk to be set aside for the swap area. For example, a typical
professional workstation running UNIX could require a swap area of between 200 and
500Mbytes depending upon application, and allowance must be made for this. In addition,
modern multiprogramming environments also support virtual memory.
2.2.2 Support for virtual memory
Over the past 40 years sophisticated large scale computer based applications (e.g. engineering
CAD) have always required more main memory than was physically available (or affordable) on
the computers of the time. To overcome this problem virtual memory techniques evolved in the
late 1960's (Denning 1970).
Virtual memory makes use of a phenomenon known as locality of reference in which memory
references of both instructions and data tend to cluster. Over short periods of time a significant
amount of:
(a) instruction execution is localized either within loops or heavily used subroutines, and
(b) data manipulation is on local variables or upon tables or arrays of information.
Most virtual memory systems use a technique called paging in which the program and data is
broken down into 'pages' (typical size 4Kbytes) which are held on disk. Pages are then brought
into main memory as required and 'swapped' out when main memory is full. This technique
allows program size to be much larger than the physical main memory size (typically a modern
professional workstation may have 64 to 512Mbytes of main memory but a virtual memory size
of 4Gbyte). As the number and/or size of concurrent programs increases a phenomenon known as
thrashing can occur, in which the system spends all its time swapping pages to and from disk and
doing nothing else. It is therefore important to configure sufficient physical memory even under
a virtual memory environment. This problem often becomes apparent over a period of time as
new releases of software (including the operating system) are mounted on a system. New
versions of software are always larger (sometimes two or three times) and users experience a
sudden reduction in response times and extended program run times. This often necessitates the
upgrading of main memory on existing systems every year or two.
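The effect is easy to demonstrate with a small simulation. The following sketch (Python; the page counts and frame counts are invented purely for illustration) applies a least-recently-used replacement policy, a common approximation to what paging systems do, and shows the page fault rate climbing steeply once a program's working set exceeds the physical page frames available:

    import random
    from collections import OrderedDict

    def fault_rate(working_set_pages, frames, accesses=20000):
        # Simulate LRU page replacement: return the fraction of memory
        # accesses which fault, i.e. require a page transfer from disk.
        resident = OrderedDict()                  # resident pages in LRU order
        faults = 0
        for _ in range(accesses):
            page = random.randrange(working_set_pages)
            if page in resident:
                resident.move_to_end(page)        # hit: now most recently used
            else:
                faults += 1                       # miss: fetch the page from disk
                if len(resident) >= frames:
                    resident.popitem(last=False)  # evict least recently used page
                resident[page] = None
        return faults / accesses

    # With 64 physical page frames the fault rate stays near zero until the
    # working set outgrows physical memory, then climbs steeply (thrashing).
    for pages in (32, 64, 128, 256):
        print(pages, "pages:", round(fault_rate(pages, frames=64), 2))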
2.2.3 Main memory requirements
Sufficient main memory is required to hold the operating system kernel (those functions
permanently in main memory) and those functions which will be loaded as required. If window
managers and/or network managers are also being used allowance should be made for their
requirements. Typically on a PC a simple command line operating system (e.g. MS-DOS)
required between 80 and 200Kbytes depending upon the functions loaded, while a more
sophisticated environment such as UNIX or Windows 2000 would require between 8 and
32Mbytes. The following are minimum recommendations for IBM PC compatible
microcomputer operating systems (large scale applications such as a large database could require
more):
MS-DOS                   5.8 Mbytes
plus CD-ROM driver       6.9 Mbytes
plus Windows 3.1        16.3 Mbytes
plus Win32S             18.5 Mbytes
plus Windows 95         41 Mbytes
One would then need to allow another 20 to 200Mbytes for swap space (depending upon
application). Other examples of PC operating system requirements are:
OS/2
Windows 98
Windows NT/2000
Some operating systems (e.g. certain versions of Linux) require swap space to be allocated when
the disk is initialized (by setting up a swap partition). Others (e.g. Windows 95/98) have a swap
file which extends and contracts as required (which will cause problems if the disk fills up!).
Typical disk requirements of common packages are:
Wordstar 7
Borland C++ 5
Visual C++ 2
Oracle
Java JDK1.2.2
Viewlogic CAD           800/1000 Mbytes
It is worth noting that although Java is not particularly large in disk requirements it needs
powerful processors and lots of memory to run complex Java applications using sophisticated
APIs, e.g. minimum Pentium 400 with 64/128Mbytes of memory. In a recent experiment Sun's
Java IDE Forte was mounted on a 5 year old DEC Alpha with 64Mbytes of memory and took 15
minutes to load!
Generally software houses or package sales documentation will provide guidance on processor
and memory requirements, e.g. so much memory and disk space for the base system plus so
much per user giving an immediate guide to the size of system required (one then needs to add
operating system requirements).
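This sizing arithmetic can be sketched directly (all figures below are invented placeholders, not vendor recommendations): total the base and per-user requirements of the package, then add the operating system and swap allowances discussed above:

    def required_config(users, base_mem, per_user_mem, os_mem,
                        base_disk, per_user_disk, os_disk, swap):
        # All figures in Mbytes; returns the (RAM, disk) to configure.
        ram = os_mem + base_mem + users * per_user_mem
        disk = os_disk + swap + base_disk + users * per_user_disk
        return ram, disk

    # A hypothetical 10-user database package.
    ram, disk = required_config(users=10, base_mem=32, per_user_mem=4, os_mem=24,
                                base_disk=500, per_user_disk=50, os_disk=200, swap=300)
    print("configure at least", ram, "Mbytes of RAM and", disk, "Mbytes of disk")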
clock (MHz)       8      10     12.5   16.67  25     33     50
68008             0.5    0.6    0.8    -      -      -      -
68000, 68010      0.65   0.8    1.1    1.3    -      -      -
68020             -      -      1.7    2.2    3.0    -      -
68030             -      -      -      -      5.0    6.0    12.0
68040             -      -      -      -      22.0   29.0   -

Table 1 Relative performance (in Mips) of the Motorola MC68000 family against clock speed
(figures are a guide - results depend on clock speed, memory access time, cache hit rate, etc.)
The Intel 80486DX2, 80486DX4 and Pentium processors have on-chip clock multipliers which
typically multiply the clock by two, three or four times, i.e. on-chip operations are performed at
two, three or four times the external clock speed making a particular improvement in processor
bound jobs. This has little effect on I/O bound jobs (e.g. a database server or a file server) where
a large data bus and fast I/O devices are more important.
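The benefit can be estimated with a simple weighted sum (a sketch in Python; the workload splits are made up for illustration). Only the fraction of the work performed on-chip speeds up, so a job which is mostly I/O sees little gain from a clock doubler:

    def speedup(cpu_fraction, clock_multiplier):
        # Overall speedup when only the CPU-bound fraction of the work
        # benefits from the faster internal clock (Amdahl's law).
        return 1.0 / ((1.0 - cpu_fraction) + cpu_fraction / clock_multiplier)

    print(round(speedup(0.9, 2), 2))   # processor bound job: about 1.8 times faster
    print(round(speedup(0.2, 2), 2))   # I/O bound job (e.g. file server): about 1.1 times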
3.2.3 Memory speed
Main memory speed should match the speed of the processor. A 25MHz MC68020 requires
faster (hence more expensive) memory than a 12.5MHz version. If necessary, memory attached
to a MC68020 can delay the processor on a memory read/write by using WAIT states, which
makes the processor idle for one or more clock periods and hence slows the overall execution
speed. A common tactic in the early 1990's was to build machines with a fast processor and clock
but with slow (and cheap) memory, e.g. the unwary could be caught by a machine advertised as
having a 25MHz CPU but which could execute programs slower than a 12.5MHz machine.
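The cost of wait states can be estimated directly, as in the following sketch (assuming, for illustration, a fixed three clock periods per memory access): each wait state adds one idle clock period to every access, so a fast processor with slow memory can indeed lose to a slower machine with matched memory:

    def memory_cycle_ns(clock_mhz, cycles_per_access, wait_states):
        # Time for one memory access: base cycles plus idle wait-state clocks.
        clock_period_ns = 1000.0 / clock_mhz
        return (cycles_per_access + wait_states) * clock_period_ns

    print(memory_cycle_ns(25.0, 3, 4))   # 25MHz CPU, 4 wait states: 280ns per access
    print(memory_cycle_ns(12.5, 3, 0))   # 12.5MHz CPU, no wait states: 240ns per access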
3.2.4 Address Bus size
The number of address lines determines the memory address space of a processor, i.e. both the
maximum amount of physical main memory which can be accessed (if fitted) and the maximum
logical memory size in a virtual memory environment. Therefore the address bus size affects
maximum program/data size and/or the amount of swapping and paging in a
multiprogramming/virtual memory environment. For example, 16 address lines can access a
maximum of 64Kbytes, 20 lines 1Mbyte, 24 lines 16Mbyte and 32 lines 4Gbyte.
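The relationship is simply 2 to the power of the number of address lines, which a two-line check confirms:

    # Maximum physical/logical address space for a given number of address lines.
    for lines in (16, 20, 24, 32):
        print(lines, "address lines ->", 2 ** lines, "bytes")
    # 16 -> 65536 (64K), 20 -> 1048576 (1M), 24 -> 16777216 (16M), 32 -> 4294967296 (4G)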
It must be noted that even though a processor has a particular address space this does not mean
that a computer system will be or can be fitted with the maximum amount. For example, a
processor with 32 address lines has an address space of 4Gbyte but typical 32-bit machines are
fitted with anything between 4Mbyte and 256Mbyte of physical memory. The 4Gbyte address
space becomes important under a virtual memory environment where very large programs can be
executed on machines with much smaller physical memory. In practice there is a maximum
amount of memory which can be fitted to a particular model of machine (determined by the
layout of the machine in terms of bus slots, physical space available, etc.). One of the major
differences between personal workstations and mini/mainframe computer systems is that the
latter can generally be fitted with much larger physical memory.
3.2.5 Data bus size
The width of the data bus determines how many memory read/write cycles are required to access
instructions/data and has a major effect on I/O bandwidth, e.g. if a processor has a 16-bit data
bus it will require two memory accesses to read a 32-bit number while a processor with a 32-bit
data bus would require a single access. A question often asked is why a multi-user minicomputer
can be up to ten times the cost of a personal workstation with similar processor performance. The
answer is that when purchasing minicomputers and mainframe systems one is buying, to a large
extent, I/O bandwidth and physical memory capacity. An example (from the mid 1980's) is the
comparison between an Apollo DN3000 workstation (based on a MC68020 12MHz
microprocessor) and the DEC VAX 8200 minicomputer:

                    Apollo DN3000   DEC VAX 8200
processor rating    1.2 Mips        1.2 Mips
I/O bandwidth       1 Mbyte/sec     13 Mbytes/sec
cost                £20,000         £200,000
The figures are order of magnitude guides but do give an indication of different areas of
application of the systems. The Apollo was a single user workstation used for highly interactive
computational tasks and the VAX was typically used by a number of concurrent users (e.g.
five to ten) to run tasks which are not heavy in computational terms but which require a system
capable of supporting the I/O of a number of users (e.g. multi-user databases, sales/stock control
packages, accounting packages, etc.)
Microprocessor manufacturer    address bus     maximum         data bus
& type                         size in bits    memory bytes    size in bits
Intel 8080                     16              64K             8
Zilog Z80                      16              64K             8
Motorola 6800                  16              64K             8
Intel 8088 (IBM/PC)            20              1M              8
Intel 8086 (IBM/PC XT)         20              1M              16
Motorola 68008                 20              1M              8
Motorola 68000, 68010          24              16M             16
Intel 80186, 80286             24              16M             16
Motorola 68020/30/40           32              4G              32
Intel 80386SX                  24              16M             16
Intel 80386DX                  32              4G              32
Intel 80486DX                  32              4G              32
Intel 80486SX                  32              4G              32
Intel 80486DX2                 32              4G              32
Intel 80486DX4                 32              4G              32
Intel Pentium                  32              4G              32/64 PCI

Table 2 Address bus size, maximum memory and data bus size of a range of microprocessors
The width of the data bus has a major effect on
performance, i.e. a 32-bit value can be accessed with a single memory read operation on a 32-bit
bus but requires two memory reads with a 16-bit bus. In practice the more powerful the
processor the larger the data and address busses.
The size of the address and data busses has a major impact on the overall cost of a system, i.e.
the larger the bus the more complex the interface circuits and the more 'wires' interconnecting
system components. Table 2 shows that there are versions of some processors with smaller data
and address busses, e.g. the Intel 80386SX is (from a programmer's viewpoint) internally
identical to the 80386DX but has a 24-bit address bus and a 16-bit external data bus (the
internal data bus is 32-bits). These are used to build low cost systems which are able to run
application programs written for the full processors (but with reduced performance).
Table 2a shows the Intel processors with address, data bus sizes (internal and external), internal
cache size, presence of internal co-processor and internal clock speed.
IBM PC compatibles:

processor   address bus    maximum        internal data   external data   internal cache   internal       internal
model       size in bits   memory bytes   bus in bits     bus in bits     in bytes         co-processor   clock
8088        20             1M             16              8               none             no             x1
8086        20             1M             16              16              none             no             x1
80286       24             16M            16              16              none             no             x1
80386DX     32             4G             32              32              none             no             x1
80386SX     24             16M            32              16              none             no             x1
80486DX     32             4G             32              32              8K               yes            x1
80486SX     32             4G             32              32              8K               no             x1
80486DX2    32             4G             32              32              8K               yes            x2
80486DX4    32             4G             32              32              16K              yes            x2 or x3
Pentium     32             4G             64              32/64 PCI       16K              yes            x4

Table 2a Characteristics of the Intel processors used in IBM PC compatibles
Data bus size
determines how many memory read/write cycles are required to access instructions/data
has a major effect on input/output bandwidth (important in file servers and database
servers)
Cache memory
a fast memory logically positioned between the processor and bus/main memory - can be
on chip (as in 80486) and/or external
Floating point co-processor
is important in real number calculations (twenty times speed up over normal CPU)
important in mathematical, scientific and engineering applications
Clock Speed
The clock times events within the computer - the higher the clock the faster the system
goes (assuming memory, bus, etc. match the speed)
Internal clock speed
the 80486DX2, 80486DX4 and Pentium processors contain clock
doublers/triplers/quadruplers, etc.
on-chip operations are performed at 2/3/4 times the external clock speed - external
operations are the same
Cache memory makes use of the locality of reference phenomenon described in section 2.2.2, in
which memory references tend to cluster. The cache is a fast memory (matched to CPU speed), typically between 4K and
256Kbytes in size, which is logically positioned between the processor and bus/main memory.
When the CPU requires a word (instruction or data) a check is made to see if it is in the cache
and if so it is delivered to the CPU. If it is not in the cache a block of main memory is fetched
into the cache and it is likely that future memory references will be to other words in the block
(typically a hit ratio of 75% or better can be achieved). Memory writes clearly have to be catered
for, as must the replacement of blocks when a new block is to be read in. Modern microprocessors
(Intel 80486 and Motorola MC68040) have separate on-chip instruction and data cache
memories - additional external caches may also be used, see Fig 2. Cache memory is particularly
important in RISC machines where the one instruction execution per cycle makes heavy
demands on main memory.
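The value of a cache can be summarized as an effective access time, as the following sketch shows (the 10ns cache and 80ns main memory timings are illustrative only): with a 75% hit ratio most references complete at cache speed:

    def effective_access_ns(hit_ratio, cache_ns, main_ns):
        # Average memory access time: hits are served at cache speed,
        # misses at main memory speed.
        return hit_ratio * cache_ns + (1.0 - hit_ratio) * main_ns

    for hit in (0.0, 0.75, 0.95):
        print("hit ratio", hit, "->", effective_access_ns(hit, 10.0, 80.0), "ns average")
    # 0.0 -> 80ns (no cache), 0.75 -> 27.5ns, 0.95 -> 13.5ns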
The concept of a cache has been extended to disk I/O. When a program requests a block or
blocks, several more are read into the cache where they are immediately available for future disk
access requests. Disk caches may take two forms:
Software disk cache
in which the operating system or disk driver maintains the cache in main memory, i.e.
using the main CPU of the system to carry out the caching operations.
Hardware disk cache
in which the disk interface contains its own cache RAM memory (typically 4 to
16Mbytes) and control circuits, i.e. the disk cache is independent of the main CPU.
Hardware disk caches are more effective but require a more complex (and expensive) disk
controller and tend to be used with fast disks in I/O bound applications, e.g. databases.
Fig 2 Showing CPU (with ALU, Control Unit and internal cache), external cache, RAM memory
and busses
2. The microprogram of the control unit becomes very complex and difficult to debug.
3. Studies of typical programs have shown that the majority of computation uses only a
small subset of the instruction set, i.e. a large percentage of the chip area allocated to the
processor is used very little. Table 3 (Tanenbaum 1990) presents the results of studies of
five programming languages (SAL is a Pascal-like language and XPL a PL/1-like
language) and presents the percentages of various statement types in a sample of
programs. It can be seen that assignments, IFs and procedure CALLs account for
typically 85% of program statements. Further analysis (Tanenbaum 1990) has shown that
80% of assignments are of the form variable:=value, 15% involve a single operator
(variable:=a+b) and only 5% of expressions involve two or more operators.
Statement     SAL   XPL   Fortran   C     Pascal   Average
Assignment    47    55    51        38    45       47
IF            17    17    10        43    29       23
CALL          25    17    5         12    15       15
LOOP          6     5     9         3     5        6
GOTO          0     1     9         3     0        3
other         5     5     16        1     6        7

Table 3 Percentages of various statement types in a sample of programs (Tanenbaum 1990)
CPU              Transistors   Design            Layout
                               (person-months)   (person-months)
RISC I           44,000        15                12
RISC II          41,000        18                12
MC68000          68,000        100               70
Z8000            18,000        60                70
Intel iAPX-432   110,000       170               90
Table 4 Design and layout effort for some microprocessors (Stallings 2000)
Graphics processor
to control the graphics display. This can range from a fairly simple graphics controller
chip which provides basic text, pixel and line drawing capabilities up to specialised
processors which support advanced graphics standards such as X windows.
Input/Output control processors
which carry out complex I/O tasks without the intervention of the CPU, e.g. network,
disk, intelligent terminal I/O, etc. For example, consider a sophisticated network where
the network communications and protocols are handled by a dedicated processor
(sometimes the network processor and associated circuits are more powerful and complex
than the main CPU of the system).
In a 'simple' system all the above tasks would be carried out by sequences of instructions
executed by the CPU. Implementing functions in specialised hardware has the following
advantages which enhance overall system performance:
(a) the specialised hardware can execute functions much faster than the equivalent
instruction sequence executed by the general purpose CPU; and
(b) it is often possible for the CPU to do other processing while a specialist processor is
carrying out a function (at the request of the CPU), e.g. overlapping a floating point
calculation with the execution of further instructions by the CPU (assuming the further
instructions are not dependent upon the result of the floating point calculation).
4.5.2 Multi-processors and Parallel Processors
John von Neuman in 1949 (Foster 1978, Tanenbaum 1990) developed EDSAC, the first
electronic stored program computer, in which a single CPU sent sequential requests over a bus to
memory for instructions and data. The vast majority of computer systems (CISC and RISC) built
since that time are essentially developments of the basic von Neuman machine.
One of the major limitations when increasing processor clock rate is the speed, approximately
20cm/nsec, at which the electrical signals travel around the system. Therefore to build a
computer with 1nsec instruction timing, signals must travel less than 20cm to and from memory.
Attempting to reduce signal path lengths by making systems very compact leads to cooling
problems which require large mainframe and supercomputers to have complex cooling systems
(often the downtime of such systems is not caused by failure of the computer but a fault in the
cooling system). In addition, many of the latest 32-bit microprocessors have experienced overheating problems. It therefore becomes harder and harder to make single processor systems go
faster and an alternative is to have a number of slower CPUs working together. In general
modern computer systems can be categorised as follows:
SISD (Single Instruction, Single Data): a single processor executing one instruction stream on one data stream.
SIMD (Single Instruction, Multiple Data): a single instruction stream applied in parallel to many data streams, e.g. array processors.
MIMD (Multiple Instruction, Multiple Data): a number of processors executing independent instruction streams on separate data, e.g. multi-processor and parallel systems.
The von Neumann machine is an SISD architecture in which some parallel processing is possible, e.g. by overlapping instruction fetch and execution.
components per chip:
SSI     2-64
MSI     64-2,000
LSI     2,000-64,000
VLSI    64,000-2,000,000
ULSI    2,000,000-64,000,000
6 System configurations
6.1 Personal computers, workstations, minis, distributed, etc.
In the late 1970s computer systems could be classified into microcomputers, minicomputers and
mainframe computers:
A microcomputer:
a single user computer system (cost £2000 to £5000) based on an 8-bit
microprocessor (Intel 8080, Zilog Z80, Motorola 6800). These were used for small
industrial (e.g. small control systems), office (e.g. word-processing, spreadsheets) and
program development (e.g. schools, colleges) applications.
A minicomputer:
a medium sized multi-user system (cost £20000 to £200000) used within a
department or a laboratory. Typically it would support 4 to 16 concurrent users depending
upon its size and area of application, e.g. CAD in a design office.
A mainframe computer:
a large multi-user computer system (cost £500000 upwards) used as the central
computer service of a large organization, e.g. Gas Board customer accounts. Large
organizations could have several mainframe and minicomputer systems, possibly on
different sites, linked by a communications network.
As technology advanced the classifications have become blurred and modern microcomputers
are as powerful as the minicomputers of ten years ago or the mainframes of twenty years ago.
Fig. 8 shows the rate of CPU performance growth since the 1960's (Hennessy & Jouppi 1991) as
measured by a general purpose benchmark such as SPEC (these trends still continue - see Fig. 3).
Microprocessor based systems have been increasing in performance by 1.5 to 2.5 times per year
during the past six to seven years whereas mini and mainframe improvement is about 25% per
year (Hennessy & Jouppi 1991). It must be emphasized that Fig. 8 only compares CPU
performance and no account is taken of other factors such as the larger I/O bandwidth and
memory capacity of mini and mainframe systems and the special applications which require
supercomputers.
Today system configurations may be summarized as PCs (personal computers), professional
workstations, multi-user mini/mainframe computers and distributed environments.
6.1.1 Personal computers
PC - Personal Computer:
a generic term for a small (relatively) personal microcomputer system (cost £500
to £5000) used for a wide range of relatively low-level computer applications (see
Table 6 for a summary of the features of a typical PC). The most common PCs are the
IBM PC and compatible machines (based on the Intel 8086/80286/80386/80486/Pentium
family of microprocessors).
Bus size:
Until the late 1980's the major factor which limited the overall performance of IBM PC
compatible computers was the widespread use of the 16 bit IBM PC/AT bus (the 16 bit
refers to the data bus size) developed in the mid 1980s to support the 80286 based IBM
PC/AT microcomputer. This bus system was widely accepted and became known as the
ISA bus (Industry Standard Architecture). Unfortunately, for faster 80386/80486
computer systems the ISA bus was very slow, having a maximum I/O bandwidth of 8
Mbytes/sec. This caused a severe I/O bottleneck within 80486 systems when accessing
disk controllers and video displays via the bus, see Fig 9.
Some IBM PC compatibles were available with the IBM Microchannel bus or the EISA
(Extended Industry Standard Architecture) bus, both of which are 32 bit bus systems
having I/O bandwidths of 20 to 30 Mbytes/sec or greater. An EISA bus machine,
however, could cost £500 to £1000 more than the equivalent ISA bus
system with corresponding increases in the cost of the I/O boards (typically two to three
times the cost of an equivalent ISA bus card). The EISA bus maintains compatibility with
ISA enabling existing ISA cards to be used with it.
The problem with EISA was that it made the PC quite expensive and this led to the
development of local busses which are cheaper and have similar or better performance.
There were two major contenders:
1. VESA a 32-bit local bus which was the first to appear
2. PCI a 32/64-bit local bus which is supported by Microsoft and Intel
Because VESA was the first to appear it became popular in the early/mid nineties. Since
that time PCI has taken over - mainly because it was supported by Microsoft and Intel
and could be used to support the Pentium which has a 64-bit data bus (Intel quote peak
bandwidths of 132Mbytes/sec). Early Pentium systems had a PCI local bus used for high
performance devices (video, disk, etc.) plus an ISA bus for slower devices (serial and
parallel I/O, etc.), see Fig. 10. Many of today's Pentium systems do not have ISA bus
slots, which can cause problems if one wishes to interface with old devices, e.g. specialist
hardware boards.
PCI bus The original PCI bus was rated at 32 bits at 33MHz giving a maximum throughput
of 132Mbytes per second. Since then PCI-2 has appeared, rated at 32/64 bits at 66MHz
giving a maximum throughput of 528Mbytes per second. Unfortunately the PCI bus is
now quite dated and is becoming a performance bottleneck in modern Pentium systems -
see http://www.intel.com/network/performance_brief/pc_bus.htm and
http://www.pcguide.com/ref/mbsys/buses/func.htm for a discussion of PC busses.
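These peak figures follow directly from the bus width and clock, as a quick sketch reproduces (assuming one transfer per clock cycle):

    def peak_bandwidth_mbytes(bus_bits, clock_mhz):
        # Peak throughput in Mbytes/sec: bus width in bytes times transfer
        # rate, assuming one transfer per clock.
        return (bus_bits / 8) * clock_mhz

    print(peak_bandwidth_mbytes(32, 33))   # original PCI: 132 Mbytes/sec
    print(peak_bandwidth_mbytes(64, 66))   # PCI-2 at 64 bits and 66MHz: 528 Mbytes/sec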
For example, many Pentium motherboards are also equipped with an AGP (Accelerated
Graphics Port) which was developed to support high performance graphics cards for 2D
and 3D applications - see http://developer.intel.com/technology/agp/tutorial/,
http://agpforum.org/ and http://www.pcguide.com/ref/mbsys/buses/types/agp.htm
Display
The main problem with running sophisticated graphics applications on a PC is that the screen
quality in terms of addressable pixels and physical size is deficient:
1. PC VGA graphics is only 640*480 pixels (compared with a workstation 'norm' of greater
than 1000*1000 pixels). The super VGA graphics (1024*768 pixels by 256 colours) of
modern PCs is much better.
2. Screen updating can be very slow (relative to a workstation) on a machine with an ISA
bus (see discussion above on PC bus systems).
3. Cheaper PCs sometimes use an interlaced display to reduce overall system cost, i.e.:
Non-interlaced display: every line of pixels is displayed 50 or 60 times per second.
Interlaced display: alternate lines of pixels are displayed 25 or 30 times per second thus
horizontal lines of one pixel thickness flicker.
4. The physical screen size of a PC is typically 14/15/17 inches against the workstation
norm of 19/21 inches.
Operating system
The most common operating system of IBM PC compatibles is generally some variety of
Windows (95/98/NT/2000). Although these are adequate for many application environments,
UNIX is still preferred for high-performance, robust application areas.
6.1.2 Professional workstations
Professional workstation: a generic term applied to the (relatively) high powered personal
computing systems, operating in a distributed environment, evolved by Apollo (Nelson & Leach
1984) and Sun (Pratt 1984) in the early 1980's. The main advantages of professional
workstations over PCs are:
a. Computing power is an order of magnitude higher: the early machines were based
on the Motorola MC68000 family of microprocessors, today the tendency is to
use RISC based architectures. Main memory and disk sizes are correspondingly
higher.
b. Bus system (Stallings 2000): in the past professional workstations used 32 bit bus
systems, e.g. VME with an I/O bandwidth of 40 Mbytes/sec. Modern
workstations have moved to 64 bit or greater buses, or to independent memory and
I/O bus systems.
c. UNIX operating system: the de facto industry standard for medium sized
computer systems.
d. Integrated environment: the workstations are designed to operate in a
sophisticated multiprogramming networked distributed environment. The
operating system is integrated with the window manager, network file system, etc.
e. Multiprogramming/virtual memory operating system: the workstations are
designed to run large highly interactive computational tasks requiring a
sophisticated environment.
f. High quality display screen: a large high quality non-interlaced display with
mouse input is used to interact with the window managed multiprogramming
environment.
A modern high-performance PC, equipped with a high-performance graphics card and high quality
display, can compete with low end workstations (at similar cost). More specialised applications
such as real-time 3D graphics still require professional workstations.
6.1.3 Multi-user minicomputer and mainframe computer systems.
The terms mini and mainframe are becoming very blurred but in general refer to large multi-user
configurations with good I/O bandwidth and main memory capacity, i.e. a number of users (100
to 100000) concurrently running relatively straightforward applications on a common computer
system. High powered multi-user systems typically have an I/O bandwidth and physical memory
capacity at least an order of magnitude greater than PCs and workstations of similar CPU power.
Such multi-user environments may be accessed via a network by X terminals, PCs or
professional workstations. PCs or professional workstations can be used as multi-user machines
so long as the amount of concurrent I/O does not reduce overall performance.
2. The number of user workstations, their distribution over the network(s) together with
support fileservers and nodal processors.
3. The size of main memory and disks on the user workstations. The network traffic can be
reduced if the operating system and commonly used software is held on local disks
(needs careful management of new software releases). A cheaper alternative used to be to
have a small disk on the user workstation which held the operating system swap space so
at least the operating system did not have to page over the network (also see diskless
nodes below).
4. The number of fileservers and their power in terms of processor performance, main
memory size and disk I/O performance. The distribution of software packages and user
files around the fileservers is critical:
(a) complex intensive centralized tasks could well require a dedicated fileserver, e.g.
an advanced database environment or the analysis of large engineering structures
using finite element mesh techniques;
(b) spreading the end-user files around the fileservers prevents overloading of
particular fileservers (and if a fileserver breaks down some users can still do their
work).
5. The number (if any) of diskless nodes. One of the general rules when purchasing disks is
the larger the disk the less the cost per byte. One way to reduce costs in a distributed
system is to equip selected machines with large disks which then act as 'hosts' to a number
of diskless nodes. On start up a diskless node 'boots' the operating system over the
network from the 'host' and carries out all disk access over the network. In practice the
'host' machines may be other user workstations or fileservers. An additional advantage
was that management of the overall system was made easier with less disks to maintain
and update. Diskless systems work provided the following factors are taken into
consideration:
(a) The network does not become overloaded with traffic from too many diskless
nodes;
(b) The ratio of diskless to disked host nodes does not become too high, i.e. placing
excessive load on the hosts. In practice a ratio of three to one gives reasonable
performance, however, systems with ratios of ten to one (or greater) have been
implemented with correspondingly poor performance.
(c) There is sufficient main memory in the diskless nodes such that excessive
swapping/paging (over the network) does not occur in a multiprogramming/virtual
memory environment. Sudden degradations in performance can often be observed
when new software releases cause the problem of excessive paging as programs
increase in size.
(d) The network speed is sufficiently high to cope with the overall demands of the
workstations. Until the late 1980's this was not a problem, with typical network
speeds of 10Mbit/sec and typical professional workstations having a power of 1 to 5
Mips. However, modern machines make diskless nodes impossible without very fast
networks.
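A rough feasibility check for a proposed diskless configuration (a sketch only; the per-node traffic figure and the usable fraction of the raw network bandwidth are invented for illustration) is to compare the aggregate paging and file traffic of the diskless nodes against the network capacity, alongside the diskless-to-host ratio guideline above:

    def diskless_check(nodes, hosts, mbit_per_node, network_mbit, usable=0.6):
        # Returns estimated network utilisation (of the usable bandwidth)
        # and the diskless-to-host ratio.
        utilisation = nodes * mbit_per_node / (network_mbit * usable)
        return utilisation, nodes / hosts

    util, ratio = diskless_check(nodes=9, hosts=3, mbit_per_node=0.5, network_mbit=10)
    print("network utilisation:", round(util * 100), "%  ratio:", ratio, "to 1")
    # 9 nodes at 0.5Mbit/sec each on 10Mbit/sec Ethernet: 75% of usable
    # capacity and a 3:1 ratio - workable, but with little headroom.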
Clearly great care is needed in configuring a distributed environment; a slight error can give
the impression of 'clockwork' powered machines. Common problems (often due to lack of funds)
are:
1. too few fileservers for the number of user workstations and/or poor distribution of
fileservers across the network;
2. too little main memory on fileservers causing bottlenecks in the accessing of centralized
file systems;
3. too high a ratio of diskless to disked nodes and/or too little main memory in diskless nodes.
8 Conclusions
This paper reviewed a range of issues critical in system performance evaluation:
1. The effect of system and end-user software on the overall requirements of a computer
system.
2. Factors affecting overall system performance in terms of CPU power, memory size, data
bus size, etc.
3. The techniques used to improve processor performance and how modern integrated
circuits have enabled these to be implemented in low to medium cost systems.
4. The range of system configurations (PCs, workstations, multi-user, distributed) with
particular attention to factors which are critical in a distributed system.
9 References
Bramer, B, 1989, 'Selection of computer systems to meet end-user requirements', IEEE
Computer Aided Engineering Journal, Vol. 6 No. 2, April, pp. 52-58.
Denning, P J, 1970, 'Virtual memory', ACM Computing Surveys, Vol. 2 No. 3, September.
Foster, C C, 1976, 'Computer Architecture', Van Nostrand Reinhold.
Gelsinger, P P, Gargini, P A, Parker, G H, & Yu, A Y C, 1989, 'Microprocessors circa 2000', IEEE
Spectrum, Vol. 26 No. 10, October, pp 43-47.
Hennessy, J L & Jouppi, N P, 1991, 'Computer technology and architecture: an evolving
interaction', IEEE Computer, Vol. 24, No. 9, September, pp 18-28.
Nelson, D L & Leach, P J, 1984, 'The Architecture and Applications of the Apollo Domain',
IEEE CG&A, April, pp 58-66.
Pratt, V R, 1984, 'Standards and Performance Issues in the Workstations Market', IEEE CG&A,
April, pp 71-76.
Stallings, W, 2000, 'Computer organization and architecture', Fifth Edition, Prentice Hall, ISBN
0-130085263-5.
Tanenbaum, A S, 1990, 'Structured Computer Organisation', Prentice-Hall.