Lecture 4: Performance Evaluation
- Professor George Yuan
- Office: Rm. 2527
- Email: eeyuan@ust.hk
Note: some of the slides are adapted from Computer Organization and Design,
Copyright 1998 Morgan Kaufmann Publishers, and from the notes of Prof. Patterson's
CS152 class, Copyright 1997 UCB.
OUTLINE
- What is computer performance?
- How do we evaluate performance?
Airplane            Passengers   Range (mi)   Speed (mph)
[name missing]      101          630          598
Boeing 747          470          4150         610
BAC/Sud Concorde    132          4000         1350
Douglas DC-8-50     146          8720         544

Time to perform the task (execution time): execution time, response time, latency

$$L = \frac{D}{S}$$

where L is the latency, D the distance, and S the speed.
Example
- Execution time of Concorde vs. 747:
  - Concorde is 1350 mph / 610 mph = 2.2 times faster
- Throughput of Concorde vs. 747, in passenger-miles per hour (pmph):
  - The 747 is 286,700 pmph / 178,200 pmph = 1.6 times faster
    (470 * 610 = 286,700; 132 * 1350 = 178,200)
- Conclusions:
  - Concorde is 2.2 times faster in terms of flying time.
  - The 747 is 1.6 times faster in terms of throughput.
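The two ratios can be reproduced with a short Python sketch. The passenger counts and speeds come from the airplane table; the function and variable names are illustrative assumptions, not part of the lecture.

```python
# Passenger counts and cruising speeds from the airplane table in this lecture.
planes = {
    "Boeing 747": {"passengers": 470, "speed_mph": 610},
    "Concorde": {"passengers": 132, "speed_mph": 1350},
}


def throughput_pmph(plane):
    """Passenger-miles per hour: passengers carried times cruising speed."""
    return plane["passengers"] * plane["speed_mph"]


# Flying time over a fixed distance is inversely proportional to speed,
# so the speed ratio is also the execution-time ratio.
speed_ratio = planes["Concorde"]["speed_mph"] / planes["Boeing 747"]["speed_mph"]

# Throughput compares how much work (passenger-miles) is done per hour.
throughput_ratio = throughput_pmph(planes["Boeing 747"]) / throughput_pmph(planes["Concorde"])

print(f"Concorde is {speed_ratio:.1f} times faster in flying time")
print(f"747 is {throughput_ratio:.1f} times faster in throughput")
```

Which airplane "wins" depends on the metric chosen, which is the point of the example.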
- Throughput:
  - How many tasks can the machine run at once?
  - What is the average execution rate?
  - How much work is getting done?
- Computer upgrade:
  1. P3 -> P4 (a faster processor)
  2. 1 P3 -> 2 P3 (a second processor)
- We will focus primarily on execution time for a single job.
ELEC2300 Computer Organization Fall 2013
Definitions
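- For computer study,

$$\text{performance}_X = \frac{1}{\text{execution time}_X}$$

- "X is n times faster than Y" means

$$\frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution time}_Y}{\text{execution time}_X} = n$$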
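A minimal Python sketch of the definitions: performance is the reciprocal of execution time, and "n times faster" is a ratio of performances. The execution times used below (10 s and 15 s) are hypothetical values chosen only for illustration.

```python
def performance(execution_time_s):
    """Performance is defined as the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s, time_y_s):
    """Return n such that X is n times faster than Y.

    Equivalent to time_y_s / time_x_s, written via the performance definition.
    """
    return performance(time_x_s) / performance(time_y_s)


# Hypothetical example: X runs a program in 10 s, Y runs it in 15 s.
n = times_faster(10.0, 15.0)
print(f"X is {n:.2f} times faster than Y")
```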
Execution Time
- Elapsed time or response time
  - Counts everything (disk and memory accesses, I/O, etc.)
  - A useful number, but often not good for comparison purposes
- CPU time
  - Does not count I/O or time spent running other programs
  - Can be broken up into system time and user time

$$\text{performance}_X = \frac{1}{\text{user CPU time}_X}$$
Example - Problem
- Description:
  - A program takes 10 seconds to run on a 400 MHz
    machine (computer A). We want to design a faster
    machine (computer B) that can run the same program
    in 6 seconds.
  - The increase in clock rate affects the rest of the CPU
    design, causing machine B to require 1.2 times as
    many clock cycles as machine A for the program.
- Problem to solve:
  - What clock rate should machine B have?
Example - Answer
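The worked answer did not survive on this slide. One way to work it out, sketched in Python and assuming execution time = clock cycles / clock rate (the function name is an illustrative assumption):

```python
def required_clock_rate_hz(time_a_s, clock_rate_a_hz, cycle_factor, target_time_s):
    """Clock rate machine B needs to finish the program in target_time_s."""
    # Cycles machine A spends on the program: time * clock rate.
    cycles_a = time_a_s * clock_rate_a_hz
    # Machine B needs cycle_factor times as many cycles for the same program.
    cycles_b = cycle_factor * cycles_a
    # Clock rate = cycles / time.
    return cycles_b / target_time_s


rate_b_hz = required_clock_rate_hz(
    time_a_s=10, clock_rate_a_hz=400e6, cycle_factor=1.2, target_time_s=6
)
print(f"Machine B needs a clock rate of about {rate_b_hz / 1e6:.0f} MHz")
```

Machine A spends 10 s * 400 MHz = 4 * 10^9 cycles, so machine B needs 4.8 * 10^9 cycles in 6 seconds, which works out to 800 MHz.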
[Figure: program -> compiler -> assembly program -> assembler -> machine instructions -> processor, with the labels "Instruction #" and "ISA".]
- Actual situation
  - For some processors, some instructions may take more cycles
    than others, e.g.:
    - Multiplication takes more cycles than addition
    - Floating-point operations take more cycles than integer
      operations
    - Memory accesses take more cycles than register accesses
  - Conclusion: not all instructions require the same number of
    cycles to execute.
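For n instruction classes, where C_i is the number of instructions of class i executed and CPI_i is the cycles per instruction for that class:

$$\text{CPI} = \frac{\sum_{i=1}^{n} \text{CPI}_i \times C_i}{\sum_{i=1}^{n} C_i}$$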
CPI Example
- Suppose we have two implementations of the
  same instruction set architecture (ISA).
- For some program, machine A has a clock cycle
  time of 1 ns (1 GHz) and a CPI of 2.0. Machine
  B has a clock cycle time of 2 ns (500 MHz) and a
  CPI of 1.2. Which machine is faster for this
  program, and by how much?
- If two machines have the same ISA, which of our
  quantities (e.g., clock rate, CPI, execution time, # of
  instructions, MIPS) will always be identical?
Example - Solution
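A sketch of the comparison in Python, assuming both machines execute the same number of instructions for this program (same ISA, same compiled code), so the instruction count cancels and only CPI * clock cycle time matters. Names are illustrative assumptions.

```python
def time_per_instruction_ns(cpi, cycle_time_ns):
    """Average time per instruction: cycles per instruction times cycle time."""
    return cpi * cycle_time_ns


# Values from the problem statement.
machine_a_ns = time_per_instruction_ns(cpi=2.0, cycle_time_ns=1.0)
machine_b_ns = time_per_instruction_ns(cpi=1.2, cycle_time_ns=2.0)

# With equal instruction counts, the execution-time ratio equals this ratio.
speedup_a_over_b = machine_b_ns / machine_a_ns

print(f"A: {machine_a_ns:.1f} ns per instruction")
print(f"B: {machine_b_ns:.1f} ns per instruction")
print(f"A is {speedup_a_over_b:.1f} times faster than B")
```

Machine A averages 2.0 ns per instruction and machine B 2.4 ns, so A is 1.2 times faster despite its higher CPI.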
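$$\frac{\text{seconds}}{\text{program}} = \frac{\text{\# of instructions}}{\text{program}} \times \frac{\text{\# of clocks}}{\text{\# of instructions}} \times \frac{\text{seconds}}{\text{clock}}$$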
Instruction counts per class:

Instruction class   Sequence 1   Sequence 2
A                   2            4
B                   1            1
C                   2            1
- Problem to solve
  - Which code sequence executes the most instructions? Which is
    faster? What is the CPI for each sequence?
Example - Answer
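The worked answer is missing from this slide, and so are the per-class cycle counts. The Python sketch below ASSUMES classes A, B, and C take 1, 2, and 3 cycles, as in the MIPS example later in this lecture, and ASSUMES the two columns of counts are code sequences 1 and 2. Under different CPI values the numbers change.

```python
# ASSUMPTION: cycles per instruction for each class (not stated on this slide).
CPI_BY_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts per class; the column-to-sequence mapping is assumed.
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}


def summarize(counts, cpi_by_class):
    """Return (instruction count, total cycles, average CPI) for a sequence."""
    instructions = sum(counts.values())
    cycles = sum(cpi_by_class[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions


for name, counts in SEQUENCES.items():
    instructions, cycles, cpi = summarize(counts, CPI_BY_CLASS)
    print(f"Sequence {name}: {instructions} instructions, {cycles} cycles, CPI = {cpi}")
```

Under these assumptions, sequence 2 executes more instructions (6 vs. 5) but needs fewer cycles (9 vs. 10), so it is faster; the CPIs are 2.0 and 1.5.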
Processor          IPS                      Year
Motorola 68000     1 MIPS @ 8 MHz           1979
Intel 386DX        8.5 MIPS @ 25 MHz        1988
Intel 486DX        54 MIPS @ 66 MHz         1992
PowerPC G2         35 MIPS @ 33 MHz         1994
[name missing]     541 MIPS @ 200 MHz       1996
ARM 7500FE         35.9 MIPS @ 40 MHz       1996
PowerPC G3         525 MIPS @ 233 MHz       1997
Zilog eZ80         80 MIPS @ 50 MHz         1999
[name missing]     1354 MIPS @ 500 MHz      1999
AMD Athlon         3561 MIPS @ 1.2 GHz      2000
Pentium 4          9726 MIPS @ 3.2 GHz      2003
ARM Cortex A8      2000 MIPS @ 1.0 GHz      2005
[name missing]     6400 MIPS @ 3.2 GHz      2005
[name missing]     [missing]                2005
[name missing]     [missing]                2006
[name missing]     57063 MIPS @ 3.33 GHz    [missing]
MIPS example
- Two different compilers are being tested for a 100 MHz machine
  with three different classes of instructions: Class A, Class B, and
  Class C, which require one, two, and three cycles (respectively).
  Both compilers are used to produce code for a large piece of
  software.
  - The first compiler's code uses 5 million Class A instructions, 1
    million Class B instructions, and 1 million Class C instructions.
  - The second compiler's code uses 10 million Class A instructions,
    1 million Class B instructions, and 1 million Class C instructions.
- What is the execution time of each sequence?
- What is the MIPS rating of this processor for each of the two test
  sequences?
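Both questions can be worked through with a short Python sketch. All numbers come from the problem statement; the names are illustrative assumptions.

```python
CLOCK_RATE_HZ = 100_000_000  # 100 MHz machine
CYCLES_PER_CLASS = {"A": 1, "B": 2, "C": 3}

# Instruction counts produced by each compiler.
COMPILERS = {
    "compiler 1": {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000},
    "compiler 2": {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000},
}


def execution_time_s(counts):
    """Execution time = total clock cycles / clock rate."""
    cycles = sum(CYCLES_PER_CLASS[cls] * n for cls, n in counts.items())
    return cycles / CLOCK_RATE_HZ


def mips(counts):
    """MIPS = instruction count / (execution time * 10^6)."""
    return sum(counts.values()) / (execution_time_s(counts) * 1e6)


for name, counts in COMPILERS.items():
    print(f"{name}: {execution_time_s(counts):.2f} s, {mips(counts):.0f} MIPS")
```

The first compiler's code runs in 0.10 s at 70 MIPS; the second runs in 0.15 s at 80 MIPS. The code with the higher MIPS rating is the slower one, so MIPS alone can mislead.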
Summary
- Some related terminology:
  - clock, clock cycle, cycle
  - clock cycle time, cycle time (seconds, us, ns)
  - clock rate, cycle rate (Hz, MHz)
  - CPI (cycles per instruction)
  - MIPS (millions of instructions per second)
OUTLINE
- What is computer performance?
- How do we evaluate performance?
Benchmarks
- Execution time calculation:
  Execution Time = instruction count * CPI * clock cycle time
                 = instruction count * CPI / clock rate
- Benchmark: a set of specially designed programs to test the
  performance of a computer
- Performance is best determined by running a real application
  - Benchmarks are application specific:
    CPU performance, graphics, high-performance computing, object-oriented
    computing, Java applications, client-server models, mail
    systems, file systems, Web servers.
SPEC 89
- Compiler enhancements and performance
[Figure: bar chart comparing "Compiler" and "Enhanced compiler" on the benchmarks gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, and tomcatv; vertical axis marked from 100 to 800.]
SPEC CPU2000
- SPEC ratio
  - Reference: Sun Ultra 5_10 with a 300 MHz
    processor
- CINT2000, CFP2000
  - Geometric mean of SPEC ratios
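A minimal Python sketch of how a set of SPEC ratios is summarized by a geometric mean. It assumes a ratio is the reference machine's time divided by the measured time; real SPEC reporting applies its own scaling and run rules. The timings below are made-up placeholders, not SPEC results.

```python
import math


def spec_ratio(reference_time_s, measured_time_s):
    """Ratio > 1 means the measured machine is faster than the reference."""
    return reference_time_s / measured_time_s


def geometric_mean(values):
    """n-th root of the product of n values, computed via logarithms."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


# HYPOTHETICAL timings (reference seconds, measured seconds) for three programs.
timings = [(1400.0, 700.0), (1800.0, 450.0), (1000.0, 1000.0)]

ratios = [spec_ratio(ref, measured) for ref, measured in timings]
print("ratios:", ratios)
print(f"geometric mean: {geometric_mean(ratios):.2f}")
```

A useful property of the geometric mean is that the relative ranking of two machines does not depend on which machine is used as the reference.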
Amdahl's Law
Execution Time After Improvement =
Execution Time Unaffected +
( Execution Time Affected / Amount of Improvement )
Example:
"Suppose a program runs in 100 seconds on a machine, with
multiplication responsible for 80 seconds of this time. How much do we
have to improve the speed of multiplication if we want the program to run
4 times faster?"
How about making the program 5 times faster?
Principle: Make the common case fast
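The two questions in the example can be checked with a short Python sketch that rearranges the formula above to solve for the amount of improvement. The function names are illustrative assumptions.

```python
def time_after_improvement(unaffected_s, affected_s, improvement):
    """Amdahl's Law as stated above."""
    return unaffected_s + affected_s / improvement


def required_improvement(total_s, affected_s, target_speedup):
    """Improvement factor needed on the affected part, or None if impossible."""
    unaffected_s = total_s - affected_s
    target_time_s = total_s / target_speedup
    # Time left over for the affected part once the unaffected part is paid for.
    budget_s = target_time_s - unaffected_s
    if budget_s <= 0:
        return None  # even an infinitely fast affected part is not enough
    return affected_s / budget_s


# 100 s program, 80 s of it in multiplication.
print("4 times faster:", required_improvement(100, 80, 4))
print("5 times faster:", required_improvement(100, 80, 5))
```

For a 4x speedup the target is 25 s, leaving 5 s for multiplication, so multiplication must become 16 times faster. A 5x speedup needs a total of 20 s, which the unaffected 20 s already uses up, so no finite improvement achieves it.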
Example
- Suppose we enhance a machine making all floating-point instructions
  five times faster. If the execution time of some benchmark before the
  floating-point enhancement is 10 seconds, what will the speedup be if
  half of the 10 seconds is spent executing floating-point instructions?
- We are looking for a benchmark to show off the new floating-point
  unit described above, and want the overall benchmark to show a
  speedup of 3. One benchmark we are considering runs for 100
  seconds with the old floating-point hardware. How much of the
  execution time would floating-point instructions have to account for
  in this program in order to yield our desired speedup on this
  benchmark?
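Both questions follow from Amdahl's Law written in terms of the fraction f of time that is affected and the improvement factor k: speedup = 1 / ((1 - f) + f / k). A sketch in Python, with illustrative names:

```python
def overall_speedup(fraction_affected, improvement):
    """Overall speedup when a fraction of the time is sped up by `improvement`."""
    return 1.0 / ((1.0 - fraction_affected) + fraction_affected / improvement)


def fraction_needed(target_speedup, improvement):
    """Fraction of original time that must be affected to reach target_speedup.

    Solves 1/S = (1 - f) + f/k for f, giving f = (1 - 1/S) / (1 - 1/k).
    """
    return (1.0 - 1.0 / target_speedup) / (1.0 - 1.0 / improvement)


# Question 1: half of 10 s is floating point, made 5 times faster.
s1 = overall_speedup(fraction_affected=0.5, improvement=5)
print(f"speedup: {s1:.2f}")

# Question 2: want a speedup of 3 on a 100 s benchmark.
f2 = fraction_needed(target_speedup=3, improvement=5)
print(f"floating-point time needed: {100 * f2:.1f} s of 100 s")
```

In the first case the new time is 5 + 5/5 = 6 s, a speedup of about 1.67. In the second, floating-point instructions must account for about 83.3 of the 100 seconds.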
Remember
- Performance is specific to a particular program
  - Total execution time is a consistent summary of performance