
ELEC2300 Computer Organization

Lecture 4: Performance Evaluation
- Professor George Yuan
- Office: Rm. 2527
- Email: eeyuan@ust.hk
Note: some of the slides are adapted from Computer Organization and Design,
Copyright 1998 Morgan Kaufmann Publishers, and from the notes of Prof. Patterson's
CS152 class, Copyright 1997 UCB.

OUTLINE
- What is computer performance?
- How do we evaluate performance?

ELEC2300 Computer Organization Fall 2013

Page 2

Which of these airplanes has the best performance?


- 4 types of airplanes fly between Hong Kong & Shanghai (distance: D mi.)

Airplane            Passengers   Range (mi)   Speed (mph)
Boeing 737-100         101           630          598
Boeing 747             470          4150          610
BAC/Sud Concorde       132          4000         1350
Douglas DC-8-50        146          8720          544

- Time to perform the task (execution time): execution time, response time, latency
  $L = \frac{D}{S}$
- Tasks per day, hour, week, sec, ns, ...: throughput, bandwidth
  $T = \frac{1}{L} \times C = \frac{S \times C}{D}$
  (S is the speed and C the passenger capacity.)
- Latency and throughput are often in opposition.

Example
- Execution time of Concorde vs. 747:
  - Concorde is 1350 mph / 610 mph = 2.2 times faster
- Throughput of Concorde vs. 747:
  - The 747 is 286,700 pmph / 178,200 pmph = 1.6 times faster
    (470 * 610 = 286,700; 132 * 1350 = 178,200; pmph = passengers * mph)
- Conclusions:
  - Concorde is 2.2 times faster in terms of flying time.
  - The 747 is 1.6 times faster in terms of throughput.
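The comparison above can be reproduced with a short Python sketch. The speeds and passenger capacities come from the airplane table; the function names and the sample distance D are illustrative assumptions, and D cancels out of both ratios.

```python
# Latency and throughput for the 747 and the Concorde.
# Capacities and speeds are taken from the airplane table; D is a
# hypothetical distance, since the slide leaves it symbolic.

def latency_hours(distance_mi, speed_mph):
    """Flight time L = D / S."""
    return distance_mi / speed_mph

def throughput_pmph(passengers, speed_mph):
    """Passengers * mph, i.e. the throughput T = S * C / D scaled by D."""
    return passengers * speed_mph

planes = {"Boeing 747": (470, 610), "Concorde": (132, 1350)}  # (C, S)
D = 1000  # hypothetical distance in miles

c747, s747 = planes["Boeing 747"]
ccon, scon = planes["Concorde"]

speed_ratio = latency_hours(D, s747) / latency_hours(D, scon)
throughput_ratio = throughput_pmph(c747, s747) / throughput_pmph(ccon, scon)

print(f"Concorde is {speed_ratio:.1f} times faster in flying time")
print(f"747 is {throughput_ratio:.1f} times faster in throughput")
```

Which airplane "wins" depends entirely on which of the two metrics is chosen.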


Execution Time vs. Throughput


- Execution time:
  - How long does it take for my job to run?
  - How long does it take to execute a job?
  - How long must I wait for the database query?

- Throughput:
  - How many tasks can the machine run at once?
  - What is the average execution rate?
  - How much work is getting done?

- Computer upgrade:
  1. Replace a P3 with a P4 (P3 -> P4)
  2. Go from one P3 to two P3s (1 P3 -> 2 P3)
- We will focus primarily on execution time for a single job.

Definitions
- For computer study,
  $\text{performance}_X = \frac{1}{\text{execution time}_X}$
- "X is n times faster than Y" means
  $n = \frac{\text{performance}_X}{\text{performance}_Y} = \frac{\text{execution time}_Y}{\text{execution time}_X}$
- Problem:
  - Machine A runs a program in 20 seconds (1 program / 20 sec)
  - Machine B runs the same program in 25 seconds (1 program / 25 sec)
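As a sketch of how the definition applies to machines A and B, the ratio of execution times can be computed directly. The function name is a hypothetical helper and is not part of the lecture.

```python
# "X is n times faster than Y": n = execution time of Y / execution time of X.

def times_faster(time_x, time_y):
    """Return n such that X is n times faster than Y."""
    return time_y / time_x

time_a = 20.0  # seconds on machine A
time_b = 25.0  # seconds on machine B

n = times_faster(time_a, time_b)
print(f"Machine A is {n:.2f} times faster than machine B")
```

Because performance is the reciprocal of execution time, the slower machine's time goes in the numerator.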

Execution Time
- Elapsed time or response time
  - counts everything (disk and memory accesses, I/O, etc.)
  - a useful number, but often not good for comparison purposes

- CPU time
  - does not count I/O or time spent running other programs
  - can be broken up into system time and user time

- Our focus: user CPU time
  - User CPU time: time spent executing the lines of code that are "in" our program
  - System CPU time: time the CPU spends executing system (kernel) code in
    order to run your program, such as reading files, moving information
    into and out of virtual memory, etc.

  $\text{performance}_X = \frac{1}{\text{user CPU time}_X}$

CPU Time Measurement: Clock Cycles


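- Instead of reporting execution time in seconds, we often use cycles:
  $\frac{\text{seconds}}{\text{program}} = \frac{\text{cycles}}{\text{program}} \times \frac{\text{seconds}}{\text{cycle}}$
- The processor runs machine instructions in step with a clock
  [Figure: clock signal along a time axis, with the clock cycle time marked]
- Clock rate (frequency) = cycles per second (1 Hz = 1 cycle/sec)
- A 200 MHz clock has a cycle time of 1 / (200 * 10^6 Hz) = 5 ns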

Relating the Metrics


- CPU time for a program:
  CPU time = CPU clock cycles * clock cycle time
           = CPU clock cycles / clock rate

- Common ways to improve performance (i.e., shorten CPU execution time):
  - Reduce the number of CPU clock cycles required for a program
  - Shorten the clock cycle time (i.e., increase the clock rate)


Example-Problem
- Description:
  - A program takes 10 seconds to run on a 400 MHz machine (computer A).
    We want to design a faster machine (computer B) that can run the same
    program in 6 seconds.
  - The increase in clock rate affects the rest of the CPU design, causing
    machine B to require 1.2 times as many clock cycles as machine A for
    the program.

- Problem to solve:
  - What clock rate should machine B have?


Example - Answer
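The worked answer did not survive in this copy of the slides. One way to obtain it, sketched here in Python, is to apply CPU time = CPU clock cycles / clock rate twice: first to find the cycle count on computer A, then to find the clock rate computer B needs. The variable names are illustrative.

```python
# Given figures from the problem statement.
time_a = 10.0          # seconds on computer A
clock_rate_a = 400e6   # Hz (400 MHz)
time_b = 6.0           # target seconds on computer B
cycle_factor = 1.2     # B needs 1.2 times as many cycles as A

# CPU time = cycles / clock rate  =>  cycles = CPU time * clock rate
cycles_a = time_a * clock_rate_a
cycles_b = cycle_factor * cycles_a

# Clock rate B must reach so that cycles_b complete in time_b seconds.
clock_rate_b = cycles_b / time_b

print(f"Cycles on A: {cycles_a:.2e}")
print(f"Cycles on B: {cycles_b:.2e}")
print(f"Required clock rate for B: {clock_rate_b / 1e6:.0f} MHz")
```

Under these figures, computer B has to run at twice A's clock rate to gain a speedup of only 10 / 6, because part of the gain is spent on the extra cycles.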


Cycle Number Calculation


- CPU time for a program:
  CPU time = CPU clock cycles * clock cycle time
           = CPU clock cycles / clock rate

[Figure: a program passes through the compiler (to an assembly program) and the assembler (to machine instructions, as defined by the ISA), which fixes Instruction #; the processor then fixes clock cycles/instruction (CPI).]

Cycle # = Instruction # * CPI



Cycles Per Instruction


- Wrong assumption:
  - # of CPU clock cycles in a program = # of instructions in the program

- Actual situation:
  - On some processors, some instructions take more cycles than others:
    - Multiplication takes more cycles than addition
    - Floating-point operations take more cycles than integer operations
    - Memory accesses take more cycles than register accesses
  - Conclusion: not all instructions require the same # of cycles to execute.

- Cycles per instruction (CPI): the average number of clock cycles that
  each instruction in a program takes to execute.


Cycles Per Instruction (CPI)


- Definition (for a given program):
  CPI = (CPU clock cycles) / (instruction count)
- A program has the same instruction count on two different
  implementations of the same instruction set architecture, but it may
  have different CPIs (because an instruction may require different
  numbers of clock cycles on different implementations). If the number of
  clock cycles for a program is known, knowing either the instruction
  count or the CPI determines the other.
- CPI provides a measure for comparing implementations.
- Instruction count can be measured using software tools or simulators.

Cycles Per Instruction


- Let there be n different instruction classes (with different CPIs).
  For a given program, suppose we know:
  - CPI_i = CPI for instruction class i
  - C_i = # of instructions of class i

- CPU clock cycles = CPI * instruction count. It can be generalized to
  $\text{CPU clock cycles} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i)$
  and
  $\text{CPI} = \sum_{i=1}^{n} (\text{CPI}_i \times C_i) \Big/ \sum_{i=1}^{n} C_i$

CPI Example
- Suppose we have two implementations of the same instruction set
  architecture (ISA).
- For some program, machine A has a clock cycle time of 1 ns (1 GHz) and
  a CPI of 2.0. Machine B has a clock cycle time of 2 ns (500 MHz) and a
  CPI of 1.2. Which machine is faster for this program, and by how much?
- If two machines have the same ISA, which of our quantities (e.g., clock
  rate, CPI, execution time, # of instructions, MIPS) will always be
  identical?


Example - Solution
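The solution itself is missing from this copy. A possible working, sketched in Python, uses CPU time = instruction count * CPI * clock cycle time. Both machines implement the same ISA, so the same program has the same instruction count on each; the count used below is an arbitrary placeholder that cancels in the ratio.

```python
# CPIs and clock cycle times are the figures given for machines A and B.
INSTRUCTION_COUNT = 1_000_000  # hypothetical; identical on both machines

def cpu_time(instruction_count, cpi, cycle_time_s):
    """CPU time in seconds = instruction count * CPI * clock cycle time."""
    return instruction_count * cpi * cycle_time_s

time_a = cpu_time(INSTRUCTION_COUNT, cpi=2.0, cycle_time_s=1e-9)
time_b = cpu_time(INSTRUCTION_COUNT, cpi=1.2, cycle_time_s=2e-9)

speedup = time_b / time_a  # how many times faster A is than B
print(f"A: {time_a * 1e3:.1f} ms, B: {time_b * 1e3:.1f} ms")
print(f"Machine A is {speedup:.1f} times faster than machine B")
```

Machine B has the lower CPI, but its longer clock cycle more than cancels that advantage for this program.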


Relating the metrics


- For a given program X running on a machine A:
  $\text{Time} = \frac{\text{seconds}}{\text{program}} = \frac{\text{\# of instructions}}{\text{program}} \times \frac{\text{\# of clocks}}{\text{\# of instructions}} \times \frac{\text{seconds}}{\text{clock}}$
  = instruction count * CPI * clock cycle time
  = instruction count * CPI / clock rate

- The only complete and reliable measure is CPU execution time.
- Other measures are unreliable. E.g., changing the instruction set to
  lower the instruction count may lead to a larger CPI or an organization
  with a slower clock rate. Either case can offset the improvement in
  instruction count.


Example Comparing Code Segments


- Description:
  - A particular machine has the following hardware facts:

    Instruction class   CPI for this instruction class
    A                   1
    B                   2
    C                   3

  - For a given C++ statement, a compiler designer considers two code
    sequences with the following instruction counts:

                        Instruction counts for instruction classes
    Code sequence       A     B     C
    1                   2     1     2
    2                   4     1     1

- Problem to solve:
  - Which code sequence executes the most instructions? Which is faster?
    What is the CPI for each sequence?

Example - Answer
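The answer slide is empty in this copy. The three questions can be worked with the per-class formula, CPU clock cycles = sum of CPI_i * C_i, as in the Python sketch below. The dictionary layout and function name are illustrative choices.

```python
# CPI per instruction class and instruction counts per code sequence,
# as given in the problem description.
CPI = {"A": 1, "B": 2, "C": 3}
SEQUENCES = {
    1: {"A": 2, "B": 1, "C": 2},
    2: {"A": 4, "B": 1, "C": 1},
}

def summarize(counts):
    """Return (instruction count, clock cycles, average CPI) for one sequence."""
    instructions = sum(counts.values())
    cycles = sum(CPI[cls] * n for cls, n in counts.items())
    return instructions, cycles, cycles / instructions

results = {seq: summarize(counts) for seq, counts in SEQUENCES.items()}
for seq, (instructions, cycles, cpi) in results.items():
    print(f"Sequence {seq}: {instructions} instructions, "
          f"{cycles} cycles, CPI = {cpi}")
```

Sequence 2 executes more instructions but needs fewer cycles, so on the same clock it is the faster of the two.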


A misleading measure - MIPS


- There are some performance measures that are famous among computer
  manufacturers and sellers but are misleading!
- MIPS (million instructions per second)
  (sometimes read as "meaningless indication of processor speed")
  - MIPS = (instruction count) / (execution time * 10^6)
  - MIPS depends on:
    - Instruction set (instructions have different capabilities)
    - Program
  - MIPS can vary inversely with performance
  - Peak performance


Some Processors in MIPS


Processor                              IPS                     Year
Motorola 68000                         1 MIPS @ 8 MHz          1979
Intel 386DX                            8.5 MIPS @ 25 MHz       1988
Intel 486DX                            54 MIPS @ 66 MHz        1992
PowerPC G2                             35 MIPS @ 33 MHz        1994
Intel Pentium Pro                      541 MIPS @ 200 MHz      1996
ARM 7500FE                             35.9 MIPS @ 40 MHz      1996
PowerPC G3                             525 MIPS @ 233 MHz      1997
Zilog eZ80                             80 MIPS @ 50 MHz        1999
Intel Pentium III                      1354 MIPS @ 500 MHz     1999
AMD Athlon                             3561 MIPS @ 1.2 GHz     2000
Pentium 4                              9726 MIPS @ 3.2 GHz     2003
ARM Cortex A8                          2000 MIPS @ 1.0 GHz     2005
Xbox360 IBM Xenon Triple Core          6400 MIPS @ 3.2 GHz     2005
AMD Athlon 64 3800+ X2 (Dual Core)     14564 MIPS @ 2.0 GHz    2005
Intel Core2 Extreme QX6700             57063 MIPS @ 3.33 GHz   2006


Another misleading measure - MFLOPS


- MFLOPS (million floating-point operations per second):
  - MFLOPS = (# of floating-point operations) / (execution time * 10^6)
  - MFLOPS considers only floating-point operations (an addition,
    subtraction, multiplication, or division applied to a number in a
    single- or double-precision floating-point representation).
  - MFLOPS depends on:
    - Floating-point operation (e.g., addition and multiplication differ
      in complexity)
    - Program
  - Meaningless if there is little or no floating-point arithmetic.


MIPS example
- Two different compilers are being tested for a 100 MHz machine with
  three different classes of instructions: Class A, Class B, and Class C,
  which require one, two, and three cycles, respectively. Both compilers
  are used to produce code for a large piece of software.
  - The first compiler's code uses 5 million Class A instructions, 1
    million Class B instructions, and 1 million Class C instructions.
  - The second compiler's code uses 10 million Class A instructions, 1
    million Class B instructions, and 1 million Class C instructions.
- What is the execution time for each sequence?
- What is the MIPS rating of this processor for each of the two test
  sequences?
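The two questions can be worked as in the Python sketch below, which combines the cycle-count formula with MIPS = instruction count / (execution time * 10^6). Variable and function names are illustrative.

```python
CLOCK_RATE_HZ = 100e6                    # 100 MHz machine
CYCLES = {"A": 1, "B": 2, "C": 3}        # cycles per instruction class

# Instruction counts produced by each compiler, per class.
COMPILERS = {
    1: {"A": 5_000_000, "B": 1_000_000, "C": 1_000_000},
    2: {"A": 10_000_000, "B": 1_000_000, "C": 1_000_000},
}

def evaluate(counts):
    """Return (execution time in seconds, MIPS rating) for one code sequence."""
    instructions = sum(counts.values())
    cycles = sum(CYCLES[cls] * n for cls, n in counts.items())
    exec_time = cycles / CLOCK_RATE_HZ
    mips = instructions / (exec_time * 1e6)
    return exec_time, mips

results = {c: evaluate(counts) for c, counts in COMPILERS.items()}
for c, (exec_time, mips) in results.items():
    print(f"Compiler {c}: {exec_time:.2f} s, {mips:.0f} MIPS")
```

The second compiler's code earns the higher MIPS rating yet takes longer to run, which is the sense in which MIPS can vary inversely with performance.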


Summary
- Some related terminology:
  - clock, clock cycle, cycle
  - clock cycle time, cycle time (seconds, µs, ns)
  - clock rate, cycle rate (Hz, MHz)
  - CPI (cycles per instruction)
  - MIPS (millions of instructions per second)

- Performance is determined by the execution time

- Execution time calculation:
  Execution Time = instruction count * CPI * clock cycle time
                 = instruction count * CPI / clock rate


OUTLINE
- What is computer performance?
- How do we evaluate performance?


Benchmarks
- Execution time calculation:
  Execution Time = instruction count * CPI * clock cycle time
                 = instruction count * CPI / clock rate
- Benchmark: a set of specially designed programs to test the performance
  of a computer
- Performance is best determined by running a real application
  - Benchmarks are application specific:
    CPU performance, graphics, high-performance computing,
    object-oriented computing, Java applications, client-server models,
    mail systems, file systems, Web servers.

- SPEC (System Performance Evaluation Cooperative)
  - companies have agreed on a set of real programs and inputs
  - a valuable indicator of computer performance:
    processor (ISA implementation) + compiler

SPEC 89
- Compiler enhancements and performance
[Figure: bar chart of SPEC performance ratio (y-axis, ticks from 100 to 800) by benchmark (gcc, espresso, spice, doduc, nasa7, li, eqntott, matrix300, fpppp, tomcatv), comparing "Compiler" with "Enhanced compiler".]


SPEC CPU2000
- SPEC ratio
  - Reference: Sun Ultra 5_10 with a 300 MHz processor
- CINT2000, CFP2000
  - Geometric mean of SPEC ratios
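A minimal Python sketch of the summary computation follows. It assumes that a SPEC ratio is the reference machine's run time divided by the measured run time; all run times below are invented for illustration, and any extra scaling applied to published SPEC figures is not modeled.

```python
import math

def spec_ratio(reference_time, measured_time):
    """Assumed definition: reference run time / measured run time."""
    return reference_time / measured_time

def geometric_mean(values):
    """n-th root of the product of n positive values."""
    return math.prod(values) ** (1.0 / len(values))

# Hypothetical run times in seconds for three benchmarks.
reference_times = [1400.0, 1800.0, 1100.0]   # on the reference machine
measured_times = [700.0, 450.0, 550.0]       # on the machine under test

ratios = [spec_ratio(r, m) for r, m in zip(reference_times, measured_times)]
summary = geometric_mean(ratios)

print("SPEC ratios:", ratios)
print(f"Geometric mean: {summary:.2f}")
```

A geometric mean is the natural average for ratios: the relative ranking of two machines stays the same whichever machine is chosen as the reference.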


SPEC CPU2000 Benchmarks


SPEC CPU2000 ratings


Amdahl's Law
Execution Time After Improvement =
    Execution Time Unaffected +
    (Execution Time Affected / Amount of Improvement)
- Example:
  "Suppose a program runs in 100 seconds on a machine, with
  multiplication responsible for 80 seconds of this time. How much do we
  have to improve the speed of multiplication if we want the program to
  run 4 times faster?"
- How about making the program 5 times faster?
- Principle: make the common case fast
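Both questions can be answered by solving the formula above for the amount of improvement. The Python sketch below does that; the function name and the use of None for an unreachable target are illustrative choices.

```python
def required_improvement(total_time, affected_time, target_speedup):
    """Factor by which the affected part must speed up to hit the target.

    Returns None when the target cannot be reached, i.e. when the
    unaffected time alone already uses up the whole target time.
    """
    unaffected = total_time - affected_time
    target_time = total_time / target_speedup
    budget = target_time - unaffected  # time left for the affected part
    if budget <= 0:
        return None
    return affected_time / budget

four_times = required_improvement(100.0, 80.0, 4)
five_times = required_improvement(100.0, 80.0, 5)

print("4 times faster: multiplication must be", four_times, "times faster")
print("5 times faster:", "not achievable" if five_times is None else five_times)
```

For the 5x target, the 20 seconds not spent on multiplication already equal the whole target time, so no finite speedup of multiplication is enough.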


Example
- Suppose we enhance a machine, making all floating-point instructions
  five times faster. If the execution time of some benchmark before the
  floating-point enhancement is 10 seconds, what will the speedup be if
  half of the 10 seconds is spent executing floating-point instructions?
- We are looking for a benchmark to show off the new floating-point unit
  described above, and want the overall benchmark to show a speedup of 3.
  One benchmark we are considering runs for 100 seconds with the old
  floating-point hardware. How much of the execution time would
  floating-point instructions have to account for in this program in
  order to yield our desired speedup on this benchmark?
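A sketch of both parts in Python, using Amdahl's Law directly for the first question and rearranging it for the second. The function names are hypothetical helpers, not part of the lecture.

```python
def time_after(total_time, affected_time, factor):
    """Amdahl's Law: unaffected time + affected time / improvement factor."""
    return (total_time - affected_time) + affected_time / factor

def affected_time_needed(total_time, factor, target_speedup):
    """Affected time x that yields the target speedup.

    Solves total/target = (total - x) + x/factor for x.
    """
    return total_time * (1 - 1 / target_speedup) / (1 - 1 / factor)

# Part 1: 10 s benchmark, 5 s of it floating point, FP made 5 times faster.
speedup_1 = 10.0 / time_after(10.0, 5.0, 5)

# Part 2: 100 s benchmark, same FP unit, desired overall speedup of 3.
fp_seconds = affected_time_needed(100.0, 5, 3)

print(f"Part 1 speedup: {speedup_1:.2f}")
print(f"Part 2: floating point must account for {fp_seconds:.1f} s of 100 s")
```

Even with a five-times-faster floating-point unit, more than four fifths of the original run must be floating-point work before the overall speedup reaches 3.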


Remember
- Performance is specific to a particular program
  - Total execution time is a consistent summary of performance

- For a given architecture, performance increases come from:
  - increases in clock rate (without adverse CPI effects)
  - improvements in processor organization that lower CPI
  - compiler enhancements that lower CPI and/or instruction count

- Pitfall: expecting an improvement in one aspect of a machine's
  performance to improve total performance by a proportional amount
- You should not always believe everything you read! Read carefully!
