The ISA is a contract between hardware and software. It defines an interface that separates what is done statically, at compile time, from what is done dynamically, at run time. This interface is called the Dynamic-Static Interface (DSI).
Dynamic-Static Interface
There is a semantic gap between software and hardware; the placement of the DSI determines how that gap is bridged.
PERFORMANCE
Defining Performance
What is important to whom?
Computer system user: minimize the elapsed time for a program, time_end − time_start. This is called the response time.
Improve Performance
Improve (a) response time or (b) throughput?
Faster CPU
Helps both (a) and (b)
Iron Law
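$$\text{Processor performance} = \frac{\text{Time}}{\text{Program}} = \underbrace{\frac{\text{Instructions}}{\text{Program}}}_{\text{code size}} \times \underbrace{\frac{\text{Cycles}}{\text{Instruction}}}_{\text{CPI}} \times \underbrace{\frac{\text{Time}}{\text{Cycle}}}_{\text{cycle time}}$$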
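The Iron Law can be evaluated directly. A minimal sketch with hypothetical numbers (2 million instructions, CPI of 1.5, a 2 GHz clock); `execution_time` is an illustrative helper, not part of any library. Holding code size and cycle time fixed, it shows that execution time scales linearly with CPI.

```python
def execution_time(instructions: int, cpi: float, cycle_time_s: float) -> float:
    """Iron Law: Time/Program = Instructions/Program x Cycles/Instruction x Time/Cycle."""
    return instructions * cpi * cycle_time_s


# Hypothetical workload: 2 million instructions, CPI = 1.5, 2 GHz clock (0.5 ns cycle).
base = execution_time(2_000_000, 1.5, 0.5e-9)
# Same program and clock, assuming a pipeline that brings CPI down to 1.0.
pipelined = execution_time(2_000_000, 1.0, 0.5e-9)

print(f"base:      {base * 1e3:.3f} ms")       # 1.500 ms
print(f"pipelined: {pipelined * 1e3:.3f} ms")  # 1.000 ms
```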
CPI?
What is the CPI for a CISC machine? For a RISC machine? For a pipelined machine?
With pipelining
Overlapping instructions can bring the CPI down to 1. This is the best CPI a single-issue pipeline can achieve; equivalently, an IPC (instructions per cycle) of 1.
With superscalar
The aim is to reduce the CPI below 1, or equivalently to raise the IPC above 1.
For a program running on machine X, Performance_X = 1 / Execution time_X. "X is n times faster than Y" means Performance_X / Performance_Y = n.
Problem: Machine A runs a program in 10 seconds and machine B runs it in 15 seconds. How much faster is A than B?
Answer: n = Performance_A / Performance_B = Execution time_B / Execution time_A = 15 / 10 = 1.5, so A is 1.5 times faster than B.
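The definition above can be checked in a few lines. A minimal sketch; `performance` and `times_faster` are hypothetical helper names. Because performance is the reciprocal of execution time, the ratio of performances equals the inverted ratio of times.

```python
def performance(execution_time_s: float) -> float:
    """Performance is the reciprocal of execution time."""
    return 1.0 / execution_time_s


def times_faster(time_x_s: float, time_y_s: float) -> float:
    """n such that 'X is n times faster than Y': Perf_X / Perf_Y = Time_Y / Time_X."""
    return time_y_s / time_x_s


n = times_faster(10.0, 15.0)  # machine A: 10 s, machine B: 15 s
print(f"A is {n} times faster than B")  # A is 1.5 times faster than B
```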
BENCHMARKS
Performance is best determined by running a real application:
use programs typical of the expected workload,
or typical of the expected class of applications, e.g. compilers/editors, scientific applications, graphics.
Small benchmarks:
are nice for architects and designers,
are easy to standardize,
can be abused.
SPEC (System Performance Evaluation Cooperative):
companies have agreed on a set of real programs and inputs,
can still be abused,
is a valuable indicator of performance (and compiler technology).
AMDAHL'S LAW
Version 1: Execution time after improvement = Execution time unaffected + Execution time affected / Amount of improvement
Version 2: Speedup = Performance after improvement / Performance before improvement = Execution time before improvement / Execution time after improvement
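Before: time = n + a, where n is the unaffected time and a is the affected time.
After: time = n + a/p, where p is the amount of improvement.
$$su = \frac{n + a}{n + \frac{a}{p}}$$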
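Both versions translate directly into code. A minimal sketch using the n, a, p notation above; the helper names and the 60 s / 40 s / 4x figures are hypothetical.

```python
def time_after(unaffected_s: float, affected_s: float, improvement: float) -> float:
    """Version 1: unaffected time + affected time / amount of improvement."""
    return unaffected_s + affected_s / improvement


def speedup(unaffected_s: float, affected_s: float, improvement: float) -> float:
    """Version 2: time before / time after, i.e. (n + a) / (n + a/p)."""
    before = unaffected_s + affected_s
    return before / time_after(unaffected_s, affected_s, improvement)


# Hypothetical program: n = 60 s unaffected, a = 40 s affected, p = 4.
print(time_after(60, 40, 4))         # 70.0
print(round(speedup(60, 40, 4), 2))  # 1.43
```

Even though the affected part runs 4 times faster, the overall speedup stays below 1.5 because 60 of the original 100 seconds are untouched.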
Amdahl's Law
Suppose a program runs in 100 seconds on a machine, with multiply responsible for 80 seconds of this time. How much do we have to improve the speed of multiplication if we want the program to run 4 times faster?
$$\frac{100}{4} = 25 = \frac{80}{n} + 20 \;\Rightarrow\; \frac{80}{n} = 5 \;\Rightarrow\; n = \frac{80}{5} = 16$$
That is, to improve overall performance by a factor of 4, multiplication has to be made 16 times faster.
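The same equation can be solved for n in general. A minimal sketch; `required_improvement` is a hypothetical name. It also shows a limit of the law: a 5-times speedup is unreachable here, because the 20 seconds of non-multiply time alone already equal 100/5 seconds.

```python
def required_improvement(total_s: float, affected_s: float, target_speedup: float) -> float:
    """Solve total/target = affected/n + unaffected for n; inf if the target is unreachable."""
    unaffected_s = total_s - affected_s
    budget_for_affected = total_s / target_speedup - unaffected_s
    if budget_for_affected <= 0:
        # Even an infinitely fast affected part cannot reach the target.
        return float("inf")
    return affected_s / budget_for_affected


print(required_improvement(100, 80, 4))  # 16.0
print(required_improvement(100, 80, 5))  # inf
```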
A benchmark program spends half of the time executing floating point instructions. We improve the performance of the floating point unit by a factor of four. What is the speedup?
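This question can be worked with Version 1 of the law, taking the floating-point fraction f = 0.5 and an improvement of 4: Speedup = 1 / (0.5 + 0.5/4) = 1 / 0.625 = 1.6. A minimal sketch; `amdahl_speedup` is a hypothetical helper.

```python
def amdahl_speedup(fraction_enhanced: float, enhancement: float) -> float:
    """Overall speedup when a fraction of execution time is sped up by `enhancement`."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / enhancement)


# Half the time is floating point (f = 0.5); the FP unit is made 4x faster.
print(amdahl_speedup(0.5, 4))  # 1.6
```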
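Amdahl's Law
[Figure: number of processors (1 or N) against time, showing the serial fraction (h, or 1 − f) and the parallel fraction (1 − h, or f)]
h = fraction of time in serial code
f = fraction that is vectorizable
v = speedup for f
Overall speedup:
$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{v}}$$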
If N processors are used for the vectorizable (parallel) part, then v = N:
$$\text{Speedup} = \frac{1}{(1-f) + \frac{f}{N}}$$
This means that even if we keep increasing the number of processors, the speedup is limited by the non-vectorizable part of the program: as N grows, it approaches 1/(1 − f).
Consider a program of which half is vectorizable (f = 0.5), so Speedup = 1/(0.5 + 0.5/N). Find the speedup for N = 1, 2, 4, 10, 100 and N → ∞.
N = 1: SU = 1
N = 2: SU = 1.33
N = 4: SU = 1.6
N = 10: SU = 1.82
N = 100: SU = 1.98
N → ∞: SU = 2
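The table above can be regenerated for any f. A minimal sketch; `speedup` is a hypothetical helper implementing 1 / ((1 − f) + f/N), and the last line prints the limit 1/(1 − f).

```python
def speedup(f: float, n_procs: int) -> float:
    """Amdahl's Law with N processors: 1 / ((1 - f) + f / N)."""
    return 1.0 / ((1.0 - f) + f / n_procs)


f = 0.5  # half of the program is vectorizable
for n in (1, 2, 4, 10, 100):
    print(f"N = {n:>3}: SU = {speedup(f, n):.2f}")
print(f"N -> inf: SU -> {1.0 / (1.0 - f):.2f}")
```

Going from 10 to 100 processors buys only about 0.16 of additional speedup, which is the point of the law.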
Amdahl's law
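A parallel application cannot run faster than the sum of its sequential parts!
With sequential time Ts = Ts1 + Ts2 + Ts3 + Ts4 and parallelizable time Tp = Tp1 + Tp2 + Tp3, the total run time is T = Ts + Tp. Parallelization can shrink only Tp, so even in the ideal case T can never drop below Ts.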
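The bound T ≥ Ts can be illustrated numerically. A minimal sketch, assuming the parallel parts divide perfectly over N processors (T = Ts + Tp/N); the segment times and the `run_time` helper are hypothetical.

```python
def run_time(serial_parts, parallel_parts, n_procs):
    """T = Ts + Tp / N, assuming the parallel parts divide perfectly over N processors."""
    ts = sum(serial_parts)
    tp = sum(parallel_parts)
    return ts + tp / n_procs


serial = [5, 3, 4, 8]    # hypothetical Ts1..Ts4, sum 20 s
parallel = [30, 25, 25]  # hypothetical Tp1..Tp3, sum 80 s
for n in (1, 4, 16, 1000):
    print(n, run_time(serial, parallel, n))
```

However many processors are added, the run time only approaches the 20 seconds of sequential work.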
g = fraction of time the pipeline is filled
1 − g = fraction of time the pipeline is not filled (stalled)
[Figure: pipeline time profile, split into a stalled fraction 1 − g and a filled fraction g]
This pipeline profile is similar to the parallel-processor profile seen earlier, with 1 − g the fraction of time the pipeline is stalled and N the number of pipeline stages:
$$\text{Speedup} = \frac{1}{(1-g) + \frac{g}{N}}$$
As g drops even slightly below 100%, the speedup drops off quickly.
We can borrow the parallel-processor model to interpret pipeline effects, with g now the fraction of time the pipeline is full.
This means that the performance gain obtained from pipelining is strongly degraded by even a small number of stall cycles.
When g is even slightly below 100%, a big performance drop results. Stall cycles are the key adversary and must be minimized as much as possible.
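The sensitivity to g is easy to see numerically. A minimal sketch of Speedup = 1/((1 − g) + g/N), assuming a hypothetical 10-stage pipeline; `pipeline_speedup` is an illustrative name.

```python
def pipeline_speedup(g: float, depth: int) -> float:
    """Speedup = 1 / ((1 - g) + g / N) for an N-stage pipeline full a fraction g of the time."""
    return 1.0 / ((1.0 - g) + g / depth)


N = 10  # hypothetical pipeline depth
for g in (1.00, 0.99, 0.95, 0.90):
    print(f"g = {g:.2f}: speedup = {pipeline_speedup(g, N):.2f}")
```

With these assumed numbers, the pipeline being full only 90% of the time (rather than 100%) roughly halves the speedup, from 10 to about 5.3.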
Stall cycles
Stall cycles constitute the sequential bottleneck for pipelined processors. When the pipeline is stalled, there is only one instruction in the pipeline and no instructions overlap, so in this model a stall costs the full N cycles of the pipeline depth.
Superscalar Proposal
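Moderate the tyranny of Amdahl's Law:
ease the sequential bottleneck,
be more generally applicable,
be robust (less sensitive to f).
Revised Amdahl's Law:
$$\text{Speedup} = \frac{1}{\frac{1-f}{s} + \frac{f}{v}}$$
where s is the speedup on the sequential (non-vectorizable) part and v is the speedup on the vectorizable fraction f.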
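The effect of speeding up the sequential part can be compared directly. A minimal sketch of the revised law; `revised_speedup` and the values f = 0.8, v = 6 are hypothetical. Setting s = 1 recovers the original Amdahl's Law.

```python
def revised_speedup(f: float, v: float, s: float) -> float:
    """Revised Amdahl's Law: 1 / ((1 - f) / s + f / v)."""
    return 1.0 / ((1.0 - f) / s + f / v)


# Hypothetical: f = 0.8 vectorizable, sped up v = 6 times.
print(round(revised_speedup(0.8, 6, 1), 2))  # s = 1: 3.0 (original Amdahl)
print(round(revised_speedup(0.8, 6, 2), 2))  # s = 2: 4.29
```

Doubling the speed of the sequential 20% lifts the overall speedup more than a further increase in v would, which is the motivation for easing the sequential bottleneck.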
ILP
ILP is defined as the aggregate degree of parallelism (measured as the number of instructions) that can be achieved by the concurrent execution of multiple instructions.
The most optimistic published estimate of available ILP is about 90 ("Fisher's optimism").
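One common way to put a number on ILP, assumed here rather than taken from the slides, is the ratio of instructions to the length of the longest dependence chain on an idealized machine with unit latency and unlimited resources. A minimal sketch; `average_ilp` and the six-instruction fragment are hypothetical.

```python
def average_ilp(deps):
    """Instructions / critical-path length, assuming unit latency and unlimited resources.

    deps[i] lists the earlier instructions that instruction i depends on.
    """
    level = []
    for producers in deps:
        # An instruction can execute one cycle after its latest producer.
        level.append(1 + max((level[j] for j in producers), default=0))
    return len(deps) / max(level)


# Hypothetical six-instruction fragment.
program = [
    [],      # 0: r1 = load a
    [],      # 1: r2 = load b
    [0, 1],  # 2: r3 = r1 + r2
    [],      # 3: r4 = load c
    [2, 3],  # 4: r5 = r3 * r4
    [0],     # 5: r6 = r1 - 1
]
print(average_ilp(program))  # 2.0
```

The longest chain (load, add, multiply) is three instructions long, so six instructions can complete in three ideal cycles.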
Superscalar Proposal
Go beyond the single-instruction pipeline and achieve IPC > 1.
Dispatch multiple instructions per cycle.
Provide a more generally applicable form of concurrency (not just vectors).
Target sequential code that is otherwise hard to parallelize.
Exploit fine-grained, instruction-level parallelism (ILP).
Operation Latency (OL)
The number of machine cycles until the result of an instruction is available for use by a subsequent instruction. For a reference instruction, OL is the number of machine cycles required to execute that instruction.
MP: Machine Parallelism
The maximum number of simultaneously executing instructions the machine can support, i.e. the maximum number of instructions that can be in the pipeline at any one time.
IL: Issue Latency
The number of machine cycles required between issuing two consecutive instructions. Issuing means initiating a new instruction into the pipeline.
Issue Parallelism
The maximum number of instructions that can be issued in every machine cycle.
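These parameters can be related with a small timing model. A minimal sketch, assuming no dependences, stalls or resource conflicts; the function names and the three configurations are hypothetical. Instruction i issues at (i // issue parallelism) × issue latency and its result is ready OL cycles later; the peak number of instructions in flight then plays the role of machine parallelism.

```python
def issue_schedule(n_instr: int, issue_parallelism: int, issue_latency: int, op_latency: int):
    """(issue cycle, result-ready cycle) per instruction; no dependences or stalls assumed."""
    schedule = []
    for i in range(n_instr):
        issue = (i // issue_parallelism) * issue_latency
        schedule.append((issue, issue + op_latency))
    return schedule


def max_in_flight(schedule) -> int:
    """Peak number of instructions issued but whose results are not yet ready."""
    last = max(ready for _, ready in schedule)
    return max(
        sum(1 for issue, ready in schedule if issue <= t < ready)
        for t in range(last)
    )


configs = {
    "base (IP=1, IL=1, OL=1)": (1, 1, 1),
    "superscalar, 3-wide (IP=3, IL=1, OL=1)": (3, 1, 1),
    "superpipelined, degree 3 (IP=1, IL=1, OL=3 minor cycles)": (1, 1, 3),
}
for name, (ip, il, ol) in configs.items():
    print(name, "->", max_in_flight(issue_schedule(12, ip, il, ol)))
```

Under these assumptions the 3-wide superscalar machine and the degree-3 superpipelined machine reach the same peak of 3 instructions in flight, one by issuing more per cycle and the other by issuing every (shorter) minor cycle while results take longer.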
[Figure: Superpipelined machine. Successive instructions flowing through the IF, DE, EX and WB stages over time, comparing a base pipeline with a superpipelined one]
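Timing diagrams like these can be generated as text. A minimal sketch, assuming superpipelining in which each of the four stages is split into `degree` minor cycles and one instruction issues per minor cycle; `timing_rows` and `show` are hypothetical helpers.

```python
STAGES = ("IF", "DE", "EX", "WB")


def timing_rows(n_instr: int, degree: int = 1):
    """Stage occupancy per (minor) cycle for each instruction; degree=1 is the base pipeline."""
    depth = len(STAGES) * degree
    total = n_instr + depth - 1  # (minor) cycles until the last instruction's WB
    rows = []
    for i in range(n_instr):
        row = ["."] * total
        for s, name in enumerate(STAGES):
            for k in range(degree):
                label = name if degree == 1 else f"{name}{k + 1}"
                row[i + s * degree + k] = label
        rows.append(row)
    return rows


def show(rows) -> None:
    for i, row in enumerate(rows, start=1):
        print(f"I{i}: " + " ".join(f"{c:>3}" for c in row))


show(timing_rows(6))            # base: 6 instructions finish in 9 cycles
show(timing_rows(6, degree=2))  # degree 2: 13 minor cycles = 6.5 base cycles
```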
Superpipelining
Superpipelining is a new and special term meaning pipelining. The prefix is attached to increase the probability of funding for research proposals. There is no theoretical basis distinguishing superpipelining from pipelining. Etymology of the term is probably similar to the derivation of the now-common terms, methodology and functionality, as pompous substitutes for method and function. The novelty of the term superpipelining lies in its reliance on a prefix rather than a suffix for the pompous extension of the root word.
Jouppi's base machine
Underpipelined machines cannot issue instructions as fast as they are executed.
Underpipelined machine
Note: the key characteristic of superpipelined machines is that results are not available to M − 1 successive instructions, where M is the degree of superpipelining.
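The M − 1 rule follows from simple arithmetic. A minimal sketch, assuming one instruction issues per minor cycle and a result becomes usable M minor cycles after its producer issues; `stall_minor_cycles` is a hypothetical name. A consumer `distance` instructions after its producer would naturally issue `distance` minor cycles later, so it must wait M − distance extra cycles whenever distance < M.

```python
def stall_minor_cycles(m: int, distance: int) -> int:
    """Extra wait for a consumer `distance` instructions after its producer (degree-m machine)."""
    if distance < 1:
        raise ValueError("the consumer must come after the producer")
    return max(0, m - distance)


M = 3  # hypothetical degree of superpipelining
for d in range(1, 5):
    print(f"distance {d}: stall {stall_minor_cycles(M, d)} minor cycle(s)")

blocked = [d for d in range(1, 10) if stall_minor_cycles(M, d) > 0]
print(blocked)  # [1, 2]: M - 1 = 2 successive instructions cannot use the result yet
```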
Superpipelined machine