Introduction
Computer Technology
Performance improvements:
Semiconductor technology: feature size, clock speed
Computer architecture: enabled by HLL compilers and UNIX; led to RISC architectures
CISC (Complex Instruction Set Computers) vs. RISC (Reduced Instruction Set Computers)
Introduction
RISC
Classes of Computers
Personal Mobile Device (PMD)
e.g. smart phones, tablet computers
Emphasis on energy efficiency and real-time
Desktop Computing
Emphasis on price-performance
Servers
Emphasis on availability, scalability, throughput
Clusters / Warehouse-Scale Computers
Used for Software as a Service (SaaS)
Emphasis on availability and price-performance
Sub-class: supercomputers; emphasis on floating-point performance and fast internal networks
Embedded Computers
Emphasis: price
Classes of Computers
Parallelism
Instruction-Level Parallelism (ILP)
Vector architectures/Graphic Processor Units (GPUs)
Thread-Level Parallelism
Request-Level Parallelism
Classes of Computers
Flynn's Taxonomy
Single instruction stream, single data stream (SISD)
Single instruction stream, multiple data streams (SIMD)
Multiple instruction streams, single data stream (MISD): no commercial implementation
Multiple instruction streams, multiple data streams (MIMD)
Instruction set architecture (ISA): registers, memory addressing, addressing modes, instruction operands, available operations, control flow instructions, instruction encoding
Computer architecture: specific requirements of the target machine
Design to maximize performance within constraints: cost, power, and availability
Includes ISA, microarchitecture, hardware
Trends in Technology
Bandwidth or throughput
Total work done in a given time
10,000-25,000X improvement for processors
300-1200X improvement for memory and disks
Latency or response time
Time between start and completion of an event
30-80X improvement for processors
6-8X improvement for memory and disks
Trends in Technology
Feature size
Minimum size of a transistor or wire in the x or y dimension
10 microns in 1971 to 0.032 microns in 2011
Transistor performance scales linearly with decreasing feature size
Power and Energy
Problem: get power in, get power out
Thermal Design Power (TDP)
Characterizes sustained power consumption
Used as a target for the power supply and cooling system
Lower than peak power, higher than average power consumption
Clock rate can be reduced dynamically to limit power consumption
Energy per task is often a better measurement
Dynamic energy
Energy per transistor switch (0 -> 1 or 1 -> 0): proportional to 1/2 x Capacitive load x Voltage^2
Dynamic power
Proportional to 1/2 x Capacitive load x Voltage^2 x Frequency switched
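A minimal C sketch of these relations; the capacitive load, voltage, and clock values below are made-up illustrative numbers, not measurements of any real chip, and the 15% voltage/frequency drop at the end previews the dynamic voltage-frequency scaling technique discussed later:

#include <stdio.h>

/* Dynamic energy per 0->1 or 1->0 transition ~ 1/2 * C * V^2.
 * Dynamic power ~ 1/2 * C * V^2 * f, where f is the switching frequency. */
static double dynamic_energy(double cap_load, double voltage) {
    return 0.5 * cap_load * voltage * voltage;
}

static double dynamic_power(double cap_load, double voltage, double freq) {
    return dynamic_energy(cap_load, voltage) * freq;
}

int main(void) {
    double c = 1e-9;   /* assumed capacitive load: 1 nF (illustrative) */
    double v = 1.0;    /* supply voltage: 1.0 V */
    double f = 3.3e9;  /* clock: 3.3 GHz */

    printf("power at 1.00 V, 3.3 GHz: %.2f W\n", dynamic_power(c, v, f));

    /* Scaling voltage and frequency down 15% each cuts dynamic power
     * to 0.85^3, roughly 61% of the original. */
    printf("power at 0.85 V, 2.8 GHz: %.2f W\n",
           dynamic_power(c, 0.85 * v, 0.85 * f));
    return 0;
}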
Power
Intel 80386 consumed ~2 W
3.3 GHz Intel Core i7 consumes 130 W
Heat must be dissipated from a 1.5 x 1.5 cm chip
This is the limit of what can be cooled by air
Reducing Power
Do nothing well
Dynamic Voltage-Frequency Scaling
Low power state for DRAM, disks
Overclocking, turning off cores
Static Power
Static power consumption: proportional to Current_static x Voltage
Scales with the number of transistors
To reduce: power gating
Trends in Cost
Yield
Trends in Cost
Integrated circuit
Bose-Einstein formula:
Die yield = Wafer yield x 1 / (1 + Defects per unit area x Die area)^N
N = process-complexity factor
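A hedged C sketch of this formula; the wafer cost, defect density, die area, and N values are illustrative assumptions, and the die-cost relation used at the end (cost of wafer divided by dies per wafer times die yield) is the standard companion formula rather than text from this slide:

#include <stdio.h>
#include <math.h>

/* Bose-Einstein die yield:
 *   die yield = wafer yield / (1 + defects per unit area * die area)^N
 * where N is the process-complexity factor. */
static double die_yield(double wafer_yield, double defects_per_cm2,
                        double die_area_cm2, double n) {
    return wafer_yield / pow(1.0 + defects_per_cm2 * die_area_cm2, n);
}

int main(void) {
    /* Illustrative assumptions: 1.5 cm^2 die, 0.03 defects/cm^2, N = 13.5 */
    double y = die_yield(1.0, 0.03, 1.5, 13.5);
    printf("die yield: %.3f\n", y);

    /* Cost of die = cost of wafer / (dies per wafer * die yield). */
    double wafer_cost = 5000.0, dies_per_wafer = 50.0;
    printf("cost per good die: $%.2f\n", wafer_cost / (dies_per_wafer * y));
    return 0;
}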
Dependability
Module reliability
Mean time to failure (MTTF)
Mean time to repair (MTTR)
Mean time between failures (MTBF) = MTTF + MTTR
Availability = MTTF / MTBF
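A small C sketch of these definitions; the MTTF and MTTR values are illustrative assumptions, not figures from the slide:

#include <stdio.h>

/* Availability = MTTF / MTBF, with MTBF = MTTF + MTTR. */
static double availability(double mttf_hours, double mttr_hours) {
    return mttf_hours / (mttf_hours + mttr_hours);
}

int main(void) {
    double mttf = 1.0e6;  /* assumed mean time to failure: 1,000,000 h */
    double mttr = 24.0;   /* assumed mean time to repair: 24 h */
    printf("MTBF: %.0f hours\n", mttf + mttr);
    printf("availability: %.6f\n", availability(mttf, mttr));
    return 0;
}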
Measuring Performance
Speedup of X relative to Y = Execution time of Y / Execution time of X
Execution time
Wall clock time: includes all system overheads
CPU time: only computation time
Benchmarks
Kernels (e.g. matrix multiply)
Toy programs (e.g. sorting)
Synthetic benchmarks (e.g. Dhrystone)
Benchmark suites (e.g. SPEC06fp, TPC-C)
Principles
Take advantage of parallelism
e.g. multiple processors, disks, memory banks, pipelining, multiple functional units
Principle of Locality: programs tend to reuse data and instructions they have used recently
Focus on the common case
Amdahl's Law: Speedup_overall = 1 / ((1 - Fraction_enhanced) + Fraction_enhanced / Speedup_enhanced); see the sketch below
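A minimal C sketch of the speedup and Amdahl's Law formulas above; the 80% enhanced fraction, 10x enhancement factor, and the measured execution times are illustrative assumptions:

#include <stdio.h>

/* Speedup of X relative to Y = execution time on Y / execution time on X. */
static double speedup(double time_y, double time_x) {
    return time_y / time_x;
}

/* Amdahl's Law: if a fraction f of execution time is sped up by a
 * factor s, the overall speedup is 1 / ((1 - f) + f / s). */
static double amdahl(double f, double s) {
    return 1.0 / ((1.0 - f) + f / s);
}

int main(void) {
    /* Enhance 80% of execution time by 10x. */
    printf("overall speedup: %.2f\n", amdahl(0.8, 10.0));
    /* Even with an infinitely fast enhancement, the limit is 1/(1-f) = 5x. */
    printf("upper bound:     %.2f\n", 1.0 / (1.0 - 0.8));
    /* Speedup from measured times: 100 s on machine Y vs 40 s on machine X. */
    printf("X vs Y:          %.2f\n", speedup(100.0, 40.0));
    return 0;
}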
Levels of the Memory Hierarchy
Upper level: smaller and faster; lower level: larger and slower.

Level              Staging unit      Transfer size   Managed by
CPU Registers      instr. operands   1-8 bytes       prog./compiler
L1 and L2 Cache    blocks            32-64 bytes     cache cntl
Main Memory        pages             4K-8K bytes     OS
Disk               files             Mbytes          user/operator
Tape               -                 -               -
Increasing throughput of a server computer via multiple processors or multiple disks
Detailed HW design:
Carry-lookahead adders use parallelism to speed up computing sums from linear to logarithmic in the number of bits per operand (a sketch follows this list)
Multiple memory banks searched in parallel in set-associative caches
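A sketch of a 4-bit carry-lookahead adder in C using the standard generate/propagate formulation (a modeling choice here, not code from the slides); each carry is computed directly from the g and p signals and the carry-in, which is what lets hardware evaluate the carries in logarithmic rather than linear depth:

#include <stdio.h>

/* 4-bit carry-lookahead adder: all carries come directly from
 * generate (g = a & b) and propagate (p = a ^ b) signals instead of
 * rippling bit by bit. */
static unsigned cla4(unsigned a, unsigned b, unsigned cin, unsigned *cout) {
    unsigned g = a & b;          /* bit i generates a carry  */
    unsigned p = a ^ b;          /* bit i propagates a carry */
    unsigned g0 = (g >> 0) & 1, p0 = (p >> 0) & 1;
    unsigned g1 = (g >> 1) & 1, p1 = (p >> 1) & 1;
    unsigned g2 = (g >> 2) & 1, p2 = (p >> 2) & 1;
    unsigned g3 = (g >> 3) & 1, p3 = (p >> 3) & 1;

    /* Lookahead equations: every carry depends only on g, p, and cin. */
    unsigned c1 = g0 | (p0 & cin);
    unsigned c2 = g1 | (p1 & g0) | (p1 & p0 & cin);
    unsigned c3 = g2 | (p2 & g1) | (p2 & p1 & g0) | (p2 & p1 & p0 & cin);
    *cout = g3 | (p3 & g2) | (p3 & p2 & g1) | (p3 & p2 & p1 & g0)
               | (p3 & p2 & p1 & p0 & cin);

    unsigned carries = (c3 << 3) | (c2 << 2) | (c1 << 1) | cin;
    return (p ^ carries) & 0xF;  /* sum bit i = p_i ^ c_i */
}

int main(void) {
    unsigned cout;
    unsigned s = cla4(0xB, 0x6, 0, &cout);  /* 11 + 6 = 17 = 0x11 */
    printf("sum = 0x%X, carry out = %u\n", s, cout);
    return 0;
}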
Pipelining: overlap instruction execution to reduce the total time to complete an instruction sequence.
Not every instruction depends on its immediate predecessor, so instructions can execute completely or partially in parallel.
Classic 5-stage pipeline: 1) Instruction Fetch (Ifetch), 2) Register Read (Reg), 3) Execute (ALU), 4) Data Memory Access (Dmem), 5) Register Write (Reg)
[Figure: four instructions in program order flowing through the 5-stage pipeline (Ifetch, Reg, ALU, DMem, Reg), each starting one clock cycle after its predecessor.]
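A small C sketch of the timing benefit shown in the figure, assuming an idealized pipeline that issues one instruction per cycle with no stalls, so n instructions finish in 5 + (n - 1) cycles instead of 5n:

#include <stdio.h>

/* Compare pipelined vs. purely sequential execution of n instructions
 * on an ideal 5-stage pipeline (one issue per cycle, no hazards). */
int main(void) {
    const int stages = 5;
    for (int n = 1; n <= 4; n++) {
        int pipelined  = stages + n - 1;
        int sequential = stages * n;
        printf("%d instr: %2d cycles pipelined vs %2d sequential (speedup %.2f)\n",
               n, pipelined, sequential, (double)sequential / pipelined);
    }
    return 0;
}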
Limits to pipelining
Hazards prevent the next instruction from executing during its designated clock cycle:
Structural hazards: attempt to use the same hardware to do two different things at once
Data hazards: instruction depends on the result of a prior instruction still in the pipeline (a small detection sketch follows this list)
Control hazards: caused by the delay between fetching instructions and deciding on changes in control flow (branches and jumps)
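A toy C sketch of RAW (read-after-write) data-hazard detection between adjacent instructions; the three-instruction sequence and its register numbering are hypothetical, chosen only to show the dependence check:

#include <stdio.h>

/* An instruction that reads a register written by its immediate
 * predecessor cannot read that register until the result is available. */
typedef struct { const char *text; int dest, src1, src2; } Instr;

int main(void) {
    /* Hypothetical 3-instruction sequence (register numbers only). */
    Instr prog[] = {
        { "add r1, r2, r3", 1, 2, 3 },
        { "sub r4, r1, r5", 4, 1, 5 },  /* reads r1: RAW hazard */
        { "and r6, r2, r7", 6, 2, 7 },  /* independent */
    };
    int n = sizeof prog / sizeof prog[0];
    for (int i = 1; i < n; i++) {
        int raw = prog[i].src1 == prog[i - 1].dest ||
                  prog[i].src2 == prog[i - 1].dest;
        printf("%-18s %s\n", prog[i].text,
               raw ? "RAW hazard with predecessor" : "no hazard");
    }
    return 0;
}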
END