Documente Academic
Documente Profesional
Documente Cultură
Budditha Hettige
Department of Statistics and Computer Science University of Sri Jayewardenepura
Multi-core architectures
Budditha Hettige
Single Computer
Budditha Hettige
Budditha Hettige
Budditha Hettige
Budditha Hettige
Why Multi-core
Difficult to make single-core clock frequencies even higher Deeply pipelined circuits:
heat problems speed of light problems difficult design and verification large design teams necessary server farms need expensive air-conditioning
Many new applications are multithreaded General trend in computer architecture (shift towards more parallelism)
Budditha Hettige 7
Instruction-level parallelism
Parallelism at the machine-instruction level The processor can re-order, pipeline instructions, split them into microinstructions, do aggressive branch prediction, etc. Instruction-level parallelism enabled rapid increases in processor speeds over the last 15 years
Budditha Hettige
Budditha Hettige
11
Budditha Hettige
12
More examples
Editing a photo while recording a TV show through a digital video recorder Downloading software while running an antivirus program Anything that can be threaded today will map efficiently to multi-core BUT: some applications difficult to parallelize
Budditha Hettige 13
Waiting for the result of a long floating point (or integer) operation Waiting for data to arrive from memory
L1 D-Cache D-TLB
Integer
Floating Point
Schedulers
Uop queues Rename/Alloc
BTB Trace Cache uCode ROM
Budditha Hettige
15
Without SMT, only a single thread can run at any given time
L1 D-Cache D-TLB
Integer
L2 Cache and Control
Floating Point
Schedulers
Bus
Without SMT, only a single thread can run at any given time
L1 D-Cache D-TLB
Integer
L2 Cache and Control
Floating Point
Schedulers
Bus
17
Integer
L2 Cache and Control
Floating Point
Schedulers
Bus
18
Integer
L2 Cache and Control
Floating Point
Schedulers
This scenario is impossible with SMT on a single core (assuming a single integer unit) 19
Bus
Integer
L2 Cache and Control
Floating Point
L2 Cache and Control
Integer
Floating Point
Rename/Alloc
BTB Trace Cache uCode ROM
Rename/Alloc
BTB Trace Cache uCode ROM
Budditha Hettige
Integer
L2 Cache and Control
Floating Point
L2 Cache and Control
Integer
Floating Point
Rename/Alloc
BTB Trace Cache uCode ROM
Rename/Alloc
BTB Trace Cache uCode ROM
Budditha Hettige
The number of SMT threads: 2, 4, or sometimes 8 simultaneous threads Intel calls them hyper-threads
Budditha Hettige 23
Integer
L2 Cache and Control
Floating Point
L2 Cache and Control
Integer
Floating Point
Rename/Alloc
BTB Trace Cache uCode ROM
Rename/Alloc
BTB Trace Cache uCode ROM
Budditha Hettige
Budditha Hettige
25
SMT
Can have one large and fast superscalar core Great performance on a single thread Mostly still only exploits instruction-level parallelism
Budditha Hettige 26
Multi-core chips:
L1 caches private L2 caches private in some architectures and shared in others
Budditha Hettige
27
Fish machines
Dual-core Intel Xeon processors
CORE1
hyper-threads
L1 cache
CORE0
L1 cache
L2 cache
memory
28
L1 cache L2 cache
L1 cache L2 cache
L1 cache
L2 cache L3 cache
memory
memory
Both L1 and L2 are private Examples: AMD Opteron, AMD Athlon, Intel Pentium D
Budditha Hettige
Budditha Hettige
30
Advantages of shared:
Threads on different cores can share the same cache data More cache space available if a single (or a few) high-performance thread runs on the system
Budditha Hettige 31
core 1
Budditha Hettige
32