Computer Organization
and Architecture
8th Edition
Chapter 18
Multicore Computers
Increase in Parallelism
Pipelining
Superscalar (multi-issue)
Simultaneous multithreading (SMT)
Diminishing returns
More complexity requires more logic
Increasing chip area for coordinating and
signal transfer logic
Harder to design, make and debug
Alternative Chip Organizations
http://www.cadalyst.com/files/cadalyst/nodes/2008/6351/i4.jpg
Intel Hardware Trends
Exponential speedup trend
ILP has come and gone
http://smoothspan.files.wordpress.com/2007/09/clockspeeds.jpg
http://www.ixbt.com/cpu/semiconductor/intel-65nm/power_density.jpg
Increased Complexity
Power requirements grow exponentially with chip
density and clock frequency
Can use more chip area for cache
Smaller
Order of magnitude lower power requirements
By 2015
http://techreport.com/r.x/core-i7/die-callout.jpg
http://www.tomshardware.com/reviews/core-duo-notebooks-trade-batterylife-quicker-response,1206-4.html
More action
Less action
We passed 50%!!!
Is this a RAM or a processor?
Increased Complexity
Pollack's rule: performance increase is roughly proportional to the square root of the increase in complexity
Cache
CPU
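A minimal worked sketch of Pollack's rule, assuming its usual statement (speedup is roughly the square root of the increase in logic complexity). The function names are hypothetical, and the multicore figure is an idealized upper bound for perfectly parallel work, not a measured result.

```python
import math


def pollack_speedup(complexity_factor: float) -> float:
    """Approximate single-core speedup when core logic grows by complexity_factor."""
    if complexity_factor <= 0:
        raise ValueError("complexity_factor must be positive")
    return math.sqrt(complexity_factor)


def ideal_multicore_speedup(core_count: int) -> float:
    """Idealized upper bound: core_count simple cores on perfectly parallel work."""
    if core_count < 1:
        raise ValueError("core_count must be at least 1")
    return float(core_count)


if __name__ == "__main__":
    for factor in (2, 4):
        single = pollack_speedup(factor)
        multi = ideal_multicore_speedup(factor)
        print(f"{factor}x logic in one core: about {single:.2f}x faster")
        print(f"{factor} simple cores instead: at most {multi:.0f}x faster")
```

Under these assumptions, spending the same extra transistors on more cores or on cache looks more attractive than making one core more complex.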
Multi-process applications
Oracle, SAP, PeopleSoft
Java applications
Java VM is multi-threaded with scheduling and memory
management (not so good at SSE)
Sun's Java Application Server, BEA's WebLogic, IBM
WebSphere, Tomcat
Multi-instance applications
One application running multiple times
Multicore Organization
Main design variables:
Number of core processors on chip (dual, quad ... )
Number of levels of cache on chip (L1, L2, L3, ...)
Amount of shared cache vs. not shared (1MB, 4MB, ...)
ARM11 MPCore
AMD Opteron
Intel Core i7
No shared
Shared
Core 2 Duo
2006
Two x86 superscalar, shared L2 cache
Dedicated L1 cache per core
32KB instruction and 32KB data
Core i7
November 2008
Four x86 SMT processors
Dedicated L2, shared L3 cache
Speculative pre-fetch for caches
On-chip DDR3 memory controller
Three 8-byte channels (192 bits) giving 32GB/s
No front side bus (just like labs 1 & 2 with the SDRAM
controller)
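The memory-controller figures above can be checked with a short calculation. The channel count and width come from the slide; the transfer rate of 1333 million transfers per second per channel is an assumption (it is the rate that reproduces the quoted 32GB/s), and the function name is hypothetical.

```python
def peak_bandwidth_gb_per_s(channels: int, bytes_per_channel: int,
                            transfers_per_second: float) -> float:
    """Peak bandwidth in GB/s: channels x bytes per transfer x transfers per second."""
    return channels * bytes_per_channel * transfers_per_second / 1e9


CHANNELS = 3                            # from the slide
BYTES_PER_CHANNEL = 8                   # from the slide
ASSUMED_TRANSFERS_PER_SECOND = 1.333e9  # assumption, not stated in the slide

# Total data-path width across all channels, in bits.
bus_width_bits = CHANNELS * BYTES_PER_CHANNEL * 8

peak = peak_bandwidth_gb_per_s(CHANNELS, BYTES_PER_CHANNEL,
                               ASSUMED_TRANSFERS_PER_SECOND)

if __name__ == "__main__":
    print(f"Bus width: {bus_width_bits} bits")
    print(f"Peak bandwidth: about {peak:.0f} GB/s")
```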
ARM11 MPCore
ARM vs. x86 and Microsoft
Intel started this fight by challenging ARM
with its Atom processor, which is moving
downmarket and towards
smartphones. Apparently, the major ARM
vendors are feeling the threat, are now
moving upmarket and are beginning to
make their run at low-end PCs and
storage appliances to put the pressure
back on Intel.
http://www.tgdaily.com/trendwatch-features/41561-the-coming-arm-vs-intel-pc-battle
ARM11 MPCore
Up to 4 processors each with own L1 instruction and data
cache
Distributed Interrupt Controller (DIC)
Recall the APIC from Intel's Core architecture
CPU interface
CPU
L1 cache
Snoop control unit
L1 cache coherency
http://barfblog.foodsafety.ksu.edu/DogObedienceTraining.jpg
ARM11 MPCore Block Diagram
DIC Routing
Interrupt States
Inactive
Non-asserted
Completed by that CPU but pending or active
in others
E.g. allgather
Pending
Asserted
Processing not started on that CPU
Active
Started on that CPU but not complete
Can be pre-empted by higher priority interrupt
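The three states above can be sketched as a small per-CPU state machine. This is an illustrative model only, assuming one state per CPU for a single interrupt and only the transitions named in the slide; the class and method names are hypothetical and do not correspond to the DIC's register interface. Priority-based pre-emption is not modelled.

```python
from enum import Enum


class IrqState(Enum):
    INACTIVE = "inactive"  # non-asserted, or completed by this CPU
    PENDING = "pending"    # asserted, processing not started on this CPU
    ACTIVE = "active"      # started on this CPU but not complete


class PerCpuInterrupt:
    """Tracks the state of one interrupt separately for each CPU."""

    def __init__(self, cpu_count: int) -> None:
        self.state = [IrqState.INACTIVE] * cpu_count

    def assert_on(self, cpu: int) -> None:
        """Interrupt is asserted for this CPU: inactive -> pending."""
        if self.state[cpu] is IrqState.INACTIVE:
            self.state[cpu] = IrqState.PENDING

    def start(self, cpu: int) -> None:
        """CPU begins processing: pending -> active."""
        if self.state[cpu] is not IrqState.PENDING:
            raise RuntimeError("interrupt is not pending on this CPU")
        self.state[cpu] = IrqState.ACTIVE

    def complete(self, cpu: int) -> None:
        """CPU finishes processing: active -> inactive."""
        if self.state[cpu] is not IrqState.ACTIVE:
            raise RuntimeError("interrupt is not active on this CPU")
        self.state[cpu] = IrqState.INACTIVE


if __name__ == "__main__":
    irq = PerCpuInterrupt(cpu_count=2)
    irq.assert_on(0)
    irq.assert_on(1)
    irq.start(0)
    irq.complete(0)
    # CPU 0 is done while the same interrupt is still pending on CPU 1.
    print([s.value for s in irq.state])
```

The demo shows the case the slide calls out: the interrupt is inactive on the CPU that completed it while still pending on another.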
Interrupt Sources
Inter-processor Interrupts (IPI)
Private to CPU
ID0-ID15 (16 IPIs per CPU as mentioned earlier)
Software triggered
Priority depends on receiving CPU not source
Hardware
Triggered by programmable events on associated
interrupt lines
Up to 224 lines
Start at ID32
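The ID ranges above can be sketched as a simple classifier. The IPI range (ID0-ID15) and the hardware range (224 lines starting at ID32) come from the slide; the function name and the "other" label for ID16-ID31, which this section does not describe, are assumptions made for illustration.

```python
IPI_FIRST_ID = 0
IPI_COUNT = 16            # ID0-ID15, from the slide
HARDWARE_FIRST_ID = 32    # from the slide
HARDWARE_MAX_LINES = 224  # from the slide

# Last hardware interrupt ID implied by the two figures above.
HARDWARE_LAST_ID = HARDWARE_FIRST_ID + HARDWARE_MAX_LINES - 1


def classify_interrupt(interrupt_id: int) -> str:
    """Return which source category an interrupt ID falls into."""
    if IPI_FIRST_ID <= interrupt_id < IPI_FIRST_ID + IPI_COUNT:
        return "ipi"
    if HARDWARE_FIRST_ID <= interrupt_id <= HARDWARE_LAST_ID:
        return "hardware"
    if IPI_FIRST_ID + IPI_COUNT <= interrupt_id < HARDWARE_FIRST_ID:
        return "other"  # ID16-ID31: not covered in this section
    raise ValueError(f"interrupt ID {interrupt_id} is out of range")


if __name__ == "__main__":
    for sample_id in (0, 15, 20, 32, HARDWARE_LAST_ID):
        print(sample_id, classify_interrupt(sample_id))
```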
Cache Coherency
Snoop Control Unit (SCU) resolves most shared
data bottleneck issues
Note: L1 cache coherency based on MESI, similar to
Intel's Core architecture
3. Migratory lines
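Since the L1 coherency above is described as MESI-based, a minimal sketch of the textbook MESI states for a single cache line may help. It assumes the generic protocol (Modified, Exclusive, Shared, Invalid) with one state per cache; it is not the SCU's actual implementation, and the class and method names are hypothetical. Write-back traffic and data transfer are not modelled.

```python
class MesiLine:
    """Textbook MESI state of one memory line as seen by each cache."""

    def __init__(self, cache_count: int) -> None:
        self.states = ["I"] * cache_count

    def read(self, cpu: int) -> None:
        """Local read: a miss loads the line as Shared or Exclusive."""
        if self.states[cpu] != "I":
            return  # read hit, no state change
        others_hold = False
        for other, state in enumerate(self.states):
            if other != cpu and state != "I":
                # A snooped read downgrades M or E copies to Shared.
                self.states[other] = "S"
                others_hold = True
        self.states[cpu] = "S" if others_hold else "E"

    def write(self, cpu: int) -> None:
        """Local write: invalidate all other copies, then hold as Modified."""
        for other in range(len(self.states)):
            if other != cpu:
                self.states[other] = "I"
        self.states[cpu] = "M"


if __name__ == "__main__":
    line = MesiLine(cache_count=2)
    line.read(0)
    print(line.states)  # only cache 0 holds the line
    line.read(1)
    print(line.states)  # both caches share it
    line.write(1)
    print(line.states)  # cache 1 owns a modified copy
```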