Computer Architecture
Introduction & General Trends

Course Outline (1)
Course: EE/CS 520 Computer Architecture
Semester: Fall 2012-13
Instructor: Adeel Pasha
Office: 9301A, EE Dept., SSE
Office Hours: Tue, Thu: 11:00 AM - 12:00 PM
Email: adeel.pasha@lums.edu.pk
TA: Ms. Arfeen Khalid (Section 1), TBD (Section 2)
EE/CS 520: Comp. Archi.
9/3/2012

Course Outline (2)
Prerequisites:
EE/CS 320 Computer Organization and Assembly Language
Delivery Method:
There will be two lectures (75 minutes each) and 1 tutorial (2 hrs.) per week.

Course Outline (3)
Course Description:
This course extends the concepts of computer organization and uniprocessor architecture learnt during EE/CS 320 to more advanced topics like:
  Advanced (out-of-order) pipelining
  Instruction Level Parallelism (ILP)
  Dynamic scheduling
  Superscalar and VLIW architectures
  Thread Level Parallelism (TLP)
  Multiprocessors
  Memory hierarchy, especially in multicores
  Storage systems and I/O devices

Course Outline (4)
Textbook:
Computer Architecture: A Quantitative Approach
by John L. Hennessy and David A. Patterson, 4th Edition

Course Outline (5)
Homework will comprise:
1) Problem-solving exercises
2) Research paper / product-oriented white paper reading and review. You have to submit a 1-2 page review that captures the essence of the paper and gives a critical analysis.

Course Outline (6)
Grading Scheme
  Quizzes:   15%
  Homework:  10%
  Midterm:   35%
  Final:     40%

Course Outline (7)
Fundamentals of Computer Architecture  Week 1
  Introduction to computer architecture, general trends
  Measuring performance of computers
Review: Basics of Instruction Set Architecture (ISA)  Week 2
  Classification
  Memory addressing
  Instruction encoding
  Contribution of compilers to ISA design
  Examples
Review: Basics of pipelining  Week 3, 4
  Pipelining?
  Limitations
  Hazards
  Static branch prediction
  Dynamic scheduling basics (Scoreboard)

Course Outline (8)
ILP and dynamic scheduling  Week 5, 6
  ILP
  Tomasulo's algorithm
  Dynamic branch prediction
Superscalar architecture  Week 7
  Superscalar
  VLIW
  Case study: Pentium 4
Mid-Term
End of ILP and start of TLP  Week 9
  Limitations to ILP
  Basics of TLP

Course Outline (9)
Cache review  Week 10
  Basics
  Types and organization
  Performance improvements
Multiprocessors and memory hierarchy  Week 11, 12
  Simultaneous multithreading (SMT)
  Multiprocessors on chip (MPC)
  Memory hierarchy
  Cache coherence
  Examples of MPC and multimedia processors
System protection  Week 13
  Virtual memory and virtual machines
Storage systems and I/Os  Week 14
Course Review  Week 15
Finals

Computing Systems Today
The world is a large computing system
Microprocessors in everything
[Figure: networked computing systems. Labels: clusters, high-speed network, Internet connectivity, databases, cloud computing, remote storage, online games, PDA, sensor nets, servers, cars, routers, robots]

Computer Architecture?
[Figure: abstraction layers, top to bottom]
  Application [Idea]
  Algorithm
  Programming Language
  Operating System / Virtual Machine
  Instruction Set Architecture (ISA)
  Microarchitecture
  Register Transfer Level (RTL)
  Gate Level
  Circuits & Devices
  Physics (Silicon) [Realization]
[Figure annotations]
  Original domain of the computer architect ('50s-'80s)
  Domain of recent computer architecture ('90s)
  Parallelism, multithreading, security, reliability, power, ... (mid-2000s onward)

Why Computer Architecture?
Exploits advancement in technology
  Make things faster, smaller, cheaper, ...
Enables new applications
  AVATAR 20 years ago?
Make new things possible
  Accurate one-month weather forecasts? Life-like virtual reality? Online interactive games?
Advancement in computer architecture leads to advancement in all other areas of computing, i.e. CS, CE and EE!

Why Study Computer Architecture?
Understand where computers are going
  Future capabilities drive the (computing) world (EE, CS, CE)!!
  Real-world impact: no computer architecture, no computers!
Understand high-level design concepts
  The best architects understand all the levels
  Devices, circuits, architectures, compilers, applications
  Get a (design or research) hardware job
Understand computer performance
  Writing well-tuned (fast) software requires knowledge of HW
  Best software designers understand hardware
  Get a (design or research) software job

Moore's Law
No. of transistors on a cost-effective integrated circuit doubles every 18 months

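As a rough illustration of the doubling rule stated for Moore's Law, the sketch below projects a transistor count forward in time. Only the 18-month doubling period comes from the slide; the function name `projected_transistors` and the starting count of 1 million are hypothetical choices made for this example.

```python
def projected_transistors(initial_count, years, doubling_period_years=1.5):
    """Project a transistor count that doubles every `doubling_period_years`.

    The 1.5-year (18-month) default is the figure quoted on the slide;
    the starting count passed in is an illustrative assumption.
    """
    return initial_count * 2 ** (years / doubling_period_years)


if __name__ == "__main__":
    # Hypothetical starting point: a chip with 1 million transistors.
    for years in (0, 3, 6, 9):
        print(years, "years:", round(projected_transistors(1_000_000, years)))
```

Under this rule, three years corresponds to two doublings, i.e. a 4x increase.
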
Evolution of Uniprocessors
If I double the number of people on a project, will it be sped up by 2x?
Similarly, 2x transistors does not automatically give 2x performance
Uniprocessor performance growth:
  1978-1986: 25%/year
  1986-2002: 52%/year
  2002-2006: 20%/year
Possible because of continued advances in computer architecture.
Much of computer architecture is about how you organize these resources to get more benefits
From Hennessy and Patterson, Computer Architecture: A Quantitative Approach, 4th ed.

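The annual growth rates quoted for uniprocessor performance can be turned into cumulative speedups and doubling times with simple compounding. The sketch below is illustrative only: the helper names `era_speedup` and `doubling_time_years` are made up for this example, and the calculation assumes a constant growth rate within each era.

```python
import math

# (start_year, end_year, annual_growth) as listed on the slide.
ERAS = [(1978, 1986, 0.25), (1986, 2002, 0.52), (2002, 2006, 0.20)]


def era_speedup(start_year, end_year, annual_growth):
    """Cumulative speedup from compounding `annual_growth` over the era."""
    return (1.0 + annual_growth) ** (end_year - start_year)


def doubling_time_years(annual_growth):
    """Years needed for performance to double at a constant annual growth rate."""
    return math.log(2.0) / math.log(1.0 + annual_growth)


if __name__ == "__main__":
    for start, end, growth in ERAS:
        print(f"{start}-{end}: x{era_speedup(start, end, growth):.1f} overall, "
              f"doubling every {doubling_time_years(growth):.1f} years")
```

At 52%/year the doubling time works out to roughly 1.7 years, compared with roughly 3.8 years at 20%/year.
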
New Concepts to Learn
Old concept: Power is free, transistors are expensive
New concept: "Power wall" - power expensive, transistors free?
Old concept: Sufficient increase in Instruction Level Parallelism via compilers, innovation (out-of-order, speculation, VLIW, ...)
New concept: "ILP wall" - law of diminishing returns
Old concept: Multipliers are slow, memory access is fast
New concept: "Memory wall" - memory is slow, multipliers are fast
  (200 clock cycles to access DRAM memory, 4 clocks for multiply)
Old concept: Uniprocessor performance 2X / 1.5 yrs
New concept: Power Wall + ILP Wall + Memory Wall = End of Uniprocessor
  Uniprocessor performance now 2X / 5(?) yrs
Sea change in chip design: multiple cores
  (2X processors per chip / ~2 years)
  Large no. of simple processors better than small no. of complex processors

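To make the memory wall concrete, the sketch below applies the usual stall-cycle accounting: effective CPI = base CPI + (memory references per instruction x miss rate x miss penalty). Only the 200-cycle DRAM access and the 4-cycle multiply come from the slide. The base CPI of 1.0, the 0.3 memory references per instruction, and the miss rates are assumed values chosen purely for illustration.

```python
DRAM_ACCESS_CYCLES = 200  # from the slide
MULTIPLY_CYCLES = 4       # from the slide


def effective_cpi(base_cpi, mem_refs_per_instr, miss_rate,
                  miss_penalty_cycles=DRAM_ACCESS_CYCLES):
    """Cycles per instruction once cache-miss stalls are included."""
    stall_cycles = mem_refs_per_instr * miss_rate * miss_penalty_cycles
    return base_cpi + stall_cycles


if __name__ == "__main__":
    # One DRAM access costs as many cycles as this many multiplies.
    print("multiplies per DRAM access:", DRAM_ACCESS_CYCLES // MULTIPLY_CYCLES)
    # Assumed workload: base CPI of 1.0 and 0.3 memory references per instruction.
    for miss_rate in (0.0, 0.01, 0.02, 0.05):
        cpi = effective_cpi(1.0, 0.3, miss_rate)
        print(f"miss rate {miss_rate:.0%}: CPI = {cpi:.2f}")
```

With these assumed numbers, even a 2% miss rate more than doubles the CPI, which is why memory behavior rather than arithmetic speed tends to dominate.
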
Sea Change in Chip Design
Intel 4004 (1971):
  4-bit processor,
  2312 transistors, 0.4 MHz,
  10 µm PMOS, 11 mm² chip
RISC II (1983):
  32-bit, 5-stage pipeline
  40,760 transistors, 3 MHz,
  3 µm NMOS, 60 mm² chip
Pentium 4 (2003):
  32-bit, 20+ stage pipeline
  55M transistors, 3.2 GHz
  90 nm, 101 mm² chip

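The transistor counts listed for these three chips allow a quick check of how fast integration actually grew. The sketch below computes the average doubling period implied by two data points. The helper name `implied_doubling_period` is hypothetical, and comparing a research chip with commercial products is only a rough comparison.

```python
import math

# (name, year, transistor_count) as listed on the slide.
CHIPS = [
    ("Intel 4004", 1971, 2_312),
    ("RISC II", 1983, 40_760),
    ("Pentium 4", 2003, 55_000_000),
]


def implied_doubling_period(year_a, count_a, year_b, count_b):
    """Average years per doubling of transistor count between two chips."""
    doublings = math.log2(count_b / count_a)
    return (year_b - year_a) / doublings


if __name__ == "__main__":
    (_, y0, n0), (_, y1, n1) = CHIPS[0], CHIPS[-1]
    period = implied_doubling_period(y0, n0, y1, n1)
    print(f"4004 -> Pentium 4: one doubling every {period:.1f} years")
```
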
End of the Race for Higher Frequencies
2004: Intel CEO Craig Barrett publicly apologized for not launching a Pentium at 4 GHz, but suggested other measures to be taken for performance improvement

The End of the Uniprocessor Era
Single biggest change in the history of computing systems

Advancements in Technology
No. of transistors not limiting factor
  Currently ~1 billion transistors/chip
  Problems:
    Too much power, heat, latency
    Not enough parallelism
The Intel Core i7 microprocessor (Nehalem)
  4 cores/chip
  45 nm
  731M transistors
  Shared L3 cache: 8 MB
  L2 cache: 1 MB (256K x 4)

Many-Core: The Future is Here!
Intel 80-core multicore chip (Feb. 2007)
  80 simple cores
  Two floating-point engines/core
  Mesh-like "network-on-a-chip"
  100 million transistors
  65 nm feature size

  Frequency   Voltage   Power   Bandwidth          Performance
  3.16 GHz    0.95 V    62 W    1.62 Terabits/s    1.01 Teraflops
  5.1 GHz     1.2 V     175 W   2.61 Terabits/s    1.63 Teraflops
  5.7 GHz     1.35 V    265 W   2.92 Terabits/s    1.81 Teraflops

"Many-Core" refers to many processors/chip
How to program these?
  Use 2 CPUs for video/audio
  Use 1 for word processor, 1 for browser
  What to do with the rest of 76?
Something new is clearly needed here

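The operating points listed for the 80-core chip can be compared on energy efficiency by dividing performance by power. The sketch below does that arithmetic. The tuple layout and the helper name `gflops_per_watt` are choices made for this example; the numbers are the ones on the slide.

```python
# (frequency_ghz, voltage_v, power_w, performance_tflops) rows from the slide.
OPERATING_POINTS = [
    (3.16, 0.95, 62.0, 1.01),
    (5.1, 1.2, 175.0, 1.63),
    (5.7, 1.35, 265.0, 1.81),
]


def gflops_per_watt(performance_tflops, power_w):
    """Energy efficiency in GFLOPS per watt."""
    return performance_tflops * 1000.0 / power_w


if __name__ == "__main__":
    for freq, volt, power, perf in OPERATING_POINTS:
        eff = gflops_per_watt(perf, power)
        print(f"{freq} GHz @ {volt} V: {eff:.1f} GFLOPS/W")
```

Efficiency drops from roughly 16 GFLOPS/W at the lowest operating point to roughly 7 GFLOPS/W at the highest. The extra frequency buys performance at a steep power cost.
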
Challenges with Sea Change
Algorithms, programming languages, compilers, operating systems, architectures, libraries, ... not ready to supply parallelism for 1000 CPUs/chip
Need a whole new approach
People have been working on parallelism for over 50 years without general success
Architectures not ready for 1000 CPUs/chip
  Unlike Instruction Level Parallelism, cannot be solved just by computer architects and compiler writers alone
  but also cannot be solved without participation of computer architects!!!

What to Learn in EE/CS 520
[Figure] The processor built in EE/CS 320 CO (a simple 5-stage RISC)
[Figure] What would be understood after taking CS 520 (superscalar, multicore systems with complex memory hierarchy, and virtual memory management)

What to Learn in EE/CS 520
[Figure: system overview with topic labels]
I/O and Storage: Disk Drive, CD/DVD-ROM, Tape, RAID
Memory Hierarchy: DRAM/SRAM, L2 Cache, L1 Cache
  Coherence, Bandwidth, Latency
Processor
  Instruction Set Architecture: Addressing, Protection
  Pipelining, Hazard Resolution, Superscalar, Reordering, Prediction, Speculation, Vector, Dynamic Scheduling
  ILP, TLP, SMT, MPC

What to Learn in EE/CS 520
Past, present and future trends in computer architecture
Advanced concepts in pipelining such as dynamic scheduling and ILP (Inst. Level Parallelism)
Compiler aspects of performance improvement
Principles and working of superscalar systems
Different concepts in Thread Level Parallelism (TLP)
Principles and working of multiprocessor systems
The future of computing systems
Memory hierarchy and virtual memory management
Critical analysis of research and white papers
  Helpful for both research- and development-oriented people
Interaction with a practical retargetable compilation flow for ASIPs (Application Specific Instruction-set Processors)