ASahu 1 ASahu 2
Multiprocessors are likely to be cost- and power-effective solutions
  Because they share many resources
  A personal room is costlier than a dormitory
Sharing resources gives rise to many other problems
  Coherence
    Shared data should be the same at all places
  Critical sections
    Lock and barrier design
  Consistency
    Order should be similar to serial execution (ROB)
  One processor interferes with others
    Share efficiently using some policy
(Part of ACA Course @ IITG)

Processes run on different processors independently
At some point they need to know the status of each other for
  Communication, mutual exclusion, etc.
Simple locking
  Lock(L)
  Critical Section(C)
  Unlock(L)
Simple Locking
lock:   ld  reg, loc    // copy location to reg
        cmp reg, #0     // compare with 0
        bnz lock        // if not zero, try again
        st  loc, #1     // store 1 at loc to mark it locked
        return
unlock: st  loc, #0     // store 0 at loc to mark it free
        return
Suppose two processors are contending to acquire the lock
  Both read the lock at the same time, see the value 0, and pass the branch
  Both mark the variable as locked and enter the CS
  This contradicts the meaning of a LOCK

The instruction sequence above is not atomic
A hardware primitive for an atomic read + write is required, e.g.
  Test&Set          // test for unlocked (0), then set the lock (1)
  Exchange
  Fetch&Increment
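The failure of the simple lock can be checked by enumerating every interleaving of two processors. The sketch below is a toy model, not real hardware: it assumes each processor takes two shared-memory steps (the load and the store, with `cmp` and `bnz` acting only on a private register), and the names `run`, `all_schedules`, and `violating_schedules` are made up for this illustration.

```python
from itertools import combinations


def run(schedule):
    """Replay one interleaving of two processors running the simple lock.

    schedule is a tuple of processor ids; each appearance lets that processor
    take its next shared-memory step. Returns the processors that entered the CS.
    """
    loc = 0                  # shared lock word: 0 = free, 1 = locked
    pc = {0: 0, 1: 0}        # 0 = about to load, 1 = about to store, 2 = done
    in_cs = set()
    for p in schedule:
        if pc[p] == 0:       # ld reg, loc ; cmp reg, #0 ; bnz lock
            # Saw 1: the processor would spin; the model stops following it.
            pc[p] = 1 if loc == 0 else 2
        elif pc[p] == 1:     # st loc, #1 ; return
            loc = 1
            in_cs.add(p)
            pc[p] = 2
    return in_cs


def all_schedules():
    """Every merge of two steps of processor 0 with two steps of processor 1."""
    for slots in combinations(range(4), 2):
        yield tuple(0 if i in slots else 1 for i in range(4))


def violating_schedules():
    """Schedules in which both processors end up in the critical section."""
    return [s for s in all_schedules() if len(run(s)) == 2]
```

Calling `violating_schedules()` lists the interleavings in which both loads happen before either store, which is exactly the case described on the slide.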
CS521 CSE IITG 11/23/2012
Lock: 0 indicates free and 1 indicates locked
Code to lock X:
        r2 <- 1
lockit: r2 <-> X              ; atomic exchange
        if (r2 != 0) lockit   ; already locked
Spinning: try to test and acquire the lock in a tight loop
Locks are cached for efficiency; coherence is used
Better code to lock X:
lockit: r2 <- X               ; read lock
        if (r2 != 0) lockit   ; not available
        r2 <- 1
        r2 <-> X              ; atomic exchange
        if (r2 != 0) lockit   ; already locked

Load Linked / Store Conditional
        LL r1, X              ; read from location X
        ...                   ; do some operation
        SC r3, X              ; store r3 to location X; r3 == 0 if unsuccessful
The store is unsuccessful if the value of X is changed by another processor between the time of the LL and the time of the SC
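The "better code to lock X" (read first, exchange only when the lock looks free) can be sketched with threads. Python has no user-level atomic exchange, so the `AtomicWord` class below is an assumption of this sketch: its internal `threading.Lock` only stands in for the atomicity that the hardware exchange instruction would provide. The names `AtomicWord`, `lock`, `unlock`, and `run_demo` are hypothetical.

```python
import threading
import time


class AtomicWord:
    """Stand-in for a memory word with a hardware atomic exchange."""

    def __init__(self, value=0):
        self._value = value
        self._hw = threading.Lock()   # models hardware atomicity only

    def load(self):
        return self._value

    def exchange(self, new):
        """Atomically write `new` and return the previous value."""
        with self._hw:
            old, self._value = self._value, new
            return old


def lock(x):
    while True:
        while x.load() != 0:      # read lock: spin on plain reads
            time.sleep(0)         # yield so other Python threads can run
        if x.exchange(1) == 0:    # atomic exchange; 0 means we acquired it
            return


def unlock(x):
    x.exchange(0)                 # mark the lock free again


def run_demo(num_threads=4, iters=200):
    """Increment a shared counter under the lock and return the final count."""
    x = AtomicWord(0)
    counter = [0]

    def worker():
        for _ in range(iters):
            lock(x)
            counter[0] += 1       # critical section
            unlock(x)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter[0]
```

The inner read-only loop is the point of the improved version: waiting processors spin on their cached copy and only issue the expensive exchange when the lock appears free.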
Simpler to implement
Atomic exchange using LL and SC
try:    r3 <- r2              ; move exchange value
        LL r1, X              ; load locked
        SC r3, X              ; store conditional
        if (r3 == 0) try      ; branch if store fails
        r2 <- r1              ; put loaded value in r2
Fetch&increment using LL and SC
try:    LL r1, X              ; load locked
        r3 <- r1 + 1          ; increment
        SC r3, X              ; store conditional
        if (r3 == 0) try      ; branch if store fails

lockit: LL r2, X              ; load locked
        if (r2 != 0) lockit   ; not available
        r2 <- 1
        SC r2, X              ; store conditional
        if (r2 == 0) lockit   ; branch if store fails
Spin lock with exponential backoff reduces contention
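The LL/SC sequences above can be tried out on a small single-threaded model. This is only a sketch under stated assumptions: `LLSCWord` is a made-up class in which a successful store breaks every outstanding link, and the `cpu` argument is a hypothetical processor id used to track who holds a link. Real hardware tracks the link per cache line and may fail an SC for other reasons as well.

```python
class LLSCWord:
    """Toy model of one memory word supporting load-linked / store-conditional."""

    def __init__(self, value=0):
        self.value = value
        self._linked = set()      # cpus whose link is still valid

    def load_linked(self, cpu):
        self._linked.add(cpu)
        return self.value

    def store_conditional(self, cpu, new):
        """Return 1 on success and 0 on failure, like r3 on the slide."""
        if cpu not in self._linked:
            return 0
        self.value = new
        self._linked.clear()      # a successful store breaks every link
        return 1


def atomic_exchange(x, cpu, new):
    """Swap `new` into x and return the old value, retrying until SC succeeds."""
    while True:
        old = x.load_linked(cpu)
        if x.store_conditional(cpu, new):
            return old


def fetch_and_increment(x, cpu):
    """Add 1 to x and return the value seen before the increment."""
    while True:
        old = x.load_linked(cpu)
        if x.store_conditional(cpu, old + 1):
            return old
```

If two processors both execute `load_linked` and then both attempt `store_conditional`, only the first store succeeds; the second returns 0 and its loop retries, which is what makes the read-modify-write appear atomic.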
Spinning wastes time
Recall the MAC protocol
  Non-persistent CSMA protocol
  Wait a random time if the medium is busy, then send
Spin lock with exponential backoff reduces contention
  Wait k amount of time for the 1st attempt
  Wait k*c^i amount of time for the i-th attempt
[Figure: time versus number of threads for the TAS lock and the backoff lock]

DoWork() {
    Do Phase I work
    Barrier();
    Do Phase II work
    Barrier();
    Do Phase III work
    Barrier();
}
for (i = 0; i < NumProc; i++) {
    P[i].Start(DoWork);
}
Print result();
[Figure: Initialize; Phase I, Phase II, and Phase III each run on processors 1, 2, 3, ..., N; Print Result]
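The backoff schedule above can be written out directly. The sketch below assumes attempts are numbered from i = 0, so that the first wait is k and the i-th wait is k*c^i; `backoff_delays` and `acquire_with_backoff` are hypothetical names, and `try_acquire` is any caller-supplied function that returns True when the lock was obtained.

```python
import time


def backoff_delays(k, c, attempts):
    """Waiting time before each retry: k, k*c, k*c**2, ..."""
    return [k * c ** i for i in range(attempts)]


def acquire_with_backoff(try_acquire, k=0.001, c=2, max_attempts=10,
                         sleep=time.sleep):
    """Retry try_acquire(), waiting longer after every failed attempt.

    Returns True once the lock is obtained, False if every attempt failed.
    The sleep function is a parameter so the waits can be observed in tests.
    """
    for delay in backoff_delays(k, c, max_attempts):
        if try_acquire():
            return True
        sleep(delay)
    return False
```

For example, `backoff_delays(1, 2, 4)` gives `[1, 2, 4, 8]`: each failed attempt doubles the time a processor stays away from the lock, so fewer processors hammer it at the same moment.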
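Barrier() {
    lock(X)
    if (count == 0) release <- 0
    count++
    unlock(X)
    if (count == total) { count <- 0; release <- 1 }
    else spin until (release == 1)
}
If all processors have completed the phase, they go on to the next phase
Everyone accesses the same lock
[Figure: a single Lock shared by processors P1, P2, P3, ..., PN]

[Figure: tree of locks. Processors P1 ... PM under Lock1 and under Lock2 feed a top-level Lock that holds the sum of all finished processors]
Lock contention is distributed in a tree fashion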
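The centralized barrier above can be sketched with Python threads for a single barrier episode. The names `CentralBarrier` and `run_two_phases` are made up for this illustration. One deliberate deviation from the slide, stated as an assumption: the arrival count is copied while the lock is still held, so the comparison with `total` does not read `count` after the unlock.

```python
import threading
import time


class CentralBarrier:
    """Counter-based barrier with a shared release flag (single use)."""

    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = 0
        self.x = threading.Lock()

    def wait(self):
        with self.x:                      # lock(X) ... unlock(X)
            if self.count == 0:
                self.release = 0
            self.count += 1
            arrived = self.count          # snapshot taken under the lock
        if arrived == self.total:         # last arrival releases everyone
            self.count = 0
            self.release = 1
        else:
            while self.release != 1:      # spin until released
                time.sleep(0)             # yield to other Python threads


def run_two_phases(num_threads=4):
    """Each thread logs Phase I, waits at the barrier, then logs Phase II."""
    bar = CentralBarrier(num_threads)
    log = []

    def work(i):
        log.append(("I", i))
        bar.wait()
        log.append(("II", i))

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

In the returned log every Phase I entry precedes every Phase II entry. The sketch uses the barrier only once on purpose; reusing the same object for a second phase is not safe with this design.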
Do Phase I work
Barrier(bar1, p);
// After this, release = 1, but it may not yet be visible to all:
// one process may not have seen it and is still waiting
// while another has already entered the NEXT barrier
Do Phase II work
Barrier(bar1, p);
// Some processes, but not all, enter this barrier, so the barrier never ends.

local_sense <- !local_sense   // toggle
lock(X)
count++
unlock(X)
if (count == total)
    { count <- 0; release <- local_sense }
else
    { spin until (release == local_sense) }
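The sense-toggling barrier above can be reused across phases, which the following sketch tries to show with Python threads. `SenseBarrier` and `run_phases` are hypothetical names; the per-thread `local_sense` is kept in `threading.local`, and, as an assumption of this sketch, the arrival count is copied while the lock is held.

```python
import threading
import time


class SenseBarrier:
    """Reusable barrier: each episode waits for release to equal a toggled sense."""

    def __init__(self, total):
        self.total = total
        self.count = 0
        self.release = False
        self.x = threading.Lock()
        self._local = threading.local()   # holds each thread's local_sense

    def wait(self):
        # local_sense <- !local_sense (starts as False for every thread)
        sense = not getattr(self._local, "sense", False)
        self._local.sense = sense
        with self.x:
            self.count += 1
            arrived = self.count
        if arrived == self.total:
            self.count = 0                # reset before releasing the others
            self.release = sense
        else:
            while self.release != sense:
                time.sleep(0)             # yield to other Python threads


def run_phases(num_threads=4, phases=3):
    """Every thread logs its phase number, then waits at the same barrier."""
    bar = SenseBarrier(num_threads)
    log = []

    def work(i):
        for phase in range(phases):
            log.append((phase, i))
            bar.wait()

    threads = [threading.Thread(target=work, args=(i,))
               for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Because a fast thread entering the next episode waits for the opposite value of `release`, it never resets the flag that a slow thread from the previous episode is still spinning on.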
Various models: highly recommended by the HP book
http://rsim.cs.uiuc.edu/~sadve/Publications/models_tutorial.ps

Which statements among S1 and S2 are done?
Both S1 and S2 may be done if writes are delayed
X -> Y: operation X must complete before operation Y is done
Sequential consistency requires: R -> W, R -> R, W -> R, W -> W
Relax W -> R
  Total store ordering
Relax W -> W
  Partial store order
Relax R -> W and R -> R
  Weak ordering and release consistency
The consistency model is multiprocessor specific
Programmers will often implement explicit synchronization

Loads are allowed to overtake stores
  Write buffering is permitted
1. Total Store Ordering: writes are atomic
2. Processor Consistency: writes need not be atomic
   Invalidations may propagate gradually
S1: X = 10      S2: Y = 10
FENCE           FENCE       // point of no write buffering
L1: R1 = Y      L2: R2 = X
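The effect of the FENCE in the S1/L1 and S2/L2 example can be explored by brute force. The sketch below is a simplified model, not a description of any real machine: with `fence=True` each processor keeps its store before its load, and with `fence=False` the load is additionally allowed to overtake the buffered store, which is modelled as swapping the two operations. X and Y are assumed to start at 0, and all function names are made up.

```python
P1 = [("store", "X", 10), ("load", "Y", "R1")]   # S1 then L1
P2 = [("store", "Y", 10), ("load", "X", "R2")]   # S2 then L2


def interleavings(a, b):
    """All merges of lists a and b that keep each list's own order."""
    if not a or not b:
        yield list(a) + list(b)
        return
    for rest in interleavings(a[1:], b):
        yield [a[0]] + rest
    for rest in interleavings(a, b[1:]):
        yield [b[0]] + rest


def execute(trace):
    """Run one global order of operations and return (R1, R2)."""
    mem = {"X": 0, "Y": 0}
    regs = {}
    for op, addr, arg in trace:
        if op == "store":
            mem[addr] = arg
        else:
            regs[arg] = mem[addr]
    return regs["R1"], regs["R2"]


def outcomes(fence):
    """Set of (R1, R2) results reachable with or without the FENCE."""
    orders1 = [P1] if fence else [P1, P1[::-1]]
    orders2 = [P2] if fence else [P2, P2[::-1]]
    found = set()
    for a in orders1:
        for b in orders2:
            for trace in interleavings(a, b):
                found.add(execute(trace))
    return found
```

With the fence, the result R1 = 0 and R2 = 0 never appears; without it, that result becomes reachable because both loads can run before either store is visible.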
Partial Store Ordering
  Loads are allowed to overtake stores
  Writes can be reordered
  Memory barriers or fences are used to explicitly order any operations
  Further improves performance

P1              P2
A = 1;          while (flag == 0);
flag = 1;       print A;
SC ensures that 1 is printed; TSO and PC also do so; PSO does not

P1              P2
A = 1;          print B;
B = 1;          print A;
SC ensures that if B is printed as 1 then A is also printed as 1; TSO and PC also do so; PSO does not

1. TSO: writes are atomic
2. PC: writes need not be atomic
   Invalidations may propagate gradually
3. PSO: writes can be reordered
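The A/flag example above can be checked with a very small model. This is a sketch under simplifying assumptions: the only thing that differs between models is the order in which P1's two stores become visible to P2, PSO is the only model here that may swap them, and a fence between the stores restores program order. The function names are hypothetical.

```python
def printed_values(visible_order):
    """Values P2 may print for A, given the order P1's stores become visible."""
    mem = {"A": 0, "flag": 0}
    seen = set()
    for addr in visible_order:
        mem[addr] = 1
        if mem["flag"] == 1:      # P2 can leave its while loop from here on
            seen.add(mem["A"])
    return seen


def outcomes(model, fence=False):
    """Possible printed values of A under SC, TSO, PC, or PSO."""
    orders = [("A", "flag")]                  # program order of P1's stores
    if model == "PSO" and not fence:
        orders.append(("flag", "A"))          # PSO may reorder the two writes
    found = set()
    for order in orders:
        found |= printed_values(order)
    return found
```

Here `outcomes("TSO")` contains only 1, while `outcomes("PSO")` also contains 0; passing `fence=True` models a memory barrier between `A = 1` and `flag = 1` and removes the 0 again.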
P1          P2                  P3
A = 1;      while (A == 0);     while (B == 0);
            B = 1;              print A;
SC ensures that 1 is printed. TSO and PSO also do that, but PC does not
// PSO does, as there is only one write per process

P1          P2
A = 1;      B = 1;
print B;    print A;
SC ensures that both cannot be printed as 0. TSO, PC and PSO do not,
as loads are allowed to overtake stores

1. TSO: writes are atomic
2. PC: writes need not be atomic
   Invalidations may propagate gradually
3. PSO: writes can be reordered

Weak Ordering or Weak Consistency
  Loads and stores are not restricted to follow an order
  Explicit synchronization primitives are used
  Synchronization primitives follow a strict order
  Easy to achieve
  Low overhead
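The three-processor example (P1 writes A, P2 waits for A and writes B, P3 waits for B and prints A) depends on whether writes are atomic. The sketch below models that with three visibility events and is only an illustration under stated assumptions: an atomic write is taken to mean that `A = 1` reaches P2 and P3 with no other event in between, and the event names and `printed_by_p3` are made up.

```python
from itertools import permutations

# "A->P2" means P1's write A = 1 becomes visible to P2, and so on.
EVENTS = ("A->P2", "A->P3", "B->P3")


def printed_by_p3(atomic_writes):
    """Values P3 may print for A, with atomic or non-atomic writes."""
    results = set()
    for order in permutations(EVENTS):
        pos = {event: i for i, event in enumerate(order)}
        if pos["B->P3"] < pos["A->P2"]:
            continue      # P2 issues B = 1 only after it has seen A = 1
        if atomic_writes and abs(pos["A->P2"] - pos["A->P3"]) != 1:
            continue      # an atomic write reaches P2 and P3 together
        if pos["A->P3"] < pos["B->P3"]:
            results.add(1)            # A = 1 is visible when P3 leaves its loop
        else:
            results.update({0, 1})    # P3 may print before or after A arrives
    return results
```

`printed_by_p3(True)` gives `{1}`, matching the models whose writes are atomic, and `printed_by_p3(False)` gives `{0, 1}`, matching PC, where the invalidation for A may reach P3 later than the write of B.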
Further relaxation of weak ordering
Synch primitives are divided into acquire and release operations
R/W operations after an acquire cannot move before it, but those before it can be moved after
R/W operations before a release cannot move after it, but those after it can be moved before

[Figure: WC versus RC. WC: R/W block 1, synch, R/W block 2, synch, R/W block 3. RC: R/W block 1, acquire, R/W block 2, release, R/W block 3]
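The movement rules for weak ordering (WC) and release consistency (RC) can be summarized as a small predicate. This is a sketch with made-up names: operations are the strings "rw", "synch", "acquire", and "release", and as a simplifying assumption two ordinary R/W operations are always treated as reorderable, ignoring data dependences.

```python
def may_swap(first, second, model):
    """True if `second` may be performed before `first`.

    `first` precedes `second` in program order; model is "WC" or "RC".
    """
    if first == "rw" and second == "rw":
        return True                       # ordinary accesses are unordered
    if model == "WC":
        return False                      # nothing crosses a synch operation
    if model == "RC":
        if (first, second) == ("rw", "acquire"):
            return True                   # earlier R/W may move after an acquire
        if (first, second) == ("release", "rw"):
            return True                   # later R/W may move before a release
        return False                      # all other pairs keep program order
    raise ValueError("unknown model: " + model)
```

For instance, `may_swap("acquire", "rw", "RC")` is False because an access inside the critical section must not escape above the acquire, while `may_swap("rw", "acquire", "RC")` is True, which is the extra freedom RC has over WC.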