Introduction
• Grading:
10 = 6 (coursework) + 4 (exam)
https://cipsm.hpc.pub.ro
@ ciprian.dobre@upb.ro
http://www.facebook.com/ciprian.dobre
Bibliography
G.R. Andrews, Concurrent Programming: Principles and Practice
G.R. Andrews, R.A. Olsson, The SR Programming Language: Concurrency in Practice
C.A.R. Hoare, Communicating Sequential Processes
S.G. Akl, The Design and Analysis of Parallel Algorithms
Ian Foster, Designing and Building Parallel Programs
A.S. Tanenbaum, Structured Computer Organization (Fourth Edition)
Allen B. Do..., The Little Bo... Semapho...
Wan Fokk..., Distribut... Algorithms... Intuitive App... (2nd editi...)
Nancy A. L..., Distributed Alg...
Why parallel and distributed computing?
Shorter execution time
Makes large problems tractable
Access to remote resources
Lower costs
Fault tolerance
Hiding wait times (latency)
Redundancy
Scalability
Lower response time
Security
The limits of sequential programming?
[Figure; surviving caption fragment: "cofounded Intel"]
The limits of sequential programming
Transmission speed
‒ At most c, the speed of light
Miniaturization
‒ A transistor the size of an atom
Economics
‒ Enormous costs for researching and designing a new type of processor
[Figure: CMOS]
Why this course matters
More and more distributed technologies are appearing
‒ Blockchain; Peer-to-Peer
Even a watch processor has multiple cores
‒ LG Watch Sport - MSM8909w Processor
o Quad-Core
Parallel/Distributed vs Sequential Algorithms
Physical resources
Processor – multi-core
High-End Desktop (HEDT)   MSRP / Retail     Cores / Threads   Base / Boost GHz   L3 Cache (MB)   Memory
Threadripper 3990X        $3,990 / $3,750   64 / 128          2.9 / 4.3          256             Quad DDR4-3200
Intel W-3175X             $2,999 / N/A      28 / 56           3.1 / 4.8          38.5            Six-Channel DDR4-2666
Cluster
Grid/Cloud
Flynn's taxonomy
SISD
‒ Single Instruction Stream, Single Data Stream
‒ The classic single-core computer
SIMD
‒ Single Instruction Stream, Multiple Data Streams
‒ SSE support; GPU processors
MISD
‒ Multiple Instruction Streams, Single Data Stream
‒ Specialized systems
MIMD
‒ Multiple Instruction Streams, Multiple Data Streams
‒ Today's processors (the ones you have at home and in your pocket)
SISD
Classic model: the von Neumann architecture
[Diagram: control unit, processor, and memory; an instruction stream and a data stream connect them]
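SIMD
[Diagram: one control unit C and processors P1, P2, ..., Pn, each with its own memory M]
MISD
[Diagram: processors P1, P2, ..., Pn connected to a single memory M]
MIMD
[Diagram: processors P1, P2, ..., Pn with memories M1, M2, ..., Mn; a control unit Cn is shown next to Pn]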
Shared memory
‒ Parallel Random Access Machine (PRAM)
[Diagram: memory modules M]
Depends on
‒ The application
‒ The desired performance
‒ The number of available processors
Examples: IBM 9000, Cray C90, Fujitsu VP
Types of parallelism
Bit-level
Instruction-level
Task-level
‒ Multi-tasking (processes can communicate as well)
‒ Multi-threading
How can two processes communicate?
Pseudocode notation
co S1 || S2 || ... || Sn oc
Ex. 1:
x=0; y=0;
co x=x+1 || y=y+1 oc
z=x+y;
co [quantifier] { Sj }
Ex. 2:
co [j=1 to n] {a[j]=0; b[j]=0;}
process Name[quantifiers] { Sj }
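One possible reading of the co ... oc notation, sketched in Python: every branch runs in its own thread, and execution continues past oc only once all branches have finished. The helper name co, the state dictionary, and the 0-based indices are assumptions made for this sketch; the slides define only the pseudocode.

```python
import threading

def co(*branches):
    """Run each branch (a zero-argument callable) in its own thread, then wait for all."""
    threads = [threading.Thread(target=branch) for branch in branches]
    for t in threads:
        t.start()
    for t in threads:          # 'oc': continue only after every branch has finished
        t.join()

# Ex. 1: co x=x+1 || y=y+1 oc
state = {"x": 0, "y": 0}

def inc_x():
    state["x"] = state["x"] + 1

def inc_y():
    state["y"] = state["y"] + 1

co(inc_x, inc_y)
z = state["x"] + state["y"]

# Ex. 2: co [j=1 to n] {a[j]=0; b[j]=0;}  (indices are 0-based here)
n = 5
a = [1] * n
b = [1] * n

def clear(j):
    a[j] = 0
    b[j] = 0

co(*[lambda j=j: clear(j) for j in range(n)])

print(z, a, b)
```

The branches in both examples touch disjoint variables, so the result does not depend on how the threads are scheduled.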
Threads vs cores
Processes and threads
• Process
An instance of a program in execution
For each process, the OS allocates:
‒ A memory space (program code, data area, stack)
‒ Control of certain resources (files, I/O devices, ...)
[Diagram: Process 1 and Process 2, each with its own Process Control Block, Code, Data, and Stack]
Thread (thread of execution)
[Diagram: Process 1 has a Process Control Block, Code, Data, and a single Stack. Process 2 has a Process Control Block, Code, and Data shared by Thread 1 and Thread 2; each thread has its own Thread Control Block and its own Stack]
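The split shown in the diagram (shared code and data, one stack per thread) can be observed from a program. A minimal sketch, assuming CPython's threading module; the names shared and worker are made up for this example.

```python
import threading

shared = []                     # one object in the process's data, visible to every thread

def worker(name):
    local_total = 0             # local variable: each thread gets its own copy on its own stack
    for i in range(3):
        local_total += i
    shared.append((name, local_total))   # both threads append to the same list

threads = [threading.Thread(target=worker, args=(name,)) for name in ("t1", "t2")]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(shared))
```

Each thread computes its own local_total without disturbing the other, while both results end up in the single shared list.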
Thread execution
[Diagram: thread execution over time t, followed by what happens in fact]
Hyperthreading – the confusion
Multi-tasking vs Multi-threading
Multi-tasking
[Animation: Task 1 and Task 2 alternate]
Multi-threading
[Animation: Thread 1 to Thread 4 run in changing order, at times two side by side]
Multi-tasking vs Multi-threading
Thread 1    Thread 2
a=a+2       a=a+2
Race condition
Both threads execute a=a+2 through a private register eax, with their steps interleaved:
a    Thread 1    Thread 2
0    eax = 0     eax =
0    eax = 0     eax = 0
0    eax = 2     eax = 0
0    eax = 2     eax = 2
2    eax = 2     eax = 2
Final value: a=2. One of the two increments is lost.
One step further...
Race condition
The same two threads, with Thread 1 finishing before Thread 2 starts:
a    Thread 1    Thread 2
0    eax =       eax =
0    eax = 0     eax =
0    eax = 2     eax =
2    eax = 2     eax =
2    eax = 2     eax = 2
2    eax = 2     eax = 4
4    eax = 2     eax = 4
Final value: a=4.
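The two traces can be replayed deterministically. The sketch below is a small simulator, not real threading: each thread runs the three steps load, add, write, and a schedule lists which thread takes its next step. The function name run and the schedule encoding are assumptions for illustration.

```python
def run(schedule):
    """Replay a=a+2 for two threads; schedule lists which thread executes its next step."""
    a = 0
    eax = {1: None, 2: None}    # one private register per thread
    pc = {1: 0, 2: 0}           # next step of each thread: 0 = load, 1 = add, 2 = write
    for tid in schedule:
        step = pc[tid]
        if step == 0:
            eax[tid] = a        # load(a, eax)
        elif step == 1:
            eax[tid] += 2       # eax = eax + 2
        elif step == 2:
            a = eax[tid]        # write(a, eax)
        pc[tid] += 1
    return a

interleaved = run([1, 2, 1, 2, 1, 2])   # both threads load a=0 before either writes
sequential = run([1, 1, 1, 2, 2, 2])    # Thread 1 finishes before Thread 2 starts

print(interleaved, sequential)  # 2 4
```

With real threads the schedule is chosen by the operating system, so the same program can produce either result from one run to the next.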
Synchronization primitives
Atomics
Semaphore
‒ Binary semaphore (Mutex)
‒ Critical section
Barrier
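Of the primitives listed, the barrier is the one not illustrated later in the slides. A minimal sketch using Python's threading.Barrier; the two-phase workload and the names phase1, phase2, and worker are assumptions chosen for the example.

```python
import threading

N = 4
barrier = threading.Barrier(N)
phase1 = [None] * N
phase2 = [None] * N

def worker(i):
    phase1[i] = i * i                    # phase 1: each thread fills in its own slot
    barrier.wait()                       # no thread continues until all N have arrived
    phase2[i] = phase1[(i + 1) % N]      # phase 2: now safe to read a neighbour's slot

threads = [threading.Thread(target=worker, args=(i,)) for i in range(N)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(phase2)  # [1, 4, 9, 0]
```

Without the barrier, a thread could read its neighbour's slot before the neighbour had written it and see None.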
Atomics
Consider the 64-bit operation C = A + B on a 64-bit processor:
load(A, eax)
load(B, ebx)
eax = eax + ebx
write(C, eax)
Atomics
Consider the 64-bit operation C = A + B on a 32-bit processor:
load(A[0], eax)
load(B[0], ebx)
eax = eax + ebx
write(C[0], eax)    // at this point only half of C may be modified
load(A[1], eax)
load(B[1], ebx)
eax = eax + ebx
write(C[1], eax)
Atomicity ensures that C becomes visible either fully modified or fully unmodified.
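The half-modified state can be made concrete with numbers. The sketch below models a 64-bit value as two 32-bit words and performs the update one word at a time, as a 32-bit processor would; the helper names split and join and the sample values are assumptions for illustration, and no real concurrency is involved.

```python
MASK32 = 0xFFFFFFFF

def split(value):
    """Split a 64-bit value into [low word, high word]."""
    return [value & MASK32, (value >> 32) & MASK32]

def join(words):
    """Reassemble a 64-bit value from [low word, high word]."""
    return words[0] | (words[1] << 32)

old = 0x00000001_FFFFFFFF
new = old + 1                   # 0x00000002_00000000

C = split(old)
C[0] = split(new)[0]            # write(C[0], ...): only the low half is updated so far
torn = join(C)                  # what a concurrent reader could observe at this point
C[1] = split(new)[1]            # write(C[1], ...): the update is now complete
final = join(C)

print(hex(old), hex(torn), hex(final))
```

The intermediate value is neither the old nor the new one, which is exactly what an atomic update rules out.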
Mutual exclusion - mutex
Mutual exclusion – Dekker's solution
wants_to_enter[0] = true
while (wants_to_enter[1]) {
    if (turn != 0) {
        wants_to_enter[0] = false
        while (turn != 0) {
            // busy wait
        }
        wants_to_enter[0] = true
    }
}
// critical section ...
turn = 1
wants_to_enter[0] = false
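The listing above shows only process 0. A hedged sketch of both sides in Python follows; it relies on standard CPython, where the global interpreter lock makes the plain reads and writes behave in program order, and it is not a pattern for production code. The counter, the iteration count, and the time.sleep(0) calls (which hand the interpreter to the other thread while spinning) are assumptions added for the example.

```python
import threading
import time

wants_to_enter = [False, False]
turn = 0
counter = 0                     # shared data protected by the critical section

def worker(me, iterations):
    global turn, counter
    other = 1 - me
    for _ in range(iterations):
        wants_to_enter[me] = True
        while wants_to_enter[other]:
            if turn != me:
                wants_to_enter[me] = False      # back off: it is the other thread's turn
                while turn != me:
                    time.sleep(0)               # busy wait, yielding the interpreter
                wants_to_enter[me] = True
            time.sleep(0)
        tmp = counter                           # critical section
        counter = tmp + 1
        turn = other                            # exit protocol
        wants_to_enter[me] = False

threads = [threading.Thread(target=worker, args=(i, 500)) for i in (0, 1)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)
```

On hardware with weaker memory ordering, or in languages whose compilers reorder memory accesses, the same logic needs memory fences or atomic variables to remain correct.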
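Dijkstra's solution
// entry protocol: lock(), i.e. P
b[i] = false
sw[i] = true    // assumed: the slide does not show where sw[i] is first set
while (sw[i]) {
    sw[i] = false
    if (k != i) {
        c[i] = true
        if (b[k]) k = i
        sw[i] = true
    } else {
        c[i] = false
        for (j = 0; j < N; j++)
            if (j != i && !c[j])
                sw[i] = true
    }
}
// critical section
// exit protocol: unlock(), i.e. V
b[i] = true
c[i] = true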
Peterson's solution
flag[0] = true;
P0_gate: turn = 1;
while (flag[1] && turn == 1) {
    // busy wait
}
// critical section ...
flag[0] = false;
Hardware-assisted solution
while (test_and_set(lock));
// critical section
lock = 0
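Python offers no direct access to a hardware test-and-set instruction, so the sketch below emulates one: the hypothetical class TestAndSetFlag uses an internal threading.Lock only to make the read-and-set step indivisible, standing in for what the processor provides in a single instruction. The spin loop around it follows the three lines above.

```python
import threading
import time

class TestAndSetFlag:
    """Software stand-in for a hardware test-and-set instruction (hypothetical helper)."""

    def __init__(self):
        self.value = 0
        self._atomic = threading.Lock()   # makes the read-modify-write indivisible

    def test_and_set(self):
        with self._atomic:
            old = self.value
            self.value = 1
            return old                    # 0 means the caller has just taken the lock

    def clear(self):
        self.value = 0                    # lock = 0

lock = TestAndSetFlag()
counter = 0

def worker(iterations):
    global counter
    for _ in range(iterations):
        while lock.test_and_set():        # spin until the old value was 0
            time.sleep(0)                 # yield while waiting
        tmp = counter                     # critical section
        counter = tmp + 1
        lock.clear()

threads = [threading.Thread(target=worker, args=(500,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(counter)  # 2000
```

Unlike the two-flag software solutions, this scheme works unchanged for any number of threads, because a single atomic instruction decides who enters.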
Race condition – solution
a=0
Thread 1            Thread 2
lock(locka)         lock(locka)
load(a, eax)        load(a, eax)
eax = eax + 2       eax = eax + 2
write(a, eax)       write(a, eax)
unlock(locka)       unlock(locka)
[Animation: the frames step through both possible orders, Thread 2 first and then Thread 1 first, asking "solution?" and answering "solution OK" at each step; a goes from 0 to 2 to 4. The final frame, in which both threads are between lock(locka) and unlock(locka) at the same time, is stamped IMPOSSIBLE.]
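The locked version of a=a+2 maps directly onto Python's threading.Lock. A minimal sketch; the function name add_two and the local variable standing in for the register eax are assumptions for the example.

```python
import threading

a = 0
locka = threading.Lock()

def add_two():
    global a
    with locka:              # lock(locka) on entry, unlock(locka) on exit
        eax = a              # load(a, eax)
        eax = eax + 2        # eax = eax + 2
        a = eax              # write(a, eax)

t1 = threading.Thread(target=add_two)
t2 = threading.Thread(target=add_two)
t1.start()
t2.start()
t1.join()
t2.join()

print(a)  # 4
```

Whichever thread acquires the lock first, the other cannot load a until the first has written it back, so both increments are kept.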