Optimization Notice
Copyright © 2018, Intel Corporation. All rights reserved.
*Other names and brands may be claimed as the property of others.
Without parallelism
[Figure: timeline from Start to Finish made of sequential Python regions, parallel (NumPy | SciPy | Numexpr) regions, and parallel (Numba | Sklearn | PyDAAL) compute regions.]
Without parallelism: {infinite} resources
Amdahl's Law: “speedup is limited by …”
[Figure: timeline from Start to Finish.]
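As a rough illustration of Amdahl's Law for a fixed problem size, the sketch below computes the predicted speedup from the parallel fraction of the run time. The function names and the 0.9 parallel fraction are assumptions made for this example; they do not come from the slides.

```python
def amdahl_speedup(parallel_fraction, workers):
    """Speedup predicted by Amdahl's Law for a fixed problem size."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


def amdahl_limit(parallel_fraction):
    """Upper bound on the speedup when the number of workers is unlimited."""
    serial_fraction = 1.0 - parallel_fraction
    if serial_fraction == 0.0:
        return float("inf")
    return 1.0 / serial_fraction


if __name__ == "__main__":
    # Assumed workload: 90% of the run time can be parallelized.
    for workers in (1, 4, 16, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
    print("limit:", round(amdahl_limit(0.9), 2))
```

Even with unlimited workers, the serial fraction bounds the speedup, which is the case the "{infinite} resources" slide depicts.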
Gustafson-Barsis’ Law
States that if the problem size grows along with the number of parallel
processors, while the serial portion grows slowly or remains fixed, speedup
increases as processors are added.
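The scaled speedup described by Gustafson-Barsis' Law can be sketched in the same way. Here `serial_fraction` is the share of the run time spent in serial code on the parallel machine; the function name and the 10% figure are assumptions for illustration only.

```python
def gustafson_speedup(serial_fraction, workers):
    """Scaled speedup when the problem size grows with the number of workers."""
    return workers - serial_fraction * (workers - 1)


if __name__ == "__main__":
    # Assumed workload: 10% of the run time stays serial.
    for workers in (1, 4, 16, 1024):
        print(workers, round(gustafson_speedup(0.1, workers), 2))
```

Unlike the fixed-size case, the predicted speedup keeps growing as workers are added, because the parallel part of the work grows with them.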
Without parallelism: what can we do?
[Figure: timeline from Start to Finish made of sequential Python regions, parallel (NumPy | SciPy | Numexpr) regions, and parallel (Numba | Sklearn | PyDAAL) compute regions.]
Make application parallel
[Figure: timeline from Start to Finish with sequential Python regions, (NumPy | SciPy | Numexpr) regions, and PyDAAL regions, annotated "Speedup?".]
Later, we will see some performance issues
that we can still have with “blind” parallelism.
A quick look at some parallelism techniques
• Python Multithreading
• Multiprocessing
• Joblib
• Dask
Sources
Python Multithreading
Multithreading example
For a given list of numbers, print square and cube for each number.
Input: [2,3,8,9]
Output:
Square: [4,9,64,81]
Cube: [8, 27, 512, 729]
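One possible way to write this example with the standard `threading` module is sketched below: one thread computes the squares and another the cubes. The function names and the shared `results` dictionary are assumptions for illustration.

```python
import threading


def compute_squares(numbers, out):
    # Each thread writes under its own key, so no lock is needed here.
    out["Square"] = [n * n for n in numbers]


def compute_cubes(numbers, out):
    out["Cube"] = [n ** 3 for n in numbers]


def square_and_cube(numbers):
    results = {}
    threads = [
        threading.Thread(target=compute_squares, args=(numbers, results)),
        threading.Thread(target=compute_cubes, args=(numbers, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # wait for both threads before reading the results
    return results


if __name__ == "__main__":
    results = square_and_cube([2, 3, 8, 9])
    print("Square:", results["Square"])
    print("Cube:", results["Cube"])
```

Both threads write into the same dictionary because threads of one process share its data space.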
Python Multithreading
Running several threads is similar to running several different programs
concurrently, but with the following benefits:
• Multiple threads within a process share the same data space with the main
thread and can therefore share information or communicate with each other
more easily than if they were separate processes.
• Threads are sometimes called light-weight processes; they do not require
much memory overhead and are cheaper than processes.
A thread has a beginning, an execution sequence, and a conclusion. It has an
instruction pointer that keeps track of where within its context it is currently
running.
It can be pre-empted (interrupted).
It can temporarily be put on hold (also known as sleeping) while other threads
are running - this is called yielding.
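The shared data space can be illustrated with a small sketch in which several threads update one counter. Because a thread can be pre-empted between reading and writing the value, the update is guarded with a lock. The function name and the default counts are assumptions for this example.

```python
import threading


def count_with_threads(num_threads=4, increments=10_000):
    counter = {"value": 0}   # lives in the process, visible to every thread
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            # The lock makes the read-modify-write step safe against pre-emption.
            with lock:
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter["value"]


if __name__ == "__main__":
    print(count_with_threads())
```

No data is copied or sent between the threads; they all operate on the same object.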
Multiprocessing
[Figure, built up over several slides: the inputs 1, 2, 3, 4, 5 are distributed (Map) across Core 1 to Core 4, each core running
def f(n):
    return n*n
and the results 1, 4, 9, 16, 25 are collected (Reduce).]
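The map step in the figure corresponds closely to `multiprocessing.Pool.map`. The sketch below assumes four worker processes, as in the figure, and reads "Reduce" as combining the collected results into a single value; the `total` helper is a hypothetical name introduced for this illustration.

```python
from functools import reduce
from multiprocessing import Pool


def f(n):
    return n * n


def total(values):
    # Assumed reduce step: fold the mapped results into one number.
    return reduce(lambda a, b: a + b, values, 0)


if __name__ == "__main__":
    # The guard keeps child processes from re-running this block on import.
    with Pool(processes=4) as pool:
        squares = pool.map(f, [1, 2, 3, 4, 5])  # results keep the input order
    print(squares)
    print(total(squares))
```

`Pool.map` splits the input among the worker processes and returns the results in input order, so the caller never sees which core computed which value.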
Multiprocessing – address spaces
What’s the difference between multiprocessing
and multithreading?
Multiprocessing vs multithreading
[Figure: one process address space (addresses such as 0x0f12 3453 up to 0xFFFF FFFF) containing several threads, including Thread 2 and Thread 3.]
[Figure: Process 1 and Process 2, each with its own address space (up to 0xFFFF FFFF), communicating through shared memory, messages, or a pipe.]
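The difference can be sketched with a module-level counter: a thread changes the counter of its own process directly, while a child process works on its own copy and has to send the value back, here through a `multiprocessing.Pipe`. All names in this sketch are assumptions for illustration. The value the child reports depends on how the platform starts processes, so no expected output is shown.

```python
import multiprocessing as mp
import threading

counter = {"value": 0}  # module-level state of whichever process imports this


def bump():
    counter["value"] += 1


def run_in_thread():
    # A thread shares the address space, so the caller sees the change.
    t = threading.Thread(target=bump)
    t.start()
    t.join()
    return counter["value"]


def bump_and_report(conn):
    # Runs in a child process: it changes the child's own copy of `counter`
    # and reports the value through the pipe.
    bump()
    conn.send(counter["value"])
    conn.close()


if __name__ == "__main__":
    print("after thread:", run_in_thread())
    parent_end, child_end = mp.Pipe()
    child = mp.Process(target=bump_and_report, args=(child_end,))
    child.start()
    print("child reported:", parent_end.recv())
    child.join()
    print("parent still has:", counter["value"])  # unchanged by the child
```

The parent's counter changes only through the thread; the child's update becomes visible only because it was sent explicitly.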
Threads Processes
Picture sources:
https://www.askideas.com/50-most-funny-skinny-pictures-that-will-make-you-laugh-every-time/
https://www.reddit.com/r/whowouldwin/comments/36n70j/10_skinny_guys_vs_10_fat_guys/
Joblib and Dask
Examples in Jupyter
The parallelism spaces
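[Figure: map of Python parallelism and concurrency tools. Surviving labels: Dask, MPI4PY, Celery, Concurrent Futures, Buildbot, Twisted, Openstack, Tornado, Async/await, Threading, Trio; region labels "Single-threaded Concurrency" and "(Single Node)"; one open slot marked "*Unicorn? MPI4PY…?".]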
The risks
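[Figure: Python-focused multiprocessing and multithreading tools (Multiprocessing, Joblib, Dask, Python multithreading) alongside data-parallelism tools (NumPy/SciPy, Numba, Cython, Numexpr) and native threading runtimes (OpenMP, TBB, Pthreads). The region where they combine is labeled "Nested parallelism area with risk of oversubscription".]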
Nested parallelism
import multiprocessing.pool
import numpy
data = numpy.random.random((256, 256))
pool = multiprocessing.pool.ThreadPool()  # creates P threads
Oversubscription
P CPUs P CPUs
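A standard-library sketch of how nesting leads to oversubscription: an outer pool of P threads in which every task opens its own inner pool of P threads, so P × P worker threads exist at once on a machine assumed to have only P CPUs. The function name is hypothetical, and the barrier is there only to make the measurement deterministic; it is not part of any real workload.

```python
import threading
from multiprocessing.pool import ThreadPool


def count_peak_threads(p):
    """Return the peak number of inner worker threads running at once."""
    lock = threading.Lock()
    barrier = threading.Barrier(p * p)
    state = {"active": 0, "peak": 0}

    def inner_task(_):
        with lock:
            state["active"] += 1
            state["peak"] = max(state["peak"], state["active"])
        # Hold every inner worker until all p*p of them have started.
        barrier.wait(timeout=30)
        with lock:
            state["active"] -= 1

    def outer_task(_):
        # Each outer task unknowingly creates its own pool of p threads.
        with ThreadPool(p) as inner_pool:
            inner_pool.map(inner_task, range(p))

    with ThreadPool(p) as outer_pool:
        outer_pool.map(outer_task, range(p))
    return state["peak"]


if __name__ == "__main__":
    print(count_peak_threads(4))
```

Neither level does anything wrong on its own; the excess only appears when the two levels are composed.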
Oversubscription overheads
• Types of impact
    • Direct OS overhead for switching out a thread
    • CPU cache becomes cold: invisible impact
    • Other threads are waiting until the preempted one returns
• TensorFlow, scikit-learn, and PyTorch have a recurring battle with these
• How do they solve it?
    • Most use OMP_NUM_THREADS=1… KMP_BLOCKTIME=1…
    • SMP ironically addresses this (more on this later)
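The environment-variable workaround can be sketched as follows. The helper name is hypothetical. The variables are normally read when the OpenMP runtime initializes, so the usual assumption is that they must be set before the libraries that use OpenMP are imported.

```python
import os


def limit_native_threads(num_threads=1, blocktime=1):
    """Set the OpenMP-related variables named on the slide for this process."""
    os.environ["OMP_NUM_THREADS"] = str(num_threads)
    os.environ["KMP_BLOCKTIME"] = str(blocktime)
    return {
        "OMP_NUM_THREADS": os.environ["OMP_NUM_THREADS"],
        "KMP_BLOCKTIME": os.environ["KMP_BLOCKTIME"],
    }


if __name__ == "__main__":
    # Call this before importing NumPy, scikit-learn, or similar libraries.
    print(limit_native_threads())
```

This removes the inner level of parallelism altogether, which avoids oversubscription but also gives up any speedup the inner level could have provided.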
Introducing composability modules
Threading Composability
[Figure, built up over several slides: an Application made of Component 1 … Component N; each component contains Subcomponent 1 … Subcomponent K (or M), which in turn contain further nested subcomponents.]
Libraries/modules/components are not aware of the big picture
Introducing composability modules
• smp: Static Multi-Processing
    • A pure Python package managing nested parallelism through coarse-grain static settings
    • Instantiates via monkey patching of Python’s pools (no code changes required)
    • Utilizes affinity mask + OpenMP settings to statically allocate resources and avoid excessive threads
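The idea of static allocation can be sketched as simple arithmetic: divide the available CPUs among the outer workers, and give each worker a disjoint set of CPU ids plus a matching inner thread count. This illustrates the principle only and is not the smp package's code; both function names are hypothetical.

```python
import os


def threads_per_worker(num_workers, num_cpus=None):
    """Inner thread count for each outer worker, so the total stays near num_cpus."""
    if num_cpus is None:
        num_cpus = os.cpu_count() or 1
    return max(1, num_cpus // num_workers)


def affinity_masks(num_workers, num_cpus):
    """CPU ids assigned to each worker; ids wrap around if workers exceed CPUs."""
    per_worker = max(1, num_cpus // num_workers)
    return [
        [(w * per_worker + k) % num_cpus for k in range(per_worker)]
        for w in range(num_workers)
    ]


if __name__ == "__main__":
    print(threads_per_worker(4, 8))
    print(affinity_masks(4, 8))
```

The split is computed once, before any work starts, which is what makes the approach static: it cannot rebalance if some workers finish early.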
Nested parallelism (again)
import multiprocessing.pool
import numpy
data = numpy.random.random((256, 256))
pool = multiprocessing.pool.ThreadPool()  # creates P threads
TBB’s thread coordination system
tbb4py module
[Figure, two panels. Left: an application using Python & MKL with OpenMP threading; separate, uncoordinated OpenMP parallel regions create too many software threads that compete for logical processors. Right: the application running under the TBB module; the TBB pool and scheduler coordinate TBB threads, so software threads are mapped to logical processors.]
SMP’s total threading affinity system
[Figure, two panels. Left: an application with OpenMP threading; separate, uncoordinated OpenMP parallel regions create too many software threads that compete for logical processors. Right: the application running under the SMP module.]
Without nested parallelism
[Figure: timeline from Start to Finish made of sequential Python regions, parallel (NumPy | SciPy | Numexpr) regions, and parallel (Numba | Sklearn | PyDAAL) compute regions.]
Without nested parallelism: {infinite} resources
Amdahl's Law: “speedup is limited by …”
[Figure: timeline from Start to Finish.]
Without nested parallelism: what can we do?
[Figure: timeline from Start to Finish made of sequential Python regions, parallel (NumPy | SciPy | Numexpr) regions, and parallel (Numba | Sklearn | PyDAAL) compute regions.]
Nested parallelism: Make application parallel
[Figure: timeline from Start to Finish with sequential Python regions, (NumPy | SciPy | Numexpr) regions, and PyDAAL regions, annotated "Speedup?".]
Nested parallelism: Oversubscription
[Figure: timeline ending at Finish with #threads on the vertical axis and a "Reasonable limit" line; it shows sequential Python regions, parallel (NumPy | SciPy | Numexpr) regions, and parallel (Numba | Sklearn | PyDAAL) compute regions, annotated "Speedup?".]
Threading Composability
[Figure, built up over several slides: an Application made of Component 1 … Component N; each component contains Subcomponent 1 … Subcomponent K (or M), which in turn contain further nested subcomponents.]
Libraries/modules/components are not aware of the big picture
Our GOAL
Parallel turtle
Intel® TBB: parallelism orchestration in Python ecosystem
[Figure: Python packages (NumPy, SciPy, Scikit-learn, PyDAAL, OpenCV, Numba, Joblib, Dask, Thread Pool, Pool) layered over Intel® MKL, Intel® DAAL, and Intel® TBB, with the Intel® TBB module for Python underneath.]
• Restricts oversubscription
$ python -m tbb
Summary
Legal Disclaimer & Optimization Notice
The benchmark results reported above may need to be revised as additional testing is conducted. The results depend on the specific platform configurations and
workloads utilized in the testing, and may not be applicable to any particular user’s components, computer system or workloads. The results are not necessarily
representative of other benchmarks and other benchmark results may show greater or lesser impact from mitigations.
Software and workloads used in performance tests may have been optimized for performance only on Intel microprocessors. Performance tests, such as SYSmark
and MobileMark, are measured using specific computer systems, components, software, operations and functions. Any change to any of those factors may cause
the results to vary. You should consult other information and performance tests to assist you in fully evaluating your contemplated purchases, including the
performance of that product when combined with other products. For more complete information visit www.intel.com/benchmarks.
INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY
RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY,
RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR
INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.
Copyright © 2018, Intel Corporation. All rights reserved. Intel, Pentium, Xeon, Xeon Phi, Core, VTune, Cilk, and the Intel logo are trademarks of Intel Corporation
in the U.S. and other countries.
Optimization Notice
Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors.
These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or
effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for
use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the
applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.
Notice revision #20110804
Example: QR with numpy
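import time
import dask.array as da
x = da.random.random((100000, 2000),
    ...)  # remaining arguments (slide line 3) are missing from the source
t0 = time.time()
q, r = da.linalg.qr(x)
# slide line 6, which defines `test`, is missing from the source
assert test.compute()  # threaded
print(time.time() - t0)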