I Introduction
  A) Personal Introductions
  B) What is Parallel & Distributed Computing?
II Models of Parallel Computing
III Parallel Computation Design
Your Background:
0. What have you learned so far (e.g. in your Bachelor's)?
1. Have you had a chance to program parallel and/or distributed
systems already?
2. What do you expect from this course?
3. What would you like to learn in this course?
I Introduction
  A) Personal Introductions
  B) What is Parallel & Distributed Computing?
II Models of Parallel Computing
III Parallel Computation Design
Meanwhile, extended to accommodate shops, home banking, etc. (Pardon the German.)
Source: “Verteilte Systeme”, Skript, Peter Wollenweber, Hochschule Darmstadt, Fachbereich Informatik.
[Figure: architecture diagram showing OPA CS (Client and Server), OPA DS, MQSeries Client and MQSeries M-Broker, XML and OLY/K+ interfaces, SUN and MVS hosts, IMS, DB2, LuxNet (with replication), Olympic, Sybase, Kondor+, KREKIN and GeParD databases.]
Very heterogeneous. Much of the architecture consists of legacy systems.
Source: “Verteilte Systeme”, Skript, Alois Schütte, Hochschule Darmstadt, Fachbereich Informatik.
Source: http://en.wikipedia.org/w/index.php?title=Google_platform&oldid=202504102, Barroso et al., Web Search for a Planet: The Google Cluster Architecture, IEEE Micro, March-April 2003 (http://labs.google.com/papers/googlecluster-ieee.pdf)
Financial Market application.
[Figure: an HTTP request for a URL arrives over the internet or intranet; calculation processes read (pull) and write (push) content; distribution strategies include read/write, active invalidate and active update, at varying granularity.]
Source: Cotoaga, K.; Müller, A.; Müller, R.: Effiziente Distribution dynamischer Inhalte im
Web. In Wirtschaftsinformatik 44 (2002) 3, p. 249-259.
Sources: "Deep Blue (chess computer)." Wikipedia, The Free Encyclopedia. 15 Oct 2009
<http://en.wikipedia.org/w/index.php?title=Deep_Blue_(chess_computer)&oldid=318710273>, corrected with information from
<http://www.research.ibm.com/deepblue/meet/html/d.3.shtml>.
Examples (7) – Blue Gene
I Introduction
  A) Personal Introductions
  B) What is Parallel & Distributed Computing?
II Models of Parallel Computing
III Parallel Computation Design
Four Factors:
1) Moore's Law
2) Amdahl's Law
3) Inherent Difficulty
4) Lack of a Unifying Paradigm
The number of transistors per chip
doubles every two years.
Source: H. Bauke, S. Mertens, Cluster Computing, Springer Verlag, 2006. Section 1.5 (p. 10-13).
Four Factors:
1) Moore's Law
2) Amdahl's Law
3) Inherent Difficulty
4) Lack of a Unifying Paradigm
Since the upper limit on Speedup is 1/f, Efficiency E(p) = S(p)/p goes to zero as p → ∞.
If we introduce communication into our model, we can find a value of p with maximum speedup. Afterwards, more processors mean more run-time!
[Figure: Speedup and Efficiency, with and without communication costs, for f = 0.005 and 1 ≤ p ≤ 10000.]
Source: H. Bauke, S. Mertens, Cluster Computing, Springer Verlag, 2006. Abb. 1.3, p. 12.
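A minimal sketch of this effect (not from the slides): the loop below adds a simple linear communication-cost term c·p to the denominator – the constant c and the linear model are assumptions purely for illustration, not the exact model used in the source – and searches for the p with maximum speedup.

    #include <cstdio>

    int main()
    {
        const double f = 0.005;   // serial fraction, as in the figure
        const double c = 1.0e-5;  // assumed per-processor communication overhead
        int    best_p = 1;
        double best_s = 1.0;
        for (int p = 1; p <= 10000; ++p) {
            // speedup with a communication term added to the denominator
            double s = 1.0 / (f + (1.0 - f) / p + c * p);
            if (s > best_s) { best_s = s; best_p = p; }
        }
        std::printf("maximum speedup %.1f at p = %d; more processors only add run-time\n",
                    best_s, best_p);
        return 0;
    }

The point is the shape, not the numbers: past the optimum, the communication term dominates and speedup falls again.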
Four Factors:
1) Moore's Law
2) Amdahl's Law
3) Inherent Difficulty
4) Lack of a Unifying Paradigm
[Figure: “Some Parallel Programming Environments from the Mid-1990s”]
I Introduction
  A) Personal Introductions
  B) What is Parallel & Distributed Computing?
II Models of Parallel Computing
III Parallel Computation Design
Four Factors:
1) Moore's Law
2) Amdahl's Law
3) Inherent Difficulty
4) Lack of a Unifying Paradigm

[Figure: register sets within cores, cores on a die, dies in a package, packages on a main-board.]

The number of transistors keeps growing, but they can't all be used in one CPU any more:
Deep pipelining has its limits; this led to hyper-threading (virtually more than 1 CPU).
Hyper-threading has its limits; this led to multicore chips (really more than 1 CPU per chip).
Both can be combined.
Result: Sequential processors are dying out! Multiprocessors are now the rule, not the exception!
Amdahl: Speedup is
S(p) = T(1) / T(p) = 1 / (f + (1 − f)/p)

Moral of the Story:
Speedup is limited only if T(1) is held constant;
for sufficiently large T(1), f goes to zero.
Source: H. Bauke, S. Mertens, Cluster Computing, Springer Verlag, 2006. Section 1.7 and Abb. 1.6, p. 16.
Source: T. Mattson, B. Sanders, B. Massingill, Patterns for Parallel Programming, Addison-Wesley, 2005, p. 1-2.
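A quick worked example of the formula above, with f = 0.005 as in the earlier figure: S(10) = 1/(0.005 + 0.995/10) ≈ 9.6, S(100) ≈ 66.9, S(1000) ≈ 166.8, and no number of processors can push S(p) past 1/f = 200. If, however, the problem grows so that the serial part stays at 0.5 s while the parallelizable part grows from 99.5 s to 9999.5 s (numbers chosen here only for illustration), then f drops from 0.005 to 0.00005 and the bound rises to 20000 – which is exactly the moral stated above.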
Four Factors:
1) Moore's Law
2) Amdahl's Law
3) Inherent Difficulty
4) Lack of a Unifying Paradigm
Of all of these, only Java... ...and MPI have survived (...more or less...).
Plus newcomers: C++0x (Boost), OpenMP & something for GPGPUs.
Outline of Chapter I – Introduction
I Introduction
  A) Personal Introductions
  B) What is Parallel & Distributed Computing?
II Models of Parallel Computing
III Parallel Computation Design
C++: Inherits posix threads, sockets from C / Unix.
Java: Includes Threads & Sockets from the start.

1) Threads
2) Synchronization
3) Thread-local Memory
4) Sockets for Message Passing

Boost::thread: the constructor starts a thread running (it takes a function pointer); the join method waits until a thread is finished.

    #include <boost/thread.hpp>
    #include <iostream>

    void thread() {
        std::cout << "Hello from Thread "
                  << boost::this_thread::get_id()
                  << std::endl;
    }

    int main()
    {
        boost::thread t1(thread);
        boost::thread t2(thread);
        t1.join();
        t2.join();
    }

We have a synchronization problem here. Where?
Source: http://en.highscore.de/cpp/boost/multithreading.html
Mutex = Mutual Exclusion. Mutexes are used to guard critical sections – areas with limited (or no) parallelism.

    #include <boost/thread.hpp>
    #include <iostream>

    boost::mutex mutex;

    void thread() {
        mutex.lock();   // critical section!
        std::cout << "Hello from Thread "
                  << boost::this_thread::get_id()
                  << std::endl;
        mutex.unlock(); // parallelism resumes here...
    }

    int main()
    {
        boost::thread t1(thread);
        boost::thread t2(thread);
        t1.join();
        t2.join();
    }

Source: http://en.highscore.de/cpp/boost/multithreading.html
lock_guard: constructor locks, destructor unlocks – Principle: RAII = Resource Acquisition Is Initialization.

        // ...
    } // end init_number_generator

    boost::mutex mutex;

    void random_number_generator() {
        init_number_generator();
        int i = std::rand();
        boost::lock_guard<boost::mutex> lock(mutex);
        std::cout << i << std::endl;
    } // end random_number_generator

    int main() {
        boost::thread t[3];
        for (int i = 0; i < 3; ++i)
            t[i] = boost::thread(random_number_generator);
        for (int i = 0; i < 3; ++i)
            t[i].join();
    }

We need more than static (global) and local variables!

Source: http://en.highscore.de/cpp/boost/multithreading.html
4) Sockets for Message Passing: Boost.Asio (asynchronous I/O), a more general library.

    #include <boost/asio.hpp>
    #include <iostream>
    #include <string>

    boost::asio::io_service io_service;
    // ... (the socket 'sock', the receive 'buffer' and the resolver are declared here)

    void read_handler(const boost::system::error_code &ec,
                      std::size_t bytes_transferred) {
        if (!ec) {
            std::cout << std::string(buffer.data(), bytes_transferred) << std::endl;
            sock.async_read_some(boost::asio::buffer(buffer), read_handler);
        }
    }

    void resolve_handler(const boost::system::error_code &ec,
                         boost::asio::ip::tcp::resolver::iterator it) {
        // ...
    }
OpenMP:
2) Synchronization with private and shared data, fork/join
3) Loop Parallelism

OpenMP is a compiler extension (not really a language extension) for C, C++ and Fortran. It assumes shared memory (usually). Supported by the Intel and GNU compilers (et al.).

    #include <stdio.h>
    #include <omp.h>

    int main()
    {
        // serial startup goes here...
        #pragma omp parallel num_threads (8)
        { // parallel segment
            printf("\nHello world, I am thread %d\n",
                   omp_get_thread_num());
        }
        // rest of serial segment ...
    }
2) Synchronization with private and shared data:

    main ()
    {
        // serial segment
        b = 1;
        a = 1;
        sum = 0;
        // ...
    }

Example output: Thread0 a=1, b=17, sum=16
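The rest of that example is not shown above; here is a minimal sketch of what private and shared clauses mean, reusing the variable names a, b and sum from the fragment (the parallel region, its clauses and the printed values are assumptions, not the original slide code):

    #include <stdio.h>
    #include <omp.h>

    int main()
    {
        int a, b, sum;
        // serial segment
        b = 1;
        a = 1;
        sum = 0;

        // b is private: every thread gets its own copy; a and sum are shared
        #pragma omp parallel num_threads(4) private(b) shared(a, sum)
        {
            b = a + omp_get_thread_num();   // no race: each thread writes its own b
            #pragma omp critical            // without this, updating the shared sum is a race
            {
                sum += b;
                printf("Thread%d a=%d, b=%d, sum=%d\n",
                       omp_get_thread_num(), a, b, sum);
            }
        }
        printf("Final sum = %d\n", sum);
        return 0;
    }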
MPI supports
1) Remote Execution (not threads)
2) Message Passing
3) Data Parallelism

The basic operations in MPI are sending and receiving messages. Message passing can be blocking or non-blocking. Functions are available for handling arbitrary data – and for ensuring compatibility across heterogeneous machines.

    #include <mpi.h>

    int main(int argc, char *argv[])
    {
        // pass a message around a ring of processes
        int rank, size, next, prev, message, tag = 201;
        MPI::Init();
        rank = MPI::COMM_WORLD.Get_rank();
        size = MPI::COMM_WORLD.Get_size();
        next = (rank + 1) % size;
        prev = (rank + size - 1) % size;
        message = 10;
        if (0 == rank)
            MPI::COMM_WORLD.Send(&message, 1, MPI::INT, next, tag);
        while (1) {
            MPI::COMM_WORLD.Recv(&message, 1, MPI::INT, prev, tag);
            if (0 == rank) --message;
            MPI::COMM_WORLD.Send(&message, 1, MPI::INT, next, tag);
            if (0 == message) break; // exit loop!
        } // end while
        if (0 == rank)
            MPI::COMM_WORLD.Recv(&message, 1, MPI::INT, prev, tag);
        MPI::Finalize();
        return 0;
    }
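A minimal sketch (not from the slides) of the non-blocking variant mentioned above, using the same MPI C++ bindings: Isend returns immediately and hands back a request object that must be waited on before the send buffer is reused.

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[])
    {
        MPI::Init(argc, argv);
        int rank = MPI::COMM_WORLD.Get_rank();
        int size = MPI::COMM_WORLD.Get_size();
        int tag = 201;
        int message = rank;
        int received;

        // post the send without blocking; the neighbour ring is the same as above
        MPI::Request req =
            MPI::COMM_WORLD.Isend(&message, 1, MPI::INT, (rank + 1) % size, tag);

        // other work could overlap with the communication here...

        MPI::COMM_WORLD.Recv(&received, 1, MPI::INT, (rank + size - 1) % size, tag);
        req.Wait();   // only after the Wait may 'message' be reused

        std::cout << "Rank " << rank << " received " << received << std::endl;
        MPI::Finalize();
        return 0;
    }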
MPI supports
1) Remote Execution (not threads)
2) Message Passing
3) Data Parallelism

MPI also contains message passing functions that make it easy to distribute data (usually arrays) across a group of processors.

[Figure: Bcast copies A from P0 to all of P0–P3; Scatter distributes A, B, C, D from P0 so that P0 gets A, P1 gets B, P2 gets C, P3 gets D; Gather is the reverse operation.]
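A minimal sketch (not from the slides) of how the diagram translates into calls, again with the MPI C++ bindings; the values broadcast and scattered here are made up for illustration:

    #include <mpi.h>
    #include <iostream>

    int main(int argc, char *argv[])
    {
        MPI::Init(argc, argv);
        int rank = MPI::COMM_WORLD.Get_rank();
        int size = MPI::COMM_WORLD.Get_size();

        // Bcast: rank 0 copies one value A to every process
        int a = (0 == rank) ? 42 : 0;
        MPI::COMM_WORLD.Bcast(&a, 1, MPI::INT, 0);

        // Scatter: rank 0 hands each process one element of its array
        int *send = 0;
        if (0 == rank) {
            send = new int[size];
            for (int i = 0; i < size; ++i)
                send[i] = i * i;
        }
        int mine;
        MPI::COMM_WORLD.Scatter(send, 1, MPI::INT, &mine, 1, MPI::INT, 0);

        std::cout << "Rank " << rank << ": a = " << a
                  << ", my element = " << mine << std::endl;

        if (0 == rank) delete[] send;
        MPI::Finalize();
        return 0;
    }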
I Introduction
  A) Personal Introductions
  B) What is Parallel & Distributed Computing?
II Models of Parallel Computing
III Parallel Computation Design