Sunteți pe pagina 1din 42

Distributed Systems Design

COMP 6231
Introduction
Lecture 1

Essam Mansour

1
2
Who am I?
Dr. Essam Mansour
Assistant Professor
www.emansour.com
@DrEssam_Mansour
2019 - Now
Concordia Data
2013 - 2019 Systems (CoDS) Lab
2009 - 2013
Large-scale Analytics
2008 - 2009 on Strings
Ireland

2004 - 2008 Querying


Ph.D. in Computer Science Geo-Distributed Graphs
Data Management
Egypt
Semi-structured data
Active Databases Mobile Data Management
1997 - 2003 Energy-aware algorithms Elastic OLTP Systems
B.Sc. and M.Sc.
Database Systems Data Integration and Discovery
Software Engineering
Temporal Query Processing

3
Who am I?
Dr. Essam Mansour
Assistant Professor
www.emansour.com
@DrEssam_Mansour
2019 - Now
Concordia Data
2013 - 2019 Systems (CoDS) Lab
2009 - 2013

2008 - 2009 30+ Publications


Ireland

Expertise Venues
2004 - 2008
• Big data • VLDBJ
Ph.D. in Computer Science
• PVLDB
Egypt
Data Management • Database systems
Semi-structured data • SIGMOD
• Data discovery and integration • ICDE
Active Databases
1997 - 2003 • Distributed and parallel systems • WWW
B.Sc. and M.Sc.
Database Systems
• Graph and web data management • CIKM
Software Engineering • …….
Temporal Query Processing

4
Supervision/Mentoring Experience
- Co-supervising postgraduate students:
3 PhD students
 2 students in the string analytics project (2015, and 2017)
 1 student in the decentralized graph management (2018)
 1 VLDBJ, 2 PVLDB, 1 ICDE, and 1 CIKM research papers
 1 SIGMOD, 1 VLDB, and 1 ICDE demo/poster papers

2 Master students
 1 student in the string analytics project (2012)
 1 student in the mobile data management project (2009)
 1 PVLDB and 1 ICSOFT research papers

- Mentoring different postdocs.

5
Academic Service
• A Program Committee member of:

• I also regularly review papers for a number of top international journals including:
• ACM Transactions on Database Systems (TODS)
• The International Journal on Very Large Data Bases (VLDB Journal)
• IEEE Transactions on Knowledge and Data Engineering (TKDE)
• IEEE Transactions on Parallel and Distributed Systems (TPDS)
• IEEE Transactions on Cloud Computing (TCCSI)
• IEEE Transactions on Big Data (TBD)

6
Teaching Style
• I like interaction in class
• I like to ask questions
• I like to be asked questions

• I like to know (and memorize) your names 

• I like to give practical assignments and projects

• I like to learn …

7
… and now

8
Guest Lecture, CMPT 523 Distributed Systems, Spring 2016 9
2009
10
On the Verge of A Disruptive Century :
Breakthroughs

Ubiquitous
Astronomy
Computing

Smaller, Faster,
Cheaper Sensors

Gene
Sequencing and
Biotechnology

11
A Common Theme is Data
The amount of data is only growing…
1.2 Zettabytes (1021 B or 1 Billion TB)

12
We Live in a World of Data…

13
What Do We Do With Data?

Store Share

Access Process

…. and
Encrypt more!

We want to do these seamlessly...


14
Using Diverse Interfaces & Devices

Mobile Devices
Computers

…and even appliances

Consumer Electronics Personal Monitors and


Sensors

We also want to access, share and process our data from all of
our devices, anytime, anywhere!

15
Data Becoming Critical to Our Lives

Health Science
Domains
Education of Data
Work

Environment Finance

… and more

16
How to Store and Process Data at Scale?

• A system can be scaled:


• Either vertically (or up) 
• Can be achieved by hardware upgrades (e.g., faster CPU, more
memory, and/or larger disk)

• And/Or horizontally (or out)


• Can be achieved by adding more machines

17
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems

1. Caches and Memory:


L1
16-32 KB/Core, 4-5 cycles
Cache

L2 Cache 128-256 KB/Core, 12-15 cycles

L3 Cache 512KB- 2 MB/Core, 30-50 cycles

Main Memory 8GB- 128GB, 300+ cycles


18
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems

2. Disks-- some advancements, but still:


 Limited capacity

 Limited number of channels

 Limited bandwidth

19
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems

2. Processors:
 Moore’s law still holds

 Chip Multiprocessors (CMPs) are now available


P P P P
P L1 L1 L1 L1
L1
Interconnect
L2
L2 Cache

A single Processor Chip


20
A CMP
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems

2. Processors:
 But up until a few years ago, CPU speed grew at the rate of 55%
annually, while the memory speed grew at the rate of only 7%

Processor-Memory speed gap


M

21
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems

2. Processors:
 But up until a few years ago, CPU speed grew at the rate of 55%
annually, while the memory speed grew at the rate of only 7%

 Even if 100s or 1000s of cores are placed on a CMP, it is a challenge


to deliver input data to these cores fast enough for processing
Vertical Scaling

P P P P
10000 Suffer from
L1 L1 L1 L1 seconds (or limited scalability
3 hours) to
Interconnect
load data
L2 Cache
A Data Set
of 4 TBs

Memory

4 100MB/S IO Channels

23
How to Store and Process Data at Scale?

• A system can be scaled:


• Either vertically (or up)
• Can be achieved by hardware upgrades (e.g., faster CPU, more
memory, and/or larger disk)

• And/Or horizontally (or out)


• Can be achieved by adding more machines 

24
Horizontal Scaling

P 100
P
Splits Only 3
L1 Machines L1
minutes to
L2 L2 load data

Memory Memory
A Data Set (data)
of 4 TBs

25
Requirements
• But, this necessitates:
• A way to express the problem in terms of parallel processes and execute them
on different machines (Programming and Concurrency Models)

• A way to organize processes (Architectures)

• A way for distributed processes to exchange information (Communication


Paradigms)

• A way to locate and share resources (Naming Protocols)

26
Requirements
• But, this necessitates:
• A way for distributed processes to cooperate, synchronize with one another,
and agree on shared values (Synchronization)

• A way to reduce latency, enhance reliability, and improve performance


(Caching, Replication, and Consistency)

• A way to enhance load scalability, reduce diversity across heterogeneous


systems, and provide a high degree of portability and flexibility
(Virtualization)

• A way to recover from partial failures (Fault Tolerance)


27
Degree of Parallelism

DATA D A T A

Task Parallelism Data Parallelism

28
So, What is a Distributed System?

A distributed system is:

One in which components


A collection of independent
located at networked
computers that appear to its
computers communicate and
users as a single coherent
coordinate their actions only
system
by passing messages

29
Features
• Distributed Systems imply four main features:

1 Geographical Separation

2 No Common Physical Clock

3 No Common Physical Memory

4 Autonomy and Heterogeneity

30
Parallel vs. Distributed Systems
• Distributed systems contrast with parallel systems, which entail:

1 Strong Coupling

2 A Common Physical Clock


I am not sure.
Can you verify that?

3 A Shared Physical Memory

4 Homogeneity
Some Administrivia!

32
Distributed Systems Design
Considered: a reasonably critical and
comprehensive perspective. .9. Fault Tolerance
Thoughtful: Fluent, flexible and efficient .8. Distributed Frameworks
perspective. .7. Replication
Masterful: a powerful and illuminating .6. Caching
perspective. .5. Synchronization
.4. Naming
3. Architectures
.2. Remote Procedure Calls
.1. Networking
.0. Introduction
33
Course Objectives
The course aims at providing an in-depth
understanding and hands-on experience on

How modern
distributed
Distributed systems meet the
system demands of
Principles on programming contemporary
which models and distributed
Principles on distributed analytics applications
which systems are engines
distributed optimized
systems are
based

34
Teaching Team
COMP
Instructor:
Essam Mansour Teaching 6231
Assistants Teaching
(EM)
Team

EM Office Hours
• Monday, 2:30 - 4:00PM
• Welcome when my office door is open
• By appointment
• My office: EV 3.251
TAs Office Hours
• To be announced
• By appointment

35
Textbooks

36
Paper Reading and Reviews

37
Teaching Methods
13 Lectures
• Motivate learning
• Provide a framework or roadmap to organize the information of
the course
• Explain subjects and reinforce the critical big ideas

12 Labs
•Get you to reveal what you don’t understand, so that we can help you
•Allow you to practice skills you will need to become an expert

38
Assignments and Projects

Assignments

• required problem solving and reading assignments

Projects

• large programming project

39
Projects
Project Proposal

Introduction and Related Work

Proposed Solution

Experiential Evaluation

Paper Demo
40
Assessment Methods
 How do we measure learning?
Type # Weight
Project 1 35%
Final Exam 1 30%

Problem Set 4 20%

Presentations/reviews 3 15%

41
Next Lecture
• Networking- Part I

Questions?

42

S-ar putea să vă placă și