Documente Academic
Documente Profesional
Documente Cultură
COMP 6231
Introduction
Lecture 1
Essam Mansour
1
2
Who am I?
Dr. Essam Mansour
Assistant Professor
www.emansour.com
@DrEssam_Mansour
2019 - Now
Concordia Data
2013 - 2019 Systems (CoDS) Lab
2009 - 2013
Large-scale Analytics
2008 - 2009 on Strings
Ireland
3
Who am I?
Dr. Essam Mansour
Assistant Professor
www.emansour.com
@DrEssam_Mansour
2019 - Now
Concordia Data
2013 - 2019 Systems (CoDS) Lab
2009 - 2013
Expertise Venues
2004 - 2008
• Big data • VLDBJ
Ph.D. in Computer Science
• PVLDB
Egypt
Data Management • Database systems
Semi-structured data • SIGMOD
• Data discovery and integration • ICDE
Active Databases
1997 - 2003 • Distributed and parallel systems • WWW
B.Sc. and M.Sc.
Database Systems
• Graph and web data management • CIKM
Software Engineering • …….
Temporal Query Processing
4
Supervision/Mentoring Experience
- Co-supervising postgraduate students:
3 PhD students
2 students in the string analytics project (2015, and 2017)
1 student in the decentralized graph management (2018)
1 VLDBJ, 2 PVLDB, 1 ICDE, and 1 CIKM research papers
1 SIGMOD, 1 VLDB, and 1 ICDE demo/poster papers
2 Master students
1 student in the string analytics project (2012)
1 student in the mobile data management project (2009)
1 PVLDB and 1 ICSOFT research papers
5
Academic Service
• A Program Committee member of:
• I also regularly review papers for a number of top international journals including:
• ACM Transactions on Database Systems (TODS)
• The International Journal on Very Large Data Bases (VLDB Journal)
• IEEE Transactions on Knowledge and Data Engineering (TKDE)
• IEEE Transactions on Parallel and Distributed Systems (TPDS)
• IEEE Transactions on Cloud Computing (TCCSI)
• IEEE Transactions on Big Data (TBD)
6
Teaching Style
• I like interaction in class
• I like to ask questions
• I like to be asked questions
• I like to learn …
7
… and now
8
Guest Lecture, CMPT 523 Distributed Systems, Spring 2016 9
2009
10
On the Verge of A Disruptive Century :
Breakthroughs
Ubiquitous
Astronomy
Computing
Smaller, Faster,
Cheaper Sensors
Gene
Sequencing and
Biotechnology
11
A Common Theme is Data
The amount of data is only growing…
1.2 Zettabytes (1021 B or 1 Billion TB)
12
We Live in a World of Data…
13
What Do We Do With Data?
Store Share
Access Process
…. and
Encrypt more!
Mobile Devices
Computers
We also want to access, share and process our data from all of
our devices, anytime, anywhere!
15
Data Becoming Critical to Our Lives
Health Science
Domains
Education of Data
Work
Environment Finance
… and more
16
How to Store and Process Data at Scale?
17
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems
Limited bandwidth
19
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems
2. Processors:
Moore’s law still holds
2. Processors:
But up until a few years ago, CPU speed grew at the rate of 55%
annually, while the memory speed grew at the rate of only 7%
21
Vertical Scaling
• Caveat: Individual computers can still suffer from limited resources
with respect to the scale of today’s problems
2. Processors:
But up until a few years ago, CPU speed grew at the rate of 55%
annually, while the memory speed grew at the rate of only 7%
P P P P
10000 Suffer from
L1 L1 L1 L1 seconds (or limited scalability
3 hours) to
Interconnect
load data
L2 Cache
A Data Set
of 4 TBs
Memory
4 100MB/S IO Channels
23
How to Store and Process Data at Scale?
24
Horizontal Scaling
P 100
P
Splits Only 3
L1 Machines L1
minutes to
L2 L2 load data
Memory Memory
A Data Set (data)
of 4 TBs
25
Requirements
• But, this necessitates:
• A way to express the problem in terms of parallel processes and execute them
on different machines (Programming and Concurrency Models)
26
Requirements
• But, this necessitates:
• A way for distributed processes to cooperate, synchronize with one another,
and agree on shared values (Synchronization)
DATA D A T A
28
So, What is a Distributed System?
29
Features
• Distributed Systems imply four main features:
1 Geographical Separation
30
Parallel vs. Distributed Systems
• Distributed systems contrast with parallel systems, which entail:
1 Strong Coupling
4 Homogeneity
Some Administrivia!
32
Distributed Systems Design
Considered: a reasonably critical and
comprehensive perspective. .9. Fault Tolerance
Thoughtful: Fluent, flexible and efficient .8. Distributed Frameworks
perspective. .7. Replication
Masterful: a powerful and illuminating .6. Caching
perspective. .5. Synchronization
.4. Naming
3. Architectures
.2. Remote Procedure Calls
.1. Networking
.0. Introduction
33
Course Objectives
The course aims at providing an in-depth
understanding and hands-on experience on
How modern
distributed
Distributed systems meet the
system demands of
Principles on programming contemporary
which models and distributed
Principles on distributed analytics applications
which systems are engines
distributed optimized
systems are
based
34
Teaching Team
COMP
Instructor:
Essam Mansour Teaching 6231
Assistants Teaching
(EM)
Team
EM Office Hours
• Monday, 2:30 - 4:00PM
• Welcome when my office door is open
• By appointment
• My office: EV 3.251
TAs Office Hours
• To be announced
• By appointment
35
Textbooks
36
Paper Reading and Reviews
37
Teaching Methods
13 Lectures
• Motivate learning
• Provide a framework or roadmap to organize the information of
the course
• Explain subjects and reinforce the critical big ideas
12 Labs
•Get you to reveal what you don’t understand, so that we can help you
•Allow you to practice skills you will need to become an expert
38
Assignments and Projects
Assignments
Projects
39
Projects
Project Proposal
Proposed Solution
Experiential Evaluation
Paper Demo
40
Assessment Methods
How do we measure learning?
Type # Weight
Project 1 35%
Final Exam 1 30%
Presentations/reviews 3 15%
41
Next Lecture
• Networking- Part I
Questions?
42