Course Timeline
Friday
10:00-12:00  History of Cloud Computing: time-sharing, virtual machines, datacenter architectures, utility computing
12:00-13:30  Lunch
13:30-15:00  Modern Cloud Computing: economics, elasticity, failures
15:00-15:30  Break
15:30-17:00  Cloud Computing Infrastructure: networking, storage, computation models
Monday
10:00-12:00 Cloud Computing research topics: scheduling, multiple datacenters, testbeds
Problem
- Rapid innovation in cluster computing frameworks
- No single framework is optimal for all applications
- Energy efficiency means maximizing cluster utilization
- Want to run multiple frameworks in a single cluster
Examples: Apache Hama, Pregel, Pig, Dryad
Solution
Nexus is an operating system for the cluster over which diverse frameworks can run
Goals
- Scalable
- Robust (i.e., simple enough to harden)
- Flexible enough for a variety of different cluster frameworks
- Extensible enough to encourage innovative future frameworks
- Data locality is compromised if a machine is held for a long time
- Hard to account for new frameworks and changing demands -> hurts utilization and interactivity
- Frameworks can take turns accessing the data on each node
- Frameworks' shares can be resized to balance utilization and interactivity
[Figure: cluster nodes running interleaved tasks from Hadoop 1, Hadoop 2, and Hadoop 3, illustrating fine-grained sharing]
- Requires encoding each framework's semantics in the specification language, which is complex and can lead to ambiguities
- Restricts frameworks whose needs the specification language did not anticipate
Outline
- Nexus Architecture
- Resource Allocation
- Multi-Resource Fairness
- Implementation
- Results
NEXUS ARCHITECTURE
Overview
[Figure: Nexus architecture overview — Hadoop (v19 and v20) jobs and MPI jobs are submitted to their framework schedulers, which register with the Nexus master; the master offers resources from Nexus slaves, where per-framework executors (Hadoop v19, Hadoop v20, MPI) run the frameworks' tasks]
Resource Offers
[Figure sequence: a Nexus slave reports free resources to the Nexus master; the master makes a resource offer to the MPI and Hadoop schedulers; the Hadoop scheduler accepts the offer and launches a task on that slave through a new Hadoop executor]
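Below is a minimal sketch of how a framework scheduler might react to such an offer: it packs as many of its pending tasks as fit into the offered resources and returns the launch list to the master. The class and method names (Offer, resource_offer) are illustrative assumptions, not the actual Nexus API.

```python
# Hypothetical sketch of a framework scheduler consuming a resource offer.
# Names and fields are illustrative; this is not the real Nexus interface.

class Offer:
    def __init__(self, slave_id, cpus, mem_gb):
        self.slave_id = slave_id
        self.cpus = cpus
        self.mem_gb = mem_gb

class FrameworkScheduler:
    TASK_CPUS = 1      # per-task demand (assumed)
    TASK_MEM_GB = 1

    def __init__(self, pending_tasks):
        self.pending_tasks = list(pending_tasks)  # queue of task descriptions

    def resource_offer(self, offer):
        """Called by the master with free resources on one slave.

        The framework decides which of its pending tasks (if any) to launch;
        resources it does not use are re-offered to other frameworks.
        """
        launched = []
        cpus, mem = offer.cpus, offer.mem_gb
        while (self.pending_tasks
               and cpus >= self.TASK_CPUS
               and mem >= self.TASK_MEM_GB):
            task = self.pending_tasks.pop(0)
            launched.append((task, offer.slave_id, self.TASK_CPUS, self.TASK_MEM_GB))
            cpus -= self.TASK_CPUS
            mem -= self.TASK_MEM_GB
        return launched  # the master forwards these launches to the slave's executor
```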
Framework-specific scheduling
- Filters let a framework tell Nexus in advance which offers it will reject
- Timeouts can be added to filters
- Frameworks can signal when to destroy filters, or when they want to receive offers again
- Example (delay scheduling): a framework waits for offers on nodes that have its data; if it has waited longer than a given delay, it starts launching non-local tasks (sketched below)
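A toy sketch of that delay-scheduling rule, assuming a hypothetical job object that knows which hosts hold its input data; the names and the 5-second wait are illustrative, not values from the system.

```python
import time

# Toy sketch of delay scheduling on top of resource offers. Illustrative only.

LOCALITY_WAIT_SECS = 5.0   # how long to hold out for a data-local offer (assumed)

class DelaySchedulingJob:
    def __init__(self, preferred_hosts):
        self.preferred_hosts = set(preferred_hosts)  # hosts holding this job's data
        self.waiting_since = time.time()

    def should_accept(self, offered_host):
        """Accept only data-local offers until the delay expires."""
        if offered_host in self.preferred_hosts:
            self.waiting_since = time.time()   # reset the clock after a local launch
            return True
        waited = time.time() - self.waiting_since
        return waited > LOCALITY_WAIT_SECS     # give up on locality after the delay
```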
Framework Isolation
- The isolation mechanism is pluggable, due to the inherent performance/isolation trade-off
- The current implementation supports Solaris projects and Linux containers
- Both isolate CPU, memory, and network bandwidth
- Linux developers are working on disk I/O isolation
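As a rough illustration of container-based isolation, the sketch below caps an executor's CPU weight and memory using the Linux cgroup-v1 filesystem interface. This is an assumption-laden example (it must run as root on a cgroup-v1 system, and it is not the actual Nexus isolation module).

```python
import os

# Rough sketch: confine an executor process with Linux cgroups (v1 interface).
# Illustrative only; not the Nexus isolation module.

def create_container(name, cpu_shares, mem_limit_bytes, pid):
    cpu_dir = f"/sys/fs/cgroup/cpu/{name}"
    mem_dir = f"/sys/fs/cgroup/memory/{name}"
    os.makedirs(cpu_dir, exist_ok=True)
    os.makedirs(mem_dir, exist_ok=True)

    # Relative CPU weight for this executor's tasks.
    with open(os.path.join(cpu_dir, "cpu.shares"), "w") as f:
        f.write(str(cpu_shares))

    # Hard memory cap for the executor.
    with open(os.path.join(mem_dir, "memory.limit_in_bytes"), "w") as f:
        f.write(str(mem_limit_bytes))

    # Move the executor process into both cgroups.
    for d in (cpu_dir, mem_dir):
        with open(os.path.join(d, "tasks"), "w") as f:
            f.write(str(pid))

# Example (hypothetical values): half the default CPU weight and a 2 GB memory cap.
# create_container("hadoop-executor-1", cpu_shares=512,
#                  mem_limit_bytes=2 * 1024**3, pid=1234)
```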
RESOURCE ALLOCATION
Allocation Policies
- Nexus picks which framework to offer resources to, and hence controls how many resources each framework can get (but not which ones)
- Allocation policies are pluggable, via allocation modules, to suit an organization's needs
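A minimal sketch of what such an allocation module could look like, implementing simple fair sharing: whenever resources free up, offer them to the framework currently using the smallest fraction of the cluster. The interface below is an assumption, not the actual Nexus module API.

```python
# Sketch of a pluggable allocation module: simple fair sharing by CPU count.
# Illustrative interface; not the real Nexus allocation-module API.

class FairShareAllocator:
    def __init__(self, cluster_cpus):
        self.cluster_cpus = cluster_cpus
        self.usage = {}                      # framework id -> CPUs currently held

    def register(self, framework_id):
        self.usage.setdefault(framework_id, 0)

    def framework_to_offer(self):
        """Pick the framework with the lowest current share of the cluster."""
        if not self.usage:
            return None
        return min(self.usage, key=lambda f: self.usage[f] / self.cluster_cpus)

    def resources_allocated(self, framework_id, cpus):
        self.usage[framework_id] += cpus

    def resources_released(self, framework_id, cpus):
        self.usage[framework_id] -= cpus
```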
[Figure: hierarchical fair sharing example at Facebook.com — the cluster's resources are split between the Spam and Ads organizations, each organization's share is divided among its users (User 1, User 2), and each user's share is divided among jobs (Job 1-Job 4), with shares changing over time]
Revocation
- Killing tasks to make room for other users
- Not the normal case, because fine-grained tasks enable quick reallocation of resources
- Sometimes necessary:
  - Long-running tasks that never relinquish resources
  - A buggy job running forever
  - A greedy user who decides to make their tasks long
Revocation Mechanism
- Revoke only when some user is below its safe share and is interested in offers
- Revoke tasks from the users farthest above their safe share (sketched below)
- A framework is warned before its task is killed
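A hedged sketch of that selection rule: reclaim tasks from the users farthest above their safe shares, warning each framework before the kill, until enough resources are recovered. The object shapes (.share, .safe_share, .tasks, .framework) are illustrative assumptions.

```python
# Sketch of the revocation policy described above. Illustrative structures only.

def pick_tasks_to_revoke(users, needed_cpus):
    """users: objects with .share, .safe_share, and .tasks; each task has
    .cpus and .framework. Returns the tasks chosen for revocation."""
    victims, reclaimed = [], 0
    # Consider users farthest above their safe share first.
    for user in sorted(users, key=lambda u: u.share - u.safe_share, reverse=True):
        surplus = user.share - user.safe_share   # CPUs we may take from this user
        for task in user.tasks:
            if reclaimed >= needed_cpus or surplus <= 0:
                break
            task.framework.warn_revocation(task)  # warn before the task is killed
            victims.append(task)
            reclaimed += task.cpus
            surplus -= task.cpus
        if reclaimed >= needed_cpus:
            break
    return victims
```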
- Giving each user a small safe share may not be enough if jobs need many machines
- Can run a traditional grid or HPC scheduler (e.g., Torque) as a user with a larger safe share of the cluster, and have MPI jobs queue up on it
[Figure: hierarchical shares with Torque running as a user — the Spam and Ads organizations divide their shares among User 1, User 2, and a Torque scheduler, with jobs (Job 1, Job 2, Job 4) and queued MPI work beneath them]
MULTI-RESOURCE FAIRNESS
What is Fair?
- Goal: define a fair allocation of resources in the cluster between multiple users
- Example: suppose we have:
  - 30 CPUs and 30 GB RAM
  - Two users with equal shares
  - User 1 needs <1 CPU, 1 GB RAM> per task
  - User 2 needs <1 CPU, 3 GB RAM> per task
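One of the policies compared below is Dominant Resource Fairness (DRF), which equalizes each user's share of its dominant resource (the resource it uses the largest fraction of). The sketch below applies DRF by progressive filling to the example above; the code and its names are an illustration, not the Nexus implementation. For these demands it gives User 1 fifteen tasks and User 2 five tasks, so both end up holding a 50% dominant share.

```python
# Dominant Resource Fairness (DRF) by progressive filling, applied to the
# example above (30 CPUs, 30 GB RAM). Illustrative sketch only.

capacity = {"cpu": 30.0, "mem": 30.0}
demands = {
    "user1": {"cpu": 1.0, "mem": 1.0},   # per-task demand
    "user2": {"cpu": 1.0, "mem": 3.0},
}

used = {"cpu": 0.0, "mem": 0.0}
tasks = {u: 0 for u in demands}

def dominant_share(user):
    # Largest fraction of any single resource this user currently occupies.
    d = demands[user]
    return max(tasks[user] * d[r] / capacity[r] for r in capacity)

def fits(user):
    d = demands[user]
    return all(used[r] + d[r] <= capacity[r] for r in capacity)

while True:
    runnable = [u for u in demands if fits(u)]
    if not runnable:
        break
    # Grant the next task to the user with the lowest dominant share.
    user = min(runnable, key=dominant_share)
    for r in capacity:
        used[r] += demands[user][r]
    tasks[user] += 1

print(tasks)  # {'user1': 15, 'user2': 5}: both users reach a 50% dominant share
```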
Fairness Properties
[Table: fairness properties (Pareto efficiency, single-resource fairness, bottleneck fairness, share guarantee, population monotonicity, envy-freedom, resource monotonicity) compared across the Asset, Dynamic, CEEI, and DRF schedulers; DRF satisfies the most properties]
IMPLEMENTATION
Implementation Stats
Frameworks
Ported frameworks:
- Hadoop (900-line patch)
- MPI (160-line wrapper scripts)

New frameworks:
- Spark, a Scala framework for iterative jobs (1,300 lines)
- Apache + haproxy, an elastic web server farm (200 lines)
RESULTS
Overhead
[Figure: dynamic fine-grained sharing — tasks from Hadoop 1, Hadoop 2, and Hadoop 3 interleaved across the cluster's nodes, with each framework's share growing and shrinking over time]
[Figure: elastic web farm experiment — the Nexus master receives status updates from Nexus slaves, each running load-generator executors and web (Apache) executors with their tasks]
Future Work
- Experiment with parallel programming models
- Further explore low-latency services on Nexus (web applications, etc.)
- Shared services (e.g., BigTable, GFS)
- Deploy to users and open source
Cloud software stacks at Amazon, Microsoft, and Google:
- Applications
- Application frameworks: MapReduce, Sawzall, Google App Engine, Protocol Buffers (Google)
- Software infrastructure:
  - VM management: EC2 (Amazon), Fabric Controller (Microsoft), Borg (Google)
  - Job scheduling: Fabric Controller (Microsoft), Borg (Google)
  - Storage management: S3, EBS (Amazon), Fabric Controller (Microsoft), GFS, BigTable (Google)
  - Monitoring: Fabric Controller (Microsoft), Borg (Google)
- Hardware infrastructure
Open-source systems at each layer:
- Application frameworks
- VM management: Eucalyptus, Enomalism, Tashi, Reservoir, Nimbus, oVirt
- Storage management: HDFS, KFS, Gluster, Lustre, PVFS, MooseFS, HBase, Hypertable
- Monitoring: Ganglia, Nagios, Zenoss, MON, Moara
- Hardware infrastructure: PRS, Emulab, Cobbler, xCat
Shared:
- Global services: sign-on, monitoring, storage
- Open-source stack (PRS, Tashi, Hadoop)
- 9 sites currently, with a target of around 20 in the next two years

Each site:
- Runs its own research and technical teams
- Contributes individual technologies
- Operates some of the global services (the HP site supports the portal and PRS, the Intel site develops and supports Tashi, Yahoo! contributes to Hadoop)
[Figure: cluster network topology — racks connected through 24-48 Gb/s switches over 1 Gb/s (x4/x8) uplinks, with 1 Gb/s point-to-point links to the nodes]
- Blade rack, 40 nodes (1 Gb/s x4x4 p2p, 48 Gb/s switch):
  - 20 nodes: 1x Xeon (1-core) [Irwindale/Pentium 4], 6 GB DRAM, 366 GB disk (36 + 300 GB)
  - 10 nodes: 2x Xeon 5160 (2-core) [Woodcrest/Core], 4 GB RAM, 2x 75 GB disks
  - 10 nodes: 2x Xeon E5345 (4-core) [Clovertown/Core], 8 GB DRAM, 2x 150 GB disks
- Blade rack, 40 nodes (1 Gb/s x4x4 p2p, 48 Gb/s switch): 2x Xeon E5345 (quad-core) [Clovertown/Core], 8 GB DRAM, 2x 150 GB disks
- 1U rack, 15 nodes (1 Gb/s x15 p2p, 48 Gb/s switch): 2x Xeon E5420 (quad-core) [Harpertown/Core 2], 8 GB DRAM, 2x 1 TB disks
- 2U rack, 15 nodes (1 Gb/s x15 p2p, 48 Gb/s switch): 2x Xeon E5440 (quad-core) [Harpertown/Core 2], 8 GB DRAM, 6x 1 TB disks
- 2U rack, 15 nodes (1 Gb/s x15 p2p, 48 Gb/s switch): 2x Xeon E5520 (quad-core) [Nehalem-EP/Core i7], 16 GB DRAM, 6x 1 TB disks
[Figure: physical rack layout (r1r1-r3r3) with PDUs providing per-port power monitoring and control; the key lists per-rack node and core counts]
Open Cirrus site characteristics (cores / nodes / public partition / memory / storage / spindles / network):
- IDA: 2,400 cores, 300 nodes, 100 public, 4.8 TB memory, 600 spindles
- Intel: 1,364 cores, 198 nodes, 145 public, 1.8 TB memory, 1 Gb/s network
- KIT: 2,048 cores, 256 nodes, 128 public, 10 TB memory, 1 Gb/s network
- UIUC: 1,024 cores, 128 nodes, 64 public, 2 TB memory, ~500 TB storage, 288 spindles, 1 Gb/s network
- CMU: 1,024 cores, 128 nodes, 64 public, 2 TB memory, 1 Gb/s network
- Yahoo! (M45): 3,200 cores, 480 nodes, 400 public, 2.4 TB memory, 1.2 PB storage, 1,600 spindles, 1 Gb/s network, Hadoop on demand
- Totals across all sites: 26.3 TB memory, 4 PB storage
Testbed Comparison
- Open Cirrus — Type of research: systems and services; Approach: federation of heterogeneous data centers; Participants: HP, Intel, IDA, KIT, UIUC, Yahoo!, CMU; Distribution: 7 (9) sites, 1,746 nodes, 12,074 cores
- IBM/Google — Type of research: data-intensive applications; Approach: a cluster supported by Google and IBM; Participants: IBM, Google, Stanford, U. Washington, MIT; Distribution: 1 site
- TeraGrid — Approach: multi-site heterogeneous clusters; Participants: many schools and organizations; Distribution: 11 partners in the US
- PlanetLab — Type of research: systems; Approach: a few hundred nodes hosted by research institutions; Participants: many schools and organizations; Distribution: >700 nodes world-wide
- EmuLab — Type of research: systems; Approach: a single-site cluster with flexible control; Participants: University of Utah; Distribution: >300 nodes at the University of Utah
- Open Cloud Consortium — Type of research: interoperability across clouds using open APIs; Approach: multi-site clusters with a focus on the network; Participants: 4 centers; Distribution: 480 cores distributed over four locations
- Amazon EC2 — Type of research: commercial use; Approach: raw access to virtual machines; Participants: Amazon
- LANL/NSF cluster — Type of research: systems; Approach: re-use of LANL's retiring clusters; Participants: CMU, LANL, NSF; Distribution: 1 site
[Figure, built up across several slides: the Open Cirrus software stack — research workloads run on Tashi, which runs on the Zoni service, alongside platform services and an experiment save/restore facility; an application can run on Hadoop, on a Tashi virtual cluster, on a PRS, or directly on real hardware]
System Organization
- Compute nodes are divided into dynamically allocated, VLAN-isolated PRS subdomains
- Example subdomains: open service research, Tashi development, production storage service, proprietary service research, open workload monitoring and trace collection
- Zoni code from HP is being merged into the Tashi Apache project and extended by Intel
- Running on the HP site; being ported to the Intel site; will eventually run on all sites
Research focus:
Location-aware co-scheduling of VMs, storage, and power. Seamless physical/virtual migration.
Joint with Greg Ganger (CMU), Mor Harchol-Balter (CMU), Milan Milenkovic (CTG)
[Figure: Tashi architecture — a scheduler and a cluster manager coordinating the nodes]
The storage service aggregates the capacity of the commodity nodes to house Big Data repositories.
[Figure: random placement vs. location-aware placement of computation and data across nodes]
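The toy sketch below contrasts the two placements in the figure: random placement scatters a task onto any node, while location-aware placement prefers a node that already stores the task's data block. The data structures and names are illustrative assumptions.

```python
import random

# Toy sketch of the two placement policies in the figure above. Illustrative only.
# block_locations maps a data block to the set of nodes holding a replica.

def random_placement(nodes, block, block_locations):
    return random.choice(nodes)

def location_aware_placement(nodes, block, block_locations):
    local_nodes = [n for n in nodes if n in block_locations.get(block, set())]
    # Prefer a node that already stores the block; otherwise fall back to random.
    return random.choice(local_nodes) if local_nodes else random.choice(nodes)

# Example (hypothetical data):
# nodes = ["n1", "n2", "n3", "n4"]
# block_locations = {"blk_7": {"n2", "n4"}}
# location_aware_placement(nodes, "blk_7", block_locations)  # -> "n2" or "n4"
```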
http://wiki.apache.org/hadoop/ProjectDescr
Provides a parallel programming model (MapReduce), a distributed file system (HDFS), and a parallel database (HBase)
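As a minimal illustration of the MapReduce programming model, the canonical word-count example is sketched below in plain Python (a conceptual sketch, not an actual Hadoop job).

```python
from collections import defaultdict

# Word count in the MapReduce style: map emits (word, 1) pairs, the framework
# groups pairs by key, and reduce sums the counts per word. Conceptual sketch.

def map_fn(line):
    for word in line.split():
        yield (word, 1)

def reduce_fn(word, counts):
    return (word, sum(counts))

def run_mapreduce(lines):
    groups = defaultdict(list)
    for line in lines:                       # "map" phase
        for key, value in map_fn(line):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())   # "reduce" phase

print(run_mapreduce(["the cloud runs the jobs", "the cloud scales"]))
# {'the': 3, 'cloud': 2, 'runs': 1, 'jobs': 1, 'scales': 1}
```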
What kinds of research projects are Open Cirrus sites looking for?
Open Cirrus is seeking research in the following areas (different centers will weight these differently):
- Datacenter federation
- Datacenter management
- Web services
- Data-intensive applications and systems
Contact names, email addresses, and web links for applications to each site will be available on the Open Cirrus web site (which goes live in Q2 2009)
http://opencirrus.org
Each Open Cirrus site decides which users and projects get access to its site.
- Explore location-aware and power-aware workload scheduling
- Develop integrated physical/virtual allocations to combat cluster squatting
- Design cloud storage models
Isolation Research
- Predictable performance (low variance) matters more than raw performance
- Some resources that people have run into isolation problems with:
  - Power, disk space, disk I/O rate (drive, bus), memory space (user/kernel), memory bus, caches at all levels (TLB, etc.), hyperthreading, CPU rate, interrupts
  - Network: NIC (Rx/Tx), switch, cross-datacenter, cross-country
  - OS resources: file descriptors, ports
Datacenter Energy
EPA, 8/2007:
- 1.5% of total U.S. energy consumption
- Growing from 60 to 100 billion kWh over 5 years
- 48% of a typical IT budget is spent on energy

- 75 MW of new datacenter deployments in PG&E's service area that they know about (expect another 2x)
- Microsoft: $500M new Chicago facility
  - Three substations with a capacity of 198 MW
  - 200+ shipping containers with 2,000 servers each
Power/Cooling Issues
Within datacenter racks, network equipment is often the hottest component in the hot spot
M. K. Patterson, A. Pratt, P. Kumar, "From UPS to Silicon: an end-to-end evaluation of datacenter efficiency", Intel Corporation
Hard state (proxying), soft state (caching), and protocol/data streamlining for power as well as bandwidth reduction
Summary
Many areas for research into Cloud Computing!
Datacenter design, languages, scheduling, isolation, energy efficiency (at all levels)
UC Berkeley
Thank you!
adj@eecs.berkeley.edu
http://abovetheclouds.cs.berkeley.edu/