
Big Data Platform Elements - Part 1

CIS 415 Lecture 3


Hina Arora

Announcements
We have a Grader!
o Anirudh Dhawan (andhawan@asu.edu)
o Office Hours: Thur 10am-12pm; BA Suite 318

Show of hands: did you complete last week's required readings?
o Contents of Lecture-1 Deck and any supplemental notes you took in class
o Review: Vocabulary Section in Lecture-1 Deck
o Review: List of common data applications here
o Watch: The Beauty of Data Visualization

Big Data Platform Elements


[Figure: the four elements of Big Data Platforms: Virtualization, Cloud Computing, Parallel Programming, Map Reduce]

What will we cover today?


Virtualization
Cloud Computing

Virtualization

What is Virtualization?
Virtualization means that Applications can use a resource without
any concern for where it resides, what the technical interface is,
how it has been implemented, which platform it uses, and how
much of it is available
~Rick F. Van der Lans in Data Virtualization for Business Intelligence Systems

We'll look at a few different types of virtualization:


o Server Virtualization can be HW-level or OS-level Virtualization
o Storage Virtualization
o Network Virtualization
o Desktop Virtualization
o Application Virtualization

Server Virtualization: HW-level virtualization


Ability to run multiple Virtual Machines (VMs or guests) on a single Physical
Machine (host).
Each Virtual Machine emulates the underlying physical hardware and has an
Operating System (OS).
Guest VMs are almost completely isolated from each other.
Each guest VM can run a different OS.
Hypervisors (or Virtual Machine Monitors or VMMs) are used to create and run
VMs. There are two types of hypervisors:
o Type-1, Native or Bare-metal Hypervisors:
  Run directly on the host's hardware.
  Example: Hyper-V Hypervisor.
o Type-2 or Hosted Hypervisors:
  Run on the host's OS.
  Example: VMware Player, VirtualBox

[Figure: Type-1 stack, top to bottom: VMs (App, Bins/Libs, Guest OS), Hypervisor Type-1, Server]
[Figure: Type-2 stack, top to bottom: VMs (App, Bins/Libs, Guest OS), Hypervisor Type-2, Host OS, Server]

Reference: https://en.wikipedia.org/wiki/Hypervisor
Server Virtualization provides improved utilization and scalability

Server Virtualization: OS-level virtualization


Ability to run multiple isolated Containers (user-space instances or guests)
on a single Physical Machine (host).
Containers do not emulate the underlying HW and don't have their own OS
(they share the host OS). This lighter footprint allows hosts to support a
higher density of guest Containers (as against guest VMs). On the flip
side, sharing the host OS raises security concerns.
Containers can also share binaries and libraries with other Containers.
Example: Docker
Each Container typically runs a single Application.

[Figure: Container stack, top to bottom: Containers (App, Bins/Libs; some Containers share Bins/Libs), Docker, Host OS, Server]

Reference: https://en.wikipedia.org/wiki/Operating-system-level_virtualization

Review: Storage Definitions


Block
o A sequence of bytes.
o Storage systems typically provide access to blocks.
o The OS typically abstracts other logical views like files and records.

Striping
o Sequential blocks of data are stored on different physical storage devices in (typically) round-robin fashion.
o Example: Disk1 <A, C, E>; Disk2 <B, D, F>
o Striping is useful when requests for data are faster than a single storage device can deliver. Striping data across multiple storage devices
allows for concurrent access to data thereby improving performance.

Mirroring
o Replication of data onto separate disks in real time.
o Example: Disk1 <A, B, C>; Disk2 <A, B, C>
o Improves data redundancy and reliability.

Parity
o Data on a crashed disk can be reconstructed from data on the other disks (using the XOR operation).
o Example: Disk1 <A:11010011>; Disk2 <B:10011001>; Disk3 <PAB:01001010>
o Essentially, PAB = A XOR B, so if any one disk crashes, you can reconstruct its contents by XORing the other two.
o Improves data redundancy

File System:
o Controls how data is managed, stored and retrieved.
o Without a file system, we would just have a large blob of data with no way to identify different connected pieces of information.
o File systems are organized around groups of data called files, and groups of files called directories or folders.
o Distributed file systems are file systems that are spread across multiple servers.

Reference: Wikipedia
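The striping, mirroring, and parity definitions above can be tried out in a few lines of Python. This is only a minimal sketch: the function names (stripe, mirror, parity) are made up for illustration, blocks are modelled as short strings or 8-bit integers, and the values are the ones used in the examples on this slide.

```python
def stripe(blocks, num_disks):
    """Distribute sequential blocks across disks in round-robin order."""
    disks = [[] for _ in range(num_disks)]
    for i, block in enumerate(blocks):
        disks[i % num_disks].append(block)
    return disks


def mirror(blocks, num_disks=2):
    """Write a full copy of every block to each disk."""
    return [list(blocks) for _ in range(num_disks)]


def parity(*blocks):
    """XOR equal-width integer blocks together to form a parity block."""
    result = 0
    for block in blocks:
        result ^= block
    return result


# Values taken from the slide's examples.
striped = stripe(["A", "B", "C", "D", "E", "F"], 2)
mirrored = mirror(["A", "B", "C"])

a = 0b11010011
b = 0b10011001
p_ab = parity(a, b)

# If the disk holding A crashes, XOR the two surviving blocks to rebuild it.
rebuilt_a = parity(b, p_ab)

print(striped)
print(mirrored)
print(format(p_ab, "08b"))
print(rebuilt_a == a)
```

The same parity function is used both to compute PAB and to rebuild a lost block, because XOR is its own inverse.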

Storage Virtualization

Data is abstracted into what appears to be a single storage unit, while the physical
storage actually spans multiple heterogeneous devices and often locations

Storage Virtualization provides location independence, improved utilization,
performance, reliability and availability

Example: RAID (redundant array of independent/inexpensive disks)

Popular RAID Types

Techniques:
o Striping (provides excellent performance)
o Mirroring (provides excellent redundancy)
o Parity (provides good redundancy)

RAID 0
o Striping: Yes; Mirroring: No; Parity: No
o Minimum Number of Disks: 2
o Example (Disk Blocks): Disk 1 -- A, C, E; Disk 2 -- B, D, F
o Comments: Excellent Performance. No Redundancy. Do not use for critical applications.

RAID 1
o Striping: No; Mirroring: Yes; Parity: No
o Minimum Number of Disks: 2
o Example (Disk Blocks): Disk 1 -- A, B, C; Disk 2 -- A, B, C
o Comments: Good Performance. Excellent Redundancy.

RAID 5
o Striping: Yes; Mirroring: No; Parity: Yes (Distributed Parity)
o Minimum Number of Disks: 3
o Example (Disk Blocks): Disk 1 -- A, C, PEF; Disk 2 -- B, PCD, E; Disk 3 -- PAB, D, F
o Comments: Good Performance. Good Redundancy. Most cost effective. Fast Reads; Slow Writes.

RAID 10
o Striping: Yes; Mirroring: Yes; Parity: No
o Minimum Number of Disks: 4
o Example (Disk Blocks): Disk 1 -- A, C, E; Disk 2 -- A, C, E; Disk 3 -- B, D, F; Disk 4 -- B, D, F
o Comments: Excellent Performance. Excellent Redundancy. Great for mission critical applications.

Reference: https://en.wikipedia.org/wiki/RAID
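The RAID 5 example above spreads parity blocks across all disks instead of keeping them on one dedicated disk. The sketch below reproduces that placement for blocks A to F on three disks. The function name raid5_layout and the rotation rule (parity starts on the last disk and moves one disk to the left per stripe) are assumptions chosen to match the slide; real RAID implementations may rotate parity in a different order.

```python
def raid5_layout(data_blocks, num_disks=3):
    """Place data blocks and one parity block per stripe across the disks.

    Assumed rotation: parity starts on the last disk and moves one disk
    to the left on each following stripe.
    """
    per_stripe = num_disks - 1  # data blocks per stripe; one slot holds parity
    if len(data_blocks) % per_stripe != 0:
        raise ValueError("number of data blocks must fill whole stripes")

    disks = [[] for _ in range(num_disks)]
    for start in range(0, len(data_blocks), per_stripe):
        stripe = data_blocks[start:start + per_stripe]
        stripe_index = start // per_stripe
        parity_disk = (num_disks - 1 - stripe_index) % num_disks
        remaining = iter(stripe)
        for d in range(num_disks):
            if d == parity_disk:
                # Label the parity block after the data it protects, e.g. PAB.
                disks[d].append("P" + "".join(stripe))
            else:
                disks[d].append(next(remaining))
    return disks


layout = raid5_layout(["A", "B", "C", "D", "E", "F"])
for number, blocks in enumerate(layout, start=1):
    print(f"Disk {number} -- {', '.join(blocks)}")
```

Because every stripe keeps its parity on a different disk, losing any single disk leaves enough data and parity on the other two to rebuild it.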

Review: Network Definitions

Local Area Network (LAN):
o A computer network with interconnected devices within a limited geographical area such as a house or building.

Wide Area Network (WAN):
o A computer network that spans large geographical areas

IP Address
o Address of a device participating in a network
o IPv4: 32 bits | IPv6: 128 bits
o Example: 11000000.10101000.00000101.10000010 (192.168.5.130)
o Higher order bits determine network (indicated by subnet mask), and lower order bits determine host (device)

Subnetting:
o Dividing a network into smaller parts
o This affects the total number of hosts that can be addressed

Switch:
o Connects devices together on a computer network

Router
o Carries traffic from one network/subnet to the other
o Routers maintain routing tables to determine whether traffic is meant for this LAN, a connected LAN or a different network.
o Example: the home router connects home computers to the internet (these are similar networks since they both share TCP/IP protocol)

Reference: Wikipedia

Image Source: http://netprivateer.com/lanwan.h
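The IP address and subnetting definitions above can be explored with Python's standard ipaddress module. This is a small sketch, not part of the original deck: the address is the slide's example, while the /24 subnet mask and the split into four /26 subnets are assumptions made for illustration.

```python
import ipaddress

# The address is the slide's example; the /24 prefix is an assumption.
iface = ipaddress.ip_interface("192.168.5.130/24")
network = iface.network

# Dotted binary form of the 32-bit IPv4 address.
bits = format(int(iface.ip), "032b")
dotted_binary = ".".join(bits[i:i + 8] for i in range(0, 32, 8))

# Higher order bits give the network, lower order bits give the host.
network_part = ipaddress.ip_address(int(iface.ip) & int(network.netmask))
host_part = int(iface.ip) & int(network.hostmask)

# Subnetting: divide the /24 into four /26 networks (assumed split).
subnets = list(network.subnets(new_prefix=26))

# Each IPv4 subnet reserves its network and broadcast addresses.
usable_before = network.num_addresses - 2
usable_after = sum(s.num_addresses - 2 for s in subnets)

print(dotted_binary)
print(network_part, host_part)
print([str(s) for s in subnets])
print(usable_before, usable_after)
```

Comparing usable_before with usable_after shows how subnetting affects the total number of hosts that can be addressed: every extra subnet gives up two addresses.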

Network Virtualization

Creation of logical, virtual networks that are decoupled from the (limitations of) underlying
physical hardware.

Example: VLAN, VPN

Virtual Local Area Network (VLAN)
o Allows for grouping of hosts within a virtual LAN regardless of geographical location
o Provides scalability, flexibility, simplified administration, and security

Virtual Private Network (VPN)
o Securely extends a private network over a public network such as the internet
o Users can remotely communicate with the private network as though they were directly connected to it with the same functionality, security and administrative policies
o Provides flexibility, simplified administration, and security

Image Source: link
Image Source: https://en.wikipedia.org/wiki/Virtual_private_ne

(Remote) Desktop Virtualization


Enables access to applications on a remote OS using a virtual desktop.
The remote OS carries the application and data, and only the display, keyboard, and mouse
information are communicated with the local client device.
Users (on the local client devices) must establish a session and be connected with the
remote server to access the application.
Makes installation, upgrades and management of applications easier for IT.
Two kinds: RDS, VDI
Remote Desktop Services (RDS) aka Terminal Services
o Provides remote desktop to multiple users on a Host OS
o Provides users session-based isolation (session virtualization) - users share Host OS
o Users have no admin privileges on the host OS
o Can support higher user density

Virtual Desktop Infrastructure (VDI)


o Provides remote desktop to multiple users on Guest OSs
o Provides users VM-based isolation - each user gets a dedicated Guest OS
o Users have admin privileges on the Guest OS
o Supports lower user density

Application Virtualization

Application Virtualization separates the Application from the OS, so Applications can
be more easily deployed and delivered.

The application is packaged and streamed from the server down the network to the
client and, instead of being installed on the client device, is executed on the local
device in a virtual bubble that is completely isolated from the client OS.

Applications are streamed intelligently.
o Only required parts are streamed as and when they are used.
o Once the application has been streamed, it is cached on the client device so it doesn't have to be streamed every time a user uses it on the client. This also means the application can be used even when the client is not connected to the server.
o When an application upgrade is available, the server copy is upgraded, and the upgrades are streamed down to the clients the next time the application is used on the client.

Makes installation, upgrades and management of applications easier for IT.

Examples: VMware ThinApp, Citrix XenApp and Microsoft App-V
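The streaming behaviour described above (stream parts on first use, cache them, pick up upgrades on the next connected use) can be sketched as a tiny simulation. Everything here is hypothetical: AppServer, StreamingClient, and the part names are invented for illustration and do not reflect how ThinApp, XenApp, or App-V are actually implemented. As a simplification, an upgrade discards the whole cache instead of streaming only the changed parts.

```python
class AppServer:
    """Hypothetical server holding an application split into named parts."""

    def __init__(self, version, parts):
        self.version = version
        self.parts = dict(parts)

    def upgrade(self, version, parts):
        """Replace the server copy with a newer version."""
        self.version = version
        self.parts = dict(parts)


class StreamingClient:
    """Hypothetical client that streams parts on first use and caches them."""

    def __init__(self, server):
        self.server = server
        self.cached_version = None
        self.cache = {}
        self.streamed = []  # log of (version, part) pairs fetched from the server

    def use(self, part, online=True):
        if online and self.cached_version != self.server.version:
            # The server copy changed: drop stale parts (simplified upgrade).
            self.cache.clear()
            self.cached_version = self.server.version
        if part not in self.cache:
            if not online:
                raise LookupError(f"{part!r} is not cached and the client is offline")
            self.cache[part] = self.server.parts[part]
            self.streamed.append((self.cached_version, part))
        return self.cache[part]


server = AppServer("1.0", {"editor": "editor-v1", "spellcheck": "spell-v1"})
client = StreamingClient(server)

client.use("editor")                # first use: streamed from the server
client.use("editor")                # second use: served from the local cache
client.use("editor", online=False)  # cached parts still work while disconnected

server.upgrade("2.0", {"editor": "editor-v2", "spellcheck": "spell-v2"})
latest = client.use("editor")       # next connected use picks up the upgrade

print(client.streamed)
print(latest)
```

Note that the spellcheck part is never transferred, since only parts that are actually used get streamed.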

Reference: http://blogs.msdn.com/b/ianm/archive/2010/06/11/microsoft-virtual-desktop-101-making-sense-of-vdi-rds-app-v-med-v-and-

Cloud Computing

Have you used Applications Hosted on the Cloud?

What are some characteristics these applications have in common*?

You typically sign up for service (free with ads, free trial, or subscription)

You connect to the internet for access

You don't need to install application software, and version upgrades are
pushed seamlessly

You expect reliable, on-demand, self-service of the application

You expect the ability to upgrade instantaneously (e.g., more storage, no ads, etc.)

You rely on the service provider for infrastructure (e.g., you don't set up a mail
server)

You rely on the service provider for security and privacy

You rely on the service provider for backup and recovery

*Note: a lot of these services come with client apps; we are not considering
that scenario here.

What is Cloud Computing?

Cloud computing is a model for enabling convenient, on-demand network
access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned
and released with minimal management effort or service provider interaction.

Key enabling technologies include: (1) fast wide-area networks, (2) powerful,
inexpensive server computers, and (3) high-performance virtualization for
commodity hardware.

Source: http://www.nist.gov/itl/cloud/

http://www.intel.com/content/www/us/en/cloud-computing/cloud-101-video.html

Deployment Models
There are 3 basic deployment models in cloud computing:

Private Cloud
o Two kinds of private clouds:
  On-Prem Private Cloud: On-Prem Data Center + Network Virtualization + Cloud Orchestration Software
  Externally Hosted Private Cloud (also called Virtual Private Cloud): Logically isolated, user-defined, and user-controlled portion of a 3rd party hosted cloud (like AWS or Microsoft).
o Provides high degree of Control
o Good for highly-sensitive data and applications

Public Cloud
o Third-Party Provides Cloud Services (3 different service models - IaaS, PaaS, or SaaS)
o Typically pay-as-you-go model (you pay for what you use)
o Service Provider held to agreed upon availability, reliability, privacy and security standards
o Provides high degree of Scalability
o Example: Amazon AWS, Microsoft Azure, Google Cloud

Hybrid Cloud
o Combination of Private and Public Cloud
o Allows you to pick desired level of Control vs Scalability

Service Models
There are 4 basic service models in cloud computing, based on what
parts of the stack the User controls vs what the Cloud Provider
manages.

Private: User controls everything from the networking
to the applications. Example: the user's on-premises
datacenter.

IaaS: User controls the application down to the
underlying OS, and the Cloud Provider manages the
virtualization layer and the hardware. Example: getting
a virtual server in the cloud.

PaaS: User controls application and data, and the Cloud
Provider provisions the underlying supporting
infrastructure, typically including operating system,
programming-language execution environment,
database, and web servers. This allows developers to
focus on application development instead of worrying
about underlying hardware and software layers.

SaaS: User gains access to application software and
databases. Cloud providers install and operate
application software, and manage the infrastructure and
platforms that run the applications. Example: O365 in
the cloud.

* Note: "Managed by Microsoft" is just an example; it's essentially the cloud provider of your choice.
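The split of control described above can be written down as a small lookup table. This is a sketch under assumptions: the nine layer names in STACK follow a commonly drawn cloud stack diagram and are not taken from this deck, and the cut-off points in USER_LAYERS are one reading of the four descriptions.

```python
# Layer names are an assumption (a commonly drawn nine-layer stack), top to bottom.
STACK = [
    "Applications", "Data", "Runtime", "Middleware", "OS",
    "Virtualization", "Servers", "Storage", "Networking",
]

# Number of layers, counted from the top of STACK, that the User controls.
USER_LAYERS = {
    "Private": 9,  # everything from the networking to the applications
    "IaaS": 5,     # the application down to the underlying OS
    "PaaS": 2,     # application and data
    "SaaS": 0,     # the provider operates the whole stack
}


def responsibilities(model):
    """Return which layers the user controls and which the provider manages."""
    n = USER_LAYERS[model]
    return {"user": STACK[:n], "provider": STACK[n:]}


for model in USER_LAYERS:
    split = responsibilities(model)
    print(f"{model}: user controls {split['user']}; provider manages {split['provider']}")
```

Reading the output from Private to SaaS shows the boundary moving up the stack, with the provider taking over more layers at each step.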

Reference: https://en.wikipedia.org/wiki/Cloud_computing

Image Source: http://cloudcomputing.sys-con.com/node/2932

Key Characteristics

On-demand self-service:
A consumer can provision computing capabilities as needed, automatically, without requiring human interaction
with each service provider.

Device and location independence:


Users can access service using a web browser regardless of location or device used (e.g., PC, mobile phone).

Resource pooling:
Computing resources are pooled to serve multiple consumers, with different physical and virtual resources
dynamically assigned and reassigned according to consumer demand.

Scalability and elasticity:


Dynamic on-demand provisioning of resources on a fine-grained, self-service basis in near real-time without
users having to engineer for peak loads.

Measured service:
Resource usage can be monitored, controlled, and reported, providing transparency for both the provider and
consumer of the utilized service.

Reference: http://www.nist.gov/itl/cloud/ and https://

Advantages and Risks

Advantages
o Scalability and elasticity by design (dynamic on-demand provisioning of resources)
o Convenience by design (device and location independence)
o Continuous Availability by design (on-demand self-service)
o Improved Reliability due to use of multiple redundant sites
o Faster Deployment since infrastructure set up is quick, and software integration is easier
o Cost Reduction due to savings on sunk cost of infrastructure, licenses, and maintenance

Risks
o Limited Control over infrastructure, software, and data
o Security and Privacy of data is at the mercy of the Service Provider
o Dependency on the Provider can lead to vendor lock-in and migration challenges
o Downtime of service can occur due to Service Provider outage or network access issues

Reference: https://en.wikipedia.org/wiki/Cloud_computing

What did we learn today?


Four key elements make up big data platforms:
o Virtualization, Cloud Computing, Parallel Programming and Map Reduce.

Virtualization means that Applications can use a resource without any concern
for where it resides, what the technical interface is, how it has been
implemented, which platform it uses, and how much of it is available.
o Virtualization can occur at different levels of the stack: Server, Storage, Network, Desktop
and Application.

Cloud computing is a model for enabling convenient, on-demand network
access to a shared pool of configurable computing resources (e.g., networks,
servers, storage, applications, and services) that can be rapidly provisioned
and released with minimal management effort or service provider interaction.
o Three Deployment Models: Private, Public, Hybrid.
o Four Service Models: Private, IaaS, PaaS, SaaS.
o There are Advantages and Risks involved in Cloud Computing that one must be aware of.

Required Readings for this Lecture


Contents of this Deck
o Note: Anything I've linked to as Source, Reference, or Optional Reading in the deck is not required reading.

Supplemental notes you take during class

Homework - spend 5-10 minutes on each of these sites: Amazon AWS, Microsoft Azure,
Google Cloud
o Do you now see a number of familiar terms on these sites?
o What deployment models do they cover?
o What service models do they cover?
o Note how they all have very similar competing offers (including free trials to improve adoption).
