B. RAMAMURTHY
Hadoop File System
07/28/14
Reference

The Hadoop Distributed File System: Architecture and Design by Apache Foundation Inc.
Basic Features: HDFS

Highly fault-tolerant

High throughput

Suitable for applications with large data sets

Streaming access to file system data

Can be built out of commodity hardware


Fault tolerance

Failure is the norm rather than the exception.

An HDFS instance may consist of thousands of server machines, each storing part of the file system's data.

Because there are a huge number of components, each with a non-trivial probability of failure, some component is always non-functional.

Detection of faults and quick, automatic recovery from them is a core architectural goal of HDFS.
Data Characteristics

Streaming data access

Applications need streaming access to data

Batch processing rather than interactive user access.

Large data sets and files: gigabytes to terabytes in size

High aggregate data bandwidth

Scale to hundreds of nodes in a cluster

Tens of millions of files in a single instance

Write-once-read-many: a file once created, written and closed need not be changed; this assumption simplifies coherency

A map-reduce application or web-crawler application fits perfectly with this model.
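MapReduce
[Figure: MapReduce data flow. A terabyte-sized input of words (Cat, Bat, Dog, other words) is divided into splits; each split is processed by a map task, map outputs pass through combine steps, and reduce tasks write the output files (part0, part1, ...).]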
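The data flow in the figure can be imitated in a few lines of plain Python. This is only a single-process sketch of the idea, not Hadoop code: the function names `map_split` and `reduce_counts` and the sample splits are hypothetical, and map and combine are folded into one local counting step.

```python
from collections import Counter


def map_split(split):
    """Map + combine: count the words of one split locally."""
    return Counter(split.split())


def reduce_counts(partials):
    """Reduce: merge the per-split counts into global totals."""
    total = Counter()
    for partial in partials:
        total.update(partial)
    return dict(total)


# Three tiny "splits" stand in for pieces of a terabyte-sized input.
splits = ["cat bat", "dog cat", "bat cat"]
counts = reduce_counts(map_split(s) for s in splits)
```

In a real cluster each split would be processed on a different node, which is why the write-once, streaming-read file model suits this kind of job.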
Architecture
Namenode and Datanodes

Master/slave architecture

An HDFS cluster consists of a single Namenode, a master server that manages the file system namespace and regulates access to files by clients.

There are a number of DataNodes, usually one per node in a cluster.

The DataNodes manage storage attached to the nodes that they run on.

HDFS exposes a file system namespace and allows user data to be stored in files.

A file is split into one or more blocks, and the set of blocks is stored in DataNodes.

DataNodes serve read and write requests, and perform block creation, deletion, and replication upon instruction from the Namenode.
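HDFS Architecture
[Figure: HDFS architecture. Clients send metadata ops to the Namenode, which holds the metadata (name, replicas, ...), for example (/home/foo/data, 6, ...). Clients read and write blocks on Datanodes placed in Rack 1 and Rack 2. The Namenode issues block ops to the Datanodes, and blocks are replicated between Datanodes.]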
File System Namespace

Hierarchical file system with directories and files

Create, remove, move, rename etc.

The Namenode maintains the file system

Any meta information changes to the file system are recorded by the Namenode.

An application can specify the number of replicas of the file needed: the replication factor of the file. This information is stored in the Namenode.
Data Replication

HDFS is designed to store very large files across machines in a large cluster.

Each file is a sequence of blocks.

All blocks in the file except the last are of the same size.

Blocks are replicated for fault tolerance.

Block size and replicas are configurable per file.

The Namenode receives a Heartbeat and a BlockReport from each DataNode in the cluster.

BlockReport contains all the blocks on a Datanode.
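The rule that every block except the last has the same size is simple arithmetic. The sketch below is illustrative only; the helper name `block_sizes` and the example file size are assumptions, not part of HDFS.

```python
def block_sizes(file_size, block_size):
    """Return the byte size of each block of a file, in order."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        # Only the last block may be shorter than the configured size.
        sizes.append(remainder)
    return sizes


MB = 1024 * 1024
sizes = block_sizes(200 * MB, 64 * MB)  # three full blocks plus one 8 MB block
```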


Replica Placement

The placement of the replicas is critical to HDFS reliability and performance.

Optimizing replica placement distinguishes HDFS from other distributed file systems.

Rack-aware replica placement:

Goal: improve reliability, availability and network bandwidth utilization

Research topic

Many racks; communication between racks goes through switches.

Network bandwidth between machines on the same rack is greater than between machines in different racks.

The Namenode determines the rack id for each DataNode.

Replicas are typically placed on unique racks

Simple but non-optimal

Writes are expensive

Replication factor is 3

Another research topic?

Replicas are placed: one on a node in a local rack, one on a different node in the local rack and one on a node in a different rack.

1/3 of the replicas on a node, 2/3 on a rack and 1/3 distributed evenly across remaining racks.
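The three-replica policy described above can be sketched as follows. This is a deterministic toy, assuming a hypothetical topology dictionary and function name `place_replicas`; the real Namenode also weighs load and free space and picks nodes at random, which is omitted here.

```python
def place_replicas(writer, nodes_by_rack):
    """Pick three replica locations for a block.

    writer        -- (rack, node) of the client writing the block
    nodes_by_rack -- dict mapping rack id to a list of node names
    """
    local_rack, local_node = writer
    # Replica 1: the writer's own node in the local rack.
    first = (local_rack, local_node)
    # Replica 2: a different node in the same rack.
    other_local = next(n for n in nodes_by_rack[local_rack] if n != local_node)
    second = (local_rack, other_local)
    # Replica 3: any node in a different rack (first one, for determinism).
    remote_rack = next(r for r in sorted(nodes_by_rack) if r != local_rack)
    third = (remote_rack, nodes_by_rack[remote_rack][0])
    return [first, second, third]


topology = {"rack1": ["dn1", "dn2"], "rack2": ["dn3", "dn4"]}
placement = place_replicas(("rack1", "dn1"), topology)
```

Two of the three replicas share a rack, so only one copy crosses the inter-rack switch during a write, while the third copy still survives the loss of the whole local rack.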
Replica Selection

Replica selection for READ operation: HDFS tries to minimize the bandwidth consumption and latency.

If there is a replica on the reader node then that is preferred.

An HDFS cluster may span multiple data centers: a replica in the local data center is preferred over a remote one.
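One way to picture replica selection is as choosing the replica with the smallest "distance" from the reader. The sketch below is an assumption-laden illustration: locations are modelled as hypothetical (data center, rack, node) tuples, and the same-rack level is added for illustration even though the slide only mentions the reader node and the local data center.

```python
def distance(reader, replica):
    """Smaller is closer. Locations are (data_center, rack, node) tuples."""
    if reader == replica:
        return 0  # replica on the reader node itself
    if reader[:2] == replica[:2]:
        return 1  # same rack (assumed intermediate level)
    if reader[0] == replica[0]:
        return 2  # same data center
    return 3      # remote data center


def choose_replica(reader, replicas):
    """Return the replica location closest to the reader."""
    return min(replicas, key=lambda replica: distance(reader, replica))


reader = ("dc1", "rack1", "dn1")
replicas = [("dc2", "rack9", "dn7"), ("dc1", "rack2", "dn4")]
chosen = choose_replica(reader, replicas)
```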
Safemode Startup

On startup the Namenode enters Safemode.

Replication of data blocks does not occur in Safemode.

Each DataNode checks in with a Heartbeat and a BlockReport.

The Namenode verifies that each block has an acceptable number of replicas.

After a configurable percentage of safely replicated blocks check in with the Namenode, the Namenode exits Safemode.

It then makes the list of blocks that need to be replicated.

The Namenode then proceeds to replicate these blocks to other Datanodes.
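The Safemode exit rule and the follow-up replication list can be sketched as two small functions. The names, the minimum-replica parameter and the threshold value are hypothetical; they only illustrate the check described above.

```python
def can_leave_safemode(replica_counts, min_replicas, threshold):
    """replica_counts maps block id -> replicas reported so far.

    A block is "safely replicated" once it has at least min_replicas
    reported copies; Safemode ends when the safe fraction reaches threshold.
    """
    if not replica_counts:
        return True
    safe = sum(1 for count in replica_counts.values() if count >= min_replicas)
    return safe / len(replica_counts) >= threshold


def blocks_to_replicate(replica_counts, replication_factor):
    """Blocks that still have fewer copies than their replication factor."""
    return sorted(b for b, count in replica_counts.items()
                  if count < replication_factor)


reported = {"b1": 3, "b2": 1, "b3": 0, "b4": 2}
```

With these reports, three of four blocks have at least one replica, so a 75% threshold is met while a 90% threshold is not.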
Filesystem Metadata

The HDFS namespace is stored by the Namenode.

The Namenode uses a transaction log called the EditLog to record every change that occurs to the filesystem metadata.

For example, creating a new file.

Changing the replication factor of a file

The EditLog is stored in the Namenode's local filesystem

The entire filesystem namespace, including the mapping of blocks to files and file system properties, is stored in a file called FsImage. It is also stored in the Namenode's local filesystem.
Namenode

Keeps an image of the entire file system namespace and file Blockmap in memory.

4GB of local RAM is sufficient to support the above data structures that represent the huge number of files and directories.

When the Namenode starts up it gets the FsImage and EditLog from its local file system, updates the FsImage with the EditLog information and then stores a copy of the FsImage on the filesystem as a checkpoint.

Periodic checkpointing is done so that the system can recover back to the last checkpointed state in case of a crash.
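The startup step (load FsImage, replay the EditLog, save a checkpoint) amounts to replaying a transaction log over a snapshot. The following is a toy model: the in-memory dictionary layout, the operation names and the function names are all hypothetical and do not reflect the real on-disk formats.

```python
import copy


def apply_edit(image, edit):
    """Apply one logged namespace change to the in-memory image."""
    op, path, *args = edit
    if op == "create":
        image[path] = {"replication": args[0], "blocks": []}
    elif op == "set_replication":
        image[path]["replication"] = args[0]
    elif op == "delete":
        image.pop(path, None)
    else:
        raise ValueError(f"unknown edit operation: {op}")


def checkpoint(fs_image, edit_log):
    """Return a new image: the old FsImage with every edit replayed in order."""
    image = copy.deepcopy(fs_image)
    for edit in edit_log:
        apply_edit(image, edit)
    return image


fs_image = {"/a": {"replication": 3, "blocks": []}}
edit_log = [("create", "/b", 3), ("set_replication", "/b", 2), ("delete", "/a")]
new_image = checkpoint(fs_image, edit_log)
```

After a checkpoint the replayed edits are no longer needed, which keeps the log short and bounds recovery time after a crash.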
Datanode

A Datanode stores data in files in its local file system.

The Datanode has no knowledge about the HDFS filesystem

It stores each block of HDFS data in a separate file.

The Datanode does not create all files in the same directory.

It uses heuristics to determine the optimal number of files per directory and creates directories appropriately:

Research issue?

When the filesystem starts up it generates a list of all HDFS blocks and sends this report to the Namenode: the Blockreport.
Protocol
The Communication Protocol

All HDFS communication protocols are layered on top of the TCP/IP protocol

A client establishes a connection to a configurable TCP port on the Namenode machine. It talks ClientProtocol with the Namenode.

The Datanodes talk to the Namenode using the Datanode protocol.

An RPC abstraction wraps both ClientProtocol and the Datanode protocol.

The Namenode is simply a server and never initiates a request; it only responds to RPC requests issued by DataNodes or clients.
Robustness
Objectives

The primary objective of HDFS is to store data reliably in the presence of failures.

Three common failures are: Namenode failure, Datanode failure and network partition.
DataNode failure and heartbeat

A network partition can cause a subset of Datanodes to lose connectivity with the Namenode.

The Namenode detects this condition by the absence of a Heartbeat message.

The Namenode marks Datanodes without a Heartbeat as dead and does not send any IO requests to them.

Any data registered to the failed Datanode is not available to HDFS.

Also, the death of a Datanode may cause the replication factor of some of the blocks to fall below their specified value.
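Heartbeat-based failure detection and its effect on replication can be sketched like this. The timeout value, the data layout and the function names are hypothetical and chosen only to illustrate the mechanism.

```python
def dead_datanodes(last_heartbeat, now, timeout):
    """Datanodes whose most recent Heartbeat is older than timeout seconds."""
    return sorted(dn for dn, seen in last_heartbeat.items()
                  if now - seen > timeout)


def under_replicated(block_locations, dead, replication_factor):
    """Map each affected block to the number of replicas it is missing."""
    missing = {}
    for block, nodes in block_locations.items():
        live = [n for n in nodes if n not in dead]
        if len(live) < replication_factor:
            missing[block] = replication_factor - len(live)
    return missing


last_heartbeat = {"dn1": 100, "dn2": 40, "dn3": 95}
dead = dead_datanodes(last_heartbeat, now=105, timeout=30)
locations = {"blk_1": ["dn1", "dn2", "dn3"], "blk_2": ["dn2", "dn3"]}
needs_copy = under_replicated(locations, dead, replication_factor=2)
```

Here `dn2` has been silent for 65 seconds, so `blk_2` is left with a single live replica and needs one more copy.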
Re-replication

The necessity for re-replication may arise because:

A Datanode may become unavailable,

A replica may become corrupted,

A hard disk on a Datanode may fail, or

The replication factor on the block may be increased.


Cluster Rebalancing

HDFS architecture is compatible with data rebalancing schemes.

A scheme might move data from one Datanode to another if the free space on a Datanode falls below a certain threshold.

In the event of a sudden high demand for a particular file, a scheme might dynamically create additional replicas and rebalance other data in the cluster.

These types of data rebalancing are not yet implemented: research issue.
Data Integrity

Consider a situation: a block of data fetched from a Datanode arrives corrupted.

This corruption may occur because of faults in a storage device, network faults, or buggy software.

An HDFS client creates the checksum of every block of its file and stores it in hidden files in the HDFS namespace.

When a client retrieves the contents of a file, it verifies that the corresponding checksums match.

If they do not match, the client can retrieve the block from a replica.
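The verify-then-fall-back behaviour can be sketched with the CRC-32 function from the Python standard library. Using `zlib.crc32` is an assumption made for illustration, and the function names are hypothetical; the slide does not say which checksum HDFS uses.

```python
import zlib


def checksum(data: bytes) -> int:
    """Checksum of one block (CRC-32 is an illustrative choice)."""
    return zlib.crc32(data)


def read_block(replicas, expected):
    """Return the first replica whose checksum matches the stored one."""
    for data in replicas:
        if checksum(data) == expected:
            return data
    raise IOError("all replicas failed checksum verification")


good = b"hello hdfs"
stored = checksum(good)            # computed by the client at write time
corrupted = b"hellx hdfs"          # one byte damaged in storage or transit
recovered = read_block([corrupted, good], stored)
```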
Metadata Disk Failure

FsImage and EditLog are central data structures of HDFS.

A corruption of these files can cause an HDFS instance to be non-functional.

For this reason, a Namenode can be configured to maintain multiple copies of the FsImage and EditLog.

Multiple copies of the FsImage and EditLog files are updated synchronously.

Metadata is not data-intensive.

The Namenode could be a single point of failure: automatic failover is NOT supported! Another research topic.
Data Organization
Data Blocks

HDFS supports write-once-read-many with reads at streaming speeds.

A typical block size is 64MB (or even 128 MB).

A file is chopped into 64MB chunks and stored.


Staging

A client request to create a file does not reach the Namenode immediately.

The HDFS client caches the data into a temporary file. When the data reaches an HDFS block size, the client contacts the Namenode.

The Namenode inserts the filename into its hierarchy and allocates a data block for it.

The Namenode responds to the client with the identity of the Datanode and the destination of the replicas (Datanodes) for the block.

Then the client flushes it from its local memory.


Staging (contd.)

The client sends a message that the file is closed.

The Namenode proceeds to commit the file creation operation into the persistent store.

If the Namenode dies before the file is closed, the file is lost.

This client-side caching is required to avoid network congestion; it also has precedent in AFS (Andrew File System).
Replication Pipelining

When the client receives the response from the Namenode, it flushes its block in small pieces (4K) to the first replica, which in turn copies it to the next replica, and so on.

Thus data is pipelined from one Datanode to the next.
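The pipelining idea can be simulated in memory. This sketch only models the order in which 4K pieces travel along the chain; the function name and the hop counter are hypothetical, and no networking or acknowledgement handling is shown.

```python
def pipeline_write(block, datanodes, piece_size=4096):
    """Send a block through a chain of Datanodes in small pieces.

    Returns the bytes stored on each Datanode and the number of
    piece transfers (hops) that took place.
    """
    stores = {dn: bytearray() for dn in datanodes}
    hops = 0
    for offset in range(0, len(block), piece_size):
        piece = block[offset:offset + piece_size]
        # The client sends the piece to the first Datanode; each Datanode
        # stores it and forwards it to the next one in the chain.
        for dn in datanodes:
            stores[dn].extend(piece)
            hops += 1
    return stores, hops


block = bytes(range(256)) * 40  # 10240 bytes, i.e. three pieces of up to 4K
stores, hops = pipeline_write(block, ["dn1", "dn2", "dn3"])
```

Because a Datanode forwards each piece as soon as it arrives, later replicas start filling before the first one has received the whole block.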


API (Accessibility)
Application Programming Interface

HDFS provides a Java API for applications to use.

Python access is also used in many applications.

A C language wrapper for the Java API is also available.

An HTTP browser can be used to browse the files of an HDFS instance.
FS Shell, Admin and Browser Interface

HDFS organizes its data in files and directories.

It provides a command line interface called the FS shell that lets the user interact with data in HDFS.

The syntax of the commands is similar to bash and csh.

Example: to create a directory /foodir

/bin/hadoop dfs -mkdir /foodir

There is also a DFSAdmin interface available

A browser interface is also available to view the namespace.
Space Reclamation

When a file is deleted by a client, HDFS renames the file to a file in the /trash directory, where it stays for a configurable amount of time.

A client can request an undelete in this allowed time.

After the specified time the file is deleted and the space is reclaimed.

When the replication factor is reduced, the Namenode selects excess replicas that can be deleted.

Next heartbeat(?) transfers this information to the Datanode, which clears the blocks for use.
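The delete, undelete and reclaim cycle can be modelled with a small class. The class name `Trash`, the retention value and the explicit `now` timestamps are hypothetical; they stand in for the configurable retention time mentioned above.

```python
class Trash:
    """Toy model of delayed deletion with a configurable retention time."""

    def __init__(self, retention):
        self.retention = retention  # seconds a deleted file stays recoverable
        self.deleted_at = {}        # path -> time of deletion

    def delete(self, path, now):
        """Move a file to the trash instead of removing it immediately."""
        self.deleted_at[path] = now

    def undelete(self, path, now):
        """Restore a file if it is still within the retention window."""
        deleted = self.deleted_at.get(path)
        if deleted is None or now - deleted > self.retention:
            return False
        del self.deleted_at[path]
        return True

    def purge(self, now):
        """Permanently remove expired files and return their paths."""
        expired = sorted(p for p, t in self.deleted_at.items()
                         if now - t > self.retention)
        for path in expired:
            del self.deleted_at[path]
        return expired


trash = Trash(retention=60)
trash.delete("/foodir/a", now=0)
trash.delete("/foodir/b", now=0)
```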
Summary

We discussed the features of the Hadoop File System, a peta-scale file system to handle big-data sets.

What we discussed: Architecture, Protocol, API, etc.

Missing element: Implementation

The Hadoop file system (internals)

An implementation of an instance of the HDFS (for use by applications such as web crawlers).