
Hadoop --- HDFS and MapReduce

HDFS - the distributed file system


Hive - SQL-like queries - from Facebook
Pig - query scripting - from Yahoo
Sqoop - data migration
MPI - Message Passing Interface - difficult to implement on Hadoop 1, but possible on Hadoop 2
Hadoop 2 - supports both OLTP and OLAP
clients were not ready to move to Hadoop because of its coherency model
single point of failure in Hadoop 1
namespace:
HDFS - data pipelining - the client API writes data through a pipeline of DataNodes (minimal write sketch below)
the NameNode holds the NAMESPACE plus the block pool
metadata lives in the NN
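A minimal write sketch, assuming a reachable cluster (the NameNode address and path below are hypothetical): create() asks the NN for metadata and target DataNodes, and the returned stream pipelines packets through them.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // hypothetical NameNode address; in a real cluster this comes from core-site.xml
        conf.set("fs.defaultFS", "hdfs://namenode:8020");

        FileSystem fs = FileSystem.get(conf);
        // create() contacts the NameNode for metadata; the returned stream
        // pipelines data packets through the DataNodes the NN picked
        try (FSDataOutputStream out = fs.create(new Path("/tmp/demo.txt"))) {
            out.writeUTF("hello hdfs");
        }
        fs.close();
    }
}
```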
limitations of a single NameNode:
- scalability: one NN limits how far the namespace can grow
- performance: only one NN serves all clients, which reduces throughput
- single point of failure
- poor isolation: no separate volumes for tenants
- no separate namespace per application, e.g. HBase
HDFS federation - multiple NameNodes/namespaces (config sketch below)
each DataNode sends its information to all NameNodes
zero complaints
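A hedged sketch of federation wiring, normally placed in hdfs-site.xml; the nameservice IDs and host names here are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;

public class FederationConfigSketch {
    public static Configuration federatedConf() {
        Configuration conf = new Configuration();
        // two independent namespaces served by two NameNodes (hypothetical names)
        conf.set("dfs.nameservices", "ns1,ns2");
        conf.set("dfs.namenode.rpc-address.ns1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.ns2", "nn2.example.com:8020");
        // DataNodes read this list and send heartbeats/block reports to every NN
        return conf;
    }
}
```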
high availability ----
1. avoids the SPOF
2. a pair of redundant NameNodes
   - one active and one hot standby
   - DataNodes send block reports to both NNs
3. the standby node does the checkpointing
   - so the Secondary NameNode (SNN) disappears
4. supports both manual and automatic failover
5. supports failback (backward failover)
checkpointing is done by the standby NN, but heartbeats are sent to both NNs
the standby just collects this information, so it stays in sync
JournalNode - a daemon that stores the NN edit log
shared NN state flows through a quorum of JournalNodes (QJM)
ZooKeeper - holds NN status metadata and controls failover between active and standby
fencing - when the failed machine comes back up, ZooKeeper won't let it act as active again; it is made the standby
(the JournalNode is its own daemon, separate from ZooKeeper)
ZooKeeper handles failure detection and elects the new active NN
two components - the ZooKeeper quorum, and the ZooKeeper client (ZKFC - ZooKeeper Failover Controller)
split-brain scenario
- occurs when both NNs try to act as the active NN
- this isn't allowed, because both would be making changes to the namespace (HA config sketch below)
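A hedged sketch of the HA wiring described above (host names and the nameservice ID are hypothetical); in a real cluster these keys live in hdfs-site.xml and core-site.xml.

```java
import org.apache.hadoop.conf.Configuration;

public class HaConfigSketch {
    public static Configuration haConf() {
        Configuration conf = new Configuration();
        conf.set("dfs.nameservices", "mycluster");
        // one active + one standby NN (logical names are hypothetical)
        conf.set("dfs.ha.namenodes.mycluster", "nn1,nn2");
        conf.set("dfs.namenode.rpc-address.mycluster.nn1", "nn1.example.com:8020");
        conf.set("dfs.namenode.rpc-address.mycluster.nn2", "nn2.example.com:8020");
        // shared NN state through a quorum of JournalNodes (QJM)
        conf.set("dfs.namenode.shared.edits.dir",
                 "qjournal://jn1.example.com:8485;jn2.example.com:8485;jn3.example.com:8485/mycluster");
        // ZKFC uses this ensemble for failure detection and leader election
        conf.set("ha.zookeeper.quorum",
                 "zk1.example.com:2181,zk2.example.com:2181,zk3.example.com:2181");
        conf.set("dfs.ha.automatic-failover.enabled", "true");
        // fencing: make sure the old active cannot keep acting as active (prevents split brain)
        conf.set("dfs.ha.fencing.methods", "sshfence");
        return conf;
    }
}
```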
NFS gateway
- lets users browse the HDFS file system from their local file system
- download files from HDFS
- supports NFSv3
- cannot write into HDFS arbitrarily (random writes are not supported)
- to mount over NFS, add the gateway settings to hdfs-site.xml (sketch below)
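A hedged sketch of gateway-related settings; the exact key names vary between Hadoop versions (check hdfs-default.xml for your release), and the values are hypothetical.

```java
import org.apache.hadoop.conf.Configuration;

public class NfsGatewayConfigSketch {
    public static Configuration nfsConf() {
        Configuration conf = new Configuration();
        // scratch directory the gateway uses to reorder out-of-order writes
        // (key name is version-dependent; "nfs.dump.dir" in recent releases)
        conf.set("nfs.dump.dir", "/tmp/.hdfs-nfs");
        // which hosts may mount the export, and with what access
        conf.set("nfs.exports.allowed.hosts", "* rw");
        return conf;
    }
}
```

On the client side the export is then mounted as an ordinary NFSv3 share, along the lines of mount -t nfs -o vers=3,proto=tcp,nolock <gateway-host>:/ /mnt/hdfs.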
HDFS snapshots
- read-only, copy-on-write snapshots
- a snapshot can cover the entire namespace
use case:
- data backup - protection against user errors - disaster recovery
HDFS-2802 (the snapshots JIRA)
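A minimal sketch using the public FileSystem API (the NN address, directory, and snapshot name are hypothetical); the directory must first be made snapshottable, which is an admin operation.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

public class SnapshotExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // hypothetical NN address
        FileSystem fs = FileSystem.get(conf);
        Path dir = new Path("/data/important");           // hypothetical directory

        // admin step: mark the directory snapshottable
        // (equivalent to: hdfs dfsadmin -allowSnapshot /data/important)
        ((DistributedFileSystem) fs).allowSnapshot(dir);

        // read-only, copy-on-write snapshot; appears under /data/important/.snapshot/
        Path snap = fs.createSnapshot(dir, "backup-1");
        System.out.println("created " + snap);
    }
}
```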
other features -
- short-circuit local reads: a reader like Impala doesn't need to request data from the DataNode; it reads directly from the local file system (sketch after this list)
- protobuf wire compatibility: lets DataNode versions be upgraded independently - rolling upgrades
- on-the-wire encryption
- client-side mount tables
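A hedged sketch of the two client-side keys that enable short-circuit reads (the socket path is hypothetical; the DataNode must be configured with the same path and have native libhadoop available).

```java
import org.apache.hadoop.conf.Configuration;

public class ShortCircuitReadSketch {
    public static Configuration shortCircuitConf() {
        Configuration conf = new Configuration();
        // let the client bypass the DataNode and read block files directly
        conf.set("dfs.client.read.shortcircuit", "true");
        // UNIX domain socket shared between client and DataNode (hypothetical path)
        conf.set("dfs.domain.socket.path", "/var/lib/hadoop-hdfs/dn_socket");
        return conf;
    }
}
```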
bigdatasimplified.blogspot.in - certification questions
MapReduce - aims to provide data locality
past 1000+ nodes, the JobTracker can no longer achieve data locality
JobTracker (JT):
- manages map/reduce slots
- schedules jobs
- monitors tasks
- maintains counters
- handles speculative execution
resource pressure on the JT
Facebook's MR framework, Corona - runs more than one JT
MR1 limitations ---
- scalability
- availability
- predictable latency
- cluster utilisation
- lack of support for alternate paradigms
- iterative applications
protobuf (wire compatibility, noted above)
the two stacks side by side:
  MR1 stack:  Pig, Hive -> MR1 -> HDFS
  YARN stack: Spark, MR, Storm, MPI, interactive analysis, etc. -> YARN -> HDFS
the JT's role is split into:
1. ResourceManager
   a. Scheduler - resource allocation
   b. ApplicationsManager - launches the ApplicationMaster
2. ApplicationMaster
RM - ResourceManager
ASM - ApplicationsManager
S - Scheduler
NM - NodeManager
AM - ApplicationMaster
client --1--> RM (ASM --2--> S)
1. the client submits the job to the ASM
2. the ASM negotiates resources for the AM from the Scheduler
3. the ASM launches the AM
4. the AM requests resources from the Scheduler
5. the AM launches containers (tasks) - submission sketch below
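A minimal submission sketch against the YarnClient API, covering step 1 of the flow above; the application name, AM command, and resource sizes are placeholders.

```java
import java.util.Collections;
import org.apache.hadoop.yarn.api.records.ApplicationSubmissionContext;
import org.apache.hadoop.yarn.api.records.ContainerLaunchContext;
import org.apache.hadoop.yarn.api.records.Resource;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.client.api.YarnClientApplication;
import org.apache.hadoop.yarn.conf.YarnConfiguration;
import org.apache.hadoop.yarn.util.Records;

public class YarnSubmitSketch {
    public static void main(String[] args) throws Exception {
        YarnClient yarnClient = YarnClient.createYarnClient();
        yarnClient.init(new YarnConfiguration());
        yarnClient.start();

        // step 1: the client talks to the RM (ApplicationsManager side)
        YarnClientApplication app = yarnClient.createApplication();
        ApplicationSubmissionContext ctx = app.getApplicationSubmissionContext();
        ctx.setApplicationName("demo-app"); // placeholder name

        // how to launch the ApplicationMaster (placeholder command)
        ContainerLaunchContext amContainer = Records.newRecord(ContainerLaunchContext.class);
        amContainer.setCommands(Collections.singletonList("/bin/true"));
        ctx.setAMContainerSpec(amContainer);

        // resources the ASM negotiates from the Scheduler for the AM container
        ctx.setResource(Resource.newInstance(512 /* MB */, 1 /* vcores */));

        // steps 2-3 happen inside the RM after this call;
        // the AM itself then drives steps 4-5
        yarnClient.submitApplication(ctx);
        yarnClient.stop();
    }
}
```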
APACHE TEZ
