hive - SQL-like queries - Facebook
pig - query scripting - Yahoo
sqoop - data migration
mpi - Message Passing Interface - difficult to implement on Hadoop 1, but possible on Hadoop 2

hadoop 2 - supports both OLTP-style and OLAP workloads
clients were not ready to move to Hadoop 1 because of its coherency model and the single point of failure in the namespace

hdfs - data pipelining - client API to write data
namenode holds the NAMESPACE + block pool metadata

limitations of a single namenode:
- scalability - only one NN for all clients, so performance drops as the cluster grows
- single point of failure
- poor isolation - no separate volume per tenant, no separate namespace per application (e.g. hbase)

hdfs federation
- multiple namenodes/namespaces
- each datanode sends block reports to all namenodes
- addresses the scalability and isolation complaints above

high availability ---- (haadmin example in the appendix below)
1. avoids the SPOF
2. a pair of redundant namenodes - one active, one hot standby - datanodes send block reports and heartbeats to both NNs
3. the standby node does the checkpointing - so the secondary namenode (SNN) disappears
4. supports both manual and automatic failover
5. supports failing back to the original NN once it recovers

the active NN writes the edits, but heartbeats go to both NNs; the standby just collects this information and performs the checkpointing

journal node - an HDFS daemon; the active NN shares its state by writing edits to a quorum of journal nodes, which the standby reads
zookeeper - failure detection and failover control - elects the new active NN
  two components: the zookeeper quorum, and a zookeeper client on each NN - the ZKFC (zookeeper failover controller)
fencing - when the machine that went down comes back and tries to act as active again, fencing prevents it from touching the namespace
split-brain scenario - occurs when both NNs try to act as the active NN; fencing prevents this, since otherwise both would make changes to the namespace

nfs gateway (mount example in the appendix below)
- browse the HDFS file system from the local file system
- download files from HDFS
- supports NFSv3
- random writes into HDFS are not supported
- to mount over NFS, the gateway settings go into hdfs-site.xml

hdfs snapshot (commands in the appendix below)
- read-only, copy-on-write snapshot
- can snapshot a directory or the entire namespace
- use cases: data backup, protection against user errors, disaster recovery
- JIRA HDFS-2802

other features
- short-circuit local reads (used by impala) - no need to go through the DN, read directly from the local fs
- protobuf - wire compatibility across versions, so DNs can be upgraded one at a time (rolling upgrade)
- on-the-wire encryption
- client-side mount tables
bigdatasimplified.blogspot.in - certification questions

map reduce - aims to provide data locality; beyond 1000+ nodes, a single job tracker can no longer achieve data locality
JT - manages map/reduce slots - schedules jobs - monitors them - maintains counters - speculative execution; all of this puts resource pressure on the JT
facebook's MR framework - Corona - runs more than one JT

MR1 limitations ---
- scalability
- availability
- predictability / latency
- cluster utilisation
- lack of support for alternate paradigms (e.g. iterative applications)

MR1 stack:  pig, hive -> MR1 -> HDFS
YARN stack: spark, MR, storm, MPI, interactive analysis, etc. -> YARN -> HDFS

yarn splits the JT into:
1. resource manager
   a. scheduler - resource allocation
   b. applications manager - launches the application master
2. application master

RM - resource manager
ASM - applications manager
S - scheduler
NM - node manager
AM - application master

client --1--> RM(ASM --2--> S)
1. the client submits the job to the ASM
2. the ASM negotiates resources for the AM from the scheduler
3. the ASM launches the AM
4. the AM requests resources from the scheduler
5. the AM launches the containers (tasks)
(yarn CLI example in the appendix below)

APACHE TEZ

appendix: cli examples for the sections above ----
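for the high availability section: a minimal sketch of checking and driving a failover with the stock hdfs haadmin CLI; the service IDs nn1 and nn2 are placeholders that would be defined in hdfs-site.xml.

  # ask each namenode for its current HA state (active / standby)
  $ hdfs haadmin -getServiceState nn1
  $ hdfs haadmin -getServiceState nn2

  # manual failover: demote nn1 and make nn2 the active NN
  # (with automatic failover enabled, the ZKFC handles this on its own)
  $ hdfs haadmin -failover nn1 nn2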
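for the nfs gateway section: a sketch of mounting HDFS on a Linux client, assuming the gateway daemon is already running on a host here called nfs-gw (hostname and mount point are placeholders); the gateway speaks NFSv3 over TCP only, and gateway-side properties such as nfs.dump.dir go into hdfs-site.xml as the notes say.

  # create a local mount point and mount the HDFS root exported by the gateway
  $ mkdir -p /mnt/hdfs
  $ mount -t nfs -o vers=3,proto=tcp,nolock nfs-gw:/ /mnt/hdfs

  # HDFS can now be browsed like a local file system
  $ ls /mnt/hdfs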
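for the hdfs snapshot section: the basic snapshot workflow with the stock CLI; /data and the snapshot name are placeholders. a directory must first be made snapshottable by an admin.

  # admin: allow snapshots on the directory
  $ hdfs dfsadmin -allowSnapshot /data

  # user: take a read-only, copy-on-write snapshot
  $ hdfs dfs -createSnapshot /data before-upgrade

  # snapshots appear under the hidden .snapshot directory
  $ hdfs dfs -ls /data/.snapshot/before-upgrade

  # diff a snapshot against the current state ("."), useful for backup / DR
  $ hdfs snapshotDiff /data before-upgrade .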
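for the yarn job flow: what the client side of steps 1-5 looks like with the stock CLI, assuming a jar named app.jar with a main class MyJob (both placeholders); the negotiation in steps 2-5 happens inside the cluster and is only observable here.

  # step 1: submit the job; the client hands it to the RM's applications manager
  $ yarn jar app.jar MyJob /input /output

  # watch the RM-side view of the application (AM launch, container allocation)
  $ yarn application -list
  $ yarn application -status application_1400000000000_0001

  # pull the aggregated container logs once the AM finishes
  # (requires log aggregation to be enabled; the application ID is a placeholder)
  $ yarn logs -applicationId application_1400000000000_0001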