Replication factor:

The replication factor is set with dfs.replication in hdfs-site.xml; dfs.replication=3 means each block is stored on three DataNodes.
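
As a rough illustration of what a replication factor of 3 means for disk usage, the sketch below counts block replicas and raw bytes. The 128 MB block size and the helper names block_replicas and raw_storage_bytes are assumptions made for this example, not values taken from these notes.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size for this example


def block_replicas(file_size_bytes, block_size_bytes=BLOCK_SIZE, replication=3):
    """Return (number of blocks, total block replicas stored in the cluster)."""
    # Integer ceiling division: a partly filled last block still counts as a block.
    blocks = (file_size_bytes + block_size_bytes - 1) // block_size_bytes
    return blocks, blocks * replication


def raw_storage_bytes(file_size_bytes, replication=3):
    """Raw disk usage: the last block is not padded, so it is size * replication."""
    return file_size_bytes * replication


blocks, replicas = block_replicas(300 * 1024 * 1024)
print(blocks, replicas)
```

A 300 MB file under these assumptions splits into 3 blocks, stored as 9 block replicas in total.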
HDFS federation and high availability:
HDFS federation was introduced to address the scalability limit imposed by the NameNode's memory.
A DataNode can be associated with multiple NameNodes. NameNodes do not communicate with each other.
Fencing prevents the previously active NameNode from doing any damage after a failover.
A lot of small files occupy a lot of NameNode main memory.
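
To make the small-files point concrete, here is a back-of-the-envelope sketch. It assumes the commonly quoted rule of thumb of roughly 150 bytes of NameNode heap per namespace object (file or block); that figure and the function name namenode_heap_estimate are assumptions for illustration only and do not come from these notes.

```python
BYTES_PER_OBJECT = 150  # assumed rule of thumb, not a measured value


def namenode_heap_estimate(num_files, blocks_per_file=1):
    """Estimate NameNode heap: one object per file plus one object per block."""
    objects = num_files * (1 + blocks_per_file)
    return objects * BYTES_PER_OBJECT


# Ten million single-block small files versus one large file of 8 blocks.
many_small = namenode_heap_estimate(10_000_000)
one_large = namenode_heap_estimate(1, blocks_per_file=8)
print(many_small, one_large)
```

Under this assumption the memory cost grows with the number of files and blocks, not with the amount of data stored, which is why many small files are expensive.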
Map task output is written to the local disk.
The JobTracker cleans up the map output only after successful completion of the reducer job.
Map output is written to HDFS only when there are zero reducers.
Map output (steps in the order they happen on the map side):
Partitioning
Sort
Merge
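
The three map-output steps can be sketched in plain Python as below. This is a simplified simulation, not Hadoop code: it assumes records are first assigned a partition, each in-memory batch ("spill") is sorted by partition and key, and the sorted spills are then merged. The function names and the use of a CRC32 hash as the partitioner are illustrative assumptions.

```python
import heapq
import zlib


def partition(key, num_reducers):
    # Stand-in for a hash partitioner: stable hash of the key modulo reducer count.
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def sort_key(record):
    # Records are ordered by partition first, then by key.
    return (record[0], record[1])


def spill(records, num_reducers):
    """Partition one batch of (key, value) records and sort it by (partition, key)."""
    tagged = [(partition(k, num_reducers), k, v) for k, v in records]
    return sorted(tagged, key=sort_key)


def merge_spills(spills):
    """K-way merge of already sorted spills into one sorted run."""
    return list(heapq.merge(*spills, key=sort_key))


spills = [spill([("b", 1), ("a", 1)], 2), spill([("a", 1), ("c", 1)], 2)]
merged = merge_spills(spills)
for part, key, value in merged:
    print(part, key, value)
```

Because every spill is already sorted, the merge only has to interleave them, and all records for one partition end up contiguous for the reducer that owns it.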
Understand MapReduce Mechanism - I:
Map: input is key-value pairs and output is key-value pairs.
Key: byte offset of the line in the input; the programmer has control.
The default mapper is also called the identity mapper: it copies key and value pairs with no processing.
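
A minimal sketch of what an identity mapper does, written as a plain Python generator and not as actual Hadoop code. The name identity_map and the sample records are made up for illustration; the keys are the byte offsets of each line.

```python
def identity_map(key, value):
    # Emit the input pair unchanged: no filtering, no transformation.
    yield key, value


# (byte offset, line text); "first line\n" is 11 bytes, so the next line starts at 11.
lines = [(0, "first line"), (11, "second line")]
output = [pair for k, v in lines for pair in identity_map(k, v)]
print(output)
```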
Understand MapReduce Mechanism - II:
The role of Context is to collect the output key and value pairs.
1st - Declare the type parameters (input key, input value, output key, output value)
2nd - Override the map function
3rd - Write the logic
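
The three steps above can be sketched as follows. This is a plain Python imitation of the pattern, not the Hadoop API: the Context, Mapper and WordCountMapper classes are hypothetical stand-ins written for this example.

```python
class Context:
    """Hypothetical stand-in for the framework context: collects emitted pairs."""

    def __init__(self):
        self.pairs = []

    def write(self, key, value):
        self.pairs.append((key, value))


class Mapper:
    """Hypothetical base class whose default map just passes the pair through."""

    def map(self, key, value, context):
        context.write(key, value)


class WordCountMapper(Mapper):
    # Step 1: in a typed language the type parameters are declared on the class;
    # here they are only documented: (int offset, str line) -> (str word, int count).

    # Step 2: override the map function.
    def map(self, key, value, context):
        # Step 3: write the logic, emitting (word, 1) for every word in the line.
        for word in value.split():
            context.write(word, 1)


ctx = Context()
WordCountMapper().map(0, "a b a", ctx)
print(ctx.pairs)
```

The mapper never returns its results directly; it hands every output pair to the context, which is what lets the framework take over partitioning and sorting afterwards.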
The driver is responsible for initializing the job with its configuration details, specifying the mapper and the reducer classes for the job, informing the Hadoop platform to execute the code on the specified input file(s), and controlling the location where the output files are placed.
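
To show the driver's responsibilities in miniature, here is a toy in-process driver in Python. It is a sketch only: run_job, wc_map, wc_reduce and the dictionary-based configuration are invented for this example and do not correspond to any Hadoop class or method.

```python
from collections import defaultdict


def run_job(config):
    """Toy driver: wires mapper and reducer together and controls input and output."""
    mapper, reducer = config["mapper"], config["reducer"]
    grouped = defaultdict(list)
    offset = 0
    # Map phase: feed every (byte offset, line) record to the mapper.
    for line in config["input_lines"]:
        for k, v in mapper(offset, line):
            grouped[k].append(v)
        offset += len(line.encode("utf-8")) + 1  # +1 for the newline
    # Reduce phase: keys are handed to the reducer in sorted order.
    output = config["output"]
    for k in sorted(grouped):
        output.extend(reducer(k, grouped[k]))
    return output


def wc_map(offset, line):
    for word in line.split():
        yield word, 1


def wc_reduce(word, counts):
    yield word, sum(counts)


result = run_job({
    "mapper": wc_map,            # which mapper to use
    "reducer": wc_reduce,        # which reducer to use
    "input_lines": ["a b a", "b c"],  # what to run on
    "output": [],                # where results are placed
})
print(result)
```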
Combiner:
Extends the Reducer class.
Applied only when the nature of the problem is commutative and associative.
Combiners can run zero, one, or multiple times on map output, so the result must not depend on how often they run.
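
The sketch below illustrates why the operation must be commutative and associative. It is a plain Python simulation with made-up names (with_combiner, sum_reduce, mean_reduce): the same function is applied as a combiner a variable number of times on each map output and then once as the reducer.

```python
def sum_reduce(values):
    return sum(values)


def mean_reduce(values):
    return sum(values) / len(values)


def with_combiner(map_outputs, fn, passes):
    """Apply fn as a combiner `passes` times per map output, then once as the reducer."""
    partials = []
    for values in map_outputs:
        for _ in range(passes):
            values = [fn(values)]  # each combiner pass collapses the list to one value
        partials.extend(values)
    return fn(partials)


map_outputs = [[1, 2, 3], [10]]

# Sum gives the same answer no matter how often the combiner runs.
sums = [with_combiner(map_outputs, sum_reduce, p) for p in (0, 1, 3)]

# Mean does not: averaging per-map averages changes the result.
mean_without = with_combiner(map_outputs, mean_reduce, 0)
mean_with = with_combiner(map_outputs, mean_reduce, 1)
print(sums, mean_without, mean_with)
```

Summation is safe as a combiner, while a plain average is not; an average would have to be carried as a (sum, count) pair to make it combinable.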