
HADOOP

MULTI NODE SETUP

Pre-requisites

To proceed with a multi-node setup, we need a
successful Hadoop single-node installation ready.
The scenario: add an extra DataNode / slave
machine to the single-node cluster, so it
becomes a multi-node cluster with 2 machines.
Likewise, we can add any number of extra
slave machines to scale out the cluster.

DataNode Pre-requisites

Same Linux kernel

Same Java version

Same Hadoop version

Installed at the same path

i.e. if the master has Hadoop installed at
/data/hadoop, the slaves should also have
Hadoop installed at /data/hadoop.

The XML configurations need not be done
manually for every slave. Since it is the same
set of configuration files, we can transfer them
from the master to all the slaves.

Configuration

In the first node, where we already set up the
single node, stop all the daemons if the cluster
is running.
$ cd /data/hadoop
$ bin/stop-all.sh

Now, in all the configuration files (core-site.xml,
mapred-site.xml, masters, slaves), change
localhost to the machine's IP address or
hostname.
To get the hostname, use the hostname -f command.
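Instead of editing each file by hand, the localhost replacement can be scripted. This is an illustrative sketch, not part of the Hadoop distribution; it assumes GNU- or BSD-compatible `sed -i.bak`, and the function name is our own.

```shell
# Sketch: replace every occurrence of localhost with the master's hostname
# in the given config files. `sed -i.bak` keeps the originals with a .bak
# suffix as a backup.
replace_localhost() {
  master="$1"
  shift
  sed -i.bak "s/localhost/$master/g" "$@"
}

# Usage, from /data/hadoop/conf on the master:
#   replace_localhost "$(hostname -f)" core-site.xml mapred-site.xml masters slaves
```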

Configuration(2)
In core-site.xml, for the NameNode address,
change localhost to the IP address or hostname of the machine.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hdfstmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://masternode-hostname:54310</value>
  </property>
</configuration>

Configuration(3)
Also in mapred-site.xml, for the JobTracker
address, change localhost to the IP address or
hostname of the machine.
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-hostname:54311</value>
  </property>
</configuration>

Configuration(4)
In the slaves file, change localhost to the
machines where we want to run the slaves,
providing the hostname of each slave on its own line.
The master needs to know all the slaves, so it can
start the required daemons in them remotely.
hostname-of-slave-1
hostname-of-slave-2
hostname-of-slave-3

On each slave, we need to download and
extract the Hadoop setup at the same location
(refer to the pre-requisites section).

Share Configurations
To each slave we transfer the same set of configuration
files (the XML configuration is enough) through secure copy
(scp). We give this to every slave, since each slave needs to
know the master, so it can send block reports and
heartbeats.

$ cd /data/hadoop/conf
$ scp *.xml slave-ip-address:/data/hadoop/conf
Run the above command on the master machine, once for each
slave. (Note: ensure that you are in the same path as those
config XMLs.)
You can log in to each slave and verify that the files were
successfully transferred.
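The per-slave scp can be wrapped in a small loop that reads the slaves file, so new machines are picked up automatically. This is a sketch, not a Hadoop tool: it assumes passwordless SSH to each slave and the same /data/hadoop path from the pre-requisites, and the DRY_RUN flag is our own addition for printing the commands instead of running them.

```shell
# Sketch: copy the XML configs to every slave listed in conf/slaves.
# Assumes passwordless SSH and the same /data/hadoop path on all machines.
# Set DRY_RUN=1 to print the scp commands instead of executing them.
HADOOP_CONF=${HADOOP_CONF:-/data/hadoop/conf}

push_configs() {
  while read -r slave; do
    [ -z "$slave" ] && continue                 # skip blank lines
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp $HADOOP_CONF/*.xml $slave:$HADOOP_CONF/"
    else
      scp "$HADOOP_CONF"/*.xml "$slave:$HADOOP_CONF/"
    fi
  done < "$HADOOP_CONF/slaves"
}

# Usage on the master:  push_configs
```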

Start Cluster
In the master node, we need to start the cluster.
$ cd /data/hadoop
$ bin/start-all.sh

To verify that the Hadoop components have
started and are running, use the jps command,
which displays all the Java processes running
on the master machine.
$ jps
NameNode
DataNode
JobTracker
TaskTracker
SecondaryNameNode
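As a quick sanity check, the jps output can be compared against the list of expected daemons. The helper below is an illustrative sketch, not a Hadoop tool; the daemon names match what jps prints for a Hadoop 1.x master.

```shell
# Sketch: verify that all expected master daemons appear in jps output.
check_daemons() {
  procs="$1"
  for d in NameNode DataNode JobTracker TaskTracker SecondaryNameNode; do
    if ! echo "$procs" | grep -qw "$d"; then
      echo "MISSING: $d"
      return 1
    fi
  done
  echo "all daemons running"
}

# Usage on the master:  check_daemons "$(jps)"
```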

Start Cluster(2)
In each slave node, we can see that the
DataNode and the TaskTracker daemons are
running.
$ jps
DataNode
TaskTracker

Check the JobTracker web UI and verify that the
number of nodes is 3.
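Besides the web UI, the node count can be read from `bin/hadoop dfsadmin -report` on the master. The small parser below is a sketch; the "Datanodes available: N (...)" line format is assumed from Hadoop 1.x output and may differ across versions.

```shell
# Sketch: extract the live DataNode count from `hadoop dfsadmin -report`.
# Assumes the Hadoop 1.x "Datanodes available: N (...)" report line.
live_nodes() {
  sed -n 's/.*Datanodes available: \([0-9][0-9]*\).*/\1/p'
}

# Usage on the master:
#   bin/hadoop dfsadmin -report | live_nodes
```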

Enough, huh?
Now, we need to see how to scale out the cluster.

SCALING OUT
EXISTING CLUSTER

[Diagram: MASTER connected to slaves S1, S2, S3, with a new slave S4 being added]

Adding New Slave

On the new slave, ensure the pre-requisites are satisfied as
mentioned before.
On the master machine, add the new slave machine's IP address
or hostname to the slaves file.

hostname-of-slave-1
hostname-of-slave-2
hostname-of-slave-3
hostname-of-new-slave

Transfer the XML configurations of the master to the new slave:

$ cd /data/hadoop/conf
$ scp *.xml new-slave-ip-address:/data/hadoop/conf

Start Daemons

On the new slave, we manually start the slave
daemons, DataNode and TaskTracker.
$ cd /data/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker

Verify using jps that the daemons are running.

$ jps
DataNode
TaskTracker

Scaled out

In the master's JobTracker UI (master-ip:50030),
we can see that the count has increased to 4.

Now, group work!!

Try to build a 5-node cluster with your fellow training mates!!
Then, scale it out to an 8-node cluster, adding one node at a time.
