
HADOOP

MULTI NODE SETUP

Pre-requisites

To proceed with a multi-node setup, we need a
successful Hadoop single-node installation ready.
The scenario: add an extra DataNode / slave
machine to the single-node cluster, so it
becomes a multi-node cluster with 2 machines.
Likewise, we can add any number of extra
slave machines to scale out the cluster.

DataNode Pre-requisites

Same Linux kernel

Same Java version

Same Hadoop version

Installed at the same path

i.e. if the master has Hadoop installed at
/data/hadoop, the slaves should also have
Hadoop installed at /data/hadoop.

The XML configurations need not be done
manually for every slave. Since it is the same
set of configuration files, we can transfer them
from the master to all the slaves.

Configuration

In the first node, where we already set up the
single node, stop all the daemons if the cluster
is running.
$ cd /data/hadoop
$ bin/stop-all.sh

Now, in all the configuration files (core-site.xml,
mapred-site.xml, masters, slaves), change
localhost to the machine's IP address or
hostname.
To get the hostname, use the hostname -f command.
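Instead of editing each file by hand, the localhost replacement can be scripted. This is an illustrative sketch, not part of the Hadoop distribution; it assumes GNU- or BSD-compatible `sed -i.bak`, and the function name is our own.

```shell
# Sketch: replace every occurrence of localhost with the master's hostname
# in the given config files. `sed -i.bak` keeps the originals with a .bak
# suffix as a backup.
replace_localhost() {
  master="$1"
  shift
  sed -i.bak "s/localhost/$master/g" "$@"
}

# Usage, from /data/hadoop/conf on the master:
#   replace_localhost "$(hostname -f)" core-site.xml mapred-site.xml masters slaves
```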

Configuration(2)
In core-site.xml, for the NameNode address,
change localhost to the IP address or hostname of the machine.
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hdfstmp</value>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://masternode-hostname:54310</value>
  </property>
</configuration>

Configuration(3)
Also in mapred-site.xml, for the JobTracker
address, change localhost to the IP address or
hostname of the machine.
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>jobtracker-hostname:54311</value>
  </property>
</configuration>

Configuration(4)
In the slaves file, change localhost to the
machines where we want to run the slaves,
providing the hostname of each slave on its own line.
The master needs to know all the slaves, so it can
start the required daemons in them remotely.
hostname-of-slave-1
hostname-of-slave-2
hostname-of-slave-3

On each slave, we need to download and
extract the Hadoop setup at the same location
(refer to the pre-requisites section).

Share Configurations
To each slave we transfer the same set of configuration
files (the XML configuration is enough) through secure copy
(scp). We give this to every slave, since each slave needs to
know the master, so it can send block reports and
heartbeats.

$ cd /data/hadoop/conf
$ scp *.xml slave-ip-address:/data/hadoop/conf
Run the above command on the master machine, once for each
slave. (Note: ensure that you are in the same path as those
config XMLs.)
You can log in to each slave and verify that the files were
successfully transferred.
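The per-slave scp can be wrapped in a small loop that reads the slaves file, so new machines are picked up automatically. This is a sketch, not a Hadoop tool: it assumes passwordless SSH to each slave and the same /data/hadoop path from the pre-requisites, and the DRY_RUN flag is our own addition for printing the commands instead of running them.

```shell
# Sketch: copy the XML configs to every slave listed in conf/slaves.
# Assumes passwordless SSH and the same /data/hadoop path on all machines.
# Set DRY_RUN=1 to print the scp commands instead of executing them.
HADOOP_CONF=${HADOOP_CONF:-/data/hadoop/conf}

push_configs() {
  while read -r slave; do
    [ -z "$slave" ] && continue                 # skip blank lines
    if [ "${DRY_RUN:-0}" = "1" ]; then
      echo "scp $HADOOP_CONF/*.xml $slave:$HADOOP_CONF/"
    else
      scp "$HADOOP_CONF"/*.xml "$slave:$HADOOP_CONF/"
    fi
  done < "$HADOOP_CONF/slaves"
}

# Usage on the master:  push_configs
```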

Start Cluster
In the master node, we need to start the cluster.
$ cd /data/hadoop
$ bin/start-all.sh

To verify that the Hadoop components have
started and are running, use the jps command,
which displays all the Java processes running
on the master machine.
$ jps
NameNode
DataNode
JobTracker
TaskTracker
SecondaryNameNode
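As a quick sanity check, the jps output can be compared against the list of expected daemons. The helper below is an illustrative sketch, not a Hadoop tool; the daemon names match what jps prints for a Hadoop 1.x master.

```shell
# Sketch: verify that all expected master daemons appear in jps output.
check_daemons() {
  procs="$1"
  for d in NameNode DataNode JobTracker TaskTracker SecondaryNameNode; do
    if ! echo "$procs" | grep -qw "$d"; then
      echo "MISSING: $d"
      return 1
    fi
  done
  echo "all daemons running"
}

# Usage on the master:  check_daemons "$(jps)"
```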

Start Cluster(2)
In each slave node, we can see that the
DataNode and the TaskTracker daemons are
running.
$ jps
DataNode
TaskTracker

Check the JobTracker web UI and verify that the
number of nodes is 3.
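Besides the web UI, the node count can be read from `bin/hadoop dfsadmin -report` on the master. The small parser below is a sketch; the "Datanodes available: N (...)" line format is assumed from Hadoop 1.x output and may differ across versions.

```shell
# Sketch: extract the live DataNode count from `hadoop dfsadmin -report`.
# Assumes the Hadoop 1.x "Datanodes available: N (...)" report line.
live_nodes() {
  sed -n 's/.*Datanodes available: \([0-9][0-9]*\).*/\1/p'
}

# Usage on the master:
#   bin/hadoop dfsadmin -report | live_nodes
```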

Enough, huh?
Now, we need to see how to scale out the cluster.

SCALING OUT
EXISTING CLUSTER

[Diagram: MASTER connected to slaves S1, S2, S3, with a new slave S4 being added]

Adding New Slave

On the new slave, ensure the pre-requisites are satisfied as
mentioned before.
On the master machine, add the new slave machine's IP address
or hostname to the slaves file.

hostname-of-slave-1
hostname-of-slave-2
hostname-of-slave-3
hostname-of-new-slave

Transfer the XML configurations of the master to the new slave:

$ cd /data/hadoop/conf
$ scp *.xml new-slave-ip-address:/data/hadoop/conf

Start Daemons

On the new slave, we manually start the slave
daemons, DataNode and TaskTracker.
$ cd /data/hadoop
$ bin/hadoop-daemon.sh start datanode
$ bin/hadoop-daemon.sh start tasktracker

Verify using jps that the daemons are running.

$ jps
DataNode
TaskTracker

Scaled out

In the master's JobTracker UI (master-ip:50030),
we can see that the count has increased to 4.

Now, group work!!

Try to build a 5-node cluster with your fellow training mates!!
Then, scale it out to an 8-node cluster, adding one node at a time.
