Note : Create a dedicated user for the Hadoop installation and configuration on every Linux machine (the master as well as the slaves) and give it administrative (root) rights. Suppose we have created a user called cluster; log in to the cluster account and start the configuration.
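The note above can be sketched as commands (a minimal sketch, run as root on the master and every slave; the user name cluster is just this guide's example):

```shell
# Create the dedicated "cluster" user with a home directory and bash shell
useradd -m -s /bin/bash cluster
# On Ubuntu, membership in the "sudo" group grants administrative rights
usermod -aG sudo cluster
# Finally, set a password for the account (interactive, so shown as a comment):
# passwd cluster
```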
2. Installing SSH
Apache Hadoop's startup scripts (start-all.sh & stop-all.sh) use SSH to connect to the slave machines and start Hadoop on them. So, to install SSH, follow the steps below. Step-1: Install SSH from the Ubuntu repository. user1@ubuntu-server:~$ sudo apt-get install ssh
The IPv6 part of /etc/hosts can stay as it is:

::1       ip6-localhost ip6-loopback
fe00::0   ip6-localnet
ff00::0   ip6-mcastprefix
ff02::1   ip6-allnodes
ff02::2   ip6-allrouters

Just need to make some changes to the IPv4 entries, so that every machine in the cluster can be reached by hostname:

127.0.0.1      localhost
#127.0.1.1     localhost
192.168.2.118  shashwat.blr.pointcross.com shashwat
192.168.2.117  chethan
192.168.2.116  tariq
192.168.2.56   alok
192.168.2.69   sandish
192.168.2.121  moses
then try
ssh localhost
Why ssh localhost? To check whether steps 1 & 2 were done correctly. ssh localhost should connect to localhost without asking for a password, because ssh uses the public key for authentication and we have already added the public key to the authorized_keys file. Copy the content of id_rsa.pub into the authorized_keys file on the slave machines (it lives inside the .ssh folder); this is required so that the master can communicate with the slaves without using any password.
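The key setup on the master can be sketched like this (a minimal sketch; the slave hostname tariq is one of this guide's example machines, substitute your own):

```shell
# Create ~/.ssh and a passwordless RSA key pair if none exists yet
mkdir -p "$HOME/.ssh" && chmod 700 "$HOME/.ssh"
[ -f "$HOME/.ssh/id_rsa" ] || ssh-keygen -q -t rsa -P "" -f "$HOME/.ssh/id_rsa"
# Authorize our own public key so `ssh localhost` asks for no password
cat "$HOME/.ssh/id_rsa.pub" >> "$HOME/.ssh/authorized_keys"
chmod 600 "$HOME/.ssh/authorized_keys"
# Push the same public key to each slave (run once per slave, e.g.):
# ssh-copy-id cluster@tariq
```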
wget http://www.apache.org/dist/hadoop/common/hadoop-0.20.2/hadoop-0.20.2.tar.gz
3. sudo tar -xzf hadoop-0.20.2.tar.gz
4. After extracting, just give these commands:
1. chown -R cluster hadoop-0.20.2/
2. chmod -R 755 hadoop-0.20.2
3. Set JAVA_HOME in /hadoop/conf/hadoop-env.sh
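Setting JAVA_HOME means uncommenting and editing one line in conf/hadoop-env.sh; the JDK path below is only an example, point it at wherever your JDK is actually installed:

```shell
# conf/hadoop-env.sh -- the Java implementation to use (example path)
export JAVA_HOME=/usr/lib/jvm/java-6-sun
```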
2. Edit the core-site.xml file (/hadoop/conf/core-site.xml) and put the following lines inside its configuration tag :
<property>
  <name>fs.default.name</name>
  <value>hdfs://shashwat:9000</value>
  <description>The name of the default file system. A URI whose scheme and authority determine the FileSystem implementation. The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class. The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
</property>
4. Edit the mapred-site.xml file (/hadoop/conf/mapred-site.xml) and put the following lines inside its configuration tag :
<property>
  <name>mapred.job.tracker</name>
  <value>shashwat:54311</value>
  <description>The host and port that the MapReduce job tracker runs at. If "local", then jobs are run in-process as a single map and reduce task.</description>
</property>
5. Edit the hdfs-site.xml file (/hadoop/conf/hdfs-site.xml) and put the following lines inside its configuration tag :
<property>
  <name>dfs.replication</name>
  <value>2</value>
  <description>Default block replication (set according to the number of nodes).</description>
</property>
Then copy the same hadoop folder to the slaves, keeping the same path as on the master: suppose the hadoop folder is /home/cluster/hadoop, then it should exist at /home/cluster/hadoop on every slave too. You can copy it from the master to a slave with scp as follows (repeat for each slave hostname):
scp -r /home/cluster/hadoop chethan:/home/cluster/
ssh to all slaves from the master to verify that the passwordless login works, e.g. as shown below
ssh alok
ssh tariq
ssh chethan
ssh moses
Configuration files to keep in sync between the master and the slaves:
hadoop-env.sh
hdfs-site.xml
mapred-site.xml
masters
slaves
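The masters and slaves files in hadoop/conf simply list hostnames, one per line: masters names the host that runs the secondary namenode, and slaves names the machines that run the datanode/tasktracker daemons. With the example hostnames used earlier in this guide (adjust them to your own machines) they could look like:

```
# conf/masters
shashwat

# conf/slaves
chethan
tariq
alok
sandish
moses
```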
2. How to check whether hadoop is running or not? Use the jps command (when everything is up, it lists the running Hadoop daemons, e.g. NameNode, SecondaryNameNode and JobTracker on the master, DataNode and TaskTracker on the slaves), or go to http://localhost:50070 to get more information on HDFS and to http://localhost:50030 to get more information on MapReduce.