
Big Data Assignment-1

Hadoop Configuration
Nithin Mohan
AM.EN.U4CSE15143

Environment

Ubuntu 18.10
Oracle Java 8 (JDK 8)
Hadoop 2.9.2 (any stable 2.x release)

Step 1 – Install Oracle Java 8 on Ubuntu


A)Installation
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

B)Verify Java Installation

sudo apt-get install oracle-java8-set-default

java -version

C)Setup JAVA_HOME and JRE_HOME Variables

After installing Java on a Linux system, you must set the JAVA_HOME and JRE_HOME
environment variables, which many Java applications use to locate Java libraries at
runtime.

sudo tee -a /etc/environment > /dev/null <<EOL
JAVA_HOME=/usr/lib/jvm/java-8-oracle
JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
EOL

OR

export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=/usr/lib/jvm/java-8-oracle/jre
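A quick sanity check of the variables just set can be done from the shell. This is a minimal sketch, assuming Java was installed by the PPA into /usr/lib/jvm/java-8-oracle as above:

```shell
# Export and print the two variables to confirm they point at the
# expected Oracle Java 8 install location.
export JAVA_HOME=/usr/lib/jvm/java-8-oracle
export JRE_HOME=$JAVA_HOME/jre
echo "JAVA_HOME=$JAVA_HOME"
echo "JRE_HOME=$JRE_HOME"
```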
Step 2 - Create Hadoop User
A)Creating a normal (non-root) account for Hadoop

sudo adduser hduser

sudo passwd hduser

B)Set up key-based ssh to its own account

sudo apt-get install ssh (optional, skip if already installed)

su - hduser

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa


cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 0600 ~/.ssh/authorized_keys

C)Verify key based login

ssh localhost
exit

Step 3 - Download Hadoop 2.9.2 Archive


cd ~

wget http://www-eu.apache.org/dist/hadoop/common/hadoop-2.9.2/hadoop-2.9.2.tar.gz
tar xzf hadoop-2.9.2.tar.gz
mv hadoop-2.9.2 hadoop
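Optionally, verify the integrity of the downloaded archive before extracting. Apache publishes a SHA-512 digest for each release on its download page; the sketch below only prints the local digest, which must be compared manually against the published value (not reproduced here):

```shell
# Print the local SHA-512 digest of the archive; compare it with the
# digest published on the Apache Hadoop 2.9.2 release page.
sha512sum hadoop-2.9.2.tar.gz
```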

Step 4 - Setup Hadoop Pseudo-Distributed Mode(Single Node)


A)Setup Hadoop Environment Variables

First, we need to set the environment variables used by Hadoop. Edit the ~/.bashrc file
and append the following values at the end of the file.
nano ~/.bashrc

OR

sudo gedit ~/.bashrc


export HADOOP_HOME=/home/hduser/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin

Now apply the changes in the current running environment

source ~/.bashrc

Check whether Hadoop was installed properly:

hadoop version

Now edit the $HADOOP_HOME/etc/hadoop/hadoop-env.sh file and set the JAVA_HOME
environment variable. Change the Java path to match the install location on your system.

cd $HADOOP_HOME/etc/hadoop/
nano hadoop-env.sh
# In the file, add the line
export JAVA_HOME=/usr/lib/jvm/java-8-oracle

B)Setup Hadoop Configuration Files


Next, configure the basic single-node cluster settings as required by your Hadoop
infrastructure.
cd $HADOOP_HOME/etc/hadoop

nano core-site.xml

<configuration>
<property>
<name>fs.default.name</name>
<value>hdfs://localhost:9000</value>
</property>
</configuration>
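Note that fs.default.name is deprecated in Hadoop 2.x in favor of fs.defaultFS; both names are still accepted, but the modern equivalent of the property above would be:

```xml
<property>
<name>fs.defaultFS</name>
<value>hdfs://localhost:9000</value>
</property>
```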
nano hdfs-site.xml

<configuration>
<property>
<name>dfs.replication</name>
<value>1</value>
</property>

<property>
<name>dfs.name.dir</name>
<value>file:///home/hduser/hadoop/hdfs/namenode</value>
</property>

<property>
<name>dfs.data.dir</name>
<value>file:///home/hduser/hadoop/hdfs/datanode</value>
</property>
</configuration>
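The dfs.name.dir and dfs.data.dir paths configured above must exist and be writable by hduser before the NameNode is formatted, so it is safest to create them up front:

```shell
# Create the NameNode and DataNode storage directories referenced
# in hdfs-site.xml (run as hduser so the ownership is correct).
mkdir -p /home/hduser/hadoop/hdfs/namenode
mkdir -p /home/hduser/hadoop/hdfs/datanode
```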

nano mapred-site.xml

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>

nano yarn-site.xml

<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>

C)Format Namenode

Now format the NameNode using the following command, and check the output to confirm
that the storage directory was formatted successfully.

hdfs namenode -format

Step 5 - Start Hadoop Cluster
Let’s start your Hadoop cluster using the scripts provided by Hadoop.
start-dfs.sh

start-yarn.sh

jps
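If everything started cleanly, jps should list five Hadoop daemons in addition to Jps itself. A quick grep makes the check scriptable (a sketch, assuming jps from the installed JDK is on the PATH):

```shell
# Count the expected daemons: NameNode, DataNode and
# SecondaryNameNode from start-dfs.sh, plus ResourceManager and
# NodeManager from start-yarn.sh. Should print 5 when all are up.
jps | grep -cE 'NameNode|DataNode|SecondaryNameNode|ResourceManager|NodeManager'
```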

Step 6 - Access Hadoop Services in Browser


The Hadoop NameNode web UI listens on port 50070 by default. Open port 50070 on your
server in your favorite web browser:
http://localhost:50070/
Now access port 8088 for information about the cluster and all running applications:
http://localhost:8088/

Running a MapReduce Job on a Single-Node Cluster

cd $HADOOP_HOME
hdfs dfs -mkdir -p input
hdfs dfs -put input.txt input
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.9.2.jar wordcount input output
hdfs dfs -ls output
hdfs dfs -cat output/part-r-00000
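The job above assumes an input.txt file exists in the local directory before the hdfs dfs -put step. If you do not have one, a small sample can be created first:

```shell
# Create a tiny local input file for the wordcount example.
cat > input.txt <<'EOF'
hello hadoop
hello world
EOF
# For this input, wordcount emits counts of:
# hadoop 1, hello 2, world 1
```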
