3. Log in to your Cloudera virtual instance using the username and password
(cloudera, cloudera). The screen should look similar to the one below. Open the Firefox browser
from the toolbar at the top left-hand side.
4. Click the bookmarks shown at the top and open the HDFS NameNode and JobTracker interfaces
in separate Firefox tabs.
5. Now minimize the Firefox window and open the cloudera folder, which is the home folder of
the current user (since you are logged in as cloudera), as shown.
6. Right-click inside the File Explorer and select Open Terminal as shown.
7. Install the wget utility as shown. This utility will be used to download a dataset as input for
MapReduce.
# install wget utility into the virtual image
sudo yum install wget
9. We will now use wget to download Huckleberry Finn from the Internet:
wget -U firefox http://www.gutenberg.org/cache/epub/76/pg76.txt
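Before continuing, it may be worth sanity-checking the download (a suggested check, not part of the original steps; Project Gutenberg texts open with a header block):

```shell
# Confirm the file exists and is non-trivial in size.
ls -l pg76.txt
# Glance at the first few lines; a Project Gutenberg header should appear.
head -n 3 pg76.txt
```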
11. Rename the file and then copy it into the HDFS working directory (/user/cloudera) as shown
here:
mv pg76.txt HuckFinn.txt
hadoop fs -put HuckFinn.txt /user/cloudera
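Before running the job, it may help to confirm that the file actually landed in HDFS (a suggested check, not part of the original steps; these commands assume the default /user/cloudera home directory on the VM):

```shell
# List the HDFS home directory; HuckFinn.txt should appear in the output.
hadoop fs -ls /user/cloudera
# Optionally peek at the first few lines as stored in HDFS.
hadoop fs -cat /user/cloudera/HuckFinn.txt | head -n 5
```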
12. Invoke the Hadoop wordcount example as shown and observe the output in the terminal window.
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount HuckFinn.txt a20.out
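Conceptually, wordcount tokenizes each line into words in the map phase, groups identical words in the shuffle, and sums the counts in the reduce phase. A rough local analogue with standard Unix tools (illustrative only; the input text here is made up, and the Hadoop job performs the same steps in parallel across mappers and reducers):

```shell
# map:    split each line into one word per line (tr)
# shuffle: group identical words together (sort)
# reduce: count each group (uniq -c)
printf 'the cat sat\nthe cat\n' | tr ' ' '\n' | sort | uniq -c
```

Note that Hadoop refuses to write into an existing output directory; if you rerun the job, remove a20.out first (e.g. hadoop fs -rm -r a20.out, or hadoop fs -rmr a20.out on older 0.20-era shells).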
13. Open the Task Tracker admin interface and observe the progress of the job.
15. In the Task Tracker interface, click the JobID to see highlights of the job execution,
as shown below.
16. Verify that an a20.out directory has been created by the running of the job.
hadoop fs -ls a20.out
17. Browse the output of the run as shown below to verify that the job completed successfully.
hadoop fs -cat /user/cloudera/a20.out/part-r-00000 | less
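If you prefer to inspect the results locally rather than through hadoop fs -cat, you can copy the output out of HDFS first (a suggested variation, not part of the original steps; part-r-00000 is the single-reducer output file seen above, and each line holds a word and its count separated by a tab):

```shell
# Copy the reducer output from HDFS to the local filesystem.
hadoop fs -get /user/cloudera/a20.out/part-r-00000 wordcounts.txt
# Show the ten most frequent words by sorting numerically on the count column.
sort -t$'\t' -k2,2 -rn wordcounts.txt | head -n 10
```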