HDFS Commands and running your first MapReduce Job

1. Hadoop file commands take the form of hadoop fs -cmd <args>


where cmd is the specific file command and <args> is a variable number of arguments.
The command cmd is usually named after its Unix equivalent, and the examples below
follow this pattern. Let us now examine the most common file management tasks in
HDFS, including adding files and directories, retrieving files, and deleting files.
We will first list these commands for your reference and then go through some of
them as part of the lab; a short sample session follows the list.
2. a. Create a directory - hadoop fs -mkdir <paths>
b. List directory contents - hadoop fs -ls <args>
c. List directory contents recursively - hadoop fs -lsr <args>
d. Create an empty file - hadoop fs -touchz <path/filename>
e. Copy a file from a source to a destination - hadoop fs -cp <source> <dest>
f. Copy a file from HDFS to the local file system; similar to get, except that the
destination is limited to local files - hadoop fs -copyToLocal <src:HDFS> <dest:localFileSystem>
g. Move a file from source to destination - hadoop fs -mv <src> <dest>
h. Remove files specified as arguments; deletes a directory only when it is empty -
hadoop fs -rm <arg>
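As a quick illustration, here is a minimal sample session that strings several of
these commands together. The directory and file names (demo, notes.txt) are just
placeholders chosen for this sketch, not part of the lab.

# create a directory and an empty file in it, then list the directory
hadoop fs -mkdir /user/cloudera/demo
hadoop fs -touchz /user/cloudera/demo/notes.txt
hadoop fs -ls /user/cloudera/demo

# copy the file, rename the copy with mv, then remove it
hadoop fs -cp /user/cloudera/demo/notes.txt /user/cloudera/demo/notes2.txt
hadoop fs -mv /user/cloudera/demo/notes2.txt /user/cloudera/demo/renamed.txt
hadoop fs -rm /user/cloudera/demo/renamed.txt

# copy a file from HDFS back to the local file system
hadoop fs -copyToLocal /user/cloudera/demo/notes.txt /home/cloudera/notes.txt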

3. Log in to your Cloudera virtual instance using the username/password
(cloudera/cloudera). The screen should look similar to the one below. Open the
Firefox browser shown in the toolbar at the top left-hand side. The browser opens
up as shown below.

4. Click on the bookmarks shown at the top and open the HDFS NameNode and JobTracker
interfaces in separate tabs in Firefox.

5. Now, minimize the Firefox window and then open the Cloudera folder, which is the home folder of
the user (since you are logged in as cloudera) as shown.

6. Right-click inside the File Explorer and select Open Terminal as shown.

7. Install the wget utility as shown. This utility will be used to download a
dataset for input into MapReduce.
# install wget utility into the virtual image
sudo yum install wget

8. The install should complete shortly.
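If you want to confirm that the install succeeded before moving on, a quick version
check works; this check is optional and not part of the original steps.

# optional: confirm wget is now available
wget --version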

9. We will now use wget to download Huckleberry Finn from Project Gutenberg. The -U
firefox option sets the User-Agent string, since some sites refuse requests that
identify themselves as wget.
wget -U firefox http://www.gutenberg.org/cache/epub/76/pg76.txt
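As an optional sanity check (not part of the original steps), peek at the first few
lines of the downloaded file to confirm it looks like the book text:

# the Project Gutenberg header should appear at the top of the file
head -n 5 pg76.txt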

10. Now, let's rename the file pg76.txt to HuckFinn.txt.

11. Rename the file and then copy it into the HDFS working directory (/user/cloudera)
as shown here:
mv pg76.txt HuckFinn.txt
hadoop fs -put HuckFinn.txt /user/cloudera
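A quick way to confirm the upload (an extra check, not in the original steps) is to
list the HDFS working directory:

# HuckFinn.txt should now appear in HDFS
hadoop fs -ls /user/cloudera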

12. Invoke the Hadoop wordcount example as shown and observe the output in the
terminal window.
hadoop jar /usr/lib/hadoop-0.20-mapreduce/hadoop-examples.jar wordcount HuckFinn.txt a20.out
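The general shape of this invocation is hadoop jar <jar-file> <program> <input>
<output>: HuckFinn.txt (resolved relative to /user/cloudera in HDFS) is the input,
and a20.out is the output directory. One caveat worth knowing: the job will fail if
the output directory already exists, so if you rerun it, remove a20.out first, for
example as below. The -rmr form is the older syntax matching this Hadoop 0.20
release; newer releases use -rm -r.

# remove the output directory before re-running the job
hadoop fs -rmr a20.out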

13. Open the JobTracker admin interface and observe the progress of the job.

14. The job should complete as shown in the terminal window.

15. In the JobTracker interface, click on the Job ID to open up some highlights about
the job execution, as shown below.

16. Verify that an a20.out directory has been created by the running of the job. A
successful run typically leaves a _SUCCESS marker and a part-r-00000 file (one part
file per reducer) inside it.
hadoop fs -ls a20.out

17. Browse the output of the run as shown below to verify that the job completed successfully.
hadoop fs -cat /user/cloudera/a20.out/part-r-00000 | less
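If you want to keep a local copy of the results, the copyToLocal command from the
list in step 2 works; the local destination path here is just an illustrative choice.

# copy the wordcount results out of HDFS into the local home directory
hadoop fs -copyToLocal /user/cloudera/a20.out/part-r-00000 /home/cloudera/HuckFinn-counts.txt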
