
Hadoop Setup

Prerequisites:
System: Mac OS / Linux / Cygwin on Windows.
Notice: 1. Only Ubuntu will be supported by the TA; you may try other environments as a challenge. 2. Cygwin on Windows is not recommended because of its instability and unforeseen bugs.
Java Runtime Environment: Java 1.6.x is recommended.
ssh must be installed and sshd must be running in order to use the Hadoop scripts that manage remote Hadoop daemons.
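For example, you might verify the prerequisites with something like the following minimal sketch; the package names assume a Debian/Ubuntu system:

# Check the Java runtime (1.6.x recommended)
java -version

# Install ssh/sshd and rsync if they are not already present
# (assumes a Debian/Ubuntu system with apt-get)
sudo apt-get install ssh rsync

# Confirm that sshd is running and accepting connections
ssh localhost exit && echo "sshd is reachable"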

Single Node Setup (usually for debugging)


Untar hadoop-*.*.*.tar.gz into your user path.
About version: the latest stable version, 1.0.1, is recommended.
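A minimal sketch of this step; the install path is an assumption, and it presumes the release tarball has already been downloaded:

# Unpack the release into your home directory
cd ~
tar xzf hadoop-1.0.1.tar.gz
cd hadoop-1.0.1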

Edit the file conf/hadoop-env.sh to define at least JAVA_HOME as the root of your Java installation (a sketch follows the configuration listings below). Then edit the following files to configure the properties:
conf/core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

conf/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

conf/mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>
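For the conf/hadoop-env.sh step, a minimal sketch; the JDK path below is only an assumption, so point it at your own Java installation:

# conf/hadoop-env.sh: define the root of the Java installation
export JAVA_HOME=/usr/lib/jvm/java-6-openjdk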

Cluster Setup (the only acceptable setup for HW)


Same steps as the single node setup.
Set the dfs.name.dir and dfs.data.dir properties in hdfs-site.xml.
Add the master node name to conf/masters.
Add all the slave node names to conf/slaves.
Edit /etc/hosts on each node: add an "IP node-name" entry for every node. Suppose your master node name is ubuntu1 and its IP is 192.168.0.2; then add the line "192.168.0.2 ubuntu1" to the file (see the sketch after this list).
Copy the Hadoop folder to the same path on all nodes. Notice: JAVA_HOME may not be set the same on each node.
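A minimal sketch of the /etc/hosts, masters/slaves, and copy steps. It assumes two nodes: ubuntu1 (the master, 192.168.0.2, from the example above) and ubuntu2 (a slave whose name and IP are made up purely for illustration):

# /etc/hosts on every node: one "IP  node-name" line per node, e.g.
#   192.168.0.2  ubuntu1
#   192.168.0.3  ubuntu2

# On the master, list the master and slave node names
echo "ubuntu1" > conf/masters
echo "ubuntu2" > conf/slaves

# Copy the Hadoop folder to the same path on every other node
scp -r ~/hadoop-1.0.1 ubuntu2:~/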


Execution
Generate an ssh key with an empty passphrase so that no passphrase is required when starting up:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
$ ssh localhost

Format a new distributed filesystem:
$ bin/hadoop namenode -format
Start the Hadoop daemons:
$ bin/start-all.sh
The Hadoop daemon log output is written to the ${HADOOP_LOG_DIR} directory (defaults to ${HADOOP_HOME}/logs).
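After starting the daemons you can sanity-check that they came up; a minimal sketch, assuming the default log location above (jps ships with the JDK):

# List the running Hadoop JVMs: NameNode, DataNode, SecondaryNameNode,
# JobTracker and TaskTracker should all appear on a single-node setup
jps

# If something is missing, look at the most recent logs
ls -t logs/ | head
tail -n 50 logs/hadoop-*-namenode-*.log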

Execution (continued)
Copy the input files into the distributed filesystem:
$ bin/hadoop fs -put conf input
Run some of the examples provided:
$ bin/hadoop jar hadoop-*-examples.jar grep input output 'dfs[a-z.]+'
Examine the output files on the distributed filesystem:
$ bin/hadoop fs -cat output/*
When you're done, stop the daemons with:
$ bin/stop-all.sh
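Alternatively, you can copy the output from HDFS to the local filesystem before examining it; a minimal sketch:

# Copy the output directory out of HDFS, then view it locally
bin/hadoop fs -get output output
cat output/*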


Details About Configuration Files


Hadoop configuration is driven by two types of important configuration files:
1. Read-only default configuration: src/core/core-default.xml, src/hdfs/hdfs-default.xml, src/mapred/mapred-default.xml, conf/mapred-queues.xml.template.
2. Site-specific configuration: conf/core-site.xml, conf/hdfs-site.xml, conf/mapred-site.xml, conf/mapred-queues.xml.


Details About Configuration Files (continued)


conf/core-site.xml:
Parameter: fs.default.name
Value: URI of the NameNode.
Notes: e.g. hdfs://hostname/
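For a cluster, fs.default.name points at the master rather than localhost; a minimal sketch of conf/core-site.xml, reusing the assumed master node name ubuntu1 from the earlier example:

# conf/core-site.xml on every node (ubuntu1 is the assumed master name)
cat > conf/core-site.xml <<'EOF'
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://ubuntu1:9000</value>
  </property>
</configuration>
EOF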

conf/hdfs-site.xml:
Parameter: dfs.name.dir
Value: Path on the local filesystem where the NameNode stores the namespace and transaction logs persistently.
Notes: If this is a comma-delimited list of directories, then the name table is replicated in all of the directories, for redundancy.

Parameter: dfs.data.dir
Value: Comma-separated list of paths on the local filesystem of a DataNode where it should store its blocks.
Notes: If this is a comma-delimited list of directories, then data will be stored in all named directories, typically on different devices.
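A minimal sketch of conf/hdfs-site.xml using these two properties; the directory paths are assumptions, so pick locations on your own disks:

# conf/hdfs-site.xml: persistent NameNode metadata and DataNode block storage
# (the /home/hadoop/... paths below are only examples)
cat > conf/hdfs-site.xml <<'EOF'
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>/home/hadoop/dfs/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>/home/hadoop/dfs/data</value>
  </property>
</configuration>
EOF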


Details About Configuration Files (continued)


conf/mapred-site.xml:
Parameter: mapred.job.tracker
Value: Host or IP and port of the JobTracker.
Notes: host:port pair.

Parameter: mapred.system.dir
Value: Path on the HDFS where the Map/Reduce framework stores system files, e.g. /hadoop/mapred/system/.
Notes: This is in the default filesystem (HDFS) and must be accessible from both the server and client machines.

Parameter: mapred.local.dir
Value: Comma-separated list of paths on the local filesystem where temporary Map/Reduce data is written.
Notes: Multiple paths help spread disk i/o.

Parameter: mapred.tasktracker.{map|reduce}.tasks.maximum
Value: The maximum number of Map/Reduce tasks, which are run simultaneously on a given TaskTracker, individually.
Notes: Defaults to 2 (2 maps and 2 reduces), but vary it depending on your hardware.

Parameter: dfs.hosts/dfs.hosts.exclude
Value: List of permitted/excluded DataNodes.
Notes: If necessary, use these files to control the list of allowable DataNodes.

Parameter: mapred.hosts/mapred.hosts.exclude
Value: List of permitted/excluded TaskTrackers.
Notes: If necessary, use these files to control the list of allowable TaskTrackers.

Parameter: mapred.queue.names
Value: Comma-separated list of queues to which jobs can be submitted.
Notes: The Map/Reduce system always supports at least one queue with the name "default". Hence, this parameter's value should always contain the string "default". Some job schedulers supported in Hadoop, like the Capacity Scheduler, support multiple queues. If such a scheduler is being used, the list of configured queue names must be specified here. Once queues are defined, users can submit jobs to a queue using the property name mapred.job.queue.name in the job configuration. There could be a separate configuration file for configuring properties of these queues that is managed by the scheduler. Refer to the documentation of the scheduler for information on the same.
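A minimal sketch of conf/mapred-site.xml for the cluster case, again assuming the master node name ubuntu1; the task-slot counts are illustrative only and should be tuned to your hardware:

# conf/mapred-site.xml: JobTracker location and per-TaskTracker task slots
cat > conf/mapred-site.xml <<'EOF'
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>ubuntu1:9001</value>
  </property>
  <property>
    <name>mapred.tasktracker.map.tasks.maximum</name>
    <value>2</value>
  </property>
  <property>
    <name>mapred.tasktracker.reduce.tasks.maximum</name>
    <value>2</value>
  </property>
</configuration>
EOF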


You may get detailed information from:


The official site: http://hadoop.apache.org
Course slides & textbooks: http://www.cs.sjtu.edu.cn/~liwujun/course/mmds.html

Michael G. Noll's Blog (a good guide): http://www.michael-noll.com/


If you have good materials to share, please send them to the TA.

