DEFINITIONS
MapReduce
Map: specifies selection/filtering/transformation
Reduce: specifies aggregation
Map output
Reduce input
Reduce 1: (isto, [1])
Reduce 2: (é, [1])
Reduce 3: (testando, [1, 1])
Reduce 4: (um, [1, 1])
Reduce 5: (Hadoop, [1])
Reduce 6: (com, [1])
Reduce 7: (teste, [1, 1])
Reduce output
Reduce 1: (isto, 1)
Reduce 2: (é, 1)
Reduce 3: (testando, 2)
Reduce 4: (um, 2)
Reduce 5: (Hadoop, 1)
Reduce 6: (com, 1)
Reduce 7: (teste, 2)
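The word-count flow listed above (Map output, Reduce input, Reduce output) can be sketched in memory with plain Python. This is only an illustration of the idea, not Hadoop code: the input lines are an assumption, since the original input text is not shown here, and the function names map_phase, shuffle and reduce_phase are hypothetical.

```python
from collections import defaultdict

# Hypothetical input lines (the original input text is not shown);
# chosen so that the counts match the Reduce output listed above.
LINES = ["isto é um teste", "testando um teste", "testando com Hadoop"]

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in the line."""
    return [(word, 1) for word in line.split()]

def shuffle(pairs):
    """Group the emitted values by key, producing the Reduce input."""
    groups = defaultdict(list)
    for word, one in pairs:
        groups[word].append(one)
    return dict(groups)

def reduce_phase(word, ones):
    """Reduce: aggregate the list of values collected for one key."""
    return word, sum(ones)

map_output = [pair for line in LINES for pair in map_phase(line)]
reduce_input = shuffle(map_output)
reduce_output = dict(reduce_phase(w, ones) for w, ones in reduce_input.items())

for word, count in reduce_output.items():
    print(word, count)
```

In a real job the grouping step is done by the framework between the Map and Reduce phases, and each key may be handled by a different reducer.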
HDFS
HDFS (Hadoop Distributed File System), which is based on GFS (Google File
System), is a distributed file system, as its name suggests. It was designed
to be scalable and fault tolerant, to offer good manageability, reliability,
usability and performance, and to work together with MapReduce.
The interesting part is that when you develop your application you do not need
to worry about where the data is stored; Hadoop itself takes care of that. In
other words, data placement is transparent to your application, as if the data
were stored locally.
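To make the idea of location transparency concrete, here is a toy in-memory model. It is not Hadoop's API: the ToyCluster class, the node names and the 4-character block size are all hypothetical. The only point it illustrates is that the caller reads and writes by path, while the mapping from blocks to nodes is kept internally.

```python
class ToyCluster:
    """Hypothetical stand-in for a distributed file system (not Hadoop)."""

    def __init__(self, datanodes, block_size=4):
        # Each data node stores blocks keyed by a block id.
        self.datanodes = {name: {} for name in datanodes}
        # The block map plays the role of the file system metadata:
        # path -> ordered list of (node name, block id).
        self.block_map = {}
        self.block_size = block_size

    def write(self, path, data):
        """Split data into blocks and spread them round-robin over the nodes."""
        names = sorted(self.datanodes)
        locations = []
        for offset in range(0, len(data), self.block_size):
            index = offset // self.block_size
            node = names[index % len(names)]
            block_id = f"{path}#{index}"
            self.datanodes[node][block_id] = data[offset:offset + self.block_size]
            locations.append((node, block_id))
        self.block_map[path] = locations

    def read(self, path):
        """Reassemble the file; the caller never sees where the blocks live."""
        return "".join(self.datanodes[node][block_id]
                       for node, block_id in self.block_map[path])

cluster = ToyCluster(["datanode-a", "datanode-b"])
cluster.write("/user/root/example.txt", "hello hadoop")
text = cluster.read("/user/root/example.txt")
print(text)
```

The application code only deals with the path; which node holds which block is an internal detail, much as it would be for a local file.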
Two virtual machines were used for the installation, one configured as the
master and the other as the slave:
Host node01 (Master)
Host node02 (Slave)
- Downloading Hadoop
wget http://apache.mesi.com.ar/hadoop/common/hadoop-1.2.1/hadoop-1.2.1.tar.gz
Configuring core-site.xml:
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>
Configuring hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name/data</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>/opt/hadoop/hadoop/dfs/name</value>
    <final>true</final>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
Configuring mapred-site.xml:
<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>hadoop-master:9001</value>
  </property>
</configuration>
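Since all three files share the same name/value property layout, a quick sanity check can be scripted. The sketch below uses Python's standard library to read the core-site.xml content shown above into a dictionary and to pull the host and port out of fs.default.name. The helper name load_properties is hypothetical, and the XML is embedded as a string only to keep the example self-contained.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Same content as the core-site.xml shown above, embedded for illustration.
CORE_SITE = """<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://hadoop-master:9000/</value>
  </property>
  <property>
    <name>dfs.permissions</name>
    <value>false</value>
  </property>
</configuration>"""

def load_properties(xml_text):
    """Return {name: value} for every <property> element in the text."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props

core = load_properties(CORE_SITE)
namenode = urlparse(core["fs.default.name"])
print(namenode.hostname, namenode.port)
```

A check like this can catch a typo in a property name or a host name that does not match the entries used in conf/masters and conf/slaves before the daemons are started.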
# su hadoop
$ cd /opt/hadoop
$ scp -r hadoop node02:/root/hadoop
$ cd /root/hadoop
Configuring the master node
$ vi conf/masters
node01
Configuring the slave node
$ vi conf/slaves
node02
Formatting HDFS
$ cd /root/hadoop/
$ bin/hadoop namenode -format
Starting Hadoop
$ /root/hadoop/bin/start-all.sh
Formatting the namenode
root@node01:~# sudo su -
root@node01:~# hadoop-0.20 namenode -format
10/05/11 18:39:58 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/127.0.1.1
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 0.20.2+228
STARTUP_MSG: build = -r cfc3233ece0769b11af9add328261295aaf4d1ad;
************************************************************/
10/05/11 18:39:59 INFO namenode.FSNamesystem: fsOwner=root,root
10/05/11 18:39:59 INFO namenode.FSNamesystem: supergroup=supergroup
10/05/11 18:39:59 INFO namenode.FSNamesystem: isPermissionEnabled=true
10/05/11 18:39:59 INFO common.Storage: Image file of size 94 saved in 0 seconds.
10/05/11 18:39:59 INFO common.Storage: Storage directory /tmp/hadoop-root/dfs/name has been successfully formatted.
10/05/11 18:39:59 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/127.0.1.1
************************************************************/
root@node01:~#
Starting the namenode
root@master:~# /usr/lib/hadoop-0.20/bin/start-dfs.sh
starting namenode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-root-namenode-mtj-desktop.out
192.168.108.135: starting datanode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-root-datanode-mtj-desktop.out
192.168.108.134: starting datanode, logging to /usr/lib/hadoop-0.20/bin/../logs/hadoop-root-datanode-mtj-desktop.out
192.168.108.133: starting secondarynamenode, logging to /usr/lib/hadoop-0.20/logs/hadoop-root-secondarynamenode-mtj-desktop.out
root@master:~# jps
7367 NameNode
7618 Jps
7522 SecondaryNameNode
root@master:~#
Testing HDFS
root@master:~# hadoop-0.20 fs -df
File system  Size         Used   Avail       Use%
/            16078839808  73728  3490967552  0%
root@master:~# hadoop-0.20 fs -ls /
Found 1 items
drwxr-xr-x - root supergroup 0 2010-05-12 12:16 /tmp
root@master:~# hadoop-0.20 fs -mkdir test
root@master:~# hadoop-0.20 fs -ls test
root@master:~# hadoop-0.20 fs -rmr test
Deleted hdfs://192.168.108.133:54310/user/root/test
root@master:~# hadoop-0.20 fsck /
.Status: HEALTHY
Total size: 4 B
Total dirs: 6
Total files: 1
Total blocks (validated): 1 (avg. block size 4 B)
Minimally replicated blocks: 1 (100.0 %)
Over-replicated blocks: 0 (0.0 %)
Under-replicated blocks: 0 (0.0 %)
Mis-replicated blocks: 0 (0.0 %)
Default replication factor: 2
Average block replication: 2.0
Corrupt blocks: 0
Missing replicas: 0 (0.0 %)
Number of data-nodes: 2
Number of racks: 1
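When this check is run regularly, the fsck summary can be read by a script instead of by eye. The following is a minimal sketch that parses the "key: value" lines of a report like the one above and flags anything other than a healthy state. The helper names parse_fsck_summary, leading_int and is_healthy are hypothetical, and the sample text is a subset of the output shown above; the exact report layout may differ between Hadoop versions.

```python
# Subset of the fsck report shown above, embedded for illustration.
FSCK_REPORT = """\
.Status: HEALTHY
 Total size: 4 B
 Total blocks (validated): 1 (avg. block size 4 B)
 Under-replicated blocks: 0 (0.0 %)
 Corrupt blocks: 0
 Missing replicas: 0 (0.0 %)
 Number of data-nodes: 2
 Number of racks: 1
"""

def parse_fsck_summary(text):
    """Turn 'key: value' report lines into a dictionary of strings."""
    summary = {}
    for line in text.splitlines():
        # Drop indentation and the progress dots printed before 'Status'.
        line = line.strip().lstrip(".")
        if ":" not in line:
            continue
        key, _, value = line.partition(":")
        summary[key.strip()] = value.strip()
    return summary

def leading_int(value):
    """Return the leading integer of a value such as '0 (0.0 %)'."""
    return int(value.split()[0])

def is_healthy(summary):
    """Healthy means status HEALTHY, no corrupt blocks, no missing replicas."""
    return (summary["Status"] == "HEALTHY"
            and leading_int(summary["Corrupt blocks"]) == 0
            and leading_int(summary["Missing replicas"]) == 0)

summary = parse_fsck_summary(FSCK_REPORT)
print(is_healthy(summary), leading_int(summary["Number of data-nodes"]))
```

In practice the report text would come from the captured output of the fsck command rather than from an embedded string.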