Ubuntu - Install Hive and configure Hive Metastore with MySQL

Hive is a data-warehousing package built on top of Hadoop, designed for data summarization, ad-hoc querying, and analysis of large volumes of data. HiveQL (Hive Query Language) statements are compiled into MapReduce jobs, and you can tune job parameters such as the number of reducers and the number of bytes consumed per reducer task.
Hive vs MapReduce
Hive adds no capabilities beyond MapReduce itself: Hive programs are executed as MapReduce jobs via the Hive interpreter. Its strength is convenience; for example, you can join table data with a log file and query the result using a regexp pattern.
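As a sketch of that regexp capability (the table name access_log, the path /samples/logs/, and the two-column layout are hypothetical, assuming Hive's built-in RegexSerDe; this needs a running Hive cluster):

```sql
-- Hypothetical example: expose a space-separated log file as a Hive table
-- via a regex whose capture groups map to the columns.
CREATE EXTERNAL TABLE access_log(ip STRING, request STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES ("input.regex" = "([^ ]*) (.*)")
LOCATION '/samples/logs/';
-- The log table can then be joined against ordinary tables with plain HiveQL.
```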
1) Download Hive
2) Extract Hive (this assumes you followed "Install Hadoop" with user hduser)
sudo su
cd /usr/local; cp /home//Download/apache-hive-0.14.0-bin.tar.gz /usr/local/
tar -xvzf apache-hive-0.14.0-bin.tar.gz
mv apache-hive-0.14.0-bin hive-0.14
chown -R hduser:hadoop hive-0.14

6) Update the PATH in /etc/profile, or as hduser (su hduser) edit ~/.bashrc and add:


export HIVE_HOME=/usr/local/hive-0.14
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
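A quick sanity check that the exports resolve as intended (a sketch; /usr/local/hadoop is an assumed HADOOP_HOME, adjust to your layout):

```shell
# Set the variables as in ~/.bashrc and confirm the hive binary
# directory ends up on PATH. /usr/local/hadoop is an assumed path.
export HADOOP_HOME=/usr/local/hadoop
export HIVE_HOME=/usr/local/hive-0.14
export PATH=$PATH:$HADOOP_HOME/bin:$HIVE_HOME/bin
echo "$HIVE_HOME"
echo "$PATH" | tr ':' '\n' | grep 'hive-0.14/bin'
```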
7) Running Hive
Note : Make sure Hadoop is installed and HADOOP_HOME is configured.
In addition, you must create /tmp and /user/hive/warehouse (aka hive.metastore.warehouse.dir) in HDFS and set them chmod g+w before you can create a table in Hive.

$HADOOP_HOME/bin/hadoop fs -mkdir -p /tmp
$HADOOP_HOME/bin/hadoop fs -mkdir -p /user/hive/warehouse
$HADOOP_HOME/bin/hadoop fs -chmod g+w /tmp
$HADOOP_HOME/bin/hadoop fs -chmod g+w /user/hive/warehouse
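The hadoop fs commands above need a running HDFS; the same idea can be sketched on the local filesystem to show what -p and g+w do (the /tmp/hive-demo paths are stand-ins, not the real HDFS paths):

```shell
# Local stand-in for the HDFS commands above: create the directories
# (-p creates missing parents) and grant the group write permission.
mkdir -p /tmp/hive-demo/tmp /tmp/hive-demo/user/hive/warehouse
chmod g+w /tmp/hive-demo/tmp /tmp/hive-demo/user/hive/warehouse
ls -ld /tmp/hive-demo/user/hive/warehouse
```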

8) Remove the .template extension from the configuration files under $HIVE_HOME/conf (Ex : mv hive-env.sh.template hive-env.sh)
cd $HIVE_HOME/conf
9) Enter into hive console
$hive (Press Enter for the hive shell)
Note : If you just want to experiment with the default hive metastore (an embedded Derby database on your local machine), run $hive, start creating your tables, and try HiveQueryLanguage (HQL). Then go to Step 18.
10) Configure the hive metastore pointing to MySQL (Recommended)
The commands below install MySQL and the MySQL Connector/J jar (after installation, cd /usr/share/java/ to see mysql-connector-java.jar):
sudo apt-get install mysql-server-5.5
sudo apt-get install libmysql-java
sudo ln -s /usr/share/java/mysql.jar /usr/local/hive-0.14/lib/libmysql-java.jar

11) sudo sysv-rc-conf mysql on (starts MySQL when the machine starts; if sysv-rc-conf is missing on your Ubuntu, install it with "sudo apt-get install sysv-rc-conf")
Create the initial database schema using the hive-schema-*.mysql.sql file that matches your Hive version, located in the $HIVE_HOME/scripts/metastore/upgrade/mysql directory.

$ mysql -uroot -proot


mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> SOURCE /usr/local/hive-0.14/scripts/metastore/upgrade/mysql/hive-schema-.sql
(the mysql client does not expand $HIVE_HOME, so give the full path to the schema file)

You also need a MySQL user account that Hive will use to access the metastore. It is very important to prevent this user account from creating or altering tables in the metastore database schema. Replace metastorehost (your remote metastore server) with localhost if your metastore server is the same machine.

mysql> CREATE USER 'hive'@'metastorehost' IDENTIFIED BY 'mypassword';


...
mysql> REVOKE ALL PRIVILEGES, GRANT OPTION FROM 'hive'@'metastorehost';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,LOCK TABLES,EXECUTE ON metastore.* TO 'hive'@'metastorehost';
mysql> FLUSH PRIVILEGES;
mysql> quit;
This step shows the configuration properties you need to set in hive-site.xml to make the metastore service communicate with the MySQL database, and provides sample settings. Though you can use the same $HIVE_HOME/conf/hive-site.xml or $HIVE_HOME/conf/hive-default.xml on all hosts (client, metastore, HiveServer), hive.metastore.uris is the only property that must be configured on all of them; the others are used only on the metastore host.
12) Update hive-site.xml or hive-default.xml under $HIVE_HOME/conf with the properties below (they must sit inside the file's <configuration> root element):
<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>
<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>
<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
  <description>must match the password used in the CREATE USER statement above</description>
</property>
<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>
<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
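As a quick sanity check on the shape of the file you end up with (the XML below is a trimmed stand-in written to /tmp, not the full config):

```shell
# Write a minimal stand-in hive-site.xml and check that the property
# elements sit inside a single <configuration> root.
cat > /tmp/hive-site-check.xml <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
EOF
grep -c '<property>' /tmp/hive-site-check.xml
```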
13) Start Hive Metastore Service
hive --service metastore &
14) Create /home/hduser/user.txt (vi /home/hduser/user.txt) in your local file system and add the content below, or download user.txt:
userid,username,city,state,country
1,John,Montgomery,Alabama,US
2,David,Phoenix,Arizona,US
3,Sarah,Sacramento,California,US
4,Anoop,Montgomery,Alabama,US
5,Gubs,Villupuram,TamilNadu,India
Note : In Hive, table names are case-insensitive.
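The file can also be created non-interactively. Note that the first line above is a header, and Hive will load it as an ordinary data row unless you strip it or set the table property skip.header.line.count='1'. A sketch (written to /tmp/user.txt so it runs anywhere; the guide itself uses /home/hduser/user.txt):

```shell
# Create user.txt with the sample rows (header line included, as shown above).
cat > /tmp/user.txt <<'EOF'
userid,username,city,state,country
1,John,Montgomery,Alabama,US
2,David,Phoenix,Arizona,US
3,Sarah,Sacramento,California,US
4,Anoop,Montgomery,Alabama,US
5,Gubs,Villupuram,TamilNadu,India
EOF
grep -c . /tmp/user.txt   # 6 lines: 1 header + 5 data rows
```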

15) Go to the hive prompt and create the table user in the Hive metastore to map data from user.txt.

$ hive (Enter)
hive> CREATE TABLE user(id INT, name STRING, city STRING, state STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE;
hive> show tables;
hive>show tables;

16) The following command maps user.txt data to the user table by loading the file.
(LOCAL => keyword needed to load a file from the local file system into Hive; not necessary if your file is already in HDFS.)
You can add OVERWRITE before INTO TABLE if you want to overwrite the existing user table content.
To put a local file into HDFS first, you can use:
hadoop fs -put <localfilesystempath> <hdfsfiledirectory>
Load data into Hive from your local directory (remove the LOCAL keyword if your file is in HDFS):
$hive> LOAD DATA LOCAL INPATH '/home/hduser/user.txt' INTO TABLE user;
Note : LOAD DATA LOCAL copies the file from the local location to /user/hive/warehouse/userdb/user/user.txt. When you execute "DROP TABLE user", the file is also dropped/removed from that warehouse location.
17) Query : How many people belong to each state?
$hive> select state, count(state) from user group by state;
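Since the sample data is tiny, the same aggregation can be cross-checked in plain shell with awk (a stand-in for the HiveQL above, with the five data rows inlined):

```shell
# Count rows per state (4th CSV column), mirroring
# SELECT state, count(state) FROM user GROUP BY state;
awk -F, '{count[$4]++} END {for (s in count) print s, count[s]}' <<'EOF' | sort
1,John,Montgomery,Alabama,US
2,David,Phoenix,Arizona,US
3,Sarah,Sacramento,California,US
4,Anoop,Montgomery,Alabama,US
5,Gubs,Villupuram,TamilNadu,India
EOF
```

This prints Alabama 2, Arizona 1, California 1, TamilNadu 1, one group per line.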
18) You can use most relational-database (SQL) queries in Hive:
hive>describe user;
hive>show create table user;
19) Dropping a Hive table drops both the table and its data file from the warehouse (/user/hive/warehouse/db/table/user.txt). What if a MapReduce program references this data file? Refer to the external table creation in the next step.
hive> DROP TABLE user;
20) Create an external table in Hive. This way, multiple tables can refer to the same data file, and if you drop a table the data file remains available in the same location.
hive> CREATE EXTERNAL TABLE user(id INT, name STRING, city STRING, state STRING, country STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n' STORED AS TEXTFILE;
hive> LOAD DATA INPATH '/samples/user.txt' INTO TABLE user;
Note : LOAD DATA (without LOCAL) moves the file from its HDFS location to /user/hive/warehouse/userdb/user/user.txt, so the file is no longer available at its original HDFS location. Upon DROP TABLE user (a table created as EXTERNAL), the file remains at /user/hive/warehouse/userdb/user/user.txt even after the table is dropped.
21) Exit from hive
$hive>quit;
22) Hive log file location :
Note : The hive.log path can be found in $HIVE_HOME/conf/hive-log4j.properties: ${java.io.tmpdir}/${user.name}/hive.log (i.e. /tmp/hduser/hive.log).
You can use all the DDL (Data Definition Language) commands you know from SQL in HiveQueryLanguage. Refer : Hive DDL Manual
