
Sqoop

Sqoop Import

Sqoop imports data from a traditional RDBMS into Hadoop (HDFS), HBase and Hive.


Prerequisites:-
An RDBMS (such as MySQL) installed and accessible
Hadoop cluster up and running
HADOOP_HOME environment variable set
Basic command

bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --target-dir path/for/storing/db
How import works?

First, a connection is made to the database server to pull the required metadata
for the input table.
Then Sqoop executes a MapReduce job on the Hadoop cluster, using that
metadata to perform the actual import (see the sketch below).
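
Putting the basic options together, a complete import might look like the following sketch; the JDBC URL, credentials, table and target directory are placeholder values:

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table customers \
  --target-dir /user/hadoop/customers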
Modify Delimiters

--fields-terminated-by ','     Character separating fields (here a comma)
--lines-terminated-by '\n'     Character terminating each record
--escaped-by '\\'              Escape character
--enclosed-by '\"'             Character enclosing each field value
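
As a sketch, the delimiter options are simply appended to an import command (connection, table and paths below are placeholders):

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table customers \
  --target-dir /user/hadoop/customers_csv \
  --fields-terminated-by ',' \
  --escaped-by '\\' \
  --enclosed-by '\"'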
Different file formats

--as-sequencefile    Store data as Hadoop SequenceFiles
--as-avrodatafile    Store data as Avro data files
--as-textfile        Store data as plain text files (the default)

--direct             Use the database-specific direct (non-JDBC) connector for faster transfer
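For instance, to keep the imported data as Avro instead of plain text (placeholder connection and names):

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table customers \
  --target-dir /user/hadoop/customers_avro \
  --as-avrodatafile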


Different table access

--columns "field1,field2"     Import only the selected columns
--where "condition"           Import only the rows matching the condition
--columns ... --where ...     Import selected rows of selected columns
--query "SQL query"           Import the result of an arbitrary SQL query (see the example below)
import-all-tables             Import every table in the database
-m n                          Number of map tasks to run
--split-by column_name        Column used to divide the work among map tasks
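
A free-form query import is a common case; as a sketch (connection, query and paths are placeholders), Sqoop expects the literal token $CONDITIONS in the WHERE clause and either -m 1 or a --split-by column:

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --query 'SELECT id, name FROM customers WHERE active = 1 AND $CONDITIONS' \
  --split-by id \
  --target-dir /user/hadoop/active_customers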
Incremental import

For importing only records that are new or have changed since the last import.

For appending new records:
--incremental append --check-column column_name --last-value value
For appending and updating records:
--incremental lastmodified --check-column column_name --last-value value(timestamp)
(The table must maintain a timestamp, so an extra column is needed.)
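
For example, an append-mode incremental import that picks up only rows whose id is greater than the value imported last time (names are placeholders):

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table orders \
  --target-dir /user/hadoop/orders \
  --incremental append \
  --check-column id \
  --last-value 1000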
Job info

--create job_name    Create a saved job
--delete job_name    Delete a saved job
--exec job_name      Run a saved job
--show job_name      Show the parameters of a saved job
--list               List all saved jobs
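
These options belong to the sqoop job tool. A sketch of saving and later running an incremental import as a job (all names are placeholders); a saved job also remembers the last imported value between runs:

bin/sqoop job --create daily_orders -- import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table orders --target-dir /user/hadoop/orders \
  --incremental append --check-column id --last-value 0

bin/sqoop job --list
bin/sqoop job --exec daily_orders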
Importing data into HBase

Prerequisites:-
HBase cluster up and running
HBASE_HOME environment variable is set
For importing a table with a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hbase-table hbase_name --column-family hbase_table_col1
--hbase-create-table
For importing a table without a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hbase-table hbase_name --column-family hbase_table_col1
--hbase-row-key col_name --hbase-create-table
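As a concrete sketch (database, HBase table and column family names are placeholders), importing a table without a primary key while choosing an explicit row key:

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table order_items \
  --hbase-table order_items \
  --column-family details \
  --hbase-row-key item_id \
  --hbase-create-table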
Importing data into Hive

Prerequisites:-
Hive installed
HIVE_HOME environment variable is set
Importing a table with a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hive-table name --create-hive-table --hive-import --hive-home
path/to/hive/home
Importing a table without a primary key:
bin/sqoop import --connect jdbc:mysql://url --username name --password pwd
--table name --hive-table name --create-hive-table --hive-import --hive-home
path/to/hive/home --split-by col_name
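For example (placeholder names), importing a table without a primary key straight into a Hive table:

bin/sqoop import \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table order_items \
  --hive-import --create-hive-table \
  --hive-table order_items \
  --split-by item_id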
Getting HDFS data into Hive

hive> CREATE EXTERNAL TABLE student(id INT, name STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LINES TERMINATED BY '\n'
      STORED AS TEXTFILE
      LOCATION '/user/username/student';
Sqoop export

Basic command:
bin/sqoop export --connect location --table name --username name --password pwd
--export-dir /location
--input-fields-terminated-by ','
--input-lines-terminated-by '\n'
How export works

Sqoop first validates the metadata of the target RDBMS table.

It then executes a MapReduce job to perform the actual transfer.

Use the --staging-table argument to stage the exported data and move it into the
target table in a single transaction (see the sketch below).
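
A sketch of an export that goes through a staging table (connection, tables and directory are placeholders; the staging table must already exist with the same structure as the target):

bin/sqoop export \
  --connect jdbc:mysql://dbserver/shop \
  --username sqoop_user --password sqoop_pwd \
  --table daily_summary \
  --staging-table daily_summary_stage \
  --export-dir /user/hadoop/daily_summary \
  --input-fields-terminated-by ','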


Export from Hive

Create an invoice table in the RDBMS, for example:

CREATE TABLE invoice(
  id INT NOT NULL PRIMARY KEY,
  `from` VARCHAR(32), `to` VARCHAR(32));
(from and to are quoted with backticks because they are reserved words.)
Use command:-
bin/sqoop export --connect jdbc:location --table invoice --export-dir
location/invoice --username name --password pwd -m no.
--input-fields-terminated-by '\001' (octal of ^A)
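
Filled in with placeholder values, and assuming the Hive warehouse keeps the table under the default /user/hive/warehouse path, the export might look like:

bin/sqoop export \
  --connect jdbc:mysql://dbserver/billing \
  --username sqoop_user --password sqoop_pwd \
  --table invoice \
  --export-dir /user/hive/warehouse/invoice \
  -m 1 \
  --input-fields-terminated-by '\001'

The \001 (Ctrl-A) delimiter matches Hive's default field separator for text tables.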
