Documente Academic
Documente Profesional
Documente Cultură
and HBase
Enis Soztutar
enis [at] apache [dot] org
@enissoz
Driver M
S
C
Parser Planner l Metastore
i
e
Execution Optimizer n
t
MapReduce
HDFS RDBMS
From: Bigtable: A Distributed Storage System for Structured Data, Chang, et al.
Client
HMaster
Zookeeper
Region Region Region
server server server
Region Region Region
HDFS
From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
From HUG - Hive/HBase Integration or, MaybeSQL? April 2010 John Sichi Facebook
http://www.slideshare.net/hadoopusergroup/hive-h-basehadoopapr2010
ROW COLUMN+CELL
bit.ly/aaaa column=s:hits, value=100
bit.ly/aaaa column=u:url,
value=hbase.apache.org/
bit.ly/abcd column=s:hits, value=123
bit.ly/abcd column=u:url,
value=example.com/foo
Architecting the Future of Big Data
Page 14
© Hortonworks Inc. 2011
Example: Hive + HBase (Hive table)
CREATE TABLE short_urls(
short_url string,
url string,
hit_count int
)
STORED BY
'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES
("hbase.columns.mapping" = ":key, u:url, s:hits")
TBLPROPERTIES
("hbase.table.name" = ”short_urls");
Architecting the Future of Big Data
Page 15
© Hortonworks Inc. 2011
Storage Handler
• Hive defines HiveStorageHandler class for different storage
backends: HBase/ Cassandra / MongoDB/ etc
• Storage Handler has hooks for
– Getting input / output formats
– Meta data operations hook: CREATE TABLE, DROP TABLE, etc
• Storage Handler is a table level concept
– Does not support Hive partitions, and buckets
Driver M
S
Parser Planner C
l Metastore
i
Execution Optimizer e
n
t
StorageHandler
MapReduce HBase
HDFS RDBMS
SELECT ... FROM users WHERE userid > 1000000 and email LIKE
‘%@gmail.com’;
-> scan.setStartRow(Bytes.toBytes(1000000))
CLI
Driver M
Authorization
S
C
Parser Planner l Authentication
i
Metastore
e
Execution Optimizer n
t
A/A A/A
MapReduce HBase
A12n/A11N A12n/A11N
HDFS
RDBMS
Architecting the Future of Big Data
Page 32
© Hortonworks Inc. 2011
Hive Deployment Option 2
Client
CLI
Driver M
Authorization
S
C
Parser Planner l Authentication
i
e Metastore
n
Execution Optimizer
t
A/A A/A
MapReduce HBase
A12n/A11N A12n/A11N
HDFS
RDBMS
Architecting the Future of Big Data
Page 33
© Hortonworks Inc. 2011
Hive Deployment Option 3
Client
JDBC/ODBC
M
Driver Authorization
S
C
Parser Planner l Authentication
i Metastore
e
Execution Optimizer n
t
A/A A/A
MapReduce HBase
A12n/A11N
HDFS A12n/A11N
RDBMS
Architecting the Future of Big Data
Page 34
© Hortonworks Inc. 2011
Hive + HBase + Hadoop Security
• Regardless of Hive’s own security, for Hive to work on
secure Hadoop and HBase, we should:
– Obtain delegation tokens for Hadoop and HBase jobs
– Ensure to obey the storage level (HDFS, HBase) permission checks
– In HiveServer deployments, authenticate and impersonate the user
• Hadoop Summit
– June 13-14
– San Jose, California
– www.Hadoopsummit.org