Outline
Use Case & Motivation: Why introduce a new file format?
CarbonData File Format Deep Dive
Frameworks Integrated with CarbonData
Performance
Demo
Future Plan
Motivation
Three access patterns over the same wide table (columns C1–C7, rows R1–R9):
- OLAP-style query (multi-dimensional analysis)
- Sequential access (big scan)
- Random access (narrow scan)
Design Goals
CarbonData:
- Read-optimized columnar storage
- Leverages a multi-level index for low latency
- Supports column groups, to retain the benefits of row-based storage
- Enables dictionary encoding, allowing deferred decoding during aggregation
- Optimized streaming-ingestion support
- Broad integration across the Hadoop ecosystem
Outline
Use Case & Motivation: Why introduce a new file format?
CarbonData File Format Deep Dive
Frameworks Integrated with CarbonData
Performance
Demo
Future Plan
CarbonData File Format
Carbon Data File:
- Blocklet 1 … Blocklet N: each blocklet holds column chunks (Col1 Chunk, Col2 Chunk, …) and column-group chunks (ColGroup1 Chunk, ColGroup2 Chunk, …)
- File Footer:
  - Blocklet Index: Blocklet 1 … N index nodes; multi-dimensional index (startKey, endKey)
  - Blocklet Info: Blocklet 1 Info … Blocklet N Info
  - File Metadata: version, number of rows, Segment Info, Schema
Blocklet
- Data is sorted along the MDK (multi-dimensional key)
- Data is stored as an index, in columnar format
Blocklet Logical View

Source table:

Years  Quarters  Months  Territory  Country    Quantity  Sales
2003   QTR1      Jan     EMEA       Germany    142       11,432
2003   QTR1      Jan     APAC       China      541       54,702
2003   QTR1      Jan     EMEA       Spain      443       44,622
2003   QTR1      Feb     EMEA       Denmark    545       58,871
2003   QTR1      Feb     EMEA       Italy      675       56,181
2003   QTR1      Mar     APAC       India      52        9,749
2003   QTR1      Mar     EMEA       UK         570       51,018
2003   QTR1      Mar     Japan      Japan      561       55,245
2003   QTR2      Apr     APAC       Australia  525       50,398
2003   QTR2      Apr     EMEA       Germany    144       11,532

Encoding (each dimension is dictionary-encoded into a surrogate key, forming the MDK; [Quantity, Sales] are the measures):
[1,1,1,1,1] : [142,11432]
[1,1,1,3,2] : [541,54702]
[1,1,1,1,3] : [443,44622]
[1,1,2,1,4] : [545,58871]
[1,1,2,1,5] : [675,56181]
[1,1,3,3,6] : [52,9749]
[1,1,3,1,7] : [570,51018]

Sort (MDK Index):
[1,1,1,1,1] : [142,11432]
[1,1,1,1,3] : [443,44622]
[1,1,1,3,2] : [541,54702]
[1,1,2,1,4] : [545,58871]
[1,1,2,1,5] : [675,56181]
[1,1,3,1,7] : [570,51018]
[1,1,3,2,8] : [561,55245]
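The encode-and-sort pipeline above can be sketched in a few lines of plain Python. This is a simplified illustration, not CarbonData's loader; dictionary keys here are assigned in first-seen order, so the exact surrogate values may differ from the slide.

```python
# Dictionary-encode each dimension column into a surrogate key, form the
# multi-dimensional key (MDK), then sort rows by MDK before writing blocklets.

rows = [
    # (Years, Quarters, Months, Territory, Country, Quantity, Sales)
    ("2003", "QTR1", "Jan", "EMEA", "Germany", 142, 11432),
    ("2003", "QTR1", "Jan", "APAC", "China",   541, 54702),
    ("2003", "QTR1", "Jan", "EMEA", "Spain",   443, 44622),
    ("2003", "QTR1", "Feb", "EMEA", "Denmark", 545, 58871),
]

DIM_COUNT = 5  # the first five columns are dimensions, the rest are measures

def build_dictionaries(rows):
    """One dictionary per dimension: value -> surrogate key (1-based)."""
    dicts = [{} for _ in range(DIM_COUNT)]
    for row in rows:
        for d, value in enumerate(row[:DIM_COUNT]):
            dicts[d].setdefault(value, len(dicts[d]) + 1)
    return dicts

def encode(rows, dicts):
    """Replace dimension values by surrogate keys; keep measures as-is."""
    return [
        (tuple(dicts[d][v] for d, v in enumerate(row[:DIM_COUNT])),
         row[DIM_COUNT:])
        for row in rows
    ]

dicts = build_dictionaries(rows)
encoded = encode(rows, dicts)
encoded.sort(key=lambda kv: kv[0])   # sort along the MDK
```

After the sort, rows with a common MDK prefix (same year, quarter, month, …) sit next to each other, which is what makes the multi-dimensional index effective.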
Blocklet Index

The file footer carries one index entry per block:
- Block1: Start Key1, End Key1, C1(Min, Max) … C7(Min, Max)
- Block2: Start Key2, End Key2, C1(Min, Max) … C7(Min, Max)
- Block3: Start Key3, End Key3, C1(Min, Max) … C7(Min, Max)
- Block4: Start Key4, End Key4, C1(Min, Max) … C7(Min, Max)
The index tree's root spans (Start Key1, End Key4), so a scan can prune whole blocks by MDK range or by per-column min/max.
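The point of keeping per-block start/end keys and per-column (min, max) values in the footer is block pruning. A minimal sketch, with hypothetical footer entries:

```python
# Prune blocks using footer index entries: a block can be skipped when the
# query predicate falls outside its per-column (min, max) range.

blocks = [
    # hypothetical footer entries: (name, {column: (min, max)})
    ("Block1", {"C1": (1, 1), "C7": (1000, 12000)}),
    ("Block2", {"C1": (1, 2), "C7": (5000, 12000)}),
    ("Block3", {"C1": (1, 3), "C7": (1000, 20000)}),
    ("Block4", {"C1": (2, 3), "C7": (11000, 20000)}),
]

def prune(blocks, column, value):
    """Return only the blocks whose (min, max) range can contain `value`."""
    return [name for name, stats in blocks
            if stats[column][0] <= value <= stats[column][1]]

# A scan for C7 = 15000 touches only the blocks whose range covers 15000:
survivors = prune(blocks, "C7", 15000)
```

The same idea applies one level up with (startKey, endKey) on the MDK, which is how the narrow-scan "random access" case avoids reading most of the file.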
Blocklet Physical View (sort columns within each column chunk)

[Figure: each column chunk C1–C7 stores its values sorted as data (d) alongside an inverted index of row positions (r), so runs of equal values compress well while row order remains reconstructable]
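The "sort within column chunk" idea can be illustrated with a pair of toy functions (an illustrative sketch, not the actual reader/writer):

```python
# Store a column chunk as (sorted values, original row positions), so the
# chunk compresses well and can still be restored to row order on read.

def write_chunk(values):
    """Sort values, remembering each value's original row id."""
    pairs = sorted((v, rowid) for rowid, v in enumerate(values))
    data = [v for v, _ in pairs]      # sorted data ("d" in the figure)
    rowids = [r for _, r in pairs]    # inverted index ("r" in the figure)
    return data, rowids

def read_chunk(data, rowids):
    """Invert the permutation to restore the original row order."""
    out = [None] * len(data)
    for v, r in zip(data, rowids):
        out[r] = v
    return out

col = [3, 1, 2, 1, 3]
data, rowids = write_chunk(col)
assert data == [1, 1, 2, 3, 3]           # sorted run, friendly to RLE
assert read_chunk(data, rowids) == col   # round-trips back to row order
```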
Column Group
- Allows multiple columns to form a column group
- The group is stored as a single column chunk, in row-based format
- Suitable for a set of columns frequently fetched together
- Saves the stitching cost of reconstructing rows
[Figure: Blocklet 1 with Col1–Col5 stored together as one row-based column-group chunk, and Col6 as a separate column chunk]
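The stitching-cost argument can be made concrete with a toy comparison (illustrative values only):

```python
# A column group stores several columns together in row format inside one
# chunk, so frequently co-accessed columns come back as ready-made tuples
# instead of being stitched from separate column chunks.

col1 = [10, 50, 60, 68]
col2 = [23, 10, 29, 32]

# Plain columnar layout: each column is its own chunk; reading a row means
# pulling one value out of every chunk and stitching them together.
def read_row_columnar(chunks, rowid):
    return tuple(chunk[rowid] for chunk in chunks)

# Column-group layout: the group is one chunk of row tuples; a row read is
# a single lookup, no stitching.
group_chunk = list(zip(col1, col2))
```

With many grouped columns and wide scans, the per-row stitching loop is what the column-group layout eliminates.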
Structs & Arrays (nested types)

Array — logical view:
Name  Array<Ph_Number>
John  [192, 191]
Sam   [121, 345, 333]
Bob   [198, 787]

Stored flattened as a child column plus (start, len) offsets:
Name  Array[start, len]
John  0, 2
Sam   2, 3
Bob   5, 2
Ph_Number: 192, 191, 121, 345, 333, 198, 787

Struct — logical view:
Name  Info Struct<age, gender>
John  [31, M]
Sam   [45, F]
Bob   [16, M]

Stored flattened as one column per field:
Name  Info.age  Info.gender
John  31        M
Sam   45        F
Bob   16        M
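The (start, len) offset scheme above can be sketched as follows (hypothetical helper names):

```python
# Flatten an Array<T> column into a child value column plus (start, len)
# offsets in the parent, as in the layout above.

table = {"John": [192, 191], "Sam": [121, 345, 333], "Bob": [198, 787]}

def flatten(arrays):
    """Return parallel columns: names, (start, len) offsets, child values."""
    names, offsets, values = [], [], []
    for name, arr in arrays.items():
        names.append(name)
        offsets.append((len(values), len(arr)))  # where this array starts
        values.extend(arr)                       # child column, concatenated
    return names, offsets, values

def lookup(names, offsets, values, name):
    """Rebuild one array from the flattened layout."""
    start, length = offsets[names.index(name)]
    return values[start:start + length]

names, offsets, values = flatten(table)
```

Both the parent and child columns stay flat and columnar, so they benefit from the same encoding and indexing as scalar columns.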
Big Win:
- Speed up aggregation
- Reduce run-time memory footprint
- Enable deferred decoding
- Enable fast distinct count
Outline
Use Case & Motivation: Why introduce a new file format?
CarbonData File Format Deep Dive
Frameworks Integrated with CarbonData
Performance
Demo
Future Plan
CarbonData Modules
- Carbon-Spark: integration
- Carbon-Hadoop: Input/Output format
- Carbon-core: reader/writer
- Carbon-format: Thrift definition
Spark Integration
Query a CarbonData table via:
- DataFrame API
- Spark SQL statements
Spark Integration
- Table-level MDK tree index
- Query optimization
[Figure: a table is partitioned into blocks; each block contains blocklets plus a footer with its index; within a blocklet, column chunks C1–C9 carry an inverted index]
Data Ingestion

Bulk data ingestion:

df.write
  .format("org.apache.spark.CarbonSource")
  .options(Map("dbName" -> "db1", "tableName" -> "tbl1"))
  .mode(SaveMode.Overwrite)
  .save("/path")
Data Compaction
Outline
Use Case & Motivation: Why introduce a new file format?
CarbonData File Format Deep Dive
Frameworks Integrated with CarbonData
Performance
Demo
Future Plan
Performance Comparison

Carbon vs. popular columnar stores, across benchmark queries SQL1–SQL9:
- OLAP/interactive queries: 20x to 33x faster
- Random-access queries: 26x to 688x faster
[Chart: per-query response times in seconds for Carbon vs. popular columnar stores, SQL1–SQL9]
Outline
Motivation: Why introduce a new file format?
CarbonData File Format Deep Dive
Frameworks Integrated with CarbonData
Performance
Demo
Future Plan
Live Demo

High-throughput / full-scan query:
SELECT PROD_BRAND_NAME, SUM(STR_ORD_QTY)
FROM oscon_demo
GROUP BY PROD_BRAND_NAME;

OLAP/interactive query:
SELECT PROD_COLOR, SUM(STR_ORD_QTY)
FROM oscon_demo
WHERE CUST_COUNTRY = 'New Zealand'
  AND CUST_CITY = 'Auckland'
  AND PRODUCT_NAME = 'Huawei Honor 4X'
GROUP BY PROD_COLOR;

Random-access query:
SELECT *
FROM oscon_demo
WHERE CUST_PRFRD_FLG = 'Y'
  AND PROD_BRAND_NAME = 'Huawei'
  AND PROD_COLOR = 'BLACK'
  AND CUST_LAST_RVW_DATE = '2015-12-11 00:00:00'
  AND CUST_COUNTRY = 'New Zealand'
  AND CUST_CITY = 'Auckland'
  AND PRODUCT_NAME = 'Huawei Honor 4X';

Demo Environment:
Number of Nodes: 5 VMs (AWS r3.4xlarge)
vCPU: 80 (16/node)
Memory:
#Columns: 300
Data Size: 600 GB
#Records: 300M
Outline
Motivation: Why introduce a new file format?
CarbonData File Format Deep Dive
Frameworks Integrated with CarbonData
Performance
Demo
Future Plan
Future Plan
Community
Main Contributors:
- Jihong MA, Vimal, Raghu, Ramana, Ravindra, Vishal, Aniket, Liang Chenliang, Jacky Likun, Jarry Qiuheng, David Caiqiang, Eason Linyixin, Ashok, Sujith, Manish, Manohar, Shahid, Ravikiran, Naresh, Krishna, Babu, Ayush, Santosh, Zhangshunyu, Liujunjie, Zhujing (Huawei)
- Jean-Baptiste Onofre (Talend, ASF member), Henry Saputra (eBay, ASF member), Uma Maheswara Rao G (Intel, Hadoop PMC)
Thank you
www.huawei.com