
DBI-B411

Columnstore
Technical Deep Dive
Sunil Agarwal
Program Manager, SQL Server
sunila@Microsoft.com

Agenda

Trends In Data Warehousing Space


Columnstore architecture
New for ColumnStore in SQL 2014
Where to learn more

Trends in the Data Warehousing Space


Approximate data volume managed by DW
[Chart: percentages for each data volume band (Less than 1 TB, 1 - 3 TB, 3 - 10 TB, More than 10 TB, Don't Know), today vs. in 3 years; horizontal axis 0% to 50%.]

Source: TDWI Report Next Generation DW

Scale more: DW systems continue to grow at a fast pace; scalability is a key concern when growing a system from 10s of TBs, to 100s of TBs, to PBs.
Performance at scale: ability to analyze massive amounts of data while offering interactive query response.
Data warehousing for masses: drive down price per TB.

Columnstore is designed to address the above needs.

In-Memory Technologies

In-Memory OLTP
5-20X performance gain for OLTP, integrated into SQL Server
Applicable to transactional workloads: concurrent data entry, processing and retrieval

In-Memory DW
5-25X performance gain and high data compression
Updatable and clustered
Applicable to decision support workloads: large scans and aggregates

SSD Bufferpool Extension
4-10X of RAM and up to 3X performance gain, transparently for apps
Applicable to disk-based transactional workloads: large working (data) set

ColumnStore - How is it different?

[Diagram: data stored as rows vs. data stored as columns (C1, C2, C3, C4, C5)]

Improved compression:
Data from same domain compress better

Reduced I/O:
Fetch only columns needed

Improved Performance:
More data fits in memory
Optimized for CPU utilization
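
The "fetch only columns needed" point can be illustrated with a small Python sketch. It is a toy model, not SQL Server's storage format: the three-column table, the list/dict layouts and the function names are all assumptions made for illustration.

```python
# Hypothetical three-column table: (OrderDateKey, ProductKey, SalesAmount).
rows = [
    (20101107, 106, 30.00),
    (20101107, 103, 17.00),
    (20101108, 106, 25.00),
]

# Row layout: the values of one record are stored together.
row_store = list(rows)

# Column layout: the values of one column are stored together.
column_store = {
    "OrderDateKey": [r[0] for r in rows],
    "ProductKey": [r[1] for r in rows],
    "SalesAmount": [r[2] for r in rows],
}


def total_sales_row_store(store):
    """Sum SalesAmount from the row layout; every value of every row is read."""
    values_read = sum(len(r) for r in store)
    return sum(r[2] for r in store), values_read


def total_sales_column_store(store):
    """Sum SalesAmount from the column layout; only one column is read."""
    column = store["SalesAmount"]
    return sum(column), len(column)


row_total, row_values_read = total_sales_row_store(row_store)
col_total, col_values_read = total_sales_column_store(column_store)
```

Both layouts give the same total, but in this sketch the column layout reads 3 values where the row layout reads 9.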

Columnstore Index Terminology


Row Group
Set of rows (typically 1 million)

Column Segment
Contains values from one column from the row group
Segments are compressed
Each segment stored separately
Segment is unit of transfer between disk and memory

[Diagram: a row group spanning columns C1-C6, with one column segment highlighted]

ColumnStore Index - Example


Table columns: OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount (RegionKey and Quantity values not shown)

OrderDateKey  ProductKey  StoreKey  SalesAmount
20101107      106         01        30.00
20101107      103         04        17.00
20101107      109         04        20.00
20101107      103         03        17.00
20101107      106         05        20.00
20101108      106         02        25.00
20101108      102         02        14.00
20101108      106         03        25.00
20101108      109         01        10.00
20101109      106         04        20.00
20101109      106         04        25.00
20101109      103         01        17.00

Step-1: Horizontally Partition (create Row Groups)

Each row group holds ~1M rows (six rows per group in this example; RegionKey and Quantity values not shown).

Row group 1
OrderDateKey  ProductKey  StoreKey  SalesAmount
20101107      106         01        30.00
20101107      103         04        17.00
20101107      109         04        20.00
20101107      103         03        17.00
20101107      106         05        20.00
20101108      106         02        25.00

Row group 2
OrderDateKey  ProductKey  StoreKey  SalesAmount
20101108      102         02        14.00
20101108      106         03        25.00
20101108      109         01        10.00
20101109      106         04        20.00
20101109      106         04        25.00
20101109      103         01        17.00
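
A minimal Python sketch of Step-1, assuming rows are simply cut into consecutive groups of a fixed maximum size. The function name `make_row_groups` and the group size of 6 (standing in for ~1M) are illustrative choices, and only the four columns whose values appear in the example are used.

```python
def make_row_groups(rows, max_rows_per_group):
    """Split a list of rows into consecutive row groups of bounded size."""
    return [
        rows[start:start + max_rows_per_group]
        for start in range(0, len(rows), max_rows_per_group)
    ]


# (OrderDateKey, ProductKey, StoreKey, SalesAmount) from the example table.
sales = [
    (20101107, 106, "01", 30.00),
    (20101107, 103, "04", 17.00),
    (20101107, 109, "04", 20.00),
    (20101107, 103, "03", 17.00),
    (20101107, 106, "05", 20.00),
    (20101108, 106, "02", 25.00),
    (20101108, 102, "02", 14.00),
    (20101108, 106, "03", 25.00),
    (20101108, 109, "01", 10.00),
    (20101109, 106, "04", 20.00),
    (20101109, 106, "04", 25.00),
    (20101109, 103, "01", 17.00),
]

# Six rows per group here stands in for the ~1M rows of a real row group.
row_groups = make_row_groups(sales, max_rows_per_group=6)
```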

Step-2: Vertically Partition (create Segments)

Within each row group, the values of each column form one segment (RegionKey and Quantity values not shown).

Row group 1
OrderDateKey: 20101107, 20101107, 20101107, 20101107, 20101107, 20101108
ProductKey:   106, 103, 109, 103, 106, 106
StoreKey:     01, 04, 04, 03, 05, 02
SalesAmount:  30.00, 17.00, 20.00, 17.00, 20.00, 25.00

Row group 2
OrderDateKey: 20101108, 20101108, 20101108, 20101109, 20101109, 20101109
ProductKey:   102, 106, 109, 106, 106, 103
StoreKey:     02, 03, 01, 04, 04, 01
SalesAmount:  14.00, 25.00, 10.00, 20.00, 25.00, 17.00
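
Step-2 can be sketched in Python as a transpose of one row group into per-column lists. The `COLUMNS` tuple and `to_segments` are hypothetical names, and a plain list stands in for a segment; nothing here reflects the actual on-disk format.

```python
COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "SalesAmount")


def to_segments(row_group):
    """Turn one row group into a mapping of column name -> segment (list of values)."""
    return {
        name: [row[position] for row in row_group]
        for position, name in enumerate(COLUMNS)
    }


# First row group of the example (RegionKey and Quantity left out).
row_group_1 = [
    (20101107, 106, "01", 30.00),
    (20101107, 103, "04", 17.00),
    (20101107, 109, "04", 20.00),
    (20101107, 103, "03", 17.00),
    (20101107, 106, "05", 20.00),
    (20101108, 106, "02", 25.00),
]

segments = to_segments(row_group_1)
```

Each entry of `segments` can then be compressed and stored on its own, which is what makes it the unit of transfer between disk and memory.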

Step-3: Compress Each Segment


[Diagram: the segments from Step-2 for both row groups, each compressed independently]

Some segments will compress more than others
* Encoding and reordering not shown
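
To see why some segments compress more than others, here is a Python sketch using run-length encoding. RLE is chosen only as a simple stand-in; the slides do not show which encodings SQL Server applies, and the function names are illustrative.

```python
from itertools import groupby


def run_length_encode(segment):
    """Collapse consecutive repeated values into (value, run_length) pairs."""
    return [(value, len(list(run))) for value, run in groupby(segment)]


def run_length_decode(pairs):
    """Expand (value, run_length) pairs back into the original segment."""
    return [value for value, count in pairs for _ in range(count)]


# Two segments from the first row group of the example.
order_date_key = [20101107, 20101107, 20101107, 20101107, 20101107, 20101108]
product_key = [106, 103, 109, 103, 106, 106]

encoded_dates = run_length_encode(order_date_key)
encoded_products = run_length_encode(product_key)
```

The OrderDateKey segment collapses to two pairs because its values arrive in long runs, while the ProductKey segment still needs five pairs. Reordering rows to lengthen runs is the kind of step the slide marks as not shown.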

Query Processing - Read The Data Needed

SELECT ProductKey, SUM(SalesAmount)
FROM SalesTable
WHERE OrderDateKey < 20101108
GROUP BY ProductKey

Column Elimination: only the segments of the columns the query references (OrderDateKey, ProductKey, SalesAmount) are read; StoreKey, RegionKey and Quantity are skipped.

Segment Elimination: the second row group holds only OrderDateKey values 20101108 and 20101109, so none of its rows can satisfy the filter and its segments are skipped.
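
The two eliminations can be sketched in Python for the example query. This assumes each row group keeps minimum and maximum OrderDateKey values as metadata; the dictionary layout, the `min_date`/`max_date` keys and the function name are hypothetical.

```python
# Each row group: per-column segments plus min/max metadata for OrderDateKey.
row_groups = [
    {
        "min_date": 20101107,
        "max_date": 20101108,
        "segments": {
            "OrderDateKey": [20101107] * 5 + [20101108],
            "ProductKey": [106, 103, 109, 103, 106, 106],
            "StoreKey": ["01", "04", "04", "03", "05", "02"],
            "SalesAmount": [30.00, 17.00, 20.00, 17.00, 20.00, 25.00],
        },
    },
    {
        "min_date": 20101108,
        "max_date": 20101109,
        "segments": {
            "OrderDateKey": [20101108] * 3 + [20101109] * 3,
            "ProductKey": [102, 106, 109, 106, 106, 103],
            "StoreKey": ["02", "03", "01", "04", "04", "01"],
            "SalesAmount": [14.00, 25.00, 10.00, 20.00, 25.00, 17.00],
        },
    },
]


def sum_sales_by_product(groups, date_upper_bound):
    """SUM(SalesAmount) GROUP BY ProductKey WHERE OrderDateKey < date_upper_bound."""
    totals = {}
    segments_read = 0
    for group in groups:
        # Segment elimination: no row in this group can pass the filter.
        if group["min_date"] >= date_upper_bound:
            continue
        # Column elimination: only the three referenced columns are touched.
        dates = group["segments"]["OrderDateKey"]
        products = group["segments"]["ProductKey"]
        amounts = group["segments"]["SalesAmount"]
        segments_read += 3
        for date, product, amount in zip(dates, products, amounts):
            if date < date_upper_bound:
                totals[product] = totals.get(product, 0.0) + amount
    return totals, segments_read


totals, segments_read = sum_sales_by_product(row_groups, 20101108)
```

In this sketch only 3 of the 8 stored segments are read: the second row group is skipped from its metadata alone, and StoreKey is never touched.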

Multi-Row Batch Processing

Motivation:
Column store significantly reduces I/O required.
Next bottleneck is CPU usage.
Batch processing addresses CPU usage.

Batch object:
Column vectors (C1, C2, C3) plus a bitmap of qualifying rows.

Functionality:
Batch = columnar format + filter vector.
Moving set of rows - batch (~900 rows).
Batch moved between iterators.
Near-zero data copying with slight batch updates.
# of function calls reduced by orders of magnitude.
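
A minimal Python sketch of the batch idea: column vectors plus a vector of qualifying-row flags, passed from one operator to the next. The `Batch` class and the two operator functions are hypothetical names, a list of booleans stands in for the bitmap, and the three-row batch stands in for ~900 rows.

```python
from dataclasses import dataclass


@dataclass
class Batch:
    columns: dict     # column name -> list of values (the column vectors)
    qualifying: list  # one flag per row (the bitmap of qualifying rows)


def filter_batch(batch, column, predicate):
    """Filter operator: clear flags of rows that fail; column data is not copied."""
    for index, value in enumerate(batch.columns[column]):
        if batch.qualifying[index] and not predicate(value):
            batch.qualifying[index] = False
    return batch


def sum_batch(batch, column):
    """Aggregate operator: sum a column over the qualifying rows only."""
    return sum(
        value
        for value, keep in zip(batch.columns[column], batch.qualifying)
        if keep
    )


batch = Batch(
    columns={
        "OrderDateKey": [20101107, 20101107, 20101108],
        "SalesAmount": [30.00, 17.00, 25.00],
    },
    qualifying=[True, True, True],
)

# Each operator is invoked once per batch rather than once per row.
filter_batch(batch, "OrderDateKey", lambda date: date < 20101108)
total = sum_batch(batch, "SalesAmount")
```

The filter only flips flags, so the same column vectors flow on to the aggregate unchanged, which is the "near-zero data copying" point above.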


Agenda
Trends In Data Warehousing Space
How Does Columnstore Work?
What's New In Columnstore?
Demo
In Summary

Motivation for SQL Server 2014


SQL Server 2012 columnstore functionality:

Improvements:
Non-clustered columnstore indexes.
Improved compression, compared to ROW/PAGE compression.
Improved query performance

Gaps:
No DML support, no updates (data refresh)
Only secondary, non-clustered, columnstore indexes supported
Poor memory management (resource governor was not honored; index build/re-build; run-time)
No batch hash join spilling
Limited data types support
Limited batch operations supported

Goals for new columnstore functionality:


Competitive load performance and efficient index creation
Leading compression ratios and competitive query performance
Functional parity with row store, as much as possible

Clustered Columnstore Index


Why is clustered index important?
Saves space
Simplifies management: no secondary indexes to maintain

[Chart: Space Used in GB (101 million row table), vertical axis 0.0 to 20.0, annotated "91% savings". ** Space Used = Table space + Index space]

Columnstore (and the clustered columnstore index) is the PREFERRED storage engine for DW scenarios
We encourage users to either move existing tables to CCI, or start using CCI for new tables

Additional data types are supported
High precision decimal, datetimeoffset, binary, varbinary, uniqueidentifier, etc.
Unsupported types: spatial, XML, max types

DDL supported
Evolve your schema design as needed

Updatable Columnstore Index

Table consists of column store and row store
[Diagram: Column Store (columns C1-C6) and Delta (row) store, connected by the tuple mover]

DML (update, delete, insert) operations leverage delta store

INSERT Values
Always lands into delta store

DELETE
Logical operation
Data physically removed after REBUILD operation is performed.

UPDATE
DELETE followed by INSERT.

BULK INSERT
If batch < 100k rows, inserts go into delta store, otherwise columnstore

SELECT
Unifies data from Column and Row stores - internal UNION operation.

Tuple mover converts data into columnar format once delta store is full (~1M rows)
REORGANIZE statement converts delta store into columnar storage.
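
The interplay of delta store, logical deletes and the tuple mover can be modelled with a small Python class. This is a toy sketch under stated assumptions, not the engine's design: the class and method names are hypothetical, a tiny `delta_capacity` stands in for ~1M rows, and removing matching rows directly from the delta store is an assumption of the sketch.

```python
class UpdatableColumnstore:
    """Toy model: columnar row groups plus one open delta (row) store."""

    def __init__(self, delta_capacity):
        self.delta_capacity = delta_capacity  # stands in for ~1M rows
        self.delta_store = []                 # rows kept in row format
        self.row_groups = []                  # each group: list of column lists
        self.deleted = set()                  # (group index, row index) marks

    def insert(self, row):
        """A single-row INSERT always lands in the delta store."""
        self.delta_store.append(row)

    def tuple_mover(self):
        """Convert the delta store to columnar format once it is full."""
        if len(self.delta_store) >= self.delta_capacity:
            self.row_groups.append([list(col) for col in zip(*self.delta_store)])
            self.delta_store = []

    def delete_where(self, predicate):
        """Logical delete: columnstore rows are only marked, not removed."""
        for g, group in enumerate(self.row_groups):
            for r, row in enumerate(zip(*group)):
                if predicate(row):
                    self.deleted.add((g, r))
        # Assumption of this sketch: delta-store rows are removed directly.
        self.delta_store = [row for row in self.delta_store if not predicate(row)]

    def select_all(self):
        """Union of unmarked columnstore rows and delta-store rows."""
        rows = [
            row
            for g, group in enumerate(self.row_groups)
            for r, row in enumerate(zip(*group))
            if (g, r) not in self.deleted
        ]
        return rows + list(self.delta_store)


table = UpdatableColumnstore(delta_capacity=3)
for new_row in [(20101107, 106, 30.0), (20101107, 103, 17.0), (20101108, 106, 25.0)]:
    table.insert(new_row)
table.tuple_mover()                       # delta store is full: becomes a row group
table.insert((20101109, 103, 17.0))       # lands in a fresh delta store
table.delete_where(lambda row: row[1] == 103)
result = table.select_all()
```

An UPDATE in this model would be a `delete_where` followed by an `insert`, matching the "DELETE followed by INSERT" description above.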

Improved Query Performance

Batch hash join spilling implemented.

Mixed mode (row and batch) query execution
The presence of row-mode operators does not prevent other operators from executing in batch mode

Additional batch operators:
joins (inner, outer)
partial/global aggregates with and without GROUP BY
UNION ALL operator

Note:
Distinct aggregates and UNION operators continue to be executed in row mode.

Columnstore Performance Benefits


[Chart: Row Store vs. Column Store. Response Time (s) and Improvement Factor for Power Metric and Total Execution Time.]

Columnstore with Competitive Compression


Table compression options:
DATA_COMPRESSION = { NONE | ROW | PAGE | COLUMNSTORE | COLUMNSTORE_ARCHIVE }

1. COLUMNSTORE Compression
Default compression when creating a table with Clustered Columnstore Index
Typical customer workloads get 5-7x compression ratios

2. ARCHIVAL Compression
Enables additional 30% compression for whole table and/or chosen partitions, with CPU overhead.
Can switch back and forth between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression.
sys.partitions exposes compression info (3 = columnstore, 4 = columnstore_archive)

Compression ratios **
TPCH        3.1X
TPCDS       2.8X
Customer 1  8X
Customer 2  5.5X
** compression measured against raw data file

Columnstore Index: TSQL Commands


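Index Build:
Creates clustered columnstore index.
(The index name cci_SalesTable below is illustrative.)

-- from HEAP
CREATE CLUSTERED COLUMNSTORE INDEX cci_SalesTable ON SalesTable;
-- from CI (the name must match the existing clustered index)
CREATE CLUSTERED COLUMNSTORE INDEX cci_SalesTable ON SalesTable WITH (DROP_EXISTING = ON);

Index Rebuild:
Re-creates clustered columnstore index completely.

ALTER TABLE SalesTable REBUILD;
ALTER INDEX cci_SalesTable ON SalesTable REBUILD;
CREATE CLUSTERED COLUMNSTORE INDEX cci_SalesTable ON SalesTable WITH (DROP_EXISTING = ON);

Reorganize:
Forces delta store operation.

-- compresses closed row groups
ALTER INDEX cci_SalesTable ON SalesTable REORGANIZE;
-- compresses all row groups
ALTER INDEX cci_SalesTable ON SalesTable REORGANIZE WITH (COMPRESS_ALL_ROW_GROUPS = ON);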

Columnstore Index: DMVs


sys.column_store_row_groups
Visibility into all columnstore row groups (in columnar + delta store)
Use this DMV to determine number of delta stores
Notes:
o Every partition has at least one delta store
o Each partition can have multiple delta stores

sys.column_store_segments
Visibility into columnstore segments
Use this DMV to determine quality of clustered columnstore index:
o If segments contain <1M rows, investigate why


Columnstore Index: Data Load


Loading performance is comparable to loading into a CI (actually, load into a CCI is a bit faster)
Load data directly into CCI (presort data file if possible)

Demo


Track resources
Download Microsoft SQL Server 2014
http://www.trySQLSever.com

Try out Power BI for Office 365!


http://www.powerbi.com

Sign up for Microsoft HDInsight today!


http://microsoft.com/bigdata

Resources
Learning
Sessions on Demand

http://channel9.msdn.com/Events/TechEd

TechNet
Resources for IT Professionals

http://microsoft.com/technet

Microsoft Certification & Training Resources

www.microsoft.com/learning

msdn
Resources for Developers

http://microsoft.com/msdn

Complete an evaluation and enter to win!


© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.
