Columnstore
Technical Deep Dive
Sunil Agarwal
Program Manager, SQL Server
sunila@Microsoft.com
Agenda
[Figure: survey of data warehouse size, today vs. in 3 years; size bands include 1-3 TB, 3-10 TB, More than 10 TB, and Don't Know]
- Scale more: DW systems continue to grow at a fast pace, and scalability is a key concern as systems grow from tens of TBs, to hundreds of TBs, to PBs.
- Performance at scale: the ability to analyze massive amounts of data while offering interactive query response.
- Data warehousing for the masses: drive down the price per TB.

Columnstore is designed to address these needs.
In-Memory Technologies
- In-Memory OLTP: applicable to transactional workloads (concurrent data entry, processing, and retrieval).
- In-Memory DW: applicable to decision support workloads (large scans and aggregates).
- SSD Bufferpool Extension: applicable to disk-based transactional workloads with a large working (data) set.
[Figure: a table with columns C1-C6 divided horizontally into row groups; each column within a row group forms a column segment]
- Improved compression: data from the same domain compresses better
- Reduced I/O: fetch only the columns needed
- Improved performance
- Segments are compressed
- Each segment is stored separately
- A segment is the unit of transfer between disk and memory
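One reason same-domain data compresses well is that a column segment often holds long runs of identical values, whereas a row layout interleaves those values with other columns. The sketch below illustrates this with run-length encoding on the OrderDateKey values of the deck's 12-row sample table. It is an illustration only: the function names are hypothetical, and SQL Server's actual segment encodings are not described in this deck.

```python
from itertools import groupby
from typing import List, Tuple


def run_length_encode(values: List[int]) -> List[Tuple[int, int]]:
    """Collapse each run of equal adjacent values into a (value, count) pair."""
    return [(value, len(list(run))) for value, run in groupby(values)]


def run_length_decode(runs: List[Tuple[int, int]]) -> List[int]:
    """Expand (value, count) pairs back into the original column."""
    return [value for value, count in runs for _ in range(count)]


# OrderDateKey values of the 12-row sample table, stored together as one column.
order_date_key = [20101107] * 5 + [20101108] * 4 + [20101109] * 3
encoded = run_length_encode(order_date_key)
print(f"{len(order_date_key)} values -> {len(encoded)} runs: {encoded}")
```

Twelve stored keys shrink to three (value, count) pairs because all values in the segment come from the same domain and sit next to each other.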
Sample sales fact table (columns OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount; RegionKey and Quantity values not shown):

OrderDateKey  ProductKey  StoreKey  SalesAmount
20101107      106         01        30.00
20101107      103         04        17.00
20101107      109         04        20.00
20101107      103         03        17.00
20101107      106         05        20.00
20101108      106         02        25.00
20101108      102         02        14.00
20101108      106         03        25.00
20101108      109         01        10.00
20101109      106         04        20.00
20101109      106         04        25.00
20101109      103         01        17.00
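The layout steps (split the table horizontally into row groups, then split each row group vertically into one segment per column) can be sketched on the sample rows. This is a minimal model, not the engine's code: the helper names are hypothetical, only the four recoverable columns are used, and a row group size of 6 stands in for the roughly 1M rows of a real row group.

```python
from typing import Dict, List, Tuple

# (OrderDateKey, ProductKey, StoreKey, SalesAmount) from the sample table.
Row = Tuple[int, int, str, float]
COLUMNS = ("OrderDateKey", "ProductKey", "StoreKey", "SalesAmount")

SAMPLE_ROWS: List[Row] = [
    (20101107, 106, "01", 30.00), (20101107, 103, "04", 17.00),
    (20101107, 109, "04", 20.00), (20101107, 103, "03", 17.00),
    (20101107, 106, "05", 20.00), (20101108, 106, "02", 25.00),
    (20101108, 102, "02", 14.00), (20101108, 106, "03", 25.00),
    (20101108, 109, "01", 10.00), (20101109, 106, "04", 20.00),
    (20101109, 106, "04", 25.00), (20101109, 103, "01", 17.00),
]


def split_into_row_groups(rows: List[Row], size: int) -> List[List[Row]]:
    """Horizontal partitioning: consecutive chunks of at most `size` rows."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def split_into_segments(row_group: List[Row]) -> Dict[str, List]:
    """Vertical partitioning: one column segment (list of values) per column."""
    return {name: [row[i] for row in row_group] for i, name in enumerate(COLUMNS)}


# Real row groups hold about 1M rows; 6 keeps this example readable.
row_groups = split_into_row_groups(SAMPLE_ROWS, size=6)
segments = [split_into_segments(group) for group in row_groups]

for group_id, group_segments in enumerate(segments):
    print(group_id, group_segments["OrderDateKey"])
```

Each entry of `segments` corresponds to one row group, and each list inside it is one column segment that could be compressed and moved between disk and memory on its own.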
[Figure: the sample table split horizontally into row groups (about 1M rows each in practice); here rows 1-6 form the first row group and rows 7-12 the second]
[Figure: each row group split vertically into one column segment per column: OrderDateKey, ProductKey, StoreKey, RegionKey, Quantity, SalesAmount]
9/29/16
[Figure: the column segments of both row groups, each compressed separately]
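Because each segment is compressed on its own, a segment with few distinct values can be stored as small integer ids into a per-segment dictionary. The sketch below applies generic dictionary encoding to the ProductKey segment of the first sample row group; it is an assumed, simplified technique for illustration, not the exact algorithm SQL Server applies, and the function names are hypothetical.

```python
from typing import Dict, List, Tuple


def dictionary_encode(segment: List[int]) -> Tuple[List[int], List[int]]:
    """Replace each value with an integer id into a sorted dictionary of distinct values."""
    dictionary = sorted(set(segment))
    id_of: Dict[int, int] = {value: i for i, value in enumerate(dictionary)}
    return dictionary, [id_of[value] for value in segment]


def dictionary_decode(dictionary: List[int], ids: List[int]) -> List[int]:
    """Look each id back up in the dictionary."""
    return [dictionary[i] for i in ids]


# ProductKey segment of the first row group in the sample table.
product_key_segment = [106, 103, 109, 103, 106, 106]
dictionary, ids = dictionary_encode(product_key_segment)
print(dictionary, ids)
```

With only three distinct keys, each id fits in 2 bits instead of a full integer key, which is where the space savings come from.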
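- Column elimination: only the columns a query references are read.
- Segment elimination: segments whose value range cannot match the predicate (for example, a predicate on OrderDateKey) are skipped.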
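Segment elimination relies on knowing each segment's value range without reading it. The sketch below assumes each segment records its minimum and maximum value (SQL Server keeps per-segment metadata of this kind, exposed as `min_data_id` and `max_data_id` in `sys.column_store_segments`); the `SegmentRange` class and helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SegmentRange:
    """Hypothetical per-segment metadata: the min and max value stored in it."""
    row_group_id: int
    min_value: int
    max_value: int


def describe(row_group_id: int, segment: List[int]) -> SegmentRange:
    """Record the value range of one segment."""
    return SegmentRange(row_group_id, min(segment), max(segment))


def row_groups_to_scan(ranges: List[SegmentRange], low: int, high: int) -> List[int]:
    """Keep only row groups whose [min, max] overlaps the predicate range [low, high]."""
    return [r.row_group_id for r in ranges if r.max_value >= low and r.min_value <= high]


# OrderDateKey segments of the two sample row groups.
order_date_ranges = [
    describe(0, [20101107] * 5 + [20101108]),
    describe(1, [20101108] * 3 + [20101109] * 3),
]

# WHERE OrderDateKey = 20101109: row group 0 can be skipped without reading it.
print(row_groups_to_scan(order_date_ranges, 20101109, 20101109))
```

A value that falls in both ranges (20101108) still forces both row groups to be read, so elimination works best when data arrives roughly ordered on the filtered column.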
[Figure: batch mode processing, with a batch of column vectors (C1, C2, C3) and a bitmap of qualifying rows]
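In the batch diagram, operators work on a batch of column vectors at once and mark surviving rows in a bitmap instead of copying them. The sketch below models that idea on the second sample row group; it is a simplified, hypothetical model (batch contents, function names, and the single-batch scope are assumptions), not SQL Server's batch-mode data structures.

```python
from typing import Callable, Dict, List

# One batch: a vector of values per column (second sample row group).
batch: Dict[str, List] = {
    "ProductKey": [102, 106, 109, 106, 106, 103],
    "SalesAmount": [14.00, 25.00, 10.00, 20.00, 25.00, 17.00],
}


def qualify(column: List, predicate: Callable) -> List[bool]:
    """Evaluate a filter over a whole column vector, producing a bitmap of qualifying rows."""
    return [predicate(value) for value in column]


def sum_where(column: List[float], bitmap: List[bool]) -> float:
    """Aggregate only the rows whose bit is set; no rows are copied or rearranged."""
    return sum(value for value, keep in zip(column, bitmap) if keep)


# Roughly: SELECT SUM(SalesAmount) ... WHERE ProductKey = 106, over one batch.
bitmap = qualify(batch["ProductKey"], lambda key: key == 106)
total = sum_where(batch["SalesAmount"], bitmap)
print(bitmap, total)
```

The filter touches only the ProductKey vector and the aggregate touches only the SalesAmount vector, so each step runs a tight loop over one column.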
Agenda
- Trends in Data Warehousing Space
- How Does Columnstore Work?
- What's New in Columnstore?
- Demo
- In Summary
Gaps in the SQL Server 2012 columnstore index:
- No DML support; no updates (data refresh)
- Only secondary (nonclustered) columnstore indexes supported
- Poor memory management: Resource Governor was not honored during index build/rebuild or at run time
- No spilling for batch-mode hash joins
- Limited data type support
- Limited batch operations supported
[Figure: Space Used** comparison, showing 91% savings]
- Saves space
- Simplifies management: no secondary indexes to maintain
- DDL supported: evolve your schema design as needed
** Space Used = Table space + Index space
[Figure: column store (C1-C6) alongside a delta (row) store; the tuple mover moves rows from the delta store into the column store]
- DELETE: a logical operation; data is physically removed only after a REBUILD operation is performed.
- UPDATE: a DELETE followed by an INSERT.
- BULK INSERT: if the batch is smaller than 100k rows, inserts go into the delta store; otherwise they go into the columnstore.
- SELECT: unifies data from the column and row stores through an internal UNION operation.
Note: Distinct aggregates and UNION operators continue to be executed in row mode.
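The DML rules above can be sketched as a toy model. `ToyColumnstore` and its methods are hypothetical; the model assumes a single integer column, tags each stored row with an internal id so logical deletes can be tracked, and simplifies the tuple mover to "compress everything in the delta store now". It mirrors the rules as stated (logical DELETE, UPDATE as DELETE plus INSERT, the 100k BULK INSERT threshold, SELECT as a union) and is not SQL Server's implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterator, List, Set, Tuple

StoredRow = Tuple[int, int]  # (internal row id, value); hypothetical layout

# Per the slide: bulk batches of fewer than ~100k rows go to the delta store.
BULK_INSERT_THRESHOLD = 100_000


@dataclass
class ToyColumnstore:
    """Hypothetical model of an updatable columnstore, not SQL Server's implementation."""
    row_groups: List[List[StoredRow]] = field(default_factory=list)  # column store side
    delta_store: List[StoredRow] = field(default_factory=list)       # row store side
    deleted_ids: Set[int] = field(default_factory=set)               # logical deletes
    next_id: int = 0

    def _assign_ids(self, values: List[int]) -> List[StoredRow]:
        rows = [(self.next_id + i, v) for i, v in enumerate(values)]
        self.next_id += len(values)
        return rows

    def _stored(self) -> Iterator[StoredRow]:
        for group in self.row_groups:
            yield from group
        yield from self.delta_store

    def bulk_insert(self, values: List[int]) -> None:
        rows = self._assign_ids(values)
        if len(rows) < BULK_INSERT_THRESHOLD:
            self.delta_store.extend(rows)
        else:
            self.row_groups.append(rows)

    def select(self) -> List[int]:
        # Internal UNION of the column store and the delta store, minus deleted rows.
        return [v for rid, v in self._stored() if rid not in self.deleted_ids]

    def delete_where(self, predicate: Callable[[int], bool]) -> List[int]:
        # Logical delete: rows are only marked; storage is untouched until rebuild().
        victims = [(rid, v) for rid, v in self._stored()
                   if rid not in self.deleted_ids and predicate(v)]
        self.deleted_ids.update(rid for rid, _ in victims)
        return [v for _, v in victims]

    def update_where(self, predicate: Callable[[int], bool],
                     transform: Callable[[int], int]) -> None:
        # UPDATE = DELETE followed by INSERT; new versions land in the delta store here.
        old_values = self.delete_where(predicate)
        self.delta_store.extend(self._assign_ids([transform(v) for v in old_values]))

    def tuple_mover(self) -> None:
        # Simplified: move all delta store rows into a new column store row group.
        if self.delta_store:
            self.row_groups.append(self.delta_store)
            self.delta_store = []

    def rebuild(self) -> None:
        # Deleted rows are physically removed only when the index is rebuilt.
        live = [(rid, v) for rid, v in self._stored() if rid not in self.deleted_ids]
        self.row_groups = [live] if live else []
        self.delta_store = []
        self.deleted_ids = set()


store = ToyColumnstore()
store.bulk_insert([-1, -2, -3])                        # small batch: delta store
store.bulk_insert(list(range(BULK_INSERT_THRESHOLD)))  # large batch: column store
store.update_where(lambda v: v == -2, lambda v: v * 100)
store.delete_where(lambda v: v == -1)
print(len(store.select()), len(store.delta_store), len(store.deleted_ids))
```

Until `rebuild()` runs, deleted rows still occupy storage, which is why the slide ties physical removal to REBUILD.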
[Figure: query performance, Row Store vs. Column Store, with improvement factor]
1. COLUMNSTORE compression
- Default compression when creating a table with a clustered columnstore index
- Typical customer workloads get 5-7x compression ratios
2. ARCHIVAL compression
[Figure: compression ratios for TPCH, TPCDS, Customer 1, and Customer 2]
- Enables an additional 30% compression for the whole table and/or chosen partitions, at the cost of CPU overhead
- You can go back and forth between COLUMNSTORE and COLUMNSTORE_ARCHIVE compression
- sys.partitions exposes compression information (data_compression 3 = COLUMNSTORE, 4 = COLUMNSTORE_ARCHIVE)
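A rough sizing calculation helps put the two compression levels together. The sketch below assumes that "an additional 30%" means the archived data takes 30% less space than the COLUMNSTORE-compressed data, and uses a hypothetical 700 GB table at the high end of the typical 5-7x range; `estimate_sizes` is a hypothetical helper, and the `DATA_COMPRESSION` map simply decodes the two sys.partitions values listed above.

```python
from typing import Dict, Tuple

# data_compression values reported by sys.partitions for columnstore data.
DATA_COMPRESSION: Dict[int, str] = {3: "COLUMNSTORE", 4: "COLUMNSTORE_ARCHIVE"}


def estimate_sizes(uncompressed_gb: float, columnstore_ratio: float,
                   archive_extra_savings: float = 0.30) -> Tuple[float, float]:
    """Return rough (COLUMNSTORE, COLUMNSTORE_ARCHIVE) sizes in GB.

    Assumes archive compression uses `archive_extra_savings` less space than
    COLUMNSTORE compression; real ratios depend heavily on the data.
    """
    columnstore_gb = uncompressed_gb / columnstore_ratio
    archive_gb = columnstore_gb * (1 - archive_extra_savings)
    return columnstore_gb, archive_gb


# Hypothetical 700 GB table compressed 7x.
cs_gb, archive_gb = estimate_sizes(700, 7)
print(f"{DATA_COMPRESSION[3]}: ~{cs_gb:.0f} GB, {DATA_COMPRESSION[4]}: ~{archive_gb:.0f} GB")
```

The extra savings come with the CPU overhead noted above, so archival compression generally suits rarely queried partitions.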
Index rebuild: re-creates the clustered columnstore index completely.
Reorganize: forces delta store processing (the tuple mover compresses closed delta store row groups into the columnstore).
sys.column_store_segments
- Visibility into columnstore segments
- Use this catalog view to determine the quality of a clustered columnstore index:
  o If segments contain fewer than 1M rows, investigate why
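The "fewer than 1M rows" check can be automated once the segment rows have been fetched. The sketch below assumes the caller has already run a query such as `SELECT column_id, segment_id, row_count FROM sys.column_store_segments` and passes the results in; `SegmentInfo` and `undersized` are hypothetical names, and the sample rows are made up for illustration.

```python
from typing import List, NamedTuple


class SegmentInfo(NamedTuple):
    """A few columns of sys.column_store_segments, as fetched by the caller."""
    column_id: int
    segment_id: int
    row_count: int


# A full row group holds at most 1,048,576 rows (the slide rounds this to 1M).
FULL_ROW_GROUP = 1_048_576


def undersized(segments: List[SegmentInfo],
               threshold: int = FULL_ROW_GROUP) -> List[SegmentInfo]:
    """Return segments with fewer rows than `threshold`: candidates to investigate."""
    return [s for s in segments if s.row_count < threshold]


# Hypothetical query results.
sample = [
    SegmentInfo(column_id=1, segment_id=0, row_count=1_048_576),
    SegmentInfo(column_id=1, segment_id=1, row_count=250_000),
]
for seg in undersized(sample):
    print(f"column {seg.column_id}, segment {seg.segment_id}: {seg.row_count} rows")
```

Small segments mean more per-segment overhead and weaker compression, which is why the slide suggests investigating them.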
Demo
Resources
Track resources
- Download Microsoft SQL Server 2014: http://www.trySQLSever.com
Learning: www.microsoft.com/learning
- Sessions on Demand: http://channel9.msdn.com/Events/TechEd
- TechNet, Resources for IT Professionals: http://microsoft.com/technet
- MSDN, Resources for Developers: http://microsoft.com/msdn
© 2014 Microsoft Corporation. All rights reserved. Microsoft, Windows, and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be
interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR
STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.