
Designing and Tuning High Speed Data Loading

Thomas Kejser
Principal Program Manager
tkejser@microsoft.com

Agenda
Tuning Methodology
Bulk Load API Basics
Design Patterns and Techniques
  Parallelism
  Table Layout
Tuning the SQL Server Engine
Tuning the Network Stack
Tuning Integration Services
Tuning ETL and ELT

Tuning Methodology

The Tuning Loop
Agree on targets for optimization
Get a baseline
Generate a hypothesis
Make one small change at a time
Measure
  Actual runtime
  CPU, memory, I/O
Save the result
Repeat

The greedy tuner:
Measure
Change
Measure again
Tune it till it breaks, then fix it, so you can break it again
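The loop above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the author's tooling: the `Config` type, the knob names and the `measure` callback are hypothetical, and `measure` stands in for a real timed run of the load.

```python
from typing import Callable, Dict, List, Tuple

Config = Dict[str, int]  # hypothetical: tuning knob name -> value


def greedy_tune(
    baseline: Config,
    candidates: List[Tuple[str, int]],
    measure: Callable[[Config], float],
) -> Tuple[Config, float]:
    """Try one small change at a time and keep it only if the measured runtime drops."""
    best = dict(baseline)
    best_time = measure(best)  # get a baseline
    for knob, value in candidates:  # one hypothesis per pass through the loop
        trial = dict(best)
        trial[knob] = value  # make one small change
        elapsed = measure(trial)  # measure it
        if elapsed < best_time:  # save the result only if it helped
            best, best_time = trial, elapsed
    return best, best_time


if __name__ == "__main__":
    # Toy stand-in for a real load: more streams and bigger packets run faster.
    def fake_load(cfg: Config) -> float:
        return 100.0 / cfg["streams"] + (5.0 if cfg["packet_size"] < 32767 else 0.0)

    print(greedy_tune(
        {"streams": 1, "packet_size": 4096},
        [("streams", 4), ("streams", 2), ("packet_size", 32767)],
        fake_load,
    ))
```

Each candidate is judged against the best configuration found so far, which mirrors "make one small change at a time" rather than changing several knobs at once.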

Tools of the Trade - Windows
Perfmon
  Logical Disk
  Memory
  Processor
  Process (specifically the DTEXEC process)
  Network Interface
Task Manager
WinDbg
KernRate

Tools of the Trade - SQL Server
sys.dm_os_wait_stats
  All my tuning starts here
  Get familiar with common wait types
sys.dm_os_latch_stats
  Allows a deep dive into LATCH_<X> waits
sys.dm_os_spinlock_stats
  When too much CPU seems to be spent
sys.dm_io_virtual_file_stats
  Because I/O systems are rarely perfect
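The counters in sys.dm_os_wait_stats accumulate from the moment the instance starts (or since they were last cleared), so one way to see what a single load waits on is to snapshot wait_time_ms before and after the run and rank the differences. A minimal Python sketch, assuming the two snapshots have already been fetched into dictionaries keyed by wait_type; the helper name `top_waits` is hypothetical.

```python
from typing import Dict, List, Tuple

# The query whose results you would capture before and after the load.
WAIT_STATS_QUERY = "SELECT wait_type, wait_time_ms FROM sys.dm_os_wait_stats"


def top_waits(
    before: Dict[str, int], after: Dict[str, int], n: int = 5
) -> List[Tuple[str, int]]:
    """Return up to n wait types whose wait_time_ms grew the most between snapshots."""
    deltas = {
        wait_type: after[wait_type] - before.get(wait_type, 0)  # new types start at 0
        for wait_type in after
    }
    ranked = sorted(deltas.items(), key=lambda kv: kv[1], reverse=True)
    return [(wait_type, d) for wait_type, d in ranked[:n] if d > 0]
```

Diffing two snapshots keeps waits accumulated by earlier, unrelated workloads out of the picture.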

Bulk load API Basics

Four Ways to Load Data into SQL Server
Integration Services
  OLEDB Destination
  SQL Server Destination
BULK INSERT
  CSV or fixed-width files
BCP
  Like BULK INSERT, but can be run remotely
INSERT ... SELECT

Minimally Logged and Bulk Load
Bulk load
  Feeds a continuous stream of data into a table
  As opposed to running singleton INSERT statements
Minimally logged
  Only allocations are logged, not individual rows/pages
Key takeaway: An operation can be a bulk load operation without being minimally logged

To TABLOCK or Not to TABLOCK
General rule (batch style):
  Heaps: use TABLOCK
  Clustered indexes: do NOT use TABLOCK
Minimally logged:
  INSERT Heap WITH (TABLOCK) SELECT ...
  If TF610 is on:
    INSERT ClusterIndex SELECT ...
The same rules apply to the OLEDB and SQL Server Destinations in SSIS
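The rule on this slide can be written down as a small decision function. This is a simplified sketch: it encodes only the two patterns listed above plus the standard requirement that the database use the SIMPLE or BULK_LOGGED recovery model. It deliberately ignores cases the slide does not list (for example, loading into an empty table) and the finer point that, with TF610, only rows written to newly allocated pages are minimally logged. The function name is hypothetical.

```python
def is_minimally_logged(
    target: str, tablock: bool, tf610: bool, recovery_model: str = "SIMPLE"
) -> bool:
    """Simplified model of the slide's rule for INSERT ... SELECT.

    target is "heap" or "clustered". Only the two patterns on the slide are
    encoded; other cases (such as loading an empty table) are not modeled.
    """
    if recovery_model.upper() not in ("SIMPLE", "BULK_LOGGED"):
        return False  # under FULL recovery every row is logged
    if target == "heap":
        return tablock  # INSERT Heap WITH (TABLOCK) SELECT ...
    if target == "clustered":
        return tf610  # INSERT ClusterIndex SELECT ... with TF610 on
    raise ValueError(f"unknown target type: {target!r}")
```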

Design Patterns


Integration Services or T-SQL
Sometimes: a matter of preference
  Integration Services is graphical
    Some users like this
    Hard to make modular
  SQL Server uses the T-SQL text language
    Modular programming
The right tool for the right job
  Learn both

SQL Server - Which Load Method?
BULK INSERT / BCP
  Pros:
    Can take a BU (bulk update) lock
    No need for linked servers or OPENROWSET
  Cons:
    Only CSV and fixed-width files for input
INSERT ... SELECT
  Pros:
    Can perform transformations
    Any OLEDB-enabled input
  Cons:
    Takes X-locks on the table
    Linked servers or OPENROWSET needed

Integration Services - Which Destination?
OLEDB Destination
  Pros:
    Can be used over TCP/IP
    ETL servers can be scaled out remotely
  Con:
    Typically slower than the SQL Server Destination
SQL Server Destination
  Pros:
    Fastest option
    Easy to configure
  Con:
    Must run on the same box as SQL Server (shared memory connections)

Design Pattern: Parallel Load
Create a (priority) queue for your packages
  A SQL table is good for this purpose
Packages / T-SQL include a loop:
  Loop takes one item from the queue
  Until the queue is empty
(Diagram: several DTEXEC processes, DTEXEC (1), DTEXEC (2), ..., each running a Get / Do Work / Loop task against a shared priority queue of items P1 ... Pn)
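Here is a sketch of the same loop in Python, with threads standing in for DTEXEC processes and `queue.PriorityQueue` standing in for the SQL queue table (both substitutions are assumptions of this sketch). The function name and the `do_work` callback are hypothetical.

```python
import queue
import threading
from typing import Callable, List, Tuple

WorkItem = Tuple[int, str]  # (priority, item name); lower numbers are taken first


def run_parallel_load(
    items: List[WorkItem], workers: int, do_work: Callable[[str], None]
) -> None:
    """Start `workers` loops that each take one item, do the work, and repeat until the queue is empty."""
    work_queue = queue.PriorityQueue()
    for item in items:
        work_queue.put(item)

    def worker_loop() -> None:
        while True:
            try:
                _, name = work_queue.get_nowait()  # Get: take one item from the queue
            except queue.Empty:
                return  # queue empty: this worker stops
            do_work(name)  # Do Work: e.g. run one package or one BULK INSERT

    threads = [threading.Thread(target=worker_loop) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

With a SQL table as the queue, the Get step must be atomic so that two workers never take the same item; `PriorityQueue` provides that guarantee in-process.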

Design Pattern: Table Hash Partitioning
Create filegroups to hold the partitions
  Equally balance over LUNs using the optimal layout
Use the CREATE PARTITION FUNCTION command
  Partition the tables into #cores partitions
Use the CREATE PARTITION SCHEME command
  Bind the partition function to the filegroups
Add a hash column to the table (tinyint, just one byte per row)
  Calculate a good hash distribution
  For example, use HASHBYTES with modulo, or BINARY_CHECKSUM
(Diagram: hash values 0 to 255 spread over the partitions)
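A sketch of the hash-column calculation in Python. It uses MD5 from `hashlib` as a stand-in for HASHBYTES and takes the first four bytes of the digest, read as a big-endian integer, modulo the partition count. That byte handling is this sketch's own choice and will not reproduce the values T-SQL computes, so treat it as an illustration of the idea (a deterministic hash squeezed into a tinyint, spread evenly over buckets), not as a drop-in replacement. The function names are hypothetical.

```python
import hashlib
from collections import Counter
from typing import Iterable


def hash_bucket(key: str, partitions: int) -> int:
    """Map a key to a bucket in [0, partitions), small enough for a tinyint column."""
    if not 1 <= partitions <= 256:
        raise ValueError("a tinyint holds 0..255, so use 1..256 partitions")
    digest = hashlib.md5(key.encode("utf-8")).digest()  # stand-in for HASHBYTES('MD5', ...)
    return int.from_bytes(digest[:4], "big") % partitions


def distribution(keys: Iterable[str], partitions: int) -> Counter:
    """Count keys per bucket to check that the hash spreads the rows evenly."""
    return Counter(hash_bucket(k, partitions) for k in keys)
```

With 16 cores you would compute `hash_bucket(key, 16)` for each row and check the spread with `distribution()` before settling on the hash expression.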

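Design Pattern: Large Updates
(Diagram: the Sales table is partitioned by year, 2001 to 2004. The partition to be updated is SWITCHed out to Sales_Old; Sales_Old is combined with the update records in Sales_Delta and BULK INSERTed into Sales_New; Sales_New is then SWITCHed back into Sales.)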
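The update pattern can be modeled in memory to show why it avoids row-by-row updates: the affected partition is rebuilt as a whole and swapped back in. In this toy Python model a table is a dictionary of partitions keyed by year, and dictionary pops and assignments stand in for ALTER TABLE ... SWITCH (which in SQL Server is a metadata-only operation). All names are hypothetical.

```python
from typing import Dict

Partition = Dict[int, float]  # toy partition: row id -> sales amount


def update_by_switch(
    table: Dict[int, Partition], year: int, updates: Dict[int, float]
) -> Partition:
    """Rebuild one partition with the updates applied and swap it in.

    Returns the old partition (Sales_Old) so the caller can drop it.
    """
    sales_old = table.pop(year)  # SWITCH the partition out
    sales_new = {  # build Sales_New from Sales_Old plus the Sales_Delta records
        row_id: updates.get(row_id, amount) for row_id, amount in sales_old.items()
    }
    table[year] = sales_new  # SWITCH Sales_New in
    return sales_old
```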

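Design Pattern: Large Deletes
(Diagram: the Sales table is partitioned by year, 2001 to 2004. The 2001 partition is SWITCHed out to Sales_Temp (2001); the rows to keep are BULK INSERTed into Sales_Temp (2001 Filtered), which is then SWITCHed back into Sales.)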
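The delete pattern applies the same idea: instead of deleting rows in place, copy only the rows to keep and swap the filtered partition back in. In this toy Python model a partition is a list of rows and dictionary operations stand in for SWITCH; the function name and row layout are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

Row = Tuple[int, str]  # toy row: (order id, status)


def delete_by_switch(
    table: Dict[int, List[Row]], year: int, should_delete: Callable[[Row], bool]
) -> int:
    """Remove rows from one partition by rebuilding it; returns how many rows were dropped."""
    sales_temp = table.pop(year)  # SWITCH the partition out to Sales_Temp
    kept = [row for row in sales_temp if not should_delete(row)]  # the "Filtered" copy
    table[year] = kept  # SWITCH the filtered copy back in
    return len(sales_temp) - len(kept)
```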

Tuning the SQL Server Engine


ALLOC_FREESPACE_CACHE - Heap Limits
Measure:
  sys.dm_os_latch_stats
  Long waits for ALLOC_FREESPACE_CACHE
SQL Server Books Online:
  Used to synchronize the access to a cache of pages with available space for heaps and binary large objects (BLOBs). Contention on latches of this class can occur when multiple connections try to insert rows into a heap or BLOB at the same time. You can reduce this contention by partitioning the object.
Hypothesis: More heaps = more speed
(Chart: throughput in MB/sec, 0 to 250, vs. number of concurrent bulk loads, 0 to 30)

PAGELATCH_UP - PFS Contention
Measure:
  sys.dm_os_wait_stats
Hypothesis generation:
  I/O problem?
  What can we predict?
Fix: Add more files to the filegroup!

RESOURCE_SEMAPHORE - Query Memory Usage
DW load queries will often be very memory intensive
By default, a single query can use at most 25% of SQL Server's allocated memory
Queries waiting to get a memory grant will wait for:
  RESOURCE_SEMAPHORE
Can use Resource Governor (RG) to work around it
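The arithmetic behind the 25% limit can be sketched as follows. The 25% default corresponds to the REQUEST_MAX_MEMORY_GRANT_PERCENT setting of a Resource Governor workload group, which is the knob the slide suggests adjusting. The model is a deliberate simplification: it treats all of the given memory as grantable, whereas the real pool available for grants is smaller, so the counts it returns are upper bounds. The function names are hypothetical.

```python
def max_grant_mb(memory_mb: float, max_grant_percent: float = 25.0) -> float:
    """Largest memory grant one query may receive under a percentage cap."""
    return memory_mb * max_grant_percent / 100.0


def max_concurrent_grants(memory_mb: float, max_grant_percent: float = 25.0) -> int:
    """How many queries that each take the maximum grant fit before the next one
    waits on RESOURCE_SEMAPHORE.

    Simplification: treats all of memory_mb as grantable, so this is an upper bound.
    """
    return int(memory_mb // max_grant_mb(memory_mb, max_grant_percent))
```

Lowering the cap lets more queries obtain a grant at once, but each receives less memory and may spill, which is why the slide pairs Resource Governor with optimizing the query.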

SOS_SCHEDULER_YIELD
Hypothesis: Caused by two bulk commands on the same scheduler
Predict:
  We should see multiple bulk commands on the same scheduler
Observe: And we do
  scheduler_id in sys.dm_exec_requests
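To check the prediction, the rows of sys.dm_exec_requests can be grouped by scheduler_id. A Python sketch, assuming the rows have already been fetched as (session_id, scheduler_id, command) tuples and that bulk loads can be recognized by the word BULK in the command column (an assumption; verify it against what your loads actually report). The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# (session_id, scheduler_id, command), e.g. from:
#   SELECT session_id, scheduler_id, command FROM sys.dm_exec_requests
Request = Tuple[int, int, str]


def shared_schedulers(requests: Iterable[Request]) -> Dict[int, List[int]]:
    """Map scheduler_id -> session_ids for schedulers running more than one bulk load."""
    by_scheduler: Dict[int, List[int]] = defaultdict(list)
    for session_id, scheduler_id, command in requests:
        if "BULK" in command.upper():  # assumption: bulk loads report BULK in command
            by_scheduler[scheduler_id].append(session_id)
    return {sched: ids for sched, ids in by_scheduler.items() if len(ids) > 1}
```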

Fixing SOS_SCHEDULER_YIELD
How can we fix this? Two ways:
  Terminate and reconnect
  Soft NUMA
(Diagram: each CPU core X gets its own soft-NUMA node X listening on TCP port 1433 + X, from core 0 / node 0 / port 1433 upward; each BULK INSERT stream connects to a different port and therefore runs on its own core.)
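The stream-to-port mapping can be generated mechanically. A sketch assuming one soft-NUMA node per core and that the server has already been configured so that TCP port 1433 + X is bound to node X; the function name is hypothetical.

```python
from typing import List, Tuple

BASE_PORT = 1433  # TCP port of soft-NUMA node 0, as in the diagram


def assign_streams(streams: int, nodes: int) -> List[Tuple[int, int, int]]:
    """Return (stream, soft-NUMA node, TCP port) tuples, spreading streams round-robin over nodes."""
    if nodes < 1:
        raise ValueError("need at least one soft-NUMA node")
    return [(s, s % nodes, BASE_PORT + s % nodes) for s in range(streams)]


for stream, node, port in assign_streams(4, 4):
    print(f"stream {stream} -> node {node}, port {port}")
```

Each loader then connects to its own port, for example with a server name of the form tcp:myserver,1434 (myserver is a placeholder). Keeping the number of streams at or below the number of nodes ensures that no two streams share a scheduler.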

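I/O Related Waits for BULK INSERT
BULK INSERT uses a double-buffering scheme
  Important to feed it fast enough
Also, the target SQL Server must be able to absorb the writes
(Diagram: the CSV input is read in 64 KB buffers and parsed, then written to the table. Waits on the input side: IMPROVIO_WAIT for a file, OLEDB / ASYNC_NETWORK_IO for an OLEDB source; waits on the table side: PAGEIOLATCH_EX.)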
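The principle of keeping the bulk load fed can be illustrated with a read-ahead loop: a reader thread fetches the next 64 KB buffer while the current one is being parsed. This Python sketch is only loosely analogous to what BULK INSERT does internally; the function name and the `consume` callback are hypothetical.

```python
import io
import queue
import threading
from typing import BinaryIO, Callable

BUFFER_SIZE = 64 * 1024  # 64 KB, the buffer size shown in the diagram


def read_ahead(
    source: BinaryIO, consume: Callable[[bytes], None], buffer_size: int = BUFFER_SIZE
) -> None:
    """Fill the next buffer on a reader thread while the caller consumes the current one."""
    ready = queue.Queue(maxsize=1)  # bounded: the reader can only run a little ahead

    def reader() -> None:
        while True:
            chunk = source.read(buffer_size)
            ready.put(chunk)  # blocks while the consumer is still busy
            if not chunk:  # an empty read means end of input
                return

    threading.Thread(target=reader, daemon=True).start()
    while True:
        chunk = ready.get()  # blocks while the reader is still filling a buffer
        if not chunk:
            return
        consume(chunk)  # e.g. parse rows and pass them on to be written


if __name__ == "__main__":
    sizes = []
    read_ahead(io.BytesIO(b"a,b\n" * 50_000), lambda chunk: sizes.append(len(chunk)))
    print(sizes)
```

If the reader cannot keep up, the consumer blocks in `ready.get()`, which corresponds roughly to the input-side waits on this slide; if the consumer is slow, the reader blocks in `ready.put()`, the equivalent of a target that cannot absorb writes.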

CXPACKET When It Matters
Statements of type INSERT ... SELECT
Measure:
  Sometimes throughput drops with higher DOP
Hypothesis:
  Backpressure in query execution
(Chart: throughput in MB/sec, 0 to 50, vs. DOP, 1 to 46)

Drinking From a Fire Hose
(Chart: CXPACKET waits, 0 to 200,000,000, plotted against throughput in MB/sec)
Solution:
  OPTION (MAXDOP X)
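Because the right DOP has to be found by measurement, a small helper can pick it from a DOP sweep like the one charted for INSERT ... SELECT and format the query hint. The function names are hypothetical and the throughput figures in the usage are made up for illustration. Note that the T-SQL query hint is written without an equals sign: OPTION (MAXDOP n).

```python
from typing import Dict


def best_dop(throughput_by_dop: Dict[int, float]) -> int:
    """Return the DOP with the highest measured throughput; on a tie, prefer the lower DOP."""
    return min(throughput_by_dop, key=lambda dop: (-throughput_by_dop[dop], dop))


def maxdop_hint(dop: int) -> str:
    """Query hint to append to the INSERT ... SELECT statement."""
    return f"OPTION (MAXDOP {dop})"


measured = {1: 20.0, 4: 45.0, 8: 45.0, 16: 30.0}  # made-up MB/sec figures
print(maxdop_hint(best_dop(measured)))  # prints OPTION (MAXDOP 4)
```

Preferring the lower DOP on a tie leaves more cores free for other parallel bulk streams.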

SQL Server Waits - Summary
Wait Type | Typical Cause | Resolution
PAGELATCH_UP | Contention on PFS pages | Add more data files to the filegroup
ALLOC_FREESPACE_CACHE | Heap allocation bottleneck | Partition the target table and use SWITCH
SOS_SCHEDULER_YIELD | Network speed not keeping up | Optimize network settings in Windows (Jumbo Frames); increase packet size
RESOURCE_SEMAPHORE | Too much memory used by a query | Optimize the query to use less memory, or use Resource Governor to limit the maximum allocation
LCK_M_<X> | Locks prevent parallelism | Use the correct lock hints
WRITELOG | Transaction log contention | Use TF610; seek minimally logged operations
PAGEIOLATCH_<X> | I/O system not keeping up | Tune I/O
IMPROVIO_WAIT | Input file I/O too slow | Improve input file latency and/or throughput
CXPACKET | Normally harmless, but may indicate too much coordination | Use the MAXDOP hint, but carefully
OLEDB / ASYNC_NETWORK_IO | Not feeding the bulk load fast enough | Optimize the source

Tuning the Network Stack


How to Affinitize NICs
Using the Interrupt-Affinity Policy Tool, you can affinitize individual NICs to CPU cores
  Affinitize each NIC to its own core
  One NIC per hard NUMA node
  Your mileage may vary; it depends on the box
Match soft-NUMA TCP/IP connections with NICs
  A NIC on a hardware NUMA node maps to the SQL bulk stream target on the same node

Tune Network Parameters
Jumbo Frames = 9014 bytes, enabled
Adaptive Inter-Frame Spacing = disabled
Flow Control = Tx & Rx enabled
Client & server Interrupt Moderation = Medium
Coalesce Buffers = 256
Set server Rx buffers to 512 and server Tx buffers to 512
Set client Rx buffers to 512 and client Tx buffers to 256
Link speed = 1000 Mbps full duplex

Network Packet Size
Measure:
  Perfmon shows a huge discrepancy between the number of reads and writes
Hypothesis:
  This is caused by the small network packet size (default 4096 bytes) forcing the stream to be broken into smaller pieces
Test and prove:
  Adjusting the network packet size to 32K increases throughput by 15%
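The effect of packet size on the number of network packets is simple arithmetic. This sketch ignores the per-packet TDS header; 32767 bytes is the largest value the network packet size option accepts, so it stands in for "32K". The function name is hypothetical.

```python
DEFAULT_PACKET_SIZE = 4096  # bytes, the default quoted on the slide
MAX_PACKET_SIZE = 32767  # bytes, the largest network packet size SQL Server accepts


def packets_needed(payload_bytes: int, packet_size: int) -> int:
    """Number of packets needed to carry a payload (ignores per-packet header overhead)."""
    return -(-payload_bytes // packet_size)  # ceiling division


one_gb = 2**30
print(packets_needed(one_gb, DEFAULT_PACKET_SIZE), packets_needed(one_gb, MAX_PACKET_SIZE))
# prints 262144 32770
```

The packet size can be set per connection, for example with bcp's -a option or the Packet Size connection-string keyword.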

Tuning Integration Services


Integration Services vs. SQL
Lab test setup:
  Transform fact data with surrogate key lookups
  5 dimension tables, 100K rows each
  Partitioned fact table, total of 320M rows
  Test speed of hash joins

Test 2: Raw Join
Method | Time (s) | Krows/s
SSIS 2008 | 144 | 2222
SQL MAXDOP = 0 | 158 | 2025
SQL MAXDOP = 1 x 32 | 162 | 1975

Test 3: Join and Write
Method | Time (s) | Krows/s
SQL MAXDOP = 1 x 32 | 246 | 1301
SSIS 2008 | 278 | 1151
SQL MAXDOP = 0 | 166 | 1927

Integration Services lookup join is comparable in speed to T-SQL!
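The Krows/s column follows directly from the row count and the elapsed time. A quick check in Python (the helper name is hypothetical) reproduces the published figures to within rounding of the times.

```python
ROWS = 320_000_000  # total fact rows in the lab test


def krows_per_sec(rows: int, seconds: float) -> float:
    """Throughput in thousands of rows per second."""
    return rows / seconds / 1000.0


for label, seconds in [("SSIS 2008, raw join", 144), ("SQL MAXDOP = 1 x 32, join and write", 246)]:
    print(f"{label}: {krows_per_sec(ROWS, seconds):.0f} Krows/s")
```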

Baseline of Package
Sanity check:
  How much memory does each package use?
  How much CPU does each package stream use?
  Need enough CPU and memory to run them all
Performance counters:
  Process / Private Bytes and Working Set (DTEXEC)
  Processor / % Processor Time
  Network Interface / Current Bandwidth
  Network Interface / Bytes Total/sec

Scaling the Package - Method
Using the parallel load technique described earlier, you can run multiple copies of the package
Using the baseline of the package, you can now calculate how many scale-out servers you will need
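A sketch of that calculation, assuming each copy's CPU and memory use has been measured as in the package baseline; the `PackageBaseline` type, the function name and the server sizes in the tests are hypothetical. Other limits, such as network bandwidth, would add further terms to the `min()`.

```python
from dataclasses import dataclass


@dataclass
class PackageBaseline:
    """Measured for one running copy of the package (hypothetical structure)."""

    cores: float  # DTEXEC Process % Processor Time / 100 = cores kept busy
    memory_gb: float  # DTEXEC Process Private Bytes, in GB


def servers_needed(
    copies: int, per_copy: PackageBaseline, cores_per_server: int, memory_gb_per_server: float
) -> int:
    """Scale-out servers needed to run `copies` package instances in parallel."""
    fit_by_cpu = int(cores_per_server // per_copy.cores)
    fit_by_memory = int(memory_gb_per_server // per_copy.memory_gb)
    copies_per_server = min(fit_by_cpu, fit_by_memory)
    if copies_per_server < 1:
        raise ValueError("one copy of the package does not fit on a single server")
    return -(-copies // copies_per_server)  # ceiling division
```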

Data Loading Fast Enough?
Bulk load scales near-linearly with the number of bulk streams
  Measured so far up to 96 cores
Possible to reach 100% CPU load on all cores
  Just get rid of all the bottlenecks

Q&A

© 2008 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation.
MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.


Tuning ETL and ELT

APPENDIX


Data Loading Links
The Data Loading Performance Guide
Top 10 SQL Server Integration Services Best Practices
Managing and Deploying SQL Server Integration Services
SQL Server 2005 Integration Services: A Strategy for Performance
Integration Services: Performance Tuning Techniques
High Impact Data Warehousing with SQL Server Integration Services
