
tips and tricks for better SSIS performance
david peter hansen
microsoft certified master
david@davidpeterhansen.com | @dphansen
parallelise with a queue
scenario
lots and lots of child packages
no dependencies
some take long, some short
create a queue
add path to all packages to the queue
pop from queue
pop item (child package path) from queue
execute child package
keep doing this until queue is empty
parallelise
run in parallel
e.g. as many as number of logical cores
measure
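The queue pattern above can be sketched in a few lines. This is only an illustration of the idea, not the package design from the talk: `run_child_package` is a hypothetical stub standing in for whatever actually executes a child package, and the `.dtsx` paths are made up. The worker count defaults to the number of logical cores, as suggested above.

```python
import os
import queue
import threading

def run_child_package(path):
    # Hypothetical stand-in: a real worker would execute the child package here.
    return f"executed {path}"

def run_all(package_paths, workers=None, execute=run_child_package):
    """Run every package once using a shared queue and a pool of workers."""
    workers = workers or os.cpu_count() or 1
    work = queue.Queue()
    for path in package_paths:          # add path to all packages to the queue
        work.put(path)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                path = work.get_nowait()    # pop item from queue
            except queue.Empty:
                return                      # queue is empty: this worker stops
            outcome = execute(path)         # execute child package
            with lock:
                results.append((path, outcome))

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

paths = [f"child_{i}.dtsx" for i in range(10)]   # made-up package paths
results = run_all(paths, workers=4)
print(len(results))
```

Because each worker pulls the next path as soon as it finishes the previous one, long and short packages balance out without any up-front scheduling.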
balanced run
create a queue
pop from queue
continue until done
parallelise
some take long, some short
more rows per buffer
buffer
group of data
allocated memory for rows and columns
dynamically sized
not a waterfall
not from transformation to transformation
group of transformations pass over the buffers
in-place changes to data
less is more
fewer columns = more rows per buffer
narrow data types = more rows per buffer
unused columns
remove them
don't do SELECT *
RunInOptimizedMode
engine will not allocate
for unused columns
more rows
remove unused columns
narrow your data types
RunInOptimizedMode
buffers, buffers, size your buffers
size of buffers
calculated by data flow engine
size of buffers
sizerow = estimated size of a single row of data
sizebuffer = sizerow * DefaultBufferMaxRows
sizebuffer > DefaultBufferSize: engine decreases number of rows
sizebuffer < MinBufferSize (64KB): engine increases number of rows
MinBufferSize < sizebuffer < DefaultBufferSize: engine sizes the buffer as close as possible to sizebuffer
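The three sizing cases can be modelled roughly as below. This is a sketch of the rule as stated, not the engine's actual algorithm: the rounding choices are assumptions, and the 10,000-row default for DefaultBufferMaxRows is the documented SSIS default rather than a figure from these slides.

```python
MIN_BUFFER_SIZE = 64 * 1024               # 64 KB
DEFAULT_BUFFER_SIZE = 10 * 1024 * 1024    # 10 MB default

def rows_per_buffer(size_row, default_buffer_max_rows=10_000,
                    default_buffer_size=DEFAULT_BUFFER_SIZE):
    """Approximate how many rows end up in one buffer."""
    size_buffer = size_row * default_buffer_max_rows
    if size_buffer > default_buffer_size:
        # too large: fewer rows, so the buffer fits in DefaultBufferSize
        return default_buffer_size // size_row
    if size_buffer < MIN_BUFFER_SIZE:
        # too small: more rows, enough to reach MinBufferSize (rounded up)
        return -(-MIN_BUFFER_SIZE // size_row)
    # in between: keep DefaultBufferMaxRows
    return default_buffer_max_rows

print(rows_per_buffer(60), rows_per_buffer(2000), rows_per_buffer(4))
```

With the defaults, a 60-byte row keeps all 10,000 rows, a 2,000-byte row is cut back to fit 10 MB, and a 4-byte row is padded up to reach 64 KB.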
change the
values
DefaultBufferSize
DefaultBufferMaxRows
goal (if you have enough memory)
small number of large buffers
fit as many rows into a buffer
watch out for paging to disk (use perfmon)
DefaultBufferSize
default is 10 MB
max is 100 MB
DefaultBufferMaxRows
databytes = 45,552 * 1,024 = 46,645,248
sizerow = 46,645,248 / 776,286 = 60 bytes (rounded)
DefaultBufferMaxRows
= DefaultBufferSize / sizerow
= 10 * 1,024 * 1,024 / 60
= 174,762
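The same arithmetic as a small script, so it can be rerun with other table sizes. The variable names are illustrative, and reading 45,552 as the table's data size in KB is an assumption based on the multiplication by 1,024.

```python
data_kb = 45_552        # data size of the table (assumed to be in KB)
row_count = 776_286     # number of rows in the table

data_bytes = data_kb * 1024
size_row = data_bytes // row_count                  # whole bytes per row

default_buffer_size = 10 * 1024 * 1024              # 10 MB default
default_buffer_max_rows = default_buffer_size // size_row

print(data_bytes, size_row, default_buffer_max_rows)
```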
buffer size
begin with default values
make sure you have enough memory
turn on logging: BufferSizeTuning (how many rows in each buffer)
change DefaultBufferMaxRows/DefaultBufferSize
do not block the road
blocking nature
non-blocking transformations
semi-blocking transformations
blocking transformations
non-blocking
streaming
row-based
semi-blocking
hold up rows for
a period of time
semi-blocking
what if one input is slower than the other?
exceed the buffer memory while waiting (potentially)
throttle the source (new feature in 2012)
blocking
all rows into memory
try to avoid these
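The difference between streaming and blocking can be shown with plain Python generators. This is an analogy only, not SSIS code: a doubling step stands in for a row-based transformation and a sort stands in for a blocking one, and all names are illustrative.

```python
def source(rows, consumed):
    for row in rows:
        consumed.append(row)    # record that the source has produced this row
        yield row

def non_blocking(rows):
    for row in rows:
        yield row * 2           # each row flows straight through

def blocking(rows):
    yield from sorted(rows)     # must read every row before emitting the first

consumed_stream = []
first_streamed = next(non_blocking(source([3, 1, 2], consumed_stream)))

consumed_sort = []
first_sorted = next(blocking(source([3, 1, 2], consumed_sort)))

# rows read from the source before the first output row appeared
print(len(consumed_stream), len(consumed_sort))   # prints "1 3"
```

The streaming step emits its first row after reading one input row; the sort has to pull the whole input into memory first.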
real world
~30 sources / millions of rows
low-memory condition notification to SSIS
spill to disk
this was in production!
push down
try to push down to the source
GROUP BY
ORDER BY
don't block
avoid row-based non-blocking transformations
be careful with semi-blocking transformations
avoid blocking transformations
bulk insert your data
data destinations
SQL Server Destination
shared memory
OLE DB Destination
tcp/ip
named pipes
use fast load
batch size
Maximum Insert Commit Size (MICS)
> buffer size: one commit for every buffer
= 0: entire batch is committed in one big batch
< buffer size: commit after MICS & commit after every buffer
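The three MICS cases can be modelled as a small function that returns the size of each commit. It is a sketch of the behaviour described above, not the destination's implementation; the function name and the buffer row counts are made up.

```python
def commit_sizes(buffer_rows, mics):
    """Return the row count of each commit for the given buffers and MICS."""
    if mics == 0:
        # = 0: entire batch is committed in one big batch
        return [sum(buffer_rows)]
    sizes = []
    for rows in buffer_rows:
        remaining = rows
        while remaining > mics:
            # < buffer size: commit after every MICS rows ...
            sizes.append(mics)
            remaining -= mics
        # ... and commit at the end of every buffer
        sizes.append(remaining)
    return sizes

buffers = [10_000, 10_000, 5_000]       # rows per buffer (illustrative)
print(commit_sizes(buffers, 0))
print(commit_sizes(buffers, 20_000))
print(commit_sizes(buffers, 4_000))
```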
smaller
commit size
inserting into a table with indexes
sort must happen for every index
smaller commit size makes sort fit in memory
beware of fragmentation
indexes
disable/drop indexes before bulk load
rebuild/create indexes after bulk load
clustered index
source data is ordered by cluster key
specify ORDER hint in FastLoadOptions
minimally logged if the table is empty or trace flag 610 is enabled
bulk insert
sql server destination vs. ole db destination
maximum insert commit size
tables with indexes
insert into clustered index
don't spool your blob data
blob data
xml
varchar(max) / nvarchar(max)
varbinary(max)
DT_TEXT / DT_NTEXT
DT_IMAGE
blob in pipeline
half of a buffer for in-row data
half of a buffer for blob data
blob spooled
doesn't fit in memory
spooled to disk
spool to disk
if you really really have to
use SSDs
BufferTempStoragePath
BLOBTempStoragePath
default is TMP/TEMP (C:\...)
minimize spool
size DefaultBufferSize & DefaultBufferMaxRows
1) find max blob buffer
DefaultBufferSize = e.g. 100 MB
MaxBufferSizeblob = 100 MB / 2 = 50 MB
2) estimate size of blob data
3) set DefaultBufferMaxRows < MaxBufferSizeblob / estimated size of blob data
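The three steps reduce to one division. The sketch below assumes, as stated above, that half of the buffer is available for blob data; the 50 KB blob estimate is a made-up figure for illustration.

```python
def blob_row_limit(default_buffer_size, estimated_blob_bytes):
    """Upper bound for DefaultBufferMaxRows so blob data stays in memory."""
    max_blob_buffer = default_buffer_size // 2       # half of a buffer for blob data
    return max_blob_buffer // estimated_blob_bytes

default_buffer_size = 100 * 1024 * 1024              # 100 MB
estimated_blob_bytes = 50 * 1024                     # hypothetical 50 KB per row
limit = blob_row_limit(default_buffer_size, estimated_blob_bytes)
print(limit)   # set DefaultBufferMaxRows below this value
```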
don't spool blob
xml, varchar(max), varbinary(max), DT_TEXT
half of a buffer for blob data
BufferTempStoragePath / BLOBTempStoragePath
size DefaultBufferSize / DefaultBufferMaxRows
divide and conquer
real world
very complicated business logic
almost 100 transformations in one data flow
in production!
very hard to debug or tune
divide
split up your package
extract/stage
dimensions
facts
very large data flows?
stage your data in SQL Server or RAW files
conquer
master package
parallelise if possible
optimize your query
t-sql queries
source components
lookup transformations
query plan
tune your query
use sql sentry plan explorer (free)
indexing
cardinality estimation problems
with (nolock)
use with care
dirty reads / inconsistent data
make sure nobody is writing to the table
optimize
tune your query
with (nolock) use with care
get the data, already
blocking iterators
sometimes you don't want to wait for the query to return the rows
running time
without hint:
CPU time = 498 ms, elapsed time = 2043 ms.
with fast hint:
CPU time = 1186 ms, elapsed time = 2231 ms.
option (fast n)
optimize plan for the first n rows
can help remove blocking iterators
data into SSIS faster
at a cost of overall query performance
n = DefaultBufferMaxRows
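One way to keep the hint in step with the data flow setting is to generate the source query from the same value. This is a minimal sketch: the helper name, table, and column are hypothetical, and it does nothing more than append the hint text.

```python
def with_fast_hint(query, default_buffer_max_rows):
    """Append OPTION (FAST n) to a query, with n = DefaultBufferMaxRows."""
    base = query.rstrip().rstrip(";")
    return f"{base} OPTION (FAST {int(default_buffer_max_rows)});"

# dbo.SourceTable and its column are made-up names
sql = with_fast_hint("SELECT a FROM dbo.SourceTable ORDER BY a;", 10_000)
print(sql)
```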
don't guess, measure it
approach
measure performance
come up with a hypothesis
tune your package
measure performance again
10 tips and tricks
1) parallelise with a queue
2) more rows per buffer
3) buffers, buffers, size your buffers
4) do not block the road
5) bulk insert your data
6) don't spool your blob data
7) divide and conquer
8) optimize your query
9) get the data, already
10) don't guess, measure it
davidpeterhansen.com/talks