Documente Academic
Documente Profesional
Documente Cultură
Level 300
Bob Duffy
DTS 2000
SSIS 2005
1.75 Developers
Minimize staging (else use RawFiles if possible) Hardware Infrastructure: Disks, RAM, CPU, Network SQL Infrastructure: File Groups, Indexing, Partitioning
Measure Tune
Parallelize
Replace destinations with RowCount Source->RowCount throughput Source->Destination throughput OVAL performance tuning strategy The Three Ss Data Flow Bag of Tricks
Sharpen Share
Spend
Row Based
(synchronous)
Partially Blocking
(asynchronous)
Blocking
(asynchronous)
http://msdn.microsoft.com/en-us/library/ms345346.aspx
Database
EMC CX3-80
4 Gb Fiber Channel Source servers: Unisys ES3220L 2 sockets each with 4 core Intel 2 GHz CPUs 4 GB RAM Windows Server 2008 SQL Server 2008 Destination server: Unisys ES7000/One 32 sockets each with dual core Intel 3.4 GHz CPUs 256 GB RAM Windows Server 2008 SQL Server 2008
Unisys ES7000/one Enterprise Server Microsoft Windows Server 2008 x64 Datacenter Edition 32 socket dual core Intel Xeon 3.4 GHz (7140M) 256 GB 8 dual port 4Gbit FC Intel PRO/1000 MT Server Adapter Pre-release build of SQL Server 2008 Enterprise Edition (V10.0.1300.4) EMC Clariion CX3-80 (Qty 1)
11 trays of 15 disks; 165 spindles x 146 GB 15Krpm; 4Gbit FC
Quantity: Make: Model: OS: CPU: RAM: HBA: NIC: Database: Storage:
4 Unisys ES3220L Windows2008 x64 Enterprise Edition 2 socket quad core Intel Xeon processors @ 2.0GHz 4 GB 1 dual port 4Gbit Emulex FC Intel PRO1000/PT dual port Pre-release build of SQL Server 2008 Integration Services (V10.0.1300.4) 2x EMC CLARiiON CX600 (ea: 45 spindles, 4 2Gbit FC)
C1
C1
C1
C1
Orders Table
Partition 1 Partition 2 Partition 3 Partition 4 Partition 5 Partition 6 Partition 55 Partition 56
Orders_1
Orders_2
Orders_3
Orders_4
Orders_5
Orders_6
... ...
Orders_55
Orders_56
orders.tbl.1
SSIS
orders.tbl.2
SSIS
orders.tbl.3
SSIS
orders.tbl.4
SSIS
orders.tbl.5
SSIS
orders.tbl.6
SSIS
orders.tbl.55
SSIS
orders.tbl.56
SSIS
People & Processes Kimballs ETL and SSIS books are an excellent reference
Resource contention, processing windows, SSIS does not forgive bad database design Old principles still apply e.g. load with/without indexes?
Will this run on IA64 / X64? No BIDS on IA64 how will I debug? Is OLE-DB driver XXX available on IA64? Memory and resource usage on different platforms
Platform considerations
Break complex ETL into logically distinct packages (vs monolithic design) Improves development & debug experience
Separate sub-processes within package into separate Containers More elegant, easier to develop Simple to disable whole Containers when debugging
Use Script Task/Transform for one-off problems Build custom components for maximum re-use
Annotations
Error Logging Configurations
BEFORE:
AFTER:
select select * from etl.uf_FactStoreSales(@Date) dbo.Tbl_Dim_Store.SK_Store_ID , Tbl_Dim_Store.Store_Num ,isnull(dbo.Tbl_Dim_Merchant_Division.SK_Merch_Di v_ID, 0) as SK_Merch_Div_ID from dbo.Tbl_Dim_Store left outer join dbo.Tbl_Dim_Merchant_Division on dbo.Tbl_Dim_Store.Merch_Div_Num = dbo.Tbl_Dim_Merchant_Division.Merch_Div_N um where Current_Row = 1
Use the power of TSQL to clean the data 'on the fly'
Too many moving parts is inelegant and likely slow But dont be afraid to experiment there are many ways to solve a problem Allocate enough threads EngineThreads property on DataFlow Task See Performance Talk
Full Cache for small lookup datasets No Cache for volatile lookup datasets Partial Cache for large lookup datasets
Tradeoff memory vs. performance Can use Merge Join component instead
Full Cache is optimal, but uses the most memory, also takes time to load Partial Cache can be expensive since it populates on the fly using singleton SELECTs No Cache uses no memory, but takes longer
Catch is that it requires Sorted inputs See SSIS Performance white paper for more details
Custom components
Can written in any .Net language Must be signed, registered and installed but can be widely re-used Quite fiddly for single task
Scripts
Can be written in VisualBasic.Net or C# Are persisted within a package and have limited reuse Have template methods already created for you
http://sqlcat.com
http://technet.microsoft.com/en-us/library/bb961995.aspx
http://blogs.msdn.com/sqlperf/archive/2008/02/27/etl-world-record.aspx