Using Columnstore Indexes

Topic Transcript
Page 1 of 18
Using ColumnStore Indexes, Partitions, and Compression

Learning Objective
After completing this topic, you should be able to

describe the best practices for creating columnstore indexes
1. Demo: Columnstore indexes

SQL Server 2012 introduces columnstore indexes. Clustered and nonclustered indexes in SQL Server are based upon rows. In a columnstore index, the storage of the data is based upon the columns rather than the rows. This moves away from the row-based indexing paradigm that we've seen. Now there are a number of limitations that you will see in columnstore indexing. However, keep in mind that this was only first introduced in SQL Server 2012. Service Pack 1 has enhancements to columnstore indexes, and Microsoft has also intimated that there will be future enhancements. It does provide another means of indexing data. The biggest detriment that you'll see is that there is no way to insert, update, or delete on a table that contains a columnstore index. You must first drop the index, make your changes, and then recreate the index. In this demonstration, we'll look at columnstore indexes.
So just a couple of things to point out about a columnstore index. First and foremost, columnstore indexes are available in the Enterprise edition. That means if you're running Standard or Business Intelligence, you don't have columnstore indexes. Obviously, that requires more money for licensing, so you might want to take into consideration the overall return on investment when you look at licensing.
All right, another big issue to keep in mind is that a columnstore index effectively makes the table on which it's built read-only. Whether or not you intended the table to be read-only, you will not be able to insert, update, or delete in that table while the index exists. You must first drop the columnstore index, then you can do your transactions, and then you can recreate that index.
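The drop, load, and recreate cycle described above can be sketched in T-SQL as follows. This is a minimal sketch, not the instructor's script: it reuses the index definition from the demo, and dbo.StagingInternetSales is a hypothetical staging table assumed to have the same columns as dbo.FactInternetSales.

```sql
-- Sketch of a load cycle on SQL Server 2012, where a nonclustered
-- columnstore index makes the underlying table read-only.
USE AdventureWorksDW2012;
GO

-- 1. Drop the columnstore index so the table accepts writes again.
DROP INDEX CSI_InternetSales ON dbo.FactInternetSales;
GO

-- 2. Run the load. dbo.StagingInternetSales is a hypothetical staging
--    table (an assumption for illustration) with matching columns.
INSERT INTO dbo.FactInternetSales
SELECT * FROM dbo.StagingInternetSales;
GO

-- 3. Recreate the index with the same column list used in the demo.
CREATE COLUMNSTORE INDEX CSI_InternetSales
ON dbo.FactInternetSales
    (SalesOrderLineNumber, SalesTerritoryKey, CustomerKey, ProductKey,
     OrderDateKey, OrderQuantity, SalesAmount, UnitPrice, DiscountAmount);
GO
```

Because the index is rebuilt from scratch in step 3, this pattern tends to suit batch loads rather than frequent small changes.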
Graphic
The SQLQuery5.sql file is open in Microsoft SQL Server Management Studio, the AdventureWorksDW2012 database is selected, and the following code is displayed: --Create the Columnstore index: CREATE COLUMNSTORE INDEX CSI_InternetSales ON dbo.FactInternetSales (SalesOrderLineNumber, SalesTerritoryKey, CustomerKey, ProductKey, OrderDateKey, OrderQuantity, SalesAmount, UnitPrice, DiscountAmount); GO --Use the Columnstore index:
http://library.skillport.com/courseware/Content/cca/md_dwsq_a01_it_enus//output/t33/mi... 10/30/2013
SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales FROM dbo.FactInternetSales AS I INNER JOIN dbo.DimProduct AS P ON I.ProductKey = p.ProductKey INNER JOIN dbo.DimDate AS d ON I.OrderDateKey = D.DateKey GROUP BY P.ProductSubcategoryKey, D.CalendarYear ORDER BY P.ProductSubcategoryKey, D.CalendarYear;
Now just like a nonclustered or a clustered index, a columnstore index will automatically be considered by the query optimizer. You can also utilize query hints. When the query references a column in the columnstore index, only that column is fetched from the index. Now this provides a great deal of enhancement in performance in that it reduces disk I/O and memory cache consumption. Your rows must be reconstructed when they are fetched. Now this can, obviously, cause CPU and memory resources to be used. Because of the row reconstruction, tables with columnstore indexes are read-only. Again, that is a huge limitation. Now there have been discussions about enhancing columnstore indexes to remove that limitation. What I do want to mention is that this is their first iteration. They were just introduced in SQL Server 2012, so they are brand new. Many of the limitations that we currently have, you might see being removed as the columnstore index evolves. For now, in order for your transactions to occur, the index must be dropped, the data altered, and the index recreated. Alternatively, if the table is partitioned, you can move the affected partition out, alter the data there, and then move it back.
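As an illustration of the query hints mentioned above, the following sketch shows two ways to steer the optimizer for the demo query. It assumes the CSI_InternetSales index from the demo already exists; the hints themselves are standard T-SQL, but whether either one helps depends on the workload.

```sql
-- Sketch 1: ask the optimizer to ignore any nonclustered columnstore
-- index for this query only.
SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales
FROM dbo.FactInternetSales AS I
INNER JOIN dbo.DimProduct AS P ON I.ProductKey = P.ProductKey
INNER JOIN dbo.DimDate AS D ON I.OrderDateKey = D.DateKey
GROUP BY P.ProductSubcategoryKey, D.CalendarYear
ORDER BY P.ProductSubcategoryKey, D.CalendarYear
OPTION (IGNORE_NONCLUSTERED_COLUMNSTORE_INDEX);
GO

-- Sketch 2: force the columnstore index by name with a table hint.
SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales
FROM dbo.FactInternetSales AS I WITH (INDEX(CSI_InternetSales))
INNER JOIN dbo.DimProduct AS P ON I.ProductKey = P.ProductKey
INNER JOIN dbo.DimDate AS D ON I.OrderDateKey = D.DateKey
GROUP BY P.ProductSubcategoryKey, D.CalendarYear
ORDER BY P.ProductSubcategoryKey, D.CalendarYear;
GO
```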
Graphic
The SQLQuery5.sql file is displayed in Microsoft SQL Server Management Studio.
Now the best place to use a columnstore index is on fact tables. Typically, you include all of the columns of the fact table in the columnstore index. And in this example, we'll look at utilizing a columnstore index on a fact table.
Graphic
The SQLQuery5.sql file is displayed in Microsoft SQL Server Management Studio.
So what I'm going to do is, I'm going to go ahead and create my columnstore index. And actually first, let's look at the execution plan of my query on FactInternetSales. And here, if I scan all the way over, I can see that first I have a Clustered Index Scan on the primary key of DimDate, the date key. In parallel, I have a Clustered Index Scan on the primary key of FactInternetSales, which includes SalesOrderLineNumber. I then have a Hash Match that's used to join those two results together and another Hash Match that's used to join my NonClustered Index Scan on the DimProduct subcategory key, then a Sort that's done. And finally, a Stream Aggregate. I'll create my COLUMNSTORE INDEX. And now, I'm going to take a look at the execution plan. And what we're going to see is the use of my CSI_InternetSales. So by scrolling down through, here I can see now I'm utilizing that columnstore index, the nonclustered. So I can see that the columnstore index is being used instead of the index that was used previously.
Graphic
The SQLQuery5.sql file is displayed in Microsoft SQL Server Management Studio. The instructor selects the following code and then clicks the Display Estimated Execution Plan icon: SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales FROM dbo.FactInternetSales AS I INNER JOIN dbo.DimProduct AS P ON I.ProductKey = p.ProductKey INNER JOIN dbo.DimDate AS d ON I.OrderDateKey = D.DateKey GROUP BY P.ProductSubcategoryKey, D.CalendarYear ORDER BY P.ProductSubcategoryKey, D.CalendarYear; GO The Messages and Execution plan tabbed pages are displayed. The Execution plan tabbed page is open and contains one section called Query 1: Query cost (relative to the batch): 100%. This section pertains to the selected command and contains, at least, six icons. The first icon is the SELECT icon with 0% cost. The second icon is Stream Aggregate (Aggregate) with 0% cost. This icon connects to the SELECT icon. The third icon, Sort has a 1% cost and connects to the Stream Aggregate (Aggregate) icon. The fourth icon is the Hash Match (Inner Join) icon with 1% cost. The Hash Match (Inner Join) icon connects to the Sort icon. The fifth and sixth icons are at the same level in the hierarchy and are called Index Scan (NonClustered) with 0% cost and Hash Match (Aggregate) with 16% cost. Both these icons connect to the Hash Match (Inner Join) icon with 1% cost. The instructor scrolls to the right to show three more icons on the Execution plan tabbed page. The first of these additional icons is the Hash Match (Inner Join) icon with 15% cost. This icon connects to the Hash Match (Inner Join) icon with 16% cost. The next two icons are at the same level and are called Clustered Index Scan (Clustered). The first of these icons pertains to the DimDate table and has 2% cost. The second icon pertains to the FactInternetSales table has 66% cost. 
Then the instructor points to the Clustered Index Scan (Clustered) icon for the DimDate table and a popup containing information about the icon is displayed. This popup indicates that Physical Operation is Clustered Index Scan, Logical Operation is Clustered Index Scan, Estimated Execution Mode is Row, Estimated Operator Cost is 0.0368032 (2%), Estimated I/O Cost is 0.342361, Estimated CPU Cost is 0.0025671, Estimated Subtree Cost is 0.3368032, Estimated Number of Executions is 1, Estimated Number of Rows is 2191, Estimated Row Size is 13 B, Ordered is False, Note ID is 6, Object is [AdventureWorksDW2012].[dbo]. [DimDate].[PK_DimDate_DateKey][d], and Output List is [AdventureWorksDW2012].[dbo]. [DimDate].DateKey, [AdventureWorksDW2012].[dbo].[DimDate].CalendarYear. Next the instructor points to the Clustered Index Scan (Clustered) icon for the FactInternetSales table
and a popup containing information about the icon is displayed. This popup indicates that Physical Operation is Clustered Index Scan, Logical Operation is Clustered Index Scan, Estimated Execution Mode is Row, Estimated Operator Cost is 1.58602 (66%), Estimated I/O Cost is 1.51942, Estimated CPU Cost is 0.0665948, Estimated Subtree Cost is 1.58602, Estimated Number of Executions is 1, Estimated Number of Rows is 60398, Estimated Row Size is 23 B, Ordered is False, Note ID is 7, Object is [AdventureWorksDW2012].[dbo]. [FactInternetSales].[PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber][I], and Output List is [AdventureWorksDW2012].[dbo].[FactInternetSales].ProductKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].OrderDateKey, [AdventureWorksDW2012]. [dbo].[FactInternetSales].SalesAmount. Then the instructor points to the Hash Match (Inner Join) icon with 15% cost and a popup containing information about the icon is displayed. This popup indicates that Physical Operation is Hash Match, Logical Operation is Inner Join, Estimated Execution Mode is Row, Estimated I/O Cost is 0, Estimated Operator Cost is 0.3541168 (15%), Estimated Subtree Cost is 1.97694, Estimated CPU Cost is 0.354122, Estimated Number of Executions is 1, Estimated Number of Rows is 50507.5, Estimated Row Size is 21 B, Note ID is 7, Output List is [AdventureWorksDW2012].[dbo].[FactInternetSales].ProductKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].SalesAmount, [AdventureWorksDW2012].[dbo]. [DimDate].CalendarYear, and Hash Keys probe is [AdventureWorksDW2012].[dbo]. [FactInternetSales].OrderDateKey. Next the instructor points to the Hash Match (Inner Join) icon with 16% cost and a popup containing information about the icon is displayed. 
This popup indicates that Physical Operation is Hash Match, Logical Operation is Aggregate, Estimated Execution Mode is Row, Estimated I/O Cost is 0, Estimated Operator Cost is 0.37306 (16%), Estimated Subtree Cost is 2.35, Estimated CPU Cost is 0.373059, Estimated Number of Executions is 1, Estimated Number of Rows is 792.76, Estimated Row Size is 21 B, Note ID is 4, Output List is [AdventureWorksDW2012].[dbo].[FactInternetSales].ProductKey, [AdventureWorksDW2012].[dbo].[DimDate].CalendarYear, partialagg1010, and Build Residual is [AdventureWorksDW2012].[dbo].[FactInternetSales].[ProductKey] as [I].[ProductKey] = [AdventureWorksDW2012].[dbo].[FactInternetSales].[ProductKey] as [I].[ProductKey] AND [AdventureWorksDW2012].[dbo].[DimDate].[CalendarYear] = [AdventureWorksDW2012].[dbo]. [DimDate].[CalendarYear] as [d].[CalendarYear]. Then the instructor points to the Index Scan (NonClustered) icon and a popup containing information about the icon is displayed. This popup indicates that Physical Operation is Index Scan, Logical Operation is Index Scan, Estimated Execution Mode is Row, Storage is RowStore, Estimated I/O Cost is 0.0038657, Estimated Operator Cost is 0.0046893 (0%), Estimated Subtree Cost is 0.0046893, Estimated CPU Cost is 0.0008236, Estimated Number of Executions is 1, Estimated Number of Rows is 606, Estimated Row Size is 15 B, Ordered is False, Note ID is 3, Object is [AdventureWorksDW2012]. [dbo].[DimProduct].[IX_DimProduct_ProductSubcategoryKey][P], and Output List is [AdventureWorksDW2012].[dbo].[DimProduct].ProductKey, [AdventureWorksDW2012].[dbo]. [DimProduct].ProductSubcategoryKey. Next the instructor points to the Sort icon and a popup containing information about the icon is displayed.
This popup indicates that Physical Operation is Sort, Logical Operation is Sort, Estimated Execution Mode is Row, Estimated I/O Cost is 0.0112613, Estimated Operator Cost is 0.02241 (1%), Estimated Subtree Cost is 2.4061, Estimated CPU Cost is 0.0111529, Estimated Number of Executions is 1, Estimated Number of Rows is 742.913, Estimated Row Size is 21 B, Note ID is 1, Output List is [AdventureWorksDW2012].[dbo].[DimProduct].ProductSubcategoryKey, [AdventureWorksDW2012]. [dbo].[DimDate].CalendarYear, partialagg1010, and Order By is [AdventureWorksDW2012].[dbo]. [DimProduct].ProductSubcategoryKey Ascending, [AdventureWorksDW2012].[dbo]. [DimDate].CalendarYear Ascending. Then the instructor points to the Stream Aggregate (Aggregate) icon and a popup containing information about the icon is displayed. This popup indicates that Physical Operation is Stream Aggregate, Logical Operation is Aggregate, Estimated Execution Mode is Row, Estimated Operator Cost is 0.00053 (0%), Estimated I/O Cost is 0, Estimated CPU Cost is 0.0005295, Estimated Subtree Cost is 2.40663, Estimated Number of Executions is 1, Estimated Number of Rows is 167.562, Estimated Row Size is 21 B, Note ID is 0, Output List is [AdventureWorksDW2012].[dbo].[DimProduct].ProductSubcategoryKey, [AdventureWorksDW2012].[dbo].[DimDate].CalendarYear, Expr1006, and Group By is [AdventureWorksDW2012].[dbo].[DimProduct].ProductSubcategoryKey, [AdventureWorksDW2012]. [dbo].[DimDate].CalendarYear. Next the instructor runs the following code: --Create the Columnstore index: CREATE COLUMNSTORE INDEX CSI_InternetSales ON dbo.FactInternetSales (SalesOrderLineNumber, SalesTerritoryKey, CustomerKey, ProductKey, OrderDateKey, OrderQuantity, SalesAmount, UnitPrice, DiscountAmount); GO The Messages tabbed page is displayed. This tabbed page indicates that the commands were completed successfully. 
Then the instructor selects the following code and clicks the Display Estimated Execution Plan icon: SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales FROM dbo.FactInternetSales AS I INNER JOIN dbo.DimProduct AS P ON I.ProductKey = p.ProductKey INNER JOIN dbo.DimDate AS d ON I.OrderDateKey = D.DateKey GROUP BY P.ProductSubcategoryKey, D.CalendarYear ORDER BY P.ProductSubcategoryKey, D.CalendarYear; GO The Messages and Execution plan tabbed pages are displayed. The Execution plan tabbed page is open and contains one section called Query 1: Query cost (relative to the batch): 100%. This section pertains to the selected command and contains six icons.
The first icon is a Hash Match (Inner Join) icon with 3% cost. The second and third icons are at the same level and are called Index Scan (NonClustered) with 1% cost and Hash Match (Aggregate) with 41% cost. Both these icons connect to the Hash Match (Inner Join) icon with 3% cost. The fourth icon is a Hash Match (Inner Join) icon with 39% cost. This icon connects to the Hash Match (Aggregate) icon with 41% cost. The remaining two icons are Clustered Index Scan (Clustered) with 4% cost and Columnstore Index Scan (NonClustered) with 9% cost. The instructor points to the Columnstore Index Scan (NonClustered) icon and a popup containing information about the icon is displayed. This popup indicates that Physical Operation is Columnstore Index Scan, Logical Operation is Index Scan, Estimated Execution Mode is Row, Storage is ColumnStore, Estimated I/O Cost is 0.0157176, Estimated Operator Cost is 0.0823124 (9%), Estimated Subtree Cost is 0.0823124, Estimated CPU Cost is 0.0665948, Estimated Number of Executions is 1, Estimated Number of Rows is 60398, Estimated Row Size is 23 B, Ordered is False, Note ID is 7, Object is [AdventureWorksDW2012].[dbo].[FactInternetSales].[CSI_InternetSales][I], and Output List is [AdventureWorksDW2012].[dbo].[FactInternetSales].ProductKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].OrderDateKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].SalesAmount.
Now I can actually use my query hint of WITH INDEX of zero. Index 0 forces a scan of the base table, which here means the clustered index. So what I want to do is, I want to compare these two. I want to compare my query that utilizes the clustered index, which is the FactInternetSales primary key, with the query that utilizes the columnstore index. And here's what you'll notice in the comparison.
Graphic
The SQLQuery5.sql file is displayed in Microsoft SQL Server Management Studio. The instructor adds the following code: SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales FROM dbo.FactInternetSales AS I WITH (INDEX(0)) INNER JOIN dbo.DimProduct AS P ON I.ProductKey = p.ProductKey INNER JOIN dbo.DimDate AS d ON I.OrderDateKey = D.DateKey GROUP BY P.ProductSubcategoryKey, D.CalendarYear ORDER BY P.ProductSubcategoryKey, D.CalendarYear; Then the instructor runs the following code: SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales FROM dbo.FactInternetSales AS I WITH (INDEX(0)) INNER JOIN dbo.DimProduct AS P ON I.ProductKey = p.ProductKey INNER JOIN dbo.DimDate AS d
ON I.OrderDateKey = D.DateKey GROUP BY P.ProductSubcategoryKey, D.CalendarYear ORDER BY P.ProductSubcategoryKey, D.CalendarYear; GO SELECT P.ProductSubcategoryKey, D.CalendarYear, SUM(I.SalesAmount) AS Sales FROM dbo.FactInternetSales AS I INNER JOIN dbo.DimProduct AS P ON I.ProductKey = p.ProductKey INNER JOIN dbo.DimDate AS d ON I.OrderDateKey = D.DateKey GROUP BY P.ProductSubcategoryKey, D.CalendarYear ORDER BY P.ProductSubcategoryKey, D.CalendarYear; The Messages and Execution plan tabbed pages are displayed. The Execution plan tabbed page is open and a section called Query 1: Query cost (relative to the batch): 73% is displayed. This section pertains to the selected command and contains seven icons. The instructor scrolls down to display and point to the last icon called Clustered Index Scan (Clustered) with 66% cost. A popup containing information about the icon is displayed. This popup indicates that Physical Operation is Clustered Index Scan, Logical Operation is Clustered Index Scan, Estimated Execution Mode is Row, Estimated Operator Cost is 1.58602 (66%), Estimated I/O Cost is 1.5195, Estimated CPU Cost is 0.0665163, Estimated Subtree Cost is 1.58602, Estimated Number of Executions is 1, Estimated Number of Rows is 60398, Estimated Row Size is 23 B, Ordered is False, Note ID is 7, Object is [AdventureWorksDW2012].[dbo].[FactInternetSales].[PK_FactInternetSales_SalesOrderNumber_SalesOrderLineNumber][I], and Output List is [AdventureWorksDW2012].[dbo].[FactInternetSales].ProductKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].OrderDateKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].SalesAmount.
First, it shows that the first query, which uses the query hint to force the optimizer to utilize the primary key of FactInternetSales, takes 73% of the overall cost. In comparison, my query that utilizes the query optimizer's choice of the columnstore index only takes 27% of the cost. So you can see a great difference between these two. Now I'm able to see here that utilizing the columnstore index CSI_InternetSales is much more efficient; it has a lower cost. So the columnstore index, again, is not like a nonclustered or a clustered rowstore index. It is based upon the columns. It provides the ability to include a number of different columns that will benefit specific queries. It is ideal to place on a fact table because the data types in fact tables are, generally, going to be numeric. So they're going to take less space than what you would see with a char, varchar, or a Unicode char or Unicode variable character column. The limitation, however, is that before I can do an incremental or a full update of a table that utilizes a columnstore index, I must first drop the index, then do the transaction, and then recreate that index.
Graphic
The SQLQuery5.sql file is displayed in Microsoft SQL Server Management Studio. The Execution plan tabbed page is open. The instructor scrolls down to display the section called Query 2: Query cost (relative to the batch): 27%. This section contains, at least, six icons. The first four icons are SELECT with 0% cost, Stream Aggregate (Aggregate) with 0% cost, Sort with 2% cost, and Hash Match (Inner Join) with 3% cost. Each of these icons connects to its preceding icon. The next two icons are Index Scan (NonClustered) with 1% cost and Hash Match (Aggregate) with 41% cost. Both these icons connect to the Hash Match (Inner Join) icon. Then the instructor scrolls to the right to display three more icons. The first icon called Hash Match (Inner Join) with 39% cost connects to the Hash Match (Aggregate) icon with 41% cost. The next two icons are the Clustered Index Scan (Clustered) with 4% cost and Columnstore Index Scan (NonClustered) with 9% cost. Both these icons connect to the Hash Match (Inner Join) icon with 39% cost. The instructor points to the last icon called Columnstore Index Scan (NonClustered) with 9% cost. A popup containing information about the icon is displayed. This popup indicates that Physical Operation is Columnstore Index Scan, Logical Operation is Index Scan, Estimated Execution Mode is Row, Storage is ColumnStore, Estimated I/O Cost is 0.0157176, Estimated Operator Cost is 0.0823124 (9%), Estimated Subtree Cost is 0.0823124, Estimated CPU Cost is 0.0665948, Estimated Number of Executions is 1, Estimated Number of Rows is 60398, Estimated Row Size is 23 B, Ordered is False, Note ID is 7, Object is [AdventureWorksDW2012].[dbo].[FactInternetSales].[CSI_InternetSales][I], and Output List is [AdventureWorksDW2012].[dbo].[FactInternetSales].ProductKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].OrderDateKey, [AdventureWorksDW2012].[dbo].[FactInternetSales].SalesAmount.
So after mentioning the limitations of columnstore indexes, what are some of the best practices? When would it be useful? If you have a very large dimension table, that can benefit. But small dimensions do not benefit from columnstore indexing. Other best practices: obviously, read-mostly workloads. If you've created a table that maintains your historic information, which means you won't have updates and it's read-only, you will benefit, since neither the columnstore index nor the table can be updated without dropping the index. Do not use it on data that is updated often or for small lookup queries. As for the limitations of columnstore indexes: again, it is a read-only index, and it can only be nonclustered. There can only be one columnstore index per table. On a partitioned table, it must be partition-aligned. It cannot be created on indexed views, and a columnstore index does not support filtering.
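Since only one columnstore index is allowed per table, it can help to check what already exists before creating one. The query below is a small sketch against the standard catalog view sys.indexes; the column aliases are chosen here for illustration.

```sql
-- List every columnstore index in the current database.
-- On SQL Server 2012 the type_desc value is NONCLUSTERED COLUMNSTORE.
SELECT OBJECT_SCHEMA_NAME(i.object_id) AS SchemaName,
       OBJECT_NAME(i.object_id)        AS TableName,
       i.name                          AS IndexName,
       i.type_desc                     AS IndexType
FROM sys.indexes AS i
WHERE i.type_desc LIKE '%COLUMNSTORE%'
ORDER BY SchemaName, TableName;
```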
2. Demo: Partitioning
So table and index partitioning was first introduced in SQL Server 2005. It is part of the database engine. Now previous to 2005, we had what was called partitioning, but it was partitioning using views. All right, so just to give you some insight, partitioning is now done in the database engine for tables and indexes. In 2000 and before, consider that we had data that was spread out over different servers. Let's just say, for example, retail sales. I have four different stores: a northwest store, a southwest store, a northeast store, and a southeast store. Each store has an identical schema, and each store maintains its own database on its own instance of SQL Server.
Graphic
The SQLQuery6.sql file is open in Microsoft SQL Server Management Studio, the AdventureWorksDW2012 database is selected, and the following code is displayed: -- create partition function CREATE PARTITION FUNCTION pf_OrderDate (int) AS RANGE LEFT FOR VALUES (20081231, 20091231); GO -- Create the partition scheme, the 4 filegroups will need to be added to the database -- before this code is executed, do not show in the demo, however it should be noted to the learner CREATE PARTITION SCHEME ps_OrderDate AS PARTITION pf_OrderDate TO (fg2008, fg2009, fg2010, fg2011); GO -- Create the fact table that will be partitioned CREATE TABLE dbo.FactSaleOrders (CustomerKey int NOT NULL, ProductKey int NOT NULL, OrderDateKey int NOT NULL,
Now what I need to do is create a means of showing the northeast, northwest, southeast, and southwest data all together. In order to do this, I create a query. That query is going to be a distributed query with a UNION or UNION ALL. And just to give you an example, I could say, CREATE VIEW DistSales AS. Now I'm going to say SELECT all from northwest. And this would, generally, be a linked server, or I could use OPENQUERY. But just for simplicity's sake, I'll use a linked server. So I have my server name NW, followed by the database name DataWarehouse, followed by the dbo schema, followed by the table or the view, Sales. So now I'm selecting all of that, but that's only coming from my northwest. So what I want to do is UNION ALL with the others. Here I'll say northeast, and here I'll say southwest and southeast. All right, so this one query, this one view, is going to return this partitioned type of table.
Graphic
The SQLQuery6.sql file is open in Microsoft SQL Server Management Studio. The instructor clicks the New Query button and the SQLQuery7.sql file opens. In this file, the instructor enters the following code: CREATE VIEW DistSales AS SELECT * FROM NW.DataWarehouse.dbo.Sales UNION ALL
SELECT * FROM NE.DataWarehouse.dbo.Sales UNION ALL SELECT * FROM SW.DataWarehouse.dbo.Sales UNION ALL SELECT * FROM SE.DataWarehouse.dbo.Sales
Now with a view like that, the partitioning isn't really done in the database engine. That all changed in SQL Server 2005. In 2005, we have the ability to create partitioned tables and partitioned indexes. In order to create a partitioned table, the first thing that I need to do is create a PARTITION FUNCTION. That PARTITION FUNCTION defines the boundary values for the ranges. And here I'm saying, I want to CREATE a PARTITION FUNCTION called pf_OrderDate. I then want to specify the RANGE as LEFT, which means each boundary value belongs to the partition on its left. Anything that is less than or equal to December 31, 2008, will go in the first partition. Anything that is greater than December 31, 2008, up to and including my next value of 20091231, will fall within that middle range. Then anything that is greater than December 31, 2009, will fall within the last range. Once my PARTITION FUNCTION is created, I need to use a PARTITION SCHEME. My PARTITION SCHEME will reference my PARTITION FUNCTION. I must have a minimum number of file groups available, one for each partition in that range.
Graphic
The SQLQuery7.sql file is open in Microsoft SQL Server Management Studio. The instructor navigates back to SQLQuery6.sql and runs the following code: -- create partition function CREATE PARTITION FUNCTION pf_OrderDate (int) AS RANGE LEFT FOR VALUES (20081231, 20091231); GO The Messages tabbed page is displayed. This tabbed page indicates that the commands were completed successfully.
Now in this case, I have a RANGE LEFT, so the first partition will be anything up to and including December 31, 2008. So that's one file group. Anything after December 31, 2008, through December 31, 2009, that is a second file group. And anything that is greater than December 31, 2009, is a third. So I need a minimum of three file groups in this case, and that's going to be based upon how many ranges there are. You can see, in this PARTITION SCHEME, I specify four file groups. Now that's fine. I have to have a minimum of three, but I can have more than three.
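One way to check how RANGE LEFT assigns boundary values is the $PARTITION function, which returns the partition number a value would map to. This sketch assumes pf_OrderDate has been created as in the demo; the sample date keys are chosen here to sit on either side of each boundary.

```sql
-- Map sample OrderDateKey values to partition numbers for
-- pf_OrderDate: RANGE LEFT FOR VALUES (20081231, 20091231).
SELECT v.OrderDateKey,
       $PARTITION.pf_OrderDate(v.OrderDateKey) AS PartitionNumber
FROM (VALUES (20081230),   -- partition 1
             (20081231),   -- partition 1: the boundary belongs to the left
             (20090101),   -- partition 2
             (20091231),   -- partition 2: the boundary belongs to the left
             (20100101)    -- partition 3
     ) AS v(OrderDateKey);
```

With RANGE RIGHT, each boundary value would instead belong to the partition on its right, so 20081231 would land in partition 2.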
Graphic
The SQLQuery6.sql file is open in Microsoft SQL Server Management Studio.
Now if I attempt to create this PARTITION SCHEME utilizing my PARTITION FUNCTION, you'll see it shows Invalid object name. Well, that's because my AdventureWorks database must have those file groups defined. So what I'll do now is add some files. I'll call this one FG1, and it'll be Rows. For the file group, I'll add a new file group. I want fg2008, fg2009, fg2010, and fg2011, okay. So 2008, I'll add Rows, then I'm going to add another file group. Let me make sure, yeah, 2008, 2009, good. 2009, I'll add another, 2010, and my final, 2011. All right, so that creates the files and the file groups necessary to support this partition scheme.
Graphic
The SQLQuery6.sql file is open in Microsoft SQL Server Management Studio and the instructor runs the following code: CREATE PARTITION SCHEME ps_OrderDate AS PARTITION pf_OrderDate TO (fg2008, fg2009, fg2010, fg2011); GO An error message is displayed in the Messages tabbed page. The error message, at level 16, state 58, line 1, indicates that an invalid object with the name fg2008 was found. Then from Object Explorer, the instructor right-clicks the AdventureWorksDW2012 node and then selects Properties. As a result, the Database Properties - AdventureWorksDW2012 dialog box opens. In this dialog box, from the Select a page pane, the instructor selects Files and clicks the Add button. A new row is added to the Database files list box. By default, the Logical Name of the row is blank, File Type is Rows, Filegroup is PRIMARY, Initial Size (MB) is 3, Autogrowth / Maxsize is By 1 MB, Unlimited. The instructor specifies the Logical Name of the new row as FG1 and selects <new filegroup> from the Filegroup column of the row. As a result, the New Filegroup for AdventureWorksDW2012 dialog box is displayed. In this dialog box, the instructor specifies the Name as Fg2008 and clicks OK. The dialog box closes and the Filegroup for the new row changes to Fg2008. Similarly, the instructor adds three more rows in the Database files list with Logical Names FG2, FG3, and FG4 and Filegroups fg2009, fg2010, and fg2011, respectively. Next the instructor closes the New Filegroup for AdventureWorksDW2012 dialog box.
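The same files and file groups can also be added with T-SQL instead of the Database Properties dialog box. The following is a sketch only: the logical names and sizes mirror the dialog box values described above, while the C:\SQLData path and the physical file names are assumptions that would need to match a folder on the actual server.

```sql
-- Sketch: add the four filegroups and one data file in each.
-- The FILENAME paths are hypothetical; adjust them for the server.
ALTER DATABASE AdventureWorksDW2012 ADD FILEGROUP fg2008;
ALTER DATABASE AdventureWorksDW2012 ADD FILEGROUP fg2009;
ALTER DATABASE AdventureWorksDW2012 ADD FILEGROUP fg2010;
ALTER DATABASE AdventureWorksDW2012 ADD FILEGROUP fg2011;
GO

ALTER DATABASE AdventureWorksDW2012
ADD FILE (NAME = N'FG1', FILENAME = N'C:\SQLData\FG1.ndf',
          SIZE = 3MB, FILEGROWTH = 1MB)
TO FILEGROUP fg2008;

ALTER DATABASE AdventureWorksDW2012
ADD FILE (NAME = N'FG2', FILENAME = N'C:\SQLData\FG2.ndf',
          SIZE = 3MB, FILEGROWTH = 1MB)
TO FILEGROUP fg2009;

ALTER DATABASE AdventureWorksDW2012
ADD FILE (NAME = N'FG3', FILENAME = N'C:\SQLData\FG3.ndf',
          SIZE = 3MB, FILEGROWTH = 1MB)
TO FILEGROUP fg2010;

ALTER DATABASE AdventureWorksDW2012
ADD FILE (NAME = N'FG4', FILENAME = N'C:\SQLData\FG4.ndf',
          SIZE = 3MB, FILEGROWTH = 1MB)
TO FILEGROUP fg2011;
GO
```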
When I execute it now, it's been created. It now shows that file group fg2011 is marked as the next used file group in the partition scheme. Well, you know why? Because I only need three. I need a file group for RANGE LEFT for anything up to December 31, 2008, one for anything between December 31, 2008, and December 31, 2009, and one for anything greater than December 31, 2009. So it marks fg2011 as the next used file group. After I've created my PARTITION FUNCTION and created my PARTITION SCHEME, I can now create my partitioned table or partitioned index. I use Data Definition Language, or DDL, to do this. The only difference that you'll notice in this DDL statement is that I say I want this ON my ps_OrderDate. Well, ps_OrderDate refers to my order date PARTITION SCHEME. It applies the file groups. Now I also must specify the column in which to check the values. The column is the OrderDateKey, and that OrderDateKey value determines which range a row falls into. When I execute this, my table has been created. As values are inserted or updated, the location of the file group will automatically be dictated by the value in the OrderDateKey.
Graphic
The SQLQuery6.sql file is open in Microsoft SQL Server Management Studio and the instructor runs the following code: CREATE PARTITION SCHEME ps_OrderDate AS PARTITION pf_OrderDate TO (fg2008, fg2009, fg2010, fg2011); GO As a result, the text in the Messages tabbed page indicates that the partition scheme ps_OrderDate was created successfully. The message also indicates that fg2011 is marked as the next used filegroup in the newly created partition scheme. Next the instructor runs the following code: CREATE TABLE dbo.FactSaleOrders (CustomerKey int NOT NULL, ProductKey int NOT NULL, OrderDateKey int NOT NULL, OrderNom int NOT NULL, LineNom int NOT NULL, Quantity smallint NULL, SalesAmount money NULL, CONSTRAINT PK_FactSalesOrder PRIMARY KEY CLUSTERED( CustomerKey, ProductKey, OrderDateKey, OrderNom, LineNom) ) ON ps_OrderDate (OrderDateKey); GO The Messages tabbed page indicates that the commands were completed successfully.
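To confirm where each partition of the new table lives, the catalog views can be joined as in the sketch below. It assumes dbo.FactSaleOrders and ps_OrderDate were created as in the demo; the column aliases are illustrative.

```sql
-- Show each partition of dbo.FactSaleOrders with its filegroup
-- and current row count (clustered index or heap only).
SELECT p.partition_number AS PartitionNumber,
       fg.name            AS FilegroupName,
       p.rows             AS RowCnt
FROM sys.partitions AS p
INNER JOIN sys.indexes AS i
        ON i.object_id = p.object_id
       AND i.index_id  = p.index_id
INNER JOIN sys.partition_schemes AS ps
        ON ps.data_space_id = i.data_space_id
INNER JOIN sys.destination_data_spaces AS dds
        ON dds.partition_scheme_id = ps.data_space_id
       AND dds.destination_id      = p.partition_number
INNER JOIN sys.filegroups AS fg
        ON fg.data_space_id = dds.data_space_id
WHERE p.object_id = OBJECT_ID(N'dbo.FactSaleOrders')
  AND i.index_id IN (0, 1)
ORDER BY p.partition_number;
```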
Table and index partitioning was first introduced in SQL Server 2005. It remains an Enterprise edition only feature. It allows you to create multiple files within multiple filegroups and then partition a table across those filegroups. Consider that, to reduce disk I/O, you want to place historic, read-only data on separate filegroups. However, it's all part of one table. We don't want to have to create multiple tables for our historic information. We create our fact table and then, as the year passes, as we move from 2012 into 2013, we can switch the oldest partition out of the table and move its data to an archive filegroup. Again, this gives us a means to reduce disk I/O. As we're doing our inserts, updates, and deletes during our incremental loads, the historic filegroups don't see that disk I/O, because they're placed on separate filegroups on separate physical disks. Only the filegroups that hold the active partitions see it. This can help to improve performance, and it can also help to improve recoverability through file or filegroup backup and restore.
Graphic
A sample filegroup active partition table contains partitions containing data for eight weeks, starting from Week 1 until Week 8. When new data is added to this partitioned table, data for Week 1 is removed and data for Week 9 is added after data from Week 8. Next data for Week 1 is placed in a filegroup archive table.
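The recoverability point can be sketched in T-SQL. This is an illustration only, assuming the fg2008 filegroup from the demo holds historic data that no longer changes; the backup path is a placeholder:

```sql
-- Mark a historic filegroup read-only once its data is no longer modified.
-- Changing this state requires exclusive access to the database.
ALTER DATABASE AdventureWorksDW2012
MODIFY FILEGROUP fg2008 READ_ONLY;
GO

-- Back up only that filegroup. The disk path is a placeholder.
BACKUP DATABASE AdventureWorksDW2012
    FILEGROUP = 'fg2008'
TO DISK = 'D:\Backups\AdventureWorksDW2012_fg2008.bak';
GO
```

A read-only filegroup only needs to be backed up once after it is marked read-only, so routine backups can concentrate on the active filegroups.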
So what we see in this slide is our Transact-SQL, or T-SQL, code that works with the PARTITION FUNCTION, the PARTITION SCHEME, and the table that we created. The PARTITION FUNCTION is called pf_OrderDate, and it uses a RANGE. The first thing that we do is create a new empty partition with ALTER PARTITION FUNCTION and SPLIT RANGE. Next we ALTER our TABLE, FactSaleOrders: we SWITCH PARTITION 1 to PARTITION 1 of our new FactSaleOrderTemp table. Then we can INSERT INTO our fact orders archive table, selecting from the fact orders temporary table. What this shows, first of all, is the ability to create partitions and then to move data around within them. The main purpose of our partitions is to reduce disk I/O. Also, consider that we want to use a columnstore index, but it has to sit on a read-only type of table. We can use our PARTITION functions and partitioning for this. This does require Enterprise edition, both for partitioning and for columnstore indexes or compression. So to recap: we ALTER our PARTITION FUNCTION to split the range, we ALTER our TABLE to SWITCH the partition into our fact orders temp table, and then we INSERT INTO the archive, selecting from the temp table. The new PARTITION RANGE boundary is based upon the date that was passed in, December 31, 2010. We can then MERGE the emptied partition to remove it from the filegroup. Once we've done that, we can DROP the temporary table and remove the filegroup. All of this can be used together to provide columnstore indexing, partitions, and compression.
If you're looking at a partitioning scheme, also keep in mind that although it reduces disk I/O and can help to improve performance, you could actually see degradation when selecting. This is because it depends on the query optimizer and on how it seeks into the partitions. It may require that you rewrite your queries to use the FORCESEEK table hint.
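As a rough illustration of such a rewrite, the query below filters on the partitioning column and adds FORCESEEK in the table's WITH clause. The key values are made up for the example, and the hint is only worth trying when the plan shows a scan that a seek would clearly beat:

```sql
-- Filtering on the partitioning column (OrderDateKey) lets the optimizer
-- touch only the partitions that can contain matching rows.
-- CustomerKey is the leading key of the clustered primary key, so an
-- index seek is possible. The literal values are illustrative.
SELECT CustomerKey, ProductKey, OrderDateKey, SalesAmount
FROM dbo.FactSaleOrders WITH (FORCESEEK)
WHERE CustomerKey = 11000
  AND OrderDateKey BETWEEN 20090101 AND 20091231;
```

If no seek-based plan exists for the query, SQL Server returns an error instead of silently scanning, so the hint should be tested against the real workload before it is kept.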
Graphic
The sample T-SQL code to create and manage data in a partitioned table is as follows:
-- Create a new empty partition for the new data
ALTER PARTITION FUNCTION pf_OrderDate() SPLIT RANGE (20101231)
GO
-- Switch the first partition in the fact table into a temporary table on the
-- same filegroup
ALTER TABLE FactSaleOrders SWITCH PARTITION 1 TO FactSaleOrderTemp PARTITION 1
GO
-- Insert temp data into the archive data
INSERT INTO FactSaleOrdersArchive SELECT * FROM FactSaleOrderTemp
GO
-- Merge the deleted partition to remove it from the filegroup
ALTER PARTITION FUNCTION pf_OrderDate() MERGE RANGE (20081231)
GO
-- Drop the filegroup and the temp table
DROP TABLE FactSaleOrderTemp
GO
ALTER DATABASE AdventureWorksDW2012 REMOVE FILEGROUP fg2008
GO
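One way to check the effect of each step is to list the partitions of the fact table with their filegroups and row counts, before and after the switch and the merge. This is a sketch against the catalog views, assuming the table is dbo.FactSaleOrders and its clustered index is the partitioned one:

```sql
-- Row count and filegroup for each partition of the fact table.
SELECT p.partition_number,
       fg.name AS filegroup_name,
       p.rows
FROM sys.partitions AS p
JOIN sys.indexes AS i
    ON i.object_id = p.object_id
   AND i.index_id = p.index_id
JOIN sys.destination_data_spaces AS dds
    ON dds.partition_scheme_id = i.data_space_id
   AND dds.destination_id = p.partition_number
JOIN sys.filegroups AS fg
    ON fg.data_space_id = dds.data_space_id
WHERE p.object_id = OBJECT_ID('dbo.FactSaleOrders')
  AND p.index_id = 1          -- clustered index
ORDER BY p.partition_number;
```

After the switch, partition 1 should report zero rows; after the merge, the partition on fg2008 should no longer be listed.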
3. Demo: Data Compression

SQL Server also supports compression. Compression can be used to reduce storage requirements, and so it can reduce disk I/O. But keep in mind that as it reduces disk I/O, it also increases CPU usage, because SQL Server has to compress as it writes and decompress as it reads. So you see an inverse relationship: reduced disk I/O with increased CPU. However, if you're storing terabytes of data, compression can improve the performance of reads and writes because it reduces the disk I/O. There are two types of data compression within SQL Server: row-level compression and page-level compression. You configure compression separately for each table and define whether that table uses page or row-level compression. Compression can be enabled at the table, at the index, or at the partition. One important thing to remember, again, is that not all editions of SQL Server support data compression. Enterprise edition, your premium edition, does support it. The specific space saved by row versus page-level compression depends on the actual data type of each column. And actually, row-level compression is a subset of page-level compression: page compression applies row compression first. So you first need to ask, will the compression improve performance? You have to weigh the disk I/O reduction against the fact that both row and page-level compression increase CPU usage. So in this demonstration, we'll look at utilizing data compression. We'll take a look at utilizing data compression when creating our tables, and we'll look at using SQL Server Management Studio, the graphical user interface, to modify our data compression. We'll start with that. Looking at AdventureWorks, here I can see in my Tables that I have a number of different tables.
Now I've found out that a huge amount of data is stored in these tables, and I need to apply some type of compression. First I need to get an idea of which tables take up the most space. One way I can do this is by right-clicking the AdventureWorks database, choosing Reports, going to Standard Reports, and then choosing Disk Usage by Table. As it opens up, it shows me the space usage by table. I want to find out which table is the largest and might benefit most from compression. I can see the reserved space as well as the number of records. I can see that DimProduct has a rather large amount of reserved space. DimCustomer does also. And here we go: FactInternetSales and FactProductInventory are my top offenders, they're the largest tables.
Graphic
The SQLQuery8.sql file is open in Microsoft SQL Server Management Studio, the AdventureWorksDW2012 database is selected, and the following code is displayed: -- you first need to create the temporary and the archive tables that will be used (do not show on screen) CREATE TABLE [dbo].[FactSaleOrdersArchive]( [CustomerKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [OrderDateKey] [int] NOT NULL, [OrderNom] [int] NOT NULL, [LineNom] [int] NOT NULL, [Quantity] [smallint] NULL, [SalesAmount] [money] NULL, CONSTRAINT [PK_FactSalesOrderArchive] PRIMARY KEY CLUSTERED ( [CustomerKey] ASC, [ProductKey] ASC, [OrderDateKey] ASC, [OrderNom] ASC, [LineNom] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO CREATE TABLE [dbo].[FactSaleOrderTemp]( [CustomerKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [OrderDateKey] [int] NOT NULL, [OrderNom] [int] NOT NULL, [LineNom] [int] NOT NULL, [Quantity] [smallint] NULL,
From Object Explorer, the instructor expands the Tables node from the AdventureWorksDW2012 node and a number of tables are displayed. Then the instructor collapses the Tables node and right-clicks the AdventureWorksDW2012 node. From the resultant shortcut menu, the instructor points to Reports, points to Standard Reports, and then selects Disk Usage by Table. A report titled Disk Usage by Table is displayed. This report contains a table with columns such as Table Name, # Records, Reserved (KB), Data (KB), Indexes (KB), and Unused (KB). The instructor notes the details of a few tables. The dbo.DimProduct table has 606 records, with 19,832 KB of reserved space, 19,616 KB of data, and 184 KB of indexes. The dbo.DimCustomer table has 18,484 records, 13,608 KB of reserved space, 12,552 KB of data, and 920 KB of indexes. The dbo.FactInternetSales table has 60,398 records, 36,704 KB of reserved space, 17,216 KB of data, and 14,040 KB of indexes. The dbo.FactProductInventory table has 776,286 records, 49,512 KB of reserved space, 45,552 KB of data, and 15,784 KB of indexes.
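The same kind of ranking can be obtained without the report. The query below is an approximate T-SQL counterpart, not the query the report itself runs; it assumes permission to read sys.dm_db_partition_stats in the current database:

```sql
-- Approximate counterpart of the Disk Usage by Table report.
-- Pages are 8 KB, so page counts are multiplied by 8 to get KB.
SELECT s.name + '.' + t.name AS table_name,
       -- Count rows only once, from the heap (0) or clustered index (1).
       SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS row_count,
       SUM(ps.reserved_page_count) * 8 AS reserved_kb,
       SUM(ps.used_page_count) * 8 AS used_kb
FROM sys.dm_db_partition_stats AS ps
JOIN sys.tables AS t
    ON t.object_id = ps.object_id
JOIN sys.schemas AS s
    ON s.schema_id = t.schema_id
GROUP BY s.name, t.name
ORDER BY reserved_kb DESC;
```

The tables at the top of the result are the first candidates to evaluate for compression.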
Now I want to look at my FactProductInventory. I'm going to right-click it, go to Storage, and choose Manage Compression, and I get a wizard. I'll say Next. Now I want to use the same compression for all partitions, and even though this is only a single partition, you can select this. If you're using partitioning within the table, then you can specify the partitions to apply it to. Next I specify whether to use Row or Page level. I'm going to choose Row level compression, and I'll say Calculate. What this does is go through and tell me the overall savings that might occur. Here we go: current space is 45 MB. If I apply Row level compression, it will cut that almost in half, to just under 24 MB. If I apply Page level compression, again, I'll click Calculate and we'll wait for it to calculate and see what that difference is. So even though I created the table and didn't initially apply compression, compression can be applied afterwards. Here it shows me that the space would be cut by more than half, to about 18 MB, by using Page level compression. I could then say Next and walk through to create the script that would apply that compression, or run it immediately.
Graphic
The Disk Usage by Table report is open in Microsoft SQL Server Management Studio. From Object Explorer, the instructor right-clicks the dbo.FactProductInventory table, points to Storage, and selects Manage Compression. As a result the Welcome page of the Data Compression Wizard opens. The instructor clicks Next and the Select Compression Type page opens. This page indicates that the selected table contains 1 uncompressed partition with a row count of 776286. The page also contains the Use same compression type for all partitions checkbox. The instructor selects the Row option from the Compression type drop-down list box of the selected partition and clicks Calculate. While the wizard is still calculating, the instructor selects the Page option from the Compression type drop-down list. Later, the table indicates that the Current space is 45.008 MB and the Requested compressed space after Row Compression type is 23.813 MB. The instructor clicks Calculate again to check for Page Compression type and the Requested compressed space is set to 18.453 MB. Next the instructor clicks Next and the Select an Output Option page opens. This page contains the Create script, Run immediately, and Schedule options. The Create script option is selected by default. The page also contains the Script options section that contains the Script to file option which contains the File name text box and the Unicode text and ANSI text options for the Save as subsection. The Script options section also contains the Script to Clipboard and Script to New Query Window options. The Script to New Query Window option is selected by default. Next the instructor clicks Cancel to close the wizard.
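A similar estimate is available from T-SQL through the sp_estimate_data_compression_savings system stored procedure. The calls below are a sketch for the same table; whether the wizard uses this procedure internally is not stated in the demo:

```sql
USE AdventureWorksDW2012;
GO
-- Estimate ROW compression for every index and partition of the table.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactProductInventory',
     @index_id         = NULL,   -- NULL = all indexes
     @partition_number = NULL,   -- NULL = all partitions
     @data_compression = 'ROW';
GO
-- Same estimate for PAGE compression.
EXEC sp_estimate_data_compression_savings
     @schema_name      = 'dbo',
     @object_name      = 'FactProductInventory',
     @index_id         = NULL,
     @partition_number = NULL,
     @data_compression = 'PAGE';
GO
```

Each call returns the current size and the estimated size under the requested setting, per index and partition, based on a sample of the data.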
Now in this example, we can create our tables and apply compression immediately upon creation, within the DDL statement. Here you can see I have my CREATE TABLE statement. I define my table and all of the columns as well as the data types. I define the constraint or constraints, and then I specify the compression that I want to use. Here I'm going to say ON [PRIMARY], IGNORE_DUP_KEY, ALLOW_ROW_LOCKS. And I specify the filegroup on which to create this. I'm going to create the temporary and archive tables first, and now I can go back in. And I've decided that, you know what, my FactSaleOrderTemp, or actually that archive table, my FactSaleOrdersArchive, will begin to take up quite a bit of space.
Graphic
The Disk Usage by Table report is open in Microsoft SQL Server Management Studio. The instructor closes this report and the SQLQuery8.sql file is displayed. The instructor scrolls down to display the following code:
CREATE TABLE [dbo].[FactSaleOrderTemp]( [CustomerKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [OrderDateKey] [int] NOT NULL, [OrderNom] [int] NOT NULL, [LineNom] [int] NOT NULL, [Quantity] [smallint] NULL, [SalesAmount] [money] NULL, CONSTRAINT [PK_FactSalesOrderTemp] PRIMARY KEY CLUSTERED ( Then the instructor runs the following code: -- you first need to create the temporary and the archive tables that will be used (do not show on screen) CREATE TABLE [dbo].[FactSaleOrdersArchive]( [CustomerKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [OrderDateKey] [int] NOT NULL, [OrderNom] [int] NOT NULL, [LineNom] [int] NOT NULL, [Quantity] [smallint] NULL, [SalesAmount] [money] NULL, CONSTRAINT [PK_FactSalesOrderArchive] PRIMARY KEY CLUSTERED ( [CustomerKey] ASC, [ProductKey] ASC, [OrderDateKey] ASC, [OrderNom] ASC, [LineNom] ASC )WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY] ) ON [PRIMARY] GO The Messages tabbed page is displayed. This tabbed page indicates that the commands were completed successfully. Next the instructor runs the following command: CREATE TABLE [dbo].[FactSaleOrderTemp]( [CustomerKey] [int] NOT NULL, [ProductKey] [int] NOT NULL, [OrderDateKey] [int] NOT NULL, [OrderNom] [int] NOT NULL, [LineNom] [int] NOT NULL, [Quantity] [smallint] NULL, [SalesAmount] [money] NULL, CONSTRAINT [PK_FactSalesOrderTemp] PRIMARY KEY CLUSTERED ( [CustomerKey] ASC, [ProductKey] ASC,
[OrderDateKey] ASC, [OrderNom] ASC, [LineNom] ASC )) ON fg2008 GO The Messages tabbed page is displayed. This tabbed page indicates that the commands were completed successfully. Next the instructor right-clicks the Tables node from Object Explorer and clicks Refresh.
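The CREATE TABLE statements shown in the graphic do not include a compression option. A minimal sketch of setting compression at creation time, using a hypothetical table name and the same columns as the demo tables, with the option placed on the clustered primary key that stores the table's rows:

```sql
-- Hypothetical table name; columns mirror the demo tables.
CREATE TABLE dbo.FactSaleOrdersCompressed
(
    CustomerKey  int      NOT NULL,
    ProductKey   int      NOT NULL,
    OrderDateKey int      NOT NULL,
    OrderNom     int      NOT NULL,
    LineNom      int      NOT NULL,
    Quantity     smallint NULL,
    SalesAmount  money    NULL,
    CONSTRAINT PK_FactSaleOrdersCompressed PRIMARY KEY CLUSTERED
        (CustomerKey, ProductKey, OrderDateKey, OrderNom, LineNom)
        WITH (DATA_COMPRESSION = PAGE)   -- or ROW, or NONE
) ON [PRIMARY];
GO
```

Rows are then compressed as they are written, so no later rebuild is needed to get the space savings.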
So I want to ensure that compression is enabled on that. If I go to my FactSaleOrdersArchive, look at Storage, and look at Manage Compression, I can now go in and apply compression at Page or Row level. Since this table can span filegroups or partitions, I'll specify that it be used for all of the partitions. I'll generate the script, and now I'm going to ALTER my TABLE and specify that I want to rebuild it with compression set to PAGE level compression. So in this demonstration, we looked at applying page or row-level compression to our tables.
Graphic
The SQLQuery8.sql file is open in Microsoft SQL Server Management Studio. The instructor right-clicks the dbo.FactSaleOrdersArchive table from Object Explorer. From the resultant shortcut menu, the instructor points to Storage and clicks Manage Compression. As a result the Welcome page of the Data Compression Wizard is displayed. The instructor clicks Next on this page and on the Select Compression Type page, selects the Page option from the Compression type drop-down list. Then the instructor clicks to select the Use same compression type for all partitions checkbox and clicks Next. On the Select an Output Option page, the instructor clicks Next. The Data Compression Wizard Summary page opens. On this page, the instructor clicks Finish and the Compression Wizard Progress page is displayed. After the procedure is successful, the instructor clicks Close and the wizard closes. The SQLQuery9.sql file is displayed with the following code: USE [AdventureWorksDW2012] ALTER TABLE [dbo].[FactSaleOrdersArchive] REBUILD PARTITION = ALL WITH (DATA_COMPRESSION = PAGE)
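To confirm that the rebuild took effect, the compression setting of each partition can be read from sys.partitions. A short check, assuming the archive table from the demo:

```sql
-- data_compression_desc reports NONE, ROW, or PAGE for each partition
-- of a rowstore table or index.
SELECT OBJECT_NAME(p.object_id) AS table_name,
       p.index_id,
       p.partition_number,
       p.data_compression_desc
FROM sys.partitions AS p
WHERE p.object_id = OBJECT_ID('dbo.FactSaleOrdersArchive');
```

After the ALTER TABLE ... REBUILD statement above has run, the row for the table's clustered index should show PAGE.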
2013 SkillSoft Ireland Limited
