
TECH2TECH HANDS ON

Using PPIs to improve performance


Tips on how partition elimination can boost your query workload.

by Paul Sinclair

Partitioned primary indexes (PPIs), introduced in Teradata Database V2R5 and extended in Teradata Database 12.0 to multiple levels (referred to as multilevel partitioned primary indexes, or MLPPIs), provide an opportunity to significantly improve the performance of certain queries and high-volume insert, update and delete operations.

The performance gain depends on the number of partitions and the specific query being measured. In the best case, the query conditions allow every partition but one to be eliminated for each partitioning expression. With thousands of combined partitions, the I/O for such a query can be reduced to less than 1% of the I/O that it takes to run the same query against the table with a non-partitioned primary index (NPPI). Even with only tens or hundreds of partitions, huge improvements can be made in some queries.

Significant performance benefits can be achieved, therefore, if the data demographics and queries in the workload lead to partition elimination. Careful selection of the partitioning expressions for a table or set of tables, including the choice of partitioning columns, number of partitions per level and number of partitioning expressions, is required to be successful. In some cases, modifying the tables can further improve partitioning usefulness.

Partitioning basics

> Each partition on a level is subpartitioned on the next level.
> A partition number indicates in which partition a row is assigned for a particular level.
> Partitioning columns are the columns in a table that are used in a partitioning expression.
> A partitioning expression, based on one or more partitioning columns, computes the partition number for a level.
> Up to 15 partitioning expressions (one per partitioning level) may be specified for a primary index.

MLPPI

The following shows a CREATE TABLE statement for a table with an MLPPI:

    CREATE TABLE Sales
      (storeid INTEGER NOT NULL,
       productid INTEGER NOT NULL,
       salesdate DATE FORMAT 'yyyy-mm-dd' NOT NULL,
       totalrevenue DECIMAL(13,2),
       totalsold INTEGER,
       note VARCHAR(256))
    UNIQUE PRIMARY INDEX (storeid, productid, salesdate)
    PARTITION BY (
      RANGE_N(salesdate BETWEEN DATE '2002-01-01' AND DATE '2008-12-31'
              EACH INTERVAL '1' YEAR),
      RANGE_N(storeid BETWEEN 1 AND 300 EACH 100),
      RANGE_N(productid BETWEEN 1 AND 400 EACH 100));

This table is first partitioned by year based on salesdate. Next, within each year the data will be partitioned by storeid in groups of 100. Finally, within each cluster of years and storeid group, the data will be partitioned by productid in groups of 100. With seven years, three groups of storeids and four groups for productids, this partitioning defines 84 (7*3*4) combined partitions.

Without the partitioning, secondary indexes are often used to provide access to the table for the various dimensions so that performance is improved over a full-table scan. Partitioning avoids the overhead of storing and maintaining such secondary indexes and the query cost of going back to the base table after retrieving the rowids from the index. Partitioning puts rows with similar values in clusters so that access to the table for the various dimensions can be done by eliminating all but the partitions for the dimension values of interest. Only the data blocks associated with the non-eliminated partitions need to be read. This results in fewer data block I/Os needed to retrieve the qualifying rows.

In the alternative approach of rowid retrieval using a secondary index, similar rows are not clustered; consequently, a data block may contain only one or a few qualifying rows. This creates the need for many more data block I/Os to retrieve the qualifying rows. Also, partitioning supports range-based access, which is often difficult to effectively achieve with secondary indexes.

Choices

Some choices of partitioning may have trade-offs. If the partitioning columns are not part of the primary index (PI), then PI access, join and aggregation queries may be degraded, while partition elimination may improve other queries. Reducing the number of partitions or adding a secondary index on the columns of the PI can minimize the negative performance impact to such queries. Note that decreasing the number of partitions reduces the benefits of partitioning, and secondary indexes have their storage and maintenance overhead. The trade-offs must be analyzed for the query workload to determine whether the benefits offset any degradation. Typical queries in the workload can be explained and their performance measured in order to help in this analysis. The following are some tips on partitioning:

1. Use the DATE data type, if possible, for a date-based partitioning expression. A date-based partitioning expression is often a good choice for at least one level of partitioning. This will allow the Teradata Database to better recognize partition elimination opportunities.
2. Keep the partitioning expression simple. A RANGE_N partitioning expression usually works the best for partition elimination. With multilevel partitioning, while one level usually does RANGE_N date-based partitioning, other levels may use CASE_N partitioning.
3. Add query conditions on the partitioning columns, where possible, to improve partition elimination opportunities.
4. If queries join on the PI but the PI doesn’t include the partitioning column, consider propagating the partitioning column value to the other table and modifying the query to also join on the partitioning column and the column propagated to the other table.
5. Make sure the selected partitioning clusters the data so that a combined partition contains either a large number of rows (resulting in multiple data blocks per AMP) or contains no rows. This is to ensure that when a combined partition is read, a majority of rows read from the data blocks qualify. (The first and last data block may contain rows from other non-qualifying partitions.) Note that an empty partition does not take any space. As a rule of thumb, include at least 10 data blocks per combined partition per AMP. Since, on average, half of the first data block and half of the last data block will be rows for other partitions, then 90% of the rows read will be from the qualifying partition. Simple queries can be run on the data to determine how well the data clusters for a candidate set of partitioning expressions.
6. If you follow the previous tip, the order of partitioning expression shouldn’t matter too much. If all else is constant, place the partitioning expressions in ascending order based on the number of partitions they each define. However, you may want to put your date-based partitioning expression first.
7. Use tools such as Teradata Database Query Log and Index Wizard to better understand your workload, identify partitioning opportunities and verify the success of your partitioning.
8. Collect statistics on the system-derived column PARTITION and the usual recommended indexes and columns. Collecting on the partitioning columns themselves is usually also a good idea, but statistics on PARTITION may be enough for good plans. Check EXPLAINs and measure performance to make sure.

A partitioning expression is good only if queries take advantage of it (in other words, if partition elimination occurs) and all of the following work well together, based on validated trade-off choices: partitioning expressions, specific and overall queries, performance, access method, join strategy, partition elimination, data maintenance, altering the partitioning and backup/restore.

The best choice, if any, of candidate partitioning expressions depends on the mix of anticipated queries and the capability of the Optimizer to detect partition elimination. The extended logical data model can serve as the starting point for making the decision, but testing different scenarios is often still required. Though it may take some experimentation with various choices of partitioning, the potential performance improvements for queries and other operations are usually worth the effort.

Paul Sinclair, a software architect and Teradata Certified Master, has been with the Teradata Research and Development Division for more than 19 years.

Online: For more on PPI, visit Teradata.com and download the white paper “Single-level and Multilevel Partitioned Primary Indexes.”
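
As a rough sketch of tips 3, 5 and 8, the statements below run against the Sales table from the MLPPI section. The date and store ranges, the column aliases and the choice of aggregate are illustrative assumptions, not values from the article, and the wording of EXPLAIN output differs between releases, so treat this as a starting point for your own checks.

```sql
-- Tip 3 (sketch): conditions on the partitioning columns give the
-- Optimizer a chance to eliminate partitions. Ranges are assumed values.
EXPLAIN
SELECT storeid, SUM(totalrevenue) AS yearly_revenue
FROM Sales
WHERE salesdate BETWEEN DATE '2007-01-01' AND DATE '2007-12-31'
  AND storeid BETWEEN 1 AND 100
GROUP BY storeid;

-- Tip 5 (sketch): count rows per combined partition to see how the data
-- clusters. PARTITION is the system-derived column; partitions with no
-- rows simply do not appear in the result.
SELECT PARTITION AS combined_partition, COUNT(*) AS num_rows
FROM Sales
GROUP BY 1
ORDER BY 1;

-- Tip 8 (sketch): collect statistics on PARTITION and, optionally,
-- on the partitioning columns themselves.
COLLECT STATISTICS ON Sales COLUMN PARTITION;
COLLECT STATISTICS ON Sales COLUMN salesdate;
COLLECT STATISTICS ON Sales COLUMN storeid;
COLLECT STATISTICS ON Sales COLUMN productid;
```

If elimination works as hoped for the first query, only 4 of the 84 combined partitions should need to be read: one year, one storeid group and all four productid groups. The row counts from the second query are only a proxy for the rule of thumb in tip 5, which is stated in data blocks per AMP rather than rows.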

PAGE 2 | Teradata Magazine | September 2008 | ©2008 Teradata Corporation | AR-5731
