Jeff Tallman Staff SW Engineer II/Architect ITSG Engineering Evangelism Team tallman@sybase.com
Agenda
System changes
Roadmap Catalog, VLSS, scrollable cursors
Query processing
Overall improvements Optimization goals, criteria, controls Query plan metrics Showplan changes
Semantic partitions
Partitioning overview Range, hash, list in-depth discussion
[Roadmap timeline, Q3'04 - Q3'06:
- ASE 12.5.2*
- ASE 12.5.4: encrypted columns, RTDS MQ
- ASE 15.0 (beta 1, beta 2): partitions, new optimizer, function columns/indexes
- 15.1: clusters]
*ASE 12.5.2
[Chart: memory (GB) configured on a $250K server - Oracle 10g: 96, MS SQL 2003: 64, ASE 12.5.2: 22]
HP Integrity Server rx5670, 4 Intel Itanium2 1.5Ghz, 6MB Oracle: 96GB/RH 3.0; MS: 64GB/Win Server 2003; ASE: 22GB/RH 3.0 Oracle & MS SQL from official TPC-C tests Sybase ASE from unofficial/internal TPC Style (TPC Suite) (see white paper IA64_Benchmark_wp.pdf)
System Changes
Large identifiers
Removes the 30-character limit on object names (now 255, or 253 with quoted identifiers)
#temp table names: 238 distinct characters (plus a 17-byte hash)
Theoretical DB storage
32,767 DBs * 32 TB = 1,048,544 TB, approximately 1 EB (exabyte)
VLSS does not increase the theoretical maximum size of a database (32TB)
The user can define rules for when update statistics needs to run
In this example, update statistics runs only if the data in the authors table has changed by more than 50% since the last update statistics execution:
declare @datachange real
select @datachange = datachange(authors, null, null)
if @datachange > 50
begin
    update statistics authors
end
Scrollable Cursors
Pre-15.x cursors
Unidirectional only with no positioning
ASE 15.0
scrollable cursors
Explicit positioning
fetch {next | prior | first | last | absolute # | relative #} [from] <cursor_name>
fetch last [from] <cursor_name>
fetch absolute 500 [from] <cursor_name>
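The fetch forms above assume a cursor declared as scrollable; a minimal sketch against the pubs2 authors table (cursor name is illustrative):

```sql
-- Declare an insensitive, scrollable cursor (ASE 15.0 syntax)
declare au_scroll insensitive scroll cursor for
    select au_id, au_lname from authors

open au_scroll
fetch last from au_scroll          -- position on the final row
fetch absolute 500 from au_scroll  -- jump directly to row 500
fetch prior from au_scroll         -- step backward one row
close au_scroll
deallocate cursor au_scroll
```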
Agenda
System changes
Roadmap Catalog, VLSS, scrollable cursors
Query processing
Overall improvements Optimization goals, criteria, controls Query plan metrics Showplan changes
Semantic partitions
Partitioning overview Range, hash, list in-depth discussion
Improved parallelism
Vertical: Bushy join support (especially good for large number of tables) Horizontal: Parallel optimization for both data and index partitioning
Pre-ASE 15.x did not support index partitioning; index access was serialized
Improved costing, using join histograms for joins with data skews in joining columns
Optimization Goals
Syntax/settings
Server
sp_configure "optimization goal", 0, "allrows_oltp"
Session
set plan optgoal allrows_oltp
Query
select * from A order by A.a plan "(use optgoal allrows_dss)"
Optimization Criteria
Union/Union All
append_union_all, merge_union_all, hash_union_distinct, merge_union_distinct
Reformatting
store_index, multi_table_store_ind
Parallelism
parallel_query, index_intersection
Syntax
Controlling Optimization
N ! I + JTC
Now can be controlled
Optimization will time out after a specified percentage of total query processing time; the best plan found by that point will be used.
sp_configure "optimization timeout limit", 10
default is 5, range 1 - 100
Syntax
Server: sp_configure "enable metrics capture", 1
Session: set metrics_capture on/off
Accessing
QP metrics are captured in default running group Move to different group with sp_metrics groupname View with sysquerymetrics view
Each db has its own sysquerymetrics Built on <db>..sysqueryplans
Sysquerymetrics Columns
Field      Definition
uid        User ID
gid        Group ID
id         Unique ID
hashkey    The hashkey over the SQL query text
sequence   Sequence number for a row when multiple rows are required for the SQL text
exec_min   Minimum execution time
exec_max   Maximum execution time
exec_avg   Average execution time
elap_min   Minimum elapsed time
elap_max   Maximum elapsed time
elap_avg   Average elapsed time
lio_min    Minimum logical IO
lio_max    Maximum logical IO
lio_avg    Average logical IO
pio_min    Minimum physical IO
pio_max    Maximum physical IO
pio_avg    Average physical IO
cnt        Number of times the query has been executed
abort_cnt  Number of times a query was aborted by the Resource Governor because a resource limit was exceeded
text       Query text
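The columns above can be queried directly to find expensive statements; a minimal sketch (the 10000 logical-I/O threshold is an arbitrary illustration):

```sql
-- List captured queries ordered by average logical I/O
select gid, id, cnt, exec_avg, elap_avg, lio_avg, pio_avg, text
from sysquerymetrics
where lio_avg > 10000   -- illustrative threshold
order by lio_avg desc
```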
ASE 15 PlanViewer
[Charts: execution time bars (703, 674, 377, 118, 8 secs) and I/O count bars (3.6M, 1M, 8K) comparing Queries 1 - 3 in 12.5.x vs 15.0]
Database size 630 MB Query 1: Join with expressions and no indexes (infinite improvement) Query 2: Vector aggregation with Group By (exec time improvement 186%; I/O savings 83%) Query 3: Multiple scalar aggregation (exec time improvement 571%; I/O savings 93%)
Query 1
One query that takes 31 hours in 12.5.x finishes in 8 seconds in 15.0. On average this application runs 20% faster in 15.0.
TPC-H Query
Agenda
System changes
Roadmap Catalog, VLSS, scrollable cursors
Query processing
Overall improvements Optimization goals, criteria, controls Query plan metrics Showplan changes
Semantic partitions
Partitioning overview Range, hash, list in-depth discussion
Computed Columns
Some definitions
Computed columns
defined by an expression, whether from regular columns in the same row, or functions, arithmetic operators, path names, and so forth.
Function-based indexes
indexes that contain one or more expressions as index keys.
Deterministic property
a property assuring that an expression always returns the same results from a specified set of inputs.
Computed columns can be deterministic or non-deterministic
Computed columns can be materialized (evaluated and stored values) or not materialized (aka virtual)
Indexed computed columns must be materialized, but do not need to be deterministic - though care should be taken with queries against non-deterministic columns
Computed Columns
Syntax:
create table [database.[owner].]table_name
    (column_name {datatype | {compute | as} computed_column_expression
        [materialized | not materialized]} {null | not null})

create table rental
    (cust_id int,
     start_date as getdate() materialized,
     prod_id int)

create index ind_start_date on rental (start_date)
Restrictions
Column expressions can only reference columns in the same row
Similar T-SQL datatype rules & column check constraints apply; to bypass them, you can invoke a SQLJ function
You can't drop columns referenced by computed columns, etc.
Updating a computed column directly may have unexpected results - update the base columns in the expression instead
create table customer (
    first_name   varchar(20) not null,
    last_name    varchar(40) not null,
    phone_number varchar(10) not null,
    -- an example of a deterministic materialized column
    customer_id as soundex(first_name) + soundex(last_name) +
                  convert(varchar(10), phone_number) materialized not null,
    birth_date   date not null,
    -- two examples of a non-deterministic virtual column
    age_yrs as datediff(yy, birth_date, getdate()),
    type_person as convert(varchar(8),
        (case when datediff(yy, birth_date, getdate()) > 65
              then "old" else "young" end)),
    -- example of an index on a materialized computed column
    primary key (customer_id)
)
Syntax:
create [unique] [clustered | nonclustered] index index_name
    on [[database.]owner.]table_name
    (column_expression [asc | desc]
        [, column_expression [asc | desc]]...)

CREATE INDEX generalized_index on parts_table
    (general_key(part_no, listPrice, part_no>>version))
Restrictions
column_expression must be an indexable datatype - no bit, text, image, or Java class
Can't use subqueries, aggregate functions, local variables, or another computed column
SQLJ functions are allowed
Provides the same benefits as an index on a computed column, but no need to change the table schema by adding the column
Indexes on expressions: index key expressions are pre-evaluated, so they don't need to be evaluated again when accessed
XML example
CREATE TABLE Loan_table (
    xml_col image,
    id as xmlextract("//Loan_application/ID/text()", xml_col,
                     returns int) materialized,
    ssn as xmlextract("//Loan_application/SSN/text()", xml_col,
                      returns varchar(15)) materialized
)

CREATE INDEX fi_ssn ON Loan_table (id, ssn)

SELECT partial_xml from Loan_table
where id = 168 and ssn = "600-60-6000"
NOTE:
Computed columns used to extract relational elements in xml doc Indexes are created on computed column to improve performance
NOTE:
To find the qualifying XML docs, an index scan is used without evaluating xmlextract() on every row in the table - a big performance improvement
NOTE:
To find the qualifying Java objects, an index scan is used without evaluating getAuthor() on every row in the table
Agenda
System changes
Roadmap Catalog, VLSS, scrollable cursors
Query processing
Overall improvements Optimization goals, criteria, controls Query plan metrics Showplan changes
Semantic partitions
Partitioning overview Range, hash, list in-depth discussion
Segment Slices
Parallel Query/DBCC Create Index
Primary goals
Decrease last page contention Allow parallel query Allow parallel dbcc checkstorage Allow parallel index creation
Worker Threads
Segment Partitions
Assessment
Myth that they are no longer useful due to SAN disk speeds
Largely used only for parallel dbcc checkstorage
Worker Threads
Partition maintenance
Add or Alter one or more partitions
A-I
J-R
S-Z
Range/Hash Partitions
Partition-aware maintenance
Update statistics on one or all partitions Truncate, reorg, dbcc, bcp (out) partition
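A sketch of the partition-aware maintenance forms listed above (table and partition names reuse the telco_facts_ptn example from later slides):

```sql
-- Update statistics for a single data partition
update statistics telco_facts_ptn partition p1

-- Truncate just one partition, leaving the rest of the table intact
truncate table telco_facts_ptn partition p1
```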
New syntax:
Note that old partitions used :slice_num; new semantic partitions specify partition partn_name
bcp [[db_name.]owner.]table_name[:slice_num] [partition pname] {in | out} [filename]
    [-U username] [-P password] [-S server] [-m maxerrors]
    [-f formatfile] [-e errfile] [-F firstrow] [-L lastrow]
    [-b batchsize] [-n] [-c] [-t field_terminator] [-r row_terminator]
    [-I interfaces_file] [-a display_charset] [-z language] [-v]
    [-A packet size] [-J client character set] [-T text or image size]
    [-E] [-g id_start_value] [-N] [-X] [-M LabelName LabelValue]
    [-labeled] [-K keytab_file] [-R remote_server_principal]
    [-V [security_options]] [-Z security_mechanism] [-Q] [-Y]

Example:
c:\sybase\ocs-15_0\bin\bcp pubs2..hash_partn_test partition partn_1
    out hash_partn_test_1.txt -Usa -P -SASE15BETA -c
create table customer (
    c_custkey integer     not null,
    c_name    varchar(20) not null,
    c_address varchar(40) not null,
    -- other columns
)
partition by range (c_custkey)
    (cust_ptn1 values <= (20000) on segment1,
     cust_ptn2 values <= (40000) on segment2,
     cust_ptn3 values <= (60000) on segment3)
cust_ptn1: values <=20000 cust_ptn2: values <=40000 cust_ptn3: values <=60000
Index size adjusted according to the number of rows in each partition Fewer index pages searched/traversed for smaller partitions
[Diagram: queries A, B, and C contending on an unpartitioned table with an unpartitioned index (pre-Galaxy) vs. running against a partitioned table]
Restrictions/limitations
Up to 31 columns for partition keys
Think of it as:
If < key1, then partition 1
If = key1, then check key2
If > key1, then check partitions 2 - n
Wrong specification
alter table telco_facts_ptn
partition by range (month_key, customer_key)
    (p1 values <= (3, 1055000)  on part_01,
     p2 values <= (3, 1100000)  on part_02,
     p3 values <= (6, 1055000)  on part_03,
     p4 values <= (6, 1100000)  on part_04,
     p5 values <= (9, 1055000)  on part_05,
     p6 values <= (9, 1100000)  on part_06,
     p7 values <= (12, 1055000) on part_07,
     p8 values <= (12, 1100000) on part_08)
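Because the first key column dominates the comparison, the customer_key bounds above are only consulted when month_key equals the boundary value, so rows with customer_key above the last sub-range bound fall through to the next quarter's partition. A hedged sketch of a corrected specification (assuming the intent is a catch-all customer_key sub-range per quarter) uses MAX for the final bound:

```sql
alter table telco_facts_ptn
partition by range (month_key, customer_key)
    (p1 values <= (3, 1055000)  on part_01,
     p2 values <= (3, MAX)      on part_02,  -- rest of months 1-3
     p3 values <= (6, 1055000)  on part_03,
     p4 values <= (6, MAX)      on part_04,
     p5 values <= (9, 1055000)  on part_05,
     p6 values <= (9, MAX)      on part_06,
     p7 values <= (12, 1055000) on part_07,
     p8 values <= (12, MAX)     on part_08)  -- rest of the year
```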
create unique nonclustered index primary_key_idx
    on telco_facts_ptn (month_key, customer_key, service_key, status_key)
local index
    p1  on part_01, p2  on part_02, p3  on part_03, p4  on part_04,
    p5  on part_05, p6  on part_06, p7  on part_07, p8  on part_08,
    p9  on part_01, p10 on part_02, p11 on part_03, p12 on part_04,
    p13 on part_05, p14 on part_06, p15 on part_07, p16 on part_08,
    p17 on part_01, p18 on part_02, p19 on part_03, p20 on part_04,
    p21 on part_05, p22 on part_06, p23 on part_07, p24 on part_08
Example Showplan
QUERY PLAN FOR STATEMENT 1 (at line 1).

2 operator(s) under root

The type of query is SELECT.

ROOT:EMIT Operator

|SCALAR AGGREGATE Operator
|  Evaluate Ungrouped COUNT AGGREGATE
|
|  |SCAN Operator
|  |  FROM TABLE
|  |  telco_facts_ptn
|  |  [ Eliminated Partitions : 2 3 4 5 6 7 8 ]
|  |  Index : primary_key_idx
|  |  Forward Scan.
|  |  Positioning by key.
|  |  Index contains all needed columns. Base table will not be read.
|  |  Keys are:
|  |    month_key ASC
|  |  Using I/O Size 4 Kbytes for index leaf pages.
|  |  With LRU Buffer Replacement Strategy for index leaf pages.

(1 row affected)
Hash Partitioning
Uses a hash function to balance a high cardinality of values across the partitions
The hash function assures that partition keys with the same values always go to the same partition - but you cannot *CONTROL* which partition is used
Recommendations:
Large numbers of partitions with medium-to-high data cardinality
Data with no particular order (i.e. product SKUs)
Data that is typically accessed using equality (=) vs. ranges
Hash key equality allows partition elimination; a range scan, however, may require accessing many or all partitions
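A minimal sketch of creating a hash-partitioned table per the recommendations above (table, column, and segment names are illustrative):

```sql
-- Hash-partition on a high-cardinality, unordered key (e.g. a SKU)
create table product_sales (
    sku       varchar(20) not null,
    sale_date datetime    not null,
    amount    money       not null
)
partition by hash (sku)
    (sp1 on seg1,
     sp2 on seg2,
     sp3 on seg3,
     sp4 on seg4)
```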
Partition   Partn_key values
Partn_1     3, 8, 3, 8, 3, 8, 3, 8, ...
Partn_2     10, 13, 15, 10, 13, 15, ...
Partn_3     {empty}
Partn_4     17, 17, 17, 17, ...
Partn_5     2, 11, 18, 2, 11, 18, ...
Partn_6     16, 19, 16, 19, 16, ...
Partn_7     1, 6, 9, 1, 6, 9, 1, 6, ...
Partn_8     4, 14, 4, 14, 4, 14, ...
Partn_9     7, 20, 7, 20, 7, 20, ...
Partn_10    5, 12, 5, 12, 5, 12, ...
Partition   Partn_key values
Partn_1     3, 3, 3, ..., 8, 8, 8, ...
Partn_2     10, 10, 10, ..., 13, 13, 13, ..., 15, 15, 15, ...
Partn_3     {empty}
Partn_4     17, 17, 17, 17, ...
Partn_5     2, 2, 2, ..., 11, 11, 11, ..., 18, 18, 18, ...
Partn_6     16, 16, 16, ..., 19, 19, 19, ...
Partn_7     1, 1, 1, ..., 6, 6, 6, ..., 9, 9, 9, ...
Partn_8     4, 4, 4, ..., 14, 14, 14, ...
Partn_9     7, 7, 7, ..., 20, 20, 20, ...
Partn_10    5, 5, 5, ..., 12, 12, 12, ...
Recommendations
Small to medium numbers of partitions with low-to-medium data cardinality
Data that is typically accessed using equality (=) vs. ranges
Key equality allows partition elimination; a range scan, however, may require accessing many or all partitions
Works well for ranges and aggregation when the range is within a single partition or the aggregate is grouped by the partition key
Gotchas
Inserts fail with an error for partition key values not listed in any partition
If using for sequential keys, regular maintenance will be required to add partitions
Syntax
create table tablename (colspec)
partition by list (column_name [, column_name] ...)
    ([partition_name] values (constant [, constant] ...) [on segment_name]
     [, [partition_name] values (constant [, constant] ...) [on segment_name]] ...)
create table list_partn_test (
    row_id     numeric(10,0) identity,
    month      varchar(10),
    txn_date   datetime,
    txn_amount money
) lock datarows
partition by list (month)
    (partn_Jan values ("January", "Jan") on seg1,
     partn_Feb values ("February", "Feb") on seg1,
     partn_Mar values ("March", "Mar") on seg1,
     partn_Apr values ("April", "Apr") on seg1,
     partn_May values ("May") on seg1,
     partn_Jun values ("June", "Jun") on seg1,
     partn_Jul values ("July", "Jul") on seg1,
     partn_Aug values ("August", "Aug") on seg1,
     partn_Sep values ("September", "Sep") on seg1,
     partn_Oct values ("October", "Oct") on seg1,
     partn_Nov values ("November", "Nov") on seg1,
     partn_Dec values ("December", "Dec") on seg1)
Recommendation
If you don't want to deal with a new error code, add a column constraint/rule specifying the full range of values
This is only possible if the list is finite and bounded
If using unbounded sequential values (dates, etc.), you will need to maintain partitions via alter table in advance of users needing them
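A sketch of the constraint recommendation above, reusing the list_partn_test example (the constraint name is illustrative, and this assumes the finite month list shown earlier):

```sql
-- Reject out-of-list values with a check-constraint error
-- instead of the partition error code
alter table list_partn_test
add constraint month_values_chk check (month in
    ("January","Jan", "February","Feb", "March","Mar",
     "April","Apr", "May", "June","Jun", "July","Jul",
     "August","Aug", "September","Sep", "October","Oct",
     "November","Nov", "December","Dec"))
```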
Altering Partitions