
MySQL 5.7: What's New in the Optimizer?
Manyi Lu
MySQL Optimizer Team, Oracle
October, 2015

Copyright 2014, 2015, Oracle and/or its affiliates. All rights reserved.

Safe Harbor Statement


The following is intended to outline our general product direction. It is intended for
information purposes only, and may not be incorporated into any contract. It is not a
commitment to deliver any material, code, or functionality, and should not be relied upon
in making purchasing decisions. The development, release, and timing of any features or
functionality described for Oracle's products remains at the sole discretion of Oracle.


MySQL Optimizer

MySQL Server: Parser -> Optimizer -> query plan

SELECT a, b
FROM t1, t2, t3
WHERE t1.a = t2.b
AND t2.b = t3.c
AND t2.d > 20
AND t2.d < 30;

Optimizer: cost based optimizations, heuristics, cost model
Inputs: table/index info (data dictionary), statistics (storage engines)
Resulting plan (diagram): a tree of JOINs over t1, t2 and t3 using table scan, range scan and ref access

MySQL Optimizer: Design Principles


Best out of the box performance

Easy to use, minimum tuning needed


When you need to understand: explain and trace
Flexibility through optimizer hints, switches, and plugins
Fast evolving


MySQL 5.7 Optimizer Improvements

Generated columns and functional indexes


New JSON datatype and functions
Improved cost model
New hint syntax and improved hint support
Query rewrite plugin
UNION ALL queries do not always use temporary tables
Improved optimizations for queries with IN expressions
Merging Derived Tables into Outer Query
Explain on a running query


Generated Columns


Generated Columns

Kudos to Andrey Zhakov for his contribution!

CREATE TABLE order_lines
(orderno integer,
lineno integer,
price decimal(10,2),
qty integer,
sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) STORED );

Column generated from the expression
VIRTUAL: computed when read, not stored, indexable
STORED: computed when inserted/updated, stored in the storage engine (SE), indexable

Useful for:
Functional index
Materialized cache for complex conditions
Simplify query expression

Functional Index
CREATE TABLE order_lines
(orderno integer,
lineno integer,
price decimal(10,2),
qty integer,
sum_price decimal(10,2) GENERATED ALWAYS AS (qty * price) VIRTUAL);
ALTER TABLE order_lines ADD INDEX idx (sum_price);

Online index creation


Composite index on a mix of ordinary, virtual and stored columns
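
As a rough illustration of the composite-index point, the sketch below adds an index over an ordinary column and the virtual column of the order_lines table, then uses EXPLAIN to check whether a query on the generated column can use it. The index name idx_order_sum and the literal values are made up for the example, and the plan actually chosen depends on the data.

```sql
-- Hypothetical composite index: ordinary column (orderno) + virtual column (sum_price).
ALTER TABLE order_lines ADD INDEX idx_order_sum (orderno, sum_price);

-- The generated column is referenced by name like any other column.
EXPLAIN
SELECT orderno, lineno, sum_price
FROM order_lines
WHERE orderno = 1001 AND sum_price > 100.00;
```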


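Generated column: STORED vs VIRTUAL

+--------------------------------------+------------------------------------------+
| STORED                               | VIRTUAL                                  |
+--------------------------------------+------------------------------------------+
| Requires table rebuild at creation   | Metadata change only, instant            |
| Updating table data at INSERT/UPDATE | Faster INSERT/UPDATE, no change to table |
| Fast retrieval                       | Compute when read                        |
+--------------------------------------+------------------------------------------+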

Indexing Generated column: STORED vs VIRTUAL (InnoDB)

+---------------------------------------------+----------------------+
| STORED                                      | VIRTUAL              |
+---------------------------------------------+----------------------+
| Primary & secondary index                   | Secondary index only |
| B-TREE, Full text search, R-TREE            | B-TREE only          |
| Duplication of data in base table and index | Less storage         |
| Independent of SE                           | Requires SE support  |
| Online operation                            | Online operation     |
+---------------------------------------------+----------------------+

JSON support


JSON Support
Seamless integration of relational and schema-less data
Leverage existing database infrastructure for new applications
Provide a native JSON datatype

Provide a set of built-in JSON functions


New JSON Datatype


CREATE TABLE employees (data JSON);
INSERT INTO employees VALUES ('{"id": 1, "name": "Jane"}');
INSERT INTO employees VALUES ('{"id": 2, "name": "Joe"}');

SELECT * FROM employees;
+---------------------------+
| data                      |
+---------------------------+
| {"id": 1, "name": "Jane"} |
| {"id": 2, "name": "Joe"}  |
+---------------------------+
2 rows in set (0,00 sec)

You can always store JSON data in a TEXT or VARCHAR column, but the
native JSON datatype has many benefits:
Validation on insert
Efficient access
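
The validation benefit can be seen with a small experiment against the employees table. This is only a sketch: the extra rows are invented for illustration, and the exact error text may differ between server versions.

```sql
-- Well-formed JSON is accepted.
INSERT INTO employees VALUES ('{"id": 3, "name": "Ann"}');

-- Malformed JSON (missing value) should be rejected with an "Invalid JSON text" error.
INSERT INTO employees VALUES ('{"id": 4, "name": }');

-- JSON_VALID() runs the same kind of check on a plain string,
-- which can help before migrating a TEXT column to JSON.
SELECT JSON_VALID('{"id": 4, "name": }') AS is_valid;   -- 0
```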

New functions to handle JSON data


Info
JSON_VALID()
JSON_TYPE()
JSON_KEYS()
JSON_LENGTH()
JSON_DEPTH()
JSON_CONTAINS_PATH()
JSON_CONTAINS()

Modify
JSON_REMOVE()
JSON_ARRAY_APPEND()
JSON_SET()
JSON_INSERT()
JSON_ARRAY_INSERT()
JSON_REPLACE()

Create
JSON_MERGE()
JSON_ARRAY()
JSON_OBJECT()

Get data
JSON_EXTRACT()
JSON_SEARCH()
Helper
JSON_QUOTE()
JSON_UNQUOTE()
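
A few of the functions listed above, applied to literal values so the statements can be run without any table. The inputs are illustrative only; the comments show the values these calls are expected to return.

```sql
-- Info
SELECT JSON_VALID('{"id": 1}');                        -- 1
SELECT JSON_TYPE('[1, 2, 3]');                         -- ARRAY
SELECT JSON_LENGTH('[1, 2, 3]');                       -- 3
SELECT JSON_KEYS('{"id": 1, "name": "Jane"}');         -- ["id", "name"]

-- Create
SELECT JSON_ARRAY(1, 'a', NULL);                       -- [1, "a", null]
SELECT JSON_OBJECT('id', 1, 'name', 'Jane');           -- {"id": 1, "name": "Jane"}

-- Modify
SELECT JSON_SET('{"id": 1}', '$.name', 'Jane');        -- {"id": 1, "name": "Jane"}
SELECT JSON_REMOVE('{"id": 1, "name": "Jane"}', '$.name');   -- {"id": 1}

-- Get data and helpers
SELECT JSON_EXTRACT('{"id": 1, "name": "Jane"}', '$.name');                 -- "Jane"
SELECT JSON_UNQUOTE(JSON_EXTRACT('{"id": 1, "name": "Jane"}', '$.name'));   -- Jane
```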


Inlining JSON PATH Expressions in SQL


[[database.]table.]column->"$<path spec>"
SELECT * FROM employees WHERE data->"$.name" = "Jane";

is shorthand for

SELECT * FROM employees WHERE JSON_EXTRACT(data, "$.name") = "Jane";

Indexing JSON data


Use functional indexes; STORED and VIRTUAL types are supported

CREATE TABLE employees (data JSON);
ALTER TABLE employees
ADD COLUMN name VARCHAR(30) AS (JSON_UNQUOTE(data->"$.name")) VIRTUAL,
ADD INDEX name_idx (name);

Functional index approach


Use JSON_EXTRACT to specify field to be indexed
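
Once the generated column and its index exist, a lookup on the extracted field can be checked with EXPLAIN. A minimal sketch, assuming the employees table and the name_idx index defined on this slide; whether the optimizer actually picks the index depends on table size and statistics.

```sql
-- Equality lookup on the virtual column; the key column of the EXPLAIN output
-- shows name_idx when the optimizer decides to use the index.
EXPLAIN SELECT data FROM employees WHERE name = 'Jane';
```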


Using Real Life Data


Via SF OpenData
206K JSON objects representing subdivision parcels.

CREATE TABLE features (
id INTEGER NOT NULL auto_increment primary key,
feature JSON NOT NULL
);

Imported from https://github.com/zemirco/sf-city-lots-json + small tweaks

{
  "type":"Feature",
  "geometry":{
    "type":"Polygon",
    "coordinates":[
      [
        [-122.42200352825247,37.80848009696725,0],
        [-122.42207601332528,37.808835019815085,0],
        [-122.42110217434865,37.808803534992904,0],
        [-122.42106256906727,37.80860105681814,0],
        [-122.42200352825247,37.80848009696725,0]
      ]
    ]
  },
  "properties":{
    "TO_ST":"0",
    "BLKLOT":"0001001",
    "STREET":"UNKNOWN",
    "FROM_ST":"0",
    "LOT_NUM":"001",
    "ST_TYPE":null,
    "ODD_EVEN":"E",
    "BLOCK_NUM":"0001",
    "MAPBLKLOT":"0001001"
  }
}

Naive Performance Comparison


Unindexed traversal of 206K documents

# as JSON type
SELECT DISTINCT
feature->"$.type" as json_extract
FROM features;
+--------------+
| json_extract |
+--------------+
| "Feature"    |
+--------------+
1 row in set (1.25 sec)

# as TEXT type
SELECT DISTINCT
feature->"$.type" as json_extract
FROM features;
+--------------+
| json_extract |
+--------------+
| "Feature"    |
+--------------+
1 row in set (12.85 sec)

Explanation: Binary format of JSON type is very efficient at searching. Storing as TEXT performs over 10x worse at traversal.

Create Index

From table scan on 206K documents to index scan on 206K materialized values

ALTER TABLE features ADD feature_type VARCHAR(30) AS (feature->"$.type") VIRTUAL;
Query OK, 0 rows affected (0.01 sec)
Records: 0 Duplicates: 0 Warnings: 0

Metadata change only (FAST). Does not need to touch table.

ALTER TABLE features ADD INDEX (feature_type);
Query OK, 0 rows affected (0.73 sec)
Records: 0 Duplicates: 0 Warnings: 0

Creates index online. Does not modify table rows.

SELECT DISTINCT feature_type FROM features;
+--------------+
| feature_type |
+--------------+
| "Feature"    |
+--------------+
1 row in set (0.06 sec)

Down from 1.25 sec to 0.06 sec

JSON Path Search


JSON_SEARCH provides a beginner-friendly way to find the path to a value. The value can then be retrieved via:
[[database.]table.]column->"$<path spec>" (GA VERSION ONLY)

SELECT JSON_SEARCH(feature,'one', 'MARKET')
AS extract_path FROM features
WHERE id = 121254;
+-----------------------+
| extract_path          |
+-----------------------+
| "$.properties.STREET" |
+-----------------------+
1 row in set (0.00 sec)

SELECT feature->"$.properties.STREET"
AS property_street FROM features
WHERE id = 121254;
+-----------------+
| property_street |
+-----------------+
| "MARKET"        |
+-----------------+
1 row in set (0.00 sec)

JSON Array Creation


SELECT JSON_ARRAY(id,
feature->"$.properties.STREET",
feature->"$.type") AS json_array
FROM features ORDER BY RAND() LIMIT 3;
+-------------------------------+
| json_array                    |
+-------------------------------+
| [65298, "10TH", "Feature"]    |
| [122985, "08TH", "Feature"]   |
| [172884, "CURTIS", "Feature"] |
+-------------------------------+
3 rows in set (2.66 sec)

Evaluates a (possibly empty) list of values and returns a JSON array containing those values

Cost Model


Motivation for Changing the Cost Model


Adapt to new hardware architectures
SSD, larger memories, caches

Allow storage engines to provide accurate and dynamic cost estimates
Whether the data is in RAM, SSD, HDD?

More maintainable cost model implementation
Avoid hard coded constants
Refactoring of existing cost model code

Tunable/configurable

Replace heuristics with cost based decisions

Cost Model: Main Focus in 5.7


Address the following pain points in the current cost model:

Inaccurate record estimation for JOIN
Solution: condition filtering

Hard-coded cost constants
Solution: adjustable cost parameters

Imprecise cardinality/records per key estimates from SE
Solution: Integer value replaced by floating point

Hard to obtain detailed cost numbers
Solution: Added to JSON explain and MySQL Workbench
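
The adjustable cost parameters live in two tables in the mysql system database in 5.7. The sketch below shows one way they might be inspected and changed. The value 2.0 is arbitrary and not a recommendation, and the table t1 with column a in the last statement is a placeholder.

```sql
-- Inspect the current cost constants (a NULL cost_value means "use the built-in default").
SELECT cost_name, cost_value FROM mysql.server_cost;
SELECT engine_name, device_type, cost_name, cost_value FROM mysql.engine_cost;

-- Illustrative change: make disk block reads look more expensive to the optimizer.
UPDATE mysql.engine_cost
SET cost_value = 2.0
WHERE cost_name = 'io_block_read_cost';

-- Ask the server to re-read the cost tables; sessions started afterwards use the new values.
FLUSH OPTIMIZER_COSTS;

-- Detailed cost numbers appear under "cost_info" in the JSON explain output.
EXPLAIN FORMAT=JSON SELECT * FROM t1 WHERE a = 1;
```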


MySQL 5.6: Record Estimates for JOIN


t1 JOIN t2, without condition filtering
Diagram: access method on t1 -> Prefix_rows_t1 (number of records read from t1) -> access method on t2

Total cost = cost(access method t1) + Prefix_rows_t1 * cost(access method t2)
Prefix_rows_t1 is the number of records read from t1
Overestimation if WHERE conditions apply! -> Suboptimal join order

MySQL 5.7 Improved Record Estimates for JOIN


t1 JOIN t2, with condition filtering
Diagram: access method on t1 -> number of records read from t1 -> condition filter -> Prefix_rows_t1 (records passing the table conditions on t1) -> access method on t2 -> condition filter

Prefix_rows_t1 takes into account the entire query condition
More accurate record estimate -> improved JOIN order

MySQL 5.7 Improved Record Estimates for JOIN


CREATE TABLE emp (
id INTEGER NOT NULL PRIMARY KEY,
office_id INTEGER NOT NULL,
first_name VARCHAR(20),
hire_date DATE NOT NULL,
KEY office (office_id)
) ENGINE=InnoDB;

CREATE TABLE office (
id INTEGER NOT NULL PRIMARY KEY,
officename VARCHAR(20)
) ENGINE=InnoDB;

10 000 rows in the emp table
100 rows in the office table
100 rows with first_name='John' AND hire_date BETWEEN '2014-01-01' AND '2014-06-01'

MySQL 5.7 Improved Record Estimates for JOIN


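SELECT officename
FROM office JOIN emp ON office.id = emp.office_id
WHERE emp.first_name LIKE 'John' AND
hire_date BETWEEN '2014-01-01' AND '2014-06-01';

Explain for 5.6: Total Cost = cost(scan office) + 100 * cost(ref_access emp)
+--------+------+---------------+--------+-----------+------+----------+-------------+
| Table  | Type | Possible keys | Key    | Ref       | Rows | Filtered | Extra       |
+--------+------+---------------+--------+-----------+------+----------+-------------+
| office | ALL  | PRIMARY       | NULL   | NULL      | 100  | 100.00   | NULL        |
| emp    | ref  | office        | office | office.id | 99   | 100.00   | Using where |
+--------+------+---------------+--------+-----------+------+----------+-------------+

Explain for 5.7: Total Cost = cost(scan emp) + 9991 * 1.23% * cost(eq_ref_access office)
+--------+--------+---------------+---------+---------------+------+----------+-------------+
| Table  | Type   | Possible keys | Key     | Ref           | Rows | Filtered | Extra       |
+--------+--------+---------------+---------+---------------+------+----------+-------------+
| emp    | ALL    | NULL          | NULL    | NULL          | 9991 | 1.23     | NULL        |
| office | eq_ref | PRIMARY       | PRIMARY | emp.office_id | 1    | 100.00   | Using where |
+--------+--------+---------------+---------+---------------+------+----------+-------------+

JOIN ORDER HAS CHANGED!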
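
The two cost formulas can be compared with a little arithmetic, using only the row counts and the filtered percentage from the EXPLAIN output. This is a back-of-the-envelope sketch: it counts the rows fed into the second table rather than real cost units, which depend on the cost constants.

```latex
\begin{align*}
\text{5.6 plan:}\quad & \text{prefix\_rows}_{\text{office}} = 100
  \;\Rightarrow\; 100 \text{ ref lookups into emp},\quad
  100 \times 99 = 9900 \text{ emp rows examined} \\
\text{5.7 plan:}\quad & \text{prefix\_rows}_{\text{emp}} = 9991 \times 0.0123 \approx 123
  \;\Rightarrow\; \approx 123 \text{ eq\_ref lookups into office, one row each}
\end{align*}
```

The filtered estimate of roughly 123 rows is close to the 100 rows that actually match the name and hire date conditions, which is why starting from the employee table now looks cheaper to the optimizer.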

HINT


Improved HINTs
Introduced new hint syntax /*+ ...*/
More flexible than optimizer switch: affects an individual statement only
Hints within one statement take precedence over optimizer switch
Hints apply at different scope levels: global, query block, table, index

Extended hint support
BKA, BNL, MRR, ICP, SEMIJOIN, SUBQUERY, MAX_EXECUTION_TIME and more
Disabling prevents the optimizer from using it
Enabling means the optimizer is free to use it, but is not forced to use it

Will gradually replace the old hint syntax in upcoming releases
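
The scope levels can be sketched with a few statements. The tables t1 and t2, their columns a and b, and the index name idx_a are placeholders invented for this example. The hints themselves (NO_BNL, NO_ICP, NO_SEMIJOIN, QB_NAME) are part of the 5.7 hint syntax.

```sql
-- Table-level hint: disable block nested loop join buffering for t1 and t2.
SELECT /*+ NO_BNL(t1, t2) */ t1.a, t2.b
FROM t1 JOIN t2 ON t1.a = t2.a;

-- Index-level hint: disable index condition pushdown for one index only.
SELECT /*+ NO_ICP(t1 idx_a) */ * FROM t1
WHERE a BETWEEN 10 AND 20 AND a % 2 = 0;

-- Query-block scope: name the subquery block, then address it from the outer hint.
SELECT /*+ NO_SEMIJOIN(@subq) */ * FROM t1
WHERE a IN (SELECT /*+ QB_NAME(subq) */ a FROM t2);

-- The session-wide alternative affects every following statement until changed back.
SET optimizer_switch = 'index_condition_pushdown=off';
```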


Hint Example: MAX_EXECUTION_TIME


SELECT /*+ MAX_EXECUTION_TIME(1) */ * FROM t1 a, t1 b, t1 c, t1 d, t1 e LIMIT 1;
ERROR 3024 (HY000): Query execution was interrupted, maximum statement execution time exceeded

SELECT /*+ MAX_EXECUTION_TIME(1000) */ * FROM t1 a, t1 b, t1 c, t1 d, t1 e LIMIT 1;
+---+----+---+----+---+----+---+----+---+----+
| a | b  | a | b  | a | b  | a | b  | a | b  |
+---+----+---+----+---+----+---+----+---+----+
| 1 | 10 | 1 | 10 | 1 | 10 | 1 | 10 | 1 | 10 |
+---+----+---+----+---+----+---+----+---+----+
1 row in set (0,00 sec)

Hint Example: SEMIJOIN


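EXPLAIN SELECT * FROM t2 WHERE t2.a IN (SELECT a FROM t3);
No hint, optimizer chooses semi-join algorithm LooseScan
+----+-------------+-------+-------+---------------+-----+---------+-----------+------+------------------------+
| id | Select_type | Table | Type  | Possible_keys | Key | Key_len | Ref       | Rows | Extra                  |
+----+-------------+-------+-------+---------------+-----+---------+-----------+------+------------------------+
|    | simple      | t3    | index |               |     |         | null      |      | Using where; LooseScan |
|    | simple      | t2    | ref   |               |     |         | test.t3.a | 1    | Using index            |
+----+-------------+-------+-------+---------------+-----+---------+-----------+------+------------------------+

EXPLAIN SELECT * FROM t2 WHERE t2.a IN (SELECT /*+ NO_SEMIJOIN() */ a FROM t3);
Semi-join disabled with hint, subquery is executed for each row of outer table
+----+--------------------+-------+----------------+---------------+-----+---------+------+------+--------------------------+
| id | Select_type        | Table | Type           | Possible_keys | Key | Key_len | Ref  | Rows | Extra                    |
+----+--------------------+-------+----------------+---------------+-----+---------+------+------+--------------------------+
|    | primary            | t2    | index          | null          |     |         | null |      | Using where; Using index |
|    | dependent subquery | t3    | index_subquery |               |     |         | func |      | Using index              |
+----+--------------------+-------+----------------+---------------+-----+---------+------+------+--------------------------+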

MySQL 5.7: Hint Example: SEMIJOIN


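EXPLAIN SELECT /*+ SEMIJOIN(@subq MATERIALIZATION) */ * FROM t2 WHERE t2.a IN
(SELECT /*+ QB_NAME(subq) */ a FROM t3);
Hint on a particular algorithm, in this case semi-join materialization
+----+--------------+-------------+--------+---------------+------------+---------+-----------+------+--------------------------+
| id | Select_type  | Table       | Type   | Possible_keys | Key        | Key_len | Ref       | Rows | Extra                    |
+----+--------------+-------------+--------+---------------+------------+---------+-----------+------+--------------------------+
|    | simple       | t2          | index  |               |            |         | null      |      | Using where; Using index |
|    | simple       | <subquery2> | eq_ref | <auto_key>    | <auto_key> |         | test.t2.a | 1    | null                     |
| 2  | materialized | t3          | index  |               |            |         | null      |      | Using index              |
+----+--------------+-------------+--------+---------------+------------+---------+-----------+------+--------------------------+
3 rows in set, 1 warning (0.01 sec)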

Query Rewrite Plugin


Why Query Rewrite Plugin?


Problem
Optimizer chooses a suboptimal plan
Users can change the query plan by adding hints or rewriting the query
However, database application code cannot be changed

Solution: query rewrite plugin!

Query Rewrite Plugin


New pre and post parse query rewrite APIs
Users can write their own plug-ins

Provides a post-parse query plugin


Rewrite problematic queries without the need to make application changes
Add hints
Modify join order
Rewrite rules are defined in a table

Improve problematic queries from ORMs, third party apps, etc


How Rewrites Happen?


Pattern is:
SELECT *
FROM t1 JOIN t2 ON t1.keycol = t2.keycol
WHERE col1 = ? AND col2 = ?

Replacement is:
SELECT a, b, c
FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1)
ON t1.keycol = t2.keycol
WHERE col1 = ? AND col2 = ?

For query
SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2
Replace parameter markers in Replacement with actual literals:
SELECT a, b, c
FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1)
ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2
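
A rule like this one might be registered as sketched below. It assumes the Rewriter plugin shipped with MySQL 5.7 has already been installed (for example with the install_rewriter.sql script), and that t1 and t2 live in a database named test, which is a guess made only for this illustration.

```sql
-- Register the pattern/replacement pair; each ? marks a literal that is carried over.
INSERT INTO query_rewrite.rewrite_rules (pattern, pattern_database, replacement)
VALUES (
  'SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = ? AND col2 = ?',
  'test',  -- assumed database used to resolve the unqualified table names
  'SELECT a, b, c FROM t1 STRAIGHT_JOIN t2 FORCE INDEX (col1) ON t1.keycol = t2.keycol WHERE col1 = ? AND col2 = ?'
);

-- Load the rules table into the plugin's in-memory cache.
CALL query_rewrite.flush_rewrite_rules();

-- A matching statement is rewritten transparently; a note reports the rewrite.
SELECT * FROM t1 JOIN t2 ON t1.keycol = t2.keycol WHERE col1 = 42 AND col2 = 2;
SHOW WARNINGS;

-- If a rule fails to load, the message column explains why.
SELECT pattern, enabled, message FROM query_rewrite.rewrite_rules;
```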

Query Rewrite Plugin cont.


query_rewrite.rewrite_rules table:
Columns: Pattern, Pattern_database, Replacement, Enabled, Message

*************** 1. row ***************
Pattern: SELECT name, department_name FROM employee JOIN department USING ( department_id ) WHERE salary > ?
Pattern_database: employees
Replacement: SELECT name, department_name FROM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?
Message: NULL

*************** 2. row ***************
Pattern: SELECT name, department_name FROXM employee JOIN department USING ( department_id ) WHERE salary > ?
Pattern_database: employees
Replacement: SELECT name, department_name FROM employee STRAIGHT_JOIN department USING ( department_id ) WHERE salary > ?
Message: Parse error in pattern: near at line 1

Query Rewrite Plugin: Performance Impact


What is the cost of rewriting queries?
Designed for rewriting problematic queries only!
~ Zero cost for queries not to be rewritten
Statement digest computed for performance schema anyway

Cost of queries to be rewritten is insignificant compared to performance gain
Cost of generating query + reparsing: max ~5% performance overhead
Performance gain potentially x times

User Requested
Performance Improvements


Avoid Creating Temporary Table for UNION ALL


SELECT * FROM table_a UNION ALL SELECT * FROM table_b;

5.6: Always materialize results of UNION ALL in temporary tables
5.7: Do not materialize in temporary tables unless used for sorting, rows are sent directly to client
5.7: Client will receive the first row faster, no need to wait until the last query block is finished
5.7: Less memory and disk consumption

Optimizations for IN Expressions


CREATE TABLE t1 (a INT, b INT, c INT, KEY x(a, b));
SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));
5.6: Certain queries with IN predicates can't use index scans or range scans even though all the columns in the query are indexed.
5.6: Range optimizer ignores lists of rows
5.6: Needs to rewrite to de-normalized form
SELECT a, b FROM t1 WHERE ( a = 0 AND b = 0 ) OR ( a = 1 AND b = 1 )

5.7: IN queries with row value expressions executed using range scans.
5.7: Explain output: Index/table scans change to range scans

Optimizations for IN Expressions


SELECT a, b FROM t1 WHERE (a, b) IN ((0, 0), (1, 1));
A table has 10 000 rows, 2 match the where condition

MySQL 5.6:
*************** 1. row ***************
select_type: SIMPLE
table: t1
type: index
key: x
key_len: 10
ref: NULL
rows: 10 000
Extra: Using where; Using index

MySQL 5.7:
*************** 1. row ***************
select_type: SIMPLE
table: t1
type: range
key: x
key_len: 10
ref: NULL
rows: 2
Extra: Using where; Using index

Merging Derived Tables into Outer Query


CREATE VIEW v1 AS (SELECT * FROM t1);
SELECT * FROM v1 JOIN t2 USING (a);
SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

MySQL 5.6:
Derived table (subquery in FROM clause) always materialized in temporary table

MySQL 5.7:
Merged into outer query or materialized
Derived table optimized as part of outer query: faster queries
Derived tables and views are now optimized the same way
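
One way to see the difference is to compare the plans with merging on and off. A sketch, assuming the t1 and t2 tables from the queries above exist; the derived_merge flag of optimizer_switch controls this behaviour in 5.7.

```sql
-- With merging enabled, EXPLAIN lists t1 and t2 directly
-- rather than a row for a materialized derived table.
EXPLAIN SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

-- Switch merging off for the session to compare with the materialized plan.
SET optimizer_switch = 'derived_merge=off';
EXPLAIN SELECT * FROM (SELECT * FROM t1) AS dt1 JOIN t2 USING (a);

-- Restore merging for the session.
SET optimizer_switch = 'derived_merge=on';
```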

EXPLAIN


Explain on a Running Query


EXPLAIN [FORMAT=(JSON|TRADITIONAL)] FOR CONNECTION <id>;

Shows query plan on connection <id>
Useful for diagnosing long-running queries
Plan isn't available while the query plan is under creation
Applicable to SELECT/INSERT/DELETE/UPDATE
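
A typical workflow might look like the following, using two sessions. The connection id 42 and the table t1 are placeholders for illustration.

```sql
-- Session A: note the connection id, then start a long-running statement.
SELECT CONNECTION_ID();          -- suppose this returns 42 (hypothetical)
SELECT COUNT(*) FROM t1 a, t1 b, t1 c;

-- Session B: find the connection in the Id column, then ask for its current plan.
SHOW PROCESSLIST;
EXPLAIN FORMAT=JSON FOR CONNECTION 42;
```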


What is on Our Roadmap?


Improve prepared statement support
Add histograms
Support parallel queries
Common table expressions (WITH RECURSIVE)
Windowing functions