
FORMING RANGE-BASED BREAK GROUPS WITH ADVANCED SQL

110515958.doc

Author:         Brendan Furey
Creation Date:  12 June 2011
Version:        1.4
Last Updated:   25 September 2012


Table of Contents

Introduction
    Hardware/Software Summary
Problem Definitions and Examples
    Problem Definitions
        Problem 1: Contiguous Ranges
        Problem 2: Overlapping Ranges
        Problem 3: Bursts of Activity
    Functional Test Data
        Activity_nov, Activity Table
        Indexes
        Test Cases
        Test Data
    Test Data Grouping Diagram
    Performance Testing Strategy
        SQL Change for Single Break Group Problems
Problem 1: Contiguous Ranges
    Analytics Solution
        How It Works
        Query Diagram
        SQL
        Inline View Diagram
        Solution Stage Table
    Model Solution
        How It Works
        Query Diagram
        SQL
    Recursive Subquery Factor Solution
        How It Works
        Query Diagram
        SQL
    Performance Analysis
        Test Data Sets
        Output Record Counts
        CPU Times
        Slice Graphs
        Explain Plans (Data Point W256-D1)
        Discussion of Results
Problem 2: Overlapping Ranges
    Analytics Solution
        How It Works
        Query Diagram
        SQL
        Inline View Diagram
        Solution Stage Table
    Model Solution
        How It Works
        Query Diagram
        SQL
    Recursive Subquery Factor Solution
        How It Works
        Query Diagram
        SQL
    Performance Analysis
        Test Data Sets
        Output Record Counts
        CPU Times
        Slice Graphs
        Explain Plans (Data Point W64-D1)
        Discussion of Results
Problem 3: Bursts of Activity
    Analytics Solution (None)
    Model Solution
        How It Works
        Query Diagram
        SQL
    Recursive Subquery Factoring Solution
        How It Works
        Query Diagram
        SQL
    Performance Analysis
        Test Data Sets
        Output Row Counts
        CPU Times
        Slice Graphs
        Explain Plans (Data Point W128-D1)
        Discussion of Results
Analytics Anomaly Analysis
    Analytic Query Variations
        Problem 1: Contiguous Ranges
        Problem 2: Overlapping Ranges
    Performance Analysis
        Problem 1: Contiguous Ranges
        Problem 2: Overlapping Ranges
        CPU Times
Conclusions
References

Change Record

Date          Author  Version  Change Reference
12-Jun-2011   BPF     1.0      Initial covering 2 problems, analytic solutions only, no performance analysis
14-Jun-2011   BPF     1.1      Added test case 5, and tabulated intermediate solutions
19-Jul-2011   BPF     1.2      Restructured, adding third problem, Model and RSF solutions, and performance analysis
02-Aug-2011   BPF     1.3      Analytics anomaly analysis
25-Sep-2012   BPF     1.4      References now hyperlinks

Introduction
Records in a database often include range fields, such as a start and end time for some activity, and it is
sometimes desired to group the records by range. There are several possible ways of grouping by range:
In one case the records do not overlap, but additional breaking fields may be present; in a second case,
records may overlap, but additional breaking fields do not then make sense; in the third case considered
('bursts of activity'), only a single start field is used and break groups consist of all the records whose
range start is within a given distance from the starting point. For each problem, we consider two variations
that affect the choice of SQL: In the first, we are looking for all break groups, while in the second we want
to retrieve only a single one.
This article provides solutions for these problems, using three SQL techniques, namely: Analytic
Functions, Model Clause, and Recursive Subquery Factoring. Diagrams are used extensively to depict
query structures and help explain the solutions.
Performance analyses are included that compare performance of the three methods (only two for the third
problem) on each problem across a two-dimensional domain of size and depth. The analyses follow an
approach described in an earlier article (SQL Pivot and Prune Queries - Keeping an Eye on
Performance). The results show that the best method depends on the depth of the groups, with Analytic
Functions being best for deep groups and Recursive Subquery Factoring best for shallow groups where
only a single group is required. The Model Clause performs best where an Analytic Functions solution is
not available (the bursts of activity problem) and either all groups are required or a single deep group is
required. The Model Clause also gives very stable performance across depth range, and is surprisingly
simple in structure. The article may be of interest to developers who have yet to learn about some of these
techniques.
An important performance glitch was discovered in using the analytic function First_Value with the Ignore
Nulls option, and methods for avoiding it are presented.
This document replaces a preliminary version (Forming Range-Based Break Groups with SQL Analytic
Functions), which covered only two problems, with analytic solutions only and no performance analysis.

Hardware/Software Summary
Component         Description
Database          Oracle Database 11g Express Edition Release 11.2.0.2.0 - Beta
Diagrammer        Microsoft Visio 2003 (11.3216.5606)
Operating System  Microsoft Windows 7 Home Premium (32 bit)
Computer          Samsung X120, 3GB memory, Intel U4100 @ 1.3GHz x 2

Problem Definitions and Examples


Problem Definitions
In this section, we define the problems generically. Consider the fields in a record set to divide into the
following categories:

key                     - partition by fields
range start, range end  - range fields (range end is just viewed as another attribute in problem 3)
break                   - break fields (where allowed)
other                   - any other fields

For each problem, we consider two variations that affect the choice of SQL: In the first, we are looking for
all break groups, while in the second we want to retrieve only a single one enclosing (or, starting from, for
the third problem) a particular value.
Problem 1: Contiguous Ranges
The first problem is to obtain for each record a group start, group end pair that are the range start and
range end values for the records that respectively start and end the break group of the current record. The
records are to be ordered by range start within the partitioning key, and a new break group starts when,
between successive records, either there is a gap between range end and range start fields, or any of the
break fields change value. No overlaps are allowed in the ranges within a key.
Problem 2: Overlapping Ranges
The second problem is the same as the first but with no break fields and overlapping is allowed. In other
words, groups consist of all records that overlap, counting contiguity as overlapping.
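The grouping rule for overlapping ranges can be illustrated outside SQL. The following is a minimal Python sketch, not one of the article's solutions: it walks one person's records in start order and keeps a running maximum end date. The function name `overlap_groups`, the tuple layout, and the `FAR_FUTURE` stand-in for a null end date are assumptions made for illustration.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # assumption: stands in for a null (open) end date


def overlap_groups(ranges):
    """Return (start, end, group_start, group_end) for one person's ranges.

    Ranges that overlap, or merely touch, fall in the same group.
    """
    ordered = sorted(ranges, key=lambda r: (r[0], r[1] or FAR_FUTURE))
    groups = []  # each entry: [group_start, running_max_end, members]
    for start, end in ordered:
        effective_end = end or FAR_FUTURE
        if groups and start <= groups[-1][1]:
            # Starts on or before the running maximum end: same group.
            groups[-1][1] = max(groups[-1][1], effective_end)
            groups[-1][2].append((start, end))
        else:
            groups.append([start, effective_end, [(start, end)]])

    result = []
    for group_start, group_end, members in groups:
        shown_end = None if group_end == FAR_FUTURE else group_end
        result.extend((s, e, group_start, shown_end) for s, e in members)
    return result
```

Using the running maximum, rather than the previous record's end date, is what lets a record join a group by overlapping an earlier, longer record, as in test case T5.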
Problem 3: Bursts of Activity
The third problem is to determine the break groups using distance from the group start point, with
overlapping allowed (since the range end is here just another attribute). In other words, once a group
starts, all records that start within a fixed distance from the group start are in the group, and the first record
after the end of a group defines the next group start.
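The burst rule is inherently sequential, since each group start depends on where the previous group ended. A minimal Python sketch of the definition follows; the function name `burst_groups` and the inclusive treatment of the distance limit are assumptions for illustration (the inclusive limit is chosen to agree with the functional test data, where a record 3 days after the burst start falls within a 3-day burst).

```python
from datetime import date, timedelta


def burst_groups(start_dates, limit_days=3):
    """Return (start_date, burst_start) pairs for one person's records."""
    limit = timedelta(days=limit_days)
    result = []
    burst_start = None
    for start in sorted(start_dates):
        if burst_start is None or start - burst_start > limit:
            # First record after the end of the previous burst opens a new one.
            burst_start = start
        result.append((start, burst_start))
    return result
```

Note that the comparison is always against the burst start, never against the previous record, which is why a chain of closely spaced records still breaks once the fixed distance is exceeded.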

Functional Test Data


The problem data structure is based on a question posed in Tom Kyte's Oracle forum (see Activities and
breaks), while the test data are my own. We will use it for all three problems, but the first problem will use a
separate table of the same structure but with indexes different from those for the others.
Activity_nov, Activity Table

Column         Type
activity_id    Number
person_id      Number
start_date     Date
end_date       Date
activity_name  Char(10)

Indexes
Activity_nov (problem 1, indexes unique)

Index            Columns
ACTIVITY_NOV_U1  person_id, start_date
ACTIVITY_NOV_U2  person_id, end_date

Activity (problems 2 and 3, indexes non-unique)

Index        Columns
ACTIVITY_N1  person_id, start_date,
             Nvl(end_date, To_Date(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
ACTIVITY_N2  person_id,
             Nvl(end_date, To_Date(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')),
             start_date

Test Cases
There are five test cases: two for the first problem, and three for the other two problems, which can share
the same data sets. Each test case has its own person. The groups for the third problem are defined by a
burst size limit of 3 days. Oracle standard dates have 1 second precision, but we'll take a time component
of zero in the test data for simplicity, as this causes no loss of generality.

Test Case  Scenario

Test Cases T1 and T2 - Non-Overlapping with Additional Breaks
T1         3 records, gap, 2 records, gap, 1 record
T2         3 records, gap, 2 records (names differ), gap, 1 record null end date

Test Cases T3, T4, T5 - Overlapping without Additional Breaks
T3         3 records (with overlaps), gap, 2 records (second enclosed by first), gap, 1 record
T4         3 records (with overlaps), gap, 3 records, second overlaps first, with null end date
T5         3 records (with overlaps), gap, 2 records (second enclosed by first), gap but not with
           respect to first, 1 record

Test Data

Per  Act  Activity
Id   Id   Name      Start Date  End Date   Group Start  Group End  Burst Date
1    1    LEAVE     01-Jun-11   02-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
1    2    LEAVE     02-Jun-11   04-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
1    3    LEAVE     04-Jun-11   07-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
1    4    LEAVE     08-Jun-11   09-Jun-11  08-Jun-11    14-Jun-11  08-Jun-11
1    5    LEAVE     09-Jun-11   14-Jun-11  08-Jun-11    14-Jun-11  08-Jun-11
1    6    LEAVE     20-Jun-11   30-Jun-11  20-Jun-11    30-Jun-11  20-Jun-11
2    7    LEAVE     01-Jun-11   02-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
2    8    LEAVE     02-Jun-11   04-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
2    9    LEAVE     04-Jun-11   07-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
2    10   LEAVE     08-Jun-11   09-Jun-11  08-Jun-11    09-Jun-11  08-Jun-11
2    11   TRAINING  09-Jun-11   14-Jun-11  09-Jun-11    14-Jun-11  08-Jun-11
2    12   TRAINING  20-Jun-11              20-Jun-11               20-Jun-11
3    13   LEAVE     01-Jun-11   03-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
3    14   LEAVE     02-Jun-11   05-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
3    15   LEAVE     04-Jun-11   07-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
3    16   LEAVE     08-Jun-11   16-Jun-11  08-Jun-11    16-Jun-11  08-Jun-11
3    17   TRAINING  09-Jun-11   14-Jun-11  08-Jun-11    16-Jun-11  08-Jun-11
3    18   TRAINING  20-Jun-11   30-Jun-11  20-Jun-11    30-Jun-11  20-Jun-11
4    19   LEAVE     01-Jun-11   03-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
4    20   LEAVE     02-Jun-11   05-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
4    21   LEAVE     04-Jun-11   07-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
4    22   LEAVE     08-Jun-11   16-Jun-11  08-Jun-11               08-Jun-11
4    23   TRAINING  09-Jun-11              08-Jun-11               08-Jun-11
4    24   TRAINING  20-Jun-11   30-Jun-11  08-Jun-11               20-Jun-11
5    25   LEAVE     01-Jun-11   03-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
5    26   LEAVE     02-Jun-11   05-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
5    27   LEAVE     04-Jun-11   07-Jun-11  01-Jun-11    07-Jun-11  01-Jun-11
5    28   LEAVE     08-Jun-11   16-Jun-11  08-Jun-11    30-Jun-11  08-Jun-11
5    29   TRAINING  09-Jun-11   14-Jun-11  08-Jun-11    30-Jun-11  08-Jun-11
5    30   TRAINING  15-Jun-11   30-Jun-11  08-Jun-11    30-Jun-11  15-Jun-11

Test Data Grouping Diagram


The red and yellow boxes in the diagram show the required groupings. The numeric column headers are
the days of the month of June.


Performance Testing Strategy


In SQL Pivot and Prune Queries - Keeping an Eye on Performance, we applied an approach to
performance testing of SQL queries whereby the queries are tested across a 2-dimensional domain, using
a testing framework developed for that work. The same approach has been followed here, using the same
framework (note that minor changes to the PL/SQL package and tables were made for this article, such as
excluding file writing times from the recorded times). Further details can be found in the referenced article,
from which the following description is extracted:
In order to provide a realistic scenario, the queries are executed within the context of a simple outbound
interface that writes each record to a file as a comma-separated string. A small PL/SQL package has been
written to automate the testing. The program loops over width and depth dimensions, and for each data
set point makes a call to a separate package to set up the test data and have the CBO statistics gathered;
it then loops over a set of queries defined in the same separate package as strings that are executed by
the main package.
The execution plan is obtained in each case, using an Oracle API, and is written to the generic log. The
query string includes a random number that guarantees a hard-parse and thus recalculation of the
execution plans at each data set point.
For this work, width was taken to correspond to the total number of records, while depth was taken to
correspond to group size. The definitions of the test data vary by problem and are described separately
later.


SQL Change for Single Break Group Problems


One of the solution techniques is only applicable to the form of problem where a single break group is
required, and so for consistency that is the form used for all solutions in performance testing. It is worth
noting that the other two solution techniques solve this form by obtaining all groups within an inline view,
then applying a restriction outside the view. This means the timings for these should be very similar to
what would be obtained for finding all groups. The change required looks like this:
SELECT ...
  FROM (<SQL for all groups, minus ORDER BY>)
 WHERE To_Date (root_date, 'DD-MON-YYYY HH24:MI:SS') BETWEEN group_start AND group_end
 ORDER BY ...


Problem 1: Contiguous Ranges


Analytics Solution
How It Works
The first solution for this problem uses analytic functions (see Oracle Database SQL Language
Reference 11g Release 2 (11.2)), partitioned by person and ordered by start date in a two level query
structure.
1. Within an inline view, use Lag and Lead functions with CASE expressions to set group start and
group end dates on the respective start and end records of the groups, leaving other values null.
2. Select all the original fields from the inline view, as well as the new fields within First_Value,
Last_Value functions with the IGNORE NULLS option
3. The output from step 2 obtains all groups, and if necessary, can be used within another inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all groups, while the
performance testing uses the form for obtaining a single group, for consistency with the third solution
method.
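The two-level idea in steps 1 and 2 can be sketched procedurally. The Python below is only an illustration of the logic under stated assumptions, not the article's SQL: stage 1 mimics the inline view by marking boundary records only, and stage 2 mimics Last_Value / First_Value with IGNORE NULLS by filling the gaps. The function name `mark_and_fill`, the tuple layout, and the assumption that only the latest record may have a null end date are all introduced here.

```python
from datetime import date


def mark_and_fill(rows):
    """rows: (start, end, name) tuples for ONE person, with no overlapping ranges.

    Assumption: only the latest record may have end = None (open-ended).
    Returns (start, end, name, group_start, group_end) tuples in start order.
    """
    rows = sorted(rows, key=lambda r: r[0])
    n = len(rows)

    # Stage 1 (the inline view): set a value only on boundary records.
    start_mark = [None] * n
    end_mark = [None] * n
    for i, (start, end, name) in enumerate(rows):
        prev = rows[i - 1] if i > 0 else None      # plays the role of Lag
        nxt = rows[i + 1] if i < n - 1 else None   # plays the role of Lead
        if prev is None or start > prev[1] or name != prev[2]:
            start_mark[i] = start
        if nxt is None or nxt[0] > end or name != nxt[2]:
            end_mark[i] = end

    # Stage 2 (the outer query): fill the gaps, ignoring nulls.
    group_start, latest = [None] * n, None
    for i in range(n):                  # last non-null value so far
        if start_mark[i] is not None:
            latest = start_mark[i]
        group_start[i] = latest
    group_end, nearest = [None] * n, None
    for i in range(n - 1, -1, -1):      # first non-null value at or after row i
        if end_mark[i] is not None:
            nearest = end_mark[i]
        group_end[i] = nearest

    return [row + (gs, ge) for row, gs, ge in zip(rows, group_start, group_end)]
```

Applied to the six records of test case T2, this sketch marks starts on records 7, 10, 11 and 12 and ends on records 9, 10 and 11, leaving the open-ended record 12 with a null group end.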
Query Diagram

Notes
The diagram notation follows and extends notation developed earlier, including in SQL Pivot and Prune
Queries - Keeping an Eye on Performance. The key can be referred to for subsequent diagrams.

SQL
SELECT /* NO_OVERLAP */
       person_id, start_date, end_date, activity_name, activity_id id,
       Last_Value (group_start IGNORE NULLS)
           OVER (PARTITION BY person_id ORDER BY start_date) group_start,
       First_Value (group_end IGNORE NULLS)
           OVER (PARTITION BY person_id ORDER BY start_date
                 RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) group_end
  FROM (
        SELECT person_id, start_date, end_date, activity_name, activity_id,
               CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date),
                                            start_date-1)) OR
                         (activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
                    THEN start_date END group_start,
               CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date),
                               end_date+1) > end_date) OR
                         (activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
                    THEN end_date END group_end
          FROM activity_nov
       )
 ORDER BY person_id, start_date

Inline View Diagram


The diagram below attempts to show how the inline view obtains the group start and end dates. The start
points of the red arrows indicate the records that have group start dates set in the inline view (1, 4, 6 for
T1; 7, 10, 11, 12 for T2); the end points indicate the records that have group end dates set (3, 5, 6 for T1;
9, 10, 11, 12 for T2). Since all other group values are null, the outer query can set the correct values by
looking for the last non-null value from the past, for the group start date, and the first non-null value in the
future, for the group end date.

Solution Stage Table


The table below shows how the solution proceeds in stages, through level 1.
Per  Act  Activity  Record Level            Level 1 View            Solution
Id   Id   Name      Start Date  End Date    Start Date  End Date    Start Date  End Date
1    1    LEAVE     01-Jun-11   02-Jun-11   01-Jun-11               01-Jun-11   07-Jun-11
1    2    LEAVE     02-Jun-11   04-Jun-11                           01-Jun-11   07-Jun-11
1    3    LEAVE     04-Jun-11   07-Jun-11               07-Jun-11   01-Jun-11   07-Jun-11
1    4    LEAVE     08-Jun-11   09-Jun-11   08-Jun-11               08-Jun-11   14-Jun-11
1    5    LEAVE     09-Jun-11   14-Jun-11               14-Jun-11   08-Jun-11   14-Jun-11
1    6    LEAVE     20-Jun-11   30-Jun-11   20-Jun-11   30-Jun-11   20-Jun-11   30-Jun-11
2    7    LEAVE     01-Jun-11   02-Jun-11   01-Jun-11               01-Jun-11   07-Jun-11
2    8    LEAVE     02-Jun-11   04-Jun-11                           01-Jun-11   07-Jun-11
2    9    LEAVE     04-Jun-11   07-Jun-11               07-Jun-11   01-Jun-11   07-Jun-11
2    10   LEAVE     08-Jun-11   09-Jun-11   08-Jun-11   09-Jun-11   08-Jun-11   09-Jun-11
2    11   TRAINING  09-Jun-11   14-Jun-11   09-Jun-11   14-Jun-11   09-Jun-11   14-Jun-11
2    12   TRAINING  20-Jun-11               20-Jun-11               20-Jun-11

Model Solution
How It Works
The key to solving this problem using Oracle's Model clause (Oracle Database SQL Language
Reference 11g Release 2 (11.2)) is to realise that the solution can be represented as simple inductions,
forward for the group start dates, then backward for the group end dates. If a, s, e, S, E are the current
activity, start date, end date, group start date and group end date, and (pa, ps, pe, pS, pE) and
(na, ns, ne, nS, nE) are the prior and next values, then (using C-like terminology for brevity):

Initial,  S = s;    later,    S = (a != pa or s > pe) ? s : pS
Final,    E = e;    earlier,  E = nS > S ? e : nE

These inductions can easily be implemented as rules within the model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end
2. Add the Model keyword, partitioning by person, dimensioning by analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures
3. Initialise group start and end to start and end dates in the measures clause
4. Define the first rule to obtain the group start date for all rows after the first as the previous group
start date, unless there is a gap or the activity changes, relative to the previous record, in which
case take the new start date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group end date for all rows as the next group end date,
unless the next group start date is greater than the current one, or there is no next (i.e. at the last
row), in which case take the current end date. This rule must be processed in descending row
order, and this is specified as it is not the default.
6. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all groups, while the
performance testing uses the form for obtaining a single group, for consistency with the third solution
method.
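The two inductions translate directly into a forward pass followed by a backward pass. The Python below is a minimal sketch of that logic only, not of the Model clause itself; the function name `induction_groups` and the tuple layout are assumptions, as is the restriction that only the latest record may have a null end date.

```python
from datetime import date


def induction_groups(rows):
    """rows: (start, end, name) tuples for ONE person, with no overlapping ranges.

    Assumption: only the latest record may have end = None (open-ended).
    Returns (start, end, name, group_start, group_end) tuples in start order.
    """
    rows = sorted(rows, key=lambda r: r[0])
    n = len(rows)
    # Initialise the placeholders, as in the measures clause: S = s, E = e.
    S = [r[0] for r in rows]
    E = [r[1] for r in rows]

    # Rule 1, ascending order: S = (a != pa or s > pe) ? s : pS
    for i in range(1, n):
        s, _, a = rows[i]
        _, pe, pa = rows[i - 1]
        S[i] = s if (a != pa or s > pe) else S[i - 1]

    # Rule 2, descending order: E = nS > S ? e : nE (the last row keeps E = e)
    for i in range(n - 2, -1, -1):
        E[i] = rows[i][1] if S[i + 1] > S[i] else E[i + 1]

    return [row + (gs, ge) for row, gs, ge in zip(rows, S, E)]
```

The second pass has to run after the first has completed, and in descending order, because each group end depends both on the final group start values and on the group end of the following row.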


Query Diagram

Notes
Queries with the Model clause have a structure that is rather different from other queries, and the
diagrams attempt to reflect that structure for these problems. The main query feeds its output into an array
processing component with a set of rules that specify how any additional (here) data items (called
measures) are to be calculated, in a mostly declarative fashion.
The model box above contains 4 specification types:

Partition  - processing is to be performed separately by one or more columns; the same
             meaning as in analytic functions
Dimension  - columns by which the array is dimensioned; can include analytic functions, as
             here
Measures   - remaining columns that may be calculated or updated by the rules, possibly
             including placeholders from the main query
Rules      - a set of rules that specify measure calculation; rules are processed
             sequentially, unless otherwise specified; in the diagram:

    n            - the current dimension value, here row number ordered by start date
                 - maximum dimension value
    f(n-1,n)     - denotes that the value depends on values from previous and current rows
    (and so on)
    ^            - denotes that the calculation progresses in ascending order by dimension;
                   this is the default so does not have to be coded
    v            - denotes that the calculation progresses in descending order by dimension;
                   this is not the default so does have to be coded

SQL
SELECT /* MOD_OVL */ person_id, start_date, end_date, activity_name, activity_id, group_start, group_end
  FROM activity_nov
 MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, end_date, activity_name, activity_id, start_date group_start, end_date group_end)
    RULES (
       group_start[rn > 1] =
          CASE WHEN start_date[cv()] > end_date[cv()-1] OR activity_name[cv()] != activity_name[cv()-1]
               THEN start_date[cv()] ELSE group_start[cv()-1] END,
       group_end[ANY] ORDER BY rn DESC = PRESENTV (group_start[cv()+1],
          CASE WHEN group_start[cv()] < group_start[cv()+1] THEN group_end[cv()] ELSE group_end[cv()+1] END,
          end_date[cv()])
    )
 ORDER BY 1, 2

Recursive Subquery Factor Solution


How It Works
This approach is based on new Oracle SQL functionality available only from Oracle Database v11.2,
called Recursive Subquery Factoring (RSF) (Oracle Database SQL Language Reference 11g Release 2
(11.2)).
1. Define a recursive subquery factor.
2. The anchoring branch of the RSF selects records defined by the start point. A direction column is
defined that here is set to 'E' for Either, meaning extend in either direction in the recursive
branch.
3. The recursive branch extends the record set by joining records that link to extreme parent records
and that push the envelope. The direction column is set to 'F' or 'B' according to the direction of
extension (Forward or Backward).
4. Select all records from the RSF, applying analytic Min, Max to get the group start and end dates.
The idea here is that for cases where the group is small this will avoid expensive processing of the entire
record set. We'll demonstrate this saving in our performance analysis section. This solution only applies to
the form of problem where a single group is required.
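The anchor-and-extend idea can be sketched as an iterative search that only ever touches records adjacent to those already found. The Python below is an illustration under stated assumptions, not the article's SQL: the function name `single_group` and the dictionary lookups (standing in for the unique indexes on start and end dates) are introduced here, and each record is assumed to have start < end, with start dates and end dates each unique within the person.

```python
from datetime import date


def single_group(rows, today):
    """rows: (start, end, name) tuples for ONE person, with no overlapping ranges.

    Returns (members, group_start, group_end) for the group enclosing `today`.
    """
    by_start = {r[0]: r for r in rows}
    by_end = {r[1]: r for r in rows if r[1] is not None}

    # Anchor branch: records enclosing the start point, direction 'E' (either).
    frontier = [(r, "E") for r in rows
                if r[0] <= today and (r[1] is None or r[1] > today)]
    members = [r for r, _ in frontier]

    # Recursive branch: each pass joins only to the records found in the last pass.
    while frontier:
        found = []
        for (start, end, name), direction in frontier:
            if direction in ("E", "F"):
                nxt = by_start.get(end)      # contiguous record going forward
                if nxt is not None and nxt[2] == name:
                    found.append((nxt, "F"))
            if direction in ("E", "B"):
                prv = by_end.get(start)      # contiguous record going backward
                if prv is not None and prv[2] == name:
                    found.append((prv, "B"))
        members.extend(r for r, _ in found)
        frontier = found

    if not members:
        return [], None, None
    ends = [r[1] for r in members if r[1] is not None]
    members.sort(key=lambda r: r[0])
    return members, min(r[0] for r in members), max(ends) if ends else None
```

The work done is proportional to the size of the group found rather than to the size of the whole record set, which is the saving the approach aims for when groups are small.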


Query Diagram

Notes
Queries with a recursive subquery factor have a special structure, and the diagrams attempt to reflect that
structure for these problems. The recursive factor is a subquery having a Union All structure in which there
are two branches:

Anchor Branch     - this is a normal query from which the recursion begins
Recursive Branch  - this is a query that references the recursive factor itself by alias

Notice the use of subtypes in the diagram: records in the recursive branch can be split into back and
front subtypes.
SQL
WITH rsq (person_id, start_date, end_date, activity_name, activity_id, direction) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id, 'E' direction
  FROM activity_nov
 WHERE start_date <= '&TODAY' AND Nvl(end_date, To_Date ('&TODAY', 'DD-MON-YYYY') + 1) > '&TODAY'
 UNION ALL
SELECT act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id,
       CASE WHEN act.start_date = rsq.end_date THEN 'F' ELSE 'B' END
  FROM rsq
  JOIN activity_nov act
    ON ((act.start_date = rsq.end_date AND direction IN ('E', 'F')) OR
        (act.end_date = rsq.start_date AND direction IN ('E', 'B')))
   AND act.person_id = rsq.person_id
   AND act.activity_name = rsq.activity_name
)
SELECT /* RSQ_NON '&TODAY' */ person_id, start_date, end_date, activity_name, activity_id,
       Min (start_date) OVER (PARTITION BY person_id) grp_start,
       Max (end_date) OVER (PARTITION BY person_id) grp_end
  FROM rsq
 ORDER BY person_id, start_date

Performance Analysis
Test Data Sets
For the performance analysis it is simpler to generate test data using a single activity, with groups
determined only by the dates. If w and d are the numeric width and depth points, records are generated for
three persons as follows:
- Let random(d) be a random integer between 1 and d (generated afresh on each access)
- Start date = '01-JAN-1900'
- Record limit = 3 * 100 * w
- Loop while number of records <= record limit
  - Add group of records for person 1, with group size = random(d), as follows:
    o First start date = last start date + random(d)
    o Subsequent start date = previous start date + 1
    o End date = start date + 1
  - Exit if record limit reached
  - Repeat for persons 2 and 3
- End loop
- Store the root date as the mid point of the first group of records generated
This generation process ensures that the size of the record set is proportional to the width point, while the
groups are of random sizes but within a scale determined by the depth point. The width and depth points,
together with the (randomized) size of the root group, are shown in the next section.
Output Record Counts
The output consists of all the records in the root group, which is defined as the group containing the root
date, and has at least one record by definition. Of course, each solution method operates on the same
data set, and so the number of records written to file is always the same for both (which was checked).

Depth/Width       W1    W2    W4    W8   W16   W32    W64   W128   W256
Total Records    300   600  1200  2400  4800  9600  19200  38400  76800
D1                 1     1     1     1     1     1      1      1      1
D3                 3     1     2     3     2     1      2      1      3
D9                 5     8     3     5     7     8      8      2      2
D27                8    16     8    26    10     2     13      2      9
D81               80    11    25    55    26    72     67     41     42
D243               6   219   135   196    49   131     90    132    168
D729             300    93    75   290   134    68    547    627    446
D2187            300   600  1196  1501   972  1300    346    437   1265
D6561            300   600   717  2400  4330  3737   4243   1331   4103

CPU Times
Analytics

Query      W1    W2    W4    W8   W16   W32    W64    W128    W256
D1       0.02  0.05  0.17  0.64  2.42  9.73  38.45  147.35  604.91
D3       0.01  0.03  0.10  0.33  1.27  4.87  19.28    72.8  298.00
D9       0.02  0.01  0.05  0.16  0.50  1.85   7.24   28.41  113.02
D27      0.02  0.03  0.03  0.08  0.22  0.71   2.62   10.18   39.16
D81      0.00  0.01  0.03  0.04  0.12  0.32   1.00    3.81   14.23
D243     0.02  0.04  0.03  0.06  0.09  0.22   0.57    1.55    5.27
D729     0.02  0.03  0.03  0.05  0.10  0.17   0.41    0.99    2.64
D2187    0.03  0.07  0.10  0.18  0.15  0.25   0.34    0.68    1.68
D6561    0.02  0.06  0.08  0.14  0.33  0.31   0.51    0.67    1.42

Notes

The graph generated with Microsoft Excel 2007 may be slightly misleading as the pale blue peak
does not appear to reach 605.

Performance for a given width improves dramatically with depth

Model

Query      W1    W2    W4    W8   W16   W32   W64  W128  W256
D1       0.03  0.03  0.06  0.10  0.19  0.38  0.74  1.43  2.98
D3       0.03  0.03  0.06  0.11  0.19  0.37  0.75  1.53  3.01
D9       0.02  0.03  0.07  0.11  0.20  0.37  0.77  1.52  2.99
D27      0.01  0.03  0.06  0.10  0.17  0.37  0.74  1.51  2.99
D81      0.05  0.03  0.06  0.11  0.21  0.39  0.73  1.50  3.00
D243     0.02  0.05  0.07  0.11  0.18  0.38  0.75  1.54  3.06
D729     0.04  0.03  0.04  0.12  0.20  0.39  0.77  1.51  3.02
D2187    0.04  0.08  0.12  0.19  0.22  0.47  0.79  1.50  3.09
D6561    0.03  0.06  0.12  0.19  0.44  0.56  0.95  1.60  3.18


Notes

Performance for a given width is essentially independent of depth

Recursive Subquery Factor

Query      W1    W2    W4    W8   W16    W32    W64   W128   W256
D1       0.02  0.02  0.02  0.01  0.02   0.01   0.01   0.03   0.05
D3       0.01  0.02  0.00  0.01  0.01   0.02   0.02   0.02   0.09
D9       0.01  0.02  0.02  0.01  0.03   0.05   0.06   0.05   0.07
D27      0.02  0.02  0.01  0.03  0.04   0.01   0.08   0.05   0.25
D81      0.03  0.02  0.02  0.08  0.06   0.27   0.44   0.53   1.05
D243     0.01  0.08  0.09  0.19  0.11   0.42   0.60   1.61   4.33
D729     0.11  0.03  0.03  0.29  0.16   0.19   2.82   7.95  12.02
D2187    0.14  0.42  1.45  2.26  1.02   3.01   2.15   4.51  33.49
D6561    0.12  0.40  0.54  5.55  17.8  13.29  27.03  17.83  76.35

Notes


Performance for a given width worsens dramatically with depth


Slice Graphs
Wide Slice

Deep Slice

Explain Plans (Data Point W256-D1)


Analytics
---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |      |       |     6 (100)|          |
|   1 |  SORT ORDER BY         |              |   12 |   828 |     6  (50)| 00:00:01 |
|*  2 |   VIEW                 |              |   12 |   828 |     5  (40)| 00:00:01 |
|   3 |    WINDOW SORT         |              |   12 |   828 |     5  (40)| 00:00:01 |
|   4 |     VIEW               |              |   12 |   828 |     4  (25)| 00:00:01 |
|   5 |      WINDOW SORT       |              |   12 |   336 |     4  (25)| 00:00:01 |
|   6 |       TABLE ACCESS FULL| ACTIVITY_NOV |   12 |   336 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("GROUP_START"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss') AND "GROUP_END">=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss')))

Model
-------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows | Bytes | Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |      |       |     5 (100)|          |
|   1 |  SORT ORDER BY       |              |   12 |   828 |     5  (40)| 00:00:01 |
|*  2 |   VIEW               |              |   12 |   828 |     4  (25)| 00:00:01 |
|   3 |    SQL MODEL ORDERED |              |   12 |   336 |     4  (25)| 00:00:01 |
|   4 |     WINDOW SORT      |              |   12 |   336 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL| ACTIVITY_NOV |   12 |   336 |     3   (0)| 00:00:01 |
-------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("GROUP_START"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss') AND "GROUP_END">=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss')))

Recursive Subquery Factor


--------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name            | Rows | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                 |      |       |     6 (100)|          |
|   1 |  WINDOW SORT                               |                 |    2 |   102 |     6   (0)| 00:00:01 |
|   2 |   VIEW                                     |                 |    2 |   102 |     6   (0)| 00:00:01 |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |      |       |            |          |
|*  4 |     TABLE ACCESS BY INDEX ROWID            | ACTIVITY_NOV    |    1 |    28 |     2   (0)| 00:00:01 |
|*  5 |      INDEX SKIP SCAN                       | ACTIVITY_NOV_U1 |    1 |       |     1   (0)| 00:00:01 |
|   6 |     NESTED LOOPS                           |                 |      |       |            |          |
|   7 |      NESTED LOOPS                          |                 |    1 |    69 |     4   (0)| 00:00:01 |
|   8 |       RECURSIVE WITH PUMP                  |                 |      |       |            |          |
|*  9 |       INDEX RANGE SCAN                     | ACTIVITY_NOV_U1 |    6 |       |     1   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS BY INDEX ROWID           | ACTIVITY_NOV    |    1 |    28 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   4 - filter("END_DATE">TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("START_DATE"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("START_DATE"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
  10 - filter(((("ACT"."START_DATE"="RSQ"."END_DATE" AND INTERNAL_FUNCTION("DIRECTION")) OR
       ("ACT"."END_DATE"="RSQ"."START_DATE" AND "ACT"."END_DATE" IS NOT NULL AND
       INTERNAL_FUNCTION("DIRECTION"))) AND "ACT"."ACTIVITY_NAME"="RSQ"."ACTIVITY_NAME"))

Discussion of Results


The best method for deep data sets is Analytics

The best method for shallow data sets is Recursive Subquery Factor

The Model method is independent of depth and performs in the wide slice at a level between the
two other methods, except for one intermediate data point where it is better than both


Problem 2: Overlapping Ranges


Analytics Solution
How It Works
The solution for the second problem is derived from that for the first, but without the additional break
checking, and with an extra starting step to obtain a running end date that is the largest end date up to
the current record, ordered by start date. The running end date then replaces the end date in the next
step. The query thus has one more level.
0. Within an inline view, use Max to set a running end date on each record, converting null end dates
to a large value
1. Within an inline view, select all the original fields from the level-0 inline view, and use Lag and
Lead functions with CASE expressions to set group start and group end dates on the respective
start and running end dates of the break groups, leaving other values null.
2. Select all the original fields from the inline view, as well as the new fields within First_Value,
Last_Value functions with the IGNORE NULLS option, and convert back any large values to null
3. The output from step 2 solves the problems as defined, but if necessary, can be used within
another inline view to restrict the output to certain groups only (e.g. a 'current' group)
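
The four steps above can be traced procedurally. The Python sketch below is only an illustration for a single person's records; the function name overlap_groups and the FAR_FUTURE constant are hypothetical, the latter standing in for the large date used to replace null end dates. It assumes start dates are distinct within a person.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # stand-in for a null (open-ended) end date


def overlap_groups(records):
    """records: (start, end) pairs for one person; end may be None.

    Returns (start, end, group_start, group_end) tuples ordered by start.
    """
    rows = sorted(records, key=lambda r: r[0])
    n = len(rows)
    # Step 0: running maximum end date, ordered by start date.
    running, latest = [], None
    for _, end in rows:
        e = FAR_FUTURE if end is None else end
        latest = e if latest is None or e > latest else latest
        running.append(latest)
    # Step 1: mark break points only (the Lag / Lead CASE expressions).
    g_start = [rows[i][0] if i == 0 or rows[i][0] > running[i - 1] else None
               for i in range(n)]
    g_end = [running[i] if i == n - 1 or rows[i + 1][0] > running[i] else None
             for i in range(n)]
    # Step 2: carry the last start forward and the next end backward
    # (Last_Value / First_Value with IGNORE NULLS), then restore nulls.
    out, last_start = [None] * n, None
    for i in range(n):
        last_start = g_start[i] if g_start[i] is not None else last_start
        out[i] = last_start
    next_end = None
    for i in range(n - 1, -1, -1):
        next_end = g_end[i] if g_end[i] is not None else next_end
        grp_end = None if next_end == FAR_FUTURE else next_end
        out[i] = (rows[i][0], rows[i][1], out[i], grp_end)
    return out
```

For example, the ranges 8-16 Jun, 9-14 Jun and 20-30 Jun give a running end date of 16 Jun on the second record, so the gap before 20 Jun closes the group at 16 Jun and not at 14 Jun.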
Query Diagram

SQL
SELECT /* OVERLAP */
       person_id, start_date, end_date, activity_name, activity_id id,
       Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date) group_start,
       CASE First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date
                RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
            WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL
            ELSE First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date
                RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
       END group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       CASE WHEN (start_date > Nvl (Lag (running_end) OVER (PARTITION BY person_id ORDER BY start_date),
                                    start_date-1)) THEN start_date END group_start,
       CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date),
                       running_end+1) > running_end) THEN running_end END group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       Max (Nvl(end_date, '01-JAN-3000')) OVER (PARTITION BY person_id ORDER BY start_date) running_end
  FROM activity
 WHERE person_id IN (3, 4)
)
)
ORDER BY person_id, start_date

Inline View Diagram


The diagram below shows how the level-0 inline view obtains the running end dates, which are denoted by
the end points of the red arrows. The red extension blocks denote records where the running end date is
greater than the current end date (17 and 24). Although we still have overlaps, the solution for the
first problem now works because the running end date ensures the current record always carries the latest
end date seen so far in the current group. Thus in record 17 below, we won't wrongly set the group end date
to 14 on seeing the gap to record 18, and had record 18 started at 15, we would have correctly assigned it
to G2, not left it in G3 (I have added a test case T5 for that, but not included it in the diagram).

Solution Stage Table


The table below shows how the solution proceeds in stages from level 0, through level 1.
Act  Activity   Record Level            Running     Level 1 View            Solution
id   name       Start date  End date    (Level 0)   Start date  End date    Start date  End date
                                        End date
13   LEAVE      01-Jun-11   03-Jun-11   03-Jun-11   01-Jun-11               01-Jun-11   07-Jun-11
14   LEAVE      02-Jun-11   05-Jun-11   05-Jun-11                           01-Jun-11   07-Jun-11
15   LEAVE      04-Jun-11   07-Jun-11   07-Jun-11               07-Jun-11   01-Jun-11   07-Jun-11
16   LEAVE      08-Jun-11   16-Jun-11   16-Jun-11   08-Jun-11               08-Jun-11   16-Jun-11
17   TRAINING   09-Jun-11   14-Jun-11   16-Jun-11               16-Jun-11   08-Jun-11   16-Jun-11
18   TRAINING   20-Jun-11   30-Jun-11   30-Jun-11   20-Jun-11   30-Jun-11   20-Jun-11   30-Jun-11
19   LEAVE      01-Jun-11   03-Jun-11   03-Jun-11   01-Jun-11               01-Jun-11   07-Jun-11
20   LEAVE      02-Jun-11   05-Jun-11   05-Jun-11                           01-Jun-11   07-Jun-11
21   LEAVE      04-Jun-11   07-Jun-11   07-Jun-11               07-Jun-11   01-Jun-11   07-Jun-11
22   LEAVE      08-Jun-11   16-Jun-11   16-Jun-11   08-Jun-11               08-Jun-11
23   TRAINING   09-Jun-11               01-Jan-00                           08-Jun-11
24   TRAINING   20-Jun-11   30-Jun-11   01-Jan-00               01-Jan-00   08-Jun-11
25   LEAVE      01-Jun-11   03-Jun-11   03-Jun-11   01-Jun-11               01-Jun-11   07-Jun-11
26   LEAVE      02-Jun-11   05-Jun-11   05-Jun-11                           01-Jun-11   07-Jun-11
27   LEAVE      04-Jun-11   07-Jun-11   07-Jun-11               07-Jun-11   01-Jun-11   07-Jun-11
28   LEAVE      08-Jun-11   16-Jun-11   16-Jun-11   08-Jun-11               08-Jun-11   30-Jun-11
29   TRAINING   09-Jun-11   14-Jun-11   16-Jun-11                           08-Jun-11   30-Jun-11
30   TRAINING   15-Jun-11   30-Jun-11   30-Jun-11               30-Jun-11   08-Jun-11   30-Jun-11


Model Solution
How It Works
The key to solving this problem using Oracle's Model clause is to realise that the solution can be
represented as three simple inductions. If s, e, S, E are the current start date, end date, group start date,
end date and (ps, pe, pS, pE) and (ns, ne, nS, nE) are the prior and next values, ordering by start date,
then (using C-like terminology for brevity):
Initial, E = e; later,   E = (e > pE) ? e : pE   -- this gets the running latest end dates
Initial, S = s; later,   S = (s > pE) ? s : pS   -- this gets group start dates
Final,   E = e; earlier, E = (S < nS) ? E : nE   -- this gets group end dates

These inductions can easily be implemented as rules within the model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end
2. Add the Model keyword, partitioning by person, dimensioning by analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures
3. Initialise group start and end dates to start and end dates in the measures clause
4. Define the first rule to obtain a running latest end date for all rows after the first as the previous
running end date, unless the current end date is greater than the previous running end date, in
which case take the new end date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group start date for all rows after the first as the previous
group start date, unless the start date is greater than the previous running latest end date, in which
case take the current start date. This rule will be processed in the default ascending row order.
6. Define the third rule to obtain the group end date for all rows before the last as the next group
end date, unless the group start date is less than the next group start date, in which case keep the
current running latest end date. This rule must be processed in descending row order, and this
is specified as it is not the default.
7. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all break groups, while the
performance testing uses the form for obtaining a single break group, for consistency with the third
solution method.
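
As an illustration of how the three inductions interact, here is a minimal Python sketch for one person's records. It is not the Model clause itself: the function name model_groups and the FAR_FUTURE constant are hypothetical, and each loop plays the part of one rule, applied to all rows before the next rule starts.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # stand-in for a null end date


def model_groups(records):
    """records: (start, end) pairs for one person; end may be None.

    Returns (group_start, group_end) per record, ordered by start date.
    """
    rows = sorted(records, key=lambda r: r[0])
    n = len(rows)
    s = [r[0] for r in rows]
    e = [FAR_FUTURE if r[1] is None else r[1] for r in rows]
    S, E = list(s), list(e)  # measures initialised to start and end dates
    # Rule 1 (ascending): running latest end date.
    for i in range(1, n):
        E[i] = e[i] if e[i] > E[i - 1] else E[i - 1]
    # Rule 2 (ascending): group start date.
    for i in range(1, n):
        S[i] = s[i] if s[i] > E[i - 1] else S[i - 1]
    # Rule 3 (descending): group end date; the last row keeps its running value.
    for i in range(n - 2, -1, -1):
        E[i] = E[i] if S[i] < S[i + 1] else E[i + 1]
    return [(S[i], None if E[i] == FAR_FUTURE else E[i]) for i in range(n)]
```

The third loop has to run from the last row backwards because each group end date depends on the value already settled for the following row.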


Query Diagram

SQL
SELECT /* MOD_OVL */ person_id, start_date,
CASE end_date WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE end_date END end_date,
activity_name, activity_id, group_start,
CASE group_end WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE group_end END group_end
FROM activity
MODEL
PARTITION BY (person_id)
DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
MEASURES (start_date, Nvl (end_date, '01-JAN-3000') end_date, activity_name, activity_id,
start_date group_start, Nvl (end_date, '01-JAN-3000') group_end)
RULES (
group_end[rn > 1] =
CASE WHEN end_date[cv()] > group_end[cv()-1] THEN end_date[cv()] ELSE group_end[cv()-1] END,
group_start[rn > 1] =
CASE WHEN start_date[cv()] > group_end[cv()-1] THEN start_date[cv()] ELSE group_start[cv()-1]
END,
group_end[ANY] ORDER BY rn DESC = PRESENTV (group_start[cv()+1],
CASE WHEN group_start[cv()] < group_start[cv()+1] THEN group_end[cv()] ELSE group_end[cv()+1]
END,
group_end[cv()])
)
ORDER BY 1, 2, 3

Recursive Subquery Factor Solution


How It Works
This approach is based on new Oracle SQL functionality available only from Oracle Database v11.2,
called Recursive Subquery Factor (RSF).
1. Define a recursive subquery factor.
2. The anchoring branch of the RSF selects records defined by the start point. A direction column is
defined that here is set to 'E' for Either, meaning extend in either direction in the recursive
branch.
3. Add analytic function columns for row number by start date and by end date descending, and for
the minimum start date and maximum end date. These go in both branches.
4. The recursive branch extends the record set by joining records that link to extreme parent records
and that push the envelope. The direction column is set to 'B' or 'F' according to the direction of
extension (Backward or Forward).
5. Define a subquery factor for the envelope that simply obtains the minimum start date and
maximum end date from the recursive factor grouped by person
6. Select all records from the envelope factor, joining the activity table for all records within the
envelope by person to get all the group records with the group start and end dates being the
envelope values.
Note that we need the additional subquery factor because the recursive factor may exclude some records
that do not extend the envelope but are contained within it; for example, record 29 in data set T5 above.
The idea here is that for cases where the break group is small this will avoid expensive processing of the
entire record set. We'll demonstrate this saving in our performance analysis section.
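
The envelope idea can be sketched in a few lines of Python for one person's records. This is a simplification made for illustration, not the author's query: the name envelope_group and the FAR_FUTURE constant are hypothetical, and the direction and row number bookkeeping of the recursive branch is replaced by a loop that repeats until the envelope stops growing.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # stand-in for a null end date, as in the SQL


def envelope_group(records, today):
    """records: (start, end) pairs for one person; end may be None.

    Returns (env_start, env_end, members), or None if no record is current.
    """
    rows = [(s, FAR_FUTURE if e is None else e) for s, e in records]
    # Anchor: records current at `today` define the initial envelope.
    current = [r for r in rows if r[0] <= today <= r[1]]
    if not current:
        return None
    env_start = min(s for s, _ in current)
    env_end = max(e for _, e in current)
    # Recursion: only records that push the envelope back or forward matter.
    while True:
        back = [s for s, e in rows if s < env_start <= e]
        forward = [e for s, e in rows if s <= env_end < e]
        if not back and not forward:
            break
        env_start = min(back + [env_start])
        env_end = max(forward + [env_end])
    # Final join: every record inside the envelope belongs to the group,
    # including those that never extended it.
    members = sorted((s, e) for s, e in rows if env_start <= s and e <= env_end)
    return env_start, (None if env_end == FAR_FUTURE else env_end), members
```

With the T5 ranges and a root date of 10 June, the range 9-14 June never extends the envelope, yet the final join still returns it as a group member, which is the reason for the separate envelope step.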


Query Diagram


SQL
WITH rsq (person_id, start_date, end_date, activity_name, activity_id, env_start, env_end, rn_asc,
          rn_dsc, direction) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       Min (start_date) OVER (PARTITION BY person_id) env_start,
       Max (Nvl (end_date, '01-JAN-3000')) OVER (PARTITION BY person_id) env_end,
       Row_Number () OVER (PARTITION BY person_id ORDER BY start_date) rn_asc,
       Row_Number () OVER (PARTITION BY person_id ORDER BY Nvl (end_date, '01-JAN-3000') DESC) rn_dsc,
       'E' direction
  FROM activity
 WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
   AND person_id IN (3, 4, 5)
 UNION ALL
SELECT act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id,
       Min (act.start_date) OVER (PARTITION BY act.person_id) env_start,
       Max (Nvl (act.end_date, '01-JAN-3000')) OVER (PARTITION BY act.person_id) env_end,
       Row_Number () OVER (PARTITION BY act.person_id ORDER BY act.start_date) rn_asc,
       Row_Number () OVER (PARTITION BY act.person_id ORDER BY Nvl (act.end_date, '01-JAN-3000') DESC) rn_dsc,
       CASE WHEN act.start_date < rsq.env_start THEN 'B' ELSE 'F' END
  FROM rsq
  JOIN activity act
    ON act.person_id = rsq.person_id
   AND ((
         act.start_date                    <  rsq.env_start AND
         Nvl (act.end_date, '01-JAN-3000') >= rsq.env_start AND
         rsq.rn_asc                        =  1 AND
         rsq.direction                     IN ('E', 'B')
        ) OR
        (
         Nvl (act.end_date, '01-JAN-3000') >  rsq.env_end AND
         act.start_date                    <= rsq.env_end AND
         rsq.rn_dsc                        =  1 AND
         rsq.direction                     IN ('E', 'F')
        )
       )
), env AS (
SELECT person_id, Min (env_start) env_start, Max (env_end) env_end
  FROM rsq
 GROUP BY person_id
)
SELECT /* RSQ_OVL '&TODAY' */ act.person_id, act.start_date, act.end_date, act.activity_name,
       act.activity_id, env.env_start,
       CASE WHEN env.env_end = '01-JAN-3000' THEN NULL ELSE env.env_end END env_end
  FROM env
  JOIN activity act
    ON act.person_id = env.person_id
 WHERE act.start_date                    BETWEEN env.env_start AND env.env_end
   AND Nvl (act.end_date, '01-JAN-3000') BETWEEN env.env_start AND env.env_end
 ORDER BY act.person_id, act.start_date, act.end_date

Performance Analysis
Test Data Sets
If w and d are the numeric width and depth points, records are generated for three persons as follows:
Let random(x) be a random integer between 1 and x (generated afresh on each access)
Century start date = '01-JAN-1900'
Record limit (per person) = 500 * w
Loop for record limit (per person)
    Add record for person 1, as follows:
        Start date = random day in 20th century
        End date = start date + random (Ceil (sqrt(d)) + 1
    Repeat for persons 2 and 3
End loop
Store the root date as the mid point of the last record generated
This generation process ensures that the size of the record set is proportional to the width point, while the
ranges are of random sizes but within a scale determined by the depth point; larger ranges correlate with
larger groups. The width and depth points, together with the (randomized) size of the root group, are
shown in the next section.
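
A rough Python rendering of this generation process is given below, purely as an illustration. Several details are assumptions made here and not taken from the original scripts: the function name generate, the seed parameter, taking the 20th century as 1900 to 1999, and reading the end date expression as random(Ceil(sqrt(d))) + 1.

```python
import math
import random
from datetime import datetime, timedelta


def generate(w, d, seed=0):
    """Return (records, root_date) for width point w and depth point d."""
    rng = random.Random(seed)
    century_start = datetime(1900, 1, 1)
    # Assumption: "20th century" is taken as 1900-01-01 up to 1999-12-31.
    century_days = (datetime(2000, 1, 1) - century_start).days
    max_len = math.ceil(math.sqrt(d))
    records = []
    for _ in range(500 * w):  # record limit per person
        for person_id in (1, 2, 3):
            start = century_start + timedelta(days=rng.randrange(century_days))
            # Assumption: end date = start date + random(ceil(sqrt(d))) + 1.
            end = start + timedelta(days=rng.randint(1, max_len) + 1)
            records.append((person_id, start, end))
    # Root date: mid point of the last record generated.
    _, last_start, last_end = records[-1]
    root_date = last_start + (last_end - last_start) / 2
    return records, root_date
```

Calling generate(1, 16) yields 1500 records, matching the total shown for W1, with range lengths of between 2 and 5 days under the assumptions above.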

Output Record Counts


The output consists of all the records in the root group, which is defined as the group containing the root
date, and has at least one record by definition. Of course, each solution method operates on the same
data set, and so the number of records written to file is always the same for each method (and this was
checked). Note that the output record count reached its maximum, the total record count, in the data points
W16-D128, W32-D64, W32-D128 and W64 from D32 to D128.

Depth/Width       W1    W2    W4     W8    W16    W32    W64
Total Records   1500  3000  6000  12000  24000  48000  96000
D1                 1     1     1      1      7      3     11
D2                 1     1     1      3      4      3     30
D4                 1     1     3      1      5     10     62
D8                 1     1     3      8     11     45    556
D16                1     3     6      6     47    231   7814
D32                2     7    20     19    117   6531  96000
D64                2    10    20    134   8893  48000  96000
D128               1    11    94   4778  24000  48000  96000

CPU Times
Analytics

Depth/Width    W1    W2    W4     W8    W16    W32     W64
D1           0.28  1.09  3.93  13.96  46.65  126.2  396.46
D2           0.28  1.51  3.51  12.39  40.73  96.72  160.89
D4           0.28  0.95  3.68  11.08  31.73  62.95   63.96
D8           0.29  0.96  3.17   9.13  20.02  24.98   12.04
D16          0.28  0.78  2.42   5.48   8.19   4.86    2.31
D32          0.20  0.61  1.71   2.29   1.67   1.28    7.76
D64          0.18  0.44  0.67   0.63   1.16   3.73    7.30
D128         0.12  0.22  0.25   0.58   2.03   3.93    7.33

Notes

Performance for a given width improves dramatically with depth

Model

Depth/Width    W1    W2    W4    W8   W16   W32    W64
D1           0.11  0.23  0.41  0.71  1.54  2.82   5.74
D2           0.11  0.19  0.36  0.75  1.48  2.79   5.74
D4           0.12  0.18  0.37  0.74  1.40  2.77   5.87
D8           0.11  0.20  0.40  0.75  1.44  3.16   5.80
D16          0.11  0.20  0.37  0.73  1.85  2.82   6.15
D32          0.11  0.23  0.39  0.77  1.53  3.27  10.80
D64          0.08  0.19  0.40  0.75  2.00  6.39  10.32
D128         0.08  0.20  0.44  1.06  2.94  5.47  10.63

Notes

Performance for a given width is largely independent of depth, except where it starts to drop off at
the maximum depths on the wider data points

Recursive Subquery Factor (No Hint)

Depth/Width    W1    W2    W4    W8    W16     W32      W64
D1           0.03  0.01  0.03  0.05   0.09    0.10     0.38
D2           0.01  0.03  0.02  0.05   0.04    0.09     0.71
D4           0.02  0.02  0.03  0.01   0.06    0.16     1.30
D8           0.01  0.03  0.03  0.05   0.14    0.57    10.03
D16          0.04  0.00  0.04  0.05   0.26    2.25   121.22
D32          0.02  0.05  0.07  0.06   0.58   59.10  1255.15
D64          0.03  0.03  0.06  0.39  39.28  330.43  1240.12
D128         0.03  0.05  0.16  9.86  90.30  325.51  1213.75


Notes

Performance for a given width worsens dramatically with depth

Recursive Subquery Factor (Hint)


This query had the following hint added to the anchor branch of the recursive union:
/*+ INDEX (activity ACTIVITY_N1) */
And this to the recursive branch (the first hint means resolve the Or into a Union):
/*+ USE_CONCAT INDEX (act ACTIVITY_N1) */

Depth/Width    W1    W2    W4    W8    W16     W32     W64
D1           0.03  0.03  0.02  0.14   0.79    0.69    5.64
D2           0.02  0.03  0.03  0.19   0.40    0.65    4.14
D4           0.02  0.03  0.11  0.10   0.03    0.71    6.21
D8           0.02  0.04  0.03  0.20   0.78    3.27   48.11
D16          0.03  0.06  0.10  0.19   1.27    9.27  242.47
D32          0.03  0.10  0.32  0.25   0.06  113.83  735.15
D64          0.03  0.13  0.24  0.10  68.10  186.70  731.55
D128         0.03  0.03  0.42  0.61  48.46   93.41  158.94


Notes

Performance for a given width worsens dramatically with depth, although less so than for the
unhinted query

Slice Graphs
Wide Slice

Deep Slice

Explain Plans (Data Point W64-D1)


Analytics
------------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |       |  4369 (100)|          |
|   1 |  SORT ORDER BY           |          | 96660 | 6513K | 8416K |  4369   (1)| 00:00:53 |
|*  2 |   VIEW                   |          | 96660 | 6513K |       |  2794   (1)| 00:00:34 |
|   3 |    WINDOW SORT           |          | 96660 | 6513K | 8416K |  2794   (1)| 00:00:34 |
|   4 |     VIEW                 |          | 96660 | 6513K |       |  1219   (1)| 00:00:15 |
|   5 |      WINDOW BUFFER       |          | 96660 | 5663K |       |  1219   (1)| 00:00:15 |
|   6 |       VIEW               |          | 96660 | 5663K |       |  1219   (1)| 00:00:15 |
|   7 |        WINDOW SORT       |          | 96660 | 3964K | 5696K |  1219   (1)| 00:00:15 |
|   8 |         TABLE ACCESS FULL| ACTIVITY | 96660 | 3964K |       |   171   (1)| 00:00:03 |
------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("GROUP_START"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss') AND "GROUP_END">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss')))

Model
---------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |       |       |       |  2794 (100)|          |
|   1 |  SORT ORDER BY        |          | 96660 | 6513K | 8416K |  2794   (1)| 00:00:34 |
|*  2 |   VIEW                |          | 96660 | 6513K |       |  1219   (1)| 00:00:15 |
|   3 |    SQL MODEL ORDERED  |          | 96660 | 3964K |       |  1219   (1)| 00:00:15 |
|   4 |     WINDOW SORT       |          | 96660 | 3964K | 5696K |  1219   (1)| 00:00:15 |
|   5 |      TABLE ACCESS FULL| ACTIVITY | 96660 | 3964K |       |   171   (1)| 00:00:03 |
---------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   2 - filter(("GROUP_START"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss') AND "GROUP_END">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd
       hh24:mi:ss')))

Recursive Subquery Factor


----------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |             |       |       |       |  4673 (100)|          |
|   1 |  SORT ORDER BY                             |             |     1 |    63 |       |  4673  (62)| 00:00:57 |
|   2 |   NESTED LOOPS                             |             |     1 |    63 |       |  4672  (62)| 00:00:57 |
|   3 |    VIEW                                    |             |     3 |    63 |       |  4666  (63)| 00:00:56 |
|   4 |     HASH GROUP BY                          |             |     3 |    63 |       |  4666  (63)| 00:00:56 |
|   5 |      VIEW                                  |             | 21964 |  450K |       |  4664  (63)| 00:00:56 |
|   6 |       UNION ALL (RECURSIVE WITH) BREAD F   |             |       |       |       |            |          |
|   7 |        WINDOW SORT                         |             | 21616 |  886K | 1280K |   645   (1)| 00:00:08 |
|   8 |         WINDOW SORT                        |             | 21616 |  886K | 1280K |   645   (1)| 00:00:08 |
|*  9 |          TABLE ACCESS FULL                 | ACTIVITY    | 21616 |  886K |       |   172   (2)| 00:00:03 |
|  10 |        WINDOW SORT                         |             |   348 | 35496 |       |  4019  (72)| 00:00:49 |
|  11 |         WINDOW SORT                        |             |   348 | 35496 |       |  4019  (72)| 00:00:49 |
|* 12 |          HASH JOIN                         |             |   348 | 35496 | 1520K |  4017  (72)| 00:00:49 |
|  13 |           RECURSIVE WITH PUMP              |             |       |       |       |            |          |
|  14 |           TABLE ACCESS FULL                | ACTIVITY    | 96660 | 3964K |       |   171   (1)| 00:00:03 |
|  15 |    TABLE ACCESS BY INDEX ROWID             | ACTIVITY    |     1 |    42 |       |     2   (0)| 00:00:01 |
|* 16 |     INDEX RANGE SCAN                       | ACTIVITY_N1 |     1 |       |       |     1   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
   9 - filter(("START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       NVL("END_DATE",TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))>=
       TO_DATE(' 1966-04-03 12:00:00','syyyy-mm-dd hh24:mi:ss')))
  12 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
       filter((("ACT"."START_DATE"<"RSQ"."ENV_START" AND "RSQ"."ENV_START"<=NVL("END_DATE",TO_DATE('
       3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND "RSQ"."RN_ASC"=1 AND
       INTERNAL_FUNCTION("RSQ"."DIRECTION")) OR ("RSQ"."ENV_END"<NVL("END_DATE",
       TO_DATE(' 3000-01-01 00:00:00','syyyy-mm-dd hh24:mi:ss'))
       AND "ACT"."START_DATE"<="RSQ"."ENV_END" AND "RSQ"."RN_DSC"=1 AND
       INTERNAL_FUNCTION("RSQ"."DIRECTION"))))
  16 - access("ACT"."PERSON_ID"="ENV"."PERSON_ID" AND "ACT"."START_DATE">="ENV"."ENV_START" AND
       "ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ACT"."START_DATE"<="ENV"."ENV_END" AND
       "ENV"."ENV_END">="ACT"."SYS_NC00006$")
       filter(("ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ENV"."ENV_END">="ACT"."SYS_NC00006$"))

Recursive Subquery Factor with Hint


-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                    | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                             |             |       |       |       |  306K (100)|          |
|   1 |  SORT ORDER BY                               |             |     1 |    63 |       |  306K   (2)| 01:01:21 |
|   2 |   NESTED LOOPS                               |             |     1 |    63 |       |  306K   (2)| 01:01:21 |
|   3 |    VIEW                                      |             |     3 |    63 |       |  306K   (2)| 01:01:21 |
|   4 |     HASH GROUP BY                            |             |     3 |    63 |       |  306K   (2)| 01:01:21 |
|   5 |      VIEW                                    |             | 3503K |   70M |       |  306K   (2)| 01:01:19 |
|   6 |       UNION ALL (RECURSIVE WITH) BRE F       |             |       |       |       |            |          |
|   7 |        WINDOW SORT                           |             | 21616 |  886K | 1280K | 22328   (1)| 00:04:28 |
|   8 |         WINDOW BUFFER                        |             | 21616 |  886K |       | 22328   (1)| 00:04:28 |
|   9 |          TABLE ACCESS BY INDEX ROWID         | ACTIVITY    | 21616 |  886K |       | 22092   (1)| 00:04:26 |
|* 10 |           INDEX FULL SCAN                    | ACTIVITY_N1 | 21616 |       |       |   517   (1)| 00:00:07 |
|  11 |        WINDOW SORT                           |             | 3482K |  338M |       |  284K   (3)| 00:56:51 |
|  12 |         WINDOW SORT                          |             | 3482K |  338M |       |  284K   (3)| 00:56:51 |
|  13 |          CONCATENATION                       |             |       |       |       |            |          |
|  14 |           MERGE JOIN                         |             | 1741K |  169M |       |  121K   (3)| 00:24:23 |
|  15 |            TABLE ACCESS BY INDEX ROWID       | ACTIVITY    | 96660 | 3964K |       | 96331   (1)| 00:19:16 |
|  16 |             INDEX FULL SCAN                  | ACTIVITY_N1 | 96000 |       |       |   517   (1)| 00:00:07 |
|* 17 |            FILTER                            |             |       |       |       |            |          |
|* 18 |             SORT JOIN                        |             | 21616 | 1266K | 3256K | 22642   (1)| 00:04:32 |
|  19 |              RECURSIVE WITH PUMP             |             |       |       |       |            |          |
|  20 |           MERGE JOIN                         |             | 1741K |  169M |       |  121K   (3)| 00:24:23 |
|  21 |            TABLE ACCESS BY INDEX ROWID       | ACTIVITY    | 96660 | 3964K |       | 96331   (1)| 00:19:16 |
|  22 |             INDEX FULL SCAN                  | ACTIVITY_N1 | 96000 |       |       |   517   (1)| 00:00:07 |
|* 23 |            FILTER                            |             |       |       |       |            |          |
|* 24 |             SORT JOIN                        |             | 21616 | 1266K | 3256K | 22642   (1)| 00:04:32 |
|  25 |              RECURSIVE WITH PUMP             |             |       |       |       |            |          |
|  26 |    TABLE ACCESS BY INDEX ROWID               | ACTIVITY    |     1 |    42 |       |     2   (0)| 00:00:01 |
|* 27 |     INDEX RANGE SCAN                         | ACTIVITY_N1 |     1 |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
  10 - access("ACTIVITY"."SYS_NC00006$">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       "START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter(("START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
       "ACTIVITY"."SYS_NC00006$">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  17 - filter(("ACT"."START_DATE"<="RSQ"."ENV_END" AND "RSQ"."ENV_END"<NVL("END_DATE",
       TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
  18 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
       filter("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
  23 - filter(("ACT"."START_DATE"<"RSQ"."ENV_START" AND "RSQ"."ENV_START"<=NVL("END_DATE",TO_DATE('
       3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND (LNNVL("RSQ"."ENV_END"<NVL("END_DATE",
       TO_DATE('3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
       OR LNNVL("ACT"."START_DATE"<="RSQ"."ENV_END") OR
       LNNVL("RSQ"."RN_DSC"=1) OR (LNNVL("RSQ"."DIRECTION"='E')
       AND LNNVL("RSQ"."DIRECTION"='F')))))
  24 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
       filter("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
  27 - access("ACT"."PERSON_ID"="ENV"."PERSON_ID" AND "ACT"."START_DATE">="ENV"."ENV_START" AND
       "ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ACT"."START_DATE"<="ENV"."ENV_END" AND
       "ENV"."ENV_END">="ACT"."SYS_NC00006$")
       filter(("ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ENV"."ENV_END">="ACT"."SYS_NC00006$"))

Discussion of Results


The best method for deep data sets is Analytics

The best method for shallow data sets is Recursive Subquery Factor. The hinted version levels
the performance off at the extremes, but does not make a preferred option

The Model method is largely independent of depth and performs in the wide slice at a level
between the two other methods, except for one intermediate data point where it is better than both

Problem 3: Bursts of Activity


Analytics Solution (None)
I am unaware of a solution to this problem using analytic functions alone.

Model Solution
How It Works
The key to solving this problem using Oracle's Model clause is to realise that the solution can be
represented as simple inductions, forward for the group start dates, then backward for the group end
dates. If D is the distance parameter, (s, e, S, E) are the current start date, end date, group start date and
group end date, and (ps, pe, pS, pE) and (ns, ne, nS, nE) are the prior and next values, then (using C-like
terminology for brevity):
Initial, S = s; later, S = (s - pS > D) ? s : pS
Final, E = e; earlier, E = (nS > S) ? e : nE

These inductions can easily be implemented as rules within the model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end
2. Add the Model keyword, partitioning by person, dimensioning by analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures
3. Initialise group start and end dates to start and end dates in the measures clause
4. Define the first rule to obtain the group start date for all rows after the first as the start date, unless
the start date is within the distance parameter of the previous group start date, in which
case take the previous group start date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group end date for all rows before the last as the next group
end date, unless the group start date is less than the next group start date, in which case take the
current end date. This rule must be processed in descending row order, and this is specified as it
is not the default.
6. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all break groups, while the
performance testing uses the form for obtaining a single break group, for consistency with the second
solution method.
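
The two inductions can be sketched outside SQL as a forward pass followed by a backward pass. The following Python is a minimal illustration only, assuming the rows for one person are already sorted by start date; the function name burst_groups and the tuple layout are hypothetical and do not come from the document.

```python
from datetime import date


def burst_groups(rows, distance_days):
    """rows: list of (start, end) date tuples for one person, sorted by start.

    Returns a list of (start, end, group_start, group_end) tuples.
    """
    n = len(rows)
    group_start = [s for s, _ in rows]  # initialise S = s
    group_end = [e for _, e in rows]    # initialise E = e

    # Forward induction (ascending row order): S = (s - pS > D) ? s : pS
    for i in range(1, n):
        if (rows[i][0] - group_start[i - 1]).days > distance_days:
            group_start[i] = rows[i][0]
        else:
            group_start[i] = group_start[i - 1]

    # Backward induction (descending row order): E = (nS > S) ? e : nE
    for i in range(n - 2, -1, -1):
        if group_start[i + 1] > group_start[i]:
            group_end[i] = rows[i][1]
        else:
            group_end[i] = group_end[i + 1]

    return [(s, e, gs, ge)
            for (s, e), gs, ge in zip(rows, group_start, group_end)]


# Hypothetical sample data: five activities with a distance limit of 3 days
sample = [
    (date(2011, 1, 1), date(2011, 1, 2)),
    (date(2011, 1, 2), date(2011, 1, 3)),
    (date(2011, 1, 4), date(2011, 1, 6)),
    (date(2011, 1, 5), date(2011, 1, 7)),
    (date(2011, 1, 10), date(2011, 1, 11)),
]
result = burst_groups(sample, 3)
```

As in the Model rules, the distance is measured from the group start rather than from the previous row, and the group end is taken from the last row of the group.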

Query Diagram

SQL
SELECT /* MOD */ person_id, start_date, end_date, activity_name, activity_id, group_start, group_end
  FROM activity
 MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, end_date, activity_name, activity_id,
              start_date group_start, end_date group_end)
    RULES (
       group_start[rn > 1] = CASE WHEN start_date[cv()] - group_start[cv()-1] > 3
                                  THEN start_date[cv()]
                                  ELSE group_start[cv()-1] END,
       group_end[ANY] ORDER BY rn DESC = PRESENTV (group_start[cv()+1],
                                  CASE WHEN group_start[cv()] < group_start[cv()+1]
                                       THEN end_date[cv()]
                                       ELSE group_end[cv()+1] END,
                                  end_date[cv()])
    )
 ORDER BY 1, 2, 3

Recursive Subquery Factoring Solution


How It Works
This approach is based on new Oracle SQL functionality available only from Oracle Database v11.2,
called Recursive Subquery Factor (RSF).
1. Define a (non-recursive) subquery factor, act, that selects all records after a given root date and
obtains a row number by person ordered by start date.
2. Define a recursive subquery factor.
3. The anchoring branch of the RSF selects the first record from act, with group start as the start
date.
4. The recursive branch extends the record set by joining the next record from act if it is within the
distance limit from the previous group start, and retaining the group start at its previous value.
5. Select all records from the RSF, and get the group end date using an analytic Max.
The idea here is that for cases where the break group is small this will avoid expensive processing of the
entire record set. We'll demonstrate this saving in our performance analysis section.
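
The early-stopping idea behind the steps above can be sketched in Python. This is an illustration under assumptions, not the author's implementation: the name first_burst_group is hypothetical, a single person is assumed, and the initial sort stands in for the row numbering done in the act subquery (which still reads every qualifying row).

```python
from datetime import date


def first_burst_group(rows, root_date, distance_days):
    """rows: (start, end) date tuples for one person, in any order.

    Returns the rows of the first group on or after root_date as
    (start, end, group_start, group_end) tuples.
    """
    # Counterpart of the non-recursive 'act' subquery: filter and order
    act = sorted(r for r in rows if r[0] >= root_date)
    if not act:
        return []

    group_start = act[0][0]   # anchor branch: the rn = 1 record
    members = [act[0]]
    for row in act[1:]:       # recursive branch: the next record by rn
        if (row[0] - group_start).days > distance_days:
            break             # nothing joins, so the recursion ends here
        members.append(row)

    # Counterpart of the analytic Max over the recursive result
    group_end = max(e for _, e in members)
    return [(s, e, group_start, group_end) for s, e in members]


# Hypothetical sample data with a distance limit of 3 days
sample = [
    (date(2011, 1, 1), date(2011, 1, 2)),
    (date(2011, 1, 2), date(2011, 1, 3)),
    (date(2011, 1, 4), date(2011, 1, 6)),
    (date(2011, 1, 5), date(2011, 1, 7)),
    (date(2011, 1, 10), date(2011, 1, 11)),
]
first_group = first_burst_group(sample, date(2011, 1, 1), 3)
```

The loop stops as soon as one record falls outside the distance limit, which is why the work grows with the size of the group rather than with the size of the record set.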

Query Diagram

SQL
WITH
act AS (
SELECT person_id, start_date, end_date, activity_name, activity_id, Row_Number() OVER (PARTITION BY
person_id ORDER BY start_date) rn
FROM activity
WHERE start_date >= '&TODAY'
),
rsq (person_id, rn, start_date, end_date, activity_name, activity_id, group_start) AS (
SELECT person_id, rn, start_date, end_date, activity_name, activity_id, start_date
group_start
FROM act
WHERE rn = 1
UNION ALL
SELECT act.person_id,
act.rn,
act.start_date,
act.end_date,
act.activity_name,
act.activity_id,
rsq.group_start
FROM act
JOIN rsq
ON rsq.rn = act.rn - 1
AND rsq.person_id = act.person_id
AND act.start_date - rsq.group_start <= 3
)
SELECT /* RSQ_DST '&TODAY' */ rsq.person_id,
rsq.start_date,
rsq.end_date,
rsq.activity_name,
rsq.activity_id,
rsq.group_start,
Max (rsq.end_date) OVER (PARTITION BY rsq.person_id)
FROM rsq
ORDER BY 1, 2, 3

Performance Analysis
Test Data Sets
If w and d are the numeric width and depth points, records are generated for three persons as follows:
Let random(x) be a random integer between 1 and x (generated afresh on each access)
Record limit (per person) = 500 * w
Loop for record limit (per person)
    Add record for person 1, as follows:
        Start date = random day in 20th century
        End date = start date + random (d) + 1
    Repeat for persons 2 and 3
End loop
Store the root date as the earliest start date generated
This generation process ensures that the size of the record set is proportional to the width point, while the
ranges are of random sizes but within a scale determined by the depth point; larger ranges have no effect
on group size here: the maximum group range is taken to be the depth parameter value in days. In this
way, depth correlates with the group sizes.
The width and depth points, together with the (randomized) size of the root group, are shown in the next
section.
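
The generation procedure described above can be sketched as follows. This is a minimal Python illustration, not the author's data setup program: the function name generate_activities, the seed parameter and the choice of 1 January 1900 to 31 December 1999 as the "20th century" are assumptions made for the example.

```python
import random
from datetime import date, timedelta


def generate_activities(width, depth, seed=0, persons=3):
    """Return (records, root_date); records are (person_id, start, end)."""
    rng = random.Random(seed)              # seeded only for repeatability
    century_start = date(1900, 1, 1)       # assumed start of the century
    n_days = (date(2000, 1, 1) - century_start).days
    records = []
    for _ in range(500 * width):           # record limit per person
        for person_id in range(1, persons + 1):
            start = century_start + timedelta(days=rng.randrange(n_days))
            # random(d) is an integer between 1 and d inclusive
            end = start + timedelta(days=rng.randint(1, depth) + 1)
            records.append((person_id, start, end))
    root_date = min(start for _, start, _ in records)
    return records, root_date


records, root_date = generate_activities(width=2, depth=9, seed=1)
```

With width 2 this yields 3,000 records in total, matching the W2 column of the row count tables.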
Output Row Counts
The output consists of all the records in the first group, starting at the root date. Of course, each solution
method operates on the same data set, and so the number of records written to file is always the same for
both (and this was checked).

Depth/Width      W1    W2    W4     W8    W16    W32    W64    W128
Total Records  1500  3000  6000  12000  24000  48000  96000  192000
D1                3     3     3      3      4      4      7      12
D3                3     4     3      6      5      7     12      23
D9                3     3     3      5      9     16     29      71
D27               5     9     7     15     28     31     72     138
D81               4     9    16     32     64    125    218     438
D243             11    21    38     74    150    290    678    1295
D729             31    70   138    229    494    907   1959    3794
D2187            97   164   361    742   1444   2881   5785   11561

CPU Times
Model

Depth/Width      W1    W2    W4     W8    W16    W32    W64    W128
Total Records  1500  3000  6000  12000  24000  48000  96000  192000
D1             0.07  0.16  0.29   0.58   1.19   2.28   4.73    9.39
D3             0.10  0.16  0.30   0.59   1.15   2.34   4.64    9.41
D9             0.10  0.15  0.29   0.59   1.17   2.31   4.67    9.29
D27            0.09  0.14  0.31   0.58   1.19   2.35   4.68    9.41
D81            0.09  0.16  0.31   0.58   1.17   2.36   4.69    9.38
D243           0.09  0.16  0.32   0.59   1.17   2.29   4.71    9.42
D729           0.11  0.15  0.31   0.61   1.20   2.34   4.84    9.57
D2187          0.08  0.18  0.29   0.69   1.23   2.43   5.00   10.03

Notes

Performance for a given width is essentially independent of depth

Recursive Subquery Factor

Depth/Width      W1    W2    W4     W8    W16    W32    W64    W128
Total Records  1500  3000  6000  12000  24000  48000  96000  192000
D1             0.01  0.03  0.05   0.07   0.13   0.22   0.53    1.22
D3             0.03  0.03  0.03   0.06   0.12   0.29   0.61    1.48
D9             0.03  0.02  0.03   0.06   0.16   0.31   0.83    3.05
D27            0.03  0.03  0.05   0.11   0.24   0.47   1.62    4.93
D81            0.03  0.05  0.06   0.14   0.36   1.11   3.74   13.61
D243           0.05  0.05  0.10   0.22   0.71   2.42  10.58   37.66
D729           0.01  0.10  0.21   0.53   2.00   7.00  27.72  107.73
D2187          0.08  0.13  0.46   1.62   5.63  21.45  82.37  317.82

Notes

Performance for a given width worsens dramatically with depth

Slice Graphs
Wide Slice

Deep Slice

Explain Plans (Data Point W128-D1)


Model
------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |       |       |       |  5591 (100)|          |
|   1 |  SORT ORDER BY        |          |   193K|    14M|    18M|  5591   (1)| 00:01:08 |
|*  2 |   VIEW                |          |   193K|    14M|       |  2073   (1)| 00:00:25 |
|   3 |    SQL MODEL ORDERED  |          |   193K|  6422K|       |  2073   (1)| 00:00:25 |
|   4 |     WINDOW SORT       |          |   193K|  6422K|  9112K|  2073   (1)| 00:00:25 |
|*  5 |      TABLE ACCESS FULL| ACTIVITY |   193K|  6422K|       |   310   (1)| 00:00:04 |
------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
2 - filter("GROUP_START"="MIN_START")
5 - filter("START_DATE">=TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))

Recursive Subquery Factor


------------------------------------------------------------------------------------------------------
| Id  | Operation                               |Name     |Rows |Bytes |TempSpc|Cost (%CPU)| Time    |
------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                        |         |     |      |       | 4455 (100)|         |
|   1 |  TEMP TABLE TRANSFORMATION              |         |     |      |       |           |         |
|   2 |   LOAD AS SELECT                        |         |     |      |       |           |         |
|   3 |    WINDOW SORT                          |         | 193K| 6422K|  9112K| 2073   (1)| 00:00:25|
|*  4 |     TABLE ACCESS FULL                   |ACTIVITY | 193K| 6422K|       |  310   (1)| 00:00:04|
|   5 |   WINDOW SORT                           |         |6429K|  367M|       | 2382  (23)| 00:00:29|
|   6 |    VIEW                                 |         |6429K|  367M|       | 2382  (23)| 00:00:29|
|   7 |     UNION ALL (RECURSIVE WITH) BREADTH F|         |     |      |       |           |         |
|*  8 |      VIEW                               |         | 193K|   11M|       |  246   (1)| 00:00:03|
|   9 |       TABLE ACCESS FULL                 |SYS_TEMP_| 193K| 6422K|       |  246   (1)| 00:00:03|
|* 10 |      HASH JOIN                          |         |6235K|  588M|  8880K| 2136  (25)| 00:00:26|
|  11 |       RECURSIVE WITH PUMP               |         |     |      |       |           |         |
|  12 |       VIEW                              |         | 193K|   11M|       |  246   (1)| 00:00:03|
|  13 |        TABLE ACCESS FULL                |SYS_TEMP_| 193K| 6422K|       |  246   (1)| 00:00:03|
------------------------------------------------------------------------------------------------------
Predicate Information (identified by operation id):
---------------------------------------------------
4 - filter("START_DATE">=TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
8 - filter("RN"=1)
10 - access("RSQ"."RN"="ACT"."RN"-1 AND "RSQ"."PERSON_ID"="ACT"."PERSON_ID")
     filter("ACT"."START_DATE"-"RSQ"."GROUP_START"<=1)
[SYS_TEMP_ was SYS_TEMP_0FD9D6648_110EBBB - truncated to fit the Word box]

Discussion of Results

No solution method using Analytics was found

The best method for shallow data sets is Recursive Subquery Factor

The best method for deep data sets is Model, which is also independent of depth

Analytics Anomaly Analysis


We observed in the performance analysis sections for problems 1 and 2 that the analytics solutions
behaved in the opposite manner to recursive subquery factoring: performance improved roughly in
proportion to depth for given width. This is surprising, since we might expect performance to remain
largely independent of depth, as with the model solutions, given that depth does not affect overall problem
size. The behaviour of recursive subquery factoring is consistent with expectation, given the construction
of the methods.
After completion of the initial performance analysis (v1.2 of the document) this issue was further analysed.
It was determined by experiment that variations on the queries could avoid the deterioration in
performance with decreasing depth. The problem seems to be due to a glitch in Oracle's execution of
queries with First_Value and the IGNORE NULLS option, and occurs in both 10g and 11g XE. It seems as
though Oracle does a lot of unnecessary recalculation for each row processed when there are few null
values.
The first variation involves noting that finding the first value in a list looking forward from the current row is
the same as finding the last value looking back from the end to the current row. At first it might seem that
the latter would be slower, but reuse of processing for previous rows as one progresses through the row
set clearly is important.
The second variation involves removing the First_Value from the existing query, then adding an enclosing
query that gets the group end as the maximum for person and group start.
The performance analysis was repeated for the two variations, plus the original analytic solutions and the
model solution on a single wide slice, using the same data setup programs. As there is no RSF method
now, we have taken the original forms of the problems where all groups are obtained. Both variations now
perform as well for shallow as for deep data sets. Notice that the explain plans suggest that the variations
will perform worse, having additional sort operations and higher estimated costs, but the measured CPU
times show these estimates to be wrong.
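
The equivalence behind the two variations can be checked with a small Python sketch. It is an illustration under assumptions: the function names are hypothetical, the markers are plain integers standing in for dates, and the naive rescan in first_value_forward only shows what per-row recalculation would look like; it makes no claim about how Oracle evaluates First_Value internally.

```python
def first_value_forward(markers):
    """First non-null marker at or after each row, rescanning for every row."""
    out = []
    for i in range(len(markers)):
        out.append(next((m for m in markers[i:] if m is not None), None))
    return out


def last_value_backward(markers):
    """Same result from one pass running from the last row back to the first,
    carrying forward the most recent non-null marker."""
    out = [None] * len(markers)
    latest = None
    for i in range(len(markers) - 1, -1, -1):
        if markers[i] is not None:
            latest = markers[i]
        out[i] = latest
    return out


def max_per_group(end_values, group_starts):
    """Second variation: group end as the maximum end value per group start."""
    best = {}
    for e, g in zip(end_values, group_starts):
        best[g] = max(best.get(g, e), e)
    return [best[g] for g in group_starts]


# Hypothetical rows for one person, ordered by start: three groups
ends = [2, 3, 6, 7, 11]
group_starts = [1, 1, 1, 5, 10]
end_markers = [None, None, 6, 7, 11]   # non-null only on the last row of a group

forward = first_value_forward(end_markers)
backward = last_value_backward(end_markers)
by_max = max_per_group(ends, group_starts)
```

All three lists agree on this sample, which is the property the NOF and MAX queries rely on.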

Analytic Query Variations


Problem 1: Contiguous Ranges
Query NOF (Replace First_Value with Last_Value Inverted)
The query structure is essentially unchanged.
SQL
SELECT /* NOV_NOF */
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start,
Last_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date DESC)
group_end
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date),
start_date-1)) OR
(activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
THEN start_date END group_start,
CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date), end_date+1) >
end_date) OR
(activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY
start_date)) THEN end_date END group_end
FROM activity_nov
)
ORDER BY person_id, start_date

Explain Plan
----------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |       |       |       |   522 (100)|          |
|   1 |  WINDOW SORT          |              | 19200 |  1293K|  1680K|   522   (1)| 00:00:07 |
|   2 |   WINDOW SORT         |              | 19200 |  1293K|  1680K|   522   (1)| 00:00:07 |
|   3 |    VIEW               |              | 19200 |  1293K|       |   205   (1)| 00:00:03 |
|   4 |     WINDOW SORT       |              | 19200 |   618K|   912K|   205   (1)| 00:00:03 |
|   5 |      TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |   618K|       |    30   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------

Query MAX (Remove First_Value, Adding Max in Outer Level)

SQL
SELECT /* NOV_MAX */
person_id, start_date, end_date, activity_name, id,
group_start,
Max (end_date) OVER (PARTITION BY person_id, group_start) group_end
FROM (
SELECT
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date),
start_date-1)) OR
(activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
THEN start_date END group_start,
CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date), end_date+1) >
end_date) OR
(activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY
start_date)) THEN end_date END group_end
FROM activity_nov
)
)
ORDER BY person_id, start_date

Explain Plan
------------------------------------------------------------------------------------------------
| Id  | Operation               | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |              |       |       |       |  1041 (100)|          |
|   1 |  SORT ORDER BY          |              | 19200 |  1125K|  1448K|  1041   (1)| 00:00:13 |
|   2 |   WINDOW SORT           |              | 19200 |  1125K|  1448K|  1041   (1)| 00:00:13 |
|   3 |    VIEW                 |              | 19200 |  1125K|       |   484   (1)| 00:00:06 |
|   4 |     WINDOW SORT         |              | 19200 |  1125K|  1448K|   484   (1)| 00:00:06 |
|   5 |      VIEW               |              | 19200 |  1125K|       |   205   (1)| 00:00:03 |
|   6 |       WINDOW SORT       |              | 19200 |   618K|   912K|   205   (1)| 00:00:03 |
|   7 |        TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |   618K|       |    30   (0)| 00:00:01 |
------------------------------------------------------------------------------------------------

Query Analytics (Original)


Explain Plan
---------------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |       |       |       |   205 (100)|          |
|   1 |  WINDOW SORT         |              | 19200 |  1293K|       |   205   (1)| 00:00:03 |
|   2 |   VIEW               |              | 19200 |  1293K|       |   205   (1)| 00:00:03 |
|   3 |    WINDOW SORT       |              | 19200 |   618K|   912K|   205   (1)| 00:00:03 |
|   4 |     TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |   618K|       |    30   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Query Model (Original)


Explain Plan
---------------------------------------------------------------------------------------------
| Id  | Operation            | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |              |       |       |       |   379 (100)|          |
|   1 |  SORT ORDER BY       |              | 19200 |   618K|   912K|   379   (1)| 00:00:05 |
|   2 |   SQL MODEL ORDERED  |              | 19200 |   618K|       |   379   (1)| 00:00:05 |
|   3 |    WINDOW SORT       |              | 19200 |   618K|   912K|   379   (1)| 00:00:05 |
|   4 |     TABLE ACCESS FULL| ACTIVITY_NOV | 19200 |   618K|       |    30   (0)| 00:00:01 |
---------------------------------------------------------------------------------------------

Problem 2: Overlapping Ranges


Query NOF (Replace First_Value with Last_Value Inverted)
The query structure is essentially unchanged.
SQL
SELECT /* ANA_NOF */
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start,
CASE Last_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date DESC)
WHEN To_Date('01-JAN-3000', 'DD-MON-YY') THEN NULL ELSE
Last_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date DESC)
END group_end
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (running_end) OVER (PARTITION BY person_id ORDER BY start_date),
start_date-1)) THEN start_date END group_start,
CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date),
running_end+1) > running_end) THEN running_end END group_end
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
Max (Nvl(end_date, '01-JAN-3000')) OVER (PARTITION BY person_id ORDER BY start_date) running_end
FROM activity
)
)
ORDER BY person_id, start_date
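
The role of running_end in the innermost query can be sketched in Python: a new group starts only when a row's start date is later than the maximum end date seen so far. This is a minimal illustration with hypothetical names, using integers as day numbers and None for an open-ended range (the query substitutes 01-JAN-3000 instead).

```python
def overlap_group_starts(rows):
    """rows: (start, end) pairs for one person, sorted by start.

    end may be None for an open-ended range. Returns the group start
    assigned to each row.
    """
    far_future = float("inf")   # stand-in for the 01-JAN-3000 placeholder
    out = []
    running_end = None          # maximum end over the rows before this one
    group_start = None
    for start, end in rows:
        # Break when this start lies beyond every earlier end
        if running_end is None or start > running_end:
            group_start = start
        effective_end = far_future if end is None else end
        if running_end is None:
            running_end = effective_end
        else:
            running_end = max(running_end, effective_end)
        out.append(group_start)
    return out


# Hypothetical sample: the second range is nested, the fifth is open-ended
sample = [(1, 5), (2, 3), (4, 8), (9, 10), (10, None), (50, 60)]
starts = overlap_group_starts(sample)
```

The nested range (2, 3) shows why the running maximum is needed: comparing only with the previous row's end date would wrongly break the group at start 4.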

Explain Plan
--------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |       |       |       |  2745 (100)|          |
|   1 |  WINDOW SORT            |          | 94880 |  6393K|  8264K|  2745   (1)| 00:00:33 |
|   2 |   WINDOW SORT           |          | 94880 |  6393K|  8264K|  2745   (1)| 00:00:33 |
|   3 |    VIEW                 |          | 94880 |  6393K|       |  1199   (1)| 00:00:15 |
|   4 |     WINDOW BUFFER       |          | 94880 |  5559K|       |  1199   (1)| 00:00:15 |
|   5 |      VIEW               |          | 94880 |  5559K|       |  1199   (1)| 00:00:15 |
|   6 |       WINDOW SORT       |          | 94880 |  3891K|  5592K|  1199   (1)| 00:00:15 |
|   7 |        TABLE ACCESS FULL| ACTIVITY | 94880 |  3891K|       |   171   (1)| 00:00:03 |
--------------------------------------------------------------------------------------------

Query MAX (Remove First_Value, Adding Max in Outer Level)

SQL
SELECT /* ANA_MAX */
person_id, start_date, end_date, activity_name, id,
group_start,
CASE Max (Nvl(end_date, To_Date('01-JAN-3000', 'DD-MON-YY'))) OVER (PARTITION BY person_id,
group_start) WHEN To_Date('01-JAN-3000', 'DD-MON-YY') THEN NULL ELSE
Max (end_date) OVER (PARTITION BY person_id, group_start) END group_end
FROM (
SELECT /* ANA_OVL */
person_id, start_date, end_date, activity_name, activity_id id,
Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date)
group_start
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
CASE WHEN (start_date > Nvl (Lag (running_end) OVER (PARTITION BY person_id ORDER BY start_date),
start_date-1)) THEN start_date END group_start,
CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date),
running_end+1) > running_end) THEN running_end END group_end
FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
Max (Nvl(end_date, '01-JAN-3000')) OVER (PARTITION BY person_id ORDER BY start_date) running_end
FROM activity
)
)
)
ORDER BY person_id, start_date

Explain Plan
--------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |       |       |       |  2745 (100)|          |
|   1 |  WINDOW SORT            |          | 94880 |  6393K|  8264K|  2745   (1)| 00:00:33 |
|   2 |   WINDOW SORT           |          | 94880 |  6393K|  8264K|  2745   (1)| 00:00:33 |
|   3 |    VIEW                 |          | 94880 |  6393K|       |  1199   (1)| 00:00:15 |
|   4 |     WINDOW BUFFER       |          | 94880 |  5559K|       |  1199   (1)| 00:00:15 |
|   5 |      VIEW               |          | 94880 |  5559K|       |  1199   (1)| 00:00:15 |
|   6 |       WINDOW SORT       |          | 94880 |  3891K|  5592K|  1199   (1)| 00:00:15 |
|   7 |        TABLE ACCESS FULL| ACTIVITY | 94880 |  3891K|       |   171   (1)| 00:00:03 |
--------------------------------------------------------------------------------------------

Query Analytics (Original)


Explain Plan
-------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |       |  1199 (100)|          |
|   1 |  WINDOW SORT           |          | 94880 |  6393K|       |  1199   (1)| 00:00:15 |
|   2 |   VIEW                 |          | 94880 |  6393K|       |  1199   (1)| 00:00:15 |
|   3 |    WINDOW BUFFER       |          | 94880 |  5559K|       |  1199   (1)| 00:00:15 |
|   4 |     VIEW               |          | 94880 |  5559K|       |  1199   (1)| 00:00:15 |
|   5 |      WINDOW SORT       |          | 94880 |  3891K|  5592K|  1199   (1)| 00:00:15 |
|   6 |       TABLE ACCESS FULL| ACTIVITY | 94880 |  3891K|       |   171   (1)| 00:00:03 |
-------------------------------------------------------------------------------------------

Query Model (Original)


Explain Plan
-----------------------------------------------------------------------------------------
| Id  | Operation            | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT     |          |       |       |       |  2226 (100)|          |
|   1 |  SORT ORDER BY       |          | 94880 |  3891K|  5592K|  2226   (1)| 00:00:27 |
|   2 |   SQL MODEL ORDERED  |          | 94880 |  3891K|       |  2226   (1)| 00:00:27 |
|   3 |    WINDOW SORT       |          | 94880 |  3891K|  5592K|  2226   (1)| 00:00:27 |
|   4 |     TABLE ACCESS FULL| ACTIVITY | 94880 |  3891K|       |   171   (1)| 00:00:03 |
-----------------------------------------------------------------------------------------

Performance Analysis
Problem 1: Contiguous Ranges
Group Sizes by Depth
The output consists of all the records (76,000) and the table below gives the average group sizes, which
are written to the log by a query in the data setup program.

Depth    Group Size
D1                1
D3                2
D9                5
D27              14
D81              41
D243            125
D729            356
D2187           985

CPU Times
Depth ->           D1      D3      D9    D27    D81   D243   D729  D2187
Group Size ->       1       2       5     14     41    125    356    985
ANA            694.71  340.58  130.57  50.47  21.28  10.87   8.63   7.74
NOF              7.05    6.63    6.88   7.24   6.53   6.92   7.12   7.27
MAX              5.78    5.91    5.46   4.99   5.32   5.60   5.84   5.42
MOD              7.78    7.48    7.62   7.55   7.78   7.53   8.08   7.45

Slice Graph (Wide Slice)

Problem 2: Overlapping Ranges


Group Sizes by Depth
The output consists of all the records (96,000) and the table below gives the average group sizes, which
are written to the log by a query in the data setup program.

Depth    Group Size
D1                4
D3                6
D9                9
D27              33
D81             120
D243           2602
D729          24615
D2187         32000

CPU Times
Depth ->           D1      D3      D9    D27    D81   D243   D729  D2187
Group Size ->       4       6       9     33    120   2602  24615  32000
ANA            278.09  180.97  120.87  36.26   16.6   9.13   9.37   8.40
NOF              9.18    8.70    8.89   9.53   8.76   8.83   9.15   8.42
MAX              9.19    8.95    8.95   8.83   8.63   8.69   8.95   8.60
MOD             11.76   12.12   12.33  11.37  11.61  11.64  11.97  11.47

Slice Graph (Wide Slice)

Conclusions
Solution methods have been presented for a number of range-based SQL grouping problems, including
relatively new techniques from Oracle Database 10.1 and 11.2. It has been shown that the best method
depends not just on the size of the data set, but also on its shape. A few summary points may be made in
relation to these problems:

The Model clause tends to produce relatively simple SQL that performs consistently across data
sets

The new Recursive Subquery Factor feature can be extremely efficient in cases where the
records in the solution set are much fewer than the total, but only works for a single group

Solutions using analytic functions are slightly more efficient than model solutions where available,
but an important performance glitch in certain cases has been identified and needs to be worked
around

Explain plan costings should be treated with caution

SQL developers interested in performance need to be proficient in all three techniques (most are
familiar only with analytic functions, the oldest of the three, available from Oracle v8)

Performance testing can be more effective when executed by automated methods across multidimensional domains

