Forming Range-Based Break Groups With Advanced SQL

110515958.doc
Author: Brendan Furey
Creation Date: 12 June 2011
Version: 1.4
Last Updated: 25 September 2012
Page 1 of 49
Table of Contents
Introduction
    Hardware/Software Summary
Problem Definitions and Examples
    Problem Definitions
        Problem 1: Contiguous Ranges
        Problem 2: Overlapping Ranges
        Problem 3: Bursts of Activity
    Functional Test Data
        Activity_nov, Activity Table
        Indexes
        Test Cases
        Test Data
    Test Data Grouping Diagram
    Performance Testing Strategy
    SQL Change for Single Break Group Problems
Problem 1: Contiguous Ranges
    Analytics Solution
        How It Works
        Query Diagram
        SQL
        Inline View Diagram
        Solution Stage Table
    Model Solution
        How It Works
        Query Diagram
        SQL
    Recursive Subquery Factor Solution
        How It Works
        Query Diagram
        SQL
    Performance Analysis
        Test Data Sets
        Output Record Counts
        CPU Times
        Slice Graphs
        Explain Plans (Data Point W256-D1)
        Discussion of Results
Problem 2: Overlapping Ranges
    Analytics Solution
        How It Works
        Query Diagram
        SQL
        Inline View Diagram
        Solution Stage Table
    Model Solution
        How It Works
        Query Diagram
        SQL
    Recursive Subquery Factor Solution
        How It Works
        Query Diagram
        SQL
        Test Data Sets
        Output Record Counts
        CPU Times
        Slice Graphs
        Explain Plans (Data Point W64-D1)
Problem 3: Bursts of Activity
    Analytics Solution (None)
    Model Solution
        How It Works
        Query Diagram
        SQL
    Recursive Subquery Factoring Solution
        How It Works
        Query Diagram
        SQL
        Test Data Sets
        Output Row Counts
        CPU Times
        Slice Graphs
        Explain Plans (Data Point W128-D1)
Analytics Anomaly Analysis
    Analytic Query Variations
        Problem 1: Contiguous Ranges
        Problem 2: Overlapping Ranges
        Problem 1: Contiguous Ranges
        Problem 2: Overlapping Ranges
        CPU Times
Conclusions
References
Change Record
Date         Author  Version  Change Reference
12-Jun-2011  BPF     1.0      Initial covering 2 problems, analytic solutions only, no performance analysis
14-Jun-2011  BPF     1.1      Added test case 5, and tabulated intermediate solutions
19-Jul-2011  BPF     1.2      Restructured, adding third problem, Model and RSF solutions, and performance analysis
02-Aug-2011  BPF     1.3      Analytics anomaly analysis
25-Sep-2012  BPF     1.4      References now hyperlinks
Introduction
Records in a database often include range fields, such as a start and end time for some activity, and it is
sometimes desired to group the records by range. There are several possible ways of grouping by range:
In one case the records do not overlap, but additional breaking fields may be present; in a second case,
records may overlap, but additional breaking fields do not then make sense; in the third case considered
('bursts of activity'), only a single start field is used and break groups consist of all the records whose
range start is within a given distance from the starting point. For each problem, we consider two variations
that affect the choice of SQL: In the first, we are looking for all break groups, while in the second we want
to retrieve only a single one.
This article provides solutions for these problems, using three SQL techniques, namely: Analytic
Functions, Model Clause, and Recursive Subquery Factoring. Diagrams are used extensively to depict
query structures and help explain the solutions.
Performance analyses are included that compare performance of the three methods (only two for the third
problem) on each problem across a two-dimensional domain of size and depth. The analyses follow an
approach described in an earlier article (SQL Pivot and Prune Queries Keeping an Eye on
Performance). The results show that the best method depends on the depth of the groups, with Analytic
Functions being best for deep groups and Recursive Subquery Factoring best for shallow groups where
only a single group is required. The Model Clause performs best where an Analytic Functions solution is
not available (the bursts of activity problem) and either all groups are required or a single deep group is
required. The Model Clause also gives very stable performance across the depth range, and is surprisingly
simple in structure. The article may be of interest to developers who have yet to learn about some of these
techniques.
An important performance glitch was discovered in using the analytic function First_Value with the Ignore
Nulls option, and methods for avoiding it are presented.
This document replaces a preliminary version (Forming Range-Based Break Groups with SQL Analytic
Functions) with only analytic solutions, two problems, and no performance analysis.
Hardware/Software Summary
Component          Description
Database           Oracle Database 11g Express Edition Release 11.2.0.2.0 - Beta
Diagrammer         Microsoft Visio 2003 (11.3216.5606)
Operating System   Microsoft Windows 7 Home Premium (32 bit)
Computer           Samsung X120, 3GB memory, Intel U4100 @ 1.3GHz x 2
Problem Definitions and Examples

Problem Definitions
In this section, we define the problems generically. Consider the fields in a record set to be divided into
the following categories:
key - partition by fields
range start, range end - range fields (range end is just viewed as another attribute in problem 3)
break - break fields (where allowed)
other - any other fields
For each problem, we consider two variations that affect the choice of SQL: In the first, we are looking for
all break groups, while in the second we want to retrieve only a single one enclosing (or, starting from, for
the third problem) a particular value.
Problem 1: Contiguous Ranges
The first problem is to obtain for each record a group start, group end pair that are the range start and
range end values for the records that respectively start and end the break group of the current record. The
records are to be ordered by range start within the partitioning key, and a new break group starts when,
between successive records, either there is a gap between range end and range start fields, or any of the
break fields change value. No overlaps are allowed in the ranges within a key.
Problem 2: Overlapping Ranges
The second problem is the same as the first but with no break fields and overlapping is allowed. In other
words, groups consist of all records that overlap, counting contiguity as overlapping.
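As an illustration of the overlapping-ranges rule, the following Python sketch computes the groups for one person's records. It is a model of the rule only, not one of the SQL solutions developed below, and it assumes that dates are plain integers standing for days of June and that a None end date denotes an open-ended range. The function name overlap_groups is hypothetical. A record joins the current group when its start is no later than the greatest end date seen so far in that group, so touching ranges count as overlapping.

```python
from typing import List, Optional, Tuple

INF = float("inf")


def overlap_groups(recs: List[Tuple[int, Optional[int]]]) -> List[Tuple[int, Optional[int]]]:
    """Return (group_start, group_end) for each (start, end) record of ONE person,
    in start-date order. end=None means open-ended, and an open-ended member
    makes the whole group open-ended (group_end None)."""
    recs = sorted(recs, key=lambda r: r[0])
    groups = []  # each entry: [group_start, max_end_so_far, number_of_members]
    for start, end in recs:
        end_val = INF if end is None else end
        if groups and start <= groups[-1][1]:   # overlaps or touches the current group
            groups[-1][1] = max(groups[-1][1], end_val)
            groups[-1][2] += 1
        else:                                   # gap: start a new group
            groups.append([start, end_val, 1])
    result = []
    for g_start, g_end, count in groups:
        result.extend([(g_start, None if g_end == INF else g_end)] * count)
    return result
```

Sorting by start is what makes a single pass sufficient: once the records are in start order, a new group can only begin when a start exceeds every end date already seen in the current group. With a null end date, as in test case T4, the group becomes open-ended and every later record joins it.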
Problem 3: Bursts of Activity
The third problem is to determine the break groups using distance from the group start point, with
overlapping allowed (since the range end is here just another attribute). In other words, once a group
starts, all records that start within a fixed distance from the group start are in the group, and the first record
after the end of a group defines the next group start.
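The burst rule can be sketched in a few lines of Python for one person's records. This is an illustration only. Dates are plain integers (day of June), and the function name burst_dates is hypothetical. The sketch assumes that a record belongs to the current burst when its start is no more than the limit after the burst start (inclusive). This assumption is consistent with the 3-day burst dates in the test data for cases T3 to T5.

```python
from typing import List


def burst_dates(starts: List[int], limit: int = 3) -> List[int]:
    """Return the burst (group) start date for each record of ONE person,
    taking the records in start-date order."""
    result = []
    burst_start = None
    for start in sorted(starts):
        # The first record, or the first one beyond the limit, opens a new burst.
        if burst_start is None or start - burst_start > limit:
            burst_start = start
        result.append(burst_start)
    return result
```

Unlike the overlap rule, the end dates play no part here. In test case T5, the record starting on 15-Jun lies inside the 08-Jun to 16-Jun range of an earlier record, but it still opens a new burst because it starts more than 3 days after 08-Jun.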
Functional Test Data

The problem data structure is based on a question posed in Tom Kyte's Oracle forum (see Activities and
breaks), while the test data are my own. We will use it for all three problems, but the first problem will use
a separate table of the same structure, with indexes different from those used for the others.
Activity_nov, Activity Table
Column          Type
activity_id     Number
person_id       Number
start_date      Date
end_date        Date
activity_name   Char(10)
Indexes
Activity_nov (problem 1, indexes unique)
Index             Column
ACTIVITY_NOV_U1   person_id, start_date
ACTIVITY_NOV_U2   person_id, end_date

Activity (problems 2 and 3, indexes non-unique)
Index         Column
ACTIVITY_N1   person_id, start_date, Nvl(end_date, To_Date(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
ACTIVITY_N2   person_id, Nvl(end_date, To_Date(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')), start_date
Test Cases
There are five test cases: two for the first problem and three for the other two problems, which can use
the same data sets. Each case has its own person. The groups for the third problem are defined by a burst
size limit of 3 days. Oracle standard dates have 1 second precision, but we'll take a time component of zero
in the test data for simplicity, as this causes no loss of generality.
Test Case  Scenario
Test Cases T1 and T2 - Non-Overlapping with Additional Breaks
T1         3 records, gap, 2 records, gap, 1 record
T2         3 records, gap, 2 records (names differ), gap, 1 record with null end date
Test Cases T3, T4, T5 - Overlapping without Additional Breaks
T3         3 records (with overlaps), gap, 2 records (second enclosed by first), gap, 1 record
T4         3 records (with overlaps), gap, 3 records, second overlaps first, with null end date
T5         3 records (with overlaps), gap, 2 records (second enclosed by first), gap but not with respect
           to first, 1 record

Test Data
Test  Act  Activity  Start      End        Group      Group      Burst
Case  Id   Name      Date       Date       Start      End        Date
T1    1    LEAVE     01-Jun-11  02-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T1    2    LEAVE     02-Jun-11  04-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T1    3    LEAVE     04-Jun-11  07-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T1    4    LEAVE     08-Jun-11  09-Jun-11  08-Jun-11  14-Jun-11  08-Jun-11
T1    5    LEAVE     09-Jun-11  14-Jun-11  08-Jun-11  14-Jun-11  08-Jun-11
T1    6    LEAVE     20-Jun-11  30-Jun-11  20-Jun-11  30-Jun-11  20-Jun-11
T2    7    LEAVE     01-Jun-11  02-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T2    8    LEAVE     02-Jun-11  04-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T2    9    LEAVE     04-Jun-11  07-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T2    10   LEAVE     08-Jun-11  09-Jun-11  08-Jun-11  09-Jun-11  08-Jun-11
T2    11   TRAINING  09-Jun-11  14-Jun-11  09-Jun-11  14-Jun-11  08-Jun-11
T2    12   TRAINING  20-Jun-11             20-Jun-11             20-Jun-11
T3    13   LEAVE     01-Jun-11  03-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T3    14   LEAVE     02-Jun-11  05-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T3    15   LEAVE     04-Jun-11  07-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T3    16   LEAVE     08-Jun-11  16-Jun-11  08-Jun-11  16-Jun-11  08-Jun-11
T3    17   TRAINING  09-Jun-11  14-Jun-11  08-Jun-11  16-Jun-11  08-Jun-11
T3    18   TRAINING  20-Jun-11  30-Jun-11  20-Jun-11  30-Jun-11  20-Jun-11
T4    19   LEAVE     01-Jun-11  03-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T4    20   LEAVE     02-Jun-11  05-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T4    21   LEAVE     04-Jun-11  07-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T4    22   LEAVE     08-Jun-11  16-Jun-11  08-Jun-11             08-Jun-11
T4    23   TRAINING  09-Jun-11             08-Jun-11             08-Jun-11
T4    24   TRAINING  20-Jun-11  30-Jun-11  08-Jun-11             20-Jun-11
T5    25   LEAVE     01-Jun-11  03-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T5    26   LEAVE     02-Jun-11  05-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T5    27   LEAVE     04-Jun-11  07-Jun-11  01-Jun-11  07-Jun-11  01-Jun-11
T5    28   LEAVE     08-Jun-11  16-Jun-11  08-Jun-11  30-Jun-11  08-Jun-11
T5    29   TRAINING  09-Jun-11  14-Jun-11  08-Jun-11  30-Jun-11  08-Jun-11
T5    30   TRAINING  15-Jun-11  30-Jun-11  08-Jun-11  30-Jun-11  15-Jun-11
Test Data Grouping Diagram

The red and yellow boxes in the diagram show the required groupings. The numeric column headers are
the days of the month of June.
Performance Testing Strategy

In SQL Pivot and Prune Queries Keeping an Eye on Performance we applied an approach to
performance testing of SQL queries whereby the queries are tested across a 2-dimensional domain, using
a testing framework developed for that work. The same approach has been followed here, using the same
framework (note that minor changes to the PL/SQL package and tables were made for this article, such as
excluding file writing times from the recorded times). Further details can be found in the referenced article,
from which the following description is extracted:
In order to provide a realistic scenario, the queries are executed within the context of a simple outbound
interface that writes each record to a file as a comma-separated string. A small PL/SQL package has been
written to automate the testing. The program loops over width and depth dimensions, and for each data
set point makes a call to a separate package to set up the test data and have the CBO statistics gathered;
it then loops over a set of queries defined in the same separate package as strings that are executed by
the main package.
The execution plan is obtained in each case, using an Oracle API, and is written to the generic log. The
query string includes a random number that guarantees a hard-parse and thus recalculation of the
execution plans at each data set point.
For this work, width was taken to correspond to the total number of records, while depth was taken to
correspond to group size. The definitions of the test data vary by problem and are described separately
later.
SQL Change for Single Break Group Problems

One of the solution techniques is only applicable to the form of problem where a single break group is
required, and so for consistency that is the form used for all solutions in performance testing. It is worth
noting that the other two solution techniques solve this form by obtaining all groups within an inline view,
then applying a restriction outside the view. This means the timings for these should be very similar to
what would be obtained for finding all groups. The change required looks like this:
SELECT *
  FROM (SQL for all groups, minus its ORDER BY)
 WHERE To_Date (root_date, 'DD-MON-YYYY HH24:MI:SS') BETWEEN group_start AND group_end
 ORDER BY (the ORDER BY removed from the inner SQL)
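The effect of this wrapper can be modelled in Python as a filter over the output for all groups. This is a sketch only: the row layout and the function name single_group are assumptions, and dates are plain integers. The sketch also mirrors one SQL detail worth bearing in mind. When group_end is null (an open-ended group), the BETWEEN predicate does not evaluate to true, so rows in such a group are not returned.

```python
from typing import List, Optional, Tuple

# (person_id, start_date, end_date, group_start, group_end)
Row = Tuple[int, int, Optional[int], int, Optional[int]]


def single_group(rows: List[Row], root: int) -> List[Row]:
    """Keep the rows whose group contains the root date, ordered by person and start."""
    kept = [r for r in rows
            # BETWEEN with a NULL upper bound is never true, so open-ended groups drop out
            if r[4] is not None and r[3] <= root <= r[4]]
    return sorted(kept, key=lambda r: (r[0], r[1]))
```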
Problem 1: Contiguous Ranges
Analytics Solution
How It Works
The first solution for this problem uses analytic functions (see Oracle Database SQL Language
Reference 11g Release 2 (11.2)), partitioned by person and ordered by start date in a two-level query
structure.
1. Within an inline view, use Lag and Lead functions with CASE expressions to set group start and
group end dates on the respective start and end records of the groups, leaving other values null.
2. Select all the original fields from the inline view, as well as the new fields wrapped in Last_Value (for
the group start) and First_Value (for the group end) functions with the IGNORE NULLS option.
3. The output from step 2 obtains all groups, and if necessary, can be used within another inline view
to restrict the output to certain groups only (e.g. a 'current' group)
The query diagram, SQL and functional testing use the form for obtaining all groups, while the
performance testing uses the form for obtaining a single group, for consistency with the third solution
method.
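The two query levels can be traced in Python for one person's records. Lag and Lead become neighbour lookups, and the IGNORE NULLS windows become a forward fill and a backward fill. This is a sketch under stated assumptions. Dates are plain integers (day of June). Records are unique on start date and already sorted, as the unique index on (person_id, start_date) allows. The function name analytic_groups is hypothetical. Null handling follows the SQL: a null previous end date starts a new group through Nvl, and a null end date leaves group_end null.

```python
from typing import List, Optional, Tuple

# (start_date, end_date, activity_name) for ONE person, sorted by start_date
Act = Tuple[int, Optional[int], str]


def analytic_groups(acts: List[Act]) -> List[Tuple[Optional[int], Optional[int]]]:
    n = len(acts)
    gs: List[Optional[int]] = [None] * n   # inline-view group_start (mostly null)
    ge: List[Optional[int]] = [None] * n   # inline-view group_end (mostly null)

    # Level 1: flag group boundaries using the neighbouring rows (Lag / Lead).
    for i, (start, end, name) in enumerate(acts):
        prev_end = acts[i - 1][1] if i > 0 else None
        # Nvl(Lag(end_date), start_date-1): no previous row, or a null previous end, breaks
        gap_before = start > (prev_end if prev_end is not None else start - 1)
        if gap_before or (i > 0 and name != acts[i - 1][2]):
            gs[i] = start
        if end is not None:   # a null end_date leaves group_end null whatever the CASE yields
            next_start = acts[i + 1][0] if i < n - 1 else end + 1
            if next_start > end or (i < n - 1 and name != acts[i + 1][2]):
                ge[i] = end

    # Level 2: Last_Value(... IGNORE NULLS) up to the current row = forward fill.
    starts: List[Optional[int]] = []
    last = None
    for v in gs:
        last = v if v is not None else last
        starts.append(last)

    # First_Value(... IGNORE NULLS) from the current row onwards = backward fill.
    ends: List[Optional[int]] = [None] * n
    nxt = None
    for i in range(n - 1, -1, -1):
        nxt = ge[i] if ge[i] is not None else nxt
        ends[i] = nxt
    return list(zip(starts, ends))
```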
Query Diagram
Notes
The diagram notation follows and extends notation developed earlier, including in SQL Pivot and Prune
Queries Keeping an Eye on Performance. The key can be referred to for subsequent diagrams.
SQL
SELECT /* NO_OVERLAP */
       person_id, start_date, end_date, activity_name, activity_id id,
       Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date) group_start,
       First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date
           RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       CASE WHEN (start_date > Nvl (Lag (end_date) OVER (PARTITION BY person_id ORDER BY start_date), start_date-1)) OR
                 (activity_name != Lag (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
            THEN start_date END group_start,
       CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date), end_date+1) > end_date) OR
                 (activity_name != Lead (activity_name) OVER (PARTITION BY person_id ORDER BY start_date))
            THEN end_date END group_end
  FROM activity_nov
       )
 ORDER BY person_id, start_date
Inline View Diagram

The diagram below attempts to show how the inline view obtains the group start and end dates. The start
points of the red arrows indicate the records that have group start dates set in the inline view (1, 4, 6 for
T1; 7, 10, 11, 12 for T2); the end points indicate those that have group end dates set (3, 5, 6 for T1; 9, 10,
11, 12 for T2). Since all other group values are null, the outer query can set the correct values by looking
for the last non-null value up to and including the current record, for the group start date, and the first
non-null value from the current record onwards, for the group end date.
Solution Stage Table

The table below shows how the solution proceeds in stages, from the record level through the level 1
inline view to the final solution.
Case  Act  Activity  Record Level          Level 1 View          Solution
      Id   Name      Start      End        Start      End        Start      End
T1    1    LEAVE     01-Jun-11  02-Jun-11  01-Jun-11             01-Jun-11  07-Jun-11
T1    2    LEAVE     02-Jun-11  04-Jun-11                        01-Jun-11  07-Jun-11
T1    3    LEAVE     04-Jun-11  07-Jun-11             07-Jun-11  01-Jun-11  07-Jun-11
T1    4    LEAVE     08-Jun-11  09-Jun-11  08-Jun-11             08-Jun-11  14-Jun-11
T1    5    LEAVE     09-Jun-11  14-Jun-11             14-Jun-11  08-Jun-11  14-Jun-11
T1    6    LEAVE     20-Jun-11  30-Jun-11  20-Jun-11  30-Jun-11  20-Jun-11  30-Jun-11
T2    7    LEAVE     01-Jun-11  02-Jun-11  01-Jun-11             01-Jun-11  07-Jun-11
T2    8    LEAVE     02-Jun-11  04-Jun-11                        01-Jun-11  07-Jun-11
T2    9    LEAVE     04-Jun-11  07-Jun-11             07-Jun-11  01-Jun-11  07-Jun-11
T2    10   LEAVE     08-Jun-11  09-Jun-11  08-Jun-11  09-Jun-11  08-Jun-11  09-Jun-11
T2    11   TRAINING  09-Jun-11  14-Jun-11  09-Jun-11  14-Jun-11  09-Jun-11  14-Jun-11
T2    12   TRAINING  20-Jun-11             20-Jun-11             20-Jun-11
Model Solution
How It Works
The key to solving this problem using Oracle's Model clause (Oracle Database SQL Language
Reference 11g Release 2 (11.2)) is to realise that the solution can be represented as simple inductions,
forward for the group start dates, then backward for the group end dates. If a, s, e, S, E are the current
activity, start date, end date, group start date and group end date, and (pa, ps, pe, pS, pE) and (na, ns, ne,
nS, nE) are the corresponding prior and next values, then (using C-like notation for brevity):
Initial:   S = s;
Later:     S = (a != pa || s > pe) ? s : pS;
Final:     E = e;
Earlier:   E = (nS > S) ? e : nE;
These inductions can easily be implemented as rules within the Model clause:
1. Form the basic Select, with all the table columns required, and append placeholders group_start
and group_end.
2. Add the Model keyword, partitioning by person, dimensioning by the analytic function Row_Number,
ordering by start date within person, and with the remaining columns as measures.
3. Initialise group start and end to start and end dates in the measures clause.
4. Define the first rule to obtain the group start date for all rows after the first as the previous group
start date, unless there is a gap or the activity changes, relative to the previous record, in which
case take the new start date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group end date for all rows as the next group end date,
unless the next group start date is greater than the current one, or there is no next (i.e. at the last
row), in which case take the current end date. This rule must be processed in descending row
order, and this is specified as it is not the default.
6. The output from the above obtains all groups, but if necessary, can be used within an inline view
to restrict the output to certain groups only.
The query diagram, SQL and functional testing use the form for obtaining all groups, while the
performance testing uses the form for obtaining a single group, for consistency with the third solution
method.
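The two inductions can be checked outside the database with a short Python sketch that applies the rules as the Model clause does: an ascending pass for the group start, followed by a descending pass for the group end. The sketch models the logic only. It assumes integer dates (day of June), one person's records sorted by start date, and a hypothetical function name, model_groups. It follows the SQL rules in treating a comparison with a null end date as not true.

```python
from typing import List, Optional, Tuple

# (start_date, end_date, activity_name) for ONE person, sorted by start_date
Act = Tuple[int, Optional[int], str]


def model_groups(acts: List[Act]) -> List[Tuple[int, Optional[int]]]:
    n = len(acts)
    S = [a[0] for a in acts]   # measure group_start, initialised to start_date
    E = [a[1] for a in acts]   # measure group_end, initialised to end_date

    # Rule 1, ascending (the default): S = (a != pa || s > pe) ? s : pS
    for i in range(1, n):
        s, _, a = acts[i]
        _, pe, pa = acts[i - 1]
        gap = pe is not None and s > pe   # a comparison with a null is not true in SQL
        S[i] = s if (a != pa or gap) else S[i - 1]

    # Rule 2, descending (ORDER BY rn DESC): E = (nS > S) ? e : nE.
    # The last row has no next cell (PRESENTV false), so it keeps E = e.
    for i in range(n - 2, -1, -1):
        E[i] = acts[i][1] if S[i + 1] > S[i] else E[i + 1]

    return list(zip(S, E))
```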
Query Diagram
Notes
Queries with the Model clause have a structure that is rather different from other queries, and the
diagrams attempt to reflect that structure for these problems. The main query feeds its output into an array
processing component with a set of rules that specify, in a mostly declarative fashion, how data items
(called measures, here including additional ones) are to be calculated.
The model box above contains 4 specification types:
Partition - processing is to be performed separately by one or more columns; the same
meaning as in analytic functions
Dimension - columns by which the array is dimensioned; can include analytic functions, as here
Measures - remaining columns that may be calculated or updated by the rules, possibly
including placeholders from the main query
Rules - a set of rules that specify measure calculation; rules are processed
sequentially, unless otherwise specified; in the diagram:
    n - the current dimension value, here row number ordered by start date
    - maximum dimension value
    f(n-1,n) (and so on) - denotes that the value depends on values from previous and current rows
    ^ - denotes that the calculation progresses in ascending order by dimension;
    this is the default so does not have to be coded
    v - denotes that the calculation progresses in descending order by dimension;
    this is not the default so does have to be coded
SQL
SELECT /* MOD_OVL */ person_id, start_date, end_date, activity_name, activity_id, group_start, group_end
  FROM activity_nov
 MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, end_date, activity_name, activity_id, start_date group_start, end_date group_end)
    RULES (
       group_start[rn > 1] =
          CASE WHEN start_date[cv()] > end_date[cv()-1] OR activity_name[cv()] != activity_name[cv()-1]
               THEN start_date[cv()] ELSE group_start[cv()-1] END,
       group_end[ANY] ORDER BY rn DESC =
          PRESENTV (group_start[cv()+1],
                    CASE WHEN group_start[cv()] < group_start[cv()+1] THEN group_end[cv()] ELSE group_end[cv()+1] END,
                    end_date[cv()])
    )
 ORDER BY 1, 2
Recursive Subquery Factor Solution

How It Works
This approach is based on new Oracle SQL functionality, available from Oracle Database v11.2 onwards,
called Recursive Subquery Factor (RSF) (Oracle Database SQL Language Reference 11g Release 2
(11.2)).
1. Define a recursive subquery factor.
2. The anchor branch of the RSF selects the records defined by the start point. A direction column is
defined that here is set to 'E' for Either, meaning extend in either direction in the recursive
branch.
3. The recursive branch extends the record set by joining records that link to the extreme (first or last)
parent records and that push the envelope outwards. The direction column is set to 'F' or 'B' according
to the direction of extension (Forward or Backward).
4. Select all records from the RSF, applying analytic Min and Max to get the group start and end dates.
The idea here is that for cases where the group is small this will avoid expensive processing of the entire
record set. We'll demonstrate this saving in our performance analysis section. This solution only applies to
the form of problem where a single group is required.
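The recursion can be mimicked in Python for one person's records. Each iteration joins only to the rows added by the previous iteration, as the recursive branch of an RSF does. This is an illustrative model, not the SQL. It assumes integer dates (day of June), and the function name rsf_group is hypothetical. The inner scan over all records stands in for the indexed lookups that the SQL presumably relies on (the unique indexes on (person_id, start_date) and (person_id, end_date)), so only the number of recursion levels, not the cost of each join, reflects the SQL behaviour.

```python
from typing import List, Optional, Tuple

# (start_date, end_date, activity_name) for ONE person
Act = Tuple[int, Optional[int], str]


def rsf_group(acts: List[Act], root: int) -> Optional[Tuple[int, Optional[int]]]:
    # Anchor branch: the record containing the root date, direction 'E' (either way)
    level = [(a, "E") for a in acts
             if a[0] <= root and (a[1] if a[1] is not None else root + 1) > root]
    found = list(level)

    # Recursive branch: each iteration joins only to the rows added by the previous one
    while level:
        added = []
        for (start, end, name), direction in level:
            for act in acts:
                if act[2] != name:            # break field must match
                    continue
                if direction in ("E", "F") and act[0] == end:
                    added.append((act, "F"))  # extends the group forward
                elif direction in ("E", "B") and act[1] == start:
                    added.append((act, "B"))  # extends the group backward
        found.extend(added)
        level = added

    if not found:
        return None
    # Final query: Min(start_date) and Max(end_date), both of which ignore nulls
    ends = [a[1] for a, _ in found if a[1] is not None]
    return min(a[0] for a, _ in found), (max(ends) if ends else None)
```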
Query Diagram
Notes
Queries with a recursive subquery factor have a special structure, and the diagrams attempt to reflect that
structure for these problems. The recursive factor is a subquery having a Union All structure in which there
are two branches:
Anchor Branch - this is a normal query from which the recursion begins
Recursive Branch - this is a query that references the recursive factor itself by alias
Notice the use of subtypes in the diagram: records in the recursive branch can be split into back and
front subtypes.
SQL
WITH rsq (person_id, start_date, end_date, activity_name, activity_id, direction) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id, 'E' direction
  FROM activity_nov
 WHERE start_date <= To_Date ('&TODAY', 'DD-MON-YYYY')
   AND Nvl (end_date, To_Date ('&TODAY', 'DD-MON-YYYY') + 1) > To_Date ('&TODAY', 'DD-MON-YYYY')
 UNION ALL
SELECT act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id,
       CASE WHEN act.start_date = rsq.end_date THEN 'F' ELSE 'B' END
  FROM rsq
  JOIN activity_nov act
    ON ((act.start_date = rsq.end_date AND rsq.direction IN ('E', 'F')) OR
        (act.end_date = rsq.start_date AND rsq.direction IN ('E', 'B')))
   AND act.person_id = rsq.person_id
   AND act.activity_name = rsq.activity_name
)
SELECT /* RSQ_NON '&TODAY' */ person_id, start_date, end_date, activity_name, activity_id,
       Min (start_date) OVER (PARTITION BY person_id) grp_start,
       Max (end_date) OVER (PARTITION BY person_id) grp_end
  FROM rsq
Performance Analysis
Test Data Sets
For the performance analysis it is simpler to generate test data using a single activity, with groups
determined only by the dates. If w and d are the numeric width and depth points, records are generated for
three persons as follows:
Let random(d) be a random integer between 1 and d (generated afresh on each access)
Start date = '01-JAN-1900'
Record limit = 3 * 100 * w
Loop while number of records <= record limit
    Add group of records for person 1, with group size = random(d), as follows:
        First start date = last start date + random(d)
        Subsequent start date = previous start date + 1
        End date = start date + 1
        Exit if record limit reached
    Repeat for persons 2 and 3
End loop
Store the root date as the mid point of the first group of records generated
This generation process ensures that the size of the record set is proportional to the width point, while the
groups are of random sizes but within a scale determined by the depth point. The width and depth points,
together with the (randomized) size of the root group, are shown in the next section.
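A Python sketch of the generation procedure above may help to pin down the loop structure. Several details are assumptions and do not describe the author's PL/SQL package. Each person's last start date is tracked separately from 01-JAN-1900. Generation stops exactly at the record limit. The root date is taken as the midpoint between the first group's first start and its last end. The function name generate and its seed parameter are hypothetical.

```python
import random
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


def generate(w: int, d: int, seed: int = 0) -> Tuple[List[Tuple[int, datetime, datetime]], Optional[datetime]]:
    rng = random.Random(seed)

    def rand() -> int:
        return rng.randint(1, d)   # random(d): a fresh integer in 1..d on each call

    limit = 3 * 100 * w
    # Assumption: each person's "last start date" is tracked separately from 01-JAN-1900
    last_start = {p: datetime(1900, 1, 1) for p in (1, 2, 3)}
    rows: List[Tuple[int, datetime, datetime]] = []
    root = None
    while len(rows) < limit:
        for person in (1, 2, 3):
            first = start = last_start[person] + timedelta(days=rand())
            for k in range(rand()):              # group size = random(d)
                if k > 0:
                    start += timedelta(days=1)   # subsequent start = previous start + 1
                rows.append((person, start, start + timedelta(days=1)))
                if len(rows) >= limit:
                    break                        # exit if record limit reached
            last_start[person] = start
            if root is None:                     # mid point of the first group generated
                root = first + (start + timedelta(days=1) - first) / 2
            if len(rows) >= limit:
                break
    return rows, root
```

Because the gap before each group is itself random(d), a gap of 1 day makes the new group contiguous with the previous one for that person. The generated groups can therefore merge, which is one reason why the root group sizes in the next section vary so much for a given depth.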
Output Record Counts
The output consists of all the records in the root group, which is defined as the group containing the root
date and so has at least one record. Of course, each solution method operates on the same data set, and
so the number of records written to file is always the same for all three methods (which was checked).
Depth/Width         W1      W2      W4      W8     W16     W32     W64    W128    W256
Total records      300     600    1200    2400    4800    9600   19200   38400   76800
D1                   1       1       1       1       1       1       1       1       1
D3                   3       1       2       3       2       1       2       1       3
D9                   5       8       3       5       7       8       8       2       2
D27                  8      16       8      26      10       2      13       2       9
D81                 80      11      25      55      26      72      67      41      42
D243                 6     219     135     196      49     131      90     132     168
D729               300      93      75     290     134      68     547     627     446
D2187              300     600    1196    1501     972    1300     346     437    1265
D6561              300     600     717    2400    4330    3737    4243    1331    4103
CPU Times
Analytics
Depth/Width         W1      W2      W4      W8     W16     W32     W64    W128    W256
D1                0.02    0.05    0.17    0.64    2.42    9.73    38.4     147   604.9
D3                0.01    0.03    0.10    0.33    1.27    4.87    19.2    72.8   298.0
D9                0.02    0.01    0.05    0.16    0.50    1.85    7.24    28.4   113.0
D27               0.02    0.03    0.03    0.08    0.22    0.71    2.62   10.18   39.16
D81               0.00    0.01    0.03    0.04    0.12    0.32    1.00    3.81   14.23
D243              0.02    0.04    0.03    0.06    0.09    0.22    0.57    1.55    5.27
D729              0.02    0.03    0.03    0.05    0.10    0.17    0.41    0.99    2.64
D2187             0.03    0.07    0.10    0.18    0.15    0.25    0.34    0.68    1.68
D6561             0.02    0.06    0.08    0.14    0.33    0.31    0.51    0.67    1.42
Notes
The graph generated with Microsoft Excel 2007 may be slightly misleading as the pale blue peak
does not appear to reach 605.
Performance for a given width improves dramatically with depth
Model
Depth/Width         W1      W2      W4      W8     W16     W32     W64    W128    W256
D1                0.03    0.03    0.06    0.10    0.19    0.38    0.74    1.43    2.98
D3                0.03    0.03    0.06    0.11    0.19    0.37    0.75    1.53    3.01
D9                0.02    0.03    0.07    0.11    0.20    0.37    0.77    1.52    2.99
D27               0.01    0.03    0.06    0.10    0.17    0.37    0.74    1.51    2.99
D81               0.05    0.03    0.06    0.11    0.21    0.39    0.73    1.50    3.00
D243              0.02    0.05    0.07    0.11    0.18    0.38    0.75    1.54    3.06
D729              0.04    0.03    0.04    0.12    0.20    0.39    0.77    1.51    3.02
D2187             0.04    0.08    0.12    0.19    0.22    0.47    0.79    1.50    3.09
D6561             0.03    0.06    0.12    0.19    0.44    0.56    0.95    1.60    3.18
Notes
Performance for a given width is essentially independent of depth
Recursive Subquery Factor
Depth/Width         W1      W2      W4      W8     W16     W32     W64    W128    W256
D1                0.02    0.02    0.02    0.01    0.02    0.01    0.01    0.03    0.05
D3                0.01    0.02    0.00    0.01    0.01    0.02    0.02    0.02    0.09
D9                0.01    0.02    0.02    0.01    0.03    0.05    0.06    0.05    0.07
D27               0.02    0.02    0.01    0.03    0.04    0.01    0.08    0.05    0.25
D81               0.03    0.02    0.02    0.08    0.06    0.27    0.44    0.53    1.05
D243              0.01    0.08    0.09    0.19    0.11    0.42    0.60    1.61    4.33
D729              0.11    0.03    0.03    0.29    0.16    0.19    2.82    7.95   12.02
D2187             0.14    0.42    1.45    2.26    1.02    3.01    2.15    4.51   33.49
D6561             0.12    0.40    0.54    5.55   17.89    13.2    27.0    17.8   76.35
Notes
Performance for a given width worsens dramatically with depth
Slice Graphs
Wide Slice
Deep Slice
Explain Plans (Data Point W256-D1)

Analytics
---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |       |       |     6 (100)|          |
|   1 |  SORT ORDER BY         |              |    12 |   828 |     6  (50)| 00:00:01 |
|*  2 |   VIEW                 |              |    12 |   828 |     5  (40)| 00:00:01 |
|   3 |    WINDOW SORT         |              |    12 |   828 |     5  (40)| 00:00:01 |
|   4 |     VIEW               |              |    12 |   828 |     4  (25)| 00:00:01 |
|   5 |      WINDOW SORT       |              |    12 |   336 |     4  (25)| 00:00:01 |
|   6 |       TABLE ACCESS FULL| ACTIVITY_NOV |    12 |   336 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter(("GROUP_START"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "GROUP_END">=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss')))
Model
---------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes | Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |       |       |     5 (100)|          |
|   1 |  SORT ORDER BY         |              |    12 |   828 |     5  (40)| 00:00:01 |
|*  2 |   VIEW                 |              |    12 |   828 |     4  (25)| 00:00:01 |
|   3 |    SQL MODEL ORDERED   |              |    12 |   336 |     4  (25)| 00:00:01 |
|   4 |     WINDOW SORT        |              |    12 |   336 |     4  (25)| 00:00:01 |
|   5 |      TABLE ACCESS FULL | ACTIVITY_NOV |    12 |   336 |     3   (0)| 00:00:01 |
---------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
              hh24:mi:ss')))

Recursive Subquery Factor
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                                  | Name            | Rows  | Bytes | Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                           |                 |       |       |     6 (100)|          |
|   1 |  WINDOW SORT                               |                 |     2 |   102 |     6   (0)| 00:00:01 |
|   2 |   VIEW                                     |                 |     2 |   102 |     6   (0)| 00:00:01 |
|   3 |    UNION ALL (RECURSIVE WITH) BREADTH FIRST|                 |       |       |            |          |
|*  4 |     TABLE ACCESS BY INDEX ROWID            | ACTIVITY_NOV    |     1 |    28 |     2   (0)| 00:00:01 |
|*  5 |      INDEX SKIP SCAN                       | ACTIVITY_NOV_U1 |     1 |       |     1   (0)| 00:00:01 |
|   6 |     NESTED LOOPS                           |                 |       |       |            |          |
|   7 |      NESTED LOOPS                          |                 |     1 |    69 |     4   (0)| 00:00:01 |
|   8 |       RECURSIVE WITH PUMP                  |                 |       |       |            |          |
|*  9 |       INDEX RANGE SCAN                     | ACTIVITY_NOV_U1 |     6 |       |     1   (0)| 00:00:01 |
|* 10 |      TABLE ACCESS BY INDEX ROWID           | ACTIVITY_NOV    |     1 |    28 |     2   (0)| 00:00:01 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("END_DATE">TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   5 - access("START_DATE"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter("START_DATE"<=TO_DATE(' 1900-01-02 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   9 - access("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
  10 - filter(((("ACT"."START_DATE"="RSQ"."END_DATE" AND INTERNAL_FUNCTION("DIRECTION")) OR
              ("ACT"."END_DATE"="RSQ"."START_DATE" AND "ACT"."END_DATE" IS NOT NULL AND
              INTERNAL_FUNCTION("DIRECTION"))) AND "ACT"."ACTIVITY_NAME"="RSQ"."ACTIVITY_NAME"))
Discussion of Results
The best method for deep data sets is Analytics
The best method for shallow data sets is Recursive Subquery Factor
The Model method is independent of depth and performs in the wide slice at a level between the
two other methods, except for one intermediate data point where it is better than both

Problem 2: Overlapping Ranges
Analytics Solution
How It Works
The solution for the second problem is derived from that for the first, but without the additional break
checking, and with an extra starting step to obtain a running end date that is the largest end date up to
the current record, ordered by start date. The running end date then replaces the end date in the next
step. The query thus has one more level.
0. Within an inline view, use Max to set a running end date on each record, converting null end dates
to a large value
1. Within an inline view, select all the original fields from the level-0 inline view, and use Lag and
Lead functions with CASE expressions to set group start and group end dates on the respective
start and running end dates of the break groups, leaving other values null.
2. Select all the original fields from the inline view, as well as the new fields within First_Value,
Last_Value functions with the IGNORE NULLS option, and convert back any large values to null
3. The output from step 2 solves the problems as defined, but if necessary, can be used within
another inline view to restrict the output to certain groups only (e.g. a 'current' group)
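A small Python sketch can trace levels 0 to 2 for one person's records outside the database. It is illustrative only and not the author's SQL. The name overlap_groups and the (start, end) tuple layout are hypothetical, FAR_FUTURE plays the role of '01-JAN-3000', and ties on start date are not treated specially.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # stand-in for a null end date, as in the SQL

def overlap_groups(rows):
    """rows: (start_date, end_date or None) pairs for ONE person.
    Returns (start, end, group_start, group_end) tuples in start-date order."""
    rows = sorted(rows, key=lambda r: r[0])
    n = len(rows)
    # Level 0: running maximum end date (Max ... OVER ORDER BY start_date), nulls -> FAR_FUTURE
    running, latest = [], None
    for _, end in rows:
        e = end if end is not None else FAR_FUTURE
        latest = e if latest is None or e > latest else latest
        running.append(latest)
    # Level 1: Lag/Lead tests flag group starts and group ends; other rows stay None
    gs = [s if i == 0 or s > running[i - 1] else None for i, (s, _) in enumerate(rows)]
    ge = [running[i] if i == n - 1 or rows[i + 1][0] > running[i] else None for i in range(n)]
    # Level 2: Last_Value(IGNORE NULLS) looking back, First_Value(IGNORE NULLS) looking forward
    group_start, cur = [], None
    for g in gs:
        cur = g if g is not None else cur
        group_start.append(cur)
    group_end, cur = [None] * n, None
    for i in range(n - 1, -1, -1):
        cur = ge[i] if ge[i] is not None else cur
        group_end[i] = None if cur == FAR_FUTURE else cur  # convert the large value back to null
    return [(s, e, group_start[i], group_end[i]) for i, (s, e) in enumerate(rows)]
```

Run against records 19 to 24 from the solution stage table, it gives a group of 01-Jun-11 to 07-Jun-11 for the first three records. The last three get a group start of 08-Jun-11 and a null group end.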
Query Diagram
SQL
SELECT /* OVERLAP */ person_id, start_date, end_date, activity_name, activity_id,
       Last_Value (group_start IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date) group_start,
       CASE First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date RANGE
            BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE
       First_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date RANGE
            BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) END group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       CASE WHEN (start_date > Nvl (Lag (running_end) OVER (PARTITION BY person_id ORDER BY start_date),
                                    start_date-1)) THEN start_date END group_start,
       CASE WHEN (Nvl (Lead (start_date) OVER (PARTITION BY person_id ORDER BY start_date),
                       running_end+1) > running_end) THEN running_end END group_end
  FROM (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       Max (Nvl (end_date, '01-JAN-3000')) OVER (PARTITION BY person_id ORDER BY start_date) running_end
  FROM activity
 WHERE person_id IN (3, 4)
       )
       )
Inline View Diagram

The diagram below shows how the level-0 inline view obtains the running end dates, which are denoted by
the end points of the red arrows. The red extension blocks denote records where the running end date is
greater than the current end date (17 and 24). Now, although we still have overlaps, the solution for the
first problem will work because the running end date ensures the current record always has the latest end date for
the current group. Thus in record 17 below, we won't wrongly set the group end date to 14 on seeing the
gap to record 18, and had record 18 started at 15, we would have correctly assigned it to G2, not left it in
G3 (I have added a test case T5 for that, but not included it in the diagram).
Solution Stage Table

The table below shows how the solution proceeds in stages from level 0 through level 1 to the final
solution. Blank lines separate the records of different persons.
Act  Activity   Start      End        Running end  Level 1     Level 1     Solution    Solution
id   name       date       date       (level 0)    start date  end date    start date  end date
13   LEAVE      01-Jun-11  03-Jun-11  03-Jun-11    01-Jun-11               01-Jun-11   07-Jun-11
14   LEAVE      02-Jun-11  05-Jun-11  05-Jun-11                            01-Jun-11   07-Jun-11
15   LEAVE      04-Jun-11  07-Jun-11  07-Jun-11                07-Jun-11   01-Jun-11   07-Jun-11
16   LEAVE      08-Jun-11  16-Jun-11  16-Jun-11    08-Jun-11               08-Jun-11   16-Jun-11
17   TRAINING   09-Jun-11  14-Jun-11  16-Jun-11                16-Jun-11   08-Jun-11   16-Jun-11
18   TRAINING   20-Jun-11  30-Jun-11  30-Jun-11    20-Jun-11   30-Jun-11   20-Jun-11   30-Jun-11

19   LEAVE      01-Jun-11  03-Jun-11  03-Jun-11    01-Jun-11               01-Jun-11   07-Jun-11
20   LEAVE      02-Jun-11  05-Jun-11  05-Jun-11                            01-Jun-11   07-Jun-11
21   LEAVE      04-Jun-11  07-Jun-11  07-Jun-11                07-Jun-11   01-Jun-11   07-Jun-11
22   LEAVE      08-Jun-11  16-Jun-11  16-Jun-11    08-Jun-11               08-Jun-11
23   TRAINING   09-Jun-11             01-Jan-00                            08-Jun-11
24   TRAINING   20-Jun-11  30-Jun-11  01-Jan-00                01-Jan-00   08-Jun-11

25   LEAVE      01-Jun-11  03-Jun-11  03-Jun-11    01-Jun-11               01-Jun-11   07-Jun-11
26   LEAVE      02-Jun-11  05-Jun-11  05-Jun-11                            01-Jun-11   07-Jun-11
27   LEAVE      04-Jun-11  07-Jun-11  07-Jun-11                07-Jun-11   01-Jun-11   07-Jun-11
28   LEAVE      08-Jun-11  16-Jun-11  16-Jun-11    08-Jun-11               08-Jun-11   30-Jun-11
29   TRAINING   09-Jun-11  14-Jun-11  16-Jun-11                            08-Jun-11   30-Jun-11
30   TRAINING   15-Jun-11  30-Jun-11  30-Jun-11                30-Jun-11   08-Jun-11   30-Jun-11
Model Solution
How It Works
The key to solving this problem using Oracle's Model clause is to realise that the solution can be
represented as three simple inductions. If s, e, S, E are the current start date, end date, group start date
and group end date, and (ps, pe, pS, pE) and (ns, ne, nS, nE) are the prior and next values, ordering by
start date, then (using C-like terminology for brevity):
Initial, E = e; later, E = (e > pE) ? e : pE        -- this gets the running latest end dates
Initial, S = s; later, S = (s > pE) ? s : pS        -- this gets group start dates
Final, E keeps its running value; earlier, E = (S < nS) ? E : nE    -- this gets group end dates
and group_end
3. Initialise group start and end dates to start and end dates in the measures clause
4. Define the first rule to obtain a running latest end date for all rows after the first as the previous
running end date, unless the current end date is greater than the previous running end date, in
which case take the current end date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group start date for all rows after the first as the previous group
start date, unless the start date is greater than the previous running latest end date, in which case
take the current start date. This rule will be processed in the default ascending row order.
6. Define the third rule to obtain the group end date for all rows before the last as the next group end
date, unless the group start date is less than the next group start date (so that the current row is the
last in its group), in which case keep the current running latest end date. This rule must be processed
in descending row order, and this is specified as it is not the default.
The query diagram, SQL and functional testing use the form for obtaining all break groups, while the
performance testing uses the form for obtaining a single break group, for consistency with the third
solution method.
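The three inductions can be mimicked in Python by applying each rule to every row before moving to the next, which is how a sequentially ordered RULES clause behaves. This is a sketch rather than the author's code. The names model_overlap and FAR_FUTURE are hypothetical, and the input is one person's rows.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # stand-in for a null end date, like '01-JAN-3000'

def model_overlap(rows):
    """rows: (start_date, end_date or None) pairs for ONE person.
    Returns (group_start, group_end) per row, in start-date order."""
    rows = sorted(rows, key=lambda r: r[0])
    s = [r[0] for r in rows]
    e = [r[1] if r[1] is not None else FAR_FUTURE for r in rows]
    n = len(rows)
    S, E = s[:], e[:]                  # measures initialised to start and end dates
    for i in range(1, n):              # rule 1 (ascending): running latest end date
        E[i] = e[i] if e[i] > E[i - 1] else E[i - 1]
    for i in range(1, n):              # rule 2 (ascending): group start dates
        S[i] = s[i] if s[i] > E[i - 1] else S[i - 1]
    for i in range(n - 2, -1, -1):     # rule 3 (descending): group end dates
        E[i] = E[i] if S[i] < S[i + 1] else E[i + 1]
    return [(S[i], None if E[i] == FAR_FUTURE else E[i]) for i in range(n)]
```

The group end array is overwritten in place, just as the Model rules reuse the group_end measure for both the running end and the final group end.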
Query Diagram
SQL
SELECT /* MOD_OVL */ person_id, start_date,
       CASE end_date WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE end_date END end_date,
       activity_name, activity_id, group_start,
       CASE group_end WHEN To_Date ('01-JAN-3000', 'DD-MON-YYYY') THEN NULL ELSE group_end END group_end
  FROM activity
 MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, Nvl (end_date, '01-JAN-3000') end_date, activity_name, activity_id,
              start_date group_start, Nvl (end_date, '01-JAN-3000') group_end)
    RULES (
       group_end[rn > 1] =
          CASE WHEN end_date[cv()] > group_end[cv()-1] THEN end_date[cv()] ELSE group_end[cv()-1] END,
       group_start[rn > 1] =
          CASE WHEN start_date[cv()] > group_end[cv()-1] THEN start_date[cv()] ELSE group_start[cv()-1]
          END,
       group_end[ANY] ORDER BY rn DESC = Nvl (
          CASE WHEN group_start[cv()] < group_start[cv()+1] THEN group_end[cv()] ELSE group_end[cv()+1]
          END,
          group_end[cv()])
    )
 ORDER BY 1, 2, 3
Recursive Subquery Factor Solution

How It Works
called Recursive Subquery Factor (RSF).
2. The anchoring branch of the RSF selects records defined by the start point. A direction column is
defined that here is set to E for Either, meaning extend in either direction in the recursive
branch.
3. Add analytic function columns for row number by start date and by end date descending, and for
the minimum start date and maximum end dates. These go in both branches.
4. The recursive branch extends the record set by joining records that link to extreme parent records
and that push the envelope. The direction column is set to B or F according to the direction of
extension (Backward or Forward).
5. Define a subquery factor for the envelope that simply obtains the minimum start date and
maximum end date from the recursive factor, grouped by person.
6. Select all records from the envelope factor, joining the activity table for all records within the
envelope by person to get all the group records, with the group start and end dates being the
envelope values.
Note that we need the additional subquery factor because the recursive factor may exclude some records
that do not extend the envelope but are contained within it; for example, record 29 in data set T5 above.
The idea here is that for cases where the break group is small this will avoid expensive processing of the
entire record set. We'll demonstrate this saving in our performance analysis section.
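The envelope search can be sketched in Python as a loop in which each pass corresponds to one iteration of the recursive branch. This is an illustrative sketch with hypothetical names (envelope_group, FAR_FUTURE), not the author's code. It follows the join conditions above, but a record reachable in both directions is kept only once per pass. The final step reselects everything inside the envelope, as the env factor does.

```python
from datetime import date

FAR_FUTURE = date(3000, 1, 1)  # stand-in for a null end date

def envelope_group(rows, today):
    """RSF-style envelope search for ONE person; rows are (start, end or None).
    Returns (env_start, env_end, members) or None if no record covers `today`."""
    rows = [(s, e if e is not None else FAR_FUTURE) for s, e in rows]
    # Anchor: today BETWEEN start_date AND Nvl(end_date, ...), direction 'E' (either)
    level = [(s, e, "E") for s, e in rows if s <= today <= e]
    found = list(level)
    while level:
        lo = min(s for s, _, _ in level)          # env_start of this iteration
        hi = max(e for _, e, _ in level)          # env_end of this iteration
        first = min(level, key=lambda r: r[0])    # the rn_asc = 1 row
        last = max(level, key=lambda r: r[1])     # the rn_dsc = 1 row
        nxt = []
        for s, e in rows:
            back = s < lo <= e and first[2] in ("E", "B")
            fwd = s <= hi < e and last[2] in ("E", "F")
            if back or fwd:
                nxt.append((s, e, "B" if s < lo else "F"))
        found.extend(nxt)
        level = nxt
    if not found:
        return None
    env_start = min(s for s, _, _ in found)
    env_end = max(e for _, e, _ in found)
    # Final step: every record lying wholly inside the envelope, including ones
    # (like record 29) that never extended it
    members = [(s, e) for s, e in rows if env_start <= s <= env_end and env_start <= e <= env_end]
    return env_start, (None if env_end == FAR_FUTURE else env_end), members
```

For the person holding records 19 to 24 in the stage table, anchoring on 10-Jun-11 finds records 22 and 23. The final step then also picks up record 24, which lies inside the open-ended envelope.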
Query Diagram
SQL
WITH
rsq (person_id, start_date, end_date, activity_name, activity_id, env_start, env_end, rn_asc,
     rn_dsc, direction) AS (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       Min (start_date) OVER (PARTITION BY person_id) env_start,
       Max (Nvl (end_date, '01-JAN-3000')) OVER (PARTITION BY person_id) env_end,
       Row_Number () OVER (PARTITION BY person_id ORDER BY start_date) rn_asc,
       Row_Number () OVER (PARTITION BY person_id ORDER BY Nvl (end_date, '01-JAN-3000') DESC) rn_dsc,
       'E' direction
  FROM activity
 WHERE '&TODAY' BETWEEN start_date AND Nvl(end_date, '&TODAY')
   AND person_id IN (3, 4, 5)
 UNION ALL
SELECT act.person_id, act.start_date, act.end_date, act.activity_name, act.activity_id,
       Min (act.start_date) OVER (PARTITION BY act.person_id) env_start,
       Max (Nvl (act.end_date, '01-JAN-3000')) OVER (PARTITION BY act.person_id) env_end,
       Row_Number () OVER (PARTITION BY act.person_id ORDER BY act.start_date) rn_asc,
       Row_Number () OVER (PARTITION BY act.person_id ORDER BY Nvl (act.end_date, '01-JAN-3000') DESC)
       rn_dsc,
       CASE WHEN act.start_date < rsq.env_start THEN 'B' ELSE 'F' END
  FROM rsq
  JOIN activity act
    ON act.person_id = rsq.person_id
   AND ((act.start_date < rsq.env_start AND
         Nvl (act.end_date, '01-JAN-3000') >= rsq.env_start AND
         rsq.rn_asc = 1 AND
         rsq.direction IN ('E', 'B')
        ) OR
        (Nvl (act.end_date, '01-JAN-3000') > rsq.env_end AND
         act.start_date <= rsq.env_end AND
         rsq.rn_dsc = 1 AND
         rsq.direction IN ('E', 'F')
        )
       )
), env AS (
SELECT person_id, Min (env_start) env_start, Max (env_end) env_end
  FROM rsq
 GROUP BY person_id
)
SELECT /* RSQ_OVL '&TODAY' */ act.person_id, act.start_date, act.end_date, act.activity_name,
       act.activity_id, env.env_start,
       CASE WHEN env.env_end = '01-JAN-3000' THEN NULL ELSE env.env_end END env_end
  FROM env
  JOIN activity act
    ON act.person_id = env.person_id
 WHERE act.start_date BETWEEN env.env_start AND env.env_end
   AND Nvl (act.end_date, '01-JAN-3000') BETWEEN env.env_start AND env.env_end
 ORDER BY act.person_id, act.start_date, act.end_date
Performance Analysis
Test Data Sets
If w and d are the numeric width and depth points, records are generated for three persons as follows:
Let random(x) be a random integer between 1 and x (generated afresh on each access)
Century start date = '01-JAN-1900'
Record limit (per person) = 500 * w
Loop for record limit (per person)
  Add record for person 1, as follows:
    Start date = random day in 20th century
    End date = start date + random (Ceil (sqrt(d))) + 1
End loop
Store the root date as the mid point of the last record generated
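This generation process ensures that the size of the record set is proportional to the width point, while the
ranges are of random sizes but within a scale determined by the depth point; larger ranges correlate with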
larger groups. The width and depth points, together with the (randomized) size of the root group, are
shown in the next section.
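For readers who want to reproduce data of the same shape, here is a Python rendering of the recipe. Several details are assumptions: the function name, the seeding, the use of Python's random module, and reading "random day in 20th century" as a day from 01-JAN-1900 to 31-DEC-1999. Repeating the record step for persons 2 and 3 is also assumed, since the recipe speaks of three persons.

```python
import math
import random
from datetime import date, datetime, timedelta

CENTURY_START = date(1900, 1, 1)
CENTURY_DAYS = (date(2000, 1, 1) - CENTURY_START).days  # days from 01-JAN-1900 to 31-DEC-1999

def generate_overlapping(w, d, seed=None):
    """Return (records, root_date); records are (person_id, start_date, end_date)."""
    rng = random.Random(seed)

    def rand(x):
        return rng.randint(1, x)  # random(x): integer between 1 and x, fresh on each call

    records = []
    for _ in range(500 * w):          # record limit per person
        for person in (1, 2, 3):      # assumed: same step repeated for persons 2 and 3
            start = CENTURY_START + timedelta(days=rng.randrange(CENTURY_DAYS))
            end = start + timedelta(days=rand(math.ceil(math.sqrt(d))) + 1)
            records.append((person, start, end))
    _, start, end = records[-1]
    # Root date: mid point of the last record generated (may fall at noon)
    root = datetime.combine(start, datetime.min.time()) + (end - start) / 2
    return records, root
```

Because ranges are at least two days long, the mid point can fall at noon. That is consistent with the 12:00:00 root dates seen in the explain plan predicates.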
Output Record Counts

The output consists of all the records in the root group, which is defined as the group containing the root
date, and has at least one record by definition. Of course, each solution method operates on the same
data set, and so the number of records written to file is always the same for all methods (and this was checked).
Note that the output record count reached its maximum, the total record count, at the deeper data points
for widths W16, W32 and W64 below.
Depth/Width         W1      W2      W4      W8     W16     W32     W64
Total records     1500    3000    6000   12000   24000   48000   96000
D1                   1       1       1       1       7       3      11
D2                   1       1       1       3       4       3      30
D4                   1       1       3       1       5      10      62
D8                   1       1       3       8      11      45     556
D16                  1       3       6       6      47     231    7814
D32                  2       7      20      19     117    6531   96000
D64                  2      10      20     134    8893   48000   96000
D128                 1      11      94    4778   24000   48000   96000
CPU Times
Analytics
Depth/Width         W1      W2      W4      W8     W16     W32     W64
D1                0.28    1.09    3.93   13.96   46.65   126.2  396.46
D2                0.28    1.51    3.51   12.39   40.73   96.72  160.89
D4                0.28    0.95    3.68   11.08   31.73   62.95   63.96
D8                0.29    0.96    3.17    9.13   20.02   24.98   12.04
D16               0.28    0.78    2.42    5.48    8.19    4.86    2.31
D32               0.20    0.61    1.71    2.29    1.67    1.28    7.76
D64               0.18    0.44    0.67    0.63    1.16    3.73    7.30
D128              0.12    0.22    0.25    0.58    2.03    3.93    7.33
Notes
Performance for a given width improves dramatically with depth
Model
Depth/Width         W1      W2      W4      W8     W16     W32     W64
D1                0.11    0.23    0.41    0.71    1.54    2.82    5.74
D2                0.11    0.19    0.36    0.75    1.48    2.79    5.74
D4                0.12    0.18    0.37    0.74    1.40    2.77    5.87
D8                0.11    0.20    0.40    0.75    1.44    3.16    5.80
D16               0.11    0.20    0.37    0.73    1.85    2.82    6.15
D32               0.11    0.23    0.39    0.77    1.53    3.27   10.80
D64               0.08    0.19    0.40    0.75    2.00    6.39   10.32
D128              0.08    0.20    0.44    1.06    2.94    5.47   10.63
Notes
Performance for a given width is largely independent of depth, except where it starts to drop off at
the maximum depths on the wider data points
Recursive Subquery Factor (No Hint)
Depth/Width         W1      W2      W4      W8     W16     W32     W64
D1                0.03    0.01    0.03    0.05    0.09    0.10    0.38
D2                0.01    0.03    0.02    0.05    0.04    0.09    0.71
D4                0.02    0.02    0.03    0.01    0.06    0.16    1.30
D8                0.01    0.03    0.03    0.05    0.14    0.57   10.03
D16               0.04    0.00    0.04    0.05    0.26    2.25  121.22
D32               0.02    0.05    0.07    0.06    0.58   59.10 1255.15
D64               0.03    0.03    0.06    0.39   39.28  330.43 1240.12
D128              0.03    0.05    0.16    9.86   90.30  325.51 1213.75
Notes
Recursive Subquery Factor (Hint)

This query had the following hint added to the anchor branch of the recursive union:
/*+ INDEX (activity ACTIVITY_N1) */
And this to the recursive branch (the first hint, USE_CONCAT, asks the optimizer to expand the OR condition into separate branches combined by concatenation):
/*+ USE_CONCAT INDEX (act ACTIVITY_N1) */
Depth/Width         W1      W2      W4      W8     W16     W32     W64
D1                0.03    0.03    0.02    0.14    0.79    0.69    5.64
D2                0.02    0.03    0.03    0.19    0.40    0.65    4.14
D4                0.02    0.03    0.11    0.10    0.03    0.71    6.21
D8                0.02    0.04    0.03    0.20    0.78    3.27   48.11
D16               0.03    0.06    0.10    0.19    1.27    9.27  242.47
D32               0.03    0.10    0.32    0.25    0.06  113.83  735.15
D64               0.03    0.13    0.24    0.10   68.10  186.70  731.55
D128              0.03    0.03    0.42    0.61   48.46   93.41  158.94
Notes
Performance for a given width worsens dramatically with depth, although less so than for the
unhinted query
Slice Graphs
Wide Slice
Deep Slice

Explain Plans
Analytics
--------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |       |  4369 (100)|          |
|   1 |  SORT ORDER BY           |          | 96660 | 6513K |  8416K|  4369   (1)| 00:00:53 |
|*  2 |   VIEW                   |          | 96660 | 6513K |       |  2794   (1)| 00:00:34 |
|   3 |    WINDOW SORT           |          | 96660 | 6513K |  8416K|  2794   (1)| 00:00:34 |
|   4 |     VIEW                 |          | 96660 | 6513K |       |  1219   (1)| 00:00:15 |
|   5 |      WINDOW BUFFER       |          | 96660 | 5663K |       |  1219   (1)| 00:00:15 |
|   6 |       VIEW               |          | 96660 | 5663K |       |  1219   (1)| 00:00:15 |
|   7 |        WINDOW SORT       |          | 96660 | 3964K |  5696K|  1219   (1)| 00:00:15 |
|   8 |         TABLE ACCESS FULL| ACTIVITY | 96660 | 3964K |       |   171   (1)| 00:00:03 |
--------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
              hh24:mi:ss')))
Model
------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |       |  2794 (100)|          |
|   1 |  SORT ORDER BY         |          | 96660 | 6513K |  8416K|  2794   (1)| 00:00:34 |
|*  2 |   VIEW                 |          | 96660 | 6513K |       |  1219   (1)| 00:00:15 |
|   3 |    SQL MODEL ORDERED   |          | 96660 | 3964K |       |  1219   (1)| 00:00:15 |
|   4 |     WINDOW SORT        |          | 96660 | 3964K |  5696K|  1219   (1)| 00:00:15 |
|   5 |      TABLE ACCESS FULL | ACTIVITY | 96660 | 3964K |       |   171   (1)| 00:00:03 |
------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------
              hh24:mi:ss')))

Recursive Subquery Factor
-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |             |       |       |       |  4673 (100)|          |
|   1 |  SORT ORDER BY                                |             |     1 |    63 |       |  4673  (62)| 00:00:57 |
|   2 |   NESTED LOOPS                                |             |     1 |    63 |       |  4672  (62)| 00:00:57 |
|   3 |    VIEW                                       |             |     3 |    63 |       |  4666  (63)| 00:00:56 |
|   4 |     HASH GROUP BY                             |             |     3 |    63 |       |  4666  (63)| 00:00:56 |
|   5 |      VIEW                                     |             | 21964 |  450K |       |  4664  (63)| 00:00:56 |
|   6 |       UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |       |       |       |            |          |
|   7 |        WINDOW SORT                            |             | 21616 |  886K |  1280K|   645   (1)| 00:00:08 |
|   8 |         WINDOW SORT                           |             | 21616 |  886K |  1280K|   645   (1)| 00:00:08 |
|*  9 |          TABLE ACCESS FULL                    | ACTIVITY    | 21616 |  886K |       |   172   (2)| 00:00:03 |
|  10 |        WINDOW SORT                            |             |   348 | 35496 |       |  4019  (72)| 00:00:49 |
|  11 |         WINDOW SORT                           |             |   348 | 35496 |       |  4019  (72)| 00:00:49 |
|* 12 |          HASH JOIN                            |             |   348 | 35496 |  1520K|  4017  (72)| 00:00:49 |
|  13 |           RECURSIVE WITH PUMP                 |             |       |       |       |            |          |
|  14 |           TABLE ACCESS FULL                   | ACTIVITY    | 96660 | 3964K |       |   171   (1)| 00:00:03 |
|  15 |    TABLE ACCESS BY INDEX ROWID                | ACTIVITY    |     1 |    42 |       |     2   (0)| 00:00:01 |
|* 16 |     INDEX RANGE SCAN                          | ACTIVITY_N1 |     1 |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   9 - filter(("START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              NVL("END_DATE",TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))>=
              TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss')))
       filter((("ACT"."START_DATE"<"RSQ"."ENV_START" AND "RSQ"."ENV_START"<=NVL("END_DATE",TO_DATE(
              ' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND "RSQ"."RN_ASC"=1 AND
              INTERNAL_FUNCTION("RSQ"."DIRECTION")) OR ("RSQ"."ENV_END"<NVL("END_DATE",
              TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
              AND "ACT"."START_DATE"<="RSQ"."ENV_END" AND "RSQ"."RN_DSC"=1 AND
              INTERNAL_FUNCTION("RSQ"."DIRECTION"))))
  16 - access("ACT"."PERSON_ID"="ENV"."PERSON_ID" AND "ACT"."START_DATE">="ENV"."ENV_START" AND
              "ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ACT"."START_DATE"<="ENV"."ENV_END" AND
              "ENV"."ENV_END">="ACT"."SYS_NC00006$")
       filter(("ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ENV"."ENV_END">="ACT"."SYS_NC00006$"))
Recursive Subquery Factor with Hint

-------------------------------------------------------------------------------------------------------------------
| Id  | Operation                                     | Name        | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                              |             |       |       |       |  306K (100)|          |
|   1 |  SORT ORDER BY                                |             |     1 |    63 |       |  306K   (2)| 01:01:21 |
|   2 |   NESTED LOOPS                                |             |     1 |    63 |       |  306K   (2)| 01:01:21 |
|   3 |    VIEW                                       |             |     3 |    63 |       |  306K   (2)| 01:01:21 |
|   4 |     HASH GROUP BY                             |             |     3 |    63 |       |  306K   (2)| 01:01:21 |
|   5 |      VIEW                                     |             | 3503K |   70M |       |  306K   (2)| 01:01:19 |
|   6 |       UNION ALL (RECURSIVE WITH) BREADTH FIRST|             |       |       |       |            |          |
|   7 |        WINDOW SORT                            |             | 21616 |  886K |  1280K| 22328   (1)| 00:04:28 |
|   8 |         WINDOW BUFFER                         |             | 21616 |  886K |       | 22328   (1)| 00:04:28 |
|   9 |          TABLE ACCESS BY INDEX ROWID          | ACTIVITY    | 21616 |  886K |       | 22092   (1)| 00:04:26 |
|* 10 |           INDEX FULL SCAN                     | ACTIVITY_N1 | 21616 |       |       |   517   (1)| 00:00:07 |
|  11 |        WINDOW SORT                            |             | 3482K |  338M |       |  284K   (3)| 00:56:51 |
|  12 |         WINDOW SORT                           |             | 3482K |  338M |       |  284K   (3)| 00:56:51 |
|  13 |          CONCATENATION                        |             |       |       |       |            |          |
|  14 |           MERGE JOIN                          |             | 1741K |  169M |       |  121K   (3)| 00:24:23 |
|  15 |            TABLE ACCESS BY INDEX ROWID        | ACTIVITY    | 96660 | 3964K |       | 96331   (1)| 00:19:16 |
|  16 |             INDEX FULL SCAN                   |             |       |       |       |   517   (1)| 00:00:07 |
|* 17 |            FILTER                             |             |       |       |       |            |          |
|* 18 |             SORT JOIN                         |             | 21616 | 1266K |  3256K| 22642   (1)| 00:04:32 |
|  19 |              RECURSIVE WITH PUMP              |             |       |       |       |            |          |
|  20 |           MERGE JOIN                          |             | 1741K |  169M |       |  121K   (3)| 00:24:23 |
|  21 |            TABLE ACCESS BY INDEX ROWID        | ACTIVITY    | 96660 | 3964K |       | 96331   (1)| 00:19:16 |
|  22 |             INDEX FULL SCAN                   |             |       |       |       |   517   (1)| 00:00:07 |
|* 23 |            FILTER                             |             |       |       |       |            |          |
|* 24 |             SORT JOIN                         |             | 21616 | 1266K |  3256K| 22642   (1)| 00:04:32 |
|  25 |              RECURSIVE WITH PUMP              |             |       |       |       |            |          |
|  26 |    TABLE ACCESS BY INDEX ROWID                | ACTIVITY    |     1 |    42 |       |     2   (0)| 00:00:01 |
|* 27 |     INDEX RANGE SCAN                          | ACTIVITY_N1 |     1 |       |       |     1   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

  10 - access("ACTIVITY"."SYS_NC00006$">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss'))
       filter(("START_DATE"<=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss') AND
              "ACTIVITY"."SYS_NC00006$">=TO_DATE(' 1966-04-03 12:00:00', 'syyyy-mm-dd hh24:mi:ss')))
  17 - filter(("ACT"."START_DATE"<="RSQ"."ENV_END" AND "RSQ"."ENV_END"<NVL("END_DATE",
              TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))))
       filter("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
  23 - filter(("ACT"."START_DATE"<"RSQ"."ENV_START" AND "RSQ"."ENV_START"<=NVL("END_DATE",TO_DATE(
              ' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')) AND (LNNVL("RSQ"."ENV_END"<NVL("END_DATE",
              TO_DATE(' 3000-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss')))
              OR LNNVL("ACT"."START_DATE"<="RSQ"."ENV_END") OR
              LNNVL("RSQ"."RN_DSC"=1) OR (LNNVL("RSQ"."DIRECTION"='E')
              AND LNNVL("RSQ"."DIRECTION"='F')))))
       filter("ACT"."PERSON_ID"="RSQ"."PERSON_ID")
  27 - access("ACT"."PERSON_ID"="ENV"."PERSON_ID" AND "ACT"."START_DATE">="ENV"."ENV_START" AND
              "ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ACT"."START_DATE"<="ENV"."ENV_END" AND
              "ENV"."ENV_END">="ACT"."SYS_NC00006$")
       filter(("ENV"."ENV_START"<="ACT"."SYS_NC00006$" AND "ENV"."ENV_END">="ACT"."SYS_NC00006$"))
Discussion of Results
The best method for deep data sets is Analytics
The best method for shallow data sets is Recursive Subquery Factor. The hinted version levels
the performance off at the extremes, but is still not a preferred option
The Model method is largely independent of depth and performs in the wide slice at a level
between the two other methods, except for one intermediate data point where it is better than both
Problem 3: Bursts of Activity

Analytics Solution (None)
I am unaware of a solution to this problem using analytic functions alone.
Model Solution
How It Works
The key to solving this problem using Oracle's Model clause is to realise that the solution can be
represented as simple inductions, forward for the group start dates, then backward for the group end
dates. If D is the distance parameter, s, e, S, E are the current start date, end date, group start date and
group end date, and (ps, pe, pS, pE) and (ns, ne, nS, nE) are the prior and next values, then (using C-like
terminology for brevity):
Initial, S = s; later, S = (s - pS > D) ? s : pS
Final, E = e; earlier, E = (nS > S) ? e : nE
and group_end
3. Initialise group start and end dates to start and end dates in the measures clause
4. Define the first rule to obtain the group start date for all rows after the first as the start date, unless
the start date is within the distance parameter of the previous group start date, in which
case take the previous group start date. This rule will be processed in the default ascending row order.
5. Define the second rule to obtain the group end date for all rows before the last as the next group
end date, unless the group start date is less than the next group start date, in which case take the
current end date. This rule must be processed in descending row order, and this is specified as it
is not the default.
The query diagram, SQL and functional testing use the form for obtaining all break groups, while the
performance testing uses the form for obtaining a single break group, for consistency with the second
solution method.
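Here is a Python sketch of the two inductions for one person, with the distance D as a parameter (the SQL in this section hard-codes 3). It is illustrative and uses a hypothetical function name. As in the second rule, each group's end is the end date of its last record by start date, not the maximum end date in the group.

```python
from datetime import date

def bursts(rows, dist):
    """rows: (start_date, end_date) pairs for ONE person; dist: distance parameter in days.
    Returns (group_start, group_end) per row, in start-date order."""
    rows = sorted(rows)
    s = [r[0] for r in rows]
    e = [r[1] for r in rows]
    n = len(rows)
    S, E = s[:], e[:]                       # initialise to start and end dates
    for i in range(1, n):                   # rule 1, ascending: new group beyond dist days
        S[i] = s[i] if (s[i] - S[i - 1]).days > dist else S[i - 1]
    for i in range(n - 2, -1, -1):          # rule 2, descending: copy end back through the group
        E[i] = e[i] if S[i] < S[i + 1] else E[i + 1]
    return list(zip(S, E))
```

With D = 3, records starting on 1, 3 and 4 June fall in one burst. A record starting on 5 June opens a new one, because 5 June is four days after the group start.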
Query Diagram
SQL
SELECT /* MOD */ person_id, start_date, end_date, activity_name, activity_id, group_start, group_end
  FROM activity
 MODEL
    PARTITION BY (person_id)
    DIMENSION BY (Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn)
    MEASURES (start_date, end_date, activity_name, activity_id, start_date group_start, end_date
              group_end)
    RULES (
       group_start[rn > 1] = CASE WHEN start_date[cv()] - group_start[cv()-1] > 3 THEN start_date[cv()]
                             ELSE group_start[cv()-1] END,
       group_end[ANY] ORDER BY rn DESC = Nvl (
          CASE WHEN group_start[cv()] < group_start[cv()+1] THEN end_date[cv()] ELSE group_end[cv()+1] END,
          end_date[cv()])
    )
 ORDER BY 1, 2, 3
Recursive Subquery Factoring Solution

How It Works
called Recursive Subquery Factor (RSF).
1. Define a (non-recursive) subquery factor, act, that selects all records after a given root date and
obtains a row number by person ordered by start date.
3. The anchoring branch of the RSF selects the first record from act, with group start as the start
date.
4. The recursive branch extends the record set by joining the next record from act if it is within the
distance limit from the previous group start, and retaining the group start at its previous value.
5. Select all records from the RSF, and get the group end date using an analytic Max.
The idea here is that for cases where the break group is small this will avoid expensive processing of the
entire record set. We'll demonstrate this saving in our performance analysis section.
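Here is an in-memory Python sketch of the same walk, one record per step, for a single person. It is an illustration under stated assumptions rather than the author's code. first_burst is a hypothetical name, and the distance is passed as a parameter rather than the hard-coded 3.

```python
from datetime import date

def first_burst(rows, root, dist):
    """rows: (start_date, end_date) pairs for ONE person.
    Returns (group_start, group_end, members) for the first burst on or after root, or None."""
    act = sorted(r for r in rows if r[0] >= root)   # the act factor, in rn (start date) order
    if not act:
        return None
    group_start = act[0][0]                         # anchor branch: rn = 1
    members = [act[0]]
    for rec in act[1:]:                             # recursive branch: next rn, same group start
        if (rec[0] - group_start).days > dist:
            break                                   # no row joins, so the recursion stops
        members.append(rec)
    group_end = max(end for _, end in members)      # analytic Max over the group
    return group_start, group_end, members
```

Only the records in the first burst are ever visited, which is the saving the recursive approach aims for when bursts are small.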
Query Diagram
SQL
WITH
act AS (
SELECT person_id, start_date, end_date, activity_name, activity_id,
       Row_Number() OVER (PARTITION BY person_id ORDER BY start_date) rn
  FROM activity
 WHERE start_date >= '&TODAY'
),
rsq (person_id, rn, start_date, end_date, activity_name, activity_id, group_start) AS (
SELECT person_id, rn, start_date, end_date, activity_name, activity_id, start_date group_start
  FROM act
 WHERE rn = 1
 UNION ALL
SELECT act.person_id,
       act.rn,
       act.start_date,
       act.end_date,
       act.activity_name,
       act.activity_id,
       rsq.group_start
  FROM act
  JOIN rsq
    ON rsq.rn = act.rn - 1
   AND rsq.person_id = act.person_id
   AND act.start_date - rsq.group_start <= 3
)
SELECT /* RSQ_DST '&TODAY' */ rsq.person_id,
       rsq.start_date,
       rsq.end_date,
       rsq.activity_name,
       rsq.activity_id,
       rsq.group_start,
       Max (rsq.end_date) OVER (PARTITION BY rsq.person_id) group_end
  FROM rsq
 ORDER BY 1, 2, 3
Test Data Sets
If w and d are the numeric width and depth points, records are generated for three persons as follows:
Let random(x) be a random integer between 1 and x (generated afresh on each access)
Record limit (per person) = 500 * w
Loop for record limit (per person)
    Add record for person 1, as follows:
        Start date = random day in 20th century
        End date = start date + random(d) + 1
    Repeat for persons 2 and 3
End loop
Store the root date as the earliest start date generated
The ranges are of random sizes, but within a scale determined by the depth point. Range size has no effect on group size here, because a record joins a group only if its start date is within the maximum group range of the group start, and that range is taken to be the depth parameter value in days. In this way, depth correlates with the group sizes.
The width and depth points, together with the (randomized) size of the root group, are shown in the next
section.
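The following Python sketch of the generation scheme may help readers who want to reproduce data of the same shape. It is an illustration under assumptions and not the author's setup program. The 20th century is taken as 1900-01-01 to 1999-12-31. Python's random module stands in for whatever random source the setup program used. The names generate_bursts_data and random_upto are hypothetical.

```python
import random
from datetime import date, timedelta

CENTURY_START = date(1900, 1, 1)                      # assumption: 20th century = 1900-1999
CENTURY_DAYS = (date(2000, 1, 1) - CENTURY_START).days


def random_upto(x, rng):
    """random(x): a random integer between 1 and x, drawn afresh on each call."""
    return rng.randint(1, x)


def generate_bursts_data(w, d, seed=None):
    """Return (rows, root_date) for width point w and depth point d.

    rows are (person_id, start_date, end_date) tuples, 500 * w per person for persons 1 to 3.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(500 * w):                           # record limit per person
        for person_id in (1, 2, 3):                    # person 1, then repeat for persons 2 and 3
            start = CENTURY_START + timedelta(days=rng.randrange(CENTURY_DAYS))
            end = start + timedelta(days=random_upto(d, rng) + 1)
            rows.append((person_id, start, end))
    root_date = min(start for _, start, _ in rows)    # root date = earliest start date generated
    return rows, root_date


if __name__ == "__main__":
    rows, root = generate_bursts_data(1, 3, seed=1)
    print(len(rows), root)
```

With w = 1 this yields 1,500 records in total, which matches the W1 column of the record counts.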
Output Row Counts
The output consists of all the records in the first group for each person, starting at the root date. Of course, each solution
method operates on the same data set, and so the number of records written to file is always the same for
both (and this was checked).
Depth/Width  Total Records    D1    D3    D9   D27   D81  D243  D729  D2187
W1                    1500     3     3     3     5     4    11    31     97
W2                    3000     3     4     3     9     9    21    70    164
W4                    6000     3     3     3     7    16    38   138    361
W8                   12000     3     6     5    15    32    74   229    742
W16                  24000     4     5     9    28    64   150   494   1444
W32                  48000     4     7    16    31   125   290   907   2881
W64                  96000     7    12    29    72   218   678  1959   5785
W128                192000    12    23    71   138   438  1295  3794  11561
CPU Times
Model
Depth/Width  Total Records     D1     D3     D9    D27    D81   D243   D729  D2187
W1                    1500   0.07   0.10   0.10   0.09   0.09   0.09   0.11   0.08
W2                    3000   0.16   0.16   0.15   0.14   0.16   0.16   0.15   0.18
W4                    6000   0.29   0.30   0.29   0.31   0.31   0.32   0.31   0.29
W8                   12000   0.58   0.59   0.59   0.58   0.58   0.59   0.61   0.69
W16                  24000   1.19   1.15   1.17   1.19   1.17   1.17   1.20   1.23
W32                  48000   2.28   2.34   2.31   2.35   2.36   2.29   2.34   2.43
W64                  96000   4.73   4.64   4.67   4.68   4.69   4.71   4.84   5.00
W128                192000   9.39   9.41   9.29   9.41   9.38   9.42   9.57  10.03
Notes
Performance for a given width is essentially independent of depth
Recursive Subquery Factor
Depth/Width  Total Records      D1      D3      D9     D27     D81    D243    D729   D2187
W1                    1500    0.01    0.03    0.03    0.03    0.03    0.05    0.01    0.08
W2                    3000    0.03    0.03    0.02    0.03    0.05    0.05    0.10    0.13
W4                    6000    0.05    0.03    0.03    0.05    0.06    0.10    0.21    0.46
W8                   12000    0.07    0.06    0.06    0.11    0.14    0.22    0.53    1.62
W16                  24000    0.13    0.12    0.16    0.24    0.36    0.71    2.00    5.63
W32                  48000    0.22    0.29    0.31    0.47    1.11    2.42    7.00   21.45
W64                  96000    0.53    0.61    0.83    1.62    3.74   10.58   27.72   82.37
W128                192000    1.22    1.48    3.05    4.93   13.61   37.66  107.73  317.82
Slice Graphs
[Graphs not reproduced: Wide Slice and Deep Slice]
Explain Plans (Data Point W128-D1)
Model
-------------------------------------------------------------------------------------------
| Id  | Operation              | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |          |       |       |       |  5591 (100)|          |
|   1 |  SORT ORDER BY         |          |   193K|    14M|    18M|  5591   (1)| 00:01:08 |
|*  2 |   VIEW                 |          |   193K|    14M|       |  2073   (1)| 00:00:25 |
|   3 |    SQL MODEL ORDERED   |          |   193K|  6422K|       |  2073   (1)| 00:00:25 |
|   4 |     WINDOW SORT        |          |   193K|  6422K|  9112K|  2073   (1)| 00:00:25 |
|*  5 |      TABLE ACCESS FULL | ACTIVITY |   193K|  6422K|       |   310   (1)| 00:00:04 |
-------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   2 - filter("GROUP_START"="MIN_START")
   5 - filter("START_DATE">=TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
Recursive Subquery Factor
--------------------------------------------------------------------------------------------------------------
| Id  | Operation                                | Name      | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                         |           |       |       |       |  4455 (100)|          |
|   1 |  TEMP TABLE TRANSFORMATION               |           |       |       |       |            |          |
|   2 |   LOAD AS SELECT                         |           |       |       |       |            |          |
|   3 |    WINDOW SORT                           |           |   193K|  6422K|  9112K|  2073   (1)| 00:00:25 |
|*  4 |     TABLE ACCESS FULL                    | ACTIVITY  |   193K|  6422K|       |   310   (1)| 00:00:04 |
|   5 |   WINDOW SORT                            |           |  6429K|   367M|       |  2382  (23)| 00:00:29 |
|   6 |    VIEW                                  |           |  6429K|   367M|       |  2382  (23)| 00:00:29 |
|   7 |     UNION ALL (RECURSIVE WITH) BREADTH F |           |       |       |       |            |          |
|*  8 |      VIEW                                |           |   193K|    11M|       |   246   (1)| 00:00:03 |
|   9 |       TABLE ACCESS FULL                  | SYS_TEMP_ |   193K|  6422K|       |   246   (1)| 00:00:03 |
|* 10 |      HASH JOIN                           |           |  6235K|   588M|  8880K|  2136  (25)| 00:00:26 |
|  11 |       RECURSIVE WITH PUMP                |           |       |       |       |            |          |
|  12 |       VIEW                               |           |   193K|    11M|       |   246   (1)| 00:00:03 |
|  13 |        TABLE ACCESS FULL                 | SYS_TEMP_ |   193K|  6422K|       |   246   (1)| 00:00:03 |
--------------------------------------------------------------------------------------------------------------

Predicate Information (identified by operation id):
---------------------------------------------------

   4 - filter("START_DATE">=TO_DATE(' 1900-01-01 00:00:00', 'syyyy-mm-dd hh24:mi:ss'))
   8 - filter("RN"=1)
  10 - access("RSQ"."RN"="ACT"."RN"-1 AND "RSQ"."PERSON_ID"="ACT"."PERSON_ID")
       filter("ACT"."START_DATE"-"RSQ"."GROUP_START"<=1)

[SYS_TEMP_ was SYS_TEMP_0FD9D6648_110EBBB - truncated to fit the Word box]
No solution method using Analytics was found
The best method for shallow data sets is Recursive Subquery Factor
The best method for deep data sets is Model, which also is independent of depth
Analytics Anomaly Analysis
We observed in the performance analysis sections for problems 1 and 2 that the analytics solutions
behaved in the opposite manner to recursive subquery factoring: performance improved roughly in
proportion to depth for given width. This is surprising, since we might expect performance to remain
largely independent of depth, as with the model solutions, given that depth does not affect overall problem
size. The behaviour of recursive subquery factoring is consistent with expectation, given the construction
of the methods.
After completion of the initial performance analysis (v1.2 of the document) this issue was further analysed.
It was determined by experiment that variations on the queries could avoid the deterioration in
performance with decreasing depth. The problem seems to be due to a glitch in Oracle's execution of
queries with First_Value and the IGNORE NULLS option, and occurs in both 10g and 11g XE. It seems as
though Oracle does a lot of unnecessary recalculation for each row processed when there are few null
values.
The first variation involves noting that finding the first value in a list looking forward from the current row is the same as finding the last value looking back from the end to the current row. At first it might seem that the latter would be slower, but being able to reuse the processing done for previous rows, as one progresses through the row set, is clearly important.
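The equivalence can be illustrated outside the database. The Python sketch below assumes a single partition already sorted by start date with distinct start dates, and hypothetical function names. It computes "the first non-null value at or after this row" in two ways. The first is a fresh forward scan for each row. The second is a single backward pass that carries the last non-null value seen, which is how a running Last_Value over a descending order can reuse work from the previous row. It only illustrates the reuse argument and makes no claim about how Oracle implements either function.

```python
def first_value_forward(values):
    """For each row i: the first non-None value at or after row i (a fresh forward scan per row)."""
    out = []
    for i in range(len(values)):
        found = None
        for v in values[i:]:
            if v is not None:
                found = v
                break
        out.append(found)
    return out


def last_value_inverted(values):
    """Same result in one pass: walk from the last row back to the first, carrying the most
    recent non-None value seen, i.e. the last non-None value from the end down to row i."""
    out = [None] * len(values)
    last_seen = None
    for i in range(len(values) - 1, -1, -1):
        if values[i] is not None:
            last_seen = values[i]
        out[i] = last_seen
    return out


if __name__ == "__main__":
    group_end = [None, None, 5, None, 7, 8, None]  # one person's rows in start-date order
    print(first_value_forward(group_end))
    print(last_value_inverted(group_end))
```

The forward version rescans the rows after each position, while the inverted version touches each row once.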
The second variation involves removing the First_Value from the existing query, then adding an enclosing
query that gets the group end as the maximum for person and group start.
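The second variation can be sketched in Python under the assumption that the inner query already supplies group_start on every row, as partitioning by it implies. Rows are hypothetical (person_id, start_date, end_date, group_start) tuples. The group end is then a per-partition maximum, in the same way that Max (end_date) OVER (PARTITION BY person_id, group_start) needs no IGNORE NULLS look-ahead. The function name is hypothetical.

```python
from collections import defaultdict


def add_group_end_by_max(rows):
    """rows: (person_id, start_date, end_date, group_start) tuples, group_start set on every row.

    Returns each row extended with group_end, the maximum end_date over rows sharing
    (person_id, group_start), like Max (end_date) OVER (PARTITION BY person_id, group_start).
    """
    max_end = defaultdict(lambda: None)
    for person_id, _, end_date, group_start in rows:
        if end_date is None:  # MAX ignores nulls
            continue
        key = (person_id, group_start)
        if max_end[key] is None or end_date > max_end[key]:
            max_end[key] = end_date
    return [row + (max_end[(row[0], row[3])],) for row in rows]
```

The sketch makes one pass to build the maxima and another to attach them, so its cost does not depend on how the non-null values are spread.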
The performance analysis was repeated for the two variations, plus the original analytic solutions and the model solution, on a single wide slice, using the same data setup programs. As there is no RSF method in this comparison, we have used the original forms of the problems, in which all groups are obtained. Both variations now perform as well for shallow as for deep data sets. Notice that the explain plans suggest that the variations will perform worse, having additional sort operations and higher estimated costs, but the measured CPU times show these estimates to be wrong.
Analytic Query Variations
Query NOF (Replace First_Value with Last_Value Inverted)
The query structure is essentially unchanged.
SQL
SELECT /* NOV_NOF */
group_start,
Last_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date DESC)
group_end
FROM (
start_date-1)) OR
end_date) OR
FROM activity_nov
)
Explain Plan
-----------------------------------------------------------------------------------------------
| Id  | Operation              | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT       |              |       |       |       |   522 (100)|          |
|   1 |  WINDOW SORT           |              |  19200|  1293K|  1680K|   522   (1)| 00:00:07 |
|   2 |   WINDOW SORT          |              |  19200|  1293K|  1680K|   522   (1)| 00:00:07 |
|   3 |    VIEW                |              |  19200|  1293K|       |   205   (1)| 00:00:03 |
|   4 |     WINDOW SORT        |              |  19200|   618K|   912K|   205   (1)| 00:00:03 |
|   5 |      TABLE ACCESS FULL | ACTIVITY_NOV |  19200|   618K|       |    30   (0)| 00:00:01 |
-----------------------------------------------------------------------------------------------
Query MAX (Remove First_Value, Adding Max in Outer Level)
SQL
SELECT /* NOV_MAX */
person_id, start_date, end_date, activity_name, id,
group_start,
Max (end_date) OVER (PARTITION BY person_id, group_start) group_end
FROM (
SELECT
group_start
FROM (
start_date-1)) OR
end_date) OR
FROM activity_nov
)
)
Explain Plan
-------------------------------------------------------------------------------------------------
| Id  | Operation                | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
-------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |              |       |       |       |  1041 (100)|          |
|   1 |  SORT ORDER BY           |              |  19200|  1125K|  1448K|  1041   (1)| 00:00:13 |
|   2 |   WINDOW SORT            |              |  19200|  1125K|  1448K|  1041   (1)| 00:00:13 |
|   3 |    VIEW                  |              |  19200|  1125K|       |   484   (1)| 00:00:06 |
|   4 |     WINDOW SORT          |              |  19200|  1125K|  1448K|   484   (1)| 00:00:06 |
|   5 |      VIEW                |              |  19200|  1125K|       |   205   (1)| 00:00:03 |
|   6 |       WINDOW SORT        |              |  19200|   618K|   912K|   205   (1)| 00:00:03 |
|   7 |        TABLE ACCESS FULL | ACTIVITY_NOV |       |   618K|       |    30   (0)| 00:00:01 |
-------------------------------------------------------------------------------------------------
Query Analytics (Original)
Explain Plan
----------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |       |       |       |   205 (100)|          |
|   1 |  WINDOW SORT          |              |  19200|  1293K|       |   205   (1)| 00:00:03 |
|   2 |   VIEW                |              |  19200|  1293K|       |   205   (1)| 00:00:03 |
|   3 |    WINDOW SORT        |              |  19200|   618K|   912K|   205   (1)| 00:00:03 |
|   4 |     TABLE ACCESS FULL | ACTIVITY_NOV |       |   618K|       |    30   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Query Model (Original)
Explain Plan
----------------------------------------------------------------------------------------------
| Id  | Operation             | Name         | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
----------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |              |       |       |       |   379 (100)|          |
|   1 |  SORT ORDER BY        |              |  19200|   618K|   912K|   379   (1)| 00:00:05 |
|   2 |   SQL MODEL ORDERED   |              |  19200|   618K|       |   379   (1)| 00:00:05 |
|   3 |    WINDOW SORT        |              |  19200|   618K|   912K|   379   (1)| 00:00:05 |
|   4 |     TABLE ACCESS FULL | ACTIVITY_NOV |       |   618K|       |    30   (0)| 00:00:01 |
----------------------------------------------------------------------------------------------
Query NOF (Replace First_Value with Last_Value Inverted)
The query structure is essentially unchanged.
SQL
SELECT /* ANA_NOF */
group_start,
CASE Last_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date DESC)
WHEN To_Date('01-JAN-3000', 'DD-MON-YY') THEN NULL ELSE
Last_Value (group_end IGNORE NULLS) OVER (PARTITION BY person_id ORDER BY start_date DESC)
END group_end
FROM (
FROM (
FROM activity
)
)
Explain Plan
---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |       |  2745 (100)|          |
|   1 |  WINDOW SORT             |          |  94880|  6393K|  8264K|  2745   (1)| 00:00:33 |
|   2 |   WINDOW SORT            |          |  94880|  6393K|  8264K|  2745   (1)| 00:00:33 |
|   3 |    VIEW                  |          |  94880|  6393K|       |  1199   (1)| 00:00:15 |
|   4 |     WINDOW BUFFER        |          |  94880|  5559K|       |  1199   (1)| 00:00:15 |
|   5 |      VIEW                |          |  94880|  5559K|       |  1199   (1)| 00:00:15 |
|   6 |       WINDOW SORT        |          |  94880|  3891K|  5592K|  1199   (1)| 00:00:15 |
|   7 |        TABLE ACCESS FULL | ACTIVITY |       |       |       |   171   (1)| 00:00:03 |
---------------------------------------------------------------------------------------------
Query MAX (Remove First_Value, Adding Max in Outer Level)
SQL
SELECT /* ANA_MAX */
person_id, start_date, end_date, activity_name, id,
group_start,
CASE Max (Nvl(end_date, To_Date('01-JAN-3000', 'DD-MON-YY'))) OVER (PARTITION BY person_id,
group_start) WHEN To_Date('01-JAN-3000', 'DD-MON-YY') THEN NULL ELSE
Max (end_date) OVER (PARTITION BY person_id, group_start) END group_end
FROM (
SELECT /* ANA_OVL */
group_start
FROM (
FROM (
FROM activity
)
)
)
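Reading the CASE expression in this query, the far-future date appears to act as a sentinel. Max ignores nulls, so Nvl maps a missing end date to 01-JAN-3000, and a maximum equal to that date is turned back into NULL. As a result, an open-ended record anywhere in a group makes the group end null instead of being silently skipped. The Python sketch below shows that pattern for one (person_id, group_start) partition. It assumes None represents a null end date and that no real end date equals the sentinel, and its names are hypothetical.

```python
from datetime import date

# Far-future stand-in for a missing end date, as in To_Date('01-JAN-3000', ...);
# assumes no real end date equals it.
SENTINEL = date(3000, 1, 1)


def group_end_with_open_ranges(end_dates):
    """Group end for one (person_id, group_start) partition.

    Mirrors CASE Max (Nvl (end_date, sentinel)) WHEN sentinel THEN NULL ELSE Max (end_date) END:
    if any member has no end date, the group is open-ended (None); otherwise the plain maximum.
    """
    biggest = max(SENTINEL if e is None else e for e in end_dates)
    return None if biggest == SENTINEL else biggest
```

Without the sentinel, a plain maximum would return the latest known end date and hide the fact that the group has no end.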
Explain Plan
---------------------------------------------------------------------------------------------
| Id  | Operation                | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
---------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT         |          |       |       |       |  2745 (100)|          |
|   1 |  WINDOW SORT             |          |  94880|  6393K|  8264K|  2745   (1)| 00:00:33 |
|   2 |   WINDOW SORT            |          |  94880|  6393K|  8264K|  2745   (1)| 00:00:33 |
|   3 |    VIEW                  |          |  94880|  6393K|       |  1199   (1)| 00:00:15 |
|   4 |     WINDOW BUFFER        |          |  94880|  5559K|       |  1199   (1)| 00:00:15 |
|   5 |      VIEW                |          |  94880|  5559K|       |  1199   (1)| 00:00:15 |
|   6 |       WINDOW SORT        |          |  94880|  3891K|  5592K|  1199   (1)| 00:00:15 |
|   7 |        TABLE ACCESS FULL | ACTIVITY |       |       |       |   171   (1)| 00:00:03 |
---------------------------------------------------------------------------------------------
Query Analytics (Original)
Explain Plan
--------------------------------------------------------------------------------------------
| Id  | Operation               | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
--------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT        |          |       |       |       |  1199 (100)|          |
|   1 |  WINDOW SORT            |          |  94880|  6393K|       |  1199   (1)| 00:00:15 |
|   2 |   VIEW                  |          |  94880|  6393K|       |  1199   (1)| 00:00:15 |
|   3 |    WINDOW BUFFER        |          |  94880|  5559K|       |  1199   (1)| 00:00:15 |
|   4 |     VIEW                |          |  94880|  5559K|       |  1199   (1)| 00:00:15 |
|   5 |      WINDOW SORT        |          |  94880|  3891K|  5592K|  1199   (1)| 00:00:15 |
|   6 |       TABLE ACCESS FULL | ACTIVITY |       |       |       |   171   (1)| 00:00:03 |
--------------------------------------------------------------------------------------------
Query Model (Original)
Explain Plan
------------------------------------------------------------------------------------------
| Id  | Operation             | Name     | Rows  | Bytes |TempSpc| Cost (%CPU)| Time     |
------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT      |          |       |       |       |  2226 (100)|          |
|   1 |  SORT ORDER BY        |          |  94880|  3891K|  5592K|  2226   (1)| 00:00:27 |
|   2 |   SQL MODEL ORDERED   |          |  94880|  3891K|       |  2226   (1)| 00:00:27 |
|   3 |    WINDOW SORT        |          |  94880|  3891K|  5592K|  2226   (1)| 00:00:27 |
|   4 |     TABLE ACCESS FULL | ACTIVITY |       |       |       |   171   (1)| 00:00:03 |
------------------------------------------------------------------------------------------
Group Sizes by Depth
The output consists of all the records (76,000) and the table below gives the average group sizes, which
are written to the log by a query in the data setup program.
Depth           D1      D3      D9     D27     D81    D243    D729   D2187
Group Size       1       2       5      14      41     125     356     985
CPU Times
Depth ->           D1      D3      D9     D27     D81    D243    D729   D2187
Group Size ->       1       2       5      14      41     125     356     985
ANA            694.71  340.58  130.57   50.47   21.28   10.87    8.63    7.74
NOF              7.05    6.63    6.88    7.24    6.53    6.92    7.12    7.27
MAX              5.78    5.91    5.46    4.99    5.32    5.60    5.84    5.42
MOD              7.78    7.48    7.62    7.55    7.78    7.53    8.08    7.45
Slice Graph (Wide Slice)
[Graph not reproduced]
Group Sizes by Depth
The output consists of all the records (96,000) and the table below gives the average group sizes, which
are written to the log by a query in the data setup program.
Depth           D1      D3      D9     D27     D81    D243    D729   D2187
Group Size       4       6       9      33     120    2602   24615   32000
CPU Times
Depth ->           D1      D3      D9     D27     D81    D243    D729   D2187
Group Size ->       4       6       9      33     120    2602   24615   32000
ANA            278.09  180.97  120.87   36.26   16.60    9.13    9.37    8.40
NOF              9.18    8.70    8.89    9.53    8.76    8.83    9.15    8.42
MAX              9.19    8.95    8.95    8.83    8.63    8.69    8.95    8.60
MOD             11.76   12.12   12.33   11.37   11.61   11.64   11.97   11.47
Slice Graph (Wide Slice)
[Graph not reproduced]
Conclusions
Solution methods have been presented for a number of range-based SQL grouping problems, including
relatively new techniques from Oracle Database 10.1 and 11.2. It has been shown that the best method
depends not just on the size of the data set, but also on its shape. A few summary points may be made in
relation to these problems:
The Model clause tends to produce relatively simple SQL that performs consistently across data sets
The new Recursive Subquery Factor feature can be extremely efficient in cases where the records in the solution set are much fewer than the total, but only works for a single group
Solutions using analytic functions are slightly more efficient than model solutions where such solutions exist, but an important performance glitch in certain cases has been identified and needs to be worked around
Explain plan costings should be treated with caution
SQL developers interested in performance need to be proficient in all three techniques (most are familiar only with the oldest of the three, analytic functions, available since Oracle v8)
Performance testing can be more effective when executed by automated methods across multidimensional domains

