Using the New MODEL Clause in Oracle Database 10g
by Anthony Molinaro
08/11/2004
One of the great new features of Oracle's flagship database software, Oracle Database 10g, is its new MODEL
clause, which you can use in SELECT statements. In this article we'll look at some examples of the MODEL
clause in action, and show how you can use MODEL to manipulate your data.
The simplest MODEL clause example does nothing more than a regular SELECT statement. For example, this plain query:
select empno,ename,sal
from emp;
returns the same rows as this MODEL query:
select empno,ename,sal
from emp
model
dimension by (empno)
measures (ename,sal)
rules ();
The MODEL clause example simply returns all the employee numbers,
names, and salaries from the emp table. Nothing out of the ordinary
happened, but the syntax is obviously more than just "select ... from ...."
The measures, ename and sal, are our arrays. So, when using the MODEL
clause, the columns that make up our tables can be treated like arrays,
and each row and column can be manipulated independently, just like an array element.
The DIMENSION BY clause is used to identify a specific array value. So, in the
example above, we have two arrays named ename and sal whose default
values are the names and salaries of the employees. The way to access an
individual name or salary is to reference the "dimension"--in this case the
employee number.
For example, how would you reference the name King or King's salary?
You would use ename[7839] or sal[7839], respectively. The array that
holds the employee names is ename[], and referencing ename[7839]
returns a specific name, KING.
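Cell references can also be read on the right-hand side of a rule, not just assigned. As a minimal sketch (assuming Scott's standard emp table; pct_of_king is a hypothetical measure added only for this illustration), the query below reads the single cell sal[7839] to express every salary as a percentage of King's. The cv() function, covered later in this article, returns the current empno of the cell being assigned.

```sql
-- Sketch: read one cell (sal[7839]) and every cell (sal[any]) of the sal array.
-- pct_of_king is a hypothetical measure for this example; it starts at 0.
select empno, ename, sal, pct_of_king
from emp
model
  dimension by (empno)
  measures (ename, sal, 0 pct_of_king)
  rules (
    -- for every existing empno, divide that row's salary (cv() = current empno)
    -- by the salary held in the single cell sal[7839], KING's row
    pct_of_king[any] = round(100 * sal[cv()] / sal[7839], 1)
  );
```

Because the left-hand side uses ANY, the rule only updates cells that already exist; it never adds rows.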
Since we can treat our rows like arrays, we can easily modify their values
through assignment. Let's change King's name to HOMER and his salary
to 0:
select empno,ename,sal
from emp
model
dimension by (empno)
measures (ename,sal)
rules (
ename[7839] = 'HOMER',
sal[7839] = 0
);
Not only can we modify existing values in our result set, but we can also add values that don't exist. (Please
note that we are not performing DML (Data Manipulation Language) on the table; we're just modifying the
result set.)
select empno,ename,sal
from emp
model
dimension by (empno)
measures (ename,sal)
rules (
ename[7839] = 'HOMER',
sal[7839] = 0,
ename[9999] = 'MR.BURNS',
sal[9999] = 250
);
MR.BURNS with a salary of 250 does not exist in the emp table, but we easily added it to the result set.
Using DECODE or CASE, we can easily change values in a result set just like we did in the example with
HOMER, but the MODEL clause makes it easier to add new rows to the result set.
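For comparison, here is one way to get the HOMER/MR.BURNS result without MODEL, followed by a MODEL variant. Adding MR.BURNS works because MODEL rules default to UPSERT semantics, which create missing cells for single-cell (positional) references; asking for RULES UPDATE instead only changes cells that already exist. Both statements are sketches against Scott's emp table.

```sql
-- 1) Without MODEL: CASE rewrites KING's row, and a separate
--    UNION ALL branch is needed to add the row for empno 9999.
select empno,
       case when empno = 7839 then 'HOMER' else ename end ename,
       case when empno = 7839 then 0 else sal end sal
from emp
union all
select 9999, 'MR.BURNS', 250
from dual;

-- 2) MODEL with UPDATE semantics: KING's existing cells are changed,
--    but the assignments to empno 9999 are ignored because those
--    cells don't exist, so no MR.BURNS row appears.
select empno, ename, sal
from emp
model
  dimension by (empno)
  measures (ename, sal)
  rules update (
    ename[7839] = 'HOMER',
    sal[7839]   = 0,
    ename[9999] = 'MR.BURNS',
    sal[9999]   = 250
  );
```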
The Oracle documentation explains how to use the MODEL clause in detail. The point of the simple examples
above is to introduce you to the syntax and how the MODEL clause allows you to manipulate your data.
A practical use of MODEL-based forecasting (for any DBA or database developer) could be to determine future
tablespace growth based on past growth during the last n months. An example of calculating exponential
growth is included in the documentation. Because of the flexibility of the MODEL clause, you can easily
forecast more accurate growth patterns using, say, best-fit polynomials rather than just calculating
exponential growth patterns (which may not be realistic).
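As a rough sketch of the tablespace scenario: the ts_usage subquery below is hypothetical sample data (12 months of used space in MB), and the growth model is plain compound growth, with the monthly rate taken from the first and last observed months. This is not the documentation's example. A FOR loop on the left-hand side names the three future months positionally, so the default UPSERT semantics insert them as new rows; a best-fit polynomial would simply replace the right-hand side expression.

```sql
-- Sketch only: ts_usage is hypothetical sample data standing in for a real
-- history table (one row per month, mon = 1..12, used_mb = space used in MB).
with ts_usage as (
  select level mon, 100 + 10 * level used_mb
  from dual
  connect by level <= 12
)
select mon, round(used_mb, 1) used_mb
from ts_usage
model
  dimension by (mon)
  measures (used_mb)
  rules (
    -- compound monthly growth rate observed between month 1 and month 12:
    --   rate = (used_mb[12] / used_mb[1]) ^ (1/11)
    -- months 13..15 don't exist yet; the FOR loop references them
    -- positionally, so UPSERT semantics insert them as new rows
    used_mb[for mon from 13 to 15 increment 1] =
      used_mb[12] * power(used_mb[12] / used_mb[1], (cv(mon) - 12) / 11)
  )
order by mon;
```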
Another useful feature of the MODEL clause is that it lets you embed procedural logic, including some fairly
complex code, directly in your SQL. The power of SQL lies in its ability to
process data in a set-oriented fashion. The MODEL clause retains this set-based nature and also introduces
procedural power and flexibility directly into your SQL. The aim of this article is to introduce you to the
procedural capabilities of the 10g MODEL clause and its effect on performance and problem solving.
I'll use the standard emp table from Scott's schema (the demo schema that comes with every Oracle database),
along with a table called EMP_SCORE, which holds 28 rows: two scores for each of the 14 employees in emp.
The SCORE column represents the employee's scores during two evaluations.
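The definitions of EMP_SCORE and of the object types used by the pipelined function aren't shown in this article. A minimal sketch consistent with the code that follows might look like this; the column types and sizes, and the collection type name emp_score_tab, are assumptions.

```sql
-- Assumed definitions: column types, sizes, and the collection
-- type name emp_score_tab are guesses, not the author's DDL.
create table emp_score (
  empno number(4) not null,  -- matches emp.empno
  score number     not null  -- one row per evaluation, two rows per employee
);

-- Row shape used by the pipelined function: the two input
-- columns plus the CSV list of intermediate sums.
create or replace type emp_score_obj as object (
  empno number(4),
  score number,
  list  varchar2(4000)
);
/

create or replace type emp_score_tab as table of emp_score_obj;
/
```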
The "power score" is computed by summing the two prior scores n times (for this example, after the initial
sum, we'll just sum twice to calculate the power score). So, for example, if an employee scored 1 and 5, his
power score would be 17, because 1+5=6, 6+5=11, and 11+6=17. The CSV list would display all the numbers
involved in getting to the final score, which in this case is 1,5,6,11,17.
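The 1-and-5 example can be checked with a MODEL query on DUAL, so no table is needed. This sketch hard-codes the two scores as measures (ls holds the prior score, s the current one) and uses the same four rules, in the same order, as the MODEL solution shown later in this article. ITERATE (3) performs the initial sum plus the two additional sums.

```sql
-- Worked example for scores 1 and 5; tmp is scratch space.
select s power_score, list
from dual
model
  dimension by (0 d)
  measures (5 s, 1 ls, 0 tmp, cast('1,5' as varchar2(40)) list)
  rules iterate (3) (
    tmp[0]  = s[0],                    -- remember the current score
    s[0]    = s[0] + ls[0],            -- new score = current + prior
    ls[0]   = tmp[0],                  -- old current score becomes the prior
    list[0] = list[0] || ',' || s[0]   -- append the new score
  );
```

The rules run in sequential order within each iteration, so s goes 6, 11, 17 and the query returns one row with POWER_SCORE 17 and LIST 1,5,6,11,17.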
Based on the data in EMP_SCORE, the results for employee 7369 should look like this:
Due to the recursive nature of the computation (we see Fibonacci in there), my first attempt made use of the
analytic LAG along with the WITH clause to calculate the power score, while the CSV list was constructed in a
hierarchical fashion. The CSV was easy enough, but the power score was tough to compute efficiently
because future rows depended on rows created through past computation (rows that didn't yet exist). After
some testing, the SQL-only approach proved to be slow, and its results were also a bit inaccurate.
I finally settled on a pipelined table function, much like the one below:
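create or replace function get_emp_power_score
return emp_score_tab pipelined -- emp_score_tab: assumed name of a nested table type of emp_score_obj
is
l_data emp_score_tab := emp_score_tab(); -- holds the current employee's row
l_score1 number; -- most recent score
l_score2 number; -- score before it
l_tmp number; -- running sum
l_list varchar2(4000); -- CSV list being built
begin
for i in (
select emp_score_obj (empno,score,null) emp_row
from emp_score
order by empno, score -- order each employee's scores, as the MODEL version does
)
loop
if ( l_data.count() = 0 )
then
/* this is the first loop iteration: set l_data to the first row in the loop */
l_data.extend();
l_data(l_data.last()) := i.emp_row;
elsif ( l_data(l_data.last()).empno = i.emp_row.empno )
then
/* second score for the same employee: compute the power score and CSV list */
l_score2 := l_data(l_data.last()).score;
l_score1 := i.emp_row.score;
l_tmp := l_score1 + l_score2;
l_list := l_data(l_data.last()).score || ',' || i.emp_row.score ||
',' || l_tmp || ',';
for j in 1 .. 2
loop
l_score2 := l_score1;
l_score1 := l_tmp;
l_tmp := l_score1 + l_score2;
l_list := l_list || l_tmp || ',';
end loop;
l_data(l_data.last()).score := l_tmp;
l_data(l_data.last()).list := rtrim(l_list,',');
else
/* reached a new employee, pipe the row and reset l_data */
pipe row (l_data(l_data.last()));
l_data(l_data.last()) := i.emp_row;
end if;
end loop;
/* pipe the last employee's row, which the loop never reaches */
if ( l_data.count() > 0 )
then
pipe row (l_data(l_data.last()));
end if;
return;
end get_emp_power_score;
/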
Since we were returning the rows in a pipelined (streaming) fashion, the performance was fine initially. It was
when the function was called constantly and then joined with other tables that we ran into trouble. We can
get a glimpse of the potential problems even when using the tiny emp_score table:
14 rows selected.
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=26 Card=8168 Bytes=16336)
1 0 SORT (ORDER BY) (Cost=26 Card=8168 Bytes=16336)
2 1 COLLECTION ITERATOR (PICKLER FETCH) OF 'GET_EMP_POWER_SCORE'
Statistics
----------------------------------------------------------
1 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
732 bytes sent via SQL*Net to client
512 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed
For frequently executing SQL, recursive calls can be problematic, but the main problem here is the erroneous
cardinality estimate (which will vary based on your db block size). The solution is to use the CARDINALITY
hint. At the time, this proved to be a huge help, but this "solution" had two fundamental problems:
1. It's a hint.
2. It's static, so the cardinality you pick at time t1 might be wrong at time t2.
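The article doesn't show exactly how the hint was applied. A sketch, assuming the 14-row result seen above: CARDINALITY is an undocumented hint that takes a table alias and a row count, and the alias (t here) must match the one in the FROM clause. The fixed 14 is precisely the static value that can go stale.

```sql
-- Tell the optimizer to expect about 14 rows from the table function
-- instead of the block-size-based default (Card=8168 in the plan above).
select /*+ cardinality(t 14) */ *
from table( get_emp_power_score() ) t;
```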
Along with the cardinality error, in 9.2.0.1 there was a bug when trying to either join a pipelined function to
another table or use it in the WHERE clause as an argument to the IN operator to filter multiple rows. The
9.2.0.4 patch fixed those problems but the cardinality error remained, and large tables that were joined with
these table functions were being full-table-scanned. Regardless of the number of rows returned (which was
usually very small), the full scans were still being performed.
For this particular problem, the MODEL clause proved to be a nice solution. By incorporating the MODEL clause,
we were able to:
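select empno,
s power_score,
list
from (
select score,
empno,
lag(score) over (partition by empno order by score) ls
from emp_score
)
where ls is not null
model
dimension by (empno)
measures (score s, ls, 0 tmp, cast(ls||','||score as varchar2(20)) list)
rules iterate(3) (
-- save the current score
tmp[any] = s[cv()],
-- compute the new score
s[any] = s[cv()] + ls[cv()],
-- the saved score becomes the prior score
ls[any] = tmp[cv()],
-- append the new score to the CSV list
list[any] = list[cv()]||','||s[cv()]
)
order by 2 desc, 1;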
14 rows selected.
Let's briefly examine the example above before moving on. While the inline comments let you follow the
logical flow of the code, I'd like to elaborate a bit on certain areas. The meaning behind the MODEL-specific
syntax is not immediately obvious but is covered in great detail in the Oracle documentation, and it makes
sense once you begin using it.
First, I've used the analytic function LAG(). For those not familiar with LAG(), it allows you to access prior
rows in your result set without having to use a self-join. So if the results were initially like this:
EMPNO SCORE
---------- ----------
7369 2
7369 4
LAG lets me access scores 2 and 4 at the same time without a self-join.
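A minimal sketch of both approaches, assuming EMP_SCORE holds the two rows shown above for employee 7369: LAG reads the previous score within each employee's partition, while the self-join has to pair the table with itself (and assumes the two scores differ).

```sql
-- LAG: each row sees the previous score for the same empno, ordered by score.
select empno, score,
       lag(score) over (partition by empno order by score) ls
from emp_score
where empno = 7369;

-- Self-join equivalent: pair each score with a lower score of the same employee.
select b.empno, b.score, a.score ls
from emp_score a, emp_score b
where a.empno = b.empno
and a.score < b.score
and b.empno = 7369;
```

For the two rows above, the LAG query returns LS as null for score 2 and 2 for score 4; the WHERE ls IS NOT NULL filter in the MODEL query keeps only that second row.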
You'll also notice ITERATE(3) in the RULES clause. In this case, 3 could have been any number (as long as it's
a constant, not a variable or expression--hopefully this will be changed soon).
That instructs the MODEL clause to perform the code in the RULES clause three times.
Let's break down the first rule, tmp[any] = s[cv()]:
1. tmp[] is one of our arrays, and its values default to 0 for every row; that is, tmp[7839] has a value of 0
initially.
2. tmp[any]: the ANY keyword lets you reference all empnos; that is, "for any empno in the table" (ALL
might have been more intuitive).
3. s[cv()]: s[] is another array and defaults to the last score in emp_score for every empno; that is, s[7839]
has a value of 4. (Only the last score is kept in s[]; the first score is kept in ls[].) cv() allows you to
reference the current value of the dimension. I've used empty parentheses so the position will indicate
the value, but you can be explicit: s[cv(empno)].
On the first iteration, tmp[7839] is set to 4, the initial value of s[7839].
On the second iteration, tmp[7839] is set to 5 (the second score of 4 + the prior score of 1).
On the third iteration, tmp[7839] is set to 9 (the new score of 5 + the second score of 4).
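ITERATE requires a constant count, but Oracle Database 10g also accepts an optional UNTIL condition, checked after each iteration, that ends the loop early. A small sketch on DUAL: the count of 1000 becomes only an upper bound, and the ITERATION_NUMBER function (which returns 0 during the first iteration) records how many passes ran. The literal 100 could just as well be a bind variable.

```sql
-- Keep doubling s until it reaches at least 100; 1000 is just a safety cap.
select s, iters
from dual
model
  dimension by (0 d)
  measures (1 s, 0 iters)
  rules iterate (1000) until (s[0] >= 100) (
    s[0]     = s[0] * 2,
    iters[0] = iteration_number + 1  -- iteration_number is 0-based
  );
```

Here s ends at 128 and iters at 7, because 64 after six passes is still below 100.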
Now that we know what is going on, let's see what AUTOTRACE says:
Execution Plan
----------------------------------------------------------
0 SELECT STATEMENT Optimizer=ALL_ROWS (Cost=5 Card=28 Bytes=1092)
1 0 SORT (ORDER BY) (Cost=5 Card=28 Bytes=1092)
2 1 SQL MODEL (ORDERED) (Cost=5 Card=28 Bytes=1092)
3 2 VIEW (Cost=4 Card=28 Bytes=1092)
4 3 WINDOW (SORT) (Cost=4 Card=28 Bytes=196)
5 4 TABLE ACCESS (FULL) OF 'EMP_SCORE' (TABLE) (Cost=3 Card=28
Bytes=196)
Statistics
----------------------------------------------------------
0 recursive calls
0 db block gets
7 consistent gets
0 physical reads
0 redo size
738 bytes sent via SQL*Net to client
512 bytes received via SQL*Net from client
2 SQL*Net roundtrips to/from client
2 sorts (memory)
0 sorts (disk)
14 rows processed
According to autotrace, the performance is about the same, but notice there are no recursive calls since this is
just SQL and the cardinality estimates are correct as well. By using the MODEL clause, not only do we help the
optimizer make better decisions, but we also get (some) flexibility of procedural programming while keeping
the set-based power.
The example above demonstrates that the MODEL clause gives us the ability to:
Look at the syntax! This opens doors to new thinking when dealing with relational data. Things that were
impossible or extremely inefficient to implement in SQL may now be as simple as using SELECT. I'm alluding
to the possibility of performing matrix (eigenvalue) calculations or truly complex temporal functions directly
in SQL. There is the potential for some great things here, and it's all in SQL.
To investigate the potential benefits of the MODEL clause further, let's look at a snippet from a 10046
trace of each of the two examples above, starting with the pipelined function.
=====================
PARSING IN CURSOR #1 len=44 dep=0 uid=57 oct=3 lid=57 tim=11151268307
hv=4265205233 ad='183eee1c'
select * from table( get_emp_power_score() )
END OF STMT
PARSE #1:c=0,e=230,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=11151268295
BINDS #1:
EXEC #1:c=0,e=286,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=11151269167
WAIT #1: nam='SQL*Net message to client' ela= 9 p1=1111838976 p2=1 p3=0
=====================
PARSING IN CURSOR #2 len=78 dep=1 uid=57 oct=3 lid=57 tim=11151270047
hv=3940482563 ad='1911392c'
SELECT EMP_SCORE_OBJ (EMPNO,SCORE,NULL) EMP_ROW FROM EMP_SCORE ORDER BY EMPNO
END OF STMT
PARSE #2:c=0,e=160,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=11151270035
BINDS #2:
EXEC #2:c=0,e=261,p=0,cr=0,cu=0,mis=0,r=0,dep=1,og=1,tim=11151271005
=====================
PARSING IN CURSOR #3 len=47 dep=2 uid=0 oct=3 lid=0 tim=11151271963
hv=1023521005 ad='1a6876ec'
select metadata from kopm$ where name='DB_FDO'
END OF STMT
PARSE #3:c=0,e=191,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=11151271952
BINDS #3:
EXEC #3:c=0,e=199,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=11151272830
FETCH #3:c=0,e=69,p=0,cr=2,cu=0,mis=0,r=1,dep=2,og=4,tim=11151273029
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=353 op='TABLE ACCESS BY INDEX ROWID KOPM$
(cr=2 pr=0 pw=0 time=75 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=354 op='INDEX UNIQUE SCAN I_KOPM1 (cr=1 pr=0
pw=0 time=42 us)'
FETCH #2:c=0,e=3539,p=0,cr=9,cu=0,mis=0,r=28,dep=1,og=1,tim=11151274675
FETCH #1:c=0,e=5819,p=0,cr=9,cu=0,mis=0,r=1,dep=0,og=1,tim=11151275272
WAIT #1: nam='SQL*Net message from client' ela= 401 p1=1111838976 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 5 p1=1111838976 p2=1 p3=0
FETCH #1:c=0,e=966,p=0,cr=0,cu=0,mis=0,r=13,dep=0,og=1,tim=11151277220
WAIT #1: nam='SQL*Net message from client' ela= 70159 p1=1111838976 p2=1 p3=0
=====================
MODEL clause
=====================
PARSING IN CURSOR #1 len=497 dep=0 uid=57 oct=3 lid=57 tim=61027987923
hv=1265802836 ad='18e326a8'
select empno,
s power_score,
list
from (
select score,
empno,
lag(score) over (partition by empno order by score) ls /* lag score */
from emp_score
)
where ls is not null
model
dimension by (empno)
measures (score s, ls, 0 tmp, cast(ls||','||score as varchar2(20)) list)
rules iterate(3) (
tmp[any] = s[cv()],
s[any] = s[cv()] + ls[cv()],
ls[any] = tmp[cv()],
list[any] = list[cv()]||','||s[cv()]
)
order by 2 desc, 1
END OF STMT
PARSE #1:c=0,e=167,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=61027987912
BINDS #1:
EXEC #1:c=0,e=306,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,tim=61027990149
WAIT #1: nam='SQL*Net message to client' ela= 8 p1=1111838976 p2=1 p3=0
FETCH #1:c=0,e=2804,p=0,cr=7,cu=0,mis=0,r=1,dep=0,og=1,tim=61027993234
WAIT #1: nam='SQL*Net message from client' ela= 407 p1=1111838976 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 4 p1=1111838976 p2=1 p3=0
FETCH #1:c=0,e=251,p=0,cr=0,cu=0,mis=0,r=13,dep=0,og=1,tim=61027994500
WAIT #1: nam='SQL*Net message from client' ela= 123843 p1=1111838976 p2=1 p3=0
STAT #1 id=1 cnt=14 pid=0 pos=1 obj=0 op='SORT ORDER BY (cr=7 pr=0 pw=0
time=2874 us)'
STAT #1 id=2 cnt=14 pid=1 pos=1 obj=0 op='SQL MODEL ORDERED (cr=7 pr=0 pw=0
time=2760 us)'
STAT #1 id=3 cnt=14 pid=2 pos=1 obj=0 op='VIEW (cr=7 pr=0 pw=0 time=572 us)'
STAT #1 id=4 cnt=28 pid=3 pos=1 obj=0 op='WINDOW SORT (cr=7 pr=0 pw=0 time=613
us)'
STAT #1 id=5 cnt=28 pid=4 pos=1 obj=51474 op='TABLE ACCESS FULL EMP_SCORE (cr=7
pr=0 pw=0 time=263 us)'
==========================================
Observe the extra work being done by the CBO to convert our PL/SQL into a valid table expression that can
be used in SQL:
=====================
PARSING IN CURSOR #3 len=47 dep=2 uid=0 oct=3 lid=0 tim=11151271963
hv=1023521005 ad='1a6876ec'
select metadata from kopm$ where name='DB_FDO'
END OF STMT
PARSE #3:c=0,e=191,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=11151271952
BINDS #3:
EXEC #3:c=0,e=199,p=0,cr=0,cu=0,mis=0,r=0,dep=2,og=4,tim=11151272830
FETCH #3:c=0,e=69,p=0,cr=2,cu=0,mis=0,r=1,dep=2,og=4,tim=11151273029
STAT #3 id=1 cnt=1 pid=0 pos=1 obj=353 op='TABLE ACCESS BY INDEX ROWID KOPM$
(cr=2 pr=0 pw=0 time=75 us)'
STAT #3 id=2 cnt=1 pid=1 pos=1 obj=354 op='INDEX UNIQUE SCAN I_KOPM1 (cr=1 pr=0
pw=0 time=42 us)'
FETCH #2:c=0,e=3539,p=0,cr=9,cu=0,mis=0,r=28,dep=1,og=1,tim=11151274675
FETCH #1:c=0,e=5819,p=0,cr=9,cu=0,mis=0,r=1,dep=0,og=1,tim=11151275272
WAIT #1: nam='SQL*Net message from client' ela= 401 p1=1111838976 p2=1 p3=0
WAIT #1: nam='SQL*Net message to client' ela= 5 p1=1111838976 p2=1 p3=0
FETCH #1:c=0,e=966,p=0,cr=0,cu=0,mis=0,r=13,dep=0,og=1,tim=11151277220
WAIT #1: nam='SQL*Net message from client' ela= 70159 p1=1111838976 p2=1 p3=0
=====================
kopm$ is the data structure being used to store and pipe our rows out. This is part of how the results of a
PL/SQL function are transformed into a valid table expression. Although it may seem harmless, more work is
involved when using object types and table functions in SQL, and this could come into play during peak load
times or complex queries.
Conclusion
By using the MODEL clause, I was able to move the PL/SQL logic directly into SQL, thus avoiding the
recursive calls and context switching that can result from calling PL/SQL in SQL. Ultimately this improves
performance.
The MODEL clause is not a cure-all, but if you take the time to learn it and open yourself to new ideas, it can be
a great new tool to have. In the right situation it could not only make the difference between poor and great
performance, but also provide you an opportunity to do something exclusively in SQL that normally requires
a procedural language.
It's almost like a new language; you really begin to think about your data and result sets differently.
It gives you array access to your rows; procedural programmers may find the transition to set-based
programming easier.
You'll be able to easily "make up" data and generate rows.
It brings recursive power to SQL.
The syntax quickly becomes very intuitive--PROLOG programmers will find this especially true.
It lets you perform complex inter-row calculations.
It combines procedural flexibility with set-based processing power.
Although ITERATE itself requires a constant, the MODEL clause also offers an UNTIL option, which lets you specify an exit condition using a bind variable or expression.
The MODEL clause may seem more like a niche addition to SQL rather than a long-awaited solution.
Once this new feature has been accepted and used by a large number of developers, its usefulness will
grow as developers discover clever and unexpected uses for it. In other words, give it
time, and it will grow on you.