Sec: 8.1 Datalog

Facts and Rules


A relational database about students and courses

student
Name         Major  Year
"Joe Doe"    cs     senior
"Jim Jones"  cs     junior
"Jim Black"  ee     junior

took
Name         Course  Grade
"Joe Doe"    cs123   2.7
"Jim Jones"  cs101   3.0
"Jim Jones"  cs143   3.3
"Jim Black"  cs143   3.3
"Jim Black"  cs101   2.7

The same fact base for Datalog


student("Joe Doe", cs, senior).
student("Jim Jones", cs, junior).
student("Jim Black", ee, junior).

took("Joe Doe", cs123, 2.7).
took("Jim Jones", cs101, 3.0).
took("Jim Jones", cs143, 3.3).
took("Jim Black", cs143, 3.3).
took("Jim Black", cs101, 2.7).

Zaniolo, Ceri, Faloutsos, Snodgrass, Subrahmanian, Zicari. All Rights Reserved.
Advanced Database Systems, Morgan Kaufmann. Copyright © 1997

Rules
How to express logical conjunction:
Find the names of junior-level students who have taken
both cs101 and cs143

firstreq(Name) ← student(Name, Major, junior),
                 took(Name, cs101, Grade1),
                 took(Name, cs143, Grade2).

Rule head (left of ←), rule body (right of ←).
Upper case for variables, lower case for constants, _ for anonymous variables.
The commas in the body represent logical conjunction.

Junior-level students who took course cs131 or course
cs151 with a grade better than 3.0 (two rules with the same head express disjunction):

scndreq(Name) ← took(Name, cs131, Grade), Grade > 3.0,
                student(Name, Major, junior).
scndreq(Name) ← took(Name, cs151, Grade), Grade > 3.0,
                student(Name, _, junior).


QUERIES
A closed query; the answer to such a query is either yes or no. For
instance,

?firstreq("Jim Black")

An open query:

?firstreq(X)

and its answer:

firstreq("Jim Jones")
firstreq("Jim Black")
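The open and closed queries above can be checked with a minimal Python sketch. It assumes the facts are stored as sets of tuples; the names `firstreq`, `open_answer`, and `closed_answer` are illustrative choices, not part of any Datalog system.

```python
# Facts copied from the student and took tables of this section.
student = {
    ("Joe Doe", "cs", "senior"),
    ("Jim Jones", "cs", "junior"),
    ("Jim Black", "ee", "junior"),
}
took = {
    ("Joe Doe", "cs123", 2.7),
    ("Jim Jones", "cs101", 3.0),
    ("Jim Jones", "cs143", 3.3),
    ("Jim Black", "cs143", 3.3),
    ("Jim Black", "cs101", 2.7),
}

def firstreq():
    """Evaluate the firstreq rule: each goal is a loop, shared variables become equality tests."""
    return {
        name
        for (name, _major, year) in student
        if year == "junior"
        for (n1, c1, _g1) in took
        if n1 == name and c1 == "cs101"
        for (n2, c2, _g2) in took
        if n2 == name and c2 == "cs143"
    }

open_answer = firstreq()                      # open query ?firstreq(X)
closed_answer = "Jim Black" in open_answer    # closed query ?firstreq("Jim Black")
```

The repeated variable Name in the rule body plays the role of a join condition, which the sketch spells out as `n1 == name` and `n2 == name`.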


The Relational Model vs Datalog


The terminology of Datalog versus that of the relational model

Datalog             Relational Model
Base Predicate      Table or Relation
Derived Predicate   View
Fact                Row or Tuple
Argument            Column

Most of the power is in cascading: derived predicates can be used as goals in other rules.
Both previous requirements must be satisfied to enroll in cs298:

req_cs298(Name) ← firstreq(Name), scndreq(Name).


Negation in Datalog
Only negated goals are allowed. Negated heads are not.
Junior-level students who did not take course cs143:

hastaken(Name, Course) ← took(Name, Course, Grade).
lacks_cs143(Name) ← student(Name, _, junior),
                    ¬hastaken(Name, cs143).

Universal quantification by double negation

Find the senior students who completed all the requirements
for the cs major: ?all_req_sat(X)
The first step is to formulate the complementary query:
Find students who did not take some of the courses required
for a cs major.
We can now re-express the original query as: Find the senior
students who are NOT missing any requirement.

req_missing(Name) ← student(Name, _, senior),
                    req(cs, Course), ¬hastaken(Name, Course).
all_req_sat(Name) ← student(Name, _, senior),
                    ¬req_missing(Name).
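The double-negation pattern can be sketched in Python as two set computations, one per rule. The student and took facts come from this section; the req facts passed to `all_req_sat` are hypothetical, since the slides do not list the cs requirements.

```python
student = {
    ("Joe Doe", "cs", "senior"),
    ("Jim Jones", "cs", "junior"),
    ("Jim Black", "ee", "junior"),
}
took = {
    ("Joe Doe", "cs123", 2.7),
    ("Jim Jones", "cs101", 3.0),
    ("Jim Jones", "cs143", 3.3),
    ("Jim Black", "cs143", 3.3),
    ("Jim Black", "cs101", 2.7),
}

# hastaken(Name, Course) <- took(Name, Course, Grade).
hastaken = {(name, course) for (name, course, _grade) in took}

def all_req_sat(req):
    """req is a set of (Major, Course) facts; these are assumed, not from the source."""
    # req_missing(Name) <- student(Name, _, senior), req(cs, Course), not hastaken(Name, Course).
    req_missing = {
        name
        for (name, _major, year) in student
        if year == "senior"
        for (major, course) in req
        if major == "cs" and (name, course) not in hastaken
    }
    # all_req_sat(Name) <- student(Name, _, senior), not req_missing(Name).
    return {
        name
        for (name, _major, year) in student
        if year == "senior" and name not in req_missing
    }
```

The negated goal is evaluated only after `req_missing` has been fully computed, which is the evaluation order that stratification later makes precise.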

Sec: 8.3 Relational Algebra

Additional Operators
Additional operators of frequent use can be derived from these.
For instance, we have join, semijoin, intersection, division, and
generalized projection.
• The join operator can be constructed using Cartesian product
  and selection. In general, a join has the following form:
      R ⋈_F S, where F = $i1 θ1 $j1 ∧ ... ∧ $ik θk $jk;
  i1, ..., ik are columns of R; j1, ..., jk are columns of S; and
  θ1, ..., θk are comparison operators. Then, if R has arity m, we define
      F' = $i1 θ1 $(m + j1) ∧ ... ∧ $ik θk $(m + jk).
  Therefore,
      R ⋈_F S = σ_F'(R × S)
• The intersection of two relations can be constructed either
  by taking the equijoin of the two relations in every column
  (and then projecting out duplicate columns) or by using the
  following property:
      R ∩ S = R − (R − S) = S − (S − R).
• The generalized projection of a relation R is denoted π_L(R),
  where L is a list of column numbers and constants. Unlike ordinary
  projection, components might appear more than once,
  and constants as components of the list L are permitted (e.g.,
  π_{$1, c, $1} is a valid generalized projection).
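The derived operators can be sketched over Python sets of tuples. This is an illustration under stated assumptions, not an engine: relations are sets of equal-length tuples, column references are written as strings such as "$1", and the function names (`product`, `select`, `join`, `intersection`, `gen_project`) are made up for this example.

```python
import operator

def product(r, s):
    """Cartesian product: concatenate every tuple of r with every tuple of s."""
    return {t + u for t in r for u in s}

def select(r, pred):
    """Selection: keep the tuples for which the condition holds."""
    return {t for t in r if pred(t)}

def join(r, s, m, conds):
    """Theta-join as a selection over r x s.

    m is the arity of r. Each condition (i, theta, j) compares column $i of r
    with column $j of s, which is column $(m + j) of the product (1-based).
    """
    return select(
        product(r, s),
        lambda t: all(theta(t[i - 1], t[m + j - 1]) for i, theta, j in conds),
    )

def intersection(r, s):
    """R intersect S computed as R - (R - S)."""
    return r - (r - s)

def gen_project(r, items):
    """Generalized projection: items are "$i" column references or constants."""
    def pick(t, item):
        if isinstance(item, str) and item.startswith("$"):
            return t[int(item[1:]) - 1]
        return item
    return {tuple(pick(t, item) for item in items) for t in r}

R = {(1, "a"), (2, "b")}
S = {(1, "x"), (3, "y")}
joined = join(R, S, 2, [(1, operator.eq, 1)])   # equijoin on $1 = $1
```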


Relational Operators|Cont
• Selection. σ_F R denotes the selection on R according to the
  selection formula F, where F obeys one of the following patterns:
  (i) $i θ C, where i is a column of R, θ is an arithmetic comparison
      operator, and C is a constant, or
  (ii) $i θ $j, where $i and $j are columns of R, and θ is an
      arithmetic comparison operator, or
  (iii) an expression built from terms such as those described in
      (i) and (ii), above, and the logical connectives ∨, ∧, and ¬.
  Then,
      σ_F R = {t | t ∈ R ∧ F'}
  where F' denotes the formula obtained from F by replacing
  $i and $j with t[i] and t[j].
  For example, if F is "$2 = $3 ∧ $1 = bob", then F' is
  "t[2] = t[3] ∧ t[1] = bob". Thus:
      σ_{$2=$3 ∧ $1=bob} R = {t | t ∈ R ∧ t[2] = t[3] ∧ t[1] = bob}.

All previous operators, except set difference, are monotonic.


Relational Operators
• Cartesian product. The Cartesian product of R and S is
  denoted R × S.
      R × S = {t | (∃r ∈ R)(∃s ∈ S)(t[1, ..., n] = r ∧ t[n + 1, ..., n + m] = s)}
  If R has n columns and S has m columns, then R × S contains
  all the possible (n + m)-tuples whose first n components form
  a tuple in R and whose last m components form a tuple in S.
  Thus, R × S has n + m columns and |R| × |S| tuples, where
  |R| and |S| denote the respective cardinalities of the two
  relations.
• Projection. Let R be a relation with n columns, and L =
  $1, ..., $n be a list of the columns of R. Let L' be a sublist
  of L obtained by (1) eliminating some of the elements, and (2)
  reordering the remaining ones in an arbitrary order. Then,
  the projection of R on columns L', denoted π_L' R, is defined as
  follows:
      π_L' R = {r[L'] | r ∈ R}


Relational Algebra (RA)


A family of operators on relations that have the closure property:
they take relations as arguments and return relations as results.
Set Operators:

• Union. The union of relations R and S, denoted R ∪ S, is
  the set of tuples that are in R, or in S, or in both. Thus, it
  can be defined using TRC as follows:
      R ∪ S = {t | t ∈ R ∨ t ∈ S}
  This operation is defined only if R and S have the same
  number of columns.
• Set difference. The difference of relations R and S, denoted
  R − S, is the set of tuples that belong to R but not to S.
  Thus, it can be defined as follows (t = r denotes that both t
  and r have n components and t[1] = r[1] ∧ ... ∧ t[n] = r[n]):
      R − S = {t | t ∈ R ∧ ¬∃r(r ∈ S ∧ t = r)}
  This operation is defined only if R and S have the same
  number of columns (arity).

Sec: 8.2 Relational Calculi

Commercial DB Languages
The actual query languages of commercial RDBMSs are largely
based on the formal query languages just discussed. For instance:

• Query-By-Example (QBE) is a visual query language based
  on DRC.
• Languages such as QUEL and SQL are instead based on TRC.
  In QUEL and SQL, the notations t.Name and t.Course are used
  instead of t[1] and t[2]; also, existential quantification is
  replaced by the constructs RANGE and FROM, respectively.
Relational algebra provides a good basis for the efficient implementation
of these relational languages.


Relational DB Languages
The differences between the various languages defined so far do
not really impact their ultimate expressive power:

• TRC and DRC are equivalent, and there are mappings that
  transform a formula in one language into an equivalent one
  in the other.
• Also, for each TRC or DRC expression there is an equivalent
  nonrecursive Datalog program. The converse is also true,
  since a nonrecursive Datalog program can be mapped into an
  equivalent DRC query.

Query languages that achieve the level of expressive power
shared by these languages are called relationally complete.
Another language that is equivalent to these, and thus relationally
complete, is relational algebra (RA). Relational algebra is
an operator-based language, and thus provides a useful link to
the concrete implementation of these logic-based languages.


Tuple Relational Calculus (TRC)


In TRC, variables range over the tuples of a relation. For instance,
the TRC expression for the query ?firstreq(N) is:

{(t[1]) | ∃u∃s(took(t) ∧ took(u) ∧ student(s) ∧ t[2] = cs101 ∧
          u[2] = cs143 ∧ t[1] = u[1] ∧ s[3] = junior ∧ s[1] = t[1])}

• The variables t and s denote tuples ranging over took and
  student, respectively (u also ranges over took).
• t[1] denotes the first component in t (corresponding to Name);
  t[2] denotes the second component (i.e., the Course value of
  this tuple).
• Let j1, ..., jn denote columns of a relation R, and t ∈ R.
  Then the notation t[j1, ..., jn] will be used to denote the
  n-tuple (t[j1], ..., t[jn]).
• TRC requires an explicit statement of equality (e.g., s[1] = t[1]),
  while in DRC equality is denoted implicitly by the presence
  of the same variable in different places.


Explicit Quantifiers

DRC presents several syntactic differences w.r.t. Datalog:

• set definition by abstraction (rather than rules),
• conjunctions and disjunctions in the same formula,
• nesting of parentheses, and
• explicit quantifiers.
Existential and universal quantification are both allowed in DRC.
A query such as ?all_req_sat(N) can be expressed either
(i) using double negation (and only existential quantifiers), or
(ii) directly using the universal quantifier, as shown in the following
example (Find the seniors who completed all cs requirements):

{(N) | ∃M(student(N, M, senior)) ∧
       ∀C(req(cs, C) → ∃G(took(N, C, G)))}          (1)

The implication sign →: p → q is just a shorthand for ¬p ∨ q.


Domain Relational Calculus


Relational calculus comes in two main flavors:

1. in the Domain Relational Calculus (DRC), variables denote
   values of attributes;
2. in the Tuple Relational Calculus (TRC), variables denote whole
   tuples.

For instance, the query "Find the names of junior-level students
who have taken both cs101 and cs143" (i.e., the Datalog query
?firstreq(N), top of page 165) can be expressed as follows:

{(N) | ∃G1(took(N, cs101, G1)) ∧ ∃G2(took(N, cs143, G2)) ∧
       ∃M(student(N, M, junior))}

The query ?scndreq(N) can be expressed as follows:

{(N) | ∃G, ∃M(took(N, cs131, G) ∧ G > 3.0 ∧ student(N, M, junior)) ∨
       ∃G, ∃M(took(N, cs151, G) ∧ G > 3.0 ∧ student(N, M, junior))}

Sec: 8.4 From Safe Datalog to RA

Equivalence of RA and Safe Nonrecursive Datalog

Theorem: Let P be a safe Datalog program without
recursion or function symbols. Then, for each
predicate in P, there exists an equivalent relational
algebra expression.


Mapping with Negated Goals


Say that the body of some rule contains a negated goal, such as
the following body:

r: ... ← b1(a, Y), b2(Y), ¬b3(Y).

Then we consider a positive body, i.e., one constructed by dropping
the negated goal,

rp: ... ← b1(a, Y), b2(Y).

and a negative body, i.e., one obtained by removing the negation
sign from the negated goal:

rn: ... ← b1(a, Y), b2(Y), b3(Y).

The two bodies so generated are safe and contain no negation, so
we can transform them into equivalent relational algebra expressions
as per Step 2 of the mapping algorithm; let Body_rp and
Body_rn be the RA expressions so obtained. Then the body
expression to be used in Step 3 of said algorithm is simply
Body_r = Body_rp − Body_rn.


Mapping (cont.)
Step 2 The body of a rule r is translated into the RA expression
Body_r. Body_r consists of the Cartesian product of all the
base or derived relations in the body, followed by a selection
σ_F, where F is the conjunction of the following conditions:
(i) inequality conditions for each comparison goal (e.g., Z > 24.3),
(ii) equality conditions between columns containing the same
variable, (iii) equality conditions between a column and the
constant occurring in such a column.
For the example at hand, (i) the condition Z > 24.3 translates
into the selection condition $5 > 24.3, while (ii) the
equality between the two occurrences of X translates into
$1 = $2, and the equality between the two Ys maps into
$3 = $4, and (iii) the constant in the last column of p maps
into $6 = a. Thus we obtain:
    Body_r = σ_{$1=$2, $3=$4, $6=a, $5>24.3}(Q × P)
Step 3 Each rule r is translated into an extended projection on
Body_r, according to the patterns in the head of r. For the
rule at hand we obtain:
    S = π_{$5, b, $5}(Body_r)

Step 4 Multiple rules with the same head are translated into
the union of their equivalent expressions.


From Safe Rules to RA


Mapping a safe, non-recursive Datalog program P
into RA
Step 1 P is transformed into an equivalent program P' that
does not contain any equality goal, by replacing equals with
equals and removing the equality goals. For example:

r: s(Z, b, W) ← q(X, X, Y), p(Y, Z, a), W = Z, W > 24.3.

is translated into:

r: s(Z, b, Z) ← q(X, X, Y), p(Y, Z, a), Z > 24.3.
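The mapping of the example rule can be traced with a small Python sketch. The contents of Q and P below are hypothetical sample data chosen for illustration; only the selection and projection patterns come from Steps 2 and 3.

```python
# Hypothetical sample relations for q(X, X', Y) and p(Y, Z, C).
Q = {(1, 1, "y1"), (1, 2, "y1"), (3, 3, "y2")}
P = {("y1", 30.0, "a"), ("y1", 10.0, "a"), ("y2", 50.0, "c")}

# Step 2: Cartesian product followed by the selection
# $1 = $2, $3 = $4, $6 = a, $5 > 24.3 (columns are 1-based in the text).
product = {q + p for q in Q for p in P}
body_r = {
    t for t in product
    if t[0] == t[1] and t[2] == t[3] and t[5] == "a" and t[4] > 24.3
}

# Step 3: generalized projection $5, b, $5 for the head s(Z, b, Z).
S = {(t[4], "b", t[4]) for t in body_r}

# Direct evaluation of the rule, used only as a cross-check.
direct = {
    (z, "b", z)
    for (x1, x2, y) in Q
    if x1 == x2
    for (y2, z, c) in P
    if y2 == y and c == "a" and z > 24.3
}
```

Both computations give the same relation for s, which is the point of the translation: the rule and the RA expression denote the same set of tuples.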


Safe Datalog
The following is an inductive definition of safety for a program P:

1. Safe Predicates. A predicate q of P is safe if
   (i) q is a database predicate, or
   (ii) every rule defining q is safe.
2. Safe Variables. A variable X in rule r is safe if
   (i) X is contained in some positive goal q(t1, ..., tn), where
       the predicate q(A1, ..., An) is safe, or
   (ii) r contains some equality goal X = Y, where Y is safe.
3. Safe Rules. A rule r is safe if all its variables are safe.
4. The goal ?q(t1, ..., tn) is safe when the predicate q(A1, ..., An)
   is safe.
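A simplified checker for items 2 and 3 can be sketched as follows. It assumes that every predicate appearing in a positive goal is already safe (for example, a database predicate), that variables are strings starting with an upper-case letter, and that constants on the other side of an equality count as safe. The rule encoding and the function names are illustrative only.

```python
def is_var(term):
    """Variables are strings that start with an upper-case letter."""
    return isinstance(term, str) and term[:1].isupper()

def safe_variables(positive_goals, equalities):
    # (i) variables occurring in positive goals over safe predicates
    safe = {a for _pred, args in positive_goals for a in args if is_var(a)}
    # (ii) propagate safety through equality goals until nothing changes
    changed = True
    while changed:
        changed = False
        for left, right in equalities:
            for x, y in ((left, right), (right, left)):
                if is_var(x) and x not in safe and (not is_var(y) or y in safe):
                    safe.add(x)
                    changed = True
    return safe

def rule_is_safe(head_args, positive_goals, equalities=(), comparison_terms=()):
    """A rule is safe if every variable it uses is safe."""
    safe = safe_variables(positive_goals, equalities)
    used = [*head_args, *comparison_terms]
    for left, right in equalities:
        used += [left, right]
    return all(term in safe for term in used if is_var(term))

# bettergrade(G1) <- took("Joe Doe", cs143, G), G1 > G.
unsafe = rule_is_safe(
    ["G1"], [("took", ['"Joe Doe"', "cs143", "G"])], comparison_terms=["G1", "G"]
)
# s(Z, b, W) <- q(X, X, Y), p(Y, Z, a), W = Z, W > 24.3.
safe = rule_is_safe(
    ["Z", "b", "W"],
    [("q", ["X", "X", "Y"]), ("p", ["Y", "Z", "a"])],
    equalities=[("W", "Z")],
    comparison_terms=["W", 24.3],
)
```

In the bettergrade rule, G1 appears only in the head and in a comparison, so no positive goal or equality makes it safe.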


Safety
In practical languages, it is desirable to allow only
safe formulas, which avoid the problems of infinite
answers and loss of domain independence.
But the problems of domain independence and
finiteness of answers are undecidable even for nonrecursive
queries. Therefore, necessary and sufficient
syntactic conditions that characterize safe
formulas cannot be given in general.
In practice, therefore, sufficient conditions are defined
that might be more restrictive than necessary.


Unsafe Rules
For instance, to find grades better than the grade Joe Doe got in
cs143, a user might write the following rule:

bettergrade(G1) ← took("Joe Doe", cs143, G), G1 > G.

This rule presents the following peculiar traits:

1. Infinite answers. Assuming that, say, Joe Doe got a grade
   of 3.3 (i.e., B+) in course cs143, there are infinitely
   many numbers that satisfy the condition of being greater
   than 3.3.
2. Lack of domain independence. A query formula is said to
   be domain independent when its answer depends only on
   the database and the constants in the query, but not on the
   domain of interpretation. The set of values for G1 satisfying
   the rule above depends on what domain we assume for numbers:
   e.g., integer, rational, or real. Thus there is no domain
   independence.
3. No relational algebra equivalent. Only database relations
   are allowed as operands of relational algebra expressions.
   These relations are finite, and so is the result of every RA
   expression over these relations. Therefore, there cannot be any
   RA expression over the database relations that is equivalent
   to the rule above.

Sec: 8.6 Stratification

Stratification
By sorting on pdg(P), the nodes of P can be partitioned into a
finite set of n strata 1, ..., n, such that, for each rule r ∈ P, the
predicate name of the head of r belongs to a stratum that
(i) is ≥ each stratum containing some positive goal, and also
(ii) is strictly > each stratum containing some negated goal.

Programs that are stratifiable always have a clear meaning,
but programs that are not stratifiable might be ill-defined from a
semantic viewpoint (see the Chapter 10 examples).
A stratification of a program will be called strict if every stratum
contains either a single predicate or a set of predicates that are
mutually recursive.


Predicate Dependency Graph


The Predicate Dependency Graph for a program P is a graph
having as nodes the names of the predicates in P. The graph
contains an arc a → b if there exists a rule with goal name a and
head name b. If the goal is negated, then the arc is marked as a
negative arc.
[Figure: PDG for the howsoon program. Nodes: howsoon, larger, timeForbasic, fastest, faster, basic_subparts, assembly, part_cost.]

The nodes and arcs of the strong components of pdg(P), respectively,
identify the recursive predicates and recursive rules of P.
A program is said to be stratifiable when none of its negative arcs
belongs to a strong component.
Programs that are stratifiable always have a clear meaning, but
programs that are not stratifiable might not.
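One way to obtain strata satisfying conditions (i) and (ii) is to raise stratum numbers until no rule violates them. The sketch below assumes each rule is encoded as (head, positive goal names, negated goal names); the encoding of the howsoon program of Sec. 8.5 and the function name `stratify` are illustrative. It returns some valid stratification, not necessarily a strict one.

```python
def stratify(rules):
    """Return {predicate: stratum} or None if the program is not stratifiable."""
    preds = set()
    for head, pos, neg in rules:
        preds |= {head, *pos, *neg}
    stratum = {p: 1 for p in preds}
    changed = True
    while changed:
        changed = False
        for head, pos, neg in rules:
            # (i) head stratum >= positive goals; (ii) head stratum > negated goals
            need = max(
                [stratum[head]]
                + [stratum[g] for g in pos]
                + [stratum[g] + 1 for g in neg]
            )
            if need > len(preds):
                return None  # a negative arc lies on a cycle
            if need > stratum[head]:
                stratum[head] = need
                changed = True
    return stratum

howsoon_program = [
    ("fastest", ["part_cost"], ["faster"]),
    ("faster", ["part_cost"], []),
    ("timeForbasic", ["basic_subparts", "fastest"], []),
    ("howsoon", ["timeForbasic"], ["larger"]),
    ("larger", ["timeForbasic"], []),
    ("basic_subparts", ["part_cost"], []),
    ("basic_subparts", ["assembly", "basic_subparts"], []),
]
strata = stratify(howsoon_program)
```

A program such as p ← ¬q, q ← ¬p has a negative arc inside a strong component, so the stratum numbers would grow without bound; the sketch detects this by capping them at the number of predicates.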

Sec: 8.5 Recursive Rules

One-at-a-Time
Set aggregates, such as count or sum in SQL, require that the
elements of a set be visited one at a time. (These aggregates also
require arithmetic predicates, which we will consider later.)
Counting the elements in a set modulo an integer does not require
arithmetic, but still requires that the elements of the set be visited
one at a time.
The parity query: how many tuples, modulo 2, are in the base
relation br(X)?

between(X, Z) ← br(X), br(Y), br(Z), X < Y, Y < Z.
next(X, Y) ← br(X), br(Y), X < Y, ¬between(X, Y).
next(nil, X) ← br(X), ¬smaller(X).
smaller(X) ← br(X), br(Y), Y < X.
even(nil).
even(Y) ← odd(X), next(X, Y).
odd(Y) ← even(X), next(X, Y).
br_is_even ← even(X), ¬next(X, Y).

next sorts the elements of br into an ascending chain, where the
first link of the chain connects the distinguished node nil to the
least element in br (third rule in the example).
This works assuming that the elements in br are totally ordered.


Negation and Recursion


For each basic part, find the least time needed for
delivery

fastest(Part, Time) ← part_cost(Part, Sup1, Cost, Time),
                      ¬faster(Part, Time).
faster(Part, Time) ← part_cost(Part, Sup2, Cost2, Time),
                     part_cost(Part, Sup1, Cost1, Time1),
                     Time1 < Time.

Times required for basic subparts of the given assembly

timeForbasic(AssPart, BasicSub, Time) ←
                     basic_subparts(AssPart, BasicSub),
                     fastest(BasicSub, Time).

The maximum time required for basic subparts of the given assembly

howsoon(AssPart, Time) ← timeForbasic(AssPart, _, Time),
                         ¬larger(AssPart, Time).
larger(Part, Time) ← timeForbasic(Part, _, Time),
                     timeForbasic(Part, _, Time1),
                     Time1 > Time.


Subparts

All subparts: a transitive-closure query

all_subparts(Part, Sub) ← assembly(Part, Sub, _).
all_subparts(Part, Sub2) ← all_subparts(Part, Sub1),
                           assembly(Sub1, Sub2, _).

For each part, basic or otherwise, find its basic subparts.
A basic part is a subpart of itself.

basic_subparts(BasicP, BasicP) ← part_cost(BasicP, _, _, _).
basic_subparts(Prt, BasicP) ← assembly(Prt, SubP, _),
                              basic_subparts(SubP, BasicP).


Relational Tables for a BoM application


PART_COST
BASIC_PART   SUPPLIER     COST    TIME
top_tube     cinelli      20.00   14
top_tube     columbus     15.00   6
down_tube    columbus     10.00   6
head_tube    cinelli      20.00   14
head_tube    columbus     15.00   6
seat_mast    cinelli      20.00   6
seat_mast    cinelli      15.00   14
seat_stay    cinelli      15.00   14
seat_stay    columbus     10.00   6
chain_stay   columbus     10.00   6
fork         cinelli      40.00   14
fork         columbus     30.00   6
spoke        campagnolo   0.60    15
nipple       mavic        0.10    3
hub          campagnolo   31.00   5
hub          suntour      18.00   14
rim          mavic        50.00   3
rim          araya        70.00   1

ASSEMBLY
PART    SUBPART      QTY
bike    frame        1
bike    wheel        2
frame   top_tube     1
frame   down_tube    1
frame   head_tube    1
frame   seat_mast    1
frame   seat_stay    2
frame   chain_stay   2
frame   fork         1
wheel   spoke        36
wheel   nipple       36
wheel   rim          1
wheel   hub          1
wheel   tire         1


Transitive Closure Queries


Transitive closure of the graph arc(X, Y), right-linear rules:

path(X, Y) ← arc(X, Y).
path(X, Z) ← arc(X, Y), path(Y, Z).

Transitive closure of the graph arc(X, Y), left-linear rules:

path(X, Y) ← arc(X, Y).
path(X, Z) ← path(X, Y), arc(Y, Z).

Transitive closure of the graph arc(X, Y), non-linear rules:

path(X, Y) ← arc(X, Y).
path(X, Z) ← path(X, Y), path(Y, Z).
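The recursive path rules can be evaluated by repeating the rules until no new facts appear. The following sketch assumes arc is a finite set of pairs and follows the first (right-linear) formulation; the function name `transitive_closure` is an illustrative choice, and the slides themselves do not prescribe this evaluation strategy.

```python
def transitive_closure(arc):
    """Naive bottom-up evaluation of the two path rules."""
    path = set(arc)                       # path(X, Y) <- arc(X, Y).
    while True:
        # path(X, Z) <- arc(X, Y), path(Y, Z).
        new = {(x, z) for (x, y) in arc for (y2, z) in path if y == y2} - path
        if not new:
            return path                   # fixpoint reached
        path |= new

chain = transitive_closure({(1, 2), (2, 3), (3, 4)})
cycle = transitive_closure({(1, 2), (2, 1)})
```

Because the set of possible pairs is finite, the loop terminates even on cyclic graphs. The left-linear and non-linear formulations compute the same relation.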

Sec: 8.7 Expressive Power and Data Complexity

The Expressive Power Hierarchy


1. Stratified safe Datalog is equivalent to RA + fixpoint (on
   monotonic RA).
2. Safe, stratified Datalog expresses every DB-PTIME query if
   we assume that there exists a total order in the database
   (thus it is DB-PTIME complete).
3. Order-independence property of queries (genericity):
   queries are insensitive to the renaming of constants.
4. To be able to express all DB-PTIME queries under genericity,
   a non-deterministic construct such as choice is needed:
   stratified Datalog with choice is DB-PTIME complete.
5. Safe Datalog (without function symbols) can express exponential
   queries, using non-stratified negation and stable model
   semantics (to be discussed later).
6. Function symbols and recursion are needed for Turing completeness.


Polynomial Data Complexity


1. Use Turing machines as the general model of computation
   and encode the database as a tape of length n.
2. Then any computable function on the database can be encoded
   as a Turing machine.
3. Some of these machines halt (complete their computation)
   in O(n) steps, others in an exponential number of steps,
   and others never terminate.
4. The set of machines that halt in a number of steps that is
   polynomial in n defines the class of DB-PTIME functions.

Are relational algebra expressions (equivalently, safe relational
calculus or safe non-recursive Datalog) evaluable in DB-PTIME?
Yes, and in practice we use indices and query optimizers to
keep exponents and coefficients small.
But these languages cannot express all DB-PTIME queries. For instance,
they cannot express transitive closures or aggregates (thus the
most frequently used aggregates were added to SQL in ad hoc
fashion).


Expressive Power
1. Expressive Power of a language: the set of functions that can
   be written as programs of the language.
2. Data Complexity: query languages are viewed as mappings
   from the DB to the answer. The big O is evaluated in terms
   of the size of the database, which is always finite.

The following languages are equivalent w.r.t. expressive power:

1. Relational algebra expressions
2. Safe relational calculus queries (tuple or domain)
3. Datalog with safe non-recursive rules.

Any language that is at least as powerful as these is said to be
relationally complete. Query languages must be relationally
complete (Codd, 1970).
But relational completeness is not enough. For instance, set
aggregates are beyond relational completeness, and had to be
added to SQL in ad hoc fashion.
Question: what is a reasonable requirement for a query language?
DB-PTIME completeness might be the answer.

Sec: 8.7.1 Functors and Complex Terms

Nesting a Flat Relation


Now, if we have:

ps(top_tube, cinelli).
ps(top_tube, columbus).
ps(top_tube, mavic).

How do we construct the nested relation back?

between(P, X, Z) ← ps(P, X), ps(P, Y), ps(P, Z), X < Y, Y < Z.
smaller(P, X) ← ps(P, X), ps(P, Y), Y < X.
nested(P, [X]) ← ps(P, X), ¬smaller(P, X).
nested(P, [Y|[X|W]]) ← nested(P, [X|W]), ps(P, Y), X < Y,
                       ¬between(P, X, Y).
ps_nested(P, W) ← nested(P, W), ¬nested(P, [X|W]).


Lists
[ ] is the empty list.
[Head|Tail] represents a non-empty list.
[mary, mike, seattle] is shorthand for
[mary | [mike | [seattle | [ ]]]]
A list-based representation for suppliers of top_tube

part_sup_list(top_tube, [cinelli, columbus, mavic]).

Normalizing a nested relation into a flat relation

flatten(P, S, L) ← part_sup_list(P, [S|L]).
flatten(P, S, L) ← flatten(P, _, [S|L]).
ps(Part, Sup) ← flatten(Part, Sup, _).

This program applied to the previous fact yields:

ps(top_tube, cinelli).
ps(top_tube, columbus).
ps(top_tube, mavic).
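The flatten program can be mimicked in Python by treating [S|L] as "first element and rest". This sketch assumes lists are stored as tuples so that facts can be kept in sets; the function name `ps` mirrors the predicate, and the evaluation loop is an illustrative choice.

```python
part_sup_list = {("top_tube", ("cinelli", "columbus", "mavic"))}

def ps(part_sup_list):
    """Derive the flat ps(Part, Sup) relation from the nested facts."""
    flatten = set()
    # flatten(P, S, L) <- part_sup_list(P, [S|L]).
    for part, sups in part_sup_list:
        if sups:
            flatten.add((part, sups[0], sups[1:]))
    # flatten(P, S, L) <- flatten(P, _, [S|L]).
    changed = True
    while changed:
        changed = False
        for part, _sup, rest in list(flatten):
            if rest:
                fact = (part, rest[0], rest[1:])
                if fact not in flatten:
                    flatten.add(fact)
                    changed = True
    # ps(Part, Sup) <- flatten(Part, Sup, _).
    return {(part, sup) for (part, sup, _rest) in flatten}

flat = ps(part_sup_list)
```

Each application of the recursive rule peels one supplier off the front of the remaining list, so the loop stops once the empty list is reached.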


Functors and Complex Terms


Flat parts, their number, shape, and weight, following
the schema: part(Part#, Shape, Weight)

part(202, circle(11), actualkg(0.034)).
part(121, rectangle(10, 20), unitkg(2.1)).
part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
part_weight(No, Kilos) ← part(No, Shape, unitkg(K)),
                         area(Shape, Area), Kilos = K * Area.
area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14/4.
area(rectangle(Base, Height), A) ← A = Base * Height.

The complex terms circle(11), actualkg(0.034), rectangle(10, 20),
and unitkg(2.1) are in logical parlance called functions (a functor
followed by a list of arguments in parentheses).

In actual applications, these complex terms do not represent evaluable
functions; they are used as variable-length sub-records.
Thus, circle(11) and rectangle(10, 20), respectively, denote
that the shape of our first part is a circle with diameter
11cm, while the shape of the second part is a rectangle with base
10cm and height 20cm. Any number of sub-arguments is allowed
in such complex terms, recursively.
Objects of arbitrary complexity, including solid objects, can be
nested and represented in this fashion.
Functors are used here as case discriminants.
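The use of functors as case discriminants can be imitated in Python with tagged tuples, where the first element plays the role of the functor. This is only an analogy under that encoding assumption; the functions `area` and `part_weight` mirror the rules above.

```python
# part(Part#, Shape, Weight), with complex terms encoded as tagged tuples.
parts = [
    (202, ("circle", 11), ("actualkg", 0.034)),
    (121, ("rectangle", 10, 20), ("unitkg", 2.1)),
]

def area(shape):
    """Dispatch on the functor, as the two area rules do."""
    if shape[0] == "circle":
        _, dmtr = shape
        return dmtr * dmtr * 3.14 / 4
    if shape[0] == "rectangle":
        _, base, height = shape
        return base * height
    raise ValueError(f"unknown shape functor: {shape[0]}")

def part_weight(part):
    """Return (No, Kilos) following the two part_weight rules."""
    no, shape, weight = part
    if weight[0] == "actualkg":
        return no, weight[1]
    if weight[0] == "unitkg":
        return no, weight[1] * area(shape)
    raise ValueError(f"unknown weight functor: {weight[0]}")

weights = [part_weight(p) for p in parts]
```

The terms are never evaluated as functions; they are records whose functor tells the program which rule applies.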

Sec: 8.9 The Models of a Program

Minimal Models and Least Models


A model M for a program P is said to be a minimal model for P
if there exists no other model M' of P where M' ⊂ M. A model
M for a program P is said to be its least model if M ⊆ M' for
every model M' of P.

Model Intersection Property. Let P be a positive program, and
M1 and M2 be two models for P. Then, M1 ∩ M2 is also a model
for P.
Theorem: Every positive program has a least model.


Models of a Program
Let I be an interpretation for a program P. If an atom a ∈ I, we
say that a is true; otherwise we say that a is false. Conversely for
negated atoms ¬a: ¬a is true when a ∉ I.
Satisfaction: A rule r ∈ P is said to hold true in interpretation
I, or to be satisfied in I, if every instance of r is satisfied in I.
Model: An interpretation I that makes true all the rules of P is called
a model for P.
I is a model for P iff it satisfies all the rules in ground(P).


Ground version of a program


Let r be a rule in a program P.
ground(r) denotes the set of ground instances of r (i.e., all the
rules obtained by assigning to the variables in r values from the
Herbrand universe U_P).
Example:
parent(X, Y) ← mother(X, Y).
Since there are 2 variables in this rule and |U_P| = 3, ground(r)
consists of 3 × 3 = 9 rules:
parent(anne, anne) ← mother(anne, anne).
parent(anne, marc) ← mother(anne, marc).
...
parent(silvia, silvia) ← mother(silvia, silvia).
The ground version of a program P, denoted ground(P), is the
set of the ground instances of its rules:

ground(P) = {ground(r) | r ∈ P}
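Grounding a rule amounts to enumerating every assignment of universe elements to its variables. The sketch below assumes atoms are encoded as tuples (predicate, arg1, ...) and that the caller lists the rule's variables; the function name `ground` mirrors the notation, and the encoding is illustrative.

```python
from itertools import product

universe = ["anne", "marc", "silvia"]

def ground(head, body, variables, universe):
    """Return all ground instances of the rule head <- body."""
    instances = []
    for values in product(universe, repeat=len(variables)):
        binding = dict(zip(variables, values))

        def subst(atom):
            # Replace each variable by its value; constants are left alone.
            return (atom[0],) + tuple(binding.get(a, a) for a in atom[1:])

        instances.append((subst(head), [subst(goal) for goal in body]))
    return instances

# parent(X, Y) <- mother(X, Y).
rules = ground(("parent", "X", "Y"), [("mother", "X", "Y")], ["X", "Y"], universe)
```

With two variables and three constants there are 3 × 3 ground instances, matching the count in the text.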

Sec: 8.8 Syntax and Semantics of Datalog Languages

Examples

anc(X, Y) ← parent(X, Y).
anc(X, Z) ← anc(X, Y), parent(Y, Z).
parent(X, Y) ← father(X, Y).
parent(X, Y) ← mother(X, Y).
mother(anne, silvia).
mother(anne, marc).

In this example:
U_P = {anne, silvia, marc},
B_P = {parent(x, y) | x, y ∈ U_P} ∪ {father(x, y) | x, y ∈ U_P} ∪
      {mother(x, y) | x, y ∈ U_P} ∪ {anc(x, y) | x, y ∈ U_P}
There are 2^36 Herbrand interpretations for this program. (There are four
binary predicates, with three possible assignments for their first arguments
and three for their second arguments, so |B_P| = 4 × 3 × 3 = 36.
There are 2^|B_P| subsets of B_P.)

With an infinite universe, there is an infinite number of interpretations.

Sec: 8.8 Syntax and Semantics of Datalog Languages {5{

Herbrand Interpretation for a program P

• The Herbrand Universe for P, denoted $U_P$, is defined as the
set of all terms that can be recursively constructed by letting
the arguments of the functions be constants in P or elements
in $U_P$.
• Then the Herbrand Base of P, denoted $B_P$, is defined as the set of atoms
that can be built by assigning elements of $U_P$ to the arguments of
the predicates.
• A Herbrand Interpretation is defined by assigning to each n-ary
predicate q an n-ary relation Q, where $q(a_1, \ldots, a_n)$ is true iff
$(a_1, \ldots, a_n) \in Q$, with $a_1, \ldots, a_n$ denoting elements in $U_P$.
Alternatively, a Herbrand interpretation of P is a subset of
the Herbrand Base of P.

Sec: 8.8 Syntax and Semantics of Datalog Languages {4{

Positive Programs
A definite clause with an empty body is called a unit clause.
It is customary to use the notation "A." instead of the more
precise notation "A ←." for such clauses.
A fact is a unit clause without variables.
A unit clause (everybody loves himself) and three
facts
loves(X, X).
loves(marc, mary).
loves(mary, tom).
hates(marc, tom).
A positive logic program is a set of definite clauses.
We will use the terms definite clause program and positive program
as synonyms.

Sec: 8.8 Syntax and Semantics of Datalog Languages {3{

Closed Formulas and Clauses


A WFF F is said to be a closed formula if every variable occurrence
in F is quantified.
The formula in the previous example is not closed. But the
following one is:

$\forall x \forall y \forall z\, (p(x, z) \lor \neg q(x, y) \lor \neg r(y, z))$

A Definite Clause is a WFF that has the following properties:
• it is closed
• all its variables are universally quantified
• it is a disjunction of one positive atom and zero or more
negated atoms.

A definite clause is representable with the rule notation:

$\forall x \forall y \forall z\; p(x, z) \leftarrow q(x, y), r(y, z).$

Sec: 8.8 Syntax and Semantics of Datalog Languages {2{

Well-Formed Formulas (WFFs)


1. If p is an n-ary predicate and $t_1, \ldots, t_n$ are terms, then
$p(t_1, \ldots, t_n)$ is a formula (called an atomic formula or, more
simply, an atom).
2. If F and G are formulas, then so are $\neg F$, $F \lor G$, $F \land G$,
$F \leftarrow G$, $F \rightarrow G$, and $F \leftrightarrow G$.
3. If F is a formula and x is a variable, then $\forall x\,(F)$ and $\exists x\,(F)$
are formulas. When so, x is said to be quantified in F.

Example:

$\exists G1\,(took(N, cs101, G1)) \land \exists G2\,(took(N, cs143, G2)) \land \exists M\,(student(N, M, junior))$

Sec: 8.8 Syntax and Semantics of Datalog Languages {1{

Syntax of First Order Logic


Its alphabet consists of:

1. Constants
2. Variables: in addition to identifiers beginning with upper case,
x, y and z also represent variables in this section.
3. Functions, such as $f(t_1, \ldots, t_n)$, where f is an n-ary functor
and $t_1, \ldots, t_n$ are the arguments.
4. Predicates
5. Connectives. These include the basic logical connectives $\lor$, $\land$,
$\neg$ and the implication symbols $\leftarrow$, $\rightarrow$, and $\leftrightarrow$.
6. Quantifiers. The existential quantifier $\exists$ and the universal
quantifier $\forall$.
7. Parentheses and punctuation symbols, used liberally as needed
to avoid ambiguities.

A Term is defined inductively as follows:

(a) A variable is a term.
(b) A constant is a term.
(c) If f is an n-ary functor and $t_1, \ldots, t_n$ are terms, then
$f(t_1, \ldots, t_n)$ is a term.

Sec: 8.10 Fixpoint-Based Semantics {4{
Computation of the Least Fixpoint
$M_P = lfp(T_P) = T_P^{\uparrow\omega}(\emptyset)$ yields a simple algorithm for computing
the least model of a definite-clause program.
Since $T_P$ is monotonic and $T_P^{\uparrow 0}(\emptyset) \subseteq T_P^{\uparrow 1}(\emptyset)$, it follows that
$T_P^{\uparrow n}(\emptyset) \subseteq T_P^{\uparrow n+1}(\emptyset)$.
Thus, the successive powers of $T_P$ form an ascending chain.
Moreover:

$T_P^{\uparrow k}(\emptyset) = \bigcup_{n \le k} T_P^{\uparrow n}(\emptyset)$

If $T_P^{\uparrow n+1}(\emptyset) = T_P^{\uparrow n}(\emptyset)$, then $T_P^{\uparrow n}(\emptyset) = T_P^{\uparrow\omega}(\emptyset)$.

Thus, the least fixpoint and least model can be computed by
starting from the bottom and iterating the application of $T_P$ until
no new atoms are obtained and the (n+1)-th power is identical
to the n-th one (if this condition never occurs, then we have an
infinite computation).
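
The iteration just described can be sketched as follows. This is a minimal illustration, not the authors' implementation: the tuple encoding of atoms, the upper-case convention for variables, and the helper names `match`, `t_p` and `least_model` are all assumptions made for this example. The program is the ancestor program used elsewhere in these slides.

```python
def match(atom, fact, binding):
    """Extend `binding` so that `atom` equals the ground `fact`; None if impossible."""
    if atom[0] != fact[0] or len(atom) != len(fact):
        return None
    new = dict(binding)
    for a, f in zip(atom[1:], fact[1:]):
        if a[0].isupper():                 # variable (assumed convention)
            if new.setdefault(a, f) != f:
                return None
        elif a != f:                       # constant mismatch
            return None
    return new

def t_p(rules, interp):
    """One application of the immediate consequence operator."""
    derived = set()
    for head, body in rules:
        bindings = [{}]
        for atom in body:
            extended = []
            for b in bindings:
                for fact in interp:
                    b2 = match(atom, fact, b)
                    if b2 is not None:
                        extended.append(b2)
            bindings = extended
        for b in bindings:
            derived.add((head[0],) + tuple(b.get(a, a) for a in head[1:]))
    return derived

def least_model(rules):
    """Iterate T_P from the empty set until two successive powers coincide."""
    current, n = set(), 0
    while True:
        nxt = t_p(rules, current)
        if nxt == current:
            return current, n
        current, n = nxt, n + 1

P = [
    (("anc", "X", "Y"), [("parent", "X", "Y")]),
    (("anc", "X", "Z"), [("anc", "X", "Y"), ("parent", "Y", "Z")]),
    (("parent", "X", "Y"), [("father", "X", "Y")]),
    (("parent", "X", "Y"), [("mother", "X", "Y")]),
    (("mother", "anne", "silvia"), []),
    (("mother", "anne", "marc"), []),
]
model, n = least_model(P)
print(n, sorted(model))
```

For this program the chain stabilizes at the third power: the facts appear first, then the parent atoms, then the anc atoms.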

Sec: 8.10 Fixpoint-Based Semantics {3{
Operational Semantics: Powers of $T_P$

$T_P^{\uparrow 0}(I) = I$
...
$T_P^{\uparrow n+1}(I) = T_P(T_P^{\uparrow n}(I))$

Moreover, with $\omega$ denoting the first limit ordinal, we define:

$T_P^{\uparrow\omega}(I) = \bigcup \{T_P^{\uparrow n}(I) \mid n \ge 0\}$

Of particular interest are the powers of $T_P$ starting from the empty
set, i.e., for $I = \emptyset$.
Theorem: Let P be a definite clause program. Then $lfp(T_P) = T_P^{\uparrow\omega}(\emptyset)$.

Sec: 8.10 Fixpoint-Based Semantics {2{
Least Fixpoint of TP
We view a program P as defining the following fixpoint equation
over Herbrand interpretations:

$I = T_P(I)$

In general, a fixpoint equation might have no solution, one solution
or several solutions.
Interpretations are subsets of $B_P$, i.e., elements of $2^{B_P}$, and are
partially ordered by $\subseteq$. Actually, $(2^{B_P}, \subseteq)$ is a complete lattice.
Also, $T_P$ is monotonic.
Thus, by the Knaster-Tarski fixpoint theorem:
Theorem: Let P be a definite clause program. There always
exists a least fixpoint for $T_P$, denoted $lfp(T_P)$.
Theorem: Let P be a definite clause program. Then $M_P = lfp(T_P)$.

Sec: 8.10 Fixpoint-Based Semantics {1{
Immediate Consequence Operator
Rules can be viewed as mappings. Recursive rules define a Fixpoint
Equation.
The Immediate Consequence Operator $T_P$ is defined as follows:

$T_P(I) = \{A \in B_P \mid \exists r : A \leftarrow A_1, \ldots, A_n \in ground(P),\ \{A_1, \ldots, A_n\} \subseteq I\}$

Thus $T_P$ is a mapping from Herbrand interpretations of P to
Herbrand interpretations of P.
For the ancestor program:
With $I = \{anc(anne, marc), parent(marc, silvia)\}$,
we have:

$T_P(I) = \{anc(marc, silvia), anc(anne, silvia), mother(anne, silvia), mother(anne, marc)\}$
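
The example can be checked mechanically by applying the definition literally over ground(P). The sketch below assumes a tuple encoding of atoms and builds the ground rules of the ancestor program by hand; the names `atom`, `ground_P` and `t_p` are hypothetical and chosen only for this illustration.

```python
from itertools import product

U_P = ["anne", "marc", "silvia"]

def atom(pred, *args):
    return (pred,) + args

# ground(P) for the ancestor program, built rule by rule
ground_P = []
for x, y in product(U_P, repeat=2):
    ground_P.append((atom("anc", x, y), [atom("parent", x, y)]))
    ground_P.append((atom("parent", x, y), [atom("father", x, y)]))
    ground_P.append((atom("parent", x, y), [atom("mother", x, y)]))
for x, y, z in product(U_P, repeat=3):
    ground_P.append((atom("anc", x, z),
                     [atom("anc", x, y), atom("parent", y, z)]))
ground_P.append((atom("mother", "anne", "silvia"), []))
ground_P.append((atom("mother", "anne", "marc"), []))

def t_p(ground_rules, interp):
    """T_P(I): heads of the ground rules whose body atoms all belong to I."""
    return {head for head, body in ground_rules if set(body) <= interp}

I = {atom("anc", "anne", "marc"), atom("parent", "marc", "silvia")}
result = t_p(ground_P, I)
print(sorted(result))
```

Note that the atoms of I itself are not in the result: $T_P$ is not inflationary, so anc(anne, marc) and parent(marc, silvia) disappear because no rule body is satisfied that would re-derive them.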

Sec: 9.3 Differential Fixpoint Computation {9{
General Rewriting
The general expression of $T_R(\delta S, S, S')$ for a recursive rule of rank
k is as follows:

r:  Q0 ← c0, Q1, c1, Q2, ..., Qk, ck.

r1: δ'Q0 ← c0, δQ1, c1, Q'2, ..., Q'k, ck.
r2: δ'Q0 ← c0, Q1, c1, δQ2, ..., Q'k, ck.
...
rj: δ'Q0 ← c0, Q1, ..., δQj, ..., Q'k, ck.
...
rk: δ'Q0 ← c0, Q1, c1, Q2, ..., δQk, ck.

Sec: 9.3 Differential Fixpoint Computation {8{
Seminaive Fixpoint
Seminaive fixpoint is another name for the differential fixpoint
just described.
Analogy with symbolic differentiation.
Performance improvements: it is typically the case that
$n = |\delta S| \ll N = |S| \approx |S'|$.
The original ancs rule, for instance, requires the equijoin of two
relations of size N; after the differentiation we need to compute
two equijoins, each joining a relation of size n with one of size N.

Sec: 9.3 Differential Fixpoint Computation {7{
NonLinear Rules
Quadratic Ancestor Rules
ancs(X, Y) ← parent(X, Y).
ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).

r:  δ'ancs(X, Z) ← ancs'(X, Y), ancs'(Y, Z).

r1: δ'ancs(X, Z) ← δancs(X, Y), ancs'(Y, Z).
r2: δ'ancs(X, Z) ← ancs(X, Y), ancs'(Y, Z).
Now, we can re-write r2 as:
r2,1: δ'ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).
r2,2: δ'ancs(X, Z) ← ancs(X, Y), ancs(Y, Z).

Rule r2,2 produces only `old' values, and can thus be eliminated.
All that is left is rules r1 and r2,1, below:

δ'ancs(X, Z) ← δancs(X, Y), ancs'(Y, Z).
δ'ancs(X, Z) ← ancs(X, Y), δancs(Y, Z).

For nonlinear rules, the immediate consequence operator in the
Algorithm has the more general form $\delta' S := T_R(\delta S, S, S') - S'$,
where $\delta S = S' - S$.
Sec: 9.3 Differential Fixpoint Computation {6{
Symbolic Differentiation
anc, δanc, and anc', respectively, denote ancestor atoms that are
in $S$, $\delta S$, and $S' = S \cup \delta S$.
Then, to compute $\delta' S := T_R(S') - S'$ we can use a $T_R$ defined by
the following rule:

δ'anc(X, Z) ← anc'(X, Y), parent(Y, Z).

This can be rewritten as:

δ'anc(X, Z) ← δanc(X, Y), parent(Y, Z).
δ'anc(X, Z) ← anc(X, Y), parent(Y, Z).

The second rule can now be eliminated, since it produces
only atoms that were already contained in anc', i.e., in the
$S'$ computed in the previous iteration.

Thus, in the previous Algorithm, rather than using $\delta' S := T_R(S') - S'$
we can write $\delta' S := T_R(\delta S) - S'$, to express the fact that the
argument of $T_R$ is the set of delta tuples from the previous step,
rather than the set of all tuples obtained so far. This transformation
holds for all linear recursive rules.

Sec: 9.3 Differential Fixpoint Computation {5{
Differential Fixpoint
Redundant Computation: the j-th iteration step also re-computes
all atoms obtained in the (j-1)-th step. Finite differences techniques
trace the derivations over two steps:
1. $S$ is a relation containing the atoms obtained up to step j-1.
2. $\delta S = \Gamma_R(S) - S = T_R(S) - S$ denotes the atoms newly
obtained at step j (i.e., the atoms that were not in S at step
j-1).
3. $\delta' S = \Gamma_R(S') - S' = T_R(S') - S'$ are the atoms newly obtained at
step j+1.
The naive fixpoint Algorithm can be improved as follows:
Differential fixpoint
S := M;
δS := Γ_E(M);
S' := S ∪ δS;
while (δS ≠ ∅)
{
    δ'S := T_R(S') − S';
    S := S';
    δS := δ'S;
    S' := S ∪ δS
}
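
As a concrete illustration, the loop can be sketched in Python for the linear ancestor rules anc(X, Y) ← parent(X, Y) and anc(X, Z) ← anc(X, Y), parent(Y, Z). The sketch assumes that the starting set M contains no anc atoms and that relations are sets of pairs; it applies the recursive rule to the delta only, which is the refinement that symbolic differentiation justifies for linear rules. The function name `seminaive_anc` is hypothetical.

```python
def seminaive_anc(parent):
    """Differential fixpoint for the linear ancestor rules.

    `parent` is a set of (parent, child) pairs; the result is the set of
    (ancestor, descendant) pairs.
    """
    s = set()                    # S := M (no anc atoms yet, by assumption)
    delta = set(parent)          # delta S: atoms produced by the exit rule
    s_prime = s | delta          # S' := S u delta S
    while delta:
        # Recursive rule applied to the delta tuples only.
        derived = {(x, z)
                   for (x, y) in delta
                   for (y2, z) in parent
                   if y == y2}
        delta_prime = derived - s_prime   # keep only genuinely new atoms
        s = s_prime
        delta = delta_prime
        s_prime = s | delta
    return s_prime

print(sorted(seminaive_anc({("a", "b"), ("b", "c"), ("c", "d")})))
```

Because already-known atoms are subtracted at every step, the loop also terminates when the parent relation contains cycles.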
Sec: 9.3 Differential Fixpoint Computation {4{
Inflationary Fixpoint
$\Gamma_{P_j}^{\uparrow\omega}(M_{j-1})$ for stratum j of the iterated fixpoint computation is
computed as follows (to simplify the notation, $\Gamma$ stands for $\Gamma_{P_j}$
and $M$ for $M_{j-1}$).

The computation for each stratum (naive fixpoint
algorithm)
S := M;
S' := Γ(M)
while (S ⊂ S')
{
    S := S';
    S' := Γ(S)
}
But $T(M) = T_E(M)$ and $\Gamma(M) = \Gamma_E(M)$, where $T_E$ denotes
the immediate consequence operator for the exit rules and $\Gamma_E$
denotes its inflationary version.
Let $T_R$ denote the immediate consequence operator for the recursive
rules and let $\Gamma_R$ be its inflationary version. Then, $\Gamma(S)$ in
the while loop can be replaced by $\Gamma_R(S)$, while $\Gamma(M)$ outside
the loop can be replaced by $\Gamma_E(M)$.
r1: anc(X, Y) ← parent(X, Y).
r2: anc(X, Z) ← anc(X, Y), parent(Y, Z).
$\Gamma_E$ and $\Gamma_R$ are defined by rules r1 and r2, respectively.
Sec: 9.3 Differential Fixpoint Computation {3{
Semantics
For positive programs:
Theorem: Let P be a positive program stratified in n strata,
and let $M_n$ be the result produced by the iterated fixpoint
computation. Then, $M_P = M_n$, where $M_P$ is the least model
of P.
For programs with stratified negation:
Theorem: Let P be a program stratified in n strata, and let
$M_n$ be the result produced by the iterated fixpoint computation.
Then $M_n$ is the unique stable model for P.

Sec: 9.1,2 Operational Semantics: Bottom-Up Execution {2{
Stratified Programs and Iterated Fixpoint
In actual systems, $T_P^{\uparrow\omega}(\emptyset)$ is computed by strata. Unless otherwise
specified, let us assume strict stratification.
Let P be a program. The inflationary immediate consequence
operator for P, denoted $\Gamma_P$, is a mapping on subsets of $B_P$
defined as follows:

$\Gamma_P(I) = T_P(I) \cup I$

The computation of $T_P^{\uparrow\omega}(\emptyset)$ is frequently called the inflationary
fixpoint computation:

$\Gamma_P^{\uparrow n}(\emptyset) = T_P^{\uparrow n}(\emptyset)$

$M_P = lfp(T_P) = T_P^{\uparrow\omega}(\emptyset) = lfp(\Gamma_P) = \Gamma_P^{\uparrow\omega}(\emptyset)$

Iterated Fixpoint Computation for program P stratified
in n strata
Let $P_j$, $1 \le j \le n$, denote the rules with their head in the j-th
stratum. Then, let $M_j$ be inductively constructed as follows:

1. $M_0 = \emptyset$ and
2. $M_j = \Gamma_{P_j}^{\uparrow\omega}(M_{j-1})$.

Chapter 9:

Implementation of Rules and Recursion

Sec: 9.4 Top-Down Execution {17{

Programming in Prolog
A solution to the previous problems is to put the exit rule before
the recursive one.
anc(X, Y) ← parent(X, Y).
anc(X, Z) ← anc(X, Y), parent(Y, Z).
Prolog still loops after the generation of all the results. To make things
work, parent must be put before anc in the recursive rule. A skill
not hard to learn.
In many cases, however, reordering rules and goals does not avoid
infinite loops.

anc(X, Y) ← parent(X, Y).
anc(X, Z) ← anc(X, Y), anc(Y, Z).

Cycles in the parent database will also cause problems:
SLD-resolution has no memory.

Sec: 9.4 Top-Down Execution {16{

Prolog
• Depth-first exploration of alternatives, where goals are always
chosen in a left-to-right order and the heads of the
rules are also considered in the order they appear in the
program.
• The programmer is given responsibility for ordering the rules
and their goals in such a fashion as to guide Prolog into successful
and efficient searches.
• The programmer must also make sure that the procedure
never falls into an infinite loop.

Example: The goal ?anc(marc, mary) on the program:

anc(X, Z) ← anc(X, Y), parent(Y, Z).
anc(X, Y) ← parent(X, Y).
This causes an infinite loop that never returns any result.

Sec: 9.4 Top-Down Execution {15{

Refutation
If $S = \{F_1, \ldots, F_n\}$ is a finite set of closed formulas, then F is a
logical consequence of S iff $F_1 \land \ldots \land F_n \rightarrow F$ is valid.
Theorem: Let S be a set of closed formulas and F be a closed
formula. Then F is a logical consequence of S iff $S \cup \{\neg F\}$ is
unsatisfiable.
Thus, to prove a goal G from a set of rules and facts P, we simply
have to prove that $P \cup \{\leftarrow G\}$ is unsatisfiable, i.e., we have to
refute $P \cup \{\leftarrow G\}$.
Resolution theorem proving does exactly that: it refutes the goal
list.
Prolog can be viewed in that light. But in fact there is no real
refutation, just procedural composition via unification.
The term SLD stands for Selected literal Linear resolution (or
refutation) strategy over Definite clauses.

Sec: 9.4 Top-Down Execution {14{

Satisfiability
Let S be a set of closed formulas of a first order language L. We say that
S is satisfiable if there is an interpretation which is a model for S.
S is valid if every interpretation of L is a model for S.
S is unsatisfiable if it has no models.
Theorem: Let S be a set of clauses. Then S is unsatisfiable iff
S has no Herbrand models.
Let S be a set of closed formulas and F be a closed formula of a
first order language L. We say F is a logical consequence of S if
every interpretation of L that is a model for S is also a model for
F.
Note that if $S = \{F_1, \ldots, F_n\}$ is a finite set of closed formulas,
then F is a logical consequence of S iff $F_1 \land \ldots \land F_n \rightarrow F$ is
valid.

Sec: 9.4 Top-Down Execution {13{

Equivalent Semantics
Theorem: The success set of a program is equal to its least
Herbrand model.

• Equivalence of the three formal semantics (Least Model,
Least Fixpoint, and SLD-resolution).
• SLD-resolution is a form of theorem proving (an efficient one).
• In general, generation of the success set requires that all choices
are visited in a breadth-first fashion. This is too inefficient for
practical languages such as Prolog, which uses depth-first search
instead.

Sec: 9.4 Top-Down Execution {12{

Success Set
• SLD-derivations can be finite or infinite.
  - A finite SLD-derivation can be successful or failed.
  - A successful SLD-derivation is a finite one that ends in
    the empty clause. This is also called an SLD-refutation.
  - A failed SLD-derivation is a finite one that ends in a non-empty
    goal, where the selected atom in this goal does not
    unify with the head of any program clause.
• Definition: Let P be a program. The success set of P is the
set of all $A \in B_P$ such that $P \cup \{\leftarrow A\}$ has an SLD-refutation
(i.e., there exists some successful derivation for it).

Sec: 9.4 Top-Down Execution {11{

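SLD: Infinite Trees

Goal ← p(x, b) on the program
1. p(x, z) ← q(x, y), p(y, z)
2. p(x, x)
3. q(a, b)

Figure: An infinite SLD-tree. Each child is labeled with the number of the
clause used; the shape of the tree indicates that the rightmost atom is
selected at each step.

← p(X, b)
    1: ← q(X, Y), p(Y, b)
        1: ← q(X, Y), q(Y, Z), p(Z, b)
            1: ← q(X, Y), q(Y, Z), q(Z, U), p(U, b)
                ... (infinite branch)
            2: ← q(X, Y), q(Y, b)
                3: ← q(X, a)
                    failure
        2: ← q(X, b)
            3: success, X/a
    2: success, X/b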

Sec: 9.4 Top-Down Execution {10{

This SLD-tree comes from the standard PROLOG computation
rule (select the leftmost atom). The selected atoms are underlined.

Sec: 9.4 Top-Down Execution {9{

The SLD-Refutation Procedure


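Example: Consider the goal ← p(x, b) on the program
1. p(x, z) ← q(x, y), p(y, z)
2. p(x, x)
3. q(a, b)

A finite SLD-tree for this program is:

← p(X, b)
    1: ← q(X, Y), p(Y, b)
        3: ← p(b, b)
            1: ← q(b, Z), p(Z, b)
                failure (q(b, Z) unifies with no clause head)
            2: success, X/a
    2: success, X/b

Figure: A finite SLD-tree. Each child is labeled with the number of the
clause used; branches that end in the empty clause are marked as success.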

Sec: 9.4 Top-Down Execution {8{

Examples of SLD-Resolution
Any realization of the top-down evaluation procedure will have to
make two choices at each step by selecting

1. the next goal from the goal list and
2. the rule whose head unifies with the selected goal.

In general, there will be more than one goal and many rules to
choose from. The choice affects the efficiency of the deduction
process and also the actual result when the search falls into an
infinite loop.
PROLOG interpreters usually choose goals in a left-to-right
order and rules in a sequential order that corresponds to
a depth-first search of the SLD-tree with backtracking when
failure occurs. Thus, PROLOG treats the goal list as a stack
onto which goals are pushed or popped, depending on success
or failure.

Sec: 9.4 Top-Down Execution {7{

SLD-resolution. Example

s(X, Y) ← p(X, Y), q(Y).
p(X, 3).
q(3).
q(4).
1. The initial goal list is
← s(5, W)

2. This unifies with the head of the first rule under the mgu {X/5, Y/W}.
New goal list:
← p(5, W), q(W)

3. Say that we choose q(W) as a goal: it unifies with the fact
q(3), under the substitution {W/3}. New goal list:
← p(5, 3)
This unifies with the fact p(X, 3) under the substitution {X/5}.
The goal list becomes empty and we report success.
Thus, a top-down evaluation returns the answer {W/3} for
the query ← s(5, W) from the example program.

However, if we choose instead q(4), ...

Sec: 9.4 Top-Down Execution {6{

SLD-Resolution
Consider a rule r: A ← B1, ..., Bn and a query goal ← g, where r and g
have no variables in common.
If there exists a most general unifier (mgu) θ for A and g, the goal list
← B1θ, ..., Bnθ
is called the resolvent of r and g.
SLD-Resolution Algorithm:
Input: A first-order program P and a goal list G.
Output: An instance of G that was proved from P, or failure.
begin
    Set Res = G;
    While Res is not empty, repeat the following:
        Choose a goal g from Res;
        Choose a rule A ← B1, ..., Bn (n ≥ 0) from P
            such that A and g unify under the mgu θ
            (renaming the variables in the rule as needed);
        If no such rule exists, then
            output failure and exit;
        else
            Delete g from Res;
            Add B1, ..., Bn to Res;
            Apply θ to Res and G;
    If Res is empty then output G
end
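
A minimal Python sketch of this procedure follows. It fixes the two choices the way Prolog does (leftmost goal, rules in program order, depth-first with backtracking through a generator) and omits the occurs check; these are assumptions of the sketch, as are the term encoding (tuples for atoms, a `Var` class for variables) and the helper names `walk`, `unify`, `rename` and `solve`. The program is the s/p/q example from the slides.

```python
from dataclasses import dataclass
from itertools import count

@dataclass(frozen=True)
class Var:
    name: str

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def unify(a, b, s):
    """Extend s so that a and b become equal, or return None (no occurs check)."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, Var):
        return {**s, a: b}
    if isinstance(b, Var):
        return {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def rename(t, n):
    """Give the variables of a rule fresh names for this resolution step."""
    if isinstance(t, Var):
        return Var(f"{t.name}_{n}")
    if isinstance(t, tuple):
        return (t[0],) + tuple(rename(a, n) for a in t[1:])
    return t

fresh = count()

def solve(goals, program, s):
    """Yield one substitution per successful SLD-derivation (depth-first)."""
    if not goals:                      # Res is empty: success
        yield s
        return
    g, rest = goals[0], goals[1:]      # choose the leftmost goal
    for head, body in program:         # try the rules in program order
        n = next(fresh)
        s2 = unify(g, rename(head, n), s)
        if s2 is not None:
            new_goals = [rename(b, n) for b in body] + rest
            yield from solve(new_goals, program, s2)

X, Y, W = Var("X"), Var("Y"), Var("W")
program = [
    (("s", X, Y), [("p", X, Y), ("q", Y)]),
    (("p", X, 3), []),
    (("q", 3), []),
    (("q", 4), []),
]
answers = [walk(W, s) for s in solve([("s", 5, W)], program, {})]
print(answers)   # [3]
```

Like Prolog, this sketch can loop forever on left-recursive rules or cyclic data, since the derivation keeps no memory of the goals already visited.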
Sec: 9.4 Top-Down Execution {5{

Unification
A substitution θ is called a unifier for
two terms A and B if Aθ = Bθ.
Example: The two terms p(f(x), a) and p(y, f(w)) are not
unifiable (i.e., they cannot be made identical), because the second
arguments cannot be unified.
The two terms p(f(x), z) and p(y, a) are unifiable, since γ =
{y/f(a), x/a, z/a} is a unifier.

A unifier θ for two terms is called a most general unifier (mgu)
if for each other unifier γ, there exists a substitution δ such that
γ = θδ.
γ = {y/f(a), x/a, z/a} is not the mgu of p(f(x), z) and
p(y, a).
A most general unifier for these two is θ = {y/f(x), z/a}. Note
that γ = θ{x/a}.
There exist efficient algorithms to perform unification: such
algorithms either return a most general unifier or report that none
exists.
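
One such algorithm can be sketched as a recursive descent over the two terms. This is an illustrative version only, not necessarily one of the efficient algorithms the text alludes to; it assumes terms are encoded as tuples ("f", arg, ...) for compound terms, strings for constants, and instances of a hypothetical `Var` class for variables. The returned dictionary is in triangular form, meaning that a binding may mention variables that are themselves bound elsewhere in the dictionary.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def walk(t, s):
    """Follow variable bindings in s until an unbound variable or a non-variable."""
    while isinstance(t, Var) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    """True if variable v occurs in term t under the bindings s."""
    t = walk(t, s)
    if t == v:
        return True
    if isinstance(t, tuple):
        return any(occurs(v, a, s) for a in t[1:])
    return False

def unify(a, b, s=None):
    """Return a most general unifier of a and b as a dict, or None if none exists."""
    s = {} if s is None else s
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if isinstance(a, Var):
        return None if occurs(a, b, s) else {**s, a: b}
    if isinstance(b, Var):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None                      # clash between constants or functors

x, y, z, w = Var("x"), Var("y"), Var("z"), Var("w")
print(unify(("p", ("f", x), z), ("p", y, "a")))      # binds y to f(x) and z to a
print(unify(("p", ("f", x), "a"), ("p", y, ("f", w))))  # None: a and f(w) clash
```

The first call reproduces the mgu {y/f(x), z/a} from the example; the second reports that the terms are not unifiable.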

Sec: 9.4 Top-Down Execution {4{

Composition of Substitutions
Let θ = {u1/s1, ..., um/sm} and δ = {v1/t1, ..., vn/tn} be
substitutions.
Then the composition θδ of θ and δ is the substitution obtained
from the set
{u1/s1δ, ..., um/smδ, v1/t1, ..., vn/tn}
by deleting any binding ui/siδ for which ui = siδ and deleting
any binding vj/tj for which vj ∈ {u1, ..., um}.
Example:
Let θ = {x/f(y), y/z} and δ = {x/a, y/b, z/y}.
Then θδ = {x/f(b), z/y}.

θ         δ       θδ
x/f(y)    x/a     x/f(b)
y/z       y/b     y/y
z/z       z/y     z/y
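
The definition translates almost line by line into code. The sketch below assumes substitutions are dictionaries from variables to terms, with compound terms as tuples, constants as strings, and a hypothetical `Var` class for variables; `apply` and `compose` are names chosen for this illustration.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def apply(term, theta):
    """Replace every variable of `term` by its binding in `theta`, if any."""
    if isinstance(term, Var):
        return theta.get(term, term)
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply(t, theta) for t in term[1:])
    return term                                   # constant

def compose(theta, delta):
    """Composition: applying the result equals applying theta, then delta."""
    result = {u: apply(s, delta) for u, s in theta.items()}
    result = {u: s for u, s in result.items() if s != u}   # delete u/u bindings
    for v, t in delta.items():
        if v not in theta:                        # keep v/t only if v is not a u_i
            result[v] = t
    return result

x, y, z = Var("x"), Var("y"), Var("z")
theta = {x: ("f", y), y: z}
delta = {x: "a", y: "b", z: y}
print(compose(theta, delta))   # x is bound to f(b) and z to y
```

The binding for y disappears because zδ = y, which would give the trivial binding y/y, exactly as in the table.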

Sec: 9.4 Top-Down Execution {3{

Substitutions
Substitutions: A substitution θ is a finite set of the form {v1/t1, ..., vn/tn},
where each vi is a distinct variable, and each ti is a term distinct
from vi. Each ti is called a binding for vi.
The substitution θ is called a ground substitution if every ti is a
ground term. (Then X/t is an instantiation of X to t.)
Eθ denotes the result of applying the substitution θ to E, i.e.,
of replacing each variable with its respective binding. For instance,
if E = p(x, y, f(a)) and θ = {x/b, y/x}, then Eθ =
p(b, x, f(a)). If γ = {x/c}, then Eγ = p(c, y, f(a)).
Thus variables that are not part of the substitution are left unchanged.
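
Applying a substitution is a simultaneous replacement, which a short sketch makes explicit. The encoding is an assumption of this example: compound terms are tuples, constants are strings, and variables are instances of a hypothetical `Var` class.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Var:
    name: str

def apply(term, theta):
    """Replace each variable of `term` by its binding in `theta`, in one pass."""
    if isinstance(term, Var):
        return theta.get(term, term)       # unbound variables stay unchanged
    if isinstance(term, tuple):
        return (term[0],) + tuple(apply(t, theta) for t in term[1:])
    return term                            # constant

x, y = Var("x"), Var("y")
E = ("p", x, y, ("f", "a"))
print(apply(E, {x: "b", y: x}))   # p(b, x, f(a))
print(apply(E, {x: "c"}))         # p(c, y, f(a))
```

In the first call the x introduced by the binding y/x is not rewritten again to b, because all bindings are applied at once rather than one after another.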

Sec: 9.4 Top-Down Execution {2{

Passing Bindings from Goals to Heads

r1: part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
r2: part_weight(No, Kilos) ← part(No, Shape, unitkg(K)),
                             area(Shape, Area), Kilos = K * Area.
r3: area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14/4.
r4: area(rectangle(Base, Height), A) ← A = Base * Height.

The goal area(Shape, Area) in rule r2 can be viewed as a call to
the procedure area defined by rules r3 and r4.
Thus, for instance, Shape is instantiated to circle(11) for the execution
of r3, and to rectangle(10, 20) for the execution of r4.
Instantiated to c means "assigned the value of the constant c".
Shape/rectangle(10, 20) denotes that Shape has been instantiated
to rectangle(10, 20).
Arguments can be complex; thus the passing of parameters is
performed through a process known as unification.
For Shape = rectangle(10, 20), this is made equal to (unified to)
the first argument of the second area rule, rectangle(Base,
Height), by setting Base = 10 and Height = 20.

Sec: 9.4 Top-Down Execution {1{

Top-Down Execution of Datalog

• A strict bottom-up execution strategy is frequently neither
natural nor efficient.
• Pure top-down, SLD-resolution, Prolog
• Mixing top-down and bottom-up in deductive databases

Sec: 9.5 Rule Rewriting Methods {4{

Right-Linear Rules
Consider now the right-linear formulation of ancestor:
Right-linear rules for the descendants of Tom
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).

With these right-linear rules the query ?anc($Name, X) can no
longer be implemented by specializing the program.
Solution: turn the rules into equivalent left-recursive ones!
The situation is symmetric. A query such as ?anc(X, $Y) cannot
be supported on the left-linear version of the program. But the
program can be transformed into the right-linear rules above,
to which specialization can apply.
Deductive database compilers do that.

Sec: 9.5 Rule Rewriting Methods {3{

Left-Linear and Right-Linear Recursion


?anc(tom, Desc).

anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).

?anc(tom, Desc)
anc(Old/tom, Young) ← parent(Old/tom, Young).
anc(Old/tom, Young) ← anc(Old/tom, Mid), parent(Mid, Young).

These are left-linear recursive rules.

Query forms are used for compilation: ?anc($Someone, Desc).

Sec: 9.5 Rule Rewriting Methods {2{

Specialization of the Original Rules


This is also a partial instantiation of the program, which we will call
specialization.

The result is the same as pushing selections (equality to constants)
into the equivalent relational algebra expression.
For non-recursive rules this is simple.
For recursive rules it is complicated.

Sec: 9.5 Rule Rewriting Methods {1{

Unification at Compile Time

Blood Relations
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).
grandma(Old, Young) ← parent(Mid, Young), mother(Old, Mid).
parent(F, Cf) ← father(F, Cf).
parent(M, Cm) ← mother(M, Cm).

Find the grandma of marc

?grandma(GM, marc)

grandma(Old, Young/marc) ← parent(Mid, Young/marc),
                           mother(Old, Mid).
parent(F, Cf/marc) ← father(F, Cf/marc).
parent(M, Cm/marc) ← mother(M, Cm/marc).

This is a form of partial evaluation.

Sec: 9.5.4 Supplementary Magic {9{

Supplementary Magic Sets

• Only those variables that are needed for the second fixpoint
are stored in the supplementary magic relations: thus St is
not included.
• The method of choice in many prototypes because of its generality
and robustness.
• The method works with cycles in the database.
• Ability to store one-way predicates, such as X = f(Y, Z, ...).

Sec: 9.5.4 Supplementary Magic {8{

Benefits of Memorizing
People who are of the same generation through common ancestors
who are less than 12 levels remote and always lived
in the same state
?stsg(marc, 12, Z).
stsg(X, K, Y) ← parent(XP, X), K > 0, KP = K − 1,
                born(X, St), born(XP, St),
                stsg(XP, KP, YP),
                parent(YP, Y).
stsg(X, K, X).

Since the first two arguments of stsg are bound, the supplementary
magic method for this example is:

m.stsg(marc, 12).
spm.stsg(X, K, XP, KP) ← m.stsg(X, K),
                         parent(XP, X), K > 0, KP = K − 1,
                         born(X, St), born(XP, St).
m.stsg(XP, KP) ← spm.stsg(X, K, XP, KP).
stsg(X, K, X) ← m.stsg(X, K).
stsg(X, K, Y) ← stsg(XP, KP, YP), spm.stsg(X, K, XP, KP),
                parent(YP, Y).

Sec: 9.5.4 Supplementary Magic {7{

Supplementary Magic Sets

m.sg(marc).
m.sg(XP) ← m.sg(X), parent(XP, X).
spm.sg(X, XP) ← parent(XP, X), m.sg(X).
sg'(X, X) ← m.sg(X).
sg'(X, Y) ← sg'(XP, YP), spm.sg(X, XP), parent(YP, Y).
% sg'(X, Y) ← parent(XP, X), sg'(XP, YP), parent(YP, Y), m.sg(X).

?sg'(marc, Z).

In addition to the magic predicates, supplementary predicates are
used to store the pairs bound-arguments-in-head/bound-arguments-in-recursive-goal.
The magic set method and supplementary magic set method are
very similar; often the first term is used to refer to both methods.
However, there are important differences:
The magic predicate and the supplementary magic predicate are
normally written in a mutually recursive form.

m.sg(marc).
spm.sg(X, XP) ← m.sg(X), parent(XP, X).
m.sg(XP) ← spm.sg(X, XP).

Sec: 9.5.3 Same-Generation Example {6{

The counting method: pros and cons

• The counting method is often more efficient than the magic-set
method.
• However, it is not as general: e.g., add the goal X ≠ Y to
ensure that marc is left out from the people who are
of the same generation as marc. One then needs to memorize.
• Cycles in the database will throw it into a loop, just as with Prolog.
• Many approaches that combine the strengths
of magic and counting have been developed.

Sec: 9.5.3 Same-Generation Example {5{

The Counting Method

The query "people who are of the same generation as marc" is
logically equivalent to:

1. Find the ancestors of marc and their levels, where marc is
   a zero-level ancestor of himself, his parents are first-generation
   (i.e., first-level) ancestors, his grandparents are second-generation
   ancestors, and so on.
   This computation is performed by the predicate sg_up.
2. Switch to the computation of descendants.
3. Perform the computation of descendants, while decreasing
   the level by one at each step.
   This is performed by the predicate sg_dwn.
4. Check when you return to level 0 to find those who are of the
   same generation as marc.

Find ancestors of marc, and then their descendants

sg_up(0, marc).
sg_up(J + 1, XP) ← parent(XP, X), sg_up(J, X).
sg_dwn(J, X) ← sg_up(J, X).
sg_dwn(J - 1, Y) ← sg_dwn(J, YP), parent(YP, Y).
?sg_dwn(0, Z).
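
The four steps above can be sketched in Python as a bottom-up evaluation of sg_up and sg_dwn. This is only an illustration under stated assumptions: the function name, the representation of parent as a set of (parent, child) pairs, and the sample family are hypothetical, and the guard that stops the descent at level 0 is a pruning added here, not part of the rules.

```python
from collections import defaultdict

def same_generation_counting(parent, start):
    """parent: set of (parent, child) pairs (hypothetical representation).
    Returns the people of the same generation as `start`, `start` included."""
    parents_of, children_of = defaultdict(set), defaultdict(set)
    for p, c in parent:
        parents_of[c].add(p)
        children_of[p].add(c)

    # sg_up: climb from `start`, recording the level of every ancestor.
    # On cyclic data the levels grow forever, as the slides warn.
    up = {(0, start)}
    frontier = set(up)
    while frontier:
        frontier = {(j + 1, xp) for j, x in frontier
                    for xp in parents_of[x]} - up
        up |= frontier

    # sg_dwn: descend again, decreasing the level by one at each step.
    # Levels below 0 cannot contribute to ?sg_dwn(0, Z), so this sketch
    # does not generate them (an added pruning).
    dwn = set(up)
    frontier = set(up)
    while frontier:
        frontier = {(j - 1, y) for j, yp in frontier if j > 0
                    for y in children_of[yp]} - dwn
        dwn |= frontier

    # ?sg_dwn(0, Z)
    return {z for j, z in dwn if j == 0}

# Hypothetical sample data: sue is the parent of tom and bob, and so on.
parent = {("sue", "tom"), ("sue", "bob"), ("tom", "marc"),
          ("tom", "ann"), ("bob", "carl")}
same_gen = same_generation_counting(parent, "marc")
```

Only the level is carried along during the descent, which is why the method needs nothing but a counter next to each value.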

Sec: 9.5.2 The Magic Sets Method {4{

Computing the Magic Predicate

?sg(marc, Who).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
sg(A, A).

Binding analysis of the top-down behavior.

The first argument in the query is bound: thus X is bound and, through
the goal parent(XP, X), the binding is passed to XP in the recursive
goal.
The variables Y, YP remain unbound.
The rules for the magic predicates can be obtained by:
(1) using the query constant as the exit rule (a fact).
(2) using the top-down bound arguments and predicates for the
recursive rule; however, head and tail must be reversed!

Sec: 9.5.2 The Magic Sets Method {3{

Only ancestors of marc are of interest


The sg rules are linear, but not left-linear or right-linear.
Say that the predicate m.sg(X) computes the ancestors of marc.
We can add m.sg(X) to the exit rule to make it safe, and to the
recursive rule to make it more selective:

?sg'(marc, Z).
sg'(X, X) ← m.sg(X).
sg'(X, Y) ← parent(XP, X), sg'(XP, YP), parent(YP, Y), m.sg(X).

Now, m.sg(X) is called the magic predicate for sg and can be
computed from the original program as follows:
m.sg(marc).
m.sg(XP) ← m.sg(X), parent(XP, X).
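
As a rough illustration of how the rewritten program evaluates bottom-up, the sketch below first computes the magic set m.sg and then the restricted relation sg'. The function name, the (parent, child) pair representation, and the sample data are assumptions made for this example.

```python
from collections import defaultdict

def same_generation_magic(parent, start):
    """parent: set of (parent, child) pairs (hypothetical representation)."""
    parents_of, children_of = defaultdict(set), defaultdict(set)
    for p, c in parent:
        parents_of[c].add(p)
        children_of[p].add(c)

    # m.sg(start).   m.sg(XP) <- m.sg(X), parent(XP, X).
    magic = {start}
    frontier = {start}
    while frontier:
        frontier = {xp for x in frontier for xp in parents_of[x]} - magic
        magic |= frontier

    # sg'(X, X) <- m.sg(X).
    sg = {(x, x) for x in magic}
    # sg'(X, Y) <- parent(XP, X), sg'(XP, YP), parent(YP, Y), m.sg(X).
    while True:
        derived = {(x, y)
                   for x in magic
                   for xp in parents_of[x]
                   for a, yp in sg if a == xp
                   for y in children_of[yp]}
        if derived <= sg:      # nothing new: fixpoint reached
            break
        sg |= derived

    # ?sg'(start, Z)
    return {z for x, z in sg if x == start}

# Hypothetical sample data.
parent = {("sue", "tom"), ("sue", "bob"), ("tom", "marc"),
          ("tom", "ann"), ("bob", "carl")}
answers = same_generation_magic(parent, "marc")
```

Because every derived pair has its first component in the magic set, pairs about people unrelated to marc are never produced.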

Sec: 9.5.2 The Magic Sets Method {2{

The Top-Down Computation for the same-generation

?sg(marc, Who).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
sg(A, A).

• If parent(tom, marc) is in the database, the resolvent of the
  query goal with the first rule is
  parent(XP, marc), sg(XP, YP), parent(YP, Y).
  Then, by unifying the first goal with the fact parent(tom, marc),
  the new goal list becomes:
  sg(tom, YP), parent(YP, Y).
• The recursive call unfolds as in the previous case, yielding the
  parents of tom, who are the grandparents of marc.
  Thus the top-down computation generates all the ancestors
  of marc using the recursive rule.
• The binding has been passed from the first argument in the
  head to the first argument of the recursive predicate in the
  body. The computation causes the instantiation of variables
  X and XP, while Y and YP remain unbound.

Sec: 9.5.2 The Magic Sets Method {1{

The Same-Generation Example


People are of the same generation if their parents are of the
same generation
?sg(marc, Who).
sg(X, Y) ← parent(XP, X), sg(XP, YP), parent(YP, Y).
sg(A, A).

This program cannot be computed in a bottom-up fashion because
the exit rule is not safe.
Even if we make it safe by adding a goal such as people(A), it
is inefficient to compute in a bottom-up fashion since all same-
generation pairs are produced, while we only want those that have
marc as their first component.

Sec: 9.6 Compilation and Optimization {17{

Three-Way Join

p1(X, Y), p2(Y, Z), p3(Y, W)

The basic nested-loop join:

Loop 1: for each tuple in p1 do
    Loop 2: for each tuple in p2 (joining with p1) do
        Loop 3: for each tuple in p3
                (joining with p1 and p2) do
            return the computed tuple
        end Loop 3
    end Loop 2
end Loop 1.

But if p3(Y, W) fails on the first tuple (i.e., step (i) fails), there
is no point in going back to Loop 2, since only a new value of Y
can make it succeed!
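
The observation about skipping Loop 2 can be tried out with a small Python sketch. The function name, the list-of-tuples relations, and the scan counter are assumptions introduced for the example; the counter only serves to make the saving visible.

```python
def three_way_join(p1, p2, p3):
    """Nested-loop join of p1(X, Y), p2(Y, Z), p3(Y, W).
    Returns the joined tuples and the number of scans of p3."""
    results, p3_scans = [], 0
    for x, y in p1:                          # Loop 1
        for y2, z in p2:                     # Loop 2
            if y2 != y:
                continue
            p3_scans += 1
            matched = False
            for y3, w in p3:                 # Loop 3
                if y3 == y:
                    matched = True
                    results.append((x, y, z, w))
            if not matched:
                # p3 failed on its first tuple for this Y: another Z
                # cannot help, so go straight back to Loop 1.
                break
    return results, p3_scans

# Hypothetical sample relations.
p1 = [(1, "a"), (2, "b")]
p2 = [("a", 10), ("a", 11), ("b", 20)]
p3 = [("b", 100)]
joined, scans = three_way_join(p1, p2, p3)
```

Without the `break`, p3 would be scanned once more for the second p2 tuple with Y = "a", with the same empty result.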

Sec: 9.6 Compilation and Optimization {16{

Existential Variables: Optimization


Existential Variables.

p(X) ← q1(X, Y), q2(Y, Z), ¬q3(W)

If q2(Y, Z) succeeds or fails for a certain value of Y, there is no need
to find all the other values of Z.
Same for W. Z and W are existential variables.
A tuple-oriented model of computation:
(i) Get-first tuple in the relation (joining with the previous
    tuples, if any)
(ii) Get-next of the same, and repeat this step until there are no
    more such tuples

For q1(X, Y) both steps must be performed.
For q3(W) and q2($Y, Z) only step (i) is executed.

Sec: 9.6 Compilation and Optimization {15{

Optimization (cont.)
Ideally, the costs and benefits of different recursive methods should
be quantified and compared. But quantification is often expensive
and prediction is unreliable.
In practice, therefore, only very coarse criteria are given, e.g., use
certain goals as chain goals in the SIP.
Even for nonrecursive rules, full cost-based optimization is
problematic (many goals deeply stacked). Heuristic approaches are
used instead. E.g., Glue/Nail! uses the following heuristic: do
first the goals with more bound arguments; and between two
goals with the same number of bound arguments, select the one
which has fewer unbound arguments.
Another option is to follow the order of goals specified by the user,
as in LDL++ and Prolog.
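
The goal-ordering heuristic described above can be sketched as a greedy loop. This is not Glue/Nail! code: the function name and the goal representation (a name plus a tuple of variable names) are hypothetical, and the sketch assumes that executing a goal binds all of its variables.

```python
def order_goals(goals, bound_vars):
    """goals: list of (name, variables) pairs; bound_vars: variables
    bound before the rule body starts. Returns goal names in the
    order chosen by the heuristic."""
    bound = set(bound_vars)
    remaining = list(goals)
    ordered = []
    while remaining:
        def key(goal):
            _, args = goal
            n_bound = sum(1 for a in args if a in bound)
            # more bound arguments first, then fewer unbound arguments
            return (-n_bound, len(args) - n_bound)
        best = min(remaining, key=key)   # ties keep the textual order
        remaining.remove(best)
        ordered.append(best[0])
        bound |= set(best[1])            # assumed: the goal binds its variables
    return ordered

# Hypothetical rule body q3(W), q2(Y, Z), q1(X, Y) with X bound.
body = [("q3", ("W",)), ("q2", ("Y", "Z")), ("q1", ("X", "Y"))]
plan = order_goals(body, {"X"})
```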

Sec: 9.6 Compilation and Optimization {14{

Optimization
In relational databases there are two kinds of optimization:

1. Greedy optimization: whenever a technique is applicable,
   apply it. E.g., always push selection and projection into
   relational expressions. Computationally this is not very expensive.
2. Cost-based optimization: evaluate alternatives and predict
   the cost. Then choose the least-expected-cost solution. This
   is done for choosing a join order. Basically exponential in the
   number of joins being evaluated.

Deductive database prototypes mostly follow the first approach,
e.g., in choosing the method for recursion.

1. The binding passing property is tested, and if satisfied
2. The applicability of the following methods is considered in
   the order shown:
   1. left-/right-linear rules
      [1.5 Counting Method]
   2. magic or supplementary magic sets
      [2.5 Generalized magic sets methods]

Most systems will not bother with [1.5] or [2.5].


Sec: 9.6 Compilation and Optimization {13{

Generalizations
• Unique binding property. Relaxing this assumption does
  not require major modifications or extensions.
• No Sideways Information Passing (SIP) between recursive
  goals: only goals from lower strata can be used as chain goals.
  This assumption can be removed, yielding the Generalized
  Magic Set method.
  The programs produced by this extension tend to be complex
  and inefficient to execute.
• In the CORAL system, not all the variables are required to
  be instantiated after a goal executes.

Sec: 9.6 Compilation and Optimization {12{

Trivial Second Phase (cont.)

After dropping the recursive rule along with the
condition in the first argument of the query goal, we obtain:

m.anc(tom).
m.anc(Mid) ← m.anc(Old), parent(Old, Mid).
anc'(Old, Young) ← m.anc(Old), father(Old, Young).
?anc'(_, Young).

Sec: 9.6 Compilation and Optimization {11{

Trivial Second Phase


The descendants of tom with right-linear rules:

?anc(tom, Desc).
anc(Old, Young) ← father(Old, Young).
anc(Old, Young) ← parent(Old, Mid), anc(Mid, Young).

Magic-set rewriting:

m.anc(tom).
m.anc(Mid) ← m.anc(Old), parent(Old, Mid).
anc'(Old, Young) ← m.anc(Old), father(Old, Young).
anc'(Old, Young) ← parent(Old, Mid), anc'(Mid, Young),
                   m.anc(Old).
?anc'(tom, Young).

Observe that the recursive rule just copies the value of Young
generated by the exit rule, from the tail to the head. This value
of Young is returned as an answer if, after a few iterations, Old =
tom. But that is always true since this rule basically revisits the
magic-set computation.
Thus the recursive rule can be dropped along with the condition
in the first argument of the query goal.

Sec: 9.6 Compilation and Optimization {10{

Selecting a Method for Recursion


The rewriting for the magic sets method can be used as the basis
for other methods. In particular, it can
also be used as the basis for detecting and handling the special
cases of left-linear and right-linear rules.
For instance,

?anc(tom, Desc).
anc(Old, Young) ← parent(Old, Young).
anc(Old, Young) ← anc(Old, Mid), parent(Mid, Young).

If we write the magic rules for this example we obtain:

m.anc(tom).
m.anc(Old) ← m.anc(Old).

The recursive magic rule above is trivial and can be eliminated
(trivial first phase).
The magic relation m.anc now contains only the value tom; rather
than appending the magic predicate goal to the original rules, we
can substitute this value directly into the rules.

Sec: 9.6 Compilation and Optimization {9{

Binding Passing Property


Algorithm for Binding Passing Analysis

1. Initially A = {q^α}, with q^α the initial goal, where q is a
   recursive predicate and α is not a totally free adornment.
2. For each h ∈ A, pass the binding to the heads of rules defining
   q.
3. For each recursive rule, determine the adornments of its
   recursive goals (i.e., of q or predicates mutually recursive with
   q).
4. If the last step generated adornments not currently in A, add
   them to A and resume from step 2. Otherwise halt.

The calling goal g is said to have the

1. binding passing property when A does not contain any
   recursive predicate with totally free adornment.
2. unique binding passing property if the binding passing
   property holds and A contains only one pattern for each
   recursive predicate.
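
A simplified Python sketch of the analysis follows. It is an illustration under several assumptions, not the algorithm of any particular system: rules are (head, body) pairs of (predicate, variables) tuples, arguments are plain variables, nonrecursive goals with at least one bound argument are treated as chain goals, and all names are hypothetical.

```python
def binding_passing(rules, recursive, query_pred, query_adornment):
    """Returns (adornments, has_binding_passing, has_unique_binding_passing)."""
    adornments = {(query_pred, query_adornment)}        # the set A
    worklist = [(query_pred, query_adornment)]
    while worklist:
        pred, adornment = worklist.pop()
        for (head_pred, head_args), body in rules:
            if head_pred != pred:
                continue
            # Step 2: pass the binding to the head of the rule.
            bound = {v for v, a in zip(head_args, adornment) if a == "b"}
            for goal_pred, goal_args in body:
                if goal_pred in recursive:
                    # Step 3: adorn the recursive goal; it passes no bindings on.
                    goal_adornment = "".join(
                        "b" if v in bound else "f" for v in goal_args)
                    if (goal_pred, goal_adornment) not in adornments:
                        adornments.add((goal_pred, goal_adornment))   # step 4
                        worklist.append((goal_pred, goal_adornment))
                elif bound & set(goal_args):
                    # Chain goal: selective, so it binds its other variables.
                    bound |= set(goal_args)
    has_bp = all("b" in a for _, a in adornments)
    unique = has_bp and len(adornments) == len({p for p, _ in adornments})
    return adornments, has_bp, unique

# Same-generation rules with the query sg^bf.
sg_rules = [
    (("sg", ("X", "Y")),
     [("parent", ("XP", "X")), ("sg", ("XP", "YP")), ("parent", ("YP", "Y"))]),
    (("sg", ("A", "A")), []),
]
sg_result = binding_passing(sg_rules, {"sg"}, "sg", "bf")

# Left-linear ancestor rules with the query anc^fb.
anc_rules = [
    (("anc", ("Old", "Young")), [("parent", ("Old", "Young"))]),
    (("anc", ("Old", "Young")),
     [("anc", ("Old", "Mid")), ("parent", ("Mid", "Young"))]),
]
anc_result = binding_passing(anc_rules, {"anc"}, "anc", "fb")
```

For anc^fb the recursive goal is reached with adornment ff, so under the left-to-right order used here the binding passing property fails.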

Sec: 9.6 Compilation and Optimization {8{

Safety (cont.)
The basic idea behind the notion of chain goals is that the binding
in the head will have to reduce the search space. Any goal
that is called with all its adornment free will not be beneficial in
that respect. Also, there is no sideways information passing (SIP)
between two recursive goals; bindings come only from the head
through nonrecursive goals.
If q is not a recursive predicate, then safety is determined as
previously described.
If q is a recursive goal, then it belongs to a lower stratum;
therefore, safety can be determined independently using the techniques
described here for recursive predicates.
Since we have a finite number of strata the process soon terminates.

Sec: 9.6 Compilation and Optimization {7{

Binding passing analysis for recursive predicates
The Same-Generation query

[Figure: the query goal ?stsg^bbf passes its bindings to a rule
with head stsg^bbf, whose goals are adorned parent^bf, >^bb, =^fb,
born^bf, born^bb, and stsg^bbf.]

Only chain goals are used in the top-down propagation. An
adorned goal q^α in a recursive rule r is called a chain goal when:

1. SIP independence of recursive goals: q is not a recursive
   goal (i.e., not the same predicate as that in the head of r,
   nor a predicate mutually recursive with it; however, recursive
   predicates of lower strata can be used as chain goals).
2. Selectivity: q^α has some argument bound (according to the
   bound variables in the head of r and the chain goals to the
   left of q^α).
3. Safety: q^α is a safe goal.

Sec: 9.6 Compilation and Optimization {6{

Recursive Predicates
The treatment of recursive predicates is somewhat more complex
because a choice of recursive methods must be performed along
with the binding passing analysis.
The simplest case occurs when the goal calling a recursive
predicate has no bound argument. The recursive predicate, say p,
and all the predicates that are mutually recursive with it, will be
computed in a single differential fixpoint.
The construction of the rule graph for a recursive rule is the same
as for a nonrecursive one.

1. The head of the rule is assumed to have no bound argument,
   and
2. Safety analysis is performed by treating the recursive goals
   (i.e., p and predicates mutually recursive with it) as safe a
   priori; in fact, they are bound to the values computed in the
   previous step.

Sec: 9.6 Compilation and Optimization {5{

Computation of Safe Nonrecursive Programs:
Let us have a safe rgg(P) tree.
Every non-leaf node (a goal) with bound adornments is
basically computed in two phases.

1. In the first phase, the bound values of a goal's arguments are
   passed to its defining rules, i.e., its children in the rule-goal
   graph.
2. In the second phase, the goal receives the values of the f-
   adorned arguments from its children.

Only the second computation takes place for goals without bound
arguments.
The computation of the heads of the rules follows the computation
of all the goals in the body. Thus, we have a strict stratification
where predicates are computed according to the postorder
traversal of the rule-goal graph.

Sec: 9.6 Compilation and Optimization {4{

Safety
Safe a priori:
1. For instance, base predicates are safe for every adornment.
   Thus, part^fff is safe.
2. The pattern θ^bb is safe for θ denoting any comparison
   operator, such as < or >.
3. Moreover, there is the special case of =^bf or =^fb where the
   free argument consists of only one variable; in either case the
   arithmetic expression in the bound argument can be computed
   and the resulting value can be assigned to the free
   variable.
   (These are the basic patterns: a more sophisticated compiler
   could solve more equations and accept other patterns as
   safe.)

Inductive Definition for Safety: Let P be a program with rule-goal
graph rgg(P), where rgg(P) is a tree (DAGs can be reduced
to trees).
Then P is safe if the following two conditions hold:

1. Every leaf node of rgg(P) is safe a priori, and
2. Every variable in every rule in rgg(P) is bound after the last
   goal.
Sec: 9.6 Compilation and Optimization {3{

Unification of Goal and Head: example

Goal ?g, with: g = p(f(X1), Y1, Z1, a)
Rule: r : p(X2, g(X2, Y2), Y2, W2) ← ...
Thus: h(r) = p(X2, g(X2, Y2), Y2, W2) (If g and h(r) had variables
in common, then a renaming step would be required here.)
A most general unifier for g and h(r) is:
γ = {X2/f(X1), Y1/g(f(X1), Y2), Z1/Y2, W2/a},
yielding:

gγ = h(r)γ = p(f(X1), g(f(X1), Y2), Y2, a)

If the adorned goal is p^bffb: variables in the first argument of the
head (i.e., X1) are bound. The resulting adorned head is p^bffb,
and there is an edge from p^bffb to p^bffb.
If the adorned goal is p^fbfb: all the variables in the second
argument of the head (i.e., X1, Y2) are bound. Then the remaining
arguments of the head are bound as well. In this case, there is an
edge from the adorned goal p^fbfb to the adorned head p^bbbb.
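
The example can be replayed with a small textbook-style unifier. The term encoding is an assumption made for this sketch: variables are strings starting with an upper-case letter, constants are lower-case strings, and compound terms are tuples whose first element is the functor. All function names are hypothetical.

```python
def is_var(t):
    return isinstance(t, str) and t[0].isupper()

def walk(t, s):
    """Follow variable bindings in substitution s."""
    while is_var(t) and t in s:
        t = s[t]
    return t

def occurs(v, t, s):
    t = walk(t, s)
    if t == v:
        return True
    return isinstance(t, tuple) and any(occurs(v, a, s) for a in t[1:])

def unify(a, b, s):
    """Return an mgu extending s, or None if a and b do not unify."""
    a, b = walk(a, s), walk(b, s)
    if a == b:
        return s
    if is_var(a):
        return None if occurs(a, b, s) else {**s, a: b}
    if is_var(b):
        return None if occurs(b, a, s) else {**s, b: a}
    if (isinstance(a, tuple) and isinstance(b, tuple)
            and a[0] == b[0] and len(a) == len(b)):
        for x, y in zip(a[1:], b[1:]):
            s = unify(x, y, s)
            if s is None:
                return None
        return s
    return None

def apply(t, s):
    t = walk(t, s)
    if isinstance(t, tuple):
        return (t[0],) + tuple(apply(a, s) for a in t[1:])
    return t

def term_vars(t):
    if is_var(t):
        return {t}
    if isinstance(t, tuple):
        return set().union(*(term_vars(a) for a in t[1:]))
    return set()

def adorn_head(goal, adornment, head):
    """Adornment of h(r)γ for a goal with the given adornment."""
    s = unify(goal, head, {})
    g, h = apply(goal, s), apply(head, s)
    bound = set().union(*(term_vars(arg)
                          for arg, a in zip(g[1:], adornment) if a == "b"))
    return "".join("b" if term_vars(arg) <= bound else "f" for arg in h[1:])

goal = ("p", ("f", "X1"), "Y1", "Z1", "a")
head = ("p", "X2", ("g", "X2", "Y2"), "Y2", "W2")
mgu = unify(goal, head, {})
unified = apply(goal, mgu)
```

An argument made only of constants has no variables, so it is adorned b, which is why the fourth argument is bound in both cases.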

Sec: 9.6 Compilation and Optimization {2{

Rule-Goal graph for a Nonrecursive P

The graph depicts all possible top-down, left-to-right executions.
Construction of the rule-goal graph rgg(P) for a nonrecursive
program P.
1. Initial step: The query goal is adorned according to the
   constants and deferred constants (i.e., the variables preceded
   by $), and becomes the root of rgg(P).
2. Bindings passing from goals to rule heads: If the calling goal
   g unifies with the head of the rule r, with mgu γ, then we
   draw an edge (labeled with the name of the rule, i.e., r)
   from the adorned calling goal to the adorned head, where
   the adornments for h(r)γ are computed as follows: (i) all
   arguments bound in g are marked bound in h(r)γ; (ii) all
   variables in such arguments are also marked bound; and
   (iii) the arguments in h(r)γ that contain only constants
   or variables marked bound in (ii) are adorned b, while
   the others are adorned f.
3. Left-to-right passing of bindings to goals:
   A variable X is bound after the nth goal in a rule if X
   is among the bound head variables (as for the last step),
   or if X appears in one of the goals of the rule up to and
   including the nth goal.
   The (n + 1)th goal of the rule is adorned on the basis of
   the variables that are bound after the nth goal.
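
Step 3 can be illustrated with a short Python sketch that adorns the goals of one rule body from left to right. The representation is an assumption: each argument is given as the tuple of variable names it contains (an empty tuple for a constant), and the function name is hypothetical. The sample body is rule r2 of the Flat Parts Example, called with no bound head variable.

```python
def adorn_body(bound_head_vars, body):
    """body: list of (predicate, arguments); each argument is the tuple
    of variables occurring in it. Returns the adorned goals and the set
    of variables bound after the last goal."""
    bound = set(bound_head_vars)
    adorned = []
    for pred, args in body:
        # An argument is bound when all of its variables are bound.
        adornment = "".join("b" if set(arg) <= bound else "f" for arg in args)
        adorned.append(f"{pred}^{adornment}")
        # After the goal, every variable appearing in it is bound.
        for arg in args:
            bound |= set(arg)
    return adorned, bound

# r2: part_weight(No, Kilos) <- part(No, Shape, unitkg(K)),
#                               area(Shape, Area), Kilos = K * Area.
r2_body = [
    ("part", [("No",), ("Shape",), ("K",)]),
    ("area", [("Shape",), ("Area",)]),
    ("=", [("Kilos",), ("K", "Area")]),
]
adorned_goals, bound_after = adorn_body((), r2_body)
```

Checking that the head variables No and Kilos are in `bound_after` corresponds to the second safety condition (every variable is bound after the last goal).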
Sec: 9.6 Compilation and Optimization {1{

Rule-Goal Graph
• The graph has as nodes rules with adorned predicate names.
• The adornment of the predicate is the superscript that denotes
  bound/free arguments.
E.g., the rule-goal graph for the Flat Parts Example and
query: ?part_weight(Part, Weight).

[Figure: rule-goal graph. The root part_weight^ff has two edges,
r1 and r2, to the adorned rules
  r1: part_weight^ff ← part^fff
  r2: part_weight^ff ← part^fff, area^bf, =^fb
The goal area^bf in r2 has two edges, r3 and r4, each leading to an
adorned rule area^bf ← =^fb.]

The Flat Parts Example:

r1 : part_weight(No, Kilos) ← part(No, _, actualkg(Kilos)).
r2 : part_weight(No, Kilos) ← part(No, Shape, unitkg(K)),
                              area(Shape, Area),
                              Kilos = K * Area.
r3 : area(circle(Dmtr), A) ← A = Dmtr * Dmtr * 3.14/4.
r4 : area(rectangle(Base, Height), A) ← A = Base * Height.

Sec: 9.7 Recursive Queries SQL {5{

Recursion in SQL and Datalog


Take the similar query:
SELECT *
FROM all_subparts
WHERE Minor = 'top tube'

expressed against the virtual view:

CREATE RECURSIVE VIEW all_subparts(Major, Minor) AS
SELECT PART, SUBPART
FROM assembly
UNION
SELECT all.Major, assb.SUBPART
FROM all_subparts all, assembly assb
WHERE all.Minor = assb.PART

Here, the addition of the condition Minor = 'top tube' to the
recursive select would not produce an equivalent query.
Instead, the SQL compiler must transform the original recursive
select into its right-linear equivalent before the condition Minor =
'top tube' can be attached to the WHERE clause.
In general the compilation techniques usable for such transformations
are basically those previously described for Datalog.
Also, stratification w.r.t. negation and aggregates is required in
the proposed SQL3 standards.
Sec: 9.7 Recursive Queries SQL {4{

Left-Linear and Right-Linear Recursion


Find the parts using top tube
WITH RECURSIVE all_super(Major, Minor) AS
( SELECT PART, SUBPART
  FROM assembly
  UNION
  SELECT assb.PART, all.Minor
  FROM assembly assb, all_super all
  WHERE assb.SUBPART = all.Major )
SELECT *
FROM all_super
WHERE Minor = 'top tube'
This can be supported by simply adding the condition Minor
= 'top tube' to the WHERE clauses in the exit select and the
recursive select, yielding:
WITH RECURSIVE all_super(Major, Minor) AS
( SELECT PART, SUBPART
  FROM assembly
  WHERE SUBPART = 'top tube'
  UNION
  SELECT assb.PART, all.Minor
  FROM assembly assb, all_super all
  WHERE assb.SUBPART = all.Major
  AND all.Minor = 'top tube')
SELECT *
FROM all_super

Sec: 9.7 Recursive Queries SQL {3{

Implementation of Recursive SQL Queries


All Parts/Subparts: transitive closure in SQL3:

CREATE RECURSIVE VIEW all_subparts(Major, Minor) AS
SELECT PART, SUBPART
FROM assembly
UNION
SELECT all.Major, assb.SUBPART
FROM all_subparts all, assembly assb
WHERE all.Minor = assb.PART

To implement the differential fixpoint improvement one only needs
to replace the recursive relation all_subparts in the FROM clause
by δall_subparts, where δall_subparts contains the new tuples
generated in the previous iteration of the differential fixpoint
algorithm.
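
The effect of replacing all_subparts by its delta can be sketched in Python. This is only an illustration of the differential (semi-naive) iteration: the function name and the representation of assembly as a set of (PART, SUBPART) pairs are assumptions, and the sample rows are a small subset of the BoM data.

```python
def all_subparts(assembly):
    """assembly: set of (PART, SUBPART) pairs. Returns the (Major, Minor)
    pairs of the recursive view, computed with a differential fixpoint."""
    total = set(assembly)          # exit select
    delta = set(assembly)          # new tuples of the previous iteration
    while delta:
        # Recursive select, with all_subparts replaced by its delta.
        derived = {(major, sub)
                   for major, minor in delta
                   for part, sub in assembly
                   if minor == part}
        delta = derived - total    # keep only the genuinely new tuples
        total |= delta
    return total

assembly = {("bike", "frame"), ("bike", "wheel"),
            ("frame", "top tube"), ("wheel", "spoke")}
closure = all_subparts(assembly)
```

Each iteration joins only the tuples found in the previous one, instead of the whole relation accumulated so far.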

Sec: 9.7 Recursive Queries SQL {2{

The WITH Construct


Since all_subparts is a virtual view, an actual query on this view
is needed to materialize the recursive relation or portions thereof.
Materialization of the recursive view from the previous
example:
SELECT *
FROM all_subparts

The WITH construct provides another way, and a more direct one,
to express recursion in SQL3.
Find the parts using top tube
WITH RECURSIVE all_super(Major, Minor) AS
( SELECT PART, SUBPART
  FROM assembly
  UNION
  SELECT assb.PART, all.Minor
  FROM assembly assb, all_super all
  WHERE assb.SUBPART = all.Major
)
SELECT *
FROM all_super
WHERE Minor = 'top tube'
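
A query of this shape can be tried with Python's standard sqlite3 module, since SQLite accepts WITH RECURSIVE. The snippet is a sketch, not SQL3 as proposed in 1997: the alias sup replaces all because ALL is a reserved word in SQLite, the final SELECT names its FROM table, and the table contents are a small subset of the BoM data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE assembly (PART TEXT, SUBPART TEXT, QTY INTEGER)")
conn.executemany(
    "INSERT INTO assembly VALUES (?, ?, ?)",
    [("bike", "frame", 1), ("bike", "wheel", 2),
     ("frame", "top tube", 1), ("wheel", "spoke", 36)],
)

# Find the parts using 'top tube', directly or through subassemblies.
rows = conn.execute("""
    WITH RECURSIVE all_super(Major, Minor) AS (
        SELECT PART, SUBPART
        FROM assembly
        UNION
        SELECT assb.PART, sup.Minor
        FROM assembly assb, all_super sup
        WHERE assb.SUBPART = sup.Major
    )
    SELECT Major
    FROM all_super
    WHERE Minor = 'top tube'
    ORDER BY Major
""").fetchall()
conn.close()
```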

Sec: 9.7 Recursive Queries SQL {1{

New SQL3 Standards


Relational tables for a BoM application

part_cost
BASIC_PART   SUPPLIER     COST    TIME
top tube     cinelli      20.00   14
top tube     columbus     15.00   6
down tube    columbus     10.00   6
head tube    cinelli      20.00   14
head tube    columbus     15.00   6
seat mast    cinelli      20.00   6
seat mast    cinelli      15.00   14
seat stay    cinelli      15.00   14
seat stay    columbus     10.00   6
chain stay   columbus     10.00   6
fork         cinelli      40.00   14
fork         columbus     30.00   6
spoke        campagnolo    0.60   15
nipple       mavic         0.10   3
hub          campagnolo   31.00   5
hub          suntour      18.00   14
rim          mavic        50.00   3
rim          araya        70.00   1

assembly
PART    SUBPART      QTY
bike    frame        1
bike    wheel        2
frame   top tube     1
frame   down tube    1
frame   head tube    1
frame   seat mast    1
frame   seat stay    2
frame   chain stay   2
frame   fork         1
wheel   spoke        36
wheel   nipple       36
wheel   rim          1
wheel   hub          1
wheel   tire         1

All Parts/Subparts: transitive closure in SQL3:

CREATE RECURSIVE VIEW all_subparts(Major, Minor) AS
SELECT PART, SUBPART
FROM assembly
UNION
SELECT all.Major, assb.SUBPART
FROM all_subparts all, assembly assb
WHERE all.Minor = assb.PART

This is often called a recursive union. We will say that we have
the union of an Exit Select and a Recursive Select.
Sec: 10.1 DB Updates & NonMonotonic Reasoning {8{

Stable Model Characterization


Assume now that N is kept constant to a certain $\overline{M} = B_P - M$
throughout the computation. Then,
Theorem: Let P be a logic program with Herbrand base $B_P$
and $\overline{M} = B_P - M$. Then, M is a stable model for P iff

$\Gamma_{P(\overline{M})}^{\uparrow\omega}(\emptyset) = M$

This theorem can be used to check whether an interpretation I is
a stable model without having first to construct $ground_I(P)$; the
two computations are in fact identical.

• Furthermore, the computation of the ω-power of the positive
  consequence operator has polynomial data complexity.
• Thus, checking whether a given model is stable can be done
  in polynomial time.
• However, deciding whether a given program has a stable
  model is, in general, NP-complete; thus, finding any such
  model is NP-hard.
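
The check suggested by the theorem can be sketched in Python for a ground program. The encoding is an assumption made for this example: each rule is a triple (head, positive goals, negated goals) over atom names, and the function names are hypothetical.

```python
def gamma_omega(rules, negative_assumptions):
    """omega-power of the explicit negation ICO from the empty set,
    with the negative assumptions N held constant."""
    derived = set()
    while True:
        step = {head for head, pos, neg in rules
                if pos <= derived and neg <= negative_assumptions}
        if step == derived:
            return derived
        derived = step

def is_stable(rules, base, model):
    """M is stable iff the omega-power under N = B_P - M equals M."""
    return gamma_omega(rules, base - model) == model

# p <- not q.   q <- not p.
two_models = [("p", set(), {"q"}), ("q", set(), {"p"})]
# a <- not a.   a <- a.
no_model = [("a", set(), {"a"}), ("a", {"a"}, set())]

stable_p = is_stable(two_models, {"p", "q"}, {"p"})
stable_both = is_stable(two_models, {"p", "q"}, {"p", "q"})
```

Each call performs one monotone fixpoint computation on the ground rules, in line with the polynomial-time remark above; enumerating candidate models is where the cost would go.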

Sec: 10.1 DB Updates & NonMonotonic Reasoning {7{

ICOs with Negated Goals


A modified version of the immediate consequence operator (ICO):
With r being a rule of P, let h(r) denote the head of r, gp(r)
denote the set of positive goals of r, and gn(r) denote the set of
negated goals of r without their negation sign.
For instance, if r : a ← b, ¬c, ¬d., then h(r) = a, gp(r) = {b},
and gn(r) = {c, d}.

Definition: Let P be a program and $I \subseteq B_P$. Then
the explicit negation ICO for P under a set of negative
assumptions $N \subseteq B_P$ is defined as follows:

$\Gamma_{P(N)}(I) = \{h(r) \mid r \in ground(P),\ gp(r) \subseteq I,\ gn(r) \subseteq N\}$

The implicit negation ICO of P, $T_P$, is defined as follows:

$T_P(I) = \Gamma_{P(\overline{I})}(I)$, where $\overline{I} = B_P - I$

$\Gamma$ can also be viewed as a two-place function (on I and N). For
instance, to compute $T_P^{\uparrow\omega}(\emptyset)$ we will set
$N = \overline{I}$ at each step.

Sec: 10.1 DB Updates & NonMonotonic Reasoning {6{

Multiple Models
A program can have several stable models.

p ← ¬q.
q ← ¬p.
This has two stable models: M1 = {p} and M2 = {q}.
With multiple models, one needs to decide what the intended
semantics is: find all models, or find one?
We take the second interpretation, which leads to the concept of
nondeterminism.
Stratified programs, however, always have a unique stable model.
Stratification is easy to check from the structure of the program,
independent of the database.
Thus stratified programs are well-suited for implementation.

Sec: 10.1 DB Updates & NonMonotonic Reasoning {5{

Stable Models (cont.)
Every stable model for P is a minimal model for P and a minimal
fixpoint for $T_P$; however, minimal models or minimal fixpoints
need not be stable models:
M = {a} is the only model and fixpoint for this program:
r1 : a ← ¬a.
r2 : a ← a.

A program can have zero stable models, one stable model, or
several stable models.
The previous program has no stable model and the barber example
has no stable model.
However, the barber program has a unique stable model after we
eliminate the fact villager(barber).
Thus, the existence of a stable model for a program might depend
on the database. Given a negative Datalog program P, deciding
whether it has a stable model is NP-complete.

Sec: 10.1 DB Updates & NonMonotonic Reasoning {4{

Stable Models
Programs that have stable models avoid self-contradictions.
Stability Transformation. Let P be a program and $I \subseteq B_P$ be
an interpretation of P. Then $ground_I(P)$ denotes the program
obtained from ground(P) by the following transformation:

1. remove every rule having as a goal some literal ¬q with $q \in I$
2. remove all negated goals from the remaining rules.

Example: P = ground(P)
p ← ¬q.
q ← ¬p.
Stable Models: Let P be a program with model M. M is
said to be a stable model for P when M is the least model
of $ground_M(P)$.
$ground_M(P)$ is a positive program, by construction; so, its least
model is $T^{\uparrow\omega}(\emptyset)$, where T denotes the immediate
consequence operator for $ground_M(P)$.

Sec: 10.1 DB Updates & NonMonotonic Reasoning {3{

Paradoxes and Contradictions


In the village, the barber shaves everyone who does not shave
himself: Every villager who does not shave himself
is shaved by the barber.

shaves(barber, X) ← villager(X), ¬shaves(X, X).
shaves(miller, miller).
villager(miller).
villager(smith).
villager(barber).

There is no problem with villager(miller), who shaves himself
and therefore does not satisfy the body of the first rule.
For villager(smith), given that shaves(smith, smith) is not
in our program, we can assume ¬shaves(smith, smith);
then shaves(barber, smith) is derived, which is consistent
with the negative assumptions made.
For villager(barber): under the assumption ¬shaves(barber, barber)
the rule yields shaves(barber, barber), which contradicts the
initial assumption.
If we do not initially assume ¬shaves(barber, barber), then we
cannot derive shaves(barber, barber) using this program and,
by the CWA, we will have to assume ¬shaves(barber, barber),
and end up with a contradiction.
Sec: 10.1 DB Updates & NonMonotonic Reasoning {2{

Open World and Closed World

• Open World: what is not part of the database or the program
  is assumed to be unknown.
• Closed World: what is not part of the database or the program
  is assumed to be false.
Databases and other information systems adopt the Closed World
Assumption (CWA).
If p is a base predicate with n arguments, then ¬p(a1, ..., an) holds
iff p(a1, ..., an) is not true, i.e., it is not in the fact base.
Unique name axiom: no two constants in the database stand for
the same semantic object.
Example: The absence of coolguy("Clark Kent") from the database
implies ¬coolguy("Clark Kent"), even though the database
contains a fact coolguy("Super Man").
For positive programs, the CWA is as follows: Let P be a positive
program; then for each atom $a \in B_P$:
1. a is true iff $a \in T_P^{\uparrow\omega}(\emptyset)$
2. ¬a is true iff $a \notin T_P^{\uparrow\omega}(\emptyset)$.
However, the CWA for general programs (i.e., programs with negated
goals) might lead to inconsistencies.
Sec: 10.1 DB Updates & NonMonotonic Reasoning {1{

Beyond Stratified Negation

• Finding classes of programs that are more powerful than the
  ones with stratified negation and set aggregates is a research
  topic.
• The problem (at least in terms of fixpoint theory) is due to
  the non-monotonic nature of the implicit negation used in
  DBs and AI.
• Implicit negation describes the situation where negation is
  inferred from the absence of the opposite conclusion, under
  the closed-world assumption.
• Nonmonotonic reasoning, and knowledge representation, is a
  well-established research topic in AI. The concept of
  circumscription was followed by concepts such as default theories
  and auto-epistemic logic; the concept of stable models is
  recent.

Sec 10.2 NonMonotonic Reasoning {9{
Well-Founded Models and Locally Stratified Programs
Stratified and locally stratified programs always have a well-founded
model (and therefore a unique stable model) that can be computed
using the alternating fixpoint procedure:
Theorem: Let P be a program that is stratified or locally
stratified. Then P has a well-founded model.

Sec 10.2 NonMonotonic Reasoning {8{
Partial Well-Founded Models
p ← ¬q.
q ← ¬p.
Here:
$S_P(\emptyset) = \{p, q\}$
$A_P(\emptyset) = S_P(S_P(\emptyset)) = S_P(\{p, q\}) = \emptyset$
$\emptyset \subset S_P(A_P^{\uparrow k}(\emptyset))$
Since the overestimates and underestimates never converge, this
program does not have a (total) well-founded model.
Indeed, this program has two stable models.
There is also the concept of partial well-founded model, defined
as having as negated atoms $M^- = lfp(A_P)$ and positive atoms
$M^+ = B_P - S_P(M^-)$; thus, the atoms in $B_P - (M^+ \cup M^-)$
are undefined in the partial well-founded model, while this set is
empty in the total well-founded model.

Sec 10.2 NonMonotonic Reasononing {7{
CZ
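Total Well-Founded Model
Definition: Let P be a program and $\overline{W}$ be the least fixpoint
of $A_P$. If $S_P(\overline{W}) = \overline{W}$, then $B_P - \overline{W}$ is called the well-founded
model for P.

Let $M = B_P - \overline{W}$, so that $\overline{M} = \overline{W}$ and $S_P(\overline{M}) = \overline{M}$.
Now, $B_P - S_P(\overline{M}) = B_P - \overline{M} = M$.
But, $B_P - S_P(\overline{M}) = T_{P(M)}^{\uparrow\omega}(\emptyset)$.
Hence $T_{P(M)}^{\uparrow\omega}(\emptyset) = M$, i.e., M is a stable model.

Theorem: Let P be a program with well-founded model M.
Then M is a stable model for P, and P has no other stable
model.
The fact that M is a stable model was proven above.
If N is another stable model, then $\overline{N}$ is also a fixpoint of $A_P$; in
fact, $\overline{N} \supseteq \overline{M}$, since $\overline{M}$ is the least fixpoint of $A_P$. Thus, $N \subseteq M$;
but $N \subset M$ cannot hold, since M is a stable model, and every
stable model is a minimal model. Thus N = M.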

Alternating Fixpoint Computation
• The least fixpoint $\mathrm{lfp}(A_P)$ can be computed by (possibly
transfinite) applications of $A_P$.
• Every application of $A_P$ in fact consists of two applications
of $S_P$.
• Since $A_P^{\uparrow n}(\emptyset) \subseteq A_P^{\uparrow n+1}(\emptyset)$, the even powers of $S_P$,
$A_P^{\uparrow n}(\emptyset) = S_P^{\uparrow 2n}(\emptyset)$,
define an ascending chain.
• The odd powers of $S_P$,
$S_P(A_P^{\uparrow n}(\emptyset)) = S_P^{\uparrow 2n+1}(\emptyset)$,
define a descending chain.
• Every element of the descending chain is a superset ($\supseteq$) of every
element of the ascending chain.
• Thus, we have an increasing chain of underestimates dominated
by a decreasing chain of overestimates.
• If the two chains ever meet, they define the (total) well-founded
model for the program.
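
The two chains can be traced on a small ground program. The sketch below is an illustration only: the three-rule program (r. q ← ¬r. p ← ¬q.), the triple encoding of rules, and the names `derived`, `s_p`, and `alternating_chains` are all assumptions, and the sets in the chains are sets of atoms taken to be false.

```python
# Hypothetical ground program:  r.   q <- not r.   p <- not q.
PROGRAM = [
    ("r", frozenset(), frozenset()),
    ("q", frozenset(), frozenset({"r"})),
    ("p", frozenset(), frozenset({"q"})),
]
BASE = frozenset({"p", "q", "r"})


def derived(program, false_atoms):
    """Atoms derivable when 'not g' succeeds iff g is in false_atoms."""
    true_atoms = set()
    changed = True
    while changed:
        changed = False
        for head, pos, neg in program:
            if head not in true_atoms and pos <= true_atoms and neg <= false_atoms:
                true_atoms.add(head)
                changed = True
    return frozenset(true_atoms)


def s_p(program, base, false_atoms):
    return base - derived(program, false_atoms)


def alternating_chains(program, base):
    """Return (even powers of S_P, odd powers of S_P), starting from the empty set."""
    under = [frozenset()]   # ascending chain of underestimates
    over = []               # descending chain of overestimates
    while True:
        over.append(s_p(program, base, under[-1]))
        nxt = s_p(program, base, over[-1])
        if nxt == under[-1]:
            return under, over
        under.append(nxt)


under, over = alternating_chains(PROGRAM, BASE)
# The chains meet when the last underestimate equals the last overestimate.
well_founded = BASE - under[-1] if under[-1] == over[-1] else None
print([sorted(s) for s in under], [sorted(s) for s in over], well_founded)
```

For this program the chains meet at {q}, so the total well-founded model is {p, r}.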

Fixpoints for $S_P$ and $A_P$
Since it is monotonic, $A_P$ has a least fixpoint $\mathrm{lfp}(A_P)$ (by the Knaster-Tarski
theorem).
$A_P$ might have several fixpoints.
Let $(M, \overline{M})$ be a dichotomy of $B_P$.
LEMMA 1: M is a stable model for P iff $\overline{M}$ is a fixpoint
of $S_P$.
Proof: M is a stable model for P iff $T_{P(M)}^{\uparrow\omega}(\emptyset) = M$.
This equality holds iff
$B_P - T_{P(M)}^{\uparrow\omega}(\emptyset) = B_P - M = \overline{M}$,
i.e., iff $S_P(\overline{M}) = \overline{M}$.
LEMMA 2: If M is a stable model for P, then $\overline{M}$ is a fixpoint of
$A_P$.
Proof: Every fixpoint of $S_P$ is also a fixpoint of $A_P$.

Well-Founded Models
Much research work has been devoted to finding general approaches
for the efficient computation of nonstratified programs. The concept
of well-founded models represents a milestone in this effort.
$\mathrm{lfp}(T_P) = T_P^{\uparrow\omega}$ is the linchpin of bottom-up semantics and computation;
in the presence of negation, however, $T_P$ is nonmonotonic. But
let's find a related operator that is monotonic ...

• $T_{P(N)}^{\uparrow\omega}(\emptyset)$ is monotonic in $\overline{N}$.
• Let us define:
$S_P(\overline{N}) = B_P - T_{P(N)}^{\uparrow\omega}(\emptyset)$
This is antimonotonic in $\overline{N}$: $S_P(\overline{N'}) \subseteq S_P(\overline{N})$ for $\overline{N'} \supseteq \overline{N}$.
• The composition of an even number of applications of $S_P$
yields a monotonic mapping.
• The composition of an odd number of applications yields an
antimonotonic mapping.
• Let us define:
$A_P(\overline{N}) = S_P(S_P(\overline{N}))$
which is monotonic in $\overline{N}$.

Local Stratification: cont.
Theorem: Every locally stratified program has a stable model
that is equal to the result of the iterated fixpoint computation
(on ground(P)).
Proof: same as for stratified programs.
Local stratification, however, behaves unlike regular stratification
from the viewpoints of computation and implementation. A program
P normally contains a small number of predicate names.
Thus, it is easy to check for strong components with negated arcs
in pdg(P) and to determine the strata needed for the iterated
fixpoint computation.
However, the question of whether a given program can be locally
stratified is undecidable when the Herbrand base of the program
is infinite.
Even when the universe is finite, the existence of a stable model
cannot be checked at compile time: it often depends on the database
content.
In the Barber example, the existence of a local stratification (and
of a stable model) depends on whether villager(barber) is in
the database.

Locally Stratified Programs
Local stratification. A program P is locally stratifiable iff
$B_P$ can be partitioned into a (possibly infinite) set of strata
$S_0, S_1, \ldots$, such that the following property holds: For each rule
r in ground(P) and each atom g in the body of r, if h(r) and g
are, respectively, in strata $S_i$ and $S_j$, then

(i) $i \geq j$ if $g \in pg(r)$, and
(ii) $i > j$ if $g \in ng(r)$.

A locally stratified program defining integers

even(0).
even(s(J)) ← ¬even(J).
Local stratification: {even(0)} = $S_0$, {even(s(0))} = $S_1$, and
so on.
This alternative definition of integers is not locally stratified
(Homework: prove it!)
A program that is not locally stratified
even(0).
even(J) ← ¬even(s(J)).
Stratification and Stable Models
Theorem: Let P be a stratified program. Then P has a
stable model that is equal to the result of the iterated fixpoint
procedure.

1. Let $\Sigma$ be a stratification for P, and let M be the result of the
iterated fixpoint computation on P according to $\Sigma$.
2. The iterated fixpoint computation on ground(P) according
to $\Sigma$ also yields M.
3. Let $r \in ground(P)$ be a rule used in the iterated fixpoint computation:
say that h(r) belongs to a stratum i.
If ¬g is a goal of r, then the predicate name of g belongs to
a stratum lower than i.
Let r' be r without the negated goals such as ¬g.
4. If r was used in the iterated fixpoint computation of M, then
¬g held during the computation of stratum i. Thus $g \notin M$,
since only atoms belonging to higher strata are produced after
the computation for stratum i. Thus, with $P_{gr} = ground_M(P)$,
$r' \in P_{gr}$. Thus every rule used in computing
M is also in $ground_M(P) = P_{gr}$. Thus $\mathrm{lfp}(P_{gr}) \supseteq M$.
5. But, since the iterated fixpoint produces a least fixpoint for
$T_P$, $T_P(M) = M$. But $T_{P_{gr}}(M) = T_P(M) = M$.
Thus $\mathrm{lfp}(P_{gr}) = M$.

Sec. 10.3 Temporal Reasoning

Temporal Reasoning with Datalog1S

Every query expressed in PLTL can also be expressed in propositional
Datalog1S (i.e., Datalog with only the temporal argument).
For instance, the previous query can be turned into the query
?pair_to_newcstl where

pair_to_newcstl ← newcstl(J) ∧ newcstl(J + 1).

Express p U q: p must be true at each instant in history, until the
first state in which q is true. Use recursion to reason back in time
and identify all states in history that precede the first occurrence
of q.

post_q(J + 1) ← q(J).
post_q(J + 1) ← post_q(J).
first_q(J) ← q(J), ¬post_q(J).
pre_first_q(J) ← first_q(J + 1).
pre_first_q(J) ← pre_first_q(J + 1).
fail_p_Until_q ← pre_first_q(J), ¬p(J).
p_Until_q ← pre_first_q(0), ¬fail_p_Until_q.

A similar approach can be used to express other operators of
temporal logic. For instance, p B q can be defined using the
predicates of the previous example and the rule
p_Before_q ← p(J), pre_first_q(J).


Other Operators
For instance, the fact that q will never be true can simply be
defined as $\neg F\, q$.
The fact that q is always true is simply described as $\neg F(\neg q)$;
the notation $G\, q$ is often used to denote that q is always true.
The operator p before q, denoted $p\, B\, q$, can be defined as $\neg((\neg p)\, U\, q)$,
that is, it is not true that p is false until q.
PLTL finds many applications, including temporal queries and
proving properties of dynamic systems. For instance, the question
"Is there a train to Newcastle that is followed by another one hour
later?" can be expressed by the following query:

?$F(newcstl \wedge \bigcirc\, newcstl)$


Temporal Operators
In addition to the usual propositional operators $\vee$, $\wedge$, and $\neg$,
PLTL offers the following operators:

2. Next: Next p, denoted $\bigcirc p$, is true in history H when p
holds in history $H_1 = (S_1, S_2, \ldots)$.
Therefore, $\bigcirc^n p$, $n \geq 0$, denotes that p is true in history
$(S_n, S_{n+1}, \ldots)$.
For instance,

$\bigcirc^8\, newcstl \wedge \bigcirc^9\, \neg newcstl$

is true since there is a train at 8 and no train at 9.

3. Eventually: Eventually q, denoted $F\, q$, holds when, for some
n, $\bigcirc^n q$.

4. Until: p until q, denoted $p\, U\, q$, holds if, for some n, $\bigcirc^n q$,
and for every state k < n, $\bigcirc^k p$.
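
The operators above can be tried out on the Newcastle history with a small evaluator. This is a sketch under stated assumptions: formulas are encoded as nested tuples, the names `holds`, `nxt`, and `HORIZON` are invented for the illustration, and the infinite history is cut off by assuming that newcstl is false in every state from 24 on (so all later suffixes look the same).

```python
HORIZON = 24  # assumption: newcstl is false in every state from S_24 on
TRAIN_STATES = {8, 10, 12, 14, 16, 18, 20, 22}  # states where newcstl holds


def holds(formula, i=0):
    """Truth of a formula in the history (S_i, S_i+1, ...)."""
    i = min(i, HORIZON)  # all suffixes starting at or after HORIZON coincide
    op = formula[0]
    if op == "atom":
        return formula[1] == "newcstl" and i in TRAIN_STATES
    if op == "not":
        return not holds(formula[1], i)
    if op == "and":
        return holds(formula[1], i) and holds(formula[2], i)
    if op == "next":
        return holds(formula[1], i + 1)
    if op == "eventually":
        return any(holds(formula[1], n) for n in range(i, HORIZON + 1))
    if op == "until":
        p, q = formula[1], formula[2]
        return any(
            holds(q, n) and all(holds(p, k) for k in range(i, n))
            for n in range(i, HORIZON + 1)
        )
    raise ValueError(f"unknown operator: {op}")


def nxt(formula, n=1):
    """Apply the Next operator n times."""
    for _ in range(n):
        formula = ("next", formula)
    return formula


NEWCSTL = ("atom", "newcstl")
train_at_8_not_9 = holds(("and", nxt(NEWCSTL, 8), nxt(("not", NEWCSTL), 9)))
one_hour_apart = holds(("eventually", ("and", NEWCSTL, nxt(NEWCSTL))))
two_hours_apart = holds(("eventually", ("and", NEWCSTL, nxt(NEWCSTL, 2))))
no_train_until_first = holds(("until", ("not", NEWCSTL), NEWCSTL))
print(train_at_8_not_9, one_hour_apart, two_hours_apart, no_train_until_first)
```

Since the trains run two hours apart, the "one hour later" query fails on this history while the two-hour variant succeeds.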


Propositional Linear Temporal Logic (PLTL)
PLTL is based on the notion that there is a succession of states
$H = (S_0, S_1, \ldots)$, called a history.

For instance, trains to Newcastle can be modeled by a predicate
newcstl that holds true in the following states: $S_8, S_{10}, S_{12}, S_{14},
S_{16}, S_{18}, S_{20}, S_{22}$, and is false everywhere else.

Modal operators are used to define in which states a predicate p
holds true.

1. Atoms: Let p be an atomic propositional predicate. Then p
is said to hold in history H when p holds in H's initial state
$S_0$.

For instance, ¬newcstl is true in our example since it is true in
$S_0$.


Recurring Schedules
Trains for Newcastle leave daily at 0800 hours and then every
two hours until 2200 hours (military time)
before22(22).
before22(H) ← before22(H + 1).
leaves(8, newcastle).
leaves(T + 2, newcastle) ← leaves(T, newcastle),
    before22(T).


Datalog1S
Discrete time can be modeled using Datalog1S.
The discrete temporal domain consists of terms built using the
constant 0 and the unary function symbol +1 (written in postfix
notation). For the sake of simplicity, we will write n for

$\overbrace{(\ldots((0 + 1) + 1) \ldots + 1)}^{n \text{ times}}$

If T is a variable in the temporal domain, then T, T + 1, and
T + n are valid temporal terms, where T + n denotes

$\overbrace{(\ldots((T + 1) + 1) \ldots + 1)}^{n \text{ times}}$

The endless succession of seasons

quarter(0, winter).
quarter(T + 1, spring) ← quarter(T, winter).
quarter(T + 1, summer) ← quarter(T, spring).
quarter(T + 1, fall) ← quarter(T, summer).
quarter(T + 1, winter) ← quarter(T, fall).
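
Because the succession is endless, a bottom-up evaluation of these rules never terminates on its own; any concrete run has to stop at some bound. The sketch below illustrates that idea, with the `limit` parameter and the names `SUCCESSOR` and `quarters` being assumptions of this example rather than part of Datalog1S.

```python
# Each rule quarter(T + 1, next) <- quarter(T, current) becomes one table entry.
SUCCESSOR = {"winter": "spring", "spring": "summer", "summer": "fall", "fall": "winter"}


def quarters(limit):
    """Derive quarter(T, Season) facts bottom-up for 0 <= T <= limit."""
    facts = [(0, "winter")]          # the exit fact quarter(0, winter)
    while facts[-1][0] < limit:
        t, season = facts[-1]
        facts.append((t + 1, SUCCESSOR[season]))
    return facts


first_six = quarters(5)
print(first_six)
```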

Sec: 10.4 Beyond Stratification

Temporal Projection: auxiliary predicates

distinct(Frm1, To1, Frm2, To2) ← To1 ≠ To2.
distinct(Frm1, To1, Frm2, To2) ← Frm1 ≠ Frm2.
select_larger(X, Y, X) ← X ≥ Y.
select_larger(X, Y, Y) ← Y > X.


All Classical Algorithms Can Be Expressed
as XY-stratified programs.
A simple example: Temporal Projection.

emp_dep_sal(1001, shoe, 35000, 19920101, 19940101).
emp_dep_sal(1001, shoe, 36500, 19940101, 19960101).
represent two tuples from this relation.
Merging overlapping periods into maximal periods after a
temporal projection
e_hist(0, Eno, Frm, To) ← emp_dep_sal(Eno, D, S, Frm, To).
overlap(J + 1, Eno, Frm1, To1, Frm2, To2) ←
    e_hist(J, Eno, Frm1, To1),
    e_hist(J, Eno, Frm2, To2),
    Frm1 ≤ Frm2, Frm2 ≤ To1,
    distinct(Frm1, To1, Frm2, To2).
e_hist(J, Eno, Frm1, To) ← overlap(J, Eno, Frm1, To1, Frm2, To2),
    select_larger(To1, To2, To).
e_hist(J + 1, Eno, Frm, To) ← e_hist(J, Eno, Frm, To),
    overlap(J + 1, _, _, _, _, _),
    ¬overlap(J + 1, Eno, Frm, To, _, _),
    ¬overlap(J + 1, Eno, _, _, Frm, To).

final_e_hist(Eno, Frm, To) ← e_hist(J, Eno, Frm, To),
    ¬e_hist(J + 1, _, _, _).
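
For comparison, the intended result of the rules above (maximal periods per employee) can be computed procedurally. This is a plain sort-and-merge sketch, not the XY-stratified evaluation itself; the function names `coalesce` and `final_e_hist` are chosen to mirror the rules, and periods that merely touch (Frm2 = To1) are merged, as the condition Frm2 ≤ To1 suggests.

```python
from collections import defaultdict

# The two tuples shown above: (Eno, Dept, Sal, Frm, To).
EMP_DEP_SAL = [
    (1001, "shoe", 35000, 19920101, 19940101),
    (1001, "shoe", 36500, 19940101, 19960101),
]


def coalesce(periods):
    """Merge overlapping or touching periods (frm, to) into maximal periods."""
    merged = []
    for frm, to in sorted(periods):
        if merged and frm <= merged[-1][1]:
            # Overlap with the last maximal period: keep the larger end point.
            merged[-1][1] = max(merged[-1][1], to)
        else:
            merged.append([frm, to])
    return [tuple(period) for period in merged]


def final_e_hist(emp_dep_sal):
    """Project out department and salary, then coalesce per employee."""
    by_emp = defaultdict(list)
    for eno, _dept, _sal, frm, to in emp_dep_sal:
        by_emp[eno].append((frm, to))
    return {eno: coalesce(periods) for eno, periods in by_emp.items()}


result = final_e_hist(EMP_DEP_SAL)
print(result)
```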


Computing XY-stratified Programs

Computing the well-founded model of an XY-stratified program P

Initialize: Set T = 0 and insert the fact counter(T).

Forever repeat the following two steps:

1. Apply the iterated fixpoint computation to the synchronized
program $P_{bis}$, and for each recursive predicate q, compute new_q.
Return the new_q atoms so computed, after adding a temporal
argument T to these atoms; the value of T is taken from counter(T).
2. For each recursive predicate q, replace old_q with
new_q, computed in the previous step. Then, replace
counter(T) with counter(T + 1).
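
The two steps can be sketched for the ancestor program with delta_anc and all_anc. Everything specific here is an assumption for illustration: the `PARENT` facts, the function name `ancestors`, the handling of the exit fact only at T = 0, and in particular the stopping test (halt when a step yields no new delta_anc atoms), which the procedure above leaves open.

```python
# Hypothetical parent(Y, X) facts: Y is a parent of X.
PARENT = {("ann", "marc"), ("bob", "ann"), ("carl", "bob")}


def ancestors(start, parent):
    """Bi-state evaluation: returns a list of (T, delta_anc at T, all_anc at T)."""
    t = 0
    old_delta, old_all = set(), set()
    history = []
    while True:
        # Step 1: evaluate the bi-state rules stratum by stratum.
        if t == 0:
            new_delta = {start}                              # r1: delta_anc(0, marc)
        else:
            new_delta = {y for (y, x) in parent              # r2
                         if x in old_delta and y not in old_all}
        new_all = new_delta | old_all                        # r4 and r3
        if not new_delta:
            return history                                   # assumed stopping test
        history.append((t, frozenset(new_delta), frozenset(new_all)))
        # Step 2: new_q becomes old_q and the counter advances.
        old_delta, old_all = new_delta, new_all
        t += 1


history = ancestors("marc", PARENT)
for step, delta, everyone in history:
    print(step, sorted(delta), sorted(everyone))
```

Note that, following r1, marc himself appears at temporal value 0.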

• Copy rules
• When does the computation stop?


XY-stratified Programs

Let P be an XY-program. P is said to be XY-stratified when
$P_{bis}$ is a stratified program.

The bi-state program of the previous example is stratified with the
following strata: $S_0$ = {parent, old_all_anc, old_delta_anc},
$S_1$ = {new_delta_anc}, and $S_2$ = {new_all_anc}. Thus, the
original program is locally stratified.
Theorem: Let P be an XY-stratified program. Then P is
locally stratified.


The Old and the New

r1: delta_anc(0, marc).
r2: delta_anc(J + 1, Y) ← delta_anc(J, X), parent(Y, X),
    ¬all_anc(J, Y).
r3: all_anc(J + 1, X) ← all_anc(J, X).
r4: all_anc(J, X) ← delta_anc(J, X).

The bi-state program $P_{bis}$ is computed as follows: For each $r \in P$,

1. Rename all the recursive predicates in r that have the same
temporal argument as the head of r with the distinguished
prefix new_.
2. Rename all other occurrences of recursive predicates in r with
the distinguished prefix old_.
3. Drop the temporal arguments from the recursive predicates.

The bi-state version for the previous program is:

new_delta_anc(marc).
new_delta_anc(Y) ← old_delta_anc(X), parent(Y, X),
    ¬old_all_anc(Y).
new_all_anc(X) ← new_delta_anc(X).
new_all_anc(X) ← old_all_anc(X).


X-rules and Y-rules

r1: delta_anc(0, marc).
r2: delta_anc(J + 1, Y) ← delta_anc(J, X), parent(Y, X),
    ¬all_anc(J, Y).
r3: all_anc(J + 1, X) ← all_anc(J, X).
r4: all_anc(J, X) ← delta_anc(J, X).

XY-programs: Let P be a set of rules defining mutually recursive
predicates. Then we say that P is an XY-program if it
satisfies the following conditions:

1. Every recursive predicate of P has a distinguished temporal
argument.
2. Every recursive rule r is either an X-rule or a Y-rule, where
• r is an X-rule when the temporal argument in every recursive
predicate in r is the same variable (e.g., J),
• r is a Y-rule when (i) the head of r has as temporal argument
J + 1, where J denotes any variable, (ii) some goal
of r has as temporal argument J, and (iii) the remaining
recursive goals have either J or J + 1 as their temporal
arguments.

For instance, the program above is an XY-program where r4 is an X-rule
while r2 and r3 are Y-rules.

Stratification by the Temporal Argument

Ancestors of marc and the generation gap, including the differential
fixpoint improvement
r1: delta_anc(0, marc).
r2: delta_anc(J + 1, Y) ← delta_anc(J, X), parent(Y, X),
    ¬all_anc(J, Y).
r3: all_anc(J + 1, X) ← all_anc(J, X).
r4: all_anc(J, X) ← delta_anc(J, X).

This program is locally stratified by the first argument of delta_anc
and all_anc, which serves as temporal argument.
The zeroth stratum consists of atoms of nonrecursive predicates
such as parent and of atoms that unify with all_anc(0, X) or
delta_anc(0, X), where X can be any constant in the universe.

The kth stratum contains atoms all_anc(k, X), delta_anc(k, X).
Thus, this program is locally stratified, since the heads of recursive
rules belong to strata that are one above those of their goals.


XY-Stratification
• Does a program have a well-founded model? In general, the
only way to answer this question is to search for such a model
(e.g., by the alternating fixpoint).
• For stratified programs, however, the question is easy to answer
at compile time, independent of the database.
• XY-stratification is a particular class of locally stratified programs
for which we also have a simple compile-time check
and an efficient implementation.
• In fact, XY-stratified programs are particular Datalog1S programs.

Sec 10.5 Updates and Active Rules

Axioms for a Correct History

Question. How do we define the correct behavior of a deductive
database with active rules?
Answer. By ensuring that its history satisfies the following two
axioms:

1. Completeness Axiom. The history relations in A must be
identical to the history relations in the stable model of A.
2. External Causation Axiom. Let $A_{ext}$ be the logic program
obtained from A by eliminating from the history relations all
changes but the external changes requested by users. Then,
the stable model of $A_{ext}$ and the stable model of the original
A must be identical.
Thus, (1) every rule that is enabled must be triggered, and (2)
the externally requested events plus those triggered by the active
rules will produce the complete history.
We have considered the ideal situation, where the system can
complete the firing of all the active rules before the next user
request comes in. Then there is a natural definition of what the
correct behavior of the system should be.
Concurrent requests: we can use the concept of serializability to
reduce those cases to this ideal situation.


Active Rules
A1: If a student is added to the alumni relation, then delete
his name from the student relation, provided that this is
a senior-level student (otherwise an error is raised, using a rule not
shown here).
A2: If a person takes a course, and the name of this person
is not in the student relation, then add that name to the
student relation, using the (null) value tba for Major
and Level.
Using the "immediately after" activation semantics (under an eager
firing policy) these rules can be modeled as follows:
A1: student_hist(J + 1, -, Name, Major, senior) ←
    alumni_hist(J, +, Name, _, _, _),
    student_snap(J, Name, Major, senior).
A2: student_hist(J + 1, +, Name, tba, tba) ←
    took_hist(J, +, Name, _, _),
    ¬student_snap(J, Name, _, _).
An active logic program consists of
(1) the history relations,
(2) the change predicates,
(3) the snapshot predicates, and
(4) the active rules.
The program A so defined is XY-stratified; thus it has a unique
stable model M, which defines the meaning of the program.

Traditional Queries and Deductive Rules

Snapshot predicates define the content of the database relations
at each instant J. For the student relation, for instance, we have
the following rules:
Snapshot predicates for student via frame axioms
student_snap(J + 1, Name, Major, Level) ←
    student_snap(J, Name, Major, Level),
    ¬student_hist(J + 1, -, Name, Major, Level).
student_snap(J, Name, Major, Level) ←
    student_hist(J, +, Name, Major, Level).
These rules express what are commonly known as frame axioms.
The current content of each relation is its snapshot at time J,
where J is the max value for the change counter:
The current content of the relation student
current_state(J) ← change(J), ¬change(J + 1).
student(Name, Major, Year) ←
    student_snap(J, Name, Major, Year),
    current_state(J).
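
The frame axioms can be read as a recipe for replaying a history. The sketch below applies them to the change history of Jim Black given in this section; the tuple layout and the function name `student_snap` are assumptions of this illustration. It only visits counters that occur in the student history, since by the first rule the snapshot is carried over unchanged at all other counters.

```python
# (counter, sign, Name, Major, Level), as in the student_hist facts.
STUDENT_HIST = [
    (2301, "+", "Jim Black", "ee", "freshman"),
    (4007, "-", "Jim Black", "ee", "freshman"),
    (4007, "+", "Jim Black", "ee", "sophomore"),
    (4805, "-", "Jim Black", "ee", "sophomore"),
    (4805, "+", "Jim Black", "cs", "sophomore"),
    (6300, "-", "Jim Black", "cs", "sophomore"),
    (6300, "+", "Jim Black", "cs", "junior"),
]


def student_snap(history, j):
    """Content of the student relation at change counter j."""
    snapshot = set()
    for counter in sorted({c for c, *_ in history if c <= j}):
        deleted = {row[2:] for row in history if row[0] == counter and row[1] == "-"}
        added = {row[2:] for row in history if row[0] == counter and row[1] == "+"}
        # Frame axiom: keep what was not deleted, then add the insertions.
        snapshot = (snapshot - deleted) | added
    return snapshot


# Current state: the snapshot at the largest change counter seen so far.
current = student_snap(STUDENT_HIST, max(c for c, *_ in STUDENT_HIST))
print(student_snap(STUDENT_HIST, 5000), current)
```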


Continuity Axioms
If the database consists of three relations:

student(Name, Major, Year), took(Name, Course, Grade),
alumni(Name, Sex, Degree, ClassOf)
(the last relation stores the alumni who graduated from college in
previous years),
then we need three rules to keep track of all changes (one rule
per relation in the schema):

change(J) ← student_hist(J, _, _, _, _).
change(J) ← took_hist(J, _, _, _, _).
change(J) ← alumni_hist(J, _, _, _, _, _).
A violation of the continuity axiom can be expressed as follows:

bad_history ← change(J + 1), ¬change(J).

The temporal argument can only be increased by a new event,
and there is no hole in the sequence.


Extensional and Intensional Information

These terms are often used to denote, respectively, the database
facts and the rules.
Here the history of each relation becomes the (only) extensional
information; the rest is intensional information. Then, each database
relation such as
student('Jim Black', cs, junior)
must be defined by rules from its history, e.g., a history of
changes for Jim Black:

student_hist(2301, +, 'Jim Black', ee, freshman).
student_hist(4007, -, 'Jim Black', ee, freshman).
student_hist(4007, +, 'Jim Black', ee, sophomore).
student_hist(4805, -, 'Jim Black', ee, sophomore).
student_hist(4805, +, 'Jim Black', cs, sophomore).
student_hist(6300, -, 'Jim Black', cs, sophomore).
student_hist(6300, +, 'Jim Black', cs, junior).

The first column is a change counter that is global for the system,
that is, it is incremented for each change request.
Several changes can be made in the same SQL update statement:
4,007 - 2,301 changes in one year.


Updates in Logic
• In general, logic-based systems have not dealt well with database
updates.
• For instance, Prolog resorts to its operational semantics to
give meaning to assert and retract operations. The result
is that many different operational semantics (more than 9)
have been implemented in various systems.
• Logic-based semantics for updates is also a major problem
faced by deductive database systems; however, these concentrate
on changes in the base relations, rather than facts and
rules as in Prolog.
• Current DB prototypes feature a strained coexistence of declarative
and operational constructs: e.g., in GLUE/Nail!, GLUE
is an operational wrapper around the declarative Nail!: same syntax
but not the same semantics.
• Also, a DB system that supports updates and rules should
support active rules too ...

Desiderata:
1. providing a logical model for updates,
2. supporting the same queries that current deductive databases
do, and
3. supporting the same rules that active databases currently do.
Sec 10.6 Nondeterministic Reasoning
Beyond Don't Care Nondeterminism
In many situations, we seek to satisfy a condition that holds or
does not hold depending on the choice made. Thus, we might want
to seek among the choice models one that satisfies the condition.
Alternatively, we might make a choice and then backtrack to the
next choice once we find that the condition does not hold. Thus
an exponential computation is often required.
Hamiltonian path in a graph: A graph has a Hamiltonian
path iff there is a simple path that visits all nodes exactly
once.

simplepath(root, root).
simplepath(X, Y) ← simplepath(_, X), g(X, Y),
    choice((X), (Y)), choice((Y), (X)).
nonhppath ← n(X), ¬simplepath(_, X).
q ← ¬q, nonhppath.
If nonhppath is true in M, then rule q ← ¬q must also be
satisfied by M. Thus, M cannot be a stable model. Thus, this
program has a stable model iff there exists a Hamiltonian path.
Thus, deciding whether a stable model exists for a program is
NP-hard.

DB-PTIME without assuming a total
order
Stratified Datalog programs with choice are also DB-PTIME complete,
without having to assume that the universe is totally ordered
(i.e., respecting the genericity assumption).
The following program defines a total order for the elements of a
set d(X) by constructing an immediate-successor relation for its
elements (root is a distinguished new symbol):
Ordering a domain
ordered_d(root, root).
ordered_d(X, Y) ← ordered_d(_, X), d(Y),
    choice((X), (Y)), choice((Y), (X)).
Once an arc (X, Y) is generated, this is the only arc leaving the
source node X and the only arc entering the sink node Y.
Since we accept any choice model, we have don't care nondeterminism
and the computation remains polynomial.
Here it means that we accept any order.
For certain queries, we might still obtain a deterministic result, e.g., in
the computation of aggregates which are commutative and associative.
Choice in Recursion
For instance, the following program computes the spanning tree,
starting from the source node a, for a graph where an arc from
node b to d is represented by the database fact g(b, d).
Computing a spanning tree
st(root, a).
st(X, Y) ← st(_, X), g(X, Y), Y ≠ a, choice((Y), (X)).
The goal Y ≠ a ensures that, in st, the end-node for the arc
produced by the exit rule has an in-degree of one;
likewise, the goal choice((Y), (X)) ensures that the end-nodes for
the arcs generated by the recursive rule have an in-degree of one.
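
One way to picture a single choice model of this program is a traversal that records, for each node Y, the one node X chosen as its predecessor. The sketch below is only an illustration under assumptions: the arc list `G` is hypothetical, and the function name `spanning_tree` and the `chosen` dictionary stand in for the FD Y → X enforced by choice((Y), (X)).

```python
# Hypothetical graph: each pair (X, Y) stands for a fact g(X, Y).
G = [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"), ("c", "d")]


def spanning_tree(arcs, source):
    """Return the st facts of one choice model as a list of (X, Y) pairs."""
    chosen = {}                      # Y -> X: at most one incoming arc per node
    st = [("root", source)]          # exit rule: st(root, a)
    frontier = [source]
    while frontier:
        x = frontier.pop()
        for u, y in arcs:
            # Recursive rule: st(_, X), g(X, Y), Y != a, choice((Y), (X)).
            if u == x and y != source and y not in chosen:
                chosen[y] = x
                st.append((x, y))
                frontier.append(y)
    return st


tree = spanning_tree(G, "a")
print(tree)
```

A different visiting order would yield a different, equally acceptable spanning tree.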

Properties
In general, the program SV(P) generated by the transformation
discussed above has the following properties:

• SV(P) has one or more total stable models.
• The chosen atoms in each stable model of SV(P) obey the
FDs defined by the choice goals.

The stable models of SV(P) are called choice models for P.

Stratified Datalog programs with choice are in DB-PTIME: actually
they can be implemented efficiently by producing chosen
atoms one at a time and memorizing them in a table. The
diffChoice atoms need not be computed and stored; rather, the
goal ¬diffChoice can simply be checked dynamically against
the table chosen.
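
The table-based implementation can be sketched for the advisor rule with choice((S), (P)). The function name `actual_adv` and the use of a dictionary as the chosen table are assumptions of this illustration; the facts are the toy student and professor facts used in this section.

```python
STUDENTS = [("Jim Black", "ee", "senior")]
PROFESSORS = [("ohm", "ee"), ("bell", "ee")]


def actual_adv(students, professors):
    """Produce chosen(S, P) atoms one at a time, memorizing them in a table."""
    chosen = {}  # S -> P: the table enforcing the FD S -> P
    for s, majr, _year in students:
        for p, p_majr in professors:
            # not diffChoice(S, P): no chosen(S, P') with P' != P in the table.
            if majr == p_majr and chosen.get(s, p) == p:
                chosen[s] = p
    return chosen


first_model = actual_adv(STUDENTS, PROFESSORS)
second_model = actual_adv(STUDENTS, list(reversed(PROFESSORS)))
print(first_model, second_model)
```

The order in which eligible pairs are visited decides which choice model is produced, which is the don't care nondeterminism of choice.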

Stable Version, SV(P), of a program P
For each choice rule r in P:
$r: A \leftarrow B(Z), choice((X_1), (Y_1)), \ldots, choice((X_k), (Y_k)).$

Let B(Z) denote the conjunction of all the goals of r that
are not choice goals, and
let $X_i, Y_i, Z$, $1 \leq i \leq k$, denote vectors of variables occurring
in the body of r such that $X_i \cap Y_i = \emptyset$ and $X_i, Y_i \subseteq Z$.

1. In P replace r with a rule r' obtained by substituting the
choice goals with the atom $chosen_r(W)$:
$r': A \leftarrow B(Z), chosen_r(W).$
where $W \subseteq Z$ is the list of all variables appearing in choice
goals, i.e., $W = \bigcup_{1 \leq j \leq k} X_j \cup Y_j$.

2. Add the new rule $chosen_r(W) \leftarrow B(Z), \neg diffChoice_r(W).$

3. For each choice atom $choice((X_i), (Y_i))$ ($1 \leq i \leq k$), add the
new rule
$diffChoice_r(W) \leftarrow chosen_r(W'), Y_i \neq Y_i'.$
where (i) the list of variables W' is derived from W by replacing
each $A \in Y_i$ with a new variable $A' \in Y_i'$ (i.e., by priming
those variables), and (ii) $Y_i \neq Y_i'$ is true if $A \neq A'$, for some
variable $A \in Y_i$ and its primed counterpart $A' \in Y_i'$.


i i

Choice by Negation
actual_adv(S, P) ← student(S, Majr, Yr),
    professor(P, Majr),
    choice((S), (P)).
The stable version for the advisor rule
actual_adv(S, P) ← student(S, Majr, Yr), professor(P, Majr),
    chosen(S, P).
chosen(S, P) ← student(S, Majr, Yr), professor(P, Majr),
    ¬diffChoice(S, P).
diffChoice(S, P) ← chosen(S, P'), P ≠ P'.
This program has two stable models: one in which ohm is chosen
as advisor of Jim Black, and the other where bell is chosen
instead.
A program where the rules contain choice goals is called a choice
program.
The semantics of a choice program P can be defined by transforming
P into a program with negation, SV(P), called the stable
version of the choice program P.
SV(P) exhibits a multiplicity of stable models, each obeying the
FDs defined by the choice goals.
Each stable model for SV(P) corresponds to an alternative set
of answers for P and is called a choice model for P.
Many Applications of Choice
Given the two relations boy(Bname) and girl(Gname), are
there more boys than girls in our database?

match(Bname, Gname) ← boy(Bname), girl(Gname),
    choice((Bname), (Gname)),
    choice((Gname), (Bname)).
matched_boy(Bname) ← match(Bname, Gname).
moreboys ← boy(Bname), ¬matched_boy(Bname).
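
The two choice goals make match a one-to-one pairing, so moreboys holds exactly when some boy is left unpaired. A minimal sketch of that reading follows; the names in the example calls are hypothetical, and the greedy pairing order is just one of the many acceptable choices.

```python
def more_boys(boys, girls):
    """True iff some boy remains unmatched in a one-to-one boy/girl pairing."""
    match = {}        # Bname -> Gname, enforcing choice((Bname), (Gname))
    taken = set()     # girls already matched, enforcing choice((Gname), (Bname))
    for b in boys:
        for g in girls:
            if b not in match and g not in taken:
                match[b] = g
                taken.add(g)
    # moreboys <- boy(Bname), not matched_boy(Bname).
    return any(b not in match for b in boys)


print(more_boys(["al", "bo", "cy"], ["di", "eve"]))
print(more_boys(["al"], ["di", "eve"]))
```

Which boy ends up unmatched depends on the choices made, but the yes/no answer does not.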

Choice Goals
Then, in a language such as LDL++ the goal choice((S), (P))
can be added to force the selection of a unique advisor, out of the
eligible advisors, for a student:
Computation/selection of unique advisors by choice
rules
actual_adv(S, P) ← student(S, Majr, Levl),
    professor(P, Majr), choice((S), (P)).
More declaratively, the goal choice((S), (P)) can also be viewed
as enforcing a functional dependency (FD) S → P; thus, in actual_adv,
the second column (professor name) is functionally dependent on
the first one (student name).

Nonmonotonicity and Nondeterminism
With relation student(Name, Majr, Year), our university database
contains the relation professor(Name, Majr). A toy database
with only the following facts:

student('Jim Black', ee, senior).
professor(ohm, ee).
professor(bell, ee).

elig_adv(S, P) ← student(S, Majr, Year), professor(P, Majr).

We obtain:

elig_adv('Jim Black', ohm).
elig_adv('Jim Black', bell).

But a student can only have one advisor.
