
Department of Computer Engineering

Web Data Management (2180713)

XML Evaluation Techniques: Structural Join

Student’s Name with Enrollment No: Harsh Soni (140120107166)

Name of Faculty: Prof. Kiran Shah


Evaluation of Query

• Efficient evaluation of XML queries, and of tree pattern queries in particular, proceeds in two steps:
➢ Match each binary structural relationship in the query (parent-child or ancestor-descendant) against the database.
➢ Stitch these basic matches together into matches for the whole pattern.
Different ways of matching structural relationships
• Tuple-at-a-time approach
➢ Tree traversal
➢ Uses child & parent pointers
➢ Inefficient because it requires a complete pass through the data

• Pointer-based approach
➢ Maintain only (parent, child) pairs and derive (ancestor, descendant) pairs from them: costly in time
➢ Maintain all (ancestor, descendant) pairs: costly in space
➢ Either case is infeasible
Solution: Set-at-a-time approach
• Mechanism: positional representation of occurrences of XML elements and string values
➢ Element: 3-tuple (DocId, StartPos:EndPos, LevelNum)
➢ String: 3-tuple (DocId, StartPos, LevelNum)
Positional Representation
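The positional tuples above make both structural relationships checkable by simple comparisons. The following is a minimal sketch, not the slides' own code: the names `Element`, `is_ancestor`, and `is_parent` and the sample positions are illustrative assumptions, and a string value could be modelled the same way with `end == start`.

```python
from typing import NamedTuple


class Element(NamedTuple):
    """Positional representation (DocId, StartPos:EndPos, LevelNum)."""
    doc_id: int
    start: int
    end: int
    level: int


def is_ancestor(a: Element, d: Element) -> bool:
    # a is an ancestor of d when d's interval lies strictly inside a's,
    # within the same document.
    return a.doc_id == d.doc_id and a.start < d.start and d.end < a.end


def is_parent(a: Element, d: Element) -> bool:
    # A parent is an ancestor exactly one level above.
    return is_ancestor(a, d) and d.level == a.level + 1


# Hypothetical sample positions for <book><title>...</title></book>.
book = Element(doc_id=1, start=1, end=10, level=1)
title = Element(doc_id=1, start=2, end=5, level=2)
word = Element(doc_id=1, start=3, end=3, level=3)  # string value: end == start

print(is_ancestor(book, word), is_parent(book, word))
```

No tree traversal is needed: two nodes can be compared knowing only their tuples.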
Structural Joins
• Set-at-a-time approach
• Uses positional representation of XML
elements.
• I/O and CPU optimal
Structural Join
• Goal: join two lists based on either a parent-child or an ancestor-descendant relationship
• Input (both lists sorted by DocID, then StartPos):
➢ AList (a.DocID, a.StartPos : a.EndPos, a.LevelNum)
➢ DList (d.DocID, d.StartPos : d.EndPos, d.LevelNum)
• Output can be sorted by
➢ Ancestor: (DocID, a.StartPos, d.StartPos), or
➢ Descendant: (DocID, d.StartPos, a.StartPos)
Two types of structural join algorithms

• Tree-merge Algorithm

• Stack-tree Algorithm
Algorithm Tree-Merge-Anc
• Output: ordered by ancestors
• Algorithm: loop through the list of ancestors in increasing order of StartPos
➢ For each ancestor, skip over unmatchable descendants
➢ Check for the ancestor-descendant (or parent-child) relationship
➢ Append each match to the output list
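The loop above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the slides' own code: `Node`, `tree_merge_anc`, and the sample positions are hypothetical, and both lists are assumed to be sorted by (DocID, StartPos).

```python
from typing import NamedTuple


class Node(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int


def tree_merge_anc(alist, dlist, parent_child=False):
    """Structural join with output ordered by ancestor."""
    out = []
    begin = 0  # first descendant that can still match some ancestor
    for a in alist:
        # Skip descendants that start before a: they cannot match a
        # or any later ancestor.
        while begin < len(dlist) and \
                (dlist[begin].doc_id, dlist[begin].start) < (a.doc_id, a.start):
            begin += 1
        # Scan the descendants that start inside a's interval.
        i = begin
        while i < len(dlist) and dlist[i].doc_id == a.doc_id \
                and dlist[i].start < a.end:
            d = dlist[i]
            inside = a.start < d.start and d.end < a.end
            if inside and (not parent_child or d.level == a.level + 1):
                out.append((a, d))
            i += 1
    return out


# Hypothetical data: a1 contains a2 and a3; d1 is in a2, d2 is in a3.
a1, a2, a3 = Node(1, 1, 20, 1), Node(1, 2, 9, 2), Node(1, 10, 19, 2)
d1, d2 = Node(1, 3, 4, 3), Node(1, 11, 12, 3)

pairs = tree_merge_anc([a1, a2, a3], [d1, d2])
# pairs == [(a1, d1), (a1, d2), (a2, d1), (a3, d2)]
```

Note that the inner scan restarts from `begin` for every ancestor, so nested ancestors re-read the same descendants.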
Case for Tree-Merge-Anc
Analysis
• Ancestor-descendant relationship
➢ |Output List| = O(|AList| * |DList|)
➢ Time complexity is O(|AList| * |DList|) in the worst case, which is optimal because the output alone can be that large
➢ But poor I/O performance
• Parent-child relationship
➢ |Output List| = O(|AList| + |DList|)
➢ Time complexity is still O(|AList| * |DList|), quadratic even though the output is only linear in size
Tree-Merge-Desc Algorithm
• Output: ordered by descendants
• Algorithm: loop over the list of descendants in increasing order of StartPos
➢ For each descendant, skip over unmatchable ancestors
➢ Check for the ancestor-descendant (or parent-child) relationship
➢ Append each match to the output list
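A matching sketch for the descendant-ordered variant follows, under the same assumptions as before: `Node`, `tree_merge_desc`, and the sample positions are hypothetical names and data, and both lists are assumed to be sorted by (DocID, StartPos).

```python
from typing import NamedTuple


class Node(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int


def tree_merge_desc(alist, dlist, parent_child=False):
    """Structural join with output ordered by descendant."""
    out = []
    begin = 0  # first ancestor that can still match some descendant
    for d in dlist:
        # Skip leading ancestors that end before d starts: they cannot
        # contain d or any later descendant.
        while begin < len(alist) and \
                (alist[begin].doc_id, alist[begin].end) < (d.doc_id, d.start):
            begin += 1
        # Scan the ancestors that start before d.
        i = begin
        while i < len(alist) and \
                (alist[i].doc_id, alist[i].start) < (d.doc_id, d.start):
            a = alist[i]
            inside = a.doc_id == d.doc_id and d.end < a.end
            if inside and (not parent_child or d.level == a.level + 1):
                out.append((a, d))
            i += 1
    return out


# Hypothetical data: a1 contains a2 and a3; d1 is in a2, d2 is in a3.
a1, a2, a3 = Node(1, 1, 20, 1), Node(1, 2, 9, 2), Node(1, 10, 19, 2)
d1, d2 = Node(1, 3, 4, 3), Node(1, 11, 12, 3)

pairs = tree_merge_desc([a1, a2, a3], [d1, d2])
# pairs == [(a1, d1), (a2, d1), (a1, d2), (a3, d2)]
```

In this run a2 is examined again for d2 even though it ended earlier, because the still-open a1 ahead of it blocks the skip. That rescanning is the source of the quadratic worst case.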
Case for Tree-Merge-Desc
Analysis
• Ancestor-descendant relationship
➢ |Output List| = O(|AList| * |DList|)
➢ Time complexity: O(|AList| * |DList|)
• Parent-child relationship
➢ |Output List| = O(|AList| + |DList|)
➢ Space and time complexities are O(|AList| * |DList|)
• Tree-Merge algorithms are not I/O optimal
• They access parts of the Anc or Desc list repeatedly
Motivation for Stack-Tree Algorithm
• Basic idea: depth first traversal of XML tree
➢ takes linear time with stack of size equal to tree depth
➢ all ancestor-descendant relationships appear on stack
during traversal
• Main problem: we do not want to traverse the whole database, just the nodes in AList or DList
• Solution: Stack-Tree algorithm
➢ Stack: a sequence of AList nodes, each one a descendant of the node below it
Stack-Tree-Desc
• Initialize start pointers (a*, d*, s->top)
• While the input lists are not empty or the stack is not empty
➢ if the new nodes (a* and d*) are not descendants of the current s->top, pop the stack
➢ else
   if a* starts before d* (a*.StartPos < d*.StartPos), push a* on the stack and increment a*
   else
   ➔ Compute the output list for d* by matching it with all nodes in the current stack, in bottom-up order
   ➔ Increment d* to point to the next node
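The steps above can be sketched in Python as follows. This is a simplified in-memory illustration, not the slides' own code: `Node`, `stack_tree_desc`, and the sample positions are hypothetical, both lists are assumed sorted by (DocID, StartPos), and the loop stops once DList is exhausted because nothing left on the stack can produce further output.

```python
from typing import NamedTuple


class Node(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int


def stack_tree_desc(alist, dlist, parent_child=False):
    """Structural join with output ordered by descendant, one pass per list."""
    out, stack = [], []
    ai = di = 0
    while di < len(dlist):
        d = dlist[di]
        a = alist[ai] if ai < len(alist) else None
        take_a = a is not None and (a.doc_id, a.start) < (d.doc_id, d.start)
        nxt = a if take_a else d
        # Pop stack entries that end before the next node starts: they
        # cannot be ancestors of it or of anything after it.
        while stack and (stack[-1].doc_id, stack[-1].end) < (nxt.doc_id, nxt.start):
            stack.pop()
        if take_a:
            stack.append(a)  # a is nested inside every node left on the stack
            ai += 1
        else:
            # Every remaining stack node contains d; emit bottom-up.
            for s in stack:
                if not parent_child or d.level == s.level + 1:
                    out.append((s, d))
            di += 1
    return out


# Hypothetical data: a1 contains a2 and a3; d1 is in a2, d2 is in a3.
a1, a2, a3 = Node(1, 1, 20, 1), Node(1, 2, 9, 2), Node(1, 10, 19, 2)
d1, d2 = Node(1, 3, 4, 3), Node(1, 11, 12, 3)

pairs = stack_tree_desc([a1, a2, a3], [d1, d2])
# pairs == [(a1, d1), (a2, d1), (a1, d2), (a3, d2)]
```

Each input node is pushed or consumed once and popped at most once, so the remaining work is proportional to the number of output pairs.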
Example of Stack-Tree-Desc Execution
AList: a1, a2, ...
DList: d1, d2, ...
(Figures for Steps 1 to 3 not reproduced)
Stack-Tree-Desc Analysis
• Time complexity (for anc-desc and parent-child):
O(|AList| + |DList| + |OutputList|)
• I/O complexity (for anc-desc and parent-child):
O(|AList| / B + |DList| / B + |OutputList| / B)
➢ Where B is the blocking factor
Stack-Tree-Anc
• Output ordered by ancestors
• Cannot use same algorithm, as in Stack-Tree-Desc
• Basic problem: results from a particular descendant
cannot be output immediately
➢ Later descendants may match earlier ancestor, hence
have to be output first
Stack-Tree-Anc
• Solution: keep lists of matching descendant nodes with each stack node
➢ Self-list
   Result pairs for descendants that match this node; each descendant is added to the self-lists of all matching ancestor nodes on the stack
➢ Inherit-list
   Result pairs inherited from nodes already popped from the stack, to be output after this node's self-list matches are output
Algorithm Stack-Tree-Anc
● Initialize start pointers (a*, d*, s->top)
● While the input lists are not empty or the stack is not empty
• if the new nodes (a* and d*) are not descendants of the current s->top, pop the stack (p* = popped ancestor node)
➢ Append p*.inherit_list to p*.self_list
➢ If the stack is now empty, output the resulting list; otherwise append it to (s->top).inherit_list
• else
➢ if a* starts before d* (a*.StartPos < d*.StartPos), push a* on the stack and increment a*
➢ else
   Append the corresponding (ancestor, d*) tuple to the self_list of every node in the stack
   Increment d* to point to the next node
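The self-list and inherit-list bookkeeping can be sketched as below. This is a simplified in-memory illustration under stated assumptions, not the slides' own code: `Node`, `stack_tree_anc`, and the sample positions are hypothetical, both lists are assumed sorted by (DocID, StartPos), only the ancestor-descendant case is shown, and results are emitted only when the bottom stack node is popped.

```python
from typing import NamedTuple


class Node(NamedTuple):
    doc_id: int
    start: int
    end: int
    level: int


def stack_tree_anc(alist, dlist):
    """Ancestor-descendant structural join with output ordered by ancestor."""
    out = []
    stack = []  # entries: (node, self_list, inherit_list)

    def pop():
        _, self_list, inherit_list = stack.pop()
        merged = self_list + inherit_list  # own matches first, then inherited
        if stack:
            stack[-1][2].extend(merged)    # hand over to the new top
        else:
            out.extend(merged)             # bottom node popped: safe to output

    ai = di = 0
    while di < len(dlist):
        d = dlist[di]
        a = alist[ai] if ai < len(alist) else None
        take_a = a is not None and (a.doc_id, a.start) < (d.doc_id, d.start)
        nxt = a if take_a else d
        # Pop entries that end before the next node starts.
        while stack and (stack[-1][0].doc_id, stack[-1][0].end) < (nxt.doc_id, nxt.start):
            pop()
        if take_a:
            stack.append((a, [], []))
            ai += 1
        else:
            # d matches every node on the stack; record it in each self-list.
            for node, self_list, _ in stack:
                self_list.append((node, d))
            di += 1
    while stack:  # no descendants left: flush pending results
        pop()
    return out


# Hypothetical data: a1 contains a2 and a3; d1 is in a2, d2 is in a3.
a1, a2, a3 = Node(1, 1, 20, 1), Node(1, 2, 9, 2), Node(1, 10, 19, 2)
d1, d2 = Node(1, 3, 4, 3), Node(1, 11, 12, 3)

pairs = stack_tree_anc([a1, a2, a3], [d1, d2])
# pairs == [(a1, d1), (a1, d2), (a2, d1), (a3, d2)]
```

With this made-up nesting the result has the same shape as the final output of the example that follows. Buffering pairs in Python lists keeps the sketch short; a disk-based version would need the careful list and buffer handling noted in the analysis.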
Example of Stack-Tree-Anc
AList: a1, a2, a3
DList: d1, d2
(Figures for Steps 1 to 5 not reproduced)
• Final output is :
(a1 , d1) , (a1 , d2) , (a2 , d1) , (a3 , d2)
Stack-Tree-Anc Analysis
• Requires careful handling of lists.
• Time complexity (for anc-desc and parent-child relation)
O(|AList| + |DList| + |OutputList|)
• Careful buffer management needed.
Thank You !