
Semantic XPath Query Transformation

Jeow Li Fook
jeowlifo@comp.nus.edu.sg

Introduction, Motivation and Related Works


XML is now a widely accepted standard for the management and interchange of data and information. With the growing amount of XML data, we need efficient ways of querying it using XPath, XQuery, etc. In this project, we look at the semantic transformation of XPath expressions enabled by the integrity constraints inherent in an XML schema.

Much past work has focused on relational (RDB), object-oriented (OODB) and deductive databases (DDB). Among it is King [1], who defined the notion of semantic equivalence between a query and its transformation under integrity constraints. These notions inspired many other works on semantic query optimization (SQO), such as Chakravarthy [2]. In his dissertation, he presented the residue method for transforming queries for DDB. The method and idea presented there were later adapted to OODB by Grant et al. in [3]. As people started to design databases for XML, optimization techniques for XML databases began to appear. Among them was [4], where McHugh and Widom presented a cost-based SQO technique to optimize path traversal of an XML data graph in the absence of a schema. Subsequently, as researchers realised the potential of XML schemas, papers such as [5] appeared, where Bohm et al. leveraged the properties found in a DTD to build a structure index that speeds up query processing. As XPath gained greater acceptance, papers such as [6] followed, where Olteanu et al. performed syntactic transformations on XPath expressions. In another paper [7], Kwong and Gertz introduced a framework to optimize XPath with the help of sets of alternative but equivalent XPath expressions derived from properties found in a DTD.

Enabled by constraint 4, Query 4a,

/company/department/staffList/perm[20001]

, returns an empty set without being executed.

[Chart: User Response Time for Query 2a and Query 2b; y-axis Time, x-axis Systems (System 1, System 2, System 3)]

Constraint 5: Enumeration
<element name="age"><simpleType><restriction base="int"><minInclusive value="16"/><maxInclusive value="55"/></restriction></simpleType></element>


Enabled by constraint 5, Query 5a,


/company/department/staffList/contract[age>12 and age<35 and age>0 and age<100 and age>10 and age<36]


, has its overlapping comparison predicates removed to Query 5b,


/company/department/staffList/contract [age<35]
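The step from Query 5a to Query 5b can be sketched as follows. This is a minimal illustration, not the project's implementation: the function names and the `(operator, value)` representation of predicates are assumptions made for the example, and only strict `>` and `<` comparisons on a single integer field are handled. A comparison already implied by the schema bounds (16 to 55 for `age`) is dropped, and of the remaining comparisons only the tightest lower and upper bound are kept. Unsatisfiable combinations are not detected here.

```python
def simplify_range_predicates(predicates, min_inclusive, max_inclusive):
    """Drop comparisons implied by the schema range, keep the tightest of the rest.

    predicates: list of (op, value) pairs with op in {">", "<"} (hypothetical format).
    """
    lower = None  # tightest ">" bound that the schema does not already imply
    upper = None  # tightest "<" bound that the schema does not already imply
    for op, value in predicates:
        if op == ">":
            # field >= min_inclusive already guarantees field > value
            if value < min_inclusive:
                continue
            lower = value if lower is None else max(lower, value)
        elif op == "<":
            # field <= max_inclusive already guarantees field < value
            if value > max_inclusive:
                continue
            upper = value if upper is None else min(upper, value)
        else:
            raise ValueError(f"unsupported operator: {op}")
    result = []
    if lower is not None:
        result.append((">", lower))
    if upper is not None:
        result.append(("<", upper))
    return result


def to_xpath(path, field, predicates):
    """Render the remaining predicates back into an XPath string."""
    if not predicates:
        return path
    condition = " and ".join(f"{field}{op}{value}" for op, value in predicates)
    return f"{path}[{condition}]"


query_5a = [(">", 12), ("<", 35), (">", 0), ("<", 100), (">", 10), ("<", 36)]
query_5b = simplify_range_predicates(query_5a, min_inclusive=16, max_inclusive=55)
print(to_xpath("/company/department/staffList/contract", "age", query_5b))
# prints /company/department/staffList/contract[age<35]
```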

Figure 3: Increment in response time

Constraint 6: Adhesion constraints


<element name="name"> <complexType><sequence><element name="firstName" type="string" minOccurs="0"/><element name="lastName" type="string"/></sequence></complexType></element>
[Chart: User Response Time for Query 7a and Query 7b; y-axis Time, x-axis Systems (System 1, System 2, System 3)]

Enabled by constraint 6, Query 6a,


/company/department/staffList/perm[name/lastName]

, has its path test predicate removed to Query 6b,


/company/department/staffList/perm
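One way to picture the removal of the path test in Query 6a is a lookup of `minOccurs` along the predicate's steps. The sketch below is an illustration under assumptions: the `MIN_OCCURS` table is a hand-written summary of the schema fragments shown on this poster (with the XML Schema default of 1 where `minOccurs` is omitted), the function names are hypothetical, and element names are assumed to identify their declarations uniquely.

```python
# Hypothetical summary of the schema: parent element -> {child element: minOccurs}.
MIN_OCCURS = {
    "perm": {"name": 1, "age": 1},
    "name": {"firstName": 0, "lastName": 1},
}


def is_always_present(context, relative_path, min_occurs=MIN_OCCURS):
    """True when every step of relative_path is a mandatory child of the previous one."""
    current = context
    for step in relative_path.split("/"):
        if min_occurs.get(current, {}).get(step, 0) < 1:
            return False
        current = step
    return True


def remove_redundant_path_test(path, context, predicate):
    """Drop a path test predicate that the schema guarantees to hold for valid data."""
    if is_always_present(context, predicate):
        return path
    return f"{path}[{predicate}]"


base = "/company/department/staffList/perm"
print(remove_redundant_path_test(base, "perm", "name/lastName"))   # predicate dropped
print(remove_redundant_path_test(base, "perm", "name/firstName"))  # predicate kept
```

The second call keeps its predicate because `firstName` is declared with `minOccurs="0"`, so its presence is not guaranteed.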

Constraint 7: Structural constraints


<element name="company"><complexType><sequence>
  <element name="department"><complexType><sequence>
    <element name="staffList"><complexType><sequence>
      <element name="perm" minOccurs="0" maxOccurs="20000"><sequence><element ref="name"/><element ref="age"/></sequence></element>
      <element name="contract" minOccurs="0" maxOccurs="20000"><sequence><element ref="name"/><element ref="age"/><element name="tenureMth" type="int"/></sequence></element>
    </sequence></complexType></element>
  </sequence></complexType></element>
</sequence></complexType></element>

Figure 4: Non- deterministic result

Semantic Transformations
We present 6 transformations of XPath expressions. Knowledge of certain integrity constraints enables each of them to transform an XPath expression into a semantically equivalent one, i.e. into an expression denoting the same set of XML data for any XML data valid with respect to the given XML schema.

Transformation 1: Expansion of path expression
Transformation 2: Contraction of path expression
Transformation 3: Addition of predicates
Transformation 4: Removal of predicates
Transformation 5: Return empty
Transformation 6: Rewrite paths

Constraint 1: Primary key
<key name="deptKey"><selector xpath="./department" /><field xpath="@id" /></key>

Enabled by constraint 7, Query 7a,


/company/department/staffList/contract/age

, is contracted to a partial path Query 7b,


//contract/age
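Contraction and expansion under a structural constraint can both be sketched as a search over the element paths the schema allows. The nested dictionary `SCHEMA` below is a hand-written, simplified copy of the structure in constraint 7 (an assumption for this example, with no recursion and no attributes), and `expand` and `contract` are hypothetical helper names. A partial path is expanded only when exactly one full path ends with it, and a full path is contracted to its shortest suffix that is still unambiguous.

```python
# Simplified element tree taken from constraint 7 (illustrative, non-recursive).
SCHEMA = {
    "company": {
        "department": {
            "staffList": {
                "perm": {"name": {"firstName": {}, "lastName": {}}, "age": {}},
                "contract": {
                    "name": {"firstName": {}, "lastName": {}},
                    "age": {},
                    "tenureMth": {},
                },
            }
        }
    }
}


def full_paths(tree, prefix=()):
    """Yield every root-to-element path in the schema tree as a tuple of names."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from full_paths(children, path)


def matches(steps):
    """All full paths whose trailing steps equal the given steps."""
    steps = tuple(steps)
    return [p for p in full_paths(SCHEMA) if p[-len(steps):] == steps]


def expand(partial):
    """Expand '//a/b' to its full path when the schema allows exactly one."""
    candidates = matches(partial.lstrip("/").split("/"))
    if len(candidates) == 1:
        return "/" + "/".join(candidates[0])
    return None  # ambiguous or unknown: leave the query untouched


def contract(full):
    """Contract a full path to its shortest unambiguous '//suffix'."""
    steps = full.strip("/").split("/")
    for start in range(len(steps) - 1, 0, -1):
        if len(matches(steps[start:])) == 1:
            return "//" + "/".join(steps[start:])
    return full


print(contract("/company/department/staffList/contract/age"))  # //contract/age
print(expand("//tenureMth"))  # /company/department/staffList/contract/tenureMth
```

Note that `//age` alone is not a safe contraction here, because `age` occurs under both `perm` and `contract`.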

We have shown in the 4 figures that transformations can lead to several outcomes in the 3 systems. In some cases the 3 systems showed similar outcomes, and in other cases they differed. Nevertheless, the results obtained can help us decide which transformation to apply to a given query. For example, if we want to transform a path with wildcard selection, we know from the performance evaluation that it is beneficial to transform it to a full path or a partial path, since the user response time is then either reduced or unchanged.

Enabled by constraint 7, Query 8a,

//tenureMth

, is expanded to full path Query 8b,

/company/department/staffList/contract/tenureMth

Conclusion
We have identified transformations, classified them into 6 categories, and shown empirically that some of these transformations indeed optimize XPath queries. This preliminary work is encouraging and suggests a finer-grained classification of the transformations. If a realistic cost model is available for the evaluation of XPath expressions, we envisage analytically evaluating the potential for optimization yielded by the different semantic transformations. We shall also explore applications of the identified transformations other than optimization. We foresee applications in the distributed management and interchange of XML data, in which query rewriting can increase not only efficiency but also effectiveness.

Performance Evaluation
In total, 37 queries and their corresponding transformed queries were evaluated. To get a clearer picture, we apply a t-test at the 95% confidence level to the user response times collected for the original query and for the transformed query, to determine which of four outcomes holds: (1) no change in response time, (2) reduction in response time, (3) increment in response time, or (4) non-deterministic result.
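The comparison of response times can be sketched as below. Several points are assumptions rather than details stated on this poster: the test variant (Welch's unequal-variance t-test is used here), the timings (made-up numbers for illustration only), and the critical value, which the caller looks up in a t table for the chosen confidence level and degrees of freedom. The criterion for the non-deterministic outcome is not specified in the text, so the sketch distinguishes only the first three outcomes.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Welch's t statistic for two independent samples (assumed test variant)."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    diff = mean(sample_a) - mean(sample_b)
    if se == 0:
        # Both samples are constant: any difference in means is exact.
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / se


def classify(original_ms, transformed_ms, critical_t):
    """Compare response times of the original and the transformed query."""
    t = welch_t(original_ms, transformed_ms)
    if abs(t) < critical_t:
        return "no change"
    # Positive t means the original query was slower than the transformed one.
    return "reduction" if t > 0 else "increment"


# Hypothetical timings in milliseconds, five runs each.
original = [100, 102, 98, 101, 99]
transformed = [50, 52, 48, 51, 49]
# 2.306 is the two-sided 95% critical value for 8 degrees of freedom.
print(classify(original, transformed, critical_t=2.306))
```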
[Chart: User Response Time for Query 8a and Query 8b; y-axis Time, x-axis Systems (System 1, System 2, System 3)]

Enabled by constraint 1, Query 1a,


//perm[@id = (/company//perm[last()])/@id] /@id

, is rewritten to Query 1b,


(/company//perm[last()])/@id

Constraint 2: Foreign key


<keyref name="supervisorPVal" refer="permKey"><selector xpath="./department/staffList/perm" /><field xpath="supervisor" /></keyref>

References
[1] J. J. King. QUIST: A System for Semantic Query Optimization in Relational Databases. In Proceedings of the 7th VLDB Conference, 1981, p. 510-517.
[2] U. S. Chakravarthy, J. Grant, and J. Minker. Logic Based Approach to Semantic Query Optimization. ACM Transactions on Database Systems, Vol. 15, No. 2, June 1990, p. 162-207.
[3] J. Grant, J. Gryz, J. Minker, and L. Raschid. Logic Based Semantic Query Optimization for Object Databases. In Proceedings of the Thirteenth International Conference on Data Engineering, 1997, p. 444-453.
[4] J. McHugh and J. Widom. Query Optimization for XML. In Proceedings of the 25th VLDB Conference, Edinburgh, Scotland, 1999.
[5] K. Bohm, K. Aberer, M. T. Ozsu, and K. Gayer. Query Optimization for Structured Documents Based on Knowledge on the Document Type Definition. Advances in Digital Libraries, 1998, p. 196-205.
[6] D. Olteanu, H. Meuss, T. Furche, and F. Bry. XPath: Looking Forward. In Proceedings of the EDBT Workshop on XML Data Management, 2002.
[7] A. Kwong and M. Gertz. Schema-based Optimization of XPath Expressions. Submitted for conference publication, 2002. <http://citeseer.nj.nec.com/551000.html>

Enabled by constraint 2, Query 2a,


//contract[supervisor=/company//perm/@id]/name

, has its comparison predicate removed to Query 2b,


//contract/name
Figure 1: No change in response time

Constraint 3: Exclusivity
<element name="city"><simpleType><restriction base="string"><enumeration value="Singapore" /><enumeration value="Malaysia" /></restriction></simpleType></element>
[Chart: User Response Time for Query 5a and Query 5b; y-axis Time, x-axis Systems (System 1, System 2, System 3)]

Enabled by constraint 3, Query 3a,


//contract [city = "Texas" or city = "Singapore"]

, has its comparison predicate removed to Query 3b,


//contract [city = "Singapore"]
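The rewrite of Query 3a into Query 3b amounts to discarding every disjunct that compares the field with a value outside the schema's enumeration. A minimal sketch, assuming the predicate has already been parsed into a field name and a list of string literals (the function name and this input format are hypothetical):

```python
def prune_disjuncts(field, literals, allowed):
    """Keep only the equality tests whose literal the enumeration permits.

    Returns None when no disjunct can hold, in which case the query can be
    answered with an empty set without being executed.
    """
    kept = [value for value in literals if value in allowed]
    if not kept:
        return None
    return " or ".join(f'{field} = "{value}"' for value in kept)


city_values = {"Singapore", "Malaysia"}  # enumeration from constraint 3
predicate = prune_disjuncts("city", ["Texas", "Singapore"], city_values)
print(f"//contract[{predicate}]")
# prints //contract[city = "Singapore"]
```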


Constraint 4: Cardinality
<element name="perm" type="permType" minOccurs="0" maxOccurs="20000"/>
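The cardinality bound above is what lets a query such as Query 4a, `/company/department/staffList/perm[20001]`, be answered as empty without execution: a valid `staffList` can never hold a 20001st `perm`. The sketch below illustrates that check under assumptions. The helper names are hypothetical, only a trailing numeric position predicate on the last step is recognised, and `run_query` stands for whatever XPath engine would otherwise evaluate the query.

```python
import re


def exceeds_max_occurs(xpath, element, max_occurs):
    """True when the last step is element[n] with n greater than maxOccurs."""
    pattern = rf"/{re.escape(element)}\s*\[\s*(\d+)\s*\]\s*$"
    match = re.search(pattern, xpath)
    return bool(match) and int(match.group(1)) > max_occurs


def evaluate(xpath, element, max_occurs, run_query):
    """Return an empty result directly when the position can never be reached."""
    if exceeds_max_occurs(xpath, element, max_occurs):
        return []  # Transformation 5: return empty, the engine is not called
    return run_query(xpath)


def never_called(xpath):
    raise AssertionError("the query engine should not be invoked")


print(evaluate("/company/department/staffList/perm[20001]", "perm", 20000, never_called))
# prints []
```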

Figure 2: Reduction in response time

Honours Year Project 2005/06. School of Computing, National University of Singapore
