
JOURNAL OF COMPUTING, VOLUME 3, ISSUE 6, JUNE 2011, ISSN 2151-9617 HTTPS://SITES.GOOGLE.COM/SITE/JOURNALOFCOMPUTING/ WWW.JOURNALOFCOMPUTING.ORG


Improved Readability & Understandability by Incorporating Query Engine & QRO in DWH Architecture
Atika Qazi, Rubina Adnan, Junaid Tariq, Saif Ur Rehman Malik, Usman Habib Qazi
Abstract: The success of every organization depends on accurate decisions made by top management. Top management uses a consolidated and clear view of organizational data for decision making. This consolidated view is provided by a data warehousing system. These systems are used as an organizational repository to support strategic business decisions. In this paper, the authors describe a resourceful and clear way of answering queries that come from multiple classes of users. In different scenarios, different types of users may interact with the warehouse system. Users who belong to varied environments need to view the requested result in the form most convenient to them. In order to improve user readability, a QE (Query Engine) and a QRO (Query Result Optimization) layer are proposed. The mechanism proposed in the QRO layer ensures the desired user readability. The role of the Query Engine is to bring transparency and avoid messy data. By providing a choice of readability modes, the quality of decision making is increased to a high extent.

Index Terms: Data Warehouse (DWH), Distributed Data Warehouse (DDWH), Query Result Optimization (QRO)

1 INTRODUCTION
The building of success stands on the beams of decisions made by decision makers. Decisions must be taken in a timely manner in order to gain advantages and thus increase the profitability of the organization. In this regard, the data warehouse plays a key role in supporting top management in decision making. Data warehousing systems are used as decision support systems to facilitate strategic business decision making. A data warehouse (DWH) is an informational database established for business decision support and maintained separately from an organization's operational databases. It comprises historical and current data. According to Kevin, it is "a collection of corporate information, derived directly from operational systems and some external data sources. Its specific purpose is to support business decisions, not business operations" [14]. It provides the basic infrastructure for decision making by extracting, cleansing and storing huge amounts of data. Data warehouses support business decisions by collecting, consolidating, and organizing data for reporting and analysis with tools such as online analytical processing (OLAP) and data mining.

The research questions concern the improved readability of the query results sent back to the user. The following research issues are addressed in this paper:
- How to improve user understandability.
- How to avoid processing messy data.
- How to spend the maximum amount of time on useful work.
- How to tackle garbage data and bring transparency.
- How to provide a user-friendly environment.

Atika Qazi is with the COMSATS Institute of Information Technology, Islamabad, Pakistan.
Rubina Adnan is with the COMSATS Institute of Information Technology, Islamabad, Pakistan.
Junaid Tariq is with the COMSATS Institute of Information Technology, Islamabad, Pakistan.
Saif Ur Rehman Malik is with the COMSATS Institute of Information Technology, Islamabad, Pakistan.
Usman Habib Qazi is with Mobilink, Islamabad, Pakistan.


- How diverse classes of users are handled according to their level of understanding.
- How enhanced results are displayed according to user preference.

In this paper, the authors propose a distributed data warehouse architecture that improves user readability of the result values and avoids untidy information, in order to increase the success ratio of decision making. Beneficial decision making is based upon transparent results. The paper focuses on user convenience and query clarity in order to make decision making more effective. The query engine detects garbage data at a very early stage in order to save time. By avoiding faulty data, the overall performance of the system is improved, which leads to increased efficiency. In this architecture, data coming from useful operational sources is distributed over different levels of the hierarchy in an organization. The main focus is higher readability of the resultant data. The resultant data is provided to the user in the readability mode he requires, so that query output is displayed in a more enhanced way. Because data is distributed over separate individual systems, the load is divided among them for higher reliability.

The rest of the paper is organized into sections. Section 2 gives an overview of the data warehouse. Section 3 presents the proposed distributed data warehouse architecture, Section 4 presents a case study, Section 5 describes the implementation details and Section 6 concludes the research work.

Different MDA (Model Driven Architecture) based methodologies are MODA-TEL, MASTER, C3, ODAC, DREAM and DRIP Catalyst. The Method Engineering (ME) process is used to offer a generic lifecycle for MDA [10]. However, this generic lifecycle is silent about some important activities.
This work identifies the shortcomings in the MDA-SDLC by comparing it with the traditional SDLC, and leads to a new, more generic MDA-SDLC. There are certain shortcomings in the existing MDA architecture, which are discussed by Atika et al. According to Atika et al., the traditional SDLC is taken as the benchmark and the MDA-SDLC is mapped onto it. It is found that the MDA-SDLC lacks some very important phases or activities, such as a modification and enhancement phase, a user training phase and a quality check and evaluation phase. In other words, the MDA-SDLC needs to be modified and enhanced. In order to test the quality of each process, the author introduces the quality assurance check. The quality levels are settled by the cooperative team, and a comparison takes place in order to meet the standards. Step-by-step checking at each stage of the SDLC can reduce the chance of errors, so that a quality product is produced at the end. Correspondence between designers and clients at the end of each phase is mandatory, so that the result matches the client's required specifications. User requirements may change from time to time due to mission creep. The product design should be modular and flexible enough that the performance of the software is not compromised by modification and enhancement. The proposed solution introduces an enhancement phase into the existing MDA. The enhancement phase accommodates upcoming modifications and enhancements in order to keep systems up to date. The existing system has to be flexible enough that new modules can easily be embedded into it. The organization of this paper is as follows: Section 2 describes the literature review about MDA, Section 3 discusses the enhanced MDA and concluding remarks are presented in Section 4.

2 BACKGROUND

2.1 DWH


A data warehouse is a logical collection of information gathered from many different operational databases and used to create business intelligence that supports business analysis activities and decision-making tasks. The data warehousing market consists of tools, technologies, and methodologies that allow for the construction, usage, management, and maintenance of the hardware and software used for a data warehouse, as well as the actual data itself. The data warehouse is a database maintained for business decision support. Rather than on daily operations, the data warehouse focuses on the modeling and analysis of data, which leads to the decision-making process. William Inmon has described a data warehouse as "a subject-oriented, integrated, time-variant, non-volatile collection of data in support of management decisions" [1], [2]. A data warehouse can also be seen as a set of materialized views over data sources [3], [4], [5]. Ralph Kimball et al. defined it as follows: "A data warehouse is a copy of transaction data specially structured for query and analysis" [6]. A data warehouse combines various data sources into a single source for end-user access. End users can perform ad hoc querying, reporting, analysis, data mining and visualization of warehouse information. The goal of a data warehouse is to establish a data repository that makes operational data accessible in a form that is readily acceptable for decision support and other applications.

The basic architecture of a data warehouse is discussed in [8], as shown in Figure 1. Connolly et al. proposed a three-tier architecture of a data warehouse [1]. The first tier consists of the data warehouse and archive/backup data. The second tier consists of different data marts. Reporting, OLAP and data mining tools make up the third tier of the architecture. Hoffer et al. presented a generic two-level architecture of a data warehouse [9]. A detailed logical architecture of a data warehouse is presented in [10], as shown in Figure 2. Some further data warehouse architectures are discussed in [11], [12].

MDA is a methodology for system development. An MDA application involves a Platform Independent Model, a Platform Specific Model and code. Multiple tools are used to represent the different models, such as the Unified Modeling Language (UML), the Meta-Object Facility (MOF) and the Common Warehouse Meta-model (CWM). MDA provides the means of understanding, design, construction, deployment, operation, and maintenance through models, which is why it is called model driven. MDA-based methodologies also use Situational Method Engineering (SME). The purpose of SME is to fit the project to a certain situation. There are different types, such as paradigm-based SME, the generic instantiable process lifecycle and the extension-based approach. The stages CIM, PIM, PSM and code generation are described below:

CIM

The CIM is the computation independent model, also known as the business model. The CIM defines the situation in which the system will work; it therefore helps to present exactly what the system is to do. Business people are conversant with the vocabulary that the CIM holds. Therefore, the CIM plays an important role in providing a close connection between business experts and software engineers.

PIM

The PIM is the platform independent model. It does not describe the details of the platform but specifies the formal structure and functions of the system. The analysis phase of the SDLC provides input to the PIM. UML is frequently used to represent the PIM, although any other related tool can be used.

PSM

The platform specific model is responsible for realizing, on a particular platform, the functionality that is specified in general terms in the PIM. The PSM takes its input from low-level design, which is the stage after analysis. It also identifies how the system makes use of the chosen platform. UML is used to build the platform specific model. A diversity of implementation-related information is provided by the PSM. Mapping is carried out at this stage to transform the PIM into a PSM according to the desired platform.

Code

Coding is finalized at this stage. Most of the implementation is carried out by the MDA tool, which produces code in the language chosen by the developer. Manual coding by the developer is also carried out when required. Finally, the end product is produced. The coding phase provides the ...

Figure 1: Software development lifecycle

Figure 1: Basic Architecture of Data Warehouse [8]

Figure 2: Detailed Logical Architecture of Data Warehouse [10]

3. Proposed Architecture
The proposed architecture is shown in Figure 3. It uses a decentralized approach. Users from multiple directions send sets of queries to the Priority Allocation Layer. The users are distributed in hierarchical order and are classified according to a predefined distribution of classes. Top-level management is placed at the upper level of the hierarchy and subordinates are classified at lower levels. The user interacts with the system through the presentation layer. The priority level of each user is already defined.

The priority allocation layer (PAL) has three major components: the buffer, the priority allocation process and the prioritized queue. The set of queries generated by users is denoted by Q. Q is sent to the priority allocation process, where it is filtered on the basis of a threshold value. The threshold is a predefined value and can be tuned to the system. The threshold is the lowest possible query processing time. It is assumed that the processing time of each query has already been calculated on the basis of its complexity. The complexity might be calculated on the basis of joins, sets and dimensions. The queries in Q whose processing time is less than or equal to the threshold value are filtered out. QThresh is the set of filtered queries. QThresh is processed first and the results are returned to the requesting users. The set of remaining queries Qi is calculated as follows:

Qi = Q - QThresh

where QThresh is the set of queries whose processing time does not exceed the threshold value and Q is the set of all unordered queries. Qi is sorted in two steps. In the first step, Qi is sorted on the basis of user priority in order to obtain Qij, where j is the priority number. In the second step, sorting is applied to each set Qij individually (i.e. each priority set is sorted separately) on the basis of processing time.

The continuation of that research work is proposed here. A new QRO (Query Result Optimization) layer is proposed in order to increase user readability. The success of an organization depends on the decisions that are based upon the results of queries.
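The filtering and two-step sorting performed by the priority allocation layer can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the `Query` record, the function name `allocate_priorities`, the convention that priority 1 is the highest level of the hierarchy, and the choice to order each priority set by shortest processing time first are all assumptions introduced here, since the text does not fix them.

```python
from dataclasses import dataclass
from itertools import groupby


@dataclass(frozen=True)
class Query:
    """Hypothetical query record; field names are illustrative only."""
    text: str
    priority: int      # assumed convention: 1 = top management
    est_time: float    # pre-computed processing time (from complexity)


def allocate_priorities(queries, threshold):
    """Split Q into QThresh and the priority-grouped remainder Qij.

    Returns (q_thresh, q_ij) where q_thresh is the list of queries whose
    estimated time is <= threshold, and q_ij maps each priority number j
    to its remaining queries sorted by estimated time (assumed ascending).
    """
    # QThresh: queries cheap enough to be answered immediately.
    q_thresh = [q for q in queries if q.est_time <= threshold]
    # Qi = Q - QThresh: everything that exceeds the threshold.
    q_i = [q for q in queries if q.est_time > threshold]

    # Step 1: sort the remainder by user priority.
    q_i.sort(key=lambda q: q.priority)

    # Step 2: sort each priority set Qij separately by processing time.
    q_ij = {}
    for j, group in groupby(q_i, key=lambda q: q.priority):
        q_ij[j] = sorted(group, key=lambda q: q.est_time)
    return q_thresh, q_ij


if __name__ == "__main__":
    demo = [
        Query("q1", 3, 0.5),
        Query("q2", 1, 9.0),
        Query("q3", 2, 4.0),
        Query("q4", 1, 2.0),
    ]
    fast, grouped = allocate_priorities(demo, threshold=1.0)
    print([q.text for q in fast])
    print({j: [q.text for q in qs] for j, qs in grouped.items()})
```

Keeping QThresh separate from the grouped remainder mirrors the description above, in which the cheap queries are answered first and only the remaining ones compete for time slots by priority.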
Time Tij is defined as the time slot assigned to set Qij in order to interact with the DWH within a predefined time. The results are returned to the requesting users accordingly. The algorithm of Query Result Optimization is given below.

Input:
  QStand: set of resultant queries in regular display form.
  QRequired: set of queries after formulating QStand in the requested form.
  Qi, Qj and Qk: the offered modes; the user may select one according to need.
  QInitial: the mode selected by the user.
Output:
  QConvert: queue of resultant queries to be executed.

Begin
  Qo ← Qi, Qj or Qk
  For each query in QInitial
    Qo = Assign QInitial as Qi, Qj or Qk on the basis of the requested processing mode.
    QInitial ← Qo
  End
  For each query in QStand
    QStand = Process the queries according to the PAL process
  End
  For each requested mode QStand
    QConvert = Convert QStand on the basis of the requested mode.
    QConvert ← QStand
    QConvert = Qo U QStand
  End
End

Here QInitial is the mode selected to process the query, QStand is the set of processed queries in typical form, QConvert is the set of resultant queries in the mode requested by the user, and Qo is the set of upcoming queries.
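The conversion step of the algorithm above, in which results in standard form are rendered in the mode the user selected, could look roughly like the following Python sketch. The paper does not say what the offered modes Qi, Qj and Qk actually are, so the three modes used here ("table", "summary" and "json"), the function names and the fallback to a default mode are purely hypothetical choices made for illustration.

```python
import json


def to_table(rows):
    """Render a result set (list of dicts) as an aligned text table."""
    if not rows:
        return ""
    cols = list(rows[0])
    widths = {c: max(len(c), *(len(str(r[c])) for r in rows)) for c in cols}
    lines = [" | ".join(c.ljust(widths[c]) for c in cols)]
    lines.append("-+-".join("-" * widths[c] for c in cols))
    for r in rows:
        lines.append(" | ".join(str(r[c]).ljust(widths[c]) for c in cols))
    return "\n".join(lines)


def to_summary(rows):
    """Render a one-line overview of a result set."""
    cols = ", ".join(rows[0]) if rows else ""
    return f"{len(rows)} row(s); columns: {cols}"


def to_json(rows):
    """Render a result set as indented JSON text."""
    return json.dumps(rows, indent=2)


# Hypothetical stand-ins for the offered modes Qi, Qj and Qk.
MODES = {"table": to_table, "summary": to_summary, "json": to_json}


def qro_convert(q_stand, q_initial, default="table"):
    """Convert every standard-form result in q_stand to the selected mode.

    q_initial is the mode chosen by the user; an unknown mode falls back
    to `default` (an assumption, the source does not cover this case).
    """
    converter = MODES.get(q_initial, MODES[default])
    return [converter(rows) for rows in q_stand]


if __name__ == "__main__":
    result = [{"region": "North", "sales": 120},
              {"region": "South", "sales": 95}]
    for mode in MODES:
        print(qro_convert([result], mode)[0])
        print()
```

Registering the converters in a dictionary keeps the set of readability modes open: a further mode can be offered to users by adding one entry, without touching the conversion loop.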

Figure 3: Proposed Architecture of Distributed Data Warehouse

5. CONCLUSION
Currently, the data warehouse is used as an organizational repository to support business decision making. Most data warehouse systems use a centralized approach. Furthermore, the hierarchy of the organization and the classes of users are not considered in data

References
[1]. Thomas Connolly, Carolyn Begg, Database Systems: A Practical Approach to Design, Implementation and Management, 4th Edition, Addison-Wesley, 2003
[2]. William Inmon, Building the Data Warehouse, 2nd Edition, New York: Wiley publisher. Inc, 1996
[3]. Z. Bellahsene, Schema Evolution in Data Warehouses, Knowledge and Information Systems, Springer-Verlag, pp 283-304, 2002
[4]. S. Chen, X. Zhang, E.A. Rundensteiner, A Compensation-Based Approach for Materialized View Maintenance in Distributed Environments, In Computer Science Technical Report, Worcester Polytechnic Institute, Worcester, MA, USA, 2004
[5]. E.A. Rundensteiner, A. Koeller, X. Zhang, Maintaining Data Warehouses over Changing Information Sources, Communications of the ACM, Volume 43, New York, NY, USA, pp 57-62, 2000
[6]. Ralph Kimball, M. Joy and T. Warren, The Data Warehouse Toolkit: with SQL Server and Microsoft Business Intelligence Toolset, 2nd Edition, New York: Wiley publisher. Inc, 2006
[7]. Efraim Turban, Jay E. Aronson and Narasimha Bolloju, Decision Support Systems and Intelligent Systems, 7th Edition, Prentice Hall College Div, 2001
[8]. Data Warehousing and OLAP, www.cs.uh.edu/~ceick/6340/dwolap.ppt, Accessed Date: Nov 25, 2008
[9]. Jeffrey A. Hoffer, Mary B. Prescott, Fred R. McFadden, Modern Database Management, 6th Edition, Pearson Education Publishers, Singapore
[10]. Online Analytical Processing (OLAP) and Data Warehousing, academic2.bellevue.edu/~jwright/CIS605/Lesson10/OLAP.ppt, Accessed Date: Dec 5, 2008
[11]. Daniel L. Moody, Mark A.R. Kortink, From Enterprise Models to Dimensional Models: A Methodology for Data Warehouse and Data Mart Design, In Proceedings of the International Workshop on Design and Management of Data Warehouses (DMDW'2000), June 5-6, 2000, Stockholm, Sweden
[12]. Mohammad Rifaie, Erwin J. Blas, Abdel Rahman M. Muhsen, Terrance T. H. Mok, Keivan Kianmehr, Reda Alhajj, Mick J. Ridley, Data Warehouse Architecture for GIS Applications, In Proceedings of the 10th International Conference on Information Integration and Web-based Applications & Services (iiWAS '08), November 2008, Linz, Austria
[13]. The Real-Time Data Warehouse: The Next Stage in Data Warehouse Evolution.

[14]. Kevin C. Desouza, Managing Knowledge with Artificial Intelligence: An Introduction with Guidelines for Non-specialists, United States of America, British Library of congress, 2002.
[15]. Atika Qazi, The Study on Distributed Data Warehouse Modeling, Thesis Report, IIUI, 2010

Atika Qazi received her MS from the International Islamic University, Islamabad (2008-2010), her MCS from the COMSATS Institute of Information Technology, Abbottabad (2002-2004), and her BSc in 1999-2001. She has worked with COMSATS Islamabad since 2004 and is presently working for the COMSATS Institute of Information Technology, Islamabad, Pakistan. Two of her papers have been accepted at international conferences: the first at ICMLC 2011, Singapore, February 26-28, 2011, and the second at ICCCM, Sydney, Australia, May 2-4, 2011.

Rubina Adnan is working as an Assistant Professor at COMSATS. She has completed her MS from COMSATS and is an active researcher.

Junaid Tariq has been working at COMSATS as a Lecturer for more than 3 years. He is in the final year of his MS and has a couple of publications in this field.

Saif Ur Rehman Malik is working as a Research Associate at COMSATS. He completed his MS in 2009 and has more than 15 publications in this field. He is an active researcher and his area of research is Business Intelligence.

Usman Habib Qazi is working as a Team Coordinator in the Mobilink contact centre. He received his BS in Computer Science from COMSATS.
