1. Introduction
A business intelligence system is commonly described as a decision support system. A trend in data integration has been to loosen the coupling between data. In this paper, based on the theory of ontology, we try to optimize the integration process and make it efficient, flexible, and reusable.
[Figure labels: Historical Data, External Data, Current Data]
3. Information integration
Information integration is also called data integration. In a business intelligence system, ETL [4] is the information integration process. The so-called information society demands complete access to available information, which is often heterogeneous and distributed. To establish efficient information sharing, many technical problems have to be solved. First, a suitable information source that might contain the data needed for a given task must be located. Finding suitable information sources is a problem addressed in the areas of information retrieval and information filtering. Once the information source has been found, access to the data therein has to be provided. This means that each of the information sources found in the first step has to work together with the system that is querying the information. The problem of bringing together heterogeneous and distributed computer systems is known as the interoperability problem.
Interoperability has to be provided on a technical and on an informational level. In short, information sharing not only needs to provide full accessibility to the data, it also requires that the accessed data can be processed and interpreted by the remote system. Problems that might arise owing to heterogeneity of the data are already well known within the distributed database systems community: structural heterogeneity (schematic heterogeneity) and semantic heterogeneity (data heterogeneity). Structural heterogeneity means that different information systems store their data in different structures. Semantic heterogeneity concerns the content of an information item and its intended meaning. [5]
To achieve semantic interoperability in a heterogeneous information system, the meaning of the information that is interchanged has to be understood across the systems. Semantic conflicts occur whenever two contexts do not use the same interpretation of the information. Three main causes of semantic heterogeneity have been identified. [6]
The BI system, as discussed in the previous section, needs to integrate various kinds of heterogeneous information into the data warehouse. If we find a better way of modeling the ETL process, the integration will be more stable and reusable.
4. Ontology theory
Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption and public availability of ontologies has made it possible for the data integration community to leverage them for semantic integration of data and information. [7] Ontologies were initially introduced as an explicit specification of a conceptualization. Therefore, ontologies can be used in an integration task to describe the semantics of the information sources and to make their contents explicit. With respect to the integration of data sources, they can be used for the identification and association of semantically corresponding information concepts. [8]
In general, three approaches to employing ontologies can be identified: single ontology approaches, multiple ontology approaches, and hybrid approaches [9]. Figure 2 gives an overview of the three main architectures.
[Figure 2 labels: global ontology, local ontology]
Figure 2. Three ontology approaches
The BI system we design in the next section is based on the hybrid ontology architecture.
[Figure labels: Ontology-based BI Middleware, Applications, Basic App, Advanced App, Agent Apps, Data Mining Engines, Ontology-based API, Global ontology, Local ontology, Mapping rules, JDBC, MOLAP, SQL Server, Oracle]
access interface, so we cannot treat them all in the same way. This problem can be solved with a database wrapper, which encapsulates the different access interfaces behind a uniform method. That is, we send a general data access command (e.g., a SQL command) to the wrapper, and the wrapper returns uniform data from the heterogeneous databases. Because this system is built on the Java platform, the wrapper consists of JDBC (Java Database Connectivity) and some data access code that we wrote.
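The wrapper idea can be illustrated with a minimal Java sketch. It is not the code of the system described here: the names `DataSourceWrapper`, `JdbcWrapper`, and `InMemoryWrapper` are hypothetical, and the in-memory class exists only so that the example runs without a database or JDBC driver. Only standard `java.sql` calls are used.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Uniform access method: every source answers a SQL command with rows of column -> value. */
interface DataSourceWrapper {
    List<Map<String, Object>> query(String sql) throws SQLException;
}

/** Wraps any database reachable through a JDBC URL (hypothetical class name). */
final class JdbcWrapper implements DataSourceWrapper {
    private final String jdbcUrl;

    JdbcWrapper(String jdbcUrl) {
        this.jdbcUrl = jdbcUrl;
    }

    @Override
    public List<Map<String, Object>> query(String sql) throws SQLException {
        // try-with-resources closes the result set, statement, and connection in reverse order.
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(sql)) {
            return toRows(rs);
        }
    }

    /** Copies a ResultSet into plain maps so callers never handle a ResultSet directly. */
    static List<Map<String, Object>> toRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }
}

/** In-memory stand-in (an assumption of this sketch) so the example runs without a database. */
final class InMemoryWrapper implements DataSourceWrapper {
    private final List<Map<String, Object>> rows;

    InMemoryWrapper(List<Map<String, Object>> rows) {
        this.rows = rows;
    }

    @Override
    public List<Map<String, Object>> query(String sql) {
        return rows; // ignores the SQL text; a real source would evaluate it
    }
}

class WrapperDemo {
    public static void main(String[] args) throws SQLException {
        DataSourceWrapper source =
                new InMemoryWrapper(List.of(Map.<String, Object>of("sale_price", 12.5)));
        // The caller uses the same call regardless of which wrapper is behind the interface.
        System.out.println(source.query("SELECT sale_price FROM sales"));
    }
}
```

Swapping `InMemoryWrapper` for `JdbcWrapper` with a real JDBC URL would leave the calling code unchanged, which is the point of the uniform method.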
part components, or some of the modules will be added in the future. The structure in which data is stored in the data warehouse differs from what the interfaces of advanced applications can handle, so this is also an interoperability problem. The system addresses it with the ontology-based API (Application Programming Interface), which can supply data in varying forms by interacting with applications. First, the API retrieves data from the data warehouse in the standard ontology form and then converts it into a form that the application can accept. If the standard ontology is established as a standard and applications can handle the standard form, the standard ontology model will be generally acceptable. One advantage is that security is increased, because applications do not access the data warehouse directly. Isolation between applications and the data warehouse also increases the system's flexibility and degree of decoupling. In other words, the system can adapt easily to changes in the data warehouse by modifying the interface of the ontology-based API. Interoperability and openness will both increase quickly in future BI systems. We will take advantage of the data warehouse systems of business partners. There is a trend to publish BI analysis results in XML form, which most systems can process. That is one reason why the ontology is represented in OWL, which has a standard XML-based serialization (RDF/XML).
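The two steps of the ontology-based API (read in standard ontology form, then convert to the form an application accepts) can be sketched as follows. This is an illustration under stated assumptions, not the system's actual API: the class `OntologyApi`, the column names `sp_col` and `internal_id`, and the term `SalePrice` are hypothetical, and the data warehouse is replaced by an in-memory list.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical facade standing between applications and the data warehouse. */
final class OntologyApi {
    // Global ontology, reduced here to: warehouse column name -> vocabulary term.
    private final Map<String, String> columnToTerm;
    // Stand-in for rows read from the data warehouse.
    private final List<Map<String, Object>> warehouseRows;

    OntologyApi(Map<String, String> columnToTerm, List<Map<String, Object>> warehouseRows) {
        this.columnToTerm = columnToTerm;
        this.warehouseRows = warehouseRows;
    }

    /** Step 1: re-key each warehouse row by vocabulary term (the standard ontology form). */
    List<Map<String, Object>> inStandardForm() {
        List<Map<String, Object>> result = new ArrayList<>();
        for (Map<String, Object> row : warehouseRows) {
            Map<String, Object> standard = new LinkedHashMap<>();
            for (Map.Entry<String, String> mapping : columnToTerm.entrySet()) {
                if (row.containsKey(mapping.getKey())) {
                    standard.put(mapping.getValue(), row.get(mapping.getKey()));
                }
            }
            result.add(standard); // columns without a vocabulary term are not exposed
        }
        return result;
    }

    /** Step 2: convert each standard-form row into the form one application accepts. */
    <T> List<T> fetch(Function<Map<String, Object>, T> adapter) {
        List<T> converted = new ArrayList<>();
        for (Map<String, Object> standard : inStandardForm()) {
            converted.add(adapter.apply(standard));
        }
        return converted;
    }
}

class OntologyApiDemo {
    public static void main(String[] args) {
        OntologyApi api = new OntologyApi(
                Map.of("sp_col", "SalePrice"),
                List.of(Map.<String, Object>of("sp_col", 10.0, "internal_id", 7)));
        // One application wants plain text lines; another could pass a different adapter.
        List<String> lines = api.fetch(row -> "SalePrice=" + row.get("SalePrice"));
        System.out.println(lines);
    }
}
```

Because applications only ever see vocabulary terms, a renamed warehouse column would require a change to the `columnToTerm` map in this sketch and nothing on the application side.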
Figure 4. Protégé
In the next section we elaborate on the key module and illustrate it with an example.
The ontology used in this system was designed in Protégé [10], a software tool for building ontologies visually. It also supports other commonly used ontology representation languages. We use OWL DL [11] to describe the ontology in this system.
The vocabulary Gross Profit depends on the vocabularies Sale Price and Purchase Price, so a subtraction must be performed. In this system, we store these calculations in the mapping ontology, which maps from one vocabulary to another. We list the OWL file that represents the ontology used in the business system we designed. This file contains the standard ontology corresponding to the sales theme. It was generated by Protégé version 3.3.1. The ontology properties are connected to the shared vocabularies using the OWL tag <owl:equivalentProperty />. We can build local ontologies in the same way to realize our system design.
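As a rough illustration of how such a stored calculation might be applied during integration, the following Java sketch derives Gross Profit from Sale Price and Purchase Price. All names are assumptions made for the example: the classes `MappingRule` and `MappingEngine`, the source column names `price_out` and `price_in`, and the representation of ontologies as plain maps rather than OWL files.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.ToDoubleFunction;

/** One entry of the mapping ontology: a derived term and how to compute it. */
final class MappingRule {
    final String target;
    final List<String> inputs;
    final ToDoubleFunction<Map<String, Double>> formula;

    MappingRule(String target, List<String> inputs,
                ToDoubleFunction<Map<String, Double>> formula) {
        this.target = target;
        this.inputs = inputs;
        this.formula = formula;
    }
}

final class MappingEngine {
    /**
     * Re-keys a source row by vocabulary term using the local ontology,
     * then applies every mapping rule whose inputs are all present.
     */
    static Map<String, Double> integrate(Map<String, Double> sourceRow,
                                         Map<String, String> localOntology,
                                         List<MappingRule> rules) {
        Map<String, Double> byTerm = new HashMap<>();
        for (Map.Entry<String, Double> cell : sourceRow.entrySet()) {
            String term = localOntology.get(cell.getKey());
            if (term != null) {
                byTerm.put(term, cell.getValue());
            }
        }
        for (MappingRule rule : rules) {
            if (byTerm.keySet().containsAll(rule.inputs)) {
                byTerm.put(rule.target, rule.formula.applyAsDouble(byTerm));
            }
        }
        return byTerm;
    }
}

class MappingDemo {
    public static void main(String[] args) {
        // Local ontology (hypothetical): source column name -> shared vocabulary term.
        Map<String, String> localOntology =
                Map.of("price_out", "SalePrice", "price_in", "PurchasePrice");
        // Mapping rule from the text: Gross Profit = Sale Price - Purchase Price.
        MappingRule grossProfit = new MappingRule(
                "GrossProfit",
                List.of("SalePrice", "PurchasePrice"),
                row -> row.get("SalePrice") - row.get("PurchasePrice"));
        Map<String, Double> result = MappingEngine.integrate(
                Map.of("price_out", 30.0, "price_in", 18.0),
                localOntology,
                List.of(grossProfit));
        System.out.println(result.get("GrossProfit")); // prints 12.0
    }
}
```

Keeping the formula in a rule object, separate from the column-to-term map, mirrors the separation between the mapping ontology and the local ontology described in the text.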
7. Conclusion
In this paper, we designed an information integration process for a BI system based on the concept of ontology. With the help of an ontology engine, the integration process is more maintainable than the currently common ETL process. The shared vocabularies make terms in the retail field reusable. With the integration model represented in OWL, the openness of the BI system is increased.
8. References
[1] H. P. Luhn, "A Business Intelligence System", IBM Journal, IBM, 1958-01-01.
[2] D. J. Power, "A Brief History of Decision Support Systems, version 4.0", DSSResources.COM, 2007-03-10.
[3] http://en.wikipedia.org/wiki/Business_Intelligence, Wikipedia, Wikimedia Foundation, Inc., 2008-10-07.
[4] http://en.wikipedia.org/wiki/Extract,_transform,_load, Wikipedia, Wikimedia Foundation, Inc., 2008.
[5] Yigal Arens, Chin Y. Chee, Chun-Nan Hsu, and Craig A. Knoblock, "Retrieving and integrating data from multiple information sources", International Journal of Intelligent and Cooperative Information Systems, 1993.
[6] Cheng Hian Goh, "Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Sources", Massachusetts Institute of Technology, 1997.
[7] http://en.wikipedia.org/wiki/Ontology_based_data_integration, Wikipedia, 2008-05-28.
[8] Tom Gruber, "A translation approach to portable ontology specifications", Knowledge Acquisition, 5(2), 199-220, 1993.
Figure 5. OWL-represented standard ontology
As discussed in section four, we need to build three kinds of ontologies to realize the system design. The first kind is the global ontology, which consists of vocabulary terms and the corresponding column names in the data warehouse. The second is the local ontology, which contains vocabulary terms and the corresponding column names in a heterogeneous data source. Through these shared terms, the data warehouse is linked to the heterogeneous data sources. Some vocabulary terms cannot be obtained directly from a heterogeneous data source, and a calculation must be performed before the next step. For instance, in the sales process, Gross Profit equals Sale Price minus Purchase Price.
[9] Wache H, Vogele T, Visser U, et al., "Ontology-based Integration of Information - A Survey of Existing Approaches", "Ontologies and Information Sharing", Proc. of IJCAI-01 Workshop, 2001.
[10] http://protege.stanford.edu/overview/, Stanford Center for Biomedical Informatics Research, 2008.
[11] Michael K. Smith, Chris Welty, Deborah L. McGuinness, http://www.w3.org/TR/2004/REC-owl-guide-20040210/, W3C, 2004-02-10.