
World Congress on Software Engineering

Application of Ontology-Based Information Integration on BI System


Gang Tong¹, Yaohua Sun¹, Jun Tang², Kesheng Qin¹
¹ College of Automation and Electrical Engineering, Qingdao University of Science and Technology, Qingdao, Shandong 266042, China
² Petrochina Northeast Refining Chemicals Engineering Company Limited, Shenyang, Liaoning 10016, China

Abstract
In this paper, to meet the demand for integrating heterogeneous data sources in a Business Intelligence system, we design an ontology-based information integration middleware. It uses semantic description to map heterogeneous data sources to the data warehouse and applies the Web Ontology Language (OWL) to represent the ontologies and the mapping rules. The middleware is designed and implemented to resolve the problems of schematic heterogeneity and maintainability in the traditional Extract, Transform, and Load (ETL) process. The key technologies applied in the system are discussed.

1. Introduction
A business intelligence system is commonly described as a decision support system. A trend in data integration has been to loosen the coupling between data. In this paper, based on the theory of ontology, we try to optimize the integration process and make it efficient, flexible, and reusable.

2. Business intelligence system

Business intelligence [1] (BI) refers to technologies, applications, and practices for the collection, integration, analysis, and presentation of business information, and sometimes to the information itself. The purpose of business intelligence, a term that dates at least to 1958, is to support better business decision making. Figure 1 presents the architecture of a common BI system. BI systems provide historical, current, and predictive views of business operations, most often using data that has been gathered into a data warehouse or a data mart, and occasionally working from operational data. Software elements support the use of this information by assisting in the extraction, analysis, and reporting of information. Applications tackle sales, production, financial, and many other sources of business data for purposes that include, notably, business performance management. Information may be gathered on comparable companies to produce benchmarks. [3]

In general, business intelligence systems are data-driven decision support systems (DSS). Gathering heterogeneous data is a fundamental process in a BI system; we call this process information integration. It is briefly introduced in the next section.

Figure 1. Business intelligence system (components shown: decision maker and staff, web interface, report, analysis, data mining, data warehouse, information integration (ETL), historical data, external data, current data)

978-0-7695-3570-8/09 $25.00 2009 IEEE DOI 10.1109/WCSE.2009.141

3. Information integration
Information integration is also called data integration. In a business intelligence system, ETL [4] is the information integration process. The so-called information society demands complete access to available information, which is often heterogeneous and distributed. To establish efficient information sharing, many technical problems have to be solved. First, a suitable information source that might contain the data needed for a given task must be located. Finding suitable information sources is a problem addressed in the areas of information retrieval and information filtering. Once the information source has been found, access to the data in it has to be provided. This means that each of the information sources found in the first step has to work together with the system that is querying the information. The problem of bringing together heterogeneous and distributed computer systems is known as the interoperability problem.

Interoperability has to be provided on a technical and on an informational level. In short, information sharing not only needs to provide full accessibility to the data, it also requires that the accessed data can be processed and interpreted by the remote system. The problems that arise from heterogeneity of the data are already well known within the distributed database systems community: structural heterogeneity (schematic heterogeneity) and semantic heterogeneity (data heterogeneity). Structural heterogeneity means that different information systems store their data in different structures. Semantic heterogeneity considers the contents of an information item and its intended meaning. [5] To achieve semantic interoperability in a heterogeneous information system, the meaning of the information that is interchanged has to be understood across the systems. Semantic conflicts occur whenever two contexts do not use the same interpretation of the information. Three main causes of semantic heterogeneity have been identified. [6]

The BI system, as discussed in the previous section, needs to integrate various heterogeneous information into the data warehouse. If we find a better way of modeling the ETL process, the integration will be more stable and reusable.

4. Ontology theory
Ontologies, as formal models of representation with explicitly defined concepts and named relationships linking them, are used to address the issue of semantic heterogeneity in data sources. In domains like bioinformatics and biomedicine, the rapid development, adoption, and public availability of ontologies has made it possible for the data integration community to leverage them for semantic integration of data and information. [7] Initially, ontologies were introduced as an explicit specification of a conceptualization. Therefore, ontologies can be used in an integration task to describe the semantics of the information sources and to make the contents explicit. With respect to the integration of data sources, they can be used for the identification and association of semantically corresponding information concepts. [8] In general, three approaches to employing ontologies can be identified: single ontology approaches, multiple ontology approaches, and hybrid approaches [9]. Figure 2 gives an overview of the three main architectures.

Figure 2. Three ontology approaches (panels: single ontology approach, with one global ontology; multiple ontology approach, with several local ontologies; hybrid ontology approach, with several local ontologies and a shared vocabulary)

The BI system we design in the next section is based on the hybrid ontology architecture.

5. Systems architecture design

Figure 3 shows the system we designed and how it works. It can be divided into three layers.

Figure 3. Ontology-based system architecture (layers: Heterogeneous Data Sources, Ontology-based BI Middleware, Applications; labels shown: SQL Server, Oracle, DB2, Wrapper layer: JDBC, Mapping rules, Local ontology, Global ontology, Data Warehouse, Ontology-based API, JDBC, MOLAP, Data Mining Engines, Agent Apps, Basic App, Advanced App)

5.1. Heterogeneous Data Source Layer


The heterogeneous data source layer consists of two parts. The first part is the heterogeneous databases, such as Oracle, DB2, MySQL, and SQL Server, which store differently structured information. The second part is the database wrapper. The raw data are stored in the heterogeneous databases in structures that we cannot use directly, and the databases expose different access interfaces, so we cannot treat them in the same way. The database wrapper solves this problem by encapsulating the different access interfaces behind a uniform method. That is, we send a general data access command (e.g. an SQL command) to the wrapper, and the wrapper returns uniform data from the heterogeneous databases. Because the system is built on the Java platform, the wrapper consists of JDBC (Java Database Connectivity) and some data access code that we wrote.
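The wrapper idea can be illustrated with a minimal Java sketch. It is not the authors' code: the names `SourceWrapper`, `JdbcWrapper`, `InMemoryWrapper`, and `toRows`, as well as the table and column names, are assumptions made for illustration. Only standard `java.sql` calls are used, and the in-memory stand-in exists only so that the sketch runs without a database.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class WrapperSketch {

    /** Uniform access method: every source returns rows as column-name -> value maps. */
    interface SourceWrapper {
        List<Map<String, Object>> query(String sql) throws SQLException;
    }

    /** JDBC-backed wrapper; the JDBC URL decides which vendor driver is used. */
    static class JdbcWrapper implements SourceWrapper {
        private final String jdbcUrl;

        JdbcWrapper(String jdbcUrl) {
            this.jdbcUrl = jdbcUrl;
        }

        @Override
        public List<Map<String, Object>> query(String sql) throws SQLException {
            try (Connection conn = DriverManager.getConnection(jdbcUrl);
                 Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(sql)) {
                return toRows(rs);
            }
        }
    }

    /** Converts any vendor's ResultSet into the same vendor-neutral row format. */
    static List<Map<String, Object>> toRows(ResultSet rs) throws SQLException {
        ResultSetMetaData meta = rs.getMetaData();
        List<Map<String, Object>> rows = new ArrayList<>();
        while (rs.next()) {
            Map<String, Object> row = new LinkedHashMap<>();
            for (int i = 1; i <= meta.getColumnCount(); i++) {
                row.put(meta.getColumnLabel(i), rs.getObject(i));
            }
            rows.add(row);
        }
        return rows;
    }

    /** Stand-in source (an assumption for this sketch) so the example runs without a database. */
    static class InMemoryWrapper implements SourceWrapper {
        private final List<Map<String, Object>> rows;

        InMemoryWrapper(List<Map<String, Object>> rows) {
            this.rows = rows;
        }

        @Override
        public List<Map<String, Object>> query(String sql) {
            return rows; // ignores the SQL text; a real wrapper would execute it
        }
    }

    public static void main(String[] args) throws SQLException {
        Map<String, Object> row = new LinkedHashMap<>();
        row.put("SALE_PRICE", 120.0);
        List<Map<String, Object>> data = new ArrayList<>();
        data.add(row);

        // Callers depend only on SourceWrapper, not on the database behind it.
        SourceWrapper source = new InMemoryWrapper(data);
        System.out.println(source.query("SELECT SALE_PRICE FROM SALES"));
    }
}
```

Because callers see only `SourceWrapper`, replacing the in-memory stand-in with a `JdbcWrapper` for a particular database would not change the calling code, which is the uniformity this layer aims for.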

5.2. Ontology-based middleware layer


The ontology-based middleware layer consists of the ontology engine and the data warehouse. To solve the semantic heterogeneity problem, the ontology engine is built on the hybrid ontology approach. It includes the standard ontologies, which are abstracted from the shared vocabulary library, the local ontologies, and the rules that join them together. The standard ontologies provide the user with all the information that the system is designed to integrate; they are organized by subject, such as the sales subject and the customer subject, and form the standard model of the core business logic. Each local ontology corresponds to one heterogeneous database. The ontology engine works as follows. First, the application layer requests data. Second, the ontology engine analyzes the request and finds the corresponding standard ontology and the rule that links the standard ontology to the local ontology. Third, it generates the source-specific SQL request according to the rule. Finally, it sends the request to the heterogeneous data source layer.
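The four steps above can be sketched in Java under simplifying assumptions. Here a mapping rule is reduced to a (source, table, column) triple, whereas the paper stores its rules in OWL. The class names, the term `SalePrice`, and the table and column names are hypothetical and are not taken from the authors' implementation.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OntologyEngineSketch {

    /** Hypothetical mapping rule: where one standard-ontology term lives in one data source. */
    static class MappingRule {
        final String source;
        final String table;
        final String column;

        MappingRule(String source, String table, String column) {
            this.source = source;
            this.table = table;
            this.column = column;
        }
    }

    // Standard-ontology term -> one rule per local ontology (data source) that holds it.
    private final Map<String, List<MappingRule>> rules = new LinkedHashMap<>();

    void addRule(String term, MappingRule rule) {
        rules.computeIfAbsent(term, k -> new ArrayList<>()).add(rule);
    }

    /** Steps 2 and 3: find the rules for the requested term and build one SQL string per source. */
    Map<String, String> generateRequests(String term) {
        List<MappingRule> matched = rules.get(term);
        if (matched == null) {
            throw new IllegalArgumentException("No mapping rule for term: " + term);
        }
        Map<String, String> sqlBySource = new LinkedHashMap<>();
        for (MappingRule rule : matched) {
            sqlBySource.put(rule.source,
                    "SELECT " + rule.column + " AS " + term + " FROM " + rule.table);
        }
        return sqlBySource;
    }

    public static void main(String[] args) {
        OntologyEngineSketch engine = new OntologyEngineSketch();
        engine.addRule("SalePrice", new MappingRule("oracle", "SALES", "SALE_PRICE"));
        engine.addRule("SalePrice", new MappingRule("sqlserver", "Orders", "UnitPrice"));

        // Step 1 is the application's request; step 4 would pass each SQL string to the wrapper.
        engine.generateRequests("SalePrice")
              .forEach((source, sql) -> System.out.println(source + ": " + sql));
    }
}
```

Aliasing each source column to the standard term means every source returns the value under the same name, so the caller never sees the local column names.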

5.3. Application layer


In this layer, applications are divided into two groups according to how they interact with the middleware layer. One group is the basic applications. Their typical feature is that their details are known to the system designer. OLAP and MOLAP are basic applications of this BI system; they already existed when the system was designed. They are designed to access the data warehouse directly, first because the structure of the data matches their interface perfectly, and second because direct access is more efficient. Most early-designed applications are basic applications.

The other group is the advanced applications. As the name suggests, the details of these applications are not determined at system design time, and they vary in complicated ways. For instance, some are third-party components, and some modules will be added in the future. The structure in which data are stored in the data warehouse differs from what the interfaces of advanced applications can handle, so this is also an interoperability problem. The system addresses it with the ontology-based API (Application Programming Interface), which can deliver data in variable forms by interacting with applications. The API first gets the data from the data warehouse in standard ontology form and then converts them into the form that the application accepts. If we establish the standard ontology as a standard and applications can handle the standard form, the standard ontology model will be generally acceptable.

One advantage is increased security, because applications do not access the data warehouse directly. Isolating applications from the data warehouse also increases the system's flexibility and degree of decoupling: the system can adapt to changes in the data warehouse by modifying the interface of the ontology-based API. Interoperability and openness will both increase quickly in future BI systems, and we will take advantage of the data warehouse systems of business partners. There is a trend toward publishing BI analysis results in XML form, which most systems can process. That is one reason why the ontology is represented in OWL, which has an XML-based syntax.
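The conversion performed by the ontology-based API described in this section can be sketched as a renaming step. The sketch assumes that "the form that the application accepts" amounts to a per-application map from standard terms to field names; the real API may do more. The method name `toApplicationForm` and the field names are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class OntologyApiSketch {

    /**
     * Converts one row in standard-ontology form into the form an application accepts.
     * fieldNames maps standard terms to the application's own field names (hypothetical profile).
     */
    static Map<String, Object> toApplicationForm(Map<String, Object> standardRow,
                                                 Map<String, String> fieldNames) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : standardRow.entrySet()) {
            String appField = fieldNames.get(entry.getKey());
            if (appField != null) {
                // Terms the application did not ask for are left out.
                result.put(appField, entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> standardRow = new LinkedHashMap<>();
        standardRow.put("SalePrice", 120.0);
        standardRow.put("PurchasePrice", 80.0);

        Map<String, String> reportToolFields = new LinkedHashMap<>();
        reportToolFields.put("SalePrice", "price");

        System.out.println(toApplicationForm(standardRow, reportToolFields));
    }
}
```

Under this assumption, a change in the data warehouse schema touches only the code that produces `standardRow`, while each application keeps its own field-name profile.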

Figure 4. Protégé

In the next section, we elaborate on the key module and illustrate it with an instance.

6. Key module analysis


The ontology used in this system was designed in Protégé [10], a tool for building ontologies visually. It also supports other commonly used ontology representation languages. We use OWL DL [11] to describe our ontology in this system.

As discussed in Section 4, we need to build three kinds of ontologies to realize the system design. The first kind is the global ontology, which consists of vocabulary terms and the corresponding column names in the data warehouse. The second is the local ontology, which contains vocabulary terms and the corresponding column names in a heterogeneous data source. Together they link the data warehouse to the heterogeneous data sources. Some vocabulary terms cannot be obtained directly from a heterogeneous data source, and a calculation must be performed first. For instance, in a sale, Gross Profit equals Sale Price minus Purchase Price. That is, the term Gross Profit depends on the terms Sale Price and Purchase Price, and a subtraction must be performed. In this system, we store these calculations in a mapping ontology, which maps from one vocabulary term to another.

We list the OWL file that represents the ontology used in the business system we designed. This file contains the standard ontology for the sales theme. It was generated by Protégé version 3.3.1. The ontology properties are connected to the shared vocabulary with the OWL tag <owl:equivalentProperty />. Local ontologies can be built in the same way to realize the system design.

Figure 5. OWL-represented standard ontology

7. Conclusion
In this paper, we designed an information integration process for a BI system based on the concept of ontology. With the help of the ontology engine, the integration process has better maintainability than the currently common ETL process. The shared vocabularies make terms in the retail field reusable. Representing the integration model in OWL increases the openness of the BI system.

8. References
[1] H. P. Luhn, "A Business Intelligence System", IBM Journal, IBM, 1958-01-01.
[2] D. J. Power, "A Brief History of Decision Support Systems, version 4.0", DSSResources.COM, 2007-03-10.
[3] http://en.wikipedia.org/wiki/Business_Intelligence, Wikipedia, Wikimedia Foundation, Inc., 2008-10-07.
[4] http://en.wikipedia.org/wiki/Extract,_transform,_load, Wikipedia, Wikimedia Foundation, Inc., 2008.
[5] Yigal Arens, Chin Y. Chee, Chun-Nan Hsu, and Craig A. Knoblock, "Retrieving and integrating data from multiple information sources", International Journal of Intelligent and Cooperative Information Systems, 1993.
[6] Cheng Hian Goh, "Representing and Reasoning about Semantic Conflicts in Heterogeneous Information Sources", Massachusetts Institute of Technology, 1997.
[7] http://en.wikipedia.org/wiki/Ontology_based_data_integration, Wikipedia, 2008-05-28.
[8] Tom Gruber, "A translation approach to portable ontology specifications", Knowledge Acquisition, 5(2), 199-220, 1993.


[9] Wache H, Vogele T, Visser U, et al., "Ontology-based Integration of Information - A Survey of Existing Approaches", "Ontologies and Information Sharing", Proc. of IJCAI-01 Workshop, 2001.
[10] http://protege.stanford.edu/overview/, Stanford Center for Biomedical Informatics Research, 2008.
[11] Michael K. Smith, Chris Welty, Deborah L. McGuinness, http://www.w3.org/TR/2004/REC-owl-guide-20040210/, W3C, 2004-02-10.

