
ABSTRACT:

Data warehousing supports business analysis and decision making by creating an enterprise-wide, integrated database of summarized, historical information. It integrates data from multiple incompatible sources. By transforming data into meaningful information, a data warehouse allows business managers to perform more substantive, accurate and consistent analysis. Data mining techniques can be implemented rapidly on existing software and hardware platforms to enhance the value of existing information resources, and they can be integrated with new products and systems as these are brought online. When implemented on high-performance client/server or parallel processing computers, data mining tools can analyze massive databases and support querying them effectively.

A data warehouse is, of course, a database, but one that contains summarized information. Integrating data mining with a data warehouse yields benefits such as better query processing, performance sharing and more reliable information. The following sections present the concepts of data warehousing and data mining.

INTRODUCTION:
Recent developments in technology have put modern organizations under enormous pressure, and they clearly need rapid access to all kinds of information. To meet this need, an organization must look at its past and identify relevant trends, and any such trend analysis requires a database. In most organizations you will find very large databases in operation for normal daily transactions. These are known as operational databases; in most cases they have not been designed to store historical data or to answer analytical queries, but simply to support the applications that handle day-to-day transactions.

The second type of database found in organizations is the data warehouse. It is designed for strategic decision support and is largely built from the databases that make up the operational database. The basic characteristic of a data warehouse is that it contains a vast amount of data, which can mean billions of records. Smaller, local data warehouses are called data marts. Because a data warehouse is designed especially for decision-support queries, only the data needed for decision support is extracted from the operational databases and stored in the warehouse, together with the time at which it was retrieved.
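The last point, that only decision-support data is extracted and stored with its retrieval time, can be illustrated with a minimal sketch. The record types, field names and sample values below are hypothetical; the sketch only shows selective extraction, a load timestamp, and an append-only target list standing in for the warehouse.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, List, Optional


@dataclass
class OrderRecord:
    """Hypothetical operational row: everything the order-entry application needs."""
    order_id: int
    customer_id: int
    product_id: int
    quantity: int
    unit_price: float
    shipping_address: str  # needed for delivery, not for decision support


@dataclass(frozen=True)
class WarehouseRow:
    """Hypothetical warehouse row: decision-support fields plus the extraction time."""
    customer_id: int
    product_id: int
    revenue: float
    extracted_at: datetime


def extract_for_warehouse(orders: Iterable[OrderRecord],
                          now: Optional[datetime] = None) -> List[WarehouseRow]:
    """Keep only the fields needed for analysis and stamp every row with the load time."""
    stamp = now or datetime.now(timezone.utc)
    return [
        WarehouseRow(o.customer_id, o.product_id, o.quantity * o.unit_price, stamp)
        for o in orders
    ]


# The warehouse is only ever appended to; rows are frozen and never updated.
warehouse: List[WarehouseRow] = []
orders = [
    OrderRecord(1, customer_id=10, product_id=100, quantity=2, unit_price=5.0,
                shipping_address="12 Main St"),
    OrderRecord(2, customer_id=11, product_id=100, quantity=1, unit_price=5.0,
                shipping_address="3 Oak Ave"),
]
warehouse.extend(extract_for_warehouse(orders))
```

Making `WarehouseRow` frozen mirrors the idea that warehouse data is queried rather than edited; corrections would arrive as new, newly stamped rows.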

DATAWAREHOUSING:

Need for a data warehouse:

To summarise large volumes of data.
To integrate data from different sources.
To let decision makers access past data.
To enable people to make informed decisions.

[Figure: star schema, a central fact table holding the Dim1, Dim2 and Dim3 keys and a summary measure, linked to the Dim1, Dim2 and Dim3 tables and their attributes]
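The diagram accompanying this list shows a fact table that carries one key per dimension (Dim1, Dim2, Dim3) plus a summary measure. A minimal sketch of how that layout summarizes large volumes of data, using Python's built-in `sqlite3` module; the table names, columns and sample rows are hypothetical stand-ins for Dim1 to Dim3.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables (hypothetical names): one row per member, descriptive attributes.
cur.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, category TEXT);
CREATE TABLE dim_store   (store_key   INTEGER PRIMARY KEY, region   TEXT);
CREATE TABLE dim_time    (time_key    INTEGER PRIMARY KEY, year     INTEGER);
-- Fact table: one foreign key per dimension plus a numeric measure.
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    store_key   INTEGER REFERENCES dim_store(store_key),
    time_key    INTEGER REFERENCES dim_time(time_key),
    amount      REAL
);
""")
cur.executemany("INSERT INTO dim_product VALUES (?, ?)", [(1, "Books"), (2, "Toys")])
cur.executemany("INSERT INTO dim_store VALUES (?, ?)", [(1, "North"), (2, "South")])
cur.executemany("INSERT INTO dim_time VALUES (?, ?)", [(1, 2022), (2, 2023)])
cur.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", [
    (1, 1, 1, 100.0), (1, 2, 2, 50.0), (2, 1, 2, 30.0), (2, 2, 2, 20.0),
])

# Summarize the detailed facts by attributes taken from two dimension tables.
rows = cur.execute("""
SELECT p.category, t.year, SUM(f.amount) AS total
FROM fact_sales f
JOIN dim_product p ON f.product_key = p.product_key
JOIN dim_time    t ON f.time_key    = t.time_key
GROUP BY p.category, t.year
ORDER BY p.category, t.year
""").fetchall()
```

Each summary row needs only one join per dimension used, which is why this layout keeps decision-support queries simple.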

FEATURES:

1. Time dependent: the warehouse contains information collected over time, so there must always be a link between each piece of information in the warehouse and the time at which it was entered.

2. Non-volatile (permanent): data in the warehouse is never updated; it is used only for queries. End users who want to update data must use the operational database. As a result, the data warehouse is always filled with historical data.

3. Subject oriented: the warehouse is built around the subjects covered by all the existing applications of the operational data. It is designed specifically for decision support, whereas the operational databases hold information for day-to-day use.

4. Integrated: data coming from different sources must be integrated and made consistent in the warehouse; only one name may exist to describe each individual entity.

DECISION SUPPORT SYSTEM:

When designing a decision support system, particular importance should be placed on the requirements of the end users and on the hardware and software products that will be required.

The requirements of the end users:

Some end users need specific query tools so that they can build their queries themselves. Others are interested only in a particular part of the information; for them, a specific type of application can be built to speed up the query process.

H/w and S/w products of a decision support system:

Working in a client/server environment allows great flexibility in choosing the appropriate software for end users, because each individual need can be catered for on a local workstation. The hardware requirements depend on the type of data warehouse and the techniques with which you want to work. There are two basic types of data warehouses:

Enterprise data warehouses: the enterprise data warehouse contains corporate-wide information integrated from multiple operational data sources for consolidated data analysis. Typically it is composed of several subject areas, such as customers, products and sales, and is used for both tactical and strategic decision making.

Data marts: a data mart contains a subset of corporate-wide data that is built for use by an individual department or division of an organization. Unlike the enterprise data warehouse, data marts are often built from the bottom up by departmental resources for a specific decision-support application or group of users. Data marts contain summarized and often detailed data about a subject area.

DATAWAREHOUSE SCHEMAS:

A multidimensional data model identifies the dimensions, their hierarchies, the measure functions and so on for the design of a data cube; the data cube itself is realized in the design phase. Various schemas are employed:

1. Star schema: a modeling paradigm in which the data warehouse contains a single large fact table and a set of smaller dimension tables, one for each dimension.

[Figure: star schema, a fact table linked to the Dimension1, Dimension2 and Dimension3 tables]

Fact table:
It contains detailed summary data. Each tuple contains a foreign key to each dimension table and therefore corresponds to exactly one tuple in each dimension table.

Dimension table:
It consists of columns that correspond to the attributes of the dimension. One tuple in a dimension table may correspond to more than one tuple in the fact table.

A 1:N relationship therefore exists between each dimension table and the fact table.

The star schema is easy to understand and makes it easy to define hierarchies. It reduces the number of physical joins and is easy to maintain.

2. Snowflake schema: it consists of a single fact table and multiple dimension tables. The difference between the star schema and the snowflake schema is that in the star schema the dimension tables are denormalized, whereas in the snowflake schema they are normalized. As a result, the snowflake schema is easier to maintain and saves storage space.

Microsoft Data Warehousing Framework:

The goal of the data warehousing framework is to simplify the design, implementation and management of data warehousing solutions. The framework describes the relationships between the various components used in the process of building, using and managing a data warehouse.

The core of the Microsoft framework is a set of enabling technologies comprising the data transport layer and an integrated data repository. Operational data must pass through a cleaning and transformation stage before being placed into the data marts or data warehouse, in order to conform to the decisions laid out during the design stage.
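The cleaning and transformation stage can be pictured with a small sketch. It assumes two hypothetical source systems ("billing" and "web") that name the same attributes differently and store amounts in different units; the field maps and code table are invented for illustration. Renaming every field to one warehouse name is the "only one name per entity" rule from the Integrated feature.

```python
from typing import Any, Dict, List, Optional

# Hypothetical field mappings: each source system names the same attributes differently.
FIELD_MAP: Dict[str, Dict[str, str]] = {
    "billing": {"cust_no": "customer_id", "sex": "gender", "amt_cents": "amount"},
    "web": {"customerId": "customer_id", "gender": "gender", "amount": "amount"},
}
# Hypothetical code table: every spelling maps to one agreed warehouse value.
GENDER_CODES = {"m": "M", "male": "M", "f": "F", "female": "F"}


def transform(source: str, record: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Rename fields, standardize values and units; return None for unusable rows."""
    mapping = FIELD_MAP[source]
    row = {mapping[key]: value for key, value in record.items() if key in mapping}
    gender = GENDER_CODES.get(str(row.get("gender", "")).strip().lower())
    if row.get("customer_id") is None or gender is None:
        return None  # cleaning: reject records that cannot be made consistent
    amount = float(row.get("amount", 0))
    if source == "billing":
        amount /= 100  # billing stores cents; the warehouse stores currency units
    return {"customer_id": int(row["customer_id"]), "gender": gender, "amount": amount}


def clean_and_transform(batches: Dict[str, List[Dict[str, Any]]]) -> List[Dict[str, Any]]:
    """Apply transform() to every record from every source, keeping only valid rows."""
    rows = (transform(source, rec) for source, recs in batches.items() for rec in recs)
    return [row for row in rows if row is not None]


staged = clean_and_transform({
    "billing": [{"cust_no": "7", "sex": "male", "amt_cents": 1250}],
    "web": [
        {"customerId": 8, "gender": "F", "amount": 3.5},
        {"customerId": None, "gender": "F", "amount": 1.0},  # missing key: rejected
    ],
})
```

Rejected rows are simply dropped here; a production pipeline would more likely log them for review, but that is outside what this sketch assumes.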

End-user tools, including desktop productivity products, specialized analysis products and custom programs, are used to gain access to the information in the data warehouse. Ideally, user access is through a directory facility that enables users to search for appropriate and relevant data to resolve business questions, and that provides a layer of security between the users and the back-end systems. Finally, a variety of tools come into play for managing the data warehouse environment, such as scheduling repeated tasks and managing a multiserver network.

The Microsoft repository provides the integration point for the metadata shared by the various tools used in the data warehousing process. Shared metadata allows multiple tools from a variety of vendors to be integrated transparently, without the need for specialized interfaces between each of the products.

DATAMINING:

Data mining, or knowledge discovery in databases, is the nontrivial extraction of implicit, previously unknown and potentially useful information from data. It is the search for relationships and global patterns that exist in large databases but are hidden among vast amounts of data.

WORKING PROCEDURE:

Data mining software analyzes relationships and patterns in stored transaction data based on open-ended user queries. Generally, four types of relationships are sought:

Classes: stored data is used to locate data in predetermined groups.
Clusters: data items are grouped according to logical relationships or consumer preferences.
Associations: data can be mined to identify associations.
Sequential patterns: data is mined to anticipate behaviour patterns and trends.

Major steps:

1. Extract, transform and load transaction data onto the data warehouse system.
2. Store and manage the data in a multidimensional database system.
3. Provide data access to business analysts and information technology professionals.
4. Analyze the data with application software.
5. Present the data in a useful form, such as a graph or table.

Techniques in DataMining:

1. Artificial neural networks: non-linear predictive models that learn through training and resemble biological neural networks in structure.
2. Decision trees: tree-shaped structures that represent sets of decisions. These decisions generate rules for the classification of a dataset.
3. Genetic algorithms: optimization techniques that use processes such as genetic combination, mutation and natural selection in a design based on the concepts of evolution.
4. Rule induction: the extraction of useful if-then rules from data based on statistical significance.

Datawarehouse with data mining:

Data mining: as is well known, in mining, enormous quantities of debris have to be removed before diamonds or gold can be found. The idea that, with a computer, you can automatically find the one 'information-diamond' among the tons of data-debris in your database is, of course, very attractive.
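As a toy illustration of finding such a "diamond" automatically, the sketch below mines the associations relationship for item pairs only: it counts how often two items appear together (support) and how often the second follows from the first (confidence). This is a simplified pair-counting pass in the spirit of association-rule mining, not a full algorithm such as Apriori; the baskets and thresholds are made up.

```python
from collections import Counter
from itertools import combinations
from typing import List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (if item, then item, support, confidence)


def pair_rules(transactions: List[Set[str]], min_support: float,
               min_confidence: float) -> List[Rule]:
    """Return rules 'lhs -> rhs' between single items that clear both thresholds."""
    n = len(transactions)
    item_counts = Counter(item for t in transactions for item in t)
    # Sorting each basket makes every pair appear in one canonical order.
    pair_counts = Counter(
        pair for t in transactions for pair in combinations(sorted(t), 2)
    )
    rules: List[Rule] = []
    for (a, b), together in pair_counts.items():
        support = together / n  # share of all baskets containing both items
        if support < min_support:
            continue
        for lhs, rhs in ((a, b), (b, a)):
            confidence = together / item_counts[lhs]  # P(rhs | lhs)
            if confidence >= min_confidence:
                rules.append((lhs, rhs, support, confidence))
    return sorted(rules)


baskets = [
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"bread", "jam"},
    {"milk", "jam"},
]
rules = pair_rules(baskets, min_support=0.5, min_confidence=0.6)
```

On these four baskets only bread and butter occur together often enough, giving "butter -> bread" with confidence 1.0 and "bread -> butter" with confidence 2/3.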

Integrating data mining into a decision support system is very helpful. The sole function of a data warehouse is to supply the information needed to make adequate decisions. In some cases you can use standard SQL tools for decision support, but if you want to compare millions of records without knowing exactly what type of information you require, or if you want to find hidden information, you have to turn to data mining. In many cases you will find that you need a separate computer for data mining; trying to mine operational data is almost impossible, because there are different applications with different types of attributes and different data types, but no historical data. With a data warehouse this problem does not exist: all the information has been transferred from the operational databases to the data warehouse, and in many cases you can clean the data before commencing data mining.

[Figure: the relationship between operational data, a data warehouse and data marts (labels: Operational data, Extracts from several databases, Warehouse, Datamarts)]

Client/Server and data warehousing:

Over the past few years it has proved very difficult to build effective decision support systems, because the available techniques could not support end users satisfactorily. End users would ideally like to have all kinds of techniques available, such as GUIs, statistical techniques, windowing mechanisms and visualization techniques, so that they can easily access the data they are seeking. This requires a great deal of local computing power at each workstation, and the client/server technique is the solution to this problem.

Client/server involves dispersing the software over several computers while creating an environment in which each end user appears to be working on just one system. The heavy load of the GUI and other visual techniques can be processed on the local machines, while all database tasks are handled by a dedicated database server. In this way the database server can be completely optimized for the database. In some cases you can buy special databases that operate with a specific type of hardware. With client/server you only have to change the piece of software that relates to the end user; the other applications do not require alteration. Of all the techniques currently available on the market, client/server represents the best choice for building a data warehouse.
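The division of labour described above can be sketched in a single process. Assume, purely for illustration, that a hypothetical `DatabaseServer` class plays the server role and two plain functions play the client role; in a real deployment a network protocol would sit between them. Swapping one client for the other needs no change on the server side.

```python
import sqlite3
from typing import List, Sequence, Tuple


class DatabaseServer:
    """Server role (hypothetical): owns the data and does all query processing."""

    def __init__(self) -> None:
        self._conn = sqlite3.connect(":memory:")
        self._conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
        self._conn.executemany(
            "INSERT INTO sales VALUES (?, ?)",
            [("North", 120.0), ("South", 80.0), ("North", 30.0)],
        )

    def totals_by_region(self) -> List[Tuple[str, float]]:
        return self._conn.execute(
            "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
        ).fetchall()


def text_table_client(rows: Sequence[Tuple[str, float]]) -> str:
    """Client role: presentation only; it never touches the database directly."""
    lines = [f"{'Region':<8}{'Total':>8}"]
    lines += [f"{region:<8}{total:>8.1f}" for region, total in rows]
    return "\n".join(lines)


def csv_client(rows: Sequence[Tuple[str, float]]) -> str:
    """A replacement client: swapping it in requires no change to the server."""
    return "\n".join(f"{region},{total}" for region, total in rows)


server = DatabaseServer()
result = server.totals_by_region()
report = text_table_client(result)
export = csv_client(result)
```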

APPLICATIONS:

Datawarehousing:
a. Sales and marketing analysis across many industries.
b. Inventory turn and product tracking in manufacturing.
c. Profitable lane or driver risk analysis in transportation.
d. Claims analysis or fraud detection in insurance.

DataMining:
Retail/Marketing: identifying customers' buying patterns.
Banking: detecting patterns of fraudulent credit card use.
Healthcare:
1. Identifying the behaviour of risky customers.
2. Identifying successful medical therapies for different illnesses.

Conclusion: Getting the right information to the right people at the right time is the key to making the right decisions. The data warehouse makes this possible, and it is the path that leads to effective data mining.


A Paper Presentation on

By:

Ch.Kameswara Rao III / IV B.Tech (IT) kamesh.chakka1233@gmail.com

M.Bharath Bhushan III / IV B.Tech (C.S.E) bharathbhushan.m@in.com
