
TEAM

Bharat Jain 26
Ankita Golchha 27
Shilpa Kasani 28
Vijay Kumar 29
Tasneem Taj 30
Rachana Kola 31

Data warehousing
A data warehouse is a database used for reporting & analysis.

It focuses on data storage.
Essential components of a data warehouse system.
A data warehouse can be subdivided into data marts.

Characteristics of data warehouse

Conceptual view.

Unlimited dimensions.

Dynamic sparse matrix handling.

Client / server architecture.

Accessibility & transparency.

OLTP & OLAP


OLTP: Online Transaction Processing. It is characterized by a large number of short online transactions (Insert, Update, Delete).

OLAP: Online Analytical Processing. It is characterized by a relatively low volume of transactions.

OLTP v/s OLAP


                      | OLTP (Operational system)                   | OLAP (Data warehouse system)
Sources of data       | Operational data.                           | Consolidation data.
Purposes of data      | Control and run fundamental business tasks. | Planning, problem solving & decision making.
What the data reveals | Snapshot.                                   | Multidimensional view.
Inserts & updates     | Short and fast.                             | Periodic, long-running.
Queries               | Simple queries.                             | Complex queries.
Processing speed      | Very fast.                                  | Depends on the amount of data involved.
Space requirements    | Relatively small.                           | Large due to aggregation and history data.
Database design       | Highly normalized.                          | Denormalized.
Backup & recovery     | Regular backup is essential.                | Reloading OLTP data as a recovery method.
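
To make the contrast concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `orders` table and its rows are hypothetical and exist only for illustration; a real OLTP system and a real warehouse would be separate databases.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL, order_date TEXT)"
)

# OLTP-style work: many short transactions, each touching a single row.
with conn:
    conn.execute("INSERT INTO orders VALUES (1, 'alice', 120.0, '2024-01-05')")
with conn:
    conn.execute("INSERT INTO orders VALUES (2, 'bob', 80.0, '2024-01-17')")
with conn:
    conn.execute("INSERT INTO orders VALUES (3, 'alice', 50.0, '2024-02-02')")
with conn:
    conn.execute("UPDATE orders SET amount = 90.0 WHERE order_id = 2")

# OLAP-style work: one read-only query that scans and aggregates history.
monthly_totals = conn.execute(
    "SELECT substr(order_date, 1, 7) AS month, SUM(amount) "
    "FROM orders GROUP BY month ORDER BY month"
).fetchall()
print(monthly_totals)
```

The first half issues several tiny write transactions, while the second half is a single aggregate query over all stored rows, which is the pattern the OLAP column of the comparison describes.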

ARCHITECTURE
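Figure (diagram not reproduced): operational systems and external data sources pass through extract, clean, transform, load, and refresh steps into the data warehouse. The warehouse, together with a metadata repository, serves reports, OLAP, and data mining.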

COMPONENTS
3 main systems required:
Source systems
Data staging area
Presentation servers

Operational data:
Internal data
External data

Load manager:
Simple transformation of data to prepare the data for entry into the warehouse.

Warehouse manager:
Analysis of data.
Transformation & merging of data.
Backing up & archiving of data.

Detailed, summarized, and archived data.

Meta data:
Meta data means data about data.
Extraction & loading process.
Warehouse management process.
Query management process.

End user access tools.

ETL PROCESS
Extract

Cleansing

Transform

Loading
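
The four steps named on this slide can be sketched as a small pipeline. This is only an illustration under stated assumptions: the source rows, the cleansing rules, and the `sales` target table are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical raw rows, standing in for an operational source system.
SOURCE_ROWS = [
    {"id": "1", "customer": " Alice ", "amount": "120.50"},
    {"id": "2", "customer": "BOB", "amount": "n/a"},    # bad amount
    {"id": "3", "customer": "", "amount": "15.00"},      # missing customer
    {"id": "4", "customer": "carol", "amount": "30"},
]


def extract():
    """Extract: read raw records from the source."""
    return list(SOURCE_ROWS)


def cleanse(rows):
    """Cleanse: drop rows with a missing customer or a non-numeric amount."""
    clean = []
    for row in rows:
        if not row["customer"].strip():
            continue
        try:
            float(row["amount"])
        except ValueError:
            continue
        clean.append(row)
    return clean


def transform(rows):
    """Transform: convert types and normalize names for the warehouse."""
    return [
        (int(r["id"]), r["customer"].strip().title(), float(r["amount"]))
        for r in rows
    ]


def load(conn, records):
    """Load: write the prepared records into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    with conn:
        conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)


warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(cleanse(extract())))
loaded = warehouse.execute(
    "SELECT id, customer, amount FROM sales ORDER BY id"
).fetchall()
print(loaded)
```

Only rows 1 and 4 reach the warehouse; the other two are rejected during cleansing, before any transformation is attempted.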

IMPORTANT TERMS
Drill down

Roll up

Aggregation

Granularity
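
The slide lists these terms without definitions. Under their usual OLAP meanings (granularity is the level of detail at which facts are stored, aggregation combines detail rows into a summary, roll up moves to a coarser level, drill down moves to a finer level), they can be sketched as follows. The sales rows and the `aggregate` helper are hypothetical.

```python
from collections import defaultdict

# Hypothetical sales facts stored at day granularity: (year, month, day, amount).
SALES = [
    (2024, 1, 5, 100.0),
    (2024, 1, 20, 50.0),
    (2024, 2, 3, 70.0),
    (2025, 1, 9, 30.0),
]


def aggregate(rows, level):
    """Sum amounts, keeping the first `level` date fields (1=year, 2=month, 3=day)."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[:level]] += row[-1]
    return dict(totals)


# Aggregation at month level.
by_month = aggregate(SALES, 2)

# Roll up: move from month level to the coarser year level.
by_year = aggregate(SALES, 1)

# Drill down: move from January 2024 to the finer day level.
jan_2024_days = {
    key: amount
    for key, amount in aggregate(SALES, 3).items()
    if key[:2] == (2024, 1)
}

print(by_month, by_year, jan_2024_days)
```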

DATA WAREHOUSING SCHEMA MODELS


A schema is a collection of database objects, including tables, views, indexes, and synonyms.

There are many schema models designed for data warehousing, but the most commonly used are:

1. Star schema
2. Snowflake schema
3. Fact constellation schema

DIMENSIONAL DATA MODEL


The dimensional data model is most often used in data warehousing systems.
The objective of dimensional modeling is to represent a set of business measurements in a standard framework that is easily understandable by end users.

The main components of a dimensional model are fact tables and dimension tables.
A fact table is a table that contains the measures of interest.
A dimension is a structure, usually composed of one or more hierarchies, that categorizes data.

Example of dimensional model

STAR SCHEMA

The star schema is also called star-join schema, data cube, or multi-dimensional schema.

It is the simplest style of data warehouse schema.

The star schema consists of one or more fact tables referencing any number of dimension tables.

A star schema classifies the attributes of an event into facts (measured numeric/time data) and descriptive dimension attributes (product id, customer name, sale date) that give the facts a context.
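
As a rough illustration of the description above, the following sketch builds one fact table and three dimension tables in an in-memory SQLite database and runs a typical star query. All table names, column names, and rows are assumptions made for this example, not part of the original slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: descriptive attributes that give the facts a context.
CREATE TABLE dim_product  (product_id INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE dim_customer (customer_id INTEGER PRIMARY KEY, customer_name TEXT);
CREATE TABLE dim_date     (date_id INTEGER PRIMARY KEY, sale_date TEXT, year INTEGER);

-- Fact table: numeric measures plus one foreign key per dimension.
CREATE TABLE fact_sales (
    product_id  INTEGER REFERENCES dim_product(product_id),
    customer_id INTEGER REFERENCES dim_customer(customer_id),
    date_id     INTEGER REFERENCES dim_date(date_id),
    quantity    INTEGER,
    revenue     REAL
);

INSERT INTO dim_product  VALUES (1, 'Pen', 'Stationery'), (2, 'Lamp', 'Home');
INSERT INTO dim_customer VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO dim_date     VALUES (1, '2024-01-05', 2024), (2, '2025-03-10', 2025);
INSERT INTO fact_sales   VALUES (1, 1, 1, 10, 20.0), (2, 1, 1, 1, 35.0), (1, 2, 2, 5, 10.0);
""")

# Star query: join the fact table to the dimensions it needs, then aggregate.
revenue_by_category_year = conn.execute("""
    SELECT p.category, d.year, SUM(f.revenue)
    FROM fact_sales f
    JOIN dim_product p ON p.product_id = f.product_id
    JOIN dim_date d ON d.date_id = f.date_id
    GROUP BY p.category, d.year
    ORDER BY p.category, d.year
""").fetchall()
print(revenue_by_category_year)
```

Every join goes directly from the fact table to a dimension table, so the query never needs more than one hop to reach a descriptive attribute.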

SAMPLE STAR SCHEMA

Advantages of star schema


The main advantages of star schemas are that they:
Provide a direct and intuitive mapping between the business entities being analyzed by end users and the schema design.
Provide highly optimized performance for typical star queries.
Are widely supported by a large number of business intelligence tools, which may anticipate or even require that the data-warehouse schema contain dimension tables.

SNOWFLAKE SCHEMA


Extension of the star schema.

Each point of the star explodes into more points.

Each dimension table is normalized into multiple lookup tables, each representing a level in the dimensional hierarchy.

SAMPLE SNOWFLAKE SCHEMA

Consider a Time dimension that consists of 2 different hierarchies:
Year, Month, Day
Week, Day

We will have 4 lookup tables:
For year
For month
For week
For day

Year is connected to Month, which is then connected to Day. Week is only connected to Day.
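
The four lookup tables can be sketched in SQLite as below. Table names, column names, and the sample dates are hypothetical; the foreign keys follow the connections described above, with a day row pointing at its month and its week, and a month row pointing at its year.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE lu_year  (year_id  INTEGER PRIMARY KEY, year INTEGER);
CREATE TABLE lu_month (month_id INTEGER PRIMARY KEY, month INTEGER,
                       year_id  INTEGER REFERENCES lu_year(year_id));
CREATE TABLE lu_week  (week_id  INTEGER PRIMARY KEY, iso_week TEXT);
CREATE TABLE lu_day   (day_id   INTEGER PRIMARY KEY, day_date TEXT,
                       month_id INTEGER REFERENCES lu_month(month_id),
                       week_id  INTEGER REFERENCES lu_week(week_id));

INSERT INTO lu_year  VALUES (1, 2024);
INSERT INTO lu_month VALUES (1, 1, 1), (2, 2, 1);
INSERT INTO lu_week  VALUES (1, '2024-W05');
-- Both days fall in the same week but in different months.
INSERT INTO lu_day   VALUES (1, '2024-01-31', 1, 1), (2, '2024-02-01', 2, 1);
""")

# Walk both hierarchies starting from the day level.
days = conn.execute("""
    SELECT d.day_date, m.month, y.year, w.iso_week
    FROM lu_day d
    JOIN lu_month m ON m.month_id = d.month_id
    JOIN lu_year  y ON y.year_id  = m.year_id
    JOIN lu_week  w ON w.week_id  = d.week_id
    ORDER BY d.day_date
""").fetchall()
print(days)
```

The sample rows show one reason Week connects only to Day: a single week can span two months, so it cannot sit inside the Year, Month, Day chain.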

FACT CONSTELLATION SCHEMA



Shaped like a constellation of stars.
More complex than the star or snowflake schema.
From each star schema it is possible to construct a fact constellation schema by splitting the original star schema into more star schemas, each of which describes facts at another level of the dimension hierarchies.
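
One way to picture this splitting is a daily sales fact table and a monthly sales fact table that share a product dimension. The sketch below is an assumption-laden illustration in SQLite: every table name, column name, and row is hypothetical, and deriving the monthly facts from the daily ones is just one possible way to populate the second star.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, product_name TEXT);
CREATE TABLE dim_month   (month_id INTEGER PRIMARY KEY, month TEXT);
CREATE TABLE dim_day     (day_id INTEGER PRIMARY KEY, day_date TEXT,
                          month_id INTEGER REFERENCES dim_month(month_id));

-- Fact table 1: sales at day level.
CREATE TABLE fact_sales_daily (
    product_id INTEGER REFERENCES dim_product(product_id),
    day_id     INTEGER REFERENCES dim_day(day_id),
    revenue    REAL
);

-- Fact table 2: the same measure at month level, sharing dim_product.
CREATE TABLE fact_sales_monthly (
    product_id INTEGER REFERENCES dim_product(product_id),
    month_id   INTEGER REFERENCES dim_month(month_id),
    revenue    REAL
);

INSERT INTO dim_product VALUES (1, 'Pen'), (2, 'Lamp');
INSERT INTO dim_month   VALUES (1, '2024-01'), (2, '2024-02');
INSERT INTO dim_day     VALUES (1, '2024-01-05', 1), (2, '2024-01-20', 1), (3, '2024-02-03', 2);
INSERT INTO fact_sales_daily VALUES (1, 1, 20.0), (1, 2, 5.0), (2, 2, 35.0), (1, 3, 10.0);

-- Populate the month-level facts from the day-level facts.
INSERT INTO fact_sales_monthly
    SELECT f.product_id, d.month_id, SUM(f.revenue)
    FROM fact_sales_daily f
    JOIN dim_day d ON d.day_id = f.day_id
    GROUP BY f.product_id, d.month_id;
""")

monthly = conn.execute(
    "SELECT product_id, month_id, revenue FROM fact_sales_monthly "
    "ORDER BY product_id, month_id"
).fetchall()
print(monthly)
```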

TYPES OF DATA WAREHOUSING APPLICATIONS


Personal productivity.

Query and reporting.


Planning and analysis.

PERSONAL PRODUCTIVITY APPLICATIONS


Useful for manipulating and presenting data on individual PCs.
Developed for a standalone environment.
Address applications requiring only small volumes of warehouse data.

DATA QUERY & REPORTING


Data access through simple, list-oriented queries, and the generation of basic reports.

Provide a view of historical data.

Do not address the enterprise need for in-depth analysis and planning.

PLANNING & ANALYSIS


Address the essential business requirements for in-depth analysis and planning.

Referred to as on-line analytical processing (OLAP) applications.
This approach mandates that the organization look not only at past performance but, more importantly, at the future performance of the business.
The combined analysis of historical data with future projections is critical to the success of today's corporation.

Advantages of data warehousing


Provides business users a customer-centric view of the company's heterogeneous data.
Adds value for the company's customers through better access to information.
Historical information.

Enhanced data quality.

Supplements disaster recovery plans.

One stop shop.

Provides savings in billing processes, reduces fraud losses, etc.

Disadvantages of data warehousing


Not optimal for unstructured data.
Data warehouses get outdated relatively quickly.
Duplicate functionality between data warehouses & operational systems.
Extremely expensive.
Costs:
Time spent in careful analysis.
Design & implementation.
Hardware costs.
Software costs.
Ongoing support & maintenance.

Conclusion
Data warehousing is necessary to analyze the business needs, integrate data from several sources, and model the data in an appropriate manner to present the business information in the form of dashboards and reports.
