The Evolution Of Data Warehousing
By Ali Gardezi, Prashanth Janardanan and Aaron Sheffield
Competitive Advantage
– The large returns on investment reported by companies that
have successfully implemented a data warehouse
are evidence of the significant competitive
advantage that accompanies this technology.
– Competitive advantage is gained by giving
decision-makers access to data that can reveal
previously unavailable, unknown and untapped
information about, for example, customers, trends
and demands.
Benefits of Data Warehousing
Figure: Typical data warehouse architecture, showing operational data sources, the operational data store (ODS), the DBMS holding detailed and lightly summarized data, the warehouse manager, archive/backup data, and end-user access tools such as OLAP tools.
Data Warehouse Architecture
Operational Data:
– Source of data for the data warehouse.
Operational Data Store:
– A repository of current and integrated operational
data used for analysis.
Load Manager:
– Performs all the operations associated with the
extraction and loading of data into the warehouse.
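The extract-and-load step can be sketched with Python's built-in sqlite3 module. This is a minimal illustration under stated assumptions, not the authors' method: the source table `orders`, the staging table `stg_orders` and the function `extract_and_load` are hypothetical, and two in-memory databases stand in for an operational system and the warehouse. Rows are copied unchanged, because transformation belongs to the warehouse manager.

```python
import sqlite3

def extract_and_load(source: sqlite3.Connection, warehouse: sqlite3.Connection) -> int:
    """Copy rows from the hypothetical operational table `orders` into the
    hypothetical warehouse staging table `stg_orders`; return the row count."""
    # Extract: read the raw rows from the operational source.
    rows = source.execute("SELECT order_id, city, amount FROM orders").fetchall()
    # Load: bulk-insert them unchanged into temporary (staging) storage.
    warehouse.execute(
        "CREATE TABLE IF NOT EXISTS stg_orders (order_id INTEGER, city TEXT, amount REAL)"
    )
    with warehouse:  # commits on success, rolls back on error
        warehouse.executemany("INSERT INTO stg_orders VALUES (?, ?, ?)", rows)
    return len(rows)

# Usage: an in-memory operational source and an in-memory warehouse.
src = sqlite3.connect(":memory:")
src.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL)")
src.executemany("INSERT INTO orders VALUES (?, ?, ?)",
                [(1, "glasgow", 120.0), (2, "London", 80.5)])
wh = sqlite3.connect(":memory:")
loaded = extract_and_load(src, wh)
```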
Warehouse Manager:
– Performs operations such as:
Analysis of data to ensure consistency.
Transformation and merging of source data from
temporary storage into data warehouse tables.
Creation of indexes and views on base tables.
Generation of denormalizations.
Generation of aggregations.
Backing-up and archiving data.
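Several of these warehouse manager operations can be sketched in a few SQL statements. The following is a hypothetical illustration using Python's sqlite3 module: the tables `stg_orders`, `fact_orders` and `agg_orders_by_city`, the index name and the "reject rows with a missing amount" consistency rule are all assumptions, and denormalization and backup are not shown.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
# Hypothetical staging table (filled by the load manager) and warehouse table.
wh.executescript("""
    CREATE TABLE stg_orders (order_id INTEGER, city TEXT, amount REAL);
    CREATE TABLE fact_orders (order_id INTEGER PRIMARY KEY, city TEXT, amount REAL);
    INSERT INTO stg_orders VALUES (1, 'glasgow', 120.0), (2, 'London', 80.5),
                                  (3, 'GLASGOW', 30.0), (4, 'London', NULL);
""")

def run_warehouse_manager(conn: sqlite3.Connection) -> None:
    with conn:
        # Consistency analysis (assumed rule): reject staged rows with no amount.
        conn.execute("DELETE FROM stg_orders WHERE amount IS NULL")
        # Transform and merge: standardize city names, then upsert by order_id.
        conn.execute("""
            INSERT OR REPLACE INTO fact_orders (order_id, city, amount)
            SELECT order_id, upper(substr(city, 1, 1)) || lower(substr(city, 2)), amount
            FROM stg_orders
        """)
        conn.execute("DELETE FROM stg_orders")  # empty the staging area
    # Index creation on the base table.
    conn.execute("CREATE INDEX IF NOT EXISTS idx_fact_city ON fact_orders (city)")
    # Aggregation: rebuild a summary table from the detailed data.
    with conn:
        conn.execute("DROP TABLE IF EXISTS agg_orders_by_city")
        conn.execute("""
            CREATE TABLE agg_orders_by_city AS
            SELECT city, SUM(amount) AS total_amount, COUNT(*) AS n_orders
            FROM fact_orders GROUP BY city
        """)

run_warehouse_manager(wh)
```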
Query Manager
– Performs all operations associated with the management of user
queries.
– Examples include:
Directing queries to the appropriate tables.
Scheduling the execution of queries.
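Directing a query to the appropriate table can be sketched as picking the cheapest table whose grain covers the requested dimensions. This is a hypothetical sketch, not a description of any product: the table names, row counts and `route_query` function are invented, and it assumes additive measures (such as sums) so that a finer-grained table can always be re-aggregated.

```python
# Hypothetical catalogue: each table and the dimensions it is grouped by.
TABLES = {
    "agg_sales_by_region": {"region"},
    "agg_sales_by_region_month": {"region", "month"},
    "fact_sales": {"region", "month", "branch", "day"},  # detailed data
}
# Hypothetical row counts used as a rough cost estimate (smaller is cheaper).
ROW_COUNTS = {
    "agg_sales_by_region": 10,
    "agg_sales_by_region_month": 120,
    "fact_sales": 1_000_000,
}

def route_query(group_by: set[str]) -> str:
    """Return the cheapest table whose grain covers every requested dimension."""
    candidates = [t for t, dims in TABLES.items() if group_by <= dims]
    if not candidates:
        raise ValueError(f"no table can answer a query grouped by {sorted(group_by)}")
    return min(candidates, key=lambda t: ROW_COUNTS[t])
```

For example, `route_query({"region", "month"})` returns `"agg_sales_by_region_month"`, while a query grouped by `day` falls through to the detailed `fact_sales` table.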
Detailed Data
– The area that stores all the detailed data in the database schema.
– Usually not kept online; instead, it is made available by aggregating it to the
next level of detail.
– On a regular basis, detailed data is added to the warehouse to
supplement the aggregated data.
Figure: Data flows in a data warehouse (inflow, upflow, outflow and downflow) linking the operational data sources and ODS, load manager, warehouse manager, query manager, DBMS (detailed, lightly summarized and highly summarized data), archive/backup data, metadata, and end-user access tools (reporting, query, application development and EIS tools; OLAP tools; data mining tools).
Data Warehouse Data Flows
Downflow
– The processes associated with archiving and backing-up of
data in the warehouse.
– Archiving old data plays an important role in maintaining
the effectiveness and performance of the warehouse by
transferring the older data of limited value to a storage
archive such as a magnetic tape or optical disk.
– The downflow of data includes the processes to ensure that
the current state of the data warehouse can be rebuilt
following data loss, or software/hardware failures.
– Archived data should be stored in a way that allows it to be
re-established in the warehouse when required.
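Archiving and later re-establishing old data can be sketched as moving rows between tables inside a single transaction, so that no row is lost or duplicated mid-move. This is a simplified, hypothetical illustration: an `archive_sales` table stands in for tape or optical storage, and the tables, columns and functions are assumptions.

```python
import sqlite3

def archive_older_than(conn: sqlite3.Connection, cutoff: str) -> int:
    """Move fact rows dated before `cutoff` (ISO 'YYYY-MM-DD') to the archive."""
    with conn:  # one transaction for the copy and the delete
        conn.execute("INSERT INTO archive_sales SELECT * FROM fact_sales WHERE sale_date < ?",
                     (cutoff,))
        cur = conn.execute("DELETE FROM fact_sales WHERE sale_date < ?", (cutoff,))
    return cur.rowcount

def restore_archive(conn: sqlite3.Connection) -> None:
    """Re-establish all archived rows in the warehouse when they are needed again."""
    with conn:
        conn.execute("INSERT INTO fact_sales SELECT * FROM archive_sales")
        conn.execute("DELETE FROM archive_sales")

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE fact_sales    (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
    CREATE TABLE archive_sales (sale_id INTEGER PRIMARY KEY, sale_date TEXT, amount REAL);
    INSERT INTO fact_sales VALUES (1, '2001-03-15', 100.0), (2, '2004-07-01', 250.0),
                                  (3, '2005-01-20', 75.0);
""")
moved = archive_older_than(conn, "2004-01-01")  # ISO dates compare correctly as text
```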
Outflow
– The processes associated with making the data
available to the end-users.
– The two key activities involved in the outflow
include:
Accessing, which is concerned with satisfying the end-
users’ requests for the data that they need.
Delivering, which is concerned with proactively
delivering information to the end-users’ workstations and
is referred to as a type of ‘publish-and-subscribe’
process.
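The "delivering" side of outflow follows the publish-and-subscribe pattern. The class below is a minimal, hypothetical in-process sketch (the name `ReportPublisher`, the topic strings and the report format are assumptions); a real deployment would push reports over a network to end-users' workstations.

```python
from collections import defaultdict
from typing import Callable

class ReportPublisher:
    """Hypothetical in-process publish-and-subscribe hub for warehouse reports."""

    def __init__(self) -> None:
        self._subscribers: defaultdict[str, list[Callable[[dict], None]]] = defaultdict(list)

    def subscribe(self, topic: str, deliver: Callable[[dict], None]) -> None:
        # An end user registers interest in a topic, e.g. "weekly-sales".
        self._subscribers[topic].append(deliver)

    def publish(self, topic: str, report: dict) -> int:
        # The warehouse proactively pushes the report to every subscriber.
        for deliver in self._subscribers[topic]:
            deliver(report)
        return len(self._subscribers[topic])

inbox: list[dict] = []
hub = ReportPublisher()
hub.subscribe("weekly-sales", inbox.append)
delivered = hub.publish("weekly-sales", {"region": "North", "total": 1200})
```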
Metaflow
– The processes associated with the management
of the metadata.
– Metaflow is the process that moves metadata
(data about other flows).
– Metadata is a description of the data contents of
the data warehouse, what is in it, where it came
from originally, and what has been done to it by
way of cleansing, integrating, and summarizing.
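A metadata entry can record exactly the three things listed above: what a table contains, where its data came from, and what has been done to it. The dataclass below is a hypothetical sketch; the field names, table name and source descriptions are illustrative assumptions, not a standard metadata model.

```python
from dataclasses import dataclass, field

@dataclass
class TableMetadata:
    """Hypothetical metadata entry for one warehouse table."""
    name: str                 # what is in the warehouse
    description: str
    sources: list[str]        # where the data originally came from
    transformations: list[str] = field(default_factory=list)  # what has been done to it

    def record(self, step: str) -> None:
        # Each cleansing, integrating or summarizing step is appended to the history.
        self.transformations.append(step)

catalog: dict[str, TableMetadata] = {}
meta = TableMetadata("agg_sales_by_region", "Monthly sales totals per region",
                     sources=["orders (sales system)", "branches (HR system)"])
meta.record("cleansed: dropped rows with NULL amount")
meta.record("integrated: branch city mapped to region")
meta.record("summarized: SUM(amount) grouped by region, month")
catalog[meta.name] = meta
```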
Data Warehouse Tools And Technologies
A data mart is a subset of a data warehouse that supports
the requirements of a particular department or
business function.
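A physical data mart can be sketched as a table derived from the warehouse that keeps only one department's rows and the columns it needs. The example below is hypothetical: the `fact_transactions` table, the department names and the values are invented for illustration.

```python
import sqlite3

wh = sqlite3.connect(":memory:")
wh.executescript("""
    CREATE TABLE fact_transactions (txn_id INTEGER, department TEXT, region TEXT, amount REAL);
    INSERT INTO fact_transactions VALUES
        (1, 'Lettings', 'North', 500.0), (2, 'Sales', 'North', 90000.0),
        (3, 'Lettings', 'South', 650.0), (4, 'Sales', 'South', 120000.0);
    -- Hypothetical data mart for the Lettings department: its rows, fewer columns.
    CREATE TABLE mart_lettings AS
        SELECT txn_id, region, amount FROM fact_transactions WHERE department = 'Lettings';
""")
```

The department's users query `mart_lettings` directly, a smaller and simpler structure than the full warehouse table.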
Data Warehouse Vs Data Mart
Figure: Data warehouse architecture with data marts, showing the operational data sources and operational data store (ODS), load manager, warehouse manager, query manager, DBMS (detailed, lightly summarized and highly summarized data), metadata, archive/backup data, data marts with summarized data, and end-user access tools (reporting, query, application development and EIS tools; OLAP tools; data mining tools).
Data Mart Issues
– Functionality: data marts started as small, easy-to-access databases, but their capabilities and complexity
have since increased.
– Load performance: demands for faster end-user response times lead to a large number of summary
tables, which in turn increases data loading time.
– User access to multiple data marts: this has led to the creation of virtual data marts, which are
simply views over several physical data marts.
– Advanced join methods: partition-wise joins improve join performance and thereby reduce
query response time.
– SQL optimizer: one of the most powerful features of Oracle. Its cost-based optimizer
determines the most efficient access paths and joins for every query.
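The idea behind a partition-wise join can be sketched in plain Python: when both inputs are partitioned on the join key with the same function, matching rows always land in the same partition number, so each pair of partitions can be joined on its own. This is an illustrative sketch of the idea only, not Oracle's implementation (which relies on tables that are already equipartitioned on the join key); the function names and sample rows are hypothetical.

```python
from collections import defaultdict

def hash_partition(rows: list[dict], key: str, n: int) -> list[list[dict]]:
    """Split rows into n partitions by hashing the join key."""
    parts: list[list[dict]] = [[] for _ in range(n)]
    for row in rows:
        parts[hash(row[key]) % n].append(row)
    return parts

def partition_wise_join(left: list[dict], right: list[dict], key: str, n: int = 4) -> list[dict]:
    """Equi-join two row lists by joining only matching partition pairs."""
    result = []
    # Pair i of the left input can only match pair i of the right input,
    # so each pair is joined independently (and could run in parallel).
    for lpart, rpart in zip(hash_partition(left, key, n), hash_partition(right, key, n)):
        index = defaultdict(list)  # small hash table built on one side of the pair
        for rrow in rpart:
            index[rrow[key]].append(rrow)
        for lrow in lpart:
            for rrow in index[lrow[key]]:
                result.append({**lrow, **rrow})
    return result

sales = [{"branchID": "B1", "amount": 100.0}, {"branchID": "B2", "amount": 50.0},
         {"branchID": "B1", "amount": 20.0}]
branches = [{"branchID": "B1", "city": "Glasgow"}, {"branchID": "B2", "city": "London"}]
joined = partition_wise_join(sales, branches, "branchID")
```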
Figure: Fact table with unnormalized dimension tables (star schema)
Fact table: timeID (pk), propertyID (pk), branchID (pk), offerPrice, sellingPrice
Time dimension: timeID (pk), day, week
Property dimension: propertyID (pk), type, city
Branch dimension: branchID (pk), type, city, region
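The star schema above can be written as SQLite DDL, with a typical "star join" that connects the fact table directly to each dimension. Table names such as `property_sale` and `time_dim` and all sample values are hypothetical; only the columns follow the figure.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Unnormalized dimension tables: one table per dimension.
    CREATE TABLE time_dim     (timeID INTEGER PRIMARY KEY, day TEXT, week INTEGER);
    CREATE TABLE property_dim (propertyID TEXT PRIMARY KEY, type TEXT, city TEXT);
    CREATE TABLE branch_dim   (branchID TEXT PRIMARY KEY, type TEXT, city TEXT, region TEXT);
    -- Fact table: composite key made of the dimension keys, plus the measures.
    CREATE TABLE property_sale (
        timeID     INTEGER REFERENCES time_dim,
        propertyID TEXT    REFERENCES property_dim,
        branchID   TEXT    REFERENCES branch_dim,
        offerPrice REAL, sellingPrice REAL,
        PRIMARY KEY (timeID, propertyID, branchID));
    INSERT INTO time_dim VALUES (1, '2004-05-03', 19), (2, '2004-05-10', 20);
    INSERT INTO property_dim VALUES ('P1', 'Flat', 'Glasgow'), ('P2', 'House', 'London');
    INSERT INTO branch_dim VALUES ('B1', 'Main', 'Glasgow', 'Scotland'),
                                  ('B2', 'Main', 'London', 'South East');
    INSERT INTO property_sale VALUES (1, 'P1', 'B1', 150000, 147500),
                                     (2, 'P2', 'B2', 400000, 410000);
""")

# Star join: every dimension is one join away from the fact table.
rows = db.execute("""
    SELECT b.region, t.week, p.type, SUM(f.sellingPrice)
    FROM property_sale f
    JOIN time_dim t     ON t.timeID = f.timeID
    JOIN property_dim p ON p.propertyID = f.propertyID
    JOIN branch_dim b   ON b.branchID = f.branchID
    GROUP BY b.region, t.week, p.type
    ORDER BY b.region
""").fetchall()
```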
Dimensional Modeling (cont.)
Figure: Fact table with normalized dimension tables
Fact table: timeID (fk), propertyID (fk), branchID (fk), offerPrice, sellingPrice
Branch dimension: branchID (pk), type, city
City table: city (pk), region
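Normalizing the branch dimension moves `region` into its own city table, so each region is stored once per city rather than once per branch, at the cost of an extra join. The SQLite sketch below shows only the branch and city part of the schema; table names and sample values are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Normalized branch dimension: region is moved to a separate city table.
    CREATE TABLE city_dim   (city TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE branch_dim (branchID TEXT PRIMARY KEY, type TEXT,
                             city TEXT REFERENCES city_dim);
    CREATE TABLE property_sale (timeID INTEGER, propertyID TEXT,
                                branchID TEXT REFERENCES branch_dim,
                                offerPrice REAL, sellingPrice REAL);
    INSERT INTO city_dim VALUES ('Glasgow', 'Scotland'), ('Aberdeen', 'Scotland'),
                                ('London', 'South East');
    INSERT INTO branch_dim VALUES ('B1', 'Main', 'Glasgow'), ('B2', 'Satellite', 'Aberdeen'),
                                  ('B3', 'Main', 'London');
    INSERT INTO property_sale VALUES (1, 'P1', 'B1', 150000, 147500),
                                     (1, 'P2', 'B2', 90000, 92000),
                                     (2, 'P3', 'B3', 400000, 410000);
""")

# Reaching region now takes two joins: fact -> branch -> city.
totals = dict(db.execute("""
    SELECT c.region, SUM(f.sellingPrice)
    FROM property_sale f
    JOIN branch_dim b ON b.branchID = f.branchID
    JOIN city_dim c   ON c.city = b.city
    GROUP BY c.region
""").fetchall())
```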
Salient Features of DM
– Efficiency: the consistency of the underlying database structure allows more efficient
access to the data by various tools, such as report writers and query tools.
– Ability to handle changing environments: the design is better able to support ad hoc
user queries.
– Extensibility: the model is extensible with respect to adding new facts, dimensions and attributes,
and breaking existing dimension records down to a lower level of detail.
– Ability to model common business situations: each common situation has a well-understood set
of alternatives that can be supported in report writers, query tools and other user interfaces.
– Predictable query processing: even though the overall suite of star schemas in the
enterprise dimensional model is complex, query processing is very predictable because
each fact table is queried independently.
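Querying each fact table independently and then merging the answers on a shared (conformed) dimension attribute is often called "drilling across". The sketch below is hypothetical: two invented fact tables share the branch dimension, each gets its own simple star-join query, and the results are merged on `region` in Python. Joining both fact tables in a single query would repeat each sale once per matching lease row and inflate the sums.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    -- Two hypothetical star schemas that share the branch dimension.
    CREATE TABLE branch_dim (branchID TEXT PRIMARY KEY, region TEXT);
    CREATE TABLE sales_fact (branchID TEXT, sellingPrice REAL);
    CREATE TABLE lease_fact (branchID TEXT, monthlyRent REAL);
    INSERT INTO branch_dim VALUES ('B1', 'Scotland'), ('B2', 'South East');
    INSERT INTO sales_fact VALUES ('B1', 147500), ('B2', 410000), ('B2', 250000);
    INSERT INTO lease_fact VALUES ('B1', 650), ('B1', 700);
""")

def by_region(fact_table: str, measure: str) -> dict[str, float]:
    # One predictable star-join query per fact table; names are trusted constants here.
    sql = (f"SELECT b.region, SUM(f.{measure}) FROM {fact_table} f "
           "JOIN branch_dim b ON b.branchID = f.branchID GROUP BY b.region")
    return dict(db.execute(sql).fetchall())

# Merge the independent answers on the shared attribute `region`.
sales = by_region("sales_fact", "sellingPrice")
rents = by_region("lease_fact", "monthlyRent")
report = {r: (sales.get(r, 0.0), rents.get(r, 0.0)) for r in sorted(sales.keys() | rents.keys())}
```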
DM Vs ER modeling
Explicit declaration
Conformed dimensions and facts
Dimensional integrity
Open aggregate navigation
Dimensional symmetry
Sparsity tolerance
Administration Criteria
Graceful modification
Dimensional replication
Changed dimension notification
Surrogate key administration
International consistency
Multiple-dimension hierarchies
Ragged-dimension hierarchies
Expression Criteria
Analytic capabilities.
Generally what the user sees most often
(make sure the boss is happy).