Introduction to DW
Author: Deepak Natarajan & Aditya Gollapudi
Systech Education
Course Agenda
Session 1
What is DW?
Elements of DW
OLTP vs. OLAP
Types of OLAP
Implementing Life Cycle
Session 2
Session 3
Purpose
The purpose of this module is to give an insight into the basic concepts and terminology of data warehousing and business intelligence.
Objective
Define a data warehouse.
Understand the elements of DW.
Differentiate between OLTP and OLAP systems.
Explain the types of OLAP systems.
Understand the implementation life cycle.
What is DW?
Data Warehouse
The Problem
[Figure: disconnected source systems (IBM, VAX, HP, Fred's PC) with question marks between them]
The Solution
[Figure: the data warehouse, with metadata]
Data Warehouse
A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision-making process.
Data Warehouse
Subject-oriented:
Data that gives information about a particular subject instead of about a company's ongoing operations.
Integrated:
Data that is gathered into the data warehouse from a variety of sources and merged into a coherent whole.
Data Warehouse
Time-variant:
All data in the data warehouse is identified with a particular time period.
Non-volatile:
Data is stable in a data warehouse. More data is added but data is never removed. This enables management to gain a consistent picture of the business.
Elements of DW
Source System
Staging Area
Dimensional Model
Data Mart
Data Warehouse
ODS
EDW
Kimball's Approach
Inmon's Approach
Source System
An operational system of record whose function is to capture the transactions of the business.
Characteristics of Source System:
Priorities are uptime and availability.
Queries against the source system are narrow and severely restricted.
Maintains little historical data.
Staging Area
It is a storage area and set of processes that clean, transform, combine, de-duplicate, household, archive and prepare source data for use in the data warehouse.
Characteristics of Staging Area:
It is layered between the source system and presentation server.
It does not provide query and presentation services.
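As a rough illustration of the clean and de-duplicate processes named above, the following minimal sketch stages a few customer records in plain Python. The record layout, the function names (`clean`, `deduplicate`, `stage`), and the choice of email address as the matching key are all assumptions made for this example; a real staging area would typically use an ETL tool and more robust matching rules.

```python
def clean(record):
    """Normalize whitespace and letter case so equivalent values compare equal."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
    }


def deduplicate(records):
    """Keep the first record seen for each email address (assumed matching key)."""
    seen = set()
    unique = []
    for record in records:
        if record["email"] not in seen:
            seen.add(record["email"])
            unique.append(record)
    return unique


def stage(source_records):
    """Clean every source record, then drop duplicates."""
    return deduplicate([clean(r) for r in source_records])


# Hypothetical raw extract from a source system.
raw = [
    {"name": "  alice   smith ", "email": "Alice@Example.com "},
    {"name": "Alice Smith", "email": "alice@example.com"},
    {"name": "bob jones", "email": "bob@example.com"},
]
staged = stage(raw)
```

Cleaning runs before de-duplication so that records differing only in spacing or letter case collapse into one.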
Dimensional Model
It is a technique for modeling data that is an alternative to entity-relationship (E-R) modeling. A dimensional model contains the same information as an E-R model but packages the data in a symmetric format.
Components of Dimensional Model
Fact Table
The primary table in each dimensional model, meant to contain measurements of the business.
Dimension Table
Each dimension is defined by its primary key, which serves as the basis for referential integrity with any given fact table.
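The fact table and dimension table described here can be sketched as a small star schema. This is a minimal example using Python's built-in `sqlite3` module; the table and column names (`dim_date`, `dim_product`, `fact_sales`, and so on) are hypothetical and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite enforces foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT,
    month         TEXT
);
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT
);
-- The fact table holds the measurements; its multi-part key is made of
-- foreign keys that point at the dimension tables.
CREATE TABLE fact_sales (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    product_key  INTEGER NOT NULL REFERENCES dim_product(product_key),
    units_sold   INTEGER NOT NULL,
    sales_amount REAL NOT NULL,
    PRIMARY KEY (date_key, product_key)
);
INSERT INTO dim_date VALUES (1, '2024-01-01', 'January');
INSERT INTO dim_product VALUES (1, 'Cola', 'Beverages'), (2, 'Chips', 'Snacks');
INSERT INTO fact_sales VALUES (1, 1, 4, 10.0), (1, 2, 5, 25.0);
""")

total_sales = conn.execute("SELECT SUM(sales_amount) FROM fact_sales").fetchone()[0]

# Referential integrity: a fact row that names an unknown product is rejected.
try:
    conn.execute("INSERT INTO fact_sales VALUES (1, 999, 1, 1.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
```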
Data Mart
A logical subset of the complete data warehouse. A data mart is usually built for a single part of the business and organized around a single business process.
Every data mart must be represented by a dimensional model. Data marts are the basis for both the top-down and bottom-up approaches to building a data warehouse.
Data Warehouse
The queryable source of data in the enterprise. The data warehouse is the union of all the constituent data marts.
Historical data is maintained. Data is fed from the data staging area. It is also frequently updated on a controlled load basis as data is corrected, snapshots are accumulated and labels are changed.
ODS
An ODS (operational data store) is an integrated database of operational data. Its sources include legacy systems and it contains current or near-term data. An ODS may contain 30 to 60 days of information, while a data warehouse typically contains years of data.
An ODS is usually designed to contain low-level or atomic (indivisible) data such as transactions and prices. Only the data warehouse contains aggregate data.
EDW
An Enterprise Data Warehouse is a data warehouse containing all publishable quality data of a permanent nature collected by an organization. This inevitably includes historic data from multiple data sources.
Operational transaction data is usually excluded due to its volatile nature. Enterprise data warehouses are valuable resources, are costly to construct, and require a long time to evolve.
Kimball's Approach
Start with clearly defined user requirements.
Build a subject area at a time based on a star schema.
The data warehouse is the union of all the data marts but only if the dimensions conform across all the fact tables.
Before loading the facts and dimensions, first concentrate on the staging area.
Inmon's Approach
Advocates a normalized enterprise data warehouse.
Time-variant data structures.
Don't be concerned about requirements too much up front.
OLTP ER Model
OLAP ER Model
Relationship between ER Model and Dimension Model
Advantages of Dimension Modeling
OLTP - ER Model
Logical design technique that seeks to eliminate data redundancy.
Conceptual data model that views the real world as entities and relationships.
An entity-relationship diagram is used to visually represent data objects.
The ER modeling technique is a discipline used to illuminate the microscopic relationships among data elements. The highest art form of ER modeling is to remove all redundancy in the data. This is immensely beneficial to transaction processing because transactions are made very simple and deterministic.
Example: The transaction of updating a customer's address may devolve to a single record lookup in a customer address master table. This lookup is controlled by a customer address key, which defines uniqueness of the customer address record and allows an indexed lookup that is extremely fast. It is safe to say that the success of transaction processing in relational databases is mostly due to the discipline of ER modeling.
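The address update described in this example can be sketched with Python's built-in `sqlite3` module. The table name `customer_address`, its columns, and the helper `update_address` are hypothetical; the point is only that the transaction touches exactly one row, located through the primary key index.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The primary key defines uniqueness of each address record and gives
# the database an index for the lookup.
conn.execute("""
CREATE TABLE customer_address (
    customer_address_key INTEGER PRIMARY KEY,
    customer_name        TEXT NOT NULL,
    address              TEXT NOT NULL
)
""")
conn.executemany(
    "INSERT INTO customer_address VALUES (?, ?, ?)",
    [(1, "A. Rao", "12 Lake Road"), (2, "B. Singh", "7 Hill Street")],
)
conn.commit()


def update_address(connection, address_key, new_address):
    """Update one customer's address by key; returns the number of rows changed."""
    cursor = connection.execute(
        "UPDATE customer_address SET address = ? WHERE customer_address_key = ?",
        (new_address, address_key),
    )
    connection.commit()
    return cursor.rowcount


rows_changed = update_address(conn, 2, "41 Park Avenue")
```

Because the address is stored once, there is no second copy elsewhere that could fall out of step, which is what keeps the transaction simple and deterministic.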
Logical design technique that seeks to present the data in a standard framework that is intuitive and allows for high performance access. Every dimensional model is composed of:
One table with a multi-part key, called the fact table
A set of smaller tables, called dimension tables
A single entity-relationship diagram breaks down into multiple fact table diagrams. ER diagrams are useful, but they are meant to be viewed in small sections, not all at once.
OLTP
OLTP is a class of program that facilitates and manages transaction-oriented applications, typically for data entry and retrieval transactions.
Source of data: Operational data; OLTPs are the original source of the data
OLAP
OLAP enables a user to easily and selectively extract and view data from different points of view.
Source of data: Consolidation data; OLAP data comes from the various OLTP databases
OLTP
Purpose of Data: To control and run fundamental business tasks
What the data reveals: A snapshot of ongoing business processes
OLAP
Purpose of Data: To help with planning, problem solving, and decision support
What the data reveals: Multi-dimensional views of various kinds of business activities
OLTP
Inserts and Updates: Short and fast inserts and updates initiated by end users
Queries: Relatively standardized and simple queries returning relatively few records
OLAP
Inserts and Updates: Periodic long-running batch jobs refresh the data
Queries: Often complex queries involving aggregations
OLTP
Processing speed: Typically very fast
OLAP
Processing speed: Depends on the amount of data involved; batch data refreshes and complex queries may take many hours; query speed can be improved by creating indexes
Space requirements: Larger due to the existence of aggregation structures and history data; requires more indexes than OLTP
OLTP
Database design: Highly normalized with many tables
Data Access: Very frequent access; small quantities of data per operation; reading, writing, modifying, deletion
OLAP
Database design: Typically denormalized with fewer tables; use of star and/or snowflake schemas
Data Access: Moderate access frequency; large quantities of data; predominantly reading operations
OLTP
Backup and recovery: Backup religiously; operational data is critical to run the business, data loss is likely to entail significant monetary loss and legal liability
OLAP
Backup and recovery: Instead of regular backups, some environments may consider simply reloading the OLTP data as a recovery method
The first step in converting an ER diagram to a set of DM diagrams is to separate the ER diagram into its discrete business processes and to model each one separately.
The second step is to select those many-to-many relationships in the ER model containing numeric and additive nonkey facts and to designate them as fact tables.
The third step is to denormalize all of the remaining tables into flat tables with single-part keys that connect directly to the fact tables. These tables become the dimension tables.
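The second and third steps can be sketched in plain Python for a single, already separated business process (order taking). The normalized tables, their contents, and the function names `build_sales_fact` and `build_product_dimension` are assumptions made for this illustration only.

```python
# Hypothetical normalized (ER-style) tables for one business process.
categories = {
    10: {"category_name": "Beverages"},
    20: {"category_name": "Snacks"},
}
products = {
    1: {"product_name": "Cola", "category_id": 10},
    2: {"product_name": "Chips", "category_id": 20},
}
# Many-to-many relationship between orders and products that carries
# numeric, additive non-key facts (quantity, amount).
order_lines = [
    {"order_id": 100, "product_id": 1, "quantity": 2, "amount": 3.0},
    {"order_id": 100, "product_id": 2, "quantity": 1, "amount": 1.5},
    {"order_id": 101, "product_id": 1, "quantity": 4, "amount": 6.0},
]


def build_sales_fact(lines):
    """Step 2: the many-to-many relationship with additive facts becomes the fact table."""
    return [
        {
            "order_id": line["order_id"],
            "product_key": line["product_id"],
            "quantity": line["quantity"],
            "amount": line["amount"],
        }
        for line in lines
    ]


def build_product_dimension(product_table, category_table):
    """Step 3: flatten product and category into one table with a single-part key."""
    return {
        product_id: {
            "product_name": product["product_name"],
            "category_name": category_table[product["category_id"]]["category_name"],
        }
        for product_id, product in product_table.items()
    }


fact_sales = build_sales_fact(order_lines)
dim_product = build_product_dimension(products, categories)
```

After flattening, the category name lives directly on each product row, so a query reaches it through one join from the fact table instead of two.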
The master DM model of a data warehouse for a large enterprise will consist of somewhere between 10 and 25 very similar-looking star join schemas. Each star join will have 4 to 12 dimension tables. If the design has been done correctly, many of these dimension tables will be shared from fact table to fact table.
Drilling Down means adding more dimension attributes to the SQL answer set from within a single star join.
Drilling Up means removing dimension attributes from the SQL answer set within a single star join.
Drilling Across means linking separate fact tables together through the conformed (shared) dimensions.
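The three operations can be illustrated with a small example built on Python's `sqlite3` module. The two fact tables (`fact_sales`, `fact_shipments`), the shared dimension `dim_product`, and the sample rows are all hypothetical. Drilling across is done here by aggregating each fact table separately and then joining the two answer sets on the conformed attribute, which is one common way to do it rather than the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (product_key INTEGER PRIMARY KEY, product_name TEXT, category TEXT);
CREATE TABLE fact_sales (product_key INTEGER, sales_amount REAL);
CREATE TABLE fact_shipments (product_key INTEGER, units_shipped INTEGER);
INSERT INTO dim_product VALUES (1, 'Cola', 'Beverages'), (2, 'Juice', 'Beverages'), (3, 'Chips', 'Snacks');
INSERT INTO fact_sales VALUES (1, 10.0), (1, 5.0), (2, 20.0), (3, 8.0);
INSERT INTO fact_shipments VALUES (1, 4), (2, 6), (3, 3);
""")

# Drilled-up view: only the category attribute is in the answer set.
by_category = conn.execute("""
    SELECT d.category, SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category ORDER BY d.category
""").fetchall()

# Drilling down: add one more dimension attribute (product_name).
by_product = conn.execute("""
    SELECT d.category, d.product_name, SUM(f.sales_amount)
    FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
    GROUP BY d.category, d.product_name ORDER BY d.category, d.product_name
""").fetchall()

# Drilling across: aggregate each fact table on its own, then link the
# two answer sets through the conformed dimension attribute.
across = conn.execute("""
    SELECT s.category, s.total_sales, h.total_units
    FROM (SELECT d.category AS category, SUM(f.sales_amount) AS total_sales
          FROM fact_sales f JOIN dim_product d ON d.product_key = f.product_key
          GROUP BY d.category) s
    JOIN (SELECT d.category AS category, SUM(f.units_shipped) AS total_units
          FROM fact_shipments f JOIN dim_product d ON d.product_key = f.product_key
          GROUP BY d.category) h
      ON h.category = s.category
    ORDER BY s.category
""").fetchall()
```

Going from `by_product` back to `by_category` is drilling up: the `product_name` attribute is removed from the answer set.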
Predictable, standard framework.
The predictable framework of the star join schema withstands unexpected changes in user behavior.
Extensible to accommodate unexpected new data elements and new design decisions.
Administrative utilities and software processes help to manage and use aggregates.
Types of OLAP
ROLAP
Relational OLAP. ROLAP systems store data in the relational database.
MOLAP
Multidimensional OLAP. MOLAP systems store data in the multidimensional cubes.
HOLAP
Hybrid OLAP. HOLAP technologies attempt to combine the advantages of MOLAP and ROLAP.
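The storage difference between ROLAP and MOLAP can be hinted at with a toy example in plain Python. This is only a sketch under simplifying assumptions: the "relational" side is a list of rows aggregated at query time, and the "cube" is a dictionary of totals precomputed for every combination of year and region, with the hypothetical marker `"ALL"` standing for a rolled-up dimension. Real OLAP servers are far more elaborate.

```python
from collections import defaultdict

# Hypothetical detail rows: (year, region, sales_amount).
rows = [
    ("2024", "East", 100),
    ("2024", "West", 50),
    ("2025", "East", 70),
]


def rolap_total(detail_rows, year=None, region=None):
    """ROLAP-style: aggregate the relational rows when the question is asked."""
    return sum(
        amount
        for row_year, row_region, amount in detail_rows
        if (year is None or row_year == year)
        and (region is None or row_region == region)
    )


def build_cube(detail_rows):
    """MOLAP-style: precompute totals for every (year, region) cell, including roll-ups."""
    cube = defaultdict(int)
    for row_year, row_region, amount in detail_rows:
        for year_key in (row_year, "ALL"):
            for region_key in (row_region, "ALL"):
                cube[(year_key, region_key)] += amount
    return dict(cube)


cube = build_cube(rows)
# The same question answered both ways.
total_2024_on_demand = rolap_total(rows, year="2024")
total_2024_from_cube = cube[("2024", "ALL")]
```

The cube answers with a single lookup at the cost of extra storage and a build step, while the on-demand version needs no precomputation but scans the detail rows for every question.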
ROLAP
[Figure: ROLAP architecture: Source Systems, Data Warehouse Server, Clients]
MOLAP
[Figure: MOLAP architecture: RDBMS Server, MDBMS Server, Clients]
HOLAP
[Figure: HOLAP architecture: RDBMS Server, MDBMS Server, Clients]
Planning
Business Requirements Definition
Dimensional Modeling
Physical Design
Data Staging Design and Development
Technical Architecture Design
Lifecycle Approach
Successful implementation of a data warehouse depends on the appropriate integration of numerous tasks and components. You need to coordinate the many facets of a data warehouse and demonstrate strength across all aspects of the project for success. The Business Dimensional Lifecycle ensures that the project pieces are brought together in the right order and at the right time.
[Figure: Business Dimensional Lifecycle diagram: Project Planning, Dimensional Modeling, Physical Design, Deployment, Data Track, Application Track, Project Management]
Project Planning
The lifecycle begins with project planning. It addresses the definition and scoping of the data warehouse project, including early critical tasks like readiness assessment and business justification. Then project planning focuses on resource and skill-level staffing requirements coupled with project task assignments, duration and sequencing. Project planning is dependent on the business requirements, as denoted by the two-way arrow between the activities.
The likelihood of a data warehouse's success is greatly increased by a sound understanding of the business end users and their analytical requirements. The designers must understand the key factors driving the business to effectively determine business requirements and translate them into design considerations. The business requirements establish the foundation for the three parallel tracks focused on technology, data and end user applications.
Dimensional Modeling
The definition of the business requirements determines the data needed to address business users' analytical requirements. Data models to support these analyses are then designed by constructing a matrix that represents key business processes and their dimensionality. This matrix serves as a blueprint to ensure that the data warehouse is extensible across the organization over time.
Coupling this data analysis with our earlier understanding of the business requirements, we then develop a dimensional model with a fact table grain, associated dimensions, attributes, hierarchical drill paths and facts. The logical database design is completed with the appropriate table structure and primary/foreign key relationships, and the preliminary aggregation plan is also developed.
Physical Design
Physical database design focuses on defining the physical structures necessary to support the logical database design. Primary elements include defining the naming standards and setting up the database environment. Preliminary indexing and partitioning strategies are also determined.
The extract process always exposes data quality issues that have been buried within the operational data store. If not addressed properly, they can significantly impact the credibility of the data warehouse. Also, two warehouse staging processes need to be built:
One for the initial population of the data warehouse.
Another for the regular, incremental loads.
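The difference between the two staging processes can be sketched in plain Python. This is a minimal illustration, assuming each source row carries a hypothetical `updated_at` timestamp (an ISO date string) that the incremental process compares against the high-water mark left by the previous load. The function names and row layout are invented for the example.

```python
def initial_load(source_rows):
    """Initial population: copy every source row and record the latest timestamp seen."""
    warehouse = {row["id"]: dict(row) for row in source_rows}
    high_water_mark = max(row["updated_at"] for row in source_rows)
    return warehouse, high_water_mark


def incremental_load(warehouse, source_rows, high_water_mark):
    """Regular load: apply only rows changed since the previous load."""
    loaded = 0
    new_mark = high_water_mark
    for row in source_rows:
        if row["updated_at"] > high_water_mark:
            warehouse[row["id"]] = dict(row)
            new_mark = max(new_mark, row["updated_at"])
            loaded += 1
    return loaded, new_mark


# Hypothetical source extract on day one.
source_day_1 = [
    {"id": 1, "amount": 100, "updated_at": "2024-01-01"},
    {"id": 2, "amount": 50, "updated_at": "2024-01-02"},
]
warehouse, mark = initial_load(source_day_1)

# Later extract: row 2 was corrected and row 3 is new.
source_day_2 = [
    {"id": 1, "amount": 100, "updated_at": "2024-01-01"},
    {"id": 2, "amount": 55, "updated_at": "2024-01-05"},
    {"id": 3, "amount": 20, "updated_at": "2024-01-06"},
]
rows_loaded, mark = incremental_load(warehouse, source_day_2, mark)
```

ISO date strings compare correctly as plain text, which is why the sketch can use `>` and `max` on them directly.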
The technical architecture design establishes the overall architecture framework and vision. Three factors must be considered simultaneously to establish the data warehouse technical architecture design:
Business requirements, current technical environment, and planned strategic technical directions.
Using the technical architecture design as a framework, specific architectural components such as the hardware platform, database management system, data staging tool, or data access tool will need to be evaluated and selected. A standard technical evaluation process is defined along with specific evaluation factors for each architectural component. Once the products have been evaluated and selected, they are installed and thoroughly tested to ensure end-to-end integration with the data warehouse environment.
A set of standard end user applications is usually defined since not all business users need ad hoc access to the data warehouse. Application specifications describe the report template, user-driven parameters, and required calculations. These specifications ensure that the development team and the business users have a common understanding of the applications to be delivered.
The development of the end user applications involves configuring the tool metadata and constructing the specified reports. Optimally, these applications are built using an advanced data access tool that provides significant productivity gains for the development team. In addition, such a tool offers a powerful mechanism for business users to easily modify existing report templates.
Deployment
Deployment represents the convergence of technology, data, and end user applications accessible from the business user's desktop. Extensive planning is required to ensure that these puzzle pieces fit together properly. Business user education that integrates all aspects of the convergence must be developed and delivered. In addition, user support and communication or feedback strategies should be established before any business users have access to the data warehouse.
Focus on the business users must continue by providing them with ongoing support and education. Focus on the back room must also continue, to ensure that the processes and procedures are in place for effective ongoing operation of the data warehouse.
Project Management
Used to ensure that Business Dimensional Lifecycle activities remain on track and in sync. Activities include:
Monitoring project status
Issue tracking
Change control to preserve scope boundaries
The development of a comprehensive project communication plan that addresses both the business and information systems organizations
High-level task sequencing
Activities that should be happening concurrently throughout the technology, data, and application tracks