
Business Analytics

By
Dr. Atanu Rakshit
Email: atanu.rakshit@iimrohtak.ac.in
atanu.raks@gmail.com

Business Analytics
Text Book:
Business Intelligence: A Managerial Approach by
Efraim Turban, Ramesh Sharda, Dursun Delen and
David King, 2/e, Pearson, 2012

Reference Material:
Business Analytics for Managers by Gert H. N.
Laursen and Jesper Thorlund, Wiley, 2010

Business Analytics
Reference Material:
Decision Support and Business Intelligence
Systems by Efraim Turban, Ramesh Sharda and
Dursun Delen, 9/e, Pearson, 2012
Business Intelligence Strategy: A Practical Guide
for Achieving BI Excellence by John Boyer, Bill
Frank, Brian Green and Tracy Harris, MC Press,
2010

Business Analytics
Sessions Plan

Introduction to Business Analytics
Data Warehousing
Data Mining for Business Intelligence
Business Analytics Model
Business Analytics at the Analytical Level
Business Analytics at the Strategic Level
Business Analytics at the Functional Level
Business Performance Management
Big Data Analytics
Project Presentation

Business Analytics

Introduction to Data
Warehousing

Learning Objectives
Understand the basic definitions and concepts of
data warehouses
Learn different types of data warehousing
architectures; their comparative advantages and
disadvantages
Describe the processes used in developing and
managing data warehouses
Explain data warehousing operations
Explain the role of data warehouses in decision
support

Learning Objectives
Explain data integration and the extraction,
transformation, and load (ETL) processes
Describe real-time (a.k.a. right-time and/or active)
data warehousing
Understand data warehouse administration and
security issues

Opening Vignette
DirecTV Thrives with Active Data Warehousing
Company background
Problem description
Proposed solution
Results
Answer & discuss the case questions.

Main Data Warehousing Topics

DW definition
Characteristics of DW
Data Marts
ODS, EDW, Metadata
DW Framework
DW Architecture & ETL Process
DW Development
DW Issues

What is a Data Warehouse?

A physical repository where relational data are
specially organized to provide enterprise-wide,
cleansed data in a standardized format
The data warehouse is a collection of integrated,
subject-oriented databases designed to support
DSS functions, where each unit of data is
nonvolatile and relevant to some moment in time

Characteristics of DW

Subject oriented
Integrated
Time-variant (time series)
Nonvolatile
Summarized
Not normalized
Metadata
Web based, relational/multi-dimensional
Client/server
Real-time and/or right-time (active)

Data Mart
A departmental data warehouse that stores
only relevant data
Dependent data mart
A subset that is created directly from a data
warehouse
Independent data mart
A small data warehouse designed for a
strategic business unit or a department

Data Warehousing Definitions


Operational data stores (ODS)
A type of database often used as an interim area for a data
warehouse
Oper marts
An operational data mart
Enterprise data warehouse (EDW)
A data warehouse for the enterprise
Metadata
Data about data. In a data warehouse, metadata describe
the contents of a data warehouse and the manner of its
acquisition and use

DW Framework

DW Architecture
Three-tier architecture

1. Data acquisition software (back-end)
2. The data warehouse that contains the data & software
3. Client (front-end) software that allows users to access
   and analyze data from the warehouse

Two-tier architecture
The first two tiers of the three-tier architecture are combined into one

Sometimes there is only one tier

DW Architectures

OLAP Definition
OLAP is implemented in a multi-user client/server
mode and offers consistently rapid response to queries,
regardless of database size and complexity. OLAP
helps the user synthesize enterprise information
through comparative, personalized viewing, as well as
through analysis of historical and projected data in
various "what-if" data model scenarios. This is
achieved through use of an OLAP Server.

OLAP Server
An OLAP server is a high-capacity, multi-user data
manipulation engine specifically designed to support
and operate on multi-dimensional data structures.
A multi-dimensional structure is arranged so that
every data item is located and accessed based on the
intersection of the dimension members which define
that item.
The design of the server and the structure of the data
are optimized for rapid ad-hoc information retrieval
in any orientation, as well as for fast, flexible
calculation and transformation of raw data based on
formulaic relationships.

OLAP Server
The OLAP Server may either physically stage the
processed multi-dimensional information to deliver
consistent and rapid response times to end users, or it
may populate its data structures in real-time from
relational or other databases, or offer a choice of
both.
Given the current state of technology and the end
user requirement for consistent and rapid response
times, staging the multi-dimensional data in the
OLAP Server is often the preferred method.

Multi-dimensional Data
"Hey... I sold $100M worth of goods"
Dimensions: Product, Region, Time
Hierarchical summarization paths

[Figure: sales cube with axes Product (Juice, Cola, Milk, Cream, Toothpaste, Soap), Region (W, S, N), and Month (1 to 7)]

Hierarchy levels shown for each dimension:
Product: Industry, Category, Product
Region: Country, Region, City, Office
Time: Year, Quarter, Month, Week, Day
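The idea of locating a cell by the intersection of its dimension members, and of summarizing along a hierarchy, can be sketched in a few lines of Python. This is only an illustration: the cell values and the helper names `month_to_quarter` and `roll_up_time` are assumptions made for the example, not part of the slides.

```python
from collections import defaultdict

# Hypothetical sales facts: (product, region, month) -> sales amount.
cube = {
    ("Juice", "N", 1): 10.0,
    ("Juice", "N", 4): 20.0,
    ("Cola", "S", 2): 47.0,
    ("Milk", "W", 3): 30.0,
    ("Cola", "S", 5): 12.0,
}


def month_to_quarter(month):
    """Map a month number (1-12) to its quarter (1-4)."""
    return (month - 1) // 3 + 1


def roll_up_time(cells):
    """Aggregate month-level cells to quarter level along the Time hierarchy."""
    rolled = defaultdict(float)
    for (product, region, month), amount in cells.items():
        rolled[(product, region, month_to_quarter(month))] += amount
    return dict(rolled)


by_quarter = roll_up_time(cube)

# A cell is addressed by the intersection of its dimension members.
print(cube[("Cola", "S", 2)])
print(by_quarter[("Cola", "S", 1)], by_quarter[("Cola", "S", 2)])
```

The same pattern would apply to the Product or Region hierarchy by swapping in a different mapping function.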

A Visual Operation: Pivot (Rotate)

[Figure: a Product (Juice, Cola, Milk, Cream) by Date (3/1 to 3/4) grid with sample cell values 10, 47, 30, 12, rotated to exchange its axes]

Slicing and Dicing

[Figure: cube with axes Product (Household, Telecomm, Video, Audio), Region (Europe, Far East, India), and Sales Channel (Retail, Direct, Special), highlighting "The Telecomm Slice"]

Roll-up and Drill Down

Higher level of aggregation
Sales Channel
Region
Country
State
Location Address
Sales Representative
Low-level details

Nature of OLAP Analysis

Aggregation -- (total sales, percent-to-total)
Comparison -- Budget vs. Expenses
Ranking -- Top 10, quartile analysis
Access to detailed and aggregate data
Complex criteria specification
Visualization
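As a rough illustration of two of the analysis types listed above, the sketch below computes a percent-to-total aggregation and a top-N ranking. The sales figures and the function names `percent_to_total` and `top_n` are hypothetical and chosen only for this example.

```python
# Hypothetical total sales per product.
sales = {"Juice": 10.0, "Cola": 50.0, "Milk": 30.0, "Cream": 10.0}


def percent_to_total(values):
    """Aggregation: express each value as a percentage of the grand total."""
    total = sum(values.values())
    return {key: 100.0 * value / total for key, value in values.items()}


def top_n(values, n):
    """Ranking: return the n largest (key, value) pairs, largest first."""
    return sorted(values.items(), key=lambda item: item[1], reverse=True)[:n]


shares = percent_to_total(sales)
best_two = top_n(sales, 2)
print(shares["Cola"])
print(best_two)
```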

A Web-based DW Architecture

[Figure: Client (Web browser), Internet/Intranet/Extranet, Web Server (Web pages), Application Server, Data warehouse]

Data Warehousing Architectures


Issues to consider when deciding which
architecture to use:
Which database management system (DBMS)
should be used?
Will parallel processing and/or partitioning be
used?
Will data migration tools be used to load the data
warehouse?
What tools will be used to support data retrieval
and analysis?

Alternative DW Architectures

1. Independent Data Marts
2. Data Mart Bus Architecture
3. Hub-and-Spoke Architecture
4. Centralized Data Warehouse
5. Federated Data Warehouse

Each has pros and cons!

Teradata Corp. DW Architecture

Data Integration and the Extraction,
Transformation, and Load (ETL) Process
Data integration
Integration that comprises three major processes: data
access, data federation, and change capture
Enterprise application integration (EAI)
A technology that provides a vehicle for pushing data from
source systems into a data warehouse
Enterprise information integration (EII)
An evolving tool space that promises real-time data
integration from a variety of sources, such as relational
databases, Web services, and multidimensional databases

Data Integration and the Extraction,
Transformation, and Load (ETL) Process
Extraction, transformation, and load (ETL)

[Figure: sources (transient data source, packaged application, legacy system, other internal applications) pass through Extract, Transform, Cleanse, and Load into the data warehouse and data mart]
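The Extract, Transform, Cleanse, and Load stages can be sketched as a small pipeline. This is a minimal illustration under stated assumptions: the source rows, the field names, the `sales` table, and the rejection rule are all hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Hypothetical rows from two sources with inconsistent formats.
legacy_rows = [
    {"cust": " alice ", "amount": "100.50"},
    {"cust": "BOB", "amount": "n/a"},
]
packaged_rows = [{"cust": "Carol", "amount": "75"}]


def extract():
    """Extract: gather raw rows from every source."""
    return legacy_rows + packaged_rows


def transform(rows):
    """Transform: standardize field names and the customer name format."""
    for row in rows:
        yield {"customer": row["cust"].strip().title(), "amount": row["amount"]}


def cleanse(rows):
    """Cleanse: drop rows whose amount is not a valid number."""
    for row in rows:
        try:
            yield (row["customer"], float(row["amount"]))
        except ValueError:
            continue  # reject the dirty row instead of loading it


def load(conn, rows):
    """Load: write the cleansed rows into the warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(conn, cleanse(transform(extract())))
loaded = conn.execute(
    "SELECT customer, amount FROM sales ORDER BY customer"
).fetchall()
print(loaded)
```

A production ETL tool would typically log rejected rows rather than silently dropping them; the sketch omits that for brevity.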

ETL
Issues affecting the purchase of an ETL tool
Data transformation tools are expensive
Data transformation tools may have a long learning curve

Important criteria in selecting an ETL tool


Ability to read from and write to an unlimited number of
data sources/architectures
Automatic capturing and delivery of metadata
A history of conforming to open standards
An easy-to-use interface for the developer and the
functional user

Data Warehouse Development

Data warehouse development approaches
Inmon Model: EDW approach (top-down)
Kimball Model: Data mart approach (bottom-up)
Which model is best?
There is no one-size-fits-all strategy to DW
One alternative is the hosted warehouse

Data warehouse structure:
The Star Schema vs. Relational

Real-time data warehousing?

Representation of Data in DW
Dimensional modeling: a retrieval-based system that
supports high-volume query access
Star schema: the most commonly used and the simplest style
of dimensional modeling
Contains a fact table surrounded by and connected to several
dimension tables
Fact table: contains the numerical values (measures)
needed to perform decision analysis and query reporting
Dimension tables: contain classification and aggregation information
about the values in the fact table

Snowflake schema: an extension of the star schema where the
diagram resembles a snowflake in shape
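A star schema can be tried out with Python's built-in `sqlite3` module. The sketch below is an assumption-laden toy: the table names (`fact_sales`, `dim_product`, `dim_time`), the columns, and the sample rows are invented for illustration. It shows one fact table joined to two dimension tables and summarized by dimension attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Dimension tables hold descriptive, classifying attributes.
    CREATE TABLE dim_product (product_id INTEGER PRIMARY KEY, name TEXT, brand TEXT);
    CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, month TEXT, quarter TEXT);

    -- The fact table holds numeric measures plus keys to each dimension.
    CREATE TABLE fact_sales (
        product_id INTEGER REFERENCES dim_product(product_id),
        time_id INTEGER REFERENCES dim_time(time_id),
        units_sold INTEGER
    );

    INSERT INTO dim_product VALUES (1, 'Cola', 'BrandA'), (2, 'Juice', 'BrandB');
    INSERT INTO dim_time VALUES (1, 'Jan', 'Q1'), (2, 'Feb', 'Q1'), (3, 'Apr', 'Q2');
    INSERT INTO fact_sales VALUES (1, 1, 10), (1, 2, 5), (2, 1, 7), (2, 3, 3);
    """
)

# Units sold by brand and quarter: join the fact table to its dimensions.
rows = conn.execute(
    """
    SELECT p.brand, t.quarter, SUM(f.units_sold)
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_id = f.product_id
    JOIN dim_time AS t ON t.time_id = f.time_id
    GROUP BY p.brand, t.quarter
    ORDER BY p.brand, t.quarter
    """
).fetchall()
print(rows)
```

In a snowflake variant, `brand` and `quarter` would move out into their own tables referenced from `dim_product` and `dim_time`, at the cost of extra joins.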

Multidimensionality
Multidimensionality
The ability to organize, present, and analyze data by
several dimensions, such as sales by region, by product, by
salesperson, and by time (four dimensions)

Multidimensional presentation
Dimensions: products, salespeople, market segments, business units,
geographical locations, distribution channels, country, or industry
Measures: money, sales volume, head count, inventory profit, actual
versus forecast
Time: daily, weekly, monthly, quarterly, or yearly

Star vs Snowflake Schema

[Figure: Star schema: fact table SALES (UnitsSold, ...) linked to dimensions TIME (Quarter, ...), PRODUCT (Brand, ...), PEOPLE (Division, ...), and GEOGRAPHY (Country, ...)]

[Figure: Snowflake schema: fact table SALES (UnitsSold, ...) linked to dimensions DATE (Date), PRODUCT (LineItem, ...), PEOPLE (Division, ...), and STORE (LocID, ...), with further dimensions MONTH (M_Name), QUARTER (Q_Name), BRAND (Brand), CATEGORY (Category), and LOCATION (State, ...)]

Analysis of Data in DW
Online analytical processing (OLAP)
Data-driven activities performed by end users to query the
online system and to conduct analyses
Data cubes, drill-down / rollup, slice & dice,

OLAP Activities
Generating queries (query tools)
Requesting ad hoc reports
Conducting statistical and other analyses
Developing multimedia-based applications

Analysis of Data Stored in DW
OLTP vs. OLAP

OLTP (online transaction processing)
A system that is primarily responsible for capturing and
storing data related to day-to-day business functions
such as ERP, CRM, SCM, POS,
The main focus is on efficiency of routine tasks

OLAP (online analytical processing)
A system designed to address the need for
information extraction by providing effective and
efficient ad hoc analysis of organizational data
The main focus is on effectiveness

Application-Orientation vs.
Subject-Orientation

[Figure: an application-oriented operational database organized by Loans, Credit Card, Trust, and Savings, contrasted with a subject-oriented data warehouse organized by Customer, Vendor, Product, and Activity]

OLAP vs. OLTP

OLTP vs Data Warehouse

OLTP                                       Warehouse (DSS)
Application oriented                       Subject oriented
Used to run business                       Used to analyze business
Detailed data                              Summarized and refined
Current, up to date                        Snapshot data
Isolated data                              Integrated data
Repetitive access                          Ad-hoc access
Clerical user                              Knowledge user (manager)

OLTP vs Data Warehouse

OLTP                                       Data Warehouse
Performance sensitive                      Performance relaxed
Few records accessed at a time (tens)      Large volumes accessed at a time (millions)
Read/update access                         Mostly read (batch update)
No data redundancy                         Redundancy present
Database size: 100 MB to 100 GB            Database size: 100 GB to a few terabytes

OLTP vs Data Warehouse

OLTP                                       Data Warehouse
Transaction throughput is the              Query throughput is the
performance metric                         performance metric
Thousands of users                         Hundreds of users
Managed in entirety                        Managed by subsets

To summarize ...
OLTP systems are used to run a business
The data warehouse helps to optimize the business

OLAP Operations
Slice: a subset of a multidimensional array
Dice: a slice on more than two dimensions
Drill Down/Up: navigating among levels of data
ranging from the most summarized (up) to the most
detailed (down)
Roll Up: computing all of the data relationships for
one or more dimensions
Pivot: used to change the dimensional orientation
of a report or an ad hoc query-page display
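The operations above can be approximated on a small in-memory cube. The following sketch is illustrative only: the cube contents and the helper names `slice_cube`, `dice_cube`, and `pivot` are hypothetical, and a dictionary keyed by dimension members stands in for a real OLAP server.

```python
# Hypothetical cube: (product, region, month) -> units sold.
cube = {
    ("Juice", "Europe", 1): 10,
    ("Juice", "India", 1): 5,
    ("Cola", "Europe", 1): 47,
    ("Cola", "Europe", 2): 30,
    ("Milk", "India", 2): 12,
}


def slice_cube(cells, region):
    """Slice: fix one dimension (region) and keep the remaining two."""
    return {(p, m): v for (p, r, m), v in cells.items() if r == region}


def dice_cube(cells, products, regions, months):
    """Dice: keep a sub-cube by restricting several dimensions at once."""
    return {
        (p, r, m): v
        for (p, r, m), v in cells.items()
        if p in products and r in regions and m in months
    }


def pivot(table):
    """Pivot: swap the row and column axes of a two-dimensional view."""
    return {(col, row): v for (row, col), v in table.items()}


europe = slice_cube(cube, "Europe")
sub_cube = dice_cube(cube, {"Juice", "Cola"}, {"Europe"}, {1})
rotated = pivot(europe)
print(europe)
print(sub_cube)
print(rotated)
```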

Slicing Operations on a
Simple Three-Dimensional
Data Cube

[Figure: a 3-dimensional OLAP cube with slicing operations; axes are Time, Geography, and Product, and cells are filled with numbers representing sales volumes. Three slices are shown:]
Sales volumes of a specific Product on variable Time and Region
Sales volumes of a specific Region on variable Time and Products
Sales volumes of a specific Time on variable Region and Products

Variations of OLAP

Multidimensional OLAP (MOLAP)
OLAP implemented via a specialized
multidimensional database (or data store) that
summarizes transactions into multidimensional
views ahead of time
Relational OLAP (ROLAP)
The implementation of an OLAP database on top of
an existing relational database
Database OLAP and Web OLAP (DOLAP and WOLAP);
Desktop OLAP

ROLAP / MOLAP Approaches

Relational OLAP: 3 Tier DSS

Database Layer (Data Warehouse)
Store atomic data in industry standard RDBMS.
Application Logic Layer (ROLAP Engine)
Generate SQL execution plans in the ROLAP
engine to obtain OLAP functionality.
Presentation Layer (Decision Support Client)
Obtain multidimensional reports from the DSS Client.

MD-OLAP: 2 Tier DSS

Database Layer and Application Logic Layer (MDDB Engine)
Store atomic data in a proprietary data structure
(MDDB), pre-calculate as many outcomes as
possible, obtain OLAP functionality via proprietary
algorithms running against this data.
Presentation Layer (Decision Support Client)
Obtain multi-dimensional reports from the DSS Client.
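The contrast between the two approaches can be sketched roughly as follows. This is a simplified illustration, not how any particular product works: `rolap_sql` and `molap_precompute` are hypothetical names, the fact rows are invented, and real engines use far more elaborate query planning and storage structures.

```python
from collections import defaultdict
from itertools import combinations

DIMENSIONS = ("product", "region", "month")

# Hypothetical atomic fact rows.
facts = [
    {"product": "Cola", "region": "Europe", "month": 1, "sales": 47},
    {"product": "Cola", "region": "India", "month": 1, "sales": 30},
    {"product": "Juice", "region": "Europe", "month": 2, "sales": 10},
]


def rolap_sql(group_by, table="fact_sales", measure="sales"):
    """ROLAP style: translate a multidimensional request into SQL text."""
    if not group_by:
        raise ValueError("at least one dimension is required")
    unknown = [dim for dim in group_by if dim not in DIMENSIONS]
    if unknown:
        raise ValueError(f"unknown dimensions: {unknown}")
    columns = ", ".join(group_by)
    return f"SELECT {columns}, SUM({measure}) FROM {table} GROUP BY {columns}"


def molap_precompute(rows):
    """MOLAP style: pre-calculate totals for every combination of dimensions."""
    store = defaultdict(int)
    for size in range(len(DIMENSIONS) + 1):
        for dims in combinations(DIMENSIONS, size):
            for row in rows:
                key = (dims, tuple(row[dim] for dim in dims))
                store[key] += row["sales"]
    return dict(store)


query = rolap_sql(("region",))
store = molap_precompute(facts)
print(query)
print(store[(("region",), ("Europe",))])
```

The ROLAP function defers the work to the relational database at query time, while the MOLAP function pays the aggregation cost up front so that a later lookup is a single dictionary access.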

Massive DW and Scalability

Scalability
The main issues pertaining to scalability:
The amount of data in the warehouse
How quickly the warehouse is expected to grow
The number of concurrent users
The complexity of user queries

Good scalability means that queries and other
data-access functions will grow linearly with the
size of the warehouse

Symmetric Multi Processing

[Figure: CPUs, shared memory, and disks attached to a shared symmetric system bus, each one hop away]

Figure 6: All components are equidistant in classical SMP architectures.

Real-time/Active DW/BI
Enabling real-time data updates for real-time
analysis and real-time decision making is
growing rapidly
Push vs. Pull (of data)

Concerns about real-time BI
Not all data should be updated continuously
Mismatch of reports generated minutes apart
May be cost prohibitive
May also be infeasible

RDW / ADW

Description
  Batch: Data is loaded in full or incrementally using an off-peak window.
  Mini-Batch: Data is loaded incrementally using intra-day loads.
  Micro-Batch: Source changes are captured and accumulated to be loaded in intervals.
  Real-Time: Source changes are captured and immediately applied to the DW.

Latency
  Batch: Daily or higher
  Mini-Batch: Hourly or higher
  Micro-Batch: 15 min and higher
  Real-Time: Second(s)

Capture
  Batch: Filter query
  Mini-Batch: Filter query
  Micro-Batch: CDC
  Real-Time: CDC

Initialization
  Batch: Pull
  Mini-Batch: Pull
  Micro-Batch: Push, then pull
  Real-Time: Push

Target Load
  Batch: High impact
  Mini-Batch, Micro-Batch, Real-Time: Low impact, load frequency is tunable

Source Load
  Batch: High impact
  Mini-Batch: Queries at peak times necessary
  Micro-Batch, Real-Time: Some to none, depending on CDC technique
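The difference between micro-batch and real-time loading with change data capture (CDC) can be sketched as below. This is a toy model under stated assumptions: the change log, its sequence numbers, and the helper names are hypothetical, and a batch size of 1 is used as a stand-in for applying each change immediately.

```python
# Hypothetical change log captured at the source: (sequence, operation, row).
source_log = [
    (1, "insert", {"id": 1, "qty": 5}),
    (2, "update", {"id": 1, "qty": 8}),
    (3, "insert", {"id": 2, "qty": 3}),
    (4, "delete", {"id": 2}),
]


def capture_changes(log, last_seq):
    """CDC: return only the changes recorded after the last applied sequence."""
    return [change for change in log if change[0] > last_seq]


def apply_change(target, op, row):
    """Apply a single insert, update, or delete to the warehouse copy."""
    if op == "delete":
        target.pop(row["id"], None)
    else:
        target[row["id"]] = dict(row)


def load_in_batches(log, target, batch_size):
    """Apply captured changes batch by batch; return the number of loads."""
    last_seq = 0
    loads = 0
    while True:
        pending = capture_changes(log, last_seq)[:batch_size]
        if not pending:
            return loads
        for seq, op, row in pending:
            apply_change(target, op, row)
            last_seq = seq
        loads += 1


micro_batch_target = {}
micro_batch_loads = load_in_batches(source_log, micro_batch_target, batch_size=2)

real_time_target = {}
real_time_loads = load_in_batches(source_log, real_time_target, batch_size=1)

print(micro_batch_loads, real_time_loads)
print(micro_batch_target == real_time_target)
```

Both strategies converge on the same warehouse state; they differ in how many loads the target sees and how stale the data can be between loads.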

RDW / ADW
Need for real-time data warehousing
Decision Support has become operational
Integrated BI requires closed-loop analytics
The reach and impact of information access for
decision making can affect customer service, SCM,
and beyond.
Traditional hub-and-spoke architecture is difficult to
keep in sync
One huge BW so that data is centralized for BI/BA
tools

Real-time/Active DW at Teradata

Enterprise Decision Evolution and DW

Traditional vs Active DW
Environment

DW Administration and Security

Data warehouse administrator (DWA)
DWA should
have knowledge of high-performance software, hardware and
networking technologies.
possess solid business knowledge and insight.
be familiar with the decision-making processes so as to suitably
design/maintain the data warehouse structure.
possess excellent communication skills.

Security and privacy are pressing issues in DW
Safeguarding the most valuable assets
Government regulations
Must be explicitly planned and executed

The Future of DW
Sourcing
Open source software
SaaS (software as a service)
Cloud computing
DW appliances

Infrastructure
Real-time DW
Data management practices/technologies
In-memory processing (super-computing)
New DBMS
Advanced analytics

MDM
Master Data Management (MDM)
Master Data Management
Operational versus Analytical Master Data Management
Demystifying Master Data Management
Would You Like Fries With That? And Does Cross-Selling Justify Master Data Management?
Data management's top eight stories of 2008
Human resources data analytics brings metrics to
workforce management

Master Data Management

Affecto 2008

Master data: what is it?

Master data is data shared across computer systems
in the enterprise.
Master data is the dimension or hierarchy data in
data warehouses and transactional systems
Master data is core business objects shared by
applications across an enterprise
Slowly changing reference data shared across
systems
Master data is data worth managing

Master data vs. Metadata vs. Transactional

Transactional
Company        Country   Account   SubAccount    Date                  Amount
Affecto        NO        505050    500           20080301              KR30.000

Metadata
Company        Country   Account   Sub-Account   Date                  Amount
Text           Text      Integer   Integer       Date                  Float
nVarchar(50)   Char(2)   Int(6)    Int(3)        Datetime (YYYYMMDD)   Decimal

Master data
Products: Software, Hardware, CPU
Customers: Affecto OY, Affecto AS, Affecto AB
Country: Europe, Norway, Sweden
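One way to see how metadata relates to transactional data is to use the metadata to validate a transactional row. The sketch below is a loose illustration: the rule format, the `validate` function, and the rejected row are assumptions for the example, and the Amount column is left out because its currency-formatted value would need parsing rules the slide does not give.

```python
from datetime import datetime

# Metadata adapted from the slide: type and size of each column.
metadata = {
    "Company": {"type": str, "max_len": 50},
    "Country": {"type": str, "max_len": 2},
    "Account": {"type": int, "max_digits": 6},
    "SubAccount": {"type": int, "max_digits": 3},
    "Date": {"type": str, "format": "%Y%m%d"},
}


def validate(row, meta):
    """Return a list of rule violations for one transactional row."""
    errors = []
    for field, rule in meta.items():
        value = row.get(field)
        if not isinstance(value, rule["type"]):
            errors.append(f"{field}: expected {rule['type'].__name__}")
            continue
        if "max_len" in rule and len(value) > rule["max_len"]:
            errors.append(f"{field}: longer than {rule['max_len']} characters")
        if "max_digits" in rule and len(str(abs(value))) > rule["max_digits"]:
            errors.append(f"{field}: more than {rule['max_digits']} digits")
        if "format" in rule:
            try:
                datetime.strptime(value, rule["format"])
            except ValueError:
                errors.append(f"{field}: does not match {rule['format']}")
    return errors


good_row = {"Company": "Affecto", "Country": "NO", "Account": 505050,
            "SubAccount": 500, "Date": "20080301"}
# Hypothetical bad row: country code too long, date in the wrong format.
bad_row = {"Company": "Affecto", "Country": "NOR", "Account": 505050,
           "SubAccount": 500, "Date": "2008-03-01"}

good_errors = validate(good_row, metadata)
bad_errors = validate(bad_row, metadata)
print(good_errors)
print(bad_errors)
```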

Master data applications

Product master data
Product Information Management (PIM)
Customer master data
Customer Data Integration (CDI)
Analytical master data
Hierarchies used for reporting
Other possible
Recipe master data
Vendor master data
Employee master data

What is Master Data Management?

The processes and technology to produce and
maintain a single clean copy of master data
The golden record
An application for creating and maintaining an
authoritative view of master data, including
policies and procedures for access, update,
modification, viewing between systems across
the enterprise
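The notion of a golden record can be illustrated with a simple survivorship rule. This sketch assumes one possible rule, "the most recently updated non-empty value wins"; real MDM tools offer many other rules. The records, field names, and the `golden_record` function are hypothetical.

```python
# Hypothetical copies of one customer held in two systems.
records = [
    {"source": "ERP", "updated": "2008-01-10",
     "name": "Acme AS", "country": "NO", "phone": ""},
    {"source": "CRM", "updated": "2008-03-01",
     "name": "Acme A.S.", "country": "", "phone": "555-0100"},
]


def golden_record(recs, fields):
    """Merge records field by field; the newest non-empty value survives."""
    ordered = sorted(recs, key=lambda rec: rec["updated"])  # oldest first
    golden = {}
    for field in fields:
        for rec in ordered:
            if rec[field]:
                golden[field] = rec[field]  # newer values overwrite older ones
    return golden


golden = golden_record(records, ["name", "country", "phone"])
print(golden)
```

Here the country survives from the older ERP record because the newer CRM record leaves it empty.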

Why master data management?

Different people involved
Inevitable manual process
Error-prone, inconsistent
No way to audit
No way to roll back changes
Time and resource consuming
Updates are interpreted by systems experts

[Figure: systems PPS, Essbase, Analysis Services, DW, Dynamics, ERP, SAP, and Custom, holding master data (Accounts, Entity, Project, Product, Location, Channel) and connected by ETL and EAI; Business User, IT Admin, PPS Admin, and Essbase Admin exchange changes via e-mail, an admin spreadsheet, and a review spreadsheet]

Master data Management solution

Single version of truth
Master data synchronized and validated
Data maintained by business users and
domain experts, not systems experts

[Figure: Business User maintains master data (Accounts, Entity, Project, Product, Location, Channel) in MDM, which feeds PPS, Essbase, Analysis Services, DW, Dynamics, ERP, SAP, and Custom systems via ETL and EAI]

Governance & Compliance

Master data governance
Can you track changes in dimensions?
Do you know who made the changes?
Do you know when changes occurred?
Can you produce a dimension from Q2 last year?

Compliance
International accounting standard
Transparency and auditability
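The governance questions above (who changed what, when, and what a dimension looked like at a past date) can be answered if every change is kept as a dated version rather than overwritten. The sketch below shows one common way to do this, often called a Type 2 slowly changing dimension; the member keys, dates, user names, and function names are hypothetical.

```python
from datetime import date

# Each version row records its value, validity period, and author.
history = []


def set_member(hist, key, value, when, who):
    """Close the current version of a member and append the new one."""
    for row in hist:
        if row["key"] == key and row["valid_to"] is None:
            row["valid_to"] = when
    hist.append({"key": key, "value": value, "valid_from": when,
                 "valid_to": None, "changed_by": who})


def dimension_as_of(hist, when):
    """Rebuild the dimension as it looked on a given date."""
    return {
        row["key"]: row["value"]
        for row in hist
        if row["valid_from"] <= when
        and (row["valid_to"] is None or when < row["valid_to"])
    }


set_member(history, "P1", "Hardware", date(2007, 1, 1), "alice")
set_member(history, "P1", "Software", date(2007, 9, 1), "bob")

q2_view = dimension_as_of(history, date(2007, 5, 15))
current_view = dimension_as_of(history, date(2008, 1, 1))
print(q2_view, current_view)
print([(row["changed_by"], row["valid_from"]) for row in history])
```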

DW Implementation Issues
Tasks for successful DW implementation
Establishment of service-level agreements and data-refresh
requirements
Identification of data sources and their governance policies
Data quality planning
Data model design
ETL tool selection
Relational database software and platform selection
Data transport
Data conversion
End-user support

DW Implementation Guidelines

Project must fit with corporate strategy & business
objectives
There must be complete buy-in to the project by executives,
managers, and users
It is important to manage user expectations about the
completed project
The data warehouse must be built incrementally
Build in adaptability, flexibility and scalability
The project must be managed by both IT and business
professionals
Only load data that have been cleansed and are of a quality
understood by the organization
Do not overlook training requirements

Successful DW Implementation
Things to Avoid
Starting with the wrong sponsorship chain
Setting expectations that you cannot meet
Engaging in politically naive behavior
Loading the data warehouse with information just
because it is available
Believing that data warehousing database design is
the same as transactional database design
Choosing a data warehouse manager who is
technology oriented rather than user oriented

Failure Factors in DW Projects

Lack of executive sponsorship
Unclear business objectives
Cultural issues being ignored
Change management
Unrealistic expectations
Inappropriate architecture
Low data quality / missing information
Loading data just because it is available

BI / OLAP Portal for Learning


MicroStrategy, and much more
www.TeradataStudentNetwork.com
Pw: <check with TDUN>

Q&A
