
Unit 11: Analysis Process Designer and Data Mining

Agenda

 Analysis Process Designer (APD)

 Integration of the Data Mining Workbench

 Exercise

 SAP AG 2003, Title of Presentation, Speaker Name / 2


Do you know...

...exactly which data are available in your SAP BW?

...the relationships between your data?

...the potential of your data in detail?

...whether you are using the potential of your data?



New Feature in SAP BW…

 Exploration of data available through all SAP BW storage media

 Application of advanced analytical methods

 Goal: gain new insights from these data

New insights

 Meaningful relationships between these data that are hidden or too complex to be uncovered through pure observation or intuition.



An Analysis Process

[Figure: Sources → Transformation → Target]



The Analysis Process Designer
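[Figure: the analysis cycle, with the labels: Selection (SAP BW); Preparation; Transformation (illustrated by formulas such as x² + y² + 2dx + 2ey + f = 0 and (x, y) = F(x², y²)); Update and analysis in SAP BW; Transfer of results into OLTP systems (e.g. SAP CRM) and application there, e.g. campaign target groups]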



Integration of the Analysis Process Designer in SAP BW

[Figure: the stages of SAP BW]
1. Data extraction from disparate data sources
2. Data storage / consolidation and structuring
3. Preparation and delivery of the information (reports and analyses)

Analysis Process Designer: "Enhanced Transfer Rules"



The Interactive Workbench of the APD

[Screenshot of the workbench, with callouts:]
 Analysis Process Repository
 Drag & Drop
 "Step-by-Step" context menu for display and settings



APD – Process Overview

Step 1: Select Data
Step 2: Prepare Data
Step 3: Transform Data
Step 4: Store/Transfer Data
Step 5: Deploy Data

[Figure: results are stored in BW or transferred to another system (e.g. CRM) and deployed there, for example as campaign target groups or a customer ABC analysis (classes A, B, C)]



APD – Sources (Step 1)

Based on the task or problem at hand, data must be provided that is statistically and semantically relevant. The APD provides a mechanism for delivering this data in an easy-to-use, graphical environment. Subsequent steps in the process can then rely on a firm basis for continued analysis.

Read from the following sources:

Characteristic: Read data from InfoObject master data

InfoProvider: Use an InfoCube, ODS object, or MultiProvider as source

Query: Read data from a query

Flat File: Read data from a flat file

Database Tables: Read data from a database table (hidden with BW 3.5)



APD Data Transformation (Step 2 & Step 3)

Preparation (Step 2):

To maintain the quality of the analysis process results, a clean, complete and error-free database is crucial. To ensure this quality, the APD provides basic data operations to prepare and cleanse the raw data.

Transformation (Step 3):

With the help of robust transformations, hidden relationships in the information can be discovered and made apparent.



APD – Data Transformation I: Preparation (Step 2)

Use the following operations to prepare your data:

Filter: Restrict the amount of data to be processed

Aggregation: Group and aggregate data according to selected fields

Join: Merge data from two different sources

Sort: Sort the data according to the selected fields

Transpose into columns: Transform flat data records into a list

Transpose into rows: Transform a list into flat data records

Hide columns: Hide entire columns
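
A minimal sketch of several of these preparation operations in plain Python, assuming the records are held as dictionaries. The sample data, field names and helper functions are hypothetical and are not APD functionality; the sketch only illustrates what filter, join, hide columns, aggregation and sort do to a set of records.

```python
from collections import defaultdict

# Hypothetical sample data; field names are illustrative only.
sales = [
    {"customer": "C1", "region": "North", "revenue": 120.0},
    {"customer": "C2", "region": "South", "revenue": 80.0},
    {"customer": "C1", "region": "North", "revenue": 30.0},
    {"customer": "C3", "region": "North", "revenue": 0.0},
]
segments = {"C1": "Retail", "C2": "Wholesale", "C3": "Retail"}  # customer -> segment


def filter_rows(rows, predicate):
    """Filter: restrict the amount of data to be processed."""
    return [r for r in rows if predicate(r)]


def join(rows, lookup, key_field, new_field):
    """Join: merge an attribute from a second source via a shared key."""
    return [{**r, new_field: lookup[r[key_field]]} for r in rows if r[key_field] in lookup]


def hide_columns(rows, hidden):
    """Hide columns: drop entire fields from every record."""
    return [{k: v for k, v in r.items() if k not in hidden} for r in rows]


def aggregate(rows, group_field, value_field):
    """Aggregation: group by one field and sum another."""
    totals = defaultdict(float)
    for r in rows:
        totals[r[group_field]] += r[value_field]
    return [{group_field: k, value_field: v} for k, v in totals.items()]


def sort_rows(rows, field, descending=False):
    """Sort: order the records by the selected field."""
    return sorted(rows, key=lambda r: r[field], reverse=descending)


prepared = filter_rows(sales, lambda r: r["revenue"] > 0)
prepared = join(prepared, segments, "customer", "segment")
prepared = hide_columns(prepared, {"region"})
prepared = aggregate(prepared, "segment", "revenue")
prepared = sort_rows(prepared, "revenue", descending=True)
```

Each helper returns a new list, so the intermediate result after every step can be inspected on its own, in the spirit of checking each process node individually.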



APD – Data Transformation II (Step 3)

The following functions carry out the transformation of the data:

ABC Classification: Calculation of ABC Classification

Regression: Application of linear/non-linear Regression algorithm

Clustering: Application of Clustering algorithm

Scoring: Application of “Weighted Score Table“ algorithm

Decision Tree: Application of Decision Tree algorithm

Data Mining: Application of external (non-SAP) Data Mining Models

Routine: Custom transformation via ABAP routine



APD – Visualization of Intermediate Results

A cleansed, complete and error-free database is crucial for meaningful results in an analysis process. It is built up step by step within the APD. At the same time, it must be possible to check each process step individually.

To facilitate the discovery process, the APD provides visualization tools with which intermediate results can be easily displayed or the quality of the data can be analyzed:

 Display Data: the data per "process node" can be displayed at any time in tabular format.
 Elementary Statistics: advanced visualization methods for a quick view of the basic properties and quality of the interim results per "process node". This functionality includes histograms, distributions and other visualizations, as well as basic statistical measures such as means, standard deviations and correlations.
 Calculate Intermediate Results: the interim results of each "process node" can be stored temporarily for performance reasons.
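
The elementary statistics named above can be illustrated with a short Python sketch. It assumes a hypothetical interim result of (age, revenue) pairs; the data and helper names are illustrative and do not reproduce the APD implementation.

```python
import math
import statistics
from collections import Counter

# Hypothetical interim result of a process node: (age, revenue) per customer.
records = [(25, 100.0), (35, 150.0), (45, 200.0), (55, 250.0), (65, 300.0)]
ages = [age for age, _ in records]
revenues = [revenue for _, revenue in records]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def histogram(values, bin_width):
    """Count values per bin; each key is the lower bound of its bin."""
    return Counter((v // bin_width) * bin_width for v in values)


mean_age = statistics.mean(ages)
stdev_age = statistics.stdev(ages)  # sample standard deviation
corr = pearson(ages, revenues)      # close to 1 for this strictly linear sample
age_hist = histogram(ages, 20)
```

A correlation near 1 or -1 between two fields, or a histogram with an unexpected empty or overfull bin, is the kind of quick quality signal such a view is meant to give before the next process step runs.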



APD – “Data Display“



APD – Basic Statistics





APD – Data Targets (Step 4)

Write results back to the following targets:

ODS Object: Load results back to a transactional ODS object; from here, updates into other SAP BW data targets can be performed.

Master Data: Update InfoObject master data

OLTP System: Transfer results to an OLTP system, e.g. a CRM system

Data Mining Models as Data Targets:

Association Analysis: Training/Application of the Association Analysis algorithm

Regression: Training of linear/non-linear Regression algorithm

Clustering: Training of Clustering algorithm

Decision Tree: Training of Decision Tree algorithm



ETL versus APD

At first sight, the APD seems to offer functions similar to those performed by ETL tools in data warehousing solutions. Nevertheless, it is important to note that these are two completely different applications with different objectives.

1. ETL Process
Extraction: data procurement, that is, selection of relevant data from the source systems and delivery to the data work area.

Transformation: processing and massaging the data in the workspace to meet the specified structure and quality requirements.

Loading: physically bringing the data from the workspace into the Data Warehouse.

2. APD Process
In an Analysis Process, existing data in SAP BW (the Data Warehouse) are accessed, completely new data are created by specialized transformations, and only the new data are written back to the SAP BW database or to an operational system.



ETL versus APD

[Figure: ETL process, from the Source Systems to the Data Warehouse: Data ≈ Data (data copies). APD process, within the Data Warehouse: Data + New Data (new data).]



Business Benefits

The most important benefits that can be realized with this improved information are:

 Cost reduction (TCO)

 Revenue improvement (ROI)

 Improved customer experience and satisfaction

Because it is 100% integrated into SAP BW, the Analysis Process Designer (including the Data Mining features) also ensures that only a single database is accessed, rather than different data tables in different source systems. This significantly reduces interfacing problems as well as related issues with data integrity, quality and system performance.



Agenda

 Analysis Process Designer (APD)

 Integration of the Data Mining Workbench

 Exercise



Definition of Data Mining

"Data mining is the process of discovering meaningful new correlations, patterns and trends by 'mining' large amounts of stored data using pattern recognition technologies, as well as statistical and mathematical techniques." (Ashby, Simms (1998))



What Is Data Mining?

Data mining helps you turn data into knowledge:

 Data mining is an analytical approach that looks for hidden information patterns in large databases

 Data mining helps to turn high-volume data into high-value information

Data mining helps you turn data into action:

 Data mining not only provides insights by analyzing past data, but it is also capable of predicting future trends and behaviors

 Data mining allows organizations to make the critical jump from retrospective analysis to prospective decision-making

 The overall objective of data mining is “knowledge discovery”.


Definition of “Knowledge Discovery in Databases”

By “Knowledge Discovery in Databases” (KDD) we mean a process-based approach to the search for potentially high-quality knowledge in source databases. Data Mining methods sit at the center of this process. Nevertheless, the overall quality of Data Mining results also depends on both the preliminary preparation steps and the post-processing activities.



Crisp-Model (“Cross-Industry Standard Process for Data Mining”)*

[Figure: the CRISP cycle arranged around the data: Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation, Deployment]

* Chapman et al. 1998



Crisp-Model ("Cross-Industry Standard Process for Data Mining")

Business Understanding: Description of the business objectives and data mining goals / success

Data Understanding: Selection of the data and exploratory analysis (quality, problems, description of selected data)

Data Preparation: Cleaning, transformation, integration, formatting of the selected data

Modeling: Selection, building, testing and running different models

Evaluation: Approval of the models and assessment of the results (in accordance with the defined objectives), review of the process

Deployment: Preparation of final reports, presentation, action plans and deployment of results



Definition of “Knowledge Discovery in Databases”

There are several divergent KDD methodologies / models. Nearly all can
be summarized / synthesized into the following 5 main phases:

1. Task Analysis: Task; Business Understanding; Problem definition; Analysis of requirements
2. Preprocessing: Data Selection; Data Cleaning; Data Preparation; Data Transformation
3. Data Mining: Model Development (Training); Running Models (Prediction)
4. Postprocessing: Output Generation; Evaluation/Analysis of Results
5. Deployment: Deployment of results



SAP Data Mining before BW 3.5

 Data Mining methods have been available in SAP BW since Release 3.0B:
 ABC
 Association Analysis
 Regression
 Decision Tree
 Clustering

 Training and application of the SAP Data Mining methods via a separate workbench (Data Mining Workbench)

 Single "data source": BW Query

 Training, evaluation and application of Data Mining models via "Wizard"



APD – Process Overview

Step 1: Select Data
Step 2: Prepare Data
Step 3: Transform Data
Step 4: Store/Transfer Data
Step 5: Deploy Data

SAP Data Mining Integration

 The Analysis Process Designer functions as a new, graphical front end for Data Mining to train and apply Data Mining models.
 Training of Data Mining models (the model is a target node in the Analysis Process)
 Application of Data Mining models for prediction in Analysis Processes (the model is a transformation node in the Analysis Process)

[Figure: process diagram with BW, other system (e.g. CRM), campaign target groups and customer ABC analysis (A, B, C)]



Supervised Learning (Predictive)

Decision Tree
Overview: A tree-like way of representing a collection of hierarchical rules that lead to a class or value. A simple but powerful data mining tool that is very popular, probably due to its ease of setup.
Reason to use: Identification of behavior patterns, e.g. churn behavior, satisfaction analysis, risk analysis
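
To make the idea of hierarchical rules that lead to a class concrete, here is a minimal Python sketch that applies an already defined tree to a record. The tree, its field names, thresholds and classes are hypothetical; the sketch does not train a tree and does not reflect how SAP BW stores its models.

```python
# Hypothetical churn tree: inner nodes test one field, leaves hold the class.
# Field names, thresholds and classes are illustrative only.
tree = {
    "field": "complaints",
    "threshold": 2,
    "low": {"class": "stays"},            # complaints <= 2
    "high": {                             # complaints > 2
        "field": "contract_months",
        "threshold": 12,
        "low": {"class": "churns"},       # short contract
        "high": {"class": "stays"},       # long contract
    },
}


def classify(node, record):
    """Follow the rules from the root until a leaf (a class) is reached."""
    while "class" not in node:
        branch = "low" if record[node["field"]] <= node["threshold"] else "high"
        node = node[branch]
    return node["class"]


def rules(node, path=()):
    """List every root-to-leaf path as a readable if-then rule."""
    if "class" in node:
        return [" and ".join(path) + " -> " + node["class"]]
    field, limit = node["field"], node["threshold"]
    return (rules(node["low"], path + (f"{field} <= {limit}",))
            + rules(node["high"], path + (f"{field} > {limit}",)))


prediction = classify(tree, {"complaints": 5, "contract_months": 6})
all_rules = rules(tree)
```

Listing the paths as rules shows why the method is easy to interpret: every prediction can be traced back to a short chain of conditions.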

Scoring (Linear Regression)
Overview: Scoring serves to evaluate data records. In linear and non-linear regression, weighting functions (either derived from historical data or defined directly) enable quantitative predictions of one variable from the values of another.
Reason to use: Identification of requirements, e.g. in conjunction with any kind of loyalty program
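
As a sketch of how a weighting function can be derived from historical data and then used for prediction, the following Python example fits a straight line by ordinary least squares and applies it to a new value. The data and function names are hypothetical, and only the simple linear case with one input variable is shown.

```python
def train_linear(xs, ys):
    """Training: least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def predict(model, x):
    """Application: evaluate the fitted weighting function for a new record."""
    intercept, slope = model
    return intercept + slope * x


# Hypothetical history: number of purchases vs. revenue per customer.
purchases = [1, 2, 3, 4]
revenue = [30.0, 50.0, 70.0, 90.0]

model = train_linear(purchases, revenue)
score = predict(model, 6)
```

Keeping training and prediction in separate functions mirrors the split between training a model and applying it later to new records.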



Unsupervised Learning (Informative)

Clustering
Overview: Clustering segments data into so-called clusters in such a way that data with similar content are assigned to the same cluster, while the clusters differ from one another as much as possible.
Reason to use: Clustering can be used, e.g., by an insurance company to create customer groups with respect to income, age, insurance policy and known cases of damage. This makes it possible to identify which combinations of characteristics often occur together and to form corresponding customer segments.
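
A small Python sketch of the segmentation idea, using k-means as one common clustering technique. The source does not state which algorithm SAP BW applies, so the choice of k-means, the sample customers and the starting centroids are all assumptions made for illustration.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    every centroid to the mean of its points; repeat a fixed number of times."""
    centroids = list(centroids)
    assignment = []
    for _ in range(iterations):
        assignment = [
            min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        for c in range(len(centroids)):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return assignment, centroids


# Hypothetical insurance customers: (age, yearly income in thousands).
customers = [(25, 30), (27, 32), (30, 35), (55, 80), (58, 85), (60, 90)]
segments, centers = kmeans(customers, centroids=[customers[0], customers[-1]])
```

With real data the fields would usually be scaled first, since a field with a large numeric range otherwise dominates the distance.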

ABC Classification
Overview: The ABC classification is a frequently used analysis method for classifying objects (customers, products or employees) on the basis of a certain measure, such as revenue or profit.
Reason to use: E.g. customers can be classified into three classes (A, B, C) according to the amount of turnover realized with the company.
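
A minimal Python sketch of an ABC classification by cumulative revenue share. The thresholds of 80% and 95% are an assumed convention, not taken from the source, and the customer data are hypothetical.

```python
def abc_classify(revenue_by_customer, a_share=0.8, b_share=0.95):
    """Rank objects by the measure and assign classes by cumulative share.

    The 80% / 95% thresholds are an assumed convention, not from the source.
    """
    total = sum(revenue_by_customer.values())
    ranked = sorted(revenue_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    classes, cumulative = {}, 0.0
    for customer, revenue in ranked:
        cumulative += revenue
        share = cumulative / total
        if share <= a_share:
            classes[customer] = "A"
        elif share <= b_share:
            classes[customer] = "B"
        else:
            classes[customer] = "C"
    return classes


# Hypothetical revenue per customer.
revenue = {"C1": 500.0, "C2": 250.0, "C3": 150.0, "C4": 60.0, "C5": 40.0}
classes = abc_classify(revenue)
```

Other threshold schemes are possible, for example fixed class boundaries on the measure itself or on the number of objects per class.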



Unsupervised Learning (Informative)

Association Analysis
Overview: Association analysis finds regularities, above all in business operations, and formulates corresponding rules of the form "if a customer buys product A, he also buys products B and C".
Reason to use: Association analysis helps to find, e.g., cross-selling opportunities. The identified rules can be used to arrange associated products together in a catalogue, supermarket or web shop, or to systematically offer product C to customers who have already bought product A.
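
Rules of this kind are commonly rated by support (how often the products occur together) and confidence (how often the rule holds when its condition is met). The following Python sketch computes both for single-product rules on hypothetical baskets; the thresholds and names are assumptions, and the sketch is not the SAP BW algorithm.

```python
from itertools import combinations

# Hypothetical sales transactions (one set of products per purchase).
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "B", "C"},
    {"C"},
    {"A", "C"},
]


def support(itemset):
    """Fraction of transactions that contain every item of the itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)


def pair_rules(min_support=0.4, min_confidence=0.7):
    """Rules 'if X then Y' for single products X and Y above both thresholds."""
    items = sorted(set().union(*baskets))
    found = []
    for x, y in combinations(items, 2):
        for lhs, rhs in ((x, y), (y, x)):
            both = support({lhs, rhs})
            confidence = both / support({lhs})
            if both >= min_support and confidence >= min_confidence:
                found.append((lhs, rhs, both, confidence))
    return found


found_rules = pair_rules()
```

In this sample the rule "B implies A" holds in every basket that contains B, while "B implies C" is dropped because its confidence falls below the assumed threshold.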

Scoring (Weighted Score Tables)
Overview: Scoring serves to evaluate data records. With weighted score tables, weighting functions are defined manually for the individual model fields, and a single weighted sum is formed from them.
Reason to use: Evaluation of influencing factors with regard to future decisions.
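
The weighted sum can be sketched in a few lines of Python. The score table below, including its fields, weights and partial scores, is entirely hypothetical and only shows the mechanics of evaluating each model field and adding up the weighted results.

```python
# Hypothetical weighted score table: per model field a weight and a manually
# defined partial score for each value (or a function for numeric fields).
score_table = {
    "segment": {"weight": 0.5, "scores": {"Retail": 40, "Wholesale": 80}},
    "region": {"weight": 0.2, "scores": {"North": 50, "South": 100}},
    "orders": {"weight": 0.3, "scores": lambda n: min(n, 10) * 10},
}


def weighted_score(record):
    """Evaluate every model field and add up the weighted partial scores."""
    total = 0.0
    for field, rule in score_table.items():
        scores = rule["scores"]
        partial = scores(record[field]) if callable(scores) else scores[record[field]]
        total += rule["weight"] * partial
    return total


score = weighted_score({"segment": "Wholesale", "region": "North", "orders": 4})
```

Unlike regression, nothing is learned from data here: the weights and partial scores express the analyst's own judgment.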



Positioning of Data Mining in SAP BW

KDD phases:
1. Task Analysis: Task; Business Understanding; Problem definition; Analysis of requirements
2. Preprocessing: Data Selection; Data Cleaning; Data Preparation; Data Transformation
3. Data Mining: Model Development (Training); Running Models (Prediction)
4. Postprocessing: Output Generation; Evaluation/Analysis of Results
5. Deployment: Deployment of results

APD steps shown beneath the phases:
Step 1: Select Data
Step 2: Preparation
Step 3: Transformation
Step 4: Store/Transfer Data (SAP BW or other systems, e.g. CRM)
Step 5: Deploy Data (e.g. campaign target groups, ABC analysis with classes A, B, C)



Agenda

 Analysis Process Designer (APD)

 Integration of the Data Mining Workbench

 Exercise
