Unit 11: Analysis Process Designer and Data Mining

Agenda
 Analysis Process Designer (APD)
 Integration of the Data Mining Workbench
 Exercise
© SAP AG 2003, Title of Presentation, Speaker Name / 2

Do you know...
...exactly which data is available in your SAP BW?
...the relationships between your data?
...the potential of your data in detail?
...whether you are using the potential of your data?

New Feature in SAP BW…
 Exploration of data available through all SAP BW storage media
 Application of advanced analytical methods
 Goal: gain new insights from these data
New insights
 Meaningful relationships between these data that are hidden or too complex to be uncovered through pure observation or intuition.

An Analysis Process
Sources -> Transformation -> Target

The Analysis Process Designer
[Figure: Selection -> Preparation -> Transformation (illustrated with formulas) -> Update and Analysis in SAP BW, or transfer of results into OLTP systems (e.g. SAP CRM) and application there, e.g. as campaign target groups]

Integration of the Analysis Process Designer in SAP BW
Data extraction from disparate data sources
Data Storage / Consolidation and Structuring
Preparation and delivery of the information (Reports and Analyses)
Analysis Process Designer: "Enhanced Transfer Rules"

The Interactive Workbench of the APD
Analysis Process Repository
Drag & Drop
"Step-by-Step"
Context Menu for Display and Settings

APD – Process Overview
Step 1: Select Data
Step 2: Prepare Data
Step 3: Transform Data
Step 4: Store/Transfer Data (SAP BW, other systems, e.g. CRM)
Step 5: Deploy Data (e.g. campaign target groups, ABC analysis of customers into classes A, B, C)

APD – Sources (Step 1)
Based on the task or problem at hand, data must be provided that is statistically and semantically relevant. The APD provides a mechanism for delivering this data in an easy-to-use, graphical environment. Subsequent steps in the process can then rely on a firm basis for continued analysis.
Read from the following sources:
Characteristic: Read data from InfoObject master data
InfoProvider: Use an InfoCube, ODS object, or MultiProvider as source
Query: Read data from a query
Flat File: Read data from a flat file
Database Tables: Read data from a database table (hidden with BW 3.5)

APD Data Transformation (Step 2 & Step 3)
Preparation (Step 2):
To maintain the quality of the analysis process results, a clean, complete and error-free data basis is crucial. To ensure this quality, the APD provides basic data operations to prepare and cleanse the raw data.
Transformation (Step 3):
The transformations make it possible to discover hidden relationships in the information and make them apparent.

APD – Data Transformation I: Preparation (Step 2)
Use the following operations to prepare your data:
Filter: Restrict the amount of data to be processed
Aggregation: Group and aggregate data according to selected fields
Join: Merge data from two different sources
Sort: Sort the data according to the selected fields
Transpose into columns: Transform flat data records into a list
Transpose into rows: Transform a list into flat data records
Hide columns: Hide entire columns
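
The preparation operations listed above can be illustrated outside the APD with a minimal Python sketch. It is not APD code: the records, the field names (customer, region, revenue) and the segment lookup are hypothetical sample data, and the steps only mimic what the Filter, Aggregation, Join, Sort and Hide columns nodes do conceptually.

```python
from collections import defaultdict

# Hypothetical sample records; field names are illustrative only.
sales = [
    {"customer": "C1", "region": "North", "revenue": 120.0},
    {"customer": "C2", "region": "South", "revenue": 80.0},
    {"customer": "C1", "region": "North", "revenue": 30.0},
    {"customer": "C3", "region": "North", "revenue": 0.0},
]
# Hypothetical second source used for the join.
segments = {"C1": "Gold", "C2": "Silver"}

# Filter: restrict the amount of data to be processed.
filtered = [r for r in sales if r["revenue"] > 0]

# Aggregation: group by customer and sum the revenue.
totals = defaultdict(float)
for r in filtered:
    totals[r["customer"]] += r["revenue"]

# Join: merge the aggregated data with the second source (inner join on customer).
joined = [
    {"customer": c, "revenue": v, "segment": segments[c]}
    for c, v in totals.items()
    if c in segments
]

# Sort: order the records by revenue, highest first.
joined.sort(key=lambda r: r["revenue"], reverse=True)

# Hide columns: keep only the fields that later steps need.
result = [{"customer": r["customer"], "revenue": r["revenue"]} for r in joined]
print(result)
```

Each step consumes the output of the previous one, which mirrors how nodes are chained in an analysis process.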

APD – Data Transformation II (Step 3)
The following functions transform the data:
ABC Classification: Calculation of ABC Classification
Regression: Application of linear/non-linear Regression algorithm
Clustering: Application of Clustering algorithm
Scoring: Application of “Weighted Score Table“ algorithm
Decision Tree: Application of Decision Tree algorithm
Data Mining: Application of external (non-SAP) Data Mining Models
Routine: Custom transformation via ABAP routine

APD – Visualization of Intermediate Results
A cleansed, complete and error-free data basis is crucial for meaningful results in an analysis process. This is achieved step by step within the APD. At the same time it must be possible to check each process step individually. To facilitate the discovery process, the APD provides visualization tools with which intermediate results can be easily displayed or the quality of the data can be analyzed:
 Display Data: the data per "process node" can be displayed at any time in tabular format.
 Elementary Statistics: Advanced visualization methods for a quick view of basic properties and quality of the interim results per "process node". This functionality includes histograms, distributions and basic statistical measures like means, standard deviations, correlations and visualizations.
 Calculate Intermediate Results: the interim results of each "process node" can be stored temporarily for performance reasons.
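
The measures named under Elementary Statistics (means, standard deviations, correlations, histograms) can be sketched in plain Python as follows. This only illustrates the measures themselves, not the APD function: the sample values, the bin width of 200 and the helper `pearson` are assumptions made for the example.

```python
import math
import statistics
from collections import Counter

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long value lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical interim result of one process node.
age = [25, 35, 45, 55]
revenue = [100.0, 200.0, 300.0, 400.0]

summary = {
    "mean": statistics.mean(revenue),
    "stdev": statistics.stdev(revenue),  # sample standard deviation
    "corr": pearson(age, revenue),
}

# Histogram: count the values per bin of width 200 (key = lower bin edge).
histogram = Counter(int(v // 200) * 200 for v in revenue)

print(summary)
print(dict(histogram))
```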

APD – “Data Display“

APD – Basic Statistics

APD – Data Targets (Step 4)
Write results back to the following targets:
ODS Object: Load results back to a transactional ODS object; from here, updates into other SAP BW data targets can be performed.
Master Data: Update InfoObject master data
OLTP System: Transfer results to an OLTP system, e.g. a CRM system
Data Mining Models as Data Targets:
Association Analysis: Training/Application of the Association Analysis algorithm
Regression: Training of linear/non-linear Regression algorithm
Clustering: Training of Clustering algorithm
Decision Tree: Training of Decision Tree algorithm

ETL versus APD
At first sight the APD seems to offer functions similar to those performed by ETL tools in Data Warehousing solutions. Nevertheless, it is important to note that these are two completely different applications with different objectives.
1. ETL Process
Extraction: data procurement, that is, selection of relevant data from source systems and supply to the data work area
Transformation: processing and massaging data to the specified structure and quality requirements of the data in the workspace.
Loading: bringing the data physically from the workspace into the Data Warehouse.
2. APD Process
In an Analysis Process, existing data in SAP BW (Data Warehouse) are accessed, completely new data are created by specialized transformations, and only the new data are written back into the database of SAP BW or an operational system.

ETL versus APD
[Figure: ETL process: data in the source systems ~= data in the Data Warehouse (data copies). APD process: data + new data in the Data Warehouse.]

Business Benefits
The most important benefits that can be realized with this improved information are:
 Cost reduction (TCO)
 Revenue improvement (ROI)
 Improved customer experience and satisfaction
Because it is 100% integrated into SAP BW, the Analysis Process Designer (incl. Data Mining features) also guarantees that only a single database is accessed and not different data tables in different source systems. This significantly decreases interfacing problems as well as related issues with data integrity, quality and system performance.

Agenda
 Exercise

Definition of Data Mining
"Data mining is the process of discovering meaningful new
correlations, patterns and trends by "mining" large amounts of stored
data using pattern recognition technologies, as well as statistical and
mathematical techniques." (Ashby, Simms (1998))

What Is Data Mining?
Data mining helps you turn data into knowledge:
 Data mining is an analytical approach that looks for hidden information patterns in large databases
 Data mining helps to turn high-volume data into high-value information
Data mining helps you turn data into action:
 Data mining not only provides insights by analyzing past data, but it is also capable of predicting future trends and behaviors
 Data mining allows organizations to make the critical jump from retrospective analysis to prospective decision-making
 The overall objective of data mining is "knowledge discovery".

Definition of “Knowledge Discovery in Databases”
"Knowledge Discovery in Databases" (KDD) denotes a process-based approach to the search for potentially high-quality knowledge in source databases. Data Mining methods sit at the center of this process. Nevertheless, the overall quality of Data Mining results also depends on the preliminary preparation steps and on the post-processing activities.

Crisp-Model ("Cross-Industry Standard Process for Data Mining")*
[Figure: cycle around the data with the phases Business Understanding, Data Understanding, Data Preparation, Modeling, Evaluation and Deployment]
* Chapman et al. 1998

Crisp-Model ("Cross-Industry Standard Process for Data Mining")
Business Understanding
Description of the Business Objectives and Data Mining Goals / Success
Data Understanding
Selection of the data and exploratory analysis (quality, problems, description of selected data)
Data Preparation
Cleaning, transformation, integration, formatting of the selected data
Modeling
Selection, building, testing and running different models
Evaluation
Approval of the models and assessment of the results (in accordance with the defined objectives), review of the process
Deployment
Preparation of final reports, presentation, action plans and deployment of results

Definition of “Knowledge Discovery in Databases”
There are several divergent KDD methodologies / models. Nearly all can be summarized / synthesized into the following 5 main phases:
Task Analysis: Task, Business Understanding, Problem definition, Analysis of requirements
Preprocessing: Data Selection, Data Cleaning, Data Preparation, Data Transformation
Data Mining: Model Development (Training), Running Models (Prediction)
Postprocessing: Output Generation, Evaluation/Analysis of Results
Deployment: Deployment of results

SAP Data Mining < BW 3.5
 Data Mining methods have been available in SAP BW since Release 3.0B:
 ABC
 Association Analysis
 Regression
 Decision Tree
 Clustering
 Training and application of the SAP Data Mining methods via a separate workbench (Data Mining Workbench)
 Single "data source": BW Query
 Training, evaluation and application of Data Mining models via "Wizard"

APD – Process Overview
SAP Data Mining Integration
Step 1: Select Data
Step 2: Prepare Data
Step 3: Transform Data
Step 4: Store/Transfer Data (SAP BW, other systems, e.g. CRM)
Step 5: Deploy Data (e.g. campaign target groups, ABC analysis of customers)
 Analysis Process Designer functions as a new, graphical frontend for Data Mining to train and apply Data Mining models.
 Train Data Mining model (the model is a target node in the Analysis Process)
 Application of Data Mining models for prediction in Analysis Processes (the model is a transformation node in the Analysis Process)


Supervised Learning (Predictive)
Decision Tree
Overview: A tree-like way of representing a collection of hierarchical rules that lead to a class or value. Simple but powerful data-mining tool that is very popular, probably due to its ease of setup.
Reason to use: Identification of behavior patterns, e.g. churn behavior, satisfaction analysis, risk analysis.
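
To make "a collection of hierarchical rules that lead to a class" concrete, here is a minimal Python sketch of applying a decision tree. It assumes a hand-written tree for a hypothetical churn scenario; the field names (complaints, tenure_years), thresholds and class labels are invented for illustration, and a trained model in SAP BW would derive its rules from historic data instead.

```python
# Hypothetical, hand-written tree; inner nodes test one field against a threshold,
# leaves are plain strings holding the predicted class.
tree = {
    "field": "complaints",
    "threshold": 2,
    "low": {
        "field": "tenure_years",
        "threshold": 1,
        "low": "churn risk",
        "high": "loyal",
    },
    "high": "churn risk",
}

def classify(node, record):
    """Walk from the root to a leaf by following the rule that matches the record."""
    while isinstance(node, dict):
        branch = "low" if record[node["field"]] < node["threshold"] else "high"
        node = node[branch]
    return node

print(classify(tree, {"complaints": 0, "tenure_years": 5}))
print(classify(tree, {"complaints": 3, "tenure_years": 5}))
```

The first record has few complaints and a long tenure, so it ends in the "loyal" leaf; the second exceeds the complaint threshold and is classified as "churn risk" at the first rule.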
Scoring (Linear Regression)
Overview: Scoring serves to evaluate data records. In linear and non-linear regression, weighting functions (either derived from historic data or directly defined) enable quantitative predictions of one variable from the values of another.
Reason to use: Identification of requirements, e.g. in conjunction with any kind of loyalty program.
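
As a small illustration of deriving a weighting function from historic data, the following Python sketch fits a straight line by ordinary least squares and uses it to predict one variable from another. It covers only the simplest case (one input variable, linear); the data values and the function names `fit_line` and `predict` are assumptions for the example and do not represent the SAP implementation.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (
        sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
        / sum((x - mean_x) ** 2 for x in xs)
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(x, slope, intercept):
    """Apply the fitted weighting function to a new value."""
    return slope * x + intercept

# Hypothetical historic data: x = number of purchases, y = bonus points used.
history_x = [1, 2, 3, 4]
history_y = [3, 5, 7, 9]

slope, intercept = fit_line(history_x, history_y)
forecast = predict(5, slope, intercept)
print(slope, intercept, forecast)
```

Training corresponds to `fit_line` (done once on historic data), while applying the model corresponds to `predict` on new records.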

Unsupervised Learning (Informative)
Clustering
Overview: Clustering segments data into so-called clusters in such a way that similar data records are assigned to one cluster, while the clusters differ from one another as far as possible.
Reason to use: Clustering can be used, e.g., by an insurance company to create customer groups with respect to income, age, insurance policy and known damage cases. Clustering thereby identifies which combinations of characteristics often occur together and form corresponding customer segments.
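
The idea of assigning similar records to the same cluster can be sketched with k-means, one common clustering technique. The source does not say which algorithm SAP BW uses, so k-means is an assumption chosen for illustration, as are the (age, income) sample points and the deterministic start from the first k points.

```python
def kmeans(points, k, iterations=10):
    """Very small k-means: returns the cluster centers and the clustered points."""
    centers = list(points[:k])  # deterministic start: the first k points
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest center.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centers[i])),
            )
            clusters[nearest].append(p)
        # Update step: each center moves to the mean of its cluster.
        centers = [
            tuple(sum(dim) / len(cluster) for dim in zip(*cluster))
            if cluster else centers[i]
            for i, cluster in enumerate(clusters)
        ]
    return centers, clusters

# Hypothetical customers as (age, income in thousands).
customers = [(25, 30), (60, 90), (27, 32), (62, 88), (24, 28), (58, 92)]
centers, clusters = kmeans(customers, k=2)
print(centers)
print(clusters)
```

With this sample data the younger, lower-income customers and the older, higher-income customers end up in separate segments.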
ABC Classification
Overview: The ABC classification is a frequently used analysis method that classifies objects (customers, products or colleagues) on the basis of a certain measurement category, such as revenue or profit.
Reason to use: E.g. customers can be classified into three classes (A, B, C) according to the amount of turnover realized with the company.
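
A minimal Python sketch of an ABC classification by revenue follows. The class limits are an assumption: here class A covers customers up to 80% of cumulative revenue, class B up to 95%, and class C the rest. The source does not state thresholds, and the customer names and revenue figures are hypothetical.

```python
def abc_classify(revenue_by_customer, a_limit=0.80, b_limit=0.95):
    """Assign A/B/C by cumulative revenue share, largest customers first.

    a_limit and b_limit are assumed thresholds, not values from the source.
    """
    total = sum(revenue_by_customer.values())
    ranked = sorted(revenue_by_customer.items(), key=lambda kv: kv[1], reverse=True)
    classes = {}
    cumulative = 0.0
    for customer, revenue in ranked:
        cumulative += revenue
        share = cumulative / total
        if share <= a_limit:
            classes[customer] = "A"
        elif share <= b_limit:
            classes[customer] = "B"
        else:
            classes[customer] = "C"
    return classes

# Hypothetical revenue per customer.
revenue = {"C1": 500, "C2": 250, "C3": 150, "C4": 60, "C5": 40}
abc = abc_classify(revenue)
print(abc)
```

Other threshold definitions (for example fixed revenue amounts or a fixed number of objects per class) are equally possible; only the classification principle is shown here.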

Unsupervised Learning (Informative)
Association Analysis
Overview: Association analysis serves to find regularities, above all in business operations, and to formulate corresponding rules such as "if a customer buys product A, he also buys product B and C".
Reason to use: Association analysis helps to find, e.g., cross-selling opportunities. The identified rules can be used to arrange associated products together in a catalogue, supermarket or web shop, or to systematically offer product C to customers who have already bought product A.
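
Rules of the form "if a customer buys product A, he also buys product B" are commonly rated by support and confidence. The following Python sketch computes both for a single rule; these two measures are standard in association analysis but are not named in the source, and the baskets and the function name `rule_metrics` are hypothetical.

```python
def rule_metrics(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= t)
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= t)
    support = n_both / len(transactions)          # share of all baskets with both sides
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Hypothetical shopping baskets.
baskets = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "C"},
    {"A", "B", "C"},
]

support, confidence = rule_metrics(baskets, {"A"}, {"B"})
print(support, confidence)
```

Here three of the five baskets contain both A and B, and three of the four baskets containing A also contain B.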
Scoring (Weighted Score Tables)
Overview: Scoring serves to evaluate data records. With Weighted Score Tables, weighting functions are defined manually by evaluating the individual model fields, and one weighted sum is formed from them.
Reason to use: Evaluation of influence factors with regard to future decisions.
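
The weighted sum described above can be sketched in a few lines of Python. The model fields, the partial score rules and the weights are all hypothetical and defined by hand, which matches the idea of a manually maintained score table but not any specific SAP configuration.

```python
# Hypothetical score table: each model field maps a raw value to a partial score.
partial_scores = {
    "revenue": lambda v: 10 if v >= 1000 else 5,
    "visits": lambda v: 10 if v >= 5 else 4,
    "complaints": lambda v: 10 if v == 0 else 0,
}

# Manually defined weights per model field (assumed to sum to 1).
weights = {"revenue": 0.5, "visits": 0.3, "complaints": 0.2}

def weighted_score(record):
    """Evaluate each field, weight the partial scores and add them up."""
    return sum(weights[f] * partial_scores[f](record[f]) for f in weights)

score = weighted_score({"revenue": 1200, "visits": 3, "complaints": 0})
print(round(score, 2))
```

Unlike regression scoring, nothing is learned from historic data here: changing a weight or a partial score rule directly changes the result.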

Positioning of Data Mining in SAP BW
Task Analysis: Task, Business Understanding, Problem definition, Analysis of requirements
Preprocessing: Data Selection, Data Cleaning, Data Preparation, Data Transformation
Data Mining: Model Development (Training), Running Models (Prediction)
Postprocessing: Output Generation, Evaluation/Analysis of Results
Deployment: Deployment of results
[Figure: the APD steps shown beneath the KDD phases: Step 1: Select Data, Step 2: Preparation, Step 3: Transformation, Step 4: Store/Transfer Data (SAP BW, other systems, e.g. CRM), Step 5: Deploy Data (campaign target groups, ABC analysis)]

Agenda
 Exercise

