SAP BODS Transforms


Written by Saurav Mitra Last Updated: 29 May 2016

This article deals with the various types of transformations available in SAP BODS. Transforms are built-in, optional objects used in a dataflow to transform source data into the desired output dataset. They are available in the Local Object Library under the Transforms tab. There are four categories of transforms available in SAP BODS, namely Data Integrator, Data Quality, Platform and Text Data Processing.

List of available transforms


1. Data Integrator: Data_Transfer, Date_Generation, Effective_Date, Hierarchy_Flattening, History_Preserving, Key_Generation, Map_CDC_Operation, Pivot (Columns to Rows), Reverse Pivot (Rows to Columns), Table_Comparison, XML_Pipeline
2. Data Quality: Associate, Country ID, Data Cleanse, DSF2 Walk Sequencer, Geocoder, Global Address Cleanse, Global Suggestion Lists, Match, USA Regulatory Address Cleanse, User-Defined
3. Platform: Case, Map_Operation, Merge, Query, Row_Generation, SQL, Validation
4. Text Data Processing: Entity_Extraction

Our approach is to get detailed knowledge of all the above transforms, starting with the most commonly used ones. So we will start with the Query transform.

QUERY Transform
Query transform is used to retrieve a data set, based on the input schema, that satisfies the conditions we specify. A Query transform is similar to a SQL SELECT statement. The Query transform is used to perform the following operations:

Map columns from the input schema to the output schema.
Perform transformations and functions on the source data.
Assign primary keys to output schema columns.
Add new output columns, nested schemas and function calls to the output schema.
Perform data nesting and unnesting with sub-schemas of the output schema, and assign Make Current schema.
Generate a distinct result set output for the input schema.
Join data from multiple input source schemas; equi joins as well as outer joins are supported.
Filter input source data.
Perform aggregation based on input column groups.
Generate a sorted dataset based on the source input column order.
Generate a DTD, XML Schema or File Format based on the input or output schema.
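
As an illustration, the combined effect of a single Query transform corresponds roughly to one SQL SELECT statement. The sketch below uses hypothetical tables EMP and DEPT with made-up columns (all names are assumptions, not generated by BODS):

    -- Hypothetical example: column mapping, function, join, filter,
    -- aggregation and sorting, all expressible in one Query transform.
    SELECT e.DEPT_ID,                        -- column mapped to output schema
           UPPER(d.DEPT_NAME) AS DEPT_NAME,  -- function applied on source data
           SUM(e.SALARY) AS TOTAL_SALARY     -- aggregation on column groups
    FROM   EMP e
    INNER JOIN DEPT d                        -- equi join of two input schemas
           ON e.DEPT_ID = d.DEPT_ID
    WHERE  e.STATUS = 'ACTIVE'               -- filter on input source data
    GROUP BY e.DEPT_ID, UPPER(d.DEPT_NAME)
    ORDER BY e.DEPT_ID;                      -- sorted output dataset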

CASE Transform
Case transform is used to divide or route the input data set into multiple output data sets based on defined logical expressions. It is used to implement IF-THEN-ELSE logic at the dataflow level. This transform accepts only one source input. We can define multiple labels and their corresponding CASE expressions. For input rows that do not satisfy any of the CASE conditions, we may choose to output those records using the DEFAULT case; for that we need to select the check box Produce default output when all expressions are false.

Two other notable properties of this transform are Row can be TRUE for one case only and Preserve expression order. If we select the option Row can be TRUE for one case only, then a row is passed only to the first case whose expression returns TRUE. Otherwise, the row is passed to all the cases whose expressions return TRUE. The Preserve expression order option is available only when the Row can be TRUE for one case only option is checked. We should select this option if expression order is important to us, because otherwise there is no way to guarantee which expression will evaluate to TRUE first.
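
To visualise the routing, here is a rough SQL equivalent of a Case transform with two labels and a default output, using a hypothetical ORDERS table and REGION column (assumed names, for illustration only):

    -- Label 'NORTH': rows routed to the first output
    SELECT * FROM ORDERS WHERE REGION = 'NORTH';
    -- Label 'SOUTH': rows routed to the second output
    SELECT * FROM ORDERS WHERE REGION = 'SOUTH';
    -- DEFAULT output: rows for which all CASE expressions are false
    SELECT * FROM ORDERS
    WHERE REGION NOT IN ('NORTH', 'SOUTH') OR REGION IS NULL;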

MERGE Transform
Merge transform is used to combine multiple input datasets with the same schema into a single output dataset of the same schema. It is equivalent to the SQL UNION ALL statement. To eliminate duplicate records from the output dataset, i.e. to achieve a UNION operation, add a Query transform with the DISTINCT option enabled after the Merge transform.
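
In SQL terms, with two hypothetical source tables SRC_A and SRC_B sharing the same structure (assumed names), the Merge transform alone behaves like the first statement below, and Merge followed by a DISTINCT Query behaves like the second:

    -- Merge transform alone: UNION ALL keeps duplicates
    SELECT CUST_ID, CUST_NAME FROM SRC_A
    UNION ALL
    SELECT CUST_ID, CUST_NAME FROM SRC_B;

    -- Merge + Query with DISTINCT: UNION removes duplicates
    SELECT CUST_ID, CUST_NAME FROM SRC_A
    UNION
    SELECT CUST_ID, CUST_NAME FROM SRC_B;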

VALIDATION Transform
Validation transform is used to filter or replace the source dataset based on criteria or validation rules to produce the desired output dataset. It enables us to create validation rules on the input dataset and generate the output based on whether records have passed or failed the validation condition. This transform is typically used for NULL checking of mandatory fields, pattern matching, checking the existence of a value in a reference table, datatype validation, etc.

The Validation transform can generate three output datasets: Pass, Fail and RuleViolation. The Pass output schema is identical to the input schema. The Fail output schema has two more columns, DI_ERRORACTION and DI_ERRORCOLUMNS. The RuleViolation output schema has three columns: DI_ROWID, DI_RULENAME and DI_COLUMNNAME.
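
As a rough SQL analogy, a single NULL-check rule on a hypothetical CUSTOMER table (assumed name) splits the input as shown below; the real transform additionally populates the DI_* columns mentioned above:

    -- Pass output: rows satisfying the validation rule
    SELECT * FROM CUSTOMER WHERE CUST_NAME IS NOT NULL;
    -- Fail output: rows violating the rule
    -- (plus DI_ERRORACTION and DI_ERRORCOLUMNS in the real transform)
    SELECT * FROM CUSTOMER WHERE CUST_NAME IS NULL;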

MAP_OPERATION Transform
Map_Operation transform allows conversions between data manipulation operation codes such as NORMAL, INSERT, UPDATE and DELETE. It enables us to change the operation codes of input data sets to produce the desired output row type. There are four operation codes for any input row type: Normal, Update, Insert and Delete. In addition, the DISCARD option can be assigned to the output row type; discarded rows are not passed through to the output of the transform.

If the output record is flagged NORMAL or INSERT, a new record is inserted in the target table. If it is marked as UPDATE, it overwrites an existing row in the target table. If the input record is flagged as DELETE, it does not load the record in the target table; but if the output row type is set to DELETE, the corresponding records present in the target are deleted. If the row is marked as DISCARD, no records are passed to the output of the transform.
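
The effect of each output row type on a hypothetical target table TGT (assumed key column CUST_ID) corresponds roughly to the following SQL:

    -- NORMAL / INSERT: a new row is inserted in the target
    INSERT INTO TGT (CUST_ID, CUST_NAME) VALUES (101, 'ACME');
    -- UPDATE: an existing row in the target is overwritten
    UPDATE TGT SET CUST_NAME = 'ACME LTD' WHERE CUST_ID = 101;
    -- DELETE: the corresponding row is removed from the target
    DELETE FROM TGT WHERE CUST_ID = 101;
    -- DISCARD: the row is simply not passed on (no SQL is issued)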

ROW_GENERATION Transform
Row_Generation transform produces a dataset with a single column. The column values start with the number that we specify in the Row number starts at option. The value then increments by one until the number of rows specified in the Row count option is reached. This transform does not accept any input data set.
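
For example, with Row number starts at = 1 and Row count = 10, the generated dataset matches what this standard-SQL recursive query produces (a sketch of the equivalent logic; the transform itself needs no SQL):

    -- Generates ROW_NUM values 1 through 10
    WITH RECURSIVE ROWGEN (ROW_NUM) AS (  -- some databases omit RECURSIVE
        SELECT 1                          -- Row number starts at = 1
        UNION ALL
        SELECT ROW_NUM + 1 FROM ROWGEN
        WHERE ROW_NUM < 10                -- stop after Row count = 10 rows
    )
    SELECT ROW_NUM FROM ROWGEN;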

SQL Transform
SQL transform is used to submit or perform standard SQL operations on a database server. The SQL transform supports a single SELECT statement only. This transform does not accept any input data set. Use this transform when other built-in transforms cannot perform the required SQL operation. Treat this transform as a last option, as it is not optimised for performance and also reduces readability.
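
A typical use is pushing down a query that is awkward to express with built-in transforms. The statement below, over a hypothetical SALES table (assumed name and columns), is the kind of single SELECT one would enter in the transform's SQL text:

    -- The single SELECT statement executed on the database server
    SELECT CUST_ID,
           SUM(AMOUNT) AS TOTAL_AMOUNT
    FROM   SALES
    GROUP BY CUST_ID
    HAVING SUM(AMOUNT) > 1000;  -- aggregation filtered at the source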

KEY_GENERATION Transform
Key_Generation transform helps to generate artificial keys for new rows in a table. The transform looks up the maximum existing key value of the surrogate key column from the table and uses it as the starting value to generate new keys for the new rows in the input dataset. The transform expects a column with the same name as the Generated key column of the source table to be a part of the input schema.

The source table must be imported into the DS repository before defining the source table for this transform. The fully qualified table name, e.g. DATASTORE.OWNER.TABLE, should be specified. We can also set the Increment value, i.e. the interval between the generated key values; by default it is 1. A variable placeholder can also be used for this option. We will be using this transform frequently while populating surrogate key values of slowly changing dimension tables.
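
Conceptually the transform does the equivalent of the following, assuming a hypothetical dimension table DIM_CUSTOMER with surrogate key column CUST_KEY and an Increment value of 1 (all names are assumptions):

    -- Step 1: look up the current maximum surrogate key
    SELECT MAX(CUST_KEY) FROM DIM_CUSTOMER;  -- say this returns 500

    -- Step 2: new input rows then receive 501, 502, 503, ...
    -- i.e. NEW_KEY = MAX(CUST_KEY) + n * increment for the n-th new row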

TABLE_COMPARISON Transform
Table_Comparison transform helps to compare two data sets and generates the difference between them as a resultant data set, with rows flagged as INSERT, UPDATE or DELETE. This transform can be used to ensure that rows are not duplicated in a target table, or to compare the changed records of a data warehouse dimension table. It helps to detect and forward all changes, or only the latest ones, that have occurred since the last time the comparison table was updated. We will be using this transform frequently while implementing slowly changing dimensions and while designing dataflows for recovery.

The comparison table must already be imported into the DS repository. The fully qualified table name, e.g. DATASTORE.OWNER.TABLE, should be specified. Also set the input dataset columns that uniquely identify each row as the Input primary key columns; these columns must be present in the comparison table with the same column names and datatypes. If the primary key value from the input data set does not match a value in the comparison table, DS generates an INSERT row. Otherwise it generates an UPDATE row with the values from the input dataset row, after comparing all the columns in the input data set that are also present in the comparison table, apart from the primary key columns. As per your requirement, select only the required subset of non-key Compare columns; this gives a performance improvement.
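
In SQL terms, the comparison logic looks roughly like this for a hypothetical input dataset INPUT and comparison table DIM_CUSTOMER keyed on CUST_ID (all names assumed; NULL handling simplified):

    -- Rows whose key is absent from the comparison table are flagged INSERT
    SELECT i.* FROM INPUT i
    WHERE NOT EXISTS
          (SELECT 1 FROM DIM_CUSTOMER c WHERE c.CUST_ID = i.CUST_ID);

    -- Rows whose key matches but whose compare columns differ are flagged UPDATE
    SELECT i.* FROM INPUT i
    JOIN DIM_CUSTOMER c ON c.CUST_ID = i.CUST_ID
    WHERE i.CUST_NAME <> c.CUST_NAME   -- non-key compare columns
       OR i.CITY <> c.CITY;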

If the Input primary key columns have duplicate keys, the transform arbitrarily chooses any of the rows to compare during dataflow processing, i.e. the order of the input rows is ignored. Selecting the Input contains duplicate keys check box provides a method of handling duplicate keys in the input data set.

If the comparison table contains rows with the same primary keys, the transform arbitrarily chooses any of the rows to compare. Specify a column of the comparison table with unique keys, i.e. one that by design contains no duplicate keys, as the Generated key column. A generated key column indicates which row of a set containing identical primary keys is to be used in the comparison. This provides a method of handling duplicate keys in the comparison table.

For an UPDATE, the output data set will contain the largest key value found for the given primary
key. And for a DELETE, the output data set can include all duplicate key rows or just the row with
the largest key value.

When we select the check box Detect deleted row(s) from comparison table, the transform flags rows that exist in the comparison table but are missing from the input data set as DELETE. When the Generated key column and Detect deleted row(s) from comparison table options are selected together with the Row-by-row select or Sorted input comparison method, an additional section appears to specify how to handle DELETE rows with duplicate keys, i.e. Detect all rows or Detect row with largest generated key value.

Apart from all these properties, there are three methods for accessing the comparison table, namely Row-by-row select, Cached comparison table and Sorted input. Below is a brief on when to select which option.

1. Row-by-row select option is best if the target table is large compared to the number of rows the transform will receive as input. In this case, for every input row the transform fires a SQL query to look up the target table.
2. Cached comparison table option is best when we are comparing the entire target table. DS uses pageable cache as the default. If the table fits in the available memory, we can change the Cache type property of the dataflow to In-Memory.
3. Sorted input option is best when the input data is pre-sorted based on the primary key columns. DS reads the comparison table in the order of the primary key columns using a sequential read, only once. NOTE: The order of the input data set must exactly match the order of all primary key columns in the Table_Comparison transform.
