World of
“Cache”
The Hidden agenda
a) Basics of Cache
1) Memory Cache
2) Where the cache files are created
3) Naming Conventions
4) Cache Calculations
b) Advanced Cache
1) Look up Cache
2) Aggregator Cache
3) Joiner Cache
4) Ranker Cache
Let’s get to the Basics:
Cache is a combination of:
1) Index Cache: The server stores key values or condition values, which it uses to locate cached rows quickly.
2) Data Cache: The server stores output values.
• The server creates a memory cache based on the size specified in the session
properties. You can set this size manually, based on the cache calculations.
• By default, the PowerCenter Server allocates 1,000,000 bytes to the index cache and
2,000,000 bytes to the data cache for each transformation instance.
• If the PowerCenter Server cannot allocate the configured amount of cache memory,
it cannot initialize the session and the session fails.
• If the PowerCenter Server requires more memory than the configured cache size,
it pages to disk. Since paging to disk can slow session performance, try to
configure the index and data cache sizes so that the data fits in memory.
Where Are the Cache Files Created?
• The PowerCenter Server creates the index and data cache files by default in the
PowerCenter Server variable directory, $PMCacheDir.
• If you do not define $PMCacheDir, the PowerCenter Server saves the files in the PMCache
directory specified in the UNIX configuration file or the cache directory in the Windows
registry. If the UNIX PowerCenter Server does not find a directory there, it creates the index
and data files in the installation directory. If the PowerCenter Server on Windows does not
find a directory there, it creates the files in the system directory.
• If a cache file handles more than 2 GB of data, the PowerCenter Server creates multiple
index and data files. When creating these files, the PowerCenter Server appends a number
to the end of the filename, such as PMAGG*.idx1 and PMAGG*.idx2. The number of index
and data files is limited only by the amount of disk space available in the cache directory.
Three Instances when the Cache File exists even after Session completion:
• For example: PMLKUP8_4_2.idx
• A cache file name is made up of the following components:
Name Prefix: Cache file name prefix configured in the Lookup transformation.
Prefix: Describes the type of transformation:
    Aggregator transformation is PMAGG.
    Joiner transformation is PMJNR.
    Lookup transformation is PMLKUP.
    Rank transformation is PMAGG.
Session ID: Session instance ID number.
Transformation ID: Transformation instance ID number.
Partition Index: If the session contains more than one partition, this identifies the partition
    number. The partition index is zero-based, so the first partition has no partition index.
    Partition index 2 indicates a cache file created in the third partition.
Suffix: Identifies the type of file:
    Index file is .idx.
    Data file is .dat.
Overflow Index: If a cache file handles more than 2 GB of data, the PowerCenter Server creates
    multiple index and data files. When creating these files, the PowerCenter Server appends an
    overflow index to the filename, such as PMAGG*.idx.1 and PMAGG*.idx.2. The number of index
    and data files is limited by the amount of disk space available in the cache directory.
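The naming convention above can be illustrated with a small parser. This is only a sketch: the regular expression, the `CacheFileName` type, and the `parse_cache_file_name` function are hypothetical names introduced here, and the layout `<prefix><session ID>_<transformation ID>[_<partition index>].<suffix>[overflow index]` is an assumption inferred from the example PMLKUP8_4_2.idx. Files that use a custom Name Prefix are not handled.

```python
import re
from typing import NamedTuple, Optional

# Transformation prefixes as listed in the naming convention.
PREFIX_TO_TRANSFORMATION = {
    "PMAGG": "Aggregator or Rank",
    "PMJNR": "Joiner",
    "PMLKUP": "Lookup",
}

# Assumed layout (inferred from the example, not an official grammar):
# <prefix><session ID>_<transformation ID>[_<partition index>].<suffix>[overflow index]
# The overflow index is accepted both as ".idx1" and as ".idx.1".
CACHE_FILE_PATTERN = re.compile(
    r"^(?P<prefix>PMAGG|PMJNR|PMLKUP)"
    r"(?P<session_id>\d+)_(?P<transformation_id>\d+)"
    r"(?:_(?P<partition_index>\d+))?"
    r"\.(?P<suffix>idx|dat)"
    r"(?:\.?(?P<overflow_index>\d+))?$"
)


class CacheFileName(NamedTuple):
    transformation: str
    session_id: int
    transformation_id: int
    partition_index: Optional[int]  # None when the name carries no partition index
    file_type: str                  # "index" or "data"
    overflow_index: Optional[int]   # None when there is no overflow file number


def parse_cache_file_name(name: str) -> Optional[CacheFileName]:
    """Split a cache file name into its components, or return None if it does not match."""
    match = CACHE_FILE_PATTERN.match(name)
    if match is None:
        return None
    partition = match.group("partition_index")
    overflow = match.group("overflow_index")
    return CacheFileName(
        transformation=PREFIX_TO_TRANSFORMATION[match.group("prefix")],
        session_id=int(match.group("session_id")),
        transformation_id=int(match.group("transformation_id")),
        partition_index=int(partition) if partition is not None else None,
        file_type="index" if match.group("suffix") == "idx" else "data",
        overflow_index=int(overflow) if overflow is not None else None,
    )
```

Under these assumptions, `parse_cache_file_name("PMLKUP8_4_2.idx")` reads the example as a Lookup index file for session 8, transformation 4, partition index 2.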
Cache Calculations
• Aggregator:
Index size: (Sum of column sizes in group-by ports + 17) x number of groups.
Data size: (Sum of column sizes of output ports + 7) x number of groups.
• Rank:
Index size: (Sum of column sizes in group-by ports + 17) x number of groups.
Data size: [(Sum of column sizes of output ports + 10) x number of ranks + 20] x number of groups.
• Joiner:
Index size: (Sum of master column sizes in join condition + 16) x number of rows in
master table.
Data size: (Sum of master column sizes NOT in join condition but on output ports
+ 8) x number of rows in master table.
• Lookup:
Index size: number of rows in lookup table x (Sum of column sizes in lookup condition + 16) x 2.
Data size: number of rows in lookup table x (Sum of column sizes of connected output ports
not in lookup condition + 8).
Column size by datatype:
Datatype                                      Aggregator, Rank            Joiner, Lookup
Binary                                        precision + 2               precision + 8, rounded to nearest multiple of 8
Date/Time                                     18                          24
Decimal, high precision off (all precision)   10                          16
Decimal, high precision on (precision <=18)   18                          24
Double                                        10                          16
Real                                          10                          16
Integer                                       6                           16
String                                        ASCII mode: precision + 3   ASCII mode: precision + 9
Small integer                                 6                           16
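The table can be turned into a small helper for the calculations that follow. This is a minimal sketch: the function name `column_size` and the datatype keys are hypothetical, strings are handled for ASCII mode only, and Binary is left out because the table does not say how the rounding is applied.

```python
# Fixed column sizes from the table: (Aggregator/Rank, Joiner/Lookup).
# The dictionary keys are illustrative labels, not PowerCenter identifiers.
FIXED_SIZES = {
    "date/time": (18, 24),
    "decimal": (10, 16),                 # high precision off (all precision)
    "decimal_high_precision": (18, 24),  # high precision on, precision <= 18
    "double": (10, 16),
    "real": (10, 16),
    "integer": (6, 16),
    "small integer": (6, 16),
}


def column_size(datatype: str, transformation: str, precision: int = 0) -> int:
    """Return the cache column size in bytes for one port.

    transformation is one of "aggregator", "rank", "joiner", "lookup".
    precision is only used for strings (ASCII mode).
    """
    kind = transformation.lower()
    if kind in ("aggregator", "rank"):
        column = 0
    elif kind in ("joiner", "lookup"):
        column = 1
    else:
        raise ValueError(f"unknown transformation: {transformation}")

    name = datatype.lower()
    if name == "string":
        # ASCII mode: precision + 3 for Aggregator/Rank, precision + 9 for Joiner/Lookup.
        return precision + (3 if column == 0 else 9)
    return FIXED_SIZES[name][column]
```

For instance, `column_size("string", "rank", precision=21)` gives 24 and `column_size("integer", "lookup")` gives 16.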
Lookup Caches Overview
• The Informatica Server returns a value from the lookup table or cache when the condition is
true. When the condition is not true, the Informatica Server returns the default value for
connected transformations and NULL for unconnected transformations.
Dynamic cache:
• The Informatica Server inserts rows into the cache when the condition is false. This indicates that
the row is not in the cache or target table. You can pass these rows to the target table.
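The dynamic cache behavior can be pictured with a dictionary standing in for the cache. This is an illustrative sketch only, not how the server is implemented; `dynamic_lookup` and its return convention are hypothetical.

```python
def dynamic_lookup(cache: dict, key, row: dict):
    """Look up key in the cache; insert the row when it is not found.

    Returns (row_from_cache, is_new). is_new is True when the row was not in
    the cache and has just been inserted, so the caller can pass it on to the
    target table.
    """
    if key in cache:
        return cache[key], False
    cache[key] = row
    return row, True


# Usage: the first occurrence of an ITEM_ID is new, the second is found in the cache.
cache = {}
first = dynamic_lookup(cache, 101, {"ITEM_ID": 101, "NAME": "bolt"})
second = dynamic_lookup(cache, 101, {"ITEM_ID": 101, "NAME": "bolt v2"})
```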
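Calculating the Lookup Index Cache
• For example, a Lookup transformation uses the following lookup condition:
• ITEM_ID = IN_ITEM_ID1
• The lookup condition uses one column, ITEM_ID, with a column size of 16, and the table
contains 60,000 rows.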
• Use the following calculation to determine the minimum index cache
requirements:
• 200 * (16 + 16) = 6,400
• Use the following calculation to determine the maximum index cache
requirements:
• 60,000 * (16 + 16) * 2 = 3,840,000
• Therefore, this Lookup transformation requires an index cache size between
6,400 and 3,840,000 bytes.
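The same arithmetic as a short function, offered as a sketch: `lookup_index_cache_bounds` is a hypothetical name, and the constant 200 is taken from the minimum calculation in the example.

```python
def lookup_index_cache_bounds(condition_column_sizes, lookup_rows):
    """Return (minimum, maximum) lookup index cache size in bytes."""
    row_bytes = sum(condition_column_sizes) + 16  # column sizes plus 16 bytes of overhead
    minimum = 200 * row_bytes                     # 200 is the factor used in the example
    maximum = lookup_rows * row_bytes * 2
    return minimum, maximum


# ITEM_ID has a column size of 16 and the lookup table has 60,000 rows.
print(lookup_index_cache_bounds([16], 60_000))  # (6400, 3840000)
```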
Calculating the Lookup Data Cache
• In a connected transformation, the data cache contains data for the
connected output ports, not including ports used in the lookup condition.
In an unconnected transformation, the data cache contains data from the
return port.
• 1) PROMOTION_ID: connected output port not in lookup condition, Integer, column size 16
• 2) DISCOUNT: connected output port not in lookup condition, Decimal, column size 16
• Total column size = 16 + 16 = 32
• The lookup table has 60,000 rows.
• Use the following calculation to determine the minimum data cache
requirements:
• 60,000 * (32 + 8) = 2,400,000
• This Lookup transformation requires a data cache size of 2,400,000 bytes.
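A minimal sketch of the data cache calculation, assuming the column sizes have already been looked up per datatype; `lookup_data_cache_size` is a hypothetical helper name.

```python
def lookup_data_cache_size(output_column_sizes, lookup_rows):
    """Minimum lookup data cache in bytes: rows * (sum of column sizes + 8)."""
    return lookup_rows * (sum(output_column_sizes) + 8)


# PROMOTION_ID (16) and DISCOUNT (16), 60,000 rows in the lookup table.
print(lookup_data_cache_size([16, 16], 60_000))  # 2400000
```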
Aggregator Cache
• When the PowerCenter Server runs a session with an Aggregator
transformation, it stores data in memory until it completes the aggregation.
• To configure a session for incremental aggregation:
1) Verify the location where you want to store the aggregate files.
2) Configure the session to write file names in the session log. If you want the
PowerCenter Server to write the incremental aggregation cache file names in the
session log, configure the session with Verbose Init tracing.
3) Verify the incremental aggregation settings in the session properties. You can
configure the session for incremental aggregation in the Performance settings on
the Properties tab.
• You can also configure the session to reinitialize the aggregate cache. If you
choose to reinitialize the cache, the Workflow Manager displays a warning
indicating that the PowerCenter Server overwrites the existing cache and a
reminder to clear this option after running the session.
Calculating the Aggregator Index Cache
The index cache holds group information from the group-by ports.
# groups * [(Σ column size) + 17]
Column size: group-by columns.
In the example:
STORE_ID: Integer, size 6
ITEM: String, size 18
Therefore, total column size = 18 + 6 = 24
Assuming there are 72,000 groups, the minimum index cache calculation is:
72,000 * (24 + 17) = 2,952,000
The maximum index cache calculation is double that amount:
2,952,000 * 2 = 5,904,000
Therefore, this Aggregator transformation requires an index cache size between
2,952,000 and 5,904,000 bytes.
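As a quick check of the numbers, here is the calculation as code. The function name `aggregator_index_cache_bounds` is an assumption made for illustration.

```python
def aggregator_index_cache_bounds(group_by_column_sizes, groups):
    """Return (minimum, maximum) Aggregator index cache size in bytes."""
    minimum = groups * (sum(group_by_column_sizes) + 17)
    return minimum, minimum * 2  # the maximum is double the minimum


# STORE_ID (6) and ITEM (18), 72,000 groups.
print(aggregator_index_cache_bounds([6, 18], 72_000))  # (2952000, 5904000)
```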
Calculating the Aggregator Data Cache
• The data cache holds row data for variable ports and connected output ports. As a result, the data
cache is generally larger than the index cache. To reduce the data cache size, connect only the
necessary input/output ports to subsequent transformations. Use the following information to calculate
the minimum aggregate data cache size:
• # groups * [(Σ column size) + 7]
• Column size:
a) Non-group-by input/output ports.
b) Local variable ports.
c) Ports containing an aggregate function (multiply the column size by three).
In the example:
ORDER_ID: Integer, size 6
SALES_PER_STORE_ITEMS: Decimal, size 10 * 3 = 30 (port contains an aggregate function)
Total column size = 6 + 30 = 36
The total number of groups as calculated for the index cache size is 72,000. Use the following calculation
to determine the minimum data cache requirements:
• 72,000 * (36 + 7) = 3,096,000
• Therefore, this Aggregator transformation requires a data cache size of 3,096,000 bytes.
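The tripling of aggregate-function ports is easy to miss, so the sketch below keeps those ports in a separate argument. The function and parameter names are hypothetical.

```python
def aggregator_data_cache_size(plain_column_sizes, aggregate_column_sizes, groups):
    """Minimum Aggregator data cache in bytes.

    plain_column_sizes: non-group-by input/output ports and local variable ports.
    aggregate_column_sizes: ports containing an aggregate function; each size
    is multiplied by three.
    """
    total = sum(plain_column_sizes) + 3 * sum(aggregate_column_sizes)
    return groups * (total + 7)


# ORDER_ID (Integer, 6) and SALES_PER_STORE_ITEMS (Decimal, 10, aggregate port), 72,000 groups.
print(aggregator_data_cache_size([6], [10], 72_000))  # 3096000
```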
Joiner Cache
• When a session uses a Joiner transformation, the PowerCenter Server first reads the rows
from the master source and builds the index and data caches from the master rows. After
building the caches, the PowerCenter Server performs the join based on the detail source
data and the cache data.
• The server creates the index cache as it reads the master source into the data cache.
The server uses the index cache to test the join condition. When it finds a match, it
retrieves row values from the data cache.
• The PowerCenter Server caches all master rows with a unique key in the index
cache, and all master rows in the data cache.
• For instance:
Index cache: The PowerCenter Server caches 100 master rows with unique keys.
Data cache: The PowerCenter Server caches the master rows in the data cache that
correspond to the 100 rows in the index cache. The number of rows it stores in the
data cache depends on the data. For example, if every master row contains a unique
key, the PowerCenter Server stores 100 rows in the data cache. However, if the
master data contains multiple rows with the same key, the PowerCenter Server
stores more than 100 rows in the data cache.
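The split between the two caches can be sketched with ordinary Python collections. This is a conceptual illustration of a normal (inner) join, not the server's actual data structures; `build_joiner_caches` and `join_detail_rows` are hypothetical names.

```python
from collections import defaultdict


def build_joiner_caches(master_rows, key):
    """Read the master source once, building both caches.

    index_cache holds each unique join key once; data_cache holds every
    master row, grouped by its key.
    """
    index_cache = set()
    data_cache = defaultdict(list)
    for row in master_rows:
        index_cache.add(row[key])
        data_cache[row[key]].append(row)
    return index_cache, data_cache


def join_detail_rows(detail_rows, index_cache, data_cache, key):
    """Test each detail row against the index cache, then fetch matches from the data cache."""
    for detail in detail_rows:
        if detail[key] in index_cache:
            for master in data_cache[detail[key]]:
                yield {**master, **detail}


master = [
    {"PRODUCT_ID": 1, "ITEM_NAME": "bolt"},
    {"PRODUCT_ID": 1, "ITEM_NAME": "bolt (large)"},  # duplicate key: one index entry, two data rows
    {"PRODUCT_ID": 2, "ITEM_NAME": "nut"},
]
index_cache, data_cache = build_joiner_caches(master, "PRODUCT_ID")
print(len(index_cache), sum(len(rows) for rows in data_cache.values()))  # 2 3
```

With duplicate keys in the master data, the data cache ends up with more rows than the index cache has keys, which is the situation described above.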
Joiner Index Cache Calculation
The index cache holds rows from the master source that are in the join condition.
• PRODUCTS is the master source and has 90,000 rows. Use the following
calculation to determine the minimum index cache requirements:
• 90,000 * (16 + 16) = 2,880,000
• Double the size to determine the maximum index cache requirements:
• 2,880,000 * 2 = 5,760,000
• Therefore, this Joiner transformation requires an index cache size between
2,880,000 and 5,760,000 bytes.
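The Joiner index bounds follow the same pattern with an overhead of 16 bytes per row. A small sketch, with `joiner_index_cache_bounds` as a hypothetical name:

```python
def joiner_index_cache_bounds(join_condition_column_sizes, master_rows):
    """Return (minimum, maximum) Joiner index cache size in bytes."""
    minimum = master_rows * (sum(join_condition_column_sizes) + 16)
    return minimum, minimum * 2  # the maximum is double the minimum


# One join-condition column of size 16; PRODUCTS has 90,000 rows.
print(joiner_index_cache_bounds([16], 90_000))  # (2880000, 5760000)
```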
Joiner Data Cache Calculation
• The data cache holds rows from the master source until the PowerCenter Server joins
the data.
• # master rows * [(Σ column size) + 8]
• Column size: master columns not in the join condition and used for output.
• In the example, JNR_ORDERS_PRODUCTS has the following connected output ports:
• ITEM_NAME – string 32
• PRODUCT CATEGORY – decimal 30
• Total column size = 62
• The master source has 90,000 rows.
• Use the following calculation to determine the minimum data cache requirements:
• 90,000 * (62 + 8) = 6,300,000
• This Joiner transformation requires a data cache size of 6,300,000 bytes.
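A sketch of the same calculation in code, taking the column sizes from the example as given; `joiner_data_cache_size` is a hypothetical name.

```python
def joiner_data_cache_size(output_column_sizes, master_rows):
    """Minimum Joiner data cache in bytes: master rows * (sum of column sizes + 8)."""
    return master_rows * (sum(output_column_sizes) + 8)


# Column sizes 32 and 30 from the example, 90,000 master rows.
print(joiner_data_cache_size([32, 30], 90_000))  # 6300000
```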
Rank Caches
• When the PowerCenter Server runs a session with a Rank transformation, it
compares an input row with rows in the data cache. If the input row out-ranks a
stored row, the PowerCenter Server replaces the stored row with the input row.
• For example, you configure a Rank transformation to find the top three sales. The
PowerCenter Server reads the following input data:
• SALES
• 10,000
• 12,210
• 5,000
• 2,455
• 6,324
• The PowerCenter Server caches the first three rows (10,000, 12,210, and 5,000).
When the PowerCenter Server reads the next row (2,455) it compares it to the cache
values. Since the row is lower in rank than the cached rows, it discards the row with
2,455. The next row (6,324), however, is higher in rank than one of the cached rows.
Therefore, the PowerCenter Server replaces the cached row with the higher-ranked
input row.
• If the Rank transformation is configured to rank across multiple groups, the
PowerCenter Server ranks incrementally for each group it finds.
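The replace-if-higher behavior described above resembles a bounded top-N selection. The sketch below uses a min-heap as the cache; it illustrates the idea for a single group and is not a description of the server's internals. `top_n` is a hypothetical name.

```python
import heapq


def top_n(values, n):
    """Keep the n highest values seen so far, as in the top-three sales example."""
    cache = []  # min-heap: cache[0] is the lowest-ranked cached value
    for value in values:
        if len(cache) < n:
            heapq.heappush(cache, value)      # cache not full yet: store the row
        elif value > cache[0]:
            heapq.heapreplace(cache, value)   # out-ranks a cached row: replace it
        # otherwise the row is lower in rank than every cached row and is discarded
    return sorted(cache, reverse=True)


print(top_n([10_000, 12_210, 5_000, 2_455, 6_324], 3))  # [12210, 10000, 6324]
```

Here 2,455 is discarded and 6,324 replaces 5,000, matching the walk-through.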
Calculating the Rank Index Cache
• The index cache holds group information from the group-by ports. Use
the following information to calculate the minimum rank index cache size:
• Rank index calculation:
• # groups * [(Σ column size) + 17]
• Column size: group-by columns.
• In the example, PRODUCT_CATEGORY is a String(21), so its column size is 21 + 3 = 24.
• There are 10,000 product categories, so the total number of groups is
10,000. Use the following calculation to determine the minimum index cache
requirements:
• 10,000 * (24 + 17) = 410,000
• Double the size to determine the maximum index cache requirements:
• 410,000 * 2 = 820,000
• Therefore, this Rank transformation requires an index cache size between
410,000 and 820,000 bytes.
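The Rank index bounds, written out as a sketch with the hypothetical name `rank_index_cache_bounds`:

```python
def rank_index_cache_bounds(group_by_column_sizes, groups):
    """Return (minimum, maximum) Rank index cache size in bytes."""
    minimum = groups * (sum(group_by_column_sizes) + 17)
    return minimum, minimum * 2  # the maximum is double the minimum


# PRODUCT_CATEGORY has a column size of 24 and there are 10,000 groups.
print(rank_index_cache_bounds([24], 10_000))  # (410000, 820000)
```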
Calculating the Rank Data Cache
• The data cache size is proportional to the number of ranks. It holds row data until
the PowerCenter Server completes the ranking and is generally larger than the
index cache. To reduce the data cache size, connect only the necessary
input/output ports to subsequent transformations. Use the following information to
calculate the minimum rank data cache size:
• # groups * [(# ranks * (Σ column size + 10)) + 20]
• ITEM_NO: Decimal(10), column size 10
• ITEM_NAME: String(23), column size 23 + 3 = 26
• PRICE: Decimal(14), column size 10
• Total column size = 10 + 26 + 10 = 46
• RNK_TOPTEN ranks by price, and the total number of ranks is 10. The number
of groups is 10,000.
• Use the following calculation to determine the minimum data cache requirements:
• 10,000 * [(10 * (46 + 10)) + 20] = 5,800,000
• This Rank transformation requires a data cache size of 5,800,000 bytes.
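Because the rank data formula nests the number of ranks inside the per-group term, it helps to see it spelled out. A minimal sketch, with `rank_data_cache_size` as a hypothetical name:

```python
def rank_data_cache_size(column_sizes, ranks, groups):
    """Minimum Rank data cache in bytes: groups * [(ranks * (sum of sizes + 10)) + 20]."""
    per_group = ranks * (sum(column_sizes) + 10) + 20
    return groups * per_group


# ITEM_NO (10), ITEM_NAME (26), PRICE (10); 10 ranks; 10,000 groups.
print(rank_data_cache_size([10, 26, 10], ranks=10, groups=10_000))  # 5800000
```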