There are three principal data environments within the Sunrise architecture. A graphic
view is shown below along with information on the purpose served by each.
Warehouse
All source data is loaded to DB2 tables in the Sunrise warehouse database.
When all sources are transitioned to the Sunrise sourcing architecture, the
Sunrise warehouse will have replaced the existing collector database (CDB)
along with the load database (LPDB), the ETL database (PETL), and parts of
the raw database.
With the agile development approach Sunrise has taken, source data is
identified to support a user story. This data is documented along with any
rules required to manage the data, as well as the source of the data. In many
cases the data required to support a user story already exists in the Sunrise
environment, but new data from existing or entirely new sources is also
identified. When the content is new from an existing source, the source
DataStage or LEI process is extended to include the new element. The
element's history over time, where available, is also brought in. When the
data is from a new source, the source is defined to the DataStage or LEI
processes and a source-specific data acquisition document is created. This
document is provided to an ETL developer who will acquire the data from the
source. The newly acquired data, and historical content if available, will be
loaded into the warehouse.
The Data Design section covers details on the data warehouse processes.
The reporting data mart contains a business-specific set of marketing data
based on business reporting requirements. This data has been transformed
into a dimensionally based star schema implementation. There are three
major data areas within the dimensional reporting data mart:
1. Classification dimensions (e.g., what country is the opportunity located
in),
2. Business measurement fact (e.g., validated leads created, total
response count), and
3. Summary aggregate (e.g., align multiple measures into a single record
for high performance reporting).
Each of these is detailed below.
Dimensions
The word dimension is another name for a categorization or classification
scheme applied in order to understand data. There are principally two
kinds of dimensions.
effective date and setting the expiration date of the previous record. In
addition to setting the expiration date on the old record, the replaced-by key
is also set to point to the newly inserted record. One additional update is made
to all prior historical records to set the current key value to the key of the
newly inserted dimension record. This provides an easy method to bring
historical content to a current view. For example, an S/36 became an AS/400,
which became an iSeries. The older S/36 and AS/400 records' current key
would be the key of the current iSeries record.
Below is a Sunrise dimension record that highlights the control information as
well as the dimension values themselves. The dimension value includes the
dimension code, short name, and long name values. Every other element is
used by the dimension control processes to manage the dimension in and
across time.
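The replacement flow described above can be sketched as follows. This is an illustrative Python model, not actual Sunrise code: the record layout, field names (`key`, `expiration`, `replaced_by_key`, `current_key`), and function name are assumptions made for the example.

```python
from datetime import date

# Conventional "never expires" date for the open-ended current record.
OPEN_END = date(9999, 12, 31)

def replace_value(dimension, old_key, new_values, effective):
    """Expire the record for old_key, point its replaced-by key at the new
    record, and repoint current_key on the old record and all prior history
    so a single join brings historical content to the current view."""
    new_key = max(r["key"] for r in dimension) + 1
    for rec in dimension:
        if rec["key"] == old_key:
            rec["expiration"] = effective        # close out the replaced record
            rec["replaced_by_key"] = new_key
        if rec["current_key"] == old_key:        # roll all prior history forward
            rec["current_key"] = new_key
    dimension.append(dict(new_values, key=new_key, effective=effective,
                          expiration=OPEN_END, replaced_by_key=None,
                          current_key=new_key))
    return new_key

# S/36 -> AS/400 -> iSeries, per the example in the text.
dim = [{"key": 1, "code": "S/36", "effective": date(1983, 1, 1),
        "expiration": OPEN_END, "replaced_by_key": None, "current_key": 1}]
as400 = replace_value(dim, 1, {"code": "AS/400"}, date(1988, 6, 1))
iseries = replace_value(dim, as400, {"code": "iSeries"}, date(2000, 10, 1))
# Every record's current_key now points at the iSeries record.
```

After both replacements, the S/36 and AS/400 records each carry the iSeries key in their current key column, which is what makes the one-join current view possible.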
Facts
Based on business reporting requirements, content from the warehouse
is processed into business metrics and summarized, where required,
into aggregated information. For each of the measures the business
Summary Aggregates
Once metrics have been generated for a Sunrise story, the performance of
the Sunrise reporting or analytics is evaluated. If the amount of data
cannot support a near-real-time response to Cognos reports, then use of
The construct data is acquired weekly and runs through the normal WDM
routines to determine change. If there is no change, or the change involves an
addition, the process does not raise an alert. If, on the other hand, existing
structures are modified (e.g., a table has a column removed or the data type
for a column changes), the process issues a check source alert. The architect
responsible for the source will review the changes to determine any impact to
the Sunrise build along with any downstream consumers.
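The alert rule above — additions pass silently, modifications and removals raise a check source alert — can be sketched in a few lines. The function name and the column-name-to-data-type mapping are illustrative assumptions, not the actual WDM implementation.

```python
# Sketch of the weekly structure-change check: additions pass silently,
# while modified or removed columns raise a "check source" alert.
def check_source_change(previous, current):
    """previous/current map column name -> data type for one data area."""
    alerts = []
    for col, dtype in previous.items():
        if col not in current:
            alerts.append(f"CHECK SOURCE: column {col} removed")
        elif current[col] != dtype:
            alerts.append(f"CHECK SOURCE: column {col} changed "
                          f"from {dtype} to {current[col]}")
    return alerts  # columns only in `current` are additions: no alert
```

For example, a type change on `ID` produces one alert, while a brand-new `CITY` column produces none.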
InfoSphere DataStage
DataStage is a product which allows data from a multitude of sources and source
technologies to be moved into a DB2 database, Netezza appliance, or many
other data environments. It provides many built-in functions for transforming
data, thereby reducing the skills required to establish a source-to-target
data load.
Sunrise will primarily use DataStage as the means to acquire data into the
Sunrise warehouse. Below is an example source-to-target data flow defined in
DataStage.
For DataStage services, as with reporting, Sunrise uses the IBM Worldwide BACC
infrastructure.
CastIron Appliance
CastIron, another IBM acquisition, created a robust series of source-to-target
functionality which operates across many different exchange technologies.
Although the technology exists and is in use by Sales Connect to acquire
tactic plan data from PDb/Sunrise, the infrastructure is not as pervasive as
that of DataStage. Additionally, DataStage resource is relatively easy to
acquire.
CastIron
1. When local results limit acquired data from the source database (like
tactic in EST with PDb).
2. When the source or target applications have CastIron appliance
infrastructure.

DB2 Replication
1. When the source table and target table have the same structure.
2. When near real time data is required.
3. When database logging is not circular.

DataStage
1. Net change based on a solid source update date process.
2. Change data capture if the source is enabled.
3. Inserts into a stage table from the source application.

LEI
Connects to a Notes database and inserts new or updated records along with
deleted records.

GSA File
When the production process consistently outputs the same file structure and
rework is needed to change the file structure.
Most IBM BDS reference data becomes dimensions in Sunrise. The reference data
values are loaded and managed in the warehouse environment. Part of this
process includes the mapping of any replaced-by values as well as the expiration
of values that are no longer to be used but which were not replaced. At each
month's BDS review the Sunrise standards delegate identifies any changes to
values used by Sunrise. For each month where there is a change, a requirement
is opened. From this the reference data will be instantiated and go effective
when the BDS standard requires it. Any existing standard values which are
expired or made obsolete have their expiration date set to the day the BDS
documented.
Data Design
The following section covers the high-level design approach for establishing and
managing data throughout the Sunrise data environments.
Data Classification
Control Elements
There is a class of fields which are used by data source applications, the Sunrise
warehouse, and ETL processes to ensure that data is accurately managed. These
elements serve no business purpose other than to prove that data is being
appropriately managed. This section covers some specific control elements
found in the end-to-end management of data for Sunrise along with the
rationale for each.
Not all control elements identified above apply to all data sources or the data that
Sunrise manages in the warehouse.
City name
Tactic ID
In market date
Offer name
Job title
Customer set
Net change records a record to the history table only when the business
data content changed. This results in far fewer records being recorded to
the data area history table, as compared to a refresh type data source. This
mode, though, does not provide an easy means to determine and measure all
data at all points in time, although it is possible to do. History from net
change is primarily used for diagnostic purposes to explain why something
reported results at one point in time but not at another.
Synchronization Mode
Over time source applications will apply manual updates to records, due to
some issue discovered with application functionality, without updating the
last update timestamp (if available). In addition, time-based issues with
GMT versus local time and daylight savings time will sometimes cause a net
change process to miss some records which were changed, added or deleted.
To accommodate all the issues that are otherwise not easily identified, any
data area which operates in net change mode also has the ability to acquire
a sync set of data. The load process, at the delete step, will perform a
synchronization check between stage content and the active content. It knows
the data area is in sync mode because all records have a change indicator of
S. The S will be changed to a D when it is determined that a record is in the
active table but not the load table and it is within the archive time limits of
the data area. These records will then be used in the normal net change
process to determine any missing records (additions), any changes (updates),
along with any deletes.
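The delete-step check above can be sketched as follows. This is a simplified model under stated assumptions: the key sets, the archive window expressed in days, and the function name are all illustrative, not actual Sunrise load code.

```python
from datetime import date, timedelta

def sync_deletes(active, staged, archive_days, today):
    """active maps business key -> last change date; staged holds the keys
    delivered in the sync set (all arriving with change indicator 'S').
    Records in active but missing from the sync set, and inside the archive
    window, get their indicator flipped to 'D' for the normal net change
    delete path."""
    cutoff = today - timedelta(days=archive_days)
    return {key: "D" for key, changed in active.items()
            if key not in staged and changed >= cutoff}

active = {"A": date(2014, 1, 10),   # present in the sync set: untouched
          "B": date(2014, 1, 2),    # missing, recent: flagged for delete
          "C": date(2009, 5, 1)}    # missing, outside archive window: kept
staged = {"A"}
flagged = sync_deletes(active, staged, 365, date(2014, 2, 1))
# flagged == {"B": "D"}
```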
The above is a very short list of the more than 100 elements marketing
currently uses when integrating marketing content with sales opportunities to
understand how marketing activities support sales efforts.
Control Area
In order to understand how well a source acquire process is executing, as
well as to understand historical acquire sessions, two primary control tables
are used. The first is used to store the status of each source data area being
acquired, indicating success for each data area by updating that area's value
from a zero to a one. This table is also used to store acquire validation
results, which correlate content using prior record content as well as acquired
data area content, comparing that content to another acquired data area's
content. A diagram depicting run control tables for MAT is below.
The run status table has only one row for each executing run. Once an
acquire process completes, the run status record along with the data area
status records are inserted into their respective history tables.
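The control flow above — a zero-to-one flag per data area, then a move to history on completion — can be modeled in a short sketch. The class and method names are illustrative assumptions; the real control tables live in DB2 and hold more columns than shown here.

```python
class RunControl:
    """Toy model of the two-level run control: one run status row per run,
    plus a per data area status flag flipped from 0 to 1 on success."""
    def __init__(self, run_date, run_seq, data_areas):
        self.run_date, self.run_seq = run_date, run_seq
        self.area_status = {area: 0 for area in data_areas}
        self.history = []

    def mark_acquired(self, area):
        self.area_status[area] = 1      # success for this data area

    def complete(self):
        # on completion the status records move to their history tables
        self.history.append((self.run_date, self.run_seq,
                             dict(self.area_status)))
        return all(v == 1 for v in self.area_status.values())

rc = RunControl("2014-02-01", 1, ["TACTIC", "RESPONSE"])
rc.mark_acquired("TACTIC")
rc.mark_acquired("RESPONSE")
ok = rc.complete()   # True only when every data area acquired successfully
```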
The second control table stores start and completion timestamps as well as
counts for each data area acquired. The diagram for the current run and the
historical data area results is below.
The run date and run sequence columns are used to join the data area status
details to each run occurrence.
Staging Area
When source data is acquired to support Sunrise business data requirements,
each source data area, e.g., a DB2 table from a source, will be inserted into a
stage table. The acquire process will deliver all data from a source to that
source's data area staging table. The following information provides context
and rules for acquiring data into the warehouse.
Each stage table, no matter the source, will have a column with the source
system timestamp from when the acquire process started.
1. The source query may include one or more tables/documents but the
target will always be a single data area warehouse stage table.
2. Warehouse data management processes will not start until after all
data areas have been successfully acquired from the source.
3. When an acquire process starts it will record an entry for the data area
into the data area detail control table along with the start timestamp
from the Sunrise warehouse.
4. Once a data area has been successfully acquired, the completion
timestamp from the Sunrise warehouse will be set in the data area
detail control area.
Load Area
Once all the source data areas have been successfully acquired the
warehouse data management processes begin. The first of these processes
involves loading the data from the acquired source. This load process can
involve any of the following capabilities.
Active Area
Once a source's data area has completed the load process, that data area's
content will be immediately applied to that data area's active table.
Depending on the data area type, one of the following two operations will
occur. For net change data areas the following processes execute.
1. For record deletions, the active area process will insert every deleted
record from the active table into the delete table, including the deletion
timestamp. Once the deleted records are in the deletion table, they are
deleted from the active table.
2. For record updates, the current active area record will be inserted into
that area's history table. Once the updated records are in the history
table, they are deleted from the active table since the updated record
in the load table will be inserted in its place.
3. For record additions, the added record from the load table will be
inserted into the active data area table.
For data areas which operate as complete refresh data areas, the
following process occurs. These steps preserve the record content at each
point in time to easily support in-depth over-time analysis.
1. The complete set of data in that data area's load table is inserted into
that data area's active table.
2. The complete set of data in that data area's active table is inserted into
that data area's history table.
Type R data areas will always have the complete set of records in the
history database area.
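The net change movement between the load, active, history and delete areas can be sketched as a single pass over the load records. This is a simplified in-memory model; the change indicator values and the function name are assumptions made for illustration, and in the warehouse each area is of course a DB2 table, not a dict or list.

```python
def apply_net_change(load, active, history, deletes, now):
    """load/active map business key -> record; each load record carries a
    change indicator: 'D' delete, 'U' update, 'A' addition."""
    for key, rec in load.items():
        if rec["chg"] == "D" and key in active:
            # deletions: copy to the delete table with a timestamp, then remove
            deletes.append(dict(active.pop(key), deleted_ts=now))
        elif rec["chg"] == "U" and key in active:
            # updates: prior version goes to history, load record replaces it
            history.append(active.pop(key))
            active[key] = rec
        elif rec["chg"] == "A":
            # additions: insert straight into the active table
            active[key] = rec
    return active

active = {"1": {"chg": "A", "v": 1}}
history, deletes = [], []
load = {"1": {"chg": "U", "v": 2}, "2": {"chg": "A", "v": 9}}
apply_net_change(load, active, history, deletes, "2014-02-01")
# active now holds the updated record and the addition;
# the prior version of record "1" sits in history.
```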
Historical Area
As described in the process outline in the Active Area section above, the
historical area will contain older records. The older record could have been
identified by the warehouse net change load process, or will be every record
when the data area is a complete refresh area.
Delete Area
As described in the process outline in the Active Area section above, the
delete area will contain records which were physically removed from the source due
to an application or use error. Records that are archived or aged out in the
source will be maintained in the active warehouse area.
The diagram above covers data areas that operate in net change mode as well as
those which are refresh types. One rule to remember is that a data area operating
in complete refresh mode will never have any records in the delete area, since all
that data area's records are point in time and stored in history.
Dimension
Although dimensions are sourced from business data standard repositories or
marketing-specific standard repositories, there is additional work performed on
them by the warehouse. Since marketing requires two views of data available at
all times, with a near real time presentation to the user required, dimensions
have an additional element appended by the warehouse. This element, the current
dimension key for each dimension value, is the second column of each dimension
and is maintained by warehouse processes.
To illustrate this element's use, let's take the example where the System/34 brand
was replaced by the System/36 brand, which was replaced by the AS/400 brand,
which was later replaced by the iSeries brand. While each dimension record
maintains linkage to the value which replaced or superseded it, providing marketing
a current view of all System/34 through iSeries brands would require having a
normal dimension join to itself four times to get all records aligned to iSeries. With
the warehouse maintaining the current key value there is only one join, and all the
historical records are rolled up into the current brand's view; iSeries in this example.
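The one-join rollup can be shown with a small worked example. The keys, brand codes, and amounts below are invented for illustration; the point is that a single hop through the current key column aligns every historical brand to iSeries, where a replaced-by chain would need repeated self-joins.

```python
# Dimension rows carry (code, current_key); the single hop through
# current_key rolls every historical brand into the current view.
dim = {1: ("System/34", 4), 2: ("System/36", 4),
       3: ("AS/400", 4),    4: ("iSeries", 4)}
facts = [(1, 100), (2, 250), (3, 400), (4, 175)]  # (brand_key, amount)

rollup = {}
for brand_key, amount in facts:
    current_code = dim[dim[brand_key][1]][0]   # one join, not four
    rollup[current_code] = rollup.get(current_code, 0) + amount
# rollup == {"iSeries": 925}
```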
Below is a screen capture that depicts the EDGE brand hierarchy structure.
Sales opportunities can roll up from the product category level of the brand
hierarchy, level four, all the way up to the first level of that hierarchy (e.g., SWG,
IGS), allowing reporting to easily sum up value to a higher level in the brand
organization. Also shown in the dimension above are additional elements for
grouping dimension values. This is often referred to as creating an alternate
hierarchy. Marketing uses two of these grouping elements in the EDGE brand to
regroup content based on how marketing needs to analyze and/or report
brand.
SD132KEIW Source Data 1st Tablespace 32k Page Size for EIW
SI132KMPW Source Index 1st Tablespace 32k Page Size for MPW
BD1132KMKO Build Data 1st Tablespace 32k Page Size for Mktg
Opportunity
Table Compression
Starting with DB2 version 8, column and table level compression has been
available. This facility allows for the storage of more data in a smaller space.
Performance with compressed tables tends to be as good as uncompressed,
and in some cases it is better because more data is brought in for processing
with every I/O from the storage subsystem. Sunrise has standardized table
definitions to be compressed by default.
Index Creation
Indexes can provide high performance paths to access data in a DB2
database. The downside with indexes, depending on how they are defined,
includes:
1. Never selected for use to access data when too few values exist for the
index to use
In Sunrise every source will have its own set of tablespaces. This allows
the source content, once acquired and processed through the warehouse, to
be backed up by the SDC while still supporting select access to that source's
warehouse tables. In addition to source-specific tablespaces there are build-
specific spaces. These are defined to support data mart or analyst required
content that will be copied from the warehouse to the appropriate target
mart.
The following naming and size standards are applied to all Sunrise data
tablespaces:
1. Build (BLD)
2. Delivery (TGT)
3. Process Execution Control (PEC)
When reviewing any data within Sunrise, the process area which is primarily
responsible for the data content can be identified from the data area's table
name (e.g., content from EIW will have a process prefix of SRC).
Metadata
One of the critical requirements for Sunrise was to be able to adapt as business
needs change, in minutes where possible and always within two weeks for any
business requirement. In order to achieve this requirement, standard processes and
objects were created which utilize metadata to define what is required to
operationally manage Sunrise data. Each process area has a corresponding set of
metadata that execution will use to manage data. The following identifies the
metadata for each of the process areas.
The metadata change process involves updates to the master set of metadata
which, when applied to the Sunrise environment, versions the existing metadata to
a history table. This enables recovery in the event of any issues, and tracks
updates over time based on changing business needs. Below is an
ERD depicting the source metadata tables.
The diagram shows the elements involved in defining each source, the data areas
from the source, alert contacts to notify for issues at the source or data area levels,
the elements in each data area, partitioning information used when data is
acquired, and the recording of errors/warnings when issues arise.
timestamp of that completion and validates the content and/or record count based
on defined rules. If there are no issues the source acquire is set to a Success
status. Below is a more detailed flow DataStage follows in acquiring data.
1. Business Data Key (BDK) Sets the key(s) used to determine if business data
content changed for any record.
2. Active History Delete (AHD) Shifts data between load, active, delete and
history database areas depending on how that data is defined to be
managed.
3. Reference Data Key (RDK) Resolves data elements to Sunrise business and
reference data dimensions.
4. Active History Reference (AHR) Manages reference data dimensions
including versioning based on metadata definitions.
5. Rollback (RBK) Used to reset warehouse data to some previous point in
time due to a Sunrise error or bad content being acquired from or delivered
by a source.
The diagram below shows how staged data moves through the warehouse with the
end result in the active, history and delete database areas.
The design of each build has its own workflow to optimize the time required to
process the build. In many cases the build workflow is parallelized to help reduce
the time to have data available. Sunrise, as a general rule, has a requirement
that data be available in the dashboard by noon US Eastern for the weekly
management system. Over time the build workflows will be integrated with
process execution control to facilitate better management across all Sunrise
processes.
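A parallelized build workflow of the kind described can be sketched as dependency-ordered "waves", where every step whose prerequisites are complete runs concurrently. The step names and the scheduler itself are hypothetical; real Sunrise builds run under DataStage and process execution control, not this code.

```python
from concurrent.futures import ThreadPoolExecutor

def run_build(steps, deps):
    """steps: name -> callable; deps: name -> list of prerequisite names.
    Runs every step whose prerequisites are done in parallel waves."""
    done, waves = set(), []
    while len(done) < len(steps):
        ready = sorted(s for s in steps if s not in done
                       and all(d in done for d in deps.get(s, [])))
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        with ThreadPoolExecutor() as pool:
            # independent steps execute concurrently within a wave
            list(pool.map(lambda name: steps[name](), ready))
        done.update(ready)
        waves.append(ready)
    return waves

log = []
steps = {n: (lambda n=n: log.append(n))
         for n in ["EIW", "MAT", "CRM", "INTEGRATE"]}
waves = run_build(steps, {"INTEGRATE": ["EIW", "MAT", "CRM"]})
# The three source builds run together; the integration step waits for all.
```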
The build begins after the EIW, MAT, and CRM data has been acquired, processed
into the warehouse, and had each of their respective builds completed. The build
then integrates the MAT and CRM response as well as contact address data so
that no duplication of response or contact address exists. It then:
1. Aligns the response tactics with the master set of tactics generated by the
tactic simplification integration build, including identifying when auto-deploy
tactics are required for responses from countries which were not planned as
part of the tactic.
2. Aligns the response contact with opportunity contacts using creation and
deletion dates to determine the type of marketing influence for an
opportunity.
3. Based on the type of source which created an opportunity record, along with
associated marketing tactics, creates a response if one does not exist (e.g.,
an LDR created an opportunity but no response exists for the tactic the LDR
associated with the opportunity).
Once the data has been applied to the target environment the completion is recorded.
The content required for MDb users includes output from two build processes. Once
those processes have completed the DataStage delivery process is invoked.
After an acquire completes, all records are shifted to the history table shown on the
right for over-time operations and data source analysis.
This particular process status table records activity performed by the DAC
procedure. Since a procedure can be called in parallel by multiple processes, each
invocation records the calling process, its processing sequence, the from and to
data areas, the beginning and ending timestamps, and record counts. Additionally,
any errors or warnings that occurred will set the status and populate the error
description field, supporting operations diagnostics.
System Environment
Hardware
Software
AIX v6.1
DB2 v10.1
Storage
In order to enable increased utilization of CPU and memory, given the
software environment Sunrise operates within, the original storage approach
was reworked in 2013 to allow multiple concurrent parallel I/O operations
on multiple adapters with multiple buffers for multiple file systems which
contain Sunrise databases. The diagram below shows the end result of the
storage alterations made, which enabled Sunrise to meet its by-noon business
requirement.
Security Enablement
Data Access Groups
The following table defines the groups established for controlling access to
each of the database environments as well as content within each database
environment.
Sunrise Warehouse
srops Sunrise Operational Group containing any user ID which can execute
production processes.
1. srwhbld Sunrise warehouse operational user ID
2. srdmbld Sunrise data mart operational user ID
In addition to the source and IT maintained data area information content, the
source change tracking process also updates the dictionary records every time
there is a change to a data area, as well as when the change was captured. For each
data area element the following is to be entered:
Business data name
Business definition
Business purpose
Content example(s)
Business rules
Associated content standard
Additional information
An example of dictionary content is depicted below:
DTL_SSM_STEP_NO:
Business name: Sales cycle code
Definition: Contains a text value to identify various measurement
points in an opportunity's lifecycle.
Purpose: Used to understand where an opportunity is in the sales
lifecycle, as well as what stages it previously passed through along
with how much time it spent in each stage.
Example(s): 3, 6
Standard: BDS
SDC Integration
For performance as well as total recoverability, it is critical that the DB2 database
environment be managed in concert with Service Delivery Center (SDC)
operations.
Netezza Environment
Although DB2 is a highly scalable environment, performance does become an
issue when a table has a huge number of records. IBM acquired a
company, Netezza, with capabilities in high scale, high volume data
management. Netezza has a hybrid hardware and software
implementation. The BACC, as the intelligence and analysis center of
competency, has installed a Netezza appliance to support analytics and
reporting. Sunrise will utilize this environment, in concert with the DB2
environment, when size and scale dictate its use.
System Environment
Hardware
Software
MQ Client
DB2 v9.7
The following diagram provides an overall architectural view for BACC DataStage
environments.
The following link will navigate to the BACC home wiki page. The following screen
capture depicts the current DataStage environment.
Overview
System Environment
Hardware
Software
zLinux
Cognos v10.2.1.5
System Environment
Hardware
Software
zLinux
Each of the high level process steps above is defined and maintained in the annual marketing targets
implementation guide.
The red line at the bottom signifies two data relationships. First is that the initial IMT sector targets are
calculated by the IMT sector spread and second is that sub-brand and sector targets for the same
marketing program in the same IMT must have the same value.
Once the marketing target content is set in TM1 it is exported and loaded into the
Sunrise warehouse for use in the management measurement system.
once that source acquire completes. The SDC will maintain four backup
versions in case a restore is required.
NMON
The first tool is used to track what is executing in any Sunrise database at
any time. It provides not just what is executing but how much database
resource is being consumed and how efficiently the database process is
executing. With this tool operations can easily tell whether a process is
making progress. If the process is executing, but not performing reads and/or
writes at tens of thousands to a million-plus records per second, ops will
watch that process. If it continues to run without apparent progress, ops
will contact development for a review.
The second tool provides the ability to monitor the AIX environment that the
database is executing within. This allows operations to see how much system
resource each Sunrise database is consuming, including CPU, memory and I/O.
This is where operations has found issues with the I/O subsystem's
performance given the Sunrise database load at the time.
Infrastructure Environment
Systems
The Sunrise development environment is supported on a pSeries platform in
the Lexington DST environment. The pSeries configuration for development
is:
Thirty-two processors
132GB memory
Software
The development environment maintains the same software levels as the
AHE production environment, although it may have additional environments
at the next level of software for testing.
AIX v6.1
DB2 v10.1
DB2 v10.5
Databases
At the time of this update the following databases are defined on
development:
SRWHSE
SRDM
LPDB
UPDB
MARSRaw/ADR
SRWHSE10