Cliff Longman,
Chief Technology Officer
Kalido
Executive summary
Kalido automates the implementation of 20 data warehouse best practices as defined by Ralph Kimball and Bill
Inmon, the two best known data warehouse gurus.
This results in three major benefits:
Time to value
Kalido data warehouses are developed 50% faster than conventional data warehouses. In addition, Kalido
data warehouses are flexible, so adapting to new requirements is also much faster than with conventional
data warehouses.
Reduction in total cost
The total number of person days required to deliver and then maintain a Kalido data warehouse is 75% less
than for a conventional data warehouse.
Risk mitigation
Human error is significantly reduced, resulting in accurate business reports even through complex business
change.
This paper explains in detail how Kalido implements these best practices. It is written with
the project manager, data architect and data warehouse developer in mind as they consider how to use Kalido in
their information management projects.
Table of Contents
1. Introduction
2. Design goals
3. Architectural structure and features
3.3. Terminology
4. Design techniques and how Kalido supports them
4.5. Roles
non-additive facts/measures
UOM conversions
4.11. Multiple grain transaction data
4.19. Staging
5. Summary table
6. Conclusion
7. About Kalido
1. Introduction
We are frequently asked how Kalido supports industry best practices for building and running data warehouses.
This paper answers that question. Kalido delivers a packaged software solution for implementing tried and
tested industry best practices in a way that eliminates human error and greatly speeds the development and management of data warehouse environments.
We have developed a list of industry best practices culled directly from The Data Warehouse Lifecycle
Toolkit by Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite (Wiley, 1998). We have
enhanced this list to include techniques recommended by Bill Inmon in his specification for DW 2.0
(www.inmoncif.com/registration/news/dw2.php) and also some more recent Kimball design tips from
www.ralphkimball.com.
For each technique, we have paraphrased a description and given references to The Data Warehouse Lifecycle
Toolkit book, DW 2.0 specification (page numbers in angle brackets <>) or design tip at which you can find a
full description of the best practice advice. We have then described how Kalido enforces or supports the practice
through its software products.
This paper is divided into three main sections. The first section highlights the major design goals that Kimball
and Inmon recommend as drivers for data warehouse development. The second section describes how Kalido
fits into the overall architecture of a data warehouse. The third section itemizes 20 best practices one by one and
describes how Kalido enforces or supports these best practices. Finally, the summary gives a quick reference
table of all the items discussed with a rating for Kalido's support for each.
2. Design goals
Three primary data warehouse design goals stand out from the literature. These design goals have been at the
forefront of the Kalido design team's consideration from the earliest stages of the software's initial development.
Originally built to address the needs of the Royal Dutch Shell Corporation as it attempted to reconcile
performance data from its world-wide operations during the early 1990s, Kalido software has been developed with
the following in mind:
The decision regarding which Kalido architecture to choose is driven by the disparity of the marts (similar marts
tend to favor a single Kalido instance with the marts supported by results sets, dissimilar marts tend to favor
multiple Kalido instances).
If the marts have to be supported by different DBMSs, then there must be at least one Kalido instance for each
DBMS to be used.
3.3. Terminology
The following table shows the Kalido terminology equivalents to the Kimball/Inmon terms.
Kimball/Inmon
Kalido
Comment
Dimension
Dimension
Dimensional Table
Fact
Transaction
Source System
Fact type
Class of transaction
Surrogate key
OID
Naming scheme
Dimensional models
Figure 4.1.
Kalido also ensures that other features of the system as a whole are kept intact alongside slowly changing
dimensions, for example automated summaries, queries that run across periods of change, currency and other
unit of measure conversions, etc. All these other features are executed in conjunction with the slowly changing
dimensions so that the business user gets the right information when Kalido produces output.
4.5. Roles
Roles describe a situation in which the same class is used for two different purposes. An example would be the
Company class being used in the role of Customer and also Supplier (i.e., we sell to companies; we also
buy from companies). Kimball describes roles <pages 223-226>, focusing especially on examples of dates/times
as a class with many roles.
Kalido has roles built in. Part of the business model involves specifying the roles that a class can play, and also
the role that is being played when the class is referred to by a fact (e.g., the customer and the supplier for a
purchase fact). These roles apply to relationships defined in the business model.
Kalido also implements subtypes, which assist with validation and are sometimes confused with roles. For example, a
customer may have a mandatory credit-rating attribute, whereas a supplier would not but may have a maximum
lead time attribute. This is achieved through class of business entity subtypes with rich cardinality rules; for
example, a super-type entity can belong to zero, one or many of its sub-types.
Kalido also validates the role fields in each transaction record, checking whether the company referred to is
allowed to play that particular role (e.g., supplier).
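The role check described above can be illustrated with a minimal sketch. This is not Kalido code: the Company structure, the field names and the validate_roles function are hypothetical, and the sketch only assumes that each company carries a set of roles it is allowed to play and that each fact type declares which role each of its fields refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Company:
    code: str
    allowed_roles: frozenset  # roles this company may play (hypothetical model)


def validate_roles(record, companies, role_fields):
    """Return error messages for role fields that fail validation.

    role_fields maps a field name in the record to the role that the
    referenced company must be allowed to play.
    """
    errors = []
    for field_name, role in role_fields.items():
        code = record[field_name]
        company = companies.get(code)
        if company is None:
            errors.append(f"{field_name}: unknown company {code!r}")
        elif role not in company.allowed_roles:
            errors.append(f"{field_name}: {code!r} may not act as {role}")
    return errors


# Illustrative master data: ACME both buys and sells, BOLT only buys.
COMPANIES = {
    "ACME": Company("ACME", frozenset({"customer", "supplier"})),
    "BOLT": Company("BOLT", frozenset({"customer"})),
}

# A purchase fact refers to the Company class twice, once per role.
PURCHASE_ROLES = {"supplier_code": "supplier", "customer_code": "customer"}
```

A purchase record naming BOLT as the supplier would be rejected in this sketch, while the same company is accepted in the customer field.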
Kalido also allows result sets (data generated as output) to be physically stored and labeled as part of the
business model (by creating a new transaction dataset). This allows complex algorithmically defined summaries
(such as allocations) to be calculated using custom algorithms and then stored under the scope of the Kalido
business model, affording them the ability to be selected and evolved under business model control.
Time variance needs to be taken into account when creating summaries. Each summarization process requires
a choice between aggregating with the current parents and aggregating with the parents that applied at the time
each transaction took place. Kalido automates this for you.
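The two aggregation choices can be sketched as follows. This is an illustration under stated assumptions, not a description of Kalido internals: the parent history is assumed to be a list of dated child-to-parent assignments with an exclusive end date, and the names PARENT_HISTORY, parent_on and summarize are hypothetical.

```python
from collections import defaultdict
from datetime import date

# (child, parent, valid_from, valid_to); valid_to is exclusive.
PARENT_HISTORY = [
    ("emp1", "dept_A", date(2005, 1, 1), date(2005, 7, 1)),
    ("emp1", "dept_B", date(2005, 7, 1), date.max),
]


def parent_on(child, when, history):
    """Return the parent that applied to child on the given date, or None."""
    for c, parent, start, end in history:
        if c == child and start <= when < end:
            return parent
    return None


def summarize(transactions, history, as_of=None):
    """Sum amounts by parent.

    as_of=None uses the parent at the time of each transaction;
    otherwise every transaction is assigned to the parent on as_of.
    """
    totals = defaultdict(float)
    for child, when, amount in transactions:
        lookup_date = when if as_of is None else as_of
        totals[parent_on(child, lookup_date, history)] += amount
    return dict(totals)


# Two sales by the same employee, one before and one after a move.
SALES = [("emp1", date(2005, 3, 1), 100.0), ("emp1", date(2005, 9, 1), 50.0)]
```

With these sample rows, summarizing by parent at transaction time splits the sales between dept_A and dept_B, whereas summarizing with the current parent (for example as_of=date(2006, 1, 1)) places both sales under dept_B.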
aggregates and summaries
Figure 4.6.
Figure 4.7.
margin by individual sales. Note that the algorithms and schemes for allocating to a lower grain are specific to
each company and sometimes have multiple alternatives within a single company.
Kalido supports allocation by defining a new measure and transaction data set (TDS) for each allocation method (so,
for example, there may be a measure called "costs allocated by headcount" and a TDS called "daily headcount
allocated costs"). A script, program or ETL tool is used to implement the allocation algorithm depending on the
complexity of the allocation algorithm. With Kalido, implementers should bear in mind that models may change,
so the most flexible (but most difficult) allocation algorithms are driven from the Kalido meta data rather than
being hard coded to expect certain classes, attributes and associations in a defined configuration. Implementers
should also be careful to re-run an allocation should any of the relevant master data change historically (e.g., if
an employee entity is updated on 1/1/2006 to be in a different department for a past period, such as between
1/1/2005 and 6/1/2005).
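As a minimal sketch of one allocation method, the function below spreads a cost across departments in proportion to headcount. It stands in for the script, program or ETL step mentioned above; the function name and the shape of the inputs are assumptions made for illustration, and a real implementation would read the classes and associations from the Kalido meta data rather than take a plain dictionary.

```python
def allocate_by_headcount(total_cost, headcount):
    """Split total_cost across departments in proportion to headcount.

    headcount maps a department name to its number of employees.
    """
    total_heads = sum(headcount.values())
    if total_heads == 0:
        raise ValueError("cannot allocate: total headcount is zero")
    return {
        dept: total_cost * heads / total_heads
        for dept, heads in headcount.items()
    }
```

If the headcount for a past period is later corrected, calling the function again with the corrected figures reproduces the re-run that the text recommends.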
Sometimes, the same type of transaction data may be available from multiple sources with different granularity.
For example, we may have a legacy sales order processing system capturing the revenue and volume by product,
customer and day and another ERP system with sales transactions capturing revenue, volume, discount and
distribution cost by product, customer-shipped-to (as a child of customer-sold-to), sales representative and day.
Having some different measures and referring to different columns in the dimension tables would make the
aggregation at common levels more difficult, resulting in complex SQL unions. Kalido handles such multi-granular
transactions automatically: it resolves the common granularity from its meta data and adds up measure values
correctly, while more granular totals remain available down to the individual source granularities.
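The idea of resolving a common granularity can be sketched using the two sources from the example above. This is an illustration only: the row layout, the SHIP_TO_PARENT mapping and the to_common_grain function are hypothetical, and the sketch assumes that the common grain is product, customer-sold-to and day, with revenue and volume as the only measures shared by both sources.

```python
from collections import defaultdict

# Measures present in both sources; discount and distribution cost
# exist only in the ERP data and are not summed at the common grain.
COMMON_MEASURES = ("revenue", "volume")

# Hypothetical master data: each customer-shipped-to rolls up to a customer-sold-to.
SHIP_TO_PARENT = {"acme_north": "acme", "acme_south": "acme"}


def to_common_grain(legacy_rows, erp_rows, ship_to_parent):
    """Sum the common measures by (product, customer, day) across both sources."""
    totals = defaultdict(lambda: dict.fromkeys(COMMON_MEASURES, 0.0))

    def add(key, row):
        for measure in COMMON_MEASURES:
            totals[key][measure] += row[measure]

    for row in legacy_rows:
        add((row["product"], row["customer"], row["day"]), row)
    for row in erp_rows:
        # Roll the finer ERP grain up to the customer-sold-to level.
        customer = ship_to_parent[row["ship_to"]]
        add((row["product"], customer, row["day"]), row)
    return dict(totals)


LEGACY = [
    {"product": "p1", "customer": "acme", "day": "2006-01-02",
     "revenue": 100.0, "volume": 10.0},
]
ERP = [
    {"product": "p1", "ship_to": "acme_north", "sales_rep": "r1",
     "day": "2006-01-02", "revenue": 40.0, "volume": 4.0, "discount": 2.0},
    {"product": "p1", "ship_to": "acme_south", "sales_rep": "r2",
     "day": "2006-01-02", "revenue": 60.0, "volume": 6.0, "discount": 1.0},
]
```

The ERP rows keep their finer grain in the source list, so totals by sales representative or customer-shipped-to remain available from that source alone.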
Kalido also provides a construct known as the coding structure, which allows facts to be recorded against any
one of a number of levels in a hierarchy (e.g., a budget that may be for a team, a department, a division or a
company). Kalido automatically generates the right SQL to aggregate facts recorded against a coding structure.
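The aggregation that a coding structure requires can be sketched in a few lines. This is not the SQL that Kalido generates; it is a small in-memory illustration that assumes a simple child-to-parent map, and the names PARENT, BUDGETS and rollup are hypothetical.

```python
# Hypothetical hierarchy: teams belong to departments, departments to a division.
PARENT = {
    "team_x": "dept_1",
    "team_y": "dept_1",
    "dept_1": "div_a",
    "dept_2": "div_a",
    "div_a": None,
}

# Budget facts recorded at different levels of the same hierarchy.
BUDGETS = [("team_x", 10.0), ("dept_1", 5.0), ("dept_2", 20.0)]


def ancestors_and_self(node, parent):
    """Yield the node followed by each of its ancestors up to the root."""
    while node is not None:
        yield node
        node = parent[node]


def rollup(facts, parent):
    """Total each node as the sum of facts recorded at it or below it."""
    totals = {}
    for node, amount in facts:
        for target in ancestors_and_self(node, parent):
            totals[target] = totals.get(target, 0.0) + amount
    return totals
```

In this sample, the total for dept_1 combines the budget recorded directly against the department with the budget recorded against team_x.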
is that when the master data changes, more (or fewer) columns in the mapping table may appear (based on the
number of levels in the data). Kalido automatically handles this upgrade to both table and data, but it does mean
that table structure may change in a production environment, unless this type of change is handled through the
development/test/QA/release environment (as it typically is in most non-Kalido systems). It is worth noting that
with Kalido this is simply modeled as an involution on the business model, and although this never results in
a SQL CONNECT BY statement (to Kimball's point), it does simply work by providing the right results for the
user (and does this in a way that continues to work despite changes in the hierarchical structure such as
re-organizations, including aggregations that span a reorganization).
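To show why the number of columns in a mapping table can follow the depth of the hierarchy, the sketch below flattens a self-referencing parent-child relationship into one row per node with one column per level. The layout is an assumption made for illustration and is not the structure Kalido actually generates; the flatten function and the level_N column names are hypothetical.

```python
def flatten(parent):
    """Flatten a child-to-parent map into level columns.

    Returns (columns, rows), where rows maps each node to its path from the
    root, padded with None up to the depth of the deepest node.
    """
    paths = {}
    for node in parent:
        path = []
        current = node
        while current is not None:
            path.append(current)
            current = parent[current]
        paths[node] = list(reversed(path))

    depth = max((len(path) for path in paths.values()), default=0)
    columns = [f"level_{i + 1}" for i in range(depth)]
    rows = {
        node: path + [None] * (depth - len(path))
        for node, path in paths.items()
    }
    return columns, rows
```

Adding a node one level below the deepest existing node makes the sketch return one more column, which is the kind of structural change described above.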
4.19. Staging
Kimball describes staging in great depth <pages 609-663>. The description deals with approaches to surrogate
key management, database versus flat file staging, incremental loading, final data loading, error handling/flagging,
dimensional change logic, historic data loads, data quality and cleansing. In the Kimball model, the staging area
is a significant part of the data warehouse architecture. Kalido provides either automation or support for each of
the items mentioned above, and as a result, the staging area in a Kalido implementation tends to be somewhat
simpler than would be the case in a Kimball implementation. That is not to trivialize what can be a very complex
and difficult part of a data warehouse project (poor quality data in particular still needs to be sorted out); it is
just that Kalido automates or obviates the need for a significant proportion of standard Kimball-style staging
activity. Here are the main points:
Database versus flat file staging
Kalido supports both. Files can reside on the server running the load utility or be accessed via an FTP connection.
Incremental loading
Kalido will do delta detection for dimensional data if required (so a complete dump of a dimension can be
loaded each period, allowing Kalido to do the incremental update as necessary). If source system or ETL
tools are to do the change data detection, Kalido will accept a file of deltas.
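Delta detection against a complete dump of a dimension can be sketched as a comparison of two snapshots keyed by the source identifier. This is a generic illustration rather than Kalido's load logic; the detect_deltas function and the dictionary layout are hypothetical.

```python
def detect_deltas(previous, current):
    """Compare two full snapshots of a dimension, each keyed by source identifier.

    Returns the rows to insert, the rows to update and the keys that
    no longer appear in the current snapshot.
    """
    inserts = {key: row for key, row in current.items() if key not in previous}
    updates = {
        key: row
        for key, row in current.items()
        if key in previous and previous[key] != row
    }
    deletes = sorted(key for key in previous if key not in current)
    return {"insert": inserts, "update": updates, "delete": deletes}
```

When the source system or ETL tool already produces a file of deltas, this comparison step is simply skipped.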
5. Summary table
The table below summarizes Kalido's support for the industry best practices for developing data warehouses.
For each feature we have given an indication of whether Kalido supports the best practice or not, and if it does
support the best practice, the level of support Kalido offers.
Warehouse Feature
Kalido support
Level of Support
Dimensional models
Surrogate keys
Roles
Time as a dimension
Sparse facts
Many-many dimensions
Degenerate dimensions
Junk dimensions
Audit dimension
Staging
6. Conclusion
While additional data warehousing best practices are advocated by both Ralph Kimball and Bill Inmon, as well
as by many other practitioners, this paper has focused on the 20 forming the bulk of the published literature for
data warehouse design as defined by Kimball and Inmon. From its inception, Kalido has kept the most critical of
these best practices firmly in mind while developing and improving its software products: incremental development,
efficiency and ease of understanding, and graceful adaptation to change.
By using Kalido to build and manage their data warehouse, organizations can rest assured that it conforms to the
best practices as advocated by the industry's best known experts. More important, a Kalido data warehouse will
deliver the accurate, consistent, accessible information your company needs to manage and run your business
over time, as it changes.
For more information about Kalido software, please visit our web site at http://www.kalido.com. To learn more
about how Kalido supports industry best practices, please contact us at info@kalido.com.
7. About Kalido
Kalido delivers active information management for business. With Kalido's unique business-model-driven
technology, decisions are fueled by accurate, accessible and consistent information, delivered in real time, to
dramatically improve corporate performance. Kalido software can be deployed at a fraction of the time and cost
of traditional information management methods.
Kalido software is installed at over 250 locations in more than 100 countries with market leading companies.
Headquartered in Burlington, Massachusetts, Kalido is backed by Atlas Venture, Benchmark Capital and Matrix
Partners. More information about Kalido can be found at: http://www.kalido.com.
Contact Information
US Tel:
+1 781 202 3200
Eur Tel:
+44 (0)845 224 1236
Email:
info@kalido.com
Copyright 2007 Kalido. All rights reserved. Kalido, the Kalido logo and Kalido's product names are trademarks of Kalido.
References to other companies and their products use trademarks owned by the respective companies and are for reference purpose only.
WP-DWBP0307