

Clarity in Data Warehousing


From the point of view of Randy Grenier

February 5, 2011

Operational Data Stores (ODS)

Introduction
An operational data store (ODS) is an architectural component of a data warehouse that is
used for immediate reporting with current operational data. An ODS contains lightly
transformed and lightly integrated operational data with a short time window. It is used for
real-time and near-real-time reporting.

Unlike data marts, an ODS is not refreshed from the data warehouse history tables. Rather, it
is loaded directly from operational data, a staging area, or incoming files. It can optionally
serve as a data source for the data warehouse.

An ODS must be refreshed frequently so that it contains very current data. An ODS can be
updated daily, hourly, or even immediately after transactions on operational data.

Transformation and Integration
The update frequency and currency of the data in the ODS are directly related to the amount
of transformation and integration that is performed on the data. The development team makes
choices by weighing how long various transformation processes will take to complete against
how current the data must be for reporting. For example, if real-time reports or dashboards
require data within minutes or even seconds after data events, it may not be possible to
perform time-consuming transformation or integration processing. Also, dimensions and other
reference data in the data warehouse may not be as current as new operational data.
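
To make the trade-off concrete, here is a small Python sketch that picks which transformation steps fit a given staleness budget. The step names and run times are hypothetical, and it assumes the steps run in sequence with each depending on the one before it; a real team would plug in measured timings from its own ETL.

```python
# Hypothetical transformation steps with estimated run times in seconds.
# Names and timings are illustrative assumptions, not measurements.
STEPS = [
    ("replicate_rows", 5),
    ("standardize_codes", 20),
    ("lookup_surrogate_keys", 90),
    ("aggregate_daily_totals", 300),
]

def steps_within_budget(steps, max_staleness_seconds):
    """Return the leading steps whose cumulative run time fits the staleness budget.

    Steps are assumed to run in order, each depending on the previous one,
    so selection stops at the first step that would exceed the budget.
    """
    chosen, elapsed = [], 0
    for name, seconds in steps:
        if elapsed + seconds > max_staleness_seconds:
            break
        chosen.append(name)
        elapsed += seconds
    return chosen

# A dashboard that may be at most one minute behind the source:
print(steps_within_budget(STEPS, 60))    # ['replicate_rows', 'standardize_codes']
# An hourly report can afford every step:
print(steps_within_budget(STEPS, 3600))
```

With a one-minute budget only the two quick steps fit, so key lookups and aggregation would have to wait for a slower load such as the data warehouse refresh.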

Some degree of transformation and/or integration is usually required for reports. Bill Inmon
defines five ODS classes.[1] The classes represent different trade-offs between the ease and
speed of refresh and the degree of integration and transformation. For example, a Class I ODS
would simply consist of direct replication of operational data (no transformation), whereas a
Class V ODS would consist of highly integrated and aggregated data (highly transformed). A
Class I ODS would be the quickest and simplest to refresh, while a Class V ODS would involve
the most complex, time-consuming processing.
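
The two ends of this range can be sketched in Python. The source rows, field names and key formats below are made-up assumptions; the point is only the contrast between a Class I style copy and a much simplified Class V style integrated, aggregated load. The classes in between are not modeled here.

```python
from collections import defaultdict

# Hypothetical rows from two operational systems that use different
# customer-ID formats and currency units (illustrative assumptions).
crm_orders = [{"cust": "C-001", "amount_cents": 1250}, {"cust": "C-002", "amount_cents": 400}]
web_orders = [{"customer_id": 1, "amount": 7.50}]

def class_i_load(*sources):
    """Class I style: replicate rows as-is, with no transformation."""
    return [dict(row) for source in sources for row in source]

def class_v_load(crm, web):
    """Class V style (simplified): unify keys and units, then aggregate per customer."""
    totals = defaultdict(float)
    for row in crm:
        # "C-001" -> 1, and cents -> currency units
        totals[int(row["cust"].split("-")[1])] += row["amount_cents"] / 100
    for row in web:
        totals[row["customer_id"]] += row["amount"]
    return dict(totals)

print(class_i_load(crm_orders, web_orders))   # three rows in two different shapes
print(class_v_load(crm_orders, web_orders))   # {1: 20.0, 2: 4.0}
```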
Refresh Options
One option for refreshing an ODS with very current data is to use the transaction logs of the
operational databases to update tables replicated in the ODS. Database systems use transaction
logging to record all inserts, updates, and deletes to tables. Transaction logs are normally
used to roll back failed or cancelled transactions, or to reapply changes that were not
completed because of a system failure. However, if the tables are replicated in the ODS, the
transaction logs can also be used to refresh them. The replicated tables can then be used
as staging for reporting tables. In SQL Server and Oracle, using transaction logs to update
replicated data is called Change Data Capture (CDC).[2] Those products provide system
stored procedures and other tools to assist in applying logged changes to replicated data.
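
Here is a minimal sketch of log-based refresh, using Python's built-in sqlite3 module as a stand-in for the ODS database. The `changes` list is a hypothetical, already-decoded feed of insert (I), update (U) and delete (D) records in commit order. It is not the format of any product's CDC change tables, and the table and column names are illustrative.

```python
import sqlite3

# Hypothetical change records decoded from a source transaction log, in commit order.
# Real CDC products expose changes through their own tables and functions;
# this list only stands in for that output.
changes = [
    ("I", 1, "Alice", 100.0),
    ("I", 2, "Bob", 50.0),
    ("U", 1, "Alice", 120.0),
    ("D", 2, None, None),
    ("I", 3, "Carol", 75.0),
]

ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE ods_account (id INTEGER PRIMARY KEY, owner TEXT, balance REAL)")

def apply_changes(conn, records):
    """Replay insert/update/delete records against the replicated table in one transaction."""
    with conn:  # commits on success, rolls back if any statement raises
        for op, key, owner, balance in records:
            if op == "I":
                conn.execute("INSERT INTO ods_account VALUES (?, ?, ?)", (key, owner, balance))
            elif op == "U":
                conn.execute("UPDATE ods_account SET owner = ?, balance = ? WHERE id = ?",
                             (owner, balance, key))
            elif op == "D":
                conn.execute("DELETE FROM ods_account WHERE id = ?", (key,))
            else:
                raise ValueError(f"unknown operation {op!r}")

apply_changes(ods, changes)
print(ods.execute("SELECT * FROM ods_account ORDER BY id").fetchall())
# [(1, 'Alice', 120.0), (3, 'Carol', 75.0)]
```

Applying each batch in a single transaction means a failure leaves the replica at its previous consistent state rather than half-updated.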

Another option for keeping current data in an ODS is to use indexed views[3] (SQL
Server) or materialized views[4] (Oracle). Indexed views and materialized views are similar
to regular views, except that their result sets are physically stored, so they perform well
when queried. These views are created in the ODS schema and defined over the operational data.
Views can provide a relatively simple way to keep an ODS very current. However, they may not
be usable when data must be accessed on remote servers; a SQL Server indexed view, for
example, can reference only base tables in the same database.
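
SQLite, used here through Python's sqlite3 module, has neither indexed nor materialized views, so this sketch only emulates the idea. A regular view is re-evaluated on every query, while the hypothetical `mv_region_totals` table stores the view's result and must be rebuilt to pick up new data. SQL Server maintains indexed views automatically as base tables change, and Oracle can refresh materialized views on commit or on demand, so the staleness shown here corresponds to an on-demand refresh.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO orders (region, amount) VALUES ('east', 10), ('west', 5), ('east', 7);
    -- A regular view: re-evaluated every time it is queried.
    CREATE VIEW v_region_totals AS
        SELECT region, SUM(amount) AS total FROM orders GROUP BY region;
""")

def refresh_materialized(conn):
    """Emulate a materialized view by storing the view's current result in a real table.

    SQLite has no materialized or indexed views; this full rebuild is only a stand-in.
    """
    with conn:
        conn.execute("DROP TABLE IF EXISTS mv_region_totals")
        conn.execute("CREATE TABLE mv_region_totals AS SELECT * FROM v_region_totals")

refresh_materialized(conn)
conn.execute("INSERT INTO orders (region, amount) VALUES ('west', 100)")
conn.commit()

print(conn.execute("SELECT * FROM v_region_totals ORDER BY region").fetchall())
# [('east', 17.0), ('west', 105.0)]  always current
print(conn.execute("SELECT * FROM mv_region_totals ORDER BY region").fetchall())
# [('east', 17.0), ('west', 5.0)]    stale until the next refresh
```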

File system mirroring works at the file system level and uses mirrored data to refresh the
ODS. For example, EMC’s proprietary Business Continuance Volume (BCV) can be split from its
source so that it contains a snapshot of the data at a given point in time. After the snapshot
has been used to update the ODS, the mirror is reconnected to the source data and brought
back into sync by the system.
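
There is no portable way to split a storage mirror from Python, so this sketch uses SQLite's online backup API (`sqlite3.Connection.backup`, Python 3.7+) as an analogy. The backup plays the role of the split mirror: the ODS is loaded from that frozen copy while the source keeps changing, and the copy is then discarded. Table names are hypothetical, and this is not how EMC's storage-level mirroring is actually driven.

```python
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    INSERT INTO orders (status) VALUES ('open'), ('shipped');
""")

# 1. "Split the mirror": take a consistent point-in-time copy of the source.
snapshot = sqlite3.connect(":memory:")
source.backup(snapshot)

# The source keeps changing while the ODS is being loaded.
source.execute("INSERT INTO orders (status) VALUES ('open')")
source.commit()

# 2. Load the ODS from the frozen snapshot, not from the live source.
ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, status TEXT)")
ods.executemany("INSERT INTO ods_orders VALUES (?, ?)",
                snapshot.execute("SELECT id, status FROM orders"))
ods.commit()

# 3. "Reconnect the mirror": discard the snapshot; the next cycle takes a fresh one.
snapshot.close()

print(ods.execute("SELECT COUNT(*) FROM ods_orders").fetchone()[0])   # 2
print(source.execute("SELECT COUNT(*) FROM orders").fetchone()[0])    # 3
```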

Update triggers can be created on operational data tables to write to the ODS whenever the
data is updated. Triggers are infrequently used for ODS refresh because they require
modifying the operational database to add the trigger code. Also, because triggers add work
to every transaction, they are often not feasible for high-volume transaction processing.
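
Here is a minimal trigger-based refresh in SQLite, driven from Python's sqlite3 module. For simplicity the hypothetical `ods_orders` table lives in the same database as the operational `orders` table, whereas a real ODS would usually be a separate database or server. The trigger bodies run inside every operational transaction, which is where the performance cost comes from.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT);
    CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, status TEXT, changed_at TEXT);

    -- These triggers are added to the operational table itself, which is why
    -- this approach requires modifying the source system.
    CREATE TRIGGER trg_orders_ins AFTER INSERT ON orders
    BEGIN
        INSERT INTO ods_orders VALUES (NEW.id, NEW.status, datetime('now'));
    END;

    CREATE TRIGGER trg_orders_upd AFTER UPDATE ON orders
    BEGIN
        UPDATE ods_orders SET status = NEW.status, changed_at = datetime('now')
        WHERE id = NEW.id;
    END;
""")

# Ordinary operational writes; the ODS copy is maintained as a side effect.
conn.execute("INSERT INTO orders (status) VALUES ('open')")
conn.execute("UPDATE orders SET status = 'shipped' WHERE id = 1")
conn.commit()

print(conn.execute("SELECT id, status FROM ods_orders").fetchall())   # [(1, 'shipped')]
```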

When an ODS needs to be updated less frequently, conventional ETL processes can be
used. For example, if an ODS is updated only once daily, it may be feasible to export
operational data to files that can then be loaded quickly into the ODS.
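
A sketch of the daily export-and-load pattern in Python. An in-memory buffer stands in for the export file, and SQLite stands in for both the operational database and the ODS. The tables, columns and the full truncate-and-reload strategy are assumptions chosen for illustration.

```python
import csv
import io
import sqlite3

source = sqlite3.connect(":memory:")
source.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL);
    INSERT INTO orders (status, amount) VALUES ('open', 10.5), ('shipped', 20.0);
""")

# Extract: export the operational table to CSV (a nightly job would write a real file).
export = io.StringIO()
writer = csv.writer(export)
writer.writerow(["id", "status", "amount"])
writer.writerows(source.execute("SELECT id, status, amount FROM orders"))

# Load: CSV values are text, so convert types, then truncate and reload in one transaction.
ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, status TEXT, amount REAL)")
export.seek(0)
rows = ((int(r["id"]), r["status"], float(r["amount"])) for r in csv.DictReader(export))
with ods:
    ods.execute("DELETE FROM ods_orders")
    ods.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", rows)

print(ods.execute("SELECT * FROM ods_orders ORDER BY id").fetchall())
# [(1, 'open', 10.5), (2, 'shipped', 20.0)]
```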

ODS Tables
An ODS can contain its own staging tables as well as transformed tables for reporting. It is
common for an ODS to use staging tables separate from those of the rest of the data warehouse,
because the ODS is refreshed by different processes than the data warehouse.
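
A minimal sketch of an ODS that owns both kinds of table, again using SQLite from Python. The hypothetical `stg_orders` table receives raw text values from the refresh process, and a light transformation (type casts and decoding a status code) moves them into the `rpt_orders` reporting table. The code values and names are made up.

```python
import sqlite3

ods = sqlite3.connect(":memory:")
ods.executescript("""
    -- ODS-owned staging table: raw values exactly as delivered by the refresh process.
    CREATE TABLE stg_orders (order_id TEXT, status_code TEXT, amount TEXT);
    -- ODS reporting table: lightly transformed (typed columns, decoded status).
    CREATE TABLE rpt_orders (order_id INTEGER PRIMARY KEY, status TEXT, amount REAL);

    INSERT INTO stg_orders VALUES ('1', 'O', '10.50'), ('2', 'S', '20.00');
""")

# Light transformation only: cast types and decode codes; no history, no dimensions.
with ods:
    ods.execute("""
        INSERT OR REPLACE INTO rpt_orders (order_id, status, amount)
        SELECT CAST(order_id AS INTEGER),
               CASE status_code WHEN 'O' THEN 'open'
                                WHEN 'S' THEN 'shipped'
                                ELSE 'unknown' END,
               CAST(amount AS REAL)
        FROM stg_orders
    """)
    ods.execute("DELETE FROM stg_orders")  # empty staging once its rows are applied

print(ods.execute("SELECT * FROM rpt_orders ORDER BY order_id").fetchall())
# [(1, 'open', 10.5), (2, 'shipped', 20.0)]
```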

ODS reporting tables may not be as fully integrated and transformed as data warehouse tables,
because time-consuming ETL processing can prevent fast access to very current data. As a
result, dimensional modeling (star schema) is not always possible.

Reporting
The main purpose of an ODS is to provide reporting and querying on very current operational
data. Reports can cover only the short time window of data that the ODS retains. To query
history, the data warehouse and application-specific data marts must be used instead of the
ODS.
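
A sketch of how the short window might be enforced and queried, using SQLite from Python. The two-day retention window, the table, and the fixed "now" timestamp are assumptions chosen so the output is repeatable. Timestamps are stored as ISO 8601 strings, which compare correctly as text as long as every value uses the same format.

```python
import sqlite3
from datetime import datetime, timedelta

WINDOW = timedelta(days=2)  # hypothetical retention window for this ODS

ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE ods_orders (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)")
now = datetime(2011, 2, 5, 12, 0)  # fixed "current time" so the example is repeatable
ods.executemany("INSERT INTO ods_orders VALUES (?, ?, ?)", [
    (1, (now - timedelta(days=5)).isoformat(), 30.0),    # older than the window
    (2, (now - timedelta(hours=20)).isoformat(), 10.0),
    (3, (now - timedelta(minutes=5)).isoformat(), 2.5),
])

def purge_expired(conn, now, window):
    """Delete rows older than the window; history belongs in the data warehouse."""
    with conn:
        conn.execute("DELETE FROM ods_orders WHERE created_at < ?",
                     ((now - window).isoformat(),))

purge_expired(ods, now, WINDOW)

# A typical ODS report: activity in the last 24 hours.
since = (now - timedelta(hours=24)).isoformat()
result = ods.execute("SELECT COUNT(*), SUM(amount) FROM ods_orders WHERE created_at >= ?",
                     (since,)).fetchone()
print(result)   # (2, 12.5)
```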

Reports may also be affected by the limited transformation and integration of data in an
ODS. For example, the data may not contain the surrogate keys necessary for joins to
dimensions in the data warehouse, and data from multiple sources may not be completely
integrated into consistent structures and attribute values.
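
A sketch of the surrogate-key problem, using SQLite from Python with hypothetical tables. ODS rows carry only the operational system's natural key, so a report has to resolve the warehouse surrogate key at query time. A customer that is new since the last dimension load comes back with no key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Warehouse dimension: surrogate key plus the source system's natural key.
    CREATE TABLE dim_customer (customer_sk INTEGER PRIMARY KEY, customer_code TEXT, name TEXT);
    INSERT INTO dim_customer VALUES (101, 'C-001', 'Acme'), (102, 'C-002', 'Globex');

    -- ODS rows carry only the natural key from the operational system.
    CREATE TABLE ods_orders (order_id INTEGER, customer_code TEXT, amount REAL);
    INSERT INTO ods_orders VALUES (1, 'C-001', 10.0), (2, 'C-003', 5.0);
""")

# Resolve surrogate keys through the natural key. C-003 is a new customer that has
# not yet reached the (less frequently refreshed) dimension.
rows = conn.execute("""
    SELECT o.order_id, o.customer_code, d.customer_sk
    FROM ods_orders AS o
    LEFT JOIN dim_customer AS d ON d.customer_code = o.customer_code
    ORDER BY o.order_id
""").fetchall()
print(rows)   # [(1, 'C-001', 101), (2, 'C-003', None)]
```

The LEFT JOIN keeps such rows visible in the report instead of silently dropping them, as an inner join would.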

Summary
An ODS can provide access to current operational data for reporting. An ODS is loaded
directly from operational data and not from the data warehouse history tables. An ODS only
contains a short time window of data. If history is required, the ODS can be a data source for
the data warehouse.

An ODS must balance the frequency of refresh, the degree of transformation and integration,
and how current data in the ODS must be. A number of refresh options should be
considered based on application requirements.

An ODS can be a database, or a schema within a database, that contains both staging and
reporting tables. Its reporting tables may have limitations compared to data mart tables,
because operational data undergoes less transformation and integration in an ODS.

[1] http://www.information-management.com/issues/20000101/1749-1.html
[2] http://msdn.microsoft.com/en-us/library/cc645937.aspx
[3] http://technet.microsoft.com/en-us/library/cc917715.aspx
[4] http://en.wikipedia.org/wiki/Materialized_view

Posted by randy at 11:13 AM

10 comments:

Anonymous August 6, 2012 at 1:39 PM


Randy,
What do you recommend for a good data cleansing tool? I have heard of Google
Refine, but I am looking for something better.

Joseph Nguyen August 21, 2013 at 12:10 PM


Wow, amazing explanation and diagram. Thank you!

SYED SALMAN RAZA May 14, 2014 at 8:26 AM


plz, tell me in simple words that what is operational data actually?

randy April 29, 2016 at 10:59 AM


Operational data is simply transactional or OLTP data that is brought into
the data warehouse.


Unknown July 14, 2016 at 2:04 PM

If you report off the ODS, when do you cleanse data and remove duplicate data?