by Alan Jordan
Chief Technology Officer
Coglin Mill
June 2008
Introduction
IBM DB2 Web Query for System i was announced in April 2007 as a replacement for the
Query/400 product. It was released in September 2007, and is now available to all System i
customers at V5R4 level and above.
Its announcement has generated an enormous amount of interest in the System i community. In fact,
the level of interest significantly exceeds IBM's expectations.
Why?
It is undoubtedly true that the ageing Query/400 product was overdue for replacement, but in truth,
several other products from third-party vendors have been available for many years.
Maybe it is the IBM name on the product, or maybe the expectation that it is free or available at a
very low cost. Regardless, it is apparent that many System i shops are planning to implement DB2
Web Query as a replacement for Query/400 in the very near future.
It is also becoming apparent, from direct and anecdotal reports by early adopters, that DB2 Web
Query by itself is not the universal panacea to existing problems in a Query/400 based reporting
environment. This suggests there is some level of misconception regarding DB2 Web Query's
capabilities, or more likely that there is a general misconception regarding the required components of
a successful business intelligence architecture.
Informed decision making in any organization is dependent on access to reliable, accurate and timely
information. Front-end tools such as DB2 Web Query are only the delivery vehicle for this information,
in much the same way that faucets deliver water to your kitchen sink and bathroom. Replacing a worn
out faucet with a shiny new one will look great, but does nothing to address water quality, leaky pipes
or a broken hot water system. Of course you wouldn't expect a new faucet to fix your plumbing
problems, but many organizations fail to recognize that their reporting problems are mostly
associated with plumbing issues, and the fix needs to be applied before the data (water) gets to the
reporting tool (faucet).
DB2 Web Query as a Replacement for Query/400 Copyright Coglin Mill, 2008 All rights reserved.
In today's business environment, IT managers in System i centric organizations are facing
significantly more reporting challenges than they faced ten or more years ago.
Let's explore the past to fully understand the challenges this past decade has brought with it.
In 1997 a typical organization using an AS/400 to run its business had fairly simple and easy to
manage reporting requirements. All or most of its business applications might run on the same
AS/400. In many cases all critical data came from the same ERP application - such as JDE or BPCS.
Management reporting was fairly straightforward with fixed, paper based reports being the norm,
whether provided by the application itself, or developed using Query/400. PC based graphical
reporting tools were available, but not widely adopted in the AS/400 community.
Since then, we have seen explosive growth of the internet and ecommerce; new technologies have
emerged, and new paradigms have come along (and in some cases, quietly disappeared a few years
later). Additionally, we have seen unprecedented merger and acquisition activity in the marketplace.
The mobility of the IT workforce brings new people into our organizations, bringing with them new
skills and ideas. As a result, in many System i data centers, PCs, Unix and Linux boxes running
Oracle, Microsoft SQL Server, MySQL and other databases sit alongside System i boxes. Times have
changed.
If we compare the System i marketplace to organizations using other platforms and databases, we
see an interesting, potentially disturbing trend. The System i community as a whole has not been an
aggressive adopter of business intelligence technologies, and as such could be seen to lag behind
its competitors on other platforms. Certainly there are many larger System i centric organizations
that do have BI implementations, but clearly a significant number of medium sized organizations do
not, nor do a majority of smaller businesses.
When analyzing the reasons for this slow uptake, we see the very success of the platform is (at least
in part) responsible. Our integrated database in conjunction with the readily available Query/400 tool
made it very easy for us to design our own reports and queries. RPG programmers were always
available to write the more difficult reports. We have been spoiled because we have the very best
available report programming language ever created! (Just in case you didn't know: RPG stands for
Report Program Generator).
Contrast this to our colleagues struggling to provide the same capabilities on other platforms. Until the
advent of the SQL language and modern BI reporting tools, these companies had fewer options. They
struggled with data in text files or data in multiple databases, and with programming languages that
were not particularly well suited to generating printed reports. Is it any wonder these organizations
embraced BI tools and technologies when they became available?
While there are still many System i shops that are relatively insulated, with all of their data coming
from System i applications, this is no longer the norm. Many IT managers are facing an enormous
challenge in being able to provide the information that their business users are demanding. Data is
spread across multiple systems, multiple applications and different databases. Added to this, many
C-level execs are asking for dashboards to provide a high level view of the business with pictures,
charts, color coded indicators and drill-down capabilities etc.
We have more data than ever before. Our business community has much higher expectations than
ever before. But IT departments are seeing budget cuts and are expected to produce more with fewer
resources. While some challenges are unrelated to reporting, a significant number of organizations
cite access to data and reporting as one of their major issues.
Now IBM is promising to solve our reporting issues with a new, modern web-based reporting tool:
DB2 Web Query.
Is it up to the task?
Let us explore some of the common issues plaguing our reporting initiatives. Some of these are found
in almost every organization, regardless of size. Other issues may not apply to your specific
environment, but are nevertheless common, and you could well encounter them in the future.
Data Quality
This is arguably the biggest issue of them all; however, data quality is given little attention in many
shops. While there are many reasons for this, there is probably one major underlying factor in this
lack of attention: the belief that the organization does not have any data quality problems.
While it is possible this is true, based on the experiences of every business intelligence consultant
and data analyst it is highly unlikely. Just because you don't see significant data quality issues, it
doesn't mean there aren't any. You just haven't found them yet.
Operational applications (e.g. ERP Systems) generate enormous amounts of data in many tables and
columns. When bugs (associated with data issues) are encountered in the daily use of the application,
the problem gets attention and is fixed. However, there are many instances when data is generated by
a process and not touched again by the application, or, if touched, the error is such that it does not
result in a recognizable problem. These data errors do not get discovered and corrected.
So when you hear someone in your organization say "we don't have any data problems", be very
skeptical!
You have a problem if you are using that suspect data for business intelligence reporting. If you have
done nothing to validate critical pieces of data (information) how confident can you be in the reports
you are delivering or consuming? Whose neck are you putting at risk?
Data Complexity
The data contained in an operational database can be quite difficult to interpret. There are a number
of reasons for this:
o The principle of 3rd Normal Form calls for avoidance of redundancy, resulting in a
complex database with many more tables than you would think necessary. From your
perspective, this means having to join many tables for just a simple report.
o Minimize disk usage. Disk storage used to be quite costly; therefore every effort was
made to reduce the number of bytes used. Instead of meaningful values, single
character codes are commonly used. The software understands them, and translates
them to recognizable information, but do you know what they all mean?
There is usually no user manual describing the structure and meaning of the database. Again,
this is because the database was not designed for human access. The software using it does
not need a user manual. Not only is it complex, but it is not documented!
In all but the very best database designs, you will encounter all sorts of inconsistencies. This
is usually because the application has grown over time and been developed by many different
programmers:
o The same piece of data may have a quite different name and description in different
tables.
o Redundant columns may have been re-used for a different purpose, and therefore
have totally incorrect names and descriptions.
o Meaningful information may be embedded inside another column (e.g. the first
character of the sales representative code identifies the sales region).
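To make these two problems concrete, here is a minimal Python sketch of the translation logic a report writer ends up re-creating by hand. The code values, region letters and function names are all hypothetical; in a real shop the meanings would have to be recovered from the application's source code or its developers.

```python
# Hypothetical lookup tables. The real meanings of single-character codes
# are usually known only to the application, not documented anywhere.
ORDER_STATUS = {"O": "Open", "S": "Shipped", "C": "Cancelled"}
SALES_REGION = {"N": "North", "S": "South", "E": "East", "W": "West"}


def describe_status(code: str) -> str:
    """Translate a single-character status code into readable text."""
    return ORDER_STATUS.get(code.strip().upper(), f"Unknown ({code!r})")


def region_from_rep_code(rep_code: str) -> str:
    """Extract the region embedded in the first character of a rep code."""
    rep_code = rep_code.strip()
    if not rep_code:
        return "Unknown"
    return SALES_REGION.get(rep_code[0].upper(), "Unknown")
```

For example, `describe_status("S")` gives `"Shipped"` and `region_from_rep_code("N1234")` gives `"North"`. Every report that needs these values must repeat this logic unless it is applied once, before the data reaches the reporting tool.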
Disparate Data
Data coming from just one application may be difficult enough to understand and use, but that pales
in comparison to the task of joining data across applications. If those applications reside on different
systems and use different database types, it gets even harder.
What are the odds that similar, related pieces of information are stored in the same format?
If we are dealing with Customers (as most of us do), and we have two or more different
applications with overlapping function, it's also likely we have more than one customer
database. Some customers will only exist in one place, but it is probable some will be in both
databases (or all 3, or 4 of them). Unfortunately, it is highly unlikely that they'll have the same
customer number (or even that the customer name will be identical).
Assuming the data allows me to join tables across applications, how do I achieve this if the
applications are on different servers? And in different database types?
Each server will likely have different security; database availability may be different and many
other issues will make this a nightmare.
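The customer-matching problem described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: customer names are matched after crude normalization, the suffix list is made up, and the two "systems" are plain dictionaries standing in for tables on different servers. Real matching usually needs addresses, tax numbers and manual review as well.

```python
import re

# Hypothetical list of company suffixes to ignore when comparing names.
_SUFFIXES = {"inc", "incorporated", "corp", "corporation",
             "co", "company", "ltd", "limited", "llc"}


def normalize_name(name: str) -> str:
    """Lower-case a name, drop punctuation and common company suffixes."""
    words = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(w for w in words if w not in _SUFFIXES)


def match_customers(system_a: dict, system_b: dict) -> list:
    """Pair customer numbers from two systems whose normalized names agree."""
    by_name = {}
    for number, name in system_b.items():
        key = normalize_name(name)
        if key:  # skip names that normalize to nothing
            by_name.setdefault(key, []).append(number)

    matches = []
    for number, name in system_a.items():
        for other in by_name.get(normalize_name(name), []):
            matches.append((number, other))
    return matches
```

With `{1001: "Acme Corp."}` in one system and `{"C-77": "ACME Corporation"}` in the other, `match_customers` pairs `1001` with `"C-77"` even though neither the customer number nor the name is identical.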
Dates
Dates can be so much of an issue that we've given them their own section.
In almost all legacy System i (e.g. AS/400 era) databases, dates are stored in numeric columns.
Unfortunately there are many different potential formats: yyyymmdd, mmddyyyy, cyymmdd etc. If you
are lucky, all dates in an application will be in the same format, but that may not be the case. At least
today, your dates are probably in a Y2K compliant format!
Dates are important since they comprise probably the single most important dimension in
business intelligence reporting. We almost always want to know when something happened, or
maybe to group and summarize information based on a particular timeframe. As such, we need to be
easily able to determine the year or month from a date, or the week, or even the quarter. Or maybe
we want to calculate the number of days (or months) between two dates.
These requirements are not easily accomplished when your dates are just a number in a numeric
column! Plus, many applications allow dates of zero, or set them to all 9s to indicate some unknown
future date. These are not valid dates and must be specially handled.
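As an illustration of the handling involved, the following Python sketch converts a numeric date in one of the three formats named above into a true date, treating zero and all-9s values as "no date". The function names are hypothetical, and the cyymmdd rule assumed here is the common convention in which the century digit 0 means 19xx and 1 means 20xx.

```python
from datetime import date
from typing import Optional

# Number of digits expected for each supported numeric format.
_WIDTHS = {"yyyymmdd": 8, "mmddyyyy": 8, "cyymmdd": 7}


def parse_numeric_date(value: int, fmt: str) -> Optional[date]:
    """Convert a numeric legacy date to a date, or None if it is not valid."""
    if fmt not in _WIDTHS:
        raise ValueError(f"unsupported format: {fmt}")
    if value <= 0:
        return None  # zero (or negative) means "no date"
    digits = str(value)
    if set(digits) == {"9"}:
        return None  # all 9s means "unknown future date"
    s = digits.zfill(_WIDTHS[fmt])  # restore leading zeros lost in a number
    if len(s) != _WIDTHS[fmt]:
        return None  # too many digits for this format

    if fmt == "yyyymmdd":
        year, month, day = int(s[0:4]), int(s[4:6]), int(s[6:8])
    elif fmt == "mmddyyyy":
        month, day, year = int(s[0:2]), int(s[2:4]), int(s[4:8])
    else:  # cyymmdd: assumed century digit 0 = 19xx, 1 = 20xx
        year = 1900 + 100 * int(s[0]) + int(s[1:3])
        month, day = int(s[3:5]), int(s[5:7])

    try:
        return date(year, month, day)
    except ValueError:
        return None  # e.g. month 13 or February 31


def quarter(d: date) -> int:
    """Return the calendar quarter (1 to 4) of a date."""
    return (d.month - 1) // 3 + 1
```

Once the value is a true date, the year, month and quarter are trivial to derive, and the number of days between two dates is simply `(later - earlier).days`. The point is that this conversion has to happen somewhere for every date column, and doing it once during data preparation is far cheaper than doing it in every report.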
Performance
When reporting directly from operational data, we must use whatever source of data is available to us
containing the information we need. For transaction data that usually means the most detailed level
stored in the database (e.g. invoice line item level). If you need a summary report by Division and
Brand, and then another one by Customer Group and Region, your only option is to process all line
items (matching the selection criteria) and perform the aggregation, and then do it again for the
second report. As we've recognized above, we will almost certainly need to join this detail level data
to several other tables. This often leads to significant performance problems. If these reports are run
on an ad-hoc basis during the day, they not only take a while to run, but can also affect everyone else
using the system.
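The repeated work can be seen in a small Python sketch. The line items and column names below are invented for illustration; the only point is that each summary report makes its own full pass over the detail rows.

```python
from collections import defaultdict

# Hypothetical invoice line items (amounts in whole currency units).
LINE_ITEMS = [
    {"division": "East", "brand": "A", "cust_group": "Retail", "region": "NE", "amount": 100},
    {"division": "East", "brand": "A", "cust_group": "Wholesale", "region": "NE", "amount": 250},
    {"division": "East", "brand": "B", "cust_group": "Retail", "region": "SE", "amount": 75},
    {"division": "West", "brand": "A", "cust_group": "Retail", "region": "NW", "amount": 40},
]


def summarize(rows, keys):
    """Total the amount by the given key columns.

    Returns the totals and the number of detail rows that had to be read.
    """
    totals = defaultdict(int)
    scanned = 0
    for row in rows:
        scanned += 1
        totals[tuple(row[k] for k in keys)] += row["amount"]
    return dict(totals), scanned


# Each report reads every line item again.
by_division_brand, scan1 = summarize(LINE_ITEMS, ["division", "brand"])
by_group_region, scan2 = summarize(LINE_ITEMS, ["cust_group", "region"])
total_rows_read = scan1 + scan2
```

With four line items the two reports read eight rows between them. With millions of line items, joined to several other tables, the same pattern is what makes ad-hoc summary reporting against operational data so expensive.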
We've identified some common issues you may have encountered when using Query/400 for
business intelligence reporting. Now we'll examine how (if at all) DB2 Web Query can solve these
problems.
Data Quality
Not addressed at all.
This is the case with virtually all query and reporting tools. DB2 Web Query, Showcase, NGS,
Cognos, Brio Query etc. have no coherent mechanisms to manage data quality. The old adage
applies: garbage in, garbage out.
Data Complexity
The nature of this issue, the fact that the data is complex, does not change simply because you are
using a new query tool. The effort required to investigate, discover and understand the data and all of
its associated rules and idiosyncrasies remains the same. However, DB2 Web Query does give you
the capability to add descriptive comments at the table and column level.
Disparate Data
The base version of DB2 Web Query has no support for access to data on other platforms.
Dates
DB2 Web Query does have limited support for numeric dates. However, it requires manual handling of
every date column and custom programming within DB2 Web Query, a very labor intensive process
and possibly outside the capabilities of the average user.
Performance
Reports written using DB2 Web Query may perform slightly better than an equivalent Query/400
report. This is because Query/400 always uses the old Classic Query Engine (CQE), whereas DB2
Web Query will, in most cases, take advantage of the newer SQL Query Engine (SQE), which can
perform better for many types of queries. Certain types of query, however, may still use the CQE, in
which case performance will be comparable.
However, the underlying reason for most performance issues is not the query engine, but the
structure of the data being accessed, and the fact that you are reporting directly against your
operational database. DB2 Web Query cannot address this.
Of course, this is not a definitive list of the issues that can be encountered. However, the vast majority
of issues fall broadly into one of the above categories, and you can expect that very few data related
issues will be resolved by DB2 Web Query. The truth is that you should NOT expect a query/reporting
tool to address them at all.
We've explored some of the common problems associated with business intelligence reporting.
Organizations are unlikely to live with these restrictions and limitations, and instead will develop
solutions. Let us look at poorly designed and implemented solutions to these reporting problems, and
ensure we understand the pitfalls to avoid.
You'll probably recognize some or all of these symptoms within your own organization:
Ad-hoc solutions
Each time an issue is encountered in a specific report or query, it is addressed as a stand-alone issue
and resolved using an approach chosen by the developer (or business user) assigned to the task. The
individual challenges faced and the chosen solutions are quite varied, but we can look at some
examples:
Create an extract table or summary table to stage and format the data required for the
query and use an RPG program to load it.
Use several queries, each outputting to a work file, and then create a final query over those
work files to get to the required result.
Use Client Access to download data to Excel. Use Excel to further manipulate and merge
various sets of data to get to the required end result.
Because of the ad-hoc approaches used to solve individual reporting requirements, each can have
any or all of these underlying reasons for producing incorrect results.
Duplicated effort
In many cases you will need different views of the same information. While the specific requirements
may be somewhat different in each case, the underlying data values are very similar, just grouped,
sorted, filtered and presented in different ways. When these individual issues are addressed
independently, the same processing steps (extract data, perform calculations etc.) are repeated each
time. The developer tasked with each new project probably needs to re-discover the data sources and
business rules. Not only is this totally inefficient, but it is also a major contributor to the "multiple
versions of the truth" phenomenon.
Lack of Documentation
In a reporting environment relying on printed reports or static on-line reports, your business users
simply need to know what is available. This is not too hard to communicate. However, when business
users are given access to query tools or OLAP tools allowing them to explore data through the use of
the tool, you immediately have a problem. You have turned business users into pseudo-IT people,
who are now empowered to create their own views of the data. But unless you also provide them with
a road-map so that they know what data (i.e. tables) is available to them, and what it means, they are
going to give up because it is too hard, or struggle with their incomplete knowledge and make
mistakes. We are again faced with a "multiple versions of the truth" scenario, and also potentially
the ad-hoc approach issue.
This scenario is found in many organizations, but quite often it is not recognized as an issue. The
various elements (extracts, summary files, reports etc) grow over many years, with new ad-hoc
components being added as needs dictate. The following diagram shows the end result in a typical
organization.
[Diagram: ad-hoc extracts, summary files and spreadsheets feeding reports and a quarterly chart.
Box labels include Brand, Region, GL, Purchasing, Profitability, Sales, Extract, Summary and
Summary (Excel). Two callouts read: "John spends 5 days every month generating this and
massaging the numbers until he thinks it is correct" and "Mary wrote this extract. She left last
year and no-one knows how it works".]
By now you probably recognize that a data warehouse may be the answer.
But many organizations, for one reason or another, shy away from this approach.
"We don't have a problem." In most cases, this really means, "I don't want to have to face up to
it." (The ostrich syndrome.)
"We're too small to get into Data Warehousing. We don't have the resources or the skills to do
that."
"I've heard Data Warehouse projects often fail. I'm not going to risk that."
What we have learned over the past decade is a set of best practices that, if followed, will eliminate
the causes of failure. More importantly, these best practices can be implemented in a project of any
scale. In other words you can (and should) build and deliver small pieces of an overall grand plan,
rather than trying to build it all at once. Furthermore, there are tools available today that will vastly
reduce the overall effort involved.
The key is to plan, build and deliver around an architected business intelligence framework.
So what does that mean? Let's list some key attributes of the overall project:
Think big, develop small. Implement different subject areas, or departmental requirements
one at a time, delivering each completed area out to your business community as completed.
Ensure each part builds on the grand plan, using the same standardized approach.
Manage data quality. Poor data quality WILL sink the project.
A full discussion on architecture and design of a data warehouse is beyond the scope of this White
Paper, but there are two well known, mature concepts that are commonly followed:
the Corporate Information Factory, developed by Bill Inmon and Dr. Claudia Imhoff, and the concept
of Dimensional Modeling, developed by Ralph Kimball. There are many similarities between these two
design philosophies, and both are well proven.
Depending on your needs, you may not need an extensive data warehouse. Maybe several data
marts will suffice. It does not need to be hugely complex. The important thing is to have a structured
approach and an overall plan.
Develop ETL (Extract, Transform and Load) processes to load these tables from your sources
of data. Data quality management is a key aspect of ETL.
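To show what data quality management inside an ETL step can look like, here is a minimal, tool-independent Python sketch. It is not how any particular ETL product works; the column names, the set of valid status codes and the validation rules are all hypothetical. Rows that fail validation are set aside with the reasons recorded, rather than being loaded silently.

```python
# Hypothetical set of status codes the target table accepts.
VALID_STATUS = {"O", "S", "C"}


def validate(row: dict) -> list:
    """Return a list of data quality problems found in one source row."""
    problems = []
    if not str(row.get("customer_no", "")).strip():
        problems.append("missing customer number")
    if row.get("status") not in VALID_STATUS:
        problems.append("unknown status code")
    if row.get("quantity", 0) < 0:
        problems.append("negative quantity")
    return problems


def transform(row: dict) -> dict:
    """Clean a valid row and derive the extended amount (in cents)."""
    return {
        "customer_no": str(row["customer_no"]).strip(),
        "status": row["status"],
        "quantity": row["quantity"],
        "amount_cents": row["quantity"] * row["unit_price_cents"],
    }


def run_etl(source_rows: list):
    """Extract, validate, transform and 'load' rows; collect the rejects."""
    loaded, rejected = [], []
    for row in source_rows:
        problems = validate(row)
        if problems:
            rejected.append((row, problems))  # kept for review, not loaded
        else:
            loaded.append(transform(row))
    return loaded, rejected
```

The reject list is the important part: it turns hidden data errors into something visible that can be counted, reported and corrected at the source.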
RODIN is the most powerful ETL solution available for IBM System i and iSeries business systems.
Comprehensive data integration, cleansing and transformation capabilities enable you to easily design
and build complex applications in a minimum amount of time. Integrate data from virtually any source
including local and remote System i tables, as well as relational databases such as DB2 UDB, Oracle,
and SQL Server.
Features include:
Security
It is extremely important to secure the tables in a data warehouse both from unauthorized access to
the data as well as unauthorized changes to the definitions. RODIN is fully integrated with System i
security and allows complete customization to suit your own particular needs.
Auditing
Sarbanes-Oxley requires public companies to provide comprehensive audit trails to show the origin
and lineage of any information used for financial reporting, and this often comes from a data
warehouse. However, even for private organizations, there are tremendous benefits in being able to
provide this same information. RODIN has fully automatic auditing of every ETL process, and the
comprehensive metadata provides complete source to target data lineage, including all business rules
and transformations.
Ease of Use
This is where RODIN provides enormous benefit, especially to smaller organizations with limited
resources. You do not need to be a programmer or DBA to use RODIN. It is designed from the outset
to be very powerful, yet extremely easy to use. You can be up and running within a day of installation
and delivering real value within a very short timeframe.
RODIN lays the foundation for success. It enforces consistency and provides (often totally
automatically) most of the factors necessary for a successful business intelligence
implementation.
If you are already using DB2 Web Query, or have kicked the tires, you will know that to be able to use
a table for a DB2 Web Query report, you need to create some metadata in the form of a Synonym.
The steps involved in creating a Synonym are not that difficult; however, the advanced features that
Synonyms support can be very time consuming to implement and manage. Let us take a look at some
of the requirements and issues surrounding Synonyms:
Missing information
When using the CLI Adapter to create Synonyms (IBM's recommended approach), two important
pieces of information are not included:
If the column has an SQL name, the 10 character system name of the column is not
included in the Synonym. This means that the end-user MUST be familiar with the SQL
column names, which will not always be true.
The column headings are included (if present), but the 50 character text description is not.
So DB2 Web Query has some great features to make our reporting easier, but there are a few
shortcomings, and the manual management of Synonyms can be burdensome. RODIN's integration
with DB2 Web Query significantly simplifies this, and in most cases totally removes the need to
manage Synonyms at all:
Synonyms for RODIN tables are created automatically. Parameters allow you to specify
whether to include a prefix for all tables from this RODIN environment, and whether to qualify
the table with the library name, or to use the library list at run time.
Whenever a RODIN table is modified, the DB2 Web Query Synonym is automatically
refreshed. You do not need to remember to do this.
The 10 character system name and 50 character text descriptions are included.
RODIN supports unlimited free format text at both table and column level. The RODIN text
editor is significantly more user friendly, but the main advantage here is the data warehouse
developer is likely more familiar with the table and columns and can enter more complete,
meaningful information than possibly the person designing a report in DB2 Web Query.
RODIN fully supports true date, time and timestamp columns. When mapping legacy date
columns into RODIN tables, date (or time) conversion occurs automatically. Therefore your
legacy date problem completely goes away, and there is no need to use the Synonym Editor
to create virtual columns and create date conversion routines. Furthermore, RODIN
automatically takes advantage of a feature of DB2 Web Query to decompose dates. This
feature defines Year, Month, Day and even Quarter virtual columns for each date, significantly
simplifying and enhancing reporting by allowing you to sort and select by any of these
components of a date.
RODIN contains modeling information in the form of Subject Areas. A Subject Area is a set of
related tables most commonly representing a Star Schema. Each Subject Area has all of the
logical table joins defined, and also allows unlimited descriptive text describing purpose and
use. These subject areas are automatically created as Synonyms in DB2 Web Query,
eliminating the need to define the joins a second time. Once again, if any table in the Subject
Area changes, the Synonym is automatically refreshed.
The new DB2 Web Query product is poised to change the face of reporting on the System i platform.
However, the evidence is clear: it will not by itself solve any of the issues limiting your reporting
capabilities and frustrating your business community.
Yes, you can deliver reports via the web. Yes, you can display information as charts. Yes, you can
create dashboards etc.
But if you migrate to DB2 Web Query without addressing the issues outlined in this paper, you will
simply put a pretty new face on your problems. The phrase "lipstick on a pig" is appropriate here.
This is a huge opportunity for organizations planning to implement DB2 Web Query.
You have excited your business community by promising a brave new world of reporting; now you
need to deliver. If the term "business intelligence" was not commonly used in your organization before,
it certainly should be now.
You have an opportunity to revolutionize both operational and management reporting in your
organization. But you probably only have one chance to get it right. If you fail to implement the
framework for success, the lipstick will wear off very quickly. The reports still will not balance with
each other. You still won't be able to combine data from disparate systems on the same report. The
numbers could still be wrong.
What chance will you have to take a second shot at it? Will you be able to go back to senior
management, admit your errors and ask for the time and resources to fix it? Maybe.
RODIN is a proven solution, used for many years by large organizations such as HSBC Bank, Office
Depot (Europe), Discovery Channel Stores, Fiserv CBS, Wells Fargo and other well known
companies. They, like many other organizations, have recognized that successful business
intelligence requires an architected approach to data quality management, ETL, metadata etc., and
that the right tool is critical to success.
It is very likely your success will exceed your expectations. Data warehouse or data mart
implementations commonly provide a huge ROI. Various studies have shown the typical ROI (after 3
years) averages over 400%, with extreme examples in excess of 1,000%.
Your mileage may vary, but with the right approach and the right tools (RODIN & DB2 Web Query)
you can experience the same results, regardless of your organization's size.
Alan Jordan is a Senior Vice President and CTO of Coglin Mill. He joined the Company in 1988 and
has been involved in the development of RODIN since 1995. He is the senior Product Architect,
oversees customer support and education and also undertakes short-term consulting engagements
with RODIN customers.
A native of Australia, he has been living and working in Rochester, MN since 1998. He is a regular
speaker on the role of ETL in business intelligence at COMMON and other venues.
Coglin Mill is a privately held Australian software company that has been developing software for IBM
midrange systems since 1985.
Early software products included a major Distribution and Financials package designed specifically for
organizations with complex requirements, and an advanced set of utilities to help manage the
development and production environments in mainly large System/38 installations. Today, the
company focuses solely on its very successful RODIN Data Asset Management software suite, which
is the leading solution for building and managing data warehouse and data mart environments on
System i.
Email rodinsales@coglinmill.com