
1.5 Historical Data Storage and Evaluation


G. C. BUCKBEE

Partial List of Suppliers:

ABB Instrumentation; Canary Labs; Crystal Reports; Fluke; The Foxboro Company;
Honeywell; National Instruments; Oil Systems; Omega Engineering, Inc.; SAS;
Sequentia; Siemens; Squirrel; Trendview Recorders; Toshiba International; Wonderware; Yokogawa; Zontec

Accurate and reliable production and process data are essential


to continuous process improvement. Today's computer-based
control systems are capable of generating huge volumes of
data. Simply capturing all that data, however, will not solve
the problem. To be useful, it must be the right data, readily
accessible, in the right format. The design of such an information system requires an understanding of both the process and
the business environment. This chapter considers the design
and development of a data historian system for industrial processes. These systems may be known as real-time information
systems, data historians, or process information systems.

CLARIFYING THE PURPOSE OF THE DATA SYSTEM


Figure 1.5a shows the basic functions of a data historian. Fundamentally, data are collected, historized, and then analyzed.
Reports are developed and decisions are made. By improving
understanding of the process, the data historian plays a critical
role in process improvement. A real-time information system
must meet the needs of many users. Operators want reliable
short-term process information. Managers want reliable and
accurate production accounting. Engineers typically want both
short-term and long-term data with sophisticated analysis
tools. The information technology department wants something that will integrate well with other systems, while requiring minimal maintenance. To avoid major disappointment at
project completion, it is absolutely critical to describe clearly
the scope and functionality of the system at the beginning.

FIG. 1.5a
Basic functions of a data collection system. (Data collection, data storage, analysis, then decision-making and reporting.)

Fundamentally, the information system exists to support


decision making for each of these different users. Consider
each of the following questions for each piece of data:
1. Is this the right data? Based on the user's needs, make
sure that all relevant data are being captured. Are the
raw data or filtered values desired? Should online samples be collected, just the lab data, or both? Is the right
point in the process being measured?
2. Does the data come from the most reliable and accurate source?
3. Can the data be readily accessed, at the location where
the decision is made? If the data are not accessible, the
data are of no use to anybody. If the operator needs the
data, then the data should be available on the plant floor.
In team-based environments, the information should be
available in team rooms, meeting rooms, or wherever the
team gathers. Engineers should be able to place the data
onto the same computer system as their analysis tools.
4. Are the data displayed in the right format for a decision
to be made? As the saying goes, a picture is worth a
thousand words. Trends are common, easy to use, and
extremely valuable. For production data, cold, hard
numbers in an easy-to-read report are usually desired.
With so many different users, gaining a clear definition
of the scope can be a challenge. Although most people can
easily visualize scope in the form of equipment, it is more
important to define the system functionality first. By functionality, we mean defining all of the expected capabilities
of the system. This starts out as a general description, and
evolves into a very detailed description of the system.
Requirements from each of the system end users must be
collected, prioritized, and combined. A simple tool for clarifying
the system needs is the IS/IS NOT chart, shown in Table 1.5b.
The entries in this chart will vary greatly from plant to plant, or even from department to department. Clarifying this on paper and gaining management alignment early will help to ensure a successful project. The following sections provide some suggestions for system scope. Of course, there is also hardware and infrastructure scope to be considered.

© 2002 by Béla G. Lipták

Overall Plant Design

TABLE 1.5b
IS/IS NOT Analysis for a Sample Mill

IS                                   IS NOT
Process data collection              Maintenance data collection
Production data collection           Financial
Daily reporting                      Monthly reporting
Short-term (minutes, hours, days)    Long-term (months, years)
Trending, reporting                  X-Y plotting, statistics
Read-only from DCS                   Download to DCS
Continuous data                      Batch data

Interactions and Integration with Other Systems

If the data historian is expected to communicate with other electronic systems, the nature of that communication must be spelled out. To clarify the communications between the two systems, one should answer the following questions:

1. What information is going to be passed?
2. What is the trigger for passing the data? (time, an event, etc.)
3. What is the form of the data? (scalar number, records, databases, etc.)
4. Where does the master data reside?

When several different information systems are involved, it is helpful to use a data flow diagram to understand the interaction between systems. Figure 1.5c shows a sample data flow diagram.

FIG. 1.5c
Sample data flow diagram. (Data from the DCS, as time-stamped tank level PVs and flow totalizers, flow into process records; manually entered quality data flow into quality records. A daily 24 h production summary query feeds a daily production report, and a finished-product quality query of this shift's quality data feeds an on-demand quality report.)

For the information system to be successfully integrated with operations, consider the following issues:

1. How will the data be used? This is the critical issue. Will the operator use the data from a control chart to make adjustments to the process? Will operators have the authority to do so? Do they have the right training to do it? Will the production data be used to make daily decisions?
2. How can it be ensured that the data will be entered only once? Multiple points of data entry will lead inevitably to discrepancies and confusion, not to mention the effort lost by having to enter the same data twice.
3. How can manual data entry be minimized? Let's face it: manual data entry is a rather boring chore, and people make mistakes when they are bored. Can the data be collected directly from the control system or transferred electronically from lab equipment?

Integration with Maintenance

Section 1.6 provides a great deal of detail about the integration of maintenance functions with data historians. Only a small summary is presented here. Some process data are particularly valuable for maintenance operations. For example, hours of use of large equipment are often needed to plan maintenance schedules. To meet the needs of maintenance, the following should be considered:

1. How is maintenance work planned? How is it scheduled? Is there an existing electronic scheduling system?
2. Is the right process data being measured for maintenance needs? For example, are oil temperatures, or vibration levels, being measured and recorded online? If so, this can be very useful data for maintenance.
3. How will the maintenance personnel gain access to the data? Will they have physical access to a computer or terminal in their workspace? Or will they work from printed reports?

Integration with Management

The management team is another key customer of the data system. In fact, many information systems are justified based on the value of the data to the management team. To meet the needs of the management team, consider the following:

1. What types of routine reports are needed? Daily and monthly reports are the most commonly requested.


2. Which items should be tracked? Typically, the top


priorities are production rate data, then material and
energy usage. Environmental data have also become
increasingly important in recent years.
3. What types of analysis tools are needed? Does the
management require a particular type of analysis tool,
such as control charts?
Beware of trying to provide all the analysis tools for
everyone's needs. This will drive up the cost of the project
considerably, and many of these tools will never be used. A
preferred approach is to make the data available in an open
format, then to let each analysis tool be justified individually.
Using this approach, the process engineer may use a different
set of tools than the shift manager, but they will both be
looking at the same data.
DATA COLLECTION
The first part of the information system is the data collection.
The data must be retrieved from the control system. With the
proliferation of computer-based control systems, there are
myriad ways to gather data. Specific network connection
issues are reviewed in Chapter 4. Data to be collected generally fall into two categories: continuous and event data. The
type of process operation will often determine the data collection needs.
In a continuous process, materials are supplied and transformed continuously into finished product. Examples include
pulp and paper mills, oil refineries, utilities, and many other
large-scale processes. Data collection is needed to track production, raw material and energy consumption, and for problem solving.
In a batch process, fixed procedures, or recipes, are used
to produce finished products. Ingredients are added, and processing operations are performed, in a specific sequence.
Data collection needs include keeping historical records of
batch times, operations, ingredient amounts, and lot numbers.
Continuous Data

Continuous data are collected to keep a record of the
process. Most often, data are collected regularly, at a given

frequency. At a minimum, the process variable (PV) of each


loop is recorded. Some systems allow the collection of controller output and set point as well. Continuous data collection is most often used for continuous processes.
In older systems, data are typically collected once per
minute. But newer systems allow fast collection, as fast as
once per second, or even faster.
Older DCS (distributed control systems) systems have little memory available for data storage. To conserve storage
space, DCS systems will rotate out the data as the data age. It
is typical to keep up to a week's worth of fast data, then to
compress the data by taking averages. This way, a long history
of averages can be kept, even though some resolution is lost.
Short-term continuous data are of great interest to the
operator. Good resolution and good trending tools are critical
for these analyses. The long-term averages are typically of
interest to management, and are used to maintain production
records and to evaluate costs.

Event Data
Any process event can also trigger the collection of data. For
example, the lifting of a safety relief valve may trigger the
collection of information about vessel pressure, level, and
contents. In fact, each alarm can be considered an event.
In batch systems, the collection of event data takes on
even more significance. As each significant step of the batch
process passes, records of material use, time, energy use, and
processing steps are recorded. These data can be used later
to reconstruct the manufacture of each batch of product.
Event data are often collected on a triggered, or interrupt, basis, rather than at a fixed collection frequency.
For anything more complicated than simple logging, collecting event data requires advance planning to define the
event record. Each piece of data is placed into a field in a
data record. Figure 1.5d represents a typical data record.
One type of event that is very important to track is operator activity. By tracking the actions of the operator, it is possible
to reconstruct events after the fact. Typically, the following

Date      Time       FIC1234.PV  FIC1234.OP  TIC1276.PV  TIC1276.OP  TIC1276.MODE
1-Apr-01  6:10:13am  1313.01     83.91       524.06      91.93       AUTO

FIG. 1.5d
Sample data record. (Field names are predefined; one record, or row, is added at each time-stamp interval, and a real-time data snapshot is stored in each field.)


FIG. 1.5e
Chart recorder.

FIG. 1.5f
Data logger. (Block diagram: wiring from instruments into an A/D converter, then processor, memory, and communications port out to a computer, with a power supply.)

activities are logged by DCS systems:


1. Operator changes, such as set point changes, control
output changes, or controller mode changes.
2. Reactions to alarms, such as acknowledgments.
A good activity log will identify the change that was made,
the date and time of the change, and the control station from
which the change was made.
Data Loggers
Data loggers are small, stand-alone devices that collect data.
They are the digital equivalent of a brush recorder or chart
recorder. Figure 1.5e shows a picture of a traditional chart
recorder. These devices typically have one or more 4 to 20 mA
input channels, and enough memory to log a lot of data. Once
the data are collected, they are uploaded to a computer for
analysis. The typical components of a data logger are shown
in Figure 1.5f.
Multichannel models allow the collection of up to 30 channels simultaneously.
Most data loggers can be battery powered, allowing them
to be used in the field. If the logger is to be used in extreme
conditions, check to be sure that it can handle the temperature, moisture, and other environmental conditions. If data are to be collected over many hours or days, be sure to check the system life on one set of batteries, or inquire about alternate power supply arrangements.
Choose a system that is capable of collecting data quickly
enough to satisfy the needs. For long-term data collection in
a remote location, 1 min or even 1 h per sample may be
enough. But if the data logger is being used for troubleshooting a problem, much faster data collection may be necessary,
on the order of 1 ms per sample.
Some systems are designed to be idle, until a trigger
signal is received. This feature is very helpful for troubleshooting an intermittent problem. Set up the trigger signal to
detect the first sign of a problem, then collect data.
Once data are collected, the data must be uploaded to a
computer for analysis. Most data loggers support a serial link
connection to a PC for this purpose.
Data loggers typically do not have on-board analysis
functions. However, some newer systems include a laptop
computer that is capable of a great deal of analysis.
Data Collection Frequencies
As obvious as it sounds, data must be collected quickly
enough to be useful. But oversampling of data simply wastes
storage space. So how is the required sampling time determined? Shannon's sampling theorem states that we must
sample at least twice as fast as the signal that we hope to
reconstruct. Figure 1.5g shows the compromise between data
collection frequency and data storage and network loading.
For continuous data collection systems on a DCS platform,
1-min sampling frequencies are quite common. Newer DCS
systems are capable at sampling at higher frequencies, typically
as fast as one sample per second. This is more than adequate
for production tracking, raw material usage, and energy tracking.
Troubleshooting problems drive the need for faster data
collection. For example, new systems that monitor process
dynamics may require that data be sampled at 1 s or faster.
When troubleshooting, we are often trying to determine the
sequence of events. Because process and safety interlocks happen fairly quickly (milliseconds), sampling will have to be
much faster to satisfy these needs. To troubleshoot these very
fast problems, a separate, dedicated data logger is often used.
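The two-times rule can be turned into a small helper. This is an illustrative Python sketch only; the 5x safety margin is an assumption of this example, not a figure from the text.

```python
def required_sample_period(problem_period_s: float, margin: float = 5.0) -> float:
    """Return a suggested sample period, in seconds, for a disturbance.

    Shannon's sampling theorem sets the absolute limit: the sample
    period must be no more than half the period of the fastest signal
    of interest.  The margin factor (assumed here) samples faster than
    the bare limit so the waveform can actually be seen.
    """
    shannon_limit = problem_period_s / 2.0   # slowest usable sample period
    return shannon_limit / margin            # recommended, with margin

# A 10 s oscillation must be sampled at least every 5 s;
# with a 5x margin, sample every 1 s.
print(required_sample_period(10.0))  # 1.0
```

The same arithmetic explains why millisecond interlock problems need a dedicated fast logger rather than a 1 s historian scan.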


For troubleshooting event-type data, it is not the collection frequency, but the precision of the time stamp that is of concern. Some DCS systems, such as Foxboro I/A, time-stamp the data at the input/output (I/O) level, providing excellent resolution. Others pass the data along at a given collection frequency, only to be time-stamped when the data reach the historian device.

FIG. 1.5g
Sample rate vs. problem frequency. (Log-log plot of sample rate, s/sample, against the period of the highest-frequency problem, s; it shows the Shannon limit, i.e., the slowest possible collection rate, a recommended collection rate, and the low limit for most DCS systems.)
ARCHITECTURE OF A DATA HISTORIAN SYSTEM
The data historian often provides the only link between the
control system and the plantwide computer network. The system
must be designed to accommodate the needs of both. The typical
architecture of a data historian system is shown in Figure 1.5h.
In a modern data historian system, most signals are collected in the DCS or PLC (programmable logic controller),
and data storage is relatively inexpensive. Communications
is the bottleneck. To improve system capacity and capability,
the communications will have to be optimized.
For starters, we will need to look at internal communications in the DCS or PLC. Some DCS systems will pass
data along internal networks at a given scan frequency, typically 1 s at best. Others will use a deadband and exception-reporting mechanism to reduce the internal communications
load. Figure 1.5i shows how the deadband algorithm works
to reduce data storage requirements.

FIG. 1.5h
Typical architecture. (The DCS or PLC network, with its DCS/PLC server, feeds the data historian, which serves analysis stations and a report printer on the office network.)

FIG. 1.5i
Diagram of deadband reporting. (Recorded signal vs. time; each recorded value is shown as a point. New data are recorded when the value goes outside the deadband.)
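The deadband reporting of Figure 1.5i can be sketched in a few lines of Python. This is an illustration only, with invented sample values and an invented 0.5-unit deadband; vendor implementations differ.

```python
def deadband_filter(samples, deadband):
    """Keep a (time, value) sample only when it moves outside the
    deadband around the last recorded value (exception reporting)."""
    recorded = []
    last = None
    for t, value in samples:
        if last is None or abs(value - last) > deadband:
            recorded.append((t, value))
            last = value
    return recorded

raw = [(0, 50.0), (1, 50.2), (2, 50.4), (3, 51.2), (4, 51.3), (5, 49.9)]
print(deadband_filter(raw, deadband=0.5))
# Only samples that moved more than 0.5 units from the last
# recorded value are stored; the rest are dropped.
```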


FIG. 1.5j
Diagram of a gateway device. (As in Fig. 1.5h, but with a gateway device between the DCS/PLC server and the data historian.)

In the past, DCS systems used proprietary networks. This


changed dramatically in the 1990s, as open network systems
made their way into the DCS world. At this writing, PLCs
remain proprietary at the processor and I/O level, although the
trend is toward more open systems, using industry standards.
Communications with the data historian may be via the
proprietary network, or may be through a gateway device of
some sort. Figure 1.5j is a diagram of a system architecture
with a gateway device. This is typically a dedicated device,
sold by the DCS vendor. It allows the translation of data from
the proprietary network into a more open communications
standard.
Data that have reached the data historian should be available on an open network. The de facto standard at this layer
is Ethernet for the physical communications. Communications
across this network may use any number of proprietary or
industry-standard protocols, such as dynamic data exchange
(DDE), object linking and embedding (OLE), or OLE for
process control (OPC). For more information on these open
communications protocols, refer to Section 4.9.

DATA STORAGE
Data must be stored somewhere, usually on a server hard
disk. There are many considerations, including the format of
data storage, the ease of retrieval and exporting, and a variety
of security issues.
Because large amounts of data are being collected, it is
easy to consume disk storage space. Traditional databases
and simple flat-file techniques are simple to use, but consume
a large amount of space. In the past, this was a problem,
because of the cost of storage media. Many varieties of data
compression techniques were developed and employed to
reduce the cost of storage.
As the cost of storage media has dropped dramatically in
recent years, the cost of storage has become quite small. Unless
one is archiving thousands of data points at rates greater than
once per second, storage costs should be only a minor concern.


However, the compression techniques remain, so one must pay


attention to how they are used, and to their limitations.
Flat files represent the simplest form of data storage. Data
are stored in rows and columns. Each row represents a set of
data from one instant in time. Each column contains data
from a specific tag or parameter. The data are typically stored
in ASCII format, using commas or tabs to delimit between
columns. One that uses commas may be known as a .csv, or comma-separated values, file.
Flat files are very inefficient for storage, as they take up a
lot of room. They are also limited in accuracy. If one stores
the data to only one decimal place, that is all one will ever be
able to see. Also, making changes to which data will be stored
is challenging. Typically, new columns are added on to the
extreme end of the file. Deleting columns is even trickier, or
may not be possible, depending on the application being used.
The strength of flat files is that they are extremely accessible. They can be opened by word processing and spreadsheet
applications. Data can be imported into almost any data analysis software package. A simple program written in Basic or
C can easily manipulate the data. Data can be easily edited,
copied, and pasted. Figure 1.5k shows an example of a flat file.
Some vendors store data in a simple compressed or binary
format. In these cases, data might be reduced to a 12- or 16-bit
value to be stored. This typically results in a dramatic reduction, often ten times or more, in storage requirements. With
the right extraction tool, data retrieval is also quite fast.
The challenge with compressed storage is that it is not
as easy to view or extract the data. A more sophisticated tool
must be used. These tools are often made available by the
vendor of the historian software. When purchasing a historian
that uses compressed files, be sure to understand the export capabilities of the system. Otherwise, one may be locked into using that vendor's tools for analysis.
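To see how a 16-bit compressed format shrinks storage, here is an illustrative Python sketch. The instrument span of 0 to 3000 and the little-endian layout are assumptions of this example, not any vendor's actual format.

```python
import struct

# Scale an engineering value into a 16-bit unsigned integer and back.
# The 0 to 3000 span is an assumed instrument range for this example.
SPAN_LO, SPAN_HI = 0.0, 3000.0

def pack_pv(value: float) -> bytes:
    """Store a PV in 2 bytes: 16-bit resolution over the span."""
    code = round((value - SPAN_LO) / (SPAN_HI - SPAN_LO) * 65535)
    return struct.pack("<H", code)

def unpack_pv(raw: bytes) -> float:
    code, = struct.unpack("<H", raw)
    return SPAN_LO + code / 65535 * (SPAN_HI - SPAN_LO)

stored = pack_pv(1313.01)
print(len(stored))                    # 2 bytes instead of 8 for a double
print(round(unpack_pv(stored), 1))    # 1313.0, within 16-bit resolution
```

The round trip loses a little resolution (about 0.05 units over this span), which is exactly the kind of limitation to ask the vendor about.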
The relational database has become a more common
method for data storage in recent years. A relational database
organizes the data in a series of tables. By establishing relationships between these tables, a flexible and very powerful
database can be developed. Changes, additions, and deletions

1.5 Historical Data Storage and Evaluation

85

Date,Time,FIC1234.PV,FIC1234.OP,TIC1276.PV,TIC1276.OP,TIC1276.MODE
1-Apr-01,6:10:13am,1313.01,83.91,524.06,91.93,AUTO
1-Apr-01,6:11:13am,1313.27,83.52,524.87,91.81,AUTO
1-Apr-01,6:12:13am,1313.39,83.13,524.62,91.67,AUTO
1-Apr-01,6:13:13am,1312.87,84.06,524.67,90.31,AUTO
1-Apr-01,6:14:13am,1312.31,83.87,523.06,91.11,MAN
1-Apr-01,6:15:13am,1311.76,84.11,522.17,91.11,MAN
1-Apr-01,6:16:13am,1310.02,84.22,523.44,91.11,MAN
FIG. 1.5k
Sample flat file.
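Because of this accessibility, the flat file of Figure 1.5k can be read with a few lines of standard-library Python (only the first two records are reproduced here):

```python
import csv, io

flat = """Date,Time,FIC1234.PV,FIC1234.OP,TIC1276.PV,TIC1276.OP,TIC1276.MODE
1-Apr-01,6:10:13am,1313.01,83.91,524.06,91.93,AUTO
1-Apr-01,6:11:13am,1313.27,83.52,524.87,91.81,AUTO
"""

# Each row becomes a dict keyed by the tag names in the header line.
rows = list(csv.DictReader(io.StringIO(flat)))
pv = [float(r["FIC1234.PV"]) for r in rows]
print(pv)                        # [1313.01, 1313.27]
print(rows[0]["TIC1276.MODE"])   # AUTO
```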

FIG. 1.5l
Relational database design for a simple batch system. (Three related tables. Batch Records: Batch No., Start Time, Product ID, Density, Weight (lbs.), Lead Tech ID. Products: Product ID, Description, Min Density, Max Density, Target Density. Operators: Tech ID, Last Name, First Name, Date of Hire, Operating Team.)

of data can also be accomplished. In addition, many of these


systems will automatically track security information, such
as who made the changes and when.
Relational databases carry a lot more overhead than do
simple flat files or binary files. For this reason, they may not
be capable of high data throughput, or they may require faster
hardware, or more sophisticated networks than a simpler file
format. When designing a relational database system, be sure
to request performance data from the vendor. Performance
data should include an assessment of the rate at which data
can be stored and retrieved.
Relational databases are especially powerful where
event-type data are to be stored. For example, batch operations
often require recording of specific data from each step in the
process. Each step can be recorded in the database. Later,
these records can be processed using the powerful relational
tools, such as combining and filtering. Figure 1.5l shows a
relational database design for a simple batch system.
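As a small illustration of this combining and filtering, the following Python sketch builds two of the tables from Figure 1.5l in SQLite. The exact table names, column names, and sample data are invented for the example.

```python
import sqlite3

# Hypothetical schema, patterned loosely after Fig. 1.5l.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE products (product_id TEXT PRIMARY KEY,
                       description TEXT, target_density REAL);
CREATE TABLE batch_records (batch_no INTEGER PRIMARY KEY,
                            start_time TEXT, product_id TEXT,
                            density REAL,
                            FOREIGN KEY (product_id)
                              REFERENCES products (product_id));
""")
db.execute("INSERT INTO products VALUES ('P-100', 'Base resin', 1.05)")
db.executemany("INSERT INTO batch_records VALUES (?, ?, ?, ?)",
               [(1, '2001-04-01 06:10', 'P-100', 1.04),
                (2, '2001-04-01 14:25', 'P-100', 1.09)])

# Combine the tables and filter: batches off target by more than 0.02.
off_spec = db.execute("""
    SELECT b.batch_no, b.density, p.target_density
    FROM batch_records b JOIN products p USING (product_id)
    WHERE ABS(b.density - p.target_density) > 0.02
""").fetchall()
print(off_spec)   # [(2, 1.09, 1.05)]
```

A single join like this answers a question (which batches missed their density target?) that would take custom code against a flat file.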
Some data historian vendors use proprietary algorithms for
data storage. These techniques are often optimized for the specific system. Depending on the approach used, these formats
may also provide substantial reductions in storage requirements.
The caution, of course, is that one must be able to retrieve the
data. As in the compressed, or binary files, it is not desirable to
be limited to the data analysis tools provided by a single vendor.
So be sure to find out about export capabilities and formats.
It is also important to understand if the algorithm used causes
a loss in resolution of the data. For example, some systems use
a simple deadband to minimize the data collection frequency. A
new data point is only recorded if the value changes by more
than a certain deadband, for example, 1%. Because most operation is typically at steady state, data storage is greatly reduced.


One must be aware, however, that the data retrieved may have
been collected at a variety of sampling intervals. This may limit
the ability to perform certain types of analysis on the data.
The swinging-door algorithm is a modification of the deadband method. In this approach, the previous two data points are
used to develop a straight-line extrapolation of the next data
point. A deadband around this line is used. As long as the value
continues to track this trend line, within the deadband, no more
data are collected. Once a new point is collected, a new trend
line is calculated, and the process starts over. Figure 1.5m shows
how the swinging door algorithm works.
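A minimal Python sketch of the swinging-door variant described above follows. The sample data and deadband are invented, and commercial historians use more elaborate versions of the algorithm.

```python
def swinging_door(samples, deadband):
    """Record a (time, value) point only when it leaves the deadband
    around the straight line extrapolated from the last two recorded
    points (the variant described in the text)."""
    recorded = []
    for t, v in samples:
        if len(recorded) < 2:
            recorded.append((t, v))   # need two points to form a line
            continue
        (t1, v1), (t2, v2) = recorded[-2], recorded[-1]
        slope = (v2 - v1) / (t2 - t1)
        predicted = v2 + slope * (t - t2)   # extrapolated trend line
        if abs(v - predicted) > deadband:
            recorded.append((t, v))         # new point; new trend line
    return recorded

ramp = [(t, 2.0 * t) for t in range(6)] + [(6, 20.0)]
print(swinging_door(ramp, deadband=0.5))
# A steady ramp stores only its first two points; the step at t=6
# breaks the trend line and is recorded.
```

Note the advantage over a plain deadband: a steadily ramping signal generates no new records at all, because the ramp itself is the predicted trend.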
Where to Store Data
Given the open capabilities of today's DCS and PLC systems,
there are many options for the location of the stored historical
data. Data can be stored within the DCS, in the human
machine interface (HMI) system or on a separate box. When
making this decision, keep in mind the following.
Use industry-standard computers and hard drives for data
storage. The cost of the hardware for the data storage will be
much less than the DCS vendor's proprietary platform, and there
will be greater flexibility. Redundancy and sophisticated security
are available in most readily available industrial servers.
Consider the use of two networks. One will be for the
control system, and the other will be for data retrieval and
analysis. This separates the network loading, and eliminates
the risk of a power user analyzing last year's data and
bringing down the control network. A third network, typically
the largest, is the plantwide local-area network (LAN).

86

Overall Plant Design

FIG. 1.5m
Swinging door data storage algorithm. (Recorded signal vs. time; each recorded value is shown as a point. New data are recorded when the value goes outside the deadband; the slope of the deadband lines is based on the previous two data points.)

FIG. 1.5n
Diagram of recommended network arrangement. (Three networks: the plant LAN with analysis stations; the historian/info-system network with the data historian, a power-user station, and a network printer; and the DCS or PLC network with the DCS/PLC server and a quality station.)

Figure 1.5n shows a recommended network arrangement.


The idea here is to minimize the network traffic on each LAN.
At a minimum, keep data historian traffic off the control LAN
as much as possible: Protect the basic function of the control
network, which is to control the process. For systems connected to a plant or corporate LAN, coordinate efforts with
the information technology (IT) group.

Data Compression

Some of the algorithms mentioned above are used for data compression. That is, data compression reduces the amount of data storage required, thereby reducing both the network load and the hard disk storage requirements.

Meta-Data

Meta-data can be defined as data about the data. For example, many DCS systems include meta-data about the quality of a process variable signal. The signal may be out of range, good, manually substituted, or questionable. Capturing the meta-data is more of a challenge than simply collecting the number. Moreover, most analysis tools are not yet sophisticated enough to associate the meta-data with the raw values for analysis. Most newer control systems will make signal quality data available to a data historian.

The Cost of Data Storage

There was a time when data storage was expensive. In fact, on older DCS systems, this may still be true. In these cases, one must be careful about what to store, and for how long. As the cost of electronics dropped in the 1990s, the cost of electronic storage plummeted. At this time, the cost of storage is ridiculously small. As an example, take the cost of storing one PV, collected continuously every second for 1 year:

2 bytes × 60 s/min × 60 min/h × 24 h/day × 365 days/year ≈ 63 MB/year

In 2001, a 20-GB drive, or 20,000 MB, cost about $400. So the cost of each year's worth of PV storage was roughly $400 × 63 MB/20,000 MB = $1.26, and that was without any compression! So in most cases, the cost of data storage is only a minor factor when designing a system.
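This storage arithmetic can be checked in a few lines of Python, using the same assumptions (2 bytes per sample, one sample per second, a $400, 20-GB drive):

```python
# Storage for one PV sampled once per second for a year, and its
# cost on a 2001-era $400, 20-GB (20,000 MB) drive.
BYTES_PER_SAMPLE = 2
samples_per_year = 60 * 60 * 24 * 365
mb_per_year = BYTES_PER_SAMPLE * samples_per_year / 1_000_000
cost = 400.0 * mb_per_year / 20_000

print(round(mb_per_year, 1), "MB/year")   # 63.1 MB/year
print(round(cost, 2), "dollars/year")     # 1.26
```

Even a thousand such tags would have cost only on the order of a thousand dollars a year of disk, which is why the text treats storage cost as a minor design factor.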
HARDWARE SELECTION
Typically, data historians will bridge multiple networks. On one
side, they collect data from the control system network, and on
the other they make the data available to the plant LAN. This
arrangement will minimize traffic on the control system network, improving capacity and stability of the control system.
Designing a computer network has become a fairly complicated subject. Refer to Chapter 4 for a more complete
treatment of networks.
Data are typically stored on a hard drive system. This
technology changes rapidly, and hard disks of greater and
greater capacity are available each year. Data historian systems are usually designed to have the largest available hard
drive. The incremental cost is small, and there will be room
to collect data at higher frequencies and for longer time
periods with a larger hard drive.
Hard drives, of course, are mechanical devices with moving parts. Although hard drive reliability has improved
greatly, failures are still quite common over a 5- or 10-year
life. Redundant systems, such as RAID arrays, are used to
minimize the risk of lost data due to hard drive failure. There
are many ways to implement these redundancy strategies.
Discuss this with a knowledgeable computer consultant
before attempting this.
Some RAID arrays use a parity-checking scheme, which
is shown in Figure 1.5o. If any single drive fails, the paritycheck drive can be used to reconstruct the lost data. The parity
drive totals the bits in each column from each of the other
drives. If the total is even, it records a zero. If the total is
odd, it records a 1. Lost data are reconstructed by totaling
the remaining drives, and checking the odd or even result
against the parity.
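Because the even/odd rule is equivalent to a bitwise exclusive-OR, the scheme is easy to sketch in Python; the byte values below are the drive contents shown in Figure 1.5o.

```python
# Parity as described above: the parity bit for each column is 0 when
# the column total is even and 1 when odd, i.e., the XOR of the bits.
drives = [0b1010_1111, 0b1010_0011, 0b0100_1000, 0b1110_0001]

parity = 0
for d in drives:
    parity ^= d
print(f"{parity:08b}")      # 10100101, matching the parity drive

# If drive 4 is lost, XOR the survivors with the parity to rebuild it.
recovered = drives[0] ^ drives[1] ^ drives[2] ^ parity
print(f"{recovered:08b}")   # 11100001, the lost drive's contents
```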
To analyze the data, a computer station that is capable of accessing the data and running the software analysis tools is necessary.

FIG. 1.5o
Diagram of parity-checking scheme. (Drive 1: 1010 1111; Drive 2: 1010 0011; Drive 3: 0100 1000; Drive 4: 1110 0001; Parity Drive: 1010 0101.)

In most cases, the lowest-cost option will be to
run the analysis software on the computers that already exist
on the plant LAN. This has the added advantage of making
the data and the completed analysis available via standard
business applications, such as e-mail and word processing.
If the existing computers are not capable of handling the
analysis, or if a dedicated station is desired, then it will be
necessary to purchase additional stations. These stations should
have a high-speed network card to gather the data from the
historian. The amount of raw processing power required
depends on the amount of data that will be analyzed. Most
analysis stations will work well with low- to mid-range processors. A high-powered processor may be required when analyzing very long-term data (years), or when doing complex
analysis, such as correlation studies or frequency analysis.
Analysis stations should also have a simple way of exporting
the data to other PCs. This can be done directly via LAN connection,
or via floppy disk, removable hard drive (or ZIP drive), or CD writer. The choice of media depends upon what type of readers
are available to the majority of downstream data users.
In smaller applications, one may be tempted to locate the
analysis software directly on the same machine that is used
for data collection and storage. This is not recommended.
Frequent user access to the machine may cause operating
system failures or machine re-boots. Data may be lost while
the machine is recovering.
Backup Media
The backup media should be chosen based on the following
criteria: durability, ease of use, and cost.
Durable media will last for many years. It may be surprising that CDs have an expected life span of only 10 years,
and they are one of the better media available. One should
also be sure to select a supplier that will be around for a while.
One is not likely to need to restore from a backup for several
years. If a disaster occurs, and it is necessary to restore the
data, it is important to be sure that the device driver will work
with the latest operating system.

Drive 1:        1010 1111
Drive 2:        1010 0011
Drive 3:        0100 1000
Drive 4:        LOST DATA
Parity Drive:   1010 0101
Recovered Data: 1110 0001

Recovery of lost data using the parity drive (see Figure 1.5o).

Backup media should also be easy to use for both backup and recovery. There should


be software backup tools that guide the process. In some of
the better systems available today, the backup process can be
automatically triggered, based on time and day. This way,
one simply pops in the new media sometime before the next
backup.
Also, be sure that the media can be easily labeled. If it
is not labeled, it will not be possible to find the right version
for recovery.
When selecting media for backup, the cost of media is
almost as important as the cost of the drive hardware.
Remember that one will be filling up the media on a daily,
weekly, or monthly basis. A lot of media will be used over
the life of the system.
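A quick back-of-envelope check makes the point; all figures below are illustrative assumptions, not vendor prices:

```python
# Rough media budget over the system life (all figures are assumed).
backups_per_year = 52        # assume weekly archives
system_life_years = 10
cost_per_disc = 1.50         # assumed cost of one blank disc
cost_of_drive = 200.00       # assumed one-time drive hardware cost

media_cost = backups_per_year * system_life_years * cost_per_disc
print(media_cost)  # at these assumed prices, media alone far exceeds the drive
```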

ANALYSIS AND EVALUATION


The most basic of analysis tools is summarization. Using tools
such as totalization or averaging, one can convert material usage
rates and production rates into more meaningful numbers, such
as tons produced per day, or tons consumed per month.
Because many DCS and PLC systems are equally capable
of totalizing and averaging the numbers, a strategic choice about
where to perform the calculation is necessary. In most cases it
is best to perform the calculation as far down in the system as

possible. This ensures consistency and also maintains the data


in the event of loss of communication to the data historian.
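A minimal sketch of such summarization in Python, assuming evenly spaced (timestamp, rate) samples and a 7 A.M. production-day start (one of the business practices a plant might choose):

```python
from datetime import datetime, timedelta

DAY_START_HOUR = 7  # assumed business practice: 7 A.M. to 7 A.M. production day

def production_day(ts):
    """Map a timestamp to the date its production day started on."""
    return (ts - timedelta(hours=DAY_START_HOUR)).date()

def daily_totals(samples, interval_hours=1.0):
    """Totalize (timestamp, tons_per_hour) samples into tons per production
    day. Assumes a constant sample interval, so each sample contributes
    rate * interval_hours tons to its day's total."""
    totals = {}
    for ts, tons_per_hour in samples:
        day = production_day(ts)
        totals[day] = totals.get(day, 0.0) + tons_per_hour * interval_hours
    return totals

# Illustrative hourly samples:
samples = [
    (datetime(2001, 4, 1, 6, 0), 20.0),   # before 7 A.M.: previous day's total
    (datetime(2001, 4, 1, 8, 0), 22.0),
    (datetime(2001, 4, 1, 9, 0), 21.5),
]
print(daily_totals(samples))
```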
When establishing any sort of summarization, be sure to
document the basis for the calculation. Of particular importance is the time frame chosen for daily averaging. Some plants
use midnight-to-midnight, some choose shift-start to shift-start
times (i.e., 7 A.M. to 7 A.M.) to report production. Make sure
that the data historian matches the established business practices.

Managers love reports! Reports provide some insight into the
process, and more importantly, it provides information for
direct accountability, particularly for production information.
In many cases, managers' bonuses are based directly on the
production numbers that appear on their reports. So expect a
lot of interest from management when planning the reporting
system. Similarly, be sure to document the basis of any calculations, perhaps in a user's manual.

Generally speaking, the
daily production report is the heart of many reporting systems.
This report will show daily production totals, and may also
include usage of raw materials and energy. The most sophisticated of these reports will include pricing and profit data
obtained through a link to another business tracking system.
Monthly reports simply totalize and collate the information that was contained in the daily reports. Depending on
the needs of management, quarterly or annual reports may
also be required. Figure 1.5p shows a typical daily production
report.

Daily Production Report
XYZ Widgets, Inc.

Start Date:  1-Apr-01     Start Time: 7:00:00 AM
Stop Date:   2-Apr-01     Stop Time:  7:00:00 AM

Production
  Total Tons:   527.3
  Good Tons:    506.3   (Good Tons: 96.02%)
  Product A:    126.6
  Product B:    269.5
  Product C:    110.2
  Scrap:         21.0

Raw Materials
  Ingredient A:  433.1 tons
  Ingredient B:   66.1 tons
  Ingredient C:   32.3 tons
  Gas:          1276.3 MCF

Time Analysis
  Product A:     5:21
  Product B:    12:17
  Product C:     4:41
  ChangeOver:    0:41
  Maintenance:   1:00

FIG. 1.5p
Sample production report.

[Trend plot: PV (10.0 to 50.0) versus Time (17:33 to 17:37)]

FIG. 1.5q
Diagram of a simple trend.

When designing the report, it is best to try to provide the


data in a format that is directly usable. Avoid providing a
report that the user must then re-enter into a spreadsheet.
If users are going to do this every day, week, or month, why
not design the system to provide the spreadsheet directly?
As reporting systems have become more sophisticated,
it is now possible to add trends or other graphical information
to the report.
Trends, of course, are a basic analysis tool included with
most DCS and HMI systems. But they are usually limited to
short-term analysis.
Most DCS- and HMI-based trending packages cannot handle
data that stretches over months or even years, so a trending
package designed for the data historian will most likely be
required. Figure 1.5q shows a simple trend.
Advanced trending packages will allow the user to adjust
the timescale and the displayed range of data. This is commonly called zoom-in and zoom-out capability.
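One common way long-range trending packages stay responsive is min/max downsampling, sketched below in Python: each bucket keeps its lowest and highest points, so brief spikes remain visible when zoomed out. This is an assumed technique for illustration, not a description of any particular product.

```python
def downsample(points, max_points):
    """Reduce a long (timestamp, value) trend to roughly 2*max_points
    points by keeping the min and max of each bucket, so short spikes
    stay visible when the user zooms out."""
    if len(points) <= max_points:
        return list(points)
    bucket_size = len(points) // max_points
    out = []
    for i in range(0, len(points), bucket_size):
        bucket = points[i:i + bucket_size]
        lo = min(bucket, key=lambda p: p[1])
        hi = max(bucket, key=lambda p: p[1])
        out.extend(sorted({lo, hi}))  # chronological order; drop duplicate
    return out
```

Zooming in then simply re-runs the reduction over a narrower time window, at full resolution once the window is small enough.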
Even the most well conceived analysis system will not
meet the needs of all users. Power users will always come
up with some new analysis that they simply must have to
solve a process or business problem. Although it is tempting
to modify the design of the data historian or analysis tools,
it is usually prudent (and cost-effective) simply to make the
data available through export to a common data format.
The fact is that 99% of the users will not need the special
feature that the power user wants. By exporting the data,
power users can then apply whatever tools they wish. The
responsibility for justifying and maintaining the specialized
analysis tool falls back to them.
A simple export tool will simply dump out the data in a
standard ASCII, CSV, or Excel file. More sophisticated export
tools may allow one to filter or sort the data first, before exporting.

Statistical quality control requires special trending and/or alarming tools. For more detail on the use of statistical process control tools, see IEH, Volume 2, Section 1.20, Statistical Process Control.
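A minimal sketch of such an export tool in Python, using the standard csv module; the tag name and file path are illustrative, and a real historian would supply the rows from its own query interface:

```python
import csv

def export_csv(rows, path):
    """Dump historian rows to a CSV file that spreadsheets and power-user
    tools can open directly. `rows` is a list of (timestamp, tag, value)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "tag", "value"])  # header row
        writer.writerows(rows)

# Hypothetical usage -- tag name and values are made up:
rows = [("2001-04-01 07:00", "FI-101", 21.5),
        ("2001-04-01 08:00", "FI-101", 22.0)]
export_csv(rows, "fi101.csv")
```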
To assess the relationship between variables, correlation
tools are required. A variety of such tools, including JMP and
SAS, have been available for many years.
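Even without a dedicated statistics package, a basic correlation check can be sketched in a few lines of Python; the production and steam figures below are made-up illustration data:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length,
    nonconstant series: +1 is a perfect linear relationship."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Illustrative question: does steam flow track production rate?
production = [10.0, 12.0, 11.0, 15.0, 14.0]
steam      = [30.5, 36.0, 33.2, 45.1, 42.0]
print(pearson(production, steam))
```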


Data Filtering and Editing


Data are not perfect. Sensors fail. Wires fall off. Instruments
drift out of calibration. Unexpected disturbances occur. To
complete analysis of imperfect data, it is desirable to have
data filtering and editing tools.
Filtering tools allow data selection based on a variety of
criteria. The most obvious will allow selection based on a
time window. But it may be desirable to be able to select
data using more complex criteria, such as whether or not the
point was in alarm, or whether a start-up was in progress, or
whether it was a Sunday. A good set of filtering tools can be
used to find patterns in the data quickly, and to prove or
disprove hypotheses about plant problems.
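A minimal sketch of such filtering in Python, assuming (timestamp, value) samples; the Sunday filter mirrors the example above:

```python
from datetime import datetime

def select(samples, start=None, end=None, predicate=None):
    """Filter (timestamp, value) samples by time window and an optional
    arbitrary predicate, e.g., 'was it a Sunday?'."""
    out = []
    for ts, value in samples:
        if start is not None and ts < start:
            continue
        if end is not None and ts >= end:
            continue
        if predicate is not None and not predicate(ts, value):
            continue
        out.append((ts, value))
    return out

# Example criterion: keep only Sunday data (weekday() == 6 is Sunday)
sundays = lambda ts, value: ts.weekday() == 6
```

In practice the predicate could just as easily test an alarm flag or a start-up indicator stored alongside the sample.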
Data editing tools allow the user to modify data that were
collected automatically. This can be used to fill in gaps, or
to correct for known instrument drift, for example. In the
most sophisticated systems, the database will keep track
of all edits that were performed on the data.
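Such edit tracking can be sketched as below; the tag name, user, and reason are hypothetical:

```python
from datetime import datetime

class EditLog:
    """Record every manual edit so the raw value can always be recovered."""
    def __init__(self):
        self.entries = []

    def edit(self, data, tag, ts, new_value, user, reason):
        old_value = data[(tag, ts)]
        data[(tag, ts)] = new_value          # apply the correction
        self.entries.append({                # ...and remember what changed
            "when": datetime.now(), "tag": tag, "sample_time": ts,
            "old": old_value, "new": new_value,
            "user": user, "reason": reason,
        })

# Hypothetical usage: correct a sample for known instrument drift.
data = {("FI-101", "2001-04-01 07:00"): 21.5}
log = EditLog()
log.edit(data, "FI-101", "2001-04-01 07:00", 22.0, "gcb", "known +0.5 drift")
```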
SYSTEM TESTING
When starting up a data historian, plan on extensive system
testing. One will want to confirm that the right data are being
recorded, at the right frequency, and stored in the right place.
Also, it is important to test the backup and restore system.
Do not wait until after a catastrophe to discover that all the valuable data were never backed up.
SUPPORT OF THE DATA HISTORIAN SYSTEM
The biggest support issue will be to decide who owns the
maintenance of the system. In most plants, the data historian
falls right on the border between the instrument and control
department and the IT department. The input of both groups
is important to establish a successful maintenance plan.
Be sure to discuss issues such as:

- Where will the hardware be housed?
- Who will complete the backups?
- Who will be called in if there are hardware problems?
- Who will handle software problems?
- How to work together to handle communications problems between the data historian and the DCS or PLC. (Remember: there are two ends to every communications problem!)
- Where to find budgetary funds for maintenance, service, and upgrades.

There are many ways to structure the agreement between the


various parties successfully. What is most important at this stage
is to discuss the issues openly, and to establish a basic plan.
Security
Depending on the policies of a company, data security may
be a very big concern. In many plants, information such as
production rates, downtime, and material usage is considered proprietary.
To use the data to the fullest extent, the historian should
be resident on the plantwide or corporate network. However,
the larger the connected network, the greater the concerns
about unauthorized access to critical business information.
The security system for a data historian can range from
simplistic to very complex. It is best to consult with the IT
department to ensure that an appropriate security system is
selected. The goal is to maintain the security of the data,
while making authorized access as simple as possible. The
best systems will allow integration of data historian security
with the existing plant network security.
A more detailed treatment of security issues can be found
in Section 2.7, Network Security.
Backup, Archive, and Retrieval
Be sure to establish a backup system. It would be a shame
to collect all of this valuable data, then lose the data because
of a hard drive crash. Backups are typically made to some
type of removable media. The type of media chosen depends


on several factors: economics, expected life of the data, and
regulatory requirements.
Archives back up the process data. A complete backup
will also record configuration information for the operating
system, the hardware, and the data historian itself. Archives
should be made daily or weekly, and a complete backup
should be done when the system is installed, and at least
annually thereafter.
The economics of removable media seem to change each
year, as the electronics industry continues to evolve. Check with
the IT department or a PC consultant to evaluate these options.
A word about regulatory requirements: If the data must
be kept on file to meet government regulations, be careful in
the choice of media. Most electronic media (even CD-ROMs)
have an expected life of only 10 years. Even if the media
lasts, will the media still be readable in 10 years, on the
next generation of computers? Unfortunately, paper and
microfilm remain the media of choice for storage that must
last more than 10 years.
Of course, a backup is useless if it cannot be retrieved.
The retrieval system should allow data to be restored fully.
Depending on the requirements, one may want to restore the
data to the primary data historian, or one may want to restore
the data to another off-line system. Be sure to inquire about
the capabilities.

Bibliography
Eckes, G., General Electric's Six-Sigma Revolution, New York: John Wiley
& Sons, 2000.
Hitz, P., Historical Data in Today's Plant Environment, Research Triangle
Park, NC: ISA Press, 1998.
Johnson, J. H., Build Your Own Low-Cost Data Acquisition and Display
Devices, 1993.
Lipták, B., Optimization of Unit Operations, Radnor, PA: Chilton, 1987.
Liu, D. H. F., "Statistical Process Control," in Process Control (Instrument
Engineers' Handbook), 3rd ed., Radnor, PA: Chilton, 1995, pp. 138–143.
