Applied Energy 96 (2012) 347–358
Contents lists available at SciVerse ScienceDirect
Applied Energy
journal homepage: www.elsevier.com/locate/apenergy
Application of machine learning in the fault diagnostics of air handling units

Massieh Najafi a,*, David M. Auslander a, Peter L. Bartlett b, Philip Haves c, Michael D. Sohn d
a Department of Mechanical Engineering, University of California, Berkeley, California 94720, United States
b Computer Science Division and Department of Statistics, University of California, Berkeley, California 94720, United States
c Commercial Building Systems Group, Lawrence Berkeley National Laboratory, California 94720, United States
d Airflow and Pollutant Transport Group, Lawrence Berkeley National Laboratory, California 94720, United States
Article info
Article history:
Received 16 August 2011
Received in revised form 9 January 2012
Accepted 20 February 2012
Available online 27 March 2012
Keywords:
Bayesian network
HVAC systems
Air-handling unit
Energy management
Fault detection and diagnosis
Machine learning
Abstract
An air handling unit's energy usage can vary from the original design as components fail or fault: dampers leak or fail to open/close, valves get stuck, and so on. Such problems do not necessarily result in occupant complaints and, consequently, are not even recognized to have occurred. In spite of recent progress in the research and development of diagnostic solutions for air handling units, there is still a lack of reliable, scalable, and affordable diagnostic solutions for such systems. Modeling limitations, measurement constraints, and the complexity of concurrent faults are the main challenges in air handling unit diagnostics. The focus of this paper is on developing diagnostic algorithms for air handling units that can address such constraints more effectively by systematically employing machine-learning techniques. The proposed algorithms are based on analyzing the observed behavior of the system and comparing it with a set of behavioral patterns generated based on various faulty conditions. We show how such a pattern-matching problem can be formulated as an estimation of the posterior distribution of a Bayesian probabilistic model. We demonstrate the effectiveness of the approach by detecting faults in commercial building air handling units.
© 2012 Elsevier Ltd. All rights reserved.
1. Introduction
1.1. Overview
Air handling units account for a significant portion of building energy consumption and have a major impact on comfort conditions and building maintenance cost. An air handling unit's energy usage can vary from the original design as components fail or fault: dampers leak or fail to open/close, valves get stuck, and so on. Such problems do not necessarily result in occupant complaints, as the cascade structure of the control system tries to neutralize the fault effect by re-adjusting other parameters and/or changing the component loads. For instance, the effect of a damper-leakage fault may be masked by re-adjusting the position of the hot or cold water valves. The fault may not even be recognized to have occurred even though it may result in an increase in energy usage. As long as the control system satisfies the set-points, building operators tend to assume that the system is working efficiently in a non-faulty condition.
The topic of fault detection and diagnosis in air handling units has been an active area of research and development for more than two decades. However, in spite of the progress and effort made, there is still a lack of reliable, affordable, and scalable solutions to locate and manage faults in these systems; modeling limitations, measurement constraints, and the complexity of concurrent faults are among the main challenges for scalable solutions for air handling unit diagnostics.
* Corresponding author.
E-mail addresses: massieh@ieee.org (M. Najafi), dma@me.berkeley.edu (D.M. Auslander), bartlett@cs.berkeley.edu (P.L. Bartlett), phaves@lbl.gov (P. Haves), mdsohn@lbl.gov (M.D. Sohn).
0306-2619/$ - see front matter © 2012 Elsevier Ltd. All rights reserved.
doi:10.1016/j.apenergy.2012.02.049
The fact is that the principles of HVAC systems, particularly for
air handling units, are known well enough to create suitable model
structures; however, the accuracy of such models can be improved
only up to a certain level; beyond that, excessive effort is required
to obtain high-quality a priori knowledge,1 which negatively affects
model scalability. This limits the applicability of diagnostic strategies
that rely on accurate or detailed models.
On the other hand, the architecture of sensor networks in air handling units is not necessarily designed solely for diagnostic purposes. Other factors and considerations such as control objectives, financial constraints, and practical limitations are also involved. As a result, we are confronted with situations in which the performance of two or more components is monitored through only one sensor (or one set of sensors). A well-known example is reliance on supply air temperature to analyze the functionality of the mixing box and heating and cooling coils. As will be shown later, in such scenarios, when the sensor output is contaminated, it could be due to the malfunction of any of the involved components, and it is not necessarily straightforward to locate the malfunctioning one.
1 If the model is a detailed first-principle model, the a priori knowledge comprises mainly model parameter values and their variations. If the model is an empirical model, the a priori knowledge is usually high-quality training data for system behavior in different modes.
Nomenclature
HVAC         heating, ventilation, and air conditioning
No fault     no-fault condition
Reverse      reverse actuator fault
OAD leak     outside air damper leakage fault
RAD leak     return air damper leakage fault
Stuck        stuck damper fault
Fouling      fouling fault
VLV stuck    valve-stuck fault
SAT          supply air temperature (°F)
OAT          outside air temperature (°F)
RAT          return air temperature (°F)
MAT          mixed air temperature (°F)
DMP          outside air damper position (%)
T_air_in     temperature of entering air (°F)
T_water_in   temperature of entering water (°F)
T_air_out    temperature of outgoing air (°F)
NTU          number of transfer units (NTU) method
CFM          cubic feet per minute, measurement of air volume flow rate
The complexity arising from modeling limitations and measurement constraints in air handling unit diagnostics becomes even more severe when the possibility of concurrent faults is taken into account. A single-fault assumption would reduce the diagnostic complexity, but in reality, two or more faults may occur at the same time within one component or across different ones. The effect of concurrent faults is not necessarily a linear combination of the effects of the individual faults.
One approach to reducing the diagnostic complexity due to modeling limitations and measurement constraints is active diagnostics. In active-mode diagnostics, the diagnostic mechanism actively controls or manipulates the system inputs (e.g. damper positions, valves, etc.) to detect and isolate faults. Usually, inputs are changed based on predefined (or adaptive) test sequences to explore various operating conditions. The tests can be structured to explore operating points with less uncertainty or error or, in the case of one sensor being affected by the functionality of several components, to put neighboring components into neutral states so that only one component at a time affects the measured variable. However, active-mode diagnostics require isolation of the system from normal operation, an option that may not be feasible.
Conversely, in passive-mode diagnostics, there is no control over the inputs. In this approach, the system is in closed-loop operation, with the inputs manipulated by the control system based on the set-point error and so on. This is a more complicated scenario, as there is no capability to change or manipulate the inputs to follow a test procedure or sequence. The diagnostic mechanism needs to make the best use of the available data (measurements) from daily operation.
The focus of this paper is on developing passive-mode diagnostic algorithms for air handling units that can systematically address the above constraints. We believe that an ideal diagnostic solution should not only be reliable in detecting and isolating abnormal behaviors but also have systematic solutions for constraints and challenges related to scalability and affordability. Our proposed diagnostic algorithm is based on analyzing observed behavioral patterns and comparing them with a set of predefined patterns generated based on different fault assumptions. In Section 3, we will show how such a pattern-matching problem can be formulated as estimation of the posterior distribution of a Bayesian diagnostic model. We will also show how the proposed diagnostic framework can systematically address modeling and measurement constraints. In Section 4, we demonstrate the effectiveness of the proposed algorithm using various examples.
Nomenclature (continued)
VLV          valve position
IID          independent and identically distributed
v_a          air velocity (ft/s)
v_w          velocity of water (ft/s)
C_h          hot fluid capacity rate
C_c          cold fluid capacity rate
T_air-in     temperature of incoming air (°F)
T_w-in       temperature of incoming water (°F)
α            coefficient factor
β            coefficient factor
μ            mean or expected value
σ²           variance
ΔP           total pressure rise across fan
C_p          specific heat of air (BTU/lb °F)
d            density (lb/ft³)
[symbol missing in source]  fan combined efficiency
2. Literature survey
Heating, ventilation, and air conditioning (HVAC) systems account for more than 30% of annual energy use in the United States [3,5,6]; however, it has become apparent that only a small percentage of them work efficiently or in accordance with the design intent [2,9]. Operational faults are one of the main causes of the inefficient operation of HVAC systems. Studies of existing buildings have found that energy savings of 5–15% are typically achievable simply by fixing faults and optimizing HVAC control systems [8].
However, the current methods of detecting faults or performance creep are labor-intensive. Typically, building operators or engineers use intuition and various rules of thumb to identify the problem. In practice, the labor-intensiveness of these tasks is such that they are not routinely performed and in fact may never be performed. If the 5–15% energy savings are to be met in practice, HVAC systems must be capable of detecting when a failure has occurred or when performance is creeping, and of determining the likely offending hardware or operating condition. Automated systems for fault detection are, therefore, essential if low-energy or net-zero energy goals are to be met nationally.
Functionally, an air handling unit (AHU) is a device used to condition and circulate the air as part of an HVAC system. It is usually a large metal structure containing one or two fans, a mixing box, and heating/cooling coils2 (Fig. 1). The mixing box mixes the air returning from the building with fresh outside air; the minimum ratio of outside air to be re-circulated is specified by building codes. The heating/cooling coils heat up or cool down the mixed air to maintain the required supply air temperature and humidity.
2 It may contain both or either.
Fig. 1. Air handling unit.
Typically, an air handling unit contains three temperature sensors, the outside air temperature (OAT), return air temperature (RAT), and supply air temperature (SAT) sensors, along with a fan status indicator (Fig. 1). One of the main challenges in monitoring air handling unit performance is the absence of a reliable measurement for the mixed air temperature (MAT), the temperature of the air coming from the mixing box before going through the heating/cooling sections. Usually, either there is no sensor in place to measure the MAT or, even if there is a temperature sensor, the sensor readings are unreliable due to incomplete upstream mixing. This constraint forces us to use the SAT sensor to evaluate mixing box performance. However, as shown in Fig. 1, the SAT is also affected by the heating/cooling coil functionality, and distinguishing the mixing box effects from the heating/cooling coil effects is not straightforward (as in the case when two or more components are being monitored through one sensor).
An AHU malfunctions when one or more of its internal components fail. Air handling unit diagnostics have been an active area of research and development [26,27,33,43,7,41,24,12,14]. A variety of diagnostic solutions ranging from first-principle-model-based diagnostic routines [16,32] to empirical-model-based diagnostic approaches [36,32,45,46,34,29,30] and qualitative/rule-based diagnostic solutions [25,4,19,15] have been developed for the evaluation of the performance of air handling units and their components. However, as mentioned earlier, the nature of the HVAC industry and the fact that AHUs are usually designed and customized for each individual building limit, from the scalability perspective, the applicability of diagnostic solutions that rely on detailed models (or models that rely on configuration data that is not easily measurable or accessible). On the other hand, when an analysis approach employs simplified, more generic models, the challenge is how to differentiate between the inconsistencies due to model misspecification errors and those due to system malfunction. In other words, when detailed models are replaced with more simplified ones, the interpretation of model prediction differences becomes more challenging.
A strategic approach to address the complexity of employing simplified models is to shift the focus of the analysis from error residuals to system behavioral patterns. In other words, instead of analyzing the difference between the system output and the model prediction at one or a few operating points, diagnostics are made by evaluating the system behavioral patterns over a window of operation. This lessens the dependency of the diagnostic algorithm on model accuracy. Such an approach has been employed by a number of diagnostic routines, particularly qualitative and semi-quantitative diagnostic approaches [26,27]. The key here is an algorithm (inference mechanism) that evaluates the observed behavior and compares it against a set of predefined (or even adaptive) hypotheses. Fuzzy logic has become a popular choice for such problems due to the inherent flexibility embedded in fuzzy sets and fuzzy rules, which makes it a suitable solution for reasoning in domains with some level of uncertainty [44,16,17,20]. For example, Haves et al. [17] proposed a fuzzy-based diagnostic routine for the fault diagnostics of VAV air handling units in which the fuzzy-based inference mechanism compares the predictions of simplified models with the air handling unit component outputs at various operating conditions to draw conclusions about the air handling unit health status.
However, fuzzy-based inference mechanisms have their own limitations. As the problem complexity grows (due to the system complexity, a large amount of disparate sensor data, the number of potential faults, etc.), a large number of fuzzy sets and fuzzy rules are required to analyze the system performance. Added to this is the difficulty of adjusting and tuning fuzzy sets either manually or through other approaches.
Another approach to managing modeling limitations is rule-based diagnostic routines [42,10,35,1,28,37,38]. In this approach, a priori knowledge is formulated through a set of if-then rules coupled with an inference mechanism that searches through the rules to draw a diagnostic conclusion. Rule-based frameworks can be designed based on expert knowledge or first principles. Their advantage is simplicity and ease of deployment; however, as discussed in Katipamula and Brambley [26,27], as problem complexity grows or when new/additional rules are added, the simplicity of the approach is lost quickly. Furthermore, the activation of the rules sometimes depends on threshold(s), which may depend greatly on model uncertainties, measurement errors, or other issues. More discussion on this can be found in House et al. [19].
In this paper, we adopt the strategy of employing simplified models, as we believe that dependency on complex and detailed models is a significant technological barrier and cause for industry resistance to large-scale deployment. Our approach therefore relies on more sophisticated inference mechanisms to interpret discrepancies between model predictions and the system output.
3. Diagnostic algorithm
We think of fault diagnostics as the process of analyzing a system behavioral pattern (observed performance) and comparing it with a set of hypothetical patterns to find the closest match. Each hypothetical pattern is developed based on the assumption of the existence of one or more faults in the system (see, for example, [39,40]). Once the closest hypothetical pattern is identified, the associated assumptions are concluded to be the system health status. For example, in mixing box diagnostics, if it turns out that the observed performance is closer to the behavioral pattern described by the outside-air-damper-leakage fault condition than to the other patterns in a pool of behavioral patterns associated with stuck-damper fault, reverse-actuator fault, and so on, it is concluded that the underlying mixing box has an outside-air-damper-leakage fault.3
To formulate this within a mathematical framework, let us define the set of potential faults as:
\[ F = \{f_1, f_2, f_3, \ldots, f_n\} \tag{3.1} \]
and the measured data from the system as:
\[ E = \{e_1, e_2, e_3, \ldots, e_m\} \tag{3.2} \]
where $e_1 \ldots e_m$ represent vectors of the data measured at $t = 1, \ldots, m$. The aim is to calculate the probability of $F$ given $E$, $P(F \mid E)$ (the posterior probability of $F$), and to find the combination of $f_1, f_2, f_3, \ldots, f_n$ for which $P(F \mid E)$ is maximized.
$f_1 \ldots f_n$ represent the set of all possible faults in the system ($f_i$ is 1 when the $i$th fault exists and 0 when the $i$th fault does not exist). For example, in the mixing box example, $f_1$ could be an outside-air-damper-leakage fault, $f_2$ could be a return-air-damper-leakage fault, and $f_3$ could be a reverse-actuator fault. Therefore, $F = \{1, 0, 0\}$ means that only one fault (an outside-air-damper-leakage fault) exists; similarly, $F = \{0, 0, 1\}$ is related to the case of a reverse-actuator fault, and $F = \{1, 1, 0\}$ is related to the case of two concurrent faults: an outside-air-damper-leakage fault and a return-air-damper-leakage fault. The case of $F = \{0, 0, 0\}$ is related to a no-fault scenario.
Note that the marginal probability of an individual fault ($f_j$) can be calculated by:
\[ P(f_j \mid e_1, e_2, e_3, \ldots, e_m) = \sum_{f_1 \ldots f_n \text{ excluding } f_j} P(f_1, f_2, f_3, \ldots, f_n \mid e_1, e_2, e_3, \ldots, e_m) \tag{3.3} \]
Now, using Bayes' rule, we can compute $P(F \mid E)$ as:
\[ P(f_1 \ldots f_n \mid e_1 \ldots e_m) = \frac{P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n)}{\sum_{f_1 \ldots f_n} P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n)} \tag{3.4} \]
where $P(f_1 \ldots f_n)$ is the prior distribution. Different strategies or logic can be used to estimate the prior distributions. They can be defined based on statistical analysis if there are statistical results or qualitative information about which faults (or fault combinations) are more frequent than others. Additionally, intuitive methods can be employed to define the fault priors. In this paper, we follow the philosophy that a single fault is more likely to occur than two faults simultaneously; similarly, two concurrent faults have a higher occurrence probability than three concurrent faults. Therefore, single faults are assigned a higher prior than two concurrent faults, two concurrent faults have a higher prior than three concurrent faults, and so on.
With an IID sampling assumption,4 Eq. (3.4) can be expanded as:
\[ \log P(f_1 \ldots f_n \mid e_1 \ldots e_m) = \log P(f_1 \ldots f_n) + \sum_{i=1}^{m} \log P(e_i \mid f_1 \ldots f_n) - \log \sum_{f_1 \ldots f_n} P(f_1 \ldots f_n)\, P(e_1 \ldots e_m \mid f_1 \ldots f_n) \tag{3.5} \]
3 The mixing box functionality, model, and diagnostic algorithm are discussed in detail in Section 4.
4 Here, the IID assumption means that, given faults $f_1 \ldots f_n$, the random variables $e_1 \ldots e_m$ are statistically independent and identically distributed. More on IID sampling can be found in DasGupta [11].
$P(e_i \mid f_1 \ldots f_n)$ is the likelihood function: the probability of measuring $e_i$ given $f_1 \ldots f_n$. This comes from the system model: assuming that the fault condition $f_1 \ldots f_n$ exists, what is the likelihood of measuring $e_i$? We can split $e_i$ into two sets: the set of system inputs, $I_i$, and the set of system outputs, $O_i$:
\[ e_i = \{I_i, O_i\} \]
The inputs are assumed to be known and deterministic,5 and the output is what is measured from the system behavior. For example, in the case of the mixing box, the inputs are the outside air temperature (OAT), the return air temperature (RAT), and the outside air damper position (DMP), and the output could be the mixed air temperature (MAT) or outside air fraction (OAF).6
Under these assumptions, $P(e_i \mid f_1 \ldots f_n)$ can be written as7:
\[ P(e_i \mid f_1 \ldots f_n) = P(O_i \mid I_i, f_1 \ldots f_n) \tag{3.6} \]
Eq. (3.6) is indeed a probabilistic model of system performance. It defines the system output as a random variable conditionally dependent on the input and the fault status. Interpreting the model output as a random variable provides a systematic structure to deal with uncertainties in the model output due to modeling simplifications and errors. In this framework, such uncertainties can be quantified in the variance of the random variable.
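The pattern-matching formulation in Eqs. (3.3)-(3.6) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the Gaussian output noise, the toy mixing box model `toy_mixing_box`, the leak floor of 0.3, and the prior weight `prior_decay` (each additional concurrent fault multiplies the unnormalized prior by this factor) are all assumptions made for illustration only.

```python
import itertools
import math


def log_gaussian(y, mean, var):
    """Log density of a Gaussian with the given mean and variance."""
    return -0.5 * math.log(2 * math.pi * var) - (y - mean) ** 2 / (2 * var)


def toy_mixing_box(damper, combo):
    """Hypothetical model: expected OAF for a damper position in [0, 1].

    combo = (leak, reverse); both fault effects are illustrative assumptions.
    """
    leak, reverse = combo
    oaf = 1.0 - damper if reverse else damper
    return max(oaf, 0.3) if leak else oaf


def fault_posterior(inputs, outputs, n_faults, model, var, prior_decay=0.1):
    """Posterior over all 0/1 fault combinations, Eqs. (3.4)-(3.6)."""
    log_post = {}
    for combo in itertools.product([0, 1], repeat=n_faults):
        # Assumed (unnormalized) prior: fewer concurrent faults are more likely.
        log_prior = sum(combo) * math.log(prior_decay)
        # IID assumption: the log-likelihood is a sum over samples.
        log_lik = sum(log_gaussian(o, model(i, combo), var)
                      for i, o in zip(inputs, outputs))
        log_post[combo] = log_prior + log_lik
    # Normalizing term of Eq. (3.5), computed with log-sum-exp for stability.
    top = max(log_post.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {c: math.exp(v - log_z) for c, v in log_post.items()}


def marginal_fault_probability(posterior, j):
    """Eq. (3.3): sum the posterior over all combinations in which fault j is present."""
    return sum(p for combo, p in posterior.items() if combo[j] == 1)


if __name__ == "__main__":
    dampers = [0.0, 0.2, 0.5, 0.8, 1.0]
    measured_oaf = [0.31, 0.29, 0.52, 0.79, 1.00]  # made-up readings
    post = fault_posterior(dampers, measured_oaf, 2, toy_mixing_box, var=0.05 ** 2)
    print(max(post, key=post.get), marginal_fault_probability(post, 0))
```

Because the prior is normalized together with the likelihood, only the relative prior weights matter in this sketch; any prior that ranks single faults above double faults could be substituted.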
One challenge with Eqs. (3.4) and (3.5) is that, for applications
with a large number of potential faults, there would be a very large
number of faulty scenarios to analyze (it can be on the order of a
thousand or more). For applications such as an air handling unit
in which the number of faults is limited and manageable, this is
not a concern. However, for more complex applications where
the number of potential faults/abnormalities is on the order of
hundreds, it would be computationally problematic. One solution
could be solving Eqs. (3.4) and (3.5) numerically by employing
numerical algorithms such as the Markov chain Monte Carlo
(MCMC) method. Another practical approach is to adopt more simplifications/assumptions to reduce the problem's complexity. For
instance, we may assume that concurrent faulty scenarios with
more than three simultaneous faults are negligible, as they have
a very small probability.8
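To give a feel for the size of the hypothesis space, the following Python sketch counts and enumerates fault combinations with and without a cap on the number of concurrent faults. The function names and the cap of three are illustrative assumptions; the example fault count of ten is hypothetical.

```python
import itertools
from math import comb


def scenario_count(n_faults, max_concurrent=None):
    """Number of fault combinations, optionally capped at max_concurrent faults."""
    if max_concurrent is None:
        return 2 ** n_faults
    return sum(comb(n_faults, k) for k in range(max_concurrent + 1))


def limited_fault_combinations(n_faults, max_concurrent=3):
    """Yield 0/1 fault vectors that contain at most max_concurrent faults."""
    for k in range(max_concurrent + 1):
        for active in itertools.combinations(range(n_faults), k):
            combo = [0] * n_faults
            for idx in active:
                combo[idx] = 1
            yield tuple(combo)


if __name__ == "__main__":
    # Ten hypothetical faults: full enumeration versus at most three at a time.
    print(scenario_count(10), scenario_count(10, 3))
```

The generator can be used in place of a full enumeration when evaluating Eq. (3.4); the posterior is then normalized over the retained combinations only.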
The probabilistic models in Eq. (3.6) can be developed in different ways. They could be an extension of analytical models with added uncertainties/errors, or more sophisticated statistical procedures can be employed to develop the models. For example, the characteristics of the output random variable can be thought of as a combination of a set of basis functions generated at the input, linearly combined with coefficients influenced by the system fault status. If the output random variable follows a Gaussian distribution (or, more generically, an exponential family distribution), the estimation of the linear coefficients can be straightforward. As some of the demonstrations in Section 4 employ these types of models, it would be helpful to briefly address the derivations of such models.
Let us assume that the system has a set of inputs $I = [I_1, I_2, I_3, \ldots, I_m]^T$ and an output, $y$, which we assume to follow a Gaussian distribution with $\mu$, $\sigma^2$ as the mean and variance. Also, assume that there is a set of basis functions $\{h_1, h_2, h_3, \ldots, h_n\}$ projecting the input vector $I$ to $x = [x_1, x_2, x_3, \ldots, x_n]^T$ so that we have:
\[ x_1 = h_1(I),\; x_2 = h_2(I),\; \ldots,\; x_n = h_n(I) \tag{3.7} \]
5 The assumption of deterministic inputs can be dropped for more general scenarios.
6 OAF is defined in Section 4.
7 Here we assume modeling the static behavior of the system.
8 Keep in mind that such a simplification/assumption would affect only the denominator of Eq. (3.4) [or the last part of Eq. (3.5)], which is the normalizing factor for correct estimation of the posterior probabilities. It will not affect the process of locating the fault combination with the maximum posterior probability. It would change only slightly the marginal probability of faults.
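Now let us assume that the mean of the output, $y$, is a linear combination of $x$9:
\[ \mu = \theta^T x \quad \text{where} \quad \theta = [\theta_1, \theta_2, \ldots, \theta_n]^T \tag{3.8} \]
As $y$ follows a Gaussian distribution, we have:
\[ P(y_i \mid \mu_i, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{1}{2\sigma^2} (y_i - \mu_i)^2 \right\} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{y_i^2}{2\sigma^2} \right\} \exp\left\{ \frac{\mu_i y_i - \mu_i^2/2}{\sigma^2} \right\} = h(y_i) \exp\left\{ \frac{\mu_i y_i - A(\mu_i)}{\sigma^2} \right\} \tag{3.9} \]
where $\mu_i = \theta^T x_i$, $h(y_i) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left\{ -\frac{y_i^2}{2\sigma^2} \right\}$, $A(\mu_i) = \frac{\mu_i^2}{2}$, $x_i = [x_{i,1}, x_{i,2}, \ldots, x_{i,n}]^T$, $\theta = [\theta_1, \theta_2, \ldots, \theta_n]^T$, $y_i$ or $x_i$ is the value of $y$ or $x$ at time/index $i$, and $x_{i,1}$ is the value of $x_1$ at time/index $i$. This is indeed a canonical representation of the Gaussian distribution.10 Now, for $N$ IID samples of $y$ (e.g. $y_1, y_2, y_3, \ldots, y_N$), the log-likelihood can be written as:
\[ l(\theta) = \log \prod_{i=1}^{N} h(y_i) \exp\left\{ \frac{\mu_i y_i - A(\mu_i)}{\sigma^2} \right\} = \sum_{i=1}^{N} \log h(y_i) + \frac{\theta^T}{\sigma^2} \sum_{i=1}^{N} x_i y_i - \frac{1}{\sigma^2} \sum_{i=1}^{N} A(\theta^T x_i) \tag{3.10} \]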
To see how we derived the above equation, note that because of the IID sampling assumption, we can write $P(y_1, y_2, \ldots, y_N)$, which is the likelihood of observing $y_1, y_2, \ldots, y_N$, as:
\[ P(y_1, y_2, \ldots, y_N) = P(y_1)\, P(y_2) \cdots P(y_N) = \prod_{i=1}^{N} P(y_i) \]
Then, we simply replace each $P(y_i)$ with its equivalent expression obtained from Eq. (3.9) and take the log of both sides.
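Now, if we calculate the first derivative of the log-likelihood function in Eq. (3.10), we obtain:
\[ \nabla_\theta l = \sum_{i=1}^{N} \frac{\partial l}{\partial \mu_i} \frac{\partial \mu_i}{\partial \theta} = \frac{1}{\sigma^2} \sum_{i=1}^{N} (y_i - \mu_i)\, x_i = \frac{1}{\sigma^2} X^T (Y - \mu) \tag{3.11} \]
where $X = [x_1, x_2, \ldots, x_N]^T$, $Y = [y_1, y_2, \ldots, y_N]^T$, and $\mu = [\mu_1, \mu_2, \ldots, \mu_N]^T$. The second derivative is:
\[ \nabla^2_\theta l = -\frac{1}{\sigma^2} \sum_{i=1}^{N} x_i x_i^T = -\frac{1}{\sigma^2} X^T X \tag{3.12} \]
which is a quadratic form with a negative sign (a negative semidefinite matrix), meaning that $l(\theta)$ is a concave function, and we can estimate the $\theta$ that maximizes $l(\theta)$, the log-likelihood function, by employing a convex optimization algorithm.11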
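The derivation in Eqs. (3.7)-(3.12) can be checked numerically with a short, dependency-free Python sketch. Because the log-likelihood is quadratic in the parameter vector, a single Newton step built from the gradient in Eq. (3.11) and the Hessian in Eq. (3.12) reaches the maximizer. The polynomial basis in `basis` and the test data are assumptions chosen for illustration; they are not the basis functions used in the paper.

```python
def basis(inp):
    """Hypothetical basis functions h_1..h_3 of Eq. (3.7): constant, linear, quadratic."""
    return [1.0, inp, inp * inp]


def gradient(theta, xs, ys, var):
    """Eq. (3.11): (1 / sigma^2) * sum_i (y_i - theta^T x_i) * x_i."""
    grad = [0.0] * len(theta)
    for x, y in zip(xs, ys):
        resid = y - sum(t * xj for t, xj in zip(theta, x))
        for j, xj in enumerate(x):
            grad[j] += resid * xj / var
    return grad


def neg_hessian(xs, var):
    """Negative of Eq. (3.12): (1 / sigma^2) * sum_i x_i x_i^T."""
    n = len(xs[0])
    return [[sum(x[r] * x[c] for x in xs) / var for c in range(n)]
            for r in range(n)]


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (m[r][n] - tail) / m[r][r]
    return z


def fit_theta(inputs, ys, var=1.0):
    """One Newton step from theta = 0, which is exact for a concave quadratic."""
    xs = [basis(i) for i in inputs]
    theta = [0.0] * len(xs[0])
    step = solve(neg_hessian(xs, var), gradient(theta, xs, ys, var))
    return [t + s for t, s in zip(theta, step)]


if __name__ == "__main__":
    inputs = [0.0, 0.25, 0.5, 0.75, 1.0]
    ys = [0.1 + 0.5 * d + 0.4 * d * d for d in inputs]  # noise-free toy data
    print(fit_theta(inputs, ys))
```

For the Gaussian case this step coincides with the ordinary least-squares solution; for other exponential family members the same gradient and Hessian structure would be iterated rather than applied once.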
9 For simplicity, we assume here that only $\mu$ is a function of $\theta$. However, in a more general case, both $\mu$ and $\sigma^2$ can be assumed to be a function of $\theta$.
10 More about the exponential families and canonical representations can be found in Jordan et al. [23,22,31].
11 This derivation is indeed a special case for generalized linear models (GLM). Further details about generalized linear models can be found in Jordan et al. [23] and Dobson and Barnett [13].
Graphically, the proposed algorithms in Eqs. (3.4) and (3.5) can be represented as a Bayesian network model with three nodes (Fig. 2, Graph A): the input node, which represents the system inputs; the output node, representing the system outputs; and the fault node, representing system faults. The input node is assumed to be known and deterministic. The output node is a random variable conditionally dependent on the input node and the fault node. The projection from the input node to the output node, which is influenced by the fault node, is indeed a simplified model of the system. With this framework, the diagnostic process is defined as the estimation of the posterior distribution of the fault and the determination of the fault combination/status maximizing the posterior probability.
Fig. 2. Bayesian network-based diagnostic model.
Another issue to consider is the distinction between what we call abrupt faults and degradation faults. Abrupt faults are faults that arise instantaneously (e.g. stuck-damper fault, reverse-actuator fault, etc.), while degradation faults evolve over time, becoming progressively more severe (e.g. damper leakage, valve leakage, etc.). From a modeling and diagnostics standpoint, abrupt faults can be thought of as binary events [the fault either exists (1) or not (0)], while degradation faults are more like events with an associated severity level parameter. This means that, in the case of degradation faults, another parameter (fault stage) is required to include the characteristics of the fault and its effects. Thus, we enhance the diagnostic algorithm in Fig. 2 (Graph A) and Eq. (3.4) by adding another node (stage node) to capture such dynamics (Fig. 2, Graph B). If a fault is a degradation fault, the stage node specifies its severity level, and the system output is influenced by both the fault and its severity level. If the fault is an abrupt fault, the severity will become just an identity matrix. Then, Eq. (3.4) can be modified as:
\[ P(F \mid E) = \sum_{L} P(F, L \mid E) = \frac{\sum_{L} P(F)\, P(L \mid F)\, P(E \mid F, L)}{\sum_{F} \sum_{L} P(F)\, P(L \mid F)\, P(E \mid F, L)} \tag{3.13} \]
where $L = \{l_1, l_2, l_3, \ldots, l_n\}$ is the set of severity levels, $P(F)$ is the fault prior, and $P(L \mid F)$ is the severity prior for each fault. Again, statistical or intuitive approaches can be employed for the estimation of severity priors. Here, we consider a uniform distribution for severity priors. $P(E \mid F, L)$ is, again, the likelihood function: the probability of observing $E$ given the fault condition $F$ and severity level $L$.12
12 To avoid complexity and present the equations in a clearer structure, we use the Fig. 2 (Graph A) model as our reference for further derivations. However, all the equations can be systematically expanded to the Fig. 2 (Graph B) format using Eq. (3.13).
The last issue to discuss is how the proposed algorithm can be extended systematically to address measurement constraints. As mentioned earlier, one of the challenges in air handling unit diagnostics is the absence of reliable measurements for MAT, which forces us to rely on SAT to evaluate the functionality of both the mixing box and heating/cooling coils.
For such scenarios, the proposed diagnostic algorithm (Bayesian model) can be expanded to a model with a mixture of components in which missing measurements are considered to be hidden variables (nodes) and the posterior probability is estimated by summing over all possible values of the hidden nodes.
An example of such a model is shown in Fig. 3. In the mixture model, each node itself is a Bayesian diagnostic model of the related component. In Fig. 3, the first node can be related to a mixing box and the second one to a heating coil. The interaction between nodes is defined based on the system architecture. A component node input may contain all or part of the adjacent component node outputs, and similarly, its output may construct all or part of the next component inputs. The input nodes are not necessarily deterministic anymore, and the output nodes may not be fully observed. When there is a measurement constraint, the associated node is considered to be a hidden variable, and the calculation of posterior probabilities is achieved by summing over all possible values of the hidden node. In Fig. 3, if the output of the first component, which is also the input of the second component, is assumed to be hidden (not measurable), the posterior probability of the fault nodes can be estimated by:
\[ P(F_1, F_2 \mid I_1, O_2) = \frac{\sum_{I_2 = O_1} P(F_1)\, P(F_2)\, P(O_1 \mid F_1, I_1)\, P(O_2 \mid F_2, I_2)}{\sum_{F_1} \sum_{F_2} \sum_{I_2 = O_1} P(F_1)\, P(F_2)\, P(O_1 \mid F_1, I_1)\, P(O_2 \mid F_2, I_2)} \tag{3.14} \]
The elements in Eq. (3.14) are either component fault priors (e.g. $P(F_i)$) or component likelihood functions (e.g. $P(O_i \mid F_i, I_i)$). For each measurement, the estimation of the posterior probability of each fault combination involves considering all possible values of the hidden (unknown) variable(s) and computing the posterior probability in each case. This has a higher computational complexity compared to the scenarios where there is no hidden variable, especially if there are two, three, or more hidden variables. Also, note that, in the case of a mixture of components, diagnostics are performed with less information available (some variables are not measured and are considered hidden). As we will see in the illustrative examples in Section 4, this does not come for free. There will be some penalty/cost for the diagnostic process that may affect diagnostic efficiency.
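The summation over a hidden node in Eq. (3.14) can be sketched in Python for a two-component chain in which the mixed air temperature is unobserved. Everything specific here is an assumption made for illustration: the component models `mixing_box` and `heating_coil`, the fixed 10 °F coil temperature rise, the unit noise variances, the independent fault prior `p_fault`, and the discretization of the hidden temperature on a grid.

```python
import itertools
import math


def gauss(y, mean, var):
    """Gaussian density with the given mean and variance."""
    return math.exp(-(y - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def mixing_box(i1, f1):
    """Hypothetical component 1: expected MAT from (OAT, RAT, damper fraction).

    A leak fault (f1 = 1) is assumed to floor the outside air fraction at 0.3.
    """
    oat, rat, damper = i1
    oaf = max(damper, 0.3) if f1 else damper
    return rat + oaf * (oat - rat)


def heating_coil(mat, f2):
    """Hypothetical component 2: expected SAT; a stuck-closed valve adds no heat."""
    return mat + (0.0 if f2 else 10.0)


def posterior_with_hidden_mat(samples, grid, var1=1.0, var2=1.0, p_fault=0.1):
    """Posterior over (F1, F2) with the hidden O1 = I2 summed out on a grid."""
    log_post = {}
    for f1, f2 in itertools.product([0, 1], repeat=2):
        prior = (p_fault if f1 else 1 - p_fault) * (p_fault if f2 else 1 - p_fault)
        total = math.log(prior)
        for i1, o2 in samples:
            # Inner sum of Eq. (3.14): all candidate values of the hidden MAT.
            s = sum(gauss(o1, mixing_box(i1, f1), var1) *
                    gauss(o2, heating_coil(o1, f2), var2) for o1 in grid)
            total += math.log(max(s, 1e-300))  # guard against log(0)
        log_post[(f1, f2)] = total
    top = max(log_post.values())
    log_z = top + math.log(sum(math.exp(v - top) for v in log_post.values()))
    return {k: math.exp(v - log_z) for k, v in log_post.items()}


if __name__ == "__main__":
    # Made-up operating points: ((OAT, RAT, damper fraction), measured SAT).
    samples = [((40.0, 70.0, 0.0), 71.0), ((40.0, 70.0, 0.2), 71.0),
               ((40.0, 70.0, 0.5), 65.0), ((40.0, 70.0, 0.8), 56.0)]
    grid = [30.0 + 0.5 * k for k in range(121)]  # candidate MAT values, 30-90 F
    print(posterior_with_hidden_mat(samples, grid))
```

The grid spacing contributes the same constant factor to every fault combination, so it cancels in the normalization; the cost of the inner sum grows with the grid size and with the number of hidden variables, which is the added computational burden noted above.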
4. Illustrative examples
In this section, we apply the proposed algorithm to air handling unit diagnostics. We first start with single component diagnostics to demonstrate the capability of the proposed diagnostic algorithm in detecting/isolating faults, handling concurrent faults, and dealing with modeling uncertainty/error. Then, we will consider mixture component scenarios to show how the algorithm can systematically manage measurement constraint issues.
Example 1 – Diagnosis of mixing box: As mentioned earlier, a mixing box mixes the air returning from the building with the outside
air based on the ratio defined by the control system (Fig. 1). The ratio is specified to minimize the energy required to heat up or cool
down the supply air and to satisfy the standard fresh air required
for occupants. A mixing box consists of three dampers: the outside
air damper, return air damper, and exhaust air damper (Fig. 1),
manipulated by one actuator in which the outside air and return
dampers operate in opposite directions. Mixing box malfunction
is a common fault in air handling unit operation. The malfunction
could be due to the leakage of the outside or return air dampers
(not fully closed in the 0% closing command position), the reverse
operation of the actuator, stuck damper(s), and so on.
Mixing box performance is usually analyzed using a dimensionless
parameter [44], the outside air fraction (OAF), which is the ratio of the
difference between the mixed air temperature (MAT) and the return air temperature (RAT) to the difference between the outside
air temperature (OAT) and the return air temperature (RAT):

$$\mathrm{OAF} = \frac{\mathrm{MAT} - \mathrm{RAT}}{\mathrm{OAT} - \mathrm{RAT}} \tag{4.1}$$
Fig. 3. Diagnostic model containing a mixture of components.
Fig. 4. Mixing box model (OAF variations versus OA damper) in non-faulty and various faulty conditions; Graph A is the operation under the non-faulty condition. Graph B is
related to the reverse-actuator fault condition, Graph C is an outside-air-damper-leakage fault, and Graph D is a return-air-damper-leakage fault.
The OAF is an indication of the influence of the outside air
temperature on the mixed air temperature. It is ideally one
when the outside air damper is fully open (meaning that the
mixed air temperature is completely affected by the outside air
temperature variations) and zero when the damper is closed.
The variations of OAF versus damper position in non-faulty
and several faulty conditions are shown in Fig. 4. In each graph,
the envelope shows the possible values for OAF given the damper position. As shown, each damper position maps to a
range of possible values for OAF. The range is wider for middle
positions and narrower at the corners. This range, which is the
uncertainty range or modeling error, is due to variations of
parameters such as fluid resistance, thermal resistance, and so
on, that are not easily measurable. In fact, Fig. 4 shows simplified models of the mixing box under various faulty and non-faulty
conditions.
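As a minimal sketch of Eq. (4.1), the OAF can be computed directly from the three temperatures. The function name and the `min_delta` guard (which skips samples where OAT and RAT are too close for the ratio to be meaningful) are illustrative assumptions, not part of the original method.

```python
def outside_air_fraction(mat, rat, oat, min_delta=1.0):
    """Return OAF = (MAT - RAT) / (OAT - RAT).

    Returns None when |OAT - RAT| is below min_delta (a hypothetical
    threshold, in the same units as the temperatures), because the
    ratio becomes numerically unreliable there.
    """
    delta = oat - rat
    if abs(delta) < min_delta:
        return None
    return (mat - rat) / delta


# MAT halfway between RAT and OAT corresponds to an OAF of one half.
print(outside_air_fraction(mat=62.0, rat=72.0, oat=52.0))
```

An OAF near one at a fully open outside air damper and near zero at a closed damper is the non-faulty behavior described above; the guard simply reflects that the ratio carries little information when the outside and return air temperatures nearly coincide.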
The dynamics shown in Fig. 4 can be simulated by a random
variable, $N(\mu, \sigma)$, with a Gaussian distribution in which $\mu$ and $\sigma^2$
are defined as a linear combination of a set of basis functions:

$$\mu = a_1 f_1 + a_2 f_2 + a_3 f_3$$
$$\sigma^2 = b_1 f_1 + b_2 f_2 + b_3 f_3$$
$$f_1 = 1, \quad f_2 = \frac{\mathrm{DMP}}{100}, \quad f_3 = \left(\frac{\mathrm{DMP}}{100}\right)^2$$

where DMP is the outside air damper position. For example, a set of
possible values for $a_1, a_2, a_3$ and $b_1, b_2, b_3$ in non-faulty operation
(Graph A in Fig. 4) could be:

$$a_1 = 0, \quad a_2 = 1, \quad a_3 = 0$$
$$b_1 = 0, \quad b_2 = 1, \quad b_3 = -1$$
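The basis-function model can be sketched in a few lines. The example below evaluates the mean and variance of OAF for a given damper position and the Gaussian log-likelihood of an observed OAF. It assumes b3 = -1, so that the variance vanishes at 0% and 100% and peaks mid-range, in line with the envelope being widest at middle positions; the function names and the `var_floor` guard are hypothetical.

```python
import math


def oaf_model(dmp, a=(0.0, 1.0, 0.0), b=(0.0, 1.0, -1.0)):
    """Return (mu, var) of OAF at damper position dmp (0-100 %)."""
    x = dmp / 100.0
    basis = (1.0, x, x * x)                      # f1, f2, f3
    mu = sum(ai * fi for ai, fi in zip(a, basis))
    var = sum(bi * fi for bi, fi in zip(b, basis))
    return mu, var


def log_likelihood(oaf, dmp, a=(0.0, 1.0, 0.0), b=(0.0, 1.0, -1.0), var_floor=1e-4):
    """Gaussian log-density of an observed OAF under the model.

    var_floor is an assumed lower bound that keeps the density finite
    at the end positions, where the modeled variance is zero.
    """
    mu, var = oaf_model(dmp, a, b)
    var = max(var, var_floor)
    return -0.5 * (math.log(2.0 * math.pi * var) + (oaf - mu) ** 2 / var)


print(oaf_model(50.0))   # mean and variance at the 50 % damper position
```

A fault hypothesis would use its own coefficient sets a and b (for example, a reversed slope for the reverse-actuator case), and the diagnostic compares the resulting likelihoods.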
Fig. 5. Mixing box operation and diagnostic results. The upper graph shows the variations of OAT, RAT, MAT, and outside air damper position (DMP). SAT is used as a proxy for
MAT after correcting for the fan temperature rise. The lower graph shows the diagnostic assessment. As shown, the mixing box has a return-air-damper-leakage fault.

For each operating condition (various faulty and non-faulty conditions), a and b can be estimated by maximizing the likelihood
function of $N(\mu, \sigma)$ given IID samples. As shown in Eqs.
(3.9)–(3.12), the likelihood function of such a model is
a concave function, which means that a convex optimization
algorithm can be employed to locate the global maximum and estimate a and b numerically.
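To make the estimation step concrete, the sketch below fits the coefficient vectors a and b from (DMP, OAF) samples. It is a simplified stand-in rather than the likelihood maximization described above: the mean coefficients are fitted by ordinary least squares, and the variance coefficients by regressing the squared residuals on the same basis functions. All function names and the synthetic data are assumptions for illustration.

```python
import random


def basis(dmp):
    """Basis functions f1, f2, f3 for a damper position in percent."""
    x = dmp / 100.0
    return [1.0, x, x * x]


def solve_linear(A, y):
    """Solve the square system A z = y by Gaussian elimination with partial pivoting."""
    n = len(y)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (M[i][n] - tail) / M[i][i]
    return z


def least_squares(rows, targets):
    """Least-squares coefficients via the normal equations."""
    n = len(rows[0])
    AtA = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    Aty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(n)]
    return solve_linear(AtA, Aty)


def fit_mean_and_variance(dmp_samples, oaf_samples):
    """Two-stage fit: a from the samples, b from the squared residuals."""
    rows = [basis(d) for d in dmp_samples]
    a = least_squares(rows, oaf_samples)
    residual_sq = [(y - sum(ai * fi for ai, fi in zip(a, r))) ** 2
                   for r, y in zip(rows, oaf_samples)]
    b = least_squares(rows, residual_sq)
    return a, b


# Synthetic non-faulty data: OAF follows DMP/100 with small noise.
rng = random.Random(0)
dmp_samples = [rng.uniform(0.0, 100.0) for _ in range(500)]
oaf_samples = [d / 100.0 + rng.gauss(0.0, 0.05) for d in dmp_samples]
a_hat, b_hat = fit_mean_and_variance(dmp_samples, oaf_samples)
print("a:", a_hat)
print("b:", b_hat)
```

The two-stage fit is chosen here only because it needs no optimization library; a full maximum-likelihood fit would estimate a and b jointly.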
Applying the diagnostic model shown in Fig. 2A for mixing box
diagnostics, the input node contains the damper position and return and outside air temperatures, and the output node contains
OAF. Data from air handling unit operation at an experimental
facility, the Iowa Energy Center,13 was obtained to evaluate the performance of the diagnostic algorithm. During the test, faults were
deliberately induced to the system for analysis purposes.
An example of the mixing box operation, which contains the
variations of OAT, RAT, and MAT, as well as outside air damper position (DMP), is shown in Fig. 5 (upper graph).14 Note that, due to
poor mixing in most mixing boxes, SAT has been used as a proxy
for MAT after correcting for the fan temperature rise.15 Also, keep
in mind that the outside and return air dampers operate in opposite
directions and are controlled by one actuator.
A visual inspection of Fig. 5 (upper graph) indicates an
abnormality in the mixing box operation. When the DMP is fully
closed, MAT is very close to RAT (as expected); however, when
the DMP is fully open, there is a gap between MAT and OAT, indicating that MAT is not solely influenced by OAT but by RAT as well.
This suggests that the return air damper is not fully closed and has
a leakage fault.

13 The Iowa Energy Center is an experimental facility for research, demonstration,
and education in building HVAC systems, energy efficiency, and conservation (http://www.energy.iastate.edu/).

14 The air handling unit has a design air flow of 3200 CFM with a draw-through
supply fan. Both the supply fan and the return fan motors are in stream. The design
supply fan pressure rise is 3.25 in. of water, and the return fan pressure rise is 1.7
in. of water.

15 The fan temperature rise can be calculated by equating the increase in the
sensible heat content of the air stream to the sum of the fluid work done by the fan
and the heat produced by the inefficiency of the fan and other associated
components: $\Delta T = \Delta P / (\delta \, C_p \, \eta)$, where $\Delta P$ is the total pressure rise across the fan, $\delta$ is the
density of air, $C_p$ is the specific heat of air, and $\eta$ is the combined efficiency of the fan
components in the air stream (typically the fan, belt, and motor). $\Delta P$ can be obtained
from the design pressure rise [usually available from test and balance (TAB)
measurements], and the efficiency of the fan + motor is available from manufacturers'
literature, mechanical drawings, etc. In general, the fan temperature rise is about one
degree or so. For instance, in this example, $\Delta P$ is 3.2 in. of water, and the efficiency is
0.76, which leads to a fan temperature increase of around 1.5 °F. More can be found
in Haves [17].
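The fan temperature-rise correction described in footnote 15 (pressure rise divided by the product of air density, specific heat, and combined efficiency) can be checked numerically. The sketch below assumes standard air properties (density 1.2 kg/m³, specific heat 1006 J/(kg K)) and an approximate unit conversion for inches of water; these values and the function name are assumptions, not figures from the paper.

```python
IN_WATER_TO_PA = 249.09  # approximate pascals per inch of water column


def fan_temperature_rise_f(dp_in_water, efficiency, air_density=1.2, cp_air=1006.0):
    """Fan temperature rise in degrees Fahrenheit.

    dp_in_water: total pressure rise across the fan, inches of water
    efficiency:  combined fan/belt/motor efficiency (0-1)
    air_density: kg/m^3 (assumed standard air)
    cp_air:      J/(kg K) (assumed standard air)
    """
    dp_pa = dp_in_water * IN_WATER_TO_PA
    dt_kelvin = dp_pa / (air_density * cp_air * efficiency)
    return dt_kelvin * 9.0 / 5.0   # a temperature difference, so no 32-degree offset


# Values quoted in footnote 15: 3.2 in. of water and an efficiency of 0.76.
print(round(fan_temperature_rise_f(3.2, 0.76), 2))
```

With these assumed air properties the result lands close to the roughly 1.5 °F quoted in the footnote; subtracting it from SAT gives the MAT proxy used in Fig. 5.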
The diagnostic result is shown in Fig. 5 (lower graph). As predicted, the system has a return-air-damper-leakage fault.16 Initially, the diagnostic mechanism makes a vague assessment
regarding most faults. Earlier assessments are influenced significantly by the fault prior assessments. As more data is observed,
the diagnostic belief moves away from prior belief and gets closer
to the true belief (empirical distribution). Another interesting point
is how the diagnostic mechanism finalizes its assessment. As we
know, return-air-damper-leakage faults cannot be confidently confirmed unless the mixing box operates at (or near) the 100% outside
air damper position (which would be a 0% return air damper position). As shown in the figure, the mechanism does not finalize its
assessment regarding the return-air-damper-leakage fault until it
observes the system performance at around the 100% outside air
damper position.
The probability graphs shown in Fig. 5 are marginal probabilities calculated by Eq. (3.3). At each sampling time, the posterior
probability of each fault combination is first computed based on
the current and previous observations, P(f1 . . . fn|e1 . . . ei). With the
assumption of IID sampling, the computation can be achieved
using Eq. (3.5). After the computation of posterior probabilities,
the marginal probability of each fault is computed using Eq. (3.3).
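The two steps described above (a posterior over every fault combination from accumulated log-likelihoods, followed by marginalization per fault) can be sketched as follows. The independent fault priors, the fault names, and the toy Gaussian likelihood are illustrative assumptions; only the overall structure follows the text.

```python
import itertools
import math


def posterior_over_fault_combinations(fault_priors, observations, log_likelihood):
    """Posterior of each fault combination given IID observations.

    log posterior = log prior + sum of per-observation log-likelihoods,
    normalized over all 2^n combinations.
    """
    names = list(fault_priors)
    combos = list(itertools.product([False, True], repeat=len(names)))
    log_post = []
    for combo in combos:
        lp = 0.0
        for name, present in zip(names, combo):
            p = fault_priors[name]          # assumed independent priors
            lp += math.log(p if present else 1.0 - p)
        faults = dict(zip(names, combo))
        for obs in observations:
            lp += log_likelihood(obs, faults)
        log_post.append(lp)
    peak = max(log_post)                    # subtract the peak for numerical stability
    weights = [math.exp(lp - peak) for lp in log_post]
    total = sum(weights)
    return names, combos, [w / total for w in weights]


def marginal_fault_probabilities(names, combos, posterior):
    """Sum the posterior over all combinations in which each fault is present."""
    return {name: sum(p for combo, p in zip(combos, posterior) if combo[i])
            for i, name in enumerate(names)}


def toy_log_likelihood(obs, faults, sigma=0.05):
    """Hypothetical OAF model: a leak caps OAF at 0.9, a reversed actuator flips DMP."""
    dmp, oaf = obs
    x = dmp / 100.0
    if faults["reverse_actuator"]:
        x = 1.0 - x
    mu = 0.9 * x if faults["return_damper_leak"] else x
    return -0.5 * ((oaf - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2.0 * math.pi))


priors = {"return_damper_leak": 0.1, "reverse_actuator": 0.1}
observations = [(0, 0.0), (50, 0.45), (100, 0.9), (100, 0.9), (100, 0.9)]
names, combos, post = posterior_over_fault_combinations(priors, observations, toy_log_likelihood)
print(marginal_fault_probabilities(names, combos, post))
```

In this toy setup the leakage hypothesis only pulls ahead once samples near the 100% damper position arrive, which mirrors the behavior described for Fig. 5.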
Example 2 – Concurrent faults: In this example, we consider a
concurrent fault scenario in the mixing box. The mixing box characteristics are the same as in the previous example, and its operation is shown in Fig. 6 (upper graph). It may not be as easy as in the
previous example to detect faults by visual inspection. However,
the graph indicates that, as the DMP closes, MAT gets closer to
OAT instead of RAT, which suggests a reverse-actuator fault. In
addition, when the outside air damper is completely closed, there
is still a gap between MAT and OAT, which is a red flag for a leakage
fault. The diagnostic results shown in Fig. 6 (lower graph) confirm
the existence of reverse-actuator and return-air-damper-leakage
faults.17

16 In fact, a 10% return-air-damper-leakage fault was intentionally induced in the
system.

Fig. 6. Mixing box operation and diagnostic results. The upper graph shows the mixing box operation. The lower graph shows the diagnostic analysis. In this example, the
mixing box has two concurrent faults: reverse-actuator and return-air-damper-leakage faults.
Keep in mind that, initially, the diagnostic algorithm starts with
a predefined prior for each fault. As more data is observed, the
diagnostic belief moves away from the prior belief (prior fault distribution) to the true or empirical belief (distribution). This is why,
in Fig. 6 (lower graph), at the beginning, there is significant probability for the existence of both outside-air-damper-leakage and
return-air-damper-leakage faults. The graph starts with the diagnostic assessment of the first observation, which is a measurement
at 100% DMP. In a non-faulty operation when DMP is fully open,
MAT is expected to be close to OAT. However, as shown in the
graph, MAT is close to RAT instead. This indicates the existence
of fault(s). However, the algorithm does not have enough data
yet to identify the type of fault(s). Therefore, it assigns a very
low probability to the no-fault scenario (indicating that an abnormality exists) but keeps the probability of other faults at a significant level. As it observes the system performance at other
operating points, it gets a better perspective regarding the fault
sources.
Example 3 – Diagnostics of the heating coil: As mentioned, the
heating (cooling) system has the task of heating (cooling) the supply air. It is usually a finned-tube heat exchanger with hot (cold)
water on the hot (cold) side and air on the cold (hot) side. It normally contains one or a few sets of tubes mounted perpendicular
to the flow of air passing through the coil. The heat transfer rate is
controlled by manipulating the valve adjusting the water flow rate
through the coil. Common operational faults include fouling of the
coil, leakage of the valve, reverse action of the valve actuator, stuck
valves, and so on.
17 A reverse-actuator fault and a return-air-damper-leakage fault of around ten percent
were induced in the system.

Fig. 7. Heating coil diagnostic algorithm.

In this example, we evaluate the heating coil performance
employing the proposed algorithm. The diagnostic model is shown
in Fig. 7. The inputs are Tair-in (temperature of incoming air), VLV
(valve position), Tw-in (temperature of incoming water), and CFM
(cubic feet per minute, a measurement of air volume flow rate),
and the output is Tair-out (temperature of outgoing air). The simplified heat-exchanger model used is based on Holmes' effectiveness-NTU method [18,21], briefly described as (Fig. 7):
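$$T_{\text{air-out}} = \varepsilon \, (T_{\text{w-in}} - T_{\text{air-in}}) + T_{\text{air-in}} \tag{4.2}$$

$$\varepsilon = \frac{1 - \exp(-\mathrm{NTU}\,(1 - c))}{1 - c \exp(-\mathrm{NTU}\,(1 - c))} \tag{4.3}$$

$$c = \frac{C_{\min}}{C_{\max}}, \quad C_{\min} = \min(C_h, C_c), \quad C_{\max} = \max(C_h, C_c) \tag{4.4}$$

$C_h$ is the hot fluid capacity rate; $C_c$ is the cold fluid capacity rate.

$$\mathrm{NTU} = \frac{UA}{C_{\min}}, \quad UA = \frac{1}{r_t}, \quad r_t = r_a v_a^{-0.8} + r_m + r_w v_w^{-0.8} \tag{4.5}$$

where $v_a$ is the velocity of the air and $v_w$ is the velocity of the water.
In his paper, Holmes provided the suggested values for $r_a$, $r_m$, and $r_w$.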
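A compact sketch of the coil model may help here. It assumes the counterflow form of the effectiveness relation and a total resistance made of an air-side term, a metal term, and a water-side term with velocity exponents of -0.8. The resistance parameters in the usage example are placeholders, not Holmes' suggested values, and the function name is hypothetical.

```python
import math


def coil_outlet_air_temp(t_air_in, t_w_in, c_air, c_water,
                         v_air, v_water, r_a, r_m, r_w):
    """Outlet air temperature from the effectiveness-NTU relations.

    c_air, c_water: capacity rates of the air and water streams
    v_air, v_water: air and water velocities (must be positive)
    r_a, r_m, r_w:  air-side, metal, and water-side resistance parameters
    """
    r_t = r_a * v_air ** -0.8 + r_m + r_w * v_water ** -0.8   # total resistance
    ua = 1.0 / r_t
    c_min, c_max = min(c_air, c_water), max(c_air, c_water)
    c = c_min / c_max
    ntu = ua / c_min
    if abs(1.0 - c) < 1e-9:
        eff = ntu / (1.0 + ntu)              # limiting form when c approaches 1
    else:
        e = math.exp(-ntu * (1.0 - c))
        eff = (1.0 - e) / (1.0 - c * e)
    # Applied directly to the air stream, as in Eq. (4.2); this is exact
    # when the air side has the smaller capacity rate.
    return eff * (t_w_in - t_air_in) + t_air_in


# Placeholder parameters chosen only to exercise the function.
print(coil_outlet_air_temp(t_air_in=50.0, t_w_in=180.0, c_air=1000.0, c_water=4000.0,
                           v_air=2.5, v_water=1.0, r_a=0.001, r_m=0.0002, r_w=0.0005))
```

In a diagnostic setting, each fault hypothesis (for example, a leaking valve) would change the water velocity or resistance fed into this model, and the predicted outlet temperature would be compared with the measured one.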
Similar to the mixing box case, here we have used a simplified model of the heat exchanger, requiring parameters that can be obtained
from drawing specs. Indeed, the model presented by Holmes is a
fairly simple steady-state model for the simulation of heating and
cooling coils.

Fig. 8. Heating coil operation and diagnostic results. The upper graph shows the variations of entering air temperature (T_air_in), entering water temperature (T_water_in),
outgoing air temperature (T_air_out), and valve position (Valve). The lower graph shows the diagnostic results. As shown, the coil has a valve-leakage fault.
The operation of the heating coil is shown in Fig. 8 (upper graph),
and the diagnostic result is shown in Fig. 8 (lower graph). As
shown, there is a valve-leakage fault; however, the diagnostic
mechanism does not finalize its assessment of this fault until
it sees the system performance at around the 0% valve position.18
Example 4 – Mixture of components: As the last example, we consider a simulated case of a mixture of components, an air handling
unit with a mixing box and a heating coil (Fig. 1), to show the effectiveness of the algorithm in dealing with measurement constraints.
Unlike previous examples in which SAT was influenced by the
functionality of only one component, in this example, SAT is influenced by the functionality of both the mixing box and the heating
coil. Following the mixture model framework shown in Fig. 3, the
diagnostic model of the air-handling unit is shown in Fig. 9. The
first component is related to the mixing box, and the second component is related to the heating coil. MAT, which is the output of
the first component and part of the inputs to the second component, is not measurable and is considered hidden (unknown).
The operation of each component is shown in Fig. 10 (upper
graph). As MAT is not available and the coils are not off, it is not
easy to analyze the components' performance visually. The diagnostic results are shown in Fig. 10 (lower graph). The mixing box
has an outside-air-damper-leakage fault, while the heating coil is
non-faulty. Again, the probabilities shown in Fig. 10 are marginal
probabilities calculated using Eq. (3.3) from the computed posterior probability for each fault combination. In this mixture model,
MAT is missing (hidden), and the posterior probability is calculated
by summing over all possible values for MAT [the summation
in the numerator of Eq. (3.14) is over MAT].
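The summation over the hidden MAT can be illustrated with a small numerical sketch. It assumes a Gaussian mixing box output (mean and standard deviation supplied by the mixing box model under a hypothesized fault state) and a toy coil that adds a fixed temperature rise with Gaussian uncertainty; the grid resolution, the additive coil, and all names are assumptions made for illustration.

```python
import math


def gaussian_pdf(x, mu, sigma):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def sat_likelihood(sat, mat_mean, mat_sigma, coil_rise, sat_sigma,
                   grid_step=0.25, span=4.0):
    """Likelihood of a measured SAT with MAT treated as hidden.

    Approximates sum over MAT of P(MAT | mixing box) * P(SAT | MAT, coil)
    on a discrete MAT grid covering +/- span standard deviations.
    """
    n = int(round(2.0 * span * mat_sigma / grid_step))
    total = 0.0
    for k in range(n + 1):
        mat = mat_mean - span * mat_sigma + k * grid_step
        p_mat = gaussian_pdf(mat, mat_mean, mat_sigma) * grid_step   # mass of this MAT cell
        p_sat = gaussian_pdf(sat, mat + coil_rise, sat_sigma)        # toy coil model
        total += p_mat * p_sat
    return total


# Same SAT reading evaluated under two hypothetical mixing box states.
print(sat_likelihood(sat=70.0, mat_mean=60.0, mat_sigma=2.0, coil_rise=10.0, sat_sigma=1.0))
print(sat_likelihood(sat=70.0, mat_mean=64.0, mat_sigma=2.0, coil_rise=10.0, sat_sigma=1.0))
```

Because the two component uncertainties combine in the sum, the resulting likelihood is broader than either component's own, which is one way to see why more data is needed when MAT is not measured.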
18 A leakage fault of five to ten percent was deliberately induced in the system. The
coil has 12 circuits, the tube inside diameter is 0.5 in., the face height is 1.93 ft, and
the face width is 3.33 ft. Also, the valve authority is 50%, and the expected thermal
transfer at low flow for the valve-coil system at normal operation is shown in Table 1.

Fig. 9. Mixture of components diagnostic model for the air handling unit. The first
component is related to the mixing box, and the second component is related to the
heating coil. MAT, which is the output of the first component and part of the inputs
to the second component, is not measurable and is considered hidden (unknown).
In Fig. 10, the assessment of the outside-air-damper-leakage
fault improves significantly after the time step of 50. In the graphs
for the component operations (upper graphs), the DMP and VLV are
both at 0% opening positions during this period. This is an ideal
operating condition for the diagnostic algorithm to check for leakage faults, and the diagnostic mechanism makes the best use of
it. In addition, some faults, such as the reverse operation or stuck
fault of either the damper or the valve, are assessed more readily than leakage faults. Reverse or stuck types of faults, which are
also categorized as abrupt faults, can be assessed based on the relative changes of the system behavior. For example, when the heating valve opens up, we expect an increase in the supply air
temperature; if the opposite is observed, we would quickly red-flag
a reverse fault. That is why these types of faults are assessed readily. On the other hand, the assessment of leakage faults requires
observation of the system performance at certain operating points
and quantitative analysis against the predicted performance.
Therefore, they would not be assessed as quickly as abrupt faults.
Comparing the diagnostic results in Figs. 5, 6, 8 and 10, in the
case of Fig. 10, the diagnostic algorithm takes more time to reach
a solid conclusion regarding the system health status. This is the
penalty paid for the absence of MAT. As the algorithm relies on SAT
to interpret the functionality of both the mixing box and the heating coil, the uncertainty associated with each component model is
aggregated into SAT readings. In this case, the algorithm requires
more data, which basically means more time, to make a solid
assessment. In other scenarios, the constraint may affect the confidence level of the diagnostic assessment.

Fig. 10. Air handling unit operation and diagnostic results. The upper graphs show the operation of the mixing box and the heating coil and the lower graphs show the
diagnostic results.

Table 1
Expected thermal transfer at low flow for the valve-coil system.

Valve % open    Thermal transfer % (Q/Qmax)
 3               7.1
 6              13.2
 9              18.8
12              24
15              28.9
18              33.4
21              37.7
24              41.7
27              45.5
30              49.1
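As a small illustration of how Table 1 might be used, the sketch below looks up the expected thermal transfer for a given valve opening by linear interpolation between tabulated points. Linear interpolation and the function name are assumptions; the paper only tabulates the values.

```python
VALVE_OPEN_PCT = [3, 6, 9, 12, 15, 18, 21, 24, 27, 30]
THERMAL_TRANSFER_PCT = [7.1, 13.2, 18.8, 24.0, 28.9, 33.4, 37.7, 41.7, 45.5, 49.1]


def thermal_transfer_pct(valve_pct):
    """Expected thermal transfer (% of Qmax) at a valve opening within 3-30 %."""
    if not VALVE_OPEN_PCT[0] <= valve_pct <= VALVE_OPEN_PCT[-1]:
        raise ValueError("valve opening outside the tabulated 3-30 % range")
    for i in range(1, len(VALVE_OPEN_PCT)):
        x0, x1 = VALVE_OPEN_PCT[i - 1], VALVE_OPEN_PCT[i]
        if valve_pct <= x1:
            y0, y1 = THERMAL_TRANSFER_PCT[i - 1], THERMAL_TRANSFER_PCT[i]
            return y0 + (y1 - y0) * (valve_pct - x0) / (x1 - x0)


# Thermal transfer expected for openings comparable to a 5-10 % leakage.
for opening in (5, 10):
    print(opening, round(thermal_transfer_pct(opening), 1))
```

The table shows that even a small effective opening passes a disproportionate share of the maximum heat transfer, which is why a leaking valve is visible in the outlet air temperature near the 0% valve command.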
The last issue to discuss is sensor error. Normally, in air-handling units, sensor error could be up to 2 °F. The question is
how this would affect the diagnostic performance. In general, as
the diagnostic algorithm is less dependent on measurement at
any individual point and takes into account the system behavioral
pattern over a window of operation, it has more flexibility in dealing with sensor errors. We know that the algorithm already takes
into account some level of uncertainty/error due to the employment of simplified models. To account for sensor error in a systematic way, the uncertainty can be expanded to include both
modeling errors and sensor errors, such as in the likelihood function. If the employed model is an extension of an analytical model
with added uncertainty, the sensor error can be included as part of
the analytical process. If other statistical procedures are used to develop the model, the sensor error can be assumed to be statistically
independent and aggregated accordingly. One easy way could be to
think of the sensor error as a random variable with normal distribution (or a distribution recommended by the sensor manufacturer), add that to the modeling error random variable, and
compute the joint variance.
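For independent, zero-mean errors, the joint variance mentioned above is the sum of the individual variances. A minimal sketch, assuming both the modeling error and the sensor error are expressed as standard deviations in the same units (the function name is hypothetical):

```python
import math


def combined_sigma(model_sigma, sensor_sigma):
    """Standard deviation of the sum of two independent zero-mean errors."""
    return math.sqrt(model_sigma ** 2 + sensor_sigma ** 2)


# A modeling error of 3 units and a sensor error of 4 units combine to 5 units.
print(combined_sigma(3.0, 4.0))
```

The combined value would replace the model-only standard deviation in the likelihood function, which widens the range of observations treated as consistent with each hypothesis.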
Incorporation of sensor error results in a higher level of uncertainty associated with each measurement. As a result, the diagnostic mechanism would require more data to reach a solid
conclusion, and it would be less sensitive to faults with a smaller
impact on the system performance. If a fault effect is within the
uncertainty range defined by the modeling and sensor errors, the
algorithm cannot distinguish between deviations in the system
performance due to the fault occurrence and those coming from
modeling and sensor error. For instance, if, in the first example, a
2 °F sensor error had been considered for MAT and RAT, OAF could
have an error of 40% or more when the difference between MAT
and RAT was less than 10 °F. This means that, if the effect of a return-air-damper-leakage fault were less than 3 or 4 °F, it would be
non-detectable, and the diagnostic result would be inconclusive
between no-fault and a return-air-damper-leakage fault.
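The 40% figure follows from a worst-case bound: if MAT and RAT can each be off by the sensor error in opposite directions, their difference can be off by twice that amount. The sketch below computes this relative error for the numerator of the OAF, treating OAT as exact; the function name and that simplification are assumptions.

```python
def numerator_relative_error(mat, rat, sensor_err=2.0):
    """Worst-case relative error of (MAT - RAT) when each reading may be off by sensor_err."""
    return 2.0 * sensor_err / abs(mat - rat)


# A 10-degree difference with 2-degree sensor errors gives a 40 % worst-case error.
print(numerator_relative_error(mat=62.0, rat=72.0))
```

The bound grows as MAT approaches RAT, so small leakage effects of a few degrees fall inside the uncertainty band and cannot be separated from sensor error.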
5. Conclusion
Relying on accurate or detailed models and requiring the measurement of parameters/variables that are not easily measurable/accessible
in practical applications have been the main drawbacks
of diagnostic solutions for air handling units from the scalability
and affordability perspectives. The aim of this paper was to develop diagnostic algorithms that are less dependent on model accuracy and more flexible with respect to measurement constraints.
In the proposed diagnostic algorithm, we think of fault diagnostics
as the process of analyzing a system behavioral pattern (observed
performance) and comparing it with a set of hypothetical patterns
to find the closest match. Each hypothetical pattern is developed
based on the assumption of the existence of none, one or more
faults in the system. We demonstrated how such a problem can
be formulated as a posterior estimation problem of a Bayesian
model. It was shown how effective the proposed diagnostic algorithm could be in detecting/isolating faults, dealing with measurement constraint challenges, and managing the complexity of
concurrent faults.
Although the focus of this paper was on air handling unit diagnostics, we believe that the proposed algorithm has the potential
to be applied in other applications with similar restrictions. Further
research and development may be required for more complex scenarios. A commercial building, for example, contains several air
handling units and other components, which increases the system
complexity as a network of components. Research needs to be
done to determine what challenges exist because of the added
complexity (for example, dimensionality) and what further development might be needed to make the proposed approach applicable to such systems.
References
[1] Andersen KK, Reddy TA. The error in variables (EIV) regression approach as a means of identifying unbiased physical parameter estimates: application to chiller performance data. Int J Heat Ventil Air Condit Refrig Res 2002;8(3):295–309.
[2] Ardehali M, Smith TF, House JM, Klaassen CJ. Building energy use and control problems: an assessment of case studies. ASHRAE Trans 2003;109(2).
[3] Baker N, Steemers K. Energy and environment in architecture. New York: E&FN Spon; 2000.
[4] Brambley MR, Pratt RG, Chassin DP, Katipamula S. Automated diagnostics for outdoor air ventilation and economizers. ASHRAE J 1998;40(10):49–55.
[5] Buildings energy data book. US Department of Energy (DOE); 2003.
[6] DOE plan projects savings for new building energy costs. ASHRAE J 2000. p. 6.
[7] Carling P. Comparison of three fault detection methods based on field data of an air-handling unit. ASHRAE Trans 2002;108(1).
[8] Cibse GH. Building control systems. Oxford: Butterworth-Heinemann; 2000.
[9] Claridge DE, Culp CH, Liu M, Deng S, Turner WD, Haberl JS. Campus wide continuous commissioning of university buildings. In: Proceedings of the ACEEE summer study. Washington, DC: ACEEE; 2000.
[10] Castro N. Performance evaluation of a reciprocating chiller using experimental data and model predictions for fault detection and diagnosis. ASHRAE Trans 2002;108(1).
[11] DasGupta A. Asymptotic theory of statistics and probability. Springer Texts in Statistics; 2008.
[12] Dexter AL, Ngo D. Fault diagnosis in air-conditioning systems: a multi-step fuzzy model-based approach. Int J Heat Ventil Air Condit Refrig Res 2001;7(1):83–102.
[13] Dobson AJ, Barnett A. An introduction to generalized linear models. Texts in statistical science. 3rd ed. Chapman & Hall/CRC; 2008.
[14] Du Z, Jin X, Yang Y. Fault diagnosis for temperature, flow rate and pressure sensors in VAV systems using wavelet neural network. J Appl Energy 2009;86(9):1624–31.
[15] Glass AS, Gruber P, Roos M, Todtli J. Qualitative model-based fault detection in air-handling units. IEEE Control Syst Mag 1995;15(4):11–22.
[16] Haves P, Salsbury T, Wright JA. Condition monitoring in HVAC subsystems using first principles. ASHRAE Trans 1996;102(1):519–27.
[17] Haves P, Kim M, Najafi M, Xu P. A semi-automated commissioning tool for VAV air handling units: functional test analyzer. ASHRAE Trans 2007;113(1).
[18] Holmes MJ. The simulation of heating and cooling coils for performance analysis. In: Proceedings of the conference on system simulation in buildings, Liege; 1982.
[19] House JM, Vaezi-Nejad H, Whitcomb JM. An expert rule set for fault detection in air handling units. ASHRAE Trans 2001;107(1).
[20] Hyvärinen J, Kärki S. International energy agency building optimisation and fault diagnosis source book. Espoo, Finland: Technical Research Centre of Finland, Laboratory of Heating and Ventilation; 1996.
[21] Incropera FP, DeWitt DP, Bergman TL, Lavine AS. Fundamentals of heat and mass transfer. Chapman and Hall/CRC; 2011.
[22] Jordan MI. Graphical models. Statistical Science 2004;19:140–55 [Special Issue on Bayesian Statistics].
[23] Jordan MJ, Wainwright MJ. Graphical models, exponential families, and variational inference; 2008.
[24] Kumar S, Sinha S, Kojima T, Yoshida H. Development of parameter based fault detection and diagnosis technique for energy efficient building management system. Energy Convers Manage 2001;42:833–54.
[25] Katipamula S, Pratt RG, Chassin DP, Taylor ZT, Gowri K, Brambley MR. Automated fault detection and diagnostics for outdoor-air ventilation systems and economizers: methodology and results from field testing. ASHRAE Trans 1999;105(1).
[26] Katipamula S, Brambley MR. Methods for fault detection, diagnostics, and prognostics for building systems – a review, part I. HVAC&R Res 2005;11(1).
[27] Katipamula S, Brambley MR. Methods for fault detection, diagnostics, and prognostics for building systems – a review, part II. HVAC&R Res 2005;11(2).
[28] Kumar SS, Kojima T, Yoshida H. Development of parameter based fault detection and diagnosis technique for energy efficient building management system. Energy Convers Manage 2001;42:833–54.
[29] Lee WY, Park C, Kelly GE. Fault detection of an air-handling unit using residual and recursive parameter identification methods. ASHRAE Trans 1996;102(1):528–39.
[30] Lee WW, House JM, Kyong NH. Subsystem level fault diagnosis of a building's air-handling unit using general regression neural networks. J Appl Energy 2004;77(2):153–70.
[31] McCullagh P, Nelder JA. Generalized linear models. Monographs on statistics & applied probability. 2nd ed. Chapman & Hall/CRC; 1989.
[32] Norford LK, Wright JA, Buswell RA, Luo D, Klaassen CJ, Suby A. Demonstration of fault detection and diagnosis methods for air-handling units (ASHRAE 1020-RP). HVAC&R Res 2002;8(1):41–71.
[33] Pakanen J, Sundquist T. Automation-assisted fault detection of air-handling unit; implementing the method in a real building. Energy Build 2003;35:193–202.
[34] Peitsman HC, Bakker V. Application of black-box models to HVAC systems for fault detection. ASHRAE Trans 1996;102(1):628–40.
[35] Reddy TA, Niebur D, Andersen KK, Pericolo PP, Cabrera G. Evaluation of the suitability of different chiller performance models for on-line training applied to automated fault detection and diagnosis. Int J Heat Ventil Air Condit Refrig Res 2003;9(4).
[36] Riemer PL, Mitchell JW, Beckman WA. The use of time series analysis in fault detection and diagnosis methodologies. ASHRAE Trans 2002;108(2).
[37] Rossi TM. Detection, diagnosis, and evaluation of faults in vapor compression cycle equipment. Ph.D. thesis, School of Mechanical Engineering, Purdue University, West Lafayette, Indiana; 1995.
[38] Rossi TM, Braun JE. A statistical, rule-based fault detection and diagnostic method for vapor compression air conditioners. Int J Heat Ventil Air Condit Refrig Res 1997;3(1):19–37.
[39] Sohn MD, Small MJ, Pantazidou M. Reducing uncertainty in site characterization using Bayes Monte Carlo methods. J Environ Eng 2000;126(10):893–902.
[40] Sohn MD, Reynolds P, Singh N, Gadgil AJ. Rapidly locating and characterizing pollutant releases in buildings. J Air Waste Manage Assoc 2002;52:1422–32.
[41] Salsbury TI, Diamond RC. Fault detection in HVAC systems using model-based feed-forward control. Energy Build 2001;33:403–15.
[42] Wagner J, Shoureshi R. Failure detection diagnostics for thermo-fluid systems. J Dyn Syst Meas Contr 1992;114(4):699–706.
[43] Wang S, Chen Y. Fault-tolerant control for outdoor ventilation air flow rate in buildings based on neural networks. Build Environ 2002;37:691–704.
[44] Xu P, Haves P, Kim M. Model-based automated functional testing – methodology and application to air handling units. ASHRAE Trans 2005;111(Pt. 1):979–989 [LBNL-55802].
[45] Yoshida H, Iwami T, Yuzawa H, Suzuki M. Typical faults of air-conditioning systems, and fault detection by ARX model and extended Kalman filter. ASHRAE Trans 1996;102(1):557–64.
[46] Yoshida HS, Kumar S. ARX and AFMM model-based on-line real-time data base diagnosis of sudden fault in AHU of VAV system. Energy Convers Manage 1999;40:1191–206.
