
Energy and Buildings 95 (2015) 160–171

Contents lists available at ScienceDirect

Energy and Buildings


journal homepage: www.elsevier.com/locate/enbuild

Energy audit of schools by means of cluster analysis


Rigoberto Arambula Lara a, Giovanni Pernigotto b, Francesca Cappelletti c,*,
Piercarlo Romagnoni c, Andrea Gasparella a

a Faculty of Science and Technology, Free University of Bozen-Bolzano, Italy
b Department of Management and Engineering, University of Padova, Italy
c Department of Design and Planning in Complex Environments, University IUAV of Venice, Italy

ARTICLE INFO

Article history:
Available online 26 March 2015

Keywords:
Energy retrofit
Energy audit
Schools refurbishment
Cluster analysis
Energy consumption

ABSTRACT

More than 30 % of Italian schools have very low energy efficiency due to aging or poor quality of construction. The current European policy on energy saving, with the Commission Delegated Regulation (EU) 244/2012, recommends a cost-optimal analysis of retrofit improvements, starting from some reference buildings. One relevant issue is the definition of a set of reference buildings effectively representative of the considered stock. A possible solution could be found using data mining techniques, such as the K-means clustering method, which allows the division of a large sample into smaller and more homogeneous groups. This work adopts cluster analysis to find a few school buildings representative of a sample of about 60 schools in the province of Treviso, North-East of Italy, thus reducing the number of buildings to be analyzed in detail to optimize the energy retrofit measures. Real consumption data of the scholastic year 2011–2012 were correlated to the buildings' characteristics through regression, and the parameters with the highest correlation with energy consumption levels were used in cluster analysis to group the schools. This method has supported the definition of representative architectural types and the identification of a small number of parameters that determine the energy consumption for air heating and hot water production.
© 2015 Elsevier B.V. All rights reserved.

1. Introduction
According to the latest report about school buildings of the Italian Association for the Environment Safeguard, 42 000 schools are currently in operation in Italy and about 60 % of them were built before 1974 [1]. Although nearly 50 % of the schools have undergone emergency repairs in the last 5 years, more than 30 % require urgent maintenance, not only because of aging but also because of the poor quality of recent constructions.
The current interest in school buildings, not only in Italy but also in Europe, is primarily related to two aspects: the high level of energy consumption of this sector, and the inadequate level of comfort (both thermal and air quality). Numerous studies have been carried out both to determine the real dimension of the problem and to propose technically and economically feasible solutions, while governments have established tougher regulations and standards that new and retrofitted constructions have to comply with. The main problems in schools, as pointed out by many authors, concern not only the building envelope and system features, but the management as well.

* Corresponding author. Tel.: +39 0412571295.
E-mail address: francesca.cappelletti@iuav.it (F. Cappelletti).
http://dx.doi.org/10.1016/j.enbuild.2015.03.036
0378-7788/© 2015 Elsevier B.V. All rights reserved.

Some years ago, Antonini et al. [2] carried out a survey on a sample of 50 schools in the Veneto region, North-East Italy. The schools were assessed for energy performance, through analytical calculation methods, and for environmental quality, through experimental measurements. It was found that schools in Veneto use annually between 250 kWh m⁻² and 350 kWh m⁻² (290 kWh m⁻² on average), including hot water for the gymnasiums and the canteens. About one third of this use is attributable to heat losses through the building envelope. With respect to the heating systems, in addition to the oversized heat generators found in almost all the buildings of the sample, the same analysis identified problems, only detectable through in situ measurements, related to an incorrect positioning of the internal thermostatic probes or of the heating elements, or to a generally poor management of the heating system. Similarly, Filippín [3], starting from a study on 15 Argentinian schools, reported a number of management and control issues related to maintenance, the appropriate positioning of thermostats, the identification of critical areas, the monitoring of abnormal loads and the training of staff and students in the proper use of the facilities.

R. Arambula Lara et al. / Energy and Buildings 95 (2015) 160–171

Nomenclature

Symbols
A        area (m²)
C        referred to cluster or centroid
EP       normalized energy performance (Wh m⁻³ K⁻¹ h⁻¹)
F        F-test statistic (–)
H        capacity of the heating system (kW)
h        hour (h)
HDD      heating degree days (K d)
HDH      heating degree hours (K h)
K        number of partitions for K-means algorithm
Q        heating demand (Wh)
R²       index of determination (–)
S        dissipating surface (m²)
T        temperature (°C)
U        thermal transmittance (W m⁻² K⁻¹)
V        conditioned volume (m³)
VIF      variance inflation factor (–)

Subscripts
0        initial
adj      adjusted
env, gl  referred to transparent envelope
env, o   referred to opaque envelope
ext      external
f        referred to floor
fg       floor in thermal contact with the ground
int      internal
k        referred to the kth cluster
occ      occupancy
r        referred to the roof
vw       vertical walls exposed to the external environment
win      referred to the windows
Another important data collection, focused in particular on energy consumption, installed power and fuel used, was conducted in the province of Perugia, Central Italy, by Desideri and Proietti [4] on 29 educational institutes, distinguished by type of construction. The specific electricity and heat consumptions per unit of volume, per class and per student were calculated. It was noted that the energy need for heating is about 80 % of the total and that, in the hypothetical scenario in which all the examined buildings reduced their consumption to the minimum detected in the sample, the achievable savings in terms of energy per student and energy per cubic meter would be 47.6 % and 38 % respectively. Recently, consumption data for space heating of a sample of 120 school units were collected in the province of Torino, North-West of Italy [5]. The schools were equipped with fuel meters, heat meters and climatic probes, allowing the derivation of a performance indicator for the heating consumption, to be used for an initial analysis of the building stock and a preliminary assessment of future budget allocations for the Public Administration. Subsequently, using techniques of multivariate statistical inference on a subgroup of 35 units within the monitored buildings, linear models were developed to correlate the measured consumption with thermo-physical and geometrical characteristics [5,6]. The difficulty of obtaining complete documentation describing the buildings led the authors to focus on the development of models based on a small number of independent and easily detectable variables, such as the installed power and the floor surface.

Recent studies focused on the identification of the most effective measures to be applied in building-system retrofitting. Butala and Novak [7] indicated insulation and replacement of the windows as the most effective, cost-effective and necessary interventions for 20 of the 24 Slovenian buildings examined. For the Greek climates characterized by a higher level of heating degree-days, Dimoudi and Kostarela [8] quantified the effect of individual retrofitting interventions to reduce the needs for both heating and cooling in order to increase the indoor comfort of the occupants, assessing the energy savings also in terms of the decrease of polluting agents in the considered environments. The potential savings, after the improvement of the insulation level, were 28.7 % for heating and more than 99 % for cooling, through the use of simple ceiling fans and especially night ventilation.
When planning the retrofit or assessing the improvement potential of a large stock of existing buildings, a large-scale assessment of the consumption has to be carried out. In this framework, different auditing approaches can be adopted to find a benchmark for evaluating the energy performance of the building stock, on the one hand, and for assessing the performance after retrofitting interventions, on the other. Many studies on building stock classification and benchmarking have been carried out, some of them concerning school buildings. A primary goal in these benchmark analyses was to define a stochastic model based on a few variables, either a regressive model [6] or a targeted selection of statistically significant cases [9], suitable for a first estimate of the margins for improvement. Hernandez et al. [10] proposed a method to calculate the energy performance benchmark for a rating system using a calculated energy performance indicator and grading it according to standard EN 15217:2007 [11]. A group of primary schools in Ireland was used as a case study and the main problem they encountered was the lack of historical data, a problem that is also present in Italy.

The European Commission is nowadays promoting the renovation of existing buildings through a cost-optimal analysis of different retrofit improvements, starting from a reference building, which has to be representative of a building category [12]. As can be expected, defining the reference building in a stock of existing ones implies the analysis of a large amount of information to find out how this set can be sub-grouped. In similar cases, when the building stock is very large, the application of statistical techniques to group buildings with homogeneous characteristics is necessary to focus the investigations on a small number of representatives and possibly extend the results to the others. Gaitani et al. [9] used clustering to identify a few representative buildings on which to carry out detailed retrofitting analyses, thus anticipating the European directives. Analyzing a sample of 1100 schools, i.e. 33 % of the Greek school building stock, by means of clustering algorithms and principal component analysis, five typical buildings were selected and described by seven characteristics, such as the heated area, age of the building and heating system, envelope insulation, number of classrooms and students and occupancy profile. In order to reduce the number of variables analyzed, the contribution of each to the final energy performance was calculated individually. The application of clustering in the analysis of existing buildings can be found also in other studies. Indeed, cluster analysis is a powerful data mining technique, used to find correlations and patterns, by which a set of elements is split into several homogeneous groups containing elements that are much more similar to each other than to those of any other group. Santamouris et al. [13], for example, used clustering techniques to define energy classes based on the heating energy consumption of a large sample of schools in Greece. Some other authors applied cluster analysis to building stock evaluation, not only for schools but also for the household market [14]. In some studies regression analysis has been used in association with clustering. This is the case of Filippín et al. [15], who evaluated the historical natural gas consumption for heating during 13 years in 72 apartments belonging to three different multifamily

buildings in La Pampa, Argentina. The stepwise regression method was used to select the most representative variables that explain the changes of heating and annual energy consumption during the available period, and clustering was used to group the apartments according to the variables explaining consumption, aiming to define annual energy classes and their ranking.

Concerning the aim of the studies, in some cases clustering has been applied to classify the energy performance of buildings [9,13,16], in other cases to define the building typologies in a building stock [17], sometimes to analyze the occupants' behavior in relation to the energy loads [18], or to check the number of typical load curves necessary to represent the building behavior [19]. In other contexts, clustering has been applied not to real buildings but to simulated ones. This is the case of Heidarinejad et al. [20], who used cluster analysis to examine the simulated energy consumption of 134 U.S. LEED office buildings, to classify buildings into high, medium, and low energy use intensity clusters and to provide a quantitative evaluation of the large difference in energy intensities in high-performance office buildings.

In retrofitting, even if energy consumption data are often available, they have to be correlated with the buildings' characteristics in order to estimate the best actions to apply. While this is a simple task for a single building, it becomes quite difficult and time-consuming when a large stock is considered. Moreover, even when consumptions are collected, information about the buildings' characteristics is often very scarce.

The aim of this work is to explore a method for clustering a large set of existing buildings in order to group them on the basis of the characteristics that contribute most to energy consumption levels. The final aim is finding a few representative schools, which could be monitored, modeled (also by means of simulation calibration techniques) and analyzed, in order to evaluate the impact of interventions and to optimize, by a cost-optimal approach, the list of possible retrofit measures. The set of buildings is composed of about 60 schools dating from the 19th century up to now and located in the province of Treviso, in the North-East of Italy. For each school, energy use for heating and for sanitary hot water was available for the period 2008–2013, together with a number of geometrical and thermo-physical data. All this information has been processed with an approach coherent with [21,22] in order to group the schools on the basis of similar characteristics:

(1) Using regression analysis, the groups of parameters (predictors) that are best correlated to the final energy consumption are identified;
(2) The clustering method is selected, adapted to the peculiarities of the research objectives with the addition of some rules, and applied to the sample of schools;
(3) Another regression is performed to check the goodness of the clustering and to validate the developed clusters;
(4) The obtained results are studied, the centroids of each cluster are determined and identified as representative, and other linear models are developed to get an optimized fit to the data in each cluster.

Although the sample is composed of schools, this methodology can be applied also to other building categories and could be of particular interest for historical buildings.

2. Statistical description of the school sample

2.1. General description

A large database of information for 85 buildings that represent all of the Province-owned educational building stock has been analyzed. The province of Treviso is located in the Italian climatic zone E (Cfa according to the Köppen classification). The schools are situated in locations with a conventional number of Heating Degree Days (HDD20), calculated with respect to a base temperature of 20 °C, spanning from 2350 to 2700 K d.

For each building, data regarding geometry, thermal properties of the building envelope and energy consumption over 5 years are available. After a first check, some of the buildings were excluded from further analysis because of missing or inconsistent information. The final selected sample includes about 60 buildings, all with consistent data. The age of these buildings is variable (Fig. 1): many were built before 1976, a few of them in recent years. Fig. 1 shows the frequency distribution of the buildings by construction period. About 50 % of the schools were built before the publication of any energy law, the first in Italy being Law number 373, in force since 1976.

A picture of the sample can be drawn by the descriptive statistical analysis of the most common parameters. Three quarters of the schools have a gross heated volume lower than 20,000 m³. Bigger volumes actually occur when the same school occupies more than one building (Fig. 2). In Fig. 3 the frequency distribution of the schools by the ratio between the dispersing area and the heated volume (S/V ratio) shows that the sample is composed mostly of quite compact buildings, with an S/V ratio varying from 0.3 to 0.5. Finally, looking at the amount of transparent surfaces, 67 % of the schools have a percentage of transparent envelope over the total dispersing opaque envelope (Aenv,gl/Aenv,op) in the range 6–12 % (Fig. 4a) and 78 % have a window to floor ratio (Awin/Af) in the range 10–20 % (Fig. 4b).

Fig. 1. Frequency distribution of buildings concerning the construction period.

Fig. 2. Frequency of buildings by the gross heated volume. [Histogram; x-axis: heated volume (×1000 m³), bins from <4 to >70; y-axes: frequency and cumulative frequency.]


Fig. 3. Frequency of buildings by dispersing S/V ratio. [Histogram; x-axis: dispersing surface/volume ratio (m⁻¹), bins from <0.2 to >0.9; y-axes: frequency and cumulative frequency.]

Fig. 4. Frequency of buildings by the percentage of windows area over the opaque envelope area (a) and over the total floor area (b). [Histograms; x-axes: Aenv,gl/Aenv,op (%) and Awin/Af (%); y-axes: frequency and cumulative frequency.]

2.2. Thermal resistance of building envelope

For each school of the sample, some more information about the thermal transmittance of the structures is available. Data were processed in order to obtain an area-weighted average value of thermal transmittance for the external opaque envelope (vertical walls, floor, roof) and for the windows. Fig. 5a and b plots the frequency distributions of the average transmittance of the opaque envelope and of the windows.

Fig. 5. Frequency of buildings by thermal transmittance of opaque envelope (a) and of windows (b). [Histograms; x-axes: opaque envelope average U-value and windows average U-value (W m⁻² K⁻¹); y-axes: frequency and cumulative frequency.]

2.3. Occupancy schedule


In the schools dataset, the overall number of hours per year in which the indoor temperature was maintained at the setpoint is available for the years from 2008 to 2013. This parameter changes according to the type of school and to the extra-curricular activities that take place in the same school building outside the teaching timetable. The occupancy schedule is actually more informative than the total amount of occupancy hours: using these schedules, it is possible to calculate for each school the total occupancy hours during the corresponding heating period (October 15th – April 15th). Consequently, the difference between the indoor and outdoor temperature, on which the heating energy consumption strictly depends, can be estimated at each hour of the occupancy period. In the available dataset, occupancy schedules of all the schools are available only for the year 2011–2012. Since heating has to be provided during occupancy hours in order to maintain the indoor air temperature at the setpoint, it has been possible to calculate the specific Heating Degree Hours (HDH20,occ) for the scholastic year 2011–2012. Meteorological data for the same period, coming from 10 monitoring stations administered by the regional environmental agency (ARPA Veneto) in different locations of the province, have been used.


Fig. 6. Heating degree hours (HDH20,occ) for each school during the heating period of the scholastic year 2011–2012.

As shown in Eq. (1), HDH20,occ is calculated as the sum of all the hourly differences between an assumed internal setpoint temperature of 20 °C and the external air temperature during every occupancy hour of the heating period.

$$\mathrm{HDH}_{20,\mathrm{occ}} = \sum \left(T_{\mathrm{int}} - T_{\mathrm{ext}}\right) \cdot h \qquad (1)$$

In Fig. 6 the results of this calculation are plotted, showing that the range of values in this group of buildings is quite wide, going from around 8500 to 25 500 K h.
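As an illustration of Eq. (1), the following minimal sketch sums the hourly temperature differences over the occupied hours only. The function name, its arguments and the four-hour sample data are hypothetical and are not taken from the study. No clipping of negative differences is applied, which is an assumption consistent with Eq. (1) as written.

```python
from typing import Sequence


def heating_degree_hours(t_ext: Sequence[float],
                         occupied: Sequence[bool],
                         t_int: float = 20.0,
                         hour: float = 1.0) -> float:
    """Return HDH in K h: sum of (t_int - t_ext) * hour over occupied hours."""
    if len(t_ext) != len(occupied):
        raise ValueError("t_ext and occupied must have the same length")
    return sum((t_int - t) * hour for t, occ in zip(t_ext, occupied) if occ)


# Hypothetical four-hour excerpt: outdoor temperatures (deg C) and occupancy flags.
t_ext = [5.0, 10.0, 15.0, 0.0]
occupied = [True, False, True, True]
hdh = heating_degree_hours(t_ext, occupied)
```

In practice the two sequences would cover every hour of the heating period, with the outdoor temperature taken from the nearest monitoring station.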
2.4. Heating systems and energy consumptions
As regards the heating system, most of the schools use natural gas boilers (87 %), while the others use oil-fired boilers. Before 2012, all the boilers were traditional non-condensing ones. During 2012, the traditional boiler was replaced in some schools with a condensing one. As for the boiler heating capacity, 16 % of the schools have boilers below 300 kW, 40 % between 300 kW and 600 kW, 27 % between 600 kW and 900 kW, and 17 % above 900 kW. In order to compare the energy performance of the schools, the energy consumption of each of them has been normalized with respect to its heated volume and heating degree hours. Fig. 7 shows the resulting energy index, which varies widely: from 0.53 Wh m⁻³ K⁻¹ h⁻¹ to 8.41 Wh m⁻³ K⁻¹ h⁻¹.
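The normalization just described amounts to dividing the heating consumption by the heated volume and by the heating degree hours. A minimal sketch follows; the function name and the numerical values are hypothetical and serve only to show the units at work.

```python
def normalized_energy_performance(q_wh: float,
                                  volume_m3: float,
                                  hdh_kh: float) -> float:
    """Return the energy index in Wh m^-3 K^-1 h^-1: Q / (V * HDH)."""
    if volume_m3 <= 0 or hdh_kh <= 0:
        raise ValueError("volume and heating degree hours must be positive")
    return q_wh / (volume_m3 * hdh_kh)


# Hypothetical school: 200 MWh of heating energy, 10 000 m3, 20 000 K h.
ep = normalized_energy_performance(q_wh=2.0e8, volume_m3=10_000.0, hdh_kh=20_000.0)
```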
3. Method
As specified in the introduction, the first step regards the selection of the quantities to describe the sample and then to perform the clustering. The annual energy consumption of each building can be correlated to some of them, such as those describing the geometry, the composition of the envelope, the characteristics of the conditioning system, the control schedules and the weather conditions. The influence of each quantity on the heating demand is clearly different, and the most highly correlated parameters and variables can be used to characterize the sample of buildings effectively. Since the aim of the work is to group schools with similar characteristics and similar correlations between them and the energy consumption, we need to define which variables are the most suitable to characterize the heating demand and which ones can describe the properties of the building set. As shown in the previous section, in order to make the energy consumption for heating (Q) easier to compare and, consequently, the schools easier to split into homogeneous groups, it has been normalized coherently with the Italian National Guidelines for the Energy Labelling of Buildings [23] and EN 15217:2007. Thus, Q has been divided by the conditioned volume (V) and the heating degree hours (HDH20,occ) calculated according to the approach explained before. In this way, weather, size of building and occupancy have been removed from the list of the descriptive quantities and their effects directly accounted for in the normalized energy performance (EP), expressed in [Wh m⁻³ K⁻¹ h⁻¹]. The list of 12 candidate descriptive quantities includes:
• The area of the vertical walls exposed to the external environment Avw [m²];
• The area of the roof Ar [m²];
• The area of the floor Af [m²];
• The area of the floor in thermal contact with the ground Afg [m²];
• The total area of the opaque envelope Aenv,o [m²];
• The total area of the transparent envelope Aenv,gl [m²];
• The ratio between the windows area and the vertical walls area Awin/Avw [–];
• The ratio between the windows area and the total floor area Awin/Af [–];
• The ratio between the transparent envelope and the opaque envelope Aenv,gl/Aenv,op [–];
• The average thermal transmittance of the envelope U [W m⁻² K⁻¹];
• The shape factor of the school, expressed in terms of ratio S/V [m⁻¹] between the dissipating surface S and the conditioned volume V;
• The capacity of the heating system H [kW].
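With 12 candidate quantities, the number of distinct predictor groups follows from binomial coefficients. The short sketch below (the helper name `count_groups` is illustrative) reproduces the two counts used in this work: all groups with 2 to 12 predictors, and the groups with exactly 6 predictors.

```python
from math import comb

N_CANDIDATES = 12  # candidate descriptive quantities listed above


def count_groups(n: int, k_min: int, k_max: int) -> int:
    """Number of subsets of n items whose size lies between k_min and k_max."""
    return sum(comb(n, k) for k in range(k_min, k_max + 1))


total_groups = count_groups(N_CANDIDATES, 2, 12)   # groups with 2 to 12 predictors
six_predictor_groups = comb(N_CANDIDATES, 6)       # groups with exactly 6 predictors
print(total_groups, six_predictor_groups)
```

The first count equals 2^12 minus the empty set and the 12 single-predictor groups.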
In order to help find the combinations of the candidate quantities that define homogeneous groups, we decided to adopt multiple linear regression, since it offers the best advantages for the subsequent clustering [21]. Indeed, the selected quantities can be employed both to identify the groups and to develop linear predictive models for their elements. Other statistical techniques could be implemented, such as multivariate ANOVA or the analysis of the correlations with Spearman's index, depending on the research objectives. In this case, considering the small size of the sample and the aim of detecting linear relationships, only the main effects of the correlations between the candidate quantities and the EP have been investigated. The study of the interactions and non-linear relationships will be considered in further developments with larger datasets, including more years of measured data. Due to the chosen approach, EP can be treated as a response quantity, which is, at most, a linear function of the 12 independent predictors. For each of the 12 descriptive quantities, the highest value in the whole dataset is identified and used to normalize the characteristics of each building. The predictors can be grouped in 4083 possible combinations, starting from groups with 2 to groups with 12 predictors. The corresponding linear models are elaborated starting from the smallest groups. The adjusted index of determination (R²adj) of each model is monitored because it is one of the most meaningful indexes to consider in case of non-hierarchical clustering [21]. The statistical significance of the model itself has

R. Arambula Lara et al. / Energy and Buildings 95 (2015) 160171

165

Fig. 7. Annual energy consumption per unit heated volume and per unit degree hour (scholastic year 20112012).

been controlled by means of the value of F-tests and the p-values


and the multi-collinearity issues by means of the variance ination
factor (VIF). Only models with signicant p-value with respect to
a signicance level of 5 % or, at least, 10 % and, preferably, with
VIF values lower than 10 (i.e., without multi-collinearity issues) are
considered for the denition of the quantities for the clustering.
The calculation of the different models stops when, even including
2 cannot be signicantly improved. The
larger and larger groups, Radj

Since the whole dataset for the clustering includes 58 elements,


we decided to impose KI = 3 for the rst clustering and KII = 2 in case
of sub-clustering. A preliminary study has indicated that, for the
current dataset, using KI = 3 in the rst clustering is more effective
than KI = 2. On the contrary, if KI = 4, some clusters are too small and,
so, the centroids have few representativeness. Furthermore, many
clusters obtained with KI = 4 from this sample have the number
of buildings inadequate for the development of predictive linear
models according to the central limit theorem and so cannot be
validated. Clustering into 3 groups results to be a good compromise
between the sub-clustering levels and the adequacy of the cluster
size.
The K-means method is sensitive to the initial centroids C0, k :
with the aim of increasing the robustness of the approach, an iterative procedure has to be implemented, as well as an optimized
choice of the starting conditions. As it is commonly done in K-means
approaches, the initial virtual centroids are randomly generated
within the domain of the dataset. However, some constraints are
imposed in order to prevent cases in which C0, k are too far or too
close to each other and to partially avoid including very distant data
points in the same cluster. Specically, for each initial centroid an
inuence region is dened as percentage of the total size of the data
cloud (30 % in this case). For sub-clustering, instead, since KII = 2
and the number of elements is lower, a different approach which
maximizes the distance between C0, 1 and C0, 2 is preferred. Consequently, specic combinations of the values of the predictors are
used for the centroid initialization: the virtual centroid C0, 1 has the
maximum value and C0, 2 the minimum. After the creation of the
initial clusters, the centroids C1, k are calculated and the K-means

2 are selected as set


combinations of predictors with the highest Radj
of coordinates to dene the position of each element in the sample
of schools.
The next two steps involve the clustering and its validation.
Among the different alternatives for the identication of clusters,
the K-means approach is one of the most popular techniques in
clustering and data mining. The technique is based on a simple
partitional algorithm that tries to nd K non-overlapping clusters
[24,25]. By this method, K centroids are selected according to the
desired number of clusters and data points are assigned to the closest centroid according to the squared Euclidean distances. Once the
clusters are dened, it is possible to validate them by checking if
2 with respect to
the combination of predictors with the highest Radj
the whole dataset is the best for the cluster as well. If it is not, the
2 is found and used
combination of predictors with the highest Radj
as new coordinate system. If the cluster has an improved but still
2 , if enough buildings are present (i.e., with more than 25
poor Radj
elements), it is possible to run the algorithm again in order to dene
2
sub-clusters, but using the set of parameters with the highest Radj
for the cluster to split.

Table 1
Results of the 924 combinations with 6 predictors: top-10 congurations selected for the rst clustering and parameters considered for each group.
ID
Predictors
Avw
Ar
Af
Af-g
Aenv, op
Aenv, gl
Awin /Avw
Awin /Af
Aenv, gl /Aenv, op
U
S/V
H

235

825

787

x
x

311

902

270

x
x

x
x

843

53

304

307

x
x

x
x
x

x
x

x
x
x

x
x
x
x

x
x
x

x
x
x
x

x
x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

x
x
x

2
Radj

0.275

F value
p-value
VIF > 10

4.59
<0.01

0.265

0.261

0.259

0.257

0.255

0.254

0.254

0.254

0.253

4.43
<0.01

4.35
<0.01

4.32
<0.01

4.28
<0.01

4.24
<0.01

4.23
<0.01

4.23
<0.01

4.23
<0.01

4.22
<0.01

166

R. Arambula Lara et al. / Energy and Buildings 95 (2015) 160171

Table 2
Results of the rst clustering of the 58 schools and the subsequent second regression in each cluster. In gray color the improved regressions.

C1

C2

C3

ID

235

825

787

311

902

270

843

53

304

307

R2 adj
F value
p-value
N
R2 adj
F value
p-value
N
R2 adj
F value
p-value
N

0.399
3.88
0.01
27
0.165
1.76
0.17
24

0.400
3.44
0.02
23
0.326
3.09
0.03
27

0.542
3.36
0.08
13
0.455
2.39
0.21
11
0.154
1.99
0.10
34

0.802
9.76
<0.01
14
0.094
1.63
0.17
37

0.355
2.92
0.04
22
0.341
3.32
0.02
28

0.585
4.76
0.02
17
0.450
2.37
0.21
11
0.168
1.97
0.11
30

0.461
4.56
0.01
26
0.151
1.68
0.19
24

0.543
3.18
0.11
12
0.166
2.43
0.04
44

0.695
4.79
0.08
11
0.565
4.68
0.01
18
0.058
1.29
0.31
29

0.486
4.30
0.01
22
0.478
2.52
0.19
11
0.02
1.08
0.41
25

approach is iterated. The heuristic procedure continues until the determination of the i-th combination of centroids $C_{i,k}$ for which the global squared Euclidean distances are minimized.
Once all possible clusters are defined, the results are analyzed and the schools closest to the centroids are determined. As regards the linear predictive models, we also examined whether the adjusted index of determination can be improved and, if so, what optimizing the linear regressions on the data of each sub-cluster implies. The predictors, their combination, as well as the regression coefficients are re-calculated considering the elements of each cluster. Also in this case, F-values, p-values and VIF are monitored to establish to what extent the models can be used: only to describe the data in the clusters, or also for further extrapolations and predictions.
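
The assignment and centroid-update loop described above can be sketched as follows. This is a minimal pure-Python illustration of the standard (Lloyd-type) K-means iteration on already normalized coordinates, not the modified procedure adopted in this work; the function names, the toy points and the choice of fixed starting centroids are assumptions made only for the example.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, centroids, max_iter=100):
    """Plain K-means: assign each point to its nearest centroid, then move
    each centroid to the mean of its members, until assignments are stable."""
    centroids = [tuple(c) for c in centroids]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centroids)), key=lambda k: sq_dist(p, centroids[k]))
            for p in points
        ]
        if new_labels == labels:
            break  # no point changed cluster: a local minimum has been reached
        labels = new_labels
        for k in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[k] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids


# Hypothetical toy data: two separated groups in a 2-D normalized space.
points = [(0.1, 0.2), (0.2, 0.1), (0.15, 0.15), (0.8, 0.9), (0.9, 0.8), (0.85, 0.85)]
labels, centroids = kmeans(points, centroids=[(0.0, 0.0), (1.0, 1.0)])
```

Since the result of K-means depends on the starting centroids, a practical run would repeat the procedure from several initializations and keep the solution with the smallest total squared distance.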
4. Results and discussion
4.1. Clustering analysis
The evaluation of the number of predictors to be used to describe the sample was stopped at groups with 8 elements. Indeed, comparing the $R^2_{adj}$ of the different multiple linear regressions, no significant increase is registered when the number of predictors changes from 6 to 7, and there is a decrease from 7 to 8. For this reason, combinations of 6 quantities were chosen for clustering.
The 924 combinations with 6 predictors were ranked according to their adjusted index of determination and the first 10 configurations with the highest $R^2_{adj}$ were selected as coordinates to perform the clustering analysis (Table 1). Each combination of 6 parameters is identified by an ID number. Analyzing the 10 best groups of 6 predictors, we can see that the adjusted index of determination, the Fisher F-values and the p-values are all similar. No multi-collinearity issues are detected. These linear models, even if statistically significant with respect to the chosen level and free of multi-collinearity, could not be used because the adjusted index of determination is very low. This confirms the importance of identifying homogeneous groups to improve the fit of the models. As regards the predictors involved, the area of the vertical walls, the area of the surface in thermal contact with the ground, the average thermal transmittance, the shape factor and the capacity of the heating system are common to most of the models (U, S/V and H to all of them). What differentiates the models is often the variable describing the window area.
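
As a rough check of the figures above, the following sketch enumerates the groups of 6 predictors out of the 12 candidates and relates the adjusted index of determination to the F statistic. It assumes an ordinary least-squares model with an intercept fitted on the 58 schools; the helper names are illustrative and do not come from the original analysis.

```python
from itertools import combinations
from math import comb

# The 12 candidate predictors listed in Table 1 (plain-text labels).
PREDICTORS = [
    "Avw", "Ar", "Af", "Af-g", "Aenv,op", "Aenv,gl",
    "Awin/Avw", "Awin/Af", "Aenv,gl/Aenv,op", "U", "S/V", "H",
]

# Every unordered group of 6 predictors out of 12.
groups = list(combinations(PREDICTORS, 6))
n_groups = len(groups)


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for a linear model with p predictors and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)


def r2_from_adjusted(r2_adj, n, p):
    """Invert adjusted_r2 to recover the ordinary R^2."""
    return 1.0 - (1.0 - r2_adj) * (n - p - 1) / (n - 1)


def f_statistic(r2, n, p):
    """Overall F statistic of the regression (p and n - p - 1 degrees of freedom)."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))


# Consistency check on the first row of Table 1 (ID-235, 58 schools, 6 predictors).
r2 = r2_from_adjusted(0.275, n=58, p=6)
f_value = f_statistic(r2, n=58, p=6)
```

Here `n_groups` equals 924, matching the number of combinations considered, and `f_value` comes out close to the tabulated 4.59; the small difference is compatible with the rounding of the adjusted index to three decimals.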
The best 10 groups were used for the first clustering. In order to assess the improvement given by adopting these combinations of predictors within the K-means method, they were used to develop new linear regression models starting from the elements of each cluster (Table 2). In Table 2, gray marks those clusters and combinations for which the linear regression model elaborated for a given cluster is statistically significant at the 10 % level and has an adjusted index of determination larger than the one determined for the whole dataset with the same predictors. All

Table 3
Groups of predictors used in clustering vs. groups of variables with an optimized $R^2_{adj}$ (Best model) calculated for each cluster. In red the model variables with VIF > 10. In gray color the improved regressions.
Candidate predictors: Avw, Ar, Af, Af-g, Aenv,op, Aenv,gl, Awin/Avw, Awin/Af, Aenv,gl/Aenv,op, U, S/V, H

Cluster   Model        ID     $R^2_{adj}$   F statistic   p-value   N
C1        Cluster      304    0.695         4.79          0.08      11
C1        Best model   162    0.988         140.59        <0.01     11
C2        Cluster      304    0.565         4.68          0.01      18
C2        Best model   629    0.676         6.91          <0.01     18
C3        Cluster      304    0.058         1.29          0.31      29
C3        Best model   403    0.486         5.41          <0.01     29
C3.1      Cluster      403    0.891         13.21         0.03      10
C3.1      Best model   396    0.989         130.29        <0.01     10
C3.2      Cluster      403    0.162         1.58          0.24      19
C3.2      Best model   457    0.369         2.75          0.06      19


IDs with the exception of ID-53 gave improvements considering the first cluster, especially ID-311 and ID-304. Looking at the second cluster, only ID-825, ID-902 and ID-304 gave a better fit of the linear models to the data. Finally, no ID led to improvements for the last cluster: in many cases small groups were generated, preventing the calculation of the regressions. As a whole, ID-304 brought the best results since, for more than one cluster generated with its predictors, it allowed us to develop better linear models.
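
The criterion used to flag an improved regression can be written as a simple predicate. The sketch below applies it to configuration ID-304 with the values reported in Tables 1 and 2; the function name and the strict inequality on the p-value are assumptions made for illustration.

```python
def is_improved(r2_adj_cluster, p_value, r2_adj_whole, alpha=0.10):
    """A cluster regression counts as improved when it is significant at
    level alpha and its adjusted R^2 exceeds the whole-dataset value."""
    return p_value < alpha and r2_adj_cluster > r2_adj_whole


# ID-304: adjusted R^2 on the whole dataset (Table 1) and,
# for each cluster, (adjusted R^2, p-value) from Table 2.
R2_ADJ_WHOLE = 0.254
id_304 = {
    "C1": (0.695, 0.08),
    "C2": (0.565, 0.01),
    "C3": (0.058, 0.31),
}
improved = {
    name: is_improved(r2_adj, p, R2_ADJ_WHOLE)
    for name, (r2_adj, p) in id_304.items()
}
```

With these inputs the predicate is true for C1 and C2 and false for C3, in line with the discussion of ID-304.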
In configuration ID-304, three main clusters are found: C1 with 11 buildings, C2 with 18 and C3 with 29 (Table 2). With respect to the value of the adjusted index of determination for model ID-304 on the whole dataset, the indexes of C1 ($R^2_{adj}$ = 0.69) and C2 ($R^2_{adj}$ = 0.56) are increased while the $R^2_{adj}$ of C3 is still very low. These results mean that the clustering improved the classification and modeling of half of the building set, now split into the homogeneous groups C1 and C2. The remaining half of the sample in C3 needed further investigation: since its number of elements was high enough, a sub-clustering was applied. For the sub-clustering of C3, the group of predictors with the best $R^2_{adj}$ is ID-403 and, thus, it was used in the K-means method. The groups resulting from this operation are clusters C3.1 (with 10 schools) and C3.2 (including 19 buildings). The linear model ID-403 was assessed on these two sub-clusters, but only the adjusted index of determination of C3.1 was improved. Since the size of C3.2 was not large enough, no additional sub-clustering was performed.
For each of the final clusters C1, C2, C3.1 and C3.2, we looked for linear models with an adjusted index of determination higher than that of the linear models used to select the predictors for the clustering. When another combination of predictors optimized $R^2_{adj}$, it was selected to model that cluster. Four different configurations were selected in this way: ID-162 is found to be the best combination for C1, ID-629 for C2, ID-396 for C3.1 and ID-457 for C3.2. The linear model ID-457 still has a low $R^2_{adj}$ of 0.37. That could indicate a residual non-homogeneous group, the impossibility of fitting the data well with a linear model, or both. Since C3.2 is too small to be split into sub-clusters that would retain sufficient data to develop regression models, a possible development is to look for non-linear models to fit the C3.2 elements. For all clusters, $R^2_{adj}$ and p-value have been improved. In Table 3 it is possible to see that the models developed for C1 and C2 are slightly or not affected by multi-collinearity issues, while the sub-clusters C3.1 and C3.2 are largely affected: this means that the errors on the estimation of the regression coefficients for some predictors can be high. Consequently, the models cannot be used for data extrapolation because the uncertainty on the relationship between the normalized energy performance and some predictors can be too large. Thus, these linear regressions can give robust results only when the aim is to study and model the buildings in the clusters or groups of buildings with similar characteristics, without any purpose of extensive generalization. In some cases in the literature, Principal Component Regression (PCR) was used instead of multivariate regression to overcome multi-collinearity effects, and it could be a possible further development to improve the models. In [26], for example, in order to balance the efficiency and accuracy of benchmarking modeling, a selective residual-clustering benchmarking method is proposed for building envelope energy efficiency evaluation with multi- or high-dimensional data sets. The authors first calculated the variance inflation factor to determine the degree of multi-collinearity among explanatory variables. Then, they applied either simple multivariate regression analysis for the combinations of variables without multi-collinearity or principal component regression to a sample of 480 buildings. Through PCR, it has been noticed that the main uncorrelated components can be

Fig. 8. Actual vs. estimated energy consumption of schools in clusters C1, C2, C3, C3.1 and C3.2. The dotted lines indicate a deviation of ±20 %.


Table 4
Building closest to the centroid of each cluster.

Predictors        Units             CLUSTER 1    CLUSTER 2    CLUSTER 3.1   CLUSTER 3.2   CLUSTER 3
                                    CN 042-01    VB 049-01    CV 091-01     MB 083-02     CV 046-01
Avw               m²                5111.66      2080.31      1773.71       2217.54       2178.25
Ar                m²                5503.00      1720.60      1350.00       1612.00       1866.00
Af-g              m²                5503.00      1762.00      1341.00       1612.00       1861.00
Aenv,op           m²                10 328.23    2027.16      1881.26       3829.54       3983.55
Aenv,gl           m²                1842.26      738.31       508.23        493.28        576.34
U                 W m⁻² K⁻¹         0.82         0.95         1.07          1.27          1.09
S/V               -                 0.33         0.38         0.38          0.41          0.42
Awin/Avw          -                 0.36         0.35         0.29          0.22          0.26
Awin/Af           -                 0.18         0.17         0.16          0.16          0.15
Af                m²                10 185.00    4474.00      3205.00       3104.00       3739.00
H                 kW                1420.00      378.00       644.00        822.00        283.00
Aenv,gl/Aenv,op   -                 0.18         0.36         0.27          0.13          0.14
EP                Wh m⁻³ K⁻¹ h⁻¹    1.74         0.62         1.77          2.48          1.51

identified and more efficiently adopted to elaborate a regression model.
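
To make the multi-collinearity diagnostic concrete, the sketch below computes the variance inflation factor for the simplest case of a model with two predictors, where VIF = 1 / (1 - r^2) and r is the Pearson correlation between them. The general case requires regressing each predictor on all the others; the two-predictor shortcut, the function names and the sample values are assumptions chosen only to keep the example short.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF shared by both predictors of a two-predictor model: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical normalized values: two nearly proportional areas and one
# quantity that is only weakly related to the first.
roof_area = [0.1, 0.2, 0.3, 0.4, 0.5]
ground_area = [0.11, 0.19, 0.31, 0.42, 0.48]
transmittance = [0.5, 0.1, 0.4, 0.2, 0.3]

vif_collinear = vif_two_predictors(roof_area, ground_area)
vif_weak = vif_two_predictors(roof_area, transmittance)
```

In this toy case the nearly proportional pair exceeds the VIF > 10 threshold used in the tables, while the weakly related pair stays close to 1.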
As regards the comparison of the linear regression models with the measured data, the normalized values of the outputs (i.e., the normalized energy consumption) are represented for each cluster (Fig. 8). As can be seen, the model fits the measured data very well for C1 and, considering C2, most of the points are within the error band of ±20 %. Looking at the third graph, it can be observed that sub-clustering improved the fit of the models to the data, confirming what was already observed about the indexes of determination.
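
The share of buildings falling inside the deviation band can be computed as in the sketch below. It assumes that the 20 % deviation is measured relative to the actual (measured) value; the function names and the sample values are hypothetical and are not taken from the dataset.

```python
def within_band(actual, estimated, band=0.20):
    """True when the estimate deviates from the measured value by at most
    `band`, expressed as a fraction of the measured value."""
    return abs(estimated - actual) <= band * abs(actual)


def share_within_band(actual_values, estimated_values, band=0.20):
    """Fraction of (actual, estimated) pairs that fall inside the band."""
    pairs = list(zip(actual_values, estimated_values))
    return sum(within_band(a, e, band) for a, e in pairs) / len(pairs)


# Hypothetical normalized energy consumptions for four buildings.
actual = [0.20, 0.35, 0.50, 0.80]
estimated = [0.22, 0.30, 0.65, 0.78]
share = share_within_band(actual, estimated)
```

In this example three of the four estimates lie inside the band, so `share` is 0.75.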
In order to visualize all the parameters describing a particular building at once, parallel coordinates plots were used. As seen in Fig. 9, this kind of representation makes it possible to visualize multivariate data from each building using a number of parallel axes corresponding to the number of parameters. On each parallel axis, the normalized value of one parameter is shown by means of a line crossing the axis at that particular level. In this way the characteristics of the buildings can be compared with reference to the spread of each parameter. In the following analysis, the terms high, medium and low are relative to the sample considered.
In the case of cluster C1, buildings with roughly the biggest external wall areas are included, as well as those with medium values (0.4–0.6) for the S/V ratio. Even if this cluster seems the most dispersed with respect to parameters such as the area of opaque envelope Avw (with values between 0.2 and 1) and the average thermal transmittance U (going from 0.3 to 1), similarities between the buildings included in C1 are found. For example, the S/V ratio and the roof area Ar, which have a smaller dispersion, range between 0.4 and 0.6 for the former and from 0.2 to 1 for the latter. Energy consumption levels for all the buildings inside this group are between 0.1 and 0.4, thus including some of the buildings with lower energy demand per volume in the dataset.
With regard to the buildings inside C2, these are characterized by small roof and ground floor areas, with most of the schools within a range from 0.1 to 0.2, medium to high values (0.3–0.7) for the thermal transmittance and high values (between 0.4 and 1) for the shape factor S/V. Nevertheless, a small dispersion is found in the majority of the parameters defining this group, with the exception of the above-mentioned S/V and Awin/Af, both with values spanning from medium to high, and Aenv,gl/Aenv,op, for which values from low to medium (0.2–0.7) are found.

The buildings of the sample with the highest average thermal transmittance of the envelope are found in cluster C3 and, consequently, some of the schools with the highest levels of consumption per volume belong to this last cluster. Nevertheless, almost all of these schools have medium external wall areas (0.2–0.6) and small roof and ground floor areas, with values ranging from 0.1 to 0.3. With respect to sub-clusters C3.1 and C3.2, the main differences relate to Aenv,gl/Aenv,op, as well as to the reported energy consumption levels. In the case of C3.1, these quantities show an ample range of values, going from 0.4 to 1 for the former and from 0.2 to 1 for the latter. On the contrary, in C3.2 the dispersion is smaller and lower values are found for both parameters, with Aenv,gl/Aenv,op from 0.2 to 0.4 and the energy demand in the range between 0.1 and 0.3, which is the lowest in the whole dataset. Despite general similarities between the C3.1 and C3.2 quantities, the two sub-clusters have different energy consumption levels: buildings with medium to high energy consumption are grouped in C3.1, while schools with low energy demand are included in C3.2.
4.2. Reference buildings for each cluster
Once the centroid coordinates for each cluster and sub-cluster were defined, squared Euclidean distances from every building to its corresponding centroid were calculated in order to determine which building is the closest to the centroid inside each particular group (Table 4). Since these are the school buildings whose characteristics are most similar to the average values within their cluster, they are considered to be adequate reference buildings for the group they belong to.
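
The selection of a reference building can be sketched as follows: compute the centroid of a cluster as the mean of the normalized coordinates of its members and pick the member with the smallest squared Euclidean distance from it. The building codes and coordinates below are hypothetical, and the helper names are illustrative.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(rows):
    """Component-wise mean of a list of equally long coordinate tuples."""
    return tuple(sum(col) / len(rows) for col in zip(*rows))


def reference_building(cluster):
    """Given a dict mapping building code -> normalized coordinates, return
    the code closest to the cluster centroid and its squared distance."""
    c = centroid(list(cluster.values()))
    code = min(cluster, key=lambda k: sq_dist(cluster[k], c))
    return code, sq_dist(cluster[code], c)


# Hypothetical cluster of three schools described by three normalized parameters.
cluster = {
    "S-01": (0.2, 0.4, 0.6),
    "S-02": (0.3, 0.5, 0.5),
    "S-03": (0.7, 0.6, 0.1),
}
code, distance = reference_building(cluster)
```

In this toy cluster the centroid is (0.4, 0.5, 0.4) and the school labeled S-02 is the closest to it.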
With regard to cluster C1, a distance of 0.1821 was measured from the centroid to the school identified with the code CN 042-01. This building belongs to a technical institute located in the town of Conegliano. It has a total floor area of 10 185 m², the largest among the reference buildings. Its average transmittance (U = 0.82 W m⁻² K⁻¹) is the lowest and, even if this is a relatively compact building, it has the largest external wall surface. In contrast, school VB 049-01 is the second smallest reference building and also has the lowest energy consumption per conditioned volume. It is situated in Valdobbiadene and is the closest to the cluster C2 centroid, with a total distance of 0.0924. The distance from the cluster C3 centroid to the nearest school is 0.1016 for building CV 046-01, which is located in Castelfranco Veneto. From the centroid of


Fig. 9. Diagrams in parallel axes representing the coordinates (normalized parameters) of the schools belonging to each cluster and the centroid coordinates of clusters C1, C2, C3, C3.1 and C3.2. The gray dots on the x-axis of each graph indicate the parameters used for the optimized regression.


C3.1, the smallest distance to a school is 0.1581, for building CV 091-01 (in Castelfranco Veneto), whereas school MB 083-02 (situated in Montebelluna) is the closest to the C3.2 centroid with a total distance of 0.0707. With regard to their particular characteristics, the latter is quite similar in geometry to the C3 reference building, whereas parameters with lower values are generally found in the former.

5. Conclusion
In this work, we discussed the problem of classifying and modeling the existing building stock and identifying a limited number of representative buildings in order to develop strategies for an extensive refurbishment according to the cost-optimal approach suggested in the Commission Delegated Regulation [12]. We studied a sample of almost 60 schools in the province of Treviso, Italy, and adopted a modified K-means approach as the methodology for their classification. Multiple Linear Regression techniques were used to drive and validate the cluster analysis, demonstrating some potential to determine the classification and to lead the clustering strategy.
The main aspects of the proposed method are the following:
(1) Starting from a multi-dimensional domain composed of all the descriptive quantities of the school sample and correlated with their normalized energy consumption, it was possible to reduce the variables from 12 to the main 6.
(2) Once the best group of quantities was defined, the clustering was performed and the results validated by studying the variation of the adjusted index of determination ($R^2_{adj}$) and other statistics, such as p-values, F-values and variance inflation factors, considered for diagnostic purposes. The method was iterated on more levels, until clustering was no longer meaningful and verifiable.
(3) The data in the clusters were studied and described with optimized regressions. For some clusters, such as C1, C2 and C3.1, a good data fit was achieved by the developed linear models: the adjusted index of determination is almost 0.7 for C2 and more than 0.9 for C1 and C3.1. However, some diagnostics revealed statistical multi-collinearity issues, in particular for the sub-clusters and for the optimized regressions. This underlined the impossibility of a robust use of the models found for extrapolation and the need to further investigate the data with alternative approaches, such as principal component regression or some non-linear methods. Otherwise, the dataset should be populated with more points (not necessarily with more buildings but with more annual energy consumption data) to allow for more robust findings.
(4) Even if there are some limitations in the extent of the models outside the dataset, homogeneous groups and models have been properly defined and, subsequently, also their centroids. The schools closest to those centroids have been determined. Their floor area ranges from around 3100 m² (C3.2) to more than 10 000 m² (C1), with an installed heating capacity from less than 400 kW (C2) to more than 1400 kW (C1), an average thermal transmittance of the envelope from 0.82 W m⁻² K⁻¹ (C1) to 1.27 W m⁻² K⁻¹ (C3.2), an S/V ratio from 0.33 (C1) to 0.41 (C3.2) and a ratio between transparent and opaque envelope from 0.13 (C3.2) to 0.36 (C2).
After the identification of the representative schools, it is now possible to classify the schools according to a priority intervention list, to apply the cost-optimal approach and to find out the most convenient retrofit solutions. By means of the regression models, the achievable energy and economic savings can be estimated for the whole clusters. Moreover, some comparisons can be conducted on the data collected during the next years of the survey, with the purpose of investigating the fault detection capabilities of the approach or of assessing the effects of some already implemented energy saving measures.

Acknowledgment
The authors would like to thank the Province of Treviso (Provincia di Treviso) for making the schools database available for this
research.

References
[1] Legambiente, Ecosistema Scuola 2012 – XIII Rapporto di Legambiente sulla qualità dell'edilizia scolastica, delle strutture e dei servizi (XIII Legambiente report on the quality, facilities and services of school buildings), 2012 (in Italian).
[2] E. Antonini, M. Boscolo, F. Cappelletti, P. Romagnoni, Riqualificazione di edifici scolastici: risultati di una campagna di monitoraggio energetico (Schools refurbishment: results of an energy monitoring campaign), in: Proceedings of the 4th Energy Forum, Bressanone, 2009, pp. 139–143 (in Italian).
[3] C. Filippín, Benchmarking the energy efficiency and greenhouse gases emissions of school buildings in central Argentina, Build. Environ. 35 (2000) 407–414.
[4] U. Desideri, S. Proietti, Analysis of energy consumption in the high schools of a province in central Italy, Energy Build. 34 (2002) 1003–1016.
[5] S.P. Corgnati, V. Corrado, M. Filippi, A method for heating consumption assessment in existing buildings: a field survey concerning 120 Italian schools, Energy Build. 40 (5) (2008) 801–809.
[6] S.P. Corgnati, T. Bellone, F. Ariaudo, Previsione dei consumi per il riscaldamento ambientale degli edifici esistenti con approccio statistico: il caso delle scuole (Determination of energy consumption for space heating in existing buildings with statistical approach: the case of schools), in: Proceedings of the 64th National Congress ATI, Montesilvano, Italy, 2009 (in Italian).
[7] V. Butalak, P. Novak, Energy consumption and potential energy savings in old school buildings, Energy Build. 29 (1999) 241–246.
[8] A. Dimoudi, P. Kostarela, Energy monitoring and conservation potential in school buildings in the C climatic zone of Greece, Renew. Energy 34 (2009) 289–296.
[9] N. Gaitani, C. Lehmann, M. Santamouris, G. Mihalakakou, P. Patargias, Using principal component and cluster analysis in the heating evaluation of the school building sector, Appl. Energy 87 (6) (2010) 2079–2086.
[10] P. Hernandez, K. Burke, J.O. Lewis, Development of energy performance benchmarks and building energy ratings for non-domestic buildings: an example for Irish primary schools, Energy Build. 40 (2008) 249–254.
[11] CEN, EN 15217:2007 Performance of buildings – Methods for expressing energy performance and for energy certification of buildings, Brussels, Belgium, 2007.
[12] European Commission, Commission Delegated Regulation (EU) No 244/2012 of 16 January 2012 supplementing Directive 2010/31/EU, 2012.
[13] M. Santamouris, G. Mihalakakou, P. Patargias, N. Gaitani, K. Sfakianaki, M. Papaglastra, C. Pavlou, P. Doukas, E. Primikiri, V. Geros, M.N. Assimakopoulos, R. Mitoula, S. Zerefos, Using intelligent clustering techniques to classify the energy performance of school buildings, Energy Build. 39 (2007) 45–51.
[14] F. Encinas, A. De Herde, Definition of occupant behavior patterns with respect to ventilation for apartments from real estate market in Santiago de Chile, Sustain. Cities Soc. 1 (2011) 38–44.
[15] C. Filippín, F. Ricard, S. Flores Larsen, Evaluation of heating energy consumption patterns in the residential building sector using stepwise selection and multivariate analysis, Energy Build. 66 (2013) 571–581.
[16] S. Petcharat, S. Chungpaibulpatana, P. Rakkwamsuk, Assessment of potential energy saving using cluster analysis: a case study of lighting systems in buildings, Energy Build. 52 (2012) 145–152.
[17] A.A. Famuyibo, A. Duffy, P. Strachan, Developing archetypes for domestic dwellings – an Irish case study, Energy Build. 50 (2012) 150–157.
[18] Z. Yu, F. Haghighat, B.C.M. Fung, E. Morofsky, H. Yoshino, A methodology for identifying and improving occupant behavior in residential buildings, Energy 36 (2011) 6596–6608.
[19] P.R.S. Jota, V.R.B. Silva, F.G. Jota, Building load management using cluster and statistical analyses, Int. J. Electr. Power Energy Syst. 33 (8) (2011) 1498–1505.

[20] M. Heidarinejad, M. Dahlhausen, S. McMahon, C. Pyke, J. Srebric, Cluster analysis of simulated energy use for LEED certified U.S. office buildings, Energy Build. 85 (2014) 86–97.
[21] X. Gao, A. Malkawi, A new methodology for building energy performance benchmarking: an approach based on intelligent clustering algorithm, Energy Build. 84 (2014) 607–616.
[22] M. Halkidi, Y. Batistakis, M. Yazirgiannis, On clustering validation techniques, J. Intell. Inf. Syst. 17 (2/3) (2001) 107–145.
[23] Italian Government, Minister Decree 26/06/2009: National guidelines for buildings energy labelling, Gazzetta Ufficiale, 2009.

[24] S.P. Lloyd, Least Squares Quantization in PCM, Bell Telephone Labs Memorandum, Murray Hill, NJ, 1957 (Reprinted in: IEEE Trans. Information Theory IT-28 (1982) vol. 2, pp. 129–137).
[25] W. Junjie, Advances in K-means clustering. A data mining thinking, Springer Thesis, Springer, 2012.
[26] W. Wang, Z. Shen, K. Grosskopf, Benchmarking energy performance of building envelopes through a selective residual-clustering approach using high dimensional dataset, Energy Build. 75 (2014) 10–22.
