
Paper accepted for presentation at PPT 2001, 2001 IEEE Porto Power Tech Conference, 10th-13th September, Porto, Portugal

Load pattern clustering for Short-Term Load Forecasting of anomalous days



Gianfranco Chicco, Member, IEEE, Roberto Napoli, Member, IEEE, Federico Piglione

Abstract- Load forecasting algorithms try to capture regular behaviours in historical load time series in order to produce accurate forecasts. The presence of anomalous days (holidays, working days between holidays, social events) is a serious drawback and requires a dedicated forecast. The successful application of Artificial Neural Networks (ANN) in this field has suggested the use of the Kohonen Self-Organising Map for clustering similar load patterns and classifying day typologies. To evaluate the benefits of this choice, this work compares the Kohonen map with a classic clustering algorithm, both applied to grouping daily load patterns into homogeneous sets. The information gathered from the clustered data is then applied to the 24-hour-ahead load forecasting of anomalous days by means of an ANN-based approach. The results show that the combined use of both clustering techniques allows a better understanding of the anomalous load patterns.
Index Terms-Short-term load forecasting, cluster analysis, artificial neural networks, self-organising map, radial basis function.

Paper PPT-377 accepted for presentation at the IEEE Porto Power Tech 2001 Conference, Porto, Portugal, September 10-13, 2001. The authors are with the Dipartimento di Ingegneria Elettrica Industriale, Politecnico di Torino, corso Duca degli Abruzzi 24, I-10129 Torino, Italy (E-mail: chicco@polito.it, napoli@polito.it, piglione@athena.polito.it).

I. INTRODUCTION
Short-Term Load Forecasting (STLF) predicts the power system load 1-7 days in advance. It is therefore one of the main tools in power system management, allowing safe and economical operation. Traditional STLF procedures are based on statistical approaches, such as multiple regression, the Box-Jenkins method and spectral decomposition. In the last decade, Artificial Neural Networks (ANN) have increasingly been regarded as an effective approach to the STLF problem [1]. Regardless of the method employed, it is widely recognised that better forecasts are obtained when the historical time series exhibit a regular behaviour that the forecast algorithm can easily capture. Anomalous days (holidays, working days between holidays), however, require dedicated forecasts. To forecast the next day's load, the human expert searches the historical database for similar days and drafts a forecast, which he subsequently refines using current information on weather and social events. In the same way, grouping similar daily load patterns allows prediction algorithms to reduce the forecasting error.

In ANN-based STLF, a cluster of similar load patterns, belonging to some day typology, forms the Training Set (TS) of a feed-forward ANN that learns to forecast that specific day typology. The method referred to in [2] uses an unsupervised algorithm that forms several clusters of similar load patterns; afterwards, a dedicated Functional Link Network (FLN) is trained for each cluster. In References 3 and 4, a Self-Organising Map (SOM) assists the human expert in finding the TS of a dedicated MultiLayer Perceptron (MLP). Moreover, the SOM has been directly employed as an associative memory to find a first guess of the forecast load profile [5]. There are, however, some differences between using the SOM and using clustering algorithms. The SOM recognises clusters of similar patterns in a set of raw data very well. Moreover, it has the interesting property of topological preservation, i.e. similar data are grouped together in the same region of the map, because the SOM projects the n-dimensional space of the input data onto the two-dimensional grid of the map units. This feature is very useful for visual inspection of the peculiarities of the data set, though some skill is required to train the SOM correctly. Topological preservation does not exist in clustering algorithms, which only split the data into several classes. If only the latter information is needed, the problems concerning the design and training of the SOM can be avoided. This paper compares the performance of the SOM and of clustering algorithms in the classification of daily load patterns, mainly with regard to the anomalous days problem. First, the algorithms and their features are summarised. Afterwards, some anomalous periods are considered and the classification uncertainties discussed. Finally, the hourly load forecast of the anomalous days is performed by means of the clustered data and a forecasting model based on the Radial Basis Function (RBF) ANN.

II. THE SELF-ORGANISING MAP


The SOM, developed by Kohonen [6,7], is an unsupervised, topology-preserving ANN that performs a clustering analysis of the input data. It is composed of a predefined grid of units (usually two-dimensional) forming a competitive layer (only one unit is activated at the presentation of each input sample). Each unit is represented by an n-dimensional weight vector, belonging to the n-dimensional input space, and by its position in the grid. The learning algorithm, inspired by biological considerations, changes not only the weights of the winning unit (the unit whose weight vector is nearest to the presented sample, according to some distance criterion), but also the weights of its neighbours, to an extent that decreases with their distance from the winning unit (neighbourhood function). During training, separate areas (bubbles of activity), which follow the probability distribution of the TS samples, emerge spontaneously in the map. The most attractive feature of the SOM is that, once trained, the map represents the projection of the TS data from an n-dimensional space onto a two-dimensional one. The mutual distances between the bubbles then roughly reflect those in the original data space. The topological preservation of the input structure and the simulation of bubbles of activity distinguish this approach from the traditional unsupervised clustering approaches.

0-7803-7139-9/01/$10.00 ©2001 IEEE
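The training rule described above can be illustrated with a minimal online SOM in Python/NumPy. This is a sketch under stated assumptions, not the SOM_PAK implementation used by the authors: the Gaussian neighbourhood, the initialisation from random training samples and the radius floor of one unit are our choices. The defaults mirror the parameters reported for Case I (10x10 grid, initial learning rate 0.05 decreasing linearly to zero, initial radius 10, 10000 steps). The function names `train_som` and `winners` are hypothetical.

```python
import numpy as np


def train_som(data, rows=10, cols=10, steps=10000, lr0=0.05, radius0=10.0, seed=0):
    """Online Kohonen SOM training (illustrative sketch).

    data: array of shape (n_samples, n_features), e.g. 24-hour load profiles.
    The learning rate decays linearly from lr0 to zero; the neighbourhood
    radius decays linearly from radius0 (assumed floor of 1 grid unit).
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    n = data.shape[0]
    # (row, col) coordinates of every unit on the rectangular grid
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # assumption: weights initialised with randomly drawn training samples
    weights = data[rng.integers(0, n, size=rows * cols)].copy()
    for t in range(steps):
        frac = 1.0 - t / steps
        lr = lr0 * frac
        radius = max(radius0 * frac, 1.0)
        x = data[rng.integers(0, n)]
        # winner: unit whose weight vector is nearest (Euclidean) to the sample
        winner = np.argmin(np.linalg.norm(weights - x, axis=1))
        # Gaussian neighbourhood: the update shrinks with grid distance from the winner
        grid_dist = np.linalg.norm(grid - grid[winner], axis=1)
        h = np.exp(-grid_dist**2 / (2.0 * radius**2))
        weights += lr * h[:, None] * (x - weights)
    return weights, grid


def winners(weights, data):
    """Index of the winning unit for each sample."""
    data = np.asarray(data, dtype=float)
    d = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(d, axis=1)
```

Mapping each day's profile through `winners` and reading `grid[idx]` gives the map coordinates on which day numbers can be superimposed, as in the SOM maps shown later.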

III. CLUSTERING ALGORITHMS

Unsupervised clustering algorithms, such as k-means, Isodata and Maximin distance [8], are able, like the SOM, to discover regularities in data sets. In this work we employed the algorithm proposed by Pao [2,9], whose main features are efficiency and simplicity. The algorithm is a variant of the classic 'follow the leader' approach and requires neither an initial guess of the cluster centre coordinates nor the initial number of clusters. The clustering process is controlled by a threshold called the Vigilance Parameter (VP) and uses the Euclidean metric. The first sample is selected as the centre of the first cluster. The next sample is then compared with the first cluster centre: if their Euclidean distance is smaller than the VP, the sample is clustered with the first one; otherwise, it becomes the centre of a new cluster. This process is repeated for all samples, and reiterated until a stable cluster formation occurs. The VP therefore acts as the average radius of the clusters. A good heuristic guess for the VP is half of the Mean Euclidean Distance (MED) among the N samples $\mathbf{x}_i$, defined by:

$$\mathrm{MED} = \frac{2}{N(N-1)} \sum_{i=1}^{N-1} \sum_{j=i+1}^{N} \lVert \mathbf{x}_i - \mathbf{x}_j \rVert \qquad (1)$$

The computational speed of the Pao algorithm easily allows repeated trials in order to find a satisfactory value of the VP.

IV. APPLICATION TO LOAD PATTERN CLUSTERING

The performance of the SOM and of the Pao unsupervised clustering algorithm has been compared in the task of clustering daily load patterns. The aim is to detect the anomalous days and to obtain an adequate grouping of similar load patterns.

For the numerical tests we employed hourly load data extracted from the database of a small Italian electric utility, which supplies a mixed industrial, commercial and residential load. The winter weekday peak load is about 270 MW, with a base load of 100 MW. A database with three years (1995-1997) of hourly load data was available. These data were analysed by the authors in Reference 10 in order to develop an ANN-based load forecasting method for ordinary weekdays. A correlation analysis showed that the time series is nearly autoregressive, since the influence of the exogenous variables is weak: the correlation with weather variables is low, so that they act mostly as a seasonal effect. Two cases are presented: a winter month (December 1996) with several holidays and some anomalous days (days between holidays), and a spring month (April 1996) which includes the Easter week.

Case I. December 1996

Fig. 1 shows the monthly load pattern of December 1996. The first day is a Sunday; then, beginning from the fourth Sunday, the load shape becomes anomalous owing to the Christmas holidays.

Fig. 1. December 1996: monthly load pattern (load in MW versus hours).

The TS is composed of 31 daily load patterns of 24 elements. First, we trained a SOM grid of 10x10 units. The initial learning rate, decreasing linearly to zero during training, was 0.05, and the initial radius (in units) of the training area was 10. A training session of 10000 steps was performed. The resulting map is shown in Fig. 2, where the radius of the circles representing the units is proportional to the number of points classified by each unit, and the day numbers are superimposed on the map. Four main activity bubbles, which can be separated by decision boundaries, are evident:

A) normal working days;
B) Sundays 1, 8, 15, 22;
C) Saturdays 7, 14, 21, Christmas and New Year Eves;
D) Christmas Day, Thursday 26 (holiday), and Sunday 29.

Moreover, some anomalous days are classified separately: the 23rd, 27th, 28th and 30th (working days enclosed between holidays), whose attribution to one of the four main bubbles by visual inspection is rather doubtful.
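The Pao 'follow the leader' procedure of Section III, the half-MED heuristic for the VP, and the Mean Quantisation Error (MQE) used in this section to compare the two methods can be sketched in Python/NumPy as follows. This is our reading of the textual description, not the code of [2,9]: moving each centre to the mean of its members after every pass, and stopping when the labels no longer change, are assumptions. All function names are hypothetical. Note that the VP values actually chosen in this paper (100 MW against an MED of 207 MW for December, 80 MW against 190 MW for April) follow the half-MED guess only approximately.

```python
import numpy as np


def mean_euclidean_distance(x):
    """MED: mean Euclidean distance over all distinct pairs of samples."""
    x = np.asarray(x, dtype=float)
    d = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)
    i, j = np.triu_indices(len(x), k=1)
    return d[i, j].mean()


def pao_cluster(x, vp, max_passes=50):
    """'Follow the leader' clustering controlled by the vigilance parameter vp.

    Returns (labels, centres). Assumption: after each pass every centre is
    moved to the mean of its members; passes repeat until labels are stable.
    """
    x = np.asarray(x, dtype=float)
    centres = x[:1].copy()                 # first sample is the first centre
    labels = None
    for _ in range(max_passes):
        new = np.empty(len(x), dtype=int)
        for i, xi in enumerate(x):
            dist = np.linalg.norm(centres - xi, axis=1)
            k = int(np.argmin(dist))
            if dist[k] < vp:               # within vigilance: join nearest cluster
                new[i] = k
            else:                          # otherwise the sample leads a new cluster
                centres = np.vstack([centres, xi])
                new[i] = len(centres) - 1
        used = np.unique(new)              # drop clusters that lost all members
        centres = np.array([x[new == k].mean(axis=0) for k in used])
        new = np.searchsorted(used, new)   # relabel clusters as 0..K-1
        if labels is not None and np.array_equal(new, labels):
            break
        labels = new
    return labels, centres


def mean_quantisation_error(x, labels, centres):
    """MQE with each sample's cluster centre playing the role of the winner."""
    x = np.asarray(x, dtype=float)
    return np.linalg.norm(x - centres[labels], axis=1).mean()
```

Typical use is `vp = mean_euclidean_distance(profiles) / 2`, followed by a few reruns of `pao_cluster` with nearby VP values; since each run is cheap, this is the repeated trial-and-error search the text refers to.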

Fig. 2. SOM trained on December 1996.

A useful index of the quantisation quality of the SOM is the Mean Quantisation Error (MQE), given by:

$$\mathrm{MQE} = \frac{1}{N} \sum_{i=1}^{N} \lVert \mathbf{x}_i - \mathbf{m}_i \rVert \qquad (2)$$

where $\mathbf{x}_i$ is the generic sample and $\mathbf{m}_i$ the winning map unit for the sample $\mathbf{x}_i$. For the map of Fig. 2 the MQE was 11.4 MW. To apply the Pao clustering algorithm to the same case, we computed the MED of the TS, which was 207 MW; we therefore chose 100 MW as the initial guess of the VP. The algorithm produced the four clusters shown in Table I. They are very similar to those deduced by the SOM, and group the working days (cluster 1), the Sundays (cluster 2), the Saturdays and the holiday eves (cluster 3), and finally the Christmas holidays (cluster 4). The main difference is that the anomalous days (December 23, 27, 28, 30), which the SOM classified separately, now belong to clusters 2 and 3.

TABLE I - December 1996: clustered load patterns
Cluster 1: Mo 2, Tu 3, We 4, Th 5, Fr 6, Mo 9, Tu 10, We 11, Th 12, Fr 13, Mo 16, Tu 17, We 18, Th 19, Fr 20
Cluster 2: Su 1, Su 8, Su 15, Su 22, Sa 28
Cluster 3: Sa 7, Sa 14, Sa 21, Mo 23, Tu 24, Fr 27, Mo 30, Tu 31
Cluster 4: We 25, Th 26, Su 29

The MQE computed for the Pao clusters is 33.3 MW. This value is larger than that obtained by the SOM, but the SOM usually has many units in the same activity bubble, and therefore the average distance between samples and grid units is smaller. On the other hand, the data grouping produced by the clustering algorithm, which yields only a few cluster centres, is more definite. In this case, the anomalous days have been forced into the main clusters.

Case II. April 1996

The TS is composed of 30 daily load patterns with 24 elements each. Also in this case, we used a SOM grid of 10x10 units. In the resulting map (Fig. 3), with an MQE of 12.2 MW, four main activity bubbles are evident:

A) normal working days;
B) Sundays 14th, 21st, 28th, and Thursday 25th (Italian national holiday);
C) Saturdays 6th, 13th, 20th, 27th;
D) Sunday 7th (Easter) and Monday 8th (holiday).

Moreover, there is an anomalous day (Friday 26th), classified separately because the nearby holiday encourages a long weekend. The Pao clustering algorithm, applied with VP = 80 MW (the MED was 190 MW), produced the five clusters shown in Table II. In this case, the classification was identical to that of the SOM.

Fig. 3. SOM trained on April 1996.

TABLE II - April 1996: clustered load patterns
Cluster 1: Mo 1, Tu 2, We 3, Th 4, Fr 5, Tu 9, We 10, Th 11, Fr 12, Mo 15, Tu 16, We 17, Th 18, Fr 19, Mo 22, Tu 23, We 24, Mo 29, Tu 30
Cluster 2: Su 14, Su 21, Th 25, Su 28
Cluster 3: Su 7, Mo 8
Cluster 4: Sa 6, Sa 13, Sa 20, Sa 27
Cluster 5: Fr 26

V. LOAD FORECASTING OF ANOMALOUS DAYS

The main purpose of the clustering is to find classes of similar load patterns in the historical database that, with a suitable calendar transposition, allow the selection of an adequate TS for the present forecast period. This task is not trivial when forecasting anomalous days, for which the TS has to be picked carefully.

A. Cluster-based forecast model

We employed an original RBF-based algorithm, previously described in Reference 10. Because anomalous days have no meaningful correlation with the immediately preceding days, the method employs 24-hour load profiles picked from the same cluster. The load forecasting model has been modified accordingly:

(3)

where:
t: time (hour)
L(t): hourly load at time t
h(t): hour-of-day number (1 ≤ h ≤ 24)

The forecast procedure is then arranged as follows:
(i) Cluster the load profiles of a given period (e.g. a month) of the previous year and check for anomalous days.
(ii) Locate the anomalous day in the corresponding calendar position in the present year.
(iii) According to model (3), build a TS composed of the load profiles of the present year in the corresponding calendar position.
(iv) Use the RBF in train/recall mode according to the method of Reference 10.
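Equation (3) did not survive in this copy, and the fast-learning RBF of Reference 10 is not described here, so the following is only a hypothetical Python/NumPy sketch of steps (iii) and (iv). It assumes that the model maps the hour number h(t) to the load L(t), uses fixed, evenly spaced Gaussian centres (12 centres, width 2 hours, both our choices), and fits the linear output weights by least squares. With the hour number as its only input, such a network essentially returns a smoothed average of the TS profiles. The names `train_rbf`, `recall_rbf` and `gaussian_design` are hypothetical.

```python
import numpy as np

HOURS = np.arange(1, 25, dtype=float)      # h(t) = 1..24


def gaussian_design(h, centres, width):
    """Matrix of Gaussian basis functions evaluated at hour numbers h."""
    return np.exp(-((h[:, None] - centres[None, :]) ** 2) / (2.0 * width**2))


def train_rbf(ts_profiles, n_centres=12, width=2.0):
    """Fit linear output weights of a Gaussian RBF net h(t) -> L(t).

    ts_profiles: (n_days, 24) array, the similar days forming the TS.
    Centres are fixed and evenly spaced (assumption); weights by least squares.
    """
    ts = np.asarray(ts_profiles, dtype=float)
    centres = np.linspace(1.0, 24.0, n_centres)
    h = np.tile(HOURS, ts.shape[0])          # hour number of every training target
    phi = gaussian_design(h, centres, width)
    w, *_ = np.linalg.lstsq(phi, ts.ravel(), rcond=None)
    return centres, width, w


def recall_rbf(model):
    """Forecast the 24 hourly loads with a trained model."""
    centres, width, w = model
    return gaussian_design(HOURS, centres, width) @ w
```

For Saturday 27 December 1997, for instance, `ts_profiles` would stack the profiles of Sundays 7, 14 and 21 December 1997 and of Saturday 28 December 1996, following the example discussed below.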

B. Forecast examples
As a first example, let us consider 28 December 1996, a Saturday enclosed between the Christmas holidays, which the SOM classified ambiguously (see Fig. 2). By contrast, the Pao algorithm placed that day in cluster 2, together with the December Sundays. This is confirmed by visual inspection of the load patterns. Fig. 4 shows the load patterns of the five days grouped in cluster 2 of Table I. The load profile of Saturday 28 (line with circles) is clearly anomalous and very similar to those of the four December Sundays.

Fig. 4. Load profiles of cluster 2 of Table I.

To exploit in December 1997 the information gathered in the previous year, the corresponding calendar position must be taken into account. In 1997 the day corresponding to Saturday 28 December 1996 is Saturday 27 December. The hypothesis is therefore that Saturday 27 would be an anomalous day belonging to the cluster of the Sundays of December 1997. The load profiles of the first three Sundays of December 1997 are shown in Fig. 5, together with the actual load profile of Saturday 27 (line with circles). The hypothesis is verified, as is also shown in Fig. 6, where the very different load pattern of a normal Saturday (20 December) is presented together with the profiles of Fig. 5. Finally, the load forecast of Saturday 27 is obtained by training an RBF with the available similar load patterns, i.e. Sundays 7, 14 and 21 December 1997 and Saturday 28 December 1996. The actual and forecast load patterns are shown in Fig. 7. The Mean Absolute Percentage Error (MAPE) was 4.4%, with a maximum absolute error of 13.8 MW.

Fig. 5. Load profiles of 7, 14, 21, 27 December 1997.

Fig. 6. Load profiles of 7, 14, 20, 21, 27 December 1997.

Fig. 7. Load forecasting of Saturday 27 December 1997.
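The paper reports forecast accuracy as MAPE and maximum absolute error but does not print their formulas. The small sketch below assumes the usual definitions (mean of |forecast - actual| / actual, expressed in percent, and the largest hourly |forecast - actual|). The function name `forecast_errors` is hypothetical. As a rough cross-check of the figures quoted in the conclusion, a MAPE of 4% on an average load of about 200 MW, which is what the reported 4% ≈ 8 MW correspondence implies, gives 0.04 × 200 = 8 MW.

```python
import numpy as np


def forecast_errors(actual, forecast):
    """Return (MAPE in %, maximum absolute error) of an hourly forecast.

    Assumes the usual MAPE definition: mean of |forecast - actual| / actual.
    """
    actual = np.asarray(actual, dtype=float)
    forecast = np.asarray(forecast, dtype=float)
    abs_err = np.abs(forecast - actual)
    mape = 100.0 * np.mean(abs_err / actual)
    return mape, float(abs_err.max())
```

For example, `forecast_errors([100.0, 200.0], [110.0, 190.0])` gives a MAPE of 7.5% and a maximum absolute error of 10 MW. The same 10 MW error counts twice as much, in percentage terms, at the lower load.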

Similar results have been obtained in the forecast of Tuesday 30 December 1997, an anomalous weekday between the Christmas and New Year holidays. The corresponding day in the previous year is Monday 30 December 1996, which the SOM classified ambiguously (see Fig. 2) but which belongs to cluster 3 in Table I, together with the Saturdays and some other anomalous days. Let us apply this information to December 1997. The load profiles of the first three Saturdays of December 1997 and of Christmas Eve are shown in Fig. 8, together with the actual load profile of Tuesday 30 (line with circles).
Fig. 8. Load profiles of 6, 13, 20, 24, 30 December 1997.

To forecast the load profile of Tuesday 30, we therefore built a TS composed of 6, 13, 20 and 24 December 1997 and Monday 30 December 1996. The result is shown in Fig. 9. The MAPE is 3.3%, with a maximum absolute error of 13.9 MW.

Fig. 9. Load forecasting of Tuesday 30 December 1997.

Finally, we applied this method to a spring midweek holiday, 25 April (Italian national holiday). For 1996, both the SOM and the Pao algorithm classify this holiday together with the April Sundays, as shown in Fig. 3 and Table II. Sunday 7 April 1996 is Easter Sunday and obviously forms a separate cluster. These load profiles are shown in Fig. 10, where Thursday 25 April is represented by a line with circles. The corresponding load profiles in April 1997, presented in Fig. 11, show the same similarities as in the previous year. The TS is composed of Sundays 6, 13 and 20 April 1997 and Thursday 25 April 1996. The forecast and actual load profiles are compared in Fig. 12. The MAPE is 4.3%, with a maximum absolute error of 14 MW.

Fig. 10. Load profiles of 14, 21, 25, 28 April 1996.

Fig. 11. Load profiles of 6, 13, 20, 25 April 1997.

Fig. 12. Load forecasting of Friday 25 April 1997.

VI. DISCUSSION AND CONCLUSION

The previous results show that the SOM performs a detailed clustering, which highlights the similarities among patterns and, above all, facilitates understanding through visual representation. However, in some cases it can be difficult to draw the decision boundaries that separate the clusters. In most of the presented examples, only visual inspection of the load patterns allowed the correct boundary to be drawn. The problem could derive from the nature of the data, but also from the grid shape and the training parameters. Therefore, obtaining a satisfactory map often requires a time-consuming trial-and-error procedure.

The Pao clustering algorithm does not produce a refined visual projection of the data space; nevertheless, it performs a fast and efficient clustering. In fact, the pattern clusters in the previous examples are almost identical to those obtained by the SOM. The advantage is that there is only one training parameter, the VP, and the execution time is almost negligible. Obviously, the quantisation error can be reduced by reducing the VP. The optimal clustering then becomes a compromise between the average cluster variance and the number of clusters. In this case too, a trial-and-error procedure is required, but it is made easier by the higher execution speed.

In forecasting the hourly load pattern of anomalous days, the combined use of both clustering methods proved very effective in discovering hidden similarities. The experimental results are quite good for anomalous days: a MAPE of 4% corresponds in our data to an average absolute error of about 8 MW, even though the useful historical references for anomalous days lie rather far in the past and short-term effects are hard to capture.

We conclude that the SOM produces a useful planar map of the clustered data, but sometimes it does not explain the data that fall outside the activity bubbles. On the other hand, the Pao algorithm (like other clustering algorithms) is less suited to the visual approach, but allows a simple, direct measure of pattern similarity. Therefore, the judicious use of both methods seems to be the best approach.

VII. REFERENCES

[1] H.S. Hippert, C.E. Pedreira, R.C. Souza, Neural networks for short-term load forecasting: a review and evaluation, IEEE Transactions on Power Systems, Vol. 16, No. 1, Feb. 2001, pp. 44-55.
[2] M. Djukanovic, B. Babic, D.J. Sobajic, Y.-H. Pao, Unsupervised/supervised learning concept for 24-hour load forecasting, IEE Proceedings-C, Vol. 140, 1993, pp. 311-318.
[3] Y.-Y. Hsu, C.-C. Yang, Design of artificial neural networks for short-term load forecasting. Part 1: Self-organising feature maps for day type identification, IEE Proceedings-C, Vol. 138, 1991, pp. 407-413.
[4] R. Lamedica, A. Prudenzi, M. Sforna, M. Caciotta, V. Orsolini Cencelli, A neural network based technique for short-term forecasting of anomalous load periods, IEEE Transactions on Power Systems, Vol. 11, No. 4, Nov. 1996, pp. 1749-1756.
[5] A.J. Germond, N. Macabrey, T. Baumann, Application of artificial neural networks to load forecasting, Proc. INNS Summer Workshop Neural Networks Computing for the Electric Power Industry, Stanford, CA, August 17-19, 1992, pp. 165-171.
[6] T. Kohonen, Self-Organisation and Associative Memory (3rd edn.), Springer-Verlag, Berlin, 1989.
[7] T. Kohonen et al., SOM-PAK, The Self-Organising Map Program Package, Users Guide, Helsinki University of Technology, 1995.
[8] J.T. Tou, R.C. Gonzalez, Pattern Recognition Principles, Addison-Wesley, 1974.
[9] Y.-H. Pao, Adaptive Pattern Recognition and Neural Networks, Addison-Wesley, Reading, MA, USA, 1989.
[10] E. Bompard, E. Carpaneto, G. Chicco, R. Napoli, F. Piglione, Short-term load forecasting of a small electric utility by a fast learning RBF neural network, Proc. PMAPS 2000, Funchal, Madeira, Portugal, September 25-28, 2000, Vol. 2, paper FOR-129.