HELSINKI UNIVERSITY OF TECHNOLOGY   ABSTRACT OF MASTER'S THESIS
Author: Pauli Murto
Title: Neural network models for short-term load forecasting
Date: January 5, 1998   Pages: 92
Supervisor: Raimo P. Hämäläinen
Instructor: Arto Juusela
Abstract:
Neural network techniques have recently been suggested for short-term load forecasting by a large number of researchers. This work studies the applicability of such models and is intended to serve as a basis for a real forecasting application. First, a literature survey was conducted on the subject. Most of the reported models are based on the so-called Multi-Layer Perceptron (MLP) network. There are numerous model suggestions, but the large variation among them and the lack of comparisons make it difficult to apply the proposed methods directly. It was concluded that a comparative study of different model types is necessary. Several models were developed and tested on real load data from a Finnish electric utility. Most of them use an MLP network to identify the assumed relation between the future load and earlier load and temperature data. The models were divided into two classes: models that forecast the load for a whole day at once, and hourly models that are able to update the forecasts as new data arrives. The test results showed that the hourly models are more suitable for a forecasting application. Their forecasting errors were smaller than those of a SARIMAX model, which was tested for comparison. The work suggests that such an hourly neural network model should be implemented for thorough on-line testing in order to form a final opinion on its applicability.
Key words:
Short-Term Load Forecasting (STLF), Neural Networks, Multi-Layer Perceptron (MLP) networks
TEKNILLINEN KORKEAKOULU   DIPLOMITYÖN TIIVISTELMÄ
Author: Pauli Murto
Title: Sähkönkulutuksen lyhyen aikavälin ennustaminen neuroverkkomalleilla
English title: Neural network models for short-term load forecasting
Date: January 5, 1998   Pages: 92
Preface
I have prepared this thesis at the Control Systems unit of ABB Power Oy. I want to thank the head of the unit, Aimo Sorsa, for providing the opportunity to do the work. I would also like to thank my instructor Arto Juusela and supervisor Raimo P. Hämäläinen. Juha Toivari deserves my thanks for many practical hints. Particularly, I want to thank Tuomas Raivio for critical comments and valuable advice regarding this work.
CONTENTS

1 INTRODUCTION
  1.1 Background
  1.2 Purpose of the work
  1.3 Structure of the work
2 LOAD FORECASTING
  2.1 Factors affecting the load
  2.2 Properties of the load curve
  2.3 Possible approaches
3 NEURAL NETWORKS IN LOAD FORECASTING
  3.1 Multi-Layer Perceptron network (MLP)
      Description of the network
      Learning
      Generalization
  3.2 MLP networks in load forecasting
  3.3 Literature survey
      Peak, valley, and total load forecasting
      Hourly forecasting
      Unsupervised learning models
      Other reported approaches
      Summary
4 FORECASTING THE DAILY LOAD PROFILE
  4.1 Forecasting with peak, valley, and average loads
      Description of the MLP network models
      Predicting the shape of the load curve
      Classifying the days
      The load shape based on peak and valley loads
      The load shape based on average load
      Test results
      Error measure
      Peak, valley, and average load forecasts
      Load shape forecasts
      Combining peak, valley, and average load forecasts with load shape predictions
  4.2
      Basic idea
      Using Kohonen's self-organizing feature map
      Overview of the model
      Test results
5 MODELS FOR HOURLY FORECASTING
  5.1 Choosing the structure of the models
  5.2 The model forecasting hour by hour
      Description of the model
      Results in forecasting one day at a time
      Forecasting without temperature data
      Including the temperature
      The average errors for different lead times
      Results in forecasting for one week at once
  5.3 Comparison to a SARIMAX model
  5.4 Utilizing only the most recent load values
      One network for all hours
      Separate networks for different hours
      Test results
  5.5 Summary
6 CONCLUSIONS
1 Introduction
1.1 Background
Load forecasting is one of the central functions in power system operations. The motivation for accurate forecasts lies in the nature of electricity as a commodity and trading article: electricity cannot be stored, so an electric utility needs an estimate of future demand in order to manage its production and purchasing in an economically reasonable way. In Finland, the electricity markets have recently been opened, which is increasing competition in the field.

Load forecasting methods can be divided into very short-, short-, mid-, and long-term models according to the time span (see, for example, Karanta 1990). In very short-term forecasting the prediction time can be as short as a few minutes, while in long-term forecasting it ranges from a few years up to several decades. This work concentrates on short-term forecasting, where the prediction time varies between a few hours and about one week.

Short-term load forecasting (STLF) has lately been a very commonly addressed problem in the power systems literature. One reason is that recent scientific innovations have brought new approaches to the problem, and the development of computer technology has broadened the possibilities for these and other methods to work in a real-time environment. Another reason may be the international movement towards greater competition in electricity markets (Räsänen 1995).

Even though many forecasting procedures have been tested and proven successful, none has achieved the status of a generally applied method. One reason is that the circumstances and requirements of a particular situation have a significant influence on the choice of the appropriate model, and the results presented in the literature are usually not directly comparable with each other. A majority of the recently reported approaches are based on neural network techniques (see section 3.3), and many researchers have presented good results.
The attraction of these methods lies in the assumption that neural networks are able to learn properties of the load that would otherwise require careful analysis to discover.
However, the development of the methods is not finished, and the lack of comparative results on different model variations is a problem. Therefore, to make use of the techniques in a real application, a comparative analysis of the properties of different model types seems necessary.
This work does not study forecasting for special days, such as religious and legal holidays. Special days have consumption profiles different from those of ordinary days, which makes forecasting for them very difficult. When implementing a real application, a means to take these days into account has to be found. The most common approach, though not necessarily the best one, is to treat them as Sundays.
2 Load forecasting
2.1 Factors affecting the load
Generally, the load of an electric utility is composed of very different consumption units. A large part of the electricity is consumed by industrial activities, another part by private people in the form of heating, lighting, cooking, laundry, and so on. Many services offered by society also demand electricity, for example street lighting and railway traffic.

The factors affecting the load depend on the particular consumption unit. The industrial load is usually determined mostly by the level of production. The load is often quite steady, and it is possible to estimate its dependence on different production levels. However, from the point of view of the utility selling electricity, industrial units usually add uncertainty to the forecasts: unexpected events, such as machine breakdowns or strikes, can cause large unpredictable disturbances in the load level.

In the case of private people, the factors determining the load are much more difficult to define. Each person behaves in his own individual way, and human psychology is involved in each consumption decision. Many social and behavioral factors can be identified; for example, big events, holidays, and even TV programs affect the load (Gross and Galiana 1987, Karanta 1990, Kim et al. 1995). The weather is the most important individual factor, largely because the electric heating of houses becomes more intensive as the temperature drops (Kallio 1985).

As a large part of the consumption is due to private people and other small electricity customers, the usual approach in load forecasting, taken also in this work, is to concentrate on the aggregate load of the whole utility. This reduces the number of factors that can be taken into account; the most important ones are discussed below (Gross and Galiana 1987). In the short run, meteorological conditions cause large variation in this aggregated load.
In addition to temperature, wind speed, cloud cover, and humidity also have an influence (see, e.g., Chow and Leung 1996, Kallio 1985, Khotanzad et al. 1996).
In the long run, economic and demographic factors play the most important role in determining the evolution of electricity demand. From the point of view of forecasting, the time factors are essential: various seasonal effects, cyclical behavior (daily and weekly rhythms), and the occurrence of legal and religious holidays. The remaining disturbances can be classified as random factors. These are usually small in the case of individual consumers, although large social events and popular TV programs add uncertainty to the forecasts; industrial units, on the other hand, can cause relatively large disturbances.

Only short-term forecasting is dealt with in this work, and the time span of the forecasts does not range further than about one week ahead. Therefore, the economic and demographic factors are not discussed. The decision to combine all consumption units into one aggregate load means that the forecasting rests largely on the past behavior of the load, and time factors play the key role in the analysis of this work. In the next section, the load behavior of a Finnish electric utility is examined, and some basic properties of the time series are discussed.
2.2 Properties of the load curve

The load curve over one year is shown in figure 2.1. The seasonal trend is easily seen: in the winter, the average load is about twice as high as in the summer. The strength of this effect is a special characteristic of Finnish load conditions and is due to the great differences between the weather conditions in different seasons of the year.
Figure 2.1: The load (MW) over the period May 24, 1996 to May 23, 1997.
There are also shorter cycles, which can be seen from the sample autocorrelation function of the time series, shown in figure 2.2. The peaks at lags 24, 48, 72, and so on indicate the daily rhythm, and the peaks at multiples of 168 show that a weekly rhythm also exists.
Figure 2.2: The sample autocorrelation function of the load series shown in figure 2.1.
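The autocorrelation structure described above is easy to verify numerically. A minimal sketch follows; the hourly series below is synthetic, standing in for the utility's load data, which is not reproduced here:

```python
import numpy as np

def sample_acf(x, max_lag):
    """Sample autocorrelation function of a one-dimensional series."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    denom = np.dot(x, x)
    return np.array([np.dot(x[:len(x) - k], x[k:]) / denom
                     for k in range(max_lag + 1)])

# Synthetic hourly "load": daily (24 h) and weekly (168 h) rhythms plus noise.
rng = np.random.default_rng(0)
t = np.arange(24 * 7 * 8)  # eight weeks of hourly values
load = (50
        + 10 * np.sin(2 * np.pi * t / 24)
        + 5 * np.sin(2 * np.pi * t / 168)
        + rng.normal(0, 1, t.size))

acf = sample_acf(load, 200)
# The daily rhythm shows up as high values at lags 24, 48, ...; the lag-168
# value is boosted further by the weekly rhythm.
print(acf[24] > acf[12], acf[168] > acf[160])
```

On the real series, the same computation would reproduce the peak pattern of figure 2.2.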
The weekly rhythm originates from the working-day/weekend rhythm observed by most people. On working days social activities are at a higher level than on Saturdays and Sundays, and therefore the load is also higher. Figure 2.3 shows the load over two successive weeks in April 1997. The series begins with five quite similar patterns, the load curves of Monday to Friday, followed by two different patterns for Saturday and Sunday. The same weekly pattern is then repeated.
Figure 2.3: The load (MW) over the period April 14 to April 27, 1997. The first day is a Monday.
The daily rhythm, on the other hand, results from the synchronized behavior of people during the day. Most people sleep at night, and therefore the load is low during night hours. During the day, many activities tend to be simultaneous for a majority of people (working time, lunch hour, TV watching, etc.). The daily rhythm changes throughout the year; figure 2.4 shows the load curves of typical Wednesdays at different times of the year.
Figure 2.4: Load curves (MW, by hour) of typical Wednesdays at different times of the year; the panels include July 3, 1996 and October 23, 1996.
As seen in figure 2.3, there are of course differences also between days in the same season. Therefore, in load forecasting, days are often divided into several day types, each with its own characteristic load pattern. Clearly Saturdays and Sundays have different load curves from other days. Often Mondays and/or Fridays are also separated from the other working days, because the closeness of the weekend can have a slight effect on the load (see, e.g., Kim et al. 1995). A more difficult question is the classification of special days (for example legal and religious holidays). Sometimes they are placed in the same category as Sundays (e.g. Hsu and Yang 1991); however, different special days have different load profiles. In this work, classification of days is used in all forecasting models. The guiding principle is a division into three classes: 1) Mondays to Fridays, 2) Saturdays, and 3) Sundays. The classification of special days is not examined.
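The three-class division above is simple to express in code; a minimal sketch (the function name is mine, not from the thesis):

```python
from datetime import date

def day_type(d: date) -> int:
    """Classify a date into the three day types used in this work:
    1 = Monday-Friday, 2 = Saturday, 3 = Sunday.
    Special days (holidays) are deliberately not handled here."""
    wd = d.weekday()  # Monday == 0 ... Sunday == 6
    if wd <= 4:
        return 1
    return 2 if wd == 5 else 3

print(day_type(date(1997, 4, 14)))  # April 14, 1997 was a Monday
```

A real application would extend this with a holiday calendar before the weekday test.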
The problem is the large amount of work needed to construct adequate models for each consumer category. The division between peak load models and load shape models is quite fundamental: peak load models forecast only the daily peak loads, whereas load shape models forecast load values for all hours (or half-hours). This work concentrates on load shape models, although peak load forecasting is treated in chapter 4 as the first step in creating the hourly forecast.
z(t) = Σ_{i=1}^{n} α_i f_i(t) + v(t),    (2.1)
where the load at time t is expressed as a weighted sum of explicit time functions f_i(t), usually sinusoids with a period of 24 or 168 hours. The coefficients α_i are slowly varying constants, usually estimated through linear regression or exponential smoothing. The modeling error v(t) is assumed to be white noise.

Spectral decomposition is another time-of-day method. The model has basically the same form as (2.1), but the time functions f_i(t) are the eigenfunctions of the autocorrelation function of the load time series. Such functions can in principle represent the colored random load with greater precision than arbitrarily selected time functions. Time-of-day models have been suggested, for example, by Sharma (1974) and Thompson (1976); an example of applying spectral decomposition can be found in Laing (1985).
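Estimating the coefficients α_i of a model of the form (2.1) by linear regression can be sketched as follows; the data and the choice of sinusoid periods (24 and 168 hours) are illustrative, not taken from the thesis:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(24 * 28)            # four weeks of hourly observations
true = 60 + 8 * np.sin(2 * np.pi * t / 24) + 3 * np.cos(2 * np.pi * t / 168)
z = true + rng.normal(0, 0.5, t.size)   # load = time functions + white noise

# Design matrix of explicit time functions f_i(t): a constant plus
# sine/cosine pairs with periods 24 and 168 hours.
F = np.column_stack([
    np.ones_like(t, dtype=float),
    np.sin(2 * np.pi * t / 24), np.cos(2 * np.pi * t / 24),
    np.sin(2 * np.pi * t / 168), np.cos(2 * np.pi * t / 168),
])
# Ordinary least-squares estimate of the coefficient vector alpha.
alpha, *_ = np.linalg.lstsq(F, z, rcond=None)
print(np.round(alpha, 1))
```

With the synthetic data above, the estimate recovers the constant (about 60) and the sinusoid amplitudes (about 8 and 3) used to generate the series.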
Regression models

Regression models normally assume that the load can be divided into a standard load component and a component linearly dependent on some explanatory variables. The model can be written:
z(t) = b(t) + Σ_{i=1}^{n} a_i y_i(t) + ε(t),    (2.2)
where b(t) is the standard load, ε(t) is a white noise component, and the y_i(t) are the independent explanatory variables. The most typical explanatory variables are weather factors.

A typical regression model has been used by Räsänen and Ruusunen (1992). They model different consumer categories with separate regression models, dividing the load into a rhythm component and a temperature-dependent component. The rhythm component corresponds to the load of a certain hour at the average temperature of the modeling period. More complicated variations have also been proposed; some models use earlier load values as explanatory variables in addition to the external variables (e.g. Papalexopoulos and Hesterberg 1990).

Regression models are among the oldest methods suggested for load forecasting. They are quite insensitive to occasional disturbances in the measurements, and their easy implementation is another strength. The serial correlation that is typical when regression models are applied to time series can, however, cause problems.

Stochastic time series models

This is a very popular class of dynamic forecasting models (see, e.g., Karanta and Ruusunen 1991, Piggott 1985, Hagan and Behr 1987). Many names for the class are encountered in the literature, for example ARMA (autoregressive moving average) models, ARIMA (integrated autoregressive moving average) models, the Box-Jenkins method, and linear time series models. A general treatment of the model type can be found in, e.g., Pindyck and Rubinfeld (1991). The basic principle is that the load time series can first be transformed into a stationary time series (i.e., one whose properties are invariant with respect to time) by suitable differencing, after which the remaining stationary series can be filtered into white noise. The models
assume that the properties of the time series remain unchanged over the period used in model estimation, and that all disturbances are due to the white noise component contained in the identified process. The basic ARIMA model can be written:

φ(B) ∇^d z(t) = θ(B) a(t),

where z(t), t = 1, ..., N, is the modeled time series, a(t), t = 1, ..., N, is a white noise sequence, B is the backward shift operator (B z(t) = z(t-1)), ∇ = 1 - B is the difference operator, φ(B) = 1 - φ_1 B - ... - φ_p B^p is the AR parameter polynomial, and θ(B) = 1 - θ_1 B - ... - θ_q B^q is the MA parameter polynomial.
This basic ARIMA model is not by itself suitable for describing the load time series, since the load series incorporates seasonal variation. Therefore, differencing with the period of the seasonal variation (usually 24 and 168) is required. The resulting model is called a seasonal ARIMA (SARIMA) model and can be written (Box and Jenkins 1976):

φ(B) Φ_S(B^S) ∇^d ∇_S^D z(t) = θ(B) Θ_S(B^S) a(t),    (2.3)

where ∇_S^D = (1 - B^S)^D and S is the period of the seasonal variation.
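The seasonal differencing operator ∇_S^D = (1 - B^S)^D can be sketched in a few lines; a toy purely periodic series is used to show that one seasonal difference removes the daily cycle exactly:

```python
import numpy as np

def seasonal_diff(z, season, order=1):
    """Apply (1 - B^S)^D to a series: repeated seasonal differencing."""
    z = np.asarray(z, dtype=float)
    for _ in range(order):
        z = z[season:] - z[:-season]
    return z

t = np.arange(24 * 21)                     # three weeks of hourly values
z = 50 + 10 * np.sin(2 * np.pi * t / 24)   # purely daily-periodic series

d = seasonal_diff(z, season=24)
print(np.allclose(d, 0))   # the daily cycle (and the constant) are removed
```

On real load data the seasonal difference would not vanish, but it removes the deterministic daily and weekly cycles so that the remainder can be modeled as a stationary process.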
An external input variable, such as temperature in the case of the load time series, can also be included in the model. Such a variant of the ARIMA model is called an ARIMAX model and can in general be written:

φ(B) ∇^d z(t) = w(B) x(t - b) + θ(B) a(t),    (2.4)

where x(t) is the external variable at time t, b is the delay, and w(B) is a polynomial in B.
The ARIMA model including both seasonal variation and an external variable is sometimes called a SARIMAX model. Such a model is tested in this work as a point of comparison for the neural network models (section 5.3).

The stochastic time series models have many attractive features. The theory of the models is well known, and it is therefore easy to understand how the forecast is composed. The properties of the model are easy to calculate; the estimate of the variance of the white noise component allows confidence intervals for the forecasts to be constructed. Model identification is also relatively easy, and established methods for diagnostic checks are available. Moreover, the estimation of the model parameters is quite straightforward, and the implementation is not difficult.

The weakness of the stochastic models lies in their adaptability. In reality, the load behavior can change quite quickly at certain times of the year. Although in ARIMA models the forecast for a certain hour is in principle a function of all earlier load values, the model cannot adapt to new conditions very quickly, even if the model parameters are estimated recursively. A forgetting factor can be used to give more weight to the most recent behavior and thereby improve adaptability. Another problem is the handling of anomalous load conditions. If the load behavior is abnormal on a certain day, this deviation from normal conditions will be reflected in the forecasts far into the future. A possible solution is to replace the abnormal load values in the load history with the corresponding forecast values.

State-space models

In the linear state-space model, the load at time t can be written:
z(t) = c^T x(t),    (2.5)

x(t+1) = A x(t) + B u(t) + w(t),    (2.6)
The state vector at time t is x(t), and u(t) is an input vector based on weather variables; w(t) is a vector of random white noise inputs. The matrices A and B and the vector c are assumed constant. A number of variations of the model exist; some examples can be found in, e.g., Toyoda et al. (1970), Gupta and Yamada (1972), and Campo and Ruiz (1987). In fact, the basic state-space model can be converted into an ARIMA model and vice versa, so there is no fundamental difference between the properties of the two model types. According to Gross and Galiana (1987), a potential advantage over ARIMA models is the possibility of using a priori information in parameter estimation via Bayesian techniques. Yet they point out that the advantages are not very clear and that more experimental comparisons are needed.

Expert systems

Expert systems are heuristic models, which are usually able to take both quantitative and qualitative factors into account. Many models of this type have been proposed since the mid-1980s. A typical approach is to imitate the reasoning of a human operator, reducing the analogical thinking behind intuitive forecasting to formal steps of logic (Rahman and Bhatnagar 1988, Rahman and Hazim 1993). One possible way for a human expert to create the forecast is to search the history database for a day that corresponds to the target day with regard to day type, social factors, and weather factors; the load values of this similar day are then taken as the basis for the forecast. An expert system can be an automated version of this kind of search process (Jabbour et al. 1988). Alternatively, the expert system can consist of a rule base defining relationships between external factors and daily load shapes. Recently, a popular approach has been to develop the rules on the basis of fuzzy logic (see, e.g., Hsu and Ho 1992, Kim et al. 1995, Momoh and Tomsovic 1995).
The heuristic approach to arriving at solutions makes expert systems attractive for system operators, since the system can provide the user with the line of reasoning followed by the model (Asar and McDonald 1994).
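The similar-day search described above can be sketched as a nearest-neighbour lookup over a history table; the record layout and the temperature-distance criterion below are my own illustration, not taken from the cited systems:

```python
# Each history record: (day_type, temperature, hourly_loads).
history = [
    (1, -5.0, [55.0, 52.0, 60.0]),   # a cold working day
    (1, 10.0, [45.0, 43.0, 50.0]),   # a mild working day
    (3,  8.0, [38.0, 36.0, 40.0]),   # a mild Sunday
]

def similar_day_forecast(day_type, temperature, history):
    """Pick the history day with a matching day type and the closest
    temperature; use its load values as the basis for the forecast."""
    candidates = [h for h in history if h[0] == day_type]
    best = min(candidates, key=lambda h: abs(h[1] - temperature))
    return best[2]

print(similar_day_forecast(1, 9.0, history))  # the mild working day is chosen
```

A full expert system would combine several such criteria (social factors, several weather variables) and possibly a rule base on top of the lookup.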
Mohammed et al. 1995, Khotanzad et al. 1995). The attraction of the MLP has been explained by the ability of the network to learn complex relationships between input and output patterns that would be difficult to model with conventional algorithmic methods. In these models, the inputs to the network are generally present and past load values and the outputs are future load values; the network is trained using actual load data from the past. In addition to MLP forecasters, models based on unsupervised learning have also been suggested for load forecasting (see, e.g., Hsu and Yang 1991, Djukanovic et al. 1993, Lamedica et al. 1996). The purpose of these models can be the classification of days into different day types, or the choice of the most appropriate days in the history to be used as the basis for the actual load forecasting.
3.1 Multi-Layer Perceptron network (MLP)

Description of the network

The basic processing unit of the network is the neuron, which computes its output as

y = φ(Σ_{i=1}^{n} w_i x_i),    (3.1)

where y is the output, x_1, ..., x_n are the inputs, w_i are the synaptic weights, and φ is the activation function.
Possible forms of the activation function are the linear function, the step function, the logistic function, and the hyperbolic tangent function.
The MLP network consists of several layers of neurons. Each neuron in a certain layer is connected to each neuron of the next layer. There are no feedback connections. A three-layer MLP network is illustrated in figure 3.1.
Figure 3.1: A three-layer MLP network, with an input layer (x_1, ..., x_n), a hidden layer, and an output layer (y_1, ..., y_m).
As an N-dimensional input vector is fed to the network, an M-dimensional output vector is produced. The network can thus be understood as a function from the N-dimensional input space to the M-dimensional output space. This function can be written in the form:

y = f(x; W) = φ(W_n φ(W_{n-1}(... φ(W_1 x) ...))),    (3.2)

where y is the output vector, x is the input vector, and the W_i are the weight matrices of the successive layers.
The most often used MLP network consists of three layers: an input layer, one hidden layer, and an output layer. The activation function used in the hidden layer is usually nonlinear (sigmoid or hyperbolic tangent), and the activation function in the output layer can be either nonlinear (a nonlinear-nonlinear network) or linear (a nonlinear-linear network).
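A nonlinear-linear network of the form (3.2) with one hidden layer reduces to a few lines of code. A minimal sketch with arbitrary example weights (bias terms are omitted, as in equation (3.1)):

```python
import numpy as np

def mlp_forward(x, W1, W2):
    """Three-layer MLP: logistic (sigmoid) hidden layer, linear output layer."""
    h = 1.0 / (1.0 + np.exp(-W1 @ x))   # hidden activations, phi = logistic
    return W2 @ h                        # linear output layer

# Example: 3 inputs, 4 hidden neurons, 2 outputs (weights chosen arbitrarily).
rng = np.random.default_rng(2)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
y = mlp_forward(np.array([0.5, -1.0, 2.0]), W1, W2)
print(y.shape)
```

In a load forecasting model, x would hold the scaled past load and temperature values and y the forecast load values.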
A neural network of this type can be understood as a function approximator. It has been proved that, given a sufficient number of hidden layer neurons, it can approximate any continuous function from a compact region of R^N to R^M to arbitrary accuracy (Funahashi 1989, Hornik et al. 1989).
Learning
The network weights are adjusted by training the network: the network is said to learn through examples. The idea is to present the network with input signals and the desired outputs. To each input signal the network produces an output signal, and the learning aims at minimizing the sum of squares of the differences between the desired and actual outputs. From here on, this function is called the sum of squared errors. The learning is carried out by repeatedly feeding the input-output patterns to the network. One complete presentation of the entire training set is called an epoch. The learning process is usually performed on an epoch-by-epoch basis until the weights stabilize and the sum of squared errors converges to some minimum value.

The most often used learning algorithm for MLP networks is the back-propagation algorithm. This is a specific technique for implementing the gradient descent method in the weight space, where the gradient of the sum of squared errors with respect to the weights is approximated by propagating the error signals backwards through the network. The derivation of the algorithm is given, for example, in Haykin (1994), along with some specific methods to accelerate convergence. A more powerful algorithm is obtained by using an approximation of Newton's method called Levenberg-Marquardt (see, e.g., Bazaraa 1993). In applying this algorithm to network training, the derivative of each squared error term (i.e. for each training case) with respect to each network weight is approximated, and the derivatives are collected in a matrix. This matrix represents the Jacobian of the minimized function. The Levenberg-Marquardt approximation is used in this work to train the MLP networks.

In essence, the learning of the network is nothing but estimation of the model parameters. In the case of the MLP model, however, the dependency of the output on the model parameters is much more complicated than in the most commonly used mathematical models (for example regression models). This is why iterative learning on the training set is required in order to find suitable parameter
values. There is no way to be sure of finding the global minimum of the sum of squared errors. On the other hand, the complicated nonlinear nature of the input-output dependency makes it possible for a single network to adapt to a much wider range of different relations than, for example, regression models. That is why the term learning is used in connection with neural network models of this kind.
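The epoch-by-epoch minimization of the sum of squared errors can be sketched with plain gradient descent, the simplest relative of back-propagation (the Levenberg-Marquardt variant used in this work requires the full Jacobian and is omitted here). The toy target function, network size, learning rate, and epoch count are all arbitrary assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy training set: approximate y = x1 * x2 on random points in [-1, 1]^2.
X = rng.uniform(-1, 1, size=(200, 2))
Y = (X[:, 0] * X[:, 1]).reshape(-1, 1)

# One hidden layer of 6 tanh neurons, linear output.
W1, b1 = rng.normal(0, 0.5, (6, 2)), np.zeros(6)
W2, b2 = rng.normal(0, 0.5, (1, 6)), np.zeros(1)

def sse():
    """Sum of squared errors over the whole training set."""
    H = np.tanh(X @ W1.T + b1)
    return float(np.sum((H @ W2.T + b2 - Y) ** 2))

initial_error = sse()
lr = 0.01
for epoch in range(200):                      # one epoch = one full pass over the set
    for x, y in zip(X, Y):
        h = np.tanh(W1 @ x + b1)
        e = (W2 @ h + b2) - y                 # output error
        dz = (W2.T @ e) * (1 - h ** 2)        # error propagated to the hidden layer
        W2 -= lr * np.outer(e, h); b2 -= lr * e
        W1 -= lr * np.outer(dz, x); b1 -= lr * dz

final_error = sse()
```

The gradient here is computed exactly per pattern; back-propagation proper organizes the same computation layer by layer for networks of arbitrary depth.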
Generalization
The training aims at minimizing the errors of the network outputs with regard to the input-output patterns of the training set. Success in this does not, however, prove anything about the performance of the network after the training. More important is success in generalization. A network is said to generalize well when the output is correct (or close enough) for an input which has not been included in the training set.

A typical problem with network models is overfitting, also called memorization in the network literature. This means that the network learns the input-output patterns of the training set, but at the same time unintended relations are stored in the synaptic weights. Therefore, even though the network provides correct outputs for the input patterns of the training set, the response can be unexpected for only slightly different input data.

Generalization is influenced by three factors: the size and representativeness of the training set, the model structure (architecture of the network), and the physical complexity of the problem at hand (Haykin 1994). The last of these cannot be controlled, so the means to prevent overfitting are limited to affecting the first two factors.

The larger the training set, the less likely the overfitting is. However, the training set should only include input-output patterns that correctly reflect the real process being modeled. Therefore, all invalid and irrelevant data should be excluded.

The effect of the model structure on generalization can be seen in two ways. First, the selection of the input variables is essential. The input space should be reduced to a reasonable size compared to the size of the training set. If the dimension of the input space is large, the set of observations can be too sparse for proper generalization. Therefore, no unnecessary input variables should be included, because the network can learn dependencies on them that do not really exist in the real
process. On the other hand, all factors having a clear effect on the output should be included.

The larger the number of free parameters in the model, the more likely overfitting is. We then speak of over-parameterization. Each hidden layer neuron brings a certain number of free parameters into the model, so in order to avoid over-parameterization, the number of hidden layer neurons should not be too large. There is a rough rule of thumb for a three-layered MLP (Oja). Let

H = number of hidden layer neurons
N = size of the input layer
M = size of the output layer
T = size of the training set

The number of free parameters is roughly W = H(N + M). This should be smaller than the size of the training set, preferably about T/5. Thereby, the size of the hidden layer should be approximately:
H ≈ T / (5(N + M))    (3.3)
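Equation (3.3) translates directly into code; this small helper is only an illustration of the rule of thumb, and the example sizes are made up.

```python
def hidden_layer_size(T, N, M):
    """Rule-of-thumb hidden layer size H = T / (5 * (N + M)), eq. (3.3)."""
    return max(1, round(T / (5 * (N + M))))

# Example: a year of daily training patterns (T = 365), 10 inputs, 1 output.
H = hidden_layer_size(365, 10, 1)   # about 7 hidden neurons
```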
In order to be sure of proper generalization, the network model, like any mathematical model, has to be validated. This is a step in system identification which should follow choosing the model structure and estimating the parameters. The validation of a neural network model can be carried out on the principle of a standard statistical tool known as cross-validation. This means that a data set which has not been used in parameter estimation (i.e. in training the network) is used for evaluating the performance of the model.
Therefore, the building of an MLP model for load forecasting can be seen as a nonlinear system identification problem. Determining the model structure consists of selecting the input variables and deciding the network structure. The parameter estimation is carried out by training the network on historical load data. This requires choices concerning the learning algorithm and appropriate training data. The model validation is carried out by testing on load data which has not been used in training.

However, modeling with neural networks is different from modeling with linear system models. The nonlinearity and the great adaptability of the network models make it possible to use specific indicators as input variables. In the case of load forecasting, the hour of the day and the day type of the target hour, for instance, can be included as binary codes in the network input. The network model can be understood to be based on pattern recognition functions, where different input patterns are mapped in different ways. This makes the models very different from, for example, ARIMA models, which assume that the load time series can be made stationary (i.e. invariant with respect to time) with suitable filters. The handling of special load conditions is easier for neural network models than for ARIMA models.

Another matter supporting neural network models is the relatively rapid changing of the characteristics of the load behavior. This is a problem with statistical models, because they cannot always keep up with sudden changes in the dependencies of the load. For example, the beginnings of holiday seasons can change the load behavior rapidly. As neural network models are in essence based on pattern recognition functions, they can in principle be hoped to recognize the changed conditions without re-estimating the parameters.
This requires, of course, that conditions corresponding to the new situation have been used in training, and that the network inputs contain the information necessary for recognizing the conditions. On the other hand, a problem with MLP models is the black-box-like description of the dependencies of the future values on the past behavior. Understanding the model is very difficult; common sense can hardly be applied to see how the outputs depend on the inputs. The response of the model to an input pattern very different from any experienced during learning can be unexpected. This can happen in new conditions, even if the model is validated with test data.
Another problem is the lack of general procedures for building the models. As the MLP models do not assume a specific functional form for the modeled relations, selecting the appropriate model structure is more heuristic than, for example, in the case of ARIMA models.
Basic MLP-models
The literature about short-term load forecasting with MLP neural network models can be roughly divided into three categories with regard to the forecasting target. These different model types are intended for:

- forecasting the daily peak, valley, or total load
- forecasting the whole daily load curve at one time
- forecasting the load of the next hour
The models of the first two categories are static in the sense that the forecasts are not adapted during the day. The third model type is usually used recursively in order to forecast further than just one hour ahead. The model is dynamic, since the forecast can be updated every time new data arrives.
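The recursive use of a one-hour-ahead model can be sketched as follows. The stand-in `model` below (a simple mean of the last three hours) and the three-hour input window are hypothetical placeholders for a trained network.

```python
def forecast_recursive(model, recent_loads, steps):
    """Use a one-step-ahead (dynamic) model recursively.

    Each forecast is appended to the history and fed back as an input for
    the next hour, so forecasts can extend arbitrarily far ahead and can
    be updated whenever a new measured value arrives.
    """
    history = list(recent_loads)
    out = []
    for _ in range(steps):
        next_load = model(history[-3:])   # hypothetical model using the last 3 hours
        out.append(next_load)
        history.append(next_load)
    return out

# Illustrative stand-in model: the mean of the last three hourly loads.
mean3 = lambda xs: sum(xs) / len(xs)
forecasts = forecast_recursive(mean3, [10.0, 12.0, 14.0], steps=2)
```

Whenever a new measured load arrives, the measured value replaces the corresponding forecast in the history and the recursion is simply restarted from there.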
There are also many other factors that make the models different from each other. These differences can be, for example, in:

- the use of weather data
- the other input variables
- the network architecture
- the training algorithm
- the selection of the training data
In the following, the models forecasting only peak, valley, or total loads are treated first, followed by the hourly forecasting models (also called load shape models).

Peak, valley, and total load forecasting

There are a few articles on forecasting the peak, valley, or total loads of the day with an MLP network. The highest load values are usually the most crucial for the electric utilities to know in advance. Also, forecasts for the peak, valley, and total loads of a day can be used as the first step in forecasting the whole daily load curve.

Park et al. (1991a) study peak and total load forecasting with a very basic three-layer MLP. The input variables to the network include only the maximum, minimum, and average temperatures of the target day. Thereby, the network is used in modeling only the dependencies on temperature values; no previous load values are used as predictors. The results are very good, the average forecasting errors of five different test sets being 2.04 % for the peak load forecasts and 1.68 % for the total load forecasts. The article also discusses hourly forecasting.

Ho et al. (1992) use a much larger network structure for peak and valley load forecasting. The 46 input variables for the three-layer MLP include the forecast high temperatures for the target day in three different areas, three recorded temperature values of the previous day, and three temperature values and the peak (valley) load value in the past ten days of the same day type as the target day. 60 hidden layer neurons are used. This massive network structure is trained on only 30 input-output patterns. The training algorithm is a variation of back-propagation, where the momentum is adapted. Results are good, but are only provided for a few test days.
The motivation of Ho et al. in forecasting peak and valley loads is to use them in connection with a rule-based expert system developed in another work of theirs. This system provides 24 normalized hourly values for each day type, which can be scaled to match the peak and valley forecasts.

Peng et al. (1992) forecast the daily total load, and present a simple search procedure for selecting the most relevant training cases. The proposed network architecture is a modification of a normal three-layered MLP, where the input nodes have direct linear connections to the output layer in addition to the normal hidden layer connections. The five inputs to the network are the forecast maximum and minimum temperatures of the target day, and the maximum and minimum temperatures as well as the total load of the previous day.

Asar and McDonald (1994) compare different input structures and data normalizations in peak load forecasting. The best results were obtained with the simplest input structure, which only contains the peak loads of the previous day and the days one week and one month before. The input structures containing temperature values did not improve the results. The end of the article is devoted to a discussion of the possibilities of a hybrid system combining neural networks and supervisory expert systems.

Hourly forecasting

The hourly load forecasts can be obtained either by forecasting the whole daily load curve at one time with a multi-output network (a static approach), or by forecasting the load with a single-output network for one hour at a time (a dynamic approach). At least two of the articles discussed in the following compare these approaches.

Lee and Park (1992) propose two different models. In the static model, the day is divided into three parts, which have separate network weights. The input to the network includes the load values of the corresponding part of the day on a few previous days of the same type.
The weekday load patterns and weekend-day load patterns are treated separately. In the dynamic model, the input to the network includes the load values of a few previous hours, and of these same hours on a few previous days. Neither of the models uses temperature data. The results with both models are quite similar, but the dynamic model appears to be slightly better in the tests.
Lu et al. (1993) use historical load data of two electric utilities in different parts of the world to investigate whether MLP models are system dependent and/or case dependent. Static and dynamic models are used for both cases. The inputs to the networks include past load and temperature data and also hour-of-day and day-of-week information. Training sets of the length of one or two months are used. The results indicate that there are no firm criteria to select a suitable network structure. Models are not unique, and different systems require different model structures. The dynamic approach again seems slightly better than the static one.

Other articles on hourly models are by Park et al. (1991b) and Chen et al. (1992). These use the dynamic approach. In the former, the input to the network contains only the load and temperature values of the two previous hours, the forecast temperature for the target hour, and the hour of the day (the target hour). The average forecasting errors are less than 2 % for all five test sets. Chen et al. (1992) use a network structure that is not fully connected. This means that the hidden layer neurons are divided into certain clusters, and each input neuron is connected only to some of the clusters. The overall network is a combination of these smaller supporting networks. There are 31 input nodes containing past load and temperature data, temperature-forecast data, and hour-of-day and day-of-week information. The results show an improvement over an ARIMA model, which is used as a comparison.
Classifications obtained with the self-organizing map are not always unique, because it can be difficult to say whether neurons close to each other on the map represent separate day types or not. On the other hand, the classification results seem quite obvious; the result for the load data of the Taiwan power system of May 1986 suggests dividing the days into the categories: 1) Sundays and holidays, 2) Mondays and days after holidays, 3) Saturdays, 4) weekdays except holidays.

Djukanovic et al. (1993) discuss the supervised and unsupervised learning concepts using a functional link net, which allows supervised and unsupervised learning with the same net configuration and the same data structure. The input to the network includes the 24 load values of the day before the target day, the maximum, minimum, and average temperatures of this day, temperature forecasts for the target day, the tariff season of the year, and the day of the week. In the training phase, the network uses unsupervised learning to classify the data into clusters and supervised learning for the actual forecasting within each cluster. In the forecasting phase, the target day is classified to one of the existing clusters on the basis of load and temperature data of the previous day. The actual forecast is then created within this cluster.

Piras et al. (1996) suggest a structure where an unsupervised learning model called neural gas is used in preprocessing the data into clusters. The system is thereby divided into submodels, which utilize normal MLP networks in approximating nonlinear relations. The resulting outputs are summed by a weighted fuzzy average, which allows a smooth transition between the models.

Lamedica et al. (1996) propose a normal three-layered MLP with an input structure containing the load data of all hours of the two preceding days, and three binary codes indicating the day type. The output consists of the 24 load values of the target day.
The model is reported to work well on normal days, but the unsatisfactory accuracy under anomalous load conditions, such as vacation periods and long weekends, is the motivation for a more detailed day type classification. Kohonen's SOM is used for the classification, and cluster codes of the target day and the two preceding days are included in the input pattern of the MLP network. In the forecasting stage, the preventive classification of the target day is left to a human operator. The model is very different from the ones suggested by Djukanovic et al. (1993) and Piras et al. (1996), because the supervised training is performed with a single network for all clusters, and the cluster code is included as an input to the network. The results
indicate that the normal MLP provides better results for normal load conditions, but the use of the SOM classification improves the accuracy under anomalous conditions.

There is one article proposing a very different use of Kohonen's SOM (Baumann et al. 1993). There the network is used directly in forecasting the daily load curve instead of classifying load patterns. The network is trained on load curves consisting of two succeeding days. The forecast for a day is obtained by associating the load curve of the previous day with a certain neuron on the map. This neuron directly provides the forecast for the target day. The model is described in more detail in section 4.2, where some test results obtained with the test data of this work are also presented.
Sforna et al. (1995) introduce a system consisting of two neural forecasters embedded in the same environment, but able to act separately if needed. These modules can also be replaced with different traditional statistical routines. The first module forecasts the daily load curve with a normal MLP. The second module works on-line, and corrects the static forecast on the basis of the most recent information. This module uses a recurrent neural network, where the neurons of the first hidden layer are connected to themselves in addition to the second layer neurons.

Chow et al. (1996) present a neural network module for weather compensation. The idea is to forecast the deviation of the load at a certain hour from the load of the corresponding hour on the previous day. The network has 24 output nodes, so the forecast is obtained for 24 hours ahead at once.

Many authors feel that making the load forecasts as accurate as possible requires utilizing external information, such as knowledge of different social activities. Some articles propose the use of concepts of fuzzy set theory combined with neural networks for this purpose. There are two possible approaches (Momoh et al. 1995). First, fuzzy logic can be used to provide a neural network with numerical input data based on human expertise. Second, fuzzy rules can be used to correct the neural network output on the basis of human expertise. The model explained by Srinivasan et al. (1995) is of the former type. There, a fuzzy front-end models the quantitative and qualitative knowledge about the system, and a neural network models the relationship between the fuzzy inputs and outputs. The output of the network is then defuzzified to obtain the load profile for the target day. Kim et al. (1995), on the other hand, suggest a model of the latter type. The forecasting procedure is divided into two steps.
In the first step, a provisional forecast for the load is obtained using a normal MLP neural network with one output node. In the second step, fuzzy expert systems are applied to estimate the correction to the load due to temperature changes and the possible holiday nature of the day. Taking other factors, such as election days, the rainy season, or television programs, into account is considered a future development.

Fuzzy concepts are also used in a model called a fuzzy neural network. Bakirtzis et al. (1995) propose such an integrated neural-network-based fuzzy decision system. There, a fuzzification interface, a fuzzy rule base, a fuzzy inference machine, and a
defuzzification interface perform a mapping from the non-fuzzy input space to the non-fuzzy output space. This kind of a fuzzy system can approximate any continuous function to an arbitrary degree of accuracy. The system can be represented by a layered network, and the adaptation of the model parameters is performed through a training process similar in nature to that of an MLP network. Short-term forecasting results are reported to be similar to those of neural networks, but the training of the fuzzy neural network is faster.
Summary
The literature presents many model types. Most of the models are based on feed-forward MLP networks, but models using unsupervised learning, fuzzy concepts, and recurrent networks, to give a few examples, are also presented. A common approach is to build a modular system, where separate modules concentrate on specific tasks.

There is a single feature which clearly divides the models into two distinct classes: some models are based on the idea of producing the whole load curve of a day at one time, while others are able to forecast the hourly load ahead at any time of the day. In most of the articles, one of these model types is used without giving specific reasons for the choice. There is not much comparison between the two approaches.

Comparing the approaches presented in the literature is difficult. Most of the articles present a specific solution to the problem, but justifying the choices concerning the utilized methods is often not given much attention. Since the load conditions are different in each case, a direct comparison of the forecasting errors is quite meaningless. This is the reason for the comparative approach taken in this work. The goal is to obtain comparable information on the performance of the basic model types. This kind of approach is seen as necessary for the purpose of building a real application suited for the defined needs.

Another thing lacking in the articles is the analysis of the performance at different lead times. In this work, the idea of using separate modules for different lead times is
considered. In particular, the possibility of improving the accuracy for the closest hours with a separate model will be examined.
Chapter 4. Forecasting the daily load profile

used. The seasonal trend in the load can be easily seen. Also, the weekly load structure can be seen in the form of lower load values on weekends than on working days.
Figure 4.1: Peak, valley, and average loads over the period May 24, 1996 - May 23, 1997.
There are numerous ways to choose the architecture of the models. Here, a decision was made to use one network for all day types, and to include the type of the day as an input to the network. Another possibility would have been to use separate networks for each day type, but this was considered unnecessary on the basis of some preliminary testing. It was also decided that each network has one hidden layer between the input and output layers. It was concluded that using networks with only one output node gives better results than forecasting the peak, valley, and average loads with one multi-output network. This means that separate networks are used in all cases.

Other features to be decided about the architecture of the network are the input variables and the number of hidden layer neurons. For the input variables, the following symbols are used:

Lmax(i) = maximum (peak) load of day i
Lave(i) = average load of day i
Tmax(i) = maximum temperature of day i
Informing the network about the day type is important, because Saturdays and Sundays have much lower peak loads than working days.

A hyperbolic tangent function is used as the activation function in all feed-forward neural networks of this work. This function is a mapping onto the interval [-1, 1]. To enable the network training to converge within a reasonable time, the desired output values should be scaled onto this interval (see, e.g., Haykin 1991). All temperature and load values are therefore scaled linearly between -1 and 1.
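The linear scaling onto [-1, 1] can be sketched as follows; the function name and the example values are illustrative only.

```python
import numpy as np

def scale_to_unit_interval(values, lo=None, hi=None):
    """Linearly scale values onto [-1, 1], the range of the tanh activation."""
    values = np.asarray(values, dtype=float)
    lo = values.min() if lo is None else lo
    hi = values.max() if hi is None else hi
    return 2.0 * (values - lo) / (hi - lo) - 1.0

# Example: loads of 20, 50, and 80 MW map to -1, 0, and 1.
scaled = scale_to_unit_interval([20.0, 50.0, 80.0])
```

In practice `lo` and `hi` would be fixed from the training data, so that test data is scaled with the same transformation.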
The model is tested in this work only on normal days; special holidays are excluded from the test data.

The load shape based on peak and valley loads

The shape of the load curve for a certain day contains 24 normalized load values. When the load shape is combined with the forecast peak and valley load values, the normalized load values are:

Lnor(i, j) = (L(i, j) - Lmin(i)) / (Lmax(i) - Lmin(i)),    (4.1)

where
L(i, j) = load of day i at hour j
Lmin(i) = minimum (valley) load of day i
Lmax(i) = maximum (peak) load of day i

The forecast for the hourly load is then:

L̂(i, j) = L̂min(i) + L̂nor(i, j) (L̂max(i) - L̂min(i)),    (4.2)

where the hats indicate that the load values are forecasts.

The load shape based on average load

When the load shape is combined with the average load, the normalized load values are:

Lnor(i, j) = L(i, j) - Lave(i),    (4.3)

where Lave(i) = average load of day i. The forecast for the hourly load is then:

L̂(i, j) = L̂nor(i, j) + L̂ave(i).    (4.4)
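The two normalizations and their inverses can be sketched numerically. The hourly load values below are made up for illustration, and the peak/valley formula follows the min-max form implied by the text (the shape lies between the valley and peak loads).

```python
import numpy as np

loads = np.array([30.0, 25.0, 40.0, 50.0])   # hypothetical hourly loads of day i

# Shape based on peak and valley loads (eq. 4.1) and its inverse (eq. 4.2):
lmin, lmax = loads.min(), loads.max()
shape_pv = (loads - lmin) / (lmax - lmin)     # normalized to [0, 1]
restored_pv = lmin + shape_pv * (lmax - lmin)

# Shape based on the average load (eq. 4.3) and its inverse (eq. 4.4):
lave = loads.mean()
shape_ave = loads - lave                      # deviation from the daily average
restored_ave = shape_ave + lave
```

In forecasting, the inverse transformations would of course use the forecast peak, valley, or average values rather than the true ones.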
Different averaging models

The forecast for the load shape is obtained by averaging some load shapes of the corresponding day type class in the load history. The number of these days has to be decided. Four different models are considered for day type classification 1, and three different models are considered for classification 2. These are:

Classification 1:
Model 1: For all day types, the two most recent example days are averaged.
Model 2: For all day types, the three most recent example days are averaged.
Model 3: For weekdays the five most recent example days, for Saturdays and Sundays the two most recent examples are averaged.
Model 4: For weekdays the five most recent example days, for Saturdays and Sundays the three most recent examples are averaged.

Classification 2:
Model 1: For all day types, the two most recent example days are averaged.
Model 2: For all day types, the three most recent example days are averaged.
Model 3: For all day types, the five most recent example days are averaged.
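A minimal sketch of the averaging procedure, assuming a chronological history of (day type, load shape) pairs; the function name, the day type codes, and the two-hour "days" are illustrative placeholders.

```python
def shape_forecast(history, target_day_type, n_examples):
    """Average the load shapes of the n most recent days of the target type.

    history: list of (day_type, shape) pairs in chronological order.
    """
    shapes = [s for t, s in history if t == target_day_type][-n_examples:]
    hours = len(shapes[0])
    return [sum(s[h] for s in shapes) / len(shapes) for h in range(hours)]

# Hypothetical example with 2-hour "days" and day types 'wd' (weekday), 'sat'.
hist = [('wd', [1.0, 2.0]), ('sat', [0.5, 0.6]), ('wd', [3.0, 4.0])]
fc = shape_forecast(hist, 'wd', n_examples=2)
```

The different averaging models above correspond simply to different values of `n_examples` per day type class.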
Test results
Error measure

In comparing different models, the average percentage forecasting error is used as a measure of the performance. This is defined:

Eave = (1/N) Σ(i=1..N) |L̂i - Li| / Li · 100%,    (4.5)

where
N = the number of forecasts (the number of days in the case of peak, valley, and average load forecasting, and the number of hours in the case of hourly load forecasting)
Li = the i:th load value
L̂i = the forecast for the i:th load value
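Equation (4.5) translates directly into code; the example forecast and actual values below are arbitrary.

```python
def average_percentage_error(forecasts, actuals):
    """Eq. (4.5): mean of |forecast - actual| / actual, in percent."""
    n = len(actuals)
    return 100.0 / n * sum(abs(f - a) / a for f, a in zip(forecasts, actuals))

# Two forecasts with errors of 2 % and 5 % give an average error of 3.5 %.
err = average_percentage_error([102.0, 95.0], [100.0, 100.0])
```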
The reason for using the average percentage error is the fact that its meaning can be easily understood. It is also the most often used error measure in the load forecasting literature used as reference in this work, and therefore allows for some comparative considerations (although the results are not directly comparable in different situations). Another possibility as a performance measure would be the root-mean-square (RMS) percent error. This penalizes the square of the error, as opposed to the average forecasting error, and therefore takes the deviation of the errors into account more effectively. However, when both measures were calculated on some test models with relatively small errors, the orders of preference were in practice the same with both measures. Therefore, the average forecasting error will be used throughout this work.

Peak, valley, and average load forecasts

The results of forecasting the peak, valley, and average loads with MLP networks are given in this section. The eight different input structures (see page 39) were tested in all cases. The training set for the networks consists of the data over one year, from May 24, 1996 to May 23, 1997. The idea is that all load and weather conditions are represented in the training in order to enable the model to adapt to all conditions. The test data consists of the remaining data, from May 24 to August 18, 1997. Peak, valley, and average loads over the test period are shown in figure 4.2. In the beginning of the period, the weekly rhythm is clearly visible. At the end of the period, it becomes less distinct, especially for the peak and valley loads. The reason is the summer holidays, which naturally weaken the weekly rhythm and also make the forecasting more difficult.
Figure 4.2: Peak, valley, and average loads over the period May 24 - August 18, 1997.
A more thorough testing would require the use of test data from all seasons of the year, but as the testing must not be carried out with the same data as used in the network training, all testing is limited to the only season for which data from two successive years is available. If the testing were carried out on another season, and the data of the test season were simply excluded from the training set, then the data of the prevailing conditions would be under-represented in the training set. The test season is in the summertime, when the holidays as an external factor have an influence on the load. This makes the forecasting difficult. Also, the temperature has a much weaker effect in the summer than in other seasons.

The average percentage forecasting errors for the test data are shown in figures 4.3 (peak load), 4.4 (valley load), and 4.5 (average load). As the neural network forecasters can give slightly different results in different training sessions, the training and forecasting was performed three times with each network. The error values are averages of these three test runs. The tests were carried out with 3, 5, 7, and 10 hidden layer neurons. These numbers were chosen on the basis of rough calculations of the reasonable size of the hidden layer, as discussed in chapter 3 (equation 3.3).
Figure 4.3: The average percentage errors for peak load forecasts on the test period May 24 - August 18, 1997.
Figure 4.4: The average percentage errors for valley load forecasts on the test period May 24 - August 18, 1997.
Figure 4.5: The average percentage errors for average load forecasts on the test period May 24 - August 18, 1997.
The input structure 4 gives the best results in all cases. This contains only the peak load of the previous day as input; no temperature data is utilized. It can be concluded that in this test case the neural networks fail to make use of any other data than the peak, valley, or average load of the previous day. The real values and the forecasts with input structure 4 are shown in figures 4.6 (peak load), 4.7 (valley load), and 4.8 (average load).
Figure 4.6: Actual and forecast peak loads with 7 hidden layer neurons and input structure 4
Figure 4.7: Actual and forecast valley loads with 3 hidden layer neurons and input structure 4
Figure 4.8: Actual and forecast average loads with 7 hidden layer neurons and input structure 4.
The forecasts for the average load are more accurate than those for the peak and valley loads. With input structure 4, the errors are on average less than 3.5 %. For the valley load, on the other hand, the average error is only slightly less than 5 %. On a few occasions the average forecasting error is unacceptably large (figure 4.4). This is a phenomenon observed from time to time during this work when forecasting with feed-forward networks: sometimes the network fails totally in the training phase, and the forecasts can then be almost anything. It is interesting to notice that the errors grow as the hidden layer size increases, except in the case of input structure 4, which has only 1+4 input variables. This suggests over-parameterization. The results do not support the use of data from the previous week. If, however, peak load forecasts are made further than one day ahead, the situation may be different. If the only input to the network were the data from the day just before the target day, then in forecasting two days ahead the input would consist only of forecast data. If, on the other hand, the input also contained data from the previous week, the accuracy could be better. The data in July is strongly affected by the summer holidays, and the load seems difficult to forecast. If the tests are run only on the first five weeks of the data, the results are much better. In figure 4.9, the average errors for peak, valley and average load forecasts are given for input structures 4, 5 and 6, using five hidden layer neurons.
Figure 4.9: The average percentage forecasting errors for peak, valley, and average load forecasts on the test period May 24 – June 28, 1997.
Load shape forecasts

To test the performance of the load shape forecasting model, the prediction was carried out under the unrealistic assumption that the peak, valley and average load values are known in advance. To obtain a reliable picture of the performance of the different models, testing was carried out on four test periods, each in a different season of the year. The average hourly forecasting errors are given in figures 4.10 – 4.13. Figures 4.10 and 4.11 show the average errors for the model based on daily peak and valley loads; figure 4.10 is for day type classification 1, and figure 4.11 for day type classification 2. The corresponding results for the model based on daily average loads are shown in figures 4.12 and 4.13. Each bar represents the average error with a certain averaging model (see page 42).
Figure 4.10: The load shape forecasting with peak and valley loads: the average errors for day type classification 1.
Figure 4.11: The load shape forecasting using peak and valley loads: the hourly average errors for day type classification 2.
Figure 4.12: The load shape forecasting using the daily average load: the hourly average errors for day type classification 1.
Figure 4.13: The load shape forecasting using the daily average load: the hourly average errors for day type classification 2.
Classification 1 appears superior to classification 2 in both cases. The averaging models 3 and 4 seem to give the best results. The average error is less than 3 % for all test periods except the last one. As an illustration, the forecast and the real load over the one-week period April 8 – April 14, 1997 are shown in figures 4.14 (based on the peak and valley loads) and 4.15 (based on the average load). Classification 1 and averaging model 3 are used in both cases.
Figure 4.14: The load shape forecast with peak and valley loads on the period April 8 – April 14, 1997. Classification 1 and averaging model 3 are used. The average error is 3.17 %.
Figure 4.15: The load shape forecast with the average load on the period April 8 – April 14, 1997. Classification 1 and averaging model 3 are used. The average error is 3.27 %.
Combining peak, valley, and average load forecasts with load shape predictions

The following decisions were made for the models forecasting the peak, valley, and average loads:
- input structure 4 is used
- the hidden layer has 7 neurons
Correspondingly, the following decisions were made for the load shape prediction model:
- days are classified into three classes (Mondays–Fridays, Saturdays, Sundays)
- in the class Mondays–Fridays, 5 example days are averaged
- in the classes for Saturdays and Sundays, 2 example days are averaged
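The combination of a daily load level forecast with an averaged load shape can be sketched as follows. This is a simplified illustration of the averaging idea only: normalizing each example day by its own daily average is an assumed normalization, one of several possible averaging models (the exact models are described on page 42).

```python
def forecast_profile(avg_load_forecast, example_days):
    """Combine a daily average load forecast with a load shape obtained
    by averaging example days of the same day type. Each example day is
    a list of 24 hourly loads."""
    # Normalize each example day by its own daily average (assumption),
    # then average the normalized 24-hour shapes hour by hour.
    shapes = [[h / (sum(day) / 24.0) for h in day] for day in example_days]
    mean_shape = [sum(s[hour] for s in shapes) / len(shapes) for hour in range(24)]
    # Scale the averaged shape so its daily average equals the forecast.
    return [avg_load_forecast * v for v in mean_shape]
```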
Forecasting was performed twice on the test period May 24 – August 18, 1997: first the peak and valley load forecasts, and then the average load forecast, were combined with the load shape prediction. The average forecasting error was 5.03 % for the model using peak and valley loads, and 4.48 % for the model using the average load. In figure 4.16, the actual load and the forecast based on peak and valley loads are shown for the two-week period May 24 – June 6. The forecasting error for this period is 4.76 %. The corresponding result for the model using the average load is given in figure 4.17; the forecasting error is in this case 4.35 %.
Figure 4.16: Actual load and forecast obtained by combining the load shape prediction with the peak and valley load forecasts on the period May 24 – June 6, 1997. The average forecasting error is 4.76 %.
Figure 4.17: Actual load and forecast obtained by combining the load shape prediction with the average load forecast on the period May 24 – June 6, 1997. The average forecasting error is 4.35 %.
In both cases, the forecasting works well for the first week, the average error being around 3 %. In the second week, however, the level of the load fluctuates much more, and large errors in forecasting the peak, valley and average loads cause notable systematic errors, thus spoiling the hourly forecasts.
the network, which would have large enough weight values to compare with the actual load values. Therefore, the forecast values will be too low. To overcome the problem, a correction to the forecasts is needed. Baumann and Germond (1993) propose a trend correction. It is intended to account for annual load growth, but it also applies to the problem caused by extreme load curves. The forecast for a certain hour is adjusted in the following way: the difference between the load of the corresponding hour in the previous day and the weight vector element of that hour in the best-matching neuron is taken. This difference is multiplied by a constant and added to the preliminary forecast. From here on, this correction will be called the delta correction. Forecasting with the delta correction can be expressed as follows:
L̂_i = W_{i+24} + δ (L_{i−24} − W_i) ,    (4.6)

where the weight vector of the best-matching neuron contains 48 elements (the previous day followed by the target day), W_i is the element corresponding to hour i of the previous day, L_{i−24} is the actual load of that hour, and δ is the correction constant.
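The delta correction of equation (4.6) can be sketched as follows. This is an illustrative sketch: the 48-hour layout of the weight vector (previous day followed by the target day) is an assumption based on the surrounding description.

```python
def delta_corrected_forecast(weights, prev_day_loads, delta):
    """Apply the delta correction hour by hour. `weights` is the
    48-element weight vector of the best-matching neuron (hours 1-24 of
    the previous day, then hours 1-24 of the target day -- an assumed
    layout); `prev_day_loads` holds the 24 actual loads of the previous
    day. The preliminary forecast W_{i+24} is shifted by
    delta * (L_{i-24} - W_i)."""
    assert len(weights) == 48 and len(prev_day_loads) == 24
    return [weights[i + 24] + delta * (prev_day_loads[i] - weights[i])
            for i in range(24)]
```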
Test results

The training and test sets are the same as used earlier in this chapter: May 24, 1996 – May 23, 1997 and May 24, 1997 – August 18, 1997. The performance of the model was tested with five different network sizes and with eleven values of δ. The average percentage forecasting errors are given in figures 4.18 and 4.19. In figure 4.18, the network was trained by feeding the whole training set 50 times into each network (one network for each day of the week). In figure 4.19, 500 training epochs were used.
Figure 4.18: The average errors with different network sizes and delta values. The training consisted of 50 epochs.
Figure 4.19: The average errors with different network sizes and delta values. The training consisted of 500 epochs.
The results are better with 500 training epochs. Especially for the large network sizes, the long training appears to be necessary. The smallest forecasting errors are obtained with the network of 8 x 8 neurons. The best value for δ is 0.9.
(4.7)
The model was tested on the same test period as the SOM model. The average forecasting errors with different delta parameters are shown in figure 4.20:
Figure 4.20: The average forecasting errors for the test period May 24 – August 18, 1997.
It can be seen that this model achieves greater accuracy than the more complicated SOM model. The best result is obtained with δ = 0.5, where the average percentage error is 4.55 %. This is only slightly worse than the result obtained by forecasting the daily average load with a MLP network and combining it with the load shape prediction (section 4.2). As an illustration, the real load and the forecast with δ = 0.5 for the two-week period May 25 – June 7, 1997 are shown in figure 4.21. The average error is 4.04 %.
Figure 4.21: Actual load and forecast obtained with the selection model over the period May 25 – June 7, 1997. The average forecasting error is 4.04 %.
To illustrate the ability of this model to forecast the shape of the load alone, the forecasting was also performed under the assumption that the average load of the target day is known in advance. In that case, the forecast load curve is:
L̂(i, j) = L(s, j) + δ [ L_ave(i) − L_ave(s) ] ,    (4.8)

where s denotes the selected similar day and L_ave the daily average load.
The average error over the test period was 3.20 %. This is slightly worse than with the model described in section 4.1.
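The level adjustment of equation (4.8) can be sketched as follows; a simplified illustration in which the function name and the treatment of delta as a plain parameter are assumptions (delta = 1 would make the daily averages match exactly).

```python
def shift_to_known_average(selected_day, target_avg, delta=1.0):
    """Take the 24-hour load curve of the selected similar day s and
    shift it by delta times the difference between the target day's
    (known) average load and the selected day's average load."""
    selected_avg = sum(selected_day) / len(selected_day)
    return [v + delta * (target_avg - selected_avg) for v in selected_day]
```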
4.3 Summary
Forecasting the daily load profile was first studied using a MLP network to predict the daily peak, valley and average load values. The shape of the load curve was predicted by averaging daily shapes in the load history. The whole year was used as the training set for the MLP networks. The performance of the models was tested on the load data of the summer of 1997. Using temperature data did not improve the forecasting accuracy; the best results were obtained with the simplest input structure, which uses only the load data of the previous day as the predictor. Average errors of approximately 4 % for the peak load, less than 5 % for the valley load, and less than 3.5 % for the average load were obtained. The results were, however, better for the beginning of the summer, before the summer holidays.

These error percentages are somewhat larger than many of those reported in the literature. Some authors have reported average forecasting errors of around 2 % for daily peak and total load forecasts (e.g. Park et al. 1991a, Peng et al. 1992). The reason for the difference lies most likely in the nature of the test data sets. The number of consumption units is much smaller in the case of this work than in most of those in the literature. Since the proportion of the load deviation to the aggregate load level decreases as the number of consumption units increases, the difference in the results seems natural.

In combining the peak, valley and average load forecasts with the load shape forecasts, using the average load was found superior to using the peak and valley loads. The average error of the hourly load curve forecasts was around 4.5 % for the whole test set. The results in forecasting the load on the basis of the whole load curve of the previous day were quite similar to those obtained with the combined average load and load shape forecasting. The use of the Kohonen network did not improve the accuracy over the simpler selection model.
The average error was 4.55 % with the selection method.
Chapter 5. Models for hourly forecasting

Should the load of a certain hour be modeled as a function of the previous hours only, or also as a function of the corresponding hours in the previous day and week? Should there be separate neural network forecasters for different day types, or should the day type be included as an input to a single network forecasting for all day types? Should the training set for the networks contain the data of the whole year, or only recent weeks or months? The period of the previous year corresponding to the forecasting time can also be considered.

The temperature clearly affects the load behavior in Finland, but the answer to the first question is not obvious. The temperature usually does not change very fast, and its effect on the load is most likely delayed. Therefore, the shorter the time span of the forecast, the less important the temperature is as an explanatory variable. It is doubtful whether the temperature can be effectively utilized in hour-by-hour forecasting for the shortest lead times. Models both with and without temperature are tested in this chapter.

The second question is also examined through testing. The models treated in section 5.3 use the load values of the previous day and week in addition to the most recent hours. The forecasts are created hour by hour. The models are tested with and without temperature data for lead times of up to one week. An analysis of the average error percentages for different lead times is included. The models in section 5.5 use only the most recent load values. These are intended for short lead times (up to a few hours) and therefore do not utilize temperature data. The network has many output nodes, and the forecast is created for many lead hours at once.

The third question is answered only on the basis of preliminary testing.
From numerous test runs on different models, it was concluded that including the day type as an input to the neural network seems the more suitable solution, and it is also easier to implement. The problem with using separate networks for different day types is the small number of example days of some day types. Therefore, only models with one neural network for all day types are considered in this chapter.
The last question is difficult to answer. In the previous chapter, the data of the whole year was used for training a network to predict the peak, valley, and average loads. The idea was to train the network to recognize the characteristics of all seasons. However, in forecasting the hourly load, the training set is 24 times larger, and the networks also have to be larger. The training can become very time-consuming. To reduce the training time, using separate networks for each hour of the day is given some consideration in section 5.5, but the test results do not seem very promising. On the other hand, with a short training period the network can be trained fast. The problem is then the relatively fast change of the load characteristics. If, for example, one or two preceding months are used for training, the conditions may already be quite different at the time of the forecasting. However, the training can then be performed very often. Alternatively, the network weights can be adapted continuously by training with the most recent data. The idea then resembles recursive identification methods, where the model parameters are estimated recursively (see, e.g., Söderström and Stoica, 1989). In this chapter, most of the models use relatively short training sets.
L4 = L(i − 24)
L5 = L(i − 168)

In addition to the load values, the network needs inputs indicating the day type. For this purpose, the days are divided into four categories: Mondays, Tuesdays–Fridays, Saturdays, and Sundays. Each day type has a binary input that gets the value one if the day of the target hour is of this particular type, and otherwise zero. The day type inputs can be written:

D_k, where k = 1, ..., 4

It was also concluded that including input variables informing the network of the hour of the day improves the forecasting accuracy significantly. Five binary variables are used for this purpose. The hour-of-day inputs are:

H_k, where k = 1, ..., 5

These get the values 0 or 1 in such a way that they form the binary code of the hour of the day (0–23), the hour being obtained as the sum of 2^(k−1) H_k over k = 1, ..., 5. Therefore, the model using no temperature data has 14 input variables and one output variable.

Three different means of taking the effect of the temperature into account are considered. In the first one, the temperature values of the target hour and the corresponding hours one day and one week earlier are used. In the second one, the average temperatures of the target day, the previous day, and the day one week earlier are used. In the third one, the maximum and minimum temperatures of these days are used. If the load of hour j on day i to be forecast is denoted L(i, j), the temperature inputs in these three model types are:

Model 1: T1 = T(i, j), T2 = T(i − 1, j), T3 = T(i − 7, j)
Model 2: T1 = T_ave(i), T2 = T_ave(i − 1), T3 = T_ave(i − 7)
Model 3: T1 = T_max(i), T2 = T_min(i), T3 = T_max(i − 1), T4 = T_min(i − 1), T5 = T_max(i − 7), T6 = T_min(i − 7)
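The 14-variable input vector of the temperature-free model can be sketched as follows. This is an illustration only; in particular, the bit order of the hour code (H_k carrying the weight 2^(k−1)) is an assumption, as the exact ordering is not specified in the text.

```python
def build_input_vector(loads, day_type, hour):
    """Build the 14-element input vector of the temperature-free model:
    5 load inputs L1..L5, 4 binary day-type inputs D_k (one-of-four
    coding), and 5 binary hour-of-day inputs H_k forming the binary
    code of the hour (0-23)."""
    assert len(loads) == 5 and 0 <= day_type < 4 and 0 <= hour < 24
    day_inputs = [1 if k == day_type else 0 for k in range(4)]
    hour_inputs = [(hour >> k) & 1 for k in range(5)]  # bit k -> H_{k+1}
    return list(loads) + day_inputs + hour_inputs
```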
Figure 5.1: The average forecasting error percentages for various training and test sets and three hidden layer sizes (10, 15 and 20 neurons). Forecasts were updated daily.
The figure shows that training with two months of data generally gives better results than training with only one month. The results for the two-month training periods are otherwise quite satisfactory, except for the test week at the end of February. The reason for the failure in this particular test week is the unusual weather conditions. The performance of the models is illustrated in figures 5.2 – 5.5, which show the actual and forecast loads for all test weeks with two-month training sets.
Figure 5.2: The actual and forecast load on the period October 16 – October 22, 1996. The training period is August 17 – October 15, 1996. The average forecasting error is 2.99 %.
Figure 5.3: The actual and forecast load on the period December 2 – December 10, 1996. The training period is October 1 – December 1, 1996. The average forecasting error is 3.46 %. Friday the 6th and Saturday the 7th are excluded from the test set, because the 6th of December is Finland's independence day and the load behaviour is exceptional.
Figure 5.4: The actual and forecast load on the period February 18 – February 24, 1997. The training period is December 10 – February 17, 1997. The average forecasting error is 8.92 %.
Figure 5.5: The actual and forecast load on the period April 20 – April 26, 1997. The training period is February 15 – April 19, 1997. The average forecasting error is 3.92 %.
In figure 5.4, it can be seen that the load decreases heavily throughout the week, and the forecasts cannot keep up with it. This is a problematic situation from the forecasting point of view. The load curve during the last month of the training set and the whole test week is shown in figure 5.6; the vertical line separates these data sets from each other. In the light of this figure, it is not surprising that the model cannot predict the load without using temperature data.
Figure 5.6: The load over the period January 19 – February 24, 1997.
The temperature during the same period is shown in figure 5.7. The change during the test week is remarkable. At least in situations like this, where the temperature changes rapidly, including the temperature in the model is essential.
Figure 5.7: The temperature over the period January 19 – February 24, 1997. The sharp peak at about hour 600 is likely to be a measurement error.
Including the temperature

In figure 5.8, the average forecasting errors are shown for the three different models using temperature data. The same training and test periods are used as in figure 5.1. In all test runs, 15 hidden layer neurons were used. Models 1, 2 and 3 refer to the different ways of including the temperature in the networks (see page 64).
Figure 5.8: The average error percentages for different training and test periods. 15 hidden layer neurons were used in all test runs.
It is clear that the shorter training time is not adequate for this model. The errors are unacceptably large in all cases where the length of the training period is one month. The reason seems to be that there are simply not enough example cases of different days, in particular weekend days, in the training set to match the increased size of the input layer. The problem is illustrated in figure 5.9, where the actual and forecast loads for the test period October 16 – October 22 are shown.
Figure 5.9: The actual and forecast load on the period October 16 – October 22, 1996. The training period is September 17 – October 15, 1996. Model type 2 was used. The average error is 6.32 %.
On the other hand, with two-month training sets, including the daily average temperatures in the model (model 2) appears to improve the accuracy; the errors are in general slightly smaller than when forecasting without temperature data. However, on the test week at the end of February the errors are still unacceptably large. As the two-month training period is clearly superior to the one-month one, a four-month training period was also tested in order to see whether this could improve the accuracy for the test week in February. The training was performed for model type 2 with 15 hidden layer neurons on the period October 10, 1996 – February 17, 1997. The average forecasting error for the test set was 5.08 %. The actual and forecast loads with model type 2 for all test weeks are shown in figures 5.10 – 5.13. Figure 5.12 shows the test week in February. It can be seen that the model can now anticipate the decrease in the load, but still not to a fully satisfactory extent.
Figure 5.10: The actual and forecast load on the period October 16 – October 22, 1996. The training period is August 17, 1996 – October 15, 1996. The average forecasting error is 2.19 %.
Figure 5.11: The actual and forecast load on the period December 2 – December 10, 1996. The training period is October 1 – November 30, 1996. The average forecasting error is 3.25 %. Friday the 6th and Saturday the 7th are excluded from the test set.
Figure 5.12: The actual and forecast load on the period February 18 – February 24, 1997. The training period is October 10 – February 17, 1997. The average forecasting error is 5.08 %.
Figure 5.13: The actual and forecast load on the period April 20 – April 26, 1997. The training period is February 15 – April 19, 1997. The average forecasting error is 3.19 %.
Figure 5.14: The average forecasting errors for different lead times. Each curve corresponds to a certain one-week test period. No temperature data was utilized.
Figure 5.15: The average forecasting errors for different lead times. Each curve corresponds to a certain one-week test period. Daily average temperatures were utilized in the forecasting.
Both figures show that, on all test sets, the accuracy is on average the better the shorter the lead time. However, the accuracy deteriorates very fast with growing lead time: already at lead times of 5 hours, the average errors are in most cases almost as large as at the lead time of 24 hours.
The errors are clearly smaller with the model utilizing temperature data. It seems that the model should not be used without temperature data even for the shortest lead times, if reasonably accurate forecasts of the average temperature are available.
Figure 5.16: The average forecasting errors for some training and test sets. The forecasting was performed once for the whole test week.
The average errors are only slightly larger than when forecasting one day at a time. The results give, however, too optimistic a picture of the performance, because perfect temperature forecasts for one week ahead were assumed. This is of course unrealistic, but the results show that even with a rough forecast of the average temperature, the model can forecast with adequate accuracy for a much longer lead time than one day.
(5.1)

where

w_t = (1 − B)(1 − B^24)(1 − B^168) y_t = y_t − y_{t−1} − y_{t−24} + y_{t−25} − y_{t−168} + y_{t−169} + y_{t−192} − y_{t−193}
u_t = (1 − B)(1 − B^24)(1 − B^168) x_t = x_t − x_{t−1} − x_{t−24} + x_{t−25} − x_{t−168} + x_{t−169} + x_{t−192} − x_{t−193}

Here y_t is the load and x_t the temperature at hour t, and B is the backshift operator. The free parameters of the model are estimated using the load and temperature data. The structure of the model was chosen by examining the autocorrelation and partial autocorrelation functions of the data.

The testing was carried out by forecasting the load one day at a time for four different test weeks. The results are therefore comparable to those of section 5.2. The parameters were estimated using about five weeks of load data before each test day; during the test week, the parameters were thus re-estimated daily. The residuals were checked each time in order to discover possible inadequacies of the model. The average forecasting errors with both the SARIMAX model and the MLP model utilizing daily average temperatures are shown in figure 5.17. The training sets for the MLP model are two months long, except for the test set in February, where the training set is four months long.
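The triple differencing operator (1 − B)(1 − B^24)(1 − B^168) used above can be sketched as follows; an illustration only, not part of the thesis's estimation code.

```python
def triple_difference(y):
    """Apply the differencing operator (1 - B)(1 - B^24)(1 - B^168) of
    the SARIMAX model to an hourly series y, where B is the backshift
    operator. The differenced series is defined from index 193 onward."""
    return [y[t] - y[t - 1] - y[t - 24] + y[t - 25]
            - y[t - 168] + y[t - 169] + y[t - 192] - y[t - 193]
            for t in range(193, len(y))]
```

The expanded form is equivalent to differencing at lags 1, 24 and 168 in sequence, since the three operators commute.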
Figure 5.17: The average forecasting errors using the SARIMAX model and the MLP model with daily average temperatures.
All errors are slightly larger with the SARIMAX model than with the MLP model. Figures 5.18 and 5.19 show the forecasts for two test weeks with both the MLP model and the SARIMAX model. When forecasting for longer lead times, the MLP model becomes clearly superior to SARIMAX. The errors with the SARIMAX model grew unacceptably large quite fast as the lead time increased; forecasting for a whole week at once gave very inaccurate results. With the MLP model, on the other hand, the errors were almost as small as when forecasting one day at a time (see figure 5.16).
Figure 5.18: The actual load and forecast with the SARIMAX model (top) and the MLP model (bottom) on the period February 18 – February 24, 1997. The average forecasting error is 5.86 % with SARIMAX and 5.08 % with MLP (trained on four months).
Figure 5.19: The actual load and forecast with the SARIMAX model (top) and the MLP model (bottom) on the period April 20 – April 26, 1997. The average forecasting error is 4.05 % with SARIMAX and 3.19 % with MLP.
Test results

One network for all hours

First, training with the data of a whole year, from May 24, 1996 to May 23, 1997, was considered. Testing was carried out using the remaining data, from May 24, 1997 to August 8, 1997. The number of training cases is quite large: there are 24 cases for each day, and the number of days is more than 300. Therefore, the number of hidden layer neurons should also be quite large (see equation 3.3). Small hidden layer sizes were tried, but the results were inadequate. The problem with a large hidden layer is the long computation time needed in training. Even for a very simple input-output structure with p = 3 and q = 1, sufficient training with 50 hidden layer neurons took Matlab several hours to complete. For the 6 first test weeks (that is, the time before the summer holidays) the average error varied between 1.6 % and 1.9 %. Increasing the hidden layer size further would probably still improve the results. However, to avoid excessive growth of the computing time, reducing the training set was considered instead.

Training sets of one and two months were tested. The sets consist of the final part of the one-year data set used above. The week following the training sets was used as the test set. The reason for not using a longer test set is that training with a short training set would in practice be repeated often; there is therefore no sense in testing a network on load data of a much later time. The average forecasting errors with different model structures are given in tables 5.1 and 5.2. In almost all cases p = 3, because this was found the most suitable number of input load values in preliminary test runs. The number of output neurons varies between 1 and 7, and the number of hidden layer neurons between 10 and 30.
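The p-inputs, q-outputs structure of this model can be sketched as the following sliding-window construction of training patterns; an illustration only, with the function name assumed.

```python
def make_patterns(hourly_loads, p=3, q=5):
    """Build training patterns for a multi-output MLP that maps the p
    most recent hourly loads to the q following hourly loads (the
    short-lead-time model structure tested in this section)."""
    return [(hourly_loads[t - p:t], hourly_loads[t:t + q])
            for t in range(p, len(hourly_loads) - q + 1)]
```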
Table 5.1: The average forecasting errors for the test set May 24 – May 30, 1997. The training period is March 23 – May 23, 1997.
p  q  neurons  lead 1h  lead 2h  lead 3h  lead 4h  lead 5h  lead 6h  lead 7h
3  1     10      1.37
3  1     20      1.35
3  1     30      1.40
3  3     10      2.57     2.54     2.60
3  3     20      1.47     2.05     2.42
3  3     30      1.47     1.82     2.33
3  5     10      2.37     3.05     3.06     3.00     3.54
3  5     20      1.63     1.93     2.48     2.67     3.23
3  5     30      1.47     1.80     2.30     2.55     2.84
5  5     30      1.59     1.92     2.22     2.31     2.66
3  7     30      1.51     1.76     2.07     2.30     2.73     3.05     3.32
Table 5.2: The average forecasting errors for the test set May 24 – May 30, 1997. The training period is April 23 – May 23, 1997.
p  q  neurons  lead 1h  lead 2h  lead 3h  lead 4h  lead 5h  lead 6h  lead 7h
3  1     10      1.31
3  1     20      1.26
3  1     30      1.23
3  3     10      2.07     2.49     2.53
3  3     20      1.38     1.91     2.17
3  3     30      1.32     1.78     2.22
3  5     10      2.60     3.13     3.37     3.46     3.40
3  5     20      1.37     1.76     2.39     2.47     2.57
3  5     30      1.41     1.85     2.20     2.50     2.66
5  5     30      1.44     1.91     2.25     2.50     2.85
3  7     30      1.42     1.75     2.25     2.40     2.69     3.03     3.28
It seems clear that the results with the shorter training set are slightly better. The hidden layer size of 30 neurons seems superior to the smaller hidden layers. Increasing the output layer size does not seem to worsen the forecasting accuracy considerably at the shortest lead times. Therefore, the model with p = 3, q = 5 or 7, and 30 hidden layer neurons seems the most appropriate.
To obtain a more reliable picture of the accuracy, the model with three load inputs and five outputs was applied to the same test sets as used in the first part of this chapter. The training sets were approximately one month long. The average error percentages for different lead times are given in figure 5.20.
[Figure 5.20 omitted: average forecasting error (%) for each training set, one curve per lead time from 1 h to 5 h.]

Figure 5.20: The average forecasting errors for different training and test sets with p = 3, q = 5, and 30 hidden layer neurons. The test sets consist of the week immediately following the training set.
The results are good for the first two test cases. For the third one, however, the errors are large. This is the same week that caused large forecasting errors with the hour-by-hour model. Although the errors for the shortest lead times are quite small, there seems to be no clear improvement over the hour-by-hour model utilizing the data of the previous day and week.

Separate networks for different hours

The reason for using a separate network for each hour of the day is to enable training with the load data of a whole year within a reasonable time. The training set therefore consists of the data from May 24, 1996 to May 23, 1997, and the test set of the remaining data, from May 24, 1997 to August 18, 1997.
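The separate-networks scheme can be sketched as follows: the training cases are partitioned by the hour of day of the forecast origin, and one model is fitted per hour. In this hedged Python sketch a linear least-squares predictor stands in for the per-hour MLPs, to keep the example short; the partitioning logic is the point of interest, and all names are illustrative:

```python
import numpy as np

def fit_per_hour(load, p=3, q=1):
    """Fit one predictor per hour of day (0..23). 'load' is hourly data whose
    index t corresponds to hour-of-day t % 24. A linear least-squares model
    stands in here for the per-hour MLPs described in the text."""
    models = {}
    for hour in range(24):
        X, Y = [], []
        for t in range(p, len(load) - q + 1):
            if t % 24 == hour:                 # forecast origin falls on this hour
                X.append(load[t - p:t])
                Y.append(load[t:t + q])
        X = np.column_stack([np.array(X), np.ones(len(X))])  # add bias term
        coef, *_ = np.linalg.lstsq(X, np.array(Y), rcond=None)
        models[hour] = coef
    return models

def predict(models, recent, hour):
    """Forecast the next q loads from the p most recent loads, using the
    model for the given hour of day."""
    x = np.append(np.asarray(recent, dtype=float), 1.0)
    return x @ models[hour]
```

Because each of the 24 models sees only one twenty-fourth of the cases, a full year of data can be used while each individual training run stays small.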
The average error percentages with different model structures are given in table 5.3. The errors are clearly larger than with one network trained on a short training set. Larger hidden layer sizes were tried, but without considerable improvement in the results.
Table 5.3: The average forecasting errors with the model using separate networks for each hour of the day. The training set consists of the data between May 24, 1996 and May 23, 1997. The test set is the period May 24 - August 18, 1997.
p          3     3     3     5     5     5     3     3     5     3     3     5     5     2     5
q          1     1     1     1     1     1     3     3     3     3     5     5     5     5     5
neurons    3     5     7     3     5     7     5     7     7    10     5     5     7    10    10
lead 1h  2.35  4.14  2.30  2.43  2.40  2.37  2.50  2.42  2.55  2.30  2.65  3.09  2.64  2.32  2.50
lead 2h    -     -     -     -     -     -   3.38  3.08  3.20  3.04  3.31  3.29  3.16  3.04  3.14
lead 3h    -     -     -     -     -     -   3.73  3.73  3.78  3.68  3.70  3.92  3.78  3.67  3.77
lead 4h    -     -     -     -     -     -     -     -     -     -   4.11  7.03  4.21  4.13  4.26
lead 5h    -     -     -     -     -     -     -     -     -     -  11.04  4.67  4.60  4.45  4.61
5.5 Summary
In the first part of the chapter, an MLP network model utilizing the data of the previous day and week as well as the most recent hours was tested. The forecasting was carried out hour by hour. The model was tested with and without temperature data on four test weeks. The results were clearly better when daily average temperatures were included. The average errors varied between 2 % and 4 % on most test sets when forecasts were updated daily. These results are quite similar to most of those reported in the references of this work. As already mentioned, the conditions vary from one test case to another, and a more detailed comparison of error percentages to the literature values therefore seems pointless.

A seasonal ARIMAX model was also tested and compared to the MLP models. The results were slightly better with the MLP model. Especially at long lead times, the advantage of the MLP model over SARIMAX became evident. The average errors at lead times from 1 to 24 hours were examined in order to see how efficiently the model can utilize the most recent information. It was found that the accuracy was clearly better for the next few hours than for longer lead times.

In the second part of the chapter, a different MLP network was tested for the shortest lead times. The model utilized only the load data of the most recent hours. The results did not, however, show improvement over the hour-by-hour model.
6 Conclusions
Several neural network models for short-term load forecasting were studied in this work. The techniques were divided into two classes: models for creating the forecast for one whole day at a time (chapter 4), and models utilizing the most recent load data and allowing hourly forecasting (chapter 5). The focus was on discovering basic properties of the different model types. This was considered a necessary basis for a real forecasting application.

The problem with the models of chapter 4 is that they do not allow updating during the day. If the forecast starts to go wrong, the models cannot react to the most recent information until the end of the day. Therefore, if they were used in a real application, a method for making corrections in real time would be necessary. Also, a means of providing forecasts for a sufficient number of lead hours at all times should be developed.

In this sense, the hour-by-hour models of chapter 5 are more attractive (section 5.2). They allow hourly forecasting, and it was shown that this clearly improves the accuracy for the closest hours. Another good feature is the ability to forecast for an arbitrary lead time. Forecasting one week ahead was tested, and assuming perfect forecasts for daily average temperatures, the accuracy was good. Since the accuracy even with daily updating was better than with the static models of chapter 4, it is suggested that the hour-by-hour model type is more suitable for a real application. The model provided better results than the SARIMAX model, which was tested for comparison.

In chapter 5, the idea of further improving the accuracy for the closest hours using a separate forecaster was also examined. Another model, which utilizes only the most recent load data to forecast the next hours, was introduced for this purpose in section 5.4.
The test results did not, however, support the idea; the hour-by-hour model appeared about as accurate for these shortest lead times, and therefore seems the most appropriate model for all time spans studied in this work.

One future refinement to the model could be a more precise modeling of the temperature effect. In this work, the temperature was included as an input signal to the neural networks. The hourly temperature values could not be properly utilized this way; the best results were obtained when using only daily average temperatures. The reason for this seems to be that changes in temperature are slow, and successive measurements are correlated even at quite long delays. The temperature signal is therefore not rich enough for estimating many parameters in a complicated model (Räsänen and Ruusunen 1992). Another possible approach would be to separate the temperature effect and forecast the temperature-normalized load. The temperature-dependent part would then be added to this in order to get the final load forecast.

The test results of this work were obtained using the load data of a single Finnish electric utility over 15 months. As this is only one case, further experimental evidence on other cases is desired. Without it, no conclusions on the generality of the model can be made. The next step related to this work is, therefore, to implement the hour-by-hour model (section 5.2) and use it along with the current forecasting models on the data of several electric utilities.

There are some matters that have to be taken into account in implementing the model. First, the training of the network has to work automatically in a real-time environment. Different procedures can be tested. The simplest solution is to re-train the whole network at certain intervals. An alternative is to train continuously with the latest data. Giving more weight to the most recent data, by presenting it more often to the network, can also be tested.

Another matter requiring attention is the treatment of abnormal data points (outliers) and anomalous load conditions, such as special holidays. This has not been studied experimentally in this work. There are two separate problems. First, the abnormal data must be recognized in order to remove it from the training data. Second, forecasts as accurate as possible should be created for the special days and the days following them.
All of this should be automated as far as possible. The first problem is perhaps easier to handle: in the literature, several criteria for detecting outliers have been suggested (see, e.g., Karanta and Ruusunen 1991). Forecasting for the special days has also been treated by many researchers (e.g., Hsu and Yang 1993, Lamedica et al. 1996, Kim et al. 1995). A common approach is to treat them as Sundays, and the days after them as Mondays. This is a simple solution, but it requires some refinements for the model at hand. In order to obtain greater accuracy, more sophisticated methods should be considered as future improvements.

This work shows that the neural network models, specifically the hourly MLP models developed in chapter 5, have many of the required properties listed in chapter 1. The model is suited for an automatic application, and implementation in an energy management system is relatively easy. In the light of the tests in different parts of the year, it can adapt well to different weather conditions. A particularly good property is the ability to forecast accurately for all lead times from one hour to one week. The problem with models of this kind lies in their reliability; the forecasts are created in a rather complicated way, and it is very difficult to understand how the model works. Therefore, the behavior in abnormal conditions may be unexpected. For this reason, thorough on-line testing is necessary in order to ensure reliability in varying conditions. Only this will provide a definite opinion on the applicability of the model.
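As a concrete illustration of the outlier-screening step discussed above, a simple criterion compares each observation with the average profile of the same hour of day. This particular z-score rule is an illustrative assumption, not a method taken from this thesis or from Karanta and Ruusunen (1991):

```python
import numpy as np

def flag_outliers(load, k=4.0):
    """Flag hours whose load deviates strongly from the mean load of the same
    hour of day. A simple z-score rule, one of many possible criteria; k is
    the (assumed) number of standard deviations tolerated."""
    load = np.asarray(load, dtype=float)
    hours = np.arange(len(load)) % 24          # hour-of-day of each sample
    flags = np.zeros(len(load), dtype=bool)
    for h in range(24):
        vals = load[hours == h]
        mu, sd = vals.mean(), vals.std()
        if sd > 0:                             # skip degenerate (constant) hours
            flags[hours == h] = np.abs(vals - mu) > k * sd
    return flags
```

Flagged points would be removed from (or replaced in) the training data before the network is re-trained; in an on-line setting the same test could also warn the operator of anomalous load conditions.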
References

Allera, S. V., and J. E. McGowan, 1986, "Medium-term forecasts of half-hourly system demand: development of an interactive demand estimation coefficient model", IEE Proceedings-C, Vol. 133, No. 7, November 1986, pp. 393-396.

Asar, A., J. R. McDonald, 1994, "A specification of neural network applications in the load forecasting problem", IEEE Transactions on Control Systems Technology, Vol. 2, No. 2, June 1994, pp. 135-141.

Bakirtzis, A. G., J. B. Theocharis, S. J. Kiartzis, K. J. Satsios, 1995, "Short term load forecasting using fuzzy neural networks", IEEE Transactions on Power Systems, Vol. 10, No. 3, August 1995, pp. 1518-1524.

Baumann, T., A. J. Germond, 1993, "Application of the Kohonen Network to short-term load forecasting", ANNPS, Yokohama, April 1993.

Bazaraa, M. S., H. D. Sherali, C. M. Shetty, 1993, "Nonlinear programming: theory and algorithms", 2nd edition, John Wiley & Sons, Singapore.

Box, G. E. P., G. M. Jenkins, 1976, "Time series analysis: forecasting and control", Holden-Day, San Francisco.

Broehl, J. H., 1981, "An end-use approach to demand forecasting", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-100, No. 6, June 1981, pp. 2714-2718.

Bunn, D. W., E. D. Farmer (eds.), 1985, "Comparative models for electrical load forecasting", John Wiley & Sons, Belfast.

Campo, R. and P. Ruiz, 1987, "Adaptive weather-sensitive short-term load forecast", IEEE Transactions on Power Systems, Vol. PWRS-2, No. 3, August 1987, pp. 592-600.

Chen, S.-T., D. C. Yu, A. R. Moghaddamjo, 1992, "Weather sensitive short-term load forecasting using nonfully connected artificial neural network", IEEE Transactions on Power Systems, Vol. 7, No. 3, August 1992, pp. 1098-1102.

Chow, T. W. S., C. T. Leung, 1996, "Neural network based short-term load forecasting using weather compensation", IEEE Transactions on Power Systems, Vol. 11, No. 4, November 1996, pp. 1736-1742.

Dash, P. K., H. P. Satpathy, A. C. Liew, S. Rahman, 1997, "A real-time short-term load forecasting system using functional link network", IEEE Transactions on Power Systems, Vol. 12, No. 2, May 1997, pp. 675-680.

Djukanovic, M., B. Babic, D. J. Sobajic, Y.-H. Pao, 1993, "Unsupervised/supervised learning concept for 24-hour load forecasting", IEE Proceedings-C, Vol. 140, No. 4, July 1993, pp. 311-318.

Economakos, E., 1979, "Application of fuzzy concepts to power demand forecasting", IEEE Transactions on Systems, Man, and Cybernetics, Vol. SMC-9, No. 10, October 1979, pp. 651-657.

Funahashi, K., 1989, "On the approximate realization of continuous mapping by neural networks", Neural Networks, Vol. 2, No. 3, 1989, pp. 183-192.

Goh, T. N., S. S. Choi, and S. B. Chien, 1983, "Forecasting of electricity demand by end-use characteristics", Electric Power Systems Research, Vol. 10, No. 2, March 1986, pp. 145-148.
Gross, G., F. D. Galiana, 1987, "Short-term load forecasting", Proceedings of the IEEE, Vol. 75, No. 12, December 1987, pp. 1558-1573.

Gupta, P. C. and K. Yamada, 1972, "Adaptive short-term load forecasting of hourly loads using weather information", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-91, No. 5, Sept./Oct. 1972, pp. 2085-2094.

Hagan, M. T. and S. M. Behr, 1987, "The time series approach to short term load forecasting", IEEE Transactions on Power Systems, Vol. PWRS-2, No. 3, August 1987, pp. 785-791.

Haykin, S., 1994, "Neural networks: a comprehensive foundation", MacMillan College Publ. Co., New York.

Ho, K.-L., Y.-Y. Hsu, C.-C. Yang, 1992, "Short-term load forecasting using a multilayer neural network with an adaptive learning algorithm", IEEE Transactions on Power Systems, Vol. 7, No. 1, February 1992, pp. 141-148.

Hornik, K., M. Stinchcombe, H. White, 1989, "Multilayer feedforward networks are universal approximators", Neural Networks, Vol. 2, 1989, pp. 359-366.

Hsu, Y.-Y., C.-C. Yang, 1991, "Design of artificial neural networks for short-term load forecasting. Part I: Self-organising feature maps for day type selection", IEE Proceedings-C, Vol. 138, No. 5, September 1991, pp. 407-413.

Hsu, Y.-Y., C.-C. Yang, 1991, "Design of artificial neural networks for short-term load forecasting. Part II: Multilayer feedforward networks for peak load and valley load forecasting", IEE Proceedings-C, Vol. 138, No. 5, September 1991, pp. 414-418.

Hsu, Y.-Y., K.-L. Ho, 1992, "Fuzzy expert systems: an application to short-term load forecasting", IEE Proceedings-C, Vol. 139, No. 6, November 1992, pp. 471-477.

Jabbour, K., J. F. V. Riveros, D. Landsbergen, and W. Meyer, 1988, "ALFA: Automated Load Forecasting Assistant", IEEE Transactions on Power Systems, Vol. 2, No. 3, August 1988, pp. 908-914.

Kallio, M., 1985, "Lämpötilatekijä sähkönkulutuksen ennustamisessa", Master's thesis, Systems Analysis Laboratory, Helsinki University of Technology (in Finnish).
Karanta, I., 1990, "Short-term load forecasting in communal electric utilities", Licentiate thesis, Systems Analysis Laboratory, Helsinki University of Technology.

Karanta, I., J. Ruusunen, 1991, "Short term load forecasting in communal electric utilities", Research Report A40, Systems Analysis Laboratory, Helsinki University of Technology.

Khotanzad, A., R.-C. Hwang, A. Abaye, D. Maratukulam, 1995, "An adaptive modular artificial neural network hourly load forecaster and its implementation at electric utilities", IEEE Transactions on Power Systems, Vol. 10, No. 3, August 1995, pp. 1716-1722.

Khotanzad, A., M. H. Davis, A. Abaye, D. J. Maratukulam, 1996, "An artificial neural network hourly temperature forecaster with applications in load forecasting", IEEE Transactions on Power Systems, Vol. 11, No. 2, May 1996, pp. 870-876.

Kim, K.-H., J.-K. Park, K.-J. Hwang, S.-H. Kim, 1995, "Implementation of hybrid short-term load forecasting system using artificial neural networks and fuzzy expert systems", IEEE Transactions on Power Systems, Vol. 10, No. 3, August 1995, pp. 1534-1539.
Kohonen, T., 1987, "Self-Organization and Associative Memory", 2nd edition, Springer-Verlag, Berlin.

Kohonen, T., 1997, "Self-Organizing Maps", 2nd edition, Springer-Verlag, Berlin.
Laing, W. D., 1985, "Time series methods for predicting the CEGB demand", in: Bunn, D. W., E. D. Farmer (eds.), "Comparative models for electrical load forecasting", John Wiley & Sons, Belfast, 1985, pp. 69-85.

Lamedica, R., A. Prudenzi, M. Sforna, M. Caciotta, V. O. Cencelli, 1996, "A neural network based technique for short-term forecasting of anomalous load periods", IEEE Transactions on Power Systems, Vol. 11, No. 4, November 1996, pp. 1749-1756.

Lee, K. Y., J. H. Park, 1992, "Short-term load forecasting using an artificial neural network", IEEE Transactions on Power Systems, Vol. 7, No. 1, February 1992, pp. 124-130.

Lu, C. N., H. T. Wu, S. Vemuri, 1993, "Neural network based short term load forecasting", IEEE Transactions on Power Systems, Vol. 8, No. 1, February 1993, pp. 336-342.

Moghram, I., S. Rahman, 1989, "Analysis and evaluation of five short-term load forecasting techniques", IEEE Transactions on Power Systems, Vol. 4, No. 4, October 1989, pp. 1484-1491.

Mohammed, O., D. Park, R. Merchant, T. Dinh, C. Tong, A. Azeem, J. Farah, C. Drake, 1995, "Practical experiences with an adaptive neural network short-term load forecasting system", IEEE Transactions on Power Systems, Vol. 10, No. 1, February 1995, pp. 254-265.

Momoh, J. A., K. Tomsovic, 1995, "Overview and literature survey of fuzzy set theory in power systems", IEEE Transactions on Power Systems, Vol. 10, No. 3, August 1995, pp. 1676-1690.

Oja, E., 1997, Lecture material for the course Principles of Neural Computing, held at Helsinki University of Technology in spring 1997.

Papalexopoulos, A. D., T. C. Hesterberg, 1990, "A regression-based approach to short-term system load forecasting", IEEE Transactions on Power Systems, Vol. 5, No. 4, November 1990, pp. 1535-1547.

Park, D. C., M. A. El-Sharkawi, R. J. Marks II, L. E. Atlas, M. J. Damborg, 1991a, "Electric load forecasting using an artificial neural network", IEEE Transactions on Power Systems, Vol. 6, No. 2, May 1991, pp. 442-449.

Park, J. H., Y. M. Park, K. Y. Lee, 1991b, "Composite modeling for adaptive short-term load forecasting", IEEE Transactions on Power Systems, Vol. 6, No. 2, May 1991, pp. 450-457.

Peng, T. M., N. F. Hubele, G. G. Karady, 1992, "Advancement in the application of neural networks for short-term load forecasting", IEEE Transactions on Power Systems, Vol. 7, No. 1, February 1992, pp. 250-256.

Peng, T. M., N. F. Hubele, G. G. Karady, 1993, "An adaptive neural network approach to one-week ahead load forecasting", IEEE Transactions on Power Systems, Vol. 8, No. 3, August 1993, pp. 1195-1203.
Piggot, J. L., 1985, "Short-term forecasting at British Gas", in: Bunn, D. W., E. D. Farmer (eds.), "Comparative models for electrical load forecasting", John Wiley & Sons, Belfast, 1985, pp. 173-211.

Pindyck, R. S., D. L. Rubinfeld, 1991, "Econometric models & economic forecasts", McGraw-Hill, Singapore.

Piras, A., B. Buchenel, Y. Jaccard, 1996, "Heterogeneous artificial neural network for short term electrical load forecasting", IEEE Transactions on Power Systems, Vol. 11, No. 1, February 1996, pp. 397-402.

Rahman, S., R. Bhatnagar, 1988, "An expert system based algorithm for short term load forecast", IEEE Transactions on Power Systems, Vol. 3, No. 2, May 1988, pp. 392-399.

Rahman, S., O. Hazim, 1993, "A generalized knowledge-based short-term load-forecasting technique", IEEE Transactions on Power Systems, Vol. 8, No. 2, May 1993, pp. 508-514.

Räsänen, M., J. Ruusunen, 1992, "Verkoston tilan seuranta mittauksilla ja kuormitusmalleilla", Research Report B17, Systems Analysis Laboratory, Helsinki University of Technology (in Finnish).

Räsänen, M., 1995, "Modeling processes in the design of electricity tariffs", Research Report A60, Systems Analysis Laboratory, Helsinki University of Technology.

Sforna, M., F. Proverbio, 1995, "A neural network operator oriented short-term and online load forecasting environment", Electric Power Systems Research 33, 1995, pp. 139-149.

Sharma, K. L. S., A. K. Mahalanabis, 1974, "Recursive short-term load forecasting algorithm", Proceedings of the Institution of Electrical Engineers, Vol. 121, January 1974, pp. 59-62.

Srinivasan, D., C. S. Chang, A. C. Liew, 1995, "Demand forecasting using fuzzy neural computation, with special emphasis on weekend and public holiday forecasting", IEEE Transactions on Power Systems, Vol. 10, No. 4, November 1995, pp. 1897-1903.

Söderström, T., P. Stoica, 1989, "System identification", Prentice Hall International, Cambridge.

Thompson, R. P., 1976, "Weather sensitive electric demand and energy analysis on large geographically diverse power system: Application to short term hourly electric demand forecasting", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-95, No. 1, Jan./Feb. 1976, pp. 385-393.

Toyoda, J., M.-C. Chen, and Y. Inoue, 1970, "An application of state estimation to short-term load forecasting, Part I: Forecasting modeling, Part II: Implementation", IEEE Transactions on Power Apparatus and Systems, Vol. PAS-89, No. 7, September/October 1970, pp. 1678-1688.