Documente Academic
Documente Profesional
Documente Cultură
Murat Gunduz1
Haci Bayram Sahin2
ABSTRACT
INTRODUCTION
One of the indispensable demands in every part of life is electricity due to not only
widely use of electrical equipments in homes but also in factories to produce
manufactured goods. Every moment the usage of electricity is increasing because of
rising world population and rapid development of technology. Electricity can be
generated from various sources such as nuclear, coal, natural gas, fuel oil, water, wind,
sun etc. Electricity derived from any renewable energy source (RES) is considered
‘‘green’’ because of the negligible impact on greenhouse gas emissions (Rahman, 2003).
In this study a cost estimation model was developed for HEPP projects with regression
technique by using past data of fifty four projects. In order to calculate the cost of a
HEPP project accurately, a detailed hydrological study, site investigation, good basin
planning, geotechnical survey and various tests are essential. Not only do these steps take
a long time and have cost but without proper planning and execution, they may be a
waste of time and money. Since these design stages are too time consuming, other fast
yet accurate methods are required (Verlinden et al., 2008). Improved cost estimation
techniques, which are available to project managers, will facilitate more effective control
of time and costs in construction projects (Hegazy, 2002).
In literature, different cost estimation methods can be found, resulting in three main
approaches as follows; 1) Variant-based cost estimation, 2) Generative cost estimation,
and 3) Hybrid cost estimation (Wierda, 1990). Variant-based cost estimation method is
developing the relationship between a forthcoming project cost and a similar past project.
In this study variant-based cost estimation method is studied by analyzing the data with
multiple regression analysis (MRA).
1
PhD, Associate Professor, Department of Civil Engineering, Middle East Technical University, Ankara,
Turkey, gunduzm@metu.edu.tr
2
President, HHA Engineering Ltd. Co., Ankara, Turkey, hbayram.sahin@hha.com.tr
Regression analysis is a data oriented technique because it deals directly with the
collected data without considering the process behind these data (Zayed and Halpin,
2005). In order to construct a cost related data set fifty four feasibility reports of various
types of HEPP projects were examined carefully. These projects cover new projects and
therefore there is no need to apply an escalation factor. Types of collected parameters are
shown in Table 1.
In this study quantity take-off costs of HEPP projects were estimated by using historical
data because of unpredicted cost are difficult to calculate. Since expropriation costs vary
based on the area of the land and the value of the land, it is almost impossible to estimate
expropriation cost in the early stages. Thus, estimation of quantity take-off cost is
reasonable based on the data which are given the Table 1. An example of cost table of
one project among fifty four projects is represented in Table 2.
1 Coffer Dams
Sub-Total 517,686
Sub-Total 105,702,884
Regression Analysis
Regression cost models have been used for estimating cost since the 1970s because they
have the advantage of a well-defined mathematical basis as well as measures of how well
a curve matches a given data set (Kim et al., 2004). Regression or multiple regression
analysis is a very powerful statistical tool that can be used as both an analytical and
predictive technique in examining the contribution of potential new items to the overall
estimate of reliability (Skitmore, 1990).
Kim et al. (2004) states that multiple regression analysis can be generally represented in
the form of
At the end of regression analysis two evaluation parameters, R2-value and p-values, are
used to estimate the fit of the model. High values for R2 means (around 0.9 value) that a
reliable relationship has been constructed between the estimated variable and predictor
variables. P-values of each parameter used in the model should be around the 0.1 value or
less in order to have significant variability in the model.
s XY 1 n (x − x)i (yi − y) ∑ x i yi − nx y
rXY = = ∑ = (6)
s Xs Y n ı =1 s X sY ⎡ 1 2⎤ ⎡ 1 2⎤
⎢∑ x i − ( n )(∑ x i ) ⎥ ⎢∑ y i − ( n )(∑ yi ) ⎥
2 2
⎣ ⎦ ⎣ ⎦
where x bar and sx are the sample mean and standard deviation for one sample, and y bar
and sy are the sample mean and standard deviation of other sample . If the correlation
coefficient of two variables is close to either +1 or -1, these two variables are highly
correlated and one of them should be disregarded in the regression model. All predictor
variables are analyzed by Minitab software whether each pairs of variable are correlated
or not.
In this study, if the p-value between two parameters is higher than 0.80, these two
parameters are assumed to be highly correlated to each other. Table 3 represents all
correlation coefficients and Table 4 shows the correlated parameter pairs and the Pearson
correlation values.
Qave 0.840
Qd 0.836 0.991
- -
Hd 0.221 0.141 0.118
Qd – Qave : This pair has the highest p-value among all pairs. The reason why these
two pairs are highly correlated is that both variables are the discharges and they have a
very strong relationship. In other words, design discharge is determined by considering
Qave –Q5 Average discharge was calculated by considering rainfall dropped on the
basin area. In other words, a greater the basin area means a higher the average discharge.
Five-year occurrence flood discharge is affected by the basin characteristics. Although
these two parameters are not seen as related parameters, correlation analysis shows that
they are highly correlated. Five-year occurrence flood discharge is highly correlated with
design discharge also. Therefore five-year occurrence flood discharge was not included in
the MRA.
Based on the above discussion, the disregarded parameters are IC, Qave, Dp, Q5 and A.
These parameters were not included in the regression model due to correlation with other
parameters. Therefore, the regression analysis was performed with seven parameters: Qd,
Hd, Lt, Lc, Letl, Lp and Q100.
The final regression model is performed in steps. In every step if there exists an
unnecessary parameter, which has little significance on the model, it is dropped from the
model and the analysis is repeated until all parameters are effective in the analysis. This
procedure called is parsimonious modeling and Pankratz (1983) states that the principle
In parsimonious models, a backward elimination method is used for the initial regression
model. According to this technique, variables that were not contributing to the model are
eliminated one by one at each step. The regression statistic, significance level (P value,
which gives an indication of the significance of the variables included in the model) is
used for determination of variables to be eliminated. In general, the variables
corresponding to the coefficients with P values close to or less than 0.10 are considered to
have significant contribution to the model (Ontepeli, 2005). This procedure is repeated
until all predictor variables having p-values close to 0.1 or smaller than 0.1 in order to
develop regression model in this study.
In order to use the prediction models all parameters should be normalized by dividing
corresponding parameter with given numbers as shown in the table 5 before putting into
the model. The result will be the billion TL and it can be converted into million TL by
multiplying the result with thousand.
In the first regression analysis, the performance of Lc has a p-value of 0.993 and
therefore this parameter was eliminated in the next regression analysis.
Analysis of Variance
Source DF SS MS F P
Regression 7 0.411070 0.058724 44.39 0.000
Residual Error 41 0.054234 0.001323
Total 48 0.465304
A second regression analysis was performed. Although ‘constant parameter’ has the
highest p-value, which is 0.183, this parameter cannot be disregarded in this regression
analysis. Thus, instead of the ‘constant parameter’, Lp was disregarded as it has the
second highest p- value.
Analysis of Variance
Source DF SS MS F P
Regression 6 0.411070 0.068512 53.06 0.000
Residual Error 42 0.054234 0.001291
Total 48 0.465304
Analysis of Variance
Source DF SS MS F P
Regression 5 0.406985 0.081397 60.02 0.000
Residual Error 43 0.058319 0.001356
Total 48 0.465304
CONCLUSION
The accuracy of the regression model was checked after developing the model. The
performances of models were tested by using mean absolute percentage error (MAPE) of
which formula is given below.
In the formula “i” is the number of project, “actual” is the real cost of that project and
“predicted” is the predicted cost of that project calculated by estimation models. The
regression model was checked by comparing the actual cost of the testing projects with
predicted costs of those testing projects from the developed model. Table 6 shows the
predicted costs of testing projects, percent error and MAPE. These testing projects have
installed capacity from 4 MW to 110 MW, design discharge from 2.88 m3/s to 119 m3/s
and design head from 111 m to 335 m.
The costs of the testing projects are estimated with 10% error by the regression model
developed for this study. Since it is difficult to estimate the cost of hydroelectric power
plant projects at the early stages, this accuracy is in the acceptable range.
In conclusion, the aim of this study is to develop an estimation model for cost calculation
of HEPP projects in the early stage of the projects. Data from fifty-four HEPP projects
data were gathered and analyzed by regression technique. These fifty four sample project
costs were determined in different years from year 2005 to 2008. Time is important in
construction in terms of cost of material and labor and it is considered while developing
the model in this study. All costs of sample projects represent the year 2008 prices. Costs
of projects having prices other than 2008 year are recalculated by using official Turkey
escalation coefficient of related year. Therefore, when cost of a project is estimated by
using this model the estimated cost represents the 2008 prices. 2008 year cost can be
converted any year cost with escalation coefficient. Because escalation coefficient is
assigned by determining the changes in prices.
For the regression analysis statistical software called Minitab was used. Highly correlated
parameters were detected by correlation analysis and these parameters were not included
in regression analysis. After three trials, at the end of each trial the less significant
parameter was disregarded in next trial and the final regression model was constructed.
The developed final model was checked by comparing the actual cost of testing projects
with predicted costs of those testing projects from the developed model. The average
error of the model is 10% which enables investors to predict the cost of a HEPP project at
the beginning. Cost prediction models for HEPP projects can be improved by developing
sub models for each structure of HEPP projects. Prediction formula includes a length of
electricity transmission line which varies project to project and it does not have great
influence all type of projects. Therefore a new model can be developed after disregarding
the energy transmission line.
Hegazy T., (2002). Computer-based Construction Project Management. Upper Saddle River,
NJ: Prentice-Hall Inc.
Kim G.H, Sung-Hoon A., Kyung-In K., (2004). Comparison of construction cost estimating
models based on regression analysis, neural networks, and case-based reasoning. Building
and Environment 39, pp 1235 – 1242.
Meyer R., Krueger D., (1998). A Minitab Guide To Statistics, Prentice Hall Inc.
Ogayar B., Vidal P. G., (2009), Cost Determination of the Electro-Mechanical Equipment of
a Small Hydro-Power Plant, Renewable Energy 34. 6-13.
Ontepeli M.B., (2005). Conceptual cost estimating of urban railway system projects. Ms.
Thesis presented to Middle East Technical University (METU), Ankara, Turkey.
Pankratz A., (1983). Forecasting with univariate Box-Jenkins models, John Wiley and Sons,
New York.
Rahman S., (2003), Green Power: What it is and Where can we find it?, IEEE Power and
Energy Magazine, 30-37.
Shtub A., Versano R., (1999), Estimating the cost of steel pipe bending, a comparison
between neural networks and regression analysis
Singal S. K., Saini R.P., (2008), Cost analysis of low-head dam-toe small hydropower plants
based on number of generating units, Energy for Sustainable Development Volume XII No 3.
55-60.
Smith A.E., Mason A.K., (1999). Cost estimation predictive modeling: Regression versus
neural networks. Engineering economist 42 (2), 137–161.
Verlinden B., J.R. Duflou, P. Collin, D. Cattrysse. (2008). Cost estimation for Sheet Metal
Parts Using Multiple Regression and Artificial Neural Networks: A Case Study. International
Journal of Production Economics 111. 484 – 492.
Wierda, L., (1990). Cost Information Tools for Designers. Delft University Press, Delft, The
Netherlands.
Zayed M. Tarek, Halpin W. Daniel, (2005), Productivity and Cost Regression Models for
Pile Construction, Journal of Construction Engineering and Management 131. 779-789.