Sunteți pe pagina 1din 12

A CONCEPTUAL COST ESTIMATION MODEL FOR HYDROELECTRIC

POWER PLANT PROJECTS

Murat Gunduz1
Haci Bayram Sahin2

ABSTRACT

Due to high number of design parameters and complexity of construction, hydroelectric


power plant project costs are affected by many parameters. In this paper, 54 feasibility
reports of hydroelectric power plant (HEPP) projects are analyzed. A database is
constructed that has the project costs, design parameters and project features. The data set
is analyzed by using multiple regression analysis. A cost estimation model is developed
to estimate the hydroelectric power plant project cost. This model informs investors about
the conceptual cost of the HEPP projects without detailed investigations at the early
stages of project.

INTRODUCTION

One of the indispensable demands in every part of life is electricity due to not only
widely use of electrical equipments in homes but also in factories to produce
manufactured goods. Every moment the usage of electricity is increasing because of
rising world population and rapid development of technology. Electricity can be
generated from various sources such as nuclear, coal, natural gas, fuel oil, water, wind,
sun etc. Electricity derived from any renewable energy source (RES) is considered
‘‘green’’ because of the negligible impact on greenhouse gas emissions (Rahman, 2003).

In this study a cost estimation model was developed for HEPP projects with regression
technique by using past data of fifty four projects. In order to calculate the cost of a
HEPP project accurately, a detailed hydrological study, site investigation, good basin
planning, geotechnical survey and various tests are essential. Not only do these steps take
a long time and have cost but without proper planning and execution, they may be a
waste of time and money. Since these design stages are too time consuming, other fast
yet accurate methods are required (Verlinden et al., 2008). Improved cost estimation
techniques, which are available to project managers, will facilitate more effective control
of time and costs in construction projects (Hegazy, 2002).

In literature, different cost estimation methods can be found, resulting in three main
approaches as follows; 1) Variant-based cost estimation, 2) Generative cost estimation,
and 3) Hybrid cost estimation (Wierda, 1990). Variant-based cost estimation method is
developing the relationship between a forthcoming project cost and a similar past project.
In this study variant-based cost estimation method is studied by analyzing the data with
multiple regression analysis (MRA).

1
PhD, Associate Professor, Department of Civil Engineering, Middle East Technical University, Ankara,
Turkey, gunduzm@metu.edu.tr
2
President, HHA Engineering Ltd. Co., Ankara, Turkey, hbayram.sahin@hha.com.tr

A Conceptual Cost Estimation Model 1019


DEVELOPMENT OF THE MODEL

In order to estimate costs of construction projects researchers have developed different


techniques and models to implement for different types of construction projects. Ogayar
and Vidal (2009) studied the determination of electromechanical equipment cost of small
HEPP projects. Singal and Saini (2008) analyzed the cost of small dam, hydropower
plants having low-head dams based on the number of generating units. Kim et. al (2004)
collected the data of realized costs of 530 residential building constructed in Seoul. In
general, data is analyzed by three different methods which are regression, neural network
and case-based reasoning methods. Shtub and Versano (1999) compared neural network
and regression analysis methods while estimating the cost of pipe bending. Smith and
Mason (1999) applied regression and neural network techniques to a real world cost
estimation data set in order to estimate cost of pressure vessels. For this study, a
regression analysis method was used.

Analyzed Data Set

Regression analysis is a data oriented technique because it deals directly with the
collected data without considering the process behind these data (Zayed and Halpin,
2005). In order to construct a cost related data set fifty four feasibility reports of various
types of HEPP projects were examined carefully. These projects cover new projects and
therefore there is no need to apply an escalation factor. Types of collected parameters are
shown in Table 1.

Table 1. Analyzed Data Set


Abbreviation Definition
PC Project Cost
IC Installed Capacity
Qa Average Discharge
Qd Project Design Discharge
Hd Project Design Head
Lt Length of Tunnel
Lc Length of Channel
Letl Length of Energy Transmission Line
Dp Diameter of Penstock
Lp Length of Penstock
Q5 Five Year Occurrence Flood Discharge
Q100 Hundred Year Occurrence Flood Discharge
A Catchment Area of The Basin

1020 Collaborative Management of Integrated Watersheds


The formulations of quantity take-off cost, facility cost, project cost and total investment
cost can be written as follows;

Quantity Take-off Cost = CC + EMC + HMC (1)

Facility Cost = Quantity Take-off Cost +UC (2)

Project Cost = Facility Cost +Total ECC + EC (3)

Total Investment Cost = Project Cost + InC (4)

Whereas CC is construction cost, EMC is electromechanical cost, UC is unpredicted cost,


EC expropriation cost, ECC is engineering and consultancy cost and InC is interest cost.

In this study quantity take-off costs of HEPP projects were estimated by using historical
data because of unpredicted cost are difficult to calculate. Since expropriation costs vary
based on the area of the land and the value of the land, it is almost impossible to estimate
expropriation cost in the early stages. Thus, estimation of quantity take-off cost is
reasonable based on the data which are given the Table 1. An example of cost table of
one project among fifty four projects is represented in Table 2.

Table 2. Cost Table of HEPP Project -1


HEPP Project-1 Cost Table
Cost (TL)

1 Coffer Dams

1.a Upstream Cofferdam 372,795

1.b Downstream Cofferdam 144,891

Sub-Total 517,686

2 Derivation Tunnel, Sluice Outlet and Valve Room 3,991,901


3 Dam and Spillway
3.a Rockfill Dam With Clay Core 1,161,642
3.b Spillway 4,417,011
Sub-Total 5,578,653

4 Inlet Structure 2,979,958


5 Conveyance Tunnels and Approach Tunnels

5.a Conveyance Tunnel 104,390,161


5.b Approach Tunnels

A Conceptual Cost Estimation Model 1021


1,312,723

Sub-Total 105,702,884

6 Surge Tank 2,328,672

7 Power House 3,279,445

8 Penstock and Penstock Supports 5,004,770

9 Road Relocation and Work Site Roads 2,154,700

10 Permanent Housing 1,124,486

11 Electromechanical Equipment 48,360,000

12 Energy Transmission Line 3,288,480

Quantity Take-off Cost (Sum of Lines 1-12) 184,311,634


Unpredicted Cost 27,646,745
Facility Cost (Quantity Take-off Cost + Unpredicted Cost) 211,958,379
Engineering and Consultancy Cost for Construction Works 22,884,394
Engineering and Consultancy Cost for Electromechanical
Works 2,969,788
Expropriation Cost 3,719,000
Project Cost 241,531,561
Interest Cost During Construction Period 32,041,864
Total Investment Cost 273,573,425

The exchange rate for TL is: 1$ = 1.5 TL

Regression Analysis

Regression cost models have been used for estimating cost since the 1970s because they
have the advantage of a well-defined mathematical basis as well as measures of how well
a curve matches a given data set (Kim et al., 2004). Regression or multiple regression
analysis is a very powerful statistical tool that can be used as both an analytical and
predictive technique in examining the contribution of potential new items to the overall
estimate of reliability (Skitmore, 1990).

Kim et al. (2004) states that multiple regression analysis can be generally represented in
the form of

Y = C + b1X1 + b2X2 + … + bnXn; (5)

1022 Collaborative Management of Integrated Watersheds


where Y is the total estimated cost, and X1; X2; : : : ; Xn are measures of distinguishable
variables that may help in estimating Y. The parameter C is the estimated constant, and
b1; b2; : : : ; bn are the coefficients estimated by regression analysis, given the availability
of some relevant data.

At the end of regression analysis two evaluation parameters, R2-value and p-values, are
used to estimate the fit of the model. High values for R2 means (around 0.9 value) that a
reliable relationship has been constructed between the estimated variable and predictor
variables. P-values of each parameter used in the model should be around the 0.1 value or
less in order to have significant variability in the model.

Correlation of Variables: The variables of a regression model can be tested statistically to


select the best number of variables that best fit the available data (Zayed and Halpin,
2005). If some of the variables are interconnected multicollinearity occurs, generating
wrong cost equation relationships, and result in inaccurate cost estimates (Wang et al.,
1990). Correlation analysis for this study was performed with a statistical software called
Minitab in order to find out if mullticollinearity of the parameters exists. The equation of
correlation coefficient can be calculated by equation (6).

s XY 1 n (x − x)i (yi − y) ∑ x i yi − nx y
rXY = = ∑ = (6)
s Xs Y n ı =1 s X sY ⎡ 1 2⎤ ⎡ 1 2⎤
⎢∑ x i − ( n )(∑ x i ) ⎥ ⎢∑ y i − ( n )(∑ yi ) ⎥
2 2

⎣ ⎦ ⎣ ⎦

where x bar and sx are the sample mean and standard deviation for one sample, and y bar
and sy are the sample mean and standard deviation of other sample . If the correlation
coefficient of two variables is close to either +1 or -1, these two variables are highly
correlated and one of them should be disregarded in the regression model. All predictor
variables are analyzed by Minitab software whether each pairs of variable are correlated
or not.

In this study, if the p-value between two parameters is higher than 0.80, these two
parameters are assumed to be highly correlated to each other. Table 3 represents all
correlation coefficients and Table 4 shows the correlated parameter pairs and the Pearson
correlation values.

A Conceptual Cost Estimation Model 1023


Table 3. All Correlation Coefficients

IC Qave Qd Hd Lt Lc Letl Dp Lp Q5 Q100

Qave 0.840

Qd 0.836 0.991
- -
Hd 0.221 0.141 0.118

Lt 0.352 0.074 0.030 0.524


- - - - -
Lc 0.398 0.291 0.283 0.158 0.452
-
Letl 0.322 0.128 0.107 0.125 0.131 0.224
- -
Dp 0.816 0.847 0.805 0.138 0.246 0.281 0.296
-
Lp 0.351 0.169 0.203 0.769 0.410 0.132 0.033 0.027
- -
Q5 0.638 0.818 0.821 0.221 0.001 0.290 0.204 0.785 -0.047
- -
Q100 0.536 0.776 0.783 0.238 0.075 0.283 0.070 0.687 -0.064 0.914
- - - -
A 0.535 0.857 0.834 0.258 0.065 0.146 0.036 0.675 0.043 0.688 0.745

Table 4. Highly Correlated Variable Pairs and Correlation Coefficient


Pearson
Correlated Correlation
Parameters Values
Qd-Qave 0.991
Qd-IC 0.836
Qave-IC 0.840
Dp-Qave 0.847
Dp-IC 0.816
Dp-Qd 0.805
Q5-Qave 0.818
Q5-Qd 0.821
A-Qave 0.857
A-Qd 0.834

Qd – Qave : This pair has the highest p-value among all pairs. The reason why these
two pairs are highly correlated is that both variables are the discharges and they have a
very strong relationship. In other words, design discharge is determined by considering

1024 Collaborative Management of Integrated Watersheds


the value of average discharge. Therefore, average discharge Qave was disregarded
which was average discharge in the regression model. Based on above discussion, if a
parameter is highly correlated with both average (Qave) and design discharge (Qd), the
only relationship between that parameter and either average or design discharge can be
incorporated.

Qd – IC Installed capacity was calculated by two parameters which were design


discharge and design head. That is, installed capacity is not an independent variable - it is
dependent on design discharge and design head. Thus, installed capacity cannot be
considered in regression analysis.

Dp – Qave Diameter of penstock was determined by considering the average


discharge. Penstock diameter is also highly correlated with design discharge. So diameter
of penstock was not used as a predictor variable in the regression model.

Dp – IC As it is stated before design discharge and design head determines the


installed capacity. These two parameters are very effective in diameter of penstock.

Qave –Q5 Average discharge was calculated by considering rainfall dropped on the
basin area. In other words, a greater the basin area means a higher the average discharge.
Five-year occurrence flood discharge is affected by the basin characteristics. Although
these two parameters are not seen as related parameters, correlation analysis shows that
they are highly correlated. Five-year occurrence flood discharge is highly correlated with
design discharge also. Therefore five-year occurrence flood discharge was not included in
the MRA.

A – Qave As stated in the previous paragraph average flow is dependent on the


rainfall and the area of the basin. So, the correlation of basin area and average flow is
reasonable. Thus basin area was disregarded in the regression estimation model.

In conclusion, multicollinearity or correlation between predictor parameters was analyzed


by using statistical software called Minitab. Totally, twelve parameters were
investigated, because in regression analysis all parameters should be independent from
each other in order to develop a reliable estimation model. Each parameter pair that has a
correlation p-value higher than 0.80 is assumed to be highly correlated.

Based on the above discussion, the disregarded parameters are IC, Qave, Dp, Q5 and A.
These parameters were not included in the regression model due to correlation with other
parameters. Therefore, the regression analysis was performed with seven parameters: Qd,
Hd, Lt, Lc, Letl, Lp and Q100.

The final regression model is performed in steps. In every step if there exists an
unnecessary parameter, which has little significance on the model, it is dropped from the
model and the analysis is repeated until all parameters are effective in the analysis. This
procedure called is parsimonious modeling and Pankratz (1983) states that the principle

A Conceptual Cost Estimation Model 1025


of parsimony is important, because parsimonious models generally produce better
forecasts.

In parsimonious models, a backward elimination method is used for the initial regression
model. According to this technique, variables that were not contributing to the model are
eliminated one by one at each step. The regression statistic, significance level (P value,
which gives an indication of the significance of the variables included in the model) is
used for determination of variables to be eliminated. In general, the variables
corresponding to the coefficients with P values close to or less than 0.10 are considered to
have significant contribution to the model (Ontepeli, 2005). This procedure is repeated
until all predictor variables having p-values close to 0.1 or smaller than 0.1 in order to
develop regression model in this study.

In order to use the prediction models all parameters should be normalized by dividing
corresponding parameter with given numbers as shown in the table 5 before putting into
the model. The result will be the billion TL and it can be converted into million TL by
multiplying the result with thousand.

Table 5. Normalization Procedure of Parameters


Original
Parameter Unit Divided by
Cost TL 1,000,000,000
IC MW 100
3
Qave m /s 1,000
3
Qd m /s 1,000
Hd m 1,000
Lt m 1,000
Lc m 10,000
Letl Km 100
Dp m 10
Lp m 1,000
3
Q5 m /s 1,000
3
Q100 m /s 1,000
2
A Km 10,000

In the first regression analysis, the performance of Lc has a p-value of 0.993 and
therefore this parameter was eliminated in the next regression analysis.

1026 Collaborative Management of Integrated Watersheds


The regression-1 equation is
Cost = - 0.0176 + 1.98 Qd + 0.190 Hd + 0.0569 Lt - 0.0002 Lc +
0.127 Letl - 0.0592 Lp - 0.0474 Q100

Predictor Coef StDev T P


Constant -0.01756 0.01617 -1.09 0.284
Qd 1.9773 0.1842 10.73 0.000
Hd 0.19042 0.08028 2.37 0.022
Lt 0.05693 0.02093 2.72 0.010
Lc -0.00017 0.01898 -0.01 0.993
Letl 0.12735 0.06731 1.89 0.066
Lp -0.05919 0.03390 -1.75 0.088
Q100 -0.04740 0.01793 -2.64 0.012

S = 0.03637 R-Sq = 88.3% R-Sq(adj) = 86.4%

Analysis of Variance

Source DF SS MS F P
Regression 7 0.411070 0.058724 44.39 0.000
Residual Error 41 0.054234 0.001323
Total 48 0.465304

A second regression analysis was performed. Although ‘constant parameter’ has the
highest p-value, which is 0.183, this parameter cannot be disregarded in this regression
analysis. Thus, instead of the ‘constant parameter’, Lp was disregarded as it has the
second highest p- value.

The regression-2 equation is


Cost = - 0.0176 + 1.98 Qd + 0.190 Hd + 0.0570 Lt + 0.127 Letl -
0.0592 Lp - 0.0474 Q100

Predictor Coef StDev T P


Constant -0.01764 0.01304 -1.35 0.183
Qd 1.9776 0.1795 11.02 0.000
Hd 0.19045 0.07923 2.40 0.021
Lt 0.05701 0.01888 3.02 0.004
Letl 0.12744 0.06576 1.94 0.059
Lp -0.05923 0.03330 -1.78 0.083
Q100 -0.04739 0.01771 -2.68 0.011

S = 0.03593 R-Sq = 88.3% R-Sq(adj) = 86.7%

Analysis of Variance
Source DF SS MS F P
Regression 6 0.411070 0.068512 53.06 0.000
Residual Error 42 0.054234 0.001291
Total 48 0.465304

A Conceptual Cost Estimation Model 1027


The final regression model was obtained after three trials with having 0.875 R2 value and
the remaining parameters have small p-values (around 0.100 values)

The regression-3 equation is


Cost = - 0.0254 + 1.81 Qd + 0.0848 Hd + 0.0553 Lt + 0.149 Letl -
0.0374 Q100

Predictor Coef StDev T P


Constant -0.02536 0.01260 -2.01 0.051
Qd 1.8090 0.1562 11.58 0.000
Hd 0.08479 0.05373 1.58 0.122
Lt 0.05531 0.01932 2.86 0.006
Letl 0.14922 0.06622 2.25 0.029
Q100 -0.03742 0.01722 -2.17 0.035

S = 0.03683 R-Sq = 87.5% R-Sq(adj) = 86.0%

Analysis of Variance

Source DF SS MS F P
Regression 5 0.406985 0.081397 60.02 0.000
Residual Error 43 0.058319 0.001356
Total 48 0.465304

CONCLUSION

The accuracy of the regression model was checked after developing the model. The
performances of models were tested by using mean absolute percentage error (MAPE) of
which formula is given below.

MAPE = 1/n * | Actuali – Predictedi | / Predictedi (7)

In the formula “i” is the number of project, “actual” is the real cost of that project and
“predicted” is the predicted cost of that project calculated by estimation models. The
regression model was checked by comparing the actual cost of the testing projects with
predicted costs of those testing projects from the developed model. Table 6 shows the
predicted costs of testing projects, percent error and MAPE. These testing projects have
installed capacity from 4 MW to 110 MW, design discharge from 2.88 m3/s to 119 m3/s
and design head from 111 m to 335 m.

1028 Collaborative Management of Integrated Watersheds


Table 6. Results of Regression Model Cost Estimation
Real Cost Estimated Cost Percent
Project MAPE
Million TL Million TL Error
1 13.199 11.760 -10.90%
2 5.406 4.924 -8.91%
3 12.817 14.217 10.93% 9.94%
4 75.469 82.764 9.67%
5 220.068 240.546 9.31%

The costs of the testing projects are estimated with 10% error by the regression model
developed for this study. Since it is difficult to estimate the cost of hydroelectric power
plant projects at the early stages, this accuracy is in the acceptable range.

In conclusion, the aim of this study is to develop an estimation model for cost calculation
of HEPP projects in the early stage of the projects. Data from fifty-four HEPP projects
data were gathered and analyzed by regression technique. These fifty four sample project
costs were determined in different years from year 2005 to 2008. Time is important in
construction in terms of cost of material and labor and it is considered while developing
the model in this study. All costs of sample projects represent the year 2008 prices. Costs
of projects having prices other than 2008 year are recalculated by using official Turkey
escalation coefficient of related year. Therefore, when cost of a project is estimated by
using this model the estimated cost represents the 2008 prices. 2008 year cost can be
converted any year cost with escalation coefficient. Because escalation coefficient is
assigned by determining the changes in prices.

For the regression analysis statistical software called Minitab was used. Highly correlated
parameters were detected by correlation analysis and these parameters were not included
in regression analysis. After three trials, at the end of each trial the less significant
parameter was disregarded in next trial and the final regression model was constructed.
The developed final model was checked by comparing the actual cost of testing projects
with predicted costs of those testing projects from the developed model. The average
error of the model is 10% which enables investors to predict the cost of a HEPP project at
the beginning. Cost prediction models for HEPP projects can be improved by developing
sub models for each structure of HEPP projects. Prediction formula includes a length of
electricity transmission line which varies project to project and it does not have great
influence all type of projects. Therefore a new model can be developed after disregarding
the energy transmission line.

A Conceptual Cost Estimation Model 1029


REFERENCES

Hegazy T., (2002). Computer-based Construction Project Management. Upper Saddle River,
NJ: Prentice-Hall Inc.

Kim G.H, Sung-Hoon A., Kyung-In K., (2004). Comparison of construction cost estimating
models based on regression analysis, neural networks, and case-based reasoning. Building
and Environment 39, pp 1235 – 1242.

Meyer R., Krueger D., (1998). A Minitab Guide To Statistics, Prentice Hall Inc.

Ogayar B., Vidal P. G., (2009), Cost Determination of the Electro-Mechanical Equipment of
a Small Hydro-Power Plant, Renewable Energy 34. 6-13.

Ontepeli M.B., (2005). Conceptual cost estimating of urban railway system projects. Ms.
Thesis presented to Middle East Technical University (METU), Ankara, Turkey.

Pankratz A., (1983). Forecasting with univariate Box-Jenkins models, John Wiley and Sons,
New York.

Rahman S., (2003), Green Power: What it is and Where can we find it?, IEEE Power and
Energy Magazine, 30-37.

Shtub A., Versano R., (1999), Estimating the cost of steel pipe bending, a comparison
between neural networks and regression analysis

Singal S. K., Saini R.P., (2008), Cost analysis of low-head dam-toe small hydropower plants
based on number of generating units, Energy for Sustainable Development Volume XII No 3.
55-60.

Smith A.E., Mason A.K., (1999). Cost estimation predictive modeling: Regression versus
neural networks. Engineering economist 42 (2), 137–161.

Verlinden B., J.R. Duflou, P. Collin, D. Cattrysse. (2008). Cost estimation for Sheet Metal
Parts Using Multiple Regression and Artificial Neural Networks: A Case Study. International
Journal of Production Economics 111. 484 – 492.

Wierda, L., (1990). Cost Information Tools for Designers. Delft University Press, Delft, The
Netherlands.

Zayed M. Tarek, Halpin W. Daniel, (2005), Productivity and Cost Regression Models for
Pile Construction, Journal of Construction Engineering and Management 131. 779-789.

1030 Collaborative Management of Integrated Watersheds

S-ar putea să vă placă și