RA:207762
PROFESSOR
UNIVERSITY OF CAMPINAS
BRAZIL
2018
Introduction
Scientific research has many objectives; among them, it aims to create models for
different phenomena. Knowing their behavior is not an easy task. The first thing the
investigator has to achieve is to recreate the phenomenon under study, as well as the
conditions of its environment.
The researcher needs to ensure that the replica behaves very closely to the real system.
Many tests are necessary to gather enough data to propose a mathematical model capable
of representing the behavior of the phenomenon.
The data must be analyzed and treated. The investigator ought to know some statistical
techniques to fit the data to the most suitable model. This can be done through
techniques such as least squares, maximum likelihood, linear regression, and others.
In addition, it is necessary to know how accurate the model is. The researcher must be
aware of the approximations and possible errors the model could carry; therefore, the
errors need to be quantified.
This work aims to determine the parameters of some models, as well as the confidence interval
of each parameter. Linear regression, least squares theory, and Bayesian inference will be used.
Methodology
This work aims to determine the parameters of models. Several statistical techniques exist
to approximate the value of a parameter while minimizing the estimation error. The
best-known technique is linear regression.
$$y = X\beta + \varepsilon, \qquad
X = \begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}, \quad
\beta = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix}, \quad
y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}$$
Least squares is the most common estimation method and is easy to compute. The method
minimizes the sum of squared residuals. The expression for the parameter vector is:
$$\hat{\beta} = (X^T X)^{-1} X^T y = \left(\sum x_i x_i^T\right)^{-1} \left(\sum x_i y_i\right)$$
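As a sketch of this formula in code (Python here rather than MATLAB; the tiny dataset is hypothetical, chosen so the exact fit y = 2 + 3x is recovered, and the `solve` helper is ordinary Gaussian elimination):

```python
# Least-squares sketch: beta_hat = (X^T X)^{-1} X^T y, solved via the
# normal equations (X^T X) beta = X^T y.  Hypothetical noise-free data.

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    v = [0.0] * n
    for k in range(n - 1, -1, -1):
        v[k] = (M[k][n] - sum(M[k][c] * v[c] for c in range(k + 1, n))) / M[k][k]
    return v

xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]          # exactly y = 2 + 3x
X = [[1.0, x] for x in xs]          # design matrix with intercept column

# Build X^T X and X^T y by summation, then solve the normal equations.
XtX = [[sum(r[i] * r[j] for r in X) for j in range(2)] for i in range(2)]
Xty = [sum(r[i] * y for r, y in zip(X, ys)) for i in range(2)]
beta = solve(XtX, Xty)
print(beta)  # [2.0, 3.0] up to rounding
```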
Maximum likelihood estimation can be performed when the distribution of the error terms is
known to belong to a certain parametric family f_θ of probability distributions. When f_θ is a
normal distribution with zero mean and variance θ, the resulting estimate is identical to the
ordinary least squares estimate. Generalized least squares estimates are maximum
likelihood estimates when ε follows a multivariate normal distribution with a known
covariance matrix.
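The equivalence can be illustrated numerically: for Gaussian errors, the log-likelihood is a constant minus the scaled sum of squared residuals, so the least-squares estimate also maximizes the likelihood. A minimal Python sketch on a hypothetical dataset:

```python
import math

# With i.i.d. N(0, s^2) errors, the log-likelihood of y = b*x + e is
# const - (1/(2 s^2)) * sum((y_i - b*x_i)^2), so maximizing it over b is
# the same as minimizing the sum of squared residuals.  Hypothetical data:
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

def loglik(b, s=1.0):
    sse = sum((y - b * x) ** 2 for x, y in zip(xs, ys))
    return -len(xs) / 2 * math.log(2 * math.pi * s * s) - sse / (2 * s * s)

# Closed-form OLS slope for the no-intercept model: sum(x*y)/sum(x^2).
b_ols = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)

# The OLS slope beats any perturbed slope in likelihood:
assert loglik(b_ols) > loglik(b_ols + 0.1)
assert loglik(b_ols) > loglik(b_ols - 0.1)
print(round(b_ols, 3))  # 1.99
```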
Results
1) Determine the parameters of the model considering the data in the table below.
Indicate the 95% confidence interval.
𝑦 = 𝛽𝑜 + 𝛽1 𝑥1 + 𝛽2 𝑥2 + 𝜀
Y 12 13 3 3 11 19 1 14 15 17 2 15
X1 31 16 29 19 27 21 24 11 26 18 12 3
X2 4 5 3 0 2 6 2 3 6 6 1 5
The parameters are determined using linear regression theory; in this case, multiple
linear regression. The system can be represented in matrix form as follows:
𝒀 = 𝑿𝜸 + 𝜺, where X is the data matrix, Y is the vector of observed values, and 𝛾
is the vector of parameters. The parameter values are obtained by applying least
squares theory.
𝛾 = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌
𝑋 = [1 𝑥1 𝑥2 ].
To calculate the confidence interval of each parameter, its variance must first be
calculated. The estimated standard deviations of the parameters are:
𝜎̂(𝛽𝑜 ) = 3.4221
𝜎̂(𝛽1 ) = 0.1313
𝜎̂(𝛽2 ) = 0.5277
The half-width of each interval is 𝑡_(𝛼/2, 𝑛−𝑝−1) · 𝜎̂(𝛽𝑖), so the 95% confidence
interval for each parameter is 𝛽̂𝑖 ± 𝑡_(0.025, 9) · 𝜎̂(𝛽𝑖).
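As a cross-check, the whole computation for item 1 can be sketched in plain Python (no external libraries; the `solve` helper is ordinary Gaussian elimination, and the critical value t_(0.025, 9) ≈ 2.262 comes from a t-table):

```python
import math

# Fit beta = (X^T X)^{-1} X^T Y, then sigma_hat(beta_i) = s * sqrt(C_ii)
# with C = (X^T X)^{-1} and s^2 = SSE / (n - p - 1).

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    v = [0.0] * n
    for k in range(n - 1, -1, -1):
        v[k] = (M[k][n] - sum(M[k][c] * v[c] for c in range(k + 1, n))) / M[k][k]
    return v

Y  = [12, 13, 3, 3, 11, 19, 1, 14, 15, 17, 2, 15]
X1 = [31, 16, 29, 19, 27, 21, 24, 11, 26, 18, 12, 3]
X2 = [4, 5, 3, 0, 2, 6, 2, 3, 6, 6, 1, 5]
X  = [[1.0, u, w] for u, w in zip(X1, X2)]
n, p = len(Y), 2

XtX = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
XtY = [sum(r[i] * y for r, y in zip(X, Y)) for i in range(3)]
beta = solve(XtX, XtY)

# Residual standard deviation and standard errors of the estimates.
sse = sum((y - sum(g * v for g, v in zip(beta, r))) ** 2 for r, y in zip(X, Y))
s = math.sqrt(sse / (n - p - 1))
C_diag = [solve(XtX, [1.0 if j == i else 0.0 for j in range(3)])[i] for i in range(3)]
se = [s * math.sqrt(c) for c in C_diag]

t = 2.262  # t_{alpha/2, n-p-1} = t_{0.025, 9}, from a t-table
ci = [(g - t * e, g + t * e) for g, e in zip(beta, se)]
print([round(g, 4) for g in beta], [round(e, 4) for e in se])
```

The printed standard errors reproduce the σ̂ values listed above.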
Another way to solve the problem is through the methodology presented below. For a
three-parameter system, an equation system based on summations of the data can be
applied. The following matrix system is obtained:
$$\begin{bmatrix} n & \sum x_1 & \sum x_2 \\ \sum x_1 & \sum x_1^2 & \sum x_1 x_2 \\ \sum x_2 & \sum x_1 x_2 & \sum x_2^2 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} =
\begin{bmatrix} \sum Y \\ \sum x_1 Y \\ \sum x_2 Y \end{bmatrix}
\qquad\Longrightarrow\qquad
\begin{bmatrix} 12 & 237 & 43 \\ 237 & 5439 & 843 \\ 43 & 843 & 201 \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} =
\begin{bmatrix} 125 \\ 2363 \\ 571 \end{bmatrix}$$
Solving the system, the same values for the parameters are obtained.
2) Determine the parameters of the model considering the data in the table below.
Indicate the 95% confidence interval.
𝑌 = 𝛽𝑜 + 𝛽1 𝑥 + 𝛽2 𝑥 2 + 𝜀
Y   0.08 0.18 0.32 0.53 0.88 1.3 1.95 2.8 3.9 4.6  (Σ = 16.54)
X   1 2 3 4 5 6 7 8 9 10  (Σ = 55)
X²  1 4 9 16 25 36 49 64 81 100  (Σ = 385)
Using the methodology of the previous exercise, the system is solved by matrix
multiplication.
𝛾 = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑌
𝑋 = [1 𝑥 𝑥 2 ].
𝜎̂(𝛽𝑜 ) = 0.12777
𝜎̂(𝛽1 ) = 0.05336
𝜎̂(𝛽2 ) = 0.00473
3) 𝑧 = 𝑎𝑥^𝑏 𝑦^𝑐 + 𝜀
a) Estimate the parameters by linear regression.
To apply linear regression it is necessary to linearize the model; the logarithm is a
good tool for this. The procedure is as follows:
𝑧 = 𝑎𝑥^𝑏 𝑦^𝑐
ln(𝑧) = ln(𝑎𝑥^𝑏 𝑦^𝑐) = ln(𝑎) + 𝑏 ln(𝑥) + 𝑐 ln(𝑦)
The model now has a linear form, so linear regression can be applied. The input data
must be transformed by taking the logarithm of each value. Rearranging the table, it
can be written as:
x y z ln x ln y ln z
20 6 3948 2.99573227 1.79175947 8.280964401
22 7 5372 3.09104245 1.94591015 8.588955558
24 8 6772 3.17805383 2.07944154 8.820551743
26 9 8796 3.25809654 2.19722458 9.082052352
28 10 10874 3.33220451 2.30258509 9.294129898
30 11 13200 3.40119738 2.39789527 9.487972109
32 12 15955 3.4657359 2.48490665 9.677527539
34 13 19055 3.52636052 2.56494936 9.855084813
36 14 22433 3.58351894 2.63905733 10.01828837
38 15 26213 3.63758616 2.7080502 10.17401075
40 16 30356 3.68887945 2.77258872 10.32074947
20 6 3979 2.99573227 1.79175947 8.28878581
22 7 5150 3.09104245 1.94591015 8.546751994
24 8 6824 3.17805383 2.07944154 8.828201089
26 9 8580 3.25809654 2.19722458 9.057189192
28 10 10801 3.33220451 2.30258509 9.287394001
30 11 13191 3.40119738 2.39789527 9.487290058
32 12 16032 3.4657359 2.48490665 9.682342004
34 13 18988 3.52636052 2.56494936 9.85156248
36 14 22311 3.58351894 2.63905733 10.01283511
38 15 26196 3.63758616 2.7080502 10.17336201
40 16 30235 3.68887945 2.77258872 10.31675547
20 6 3913 2.99573227 1.79175947 8.272059622
22 7 5332 3.09104245 1.94591015 8.581481681
24 8 6913 3.17805383 2.07944154 8.841158976
26 9 8818 3.25809654 2.19722458 9.084550366
28 10 10788 3.33220451 2.30258509 9.286189684
30 11 13224 3.40119738 2.39789527 9.48978864
32 12 16102 3.4657359 2.48490665 9.686698767
34 13 19071 3.52636052 2.56494936 9.855924136
36 14 22509 3.58351894 2.63905733 10.02167051
38 15 26234 3.63758616 2.7080502 10.17481156
40 16 30313 3.68887945 2.77258872 10.31933194
This case study has three parameters to calculate, which are obtained by solving the
least-squares system for the linearized model. The estimated values and standard
deviations are:
𝑎 = 2.30297
𝑏 = 2.1513
𝑐 = 0.5589
𝜎̂(𝑎) = 0.89399
𝜎̂(𝑏) = 0.16779
𝜎̂(𝑐) = 0.11935
b) To apply least squares theory, the same model as in the previous item was used. The
linearized equation was taken to approximate the behavior of the system, and the data
matrix was built from the logarithms of the data, since the model is on a logarithmic
scale. The system has the following form:
𝛾 = (𝑋 𝑇 𝑋)−1 𝑋 𝑇 𝑍
𝑏 = 2.1513
𝑐 = 0.5589
𝜎̂(𝑎) = 0.89399
𝜎̂(𝑏) = 0.16779
𝜎̂(𝑐) = 0.11935
Regardless of the methodology used, the values of the parameters must be the same, as
well as the confidence intervals.
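Assuming the fit is the ordinary least-squares solution of ln z = ln a + b ln x + c ln y on the table above, the estimates can be reproduced in plain Python (the `solve` helper is ordinary Gaussian elimination):

```python
import math

def solve(A, b):
    """Solve A v = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        M[k], M[p] = M[p], M[k]
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]
    v = [0.0] * n
    for k in range(n - 1, -1, -1):
        v[k] = (M[k][n] - sum(M[k][c] * v[c] for c in range(k + 1, n))) / M[k][k]
    return v

# z readings: three replicates over x = 20, 22, ..., 40 and y = 6, 7, ..., 16.
Z = [
    [3948, 5372, 6772, 8796, 10874, 13200, 15955, 19055, 22433, 26213, 30356],
    [3979, 5150, 6824, 8580, 10801, 13191, 16032, 18988, 22311, 26196, 30235],
    [3913, 5332, 6913, 8818, 10788, 13224, 16102, 19071, 22509, 26234, 30313],
]
rows, lnz = [], []
for rep in Z:
    for i, z in enumerate(rep):
        rows.append([1.0, math.log(20 + 2 * i), math.log(6 + i)])
        lnz.append(math.log(z))

# Normal equations for the linearized model ln z = ln a + b ln x + c ln y.
XtX = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
Xtz = [sum(r[i] * v for r, v in zip(rows, lnz)) for i in range(3)]
ln_a, b, c = solve(XtX, Xtz)
a = math.exp(ln_a)  # undo the linearization for the intercept
print(round(a, 3), round(b, 3), round(c, 3))
```

The printed values are close to a = 2.30, b = 2.15, c = 0.56, as reported above.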
c) Generate samples for the parameters a, b and c, and determine their variance. Use
Bayesian inference in this procedure.
For this item, the MATLAB toolbox mcmc, a toolbox for Bayesian inference, was used. The
code consists of a function that defines the log-likelihood of the model. The script
calls this function and, starting from initial parameter values, draws samples for the
parameters. The results of some runs are shown next.
a b c sig2
1,9517 2,2747 0,4647 2,4667
2,7003 2,0739 0,6133 1,1796
3,2754 1,9575 0,7021 1,3931
2,4051 2,1293 0,5725 0,9370
2,1181 2,2221 0,5044 1,3637
2,8655 2,0218 0,6520 0,9806
1,9997 2,2559 0,4786 1,6194
4,7160 1,7550 0,8523 3,7273
3,2383 1,9440 0,7114 1,0679
2,5025 2,1041 0,5915 0,8712
2,9596 2,0101 0,6605 1,4269
2,3239 2,1795 0,5338 1,9398
2,6757 2,0645 0,6199 1,0012
2,2564 2,1697 0,5419 1,0994
2,1787 2,1941 0,5237 0,7691
1,8094 2,3067 0,4403 0,8647
2,8058 2,0563 0,6266 3,4960
4,7420 1,7428 0,8603 2,4831
2,4680 2,1140 0,5834 1,0311
2,0459 2,2291 0,4985 1,2731
2,6002 2,0795 0,6104 1,0870
2,6208 2,0781 0,6094 0,9131
2,3678 2,1411 0,5627 1,0549
2,5370 2,0960 0,5974 0,9273
3,2069 1,9520 0,7042 1,1367
3,7722 1,8544 0,7760 1,2258
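The sampler itself can be sketched as a plain random-walk Metropolis algorithm (a simplified Python stand-in for the MATLAB toolbox; the flat priors, fixed noise scale on ln z, starting point, and step sizes below are all assumptions, and the real toolbox additionally samples the error variance and adapts its proposal covariance):

```python
import math
import random

random.seed(1)

# Same data as item a): three replicates over x = 20..40, y = 6..16.
Z = [
    [3948, 5372, 6772, 8796, 10874, 13200, 15955, 19055, 22433, 26213, 30356],
    [3979, 5150, 6824, 8580, 10801, 13191, 16032, 18988, 22311, 26196, 30235],
    [3913, 5332, 6913, 8818, 10788, 13224, 16102, 19071, 22509, 26234, 30313],
]
data = [(math.log(20 + 2 * i), math.log(6 + i), math.log(z))
        for rep in Z for i, z in enumerate(rep)]

SIGMA = 0.01  # assumed noise scale on ln z

def logpost(la, b, c):
    # Log-posterior up to a constant: Gaussian log-likelihood, flat priors.
    sse = sum((lz - (la + b * lx + c * ly)) ** 2 for lx, ly, lz in data)
    return -sse / (2 * SIGMA ** 2)

theta = [0.834, 2.151, 0.559]    # start near the least-squares solution
steps = [0.002, 0.0005, 0.0005]  # small steps: the posterior is a narrow ridge
lp = logpost(*theta)
chain, accepted = [], 0
for _ in range(5000):
    prop = [t + random.gauss(0, s) for t, s in zip(theta, steps)]
    lp_prop = logpost(*prop)
    if math.log(random.random()) < lp_prop - lp:  # Metropolis acceptance rule
        theta, lp = prop, lp_prop
        accepted += 1
    chain.append(theta[:])

mean_b = sum(t[1] for t in chain) / len(chain)
print(round(mean_b, 3), round(accepted / len(chain), 2))
```

With independent small steps this chain mixes slowly along the b–c ridge (ln x and ln y are nearly collinear here), which is why adaptive proposals, as in the toolbox, help.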
The plots of the parameter samples are as follows:
Parameter a:
Parameter b:
Parameter c:
d) Determine the probability density function using the principle of maximum entropy
for the parameters obtained in items a) and b).
The three parameters a, b and c have similar characteristics: all of them have known
mean and variance. Therefore, each parameter's density is subject to three constraints
(normalization, known mean, and known variance), and the resulting maximum-entropy
density has the following form:
𝑃(𝑥) = 𝑒^(𝑘₀ − 𝑘₁𝑥 − 𝑘₂𝑥²)
Parameter a:
𝜇𝑎 = 2.30297 𝜎𝑎 = 0.89399
Parameter b:
𝜇𝑏 = 2.1513 𝜎𝑏 = 0.16779
𝑃𝑏(𝑥) = 𝑒^(−81.32784 + 76.4132𝑥 − 17.7597𝑥²)
Parameter c:
𝜇𝑐 = 0.5589 𝜎𝑐 = 0.11935
𝑃𝑐(𝑥) = 𝑒^(−9.7578 + 39.2364𝑥 − 35.1014𝑥²)
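The coefficients follow directly from μ and σ, since the maximum-entropy density under mean and variance constraints is the Gaussian. A short Python check (the closed-form expressions below are standard results, not taken from the original code):

```python
import math

# Expanding the Gaussian exponent gives P(x) = exp(k0 + k1*x - k2*x^2) with
# k2 = 1/(2 sigma^2), k1 = mu/sigma^2, k0 = -mu^2/(2 sigma^2) - ln(sigma*sqrt(2 pi)).
def maxent_coeffs(mu, sigma):
    k2 = 1.0 / (2.0 * sigma * sigma)
    k1 = mu / (sigma * sigma)
    k0 = -mu * mu / (2.0 * sigma * sigma) - math.log(sigma * math.sqrt(2.0 * math.pi))
    return k0, k1, k2

kb = maxent_coeffs(2.1513, 0.16779)  # parameter b
kc = maxent_coeffs(0.5589, 0.11935)  # parameter c
print([round(k, 4) for k in kb])
print([round(k, 4) for k in kc])
```

The printed coefficients match those reported for P_b and P_c.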
4) Choose one statistical distribution and estimate its parameters using the
distribution-fitting tool of MATLAB.
One vector was created for the variable values, another for the frequencies, and
another for censoring. The data were loaded into the MATLAB toolbox. The objective is
to find the probability function that best fits the data. Some examples are shown next.
Normal distribution fit
$$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$
𝜇 = 344.778 𝜎 = 102.57
Weibull distribution fit
$$p(x) = \frac{\beta x^{\beta-1}}{\alpha^{\beta}} \exp\left(-\left(\frac{x}{\alpha}\right)^{\beta}\right)$$
𝛼 = 381.58 𝛽 = 4.015
Extreme value distribution fit
$$p(x) = \frac{1}{\sigma} \exp\left(\frac{x-\mu}{\sigma}\right) \exp\left(-\exp\left(\frac{x-\mu}{\sigma}\right)\right)$$
𝜇 = 393.56 𝜎 = 86.68
Gamma distribution fit
$$p(x) = \frac{x^{\alpha-1}}{\beta^{\alpha}\,\Gamma(\alpha)} \exp\left(-\frac{x}{\beta}\right)$$
𝛼 = 10.736 𝛽 = 32.11
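A quick consistency check on the four fits is to compare their implied means, which should all sit near the sample mean reported by the normal fit. A Python sketch (assuming the extreme value fit is MATLAB's minimum-type Gumbel, whose mean is μ − γσ with γ the Euler–Mascheroni constant):

```python
import math

EULER_GAMMA = 0.5772156649

implied_means = {
    "normal":        344.778,                                  # mu
    "weibull":       381.58 * math.gamma(1.0 + 1.0 / 4.015),   # alpha*Gamma(1+1/beta)
    "extreme value": 393.56 - EULER_GAMMA * 86.68,             # mu - gamma*sigma
    "gamma":         10.736 * 32.11,                           # alpha*beta
}
for name, m in implied_means.items():
    print(name, round(m, 1))
```

All four implied means land within a few units of the sample mean, which supports the parameter estimates.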
Comparison of the distributions
The extreme value distribution fits the data best; the Weibull and normal distributions
show similar behavior. Even though the extreme value distribution seems to be the best
option, the normal distribution could be a suitable choice for this study case, since
it is the best-known distribution and the easiest to work with.