When summarizing multivariate data, consider using a scatterplot matrix together with the
corresponding correlation matrix. Note that the correlation matrix measures only the extent of
linear relationships; it cannot capture non-linear relationships.
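A minimal sketch of this limitation, assuming NumPy and hypothetical data: the variable y below is completely determined by x, yet their correlation is essentially zero because the relationship is quadratic rather than linear. A scatterplot matrix of the same data would reveal the curve immediately.

```python
import numpy as np

# Hypothetical data: y is an exact (but non-linear) function of x.
x = np.linspace(-1.0, 1.0, 101)   # values symmetric around zero
y = x ** 2                         # perfect quadratic relationship
z = 2.0 * x + 1.0                  # perfect linear relationship

# Rows/columns of the correlation matrix follow the order x, y, z.
corr = np.corrcoef(np.vstack([x, y, z]))
print(np.round(corr, 3))
# corr[0, 1] is essentially 0 even though y is determined by x,
# while corr[0, 2] equals 1 because z is a linear function of x.
```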
Using matrices to derive the least squares coefficients
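In matrix notation, with a design matrix X whose first column is all ones and whose remaining k columns hold the explanatory variables, and a response vector y, the least squares coefficients solve the normal equations X'X b = X'y, giving b = (X'X)^{-1} X'y. The sketch below assumes NumPy and simulated data; the helper name `least_squares_coefficients` is hypothetical. It solves the linear system rather than forming the inverse explicitly, which is the numerically safer choice.

```python
import numpy as np

def least_squares_coefficients(X_vars, y):
    """Return b = (X'X)^{-1} X'y, with an intercept column prepended to X_vars."""
    n = X_vars.shape[0]
    X = np.column_stack([np.ones(n), X_vars])   # n x (k+1) design matrix
    # Solve the normal equations X'X b = X'y instead of computing the inverse.
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data with k = 2 explanatory variables.
rng = np.random.default_rng(1)
X_vars = rng.normal(size=(50, 2))
y = 1.0 + 2.0 * X_vars[:, 0] - 3.0 * X_vars[:, 1] + rng.normal(scale=0.1, size=50)

b = least_squares_coefficients(X_vars, y)
print(b)   # close to the simulated values 1, 2, -3
```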
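Residual standard deviation. An estimator of $\sigma^2$, called the mean squared error (MSE), is defined as
$$s^2 = \frac{1}{n-(k+1)} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
The residual standard deviation is its square root, $s = \sqrt{s^2}$.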
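The estimator can be computed directly from the fitted residuals. This is a sketch assuming NumPy and simulated data with a known error standard deviation of 0.7; the helper `residual_variance` is a hypothetical name. Note the divisor n-(k+1), which accounts for the k+1 estimated coefficients.

```python
import numpy as np

def residual_variance(X_vars, y):
    """Return (s2, s): the MSE with n-(k+1) degrees of freedom and its square root."""
    n, k = X_vars.shape
    X = np.column_stack([np.ones(n), X_vars])
    b = np.linalg.solve(X.T @ X, X.T @ y)
    residuals = y - X @ b                         # y_i - yhat_i
    s2 = residuals @ residuals / (n - (k + 1))    # divide by n-(k+1), not n
    return s2, np.sqrt(s2)

# Hypothetical data: k = 3 explanatory variables, true sigma = 0.7.
rng = np.random.default_rng(2)
X_vars = rng.normal(size=(200, 3))
y = 0.5 + X_vars @ np.array([1.0, -1.0, 2.0]) + rng.normal(scale=0.7, size=200)

s2, s = residual_variance(X_vars, y)
print(s2, s)   # s should be near the true sigma of 0.7
```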
Following property 3, the test statistic $b_j/se(b_j)$ can be shown to follow a t-distribution with df =
n-(k+1) under the null hypothesis $\beta_j = 0$. This result is used to test the significance of $b_j$ and to
construct a confidence interval, $b_j \pm t_{n-(k+1),\,1-\alpha/2}\, se(b_j)$. A hypothesized value (i.e. the
null hypothesis) is rejected at level $\alpha$ if it falls outside the $100(1-\alpha)\%$ confidence interval.
As a rule of thumb, if the sample size is large enough, a variable can be interpreted as important
if its t-ratio exceeds two in absolute value.
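A sketch of the coefficient table, assuming NumPy and simulated data in which the second explanatory variable has no real effect. It uses the standard result $se(b_j) = s\sqrt{[(X^\top X)^{-1}]_{jj}}$. To stay within NumPy, the interval uses the rule-of-thumb multiplier 2 in place of the exact t quantile, so it is only an approximate 95% interval for large n. The helper name `coefficient_table` is hypothetical.

```python
import numpy as np

def coefficient_table(X_vars, y):
    """Return b, se(b), t-ratios and rough 95% intervals b +/- 2*se(b)."""
    n, k = X_vars.shape
    X = np.column_stack([np.ones(n), X_vars])
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    residuals = y - X @ b
    s2 = residuals @ residuals / (n - (k + 1))
    se = np.sqrt(s2 * np.diag(XtX_inv))       # se(b_j) = s * sqrt([(X'X)^-1]_jj)
    t = b / se                                 # t-ratio for H0: beta_j = 0
    lower, upper = b - 2 * se, b + 2 * se      # 2 approximates the t quantile for large n
    return b, se, t, lower, upper

# Hypothetical model: x2 has no real effect on y.
rng = np.random.default_rng(3)
X_vars = rng.normal(size=(100, 2))
y = 1.0 + 1.5 * X_vars[:, 0] + 0.0 * X_vars[:, 1] + rng.normal(size=100)

b, se, t, lower, upper = coefficient_table(X_vars, y)
for name, tj in zip(["intercept", "x1", "x2"], t):
    print(f"{name}: t = {tj:.2f}, important by rule of thumb: {abs(tj) > 2}")
```

For an exact interval, the multiplier 2 would be replaced by the t quantile with n-(k+1) degrees of freedom.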
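A graphical device that overcomes the shortcomings of pairwise correlation is the added variable
plot, also called the partial regression plot. For a variable $x_j$, it plots the residuals from
regressing y on the other explanatory variables against the residuals from regressing $x_j$ on
those same variables.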
In a multiple linear regression, a coefficient may indicate a negative relationship with the
response even though the correlation between that variable and the response is positive. Note,
however, that a regression coefficient accounts for the effects of the other explanatory
variables, whereas a correlation is based on the pair of variables alone. The added variable plot
can provide insight into this phenomenon.
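The sign reversal can be reproduced with two strongly correlated explanatory variables. This sketch assumes NumPy and simulated data (the helper `residuals_on` is a hypothetical name). It computes the two sets of residuals that an added variable plot for x2 would display; the plotting call itself is omitted. The slope through those residuals matches the multiple regression coefficient b2 (a consequence of the Frisch-Waugh-Lovell theorem), while the pairwise correlation has the opposite sign.

```python
import numpy as np

def residuals_on(Z, v):
    """Residuals from the least squares regression of v on Z (intercept added)."""
    Zc = np.column_stack([np.ones(len(v)), Z])
    coef, *_ = np.linalg.lstsq(Zc, v, rcond=None)
    return v - Zc @ coef

# Hypothetical data: x2 is strongly correlated with x1.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)
y = 3.0 * x1 - 1.0 * x2 + 0.5 * rng.normal(size=n)

# Pairwise view: correlation between y and x2 alone.
pair_corr = np.corrcoef(y, x2)[0, 1]

# Multiple regression view: coefficient of x2 with x1 in the model.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)

# Added variable plot for x2: residuals of y on x1 versus residuals of x2 on x1.
e_y = residuals_on(x1, y)
e_x2 = residuals_on(x1, x2)
av_slope = (e_x2 @ e_y) / (e_x2 @ e_x2)   # residuals have mean 0, so no intercept needed

print(f"corr(y, x2) = {pair_corr:.2f}")
print(f"b2 = {b[2]:.2f}, AV slope = {av_slope:.2f}")
```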
The correlation in the added variable plot is called the partial correlation coefficient. It is the
correlation between the response and the variable of interest in the presence of the other
explanatory variables. The partial correlation coefficient can be calculated either from the plot
or with the formula:
$$r(y, x_j \mid x_1, \ldots, x_{j-1}, x_{j+1}, \ldots, x_k) = \frac{t(b_j)}{\sqrt{t(b_j)^2 + n - (k+1)}}$$
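Both routes to the partial correlation can be compared numerically. The sketch below assumes NumPy and simulated data with k = 3; the index `j` choosing the variable of interest is a hypothetical parameter. It computes r once from the t-ratio of $b_j$ in the full model and once as the correlation of the added variable plot residuals; the two agree up to floating-point rounding.

```python
import numpy as np

# Hypothetical data with k = 3 explanatory variables.
rng = np.random.default_rng(4)
n, k = 80, 3
X_vars = rng.normal(size=(n, k))
y = 2.0 + X_vars @ np.array([0.8, -0.4, 0.3]) + rng.normal(size=n)
j = 1  # hypothetical choice: variable of interest is column 1 of X_vars

# Full model: t-ratio of b_j (column j+1 of X, since column 0 is the intercept).
X = np.column_stack([np.ones(n), X_vars])
XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
resid = y - X @ b
df = n - (k + 1)
s2 = resid @ resid / df
t_j = b[j + 1] / np.sqrt(s2 * XtX_inv[j + 1, j + 1])

# Formula: r = t / sqrt(t^2 + n - (k+1)).
r_formula = t_j / np.sqrt(t_j ** 2 + df)

# Added variable plot: correlate residuals of y and of x_j on the other variables.
others = np.column_stack([np.ones(n), np.delete(X_vars, j, axis=1)])
e_y = y - others @ np.linalg.lstsq(others, y, rcond=None)[0]
e_x = X_vars[:, j] - others @ np.linalg.lstsq(others, X_vars[:, j], rcond=None)[0]
r_plot = np.corrcoef(e_y, e_x)[0, 1]

print(r_formula, r_plot)   # the two values agree up to rounding error
```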
“Linear” in “linear regression” refers to linearity in the parameters; the regression function
may still be a highly non-linear function of the explanatory variables.
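For example, the quadratic model $y = \beta_0 + \beta_1 x + \beta_2 x^2$ is non-linear in x but linear in $\beta_0, \beta_1, \beta_2$, so it is fitted with the same least squares machinery by treating $x^2$ as an additional column of the design matrix. A minimal sketch, assuming NumPy and hypothetical noise-free data:

```python
import numpy as np

# Hypothetical noise-free data from y = 1 + 2x - 0.5x^2 (non-linear in x).
x = np.linspace(0.0, 4.0, 20)
y = 1.0 + 2.0 * x - 0.5 * x ** 2

# The model is linear in (beta0, beta1, beta2), so the columns 1, x, x^2
# form an ordinary design matrix and least squares applies unchanged.
X = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 6))   # recovers the coefficients 1, 2 and -0.5
```

By contrast, a model such as $y = \beta_0 + e^{\beta_1 x}$ is non-linear in $\beta_1$ and cannot be fitted this way.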