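Property (correlation coefficient):
\[ \operatorname{cov}(X,Y) = M\big((X - M(X))\,(Y - M(Y))\big) = M(XY) - M(X)\,M(Y) \]
\[ \rho = \frac{\operatorname{cov}(X,Y)}{\sqrt{D^2(X)\,D^2(Y)}}, \qquad |\rho| \le 1 \]
$\rho = 0$: uncorrelatedness.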
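As a small numerical illustration of the two formulas above, the following R sketch computes the covariance through $M(XY) - M(X)M(Y)$ and then $\rho$. The data vectors are made up for the example, and the moments use the $1/n$ convention (the factor cancels in the ratio, so the result should agree with R's `cor`).

```r
# Illustrative data (an assumption, not taken from the course)
x <- c(1, 2, 3, 4, 5)
y <- c(2.1, 3.9, 6.2, 7.8, 10.1)

# cov(X, Y) = M(XY) - M(X) M(Y), with empirical means in place of M
cov_xy <- mean(x * y) - mean(x) * mean(y)

# D^2(X) and D^2(Y) with the same 1/n convention
var_x <- mean(x^2) - mean(x)^2
var_y <- mean(y^2) - mean(y)^2

# Correlation coefficient
rho <- cov_xy / sqrt(var_x * var_y)

# Compare with the built-in estimator
print(c(rho = rho, cor = cor(x, y)))
```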
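Associated (marginal) distributions:
Let the vector $(X, Y)$ have either a discrete distribution with probability function $p(x,y)$, $x \in A$, $y \in B$, or a continuous distribution with density $f(x,y)$ with respect to the Lebesgue measure $\lambda^2$. Then
\[ P\circ X^{-1}(C_1) = \begin{cases} P\circ(X,Y)^{-1}(C_1 \times B), & \text{discrete distribution} \\ P\circ(X,Y)^{-1}(C_1 \times \mathbb{R}), & \text{continuous distribution} \end{cases} \]
\[ P\circ Y^{-1}(C_2) = \begin{cases} P\circ(X,Y)^{-1}(A \times C_2), & \text{discrete distribution} \\ P\circ(X,Y)^{-1}(\mathbb{R} \times C_2), & \text{continuous distribution} \end{cases} \]
In terms of probability functions and densities:
\[ p_X(x) = \sum_{y \in B} p(x,y),\ x \in A; \qquad p_Y(y) = \sum_{x \in A} p(x,y),\ y \in B \]
\[ f_X(x) = \int_{\mathbb{R}} f(x,y)\,dy,\ x \in \mathbb{R}; \qquad f_Y(y) = \int_{\mathbb{R}} f(x,y)\,dx,\ y \in \mathbb{R} \]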
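For the discrete case, the marginal probability functions are just row and column sums of the joint table. The R sketch below uses a hypothetical joint probability function on $A = \{0, 1\}$, $B = \{1, 2, 3\}$; the numbers are assumptions chosen only so that they sum to 1.

```r
# Hypothetical joint probability function p(x, y); rows are x, columns are y
p <- matrix(c(0.10, 0.20, 0.10,
              0.15, 0.25, 0.20),
            nrow = 2, byrow = TRUE,
            dimnames = list(x = c("0", "1"), y = c("1", "2", "3")))

# p_X(x) = sum over y in B of p(x, y)
p_X <- rowSums(p)

# p_Y(y) = sum over x in A of p(x, y)
p_Y <- colSums(p)

print(p_X)
print(p_Y)
```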
Property:
$X, Y$ independent $\Longrightarrow$ $X, Y$ uncorrelated.
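Lemma: Let $\mathcal{F} \subseteq \mathcal{K}$ be a sub-$\sigma$-field and let $h$ be $\mathcal{F}$-measurable, nonnegative or integrable. Then
\[ \int h \, dP|_{\mathcal{F}} = \int h \, dP . \]
Proof:
We denote the identity map by $i : (\Omega, \mathcal{K}) \to (\Omega, \mathcal{F})$. We note that $i$ is measurable and $P \circ i^{-1} = P|_{\mathcal{F}}$. By the change of variable formula,
\[ \int h \, dP|_{\mathcal{F}} = \int h \circ i \, dP = \int h \, dP . \]
Uniqueness: suppose
\[ \int_A g_1 \, dP = \int_A g_2 \, dP \quad \forall A \in \mathcal{F} . \]
But $g_1, g_2$ are $\mathcal{F}$-measurable. It follows that $g_1 = g_2$ $P$-a.s.
Existence: let $X \ge 0$ and $\nu(A) = \int_A X \, dP$, $A \in \mathcal{F}$. There is an $\mathcal{F}$-measurable function $g : \Omega \to \mathbb{R}_+$ with
\[ \nu(A) = \int_A g \, dP|_{\mathcal{F}} \quad \forall A \in \mathcal{F} . \]
We apply the Lemma:
\[ \int_A g \, dP|_{\mathcal{F}} = \int I_A\, g \, dP|_{\mathcal{F}} = \int I_A\, g \, dP = \int_A g \, dP . \]
Hence
\[ \int_A X \, dP = \int_A g \, dP \quad \forall A \in \mathcal{F}, \]
and we denote $g = M(X \mid \mathcal{F})$. For $X$ integrable, $X = X^+ - X^-$, with
\[ M(X \mid \mathcal{F}) = M(X^+ \mid \mathcal{F}) - M(X^- \mid \mathcal{F}) . \]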
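Let $A \in \mathcal{K}$ and $X = 1_A$. Then we denote
\[ M(1_A \mid \mathcal{F}) = P(A \mid \mathcal{F}) . \]
Let $Y$ be a random variable and $\mathcal{F} = \mathcal{B}(Y) = Y^{-1}(\mathcal{B})$. Then we denote
\[ M(X \mid \mathcal{B}(Y)) = M(X \mid Y) . \]
Let $A \in \mathcal{K}$, $X = 1_A$ and $\mathcal{F} = \mathcal{B}(Y)$. Then we denote
\[ M(1_A \mid \mathcal{B}(Y)) = P(A \mid Y) . \]
We denote by $y \mapsto M(X \mid Y = y)$ a measurable function with the property
\[ M(X \mid Y = y) \circ Y = M(X \mid Y) \quad P\text{-a.s.} \]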
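Proposition
Let $X$ and $Y$ be random variables, with $X$ nonnegative or integrable. The measurable function $\varphi : \mathbb{R} \to \mathbb{R}$ is a version of the conditional mean $M(X \mid Y = y)$ if and only if
\[ \int_B \varphi(y) \, dP \circ Y^{-1}(y) = \int_{Y^{-1}(B)} X \, dP, \quad \forall B \in \mathcal{B} . \]
Proof:
\[ \varphi \circ Y = M(X \mid Y)\ \ P\text{-a.s.} \iff \int_A \varphi \circ Y \, dP = \int_A M(X \mid Y) \, dP, \quad \forall A \in \mathcal{B}(Y) . \]
But $\mathcal{B}(Y) = Y^{-1}(\mathcal{B})$, and for $A = Y^{-1}(B)$ we have
\[ \int_{Y^{-1}(B)} \varphi \circ Y \, dP = \int_B \varphi(y) \, dP \circ Y^{-1}(y), \qquad \int_{Y^{-1}(B)} M(X \mid Y) \, dP = \int_{Y^{-1}(B)} X \, dP . \]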
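Discrete case. Let $Y$ have the discrete distribution
\[ P \circ Y^{-1} = \sum_{k \in I} P(Y = a_k)\, \delta_{\{a_k\}}, \qquad P(Y = a_k) > 0\ \ \forall k, \qquad \sum_{k \in I} P(Y = a_k) = 1 . \]
Then
\[ M(X \mid Y = a_k) = \frac{1}{P(Y = a_k)} \int_{\{Y = a_k\}} X \, dP . \]
Proof: We denote by $\varphi$ the function
\[ \varphi(a_k) = \frac{1}{P(Y = a_k)} \int_{\{Y = a_k\}} X \, dP, \quad k \in I, \]
with $A = \{a_k,\ k \in I\}$. Let $B \in \mathcal{B}$. Then
\[ \int_B \varphi(y) \, dP \circ Y^{-1}(y) = \int_{B \cap A} \varphi(y) \, dP \circ Y^{-1}(y) = \sum_{a_k \in B \cap A} \varphi(a_k)\, P(Y = a_k) = \sum_{a_k \in B \cap A} \int_{\{Y = a_k\}} X \, dP = \int_{Y^{-1}(B)} X \, dP . \]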
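If the vector $(X, Y)$ has the discrete distribution
\[ P \circ (X,Y)^{-1} = \sum_{x \in A'} \sum_{y \in A} p(x,y)\, \delta_{\{(x,y)\}}, \qquad A' = \{a'_j,\ j \in I\}, \quad A = \{a_k,\ k \in I\}, \]
then
\[ M(X \mid Y = a_k) = \sum_{j \in I} a'_j \, \frac{P(X = a'_j,\ Y = a_k)}{P(Y = a_k)} = \sum_{j \in I} a'_j \, P(X = a'_j \mid Y = a_k) . \]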
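The formula for the discrete vector can be evaluated directly from a joint table. The R sketch below does this for a hypothetical joint probability function (the values of $X$, the values of $Y$ and the probabilities are all assumptions made for the example), and checks it against the identity $M(X) = \sum_k M(X \mid Y = a_k)\,P(Y = a_k)$.

```r
# Hypothetical values a'_j of X and joint probabilities p(a'_j, a_k)
a_x <- c(0, 1)
p <- matrix(c(0.10, 0.20, 0.10,
              0.15, 0.25, 0.20),
            nrow = 2, byrow = TRUE,
            dimnames = list(x = c("0", "1"), y = c("1", "2", "3")))

# P(Y = a_k): column sums of the joint table
p_Y <- colSums(p)

# M(X | Y = a_k) = sum_j a'_j * P(X = a'_j, Y = a_k) / P(Y = a_k)
cond_mean <- as.vector(a_x %*% p) / p_Y
print(cond_mean)

# Averaging the conditional means over the law of Y recovers M(X)
mean_X <- sum(a_x * rowSums(p))
print(c(total = sum(cond_mean * p_Y), mean_X = mean_X))
```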
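Continuous case. Let the vector $(X, Y)$ have density $f(x,y)$, with $f_Y(y) = \int_{\mathbb{R}} f(x,y) \, dx$.
We show that
\[ M(X \mid Y = y) = \int_{\mathbb{R}} x \, \frac{f(x,y)}{f_Y(y)} \, dx . \]
Let $\varphi(y) = \int_{\mathbb{R}} x \, \frac{f(x,y)}{f_Y(y)} \, dx$ and let $B \in \mathcal{B}$. Then
\[ \int_B \varphi(y) \, dP \circ Y^{-1}(y) = \int_B \left( \int_{\mathbb{R}} x \, \frac{f(x,y)}{f_Y(y)} \, dx \right) f_Y(y) \, dy = \int_B \int_{\mathbb{R}} x \, f(x,y) \, dx \, dy \]
\[ = \int_{\mathbb{R}} \int_{\mathbb{R}} x \, 1_B(y) \, f(x,y) \, dx \, dy = \int (1_B \circ Y) \, X \, dP = \int_{Y^{-1}(B)} X \, dP . \]
Notation:
\[ f(x \mid y) = \frac{f(x,y)}{f_Y(y)}, \qquad M(X \mid Y = y) = \int_{\mathbb{R}} x \, f(x \mid y) \, dx . \]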
Definition
Let $(X, Y)$ be a random vector with integrable components.
The regression of $X$ on $Y$ is the function
\[ y \mapsto M(X \mid Y = y) . \]
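The bivariate normal model. The vector $(X, Y)$ is distributed $N(2; \mu, \Sigma)$, with
\[ \mu = (\mu_x, \mu_y) \in \mathbb{R}^2, \qquad \Sigma = \begin{pmatrix} \sigma_x^2 & \rho\,\sigma_x \sigma_y \\ \rho\,\sigma_x \sigma_y & \sigma_y^2 \end{pmatrix}, \]
if it has the density
\[ f(x,y) = \frac{1}{2\pi\, \sigma_x \sigma_y \sqrt{1 - \rho^2}} \exp\left\{ -\frac{1}{2(1 - \rho^2)} \left[ \left( \frac{x - \mu_x}{\sigma_x} \right)^2 - 2\rho\, \frac{(x - \mu_x)(y - \mu_y)}{\sigma_x \sigma_y} + \left( \frac{y - \mu_y}{\sigma_y} \right)^2 \right] \right\} . \]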
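Property 1
The marginal distributions of $N(2; \mu, \Sigma)$ are
\[ P \circ X^{-1} = N(\mu_x, \sigma_x^2), \qquad P \circ Y^{-1} = N(\mu_y, \sigma_y^2) . \]
Proof:
Adding and subtracting $\rho^2 \left( \frac{y - \mu_y}{\sigma_y} \right)^2$ in the exponent, we obtain
\[ f(x,y) = \frac{1}{\sqrt{2\pi\, \sigma_x^2 (1 - \rho^2)}} \exp\left\{ -\frac{1}{2 \sigma_x^2 (1 - \rho^2)} \left( x - \mu_x - \rho\, \frac{\sigma_x}{\sigma_y} (y - \mu_y) \right)^2 \right\} \cdot \frac{1}{\sqrt{2\pi\, \sigma_y^2}} \exp\left\{ -\frac{(y - \mu_y)^2}{2 \sigma_y^2} \right\} . \]
The first factor is a normal density in $x$, so it integrates to 1 and
\[ f_Y(y) = \int_{\mathbb{R}} f(x,y) \, dx = \frac{1}{\sqrt{2\pi\, \sigma_y^2}} \exp\left\{ -\frac{(y - \mu_y)^2}{2 \sigma_y^2} \right\}, \]
which is the density of $N(\mu_y, \sigma_y^2)$. The argument for $X$ is analogous.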
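Property 2
The distribution of $X$ conditional on $Y = y$ is normal,
\[ N\left( \mu_x + \rho\, \frac{\sigma_x}{\sigma_y} (y - \mu_y),\ \sigma_x^2 (1 - \rho^2) \right), \]
as follows from the form of the conditional density $f(x \mid y) = \frac{f(x,y)}{f_Y(y)}$.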
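Corollary
\[ M(X \mid Y = y) = \mu_x + \rho\, \frac{\sigma_x}{\sigma_y} (y - \mu_y), \qquad D^2(X \mid Y = y) = \sigma_x^2 (1 - \rho^2) . \]
It follows that, for the bivariate normal model, the regression of $X$ on $Y$ is linear, and the equation of the regression line is
\[ x = \mu_x + \rho\, \frac{\sigma_x}{\sigma_y} (y - \mu_y) . \]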
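The corollary can be checked numerically: integrating $x\,f(x,y)/f_Y(y)$ over $x$ for a fixed $y$ should return the value on the regression line. The R sketch below does this with `integrate`; the parameter values and the point `y0` are assumptions chosen for illustration only.

```r
# Hypothetical parameters of the bivariate normal model
mu_x <- 1; mu_y <- -2
s_x <- 2;  s_y <- 0.5
rho <- 0.6

# Bivariate normal density f(x, y), vectorized in x
f <- function(x, y) {
  zx <- (x - mu_x) / s_x
  zy <- (y - mu_y) / s_y
  exp(-(zx^2 - 2 * rho * zx * zy + zy^2) / (2 * (1 - rho^2))) /
    (2 * pi * s_x * s_y * sqrt(1 - rho^2))
}

y0 <- -1.5  # the conditioning value Y = y0

# Marginal density f_Y(y0) = integral of f(x, y0) dx
f_Y <- integrate(function(x) f(x, y0), -Inf, Inf)$value

# M(X | Y = y0) = integral of x f(x, y0) / f_Y(y0) dx
cond_mean_numeric <- integrate(function(x) x * f(x, y0), -Inf, Inf)$value / f_Y

# Value given by the regression line
cond_mean_formula <- mu_x + rho * s_x / s_y * (y0 - mu_y)

print(c(numeric = cond_mean_numeric, formula = cond_mean_formula))
```

Up to numerical integration error, the two printed values should coincide, and `f_Y` should match `dnorm(y0, mu_y, s_y)` as stated in Property 1.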
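Estimation of the regression line. Let $(x_i, y_i)$, $i = 1, \dots, n$, be observations on the vector $(X, Y)$, for which
\[ M(X \mid Y = y) = a + b y . \]
The least squares method minimizes
\[ SS(a, b) = \sum_{i=1}^n (x_i - a - b y_i)^2 . \]
The system $\frac{\partial SS}{\partial a} = 0$, $\frac{\partial SS}{\partial b} = 0$ is written in the form
\[ \begin{cases} n a + b \sum_{i=1}^n y_i = \sum_{i=1}^n x_i \\[4pt] a \sum_{i=1}^n y_i + b \sum_{i=1}^n y_i^2 = \sum_{i=1}^n x_i y_i \end{cases} \]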
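The determinant of the system is
\[ \begin{vmatrix} n & \sum_{i=1}^n y_i \\ \sum_{i=1}^n y_i & \sum_{i=1}^n y_i^2 \end{vmatrix} = n \sum_{i=1}^n y_i^2 - (n \bar{y})^2 = n \sum_{i=1}^n (y_i - \bar{y})^2 > 0 \]
(provided the $y_i$ are not all equal).
Notation:
\[ s_x^2 = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})^2, \qquad s_y^2 = \frac{1}{n} \sum_{i=1}^n (y_i - \bar{y})^2, \qquad s_{xy} = \frac{1}{n} \sum_{i=1}^n (x_i - \bar{x})(y_i - \bar{y}), \qquad r = \frac{s_{xy}}{s_x s_y} . \]
The solution of the system is
\[ \hat{b} = \frac{s_{xy}}{s_y^2} = r\, \frac{s_x}{s_y}, \qquad \hat{a} = \bar{x} - \hat{b}\, \bar{y}, \]
and the estimated regression line is
\[ x - \bar{x} = r\, \frac{s_x}{s_y} (y - \bar{y}) . \]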
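The closed-form solution can be compared with what `lm` computes. The R sketch below evaluates $\hat b = s_{xy}/s_y^2$ and $\hat a = \bar x - \hat b\,\bar y$ on the built-in `longley` data, taking Employed as $x$ and Population as $y$ (this choice of variables is an assumption of the sketch).

```r
# Response ("effect") and covariate ("cause") from the longley data set
x <- longley$Employed
y <- longley$Population

# Empirical moments with the 1/n convention used in the text
s_xy <- mean((x - mean(x)) * (y - mean(y)))
s_y2 <- mean((y - mean(y))^2)

# Least squares solution of the normal equations
b_hat <- s_xy / s_y2
a_hat <- mean(x) - b_hat * mean(y)

# The same fit through lm
fit <- lm(x ~ y)

print(c(a_hat = a_hat, b_hat = b_hat))
print(coef(fit))
```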
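Conditional on $Y_1 = y_1, \dots, Y_n = y_n$, the estimators are
\[ \hat{b}(X_1, \dots, X_n) = \frac{\sum_{i=1}^n (X_i - \bar{X})(y_i - \bar{y})}{\sum_{i=1}^n (y_i - \bar{y})^2} = \frac{1}{\sum_{i=1}^n (y_i - \bar{y})^2} \sum_{i=1}^n X_i\, (y_i - \bar{y}), \]
\[ \hat{a}(X_1, \dots, X_n) = \bar{X} - \hat{b}(X_1, \dots, X_n)\, \bar{y}, \]
and they satisfy
\[ M(\hat{b} \mid Y_1 = y_1, \dots, Y_n = y_n) = b, \qquad M(\hat{a} \mid Y_1 = y_1, \dots, Y_n = y_n) = a . \]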
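We denote
\[ SS_{\text{resid}} = \sum_{i=1}^n \left( x_i - \hat{a} - \hat{b} y_i \right)^2 . \]
For the bivariate normal model, $X_i$ conditional on $Y_i = y_i$ is distributed $N\left(a + b y_i,\ \sigma_x^2 (1 - \rho^2)\right)$, $i = 1, \dots, n$.
Property 3.
The random variable
\[ SS_{\text{resid}} = \sum_{i=1}^n \left( X_i - \hat{a} - \hat{b} y_i \right)^2 \]
has the property
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{resid}} \sim \chi^2 (n - 2) . \]
This follows from Property 8 in "Parameter estimation" (the least squares method).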
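We now analyze the sources of variability in the data, using the linear regression model (ANOVA for the regression line).
At this point we have the following values:
the observed values of the covariate (the "cause" variable): $y_i$, $i = 1, \dots, n$;
the observed values of the response (the "effect"): $x_i$, $i = 1, \dots, n$;
the fitted values: $\hat{x}_i = \hat{a} + \hat{b} y_i$, $i = 1, \dots, n$;
the residuals: $x_i - \hat{x}_i$, $i = 1, \dots, n$.
We define
\[ SS_{\text{resid}} = \sum_{i=1}^n (x_i - \hat{x}_i)^2 = \sum_{i=1}^n \left( x_i - \hat{a} - \hat{b} y_i \right)^2, \]
\[ SS_{\text{regression}} = \sum_{i=1}^n (\hat{x}_i - \bar{x})^2, \qquad SS_{\text{total}} = \sum_{i=1}^n (x_i - \bar{x})^2 . \]
These satisfy
\[ SS_{\text{total}} = SS_{\text{resid}} + SS_{\text{regression}} . \]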
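Proof:
\[ SS_{\text{total}} = \sum_{i=1}^n (x_i - \hat{x}_i + \hat{x}_i - \bar{x})^2 = SS_{\text{resid}} + SS_{\text{regression}} + 2 \sum_{i=1}^n (x_i - \hat{x}_i)(\hat{x}_i - \bar{x}) . \]
Using $\hat{a} = \bar{x} - \hat{b} \bar{y}$ and $\hat{b} = s_{xy} / s_y^2$, the cross term is
\[ \sum_{i=1}^n (x_i - \hat{x}_i)(\hat{x}_i - \bar{x}) = \sum_{i=1}^n \left( x_i - \hat{a} - \hat{b} y_i \right) \left( \hat{a} + \hat{b} y_i - \bar{x} \right) = \sum_{i=1}^n \left( x_i - \bar{x} + \hat{b} \bar{y} - \hat{b} y_i \right) \left( \bar{x} - \hat{b} \bar{y} + \hat{b} y_i - \bar{x} \right) \]
\[ = \sum_{i=1}^n \left[ (x_i - \bar{x}) - \hat{b} (y_i - \bar{y}) \right] \hat{b}\, (y_i - \bar{y}) = \hat{b} \left( n s_{xy} - \hat{b}\, n s_y^2 \right) = \hat{b} \left( n s_{xy} - \frac{s_{xy}}{s_y^2}\, n s_y^2 \right) = 0 . \]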
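The decomposition and the vanishing cross term can be verified numerically. The R sketch below uses the `longley` data with Employed as $x$ and Population as $y$ (an assumption of the sketch) and computes the three sums of squares from the fitted values of `lm`.

```r
x <- longley$Employed
y <- longley$Population

fit <- lm(x ~ y)
x_hat <- fitted(fit)            # fitted values a_hat + b_hat * y_i

SS_resid <- sum((x - x_hat)^2)        # residual sum of squares
SS_regr  <- sum((x_hat - mean(x))^2)  # regression sum of squares
SS_total <- sum((x - mean(x))^2)      # total sum of squares

# Cross term from the proof; it should vanish up to rounding
cross <- sum((x - x_hat) * (x_hat - mean(x)))

print(c(SS_total = SS_total, sum_of_parts = SS_resid + SS_regr, cross = cross))
```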
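The distributions of
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}} \quad \text{and} \quad \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{total}} \]
are obtained from the following result on quadratic forms.
Assume that $\sum_{i=1}^N y_i^2$ decomposes into a sum of $m$ quadratic forms
\[ q_j = \sum_{\alpha, \beta = 1}^N a^{(j)}_{\alpha\beta}\, y_\alpha y_\beta, \quad j = 1, \dots, m; \qquad \sum_{i=1}^N y_i^2 = \sum_{j=1}^m q_j, \]
and let $r_j$ be the rank of the matrix $A_j = \left\| a^{(j)}_{\alpha\beta} \right\|_{\alpha, \beta = 1, \dots, N}$, $j = 1, \dots, m$. A necessary and sufficient condition for the existence of an orthogonal transformation $z = By$ such that
\[ q_j = \sum_{k = r_1 + \dots + r_{j-1} + 1}^{r_1 + \dots + r_j} z_k^2, \quad j = 1, \dots, m, \]
is that
\[ r_1 + \dots + r_m = N . \]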
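Proof:
"$\Longrightarrow$" Let $z = By$, $B'B = I$, with the $q_j$ of the stated form. We have
\[ \sum_{j=1}^m y' A_j y = y'y \quad \forall y . \]
It follows that
\[ \sum_{j=1}^m A_j = I, \qquad \operatorname{rank}\left( \sum_{j=1}^m A_j \right) = N . \]
But
\[ \operatorname{rank}\left( \sum_{j=1}^m A_j \right) \le \sum_{j=1}^m \operatorname{rank}(A_j) = \sum_{j=1}^m r_j . \]
Hence $N \le r_1 + \dots + r_m$. Since the forms $q_j$ use the distinct coordinates $z_1, \dots, z_{r_1 + \dots + r_m}$ of the $N$-dimensional vector $z$, we also have $r_1 + \dots + r_m \le N$, so $r_1 + \dots + r_m = N$.
"$\Longleftarrow$" Assume $r_1 + \dots + r_m = N$. We build $B$ in partitioned form,
\[ B = \begin{pmatrix} B_1 \\ \vdots \\ B_m \end{pmatrix}, \]
where $B_i$ has $r_i$ rows and $N$ columns.
For $i = 1$: $A_1$ is $N \times N$ dimensional, symmetric, of rank $r_1$, so there is a nonsingular matrix $D_0$ with
\[ D_0' A_1 D_0 = \begin{pmatrix} I_q & 0 & 0 \\ 0 & -I_{r_1 - q} & 0 \\ 0 & 0 & 0 \end{pmatrix}, \qquad D = D_0^{-1} = \| d_{\alpha\beta} \|, \]
and we have
\[ A_1 = D' \begin{pmatrix} I_q & 0 & 0 \\ 0 & -I_{r_1 - q} & 0 \\ 0 & 0 & 0 \end{pmatrix} D . \]
We keep
\[ b^{(1)}_{\alpha\beta} = d_{\alpha\beta}, \quad \alpha = 1, \dots, r_1,\ \beta = 1, \dots, N; \qquad B_1 = \left\| b^{(1)}_{\alpha\beta} \right\|_{\alpha = 1, \dots, r_1;\ \beta = 1, \dots, N}, \]
\[ z_\alpha = \sum_{\beta = 1}^N b^{(1)}_{\alpha\beta}\, y_\beta, \quad \alpha = 1, \dots, r_1 . \]
Then
\[ q_1 = y' A_1 y = y' D' \begin{pmatrix} I_q & 0 & 0 \\ 0 & -I_{r_1 - q} & 0 \\ 0 & 0 & 0 \end{pmatrix} D y = z_1^2 + \dots + z_q^2 - z_{q+1}^2 - \dots - z_{r_1}^2, \]
that is,
\[ q_1 = \sum_{\alpha = 1}^{r_1} c_\alpha z_\alpha^2, \quad c_\alpha \in \{-1, 1\} . \]
For arbitrary $i$:
\[ z_\alpha = \sum_{\beta = 1}^N b^{(i)}_{\alpha\beta}\, y_\beta, \quad \alpha = r_1 + \dots + r_{i-1} + 1, \dots, r_1 + \dots + r_i, \]
\[ B_i = \left\| b^{(i)}_{\alpha\beta} \right\|_{\alpha = r_1 + \dots + r_{i-1} + 1, \dots, r_1 + \dots + r_i;\ \beta = 1, \dots, N}, \qquad q_i = \sum_{\alpha = r_1 + \dots + r_{i-1} + 1}^{r_1 + \dots + r_i} c_\alpha z_\alpha^2, \quad c_\alpha \in \{-1, 1\} . \]
Then
\[ \sum_{i=1}^m q_i = \sum_{\alpha = 1}^N c_\alpha z_\alpha^2, \quad c_\alpha \in \{-1, 1\} . \]
But
\[ \sum_{i=1}^m q_i = y'y > 0 \quad \forall y \ne 0, \]
so $B' C B = I$ with $C = \operatorname{diag}(c_1, \dots, c_N)$; hence $B$ is nonsingular, $C$ is positive definite, and
\[ c_\alpha = 1 \quad \forall \alpha = 1, \dots, N . \]
Therefore
\[ q_i = \sum_{\alpha = r_1 + \dots + r_{i-1} + 1}^{r_1 + \dots + r_i} z_\alpha^2, \quad i = 1, \dots, m, \qquad z_\alpha = \sum_{\beta = 1}^N b_{\alpha\beta}\, y_\beta, \quad \alpha = 1, \dots, N, \]
where $B = \| b_{\alpha\beta} \|$ is $N \times N$, partitioned as above. Since
\[ \sum_{\alpha = 1}^N y_\alpha^2 = \sum_{\alpha = 1}^N z_\alpha^2, \]
we obtain $B'B = I$.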
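Theorem
Let $Y_1, \dots, Y_N$ be independent random variables, identically distributed $N(0, 1)$, and let
\[ \sum_{i=1}^N Y_i^2 = \sum_{i=1}^m Q_i, \qquad Q_i = Y' A_i Y, \]
with $A_i = \left\| a^{(i)}_{\alpha\beta} \right\|_{\alpha, \beta = 1, \dots, N}$ of rank $r_i$, $i = 1, \dots, m$. A necessary and sufficient condition for $Q_1, \dots, Q_m$ to be independent, with $Q_i \sim \chi^2(r_i)$, is that
\[ r_1 + \dots + r_m = N . \]
Proof (that $r_1 + \dots + r_m = N$ implies the conclusion):
By the preceding result there is an orthogonal matrix $B = \| b_{\alpha\beta} \|$ with
\[ Q_i = \sum_{\alpha = r_1 + \dots + r_{i-1} + 1}^{r_1 + \dots + r_i} Z_\alpha^2, \quad i = 1, \dots, m, \qquad Z_\alpha = \sum_{\beta = 1}^N b_{\alpha\beta}\, Y_\beta, \quad \alpha = 1, \dots, N . \]
From the properties of linear combinations of independent, normally distributed variables, it follows that $Z_\alpha$ is distributed $N(0, 1)$ for every $\alpha = 1, \dots, N$ and that $Z_1, \dots, Z_N$ are independent. Then, by the definition of the $\chi^2$ distribution, $Q_i \sim \chi^2(r_i)$, $i = 1, \dots, m$, and, by the associativity of independence, $Q_i$ is independent of $Q_j$ for every $i \ne j$.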
Corollary 1
Let $Y_1, \dots, Y_k$ be independent random variables, identically distributed $N(0, 1)$. Denote $Y = (Y_1, \dots, Y_k)'$. A necessary and sufficient condition for $Y'AY$ to be $\chi^2$ distributed is that $A^2 = A$, in which case the number of degrees of freedom equals $\operatorname{rank}(A)$.
Corollary 2.
Let $Y_1, \dots, Y_k$ be independent random variables, identically distributed $N(0, 1)$. Denote $Y = (Y_1, \dots, Y_k)'$. Assume that $Y'Y = Q_1 + Q_2$, where
\[ Q_1 = Y'AY \sim \chi^2(r) . \]
Then $Q_2 \sim \chi^2(k - r)$.
Corollary 3.
Let $Y_1, \dots, Y_k$ be independent random variables, identically distributed $N(0, 1)$. Denote $Y = (Y_1, \dots, Y_k)'$. Let $Q, Q_1, Q_2$ be quadratic forms in $Y$ with
\[ Q = Q_1 + Q_2, \qquad Q \sim \chi^2(a), \qquad Q_1 \sim \chi^2(b) . \]
Corollary 4.
Let $Y_1, \dots, Y_k$ be independent random variables, identically distributed $N(0, 1)$. Denote $Y = (Y_1, \dots, Y_k)'$. Let $Y'A_1Y \sim \chi^2(a)$ and $Y'A_2Y \sim \chi^2(b)$. A necessary and sufficient condition for the two quadratic forms to be independent is that $A_1 A_2 = 0$.
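The matrix conditions in Corollaries 1 and 4 are easy to check numerically. The R sketch below takes, as an illustrative pair (an assumption of the example, not a case treated in the text), the matrix $J/n$ of the quadratic form $n\bar{Y}^2$ and the centering matrix $I - J/n$ of $\sum (Y_i - \bar{Y})^2$, and verifies idempotency, the ranks, and the zero product.

```r
n <- 5

# A1: matrix of the quadratic form n * mean(Y)^2
A1 <- matrix(1 / n, n, n)

# A2: centering matrix, quadratic form sum((Y - mean(Y))^2)
A2 <- diag(n) - A1

# Corollary 1: idempotency A^2 = A, degrees of freedom = rank(A)
idem1 <- isTRUE(all.equal(A1 %*% A1, A1))
idem2 <- isTRUE(all.equal(A2 %*% A2, A2))
ranks <- c(qr(A1)$rank, qr(A2)$rank)

# Corollary 4: independence requires A1 A2 = 0
prod_is_zero <- max(abs(A1 %*% A2)) < 1e-12

print(list(idempotent = c(idem1, idem2), ranks = ranks, zero_product = prod_is_zero))
```

For this pair the ranks add up to $n$, which is the rank condition $r_1 + \dots + r_m = N$ with $m = 2$.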
============================================
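If $b = 0$, then
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}} \sim \chi^2(1), \qquad \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{total}} \sim \chi^2(n - 1), \]
and the variables $\frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}}$ and $\frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{resid}}$ are independent (with respect to the conditional distribution).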
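Proof:
If $b = 0$, then, conditional on $Y_i = y_i$, $X_i$ is distributed $N\left(a,\ \sigma_x^2 (1 - \rho^2)\right)$, $\forall i$. We have
\[ \hat{b} = \frac{1}{\sum_{i=1}^n (y_i - \bar{y})^2} \sum_{i=1}^n (y_i - \bar{y})\, X_i, \]
\[ SS_{\text{regression}} = \sum_{i=1}^n \left( \hat{a} + \hat{b} y_i - \bar{X} \right)^2 = \sum_{i=1}^n \left( \bar{X} - \hat{b} \bar{y} + \hat{b} y_i - \bar{X} \right)^2 = \hat{b}^2 \sum_{i=1}^n (y_i - \bar{y})^2 = \frac{1}{\sum_{i=1}^n (y_i - \bar{y})^2} \left( \sum_{i=1}^n (y_i - \bar{y})\, X_i \right)^2, \]
that is,
\[ SS_{\text{regression}} = \frac{1}{n s_y^2}\, (X_1, \dots, X_n)\, B \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \]
where
\[ B = \left\| (y_i - \bar{y})(y_j - \bar{y}) \right\|_{i,j = 1, \dots, n} = \| b_{ij} \| . \]
The columns of $B$ are proportional, so $\operatorname{rank}(B) = 1$. The rows of $B$ sum to zero, $\sum_{j=1}^n b_{ij} = 0$, so the common mean $a$ cancels in the quadratic form. Moreover $B^2 = n s_y^2\, B$, hence
\[ \left( \frac{1}{n s_y^2} B \right)^2 = \frac{1}{n s_y^2} B . \]
Since $Z_i = \frac{X_i - a}{\sigma_x \sqrt{1 - \rho^2}}$, $i = 1, \dots, n$, are independent $N(0, 1)$ and
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}} = Z' \left( \frac{1}{n s_y^2} B \right) Z, \]
Corollary 1 gives
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}} \sim \chi^2(1) . \]
We can write
\[ SS_{\text{total}} = \sum_{i=1}^n \left( X_i - \frac{1}{n} \sum_{i=1}^n X_i \right)^2 = \frac{1}{n^2}\, (X_1, \dots, X_n)\, A \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}, \qquad \frac{1}{n^2} A = I_n - \frac{1}{n}\, \mathbf{1} \mathbf{1}' . \]
The matrix $\frac{1}{n^2} A$ is idempotent, annihilates the constant vector, and has rank $n - 1$. It follows that
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{total}} \sim \chi^2(n - 1) . \]
Furthermore,
\[ \frac{1}{n s_y^2} B \left( \frac{1}{n^2} A - \frac{1}{n s_y^2} B \right) = 0 . \]
Since we also have
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{resid}} = \frac{1}{\sigma_x^2 (1 - \rho^2)} \left( SS_{\text{total}} - SS_{\text{regression}} \right) = Z' \left( \frac{1}{n^2} A - \frac{1}{n s_y^2} B \right) Z, \]
with
\[ \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}} \sim \chi^2(1), \qquad \frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{resid}} \sim \chi^2(n - 2), \]
we can apply Corollary 4 and obtain the independence of the variables $\frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{regression}}$ and $\frac{1}{\sigma_x^2 (1 - \rho^2)}\, SS_{\text{resid}}$.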
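The matrix identities used in the proof can be checked on concrete covariate values. The R sketch below builds $\frac{1}{n s_y^2} B$, the centering matrix $\frac{1}{n^2} A$ and their difference, using the Population column of `longley` as the $y_i$ (an assumption of the sketch); the names `P_reg`, `P_tot`, `P_res` are introduced here only for illustration.

```r
y <- longley$Population
n <- length(y)
d <- y - mean(y)                 # centered covariate values y_i - ybar

# B / (n s_y^2): matrix of the quadratic form of SS_regression
P_reg <- outer(d, d) / sum(d^2)

# A / n^2 = I - (1/n) 1 1': matrix of the quadratic form of SS_total
P_tot <- diag(n) - matrix(1 / n, n, n)

# Matrix of the quadratic form of SS_resid = SS_total - SS_regression
P_res <- P_tot - P_reg

# Idempotency and the zero product required by Corollary 4
err_idem <- max(abs(P_reg %*% P_reg - P_reg))
err_prod <- max(abs(P_reg %*% P_res))

# For idempotent matrices the trace equals the rank: 1, n - 1, n - 2
traces <- c(sum(diag(P_reg)), sum(diag(P_tot)), sum(diag(P_res)))

print(c(err_idem = err_idem, err_prod = err_prod))
print(traces)
```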
FUNCTIONS IN R
> cauza <- c(y1, ..., yn)
> efect <- c(x1, ..., xn)
> model <- lm(efect ~ cauza)
The function lm returns, among others, the coefficients $\hat{a}, \hat{b}$ (coefficients) and the fitted values $\{\hat{x}_i;\ i = 1, \dots, n\}$.
ANOVA table:
Source        SS               Degrees of freedom   Mean SS
Regression    SS_regression    1                    SS_regression
Residuals     SS_resid         n - 2                SS_resid / (n - 2)
Total         SS_total         n - 1
The function anova returns the ANOVA table and tests for the hypothesis $\{b = 0\}$, which we discuss in the last part of the course.
APPLICATION
longley {datasets}
R Documentation
Longley's Economic Regression Data
Description
A macroeconomic data set which provides a well-known
example for a highly collinear regression.
Usage
longley
Format
A data frame with 7 economical variables, observed
yearly from 1947 to 1962 (n=16).
GNP.deflator: GNP implicit price deflator (1954=100)
GNP: Gross National Product.
Unemployed: number of unemployed.
Armed.Forces: number of people in the armed forces.
Population: noninstitutionalized population >= 14
years of age.
Year: the year (time).
Employed: number of people employed.
The regression lm(Employed ~.) is known to be highly
collinear.
We choose Employed as the response variable, with Population as the covariate.
> X <- longley[, "Employed"]
> Y <- longley[, "Population"]
> model1 <- lm(X ~ Y)
> model1
Call:
lm(formula = X ~ Y)
Coefficients:
(Intercept)            Y
     8.3807       0.4849
> summary(model1)
Call:
lm(formula = X ~ Y)
Residuals:
    Min      1Q  Median      3Q     Max
-1.4362 -0.9740  0.2021  0.5531  1.9048
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   8.3807     4.4224   1.895    0.079 .
Y             0.4849     0.0376  12.896 3.69e-09
Residual standard error: 1.013 on 14 degrees of freedom
Multiple R-Squared: 0.9224, Adjusted R-squared: 0.9168
F-statistic: 166.3 on 1 and 14 DF, p-value: 3.693e-09
p-value < 0.05, so the hypothesis {b = 0} is rejected at the 5% significance level: the linear dependence of X on Y is statistically significant.
> anova(model1)
Analysis of Variance Table
Response: X
          Df  Sum Sq Mean Sq F value    Pr(>F)
Y          1 170.643 170.643  166.30 3.693e-09
Residuals 14  14.366   1.026
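The F value and its p-value in the table can be recomputed by hand from the sums of squares, as the ratio of the mean squares with 1 and n - 2 degrees of freedom. The R sketch below does this for the same model and compares the result with the output of `anova`; the variable names `F_stat` and `p_val` are introduced here for illustration.

```r
X <- longley[, "Employed"]
Y <- longley[, "Population"]
model1 <- lm(X ~ Y)
n <- length(X)

# Sums of squares of the ANOVA decomposition
SS_regr  <- sum((fitted(model1) - mean(X))^2)
SS_resid <- sum(residuals(model1)^2)

# F = (SS_regression / 1) / (SS_resid / (n - 2))
F_stat <- SS_regr / (SS_resid / (n - 2))

# Upper tail probability of the F(1, n - 2) distribution
p_val <- pf(F_stat, df1 = 1, df2 = n - 2, lower.tail = FALSE)

# Values reported by anova for comparison
tab <- anova(model1)
print(c(F_stat = F_stat, anova_F = tab[["F value"]][1]))
print(c(p_val = p_val, anova_p = tab[["Pr(>F)"]][1]))
```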