10-11 PE sem-REGRESSION 0212101511

Economic Forecasting
(Master's in Management, year I)
Seminar notes - semester I - 2010 - 2011 Daniela.hincu@man.ase.ro

Master PE 2010 - 2011 1
Contents
I. Review of regression methods
II. Regression with dummy variables
III. Logistic regression
Classification of regression methods

Multiple Regression Models
- Linear: Linear, Dummy Variable, Interaction
- Nonlinear: Polynomial, Square Root, Log, Reciprocal, Exponential
Dummy variables (dual variables)

- Two categories: dichotomous, with values 0 or 1
- Three categories: e.g. quality coded 0, 1 or 2; or client, potential client, competitor's client
- Four categories: quarter I, II, III or IV
- For k categories, introduce k-1 dummy variables, each taking the values 0 or 1
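The k-1 rule can be sketched in a few lines of plain Python. This is only an illustration: the helper name `dummy_encode`, the choice of quarter IV as the baseline category, and the sample data are assumptions, not part of the seminar material.

```python
def dummy_encode(values, baseline):
    """Encode a categorical series with k levels as k-1 columns of 0/1.

    The baseline level gets no column: it is the case where all dummies are 0.
    """
    levels = sorted(set(values))
    kept = [level for level in levels if level != baseline]
    rows = [[1 if value == level else 0 for level in kept] for value in values]
    return kept, rows


# Hypothetical sample: the quarter in which each observation was recorded.
quarters = ["I", "II", "III", "IV", "II", "I"]
columns, rows = dummy_encode(quarters, baseline="IV")

print("dummy columns:", columns)
for quarter, row in zip(quarters, rows):
    print(quarter, row)
```

With four categories the sketch produces three columns, and an observation from the baseline quarter is represented by a row of zeros.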

Used-car sales results - Model 1: price = f(kilometers)

Regression Statistics (Observations 100)
Multiple R           0.8063
R Square             0.6501
Adjusted R Square    0.6466
Standard Error       151.6

ANOVA
              df    SS         MS         F        Significance F
Regression    1     4183528    4183528    182.1    0.000
Residual      98    2251362    22973
Total         99    6434890

              Coefficients    Standard Error    t Stat    P value
Intercept     6533            84.51             77.31     0.0000
kilometers    -0.0312         0.0023            -13.49    0.0000
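The summary statistics of Model 1 follow from the ANOVA sums of squares. The short check below recomputes them with the standard definitions (R Square = SS regression / SS total, F = MS regression / MS residual, Standard Error = square root of MS residual); the variable names are illustrative.

```python
import math

# Values copied from the Model 1 ANOVA table.
n = 100
ss_regression, ss_residual = 4183528, 2251362
df_regression, df_residual = 1, 98

ss_total = ss_regression + ss_residual
r_square = ss_regression / ss_total
adj_r_square = 1 - (1 - r_square) * (n - 1) / df_residual
ms_residual = ss_residual / df_residual
f_stat = (ss_regression / df_regression) / ms_residual
std_error = math.sqrt(ms_residual)

print(f"SS total          = {ss_total}")
print(f"R Square          = {r_square:.4f}")
print(f"Adjusted R Square = {adj_r_square:.4f}")
print(f"F                 = {f_stat:.1f}")
print(f"Standard Error    = {std_error:.1f}")
```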
6/23/2011
Model 2 - multiple regression with a new variable, color: 1 for white, 2 for silver, 3 otherwise

Regression Statistics (Observations 100)
Multiple R           0.8095
R Square             0.6552
Adjusted R Square    0.6481
Standard Error       151.2

ANOVA
              df    SS         MS         F        Significance F
Regression    2     4216263    2108132    92.17    0.000
Residual      97    2218627    22872
Total         99    6434890

              Coefficients    Standard Error    t Stat    P value
Intercept     6580            92.96             70.79     0.0000
kilometers    -0.0313         0.0023            -13.56    0.0000
color         -21.67          18.11             -1.20     0.2345
$$I_1 = \begin{cases} 1 & \text{if the color is white} \\ 0 & \text{if the color is not white} \end{cases}$$

$$I_2 = \begin{cases} 1 & \text{if the color is silver} \\ 0 & \text{if the color is not silver} \end{cases}$$
The new model with 2 dummy variables

- $I_1 = 1,\ I_2 = 0$ indicates color = white
- $I_1 = 0,\ I_2 = 1$ indicates color = metallic/silver
- $I_1 = 0,\ I_2 = 0$ indicates a color that is neither white nor silver

$$\text{price} = \beta_0 + \beta_1\,\text{kilometers} + \beta_3 I_1 + \beta_4 I_2$$
Regression Statistics (Observations 100)
Multiple R           0.8355
R Square             0.6980
Adjusted R Square    0.6886
Standard Error       142.3

ANOVA
              df    SS         MS         F        Significance F
Regression    3     4491749    1497250    73.97    0.000
Residual      96    1943141    20241
Total         99    6434890

              Coefficients    Standard Error    t Stat    P value
Intercept     6350            92.17             68.90     0.0000
kilometers    -0.0278         0.0024            -11.72    0.0000
I(1)          45.24           34.08             1.33      0.1876
I(2)          147.7           38.18             3.87      0.0002
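Interpretation - cars that are neither white nor silver

$$\text{price} = \beta_0 + \beta_1\,\text{kilometers} + \beta_3(0) + \beta_4(0)$$
$$\text{price} = 6350 - 0.0278\,\text{kilometers}$$

Interpretation - white cars (not silver)

$$\text{price} = \beta_0 + \beta_1\,\text{kilometers} + \beta_3(1) + \beta_4(0)$$
$$\text{price} = 6350 - 0.0278\,\text{kilometers} + 45.2 = 6395.2 - 0.0278\,\text{kilometers}$$

Interpretation - silver/metallic cars (not white)

$$\text{price} = \beta_0 + \beta_1\,\text{kilometers} + \beta_3(0) + \beta_4(1)$$
$$\text{price} = 6350 - 0.0278\,\text{kilometers} + 148 = 6498 - 0.0278\,\text{kilometers}$$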
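The three interpretations amount to one regression line with a color-dependent intercept. A minimal sketch using the coefficients reported in the table of the model with two dummy variables; the function names and the 36,000 km example are assumptions made for illustration.

```python
# Coefficients from the regression with two dummy variables.
B0, B_KM, B_WHITE, B_SILVER = 6350, -0.0278, 45.24, 147.7


def predicted_price(kilometers, color):
    """Fitted price; any color other than 'white' or 'silver' is the baseline."""
    i1 = 1 if color == "white" else 0
    i2 = 1 if color == "silver" else 0
    return B0 + B_KM * kilometers + B_WHITE * i1 + B_SILVER * i2


def intercept(color):
    """Intercept of the price line for a given color (price at 0 km)."""
    return predicted_price(0, color)


for color in ("other", "white", "silver"):
    print(color, round(intercept(color), 2), round(predicted_price(36000, color), 2))
```

The slope is the same for every color, so the fitted lines are parallel and differ only by the dummy coefficients (45.24 and 147.7, rounded to 45.2 and 148 in the interpretation above).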
Application: regression with a dummy variable

A dummy variable is a binary variable that takes the value 1 or 0. It is commonly used to examine group and time effects in regression. Panel data analysis estimates the fixed effect and/or random effect models using dummy variables. The fixed effect model examines differences in intercept among groups, assuming the same slopes. By contrast, the random effect model estimates error variances of groups, assuming the same intercept and slopes. The data used here are of the top 50 information technology firms, from page 308 of the OECD Information Technology Outlook 2004 (http://thesius.sourceoecd.org/). The data set contains revenue, R&D budget, and net income in current USD millions. Source: Using Dummy Variables in Regression, Park, Hun Myoung, 2002-Present. Jeeshim and KUCC625 (2005-03-26)
Model 1 - regressing R&D budget in 2002 on net income in 2000 and firm type. The dummy variable d is set to 1 for equipment and software companies and zero for other firms.
Model 2 - assuming that equipment and software firms have more R&D investment than do telecommunications and electronics companies. There may or may not be correlation (dependence) between the dummy variable (firm types) and regressors (net income).
Comparison between Model 1 and Model 2 (Fixed Group Effect)
The top green line is the regression line for equipment and software companies, while the bottom yellow line is the one for telecommunication and electronics firms in Model 2. The green and yellow lines are parallel, separated by 1,006.626, the coefficient of the dummy variable. The intercept for equipment and software firms is computed as 2,140.205 = 1,006.626 + 1,133.579.
Model 2 - the regression with two dummy variables: one for equipment and software firms and another, d0, for telecommunication and electronics firms
Logistic regression models

A logistic link function is used in the regression formula, so that the regression formula can be written as:
$$\operatorname{logit}[\Pr(\text{outcome})] = a + b_1 x_1 + b_2 x_2 + \dots + b_i x_i$$
where a is the intercept and $b_1$ to $b_i$ are the regression coefficients for the i covariables $x_1$ to $x_i$, as in other regression models. The logit is the natural logarithm of the odds of the probability p that the outcome occurs: $\ln(p/(1-p))$. Odds ratios can be calculated by exponentiating the coefficients: $OR = \exp(b_i)$. The relationship between the probability of the outcome and the logit is a characteristic curve.
Logistic Link Function

The relationship between the probability of an outcome and the logit of the probability is a characteristic curve. The logit is calculated as ln(probability/(1 - probability)). When the logit is 0, the probability is 50%.
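The link function and its inverse are easy to tabulate. The sketch below uses only the Python standard library; the function names `logit` and `inv_logit` and the sample probabilities are illustrative choices.

```python
import math


def logit(p):
    """Natural logarithm of the odds p / (1 - p), for 0 < p < 1."""
    return math.log(p / (1 - p))


def inv_logit(z):
    """Probability corresponding to a logit value z (the S-shaped curve)."""
    return 1 / (1 + math.exp(-z))


for p in (0.1, 0.25, 0.5, 0.75, 0.9):
    z = logit(p)
    print(f"p = {p:.2f}  logit = {z:+.3f}  back to p = {inv_logit(z):.2f}")
```

A logit of 0 maps back to a probability of 0.5; negative logits correspond to probabilities below 50% and positive logits to probabilities above 50%.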
The logistic regression method

Logistic regression models the relationship between a set of independent variables $x_i$ (categorical, continuous) and a dichotomous (nominal, binary) dependent variable Y. Such a dependent variable typically arises when it represents membership in one of two classes or categories: presence/absence, yes/no, etc. The resulting regression equation, of a different type from the other regressions discussed, provides information about:
- the importance of the variables in differentiating the classes,
- the classification of an observation into a class.

Note that the scatter plot of the values gives no indication of the dependencies. In such cases classical linear regression does not provide an adequate model. We assume that the values of y (a binary variable) are coded 0/1, with the value 1 generally expressing the occurrence of a certain event, so that what is sought is an estimate of the probability of that event occurring as a function of the values of the independent variables.
The case of a single independent variable

The model is:
$$\ln\frac{P(y=1\mid x)}{1 - P(y=1\mid x)} = \alpha + \beta x$$
The quantity on the left-hand side is called the logit (transformation) of the probability P(y=1|x). The expression P(y=1|x) means: the probability of obtaining the value y=1 conditional on the value x; in other words, the probability of classifying observation x into the class y=1, that is, the probability that the value x is associated with the occurrence of the event y=1. P(y=1|x) is denoted p, following the notation of the binomial probability model (probability of success). The logit transformation is needed to map the probability p from the interval (0,1) onto the interval $(-\infty, +\infty)$, which is required when estimating the parameters.

Equivalently, the model is:
$$P(y=1\mid x) = \frac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$$
The model is directly tied to the notion of odds, denoted OR in these notes:
$$OR = \frac{p}{1-p}$$
which is the ratio between the probability of success and the probability of failure. The model can also be written:
$$\frac{p}{1-p} = e^{\alpha + \beta x}$$
which gives the interpretation of the coefficient $\beta$ (positive): $\beta$ is the increase in the logit when x increases by one unit; equivalently, OR is multiplied by $e^{\beta}$ when x increases by one unit.
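The multiplicative interpretation of the coefficient can be checked numerically. In the sketch below the values of `ALPHA` and `BETA` are hypothetical, chosen only to illustrate the property; they do not come from the seminar data.

```python
import math

ALPHA, BETA = -1.5, 0.8  # hypothetical coefficients, for illustration only


def probability(x):
    """P(y = 1 | x) under the single-variable logistic model."""
    z = ALPHA + BETA * x
    return math.exp(z) / (1 + math.exp(z))


def odds(x):
    """Odds p / (1 - p) at the value x."""
    p = probability(x)
    return p / (1 - p)


# Raising x by one unit multiplies the odds by exp(BETA), whatever the start.
for x in (0, 2, 5):
    print(x, round(odds(x + 1) / odds(x), 6), round(math.exp(BETA), 6))
```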
How to identify the constants beta_zero and beta_one

Input data: the data series $\{X_i, Y_i\}$ in which the variable Y is of dual type (its only possible values are 1 or 0).

Procedure: build the logit series
$$L = \beta_0 + \beta_1 X$$
$$p(X) = \frac{e^{L}}{1 + e^{L}}$$
For each point, compute the value
$$p_i^{Y_i}(1 - p_i)^{1 - Y_i}$$
which is maximal (equal to 1) if $Y_i = 1$ and the probability that Y=1 is 1, or if $Y_i = 0$ and the probability that Y=1 is 0. The constants are chosen so as to maximize the product
$$\prod_{i=1}^{n} p_i^{Y_i}(1 - p_i)^{1 - Y_i}$$
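The procedure above can be sketched as follows. The notes do not say which optimizer is used, so the Newton-Raphson iteration here is an assumption made for the illustration, as are the function names and the small data set. The sketch maximizes the logarithm of the product, which has the same maximizer as the product itself.

```python
import math


def p_of(x, b0, b1):
    """p(X) = e^L / (1 + e^L) with L = b0 + b1 * X."""
    return 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))


def log_likelihood(xs, ys, b0, b1):
    """Logarithm of the product of p_i^Y_i * (1 - p_i)^(1 - Y_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = p_of(x, b0, b1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, iterations=25):
    """Maximize the likelihood with Newton-Raphson steps, starting at (0, 0)."""
    b0 = b1 = 0.0
    for _ in range(iterations):
        ps = [p_of(x, b0, b1) for x in xs]
        # Gradient of the log-likelihood with respect to b0 and b1.
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Entries of the 2x2 information matrix (negative Hessian).
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 linear system in closed form.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: Y tends to be 1 for larger X, with some overlap.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]
b0_hat, b1_hat = fit_logistic(xs, ys)
print("beta_zero =", round(b0_hat, 4), "beta_one =", round(b1_hat, 4))
print("log-likelihood =", round(log_likelihood(xs, ys, b0_hat, b1_hat), 4))
```

The data are deliberately not perfectly separable; with perfectly separated classes the likelihood has no finite maximizer and an iteration of this kind would not settle.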
Estimating/testing the model

The hypothesis $\beta = 0$ is tested with the Wald test, the counterpart of the t test from linear regression. The test statistic is
$$G = \frac{\hat\beta^{2}}{\operatorname{var}(\hat\beta)}$$
which follows a $\chi^2$ distribution with one degree of freedom. The confidence interval for $e^{\beta}$ is, according to the results from the analysis of the regression equation,
$$\left(e^{\hat\beta - z_{1-\alpha/2}\,SE(\hat\beta)}\ ;\ e^{\hat\beta + z_{1-\alpha/2}\,SE(\hat\beta)}\right)$$
where $\hat\beta$ is the estimate of $\beta$ (from the estimated regression equation) and $SE(\hat\beta)$ is the standard deviation of the sampling distribution of $\hat\beta$. Note that, for an observation, if p > 0.5 then it is more likely that the observation belongs to the group characterized by y=1. This condition is equivalent to OR > 1, that is, logit > 0.
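A minimal sketch of the Wald statistic and of the interval for the exponentiated coefficient. The estimate 0.8 and standard error 0.3 are hypothetical, and z = 1.96 assumes a 95% confidence level. The p-value uses the identity that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def wald_test(beta_hat, se):
    """Wald statistic G = beta^2 / var(beta) and its chi-square(1) p-value."""
    g = beta_hat ** 2 / se ** 2
    # P(chi2_1 > g) = P(|Z| > sqrt(g)) = erfc(sqrt(g / 2)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(g / 2))
    return g, p_value


def exp_beta_interval(beta_hat, se, z=1.96):
    """Confidence interval for exp(beta); z = 1.96 assumes a 95% level."""
    return math.exp(beta_hat - z * se), math.exp(beta_hat + z * se)


# Hypothetical estimate and standard error, for illustration only.
g, p_value = wald_test(0.8, 0.3)
low, high = exp_beta_interval(0.8, 0.3)
print(f"G = {g:.3f}, p-value = {p_value:.4f}")
print(f"exp(beta) = {math.exp(0.8):.3f}, interval = ({low:.3f}; {high:.3f})")
```

If the interval for exp(beta) does not contain 1, the coefficient differs significantly from 0 at the chosen level, which agrees with the Wald test.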
Estimating/testing the model
The significance of the coefficients is tested with the Wald test or the likelihood-ratio (LR) test. The Wald test was presented for the logistic model with a single factor. The LR test is based on the statistic obtained as the ratio between the maximum of the likelihood function under the null hypothesis (H0) and the maximum of the likelihood function under broader conditions. The Neyman-Pearson lemma shows that, for simple hypotheses, this is the most powerful test at a fixed level. For logistic regression, one computes the ratio between the maximum value of the likelihood function for the full model (L1) and that for the simpler model (L0). The LR statistic is -2log(L0/L1) and follows a $\chi^2$ distribution, with degrees of freedom equal to the number of parameters dropped from the full model. The LR test is recommended when building the model step by step, to check whether the variable removed from the model is significant, hence whether the model can be simplified.
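The LR statistic is usually computed from maximized log-likelihoods, since -2log(L0/L1) = -2(log L0 - log L1). The sketch below covers the case where a single variable is dropped (one degree of freedom); the two log-likelihood values are hypothetical.

```python
import math


def lr_test_one_df(loglik_simple, loglik_full):
    """LR statistic -2 log(L0 / L1) and its chi-square(1) p-value.

    Assumes nested models that differ by one parameter, so that
    loglik_full >= loglik_simple and the statistic is non-negative.
    """
    stat = -2.0 * (loglik_simple - loglik_full)
    p_value = math.erfc(math.sqrt(stat / 2))  # upper tail of chi-square(1)
    return stat, p_value


# Hypothetical maximized log-likelihoods of the simpler and the full model.
stat, p_value = lr_test_one_df(loglik_simple=-60.2, loglik_full=-55.1)
print(f"LR = {stat:.2f}, p-value = {p_value:.4f}")
```

A small p-value indicates that the dropped variable is significant, so the model should not be simplified by removing it.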
The case of several independent variables

The general model is
$$\ln\frac{P(y=1\mid x_1,\dots,x_p)}{1 - P(y=1\mid x_1,\dots,x_p)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$
where $P(y=1\mid x_1, x_2, \dots, x_p)$ is the probability of the event given the p independent variables. The equivalent exponential form follows immediately. Interpretation of the coefficients $\beta_i$: the increase in the logit (the logarithm of OR) when $x_i$ increases by one unit (the other x variables remaining constant).
Interpreting the regression coefficients

$$P(y=1\mid x_1,\dots,x_p) = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}{1 + \exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}$$
For the coefficient $\beta_i$ one obtains:
$$\exp(\beta_i) = \frac{OR_{x_i=1,\,x_j=0,\,j\neq i}}{OR_{base}} = \frac{1}{OR_{base}}\cdot\frac{P(y=1\mid x_i=1,\ x_j=0,\ j\neq i)}{1 - P(y=1\mid x_i=1,\ x_j=0,\ j\neq i)}$$
where $OR_{base}$ is the odds when all the factors are 0. From the multiplicative character of the logistic model, when all the factors are present ($x_i = 1$) and with $\beta_0$ denoting the constant term $\alpha$, one arrives at:
$$OR_{x_1,x_2,\dots,x_p} = \exp(\beta_0)\,\exp(\beta_1)\cdots\exp(\beta_p)$$
and at the useful interpretation that each $\beta_i$ expresses the contribution of the factor $x_i$ to explaining the probability (in the form of OR) of the event y = 1 occurring. Thus, setting $x_i = 1$, $\exp(\beta_i)$ is the constant multiplicative factor, regardless of the values of the other independent variables. If $\beta_i = 0$, the corresponding factor has no effect (multiplication by 1). If $\beta_i < 0$ the presence of the factor reduces the probability of the event y = 1, and if $\beta_i > 0$ it increases that probability.
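The multiplicative structure can be made concrete with three binary factors. All coefficient values in this sketch are hypothetical and the factor names `x1`, `x2`, `x3` are placeholders; the point is only that switching one factor on multiplies the odds by the same exp(beta_i) whatever the other factors are.

```python
import math

ALPHA = -2.0                                  # hypothetical constant term
BETAS = {"x1": 0.9, "x2": 0.0, "x3": -0.4}    # hypothetical coefficients


def odds(profile):
    """Odds of y = 1 as a product of exponentiated terms; absent factors are 0."""
    result = math.exp(ALPHA)
    for name, beta in BETAS.items():
        result *= math.exp(beta * profile.get(name, 0))
    return result


def probability(profile):
    """P(y = 1 | profile), recovered from the odds."""
    o = odds(profile)
    return o / (1 + o)


base = {}
print("base odds:", round(odds(base), 4))
for name, beta in BETAS.items():
    ratio = odds({name: 1}) / odds(base)
    print(name, "multiplies the odds by", round(ratio, 4), "= exp(%.1f)" % beta)
```

In this illustration `x1` raises the probability (positive coefficient), `x2` leaves it unchanged (coefficient 0, multiplication by 1) and `x3` lowers it (negative coefficient).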
Logistic regression - example from medical statistics (source: Regresia logistica, M. Gorunescu)
- dependent variable: presence of hypertension
- independent variables: the subject smokes (value 1 for smoking), the subject is obese (value 1 for obesity), the subject is over 40 years old (value 1 for age)
Interpreting the coefficients
Diffusion of UK residential telephones, in "Modelling and forecasting the diffusion of innovation - A 25-year review", International Journal of Forecasting 22 (2006) 519-545
The diffusion of a single innovation in a single market

Bass (1969) suggests that individuals are influenced by a desire to innovate (coefficient of innovation p) and by a need to imitate others in the population (coefficient of imitation q). The probability that a potential adopter adopts at time t is driven by (p+qF(t)) where F(t) is the proportion of adopters at time t. Relating the similarity of innovation diffusion with the spreading of an epidemic, imitation is often called a contagion effect. In a pure innovation scenario (p >0,q =0), diffusion follows a modified exponential; in a pure imitation scenario (p=0, q>0 ), diffusion follows a logistic curve. Other properties are that (p+q) controls scale and (q/p) controls shape (the condition (q /p)>1 is necessary for the curve to be S-shaped).
Since the diffusion of an innovation is a complex process, involving large numbers of individual decisions, the diffusion of any one innovation will be due to elements of both extreme hypotheses. Van den Bulte and Stremersch (2004) performed a meta-analysis on the use of the Bass model applied to new product diffusion. The study involved 746 different Bass estimations spread over 75 consumer durables and 77 countries. The international comparison enabled them to test several sets of hypotheses, relating the diffusion to both the national culture and the nature of the product. The contagion-based hypotheses for which they found support are that (q/p) ratios are:
- negatively associated with individualism (individualism means more immunity to social contagion), or positively associated with collectivism;
- positively associated with power-distance (a measure of the hierarchical nature of the culture); the assumption here is that classes tend to adopt a new product at a similar time;
- positively associated with masculinity (cultures where there is a clear distinction between gender roles).
The Bass model (1969) considers a population of m individuals who are both innovators (those with a constant propensity to purchase, p) and imitators (those whose propensity to purchase is influenced by the amount of previous purchasing, q). The probability density function for a potential adopter making an adoption at time t is:
$$f(t) = \bigl(p + q\,F(t)\bigr)\bigl(1 - F(t)\bigr)$$
The corresponding cumulative distribution function is
$$F(t) = \frac{1 - \exp(-(p+q)t)}{1 + \dfrac{q}{p}\exp(-(p+q)t)}$$
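The two Bass formulas can be evaluated directly. In the sketch below the parameter values p = 0.03, q = 0.38 and m = 1000 are hypothetical, chosen only for illustration; the peak time ln(q/p)/(p+q) is the standard closed form for the time at which f(t) is largest when q > p.

```python
import math


def bass_F(t, p, q):
    """Cumulative proportion of adopters at time t."""
    e = math.exp(-(p + q) * t)
    return (1 - e) / (1 + (q / p) * e)


def bass_f(t, p, q):
    """Adoption density at time t: (p + q F(t)) (1 - F(t))."""
    F = bass_F(t, p, q)
    return (p + q * F) * (1 - F)


# Hypothetical parameters, for illustration only.
p, q, m = 0.03, 0.38, 1000
peak_time = math.log(q / p) / (p + q)  # time of peak adoption, valid for q > p

for t in (0, 2, 4, 6, 8, 10, 15):
    print(t, round(m * bass_F(t, p, q), 1), round(m * bass_f(t, p, q), 1))
print("peak adoption at t =", round(peak_time, 2))
```

The cumulative column traces the S-shaped curve described above (here q/p > 1), and the density column rises to its maximum near the peak time before declining.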