
Economic Forecasting

(Master's in Management, Year I)

Seminar notes - Semester I - 2010-2011 Daniela.hincu@man.ase.ro


Master PE 2010 - 2011 1

Contents
I. Review of regression methods
II. Regression with dummy variables
III. Logistic regression

Classification of regression methods

Multiple regression models:
- Linear: linear, dummy variable, interaction
- Nonlinear: polynomial, square root, log, reciprocal, exponential

Dummy variables (binary variables)

- Two categories: dichotomous, with values 0 or 1
- Three categories: for quality, 0, 1 or 2; for example client, potential client, competitor's client
- Four categories: quarter I, II, III or IV

For k categories, k-1 dummy variables are introduced, each taking the value 0 or 1.
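The k-1 rule above can be sketched in a few lines of Python. This is only an illustration: the function name `dummy_encode` and the choice of quarter IV as the baseline category are assumptions, not part of the seminar notes.

```python
def dummy_encode(values, baseline):
    """Return k-1 dummy (0/1) columns for a categorical variable.

    The baseline category gets no column of its own: it is the case
    in which every dummy variable equals 0.
    """
    categories = sorted(set(values))
    if baseline not in categories:
        raise ValueError("baseline must be one of the observed categories")
    kept = [c for c in categories if c != baseline]
    return {c: [1 if v == c else 0 for v in values] for c in kept}


# Hypothetical example: four quarters, so three dummy variables.
quarters = ["I", "II", "III", "IV", "I"]
columns = dummy_encode(quarters, baseline="IV")
for name, column in columns.items():
    print(name, column)
```

An observation from the baseline quarter has 0 in all three columns, which is why a fourth column would be redundant.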



Second-hand car sales results - Model 1: price = f(kilometres)

Regression Statistics (Observations 100)
Multiple R           0.8063
R Square             0.6501
Adjusted R Square    0.6466
Standard Error       151.6

ANOVA
              df    SS         MS         F        Significance F
Regression     1    4183528    4183528    182.1    0.000
Residual      98    2251362    22973
Total         99    6434890

              Coefficients    Standard Error    t Stat    P value
Intercept     6533            84.51             77.31     0.0000
kilometres    -0.0312         0.0023            -13.49    0.0000
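The summary statistics of Model 1 can be recomputed from the ANOVA sums of squares. The sketch below uses only the figures printed in the output above; the variable names are chosen here for illustration.

```python
import math

# Values taken from the Model 1 output (100 observations, 1 regressor).
n_obs = 100
n_regressors = 1
ss_regression = 4183528.0
ss_residual = 2251362.0
ss_total = ss_regression + ss_residual

# R Square is the share of total variation explained by the regression.
r_squared = ss_regression / ss_total

# Adjusted R Square penalises the number of regressors.
df_residual = n_obs - n_regressors - 1
adjusted_r_squared = 1 - (1 - r_squared) * (n_obs - 1) / df_residual

# F compares the mean square of the regression with that of the residuals.
ms_regression = ss_regression / n_regressors
ms_residual = ss_residual / df_residual
f_statistic = ms_regression / ms_residual

# The standard error of the regression is the root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(round(r_squared, 4), round(adjusted_r_squared, 4))
print(round(f_statistic, 1), round(standard_error, 1))
```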


Model 2 - multiple regression with a new variable, colour: 1 for white, 2 for silver, 3 otherwise

Regression Statistics (Observations 100)
Multiple R           0.8095
R Square             0.6552
Adjusted R Square    0.6481
Standard Error       151.2

ANOVA
              df    SS         MS         F        Significance F
Regression     2    4216263    2108132    92.17    0.000
Residual      97    2218627    22872
Total         99    6434890

              Coefficients    Standard Error    t Stat    P value
Intercept     6580            92.96             70.79     0.0000
kilometres    -0.0313         0.0023            -13.56    0.0000
colour        -21.67          18.11             -1.20     0.2345

$$I_1 = \begin{cases} 1 & \text{if colour is white} \\ 0 & \text{if colour is not white} \end{cases}$$

$$I_2 = \begin{cases} 1 & \text{if colour is silver} \\ 0 & \text{if colour is not silver} \end{cases}$$

The new model with 2 dummy variables

- $I_1 = 1,\ I_2 = 0$ indicates colour = white
- $I_1 = 0,\ I_2 = 1$ indicates colour = metallic/silver
- $I_1 = 0,\ I_2 = 0$ indicates colour neither white nor silver

$$\text{Price} = \beta_0 + \beta_1\,\text{kilometres} + \beta_3 I_1 + \beta_4 I_2$$

Regression Statistics (Observations 100)
Multiple R           0.8355
R Square             0.6980
Adjusted R Square    0.6886
Standard Error       142.3

ANOVA
              df    SS         MS         F        Significance F
Regression     3    4491749    1497250    73.97    0.000
Residual      96    1943141    20241
Total         99    6434890

              Coefficients    Standard Error    t Stat    P value
Intercept     6350            92.17             68.90     0.0000
kilometres    -0.0278         0.0024            -11.72    0.0000
I(1)          45.24           34.08             1.33      0.1876
I(2)          147.7           38.18             3.87      0.0002


Interpretation - cars that are neither white nor silver

$$\text{Price} = \beta_0 + \beta_1\,\text{kilometres} + \beta_3(0) + \beta_4(0)$$
$$\text{Price} = 6350 - 0.0278\,\text{kilometres}$$

Interpretation - white cars (not silver)

$$\text{Price} = \beta_0 + \beta_1\,\text{kilometres} + \beta_3(1) + \beta_4(0)$$
$$\text{Price} = 6350 - 0.0278\,\text{kilometres} + 45.2 = 6395.2 - 0.0278\,\text{kilometres}$$

Interpretation - metallic/silver cars (not white)

$$\text{Price} = \beta_0 + \beta_1\,\text{kilometres} + \beta_3(0) + \beta_4(1)$$
$$\text{Price} = 6350 - 0.0278\,\text{kilometres} + 148 = 6498 - 0.0278\,\text{kilometres}$$
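The three interpretations amount to one fitted equation evaluated with different dummy values. A minimal sketch follows, using the coefficients from the regression output with two dummy variables; the function name `predict_price` and the 36,000 km odometer reading are hypothetical.

```python
# Coefficients from the regression output with two dummy variables.
INTERCEPT = 6350.0
SLOPE_KM = -0.0278
COEF_WHITE = 45.24    # coefficient of I1
COEF_SILVER = 147.7   # coefficient of I2


def predict_price(kilometres, colour):
    """Predicted price for a car; colour is 'white', 'silver' or 'other'."""
    i1 = 1 if colour == "white" else 0
    i2 = 1 if colour == "silver" else 0
    return INTERCEPT + SLOPE_KM * kilometres + COEF_WHITE * i1 + COEF_SILVER * i2


# Hypothetical odometer reading, used only to compare the three colours.
for colour in ("other", "white", "silver"):
    print(colour, round(predict_price(36000, colour), 2))
```

Because the slope is shared, the three predictions always differ by the dummy coefficients alone, whatever the odometer reading.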

Application - regression with a dummy variable

A dummy variable is a binary variable that takes the value 1 or 0. It is commonly used to examine group and time effects in regression. Panel data analysis estimates the fixed effect and/or random effect models using dummy variables. The fixed effect model examines differences in intercept among groups, assuming the same slopes. By contrast, the random effect model estimates error variances of groups, assuming the same intercept and slopes. The data used here cover the top 50 information technology firms, taken from page 308 of the OECD Information Technology Outlook 2004 (http://thesius.sourceoecd.org/). The data set contains revenue, R&D budget, and net income in current USD millions.
Source: Using Dummy Variables in Regression, Park, Hun Myoung, 2002-Present. Jeeshim and KUCC625 (2005-03-26)

Model 1 - regressing R&D budget in 2002 on net income in 2000 and firm type. The dummy variable d is set to 1 for equipment and software companies and zero for other firms.


Model 2 - assuming that equipment and software firms have more R&D investment than do telecommunications and electronics companies. There may or may not be correlation (dependence) between the dummy variable (firm types) and regressors (net income).


Comparison between Model 1 and Model 2 (Fixed Group Effect)

The top green line is the regression line for equipment and software companies, while the bottom yellow line is the one for telecommunication and electronics firms in Model 2. The green and yellow lines are parallel, separated by 1,006.626, the coefficient of the dummy variable. The intercept for equipment and software firms is computed as 2140.205 = 1006.626 + 1133.579.
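The parallel-lines reading can be checked with a few lines of Python. The two intercept figures come from the text above; the slope is not reported in these notes, so it is left as a parameter, and the value passed in the example is purely hypothetical.

```python
BASE_INTERCEPT = 1133.579   # telecommunication and electronics firms (d = 0)
DUMMY_COEF = 1006.626       # shift for equipment and software firms (d = 1)


def group_intercept(d):
    """Intercept of the regression line for group d (0 or 1)."""
    return BASE_INTERCEPT + DUMMY_COEF * d


def predict_rnd(net_income, d, slope):
    """Fitted R&D budget; the common slope must be supplied by the caller."""
    return group_intercept(d) + slope * net_income


print(round(group_intercept(1), 3))

# With any common slope the vertical gap between the lines stays the same.
hypothetical_slope = 0.2
gap = predict_rnd(5000, 1, hypothetical_slope) - predict_rnd(5000, 0, hypothetical_slope)
print(round(gap, 3))
```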


Model 2 - the regression with two dummy variables: one for equipment and software firms and another d0 for telecommunication and electronics


Logistic regression models


A logistic link function is used in the regression formula, such that the regression formula can be written as:
Logit[Prob(outcome)] = a + b1x1 + b2x2 + ... + bixi.

Where: a is the intercept, b1 to bi are regression coefficients for i covariates x1 to xi, similar to other regression models. The logit indicates the natural logarithm of the odds of the probability p that the outcome occurs: log(p/(1-p)). Odds ratios can be calculated by exponentiating the coefficients: OR = exp(bi). The relationship between the probability of the outcome and the logit is a characteristic curve.

Logistic Link Function


The relationship between the probability of an outcome and the logit of the probability is a characteristic curve. The logit is calculated as: ln(probability/(1-probability)). When the logit is 0, the probability is 50%.
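The link function and its inverse are short enough to write out directly. The sketch below uses only Python's standard `math` module; the function names `logit` and `inverse_logit` are chosen here for illustration.

```python
import math


def logit(probability):
    """Natural log of the odds; defined for probabilities strictly between 0 and 1."""
    return math.log(probability / (1.0 - probability))


def inverse_logit(value):
    """Map a logit back to a probability (the S-shaped characteristic curve)."""
    return 1.0 / (1.0 + math.exp(-value))


# A logit of 0 corresponds to a probability of 50%.
print(logit(0.5), inverse_logit(0.0))

# A few points on the curve, for logits from -4 to 4.
for value in (-4, -2, 0, 2, 4):
    print(value, round(inverse_logit(value), 3))
```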


The logistic regression method

Logistic regression models the relationship between a set of independent variables xi (categorical or continuous) and a dichotomous (nominal, binary) dependent variable Y. Such a dependent variable usually arises when it represents membership in one of two classes or categories: presence/absence, yes/no, etc. The resulting regression equation, of a different type from the other regressions discussed, provides information about:
- the importance of the variables in differentiating the classes;
- the classification of an observation into a class.

Note that the scatter plot of the values gives no indication of the dependence. In such cases, classical linear regression does not provide an adequate model. We assume that the values of y (a binary variable) are coded 0/1, with the value 1 generally expressing the occurrence of a certain event, so that what is sought is an estimate of the probability of that event occurring as a function of the values of the independent variables.


The case of a single independent variable

The model is:

$$P(y=1\mid x) = \frac{e^{\alpha+\beta x}}{1+e^{\alpha+\beta x}}$$

or, equivalently,

$$\ln\frac{P(y=1\mid x)}{1-P(y=1\mid x)} = \alpha+\beta x$$

The quantity on the left-hand side is called the logit (transformation) of the probability P(y=1|x). The meaning of the expression P(y=1|x) is: the probability of obtaining the value y=1 conditional on the value x; in other words, the probability of classifying the observation x in the class y=1, or the probability that the value x is associated with the occurrence of the event y=1. P(y=1|x) is denoted by p, following the notation of the binomial probability model (probability of success). The logit transformation is needed to map the probability p from the interval (0, 1) onto the interval $(-\infty, +\infty)$, which is required when estimating the parameters.

The model is directly related to the notion of odds, denoted OR in these notes (odds ratio):

$$OR = \frac{p}{1-p}$$

which is the ratio between the probability of success and the probability of failure. The model can also be written as:

$$\frac{p}{1-p} = e^{\alpha+\beta x}$$

Hence the interpretation of the (positive) coefficient $\beta$: it is the increase in the logit when x increases by one unit; equivalently, OR is multiplied by $e^{\beta}$ when x increases by one unit.
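The multiplicative effect of a one-unit increase in x can be verified numerically. In the sketch below the values of alpha and beta are hypothetical, chosen only to illustrate the property.

```python
import math


def probability(x, alpha, beta):
    """P(y = 1 | x) under the single-variable logistic model."""
    z = alpha + beta * x
    return math.exp(z) / (1.0 + math.exp(z))


def odds(p):
    """Ratio between the probability of success and the probability of failure."""
    return p / (1.0 - p)


# Hypothetical parameter values, for illustration only.
alpha, beta = -2.0, 0.5

odds_at_2 = odds(probability(2, alpha, beta))
odds_at_3 = odds(probability(3, alpha, beta))

# Moving x from 2 to 3 multiplies the odds by exp(beta).
print(odds_at_3 / odds_at_2, math.exp(beta))
```

The same ratio is obtained for any pair of x values one unit apart, which is what makes exp(beta) a convenient summary of the effect of x.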

How to identify the constants $\beta_0$ and $\beta_1$

Input data: the data series $\{X_i, Y_i\}$, in which the variable Y is binary (the only possible values are 1 or 0).

Procedure:
- Build the logit series: $L = \beta_0 + \beta_1 X$
- Compute the probability: $p(X) = \dfrac{e^{L}}{1+e^{L}}$
- For each point, compute the value $p_i^{Y_i}(1-p_i)^{1-Y_i}$, which is largest when $Y_i = 1$ and the probability that Y=1 is 1, or when $Y_i = 0$ and the probability that Y=1 is 0.
- Maximise the product $\prod_{i=1}^{n} p_i^{Y_i}(1-p_i)^{1-Y_i}$ (the likelihood) over $\beta_0$ and $\beta_1$; equivalently, minimise its negative logarithm.
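The procedure can be sketched as a small maximum-likelihood routine. The notes do not say which optimiser is used, so Newton-Raphson is an assumption here; the data set is hypothetical and was chosen so that the two classes overlap (otherwise the maximum does not exist).

```python
import math


def sigmoid(z):
    """p = e^L / (1 + e^L), written in a numerically convenient form."""
    return 1.0 / (1.0 + math.exp(-z))


def log_likelihood(b0, b1, xs, ys):
    """Logarithm of the product of p_i^Y_i * (1 - p_i)^(1 - Y_i)."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(b0 + b1 * x)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total


def fit_logistic(xs, ys, iterations=25):
    """Maximise the likelihood over (b0, b1) with Newton-Raphson steps."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0            # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0    # negative Hessian (information matrix)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: solve the 2x2 system (information matrix) * step = gradient.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: Y tends to be 1 for larger X, with some overlap.
EXAMPLE_X = [1, 2, 3, 4, 5, 6, 7, 8]
EXAMPLE_Y = [0, 0, 1, 0, 1, 0, 1, 1]

beta_0, beta_1 = fit_logistic(EXAMPLE_X, EXAMPLE_Y)
print(beta_0, beta_1, log_likelihood(beta_0, beta_1, EXAMPLE_X, EXAMPLE_Y))
```

Working with the logarithm of the product avoids multiplying many numbers smaller than 1, and it has the same maximiser as the product itself.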


Model estimation and testing

The hypothesis $\beta = 0$ is tested with the Wald test, which corresponds to the t test in linear regression. The test statistic is

$$G = \frac{\hat\beta^{2}}{\operatorname{var}(\hat\beta)}$$

which follows a $\chi^2$ distribution with one degree of freedom.

Following the results from the analysis of the regression equation, the confidence interval for $\beta$ is $\hat\beta \pm z_{1-\alpha/2}\,SE(\hat\beta)$, which on the exponential (odds) scale becomes

$$\left( e^{\hat\beta - z_{1-\alpha/2}\,SE(\hat\beta)} \; ; \; e^{\hat\beta + z_{1-\alpha/2}\,SE(\hat\beta)} \right)$$

where $\hat\beta$ is the estimate of $\beta$ (from the estimated regression equation) and $SE(\hat\beta)$ is the standard deviation of the sampling distribution of $\hat\beta$.

Note that, for a given observation, if p > 0.5 then the observation is more likely to belong to the group characterised by y=1. This condition is equivalent to OR > 1, that is, logit > 0.
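A minimal sketch of the Wald test and the confidence interval, using only the Python standard library. The estimate 0.8 and standard error 0.3 are hypothetical values, not results from these notes; the p-value formula relies on the fact that a chi-square variable with one degree of freedom is the square of a standard normal variable.

```python
import math
from statistics import NormalDist


def wald_test(beta_hat, standard_error, confidence=0.95):
    """Wald statistic, its p-value (chi-square, 1 df) and confidence intervals."""
    g = (beta_hat / standard_error) ** 2
    # P(chi-square with 1 df > g) equals P(|Z| > sqrt(g)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(g / 2.0))
    z = NormalDist().inv_cdf(1.0 - (1.0 - confidence) / 2.0)
    lower = beta_hat - z * standard_error
    upper = beta_hat + z * standard_error
    return {
        "G": g,
        "p_value": p_value,
        "ci_beta": (lower, upper),
        "ci_exp_beta": (math.exp(lower), math.exp(upper)),
    }


# Hypothetical estimate and standard error, for illustration only.
result = wald_test(beta_hat=0.8, standard_error=0.3)
for key, value in result.items():
    print(key, value)
```

If the interval for beta excludes 0, the interval on the exponential scale excludes 1, and the two readings lead to the same conclusion.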


Model estimation and testing
The significance of the coefficients is tested with the Wald test or with the likelihood-ratio (LR) test. The Wald test is presented under the logistic model with a single factor. The LR test is based on the statistic obtained as the ratio between the maximum of the likelihood function under the null hypothesis (H0) and the maximum of the likelihood function under broader conditions. The Neyman-Pearson lemma shows that this is the most powerful test at a fixed significance level. For logistic regression, one computes the ratio between the maximum value of the likelihood function for the full model (L1) and that for the simpler model (L0). The LR statistic is -2 log(L0/L1), and it follows a $\chi^2$ distribution. The LR test is recommended when the model is built step by step, to check whether the variable removed from the model is significant, and therefore whether the model can be simplified.
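The LR statistic is easiest to compute from log-likelihoods, since -2 log(L0/L1) = -2 (log L0 - log L1). The sketch below assumes that exactly one variable is removed, so the reference distribution is chi-square with one degree of freedom; the two log-likelihood values are hypothetical.

```python
import math


def likelihood_ratio_test(loglik_reduced, loglik_full):
    """LR statistic and p-value when the reduced model drops ONE variable."""
    lr = -2.0 * (loglik_reduced - loglik_full)
    if lr < 0:
        raise ValueError("the full model cannot have a lower maximised likelihood")
    # Chi-square with 1 df: P(X > lr) = P(|Z| > sqrt(lr)) for a standard normal Z.
    p_value = math.erfc(math.sqrt(lr / 2.0))
    return lr, p_value


# Hypothetical maximised log-likelihoods of the simpler and the full model.
lr_statistic, lr_p_value = likelihood_ratio_test(loglik_reduced=-60.0, loglik_full=-55.5)
print(lr_statistic, lr_p_value)
```

A small p-value means the removed variable was significant, so the simpler model should not replace the full one.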


The case of several independent variables

The general model is

$$\ln\frac{P(y=1\mid x_1,\dots,x_p)}{1-P(y=1\mid x_1,\dots,x_p)} = \alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p$$

where the probability $P(y=1\mid x_1, x_2, \dots, x_p)$ plays the role of p from the single-variable case. The equivalent exponential form follows immediately. Interpretation of the coefficients $\beta_i$: the increase in the logit (the logarithm of OR) when $x_i$ increases by one unit (the other variables x remaining constant).

Interpreting the regression coefficients

$$P(y=1\mid x_1,\dots,x_p) = \frac{\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}{1+\exp(\alpha + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_p x_p)}$$

For the coefficient $\beta_i$ one obtains:

$$\exp(\beta_i) = \frac{OR_{x_i=1,\;x_j=0,\;j\neq i}}{OR_{\text{base}}} = \frac{P(y=1\mid x_i=1,\;x_j=0,\;j\neq i)}{1-P(y=1\mid x_i=1,\;x_j=0,\;j\neq i)} \cdot \frac{1}{OR_{\text{base}}}$$

where $OR_{\text{base}}$ is the value of OR when all $x_j = 0$.

From the multiplicative character of the logistic model, with $\beta_0 = \alpha$ denoting the intercept and all factors present ($x_i = 1$), one arrives at:

$$OR_{x_1,x_2,\dots,x_p} = \exp(\beta_0)\,\exp(\beta_1)\cdots\exp(\beta_p)$$

This leads to the useful interpretation that each $\beta_i$ expresses the contribution of the factor $x_i$ to explaining the probability (in the form of OR) of the event y = 1. Thus, fixing $x_i = 1$, $\exp(\beta_i)$ is the constant multiplicative factor, regardless of the values of the other independent variables.
If $\beta_i = 0$, the corresponding factor has no effect (multiplication by 1). If $\beta_i < 0$, the presence of the factor reduces the probability of the event y = 1, and if $\beta_i > 0$ this probability increases.

Logistic regression - example from medical statistics (source: Regresia logistica, M. Gorunescu)
Dependent variable: presence of hypertension. Independent variables: the subject smokes (value 1 for smoking), the subject is obese (value 1 for obesity), the subject is over 40 years old (value 1 for age).
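With three binary factors the multiplicative reading of the model is easy to trace in code. The fitted coefficients of this example are not reproduced in these notes, so every value in `COEFFICIENTS` below is hypothetical, as are the names used for the factors.

```python
import math

# HYPOTHETICAL coefficients, for illustration only (not the fitted values).
COEFFICIENTS = {"intercept": -2.0, "smoker": 0.7, "obese": 1.1, "over_40": 0.9}


def odds_of_hypertension(profile):
    """OR = exp(b0) * product of exp(b_i) over the factors that are present."""
    result = math.exp(COEFFICIENTS["intercept"])
    for factor, present in profile.items():
        # A factor that is absent (0) multiplies the odds by exp(0) = 1.
        result *= math.exp(COEFFICIENTS[factor] * present)
    return result


def probability_of_hypertension(profile):
    """Convert the odds back to a probability."""
    o = odds_of_hypertension(profile)
    return o / (1.0 + o)


baseline = {"smoker": 0, "obese": 0, "over_40": 0}
smoker_only = {"smoker": 1, "obese": 0, "over_40": 0}
all_factors = {"smoker": 1, "obese": 1, "over_40": 1}

print(odds_of_hypertension(smoker_only) / odds_of_hypertension(baseline))
print(probability_of_hypertension(baseline), probability_of_hypertension(all_factors))
```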


Interpreting the coefficients

Diffusion of UK residential telephones, in "Modelling and forecasting the diffusion of innovation - A 25-year review", International Journal of Forecasting 22 (2006) 519-545

The diffusion of a single innovation in a single market


Bass (1969) suggests that individuals are influenced by a desire to innovate (coefficient of innovation p) and by a need to imitate others in the population (coefficient of imitation q). The probability that a potential adopter adopts at time t is driven by (p + qF(t)), where F(t) is the proportion of adopters at time t. Relating the similarity of innovation diffusion with the spreading of an epidemic, imitation is often called a contagion effect. In a pure innovation scenario (p > 0, q = 0), diffusion follows a modified exponential; in a pure imitation scenario (p = 0, q > 0), diffusion follows a logistic curve. Other properties are that (p + q) controls scale and (q/p) controls shape (the condition (q/p) > 1 is necessary for the curve to be S-shaped).
Since the diffusion of an innovation is a complex process, involving large numbers of individual decisions, the diffusion of any one innovation will be due to elements of both extreme hypotheses. Van den Bulte and Stremersch (2004) performed a meta-analysis on the use of the Bass model applied to new product diffusion. The study involved 746 different Bass estimations spread over 75 consumer durables and 77 countries. The international comparison enabled them to test several sets of hypotheses, relating the diffusion to both the national culture and the nature of the product. The contagion-based hypotheses for which they found support are that (q/p) ratios are:
- negatively associated with individualism (individualism means more immunity to social contagion) or positively associated with collectivism;
- positively associated with power-distance (a measure of the hierarchical nature of the culture); the assumption here is that classes tend to adopt a new product at a similar time;
- positively associated with masculinity (cultures where there is a clear distinction between gender roles).

The Bass model (1969) considers a population of m individuals who are both innovators (those with a constant propensity to purchase, p) and imitators (those whose propensity to purchase is influenced by the amount of previous purchasing, q). The probability density function for a potential adopter making an adoption at time t is:

$$f(t) = \bigl(p + q\,F(t)\bigr)\bigl(1 - F(t)\bigr)$$

The corresponding cumulative distribution function is:

$$F(t) = \frac{1 - \exp\bigl(-(p+q)\,t\bigr)}{1 + \dfrac{q}{p}\,\exp\bigl(-(p+q)\,t\bigr)}$$
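The two Bass formulas can be evaluated directly. In the sketch below the function names and the parameter values p = 0.03 and q = 0.38 are hypothetical, picked only so that q/p > 1 and the curve is S-shaped; they are not estimates from the paper cited above.

```python
import math


def bass_cdf(t, p, q):
    """Cumulative proportion of adopters F(t); requires p > 0."""
    decay = math.exp(-(p + q) * t)
    return (1.0 - decay) / (1.0 + (q / p) * decay)


def bass_pdf(t, p, q):
    """Adoption density f(t) = (p + q F(t)) (1 - F(t))."""
    big_f = bass_cdf(t, p, q)
    return (p + q * big_f) * (1.0 - big_f)


# Hypothetical coefficients of innovation and imitation.
P_INNOVATION = 0.03
Q_IMITATION = 0.38

for year in (0, 5, 10, 20):
    print(year,
          round(bass_cdf(year, P_INNOVATION, Q_IMITATION), 3),
          round(bass_pdf(year, P_INNOVATION, Q_IMITATION), 4))
```

Multiplying F(t) by the market size m gives the cumulative number of adopters at time t.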