
Sep 28, 2013

© Attribution Non-Commercial (BY-NC)


Cat Var Stat


ERSH 8320

Slide 1 of 34

Today's Lecture

Regression with a single categorical independent variable. Coding procedures for analysis.


Dummy coding.

Relationship between categorical independent variable regression and other statistical terms.

Slide 2 of 34

Overview: Categorical Variables (Regression Basics; Not A Good Idea; Categorical Variables; Research Design; Analysis Specs; Example; Example Analysis); Variable Coding; Dummy Coding; Multiple Categories (> 2); Wrapping Up

Linear regression regresses a continuous-valued dependent variable, Y, onto a set of continuous-valued independent variables X. The regression line gives the estimate of the mean of Y conditional on the values of X, or E(Y|X). But what happens when some or all independent variables are categorical in nature? Is the point of the regression to determine E(Y|X), across the levels of Y? Can't we just put the categorical variables into SPSS and push the "Continue" button?

Slide 3 of 34


The Kenton Food Company wished to test four different package designs for a new breakfast cereal. Twenty stores, with approximately equal sales volumes, were selected as the experimental units. Each store was randomly assigned one of the package designs, with each package design assigned to five stores. The stores were chosen to be comparable in location and sales volume. Other relevant conditions that could affect sales, such as price, amount and location of shelf space, and special promotional efforts, were kept the same for all of the stores in the experiment.

Slide 4 of 34

Cereal


Slide 5 of 34

A Regular Regression?

[Figure: scatterplot of the cereal data against Package Type (1.00 to 4.00); vertical axis from 10.00 to 30.00.]

Slide 6 of 34

Categorical Variables

Categorical variables commonly occur in research settings. Another term sometimes used for categorical variables is qualitative variables. A strict definition of a qualitative or categorical variable is a variable that has a finite number of levels. Continuous (or quantitative) variables, alternatively, have infinitely many levels.


Often this is assumed more than practiced. Quantitative variables often have countably many levels. Level of precision of an instrument can limit the number of levels of a quantitative variable.


Slide 7 of 34

Research Design


Prediction. Explanation.

Slide 8 of 34

Analysis Specifics

Because of the nature of categorical variables, the emphasis of the regression is not on linear trends but on differences between the means (of Y) at each level of the category.

Not all categorical variables are ordered (like cereal box type, gender, etc.).

When considering differences in the mean of the dependent variable, the type of analysis being conducted by a regression is commonly called an ANalysis Of VAriance (ANOVA). A combination of categorical and continuous variables in the same regression is called ANalysis Of CoVAriance (ANCOVA; Chapters 14 and 15).

Slide 9 of 34

From Pedhazur (1997; p. 343): Assume that the data reported [below] were obtained in an experiment in which E represents an experimental group and C represents a control group.

| | E | C |
|---|---|---|
| | 20 | 10 |
| | 18 | 12 |
| | 17 | 11 |
| | 17 | 15 |
| | 13 | 17 |
| $\sum Y$ | 85 | 65 |
| $\bar{Y}$ | 17 | 13 |
| $\sum (Y - \bar{Y})^2 = \sum y^2$ | 26 | 34 |

Slide 10 of 34

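As you may recall from an earlier course on statistics, an easy way to determine if the means of the two conditions differ significantly is to use a t-test with $n_1 + n_2 - 2$ degrees of freedom.

$$H_0: \mu_1 = \mu_2 \qquad H_A: \mu_1 \neq \mu_2$$

$$t = \frac{\bar{Y}_1 - \bar{Y}_2}{\sqrt{\dfrac{\sum y_1^2 + \sum y_2^2}{n_1 + n_2 - 2}\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$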

Slide 11 of 34

$$t = \frac{17 - 13}{\sqrt{\dfrac{26 + 34}{5 + 5 - 2}\left(\dfrac{1}{5} + \dfrac{1}{5}\right)}} = \frac{4}{\sqrt{3}} = 2.31$$

From Excel (=tdist(2.31,8,2)), p = 0.0496. If we used a Type-I error rate of 0.05, we would reject the null hypothesis and conclude that the means of the two groups were significantly different. But what if we had more than two groups? This type of problem can be solved equivalently from within the context of the General Linear Model.
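The pooled t-test above can be reproduced in a few lines. The following is a minimal sketch using only the Python standard library; the function names (`pooled_t`, `t_pdf`, `two_tailed_p`) are illustrative, and the p-value is approximated by numerically integrating the t density with Simpson's rule rather than by calling Excel's `tdist`.

```python
import math

def pooled_t(group1, group2):
    """Return (t, df) for the pooled-variance two-sample t-test."""
    n1, n2 = len(group1), len(group2)
    mean1, mean2 = sum(group1) / n1, sum(group2) / n2
    ss1 = sum((y - mean1) ** 2 for y in group1)  # sum of squared deviations, group 1
    ss2 = sum((y - mean2) ** 2 for y in group2)  # sum of squared deviations, group 2
    df = n1 + n2 - 2
    pooled_var = (ss1 + ss2) / df
    t = (mean1 - mean2) / math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t, df

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)

def two_tailed_p(t, df, steps=2000):
    """Approximate two-tailed p-value: 1 - 2 * integral of the density over [0, |t|]."""
    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * (total * h / 3)

E = [20, 18, 17, 17, 13]
C = [10, 12, 11, 15, 17]
t, df = pooled_t(E, C)
p = two_tailed_p(t, df)
print(round(t, 2), df)  # 2.31 8
print(p)                # just under 0.05
```

The unrounded t is 4/√3, so the p-value printed here can differ in the fourth decimal place from the one Excel returns for the rounded input 2.31.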

Slide 12 of 34

Variable Coding


When using categorical variables in regression, levels of the categories must be recoded from their original value to ensure the regression model truly estimates the mean differences at levels of the categories. Several types of coding strategies are common:


Each type will produce the same fit of the model (via $R^2$). The estimated regression parameters differ across coding types, reflecting the different approach taken by each type of coding. The choice of coding method does not differ as a function of the type of research or the purpose (explanation or prediction) of the analysis.

Slide 13 of 34

Variable Coding

Definition: a code is a set of symbols to which meanings can be assigned (Pedhazur, 1997; p. 342). The assignment of symbols follows a rule (or set of rules) determined by the categories of the variable used.

All entities assigned the same symbol are considered alike (or homogeneous) within that category level. Categorical levels must be determined prior to analysis.

Some variables are obviously categorical (gender). Some variables are not so obviously categorical (political affiliation).

Slide 14 of 34

Dummy Coding

The most straightforward method of coding categorical variables is dummy coding. In dummy coding, one creates a set of column vectors that represent the membership of an observation in a given category level. If an observation is a member of a specific category level, it is given a value of 1 in that category level's column vector. If an observation is not a member of a specific category level, it is given a value of 0 in that category level's column vector.


Slide 15 of 34

Dummy Coding

For each observation, no more than a single 1 will appear in the set of column vectors for that variable. The column vectors represent the predictor variables in a regression analysis, where the dependent variable is modeled as a function of these columns.


Because of linear dependence with an intercept, one category-level vector is often excluded from the analysis.

Because all observations at a given category level have the same value across the set of predictors, the predicted value of the dependent variable, Y , will be identical for all observations within a category. The set of category vectors (and a vector for an intercept) are now used as input into a regression model.
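As a concrete illustration of the coding rule just described, here is a minimal sketch that builds one 0/1 column vector per category level. The helper name `dummy_code` and the group labels are illustrative choices, not part of the lecture.

```python
def dummy_code(labels):
    """Return {level: 0/1 column vector} for a list of category labels."""
    levels = sorted(set(labels))
    return {level: [1 if lab == level else 0 for lab in labels] for level in levels}

# Five observations in the experimental group, then five in the control group.
groups = ["E"] * 5 + ["C"] * 5
columns = dummy_code(groups)
print(columns["E"])  # [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
print(columns["C"])  # [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

# With one column per level, every observation has exactly one 1 in its row.
row_sums = [sum(col[i] for col in columns.values()) for i in range(len(groups))]
print(row_sums)      # [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
```

The row sums of 1 are what make the full set of dummy columns linearly dependent with an intercept column of ones.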

Slide 16 of 34

| Y | X1 | X2 | X3 |
|---|---|---|---|
| 20 | 1 | 1 | 0 |
| 18 | 1 | 1 | 0 |
| 17 | 1 | 1 | 0 |
| 17 | 1 | 1 | 0 |
| 13 | 1 | 1 | 0 |
| 10 | 1 | 0 | 1 |
| 12 | 1 | 0 | 1 |
| 11 | 1 | 0 | 1 |
| 15 | 1 | 0 | 1 |
| 17 | 1 | 0 | 1 |

Y: Mean = 15, SS = 100. X1: Mean = 1, SS = 0. $\sum y x_2 = 10$.

Slide 17 of 34

The General Linear Model states that the estimated regression parameters are given by:

$$\mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y}$$

From the previous slide, you can see what our entries for X could be, but...

Notice that $X_1 = X_2 + X_3$.

$\mathbf{X}'\mathbf{X}$ is a singular matrix: no inverse exists. Any combination of two of the columns would rid us of the linear dependency.
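The singularity can be checked directly. This sketch, with illustrative helper names `cross_product` and `det3`, forms X'X for the design with an intercept column plus both group indicators and shows that its determinant is zero, while dropping the intercept column gives an invertible matrix.

```python
def cross_product(X):
    """Return X'X for a design matrix given as a list of rows."""
    k = len(X[0])
    return [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]

def det3(m):
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    (a, b, c), (d, e, f), (g, h, i) = m
    return a * (e * i - f * h) - b * (d * i - f * g) + c * (d * h - e * g)

x2 = [1] * 5 + [0] * 5  # indicator for group E
x3 = [0] * 5 + [1] * 5  # indicator for group C

# Intercept column plus both indicators: linearly dependent columns.
xtx_full = cross_product([[1, e, c] for e, c in zip(x2, x3)])
print(xtx_full)        # [[10, 5, 5], [5, 5, 0], [5, 0, 5]]
print(det3(xtx_full))  # 0

# Dropping the intercept leaves two independent columns.
xtx_reduced = cross_product([[e, c] for e, c in zip(x2, x3)])
det2 = xtx_reduced[0][0] * xtx_reduced[1][1] - xtx_reduced[0][1] * xtx_reduced[1][0]
print(det2)            # 25
```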

Slide 18 of 34

$$R^2 = \frac{40}{100} = 0.4$$

Slide 19 of 34

b2 = 17 is the mean for the E category. b3 = 13 is the mean for the C category. Without an intercept, the model is fairly easy to interpret. For more advanced models, an intercept will prove to be helpful in interpretation.


Slide 20 of 34

$$R^2 = \frac{40}{100} = 0.4$$

Slide 21 of 34

$a = 13$ is the mean for the C category. $b_2 = 4$ is the mean difference between the E category and the C category. The C category is called the reference category. For members of the C category:

$$\hat{Y} = a + b_2 X_2 = 13 + 4(0) = 13$$

With the intercept, the model parameters are now different from the first example. The fit of the model, however, is the same.

Slide 22 of 34

$$R^2 = \frac{40}{100} = 0.4$$

Slide 23 of 34

$a = 17$ is the mean for the E category. $b_3 = -4$ is the mean difference between the C category and the E category. The E category is called the reference category. For members of the E category:

$$\hat{Y} = a + b_3 X_3 = 17 - 4(0) = 17$$

With the intercept, the model parameters are now different from the first example. The fit of the model, however, is the same.
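To see the three codings side by side, the sketch below solves the normal equations $(\mathbf{X}'\mathbf{X})\mathbf{b} = \mathbf{X}'\mathbf{y}$ for each design using plain Python. The helpers `solve` and `ols` are illustrative stand-ins for a statistics package, and $R^2$ is computed here as one minus the residual sum of squares over the total sum of squares about the mean of Y.

```python
def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [v - factor * p for v, p in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Return (coefficients, R^2) from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    b = solve(xtx, xty)
    fitted = [sum(bi * xi for bi, xi in zip(b, r)) for r in X]
    mean_y = sum(y) / len(y)
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    return b, 1 - ss_resid / ss_total

y = [20, 18, 17, 17, 13, 10, 12, 11, 15, 17]
x2 = [1] * 5 + [0] * 5  # indicator for group E
x3 = [0] * 5 + [1] * 5  # indicator for group C

designs = {
    "no intercept (X2, X3)": [[e, c] for e, c in zip(x2, x3)],
    "intercept + X2": [[1, e] for e in x2],
    "intercept + X3": [[1, c] for c in x3],
}
results = {}
for name, X in designs.items():
    b, r2 = ols(X, y)
    results[name] = ([round(v, 2) for v in b], round(r2, 2))
    print(name, results[name])
# no intercept (X2, X3) ([17.0, 13.0], 0.4)
# intercept + X2 ([13.0, 4.0], 0.4)
# intercept + X3 ([17.0, -4.0], 0.4)
```

The coefficients change with the coding, but all three designs span the same column space, which is why the fitted values and $R^2$ agree.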

Slide 24 of 34

Because each model had the same value for $R^2$ and the same number of degrees of freedom for the regression (1), all hypothesis tests of the model parameters will result in the same value of the test statistic.

$$F = \frac{R^2/k}{(1 - R^2)/(N - k - 1)} = \frac{0.4/1}{(1 - 0.4)/(10 - 1 - 1)} = 5.33$$

From Excel (=fdist(5.33,1,8)), p = 0.0496. If we used a Type-I error rate of 0.05, we would reject the null hypothesis and conclude that the regression coefficient for each analysis is significantly different from zero.

Slide 25 of 34

Recall from the t-test of the mean difference, t = 2.31. For the test of the coefficient, notice that $F = t^2$. Also notice that the p-values for each hypothesis test were the same, p = 0.0496. The test of the regression coefficient is equivalent to running a t-test when using a single categorical variable with two categories.
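The equality $F = t^2$ can be verified numerically from the quantities already computed. In this short sketch the function name `f_from_r2` is illustrative; the inputs ($R^2 = 0.4$, $k = 1$, $N = 10$, and $t = 4/\sqrt{3}$) all come from the two-group example.

```python
import math

def f_from_r2(r2, k, n):
    """F statistic for the regression: (R^2 / k) / ((1 - R^2) / (n - k - 1))."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))

f_stat = f_from_r2(0.4, 1, 10)
t_stat = 4 / math.sqrt(3)  # pooled t from the two-group example
print(round(f_stat, 2), round(t_stat ** 2, 2))  # 5.33 5.33
```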

Slide 26 of 34

Generalizing the concept of dummy coding, we revisit our first example data set, the cereal experiment data. Recall that there were four different types of cereal boxes. A dummy coding scheme would involve creation of four new column vectors, each representing observations from each box type. Just as was the case with two categories, a linear dependency is created if we wanted to use all four variables. Therefore, we must choose which category to remove from the analysis.


Slide 27 of 34


Just as was the case for the example with two categories, a multiple category regression model with a single categorical independent variable has a direct link to a statistical test you may be familiar with. The regression model tests for mean differences across all pairings of category levels simultaneously. Testing for a difference between multiple groups (> 2) equates to a one-way ANOVA model (for a model with a single categorical independent variable).

Slide 28 of 34

| Y | X1 | X2 | X3 | X4 | X5 | Type |
|---|---|---|---|---|---|---|
| 11 | 1 | 1 | 0 | 0 | 0 | 1 |
| 17 | 1 | 1 | 0 | 0 | 0 | 1 |
| 16 | 1 | 1 | 0 | 0 | 0 | 1 |
| 14 | 1 | 1 | 0 | 0 | 0 | 1 |
| 15 | 1 | 1 | 0 | 0 | 0 | 1 |
| 12 | 1 | 0 | 1 | 0 | 0 | 2 |
| 10 | 1 | 0 | 1 | 0 | 0 | 2 |
| 15 | 1 | 0 | 1 | 0 | 0 | 2 |
| 19 | 1 | 0 | 1 | 0 | 0 | 2 |
| 11 | 1 | 0 | 1 | 0 | 0 | 2 |
| 23 | 1 | 0 | 0 | 1 | 0 | 3 |
| 20 | 1 | 0 | 0 | 1 | 0 | 3 |
| 18 | 1 | 0 | 0 | 1 | 0 | 3 |
| 17 | 1 | 0 | 0 | 1 | 0 | 3 |
| 19 | 1 | 0 | 0 | 1 | 0 | 3 |
| 27 | 1 | 0 | 0 | 0 | 1 | 4 |
| 33 | 1 | 0 | 0 | 0 | 1 | 4 |
| 22 | 1 | 0 | 0 | 0 | 1 | 4 |
| 26 | 1 | 0 | 0 | 0 | 1 | 4 |
| 28 | 1 | 0 | 0 | 0 | 1 | 4 |

Because $X_5$ (representing box type four) was omitted from our model, the estimated intercept parameter now represents the mean for box type four. All other parameters represent the difference between their respective category level and category level four with respect to the dependent variable.

$a = 27.2$

$b_2 = -12.6$

$b_3 = -13.8$

$b_4 = -7.8$

Slide 29 of 34

Therefore:

$$\bar{Y}_C = \hat{Y}_C = a + b_2(0) + b_3(0) + b_4(1) = 27.2 - 7.8 = 19.4$$

$$\bar{Y}_D = \hat{Y}_D = a + b_2(0) + b_3(0) + b_4(0) = 27.2$$
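With a single dummy-coded factor, the least-squares intercept equals the reference-group mean and each slope equals that group's mean minus the reference mean. The sketch below uses that shortcut instead of fitting a regression, so it is a check on the interpretation rather than a regression routine; the dictionary keys 1 to 4 are the box types from the data table, and the variable names are illustrative.

```python
sales = {
    1: [11, 17, 16, 14, 15],
    2: [12, 10, 15, 19, 11],
    3: [23, 20, 18, 17, 19],
    4: [27, 33, 22, 26, 28],
}
means = {box: sum(values) / len(values) for box, values in sales.items()}

reference = 4                 # box type whose dummy column (X5) is omitted
a = means[reference]          # intercept = mean of the reference group
b = {box: means[box] - a for box in sales if box != reference}

# Predicted value for each box type: intercept plus that type's coefficient.
predicted = {box: a + b.get(box, 0.0) for box in sales}

print(round(a, 1))                                    # 27.2
print({box: round(v, 1) for box, v in b.items()})     # {1: -12.6, 2: -13.8, 3: -7.8}
print({box: round(v, 1) for box, v in predicted.items()})
# {1: 14.6, 2: 13.4, 3: 19.4, 4: 27.2}
```

Here the coefficients for box types 1, 2, and 3 correspond to $b_2$, $b_3$, and $b_4$ in the notation of the slides.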

Slide 30 of 34

Hypothesis Test

To test that all means are equal to each other ($H_0: \mu_1 = \mu_2 = \ldots = \mu_k$) against the hypothesis that at least one mean differs ($H_1$: at least one $\mu_i \neq \mu_j$), called an omnibus test, the same hypothesis test from before can be used:

$$F = \frac{R^2/k}{(1 - R^2)/(N - k - 1)}$$

In the two-group example this was

$$F = \frac{0.4/1}{(1 - 0.4)/(10 - 1 - 1)} = 5.33$$

Slide 31 of 34

Hypothesis Tests

From Excel (=fdist(28.77,3,16)), p = 0.000001. If we used a Type-I error rate of 0.05, we would reject the null hypothesis and conclude that at least one regression coefficient for this analysis is significantly different from zero. A regression coefficient of zero means zero difference between two means (the reference category and the specific category being compared). All regression coefficients being zero means no difference between any of the means.

Slide 32 of 34

Final Thought

Regression with categorical variables can be accomplished by coding schemes. Differing ways of coding (or inclusion of certain coded column vectors) may change the interpretation of the model parameters, but will not change the overall fit of the model. But what can be said for comparing pairwise mean differences in a simultaneous regression model? I will leave you in suspense...

Slide 33 of 34

Next Time


Slide 34 of 34
