
Lecture 1.

Introduction
Outline for Today

1. Types of Variables

2. Categorical Data

3. SA3202 Data Sets

4. Categorical Data Analysis

11/21/2019 SA3202, Lecture 1 1


Types of Variables:
a). Binary Variable: takes only two possible values
e.g. Sex

e.g. Marital Status

b). Nominal Variable: takes several unordered values


e.g. Nationality

e.g. Race

c). Ordinal Variable: takes several ordered values


e.g. Grade

e.g. Social class

e.g. Political view



d). Discrete Variable: takes a countable number of possible values
e.g. # of students in a class

e.g. # of fish in a pond

e). Continuous Variable: takes any value in an interval


e.g. Height

e.g. Income

Categorical Variable: takes a finite number of possible values; this includes types a)-c), and
possibly d). The possible values of a categorical variable are referred to as its categories or levels.
e.g. Type of blood

e.g. Grade of a class



Categorical Data: data collected on one or several categorical variables. Other terms:
count data, frequency data, discrete data, qualitative data, cross-classified data.

Contingency Table: A table presenting the categorical data


Cells: correspond to the different combinations of the categories (levels)
Entry in a cell: the frequency of that cell
Dimension: the number of categorical variables

e.g. One-way Table

e.g. Two-way Table
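As a minimal sketch of how raw categorical observations become a contingency table, the Python snippet below counts cell frequencies with the standard library. The records are made up purely for illustration (they are not one of the SA3202 data sets), and the variable names are assumptions.

```python
from collections import Counter

# Hypothetical raw records, invented for illustration: one (sex, grade) pair per student.
records = [
    ("F", "A"), ("M", "B"), ("F", "B"), ("M", "A"),
    ("F", "A"), ("M", "C"), ("F", "B"), ("M", "B"),
]

# One-way table: frequencies of a single categorical variable (grade).
one_way = Counter(grade for _, grade in records)

# Two-way table: one cell per (sex, grade) combination; the entry is its frequency.
two_way = Counter(records)

sexes = sorted({sex for sex, _ in records})
grades = sorted({grade for _, grade in records})

# Print the two-way table with one row per sex and one column per grade.
print("Sex", *grades)
for sex in sexes:
    print(sex, *(two_way[(sex, grade)] for grade in grades))
```

The dimension of the table equals the number of variables used as the counting key: one variable gives a one-way table, a pair gives a two-way table.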



SA3202 Data Sets
1. Random Number Data The following table shows the frequency of each digit when 100 “random
digits” were generated on a pocket calculator:

Digit 1 2 3 4 5 6 7 8 9 0 Total
Frequency 7 8 8 15 13 11 12 8 5 13 100

2. Suicide Data The following table shows the classification of suicides in France by day of the week.
Based on these data, Durkheim (1897) concludes that suicide diminishes at the end of the week, beginning
on Friday. He also notes that the suicide rate is not lower on Sunday than on Saturday.

Day Monday Tuesday Wednesday Thursday Friday Saturday Sunday Total


# of Suicides 1001 1035 982 1033 905 737 894 6587

3. Homicide Data The following table shows the monthly distribution of homicides in the USA in
1970.

Month Jan. Feb. Mar. Apr. May Jun. Jul. Aug. Sept. Oct. Nov. Dec. Total
# of Homicides 1318 1229 1327 1257 1424 1399 1475 1559 1417 1507 1400 1534 16848



4. Political Views Data The following table shows the political views of 1397 Americans in 1975.

Response Code Frequency


Extremely Liberal 1 46
Liberal 2 179
Slightly Liberal 3 196
Moderate 4 559
Slightly Conservative 5 232
Conservative 6 150
Extremely Conservative 7 35
Total 1397

5. Number of Boys Data The following table shows the number of boys among the first 4 children in
3343 Swedish families of size 4 or more.

Number of boys 0 1 2 3 4 Total


Frequency 183 789 1250 875 246 3343

6. World Cup Data The following table shows the number of goals scored per team per game, for the 32
matches played in the 1996 Football World Cup.

Number of Goals 0 1 2 3 4 5 6+ Total


Frequency 18 20 15 7 2 2 0 64



7. Vitamin C Data The following table is based on a 1961 French study regarding the therapeutic value of
ascorbic acid (Vitamin C). The study was double blind, with one group of 140 skiers receiving a placebo
while a second group of 139 received 1 gram of ascorbic acid per day. Of interest is the relative
occurrence of colds in the two groups.

Placebo Vitamin C
Cold 31 17
Not Cold 109 122
Total 140 139
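One simple way to look at the relative occurrence of colds is to compare the proportion of skiers who caught a cold in each group. The sketch below does that arithmetic in Python using the counts from the table; the choice of summaries (difference and ratio of proportions) is an assumption for illustration, not necessarily the analysis used in the course.

```python
# Counts taken from the Vitamin C table.
cold = {"Placebo": 31, "Vitamin C": 17}
total = {"Placebo": 140, "Vitamin C": 139}

# Proportion of skiers with a cold in each group.
prop = {group: cold[group] / total[group] for group in cold}

# Two simple comparisons of the groups (illustrative choices).
difference = prop["Placebo"] - prop["Vitamin C"]
ratio = prop["Placebo"] / prop["Vitamin C"]

for group, p in prop.items():
    print(f"{group}: {p:.3f}")
print(f"difference = {difference:.3f}, ratio = {ratio:.2f}")
```

About 22% of the placebo group and about 12% of the Vitamin C group caught a cold. Whether that gap is larger than chance alone would produce is a question for a formal test.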

8. Seat Belt Data The following table is based on the records of accidents in Florida, USA, in 1988.

Safety Equipment
Injury Seat Belt None Total
Fatal 510 1601 2111
Nonfatal 412368 162527 574835
Total 412818 164128 576946

9. Death Penalty Data The following table is based on a study concerning the effects of racial
characteristics on whether individuals convicted of homicide receive the death penalty. It shows
the defendant’s race (white, black) and the verdict (death penalty, no death penalty) in 326 cases of
homicide in Florida, USA, during 1976-1977.

Defendant’s Race
Death Penalty White Black Total
Yes 19 17 36
No 141 149 290
Total 160 166 326
10. University Admission Data The following table shows admission results for the six largest graduate
departments at the University of California at Berkeley, for the fall 1973 session.
Applicant’s Gender
Whether Admitted Male Female
Yes 1198 557
No 1493 1278
Total 2691 1835

11. Smoking and Lung Cancer Data The following table is based on a retrospective study of lung
cancer and tobacco smoking among patients in hospitals in several English cities. The table compares
male lung cancer patients with control patients having other diseases, according to the average number of
cigarettes smoked daily over the ten-year period preceding the onset of the disease.

Daily Average # of Cigarettes Lung Cancer Patients Control Patients


None 7 61
<5 55 129
5-14 489 570
15-24 475 431
25-49 293 154
50+ 38 12

12. Smoking Habit Data The following table is from a study concerning the smoking habits of high school
students in Arizona, USA.
Student Smokes Student Does Not Smoke
Both parents smoke 400 1380
One parent smokes 416 1823
Neither parent smokes 188 1168



13. Income and Job Satisfaction Data The following table is taken from the 1984 General Social Survey of
the National Data Program in the US. The variables are income and job satisfaction. Income has four
levels: <$6000, between $6000 and $15000, between $15000 and $25000, and over $25000. Job
satisfaction has four levels: very dissatisfied (VD), little dissatisfied (LD), moderately satisfied (MS),
and very satisfied (VS):
Job Satisfaction
Income VD LD MS VS
<6000 20 24 80 82
6000-15000 22 38 104 125
15000-25000 13 28 81 113
>25000 7 18 54 92

14. British Social Mobility Data The following table relates father’s and son’s occupational status for
a sample of 3500 British father-son pairs.
Son’s Status
Father’s Status 1 2 3 4 5
1 50 45 8 18 8
2 28 174 84 154 55
3 11 78 110 223 96
4 14 150 185 714 447
5 3 42 72 320 411



15. Danish Social Mobility Data The following table presents data on intergenerational mobility in
Denmark, similar to the British Social Mobility Data:
Son’s Status
Father’s Status 1 2 3 4 5
1 18 17 16 4 2
2 24 105 109 59 21
3 23 84 289 217 95
4 8 49 175 348 198
5 6 8 69 201 246
16. The “Complete” Death Penalty Data. The following table gives the victim’s race (black, white), as
well as the defendant’s race and the verdict, for the Death Penalty Data presented earlier.

Defendant’s Race Victim’s Race Death Penalty Not Death Penalty


White White 19 132
Black 0 9
Black White 11 52
Black 6 97
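The three-way table above and the earlier two-way Death Penalty table are linked by collapsing (summing) over the victim's race. The following Python sketch performs that collapse with the counts from the table; the dictionary layout and names are illustrative assumptions.

```python
# (defendant's race, victim's race) -> (death penalty, no death penalty)
three_way = {
    ("White", "White"): (19, 132),
    ("White", "Black"): (0, 9),
    ("Black", "White"): (11, 52),
    ("Black", "Black"): (6, 97),
}

# Collapse over victim's race: add up the counts for each defendant's race.
by_defendant = {}
for (defendant, _victim), (yes, no) in three_way.items():
    prev_yes, prev_no = by_defendant.get(defendant, (0, 0))
    by_defendant[defendant] = (prev_yes + yes, prev_no + no)

for defendant, (yes, no) in by_defendant.items():
    print(f"{defendant}: death penalty {yes}, no death penalty {no}")
```

The collapsed counts (19 and 141 for white defendants, 17 and 149 for black defendants) match the two-way Death Penalty Data.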
17. The “Complete” University Admission Data. The following table gives the admission decisions for
each of the six largest graduate departments for the University Admission Data.
Whether Admitted: Male (Yes, No), Female (Yes, No)
Department Yes No Yes No
A 512 313 89 19
B 353 207 17 8
C 120 205 202 391
D 138 279 131 244
E 53 138 94 299
F 22 351 24 317
Total 1198 1493 557 1278
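To see how the department-level table relates to the aggregated University Admission Data, one can compute admission rates both per department and overall. The Python sketch below does this with the counts from the table; the helper `rate` and the data layout are assumptions made for illustration.

```python
# Department -> (male admitted, male rejected, female admitted, female rejected)
admissions = {
    "A": (512, 313, 89, 19),
    "B": (353, 207, 17, 8),
    "C": (120, 205, 202, 391),
    "D": (138, 279, 131, 244),
    "E": (53, 138, 94, 299),
    "F": (22, 351, 24, 317),
}

def rate(yes, no):
    """Proportion admitted among yes + no applicants."""
    return yes / (yes + no)

# Admission rate within each department, stored as (male rate, female rate).
by_dept = {
    dept: (rate(my, mn), rate(fy, fn))
    for dept, (my, mn, fy, fn) in admissions.items()
}

# Aggregated rates: sum the counts over departments first, then divide.
male_yes = sum(v[0] for v in admissions.values())
male_no = sum(v[1] for v in admissions.values())
female_yes = sum(v[2] for v in admissions.values())
female_no = sum(v[3] for v in admissions.values())
overall = (rate(male_yes, male_no), rate(female_yes, female_no))

for dept, (m, f) in by_dept.items():
    print(f"Dept {dept}: male {m:.2f}, female {f:.2f}")
print(f"Overall: male {overall[0]:.2f}, female {overall[1]:.2f}")
```

The aggregated male rate (about 0.45) is higher than the female rate (about 0.30), while in Department A the female rate is the higher one, so the per-department rates need not follow the aggregated pattern.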
Categorical Data Analysis: the analysis of categorical data, which usually refers to fitting a
statistical model to the data: first postulate a model for the underlying population by
formulating a statistical hypothesis, and then test whether or not the model fits the data.

This is also known as hypothesis testing for contingency tables: compare the observed
frequencies with their “expected frequencies” (the frequencies expected under the model) to see
how close they are.

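Method 1: Pearson’s Goodness of Fit Test. The test statistic is

X^2 = \sum_i (O_i - E_i)^2 / E_i,

where O_i is the observed frequency and E_i the expected frequency of cell i under the model.

Degrees of freedom: (number of cells - 1) - (number of free parameters estimated under the model).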
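As a worked illustration, Pearson's statistic can be computed for the Random Number Data under the model that every digit is equally likely. The sketch below is plain Python; the uniform model is the assumption being tested, and the critical value 16.919 is the standard upper 5% point of the chi-square distribution with 9 degrees of freedom.

```python
# Random Number Data: observed frequencies of the digits 1, 2, ..., 9, 0.
observed = [7, 8, 8, 15, 13, 11, 12, 8, 5, 13]

n = sum(observed)            # 100 digits in total
k = len(observed)            # 10 cells
expected = [n / k] * k       # assumed model: each digit has probability 1/10

# Pearson's statistic: sum over cells of (observed - expected)^2 / expected.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# No parameter is estimated under this model, so df = number of cells - 1.
df = k - 1

# Upper 5% point of the chi-square distribution with 9 df (table value).
CRITICAL_5PCT_DF9 = 16.919
reject = x2 > CRITICAL_5PCT_DF9

print(f"X^2 = {x2:.2f}, df = {df}, reject at 5%: {reject}")
```

Here the statistic is 9.4 on 9 degrees of freedom, below the 5% critical value, so these counts give no evidence against the equal-probability model.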



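Method 2: Wilks’ Likelihood Ratio Test. The test statistic is

G^2 = 2 \sum_i O_i \log(O_i / E_i),

where O_i is the observed frequency and E_i the expected frequency of cell i under the model.

Degrees of freedom: (number of cells - 1) - (number of free parameters estimated under the model).
Method 1 and Method 2 are asymptotically equivalent.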
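To see the two statistics side by side, the sketch below computes both for the Suicide Data under the model that suicides are equally likely on each day of the week. The uniform model is an assumption chosen for illustration, and 12.592 is the standard upper 5% point of the chi-square distribution with 6 degrees of freedom.

```python
import math

# Suicide Data: counts for Monday through Sunday.
observed = [1001, 1035, 982, 1033, 905, 737, 894]

n = sum(observed)            # 6587 suicides in total
k = len(observed)            # 7 cells
expected = [n / k] * k       # assumed model: each day has probability 1/7

# Pearson's statistic: sum of (O - E)^2 / E.
x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Likelihood ratio statistic: 2 * sum of O * log(O / E).
# All observed counts are positive here, so the logarithm is always defined.
g2 = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))

df = k - 1                   # no parameters estimated under the uniform model

# Upper 5% point of the chi-square distribution with 6 df (table value).
CRITICAL_5PCT_DF6 = 12.592

print(f"X^2 = {x2:.2f}, G^2 = {g2:.2f}, df = {df}")
print("reject at 5%:", x2 > CRITICAL_5PCT_DF6, g2 > CRITICAL_5PCT_DF6)
```

Both statistics are far above the critical value and lead to the same conclusion, which is what the asymptotic equivalence suggests for a sample this large.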

