
Due 11:59pm (midnight), Friday, 2/10

Homework 3
Last Name: First Name:

Directions: Please type your answers in bold. For the empirical exercises (where you analyze
data using SAS), attach your SAS code on a separate page of this homework; your SAS
code should allow me to replicate the information you used to answer the questions.

1. (Ch.3) Suppose a new standardized test is given to 100 randomly selected third-grade students in
New Jersey. The sample average score Ȳ on the test is 58 points and the sample standard
deviation, s_Y, is 8 points.

a. The authors plan to administer the test to all third-grade students in New Jersey. Construct a 95%
confidence interval for the mean score of all New Jersey third graders.
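Part (a) only needs the large-sample formula Ȳ ± 1.96·s_Y/√n from Ch. 3. A minimal Python sketch of that arithmetic, assuming the normal approximation is adequate for n = 100 (the helper name `mean_ci` is illustrative, not part of SAS or the textbook):

```python
import math


def mean_ci(ybar: float, s: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample confidence interval for a population mean: ybar +/- z * s / sqrt(n)."""
    se = s / math.sqrt(n)  # standard error of the sample mean
    return ybar - z * se, ybar + z * se


# Inputs from the question: n = 100, sample mean 58, sample standard deviation 8
low, high = mean_ci(58, 8, 100)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

Changing `z` gives other confidence levels (for example, 1.645 for 90%).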
Answer:

b. Suppose the same test is given to 200 randomly selected third graders from Iowa, producing a
sample average of 62 points and sample standard deviation of 11 points. Construct a 90%
confidence interval for the difference in mean scores between Iowa and New Jersey.
Answer:

c. Can you conclude with a high degree of confidence that the population means for Iowa and New
Jersey students are different? (What is the standard error of the difference in the two sample
means? What is the p-value of the test of no difference in the means versus some difference?)
(Hint: First, state the null and alternative hypotheses; second, compute the standard error of
the difference in the two sample means and then compute the relevant t-statistic; third,
compute the p-value associated with the t-statistic; finally, use the p-value to answer the
question.)
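Parts (b) and (c) use the same ingredients: the difference Ȳ_Iowa - Ȳ_NJ, its standard error √(s₁²/n₁ + s₂²/n₂), a confidence interval, the t-statistic, and a two-sided p-value. A sketch in Python, assuming the large-sample normal approximation used in Ch. 3 (z = 1.645 for a 90% interval; `diff_in_means` is a hypothetical helper name):

```python
import math


def diff_in_means(y1, s1, n1, y2, s2, n2, z=1.645):
    """Return (difference, SE, CI, t-statistic, two-sided p-value) for mean1 - mean2."""
    diff = y1 - y2
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)  # SE of the difference, no equal-variance assumption
    t = diff / se  # t-statistic for H0: mu1 - mu2 = 0
    # Two-sided normal p-value: 2 * (1 - Phi(|t|)) = erfc(|t| / sqrt(2))
    p = math.erfc(abs(t) / math.sqrt(2))
    return diff, se, (diff - z * se, diff + z * se), t, p


# Iowa (n = 200, mean 62, s = 11) minus New Jersey (n = 100, mean 58, s = 8)
diff, se, ci, t, p = diff_in_means(62, 11, 200, 58, 8, 100)
print(f"diff = {diff}, SE = {se:.4f}, 90% CI = ({ci[0]:.2f}, {ci[1]:.2f}), t = {t:.2f}, p = {p:.4f}")
```

Using `math.erfc` avoids any statistics library; at these sample sizes a t-distribution would give nearly the same p-value.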
Answer:

2. (Ch.3) To investigate possible gender discrimination in a firm, a sample of 100 men and 64
women with similar job descriptions are selected at random. A summary of the resulting monthly
salaries follows:
         Avg. Salary (Ȳ)   Standard Deviation (s_Y)   n
Men      $3100             $200                       100
Women    $2900             $320                       64

a. What do these data suggest about wage differences in the firm? Do they represent statistically
significant evidence that wages of men and women are different? (Hint: First, state the null
and alternative hypotheses; second, compute the relevant t-statistic; third, compute the
p-value associated with the t-statistic; finally, use the p-value to answer the question.)
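For part (a), the hint's steps can be written out as a worked calculation, assuming the large-sample normal approximation from Ch. 3 and the Ch. 3 standard error for a difference in means (which does not assume equal variances):

```latex
\[
H_0:\ \mu_{\text{men}} - \mu_{\text{women}} = 0
\qquad \text{vs.} \qquad
H_1:\ \mu_{\text{men}} - \mu_{\text{women}} \neq 0
\]
\[
SE(\bar{Y}_{\text{men}} - \bar{Y}_{\text{women}})
  = \sqrt{\frac{200^2}{100} + \frac{320^2}{64}}
  = \sqrt{400 + 1600}
  = \sqrt{2000} \approx 44.72
\]
\[
t = \frac{3100 - 2900}{44.72} \approx 4.47,
\qquad
p = 2\,\Phi(-|t|) < 0.0001
\]
```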
Answer:

b. Do these data suggest that the firm is guilty of gender discrimination in its compensation
policies? Explain.
Answer:

3. (SAS Exercise) On Blackboard you will find a SAS data file (under General Materials > Data >
CPS92_08 Data Set) CPS92_08.sas7bdat that contains data on full-time, full-year workers, age
25-34, with a high school diploma or a BA/BS as their highest degree. The file
CPS92_08_Description provides a description of the dataset. Use these data to answer the
following questions. Attach your SAS code on a separate page of this homework - your SAS
code should allow me to replicate the information you used to answer the questions.

a. Find the mean, standard deviation, minimum and maximum values of the ahe, bachelor, female
and age variables. (Hint: use the PROC MEANS procedure in SAS.)
Answer:
The SAS System

The MEANS Procedure

Variable   Mean         Std Dev      Minimum      Maximum
year       2000.06      8.0000696    1992.00      2008.00
ahe        15.3266620   8.9947619    1.3141025    82.4175797
bachelor   0.4355576    0.4958460    0            1.0000000
female     0.4294855    0.4950189    0            1.0000000
age        29.6437712   2.8319157    25.0000000   34.0000000

b. Draw a scatter plot of ahe vs. age in SAS and paste the graph below.
Answer:

c. Find the covariance and correlation coefficient of ahe and age.
Answer:
2 Variables: ahe age

Covariance Matrix, DF = 15315
          ahe            age
ahe   80.90574250     3.43669594
age    3.43669594     8.01974638

Simple Statistics
Variable   N       Mean       Std Dev   Sum      Minimum    Maximum
ahe        15316   15.32666   8.99476   234743   1.31410    82.41758
age        15316   29.64377   2.83192   454024   25.00000   34.00000

Pearson Correlation Coefficients, N = 15316
Prob > |r| under H0: Rho=0
          ahe        age
ahe   1.00000    0.13492
                  <.0001
age   0.13492    1.00000
       <.0001
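The correlation that PROC CORR reports is the covariance divided by the product of the two standard deviations, r = cov(ahe, age) / (s_ahe · s_age). A short Python check that reproduces it from the covariance matrix printed above (only those printed numbers are assumed):

```python
import math

# Entries of the covariance matrix reported by PROC CORR (DF = 15315)
var_ahe = 80.90574250
var_age = 8.01974638
cov_ahe_age = 3.43669594

# Pearson correlation: covariance scaled by the two standard deviations
r = cov_ahe_age / math.sqrt(var_ahe * var_age)
print(f"r = {r:.5f}")
```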

d. Copy and paste your SAS code below. Your SAS code should allow me to replicate your results
for Part a to Part c.
Answer:
libname datalib "E:\Econometrics\CPS92_08 Data Set";
/* Summary statistics: mean, standard deviation, minimum, maximum */
proc means data = datalib.CPS92_08 mean std min max;
var ahe bachelor female age;
run;

/* Scatter plot */
proc gplot data = datalib.CPS92_08;
plot ahe * age;
run;
quit;

/* Covariance and correlation */


proc corr data = datalib.CPS92_08 cov;
var ahe age;
run;

e. Copy and paste your Log window report below. (Hint: make sure your log window contains only
blue NOTE messages in addition to the black code. Red messages indicate errors and green
messages indicate warnings.)
Answer:
Licensed to UNIVERSITY OF ALASKA ANCHORAGE - SFA - T&R, Site 70121837.
NOTE: This session is executing on the W32_8PRO platform.

NOTE: Updated analytical products:

SAS/STAT 14.1
SAS/ETS 14.1
SAS/OR 14.1
SAS/IML 14.1
SAS/QC 14.1

NOTE: Additional host information:

W32_8PRO WIN 6.2.9200 Workstation

NOTE: SAS initialization used:


real time 9.84 seconds

cpu time 1.09 seconds

1 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
2 /* Summary of Data: Mean, Standard deviation, percentiles */
3 proc means data = datalib.CPS92_08 Data Set mean std min max;
---
73
WARNING: Ignoring second data set reference.
ERROR 73-322: Expecting an =.
4 var str testscr;
ERROR: Variable STR not found.
ERROR: Variable TESTSCR not found.
5 run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.14 seconds
cpu time 0.01 seconds

6 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
7 /* Summary of Data: Mean, Standard deviation, percentiles */
8 proc means data = datalib.CPS92_08 Data Set mean, std, min, max;
---
73
202
ERROR 73-322: Expecting an =.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
9 var str testscr;
WARNING: Ignoring second data set reference.
ERROR: Variable STR not found.
ERROR: Variable TESTSCR not found.
10 run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

11 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
12 /* Summary of Data: Mean, Standard deviation, Minimun, Maximum percentiles */
13 proc means std min max data = datalib.CPS92_08 Data Set mean std min max;
---
73
WARNING: Ignoring second data set reference.
ERROR 73-322: Expecting an =.
14 var str testscr;
ERROR: Variable STR not found.
ERROR: Variable TESTSCR not found.
15 run;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

16 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
17 /* Summary of Data: Mean, Standard deviation, Minimun, Maximum percentiles */
18 proc means data = datalib.CPS92_08 mean std min max;
19 var;
20 run;

NOTE: Writing HTML Body file: sashtml.htm


NOTE: There were 15316 observations read from the data set DATALIB.CPS92_08.
NOTE: PROCEDURE MEANS used (Total process time):
real time 1.59 seconds
cpu time 0.21 seconds

21 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
22 /* Summary of Data: Mean, Standard deviation, Minimun, Maximum percentiles */
23 proc means data = datalib.CPS92_08 mean std min max;
24 var;
25 run;

NOTE: There were 15316 observations read from the data set DATALIB.CPS92_08.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.07 seconds
cpu time 0.03 seconds

26
27 /* Scatter plot */
28 proc gplot data = datalib.CPS92_08
29 plot var *ahe vs. age * str;
----
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ANNOTATE, DATA, GOUT,
UNIFORM.
ERROR 76-322: Syntax error, statement will be ignored.
30 run;

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
31 quit;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GPLOT used (Total process time):
real time 0.10 seconds
cpu time 0.01 seconds

32 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
33 /* Summary of Data: Mean, Standard deviation, Minimun, Maximum percentiles */
34 proc means data = datalib.CPS92_08 mean std min max;
35 var;
36 run;

NOTE: There were 15316 observations read from the data set DATALIB.CPS92_08.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.09 seconds
cpu time 0.01 seconds

37
38 /* Scatter plot */
39 proc gplot data = datalib.CPS92_08
40 plot var;
----
22
202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ANNOTATE, DATA, GOUT,
UNIFORM.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
41 run;

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
42 quit;

NOTE: PROCEDURE GPLOT used (Total process time):


real time 0.01 seconds
cpu time 0.01 seconds

43 libname datalib "E:\Econometrics\CPS92_08 Data Set";


NOTE: Libref DATALIB refers to the same physical library as TMP1.
NOTE: Libref DATALIB was successfully assigned as follows:
Engine: V9
Physical Name: E:\Econometrics\CPS92_08 Data Set
44 /* Summary of Data: Mean, Standard deviation, Minimun, Maximum percentiles */
45 proc means data = datalib.CPS92_08 mean std min max;
46 var;
47 run;

NOTE: There were 15316 observations read from the data set DATALIB.CPS92_08.
NOTE: PROCEDURE MEANS used (Total process time):
real time 0.06 seconds
cpu time 0.04 seconds

48
49 /* Scatter plot */
50 proc gplot data = datalib.CPS92_08
51 plot;
----
22
202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ANNOTATE, DATA, GOUT,
UNIFORM.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
52 run;

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
53 quit;

NOTE: PROCEDURE GPLOT used (Total process time):


real time 0.00 seconds
cpu time 0.00 seconds

54 proc gplot data = datalib.CPS92_08


55 plot ahe vs. age * str;
----
22
76
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ANNOTATE, DATA, GOUT,
UNIFORM.
ERROR 76-322: Syntax error, statement will be ignored.
56 run;

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
57 quit;

NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE GPLOT used (Total process time):
real time 0.00 seconds
cpu time 0.00 seconds

58 proc gplot data = datalib.CPS92_08


59 plot ahe * age;
----
22
202
ERROR 22-322: Syntax error, expecting one of the following: ;, (, ANNOTATE, DATA, GOUT,
UNIFORM.
ERROR 202-322: The option or parameter is not recognized and will be ignored.
60 run;

WARNING: RUN statement ignored due to previous errors. Submit QUIT; to terminate the procedure.
61 quit;

NOTE: PROCEDURE GPLOT used (Total process time):


real time 0.00 seconds
cpu time 0.00 seconds

62 proc gplot data = datalib.CPS92_08;


63 plot ahe * age;
64 run;

NOTE: 14251 bytes written to C:\Users\bdanders\AppData\Local\Temp\SAS Temporary Files\_TD368_OPN402_\gplot.png.
65 quit;

NOTE: There were 15316 observations read from the data set DATALIB.CPS92_08.
NOTE: PROCEDURE GPLOT used (Total process time):
real time 1.39 seconds
cpu time 0.31 seconds

66 proc corr data = datalib.CPS92_08 cov;


67 var;
68 run;

ERROR: Variable list empty.


NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE CORR used (Total process time):
real time 0.03 seconds
cpu time 0.01 seconds

69 proc corr data = datalib.CPS92_08 cov;


70 var;
71 run;

ERROR: Variable list empty.


NOTE: The SAS System stopped processing this step because of errors.
NOTE: PROCEDURE CORR used (Total process time):
real time 0.01 seconds
cpu time 0.01 seconds

72 proc corr data = datalib.CPS92_08 cov;


73 var ahe age;
74 run;

NOTE: PROCEDURE CORR used (Total process time):


real time 0.14 seconds
cpu time 0.01 seconds

4. (Ch. 3) On Blackboard you will find a SAS data file (under General Materials > Data >
CPS92_08 Data Set) CPS92_08.sas7bdat that contains data on full-time, full-year workers, age
25-34, with a high school diploma or a BA/BS as their highest degree. The file
CPS92_08_Description provides a description of the dataset. Use these data to answer the
following questions. Attach your SAS code on a separate page of this homework; your SAS
code should allow me to replicate the information you used to answer the questions.

a. Compute the sample mean for average hourly earnings (AHE) in 1992 and in 2008. Construct a
95% confidence interval for the population means of AHE in 1992 and 2008 and for the change
between 1992 and 2008. Answer the question by filling in the table below. (Hint: Code 2008
observations as year_group 1 and 1992 observations as year_group 2. Use PROC TTEST
to get the means, standard errors and confidence intervals.)

Answer:
Average Hourly Earnings, Nominal $
                      Mean         SE(Mean)         95% Confidence Interval
AHE 2008              18.9761      0.1155           (18.7497, 19.2024)
AHE 1992              11.6264      0.0644           (11.5002, 11.7525)
                      Difference   SE(Difference)   95% Confidence Interval
AHE 2008 - AHE 1992   7.3497       0.1327           (7.0897, 7.6098)

The SAS System

The TTEST Procedure

Variable: ahe
Year_group N Mean Std Dev Std Err Minimum Maximum
1 7711 18.9761 10.1394 0.1155 2.0032 82.4176
2 7605 11.6264 5.6133 0.0644 1.3141 46.6341
Diff (1-2) 7.3497 8.2101 0.1327

Year_group Method Mean 95% CL Mean Std Dev 95% CL Std Dev
1 18.9761 18.7497 19.2024 10.1394 9.9819 10.3021
2 11.6264 11.5002 11.7525 5.6133 5.5255 5.7039
Diff (1-2) Pooled 7.3497 7.0897 7.6098 8.2101 8.1192 8.3031
Diff (1-2) Satterthwaite 7.3497 7.0906 7.6089

Method          Variances   DF      t Value   Pr > |t|
Pooled          Equal       15314   55.39     <.0001
Satterthwaite   Unequal     12065   55.60     <.0001

Equality of Variances
Method     Num DF   Den DF   F Value   Pr > F
Folded F   7710     7604     3.26      <.0001
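The two standard errors behind the t values above follow from the group summary statistics: the pooled method uses s_p·√(1/n₁ + 1/n₂) and the Satterthwaite method uses √(s₁²/n₁ + s₂²/n₂). A short Python check using only the printed (rounded) values, so small rounding differences from the SAS output are expected:

```python
import math

# Summary statistics from the PROC TTEST output (group 1 = 2008, group 2 = 1992)
n1, s1 = 7711, 10.1394
n2, s2 = 7605, 5.6133
diff = 7.3497
sp = 8.2101  # pooled standard deviation reported by SAS

# Pooled (equal-variance) standard error of the difference
se_pooled = sp * math.sqrt(1 / n1 + 1 / n2)
# Satterthwaite (unequal-variance) standard error of the difference
se_unequal = math.sqrt(s1**2 / n1 + s2**2 / n2)

print(f"Pooled:        SE = {se_pooled:.4f}, t = {diff / se_pooled:.2f}")
print(f"Satterthwaite: SE = {se_unequal:.4f}, t = {diff / se_unequal:.2f}")
```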
b. In 2008, the value of the Consumer Price Index (CPI) was 215.2. In 1992, the value of the CPI
was 140.3. Repeat (a) but use AHE measured in real 2008 dollars ($2008); that is, adjust the 1992
data for the price inflation that occurred between 1992 and 2008. Answer the question by filling
in the table below. (Hint: create a new variable called real_ahe, i.e., multiply the 1992 wages by
CPI_2008/CPI_1992 = 215.2/140.3 to get wages in real 2008 dollars.)
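The inflation adjustment in the hint is a single rescaling of the 1992 wages by the CPI ratio; 2008 wages are already in 2008 dollars and stay unchanged. A minimal Python sketch of that conversion (the names `INFLATION_FACTOR` and `to_2008_dollars` are illustrative; in SAS the same multiplication would go in the data step that creates real_ahe):

```python
CPI_2008 = 215.2
CPI_1992 = 140.3
INFLATION_FACTOR = CPI_2008 / CPI_1992  # roughly 1.534


def to_2008_dollars(ahe_1992: float) -> float:
    """Convert a 1992 nominal hourly wage into real 2008 dollars."""
    return ahe_1992 * INFLATION_FACTOR


# A $10.00 hourly wage in 1992, expressed in 2008 dollars
print(f"{to_2008_dollars(10.00):.2f}")
```

Because every 1992 wage is multiplied by the same constant, the 1992 sample mean and its standard error are both multiplied by that factor as well.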
Answer:
Average Hourly Earnings, Real $2008
                      Mean         SE(Mean)         95% Confidence Interval
AHE 2008
AHE 1992
                      Difference   SE(Difference)   95% Confidence Interval
AHE 2008 - AHE 1992

c. If you were interested in the change in workers' purchasing power from 1992 to 2008, would you
use the results from (a) or from (b)? Explain.
Answer: (b). Measuring both years' wages in the same 2008 dollars removes the effect of inflation,
so the change in real AHE reflects the change in what workers' wages can actually buy.

d. Use the 2008 data to construct a 95% confidence interval for the mean of AHE for high school
graduates. Construct a 95% confidence interval for the mean of AHE for workers with a college
degree. Construct a 95% confidence interval for the difference between the two means. Answer
the question by filling in the table below. (Hint: Code college graduate observations as
edu_group 1 and high school graduate observations as edu_group 2. Use PROC TTEST
to get the means, standard errors and confidence intervals.)
(Hint: you can use a where= data set option to keep only the 2008 data:
proc ttest data=... (where = (year = 2008)) alpha=0.05;
var ;
class ;
run; )
Answer:
Average Hourly Earnings in 2008
                        Mean         SE(Mean)         95% Confidence Interval
College 21.78 22.1124.44
HighSchool 15.31 15.5115.11
                        Difference   SE(Difference)   95% Confidence Interval
CollegeHighSchool 1.4 1.001.8

e. Repeat part d using the 1992 data expressed in real 2008 dollars ($2008).
Answer:
Average Hourly Earnings in 1992 (in $2008)
                        Mean         SE(Mean)         95% Confidence Interval
College
High School
                        Difference   SE(Difference)   95% Confidence Interval
College - High School

f. Based on Part e, is there a significant gap between the AHE of college graduates and high school
graduates in 1992? How about in 2008 (based on Part d)? Explain.
Answer:

Code:

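libname datalib "E:\Econometrics\CPS92_08 Data Set";

/* Q3a: summary statistics (mean, standard deviation, minimum, maximum) */
proc means data = datalib.CPS92_08 mean std min max;
var ahe bachelor female age;
run;

/* Q3b: scatter plot of ahe vs. age */
proc gplot data = datalib.CPS92_08;
plot ahe * age;
run;
quit;

/* Q3c: covariance and correlation */
proc corr data = datalib.CPS92_08 cov;
var ahe age;
run;

/* Q4: code 2008 observations as year_group 1 and 1992 observations as year_group 2,
   and convert 1992 wages to real 2008 dollars (CPI 2008 = 215.2, CPI 1992 = 140.3) */
data datalib.CPS92_082;
set datalib.CPS92_08;
if year = 2008 then do;
   year_group = 1;
   real_ahe = ahe;
end;
else if year = 1992 then do;
   year_group = 2;
   real_ahe = ahe * (215.2 / 140.3);
end;
run;

/* Q4a: t-test for the difference in nominal AHE, 2008 vs. 1992 */
proc ttest data = datalib.CPS92_082 alpha = 0.05;
var ahe;
class year_group;
run;

/* Q4b: same comparison using AHE in real 2008 dollars */
proc ttest data = datalib.CPS92_082 alpha = 0.05;
var real_ahe;
class year_group;
run;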

