ECON 1203 Tutorial Workshop Questions

Semester 1 2016
***This document will be periodically updated with questions to be discussed in
succeeding tutorials, and re-posted to Moodle every fortnight.***
Weeks 1 and 2
1. (a) What is meant by a variable in a statistical sense? Distinguish between qualitative and
quantitative statistical variables, and between continuous and discrete variables. Give
examples.
(b) Distinguish between (i) a statistical population and a sample; (ii) a parameter and a
statistic. Give examples.
2. In order to know the market better, the second-hand car dealership, Anzac Garage, wants
to analyze the age of second-hand cars being sold. A sample of 20 advertisements for
passenger cars is selected from the second-hand car advertising/listing website
www.drive.com.au. The ages in years of the vehicles at time of advertisement are listed
below:
5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11
(a) Calculate the frequency, cumulative frequency and relative frequency distributions for
the age data using the following bin classes:
More than 0 to less than or equal to 8 years
More than 8 to less than or equal to 16 years
More than 16 to less than or equal to 24 years.
(b) Sketch a frequency histogram using the calculations in part (a). What can you say
about the distribution of the age of these second-hand cars? Is there anything that
concerns you about the frequency table and histogram? Specifically, is the choice of bin
classes appropriate? What needs to be done differently?
(c) Halve the width of the bins (0 to 4, 4 to 8, etc) and recalculate the frequency,
cumulative frequency and relative frequency distributions. Using the new distributions
and histogram, what can you now say about the distribution of the age of second-hand
cars?
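The tabulation asked for in parts (a) and (c) can be sketched in Python as below. This is a minimal illustration, not a prescribed method: the helper name `frequency_table` and its parameters are hypothetical, and bins are treated as "more than lower bound, up to and including upper bound", as stated in part (a).

```python
ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]

def frequency_table(data, width, upper):
    """Return (lower, upper, frequency, cumulative, relative) for bins (lower, upper]."""
    rows = []
    cumulative = 0
    for lo in range(0, upper, width):
        hi = lo + width
        count = sum(1 for x in data if lo < x <= hi)
        cumulative += count
        rows.append((lo, hi, count, cumulative, count / len(data)))
    return rows

table_8 = frequency_table(ages, width=8, upper=24)  # bins from part (a)
table_4 = frequency_table(ages, width=4, upper=24)  # halved bins from part (c)

for lo, hi, freq, cum, rel in table_8:
    print(f"({lo:>2}, {hi:>2}]  freq={freq:>2}  cum={cum:>2}  rel={rel:.2f}")
```

Comparing `table_8` with `table_4` shows how much detail the wider bins hide.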
3. Health expenditure
A recent report by Access Economics provides a comparison of Australian expenditures
on health with that of comparable OECD countries. Data from that report relating to the
year 2005 have been used to reproduce their Figure 2.2 (below denoted as Figure 2.1).
(a) What are the key features of these data?
(b) While this is a bivariate scatter plot, there are three variables involved: health
expenditure, GDP and population. Why account for population by expressing health
expenditure and GDP in per capita terms?

Figure 2.1 OECD Health Expenditure and GDP
[Scatter plot; vertical axis: Health expenditure per capita (US$000); horizontal axis: GDP per capita (US$000)]

4. Australian housing prices
Recent research by Dr Nigel Stapledon at the UNSW School of Economics provides
an extensive analysis of Australian housing prices since 1880. In Figure 2.2 his data
are used to provide a comparison of Sydney and Melbourne housing prices over time.
(a) What are the key features of these data?
(b) Why have prices been expressed in constant dollars?

Figure 2.2 Comparison of Sydney and Melbourne median house prices in constant 2007-08 Dollars
[Line chart with series for Sydney and Melbourne; vertical axis: Thousands of dollars; horizontal axis: Year]

5. Using the car data from Question 2:
(a) Calculate the mean, median and mode for this sample of data and use these
statistics to further describe the distribution of car ages.
(b) If the largest observation were removed from this data set, how would the three
measures of central tendency you have calculated change?
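As a check on hand calculations, the three measures can be computed with Python's standard `statistics` module. This is only a sketch; the dictionary names are illustrative, and `multimode` is used so that ties for the mode would be reported rather than hidden.

```python
from statistics import mean, median, multimode

ages = [5, 5, 6, 14, 6, 2, 6, 4, 5, 9, 4, 10, 11, 2, 3, 7, 6, 6, 24, 11]

def central_tendency(data):
    return {"mean": mean(data), "median": median(data), "modes": multimode(data)}

summary = central_tendency(ages)

# Part (b): the same measures after dropping the single largest observation.
trimmed = sorted(ages)[:-1]
summary_trimmed = central_tendency(trimmed)

print(summary)
print(summary_trimmed)
```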
6. For the following statistical population, compute the mean, range, variance and
standard deviation: 3, 3, 5, 12, 13, 14, 17, 20, 21, 21.
What would happen to each of the measures you have calculated if:
(a) 4 were added to each data point (observation)?
(b) each data point were multiplied by 2?
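One way to explore the effect of shifting and rescaling is to recompute the measures on transformed copies of the data. The sketch below treats the ten values as a full population, so it uses `pvariance` (divisor N); the helper name `describe` is hypothetical.

```python
from math import sqrt
from statistics import mean, pvariance

population = [3, 3, 5, 12, 13, 14, 17, 20, 21, 21]

def describe(data):
    var = pvariance(data)  # population variance (divides by N, not N - 1)
    return {
        "mean": mean(data),
        "range": max(data) - min(data),
        "variance": var,
        "sd": sqrt(var),
    }

original = describe(population)
shifted = describe([x + 4 for x in population])  # part (a)
scaled = describe([2 * x for x in population])   # part (b)

for label, stats in [("original", original), ("shifted", shifted), ("scaled", scaled)]:
    print(label, stats)
```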

7. Migrant wealth.
Suppose the Minister for Immigration is interested in research on the assimilation of
migrant households (a household where the chief income-earner is foreign born). The
Household, Income and Labour Dynamics in Australia (HILDA) survey is a
representative survey of Australian households. Using 4,669 household observations
for 2002 from HILDA, we find there are 3,567 households classified as Australian-born
and 1,102 classified as migrants. One key consideration is how migrant
households are doing in terms of wealth compared with Australian-born households.
Using these data, we find the following:
Summary statistics for net household wealth ($A)

                   Mean       10th percentile   Median     90th percentile
Australian-born    236,064    1,545             123,020    560,006
Migrant            248,970    1,720             131,152    524,372

(a) What can you say about the distribution of net household wealth, for both
Australian-born and migrant households, by looking at just the mean and the median
figures?
(b) More generally, what can you say about the distribution of wealth for migrant
households compared to that for Australian-born households? In particular, which
type of household has greater variation in wealth?
(c) Suppose the minister has net household wealth of $600,000. What can you say
about his or her financial circumstances relative to other Australian-born households?
8. Sydney housing prices.
Figure 3.2 depicts a scatter plot of Sydney-area housing prices versus distance from
the CBD. The unit of observation is a suburb, price is the mean of the median price of
houses sold in each suburb for two quarters (those ending in September and
December 2002), and distance is measured in kilometers from downtown.
(a) What would you expect the correlation to be between price and distance?
(b) Does it appear that there is a linear relationship between the two variables?
(c) What other key features of these data can be determined from the plot?
Figure 3.2: House prices in Sydney suburbs versus distance to CBD
[Scatter plot; vertical axis: Price $; horizontal axis: Distance to CBD (kms)]

9. Anzac Garage wants to develop guidelines for setting prices of cars according to the
car's age. They hire a business consultant who chooses a sample of 117 second-hand
passenger car advertisements collected from www.drive.com.au and retrieves data on
the age and price of the cars.
(a) The business consultant first calculates the correlation coefficient between age
and price and finds it to be -0.278. Interpret this result.
(b) Sketch what you think the scatter diagram from which this correlation coefficient
was calculated might look like. Suppose the business consultant constructs a simple
linear regression model using price as the dependent variable, and age as the
independent variable. What do you think the estimated regression line might look like
here? (We will return to this particular example later in the course and address this
question more formally.)
10. Big Data. Suppose you are sitting at the NSW Department of Health and have access
to information on hospital admissions, diagnosis, private insurance coverage, sex, age,
smoking status, and length of hospital stay for all patients at all NSW hospitals for
2000 through 2015. A team of statisticians in your department are available to
analyse these data following your direction.

(a) You get a phone call from the State treasurer wanting to know how much of your
budget you spend on smokers and smoking-related health problems. You promise
to get back to her, and put down the phone. What do you tell your team?
(b) You get a phone call from the Australian Council on Smoking and Health, asking
about any evidence that the State has on the association between smoking and
health outcomes. You promise to get back to them and put down the phone.
What do you tell your team?
11. Work through problem 34 on page 165 of Sharpe (Chapter 4).

Weeks 3 and 4
1. (a) Explain what it means to say that two probabilistic events in a sample space are
mutually exclusive of one another.
(b) Explain what it means to say that two probabilistic events in a sample space are
independent of one another.
(c) Why can two events not at the same time be both mutually exclusive and
independent of one another?
2. A department store wants to study the relationship between the way customers pay
for an item and the price of the item. 250 transactions are recorded and the following
table is formed.
                              Payment
Price category    Cash    Credit card    Debit card
Under $20         15      9              18
$20-$100          11      53             52
Over $100         6       38             48

Convert the table to a joint distribution. Express each of the following questions in
terms of probability statements, and then solve:
(a) What is the probability that an item is under $20?
(b) What is the probability that an item with a price tag of $43 is paid for in cash?

(c) What is the probability that people pay for an item that is at least $20 by credit?
(d) If somebody used a debit card to pay for an item, what is the probability that the
item was less than $100?
(e) Are price and means of payment independent?
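Converting the counts to a joint distribution, and reading marginal and conditional probabilities from it, can be sketched as follows. The function names are hypothetical, and exact fractions are used so that an independence check is not blurred by rounding.

```python
from fractions import Fraction

# Counts from the table: rows are price categories, columns are payment types.
counts = {
    "Under $20": {"Cash": 15, "Credit card": 9, "Debit card": 18},
    "$20-$100": {"Cash": 11, "Credit card": 53, "Debit card": 52},
    "Over $100": {"Cash": 6, "Credit card": 38, "Debit card": 48},
}
total = sum(sum(row.values()) for row in counts.values())

# Joint distribution: each cell count divided by the grand total.
joint = {(price, pay): Fraction(n, total)
         for price, row in counts.items() for pay, n in row.items()}

def marginal_price(price):
    return sum(p for (pr, _), p in joint.items() if pr == price)

def marginal_payment(pay):
    return sum(p for (_, py), p in joint.items() if py == pay)

def payment_given_price(pay, price):
    """P(payment = pay | price category = price)."""
    return joint[(price, pay)] / marginal_price(price)

# Independence requires every joint cell to equal the product of its marginals.
independent = all(p == marginal_price(pr) * marginal_payment(py)
                  for (pr, py), p in joint.items())

print(marginal_price("Under $20"), payment_given_price("Cash", "$20-$100"), independent)
```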

3. In a small batch of 20 manufactured widgets, there are, in fact, 3 defective ones. You,
as quality control officer for the company making the widgets, decide to examine a
sample of 3 widgets, selected without replacement, to see how many defective ones
are selected.
(a) Use a probability tree to evaluate the probability distribution of the number of
defectives sampled.
(b) How would your answer change if the sampling were done with replacement?
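The branch products of a probability tree can be cross-checked with counting formulas. The sketch below uses the hypergeometric form for sampling without replacement and the binomial form for sampling with replacement; variable names are illustrative only.

```python
from fractions import Fraction
from math import comb

N, D, n = 20, 3, 3  # batch size, defectives in the batch, sample size

# Without replacement: hypergeometric probabilities for k defectives sampled.
without_replacement = {
    k: Fraction(comb(D, k) * comb(N - D, n - k), comb(N, n))
    for k in range(n + 1)
}

# With replacement: binomial probabilities with p = D / N on every draw.
p = Fraction(D, N)
with_replacement = {
    k: comb(n, k) * p**k * (1 - p) ** (n - k)
    for k in range(n + 1)
}

for k in range(n + 1):
    print(k, float(without_replacement[k]), float(with_replacement[k]))
```

The first entry of `without_replacement` matches the tree branch (17/20)(16/19)(15/18), which is a quick way to confirm the tree was drawn correctly.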

4. Work through problem 16 on page 200 of Sharpe (Chapter 5).

5. Work through problem 18 on page 200 of Sharpe (Chapter 5).

6. Work through problem 44 on page 203 of Sharpe (Chapter 5).

7. The manager of a factory has determined from past experience that X, the number of
repairs required to machines in her factory on any one day, has the following
probability distribution:
x           0       1       2       3       4
P(X = x)    0.41    0.25    0.18    0.10    0.06

Calculate the following:
(a) $P(1 < X < 4)$
(b) $P(0 \le X \le 3)$
(c) $E(X)$
(d) $\mathrm{Var}(X)$
(e) What is the conditional probability distribution of X, conditional on some
positive number of repairs taking place?
(f) Describe at least one business decision the manager might face that would be
impacted by the information in the original table of unconditional
probabilities.
(g) Describe at least one business decision the manager might face that would be
impacted by the information in the table of conditional probabilities.
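The expected value, variance and conditional distribution of a discrete random variable can be sketched as below. An assumption is made here that the five listed probabilities correspond to x = 0, 1, 2, 3, 4 repairs; if the intended support differs, only the dictionary keys need changing.

```python
# Assumption: the five probabilities correspond to x = 0, 1, 2, 3, 4 repairs.
pmf = {0: 0.41, 1: 0.25, 2: 0.18, 3: 0.10, 4: 0.06}

# E(X) is the probability-weighted average of the possible values.
mean = sum(x * p for x, p in pmf.items())

# Var(X) is the probability-weighted average squared deviation from the mean.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Conditional distribution given X > 0: drop x = 0 and rescale by P(X > 0).
p_positive = 1 - pmf[0]
conditional = {x: p / p_positive for x, p in pmf.items() if x > 0}

print(mean, variance)
print(conditional)
```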

8. Suppose that the daily number of errors a randomly-selected bank teller makes is
denoted by X and follows the distribution given in the table below. A human resource
manager records the daily numbers of errors of two randomly selected tellers. Denote
the associated random variables by X1 and X2. As the selection is random, X1 and X2
are independent and follow the same distribution as X. The manager then computes
the sample mean $\bar{X} = (X_1 + X_2)/2$, where the sample size is n = 2.

x
P(X = x)    0.6    0.2    0.2

a. Find the mean and variance of X1. Explain why we do not need to find the mean
and variance of X2 once we know those of X1.
b. Since X1 and X2 are random, so is $\bar{X}$. Find the mean and variance of the random
variable $\bar{X}$. Compare these with the result from (a) and comment. Hint: you will
find it useful to note that $\mathrm{Cov}(X_1, X_2) = 0$ because X1 and X2 are independent. This
simplifies the evaluation of the variance of the random variable $\bar{X}$.
c. Find the possible values that $\bar{X}$ may take. Hence list the probability distribution
of $\bar{X}$ for samples of size 2. (This is known as the sampling distribution of $\bar{X}$.)
d. Examine briefly what would happen if n = 3, 4, ...? For this last sub-question, you
will need to use the idea of a factorial of an integer n, labelled $n!$, which means n
multiplied by every positive integer smaller than itself. So, for example,
$3! = 3 \times 2 \times 1 = 6$. Also recall the combinatorial formula for the number of ways of
selecting x from n distinct objects (Sharpe page 193): $C^n_x = n!/((n - x)!\,x!)$.
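A sampling distribution for small n can be built by brute-force enumeration of every possible sample. The sketch below is illustrative only: the x values are not legible in the table, so a hypothetical support of 0, 1 and 2 errors per day is assumed, and the function names are invented for this example.

```python
from fractions import Fraction
from itertools import product

# Hypothetical support (assumption): 0, 1, 2 errors with the listed probabilities.
pmf = {0: Fraction(6, 10), 1: Fraction(2, 10), 2: Fraction(2, 10)}

def mean_var(dist):
    m = sum(x * p for x, p in dist.items())
    v = sum((x - m) ** 2 * p for x, p in dist.items())
    return m, v

def sampling_distribution(pmf, n):
    """Distribution of the mean of n independent draws from pmf."""
    dist = {}
    for draws in product(pmf, repeat=n):
        prob = 1
        for x in draws:
            prob *= pmf[x]  # independence: multiply the individual probabilities
        xbar = Fraction(sum(draws), n)
        dist[xbar] = dist.get(xbar, 0) + prob
    return dist

mu, sigma2 = mean_var(pmf)
xbar_dist = sampling_distribution(pmf, 2)
mu_bar, sigma2_bar = mean_var(xbar_dist)

print(mu, sigma2, mu_bar, sigma2_bar)
```

Calling `sampling_distribution(pmf, 3)` or `sampling_distribution(pmf, 4)` gives a feel for part d without listing the samples by hand.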

9. A student has enrolled in three courses in this semester. Let's call them courses A, B
and C. Her chances of passing each course are 0.8, 0.65, and 0.5, respectively.
Passing each course is assumed to be independent of passing other courses. Answer
the following:
a. Define a random variable for each course outcome.
b. What is the probability that this student passes exactly two courses? Express this
question in terms of probability statements, and then solve.
c. What is the probability that this student fails at least one course? Express this
question in terms of probability statements, and then solve.
d. How reasonable is the assumption of independence?
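Under the stated independence assumption, the probability of any pass/fail pattern is the product of the individual probabilities, so parts b and c can be checked by enumerating all eight patterns. The helper name below is hypothetical.

```python
from itertools import product

pass_prob = {"A": 0.8, "B": 0.65, "C": 0.5}

def prob_of_pattern(pattern):
    """Probability of one pass/fail pattern (True = pass), assuming independence."""
    prob = 1.0
    for course, passed in zip(pass_prob, pattern):
        prob *= pass_prob[course] if passed else 1 - pass_prob[course]
    return prob

patterns = list(product([True, False], repeat=3))

p_exactly_two = sum(prob_of_pattern(p) for p in patterns if sum(p) == 2)
p_fail_at_least_one = 1 - prob_of_pattern((True, True, True))

print(p_exactly_two, p_fail_at_least_one)
```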

10. Let X be the number of heads in 4 tosses of a fair coin.
a. What is the probability distribution of X?
b. What are the mean and variance of X?
c. Consider a game where you win $5 for every head but lose $3 for every tail that
appears in 4 tosses of a fair coin. Let the variable Y denote the winnings from this
game. Formulate the probability distribution of Y based on the probability
distribution of X.
d. What is the expected value of Y? Would you like to play this game? If so, why? If
not, why not?
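The link between X and Y can be sketched by building the binomial distribution of X and relabelling each outcome with its winnings. With 4 tosses, heads pay $5 and tails cost $3, so Y = 5X - 3(4 - X); the variable names are illustrative.

```python
from fractions import Fraction
from math import comb

n, p = 4, Fraction(1, 2)

# Binomial distribution of X, the number of heads in 4 tosses of a fair coin.
pmf_x = {k: comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)}

# Winnings for k heads and (n - k) tails: Y = 5k - 3(n - k).
pmf_y = {5 * k - 3 * (n - k): prob for k, prob in pmf_x.items()}

expected_x = sum(k * prob for k, prob in pmf_x.items())
variance_x = sum((k - expected_x) ** 2 * prob for k, prob in pmf_x.items())
expected_y = sum(y * prob for y, prob in pmf_y.items())

print(pmf_x)
print(pmf_y, expected_y)
```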

11. Work through problem 41 on page 234 of Sharpe (Chapter 6).

Weeks 5 and 6
1. A random number generator is designed to draw numbers at random from within a
specified range. We can consider any number in the range as a possible outcome.
(a) What type of distribution is the random number generator drawing from?
(b) Suppose we program a random number generator to generate a random number
with a value falling in the interval [0, 2]. What is the height of the density of the
distribution from which the random number generator is drawing? Draw a graph
of the probability density function.
(c) What is the cumulative probability distribution of the random variable from
which draws are being taken? Draw a graph of the cumulative probability
distribution function.
(d) Find the following for this case: $P(Y < 0.6)$; $P(Y \ge 0.6)$; $P(0.5 < Y < 1.5)$, using both
the density function and the cumulative probability function. Show that your
answers match whichever you use.
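Assuming the generator draws uniformly on [0, 2], the density and cumulative distribution function can be written directly, and a probability can then be found either as a rectangle area under the density or as a difference of CDF values. The function names are hypothetical.

```python
def uniform_pdf(y, a=0.0, b=2.0):
    """Density of a uniform distribution on [a, b]: constant height 1 / (b - a)."""
    return 1.0 / (b - a) if a <= y <= b else 0.0

def uniform_cdf(y, a=0.0, b=2.0):
    """P(Y <= y) for a uniform distribution on [a, b]."""
    if y < a:
        return 0.0
    if y > b:
        return 1.0
    return (y - a) / (b - a)

def prob_from_cdf(lo, hi):
    return uniform_cdf(hi) - uniform_cdf(lo)

def prob_from_density(lo, hi):
    """Rectangle area: width times the constant height (valid for 0 <= lo <= hi <= 2)."""
    return (hi - lo) * uniform_pdf((lo + hi) / 2)

print(prob_from_cdf(0.5, 1.5), prob_from_density(0.5, 1.5))
```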
2. From several years' records, a fish market manager has determined that the weight of
deep sea bream sold in the market (X) is approximately normally distributed with a
mean of 450 grams and a standard deviation of 100 grams. Assuming this distribution
will remain unchanged in the future, calculate the expected proportions of deep sea
bream sold over the next year weighing
a) between 300 and 400 grams.
b) between 400 and 600 grams.
c) more than 625 grams.
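Table look-ups for normal probabilities can be checked with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is a sketch for verification only; the helper `prob_between` is a hypothetical name.

```python
from statistics import NormalDist

weight = NormalDist(mu=450, sigma=100)  # grams

def prob_between(dist, lo, hi):
    """P(lo < X < hi) as a difference of cumulative probabilities."""
    return dist.cdf(hi) - dist.cdf(lo)

p_300_400 = prob_between(weight, 300, 400)
p_400_600 = prob_between(weight, 400, 600)
p_over_625 = 1 - weight.cdf(625)

print(round(p_300_400, 4), round(p_400_600, 4), round(p_over_625, 4))
```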

3. In a certain large city, household annual incomes are considered approximately
normally distributed with a mean of $40,000 and a standard deviation of $6,000. What
proportion of households in the city have an annual income over $35,000? If a random
sample of 120 households were selected, how many of these households would we
expect to have annual incomes between $35,000 and $45,000?
4. What is the 75th percentile of the normal distribution N(10, 9)?
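A percentile of a normal distribution is the inverse of its CDF. The sketch below assumes the common convention that the second argument of N(10, 9) is the variance, so the standard deviation is 3; if the notation is meant as N(mean, standard deviation), `sigma` should be 9 instead.

```python
from math import sqrt
from statistics import NormalDist

# Assumption: N(10, 9) means mean 10 and variance 9, so sigma = sqrt(9) = 3.
dist = NormalDist(mu=10, sigma=sqrt(9))

percentile_75 = dist.inv_cdf(0.75)

print(round(percentile_75, 3))
```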
5. In a certain city, it is estimated that 60% of households have access to the internet. A
company wishing to sell services to internet users randomly chooses 150 households in
the city and sends them advertising material.
(a) Calculate the probability that fewer than 90 contacted households have
internet access.
(b) Calculate the probability that between 60 and 100 (inclusive) contacted
households have internet access.
(c) There is an 80% chance (probability of .8) that the number of contacted
households with internet access equals or exceeds what value?
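The count of households with internet access is binomial with n = 150 and p = 0.6, and a normal distribution with matching mean and standard deviation is a common approximation. The sketch below compares the exact binomial probability for part (a) with the approximation, with and without a continuity correction; names are illustrative.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 150, 0.6
mean = n * p
sd = sqrt(n * p * (1 - p))
approx = NormalDist(mean, sd)

def binom_cdf(k):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

exact_fewer_than_90 = binom_cdf(89)          # P(X < 90) = P(X <= 89)
approx_no_correction = approx.cdf(90)        # normal approximation, no correction
approx_with_correction = approx.cdf(89.5)    # with continuity correction

print(exact_fewer_than_90, approx_no_correction, approx_with_correction)
```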
6. Using your personalized Course Project data:
(a) Calculate the sample averages of all variables. Which of these averages are
meaningful? Express the meaning of each average in words that are
understandable and effective for a layperson such as your client.
(b) Do you need to manipulate the raw data provided, before proceeding to
statistical analyses, in order to address the client's question? If so, how?
7. Work through problem 28 on page 264 of Sharpe (Chapter 7), referring to the 68-95-99.7
Rule explained on pages 239-240 of Sharpe.

8. UNSW wants to measure the attractiveness of its brand to potential students. The
university performs an experiment by inviting 100 high school students from different
public schools across New South Wales to browse a few websites related to different
universities, and then to choose the one that they would prefer most.
(a) Is this a random sample? Can you think of any potential source of selection bias?
(b) Suppose that a perfectly random sample of students is drawn from the target
population, and these students take part in the exercise described above. With
reference to the brief discussion on page 732 of Sharpe (Confounding and
Lurking Variables), can you think of any confounding factors, that is, factors
that might lead to lack of confidence in using students' expressed preferences, as
measured in this exercise, as an indicator of their degree of overall attraction to
the UNSW brand?
(c) Suppose that the exercise described in part (b) is conducted. The resulting data
include each student's high school, the selection of universities whose websites
they browsed, and the one amongst those that they chose as their most-preferred
university. Sketch on a piece of paper or in an Excel sheet what these data
would look like once they are made ready for quantitative analysis.
(d) Add to the display in part (c) any additional variables that your answer to part
(b) indicated you might like to have access to. Show these variables in a form
that is analysis-ready.
(e) Suppose you had access to the expanded data set constructed in part (d).
Describe what sort of analyses you could conduct that might help to shed light
on UNSW's core question about the attractiveness of its brand.
(f) Based on your analysis, what would you be able to tell UNSW leadership about
the core drivers of its brand appeal, that is, what it is about UNSW that students
are drawn to?
9. Work through problem 22 on page 325 of Sharpe (Chapter 9).

10. Work through problem 44 on page 328 of Sharpe (Chapter 9).

11. Work through problem 60 on page 329 of Sharpe (Chapter 9).

12. Work through problem 36 on page 356 of Sharpe (Chapter 10).

Weeks 7 and 8
1. Suppose a normally distributed random variable X has a mean of 50 and a variance of 100.
Also suppose a sample of size 16 is drawn from this population. Calculate the following
probabilities:

(a) $P(40 < X < 55)$
(b) $P(40 < \bar{X} < 55)$
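The difference between the two parts is the standard deviation used: a single observation has standard deviation $\sigma$, while the mean of n observations has standard deviation $\sigma/\sqrt{n}$. A minimal sketch using the standard library, with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu, variance, n = 50, 100, 16

population = NormalDist(mu, sqrt(variance))       # distribution of a single X
sample_mean = NormalDist(mu, sqrt(variance / n))  # distribution of the mean of 16 draws

p_single = population.cdf(55) - population.cdf(40)
p_mean = sample_mean.cdf(55) - sample_mean.cdf(40)

print(round(p_single, 4), round(p_mean, 4))
```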

2. Recall the Anzac Garage data used previously. These data are available from the
course website (in the Tutorial Questions and Information folder) in an Excel file
called AnzacG.xls. Use these 117 observations on used passenger cars to find the 95%
confidence interval for the population mean distance travelled by used passenger cars
(this variable is labelled odometer in the data set and is measured in kilometres).
Assume the population standard deviation is 60,000 kms.
3. What would be the effects on the width of the confidence interval calculated in the
previous question of:
(a) a decrease in the level of confidence used?
(b) an increase in sample size?
(c) an increase in the population standard deviation?
(d) an increase in the sample standard deviation?
(e) an increase in the value of $\bar{X}$ found?

4. Again referring to the data in odometer from AnzacG.xls and the population from
which it is drawn, determine the sample size required to estimate the population mean
to within 5,000 kms with 90% confidence. Again assume the population standard
deviation is 60,000 kms.
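With a known population standard deviation, the usual sample-size formula is $n = (z_{\alpha/2}\,\sigma / E)^2$, rounded up, where E is the desired margin of error. The sketch below applies it; the function name and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def required_sample_size(sigma, margin, confidence):
    """Smallest n with z * sigma / sqrt(n) <= margin, for known sigma."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((z * sigma / margin) ** 2)

n_required = required_sample_size(sigma=60_000, margin=5_000, confidence=0.90)

print(n_required)
```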

5. Perform the following hypothesis tests of the population mean. In each case, draw a
picture to illustrate the rejection regions on both the Z and $\bar{X}$ distributions, and
calculate the p-value of the test.
(a) $H_0: \mu = 50$, $H_1: \mu > 50$, $n = 100$, $\bar{X} = 55$, $\sigma = 10$, $\alpha = 0.05$
(b) $H_0: \mu = 25$, $H_1: \mu < 25$, $n = 100$, $\bar{X} = 24$, $\sigma = 5$, $\alpha = 0.1$
(c) $H_0: \mu = 80$, $H_1: \mu \ne 80$, $n = 100$, $\bar{X} = 80.5$, $\sigma = 4$, $\alpha = 0.05$
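A one-sample z test with known standard deviation can be sketched as a small function. The sketch assumes the garbled symbols in the three settings are the sample mean, the population standard deviation and the significance level, in that order; the function name and the `tail` argument are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()

def z_test(mu0, xbar, sigma, n, alpha, tail):
    """Return (z, p_value, reject_h0); tail is 'upper', 'lower' or 'two'."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    if tail == "upper":
        p_value = 1 - STD_NORMAL.cdf(z)
    elif tail == "lower":
        p_value = STD_NORMAL.cdf(z)
    else:
        p_value = 2 * (1 - STD_NORMAL.cdf(abs(z)))
    return z, p_value, p_value < alpha

result_a = z_test(50, 55, 10, 100, 0.05, "upper")
result_b = z_test(25, 24, 5, 100, 0.10, "lower")
result_c = z_test(80, 80.5, 4, 100, 0.05, "two")

for label, result in [("a", result_a), ("b", result_b), ("c", result_c)]:
    print(label, result)
```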
6. A real estate expert claims the current mean value of houses in a particular area is
more than $250,000. A random sample of 150 recent sales prices in the area yields a
sample mean of $265,000. It is known that house values in the area are
approximately normally distributed with a standard deviation of $50,000.
(a) Perform an upper tail test of the null hypothesis that the population mean house
value in the area is $250,000. Use a 5% level of significance and state the
rejection (critical) region in terms of both $\bar{X}$ and z.
(b) Why is an upper tail test most appropriate in this case?
(c) What is the p-value associated with the test statistic used in the part (a) test?
Interpret this value.
(d) Define in words the type I and II errors that could afflict the part (a) test.
7. What effect does increasing the sample size have on the outcome of a hypothesis test?
Explain your answer using the example of a one-tail test concerning the mean of a
normally distributed population with known variance.
8. Work through problem 40 on page 420 of Sharpe (Chapter 12).

Recalling Exercise 39:

Then, re-do the analysis with all settings the same except supposing that:
c) The professor's students scored 108 points on the final exam, having used the
software (and nothing else changed).
d) The number of students enrolled in the course decreased from 481 to 210
(and nothing else changed).
e) The standard deviation of the students' scores increased from 6.3 to 25.2
points (and nothing else changed).

9. Project Review: For the course project, you are only expected to use statistical
methods covered in lectures and tutorials up to and including those in Week 9. Thus
you should now have sufficient material to complete the project in a timely fashion.

What might be useful at this stage is to think about presentation. See the Examples of
Statistical Reports section of the Project folder on Moodle for some ideas in general.
As a directed exercise for this tutorial, compare and contrast the presentation of
material in the NSW BOCSAR report on driving under the influence of cannabis
(driving-cannabis.pdf) and Queensland Office of Economic and Statistical Research
bulletin on computer and internet usage in Queensland
(computer-internet-useage-qld-c01.pdf). You should be able to read these reports
comfortably, although there are
a few methods that may be unfamiliar in the cannabis report (although these methods
will be covered later in the course).
Weeks 9 and 10
1. State whether the normal distribution, the t distribution, or neither would be the right
type of sampling distribution to assume for the sample mean in order to test
hypotheses regarding the population mean in the following situations:
(a) Population variable normally distributed, $\sigma^2$ unknown, sample size less than 30.
(b) Population variable normally distributed, $\sigma^2$ unknown, sample size greater than 30.
(c) Population variable normally distributed, $\sigma^2$ known, sample size less than 30.
(d) Population variable not normally distributed, $\sigma^2$ unknown, sample size greater than 30.
(e) Population variable not normally distributed, $\sigma^2$ unknown, sample size less than 30.

2. Reconsider the example used earlier in the course in which a real estate expert
claimed the current mean value of houses in a particular area was more than
$250,000. A random sample of 150 recent sales prices in the area yielded a sample
mean of $265,000, and it is known that house values in the area are approximately
normally distributed with a standard deviation of $50,000.
(a) If in fact the population mean house value in the area is $260,000, what is the
probability of committing a type II error in performing an upper-tail test of the
null hypothesis that the mean house value in the area is $250,000, as was
done in Part (a) of the prior weeks' exercise? What is the power of the test in
these circumstances? State in words what the power of the test means.
(b) Illustrate your answer to part (a) above by showing on a diagram the areas
representing the probability of a type II error and the power of the test.
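The type II error probability can be found in two steps: locate the critical sample mean under the null hypothesis, then ask how likely the sample mean is to fall below it when the true mean is $260,000. A sketch with illustrative variable names:

```python
from math import sqrt
from statistics import NormalDist

mu0, mu_true = 250_000, 260_000
sigma, n, alpha = 50_000, 150, 0.05

standard_error = sigma / sqrt(n)

# Step 1: critical value of the sample mean for the upper-tail test under H0.
critical_xbar = mu0 + NormalDist().inv_cdf(1 - alpha) * standard_error

# Step 2: beta = P(sample mean below the critical value | true mean = mu_true).
beta = NormalDist(mu_true, standard_error).cdf(critical_xbar)
power = 1 - beta

print(round(critical_xbar), round(beta, 4), round(power, 4))
```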
3. A company running an urban rail service wishes to estimate its daily average number
of late-running trains on weekdays. For 10 randomly selected weekdays, it finds the
following numbers of late running trains:
32, 10, 9, 18, 25, 15, 14, 18, 22, 16
(a) Assuming the number of late running trains on a weekday is approximately
normally distributed, calculate a 90% confidence interval for the mean number
of late-running trains on a weekday.
(b) If we did not have the assumption of normality, could we still calculate a
confidence interval in this example? If not, suggest a way of overcoming this
problem.
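With a small sample and unknown population variance, the interval is built from the sample standard deviation and a t critical value. The Python standard library has no t distribution, so the sketch below hard-codes the critical value 1.833 (upper 5% point with 9 degrees of freedom, as read from a standard t table); that constant is an assumption supplied for illustration.

```python
from math import sqrt
from statistics import mean, stdev

late_trains = [32, 10, 9, 18, 25, 15, 14, 18, 22, 16]

# Assumption: t critical value for 90% confidence with n - 1 = 9 degrees of
# freedom, taken from a standard t table.
T_CRIT_90_DF9 = 1.833

xbar = mean(late_trains)
s = stdev(late_trains)  # sample standard deviation (divisor n - 1)
half_width = T_CRIT_90_DF9 * s / sqrt(len(late_trains))
ci_90 = (xbar - half_width, xbar + half_width)

print(xbar, round(s, 4), tuple(round(v, 2) for v in ci_90))
```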

4. Reconsider the question from a previous week that used the Anzac Garage data,
available from the course website (in the Tutorial Questions and Information folder)
in an Excel file called Anzacg.xls. Would normality be a good approximation for the
population distribution of distance travelled by used passenger cars? (Hint: look at
the summary statistics and a histogram.) Do you need to assume normality? Redo the
95% confidence interval for the population mean distance travelled by used passenger
cars without assuming a known population standard deviation.
5. It is known that 80% of people suffering from a particular disease are cured by a
certain standard medication. Test the claim of the developers of a new medication
that their product is more effective than the standard medication in curing the disease,
using a 5% significance level, given a random sample of 400 people with the disease
of whom 330 are cured by using the new medication. (Hint: Use the normal
approximation, and ignore the continuity correction.)
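Following the hint, the test can be sketched with the normal approximation and no continuity correction. The standard error is computed under the null hypothesis value of 0.80; variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

p0, n, cured, alpha = 0.80, 400, 330, 0.05

p_hat = cured / n
standard_error = sqrt(p0 * (1 - p0) / n)  # evaluated at the null value p0
z = (p_hat - p0) / standard_error

# Upper-tail test: H0: p = 0.80 against H1: p > 0.80.
p_value = 1 - NormalDist().cdf(z)
reject_h0 = p_value < alpha

print(round(z, 3), round(p_value, 4), reject_h0)
```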
6. Download the data Credit_Card_Bank from the MyStatLab website (available
under the heading of Chapter 1: Data and Decisions). Using the variables Offer
Status and Spendlift Positive, conduct the appropriate Chi-squared test to
determine whether there is a relationship between the type of offer a customer
was exposed to and whether a lift in spending was observed, assuming a significance
level of 0.05. Interpret your results.
7. Use a calculator to compute the sample least squares regression line for the
model $y = \beta_0 + \beta_1 x + \varepsilon$, given the following six observations:

y    2    8    6    12    9     11
x    1    4    3    10    10    8
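The calculator steps follow the usual least squares formulas $b_1 = S_{xy}/S_{xx}$ and $b_0 = \bar{y} - b_1\bar{x}$. The sketch below mirrors them, assuming the y and x values are paired in the order listed.

```python
# Assumption: observations are paired in the order listed (first y with first x, etc.).
y = [2, 8, 6, 12, 9, 11]
x = [1, 4, 3, 10, 10, 8]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sums of cross-products and squares of deviations from the means.
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
s_xx = sum((xi - x_bar) ** 2 for xi in x)

b1 = s_xy / s_xx          # estimated slope
b0 = y_bar - b1 * x_bar   # estimated intercept

print(round(b0, 4), round(b1, 4))
```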
8. Suppose the relationship between the dependent variable weekly household
consumption expenditure in dollars (y) and the independent variable weekly
household income in dollars (x) is represented by the simple regression model (i
refers to the ith observation or household):
$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$

Suppose a sample of observations yields least squares estimates of b0 = -32 and b1 =
0.82 for this model.
(a) What does $\varepsilon_i$ represent in the model?
(b) State the basic (classical) assumptions made about the $\varepsilon_i$'s in this model. Explain
in words what the assumptions mean.
(c) Does the estimate of b0 = -32 make sense? If not, does this necessarily invalidate
the model? Explain your answer.
(d) Interpret both $\beta_1$ and b1. What does the model predict would be the change in y
following a $10 increase in x from some initial level?
(e) Suppose we measured y and x in cents rather than dollars. What effect would
this have on the estimated coefficient of x? What effect would it have on the
estimated intercept?
(f) Suppose y were measured in dollars but x were measured in cents. What effects
would this have on the estimated coefficient of x?
(g) Distinguish between $\varepsilon_i$ and $e_i$ (the residual associated with observation i).
Illustrate your answer with a diagram.

9. Work through problem 16 on pages 529-530 of Sharpe (Chapter 15).
