
DESCRIPTIVE STATISTICS

1. Indicate whether the following variables are categorical or quantitative:


a. Favorite food.
b. Favorite profession.
c. Number of goals scored by your favorite team last season.
d. Number of students at your school.
e. The eye color of your classmates.
f. IQ of your classmates
2. Indicate whether the following variables are discrete or continuous:
a. Number of stocks sold every day on the stock exchange.
b. Hourly temperatures recorded at an observatory.
c. Lifetime of a car.
d. The diameter of the wheels of several cars.
e. Number of children from 50 families.
f. Annual Census of Americans.
3. Classify the following variables as categorical, quantitative discrete or continuous.
a. The nationality of a person.
b. Number of liters of water contained in a tank.
c. Number of books on a library shelf.
d. Sum of points tallied from a set of dice.
e. The profession of a person.
f. The area of the different tiles on a building.
4. The marks obtained by a group of students in a test are:
15, 20, 15, 18, 22, 13, 13, 16, 15, 19, 18, 15, 16, 20, 16, 15, 18, 16, 14, 13.
Construct a frequency distribution table for the data and draw the corresponding
frequency polygon.
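A frequency table of this kind can be sketched in a few lines. The snippet below is only an illustration, assuming the usual columns (absolute frequency fi, cumulative frequency Fi, relative frequency ni); the variable names are hypothetical.

```python
from collections import Counter

marks = [15, 20, 15, 18, 22, 13, 13, 16, 15, 19,
         18, 15, 16, 20, 16, 15, 18, 16, 14, 13]

freq = Counter(marks)      # absolute frequency fi of each distinct mark
n = len(marks)

rows = []                  # each row: (mark, fi, Fi, ni)
cumulative = 0
for mark in sorted(freq):
    cumulative += freq[mark]
    rows.append((mark, freq[mark], cumulative, freq[mark] / n))

for row in rows:
    print(row)
```

The (mark, fi) pairs are also the points that a frequency polygon joins.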
5. Given the following series:
3, 3, 4, 3, 4, 3, 1, 3, 4, 3, 3, 3, 2, 1, 3, 3, 3, 2, 3, 2, 2, 3, 3, 3, 2, 2, 2, 2, 2, 3, 2, 1, 1, 1, 2,
2, 4, 1.
Construct a frequency distribution table for the data and draw the corresponding bar
chart.
6. Given the following series:
5, 2, 4, 9, 7, 4, 5, 6, 5, 7, 7, 5, 5, 2, 10, 5, 6, 5, 4, 5, 8, 8, 4, 0, 8, 4, 8, 6, 6, 3, 6, 7, 6, 6, 7,
6, 7, 3, 5, 6, 9, 6, 1, 4, 6, 3, 5, 5, 6, 7.
Construct a frequency distribution table for the data and draw the corresponding bar chart.
7. The weights of 65 children are represented by the following table:

Weight [50, 60) [60, 70) [70, 80) [80,90) [90, 100) [100, 110) [110, 120)
fi 8 10 16 14 10 5 2
a. Construct the frequency table.
b. Plot the histogram and frequency polygon.
8. 40 students in a class have obtained the following test scores out of 50.
3, 15, 24, 28, 33, 35, 38, 42, 23, 38, 36, 34, 29, 25, 17, 7, 34, 36, 39, 44, 31, 26, 20,
11, 13, 22, 27, 47, 39, 37, 34, 32, 35, 28, 38, 41, 48, 15, 32, 13.
a. Construct the frequency table.
b. Draw the histogram and frequency polygon.
9. Given the statistical distribution of the table.
xi 61 64 67 70 73
fi 5 18 42 27 8
Calculate:
a. The mode, median and mean.
b. The range, average deviation, variance and standard deviation.
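One possible way to check the answers for a table of values xi with frequencies fi is sketched below. It assumes population formulas (dividing by N) and takes the median as the first value whose cumulative frequency reaches N/2, which is adequate here because no cumulative frequency equals N/2 exactly.

```python
xi = [61, 64, 67, 70, 73]
fi = [5, 18, 42, 27, 8]

n = sum(fi)
mean = sum(x * f for x, f in zip(xi, fi)) / n
mode = xi[fi.index(max(fi))]          # value with the highest frequency

# Median: first value whose cumulative frequency reaches n / 2.
cumulative = 0
median = None
for x, f in zip(xi, fi):
    cumulative += f
    if cumulative >= n / 2:
        median = x
        break

value_range = max(xi) - min(xi)
avg_dev = sum(f * abs(x - mean) for x, f in zip(xi, fi)) / n
variance = sum(f * (x - mean) ** 2 for x, f in zip(xi, fi)) / n
std_dev = variance ** 0.5

print(mean, median, mode, value_range, avg_dev, variance, std_dev)
```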
10. Calculate the mean, median and mode for the following set of numbers: 5, 3, 6, 5, 4,
5, 2, 8, 6, 5, 4, 8, 3, 4, 5, 4, 8, 2, 5, 4.
11. Find the variance and standard deviation for the following data series:
12, 6, 7, 3, 15, 10, 18, 5.
12. Find the mean, median and mode for the following set of numbers:
3, 5, 2, 6, 5, 9, 5, 2, 8, 6.
13. Find the average deviation, variance and standard deviation for each of the following
series of numbers:
a. 2, 3, 6, 8, 11.
b. 12, 6, 7, 3, 15, 10, 18, 5.
14. The test results from a group of employees from a factory are represented in the
following table:
fi
[38, 44) 7
[44, 50) 8
[50, 56) 15
[56, 62) 25
[62, 68) 18
[68, 74) 9
[74, 80) 6

a. Draw the histogram and the cumulative frequency polygon.
b. Calculate the mean, median and mode
15. Given the series:
a. 3, 5, 2, 7, 6, 4, 9.
b. 3, 5, 2, 7, 6, 4, 9, 1.
Calculate, for each series:
The mode, median and mean.
The average deviation, variance and standard deviation.
16. A statistical distribution is given by the following table:
[10, 15) [15, 20) [20, 25) [25, 30) [30, 35)
fi 3 5 7 4 2
Calculate:
The mode, median and mean.
The range, average deviation and variance.
17. Given the statistical distribution:
[0, 5) [5, 10) [10, 15) [15, 20) [20, 25) [25, ∞)
fi 3 5 7 8 2 6
Calculate the mode.

18. The numbers 4.47 and 10.15 are added to a set of 5 numbers whose mean is 7.31.
What is the mean of the new set of numbers?
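The reasoning behind this exercise is that a mean can be updated from the running total without knowing the individual values. A small sketch, with hypothetical variable names:

```python
old_n, old_mean = 5, 7.31
added = [4.47, 10.15]

# Recover the old total from the mean, add the new values, divide by the new count.
new_total = old_n * old_mean + sum(added)
new_mean = new_total / (old_n + len(added))

print(round(new_mean, 2))
```

The two added numbers average 7.31 themselves, so the mean does not move.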
19. A dentist records the number of cavities in 100 children from a school. The
information obtained is summarized in the following table:
No. of cavities fi ni
0 25 0.25
1 20 0.2
2 x z
3 15 0.15
4 y 0.05
a. Complete the table to obtain the values of x, y, z.
b. Create a pie chart.
c. Calculate the average number of cavities.
20. Given the set:
10, 13, 4, 7, 8, 11, 10, 16, 18, 12, 3, 6, 9, 9, 4, 13, 20, 7, 5, 10, 17, 10, 16, 14,
8, 18

Find their median and quartiles.
21. A pediatrician has obtained the following table, which represents the number of
children who begin to walk for the first time at different ages:
Months Children
9 1
10 4
11 9
12 16
13 11
14 8
15 1
a. Draw the respective frequency polygon.
b. Calculate the mode, median, mean and variance.
22. Complete the missing data in the following statistical table:
xi fi Fi ni
1 4 0.08
2 4
3 16 0.16
4 7 0.14
5 5 28
6 38
7 7 45
8
Also, calculate the mean, median and mode of this distribution.
23. Consider the following data: 3, 8, 4, 10, 6, 2.
a. Calculate its mean and variance.
b. If all the above data were multiplied by 3, what would the new mean and variance be?
24. The result of throwing two dice 120 times is represented by the table:
Sums          2  3  4  5   6   7   8   9   10  11  12
No. of Times  3  8  9  11  20  19  16  13  11  6   4
a. Calculate the mean and standard deviation.
b. Find the percentage of values in the interval (x − σ, x + σ).
25. The heights of the players (in centimeters) from a basketball team are represented by
the table:

Height          [170, 175)  [175, 180)  [180, 185)  [185, 190)  [190, 195)  [195, 200)
No. of players  1           3           4           8           5           2
Calculate:
a. The mean.
b. The median.
c. The standard deviation.
d. How many players are above the mean plus one standard deviation?
26. The result of throwing a die 200 times is represented by the following table:
1 2 3 4 5 6
fi a 32 35 33 b 35
Determine the value of a and b knowing that the average score is 3.6.
27. The following graph is a histogram representing the weight of 100 children:
a. Create the respective table of distribution.
b. If John weighs 72 pounds, how many students
are lighter than he is?
c. Calculate the mode.
d. Calculate the median.
e. In what quartile are 25% of the heaviest pupils
found?
28. Given the absolute cumulative frequency table:
Age Fi
[0, 2) 4
[2, 4) 11
[4, 6) 24
[6, 8) 34
[8, 10) 40
a. Calculate the arithmetic mean and standard deviation.
b. Calculate the difference between the values that bound the 10 central ages.
c. Create the respective absolute cumulative frequency polygon.
29. Person A has a height of 1.75 meters and lives in a city where the average height is
1.60 meters and the standard deviation is 20 centimeters. Person B is 1.80 meters and
lives in a city where the average height is 1.70 meters and the standard deviation is 15
centimeters. Which of the two is considered to be taller compared to their fellow
citizens?
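Comparisons like this are usually made with standardized scores, z = (x − mean) / standard deviation. A minimal sketch, assuming all heights are expressed in meters:

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations that value lies above the mean."""
    return (value - mean) / std_dev

z_a = z_score(1.75, 1.60, 0.20)   # person A, city A
z_b = z_score(1.80, 1.70, 0.15)   # person B, city B

taller_relative = "A" if z_a > z_b else "B"
print(z_a, z_b, taller_relative)
```

The same function applies to the two test scores in the next exercise.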

30. A teacher distributed two tests to a group of 40 pupils and obtained the following
results:
A mean of 6 for the first test with a standard deviation of 1.5 and a mean of 4 for
the second test with a standard deviation of 0.5.
A pupil obtains a score of 6 in the first test and 5 in the second. With regard to
the rest of the group: in which of the two tests did he obtain a better score?
31. The attendance at 4 cinema halls on a given day was 200, 500, 300 and 1,000 people.
a. Calculate the dispersion of the number of attendees.
b. Calculate the coefficient of variation.
c. If there were 50 attendees more in each room on the same day, what effect would it
have on the dispersion?
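The effect of adding a constant can be checked numerically. The sketch below assumes "dispersion" means the population standard deviation (`statistics.pstdev`) and that the coefficient of variation is the standard deviation divided by the mean.

```python
from statistics import fmean, pstdev

attendance = [200, 500, 300, 1000]
shifted = [a + 50 for a in attendance]      # 50 more attendees in each hall

std = pstdev(attendance)                    # population standard deviation
cv = std / fmean(attendance)                # coefficient of variation

std_shifted = pstdev(shifted)
cv_shifted = std_shifted / fmean(shifted)

print(std, cv, std_shifted, cv_shifted)
```

Shifting every value leaves the standard deviation unchanged but lowers the coefficient of variation, because only the mean grows.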
32. State whether each of the following describes a study measuring qualitative or
quantitative data.
a. A researcher distributed open-ended questions to participants asking how they
feel when they are in love.
b. A researcher records the blood pressure of participants during a task meant to
induce stress.
c. A psychologist interested in drug addiction injects rats with an
attention-inducing drug and then measures the rate of lever pressing.
d. A witness to a crime gives a description of the suspect to the police.
33. State whether each of the following are continuous or discrete data:
a. Time in seconds to memorize a list of words
b. Number of students in a statistics class
c. The weight in pounds of newborn infants
d. The SAT score among college students.
34. Fill in the table below to identify the characteristics of each variable:
Variable    Type of data (qualitative vs. quantitative)    Type of number (continuous vs. discrete)    Scale of measurement
Gender
Seasons
Time of day
Rating scale score
Movie ratings (one to four stars)
Number of students in your class
Temperature (degrees Fahrenheit)
Time (in minutes) to prepare dinner
Position standing in line
35. The table below shows the daily sales (VND 1,000) of an internet cafe in November 2009:
700 940 765 860 870 890
950 650 750 850 855 780
760 735 600 780 920 690
620 730 830 860 750 1000
740 800 750 680 880 790
a. Represent the data using grouped frequency distribution with 4 classes and same
class width.
b. Determine the percentage of days with daily sales of VND 800,000 or more.
c. Represent the data arranged by appropriate diagram.
36. Here are the raw data of total export value (USD million) of 30 enterprises in Hanoi
in 2010:
65 65 58 77 67 68
45 57 74 52 80 61
56 70 40 72 65 78
42 65 57 52 45 66
57 69 50 65 66 65
a. Rearrange the data.
b. Represent the arranged data diagrammatically in an appropriate way.
c. Comment on the presentation.
37. You are given the following data
6 10 6 4 9 5 5 5 5 7 6 2 5
5 5 5 7 8 7 6 7 5 4 6 4 4
5 7 3 6 4 7 4 4
a. Construct a frequency distribution for these data
b. Based on the frequency distribution, develop a histogram
38. A data set has 200 observations. The maximum value in the data set is $16.300 and the minimum value

a. Use Sturges’s rule to determine the number of classes that you will use
b. Based on the number of classes determined in part a, indicate the class width for each class.
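Sturges's rule suggests about 1 + log2(n) classes for n observations. The sketch below assumes the result is rounded up to a whole number; `class_width` is a hypothetical helper that takes the minimum and maximum as arguments, since the minimum is not given in the statement above.

```python
import math

def sturges_classes(n):
    """Number of classes suggested by Sturges's rule, rounded up (assumption)."""
    return math.ceil(1 + math.log2(n))

def class_width(minimum, maximum, classes):
    """Equal class width covering the range from minimum to maximum."""
    return (maximum - minimum) / classes

k = sturges_classes(200)
print(k)
```

In practice the width is often rounded up to a convenient number so that the classes cover the whole range.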

39. Here are the sales of bicycles (number of bicycles) over 40 weeks in a shop, as follows:

44 77 40 80 68 44 50 49 56 68
56 48 74 54 78 58 46 79 64 71
62 52 74 63 58 73 64 46 56 69
62 48 66 58 51 70 47 67 52 58

Question:
a. Create the frequency distribution with 4 classes. What is the class width h?
b. Determine the mean, mode, median (Me) and standard deviation (σ) of the number of bicycles
sold per week.
c. Draw a conclusion about the shape of this distribution (moderately left skewed,
moderately right skewed, or symmetric?)

40. The table below shows the age of workers in a company:

Age            No. of workers
20-24          973
25-29          1122
30-34          414
35-39          257
40-44          258
45-49          369
50-54          592
55-59          825
60 and over    210

Question:
a. Determine the average age of the workers and the standard deviation of age.
b. Suppose you are the HR manager of this company and want to set a policy that meets the
demands of the workers. Research has identified 3 groups of demands:
- Group 1: People aged 20-34: need for a salary increase
- Group 2: People aged 35-49: need for more time for relaxation and entertainment
- Group 3: People aged 50 and over: need for more benefits after retirement
Which group's demand would you choose to satisfy? Why?
41. The table below shows the average income of workers in 3 companies that produce the
same type of product, over 12 months of 2009 (unit: million VND):

A 3.3 1.8 2 2.1 2.4 1.8 2.1 2.2 2.4 2.3


B 3 2 2.1 2.2 2.3 2 2 2.1 2.3 2.3
C 2.8 2.1 2 2.4 2.3 2 2.2 2.3 2.2 2.4

a. Calculate the average monthly income of a worker in each company


b. Determine the median of monthly income of workers in each company
c. Determine the standard deviation of monthly income of workers in every company.
d. Which company should we apply to, if other conditions are similar (mobile phone
support…)?

42. Determine the average income and the variance of income for both groups in a factory, and
draw a conclusion:

Group 1                                        Group 2
Income (1000 VND)   No. of workers (people)    Income (1000 VND)   No. of workers (people)
1200                3                          1500                6
1500                8                          1800                11
2100                10                         2100                7
2200                6                          2200                4
2500                3                          2500                2

43. The average weight of students in a class is 55 kg with a standard deviation of 6 kg.
The average height of the students is 165 cm with a standard deviation of 8 cm.
Are the students in the class more similar in terms of weight or height?

The table below gives the average price and the standard deviation of the price of 2 types of stocks over a period.
Comment on the variability (dispersion) of the two stocks.

Type Average Price (vnd/share) Standard Deviation (vnd/share)
A 20000 4000
B 300000 40000

44. Productivity of workers in one company


Productivity (items/h) No of workers
35-40 10
40-45 20
45-50 30
50-60 35
60-80 5
a. What is the average productivity per worker?
b. What are the mode and the median of productivity?
INFERENTIAL STATISTICS
Confidence Interval Problems
1. A study is conducted in a neighborhood to better understand the types of recreational
activities. 100 individuals are selected at random and surveyed.
It is known that 2,500 children, 7,000 adults and 500 elderly live in the neighborhood.
Therefore, the researchers decide to choose the previous sample using stratified
sampling, as it is known that the recreations of the inhabitants change with age.
Knowing this, determine the sample size for each stratum.

Total population: 2,500 + 7,000 + 500 = 10,000.
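With proportional allocation, each stratum receives a share of the sample equal to its share of the population. A minimal sketch, with hypothetical stratum labels:

```python
strata = {"children": 2500, "adults": 7000, "elderly": 500}
sample_size = 100

total = sum(strata.values())

# Proportional allocation: n_h = n * N_h / N for each stratum h.
allocation = {name: round(sample_size * size / total)
              for name, size in strata.items()}

print(allocation)
```

When the shares do not come out as whole numbers, the rounded values may need a small manual adjustment so that they still add up to the sample size.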

2. Given the population of elements: {22, 24, 26}.

a. Write down all possible samples of size two, chosen by simple random sampling.
b. Calculate the variance of the population.
c. Calculate the variance of the sample averages.


3. The height of students studying at a language school follows a normal distribution
with a mean of 1.62 m and a standard deviation of 0.12 m. What is the probability that the
mean of a random sample of 100 students will be taller than 1.60 m?
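The sample mean of n observations from a normal population is itself normal, with the same mean and standard deviation σ/√n. The sketch below uses `statistics.NormalDist` from the Python standard library to evaluate the probability; the variable names are hypothetical.

```python
from statistics import NormalDist

mu, sigma, n = 1.62, 0.12, 100

# Sampling distribution of the mean: N(mu, sigma / sqrt(n)).
sampling_dist = NormalDist(mu, sigma / n ** 0.5)

p_taller = 1 - sampling_dist.cdf(1.60)
print(round(p_taller, 4))
```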

4. A sample of the various prices for a particular product has been conducted in 16
stores that were selected at random in a neighborhood of a city. The following prices
were noted:
95, 108, 97, 112, 99, 106, 105, 100, 99, 98, 104, 110, 107, 111, 103, 110.
Assuming that the prices of this product follow a normal law of variance of 25 and an
unknown mean:
a. What is the distribution of the sample mean?
b. Determine the confidence interval at 95% for the population mean.

2.Determine the confidence interval at 95% for the population mean.

95% → zα/2 = 1.96

(104 − 1.96 · 1.25, 104 + 1.96 · 1.25) = (101.55; 106.45)
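The interval above can be reproduced as follows. This is a sketch assuming a known population variance of 25, so the z-based interval x̄ ± z·σ/√n applies; the critical value comes from `NormalDist.inv_cdf` rather than a table.

```python
from statistics import NormalDist, fmean

prices = [95, 108, 97, 112, 99, 106, 105, 100,
          99, 98, 104, 110, 107, 111, 103, 110]

sigma = 25 ** 0.5                      # known population standard deviation
n = len(prices)
mean = fmean(prices)

z = NormalDist().inv_cdf(0.975)        # two-sided 95% critical value, about 1.96
margin = z * sigma / n ** 0.5
interval = (mean - margin, mean + margin)

print(mean, interval)
```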

5. The average height of a random sample of 400 people from a city is 1.75 m. It is
known that the heights of the population are random variables that follow a normal
distribution with a variance of 0.16.
a. Determine the interval of 95% confidence for the average heights of the population.
b. With a confidence level of 90%, what would the minimum sample size need to be in
order for the true mean of the heights to be less than 2 cm from the sample mean? (1090)

1.Determine the interval of 95% confidence for the average heights of the population.

n = 400 x = 1.75 σ = 0.4

1 − α = 0.95 zα/2 = 1.96

(1.75 ± 1.96 · 0.4/20 ) → (1.7108,1.7892)

6. The monthly sales of an appliance shop are distributed according to a normal law,
with a standard deviation of $900. A statistical study of sales in the last nine months has
found a confidence interval for the mean of monthly sales with extremes of $4,663 and
$5,839.
a. What were the average sales over the nine-month period?
b. What is the confidence level for this interval?

1. What were the average sales over the nine month period?

n=9 x = (4,663 + 5,839)/2; x =5,251

2. What is the confidence level for this interval?

E = (5,839 − 4,663)/2 = 588

588 = z α/2 · 900/3 zα/2 = 1.96

1 − α = 0.95 → 95%
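The same steps can be run in reverse order in code: recover the sample mean and margin of error from the interval endpoints, solve for z, and convert z into a confidence level. A sketch with hypothetical variable names:

```python
from statistics import NormalDist

low, high = 4663, 5839
sigma, n = 900, 9

mean = (low + high) / 2                 # midpoint of the interval
margin = (high - low) / 2               # half-width E
z = margin / (sigma / n ** 0.5)         # E = z * sigma / sqrt(n)

confidence = 2 * NormalDist().cdf(z) - 1
print(mean, z, round(confidence, 3))
```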

7. The proportion of colorblind individuals in a population needs to be estimated by the
percentage observed in a random sample of individuals of size n.
a. If the percentage of colorblind individuals in the sample is 30%, estimate the value of
n so that, with a confidence level of 0.95, the error in the estimate is less than 3.1%.
b. If the sample size is 64 individuals, and the percentage of colorblind individuals in the
sample is 35%, determine using a significance level of 1%, the corresponding
confidence interval for the proportion of the colorblind population.

1 − α = 0.95 zα/2 = 1.96

α = 0.01 1 − α = 0.99 zα/2 = 2.575
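Using the critical values listed above, both parts can be sketched as follows. The snippet assumes the usual normal approximation for a proportion, with margin of error z·√(p(1 − p)/n), and rounds the sample size up to the next whole individual.

```python
import math

# Part a: smallest n with margin of error below 3.1% at 95% confidence.
p_a, z_95, max_error = 0.30, 1.96, 0.031
n_min = math.ceil(z_95 ** 2 * p_a * (1 - p_a) / max_error ** 2)

# Part b: 99% confidence interval for n = 64 and a sample proportion of 35%.
p_b, z_99, n_b = 0.35, 2.575, 64
margin = z_99 * math.sqrt(p_b * (1 - p_b) / n_b)
interval = (p_b - margin, p_b + margin)

print(n_min, interval)
```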

8. In a population, a random variable follows a normal distribution with an unknown
mean and a standard deviation of 2.
a. In a sample of 400 selected at random, a sample mean of 50 was obtained. Determine
the confidence interval with a confidence level of 97% for the average population.
b. With the same confidence level, what minimum sample size should it have so that the
interval width has a maximum length of 1?

9. The quantity of hemoglobin in the blood of a man follows a normal
distribution with a standard deviation of 2 g/dl.
Calculate the confidence level for a sample of 12 men, which indicates that the
population mean blood hemoglobin is between 13 and 15g/dl.

10. In a department store chain, 150 people work in human resources, 450 in sales, 200
in accounting and 100 in customer service. In order to conduct a survey, a sample of 180
workers is selected. How many employees should be selected from each department
according to the criterion of proportionality?

Hypothesis Testing Problems


1. A company that packages peanuts states that at most 6% of the peanut shells
contain no nuts. At random, 300 peanuts were selected and 21 of them were empty.

a. With a significance level of 1%, can the statement made by the company be accepted?
b. With the same sample percentage of empty nuts and 1 − α = 0.95, what sample size
would be needed to estimate the proportion of nuts with an error of less than 1%?
(cannot reject)

2. The life span of 100 W light bulbs manufactured by a particular company follows a
normal distribution with a standard deviation of 120 hours, and their mean life is guaranteed
under warranty to be at least 800 hours. At random, a sample of 50 bulbs from a lot
is selected and its mean life is found to be 750 hours. With a significance level of
0.01, should the lot be rejected for not honoring the warranty?
H0: µ ≥ 800
(reject)
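The decision can be checked with a one-sided z-test. The sketch below assumes a known σ, the null hypothesis µ ≥ 800 and a lower-tail rejection region; the variable names are hypothetical.

```python
from statistics import NormalDist

mu0, sigma = 800, 120
n, sample_mean, alpha = 50, 750, 0.01

# Test statistic under H0: mu >= 800 (lower-tail test).
z = (sample_mean - mu0) / (sigma / n ** 0.5)
z_critical = NormalDist().inv_cdf(alpha)     # about -2.33

reject_h0 = z < z_critical
print(round(z, 3), round(z_critical, 3), reject_h0)
```

For a two-sided hypothesis such as H0: µ = 2400, the comparison would be |z| against `NormalDist().inv_cdf(1 - alpha / 2)` instead.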

3. A manufacturer of electric lamps is testing a new production method that will be
considered acceptable if the lamps produced by this method result in a normal
population with an average life of 2,400 hours and a standard deviation equal to 300.
A sample of 100 lamps produced by this method has an average life of 2,320 hours. Can
the hypothesis of validity for the new manufacturing process be accepted with a risk
equal to or less than 5%?

H0: µ = 2400
(reject)

4. The quality control division of a factory that manufactures batteries suspects defects
in the production of a model of mobile phone battery, which results in a lower life
for the product. Until now, the time duration in phone conversation for the battery
followed a normal distribution with a mean of 300 minutes and a standard deviation of
30. However, in an inspection of the last batch produced before sending it to market, it
was found that the average time spent in conversation was 290 minutes in a sample of 60
batteries. Assuming that the time is still normal with the same standard deviation: Can it
be concluded that the quality control suspicions are true at a significance level of 1%?
H0 : µ ≥ 300 ?

5. It is believed that the average level of prothrombin in a normal population is
20 mg/100 ml of blood plasma with a standard deviation of 4 mg/100 ml. To
verify this, a sample is taken from 40 individuals in whom the average is 18.5 mg/100
ml. Can the hypothesis be accepted with a significance level of 5%?
1. A random sample of 25 sport utility vehicles (SUVs) of the same year and model
revealed the following miles per gallon (mpg) values:
12.4 13.0 12.6 12.1 13.1
13.0 12.0 13.1 11.4 12.6
9.5 13.25 12.4 10.7 11.7
10.0 14.0 10.9 9.9 10.2
11.0 11.9 9.9 12.0 11.3
Assume that the population for mpg for this model year is normally distributed.
Use the sample results to develop a 95% confidence interval estimate for the population
mean miles per gallon.

2. The concession managers for the Arkansas Travelers (a minor league baseball team
located in Little Rock) are interested in estimating the average amount spent on food by
fans attending the team’s Friday night home games. Suppose a random sample of 36
receipts for food orders was taken from last year’s receipts for Friday night home games
with the following food expenditures recorded:
30.50 10.63 3.77 21.90 21.95 9.65
14.31 11.39 25.36 15.79 30.88 12.20
8.48 20.70 28.54 9.13 15.54 14.95
11.96 11.91 8.28 12.87 24.26 21.04
20.08 10.08 25.37 12.02 11.61 11.22
25.36 28.07 17.71 23.00 31.79 17.70
a. Based on the sampled receipts, what is the best point estimate for the mean food
expenditures for Friday night home games?
b. Use the sample information to construct a 95% confidence interval estimate for the
true mean expenditures for Friday night home games.
c. Before the sample was taken, the food concessions manager stated that mean food
expenditures were about $19.00 per order. Does his statement seem consistent with the
results obtained in part b?

CORRELATION AND REGRESSION
1. Five children aged 2, 3, 5, 7 and 8 years old weigh 14, 20, 32, 42 and 44 kilograms
respectively.
a. Find the equation of the regression line of age on weight.
b. Based on this data, what is the approximate weight of a six-year-old child?
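A least squares line can be computed directly from the sums of squares and cross products. The sketch below assumes the goal is to predict weight from age, so weight is regressed on age; `fit_line` is a hypothetical helper.

```python
def fit_line(xs, ys):
    """Least squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

ages = [2, 3, 5, 7, 8]
weights = [14, 20, 32, 42, 44]

slope, intercept = fit_line(ages, weights)
weight_at_6 = intercept + slope * 6
print(slope, intercept, weight_at_6)
```

The same helper can be reused for the other regression exercises in this section by swapping in their data.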

2. The success of a shopping center can be represented as a function of the distance (in
miles) from the center of the population and the number of clients (in hundreds of
people) who will visit. The data is given in the table below:
No. of customers (x) 8 7 6 4 2 1
Distance (y) 15 19 25 23 34 40
a. Calculate the linear correlation coefficient.
b. If the mall is located 2 miles from the center of the population, how many customers
should the shopping center expect?
c. To receive 500 customers, at what distance from the center of the population should
the shopping center be located?
3. The grades of five students in mathematics and chemistry classes are:
Mathematics 6 4 8 5 3.5
Chemistry 6.5 4.5 7 5 4
Determine the regression lines and calculate the expected grade in chemistry for a
student who has a 7.5 in mathematics.

4. A data set has a correlation coefficient of r = −0.9, with the means of the marginal
distributions x̄ = 1 and ȳ = 2. It is known that one of the following four equations
corresponds to the regression of y on x:
y = −x + 2    3x − y = 1    2x + y = 4    y = x + 1
Select the correct line.

5. The heights (in centimeters) and weight (in kilograms) of 10 basketball players on a
team are:
Height (X) 186 189 190 192 193 193 198 201 203 205
Weight (Y) 85 85 86 90 87 91 93 103 100 101
Calculate:
a. The regression line of y on x.
b. The coefficient of correlation.
c. The estimated weight of a player who measures 208 cm.

6. From the following data of hours worked in a factory (x) and output units (y),
determine the regression line of y on x, the linear correlation coefficient and determine
the type of correlation.
Hours (X) 80 79 83 84 78 60 82 85 79 84 80 62
Production (Y) 300 302 315 330 300 250 300 340 315 330 310 240

7. A group of 50 individuals has been surveyed on the number of hours devoted each
day to sleeping and watching TV. The responses are summarized in the following table:
No. of sleeping hours (x) 6 7 8 9 10
No. of hours of television (y) 4 3 3 2 1
Absolute frequencies (fi) 3 16 20 10 1
a. Calculate the correlation coefficient.
b. Determine the equation of the regression line of y on x.
c. If a person sleeps eight hours, how many hours of TV are they expected to
watch?

8. The following table summarizes the results of an aptitude test given to six clerks to
determine the correlation between test scores (x) and sales in the first month (y) in
hundreds of dollars.
X 25 42 33 54 29 36
Y 42 72 50 90 45 48
a. Find the correlation coefficient and interpret the results.
b. Calculate the regression line of y on x and predict the sales of a vendor who obtains
47 on the test.

10. A company wants to predict the annual value of its total sales based on the national
income of the country where it does business. The relationship is represented in the
following table:
x 189 190 208 227 239 252 257 274 293 308 316
y 402 404 412 425 429 436 440 447 458 469 469
x represents the national income in millions of dollars and y represents the
company's sales in thousands of dollars in the period from 1990 to 2000 (inclusive).
Calculate:

a. The regression line of y on x.
b. The linear correlation coefficient and interpret it.
c. If in 2001, the country's national income was 325 million dollars, what would the
prediction for the company's sales be?

11. The statistical information obtained from a sample of 12 farms on the relationship
between the investment and yield in hundreds of thousands of dollars is shown in the
following table:
Investment (x) 11 14 16 15 16 18 20 21 14 20 19 11
Yield (y) 2 3 5 6 5 3 7 10 6 10 5 6
Calculate:
a. The regression line of the yield with regard to the investment.
b. The estimated investment needed to obtain a yield of $1,250,000.
12. The number of hours devoted to studying a subject and the marks obtained by eight
students in the corresponding examination is:
Hours (x) 20 16 34 23 27 32 18 22
Mark (y) 6.5 6 8.5 7 9 9.5 7.5 8
Calculate:
a. Line of regression of y on x.
b. The estimated mark a person would obtain who studied 28 hours.
13. The following table shows the age (in years) of 10 children and a quantitative
measure of their aggressive behavior (measured on a scale of 0 to 10)
Age 6 6 6.7 7 7.4 7.9 8 8.2 8.5 8.9
Aggressive behavior 9 6 7 8 7 4 2 3 3 1
a. Determine the regression line of aggressive behavior according to age.
b. From that line, determine the value of aggressive behavior that would correspond
to a child of 7.2 years.
14. The values of two variables x and y are distributed according to the following table:
y/x 100 50 25
14 1 1 0
18 2 3 0
22 0 1 2
a. Calculate the covariance.
b. Obtain and interpret the linear correlation coefficient.
c. Determine the equation of the regression line of y on x.

15. The scores obtained by a group of students in tests that measure verbal ability (X)
and abstract reasoning (Y) are represented in the following table:
y/x 20 30 40 50
(25-35) 6 4 0 0
(35-45) 3 6 1 0
(45-55) 0 2 5 3
(55-65) 0 1 2 7
a. Is there a correlation between the two variables?
b. According to the data, if one of these students obtained a score of 70 points in abstract
reasoning, what would be the estimated score in verbal ability?
16. It is determined that there is no relationship between the consumption of paper and
water in a city.
a. What is the value of the covariance of these variables?
b. What is the linear correlation coefficient?
c. Determine the equations of the two regression lines and interpret the relationship.
17. The following table shows the number of offenses committed in the past year by four
drivers of a transport company and their respective experience in years:
Years (x) 3 4 5 6
Offenses (y) 4 3 2 1
Calculate the linear correlation coefficient and interpret it.
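The linear correlation coefficient is the covariance divided by the product of the two standard deviations. A minimal sketch for this table, where `correlation` is a hypothetical helper:

```python
def correlation(xs, ys):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

years = [3, 4, 5, 6]
offenses = [4, 3, 2, 1]

r = correlation(years, offenses)
print(r)
```

Here every extra year of experience goes with exactly one offense fewer, so the points lie on a straight line with negative slope.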

18. A person has entered weekly football pools and has noted the number of correct
predictions he has made over four weeks in February. The correct predictions are
represented in the following table:
Pools (X) 6 8 6 8
No. of Correct Predictions (Y) 1 2 2 1
Determine the linear correlation coefficient and interpret it. Based on the success this
individual has experienced in February, should potential bettors have confidence in his
predictions?
Here is an estimated multiple regression model:
ŷ = b0 + b1x1 + b2x2 + b3x3
Where:
- Y: number of products sold in the month (1000 units)
- X1: number of people in the area (1000 people)
- X2: unemployment rate of the area (%)
- X3: advertising expenditure (million VND)
Question:
a. Determine the population multiple regression model used to find b0, b1, b2, b3.
b. Assume that we have already determined:
b0 = -3.4, b1 = 0.52, b2 = 0.66, b3 = 0.33
State what these figures are and explain their meaning.
c. Based on the result found in b, forecast the sales of the product in an area
where the number of people is 25,650; the unemployment rate is 5%;
and the advertising expenditure is 30.5 million VND.
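For part c, the forecast is obtained by substituting the values into the fitted equation, taking care to use the units in which the variables are defined. A sketch assuming X1 is entered in thousands of people (25,650 people = 25.65), X2 in percent and X3 in million VND:

```python
b0, b1, b2, b3 = -3.4, 0.52, 0.66, 0.33

x1 = 25650 / 1000    # population, converted to thousands of people
x2 = 5               # unemployment rate in percent
x3 = 30.5            # advertising expenditure in million VND

forecast = b0 + b1 * x1 + b2 * x2 + b3 * x3   # in thousands of units
print(round(forecast, 3))
```

Under these assumptions the forecast is about 23.3 thousand units for the month.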

EXERCISE 18
American Express Company has long believed that its cardholders tend to travel more
extensively than others – both on business and for pleasure. As part of a
comprehensive research effort undertaken by a New York market research firm on
behalf of American Express, a study was conducted to determine the relationship
between travel and charges on the American Express card. The research firm selected
a random sample of 25 cardholders from the American Express computer file and recorded
their total charges over a specified period. The data are as follows:

Travel (miles) Charges (USD) Travel (miles) Charges (USD) Travel (miles) Charges (USD) Travel (miles) Charges (USD) Travel (miles) Charges (USD)
1211 1802 2026 2305 2699 3371 3643 5298 4533 6059
1345 2405 2133 3016 2806 3998 3852 4801 4804 6426
1422 2005 2253 3385 3082 3555 4033 5147 5090 6321
1687 2511 2400 3090 3209 4692 4267 5738 5233 7026
1849 2332 2468 3694 3466 4244 4498 6420 5439 6964

Review Questions
1 How does regression differ from correlation?
2 How does an algebraic line differ from a statistical line?
3 Lines are characterized by their slope and intercept. What does the slope tell you about
the line? What does the intercept tell you?
4 What does a slope of 0 indicate?
5 What is "squared" in a least squares regression line?

6 Suppose the relation between AGE (years) and HEIGHT (inches) in an adolescent
population is described by this model: ŷ = 46 + 1.5X. Interpret the slope of this
model. Then predict the average height of a 10-year-old.
7 What t value do you use when calculating a 95% confidence interval for b when n =
25?
8 What symbol is used to denote the slope in the data? What symbol is used to denote the
slope in the population?
9 Under the null hypothesis, beta = ______.
10 Negative slopes suggest that as X increases, Y tends to _______________.
11 The Normality and equal variance assumptions for regression refer to the
distribution of the _____________.
12 What is confounding?
13 What is a residual?
14 What distributional conditions are necessary to help infer population slope beta?
15 Vocabulary: least squares method, regression coefficients, slope estimate (b),
slope parameter (β), intercept estimate (a), intercept parameter (α), ŷ ("y hat"),
standard error of the regression (sY|x), standard error of the slope (SEb).
16. Anscombe's quartet. This exercise demonstrates why it is important to look at a
graph of the data before conducting numerical analyses. Each data set is characterized
by these same numerical results: n = 11, x̄ = 9.0, ȳ = 7.5, r = 0.82, ŷ = 3 + 0.5X, and P
= 0.0022.
a. Plot each data set on four separate graphs.
b. Which of these data sets will support linear correlation and regression? Explain your
response.

Data Set I      Data Set II     Data Set III    Data Set IV
X1    Y1        X2    Y2        X3    Y3        X4
10.0  8.04      10.0  9.14      10.0  7.46      8.0
8.0   6.95      8.0   8.14      8.0   6.77      8.0
13.0  7.58      13.0  8.74      13.0  12.74     8.0
9.0   8.81      9.0   8.77      9.0   7.11      8.0
11.0  8.33      11.0  9.26      11.0  7.81      8.0
14.0  9.96      14.0  8.10      14.0  8.84      8.0
6.0   7.24      6.0   6.13      6.0   6.08      8.0
4.0   4.26      4.0   3.10      4.0   5.39      19.0
12.0  10.84     12.0  9.13      12.0  8.15      8.0
7.0   4.82      7.0   7.26      7.0   6.42      8.0
5.0   5.68      5.0   4.74      5.0   5.73      8.0
17. Ecological study of smoking and lung cancer. Recall that X = regional per capita
1-year cigarette consumption in 1930 and Y = lung cancer mortality (per 100,000
person-years) 20 years later. The scatter plot revealed a linear positive association.
Although the data point for the U.S. was lower than expected, we could not say with
certainty whether it was an outlier. Overall, r = 0.74 (P = 0.010).
a. Calculate the least squares regression coefficients for these data. Then show the
regression model (equation) for the data.
b. Interpret the slope estimate of the model.
c. Predict the lung cancer mortality rate (per 100,000 person-years) in a country with
annual per capita cigarette consumption of 800 cigarettes.
d. Calculate the 95% confidence interval for the slope. Interpret this interval.

TIME-SERIES ANALYSIS AND FORECASTING


1. A firm’s sales for a product line during the 12 quarters of the past three years were as
follows:
YEAR  QUARTER  SALES (1000 USD)    YEAR  QUARTER  SALES (1000 USD)
1     1        600                 2     3        2600
      2        1550                      4        2900
      3        1500                3     1        3800
      4        1500                      2        4500
2     1        2400                      3        4000
      2        3100                      4        4900
Estimate a trend line and forecast each quarter of the fourth year using the least squares
method and the four-quarter moving average method.

Year  Qtr  x   y     t         y-t       xy
1     1    1   600   801.2821  -201.282  600
      2    2   1550  1160.897  389.1026  3100
      3    3   1500  1520.513  -20.5128  4500
      4    4   1500  1880.128  -380.128  6000
2     1    5   2400  2239.744  160.2564  12000
      2    6   3100  2599.359  500.641   18600
      3    7   2600  2958.974  -358.974  18200
      4    8   2900  3318.59   -418.59   23200
3     1    9   3800  3678.205  121.7949  34200
      2    10  4500  4037.821  462.1795  45000
      3    11  4000  4397.436  -397.436  44000
      4    12  4900  4757.051  142.9487  58800

∑x 78.00
∑y 33350.00
∑xy 268200.00
∑x^2 650.00
b1 359.6153846
b0 441.6666667

                     Q1        Q2           Q3        Q4        Sum
Seasonal effect      26.92308  450.6410256  -258.974  -218.59   0.00
Adjustment           0         0            0         0
Adjusted             26.92308  450.6410256  -258.974  -218.59
Forecast (year 4)    5143.59   5926.923077  5576.923  5976.923
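
The least squares worksheet above can be reproduced with a short script. This is a sketch under the assumptions visible in the tables: periods are numbered x = 1..12, the trend is t = b0 + b1·x, and each seasonal effect is the mean of y − t for that quarter, shifted so that the four effects sum to zero. The variable names are illustrative only.

```python
# Least squares trend with additive seasonal effects (exercise 1 data).
sales = [600, 1550, 1500, 1500, 2400, 3100, 2600, 2900, 3800, 4500, 4000, 4900]
n = len(sales)
x = list(range(1, n + 1))  # period numbers 1..12

sum_x, sum_y = sum(x), sum(sales)
sum_xy = sum(xi * yi for xi, yi in zip(x, sales))
sum_x2 = sum(xi ** 2 for xi in x)

# Slope and intercept of the least squares trend line.
b1 = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
b0 = sum_y / n - b1 * sum_x / n

# Deviations of the observations from the fitted trend.
trend = [b0 + b1 * xi for xi in x]
deviations = [yi - ti for yi, ti in zip(sales, trend)]

# Mean deviation per quarter (index 0 is Q1), then shift so they sum to zero.
seasonal = [sum(deviations[q::4]) / len(deviations[q::4]) for q in range(4)]
shift = sum(seasonal) / 4
adjusted = [s - shift for s in seasonal]

# Forecast periods 13..16: extrapolated trend plus the seasonal effect.
forecast = [b0 + b1 * (n + 1 + q) + adjusted[q] for q in range(4)]

print(f"b1 = {b1:.4f}, b0 = {b0:.4f}")
for q, f in enumerate(forecast, start=1):
    print(f"Year 4, Q{q}: {f:.2f}")
```

With a least squares fit the residuals already sum to zero, so the shift is zero here; the step is kept because it matters for other trend estimates.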

Year  Qtr  x   y     t (centered moving avg)  y-t
1     1    1   600
      2    2   1550
      3    3   1500  1512.5                   -12.5
      4    4   1500  1931.25                  -431.25
2     1    5   2400  2262.5                   137.5
      2    6   3100  2575                     525
      3    7   2600  2925                     -325
      4    8   2900  3275                     -375
3     1    9   3800  3625                     175
      2    10  4500  4050                     450
      3    11  4000
      4    12  4900

gmm (geometric mean growth factor of t per period)  1.151087983

                              Q1        Q2           Q3        Q4        Sum
Seasonal effect               156.25    487.5        -168.75   -403.125  71.88
Adjustment                    -17.97    -17.97       -17.97    -17.97
Adjusted                      138.28    469.53       -186.72   -421.095
Forecast (gmm trend)          6315.322  7579.849289  7997.883  9000.103

Avg change per time period    362.5

Forecast (avg-change trend)   5275.78   5969.53      5675.78   5803.905

2. Zeus Computer Chips, Inc., used to have major contracts to produce the Pentium-type
chips. The market has been declining during the past three years because of the dual-core
chips, which it cannot produce, so Zeus has the unpleasant task of forecasting next year.
The task is unpleasant because the firm has not been able to find replacement chips for its
product lines. Here is demand over the past 12 quarters:
Quarter   2005 actual demand   2006 actual demand   2007 actual demand
I         4800                 3500                 3200
II        3500                 2700                 2100
III       4300                 3500                 2700
IV        3000                 2400                 1700
a. Forecast the four quarters of 2008 using the least squares method to estimate the trend
line. (Use the additive model to calculate seasonal variation.)

Year  Qtr  x   y     t         y-t       xy
1     1    1   4800  4235.897  564.1026  4800
      2    2   3500  4032.401  -532.401  7000
      3    3   4300  3828.904  471.0956  12900
      4    4   3000  3625.408  -625.408  12000
2     1    5   3500  3421.911  78.08858  17500
      2    6   2700  3218.415  -518.415  16200
      3    7   3500  3014.918  485.0816  24500
      4    8   2400  2811.422  -411.422  19200
3     1    9   3200  2607.925  592.0746  28800
      2    10  2100  2404.429  -304.429  21000
      3    11  2700  2200.932  499.0676  29700
      4    12  1700  1997.436  -297.436  20400

∑x 78.00
∑y 37400.00
∑xy 214000.00
∑x^2 650.00
b1 -203.4965035
b0 4439.393939

                     Q1        Q2            Q3        Q4        Sum
Seasonal effect      411.4219  -451.7482517  485.0816  -444.755  0.00
Adjustment           0         0             0         0
Adjusted             411.4219  -451.7482517  485.0816  -444.755
Forecast (2008)      2205.361  1138.694639   1872.028  738.6946

b. Forecast the four quarters of 2008 using a four-quarter moving average to extract the
trend.

Year  Qtr  x   y     t (centered moving avg)  y-t
1     1    1   4800
      2    2   3500
      3    3   4300  3737.5                   562.5
      4    4   3000  3475                     -475
2     1    5   3500  3275                     225
      2    6   2700  3100                     -400
      3    7   3500  2987.5                   512.5
      4    8   2400  2875                     -475
3     1    9   3200  2700                     500
      2    10  2100  2512.5                   -412.5
      3    11  2700
      4    12  1700

gmm (geometric mean growth factor of t per period)  0.944845275

                              Q1        Q2           Q3       Q4       Sum
Seasonal effect               362.5     -406.25      537.5    -475     18.75
Adjustment                    -4.75     -4.75        -4.75    -4.5
Adjusted                      357.75    -411         532.75   -479.5
Forecast (gmm trend)          2477.029  1591.390853  2424.7   1308.1

Avg change per time period    -175

Forecast (avg-change trend)   2345.25   1401.5       2170.25  983
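
The moving-average calculation for the Zeus data can be sketched as follows. The assumptions are: a centered four-quarter moving average as the trend, additive seasonal effects shifted to sum to zero, and a trend extrapolated from the last centered value using the average change per period. The variable names are illustrative. The script applies the exact adjustment (−4.6875 per quarter) where the worksheet uses rounded values, so its forecasts differ from the worksheet by a fraction of a unit.

```python
# Centered four-quarter moving average with additive seasonal effects.
demand = [4800, 3500, 4300, 3000, 3500, 2700, 3500, 2400, 3200, 2100, 2700, 1700]
n = len(demand)

# Centered moving average: mean of two adjacent four-quarter averages.
# Defined for 0-based positions 2..n-3 (periods 3..10).
centered = {}
for i in range(2, n - 2):
    first = sum(demand[i - 2:i + 2]) / 4
    second = sum(demand[i - 1:i + 3]) / 4
    centered[i] = (first + second) / 2

# Collect y - t by quarter (position i belongs to quarter i % 4; 0 is Q1).
by_quarter = {q: [] for q in range(4)}
for i, t in centered.items():
    by_quarter[i % 4].append(demand[i] - t)
seasonal = [sum(v) / len(v) for v in (by_quarter[q] for q in range(4))]

# Shift the seasonal effects so that they sum to zero.
shift = sum(seasonal) / 4
adjusted = [s - shift for s in seasonal]

# Average change of the trend per period, from first to last centered value.
first_i, last_i = min(centered), max(centered)
avg_change = (centered[last_i] - centered[first_i]) / (last_i - first_i)

# Forecast the next four quarters (0-based positions 12..15).
forecast = [
    centered[last_i] + avg_change * (i - last_i) + adjusted[i % 4]
    for i in range(n, n + 4)
]

print(f"Average change per period: {avg_change}")
for q, f in enumerate(forecast, start=1):
    print(f"2008 Q{q}: {f:.2f}")
```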

3. The weekly demand (in cases) for a particular brand of automatic dishwasher detergent
for a chain of grocery stores located in Columbus, Ohio, follows.

a. Construct a time series plot. What type of pattern exists in the data? 

b. Use a three-week moving average to develop a forecast for week 11. 

4. The Garden Avenue Seven sells CDs of its musical performances. The following table
reports sales (in units) for the past 18 months. The group’s manager wants an accurate
method for forecasting future sales.

a. Construct a time series plot. What type of pattern exists in the data? 

b. Use trend projection to provide a forecast.
5. The Costello Music Company has been in business for five years. During that time,
sales of pianos increased from 12 units in the first year to 76 units in the most recent year.
Fred Costello, the firm’s owner, wants to develop a forecast of piano sales for the coming
year. The historical data follow. 


Year 1 2 3 4 5
Sales 12 28 34 50 76

a. Construct a time series plot. What type of pattern exists in the data?

b. Develop the linear trend equation for the time series. What is the average increase in
sales that the firm has been realizing per year?

c. Forecast sales for years 6 and 7.
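
One way to check a hand calculation for parts (b) and (c) is to fit the linear trend T = b0 + b1·t by least squares. This is a minimal sketch using the five yearly values from the table; the function name `trend` is an assumption for illustration.

```python
# Linear trend for the Costello Music yearly piano sales.
years = [1, 2, 3, 4, 5]
sales = [12, 28, 34, 50, 76]

n = len(years)
t_bar = sum(years) / n
y_bar = sum(sales) / n

# Least squares slope and intercept, using deviations from the means.
b1 = (sum((t - t_bar) * (y - y_bar) for t, y in zip(years, sales))
      / sum((t - t_bar) ** 2 for t in years))
b0 = y_bar - b1 * t_bar


def trend(t):
    """Trend value for year t from the fitted line."""
    return b0 + b1 * t


print(f"T = {b0:.1f} + {b1:.1f} t")
print(f"Year 6 forecast: {trend(6):.1f}")
print(f"Year 7 forecast: {trend(7):.1f}")
```

The slope b1 is the average yearly increase in sales asked for in part (b).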

6. Consider the Costello Music Company problem in exercise 5. The quarterly sales data
follow.

a. Use the following dummy variables to develop an estimated regression equation to
account for any seasonal and linear trend effects in the data: Qtr1 = 1 if Quarter 1, 0
otherwise; Qtr2 = 1 if Quarter 2, 0 otherwise; and Qtr3 = 1 if Quarter 3, 0 otherwise.

b. Compute the quarterly forecasts for next year. 

c. Using time series decomposition, compute the seasonal indexes for the four quarters.

d. When does Costello Music experience the largest seasonal effect? Does this result
appear reasonable? Explain.

e. Deseasonalize the data and use the deseasonalized time series to identify the trend. 

f. Use the results of part (e) to develop a quarterly forecast for next year based on trend.

g. Use the seasonal indexes developed in part (c) to adjust the forecasts developed
in part (f) to account for the effect of season.

7. Hudson Marine has been an authorized dealer for C&D marine radios for the past
seven years. The following table reports the number of radios sold each year. 

Year          1   2   3   4   5    6    7
Number sold   35  50  75  90  105  110  130

a. Construct a time series plot. Does a linear trend appear to be present?

b. Develop a linear trend equation for this time series.

c. Use the linear trend equation developed in part (b) to develop a forecast for annual
sales in year 8.

INDEXES

1. Suppose the following data represent the price of 20 reams of office paper over a
50-year time frame. Find the simple index numbers for the data.
a. Let 1950 be the base year
b. Let 1980 be the base year
Year Price ($) Year Price ($)
1950 22.45 1980 69.75
1955 31.40 1985 73.44
1960 32.33 1990 80.05
1965 36.5 1995 84.61
1970 44.9 2000 87.28
1975 61.24 2005 89.56
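
A simple index number expresses each price as a percentage of the base-year price: index = 100 × price / base-year price. The sketch below applies that definition to the table above; the function name `simple_index` and the rounding to one decimal place are assumptions made for illustration.

```python
# Simple price index numbers for the office-paper data.
prices = {
    1950: 22.45, 1955: 31.40, 1960: 32.33, 1965: 36.50,
    1970: 44.90, 1975: 61.24, 1980: 69.75, 1985: 73.44,
    1990: 80.05, 1995: 84.61, 2000: 87.28, 2005: 89.56,
}


def simple_index(price_by_year, base_year):
    """Return {year: index}, where the base year has index 100."""
    base_price = price_by_year[base_year]
    return {
        year: round(100 * price / base_price, 1)
        for year, price in price_by_year.items()
    }


index_1950 = simple_index(prices, 1950)  # part (a)
index_1980 = simple_index(prices, 1980)  # part (b)

print("Year  Base 1950  Base 1980")
for year in sorted(prices):
    print(f"{year}  {index_1950[year]:9.1f}  {index_1980[year]:9.1f}")
```

Changing the base year rescales every index by the same factor, so the relative movement between any two years is unchanged.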

2. The U.S. Patent and Trademark Office reports fiscal year figures for patents issued in
the U.S. Following are the numbers of patents issued for the years 1980 through 2007.
Using these data and a base year of 1990, determine the simple index numbers for each
year.

3. Using the data that follow, compute the aggregate index numbers for the four types of
meat. Let 1995 be the base year for this market basket of goods.

4. Suppose the following data are prices of market goods involved in household
transportation for the years 2001 through 2009. Using 2003 as a base year, compute
aggregate transportation price indexes for these data.

5. Calculate Laspeyres price indexes for 2007 to 2009 from the following data. Use
2000 as the base year.

6. Calculate Paasche price indexes for 2008 and 2009 using the following data and 2000
as the base year.
