The Normal Distribution Estimation Correlation

THE NORMAL DISTRIBUTION
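DEFINITION: A continuous random variable X is said to be normally distributed if its density
function is given by:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$$

for $-\infty < x < \infty$ and for constants $\mu$ and $\sigma$, where $-\infty < \mu < \infty$ and $\sigma > 0$.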

Notation: If X follows the above distribution, we write $X \sim N(\mu, \sigma^2)$.
The graph of the normal distribution is called the normal curve.
Properties of the normal curve:
1. The curve is bell-shaped and symmetric about a vertical axis through the mean $\mu$.
2. The normal curve approaches the horizontal axis asymptotically as we proceed in either
direction away from the mean.
3. The total area under the curve and above the horizontal axis is equal to 1.
[Figure: the normal curve, with the horizontal axis marked at -3, -2, -1, 0, 1, 2, 3]
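The three properties of the normal curve can be checked numerically. The following is a minimal sketch, not part of the original notes: the function names `normal_pdf` and `area_under_curve` and the values mu = 10 and sigma = 2 are illustrative assumptions, and the area is approximated with a simple trapezoidal rule.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal random variable with mean mu and standard deviation sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2.0 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def area_under_curve(mu, sigma, width=8.0, steps=20000):
    """Trapezoidal approximation of the area from mu - width*sigma to mu + width*sigma."""
    a = mu - width * sigma
    b = mu + width * sigma
    h = (b - a) / steps
    total = 0.5 * (normal_pdf(a, mu, sigma) + normal_pdf(b, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(a + i * h, mu, sigma)
    return total * h

mu, sigma = 10.0, 2.0  # illustrative values, not taken from the notes

# Property 1: symmetry about the mean.
left = normal_pdf(mu - 1.5, mu, sigma)
right = normal_pdf(mu + 1.5, mu, sigma)

# Property 2: the curve gets close to the horizontal axis far from the mean.
tail = normal_pdf(mu + 6 * sigma, mu, sigma)

# Property 3: the total area under the curve is (approximately) 1.
area = area_under_curve(mu, sigma)

print(left, right, tail, area)
```

The two density values on either side of the mean agree, the density six standard deviations out is very small but still positive, and the approximated area is very close to 1.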
DEFINITION: The distribution of a normal random variable with mean zero and standard
deviation equal to 1 is called a standard normal distribution.
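If $X \sim N(\mu, \sigma^2)$, then X can be transformed into a standard normal random variable Z
through the following transformation:

$$Z = \frac{X - \mu}{\sigma}$$

If X is between the values $x_1$ and $x_2$, the random variable Z will fall between the
corresponding values:

$$z_1 = \frac{x_1 - \mu}{\sigma} \quad \text{and} \quad z_2 = \frac{x_2 - \mu}{\sigma}$$

Therefore, $P(x_1 < X < x_2) = P(z_1 < Z < z_2)$.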
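The transformation to a standard normal variable can be illustrated with a short sketch. It uses `statistics.NormalDist` from the Python standard library in place of the z-table; the helper name `to_z` and the values mu = 50, sigma = 10, x1 = 45 and x2 = 62 are hypothetical and chosen only for illustration.

```python
from statistics import NormalDist

def to_z(x, mu, sigma):
    """Standardize x: the number of standard deviations x lies from the mean."""
    return (x - mu) / sigma

mu, sigma = 50.0, 10.0   # hypothetical mean and standard deviation
x1, x2 = 45.0, 62.0      # hypothetical interval endpoints

z1 = to_z(x1, mu, sigma)
z2 = to_z(x2, mu, sigma)

X = NormalDist(mu, sigma)    # the original normal variable
Z = NormalDist(0.0, 1.0)     # the standard normal variable

# Probability computed directly on X and on the transformed variable Z.
p_x = X.cdf(x2) - X.cdf(x1)
p_z = Z.cdf(z2) - Z.cdf(z1)

print(z1, z2, round(p_x, 4), round(p_z, 4))
```

Both probabilities agree, which is why a single table for Z is enough to answer questions about any normal random variable.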
Examples:
1. Let Z be a standard normal random variable. That is, $Z \sim N(0, 1)$. Find the following
probabilities: (see the z-table for the probabilities)
A.
B.

C.
D.

2. Let Z be a standard normal random variable. That is, $Z \sim N(0, 1)$. Find the value of a.
A.

B.

C.

3. Let X be a normal random variable with . Find the following
probabilities:
A.

Therefore, the
B.

Therefore, the

C.

Therefore, the
4. The scores on a test are normally distributed with a mean of 84 and a standard deviation of 12.
A. What is the probability of an individual obtaining a score of 100 or above on this
test?
B. What score includes 50% of all the individuals who took the test?
C. If 654 students took the examination, then how many students got a score below
60?
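Solution: Given: $\mu = 84$, $\sigma = 12$
A. $P(X \ge 100) = P\left(Z \ge \frac{100 - 84}{12}\right) = P(Z \ge 1.33) = 1 - P(Z < 1.33) = 1 - 0.9082 = 0.0918$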

Therefore, the probability of an individual obtaining a score of 100 or above on this test
is 0.0918 or 9.18%.
B. In notation form, the statement is equivalent to:

$P(X \le x) = 0.50$

Finding the corresponding z-score of the probability 0.50, z = 0.00
From the transformation formula,

$x = \mu + z\sigma = 84 + (0.00)(12) = 84$

Therefore, the score that includes 50% of those who took the exam is 84.
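C. Given: $\mu = 84$, $\sigma = 12$, N = 654

$P(X < 60) = P\left(Z < \frac{60 - 84}{12}\right) = P(Z < -2.00) = 0.0228$

The number of students who got a score lower than 60 is equal to the product of the
probability and the total number of students.

$(0.0228)(654) = 14.91 \approx 15$ students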
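The three parts of this example can be cross-checked in a few lines. This is a sketch that swaps the z-table for `statistics.NormalDist` from the Python standard library; the variable names are illustrative. Because the code does not round z to two decimal places, the probability in part A comes out near 0.0912 rather than the table-based 0.0918.

```python
from statistics import NormalDist

scores = NormalDist(mu=84, sigma=12)   # test scores, as given in the example
n_students = 654

# A. Probability of a score of 100 or above.
p_100_or_above = 1 - scores.cdf(100)

# B. Score below which 50% of the examinees fall.
median_score = scores.inv_cdf(0.50)

# C. Expected number of students scoring below 60.
p_below_60 = scores.cdf(60)
expected_below_60 = p_below_60 * n_students

print(round(p_100_or_above, 4), median_score, round(expected_below_60))
```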

Exercise 6.2
1. Let Z be a standard normal variable. Find the following probabilities:
a.
b.
c.
d.
2. Given a normal distribution with mean $\mu$ = 82 and standard deviation $\sigma$, find the probability that X assumes
a value
a. Less than 78
b. More than 90
c. Between 75 and 80
3. The mean weight of 500 male students at a certain college is 151 pounds, and the
standard deviation is 15 pounds. Assume that the weights are normally distributed.
a. How many students weigh between 120 and 155 pounds?
b. What is the probability that a randomly selected male student weighs less than 128
pounds?
ESTIMATION
Basic Concepts of Estimation
Definition of terms:
Estimator - any statistic whose value is used to estimate an unknown parameter.
Estimate - a realized value of an estimator.
Point Estimate - a single value used to represent the parameter of interest.
Interval Estimator - a rule that tells us how to calculate two numbers based on sample data,
forming an interval within which the parameter is expected to lie. The pair of numbers (a, b) is
called an interval estimate or confidence interval.
Level of Confidence or confidence coefficient - the degree of certainty attached to an interval
estimate for the unknown parameter.

Point Estimation of the mean and the Standard Deviation
A statistic is used to estimate a parameter. The following statistics are used to estimate the
parameters given below:
Parameter                                   Statistic
Population mean ($\mu$)                     Sample mean ($\bar{x}$)
Population standard deviation ($\sigma$)    Sample standard deviation ($s$)
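As a small illustration of point estimation, the sketch below computes the sample mean and sample standard deviation with the Python standard library. The data values are hypothetical, and the sketch assumes that s is the usual sample standard deviation with divisor n - 1, which the notes do not spell out.

```python
from statistics import mean, stdev

# Hypothetical sample of five measurements (not from the notes).
sample = [12.0, 15.0, 11.0, 14.0, 13.0]

x_bar = mean(sample)   # point estimate of the population mean
s = stdev(sample)      # point estimate of the population standard deviation (divisor n - 1)

print(x_bar, round(s, 4))
```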
Interval Estimation of the Mean for a Single Population
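Confidence Interval for $\mu$, $\sigma$ is known
If $\bar{x}$ is the mean of a random sample of size n from a population with known variance
$\sigma^2$, a $(1-\alpha)100\%$ confidence interval for $\mu$ is given by

$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$

where $z_{\alpha/2}$ is the z value leaving an area of $\alpha/2$ to the right.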
Note:
For small samples selected from nonnormal populations, we cannot expect our degree of
confidence to be accurate. However, for samples of size $n \ge 30$, regardless of the shape
of most populations, sampling theory guarantees good results.
To compute a $(1-\alpha)100\%$ confidence interval for $\mu$, it was assumed that $\sigma$ is known. Since
this is generally not the case, $\sigma$ shall be estimated by s, provided $n \ge 30$.
Example:
A survey of the delivery times of 100 orders worth P20,000 from WILLIAMS PIZZA
yielded a mean of 55 minutes with a standard deviation of 12 minutes. Assuming that the
delivery times follow a normal distribution, construct a 95% confidence interval for the true
mean.
Solution:
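Given: $\bar{x} = 55$ minutes, $s = 12$ minutes, n = 100 orders, $\alpha$ = 5%
$z_{\alpha/2} = z_{0.025} = 1.96$
Substituting the values in the formula:

$$\bar{x} - z_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + z_{\alpha/2}\frac{s}{\sqrt{n}}$$

we obtained:

$$55 - (1.96)\frac{12}{\sqrt{100}} < \mu < 55 + (1.96)\frac{12}{\sqrt{100}}$$
$$52.648 < \mu < 57.352$$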
Conclusion: The WILLIAMS PIZZA is 95% confident that the true mean delivery time is between
52.648 minutes and 57.352 minutes.
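The interval in this example can be reproduced with a short function. This is a sketch under the large-sample assumption used above (the sample standard deviation stands in for the population value); the function name `z_confidence_interval` is illustrative, and the critical value comes from `statistics.NormalDist` instead of the z-table.

```python
import math
from statistics import NormalDist

def z_confidence_interval(x_bar, sigma, n, confidence):
    """Return (lower, upper) limits of a z-based confidence interval for the mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # z value leaving alpha/2 in the right tail
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Delivery-time example: mean 55 minutes, standard deviation 12 minutes, 100 orders.
lower, upper = z_confidence_interval(55, 12, 100, 0.95)
print(round(lower, 3), round(upper, 3))
```

The limits agree with the hand computation to three decimal places; any small difference would come from the table value 1.96 being a rounded critical value.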
Error in Estimating the Population Mean
If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1-\alpha)100\%$ confident that the error will
not exceed $e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$.
Example:
The heights of a random sample of 50 college students showed a mean of 174.5 cm and
a standard deviation of 6.9 cm. What can we assert with 98% confidence about the possible size
of our error if we estimate the mean height of all college students to be 174.5?
Solution:
Given: $\bar{x}$ = 174.5 cm, $s$ = 6.9 cm, n = 50 students, $\alpha$ = 2%
The possible size of the error can be obtained by using

$$e = z_{\alpha/2}\frac{s}{\sqrt{n}}$$

Substituting the values in the formula, with $z_{\alpha/2} = z_{0.01} = 2.33$:

$$e = (2.33)\frac{6.9}{\sqrt{50}} = 2.27$$

Conclusion: We are 98% confident that the sample mean differs from the true mean height by
at most 2.27 cm.
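The maximum error for this example can be checked as follows. The sketch assumes, as in the solution, that the sample standard deviation is used in place of the population value; the function name `max_error` is illustrative.

```python
import math
from statistics import NormalDist

def max_error(sigma, n, confidence):
    """Largest likely error when the sample mean is used to estimate the population mean."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return z * sigma / math.sqrt(n)

# Height example: standard deviation 6.9 cm, 50 students, 98% confidence.
e = max_error(6.9, 50, 0.98)
print(round(e, 2))
```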
Sample Size for Estimating the Population Mean
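If $\bar{x}$ is used as an estimate of $\mu$, we can be $(1-\alpha)100\%$ confident that the error will
not exceed a specified amount e when the sample size is $n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$, rounded up to the next whole number.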
Example:
The monthly wage of new employees at a certain broadcasting company is said to follow
a normal distribution with a standard deviation of P1,000. How large a sample would be needed
to be 99% confident that the sample mean will be within P300 of the true mean?
Solution:
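Given: $\sigma$ = P1,000, e = P300, $\alpha$ = 1%, so $z_{\alpha/2} = z_{0.005} = 2.575$

$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2$$

by substitution:

$$n = \left(\frac{(2.575)(1000)}{300}\right)^2 = 73.67 \approx 74$$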
Conclusion: Therefore we can conclude that the sample size should be 74 employees to be 99%
confident that the sample mean will be within P300 of the true mean wage.
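The sample-size rule can be sketched in code as well. The function name `required_sample_size` is illustrative; the result is rounded up because a fractional observation is not possible and rounding down would allow a larger error than specified.

```python
import math
from statistics import NormalDist

def required_sample_size(sigma, e, confidence):
    """Smallest n for which the error of the sample mean is at most e at the given confidence."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return math.ceil((z * sigma / e) ** 2)

# Wage example: standard deviation P1,000, allowed error P300, 99% confidence.
n_needed = required_sample_size(1000, 300, 0.99)
print(n_needed)
```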
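Small-Sample Confidence Interval for $\mu$, $\sigma$ is unknown
If $\bar{x}$ and s are the mean and standard deviation, respectively, of a random sample of size
$n < 30$ from an approximately normal population with unknown variance $\sigma^2$, a $(1-\alpha)100\%$
confidence interval for $\mu$ is given by

$$\bar{x} - t_{\alpha/2}\frac{s}{\sqrt{n}} < \mu < \bar{x} + t_{\alpha/2}\frac{s}{\sqrt{n}}$$

where $t_{\alpha/2}$ is the t value with $v = n - 1$ degrees of freedom.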
Note: Values for t are found in the Table of T-values
Example:
A random sample of 8 cigarettes of a certain brand has average nicotine content of 3.6
milligrams and a standard deviation of 0.9 milligrams. Construct a 99% confidence interval for
the true average nicotine content of this particular brand of cigarettes, assuming an
approximate normal distribution.
Solution:
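Given: $\bar{x} = 3.6$ milligrams, $s = 0.9$ milligrams, n = 8 cigarettes, $\alpha$ = 1%
$t_{\alpha/2} = t_{0.005} = 3.499$ with $v = n - 1 = 7$ degrees of freedom
by substitution:

$$3.6 - (3.499)\frac{0.9}{\sqrt{8}} < \mu < 3.6 + (3.499)\frac{0.9}{\sqrt{8}}$$

we obtained:

$$2.4866 < \mu < 4.7134$$

Conclusion: Therefore we can conclude that we are 99% confident that the true average nicotine
content of this brand of cigarettes is between 2.4866 milligrams and 4.7134 milligrams.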
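A small-sample interval can be computed the same way once the t value has been read from the table. The sketch below assumes the tabulated value 3.499 for an upper-tail area of 0.005 with 7 degrees of freedom; the function name `t_confidence_interval` is illustrative, and the critical value is passed in by hand because the Python standard library has no t distribution.

```python
import math

def t_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) limits, given a t value looked up for n - 1 degrees of freedom."""
    margin = t_crit * s / math.sqrt(n)
    return x_bar - margin, x_bar + margin

# Nicotine example: mean 3.6 mg, standard deviation 0.9 mg, 8 cigarettes, 99% confidence.
t_value = 3.499   # assumed table value: area 0.005 in the upper tail, 7 degrees of freedom
lower, upper = t_confidence_interval(3.6, 0.9, 8, t_value)
print(round(lower, 4), round(upper, 4))
```

The half-width of the interval is the t value times s divided by the square root of n, about 1.1134 here. The quantity s divided by the square root of n on its own is only about 0.3182, so leaving out the t value would make the interval far too narrow.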
Exercise 7.
1. An electrical firm manufactures light bulbs that have a length of life that is
approximately normally distributed, with a standard deviation of 40 hours. If a random
sample of 30 bulbs has an average life of 780 hours, find a 96% confidence interval for
the population mean of all bulbs produced by this firm. How large a sample is needed if
we wish to be 96% confident that our sample mean will be within 10 hours of the true
mean?
2. The contents of 7 similar containers of sulfuric acid are 9.8, 10.2, 10.4, 9.8, 10.0, 10.2
and 9.6 liters. Find a 95% confidence interval for the mean content of all such
containers, assuming an approximate normal distribution for container contents.
3. A random sample of 100 PUJs (public utility jeepneys) shows that a jeepney is driven on the
average 24,500 km per year, with a standard deviation of 3,900 km.
a. Construct a 99% confidence interval for the average number of kilometers a jeepney
is driven annually.
b. What can we assert with 99% confidence about the possible size of our error if we
estimate the average number of km driven by jeepney drivers to be 23,500 km per
year?
4. Suppose that the time allotted for commercials on a primetime TV program is known to
have a normal distribution with a standard deviation of 1.5 minutes. A study of 35
showings gave an average commercial time of 10 minutes. Compute for the maximum
error. Construct a 95% confidence interval for the true mean.
5. A random sample of 12 female students in a certain dorm showed an average weekly
expenditure of P750 for snack foods, with a standard deviation of P175. Construct a 90%
confidence interval for the average amount spent each week on snack foods by female
students living in this dormitory, assuming the expenditures to be approximately
normally distributed.
6. The mean and standard deviation for the quality grade point averages of a random
sample of 28 college seniors are calculated to be 2.6 and 0.3 respectively. Find the 95%
confidence interval for the mean of the entire senior class. How large a sample is
required if we want to be 95% confident that our estimate of $\mu$ is not off by more than
0.05?
7. To estimate the average serving time at a fast food restaurant, a consultant noted the
time taken by 40 counter servers to complete a standard order (consisting of 2 burgers,
2 large fries and 2 drinks). The servers averaged 78.4 seconds with a standard deviation
of 13.2 seconds to complete the orders. What can the consultant assert with 95%
confidence about the maximum error if he uses $\bar{x} = 78.4$ seconds as an estimate of the
true average time required to complete this standard order?
8. A company surveyed 4400 college graduates about the lengths of time required to earn
their bachelor's degrees. The mean is 5.15 years, and the standard deviation is 1.68
years. Based on these sample data, construct the 99% confidence interval for the mean
time required by all college graduates.
9. In a time-use study, 20 randomly selected managers were found to spend an average of
2.4 hours each day on paperwork. The standard deviation of the 20 observations is 1.30
hours. Construct a 95% confidence interval for the mean time spent on paperwork by
managers.
10. In a study of physical attractiveness and mental disorders, 231 subjects were rated for
attractiveness, and the resulting sample mean and standard deviation are 3.94 and 0.75,
respectively. Determine the sample size necessary to estimate the population mean,
assuming you want 95% confidence and a margin of error of 0.05.
11. The number of incorrect answers on a true-false test for a sample of 15 students was
recorded as follows: 2, 1, 3, 0, 1, 3, 6, 0, 3, 3, 5, 2, 1, 4, 2. Estimate the variance.
12. In a study of the use of hypnosis to relieve pain, sensory ratings were measured for 16
subjects, with the results given below. Use these sample data to estimate the mean.
8.8 6.2 7.7 7.4 6.4 6.1 6.8 9.8 8.3 11.9 8.5 5.2
6.1 11.3 6.0 10.6

CORRELATION ANALYSIS
- A correlation exists between two variables when one of them is related to the other in
some way.
- Correlation Analysis attempts to measure the strength of relationships between two
variables by means of a single number called a correlation coefficient r.
- The linear correlation coefficient r measures the strength of the linear relationship
between the paired x and y values in the sample. This is also referred to as the Pearson
product moment correlation coefficient in honor of Karl Pearson who originally
developed it. The formula is given below:
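$$r = \frac{n\sum x_i y_i - \left(\sum x_i\right)\left(\sum y_i\right)}{\sqrt{\left[n\sum x_i^2 - \left(\sum x_i\right)^2\right]\left[n\sum y_i^2 - \left(\sum y_i\right)^2\right]}}$$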

- Since r is computed from the sample data, it is a sample statistic.
- Interpretation of the values of r
$r = 1$ : perfect positive correlation between X and Y
$0.5 \le r < 1$ : strong positive correlation between X and Y
$0 < r < 0.5$ : positive correlation between X and Y
$r = 0$ : zero correlation
$-0.5 < r < 0$ : negative correlation between X and Y
$-1 < r \le -0.5$ : strong negative correlation between X and Y
$r = -1$ : perfect negative correlation between X and Y
- Zero correlation means lack of linearity and not lack of association.
- r measures the strength of the linear relationship. It is not designed to measure the
strength of a relationship that is not linear.
- The value of r is always between -1 and 1, that is, $-1 \le r \le 1$. (Rounding off should be at
least up to 3 decimal places.)
- Common errors in interpreting the results:
1. We must be careful to avoid concluding that a significant linear correlation
between two variables is a proof that there is a cause-effect relationship
between them.
2. No significant linear correlation does not mean X and Y are not related in any
way.
3. Rounding errors can wreak havoc with the results. Round the linear correlation
coefficient to three decimal places.
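The computational formula for r and the interpretation scale above can be sketched in Python. The function names `pearson_r` and `describe` and the five data pairs are illustrative assumptions, not part of the notes; the labels follow the interpretation scale given earlier.

```python
import math

def pearson_r(xs, ys):
    """Linear correlation coefficient from the raw sums of x, y, xy, x^2 and y^2."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have the same length (at least 2)")
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    if denominator == 0:
        raise ValueError("r is undefined when x or y does not vary")
    return numerator / denominator

def describe(r):
    """Verbal label for r, following the interpretation scale in these notes."""
    if r == 1:
        return "perfect positive correlation"
    if r >= 0.5:
        return "strong positive correlation"
    if r > 0:
        return "positive correlation"
    if r == 0:
        return "zero correlation"
    if r > -0.5:
        return "negative correlation"
    if r > -1:
        return "strong negative correlation"
    return "perfect negative correlation"

# Hypothetical paired data, used only to exercise the functions.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
r = pearson_r(x, y)
print(round(r, 3), describe(r))
```

Rounding is applied only when printing, so the classification uses the unrounded value. A label from `describe` says nothing about cause and effect; it only summarizes the strength and direction of the linear relationship.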

Examples:
For numbers 1 to 4, identify the error in the stated conclusion and write the correct conclusion.
1. Given: The paired sample data result in a linear correlation coefficient very close to zero.
Conclusion: The two variables are not related in any way.
2. Given: There is a strong positive linear correlation between smoking and cancer.
Conclusion: Smoking causes cancer.
3. Given: x = age, y = test score, r = 0.40
Conclusion: Older people tend to get lower scores.

4. Given: There is a strong positive linear correlation between income and spending.
Conclusion: Increased spending is caused by increased income.
5. Ten students from the College of Business Administration were chosen as
respondents in a study conducted to determine the relationship between the grades of
students (X) and the number of hours they spend studying (Y). The computed correlation
coefficient was 0.575. What would be the conclusion?

6. The data on yearly consumption of cigarettes in the Philippines and the percentage of the
country's population admitted to mental institutions as psychiatric cases were collected for 8
years. The correlation coefficient is r = 0.61. What can we conclude about the data?

7. The temperature in a certain locality and number of pregnant women were found to have a
strong negative correlation. What would be the right conclusion?

EXAMPLES: Construct a scatter diagram, find r and interpret the results.
1. X 2 3 7 12 16 20 22
Y 14 20 9 14 5 1 15
2. X 9 4 5 4 2 6 3 7 2 8
Y 8 5 8 4 3 4 4 10 4 10
3. X 2 4 6 8 10 12
Y 6 12 18 24 30 36
4. X 25 64 75 35 86 15 19 66 37 9 12 9 47
Y 90 3 85 70 67 45 22 12 85 66 54 16 24
5.
X 3 4 3 4 5 6 5 6 7 8 7 8 9 11 9 10
Y 15 17 3 4 5 21 23 13 11 12 25 6 7 9 16 7

EXERCISES
A. Construct a scatter diagram, find r and interpret the results.
1. Grades of 6 students selected at random
MATH GRADE ( X ) 70 92 80 74 65 83
ENGLISH GRADE (Y) 74 84 63 87 78 90
2. The data below consist of weights in pounds of discarded paper and sizes of households
X (paper) 2.41 7.57 9.55 8.82 8.72 6.96 6.83 11.42
Y (household size) 2 3 3 6 4 2 1 5

3. The data below consist of the number of persons in the household and the number of cars they
own
X (household size) 2 4 4 2 2 1 2 3 5
Y (cars) 2 0 2 2 1 1 3 0 2

4. The data below consist of age and income in thousands of dollars
Age 60 63 51 25 47 56 19 24 25 20 66 19 48 52 27
Income 43.4 18.8 14.4 29.4 19.4 83 10.4 12.6 36.4 29.6 17.2 17.2 67 33 37.4

5. A teacher is interested in knowing whether or not two IQ tests produce linearly related
scores. A sample of 10 students was taken randomly. Five students took Test 1 and 5 students
took Test 2 in the morning. In the afternoon, those who took Test 1 took Test 2 and vice versa.
The results are shown in the table below:
STUDENT TEST 1 (X) TEST 2 (Y)
A 125 114
B 145 127
C 110 126
D 120 116
E 124 108
F 110 100
G 121 129
H 142 131
I 100 96
J 126 113
a. Plot a scatter diagram for these data.
b. Solve for r.
c. How well do the two tests relate linearly? Explain.
6. In a study of factors that affect success in calculus, data were collected for 10
different persons. Scores on an algebra placement test are given, along with calculus
achievement scores.
b. Find the value of the linear correlation coefficient r.
c. Test the significance of r at $\alpha$ = 0.05.

ALGEBRA SCORE (X)   17 21 11 16 15 11 24 27 19 8
CALCULUS SCORE (Y)  73 66 64 61 70 71 90 68 84 52

7. One study was conducted to determine the relationship between the age and systolic blood
pressure of 12 women.
Age ( X ) Systolic Blood Pressure ( Y )
56 147
42 125
72 160
36 118
63 149
47 128
55 150
49 145
38 115
42 140
68 152
60 155
b. Solve for r and interpret.
c. What can you conclude about the relationship between age and systolic blood
pressure of women? Explain statistically.
