
Assignment 1

Question 1:
(Marks: 16)
Write short notes on the following:

Solution:

i) Variable and constant


Variable: A measurable quantity which can vary from one individual or object to
another is called a variable.
Constant: A quantity which can assume only one value is called a constant.

ii) Continuous and Qualitative variable


Continuous variable: A variable which can assume an infinite number of values
within a given range is called a continuous variable, e.g. weight, height, length.
Qualitative variable: A variable that cannot be expressed in numerical form but
shows the presence or absence of some attribute is called a qualitative variable, for
example marital status, religion, sex, etc.

iii) Population and sample


Population: The collection of all individuals or objects having some common
measurable characteristic is called a population.
Sample: A representative part of the population is called a sample.

iv) Primary and secondary data


Primary data: Data published or used by the organization that originally
collected them are called primary data.
Secondary data: Data published or used by an organization that did not
collect them originally are known as secondary data.

v) Sampling errors and non-sampling errors


Sampling Error: The difference between the estimate derived from the sample and
the true population value (the parameter) is technically called the sampling error.
Non-sampling errors: There are certain errors which are not attributable to sampling
but arise in the process of data collection, even if a complete count is carried out.
Such errors are known as non-sampling errors.

vi) Multiple bar chart and component bar chart


Multiple bar chart: This chart is simply an extension of the simple bar chart. In this
chart, grouped (adjacent) bars are used to represent related sets of data. Each bar in a
group is shaded differently for distinction.
Component bar chart: This chart consists of horizontal or vertical bars which
are subdivided into two or more parts. It is used when it is desired to present
data which are subdivisions of totals.

vii) Frequency distribution


Frequency distribution: A frequency distribution is a tabular arrangement of data in
which the items are arranged into classes and the number of items falling in each
class (called the class frequency) is stated.

viii) Measure of central tendency


Measure of central tendency: A single value which represents the whole data set is
called the average value. Since the average tends to lie in the center of the
data/distribution, it is also called a measure of central tendency.

Question 2:
(Marks: 4)
State which of the following represent qualitative data and which one of them represents
quantitative data.
i) Religion of the people of the country (qualitative data)
ii) Fee of VU students (quantitative data)
iii) Majority of population like Geo TV (qualitative data)

iv) Inches of rainfall in Lahore city during the last year (quantitative data)

Question 3:
(Marks: 10)
The following data are the weights in pounds of 42 students of Virtual University.
Construct a stem-and-leaf display of the data.

135 157 152 189 135 164 146


144 154 153 150 158 168 165
140 132 140 126 146 135 144
147 138 173 161 125 136 176
142 145 149 148 163 147 135
142 150 156 145 128 154 171

Solution:
The stem-and-leaf display of the data is shown below.

Stem Leaf
12 5 6 8
13 8 2 6 5 5 5 5
14 4 9 6 0 7 8 4 6 2 0 5 2 5 7
15 0 7 8 2 4 3 0 6 4
16 4 8 3 5 1
17 6 3 1
18 9

Arranging the leaves in ascending order gives the ordered stem-and-leaf display:

Stem Leaf
12 5 6 8
13 2 5 5 5 5 6 8
14 0 0 2 2 4 4 5 5 6 6 7 7 8 9
15 0 0 2 3 4 4 6 7 8
16 1 3 4 5 8
17 1 3 6
18 9
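
The ordered display above can be reproduced with a few lines of Python. This is a minimal sketch; the helper name `stem_and_leaf` is an illustrative assumption, and the stem is taken to be the hundreds and tens digits, with the units digit as the leaf.

```python
from collections import defaultdict

# Weights (in pounds) of the 42 students, as listed in the question.
weights = [
    135, 157, 152, 189, 135, 164, 146,
    144, 154, 153, 150, 158, 168, 165,
    140, 132, 140, 126, 146, 135, 144,
    147, 138, 173, 161, 125, 136, 176,
    142, 145, 149, 148, 163, 147, 135,
    142, 150, 156, 145, 128, 154, 171,
]


def stem_and_leaf(values):
    """Group each value by its stem (value // 10) and sort the leaves (value % 10)."""
    groups = defaultdict(list)
    for value in values:
        stem, leaf = divmod(value, 10)
        groups[stem].append(leaf)
    return {stem: sorted(leaves) for stem, leaves in sorted(groups.items())}


display = stem_and_leaf(weights)
print("Stem Leaf")
for stem, leaves in display.items():
    print(stem, " ".join(str(leaf) for leaf in leaves))
```

Counting the leaves in the result is a quick check that all 42 observations were placed.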
Assignment 2
Question 1:
a) What is the difference between Chebyshev’s inequality and the empirical rule (in
terms of skewness)?
Solution:
Chebyshev’s inequality and the empirical rule both tell us the proportion of data values
that lie within a specified number of standard deviations of the mean.
Chebyshev’s inequality is a general rule that holds for all distributions, symmetric or
skewed, and gives a lower bound on that proportion.
The empirical rule is applicable only to symmetric, bell-shaped distributions.
b) The share prices of a company in the Lahore and Islamabad markets during the last
ten months are recorded below:

Months Jan Feb March April May Jun July Aug Sep Oct
Lahore 105 120 115 118 130 127 109 110 104 112
Islamabad 108 117 120 130 100 125 125 120 110 135
In which market are the share prices more stable?
Solution:
To compare the stability of the two markets, we compute the coefficient of variation (C.V.)
for each city; the city with the smaller C.V. has the more stable market.

x lahore 
 x  1150  115
n 10

x x 132944  115


2 2

S lahore      8.33
2

n  n  10
 

C.V .Lahore 
S 8.33
 100   100  7.24
x 115
y Islamabad 
 y  1190  119
n 10

y
y
  
2 2
142628
S Islamabad    119  10.09
2

n  
 n  10

C.V .Islamabad   100 


S 10.09
 100  8.48
y 119
Comparing the coefficients of variation shows that share prices are more stable in the
Lahore stock exchange than in Islamabad, since Lahore has the smaller C.V.
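
The comparison can be checked numerically. The sketch below assumes the same population formula used in the solution (dividing by n, not n - 1); the function name `coefficient_of_variation` is illustrative.

```python
from math import sqrt

# Monthly share prices from the question (Jan to Oct).
lahore = [105, 120, 115, 118, 130, 127, 109, 110, 104, 112]
islamabad = [108, 117, 120, 130, 100, 125, 125, 120, 110, 135]


def coefficient_of_variation(prices):
    """Return 100 * S / mean, with S computed using divisor n."""
    n = len(prices)
    mean = sum(prices) / n
    variance = sum(p * p for p in prices) / n - mean ** 2
    return sqrt(variance) / mean * 100


cv_lahore = coefficient_of_variation(lahore)
cv_islamabad = coefficient_of_variation(islamabad)
print(round(cv_lahore, 2), round(cv_islamabad, 2))
print("More stable market:", "Lahore" if cv_lahore < cv_islamabad else "Islamabad")
```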
Question 2:
a) Interpret standard deviation.
Solution:
The standard deviation is a very important concept that serves as a basic measure of
variability. A small value of the standard deviation indicates that most of the
observations in a data set are close to the mean, while a large value implies that
the observations are scattered widely about the mean.
b) The following data give the number of passengers traveling by airplane from one
city to another in one week.
115 112 129 113 119 124 132 120 110 116
Calculate the mean and standard deviation and determine the percentage of the data
that lies between (i) $\bar{x} \pm S$ (ii) $\bar{x} \pm 2S$ (iii) $\bar{x} \pm 3S$. What percentage of the data lies
outside these limits?
Solution: Let x represent the number of passengers traveling by airplane from one
city to another in one week.
The calculations for the mean and standard deviation are given below:

$$\bar{x} = \frac{\sum x}{n} = \frac{1190}{10} = 119$$

$$S = \sqrt{\frac{\sum x^2}{n} - \left(\frac{\sum x}{n}\right)^2} = \sqrt{\frac{142096}{10} - (119)^2} = 6.97$$


Thus the percentage of data lying within the given limits is:

Interval | Values within interval | % of values within interval | % of values falling outside
$\bar{x} \pm S$: 119 ± 6.97 = (112.03, 125.97) | 113, 115, 116, 119, 120, 124 | (6/10) × 100 = 60% | 40%
$\bar{x} \pm 2S$: 119 ± 2(6.97) = (105.06, 132.94) | 110, 112, 113, 115, 116, 119, 120, 124, 129, 132 | (10/10) × 100 = 100% | nil
$\bar{x} \pm 3S$: 119 ± 3(6.97) = (98.09, 139.91) | All values | 100% | nil
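
The percentages in the table can be verified by counting observations inside each interval. This is a small sketch using the divisor-n standard deviation from the solution; `percent_within` is a hypothetical helper name.

```python
from math import sqrt

# Passenger counts from the question.
passengers = [115, 112, 129, 113, 119, 124, 132, 120, 110, 116]
n = len(passengers)
mean = sum(passengers) / n
sd = sqrt(sum(x * x for x in passengers) / n - mean ** 2)


def percent_within(k):
    """Percentage of observations inside mean +/- k standard deviations."""
    low, high = mean - k * sd, mean + k * sd
    inside = [x for x in passengers if low <= x <= high]
    return 100 * len(inside) / n


for k in (1, 2, 3):
    within = percent_within(k)
    print(f"k={k}: within {within}%, outside {100 - within}%")
```

Note that 112 falls just below the lower limit 112.03, which is why only six values lie within one standard deviation.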
Assignment 3
Question 1
Give short answers to the following:
I. What are moments, and why do we use them?
II. What is meant by kurtosis?
III. Leptokurtic
IV. Platykurtic
V. Normal distribution
VI. Regression
VII. Regressor
VIII. Regressand

Solution:
What are moments, and why do we use them?
Moments are descriptive constants of a distribution, which are used for testing the
symmetry and normality of the distribution.
What is meant by kurtosis?
The term kurtosis refers to the degree of peakedness of the distribution.
Leptokurtic:
A distribution having a relatively high peak is called a leptokurtic distribution.
Platykurtic:
A distribution which is flat-topped is called a platykurtic distribution.
Normal distribution:
A distribution which is neither very peaked nor very flat is called a normal
distribution or mesokurtic.
Regression:
Regression investigates the dependence of one (dependent) variable on one or more
other (independent) variables.
Regressor:
The independent or the non-random variable is also referred to as the regressor,
the predictor, the regression variable or the explanatory variable.
Regressand:
The dependent or the random variable is also referred to as the regressand, the
predictand, the response or the explained variable.
Question 2:
If a distribution has mean 1403 and mode 1487, what can you say about the
skewness?
Solution:
Mean = 1403
Mode = 1487
The distribution is negatively skewed, because
Mean < Mode
Question 3:
a) Distinguish between permutation and combination.
b) The first four moments of a certain distribution about Y = 17.5 are 0.3, 74, 45
and 12125 respectively. Find out whether the distribution is leptokurtic or
platykurtic.

Solution:
a. Permutation:
A permutation is an arrangement of all or part of a set of objects in a
definite order. The number of permutations of n distinct objects taken r
at a time is

$${}^{n}P_{r} = \frac{n!}{(n - r)!}$$

Combination:
A combination is a selection of objects without regard to their order. The
number of combinations of n objects taken r at a time is

$${}^{n}C_{r} = \frac{n!}{r!\,(n - r)!}$$

b. The first four moments about Y = 17.5 are

$$m_1' = 0.3, \quad m_2' = 74, \quad m_3' = 45, \quad m_4' = 12125$$

Moments about the mean:

$$m_1 = 0$$

$$m_2 = m_2' - (m_1')^2 = 74 - 0.09 = 73.91$$

$$m_3 = m_3' - 3m_2' m_1' + 2(m_1')^3 = 45 - 3(74)(0.3) + 0.054 = -21.546$$

$$m_4 = m_4' - 4m_3' m_1' + 6m_2' (m_1')^2 - 3(m_1')^4$$

$$m_4 = 12125 - 4(0.3)(45) + 6(74)(0.3)^2 - 3(0.3)^4 = 12125 - 54 + 39.96 - 0.0243 = 12110.94$$

$$b_2 = \frac{m_4}{m_2^2} = \frac{12110.94}{(73.91)^2} = 2.22$$

Since $b_2 < 3$, the distribution is platykurtic.
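
The conversion from moments about an arbitrary origin to moments about the mean is easy to check in code. The variable names below (`m1r` for the first raw moment, and so on) are illustrative assumptions.

```python
# Moments about the arbitrary origin Y = 17.5, as given in the question.
m1r, m2r, m3r, m4r = 0.3, 74, 45, 12125

# Standard conversion formulas to moments about the mean.
m2 = m2r - m1r ** 2
m3 = m3r - 3 * m2r * m1r + 2 * m1r ** 3
m4 = m4r - 4 * m3r * m1r + 6 * m2r * m1r ** 2 - 3 * m1r ** 4

# Moment ratio b2 measures kurtosis: b2 > 3 leptokurtic, b2 < 3 platykurtic.
b2 = m4 / m2 ** 2

print(round(m2, 2), round(m3, 3), round(m4, 2), round(b2, 2))
print("leptokurtic" if b2 > 3 else "platykurtic" if b2 < 3 else "mesokurtic")
```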
Assignment 4
Question 1

An urn contains 5 white and 7 black balls. Five balls are drawn at random.
a) Find the distribution function of the probability distribution of the number of white balls.
b) Draw the graph of the distribution function.

a. Let X be a random variable which represents the number of white balls drawn. Then the random
variable X takes the values 0, 1, 2, 3, 4, 5 and their probabilities are:

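N = 12, n = 5

$$P(X = 0) = \frac{{}^{5}C_{0} \cdot {}^{7}C_{5}}{{}^{12}C_{5}} = \frac{21}{792} = \frac{7}{264}$$

$$P(X = 1) = \frac{{}^{5}C_{1} \cdot {}^{7}C_{4}}{{}^{12}C_{5}} = \frac{175}{792}$$

$$P(X = 2) = \frac{{}^{5}C_{2} \cdot {}^{7}C_{3}}{{}^{12}C_{5}} = \frac{350}{792} = \frac{175}{396}$$

$$P(X = 3) = \frac{{}^{5}C_{3} \cdot {}^{7}C_{2}}{{}^{12}C_{5}} = \frac{210}{792} = \frac{35}{132}$$

$$P(X = 4) = \frac{{}^{5}C_{4} \cdot {}^{7}C_{1}}{{}^{12}C_{5}} = \frac{35}{792}$$

$$P(X = 5) = \frac{{}^{5}C_{5} \cdot {}^{7}C_{0}}{{}^{12}C_{5}} = \frac{1}{792}$$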

Probability Distribution of X

Number of white balls Probability


X f(X)
0 21/792
1 175/792
2 350/792
3 210/792
4 35/792
5 1/792

In order to obtain the distribution function of the probability distribution, we compute the
Cumulative Probabilities as follows:

Number of white balls Probability Cumulative probability


X f(X) F(X)
0 21/792 21/792
1 175/792 196/792
2 350/792 546/792
3 210/792 756/792
4 35/792 791/792
5 1/792 792/792

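Hence the desired distribution function is:

$$F(x) = \begin{cases}
0 & \text{for } x < 0 \\
21/792 & \text{for } 0 \le x < 1 \\
196/792 & \text{for } 1 \le x < 2 \\
546/792 & \text{for } 2 \le x < 3 \\
756/792 & \text{for } 3 \le x < 4 \\
791/792 & \text{for } 4 \le x < 5 \\
1 & \text{for } x \ge 5
\end{cases}$$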
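
The probabilities and the distribution function can be recomputed exactly with fractions. This is a sketch only; the names `pmf` and `F` are chosen for illustration, and `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

WHITE, BLACK, DRAWN = 5, 7, 5


def pmf(x):
    """Hypergeometric probability of exactly x white balls among the 5 drawn."""
    return Fraction(comb(WHITE, x) * comb(BLACK, DRAWN - x), comb(WHITE + BLACK, DRAWN))


probabilities = [pmf(x) for x in range(DRAWN + 1)]

# Running totals give the cumulative probabilities F(0), ..., F(5).
cumulative = []
running = Fraction(0)
for p in probabilities:
    running += p
    cumulative.append(running)


def F(x):
    """Distribution function P(X <= x) for any real x (a step function)."""
    return sum((p for k, p in enumerate(probabilities) if k <= x), Fraction(0))


for x, (p, c) in enumerate(zip(probabilities, cumulative)):
    print(x, p, c)
```

`Fraction` prints values in lowest terms, so 21/792 appears as 7/264.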

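b. The graph of the distribution function is a step function: F(x) is 0 for x < 0, jumps by f(x) at each of x = 0, 1, 2, 3, 4, 5, and equals 1 for x ≥ 5.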
Question 2

Three balls are drawn at random from a box containing 3 blue balls, 2 red balls and 3
green balls. Let X represent the number of blue balls and Y the number of red balls drawn.
a) Construct the joint distribution of X and Y
b) Find f(x/1)
c) Find P(X=2/Y=0)
Solution:
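The joint probability distribution is determined as follows:

$$f(X = x, Y = y) = \frac{{}^{3}C_{x} \cdot {}^{2}C_{y} \cdot {}^{3}C_{3-x-y}}{{}^{8}C_{3}}, \quad x = 0, 1, 2, 3 \text{ and } y = 0, 1, 2$$

$$f(x = 0, y = 0) = \frac{{}^{3}C_{0} \cdot {}^{2}C_{0} \cdot {}^{3}C_{3}}{{}^{8}C_{3}} = \frac{1}{56}$$

$$f(x = 0, y = 1) = \frac{{}^{3}C_{0} \cdot {}^{2}C_{1} \cdot {}^{3}C_{2}}{{}^{8}C_{3}} = \frac{6}{56}$$

$$f(x = 1, y = 0) = \frac{{}^{3}C_{1} \cdot {}^{2}C_{0} \cdot {}^{3}C_{2}}{{}^{8}C_{3}} = \frac{9}{56}$$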
Similarly we can find the remaining probabilities
a. Joint distribution of X and Y (rows are values of Y, columns are values of X)

Y \ X 0 1 2 3 h(y)
0 1/56 9/56 9/56 1/56 20/56
1 6/56 18/56 6/56 0 30/56
2 3/56 3/56 0 0 6/56
g(x) 10/56 30/56 15/56 1/56 1
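For part (b), f(x/1) denotes the conditional probability function of X given Y = 1:

$$f(x/1) = \frac{f(x, 1)}{h(1)}$$

We first have to find h(1):

$$h(1) = f(0,1) + f(1,1) + f(2,1) + f(3,1) = \frac{6}{56} + \frac{18}{56} + \frac{6}{56} + 0 = \frac{30}{56}$$

Then,

$$f(x/1) = \frac{56}{30} f(x, 1)$$

$$f(0/1) = \frac{56}{30} f(0,1) = \frac{56}{30}\left(\frac{6}{56}\right) = \frac{1}{5}$$

$$f(1/1) = \frac{56}{30} f(1,1) = \frac{56}{30}\left(\frac{18}{56}\right) = \frac{3}{5}$$

$$f(2/1) = \frac{56}{30} f(2,1) = \frac{56}{30}\left(\frac{6}{56}\right) = \frac{1}{5}$$

$$f(3/1) = \frac{56}{30}(0) = 0$$

x 0 1 2 3
f(x/1) 1/5 3/5 1/5 0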

c. P(X=2/Y=0)

$$P(X = 2 / Y = 0) = \frac{f(x = 2, y = 0)}{h(0)} = \frac{9/56}{20/56} = \frac{9}{20}$$
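
The whole joint table, the marginal h(y) and the conditional probabilities can be generated from the counting formula. A minimal sketch follows; the function names `f`, `h` and `conditional_x_given_y` are illustrative.

```python
from fractions import Fraction
from math import comb

BLUE, RED, GREEN, DRAWN = 3, 2, 3, 3
TOTAL = comb(BLUE + RED + GREEN, DRAWN)  # 8C3 = 56 equally likely samples


def f(x, y):
    """Joint probability of x blue and y red balls among the 3 drawn."""
    green = DRAWN - x - y
    if green < 0 or green > GREEN:
        return Fraction(0)
    return Fraction(comb(BLUE, x) * comb(RED, y) * comb(GREEN, green), TOTAL)


def h(y):
    """Marginal probability of y red balls."""
    return sum(f(x, y) for x in range(BLUE + 1))


def conditional_x_given_y(x, y):
    """Conditional probability of x blue balls given y red balls."""
    return f(x, y) / h(y)


print([conditional_x_given_y(x, 1) for x in range(BLUE + 1)])
print(conditional_x_given_y(2, 0))
```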
Assignment 5
Question 1:
Define Poisson process.
Sol:
A Poisson process represents a situation where events occur randomly over a
specified interval of time or space or length.

a) Given a random variable X with E(X) = 0.63 and Var(X) = 0.2331, find $E(X^2)$.

Sol:
E(X) = 0.63 and Var(X) = 0.2331

$$\text{Var}(X) = E(X^2) - [E(X)]^2$$

Putting the information in the above formula, we get

$$0.2331 = E(X^2) - (0.63)^2$$

$$0.2331 = E(X^2) - 0.3969$$

$$0.2331 + 0.3969 = E(X^2)$$

$$E(X^2) = 0.63$$
Question 2:
a) When do we deal with the discrete uniform distribution?

Sol:
Whenever we have a situation where the various outcomes are equally likely, and of a
form such that the random variable X takes the values 0, 1, 2, ..., n, we are dealing
with the discrete uniform distribution.

b) A random variable X is normally distributed with $\mu = 50$ and $\sigma^2 = 25$. Find
the probability that it will be

I. larger than 54
II. smaller than 57.
Sol:
With $\mu = 50$ and $\sigma^2 = 25$ (so $\sigma = 5$), we have
i) At x = 54

$$Z = \frac{54 - 50}{5} = 0.80$$

Hence, using the table, we have
P(X > 54) = P(Z > 0.8)
= 0.5 - P(0 ≤ Z ≤ 0.8)
= 0.5 - 0.2881 = 0.2119.
ii) At x = 57

$$Z = \frac{57 - 50}{5} = 1.40$$

Therefore, using the table,
P(X < 57) = P(Z < 1.40)
= 0.5 + P(0 ≤ Z ≤ 1.40)
= 0.5 + 0.4192
= 0.9192
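
The table lookups can be cross-checked with the standard library's `statistics.NormalDist` (available from Python 3.8). The variable names are illustrative.

```python
from statistics import NormalDist

# X is normal with mean 50 and standard deviation sqrt(25) = 5.
X = NormalDist(mu=50, sigma=5)

p_larger_than_54 = 1 - X.cdf(54)   # P(X > 54)
p_smaller_than_57 = X.cdf(57)      # P(X < 57)

print(round(p_larger_than_54, 4), round(p_smaller_than_57, 4))
```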

Question 3:
Under which conditions is the Poisson distribution used to approximate the hypergeometric
distribution?
Sol:
The Poisson distribution can be used to approximate the hypergeometric
distribution when
n < 0.05N, n > 20, and p < 0.05
a) A fair coin is tossed 20 times. Find the probability that the number of heads
occurring is between 10 and 14 inclusive by using the normal approximation
to the binomial distribution.

Sol:
Since n = 20, p = 0.5 and q = 1 - p = 0.5,

$$\mu = np = 20(0.5) = 10$$

$$\sigma = \sqrt{npq} = \sqrt{20(0.5)(0.5)} = 2.24$$

For the normal approximation, the interval of discrete values $10 \le X \le 14$ is
replaced by the continuous interval $9.5 \le X \le 14.5$. The corresponding standard
normal values are

$$Z_1 = \frac{9.5 - 10}{2.24} = -0.22$$

$$Z_2 = \frac{14.5 - 10}{2.24} = 2.01$$

Hence, by using the table,
P(10 ≤ X ≤ 14) ≈ P(-0.22 ≤ Z ≤ 2.01)
= P(-0.22 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 2.01)
= 0.0871 + 0.4778
= 0.5649.
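
To see how good the approximation is, the sketch below computes both the continuity-corrected normal value and the exact binomial sum. The variable names are illustrative; because no intermediate rounding is applied here, the approximate value differs slightly in the third decimal place from the table-based 0.5649.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 20, 0.5
mu = n * p
sigma = sqrt(n * p * (1 - p))

# Normal approximation with continuity correction: P(9.5 <= X <= 14.5).
normal = NormalDist(mu, sigma)
approx = normal.cdf(14.5) - normal.cdf(9.5)

# Exact binomial probability P(10 <= X <= 14).
exact = sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(10, 15))

print(round(approx, 4), round(exact, 4))
```

The exact binomial probability is about 0.567, so the approximation is close even for n = 20.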
Assignment 6
Question 1:
Define the following terms:
• Parameter and statistic
A quantity calculated from the population is known as a parameter, whereas a statistic
is a quantity calculated from a sample.

• Sampling distribution of a statistic
The probability distribution of a sample statistic is called the sampling distribution of the statistic.

• Unbiased estimator
An estimator is unbiased if the mean of its sampling distribution is equal to the
population parameter to be estimated.

• Statistical estimation
Statistical estimation is a procedure for making a judgment about the unknown value of
a population parameter by using the sample observations.

• Standard error of a statistic
The standard deviation of the sampling distribution of a statistic is called the standard
error of the statistic.

Question 2:
a) A random variable X has the following probability distribution:

x 4 5 6
P(x) 0.3 0.5 0.2

Find the mean $\mu_{\bar{X}}$ and standard error $\sigma_{\bar{X}}$ of the mean for a random sample of size 2.

Solution:
The calculations are set out in the following table:

x 4 5 6 Total
P(x) 0.3 0.5 0.2 1
xP(x) 1.2 2.5 1.2 4.9
x²P(x) 4.8 12.5 7.2 24.5

$$\mu = E(x) = \sum xP(x) = 4.9$$

$$\sigma^2 = \text{Var}(x) = \sum x^2 P(x) - \left[\sum xP(x)\right]^2 = 24.5 - (4.9)^2 = 0.49$$

$$\sigma = \sqrt{0.49} = 0.7$$

We know that:

$$\mu_{\bar{X}} = \mu = 4.9$$

$$\sigma_{\bar{X}}^2 = \frac{\sigma^2}{n} = \frac{0.49}{2} = 0.245$$

$$\sigma_{\bar{X}} = \sqrt{0.245} = 0.495$$
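
The same results can be obtained by brute force, enumerating every possible sample of size 2 and building the sampling distribution of the mean directly. This sketch assumes the two observations are drawn independently (sampling with replacement), which is what the formula σ²/n presupposes; the variable names are illustrative.

```python
from itertools import product
from math import sqrt

# Probability distribution of X from the question.
dist = {4: 0.3, 5: 0.5, 6: 0.2}
n = 2

mean_of_means = 0.0
second_moment = 0.0
for sample in product(dist, repeat=n):
    # Probability of this ordered sample under independent draws.
    prob = 1.0
    for x in sample:
        prob *= dist[x]
    xbar = sum(sample) / n
    mean_of_means += xbar * prob
    second_moment += xbar ** 2 * prob

standard_error = sqrt(second_moment - mean_of_means ** 2)
print(round(mean_of_means, 2), round(standard_error, 3))
```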

b) It is known that 3% of the persons living in Gujranwala city have a
certain disease. Find the mean and standard error of the sampling distribution of the
proportion of diseased persons in a random sample of 500 persons.
Solution:
The proportion in the population is P = 0.03 and the sample size is n = 500.
Let the sample proportion be $\hat{p}$. Then

$$\mu_{\hat{p}} = P = 0.03$$

and

$$\sigma_{\hat{p}} = \sqrt{\frac{P(1 - P)}{n}} = \sqrt{\frac{0.03(1 - 0.03)}{500}} = 0.00763$$
Question 3:
a) In a random sample of 500 people eating lunch at a hospital cafeteria on various
Fridays, it was found that x = 160 preferred seafood. Find a 95% confidence interval for the
actual proportion of people who eat seafood on Fridays at this cafeteria.
Solution:
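The point estimate of the population proportion is $\hat{p} = \frac{160}{500} = 0.32$. Using the table, we
find $z_{0.05/2} = 1.96$. Therefore the 95% confidence interval is

$$\hat{p} \pm z_{\alpha/2} \sqrt{\frac{\hat{p}(1 - \hat{p})}{n}}$$

$$0.32 \pm 1.96 \sqrt{\frac{(0.32)(0.68)}{500}}$$

$$0.32 \pm 0.04$$

$$(0.28, 0.36)$$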
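
The interval can be recomputed without tables by taking the critical value from the standard normal inverse CDF. This is a sketch of the same large-sample interval; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n, confidence = 160, 500, 0.95

p_hat = x / n
# Two-sided critical value z_{alpha/2}; about 1.96 for 95% confidence.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
margin = z * sqrt(p_hat * (1 - p_hat) / n)
lower, upper = p_hat - margin, p_hat + margin

print(round(lower, 2), round(upper, 2))
```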
b)
The mean and standard deviation of the quality grade-point averages of a random sample
are calculated to be 2.6 and 0.3. How large a sample is required if we want to be 95%
confident that our estimate of $\mu$ is not off by more than 0.05?
Solution:
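We know that

$$n = \left(\frac{z_{\alpha/2}\,\hat{\sigma}}{e}\right)^2$$

As given,

$$z_{\alpha/2} = 1.96, \quad \hat{\sigma} = 0.3, \quad e = 0.05$$

By substituting the given values,

$$n = \left(\frac{z_{\alpha/2}\,\hat{\sigma}}{e}\right)^2 = \left(\frac{1.96 \times 0.3}{0.05}\right)^2 = 138.3$$

Rounding up to the next whole number, so that the error does not exceed 0.05, the
required sample size is n = 139.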
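
A short sketch of the sample-size calculation follows. It rounds the result up with `ceil`, a common convention so that the stated margin of error is not exceeded; the variable names are illustrative.

```python
from math import ceil
from statistics import NormalDist

sigma_hat, e, confidence = 0.3, 0.05, 0.95

# Two-sided critical value z_{alpha/2}.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
n_exact = (z * sigma_hat / e) ** 2

# Round up: any smaller whole-number sample would give a margin above e.
n_required = ceil(n_exact)

print(round(n_exact, 1), n_required)
```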
Assignment 7
Question 1:
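Prove that when n is large, $s^2$ is approximately equal to $S^2$.
Solution:
As we know that

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \;\Rightarrow\; \sum (x - \bar{x})^2 = (n - 1)s^2$$

whereas

$$S^2 = \frac{\sum (x - \bar{x})^2}{n} \;\Rightarrow\; \sum (x - \bar{x})^2 = nS^2$$

Hence

$$(n - 1)s^2 = nS^2 \;\Rightarrow\; S^2 = \frac{(n - 1)}{n}s^2 = \left(1 - \frac{1}{n}\right)s^2$$

Now, as $n \to \infty$, $\frac{1}{n} \to 0$.

Hence, if n is large,

$$S^2 \approx s^2$$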
(a) A random sample of 100 workers with children in day care shows a mean day-care
cost of Rs. 2650 and a standard deviation of Rs. 500. Verify the department’s claim
that the mean exceeds Rs. 2500 at the 0.05 level with this information.
Step 1:

$$H_0: \mu \le 2500$$

$$H_1: \mu > 2500 \quad \text{(one-sided test)}$$

Step 2:

$$\alpha = 0.05$$

Step 3:

$$z = \frac{\bar{x} - \mu_0}{S / \sqrt{n}} = \frac{2650 - 2500}{500 / \sqrt{100}} = \frac{150}{50} = 3$$

Step 4:
The critical region for $\alpha = 0.05$ is $z > 1.645$.
Step 5:
Since the calculated value z = 3 falls in the critical region, we reject $H_0$ and accept
the alternative hypothesis: the sample supports the claim that the mean exceeds Rs. 2500.
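
The five steps translate directly into code. This is a minimal sketch of the one-sided large-sample z test; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x_bar, mu_0, s, n, alpha = 2650, 2500, 500, 100, 0.05

# Test statistic for H0: mu <= 2500 against H1: mu > 2500.
z = (x_bar - mu_0) / (s / sqrt(n))

# Upper-tail critical value; about 1.645 for alpha = 0.05.
z_critical = NormalDist().inv_cdf(1 - alpha)
reject_h0 = z > z_critical

print(z, round(z_critical, 3), reject_h0)
```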

Question 2:
(a) A random sample of size n is drawn from a normal population with mean 5 and
variance $\sigma^2$. Answer the following:

If s = 15, $\bar{x}$ = 14 and t = 3, what is the value of n?


Solution:
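As we know that

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

$$3 = \frac{14 - 5}{15 / \sqrt{n}}$$

$$3\left(\frac{15}{\sqrt{n}}\right) = 9$$

$$45 = 9\sqrt{n}$$

$$\sqrt{n} = 5$$

$$n = 25$$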

(b) In a poll of college students in a large university, 300 of 400 students living in
students’ residences (hostels) approved a certain course of action, whereas 200 of 300
students not living in students’ residences approved it. Compute the 90% confidence
interval for this difference.
Solution:
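Let

$$\hat{p}_1 = \frac{300}{400} = 0.75 \;\Rightarrow\; \hat{q}_1 = 1 - \hat{p}_1 = 1 - 0.75 = 0.25$$

and

$$\hat{p}_2 = \frac{200}{300} = 0.67 \;\Rightarrow\; \hat{q}_2 = 1 - \hat{p}_2 = 1 - 0.67 = 0.33$$

$$\alpha = 0.10 \;\Rightarrow\; z_{0.05} = 1.645$$

The 90% confidence interval for $p_1 - p_2$ is

$$(\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_1 \hat{q}_1}{n_1} + \frac{\hat{p}_2 \hat{q}_2}{n_2}}$$

$$(0.75 - 0.67) \pm 1.645 \sqrt{\frac{(0.75)(0.25)}{400} + \frac{(0.67)(0.33)}{300}}$$

$$0.08 \pm 1.645(0.0347)$$

$$0.08 \pm 0.057$$

$$(0.023, 0.137)$$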
