Formula
1. Mean for grouped data, using assumed mean with step deviation method
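$\text{Mean} = A + \dfrac{\sum fd}{n} \times c$
Where –
A is the assumed mean
d is the deviation of each class mid-point from the assumed mean, divided by the common class interval
f is the class frequency, n is the total frequency (∑f) and c is the common class interval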
Statistics Page 1
OBJECTIVE QUESTIONS
Choose the best answer / Fill in the blanks / True or False –
1. If the classes are of the form 0 – 10, 10 – 20, etc., they are called _______________ classes
2. If the classes are of the form 1 – 10, 11 – 20, etc., they are called _________________ classes
3. If the classes are of the form 0 – 10, 10 – 20, etc., an item of value 10 will be entered in –
a. Class 0 – 10
b. Class 10 – 20
c. Either of the above
d. None of the above
4. If the classes are of the form 0 - 10, 10 – 20, etc the class interval is ____________
5. If the classes are of the form 0 - 10, etc the mid point of class is ____________
6. Number of observations falling within a class is called - Class _____________
7. Ogive means –
a. Cumulative frequency curve
b. Frequency Curve
c. Mathematical Average
d. Arithmetic Mean
8. Data can be in ________________________ or _____________________form.
9. The measures of central tendency are ______________, ______________ & _________________
10. Mean, Median and Mode are known as –
a. Measures of Central Tendency
b. Measures of Dispersion
c. Measures of Middle Values
d. Measures of Mathematical Averages
11. If all the items in a distribution are of the same value, then-
a. Mean = Median = Mode
b. Mean > Median > Mode
c. Mean < Median < Mode
d. Mean + Median = Mode
12. The sum of deviations of all observations from the Arithmetic Mean is ____________
13. In a symmetrical distribution-
a. Mean = Median = Mode
b. Mean > Median > Mode
c. Mean < Median < Mode
d. Mean + Median = Mode
14. Empirical formula about measures of central tendency given by Karl Pearson for an asymmetrical distribution is –
a. Mean – Mode = 3 (Mean – Median)
b. 2 Mode = (Mean + Median)
c. 2 Mean = (Mode + Median)
d. 2 Median = (Mode + Mean)
15. Quartiles are _____________________
16. Percentiles are ____________________
17. Deciles are _____________________
18. True or False
a. The following measures are affected when the highest value in a set of observations is altered
b. The following measures are affected when the lowest value in a set of observations is altered
c. The following measures are affected when the highest value and the lowest in a set of observations are altered
d. The following measures are affected when each value in a set of observations is increased or decreased by a constant value
e. The following measures are affected when each value in a set of observations is multiplied or divided by a constant value
Measure a b c d e
Mean
Median
Mode
PROBLEMS
CALCULATE THE MEASURES OF CENTRAL TENDENCY AND THE FIVE NUMBER SUMMARY FOR THE
FOLLOWING DATA
1. Data pertaining to marks of students and ages of people is given below
a. Marks of students in a test are 48, 60, 59, 67, 66, 78
b. Ages of people in a group are 70, 72, 63, 56, 37, 82, 55, 85, 63
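As a rough check on Problem 1, the sketch below computes the mean, median, mode and five-number summary with Python's standard `statistics` module. The quartile rule used here (median of the lower and upper halves, leaving out the middle value when the count is odd) is an assumption; other textbooks use slightly different conventions and may give slightly different quartiles. The helper name `five_number_summary` is illustrative only.

```python
import statistics

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule (assumed convention)."""
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]     # values below the median position
    upper = data[-half:]    # values above the median position
    return (data[0], statistics.median(lower), statistics.median(data),
            statistics.median(upper), data[-1])

marks = [48, 60, 59, 67, 66, 78]
ages = [70, 72, 63, 56, 37, 82, 55, 85, 63]

for name, values in (("Marks", marks), ("Ages", ages)):
    print(name)
    print("  mean    =", round(statistics.mean(values), 2))
    print("  median  =", statistics.median(values))
    print("  mode(s) =", statistics.multimode(values))
    print("  five-number summary =", five_number_summary(values))
```

Every mark occurs exactly once, so `multimode` returns all six values, which is its way of saying that the marks have no single mode.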
MEASURES OF DISPERSION
OBJECTIVE QUESTIONS
Choose the best answer / Fill in the blanks / True or False -
1. The measure of degree of scatter of the data from the central value is
a. Dispersion
b. Skewness
c. Average
d. Mean
2. ______________is the difference between the largest and the smallest value of the variable
3. Quartile deviation is otherwise called as –
a. Quartile Range
b. Inter quartile range
c. Intra quartile range
d. Semi inter quartile range
4. Mean deviation is otherwise called as –
a. Average deviation
b. Dispersion
c. Difference
d. Zero sum
5. The relative measure of standard deviation is called ___________________________________
6. Square of standard deviation is called _______________________________
7. Sum of squares of deviation is minimum when taken from ___________________
8. Sum of absolute deviation is minimum when taken from _____________
9. Inter quartile range is
a. Q3 – Q1
b. Q1 – Q2
c. Q2 – Q1
d. Q3 – Q2
10. True or False
a. The following measures are affected when the highest value in a set of observations is altered
b. The following measures are affected when the lowest value in a set of observations is altered
c. The following measures are affected when the highest value and the lowest in a set of observations are altered
d. The following measures are affected when each value in a set of observations is increased or decreased by a constant value
e. The following measures are affected when each value in a set of observations is multiplied or divided by a constant value
Measure a b c d e
Range
Mean Deviation
Quartile Deviation
Standard Deviation
Variance
PROBLEMS
CALCULATE THE MEASURES OF DISPERSION FOR THE FOLLOWING DATA
1. The following are the runs scored by two cricketers in 10 innings.
a. Find which batsman is a better player
b. Find out which batsman is more consistent (more reliable)
Batsman I 16 8 24 56 90 104 48 32 8 14
Batsman II 42 56 43 37 31 45 50 29 30 27
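One common way to answer Problem 1 is to compare the means (the higher average is taken as the better player) and the coefficients of variation (the lower CV is taken as the more consistent player). The sketch below follows that reading. It assumes the population standard deviation (dividing by n), and the function name is illustrative.

```python
import statistics

batsman_1 = [16, 8, 24, 56, 90, 104, 48, 32, 8, 14]
batsman_2 = [42, 56, 43, 37, 31, 45, 50, 29, 30, 27]

def coefficient_of_variation(scores):
    """CV (%) = population standard deviation / mean * 100."""
    return statistics.pstdev(scores) / statistics.mean(scores) * 100

for name, scores in (("Batsman I", batsman_1), ("Batsman II", batsman_2)):
    print(name, "mean =", statistics.mean(scores),
          "CV% =", round(coefficient_of_variation(scores), 2))
```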
3. A factory produced two types of electric bulbs A and B. In a study about the life of bulbs, the following results were
obtained
c. Find which type of bulb is long lasting
d. Find out which type of bulb is more variable
Length of Life (in hours)    A (no. of bulbs)    B (no. of bulbs)
60 – 80                      10                  8
80 – 100                     22                  60
100 – 120                    52                  24
120 – 140                    20                  16
140 – 160                    16                  12
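The bulb data can be worked with the step-deviation method for grouped data. The sketch below assumes an assumed mean of 110 and a common class interval of 20 (both our choices), and uses the usual grouped-data standard deviation c·√(∑fd²/n − (∑fd/n)²), which is an assumption since this section does not state it. Function and variable names are illustrative.

```python
import math

midpoints = [70, 90, 110, 130, 150]   # mid-points of the classes 60-80 ... 140-160
freq_a = [10, 22, 52, 20, 16]
freq_b = [8, 60, 24, 16, 12]
A, c = 110, 20                        # assumed mean and common class interval (our choice)

def grouped_mean_sd(freqs):
    """Mean and standard deviation of grouped data by the step-deviation method."""
    n = sum(freqs)
    d = [(m - A) / c for m in midpoints]                  # step deviations
    sum_fd = sum(f * di for f, di in zip(freqs, d))
    sum_fd2 = sum(f * di * di for f, di in zip(freqs, d))
    mean = A + sum_fd / n * c
    sd = c * math.sqrt(sum_fd2 / n - (sum_fd / n) ** 2)
    return mean, sd

for label, freqs in (("A", freq_a), ("B", freq_b)):
    mean, sd = grouped_mean_sd(freqs)
    print(f"Type {label}: mean = {mean:.2f} h, SD = {sd:.2f} h, CV = {sd / mean * 100:.2f}%")
```

The type with the higher mean is the longer lasting one, and the type with the higher coefficient of variation is the more variable one.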
CORRELATION AND REGRESSION
1. Correlation measures the degree of relationship between two or more variables
a. The symbol for measuring correlation is ‘r’
b. ‘r’ lies between -1 and +1
c. Correlation is independent of origin and scale
d. Correlation is symmetric with respect to the variables
e. It is independent of units
f. Correlation means relationship and not causation
2. Understanding why association exists -
a. Dependency
b. Nature and strength of association
c. Causation
d. Coincidental relationship
e. Influence of other variables
3. Important types of correlation are –
a. Positive and negative correlation
b. Linear and non-linear correlation
c. Simple, partial and multiple correlation
• Lag and lead in correlation
a. Difference in periods for cause and effect relationship to be established is known as lag and lead
b. Advertisement and marketing expenses may lead to sales with a lag
c. Additional supply of materials today may lead to reduction in prices after some time
d. Effect of increase in income may lead to increase in expenditure and savings after a period
e. Boom in agricultural produce may lead to increase in industrial output after a gap of time
• Regression
a. Regression is a functional relationship between the value of 2 variables
b. With the help of regression lines we can predict most likely value of one variable given the other
c. If x and y are two variables, then y can be represented as equal to ax + b or x is equal to cy + d where a, b, c, and d
are constants. These are known as linear regression equations
d. Rate of change of one variable to unit change in other variable is called regression coefficient
e. The regression lines intersect at (x̄, ȳ), where x̄ and ȳ are the means of x and y respectively
f. If r = 0, then the regression lines will be perpendicular to each other
g. If r = ± 1, then the regression lines will coincide
h. r is the geometric mean of the regression coefficients
i. Both the regression coefficients are either positive or negative
j. At least 1 regression coefficient must be numerically less than unity
k. Regression coefficients are independent of origin but not scale
Formula-
1. Methods of Correlation
a. Karl Pearson’s Coefficient of Correlation
Assumed mean method
$r = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{\sqrt{N\sum dx^2 - (\sum dx)^2}\;\sqrt{N\sum dy^2 - (\sum dy)^2}}$
Where dx is (each value of x – assumed mean of x), dy is (each value of y – assumed mean of y) and N is the number of observations
Direct method
$r = \dfrac{N\sum xy - \sum x \sum y}{\sqrt{N\sum x^2 - (\sum x)^2}\;\sqrt{N\sum y^2 - (\sum y)^2}}$
Where x and y are the observed values of the two variables and N is the number of observations
(Note: Karl Pearson’s coefficient of correlation is also called product moment correlation)
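A minimal sketch of the direct method is given below. It is applied to the age-of-cars data from the correlation problems later in this document. The function name `pearson_r` is illustrative.

```python
import math

def pearson_r(xs, ys):
    """Karl Pearson's r by the direct method."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt(n * sum_x2 - sum_x ** 2) * math.sqrt(n * sum_y2 - sum_y ** 2)
    return numerator / denominator

age_of_cars = [2, 4, 6, 8, 10]
maintenance_cost = [1600, 1500, 1800, 1700, 2100]
r = pearson_r(age_of_cars, maintenance_cost)
print(round(r, 4))
```

Here N∑xy − ∑x∑y = 12000, N∑x² − (∑x)² = 200 and N∑y² − (∑y)² = 1060000, so r is about 0.82, a fairly strong positive correlation.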
b. Spearman’s Rank Correlation
WHEN RANKS ARE NOT GIVEN OR UNEQUAL RANKS GIVEN
$R = 1 - \dfrac{6\sum d^2}{n(n^2 - 1)}$
Where d is the difference between the ranks of the x and y variables and n is the number of observations
WHEN EQUAL (TIED) RANKS ARE GIVEN
$R = 1 - \dfrac{6\left(\sum d^2 + \frac{1}{12}\sum (m_i^3 - m_i)\right)}{n(n^2 - 1)}$
Where d is the difference between the ranks of the x and y variables, n is the number of observations and m_i is the number of times a rank is repeated in the first or second variable
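The tie-corrected rank formula can be sketched as follows, using the test and interview marks from the rank correlation problem later in this document. Tied values are given the average of the ranks they occupy, which is the usual convention and is assumed here. The helper names are illustrative.

```python
from collections import Counter

def average_ranks(values):
    """Rank values in ascending order, giving tied values the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]

def spearman_r(xs, ys):
    """Rank correlation with the tie correction sum((m^3 - m) / 12)."""
    n = len(xs)
    rank_x, rank_y = average_ranks(xs), average_ranks(ys)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    # A value that occurs once has m = 1 and contributes nothing to the correction.
    correction = sum((m ** 3 - m) / 12
                     for series in (xs, ys)
                     for m in Counter(series).values())
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))

test_marks = [24, 33, 33, 42, 53, 60, 60, 60, 71, 75]
interview_marks = [38, 40, 44, 50, 49, 45, 52, 50, 55, 68]
print(round(spearman_r(test_marks, interview_marks), 4))
```

For this data ∑d² = 17 and the tie correction is 0.5 + 2 + 0.5 = 3, so R = 1 − 6(20)/990, about 0.88.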
c. Two way Frequency Table
Steps-
Take step-deviations of x and y from their assumed means and denote them dx and dy
Multiply dx, dy and the frequency of each cell and note the figure in the upper right hand corner of each cell
Add all values of fdxdy and obtain ∑fdxdy
Multiply frequencies of variable x by deviations of variable x and obtain ∑fdx
Take square of deviations of variable x and multiply by frequencies to obtain ∑fdx²
Multiply frequencies of variable y by deviations of variable y and obtain ∑fdy
Take square of deviations of variable y and multiply by frequencies to obtain ∑fdy²
Substitute the values in the formula to obtain r
d. Concurrent Deviation Method
$R = \pm\sqrt{\pm\dfrac{2C - n}{n}}$
Where C is the number of concurrent deviations (pairs in which x and y change in the same direction from the previous pair) and n is the number of pairs of deviations compared
4. Probable Error
$PE = 0.6745 \times \dfrac{1 - r^2}{\sqrt{n}}$
Where r is correlation and n is number of pairs observed
$SE = \dfrac{1 - r^2}{\sqrt{n}}$
Where r is correlation and n is number of pairs observed
ρ (Rho) is r ± PE
5. Calculation of Regression Equation
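a. $(x - \bar{x}) = r\dfrac{\sigma_x}{\sigma_y}(y - \bar{y})$
Where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively and $r\dfrac{\sigma_x}{\sigma_y}$ is called the regression coefficient of x on y
b. $(y - \bar{y}) = r\dfrac{\sigma_y}{\sigma_x}(x - \bar{x})$
Where $\bar{x}$ and $\bar{y}$ are the means of x and y respectively and $r\dfrac{\sigma_y}{\sigma_x}$ is called the regression coefficient of y on x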
c. Fitting a straight line y on x –
Equation is Y = a + bX
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
Solving these two normal equations simultaneously (eliminating ‘a’) gives the value of b as mentioned below
$b_{yx} = r\dfrac{\sigma_y}{\sigma_x} = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dx^2 - (\sum dx)^2}$
Where dx is (each value of x – assumed mean of x), dy is (each value of y – assumed mean of y) and N is the number of observations
d. Fitting a straight line x on y -
$b_{xy} = r\dfrac{\sigma_x}{\sigma_y} = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dy^2 - (\sum dy)^2}$
Where dx is (each value of x – assumed mean of x), dy is (each value of y – assumed mean of y) and N is the number of observations
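The two regression lines can be sketched as below, using the advertisement and sales figures from the correlation problems later in this document. Taking the assumed means as zero makes dx = x and dy = y, so the code works directly with the raw values. Each intercept follows from the fact that the line passes through the means. All names are illustrative.

```python
def regression_lines(xs, ys):
    """Return (b_yx, a_yx, b_xy, a_xy) for y = a_yx + b_yx*x and x = a_xy + b_xy*y."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    s_xy = n * sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y
    s_xx = n * sum(x * x for x in xs) - sum_x ** 2
    s_yy = n * sum(y * y for y in ys) - sum_y ** 2
    b_yx = s_xy / s_xx                      # regression coefficient of y on x
    b_xy = s_xy / s_yy                      # regression coefficient of x on y
    mean_x, mean_y = sum_x / n, sum_y / n
    return b_yx, mean_y - b_yx * mean_x, b_xy, mean_x - b_xy * mean_y

advt = [20, 25, 28, 32, 36, 34]     # advertisement expenses (Rs 000s)
sales = [30, 36, 40, 42, 45, 40]    # sales (Rs lakhs)
b_yx, a_yx, b_xy, a_xy = regression_lines(advt, sales)
print(f"Sales on Advt: y = {a_yx:.3f} + {b_yx:.3f} x")
print(f"Advt on Sales: x = {a_xy:.3f} + {b_xy:.3f} y")
# Both coefficients are positive here, so r is the positive square root of their product.
print("r =", round((b_yx * b_xy) ** 0.5, 4))
```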
e. Fitting a parabolic curve or a second degree equation-
Equation is $Y = a + bX + cX^2$
$\sum y = na + b\sum x + c\sum x^2$
$\sum xy = a\sum x + b\sum x^2 + c\sum x^3$
$\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
f. Multiple Regression Equations
For 3 variables, equation is X = a + bY + cZ
$\sum x = na + b\sum y + c\sum z$
$\sum xy = a\sum y + b\sum y^2 + c\sum yz$
$\sum xz = a\sum z + b\sum yz + c\sum z^2$
Similarly, it can be done for N variables.
OBJECTIVE QUESTIONS
e. If the correlation between 2 variables is 0, then the variables are independent
15. Do the following items have positive, negative or zero correlation
a. Price and demand
b. Age and life expectancy
c. Age of husband and wife
d. Income and savings of a person
PROBLEMS
CALCULATE CORRELATION FOR THE FOLLOWING DATA
1. Find the correlation and also regression equations between advertisement expenses and sales of a particular brand of ice-cream, Dippy-Dip
Month Jan Feb Mar Apr May Jun
Advt. Exp (Rs 000s) 20 25 28 32 36 34
Sales (Rs lakhs) 30 36 40 42 45 40
2. Find correlation and also regression equations between marks in statistics and accounting of a particular group of students
Roll No of student 101 102 103 104 105
Statistics marks 45 66 58 74 81
Accounting marks 79 56 61 48 40
3. Find correlation and regression equations between age of cars and annual maintenance cost
Age of cars 2 4 6 8 10
Annual maintenance cost 1600 1500 1800 1700 2100
4. Find rank correlation between marks in test and marks in interview of a group of candidates in a job selection procedure
Marks in Test 24 33 33 42 53 60 60 60 71 75
Marks in Interview 38 40 44 50 49 45 52 50 55 68
6. Excel Pharma has launched a new preventive medicine for the treatment of Swine Flu. The data below is the effect on 100
patients who have taken the medicine against 100 patients who have not taken the medicine and being admitted to the
hospital with viral infection. 98% are free from Swine Flu in the first case vs. 21% who are infected with Swine Flu in the
second case. Excel Pharma is claiming a very high success rate on use of their medicine. Comment
7. Following is the data pertaining to the sensex value and the gold price as on 1st of month from Jan to Sep 2010. What will be
the sensex value in Oct 2010, if the gold price will increase by 10% for diwali purchase season?
MONTH JAN 10 FEB 10 MAR 10 APR 10 MAY 10 JUN 10 JUL 10 AUG 10 SEP 10
24 Ct Gold Price/gm 1500 1550 1600 1620 1700 1750 1800 1850 1900
Sensex 14000 15000 1550 15500 16000 17000 17500 18000 18500
8. Find the multiple linear regression equation of X on Y and Z from the data given below-
X 2 4 6 8
Y 3 5 7 9
Z 4 6 8 10
9. (Please find below an article printed in the front page of Chennai Times)
Chennai
During our recent investigations, it was found that five Chennai cricket players, Sairam, Sandeep, Sankar, Sundar, and Suresh
are deeply involved with the betting syndicate. It has been confirmed by our sources that these players willfully
underperformed in the recently concluded ODI series against the Bangalore team. In the table below are the batting scores of
these five players along with the team score and the result of the matches in the recently concluded Friendship series.
Player   Career Batting Average   1st ODI   2nd ODI   3rd ODI   4th ODI   5th ODI
Sairam 28 41 19 12 33 30
Sandeep 26 17 19 17 71 10
Sankar 41 33 42 39 36 45
Sundar 85 89 112 58 90 67
Suresh 34 0 3 2 1 1
Team Chennai 224 272 212 171 265 178
Result 60% WON WON LOST LOST WON LOST
Further, it was predicted by the paper in a letter to the board that the players will under perform in their matches against
Mumbai also and the prediction factor was given to the Chennai Police much in advance before the actual matches were
played. The table contains scores calculated by the prediction factor vs. actual scores for the five Chennai players in the one
off ODI match against Mumbai
Please give your comments about these investigations and the truth in the allegations against the players.
TIME SERIES
Time Series - It is an arrangement of data in chronological order, according to time of occurrence. Any series of measurements
of a variable taken over time is called a time series.
Mathematical Models
• Additive Model
Y = T + S + C + I, where T is the trend, S the seasonal, C the cyclical and I the irregular component
Components are independent of each other
Different components are expressed in original units and are residuals
S, C & I are expressed as deviations from T
• Multiplicative Model
Y=T*S*C*I
S, C & I are expressed as ratios or in percentages
Components may be dependent on each other
Mostly used in real life practice
• Preliminary adjustments before Analyzing Time Series
o Time Variation - Adjusting for no. of days in a month
o Population Variation - Adjust for variables affected by population like per capita income
o Price Changes - Use real values rather than nominal values
o Comparability - Make data homogeneous and comparable
o Miscellaneous Changes
Measurement of Trend
Freehand or Graphic Method
• Simplest and Most Flexible Method
• First step to plot points on a paper
• Then, draw a freehand smooth curve through points
• Number of points above curve and below curve should be equal
• Total deviations should be zero
• Sum of square of deviations should be the minimum possible
Merits and Demerits of Graphic Method
Merits
• Simple and time saving
• No mathematical calculation required
• Very flexible
Demerits
• Highly subjective
• Hence, not suitable for forecasting and decision making
Method of Moving Averages
• Method helps to reduce fluctuations and obtain trend values with fair degree of accuracy
• Method consists of taking arithmetic mean of the values for a certain time span and placing at the centre of time
span
• When the period covers an even number of years, the centered moving average has to be found
• In some cases, weights may be given to the moving averages called weighted moving average
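The moving-average idea can be sketched as follows, using the 2007 to 2009 quarterly figures from the ratio to moving average problem later in this document. The period is 4 (even), so adjacent moving averages are averaged again to centre them. Function names are illustrative.

```python
def moving_average(values, period):
    """Simple moving averages of the given period (one value per complete window)."""
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]

def centred_moving_average(values, period):
    """For an even period, average each adjacent pair of moving averages to centre them."""
    averages = moving_average(values, period)
    return [(a + b) / 2 for a, b in zip(averages, averages[1:])]

quarterly = [68, 62, 61, 63, 65, 58, 66, 61, 68, 63, 63, 67]   # 2007 Q1 to 2009 Q4
trend = centred_moving_average(quarterly, 4)
# The first centred value lines up with 2007 Q3 and the last with 2009 Q2.
print(trend)
```

Twelve quarters give only eight centred values, which illustrates the demerit that no trend values are available for the first and last periods.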
Merits and Demerits of Method of Moving Averages
Merits
• Simple and Objective method
• Flexible to add additional data without affecting calculations
• If period of moving average coincides with period of cyclical fluctuations, then they are automatically eliminated
Demerits
• No trend values for some initial and end periods
• No functional relationship between value and time
• Difficulty in selecting period of moving average
• Bias in case the trend is non-linear
Selection of type of trend
• If first differences are constant, use linear method
• If second differences are constant, use quadratic method
• If first differences of logarithm are constant, use exponential curve
• If first differences tend to decrease by a constant percentage, use modified exponential curve
De-Seasonalisation of Data
• Elimination of seasonal variation is called as de-seasonalisation of data
• Either additive or multiplicative models are used
Measurement of Cyclical Variations
Residual Method
• Eliminate Trends and Seasonal Variations from the original data using additive or multiplicative models
• Irregular variations are removed from this data by using the method of moving averages of appropriate period
• Cyclical variations are the only variations left and can be measured now
Measurement of Irregular Variations
• Using additive or multiplicative models by removing trend, seasonal or cyclical variations
• They are found to be of small magnitude
Forecasting of Data
Qualitative Forecasting
• When historical data are not available
Quantitative Forecasting
• When historical data available
• Causal forecasting methods
• Time Series forecasting methods
Objective Questions
CHOOSE THE BEST ANSWER / FILL IN THE BLANKS / TRUE OR FALSE
1. With which form of time series would you associate the following-
a. A fire in the factory delaying production for three weeks
b. Need for increased wheat production due to rise in the population
c. Change in day temperature from winter to summer
d. Increase in employment during harvest time
e. Price hike in petroleum products due to Gulf war
2. Fill in the blanks
a. An overall rise or fall in a time series is called____________
b. A time series consists of data arranged in _________________ order
c. The additive model is expressed as Y = ________________________
d. The multiplicative model is expressed as Y = ________________________
e. The trend line obtained by the method of least squares is known as line of __________
f. The component of time series useful for long-term forecasting is _____________
g. For the annual data _______________________component of time series is missing
h. If growth rate is constant, the trend line is _____________
i. A polynomial of the form Y = a + bX + cX² is called _______________________
j. Trend is the overall tendency of the time series data to _____________ or _______________ over a long period of time
k. Seasonal variations are variations with periods of _________________ and are mostly caused by _________________
3. Choose the correct answer
a. Trend refers to a long term tendency to
i. Increase only
ii. Decrease only
iii. Increase or Decrease
iv. None of the above
b. If trend is absent in a time series, seasonal indices are obtained by using
i. Method of simple averages
ii. Ratio to trend method
iii. Ratio to moving average method
iv. Method of least squares
c. The most widely used method of measuring seasonal variations is
i. Method of simple averages
ii. Ratio to trend method
iii. Ratio to moving average method
iv. Link relative method
d. The method used in the study of cyclical variations is
i. Ratio to trend method
ii. Ratio to moving average method
iii. Link relative method
iv. Residual method
PROBLEMS
2. Following is the data pertaining to the sensex value and the gold price as on 1st of month from Jan to Sep 2010. What will be
the sensex value in Oct 2010, if the gold price will increase by 10% for diwali purchase season?
Month Jan 10 Feb 10 Mar 10 Apr 10 May 10 Jun 10 Jul 10 Aug 10 Sep 10
24 Ct Gold Price/gm 1500 1550 1600 1620 1700 1750 1800 1850 1900
Sensex 14000 15000 1550 15500 16000 17000 17500 18000 18500
Year Q1 Q2 Q3 Q4
2005 73 67 66 68
2006 70 63 61 66
2007 73 68 68 72
2008 75 64 61 67
2009 65 60 56 63
4. Monthly data pertaining to rice production in lakhs of tonnes for the period of Jan 2007 to Dec 2009
5. Calculate the seasonal variations by ratio to trend method for the following data from 2005 to 2009
Year IQ II Q III Q IV Q
2005 30 40 36 34
2006 34 52 50 44
2007 40 58 54 48
2008 54 76 68 62
2009 80 92 86 82
6. Calculate the seasonal variations by ratio to moving average method for the following data from 2007 to 2009
Year IQ II Q III Q IV Q
2007 68 62 61 63
2008 65 58 66 61
2009 68 63 63 67
PROBABILITY
Concepts
Probability is the mathematics of chance. A probability experiment is a chance process that leads to well defined outcomes or
results. An outcome is the result of a single trial of a probability experiment, and each outcome occurs at random. In the
classical approach, each outcome of the experiment is assumed to be equally likely. A trial means, for example, tossing a coin
once, rolling a die or drawing a single card from the deck. The set of all outcomes of a probability experiment is called a
sample space. An event is one or more outcomes of a sample space. An event with a single outcome is called a simple event
and one with two or more outcomes is called a compound event.
Rules –
1. The probability of any event will always be from 0 to 1
2. When an event cannot occur (impossible event), the probability will be 0
3. When an event is certain to occur, the probability is 1
4. The sum of the probabilities of all the outcomes in the sample space is 1
5. The probability that an event will not occur = (1 – probability that event will occur)
Sample space can be represented in two ways: tree diagrams and tables.
A tree diagram can be used to determine the outcome of a probability experiment. A tree diagram consists of branches
corresponding to the outcomes of two or more probability experiments that are done in sequence.
Sample spaces can also be represented using tables. For example, the outcomes when selecting a card from an ordinary deck can
be represented by a table. When two dice are rolled, 36 outcomes can be represented by using a table. Once a sample space is
found, probabilities can be computed for specific events
Addition Rules-
Many times in probability, it is necessary to find the probability that one or another of two or more events occurs. In these
cases, the addition rules are used.
When the events are mutually exclusive, they have no outcome in common.
P (A or B) = P (A) + P (B)
When the two events are not mutually exclusive, they have some common outcomes.
P (A or B) = P (A) + P (B) – P (A and B)
The key word in these problems is “or”, and it means add, or union.
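A small illustration of the addition rule for events that are not mutually exclusive is sketched below. The die example and the event names are ours, chosen only to illustrate the rule. Exact fractions are used so the two answers can be compared directly.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}   # one roll of a fair die
even = {2, 4, 6}                    # event A
greater_than_4 = {5, 6}             # event B (shares the outcome 6 with A)

def probability(event):
    """Classical probability: favourable outcomes / total outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))

by_rule = probability(even) + probability(greater_than_4) - probability(even & greater_than_4)
by_counting = probability(even | greater_than_4)
print(by_rule, by_counting)
```

Both routes give 2/3. Leaving out the subtraction would count the shared outcome 6 twice.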
Multiplication Rules-
When two events occur in sequence, the probability that both events occur can be found by using multiplication rules.
When two events are independent, the occurrence of the first event does not affect or change the probability of the
second event occurring.
P (A and B) = P (A) · P (B)
If the events are dependent, the probability of the second event occurring is changed after the first event occurs.
P (A and B) = P (A) · P (B | A)
The key word for the multiplication rule is “and”, and it means intersection.
Conditional Probability –
Conditional probability is used when additional information is known about the probability of an event.
P (B | A) = P (A and B) / P (A)
Odds in favor = P (E) / (1 – P (E))
Odds against = (1 – P (E)) / P (E)
Expected Value-
Mathematical expectations can be thought of as a long term average. If the game is played many times, the average of the
outcomes or the payouts can be computed using mathematical expectation.
E(x) = ∑ x · P(x), where P(x) is the probability of the outcome x
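Expected value as a long-run average can be sketched as below. The payout game is hypothetical and is included only to show the arithmetic. The function name is illustrative.

```python
from fractions import Fraction

def expected_value(distribution):
    """E(x) = sum of x * P(x) over a {value: probability} mapping."""
    assert sum(distribution.values()) == 1, "probabilities must sum to 1"
    return sum(x * p for x, p in distribution.items())

fair_die = {face: Fraction(1, 6) for face in range(1, 7)}
# Hypothetical game: win Rs 10 on a six, lose Rs 1 on any other face.
game = {10: Fraction(1, 6), -1: Fraction(5, 6)}

print(expected_value(fair_die))   # long-run average face value
print(expected_value(game))       # long-run average payout per play
```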
In order to determine the number of outcomes or events, the fundamental counting rule, the permutation rules, and the
combination rule can be used. The difference between a permutation and a combination is that for a permutation, the order or
arrangement of the objects is important. For example, order is important in phone numbers, identification tags, social security
numbers, license plates, dictionary etc. Order is not important when selecting objects from a group.
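The counting rules are available in Python's `math` module (`math.perm` and `math.comb`, Python 3.8 or later). The sketch below also applies the combination rule to the board of directors problem in the Problems list that follows. Treating every group of 4 directors as equally likely is the assumption.

```python
import math
from fractions import Fraction

# Permutation: order matters. Combination: order does not.
print(math.perm(5, 2))   # ordered selections of 2 objects from 5
print(math.comb(5, 2))   # unordered selections of 2 objects from 5

# Board of 7 women and 5 men, 4 directors chosen at random: exactly 2 men.
favourable = math.comb(5, 2) * math.comb(7, 2)
total = math.comb(12, 4)
p_two_men = Fraction(favourable, total)
print(p_two_men, float(p_two_men))
```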
Bayes’ theorem –
P (A | B) = P (B | A) · P (A) / P (B)
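Bayes' theorem can be sketched on the two-box problem from the Problems list later in this section. The code assumes each box is equally likely to be chosen, which the problem does not state explicitly. The function and dictionary names are illustrative.

```python
from fractions import Fraction

def posterior(priors, likelihoods, target):
    """P(target | evidence) = P(evidence | target) P(target) / sum over all hypotheses."""
    total = sum(priors[h] * likelihoods[h] for h in priors)
    return priors[target] * likelihoods[target] / total

# Box 1 holds 2 red and 1 blue ball; Box 2 holds 1 red and 3 blue balls.
priors = {"box 1": Fraction(1, 2), "box 2": Fraction(1, 2)}   # assumed equal chance
p_red = {"box 1": Fraction(2, 3), "box 2": Fraction(1, 4)}
print(posterior(priors, p_red, "box 1"))
```

With these inputs the probability that a red ball came from box 1 is (1/3) / (1/3 + 1/8) = 8/11.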
Probability Distributions –
1. Uniform Distribution- A distribution is said to be uniform if the probability of the variable is equal for all values in the
given interval.
For example – If people arrive at a railway station uniformly over time and a train leaves every 5 minutes, what is the
probability that a person arriving at the station will have to wait for less than a minute?
Arrival times are uniform over the 5-minute interval, so a wait of less than a minute means arriving in the last 1 minute of
the 5, and hence probability = 1/5 = 0.2
2. Binomial Distribution –
• Each trial can only have two outcomes
• There are a fixed number of trials
• The outcome of each trial is independent of each other
• The probability for an outcome must be same for each trial
• $P(r) = {}^{n}C_{r}\, p^{r} (1 - p)^{n - r}$, where n is number of trials, r is number of successes, p is probability of success
3. Poisson Distribution –
• It is used when events occur over an interval of time, area or volume
• $P(x) = \dfrac{e^{-\lambda}\lambda^{x}}{x!}$, where e is the mathematical constant, λ is the mean or expected value and x is the number of successes; the mean
and the variance are both equal to λ (= np)
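The binomial and Poisson probabilities can be sketched as below, applied to the bus commuter and car accident problems from the Problems list later in this section. The function names are illustrative, and λ = np is used for the Poisson case.

```python
import math

def binomial_pmf(r, n, p):
    """P(r successes in n trials) = nCr * p^r * (1 - p)^(n - r)."""
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(x, lam):
    """P(x events) = e^(-lam) * lam^x / x!."""
    return math.exp(-lam) * lam ** x / math.factorial(x)

# 8 workers, 30% ride the bus, exactly 3 riders.
print(round(binomial_pmf(3, 8, 0.3), 4))
# 500 cars, accident probability 0.01, so lam = n * p = 5; exactly 4 accidents.
print(round(poisson_pmf(4, 500 * 0.01), 4))
```

The two probabilities come to about 0.2541 and 0.1755 respectively.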
4. Normal Distribution –
• It is bell shaped and symmetric about the mean and continuous and asymptotic to the axis
• Area under the curve is 1
• The mean, median and mode are at the centre of the distribution
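Normal probabilities can be read from tables or computed with `statistics.NormalDist` from the Python standard library. The sketch below applies it to the exam and kangaroo problems from the Problems list later in this section. Note that both problems give the variance, so the standard deviation is its square root.

```python
from statistics import NormalDist

# Exam marks: mean 200, variance 400, so standard deviation = 20.
exam = NormalDist(mu=200, sigma=400 ** 0.5)
p_above_230 = 1 - exam.cdf(230)

# Kangaroo heights: mean 64, variance 4, so standard deviation = 2.
height = NormalDist(mu=64, sigma=4 ** 0.5)
p_between = height.cdf(66.8) - height.cdf(62)

print(round(p_above_230, 4), round(p_between, 4))
```

These correspond to P(Z > 1.5), about 0.0668, and P(−1 < Z < 1.4), about 0.7606.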
Problems
1. When a die is rolled, what is the probability of getting a number greater than 4?
2. Two dice are rolled. The probability that the sum of spots on the faces will be ‘8’ is?
3. When two coins are tossed, the probability of getting two tails is?
4. When a card is selected from a standard pack, the probability that it is a ‘9’ is?
5. When a card is selected from a standard pack, the probability that it is a diamond or a number card is?
6. In a survey of 180 people, 7s are over 60. If a person is selected at random, what is the probability that the person is over
60?
7. If a letter is selected at random from the word “PROBABILITY”, the probability that it is a vowel is?
8. In a box, there are 6 white marbles, 3 blue marbles and 1 red marble. If a marble is selected at random what is the
probability that it is not white?
9. In a sample of 10 pieces, 4 are defective. If 3 are selected at random and tested, what is the probability that they are not
defective?
10. How many different 3 digit codes can be made?
11. If 30% of commuters ride to work on a bus, find the probability that if 8 workers are selected at random, 3 will ride the
bus.
12. A survey found that 10% of older people have given up driving. If a sample of 1000 persons is taken, the standard
deviation of the sample will be?
13. A board of directors consists of 7 women and 5 men. If 4 directors are selected at random, the probability that exactly 2
directors are men is?
14. The probability that there will be a car accident in a particular road is 0.01. The number of accidents follows Poisson
distribution. If there are 500 cars on the road on a particular day, find the probability that there will be exactly 4
accidents?
15. About 5% of rabbits are brown in color. If the distribution is Poisson, find the probability that in 100 randomly selected
rabbits, 7 rabbits are brown in color?
16. In an exam (which is approximately normally distributed), the average marks were 200 and variance was 400. If a person
who took the exam was selected at random, find the probability that the person scores above 230.
17. The average height for adult kangaroos is 64 inches with a variance of 4 inches. Assume normal distribution. If a
kangaroo is selected at random, find the probability that its height is between 62 and 66.8 inches
18. Box 1 contains 2 red balls and 1 blue ball. Box 2 contains 1 red ball and 3 blue balls. One of the two boxes is selected at
random and a ball is selected from that box at random. If the ball is red, find the probability it came from box 1?
19. Two manufacturers supply paper cups to a certain catering service. ‘A’ supplied 100 cups and 5 were damaged. ‘B’
supplied 50 cups and 3 were damaged. If a cup is damaged, find the probability that it came from ‘A’?
20. A street vendor, if the vendor is caught by city inspector, must pay a fine of Rs 50. Otherwise, the vendor can make Rs
100 at Main Road or Rs 75 at Cross Road. Construct a payoff table, determine the optimal strategy for both locations,
and find the value of the game.
HYPOTHESIS TESTING
1. Formulate a Hypothesis
2. Set up a suitable significance level
3. Select test criterion
4. Compute the statistic
5. Make the decision
               H0 Accepted          H0 Rejected
H0 is True     Correct decision     Type I error (α)
H0 is False    Type II error (β)    Correct decision
Explanations-
Non-Parametric Tests
$Z = \dfrac{U - \mu}{\sigma}$
Where μ and σ are the mean and standard deviation of U, Ri is sum of ranks of each group and ni = number of observations in each group
• H Test (Kruskal Wallis Rank Sum Test for Equality of several means)
$H = \dfrac{12}{n(n + 1)} \sum \dfrac{R_i^2}{n_i} - 3(n + 1)$
Where n = total number of observations, Ri = group sum of ranks and ni = number of observations in each group
PROBLEMS-
1. A company surveyed 100 respondents to know about the importance of computers in their life. The respondents
indicated as follows. Use Kolmogorov-Smirnov test (K-S test) to test the hypothesis that there is no difference in ratings
amongst the respondents
1. The following data indicates the lifetime (in hours) of samples of two kinds of light bulbs in continuous use. Use Mann-
Whitney U test to compare the life time of brands A and B light bulbs.
Brand A 603 625 641 622 585 593 660 600 633 580 615 648
Brand B 620 640 646 620 652 639 590 646 631 669 610 619
2. A company used three different methods of advertising its product in three cities. It found out the increased sales in
identical retail outlets in three cities as follows. Use Kruskal-Wallis method (H test) to test the hypothesis that the
increase in sales using different methods in different cities is the same at 5% level of significance.
Chennai 70 58 60 45 55 62 89 72
Mumbai 65 57 48 55 75 68 45 52 63
Kolkata 53 59 71 70 63 60 58 75
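The H statistic for the three cities can be sketched as follows. Tied sales figures are given the average of the ranks they occupy, and no further tie correction is applied to H, both of which are assumptions. The function names are illustrative.

```python
def average_ranks(values):
    """Ascending ranks; tied values share the mean of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return {value: sum(p) / len(p) for value, p in positions.items()}

def kruskal_wallis_h(groups):
    """H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), without a tie correction."""
    pooled = [v for g in groups for v in g]
    n = len(pooled)
    rank_of = average_ranks(pooled)
    rank_sums = [sum(rank_of[v] for v in g) for g in groups]
    h = 12 / (n * (n + 1)) * sum(r ** 2 / len(g) for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return h, rank_sums

chennai = [70, 58, 60, 45, 55, 62, 89, 72]
mumbai = [65, 57, 48, 55, 75, 68, 45, 52, 63]
kolkata = [53, 59, 71, 70, 63, 60, 58, 75]
h, rank_sums = kruskal_wallis_h([chennai, mumbai, kolkata])
print("rank sums:", rank_sums, "H =", round(h, 3))
# Compare H with the chi-square critical value for k - 1 = 2 degrees of freedom (5.991 at 5%).
```

If H is below the critical value, the hypothesis of equal increases in sales is not rejected at the 5% level.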
Chi-Square Test
PROBLEMS-
1. The following table gives the average number of calls received by an operator on various days of the week in a call
centre. Find out whether the calls are uniformly distributed over the week.
2. The following information is obtained concerning 50 randomly selected students. Can it be inferred that availing of loans
is more common among boys?
• Z test for one sample mean-
$Z = \dfrac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Where $\sigma / \sqrt{n}$ is the standard error. If ‘σ’ is not given, we can use ‘s’
ANOVA
1. The following table gives the retail prices of a certain commodity in some selected shops in four cities as below. Can we say
the prices of the commodities differ in the four cities?
City Prices
Chennai 11 7 10 8
Mumbai 7 9 11
Delhi 9 4 7 3 2
Kolkata 8 12 12 8
2. The sales of 4 salesmen - A, B, C & D of the Company Sellers in three seasons are given below. Can we conclude that
overall sales are dependent on seasons? Are the four salesmen equally effective?
Season/Salesman A B C D
Summer 6 4 8 6
Winter 7 6 6 9
Monsoon 8 5 10 9
DECISION THEORY
DECISION UNDER UNCERTAINTY
1. A retailer has space for up to 4 Kgs of tomato in his store. The cost per Kg is Rs 30 and the selling price per Kg is Rs
50. Any units not sold at the end of the day are wasted. He sells in Kgs only. Construct a payoff and opportunity loss table.
2. A newspaper vendor can stock up to 10 newspapers in his store. There is a guaranteed demand for 5 newspapers. Each
newspaper costs Rs 2 per unit and is sold for Rs 4. Unsold newspapers are disposed of for Rs 1 per unit. Construct a payoff
and opportunity loss table.
3. A food product company is contemplating the introduction of a new product to replace an existing product at a higher price
(S1), modifying the existing product at a moderately increased price (S2), and continuing the same product with new
packaging at a nominally increased price (S3). Sales may increase (E1), not change at all (E2) or decrease (E3) with respect
to these strategies. The marketing department has given the profits for each of these strategies below-
E1 E2 E3
S2 500,000 450,000 0
What strategy should the company choose on the basis of - Maximin criterion, Maximax criterion, Minimax Regret
criterion, Laplace criterion and Hurwitz criterion (α=0.8)?
4. A milk producer needs to determine how many litres of milk are to be produced on a daily basis to meet demand. Milk is
sold in multiples of 5 litres only and there is an assured demand for 15 litres every day. Milk costs Rs 14 per litre and is sold
at Rs 20 per litre. Unsold milk is disposed of. Past records of 200 days show the following demand pattern
Milk (Litres) 15 20 25 30 35 40 45
No. of days 4 16 20 80 40 30 10
Construct a conditional profit table, Identify the best course of action for maximum expected profits and Calculate EVPI
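One way to set up the milk problem is sketched below. It assumes unsold milk has no salvage value, that the possible production levels are the same as the observed demand levels, and that the 200-day record is used as the probability distribution of demand. All names are illustrative.

```python
demand_days = {15: 4, 20: 16, 25: 20, 30: 80, 35: 40, 40: 30, 45: 10}   # litres: no. of days
COST, PRICE = 14, 20                 # Rs per litre; unsold milk is assumed to be worth nothing
total_days = sum(demand_days.values())

def profit(stock, demand):
    """Conditional profit when `stock` litres are produced and `demand` litres are wanted."""
    return PRICE * min(stock, demand) - COST * stock

# Expected monetary value (EMV) of each production level.
emv = {stock: sum(days * profit(stock, demand) for demand, days in demand_days.items()) / total_days
       for stock in demand_days}
best_stock = max(emv, key=emv.get)

# Expected profit with perfect information: produce exactly what is demanded each day.
eppi = sum(days * profit(demand, demand) for demand, days in demand_days.items()) / total_days
evpi = eppi - emv[best_stock]

print("EMV by production level:", emv)
print("Best level:", best_stock, "litres; EVPI = Rs", round(evpi, 2))
```

Under these assumptions the best level is 30 litres with an expected profit of Rs 148 per day, and EVPI = 189.9 − 148 = Rs 41.9.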