
Draft

By

Paul Chege

C. TEMPLATE STRUCTURE

I. INTRODUCTION

1. TITLE OF MODULE

Probability and Statistics

Secondary school statistics and probability.

3. TIME

The total time for this module is 120 study hours.

4. MATERIAL

Students should have access to the core readings specified later. Also, they will need a

computer to gain full access to the core readings. Additionally, students should be able

to install the computer software wxMaxima and use it to practice algebraic concepts.

5. MODULE RATIONALE

Probability and Statistics, besides being a key area in the secondary schools’ teaching syllabuses, forms an important background to advanced mathematics at tertiary level. Statistics is a fundamental area of Mathematics that is applied across many academic subjects and is useful in the analysis of industrial production. The study of statistics produces statisticians who analyse raw data collected from the field to provide useful insights about a population. Statisticians provide governments and organizations with a concrete picture of a situation, which helps managers in decision making. Examples include the rate of spread of diseases, rumours and bush fires, rainfall patterns, and population changes.

On the other hand, the study of probability helps government agencies and organizations make decisions based on the theory of chance. Examples include predicting the numbers of male and female children born within a given period, and projecting the amount of rainfall that regions can expect to receive based on historical data on rainfall patterns. Probability has also been used extensively to determine the proportions of high, middle and low quality products in industrial production, e.g. the number of good and defective parts expected in an industrial manufacturing process.


II. CONTENT

6. Overview

This module consists of three units:

mathematics or as an introduction to first time learners of statistics. It introduces the

measures of dispersion in statistics. The unit also introduces the concept of

probability and the theoretical treatment of probability.

This unit requires Unit 1 as a prerequisite. It develops from the moment and moment

generating functions, Markov and Chebychev inequalities, special univariate

distributions, bivariate probability distributions and analyses conditional probabilities.

The unit gives insights into the analysis of correlation coefficients and distribution

functions of random variables such as the Chi-square, t and F.

This unit builds on Unit 2. It analyses probability using indicator functions. It introduces the Bonferroni inequality, random vectors, generating functions, characteristic functions, statistical independence and random samples. It develops further the concepts of functions of several random variables, the independence of $\bar{X}$ and $S^2$ in normal samples, and order statistics. The unit concludes with the treatment of convergence and limit theorems.

Outline: Syllabus

Level 1. Priority A. No prerequisite.

Frequency distributions: relative and cumulative distributions, various frequency curves, mean, mode, median, quartiles and percentiles, standard deviation, symmetrical and skewed distributions. Probability: sample space and events; definition of probability; properties of probability; random variables; probability distributions; expected values of random variables; particular distributions: Bernoulli, binomial, Poisson, geometric, hypergeometric, uniform, exponential and normal. Bivariate frequency distributions. Joint probability tables and marginal probabilities.


Unit 2 ( 40 hours): Random Variables and Test Distributions

Level 2. Priority B. Statistics 1 is prerequisite.

Moment and moment generating functions. Markov and Chebychev inequalities; special univariate distributions. Bivariate probability distributions: joint, marginal and conditional distributions; independence; bivariate expectation. Regression and correlation: calculation of regression and correlation coefficients for bivariate data. Distribution functions of random variables. Bivariate normal distribution. Derived distributions such as chi-square, t and F.

Level 3. Priority C. Statistics 2 is prerequisite.

Probability: use of indicator functions. Bonferroni inequality. Random vectors. Generating functions. Characteristic functions. Statistical independence. Random samples. Multinomial distribution. Functions of several random variables. The independence of $\bar{X}$ and $S^2$ in normal samples. Order statistics. Multivariate normal distribution. Convergence and limit theorems. Practical exercises.


Graphic Organiser

[Diagram of the module topics built around DATA: variance and standard deviation; mean, mode and median; frequency curves, quartiles, deciles and percentiles; indicator functions; moment and moment generating functions; Markov and Chebychev inequalities; Bonferroni inequalities and random vectors; probability; joint, marginal and conditional distributions; univariate and bivariate distributions; generating functions, characteristic functions and random samples; probability distributions; regression and correlation; derived distributions (chi-square, t and F); functions of random variables; convergence and limit theorems; probability tables.]


7. General Objective(s)

By the end of this module, the trainee should be able to compute the various measures of dispersion in statistics, work out probabilities based on the laws of probability, and carry out tests on data using the theories of probability.

Unit 1: Descriptive Statistics and Probability Distributions ( 40 Hours)

By the end of unit 1, the trainee should be able to:

Draw various frequency curves

Work out the mean, mode, median, quartiles, percentiles and standard deviations

of discrete and grouped data

Define and state the properties of probability

Illustrate random variables, probability distributions, and expected values of

random variables.

Illustrate Bernoulli, Binomial, Poisson, Geometric, Hypergeometric, Uniform,

Exponential and Normal distributions

Investigate Bivariate frequency distributions

Construct joint probability tables and marginal probabilities.

By the end of unit 2, the trainee should be able to:

Illustrate moment and moment generating functions

Analyse Markov and Chebychev inequalities

Examine special Univariate distributions, bivariate probability distributions, Joint

marginal and conditional distributions.

Show Independence, Bivariate expectation, regression and correlation

Calculate regression and correlation coefficient for bivariate data

Show distribution function of random variables.

Examine Bivariate normal distribution

Illustrate derived distributions such as Chi-Square, t, and F.

By the end of unit 3, the trainee should be able to:

Use indicator functions in probability

Show the Bonferroni inequality and random vectors

Illustrate generating and characteristic functions

Examine statistical independence random samples and multinomial distribution

Evaluate functions of several random variables

Illustrate the independence of $\bar{X}$ and $S^2$ in normal samples, and order statistics

Show multivariate normal distribution

Illustrate convergence and limit theorems.

Work out practical exercises.


III. TEACHING AND LEARNING ACTIVITIES

Statistics.

QUESTIONS

1) When a die is rolled, the probability of getting a number greater than 4 is

A. 1/6

B. 1/3

C. 1/2

D. 1

2) A single card is drawn at random from a standard deck of cards. Find the probability that it is a queen.

A. 1/13

B. 1/52

C. 4/13

D. 1/2

3) Out of 100 numbers, 20 were 4’s, 40 were 5’s, 30 were 6’s and the remainder were

7’s. Find the arithmetic mean of the numbers.

A. 0.22

B. 0.53

C. 2.20

D. 5.30


4) Calculate the mean of the following data.

60 - 62 61

63 - 65 64

66 - 68 67

69 - 71 70

72 - 74 73

A. 57.40

B. 62.00

C. 67.45

D. 72.25

and 4.

A. 4

B. 5

C. 6

D. 8

A. From 0 to 1

B. From -1 to +1

C. From 1 to 100

D. From 0 to 1/2

A. 12

B. 5

C. 8

D. 6

A. 9

B. 11

C. 7

D. 8.88


A. H, T and HT

B. HH, HT, TH, TT

C. HH, HT, TT

D. H, T

10) If a letter is selected at random from the word “Mississippi”, find the probability that it is an “i”.

A. 1/8

B. 1/2

C. 3/11

D. 4/11

ANSWER KEY

1. B 2. A 3. D 4. C 5. B

6. A 7. D 8. B 9. B 10. D

This pre-assessment is meant to give the learners an insight into what they can

remember regarding Probability and Statistics. A score of less than 50% in the

pre-assessment indicates the learner needs to revise Probability and Statistics

covered in secondary mathematics. The pre-assessment covers basic concepts

that trainees need to be familiar with before progressing with this module. Please

revise Probability and Statistics covered in secondary mathematics to master the

basics if you have problems with this pre-assessment.


KEY CONCEPTS ( GLOSSARY)

1) Mutually Exclusive: Two events are mutually exclusive if they cannot occur at

the same time.

2) Variance of a set of data is defined as the square of the standard deviation, i.e. variance = $s^2$.

3) A trial: This refers to an activity of carrying out an experiment like picking a card from a deck of cards or rolling a die or dice.

4) Sample space: This refers to all possible outcomes of a probability experiment.

e.g. in tossing a coin, the outcomes are either Head(H) or tail(T)

5) A random variable: is a function that assigns a real number to every possible

result of a random experiment.

6) Random sample is one chosen by a method involving an unpredictable

component.

7) Bernoulli distribution: is a discrete probability distribution, which takes value 1

with success probability p and value 0 with failure probability q = 1 − p.

8) Binomial distribution: is the discrete probability distribution of the number of successes in a sequence of n independent yes/no experiments, each of which yields success with probability p.

9) Hypergeometric distribution: is a discrete probability distribution that describes

the number of successes in a sequence of n draws from a finite population

without replacement.

10)Poisson distribution: is a discrete probability distribution that expresses the

probability of a number of events occurring in a fixed period of time if these

events occur with a known average rate, and are independent of the time since

the last event

11) Correlation: is a measure of association between two variables.

12)Regression: is a measure used to examine the relationship between one

dependent and one independent variable.

13)Chi-square test is any statistical hypothesis test in which the test statistic has a

chi-square distribution when the null hypothesis is true, or any in which the

probability distribution of the test statistic (assuming the null hypothesis is true)

can be made to approximate a chi-square distribution as closely as desired by

making the sample size large enough.

14)Multivariate normal distribution is a specific probability distribution, which can

be thought of as a generalization to higher dimensions of the one-dimensional

normal distribution.

15)t -test is any statistical hypothesis test for two groups in which the test statistic

has a Student's t distribution if the null hypothesis is true


STATISTICAL TERMS:

1) Raw data: Data that has not been organised numerically

2) Arrays: An arrangement of raw numerical data in ascending order of magnitude.

3) Range: The difference between the largest and the smallest numbers in a data set.

4) Class intervals: In grouped data with classes such as 21-30, 31-40, etc., 21-30 is called a class interval.

5) Class limits: In a class interval of 21-30, then 21 and 30 are called class limits.

6) Lower class limits (l.c.l) : In the class interval 21-30, the lower class limit is 21.

7) Upper class limit (u.c.l): in the class interval 21-30, the upper class limit is 30

8) Lower and upper class boundaries: In the class interval 21-30, the lower class

boundary is 20.5 and the upper class boundary is 30.5. These boundaries

assume that theoretically measurements for a class interval 21-30 includes all

the numbers from 20.5 to 30.5.

9) Class interval size: For the class 21-30, the size of the class interval is the difference between the upper and lower class boundaries, i.e. 30.5 - 20.5 = 10. The class interval size is also known as the class width or class size.

10) Class mark or mid-point: In a class interval 21-30, the class mark is the average of 21 and 30, i.e. (21 + 30)/2 = 25.5.

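11) Frequency distributions: Large masses of raw data may be arranged in classes in tabular form with their corresponding frequencies, e.g.

Class            20-24  25-29  30-34  35-39  40-44
Frequency (f)      4     10     16      8      2

The above table is called a frequency table.

12) Cumulative frequencies: These are calculated as running totals of the individual frequencies.

Class                          20-24  25-29  30-34  35-39  40-44
Frequency (f)                    4     10     16      8      2
Cumulative frequency (C.F.)      4     14     30     38     40

The cumulative frequency of a class is its own frequency plus the frequencies of all smaller values.

The above table is called a Cumulative Frequency table.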


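13) Relative-frequency distributions: The relative frequency of a class is the frequency of the class divided by the total frequency of all classes (the final cumulative frequency), and is generally expressed as a percentage.

Example:

Class            20-24  25-29  30-34  35-39  40-44
Frequency (f)      4     10     16      8      2      $\sum f = 40$

The relative frequency of the class 25-29 = $\frac{f}{\sum f} \times 100\% = \frac{10}{40} \times 100\% = 25\%$

Class                          20-24  25-29  30-34  35-39  40-44
Frequency (f)                    4     10     16      8      2
Cumulative frequency (C.F.)      4     14     30     38     40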
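As a rough illustration, the cumulative and relative frequencies for the frequencies 4, 10, 16, 8, 2 can be computed as sketched below. The function names are illustrative assumptions and are not part of the module.

```python
from itertools import accumulate

# Class frequencies taken from the frequency table (f = 4, 10, 16, 8, 2).
frequencies = [4, 10, 16, 8, 2]


def cumulative_frequencies(freqs):
    """Return the running totals of the class frequencies."""
    return list(accumulate(freqs))


def relative_frequencies(freqs):
    """Return each class frequency as a percentage of the total frequency."""
    total = sum(freqs)
    return [100 * f / total for f in freqs]


print(cumulative_frequencies(frequencies))  # [4, 14, 30, 38, 40]
print(relative_frequencies(frequencies))    # [10.0, 25.0, 40.0, 20.0, 5.0]
```

The second relative frequency, 25.0, corresponds to the class with frequency 10 out of a total of 40.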


From the above Cumulative Frequency table, we can draw a graph of cumulative frequency versus the upper class boundaries.

Upper class boundaries    24.5  29.5  34.5  39.5  44.5
Cumulative frequencies      4    14    30    38    40

Note: From the cumulative frequency data, the first plotting point is (24.5, 4). If we started our graph at this point, the curve would be left hanging above the horizontal axis. We therefore create another point (19.5, 0) as a starting point; 19.5 is the upper class boundary of the preceding class.


SHAPES OF FREQUENCY CURVES

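Symmetrical or bell-shaped: Observations equidistant from the central maximum have the same frequency, e.g. the normal curve.

Skewed to the right (positive skewness): Has the maximum towards the left and the longer tail to the right.

Skewed to the left (negative skewness): Has the maximum towards the right and the longer tail to the left.

J-shaped: Has the maximum occurring at the right end.

Reverse J-shaped: Has the maximum occurring at the left end.

U-shaped: Has maxima at both ends.

Bimodal: Has two maxima.

Multimodal: Has more than two maxima.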


COMPILED LIST OF COMPULSORY READINGS

Abstract : This reference gives the much needed reading material in probability and

statistics. The reference has a number of illustrations that empower the learner through

different approach methodology. Wolfram MathWorld is a specialised on-line

mathematical encyclopaedia.

Rationale: It provides the most detailed references to any mathematical topic. Students

should start by using the search facility for the module title. At any point students should

search for key words that they need to understand. The entry should be studied

carefully and thoroughly.

extremely up-to-date as entries are continually revised. Also, it has proved to be

extremely accurate. The mathematics entries are very detailed.

Rationale: It gives definitions, explanations, and examples that learners cannot access

in other resources. The fact that wikipedia is frequently updated gives the learner the

latest approaches, abstract arguments, illustrations and refers to other sources to

enable the learner acquire other proposed approaches in Probability and Statistics.

the internet. The resources are organised by historical characters and by historical

themes.

Rationale: Students should search the MacTutor archive for key words in the topics

they are studying (or by the module title itself). It is important to get an overview of

where the mathematics being studied fits in to the history of mathematics. When the

student completes the course and is teaching high school mathematics, the characters

in the history of mathematics will bring the subject to life for their students. Particularly,

the role of women in the history of mathematics should be studied to help students

understand the difficulties women have faced while still making an important

contribution. Equally, the role of the African continent should be studied to share with

students in schools: notably the earliest number counting devices (e.g. the Ishango

bone) and the role of Egyptian mathematics should be studied.


Compulsory Resources

Resource #1 Maxima.

without resources to handle them. The absence of face to face daily lessons with

teachers means that learners can become totally handicapped if not well equipped with

resources to solve their mathematical problems. This handicap is solved by use of

accompanying resource: Maxima.

Rationale: Maxima is open-source software that enables learners to solve linear and quadratic equations and simultaneous equations, carry out integration and differentiation, and perform algebraic manipulations such as factorisation, simplification and expansion. This resource is compulsory for learners taking distance learning, as it enables them to learn faster using the ICT skills they have already acquired.

Resource #2 Graph

most especially functions in 3 dimensions. The learners, being distance learners, will

inevitably encounter situations that will need mathematical graphing. This course is

accompanied by a software called Graph to help learners in graphing. Learners

however need to familiarise with the Graph software to be able to use it.

access on the given CD. It helps all mathematics learners to graph what would

otherwise be a nightmare for them. It is simple to use once a learner invests time to

learn how to use it. Learners should take advantage of the Graph software because it

can assist the learners in graphing in other subjects during the course and after.

Learners will find it extremely useful when teaching mathematics at secondary school

level.


USEFUL LINKS

Useful Link #1

Title : Wikipedia

URL : http://en.wikipedia.org/wiki/Statistics

Screen capture :

is frequently updated. Most learners will encounter problems of reference materials from

time to time. Most of the books available cover only parts or sections of Probability and

Statistics. This shortage of reference materials can be overcome through the use of

Wikipedia. It’s easy to access through “Google search”

Rationale: The availability of Wikipedia solves the problem of crucial learning materials

in all branches of mathematics. Learners should have first hand experience of

Wikipedia to help them in their learning. It is a very useful free resource that not only

solves student’s problems of reference materials but also directs learners to other

related useful websites by clicking on given icons. Its usefulness is unparalleled.

Useful Link #2

Title : Mathsguru

URL : http://en.wikipedia.org/wiki/Probability

Screen capture :

branches of number theory module. It is easy to access through Google search and

provides very detailed information on various probability questions. It offers

explanations and examples that learners can understand easily.

Rationale: Mathsguru gives alternative ways of accessing other subject related topics,

hints and solutions that can be quite handy to learners who encounter frustrations of

getting relevant books that help solve learners’ problems in Probability. It gives a helpful

approach in computation of probabilities by looking at the various branches of the

probability module.


Useful Link #3

URL : http://mathworld.wolfram.com/Probability

Screen capture :

Learners should access this website quite easily through Google search for easy

reference. Wolfram also leads learners to other useful websites that cover the same

topic to enhance the understanding of the learners.

providing new challenges and methodology in number theory. The site comes handy in

mathematics modelling and is highly recommended for learners who wish to study

number theory and other branches of mathematics. It gives aid in linking other webs

thereby furnishing learners with a vast amount of information that they need to

comprehend in Probability and Statistics.


UNIT 1 (40 HOURS)

LEARNING ACTIVITY 1

1. Plants 80 tree seedlings on 1st March. She measures the heights of the trees

on 1st December.

2. She weighs all the 40 cows in her farm and records the weights in her diary.

3. She records the daily production of eggs from the poultry section.

4. She records the time taken to deliver the milk to the processing plant.

The records are kept as below.

1. Heights of plants in cm

77 76 62 85 63 68 82 67 75 68

74 85 71 53 78 60 81 80 88 73

75 53 95 71 85 74 73 62 75 61

71 68 69 83 95 94 87 78 82 66

60 83 60 68 77 75 75 78 89 96

72 71 76 63 62 78 61 65 67 79

75 53 62 85 93 88 97 79 73 65

93 85 76 76 90 72 57 84 73 86

2. Weights of goats in kg

Weight (kg)     126  135  144  153  162  171  180
No. of goats      3    5    9   12    5    4    2

3. Number of laid eggs

Eggs           462  480  498  516  534  552  570  588  606  624
No. of days     98   75   56   42   30   21   15   11    6    2

4. Time taken to deliver milk, in minutes

Minutes        100   89   79   69   59   49   39
No. of days      9   32   43   21   11    3    1

CASE 1:

A local firm dealing with agriculture extension services visits the farmer. She proudly

produces her records. The agricultural officer is very impressed by her good records

but clearly realises that the farmer needs some skills in data management to enable

her make informed decisions based on her farm outputs.

The agricultural officer designs a short course on data processing for all the rural

farmers.

During the course planning stage, the following terms are defined and designed for a

lesson one to the farmers.

b) Frequency: The number of times a value occurs, e.g. the number of goats with a given weight.

c) Mean: The average of a data set.

d) Mode: The most frequently occurring value in a data set.

e) Median: When the data are arranged in ascending order, the median is the value occurring at the middle of the data.

f) Range: The difference between the highest and the lowest values in the data.

Introduction to Statistics

summarize a set of data. In a sense, we are using the data on members of a set to

describe the set. The techniques are commonly classified as:

2. Tabular description in which we use tables to summarize data.

3. Parametric description in which we estimate the values of certain parameters

which we assume to complete the description of the set of data.


In general, statistical data can be described as a list of subjects or units and the data

associated with each of them. We have two objectives for our summary:

1. We want to choose a statistic that shows how different units seem similar.

Statistical textbooks call the solution to this objective, a measure of central

tendency.

2. We want to choose another statistic that shows how they differ. This kind of

statistic is often called a measure of statistical variability.

answer the first question with the arithmetic mean, the median, or the mode.

Sometimes, we choose specific values from the cumulative distribution function called

quartiles.

The most common measures of variability for quantitative data are the variance; its

square root, the standard deviation; the statistical range; interquartile range; and the

absolute deviation.

FARMERS LESSONS

The farmers are taught how to compute the

a) Mean or Average of a data as follows:

Average of a data= Sum total of the data divided by number of items in data.

Example:

Calculate the mean of the following data:

1) 1, 3, 4, 4, 5, 6, 3, 7

Solution: Mean = $\frac{1+3+4+4+5+6+3+7}{8} = \frac{33}{8} = 4.125$

Solution:

Mean = $\frac{650+675+700+725+800+900+1050+1125+1200+575}{10} = \frac{8400}{10} = 840$
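The two calculations above can be checked with a few lines of Python. This is only a minimal sketch; the function name `arithmetic_mean` is an illustrative assumption.

```python
def arithmetic_mean(values):
    """Return the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


first = [1, 3, 4, 4, 5, 6, 3, 7]
second = [650, 675, 700, 725, 800, 900, 1050, 1125, 1200, 575]

print(arithmetic_mean(first))   # 4.125
print(arithmetic_mean(second))  # 840.0
```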


LESSON TWO

Example:

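1) Find the mean of the following data:

X    22  24  25  33  36  37  41
f     5   7   8   4   6   9  11

Solution:

Mean = $\frac{\sum fX}{\sum f} = \frac{22(5)+24(7)+25(8)+33(4)+36(6)+37(9)+41(11)}{5+7+8+4+6+9+11} = \frac{1610}{50} = 32.2$

Wage ($)          220  250  300  350  375
No. of Workers     12   15   18   20    5

Solution:

Mean = $\frac{220(12)+250(15)+300(18)+350(20)+375(5)}{12+15+18+20+5} = \frac{20665}{70} \approx \$295.214$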
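When each value comes with a frequency, the mean is the sum of value times frequency divided by the total frequency. The sketch below applies this to the wage example; the function name `frequency_mean` is an assumption made for illustration.

```python
def frequency_mean(values, freqs):
    """Return sum(f * x) / sum(f) for values x with frequencies f."""
    if len(values) != len(freqs):
        raise ValueError("values and frequencies must have the same length")
    total_fx = sum(x * f for x, f in zip(values, freqs))
    return total_fx / sum(freqs)


wages = [220, 250, 300, 350, 375]   # wage values from the worked solution
workers = [12, 15, 18, 20, 5]       # number of workers at each wage

print(round(frequency_mean(wages, workers), 3))  # 295.214
```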

Example:

The weights of milk deliveries to a processing plant are shown below:

45 49 50 46 48 42 39 47 42 51

48 45 45 41 46 37 46 47 43 33

56 36 42 39 52 46 43 51 46 54

39 47 46 45 35 44 45 46 40 47


a) Using class intervals of 5, tabulate this data in a frequency table

b) Calculate the mean mass of the milk delivered.

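Solution:

a) Frequency table

Class    Tally                    Frequency (f)
33-37    ////                          4
38-42    //// ///                      8
43-47    //// //// //// ////          19
48-52    //// //                       7
53-57    //                            2
Total                                 40

b) Using the class mid-points x:

Class    Frequency (f)    Mid-point (x)    fx
33-37          4               35          140
38-42          8               40          320
43-47         19               45          855
48-52          7               50          350
53-57          2               55          110
Total         40                          1775

Mean = $\frac{\sum fx}{\sum f} = \frac{1775}{40} = 44.375$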
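The tallying and the grouped mean for the milk data can be reproduced as sketched below, assuming classes of width 5 that start at 33 (33-37, 38-42, and so on). The function name `grouped_mean` is hypothetical.

```python
def grouped_mean(data, first_lower_limit, width, n_classes):
    """Tally data into equal-width classes and return (classes, freqs, mean).

    The mean is estimated from the class mid-points, as in the worked solution.
    """
    classes = [
        (first_lower_limit + i * width, first_lower_limit + (i + 1) * width - 1)
        for i in range(n_classes)
    ]
    freqs = [sum(1 for x in data if lo <= x <= hi) for lo, hi in classes]
    mid_points = [(lo + hi) / 2 for lo, hi in classes]
    total_fx = sum(f * x for f, x in zip(freqs, mid_points))
    return classes, freqs, total_fx / sum(freqs)


milk = [
    45, 49, 50, 46, 48, 42, 39, 47, 42, 51,
    48, 45, 45, 41, 46, 37, 46, 47, 43, 33,
    56, 36, 42, 39, 52, 46, 43, 51, 46, 54,
    39, 47, 46, 45, 35, 44, 45, 46, 40, 47,
]

classes, freqs, mean = grouped_mean(milk, first_lower_limit=33, width=5, n_classes=5)
print(freqs)  # [4, 8, 19, 7, 2]
print(mean)   # 44.375
```

Note that the grouped mean is an estimate: it treats every observation in a class as if it sat at the class mid-point.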

DO THIS

2).
x        1   2   3   4   5
f(x)    11  10   5   3   1

3).
Weight (x)    4-8  9-13  14-18  19-23  24-28  29-33
Frequency      2     4      7     14      8      5

5).
Height (x)    61  64  67  70  73
Frequency      5  18  42  27   8

6).
Weight (x)    30.5-36.5  36.5-42.5  42.5-48.5  48.5-54.5  54.5-60.5
Frequency         4          10         14         27         45

Answer Key:

1). 66.4 2). 2.1 3). 20.6

4) 80 5) 67.45 6) 51.44


LESSON THREE

MODE

Example

1) Find the mode of the following data: 1,3,4,4,5,6,1,3,3,2,2,3,3,5

Solution:

The mode of a data is the item that appears most times. In this data, 3 occurs most

times or most frequently i.e. 5 times. Therefore the mode is 3.

2) Find the mode of the following data: 22, 24, 25,22, 27, 22, 25, 30, 25, 31

Solution:

22 and 25 occur three times each. Therefore the modes are 22 and 25. Such data is called bimodal.

3) Find the mode of the data:

Observation (X)    0   1   2   3   4
Frequency (f)      3   7  10  16  11

Solution:

The most frequently occurring observation is 3, i.e. 3 occurs 16 times. Therefore the mode is 3.

Frequency (f) 3 6 8 5 15 9 13

Solution:

The modal class is 70-74 because it has the highest frequency of occurrence.
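The mode examples above can be sketched in Python with `collections.Counter`. The function names `modes` and `mode_from_table` are illustrative assumptions; returning a list allows for bimodal data.

```python
from collections import Counter


def modes(values):
    """Return every value that occurs with the highest frequency, in sorted order."""
    counts = Counter(values)
    highest = max(counts.values())
    return sorted(v for v, c in counts.items() if c == highest)


def mode_from_table(observations, freqs):
    """Return the observation with the largest frequency in a frequency table."""
    return max(zip(observations, freqs), key=lambda pair: pair[1])[0]


print(modes([1, 3, 4, 4, 5, 6, 1, 3, 3, 2, 2, 3, 3, 5]))    # [3]
print(modes([22, 24, 25, 22, 27, 22, 25, 30, 25, 31]))      # [22, 25]
print(mode_from_table([0, 1, 2, 3, 4], [3, 7, 10, 16, 11]))  # 3
```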


DO THIS

1) 6, 8, 3,5,2,6,5,9,5

2) 20.4, 20.8, 22.1, 23.4, 19.7, 31.2, 23.4, 20.8, 25.5,23.4

3)
Weight (x)    4-8  9-13  14-18  19-23  24-28  29-33
Frequency      2     4      7     14      8      5

4)
Weight (x)    30.5-36.5  36.5-42.5  42.5-48.5  48.5-54.5  54.5-60.5
Frequency         4          10         14         27         45

Answer key:

1) 5 2) 23.4 3) 19-23 4) 54.5-60.5

LESSON FOUR

MEDIAN

The median is the value in the middle of a distribution, e.g. in 1, 2, 3, 4, 5 the median is 3, i.e. it comes exactly in the middle of the distribution. For the data 1, 2, 2, 3, 4, 5, 6, 7, 7, 8 there are 10 terms and no middle number. In such a case, the median is the average of the two numbers bordering the centre line, e.g.

1, 2, 2, 3, 4 | 5, 6, 7, 7, 8

Therefore the median = $\frac{4+5}{2} = 4.5$
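The rule for odd and even numbers of terms can be written as a short function. This is a minimal sketch and the name `median` is chosen only for illustration.

```python
def median(values):
    """Return the middle value, or the average of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd count: single middle term
    return (ordered[mid - 1] + ordered[mid]) / 2  # even count: average of two


print(median([1, 2, 3, 4, 5]))                  # 3
print(median([1, 2, 2, 3, 4, 5, 6, 7, 7, 8]))   # 4.5
```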


MEDIAN OF A GROUPED DATA

Example

Class            20-24  25-29  30-34  35-39  40-44
Frequency (f)      4     10     16      8      2

Solution:

There are 40 observations, so the median lies midway between the 20th and 21st terms:

$\frac{20+21}{2}$ = 20.5th term

The Lower Class Limit (L.C.L) or lower class boundary and the Upper Class Limit (U.C.L) or upper class boundary are the lower and upper bounds of a class interval, e.g. the lower and upper boundaries of the class interval 20-24 are 19.5 and 24.5, and the L.C.L and U.C.L of the class interval 35-39 are 34.5 and 39.5.

Class                          20-24  25-29  30-34  35-39  40-44
Frequency (f)                    4     10     16      8      2
Cumulative Frequency (C.F)       4     14     30     38     40

Step 1: Locate the class containing the 20.5th term. The cumulative frequency first reaches 20.5 in the class 30-34.

Step 2: L.C.L and U.C.L of 30-34 are 29.5 and 34.5

Step 3: Work out the Cumulative Frequency (C.F)

Step 4: Work out the class interval as U.C.L – L.C.L = 34.5 – 29.5 = 5

Step 5: To get the 20.5th term:

20.5th term = L.C.L of class with median + (Summation difference / Class frequency) × Class Interval

i.e. Summation difference = 20.5 – 14 = 6.5, where 14 is the C.F of the class interval 25-29.

Step 6: The median = $29.5 + \frac{6.5}{16} \times 5 = 31.53125$.

Note that the denominator 16 is the class frequency in the class interval 30-34.
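The interpolation steps can be sketched as follows. The sketch follows this module's convention of locating the 20.5th term, i.e. position (N + 1)/2; many textbooks interpolate to position N/2 instead, which gives a slightly different value. The function name and the assumed lower class boundaries 19.5, 24.5, 29.5, 34.5, 39.5 are illustrative.

```python
def grouped_median(lower_boundaries, width, freqs):
    """Interpolate the median of grouped data at position (N + 1) / 2."""
    n = sum(freqs)
    position = (n + 1) / 2            # 20.5 when N = 40
    cumulative_before = 0
    for lower, f in zip(lower_boundaries, freqs):
        if cumulative_before + f >= position:
            # The median class has been reached: interpolate inside it.
            summation_difference = position - cumulative_before
            return lower + summation_difference / f * width
        cumulative_before += f
    raise ValueError("frequencies must contain at least one observation")


lower_boundaries = [19.5, 24.5, 29.5, 34.5, 39.5]  # assumed class boundaries
frequencies = [4, 10, 16, 8, 2]

print(grouped_median(lower_boundaries, 5, frequencies))  # 31.53125
```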

RANGE OF A DATA

The range of a data set is simply the difference between the highest and the lowest score in the data.

Example: for 23, 26, 34, 47, 63 the range is 63 - 23 = 40, and for 121, 65, 78, 203, 298, 174 the range is 298 - 65 = 233.

1) QUARTILES

Data arranged in order of magnitude can be subdivided into four equal portions, i.e. 25% each. The first division point, occurring at 25%, is the lower quartile. The middle or centre, occurring at 50%, is called the median, while the third division point, occurring at 75%, is called the upper quartile. The three points are normally referenced as $Q_1$, $Q_2$ and $Q_3$ respectively.

2) DECILES

If data arranged in order of magnitude is subdivided into 10 equal portions (10% each), then the division points are the deciles. The deciles are denoted by $D_1, D_2, D_3, \ldots, D_9$.

3) PERCENTILES

If data arranged in order of magnitude is subdivided into 100 equal portions (1% each), then the division points are the percentiles. Percentiles are denoted by $P_1, P_2, P_3, \ldots, P_{99}$.
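As a hedged illustration, Python's standard `statistics.quantiles` function (Python 3.8 or later) returns the cut points that divide data into n equal portions. The data below is hypothetical, and the function's default interpolation rule is only one of several conventions, so hand calculations by another rule may differ slightly.

```python
import statistics

data = [1, 2, 3, 4, 5, 6, 7]  # hypothetical data, already in order of magnitude

quartiles = statistics.quantiles(data, n=4)       # Q1, Q2, Q3
deciles = statistics.quantiles(data, n=10)        # D1 ... D9
percentiles = statistics.quantiles(data, n=100)   # P1 ... P99

print(quartiles)         # [2.0, 4.0, 6.0]
print(len(deciles))      # 9
print(len(percentiles))  # 99

# Q2, D5 and P50 all coincide with the median.
print(quartiles[1] == deciles[4] == percentiles[49])  # True
```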


THE MEAN DEVIATION

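The mean deviation (average deviation) of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is defined by

$$\text{Mean deviation (MD)} = \frac{\sum_{j=1}^{N} |X_j - \bar{X}|}{N} = \frac{\sum |X - \bar{X}|}{N} = \overline{|X - \bar{X}|},$$

where $\bar{X}$ is the arithmetic mean of the numbers and $|X_j - \bar{X}|$ is the absolute value of the deviation of $X_j$ from $\bar{X}$.

Example:

Find the mean deviation of the set 3, 4, 6, 8, 9.

Solution:

Arithmetic mean = $\frac{3+4+6+8+9}{5} = \frac{30}{5} = 6$

The mean deviation = $\frac{|3-6| + |4-6| + |6-6| + |8-6| + |9-6|}{5} = \frac{|-3| + |-2| + |0| + |2| + |3|}{5} = \frac{3+2+0+2+3}{5} = \frac{10}{5} = 2$

If the values $X_1, X_2, \ldots, X_m$ occur with frequencies $f_1, f_2, \ldots, f_m$:

Values         $X_1$  $X_2$  $X_3$  ……  $X_m$
Frequencies    $f_1$  $f_2$  $f_3$  ……  $f_m$

$$\text{Mean deviation} = \frac{\sum_{j=1}^{m} f_j |X_j - \bar{X}|}{N} = \frac{\sum f |X - \bar{X}|}{N} = \overline{|X - \bar{X}|}, \quad \text{where } N = \sum_{j=1}^{m} f_j.$$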
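The worked example for the set 3, 4, 6, 8, 9 can be checked with the short sketch below. The function name `mean_deviation` is an illustrative assumption.

```python
def mean_deviation(values):
    """Return the average absolute deviation of the values from their mean."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n


print(mean_deviation([3, 4, 6, 8, 9]))  # 2.0
```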


THE STANDARD DEVIATION

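The standard deviation of a set of N numbers $X_1, X_2, X_3, \ldots, X_N$ is denoted by $s$ and is defined by:

$$s = \sqrt{\frac{\sum_{j=1}^{N} (X_j - \bar{X})^2}{N}} = \sqrt{\frac{\sum (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum x^2}{N}} = \sqrt{\overline{(X - \bar{X})^2}},$$

where $x = X - \bar{X}$ denotes a deviation from the mean $\bar{X}$.

It follows that the standard deviation is the root mean square of the deviations from the mean.

If the values $X_1, X_2, \ldots, X_m$ occur with frequencies $f_1, f_2, \ldots, f_m$:

Values         $X_1$  $X_2$  $X_3$  ……  $X_m$
Frequencies    $f_1$  $f_2$  $f_3$  ……  $f_m$

$$s = \sqrt{\frac{\sum_{j=1}^{m} f_j (X_j - \bar{X})^2}{N}} = \sqrt{\frac{\sum f (X - \bar{X})^2}{N}} = \sqrt{\frac{\sum f x^2}{N}} = \sqrt{\overline{(X - \bar{X})^2}},$$

where $N = \sum_{j=1}^{m} f_j = \sum f$.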

THE VARIANCE

The variance of a set of data is defined as the square of the standard deviation, i.e. variance = $s^2$. We sometimes use $s$ to denote the standard deviation of a sample of a population and $\sigma$ (Greek letter sigma) to denote the standard deviation of a population. Thus $\sigma^2$ can represent the variance of a population and $s^2$ the variance of a sample of a population.

EXAMPLES


Solutions

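8) Mean = $\frac{\sum x}{n} = \frac{5+5+4+4+4+4+2+2+2}{9} = \frac{32}{9} \approx 3.56$

or, using frequencies,

Mean = $\frac{\sum fx}{n} = \frac{5(2)+4(4)+2(3)}{9} = \frac{32}{9} \approx 3.56$

9) Range = 5 – 2 = 3.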

Example

Given 13 observations

1, 1, 2, 3, 4, 4, 5, 6, 8, 10, 14, 15, 17

The median falls at position $\frac{n+1}{2} = \frac{14}{2} = 7$, i.e. the 7th position. The median is 5.

If n is odd, the median is the term in position $\frac{n+1}{2}$. But if n is even, we consider the average of the two middle terms.

10) Example

1, 1, 2, 2, 3, 4, 4, 5, 6, 8, 10, 14, 15, 17

Median = $\frac{4+5}{2} = 4.5$


Median of Grouped Data

The median of grouped data is the value at or below which 50% of the observations fall.

DO THIS

1. 1,1,2,2,3,4,5,7,7,7,9

2. 7,8,1,1,9,19,11,2,3,4,8

Group Work

1. Study the computation of the variance and standard deviation from the following example.

Definition

$$s^2 = \frac{\sum (x - \bar{x})^2}{N}$$

Where: $x - \bar{x}$ is the deviation from the mean, N is the number of observations, $s^2$ is the variance and $s$ is the standard deviation.

Example

Given the data 2, 4, 5, 8, 11, find the variance and the standard deviation.

$x$      $x - \bar{x}$     $(x - \bar{x})^2$
2          -4                16
4          -2                 4
5          -1                 1
8           2                 4
11          5                25
Total: $\sum x = 30$, $\sum (x - \bar{x}) = 0$, $\sum (x - \bar{x})^2 = 50$

So $\bar{x} = \frac{30}{5} = 6$.

Variance = $s^2 = \frac{50}{5} = 10$

Standard deviation = $s = \sqrt{10} \approx 3.16$.
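The same computation can be sketched in Python, dividing by N as in the definition above (the population form). The function names are illustrative assumptions.

```python
import math


def population_variance(values):
    """Return the mean of the squared deviations from the mean (divisor N)."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / n


def population_std_dev(values):
    """Return the square root of the population variance."""
    return math.sqrt(population_variance(values))


data = [2, 4, 5, 8, 11]
print(population_variance(data))           # 10.0
print(round(population_std_dev(data), 4))  # 3.1623
```

Python's standard library offers the same divisor-N calculations as `statistics.pvariance` and `statistics.pstdev`.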

DO THIS

SKEWNESS

positive and negative skewness above)

For skewed distributions, the mean tends to lie on the same side of the mode as the

longer tail.

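Skewness = $\frac{\text{mean} - \text{mode}}{\text{standard deviation}} = \frac{\bar{X} - \text{mode}}{s}$

This coefficient is defined as:

Skewness = $\frac{3(\text{mean} - \text{median})}{\text{standard deviation}} = \frac{3(\bar{X} - \text{median})}{s}$

Quartile coefficient of skewness = $\frac{(Q_3 - Q_2) - (Q_2 - Q_1)}{Q_3 - Q_1} = \frac{Q_3 - 2Q_2 + Q_1}{Q_3 - Q_1}$

This is defined as:

10-90 percentile coefficient of skewness = $\frac{(P_{90} - P_{50}) - (P_{50} - P_{10})}{P_{90} - P_{10}} = \frac{P_{90} - 2P_{50} + P_{10}}{P_{90} - P_{10}}$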
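The four skewness coefficients above translate directly into code. The sketch below uses hypothetical input values chosen only to make the arithmetic easy to follow; the function names are assumptions.

```python
def mode_skewness(mean, mode, std_dev):
    """(mean - mode) / standard deviation."""
    return (mean - mode) / std_dev


def median_skewness(mean, median, std_dev):
    """3 * (mean - median) / standard deviation."""
    return 3 * (mean - median) / std_dev


def quartile_skewness(q1, q2, q3):
    """(Q3 - 2*Q2 + Q1) / (Q3 - Q1)."""
    return (q3 - 2 * q2 + q1) / (q3 - q1)


def percentile_skewness(p10, p50, p90):
    """(P90 - 2*P50 + P10) / (P90 - P10)."""
    return (p90 - 2 * p50 + p10) / (p90 - p10)


# Hypothetical summary values, not taken from the module.
print(mode_skewness(mean=6, mode=4, std_dev=2))          # 1.0
print(median_skewness(mean=6, median=5, std_dev=2))      # 1.5
print(quartile_skewness(q1=1, q2=2, q3=5))               # 0.5
print(percentile_skewness(p10=10, p50=50, p90=90))       # 0.0
```

A positive result indicates a longer tail to the right, a negative result a longer tail to the left, and zero a symmetrical arrangement of the chosen points.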

( n 1) x 0.25

25th percentile = 9(.25) 22.5( percentile)

2nd = 2

4th = 4

Group Work

1. Study the computation of percentiles and attempt the following question.

DO THIS


Find the 25th percentile, the 50th percentile, and 90th percentile

46,21,89,42,35,36,67,53,42,75,42,75,47,85,40,73,48,32,41,20,75,48,48,32,52,61,49,50,

69,59,30,40,31,25,43,52,62,50

Answer Key

a) 36 b) 48 c) 73

KURTOSIS

Definition: Kurtosis is the degree of peakedness of a distribution, as compared to the

normal distribution.

EXAMPLES:

1) LEPTOKURTIC DISTRIBUTION

2) PLATYKURTIC DISTRIBUTION

DO THIS


Find the mode for the data collection:

1) 1,3,4,4,2,3,5,1,3,3,5,4,2,2,2,3,3,4,4,5

2) Number of marriages per 1000 persons in the African population for the years 1965 – 1975

Year Rate

1965 9.3

1966 9.5

1967 9.7

1968 10.4

1969 10.6

1970 10.6

1971 10.6

1972 10.9

1973 10.8

1974 10.5

1975 10.0

3) Number of deaths per 1000 persons for the years 1960 and 1965 – 1975

1960 9.5

1965 9.4

1966 9.5

1967 9.4

1968 9.7

1969 9.5

1970 9.5

1971 9.3

1972 9.4

1973 9.3

1974 9.1

1975 8.8

SOLUTIONS

1. 3

2. 10.6

3. 9.5
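The three modes above can be checked with a small Python sketch. The helper name `mode` is an illustrative choice; if several values tie for the highest frequency, this version simply returns the one that appears first in the data.

```python
from collections import Counter

def mode(values):
    """Return the most frequent value in the data."""
    counts = Counter(values)
    # most_common(1) gives a list holding the (value, frequency) pair
    # with the highest frequency.
    return counts.most_common(1)[0][0]

scores = [1, 3, 4, 4, 2, 3, 5, 1, 3, 3, 5, 4, 2, 2, 2, 3, 3, 4, 4, 5]
marriage_rates = [9.3, 9.5, 9.7, 10.4, 10.6, 10.6, 10.6, 10.9, 10.8, 10.5, 10.0]
death_rates = [9.5, 9.4, 9.5, 9.4, 9.7, 9.5, 9.5, 9.3, 9.4, 9.3, 9.1, 8.8]

for name, data in [("scores", scores),
                   ("marriage rates", marriage_rates),
                   ("death rates", death_rates)]:
    print(name, "mode =", mode(data))
```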

READ:

An Introduction to Probability by Charles M.

Grinstead pages 247 -263

Exercise on pg 263-267 Nos. 4,7,8,9


PROBABILITY

Terminology

a) A Probability experiment

When you toss a coin, pick a card from a deck of playing cards or roll a die, the act constitutes a probability experiment. In a probability experiment, the set of possible results is well defined in advance. For example, there are only two possible results when a fair coin is tossed: you get either a head or a tail, and the two have equal chances of occurrence.

b) An Outcome

This is defined as the result of a single trial of a probability experiment e.g. When you

toss a coin once, you either get head or tail.

c) A Trial

This refers to a single performance of an experiment, such as picking a card from a deck of cards or rolling one or more dice.

d) A Sample Space

This refers to the set of all possible outcomes of a probability experiment, e.g. in tossing a coin, the outcomes are either Head (H) or Tail (T), i.e. there are only two possible outcomes in tossing a coin. For a fair coin, the chances of obtaining a head or a tail are equal.

e) Simple and Compound Events

In a probability experiment, an event with only one outcome is called a simple event. If an event has two or more outcomes, it is called a compound event.

2) Definition of Probability.

Probability can be defined as the mathematics of chance. There are four main approaches to probability:

1) The classical or a priori approach

2) The relative frequency or empirical approach

3) The axiomatic approach

4) The personalistic approach

In the classical approach, probability is the ratio of the number of favourable cases to the total number of equally likely cases. Suppose an event can occur in N ways out of a total of M equally likely possible ways. Then the probability of occurrence of the event is denoted by

p = Pr(event) = $\frac{N}{M}$.

That is, probability is the ratio of favourable outcomes to all possible outcomes.


The probability of non-occurrence of the same event is given by 1 - p. The probability of occurrence plus the probability of non-occurrence is equal to one.

If the probability of occurrence is p(O) and the probability of non-occurrence is p(O’), then p(O) + p(O’) = 1.

For example:

Observation (X)    0    1    2    3    4

Frequency (f)      3    7    10   16   11

P(2) = $\frac{\text{frequency of 2}}{\text{sum of frequencies}} = \frac{f(2)}{\sum f} = \frac{10}{3 + 7 + 10 + 16 + 11} = \frac{10}{47}$

3) Properties of Probability

a) The probability of any event lies between 0 and 1, i.e. 0 ≤ p(O) ≤ 1. It follows that a probability can be neither negative nor greater than 1.

b) Probability of an impossible event ( an event that cannot occur ) is always zero(0)

c) Probability of an event that will certainly occur is 1.

d) The total sum of probabilities of all the possible outcomes in a sample space is

always equal to one(1).

e) If the probability of occurrence is p(o)= A, then the probability of non-occurrence

is 1-A.

COUNTING RULES

1) FACTORIALS

Definition: Factorial 4 ! = 4 x 3 x 2 x 1 and 7! = 7 x 6 x 5 x 4 x 3 x 2 x 1


2) PERMUTATION RULES

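Definition: nPr = $\frac{n!}{(n-r)!}$

Examples:

5P3 = $\frac{5!}{(5-3)!} = \frac{5 \times 4 \times 3 \times 2 \times 1}{2 \times 1} = 5 \times 4 \times 3 = 60$

8P5 = $\frac{8!}{(8-5)!} = \frac{8!}{3!} = \frac{8 \times 7 \times 6 \times 5 \times 4 \times 3 \times 2 \times 1}{3 \times 2 \times 1} = 8 \times 7 \times 6 \times 5 \times 4 = 6720$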

3) COMBINATIONS

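Definition: nCr = $\frac{n!}{(n-r)!\,r!}$

Examples:

5C2 = $\frac{5!}{(5-2)!\,2!} = \frac{5!}{3!\,2!} = \frac{5 \times 4}{2 \times 1} = 10$

10C6 = $\frac{10!}{(10-6)!\,6!} = \frac{10!}{4!\,6!} = \frac{10 \times 9 \times 8 \times 7 \times 6!}{4 \times 3 \times 2 \times 1 \times 6!} = \frac{10 \times 9 \times 8 \times 7}{4 \times 3 \times 2 \times 1} = 210$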

DO THIS

1). 8P3

2) 8C3

3) 15C10

4) 6C3

5) 15P4

6) 9C3

7) 10C8

8) 7P4

Answer key

1) 336 2) 56 3) 3003 4) 20

5) 32 760 6) 84 7) 45 8) 840
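The eight exercises above can be evaluated with Python's standard library, as a way of checking hand calculations. This sketch assumes Python 3.8 or later, where `math.perm` and `math.comb` are available; the dictionary name `answers` is an illustrative choice.

```python
import math

# math.perm(n, r) computes n! / (n - r)!
# math.comb(n, r) computes n! / ((n - r)! r!)
answers = {
    "8P3": math.perm(8, 3),
    "8C3": math.comb(8, 3),
    "15C10": math.comb(15, 10),
    "6C3": math.comb(6, 3),
    "15P4": math.perm(15, 4),
    "9C3": math.comb(9, 3),
    "10C8": math.comb(10, 8),
    "7P4": math.perm(7, 4),
}

for name, value in answers.items():
    print(name, "=", value)
```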


RULES OF PROBABILITY

ADDITION RULES

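1) Rule 1: If A and B are two mutually exclusive events, then

P(A or B) = P(A) + P(B)

Example: When a die is rolled once, find the probability of getting a 3 or a 5.

Solution: P(3) = 1/6 and P(5) = 1/6.

Therefore P(3 or 5) = P(3) + P(5) = 1/6 + 1/6 = 2/6 = 1/3.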

2) Rule 2: If A and B are two events that are NOT mutually exclusive, then

P(A or B) = P(A) + P(B) - P(A and B), where P(A and B) is the probability of the outcomes that events A and B have in common.

Example: When a card is drawn from a pack of 52 cards, find the probability that the card is a 10 or a heart.

Solution:

P(10) = 4/52 and P(heart) = 13/52

P(10 that is a heart) = 1/52

P(A or B) = P(A) + P(B) - P(A and B) = 4/52 + 13/52 – 1/52 = 16/52 = 4/13.
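The second addition rule can be checked by listing all 52 cards and counting. The sketch below is only an illustration; the names `deck`, `prob`, `is_ten` and `is_heart` are assumptions made for this example.

```python
from fractions import Fraction
from itertools import product

ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["hearts", "diamonds", "clubs", "spades"]
deck = list(product(ranks, suits))  # 52 (rank, suit) pairs

def prob(event):
    """Classical probability: favourable cards divided by all cards."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))

def is_ten(card):
    return card[0] == "10"

def is_heart(card):
    return card[1] == "hearts"

# Direct count of cards that are a 10 or a heart.
p_direct = prob(lambda c: is_ten(c) or is_heart(c))
# The same probability from P(A) + P(B) - P(A and B).
p_rule = prob(is_ten) + prob(is_heart) - prob(lambda c: is_ten(c) and is_heart(c))

print(p_direct, p_rule)
```

Both routes give the same fraction because subtracting P(A and B) removes the 10 of hearts, which would otherwise be counted twice.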

MULTIPLICATION RULES

1) Rule 1: For two independent events A and B, then P( A and B) = P(A) x P(B).

Example: Determine the probability of obtaining a 5 on a die and a tail on a coin in

one throw.

Solution: P( 5) =1/6 and P(T) =1/2.

P(5 and T)= P( 5) x P(T) = 1/6 x ½= 1/12.

2) Rule 2: When two events are dependent, the probability of both events occurring is

P(A and B) = P(A) x P(B|A), where P(B|A) is the probability that event B occurs given that event A has already occurred.

Example: Find the probability of obtaining two Aces when two cards are drawn from a pack of 52 cards without replacement.

Solution: P(first Ace) = 4/52 and P(second Ace if NO replacement) = 3/51

Therefore P(Ace and Ace) = P(first Ace) x P(second Ace | first Ace) = 4/52 x 3/51 = 1/221

CONDITIONAL PROBABILITY

The conditional probability of event A given that event B has occurred is

$P(A|B) = \frac{P(A \text{ and } B)}{P(B)}$,

where P(A and B) means the probability of the outcomes that events A and B have in common.

Example: When a die is rolled once, find the probability of getting a 4 given that the number obtained is even.

Solution: P(4 and an even number) = 1/6, i.e. P(A and B) = 1/6. P(even number) = 3/6 = 1/2.

$P(A|B) = \frac{P(A \text{ and } B)}{P(B)} = \frac{1/6}{1/2} = \frac{1}{3}$
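The conditional probability formula can be applied directly to the six faces of a die. This is a minimal sketch; the helper names `prob` and `conditional` are illustrative and not part of any library.

```python
from fractions import Fraction

sample_space = [1, 2, 3, 4, 5, 6]  # faces of a fair die

def prob(event):
    """Probability of an event, given as a true/false test on an outcome."""
    favourable = sum(1 for x in sample_space if event(x))
    return Fraction(favourable, len(sample_space))

def conditional(a, b):
    """P(A | B) = P(A and B) / P(B)."""
    return prob(lambda x: a(x) and b(x)) / prob(b)

def is_four(x):
    return x == 4

def is_even(x):
    return x % 2 == 0

p_four_given_even = conditional(is_four, is_even)
print(p_four_given_even)
```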

EXAMPLES:

1) A bag contains 3 orange, 3 yellow and 2 white marbles. Three marbles are selected without replacement. Find the probability of selecting two yellow marbles followed by a white marble.

Solution: P(1st Y) = 3/8, P(2nd Y) = 2/7 and P(W) = 2/6

P(Y and Y and W) = P(1st Y) x P(2nd Y) x P(W) = 3/8 x 2/7 x 2/6 = 1/28

2) In a class, there are 8 girls and 6 boys. If three students are selected at random for debating, find the probability that all are girls.

Solution: For the first selection, P(G) = 8/14 and P(B) = 6/14. P(1st G) = 8/14, P(2nd G) = 7/13 and P(3rd G) = 6/12.

P(three girls) = 8/14 x 7/13 x 6/12 = 2/13

Solution: 8C3 = 56 ways.

4) A box has 12 bulbs, of which 3 are defective. If 4 bulbs are sold, find the probability

that exactly one will be defective.

Solution:

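Number of ways of choosing 1 defective bulb = 3C1 and number of ways of choosing 3 non-defective bulbs = 9C3.

3C1 x 9C3 = $\frac{3!}{(3-1)!\,1!} \times \frac{9!}{(9-3)!\,3!}$ = 3 x 84 = 252

Number of ways of choosing 4 bulbs from 12 = 12C4 = 495.

P(1 defective bulb and 3 good bulbs) = 252/495 = 0.509.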

DO THIS

2) In how many ways can 3 pens be selected from 12 pens?

3) From a pack of 52 cards, 3 cards are selected. What is the probability that they will

all be diamonds?

Answer Key:

1). 5040

2). 220

3). 0.013


READ :An Introduction to Probability & Random Processes

By Kenneth B & Gian-Carlo R, pages

1.20 -1.22

Exercise Chapter 1: Sets, Events & Probability Pg 1.23-1.28

Nos. 1-12 & 14-20

2.1-2.33

Exercise Chapter 2: Finite Processes Pg 2.33 Nos. 1,2,3,13-

20, 22-27

Introduction to Probability, By Charles M. Grinstead

pages139-141

RANDOM VARIABLES

Definition: A random variable is a function that assigns a real number to

every possible result of a random experiment.

(Harry Frank & Steve C Althoen,CUP, 1994, pg 155)

A random variable is a variable in the sense that it can be used as a placeholder for a

number in equations and inequalities. Its randomness is completely described by its

cumulative distribution function which can be used to determine the probability it takes

on particular values.

real numbers. For example, a random variable can be used to describe the process of

rolling a fair die and the possible outcomes { 1, 2, 3, 4, 5, 6 }. The most obvious

representation is to take this set as the sample space, the probability measure to be

uniform measure, and the function to be the identity function.

Random variable

Some consider the expression random variable a misnomer, as a random variable is not

a variable but rather a function that maps outcomes (of an experiment) to numbers. Let

A be a σ-algebra and Ω the space of outcomes relevant to the experiment being

performed. In the die-rolling example, the space of outcomes is the set Ω = { 1, 2, 3, 4,

5, 6 }, and A would be the power set of Ω. In this case, an appropriate random variable

might be the identity function X(ω) = ω, such that if the outcome is a '1', then the

random variable is also equal to 1. An equally simple but less trivial example is one in

which we might toss a coin: a suitable space of possible outcomes is Ω = { H, T } (for

heads and tails), and A equal again to the power set of Ω. One among the many

possible random variables defined on this space is


Mathematically, a random variable is defined as a measurable function from a sample

space to some measurable space.

In probability theory, there are several notions of convergence for random variables. They are listed below in order of strength, i.e., any subsequent notion of convergence in the list implies convergence according to all of the preceding notions.

Convergence in distribution: A sequence of random variables $X_1, X_2, \ldots$ converges to the random variable $X$ in distribution if their respective cumulative distribution functions $F_1, F_2, \ldots$ converge to the cumulative distribution function $F$ of $X$, wherever $F$ is continuous.

Weak convergence: The sequence of random variables $X_1, X_2, \ldots$ is said to converge towards the random variable $X$ weakly if $\lim_{n \to \infty} P(|X_n - X| \geq \varepsilon) = 0$ for every ε > 0. Weak convergence is also called convergence in probability.

Strong convergence: The sequence of random variables $X_1, X_2, \ldots$ is said to converge towards the random variable $X$ strongly if $P(\lim_{n \to \infty} X_n = X) = 1$. Strong convergence is also known as almost sure convergence.

In both weak and strong convergence, the random variables $X_n$ show an increasing correlation with $X$.

However, in case of convergence in distribution, the realized values of the random

variables do not need to converge, and any possible correlation among them is

immaterial.

If a fair coin is tossed, we know that roughly half of the time it will turn up heads, and the

other half it will turn up tails. It also seems that the more we toss it, the more likely it is

that the ratio of heads:tails will approach 1:1. Modern probability allows us to formally

arrive at the same result, dubbed the law of large numbers. This result is remarkable

because it was nowhere assumed while building the theory and is completely an

offshoot of the theory. Linking theoretically-derived probabilities to their actual frequency

of occurrence in the real world, this result is considered as a pillar in the history of

statistical theory.


The strong law of large numbers (SLLN) states that if an event of probability p is observed repeatedly during independent experiments, the ratio of the observed frequency of that event to the total number of repetitions converges strongly (almost surely) towards p.

In other words, if $Y_1, Y_2, \ldots$ are independent Bernoulli random variables taking the value 1 with probability p and 0 with probability 1-p, then the sequence of random numbers $\frac{1}{n}\sum_{i=1}^{n} Y_i$ converges to p almost surely.
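The law of large numbers can be illustrated (not proved) by simulation. The sketch below tosses a simulated fair coin many times and records the running proportion of heads; the function name, the seed and the number of trials are arbitrary choices made for this illustration.

```python
import random

def running_proportion(p, n_trials, seed=0):
    """Simulate n_trials Bernoulli(p) trials and return the running
    proportion of successes after each trial."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    successes = 0
    proportions = []
    for n in range(1, n_trials + 1):
        if rng.random() < p:   # success with probability p
            successes += 1
        proportions.append(successes / n)
    return proportions

props = running_proportion(p=0.5, n_trials=100_000)
for n in (10, 100, 1_000, 10_000, 100_000):
    print(n, "tosses: proportion of heads =", props[n - 1])
```

With a different seed the early proportions change noticeably, while the proportions for large n stay close to p. That is the behaviour the law describes.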

The central limit theorem is the reason for the ubiquitous occurrence of the normal

distribution in nature, for which it is one of the most celebrated theorems in probability

and statistics.

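The theorem states that the average of many independent and identically distributed random variables with finite variance tends towards a normal distribution irrespective of which distribution the original random variables follow. Formally, let $X_1, X_2, \ldots$ be independent random variables with means $\mu_1, \mu_2, \ldots$ and variances $\sigma_1^2, \sigma_2^2, \ldots$ Then, under suitable conditions (for example, when the variables are identically distributed with finite variance), the sequence of random variables

$Z_n = \frac{\sum_{i=1}^{n} (X_i - \mu_i)}{\sqrt{\sum_{i=1}^{n} \sigma_i^2}}$

converges in distribution to a standard normal random variable.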

If X is a random variable on Ω and f: R → R is a measurable function, then Y = f(X) will also be a random variable on Ω, since the composition of measurable functions is also measurable. The same procedure that allowed one to go from a probability space (Ω, P) to (R, dF_X) can be used to obtain the distribution of Y. The cumulative distribution function of Y is

$F_Y(y) = P(f(X) \leq y)$.


Example

If y ≥ 0, then

So

PROBABILITY DISTRIBUTIONS

Certain random variables occur very often in probability theory due to many natural and

physical processes. Their distributions therefore have gained special importance in

probability theory. Some fundamental discrete distributions are the discrete uniform,

Bernoulli, binomial, negative binomial, Poisson and geometric distributions. Important

continuous distributions include the continuous uniform, normal, exponential, gamma

and beta distributions.

DISTRIBUTION FUNCTIONS

If a random variable X: Ω → R defined on the probability space (Ω, P) is given, we can ask questions like "How likely is it that the value of X is bigger than 2?". This is the same as the probability of the event {s ∈ Ω : X(s) > 2}, which is often written as P(X > 2) for short.

Recording all these probabilities of output ranges of a real-valued random variable X yields the probability distribution of X. The probability distribution "forgets" about the particular probability space used to define X and only records the probabilities of various values of X. Such a probability distribution can always be captured by its cumulative distribution function

$F_X(x) = P(X \leq x)$.

In measure-theoretic terms, we use the random variable X to "push-forward" the measure P on Ω to a measure dF on R. The underlying probability space Ω is a technical device used to guarantee the existence of random variables, and sometimes to construct them. In practice, one often disposes of the space Ω altogether and just puts a measure on R that assigns measure 1 to the whole real line, i.e., one works with probability distributions instead of random variables.

Discrete probability theory deals with events which occur in countable sample

spaces.

Examples: Throwing dice, experiments with decks of cards, and random walk.

Classical definition: Initially the probability of an event to occur was defined as number

of cases favorable for the event, over the number of total outcomes possible.

For example, if the event is "occurrence of an even number when a die is rolled", the probability is given by 3/6 = 1/2, since 3 faces out of the 6 have even numbers.

Modern definition: The modern definition starts with a set called the sample space, which relates to the set of all possible outcomes in the classical sense, denoted by $\Omega = \{x_1, x_2, \ldots\}$. It is then assumed that for each element $x \in \Omega$, an intrinsic "probability" value $f(x)$ is attached, which satisfies the following properties:

1. $f(x) \in [0, 1]$ for all $x \in \Omega$

2. $\sum_{x \in \Omega} f(x) = 1$

An event is defined as any subset $E$ of the sample space $\Omega$. The probability of the event $E$ is defined as

$P(E) = \sum_{x \in E} f(x)$

So, the probability of the entire sample space is 1, and the probability of the null event is 0.

The function $f(x)$ mapping a point in the sample space to the "probability" value is

called a probability mass function abbreviated as pmf. The modern definition does

not try to answer how probability mass functions are obtained; instead it builds a theory

that assumes their existence.
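A probability mass function on a finite sample space can be stored as a simple lookup table. The sketch below does this for a fair die; the names `pmf` and `event_probability` are illustrative, and exact fractions are used so that the two defining properties can be checked without rounding.

```python
from fractions import Fraction

# pmf of a fair die: each face of the sample space gets probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def event_probability(event, pmf):
    """P(E): add up the pmf values of the outcomes that belong to E."""
    return sum((p for x, p in pmf.items() if x in event), Fraction(0))

p_even = event_probability({2, 4, 6}, pmf)
p_whole_space = event_probability(set(pmf), pmf)
p_null = event_probability(set(), pmf)

print("P(even) =", p_even)
print("P(sample space) =", p_whole_space)
print("P(null event) =", p_null)
```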


CONTINUOUS PROBABILITY THEORY

Continuous probability theory deals with events which occur in a continuous sample

space.

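If the sample space is the set of real numbers $\mathbb{R}$, then a function called the cumulative distribution function or cdf, $F$, is assumed to exist, which gives $P(X \leq x) = F(x)$ for a random variable X. The cdf satisfies the following properties:

1. $F$ is non-decreasing and right-continuous

2. $\lim_{x \to -\infty} F(x) = 0$

3. $\lim_{x \to \infty} F(x) = 1$

If $F$ is differentiable, its derivative $f(x) = \frac{dF(x)}{dx}$ is called the probability density function or pdf of X. Whereas the pdf exists only for continuous random variables, the cdf exists for all random variables (including discrete random variables) that take values on $\mathbb{R}$.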


PROBABILITY DENSITY FUNCTION

DISCRETE DISTRIBUTION

If X is a variable that can assume a discrete set of values X1, X2, X3, …, Xk with respective probabilities p1, p2, p3, …, pk, where p1 + p2 + p3 + … + pk = 1, we say that a discrete probability distribution for X has been defined. The function p(X), which has the respective values p1, p2, p3, …, pk for X = X1, X2, X3, …, Xk, is called the probability function, or frequency function, of X. Because X can assume certain values with given probabilities, it is often called a discrete random variable. A random variable is also known as a chance variable or stochastic variable. {Murray R, 2006 pg 130}

CONTINUOUS DISTRIBUTION

A continuous random variable X is specified by its probability density function, which is written f(x), where f(x) ≥ 0 throughout the range of values for which x is valid. This probability density function can be represented by a curve, and the probabilities are given by the area under the curve.

[Figure: curve of p(X) against X, with the region between X = a and X = b shaded]

The total area under the curve is equal to 1. The area under the curve between the lines x = a and x = b (shaded) gives the probability that X lies between a and b, which can be denoted by P(a < X < b). p(X) is called a probability density function and the variable X is often called a continuous random variable.

The probability that X lies in the range from a to b is given by

$P(a \leq X \leq b) = \int_a^b f(x)\,dx$,

which is the shaded area.


[Figures: one curve has its maximum towards the right and the longer tail to the left; the other has its maximum occurring at the right end]

Note: when computing the area from a to b, we need not distinguish between strict (< and >) and non-strict (≤ and ≥) inequalities. We assume the lines at a and b have no thickness, so their area is zero.

Solved Examples:

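1). A continuous random variable X has a probability density function defined by

$f(x) = kx(16 - x^2)$, for 0 < x < 4.

Evaluate:

a). The value of the constant k

b). The probability P(1 < X < 2)

c). The probability P(X ≥ 3)

Solution:

[Figure: curve of f(x) against x between x = a and x = b]

Any function f(x) such that

$f(x) \geq 0$ for $a \leq x \leq b$, and $\int_a^b f(x)\,dx = 1$

may be taken as the probability density function (p.d.f) of a continuous random variable in the range space $a \leq x \leq b$.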


Procedure:

Step 1: In general, if X is a continuous random variable (r.v) with p.d.f f(x) valid over the range a ≤ x ≤ b, then

$\int_{\text{all } x} f(x)\,dx = 1$, i.e. $\int_a^b f(x)\,dx = 1$

Step 2:

a). To determine k, we use the fact that f(x) = kx(16 - x²) for 0 < x < 4, so

$\int_0^4 kx(16 - x^2)\,dx = 1$

$k \int_0^4 (16x - x^3)\,dx = 1$

$k \left[ 8x^2 - \frac{x^4}{4} \right]_0^4 = 64k = 1$

$k = \frac{1}{64}$

Step 3:

b). To find P(1 < X < 2):

$P(1 < X < 2) = \int_1^2 f(x)\,dx = \frac{1}{64} \int_1^2 (16x - x^3)\,dx = \frac{81}{256}$

Step 4:

c). To find P(X ≥ 3):

$P(X \geq 3) = \frac{1}{64} \int_3^4 (16x - x^3)\,dx = \frac{49}{256}$
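The three steps above can be checked exactly with fractions. The sketch relies on the antiderivative of 16x - x³, which is 8x² - x⁴/4; the function names are illustrative choices for this example.

```python
from fractions import Fraction

def antiderivative(x):
    """Antiderivative of 16x - x^3, namely 8x^2 - x^4/4."""
    x = Fraction(x)
    return 8 * x**2 - x**4 / 4

def integral(a, b):
    """Definite integral of 16x - x^3 from a to b."""
    return antiderivative(b) - antiderivative(a)

# a) k must make the total area over 0 < x < 4 equal to 1.
k = 1 / integral(0, 4)
# b) P(1 < X < 2)
p_b = k * integral(1, 2)
# c) P(X >= 3): the density is zero beyond x = 4.
p_c = k * integral(3, 4)

print("k =", k)
print("P(1 < X < 2) =", p_b)
print("P(X >= 3) =", p_c)
```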

Example 2:


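2). X is the continuous random variable ‘the mass of a substance, in kg, per minute in an industrial production process’, where

$f(x) = \begin{cases} \frac{1}{18}x(6 - x) & 0 \leq x \leq 3 \\ 0 & \text{otherwise} \end{cases}$

Find the probability that the mass is more than 2 kg.

Solution:

X can take values from 0 to 3 only. We sketch f(x), and shade the area required.

[Figure: sketch of $f(x) = \frac{1}{18}x(6 - x)$ for 0 ≤ x ≤ 3, with the area between x = 2 and x = 3 shaded]

$P(X > 2) = \int_2^3 \frac{1}{18}x(6 - x)\,dx = \frac{1}{18}\int_2^3 (6x - x^2)\,dx = \frac{1}{18}\left[3x^2 - \frac{x^3}{3}\right]_2^3 = \frac{1}{18} \times \frac{26}{3} = \frac{13}{27} = 0.481$ (3 d.p.)

The probability that the mass is more than 2 kg is 0.481.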

Worked example:

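3). A continuous random variable X has p.d.f f(x) where $f(x) = kx^2$, 0 ≤ x ≤ 6.

a). Find the value of k

b). Find P(2 ≤ X ≤ 4)

Solution:

a). Since X is a random variable, the total probability is 1, i.e.

$\int_{\text{all } x} f(x)\,dx = 1$

$\int_0^6 kx^2\,dx = 1$

$\left[\frac{kx^3}{3}\right]_0^6 = 1$

$\frac{216k}{3} = 1$

$k = \frac{3}{216} = \frac{1}{72}$

Therefore $f(x) = \frac{3}{216}x^2 = \frac{1}{72}x^2$, 0 ≤ x ≤ 6

b).

[Figure: sketch of $f(x) = \frac{1}{72}x^2$ for 0 ≤ x ≤ 6, with the area between x = 2 and x = 4 shaded]

$P(2 \leq X \leq 4) = \int_2^4 \frac{1}{72}x^2\,dx = \frac{1}{216}\left[x^3\right]_2^4 = \frac{56}{216} = 0.259$

Therefore the probability P(2 ≤ X ≤ 4) = 0.259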

Worked Example:

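4). The continuous random variable (r.v) X has a probability density function (p.d.f) f(x) where

$f(x) = \begin{cases} k & 0 \leq x < 2 \\ k(2x - 3) & 2 \leq x \leq 5 \\ 0 & \text{otherwise} \end{cases}$

a). Find the value of the constant k

b). Sketch y = f(x)

c). Find P(X ≤ 1)

d). Find P(X > 2.5)

Solution:

a). Since X is a r.v, then

$\int_{\text{all } x} f(x)\,dx = 1$

Therefore

$\int_0^2 k\,dx + \int_2^5 k(2x - 3)\,dx = 1$

$\left[kx\right]_0^2 + k\left[x^2 - 3x\right]_2^5 = 1$

$2k + 12k = 1$

$k = \frac{1}{14}$

$f(x) = \begin{cases} \frac{1}{14} & 0 \leq x < 2 \\ \frac{1}{14}(2x - 3) & 2 \leq x \leq 5 \\ 0 & \text{otherwise} \end{cases}$

b). SKETCH

[Figure: sketch of y = f(x): a horizontal line at height 1/14 for 0 ≤ x < 2, then a straight line rising from 1/14 at x = 2 to 1/2 at x = 5; the points x = 1 and x = 2.5 are marked on the horizontal axis]

c). P(X ≤ 1) = area between 0 and 1 = L x W = 1 x $\frac{1}{14}$ = $\frac{1}{14}$ = 0.071

d). P(X ≤ 2.5) = area between 0 and 2.5 = $\left(\frac{1}{14} \times 2\right) + \frac{1}{2}(0.5)\left(\frac{1}{14} + \frac{2}{14}\right) = \frac{11}{56} = 0.196$, so P(X > 2.5) = $1 - \frac{11}{56} = \frac{45}{56} = 0.804$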
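The normalising constant and the two probabilities in Worked Example 4 can be checked exactly by integrating each piece of the density. The sketch below uses the antiderivative of 2x - 3, which is x² - 3x; all names are illustrative.

```python
from fractions import Fraction

def ramp_antiderivative(x):
    """Antiderivative of 2x - 3, namely x^2 - 3x."""
    x = Fraction(x)
    return x**2 - 3 * x

# Areas of the two pieces before multiplying by k.
flat_area = Fraction(2 - 0)                                   # integral of 1 over [0, 2]
ramp_area = ramp_antiderivative(5) - ramp_antiderivative(2)   # integral of 2x - 3 over [2, 5]

# k is whatever makes the total area equal to 1.
k = 1 / (flat_area + ramp_area)

# P(X <= 1): a rectangle of width 1 and height k.
p_le_1 = k * 1
# P(X > 2.5): area under k(2x - 3) from 2.5 to 5.
p_gt_2_5 = k * (ramp_antiderivative(5) - ramp_antiderivative(Fraction(5, 2)))

print("k =", k)
print("P(X <= 1) =", p_le_1, "=", float(p_le_1))
print("P(X > 2.5) =", p_gt_2_5, "=", float(p_gt_2_5))
```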

ICT in Graphs. The link opens up avenue for Maths teachers to learn how

to draw graphs using free software

http://www.chartwellyorke.com/


DO THIS

1). The continuous random variable X has p.d.f f(x) where f(x) = k, 0 ≤ x ≤ 3.

a) Sketch y = f(x)

b). Find the value of the constant k

c). Find P(0.5 ≤ X ≤ 1)

2). The continuous random variable X has p.d.f f(x) where f(x) = kx², 1 ≤ x ≤ 4.

a). Find the value of the constant k

b). Find P(x 2)

c). Find P(2.5 ≤ X ≤ 3.5)

3). The continuous random variable has p.d.f f(x) where

k 0 x2

f ( x) k (2 x 1) 2 x3

0 otherwise

b) Sketch y=f(x)

c) Find P(X 2 )

d) Find P(1 ≤ X ≤ 2.2)

EXPECTATION

Definition:

If X is a continuous random variable (r.v) with probability density function (p.d.f) f(x), then the expectation of X is E(X), where

$E(X) = \int_{\text{all } x} x f(x)\,dx$

Example:


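1). If X is a continuous random variable (r.v) with p.d.f $f(x) = \frac{1}{9}x^2$, 0 ≤ x ≤ 3, find E(X).

Solution:

$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^3 x \cdot \frac{1}{9}x^2\,dx = \frac{1}{9}\left[\frac{x^4}{4}\right]_0^3 = \frac{81}{36} = 2.25$

2). If the continuous random variable X has p.d.f $f(x) = \frac{3}{4}(3 - x)(x - 1)$, 1 ≤ x ≤ 3, find E(X).

$E(X) = \int_{\text{all } x} x f(x)\,dx = \int_1^3 \frac{3}{4}x(3 - x)(x - 1)\,dx = \frac{3}{4}\int_1^3 (-x^3 + 4x^2 - 3x)\,dx$

$= \frac{3}{4}\left[-\frac{x^4}{4} + \frac{4x^3}{3} - \frac{3x^2}{2}\right]_1^3 = \frac{3}{4} \times \frac{8}{3} = 2$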


GENERALISATION:

If g(x) is any function of the continuous random variable (r.v) X having p.d.f f(x), then

$E[g(X)] = \int_{\text{all } x} g(x) f(x)\,dx$

and in particular

$E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx$

For constants a and b, the following properties hold:

1. E(a) = a

2. E(aX) = aE(X)

3. E(aX + b) = aE(X) + b

4. E(f1(X) + f2(X)) = E(f1(X)) + E(f2(X))
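The properties above can be checked exactly for a polynomial density. As an assumption for this sketch, the density $f(x) = \frac{1}{72}x^2$ on 0 ≤ x ≤ 6 (the one found in worked example 3 earlier) is used, and polynomials are stored as coefficient lists, so that `[3, 2]` stands for 3 + 2x. The helper names are illustrative.

```python
from fractions import Fraction

def integrate_poly(coeffs, a, b):
    """Exact integral over [a, b] of sum(coeffs[i] * x**i)."""
    a, b = Fraction(a), Fraction(b)
    return sum(Fraction(c) * (b ** (i + 1) - a ** (i + 1)) / (i + 1)
               for i, c in enumerate(coeffs))

# Density f(x) = x^2 / 72 on [0, 6], as a coefficient list.
density = [0, 0, Fraction(1, 72)]
LOWER, UPPER = 0, 6

def expectation(g):
    """E[g(X)]: integrate g(x) * f(x) over the range of X."""
    product = [Fraction(0)] * (len(g) + len(density) - 1)
    for i, gi in enumerate(g):
        for j, fj in enumerate(density):
            product[i + j] += Fraction(gi) * fj
    return integrate_poly(product, LOWER, UPPER)

total_area = expectation([1])        # should be 1 for a valid p.d.f
e_x = expectation([0, 1])            # E(X)
e_x2 = expectation([0, 0, 1])        # E(X^2)
e_2x_plus_3 = expectation([3, 2])    # E(2X + 3)

print("total area =", total_area)
print("E(X) =", e_x)
print("E(X^2) =", e_x2)
print("E(2X + 3) =", e_2x_plus_3, "and 2E(X) + 3 =", 2 * e_x + 3)
```

Checking that the total area equals 1 before computing any expectation is a useful habit, because a function that fails this test is not a valid p.d.f.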

Example:

1). The continuous random variable X has p.d.f f(x) where $f(x) = \frac{1}{2}x$, 0 ≤ x ≤ 2.

Find

a). E(X)

b). E(X²)

c). E(2X + 3)

Solution:

a) $E(X) = \int_{\text{all } x} x f(x)\,dx = \int_0^2 \frac{1}{2}x^2\,dx = \frac{1}{2}\left[\frac{x^3}{3}\right]_0^2 = \frac{4}{3} = 1.333$

b) $E(X^2) = \int_{\text{all } x} x^2 f(x)\,dx = \frac{1}{2}\int_0^2 x^3\,dx = \frac{1}{2}\left[\frac{x^4}{4}\right]_0^2 = 2$

c). E(2X + 3) = E(2X) + 3

= 2E(X) + 3

= 2(4/3) + 3

= 17/3 = 5.667 (using E(X) from (a) above)

DO THIS

kx 0 x 1

f ( x) k 1 x 3

k ( 4 x ) 3 x5

0 otherwise

a). Find k

b) Calculate E(X)

1

2). The continuous random variable has p.d.f f(x) where f(x) = ( x 3), 0 x 5

10

a). Find E(X)

b). Find E(2X+4)

c). Find E(X2).

d). Find E( X2 + 2X – 1).

BERNOULLI DISTRIBUTION

In probability theory and statistics, the Bernoulli distribution, named after Swiss

scientist Jakob Bernoulli, is a discrete probability distribution, which takes value 1 with

success probability p and value 0 with failure probability q = 1 − p. So if X is a random

variable with this distribution, we have:

$P(X = 1) = 1 - P(X = 0) = 1 - q = p$

The expected value of a Bernoulli random variable X is $E(X) = p$, and its variance is $\operatorname{Var}(X) = p(1 - p)$.

The excess kurtosis goes to infinity for high and low values of p, but for p = 1/2 the Bernoulli distribution has a lower excess kurtosis than any other probability distribution, namely -2.

BINOMIAL DISTRIBUTION

In probability theory and statistics, the binomial distribution is the discrete probability

distribution of the number of successes in a sequence of n independent yes/no

experiments, each of which yields success with probability p. Such a success/failure

experiment is also called a Bernoulli experiment or Bernoulli trial. In fact, when n = 1,

the binomial distribution is a Bernoulli distribution. The binomial distribution is the basis

for the popular binomial test of statistical significance.

Examples

An elementary example is this: roll a die ten times and count the number of 1s as

outcome. Then this random number follows a binomial distribution with n = 10 and p =

1/6.

For example, assume 5% of the population is green-eyed. You pick 500 people

randomly. The number of green-eyed people you pick is a random variable X which

follows a binomial distribution with n = 500 and p = 0.05 (when picking the people with

replacement).

EXAMPLES:

1) A coin is tossed 3 times. Find the probability of getting 2 heads and a tail in any order.

FORMULA:

We can use the formula $P(X = x) = {}^{n}C_{x}\, p^{x} (1 - p)^{n - x}$, where

n = the number of trials

x = the number of successes (0, 1, 2, …, n)

p = the probability of a success.

1) nCx determines the number of ways x successes can occur among n trials.

2) $p^x$ is the probability of getting x successes and

3) $(1 - p)^{n - x}$ is the probability of getting n - x failures

Solution:

Tossing 3 times means n = 3

Two heads means x = 2

P(H) = 1/2; P(T) = 1/2

P(2 heads) = $^{3}C_{2} \left(\frac{1}{2}\right)^2 \left(1 - \frac{1}{2}\right)^{3 - 2}$ = 3(1/4)(1/2) = 3/8

DO THIS

2) Find the probability of getting 3 heads when 8 coins are tossed.

3) A bag contains 4 red and 2 green balls. A ball is drawn and replaced 4 times. What is

the probability of getting exactly 3 red balls and 1 green ball.

Answer:

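1). P(one 5) = $^{3}C_{1} \left(\frac{1}{6}\right)^1 \left(\frac{5}{6}\right)^2$ = 25/72 = 0.347, i.e. n = 3, x = 1, p = 1/6

2). P(3 heads) = $^{8}C_{3} \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^5$ = 7/32 = 0.219, i.e. n = 8, x = 3, p = 1/2

3). P(3 red balls) = $^{4}C_{3} \left(\frac{2}{3}\right)^3 \left(\frac{1}{3}\right)^1$ = 32/81 = 0.395, i.e. n = 4, x = 3, p = 2/3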
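The binomial formula can be written as a short function and applied to the coin example and the three answers above. The function name `binomial_pmf` is an illustrative choice; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * (1 - p)^(n - x), computed exactly."""
    p = Fraction(p)
    return comb(n, x) * p**x * (1 - p) ** (n - x)

coin = binomial_pmf(2, 3, Fraction(1, 2))       # 2 heads in 3 tosses
one_five = binomial_pmf(1, 3, Fraction(1, 6))   # answer 1: n=3, x=1, p=1/6
three_heads = binomial_pmf(3, 8, Fraction(1, 2))
three_red = binomial_pmf(3, 4, Fraction(2, 3))

for label, value in [("2 heads in 3 tosses", coin),
                     ("one 5 (n=3, p=1/6)", one_five),
                     ("3 heads from 8 coins", three_heads),
                     ("3 red balls in 4 draws", three_red)]:
    print(label, "=", value, "=", round(float(value), 3))
```

Adding `binomial_pmf(x, n, p)` over all x from 0 to n gives 1, which is a quick way to confirm that the function is a valid probability distribution.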


READ:

Lectures on Statistics, By Robert B. Ash, , page 1-4

Exercise Nos.1, 2 and 3 on pg 4.

An Introduction to Probability & Random Processes By

Kenneth B & Gian-Carlo R, pages 3.1-3.63

Exercise Chapter 3: Random Variables pg 3.64-3.82 Nos. 1-

7, 11-17, 20-24, 34-36

3. An Introduction to Probability By Charles M. Grinstead

pages 96-107, & 184

Exercise on pages 113-118 Nos. 1,2,3,4,5,8,9,10,19,20

Ref: http://en.wikipedia.org/wiki/measurable_space

Ref: http://en.wikipedia.org/wiki/Probability_theory

Ref: http://en.wikipedia.org/wiki/Bernoulli_distribution

POISSON DISTRIBUTION

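In probability theory and statistics, the Poisson distribution is a discrete probability distribution that expresses the probability of a number of events occurring in a fixed period of time if these events occur with a known average rate, and are independent of the time since the last event.

The distribution is sometimes called Poissonian, analogous to the term Gaussian for a Gauss or normal distribution.

The Poisson distribution is used when the variable occurs over a period of time, volume, area, etc. It can be used for the arrival of airplanes at airports, the number of phone calls per hour for a station, or the number of white blood cells on a certain area.

The probability of x occurrences is

$P(x; \lambda) = \frac{e^{-\lambda}\lambda^{x}}{x!}$, where λ is the mean number of occurrences and e is a mathematical constant = 2.7183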


Group Work

1. Study the probability computations and attempt the

given question.

Example

If there are 100 typographical errors randomly distributed in a 500-page manuscript, find the probability that any given page has exactly 4 errors.

Solution

Find the mean number of errors per page: $\lambda = \frac{100}{500} = \frac{1}{5} = 0.2$

In other words, there is an average of 0.2 errors per page. In this case x = 4, so the probability of selecting a page with exactly 4 errors is

$\frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{(2.7183)^{-0.2}(0.2)^4}{4!} = 0.000055$

which is about 0.005%.

Worked Example

A hotline with a toll-free number receives an average of 3 calls per hour. For any given hour, find the probability that it will receive exactly 5 calls.

$\frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{(2.7183)^{-3}(3)^5}{5!} = 0.1008$

which is about 10%.

DO THIS


1) A telephone Marketing Company gets an average of 5 orders per 1000 calls. If a

company calls 500 people find the probability of getting 2 orders.

SOLUTION

0.26

Which is 26%
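The Poisson calculations in this section can be checked with a few lines of Python. The function name `poisson_pmf` is an illustrative choice. The rates below are assumptions read off the working above: 100/500 = 0.2 errors per page, 3 calls per hour (the value used in the worked calculation), and 5 x 500/1000 = 2.5 expected orders in 500 calls.

```python
import math

def poisson_pmf(x, lam):
    """P(x; lambda) = e^(-lambda) * lambda^x / x!"""
    return math.exp(-lam) * lam**x / math.factorial(x)

p_errors = poisson_pmf(4, 0.2)   # exactly 4 errors on a page
p_calls = poisson_pmf(5, 3)      # exactly 5 calls in an hour
p_orders = poisson_pmf(2, 2.5)   # exactly 2 orders from 500 calls

print("P(4 errors) =", p_errors)
print("P(5 calls)  =", p_calls)
print("P(2 orders) =", p_orders)
```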

READ :

An Introduction to Probability & Random Processes By

Kenneth B & Gian-Carlo R, pages187-192

problems 1,2,3 on pg 15.

Ref: http://en.wikipedia.org/wiki/Normal_distribution

GEOMETRIC DISTRIBUTION

In probability theory and statistics, the geometric distribution is either of two discrete

probability distributions:

the probability distribution of the number X of Bernoulli trials needed to get one success, supported on the set { 1, 2, 3, ... }, or

the probability distribution of the number Y = X - 1 of failures before the first success, supported on the set { 0, 1, 2, 3, ... }.

Which of these one calls "the" geometric distribution is a matter of convention and

convenience.


If the probability of success on each trial is p1, then the probability that k trials are needed to get one success is

$P(X = k) = (1 - p_1)^{k-1} p_1$

for k = 1, 2, 3, ....

Equivalently, if the probability of success on each trial is p0, then the probability that there are k failures before the first success is

$P(Y = k) = (1 - p_0)^{k} p_0$

for k = 0, 1, 2, 3, ....

For example, suppose an ordinary die is thrown repeatedly until the first time a "1"

appears. The probability distribution of the number of times it is thrown is supported on

the infinite set { 1, 2, 3, ... } and is a geometric distribution with p1 = 1/6.

The formula for the probability that the first success occurs on the nth trial is $(1 - p)^{n-1} p$, where p is the probability of success and n is the trial number of the first success.

Example:

1) Find the probability that the first tail occurs on the third toss of a coin.

Solution:

The outcome of a first tail on the third throw implies HHT. From $(1 - p)^{n-1} p$ with n = 3 and p = 1/2,

$P(HHT) = \left(1 - \frac{1}{2}\right)^{3-1}\left(\frac{1}{2}\right) = \left(\frac{1}{2}\right)\left(\frac{1}{2}\right)\left(\frac{1}{2}\right) = \frac{1}{8}$

Whenever a coin is flipped repeatedly until a given face first appears, the geometric distribution gives the probability of the number of flips required.

Example

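1) A coin is tossed. Find the probability that the first head occurs on the third toss.

Solution: The outcome is TTH, so

n = 3 and p = 1/2

The probability of getting 2 tails and then one head is

$\frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{8}$

Or by the formula

$\left(1 - \frac{1}{2}\right)^{3-1}\left(\frac{1}{2}\right) = \left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right) = \frac{1}{8}$

2) A die is rolled; find the probability of getting the first 3 on the fourth roll.

Solution:

n = 4, p = 1/6

$\left(1 - \frac{1}{6}\right)^{4-1}\left(\frac{1}{6}\right) = \left(\frac{5}{6}\right)^3\left(\frac{1}{6}\right) = \frac{125}{1296} = 0.096$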

Example

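If cards are selected from a deck one at a time and replaced, how many trials would it take on average to get two clubs?

P(Club) = 13/52 = 1/4

The expected number of trials needed for 2 successes is

$\frac{2}{p} = \frac{2}{1/4} = 2 \times \frac{4}{1} = 8$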

DO THIS


1. A card is selected from an ordinary deck of cards and then replaced, another card is selected, and so on. Find the probability that the first club will occur on the fourth draw.

2. A die is tossed until a 5 or 6 is obtained. Find the expected number of tosses.

Answer Key:

1) 27/256 = 0.105

2) 3
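The geometric examples and exercises in this section can be checked with two small helper functions. Both names are illustrative. `expected_trials` uses the fact that the mean number of trials needed for r successes is r / p.

```python
from fractions import Fraction

def geometric_pmf(n, p):
    """Probability that the first success occurs on trial n: (1 - p)^(n - 1) * p."""
    p = Fraction(p)
    return (1 - p) ** (n - 1) * p

def expected_trials(p, successes=1):
    """Mean number of trials needed to obtain the given number of successes."""
    return Fraction(successes) / Fraction(p)

first_tail_on_3rd = geometric_pmf(3, Fraction(1, 2))
first_three_on_4th = geometric_pmf(4, Fraction(1, 6))
first_club_on_4th = geometric_pmf(4, Fraction(1, 4))
tosses_until_5_or_6 = expected_trials(Fraction(1, 3))
trials_for_two_clubs = expected_trials(Fraction(1, 4), successes=2)

print(first_tail_on_3rd, first_three_on_4th, first_club_on_4th)
print(tosses_until_5_or_6, trials_for_two_clubs)
```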

HYPERGEOMETRIC DISTRIBUTION

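In probability theory and statistics, the hypergeometric distribution is a discrete probability distribution that describes the number of successes in a sequence of n draws from a finite population without replacement.

A typical example is the following: there is a shipment of N objects in which D are defective. The hypergeometric distribution describes the probability that in a sample of n distinct objects drawn from the shipment exactly k objects are defective.

In general, if a random variable X follows the hypergeometric distribution with parameters N, D and n, then the probability of getting exactly k successes is given by

$P(X = k) = \frac{\binom{D}{k}\binom{N - D}{n - k}}{\binom{N}{n}}$

The formula can be understood as follows: There are $\binom{N}{n}$ possible samples (without replacement). There are $\binom{D}{k}$ ways to obtain k defective objects and there are $\binom{N - D}{n - k}$ ways to fill out the rest of the sample with non-defective objects.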


When the population size is large compared to the sample size (i.e., N is much larger

than n) the hypergeometric distribution is approximated reasonably well by a binomial

distribution with parameters n (number of trials) and p = D / N (probability of success in

a single trial).

HYPERGEOMETRIC FORMULA

When there are two groups of items such that there are ‘a’ items in the first group and

‘b’ items in the second group, so that the total number of items is (a + b), the probability

of selecting x items from the first group and (n-x) items from the second group is

$\frac{{}^{a}C_{x} \cdot {}^{b}C_{n-x}}{{}^{a+b}C_{n}}$, where n is the total number of items selected without replacement.

Examples:

1. A bag contains 3 blue chips and 3 green chips. If two chips are selected at

random, find the probability that both are blue.

Solution:

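From the formula $\frac{{}^{a}C_{x} \cdot {}^{b}C_{n-x}}{{}^{a+b}C_{n}}$; a = 3, b = 3, x = 2, n = 2, n - x = 2 - 2 = 0

The probability of both blue = $\frac{{}^{3}C_{2} \cdot {}^{3}C_{0}}{{}^{6}C_{2}} = \frac{3 \times 1}{15} = \frac{1}{5} = 0.2$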

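2. A committee of 3 people is selected at random from a group of 6 men and 3 women. Find the probability that the committee consists of 2 men and 1 woman.

Solution:

a = 6, b = 3

a + b = 6 + 3 = 9

n = 3, x = 2, n - x = 1

Pr = $\frac{{}^{6}C_{2} \cdot {}^{3}C_{1}}{{}^{9}C_{3}} = \frac{15 \times 3}{84} = \frac{15}{28} = 0.536$

3. A lot of 10 items contains 3 defective items. If 4 items are selected at random and tested, find the probability that exactly one will be defective.

Solution:

3 are defective, 7 are good

a = 3, b = 7

n = 4, x = 1

n - x = 4 - 1 = 3

Pr (one to be defective) = $\frac{{}^{3}C_{1} \cdot {}^{7}C_{3}}{{}^{10}C_{4}} = \frac{3 \times 35}{210} = \frac{105}{210} = 0.5$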

DO THIS

1. In a box of 10 shirts there are five (5) defective ones. If 5 shirts are sold at random, find the probability that exactly two are defective.

2. In a shipment of 12 lawn chairs, 8 are brown and 4 are blue. If 3 chairs are sold at random, find the probability that all are brown.

Answer Key:

1). 0.397 2) 0.255
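The hypergeometric formula with two groups of sizes a and b can be written as one function and applied to the examples and exercises above. The function name and argument order are illustrative choices; `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def hypergeometric(x, n, a, b):
    """Probability of x items from the first group (size a) and n - x items
    from the second group (size b) when n items are drawn without replacement."""
    return Fraction(comb(a, x) * comb(b, n - x), comb(a + b, n))

both_blue = hypergeometric(x=2, n=2, a=3, b=3)        # 3 blue and 3 green chips
one_defective = hypergeometric(x=1, n=4, a=3, b=7)    # 3 defective, 7 good
two_bad_shirts = hypergeometric(x=2, n=5, a=5, b=5)   # 5 defective, 5 good shirts
all_brown = hypergeometric(x=3, n=3, a=8, b=4)        # 8 brown, 4 blue chairs

for label, value in [("both blue", both_blue),
                     ("exactly one defective", one_defective),
                     ("exactly two defective shirts", two_bad_shirts),
                     ("all three chairs brown", all_brown)]:
    print(label, "=", value, "=", round(float(value), 3))
```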

Group Work

Revise the following probability questions and answers

Discuss any problems encountered in the computations of

the probabilities.

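1) P(choosing 5) = $\frac{1}{{}^{15}C_{5}} = \frac{1}{3003}$

2) What is the probability of drawing an Ace or a spade from a pack of 52 cards?

P(Ace) = $\frac{4}{52}$, P(spade) = $\frac{13}{52}$, P(Ace of spades) = $\frac{1}{52}$

$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52} = \frac{4}{13}$

3) Some women experience problems during pregnancy. If the probability of dying is $\frac{1}{51}$, what is the probability that at least one woman in every 5 will die?

$P(A) = \frac{1}{51}$, $P(A') = 1 - \frac{1}{51} = \frac{50}{51}$

P(At least one will die) = $1 - \left(\frac{50}{51}\right)^5 \approx 0.094$ (use a calculator)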

The classical application of the hypergeometric distribution is sampling without replacement. Think of an urn with two types of marbles, black ones and white ones. Define drawing a white marble as a success and drawing a black marble as a failure (analogous to the binomial distribution). If the variable N describes the number of all marbles in the urn (see the contingency table below) and D describes the number of white marbles (called defective in the example above), then N − D corresponds to the number of black marbles.

Now, assume that there are 5 white and 45 black marbles in the urn. Standing next to

the urn, you close your eyes and draw 10 marbles without replacement. What's the

probability p (k=4) that you draw exactly 4 white marbles (and - of course - 6 black

marbles) ?

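                 drawn                not drawn                              total
white marbles    4 (k)                1 = 5 − 4 (D − k)                      5 (D)
black marbles    6 = 10 − 4 (n − k)   39 = 50 + 4 − 10 − 5 (N + k − n − D)   45 (N − D)

The probability of drawing exactly k white marbles can be calculated by the formula

$\Pr(k) = \frac{\binom{D}{k}\binom{N-D}{n-k}}{\binom{N}{n}}$

Hence, in this example,

$\Pr(k = 4) = \frac{\binom{5}{4}\binom{45}{6}}{\binom{50}{10}} = \frac{5 \times 8145060}{10272278170} \approx 0.003965$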

So the probability of drawing exactly 4 white marbles is quite low (approximately 0.004) and the event is very unlikely. If you repeated the random experiment (drawing 10 marbles from the urn of 50 marbles without replacement) 1000 times, you would expect to obtain such a result only about 4 times.

But what about the probability of drawing all 5 white marbles? Intuitively, this is even more unlikely than drawing 4 white marbles. Let us calculate the probability of such an extreme event.

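                 drawn                not drawn                              total
white marbles    5 (k)                0 = 5 − 5 (D − k)                      5 (D)
black marbles    5 = 10 − 5 (n − k)   40 = 50 + 5 − 10 − 5 (N + k − n − D)   45 (N − D)

And we can calculate the probability as follows (notice that the denominator always stays the same):

$\Pr(k = 5) = \frac{\binom{5}{5}\binom{45}{5}}{\binom{50}{10}} = \frac{1 \times 1221759}{10272278170} \approx 0.0001189$

As expected, the probability of drawing 5 white marbles is much lower still than that of drawing 4 white marbles.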

Conclusion:

Consequently, one could expand the initial question as follows: if you draw 10 marbles from an urn (containing 5 white and 45 black marbles), what is the probability of drawing at least 4 white marbles? In other words, what is the probability of drawing 4 white marbles or a more extreme outcome (such as drawing 5)? This corresponds to calculating the cumulative probability p(k>=4) and can be calculated by the cumulative distribution function (cdf). Since the hypergeometric distribution is a discrete probability distribution, the cumulative probability can be calculated easily by adding all the corresponding single probability values.
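The urn calculation, including the cumulative probability p(k>=4), can be sketched in Python as follows. The function names urn_pmf and urn_at_least are illustrative only; the default arguments are the urn described above (N = 50 marbles, D = 5 white, n = 10 drawn).

```python
from math import comb

def urn_pmf(k, N=50, D=5, n=10):
    """Probability of exactly k white marbles among n drawn without replacement."""
    return comb(D, k) * comb(N - D, n - k) / comb(N, n)

def urn_at_least(k_min, N=50, D=5, n=10):
    """Cumulative probability p(k >= k_min): add the single probabilities."""
    k_max = min(D, n)  # cannot draw more white marbles than exist or than are drawn
    return sum(urn_pmf(k, N, D, n) for k in range(k_min, k_max + 1))

p4 = urn_pmf(4)
p5 = urn_pmf(5)
p_at_least_4 = urn_at_least(4)
print(f"{p4:.6f} {p5:.6f} {p_at_least_4:.6f}")
```

Because the distribution is discrete, p(k>=4) here is simply p4 + p5.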

READ :

An Introduction to Probability & Random Processes By

Kenneth B & Gian-Carlo R, pages 184-195

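The bivariate normal distribution is the statistical distribution with probability function

$P(x_1, x_2) = \frac{1}{2\pi\sigma_1\sigma_2\sqrt{1-\rho^2}} \exp\left[-\frac{z}{2(1-\rho^2)}\right]$

where

$z \equiv \frac{(x_1-\mu_1)^2}{\sigma_1^2} - \frac{2\rho(x_1-\mu_1)(x_2-\mu_2)}{\sigma_1\sigma_2} + \frac{(x_2-\mu_2)^2}{\sigma_2^2}$

and

$\rho \equiv \operatorname{cor}(x_1, x_2) = \frac{V_{12}}{\sigma_1\sigma_2}$

is the correlation of $x_1$ and $x_2$, with $V_{12}$ their covariance (Kenney and Keeping 1951, pp. 92 and 202-205; Whittaker and Robinson 1967, p. 32). The marginal probabilities are then

$P(x_1) = \int_{-\infty}^{\infty} P(x_1, x_2)\,dx_2 = \frac{1}{\sigma_1\sqrt{2\pi}}\, e^{-(x_1-\mu_1)^2/(2\sigma_1^2)}$

and

$P(x_2) = \int_{-\infty}^{\infty} P(x_1, x_2)\,dx_1 = \frac{1}{\sigma_2\sqrt{2\pi}}\, e^{-(x_2-\mu_2)^2/(2\sigma_2^2)}$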

JOINT PROBABILITY TABLES

MARGINAL PROBABILITIES.

Let S be partitioned into r × s disjoint sets $E_i$ and $F_j$, where the general subset is denoted $E_i \cap F_j$. Then the marginal probability of $E_i$ is

$P(E_i) = \sum_{j=1}^{s} P(E_i \cap F_j)$
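As a small illustration of marginal probabilities, the sketch below sums the rows and columns of a joint probability table. The table entries are hypothetical numbers chosen only so that they add up to 1; they do not come from the readings.

```python
from fractions import Fraction as F

# Hypothetical joint probability table: rows are events E_1, E_2,
# columns are events F_1, F_2, F_3; entry [i][j] is P(E_i and F_j).
joint = [
    [F(1, 10), F(2, 10), F(1, 10)],
    [F(2, 10), F(1, 10), F(3, 10)],
]

# Marginal probability of each E_i: sum across the row.
row_marginals = [sum(row) for row in joint]

# Marginal probability of each F_j: sum down the column.
col_marginals = [sum(col) for col in zip(*joint)]

print(row_marginals, col_marginals)
```

Each set of marginals must itself add up to 1, which is a quick check on any joint table.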

READ :

An Introduction to Probability & Random Processes By

Kenneth B & Gian-Carlo R, pages 142-150

Exercise pg 150 Nos. 1, 2, 3, 4, 5, 6, 7, 8, 9, 14, 15, 16, 17, 26.

REFLECTION: ICT resources can be difficult to access. The link below opens up an avenue for Mathematics teachers to access ICT resources.

http://www.tsm-resources.com/suppl.html

UNIT 2 ( 40 HOURS):

MOMENTS

The probability distribution of a random variable is often characterised by a small number of parameters, which also have a practical interpretation. For example, it is often enough to know what its "average value" is. This is captured by the mathematical concept of expected value of a random variable, denoted E[X]. Note that in general, E[f(X)] is not the same as f(E[X]). Once the "average value" is known, one could then ask how far from this average value the values of X typically are, a question that is answered by the variance and standard deviation of a random variable.

Mathematically, this is known as the (generalised) problem of moments: for a given class of random variables X, find a collection {$f_i$} of functions such that the expectation values E[$f_i$(X)] fully characterize the distribution of the random variable X.

There are several different senses in which random variables can be considered to be equivalent. Two random variables can be equal, equal almost surely, equal in mean, or equal in distribution. The precise definitions of these notions of equivalence are given below.

Equality in distribution

Two random variables X and Y are equal in distribution if they have the same distribution functions:

$P(X \le x) = P(Y \le x)$ for all x.

Two random variables having equal moment generating functions have the same distribution.

Equality in mean

Two random variables X and Y are equal in pth mean if the pth moment of |X − Y| is zero, that is,

$E\left(|X - Y|^p\right) = 0.$

Equality in pth mean implies equality in qth mean for all q < p. As in the previous case, there is a related distance between the random variables, namely

$\|X - Y\|_p = \left(E\left(|X - Y|^p\right)\right)^{1/p}.$

Equality

Finally, the two random variables X and Y are equal if they are equal as functions on their probability space, that is,

$X(\omega) = Y(\omega)$ for all $\omega$.

MOMENT-GENERATING FUNCTION

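In probability theory and statistics, the moment-generating function of a random variable X is

$M_X(t) = E\left(e^{tX}\right), \quad t \in \mathbb{R},$

wherever this expectation exists. The moment-generating function generates the moments of the probability distribution.

For vector-valued random variables X with real components, the moment-generating function is given by

$M_X(\mathbf{t}) = E\left(e^{\langle \mathbf{t}, X \rangle}\right),$

where t is a vector and ⟨t, X⟩ is the dot product.

Provided the moment-generating function exists in an interval around t = 0, the nth moment is given by

$E(X^n) = M_X^{(n)}(0) = \left.\frac{d^n M_X}{dt^n}\right|_{t=0}.$

If X has a continuous probability density function f(x) then the moment generating function is given by

$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \left(1 + tx + \frac{t^2x^2}{2!} + \cdots\right) f(x)\,dx = 1 + t m_1 + \frac{t^2 m_2}{2!} + \cdots,$

where $m_i$ is the ith moment. $M_X(-t)$ is just the two-sided Laplace transform of f(x).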

Regardless of whether the probability distribution is continuous or not, the moment-generating function is given by the Riemann-Stieltjes integral

$M_X(t) = \int_{-\infty}^{\infty} e^{tx}\,dF(x),$

where F is the cumulative distribution function.

If $X_1, X_2, ..., X_n$ is a sequence of independent (and not necessarily identically distributed) random variables, and

$S_n = \sum_{i=1}^{n} a_i X_i,$

where the $a_i$ are constants, then the probability density function for $S_n$ is the convolution of the probability density functions of each of the $a_i X_i$ and the moment-generating function for $S_n$ is given by

$M_{S_n}(t) = M_{X_1}(a_1 t)\,M_{X_2}(a_2 t) \cdots M_{X_n}(a_n t).$

Related to the moment-generating function are a number of other transforms that are

common in probability theory, including the characteristic function and the probability-

generating function.
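To see how a moment-generating function "generates" moments, the following sketch differentiates the moment-generating function of a fair six-sided die numerically at t = 0. The die example and the helper names are assumptions made for illustration; the derivatives are finite-difference approximations, not exact values.

```python
from math import exp

def mgf_fair_die(t):
    """M_X(t) = E[e^(tX)] for a fair six-sided die."""
    return sum(exp(t * x) for x in range(1, 7)) / 6

def first_derivative_at_zero(f, h=1e-4):
    """Central-difference approximation of f'(0)."""
    return (f(h) - f(-h)) / (2 * h)

def second_derivative_at_zero(f, h=1e-4):
    """Central-difference approximation of f''(0)."""
    return (f(h) - 2 * f(0.0) + f(-h)) / h**2

first_moment = first_derivative_at_zero(mgf_fair_die)    # approximates E[X] = 3.5
second_moment = second_derivative_at_zero(mgf_fair_die)  # approximates E[X^2] = 91/6
variance = second_moment - first_moment**2

print(round(first_moment, 3), round(second_moment, 3), round(variance, 3))
```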

MARKOV'S INEQUALITY

[Figure: Markov's inequality gives an upper bound for the probability that X lies within the set where f(x) exceeds a given level.]

In probability theory, Markov's inequality gives an upper bound for the probability that

a non-negative function of a random variable is greater than or equal to some positive

constant. It is named after the Russian mathematician Andrey Markov, although it

appeared earlier in the work of Pafnuty Chebyshev (Markov's teacher).

Markov's inequality (and other similar inequalities) relate probabilities to expectations,

and provide (frequently) loose but still useful bounds for the cumulative distribution

function of a random variable.

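For any event E, let $I_E$ be the indicator random variable of E, that is, $I_E = 1$ if E occurs and $I_E = 0$ otherwise. Thus $I_{(|X| \ge a)} = 1$ if the event |X| ≥ a occurs, and $I_{(|X| \ge a)} = 0$ if |X| < a. Then, given a > 0,

$a\,I_{(|X| \ge a)} \le |X|.$

Therefore

$E\left(a\,I_{(|X| \ge a)}\right) \le E(|X|).$

Now observe that the left side of this inequality is the same as

$a\,E\left(I_{(|X| \ge a)}\right) = a \Pr(|X| \ge a).$

Thus we have

$a \Pr(|X| \ge a) \le E(|X|),$

and since a > 0, we can divide both sides by a to obtain $\Pr(|X| \ge a) \le E(|X|)/a$.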
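Markov's inequality, Pr(X ≥ a) ≤ E(X)/a for a non-negative random variable X and a > 0, can be checked exactly on a simple case. The sketch below uses a fair six-sided die, which is an assumed example and not part of the readings, and exact fractions so that no rounding is involved.

```python
from fractions import Fraction

outcomes = range(1, 7)      # faces of a fair die (a non-negative random variable)
p = Fraction(1, 6)          # probability of each face
expected_value = sum(p * x for x in outcomes)

def tail_probability(a):
    """Exact Pr(X >= a)."""
    return sum((p for x in outcomes if x >= a), Fraction(0))

def markov_bound(a):
    """Upper bound E(X)/a given by Markov's inequality."""
    return expected_value / a

checks = [(a, tail_probability(a), markov_bound(a)) for a in range(1, 8)]
for a, tail, bound in checks:
    print(a, tail, bound, tail <= bound)
```

The printed bounds are valid for every a but are often far from the exact tail probability, which illustrates why the text calls them loose.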

READ :

Robert B. Ash, Lectures on Statistics, page 9-13

An Introduction to Probability & Random Processes By Kenneth B &

Gian-Carlo R, pages 366 -374 & 404 - 407

Exercise on pg 376 -376 Nos. 1,3,7,8

Exercise on pg 442 Nos. 1,2,3,4,5

Ref:

http://en.wikipedia.org/wiki/Moment-generating_function

http://en.wikipedia.org/wiki/characteristic_function_%28probability_theory%29

http://en.wikipedia.org/wiki/Integral_transform

CHEBYSHEV'S INEQUALITY

In probability theory, Chebyshev's inequality (also known as Tchebysheff's inequality, Chebyshev's theorem, or the Bienaymé-Chebyshev inequality), named

after Pafnuty Chebyshev, who first proved it, states that in any data sample or

probability distribution, nearly all the values are close to the mean value, and provides a

quantitative description of "nearly all" and "close to". For example, no more than 1/4 of

the values are more than 2 standard deviations away from the mean, no more than 1/9

are more than 3 standard deviations away, no more than 1/25 are more than 5 standard

deviations away, and so on.

Probabilistic statement

Let X be a random variable with expected value μ and finite variance σ². Then for any real number k > 0,

$\Pr(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}.$

As an example, using k = √2 shows that at least half of the values lie in the interval (μ − √2 σ, μ + √2 σ).

Typically, the theorem will provide rather loose bounds. However, the bounds provided by Chebyshev's inequality cannot, in general (remaining sound for variables of arbitrary distribution), be improved upon. For example, for any k > 1, the following example (where σ = 1/k) meets the bounds exactly:

$\Pr(X = -1) = \frac{1}{2k^2}, \quad \Pr(X = 0) = 1 - \frac{1}{k^2}, \quad \Pr(X = 1) = \frac{1}{2k^2}.$

For this distribution μ = 0, so $\Pr(|X - \mu| \ge k\sigma) = \Pr(|X| \ge 1) = 1/k^2$.

The theorem can be useful despite loose bounds because it applies to random variables

of any distribution, and because these bounds can be calculated knowing no more

about the distribution than the mean and variance.

Chebyshev's inequality is used for proving the weak law of large numbers.

Example application

For illustration, assume we have a large body of text, for example articles from a

publication. Assume we know that the articles are on average 1000 characters long with

a standard deviation of 200 characters. From Chebyshev's inequality we can then

deduce that at least 75% of the articles have a length between 600 and 1400 characters

(k = 2).
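The arithmetic of this example can be written as a short function. This is a sketch only; the function name is made up, and it assumes the interval is symmetric about the mean, as in the article-length example.

```python
def chebyshev_min_fraction(mean, sd, low, high):
    """Chebyshev lower bound on the fraction of values inside [low, high],
    assuming the interval is symmetric about the mean."""
    if abs((mean - low) - (high - mean)) > 1e-12:
        raise ValueError("interval must be symmetric about the mean")
    k = (high - mean) / sd          # half-width measured in standard deviations
    if k <= 0:
        raise ValueError("interval must have positive width")
    return max(0.0, 1 - 1 / k**2)   # at least 1 - 1/k^2 lies within k sd

# Articles average 1000 characters with a standard deviation of 200 characters.
bound = chebyshev_min_fraction(mean=1000, sd=200, low=600, high=1400)
print(bound)
```

For k ≤ 1 the bound is 0, which says nothing; the inequality is only informative for k > 1.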

Probabilistic proof

Markov's inequality states that for any real-valued random variable Y and any positive number a, we have Pr(|Y| ≥ a) ≤ E(|Y|)/a. One way to prove Chebyshev's inequality is to apply Markov's inequality to the random variable Y = (X − μ)² with a = (σk)².

It can also be proved directly. For any event A, let $I_A$ be the indicator random variable of A, i.e. $I_A$ equals 1 if A occurs and 0 otherwise. Then

$\Pr(|X-\mu| \ge k\sigma) = E\left(I_{|X-\mu| \ge k\sigma}\right) = E\left(I_{[(X-\mu)/(k\sigma)]^2 \ge 1}\right) \le E\left(\left[\frac{X-\mu}{k\sigma}\right]^2\right) = \frac{1}{k^2}\,\frac{E\left((X-\mu)^2\right)}{\sigma^2} = \frac{1}{k^2}.$

The direct proof shows why the bounds are quite loose in typical cases: the indicator, which equals 1 whenever [(X − μ)/(kσ)]² ≥ 1, is replaced by [(X − μ)/(kσ)]² itself, and in some cases this quantity exceeds 1 by a very wide margin.

READ :

An Introduction to Probability & Random Processes By Kenneth B &

Gian-Carlo R, pages 305-318

Exercise on pg 309 nos. 1,2,3,4,5

Exercise on pg 320-324 Nos. 1,3,10,12

CORRELATION TYPES

Correlation is a measure of association between two variables. The variables are not

designated as dependent or independent. The two most popular correlation coefficients

are: Spearman's correlation coefficient rho and Pearson's product-moment correlation

coefficient.

When calculating a correlation coefficient for ordinal data, select Spearman's technique.

For interval or ratio-type data, use Pearson's technique.

The value of a correlation coefficient can vary from minus one to plus one. A minus one

indicates a perfect negative correlation, while a plus one indicates a perfect positive

correlation. A correlation of zero means there is no linear relationship between the two

variables. When there is a negative correlation between two variables, as the value of

one variable increases, the value of the other variable decreases, and vice versa. In

other words, for a negative correlation, the variables work opposite each other. When

there is a positive correlation between two variables, as the value of one variable

increases, the value of the other variable also increases. The variables move together.

The standard error of a correlation coefficient is used to determine the confidence intervals around a true correlation of zero. If your correlation coefficient falls outside of

this range, then it is significantly different from zero. The standard error can be

calculated for interval or ratio-type data (i.e., only for Pearson's product-moment

correlation).

The significance (probability) of the correlation coefficient is determined from the t-statistic. The probability of the t-statistic indicates whether the observed correlation

coefficient occurred by chance if the true correlation is zero. In other words, it asks if the

correlation is significantly different from zero. When the t-statistic is calculated for

Spearman's rank-difference correlation coefficient, there must be at least 30 cases

before the t-distribution can be used to determine the probability. If there are fewer than

30 cases, you must refer to a special table to find the probability of the correlation

coefficient.

Example

A company wanted to know if there is a significant relationship between the total number of salespeople and the total number of sales. They collected data for five months.

Variable 1    Variable 2
207           6907
180           5991
220           6810
205           6553
190           6190

Correlation coefficient = .921

Standard error of the coefficient = .068

t-test for the significance of the coefficient = 4.100

Degrees of freedom = 3

Two-tailed probability = .0263
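The figures in this example can be reproduced with a short calculation. The sketch below assumes Variable 1 is the number of salespeople and Variable 2 the number of sales; the function name pearson_r is illustrative. It computes Pearson's product-moment coefficient and the t-statistic t = r√(n − 2)/√(1 − r²) with n − 2 degrees of freedom.

```python
from math import sqrt

salespeople = [207, 180, 220, 205, 190]   # Variable 1 (assumed meaning)
sales = [6907, 5991, 6810, 6553, 6190]    # Variable 2 (assumed meaning)

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

r = pearson_r(salespeople, sales)
n = len(salespeople)
degrees_of_freedom = n - 2
t = r * sqrt(degrees_of_freedom) / sqrt(1 - r**2)

print(round(r, 3), round(t, 2), degrees_of_freedom)
```

The two-tailed probability would then be read from a t table (or statistical software) for 3 degrees of freedom.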

Another Example

Respondents to a survey were asked to judge the quality of a product on a four-point Likert scale (excellent, good, fair, poor). They were also asked to judge the reputation of the company that made the product on a three-point scale (good, fair, poor). Is there a significant relationship between respondents' perceptions of the company and their perceptions of the quality of the product?

Since both variables are ordinal, Spearman's method is chosen. The first variable is the rating for the quality of the product. Responses are coded as 4=excellent, 3=good, 2=fair, and 1=poor. The second variable is the perceived reputation of the company and is coded 3=good, 2=fair, and 1=poor.

Variable 1    Variable 2
4             3
2             2
1             2
3             3
4             3
1             1
2             1

Correlation coefficient (rho) = .830

t-test for the significance of the coefficient = 3.332

Number of data pairs = 7

Probability must be determined from a table because of the small sample size.
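The t value reported here is consistent with the rank-difference formula rho = 1 − 6Σd²/(n(n² − 1)) applied to average ranks for tied values. The sketch below makes that assumption explicit; the helper names are made up, and other software may treat ties differently and so give a slightly different coefficient.

```python
from math import sqrt

quality = [4, 2, 1, 3, 4, 1, 2]      # Variable 1: product quality rating
reputation = [3, 2, 2, 3, 3, 1, 1]   # Variable 2: company reputation rating

def average_ranks(values):
    """Rank from smallest (rank 1) upward; tied values share their average rank."""
    ordered = sorted(values)
    rank_of = {}
    for v in set(values):
        first = ordered.index(v) + 1   # 1-based position of the first tied value
        count = ordered.count(v)       # number of tied values
        rank_of[v] = first + (count - 1) / 2
    return [rank_of[v] for v in values]

def spearman_rank_difference(xs, ys):
    """Spearman's rho from the rank-difference formula."""
    n = len(xs)
    d_squared = sum((rx - ry) ** 2
                    for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n**2 - 1))

rho = spearman_rank_difference(quality, reputation)
n = len(quality)
t = rho * sqrt(n - 2) / sqrt(1 - rho**2)

print(round(rho, 3), round(t, 3))
```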

REGRESSION

Simple regression is used to examine the relationship between one dependent and one

independent variable. After performing an analysis, the regression statistics can be used

to predict the dependent variable when the independent variable is known. Regression

goes beyond correlation by adding prediction capabilities.

People use regression on an intuitive level every day. In business, a well-dressed man

is thought to be financially successful. A mother knows that more sugar in her children's

diet results in higher energy levels. The ease of waking up in the morning often depends

on how late you went to bed the night before. Quantitative regression adds precision by

developing a mathematical formula that can be used for predictive purposes.

For example, a medical researcher might want to use body weight (independent

variable) to predict the most appropriate dose for a new drug (dependent variable). The

purpose of running the regression is to find a formula that fits the relationship between

the two variables. Then you can use that formula to predict values for the dependent

variable when only the independent variable is known. A doctor could prescribe the

proper dose based on a person's body weight.

The regression line (known as the least squares line) is a plot of the expected value of

the dependent variable for all values of the independent variable. Technically, it is the

line that "minimizes the squared residuals". The regression line is the one that best fits

the data on a scatterplot.

Using the regression equation, the dependent variable may be predicted from the

independent variable. The slope of the regression line (b) is defined as the rise divided

by the run. The y intercept (a) is the point on the y axis where the regression line would

intercept the y axis. The slope and y intercept are incorporated into the regression

equation. The intercept is usually called the constant, and the slope is referred to as the

coefficient. Since the regression model is usually not a perfect predictor, there is also an

error term in the equation.

In the regression equation, y is always the dependent variable and x is always the independent variable. Here are three equivalent ways to mathematically describe a linear regression model.

y = intercept + (slope × x) + error

y = constant + (coefficient × x) + error

y = a + bx + e
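A least squares line y = a + bx can be fitted with the standard formulas b = Sxy/Sxx and a = ȳ − b·x̄. The sketch below uses hypothetical body-weight and dose figures, invented purely to echo the medical example above; the function name fit_line is also illustrative.

```python
def fit_line(xs, ys):
    """Least squares intercept a and slope b for y = a + b*x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Hypothetical data: body weight in kg (independent), dose in mg (dependent).
weights = [50, 60, 70, 80, 90]
doses = [5.2, 5.9, 7.1, 7.8, 9.0]

intercept, slope = fit_line(weights, doses)

# Residuals are the error term e for each observation.
residuals = [y - (intercept + slope * x) for x, y in zip(weights, doses)]

# r-squared: proportion of the variance in y accounted for by the line.
mean_dose = sum(doses) / len(doses)
ss_total = sum((y - mean_dose) ** 2 for y in doses)
ss_error = sum(e ** 2 for e in residuals)
r_squared = 1 - ss_error / ss_total

predicted_dose = intercept + slope * 75   # prediction for a 75 kg person
print(round(intercept, 3), round(slope, 3), round(r_squared, 3), round(predicted_dose, 3))
```

The residuals of a least squares line with an intercept always sum to zero, which is a useful check on the arithmetic.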

The significance of the slope of the regression line is determined from the t-statistic. It is

the probability that the observed correlation coefficient occurred by chance if the true

correlation is zero. Some researchers prefer to report the F-ratio instead of the t-

statistic. The F-ratio is equal to the t-statistic squared.

The t-statistic for the significance of the slope is essentially a test to determine if the

regression model (equation) is usable. If the slope is significantly different than zero,

then we can use the regression model to predict the dependent variable for any value of

the independent variable.

On the other hand, take an example where the slope is zero. It has no prediction ability

because for every value of the independent variable, the prediction for the dependent

variable would be the same. Knowing the value of the independent variable would not

improve our ability to predict the dependent variable. Thus, if the slope is not

significantly different than zero, don't use the model to make predictions.

The coefficient of determination (r-squared) is the square of the correlation coefficient.

Its value may vary from zero to one. It has the advantage over the correlation coefficient

in that it may be interpreted directly as the proportion of variance in the dependent

variable that can be accounted for by the regression equation. For example, an r-

squared value of .49 means that 49% of the variance in the dependent variable can be

explained by the regression equation. The other 51% is unexplained.

The standard error of the estimate for regression measures the amount of variability in

the points around the regression line. It is the standard deviation of the data points as

they are distributed around the regression line. The standard error of the estimate can

be used to develop confidence intervals around a prediction.

Example

A company wants to know if there is a significant relationship between its advertising expenditures and its sales volume. The independent variable is advertising budget and the dependent variable is sales volume. A lag time of one month will be used because sales are expected to lag behind actual advertising expenditures. Data were collected for a six-month period. All figures are in thousands of dollars. Is there a significant relationship between advertising budget and sales volume?

Advertising   Sales
4.2           27.1
6.1           30.4
3.9           25.0
5.7           29.7
7.3           40.1
5.9           28.8

Standard error of the estimate = 2.568

t-test for the significance of the slope = 4.095

Degrees of freedom = 4

Two-tailed probability = .0149

r-squared = .807

You might make a statement in a report like this: A simple linear regression was

performed on six months of data to determine if there was a significant relationship

between advertising expenditures and sales volume. The t-statistic for the slope was

significant at the .05 critical alpha level, t(4)=4.10, p=.015. Thus, we reject the null hypothesis and conclude that there was a positive significant relationship between advertising expenditures and sales volume. Furthermore, 80.7% of the variability in sales volume could be explained by advertising expenditures.

READ :

An Introduction to Probability & Random Processes By Kenneth B &

Gian-Carlo R, pages 18-30, 212-215, 300-303

Robert B. Ash, Lectures on Statistics, page 28-29.

Ref: http://en.wikipedia.org/wiki/Correlation

Ref: http://en.wikipedia.org/wiki/Regression

CHI-SQUARE TEST

A chi-square test is any statistical hypothesis test in which the test statistic has a chi-

square distribution when the null hypothesis is true, or any in which the probability

distribution of the test statistic (assuming the null hypothesis is true) can be made to

approximate a chi-square distribution as closely as desired by making the sample size

large enough.

Specifically, a chi-square test for independence evaluates statistically significant differences between proportions for two or more groups in a data set. Related tests include:

Yates' chi-square test, also known as Yates' correction for continuity

Mantel-Haenszel chi-square test

Linear-by-linear association chi-square test

In probability theory and statistics, the chi-square distribution (also chi-squared or χ² distribution) is one of the most widely used theoretical probability distributions in inferential statistics, i.e. in statistical significance tests. It is useful because, under reasonable assumptions, easily calculated quantities can be proven to have distributions that approximate to the chi-square distribution if the null hypothesis is true.

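If $X_i$ are k independent, normally distributed random variables with mean 0 and variance 1, then the random variable

$Q = \sum_{i=1}^{k} X_i^2$

is distributed according to the chi-square distribution with k degrees of freedom, written $Q \sim \chi^2_k$.

The chi-square distribution has one parameter: k, a positive integer that specifies the number of degrees of freedom (i.e. the number of $X_i$).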

The best-known situations in which the chi-square distribution is used are the common

chi-square tests for goodness of fit of an observed distribution to a theoretical one, and

of the independence of two criteria of classification of qualitative data. However, many

other statistical tests lead to a use of this distribution.
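For a goodness-of-fit test, the statistic compared with the chi-square distribution is Σ (observed − expected)²/expected. The sketch below applies it to hypothetical counts from 60 rolls of a die; the counts are invented for illustration only.

```python
def chi_square_statistic(observed, expected):
    """Pearson's goodness-of-fit statistic: sum of (O - E)^2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

observed = [8, 12, 9, 11, 10, 10]   # hypothetical counts for faces 1 to 6
expected = [10] * 6                 # a fair die gives 60 / 6 = 10 per face

statistic = chi_square_statistic(observed, expected)
degrees_of_freedom = len(observed) - 1

# The 5% critical value of chi-square with 5 degrees of freedom is about 11.07,
# so a statistic this small gives no evidence against fairness.
print(statistic, degrees_of_freedom)
```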

CHARACTERISTIC FUNCTION

The characteristic function of the chi-square distribution is

$\phi(t; k) = (1 - 2it)^{-k/2}$

Properties

The chi-square distribution has numerous applications in inferential statistics, for instance in chi-square tests and in estimating variances. It enters the problem of estimating the mean of a normally distributed population and the problem of estimating the slope of a regression line via its role in Student's t-distribution. It enters all analysis of variance problems via its role in the F-distribution, which is the distribution of the ratio of two independent chi-squared random variables divided by their respective degrees of freedom.

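Various chi and chi-square distributions

Name                      Statistic
chi-square distribution   $\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2$
chi distribution          $\sqrt{\sum_{i=1}^{k} \left(\frac{X_i - \mu_i}{\sigma_i}\right)^2}$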

READ :

Ref: http://en.wikipedia.org/wiki/pearson%chi-square_test

Ref: http://en.wikipedia.org/wiki/Chi-Square_test

STUDENT'S T-TEST

A t test is any statistical hypothesis test for two groups in which the test statistic has a

Student's t distribution if the null hypothesis is true.

History

The t statistic was introduced by William Sealy Gosset for cheaply monitoring the quality

of beer brews. "Student" was his pen name. Gosset was a statistician for the Guinness

brewery in Dublin, Ireland, and was hired due to Claude Guinness's innovative policy of

recruiting the best graduates from Oxford and Cambridge to apply biochemistry and

statistics to Guinness' industrial processes. Gosset published the t test in Biometrika in

1908, but was forced to use a pen name by his employer who regarded the fact that

they were using statistics as a trade secret. In fact, Gosset's identity was unknown not

only to fellow statisticians but to his employer—the company insisted on the pseudonym

so that it could turn a blind eye to the breach of its rules.

Today, it is more generally applied to the confidence that can be placed in judgments

made from small samples.

Use

Among the most frequently used t tests are:

• A test of the null hypothesis that the means of two normally distributed populations are equal. Given two data sets, each characterized by its mean, standard deviation and number of data points, we can use some kind of t test to determine whether the means are distinct, provided that the underlying distributions can be assumed to be normal. All such tests are usually called Student's t tests, though strictly speaking that name should only be used if the variances of the two populations are also assumed to be equal; the form of the test used when this assumption is dropped is sometimes called Welch's t test. There are different versions of the t test depending on whether the two samples are

o independent of each other (e.g., individuals randomly assigned into two groups), or

o paired, so that each member of one sample has a unique relationship with a particular member of the other sample (e.g., the same people measured before and after an intervention, or IQ test scores of a husband and wife).

If the calculated t value exceeds the critical value for the chosen significance level (usually the 0.05 level), then the null hypothesis that the two groups do not differ is rejected in favor of an alternative hypothesis, which typically states that the groups do differ.

• A test of whether the mean of a normally distributed population has a value specified in a null hypothesis.

• A test of whether the slope of a regression line differs significantly from 0.

Once a t value is determined, a P value can be found using a table of values from Student's t-distribution.

Suppose that in a population the random variable X is normally distributed. Take a sample of size n and calculate the sample's variance, s². An unbiased estimator of the population's variance is

$\hat{\sigma}^2 = \frac{n}{n-1}\,s^2$

Clearly for small values of n this estimation is inaccurate. Hence for samples of small size, instead of calculating the z value for the number of standard deviations from the mean

$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$

and using probabilities based on the normal distribution, calculate the t value

$t = \frac{\bar{x} - \mu}{\hat{\sigma}/\sqrt{n}} = \frac{\bar{x} - \mu}{s/\sqrt{n-1}}$

The probability that the t value is within a particular interval may be found using the t distribution. The sample's degrees of freedom are the number of data that need to be known before the rest of the data can be calculated.

e.g.

For a sample of six weights, the mean weight is 30.015 with a standard deviation of 0.045. With the mean and the first five weights it is possible to calculate the sixth weight. Consequently there are five degrees of freedom.

The t distribution tells us that, for five degrees of freedom, the probability that t > 2.571 is 0.025. Also, the probability that t < −2.571 is 0.025. Using the formula for t with t = ±2.571, a 95% confidence interval for the population's mean may be found by making μ the subject of the equation,

i.e.

$\bar{x} - 2.571\,\frac{s}{\sqrt{n-1}} < \mu < \bar{x} + 2.571\,\frac{s}{\sqrt{n-1}}$

which gives approximately 29.963 < μ < 30.067.
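The interval can be computed directly from the summary figures. The sketch below assumes that the quoted standard deviation 0.045 was calculated with divisor n, so that the standard error of the mean is s/√(n − 1); if the standard deviation had been calculated with divisor n − 1, the standard error would be s/√n instead. The function name is illustrative.

```python
from math import sqrt

def t_confidence_interval(mean, s, n, t_crit):
    """Confidence interval for the population mean from summary statistics.

    Assumes s is the standard deviation computed with divisor n, so the
    standard error of the mean is s / sqrt(n - 1).
    """
    half_width = t_crit * s / sqrt(n - 1)
    return mean - half_width, mean + half_width

# Six weights, mean 30.015, standard deviation 0.045,
# critical t for 5 degrees of freedom at the 95% level: 2.571.
low, high = t_confidence_interval(mean=30.015, s=0.045, n=6, t_crit=2.571)
print(f"{low:.3f} < mu < {high:.3f}")
```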

READ :

Introduction to Probability By Charles M. Grinstead, pages 18-30,

212-215, 300-303

Robert B. Ash, Lectures on Statistics, page 23-29.

Answer problems 1- 6 on pg 23.

Ref: http://en.wikipedia.org/wiki/Statistical_Hypothesis_testing

Ref: http://en.wikipedia.org/wiki/Null_hypothesis

REFLECTION: The study of Correlation, Regression, Hypothesis testing and other Mathematical modelling may be simplified through ICT. The following link enables trainees to learn modelling with ease:

http://www.ncaction.org.uk/subjects/maths/ict-lrn.htm

UNIT 3 ( 40 HOURS): PROBABILITY THEORY

INDICATOR FUNCTION

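In mathematics, an indicator function or characteristic function is a function defined on a set X that indicates membership of an element in a subset A of X.

The indicator function of a subset A of X is a function $1_A : X \to \{0, 1\}$ defined as

$1_A(x) = \begin{cases} 1 & \text{if } x \in A, \\ 0 & \text{if } x \notin A. \end{cases}$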

BONFERRONI INEQUALITY

Let $P(E_i)$ be the probability that $E_i$ is true, and $P\left(\bigcup_{i=1}^{n} E_i\right)$ be the probability that at least one of $E_1$, $E_2$, ..., $E_n$ is true. Then "the" Bonferroni inequality, also known as Boole's inequality, states that

$P\left(\bigcup_{i=1}^{n} E_i\right) \le \sum_{i=1}^{n} P(E_i),$

where ∪ denotes the union. If $E_i$ and $E_j$ are disjoint sets for all i and j with i ≠ j, then the inequality becomes an equality. A beautiful theorem that expresses the exact relationship between the probability of unions and probabilities of individual events is known as the inclusion-exclusion principle.
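The inequality can be verified on a small sample space. In the sketch below the events are hypothetical subsets of the outcomes of one roll of a fair die, chosen only to show an overlapping case and a disjoint case.

```python
from fractions import Fraction

sample_space = set(range(1, 7))   # one roll of a fair die

def prob(event):
    """Probability of an event (a subset of the sample space)."""
    return Fraction(len(event), len(sample_space))

def union_and_sum(events):
    """Return P(union of events) and the sum of the individual probabilities."""
    p_union = prob(set().union(*events))
    sum_of_probs = sum(prob(e) for e in events)
    return p_union, sum_of_probs

# Overlapping events: the inequality is strict.
overlapping = [{2, 4, 6}, {4, 5, 6}, {1, 6}]
p_union, sum_of_probs = union_and_sum(overlapping)

# Pairwise disjoint events: the inequality becomes an equality.
disjoint = [{1}, {2, 3}, {6}]
p_union_disjoint, sum_disjoint = union_and_sum(disjoint)

print(p_union, sum_of_probs, p_union_disjoint, sum_disjoint)
```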

GENERATING FUNCTION

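In mathematics, a generating function is a formal power series whose coefficients encode information about a sequence $a_n$ that is indexed by the natural numbers.

There are various types of generating functions, including ordinary generating functions, exponential generating functions, Lambert series, Bell series, and Dirichlet series; definitions and examples are given below. Every sequence has a generating function of each type. The particular generating function that is most useful in a given context will depend upon the nature of the sequence and the details of the problem being addressed.

Generating functions are often expressed in closed form as functions of a formal argument x. Sometimes a generating function is evaluated at a specific value of x. However, it must be remembered that generating functions are formal power series, and they will not necessarily converge for all values of x.

The ordinary generating function of a sequence $a_n$ is

$G(a_n; x) = \sum_{n=0}^{\infty} a_n x^n.$

If $a_n$ is the probability mass function of a discrete random variable, then its ordinary generating function is called a probability-generating function.

The ordinary generating function can be generalised to sequences with multiple indexes. For example, the ordinary generating function of a sequence $a_{m,n}$ (where n and m are natural numbers) is

$G(a_{m,n}; x, y) = \sum_{m,n=0}^{\infty} a_{m,n}\, x^m y^n.$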
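As an illustration of a probability-generating function, the sketch below builds G(x) = Σ p_k x^k for a fair six-sided die (an assumed example) and uses two standard facts: G(1) = 1 and G'(1) = E[X].

```python
from fractions import Fraction

# Probability mass function of a fair die: p_k = 1/6 for k = 1, ..., 6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def pgf(x):
    """Probability-generating function G(x) = sum of p_k * x^k."""
    return sum(p * x**k for k, p in pmf.items())

def pgf_derivative(x):
    """Term-by-term derivative G'(x) = sum of k * p_k * x^(k-1)."""
    return sum(k * p * x**(k - 1) for k, p in pmf.items())

total_probability = pgf(Fraction(1))        # G(1) equals 1
mean = pgf_derivative(Fraction(1))          # G'(1) equals E[X]
print(total_probability, mean)
```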

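CHARACTERISTIC FUNCTION

In probability theory, the characteristic function of any random variable completely defines its probability distribution. On the real line it is given by the following formula, where X is any random variable with the distribution in question:

$\varphi_X(t) = E\left(e^{itX}\right),$

where t is a real number, i is the imaginary unit, and E denotes the expected value.

If $F_X$ is the cumulative distribution function, then the characteristic function is given by the Riemann-Stieltjes integral

$E\left(e^{itX}\right) = \int_{-\infty}^{\infty} e^{itx}\,dF_X(x).$

In cases in which there is a probability density function, $f_X$, this becomes

$E\left(e^{itX}\right) = \int_{-\infty}^{\infty} e^{itx} f_X(x)\,dx.$

If X is a vector-valued random variable, one takes the argument t to be a vector and tX to be a dot product.

Every probability distribution has a characteristic function, because one is integrating a bounded function over a space whose measure is finite.

The continuity theorem: if the sequence of characteristic functions of distributions $F_n$ converges to the characteristic function of a distribution F, then $F_n(x)$ converges to F(x) at every value of x at which F is continuous.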

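Characteristic functions are particularly useful for dealing with functions of independent random variables. For example, if $X_1, X_2, ..., X_n$ is a sequence of independent (and not necessarily identically distributed) random variables, and

$S_n = \sum_{i=1}^{n} a_i X_i,$

where the $a_i$ are constants, then the characteristic function for $S_n$ is given by

$\varphi_{S_n}(t) = \varphi_{X_1}(a_1 t)\,\varphi_{X_2}(a_2 t) \cdots \varphi_{X_n}(a_n t).$

In particular, for independent X and Y, $\varphi_{X+Y}(t) = \varphi_X(t)\,\varphi_Y(t)$. To see this, write out the definition of characteristic function:

$\varphi_{X+Y}(t) = E\left(e^{it(X+Y)}\right) = E\left(e^{itX} e^{itY}\right) = E\left(e^{itX}\right) E\left(e^{itY}\right) = \varphi_X(t)\,\varphi_Y(t).$

Observe that the independence of X and Y is required to establish the equality of the third and fourth expressions.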

Because of the continuity theorem, characteristic functions are used in the most frequently seen proof of the central limit theorem.

Characteristic functions can also be used to find moments of a random variable. Provided that the nth moment exists, the characteristic function can be differentiated n times and

$E(X^n) = i^{-n}\,\varphi_X^{(n)}(0) = i^{-n} \left[\frac{d^n}{dt^n}\varphi_X(t)\right]_{t=0}.$

READ

Robert B. Ash, Lectures on Statistics, page 32 of 45:

Ref: http://en.wikipedia.org/wiki/Characteristic_function_%28probability_theory%29

STATISTICAL INDEPENDENCE

In probability theory, to say that two events are independent intuitively means that the

occurrence of one event makes it neither more nor less probable that the other occurs.

For example:

The event of getting a "6" the first time a die is rolled and the event of getting a

"6" the second time are independent.

By contrast, the event of getting a "6" the first time a die is rolled and the event

that the sum of the numbers seen on the first and second trials is "8" are

dependent.

If two cards are drawn with replacement from a deck of cards, the event of

drawing a red card on the first trial and that of drawing a red card on the second

trial are independent.

By contrast, if two cards are drawn without replacement from a deck of cards,

the event of drawing a red card on the first trial and that of drawing a red card on

the second trial are dependent.

Similarly, two random variables are independent if the conditional probability distribution

of either given the observed value of the other is the same as if the other's value had

not been observed.

INDEPENDENT EVENTS

Two events A and B are independent if and only if

$\Pr(A \cap B) = \Pr(A)\Pr(B).$

Here A ∩ B is the intersection of A and B, that is, it is the event that both events A and B occur.

More generally, any collection of events (possibly more than just two of them) is mutually independent if and only if for any finite subset $A_1, ..., A_n$ of the collection we have

$\Pr(A_1 \cap \cdots \cap A_n) = \Pr(A_1) \cdots \Pr(A_n).$

If two events A and B are independent, then the conditional probability of A given B is the same as the "unconditional" (or "marginal") probability of A, that is,

$\Pr(A \mid B) = \Pr(A).$

There are at least two reasons why this statement is not taken to be the definition of

independence: (1) the two events A and B do not play symmetrical roles in this

statement, and (2) problems arise with this statement when events of probability 0 are

involved.
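The product rule Pr(A ∩ B) = Pr(A)Pr(B) can be tested on the dice examples given earlier by listing all 36 equally likely outcomes of two rolls. The sketch below is one way to do this; the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

rolls = list(product(range(1, 7), repeat=2))   # all 36 outcomes of two rolls

def prob(event):
    """Probability of an event given as a true/false test on an outcome."""
    return Fraction(sum(1 for r in rolls if event(r)), len(rolls))

def independent(a, b):
    """True if Pr(A and B) equals Pr(A) * Pr(B)."""
    return prob(lambda r: a(r) and b(r)) == prob(a) * prob(b)

def first_is_six(r):
    return r[0] == 6

def second_is_six(r):
    return r[1] == 6

def sum_is_eight(r):
    return r[0] + r[1] == 8

print(independent(first_is_six, second_is_six))   # a "6" first and a "6" second
print(independent(first_is_six, sum_is_eight))    # a "6" first and a sum of 8
```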

RANDOM SAMPLE

A sample is a subset chosen from a population for investigation. A random sample is one chosen by a method involving an unpredictable component. Random sampling can also refer to taking a number of independent observations from the same probability distribution, without involving any real population. A probability sample is one in which each item has a known probability of being in the sample.

The sample will usually not be completely representative of the population from which it was drawn; this random variation in the results is known as sampling error. In the case of random samples, mathematical theory is available to assess the sampling error. Thus, estimates obtained from random samples can be accompanied by measures of the uncertainty associated with the estimate. This can take the form of a standard error, or if the sample is large enough for the central limit theorem to take effect, confidence intervals may be calculated.

A simple random sample is selected so that every possible sample has an equal

chance of being selected.

A self-weighting sample, also known as an epsem sample, is one in which every

individual, or object, in the population of interest has an equal opportunity of

being selected for the sample. Simple random samples are self-weighting.

Stratified sampling involves selecting independent samples from a number of

subpopulations (or strata) within the population. Great gains in efficiency are

sometimes possible from judicious stratification.

Cluster sampling involves selecting the sample units in groups. For example, a

sample of telephone calls may be collected by first taking a collection of

telephone lines and collecting all the calls on the sampled lines. The analysis of

cluster samples must take into account the intra-cluster correlation which reflects

the fact that units in the same cluster are likely to be more similar than two units

picked at random.

MULTINOMIAL DISTRIBUTION

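In probability theory, the multinomial distribution is a generalization of the binomial distribution.

The binomial distribution is the probability distribution of the number of "successes" in n independent Bernoulli trials, with the same probability of "success" on each trial. In a multinomial distribution, each trial results in exactly one of some fixed finite number k of possible outcomes, with probabilities $p_1, ..., p_k$ (so that $p_i \ge 0$ for i = 1, ..., k and $\sum_{i=1}^{k} p_i = 1$), and there are n independent trials. Then let the random variables $X_i$ indicate the number of times outcome number i was observed over the n trials. The vector $X = (X_1, ..., X_k)$ follows a multinomial distribution with parameters n and p, where $p = (p_1, ..., p_k)$.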

99

SOLUTIONS FROM MULTINOMIAL DISTRIBUTION FORMULA

A short version of the multinomial formula for three consecutive outcomes is given

below.

If X consists of events E1, E2, E3, which have the corresponding probabilities p1, p2 and p3 of occurring, where x1 is the number of times E1 will occur, x2 is the number of times E2 will occur, and x3 is the number of times E3 will occur, then the probability of X is

\[ P(X) = \frac{n!}{x_1!\, x_2!\, x_3!}\; p_1^{x_1}\, p_2^{x_2}\, p_3^{x_3} \]

where x1 + x2 + x3 = n and p1 + p2 + p3 = 1.

Example:

1) In a large city, 60% of the workers drive to work, 30% take the bus, and 10% take the train. If 5 workers are selected at random, find the probability that 2 will drive, 2 will take the bus, and 1 will take the train.

Solution:

n = 5, x1 = 2, x2 = 2, x3 = 1 and p1 = 0.6, p2 = 0.3, p3 = 0.1

Hence, the probability that 2 workers will drive, 2 will take the bus, and one will take the train is

\[ \frac{5!}{2!\, 2!\, 1!}\, (0.6)^2 (0.3)^2 (0.1)^1 = 0.0972 \]

2) A box contains 5 red balls, 3 blue balls, and 2 white balls. If 4 balls are selected with

replacement, find the probability of getting 2 red balls, one blue ball, and one white

ball.

Solution:

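n = 4, x1 = 2, x2 = 1, x3 = 1, and p1 = 5/10, p2 = 3/10, p3 = 2/10. Hence, the probability of getting 2 red balls, one blue ball, and one white ball is

\[ \frac{4!}{2!\, 1!\, 1!} \left(\frac{5}{10}\right)^{2} \left(\frac{3}{10}\right)^{1} \left(\frac{2}{10}\right)^{1} = 12 \times \frac{3}{200} = \frac{9}{50} = 0.18 \]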

{ Allan G, 2005, pg 132}
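The two worked examples can be checked with a short Python sketch of the multinomial formula. The function name `multinomial_pmf` is a label chosen here for illustration and is not part of the module's materials.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1 = x1, ..., Xk = xk) for n = sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # n! / (x1! x2! ... xk!), kept as an exact integer
    coefficient = factorial(sum(counts))
    for x in counts:
        coefficient //= factorial(x)
    return coefficient * prod(p ** x for p, x in zip(probs, counts))

# Example 1: 2 drive, 2 take the bus, 1 takes the train
workers = multinomial_pmf([2, 2, 1], [0.6, 0.3, 0.1])
# Example 2: 2 red, 1 blue, 1 white, drawn with replacement
balls = multinomial_pmf([2, 1, 1], [0.5, 0.3, 0.2])

print(round(workers, 4))  # 0.0972
print(round(balls, 4))    # 0.18
```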

ORDER STATISTIC


In statistics, the kth order statistic of a statistical sample is equal to its kth-smallest value.

Together with rank statistics, order statistics are among the most fundamental tools in

non-parametric statistics and inference.


Important special cases of the order statistics are the minimum and maximum value of a

sample, and (with some qualifications discussed below) the sample median and other

sample quartiles.

When using probability theory to analyse order statistics of random samples from a

continuous distribution, the cumulative distribution function is used to reduce the

analysis to the case of order statistics of the uniform distribution.
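The reduction to the uniform case can be written out as follows. This is a short sketch assuming a sample X1, ..., Xn from a continuous distribution with cumulative distribution function F; the symbols U_i and u are introduced here for illustration.

```latex
\[
U_i = F(X_i) \sim \mathrm{Uniform}(0, 1), \qquad
U_{(k)} = F\bigl(X_{(k)}\bigr),
\]
\[
f_{U_{(k)}}(u) = \frac{n!}{(k-1)!\,(n-k)!}\; u^{k-1} (1-u)^{n-k},
\qquad 0 \le u \le 1 .
\]
```

Because F is non-decreasing, applying it to the sample preserves the ordering, so the kth order statistic of the transformed sample is the transform of the kth order statistic of the original sample.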

READ :

Robert B. Ash, Lectures on Statistics, page 25 -26 and Answer

problems 1-4 on pg 26/27.

Ref: http://en.wikipedia.org/wiki/Ranking

Ref: http://en.wikipedia.org/wiki/non-parametric_Statistics

For example, suppose that four numbers are observed or recorded, resulting in a sample of size n = 4. If the sample values are

6, 9, 3, 8,

they will usually be denoted

\[ x_1 = 6, \quad x_2 = 9, \quad x_3 = 3, \quad x_4 = 8, \]

where the subscript i in xi indicates simply the order in which the observations were recorded and is usually assumed not to be significant. A case when the order is significant is when the observations are part of a time series.

The order statistics would be denoted

\[ x_{(1)} = 3, \quad x_{(2)} = 6, \quad x_{(3)} = 8, \quad x_{(4)} = 9, \]

where the subscript (i) enclosed in parentheses indicates the ith order statistic of the sample.

The first order statistic (or smallest order statistic) is always the minimum of the sample, that is,

\[ X_{(1)} = \min\{X_1, \ldots, X_n\} \]

where, following a common convention, we use upper-case letters to refer to random variables, and lower-case letters (as above) to refer to their actual observed values.

Similarly, for a sample of size n, the nth order statistic (or largest order statistic) is the maximum, that is,

\[ X_{(n)} = \max\{X_1, \ldots, X_n\} \]

The sample range is the difference between the maximum and minimum. It is clearly a function of the order statistics:

\[ \mathrm{Range}\{X_1, \ldots, X_n\} = X_{(n)} - X_{(1)} \]

A similar important statistic in exploratory data analysis that is simply related to the

order statistics is the sample interquartile range.

The sample median may or may not be an order statistic, since there is a single middle

value only when the number n of observations is odd. More precisely, if n = 2m + 1 for

some m, then the sample median is X(m + 1) and so is an order statistic. On the other

hand, when n is even, n = 2m and there are two middle values, X(m) and X(m + 1), and the

sample median is some function of the two (usually the average) and hence not an

order statistic. Similar remarks apply to all sample quantiles.
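The order statistics, range and median of the four-value sample 6, 9, 3, 8 can be computed with a minimal Python sketch. The function names are chosen here for illustration, and the median of an even-sized sample is taken as the average of the two middle values, which the text describes as the usual choice.

```python
def order_statistics(sample):
    """Return the sample sorted ascending: x_(1), ..., x_(n)."""
    return sorted(sample)

def sample_median(sample):
    """Middle order statistic if n is odd, else the mean of the two middle ones."""
    ordered = order_statistics(sample)
    n = len(ordered)
    m = n // 2
    if n % 2 == 1:
        return ordered[m]                     # x_(m+1) when n = 2m + 1
    return (ordered[m - 1] + ordered[m]) / 2  # average of x_(m) and x_(m+1)

sample = [6, 9, 3, 8]
ordered = order_statistics(sample)

print(ordered)                    # [3, 6, 8, 9]
print(ordered[0], ordered[-1])    # 3 9  (minimum and maximum)
print(ordered[-1] - ordered[0])   # 6    (sample range)
print(sample_median(sample))      # 7.0  (not itself an order statistic)
```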

MULTIVARIATE NORMAL DISTRIBUTION

The multivariate normal distribution, also called a multivariate Gaussian distribution, is a specific probability distribution, which can be thought of as a generalization to higher dimensions of the one-dimensional normal distribution (also called a Gaussian distribution).

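Higher moments

The kth-order moments of X are defined by

\[ \mu_{r_1, \ldots, r_N}(X) = E\Bigl[\prod_{j=1}^{N} X_j^{r_j}\Bigr] \]

where $r_1 + r_2 + \cdots + r_N = k$. The kth-order central moments, that is the moments of $X - \mu$, are as follows.

(a) If k is odd, the central moments are zero.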


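(b) If k is even with k = 2λ, then

\[ E\bigl[(X_1 - \mu_1)(X_2 - \mu_2) \cdots (X_{2\lambda} - \mu_{2\lambda})\bigr] = \sum \bigl(\sigma_{ij}\, \sigma_{kl} \cdots \sigma_{xz}\bigr) \]

where $\sigma_{ij}$ denotes the covariance of $X_i$ and $X_j$, and the sum is taken over all allocations of the set {1, ..., 2λ} into λ (unordered) pairs, giving $(2\lambda - 1)! / \bigl(2^{\lambda - 1} (\lambda - 1)!\bigr)$ terms in the sum, each being the product of λ covariances. The covariances are determined by replacing the terms of the list [1, ..., 2λ] by the corresponding terms of the list consisting of r1 ones, then r2 twos, etc, after each of the possible allocations of the former list into pairs.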

For fourth order moments (four variables) there are three terms. For sixth-order

moments there are 3 × 5 = 15 terms, and for eighth-order moments there are 3 × 5 × 7

= 105 terms.
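The term counts 3, 15 and 105 can be confirmed by listing the pairings directly. The sketch below is an illustration only: `pairings` enumerates every way of splitting a list into unordered pairs, and `number_of_terms` evaluates the counting formula quoted above; both names are chosen here.

```python
from math import factorial

def pairings(items):
    """Yield every way of splitting items into unordered pairs."""
    if not items:
        yield []
        return
    first, rest = items[0], items[1:]
    for i, partner in enumerate(rest):
        remaining = rest[:i] + rest[i + 1:]
        for tail in pairings(remaining):
            yield [(first, partner)] + tail

def number_of_terms(lam):
    """(2*lam - 1)! / (2**(lam - 1) * (lam - 1)!)."""
    return factorial(2 * lam - 1) // (2 ** (lam - 1) * factorial(lam - 1))

for lam in (2, 3, 4):
    listed = sum(1 for _ in pairings(list(range(1, 2 * lam + 1))))
    print(2 * lam, listed, number_of_terms(lam))
# prints:
# 4 3 3
# 6 15 15
# 8 105 105
```

For the fourth-order case the three pairings are (1,2)(3,4), (1,3)(2,4) and (1,4)(2,3), each contributing one product of two covariances.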


SYNTHESIS OF THE MODULE

At the end of this module, learners are expected to compute various measures of dispersion and to apply the laws of probability to various probability distributions. Learners should also be able to calculate coefficients of correlation and regression. Unit one of Probability and Statistics covers frequency distributions, relative and cumulative distributions, various frequency curves, the mean, mode and median, quartiles and percentiles, standard deviation, and symmetrical and skewed distributions. The learner is introduced to various statistical measures through guided examples.

The examples are well illustrated, and learners can follow them without difficulty. It is recommended that learners attempt the formative evaluations given to assess their progress in understanding the content. Learners should take time to check the recommended reference material on CDs, the attached open source materials and the recommended websites. Most importantly, learners are encouraged to read widely around the content and attempt the questions after each topic. Unit two of the module takes learners through moments and moment generating functions; the Markov and Chebychev inequalities; special univariate distributions; bivariate probability distributions; joint, marginal and conditional distributions; independence; bivariate expectation; regression and correlation; calculation of regression and correlation coefficients for bivariate data; distribution functions of random variables; the bivariate normal distribution; and derived distributions such as Chi-Square, t and F.

Unit two has various learning activities to aid learning, and learners are advised to master the content of the various sub-topics and assess themselves through the formative evaluations. Failure to answer the formative assessments should be a clear indicator that learners should revise the sub-topics before progressing to other sub-topics. The tasks given under the different learning activities demand that learners demonstrate a high level of competency in ICT skills. The learning objectives are stated at the beginning of the module and should guide learners on the level of expectations of the module.

Unit three focuses on probability theory and concentrates on the various probability

distributions.

The summative evaluation will be used to judge the learners' mastery of the module. It is

recommended that learners revise the module before sitting for the final summative

evaluation.


SUMMATIVE EVALUATION

Answer Any Four Questions. Each question carries 15 marks.

1) In the following table, the weights of 40 cows are recorded to the nearest kilogram.

157 138 150 147 140 125 144 173

144 146 140 176 154 148 163 164

135 146 142 142 149 119 134 158

165 168 138 147 152 153 136 126

Find;

a). the highest weight

b). the least weight

c). the range

d). construct a frequency distribution table starting with a class of 118-126

e). calculate the mean of the data

f). calculate the standard deviation

2) A). A coin and a die are thrown together. Draw a possibility space diagram and find

the probability of obtaining:

a). a head

b). a number greater than 4

c). a head and a number greater than 4

d). a head or a number greater than 4

B). Events M and N are such that P(M) = 19/30, P(N) = 2/5 and P(M ∪ N) = 4/5. Find P(M ∩ N).

3) A book contains 500 pages and has 750 misprints.

a). What is the average number of misprints per page?

b) Find the probability that page 427 contains

i). no misprints

ii). exactly 4 misprints

c). find the probability that pages 427 and 428 will contain no misprints


Question 4: Continuous Random Variable

4) A continuous random variable (r.v.) X has a probability density function (p.d.f.) f(x) where

\[ f(x) = \begin{cases} k(x+2)^2, & -2 \le x < 0 \\ 4k, & 0 \le x \le 1\tfrac{1}{3} \\ 0, & \text{otherwise} \end{cases} \]

a) Find the value of the constant k

b) Sketch y = f(x)

c) Find P(−1 ≤ X ≤ 1)

d) Find P(X > 1)

Probability of an event

5). Given that P(A ∪ B) = 7/8, P(A ∩ B) = 1/4 and P(A') = 5/8, find the values of

a) P(A)

b) P(B)

c) P(A ∩ B')

d) P(A' ∪ B')

e) The probability that only one of A, B will occur.

Expected Value

6). The continuous random variable (r.v.) X has the p.d.f.

\[ f(x) = x + \frac{1}{2}, \qquad 0 \le x \le 1 \]

Find:

a). E(X)

b). E(24X + 6)

c). $E\bigl((1 - X)^{1/2}\bigr)$

7). The masses, to the nearest kg, of 50 boys are recorded below.

Frequency (f) 2 6 12 14 10 6

a). Construct a cumulative frequency curve

b). Use the curve to estimate the ;

i) Median

ii). Interquartile range

iii). 7th decile

iv). 60th percentile.


MARKING SCHEME OF SUMMATIVE EVALUATION

1). a) 176

b) 119

c) 176-119=57

d) Using 7 classes gives us a class interval of 9

118-126 /// 3

127-135 //// 5

154-162 //// 5

163-171 //// 4

172-180 // 2

Total 40

f). Accept any method of calculating the standard deviation

2) A). A coin has either Head(H) or Tail(T) while a die has sides 1,2,3,4,5&6.

Coin / Die    1    2    3    4    5    6
H             H1   H2   H3   H4   H5   H6
T             T1   T2   T3   T4   T5   T6

Sample space = 12.

a). 6/12=1/2

b). 4/12=1/3

c). 2/12=1/6

d). 8/12=2/3
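The four answers to part A can be checked by listing the possibility space in Python. This is a sketch only; the helper name `probability` is chosen here for illustration, and exact fractions are used so the results match the marking scheme directly.

```python
from fractions import Fraction
from itertools import product

# 12 equally likely outcomes: (coin face, die score)
space = list(product("HT", range(1, 7)))

def probability(event):
    """Share of outcomes in the possibility space that satisfy event."""
    favourable = [outcome for outcome in space if event(*outcome)]
    return Fraction(len(favourable), len(space))

p_head = probability(lambda coin, die: coin == "H")
p_gt4 = probability(lambda coin, die: die > 4)
p_both = probability(lambda coin, die: coin == "H" and die > 4)
p_either = probability(lambda coin, die: coin == "H" or die > 4)

print(p_head, p_gt4, p_both, p_either)  # 1/2 1/3 1/6 2/3
```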

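B). P(M ∪ N) = P(M) + P(N) − P(M ∩ N), so

\[ \frac{4}{5} = \frac{19}{30} + \frac{2}{5} - P(M \cap N) \]

\[ P(M \cap N) = \frac{19}{30} + \frac{12}{30} - \frac{24}{30} = \frac{7}{30} \]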

3) a) Average number of misprints per page = 750/500 = 1.5.

b) Let X be 'the number of misprints per page'. Then, assuming that misprints occur at random, $X \sim \mathrm{Po}(1.5)$.

i). $P(X = 0) = e^{-1.5} = 0.2231\ldots$

P(there will be no misprints on page 427) = 0.223 (3 d.p.).

ii). $P(X = 4) = e^{-1.5}\, \dfrac{(1.5)^4}{4!} = 0.0470\ldots$

P(there will be 4 misprints on page 427) = 0.047 (3 d.p.).

c). We expect 1.5 misprints on each page, and so on the two pages 427 and 428 we expect 1.5 + 1.5 = 3 misprints.

Let Y be 'the number of misprints on two pages'. Then $Y \sim \mathrm{Po}(3)$, so

$P(Y = 0) = e^{-3} = 0.0498$ (4 d.p.).
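The Poisson probabilities for question 3 can be evaluated numerically with a short Python sketch. The function name `poisson_pmf` is chosen here for illustration; it simply evaluates $e^{-\lambda} \lambda^x / x!$.

```python
from math import exp, factorial

def poisson_pmf(x, mean):
    """P(X = x) for a Poisson variable with the given mean."""
    return exp(-mean) * mean ** x / factorial(x)

rate = 750 / 500                               # average misprints per page
p_none = poisson_pmf(0, rate)                  # no misprints on one page
p_four = poisson_pmf(4, rate)                  # exactly 4 misprints on one page
p_none_two_pages = poisson_pmf(0, 2 * rate)    # no misprints on two pages

print(round(p_none, 3), round(p_four, 3), round(p_none_two_pages, 4))
# 0.223 0.047 0.0498
```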

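4) a) Since X is a continuous random variable, $\int_{\text{all } x} f(x)\, dx = 1$.

Therefore

\[ \int_{-2}^{0} k(x+2)^2\, dx + \int_{0}^{1\frac{1}{3}} 4k\, dx = 1 \]

\[ \left[ \frac{k}{3}(x+2)^3 \right]_{-2}^{0} + \Bigl[ 4kx \Bigr]_{0}^{1\frac{1}{3}} = 1 \]

\[ \frac{k}{3}(8) + 4k\left(\frac{4}{3}\right) = 1 \]

\[ 8k = 1 \]

\[ k = \frac{1}{8} \]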

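b) The p.d.f. of X is

\[ f(x) = \begin{cases} \tfrac{1}{8}(x+2)^2, & -2 \le x < 0 \\ \tfrac{1}{2}, & 0 \le x \le 1\tfrac{1}{3} \\ 0, & \text{otherwise} \end{cases} \]

[Sketch of y = f(x), with the x-axis marked at −2, 0 and 1 1/3.]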

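c) [Sketch of y = f(x), with the x-axis marked at −2, −1, 0 and 1 1/3.]

\[ P(-1 \le X \le 0) = \int_{-1}^{0} \frac{1}{8}(x+2)^2\, dx = \frac{7}{24} \]

and

\[ P(0 \le X \le 1) = \text{area of rectangle} = \frac{1}{2} \]

Therefore

\[ P(-1 \le X \le 1) = \frac{7}{24} + \frac{1}{2} = \frac{19}{24} \]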


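d). [Sketch of y = f(x), with the x-axis marked at −2, 0, 1 and 1 1/3.]

\[ P\bigl(1 \le X \le 1\tfrac{1}{3}\bigr) = \text{area of rectangle} = \frac{1}{3} \cdot \frac{1}{2} = \frac{1}{6} \]

Therefore $P(X > 1) = \dfrac{1}{6}$.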
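The answers to question 4 can be checked numerically. The sketch below assumes the density is k(x + 2)^2 on −2 ≤ x < 0 and 4k on 0 ≤ x ≤ 1 1/3, and uses a simple midpoint rule rather than exact integration; the function names are chosen here for illustration.

```python
def integrate(f, a, b, steps=100_000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    width = (b - a) / steps
    return sum(f(a + (i + 0.5) * width) for i in range(steps)) * width

def shape(x):
    """The density divided by k: (x + 2)^2 on [-2, 0), 4 on [0, 4/3]."""
    if -2 <= x < 0:
        return (x + 2) ** 2
    if 0 <= x <= 4 / 3:
        return 4.0
    return 0.0

# k is whatever makes the total area equal to 1
k = 1 / (integrate(shape, -2, 0) + integrate(shape, 0, 4 / 3))

def pdf(x):
    """The density f(x) with the fitted constant k."""
    return k * shape(x)

p_c = integrate(pdf, -1, 0) + integrate(pdf, 0, 1)   # P(-1 <= X <= 1)
p_d = integrate(pdf, 1, 4 / 3)                       # P(X > 1)

print(round(k, 4), round(p_c, 4), round(p_d, 4))
```

The printed values should agree with k = 1/8, 19/24 and 1/6 to four decimal places.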

5) a) P(A) = 1 − P(A') = 1 − 5/8 = 3/8

b) P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

7/8 = 3/8 + P(B) − 1/4

P(B) = 3/4

c) P(A ∩ B') = P(A) − P(A ∩ B)

= 3/8 − 1/4

= 1/8

d) A' ∪ B' = (A ∩ B)' and P(A' ∪ B') = 1 − P(A ∩ B) = 3/4

e) P(only one of A, B occurs) = P(A ∩ B') + P(A' ∩ B)

= {P(A) − P(A ∩ B)} + {P(B) − P(A ∩ B)}

= 1/8 + 1/2 = 5/8
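The arithmetic for question 5 can be reproduced with exact fractions in Python. The variable names are chosen here for illustration; the rules applied are the complement rule, the addition rule and De Morgan's law.

```python
from fractions import Fraction as F

p_union, p_both, p_not_a = F(7, 8), F(1, 4), F(5, 8)

p_a = 1 - p_not_a                     # complement rule
p_b = p_union - p_a + p_both          # addition rule rearranged for P(B)
p_a_not_b = p_a - p_both              # A occurs and B does not
p_not_a_or_not_b = 1 - p_both         # De Morgan: A' U B' = (A n B)'
p_only_one = (p_a - p_both) + (p_b - p_both)

print(p_a, p_b, p_a_not_b, p_not_a_or_not_b, p_only_one)
# 3/8 3/4 1/8 3/4 5/8
```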

6). a). E(X) = 7/12

b). E(24X + 6) = 24 E(X) + 6 = 20

c). $E\bigl((1 - X)^{1/2}\bigr) = \displaystyle\int_{0}^{1} (1 - x)^{1/2} \left( x + \frac{1}{2} \right) dx = \frac{3}{5}$
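The expectations in question 6 can be checked numerically. The sketch below assumes the density is f(x) = x + 1/2 on 0 ≤ x ≤ 1 and approximates each integral with a midpoint rule; the function names are chosen here for illustration.

```python
from math import sqrt

def pdf(x):
    """Assumed density f(x) = x + 1/2 on 0 <= x <= 1."""
    return x + 0.5

def expectation(g, steps=100_000):
    """Midpoint-rule approximation of E[g(X)], the integral of g(x) f(x) over [0, 1]."""
    width = 1.0 / steps
    return sum(g((i + 0.5) * width) * pdf((i + 0.5) * width)
               for i in range(steps)) * width

e_x = expectation(lambda x: x)                  # E(X)
e_linear = expectation(lambda x: 24 * x + 6)    # E(24X + 6)
e_root = expectation(lambda x: sqrt(1 - x))     # E((1 - X)^(1/2))

print(round(e_x, 4), round(e_linear, 4), round(e_root, 4))
```

Under this assumed density the exact values are 7/12, 20 and 3/5, which is consistent with E(24X + 6) = 24 E(X) + 6.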

7). b) i) Median = 76.3 kg.

ii) Interquartile range = 9 kg.

iii) 7th decile: the estimate is read from the curve at cumulative frequency (7/10) × 50 = 35.

iv) 60th percentile: the estimate is read from the curve at cumulative frequency (60/100) × 50 = 30.


REFERENCES

1) http://en.wikipedia.org/wiki/Statistics

2) A Concise Course in A-Level Statistics, By J. Crawshaw and J. Chambers, Stanley

Thornes Publishers, 1994

3) http://en.wikipedia.org/wiki/Probability

4) Business Calculation and Statistics Simplified, By N.A. Saleemi, 2000

5) http://microblog.routed.net/wp-content/uploads/2007/01/onlinebooks.html

6) Statistics: concepts and applications, By Harry Frank and Steven C Althoen,

Cambridge University Press, 2004

7) http://mathworld.wolfram.com/Statistics

8) http://mathworld.wolfram.com/Probability

9) Probability Demystified, By Allan G. Bluman, McGraw Hill, 2005.

10)http://directory.fsf.org/math/

11) http://microblog.routed.net/wp-content/uploads/2007/01/onlinebooks.html

12)Lectures on Statistics, By Robert B. Ash, 2005.

13)Introduction to Probability, By Charles M. Grinstead and J. Laurie Snell, Swarthmore

College.

14)http://directory.fsf.org/math/

15)Simple Statistics, By Frances Clegg, Cambridge University Press 1982.

16)Statistics for Advanced Level Mathematics, By I. Gwyn Evans University College of

Wales, 1984.

Module Developer Writing Tip. Though for most modules the final mark (for one module) will be closely linked to the summative

evaluation, it is often wise to mark or give points for the completion of other activities or formative evaluations. Module Developers

are therefore required to provide a clearly laid out “My Records” spreadsheet page that includes:

- Organized columns for entry of “future students” ;

- Organized columns for entry of all required marks ;

- Calculated columns to indicate overall achievement.

- Module Developers should provide the name of the EXCEL file.

Name of the EXCEL file : Mathematics: Probability and Statistics Student Records


MAIN AUTHOR OF THE MODULE

Contact: paulamoud@yahoo.com

The module author is a teacher trainer at Amoud University,

Borama, Republic of Somaliland.

He has been a teacher trainer in Kenya, Republic of

Seychelles, and Somalia. He has been involved in

strengthening Mathematics and Sciences at secondary and

tertiary levels under the Japan International Cooperation

Agency (JICA) programme in fifteen African countries.

He is married with three children.

Module Developer Writing Tip. The file naming and structure must follow the AVU/PI Consortium template as defined and

explained by the AVU. Module Developers still need to provide the name of all the files (module and other files accompanying the

module).

Daily, each module will be loaded in the personal eportfolio created for each consultant. For this, training will be provided by

professor Thierry Karsenti and his team (Salomon Tchaméni Ngamo and Toby Harper).

Name of the module (WORD) file : Mathematics: Probability and Statistics ( Word)

Name of all other files (WORD, PDF, PPT, etc.) for the module.

2. Probability and Statistics: Marking Scheme for Summative Evaluation ( Word)

3. An Introduction to Probability and Random Processes, Textbook by Kenneth

Baclawski and Gian-Carlo Rota ( 1979) ( PDF)

4. Introduction To Probability, Textbook by Charles M. Grinstead and J. Laurie Snell

(PDF)

5. Lectures on Statistics, Textbook by Robert B. Ash (PDF)
