




Bio - Lecture note, G/her B (BSc, MSc, AssProf) 1

Instructor Address:

Name: Gerezgiher B. (MSc, Asst. Prof., PhD fellow)


Cell phone: +251910903186

Statistics: the science of collecting, organizing, analyzing, and interpreting data, and of drawing inferences about a body of data when only part of the data is observed.
Biostatistics: the branch of statistics in which the data being analyzed are derived from the biological and medical sciences.

Statistical data: information that is systematically collected and analyzed so that results can be interpreted and conclusions drawn.
The information about which we are concerned is called
Data: an aggregate of variables obtained as a result of measurement or counting.
Parameter: a descriptive measure computed from the data of a population, e.g. population mean (μ), population variance, population standard deviation.
Statistic: a descriptive measure computed from the data of a sample.
descriptive statistics
relational statistics and
inferential statistics
1. Descriptive statistics fall into one of two categories:
measures of central tendency (mean, median, and mode)
measures of dispersion (standard deviation and variance).
Their purpose is to explore hunches that may have come up during the course of the research process.

Types of data:
Primary data: collected directly from the items or individual respondents for the purpose of a particular study.
Secondary data: data that had already been collected by some agency and are used for another purpose.
E.g. reports and records of health institutions, vital registration of a municipality.

1. Categorical (Qualitative) variable:
The notion of magnitude is absent or implicit.
Nominal: has distinct levels with no inherent ordering.
A nominal variable with only two categories is called binary or dichotomous; otherwise it is called polytomous.
E.g. Sex: male or female; Color: black, red, white, etc.
Ordinal: has levels that follow a distinct ordering.
E.g. severity of pain (mild, moderate, severe)

2. Numeric (Quantitative) variable:
A variable that has magnitude.
Discrete data: numbers represent actual measurable quantities.
Restricted to only specified values (integers) that differ by fixed amounts.
E.g. Number of new AIDS cases reported
Continuous data: represent measurable quantities but are not restricted to taking on certain specific values, i.e. fractional values are possible.
Can use an interval scale (no true zero value) or a ratio scale (begins at a true zero).
E.g. weight, cholesterol level, time, temperature

Measurement: the assignment of numbers or names to objects or events according to a set of rules.
Measuring an individual's weight is qualitatively different from measuring their response to some treatment on a three-category scale: improved, stable, not improved.
Measuring scales differ according to the degree of precision involved.
There are four types of scales of measurement.
Types of measurement scales:

Nominal scale: uses names, labels, or symbols to

assign each measurement to one of a limited
number of categories that cannot be ordered.
Examples: Blood type, sex, race, marital status
Ordinal scale: assigns each measurement to one of a
limited number of categories that are ranked in
terms of a graded order.
Examples: Patient status, Cancer stages

Interval scale: assigns each measurement to one of an unlimited number of categories that are equally spaced. It has no true zero point.
Example: Temperature measured on the Celsius or Fahrenheit scale
Ratio scale: measurement begins at a true zero point and the scale has equal spacing.
Examples: Height, weight, blood pressure

Before any statistical work can be done, data must be collected.
Data collection is a way of gathering information for statistical use.
Source of data:
Primary data: collected directly from the items or individual respondents for the purpose of a particular study.
Secondary data: data that had already been collected by some agency and are used for another purpose.
E.g. reports and records of health institutions, vital registration of a municipality.

Observation: using guiding checklists
Questionnaire based: it can be a kind of interview (face to face, in-depth, phone) or sent by e-mail
Focus group discussions (FGD): a checklist may be used
Documentary sources: this is called secondary data
E.g. life histories, case studies

Advantages of observation:
Gives relatively more accurate data on behavior and activities
Disadvantages of observation:
The investigator's or observer's own biases, desires, etc. can influence the data.
Needs more resources and skilled human power when high-level machines are used.
Risk for observers

Usually used for community surveys, in collecting data about community health status.
The quality of the information gathered depends on the quality of the questionnaire.
Probably the most commonly used research data collection technique.
It could be:
Self-administered questionnaires
Interview questionnaires
Mailed, SMS, or phone questionnaires

There are basically two types of questions:
Open-ended questions:
The respondent is free to use one's own words to reply (no possible answers are given to choose from).
E.g. What do you think are the reasons for a high
drop-out rate of village health committee
Closed questions:
The respondent is provided with some fixed answers and is asked to choose one out of a list of possible answers.
They are useful if the possible responses are known and one is only interested in certain aspects.
E.g. Sex: 1. Male 2. Female


Use of Documentary Sources:
Clinical and other Personal records, death certificates, published mortality statistics, census,
publication etc.
Is less time consuming and relatively has low cost.
Care should be taken on quality
It is questionable in the completeness of the data.


Methods of data presentation
For this primary objective, different techniques of data organization and presentation, such as ordered arrays, tables, diagrams, and graphs, are used.
This is a way of describing variables depending on the type of data:
- Numerical: mostly for quantitative data, or
- Categorical: mostly for qualitative data.
Sometimes we transform numeric data into categorical data when a lesser degree of detail is required.


Categorical variables: can be described as
Table of frequency distributions
Relative frequency
Cumulative frequencies
Bar charts
Pie charts


Numerical Variables: can be described as


Frequency Distribution:
Is a simple and effective way of summarizing categorical data
This is done by counting the number of observations falling into each of the categories, or
levels of the variables.
Frequency distribution table: A table which involves a listing of all observed values of the
variable being studied and how many times each value is observed (table 1).


Relative Frequency:
The distribution of proportions or percentages of observations is called the relative frequency
distribution of the variable.
Given a total number of observations, relative frequency distribution is easily derived from
frequency distribution.
The third column of Table 1 shows the relative frequency distribution of birth weight

Cumulative frequency: Two other distributions are useful for describing ordinal data.
The cumulative frequency of a category is the number of observations in the category plus the observations in all categories smaller than it.
The cumulative relative frequency is the proportion of observations in the category plus the observations in all categories smaller than it, and is obtained by dividing the cumulative frequency by the total number of observations (table 1).

Table 1. Distribution of birth weight of newborns between 1976-1996 at X.

BWT         Freq.   Rel. freq. (%)   Cum. freq.   Cum. rel. freq. (%)
Very low       43         0.4             43             0.4
Low           793         8.0            836             8.4
Normal       8870        88.9           9706            97.3
Big           268         2.7           9974           100
Total        9974       100
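
The columns of Table 1 can be reproduced from the raw counts alone. The following is a minimal sketch; the function name `frequency_table` and the dictionary layout are illustrative assumptions, not part of the lecture note.

```python
def frequency_table(counts):
    """Build frequency, relative and cumulative columns from category counts.

    `counts` maps each category (listed in its natural order) to its frequency.
    """
    total = sum(counts.values())
    rows = []
    cumulative = 0
    for category, freq in counts.items():
        cumulative += freq
        rows.append({
            "category": category,
            "freq": freq,
            "rel_pct": round(100 * freq / total, 1),            # relative frequency (%)
            "cum_freq": cumulative,                             # cumulative frequency
            "cum_rel_pct": round(100 * cumulative / total, 1),  # cumulative relative frequency (%)
        })
    return rows


# Counts taken from Table 1 (birth weight categories, ordered from smallest to largest)
birth_weight = {"Very low": 43, "Low": 793, "Normal": 8870, "Big": 268}
table = frequency_table(birth_weight)

for row in table:
    print(row)
```

The categories must be supplied in their ordinal order, since the cumulative columns are only meaningful for ordered data.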


Bar graph(diagram)
Is easiest & most adaptable general purpose chart.
Though can be used for any type of series, it is especially satisfactory for nominal and
ordinal data.
Categories on X-axis at regular interval and corresponding frequencies on the Y-axis
(ordinate) in case of vertical bar diagram


Constructing bar graph:
All bars in single diagram should be of the same width & equal distances
Should rest on the same line called the base
Types of bar graph:
Simple bar graph: It is one-dimensional diagram in which the bar represents the whole of the

The height indicates frequency of figures



Table 2, Distribution of pediatric patients in X hospital
ward by type of admitting diagnosis Jan, 2000


Sub-divided (component or segmented) bar
If there are different quantities forming the subdivisions of the totals, simple bars may be
subdivided in the ratio of the various subdivisions to exhibit the relationship of the parts to
the whole.
The order in which the components are shown in a "bar" is followed in all bars used in the

Gender    Dss
          p     m     Card
M         5     9     5
F         6     2     7
Total    11    11    12

Multiple bar graph:
Multiple bar diagrams can be used to represent the relationships among more than two variables.
The following figure shows the relationship between children's reports of breathlessness and cigarette smoking by themselves and their parents.
We can see quickly from the graph that the prevalence of the symptom increases both with the child's smoking and with that of their parents.

Pie chart:
A pie chart shows the relative frequency for each category by dividing a circle into sectors, the angles of which are proportional to the relative frequency.
Steps to construct a pie chart:
Construct a frequency table
Change the frequency into percentage (P)
Change the percentages into degrees, where: degree = (P / 100) x 360°
Draw a circle and divide it accordingly
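
The conversion from frequencies to sector angles can be sketched as follows. The category counts here are hypothetical and serve only to illustrate the arithmetic; `pie_angles` is an assumed helper name.

```python
def pie_angles(counts):
    """Convert category frequencies to pie-chart sector angles in degrees."""
    total = sum(counts.values())
    # Each sector angle is the category's share of the full 360-degree circle.
    return {category: 360 * freq / total for category, freq in counts.items()}


# Hypothetical frequencies, not taken from the lecture note
example_counts = {"A": 50, "B": 30, "C": 20}
angles = pie_angles(example_counts)
print(angles)
```

The angles always sum to 360 degrees, which is a quick check that no category was left out.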


Table 3: Distribution of death for females, in England
and Wales, 1989.

Histograms are frequency distributions with continuous class intervals that have been turned into graphs.
To construct a histogram, draw interval boundaries on a horizontal line and frequencies on a vertical line.
Non-overlapping intervals that cover all of the data values must be used.
Bars are drawn over the intervals so that the areas of the bars are proportional to their interval frequencies.

Table 4: Distribution of the RBC cholinesterase values (µmol/min/ml) obtained from 35 workers exposed to pesticides

[Figure: Histogram of the RBC cholinesterase values of 35 pesticide-exposed workers. x-axis: RBC cholinesterase (µmol/min/ml), class boundaries 6.95 to 16.95; y-axis: number of pesticide-exposed workers.]
Frequency polygon:
A frequency distribution can be portrayed graphically in yet another way by means of a
frequency polygon.
To draw a frequency polygon we connect the mid-point of the tops of the cells of the histogram
by a straight line.
It can be also drawn without erecting rectangles as follows:


The scale should be marked in the numerical values of the mid-points of intervals.
Erect ordinates/y-axis/ on the mid-point of the interval-the length or altitude of an ordinate
representing the frequency of the class on whose mid-point it is erected.
Join the tops of the ordinates and extend the connecting line to the scale of sizes.

Cumulative frequency polygon (ogive curve):
Some times it may become necessary to know the number of items whose values are more or
less than a certain amount.
We may, for example, be interested in knowing the number of patients whose weight is less
than 50 Kg or more than say 60 Kg.

To get this information it is necessary to change the form of the frequency distribution from a simple to a cumulative distribution.
Use the lower true boundary on the x-axis.
The ogive curve turns a cumulative frequency distribution into a graph.
The ogive curve is used to find percentiles.


Table 5: Heart rate of patients admitted to Hospital B,


A quantitative measure of uncertainty: a measure of the degree of chance or likelihood of occurrence of an uncertain event.
A measure of the strength of belief in the occurrence of an uncertain event.


Mutually exclusive events: Events that
cannot occur together. E.g. event A=Male
and B=Pregnant are two mutually exclusive events

Independent events: The presence or

absence of one does not alter the
chance of the other being present.
Probability: If an event can occur in N
mutually exclusive and equally likely
ways, and if m of these possess a
characteristic E, the probability of the
occurrence of E is P(E) = m/N.

Properties and rules of probability:
1. A probability value must lie between 0 and 1: 0 ≤ P(E) ≤ 1 (or between 0% and 100%).
A value of 0 = the event cannot occur
A value of 1 = the event definitely will occur
A value of 0.5 = the probability that the event will occur is the same as the probability that it will not occur.
Probability is measured on a scale from 0 to 1.0

2. The sum of the probabilities of all mutually exclusive outcomes is equal to 1:
P(E1) + P(E2) + .... + P(En) = 1

3. For any two events A and B,
P(A or B) = P(A) + P(B) - P(A and B)
(Addition rule)
For two mutually exclusive events A and B,
P(A or B) = P(A) + P(B).

4. For any two independent events A and B,
P(A and B) = P(A) P(B).
(Multiplication rule)

To calculate the probability of event A and event B both happening (independent events):
For example, if you have two identical packs of cards (pack A and pack B, 52 cards each), what is the probability of drawing the ace of spades from both packs?
Formula: P(A and B) = P(A) x P(B)
P(A) = 1 card from a pack of 52 cards = 1/52 = 0.0192
P(B) = 1 card from a pack of 52 cards = 1/52 = 0.0192
P(A) x P(B) = 0.0192 x 0.0192 = 0.00037
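
The multiplication and addition rules can be checked with exact fractions. This is a small sketch; the second example (drawing an ace or a spade from a single pack) is an added illustration and does not appear in the lecture note.

```python
from fractions import Fraction

# Multiplication rule for independent events:
# ace of spades from pack A AND ace of spades from pack B.
p_a = Fraction(1, 52)
p_b = Fraction(1, 52)
p_both = p_a * p_b

# Addition rule for events that are NOT mutually exclusive (illustrative example):
# P(ace or spade) = P(ace) + P(spade) - P(ace and spade) in a single 52-card pack.
p_ace = Fraction(4, 52)
p_spade = Fraction(13, 52)
p_ace_and_spade = Fraction(1, 52)  # the ace of spades is counted in both events
p_ace_or_spade = p_ace + p_spade - p_ace_and_spade

print(p_both, float(p_both))
print(p_ace_or_spade, float(p_ace_or_spade))
```

Using `Fraction` avoids the rounding that appears when 1/52 is first truncated to 0.0192.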

Relative frequency probability: The probability of an event is the event's long-run relative frequency in repeated trials under similar conditions.
If some process is repeated n times, and some event E occurs m times, the relative frequency of E (m/n) will be approximately equal to the probability of E.

Symbolically: Pr(E) = m/n

E.g. Suppose that of 158 people who attended a dinner party, 99 were ill due to food poisoning. The probability of illness for a person selected at random is Pr(illness) = 99/158 = 0.63 or 63%


Randomness: We call a phenomenon random if:
The exact outcome is not predictable in advance.
Nonetheless, there is a predictable long term pattern that can be described by the distribution of
outcomes of many trials.

Subjective probability: measures the confidence that an individual has in the truth of a particular proposition.
E.g. If someone says that he is 95% certain that a cure for AIDS will be discovered within 5 years, then he means that Pr(discovery of cure for AIDS within 5 years) = 95%.
Although the subjective view of probability has enjoyed increased attention over the years, it has not been fully accepted by scientists.

Conditional probability: Sometimes the set of all possible outcomes may constitute a subset of the total group.
In other words, the interest is in determining the probability that an event A will occur, given that another event B has already taken place.
E.g. Pr(a person living to 65 years given that he is 60)
Symbolically: Pr(A|B)
Multiplication property (both A and B occur, dependent events):
Pr(A and B) = Pr(A) Pr(B|A)
Pr(B|A) = Pr(A and B) / Pr(A)
Pr(A|B) = Pr(A and B) / Pr(B)
If the two events are independent, then Pr(B|A) = Pr(B) and Pr(A|B) = Pr(A). Therefore Pr(A and B) = Pr(A) Pr(B)

Motivational examples
Calculating probability of an event

Table: Frequency of cocaine use by gender among adult cocaine users

Lifetime frequency of cocaine use    Male   Female   Total
1-19 times                             32        7      39
20-99 times                            18       20      38
more than 100 times                    25        9      34
Total                                  75       36     111
1. What is the probability that a randomly picked person is male?
2. What is the probability that a randomly picked person has used cocaine more than 100 times?
3. Given that the selected person is male, what is the probability that he has used cocaine more than 100 times?
4. Given that the selected person is female, what is the probability that she has used cocaine less than 100 times?
5. What is the probability that a randomly picked person is male and has used cocaine more than 100 times?
1. Pr(m) = total adult males / total adult cocaine users = 75/111 = 0.68.
2. Pr(c>100) = all adults who used cocaine more than 100 times / total adult cocaine users = 34/111 = 0.31.
3. Pr(c>100 | m) = 25/75 = 0.33.
4. Pr(c<100 | f) = (7+20)/36 = 27/36 = 0.75.
5. Pr(m and c>100) = Pr(m) Pr(c>100 | m) = (75/111)(25/75) = 25/111 = 0.23.
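
The five answers can be recomputed directly from the cell counts of the table. This is a minimal sketch; the dictionary keys and variable names are illustrative assumptions.

```python
# Cell counts from the cocaine-use table (rows: lifetime frequency, columns: gender)
counts = {
    "1-19": {"male": 32, "female": 7},
    "20-99": {"male": 18, "female": 20},
    "100+": {"male": 25, "female": 9},
}

total = sum(sum(row.values()) for row in counts.values())
males = sum(row["male"] for row in counts.values())
females = sum(row["female"] for row in counts.values())

p_male = males / total                                   # question 1
p_heavy = sum(counts["100+"].values()) / total           # question 2
p_heavy_given_male = counts["100+"]["male"] / males      # question 3
p_light_given_female = (counts["1-19"]["female"] + counts["20-99"]["female"]) / females  # question 4
p_male_and_heavy = counts["100+"]["male"] / total        # question 5

# Multiplication rule for dependent events: Pr(A and B) = Pr(A) * Pr(B | A)
assert abs(p_male_and_heavy - p_male * p_heavy_given_male) < 1e-12

print(round(p_male, 2), round(p_heavy, 2), round(p_heavy_given_male, 2),
      round(p_light_given_female, 2), round(p_male_and_heavy, 2))
```

Gender and frequency of use are not independent in this table, so multiplying Pr(m) by the unconditional Pr(c>100) would not give the joint probability.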

The term probability distribution or just distribution
refers to the way data are distributed, in order to
draw conclusions about a set of data.

Distribution of a random variable can be displayed

by a table or graph or mathematical formula.


1. Application of probability of
categorical variables:
We can estimate prevalence of certain
events in a given population.
Disease (TB, DM, heart disease)
Certain characteristics (High BP, low birth weight)
Certain behavior (Smoking, drug use, condom use ).

a. Binomial distribution:
Dichotomous variable: a nominal variable with only
two possible values
The two mutually exclusive outcomes - referred as
failure and success.
A random variable of this type is known as a Bernoulli random variable.
E.g. Let X represent smoking status; X=1 for a smoker and X=0 for a non-smoker. The two outcomes are mutually exclusive.
Take the case of the USA: in 1987, 29% of the adults in the USA were smokers, therefore Pr(X=1) = 0.29 and Pr(X=0) = 1-0.29 = 0.71.
Suppose we randomly select two individuals in the USA and note the smoking status of the two persons.
What is the probability that:
Both are non-smokers?
One is a smoker?
Both are smokers?
If Pr(X=1) = p and Pr(X=0) = 1-p, then the above can be calculated using the multiplication rule.

Outcome of X
Person 1   Person 2   Probability                          No. of smokers
0          0          (1-p)(1-p) = 0.71 x 0.71 = 0.50      0
0          1          (1-p) p    = 0.71 x 0.29 = 0.21      1
1          0          p (1-p)    = 0.29 x 0.71 = 0.21      1
1          1          p p        = 0.29 x 0.29 = 0.08      2

In general the binomial distribution involves three assumptions:
There is a fixed number n of trials, each of which results in one of two mutually exclusive outcomes.
The outcomes of the n trials are independent.
The probability of success is constant for each trial.

Pr(X = success) = Pr(X=1) = p
Pr(X = failure) = Pr(X=0) = 1-p
The probability that exactly x successes occur in n trials is:
$$\Pr(X = x) = \frac{n!}{x!\,(n-x)!}\, p^{x} (1-p)^{n-x}$$

E.g. Suppose in a certain population 52% of all recorded births are males.
If we randomly select 10 birth records, what is the probability that:
Exactly 5 will be males?
Given n = 10, x = 5 and p = 0.52:
$$\Pr(X = 5) = \frac{10!}{5!\,(10-5)!}\,(0.52)^{5}(1-0.52)^{10-5} = 0.24$$

3 or more will be females? Here X counts females, so p = 0.48:
Pr(X ≥ 3) = 1 - Pr(X < 3) = 1 - [Pr(X=0) + Pr(X=1) + Pr(X=2)]
= 1 - [0.001 + 0.013 + 0.055] = 1 - 0.069 = 0.931
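
The binomial calculations above can be verified numerically. The sketch below assumes a helper named `binomial_pmf`, which is not part of the lecture note, and uses `math.comb` from the Python standard library for the binomial coefficient.

```python
from math import comb


def binomial_pmf(x, n, p):
    """Probability of exactly x successes in n independent trials with success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


# Exactly 5 males among 10 birth records, with Pr(male) = 0.52
p_five_males = binomial_pmf(5, 10, 0.52)

# 3 or more females among 10 birth records, with Pr(female) = 1 - 0.52 = 0.48
p_three_or_more_females = 1 - sum(binomial_pmf(k, 10, 0.48) for k in range(3))

print(round(p_five_males, 2))
print(round(p_three_or_more_females, 2))
```

Working with unrounded terms gives about 0.930 for the second answer; the 0.931 in the slide comes from rounding each term to three decimals before subtracting.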

2. Probability distribution of numeric variables:

Probability distribution of a discrete random variable:
To construct the probability distribution for X we list each of the values x the variable assumes and its associated probability (relative frequency).
E.g. Let X be the birth order of all live births in the USA in 1986; then the probability distribution of X can be shown by a table or a graph as shown below.
The probability distribution of X can be used to describe the possible outcomes of the random variable.
What is the probability that a randomly chosen newborn is a 4th child?
P(X=4) = 0.058
What is the probability that an infant is a 1st or 2nd child?
P(X=1 or X=2) = P(X=1) + P(X=2)
= 0.416 + 0.330 = 0.746


Probability distribution of continuous variables:
The outcome of a random variable may not be
limited to categories or counts.
E.g. Suppose, X represents the continuous variable Height;
X can assume an infinite number of intermediate values
170.1, 170.2, 170.3 etc.

The probability associated with any particular one value is

almost equal to zero.

However the probability that X will assume some value in

the interval enclosed by two ranges say x1 and x2 is a
value greater than zero
Characteristics of a distribution: Features commonly used to describe a distribution are location, dispersion, modality and skewness.
Location tells us something about the average value of the variable.
Dispersion tells us something about how spread out the values of the variable are.
Modality refers to the number of peaks in the distribution.
Skewness refers to whether or not the distribution is symmetric.
A distribution is said to be symmetric if its values are distributed symmetrically about its mode.
The Normal Distribution:
Is used extensively in the analysis of continuous variables and has an especially important role in statistics.
The normal distribution is unimodal and symmetric.
It is completely described by two parameters, μ and σ.
μ can be any number (negative, positive or 0); σ must be a positive number.
μ defines the location and σ defines the dispersion of the distribution about the mean.
For any normal distribution:
About 68% of the observations are contained within one SD of the mean, 95% within two SDs and 99.7% within three SDs of the mean.
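
These coverage figures follow from the normal cumulative distribution function. A minimal sketch using only the standard library; `within_k_sd` is an assumed helper name.

```python
from math import erf, sqrt


def within_k_sd(k):
    """Probability that a normal variable falls within k standard deviations of its mean."""
    # For Z ~ N(0, 1): P(-k < Z < k) = erf(k / sqrt(2)); the result does not depend on mu or sigma.
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(within_k_sd(k), 4))
```

Because the result depends only on k, the same three figures hold for every normal distribution regardless of its mean and standard deviation.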

The standard normal distribution:
Since a normal distribution could have any of an infinite number of possible values for its mean and SD, it is impossible to tabulate the area associated with each and every normal curve.
Instead, only a single curve, for which μ = 0 and σ = 1, is tabulated.
This curve is called the standard normal distribution.

What is Sampling?
Sampling is a process of choosing a section of the population for observation and study.
Why Sample?
Cost in terms of money, time and manpower
To produce more accurate data


Sampling bias
Selection of non representative sample
Failure to weight analysis of unequal probability
Sampling error
Difference between survey result and population value
Sample size
Sampling scheme


Population and hierarchy of

Study subjects
The actual participants in the study
Subjects who are selected
Sampling Frame
The list of potential subjects from which the sample is drawn
Source population
The Population from whom the study subjects would be obtained
Target population
The population to whom the results would be applied


There are two broad categories of sampling:
probability and non-probability methods.

In probability sampling, every individual may

be selected into the sample with a known
(non-zero) probability.
Non-probability sampling is not based upon
the statistical principles which govern
probability sampling.
Simple Random Sampling:
Use of lottery
Use of random numbers
Use of computer programs
Advantages of simple random sampling:
No bias, i.e. no tendency to have too high or too low a statistic when you take many samples.
Small variability (of the values of the statistic from sample to sample).
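
Drawing a simple random sample with a computer program can be sketched as follows. The sampling frame of ID numbers and the function name `simple_random_sample` are hypothetical; `random.Random.sample` is the standard-library call for sampling without replacement.

```python
import random


def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the sampling frame, each with equal probability."""
    rng = random.Random(seed)    # a seed makes the draw reproducible
    return rng.sample(frame, n)  # sampling without replacement


# Hypothetical sampling frame: ID numbers 1 to 100
frame = list(range(1, 101))
sample = simple_random_sample(frame, 10, seed=42)
print(sorted(sample))
```

Recording the seed lets another investigator reproduce exactly the same selection from the same frame.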

Systematic Sampling
k is the sampling interval (the standard distance between individuals), obtained by dividing the number in the frame by the sample size.
Every kth individual in the frame is selected for inclusion in the sample.
Systematic sampling involves a danger if the list of individuals has some periodicity or some pattern.
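
The selection of every kth individual can be sketched as below. The random starting point within the first interval is a common convention assumed here, not something stated in the lecture note; the frame and the function name are also hypothetical.

```python
import random


def systematic_sample(frame, n, seed=None):
    """Select n units by taking every kth unit after a random start within the first interval."""
    k = len(frame) // n  # sampling interval: frame size divided by sample size
    if k < 1:
        raise ValueError("sample size cannot exceed the size of the frame")
    rng = random.Random(seed)
    start = rng.randrange(k)  # assumed convention: random start in positions 0 .. k-1
    return [frame[start + i * k] for i in range(n)]


# Hypothetical frame of 100 units, sample of 10, so k = 10
frame = list(range(100))
sample = systematic_sample(frame, 10, seed=1)
print(sample)
```

If the frame is ordered with a period equal to k (for example, every 10th household is a corner house), all selected units share that feature, which is the danger the slide warns about.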

Advantages of SS:
Requires no frame,
Easier to perform,
Very good when the population is homogeneously distributed,
Makes geographical spread certain if the units are in geographical order


Stratified Sampling
Sampling error is reduced by two factors:
1. Large sample size
2. Homogeneous population
Stratified sampling is based on the
assumption that samples are drawn from
homogeneous population
The choice of stratifying variables depends
on the investigator (variables you want to
represent accurately)
Cluster Sampling:
Involves selection of groups called clusters, followed by selection of individuals within each selected cluster.
Can be used when it is either impossible or impractical to compile an exhaustive list of individuals.
Cluster sampling is recommended for its efficiency; however, accuracy is lower because, unlike SRS, it is subject to more than one sampling error.
Multistage cluster sampling
Multistage cluster sampling involves the repetition of two basic steps: listing and sampling.
List the primary sampling units,
then select a sample of those units by simple random sampling.
List the secondary sampling units,
then again select a sample of those units by SRS.
And so forth. Finally, the sample size is multiplied by the number of stages used (which we call the design effect).
Sampling frame:
Sampling frames are an integral part of
probability sampling.
Developing a sampling frame
determining whether and where members of the
particular group tend to gather
identify location and estimate sizes of these potential
target group that serve as a sampling frame
Household surveys and institutional based
surveys are simple or readily available for
developing sampling frame.

In general they do not produce representative data for the larger population.
They are cheaper and easier, and are used when:
Probability sampling methods are not feasible,
Good for pretests, pilot studies, in-depth
Precise representativeness is not necessary

Purposive (Judgmental): Selection of subjects
on basis of their knowledge of the
Convenience (Reliance): Selection of a sample based on easy accessibility
Quota: selection of samples from groups based on a fixed quota.
In general these are examples of unacceptable/biased sampling methods.

SAMPLE SIZE
How large a sample?
Too large:
Unnecessary involvement of extra subjects
High cost
Time constraints
Too small:
Not able to show biological/social effect
False conclusion
Needs to be not less than 30 or >25% of the population

On the basis of your answers to the above five questions, you can calculate the sample size needed to measure a given proportion with a given degree of accuracy at a given level of statistical significance by using a simple formula, provided that the total population size is greater than 10,000:
$$n = \frac{z^{2} p q}{d^{2}}$$

n: the desired sample size (when the population is greater than 10,000)
z: the standard normal deviate, usually set at 1.96 (or more simply at 2.0), which corresponds to the 95 percent confidence level.
p: the proportion in the target population estimated to have a particular characteristic. If there is no reasonable estimate, then use 50 percent (0.50).
q = 1.0 - p
d: degree of accuracy desired, usually set at 0.05 or occasionally at 0.02
E.g. If the proportion of a target population with a certain characteristic is 0.50, the z statistic is 1.96, and we desire accuracy at the 0.05 level, then the sample size is:
$$n = \frac{(1.96)^{2}(0.50)(0.50)}{(0.05)^{2}} \approx 384$$

If N (the entire population) is less than 10,000, the required sample size will be smaller. In such cases, calculate a final sample estimate (nf) by using the following formula:
$$n_f = \frac{n}{1 + n/N}$$

nf: the desired sample size (when the population is less than 10,000)
n: the desired sample size (when the population is more than 10,000)
N: the estimate of the population size.
E.g. If n were found to be 400 and the population size were estimated at 1,000, then nf would be calculated as follows:
$$n_f = \frac{400}{1 + 400/1000} = \frac{400}{1.4} \approx 286$$
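
Both sample size calculations can be sketched in a few lines. The formulas used here, n = z²pq/d² and nf = n / (1 + n/N), are the standard ones consistent with the symbols defined above; since the slide images carrying the original formulas are not available, treat them as an assumption. The function names are illustrative.

```python
def sample_size(p, d, z=1.96):
    """Sample size for estimating a proportion when the population exceeds 10,000.

    Assumed formula: n = z^2 * p * q / d^2, with q = 1 - p.
    """
    q = 1.0 - p
    return z**2 * p * q / d**2


def finite_population_sample_size(n, N):
    """Adjusted sample size when the population N is below 10,000.

    Assumed formula: nf = n / (1 + n / N).
    """
    return n / (1 + n / N)


n = sample_size(p=0.50, d=0.05)               # about 384
nf = finite_population_sample_size(400, 1000)  # about 286
print(round(n), round(nf))
```

Using p = 0.50 gives the largest possible value of pq, which is why it is the safe default when no prior estimate of the proportion exists.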

Population: the largest collection of entities of interest
Sample: part of the population
Parameter: a characteristic of the population (a fixed number whose value we do not know).
Statistic: a characteristic of a sample (its value is known, and can change from sample to sample).
We often use a statistic to estimate an unknown parameter.
What is a Sampling Distribution?
In order to make an inference, one has to know some assumptions about the statistic.
The sampling distribution is defined as the distribution of all possible values which can be taken by the statistic.
It is formed by calculating the statistic for each of the samples of the same size drawn randomly.
The frequency distribution of the statistic over all these samples forms the sampling distribution.
E.g. age at the time of death, US population 1979-1981.


Three things about sampling distribution:
Its mean
Its variance
Its shape
Due to sampling variation, different samples from the same
population will have different sample means.
If we repeatedly take samples of size n from a population,
the means of those samples form the sampling
distribution of the mean for samples of size n.
In practice we do not take repeated samples from a
population but it is necessary

Empirical rule: for variables with a normal
(bell-shaped) distribution:
~68% of the values fall within +/- 1 standard
deviation of the mean.
~95% of the values fall within +/- 2 standard
deviations of the mean.

1. Sampling Distribution of the mean:
Suppose we choose a random sample of size n. The
sampling distribution of the sample mean
$\bar{x}$ possesses the following properties:
The sample mean $\bar{x}$ is an estimate of the
population mean $\mu$.
The standard deviation of $\bar{x}$ is $\sigma/\sqrt{n}$ (called the standard
error of the mean).
Provided n is large enough, the shape of the sampling
distribution of $\bar{x}$ is approximately normal.
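
The properties of the sampling distribution of the mean can be checked empirically. The sketch below is a simulation under assumed settings: a hypothetical skewed population of 100,000 values, a sample size of 40 and 5,000 repeated samples. None of these numbers come from the lecture note.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical skewed population (exponential with mean about 50)
population = [random.expovariate(1 / 50) for _ in range(100_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n = 40              # size of each sample (assumed)
num_samples = 5000  # number of repeated samples (assumed)
sample_means = [
    statistics.fmean(random.sample(population, n)) for _ in range(num_samples)
]

mean_of_means = statistics.fmean(sample_means)   # should be close to mu
sd_of_means = statistics.stdev(sample_means)     # should be close to sigma/sqrt(n)
theoretical_se = sigma / n ** 0.5

print(f"population mean = {mu:.2f}, mean of sample means = {mean_of_means:.2f}")
print(f"sigma/sqrt(n) = {theoretical_se:.2f}, SD of sample means = {sd_of_means:.2f}")
```

Even though the population here is far from normal, the sample means cluster around the population mean with a spread close to the standard error, which is the behaviour the three properties describe.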

2. Sampling Distribution of the proportion:
Suppose we choose a random sample of size n. The
sampling distribution of the sample proportion p possesses
the following properties:
The sample proportion p is an estimate of
the population proportion P ($\pi$).
The standard deviation of p is $\sqrt{p(1-p)/n}$
(called the standard error of the proportion).
Provided n is large enough, the shape of the sampling
distribution of p is approximately normal.

3. Standard deviation and Standard error:
Standard deviation is a measure of variability
between individual observations
(a descriptive index relevant to the mean).
Standard error refers to the variability of
summary statistics (e.g. the variability of the
sample mean or sample proportion).
Standard error is a measure of uncertainty in
a sample statistic, i.e. the precision of the
estimator.

Statistical Estimation:
Estimation is the use of sample statistics to estimate
population parameters.
Two types of estimation:
Point estimation
Interval estimation

Types of estimation:
1. Point estimation: A single numerical value used to estimate the corresponding population
parameter.
The sample mean, $\bar{x}$, is a point estimator of the population
mean, $\mu$.
The sample standard deviation, S, is a point estimator of the
population standard deviation, $\sigma$.
The sample proportion, p, is an estimator of the population
proportion, P ($\pi$).
The sample correlation coefficient, r, is an estimator of the
population correlation coefficient, $\rho$.

2. Interval estimation: Consists of two numerical values that, with a specified
degree of confidence, we believe include the parameter. The interval is called a
Confidence Interval.
Confidence Interval (CI) estimate of a parameter:
The probability that the method produces an interval that contains the
parameter is called the confidence level. The most commonly used confidence levels are
close to 1, such as 0.95 or 0.99.
CI = Estimator ± (reliability coefficient) x (standard error)
The product of the reliability coefficient and the standard error of the estimator is called the
precision of the estimate or the margin of error (d).
CI = point estimate ± margin of error
$$ \bar{X} - Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} < \mu < \bar{X} + Z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$

The reliability coefficient is the value of
$Z_{1-\alpha/2}$ corresponding to the confidence level.
With probability 0.95, the sample proportion falls within
1.96 standard errors of the population proportion (1.96 = reliability coefficient = z-score).
Two questions to put bounds on our point estimate:
How wide does the bracket have to be?
What is our tolerance of error (variability, not mistake)?
We usually accept a 5% chance that the range will not include
the true population value.
The interval is then called a 95% confidence interval.

Reliability Coefficient
Confidence level    $\alpha$-value    Z-value
90%                 0.10              1.645
95%                 0.05              1.96
99%                 0.01              2.58
The confidence interval is central and symmetric
around the sample mean, so that there is a 100($\alpha$/2)%
chance that the parameter is more than the upper limit,
and a 100($\alpha$/2)% chance that it is less than the lower limit.

Standard error: Another term for the standard
deviation of a sampling distribution.
Used in hypothesis testing and the calculation of
confidence intervals.
The most frequently used calculations are:
1. Comparing a sample mean with a population mean
(large samples): $SE = \sigma/\sqrt{n}$
2. Single proportion (large samples): $SE = \sqrt{p(1-p)/n}$, where p =
proportion and n = sample size.

Example: What percentage of 18-22 year-old
Americans report being very happy?
Recent data: 35 of n = 164 say they are very
happy (others report being pretty happy or not too
happy).
$\hat{\pi} = 35/164 = 0.213$ (0.31 for all ages),
$se = \sqrt{\hat{\pi}(1-\hat{\pi})/n} = \sqrt{0.213(0.787)/164} = 0.032$
Find the 95% CI:
$0.213 \pm 1.96(0.032)$, or $0.213 \pm 0.063$ = (0.15, 0.28)
(i.e., margin of error = 0.063)
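
The interval in this example can be recomputed with the sketch below. The helper name `proportion_ci` is hypothetical, and the reliability coefficient is taken from the standard normal distribution via `statistics.NormalDist` rather than from a printed table.

```python
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Large-sample CI for a proportion: p_hat +/- z * sqrt(p_hat(1 - p_hat)/n)."""
    p_hat = successes / n
    se = (p_hat * (1 - p_hat) / n) ** 0.5
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95%
    margin = z * se
    return p_hat, se, (p_hat - margin, p_hat + margin)


p_hat, se, (low, high) = proportion_ci(35, 164)
print(f"p_hat = {p_hat:.3f}, se = {se:.3f}, 95% CI = ({low:.2f}, {high:.2f})")
```

Raising `confidence` to 0.99 widens the interval, and increasing `n` for the same proportion narrows it.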


We're 95% confident the population proportion who
are very happy is between 0.15 and 0.28.
Greater confidence requires a wider CI.
A greater sample size gives a narrower CI.
If we repeatedly took random samples of size n and
each time calculated a 95% CI, about 95% of the CIs
would contain the population proportion $\pi$.
The probability that the CI does not contain $\pi$ is called
the error probability, and is denoted by $\alpha$. $\alpha$ = 1 -
confidence coefficient.

Estimation of $\mu$ using Confidence Intervals
a. Known variance (large sample):
A 100(1-$\alpha$)% C.I. for $\mu$ is:
$$ \bar{x} \pm z_{1-\alpha/2}\,\frac{\sigma}{\sqrt{n}} $$
Example: A physical therapist wished to estimate, with
99% confidence, the mean maximal strength of a
particular muscle in a certain group of individuals. He
assumes that strength scores are approximately
normally distributed with a variance of 144. A sample
of 15 subjects who participated in the experiment
yielded a mean of 84.3.
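
One way to work this example, assuming the known-variance interval (sample mean plus or minus z times sigma over root n) and the tabulated 99% reliability coefficient of 2.58, is sketched below. The function name `z_interval` is illustrative only.

```python
def z_interval(mean, sigma, n, z):
    """CI for a mean with known population SD: mean +/- z * sigma / sqrt(n)."""
    margin = z * sigma / n ** 0.5
    return mean - margin, mean + margin


# sigma is the square root of the stated variance (144); z = 2.58 for 99% confidence
low, high = z_interval(mean=84.3, sigma=144 ** 0.5, n=15, z=2.58)
print(f"99% CI: ({low:.1f}, {high:.1f})")
```

Under these assumptions the interval runs from roughly 76.3 to 92.3.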

B. Unknown variance (small sample size, n < 30):
A 100(1-$\alpha$)% C.I. for $\mu$ is:
The reliability coefficient for the CI comes from the t distribution
with n-1 degrees of freedom.
$$ \bar{x} \pm t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} = \left( \bar{x} - t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\,\frac{s}{\sqrt{n}} \right) $$
The density curve of the t distribution is bell shaped
and symmetrical about zero.

Example: A study of hypoxemia during the immediate
postoperative period reported the fractions of ideal
weight for 11 patients who became severely
hypoxemic during transfer to the recovery room.
The mean is 1.51 and the standard deviation is 0.33.
Estimate the 95% C.I. for the population mean
fraction of ideal weight, where the population
consists of hypoxemic patients similar to those in
the study (the data are normally distributed; use
the t distribution).
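
A minimal sketch of this calculation follows, assuming the small-sample interval (sample mean plus or minus t times s over root n). The critical value 2.228 is the standard tabulated t value for a two-sided 95% interval with 10 degrees of freedom; it is hard-coded because the Python standard library has no t distribution. The name `t_interval` is hypothetical.

```python
def t_interval(mean, sd, n, t_crit):
    """CI for a mean with unknown population SD: mean +/- t * s / sqrt(n)."""
    margin = t_crit * sd / n ** 0.5
    return mean - margin, mean + margin


# t_crit = 2.228: tabulated t for alpha/2 = 0.025 and df = n - 1 = 10
low, high = t_interval(mean=1.51, sd=0.33, n=11, t_crit=2.228)
print(f"95% CI: ({low:.2f}, {high:.2f})")
```

With these inputs the interval is approximately 1.29 to 1.73.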

A hypothesis is an unproved theory that is formulated
as a starting point for an investigation, for
example, 'patients who take drug A will have
better outcomes than those who take drug B'.
Statistical hypotheses are hypotheses that are stated
in such a way that they may be evaluated by
appropriate statistical techniques.
Hypothesis testing is the form of statistical inference
used to judge the truthfulness of certain
preconceived statements concerning the value of a
population parameter.

Steps in Hypothesis Testing
Hypothesis testing involves the following steps:
1. State the research question in terms of statistical hypotheses
(null and alternative hypotheses).
2. Select a sample and collect data.
3. Decide on the appropriate test statistic (Z, t, $\chi^2$, F, etc.).
4. Select the level of significance ($\alpha$ = 0.05, 0.01, 0.1, etc.).
5. Determine the critical value (label the rejection and "acceptance"
regions).
6. Perform the calculation.
7. Draw and state the conclusion.

The Null and Alternative Hypothesis:
The null hypothesis and the alternative hypothesis are
mutually exclusive and complementary.
Null hypothesis (H0):
States that there is no significant
difference among the events.
Implies conclusions like: A = B
No effect
No difference
No association
E.g. Drug A has no effect on the blood glucose level of diabetic
patients.

Alternative hypothesis (HA or H1), or experimental hypothesis:
The hypothesis that will be accepted if H0 is rejected.
Implies conclusions like: A $\neq$ B
is not equal,
has an effect,
there is a difference, and
there is an association.
E.g. Drug A has an effect on the blood glucose level of
diabetic patients.

Test Statistic:
Hypothesis testing is done by calculating the probability
of getting the estimated value, given that the hypothesized value
is true.
If the probability is very low, we reject the null hypothesis.
The probability is calculated using a test statistic.
A test statistic is a quantity that measures the discrepancy
between the null hypothesis and the sample data.
The most commonly used test statistics are Z, Student's t, $\chi^2$ and
F.
The general formula to calculate a test statistic is:
$$ \text{test statistic} = \frac{\text{estimate} - \text{hypothesized value}}{\text{standard error of the estimate}} $$

Errors in Hypothesis Testing
Two types of errors can be committed: Type I and Type II

                   Decision of the hypothesis testing
Null hypothesis    Accept H0        Reject H0
H0 true            Correct          Type I error
H0 false           Type II error    Correct

The probability of committing a type I error is denoted as $\alpha$.
It is also called the level of significance.
The probability of committing a type II error is denoted as $\beta$.

One and Two Tailed Hypothesis:
Some hypotheses test whether one value is different
from another or not, without additionally
predicting which will be higher:
Non-directional or two-tailed test.
At times, hypotheses test not only the difference of
one value from the other but also the direction of the
difference, i.e. whether it would be lower or higher:
Directional or one-tailed test.

Level of Significance, Critical Values and Critical Area:
In practice, the level of significance ($\alpha$) is
chosen arbitrarily. Three levels are commonly used: 0.01, 0.05, or
0.10 (depending on the confidence level).
The smaller the level of significance, the smaller the risk of
wrongly rejecting a true null hypothesis.
The level of significance determines the value of the test
statistic that would cause us to reject the
null hypothesis.
The test statistic values corresponding to the
level of significance are called the Critical
Values.
In a probability distribution, the area which is
left at the extreme right and/or left is called
the Critical Area (Rejection area).
The area between the two critical values is
called the Acceptance Area.

A level of significance has different critical values for
one- and two-tailed tests.
A level of significance of 0.05 has critical values of
$\pm 1.96$ if the test is two-tailed.
However, if the test is one-tailed, the critical value
is 1.64 in absolute value, placed in whichever tail the alternative hypothesis specifies.

$\alpha$ (level of    Two-tailed    One-tailed    One-tailed
significance)    test          test, <       test, >
0.10             +/-1.64       -1.28         1.28
0.05             +/-1.96       -1.64         1.64
0.01             +/-2.58       -2.33         2.33
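
The critical values in this table can be regenerated from the standard normal distribution. The sketch below uses `statistics.NormalDist` from the Python standard library; the helper name `z_critical` is an assumption made for illustration. Computed values carry more decimals than the table, for example 1.645 where the table shows 1.64.

```python
from statistics import NormalDist


def z_critical(alpha, tails=2):
    """Upper critical value of the standard normal for a one- or two-tailed test."""
    tail_area = alpha / 2 if tails == 2 else alpha
    return NormalDist().inv_cdf(1 - tail_area)


for alpha in (0.10, 0.05, 0.01):
    two_tailed = z_critical(alpha, tails=2)
    one_tailed = z_critical(alpha, tails=1)
    print(f"alpha = {alpha:.2f}: two-tailed +/-{two_tailed:.3f}, one-tailed {one_tailed:.3f}")
```

For a lower-tailed test the critical value is the negative of the one-tailed value, by the symmetry of the normal curve.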

Draw and state the conclusion
If the numerical value of the test statistic falls in the
rejection region, we reject the null hypothesis and
conclude that the alternative hypothesis is true. The testing
process will lead to this conclusion incorrectly only
100$\alpha$% of the time when H0 is true.
If the test statistic does not fall in the rejection
region, we do not reject H0.

Another way to state the conclusion:
Reject the null hypothesis if P $\le \alpha$.
Don't reject ("accept") the null hypothesis if P > $\alpha$.
P is the probability, if the null hypothesis is true, of getting a sample statistic at least as
extreme (in the direction supporting HA) as the actually calculated statistic, i.e. the
probability of a value greater than or equal to the calculated value of the test
statistic when H0 is true.
It is the smallest value of $\alpha$ for which H0 can be rejected.

Hypotheses on common population parameters:
1. Hypothesis test about a population mean
A. Known variance (large sample)
Shows how to test the null hypothesis that the
population mean is equal to some hypothesized
value.
The Z test and the t test are used:
Sample > 30: Z test
Sample < 30 and population SD known: Z test
Sample < 30 and population SD unknown: t test

Test of Hypothesis about Single Population Mean
Known variance: $Z = \dfrac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Unknown variance: $t = \dfrac{\bar{x} - \mu_0}{S/\sqrt{n}}$
Example 1: Researchers are interested in the mean
level of some enzyme in a certain population. They
want to know whether they can conclude that the mean
enzyme level in this population is different from 25.

Step 1: State the hypotheses: H0: $\mu$ = 25; H1: $\mu \neq$ 25
Step 2: They collect a sample of size 10 from a normally
distributed population with a known variance, $\sigma^2$ = 45,
and the calculated sample mean is $\bar{x}$ = 22.
Step 3: Select the appropriate test statistic.
The assumptions given are:
Testing a hypothesis about a population mean
The population is normally distributed
The population variance is known
So, the Z statistic is appropriate.
Step 4: Level of significance: $\alpha$ = 0.05
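Step 5: Critical value the test statistic must attain to
be declared significant: for a two-tailed test at $\alpha$ = 0.05, the critical values are $\pm 1.96$.
Step 6: Perform the calculation:
$$ Z = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{22 - 25}{\sqrt{45}/\sqrt{10}} = \frac{-3}{2.12} = -1.41 $$
Step 7: Since -1.41 falls in the acceptance region, we
do not reject ("accept") the null hypothesis. We cannot conclude that the mean enzyme level in
the population is different from 25.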
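
The seven steps of the enzyme example can be condensed into a few lines of Python. This is a sketch under the example's stated assumptions (normal population, known variance, two-tailed test); the function name `one_sample_z_test` is hypothetical, and the two-sided P value is an addition computed from the standard normal distribution.

```python
from statistics import NormalDist


def one_sample_z_test(sample_mean, mu0, variance, n, alpha=0.05):
    """Two-tailed Z test for a single mean with known population variance."""
    se = (variance / n) ** 0.5                       # sigma / sqrt(n)
    z = (sample_mean - mu0) / se
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # 1.96 for alpha = 0.05
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))     # two-sided P value
    return z, z_crit, p_value, abs(z) > z_crit


z, z_crit, p_value, reject = one_sample_z_test(22, 25, 45, 10)
print(f"z = {z:.2f}, critical = +/-{z_crit:.2f}, p = {p_value:.3f}, reject H0: {reject}")
```

The two decision rules agree: the statistic lies inside the critical values and the P value exceeds the level of significance, so H0 is not rejected.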

Test of Hypothesis of Single Population Mean
b. Unknown variance (small sample size, n < 30):
The test statistic follows the t distribution with n-1 degrees of freedom:
$$ t = \frac{\bar{x} - \mu_0}{S/\sqrt{n}} $$
Example: Serum amylase level determination was made on a
sample of 15 apparently healthy subjects. The
sample yielded a mean of 96 units/100 ml and a
standard deviation of 35 units/100 ml.
The variance of the population was unknown. We
want to know whether we can conclude that the
mean of the population is different from 120
units/100 ml.
Step 1 and 2: Define H0 and H1. A sample was
selected. H0: $\mu$ = 120; H1: $\mu \neq$ 120
Step 3: Decide on the appropriate test statistic: t test
Step 4 and 5: Decide the level of significance and the critical
value: $\alpha$ = 0.05.
The t value for $\alpha/2$ = 0.025 at df = 14 is 2.145.

Step 6: Obtain the value of the test statistic:
$$ t = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} = \frac{96 - 120}{35/\sqrt{15}} \approx -2.65 $$
Step 7: Make a decision and interpret it.
We reject the null hypothesis because:
The calculated test statistic, -2.65, is in the rejection area (|t| > 2.145).
The corresponding one-tailed P value (between 0.005 and 0.01) is less than the $\alpha/2$ value of 0.025.
We can conclude that the mean of the population is different from 120 units/100 ml.
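
The serum amylase calculation can be checked with the sketch below. The critical value 2.145 is taken from the text, since the Python standard library does not provide the t distribution; the function name `one_sample_t` is illustrative. The unrounded statistic is about -2.656, which the notes report as -2.65.

```python
def one_sample_t(sample_mean, mu0, sd, n):
    """t statistic for a single mean: (x_bar - mu0) / (s / sqrt(n))."""
    return (sample_mean - mu0) / (sd / n ** 0.5)


t_stat = one_sample_t(sample_mean=96, mu0=120, sd=35, n=15)
t_crit = 2.145  # tabulated t for alpha/2 = 0.025, df = 14 (value from the text)
reject = abs(t_stat) > t_crit
print(f"t = {t_stat:.3f}, critical = +/-{t_crit}, reject H0: {reject}")
```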


Test of Hypothesis of Single Population Proportion
The null hypothesis states that the population proportion is
equal to some hypothesized value.
One begins with a statement that claims a particular
value for the unknown population proportion.
The hypothesis test for a single population
proportion either accepts or rejects this statement.
Here the Z test statistic is used. The formula is given as:
$$ Z = \frac{p - \pi_0}{\sqrt{\pi_0 (1 - \pi_0)/n}} $$

Example:
A survey was conducted to determine the prevalence
of protein energy malnutrition (PEM) in a rural kebele. Of
300 under-five children assessed, 123 were stunted.
Can we conclude that the prevalence of PEM in the
population is 50%?

Step 1 and 2: Define H0 and HA:
H0: $\pi$ = 0.5; HA: $\pi \neq$ 0.5
Step 3: Appropriate test statistic: Z statistic
Step 4 and 5: Decide the level of significance and the
corresponding critical value:
Let's take an $\alpha$ value of 0.1. Hence $\pm 1.645$ is the critical value.

Step 6: Obtain the value of the test statistic:
$$ Z = \frac{p - \pi_0}{\sqrt{\pi_0(1-\pi_0)/n}} = \frac{0.41 - 0.5}{\sqrt{0.5(0.5)/300}} = \frac{-0.09}{\sqrt{0.25/300}} \approx -3.11 $$
Step 7: Make a decision and interpret it.
At the 90% confidence level we reject the null hypothesis
that P = 0.5.
The calculated test statistic, -3.11, is in the rejection region (|Z| > 1.645).
The corresponding one-tailed P value of -3.11 (i.e. 0.0009) is less than the $\alpha/2$ value of 0.05.
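
A short sketch of the same test follows, assuming the standard error is computed under H0 from the hypothesized proportion. The function name `one_proportion_z_test` is hypothetical, and the two-sided P value is an addition for comparison with the chosen level of significance. The unrounded statistic is about -3.118, which the notes report as -3.11.

```python
from statistics import NormalDist


def one_proportion_z_test(successes, n, pi0, alpha=0.10):
    """Two-tailed Z test for a single proportion against a hypothesized value pi0."""
    p_hat = successes / n
    se0 = (pi0 * (1 - pi0) / n) ** 0.5               # standard error under H0
    z = (p_hat - pi0) / se0
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)     # 1.645 for alpha = 0.10
    p_two_sided = 2 * NormalDist().cdf(-abs(z))
    return p_hat, z, z_crit, p_two_sided


p_hat, z, z_crit, p_two_sided = one_proportion_z_test(123, 300, 0.5)
print(f"p_hat = {p_hat:.2f}, z = {z:.3f}, critical = +/-{z_crit:.3f}, p = {p_two_sided:.4f}")
```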


It is also possible to apply hypothesis testing to
categorical data.
The Chi-squared ($\chi^2$) test statistic is commonly used.
This test is usually applied to tabulated data.
The table contains two variables called the row and
column variables.
The test measures the discrepancy between the K
observed frequencies (O) and the corresponding K
expected frequencies (e), i.e. for all cells of the
table.
Expected frequencies are the frequencies that would occur
if there were no association between the row and
column variables.
The H0 of the Chi-squared test is that there is no
association between the row and
column variables.
The alternative hypothesis is that
there is an association between the row
and column variables.
The closer the observed frequencies are to
the expected frequencies, the more likely
the H0 is true.
$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i} $$
$$ e = \frac{\text{row total for the cell} \times \text{column total for the cell}}{\text{grand total}} $$

Assumptions of the Chi-squared test:
No cell of the table has an expected frequency less than 1.
No more than 20% of the expected frequencies should be less than 5.
The Chi-squared statistic should be compared with the chi-square
distribution with (R-1)(C-1) degrees of freedom.
Though the rejection region lies in one tail of the chi-square distribution,
the test is always two tailed (the alternative hypothesis is non-directional).

Example:
A researcher is interested in assessing the effect of literacy
on family planning use. Accordingly, he collected data
and tabulated the findings in the following manner.
Can we say there is an association between educational
status and family planning use?

FP use    Educational Status
          Illiterate    Literate    Total
Yes       63            49          112
No        15            33          48
Total     78            82          160

Step 1 and 2: Define H0 and HA:
H0: There is no association between literacy and family planning use
HA: There is an association between literacy and family planning use
Step 3: Decide on the appropriate test statistic: $\chi^2$ test.
Step 4 and 5: Decide $\alpha$ and the corresponding critical
value:
Let's take an $\alpha$ value of 0.01.
At df of 1 the critical value is 6.635.
The acceptance area is 0-6.635; the rejection area is $\chi^2$ > 6.635.

Step 6: Obtain the value of the test statistic:
First the expected frequencies should be calculated:
Expected frequency for cell a: 78 x 112/160 = 54.6
Expected frequency for cell b: 82 x 112/160 = 57.4
Expected frequency for cell c: 78 x 48/160 = 23.4
Expected frequency for cell d: 82 x 48/160 = 24.6
NB: The assumptions of the $\chi^2$ test are fulfilled.
Then we calculate the Chi-square statistic:
$$ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - e_i)^2}{e_i} = \frac{(63 - 54.6)^2}{54.6} + \frac{(49 - 57.4)^2}{57.4} + \frac{(15 - 23.4)^2}{23.4} + \frac{(33 - 24.6)^2}{24.6} $$
$$ \chi^2 = 1.29 + 1.23 + 3.02 + 2.87 = 8.41 $$

Step 7: Make a decision and interpret it.
At the 99% confidence level we accept the HA that the two
variables are associated, for the following reasons:
The calculated test statistic, 8.41, is in the rejection area.
The corresponding P value of 8.41 (between 0.002 and 0.005) is less than the $\alpha$ value of 0.01.
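
The expected frequencies and the Chi-square statistic for this 2 x 2 table can be verified with the sketch below. It assumes no continuity (Yates) correction, matching the hand calculation, and obtains the P value from the identity P(chi-square with 1 df > x) = erfc(sqrt(x/2)), which holds only for one degree of freedom. The function name `chi_square_2x2` is illustrative.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square for a 2 x 2 table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count = row total x column total / grand total
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    p_value = math.erfc(math.sqrt(chi2 / 2))  # valid for df = 1 only
    return expected, chi2, p_value


# Rows: FP use (yes, no); columns: illiterate, literate
observed = [[63, 49], [15, 33]]
expected, chi2, p_value = chi_square_2x2(observed)
print(f"chi-square = {chi2:.2f}, p = {p_value:.4f}, reject H0 at 0.01: {chi2 > 6.635}")
```

For larger tables the same expected-count logic applies, but the P value needs the chi-square distribution with (R-1)(C-1) degrees of freedom, which is not covered by this one-line identity.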

What is Public Health Surveillance?
Epidemiologic surveillance is the systematic collection,
analysis, interpretation and dissemination of health data on
an ongoing basis.

Use of surveillance information:
Priority setting and planning
Resource mobilization and collection
Prediction and early detection of epidemics
Early and adequate detection of and response to epidemics
Monitoring and evaluation of interventions
Levels of surveillance: Surveillance can be conducted:
Globally (e.g. the AIDS surveillance system managed by WHO),
Regionally (e.g. polio surveillance in Latin America),
Institutionally (e.g. hospital acquired, refugee camps).

Sources of Data for Surveillance: The following are
some key sources of surveillance data, not all of
which are available in every country:
Census data: periodic enumeration of a population
Reports (birth & death certificates, autopsy, corpse exam)
Morbidity reports
Hospital data (discharge, surgical logs, infection)
Absenteeism records (school, workplace, claims)
Epidemic reports
Laboratory test utilization and result reports

Drug utilization records and adverse drug reaction reports
Special surveys (e.g., research data)
Police records (especially for injury, alcohol risk)
Information on animal reservoirs and vectors (rabies, plague)
Environmental data (hazard surveillance, water
and food testing)

Types of Surveillance
PASSIVE: Surveillance in which health care providers send
reports based on a known set of rules and regulations.
ENHANCED: Collection of additional data on cases reported
under routine surveillance.
INTENSIFIED: The upgrading from a passive to an active
surveillance system for a limited period (because of an outbreak).
ACTIVE: Surveillance where public health officials seek
reports from participants in the surveillance system on
a regular basis, rather than waiting for the reports.
Limited to specific diseases over a limited period.

Conditions in which surveillance is appropriate:
Periodic evaluation of ongoing programs - HIV/AIDS
Programs which have a time limit of operation - Smallpox
With the occurrence of unusual situations:
when a new disease/event is discovered
when investigating a new mode of transmission
when a high-risk period is recognized
when a disease appears in a new geographic area or is found to
affect a new subgroup of the population
when a previously eradicated disease reappears

Features of a good surveillance system
Uses a combination of passive & active surveillance
Emphasizes collection of the minimum data in the simplest possible way
Make sure that the data collected is useful for the
workers who collect the data.
Timely reporting.
Timely and comprehensive action.
both case detection and treatment
Strong laboratory services for accurate diagnosis

Rationale for monitoring of Surveillance
Keep Standards
Maintain high quality
Good data + good analysis = good system = useful
Good data + bad analysis = bad system = useless

Evaluation of Surveillance: The following aspects of the
system should be assessed:
The importance to the public health:
Incidence and prevalence
Severity (case-fatality or death-to-case ratio)
Mortality (overall and age-specific mortality rates, years of potential life lost)
Health care costs
Potential for spread
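
The frequency and severity indicators listed above are simple ratios. The sketch below shows one way to compute them; the function names, the per-1,000 multiplier and all input numbers are hypothetical illustrations, not data from these notes.

```python
def incidence_rate(new_cases, population_at_risk, per=1000):
    """New cases arising in a period, per `per` persons at risk."""
    return new_cases / population_at_risk * per


def prevalence(existing_cases, population, per=1000):
    """All existing cases at a point in time, per `per` persons."""
    return existing_cases / population * per


def case_fatality_ratio(deaths, cases):
    """Percentage of cases of the disease that die from it."""
    return deaths / cases * 100


# Hypothetical surveillance figures for one reporting period
incidence = incidence_rate(new_cases=50, population_at_risk=10_000)
prev = prevalence(existing_cases=120, population=10_000)
cfr = case_fatality_ratio(deaths=5, cases=50)

print(incidence, prev, cfr)
```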

The objectives and operation of the system
The case definition of the health event
The population under surveillance
The time period for data collection (weekly, monthly, annually)
What information is collected (Is it what programs need?)
The reporting sources
How data are handled (transfers, delays, confidentiality)
How data are analyzed (by whom? frequency)
How data are disseminated

The system's usefulness
Action taken to date as a result of the information
Future or potential uses

Cost or resource requirements for system operation

Attributes or qualities of the surveillance system

Limitations of Surveillance System
Under reporting (such as due to lack of knowledge of
reporting requirements, negative attitudes toward
Lack of representativeness of reported cases (such as
due to a bias toward reporting severe cases, or
increased likelihood of reporting after publicity)
Lack of timeliness
Inconsistency of case-definitions

Factors related to selection of disease for surveillance:
Magnitude of the disease
Feasibility of control measures
Need for monitoring and evaluating the performance
of a control program
Resource availability

Selecting Priority Diseases for Surveillance:
Does the disease result in a high disease impact?
(Mortality, Disability, Morbidity)
Does it have a significant epidemic potential?
(E.g. Cholera, Meningitis, Measles)
Is it a specific target of a national, regional, or
international control program?
Is it a disease with the potential to spread rapidly across
national boundaries?
Will the information to be collected lead to significant
public health action?
(E.g. Immunization campaign)

Core activities of surveillance
Detection (identifying cases and outbreaks)
Confirmation (epidemiological and laboratory)
Reporting (early warning and routine)
Analysis and interpretation
Response (preventive and control measures, outbreak investigation, program adjustment,
changes in policy and planning)
Evaluation and monitoring

Support activities of surveillance
Setting standards:
Case definitions
Standard case management guidelines
Standard procedures for investigation
Communications: radio, fax, e-mail, phone, health
Providing Resources:

Integrated Disease Surveillance and Response (IDSR)

IDSR Concept and Experience in Ethiopia:
Integrated disease surveillance and response (IDSR)
is an approach adopted to strengthen national
disease surveillance systems by coordinating and
streamlining all surveillance activities and ensuring
timely provision of surveillance data to all disease
prevention and control programs in order to
initiate timely response (intervention).

IDSR in Ethiopia:
The IDSR initiative was launched by WHO-AFRO
(the WHO Regional Office for Africa) in the second half
of the 1990s.
Since then the initiative has been adopted by many
African countries including Ethiopia.
In fact, Ethiopia is one of the countries in Africa that
have made good progress in IDSR implementation.

Adaptation of the national guidelines and training
modules for IDSR,
Training modules were prepared
Training for professionals from national to woreda
level, and
Preparation and distribution of relevant forms are
Data collection and reporting using the IDSR
guidelines and forms have also been initiated.

Strategic objectives of IDSR:
Design a disease surveillance system for priority diseases (13)
Case-based reporting on selected diseases
Integrate all surveillance activities
Strengthen surveillance data management
Strengthen the capacity and involvement of laboratories in IDSR

Activities: To achieve its objectives, IDSR seeks to:
Strengthen the capacity of Woredas to conduct effective surveillance activities
Integrate multiple surveillance systems so that forms, personnel and resources can be used more
efficiently and effectively
Improve the use of information for decision making
Improve the flow of surveillance information between and within levels of the health system

Improve laboratory capacity in identification of
pathogens and monitoring of drug sensitivity
Increase the involvement of health workers in
the surveillance system.
Emphasize community participation in detection
and response to public health problems
Strengthen the involvement of laboratory
personnel in epidemiological surveillance

Diseases important to IDSR:
List of priority diseases in Ethiopia: epidemic-prone
1. Cholera
2. Diarrhea with blood (Shigella)
3. Measles
4. Meningitis
5. Plague
7. Viral Hemorrhagic fever
8. Yellow fever
9. Typhoid fever
10. Relapsing fever
11. Epidemic Typhus
12. Malaria
13. Bird flu

Diseases of International Interest
Targeted for eradication
Dracunculiasis (Guinea worm)
Poliomyelitis (Acute flaccid paralysis)
American trypanosomiasis (Chagas disease)
Lymphatic Filariasis
Targeted for reduced incidence/prevalence

Targeted for reduced transmission
Diseases subject to the International Health Regulations
Yellow Fever

Define importance of epidemic investigations
Discuss problems in epidemic investigations
Identify patterns of epidemics
Define point source epidemic
Identify common source epidemic
Define propagated epidemic
Discuss mixed-type epidemics

MANAGEMENT means sudden occurrence of
Outbreak investigations are an important component of
public health.
Properly conducted investigations can help to:
Identify the source of ongoing outbreaks and
prevent additional cases.
Provide useful knowledge about the disease.

Outbreak investigations: outbreak investigations
encounter more constraints than planned research:
o Great urgency to find sources
o Pressure to conclude urgently
o Limited samples
o If delayed, it might be difficult to get enough evidence
o Media coverage may cause bias (early reports)
o The source may be overlooked (ignored) because of conflict of interest

Principal Patterns of epidemic:
Point Source Epidemic: Outbreak caused by
simultaneous exposure of a group of susceptible
persons to a common source agent or toxin.
Cases develop the disease within one incubation period, giving a
rapid rise and fall of the epidemic curve.
E.g. Food poisoning

Continuous Common Source:
The source (exposure) remains for a long period of time.
There will be multiple exposures, with variable
incubation periods.
The epidemic curve has no clear peak and the
duration of the outbreak will be prolonged.
Intermittent common source: results in irregular
patterns of the epidemic curve that reflect the
intermittent nature of the exposure.

Propagated or progressive (contact) epidemic:
Occurs through direct person-to-person
transmission or through a vector.
The epidemic usually wanes (subsides) after a few
generations, either because:
The number of susceptibles falls below some critical level, or
Intervention measures become effective
Mixed Epidemics:
Epidemics having the features of both common source
and propagated epidemics are referred to as mixed
epidemics.

Steps of an Epidemic Investigation
Prepare for field work:
Investigation related:
The investigator must have the appropriate scientific
knowledge, supplies, and
equipment to carry out the investigation.
Discuss the situation with
knowledgeable people,
review applicable literature, and
collect sample questionnaires.

Administration related:
Observe all administrative procedures.
Arrange transportation and
organize personnel matters.
Clarify the role of the team in the field.
Identify local contacts at the site where the
outbreak is reported and
arrange where and when to meet them.

Verify the existence of an epidemic:
Compare the current number of cases with past levels of
disease in that community,
Compare the observed number of cases with the expected
number of cases in the area.
Be careful: an excess may not always indicate an outbreak. It may reflect:
Changes in local reporting procedures,
Change in case definition,
Increased interest because of awareness,
Improvements in diagnostic procedures.
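
Comparing the observed count with the expected count can be sketched as follows. The rule used here (flag an excess when the current count is above the historical mean plus two standard deviations) is an assumed, commonly used convention and is not prescribed by these notes; the function name and the weekly counts are hypothetical.

```python
from statistics import mean, stdev


def exceeds_expected(current_cases, past_counts, n_sd=2.0):
    """Compare the current count with the level expected from past periods.

    Returns (is_excess, expected, threshold), where the threshold is the
    historical mean plus n_sd sample standard deviations (an assumption).
    """
    expected = mean(past_counts)
    threshold = expected + n_sd * stdev(past_counts)
    return current_cases > threshold, expected, threshold


# Hypothetical counts for the same week in the five previous years
past_counts = [4, 6, 5, 7, 3]

is_excess, expected_cases, threshold = exceeds_expected(12, past_counts)
print(is_excess, expected_cases, round(threshold, 2))
```

Even when the flag is raised, the excess still has to be checked against the reporting and diagnostic artefacts listed above before it is called an outbreak.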

Verify the diagnosis
Review the clinical and laboratory findings of the
cases to establish the diagnosis.
Ensure that the problem has been
properly diagnosed and
rule out laboratory error as the basis for the increase in
diagnosed cases.

Summarize the clinical findings:
Use frequency distributions; they are useful in describing the
spectrum of the illness, verifying the diagnosis, and developing case definitions.
Visit as many patients as you can; this helps in generating hypotheses about
disease etiology and spread.
Establish criteria for labelling persons as "cases".
The criteria are used for the case definition.

Case definition:
Confirmed/definite: a case with laboratory verification.
Probable: a case with typical clinical features, without
laboratory confirmation.
Possible: a case presenting with fewer of the typical clinical
features.
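
The three levels of the case definition can be applied consistently with a small helper. This is a sketch only: the function name and the cut-off of three typical features for a "probable" case are hypothetical, since the actual criteria depend on the disease under investigation.

```python
def classify_case(lab_confirmed, n_typical_features, min_for_probable=3):
    """Assign a person to a case-definition category.

    min_for_probable is a hypothetical cut-off; real criteria are set
    per disease by the investigation team.
    """
    if lab_confirmed:
        return "confirmed"
    if n_typical_features >= min_for_probable:
        return "probable"
    if n_typical_features > 0:
        return "possible"
    return "not a case"


# Hypothetical line list: (laboratory confirmation, typical features present)
line_list = [(True, 2), (False, 4), (False, 1), (False, 0)]
categories = [classify_case(lab, n) for lab, n in line_list]
print(categories)
```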

Verification of the diagnosis:
Surveillance - identifying and counting cases.
Types of surveillance commonly utilized in epidemics:
Stimulated or enhanced passive surveillance includes:
Sending out a letter asking for reports.
Alerting the public directly, usually through local media, to see a physician if they have symptoms
Asking case-patients if they know anyone else with the same condition

Active surveillance: Conduct a survey of the entire population.
Collect the following types of information about every case:
Identifying information - allows investigators to contact patients and to map the geographic extent
Demographic information - gives the population at risk
Clinical information - verification of the case
Risk factor information - exposure to suspected cause
Reporter information - additional information if there is a need to report back results

Describe epidemic with respect to time, place, person:
Collect relevant information: collect information
carefully to characterize the outbreak with respect
to time, place and person.
Epidemic curve: plots the cases by the time of onset and provides a time frame for the outbreak
Spot map: plots the cases by location and shows the geographic spread of cases
Attack rates: Calculate rates of illness in population at risk by exposure to specific suspected
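
The time and person descriptions above reduce to simple counts and proportions. The sketch below tallies cases by onset date (the data behind an epidemic curve) and computes attack rates by exposure; all dates, counts and labels are hypothetical.

```python
from collections import Counter
from datetime import date

# Hypothetical onset dates taken from a line list of cases
onset_dates = [
    date(2024, 1, 3), date(2024, 1, 3),
    date(2024, 1, 4), date(2024, 1, 4), date(2024, 1, 4),
    date(2024, 1, 5),
]

# Epidemic curve data: number of cases by date of onset
epidemic_curve = Counter(onset_dates)
for day in sorted(epidemic_curve):
    print(day.isoformat(), "#" * epidemic_curve[day])


def attack_rate(ill, population_at_risk):
    """Percentage of the population at risk that became ill."""
    return ill / population_at_risk * 100


# Hypothetical exposure groups: label -> (number ill, number at risk)
exposure_groups = {
    "ate suspect food": (30, 50),
    "did not eat suspect food": (5, 50),
}
attack_rates = {label: attack_rate(ill, total)
                for label, (ill, total) in exposure_groups.items()}
print(attack_rates)
```

A clearly higher attack rate in one exposure group points to that exposure as a candidate source when hypotheses are formulated.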

Formulate hypotheses: based on your characterization of
the epidemic by time, place, and person.
The hypotheses should address:
the source of the agent,
the mode of transmission, and
the exposures that caused the disease.
Determine the type of epidemic:
common source vs. propagated.

Evaluation of hypotheses can be done in two ways:
Comparing hypotheses with the established facts, or
Using analytic epidemiology to quantify relationships &
explore the role of chance.

Analytic approach: The technique utilizes the cohort
and the case-control approaches to identify the possible
source of an outbreak.
Cohort approach: identifies the comparison groups based on exposure status.
Calculate the relative risk to determine whether there is an
association between exposure and disease in the exposed and non-exposed groups.
Case-control approach: identifies the comparison groups on the basis of their disease status.
Compute the odds ratio to find the association between
cases and controls with regard to exposure to the
suspected cause.
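
Both measures come from the same 2x2 layout (a = exposed and ill, b = exposed and well, c = unexposed and ill, d = unexposed and well). The sketch below uses hypothetical counts and hypothetical function names; it does not compute confidence intervals or test the role of chance.

```python
def relative_risk(a, b, c, d):
    """Cohort approach: risk of illness in exposed / risk in unexposed."""
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


def odds_ratio(a, b, c, d):
    """Case-control approach: cross-product ratio (a*d) / (b*c)."""
    return (a * d) / (b * c)


# Hypothetical outbreak table:
#                ill   well
# exposed         30     20
# unexposed       10     40
rr = relative_risk(30, 20, 10, 40)
odds = odds_ratio(30, 20, 10, 40)
print(round(rr, 2), round(odds, 2))
```

A value well above 1 for either measure suggests an association between the exposure and the illness; a value near 1 suggests none.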

Search for additional cases: Locate unreported cases.
Passive: Ask hospitals, professionals or both whether they have seen similar cases.
Active: Do intensive investigation in the community on asymptomatic persons or contacts of the
cases.
For example: doing liver function tests in an
investigation of a hepatitis A outbreak

Analyze the Data
Assemble all results, and interpret findings.
Make a decision on the hypotheses tested: All
findings must be consistent with one and only one hypothesis.
Intervention and follow-up: intervention must start
as soon as possible depending on the specific
One might aim control measures at the specific agent,
source, or reservoir.

For example, an outbreak might be controlled by destroying
contaminated foods, sterilizing contaminated water,
or destroying mosquito breeding sites, or
an infectious food handler could be removed from the
job and treated.
Report of the investigation: At the end prepare a
comprehensive report and submit it to the
appropriate/concerned agency (or agencies).
The report should follow the usual scientific format:
Introduction
Background
Methods
Results
Discussion
Recommendations

The report should discuss in detail:
Factors leading to the epidemic.
Evaluation of measures used for the control of the
epidemic.
Recommendations for the prevention of similar
episodes in the future.

Managing Outbreaks/Epidemics
Management of epidemics requires an urgent and
intelligent use of appropriate measures against the
spread of the disease.
Action to be taken is dependent on the type of the
disease as well as the source of the outbreak.
However, the action can be generally categorized as
presented below to facilitate easy understanding of
the strategies.

Action can be generally categorized as:
Measures Directed Against the Reservoir
Domestic animals as reservoir:
Testing of herds
Destruction of infected animals
Example : brucellosis and bovine tuberculosis
Wild animals as reservoir:
post-exposure prophylaxis
Example : rabies

Humans as reservoir
Removal of the focus of infection - e.g. cholecystectomy
(removal of the gallbladder) in a chronic typhoid carrier.
Isolation of infected persons.
Treatment to make them non-infectious: e.g., tuberculosis.
Disinfection of contaminated objects
Quarantine- is the limitation of freedom of
movement of apparently healthy persons
or animals who have been exposed to a
case of infectious disease.
Usually imposed for the duration of the
maximal incubation period of the disease.

E.g. Cholera, plague, and yellow fever are the three
internationally quarantinable diseases by
international agreement.
Now quarantine is replaced in some countries by active
surveillance of the individuals - maintaining close
supervision over possible contacts of ill persons to
detect infection or illness promptly; their freedom of
movement is not restricted.

Measures that interrupt the transmission of
Action to prevent transmission of disease by ingestion:
purification of water
pasteurization of milk
inspection procedures designed to ensure safe food
improve housing conditions.
Attempts to reduce transmission of respiratory infections:
Chemical disinfection of air and ultraviolet light.
Work on ventilation patterns, like unidirectional
("laminar") air flow to reduce the transmission of
organisms in hospitals.

Action to interrupt transmission of diseases whose
cycles involve an intermediate host:
clearing irrigation farms of snails to control
schistosomiasis.
Measures that reduce host susceptibility
Active immunization, when either the altered
organism or its product is given to a person to
induce production of antibodies - EPI.
Passive immunization has a lesser role in the control of
communicable diseases than active immunization:
Transfer of maternal antibodies to the fetus through the
placenta.

Prophylactic administration of immune serum
globulin (ISG). E.g., TAT (tetanus antitoxin) for un-immunized
persons who receive penetrating wounds, antitoxin
against Clostridium botulinum and antiserum
against rabies.
Chemoprophylaxis: use of antibiotics for known
contacts of cases,
e.g. in tuberculosis, gonorrhoea, and syphilis; use of chloroquine for persons traveling to
malaria-endemic areas.

Uncovering outbreaks : Outbreaks are detected in one
of the following ways:
Through timely analysis of routine surveillance data,
this may reveal an increase in reported cases or
unusual clustering of cases.
Report from clinician.
Report from the community, either from the affected
group or from a concerned citizen.

Why Investigate Possible Outbreaks
To institute control and prevention measures
In order to design and implement appropriate control
measures, the extent of the outbreak
and the size and characteristics of the
population at risk need to be assessed.
Opportunity for research
Outbreaks are natural experiments waiting to be
analyzed and exploited. They give a unique opportunity
to study the natural history of diseases. They may also
help to assess the impact of control measures and
the usefulness of new epidemiologic and laboratory
techniques.
Training opportunity
Investigating an outbreak requires a combination of
diplomacy, logical thinking, problem-solving ability,
quantitative skills, epidemiologic know-how, and
judgment.
These skills improve with practice and experience.
Therefore, an outbreak may provide a good
opportunity for an epidemiologist in training to
learn these skills by working with experienced
epidemiologists.

Opportunity for program evaluation
An outbreak of a disease targeted by a public health
program, such as EPI, tuberculosis or STDs, may
reveal a weak point in that program and provide the
opportunity to change or strengthen the program's
efforts.
Public, political, or legal concerns
Public, political, or legal concerns sometimes override
scientific concerns in the decision to conduct an
investigation.
The call from these parties usually has no scientific basis
and such investigations mostly do not identify a causal
link between exposure and disease.
Bio - Lecture note, G/her B (BSc, MSc, AssProf) 23
Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24
Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24
Define importance of epidemic investigations
Discuses problems in epidemic investigations
Identify patterns of epidemic investigation
Define point source epidemic
Identify common source epidemic
Define preparative epidemic
Discuss mixed type of investigations epidemic

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

MANAGEMENT means sudden occurrence of
Outbreak investigations are important component of
public health.
Properly conducted investigations can help to:
Identify the source of ongoing outbreaks and
Prevent additional cases.
Provide a useful knowledge about the disease

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

Outbreak investigations: outbreak investigations
encounter more constraints that is not a research:
o Great urgency to find sources
o Pressure to conclude urgently
o Limited samples
o If delayed, might be difficult to get enough evidences
o Medias may cause bias (early report)
o Source may overlooked (ignored) because of conflict of interest

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

Principal Patterns of epidemic:
Point Source Epidemic: Out break caused by
simultaneous exposure of a group of susceptible
person to a common source agent or a toxin
Develop the disease with one incubation period, a
rapid rise and fall of an epidemic curve
E.g. Food poisoning

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

Continuous Common Source:
Source (exposure) remains for a long period of time,
There will be multiple exposures, with variable
incubation periods.
The epidemic curve is with no clear peak and
duration of the outbreak will be prolonged
Intermittent common source: results in irregular
patterns of the epidemic curve that reflects the
intermittent nature of the exposure

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

Propagated or progressive (contact) epidemic:
Occurring through direct person - to - person
transmission or through a vector
The epidemics usually wanes (subside) after a few
generations, either because:
The number of susceptible falls bellow some critical level, or
Intervention measures become effective
Mixed Epidemics:
Epidemics having the features of both common source
and propagated epidemics are referred as mixed

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24
Steps of an Epidemic Investigation
Prepare for field work:
Investigation related:
Investigator must have the appropriate scientific
supplies, and
equipment to carry out the investigation.
Discuss the situation with
knowledgeable people,
review applicable literature, and
collect sample questionnaire.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 24

Administration related:
observe all administrative procedures.

arrangement of transportation and

organizing personnel matters.
Clarify role of the team in the field.
Identify local contacts at the site where the
outbreak is reported and
Arrange where and when to meet them.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Verify the existence of epidemic:
Compare current number of cases with past levels of
disease in that community,
Compare the observed number of cases with expected
number of cases in the area.
Be careful, excess may not always indicate outbreak:
Changes in local reporting procedures,
Change in case definition,
Increased interest because of awareness,
Improvements in diagnostic procedures.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Verify the diagnosis
Review the clinical and laboratory findings of the
cases to establish the diagnosis.
Ensure that the problem has been
Properly diagnosed and
Rule out laboratory error as basis for increase in

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Summarize the clinical findings:
Use frequency distribution, it is useful in:
spectrum of the illness, diagnosis, and case definitions.
Visit as much patients as you can: generating hypothesis about
disease etiology and spread.
Etablish criteria for labelling persons as "cases".
The criteria are used for case definition

Case definition:
Confirmed/definite: a case with laboratory verification.
Probable: a case with typical clinical features, without
laboratory confirmation.
Possible : a case presented with fewer of the typical clinical

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Verification of the diagnosis:
Surveillance- identifying and counting cases.
Types of surveillance commonly utilized in epidemic:
Stimulated or enhanced passive surveillance includes:
Sending out a letter asking for reports.
Alerting the public directly, usually through local media, to see a physician if they have symptoms
Asking case-patients if they know anyone else with the same condition

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Active surveillance: Conduct survey of entire population.
Collect the following types of information about every case:
Identifying information - allows to contact patients and to map the geographic extent
Demographic information - gives population at risk.
Clinical information - verification of the case
Risk factor information - exposure to suspected cause
Reporter information - additional information if there is a need to report back results

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Describe epidemic with respect to time, place, person:
Collect relevant information: collect information
carefully to characterize the outbreak with respect
to time, place and person.
Epidemic curve: plots the cases by the time of onset and provides a time frame for the outbreak
Spot map: plots the cases by location and shows the geographic spread of cases
Attack rates: Calculate rates of illness in population at risk by exposure to specific suspected

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Formulate hypothesis: based on your characterization of
the epidemic by time, place, and person.
The hypotheses should address:
the source of the agent,
the mode of transmission, and
the exposures that caused the disease.
Determine the type of epidemic-
Common source Vs propagated.

Evaluation of hypotheses can be done in two ways:

Comparing hypotheses with the established fact or
Analytic epidemiology to quantify relationships &
Explore the role of chance.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Analytic approach: The technique utilizes the cohort
and the case-control approach to identify possible
source of an outbreak.
Cohort approach: identifies the comparison group based on exposure status.

Calculate relative risk to determine whether there is

association between exposed and non-exposed
The case control method identifies the comparison groups on the basis of their disease status.

Compute Odds ratio to find association between

cases and controls with regard to exposure to the
suspected cause

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 25
Search for additional cases: Locate unreported cases.
Passive: Inquire hospitals, professionals or both whether they have seen similar cases.
Active: Do intensive investigation in the community on asymptomatic persons or contacts of the

For example: doing liver function test in an

investigation of hepatitis A outbreak

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

Analyze the Data
Assemble all results, and interpret findings.
Make a decision on the hypotheses tested: All
findings must be consistent with one and only one
Intervention and follow-up: intervention must start
as soon as possible depending on the specific
One might aim control measures at the specific agent,
source, or reservoir.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

. an outbreak might be controlled by destroying
contaminated foods, sterilizing contaminated water,
or destroying mosquito breeding sites. Or
An infectious food handler could be removed from the
job and treated.
Report of the investigation: At the end prepare a
comprehensive report and submit to the
appropriate/concerned agency (or agencies).
The report should follow the usual scientific format:
Introduction Results
Background Discussion
Methods Recommendations.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

The report should discuss in detail:
Factors leading to the epidemic.
Evaluation of measures used for the control of the
Recommendations for the prevention of similar
episodes in the future.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

Managing Outbreak/epidemics
Management of epidemics require an urgent and
intelligent use of appropriate measures against the
spread of the disease.
Action to be taken is dependent on the type of the
disease as well as the source of the outbreak.
However, the action can be generally categorized as
presented below to facilitate easy understanding of
the strategies.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

Action can be generally categorized as:
Measures Directed Against the
Domestic animals as reservoir:
Testing of herds
Destruction of infected animals
Example : brucellosis and bovine tuberculosis
Wild animals as reservoir:
post-exposure prophylaxis
Example : rabies

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

Humans as reservoir
Removal of the focus of infection- e.g. cholecystectomy
(gallbladder) in a chronic typhoid carrier.
Isolation of infected persons.
Treatment to make them non-infectious: e.g., tuberculosis.
Disinfection of contaminated objects
Quarantine- is the limitation of freedom of
movement of apparently healthy persons
or animals who have been exposed to a
case of infectious disease.
Usually imposed for the duration of the
maximal incubation period of the disease.

Bio - Lecture note, G/her B (BSc, MSc, AssProf) 26

E.g. cholera, plague, and yellow fever are the three
diseases that are quarantinable by
international agreement.
Quarantine is now replaced in some countries by active
surveillance of individuals: close supervision
of possible contacts of ill persons to
detect infection or illness promptly, without restricting
their freedom of movement.
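
The rule that quarantine usually lasts for the maximal incubation period can be sketched as a simple date calculation. This is only an illustration: the function name, the exposure date, and the 5-day incubation value are hypothetical placeholders, not figures from these notes.

```python
from datetime import date, timedelta


def quarantine_end(last_exposure: date, max_incubation_days: int) -> date:
    """Return the date on which quarantine would end for an exposed contact.

    Assumes quarantine runs from the last exposure for the maximal
    incubation period of the disease.
    """
    if max_incubation_days < 0:
        raise ValueError("max_incubation_days must be non-negative")
    return last_exposure + timedelta(days=max_incubation_days)


# Placeholder inputs for illustration only (not disease-specific values).
print(quarantine_end(date(2024, 3, 1), max_incubation_days=5))
```

In practice the incubation period would be taken from the relevant disease guideline rather than hard-coded.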


Measures that interrupt transmission:
Action to prevent transmission of disease by ingestion:
purification of water
pasteurization of milk
inspection procedures designed to ensure safe food
improvement of housing conditions.
Attempts to reduce transmission of respiratory infections:
Chemical disinfection of air and use of ultraviolet light.
Work on ventilation patterns, such as unidirectional
("laminar") air flow, to reduce the transmission of
organisms in hospitals.


Action to interrupt transmission of diseases whose
cycles involve an intermediate host:
clearing snails from irrigation farms to control
schistosomiasis.
Measures that reduce host susceptibility:
Active immunization: either the altered
organism or its product is given to a person to
induce production of antibodies, e.g. EPI.
Passive immunization has a lesser role in the control of
communicable diseases than active immunization:
Transfer of maternal antibodies to the fetus through the
placenta.


Prophylactic administration of immune serum
globulin (ISG), e.g. tetanus antitoxin (TAT) for un-immunized
persons who receive penetrating wounds, antitoxin
against Clostridium botulinum, and antiserum
against rabies.
Chemoprophylaxis: use of antibiotics for known
contacts of cases,
e.g. in tuberculosis, gonorrhoea, and syphilis; use of chloroquine
for persons traveling to malaria-endemic areas.


Uncovering outbreaks: Outbreaks are detected in one
of the following ways:
Through timely analysis of routine surveillance data,
which may reveal an increase in reported cases or
unusual clustering of cases.
Report from a clinician.
Report from the community, either from the affected
group or from a concerned citizen.
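
The first route above, timely analysis of routine surveillance data, can be sketched as a simple threshold check. This is a minimal sketch under stated assumptions: the rule "mean plus two standard deviations of past counts" is a common rule of thumb, not a method given in these notes, and the function name and weekly counts are hypothetical.

```python
from statistics import mean, stdev


def exceeds_baseline(baseline_counts, current_count, k=2.0):
    """Flag a count that is unusually high compared with past counts.

    baseline_counts: case counts from comparable earlier periods (assumed input).
    k: number of standard deviations above the mean used as the threshold
       (k=2 is an assumed rule of thumb).
    Returns (flag, threshold).
    """
    threshold = mean(baseline_counts) + k * stdev(baseline_counts)
    return current_count > threshold, threshold


# Hypothetical weekly case counts, for illustration only.
past_weeks = [4, 5, 6, 5, 4, 6]
flag, threshold = exceeds_baseline(past_weeks, current_count=12)
print(flag, round(threshold, 2))
```

A flagged week is only a signal that something may need investigation. It does not by itself confirm an outbreak.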


Why Investigate Possible Outbreaks
To institute control and prevention measures
To design and implement appropriate control
measures, the extent of the outbreak
and the size and characteristics of the
population at risk need to be assessed.
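
One common way to express the extent of an outbreak relative to the population at risk is the attack rate: the number of cases divided by the population at risk. The sketch below is illustrative only. The function name and the numbers are hypothetical and do not come from these notes.

```python
def attack_rate(cases: int, population_at_risk: int) -> float:
    """Return the proportion of the population at risk that became ill."""
    if population_at_risk <= 0:
        raise ValueError("population_at_risk must be positive")
    if cases < 0 or cases > population_at_risk:
        raise ValueError("cases must be between 0 and population_at_risk")
    return cases / population_at_risk


# Hypothetical example: 30 cases among 600 people at risk.
rate = attack_rate(30, 600)
print(f"Attack rate: {rate:.1%}")
```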
Opportunity for research
Outbreaks are natural experiments waiting to be
analyzed and exploited. They give a unique opportunity
to study the natural history of diseases. They may also
help to assess the impact of control measures and
the usefulness of new epidemiologic and laboratory techniques.
Training opportunity
Investigating an outbreak requires a combination of
diplomacy, logical thinking, problem-solving ability,
quantitative skills, and epidemiologic know-how.
These skills improve with practice and experience.
Therefore, an outbreak may provide a good
opportunity for an epidemiologist in training to
learn these skills by working with experienced epidemiologists.


Opportunity for program evaluation
An outbreak of a disease targeted by a public health
program, such as EPI, tuberculosis or STDs, may
reveal a weak point in that program and provide the
opportunity to change or strengthen the program's effort.
Public, political, or legal concerns
Public, political, or legal concerns sometimes override
scientific concerns in the decision to conduct an investigation.
The call from these parties usually has no scientific basis,
and such investigations mostly do not identify a causal
link between exposure and disease.



THANKS!

