Epi Final

Disease - definition
- Abnormal condition of an individual that impairs physiological functioning; covers a broad array of health conditions, including physiologic states & mental health
- The words disease, illness & sickness are loosely interchangeable:
Disease = physiological/psychological dysfunction
Illness = subjective state of a person who is aware of not being well
Sickness = state of social dysfunction, i.e., a role that the individual assumes when ill
Epidemiology - definition
Greek medical terminology:
Epi = Upon
Demos = People
Logos = Study of (Body of Knowledge)
- Study of how disease is distributed in populations & the factors that influence or determine this distribution
- Study of the distribution & determinants of health-related states or events in specific pops & the application of this study to control health problems
Epidemiology and clinical practice
Practice of medicine is dependent on pop data
- Clinical dx (e.g. clinical findings associated w/pathology in a large group of ppl)
- Prognosis (e.g. observations of large groups of pts w/same disease, stage, tx)
- Selection of appropriate therapy (e.g. studying efx of a tx in large groups of pts in randomized clinical trials)
Epidemic, Endemic, Pandemic

Epidemic
Greater # of cases of disease than expected in given
pop
Endemic
Constant presence of disease in a particular locality,
region or ppl
Pandemic
Epidemic that spreads thru human pops across a
large region, continent, or even worldwide
Epidemiology-objectives
To identify etiology or cause of a disease & factors that increase a person's risk of disease
To determine extent of disease found in the community
To study natural history and prognosis of disease
To evaluate both existing & new preventive and therapeutic measures and modes of health care delivery
To provide foundation for developing public policy & making regulatory decisions relating to environmental problems
Public health - definition
The science concerned w/safeguarding & improving physical, mental & social well-being of the community as a whole:
promoting health and efficiency through organized community effort
prolonging life
preventing disease
Epidemiology - a population science
Clinicians are concerned with the health of an individual
Epidemiologists are concerned with the collective health of the people in a community
Design Strategies
Descriptive
Analytic
Descriptive Epidemiology
Examines natural history & distribution of disease in pop
Observes its distribution in terms of Time, Person, & Place = Triad of Descriptive Epidemiology
Characteristics of Time
Cyclic fluctuations: annual occurrence, seasonal variation, daily occurrence during an epidemic
Changing or stable; trends (comparing today with x yrs ago)
Clustered (epidemic) or evenly distributed (endemic)
Characteristics of Person
Age
Gender
Ethnicity / Race
Marital status
Socio-economic status (SES)
Occupation
Characteristics of Place
Geographically restricted or widespread
Geographic variation: rural/urban, states
Multiple clusters or one
Physical location, such as relation to water or food supply, pollution
Determines extent of disease in community
Evaluates trends of disease w/in & among pops
Provides a basis for planning, provision & evaluation of health services
Provides data to be studied by analytic methods (helps form hypotheses)
Types of study:
Case Reports
Case Series
Correlation studies
Cross sectional studies = Community health survey
Descriptive vs Analytic
DESCRIPTIVE EPIDEMIOLOGY
Examining distribution of a disease in a pop, & observing the basic features of its distribution in terms of time, place, and person.
Typical study designs: community health survey = cross-sectional study; correlation studies
ANALYTIC EPIDEMIOLOGY
Testing a specific hypothesis about relationship of a disease to an alleged cause, by conducting a study that relates exposure of interest to disease of interest.
Typical study designs: cohort; case-control; clinical trial
Analytic Epidemiology
Examines the determinants of a disease in a population
What factors are associated with disease (risk factors)
what factors are causing the disease
Uses comparison groups
Triad of Analytic Epidemiology
The three phenomena assessed are:
Host factors
Agent
Environment
Agents
Nutrients
Allergens
Radiation
Physical trauma
Microbes
Psychological experiences
Host Factors
Degree to which the ind contacts and is able to adapt to the stressors produced by the agent
Genetic endowment
Immune System (e.g. vaccinated)
Nutritional Status
Environment
Temperature
Sanitation
Pollution of water / air
Population density
Disease Transmission
Modes of communication
Phenomena in the environment that bring host and agent together, such as:
1. Reservoir
2. Vector
3. Vehicle
Dynamics of Disease Transmission
Source or reservoir → Modes of transmission → Susceptible host
The process of spread of a disease agent through a population. Answers the questions: Who got it? and How did it spread (either from a common source or from person A to person B)?
Reservoir
Natural habitat of the infectious agent
Any person, animal, arthropod, plant, soil, or a combination of these in which an infectious agent normally lives and multiplies, on which it depends primarily for survival, and where it reproduces itself in such a manner that it can be transmitted to a susceptible host.
Modes of transmission
Direct transmission:
Direct contact
Contact with soil
Inoculation into skin or mucosa
Trans-placental (vertical)
Indirect transmission:
Vehicle-borne
Vector-borne
Air-borne
Vector
An animate intermediary in disease transmission.
Most vectors are arthropods such as mosquitoes,
fleas, or ticks.
Vehicle
Inanimate objects such as food, water, biologic
products (e.g. blood), and fomites that may
indirectly transmit an infectious agent from a
reservoir to a host.
Frequency Measurements
Incidence & Prevalence
This Lecture
Count and Rate

Incidence
Prevalence
Relationship between Incidence and Prevalence
Epidemiology
The study of health-related events or diseases in human populations, related to frequency, distribution and determinants, and the application of this knowledge to the control of health problems
Distribution
Think descriptive epidemiology:
Measurement of disease frequency & pattern of disease occurrence (who, where, when)
Distribution
Who?
Describe disease in terms of demographics e.g. sex, age,
race, ethnicity, SES
Where?
Geographic variation - rural/urban, physical location
When?
Annual occurrence, seasonal variation, daily/hourly occurrence
Objectives of Descriptive Epidemiology:
Permits evaluation of trends in health and disease within and among populations
Identification of emerging problems
Provides a basis for planning, provision and evaluation of health services
Identifies problems to be studied by analytic methods
Descriptive studies cannot be used to prove an association between 2 variables
[Figure: Heart Disease by Race]
Surveillance
Monitoring progress of a disease in a community.
A continued watchfulness over the distribution and trends of diseases through the systematic collection, consolidation and evaluation of morbidity (disease) and mortality (death) reports and other relevant information
Surveillance
Active Surveillance
Based on public health legislation; refers to daily, weekly or monthly contacting of physicians, laboratories and schools to actively search for cases. Used during outbreaks to identify additional cases
Passive Surveillance
Reporting of cases by health care providers in a periodic and consistent way, usually thru legislatively mandated reporting of certain conditions; not actively seeking new cases
Sentinel Surveillance
Monitoring rate of occurrence of specific conditions to assess stability or change in health levels of a population; done by specific orgs to know how disease patterns have changed
Count and Ratio
Count, Proportion, Ratio and Rates
Count
The simplest and most frequently performed quantitative measure in epidemiology
The number of cases of disease or other health phenomenon being studied
E.g. cases of Ebola during 2014 in Liberia
Ratio
Includes: Percentage, Rate and Proportion
Ratio = Numerator / Denominator
In epidemiology: Number of events in time T / Population in time T
Incidence
= # of new cases (incidents) during a time period
Prevalence
= # of existing cases at a point in time or over a period, aka total cases
Incidence rate
The rate of new events or incidents (disease, injury or death) in the population at risk during a period of time
Population at risk is counted at the beginning of the given time period, i.e. beginning of yr.
Incidence rate = (New cases / Population at risk) x 10^n
Incidence rate
One of the most important rates in epidemiology
Measures the rate at which people without a disease develop the disease during a specific period of time
Measured in a cohort study
Can be described as: cumulative incidence or incidence density
Cumulative Incidence:
The number of new cases of a disease during a specified period of time in a population at risk
Cumulative incidence = (New cases occurring in a given period / Population at risk during same time period) x 10^n
The population at risk is often an estimate
Example:
Among inpatients at a hospital during the month of June, a total of 12 patients acquired nosocomial infections. For the same month, the hospital had a total of 2400 patients admitted. The incidence for the month of June per 1000 patients is:
12/2400 x 1000 = 5 per 1000
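The June example can be checked with a few lines of Python. This is only a sketch: the function name `cumulative_incidence` and its `per` argument are made up for illustration and are not from the lecture.

```python
def cumulative_incidence(new_cases, population_at_risk, per=1000):
    """New cases divided by the population at risk, scaled to `per` persons."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / population_at_risk * per

# June example from the notes: 12 nosocomial infections among 2400 admissions.
june_rate = cumulative_incidence(12, 2400, per=1000)
print(f"{june_rate:.1f} per 1000 patients")
```

Changing `per` (100, 1000, 100,000) only rescales the result; it plays the role of the 10^n multiplier in the formula.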

Incidence Density
The true incidence of disease at any given point in time
The most ideal, although not practical, measure
Incidence density = (New cases occurring in a given period / Total person-time of observation) x 10^n
Calculation of Person-years for Incidence Density
Subject   | Follow-up                            | Total time
Subject A | -------------------                  | 2 years
Subject B | -------------X                       | 1 year
Subject C | -----------------------              | 2 years
Subject D | ----------------------------------X  | 3 years
X = developed disease
-- = time followed
Cases = 2; total person-time = 2 + 1 + 2 + 3 = 8 person-years
Incidence Density = 2 / 8 = 25 per 100 person-years
A Prospective Study of post-menopausal hormones and coronary heart disease
NEJM 313:1044, 1985
Population: 32,317 postmenopausal women
Cases of coronary heart disease: 90
Time period: 105,786.2 person-years
What is the Incidence Density of CHD in this study?
Incidence Density = 90 / 105,786.2 person-years
= 85.1 per 10^5 person-years
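Both incidence density calculations (the four-subject follow-up table and the hormone/CHD cohort) follow the same pattern, sketched below. The helper name `incidence_density` and the tuple layout of `follow_up` are illustrative choices, not part of the lecture.

```python
def incidence_density(new_cases, person_time, per=100):
    """New cases per `per` units of person-time."""
    if person_time <= 0:
        raise ValueError("person-time must be positive")
    return new_cases / person_time * per

# Follow-up table: (years followed, developed disease?) for subjects A to D.
follow_up = [(2, False), (1, True), (2, False), (3, True)]
person_years = sum(years for years, _ in follow_up)
cases = sum(1 for _, diseased in follow_up if diseased)
table_rate = incidence_density(cases, person_years, per=100)

# Hormone/CHD cohort: 90 cases over 105,786.2 person-years.
cohort_rate = incidence_density(90, 105_786.2, per=100_000)

print(f"{table_rate:.0f} per 100 person-years")
print(f"{cohort_rate:.1f} per 100,000 person-years")
```

Each subject contributes only the time actually observed, which is why the denominator is person-years rather than a head count.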
Prevalence rate
The proportion of persons in the population who have a particular disease at a given time
Prevalence rate = (All cases during a given time period / Population during the same time period) x 10^n
Total cases / total pop!

Prevalence:
2nd most common measure of disease frequency
Focus on chronic conditions
Measured in a cross-sectional study
Can be measured at a specific point in time (point prevalence) or over a specified period of time (period prevalence)
Point vs. Period Prevalence
Point prevalence examines prevalence at a single point in time
= status of disease in a population at a point in time
Period prevalence examines prevalence over a longer period
= proportion of a population that has the condition at some time during a given period, e.g. a year
Example
St. Maarten has a survey which shows that 20% of the adult
population over the age of 40 is diabetic. In St. Kitts the
figure is 10%. The statement that living in St. Maarten is
associated with a greater risk of becoming diabetic is
a) Correct
b) Incorrect, because the comparison is not based on rates
c) Incorrect, because no control or comparison group is used
d) Incorrect, because prevalence is used instead of incidence
Incidence and Prevalence
Prevalence does not say anything about risk of developing a disease
Incidence = new cases; tells about the risk & how well the control measures are working
Prevalence
Prevalence = total disease load
Used to know how many people need treatment, what supplies are needed, and the requirements to take care of the disease.
The Relationship between Incidence and Prevalence
The Prevalence Pot
Prevalence is related to:
Incidence of disease (I) and
duration of disease (D)
P ≈ I x D
If duration is shorter (e.g. the common cold), prevalence will be lower too
Paradox between Incidence and Prevalence
A disease can have a high incidence, but if there is rapid mortality or recovery the prevalence can be low
E.g. cold
A disease with low incidence may still have high prevalence if the disease is not associated with cure or death
Accumulated chronic conditions, esp in elderly
Diseases of brief duration are more likely to be missed by a prevalence study
Example
30% of all deaths from myocardial infarction occur within 24 hours of the onset of symptoms in people having no prior evidence of disease
Diseases of long duration are well represented in a prevalence study, even when their incidence is low.
Example: Crohn's disease
Incidence is about 2-7 per 100,000/year
Prevalence is more than 100 per 100,000
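The Crohn's figures can be read through the relation between prevalence, incidence and duration. The sketch below assumes a steady state (stable incidence and duration, rare disease), which is an assumption added here and not stated in the notes; the function names are illustrative.

```python
def steady_state_prevalence(incidence, mean_duration_years):
    """Approximate prevalence as incidence x average duration (P ~ I x D)."""
    return incidence * mean_duration_years

def implied_duration(prevalence, incidence):
    """Rearranged form of the same approximation: D ~ P / I."""
    return prevalence / incidence

# Crohn's disease figures from the notes, per 100,000 population.
low_incidence, high_incidence = 2, 7   # new cases per 100,000 per year
prevalence = 100                       # existing cases per 100,000 (lower bound)

shortest = implied_duration(prevalence, high_incidence)
longest = implied_duration(prevalence, low_incidence)
print(f"implied average duration: roughly {shortest:.0f} to {longest:.0f} years or more")
```

A low incidence paired with a prevalence this high is only possible if the average case lasts many years, which is the point of the example.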
Question:
If you want to determine the cost of treating diabetics in your country, do you need to know the prevalence or incidence of diabetes?
If you want to know if anti-smoking legislation has resulted in fewer cases of lung cancer, should you use incidence or prevalence?
What happens to incidence and prevalence if:
New effective treatment is initiated
Incidence doesn't change; occurrence has already taken place
Prevalence decreases (if the treatment cures the disease)
New effective vaccine gains widespread use
Incidence decreases
Number of patients dying from the condition increases
Incidence doesn't change
Prevalence decreases
Additional federal research dollars are targeted to a specific condition
Nothing changes unless a new effective drug is developed
Behavioral risk factors are reduced in the population at large
Incidence decreases
MCQ
A new chemotherapy treatment is developed that reduces death from leukaemia but does not produce recovery. Which of the following will occur?
a) Prevalence of the disease will decrease
b) Incidence of the disease will increase
c) Prevalence of the disease will increase
d) Incidence of the disease will decrease
e) Incidence and prevalence of the disease will decrease
This Lecture
Attack Rate
Morbidity Rate
Mortality Rate
Case Fatality Rate
Relative Risk
Attributable Risk
Attack Rates
Number of events among a population at risk in a period of time
A variant of an incidence rate; a cumulative incidence rate mainly used in epidemic situations
Applied to a narrowly defined pop over a limited time
Same formula as incidence rate!!!
Attack rate = (New cases occurring in a given time period / Population at risk during the same time period) x 10^n
Secondary Attack Rate
Number of new cases among contacts of known cases in specific groups
Secondary attack rate = (Cases among contacts of primary cases during the period / Total number of contacts at risk) x 10^n
Question: Calculate primary and secondary attack rates for the following epidemic of hepatitis A
Seven cases of hepatitis occurred among 70 children attending a day care center. Each infected child came from a different family. The total number of persons in the 7 affected families (including the children from the daycare) was 32. One incubation period later, 5 additional family members developed hepatitis.
1° attack rate = 7 / 70 = 10%
2° attack rate in the family members = 5 / (32 - 7) = 5 / 25 = 20%
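A short sketch of the hepatitis A calculation, mainly to make the denominators explicit. The variable names are illustrative; the numbers are the ones given in the question.

```python
def attack_rate(new_cases, at_risk):
    """Attack rate as a percentage of the population at risk."""
    return new_cases / at_risk * 100

daycare_children = 70
primary_cases = 7
family_members_total = 32      # includes the 7 primary cases
secondary_cases = 5

primary = attack_rate(primary_cases, daycare_children)

# Primary cases are already ill, so they are removed from the contacts at risk.
contacts_at_risk = family_members_total - primary_cases
secondary = attack_rate(secondary_cases, contacts_at_risk)

print(f"primary attack rate: {primary:.0f}%")
print(f"secondary attack rate: {secondary:.0f}%")
```

The common mistake is dividing the 5 secondary cases by all 32 family members; the 7 index children cannot be counted as susceptible contacts.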
Morbidity Frequency Measures
Incidence Rate
Attack Rate
Secondary Attack Rate
Point Prevalence
Period Prevalence
Mortality Frequency Measures
Crude Mortality Rate

Specific Mortality Rate
Standardized (=Adjusted) Mortality Rate
Mortality Rates
Rate of death in pop at risk
But usually the denominator is the midyear pop, which makes the rate an estimate & not an exact figure
Crude Mortality Rates
The proportion of the population dying every year
Crude mortality rate = (All deaths during a calendar year / Population at midyear) x 1,000
It is the actual measured rate for the whole population
Multiply by 1,000 only if the rate is asked per 1,000 persons
Example: Age Specific Mortality Rate
Age specific mortality rate = (Number of people who died in a particular age group / Total population of same age group during the same year) x 1000

Age group (years) | Population of Miami | Deaths in Miami | Age specific mortality rate (per 1000)
17 to 27          | 50,000              | 50              | 50/50,000 x 1000 = 1
28 to 47          | 150,000             | 750             | 750/150,000 x 1000 = 5
> 48              | 250,000             | 1000            | 1000/250,000 x 1000 = 4
Total             | 450,000             | 1800            | 1800/450,000 x 1000 = 4
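The Miami table can be recomputed from the population and death counts. This is a sketch only; the dictionary layout and the name `rate_per_1000` are illustrative.

```python
# Age group -> (population, deaths), as given in the Miami example.
miami = {
    "17 to 27": (50_000, 50),
    "28 to 47": (150_000, 750),
    "> 48": (250_000, 1000),
}

def rate_per_1000(deaths, population):
    """Deaths per 1000 persons in the given population."""
    return deaths / population * 1000

age_specific = {group: rate_per_1000(deaths, pop)
                for group, (pop, deaths) in miami.items()}

total_pop = sum(pop for pop, _ in miami.values())
total_deaths = sum(deaths for _, deaths in miami.values())
crude = rate_per_1000(total_deaths, total_pop)

for group, rate in age_specific.items():
    print(f"{group}: {rate:.0f} per 1000")
print(f"crude (all ages): {crude:.0f} per 1000")
```

The crude rate is a population-weighted average of the age-specific rates, so it depends on the age structure as well as on the rates themselves.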

Mortality Rates
Crude rates: summary rates based on actual number of events in a population.
When comparing regions or countries:
1. Specific rates: divide a population into more homogeneous subgroups based on age, sex, race, risk factors, cause, etc
2. Adjusted rates: summary rates in which an "as if" statistical procedure has been applied to remove the effect of differences in composition of the various populations (i.e. diff countries in mils). Standard population sizes are used.
Comparing Mortality Rates
2 populations can only be compared when using:
Age specific mortality rates or
Age-adjusted rates
More Mortality Rates
Age Specific Mortality Rate
e.g. Neonatal Mortality Rate, Infant Mortality Rate
Proportionate Mortality
Case Fatality Rate

Infant Mortality Rates
The number of deaths in the first year after live birth
Imp indicator of a country's level of health & devt
~10 mil infants per year world-wide
Infant mortality rate = (# Deaths of infants under 1 year old in a given year / Live births in same year) x 1000
Proportionate Mortality
The proportion of the overall mortality ascribed to a specific disease
Proportionate mortality = (Deaths assigned to specified disease in a time period / Total # of deaths from all causes during same period) x 100
Case-Fatality Rate
Deaths from a specific disease per number of persons with the disease
A measure of:
probability of death
severity of disease
benefit of therapy
Esp. used in acute infectious diseases
Case-fatality rate = (Deaths assigned to disease in a time period / Total # of people w/disease in the same time period) x 100
Example: Case Fatality Rate
In a population of 100,000 persons:
20 have disease X
In one year 18 die from that disease
Case Fatality is
18 / 20 = 0.9 or 90%
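The same numbers give very different answers depending on the denominator. The sketch below contrasts the case-fatality rate with a cause-specific mortality rate for disease X; the second calculation is added here for contrast and is not worked in the notes.

```python
population = 100_000
cases = 20
deaths = 18

# Case fatality: deaths among people who HAVE the disease.
case_fatality = deaths / cases * 100

# Cause-specific mortality: deaths from the disease in the WHOLE population.
cause_specific = deaths / population * 100_000

print(f"case-fatality rate: {case_fatality:.0f}%")
print(f"cause-specific mortality: {cause_specific:.0f} per 100,000")
```

Disease X is highly lethal for those who get it (90%) but rare, so it contributes little to overall mortality (18 per 100,000).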
Example: Calculation of Infant Mortality Rates
In year X 38,910 infants died and 3.9 million children were born
The IMR (deaths of children less than 1 year old per 1000 live births)
= (38,910 / 3.9 million) x 1000
≈ 9.95 per 1000 (3.9 million is a rounded birth count; 38,910 / 3,900,000 alone gives 9.98)
Estimating Risk: Measures of Association
Measures of disease frequency are the basis for comparison of pops w/diff exposures
To identify disease determinants we look for an association between exposure and the risk of developing disease
To compare risks between exposure (= a risk factor/causative factor) and disease
There are several measures of association
Presentation is often in a 2 x 2 table

              | Disease: Yes | Disease: No | Total
Exposure: Yes | a            | b           | a+b
Exposure: No  | c            | d           | c+d
Total         | a+c          | b+d         | a+b+c+d

RR = Relative Risk
OR = Odds Ratio
AR = Attributable Risk
ARP = Attributable Proportion
Relative Risk = RR = Risk Ratio
Compares risk of a particular event in 2 groups
How much more likely is one group to develop a disease or death than the other
RR = Absolute risk in the exposed group / Absolute risk in the unexposed group
RR = Incidence rate of exposed group / Incidence rate of unexposed group
= [a/(a + b)] / [c/(c + d)]
= (diseased & exposed / all exposed) / (diseased & non-exposed / all non-exposed)
Relative Risk
A measure of association; it indicates the likelihood of the outcome
How much the risk increases for a person with the risk factor compared to a person without the risk factor
Measures strength of association btwn factor & outcome
Example
50 out of 100 students who drink tap water get gastroenteritis during the semester (attack rate = 50%)
150 out of 300 students who don't drink tap water also get sick (attack rate = 50%)
The relative risk is 1, therefore there is no increased risk associated with drinking tap water.
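The gastroenteritis example maps directly onto the 2 x 2 table cells a, b, c, d. A minimal sketch (the function name `relative_risk` is illustrative):

```python
def relative_risk(a, b, c, d):
    """RR from a 2 x 2 table.

    a = exposed & diseased,   b = exposed & not diseased
    c = unexposed & diseased, d = unexposed & not diseased
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Drinkers: 50 sick of 100. Non-drinkers: 150 sick of 300.
rr = relative_risk(a=50, b=50, c=150, d=150)
print(f"RR = {rr:.1f}")
```

Both groups have a 50% attack rate, so the ratio is 1 even though three times as many non-drinkers fell ill in absolute numbers.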
Relative Risk
A risk ratio of 1 indicates identical risk in 2 groups
A risk ratio > 1 indicates that exposure gives increased risk (e.g. smoking - lung carcinoma)
A risk ratio < 1 indicates a protective factor against developing disease (e.g. sunscreen protects against skin cancer)
Attributable Risk = AR = Risk Difference
The risk of disease in the exposed group that can be considered attributable to the exposure
So it is the benefit that might happen if the risk factor is removed
The absolute effect of exposure in those exposed vs. not exposed
= the excess risk of disease (risk difference)
AR = Ie - Io (subtract!)
Ie is incidence in the exposed
Io is incidence in the unexposed
How many more cases in one group
Attributable Proportion / Risk %
The proportion of disease attributable to exposure
Also called: Attributable risk percent (ARP)
If the risk factor is removed, that proportion of disease would be prevented
ARP = [(Inc. for exposed group - Inc. for unexposed group) / Inc. for the exposed group] x 100%
Another formula: ARP = [(RR - 1) / RR] x 100%
Incidence is used to calculate RR, AR & ARP
RR: How much more likely; how much the risk for a patient who smokes is increased compared to a non-smoker.
AR: How many more cases; excess cases in the exposed group that can be attributed to smoking
ARP: Reduction in disease; benefit for pop if risk factor is removed
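A sketch tying RR, AR and ARP together. The incidences used (30 per 1000 in the exposed, 10 per 1000 in the unexposed) are hypothetical numbers chosen for illustration; they do not come from the lecture.

```python
def attributable_risk(ie, io):
    """AR = Ie - Io: excess incidence among the exposed."""
    return ie - io

def attributable_risk_percent(ie, io):
    """ARP = (Ie - Io) / Ie x 100."""
    return (ie - io) / ie * 100

def arp_from_rr(rr):
    """Equivalent form: ARP = (RR - 1) / RR x 100."""
    return (rr - 1) / rr * 100

# Hypothetical incidences (assumed for illustration only).
ie = 30 / 1000   # incidence in the exposed
io = 10 / 1000   # incidence in the unexposed

rr = ie / io
ar = attributable_risk(ie, io)
arp = attributable_risk_percent(ie, io)

print(f"RR  = {rr:.1f}")
print(f"AR  = {ar * 1000:.0f} excess cases per 1000 exposed")
print(f"ARP = {arp:.1f}% (check via RR: {arp_from_rr(rr):.1f}%)")
```

The two ARP formulas agree because dividing (Ie - Io) / Ie through by Io gives (RR - 1) / RR.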
Study Designs
This Lecture
Descriptive Studies
Case Report and Case series
Cross-sectional Studies
Correlational Studies (Scatterplots, Regression, Correlation Coefficient r, Coefficient of Determination r^2)
Types of Research Study Design
Descriptive Studies (Observational)
Case Reports
Case Series
Cross sectional studies
Ecological /Correlational studies
Analytic Studies (Observational)

Case-control
Cohort
Analytic Studies (Interventional)
Randomized controlled trial
Cross over trials
Types of Descriptive Studies
Case reports
Case series
Cross-Sectional Surveys
Correlational / Ecological studies

Case Reports
Most basic descriptive study
Describes the experience of a single patient
Quick and Cheap
Document unusual medical occurrences
May lead to the identification of a new disease
Case report: Example
The case of a 51-year-old Moroccan male admitted for a non-reducible right inguinal hernia in which surgical exploration showed the presence of a small bowel tumor that had migrated into his hernia sac. A histopathological examination of the tumor was in favor of a small bowel schwannoma.
Case Series
A collection of Case Reports
Same as the Case Report but:
Describes exp of a group of pts w/same dx
Is usually 5-10 people but can be up to 100
Case series: Example
Between Oct 1980 and May 1981, 5 cases of Pneumocystis carinii pneumonia were reported among young, previously healthy, homosexual men in L.A. These cases were studied further.
Such infections previously occurred only in older, immunosuppressed cancer patients
Case reports and series
May describe:
Previously described disorder involving a new population or subgroup
Known disorder with unusual clinical presentation or clinical course
Condition in which novel tx methods are used
Previously unknown condition
Case Reports/Series: problems/disadvantages
Often based on exp of only 1 or few pts
Presence of a risk factor may only be coincidental
Cannot test for the presence of a valid statistical association
There is no comparison group
Cross-Sectional Study
Also called Prevalence Surveys
Exposure & disease measured simultaneously
Provides a snapshot of population
Examples:
Obesity and TV viewing
Alcohol and CHD
Hypertension and physical inactivity
Cross-sectional survey of coronary heart disease

                      | Number examined | Number with CHD | Prevalence rate
Not physically active | 89              | 14              | 157.3/1000
Physically active     | 90              | 3               | 33.3/1000
Total                 | 179             | 17              | 95.0/1000

The data show an association btwn inactivity and CHD
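The prevalence column of the CHD survey can be recomputed from the counts. A sketch, with an illustrative dictionary layout:

```python
# Group -> (number examined, number with CHD), from the survey table.
survey = {
    "not physically active": (89, 14),
    "physically active": (90, 3),
}

def prevalence_per_1000(cases, examined):
    """Existing cases per 1000 persons examined."""
    return cases / examined * 1000

prevalence = {group: prevalence_per_1000(cases, examined)
              for group, (examined, cases) in survey.items()}

total_examined = sum(examined for examined, _ in survey.values())
total_cases = sum(cases for _, cases in survey.values())
overall = prevalence_per_1000(total_cases, total_examined)

for group, value in prevalence.items():
    print(f"{group}: {value:.1f} per 1000")
print(f"total: {overall:.1f} per 1000")
```

Because exposure and disease were measured at the same time, the gap between the two groups describes prevalence only; it says nothing about which came first.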
Cross-Sectional Studies - BENEFITS
Used to provide info on prevalence of disease (disease load) & health outcomes
Allows administrators to assess health status & needs of population
Used to formulate hypotheses ONLY; cannot establish a cause-effect association (for that, must do analytical study)
Cross-Sectional Surveys - PROBLEMS
Surveys gather prevalence, not incidence, data
Since both exposure and disease are assessed at the same time, you cannot determine whether exposure preceded or resulted from disease
"Chicken or egg" dilemma = no temporal sequence
Usually cannot be used to test a hypothesis
Correlational (Ecological) Studies
Investigating a possible exposure-disease relationship
Use populations as unit of analysis
Populations can be countries, counties, provinces etc
Uses databases from entire populations to compare frequency of a particular disease in relation to some factor
[Figure: the relation of obesity and depression; depression plotted against prevalence of obesity in U.S. states]
Ecological studies
Associations on population levels may not reflect associations on individual levels.
Example: Don't know whether individuals who are obese tend also to be depressed
Ecological Fallacy: Incorrectly assuming that an association on a pop level reflects an association on an ind level
E.g. telling a pt who is obese that he is more likely to get depression as well: WRONG.
Correlation Analysis
Results can be presented in the form of:
A scatter diagram: a graphical form
Line of least squares (linear regression): a descriptive form
Correlation coefficient: a descriptive form
Scatter Diagram
Scatter diagram = graphical method to display the relationship between two variables
Plots pairs of bivariate observations (x, y) on the X-Y plane
Y is called the dependent variable
X is called the independent variable
Scatter Diagram - example
A researcher believes that there is a linear relationship between BMI (Kg/m2) of pregnant mothers and the birth-weight (BW in Kg) of their newborn
The following data set provides information on 15 pregnant mothers who were contacted for this study
X = BMI; independent variable
Y = birth-weight; dependent variable

BMI (Kg/m2) | Birth-weight (Kg)
20 | 2.7
30 | 2.9
50 | 3.4
45 | 3.0
10 | 2.2
30 | 3.1
40 | 3.3
25 | 2.3
50 | 3.5
20 | 2.5
10 | 1.5
55 | 3.8
60 | 3.7
50 | 3.1
35 | 2.8

[Figure: scatter diagram of birth-weight (Kg, 0 to 4) against BMI (Kg/m2, 0 to 70)]
Is there a linear relationship between BMI and birth-weight?
Scatter diagrams are important for initial exploration of the relationship between two quantitative variables
In the above example, we may wish to summarize this relationship by a straight line drawn through the scatter of points
Regression
Although we could fit a line "by eye", e.g. using a transparent ruler, this would be a subjective approach and therefore unsatisfactory.
An objective, and therefore better, way of determining the position of a straight line is to use the method of least squares or regression.
Best fitting line = regression line
Regression
A mathematical model to describe the effect of one or more independent variables on a dependent variable. It gives a prediction equation
Linear regression: Effect of one independent variable (X) on one dependent variable (Y)
Multiple regression: Effect of many independent variables on one dependent variable
Linear Regression or Least Squares
The equation for the least squares regression line:
y = a + bx; PREDICTION EQN
a = intercept, b = slope
This will enable you to predict y given x
It will give an average, not an exact figure
[Figure: scatter plot, y axis 0 to 4, x axis 0 to 70]
Least Squares
Using this method, we choose a line such that the sum of squares of vertical distances of all points from the line is minimized.
These vertical distances, i.e., distances btwn y values & their corresponding estimated values on the line, are called residuals
The line which fits best is the regression line OR least-squares line
Least Squares
y = a + bx; a = 1.775, b = 0.033
This equation allows you to estimate birth-weight of other newborns when the BMI is given.
E.g., for a mother who has BMI = 40, what would the predicted birth-weight of the baby be?
y = a + bx = 1.775 + 0.033 (40) = 3.095, i.e. about 3.1 Kg
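For reference, a least-squares line can be fitted with the closed-form formulas for slope and intercept. The sketch below uses the 15 (BMI, birth-weight) pairs as transcribed in the scatter diagram example; because the table was recovered from a slide, the fitted coefficients need not match the quoted a = 1.775 and b = 0.033 exactly. The function name `least_squares` is illustrative.

```python
bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
birth_weight = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5,
                1.5, 3.8, 3.7, 3.1, 2.8]

def least_squares(xs, ys):
    """Return (intercept a, slope b) minimising the sum of squared residuals."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

a, b = least_squares(bmi, birth_weight)
predicted = a + b * 40   # predicted birth-weight for a mother with BMI = 40

print(f"y = {a:.3f} + {b:.3f} x")
print(f"predicted birth-weight at BMI 40: {predicted:.2f} Kg")
```

The prediction is the average birth-weight expected at that BMI, not an exact figure for any one newborn.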
Correlation Coefficient = r (-1 to +1)
Summarizes size & direction of a relnship btwn 2 variables
The relationship means that Y changes in a systematic way as X changes & vice versa
Relationship can be linear (straight line) or non-linear (other than straight line)
Difference between Correlation and Regression
Correlation coefficient, r, measures the strength of bivariate association or reln
Regression line = prediction equation that estimates the values of y for any given x
y = a + bx
Correlation Coefficient: r
A measure of strength of linear association between 2 variables, x and y
Most statistical packages & some hand calculators can calculate r
For the data in our example r = 0.94
r has some unique characteristics
Correlation Coefficient, r
r takes values between -1 and +1
r = +1; Maximum or perfect Positive correlation
r = -1; Maximum or perfect Negative correlation
r = 0 represents no linear relationship between the two variables
r > 0 implies a direct linear relationship
r < 0 implies an inverse linear relationship
The closer r comes to either +1 or -1, the stronger is the linear relationship
Coefficient of Determination = r^2
Another important measure of linear association between x and y (0 ≤ r^2 ≤ 1)
Measures the proportion of the total variation in y which is explained by x
For the data in our example: r = 0.94 (about 0.9355 before rounding), so r^2 = 0.8751
So 87.51% of variation in BW is explained by the independent variable x (BMI)
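Pearson's r and r^2 can be computed directly from the definition. The sketch below reuses the 15 (BMI, birth-weight) pairs as transcribed in these notes; since the table was recovered from a slide, the result may differ somewhat from the quoted r = 0.94. The function name `pearson_r` is illustrative.

```python
import math

bmi = [20, 30, 50, 45, 10, 30, 40, 25, 50, 20, 10, 55, 60, 50, 35]
birth_weight = [2.7, 2.9, 3.4, 3.0, 2.2, 3.1, 3.3, 2.3, 3.5, 2.5,
                1.5, 3.8, 3.7, 3.1, 2.8]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / math.sqrt(sxx * syy)

r = pearson_r(bmi, birth_weight)
r_squared = r ** 2   # proportion of variation in y explained by x

print(f"r = {r:.2f}, r^2 = {r_squared:.2f}")
```

r carries the direction of the relationship (its sign), while r^2 is always between 0 and 1 and is read as a share of explained variation.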
More on Scatterplots and Correlation
Scatter plots can show you a lot about the relationship between X and Y when using the correlation coefficient
A circular scatter plot, or an elliptical scatter plot with a horizontal line through its longest part, indicates no relationship between X and Y
When the line thru the longest part of the ellipse slopes either up or down, there's a relationship btwn X & Y
[Figure: correlations of various sizes; panels r = 1, r = 0.9, r = 0.5, r = 0]
Correlation and linear regression
Correlation: quantifies the degree to which two variables are related; it does not find a best-fit line (that is regression)
Linear regression: you have to decide what is the cause and what is the effect, as the regression line is determined as the best way to predict Y from X.
Reasons for Using the Correlational Study Design
Simple to design and conduct
Cost effective
Time saving
To discover new relationships that can later be explored using analytic studies
A study is published that demonstrates a strong relationship between breast cancer and dietary fat intake. The per capita fat intake is compared to breast cancer mortality using data from 21 countries. The Pearson correlation coefficient is 0.889, p<0.05. A patient asks you for advice. Can she reduce her risk of breast cancer by maintaining a strict low fat diet?
A. Yes, because the data clearly show an inverse correlation between dietary fat intake and breast cancer
B. Yes, because this type of study has high ecologic validity
C. No, because this type of study cannot be used to determine a cause and effect relationship
D. No, because the number of countries surveyed was too small (big)
E. No, because the study did not reach statistical significance (it did!)
Key points: Correlations
You can't draw causation from a correlation or any descriptive study
A common trick on the USMLE is to ask about causation / etiology between 2 factors based on a correlational or other descriptive study
The correct answer is that no causation can be proven from this type of study
Which of the following is the best approximation of the correlation coefficient between cigarettes per capita and CHD deaths as determined from these data?
A +1.20
B -0.22
C +0.72
D -0.85
E 0
The postnatal weight gain (Y) in kilograms over a specified
period of time was related to varied amounts of a formula
supplement (X), in unit doses, taken during the same
period for a sample of 50 infants. The following results
were obtained:
Regression equation: Y = 1.0 + 0.5 X

Correlation coefficient: r = 0.74, P<0.05
Which of the following is a reasonable conclusion?
A. Infants not taking the supplement are expected to gain 2 Kg
B. There is no correlation between weight gain and the
amount of supplement taken
C. An infant taking 2 units of the supplement during the
time period is expected to gain 2 Kg
D. An infant taking 5 units of the supplement during the
time period is expected to gain 3 Kg
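The answer options can be checked by plugging each dose into the regression equation Y = 1.0 + 0.5 X. A short sketch (the function name `expected_gain_kg` is made up for illustration):

```python
def expected_gain_kg(units, intercept=1.0, slope=0.5):
    """Predicted weight gain (kg) from the regression equation Y = 1.0 + 0.5 X."""
    return intercept + slope * units


for units in (0, 2, 5):
    print(f"{units} units -> {expected_gain_kg(units)} kg")
```

The predictions are 1.0 kg with no supplement, 2.0 kg at 2 units and 3.5 kg at 5 units, so only option C matches the equation. Option B is ruled out by r = 0.74 with P<0.05.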
Prevention
Prevention and Epidemiology
Descriptive Epidemiology + Analytic Epidemiology + An effective Intervention = Prevention
An injury story
22-year-old male, recent college graduate, driving at high speed without a seat belt, lost control, crashed head-on into another vehicle and was thrown through the windshield
Paramedics found him and drove him by ambulance to a hospital, where he was examined and assessed, then airlifted to a trauma center
C6-7 fracture dislocation with bilateral jumped facets - complete quadriplegia - surgical decompression - 10 days in trauma center - 6 months in rehab center
Long term requirements: daily attendant services, am and pm;
modified vehicle; wheelchair, adaptive devices; modified housing;
career choice change
An injury story - costs
Care Costs:
Acute care---------------- $ 65,000.00
Rehab care--------------- $ 90,000.00
Ongoing care
p.a.--------------------- $ 45,000.00
10 yr-------------------- $ 450,000.00
Foregone income
p.a.--------------------$ 71,000.00
10 yr--------------------$ 710,000.00
TOTAL COSTS ----------- $ 1.315 Million
Prevention cost: Almost nothing!
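The total can be reproduced from the line items: acute care plus rehab care plus 10 years of ongoing care and foregone income. A quick check (variable names are illustrative):

```python
YEARS = 10

acute_care = 65_000
rehab_care = 90_000
ongoing_care_per_year = 45_000
foregone_income_per_year = 71_000

# One-off costs plus ten years of recurring costs
total_cost = (
    acute_care
    + rehab_care
    + YEARS * (ongoing_care_per_year + foregone_income_per_year)
)
print(f"Total 10-year cost: ${total_cost:,}")
```

This gives $1,315,000, i.e. the $1.315 million on the slide.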
Definition of Prevention
Actions aimed at promoting and preserving health and at eradicating, eliminating, or minimizing the impact of injury, disease and disability
Levels of prevention
The concept of prevention is best defined in the context of levels,
traditionally called:
Primary: any axn to prevent disease onset in 1st place; disease process hasn't started
Secondary: take axn when disease has already started but sx haven't devd yet; catching disease at an early stage
Tertiary: disease has started w/sx progressing; aim is to limit its impact
Natural History of Disease
Timeline (time runs left to right):
A: Biologic onset
B: Disease detectable by screening
C: Symptoms develop
D: Recovery, disability or death
Detectable pre-symptomatic phase: from B to C
Primary Prevention:
Modifying risk factors or eliminating causes
Phase:
Before disease is present
Avoids the development of disease
Strategy:
Reduce or remove risk factors
Educate on health promotion
Good nutrition
Vaccinations
Regular exercise
Methods of Primary Prevention:
Health promotion
Life-style: Health education
Exercise, Avoiding Tobacco, Limit alcohol, Safe sex
Nutrition: Healthy diet; reducing salt, sugar and fat
Promoting Breastfeeding
Health promotion
Environment:
Clean water, vector control
Adequate housing and good waste disposal
Reducing environmental pollution and violence
Specific protection
Supplements: e.g. folic acid in pregnancy

Immunizations: to prevent infectious disease
Occupational safety
Automobile safety
Secondary Prevention
Finding disease in an asymptomatic person to improve prognosis
Phase:
Disease present
Strategy:
Diagnose the disease early
Prompt treatment
Arrest disease process
Prevent disability
Methods of Secondary Prevention
Early diagnosis and treatment
Screening tests
- Pap smear
- Colonoscopy
- Mammography
Regular checkups
Tertiary Prevention
Limiting complications and disability in symptomatic patients
Phase:
Disease present; clinical course after occurrence of
disease
Strategy:
Restore normal / near normal functioning
Reduce fatalities and complications
Methods of Tertiary Prevention
Disability limitation
Rehabilitation
Medical Treatment / Therapy
Prosthetics
Physiotherapy
The Prevention Plan
Motivates individuals to make decisions that are good for them, their family and community
Identifies an individual's current and future health risks through a detailed health risk assessment
Agencies within the Public Health
Service
National Institutes of Health (NIH)
Substance Abuse and Mental Health Services Administration (SAMHSA)
Administration for Children and Families (ACF)
Administration on Aging (AoA)
Agency for Healthcare Research and Quality (AHRQ)
Agency for Toxic Substances and Disease Registry (ATSDR)
Centers for Disease Control and Prevention (CDC)
Centers for Medicare & Medicaid Services (CMS)
Food and Drug Administration (FDA)
Health Resources and Services Administration (HRSA)
Indian Health Service (IHS)
The CDC
Involved in developing and applying prevention
and education activities to improve health of the
people of U.S.:
Disease prevention and control (especially infectious diseases)
Environmental health
Occupational safety and health
Health promotion
Some Challenges
for the 21st Century
Smoking
Obesity
Institute a Rational Health Care System
Eliminate Health Disparities
Clean up and protect the Environment
Prevention and Epidemiology
Descriptive epidemiology measures
The extent of certain disease (frequencies)
Who (where and when) is at risk (TPP)
The consequences of this disease (Incidence, Case Fatality,
Prevalence=Disease Load)
Analytic epidemiology
What are the risk factors involved
(= determinants of disease)
Prevent, detect, reduce
Cohort Studies
- Starts w/group of exposed ppl & non-exposed ppl then follow them to see how many ppl develop disease & how many did not
- Prospective: only go fwd from date of study
- Retrospective: events have already occurred; go back, but even when you do this, still start study w/exposure present or not & then see who developed disease & who didn't (very similar to case control)
- Calculate relative risk b/c known incidence rate, AKA CSR
- Adv: able to study temporal association (that exposure precedes disease can be clearly established), study rare exposures, can control confounders (preventing bias)
- Disadv: time consuming & expensive, high drop out rate
- Nested case control study: start w/exposed & unexposed, eventually get ppl w/disease & no disease & then go back into history to study other factors of interest; once these are grouped it develops into a case control
Case control studies
- Starts w/group of ppl already w/disease & group w/out disease then look at their histories to see how many were exposed to risk factor & how many were not
- Calculate odds ratio b/c no known incidence rate (for relative risk), AKA CCO
- Adv: easy to study rare disease, cheap, short duration (less time consuming)
- Disadv: difficult to establish temporal association, biased due to confounders
KNOW WHAT TYPE OF STUDY IT IS; MORE QUES ON THAT THAN CALCS!!!
Clinical trials: ppl w/disease used as reference pop; expal group divided into
- Drug group
- Placebo group
Placed by randomization to reduce selection bias, reduce effect of known & unknown confounders
Blinding
Single-blind: participants blinded, investigators know
Double-blind: both don't know
Triple-blind: both + analysts don't know who got drug & who got placebo
Phases of clinical trial:
1. Preclinical phase: animal studies
2. Phase I: healthy inds
3. Phase II: diseased inds, i.e. for a drug for HTN, is it really reducing the disease (HTN) in pt? Is it valid for that particular disease?
4. Phase III: randomized control trial
5. Phase IV: after drug released into market to see any other efx
Cross-over study: pts in group A & B receive diff drugs; then wait (wash out period) 1 yr, then switch drugs given to each
- Adv: each subject can compare themselves (as subjects become their own controls), need less ppl
- 2 types of analysis in clinical trials
  - Intention to treat: drug & placebo groups, aka ALL participants will be analyzed; even if non-compliant, taken into analysis; reveals effectiveness of drug! **More VALUE
  - Explanatory analysis: drug & placebo groups but ONLY compliant pts analyzed; reveals efficacy of drug! More reptd by drug companies
STUDY conducted to know association of alcoholism & cirrhosis of liver;
total of 300 agreed to participate in study; 100 pts w/cirrhosis selected &
matched w/200 ppl w/out cirrhosis; interviewed & asked about their
alcohol consumption in past. Of 100 pts who had cirrhosis, 80 were
alcoholics. In remaining 200 w/out cirrhosis, 40 were alcoholics. What is
the type of study conducted? Calculate measurement of risk.
= CASE CONTROL STUDY
Odds ratio = (odds of exposure in cases) / (odds of exposure in controls) = ad / bc
= [80*160] / [40*20] = 16
              Disease   No disease
Exposed          80         40
Unexposed        20        160
Total           100        200
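A small sketch of the cross-product calculation for this 2x2 table (the helper name `odds_ratio` and the a/b/c/d cell labels follow the usual textbook layout and are not from the slide itself):

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table.

    a = exposed cases,   b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    return (a * d) / (b * c)


# Cirrhosis (cases) vs no cirrhosis (controls), exposure = alcoholism
cirrhosis_or = odds_ratio(a=80, b=40, c=20, d=160)
print(f"Odds ratio = {cirrhosis_or}")
```

The result is 16: the odds of alcoholism among pts w/cirrhosis are 16 times the odds among those w/out cirrhosis.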
In 1945, study conducted to know efx of exposure of fetus to Dutch famine & its association w/childhood & adult illnesses. 500 pregnant women who were exposed to famine were IDd & 500 women who weren't exposed to famine were also studied. Children born to both groups were followed. It was observed that more kids born to famine-exposed women devd early CV & respiratory diseases.
= PROSPECTIVE COHORT STUDY
In 1998, study conducted to know efx of exposure of fetus to Dutch famine & its association w/childhood & adult illnesses. Records were acquired from 1944. 500 pregnant women who were exposed to famine were IDd, along w/500 women who weren't exposed to famine. From hospital recds, found that more kids born to famine-exposed women devd early CV & respiratory diseases.
= RETROSPECTIVE COHORT STUDY

Cohort Studies
Also called:
Longitudinal studies
Follow-up studies
Incidence Studies
Cohort Studies
One of the most useful observational study designs
Individuals are divided on basis of presence or absence of risk factors
Inds then followed over time to determine if they develop a specific outcome or disease
What is a cohort?
Group of individuals
sharing same experience
followed up for specific period of time
Examples:
Group of smokers
Occupational cohort of chemical plant workers
Cohort Studies
Results at follow up:
               Disease     Disease does    Totals   Incidence
               develops    not develop
Exposed           A             B           A+B      A/(A+B)
Not exposed       C             D           C+D      C/(C+D)
Incidences in the exposed group and the not exposed group are calculated
Cohort Study of smoking and Coronary
Heart Disease (CHD)
Results at follow up:
                          CHD        CHD does      Totals   Incidence
                          develops   not develop            /1000 per year
Smoke cigarettes             84         2916        3000       28.0
Do not smoke cigarettes      87         4913        5000       17.4
Select a group of 3,000 smokers (exposed) and 5,000 nonsmokers (not exposed)
All are free of heart disease at baseline
Both groups are followed for the development of
CHD
Incidence in both groups is compared
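The incidence column and the comparison between groups can be reproduced as follows. This is a sketch; the function name is illustrative, and the relative risk is an added step using the RR = incidence in exposed / incidence in unexposed definition used for cohort studies in these notes.

```python
def incidence_per_1000(new_cases, group_size):
    """Incidence expressed per 1,000 persons."""
    return 1000 * new_cases / group_size


smokers = incidence_per_1000(84, 3000)
nonsmokers = incidence_per_1000(87, 5000)
relative_risk = smokers / nonsmokers

print(f"Smokers:     {smokers:.1f} per 1000")
print(f"Non-smokers: {nonsmokers:.1f} per 1000")
print(f"Relative risk: {relative_risk:.2f}")
```

The incidences come out as 28.0 and 17.4 per 1000, matching the table, and the relative risk is about 1.61.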
Types of Cohort Studies
PROSPECTIVE (FUTURE
STUDIES)
RETROSPECTIVE
Prospective Cohort Study
Initiation of study occurs before occurrence of disease
Groups of exposed & unexposed inds are
monitored over time to assess devt of disease
incidence of disease in both groups is compared
Potential confounders documented
e.g. in a study of alcohol & lung cancer, smoking would be a confounder
Exposure -> Disease (?)
Exposure may have occurred at study entry
Outcome definitely has not occurred at study entry
Prospective Cohort Study of smoking relationship
to lung cancer
Identify a population of elementary school students
and follow them up
Identify those who smoke and those who do not
Observe who develops lung disease and who does not in future
= Concurrent Cohort
= Longitudinal Study
Cohort Study: Lead level and Affective Disorders
Exposed group: 100 children exposed to high levels of lead were followed for 15 years; 40 developed an affective disorder
Non-exposed group: similar group of 100 children not exposed to lead were followed; 5 of these children developed an affective disorder
               Affective Disorder
               present   absent
exposed           40       60
not exposed        5       95
What is the incidence of affective disorders among those exposed to high lead levels?
40/100 = 40% or 0.40
What is the incidence among those NOT exposed to high lead levels?
5/100 = 5% or 0.05
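As a follow-on step not shown on the slide, the two incidences can be compared as a relative risk. A sketch using exact fractions to avoid rounding noise:

```python
from fractions import Fraction

# Incidence = new cases / number at risk in each group
incidence_exposed = Fraction(40, 100)
incidence_unexposed = Fraction(5, 100)

# Relative risk = incidence in exposed / incidence in unexposed
relative_risk = incidence_exposed / incidence_unexposed

print(f"Incidence, exposed:   {float(incidence_exposed):.2f}")
print(f"Incidence, unexposed: {float(incidence_unexposed):.2f}")
print(f"Relative risk:        {relative_risk}")
```

Children exposed to high lead levels had 8 times the incidence of affective disorders of the non-exposed children.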
Retrospective Cohort Study
Initiation of study occurs after occurrence of disease / outcome of interest
Allows investigators to complete study in less
time (saves money)
Subject to bias
Since they depend on exposure data occurring
previously, info can be incomplete; information
on confounders may not be available
Retrospective Cohort Study
Exposure -> Disease (?)
Both exposure and disease have already occurred
Retrospective Cohort Study: Example
In 1963, a group of asbestos workers was identified from social security tax returns during 1948-1951
All deaths in the group 1948-1963 were investigated and compared to deaths of a group of cotton textile workers
An excess lung cancer mortality was revealed among the asbestos workers
= Non-concurrent Cohort
= Historical Cohort Study
FAMOUS COHORT STUDIES
The Framingham Study
One of the most important and best-known cohort studies of cardiovascular disease, begun in 1948
Population 30,000
Eligibility: age between 30 and 62 years of age
Many exposures: smoking, obesity, elevated blood pressure,
elevated cholesterol levels, low levels of physical activity
New coronary events were identified by examining the
study population every 2 years and by daily surveillance of
hospitalizations at the only hospital in Framingham
The Framingham Study -hypotheses
The incidence of Coronary Heart Disease (CHD) increases with age.
It occurs earlier and more frequently in males
Persons with hypertension develop CHD at a greater rate than those
who are normotensive
Elevated blood cholesterol level is associated with an increased risk
for CHD
Tobacco smoking and habitual use of alcohol are associated with an
increased incidence of CHD
Increased physical activity is associated with a decrease in the
development of CHD
An increase in body weight predisposes a person to the
development of CHD
An increased rate of development of CHD occurs in patients with
diabetes mellitus
Cohort studies of Special Exposures
Atomic Bomb Casualty Commission: Hiroshima and Nagasaki survivors (effects of radiation)
Dutch famine survivors (effects of starvation)

Cohort studies: Childhood Health and Disease
Fetuses exposed to radiation from atomic bombs in Hiroshima and Nagasaki during World War II
Followed up for development of cancer and other health problems resulting from intrauterine exposure to radiation
Exposure dose was calibrated for the survivors on the basis of how far the person was from the point of bomb drop and the nature of the barriers that were present
Aim: relate risk of adverse outcome to the radiation dose that was received
Cohort of pregnancies during Dutch Famine in World War II
Aim: identify cohorts who were exposed to the severe famine at different times in gestation and compare them with each other and with a non-exposed group
OCCUPATION-BASED COHORTS
British Doctors Study (Doll: smoking)
Nurses' Health Study (Speizer, Willett: many issues)
London civil servants (Marmot: SES)
Taiwanese civil servants (Beasley: chronic hepatitis & liver cancer)
Advantages of Cohort Studies
The temporal sequence btwn exposure & disease is

clearly established
Well suited for assessing efx of rare exposures
Incidence of disease can be calculated
Can examine multiple outcomes of a single
exposure
True estimate of risk can be calculated - RR
Disadvantages of Cohort Studies:
Time
Money: expensive
Subject to loss of follow up
Added: data may not be completely reliable or complete, i.e. unable to retain all records
When not to do a cohort study?
No clear distinction between exposed and not exposed
Rare diseases: not enough people
Chronic disease: long gap btwn exposure & outcome, so more costly! The longer the follow up, the harder it is to keep track of participants
Bad records, unreliable data for retrospective cohort study
5 Steps in cohort study
1. Choose the design

2. Selection of exposed group
3. Selection of comparison group
4. Follow up
5. Analysis and interpretation
Choice of Cohort Design
Depends on the study question

Food handling and gastroenteritis: retrospective study
design
To investigate the association between hyperlipidemia

and coronary heart disease: prospective study design
using community cohort (Framingham Study)
Selection of the Exposed Population
For diseases with common exposures: highly
compliant and motivated participants are
chosen e.g. doctors, nurses, union members,
veterans
For diseases with rare exposure: usually find high
exposure groups (uranium workers, Chernobyl
residents)
The higher exposure should lead to relatively higher
incidence of the disease
Selection of the Comparison Group
The comparison group must be free from the exposure, but otherwise similar to the exposed group
In case of a special exposure group (asbestos workers), an external comparison group (e.g. general population) should be used
McMichael A. Mortality among rubber
workers: Relationship to specific jobs. J
Occup. Med 18:178,1976
Rubber workers at a tire manufacturing plant in
Akron, Ohio were followed for development of
disease and causes of death
Comparison group was from the general US
population, matched for age and sex
All cause mortality for rubber workers was only 82%
of general population
This is an example of the healthy worker effect: people fit enough to be employed are healthier on average than the general population, which includes people too ill to work
This can be addressed by using workers from the same establishment who are not exposed to the risk factor as the comparison group
e.g. within the same company compare factory floor workers (exposed) and office workers (not exposed)
Hawthorne Effect
When individuals participating in a study change behavior (mainly towards positive) for a temporary period of time, due to the fact that they are being observed
Sources of Data
Need accurate and readily available data, can be
difficult for retrospective cohort studies
Usual sources: e.g. medical records, death certificates, interviews, questionnaires
Exposure information: pre-existing records are more objective than patient interviews
Outcome Data
Death certificates
Hospital records
Periodic health exams of cohort (e.g. Framingham Heart Study)
Questionnaires: validate with medical records (Nurses' Health Study)
Issues in a cohort study
Bias
Loss to follow up
Non participation
Bias in cohort study
Bias is less of a concern in cohort studies than in case-control studies
Recall bias rare, but misclassification and selection bias can still occur
Lost to Follow Up
Since cohort studies follow people over a period of time, participants can move or become non-compliant
A study that loses >25% of participants is flawed
The study design should use stable, compliant individuals who are committed
Case Control Studies
Types of Analytic Studies
Observational
Case-control
Cohort
Experimental
Randomized control trial
Case-control Study
People diagnosed as having a disease (cases) are compared with persons who do not have the disease (controls)
The purpose is to determine if the 2 groups differ by exposure
It compares cases and controls with regard to the exposures in their past
Case Control Studies
Study which involves identifying patients who have the outcome of interest (cases) and control patients who do not have that same outcome, and looking back to see if they had the exposure of interest
The exposure could be some environmental factor, a behavioral factor, or exposure to a drug
Case-control Study Design - benefits
Rare diseases cannot be analyzed easily using another approach; case control study is best
Chronic diseases (e.g. cancer) have long latency periods; case control study is suitable
Time & money issues: cost efficient & less time to complete
Many exposure factors can be studied at the same time
Case-Control Design - issues
Difficulties choosing appropriate controls
Cannot get true estimate of risk (relative risk cannot be calculated here b/c we don't have incidence, the # of new cases; instead calculate odds ratio)
Issues w/temporal association (which came first & which came later; this can be established in a Cohort Study)
Strong potential for bias
Confounding
Design of Case-control Studies
I. Definition and selection of cases

II. Selection of controls
III. Ascertainment of disease & exposure status
IV. Analysis by calculating odds ratio (vs. relative risk for cohort)
I. Definition and Selection of Cases
The definition must be specific

e.g. meningioma, not brain tumor
Use standardized diagnostic criteria

II. Selection of Controls
Selection of an appropriate control group is the most difficult issue in case-control design
They should be comparable to the source population of the cases, including exclusions and restrictions
No control group is optimal
Selection of Controls: multiple controls
Investigators usually use 2 to 4 control groups, selected in different ways.
Since cases are rarer than controls, and we choose cases and controls, we can have more than one control per case to improve the statistical power of our study.
Community Controls
Neighborhood controls
Best friend control (same habits + limiting
confounders)
Spouse or sibling control
Selection of Hospitalized Controls
Easily identified and readily available

Medical records & health histories available
Less non-response
Confounders: Hospitalized patients are more
likely to be smokers, alcoholics and with other
high risk behaviors
III. Ascertainment of Disease and
Exposure Status
Disease:
Hospital records, case-registries, pathology log books
etc
Exposure:
Interview, mail questionnaires, medical records etc
IV. Stratified Analysis
Create strata of the confounding variable

If sex is a confounder then analyze men and women
separately
If age is a confounder then analyze data separately for each
age group
Disadvantage:
It is extremely cumbersome
Difficult to control for more than 1 confounder at a time
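A sketch of what stratifying on a confounder looks like in practice. The counts below are hypothetical, invented purely for illustration, with sex as the assumed confounder; the helper `odds_ratio` uses the cross-product ad/bc.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, b = exposed cases/controls; c, d = unexposed."""
    return (a * d) / (b * c)


# HYPOTHETICAL counts (a, b, c, d) per stratum, for illustration only
strata = {
    "men": (30, 20, 10, 20),
    "women": (10, 20, 10, 60),
}

# Crude table: add the strata together cell by cell
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)

# Stratum-specific odds ratios: analyze men and women separately
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

print(f"Crude OR: {crude_or}")
for name, value in stratum_ors.items():
    print(f"OR in {name}: {value}")
```

In this made-up example the crude OR is 4.0 but the OR within each stratum is 3.0, so part of the crude association is due to the confounder. Each additional confounder multiplies the number of strata, which is why the method becomes cumbersome.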

Case-control Study and the
Odds Ratio
Incidence can not be derived in a case-control

study
The estimate of relative risk (odds ratio) can

be calculated
ODDS RATIO
Example
Of 200 patients in the hospital, 50 have lung cancer. Of these 50 patients, 45
are smokers. Of the remaining 150 hospitalized patients who do not have lung
cancer, 60 are smokers. This information can be used to calculate the odds ratio
for smoking and the risk of lung cancer.
                     Disease
Exposure        LC (n = 50)    No LC (n = 150)
Smokers             45              60
Non-smokers          5              90
ODDS RATIO
                Cases (with LC)   Controls (without LC)
smokers             A=45              B=60
nonsmokers          C=5               D=90
Odds Ratio = (odds of exposure among cases) / (odds of exposure among controls)
OR = (A/C) / (B/D) = (A)(D) / (B)(C) = (45)(90) / (60)(5) = 13.5
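A short check that the two ways of writing the odds ratio agree for this table: the ratio of the two odds of exposure, and the cross-product shortcut (variable names are illustrative).

```python
# Cell counts from the lung cancer table
A, B = 45, 60  # smokers: cases, controls
C, D = 5, 90   # non-smokers: cases, controls

odds_exposure_cases = A / C      # 45 smokers for every 5 non-smokers
odds_exposure_controls = B / D   # 60 smokers for every 90 non-smokers

or_from_odds = odds_exposure_cases / odds_exposure_controls
or_cross_product = (A * D) / (B * C)

print(f"Odds of exposure, cases:    {odds_exposure_cases:.2f}")
print(f"Odds of exposure, controls: {odds_exposure_controls:.2f}")
print(f"OR (ratio of odds):         {or_from_odds:.1f}")
print(f"OR (cross-product):         {or_cross_product:.1f}")
```

Both forms give 13.5.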
Analysis of Case-control Studies
                  Case   Control
Exposure: Yes      a        b
Exposure: No       c        d
Odds Ratio = ad / bc
Odds Ratio (OR)
A ratio that measures the odds of exposure for cases compared to odds of exposure for controls
OR Numerator: Odds of exposure for cases
OR Denominator: Odds of exposure for controls
Interpreting the Odds Ratio for CHD and smoking
Odds Ratio = 1.62
Those with CHD have 1.62 times the odds of being smokers compared with those without CHD
The odds of exposure for cases are 1.62 times the odds of exposure for controls.
Bias
Bias is any systematic error in an epidemiologic study that results in an incorrect estimate of the association btwn exposure & risk of disease
All studies, but esp case-control studies, have potential for bias
The efx are difficult to evaluate in the analysis
Bias should be eliminated, if possible, when the study is designed
Bias
Selection bias
Recall bias
Interviewer bias
Experimenter expectancy
Misclassification bias
Measurement bias
Selection Bias
= sampling bias
Samples selected for cases and controls differ in their properties
Sample selection may involve pre- or post-selection that preferentially includes or excludes certain kinds of subjects, thereby affecting the results
Selection Bias - examples
A case control study for heart disease and smoking: cases are selected from a community, controls are selected from a health club.
A case control study for endometrial cancer and hormone replacement therapy: cases are post-menopausal women and controls are younger women. Controls should actually be taken from post-menopausal women w/out endometrial cancer*** - THIS IS HOW we correct for selection bias
Recall Bias:
Recall bias occurs whenever inds with a particular adverse outcome remember their previous exposure experience differently from those who aren't similarly affected
Ex: People who are sick tend to think about possible causes for their illness
Recall bias can lead to over or underestimation of risk
Recall Bias - example
A study of prenatal infections & congenital malformations:
Cases: mothers of children with congenital malformations
Controls: mothers of children without congenital malformations
Mothers of children with congenital malformations remembered infections during pregnancy better
Interviewer Bias
Interviewers who are aware of the study hypothesis are likely to question cases and controls differently
More probing questions may be asked of cases
Interviewer may unconsciously sabotage the process
Experimenter expectancy
Pygmalion effect
Experimenter's expectations are communicated to subjects, unintentionally
The subjects then produce the desired effect

Misclassification and Measurement Bias
Misclassification: Subjects may be erroneously categorized with respect to exposure or disease status
Measurement: Method of collecting information was flawed
Control of Bias
Selection bias: controls picked from same source as cases, use motivated individuals
Recall bias: use a hospitalized control group
Interviewer bias: highly trained personnel, blinded to study hypothesis, standardized questionnaires etc
Misclassification bias: use standard sources to validate
Confounding
A potential confounder is a variable that is known to be associated with the outcome (effect), and also with the exposure, even though it is not the variable under study
Diagram: the Confounder is linked to both Exposure and Disease
Confounders
Common confounders include age, gender, tobacco, alcohol, socio-economic status
e.g. A case-control study shows an association between decreased level of physical activity and increased risk of MI. Could age be a confounder?
Controlling for Confounding
Study design:
Matching
Restriction
Analysis:
Stratified analysis
Multivariate analysis
Matching
Cases and controls are matched by usual
confounders (e.g. age, sex, SES, smoking, alcohol
etc.) so that these factors are equally distributed in
both groups and will not confound the association
between the variables
Disadvantage: It can be very difficult and expensive to find a perfect match for each case
Restriction
Another way to reduce the effect of confounders in a study is
to place restrictions on the study subjects
If smoking can be a confounder, then only enroll non-smokers
If age is a confounder then place age restrictions

Evaluation of a Case-control Study
Was the study design appropriate?

How were cases diagnosed and selected?
How were controls selected?
Did the investigators identify areas of bias and
confounding?
How was bias minimized?
How was confounding controlled?
Randomized Clinical
Trials
Intervention (Experimental) Studies
Also known as the clinical trial
Similar to a cohort study as individuals are studied on basis of exposure
Main difference btwn observational & intervention studies is that the investigator in an intervention study has full control over exposure received by the participants
In RCT, tx is assigned by randomization

Randomized Clinical Trial (RCT) vs. Cohort Study
RCT: Population -> Volunteers -> RANDOMIZE -> intervention or control -> outcome in each group
Cohort: Population -> Volunteers -> exposed or non-exposed (no randomization) -> outcome in each group

Types of Intervention Studies
Clinical trial (Therapeutic)

Does the agent or procedure diminish symptoms
(disease) or decrease mortality from disease in a group
of individuals?
e.g. most drug trials and therapy trials
Community trial (Preventive / Prophylactic)

Does the agent or procedure decrease the incidence of
disease in a given community?
e.g. most vaccine trials
Uses of RCT
Evaluate new drugs

Evaluate new treatment procedures
Testing efficacy of new health care
program
Assessment of preventive measures
Assessment of new program for screening
Assessment of new ways to deliver health
services
Design of RCT
Reference population
Experimental population
Participants
Treatment Allocation (Randomly)
Treatment group Comparison group(s)

Crossover Design
The subjects get both treatments in sequence
Each subject serves as his / her own control
A subject is randomly assigned to a specific treatment order
Some subjects will receive the standard therapy first, followed by the new therapy (A,B). Others will receive the new therapy first, followed by the standard therapy (B,A)
Crossover Design-planned
Conducting a Clinical Trial
Formulate the hypothesis
Choose sample size and select participants
Do necessary exclusions
Random assignment
Outcome measurement
Selection of Study Population
Reference population: general group to whom results will be applicable
Experimental population: actual group in whom study is conducted - must be of adequate sample size and have the potential to reach endpoints
Individuals are then assigned randomly to treatment and comparison groups
Allocation of Study Regimens
Assignment to tx groups should occur after study pop is chosen & informed consent has been obtained
Randomization tables & computer generated randomization used most frequently
Block randomization can be used when you wish to maintain equal #s of a pop characteristic in each group, e.g. gender (equal # of men & equal # of women in both drug A & drug B)
Block Randomization
Study population n=1200 (100 women & 1100 men)
Randomization occurs after assignment into male and female blocks
100 women: Drug A n=50, Drug B n=50
1100 men: Drug A n=550, Drug B n=550
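One way the block scheme above could be carried out, as a sketch only: shuffle each block separately, then alternate the two arms down the shuffled list. The participant IDs, the fixed seed and the function name are all assumptions made for illustration.

```python
import random
from collections import Counter


def randomize_within_blocks(blocks, arms=("Drug A", "Drug B"), seed=0):
    """Shuffle each block, then deal participants to arms in turn.

    Dealing in turn after a shuffle keeps the arms equal in size
    within every block while leaving the assignment random.
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    assignment = {}
    for participants in blocks.values():
        shuffled = list(participants)
        rng.shuffle(shuffled)
        for position, participant in enumerate(shuffled):
            assignment[participant] = arms[position % len(arms)]
    return assignment


# Hypothetical participant IDs: W = woman, M = man
blocks = {
    "women": [f"W{i}" for i in range(100)],
    "men": [f"M{i}" for i in range(1100)],
}

assignment = randomize_within_blocks(blocks)
counts = Counter((pid[0], arm) for pid, arm in assignment.items())
for (sex, arm), n in sorted(counts.items()):
    print(sex, arm, n)
```

Each arm ends up with 50 women and 550 men, as in the diagram. Randomizing all 1200 together would not guarantee that split.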

Why do we randomize?
Reduces bias due to known and unknown confounders - should have equal numbers of potential confounders in all groups
Reduces bias - no selection bias
Blinding
Single blind: only experimenter knows assignment of subjects; subjects don't
Double blind: neither experimenter nor subjects know the assignments
Triple blind: subjects, experimenter & data analysts are blind to assignments
Blinding
A double blind trial (or triple-blind) provides the best protection against bias
When the study is not conducted blind it is important to scrutinize it carefully for bias: are the groups followed with equal intensity for evaluation of the outcome?
Why Choose a Control Group?
All intervention studies use a control group

Show that the new treatment is truly effective
The control group can be compared in various
ways:
No intervention vs. Intervention
Placebo treatment vs. Real Treatment
Standard care vs. New care
Uncontrolled trials
Issues arising from not using a control group:
Predictable improvement (e.g. headache/migraine goes away within a day or two)
Fluctuation of disease severity
Hawthorne effect
Predictable Improvement
In many diseases, individuals who are sick will

recover without treatment e.g. common cold,
many other viral infections, headache
If no control group is used, the treatment may

appear to work when in fact it has not
Fluctuation of Disease Severity
Many diseases have a clinical course marked by

exacerbations and remissions, e.g. Crohn's Disease,
Multiple Sclerosis & Migraine
Treatment may be perceived to have a beneficial

effect, when in reality the lessening of symptoms
was part of the natural history of the disease
Hawthorne Effect
Individuals enrolled in a study will change

their behavior solely because they are being
watched as study participants
improvement in sx may just be b/c participant

adopted a healthier lifestyle & not b/c of drug.
This would be difficult to know without a
control group.
Types of controls
Historical control: from past
Concurrent non randomized control
Concurrent randomized control - Preferred
Controls
Must be a concurrent control group
Must be randomized control
Should receive a placebo or standard

treatment rather than no treatment
Follow up and testing should be identical

Factors that can affect the outcome of RCT
Errors in hypothesis testing
Sample size
Post randomization changes in groups
Analysis of data
Post-randomization Changes in
Comparison Groups
Migration bias:
Study participants may drop out, switch
Tx groups, become non-compliant
Compliance bias:
Inds in 1 tx arm may drop out at higher
rates due to factors such as side efx
Analysis of Data
Intention to treat (preserves random

allocation, simulates real world experience)
Explanatory (per-protocol): analyzes only those who
actually take treatment
Intention to Treat
Study population
              Drug A                              Placebo
Randomized    1000                                1000
Compliance    200 non-compliant, 800 compliant    1000 compliant
Cured         250                                 250
Intention to treat means that the cure rate for Drug A

is calculated as 250/1000 = 25%
The cure rate for placebo arm is 250/1000 = 25%
Drug A has no effect

Explanatory
Study population
              Drug A                              Placebo
Randomized    1000                                1000
Compliance    200 non-compliant, 800 compliant    1000 compliant
Cured         250                                 250
Explanatory analysis means that the cure rate for Drug A is calculated as
250/800 = 31%
The cure rate for placebo arm is 250/1000 = 25%
Drug A is more effective than placebo
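The two analyses above differ only in the denominator used for the Drug A arm. A minimal sketch of that arithmetic, using the numbers from the example (the function and variable names are illustrative, not part of the notes):

```python
def cure_rate(cured: int, denominator: int) -> float:
    """Return the cure rate as a percentage."""
    return 100.0 * cured / denominator

# Numbers from the Drug A arm in the example above.
randomized = 1000   # everyone allocated to Drug A
compliant = 800     # those who actually took Drug A
cured = 250         # cures observed in the Drug A arm

# Intention to treat: denominator is everyone randomized.
itt_rate = cure_rate(cured, randomized)
# Explanatory: denominator is compliant subjects only.
explanatory_rate = cure_rate(cured, compliant)
# Placebo arm: all 1000 were compliant, so both analyses agree.
placebo_rate = cure_rate(250, 1000)

print(f"Intention to treat: {itt_rate:.2f}%")
print(f"Explanatory:        {explanatory_rate:.2f}%")
print(f"Placebo:            {placebo_rate:.2f}%")
```

The explanatory figure (31.25%, rounded to 31% in the notes) looks better than placebo only because the 200 non-compliant subjects were dropped from the denominator.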

Important
Drug trials in the medical literature
should always be analyzed and
reported by the intention-to-treat method
Intention to Treat: why include non-compliant subjects?
Attempting to account for noncompliance by

excluding noncompliant subjects can bias the
treatment evaluation
In clinical practice, some patients are not fully
compliant
Compliant subjects usually have better outcomes
than noncompliant subjects, regardless of tx
Outcome:
Efficacy vs. Effectiveness
Efficacy = ability of tx to work in ideal study setting
Effectiveness = ability of tx to work under realistic

circumstances by using intention to treat analysis;
preferred over efficacy
For efficacy trials, explanatory analysis can be used
but for effectiveness trials we use intent to treat
Multi-center trials
There may not be enough patients at one given
center to give a big enough sample for a study
Many centers conduct a study using the same

intervention and placebo
Trials of N equal to 1:
Clinical trial done w/ind patients

Pt given tx or placebo randomly at diff times
A record is kept of simple outcomes like
symptom score, relief etc
Only possible in conditions which occur
frequently & resolve quickly e.g. migraine,
asthma
Phases of Clinical Trials
Preclinical Phase: Animal studies
Phase I: Initial testing in healthy human volunteers

following animal studies
Identify dose limiting toxicities, tolerated doses, describe
pharmacology (metabolism, excretion)
Phase II: testing in subjects w/disease to determine activity
& therapeutic efficacy; validate toxicity & dosage data
Phase III: Randomized trials for comparison w/Standard

therapy
Phase IV: Studies done after drug/tx has been marketed to

gather info on drug's effect in various pops & any side
efx associated w/long-term use
Declaration of Helsinki
Declaration of Helsinki 1964 (WMA)-

Document on research ethics
Informed consent must be obtained from all

participants involved in human experiments
Ethical Aspects
Informed consent
Protecting the interests of the patient
Withholding treatment known to be effective
Monitoring for toxicity and adverse effects
Stopping rules
When to withdraw a patient from study
Informed Consent
Patients must be aware of the study hypothesis

They must understand that they can be assigned to
treatment or placebo arms
They must be told all possible consequences of
participation
Minors can only enroll with a guardian's consent
An ethics board must oversee the study
Stopping Rules
Guidelines for deciding when a trial should be

modified or terminated:
Over time, new knowledge about a disease or treatment
may come to light
New treatments may become available
If the results show a sustained statistical association,
it is unethical to withhold treatment from the
placebo arm
DeMets D, Hardy R, et al. Statistical aspects of early
termination in the Beta-Blocker Heart Attack Trial. Control Clin
Trials 5:362, 1984
Beta-Blocker Heart Attack Trial was a randomized double-blind
study comparing propranolol with placebo in 3837 patients with
a recent myocardial infarction
The trial was terminated by an external monitoring board 9
months ahead of schedule
The propranolol group had a 26% reduction in mortality
compared to placebo (p = 0.005)
Screening
Screening
A strategy used to identify disease in an unsuspecting population
Tests are performed mainly on those without any clinical

indication of disease, the apparently well
Test must be simple, rapid and preferably inexpensive

Screening
Basic purpose is to detect disease early in a large group of
apparently well persons.
This enables a diagnostic workup and, if disease is confirmed,
treatment intended to reduce mortality and
suffering from this disease
Ex: Pap smears (cervical cancer), Colonoscopy (colon
cancer), Mantoux tests (TB), mammograms (breast cancer)
& PSA (prostate cancer) tests
Interpretation of
Diagnostic Procedures
Two aspects of measurement that are
crucial in evaluating laboratory
tests, physical maneuvers, or any
diagnostic procedure:
Reliability
Validity
Reliability aka Reproducibility
Whether a lab test consistently gives the same value when

multiple tests are conducted on the same sample
Inter-rater reliability: degree of agreement among raters
Test-retest reliability: measure of reliability obtained by
administering the same test twice over a period of time to a group of
individuals
A good kappa value, indicating a reliable test &
reliable raters, is at least 0.75 (75%).
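The notes do not give the kappa formula. A minimal sketch, assuming Cohen's kappa for two raters with a positive/negative rating, kappa = (observed agreement - chance agreement) / (1 - chance agreement). The counts below are hypothetical and chosen only for illustration:

```python
def cohens_kappa(both_pos, r1_pos_r2_neg, r1_neg_r2_pos, both_neg):
    """Cohen's kappa for two raters and a binary rating."""
    n = both_pos + r1_pos_r2_neg + r1_neg_r2_pos + both_neg
    # Observed agreement: proportion of subjects rated the same way.
    observed = (both_pos + both_neg) / n
    # Marginal proportion of positive ratings for each rater.
    r1_pos = (both_pos + r1_pos_r2_neg) / n
    r2_pos = (both_pos + r1_neg_r2_pos) / n
    # Agreement expected by chance alone.
    expected = r1_pos * r2_pos + (1 - r1_pos) * (1 - r2_pos)
    return (observed - expected) / (1 - expected)

# Hypothetical data: 100 samples read by two raters.
kappa = cohens_kappa(both_pos=40, r1_pos_r2_neg=10,
                     r1_neg_r2_pos=5, both_neg=45)
print(f"kappa = {kappa:.2f}")
```

With these made-up counts the raters agree on 85% of samples, but half of that agreement is expected by chance, so kappa falls below the 0.75 threshold quoted above.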
Validity
A screening test's ability to do what it's supposed to
do:
To distinguish btwn subjects w/condition & those
w/out
So: whether what is intended to be measured is in fact
measured
i.e. whether a positive lab test indicates a
person truly has the disease
Validity
This is measured by
sensitivity and specificity
Sensitivity: if the disease is present, how often does the test detect it?
Specificity: if the disease is not present, how often does the test
correctly give a negative result?
Validity: Screening Tests
                                Disease status
                                Present            Absent
Results of        Positive      a (True +ve)       b (False +ve)
screening test    Negative      c (False -ve)      d (True -ve)
Sensitivity
The proportion of persons w/disease who are correctly
identified by test
= The probability that a diseased person will have a positive
test result
= true positive rate
A highly sensitive test gives positive results in individuals who have
disease
Sensitivity = true positives / diseased individuals x 100
            = true positives / (true positives + false negatives) x 100
Use L column only
                    Disease
                    Present       Absent
Test   Positive     TP or a       FP or b
       Negative     FN or c       TN or d
Measures only the distribution of persons with disease
Uses data from the left column
Specificity
The proportion of persons without the disease who are
correctly identified by the test
= The probability that a disease-free individual will have a
negative test result
true negative rate
A highly specific test gives negative results in individuals who do not

have disease
Specificity = true negatives / non-diseased individuals x 100
            = true negatives / (true negatives + false positives) x 100
Use R column ONLY!
Measures only the distribution of persons who are disease free
Uses data from the right column
Application of sensitivity and
specificity
Screening tests do not have both 100% sensitivity &
100% specificity
In order to have a high yield, often a series of tests is
done: the 1st with high sensitivity and the 2nd with high
specificity
Examples include VDRL & FTA-ABS for syphilis and
ELISA and Western Blot testing for HIV
Relation between Sensitivity and
Specificity
We would like to have a sensitivity and
specificity that are both as close to 100% as
possible
In practice we may gain sensitivity at the
expense of specificity and vice versa
[Figure: Population distribution of intraocular pressures in eyes with and without glaucoma. Horizontal axis: intraocular pressure in mm Hg (14 to 42); vertical axis: number of eyes. The two distributions overlap.]
Screening level set at the high end of the overlap (low sensitivity, high specificity):
Poor Sensitivity: not all ppl w/disease will be IDd
Good Specificity: ppl w/out disease will be correctly IDd
Hence fewer false +ves
Screening level set at the low end of the overlap (high sensitivity, low specificity):
Good Sensitivity: ppl w/disease will be IDd
Poor Specificity: not all ppl w/out disease will be correctly IDd
Hence fewer false -ves
Post-test Probability
Positive Predictive Value

Negative Predictive Value
Positive Predictive Value = PPV
Probability that an ind with a positive test result has disease
PPV = true positives / all with positive tests = a/(a+b) = TP/(TP+FP)
USE ONLY TOP ROW!
Measures only the distribution of persons with a positive test
Uses data from the top row
Negative Predictive Value = NPV
The probability that an individual with a negative
test result does not have the disease.
NPV = true negatives / all with negative tests = d/(c+d) = TN/(TN+FN)
ONLY USE BOTTOM ROW
Measures only the distribution of persons with a negative test
Uses data from the bottom row
Two examples
                        Disease (as determined by "Gold standard")
                        Present             Absent
Test       Positive     True Positive       False Positive      Pos pred value
outcome    Negative     False Negative      True Negative       Neg pred value
                        Sensitivity         Specificity
FOB screen test is used in 203 people to look for bowel cancer:
                    Patients with bowel cancer (as confirmed on endoscopy)
                    Present       Absent
FOB    Positive     TP = 2        FP = 18
test   Negative     FN = 1        TN = 182
PPV = TP / (TP + FP) = 2 / (2 + 18) = 2 / 20 = 10%
NPV = TN / (TN + FN) = 182 / (1 + 182) = 182 / 183 = 99.5%
Sensitivity = TP / (TP + FN) = 2 / (2 + 1) = 2 / 3 = 66.7%
Specificity = TN / (FP + TN) = 182 / (18 + 182) = 182 / 200 = 91%
PPV = 10%: A positive test is poor at confirming cancer
Sensitivity: It will pick up 66.7% of all cancers
Specificity: As an initial screen it correctly identifies 91% of those who do not have cancer
NPV = 99.5%: As a screening test, a negative result is very good at reassuring that a patient
does not have cancer
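The four measures in the FOB example can be checked from the 2x2 counts. A minimal sketch (the function name and dictionary keys are illustrative):

```python
def screening_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Compute the four standard measures from a 2x2 screening table."""
    return {
        "sensitivity": tp / (tp + fn),  # left column: diseased
        "specificity": tn / (tn + fp),  # right column: disease free
        "ppv": tp / (tp + fp),          # top row: positive tests
        "npv": tn / (tn + fn),          # bottom row: negative tests
    }

# FOB example: TP = 2, FP = 18, FN = 1, TN = 182 (203 people in total).
fob = screening_metrics(tp=2, fp=18, fn=1, tn=182)
for name, value in fob.items():
    print(f"{name}: {value:.1%}")
```

The same function can be applied to any of the other 2x2 tables in these notes by substituting a, b, c, d for tp, fp, fn, tn.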
Breast Cancer Detection and Implications for
Periodicity of Screening. Am J. Epi 100: 357-366, 1974
                          Breast Cancer
                          Present     Not Present
Screening    Positive     132         985
Test         Negative     47          62,295
Prevalence = (a+c)/(a+b+c+d) = 179/63,459 = 0.3%
1. Sensitivity = a/(a+c) = 132 / 179 = 73.7%
2. Specificity = d/(b+d) = 62,295 / 63,280 = 98.4%
3. PV+ = a/(a+b) = 132 / 1117 = 11.8%
4. PV- = d/(c+d) = 62,295 / 62,342 = 99.9%

Accuracy
Proportion of all subjects who were correctly
classified by the test
The degree to which a measurement represents the true value
Accuracy = (TP + TN) / (TP + TN + FP + FN) =
(True Positives + True Negatives) / Total Screened

Gold-standard
The gold standard is a test that is considered to be the
most accurate among all the known tests. All the others
should be compared with this test, in order to indicate
whether they are valid.
Prevalence
The proportion of individuals in a population who

have the disease
Prevalence = Number with disease / Total number of individuals in the study
= (a+c)/(a+b+c+d)
Predictive Value,
specificity & prevalence
The Positive Predictive Value
the probability that if the test is positive, the patient truly has the
disease
depends on:
the Specificity and
even more on Prevalence of the disease
PPV increases when specificity &/or prevalence increases!!!

Prevalence 1%
                          Disease
                          present     absent
Screening    Positive     900         4,950
Test         Negative     100         94,050
1. Sensitivity = a/(a+c) = 900 / 1000 = 90%
2. Specificity = d/(b+d) = 94,050 / 99,000 = 95%
3. PV+ = a/(a+b) = 900 / 5850 = 15.4%
4. PV- = d/(c+d) = 94,050 / 94,150 = 99.9%

Prevalence 1%
                          Disease
                          present     absent
Screening    Positive     900         1,980
Test         Negative     100         97,020
1. Sensitivity = a/(a+c) = 900/1000 = 90%
2. Specificity = d/(b+d) = 97,020/99,000 = 98%
3. PV+ = a/(a+b) = 900/2880 = 31.3%
4. PV- = d/(c+d) = 97,020/97,120 = 99.9%

Prevalence 5%
                          Disease
                          present     absent
Screening    Positive     4,500       4,750
Test         Negative     500         90,250
1. Sensitivity = a/(a+c) = 4,500/5,000 = 90%
2. Specificity = d/(b+d) = 90,250/95,000 = 95%
3. PV+ = a/(a+b) = 4,500/9,250 = 48.6%
4. PV- = d/(c+d) = 90,250/90,750 = 99.4%

Effect of Prevalence on PV+
Prevalence (%)    PV+ (%)    Sensitivity (%)    Specificity (%)
0.1               1.8        90                 95
1.0               15.4       90                 95
5.0               48.6       90                 95
50                94.7       90                 95
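The table above can be reproduced without building a 2x2 table each time. A minimal sketch, assuming PV+ is computed from prevalence, sensitivity and specificity as true positives over all positives (the function name is illustrative):

```python
def ppv(prevalence: float, sensitivity: float, specificity: float) -> float:
    """Positive predictive value from prevalence, sensitivity, specificity."""
    # Fraction of the whole population that is diseased AND tests positive.
    true_pos = prevalence * sensitivity
    # Fraction of the whole population that is disease free AND tests positive.
    false_pos = (1 - prevalence) * (1 - specificity)
    return true_pos / (true_pos + false_pos)

# Same test (sensitivity 90%, specificity 95%) at the four prevalences above.
for prev in (0.001, 0.01, 0.05, 0.50):
    print(f"prevalence {prev:.1%}: PV+ = {ppv(prev, 0.90, 0.95):.1%}")
```

Holding the test fixed and changing only the prevalence moves PV+ from under 2% to about 95%, which is the point of the table.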
Key Points
If the prevalence of a disease is low, then PV+ will be
low even if you have a test w/high sensitivity &
specificity
This is a reason that most rare diseases are not
screened for
The yield can be increased by screening in high risk
groups (Tay-Sachs among Jews, sickle cell among
African Americans/Mediterranean origin, Huntington's
Disease in family groups)
Bias in screening programs
Sometimes even if a test is valid and reliable, the results

obtained can be biased
Three main types exist:

lead time bias,
length bias and
selection bias
Lead Time Bias
The misperception that the case has a longer
survival simply because the disease was identified
earlier in the natural course of disease (even if tx
not working)
Misperception that screening test detection time is
when disease started, falsely suggesting that the screening
test improves/increases survival time
Lead Time Bias
2 patients A & B and their courses of disease
AGE    Event
35     Biologic onset of disease
40     Disease detectable by screen
41     Patient A diagnosed at screening
43     Symptoms develop: B diagnosed
45     A & B both die
Both have same age-specific mortality, but different survival
times from diagnosis.
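The timeline for patients A and B can be turned into numbers. A minimal sketch using the ages from the example (the variable names and labels are illustrative):

```python
# Ages taken from the lead time bias timeline.
death_age = 45
diagnosis_age = {"A (screened)": 41, "B (symptomatic)": 43}

# Survival measured from diagnosis, as it would be reported.
survival = {patient: death_age - age for patient, age in diagnosis_age.items()}

# Lead time: how much earlier screening moved the diagnosis.
lead_time = diagnosis_age["B (symptomatic)"] - diagnosis_age["A (screened)"]

for patient, years in survival.items():
    print(f"Patient {patient}: survival from diagnosis = {years} years")
print(f"Lead time gained by screening = {lead_time} years")
```

Patient A appears to survive twice as long as patient B, yet both die at the same age; the extra survival is exactly the lead time.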
Length Bias
Tumors detected by screening programs tend to be slower
growing and therefore have a better prognosis
The faster growing tumors may become symptomatic in
between scheduled screening and are usually more
aggressive
So screening tends to find tumors with an inherently better
prognosis (slower growing, less aggressive); screening is just good at
finding them, which doesn't mean it improves outcomes
Misperception that screening itself leads to better
outcomes.
Selection Bias
Individuals who are motivated to participate in
screening programs may have a different probability of
disease than individuals who refuse to participate
Ex: Women with family history breast cancer

joining screening may give biased result; more
women with illness are found and more dying of
it. When it is applied to general population the
results may be different
Controlling Bias
Large and strict RCT
but
These take a long time and are expensive
Fundamental Concepts of Screening
Sensitivity: a/(a+c) = true +ve rate
given disease, how many have a positive test
Specificity: d/(b+d) = true -ve rate
given no disease, how many have a negative test
PV+: a/(a+b) = DISEASE RATE
given a positive test, how many have disease
PV-: d/(c+d) = NON-DISEASE RATE
given a neg.test, how many do not have disease
PV+ increases esp. if Prevalence increases

PV+ increases also if Specificity increases
Bias: Lead Time, Length and Selection
A blood test to detect prostate cancer was given to
1000 male members of a large HMO. Although 100
of the men actually had prostate cancer, the test was
positive in only 30; the other 70 patients with prostate
cancer had negative tests. Of the 900 men without
prostate cancer, the test was positive in 150 men and
negative in 750. The specificity of this test is
approximately
A. 7%
B. 17%
C. 18%
D. 30%
E. 83%
            Disease present      Disease absent
Test +      True +ve = 30        False +ve = 150
Test -      False -ve = 70       True -ve = 750
Total       100                  900                 N = 1000
What's sensitivity? 30 / [30 + 70] = 30/100 = 30%
Specificity = 750 / [750 + 150] = 750/900 = 83% (E)
Cut point of a screening test intended for
detecting Diabetes Mellitus was lowered from
140 mg to 130 mg of glucose per dL of blood.
Change in the cut point for this screening test
would
A. Increase specificity
B. Increase sensitivity
C. Decrease true positive rate
D. No change in sensitivity or specificity
since the number of people with DM will
remain the same
Confidence Interval
for Relative Risk and Odds ratio
Confidence Interval
(in terms of RR or OR)
Provides an interval around the odds ratio or
the relative risk and represents the range within
which the true magnitude of effect is likely to lie
Usually set at the 95% level, which corresponds to p < 0.05
Provides all the information of the p value and
more
If interval doesn't contain 1.0 then association
btwn variables is SIGNIFICANT

Example 1
A study is designed to investigate the association
between body fat and breast cancer. The results show a
risk ratio of 6.0, however the 95% confidence interval is
(0.8 - 23.2).
Since the interval includes 1, the results may be due to
chance alone: not significant
Example 2
A study is designed to investigate the association
between alcoholism and cirrhosis of liver. The results
show a risk ratio of 4.5 (95% CI 2.2-6.8).
Since the interval does not include 1, the result is found

to be statistically significant
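The rule used in both examples is a simple check of whether the interval contains 1.0. A minimal sketch (the function name is illustrative; it only applies to ratio measures such as RR and OR, where 1.0 means no association):

```python
def is_significant(ci_low: float, ci_high: float, null_value: float = 1.0) -> bool:
    """True when the confidence interval excludes the null value."""
    return not (ci_low <= null_value <= ci_high)

# Example 1: body fat and breast cancer, RR 6.0, 95% CI 0.8 to 23.2.
example_1 = is_significant(0.8, 23.2)
# Example 2: alcoholism and cirrhosis, RR 4.5, 95% CI 2.2 to 6.8.
example_2 = is_significant(2.2, 6.8)

print(f"Example 1 significant: {example_1}")
print(f"Example 2 significant: {example_2}")
```

Note that Example 1 has the larger point estimate (6.0 vs. 4.5) but is still not significant, because its interval is wide enough to include 1.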
CAUSE EFFECT
RELATIONSHIP
Hill's Criteria
Association and Cause
Just because there is an association does not

mean it is a cause as well
Once you see an association, you apply certain

criteria to see if it is causal.
Knowing When to Accept the Findings of a
Study
Statistical Significance is only part of the answer

Hill's Criteria are used to help you
decide whether or not to accept the
findings
Whenever you do a criticism of a medical study you
should use Hill's Criteria
8 Hill's Criteria:
1. Study design
2. Strength of Association
3. Consistency
4. Correct Temporal Relationship: clearly established or not?
5. Dose Response Relationship
6. Plausibility: is there a known scientific explanation?
7. Specificity
8. Analogy
1. Study Design
Rank strongest to weakest study design
Experimental study (RCT): strongest
Prospective cohort study
Historical cohort study
Case-control study
Cross-sectional study
Case-series
Case report: weakest
2. Strength of Association
Large relative risk or odds ratio

e.g. RR of 27 vs. RR of 1.68
Statistically significant (p value < 0.05)

e.g. p value of 0.00001 vs. 0.047
3. Consistency
Several diff studies conducted at diff times in diff settings &

w/diff patients all come to same conclusion.
For example:
Many studies of different designs (case control, cohort, case
series etc.), using different subjects, found an association
between smoking and lung cancer
4. Correct Temporal Relationship
Cause must always precede effect

Ex: Smoking precedes lung cancer
5. Dose Response Relationship
Risk increases with increasing exposure to

the risk factor
Increased smoking increases the risk of lung
cancer
6. Biologic Plausibility
Consistent with the current knowledge of the

underlying mechanisms of disease
Or
Makes sense according to current knowledge
e.g. smoking and lung cancer

7. Specificity
Single cause linked to a single effect provides evidence

in favor of a causal relationship
Usually acute infectious diseases or hereditary diseases
/ single gene defects
Cause need not be specific to causing that disease,
may also cause other diseases
8. Analogy
Existence of other cause and effect relationships

analogous to the one in question
If toxins in cigarette smoke can cause lung disease, so

can other toxins like asbestos, arsenic and uranium.
Nested Case-control Studies
A case-control study can be inserted into a cohort

study
When enough individuals have developed the

outcome of interest, they can be compared to
controls
This allows us to look at and compare for other

exposures
Nested Case-Control Study
Start as cohort study: Population (exposed and unexposed groups)
  -> Develop Disease: Cases
  -> Do Not Develop Disease: Subgroup Selected as Controls
Cases + Controls: CASE-CONTROL STUDY
Evidence based medicine: conscientious, explicit & judicious use of current best
evidence in making decisions about care of ind pts
Conscientious: being careful & thorough in what you do
Explicit: being open, clear and transparent
Judicious: using good judgment and common sense
Using the most reliable evidence from clinical research, scientific understanding and medical practice
to make the best possible medical decisions for patients.
EBM not only identifies which txs are effective but also those which are ineffective and may do
more harm than good, and identifies areas where more investigation is needed and where there may
be gaps in knowledge
Steps for practicing EBM
Step 1: Formulating a well built question
Clinicians work in order to convert the need for information

(regarding prevention, diagnosis, prognosis, therapy,
causation, etc) into an answerable question.
Example: Is an exercise program or a nutritional education

program more effective in reducing weight in obese
elementary school children?
Step 2: Identifying resources
Clinicians seek to assemble the best and most up-to-date evidence with
which to answer that question.
Need to consult several types of information resources.
Example: Harrison's Principles of Internal Medicine, Cochrane
Database of Systematic Reviews, DynaMed, MEDLINE
Step 3: Critical appraisal
Clinicians appraise and assess evidence for its validity

(truthfulness), impact (size of effect), & applicability
(usefulness to specific clinical practice & situation).
Who were the patients and how were they selected?
Were they randomized for treatment?

How were the confounders addressed?
Were the results significant?
Step 4: Applying the evidence
Clinicians integrate the critical appraisal with clinical expertise
and with the patient's unique biology, values, and
circumstances.
Step 5: Re-evaluation
Clinicians evaluate effectiveness in executing Steps 1-4.

Clinicians also seek ways to improve methods for next
clinical encounter.
Systematic reviews
A thorough, comprehensive, and explicit way of interrogating the
medical literature. It typically involves several steps, including (1)
asking an answerable question (2) identifying one or more
databases to search, (3) developing an explicit search strategy, (4)
selecting titles, abstracts, and manuscripts based on explicit
inclusion and exclusion criteria, and (5) abstracting data in a
standardized format.
Ex: Cochrane reviews

Abstract
Background
Depression is a debilitating condition affecting more than 350 million people worldwide (WHO 2012) with a limited number of evidence-based
treatments. Drug treatments may be inappropriate due to side effects and cost, and not everyone can use talking therapies. There is a need for
evidence-based treatments that can be applied across cultures and with people who find it difficult to verbally articulate thoughts and feelings. Dance movement
therapy (DMT) is used with people from a range of cultural and intellectual backgrounds, but effectiveness remains unclear.
Objectives
To examine the effects of DMT for depression with or without standard care, compared to no treatment or standard care alone, psychological
therapies, drug treatment, or other physical interventions. Also, to compare the effectiveness of different DMT approaches.
Search methods
The Cochrane Depression, Anxiety and Neurosis Review Group's Specialised Register (CCDANCTR-Studies and CCDANCTR-References) and
CINAHL were searched (to 2 Oct 2014) together with the World Health Organization's International Clinical Trials Registry Platform (WHO ICTRP)
and ClinicalTrials.gov. The review authors also searched the Allied and Complementary Medicine Database (AMED), the Education Resources
Information Center (ERIC) and Dissertation Abstracts (to August 2013), handsearched bibliographies, contacted professional associations, educational
programmes and dance therapy experts worldwide.
Selection criteria
Inclusion criteria were: randomised controlled trials (RCTs) studying outcomes for people of any age with depression as defined by the trialist, with at
least one group being DMT. DMT was defined as: participatory dance movement with clear psychotherapeutic intent, facilitated by an individual with a
level of training that could be reasonably expected within the country in which the trial was conducted. For example, in the USA this would either be a
trainee, or qualified and credentialed by the American Dance Therapy Association (ADTA). In the UK, the therapist would either be in training with, or
accredited by, the Association for Dance Movement Psychotherapy (ADMP, UK). Similar professional bodies exist in Europe, but in some countries
(e.g. China) where the profession is in development, a lower level of qualification would mirror the situation some decades previously in the USA or
UK. Hence, the review authors accepted a relevant professional qualification (e.g. nursing or psychodynamic therapies) plus a clear description of the
treatment that would indicate its adherence to published guidelines including Levy 1992, ADMP UK 2015, Meekums 2002, and Karkou 2006.
Main results
Three studies totalling 147 participants (107 adults and 40 adolescents) met the inclusion criteria. Seventy-four participants took
part in DMT treatment, while 73 comprised the control groups. Two studies included male and female adults with depression. One
of these studies included outpatient participants; the other study was conducted with inpatients at an urban hospital. The third
study reported findings with female adolescents in a middle-school setting. All included studies collected continuous data using
two different depression measures: the clinician-completed Hamilton Depression Rating Scale (HAM-D); and the Symptom
Checklist-90-R (SCL-90-R) (self-rating scale).
Statistical heterogeneity was identified between the three studies. There was no reliable effect of DMT on depression (SMD -0.67
95% CI -1.40 to 0.05; very low quality evidence). A planned subgroup analysis indicated a positive effect in adults, across two
studies, 107 participants, but this failed to meet clinical significance (SMD -7.33 95% CI -9.92 to -4.73).
One adult study reported drop-out rates, found to be non-significant with an odds ratio of 1.82 [95% CI 0.35 to 9.45]; low quality
evidence. One study measured social functioning, demonstrating a large positive effect (MD -6.80 95 % CI -11.44 to -2.16; very
low quality evidence), but this result was imprecise. One study showed no effect in either direction for quality of life (0.30 95% CI
-0.60 to 1.20; low quality evidence) or self esteem (1.70 95% CI -2.36 to 5.76; low quality evidence).
Authors' conclusions
The low-quality evidence from three small trials with 147 participants does not allow any firm conclusions to be drawn regarding
the effectiveness of DMT for depression. Larger trials of high methodological quality are needed to assess DMT for depression,
with economic analyses and acceptability measures and for all age groups.
Meta-analysis
Statistical approach to combine the data derived from a

systematic-review. Therefore, every meta-analysis should be
based on an underlying systematic review.
Calculation of effect size from all the studies.

Biostatistics I
This Lecture
Frequency Distribution
Measures of Central Location
Measures of Variance
Why Study Statistics?
As medical students / clinicians you are:
Researchers
Consumers of medical research
Statistics in Medical Research
The goal:
To design the process and extent of sampling
in order to form valid and accurate inferences
To make inferences about a population by

analyzing sample data
To make assessments of the extent of

uncertainty in these inferences
With Statistics
We may find differences

(variability) when we make
comparisons
Real differences?
Due to chance?
Frequency Distribution
Ways to describe variation in clinical data:
- Numerical
- Pictorial
Always: values of the variables on horizontal axis and
their frequencies on the vertical axis
Frequency Distribution;
Graphical Display: Pie Chart
Frequency Distribution;
Graphical Display
Frequency Polygon:
The midpoints of the top
of each bar of the
histogram are plotted
and connected with
straight lines.
This makes it easier to
put two or more sets of
data on same graph
Shape of distributions
Properties of frequency
distribution:
- Shape of frequency distribution
- Central Location or Central Tendency
- Variation or Dispersion
Shape of distribution
Symmetric
- Normal distribution (Gaussian Curve)
Skewed
- Tail to the right: positively skewed
- Tail to the left: negatively skewed
Symmetric Distribution
[Figure: histogram of RBC cholinesterase (mmol/min/ml); frequency (0 to 15) vs. cholinesterase level in classes 5.95-7.95, 7.95-9.95, 9.95-11.95, 11.95-13.95, 13.95-15.95]
Negatively skewed
[Figure: histogram of RBC cholinesterase; frequency vs. cholinesterase level in classes from 5.95-7.95 to 15.95-17.95]
Positively skewed
[Figure: histogram of RBC cholinesterase; frequency vs. cholinesterase level in classes from 5.95-7.95 to 15.95-17.95]
Normal distribution
Symmetric (bell-shaped) curve
Measures of central tendency
Mean (Arithmetic mean)

Median
Mode
Mean = $\bar{X}$
arithmetic average: $\bar{X} = \frac{X_1 + X_2 + \dots + X_n}{n}$
= sum of the observed measurements / number of observations
The arithmetic center of the distribution

It will give the average when using quantitative
variables with somewhat symmetric distribution
Most commonly used but:
It is sensitive to extreme values or outliers
Median
That measurement below which half the

measurements fall, & half (50%) of #s fall above
that value = 50th percentile
e.g. The length of hospital stay for nine patients

1, 1, 3, 4, 8, 9, 12, 13, 15
median is the middle number = 8

Median
What if the data was:
1, 3, 4, 8, 9, 12, 13, 15?
Median = (8+9)/2 = 8.5

(the average of the 2 middle numbers for an even number
of observations)

Mode
The most frequently occurring observation.

If more than one value occurs frequently the
distribution can be bimodal or multi-modal
e.g. for values 1,4,3,1,2 the mode is 1
for values 2, 4, 2, 3, 1, 5, 1 the distribution is bimodal

as 1 & 2 occur most often
Mode..
What if the data was:
1, 4, 6, 3, 2, 7, 9, 11, 5, 10, 8?
There is no mode for this distribution

Positive skew (Tail to right)
Mean is greater than median; the mean is always pulled twds the tail
Negative skew (Tail to left)
Median is greater than mean

Determine the average length of stay
for six patients undergoing classic cholecystectomy.
The length of stay in days for each patient is 1, 3, 2, 2, 4, 5
Question:
Calculate mean, median and mode
a) Mean = (1+2+2+3+4+5)/6 = 17/6 = 2.83
b) Median = 2.5 (the average of the 2 middle numbers, 2 and 3, for an
even number of observations)
c) Mode = 2
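The three answers can be checked with Python's standard `statistics` module. A minimal sketch (variable names are illustrative):

```python
import statistics

# Length of stay in days for the six cholecystectomy patients.
stay_days = [1, 3, 2, 2, 4, 5]

mean_stay = statistics.mean(stay_days)      # sum / number of observations
median_stay = statistics.median(stay_days)  # averages the 2 middle values when n is even
mode_stay = statistics.mode(stay_days)      # most frequently occurring value

print(f"Mean   = {mean_stay:.2f}")
print(f"Median = {median_stay}")
print(f"Mode   = {mode_stay}")

# For data with more than one most-frequent value, multimode lists them all.
bimodal_modes = sorted(statistics.multimode([2, 4, 2, 3, 1, 5, 1]))
print(f"Modes of the bimodal example = {bimodal_modes}")
```

`statistics.multimode` requires Python 3.8 or later; the bimodal data are the ones used in the Mode section earlier in these notes.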
Variation or Dispersion
Properties of frequency
distribution:
- Shape of frequency distribution
- Central Location or Central Tendency
- Variation or Dispersion
Measures of Spread or Variation
RANGE
PERCENTILES AND QUARTILES
VARIANCE
STANDARD DEVIATION
Range:
-Arrange the data in ascending order
-Find out the maximum and minimum values
- Range = maximum value - minimum value
Percentile: measure that tells us what percent of total ppl

scored below a given score
percentile rank = percentage of scores that fall below a given score.
Nth percentile = observation that has n% of the
values below it.
Measures of Variation
Variance - average of the squared differences
from the mean
difficult to interpret because it is in the
units of the variable squared
Standard deviation - square root of the

variance; summary of dispersion around the
mean
same units as the variable of interest
Standard deviation
Measure of absolute variation in a given data
set, and a supplement to the mean
Large SD : observations are widely spread out
Small SD: observations are closely centered

around mean
Standard Deviation
The positive square root of variance
$s^2 = \frac{\sum (x_i - \bar{x})^2}{n-1}$, so $s = \sqrt{s^2}$
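A minimal sketch of the sample variance and standard deviation with the n-1 denominator, worked step by step. The data are the six length-of-stay values from the cholecystectomy example earlier in these notes, reused here as an assumption purely for illustration:

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return sum(squared_deviations) / (n - 1)

stay_days = [1, 3, 2, 2, 4, 5]

s2 = sample_variance(stay_days)   # in days squared
s = math.sqrt(s2)                 # back in days, same units as the data

print(f"Variance = {s2:.2f} days^2")
print(f"SD       = {s:.2f} days")

# The standard library gives the same result (it also uses n - 1).
library_s2 = statistics.variance(stay_days)
```

The standard deviation is easier to interpret than the variance because it is in the same units as the original measurements.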
[Figure: histogram of BIRTHWEIGHT (frequency on vertical axis); Std. Dev = 623.36, Mean = 3367.2, N = 9747]
Statistics: BIRTHWEIGHT
N Valid           9747
N Missing         0
Mean              3367.19
Median            3405.00
Mode              3430
Std. Deviation    623.36
Variance          388574.54
Minimum           312
Maximum           6605
Percentile 25     3061.00
Percentile 50     3405.00
Percentile 75     3749.00
Birth weight
Birth weight is approximately normally
distributed (bell-shaped curve)
Mean and median are close
68% of values are within 1 standard deviation of the mean (2744 to 3990 grams)
95% are within 2 standard deviations (2121 to 4613 grams)
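The two ranges above follow from mean ± k × SD. A small sketch, using the mean and standard deviation from the BIRTHWEIGHT statistics table (the helper name is an assumption for illustration):

```python
def sd_interval(mean, sd, k):
    """Return (mean - k*sd, mean + k*sd)."""
    return (mean - k * sd, mean + k * sd)


mean_bw = 3367.19  # grams
sd_bw = 623.36     # grams

low_1sd, high_1sd = sd_interval(mean_bw, sd_bw, 1)
low_2sd, high_2sd = sd_interval(mean_bw, sd_bw, 2)
print(round(low_1sd), round(high_1sd), round(low_2sd), round(high_2sd))
```

Computed without intermediate rounding, the 2 SD limits come out within a gram of the figures quoted in the notes.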
It has been found that many human biological
characteristics conform to a normal distribution
closely enough for it to be commonly used.
For example, heights of adult men and women,

blood pressures in a healthy population, and
many other types of laboratory measurements
and biochemical data.
Normal Distribution
Blood pressure
Why do we need to know this?
If a particular data shows a normal distribution, we
can apply the specific characteristics of normal
distribution to it
Helps us in deciding normal range (mean ± 2 SD)

Enables us in comparing different populations
This will help us in testing a hypothesis
Normal Distribution
What is normal distribution?
It is a theoretically perfect frequency polygon which:
1.Takes the form of a bell shaped curve
2. Is symmetric
3.In which mean, median and mode coincide in

the center
Normal Distribution
In any normal curve, a constant proportion
of cases fall with 1, 2 or 3 standard deviations
of mean:
Within 1 SD: 68%

Within 2 SD: 95% = Normal Range
Within 3 SD: 99.7%
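The 68 / 95 / 99.7 figures are rounded values of the area under the standard normal curve. A short sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later) shows the unrounded proportions:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Proportion of cases lying within k standard deviations of the mean
within = {
    k: standard_normal.cdf(k) - standard_normal.cdf(-k)
    for k in (1, 2, 3)
}
for k, proportion in within.items():
    print(k, round(100 * proportion, 2))
```

The proportion within exactly 2 SD is slightly above 95%; the interval that contains exactly 95% is mean ± 1.96 SD, which these notes round to 2.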
Normal Distribution
Biostatistics II
The Normal Distribution
Questions
In a normal distribution curve, what percentage of cases lie more than 2 SD above the mean? 2.5%
In a normal distribution curve, what percentage of cases lie above the point 2 SD below the mean? 97.5%
The scores of a single student in two different tests are given below. On which test did she do better relative to the class?
          Score   Mean   S. Deviation
Test A    45      30     5
Test B    60      40     10
MCQ
The distribution of factor X in a population of
men ages 20-40 follows a multi-modal
distribution with mean of 20 mg/dl and
standard deviation 2 mg/dl
95% of men will have Factor X levels between
ranges
A. 16-24
B. 18-22
C. 15-25
D. Can not be calculated
Bimodal distribution of height
= 2 modes
Statistical Inference
Generalizations from a sample to the

population as a whole
Inferential Statistics is about
Samples and Populations
For obvious reasons studies are not carried out on
entire populations
Sample populations are used to test hypotheses
Inferences are made about the total population

from the data obtained from the sample
The analysis of data of a sample includes

significance testing
The underlying premise when conducting a sample study is that participants are representative of the general population
If we assume that the sample is random, then the mean of the sample ($\bar{x}$) should approximate the mean of the population ($\mu$). However, if the sample is small or not random, there can be a large difference btwn $\bar{x}$ & $\mu$ (sampling error)
To tell how close the sample mean is to the population mean, we use the standard error of the mean & the confidence interval
Standard Error of the Mean
$\mathrm{SEM} = \mathrm{SD} / \sqrt{n}$
a measure of the variability of sample means about
the true population mean
(=the precision of the sample mean;
= the quality of the sample)
Larger size of study = more confidence we have in

sample mean, & smaller the standard error of mean
Confidence Interval of Mean
Admitting that any measurement from a sample is
only an estimate of the population:
A confidence Interval specifies how close
our sample based value lies to population value
= the true value
and it gives the range of these values
Confidence Intervals specify how confident we are

To estimate the limits within which the true pop mean
lies and to specify how confident we are of those
limits
To calculate the CI for the population mean:
$\mathrm{CI} = \bar{x} \pm (\text{confidence coefficient} \times \mathrm{SEM})$

Where the confidence coefficient (Z score) for:
90% CI is 1.64
95% CI is 1.96 (for calculation use Z score of 2.0 for CI of 95%)
99% CI is 2.58
And $\mathrm{SEM} = \mathrm{SD} / \sqrt{n}$
Example
Length of stay in a hospital for 5 patients; we want to calculate the 95% CI

Patient   Length of stay ($x_i$)   Deviation ($x_i - \bar{x}$)   $(x_i - \bar{x})^2$
1         3                        3 - 3 = 0                     0
2         5                        5 - 3 = 2                     4
3         2                        2 - 3 = -1                    1
4         3                        3 - 3 = 0                     0
5         2                        2 - 3 = -1                    1
Mean = 3; sum of squared deviations = 6

CI = Mean ± Z score × standard error of the mean
$s^2 = \sum (x_i - \bar{x})^2 / (n - 1) = 6/4 = 1.5$, so $s = \sqrt{1.5} = 1.22$
$\mathrm{SEM} = s / \sqrt{n} = 1.22 / 2.24 = 0.54$ (since $\sqrt{5} = 2.24$)
95% confidence interval of the mean = 3 ± 2 × 0.54 = (1.92 to 4.08)
99% CI = 1.60 to 4.39; higher confidence level = wider interval
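The worked example can be reproduced with a short sketch. The function name is hypothetical; the Z values (2.0 for 95%, 2.58 for 99%) are the ones given in these notes. Because the code does not round the intermediate SEM, its limits differ in the second decimal from the hand calculation, which rounds SEM to 0.54.

```python
import math
import statistics


def confidence_interval(values, z):
    """Return (lower, upper) = mean -/+ z * SEM, with SEM = s / sqrt(n)."""
    mean = statistics.mean(values)
    sem = statistics.stdev(values) / math.sqrt(len(values))
    return (mean - z * sem, mean + z * sem)


stays = [3, 5, 2, 3, 2]  # length of stay for the 5 patients
sem = statistics.stdev(stays) / math.sqrt(len(stays))
low95, high95 = confidence_interval(stays, 2.0)
low99, high99 = confidence_interval(stays, 2.58)
print(round(sem, 2), round(low95, 2), round(high95, 2))
```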

Confidence Interval
A 95% Confidence Interval means you are 95% sure (confident) that the true value lies within the range of the CI
99% CI is wider than 95% CI

Confidence Interval
The wider the CI, the greater the variability
in the estimate of the effect
The larger the sample, the more precise the

estimate will be and the narrower the CI
e.g. prevalence rate of DM in people aged 46-64 years
Sample                          95% CI (per 1000)
Survey of 90                    42.8 - 61.0
Sample 4x bigger                47.2 - 56.6
Sample one quarter the size     33.8 - 70.0
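Since SEM = SD / √n, the width of a CI scales with 1/√n: four times the sample should roughly halve the width, and a quarter of the sample should roughly double it. A quick sketch checking this against the three intervals quoted above (the dictionary keys are illustrative labels only):

```python
import math

# 95% CIs for DM prevalence (per 1000), as quoted in the notes
intervals = {
    "baseline survey": (42.8, 61.0),
    "sample 4x bigger": (47.2, 56.6),
    "sample one quarter": (33.8, 70.0),
}
widths = {name: high - low for name, (low, high) in intervals.items()}

base_width = widths["baseline survey"]
ratio_bigger = widths["sample 4x bigger"] / base_width
ratio_smaller = widths["sample one quarter"] / base_width

# Width ratio predicted by SEM = SD / sqrt(n)
expected_bigger = 1 / math.sqrt(4)      # 0.5
expected_smaller = 1 / math.sqrt(0.25)  # 2.0
print(round(ratio_bigger, 2), round(ratio_smaller, 2))
```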
What is a Z-score?
Z score = score for normal distribution =

confidence coefficient = standard score
It has a mean of 0
It is represented in standard deviation units
Z-score of 2 is 2 SD above the mean,
Z-score -1.5 is 1.5 SD below the mean
To convert a score to a Z-score:
$Z = (x - \bar{x}) / s$
Question
Assume that national mean weight of females is
120 pounds, and the std deviation is 6 pounds.
If Mary weighs 112 pounds, what is Mary's Z
score?
Applying $Z = (x - \bar{x}) / s$
= (112 - 120) / 6 = -1.33
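A minimal sketch of the Z-score conversion (the function name is an assumption). It reproduces Mary's Z score and, as a second use, applies the same formula to the two-test question posed earlier in these notes, where the higher Z score indicates the better performance relative to the class.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


mary_z = z_score(112, 120, 6)

# Two-test question: score, class mean, class SD
test_a_z = z_score(45, 30, 5)
test_b_z = z_score(60, 40, 10)
better_test = "A" if test_a_z > test_b_z else "B"
print(round(mary_z, 2), test_a_z, test_b_z, better_test)
```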
Standard Deviation = variability of observations
Standard Error of mean = variability of sample means about the true population mean
95% Confidence Interval of mean = range of values in which you're 95% sure the true pop mean falls
Statistics I
Continuous Variables with symmetrical distribution require

information on: Shape, Central Location and Variation
Shape of Frequency Distribution:

Normal symmetrical spread or Skewed
Measures of Central Location:

Mean=average, Median=middle, Mode=most common
Statistics I
Measures of Variation:
Range, Percentiles, Standard Deviation(=square root of
Variance)
Normal Distribution (Bell shaped):
1SD=68%, 2SD=95%, 3SD=99.7%
Normal Range :
Mean +/- 2SD
Statistics I
Both SEM and confidence interval indicate how
precise (or imprecise) our estimate is.
The standard error of the mean, is based on variability
in data (the standard deviation) and the size of the
sample
$\mathrm{SEM} = \mathrm{SD} / \sqrt{n}$
CI: is based on the SEM and defines the interval within which the true magnitude of effect is likely to fall.
Most used CI: 95%
$\mathrm{CI} = \bar{x} \pm (\text{confidence coefficient} \times \mathrm{SEM})$
Statistical Notation
$s$ = sample standard deviation
$s^2$ = sample variance
$\bar{x}$ = sample mean
$n$ = number of observations
$\sigma$ = population standard deviation
$\sigma^2$ = population variance
$\mu$ = population mean
Biostatistics III
VARIABLES
SAMPLING
STATISTICAL SIGNIFICANCE
ERRORS
CHI SQUARE TEST
T TEST
VARIABLES
Medical research is simply the study of

variables and their relationships
When we study a single variable: UNIVARIATE
When more than one: MULTIVARIATE
TYPES OF VARIABLE SCALES
Categorical
Variable is categorized as one of two or more
alternatives;
* Nominal
* Ordinal
Numerical
* Discrete
* Continuous
TYPES OF VARIABLE SCALES
NOMINAL: name; no order
e.g. blood types, gender, race
ORDINAL: order; limited categories
e.g. SES, cancer staging
DISCRETE: counts; whole #s, no decimals
e.g. number of pregnancies, number of decayed, filled or missing teeth, # pts
CONTINUOUS: interval/ratio; continuous with decimals
e.g. body temperature, BP, length, age
Data Collection Techniques
Door to door interviews
Telephone interviews / random digit dialing
Questionnaires by mail
Voter registration or motor registration lists
Patient lists from ERs
Community survey
Volunteer participation in study
SAMPLING
All studies rely on the fact that the sample is
representative of the population
If the sample is not representative, then the

study is flawed and the data are worthless
To ensure representation the sample must be

chosen randomly
WHAT IS A RANDOM
SAMPLE?
A random sample is one in which each member
of the population has an equal chance of being
selected
The sample should be representative of the

study population
SAMPLING TYPES
SIMPLE RANDOM
STRATIFIED
SYSTEMATIC
MULTISTAGE
SNOWBALL
CONVENIENCE
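Three of the sampling types listed above can be sketched with Python's standard `random` module. This is only an illustration under simple assumptions: the population is a list of hypothetical patient IDs 1 to 100, the strata are invented, and all function names are made up for the example.

```python
import random


def simple_random_sample(population, k, rng):
    """Every member has an equal chance of selection."""
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    """Random start, then every step-th member."""
    step = len(population) // k
    start = rng.randrange(step)
    return population[start::step][:k]


def stratified_sample(strata, fraction, rng):
    """Simple random sample of the same fraction within each stratum."""
    picked = []
    for members in strata.values():
        size = max(1, round(len(members) * fraction))
        picked.extend(rng.sample(members, size))
    return picked


rng = random.Random(0)           # seeded so the sketch is repeatable
patients = list(range(1, 101))   # hypothetical patient IDs
srs = simple_random_sample(patients, 10, rng)
systematic = systematic_sample(patients, 10, rng)
strata = {"female": patients[:60], "male": patients[60:]}  # invented strata
stratified = stratified_sample(strata, 0.1, rng)
```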
Statistical significance
When an association between 2 variables is seen, we
must ask - Is this true or due to chance?
The P value indicates the probability that the findings

observed could have occurred by chance alone.
The P value for statistical significance is usually set at

<0.05
We can use:
p value
Confidence interval
Statistical Significance
Confidence intervals and p-values are
used to demonstrate statistical significance
P-value < 0.05 and a 95% CI both state
that a result as extreme as the one
obtained would occur by chance alone
less than 5% of the time
i.e. you can assume that the result is
unlikely to have occurred by chance
How to use CI around Relative Risk
or Odds Ratio as a measure of
statistical significance?
Testing for statistical difference
using CI
Look at the CI carefully
See if the interval contains 1 in it.
If it does, it is not statistically significant
If it does not include 1, it is significant
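The rule above reduces to a one-line check. A sketch with a hypothetical function name; the example intervals in the code are invented purely to exercise the rule and are not study results.

```python
def ratio_ci_is_significant(lower, upper, null_value=1.0):
    """A CI around a RR or OR is significant only if it excludes 1."""
    return not (lower <= null_value <= upper)


# Invented intervals, for illustration only
harmful = ratio_ci_is_significant(1.2, 2.5)       # entirely above 1
inconclusive = ratio_ci_is_significant(0.8, 1.9)  # contains 1
protective = ratio_ci_is_significant(0.4, 0.9)    # entirely below 1
print(harmful, inconclusive, protective)
```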

Statistical Differences
Analysis of outcomes can be done using the

Relative Risk with either p-values or confidence
intervals
Characteristics of groups can be compared by comparing proportions (chi-square) or means (t-test)
Hypothesis
Definition:
Statement based on inference, existing literature, or
preliminary studies postulating that a difference exists
between two groups.
The possibility that this difference occurred by chance is

tested using statistical procedures.
Types of hypothesis:
- Null hypothesis (H0)
- Alternative hypothesis (Ha)
Hypothesis Testing
(H0) Null hypothesis states there is no difference between

characteristics of groups or outcomes, e.g. no relationship
between exposure and disease.
No statistical significance = fail to reject the null hypothesis (often stated loosely as "reject alternative hypothesis & accept null hypothesis")
(Ha) Alternative hypothesis states that there is a difference
between characteristics of groups or outcomes e.g. there is a
relationship between the exposure and the disease
Statistical significance = reject null hypothesis & accept alternative
hypothesis
Test Statistic
To test the hypothesis, we collect a sample & compute a test
statistic, such as a sample mean or sample proportion.
The way we compute the test statistic, depends on sample size, type
of variables , & sometimes shape of population distribution.
Type of variable
When comparing proportions: chi-square
When comparing means: t-test

When to use a Chi-square test?
Chi-square test can be used to compare two

proportions (categorical data).
When to use t-test?
Use it when you want to know whether a treatment has an effect across treatment
groups and the data are interval/ratio, i.e. when you are comparing means
(averages) between groups
If you have 2 groups, use t-test
If you have more than 2 groups, use ANOVA (analysis

of variance)
Independent (non paired) t- test: tests mean
difference in body weights of subjects in group
A & group B at time 1 (i.e., two groups of
subjects are sampled on one occasion).
2 groups; 1 time
Dependent (paired) t- test: tests mean difference in

body weights of ppl in group A at Time 1 &
Time 2 (i.e., same sample people are sampled on
two occasions).
1 group; 2 times
Errors
Type I error ($\alpha$ error): usually preset at 0.05; rejecting the null hypothesis when it's true
Thus assuming there is a significant association when there is none
Prefixed value; how much error we can allow; aka significance lvl (this is where the p value threshold comes from, & why p should be < 0.05)
Type II error ($\beta$ error): accepting the null hypothesis when it's false
Thus assuming there is no association when in fact there is one
Power of the test = $1 - \beta$
How can you increase power? By increasing sample size
Smoking and Birth weight
t-test
Is there a difference in the mean birth weight

of infants born to women who smoked
during pregnancy, compared with infants
born to women who did not smoke during
pregnancy?
Null hypothesis: The mean birth weight of infants

born to women who smoke during pregnancy is
the same as among those who do not.
Alternative hypothesis: The mean birth weight of

infants born to women who smoke during
pregnancy is different from those who do not.
SMOKER              Mean       Variance     Std Dev
1 = smokers         3177.96    434245.66    658.97
2 = non smokers     3417.27    385266.11    620.70
Difference          -239.31
p-value = 0.00001; t-value = 15.890287
Mean birth weights:
born to non-smokers is 3417 grams (~ 7 lbs. 8 oz.)
born to smokers is 3178 grams (~ 7 lbs. 0 oz.)
difference is 239 grams (~ 8 oz.)
Test statistic: t-statistic

Associated P value is 0.00001 (very small)
Reject the null hypothesis
Result is statistically significant
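A hedged sketch of how a t-statistic of this size can arise from the summary figures above, using the pooled-variance (independent samples) formula. The notes do not give the group sizes used for the t-test, so the sizes below are an assumption borrowed from the smoking / LBW table later in these notes (2279 smokers, 7476 non-smokers); the result is therefore close to, but not exactly, the reported 15.89.

```python
import math


def pooled_t_statistic(mean1, var1, n1, mean2, var2, n2):
    """Independent two-sample t with pooled variance."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / standard_error


# Means and variances from the notes; group sizes are ASSUMED (see lead-in)
t_value = pooled_t_statistic(
    mean1=3177.96, var1=434245.66, n1=2279,   # smokers
    mean2=3417.27, var2=385266.11, n2=7476,   # non-smokers
)
print(round(abs(t_value), 1))
```

The sign is negative only because smokers are listed first; the magnitude is what is compared with the t table.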

When to use a t-test?
t-test is used to compare two means.
Birth weight - continuous with normal

(bell-shaped) distribution
One-tailed vs. two-tailed tests
A one-tailed test is used when we predict direction
of the difference in advance (e.g. one mean will be
larger than the other).
In standard testing, probability is calculated from
both tails. Thus, p-value from a two-tailed test is
twice the p-value of a one-tailed test.
It is rarely correct to perform a one-tailed test;
usually we want to test whether any difference
exists. So a two-tailed test is generally the
better choice.
Smoking and LBW
Chi-Square test
Scenario: As clinicians, knowing that LBW is
associated with increased morbidity and mortality, we
are concerned that pregnant patients who smoke are
more likely to deliver a low birth weight (LBW) infant.
Is there a difference in the proportion of LBW infants

born to women who smoked during pregnancy,
compared with infants born to women who did not
smoke during pregnancy?
Smoking and LBW
Smoking -- any smoking during pregnancy
Variable - SMOKER:
EQUAL TO 1 IF SMOKED DURING PREGNANCY
EQUAL TO 2 IF DID NOT SMOKE DURING
PREGNANCY
Low birthweight (LBW): birth weight of less than 2500 grams (~5.5 lbs, i.e. 5 lb 8 oz)
Variable - LOW BIRTHWEIGHT:
EQUAL TO 1 IF BIRTHWEIGHT < 2500 GRAMS
EQUAL TO 2 IF BIRTHWEIGHT ≥ 2500 GRAMS
                      Low Birthweight
SMOKER                1 (LBW)    2 (not LBW)    Total
1 (smoker)            242        2037           2279
  row %               10.6%      89.4%          23.4% of total
  column %            35.6%      22.4%
2 (non-smoker)        437        7039           7476
  row %               5.8%       94.2%          76.6% of total
  column %            64.4%      77.6%
Total                 679        9076           9755
  row %               7.0%       93.0%

                      Chi-square    P-value
Uncorrected           61.45         0.000001
Mantel-Haenszel       61.44         0.000001
Yates corrected       60.71         0.000001
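The uncorrected and Yates-corrected chi-square values in the table can be reproduced from the four cell counts. A minimal sketch with a hypothetical function name: expected counts are (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over the four cells.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square for a 2x2 table [[a, b], [c, d]]; optional Yates correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    correction = 0.5 if yates else 0.0
    return sum((abs(o - e) - correction) ** 2 / e
               for o, e in zip(observed, expected))


# Smokers: 242 LBW, 2037 not LBW; non-smokers: 437 LBW, 7039 not LBW
chi2_uncorrected = chi_square_2x2(242, 2037, 437, 7039)
chi2_yates = chi_square_2x2(242, 2037, 437, 7039, yates=True)
print(round(chi2_uncorrected, 2), round(chi2_yates, 2))
```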
Smoking and LBW
Null hypothesis: The proportion of low birth

weight infants born to women who smoke during
pregnancy is the same as among those who do not
(No association between smoking and LBW)
Alternative hypothesis: The proportion of low

birth weight infants born to women who smoke is
different from those who do not
Smoking and LBW
Proportions of LBW infants:
born to smokers is 10.6%
born to non-smokers is 5.8%
Test statistic: Chi-square

Associated P value is 0.0001 (very small)
Reject the null hypothesis of no association
Result is statistically significant which means
there is an association between smoking and
LBW
Example
Analysis of the data from a research study
designed to examine the hypothesis that
estrogen replacement therapy is associated with
an increased risk for breast cancer reveals a
p-value of <0.01
Is this a statistically significant result? Can the
researcher reject the null hypothesis?
Investigation of
Disease Outbreaks
What is an Outbreak
Definition:
The occurrence of cases of an illness
in excess of expectancy
Identify the existence of
the outbreak
Is the group of ill persons normal for the time

of year, geographic area, etc.?
(background information on disease
occurrence)
Epidemic Curves
Visual display of an epidemic's magnitude and course
# cases by time of onset
Shape of the curve gives you clues
Epidemic Curves:
Point Source Outbreak
[Figure: epidemic curve; x-axis: date (9 to 29), y-axis: no. of cases (1 to 10)]
Propagated or
Person-person Outbreak
[Figure: epidemic curve; x-axis ticks 1 to 28, y-axis: # cases (0 to 35)]
Continuous Source Outbreak
[Figure: epidemic curve; x-axis: date (9 to 29), y-axis: no. of cases (1 to 10)]
Initial assessment / action
Most important: Confirm the diagnosis
Is occurrence outside of normal expectation?

Review case histories from CDC or Health Department.
Communicate with local doctors / health workers involved
Interview and/or examine several cases
Discuss tests / quality of specimens with laboratory
involved
Read the literature, consult experts
Know the clinical presentation and spectrum
Descriptive epidemiology
Case definition and identification

Data collection
Synthesis: generate hypothesis
An example of case definition

All children in grade 3 of a local school who took
part in the field trip on November 20th, and who fell
ill with vomiting and/or diarrhea between the
evening of the 20th and the evening of the 21st
November.
Data collection
Time
incl. epidemic curve
Place
incl. place of residence, work, travel etc.
Person
age and sex
other demographic info
clinical symptoms
laboratory results
Remember:
use standard format for data collection
organize your data; keep track of all cases
Questionnaires and forms
Example of questionnaire for foodborne disease
- Person ID: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- Age: . . . . . . . years
- Sex: M F
- Ill: Y N
IF ILL: Start of symptoms: . . . . / . . . . / . . . . (date) . . . . . . (time)
- Fever Y N - Abdominal cramps Y N
- Nausea Y N - Diarrhoea Y N
- Vomiting Y N - Bloody diarrhoea Y N
- Duration of symptoms: . . . . . . . . . . .
- Treatment: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- Hospitalization: Y N
- Outcome: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- Lab tests: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Questionnaires and forms
Example of questionnaire for foodborne disease (cont.)
Meal 1 Meal 2 Meal 3 Meal 4
Date/time Date/time Date/time Date/time
........ ........ ........ ........
Place of meal:
........ ........ ........ ........
Food items:
........ ........ ........ ........
........ ........ ........ ........
........ ........ ........ ........
........ ........ ........ ........
Generate hypothesis
About
pathogen
route of transmission
Vector/vehicle
From
Symptoms and incubation period
Epidemic curve and place distribution
Age and sex distribution
Review all available data
Past experience, search the literature
Analytical epidemiology
Retrospective cohort study

Used when the population/group of people is
closed.
Starting point: exposure status (exposed -
unexposed)
Calculate attack rates and risk ratios: Ie/Io
Test for statistical significance
Calculate confidence interval of risk ratio
Look for possible confounding
Analytical epidemiology
Case-control study
Starting point: disease status (ill - not ill)
Find odds of exposure of cases and non-cases
Calculate odds ratios: ad/bc
Test for statistical significance
Calculate confidence interval of odds ratio
Look for possible confounding
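The two measures named above (risk ratio Ie/Io for a cohort, odds ratio ad/bc for a case-control study) can be sketched from a 2x2 table. The function names are hypothetical, and the counts are borrowed from the smoking / LBW table earlier in these notes simply to have real numbers to work with; they are not outbreak data.

```python
def risk_ratio(a, b, c, d):
    """RR = Ie / Io, where a, b = exposed ill / not ill and c, d = unexposed ill / not ill."""
    attack_rate_exposed = a / (a + b)      # Ie
    attack_rate_unexposed = c / (c + d)    # Io
    return attack_rate_exposed / attack_rate_unexposed


def odds_ratio(a, b, c, d):
    """OR = ad / bc (cross-product ratio)."""
    return (a * d) / (b * c)


# Exposed = smokers (242 LBW, 2037 not); unexposed = non-smokers (437 LBW, 7039 not)
rr = risk_ratio(242, 2037, 437, 7039)
odds = odds_ratio(242, 2037, 437, 7039)
print(round(rr, 2), round(odds, 2))
```

With a fairly uncommon outcome like this one, the odds ratio is close to, but slightly further from 1 than, the risk ratio.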
Environmental inspection
Environmental inspection
Get info on usual practices
Inspect premises and practices
Take samples
Get info on food storage and handling (cold chain,
hot chain)
Get info on personnel (disease history?)
Ask for maps and plans
Need for specialist health inspector
Have your eyes open for clues from unexpected
sources
Control Measures
Remove source
Isolate / treat cases
Destroy food, recall products
Stop production, close premises
Ensure good practice procedures
Protect persons at risk
General hygiene
Vaccination
Other prophylaxis
Initial assessment/action
Is further investigation necessary ?
Do cases continue to occur?

Is it a serious event?
Is the cause unclear?
Is there a risk for recurrence?
Preventive Measures
Make recommendations
Produce guidelines
Make proposals for change in law
Communication
During the outbreak

Information for the public and the media
Information for professionals
Regular and consistent updates
After the end of the investigation
Produce a report for officials, parties involved,
general public, media
Write up for scientific publication
12 Steps of Outbreak Investigation
Preparations
1. Prepare for fieldwork

Identify Team and resources
Research the disease
Make administrative arrangements
Clarify your role
2. Establish the existence of an outbreak
Does the observed number of cases exceed the
expected number? Of course you need to know the
disease before you can do that.
3. Verify the diagnosis
Speak directly with persons who are affected
Define and Describe
4. Define cases
Establish a case definition
5. Identify cases
Identify and count cases by Line listing
6. Describe and orient the data in terms of
time, place and person
Outbreak curve
Map
Identify demographic and other characteristics of
persons at risk
Analyze
7. Develop hypotheses
Open-ended and wide-ranging interviews with a few
people
8. Evaluate hypotheses
Comparison: hypotheses with established facts
Analytic epidemiology
Cohort studies (RR; 95% CI)
Case-control studies (OR; 95% CI)
9. Refine hypotheses and carry out additional studies

Finalize
10. Implement control and prevention measures

Should occur as soon as info available
Make recommendations and produce guidelines
Make proposals for change in law
11. Communicate findings
Summarize investigation for requesting authority
Produce written report
12. Maintain surveillance to monitor trends and evaluate
control/preventive measures
Number needed to treat (NNT)
Number Needed to Treat (NNT) = # of pts you need to
treat to prevent 1 additional bad outcome (death, stroke,
etc.). Ex: if a drug has an NNT of 5, you have to treat 5
people w/ the drug to prevent one additional bad outcome;
the lower the value, the better
To calculate the NNT, you need to know the Absolute Risk
Reduction (ARR); the NNT is the inverse of the ARR:
NNT = 1/ARR
Where ARR (absolute risk reduction) = CER (Control
Event Rate) - EER (Experimental Event Rate).
Example
The ARR is therefore the amount by which your therapy
reduces the risk of the bad outcome. For example, if your
drug reduces the risk of a bad outcome from 50 per cent
to 30 per cent, the ARR is:
ARR = CER - EER = 0.5 - 0.3 = 0.2 (20 per cent)
NNT = 1/ARR = 1/0.2 = 5

Example
A well-designed randomized controlled trial in children
with a particular disease found that 20 per cent of the
control group developed bad outcomes, compared with
only 12 per cent of those receiving treatment. Calculate
number needed to treat.
Answer:
ARR = CER - EER = 0.20 - 0.12 = 0.08
NNT = 1/ARR = 1/0.08 = 12.5, rounded up to 13
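Both NNT examples can be checked with a small sketch. The function name is hypothetical; exact fractions are used so that rounding up is not disturbed by floating-point error, and rounding up to a whole patient follows the answer given above.

```python
import math
from fractions import Fraction


def number_needed_to_treat(control_event_rate, experimental_event_rate):
    """NNT = 1 / ARR, where ARR = CER - EER; rounded up to a whole patient."""
    arr = Fraction(control_event_rate) - Fraction(experimental_event_rate)
    if arr <= 0:
        raise ValueError("treatment does not reduce risk; NNT is undefined")
    return math.ceil(1 / arr)


nnt_drug = number_needed_to_treat("0.5", "0.3")     # 50% -> 30%
nnt_trial = number_needed_to_treat("0.20", "0.12")  # 20% -> 12%
print(nnt_drug, nnt_trial)
```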
Number needed to harm (NNH)
= # of ppl who must be exposed to something in order for one of them to experience
an adverse effect
NNH = 1 / Absolute risk increase
Absolute risk increase = Experimental event rate - control event rate
HIGHER VALUE = BETTER; MORE PPL MUST BE EXPOSED BEFORE ONE ADVERSE EFFECT OCCURS
Example: 55 out of 75 people died due to usage of an experimental drug. Among 75 people who
took placebo only 35 of them died. What is the number needed to harm?
NNH = 1 / (55/75 - 35/75) = 1 / (20/75) = 3.75
NNH = ~4
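The NNH example can be sketched in the same way, working from the raw counts. The function name is hypothetical, and rounding to the nearest whole number simply mirrors the answer given in these notes.

```python
from fractions import Fraction


def number_needed_to_harm(exp_events, exp_total, ctrl_events, ctrl_total):
    """NNH = 1 / ARI, where ARI = experimental event rate - control event rate."""
    ari = Fraction(exp_events, exp_total) - Fraction(ctrl_events, ctrl_total)
    if ari <= 0:
        raise ValueError("exposure does not increase risk; NNH is undefined")
    return 1 / ari


nnh_exact = number_needed_to_harm(55, 75, 35, 75)  # deaths / group size
nnh_reported = round(nnh_exact)
print(float(nnh_exact), nnh_reported)
```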
CLINICAL PROBABILITY
The probability of an event can be expressed as a

ratio of the number of likely outcomes to the
number of possible outcomes
The probability of an event is denoted by P

Probabilities are usually expressed as decimal
fractions, not as percentages, and must lie btwn 0
(zero probability) & 1 (absolute certainty)
Probability
Methods of calculating probability:
The multiplication rule

The addition rule
Multiplication rule
multiplication rule of probability states that the
probability of 2 or more independent events occurring
at the same time is equal to the product of their
individual probabilities
Example:
Chance of having brown hair is 0.3
Chance of getting a cold is 0.2
What is the chance of meeting a brown-haired person
with a cold?
0.3 × 0.2 = 0.06
Addition rule
addition rule of probability states that the probability of
any one of several particular events occurring is
equal to the sum of their individual probabilities,
provided the events are mutually exclusive (i.e. they
cannot happen at the same time)
Example: Deck of cards

The probability of picking a heart card in a deck is
0.25, The probability of picking a diamond card in a
deck is 0.25. what is the probability of picking a
heart or a diamond card?
0.25+0.25 = 0.5
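Both rules can be sketched in a few lines. The function names are hypothetical, and the code relies on the caller to respect each rule's precondition (independent events for multiplication, mutually exclusive events for addition).

```python
import math


def probability_of_all(*probabilities):
    """Multiplication rule: valid only for independent events."""
    return math.prod(probabilities)


def probability_of_any(*probabilities):
    """Addition rule: valid only for mutually exclusive events."""
    total = sum(probabilities)
    if total > 1:
        raise ValueError("sum exceeds 1; the events cannot be mutually exclusive")
    return total


brown_hair_and_cold = probability_of_all(0.3, 0.2)
heart_or_diamond = probability_of_any(0.25, 0.25)
print(round(brown_hair_and_cold, 2), heart_or_diamond)
```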
Chi-square value and statistical
significance
Suppose the calculated value was 4.25
Degree of freedom was 1
Look at chi square table, find the value of p
P < 0.05
Chi-square value
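If the calculated value was 2.5
Degree of freedom was 1
Look at chi square table, find the value of p
P > 0.05 (2.5 is below the critical value of 3.84 for 1 degree of freedom); not statistically significant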
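For 1 degree of freedom the table lookup can be replaced by a direct calculation, because a chi-square variable with 1 df is the square of a standard normal variable. A sketch using only the standard library (the function name is hypothetical, and the formula applies to df = 1 only):

```python
import math


def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))


p_for_4_25 = chi_square_p_value_df1(4.25)
p_for_2_5 = chi_square_p_value_df1(2.5)
p_at_critical = chi_square_p_value_df1(3.841)  # tabulated critical value for p = 0.05
print(p_for_4_25 < 0.05, p_for_2_5 < 0.05)
```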

t- value and statistical significance
t-value = 2.35
Degree of freedom = 24
P value? Btwn 0.05 & 0.01: statistically significant
