
Disease - definition

- Abnormal condition of an individual that impairs physiological
functioning; covers a broad array of health conditions, including
physiologic states & mental health
- The words disease, illness & sickness are loosely
interchangeable

Disease = physiological/psychological dysfunction

Illness = subjective state of a person who is aware of not being
well

Sickness = state of social dysfunction, i.e., a role that the individual
assumes when ill
Epidemiology - definition
Greek medical terminology:

Epi = Upon
Demos = People
Logos = Study of (Body of Knowledge)

Study of how disease is distributed in populations & the factors that
influence or determine this distribution

Study of the distribution & determinants of health-related states or events
in specific populations & the application of this study to control health
problems
Epidemiology and clinical practice
Practice of medicine is dependent on population data

- Clinical diagnosis (e.g. a clinical finding associated with pathology in a
large group of people)

- Prognosis (e.g. observations of large groups of patients with the same
disease, stage, treatment)

- Selection of appropriate therapy (e.g. studying the effects of a treatment
in large groups of patients in randomized clinical trials)
Final

Epidemic, Endemic, Pandemic

Epidemic
Greater number of cases of a disease than expected in a given
population
Endemic
Constant presence of a disease in a particular locality,
region or people
Pandemic
Epidemic that spreads through human populations across a
large region, a continent, or even worldwide
Epidemiology-objectives
To identify etiology or cause of a disease & factors that increase a
person's risk of disease

To determine extent of disease found in the community

To study natural history and prognosis of disease

To evaluate both existing & new preventive and therapeutic
measures and modes of health care delivery

To provide foundation for developing public policy & making
regulatory decisions relating to environmental problems
Public health- definition

The science concerned w/safeguarding & improving


physical, mental & social well-being of community as a
whole

promoting health and efficiency through


organized community effort
prolonging life
preventing disease
Epidemiology - a population science

Clinicians are concerned with the health of an individual

Epidemiologists are concerned with the collective health


of the people in a community
Design Strategies

Descriptive

Analytic
Final

Descriptive Epidemiology

Examines the natural history & distribution of disease in a population

Observes its distribution in terms of Time, Person,
& Place = Triad of Descriptive Epidemiology
Final

Characteristics of Time

Cyclic Fluctuations: Annual occurrence, Seasonal


variation, Daily occurrence during an epidemic

Changing or stable; trends (comparing today with x yrs


ago)

Clustered (epidemic) or evenly distributed (endemic)


Final

Characteristics of Person

Age
Gender
Ethnicity / Race
Marital status
Socio-economic status (SES)
Occupation
Final

Characteristics of Place

Geographically restricted or widespread

Geographic variation: rural/urban, states

Multiple clusters or one

Physical location such as relation to water or food supply,


pollution
Final

Descriptive Epidemiology

Determines extent of disease in community
Evaluates trends of disease within & among populations
Provides a basis for planning, provision & evaluation of
health services
Provides data to be studied by analytic methods (helps
form hypotheses)
Descriptive Epidemiology

Types of study:

Case Reports
Case Series
Correlation studies
Cross sectional studies = Community health survey
Final
Descriptive vs Analytic

DESCRIPTIVE EPIDEMIOLOGY
Examining the distribution of a disease in a population, & observing
the basic features of its distribution in terms of time,
place, and person.
Typical study designs:
community health survey = cross-sectional study
correlation studies

ANALYTIC EPIDEMIOLOGY
Testing a specific hypothesis about the relationship of a disease
to an alleged cause, by conducting a study that relates the
exposure of interest to the disease of interest.
Typical study designs:
cohort
case-control
clinical trial
Final

Analytic Epidemiology

Examines the determinants of a disease in a population

What factors are associated with disease (risk factors)

what factors are causing the disease

Uses comparison groups -


Triad of Analytic Epidemiology
The three phenomena assessed are:

Host Factors
Agent
Environment
Agents

Nutrients
Allergens
Radiation
Physical trauma
Microbes
Psychological experiences
Host Factors

Degree to which the individual contacts and is able to adapt to the
stressors produced by the agent

Genetic endowment
Immune System (e.g. vaccinated)
Nutritional Status
Environment

Temperature
Sanitation
Pollution of water / air
Population density
Disease Transmission

Modes of communication

Phenomena in the environment that bring host and


agent together, such as:
1. Reservoir
2. Vector
3. Vehicle
Dynamics of Disease Transmission

Source or Reservoir -> Modes of transmission -> Susceptible host

The process of spread of a disease agent through
a population. Answers the questions: Who got it?
and How did it spread (either from a common
source or from person A to person B)?
Reservoir
natural habitat of the infectious agent

Any person, animal, arthropod, plant, soil, or a


combination of these

- in which an infectious agent normally lives and


multiplies, on which it depends primarily for
survival, and where it reproduces itself in such a
manner that it can be transmitted to a susceptible
host.
Modes of transmission

Direct transmission
- Direct contact
- Contact with soil
- Inoculation into skin or mucosa
- Trans-placental (vertical)

Indirect transmission
- Vehicle-borne
- Vector-borne
- Air-borne
Modes of transmission
Vector
An animate intermediary in disease transmission.
Most vectors are arthropods such as mosquitoes,
fleas, or ticks.

Vehicle
Inanimate objects such as food, water, biologic
products (e.g. blood), and fomites that may
indirectly transmit an infectious agent from a
reservoir to a host.
Frequency Measurements
Incidence & Prevalence
This Lecture

Count and Rate


Incidence
Prevalence
Relationship between Incidence and Prevalence
Epidemiology

The study of:

Health related events or Diseases in human populations


related to
frequency, distribution and determinants

and the application of this knowledge


to the control of health problems
Distribution
Think descriptive epidemiology:

Measurement of disease frequency


&
pattern of disease occurrence (who, where,
when)
Distribution
Who?
Describe disease in terms of demographics e.g. sex, age,
race, ethnicity, SES
Where?
Geographic variation - rural/urban, physical location

When?
Annual occurrence, seasonal variation,

daily/ hourly occurrence


Objectives of
Descriptive Epidemiology:

Permits evaluation of trends in health and disease within


and among populations
Identification of emerging problems
Provides a basis for planning, provision and evaluation of
health services
Identifies problems to be studied by analytic methods

Descriptive studies cannot be used to


prove an association between 2 variables
Heart Disease by Race
Surveillance
Monitoring progress of a disease in a community.

A continued watchfulness over the distribution and


trends of diseases through the systematic collection,
consolidation and evaluation of morbidity (disease) and
mortality (death) reports and other relevant information
Surveillance
Active Surveillance
Based on public health legislation, refers to daily, weekly, monthly
contacting physicians, laboratories, schools to actively search for cases.
Used during outbreaks to identify additional cases

Passive Surveillance
Reporting of cases by health care providers in a periodic and consistent
way. Usually through legislatively mandated reporting of certain conditions; not
actively seeking new cases

Sentinel Surveillance
Monitoring the rate of occurrence of specific conditions to assess stability or
change in health levels of a population; done by specific organizations to know how
disease patterns have changed
Count and Ratio

Count, Proportion,
Ratio and Rates
Count

The simplest and most frequently performed


quantitative measure in epidemiology

The number of cases of disease or other health


phenomenon being studied

E.g. Cases of Ebola during 2014 in Liberia


Ratio

Includes: Percentage, Rate and Proportion

Ratio = Numerator / Denominator

In Epidemiology:

Number of events in time T / Population in time T
Incidence Final

= number of new cases (incidents) during a time period

Prevalence
= number of existing cases at a point in time or over a period (aka total cases)
Incidence rate
The rate of:
new events or incidents
(disease, injury or death)
in the population at risk
during a period of time

The population at risk is counted at the beginning of the given time
period, i.e. the beginning of the year.

Incidence rate = (New cases / Population at risk) x 10^n
Incidence rate Final

One of the most important rates in epidemiology

Measures the rate at which people without a disease


develop the disease during a specific period of time

Measured in a cohort study

Can be described as:


cumulative incidence or incidence density
Cumulative Incidence:
Final

The number of new cases of a disease during a
specified period of time in a population at risk

Cumulative incidence =
(New cases occurring in a given period / Population at risk during the same time period) x 10^n

The population at risk is often an estimate
Example:

Among inpatients at Hospital during the month of


June, a total of 12 patients acquired nosocomial
infections. For the same month, the hospital had a
total of 2400 patients admitted. The Incidence for
the month of June per 1000 patients is:

12/2400 x 1000 = 5 per 1000
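
The hospital calculation above can be reproduced with a few lines of Python. This is only a minimal sketch: the function name `cumulative_incidence` and its `per` parameter are illustrative choices, not part of the lecture.

```python
def cumulative_incidence(new_cases, population_at_risk, per=1000):
    """New cases divided by the population at risk, scaled to `per` persons."""
    if population_at_risk <= 0:
        raise ValueError("population at risk must be positive")
    return new_cases / population_at_risk * per

# Hospital example: 12 nosocomial infections among 2400 admissions in June
june_rate = cumulative_incidence(12, 2400, per=1000)
print(f"{june_rate:.1f} per 1000 patients")
```

Changing `per` to 100 or 100,000 only rescales the same proportion; it does not change the underlying risk.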


Incidence Density

The true incidence of disease at any given point in time

The most ideal, although not always practical, measure

Incidence density =
(New cases occurring in a given period / Total person-time of observation) x 10^n
Calculation of Person-years for Incidence Density

Subject    Developed disease    Time followed
A          no                   2 years
B          yes                  1 year
C          no                   2 years
D          yes                  3 years
Total      2 cases              8 person-years

Incidence Density = 2 cases / 8 person-years
= 0.25 per person-year = 25 per 100 person-years
A Prospective Study of post-menopausal hormones and coronary heart
disease
NEJM 313:1044, 1985

Population: 32,317 postmenopausal women


Cases of coronary heart disease: 90
Time period: 105,786.2 person-years

What is the Incidence Density of CHD in this study?

Incidence Density = 90 / 105,786.2 person-years
= 85.1 per 10^5 (100,000) person-years
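
Both person-time examples (the four followed subjects and the hormone study) use the same arithmetic, which can be sketched in Python as follows. The function name `incidence_density` and the tuple layout for follow-up records are assumptions made for illustration only.

```python
def incidence_density(new_cases, person_time, per=100):
    """New cases per `per` units of person-time."""
    if person_time <= 0:
        raise ValueError("person-time must be positive")
    return new_cases / person_time * per

# Four-subject example: (years followed, developed disease?)
follow_up = [(2, False), (1, True), (2, False), (3, True)]
person_years = sum(years for years, _ in follow_up)
cases = sum(1 for _, diseased in follow_up if diseased)
small_cohort = incidence_density(cases, person_years, per=100)

# Hormone study: 90 CHD cases over 105,786.2 person-years
chd = incidence_density(90, 105_786.2, per=100_000)

print(f"{small_cohort:.0f} per 100 person-years")
print(f"{chd:.1f} per 100,000 person-years")
```

Note that each subject contributes only the time actually observed, which is what distinguishes incidence density from cumulative incidence.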
Prevalence rate

The proportion of persons in the population
who have a particular disease at a given time

Prevalence rate =
(All cases during a given time period / Population during the same time period) x 10^n

i.e. total cases / total population


Prevalence:

Second most common measure of disease frequency

Focus on chronic conditions

Measured in a cross-sectional study

Can be measured at
a specific point in time (point prevalence) or
over a specified period of time (period prevalence)
Point vs. Period Prevalence

Point Prevalence examines prevalence


at a single point in time
= status of disease in a population at a point in time

Period Prevalence examines prevalence


over a longer period.
Is the proportion of a population that has the condition
at some time during a given period
e.g. a year
Example
St. Maarten has a survey which shows that 20% of the adult
population over the age of 40 is diabetic. In St. Kitts the
figure is 10%. The statement that living in St. Maarten is
associated with a greater risk of becoming diabetic is

a) Correct
b) Incorrect, the comparison is not based on rates
c) incorrect, because no control or comparison group is used
d) incorrect, because prevalence is used instead of incidence
Incidence and Prevalence

Prevalence does not say anything about risk of


developing a disease

Incidence = new cases;


Tells about the risk & how well the control measures
are working
Prevalence

Prevalence = total disease load

To know how many people need treatment, what


supplies are needed, requirements to take care of
disease.
The Relationship between
Incidence and Prevalence

The Prevalence Pot


Relation between
Incidence and Prevalence

Prevalence is related to:


Incidence of disease and
duration of disease (D)

P ≈ I x D
If the duration is short, e.g. the common cold, the prevalence will be low too
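
A small numeric sketch can make the relation P ≈ I x D concrete. The incidence and durations below are hypothetical numbers chosen only for illustration, and the approximation is assumed to hold for a stable situation in which prevalence is low; the function name is likewise an assumption.

```python
def steady_state_prevalence(incidence_per_person_year, mean_duration_years):
    """Approximate prevalence (as a proportion) as incidence x duration."""
    return incidence_per_person_year * mean_duration_years

# Hypothetical numbers, not taken from the lecture
incidence = 0.005                                             # 5 new cases per 1000 persons per year
short_illness = steady_state_prevalence(incidence, 1 / 52)   # lasts about one week
chronic_illness = steady_state_prevalence(incidence, 10)     # lasts about ten years

print(f"short illness:   {short_illness * 1000:.2f} per 1000")
print(f"chronic illness: {chronic_illness * 1000:.0f} per 1000")
```

With the same incidence, the long-lasting condition accumulates far more existing cases than the short one.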
Paradox between
Incidence and Prevalence

A disease can have a high incidence, but if there is
rapid mortality or recovery the prevalence can be
low
e.g. the common cold

A disease with low incidence may still have high
prevalence if the disease is not associated with cure
or death
e.g. accumulated chronic conditions, especially in the elderly
Incidence and Prevalence
Disease of brief duration will be more likely to be
missed by a prevalence study

Example
30% of all deaths from myocardial infarction
occur within 24 hours of the onset of symptoms
in people having no prior evidence of disease
Incidence and Prevalence
Diseases of long duration are well represented in a
prevalence study, even when their incidence is
low.

Example: Crohn's disease
Incidence is about 2-7 per 100,000 per year
Prevalence is more than 100 per 100,000
Question:
If you want to determine the cost of treating
diabetics in your country do you need to know
the prevalence or incidence of diabetes?

If you want to know if anti-smoking legislation


has resulted in fewer cases of lung cancer
should you use incidence or prevalence?
What happens to incidence and
prevalence if:
- New effective treatment is initiated
  Incidence doesn't change; occurrence has already taken place
  Prevalence decreases
- New effective vaccine gains widespread use
  Incidence decreases
  Prevalence decreases
- Number of patients dying from the condition increases
  Incidence doesn't change
  Prevalence decreases
- Additional federal research dollars are targeted to a specific
  condition
  Nothing changes unless a new effective drug results
- Behavioral risk factors are reduced in the population at large
  Incidence decreases
  Prevalence decreases
Incidence and Prevalence
MCQ
A new chemotherapy treatment is developed that
reduces death from leukaemia but does not produce
recovery. Which of the following will occur?

a)Prevalence of the disease will decrease


b)Incidence of the disease will increase
c)Prevalence of the disease will increase
d)Incidence of the disease will decrease
e)Incidence and prevalence of the disease will decrease
This Lecture

Attack Rate
Morbidity Rate
Mortality Rate
Case Fatality Rate
Relative Risk
Attributable Risk
Attack Rates
Number of events among a population at risk in a period of
time

A variant of an incidence rate; a cumulative incidence rate
- mainly used in epidemic situations
- applied to a narrowly defined population over a limited time
- same formula as incidence rate

Attack rate =
(New cases occurring in a given time period / Population at risk during the same time period) x 10^n
Secondary Attack Rate

Number of new cases among contacts of known cases in
specific groups

Secondary attack rate =
(Cases among contacts of primary cases during the period / Total number of contacts at risk) x 10^n
Question: Calculate primary and secondary attack rates for the
following epidemic of hepatitis A

7 cases of hepatitis occurred among 70 children attending a day care


center. Each infected child came from a different family. The total
number of persons in the 7 affected families (including the children
from the daycare) was 32. One incubation period later, 5 additional
family members developed hepatitis.

1° (primary) attack rate =
2° (secondary) attack rate in the families =
1° (primary) attack rate = 7 / 70 = 10%

2° (secondary) attack rate in the family members = 5 / (32 - 7)
= 5 / 25 = 20%
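
The hepatitis A answer can be checked with a short Python sketch. The function and variable names are illustrative assumptions; the numbers are the ones given in the question.

```python
def attack_rate(new_cases, population_at_risk):
    """Attack rate as a percentage of the population at risk."""
    return new_cases / population_at_risk * 100

# Hepatitis A example from the question above
daycare_children = 70
primary_cases = 7
family_members = 32       # includes the 7 primary cases
secondary_cases = 5

primary = attack_rate(primary_cases, daycare_children)

# Primary cases are no longer at risk, so they are removed from the denominator
contacts_at_risk = family_members - primary_cases
secondary = attack_rate(secondary_cases, contacts_at_risk)

print(f"primary {primary:.0f}%, secondary {secondary:.0f}%")
```

The key step is the denominator of the secondary attack rate: only contacts who could still become cases are counted.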
Morbidity Frequency Measures

Incidence Rate
Attack Rate
Secondary Attack Rate
Point Prevalence
Period Prevalence
Mortality Frequency Measures

Crude Mortality Rate


Specific Mortality Rate
Standardized (=Adjusted) Mortality Rate
Mortality Rates

Rate of death in the population at risk

But usually:
the denominator is the mid-year population, which makes it an
estimate & not an exact figure
Crude Mortality Rates

The proportion of the population dying every year

Crude mortality rate =
(All deaths during a calendar year / Population at midyear) x 1,000

It is the actual measured rate for the whole population

Multiply by 1,000 only if the rate is asked for per 1,000 persons
Example : Age Specific Mortality Rate

Age-specific mortality rate =
(Number of people who died in a particular age group / Total population of the same age group during the same year) x 1000

Age group (years)    Population of Miami    Deaths in Miami    Age-specific mortality rate (per 1000)
17 to 27             50,000                 50                 50/50,000 x 1000 = 1
28 to 47             150,000                750                750/150,000 x 1000 = 5
> 48                 250,000                1000               1000/250,000 x 1000 = 4
Total                450,000                1800               1800/450,000 x 1000 = 4
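
The Miami table can be recomputed with a short Python sketch, which also shows that the "Total" row is simply the crude rate for all age groups combined. The data structure and helper name are assumptions made for illustration.

```python
# (age group, population, deaths) as given in the Miami example
miami = [
    ("17 to 27", 50_000, 50),
    ("28 to 47", 150_000, 750),
    ("> 48", 250_000, 1000),
]

def rate_per_1000(deaths, population):
    """Mortality rate per 1000 population."""
    return deaths / population * 1000

age_specific = {group: rate_per_1000(d, p) for group, p, d in miami}

# The crude rate uses all deaths and the whole population
total_pop = sum(p for _, p, _ in miami)
total_deaths = sum(d for _, _, d in miami)
crude = rate_per_1000(total_deaths, total_pop)

for group, rate in age_specific.items():
    print(f"{group}: {rate:.0f} per 1000")
print(f"crude: {crude:.0f} per 1000")
```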


Mortality Rates
Crude rates: summary rates based on actual number of
events in a population.

When comparing regions or countries:


1. Specific rates: divide a population into more
homogeneous subgroups based on age, sex, race, risk factors,
cause, etc.

2. Adjusted rates: summary rates in which an "as if"
statistical procedure has been applied to remove the effect of
differences in composition of the various populations (i.e. different
countries with populations in the millions). Standard population sizes are used.
Comparing Mortality Rates

2 populations can only be compared when


using:

Age specific mortality rates or


Age-adjusted rates
More Mortality Rates
Age Specific Mortality Rate
e.g. Neonatal Mortality Rate, Infant Mortality Rate

Proportionate Mortality

Case Fatality Rate


Infant Mortality Rates

The number of deaths in the first year after live birth

Important indicator of a country's level of health & development
~10 million infants per year worldwide

Infant mortality rate =
(Deaths of infants under 1 year old in a given year / Live births in the same year) x 1000
Proportionate Mortality

The proportion of the overall mortality ascribed to a
specific disease

Proportionate mortality =
(Deaths assigned to the specified disease in a time period / Total number of deaths from all causes during the same period) x 100

Final
Case-Fatality Rate Final

Deaths from a specific disease
per number of persons with the disease

A measure of:
- the probability of death
- the severity of disease
- the benefit of therapy
Especially used in acute infectious diseases

Case-fatality rate =
(Deaths assigned to the disease in a time period / Total number of people with the disease in the same time period) x 100
Example: Case Fatality Rate
In a population of 100,000 persons:
20 have disease X

In one year 18 die from that disease

Case Fatality is
18 / 20 = 0.9 or 90%
Example: Calculation of
Infant Mortality Rates

In year X 38,910 infants died and


3.9 million children were born
The IMR (number of deaths of children less than 1 year old per 1000 live births)

= (38,910 / 3,900,000) x 1000
= 9.98 per 1000
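
The case fatality and infant mortality examples can be checked with a minimal Python sketch. The function names are assumptions for illustration; the inputs are the figures from the two examples.

```python
def case_fatality_rate(deaths_from_disease, people_with_disease):
    """Percentage of people with the disease who die from it."""
    return deaths_from_disease / people_with_disease * 100

def infant_mortality_rate(infant_deaths, live_births):
    """Deaths under 1 year of age per 1000 live births."""
    return infant_deaths / live_births * 1000

# Disease X: 18 deaths among 20 people with the disease
cfr = case_fatality_rate(18, 20)

# Year X: 38,910 infant deaths and 3.9 million live births
imr = infant_mortality_rate(38_910, 3_900_000)

print(f"CFR = {cfr:.0f}%")
print(f"IMR = {imr:.2f} per 1000")
```

Note that the case fatality denominator is the 20 people with the disease, not the 100,000 people in the population.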
Estimating Risk:
Measures of Association
Measures of disease frequency are the basis for
comparison of populations with different exposures
To identify disease determinants we look for an
association between exposure and the risk of
developing disease
To compare risks between exposure (= a risk
factor/causative factor) and disease
There are several measures of association
Measures of Association

Presentation is often in a 2 x 2 table

                     Disease
                     Yes      No       Total
Exposure    Yes      a        b        a+b
            No       c        d        c+d
            Total    a+c      b+d      a+b+c+d


Measures of Association

RR= Relative Risk


OR= Odds Ratio
AR= Attributable Risk
ARP= Attributable Proportion
Relative Risk = RR = Risk Ratio
Compares risk of a particular event in 2 groups
How much more likely is one group to develop a disease or
death than the other

RR = Absolute risk in the exposed group / Absolute risk in the unexposed group

RR = Incidence rate of exposed group / Incidence rate of unexposed group

= [a / (a + b)] / [c / (c + d)]
= (diseased and exposed / all exposed) / (diseased and non-exposed / all non-exposed)
Relative Risk

A measure of association; it indicates the likelihood

How much the risk increases for a person with risk


factor compared to a person without the risk factor

Measures the strength of association between factor &
outcome
Example

50 out of 100 students who drink tap water get
gastroenteritis during the semester (attack rate =
50%)

150 out of 300 students who don't drink tap water
also get sick (attack rate = 50%)

The relative risk is 50% / 50% = 1, therefore there is no
increased risk associated with drinking tap
water.
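
The relative risk in this example follows directly from the 2 x 2 table cells. A minimal Python sketch, assuming the usual cell labels a, b, c, d; the function name is illustrative.

```python
def relative_risk(a, b, c, d):
    """RR from a 2 x 2 table.

    a, b = exposed with / without disease
    c, d = unexposed with / without disease
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed

# Water example: 50 of 100 drinkers and 150 of 300 non-drinkers fell ill
rr = relative_risk(a=50, b=50, c=150, d=150)
print(f"RR = {rr:.1f}")
```

The group sizes differ (100 versus 300), but the risks are equal, so the ratio is 1.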
Relative Risk

A risk ratio of 1 indicates identical risk in 2 groups

A risk ratio > 1 indicates that exposure gives increased


risk
(i.e. smoking - lung carcinoma)

A risk ratio < 1 indicates protective factor against


developing disease
(i.e. sunscreen protects against skin cancer)
Attributable Risk = AR
= Risk Difference
The risk of disease in exposed group that can be considered
attributable to the exposure
So it is the benefit that might happen if the risk factor is removed
The absolute effect of exposure in those exposed vs. not exposed
= the excess risk of disease (risk difference)

AR = Ie - Io (subtract!)
Ie is incidence in the exposed
Io is incidence in the unexposed

How many more cases in one group Final


Attributable Proportion / Attributable Risk %
The proportion of disease attributable to exposure
Also called: Attributable risk percent (ARP)

If the risk factor is removed, that proportion of disease in the exposed can be prevented

ARP = [(Incidence in exposed group - Incidence in unexposed group) / Incidence in exposed group] x 100%

Another formula: ARP = [(RR - 1) / RR] x 100%
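
The two formulas for the attributable risk percent give the same result, which a short Python sketch can confirm. The incidences used here are hypothetical values chosen only to illustrate the arithmetic, and the function names are assumptions.

```python
def attributable_risk(incidence_exposed, incidence_unexposed):
    """Risk difference: incidence in exposed minus incidence in unexposed."""
    return incidence_exposed - incidence_unexposed

def attributable_risk_percent(incidence_exposed, incidence_unexposed):
    """Share of the exposed group's incidence that is due to the exposure."""
    ar = attributable_risk(incidence_exposed, incidence_unexposed)
    return ar / incidence_exposed * 100

def arp_from_rr(rr):
    """Same quantity computed from the relative risk alone."""
    return (rr - 1) / rr * 100

# Hypothetical incidences, not taken from the lecture
ie, io = 0.30, 0.10
ar = attributable_risk(ie, io)
arp = attributable_risk_percent(ie, io)
arp_alt = arp_from_rr(ie / io)

print(f"AR = {ar:.2f}, ARP = {arp:.1f}%, ARP from RR = {arp_alt:.1f}%")
```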
Measures of Association
Incidence is used to calculate RR, AR & ARP

RR: How much more likely; how much the risk for a
patient who smokes is increased compared to a non-smoker.

AR: How many more cases; excess cases in the exposed
group that can be attributed to smoking

ARP: Reduction in disease; benefit for the population if the risk factor is
removed
Study Designs
This Lecture

Descriptive Studies
Case Report and Case series

Cross-sectional Studies

Correlational Studies

(Scatterplots, Regression, Correlation
Coefficient r, Coefficient of Determination r^2)
Types of Research Study Design
Descriptive Studies (Observational)
Case Reports
Case Series
Cross sectional studies
Ecological /Correlational studies

Analytic Studies (Observational)


Case-control
Cohort
Analytic Studies (Interventional)
Randomized controlled trial
Cross over trials
Types of Descriptive Studies
Case reports

Case series

Cross-Sectional Surveys

Correlational / Ecological studies


Case Reports
Most basic descriptive study

Describes the experience of a single patient

Quick and Cheap

Document unusual medical occurrences


Final

May lead to the identification of a new


disease
Case report: Example

The case of a 51-year-old Moroccan male
admitted for a non-reducible right inguinal
hernia in which surgical exploration showed
the presence of a small bowel tumor that
had migrated into his hernia sac. A histopathological
examination of the tumor was in
favor of a small bowel schwannoma.
Case Series

A collection of Case Reports

Same as the Case Report but:

Describes the experience of a group of patients with the same diagnosis

Is usually 5-10 people but can be up to 100


Final
Case series: Example

Between Oct 1980 and May 1981, 5 cases


of Pneumocystis carinii pneumonia were
reported among young, previously healthy,
homosexual men in L.A. These cases
were studied further.

Such infections previously occurred only in


older, immunosuppressed cancer patients
Case reports and series

May describe:
Previously described disorder involving a new
population or subgroup
Known disorder with unusual clinical
presentation or clinical course
Condition in which novel treatment methods are used

Previously unknown condition


Case Reports/Series
problems/disadvantages
Often based on the experience of only 1 or a few patients

Presence of a risk factor may only be


coincidental

Can not test for the presence of a valid


statistical association

There is no comparison group


Cross-Sectional Study
Also called Prevalence Surveys

Exposure & disease measured simultaneously

Provides a snapshot of population

Examples:
Obesity and TV viewing
Alcohol and CHD
Hypertension and physical inactivity
Final
Cross-sectional survey of coronary heart disease

                          Number      Number      Prevalence
                          examined    with CHD    rate
Not physically active     89          14          157.3/1000
Physically active         90          3           33.3/1000
Total                     179         17          95.0/1000

The data show an association between inactivity
and CHD
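
The prevalence rates in this survey are simple proportions scaled to 1000, as the following Python sketch shows. The dictionary layout and helper name are assumptions for illustration; the counts are those of the survey table.

```python
# group -> (number examined, number with CHD)
survey = {
    "Not physically active": (89, 14),
    "Physically active": (90, 3),
}

def prevalence_per_1000(cases, examined):
    """Existing cases per 1000 persons examined."""
    return cases / examined * 1000

rates = {
    group: prevalence_per_1000(cases, examined)
    for group, (examined, cases) in survey.items()
}

# Overall prevalence uses the pooled counts, not the average of the two rates
total_examined = sum(examined for examined, _ in survey.values())
total_cases = sum(cases for _, cases in survey.values())
overall = prevalence_per_1000(total_cases, total_examined)

for group, rate in rates.items():
    print(f"{group}: {rate:.1f} per 1000")
print(f"Total: {overall:.1f} per 1000")
```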
Cross-Sectional Studies - BENEFITS

Used to provide info on prevalence of


disease (disease load) & health outcomes

Allows administrators to assess health status


& needs of population

Used to formulate hypotheses ONLY; cannot


establish association (for that, must do
analytical study)
Final
Cross-Sectional Surveys -
PROBLEMS

Surveys gather prevalence, not incidence, data

Since both exposure and disease are assessed at
the same time, you cannot determine whether
exposure preceded or resulted from disease
("chicken or egg" dilemma = no temporal sequence)

Usually cannot be used to test a hypothesis


Correlational (Ecological) Studies

Investigating a possible exposure-disease relationship

Use populations as unit of analysis

Populations can be countries, counties, provinces etc

Uses database from entire populations to compare


frequency of a particular disease in relation to some
factor
The relation of obesity and depression:

[Figure: scatter plot of depression against prevalence of obesity in U.S. states]


Ecological studies
Associations on population levels may not reflect
associations on individual levels.

Example: We don't know whether individuals who are
obese also tend to be depressed

Ecological Fallacy: Incorrectly assuming that an
association on a population level reflects an association on an individual
level
e.g. telling a patient who is obese that he is more likely to get
depression as well: WRONG.
Correlation Analysis

Results can be presented in the form of:

A scatter diagram - a graphical form

Line of least squares (linear regression)
- a descriptive form
Final

Correlation coefficient - a descriptive form


Scatter Diagram
Scatter diagram = graphical method to
display the relationship between two
variables

plots pairs of bivariate observations (x, y)


on X-Y plane

Y is called the dependent variable

X is called an independent variable


Scatter Diagram - example
A researcher believes that there is a
linear relationship between BMI
(Kg/m2) of pregnant mothers and the
birth-weight (BW in Kg) of their
newborn
The following data set provides
information on 15 pregnant mothers
who were contacted for this study
X = BMI; independent variable
Y = birth-weight; dependent variable

BMI (Kg/m2)    Birth-weight (Kg)
20             2.7
30             2.9
50             3.4
45             3.0
10             2.2
30             3.1
40             3.3
25             2.3
50             3.5
20             2.5
10             1.5
55             3.8
60             3.7
50             3.1
35             2.8

[Figure: scatter diagram of birth-weight (Kg, axis 0 to 4) against BMI (Kg/m2, axis 0 to 70)]
Is there a linear relationship between BMI and
birth-weight?

Scatter diagrams are important for initial


exploration of the relationship between two
quantitative variables

In the above example, we may wish to


summarize this relationship by a straight
line drawn through the scatter of points
Regression

Although we could fit a line "by eye" e.g. using a


transparent ruler, this would be a subjective
approach and therefore unsatisfactory.

An objective, and therefore better, way of


determining the position of a straight line is to
use the method of least squares or regression.

best fitting line = regression line


Regression
A mathematical model to describe the effect of one
or more independent variables on a dependent
variable. It gives a prediction equation

Linear regression: Effect of one independent


variable (X) on one dependent variable (Y)

Multiple regression: Effect of many independent


variables on one dependent variable
Linear Regression or
Least Squares

The equation for the least squares regression line:


y = a + bx (prediction equation)
a = intercept, b = slope
This will enable you to predict y given x
It will give an average, not an exact figure
[Figure: scatter diagram with y-axis from 0 to 4 and x-axis from 0 to 70]
Linear Regression or
Least Squares
Using this method, we choose a line such that the sum of
squares of vertical distances of all points from the line is
minimized.

These vertical distances, i.e., the distances between the y values &
their corresponding estimated values on the line, are called
residuals

The line which fits best is the regression line OR least-squares
line
Linear Regression or
Least Squares
y = a + bx, with a = 1.775 and b = 0.033
This equation allows you to estimate the
birth weight of other newborns when the BMI is
given.

e.g., for a mother who has BMI = 40,
what would the predicted birth weight of the baby
be?
y = a + bx = 1.775 + 0.033 (40) = 3.095 Kg
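
For readers who want to see how a least-squares line is obtained, the following Python sketch fits the 15 (BMI, birth-weight) pairs from the scatter diagram example using the standard closed-form formulas. It is a sketch under assumptions: the function name is illustrative, and the coefficients fitted from the values as listed in this text need not agree exactly with the slide's a = 1.775, b = 0.033 and r = 0.94, which may reflect rounding or a slightly different data listing.

```python
from math import sqrt

# (BMI in kg/m2, birth weight in kg) pairs as listed in the example
data = [
    (20, 2.7), (30, 2.9), (50, 3.4), (45, 3.0), (10, 2.2),
    (30, 3.1), (40, 3.3), (25, 2.3), (50, 3.5), (20, 2.5),
    (10, 1.5), (55, 3.8), (60, 3.7), (50, 3.1), (35, 2.8),
]

def least_squares(points):
    """Return intercept a, slope b and correlation r for y = a + b*x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    syy = sum((y - mean_y) ** 2 for _, y in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    b = sxy / sxx                 # slope minimizing the squared residuals
    a = mean_y - b * mean_x       # the line passes through the point of means
    r = sxy / sqrt(sxx * syy)     # Pearson correlation coefficient
    return a, b, r

a, b, r = least_squares(data)
predicted = a + b * 40            # predicted birth weight for BMI = 40

print(f"a = {a:.3f}, b = {b:.4f}, r = {r:.2f}, r^2 = {r * r:.2f}")
print(f"predicted birth weight at BMI 40: {predicted:.2f} kg")
```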
Correlation Coefficient = r (-1 to +1)

Summarizes the size & direction of a relationship
between 2 variables

The relationship means that Y changes in a
systematic way as X changes & vice versa

The relationship can be linear (straight line) or
non-linear (other than straight line)
Final
Difference between Correlation and
Regression

Correlation Coefficient, r, measures the
strength of a bivariate association or relationship

regression line = prediction equation that


estimates the values of y for any given x
y = a + bx
Correlation Coefficient: r

A measure of strength of linear association


between 2 variables, x and y

Most statistical packages & some hand


calculators can calculate r

For the data in our Example r = 0.94

r has some unique characteristics


Final
Correlation Coefficient, r
r takes values between -1 and +1

r = +1; Maximum or perfect Positive correlation

r = -1; Maximum or perfect Negative correlation

r = 0 represents no linear relationship between the two


variables

r > 0 implies a direct linear relationship


Final
r < 0 implies an inverse linear relationship

The closer r comes to either +1 or -1, the stronger is the


linear relationship
Correlation Coefficient

Final
Coefficient of Determination = r^2
Another important measure of linear
association between x and y (0 ≤ r^2 ≤ 1)

Measures the proportion of the total variation
in y which is explained by x

For the data in our example: r = 0.94 (rounded) so r^2 =
0.8751

So 87.51% of the variation in BW is explained by
the independent variable x (BMI)
More on Scatterplots and Correlation
Scatter plots can show you a lot about the
relationship between X and Y when using
correlation coefficient

A circular scatter plot or an elliptical scatter plot,


with a horizontal line through its longest part,
indicates no relationship between X and Y

When the line through the longest part of the ellipse
slopes either up or down, there's a relationship
between X & Y
Correlations of various sizes

[Figure: four scatter plots showing r = 1, r = 0.9, r = 0.5 and r = 0]
Correlation and linear regression

Correlation
Quantifies the degree to which two variables
are related; it does not find a best-fit line
(that is regression)

Linear Regression
You must decide what is the cause and what is
the effect, as the regression line is determined as the best
way to predict Y from X.
Reasons for Using the
Correlational Study Design

Simple to design and conduct

Cost effective

Time

To discover new relationships that can later be


explored using analytic studies
A study is published that
demonstrates a strong relationship
between breast cancer and dietary
fat intake. The per capita fat intake
is compared to breast cancer
mortality using data from 21
countries. The Pearson correlation
coefficient is 0.889, p<0.05. A
patient asks you for advice. Can
she reduce her risk of breast
cancer by maintaining a strict low
fat diet?

A. Yes, because the data clearly show an inverse correlation between dietary fat intake and breast cancer
B. Yes, because this type of study has high ecologic validity
C. No, because this type of study cannot be used to determine a cause and effect relationship
D. No, because the number of countries surveyed was too small (big)
E. No, because the study did not reach statistical significance (it did!)
Key points: Correlations
You can't draw causation from a
correlation or any descriptive study
A common trick on the USMLE is to ask
about causation / etiology between 2
factors based on a correlational or other
descriptive study
The correct answer is that no causation
can be proven from this type of study
Which of the following is the best approximation of the
correlation coefficient between cigarette per capita and
CHD deaths as determined from these data?
A +1.20

B 0.22

C +0.72

D 0.85

E 0
The postnatal weight gain (Y) in kilograms over a specified
period of time was related to varied amounts of a formula
supplement (X), in unit doses, taken during the same
period for a sample of 50 infants. The following results
were obtained:

Regression equation: Y = 1.0 + 0.5 X


Correlation coefficient: r = 0.74, P<0.05

Which of the following is a reasonable conclusion?

A. Infants not taking the supplement are expected to gain 2 Kg
B. There is no correlation between weight gain and the
amount of supplement taken
C. An infant taking 2 units of the supplement during the
time period is expected to gain 2 Kg
D. An infant taking 5 units of the supplement during the
time period is expected to gain 3 Kg
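Evaluating the regression equation at the doses named in the options is a quick way to check which conclusions are consistent with it. The sketch below does only that; the function name `predicted_gain_kg` is an illustrative choice.

```python
def predicted_gain_kg(units):
    """Predicted weight gain in kg from the regression equation Y = 1.0 + 0.5 X."""
    return 1.0 + 0.5 * units


# Doses mentioned in the answer options: 0 units (no supplement), 2 units, 5 units
for units in (0, 2, 5):
    print(f"{units} units -> {predicted_gain_kg(units)} kg")
```

The intercept (1.0) is the expected gain with no supplement, and the slope (0.5) is the extra gain per unit dose. A correlation of r = 0.74 with P<0.05 also rules out the "no correlation" option.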
Prevention
Prevention and Epidemiology

Descriptive Epidemiology, Analytic Epidemiology, An effective Intervention = Prevention
An injury story

22-year-old male, recent college graduate, driving at high speed without a seat belt, lost control, crashed head-on into another vehicle and was thrown through the windshield
Paramedics found him and drove him by ambulance to a hospital, where he was examined and assessed, then airlifted to a trauma center
C6-7 fracture dislocation with bilateral jumped facets - complete quadriplegia - surgical decompression - 10 days trauma center - 6 months rehab center
Long term requirements: daily attendant services, am and pm; modified vehicle; wheelchair, adaptive devices; modified housing; career choice change
An injury story - costs
Care Costs:
Acute care---------------- $ 65,000.00
Rehab care--------------- $ 90,000.00
Ongoing care
p.a.--------------------- $ 45,000.00
10 yr-------------------$ 450,000.00
Foregone income
p.a.--------------------$ 71,000.00
10 yr--------------------$ 710,000.00
TOTAL COSTS ----------- $ 1.315 Million
Prevention cost: Almost nothing!
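The total can be checked by adding the four cost items over the 10-year horizon. This is only an arithmetic sketch using the figures listed for the injury story; the dictionary keys are illustrative labels.

```python
YEARS = 10

costs = {
    "acute care": 65_000,
    "rehab care": 90_000,
    "ongoing care (10 yr)": 45_000 * YEARS,     # $45,000 per year
    "foregone income (10 yr)": 71_000 * YEARS,  # $71,000 per year
}

total = sum(costs.values())
print(f"Total: ${total:,}")
```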
Definition of Prevention

Actions aimed at promoting and preserving health; eradicating, eliminating, or minimizing the impact of injury, disease and disability
Levels of prevention
The concept of prevention is best defined in the context of levels,
traditionally called:
Primary: any action to prevent disease onset in the first place; the disease process hasn't started

Secondary: action taken when the disease has already started but symptoms haven't developed yet; catching the disease at an early stage

Tertiary: the disease has started and symptoms are progressing; the aim is to limit its impact
Final
Natural History of Disease
[Figure: timeline of the natural history of disease]
A: Biologic onset
B: Disease detectable by screening
C: Symptoms develop
D: Recovery, Disability or Death
B to C: Detectable pre-symptomatic phase
Primary Prevention:
Modifying risk factors or eliminating causes

Phase:
Before disease is present
Avoids the development of disease
Strategy:
Reduce or remove risk factors
Educate on health promotion
Good nutrition
Vaccinations
Regular exercise
Final
Methods of Primary Prevention:
Health promotion

Life-style: Health education

Exercise, Avoiding Tobacco, Limit alcohol, Safe sex

Nutrition: Healthy diet; reducing salt, sugar and fat

Promoting Breastfeeding
Final
Methods of Primary Prevention:
Health promotion
Environment:

Clean water, vector control

Adequate housing and good waste disposal

Reducing environmental pollution and violence

Final
Methods of Primary Prevention:
Specific protection

Supplements: e.g. folic acid in pregnancy


Immunizations: to prevent infectious disease
Occupational safety
Automobile safety

Final
Secondary Prevention

Finding disease in an asymptomatic person to improve prognosis

Phase:
Disease present

Strategy:
Diagnose the disease early
Prompt treatment
Arrest the disease process
Prevent disability
Final
Methods of Secondary Prevention
Early diagnosis and treatment

Screening tests
- Pap smear
- Colonoscopy
- mammography

Regular checkups
Final
Tertiary Prevention

Limiting complications and disability in symptomatic patients

Phase:
Disease present; clinical course after occurrence of
disease

Strategy:
Restore normal / near normal functioning
Reduce fatalities and complications
Final
Methods of Tertiary Prevention

Disability limitation
Rehabilitation
Medical Treatment / Therapy
Prosthetics
Physiotherapy

Final
The Prevention Plan

Motivates individuals to make decisions that are good for them, their family and community

Identifies an individual's current and future health risks through a detailed health risk assessment
Agencies within the Public Health
Service
National Institutes of Health (NIH)
Substance Abuse and Mental Health Services Administration (SAMHSA)
Administration for Children and Families (ACF)
Administration on Aging (AoA)
Agency for Healthcare Research and Quality (AHRQ)
Agency for Toxic Substances and Disease Registry (ATSDR)
Centers for Disease Control and Prevention (CDC)
Centers for Medicare & Medicaid Services (CMS)
Food and Drug Administration (FDA)
Health Resources and Services Administration (HRSA)
Indian Health Service (IHS)
The CDC
Involved in developing and applying prevention and education activities to improve the health of the people of the U.S.:

Disease prevention and control (especially infectious diseases)
Environmental health
Occupational safety and health
Health promotion
Some Challenges
for the 21st Century

Smoking
Obesity
Institute a Rational Health Care System
Eliminate Health Disparities
Clean up and protect the Environment
Prevention and Epidemiology
Descriptive epidemiology measures
The extent of certain disease (frequencies)
Who (where and when) is at risk (TPP)
The consequences of this disease (Incidence, Case Fatality,
Prevalence=Disease Load)

Analytic epidemiology
What are the risk factors involved
(= determinants of disease)
Prevent, detect, reduce
Cohort Studies
- Starts w/group of exposed ppl & non-exposed ppl, then follows them to see how many ppl develop disease & how many do not
- Prospective: only go fwd from date of study
- Retrospective: events have already occurred; you go back, but even then you still start the study w/exposure present or not & then see who developed disease & who didn't (very similar to case control)
- Calculate relative risk b/c incidence rate is known
  AKA CSR
- Adv: able to study temporal association (that exposure precedes disease can be clearly established), study rare exposures, can control confounders (preventing bias)
- Disadv: time consuming & expensive, high drop out rate
- Nested case control study: start w/exposed & unexposed, eventually get ppl w/disease & no disease & then go back into history to study other factors of interest; once these are grouped, it develops into a case control study

Case control studies
- Starts w/group of ppl already w/disease & group w/out disease, then looks at their histories to see how many were exposed to risk factor & how many were not
- Calculate odds ratio b/c no known incidence rate (needed for relative risk)
  AKA CCO
- Adv: easy to study rare disease, cheap, short duration (less time consuming)
- Disadv: difficult to establish temporal association, biased due to confounders

KNOW WHAT TYPE OF STUDY IT IS; MORE QUES ON THAT THAN CALCS!!!
Final
Clinical trials: ppl w/disease used as reference pop; expal group divided into:
- Drug group
- Placebo group
Placed by randomization to reduce selection bias & reduce effect of known & unknown confounders
Blinding
- Single-blind: participants blinded, investigators know
- Double-blind: both don't know
- Triple-blind: both + analysts don't know who got drug & who got placebo
Phases of clinical trial:
1. Preclinical phase: animal studies
2. Phase I: healthy inds
3. Phase II: diseased inds, i.e. for a HTN drug, is it really reducing the disease (HTN) in the pt? Is it valid for that particular disease?
4. Phase III: randomized control trial
5. Phase IV: after drug released into market, to see any other efx

Cross-over study: pts in groups A & B receive diff drugs; then wait (wash out period) 1 yr & switch the drugs given to each
- Adv: each pt can be compared w/themselves (subjects become their own controls), need fewer ppl

2 types of analysis in clinical trials
- Intention to treat: drug & placebo groups; ALL participants are analyzed, even if non-compliant; reveals effectiveness of drug! **More VALUE
- Explanatory analysis: drug & placebo groups, but ONLY compliant pts analyzed; reveals efficacy of drug! More reptd by drug companies
STUDY conducted to know association of alcoholism & cirrhosis of liver; total of 300 agreed to participate in study; 100 pts w/cirrhosis selected & matched w/200 ppl w/out cirrhosis; interviewed & asked about their alcohol consumption in past. Of 100 pts who had cirrhosis, 80 were alcoholics. In remaining 200 w/out cirrhosis, 40 were alcoholics. What is the type of study conducted? Calculate measurement of risk.

= CASE CONTROL STUDY

             Disease   No disease
Exposed      80        40
Unexposed    20        160
Total        100       200

Odds ratio = odds of exposure in cases / odds of exposure in controls
= [80/20] / [40/160]
= [80*160] / [40*20]
= 16
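As a check on the arithmetic, the 2x2 table can be built directly from the counts given in the question and the cross-product odds ratio computed from it. This is a minimal sketch; the variable names are illustrative.

```python
# Counts taken from the question
cases_total, cases_alcoholic = 100, 80        # pts w/cirrhosis
controls_total, controls_alcoholic = 200, 40  # ppl w/out cirrhosis

# 2x2 table cells: a, b = exposed cases/controls; c, d = unexposed cases/controls
a = cases_alcoholic
b = controls_alcoholic
c = cases_total - cases_alcoholic
d = controls_total - controls_alcoholic

odds_ratio = (a * d) / (b * c)  # cross-product ratio
print(a, b, c, d, odds_ratio)
```

Relative risk is not used here because a case control study fixes the number of cases and controls, so incidence cannot be measured.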
In 1945, a study was conducted on the efx of fetal exposure to the Dutch famine & its association w/childhood & adult illnesses. 500 pregnant women who were exposed to famine were IDd & 500 women who weren't exposed to famine were also studied. Children born to both groups were followed. It was observed that more of the kids born to famine-exposed women devd early CV & respiratory diseases.

= PROSPECTIVE COHORT STUDY

In 1998, a study was conducted on the efx of fetal exposure to the Dutch famine & its association w/childhood & adult illnesses. Records were acquired from 1944. 500 pregnant women who were exposed to famine were IDd, along w/500 women who weren't exposed to famine. From hospital recds, it was found that more of the kids born to famine-exposed women devd early CV & respiratory diseases.

= RETROSPECTIVE COHORT STUDY


Cohort Studies

Also called:
Longitudinal studies
Follow-up studies
Incidence Studies
Cohort Studies

One of the most useful types of observational study
Individuals are divided on the basis of presence or absence of risk factors
Inds are then followed over time to determine if they develop a specific outcome or disease
What is a cohort?
Group of individuals
sharing same experience

followed up for specific period of time

Examples:
Group of smokers

Occupational cohort of chemical plant workers
Cohort Studies

               Follow up    Follow up
               Disease      Disease does    Totals   Incidence
               develops     not develop
Exposed        A            B               A+B      A/(A+B)
Not exposed    C            D               C+D      C/(C+D)

Incidences in the exposed group and the not exposed group are calculated
Cohort Study of smoking and Coronary Heart Disease (CHD)

                          Follow up   Follow up
                          CHD         CHD does      Totals   Incidence
                          develops    not develop            /1000 per year
Smoke cigarettes          84          2916          3000     28.0
Do not smoke cigarettes   87          4913          5000     17.4

Select a group of 3,000 smokers (exposed) and 5,000 nonsmokers (not exposed)
All are free of heart disease at baseline
Both groups are followed for the development of
CHD
Incidence in both groups is compared
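The comparison of incidence in the two groups can be sketched as follows, using the counts from the smoking and CHD table. The relative risk is computed here as the ratio of the two incidences; the function name `incidence_per_1000` is an illustrative choice.

```python
def incidence_per_1000(new_cases, group_size):
    """Incidence expressed per 1,000 people in the group."""
    return new_cases / group_size * 1000


smokers = incidence_per_1000(84, 3000)     # exposed
nonsmokers = incidence_per_1000(87, 5000)  # not exposed

relative_risk = smokers / nonsmokers
print(f"{smokers:.1f} vs {nonsmokers:.1f} per 1000, RR = {relative_risk:.2f}")
```

A relative risk above 1 means the exposed group developed CHD at a higher rate than the non-exposed group.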
Types of Cohort Studies

PROSPECTIVE (FUTURE
STUDIES)
RETROSPECTIVE
Prospective Cohort Study

Initiation of study occurs before occurrence of disease
Groups of exposed & unexposed inds are monitored over time to assess devt of disease
Incidence of disease in both groups is compared
Potential confounders documented
I.e. smoking in a study btwn lung cancer & alcohol; smoking would be a confounder
Final
Prospective Cohort Study

Exposure Disease

?
Final

Exposure may have occurred at study entry
Outcome definitely has not occurred at study entry
Prospective Cohort Study of smoking relationship
to lung cancer
Identify a population of elementary school students
and follow them up
Identify those who smoke and those who do not
Observe who develops lung cancer and who does not in the future
Final

= Concurrent Cohort
= Longitudinal Study
Cohort Study: Lead level and Affective Disorders

Exposed group: 100 children exposed to high levels of lead were followed for 15 years; 40 developed an affective disorder
Non-exposed group: a similar group of 100 children not exposed to lead was followed; 5 of these children developed an affective disorder

               Affective Disorder
               present   absent
exposed        40        60
not exposed    5         95

What is the incidence of affective disorders among those exposed to high lead levels?
40/100 = 40% or 0.40

What is the incidence of affective disorders among those NOT exposed to high lead levels?
5/100 = 5% or 0.05
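With both incidences in hand, the two groups can be compared directly. The sketch below computes the relative risk and, as an extra measure that the slide does not ask for, the risk difference.

```python
# 2x2 table from the lead exposure example
exposed_present, exposed_absent = 40, 60
unexposed_present, unexposed_absent = 5, 95

incidence_exposed = exposed_present / (exposed_present + exposed_absent)
incidence_unexposed = unexposed_present / (unexposed_present + unexposed_absent)

relative_risk = incidence_exposed / incidence_unexposed
# Not part of the slide: absolute excess incidence in the exposed group
risk_difference = incidence_exposed - incidence_unexposed

print(f"RR = {relative_risk:.1f}, risk difference = {risk_difference:.2f}")
```

Under these numbers, children exposed to high lead levels developed an affective disorder at about 8 times the rate of unexposed children.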
Retrospective Cohort Study

Initiation of study occurs after occurrence of disease / outcome of interest
Allows investigators to complete study in less time (saves money)
Subject to bias
Since they depend on exposure data recorded previously, info can be incomplete; information on confounders may not be available
Retrospective Cohort Study

Exposure Disease
?

Both exposure and disease have already occurred


Retrospective Cohort Study: Example

In 1963, a group of asbestos workers was identified from social security tax returns during 1948-1951

All deaths in the group 1948-1963 were investigated and compared to deaths of a group of cotton textile workers

An excess lung cancer mortality was revealed among the asbestos workers

= Non-concurrent Cohort Study
= Historical Cohort Study
FAMOUS COHORT STUDIES
The Framingham Study

One of the most important and best-known cohort studies of cardiovascular disease, begun in 1948
Population 30,000
Eligibility: age between 30 and 62 years
Many exposures: smoking, obesity, elevated blood pressure, elevated cholesterol levels, low levels of physical activity
New coronary events were identified by examining the study population every 2 years and by daily surveillance of hospitalizations at the only hospital in Framingham
The Framingham Study -hypotheses
The incidence of Coronary Heart Disease (CHD) increases with age.
It occurs earlier and more frequently in males
Persons with hypertension develop CHD at a greater rate than those
who are normotensive
Elevated blood cholesterol level is associated with an increased risk
for CHD
Tobacco smoking and habitual use of alcohol are associated with an
increased incidence of CHD
Increased physical activity is associated with a decrease in the
development of CHD
An increase in body weight predisposes a person to the
development of CHD
An increased rate of development of CHD occurs in patients with
diabetes mellitus
Cohort studies of Special Exposures

Atomic Bomb Casualty Commission: Hiroshima and Nagasaki survivors (effects of radiation)

Dutch famine survivors (effects of starvation)


Cohort studies: Childhood Health and Disease

Fetuses exposed to radiation from atomic bombs in Hiroshima and Nagasaki during World War II
Followed up for development of cancer and other health problems resulting from intrauterine exposure to radiation
Exposure dose was calibrated for the survivors on the basis of how far the person was from the point of bomb drop and the nature of the barriers that were present
Risk of adverse outcome was related to the radiation dose that was received
Cohort of pregnancies during the Dutch Famine in World War II
Identify cohorts who were exposed to the severe famine at different times in gestation and compare them with each other and with a non-exposed group
OCCUPATION-BASED COHORTS

British Doctors Study (Doll: smoking)
Nurses' Health Study (Speizer, Willett: many issues)
London civil servants (Marmot: SES)
Taiwanese civil servants (Beasley: chronic hepatitis & liver cancer)
Advantages of Cohort Studies

The temporal sequence btwn exposure & disease is clearly established
Well suited for assessing efx of rare exposures
Incidence of disease can be calculated
Can examine multiple outcomes of a single
exposure
True estimate of risk can be calculated - RR
Disadvantages of Cohort Studies:

Time consuming

Money: expensive

Subject to loss to follow up

Added: data may not be completely reliable or complete, i.e. unable to retain all records
When not to do a cohort study?

No clear distinction between exposed and not exposed
Rare diseases: not enough people
Chronic disease: long gap btwn exposure & outcome, so more costly! The longer you have to follow, the harder it gets
Bad records, unreliable data for retrospective cohort study
5 Steps in cohort study

1. Choose the design


2. Selection of exposed group
3. Selection of comparison group
4. Follow up
5. Analysis and interpretation
Choice of Cohort Design

Depends on the study question

Food handling and gastroenteritis: retrospective study design

To investigate the association between hyperlipidemia and coronary heart disease: prospective study design using community cohort (Framingham Study)
Selection of the Exposed Population
For diseases with common exposures: highly
compliant and motivated participants are
chosen e.g. doctors, nurses, union members,
veterans
For diseases with rare exposure: usually find high
exposure groups (uranium workers, Chernobyl
residents)
The higher exposure should lead to relatively higher
incidence of the disease
Selection of the Comparison Group

The comparison group must be free from the exposure, but otherwise similar to the exposed group

In case of a special exposure group (asbestos workers), an external comparison group (e.g. general population) should be used
McMichael A. Mortality among rubber
workers: Relationship to specific jobs. J
Occup. Med 18:178,1976
Rubber workers at a tire manufacturing plant in
Akron, Ohio were followed for development of
disease and causes of death
Comparison group was from the general US
population, matched for age and sex
All cause mortality for rubber workers was only 82%
of general population
This is an example of the healthy worker effect: employed people are, on average, healthier than the general population
Hawthorne Effect
When individuals participating in a study change behavior
(mainly towards positive) for a temporary period of time,
due to the fact that they are being observed

The comparison problem in the rubber workers example can be addressed by using workers from the same establishment who are not exposed to the risk factor as the comparison group

e.g. within the same company compare factory floor workers (exposed) and office workers (not exposed)
Sources of Data
Need accurate and readily available data, which can be difficult for retrospective cohort studies

Usual sources: e.g. medical records, death certificates, interviews, questionnaires

Exposure information: pre-existing records are more objective than patient interviews
Outcome Data
Death certificates

Hospital records

Periodic health exams of cohort (e.g. Framingham Heart Study)

Questionnaires: validate with medical records (Nurses' Health Study)
Issues in a cohort study

Bias

Loss to follow up

Non participation
Bias in cohort study

Bias is less of a concern in cohort studies than in case-control studies

Recall bias is rare, but misclassification and selection bias can still occur
Lost to Follow Up

Since cohort studies follow people over a period of time, participants can move or become non-compliant

A study that loses >25% of participants is flawed

The study design should use stable, compliant individuals who are committed
Case Control Studies
Types of Analytic Studies

Observational
Case-control
Cohort

Experimental
Randomized control trial
Case-control Study

People diagnosed as having a disease (cases) are compared with persons who do not have the disease (controls)

The purpose is to determine if the 2 groups differ by exposure

It compares cases and controls with regard to the exposures in their past
Final
Case Control Studies

Study which involves identifying patients who have the outcome of interest (cases) and control patients who do not have that same outcome, and looking back to see if they had the exposure of interest
Final

The exposure could be some environmental factor, a behavioral factor, or exposure to a drug
Case-control Study

Final
Case-control Study Design - benefits

Rare diseases cannot be analyzed easily using another approach. Case control study is best

Chronic diseases (e.g. cancer) have long latency periods. Case control study is suitable

Time & money issues: cost efficient & less time to complete

Many exposure factors can be studied at the same time

Final
Case-Control Design - issues

Difficulties choosing appropriate controls
Cannot get true estimate of risk (relative risk cannot be calculated here b/c we don't have incidence, the # of new cases; instead calculate odds ratio)
Issues w/temporal association (which came first & which came later; this can, however, be established in a cohort study)
Strong potential for bias
Confounding

Final
Design of Case-control Studies

I. Definition and selection of cases
II. Selection of controls
III. Ascertainment of disease & exposure status
IV. Analysis by calculating odds ratio (vs. relative risk for cohort)

Final
I. Definition and Selection of Cases

The definition must be specific, e.g. meningioma, not brain tumor

Use standardized diagnostic criteria


II. Selection of Controls

Selection of an appropriate control group is the most difficult issue in case-control design

They should be comparable to the source population of the cases, including exclusions and restrictions

No control group is optimal


Selection of Controls: multiple controls

Investigators usually use 2 to 4 control groups, selected in different ways.

Since cases are rarer than controls, and we choose cases and controls, we can have more than one control per case to improve the statistical power of our study.
Community Controls

Neighborhood controls
Best friend control (same habits + limiting
confounders)
Spouse or sibling control
Selection of Hospitalized Controls

Easily identified and readily available


Medical records & health histories available
Less non-response
Confounders: Hospitalized patients are more
likely to be smokers, alcoholics and with other
high risk behaviors
III. Ascertainment of Disease and
Exposure Status

Disease:
Hospital records, case-registries, pathology log books
etc

Exposure:
Interview, mail questionnaires, medical records etc
IV. Stratified Analysis

Create strata of the confounding variable
If sex is a confounder then analyze men and women separately
If age is a confounder then analyze data separately for each age group

Disadvantage:
It is extremely cumbersome
Difficult to control for more than 1 confounder at a time
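A small sketch can show what analyzing strata separately looks like in practice. The counts below are entirely hypothetical and were chosen so that sex confounds the association: the crude table suggests an effect, while the odds ratio within each sex is 1.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio; a, b = exposed cases/controls, c, d = unexposed."""
    return (a * d) / (b * c)


# Hypothetical 2x2 counts per stratum, as (a, b, c, d)
strata = {
    "men": (80, 40, 20, 10),
    "women": (10, 20, 40, 80),
}

# Crude table: add the strata together cell by cell
crude = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude)
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}

print("crude OR:", crude_or)
print("stratum-specific ORs:", stratum_ors)
```

Each additional confounder multiplies the number of strata, which is why the approach becomes cumbersome with more than one confounder.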


Case-control Study and the
Odds Ratio

Incidence cannot be derived in a case-control study

An estimate of relative risk (the odds ratio) can be calculated
ODDS RATIO
Example
Of 200 patients in the hospital, 50 have lung cancer. Of these 50 patients, 45
are smokers. Of the remaining 150 hospitalized patients who do not have lung
cancer, 60 are smokers. This information can be used to calculate the odds ratio
for smoking and the risk of lung cancer.

                         Disease
                         LC (n = 50)   No LC (n = 150)
Exposure   Smokers       45            60
           Non-smokers   5             90
ODDS RATIO

              Cases      Controls
              with LC    without LC
smokers       A=45       B=60
nonsmokers    C=5        D=90

Odds Ratio = odds of exposure among cases / odds of exposure among controls

OR = (A/C) / (B/D) = (A)(D) / (B)(C) = (45)(90) / (60)(5) = 13.5
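The two forms of the formula can be checked against each other with the lung cancer counts. The sketch below computes the odds of exposure in each group first and then confirms that their ratio equals the cross-product AD/BC.

```python
# Lung cancer example: A, B = smokers among cases/controls; C, D = non-smokers
A, B, C, D = 45, 60, 5, 90

odds_exposure_cases = A / C     # odds that a case is a smoker
odds_exposure_controls = B / D  # odds that a control is a smoker

or_from_odds = odds_exposure_cases / odds_exposure_controls
or_cross_product = (A * D) / (B * C)

print(odds_exposure_cases, round(odds_exposure_controls, 3), or_cross_product)
```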
Analysis of Case-control Studies

                  Case   Control
Exposure   Yes    a      b
           No     c      d

Odds Ratio = ad / bc
Odds Ratio (OR)

A ratio that measures the odds of exposure for cases compared to the odds of exposure for controls

OR Numerator: Odds of exposure for cases

OR Denominator: Odds of exposure for controls


Interpreting the Odds Ratio for CHD and
smoking

Odds Ratio = 1.62

The odds of being a smoker are 1.62 times as high among those with CHD as among those without CHD
The odds of exposure for cases are 1.62 times the odds of exposure for controls.
Bias
Bias is any systematic error in an epidemiologic study that results in an incorrect estimate of the association btwn exposure & risk of disease

All studies, but esp case-control studies, have potential for bias

The efx are difficult to evaluate in the analysis

Bias should be eliminated, if possible, when the study is designed
Bias
Selection bias
Recall bias
Interviewer bias
Experimenter expectancy
Misclassification bias
Measurement bias

Final
Selection Bias

= sampling bias
The samples selected as cases and as controls differ in their properties

Sample selection may involve pre- or post-selection that preferentially includes or excludes certain kinds of subjects and therefore affects the results
Selection Bias - examples
Ex: A case control study for heart disease and smoking. Cases are selected from a community, controls are selected from a health club.

Ex: A case control study for endometrial cancer and hormone replacement therapy. Cases are post-menopausal women and controls are younger women. Controls should actually be taken from post-menopausal women w/out endometrial cancer*** - THIS IS HOW we correct for selection bias
Recall Bias:

Recall bias occurs whenever inds with a particular adverse outcome remember their previous exposure experience differently from those who aren't similarly affected

Ex: People who are sick tend to think about possible causes for their illness

Recall bias can lead to over- or underestimation of risk

Final
Recall Bias - example

A study of prenatal infections & congenital malformations:
Cases: mothers of children with congenital malformations
Controls: mothers of children without congenital malformations
Final

Mothers of children with congenital malformations remembered infections during pregnancy better
Interviewer Bias

Interviewers who are aware of the study hypothesis are likely to question cases and controls differently

More probing questions may be asked of cases

The interviewer may unconsciously sabotage the process


Experimenter expectancy

Pygmalion effect

Experimenter's expectations are communicated to subjects, unintentionally

The subjects then produce the desired effect


Misclassification and Measurement Bias

Misclassification: Subjects may be erroneously categorized with respect to exposure or disease status

Measurement: Method of collecting information was flawed
Control of Bias
Selection bias: pick controls from the same source as cases, use motivated individuals

Recall bias: use a hospitalized control group

Interviewer bias: highly trained personnel, blinded to study hypothesis, standardized questionnaires etc

Misclassification bias: use standard sources to validate
Confounding

A potential confounder is a variable that is known to be associated with both the exposure and the outcome (effect), even though it is not the variable under study

Confounder

Exposure Disease
Final
Confounders

Common confounders include age, gender, tobacco, alcohol, socio-economic status

e.g. A case-control study shows an association between decreased level of physical activity and increased risk of MI. Could age be a confounder?

Final
Controlling for Confounding

Study design:
Matching
Restriction
Analysis:
Stratified analysis
Multivariate analysis

Final
Matching
Cases and controls are matched by the usual confounders (e.g. age, sex, SES, smoking, alcohol etc.) so that these factors are equally distributed in both groups and will not confound the association between the variables

Disadvantage: It can be very difficult and expensive to find a perfect match for each case
Restriction
Another way to reduce the effect of confounders in a study is
to place restrictions on the study subjects

If smoking can be a confounder, then only enroll non-smokers

If age is a confounder then place age restrictions


Evaluation of a Case-control Study

Was the study design appropriate?


How were cases diagnosed and selected?
How were controls selected?
Did the investigators identify areas of bias and
confounding?
How was bias minimized?
How was confounding controlled?
Randomized Clinical
Trials
Intervention (Experimental) Studies

Also known as the clinical trial

Similar to a cohort study, as individuals are studied on the basis of exposure

Main difference btwn observational & intervention studies is that the investigator in an intervention study has full control over the exposure received by the participants

In RCT, tx is assigned by randomization


Randomized Clinical Trial (RCT) vs.
Prospective Cohort Study
Population (RCT) Population (Cohort)

Volunteers Volunteers

RANDOMIZE

intervention control exposed non-exposed

outcome outcome outcome outcome


Types of Intervention Studies

Clinical trial (Therapeutic)
Does the agent or procedure diminish symptoms (disease) or decrease mortality from disease in a group of individuals?
e.g. most drug trials and therapy trials

Community trial (Preventive / Prophylactic)
Does the agent or procedure decrease the incidence of disease in a given community?
e.g. most vaccine trials
Uses of RCT

Evaluate new drugs


Evaluate new treatment procedures
Testing efficacy of new health care
program
Assessment of preventive measures
Assessment of new program for screening
Assessment of new ways to deliver health
services
Design of RCT
Reference population

Experimental population

Participants

Treatment Allocation (Randomly)

Treatment group Comparison group(s)


Crossover Design
The subjects get both treatments in sequence

Each subject serves as his / her own control

A subject is randomly assigned to a specific treatment


order

Some subjects will receive the standard therapy first,


followed by the new therapy (A,B). Others will receive
the new therapy first, followed by the standard
therapy (B,A) Final
Crossover Design-planned

Final
Conducting a Clinical Trial

Formulate the hypothesis

Choose sample size and select participants

Do necessary exclusions

Random assignment

Outcome measurement
Final
Selection of Study Population

Reference population: general group to whom results will be applicable

Experimental population: actual group in whom study is conducted; must be of adequate sample size and have the potential to reach endpoints

Individuals are then assigned randomly to treatment and comparison groups
Allocation of Study Regimens

Assignment to tx groups should occur after the study pop is chosen & informed consent has been obtained

Randomization tables & computer generated randomization are used most frequently

Block randomization can be used when you wish to maintain equal #s of a pop characteristic in each group, e.g. gender (equal # of men & women in both drug A & drug B)
Final
Block Randomization

Study population n=1200 (100 women & 1100 men)

Blocks:          100 women            1100 men
Randomization:   Drug A    Drug B     Drug A    Drug B
                 n=50      n=50       n=550     n=550

Randomization occurs after assignment into male and female blocks
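The scheme in the diagram can be sketched in a few lines: participants are first grouped into blocks by sex, then shuffled and split evenly between the two drugs within each block. This is a minimal illustration under stated assumptions; the participant IDs, the fixed seed and the function name `randomize_within_blocks` are all hypothetical.

```python
import random


def randomize_within_blocks(blocks, arms=("Drug A", "Drug B"), seed=0):
    """Shuffle each block separately, then alternate arms so each block splits evenly."""
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible
    assignment = {}
    for members in blocks.values():
        shuffled = list(members)
        rng.shuffle(shuffled)
        for position, participant in enumerate(shuffled):
            assignment[participant] = arms[position % len(arms)]
    return assignment


# Hypothetical participant IDs: 100 women and 1100 men
blocks = {
    "women": [f"W{i}" for i in range(100)],
    "men": [f"M{i}" for i in range(1100)],
}
assignment = randomize_within_blocks(blocks)

# Count participants per (sex, arm)
counts = {}
for participant, arm in assignment.items():
    key = (participant[0], arm)
    counts[key] = counts.get(key, 0) + 1
print(counts)
```

Randomizing all 1200 participants in one pool could, by chance, leave the 100 women unevenly split between the drugs; randomizing inside each block guarantees the 50/50 and 550/550 split.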


Why do we randomize?

Reduces bias due to known and unknown confounders: potential confounders should be equally distributed across all groups

Reduces bias: no selection bias

Final
Blinding

Single blind: only the experimenter knows the assignment of subjects; the subjects don't

Double blind: neither experimenter nor subjects know the assignments

Triple blind: subjects, experimenter & data analysts are blind to assignments

Final
Blinding

A double blind trial (or triple-blind) provides the best protection against bias

When the study is not conducted blind it is important to scrutinize it carefully for bias: are the groups followed with equal intensity for evaluation of the outcome?
Why Choose a Control Group?

All intervention studies use a control group


Show that the new treatment is truly effective
The control group can be compared in various
ways:
No intervention vs. Intervention
Placebo treatment vs. Real Treatment
Standard care vs. New care

Final
Uncontrolled trials

Issues arising from not using a control group:

Predictable improvement (e.g. headache/migraine goes away within a day or two)
Fluctuation of disease severity
Hawthorne effect
Predictable Improvement

In many diseases, individuals who are sick will recover without treatment, e.g. common cold, many other viral infections, headache

If no control group is used, the treatment may appear to work when in fact it has not
Fluctuation of Disease Severity

Many diseases have a clinical course marked by exacerbations and remissions, e.g. Crohn's disease, multiple sclerosis & migraine

Treatment may be perceived to have a beneficial effect when in reality the lessening of symptoms was part of the natural history of the disease
Hawthorne Effect

Individuals enrolled in a study may change their behavior solely because they are being watched as study participants

Improvement in sx may just be b/c the participant adopted a healthier lifestyle & not b/c of the drug. This would be difficult to know without a control group.
Types of controls

Historical control: from the past

Concurrent non-randomized control

Concurrent randomized control - preferred
Controls

Must be a concurrent control group

Must be randomized control

Should receive a placebo or standard treatment rather than no treatment

Follow up and testing should be identical


Factors that can affect the outcome of RCT

Errors in hypothesis testing

Sample size

Post randomization changes in groups

Analysis of data
Post-randomization Changes in
Comparison Groups

Migration bias:
Study participants may drop out, switch
Tx groups, become non-compliant

Compliance bias:
Inds in 1 tx arm may drop out at higher
rates due to factors such as side efx
Analysis of Data

Intention to treat: analyzes all subjects in the group to which they were randomized (preserves random allocation, simulates real-world experience)

Explanatory: analyzes only those who actually take the treatment
Intention to Treat

                     Study population

          Drug A                          Placebo
          1000                            1000

   200              800                   1000
   non-compliant    compliant             compliant

                    250 cured             250 cured

Intention to treat means that the cure rate for Drug A is calculated as 250/1000 = 25%
The cure rate for the placebo arm is 250/1000 = 25%

Drug A has no effect


Explanatory

                     Study population

          Drug A                          Placebo
          1000                            1000

   200              800                   1000
   non-compliant    compliant             compliant

                    250 cured             250 cured

Explanatory analysis means that the cure rate for Drug A is calculated as 250/800 = 31%
The cure rate for the placebo arm is 250/1000 = 25%

Drug A is more effective than placebo
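
The two analyses differ only in the denominator. A minimal sketch using the slide's numbers; it assumes, as the diagram implies, that all 250 cures in the Drug A arm occurred among the 800 compliant subjects. The variable and function names are illustrative.

```python
def cure_rate(cured, denominator):
    """Proportion cured among the subjects counted in the denominator."""
    return cured / denominator


# Numbers taken from the two slides above.
randomized_drug_a = 1000   # everyone allocated to Drug A
compliant_drug_a = 800     # those who actually took Drug A
cured_drug_a = 250
randomized_placebo = 1000
cured_placebo = 250

# Intention to treat: denominator is everyone randomized to the arm.
itt_drug_a = cure_rate(cured_drug_a, randomized_drug_a)
# Explanatory: denominator is only the compliant subjects.
explanatory_drug_a = cure_rate(cured_drug_a, compliant_drug_a)
placebo_rate = cure_rate(cured_placebo, randomized_placebo)

print(f"Intention to treat: Drug A {itt_drug_a:.0%} vs placebo {placebo_rate:.0%}")
print(f"Explanatory:        Drug A {explanatory_drug_a:.0%} vs placebo {placebo_rate:.0%}")
```

Dropping the 200 non-compliant subjects shrinks the denominator, which is what makes the drug look better than placebo in the explanatory analysis.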


Important

Drug trials in the medical literature should always be analyzed and reported by the intention-to-treat method
Intention to Treat: why include non-compliant subjects?

Attempting to account for noncompliance by excluding noncompliant subjects can bias the treatment evaluation
In clinical practice, some patients are not fully compliant
Compliant subjects usually have better outcomes than noncompliant subjects, regardless of tx
Outcome:
Efficacy vs. Effectiveness

Efficacy = ability of tx to work in an ideal study setting

Effectiveness = ability of tx to work under realistic circumstances, assessed by intention-to-treat analysis; preferred over efficacy

For efficacy trials, explanatory analysis can be used, but for effectiveness trials we use intention to treat
Multi-center trials

There may not be enough patients at any one center to give a big enough sample for a study

Many centers conduct the study using the same intervention and placebo
Trials of N equal to 1:

Clinical trial done w/an individual patient

Pt given tx or placebo in random order at diff times
A record is kept of simple outcomes like symptom score, relief etc.
Only possible in conditions which occur frequently & resolve quickly, e.g. migraine, asthma
Phases of Clinical Trials

Preclinical Phase: Animal studies

Phase I: Initial testing in healthy human volunteers following animal studies
Identify dose-limiting toxicities & tolerated doses; describe pharmacology (metabolism, excretion)

Phase II: Testing in subjects w/disease to determine activity & therapeutic efficacy; validate toxicity & dosage data

Phase III: Randomized trials for comparison w/standard therapy

Phase IV: Studies done after drug/tx has been marketed to gather info on drug's effect in various pops & any side efx associated w/long-term use
Declaration of Helsinki

Declaration of Helsinki, 1964 (WMA): document on research ethics

Informed consent must be obtained from all participants involved in human experiments
Ethical Aspects

Informed consent
Protecting the interests of the patient
Withholding treatment known to be effective
Monitoring for toxicity and adverse effects
Stopping rules
When to withdraw a patient from study
Informed Consent

Patients must be aware of the study hypothesis

They must understand that they can be assigned to treatment or placebo arms
They must be told all possible consequences of participation
Minors can only enroll with a guardian's consent
An ethics board must oversee the study
Stopping Rules

Guidelines for deciding when a trial should be modified or terminated:

Over time, new knowledge about a disease or its treatment may come to light
New treatments may become available
If the results show a sustained, statistically significant benefit, it is unethical to withhold treatment from the placebo arm
DeMets D, Hardy R, et al. Statistical aspects of early termination in the Beta-Blocker Heart Attack Trial. Cont Clin Trials 5:362, 1984

The Beta-Blocker Heart Attack Trial was a randomized double-blind study comparing propranolol with placebo in 3837 patients with a recent myocardial infarction
The trial was terminated by an external monitoring board 9 months ahead of schedule
The propranolol group had a 26% reduction in mortality compared to placebo (p = 0.005)
Screening
A strategy used to identify disease in an unsuspecting population

Tests are performed mainly on those without any clinical indication of disease, the apparently well

Test must be simple, rapid and preferably inexpensive

Screening
Basic purpose is to detect disease early in a large group of apparently well persons.
This enables a diagnostic workup and, if disease is found, treatment with the intention of reducing mortality and suffering from the disease
Ex: Pap smears (cervical cancer), colonoscopy (colon cancer), Mantoux tests (TB), mammograms (breast cancer) & PSA tests (prostate cancer)
Interpretation of
Diagnostic Procedures
Two aspects of measurement that are
crucial in evaluating laboratory
tests, physical maneuvers, or any
diagnostic procedure:

Reliability
Validity
Reliability aka Reproducibility

Whether a lab test consistently gives the same value when multiple tests are conducted on the same sample

Inter-rater reliability: degree of agreement among raters

Test-retest reliability: measure of reliability obtained by administering the same test twice over a period of time to a group of individuals

A good kappa value, indicating a reliable test & reliable raters, is at least 0.75 (75%).
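
The slide does not say how kappa is computed. As a hedged illustration, the sketch below uses Cohen's kappa for two raters, (observed agreement - chance agreement) / (1 - chance agreement). The function name and the counts for the two raters are hypothetical and are not taken from the lecture.

```python
def cohen_kappa(both_pos, a_pos_b_neg, a_neg_b_pos, both_neg):
    """Cohen's kappa for two raters giving positive/negative readings."""
    total = both_pos + a_pos_b_neg + a_neg_b_pos + both_neg
    observed = (both_pos + both_neg) / total       # raw agreement
    a_pos = (both_pos + a_pos_b_neg) / total       # rater A's positive rate
    b_pos = (both_pos + a_neg_b_pos) / total       # rater B's positive rate
    # Agreement expected by chance if the raters were independent.
    expected = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)
    return (observed - expected) / (1 - expected)


# Hypothetical readings of 100 samples by two raters.
kappa = cohen_kappa(both_pos=40, a_pos_b_neg=5, a_neg_b_pos=5, both_neg=50)
print(f"kappa = {kappa:.2f}")
```

With these made-up counts the raters agree on 90 of 100 samples, but about half of that agreement would be expected by chance, so kappa comes out lower than the raw 90%.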
Validity
A screening test's ability to do what it's supposed to do:
To distinguish btwn subjects w/the condition & those w/out
So: whether what is intended to be measured is in fact measured
i.e. whether a positive lab test indicates a person truly has the disease
Validity
This is measured by sensitivity and specificity

If the disease is present, how often does the test detect it: Sensitivity

If the disease is not present, how often does the test correctly give a negative result: Specificity
Validity: Screening Tests

                                  Disease status
                           Present              Absent
Results of     Positive    a = true +ve         b = false +ve
screening
test           Negative    c = false -ve        d = true -ve
Sensitivity
The proportion of persons w/disease who are correctly identified by the test

= The probability that a diseased person will have a positive test result
= true positive rate

A highly sensitive test gives positive results in individuals who have the disease

Sensitivity = true positives / diseased individuals x 100
            = true positives / (true positives + false negatives) x 100
Sensitivity
                       Disease
                  Present      Absent
Positive test     TP or a      FP or b
Negative test     FN or c      TN or d

Measures only the distribution of persons with disease
Uses data from the left column
Specificity
The proportion of persons without the disease who are correctly identified by the test
= The probability that a disease-free individual will have a negative test result
= true negative rate

A highly specific test gives negative results in individuals who do not have the disease

Specificity = true negatives / non-diseased individuals x 100
            = true negatives / (true negatives + false positives) x 100
Specificity
                       Disease
                  Present      Absent
Positive test     TP or a      FP or b
Negative test     FN or c      TN or d

Measures only the distribution of persons who are disease free
Uses data from the right column
Application of sensitivity and specificity
Screening tests do not have both 100% sensitivity & 100% specificity

In order to have a high yield, a series of tests is often done: the 1st with high sensitivity and the 2nd with high specificity

Examples include VDRL & FTA-ABS for syphilis, and ELISA & Western blot testing for HIV
Relation of Sensitivity and Specificity

We would like to have a sensitivity and specificity that are both as close to 100% as possible

In practice we may gain sensitivity at the expense of specificity and vice versa
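
The trade-off can be seen by moving the cutoff of a continuous test. The sketch below is illustrative only: the measurement values, the two cutoffs (140 and 130) and the function name are hypothetical, and a value at or above the cutoff is assumed to count as a positive test.

```python
def sens_spec(diseased_values, healthy_values, cutoff):
    """Sensitivity and specificity when values >= cutoff are called positive."""
    true_pos = sum(value >= cutoff for value in diseased_values)
    true_neg = sum(value < cutoff for value in healthy_values)
    return true_pos / len(diseased_values), true_neg / len(healthy_values)


# Hypothetical measurements; the two groups overlap, as real data usually do.
diseased = [125, 132, 138, 145, 150, 160, 170, 180]
healthy = [90, 95, 100, 105, 110, 120, 128, 134, 136, 142]

high_cutoff = sens_spec(diseased, healthy, cutoff=140)
low_cutoff = sens_spec(diseased, healthy, cutoff=130)

print(f"cutoff 140: sensitivity {high_cutoff[0]:.1%}, specificity {high_cutoff[1]:.1%}")
print(f"cutoff 130: sensitivity {low_cutoff[0]:.1%}, specificity {low_cutoff[1]:.1%}")
```

Lowering the cutoff catches more of the diseased group (higher sensitivity) but labels more healthy people positive (lower specificity).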

[Figure: population distribution of intraocular pressures in eyes with and without glaucoma. x-axis: intraocular pressure in mm Hg (14 to 42); y-axis: number of eyes. The two curves overlap; the low-pressure end of the overlap is labeled high sensitivity / low specificity and the high-pressure end high specificity / low sensitivity.]

Screening level set at the high end of the overlap:
Poor sensitivity: not all ppl w/disease will be ID'd
Good specificity: ppl w/out disease will be correctly ID'd (ruled out)
Hence fewer false +ves

Screening level set at the low end of the overlap:
Good sensitivity: ppl w/disease will be ID'd
Poor specificity: not all ppl w/out disease will be correctly ID'd
Hence fewer false -ves
Post-test Probability

Positive Predictive Value


Negative Predictive Value

Positive Predictive Value = PPV

Probability that an ind with a positive test result has the disease

PPV = true positives / all with positive tests = a/(a+b)

USE ONLY TOP ROW!
Positive Predictive Value
                       Disease
                  Present      Absent
Positive test     TP or a      FP or b
Negative test     FN or c      TN or d

PPV = TP/(TP+FP)
Measures only the distribution of persons with a positive test
Uses data from the top row
Negative Predictive Value

The probability that an individual with a negative test result does not have the disease.

NPV = true negatives / all with negative tests = d/(c+d)

ONLY USE BOTTOM ROW
Negative Predictive Value
                       Disease
                  Present      Absent
Positive test     TP or a      FP or b
Negative test     FN or c      TN or d

NPV = TN/(TN+FN)
Measures only the distribution of persons with a negative test
Uses data from the bottom row
Two examples

                       Disease (as determined by "gold standard")
                       Present            Absent
Test       Positive    True positive      False positive     -> Pos pred value
outcome    Negative    False negative     True negative      -> Neg pred value

                       Sensitivity        Specificity
                       (left column)      (right column)
FOB screen test is used in 203 people to look for bowel cancer:

                    Patients with bowel cancer (as confirmed on endoscopy)
                    Present        Absent
FOB test    Pos     TP = 2         FP = 18
            Neg     FN = 1         TN = 182

PPV = TP / (TP + FP) = 2 / (2 + 18) = 2 / 20 = 10%
NPV = TN / (TN + FN) = 182 / (182 + 1) = 182 / 183 = 99.5%
Sensitivity = TP / (TP + FN) = 2 / (2 + 1) = 2 / 3 = 66.67%
Specificity = TN / (FP + TN) = 182 / (18 + 182) = 182 / 200 = 91%

PPV = 10%: a positive test is poor at confirming cancer
Sensitivity: it will pick up 66.7% of all cancers
Specificity: as an initial screen it correctly identifies 91% of those who do not have cancer
NPV = 99.5%: as a screening test, a negative result is very good at reassuring a patient that they do not have cancer
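
The four measures in the FOB example can be checked with a short script. This is a sketch, not a standard library routine: the function name screening_metrics and its dictionary keys are illustrative choices, while the counts are the ones given on the slide.

```python
def screening_metrics(tp, fp, fn, tn):
    """Validity and predictive values from the four cells of a 2x2 table."""
    return {
        "sensitivity": tp / (tp + fn),   # left column
        "specificity": tn / (tn + fp),   # right column
        "ppv": tp / (tp + fp),           # top row
        "npv": tn / (tn + fn),           # bottom row
    }


# Counts from the FOB example: TP = 2, FP = 18, FN = 1, TN = 182.
fob = screening_metrics(tp=2, fp=18, fn=1, tn=182)
for name, value in fob.items():
    print(f"{name}: {value:.1%}")
```

Each measure uses only one column or one row of the table, which is why the same test can have a high NPV and a low PPV at the same time.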
Breast Cancer Detection and Implications for
Periodicity of Screening. Am J. Epi 100: 357-366,1974

                              Breast Cancer
                         Present      Not Present
Screening    Positive    132          985
test         Negative    47           62,295

Prevalence = (a+c)/(a+b+c+d) = 179/63,459 = 0.3%

1. Sensitivity = a/(a+c) = 132/179 = 73.7%

2. Specificity = d/(b+d) = 62,295/63,280 = 98.4%

3. PV+ = a/(a+b) = 132/1,117 = 11.8%

4. PV- = d/(c+d) = 62,295/62,342 = 99.9%


Accuracy
Proportion of all subjects who were correctly
classified by the test
The degree to which a measurement represents the true value

Accuracy = (TP + TN) / (TP + TN + FP + FN) = (true positives + true negatives) / total screened
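
As a worked check, the sketch below applies the accuracy formula, together with the prevalence formula (a+c)/(a+b+c+d), to the breast cancer screening counts shown earlier (132, 985, 47 and 62,295). The function names are illustrative.

```python
def accuracy(tp, fp, fn, tn):
    """Proportion of all subjects correctly classified by the test."""
    return (tp + tn) / (tp + fp + fn + tn)


def prevalence(tp, fp, fn, tn):
    """Proportion of all subjects who have the disease."""
    return (tp + fn) / (tp + fp + fn + tn)


# Breast cancer screening counts from the example above.
counts = dict(tp=132, fp=985, fn=47, tn=62_295)
breast_accuracy = accuracy(**counts)
breast_prevalence = prevalence(**counts)

print(f"accuracy   = {breast_accuracy:.1%}")
print(f"prevalence = {breast_prevalence:.1%}")
```

Because the disease is rare here, accuracy is dominated by the true negatives, so it stays high even though PV+ in the same table is only 11.8%.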


Gold-standard
The gold standard is the test that is considered to be the most accurate among all the known tests. All other tests should be compared with it in order to judge whether they are valid.
Prevalence

The proportion of individuals in a population who have the disease

Prevalence = number with disease / total number of individuals in the study = (a+c)/(a+b+c+d)
Predictive Value,
specificity & prevalence
The Positive Predictive Value
the probability that if the test is positive, the patient truly has the
disease

depends on:
the Specificity and
even more on Prevalence of the disease

PPV increases when specificity &/or prevalence increases!!!


                              Disease
                         present      absent
Screening    Positive    900          4,950
test         Negative    100          94,050

Prevalence 1%

1. Sensitivity = a/(a+c) = 900/1,000 = 90%

2. Specificity = d/(b+d) = 94,050/99,000 = 95%

3. PV+ = a/(a+b) = 900/5,850 = 15.4%

4. PV- = d/(c+d) = 94,050/94,150 = 99.9%


                              Disease
                         present      absent
Screening    Positive    900          1,980
test         Negative    100          97,020

Prevalence 1%

1. Sensitivity = a/(a+c) = 900/1,000 = 90%

2. Specificity = d/(b+d) = 97,020/99,000 = 98%

3. PV+ = a/(a+b) = 900/2,880 = 31.3%

4. PV- = d/(c+d) = 97,020/97,120 = 99.9%


                              Disease
                         present      absent
Screening    Positive    4,500        4,750
test         Negative    500          90,250

Prevalence 5%

1. Sensitivity = a/(a+c) = 4,500/5,000 = 90%

2. Specificity = d/(b+d) = 90,250/95,000 = 95%

3. PV+ = a/(a+b) = 4,500/9,250 = 48.6%

4. PV- = d/(c+d) = 90,250/90,750 = 99.4%


Effect of Prevalence on Positive Predictive Value

Prevalence (%)    PV+ (%)    Sensitivity (%)    Specificity (%)
0.1               1.8        90                 95
1.0               15.4       90                 95
5.0               48.6       90                 95
50                94.7       90                 95
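
The PV+ column can be reproduced from sensitivity, specificity and prevalence alone, since PV+ = true positives / (true positives + false positives). A minimal sketch with an illustrative function name:

```python
def positive_predictive_value(sensitivity, specificity, prevalence):
    """PV+ as the share of positive tests that are true positives."""
    true_pos = sensitivity * prevalence                 # diseased and test positive
    false_pos = (1 - specificity) * (1 - prevalence)    # healthy but test positive
    return true_pos / (true_pos + false_pos)


# Same sensitivity (90%) and specificity (95%) as the table above.
for prevalence in (0.001, 0.01, 0.05, 0.50):
    pv_pos = positive_predictive_value(0.90, 0.95, prevalence)
    print(f"prevalence {prevalence:.1%}: PV+ = {pv_pos:.1%}")
```

When the disease is rare, the small false-positive rate is applied to a very large healthy group, so false positives outnumber true positives and PV+ stays low.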
Key Points
If the prevalence of a disease is low, then PV+ will be low even if you have a test w/high sensitivity & specificity

This is one reason most rare diseases are not screened for

The yield can be increased by screening in high-risk groups (Tay-Sachs among Jews, sickle cell among those of African American/Mediterranean origin, Huntington's disease in family groups)
Bias in screening programs

Sometimes even if a test is valid and reliable, the results obtained can be biased

Three main types exist:
lead time bias,
length bias and
selection bias
Lead Time Bias
The misperception that a case has a longer survival simply because the disease was identified earlier in its natural course (even if tx is not working)
Treating the time of detection by screening as the time the disease started falsely suggests that the screening test increases survival time
Lead Time Bias

AGE    Event
35     Biologic onset of disease
40     Disease detectable by screen
41     Patient A diagnosed at screening
43     Symptoms develop: B diagnosed
45     A & B both die

2 patients, A & B, and their courses of disease

Both have the same age-specific mortality, but different survival times from diagnosis.
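
The arithmetic behind the timeline can be sketched as follows. The ages are the ones on the slide (A diagnosed at 41, B at 43, both die at 45); the variable names are illustrative.

```python
# Ages taken from the timeline above.
diagnosis_age = {"A": 41, "B": 43}   # A found by screening, B by symptoms
death_age = 45                       # both patients die at the same age

# Survival measured from diagnosis, which is what survival statistics report.
survival_from_diagnosis = {
    patient: death_age - age for patient, age in diagnosis_age.items()
}
# Lead time: how much earlier screening moved the diagnosis.
lead_time = diagnosis_age["B"] - diagnosis_age["A"]

print(survival_from_diagnosis)
print(f"lead time = {lead_time} years")
```

Patient A appears to survive two years longer, yet both die at 45: the extra "survival" is exactly the lead time gained by earlier diagnosis.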
Length Bias
Tumors detected by screening programs tend to be slower growing and therefore have a better prognosis
The faster growing tumors may become symptomatic between scheduled screenings and are usually more aggressive

So screening tends to find tumors with an inherently better prognosis (i.e. more benign); being good at finding them doesn't mean screening improves outcomes
Misperception: that screening itself leads to better outcomes.
Selection Bias
Individuals who are motivated to participate in
screening programs may have a different probability of
disease than individuals who refuse to participate

Ex: Women with a family history of breast cancer who join a screening program may give a biased result: more women with the illness are found, and more die of it. When the test is applied to the general population the results may be different
Controlling Bias
Large and strict RCTs, but these take a long time and are expensive
Fundamental Concepts of Screening
Sensitivity: a/(a+c) = true +ve rate
given disease, how many have a positive test
Specificity: d/(b+d) = true -ve rate
given no disease, how many have a negative test
PV+: a/(a+b) = disease rate among positive tests
given a positive test, how many have disease
PV-: d/(c+d) = non-disease rate among negative tests
given a neg. test, how many do not have disease

PV+ increases esp. if prevalence increases
PV+ also increases if specificity increases
Bias: lead time, length and selection
A blood test to detect prostate cancer was given to
1000 male members of a large HMO. Although 100
of the men actually had prostate cancer, the test was
positive in only 30; the other 70 patients with prostate
cancer had negative tests. Of the 900 men without
prostate cancer, the test was positive in 150 men and
negative in 750. The specificity of this test is
approximately

A. 7%
B. 17%
C. 18%
D. 30%
E. 83%

Worked 2x2 table (N = 1000):

                  Cancer (total = 100)    No cancer (total = 900)
Test positive     True +ve = 30           False +ve = 150
Test negative     False -ve = 70          True -ve = 750

What's the sensitivity? 30 / [30 + 70] = 30/100 = 30%
Specificity = 750 / [750 + 150] = 750/900 = 83% (answer E)
Cut point of a screening test intended for
detecting Diabetes Mellitus was lowered from
140 mg to 130 mg of glucose per dL of blood.
Change in the cut point for this screening test
would
A. Increase specificity
B. Increase sensitivity
C. Decrease true positive rate
D. No change in sensitivity or specificity
since the number of people with DM will
remain the same
Confidence Interval
for Relative Risk and Odds ratio
Confidence Interval
(in terms of RR or OR)
Provides a range around the odds ratio or the relative risk within which the true magnitude of effect is likely to lie

Usually set at the 95% level, equivalent to p < 0.05

Provides all the information of the p value and more

If the interval doesn't contain 1.0 then the association btwn variables is SIGNIFICANT


Example 1
A study is designed to investigate the association
between body fat and breast cancer. The results show a
risk ratio of 6.0, however the 95% confidence interval is
(0.8 - 23.2).
Since the interval includes 1, the results may be due to chance alone: not significant
Example 2
A study is designed to investigate the association between alcoholism and cirrhosis of the liver. The results show a risk ratio of 4.5 (95% CI 2.2-6.8).

Since the interval does not include 1, the result is found to be statistically significant
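
The rule used in both examples is simply whether the interval contains 1.0. A minimal sketch; the function name is_significant is illustrative, and the intervals are the two given in the examples.

```python
def is_significant(ci_lower, ci_upper, null_value=1.0):
    """For a ratio measure (RR or OR), an interval that excludes 1.0
    indicates a statistically significant association."""
    return not (ci_lower <= null_value <= ci_upper)


# Example 1: risk ratio 6.0, 95% CI 0.8 to 23.2
example_1 = is_significant(0.8, 23.2)
# Example 2: risk ratio 4.5, 95% CI 2.2 to 6.8
example_2 = is_significant(2.2, 6.8)

print(f"Example 1 significant: {example_1}")
print(f"Example 2 significant: {example_2}")
```

Example 1 has the larger point estimate, but its interval is wide and crosses 1, so only Example 2 counts as significant.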
CAUSE-EFFECT RELATIONSHIP
Hill's Criteria
Association and Cause

Just because there is an association does not mean it is a cause as well

Once you see an association, you apply certain criteria to see if it is causal.
Knowing When to Accept the Findings of a Study

Statistical significance is only part of the answer

Hill's criteria are used to help you decide whether or not to accept the findings
Whenever you critique a medical study you should use Hill's criteria
8 Hill's Criteria:
1. Study design
2. Strength of Association
3. Consistency
4. Correct Temporal Relationship: clearly established or not?
5. Dose Response Relationship
6. Plausibility: is there a known scientific explanation?
7. Specificity
8. Analogy
1. Study Design

Study designs ranked from strongest to weakest:

Experimental study (RCT): strongest
Prospective cohort study
Historical cohort study
Case-control study
Cross-sectional study
Case series
Case report: weakest
2. Strength of Association

Large relative risk or odds ratio, e.g. RR of 27 vs. RR of 1.68

Statistically significant (p value < 0.05), e.g. p value of 0.00001 vs. 0.047
3. Consistency

Several diff studies conducted at diff times in diff settings & w/diff patients all come to the same conclusion.

For example:
Many studies of different designs (case control, cohort, case
series etc.), using different subjects, found an association
between smoking and lung cancer
4. Correct Temporal Relationship

Cause must always precede effect

Ex: Smoking precedes lung cancer
5. Dose Response Relationship

Risk increases with increasing exposure to the risk factor
Ex: Increased smoking increases the risk of lung cancer
6. Biologic Plausibility

Consistent with the current knowledge of the underlying mechanisms of disease
Or
Makes sense according to current knowledge

e.g. smoking and lung cancer


7. Specificity

A single cause linked to a single effect provides evidence in favor of a causal relationship
Usually seen in acute infectious diseases or hereditary diseases / single-gene defects
However, a cause need not be specific to one disease; it may also cause other diseases
8. Analogy

Existence of other cause-and-effect relationships analogous to the one in question

If toxins in cigarette smoke can cause lung disease, so can other toxins like asbestos, arsenic and uranium.
Nested Case-control Studies

A case-control study can be inserted into a cohort study

When enough individuals have developed the outcome of interest, they can be compared to controls

This allows us to look at and compare other exposures
Nested Case-Control Study

Start as cohort study: population with exposed and unexposed groups

   Develop disease          -> cases
   Do not develop disease   -> subgroup selected as controls

Cases and controls are then compared as a CASE-CONTROL STUDY
Evidence based medicine

Definition: conscientious, explicit & judicious use of current best evidence in making decisions about the care of ind pts
Conscientious: being careful & thorough in what you do
Explicit: being open, clear and transparent
Judicious: using good judgment and common sense
Using the most reliable evidence from clinical research, scientific understanding and medical practice
to make the best possible medical decisions for patients.

EBM not only identifies which txs are effective but also those which are ineffective and may do
more harm than good, and identifies areas where more investigation is needed and where there may
be gaps in knowledge
Steps for practicing EBM
Step 1: Formulating a well built question

Clinicians convert the need for information (regarding prevention, diagnosis, prognosis, therapy, causation, etc.) into an answerable question.

Example: Is an exercise program or a nutritional education program more effective in reducing weight in obese elementary school children?
Steps for practicing EBM

Step 2: Identifying resources

Clinicians seek to assemble the best and most up-to-date evidence with
which to answer that question.

Need to consult several types of information resources.

Example: Harrison's Principles of Internal Medicine, Cochrane Database of Systematic Reviews, DynaMed, MEDLINE
Steps for practicing EBM
Step 3: Critical appraisal

Clinicians appraise and assess evidence for its validity (truthfulness), impact (size of effect), & applicability (usefulness to specific clinical practice & situation).

Who were the patients and how were they selected?
Were they randomized for treatment?
How were the confounders addressed?
Were the results significant?
Steps for practicing EBM

Step 4: Applying the evidence

Clinicians integrate the critical appraisal with clinical expertise and with the patient's unique biology, values, and circumstances.
Steps for practicing EBM
Step 5: Re-evaluation

Clinicians evaluate their effectiveness in executing Steps 1-4.

Clinicians also seek ways to improve their methods for the next clinical encounter.
Systematic reviews
A thorough, comprehensive, and explicit way of interrogating the
medical literature. It typically involves several steps, including (1)
asking an answerable question (2) identifying one or more
databases to search, (3) developing an explicit search strategy, (4)
selecting titles, abstracts, and manuscripts based on explicit
inclusion and exclusion criteria, and (5) abstracting data in a
standardized format.

Ex: Cochrane reviews


Abstract
Background
Depression is a debilitating condition affecting more than 350 million people worldwide (WHO 2012) with a limited number of evidence-based
treatments. Drug treatments may be inappropriate due to side effects and cost, and not everyone can use talking therapies. There is a need for
evidence-based treatments that can be applied across cultures and with people who find it difficult to verbally articulate thoughts and feelings. Dance movement
therapy (DMT) is used with people from a range of cultural and intellectual backgrounds, but effectiveness remains unclear.

Objectives
To examine the effects of DMT for depression with or without standard care, compared to no treatment or standard care alone, psychological
therapies, drug treatment, or other physical interventions. Also, to compare the effectiveness of different DMT approaches.

Search methods
The Cochrane Depression, Anxiety and Neurosis Review Group's Specialised Register (CCDANCTR-Studies and CCDANCTR-References) and
CINAHL were searched (to 2 Oct 2014) together with the World Health Organization's International Clinical Trials Registry Platform (WHO ICTRP)
and ClinicalTrials.gov. The review authors also searched the Allied and Complementary Medicine Database (AMED), the Education Resources
Information Center (ERIC) and Dissertation Abstracts (to August 2013), handsearched bibliographies, contacted professional associations, educational
programmes and dance therapy experts worldwide.

Selection criteria
Inclusion criteria were: randomised controlled trials (RCTs) studying outcomes for people of any age with depression as defined by the trialist, with at
least one group being DMT. DMT was defined as: participatory dance movement with clear psychotherapeutic intent, facilitated by an individual with a
level of training that could be reasonably expected within the country in which the trial was conducted. For example, in the USA this would either be a
trainee, or qualified and credentialed by the American Dance Therapy Association (ADTA). In the UK, the therapist would either be in training with, or
accredited by, the Association for Dance Movement Psychotherapy (ADMP, UK). Similar professional bodies exist in Europe, but in some countries
(e.g. China) where the profession is in development, a lower level of qualification would mirror the situation some decades previously in the USA or
UK. Hence, the review authors accepted a relevant professional qualification (e.g. nursing or psychodynamic therapies) plus a clear description of the
treatment that would indicate its adherence to published guidelines including Levy 1992, ADMP UK 2015, Meekums 2002, and Karkou 2006.
Main results
Three studies totalling 147 participants (107 adults and 40 adolescents) met the inclusion criteria. Seventy-four participants took
part in DMT treatment, while 73 comprised the control groups. Two studies included male and female adults with depression. One
of these studies included outpatient participants; the other study was conducted with inpatients at an urban hospital. The third
study reported findings with female adolescents in a middle-school setting. All included studies collected continuous data using
two different depression measures: the clinician-completed Hamilton Depression Rating Scale (HAM-D); and the Symptom
Checklist-90-R (SCL-90-R) (self-rating scale).

Statistical heterogeneity was identified between the three studies. There was no reliable effect of DMT on depression (SMD -0.67
95% CI -1.40 to 0.05; very low quality evidence). A planned subgroup analysis indicated a positive effect in adults, across two
studies, 107 participants, but this failed to meet clinical significance (SMD -7.33 95% CI -9.92 to -4.73).

One adult study reported drop-out rates, found to be non-significant with an odds ratio of 1.82 [95% CI 0.35 to 9.45]; low quality
evidence. One study measured social functioning, demonstrating a large positive effect (MD -6.80 95 % CI -11.44 to -2.16; very
low quality evidence), but this result was imprecise. One study showed no effect in either direction for quality of life (0.30 95% CI
-0.60 to 1.20; low quality evidence) or self esteem (1.70 95% CI -2.36 to 5.76; low quality evidence).

Authors' conclusions
The low-quality evidence from three small trials with 147 participants does not allow any firm conclusions to be drawn regarding
the effectiveness of DMT for depression. Larger trials of high methodological quality are needed to assess DMT for depression,
with economic analyses and acceptability measures and for all age groups.
Meta-analysis

Statistical approach to combining the data derived from a systematic review. Therefore, every meta-analysis should be based on an underlying systematic review.

A pooled effect size is calculated from all the studies.
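
The lecture does not say how the effect sizes are combined. One common approach, shown here only as a hedged sketch, is fixed-effect inverse-variance pooling, where each study is weighted by 1 / SE^2. The three study effects and standard errors below are hypothetical numbers made up for illustration, and the function name is illustrative.

```python
import math


def pooled_effect(effects, standard_errors):
    """Fixed-effect inverse-variance pooling: weight each study by 1 / SE^2."""
    weights = [1 / se ** 2 for se in standard_errors]
    pooled = sum(w * e for w, e in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se


# Hypothetical effect sizes and standard errors from three studies.
effects = [0.20, 0.40, 0.30]
standard_errors = [0.10, 0.20, 0.10]

pooled, pooled_se = pooled_effect(effects, standard_errors)
ci_lower = pooled - 1.96 * pooled_se
ci_upper = pooled + 1.96 * pooled_se
print(f"pooled effect = {pooled:.3f} (95% CI {ci_lower:.3f} to {ci_upper:.3f})")
```

More precise studies (smaller standard errors) get more weight, and the pooled estimate has a smaller standard error than any single study in the set.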


Biostatistics I
This Lecture

Frequency Distribution
Measures of Central Location
Measures of Variance
Why Study Statistics?

As medical students / clinicians you are:

Researchers
Consumers of medical research
Statistics in Medical Research
The goal:
To design the process and extent of sampling in order to form valid and accurate inferences

To make inferences about a population by analyzing sample data

To make assessments of the extent of uncertainty in these inferences
With Statistics

We may find differences (variability) when we make comparisons

Real differences?
Due to chance?
Frequency Distribution

Ways to describe variation in clinical data:

- Numerical

- Pictorial
Always: values of the variables on horizontal axis and
their frequencies on the vertical axis
Frequency Distribution; Graphical Display: Pie Chart
Frequency Distribution; Graphical Display: Frequency Polygon
The midpoints of the tops of the histogram bars are plotted and connected with straight lines.
This makes it easier to put two or more sets of data on the same graph
Shape of distributions
Properties of frequency
distribution:

- Shape of frequency distribution

- Central Location or Central Tendency

- Variation or Dispersion
Shape of distribution
Symmetric
- Normal distribution (Gaussian Curve)

Skewed
- Tail to the right: positively skewed
- Tail to the left: negatively skewed
Symmetric Distribution
[Histogram: RBC cholinesterase (mmol/min/ml); x-axis: cholinesterase levels in class intervals 5.95-7.95 through 13.95-15.95; y-axis: frequency]

Negatively skewed
[Histogram: RBC cholinesterase; x-axis: class intervals 5.95-7.95 through 15.95-17.95; y-axis: frequency]

Positively skewed
[Histogram: RBC cholinesterase; x-axis: class intervals 5.95-7.95 through 15.95-17.95; y-axis: frequency]
Normal distribution
Symmetric (bell-shaped) curve
Measures of central tendency

Mean (Arithmetic mean)


Median
Mode
Mean = X-bar
Arithmetic average = (X1 + X2 + ... + Xn) / n = sum of the observed measurements / number of observations

The arithmetic center of the distribution

It gives the average for quantitative variables with a somewhat symmetric distribution
Most commonly used, but it is sensitive to extreme values or outliers
Measures of central tendency
Median

The measurement below which half (50%) of the measurements fall & above which the other half fall = 50th percentile

e.g. the length of hospital stay for nine patients: 1, 1, 3, 4, 8, 9, 12, 13, 15
The median is the middle number = 8

Measures of central tendency
Median
What if the data were: 1, 3, 4, 8, 9, 12, 13, 15?

Median = (8+9)/2 = 8.5
(the average of the 2 middle numbers for an even number of observations)

Measures of central tendency
Mode

The most frequently occurring observation.
If more than one value occurs most frequently, the distribution can be bimodal or multi-modal

e.g. for values 1, 4, 3, 1, 2 the mode is 1

For values 2, 4, 2, 3, 1, 5, 1 the distribution is bimodal, as 1 & 2 occur most often
Measures of central tendency
Mode..
What if the data was:

1, 4, 6, 3, 2, 7, 9, 11, 5, 10, 8?

There is no mode for this distribution


Measures of central tendency
Positive skew (Tail to right)

Mean is greater than median; the mean is always pulled towards the tail

Negative skew (Tail to left)

Median is greater than mean


Determine the average length of stay
for six patients undergoing classic cholecystectomy.

The length of stay in days for each patient is 1, 3, 2, 2, 4, 5

Question:
Calculate mean, median and mode

Measures of central tendency


The average length of stay for six patients undergoing cholecystectomy. The
length of stay in days for each patient is
1, 3, 2, 2, 4, 5

a) Mean = (1+2+2+3+4+5)/6 = 17/6 = 2.83

b) Median = 2.5 (the average of the 2 middle numbers for an


even number of observations)

c) Mode = 2
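The same three answers can be checked in a few lines. This is a minimal sketch using Python's standard `statistics` module; the variable names are illustrative only.

```python
import statistics

# Length of stay (days) for the six cholecystectomy patients in the example
length_of_stay = [1, 3, 2, 2, 4, 5]

mean_stay = statistics.mean(length_of_stay)      # sum / count
median_stay = statistics.median(length_of_stay)  # averages the two middle values when n is even
modes = statistics.multimode(length_of_stay)     # list of all most-frequent values

print(f"mean = {mean_stay:.2f}, median = {median_stay}, mode(s) = {modes}")
```

`multimode` returns every value that ties for most frequent, so a bimodal set such as 2, 4, 2, 3, 1, 5, 1 gives back both 2 and 1. When every value occurs once, it returns all of them, which corresponds to "no mode".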
Measures of central tendency
Variation or Dispersion
Properties of frequency
distribution:

- Shape of frequency distribution

- Central Location or Central Tendency

- Variation or Dispersion
Measures of Spread or Variation
RANGE

PERCENTILES AND QUARTILES

VARIANCE

STANDARD DEVIATION
Range:
-Arrange the data in ascending order
-Find out the maximum and minimum values
- Range = maximum value - minimum value

Percentile: measure that tells us what percent of total ppl


scored below a given score
percentile rank = percentage of scores that fall below a given score.
Nth percentile = observation that has n% of the
values below it.
Measures of Variation
Variance - average of the squared differences
from the mean
difficult to interpret because it is in the
units of the variable squared

Standard deviation - square root of the


variance; summary of dispersion around the
mean
same units as the variable of interest

Measures of Variation
Standard deviation
Measure of absolute variation in a given data
set, and a supplement to the mean

Large SD : observations are widely spread out

Small SD: observations are closely centered


around mean

Measures of Variation
Standard Deviation

The positive square root of variance

s² = Σ(xi - x̄)² / (n - 1)
s = √(s²)
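As a rough illustration of the formula, the sketch below computes the sample variance by hand and compares it with the standard library. It reuses the length-of-stay data from the earlier mean/median/mode question; the function name is hypothetical.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((x - mean) ** 2 for x in values) / (n - 1)

stays = [1, 3, 2, 2, 4, 5]  # length-of-stay data from the earlier example
s2 = sample_variance(stays)
s = math.sqrt(s2)  # standard deviation is back in the original units (days)

# The standard library uses the same n - 1 definition
assert math.isclose(s2, statistics.variance(stays))
print(f"variance = {s2:.2f} days^2, SD = {s:.2f} days")
```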

Measures of Variation
[Figure: histogram of BIRTHWEIGHT (grams) against frequency; Mean = 3367.2, Std. Dev = 623.36, N = 9747]
Statistics: BIRTHWEIGHT

N (valid)        9747
N (missing)      0
Mean             3367.19
Median           3405.00
Mode             3430
Std. Deviation   623.36
Variance         388574.54
Minimum          312
Maximum          6605
25th percentile  3061.00
50th percentile  3405.00
75th percentile  3749.00
Birth weight
Birth weight is approximately normally
distributed (bell-shaped curve)

Mean and median are close

68% of values are within 1 standard deviation of the mean (between 2744 and 3990 grams)

95% are within 2 standard deviations (between about 2120 and 4614 grams)
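The limits quoted here follow directly from the reported mean and standard deviation. A minimal sketch, assuming only the two summary figures from the BIRTHWEIGHT statistics output (the helper name is illustrative):

```python
# Summary statistics reported for BIRTHWEIGHT
mean_bw = 3367.19  # grams
sd_bw = 623.36     # grams

def sd_range(mean, sd, k):
    """Return (lower, upper) limits k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd

low1, high1 = sd_range(mean_bw, sd_bw, 1)
low2, high2 = sd_range(mean_bw, sd_bw, 2)
print(f"~68% of birth weights: {low1:.0f} to {high1:.0f} g")
print(f"~95% of birth weights: {low2:.0f} to {high2:.0f} g")
```

Small differences of a gram or so from the figures on the slide come from rounding the mean and SD before adding them.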
It has been found that many human biological
characteristics conform to a normal distribution
closely enough for it to be commonly used.

For example, heights of adult men and women,


blood pressures in a healthy population, and
many other types of laboratory measurements
and biochemical data.

Normal Distribution
Blood pressure
Why do we need to know this?
If a particular data shows a normal distribution, we
can apply the specific characteristics of normal
distribution to it

Helps us in deciding normal range (mean ± 2SD)


Enables us in comparing different populations
This will help us in testing a hypothesis

Normal Distribution
What is normal distribution?
It is a theoretically perfect frequency polygon which:

1.Takes the form of a bell shaped curve

2. Is symmetric

3.In which mean, median and mode coincide in


the center
Normal Distribution
In any normal curve, a constant proportion
of cases fall with 1, 2 or 3 standard deviations
of mean:

Within 1 SD: 68%


Within 2 SD: 95% = Normal Range
Within 3 SD: 99.7%
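These proportions can be reproduced from the normal curve itself. The sketch below uses the error function from Python's `math` module; the helper name is an assumption made for illustration.

```python
import math

def proportion_within(k):
    """Proportion of a normal distribution lying within k SD of the mean."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {proportion_within(k):.1%}")
```

The 2 SD figure is slightly above 95%; exactly 95% corresponds to 1.96 SD, which is why 1.96 appears as the confidence coefficient for a 95% CI.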

Normal Distribution
Biostatistics II
The Normal Distribution
Questions

In a normal distribution curve, what percentage of cases
lie more than 2 SD above the mean? 2.5%

In a normal distribution curve, what percentage of cases
lie above the point 2 SD below the mean? 97.5%
The scores of a single student in two different
tests are given below. On which test did she do
better in the class?

         Score   Mean   Std. Deviation
Test A    45      30        5
Test B    60      40       10
MCQ
The distribution of factor X in a population of
men ages 20-40 follows a multi-modal
distribution with mean of 20 mg/dl and
standard deviation 2 mg/dl
95% of men will have Factor X levels between
ranges
A. 16-24
B. 18-22
C. 15-25
D. Can not be calculated
Bimodal distribution of height

= 2 modes
Statistical Inference

Generalizations from a sample to the


population as a whole
Inferential Statistics is about
Samples and Populations
For obvious reasons studies are not carried out on
entire populations

Sample populations are used to test hypotheses

Inferences are made about the total population


from the data obtained from the sample

The analysis of data of a sample includes


significance testing
Samples and Populations
underlying premise when conducting a sample study is that
participants are representative of general population

If we assume that sample is random, then mean of sample
(x̄) should approximate mean of population (μ). However, if
sample is small or it's not random, there can be a large
difference btwn x̄ & μ (sampling error)

To tell how close is the sample mean value to population


mean value, we use standard error of mean & confidence
interval
Standard Error of the Mean
SEM = SD / √n
a measure of the variability of sample means about
the true population mean
(=the precision of the sample mean;
= the quality of the sample)

Larger size of study = more confidence we have in


sample mean, & smaller the standard error of mean
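To see how the standard error shrinks as the study gets larger, here is a small sketch. The standard deviation of 10 is a hypothetical value chosen only for illustration.

```python
import math

def standard_error(sd, n):
    """Standard error of the mean: SD divided by the square root of n."""
    return sd / math.sqrt(n)

sd = 10.0  # hypothetical standard deviation, chosen only for illustration
for n in (25, 100, 400):
    print(f"n = {n:>3}: SEM = {standard_error(sd, n):.2f}")
```

Because n sits under a square root, quadrupling the sample size only halves the SEM.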
Confidence Interval of Mean
Admitting that any measurement from a sample is
only an estimate of the population:
A confidence Interval specifies how close
our sample based value lies to population value
= the true value

and it gives the range of these values

Confidence Intervals specify how confident we are


To estimate the limits within which the true pop mean
lies and to specify how confident we are of those
limits
To calculate the CI for the population mean

CI = x̄ ± (confidence coefficient × standard error of the mean)

Where confidence coefficient (Z score) for:
90% CI is 1.64
95% CI is 1.96 (for calculation use Z score of 2.0 for CI of 95%)
99% CI is 2.58

And SEM = SD / √n
Example
Length of stay in a hospital for 5 patients; we want to calculate the 95% CI

Patient   Length of Stay
1         3
2         5
3         2
4         3
5         2
Patient   Length of Stay (xi)   Deviation (xi - x̄)   (xi - x̄)²
1         3                     3 - 3 = 0             0
2         5                     5 - 3 = 2             4
3         2                     2 - 3 = -1            1
4         3                     3 - 3 = 0             0
5         2                     2 - 3 = -1            1

Mean = 3

CI = Mean ± (Z score × standard error of the mean)

s² = Σ(xi - x̄)² / (n - 1) = 6/4 = 1.5;  s = √1.5 = 1.22
SEM = s / √n = 1.22 / 2.24 = 0.54  (√5 = 2.24)

95% confidence interval of the mean = 3 ± (2 × 0.54) = (1.92 to 4.08)

99% CI = 1.60 to 4.39. Higher confidence level = wider interval

Confidence Interval

A 95% Confidence Interval means you are


95% sure (confident) the true value is
between the range of the CI

99% CI is wider than 95% CI


Confidence Interval
The wider the CI, the greater the variability
in the estimate of the effect

The larger the sample, the more precise the


estimate will be and the narrower the CI
e.g. Prevalence rate of DM in people aged 46-64 years

95% CI in a survey of 90:                  42.8 - 61.0 per 1000
95% CI in a sample 4x bigger:              47.2 - 56.6 per 1000
95% CI in a sample one quarter the size:   33.8 - 70.0 per 1000
What is a Z-score?

Z score = score for normal distribution =


confidence coefficient = standard score
It has a mean of 0
It is represented in standard deviation units
Z-score of 2 is 2 SD above the mean,
Z-score -1.5 is 1.5 SD below the mean

To convert a score to Z-score


Z = (x - x̄) / s
Question
Assume that national mean weight of females is
120 pounds, and the std deviation is 6 pounds.
If Mary weighs 112 pounds, what is Mary's Z
score?

Applying Z = (x - x̄) / s
= (112 - 120) / 6 = -1.33
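The conversion is a one-line function. The sketch below applies it to Mary's weight and also to the earlier question about one student's scores on Test A and Test B; the function name is illustrative.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mary = z_score(112, 120, 6)
test_a = z_score(45, 30, 5)
test_b = z_score(60, 40, 10)
print(f"Mary: {mary:.2f}; Test A: {test_a:.1f}; Test B: {test_b:.1f}")
```

The student's raw score is higher on Test B, but her Z score is higher on Test A, so relative to the class she did better on Test A.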
Samples and Populations

Standard Deviation = variability of observations

Standard Error of mean = variability of sample


means about true population mean

95% Confidence Interval of mean = range of
values for pop mean in which you're 95% sure
the true pop mean falls
Statistics I

Continuous Variables with symmetrical distribution require


information on: Shape, Central Location and Variation

Shape of Frequency Distribution:


Normal symmetrical spread or Skewed

Measures of Central Location:


Mean=average, Median=middle, Mode=most common
Statistics I
Measures of Variation:
Range, Percentiles, Standard Deviation(=square root of
Variance)
Normal Distribution (Bell shaped):
1SD=68%, 2SD=95%, 3SD=99.7%

Normal Range :
Mean +/- 2SD
Statistics I
Both SEM and confidence interval indicate how
precise (or imprecise) our estimate is.
The standard error of the mean, is based on variability
in data (the standard deviation) and the size of the
sample
SEM = SD / √n

CI: is based on the SEM and defines the interval within


which the true magnitude of effect is likely to fall.
Most used CI 95%
CI = x̄ ± (confidence coefficient × standard error of the mean)
Statistical Notation

s = sample standard deviation
s² = sample variance
x̄ = sample mean
n = number of observations
σ = population standard deviation
σ² = population variance
μ = population mean
Biostatistics III
VARIABLES

SAMPLING

STATISTICAL SIGNIFICANCE

ERRORS

CHI SQUARE TEST

T TEST
VARIABLES

Medical research is simply the study of


variables and their relationships

When we study a single variable: UNIVARIATE

When more than one: MULTIVARIATE
TYPES OF VARIABLE SCALES
Categorical
Variable is categorized as one of two or more
alternatives;
* Nominal
* Ordinal
Numerical
* Discrete
* Continuous
TYPES OF VARIABLE SCALES
NOMINAL: name; no order
e.g. blood types, gender, race
ORDINAL: order; limited categories
e.g. SES, cancer staging

DISCRETE: count number, whole #s, no decimals
e.g. number of pregnancies, number of decayed, filled
or missing teeth, # pts

CONTINUOUS: interval/ratio; continuous with decimals
e.g. body temperature, BP, length, age
Data Collection Techniques
Door to door interviews
Telephone interviews / random digit dialing
Questionnaires by mail
Voter registration or motor registration lists
Patient lists from ERs
Community survey
Volunteer participation in study
SAMPLING
All studies rely on the fact that the sample is
representative of the population

If the sample is not representative, then the


study is flawed and the data are worthless

To ensure representation the sample must be


chosen randomly
WHAT IS A RANDOM
SAMPLE?
A random sample is one in which each member
of the population has an equal chance of being
selected

The sample should be representative of the


study population
SAMPLING TYPES
SIMPLE RANDOM
STRATIFIED
SYSTEMATIC
MULTISTAGE
SNOWBALL
CONVENIENCE
Statistical significance
When an association between 2 variables is seen, we
must ask - Is this true or due to chance?

The P value indicates the probability that the findings


observed could have occurred by chance alone.

The P value for statistical significance is usually set at


<0.05
We can use:
p value

Confidence interval
Statistical Significance
Confidence intervals and p-values are
used to demonstrate statistical significance
A p-value < 0.05 and a 95% CI that excludes the null value
both indicate that a result as extreme as the one
obtained would be expected to occur by
chance less than 5% of the time,
i.e. you can assume that the result is
unlikely to have occurred by chance
How to use CI around Relative Risk
or Odds Ratio as a measure of
statistical significance?
Testing for statistical difference
using CI
Look at the CI carefully

See if the interval contains 1 in it.

If it does, it is not statistically significant

If it does not include 1, it is significant
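The rule above can be written as a simple check. This is only a sketch: the function name is made up, and the intervals passed to it are hypothetical values for illustration, not results from any study.

```python
def is_significant(ci_lower, ci_upper, null_value=1.0):
    """A ratio measure is significant when its CI excludes the null value of 1."""
    return not (ci_lower <= null_value <= ci_upper)

# Hypothetical intervals, for illustration only
print(is_significant(1.3, 2.9))  # interval excludes 1
print(is_significant(0.8, 2.1))  # interval includes 1
```

The null value is 1 because a relative risk or odds ratio of 1 means no difference between the groups.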


Statistical Differences

Analysis of outcomes can be done using the


Relative Risk with either p-values or confidence
intervals

Characteristics of groups can be compared by


comparing proportions (chi-square) or means (t-
test)
Hypothesis
Definition:
Statement based on inference, existing literature, or
preliminary studies postulating that a difference exists
between two groups.

The possibility that this difference occurred by chance is


tested using statistical procedures.

Types of hypothesis:
- Null hypothesis (H0)
- Alternative hypothesis (Ha)
Hypothesis Testing

(H0) Null hypothesis states there is no difference between


characteristics of groups or outcomes, e.g. no relationship
between exposure and disease.
No statistical significance = reject alternative hypothesis & accept
null hypothesis
(Ha) Alternative hypothesis states that there is a difference
between characteristics of groups or outcomes e.g. there is a
relationship between the exposure and the disease
Statistical significance = reject null hypothesis & accept alternative
hypothesis
Test Statistic
To test the hypothesis, we collect a sample & compute a test
statistic, such as a sample mean or sample proportion.

The way we compute the test statistic, depends on sample size, type
of variables , & sometimes shape of population distribution.
Type of variable

When comparing proportions: chi-square test

When comparing means: t-test


When to use a Chi-square test?

Chi-square test can be used to compare two


proportions (categorical data).
When to use t-test?
You want to know whether a treatment has an effect, or whether
samples differ, when the data are interval/ratio
(i.e. when you are comparing means (averages)
between groups)

If you have 2 groups, use t-test

If you have more than 2 groups, use ANOVA (analysis


of variance)
Independent (non paired) t- test: tests mean
difference in body weights of subjects in group
A & group B at time 1 (i.e., two groups of
subjects are sampled on one occasion).
2 groups; 1 time

Dependent (paired) t- test: tests mean difference in


body weights of ppl in group A at Time 1 &
Time 2 (i.e., same sample people are sampled on
two occasions).
1 group; 2 times
Errors
Type I error (α error): usually preset at 0.05; rejecting null hypothesis
when it's true
Thus assuming there is a significant association when there is none
Prefixed value; how much error can we allow; aka significance level (this is where the
p value threshold comes from, & why p should be less than 0.05)

Type II error (β error): accepting null hypothesis when it's false
Thus assuming there is no association when in fact there was

Power of the test = 1 - β

How can you increase power? By increasing sample size
Smoking and Birth weight

t-test

Is there a difference in the mean birth weight


of infants born to women who smoked
during pregnancy, compared with infants
born to women who did not smoke during
pregnancy?
Smoking and Birth weight

Null hypothesis: The mean birth weight of infants


born to women who smoke during pregnancy is
the same as among those who do not.

Alternative hypothesis: The mean birth weight of


infants born to women who smoke during
pregnancy is different from those who do not.
SMOKER             Mean      Variance     Std Dev
1 = smokers        3177.96   434245.66    658.97
2 = non smokers    3417.27   385266.11    620.70
Difference         -239.31

p-value    t-value
0.00001    15.890287
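For readers who want to see where a t-value of this size comes from, here is a sketch of a two-sample t statistic with a pooled variance. Two things are assumptions, not facts from the output: the pooled-variance form of the test, and the group sizes (2279 smokers, 7476 non-smokers), which are borrowed from the LBW cross-tabulation later in these notes. With those assumptions the result lands close to the reported value, but it is not expected to match exactly.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Two-sample t statistic using a pooled variance estimate."""
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    se_diff = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se_diff

# Means and variances are from the t-test output. The group sizes are an
# ASSUMPTION borrowed from the LBW cross-tabulation, not from this output.
t = pooled_t(3177.96, 434245.66, 2279, 3417.27, 385266.11, 7476)
print(f"t = {t:.2f}")
```

The sign is negative only because smokers are listed first; the size of the statistic is what matters for the p-value.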
Smoking and Birth weight
Mean birth weights:
born to non-smokers is 3417 grams (~ 7 lbs. 8 oz.)

born to smokers is 3178 grams (~ 7 lbs. 0 oz.)

difference is 239 grams (~ 8 oz.)

Test statistic: t-statistic


Associated P value is 0.00001 (very small)

Reject the null hypothesis

Result is statistically significant


When to use a t-test?

t-test is used to compare two means.

Birth weight - continuous with normal


(bell-shaped) distribution
One-tailed vs. two-tailed tests
A one-tailed test is used when we predict direction
of the difference in advance (e.g. one mean will be
larger than the other).
In standard testing, probability is calculated from
both tails. Thus, p-value from a two-tailed test is
twice the p-value of a one-tailed test.
It is rarely correct to perform a one-tailed test;
usually we want to test whether any difference
exists. So a two-tailed test is usually the better choice.
Smoking and LBW

Chi-Square test
Scenario: As clinicians, knowing that LBW is
associated with increased morbidity and mortality, we
are concerned that pregnant patients who smoke are
more likely to deliver a low birth weight (LBW) infant.

Is there a difference in the proportion of LBW infants


born to women who smoked during pregnancy,
compared with infants born to women who did not
smoke during pregnancy?
Smoking and LBW
Smoking -- any smoking during pregnancy
Variable - SMOKER:
EQUAL TO 1 IF SMOKED DURING PREGNANCY
EQUAL TO 2 IF DID NOT SMOKE DURING
PREGNANCY

Low birthweight (LBW): birth weight of less than 2500 grams (~ 5 lbs. 8 oz.)
Variable - LOW BIRTHWEIGHT:
EQUAL TO 1 IF BIRTHWEIGHT < 2500 GRAMS
EQUAL TO 2 IF BIRTHWEIGHT ≥ 2500 GRAMS
Low Birthweight
SMOKER      |      1        2   |  Total
------------+-------------------+-------
1           |    242     2037   |   2279
  (row %)   |  10.6%    89.4%   |  23.4%
  (col %)   |  35.6%    22.4%   |
2           |    437     7039   |   7476
  (row %)   |   5.8%    94.2%   |  76.6%
  (col %)   |  64.4%    77.6%   |
------------+-------------------+-------
Total       |    679     9076   |   9755
            |   7.0%    93.0%   |

Chi-Squares        Value    P-value
-----------------  -----    --------
Uncorrected:       61.45    0.000001
Mantel-Haenszel:   61.44    0.000001
Yates corrected:   60.71    0.000001
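The chi-square values in this output can be recomputed from the four cell counts. The sketch below uses the standard shortcut formula for a 2x2 table, n(ad - bc)² divided by the product of the row and column totals; the function name is illustrative.

```python
def chi_square_2x2(a, b, c, d, yates=False):
    """Chi-square statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    diff = abs(a * d - b * c)
    if yates:
        diff = max(diff - n / 2, 0)  # Yates continuity correction
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    return n * diff ** 2 / denominator

# Counts from the table: smokers (LBW, not LBW), non-smokers (LBW, not LBW)
uncorrected = chi_square_2x2(242, 2037, 437, 7039)
corrected = chi_square_2x2(242, 2037, 437, 7039, yates=True)
print(f"uncorrected = {uncorrected:.2f}, Yates corrected = {corrected:.2f}")
```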
Smoking and LBW

Null hypothesis: The proportion of low birth


weight infants born to women who smoke during
pregnancy is the same as among those who do not
(No association between smoking and LBW)

Alternative hypothesis: The proportion of low


birth weight infants born to women who smoke is
different from those who do not
Smoking and LBW
Proportions of LBW infants:
born to smokers is 10.6%
born to non-smokers is 5.8%

Test statistic: Chi-square


Associated P value is 0.000001 (very small)
Reject the null hypothesis of no association
Result is statistically significant which means
there is an association between smoking and
LBW
example
Analysis of the data from a research study
designed to examine the hypothesis that
estrogen replacement therapy is associated with
an increased risk for breast cancer reveals a p-
value of <0.01
Is this a statistically significant result? Can the
researcher reject the null hypothesis?
Investigation of
Disease Outbreaks
What is an Outbreak

Definition:
The occurrence of cases of an illness
in excess of expectancy
Identify the existence of
the outbreak

Is the group of ill persons normal for the time


of year, geographic area, etc.?
(background information on disease
occurrence)
Epidemic Curves
Visual display of an epidemic's magnitude
and course

# cases by time of onset


Shape of the curve gives you clues
Epidemic Curves:
Point Source Outbreak

[Figure: epidemic curve; x-axis: Date (9 to 29), y-axis: No. of cases (1 to 10)]
Propagated or
Person-person Outbreak

[Figure: epidemic curve; x-axis: day (1 to 28), y-axis: # cases (0 to 35)]
Continuous Source Outbreak

[Figure: epidemic curve; x-axis: Date (9 to 29), y-axis: No. of cases (1 to 10)]
Initial assessment / action

Most important: Confirm the diagnosis

Is occurrence outside of normal expectation?


Review case histories from CDC or Health Department.
Communicate with local doctors / health workers involved
Interview and/or examine several cases
Discuss tests / quality of specimens with laboratory
involved
Read the literature, consult experts
Know the clinical presentation and spectrum
Descriptive epidemiology

Case definition and identification


Data collection
Synthesis: generate hypothesis
Descriptive epidemiology

An example of case definition


All children in grade 3 of a local school who took
part in the field trip on November 20th, and who fell
ill with vomiting and/or diarrhea between the
evening of the 20th and the evening of the 21st
November.
Descriptive epidemiology

Data collection
Time
incl. epidemic curve
Place
incl. place of residence, work, travel etc.
Person
age and sex
other demographic info
clinical symptoms
laboratory results

Remember:
use standard format for data collection
organize your data; keep track of all cases
Questionnaires and forms
Example of questionnaire for foodborne disease
- Person ID: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- Age: . . . . . . . years
- Sex: M F
- Ill: Y N
IF ILL: Start of symptoms: . . . . / . . . . / . . . . (date) . . . . . . (time)
- Fever Y N - Abdominal cramps Y N
- Nausea Y N - Diarrhoea Y N
- Vomiting Y N - Bloody diarrhoea Y N
- Duration of symptoms: . . . . . . . . . . .
- Treatment: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- Hospitalization: Y N
- Outcome: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
- Lab tests: . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .
Questionnaires and forms
Example of questionnaire for foodborne disease (cont.)
Meal 1 Meal 2 Meal 3 Meal 4
Date/time Date/time Date/time Date/time
........ ........ ........ ........
Place of meal:
........ ........ ........ ........
Food items:
........ ........ ........ ........
........ ........ ........ ........
........ ........ ........ ........
........ ........ ........ ........
Generate hypothesis
About
pathogen
route of transmission
Vector/vehicle
From
Symptoms and incubation period
Epidemic curve and place distribution
Age and sex distribution
Review all available data
Past experience, search the literature
Analytical epidemiology

Retrospective cohort study


Used when the population/group of people is
closed.
Starting point: exposure status (exposed -
unexposed)
Calculate attack rates and risk ratios: Ie/Io
Test for statistical significance
Calculate confidence interval of risk ratio
Look for possible confounding
Analytical epidemiology

Case-control study
Starting point: disease status (ill - not ill)
Find odds of exposure of cases and non-cases
Calculate odds ratios: ad/bc
Test for statistical significance
Calculate confidence interval of odds ratio
Look for possible confounding
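Both measures come from the same 2x2 table. The sketch below is illustrative only: the function names are made up, and the counts reuse the smoking and low birth weight table from the chi-square example (smoking as the exposure, LBW as the outcome) rather than data from an outbreak.

```python
def risk_ratio(a, b, c, d):
    """Attack rate in exposed (Ie) divided by attack rate in unexposed (Io)."""
    attack_rate_exposed = a / (a + b)
    attack_rate_unexposed = c / (c + d)
    return attack_rate_exposed / attack_rate_unexposed

def odds_ratio(a, b, c, d):
    """Cross-product ratio ad/bc."""
    return (a * d) / (b * c)

# 2x2 layout: a = exposed & ill, b = exposed & not ill,
#             c = unexposed & ill, d = unexposed & not ill.
# Counts reuse the smoking / LBW table from the chi-square example.
rr = risk_ratio(242, 2037, 437, 7039)
odds = odds_ratio(242, 2037, 437, 7039)
print(f"RR = {rr:.2f}, OR = {odds:.2f}")
```

The two values are close here because the outcome is fairly uncommon in both groups.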
Environmental inspection
Environmental inspection
Get info on usual practices
Inspect premises and practices
Take samples
Get info on food storage and handling (cold chain,
hot chain)
Get info on personnel (disease history?)
Ask for maps and plans
Need for specialist health inspector
Have your eyes open for clues from unexpected
sources
Control Measures

Remove source
Isolate / treat cases
Destroy food, recall products
Stop production, close premises
Ensure good practice procedures
Protect persons at risk
General hygiene
Vaccination
Other prophylaxis
Initial assessment/action

Is further investigation necessary ?

Do cases continue to occur?


Is it a serious event?
Is the cause unclear?
Is there a risk for recurrence?
Preventive Measures

Make recommendations
Produce guidelines
Make proposals for change in law
Communication

During the outbreak


Information for the public and the media
Information for professionals
Regular and consistent updates
After the end of the investigation
Produce a report for officials, parties involved,
general public, media
Write up for scientific publication
12 Steps of Outbreak Investigation
Preparations

1. Prepare for fieldwork


Identify Team and resources
Research the disease
Make administrative arrangements
Clarify your role
2. Establish the existence of an outbreak
Does the observed number of cases exceed the
expected number? Of course you need to know the
disease before you can do that.
3. Verify the diagnosis
Speak directly with persons who are affected
12 Steps of Outbreak Investigation
Define and Describe

4. Define cases
Establish a case definition
5. Identify cases
Identify and count cases by Line listing
6. Describe and orient the data in terms of
time, place and person
Outbreak curve
Map
Identify demographic and other characteristics of
persons at risk
12 Steps of Outbreak Investigation
Analyze

7. Develop hypotheses
Open-ended and wide-ranging interviews with a few
people
8. Evaluate hypotheses
Comparison: hypotheses with established facts
Analytic epidemiology
Cohort studies (RR; 95% CI)

Case-control studies (OR; 95% CI)

9. Refine hypotheses and carry out additional studies


12 Steps of Outbreak Investigation
Finalize

10. Implement control and prevention measures


Should occur as soon as info available
Make recommendations and produce guidelines
Make proposals for change in law
11. Communicate findings
Summarize investigation for requesting authority
Produce written report
12. Maintain surveillance to monitor trends and evaluate
control/preventive measures
Number needed to treat (NNT)
Number Needed to Treat (NNT) = # of pts you need to
treat to prevent 1 additional bad outcome (death, stroke,
etc.). Ex: if drug has NNT of 5, means you have to treat 5
people w/drug to prevent one additional bad outcome;
Lower the value, the better it is

To calculate the NNT, you need to know Absolute Risk


Reduction (ARR); the NNT is the inverse of ARR:

NNT = 1/ARR

Where ARR (absolute risk reduction) = CER (Control Event Rate) - EER (Experimental Event Rate).
Example
The ARR is therefore the amount by which your therapy
reduces the risk of the bad outcome. For example, if your
drug reduces the risk of a bad outcome from 50 per cent
to 30 per cent, the ARR is:

ARR = CER - EER = 0.5 - 0.3 = 0.2 (20 per cent)

NNT = 1/ARR = 1/0.2 = 5


Example
A well-designed randomized controlled trial in children
with a particular disease found that 20 per cent of the
control group developed bad outcomes, compared with
only 12 per cent of those receiving treatment. Calculate
number needed to treat.

Answer:
ARR = 0.20 - 0.12 = 0.08
NNT = 1/0.08 = 12.5, rounded up to 13
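Both NNT examples can be checked with a short function. This is a sketch with an illustrative function name; it rounds up to the next whole patient, which is the usual convention and matches the answer of 13.

```python
import math
from fractions import Fraction

def number_needed_to_treat(cer, eer):
    """NNT = 1 / ARR, where ARR = CER - EER; rounded up to a whole patient."""
    arr = cer - eer
    if arr <= 0:
        raise ValueError("treatment does not reduce risk, so NNT is undefined")
    return math.ceil(1 / arr)

# Fractions keep the percentages exact and avoid floating-point rounding
print(number_needed_to_treat(Fraction(50, 100), Fraction(30, 100)))  # first example
print(number_needed_to_treat(Fraction(20, 100), Fraction(12, 100)))  # trial in children
```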
Number needed to harm (NNH)
= # of people who must be exposed to something in order for one of them to experience
an adverse effect
HIGHER VALUE = BETTER (more pts must be exposed before one has an adverse effect)

NNH = 1 / Absolute risk increase

Absolute risk increase = Experimental event rate - Control event rate

55 out of 75 people died due to usage of an experimental drug. Among 75 people who
took placebo only 35 of them died. What is the number needed to harm?

NNH = 1 / ((55/75) - (35/75)) = 1 / (0.73 - 0.47) = 1/0.26 = ~3.8

NNH = 4
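The same calculation as a sketch, working from the raw counts with exact fractions; the function name is illustrative. Without rounding the intermediate rates the value is 3.75 rather than about 3.8, and either way it rounds to 4.

```python
from fractions import Fraction

def number_needed_to_harm(events_exposed, n_exposed, events_control, n_control):
    """NNH = 1 / absolute risk increase (experimental rate minus control rate)."""
    ari = Fraction(events_exposed, n_exposed) - Fraction(events_control, n_control)
    if ari <= 0:
        raise ValueError("no excess risk in the exposed group, so NNH is undefined")
    return 1 / ari

nnh = number_needed_to_harm(55, 75, 35, 75)
print(float(nnh))  # exact value before rounding to a whole person
```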
CLINICAL PROBABILITY

The probability of an event can be expressed as a


ratio of the number of likely outcomes to the
number of possible outcomes

The probability of an event is denoted by P


Probabilities are usually expressed as decimal
fractions, not as percentages, and must lie btwn 0
(zero probability) & 1 (absolute certainty)
Probability

Methods of calculating probability:

The multiplication rule


The addition rule
Multiplication rule
The multiplication rule of probability states that the
probability of 2 or more independent events occurring
at the same time is equal to the product of their
individual probabilities

Example:
Chance of having brown hair is 0.3
Chance of getting a cold is 0.2
What is the chance of meeting a brown-haired person
with a cold?
0.3 × 0.2 = 0.06
Addition rule
The addition rule of probability states that the probability of
any one of several particular events occurring is
equal to the sum of their individual probabilities,
provided the events are mutually exclusive (i.e. they
cannot happen at the same time)

Example: Deck of cards


The probability of picking a heart card in a deck is
0.25, The probability of picking a diamond card in a
deck is 0.25. what is the probability of picking a
heart or a diamond card?

0.25+0.25 = 0.5
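The two rules reduce to a product and a sum. A minimal sketch with illustrative function names, using the hair/cold and card examples:

```python
def prob_all(*probabilities):
    """Multiplication rule: independent events all occurring together."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result

def prob_any(*probabilities):
    """Addition rule: any one of several mutually exclusive events occurring."""
    return sum(probabilities)

brown_hair_and_cold = prob_all(0.3, 0.2)
heart_or_diamond = prob_any(0.25, 0.25)
print(f"{brown_hair_and_cold:.2f}, {heart_or_diamond:.2f}")
```

The functions do not check their preconditions: `prob_all` is only valid for independent events and `prob_any` only for mutually exclusive ones.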
Chi-square value and statistical
significance
Suppose the calculated value was 4.25
Degree of freedom was 1

Look at chi square table, find the value of p

P < 0.05
Chi-square value
If the calculated value was 2.5
Degree of freedom was 1

Look at chi square table, find the value of p

P > 0.05 (not statistically significant)
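Instead of a printed table, the p-value for 1 degree of freedom can be computed directly. The sketch below relies on the fact that a chi-square variable with 1 degree of freedom is the square of a standard normal variable, so it applies only to df = 1; the function name is illustrative.

```python
import math

def chi_square_p_value_df1(chi_square):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi_square / 2))

for value in (4.25, 2.5):
    p = chi_square_p_value_df1(value)
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"chi-square = {value}: p = {p:.3f} ({verdict})")
```

The cut-off between the two cases is the table value of about 3.84, which corresponds to p = 0.05 at 1 degree of freedom.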


t- value and statistical significance
t- value = 2.35
Degree of freedom = 24
P value? Between 0.05 & 0.01: statistically significant
