
Scientific Research Methodology

Data Collection
Measurement Strategies

Corin Badiu, 2007

Objectives
Establish the types of variables

IT data sources
Identify and locate the correct data sets
Within the institution
Outside the institution

Measurement strategies
Analysis, interpretation and reporting of results

Select the appropriate program: Excel, SPSS
Use the programs for simple statistical analyses and
graphical presentation of the results
Interpret the results

Types of variables
Confounding variables*

Predictor*

Outcome

Effect modifiers*
*Generally regarded as exposure to risk factors

Types of clinical studies

Studies with no variables

Case studies, case series, editorials, opinions / commentaries, review reports

Studies with a single variable

Descriptive studies

Studies with 2 variables

Experiments
Observational studies
Meta-analyses and systematic reviews

Hierarchy of clinical study types

Clinical studies
Descriptive

Analytical

Experimental

Observational

Cohort

Case-control

Cross-sectional

Variables

Predictor variable
(independent)

Outcome variable
(dependent)

Evidence Based Medicine


Treatment methods supported by clinical and research evidence.
Requires integrating the best research evidence for diagnosis and treatment with clinical experience.
Takes into account what is optimal for each patient, as well as the patient's preferences.

The Reality
Research on treatment effectiveness is the subject of only a small number of articles.
Evidence based medicine is regarded as a concept that uses databases, including systematic case studies, to guide therapeutic interventions.
Evidence must be evaluated in a real therapeutic context: what type of intervention makes sense for me as a practitioner?

Clinical Questions
What is the best choice of therapy
for my patient?
Is this program theoretically sound?
Does this therapy program work?
How long will the therapy take?
Where do I go from here?

Practicing Clinicians Needed!


Clinicians
are on the front line
have necessary clinical expertise
know their patients well
are naturally scientific thinkers
are well-versed in data collection
know how to look for outcomes

Baseline Measurements
A baseline is a measure of response
rates in the absence of treatment
Baselines
Establish a need for treatment
Document improvement
Allow us to modify if we don't see
improvement

Baseline Data
Create a set of exemplars of each of your
targets and prepare a recording sheet.
Utilize criterion referenced measures.

Data Collection Strategies


Always have more than one measurement
Check the reliability of the baseline data
Select research/clinical design

Research/Clinical Designs
ABA designs
Test, treat and test
ABAB designs
Test, treat, test and treat
Time-Series designs
Establish stable baseline
Begin treatment
Measure treatment results
Multiple-Baseline designs
Have a number of different baselines
Each baseline must be independent of the others
Only treat one variable

Data Collection Instruments


Requirements

Reliable
Valid
Responsive
Universal
Unbiased

Data Collection Instrument


Is it reliable?
Will the instrument measure consistently across:
Different testing situations?
Test-retest reliability

Different judges?
Inter-rater reliability
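
One common way to put a number on inter-rater reliability is percent agreement, optionally corrected for chance with Cohen's kappa. The slides do not name a statistic, so kappa is an assumption here, and the two judges' ratings below are hypothetical. A minimal sketch:

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Share of items on which both raters gave the same label."""
    matches = sum(1 for x, y in zip(rater_a, rater_b) if x == y)
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Observed agreement corrected for the agreement expected by chance."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    labels = set(counts_a) | set(counts_b)
    # Chance agreement: product of each rater's marginal proportions per label.
    expected = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical ratings of the same 10 cases by two judges.
judge_1 = ["torn", "torn", "intact", "intact", "torn",
           "intact", "intact", "torn", "intact", "intact"]
judge_2 = ["torn", "intact", "intact", "intact", "torn",
           "intact", "torn", "torn", "intact", "intact"]

agreement = percent_agreement(judge_1, judge_2)
kappa = cohens_kappa(judge_1, judge_2)
print(f"agreement = {agreement:.2f}, kappa = {kappa:.2f}")
```

The same functions can be applied to test-retest data by treating the first and second testing sessions as the two "raters".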

Data Collection Instrument


Is it valid?
Is the instrument
being used to
measure the kind of
data for which it
was intended?

Data Collection Instrument


Is it responsive?
The instrument should be equally sensitive, whether a
characteristic is present or absent.
Must measure both:
False-negatives:
You thought it was intact, but it was torn.
False-positives:
You thought it was torn, but it was intact.

Data Collection Instrument


Is it universal?
The investigator should
employ a widely used data
collection instrument, which
helps minimize reporting bias
because the data can then be
compared with other published
literature.

Data Collection Instrument


Is it unbiased?
There should be no difference between the true value
and the value that an investigator actually obtains
other than a difference caused by sampling variability.

Sampling Methods

1. Random Sampling (Simple)
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Convenience Sampling
6. More complex sampling

Qualitative and Quantitative Variables

Examples of qualitative variables are occupation, sex, marital status, etc.
Variables that yield observations that can be measured are
considered to be quantitative variables. Examples of
quantitative variables are weight, height, and age.
Quantitative variables can further be classified as discrete or
continuous.

Variable types

1. Categorical variables (e.g., sex, marital status, income category)
2. Continuous variables (e.g., age, income, weight, height, time to achieve an outcome)
3. Discrete variables (e.g., number of children in a family)
4. Binary or dichotomous variables (e.g., response to all Yes or No type of questions)

Scale of Data
1. Nominal: These data do not represent an amount or quantity (e.g.,
Marital Status, Sex)
2. Ordinal: These data represent an ordered series of relationships (e.g.,
level of education)
3. Interval: These data are measured on an interval scale having equal
units but an arbitrary zero point (e.g., temperature in Fahrenheit)
4. Ratio: Variables such as weight, which have a true zero, so that we can compare
meaningfully one weight versus another (say, 100 kg is twice 50 kg)

Variables in the protocol


TYPES OF VARIABLE
independent
dependent
intermediate
confounding

Independent Variable
The characteristic being observed and/or
measured that is hypothesized to influence an
event or outcome (dependent variable).
NOTE
The independent variable is not influenced
by the event or outcome, but may cause it
or contribute to its variation.

Dependent Variable
A variable whose value is dependent on
the effect of other variables (i.e.,
independent variables) in the
relationship being studied. Synonyms:
outcome or response variable.
NOTE
an event or outcome whose variation we
seek to explain or account for by the
influence of independent variables.

Intermediate Variable
A variable that occurs in a causal pathway from
an independent to a dependent variable.
Synonyms: intervening, mediating
NOTES
it produces variation in the dependent
variable, and is caused to vary by the
independent variable.
such a variable is associated with both the
dependent and independent variables.

Confounding Variable
A factor (that is itself a determinant of the
outcome), that distorts the apparent effect of a
study variable on the outcome.
NOTE
such a factor may be unequally distributed
among the exposed and the unexposed, and
thereby influence the apparent magnitude and
even the direction of the effect.

Organizing Data
1. Frequency Table
2. Frequency Histogram
3. Relative Frequency Histogram
4. Frequency polygon
5. Relative Frequency polygon
6. Bar chart
7. Pie chart
8. Stem-and-leaf display
9. Box Plot

Frequency Table
Suppose we are interested in studying the number of
children in the families living in a community. The
following data has been collected based on a random
sample of n = 30 families from the community.
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6,
5, 4 , 2, 4, 4, 7, 6
Organize this data in a Frequency Table!

Frequency Table

X = No. of Children   Count (Freq.)   Relative Freq.
0                     2               2/30 = 0.067
1                     3               3/30 = 0.100
2                     5               5/30 = 0.167
3                     5               5/30 = 0.167
4                     6               6/30 = 0.200
5                     4               4/30 = 0.133
6                     2               2/30 = 0.067
7                     2               2/30 = 0.067
8                     1               1/30 = 0.033
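
The frequency table for the 30 families can be reproduced in a few lines. This is a minimal sketch using only the data listed on the slide; the function name `frequency_table` is an illustrative choice, not something from the lecture.

```python
from collections import Counter

# Number of children in the n = 30 sampled families (data from the slide).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
            5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

def frequency_table(values):
    """Return rows of (value, count, relative frequency), sorted by value."""
    n = len(values)
    counts = Counter(values)
    return [(x, counts[x], counts[x] / n) for x in sorted(counts)]

table = frequency_table(children)
for x, count, rel in table:
    print(f"{x:>2}  {count:>2}  {rel:.3f}")
```

The same approach works in Excel or SPSS with a pivot table or a Frequencies command; the Python version is shown only because it makes each step explicit.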

Frequency plot

Frequency Table
Now suppose we need to construct a similar frequency table for the
age of patients with Heart related problems in a clinic.
The following data has been collected based on a random sample of
n = 30 patients who went to the emergency room of the clinic for
Heart related problems.
The measurements are: 42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67,
53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69.

Frequency Table

Age Groups   Frequency   Relative Frequency
32-36.99     2           2/30 = 0.067
37-41.99     3           3/30 = 0.100
42-46.99     3           3/30 = 0.100
47-51.99     3           3/30 = 0.100
52-56.99     8           8/30 = 0.267
57-61.99     3           3/30 = 0.100
62-66.99     4           4/30 = 0.133
67-72        4           4/30 = 0.133
Total        n = 30      1.00

Measures of Central Tendency


Where is the heart of the distribution?
1. Mean
2. Median
3. Mode
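
As a quick illustration, the three measures can be computed for the number-of-children data from the frequency table example with Python's standard `statistics` module. A minimal sketch:

```python
import statistics

# Number of children in the n = 30 sampled families (data from the earlier slide).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0,
            5, 8, 6, 5, 4, 2, 4, 4, 7, 6]

mean = statistics.mean(children)      # arithmetic average
median = statistics.median(children)  # middle value of the sorted data
mode = statistics.mode(children)      # most frequent value

print(f"mean = {mean:.2f}, median = {median}, mode = {mode}")
```

Here the three measures are close to each other, which suggests a roughly symmetric distribution; strongly skewed data would pull the mean away from the median.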

Empirical Rule
For a Normal distribution approximately,
a) 68% of the measurements fall within one standard
deviation around the mean
b) 95% of the measurements fall within two standard
deviations around the mean
c) 99.7% of the measurements fall within three
standard deviations around the mean
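
The three percentages follow from the Normal distribution itself: the probability of falling within k standard deviations of the mean is erf(k / sqrt(2)). A short sketch that checks the rule numerically (the function name `prob_within` is illustrative):

```python
import math

def prob_within(k):
    """P(|X - mean| <= k standard deviations) for a Normal distribution."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.4f}")
```

The rule is only a good approximation when the data are roughly Normal; for skewed data the observed percentages can differ noticeably.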

Prerequisite Skills
Fundamental concepts of measurement

Scales of measurement
Distribution, central tendency, variability,
probability
Disease prevalence and incidence
Disease outcomes (eg, fatality rates)
Associations (correlation or covariance)
Health impact (eg, risk differences and ratios)
Sensitivity, specificity, predictive values

Scales of Measure

Nominal - qualitative classification of equal value: gender, race, color, city
Ordinal - qualitative classification which can be rank ordered: socioeconomic status of families
Interval - numerical or quantitative data that can be rank ordered and sizes compared: temperature
Ratio - interval data with absolute zero value: time or space

Distribution, Central Tendency


Mean

Variability, Probability

Mean
Median
Mode
Standard deviation
Statistical Significance p < .01

Confidence Interval

Statistical Significance
Type I and Type II errors
Null Hypothesis = Ho

                   Ho True            Ho False
Reject Ho          Type I error       Correct decision
Do Not Reject Ho   Correct decision   Type II error

Statistics Online Textbook


The Statistics Homepage
http://www.statsoftinc.com/textbook/stathome.html

Disease Prevalence and Incidence
Prevalence

probability of disease in the entire population at any point in time
2% of the population has diabetes

Incidence

probability that a patient without disease develops disease during an interval
0.2% per year, or 2 new cases per 1000 per year
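
The two definitions differ mainly in their denominators: prevalence divides by the whole population, incidence by the people who were disease-free at the start of the interval. A minimal sketch, with hypothetical counts chosen only so that they reproduce the 2% and 0.2% figures on the slide:

```python
def prevalence(existing_cases, population):
    """Proportion of the whole population that has the disease at one point in time."""
    return existing_cases / population

def cumulative_incidence(new_cases, at_risk_at_start):
    """Proportion of initially disease-free people who develop the disease in the interval."""
    return new_cases / at_risk_at_start

# Hypothetical community (illustrative numbers, not from the lecture).
population = 50_000
existing_cases = 1_000                   # people who already have diabetes
at_risk = population - existing_cases    # only disease-free people can become new cases
new_cases_this_year = 98

prev = prevalence(existing_cases, population)
inc = cumulative_incidence(new_cases_this_year, at_risk)
print(f"prevalence = {prev:.1%}, incidence = {inc * 1000:.0f} per 1000 per year")
```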

Sensitivity, Specificity
sensitivity = a / (a+c)
specificity = d / (b+d)

                   Patients with disease   Patients without disease
Test is positive   a                       b
Test is negative   c                       d

Predictive Value
Positive predictive value = a / (a+b)
Negative predictive value = d / (c+d)
Post-test probability of disease given positive test = a / (a+b)
Post-test probability of disease given negative test = c / (c+d)

                   Patients with disease   Patients without disease
Test is positive   a                       b
Test is negative   c                       d
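
The formulas on the last two slides can be collected in one small function. Following the formulas, a is read as diseased patients with a positive test, b as non-diseased with a positive test, c as diseased with a negative test, and d as non-diseased with a negative test. The counts in the example are hypothetical.

```python
def diagnostic_metrics(a, b, c, d):
    """Test characteristics from a 2x2 table.

    a: disease, test positive      b: no disease, test positive
    c: disease, test negative      d: no disease, test negative
    """
    return {
        "sensitivity": a / (a + c),
        "specificity": d / (b + d),
        "ppv": a / (a + b),                       # post-test probability given a positive test
        "npv": d / (c + d),
        "post_test_given_negative": c / (c + d),  # equals 1 - npv
    }

# Hypothetical study: 100 diseased and 200 non-diseased patients.
metrics = diagnostic_metrics(a=90, b=30, c=10, d=170)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity and specificity are properties of the test, while the predictive values also depend on how common the disease is in the group tested.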

Good Resource: Sensitivity, Specificity, Predictive Values


An Introduction to Information Mastery
http://www.poems.msu.edu/InfoMastery/default.htm

Diagnosis
Sensitivity and specificity
Predictive values
Likelihood ratios

InfoRetriever

Calculators: Epidemiology, Diagnostic test

Bias in Clinical Trials


Areas in which bias can occur
Systematic error in . . .
Allocation
Response
Assessment

Bias in Clinical Trials


Allocation or Susceptibility Bias
Can occur when patient assignments to a trial
group are influenced by an investigator's
knowledge of the treatment to be received.
Can result in
treatment groups
that have different
prognoses.

Bias in Clinical Trials


Allocation or Susceptibility Bias
Treatment groups must have similar
prognoses, which is achieved by:

Randomization of patients
Prospective evaluation of patients
Well-defined inclusion and exclusion criteria

Randomization in Clinical Trials


Occurs when patients
are assigned to
treatments by means
of a mechanism that
prevents both the
patients and the
investigator from
knowing which
treatment is being
assigned.

Benefits of Randomization
Prevents the systematic introduction of bias.
Minimizes the possibility of allocation bias.
Balances prognostic factors for treatment groups.
Improves the validity of statistical tests used to
compare treatments.

Bias in Clinical Trials


Response & Assessment/Recording Bias
Can occur when a patient reports a treatment
response or when an investigator assesses that
response; either person can be influenced by
knowing the treatment.
A patient or an investigator may have a
preconceived idea of which treatment is better.
The patient may also want to please the
investigator.

Bias in Clinical Trials


Blinding
To minimize Response & Assessment/Recording Bias
Single Blind (patient blinded): protects against
response bias.
Double Blind (patient and investigator blinded):
protects against assessment/recording bias as well
as response bias.

Bias in Clinical Trials


Transfer bias
Occurs when patients are lost to follow-up.
Must be minimized.
Performance bias
Can occur with a single surgeon or with
multiple surgeons.

Confounding Example
Relationship between coffee and
pancreatic cancer, BUT
Smoking is a known risk factor for
pancreatic cancer
Smoking is associated with coffee
drinking but it is not a result of coffee
drinking.

What is confounding?
If an association is observed between
coffee drinking and pancreatic cancer

Coffee actually causes pancreatic cancer,


or
The coffee drinking and pancreatic cancer
association is the result of confounding by
cigarette smoking.

How to handle confounding


If you know something is a possible
confounder, in the data analysis use

Stratification, or
Adjustment

Fear the unknown!
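
A small numerical sketch of the coffee, smoking and pancreatic cancer example shows what stratification and adjustment do. All counts are hypothetical, and the Mantel-Haenszel odds ratio is used here as one common adjustment method; the slides do not say which method is intended.

```python
def odds_ratio(a, b, c, d):
    """a: exposed cases, b: exposed non-cases, c: unexposed cases, d: unexposed non-cases."""
    return (a * d) / (b * c)

def mantel_haenszel_or(strata):
    """Odds ratio pooled across strata (Mantel-Haenszel estimator)."""
    strata = list(strata)
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Hypothetical counts: (coffee cases, coffee non-cases, no-coffee cases, no-coffee non-cases)
strata = {
    "smokers": (40, 160, 10, 40),
    "non-smokers": (5, 95, 20, 380),
}

# Crude table: ignore smoking and add the strata together.
crude_table = tuple(sum(cells) for cells in zip(*strata.values()))
crude_or = odds_ratio(*crude_table)
stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
adjusted_or = mantel_haenszel_or(strata.values())

print(f"crude OR = {crude_or:.2f}")
print(f"stratum ORs = {stratum_ors}")
print(f"adjusted OR = {adjusted_or:.2f}")
```

In this constructed data set coffee looks harmful in the crude analysis only because smokers drink more coffee and have more cancer; within each smoking stratum the association disappears. This works only for confounders that were measured, which is why unknown confounders are the real concern.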

Study Design Taxonomy


Treatment vs. Observational
Prospective vs. Retrospective
Longitudinal vs. Cross-sectional
Randomized vs. Non-Randomized
Blinded/Masked or Not

Single-blind, Double blind, Unblinded

Randomization: Definition
Random Allocation

known chance of receiving a treatment
cannot predict the treatment to be given

Eliminate Selection Bias


Similar Treatment Groups

ONE Factor is Different


Randomization tries to ensure that ONE
factor is different between two or more
groups.
Observe the Consequences
Attribute Causality

Types of Randomization
Standard ways:
Random number tables (see text)
Computer programs
NOT legitimate
Birth date
Last digit of the medical record number
Odd/even room number

Types of Randomization
Simple
Blocked Randomization
Stratified Randomization

Simple Randomization
Randomize each patient to a treatment
with a known probability

Corresponds to flipping a coin

Could have imbalance in the number per group or trends in group assignment
Could have different distributions of a
trait like gender in the two arms
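
Simple randomization is the computer equivalent of the coin flip. A minimal sketch, assuming two arms labelled "A" and "B" and equal allocation probability (both are illustrative choices):

```python
import random

def simple_randomization(n_patients, arms=("A", "B"), seed=None):
    """Assign each patient independently, with equal probability for every arm."""
    rng = random.Random(seed)  # a fixed seed makes the list reproducible for auditing
    return [rng.choice(arms) for _ in range(n_patients)]

assignments = simple_randomization(20, seed=2007)
counts = {arm: assignments.count(arm) for arm in ("A", "B")}
print(assignments)
print(counts)
```

Because every patient is assigned independently, nothing forces the two counts to be equal, which is exactly the imbalance risk described above, and it is most noticeable in small trials.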

Block Randomization
Ensure the number of patients assigned to
each treatment is not far out of balance
Variable block size

An additional layer of blindness

Different distributions of a trait like gender in the two arms possible
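
Block randomization can be sketched as a sequence of shuffled blocks, each containing every treatment equally often. The arm labels and the block sizes of 4 and 6 are assumptions made for this illustration.

```python
import random

def block_randomization(n_patients, arms=("A", "B"), block_sizes=(4, 6), seed=None):
    """Permuted-block allocation; each block size must be a multiple of the number of arms."""
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_patients:
        size = rng.choice(block_sizes)            # variable block size
        block = list(arms) * (size // len(arms))  # each arm appears equally often
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_patients]

schedule = block_randomization(24, seed=2007)
print(schedule)
print({arm: schedule.count(arm) for arm in ("A", "B")})
```

After every completed block the arms are exactly balanced, so the imbalance at any moment is at most half of the largest block. Varying the block size makes it harder for an investigator to guess the last assignments in a block, which is the extra layer of blindness mentioned above.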

Stratified Randomization
A priori certain factors likely important
(e.g. Age, Gender)
Randomize so different levels of the
factor are balanced between treatment
groups
Cannot evaluate the stratification
variable

Stratified Randomization
For each subgroup or stratum perform a
separate block randomization
Common strata

Clinical center, Age, Gender

Stratification MUST be taken into account in the data analysis!
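
Stratified randomization can be sketched as one independent permuted-block list per stratum. The patient records, the strata (gender by age group), and the block size of 4 below are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_randomization(patients, stratum_of, arms=("A", "B"), block_size=4, seed=None):
    """Run a separate permuted-block randomization inside every stratum."""
    rng = random.Random(seed)
    pending = defaultdict(list)  # unused assignments of the current block, per stratum
    allocation = {}
    for patient in patients:     # patients are processed in order of enrolment
        stratum = stratum_of(patient)
        if not pending[stratum]:
            block = list(arms) * (block_size // len(arms))
            rng.shuffle(block)
            pending[stratum] = block
        allocation[patient["id"]] = pending[stratum].pop()
    return allocation

# Hypothetical enrolment list.
patients = [
    {"id": i,
     "gender": "F" if i % 2 else "M",
     "age_group": ">=60" if i % 3 == 0 else "<60"}
    for i in range(1, 25)
]

allocation = stratified_randomization(
    patients, stratum_of=lambda p: (p["gender"], p["age_group"]), seed=1
)
print(allocation)
```

With many strata and few patients, most blocks stay incomplete and the balancing benefit is lost, so the number of stratification factors is usually kept small.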

Outline

Introductory Statistical Definitions


What is Randomization?
Randomized Study Design
Experimental vs. Observational
Non-Randomized Study Design
Stat Software, Books, Articles

Types of Randomized Studies

Parallel Group
Sequential Trials
Group Sequential trials
Cross-over
Factorial Designs

Parallel Group
Randomize patients to one of k
treatments
Response

Measure at end of study


Delta or % change from baseline
Repeated measures
Function of multiple measures

Ideal Study - Gold Standard


Double blind
Randomized
Parallel groups

Two Scenarios
Study 1

A U.S. study (2000) compared 469 patients with brain cancer
to 422 patients who did not have brain cancer. The patients'
cell phone use was measured using a questionnaire. The two
groups' use of cell phones was similar.

Study 2

An Australian study (1997) used 200
transgenic mice. One hundred were exposed for two 30
minute periods a day to the same kind of microwaves with
roughly the same power as the kind transmitted from a cell
phone. The other 100 mice were not exposed. After 18
months, the brain tumor rate for the exposed mice was twice
as high as that for the unexposed mice.

Questions to Consider
How do the two studies differ?

Study 1

Study 2

Questions to Consider
Why do the results of different medical
studies sometimes disagree?

Could the second study be performed


on human beings?

Questions to Consider
Suppose a friend recently diagnosed with
brain cancer was a frequent cell phone
user. Is this strong evidence that frequent
cell phone use increases the likelihood of
getting brain cancer?

Informal observations of this type are called
_____________ _____________.
You should rely on reputable research studies,
not anecdotes.

Two Main Ways to Gather Data


Observational Study

The researcher observes values of the response and
explanatory variables for the sampled subjects without
imposing any treatments.
Example:

Experiment

The researcher assigns experimental conditions (also
called treatments) to subjects (also called experimental
units) and then observes outcomes on the response
variable.
Treatments correspond to values of the explanatory
variable.
Example:

Types of Observational Studies


Retrospective

Observational studies that look back in time
This is sometimes done to find risk factors for certain
diseases

Cross-Sectional

Observational studies that take a cross section of
the population at the current time

Prospective

Observational studies in which subjects are
followed into the future

Advantages of Experiments over Observational Studies
In an observational study, there can always be
lurking variables affecting the results.
This means that observational studies can
_________ show causation.
It is easier to adjust for lurking variables in an
experiment.
In general, we can study the effect of an explanatory
variable on a response variable more accurately
with an experiment than with an observational study.

Disadvantages of Experiments
They can be ____________ to perform on the
subjects in which you are interested.
It can be difficult to monitor subjects to ensure that
they are doing what they are told.
They can take many years, even decades, to
complete.
Results of experiments that use animals do not
______________ to humans.
They are unnecessary if the question of interest does
not involve trying to assess _____________.

Sampling Designs for Observational Studies
Simple Random Sampling (SRS)

A simple random sample of n subjects from a population is
one in which each possible sample of that size has the
_______ chance of being selected.

Sampling Designs for Observational Studies
Stratified Sampling

A stratified random sample divides the population into
separate groups, called ________, and then selects an SRS
from each stratum.
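
A stratified random sample can be sketched as "group first, then draw an SRS inside every group". The population below, its `sex` field, and the sample size of 5 per group are hypothetical.

```python
import random

def stratified_sample(population, stratum_of, n_per_stratum, seed=None):
    """Split the population into groups, then draw an SRS of fixed size from each group."""
    rng = random.Random(seed)
    groups = {}
    for unit in population:
        groups.setdefault(stratum_of(unit), []).append(unit)
    # random.sample draws without replacement, i.e. an SRS within the group.
    return {name: rng.sample(members, n_per_stratum) for name, members in groups.items()}

# Hypothetical sampling frame of 100 subjects.
population = [{"id": i, "sex": "F" if i % 2 else "M"} for i in range(100)]
sample = stratified_sample(population, stratum_of=lambda u: u["sex"], n_per_stratum=5, seed=42)

for name, members in sample.items():
    print(name, [u["id"] for u in members])
```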

Sampling Designs for Observational Studies
Cluster Sampling

A cluster random sample can be used if the target population
naturally divides into groups, each of which is representative of the
entire target population. In this method, an SRS of groups (clusters)
is taken. Every member of the selected groups is put into the
sample.
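
In cluster sampling the random draw is made over groups, not over individuals. A minimal sketch with five hypothetical clinics of ten patients each:

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Draw an SRS of clusters and keep every member of the selected clusters."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    members = [member for name in chosen for member in clusters[name]]
    return chosen, members

# Hypothetical clusters: clinic name -> list of patient identifiers.
clusters = {
    f"clinic_{c}": [f"clinic_{c}_patient_{i}" for i in range(10)]
    for c in "ABCDE"
}

chosen, members = cluster_sample(clusters, n_clusters=2, seed=7)
print(chosen)
print(len(members), "patients sampled")
```

Only a list of clusters is needed to draw the sample, which is why no sampling frame of individual subjects is required.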

Sampling Designs for Observational Studies
Systematic Sampling

A systematic sample selects every k-th
person from the sampling frame. The
researcher randomly selects a number
between 1 and k in order to know which
person to select first, then selects every k-th
person after this.
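
The procedure translates almost directly into code: pick a random start between 1 and k, then step through the frame in jumps of k. The frame of 100 numbered subjects and k = 10 are hypothetical.

```python
import random

def systematic_sample(frame, k, seed=None):
    """Select a random start in 1..k, then every k-th unit after it."""
    rng = random.Random(seed)
    start = rng.randint(1, k)   # randint includes both endpoints
    return frame[start - 1::k]  # convert the 1-based start to a 0-based index

# Hypothetical sampling frame: subjects numbered 1 to 100.
frame = list(range(1, 101))
sample = systematic_sample(frame, k=10, seed=3)
print(sample)
```

If the frame has a periodic pattern with the same period as k (for example every tenth record being a ward supervisor), a systematic sample can be badly biased, so the ordering of the frame deserves a check.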

Advantages of the Various Sampling Designs
Simple Random Sampling (SRS)

It is the easiest and most widespread form of sampling.
Each subject has an _______ chance to be in the
sample.
The sample enables us to determine how likely it is
that descriptive statistics (like the sample mean)
fall close to corresponding values for which we
would like to make inference (like the population
mean).

Advantages of the Various Sampling Designs
Stratified Sampling

It ensures that there are enough _________ in
each group that you want to compare.

Cluster Sampling

It does not require a sampling frame of subjects.
It is less ___________ to implement.

Bias in Sampling
A sampling method is _________ if

The sample tends to favor some parts of the
population over others.
In other words, the results from the sample are not
representative of the population.

Obviously, __________ samples are our goal.

Types of Bias
Undercoverage

Occurs when a sampling frame leaves out some groups in the
population

Nonresponse bias

Occurs when some sampled subjects cannot be reached, refuse to
participate or fail to answer some questions

Response bias

Occurs when the subject gives an incorrect response or when the
question wording or the way the interviewer asks the questions is
confusing or misleading

Examples of Poor Samples that Result in Bias
Convenience Samples

Voluntary Response Samples

Elements of a Good Experiment


Control group

Gives us something to compare against
Enables us to control the __________ _______
The placebo effect occurs when patients seem to improve
regardless of the treatment they receive.

Randomization

Eliminates ______ that can result when researchers assign
treatments to the subjects
Balances the group on variables that you know affect the
response
Balances the group on _________ variables that may be
unknown to you

Elements of a Good Experiment


Blinding

Increases reliability of the results
_________-blind: subjects do not know the
treatment assignment
_________-blind: neither the subjects nor those
in contact with the subjects know the treatment
assignment

Example

A pharmaceutical company has developed a new drug for treating
high blood pressure. To determine the effectiveness of the drug,
the company conducted an experiment in which subjects with a
history of high blood pressure were treated with the new drug.

A later experiment randomly divided subjects with a history of
high blood pressure into two groups. Group A was treated with
the new drug as before. Group B received the most popular drug
on the market at that time. The subjects were unaware of which
treatment they received. 60% of the patients in Group A
improved, while 63% of the patients in Group B improved.

The __________ experiment is better because

Example

To investigate whether antidepressants help smokers to quit smoking,
one study used 429 men and women who were 18 or older and had
smoked 15 cigarettes or more per day in the previous year. They were
all highly motivated to quit and in good health. They were assigned to
one of two groups: one group took an antidepressant called Zyban,
while the other group did not take anything. At the end of a year, the
study observed whether each subject had successfully abstained from
smoking.

Logic Behind Randomized Comparative Experiments
Randomization ensures that the groups of subjects
are similar in all respects before the treatments are
applied.
Using a control group for comparison ensures that
external influences operate equally on both groups.
If the groups are large enough, natural differences in
subjects will average out.
This means that there will be little difference in the results
for the groups unless the treatments themselves
actually cause the difference.

Did You Know?


Observational studies can also have control groups.

These are called ______-________ studies.
The cases are people who have a certain disease or
condition, and the controls are people who do not have the
disease.
Their purpose is to see if one of the explanatory variables is
related to the disease.
_________ from the beginning of these notes is an example
of a case-control study.

Important Points

Types of studies:

Observational studies and experiments

Experiments control for lurking variables


Sampling designs:

SRS, stratified random samples and cluster samples

SRS is the preferred method


Potential sources of bias:

Undercoverage

Response bias

Nonresponse bias

Convenience sampling

Voluntary response sampling


Elements of good experiments:

Control group, randomization and blinding

Important Points
If a group is underrepresented in the sample, we cannot
make inference about it.
We must be careful when interpreting the results of
observational studies.
For comparison of several treatments to be valid, you must
apply all treatments to similar groups of experimental units.
Interesting questions are usually pretty tough to answer. This
is due in part to the fact that no single experiment or
observational study can determine causation.

Stop and Think!!!


Write the study!
Describe & classify the
variables.
Instruments for measurement?
Bias?
Prepare to analyze data!