
Scientific Research Methodology

Data Collection
Measurement Strategies

Corin Badiu, 2007

Establishing the Types of Variables

IT Data Sources
Identify and locate the correct data sets
Within the institution
Outside the institution

Measurement Strategies
Analysis, Interpretation and Reporting of Results

Select the appropriate program: Excel, SPSS

Use the programs for simple statistical analyses and
graphical presentation of the results

Types of Variables
Variables of

Effect modifiers*
*Generally regarded as exposure to risk factors

Types of Clinical Studies

Studies without variables

Case studies, case series, editorials,
opinions / commentaries, review reports

Studies with a single variable

Descriptive studies

Studies with 2 variables

Observational studies
Meta-analyses and systematic reviews

Hierarchy of Clinical Study Types

Clinical studies

Predictor variable

Outcome variable

Evidence Based Medicine

Treatment methods supported by
clinical and research evidence.
Requires integrating the best research
evidence for diagnosis and treatment
with clinical experience.
Takes into account what is optimal for
each patient, as well as the patient's preferences.

Research on treatment effectiveness
is the subject of only a small number of articles.
Evidence-based medicine is regarded as a
concept that uses databases, including
systematic case studies, to guide
therapeutic interventions.
Evidence must be evaluated in an actual
therapeutic context: what type of intervention
makes sense for me as a practitioner?

Clinical Questions
What is the best choice of therapy
for my patient?
Is this program theoretically sound?
Does this therapy program work?
How long will the therapy take?
Where do I go from here?

Practicing Clinicians Needed!

are on the front line
have necessary clinical expertise
know their patients well
are naturally scientific thinkers
are well-versed in data collection
know how to look for outcomes

Baseline Measurements
A baseline is a measure of response
rates in the absence of treatment
Establish a need for treatment
Document improvement
Allow us to modify treatment if we don't see improvement

Baseline Data
Create a set of exemplars of each of your
targets and prepare a recording sheet.
Utilize criterion-referenced measures.

Data Collection Strategies

Always have more than one measurement
Check the reliability of the baseline data
Select research/clinical design

Research/Clinical Designs
ABA designs
Test, treat and test
ABAB designs
Test, treat, test and treat
Time-Series designs
Establish stable baseline
Begin treatment
Measure treatment results
Multiple-Baseline designs
Have a number of different baselines
Each baseline must be independent of the others
Only treat one variable

Data Collection Instruments



Data Collection Instrument

Is it reliable?
Will the instrument measure consistently across:
Different testing situations?
Test-retest reliability

Different judges?
Inter-rater reliability

Data Collection Instrument

Is it valid?
Is the instrument
being used to
measure the kind of
data for which it
was intended?

Data Collection Instrument

Is it responsive?
The instrument should be equally sensitive, whether a
characteristic is present or absent.
Must measure both:
You thought it was intact, but it was torn.
You thought it was torn, but it was intact.

Data Collection Instrument

Is it universal?
The investigator should
employ a widely used data
collection instrument, which
helps minimize reporting bias
because the data can then be
compared with other published studies.

Data Collection Instrument

Is it unbiased?
There should be no difference between the true value
and the value that an investigator actually obtains
other than a difference caused by sampling variability.

Sampling Methods

1. Random Sampling (Simple)
2. Systematic Sampling
3. Stratified Sampling
4. Cluster Sampling
5. Convenience Sampling
6. More complex sampling

Qualitative and Quantitative Variables

Examples of qualitative variables are occupation, sex, marital
status, etc.
Variables that yield observations that can be measured are
considered to be quantitative variables. Examples of
quantitative variables are weight, height, and age.
Quantitative variables can further be classified as discrete or
continuous.

Variable Types

Categorical variables (e.g., sex, marital status,
income category)
Continuous variables (e.g., age, income,
weight, height, time to achieve an outcome)
Discrete variables (e.g., number of children in
a family)
Binary or dichotomous variables (e.g.,
responses to any Yes/No question)

Scale of Data
1. Nominal: These data do not represent an amount or quantity (e.g.,
marital status, sex).
2. Ordinal: These data represent an ordered series of categories (e.g.,
level of education).
3. Interval: These data are measured on a scale having equal
units but an arbitrary zero point (e.g., temperature in Fahrenheit).
4. Ratio: These data are measured on a scale with equal units and a true
zero, so ratios can be compared meaningfully (e.g., weight: 100 kg is twice 50 kg).
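A minimal Python sketch of why the interval/ratio distinction matters, assuming the standard unit conversions (the function names are illustrative, not from the source): a statement such as "twice as much" survives a change of units only when the scale has a true zero.

```python
# Interval vs ratio scales: which "ratio" statements survive a change of units?
# Illustrative values only; the conversions are the standard ones.

def fahrenheit_to_celsius(f: float) -> float:
    """Convert degrees Fahrenheit to degrees Celsius (both interval scales)."""
    return (f - 32.0) * 5.0 / 9.0


def kg_to_lb(kg: float) -> float:
    """Convert kilograms to pounds (both ratio scales: zero means no weight)."""
    return kg * 2.20462


# Interval scale: 100 F / 50 F = 2, but the same two temperatures in Celsius
# give a different ratio, so "twice as hot" has no meaning.
ratio_fahrenheit = 100.0 / 50.0
ratio_celsius = fahrenheit_to_celsius(100.0) / fahrenheit_to_celsius(50.0)

# Ratio scale: 100 kg / 50 kg = 2 in any unit of weight.
ratio_kg = 100.0 / 50.0
ratio_lb = kg_to_lb(100.0) / kg_to_lb(50.0)

print(f"Temperature: {ratio_fahrenheit:.2f} in F, {ratio_celsius:.2f} in C")
print(f"Weight:      {ratio_kg:.2f} in kg, {ratio_lb:.2f} in lb")
```

Running it prints a temperature ratio of 2.00 in Fahrenheit but 3.78 in Celsius, while the weight ratio stays 2.00 in both kilograms and pounds.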

Variables in the protocol


Independent Variable
The characteristic being observed and/or
measured that is hypothesized to influence an
event or outcome (dependent variable).
The independent variable is not influenced
by the event or outcome, but may cause it
or contribute to its variation.

Dependent Variable
A variable whose value depends on
the effect of other variables (i.e.,
independent variables) in the
relationship being studied. Synonyms:
outcome or response variable.
An event or outcome whose variation we
seek to explain or account for by the
influence of independent variables.

Intermediate Variable
A variable that occurs in a causal pathway from
an independent to a dependent variable.
Synonyms: intervening or mediating variable.
It produces variation in the dependent
variable, and is caused to vary by the
independent variable.
Such a variable is associated with both the
dependent and independent variables.

Confounding Variable
A factor, itself a determinant of the
outcome, that distorts the apparent effect of a
study variable on the outcome.
Such a factor may be unequally distributed
among the exposed and the unexposed, and
thereby influence the apparent magnitude and
even the direction of the effect.

Organizing Data

Frequency Table
Frequency Histogram
Relative Frequency Histogram
Frequency polygon
Relative Frequency polygon
Bar chart
Pie chart
Stem-and-leaf display
Box Plot

Frequency Table
Suppose we are interested in studying the number of
children in the families living in a community. The
following data has been collected based on a random
sample of n = 30 families from the community.
2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6,
5, 4 , 2, 4, 4, 7, 6
Organize this data in a Frequency Table!

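Frequency Table
X = No. of children   Count   Relative freq.
0                     2       0.067
1                     3       0.100
2                     5       0.167
3                     5       0.167
4                     6       0.200
5                     4       0.133
6                     2       0.067
7                     2       0.067
8                     1       0.033
Total                 30      1.000

Frequency plot
[Figure: frequency plot of the number of children per family]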
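A short Python sketch that builds this frequency table from the 30 values listed on the slide, using `collections.Counter` from the standard library; the column layout of the printout is an arbitrary choice.

```python
from collections import Counter

# Number of children in each of the n = 30 sampled families (values from the slide).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6,
            5, 4, 2, 4, 4, 7, 6]

n = len(children)
counts = Counter(children)  # value -> number of families with that many children

# Relative frequency = count / n; over all values it sums to 1.
relative = {x: counts[x] / n for x in sorted(counts)}

print("X  Count  Relative freq.")
for x in sorted(counts):
    print(f"{x:<2} {counts[x]:>5}  {relative[x]:.3f}")
print(f"Total {n:>2}  {sum(relative.values()):.3f}")
```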

Frequency Table
Now suppose we need to construct a similar frequency table for the
age of patients with heart-related problems in a clinic.
The following data were collected from a random sample of
n = 30 patients who went to the emergency room of the clinic for
heart-related problems.
The measurements are: 42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67,
53, 59, 47, 63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69.
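These ages can be grouped into class intervals with a short Python sketch. It assumes classes of width 5 starting at 32, inferred from the first two groups on the next slide (32 - 36.99 and 37 - 41.99), and whole-number ages; `LOWER` and `WIDTH` are names introduced here for illustration.

```python
# Ages of the n = 30 emergency-room patients (values from the slide).
ages = [42, 38, 51, 53, 40, 68, 62, 36, 32, 45, 51, 67, 53, 59, 47,
        63, 52, 64, 61, 43, 56, 58, 66, 54, 56, 52, 40, 55, 72, 69]

LOWER = 32  # assumed lower limit of the first class (32 - 36.99)
WIDTH = 5   # assumed class width, inferred from 32 - 36.99 and 37 - 41.99

# Class index for a whole-number age: 0 -> 32-36.99, 1 -> 37-41.99, and so on.
n_classes = (max(ages) - LOWER) // WIDTH + 1
counts = [0] * n_classes
for age in ages:
    counts[(age - LOWER) // WIDTH] += 1

n = len(ages)
for i, count in enumerate(counts):
    low = LOWER + i * WIDTH
    high = low + WIDTH - 0.01  # upper limit written as on the slide, e.g. 36.99
    print(f"{low} - {high:.2f}  {count:>2}  {count / n:.3f}")
```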

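Frequency Table
Age group       Count   Relative freq.
32 - 36.99      2       0.067
37 - 41.99      3       0.100
42 - 46.99      3       0.100
47 - 51.99      3       0.100
52 - 56.99      8       0.267
57 - 61.99      3       0.100
62 - 66.99      4       0.133
67 - 71.99      3       0.100
72 - 76.99      1       0.033
Total           30      1.000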

Measures of Central Tendency

Where is the center of the distribution?
1. Mean
2. Median
3. Mode
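For the number-of-children data from the frequency-table slide, the three measures can be computed with Python's standard `statistics` module; a minimal sketch:

```python
import statistics

# Number of children per family, n = 30 (values from the frequency-table slide).
children = [2, 2, 5, 3, 0, 1, 3, 2, 3, 4, 1, 3, 4, 5, 7, 3, 2, 4, 1, 0, 5, 8, 6,
            5, 4, 2, 4, 4, 7, 6]

mean_value = statistics.mean(children)      # arithmetic average: 106 / 30
median_value = statistics.median(children)  # average of the 15th and 16th sorted values
mode_value = statistics.mode(children)      # most frequent value

print(f"mean = {mean_value:.2f}, median = {median_value}, mode = {mode_value}")
```

This prints `mean = 3.53, median = 3.5, mode = 4`: the mean and median sit close together, and the most common family size is 4 children.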

Empirical Rule
For a Normal distribution approximately,
a) 68% of the measurements fall within one standard
deviation around the mean
b) 95% of the measurements fall within two standard
deviations around the mean
c) 99.7% of the measurements fall within three
standard deviations around the mean
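The three percentages can be checked against the standard normal distribution with `statistics.NormalDist` (available since Python 3.8); a minimal sketch:

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of falling within k standard deviations of the mean: P(-k < Z < k).
coverage = {k: standard_normal.cdf(k) - standard_normal.cdf(-k) for k in (1, 2, 3)}

for k, p in coverage.items():
    print(f"within {k} SD: {p:.4f}")
```

The output is 0.6827, 0.9545 and 0.9973, which the rule rounds to 68%, 95% and 99.7%.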

Prerequisite Skills
Fundamental concepts of measurement

Scales of measurement
Distribution, central tendency, variability
Disease prevalence and incidence
Disease outcomes (e.g., fatality rates)
Associations (correlation or covariance)
Health impact (e.g., risk differences and ratios)
Sensitivity, specificity, predictive values

Scales of Measure

Nominal: qualitative classification of equal
value (e.g., gender, race, color, city)
Ordinal: qualitative classification that can
be rank ordered (e.g., socioeconomic status)
Interval: numerical or quantitative data that can
be rank ordered and whose sizes can be compared (e.g., temperature in Fahrenheit)
Ratio: interval data with an absolute zero value
(e.g., time or space)

Distribution, Central Tendency


Variability, Probability

Standard deviation
Statistical Significance p < .01

Confidence Interval

Statistical Significance
Type I and Type II errors
Null hypothesis = Ho

                    Ho true            Ho false
Reject Ho           Type I error       Correct decision
Do not reject Ho    Correct decision   Type II error

Statistics Online Textbook

The Statistics Homepage

Disease Prevalence and Incidence

Prevalence: probability of disease in the entire population at
any point in time
(e.g., 2% of the population has diabetes)

Incidence: probability that a patient without the disease
develops the disease during an interval
(e.g., 0.2%, or 2 new cases per 1000 per year)
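A minimal sketch of the two definitions, using a hypothetical population of 100,000 people chosen only so that the results match the slide's examples (2% prevalence, 2 new cases per 1000 per year); the counts are illustrative, not real data. Incidence is computed here as a proportion of the population at risk, i.e. people without the disease at the start of the interval.

```python
def prevalence(existing_cases: int, population: int) -> float:
    """Share of the whole population that has the disease at one point in time."""
    return existing_cases / population


def incidence_proportion(new_cases: int, population_at_risk: int) -> float:
    """Share of initially disease-free people who develop the disease during the interval."""
    return new_cases / population_at_risk


# Hypothetical population, chosen only to reproduce the slide's example figures.
population = 100_000
existing_cases = 2_000
at_risk = population - existing_cases  # only people without the disease can become new cases
new_cases_in_one_year = 196

p = prevalence(existing_cases, population)
i = incidence_proportion(new_cases_in_one_year, at_risk)
print(f"prevalence: {p:.1%}")
print(f"incidence:  {i:.1%} per year ({i * 1000:.0f} per 1000)")
```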

Sensitivity, Specificity
sensitivity =
a / (a+c)
specificity =
d / (b+d)

                   Disease present   Disease absent
Test is positive   a                 b
Test is negative   c                 d

Predictive Value
Positive predictive value
= a / ( a+b)
Negative predictive
value = d / (c+d)
Post-test probability of
disease given positive
test = a / (a+b)
Post-test probability of
disease given negative
test = c / (c+d)
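The formulas on these two slides can be collected into one helper, using the cell labels implied by them (a = true positives, b = false positives, c = false negatives, d = true negatives). The counts in the call are hypothetical. The likelihood ratios, LR+ = sensitivity / (1 - specificity) and LR- = (1 - sensitivity) / specificity, are the standard definitions and are not derived on these slides; LR+ is undefined when specificity is exactly 1.

```python
def diagnostic_metrics(a: int, b: int, c: int, d: int) -> dict:
    """2x2 table: a = true positives, b = false positives,
    c = false negatives, d = true negatives."""
    sensitivity = a / (a + c)
    specificity = d / (b + d)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": a / (a + b),                    # post-test probability given a positive test
        "npv": d / (c + d),
        "p_disease_if_negative": c / (c + d),  # post-test probability given a negative test = 1 - NPV
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
    }


# Hypothetical counts for illustration only.
metrics = diagnostic_metrics(a=90, b=40, c=10, d=860)
for name, value in metrics.items():
    print(f"{name:>22}: {value:.3f}")
```

With these counts the test is sensitive (0.900) and specific (about 0.956), yet the positive predictive value is only about 0.692, because false positives come from the much larger disease-free group.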



Good resource: sensitivity, specificity, predictive values

An Introduction to Information Mastery

Sensitivity and specificity
Predictive values
Likelihood ratios


Calculators: Epidemiology, Diagnostic test

Bias in Clinical Trials

Areas in which bias can occur
Systematic error in . . .

Bias in Clinical Trials

Allocation or Susceptibility Bias
Can occur when patient assignments to a trial
group are influenced by an investigator's
knowledge of the treatment to be received.
Can result in
treatment groups
that have different prognoses.

Bias in Clinical Trials

Allocation or Susceptibility Bias
Treatment groups must have similar
prognoses, which is achieved by:

Randomization of patients
Prospective evaluation of patients
Well-defined inclusion and exclusion criteria

Randomization in Clinical Trials

Occurs when patients
are assigned to
treatments by means
of a mechanism that
prevents both the
patients and the
investigator from
knowing which
treatment is being given.

Benefits of Randomization
Prevents the systematic introduction of bias.
Minimizes the possibility of allocation bias.
Balances prognostic factors for treatment groups.
Improves the validity of statistical tests used to
compare treatments.

Bias in Clinical Trials

Response & Assessment/Recording Bias
Can occur when a patient reports a treatment
response or when an investigator assesses that
response; either person can be influenced by
knowing the treatment.
A patient or an investigator may have a
preconceived idea of which treatment is better.
The patient may also want to please the

Bias in Clinical Trials

To minimize Response & Assessment/Recording Bias
Single Blind (patient blinded): protects against
response bias.
Double Blind (patient and investigator blinded):
protects against assessment/recording bias as well
as response bias.

Bias in Clinical Trials

Transfer bias
Occurs when patients are lost to follow-up.
Must be minimized.
Performance bias
Can occur with a single surgeon or with
multiple surgeons.

Confounding Example
Relationship between coffee and
pancreatic cancer, BUT
Smoking is a known risk factor for
pancreatic cancer
Smoking is associated with coffee
drinking but it is not a result of coffee drinking.

What is confounding?
If an association is observed between
coffee drinking and pancreatic cancer, either:

Coffee actually causes pancreatic cancer, or

The association between coffee drinking and
pancreatic cancer is the result of confounding by
cigarette smoking.

How to handle confounding

If you know something is a possible
confounder, in the data analysis use:

Stratification, or

Fear the unknown!
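A sketch of stratification applied to the coffee example, using hypothetical counts built so that smoking is associated with both coffee drinking and pancreatic cancer while coffee itself has no effect. The crude risk ratio suggests an association; the stratum-specific ratios and the Mantel-Haenszel pooled ratio (a standard stratified estimator, named here as one option and not taken from the slides) show that it disappears once smoking is held constant.

```python
# Hypothetical counts per smoking stratum:
# (coffee drinkers with cancer, coffee drinkers, non-drinkers with cancer, non-drinkers)
strata = {
    "smokers":     (16, 800, 4, 200),  # 2% cancer risk with or without coffee
    "non-smokers": (1, 200, 4, 800),   # 0.5% cancer risk with or without coffee
}


def risk_ratio(a, n1, c, n0):
    """Risk in the exposed (a / n1) divided by risk in the unexposed (c / n0)."""
    return (a / n1) / (c / n0)


# Crude analysis: ignore smoking and pool both strata.
a, n1, c, n0 = (sum(column) for column in zip(*strata.values()))
crude_rr = risk_ratio(a, n1, c, n0)

# Stratified analysis: compare coffee drinkers with non-drinkers within each stratum.
stratum_rr = {name: risk_ratio(*cells) for name, cells in strata.items()}

# Mantel-Haenszel pooled risk ratio across strata.
numerator = sum(a_i * n0_i / (n1_i + n0_i) for a_i, n1_i, c_i, n0_i in strata.values())
denominator = sum(c_i * n1_i / (n1_i + n0_i) for a_i, n1_i, c_i, n0_i in strata.values())
mh_rr = numerator / denominator

print(f"crude RR = {crude_rr:.3f}")                      # 2.125: apparent effect of coffee
print({k: round(v, 3) for k, v in stratum_rr.items()})   # 1.0 in each stratum
print(f"Mantel-Haenszel RR = {mh_rr:.3f}")               # 1.000 after adjusting for smoking
```

The crude analysis doubles the apparent risk only because coffee drinkers are mostly smokers in these made-up numbers; within each smoking stratum the risk ratio is exactly 1.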

Study Design Taxonomy

Treatment vs. Observational
Prospective vs. Retrospective
Longitudinal vs. Cross-sectional
Randomized vs. Non-Randomized
Blinded/Masked or Not

Single-blind, Double blind, Unblinded

Randomization: Definition
Random Allocation

Known chance of receiving a treatment

Cannot predict the treatment to be given

Eliminate Selection Bias

Similar Treatment Groups

ONE Factor is Different

Randomization tries to ensure that ONE
factor is different between two or more
groups.
Observe the Consequences
Attribute Causality

Types of Randomization
Standard ways:
Random number tables (see text)
Computer programs
NOT legitimate
Birth date
Last digit of the medical record number
Odd/even room number

Types of Randomization
Blocked Randomization
Stratified Randomization

Simple Randomization
Randomize each patient to a treatment
with a known probability

Corresponds to flipping a coin

Could produce an imbalance in the number of patients per group, or
trends in group assignment
Could have different distributions of a
trait like gender in the two arms
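Simple randomization amounts to an independent coin flip for each patient; a minimal Python sketch using the standard `random` module (the function name and seed are illustrative):

```python
import random


def simple_randomization(n_patients, treatments=("A", "B"), seed=None):
    """Assign each patient independently, with equal probability, to one treatment."""
    rng = random.Random(seed)  # a fixed seed makes the allocation list reproducible
    return [rng.choice(treatments) for _ in range(n_patients)]


assignments = simple_randomization(20, seed=2007)
print(assignments)
# Nothing forces the arms to be equal in size: the counts can drift apart.
print({t: assignments.count(t) for t in ("A", "B")})
```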

Block Randomization
Ensure the number of patients assigned to
each treatment is not far out of balance
Variable block size

An additional layer of blindness

Different distributions of a trait like
gender in the two arms are still possible
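A sketch of permuted-block randomization with a randomly chosen block size, assuming two arms and block sizes of 2, 4 or 6 (these sizes are an assumption; any multiple of the number of arms works). Each block is exactly balanced, so with two arms the running imbalance can never exceed half the largest block, and varying the size keeps the next assignment hard to predict.

```python
import random


def block_randomization(n_patients, treatments=("A", "B"), block_sizes=(2, 4, 6), seed=None):
    """Permuted-block randomization with a block size drawn at random for each block."""
    if any(size % len(treatments) for size in block_sizes):
        raise ValueError("every block size must be a multiple of the number of treatments")
    rng = random.Random(seed)
    schedule = []
    while len(schedule) < n_patients:
        size = rng.choice(block_sizes)  # varying the size makes the next assignment harder to guess
        block = [t for t in treatments for _ in range(size // len(treatments))]
        rng.shuffle(block)              # random order within a perfectly balanced block
        schedule.extend(block)
    return schedule[:n_patients]


schedule = block_randomization(24, seed=7)
print(schedule)
```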

Stratified Randomization
Certain factors are known a priori to be likely important
(e.g., age, gender)
Randomize so different levels of the
factor are balanced between treatment groups
Cannot evaluate the stratification

Stratified Randomization
For each subgroup or stratum, perform a
separate block randomization
Common strata

Clinical center, Age, Gender

Stratification MUST be taken into

account in the data analysis!
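Stratified randomization can be sketched as one independent permuted-block list per stratum. The class name, the stratum keys (clinical center, gender) and the fixed block size of 4 are hypothetical choices for illustration, not a prescribed implementation.

```python
import random
from collections import defaultdict


class StratifiedBlockRandomizer:
    """Hypothetical helper: one independent permuted-block list per stratum."""

    def __init__(self, treatments=("A", "B"), block_size=4, seed=None):
        if block_size % len(treatments) != 0:
            raise ValueError("block_size must be a multiple of the number of treatments")
        self.treatments = treatments
        self.block_size = block_size
        self.rng = random.Random(seed)
        self.pending = defaultdict(list)  # stratum -> assignments left in its current block

    def _new_block(self):
        block = [t for t in self.treatments
                 for _ in range(self.block_size // len(self.treatments))]
        self.rng.shuffle(block)
        return block

    def assign(self, stratum):
        """Return the next treatment for a patient in the given stratum."""
        if not self.pending[stratum]:
            self.pending[stratum] = self._new_block()
        return self.pending[stratum].pop()


randomizer = StratifiedBlockRandomizer(block_size=4, seed=42)
for center, gender in [("center 1", "F"), ("center 1", "M"), ("center 2", "F"), ("center 1", "F")]:
    print(center, gender, randomizer.assign((center, gender)))
```

Because each stratum draws from its own blocks, every completed block within a stratum is balanced, which is also why the strata have to be carried into the analysis.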


Introductory Statistical Definitions

What is Randomization?
Randomized Study Design
Experimental vs. Observational
Non-Randomized Study Design
Stat Software, Books, Articles

Types of Randomized Studies

Parallel Group
Sequential Trials
Group Sequential trials
Factorial Designs

Parallel Group
Randomize patients to one of k treatments

Measure at end of study

Delta or % change from baseline
Repeated measures
Function of multiple measures

Ideal Study - Gold Standard

Double blind
Parallel groups

Two Scenarios
Study 1

A U.S. study (2000) compared 469 patients with brain cancer
to 422 patients who did not have brain cancer. The patients'
cell phone use was measured using a questionnaire. The two
groups' use of cell phones was similar.

Study 2

An Australian study (1997) used 200
transgenic mice. One hundred were exposed for two 30-minute
periods a day to the same kind of microwaves with
roughly the same power as the kind transmitted from a cell
phone. The other 100 mice were not exposed. After 18
months, the brain tumor rate for the exposed mice was twice
as high as that for the unexposed mice.

Questions to Consider
How do the two studies differ?

Study 1

Study 2

Questions to Consider
Why do the results of different medical
studies sometimes disagree?

Could the second study be performed

on human beings?

Questions to Consider
Suppose a friend recently diagnosed with
brain cancer was a frequent cell phone
user. Is this strong evidence that frequent
cell phone use increases the likelihood of
getting brain cancer?

Informal observations of this type are called

_____________ _____________.
You should rely on reputable research studies,
not anecdotes.

Two Main Ways to Gather Data

Observational Study

The researcher observes values of the response and
explanatory variables for the sampled subjects without
imposing any treatments

Experiment

The researcher assigns experimental conditions (also
called treatments) to subjects (also called experimental
units) and then observes outcomes on the response variable
Treatments correspond to values of the explanatory variable

Types of Observational Studies

Retrospective
Observational studies that look back in time
This is sometimes done to find risk factors for certain diseases

Cross-sectional
Observational studies that take a cross section of
the population at the current time

Prospective
Observational studies in which subjects are
followed into the future

Advantages of Experiments over Observational Studies
In an observational study, there can always be
lurking variables affecting the results.
This means that observational studies can
_________ show causation.
It is easier to adjust for lurking variables in an experiment.
In general, we can study the effect of an explanatory
variable on a response variable more accurately
with an experiment than with an observational study.

Disadvantages of Experiments
They can be ____________ to perform on the
subjects in which you are interested.
It can be difficult to monitor subjects to ensure that
they are doing what they are told.
They can take many years, even decades, to complete.
Results of experiments that use animals do not
______________ to humans.
They are unnecessary when the question of interest does
not involve trying to assess _____________.

Sampling Designs for Observational Studies
Simple Random Sampling (SRS)

A simple random sample of n subjects from a population is
one in which each possible sample of that size has the
_______ chance of being selected.

Sampling Designs for Observational Studies
Stratified Sampling

A stratified random sample divides the population into
separate groups, called ________, and then selects an SRS
from each stratum.

Sampling Designs for Observational Studies
Cluster Sampling

A cluster random sample can be used if the target population
naturally divides into groups, each of which is representative of the
entire target population. In this method, an SRS of groups (clusters)
is taken. Every member of the selected groups is put into the
sample.

Sampling Designs for Observational Studies
Systematic Sampling

A systematic sample selects every kth
person from the sampling frame. The
researcher randomly selects a number
between 1 and k in order to know which
person to select first, then selects every kth
person after this.
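A minimal sketch of systematic sampling, assuming the sampling frame is an ordered Python list; the frame of 100 people and the function name are hypothetical. `random.randint(1, k)` draws the 1-based starting position, inclusive of both ends.

```python
import random


def systematic_sample(frame, k, seed=None):
    """Pick a random start between 1 and k, then every kth unit of the frame."""
    rng = random.Random(seed)
    start = rng.randint(1, k)   # 1-based position of the first person selected
    return frame[start - 1::k]  # slicing with step k takes every kth person after that


frame = [f"person {i}" for i in range(1, 101)]  # hypothetical sampling frame of 100 people
sample = systematic_sample(frame, k=10, seed=5)
print(sample)
```

With 100 people and k = 10, every possible start yields exactly 10 people spaced 10 apart; if the frame has a periodic pattern that matches k, the sample can be unrepresentative.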

Advantages of the Various Sampling Designs
Simple Random Sampling (SRS)

It is the simplest and most widespread form of sampling.
Each subject has an _______ chance to be in the sample.
The sample enables us to determine how likely it is
that descriptive statistics (like the sample mean)
fall close to corresponding values for which we
would like to make inference (like the population mean).

Advantages of the Various Sampling Designs
Stratified Sampling

It ensures that there are enough _________ in
each group that you want to compare.

Cluster Sampling

It does not require a sampling frame of subjects.
It is less ___________ to implement.

Bias in Sampling
A sampling method is _________ if

The sample tends to favor some parts of the

population over others.
In other words, the results from the sample are not
representative of the population.

Obviously, __________ samples are our goal.

Types of Bias

Undercoverage
Occurs when a sampling frame leaves out some groups in the
population

Nonresponse bias
Occurs when some sampled subjects cannot be reached, refuse to
participate or fail to answer some questions

Response bias
Occurs when the subject gives an incorrect response or when the
question wording or the way the interviewer asks the questions is
confusing or misleading

Examples of Poor Samples that Result in Bias
Convenience Samples

Voluntary Response Samples

Elements of a Good Experiment

Control group
Gives us something to compare against
Enables us to control the __________ _______
The placebo effect occurs when patients seem to improve
regardless of the treatment they receive.

Randomization
Eliminates ______ that can result when researchers assign
treatments to the subjects
Balances the groups on variables that you know affect the response
Balances the groups on _________ variables that may be
unknown to you

Elements of a Good Experiment

Blinding
Increases reliability of the results
_________-blind: subjects do not know the
treatment assignment
_________-blind: neither the subjects nor those
in contact with the subjects know the treatment
assignment


A pharmaceutical company has developed a new drug for treating

high blood pressure. To determine the effectiveness of the drug,
the company conducted an experiment in which subjects with a
history of high blood pressure were treated with the new drug.

A later experiment randomly divided subjects with a history of

high blood pressure into two groups. Group A was treated with
the new drug as before. Group B received the most popular drug
on the market at that time. The subjects were unaware of which
treatment they received. 60% of the patients in Group A
improved, while 63% of the patients in Group B improved.

The __________ experiment is better because


To investigate whether antidepressants help smokers to quit smoking,

one study used 429 men and women who were 18 or older and had
smoked 15 cigarettes or more per day in the previous year. They were
all highly motivated to quit and in good health. They were assigned to
one of two groups: one group took an antidepressant called Zyban,
while the other group did not take anything. At the end of a year, the
study observed whether each subject had successfully abstained from smoking.

Logic Behind Randomized Comparative Experiments
Randomization ensures that the groups of subjects
are similar in all respects before the treatments are applied.
Using a control group for comparison ensures that
external influences operate equally on both groups.
If the groups are large enough, natural differences in
subjects will average out.
This means that there should be little difference in the results
for the groups unless the treatments themselves
actually cause the difference.

Did You Know?

Observational studies can also have control groups.

These are called ______-________ studies.

The cases are people who have a certain disease or
condition, and the controls are people who do not have the
disease or condition.
Their purpose is to see if one of the explanatory variables is
related to the disease.
_________ from the beginning of these notes is an example
of a case-control study.

Important Points

Types of studies:

Observational studies and experiments

Experiments control for lurking variables

Sampling designs:

SRS, stratified random samples and cluster samples

SRS is the preferred method

Potential sources of bias:


Response bias

Nonresponse bias

Convenience sampling

Voluntary response sampling

Elements of good experiments:

Control group, randomization and blinding

Important Points
If a group is underrepresented in the sample, we cannot
make inference about it.
We must be careful when interpreting the results of
observational studies.
For comparison of several treatments to be valid, you must
apply all treatments to similar groups of experimental units.
Interesting questions are usually pretty tough to answer. This
is due in part to the fact that no single experiment or
observational study can determine causation.

Stop and Think!!!

Write the study!
Describe & classify the variables
Instruments for measurement?
Prepare to analyze data!