Documente Academic
Documente Profesional
Documente Cultură
Frank
Harrell
Chair
of
Biosta2s2cs,
Vanderbilt
Regression
Modeling
Strategies
rst
edi2on
2001,
revised
2011
Design
package
in
R
one
of
the
most
widely
used
and
abused
of
all
data
analysis
techniques
very
commonly
employed
for
reasons
of
developing
a
concise
model
or
of
a
false
belief
that
it
is
not
legi2mate
to
include
insignicant
regression
coecients
when
presen2ng
results
to
the
intended
audience
-
Frank
Harrell
a
very
popular
technique
for
many
years,
but
if
it
had
just
been
proposed
as
a
sta2s2cal
method,
it
would
most
likely
be
rejected
because
it
violates
every
principle
of
sta2s2cal
es2ma2on
and
hypothesis
tes2ng
-
Frank
Harrell
causes
severe
biases
in
the
resul2ng
mul2variable
model
ts
while
losing
valuable
predic2ve
informa2on
from
dele2ng
marginally
signicant
variables.
-
Frank
Harrell
personally,
I
would
no
more
let
an
automa2c
rou2ne
select
my
model
than
I
would
let
some
best-t
procedure
pack
my
suitcase.
-
Ronan
Conroy,
biosta2s2cian,
RCS
(Ireland)
treat
all
claims
based
on
stepwise
algorithms
as
if
they
were
made
by
Saddam
Hussein
on
a
bad
day
with
a
headache
having
a
friendly
chat
with
George
Bush.
-
Steve
Blinkhorn,
psychometrician,
author
of
one
of
Natures
magnicent
seven
in
2003
I
don't
know
what
knowledge
we
would
lose
if
all
papers
using
stepwise
regression
were
to
vanish
from
journals
at
the
same
2me
as
programs
providing
their
use
were
to
become
terminally
virus-laden.
-
Ira
Bernstein,
professor
of
Clinical
Sciences,
UTSW
regression
coecients
are
biased
too
high,
even
with
a
single
predictor:
a
predictor
is
more
likely
to
be
included
if
coecient
is
overes2mated
a
predictor
is
less
likely
to
be
included
if
coecient
is
underes2mated
mainly
relevant
for
marginally-signicant
predictors
mul2ple
comparisons
say
we
test
for
age,
gender,
race,
class
using
p
<
.05
for
each
assuming
predictors
are
independent
and
have
no
real
eect,
chance
of
nding
one
or
more
signicant
predictors
is
.185
using
Sidak
correc2on,
p
<
.013
the
chance
of
nding
one
or
more
spuriously
signicant
predictors
is
.05
if
shing,
dont
hide
it
in
repor2ng
results
accurate
inference
is
based
on
all
candidates
0.505
0.510
0.515
0.520
0.525
0.530
0.535
0.540
0.545
0.550
1
0.9
0.8
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.500
0.505
0.510
0.515
0.520
0.525
0.530
0.535
0.540
0.545
0.550
0.550
0.540
0.530
0.520
0.510
0.500
0.500
0.505
0.510
0.515
0.520
0.525
0.530
0.535
0.540
0.545
0.550
0.550
0.540
0.530
0.520
0.510
0.500
0.500
0.505
0.510
0.515
0.520
0.525
0.530
0.535
0.540
0.545
0.550
0.505
0.510
0.515
0.520
0.525
0.530
0.535
0.540
0.545
0.550
.047
.142
.436
0.500
0.505
0.510
0.515
0.520
0.525
0.530
0.535
0.540
0.545
0.550
0.500
0.505
0.510
0.0
0.043
0.1
0.145
0.2
0.3
0.394
0.4
0.5
0.0
0.063
0.1
0.151
0.2
0.3
0.418
0.4
0.5
0.500
0.505
0.510
data
reduc2on
combine
collinear
variables
into
one
variable
thanks
to:
Kyle
Gorman
Sali
Tagliamonte
workshop
par2cipants