ANOVA is a statistical test used to demonstrate statistically significant differences between
the means of several groups. It is similar to Student's t-test except that ANOVA
allows the comparison of more than two means.
ANOVA assumes that the variable is normally distributed. The non-parametric
equivalents to this method are the Kruskal-Wallis analysis of ranks, the Median test,
Friedman's two-way analysis of variance, and Cochran's Q test.
It works by comparing the variance of the means. It distinguishes between within-group
variance (the variance within each sample) and between-group variance (the
variance between the separate sample means). The null hypothesis assumes that the
means of all the groups are the same, so that within-group variance is the same as
between-group variance. The test is based on the ratio of these two variances (known
as the F statistic).
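The within-group/between-group comparison above can be sketched in a few lines of Python. The three groups below are invented example data; in practice a library routine such as scipy.stats.f_oneway would normally be used instead of computing the sums of squares by hand.

```python
# Sketch of the one-way ANOVA F statistic computed by hand
# (invented example data for illustration only).
from statistics import mean

groups = [
    [23, 25, 27, 22, 26],   # group A
    [30, 32, 29, 31, 33],   # group B
    [24, 26, 25, 27, 23],   # group C
]

k = len(groups)                          # number of groups
n = sum(len(g) for g in groups)          # total number of observations
grand_mean = mean(x for g in groups for x in g)

# Between-group sum of squares: spread of the sample means
ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
# Within-group sum of squares: spread inside each sample
ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

# F = between-group variance / within-group variance
f_stat = (ss_between / (k - 1)) / (ss_within / (n - k))
print(round(f_stat, 2))   # 20.73 for this data: group B clearly differs
```

A large F (relative to the F distribution with k-1 and n-k degrees of freedom) leads to rejection of the null hypothesis that all group means are equal.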
Association and causation
Two variables are said to be associated when one is found more commonly in the
presence of the other.
There are three types of association.
Spurious - an association that has arisen by chance and is not real
Indirect - the association is due to the presence of another factor (a
confounding variable)
Direct - a true association not linked by a third (confounding) variable
Once the association has been established, the next question is whether the
association is causal.
In order to establish causation, the Bradford Hill causal criteria (1) are used; these
include:
Strength - the stronger the association, the more likely it is to be truly causal
Temporality - does the exposure precede the outcome?
Specificity - is the suspected cause associated with a specific outcome/disease?
Coherence - does the association fit with other biological knowledge?
Consistency - is the same association found in many studies?
(1) Bradford Hill. The Environment and Disease: Association or Causation?
Proceedings of the Royal Society of Medicine, 58 (1965), 295-300.
Bias
Bias describes the situation in a trial where one outcome is systematically favoured. It should be noted
that there is considerable variation in the definitions and classification of bias. The table below lists
some of the more common types of bias.
Selection bias
Error in assigning individuals to groups, leading to differences which may influence
the outcome. Subtypes include sampling bias, where the subjects are not
representative of the population. This may be due to volunteer bias. An example of
volunteer bias would be a study looking at the prevalence of Chlamydia in the student
population. Students who are at risk of Chlamydia may be more, or less, likely to
participate in the study. A similar concept is non-responder bias. If a survey on
dietary habits was sent out in the post to random households, it is likely that the
people who didn't respond would have poorer diets than those who did.
Other examples include:
loss to follow-up bias
prevalence/incidence bias (Neyman bias): occurs when a study is investigating a
condition that is characterised by early fatalities or silent cases. It results from
missed cases being omitted from calculations
admission bias (Berkson's bias): cases and controls in a hospital case-control
study are systematically different from one another because the combination of
exposure to risk and occurrence of disease increases the likelihood of being
admitted to the hospital
the healthy worker effect

Recall bias
Difference in the accuracy of the recollections retrieved by study participants,
possibly due to whether they have the disorder or not. E.g. a patient with lung cancer
may search their memories more thoroughly for a history of asbestos exposure than
someone in the control group. A particular problem in case-control studies.

Publication bias
Failure to publish results from valid studies, often because they showed a negative or
uninteresting result. Important in meta-analyses, where studies showing negative
results may be excluded.

Work-up bias (verification bias)
In studies which compare new diagnostic tests with a gold standard test, work-up bias
can be an issue. Sometimes clinicians may be reluctant to order the gold standard
test unless the new test is positive, as the gold standard test may be invasive (e.g.
tissue biopsy). This approach can seriously distort the results of a study and alter
values such as specificity and sensitivity. Sometimes work-up bias cannot be avoided;
in these cases it must be adjusted for by the researchers.

Expectation bias (Pygmalion effect)
Only a problem in non-blinded trials. Observers may subconsciously measure or
report data in a way that favours the expected study outcome.

Hawthorne effect
Describes a group changing its behaviour due to the knowledge that it is being
studied.

Late-look bias
Gathering information at an inappropriate time, e.g. studying a fatal disease many
years later when some of the patients may have died already.

Procedure bias
Occurs when subjects in different groups receive different treatment.

Lead-time bias
Occurs when two tests for a disease are compared and the new test diagnoses the
disease earlier, but there is no effect on the outcome of the disease.
Information bias
A form of bias that occurs when measurement of information differs among study
groups. Examples include recall bias, reporting bias, diagnostic bias, the Hawthorne
effect, and errors in measurement.
Confounding bias
Distortion of the exposure-disease relation by some other factor.
Clinical trial: phases
Clinical trials are commonly classified into 4 phases:

Phase I
Determines pharmacokinetics, pharmacodynamics and side-effects prior to larger
studies. Conducted on healthy volunteers.

Phase II
Assesses efficacy and dosage. Involves a small number of patients affected by the
particular disease. May be subdivided into:
IIa - assesses optimal dosing
IIb - assesses efficacy

Phase III
Assesses effectiveness. Involves a larger number of patients, typically comparing the
new treatment against existing treatments.

Phase IV
Post-marketing surveillance. Monitors for long-term effectiveness and side-effects.
Confidence interval and standard error of the mean
The confidence interval is a common and sometimes misunderstood principle in
medical statistics.
a formal definition may be: a range of values for a variable of interest
constructed so that this range has a specified probability of including the true
value of the variable. The specified probability is called the confidence level,
and the end points of the confidence interval are called the confidence limits*
in simpler terms: a range of values within which the true effect of intervention is
likely to lie
The likelihood of the true effect lying within the confidence interval is determined by
the confidence level. For example a confidence interval at the 95% confidence level
means that the confidence interval should contain the true effect of intervention 95%
of the time.
How is the confidence interval calculated?
The standard error of the mean (SEM) is a measure of the spread expected for the
mean of the observations, i.e. how close the calculated sample mean is likely to be to
the true population mean.
Key point
SEM = SD / square root (n)
where SD = standard deviation and n = sample size
therefore the SEM gets smaller as the sample size (n) increases
A 95% confidence interval:
lower limit = mean - (1.96 * SEM)
upper limit = mean + (1.96 * SEM)
The above formula is a slight simplification:
if a small sample size is used (e.g. n < 100) then it is important to use a
Student's t critical value lookup table to replace 1.96 with a different value
if a different confidence level is required, e.g. 90%, then 1.96 is replaced by a
different value. For 90% this would be 1.645
Results such as the mean value are often presented along with a confidence interval.
For example, in a study the mean height in a sample taken from a population is
183 cm. You know that the standard error (SE) (the standard deviation of the mean) is
2 cm. This gives a 95% confidence interval of 179-187 cm (+/- 2 SE).
*Last JM. A dictionary of epidemiology. Oxford: International Journal of Epidemiology,
1988
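As a worked sketch, the height example above can be reproduced in Python. The SD of 20 and sample size of 100 are hypothetical values chosen so that the SEM works out at 2 cm, matching the example in the text.

```python
# 95% confidence interval from the standard error of the mean,
# using SEM = SD / sqrt(n). The SD (20) and n (100) are hypothetical
# values chosen so that SEM = 2 cm, as in the height example.
import math

def confidence_interval(sample_mean, sd, n, z=1.96):
    """Return (lower, upper) limits: mean -/+ z * SEM."""
    sem = sd / math.sqrt(n)
    return sample_mean - z * sem, sample_mean + z * sem

lower, upper = confidence_interval(183, 20, 100)
print(round(lower, 2), round(upper, 2))   # 179.08 186.92
```

Rounded to the nearest centimetre this gives the 179-187 cm interval quoted above; with n < 100 the 1.96 would be swapped for the appropriate Student's t critical value.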
Confounding
In statistics confounding refers to a variable which correlates with other variables
within a study leading to spurious results.
For example:
a case-control study looks at whether low-dose aspirin can prevent colorectal
cancer
the proportion of people diagnosed with colorectal cancer who took aspirin is
compared to the proportion of people without colorectal cancer who took aspirin
if the case and control groups are not matched for age then age could be said
to be a confounding factor, as older people are more likely to take aspirin and
also more likely to develop cancer
In another example a study which finds that people who drink coffee are more likely to
develop heart disease. The confounding factor in this study is smoking. Smoking is
associated with both drinking coffee and heart disease. People who drink coffee are
also more likely to smoke. In this case smoking confounds the apparent relationship
between coffee and heart disease.
Confounding occurs when there is a non-random distribution of risk factors in the
populations. Age, sex and social class are common causes of confounding.
In the design stage of an experiment, confounding can be controlled by randomisation,
which aims to produce an even distribution of potential risk factors in the two populations.
In the analysis stage of an experiment, confounding can be controlled for by
stratification.
Correlation and linear regression
The terms correlation and regression are related but are not synonymous. Correlation
is used to test for association between variables (e.g. whether salary and IQ are
related). Once correlation between two variables has been shown, regression can be
used to predict values of the dependent variable from the independent variable.
Regression is not used unless the two variables have first been shown to correlate.
Correlation
The degree of correlation is summarised by the correlation coefficient (r). This
indicates how closely the points lie to a line drawn through the plotted data. In
parametric data this is called Pearson's correlation coefficient and can take any value
between -1 and +1.
For example:
r = 1 - strong positive correlation (e.g. systolic blood pressure always increases
with age)
r = 0 - no correlation (e.g. there is no correlation between systolic blood
pressure and age)
r = -1 - strong negative correlation (e.g. systolic blood pressure always
decreases with age)
Whilst correlation coefficients give information about how one variable may increase
or decrease as another variable increases they do not give information about how
much the variable will change. They also do not provide information on cause and
effect.
Correlation is summarised, when using parametric variables, by Pearson's correlation
coefficient (represented by a small r). For non-parametric variables, Spearman's
correlation coefficient is used instead, usually represented by the Greek letter
ρ (rho), or by rs.
In the case of dichotomous variables logistic regression is used. Linear (or simple
linear) regression is used when looking for association between two continuous
variables, and multiple regression is used when looking for association between more
than two continuous variables.
Linear regression
In contrast to the correlation coefficient, linear regression may be used to predict how
much one variable changes when a second variable is changed. A regression
equation may be formed, y = a + bx, where
y = the variable being calculated
a = the intercept value, when x = 0
b = the slope of the line or regression coefficient. Simply put, how much y
changes for a given change in x
x = the second variable
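The two steps (testing for correlation, then fitting y = a + bx) can be sketched by hand in Python. The data below are invented and perfectly linear, so the expected results are exact; real analyses would normally use library routines such as scipy.stats.pearsonr and scipy.stats.linregress.

```python
# Correlation then simple linear regression, computed by hand
# on invented, perfectly linear data (y = 1 + 2x).
from statistics import mean

x = [1, 2, 3, 4, 5]      # independent variable
y = [3, 5, 7, 9, 11]     # dependent variable, exactly y = 1 + 2x

mx, my = mean(x), mean(y)
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

r = sxy / (sxx * syy) ** 0.5      # Pearson's correlation coefficient
b = sxy / sxx                     # regression coefficient (slope)
a = my - b * mx                   # intercept, the value of y when x = 0

print(round(r, 2), round(a, 2), round(b, 2))   # 1.0 1.0 2.0
```

Here r = 1 confirms a (perfect) positive linear association, after which the fitted equation y = 1 + 2x can be used to predict y for a given x.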
Data types
Nominal
Observed values can be put into set categories which have no particular order or
hierarchy. You can count but not order or measure nominal data (for example
birthplace).

Ordinal
Observed values can be put into set categories which themselves can be ordered (for
example the NYHA classification of heart failure symptoms).

Discrete
Observed values are confined to certain values, usually a finite number of whole
numbers (for example the number of asthma exacerbations in a year).

Continuous
Data can take any value within a certain range (for example weight).

Binomial
Data may take one of two values (for example gender).

Interval
A measurement where the difference between two values is meaningful, such that
equal differences between values correspond to real differences between the
quantities that the scale measures (for example temperature).
Disease rates
A rate is a quantity measured with respect to another measured quantity (e.g. 60
miles an hour).
Disease rates are used to measure diseases, help establish causation, and monitor
interventions.
The attributable risk is the rate in the exposed group minus the rate in the
unexposed group. For example, the attributable risk for lung cancer in smokers is the
rate of lung cancer in smokers minus the rate of lung cancer in non-smokers.
Essentially it tells you what proportion of deaths in the exposed group were due to
the exposure.
The relative risk (RR) is the risk of an event relative to exposure. It is also known as
the risk ratio.
              Diseased    Not diseased
Exposed          a             b
Not exposed      c             d

The relative risk is calculated by the following formula:

RR = [a / (a + b)] / [c / (c + d)]
Interpreting the RR is simple: for example, if the RR of developing a side effect with
drug X compared to drug Y is 20, then people taking drug X would be 20 times more
likely to develop the side effect than people taking drug Y. (Note: the same cannot be
done with the odds ratio.)
A relative risk of 1 means there is no difference between the two groups.
A relative risk of <1 means that the event is less likely to occur in the exposed group.
A relative risk of >1 means that the event is more likely to occur in the exposed group.
The population attributable risk can be described as the reduction in incidence that
would be observed if the population were entirely unexposed. For instance how would
the incidence of lung cancer change if everyone stopped smoking? It can be
calculated by multiplying the attributable risk by the prevalence of exposure in the
population.
The attributable proportion is the proportion of the disease that would be eliminated
in a population if its disease rate were reduced to that of the unexposed group.
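The rates described above can be illustrated with a hypothetical 2x2 table; all counts below, and the 40% exposure prevalence, are invented purely for illustration.

```python
# Disease-rate calculations from a hypothetical 2x2 table:
#               diseased   not diseased
# exposed          a=30          b=70
# not exposed      c=10          d=90
a, b, c, d = 30, 70, 10, 90

rate_exposed = a / (a + b)        # 0.30
rate_unexposed = c / (c + d)      # 0.10

# RR = [a/(a+b)] / [c/(c+d)]
relative_risk = rate_exposed / rate_unexposed
# AR = rate in exposed minus rate in unexposed
attributable_risk = rate_exposed - rate_unexposed

# Population attributable risk = AR x prevalence of exposure,
# assuming (hypothetically) 40% of the population is exposed
prevalence_of_exposure = 0.4
population_attributable_risk = attributable_risk * prevalence_of_exposure

print(round(relative_risk, 2),
      round(attributable_risk, 2),
      round(population_attributable_risk, 3))   # 3.0 0.2 0.08
```

So in this invented example the exposed group has three times the risk, the exposure accounts for 20 extra cases per 100 exposed people, and removing the exposure would cut the population incidence by 0.08.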
Epidemiology (incidence and prevalence)
The terms incidence and prevalence are used to describe the frequency of a condition in a
population.
Incidence
A measure of new cases of a disease or other health outcome that develop in a
population of individuals at risk during a specified time period.
Incidence risk (aka cumulative incidence) = number of new cases in a specified time period /
number of disease-free people at the beginning of the time period (aka population at risk)

The problem with the incidence risk is that it does not account for people who are not present at
the end of the time period. For example, in a cohort study a number of people may refuse to
participate, migrate, or die. For that reason the incidence rate is sometimes used.

Incidence rate = number of new cases in a specified time period / total person-time at risk during
the follow-up period (aka person-years at risk)

The denominator in an incidence rate is the sum of each individual's time at risk (i.e. the length of
time they were followed up in the study) and is commonly expressed as person-years at risk.
Prevalence
A measure of the total number of cases of a disease in a population during or at a
specified time period.
Point prevalence = number of cases in a defined population / number of people in a defined
population at the same time
Period prevalence = number of identified cases during a specified period of time / total number of
people in that population
Key points
Prevalence = incidence x duration of condition
In chronic diseases the prevalence is much greater than the incidence
In acute diseases the prevalence and incidence are similar. For conditions such as the
common cold the incidence may be greater than the prevalence
Incidence is a useful measure to study disease etiology
Prevalence is useful for health resource planning
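The difference between incidence risk and incidence rate can be shown with a small hypothetical cohort; the follow-up times below are invented.

```python
# Incidence risk vs incidence rate for a hypothetical cohort of five
# disease-free people, two of whom develop the disease during follow-up.
follow_up_years = [5.0, 2.5, 5.0, 1.5, 4.0]   # each person's time at risk
new_cases = 2
people_at_start = len(follow_up_years)

# Incidence risk (cumulative incidence): new cases / population at risk
incidence_risk = new_cases / people_at_start

# Incidence rate: new cases / total person-years at risk
person_years = sum(follow_up_years)           # 18 person-years
incidence_rate = new_cases / person_years

print(incidence_risk)             # 0.4
print(round(incidence_rate, 3))   # 0.111 cases per person-year
```

The rate uses each individual's actual time under observation as the denominator, so people lost to follow-up contribute only the time for which they were genuinely at risk.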
Forest plots
A forest plot (aka a blobbogram) is a graphical display of a number of results from
different studies. It is the main method for illustrating the results of a meta-analysis.
The names of the trials are listed down the left-hand side, usually in chronological
order. On the right-hand side the results of the studies are shown as squares centred
on the point estimate of the result of each trial. The size of the square is proportional
to the weight of the study in the meta-analysis. The line running through the square
shows the confidence interval, usually at 95%.
The large vertical line is the line of no effect. Results with confidence intervals which
cross this line could potentially have mean values beyond this line and are therefore
non-significant. Beneath the individual trials is the summary result (i.e. the result of
the meta-analysis), represented by a diamond.
Funnel plot
A funnel plot is primarily used to demonstrate the existence of publication bias in
meta-analyses. Funnel plots are usually drawn with treatment effects on the horizontal
axis and study size on the vertical axis.
Interpretation
a symmetrical, inverted funnel shape indicates that publication bias is unlikely
conversely, an asymmetrical funnel indicates a relationship between treatment
effect and study size. This indicates either publication bias or a systematic
difference between smaller and larger studies ('small study effects')
Graphical representations of statistical data
The table below gives a brief summary of the main types of graphs used to represent statistical data.
Box-and-whisker plot
Graphical representation of the sample minimum, lower quartile, median, upper
quartile and sample maximum.

Funnel plot
Used to demonstrate the existence of publication bias in meta-analyses.

Histogram
A graphical display of continuous data where the values have been categorised into a
number of categories.

Forest plot
Usually found in meta-analyses; provides a graphical representation of the strength of
evidence of the constituent trials.

Scatter plot
Graphical representation using Cartesian coordinates to display values for two
variables for a set of data.

Kaplan-Meier survival plot
A plot of the Kaplan-Meier estimate of the survival function, showing decreasing
survival with time.
Hazard ratio
The hazard ratio (HR) is similar to the relative risk but is used when the risk is not
constant over time. It is typically used when analysing survival over time.
Incidence and prevalence
These two terms are used to describe the frequency of a condition in a population.
The incidence is the number of new cases per population in a given time period.
For example, if condition X has caused 40 new cases over the past 12 months per
1,000 of the population the annual incidence is 0.04 or 4%.
The prevalence is the total number of cases per population at a particular point in
time.
For example, imagine a questionnaire is sent to 2,500 adults asking them how much
they weigh. If 500 of the adults in this sample population were obese then the
prevalence of obesity would be 0.2 or 20%.
Relationship
prevalence = incidence * duration of condition
in chronic diseases the prevalence is much greater than the incidence
in acute diseases the prevalence and incidence are similar. For conditions such
as the common cold the incidence may be greater than the prevalence
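The two worked examples above reduce to simple arithmetic; the 5-year average duration used to illustrate the prevalence relationship is a hypothetical figure.

```python
# The two worked examples from the text as arithmetic.
annual_incidence = 40 / 1000   # 40 new cases per 1,000 over 12 months
prevalence = 500 / 2500        # 500 obese adults out of 2,500 surveyed

print(annual_incidence)   # 0.04, i.e. 4%
print(prevalence)         # 0.2, i.e. 20%

# Relationship: prevalence = incidence x duration of condition.
# E.g. incidence 0.04/year with a (hypothetical) average duration
# of 5 years gives a prevalence of about 0.2.
print(round(annual_incidence * 5, 2))
```

This also shows why chronic diseases (long duration) have a prevalence much greater than their incidence, while for short-lived acute conditions the two are similar.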
Intention to treat analysis
Intention to treat analysis is a method of analysis for randomised controlled trials in
which all patients randomly assigned to one of the treatments are analysed together,
regardless of whether or not they completed or received that treatment.
Intention to treat analysis is done to avoid the effects of crossover and dropout, which
may affect the randomisation to the treatment groups.
Key trials: diabetes mellitus
The following table summarises some of the key trials that have altered the approach to diabetes
mellitus:
UKPDS
The United Kingdom Prospective Diabetes Study was a seminal trial of over 5,000
patients with type 2 diabetes mellitus. Patients were followed for an average of 10
years to establish whether control of blood glucose levels was associated with clinical
benefits (reduced macrovascular and microvascular complications) and whether there
was an advantage to any particular type of drug treatment. UKPDS also had a blood
pressure control arm to establish whether this had an impact on complication rates.

Main results:
UKPDS confirmed the importance of tight glycaemic and blood pressure control
in type 2 diabetics
both macrovascular and microvascular complications were reduced in patients
with tight glycaemic control
DCCT
The Diabetes Control and Complications Trial involved 1,400 patients with type 1
diabetes mellitus in the US and Canada between 1983 and 1993.

Main results:
DCCT showed a significant reduction in microvascular complications for patients
who had tight glycaemic control
there was a higher incidence of hypoglycaemia in the group who had tight
glycaemic control

DREAM
The Diabetes REduction Assessment with ramipril and rosiglitazone Medication trial
looked at whether patients with impaired fasting glucose (IFG) and/or impaired
glucose tolerance (IGT) could be prevented from developing type 2 diabetes by using
either ramipril or rosiglitazone.

The study showed that the onset of type 2 diabetes may be delayed by rosiglitazone
therapy.
Key trials: hypertension
The following table summarises some of the key trials that have altered the approach to hypertension:
STOP-2
The 1999 Swedish Trial in Old Patients with Hypertension-2 study looked at whether
older drugs (beta-blockers or thiazides) or newer drugs (ACE inhibitors or calcium
channel blockers) were better at preventing fatal cardiovascular disease.

Main results:
old and new antihypertensive drugs were similar in the prevention of
cardiovascular mortality or major events
the decrease in blood pressure was the most important factor in the prevention
of cardiovascular events in this age group
supports the NICE approach of using older agents first-line in the elderly
population
ALLHAT
The Antihypertensive and Lipid-Lowering Treatment to Prevent Heart Attack Trial was
a large randomised controlled trial that was started in 1994 and reported in 2002.
ALLHAT compared amlodipine, chlorthalidone (a thiazide), lisinopril and doxazosin.
Over 40,000 patients aged 55 years or older who had hypertension with one other
risk factor (for example diabetes) were included in the trial.

ALLHAT is seen as a landmark trial due to its large size and inclusion of minority
groups such as people of Afro-Caribbean descent.

Main results:
chlorthalidone outperformed lisinopril in preventing cardiovascular disease, a
surprising finding which has been debated since, particularly in relation to the
large number of black patients in the trial (ACE inhibitors are known to be less
effective in this group)
the doxazosin arm was stopped prematurely due to a higher incidence of heart
failure
60% of patients reached the target blood pressure of 140/90 mmHg (it was
generally thought prior to the trial that blood pressure targets were more difficult
to achieve)
ASCOT
The 2003 Anglo-Scandinavian Cardiac Outcomes Trial (Blood Pressure Lowering
Arm) was a double-blinded, randomised controlled trial of around 20,000 patients with
hypertension and other risk factors. Patients were randomised to either atenolol (with
the addition of bendroflumethiazide if needed) or amlodipine (with the addition of
perindopril if needed). The primary outcome was non-fatal myocardial infarction (MI)
and fatal ischaemic heart disease (IHD).

Main results:
the study was stopped prematurely because of a higher death rate in the
atenolol-assigned group
the group receiving amlodipine-based regimens had a non-significant 10%
reduction in primary outcomes (non-fatal MI plus fatal IHD) and significant
reductions in nearly all secondary cardiovascular endpoints and new-onset
diabetes
the trial resulted in a major shift away from the use of beta-blockers in the
management of hypertension
Key trials: lipid management
The following table summarises some of the key trials which have altered the approach to lipid
management:
The 4S trial
The 1994 Scandinavian Simvastatin Survival Study was a double-blinded randomised
controlled trial looking at the secondary prevention of cardiovascular disease.
Patients who had ischaemic heart disease and a cholesterol between 5.5 and
8.0 mmol/l were given either simvastatin or a placebo.

Main results:
total mortality decreased by 30%, with death related to ischaemic heart disease
decreased by 42%
no increase in non-cardiovascular death
WOSCOPS
The 1995 West of Scotland Coronary Prevention Study was a randomised controlled
trial of men aged 45-64 years with no history of ischaemic heart disease and with a
raised cholesterol (> 6.5 mmol/l). Participants were given either pravastatin or a
placebo.

Main results:
total mortality decreased by 22%, with death related to ischaemic heart disease
decreased by 31%
Heart Protection Study
A large randomised controlled trial of just over 20,000 patients funded by the Medical
Research Council. Initial results were published in 2002. Patients were included if
they were between 40-80 years and were considered to have a substantial 5-year risk
of death from ischaemic heart disease due to a history of vascular disease or risk
factors such as diabetes or hypertension.

Patients were randomly allocated either simvastatin 40mg, antioxidants (600 mg
vitamin E, 250 mg vitamin C and 20 mg beta-carotene daily), placebo, or a
combination.

Main results:
number needed to treat (NNT) to prevent all-cause death = 57; NNT to prevent
death related to ischaemic heart disease = 85
NNT to prevent a vascular event = 19; NNT to prevent a major coronary event =
33; NNT to prevent a stroke = 73
vascular events were reduced by around 25%
antioxidants did not affect clinical outcome
Normal distribution
The normal distribution is also known as the Gaussian distribution or 'bellshaped'
distribution. It describes the spread of many biological and clinical measurements
Properties of the Normal distribution
symmetrical, i.e. mean = mode = median
68.3% of values lie within 1 SD of the mean
95.4% of values lie within 2 SD of the mean
99.7% of values lie within 3 SD of the mean
this is often reversed, so that within 1.96 SD of the mean lie 95% of the sample
values
the range from the mean - (1.96 * SD) to the mean + (1.96 * SD) is called the
95% confidence interval, i.e. if a repeat sample of 100 observations is taken
from the same group, 95 of them would be expected to lie in that range
Standard deviation
the standard deviation (SD) is a measure of how much dispersion exists from
the mean
SD = square root (variance)
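The percentages quoted above can be checked against the standard normal cumulative distribution function, which can be built from math.erf in the Python standard library.

```python
# Checking the normal-distribution rules of thumb: the fraction of
# values lying within k standard deviations of the mean.
import math

def norm_cdf(z):
    """Cumulative distribution function of the standard normal."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

for k in (1, 2, 3):
    coverage = norm_cdf(k) - norm_cdf(-k)
    print(k, round(100 * coverage, 1))   # 68.3, 95.4, 99.7

# The 'reversed' figure in the text: 1.96 SD covers 95% of values
print(round(100 * (norm_cdf(1.96) - norm_cdf(-1.96)), 1))   # 95.0
```

This is why 1.96 appears in the 95% confidence interval formula earlier in these notes.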
Numbers needed to treat and absolute risk
reduction
Numbers needed to treat (NNT) is a measure that indicates how many patients would
require an intervention to reduce the expected number of outcomes by one.

It is calculated by 1 / (absolute risk reduction) and is rounded up to the next highest
whole number.

Experimental event rate (EER) = (number who had the particular outcome with the
intervention) / (total number who had the intervention)

Control event rate (CER) = (number who had the particular outcome with the control) /
(total number who had the control)

Absolute risk reduction = CER - EER or EER - CER?
The absolute risk reduction (ARR) may be calculated by finding the absolute
difference between the control event rate (CER) and the experimental event rate
(EER). You will often find both versions of the above listed in different sources. In
some ways it doesn't matter which you use, as you will end up with the same answer,
but from a technical point of view:
if the outcome of the study is undesirable then ARR = CER - EER
if the outcome of the study is desirable then ARR* = EER - CER
*this may be more accurately termed absolute benefit increase, rather than absolute
risk reduction
In summary:
EER = number who had the outcome in the experimental group / total number in the
experimental group
CER = number who had the outcome in the control group / total number in the control
group
ARR = the absolute difference between CER and EER
NNT = 1 / ARR
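A minimal sketch of the calculation, using hypothetical trial counts (10 of 100 control patients had the undesirable outcome, versus 5 of 100 treated patients):

```python
# NNT from the absolute risk reduction, with hypothetical counts.
import math

cer = 10 / 100    # control event rate (undesirable outcome)
eer = 5 / 100     # experimental event rate

arr = cer - eer               # ARR = CER - EER for an undesirable outcome
nnt = math.ceil(1 / arr)      # rounded up to the next whole number

print(round(arr, 2), nnt)     # 0.05 20
```

So in this invented example, 20 patients would need to receive the intervention to prevent one additional outcome.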
Odds and odds ratio
Odds are a ratio of the number of people who incur a particular outcome to the
number of people who do not incur the outcome. The odds ratio may be defined as
the ratio of the odds of a particular outcome with the experimental treatment to the
odds with the control.
Odds vs. probability
In contrast, probability is the fraction of times you'd expect to see an event in many
trials. When expressed as a single number, probability is always between 0 and 1. So,
if we take the example of rolling a die:
the probability of rolling a six is 1/6 or 0.167
the odds of rolling a six are 1/5 or 0.2
The odds ratio is the usual reported measure in case-control studies. It approximates
to the relative risk if the outcome of interest is rare.
For example, if we look at a trial comparing the use of paracetamol for
dysmenorrhoea compared to placebo we may get the following results:

              Total number of patients    Achieved >= 50% pain relief
Paracetamol            60                            40
Placebo                90                            30
The odds of achieving significant pain relief with paracetamol = 40 / 20 = 2
The odds of achieving significant pain relief with placebo = 30 / 60 = 0.5
Therefore the odds ratio = 2 / 0.5 = 4
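The paracetamol example above can be reproduced directly:

```python
# Odds ratio for the paracetamol vs placebo example in the text.
def odds(events, total):
    """Odds = number with the outcome : number without it."""
    return events / (total - events)

odds_treatment = odds(40, 60)   # 40 relieved vs 20 not -> 2.0
odds_control = odds(30, 90)     # 30 relieved vs 60 not -> 0.5
odds_ratio = odds_treatment / odds_control

print(odds_treatment, odds_control, odds_ratio)   # 2.0 0.5 4.0
```

Note the contrast with probability: the probability of relief with paracetamol is 40/60, but the odds are 40/20.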
Pre- and post-test odds and probability
Pre-test probability
The proportion of people with the target disorder in the population at risk at a specific
time (point prevalence) or time interval (period prevalence).
For example, the prevalence of rheumatoid arthritis in the UK is 1%.
Post-test probability
The proportion of patients with that particular test result who have the target disorder.
Post-test probability = post-test odds / (1 + post-test odds)
Pre-test odds
The odds that the patient has the target disorder before the test is carried out.
Pre-test odds = pre-test probability / (1 - pre-test probability)
Post-test odds
The odds that the patient has the target disorder after the test is carried out.
Post-test odds = pre-test odds x likelihood ratio
where the likelihood ratio for a positive test result = sensitivity / (1 - specificity)
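Chaining these formulas together gives the full calculation. The 1% pre-test probability echoes the rheumatoid arthritis prevalence above; the sensitivity and specificity figures are hypothetical.

```python
# Pre-test probability -> pre-test odds -> post-test odds -> post-test
# probability, following the formulas above. The sensitivity (90%) and
# specificity (95%) are hypothetical figures for an imagined test.
pretest_prob = 0.01                     # e.g. 1% prevalence
sensitivity, specificity = 0.90, 0.95

pretest_odds = pretest_prob / (1 - pretest_prob)   # ~0.0101
lr_positive = sensitivity / (1 - specificity)      # ~18
posttest_odds = pretest_odds * lr_positive         # ~0.182
posttest_prob = posttest_odds / (1 + posttest_odds)

print(round(posttest_prob, 3))   # 0.154
```

Even with a fairly good test, a positive result here only lifts the probability of disease from 1% to about 15%, which is why pre-test probability matters so much when interpreting test results.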
PubMed Searching
PubMed searches can be refined by using multiple search terms connected by
Boolean operators: AND, OR, and NOT. Boolean operators must be entered in upper
case. The operator AND selects the references that contain both search terms, OR
selects the references that contain either search term, and NOT selects the
references that contain the first term but not the second term.
PubMed processes all Boolean operators in a left-to-right sequence unless closed off
by parentheses, e.g. depression AND (smoking OR unemployment).
Relative risk
Relative risk (RR) is the ratio of risk in the experimental group (experimental event
rate, EER) to risk in the control group (control event rate, CER). The term relative risk
ratio is sometimes used instead of relative risk.
To recap
EER = rate at which events occur in the experimental group
CER = rate at which events occur in the control group
For example, if we look at a trial comparing the use of paracetamol for
dysmenorrhoea with placebo, we may get the following results:
                Total number   Experienced significant
                of patients    pain relief
Paracetamol     100            60
Placebo         80             20
Experimental event rate, EER = 60 / 100 = 0.6
Control event rate, CER = 20 / 80 = 0.25
Therefore the relative risk ratio = EER / CER = 0.6 / 0.25 = 2.4
If the risk ratio is > 1 then the rate of an event (in this case experiencing significant
pain relief) is increased compared to controls. It is therefore appropriate to calculate
the relative risk increase if necessary (see below).
If the risk ratio is < 1 then the rate of an event is decreased compared to controls. The
relative risk reduction should therefore be calculated (see below).
Relative risk reduction (RRR) or relative risk increase (RRI) is calculated by
dividing the absolute risk change by the control event rate
Using the above data, RRI = (EER - CER) / CER = (0.6 - 0.25) / 0.25 = 1.4 = 140%
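The worked example can be reproduced in a few lines of Python; the numbers are taken directly from the table above:

```python
# Relative risk and relative risk increase, using the trial figures above.
eer = 60 / 100   # experimental event rate (paracetamol)
cer = 20 / 80    # control event rate (placebo)

relative_risk = eer / cer   # > 1, so the event rate is increased vs controls
rri = (eer - cer) / cer     # relative risk increase

print(relative_risk)  # 2.4
print(rri)            # 1.4, i.e. 140%
```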
Scatter graphs
Scatter graphs are used in correlation and regression analyses. They assist in
determining, visually, whether variables are associated. They may also show the
nature of a relationship, and can help to identify any outliers that may be
affecting the distribution.
An outlier is defined as a data point that emanates from a different model than do the
rest of the data. The data (on the scatter graph below) appear to come from a linear
model with a given line except for the outliers (Italy and to a lesser extent Spain) which
appear to have been generated from some other model.
Screening test statistics
Patients and doctors need to know if a disease or condition is present or absent. Tests can be used to
help us decide. Tests generally guide us by indicating how likely it is that the patient has the condition.
In order to interpret test results we need to have a working knowledge of the statistics used to describe
them.
Contingency tables (also known as 2 * 2 tables, see below) are used to illustrate and calculate test
statistics such as sensitivity. It would be unusual for a medical exam not to feature a question based
around screening test statistics. Commit the following table to memory and spend time
practising using it, as you will be expected to make calculations with it in your exam.
TP = true positive; FP = false positive; TN = true negative; FN = false negative
                Disease present   Disease absent
Test positive   TP                FP
Test negative   FN                TN
The table below lists the main statistical terms used in relation to screening tests:
Positive and negative predictive values are prevalence dependent. Likelihood ratios are not prevalence
dependent.
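As a sketch of how the 2 x 2 table is used, the function below derives the common screening statistics from the table's cells; the TP/FP/FN/TN counts are invented for illustration only:

```python
# Screening test statistics from a 2x2 contingency table.
# The cell counts used below are hypothetical illustration values.

def screening_stats(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)   # proportion of diseased correctly identified
    specificity = tn / (tn + fp)   # proportion of disease-free correctly identified
    ppv = tp / (tp + fp)           # positive predictive value (prevalence dependent)
    npv = tn / (tn + fn)           # negative predictive value (prevalence dependent)
    lr_positive = sensitivity / (1 - specificity)   # not prevalence dependent
    return sensitivity, specificity, ppv, npv, lr_positive

sens, spec, ppv, npv, lr = screening_stats(tp=80, fp=30, fn=20, tn=870)
print(round(sens, 2), round(spec, 2), round(ppv, 2), round(lr, 1))  # 0.8 0.97 0.73 24.0
```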
Precision
Precision quantifies a test's ability to produce the same measurement on repeated testing.
Screening: Wilson and Jungner criteria
1. The condition should be an important public health problem
2. There should be an acceptable treatment for patients with recognised disease
3. Facilities for diagnosis and treatment should be available
4. There should be a recognised latent or early symptomatic stage
5. The natural history of the condition, including its development from latent to
declared disease should be adequately understood
6. There should be a suitable test or examination
7. The test or examination should be acceptable to the population
8. There should be agreed policy on whom to treat
9. The cost of casefinding (including diagnosis and subsequent treatment of patients)
should be economically balanced in relation to the possible expenditure as a whole
10. Casefinding should be a continuous process and not a 'once and for all' project
Significance tests
A null hypothesis (H0) states that two treatments are equally effective (and is hence
negatively phrased). A significance test uses the sample data to assess how likely the
null hypothesis is to be correct.
For example:
'there is no difference in the prevalence of colorectal cancer in patients taking
lowdose aspirin compared to those who are not'
The alternative hypothesis (H1) is the opposite of the null hypothesis, i.e. there is a
difference between the two treatments.
The p value is the probability of obtaining a result by chance at least as extreme as
the one that was actually observed, assuming that the null hypothesis is true. It is
therefore equal to the chance of making a type I error (see below).
Two types of errors may occur when testing the null hypothesis
type I: the null hypothesis is rejected when it is true, i.e. showing a difference
between two groups when none exists (a false positive). This is determined
against a preset significance level (termed alpha). As the significance level is
determined in advance, the chance of making a type I error is not affected by
sample size. It is, however, increased if the number of endpoints is increased.
For example, if a study has 20 endpoints it is likely one of these will be reached
just by chance.
type II: the null hypothesis is accepted when it is false, i.e. failing to spot a
difference when one really exists (a false negative). The probability of making a
type II error is termed beta. It is determined by both sample size and alpha.
                    Study accepts H0        Study rejects H0
Reality: H0 true    Correct                 Type I error (alpha)
Reality: H0 false   Type II error (beta)    Correct
The power of a study is the probability of (correctly) rejecting the null hypothesis when
it is false, i.e. the probability of detecting a statistically significant difference
power = 1 - beta (the probability of a type II error)
power can be increased by increasing the sample size
Significance tests: types
The type of significance test used depends on whether the data is parametric
(something which can be measured, usually normally distributed) or non-parametric.
Parametric tests
Student's t-test (paired or unpaired*)
Pearson's product-moment correlation coefficient
Non-parametric tests
Mann-Whitney U test (unpaired data)
Wilcoxon signed-rank test (compares two sets of observations on a single
sample)
Chi-squared test (used to compare proportions or percentages)
Spearman's and Kendall's rank correlation
*paired data refers to data obtained from a single group of patients, e.g. measurements
before and after an intervention. Unpaired data comes from two different groups of
patients, e.g. comparing response to different interventions in two groups.
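To make the paired/unpaired distinction concrete, here is a minimal pure-Python sketch of the paired t statistic; the before/after scores are invented. The key point is that a paired test works on the within-patient differences:

```python
import math
import statistics

# Paired t statistic: analyse the within-patient differences.
# The before/after scores below are hypothetical illustration values.
before = [5, 6, 7, 8, 9]
after = [7, 7, 9, 10, 12]

diffs = [a - b for a, b in zip(after, before)]   # one difference per patient
mean_d = statistics.mean(diffs)
sd_d = statistics.stdev(diffs)                   # sample standard deviation of differences
t = mean_d / (sd_d / math.sqrt(len(diffs)))      # paired t statistic

print(round(t, 3))  # 6.325
```

An unpaired test would instead compare the two lists as independent samples, ignoring which 'before' belongs to which 'after'.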
Skewed distributions
Normal (Gaussian) distributions: mean = median = mode
Positively skewed distribution: mean > median > mode
Negatively skewed distribution: mean < median < mode
To remember the above, note that mean, median, mode are in alphabetical order; think
positive going forwards with '>' and negative going backwards with '<'.
Statistical terms: descriptive statistics
The table below gives a brief definition of commonly encountered terms:
Term     Description
Mean     The average of a series of observed values
Median   The middle value when a series of observed values is placed in order
Mode     The value that occurs most frequently within a dataset
Range    The difference between the largest and smallest observed value
Stats Histograms and bar charts
A bar chart is used to summarise ordinal or nominal data. Conventionally the x-axis
represents the categories (and therefore does not have a scale) and the y-axis the
frequencies. Columns are of equal width and the height of each bar indicates the
frequency.
Histograms are similar to bar charts except that the width of the columns can
differ. Histograms are used for quantitative data. In a histogram both axes have a
scale. The value on the y-axis is the relative frequency (aka the frequency density).
The area (not the height) of a column indicates the true frequency.
Stats Hypothesis testing
A null hypothesis (H0) states that two treatments are equally effective (and is hence
negatively phrased). A significance test uses the sample data to assess how likely the
null hypothesis is to be correct.
For example:
'there is no difference in the prevalence of colorectal cancer in patients taking
low-dose aspirin compared to those who are not'
The alternative hypothesis (H1) is the opposite of the null hypothesis, i.e. there is a
difference between the two treatments.
When the alternative hypothesis makes no prediction about the direction in which two
variables are related it is referred to as two-tailed. When it does make a
prediction about the direction of the relationship it is referred to as one-tailed.
Here is an example to clarify these terms. You wonder if people taking paper 1 of
the MRCPsych for the second time get higher marks. The null hypothesis would be:
candidates taking paper 1 for the second time get the same result as those
taking it for the first time (i.e. there is no difference). A one-tailed alternative
hypothesis might be: those taking paper 1 for the second time get higher marks than
those taking it for the first time. Whereas a two-tailed alternative hypothesis might be:
those taking the exam for the second time get different results to those taking it for the
first time.
The p value is the probability of obtaining a result by chance at least as extreme as
the one that was actually observed, assuming that the null hypothesis is true. It is
therefore equal to the chance of making a type I error (see below).
Two types of errors may occur when testing the null hypothesis
Type I: the null hypothesis is rejected when it is true, i.e. showing a difference
between two groups when none exists (a false positive). This is determined
against a preset significance level (termed alpha). As the significance level is
determined in advance, the chance of making a type I error is not affected by
sample size. It is, however, increased if the number of endpoints is increased.
For example, if a study has 20 endpoints it is likely one of these will be reached
just by chance.
Type II: the null hypothesis is accepted when it is false, i.e. failing to spot a
difference when one really exists (a false negative). The probability of making a
type II error is termed beta. It is determined by both sample size and alpha.
                    Study accepts H0        Study rejects H0
Reality: H0 true    Correct                 Type I error (alpha)
Reality: H0 false   Type II error (beta)    Correct
Power = 1 - beta (the probability of a type II error)
Power can be increased by increasing the sample size
To understand p values in more detail consider the following example. A researcher
wants to test whether or not a drug alters people's height. The obvious experiment is
to select two similar groups, administer a placebo to one group, the drug to the other,
measure the heights in both groups and compute a mean and standard deviation. The
mean heights will probably be different. The question then arises is the observed
difference due to the drug or to random variation?
To answer this question, a statistician would first assume that the drug has no
effect on a person's height (the null hypothesis). If the null hypothesis is true
then any difference observed is due to random variation. Now, in theory, the
statistician repeats the experiment using all possible samples and generates a range
of values. This gives an idea of how the difference varies. Most of these values
will be small, but a few samples will not be representative of the population and will
produce large results. These unrepresentative samples happen approximately 5% of
the time, so the statistician can set a cut-off point marking the top 5% of
values.
Having determined a cut-off point, an experiment is done with the drug in question
versus a placebo. If the result is above this cut-off point we can conclude that there is a
less than 5% chance of observing such a result if the null hypothesis (that the drug
has no effect) were true.
The p value is the probability of obtaining a result that is as large or larger when in
reality there is no difference between two groups.
A researcher must choose the point at which they reject the null hypothesis; this is
called the alpha level and is usually 5% or 1%.
If the p value is 0.03, that means that there is a 3% chance of observing a difference
as large as you observed even if the two population means are identical. It is tempting
to conclude, therefore, that there is a 97% chance that the difference you observed
reflects a real difference between populations and a 3% chance that the difference is
due to chance. Wrong. What you can say is that random sampling from identical
populations would lead to a difference smaller than you observed in 97% of
experiments and larger than you observed in 3% of experiments.
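The logic described above (repeat the experiment under the null hypothesis and see how often a result at least as extreme as the observed one arises) can be illustrated with a small permutation simulation. The heights and group sizes below are invented for illustration:

```python
import random
import statistics

# Permutation illustration of a p value: shuffle the group labels under the
# null hypothesis and count how often a difference at least as extreme as
# the observed one arises by chance. The data are hypothetical.
random.seed(0)
group_a = [171, 172, 174, 176, 178, 180]   # 'drug' group heights (cm)
group_b = [168, 169, 170, 171, 172, 173]   # 'placebo' group heights (cm)

observed = abs(statistics.mean(group_a) - statistics.mean(group_b))
pooled = group_a + group_b

extreme = 0
n_perm = 5000
for _ in range(n_perm):
    random.shuffle(pooled)                 # relabel subjects at random
    diff = abs(statistics.mean(pooled[:6]) - statistics.mean(pooled[6:]))
    if diff >= observed:
        extreme += 1

p_value = extreme / n_perm                 # proportion of shuffles at least as extreme
print(p_value)
```

A small p value here means that random relabelling rarely produces a difference as large as the one observed, mirroring the cut-off argument in the text.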
Stats Intention to treat analysis
Intention to treat analysis is a method of analysis for randomized controlled trials in
which all patients randomly assigned to one of the treatments are analysed together,
regardless of whether or not they completed or received that treatment.
Intention to treat analysis is done to avoid the effects of crossover and dropout, which
may affect the randomization to the treatment groups.
Stats Internal consistency
Internal consistency is the extent to which items on a test measure various aspects of
the same characteristic and nothing else.
There are four main ways to assess it:
Average interitem correlation
Average itemtotal correlation
Split half correlation
Cronbach's alpha
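Of the four, Cronbach's alpha is the most commonly quoted. A minimal sketch of the standard formula, alpha = (k / (k - 1)) x (1 - sum of item variances / variance of total scores); the item scores are invented:

```python
import statistics

# Cronbach's alpha from a respondents-by-items score matrix.
# The item scores below are hypothetical illustration values.
items = [
    [1, 2, 3, 4, 5],   # item 1: one score per respondent
    [2, 2, 3, 5, 5],   # item 2
    [1, 3, 3, 4, 4],   # item 3
]

k = len(items)
totals = [sum(scores) for scores in zip(*items)]   # total score per respondent
item_var_sum = sum(statistics.variance(item) for item in items)
alpha = (k / (k - 1)) * (1 - item_var_sum / statistics.variance(totals))

print(round(alpha, 3))  # 0.954 - items rise and fall together, so consistency is high
```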
Stats Kaplan-Meier curves
With some experiments, the outcome is a survival time, and you want to compare the
survival of two or more groups. Survival curves show, for each time plotted on the
x-axis, the proportion of all individuals surviving as of that time.
The term 'survival' is a bit misleading, as you can use survival curves to study the time
required to reach any well-defined endpoint (e.g. time to an episode of self-harm, time
to relapse in psychotic illness).
These survival curves are better known as Kaplan-Meier curves.
The graph below illustrates a Kaplan-Meier survival curve
The vertical green line illustrates the situation at day 80 of the study. At this point you
can see that 75% of group A, and 40% of group B have survived.
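A Kaplan-Meier curve is built as a running product: at each event time, survival is multiplied by (1 - deaths / number still at risk), while censored subjects simply leave the risk set. A minimal sketch with invented follow-up data (one subject per time point; tied times would need grouping):

```python
# Kaplan-Meier estimator sketch. (time, event) pairs: event = 1 for
# reaching the endpoint, 0 for censored. Data are hypothetical and
# assume distinct follow-up times.
follow_up = [(2, 1), (4, 0), (5, 1), (7, 1), (9, 0)]

at_risk = len(follow_up)
survival = 1.0
curve = []   # (time, estimated survival) at each event time
for time, event in sorted(follow_up):
    if event == 1:
        survival *= 1 - 1 / at_risk   # conditional probability of surviving this time
        curve.append((time, survival))
    at_risk -= 1                      # subject leaves the risk set either way

print(curve)
```

The censored subjects at times 4 and 9 reduce the number at risk without stepping the curve down, which is exactly what distinguishes Kaplan-Meier from simply dividing survivors by the starting total.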
Stats Kappa
The kappa statistic (aka Cohen's kappa coefficient) gives a quantitative measure of
the magnitude of agreement between observers.
Interobserver variation can be measured in any situation where two or more
independent observers are evaluating the same thing.
Kappa can take values up to a maximum of 1. A value of 1 implies complete agreement,
a value of 0 implies agreement no better than chance, and negative values imply
agreement worse than chance.
The Kappa statistic. Family Medicine 2005;37(5):360-3.
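For two raters making a yes/no judgement, kappa compares the observed agreement with the agreement expected by chance. A sketch with invented cell counts:

```python
# Cohen's kappa for two raters from a 2x2 agreement table.
# a = both say yes, d = both say no, b and c = disagreements.
# The counts below are hypothetical illustration values.

def cohens_kappa(a, b, c, d):
    n = a + b + c + d
    p_observed = (a + d) / n
    # Chance agreement: P(both yes) + P(both no), treating raters as independent
    p_chance = ((a + b) / n) * ((a + c) / n) + ((c + d) / n) * ((b + d) / n)
    return (p_observed - p_chance) / (1 - p_chance)

print(round(cohens_kappa(a=20, b=5, c=10, d=15), 2))  # 0.4
```

Here the raters agree 70% of the time, but because half of that agreement would be expected by chance, kappa is only 0.4.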
Stats Measures of central tendency
Descriptive statistics are used to describe the basic features of the data in a study. They are typically
distinguished from inferential statistics which help to form conclusions beyond the immediate data.
Descriptive statistics help us to simplify data.
Measures of central tendency
There are three measures of central tendency, the mean, median, and mode.
Measure   Description
Median    The median is the middle number of a set of numbers arranged in
          numerical order. It is not affected by outliers.
          It is calculated by arranging the values in order, then selecting the
          value with half the values above it and half below. For example, in
          the data set below the value 3 (the middle value) is the median:
          1, 3, 3, 4, 5
          Where there is an even number of values, the median is halfway
          between the middle two values. For example, in the data set below
          the median is halfway between 3 and 4, which is 3.5:
          1, 3, 3, 4, 5, 6
Mode      The mode is the most frequent value.
          Some data sets have two or more modes. For example, in the data
          set below the modal values are both 1 and 2:
          1, 1, 1, 2, 2, 2, 3, 4
Mean      The mean is calculated by adding all the scores together and
          dividing by the number of scores. For example, for the data set
          1, 2, 2, 2, 3:
          mean = (1 + 2 + 2 + 2 + 3) / 5 = 10 / 5 = 2
          Unlike the median or the mode, the mean is sensitive to a change in
          any value of the data set. The mean is sensitive to outliers and
          skewed data.
The Range is the difference between the largest and smallest observed value.
The table below summarises the appropriate method of summarising the middle or typical value of a
data set depending on the measurement scale.
Measurement scale                 Measure of central tendency
Categorical                       Mode
Nominal                           Mode
Ordinal                           Median or mode
Interval (normal distribution)    Mean (preferable), median, or mode
Interval (skewed data)            Median
Ratio (normal distribution)       Mean (preferable), median, or mode
Ratio (skewed data)               Median
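Python's standard statistics module implements all three measures directly; checking the example data sets used above:

```python
import statistics

# Measures of central tendency, using the example data sets from the text.
print(statistics.mean([1, 2, 2, 2, 3]))                 # 2
print(statistics.median([1, 3, 3, 4, 5]))               # 3 (middle value)
print(statistics.median([1, 3, 3, 4, 5, 6]))            # 3.5 (halfway between 3 and 4)
print(statistics.multimode([1, 1, 1, 2, 2, 2, 3, 4]))   # [1, 2] (two modes)
```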
Stats Measures of dispersion
Dispersion is an indication as to how much variation or spread there is across a data set. It
is usually used in conjunction with a measure of central tendency, such as the mean or
median, to provide an overall description of a set of data.
Range
The simplest measure of dispersion is the range, which is the difference between the
largest and smallest value.
Interquartile range
The Interquartile range (also called the mid spread) is equal to the difference between the
3rd and 1st quartiles.
The median divides the data into two halves. For a set of n ordered numbers the median
is the (n + 1) / 2 th value. In this case n = 11, so it is the 6th value.
Position:  1   2   3   4   5   6   7   8   9   10   11
Value:     62  65  67  71  72  74  74  88  89  108  125
The 1st quartile divides the bottom half of the data into two halves, and the 3rd quartile
divides the upper half of the data into two halves.
The 1st quartile is the (n + 1) / 4 th value, which in this case is number 3, i.e. 67.
The 3rd quartile is the 3(n + 1) / 4 th value, which in this case is number 9, i.e. 89.
The interquartile range of this data set is therefore 89 - 67 = 22.
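The (n + 1)/4 position rule used above can be written out directly; note that statistical packages use several different quartile conventions, which can give slightly different answers on small data sets:

```python
# Interquartile range using the (n + 1)/4 position rule from the text.
# Other software may use different quartile conventions.
data = sorted([62, 65, 67, 71, 72, 74, 74, 88, 89, 108, 125])
n = len(data)                           # 11 values

median = data[(n + 1) // 2 - 1]         # 6th value (1-based), index 5
q1 = data[(n + 1) // 4 - 1]             # 3rd value
q3 = data[3 * (n + 1) // 4 - 1]         # 9th value
iqr = q3 - q1

print(median, q1, q3, iqr)  # 74 67 89 22
```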
Quartiles and ranges are useful, but they are also somewhat limited because they do not
take into account every score in our group of data. To get a more representative idea of
spread we need to take into account the actual values of each score in a data set. The
variance and standard deviation are such measures.
Variance
The variance gives an indication as to the amount the values in the data set vary from the
mean. It is calculated by the following formula
Variance = Σ(X - μ)² / N
X = score (or value)
μ = mean
N = number of scores
As a measure of variability, the variance is useful. If the scores in a group of data are
spread out, the variance will be a large number. Conversely, if the scores are spread
closely around the mean, the variance will be a smaller number. However, there are two
potential problems with the variance. First, because the deviations of scores from the
mean are 'squared' (this is done to deal with the negative values), this gives more weight
to extreme scores. If our data contains outliers, this can give undue weight to these scores.
Secondly, the variance is not in the same units as the scores in our data set (variance is
measured in the units squared). This means we cannot place it on our frequency
distribution and cannot directly relate its value to the values in our data set. Calculating the
standard deviation rather than the variance rectifies this problem.
Standard deviation (SD)
The SD is used to reflect the distribution of individual scores around their mean. It
indicates how the data varies.
To calculate it simply take the square root of the variance.
A low standard deviation indicates that data points tend to be very close to the mean.
Unlike the variance, the standard deviation is expressed in the same units as the data set.
SDs can be used to indicate how confident we are that data points lie within a particular
range (this is however different to a confidence interval (CI) see below)
For example, if we did a study on the IQ of MRCPsych candidates we would calculate a
mean (hopefully above 100), and we could also calculate a standard deviation (by a
complicated formula that you do not need to know for the exams).
Using the standard deviation we could then say that we would expect that...
A range of one SD above and below the mean would include 68.2% of the values from the
study.
A range of two SDs above and below the mean would include 95.4% of the values from
the study.
A range of three SDs above and below the mean would include 99.7% of the values from
the study.
Standard error of the mean (SEM)
The standard error of the mean (SEM) is a measure of the spread expected for the mean
of the observations, i.e. how close the calculated sample mean is likely to be to the true
population mean.
SEM = SD / square root (n)
where SD = standard deviation and n = sample size
Therefore the SEM gets smaller as the sample size (n) increases
The SE (not the SD) is used to construct CIs. These CIs indicate the probability of the
population mean lying within a range of values.
So....
For example, for a 95% CI (± 1.96 x SE) we would be 95% confident that the population
mean lies within ± 1.96 standard errors of the sample mean.
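Putting the SD, SEM, and CI formulae together in Python; the data set is invented for illustration:

```python
import math
import statistics

# Sample SD, standard error of the mean, and a 95% CI for the mean.
# The data below are hypothetical illustration values.
data = [10, 12, 14, 16, 18]
n = len(data)

mean = statistics.mean(data)
sd = statistics.stdev(data)      # sample standard deviation
sem = sd / math.sqrt(n)          # SEM = SD / sqrt(n), shrinks as n grows
ci_low = mean - 1.96 * sem       # 95% CI = mean +/- 1.96 x SEM
ci_high = mean + 1.96 * sem

print(round(sem, 3), round(ci_low, 2), round(ci_high, 2))  # 1.414 11.23 16.77
```

Quadrupling the sample size halves the SEM, which is why larger studies give narrower confidence intervals.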
On its own the SE is pretty meaningless, as its real use lies in calculating CIs.
For a very helpful review of this complicated topic see, Streiner. Maintaining Standards:
Differences between the Standard Deviation and Standard Error, and When to Use Each.
Can J Psychiatry 1996;41:498-502.
Stats Normality testing
Parametric tests are based on the assumption that the data set is normally distributed.
Non-parametric tests make no such assumption but are less powerful.
There are a variety of tests used to check that a distribution is normally distributed
(only the best-known ones are listed):
The Kolmogorov-Smirnov (goodness-of-fit) test (when adapted specifically for
this purpose it is sometimes referred to as the Lilliefors test)
Jarque-Bera test
Shapiro-Wilk test
P-P plot
Q-Q plot
Note also that if a data set is not normally distributed, it can sometimes be
transformed to make it so (e.g. by taking the logarithm of the values).
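As an illustration of what these tests measure, the Jarque-Bera statistic combines skewness and kurtosis, which for a normal distribution are 0 and 3 respectively; large values of the statistic suggest non-normality. A minimal pure-Python sketch using population moments:

```python
# Jarque-Bera statistic: JB = n/6 * (S^2 + (K - 3)^2 / 4), where S is the
# skewness and K the kurtosis of the sample. Both deviate from 0 and 3
# (their normal-distribution values) when the data are non-normal.

def jarque_bera(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # variance (population form)
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6 * (skewness ** 2 + (kurtosis - 3) ** 2 / 4)

# A perfectly symmetrical data set has zero skewness, so only the
# kurtosis term contributes here
print(round(jarque_bera([1, 2, 3, 4, 5]), 3))  # 0.352
```

In practice the statistic is compared against a chi-squared distribution with 2 degrees of freedom to obtain a p value.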
Stats Odds ratio
Odds are calculated by dividing the number of times an event happens by the number
of times it does not happen.
For example...
If 2 in every 100 patients treated with an antipsychotic have a seizure then the odds of
developing a seizure if given an antipsychotic is 2/98 = 0.0204
Note: this is different to the risk which is the probability that an event will happen
divided by those at risk. In this case 2/100 = 0.02.
The odds ratio is calculated by dividing the odds of having been exposed to a risk
factor by the odds in the control group.
An odds ratio of 1 indicates no difference in risk between the two groups.
If the odds ratio >1 then the rate of that event is increased in people exposed to that
risk factor.
If the odds ratio <1 then the rate of that event is reduced in people exposed to that
risk factor.
Odds ratios are the usual reported measure in case-control studies. The odds ratio
approximates the relative risk if the outcome of interest is rare.
For example, if we look at a trial comparing the use of paracetamol for headache
with placebo, we may get the following results:
                Total number   Achieved >=50%
                of patients    pain relief
Paracetamol     60             40
Placebo         90             30
The odds of achieving significant pain relief with paracetamol = 40 / 20 = 2
The odds of achieving significant pain relief with placebo = 30 / 60 = 0.5
Therefore the odds ratio = 2 / 0.5 = 4
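The odds/risk distinction and the worked example can both be checked in a few lines of Python, using the numbers given above:

```python
# Odds vs risk, then the odds ratio from the trial figures above.
# Risk = events / total; odds = events / non-events.
risk_seizure = 2 / 100    # risk of a seizure on an antipsychotic: 0.02
odds_seizure = 2 / 98     # odds of the same event: ~0.0204

odds_paracetamol = 40 / 20   # 40 achieved relief, 20 did not
odds_placebo = 30 / 60       # 30 achieved relief, 60 did not
odds_ratio = odds_paracetamol / odds_placebo

print(odds_ratio)  # 4.0
```

Note how close risk (0.02) and odds (0.0204) are when the event is rare; this is why the odds ratio approximates the relative risk for rare outcomes.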
Stats Opportunity cost
Opportunity cost is an economic term used to help us compare choices.
It is defined as 'the value of the nextbest alternative that is forgone when an
economic choice is made'.
Basically it highlights the fact that when you only have a limited budget and you spend
your money on one thing (e.g. antidepressants) you cannot spend it on other things
(e.g. CBT). It also recognises that these alternatives have a value.
In summary you should think wisely about how to spend your money in order to get
the best value.
In medicine, opportunity costs are often compared using QALYs (quality-adjusted life
years).
Stats Parametric statistics
Parametric statistics is a branch of statistics that assumes data comes from a type of
probability distribution and makes inferences about the parameters of the distribution.
A parameter is a numerical quantity, usually unknown, that describes a certain
population characteristic. Within a population, a parameter is a fixed value that does
not vary. Examples include:
The true average height of human males
The median income in Egypt
Stats Power
The power of a study is the probability of (correctly) rejecting the null hypothesis when
it is false (i.e. it will not make a Type II error).
Basically we use the power to help us decide how many people need to be recruited in
a study in order to detect a clinically meaningful difference or effect.
Power can assume values between 0 and 1 (since probabilities are expressed by
numbers between 0 and 1 only). It is sometimes expressed as a percentage, with 0
referring to 0% and 1 referring to 100%.
Power is expressed as 1 - beta, where beta is the probability of a Type II error. A
power of 0.80 is often seen as the minimum acceptable level.
Power is influenced by the following:
Sample size (larger samples lead to parameter estimates with smaller variance
and therefore increase the study's ability to detect a significant effect)
Meaningful effect size (this has to be decided at the beginning of a study; it
is the size of the difference between two means that would lead you to reject the
null hypothesis)
Significance level (aka the alpha level, which is the probability of a type I
error)
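The interplay of these three factors can be sketched with the standard normal-approximation sample-size formula for comparing two means; the effect sizes passed in below are assumptions chosen purely for illustration:

```python
import math
from statistics import NormalDist

# Normal-approximation sample size per group for comparing two means:
# n = 2 * ((z_alpha/2 + z_beta) / d)^2, where d is the standardised
# effect size. Effect sizes below are illustrative assumptions.

def n_per_group(effect_size, alpha=0.05, power=0.80):
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)            # ~0.84 for 80% power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(effect_size=0.5))   # 63 per group for a medium effect
# A larger effect needs far fewer patients; a smaller one needs many more:
print(n_per_group(effect_size=0.8))
print(n_per_group(effect_size=0.2))
```

This makes the sample-size point concrete: halving the effect size you want to detect roughly quadruples the number of patients needed per group.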
Stats Publication Bias
Publication bias refers to the tendency for only studies with a positive result to be
published. This can lead to significant bias of the overall results.
Publication bias can be detected by the following graphical methods (Athanasiou
2009):
Funnel plot
Galbraith plot
Ordered forest plot
Normal quantile plot
Of these, the most commonly used and the one you absolutely must know about for
the exams is the funnel plot.
A funnel plot is a graph used to check for publication bias in systematic reviews and
metaanalyses. They are a form of scatter graph that offers an easy visual way of
making sure that the published literature is evenly weighted (drug companies have a
habit of withholding data that doesn't support the product).
Interpretation
A symmetrical, inverted funnel shape indicates that publication bias is unlikely
(see below)
Conversely, an asymmetrical funnel indicates a relationship between treatment
effect and study size. This indicates either publication bias or a systematic
difference between smaller and larger studies ('small study effects')
Athanasiou. Key Topics in Surgical Research and Methodology. Springer; 1st Edition,
17 Dec 2009.
Stats Scales of measurement
Variables can be divided into quantitative and qualitative types.
The following table summarises the various types; a more detailed explanation follows below.
Data type    Description
Nominal      Observed values can be put into set categories which have no
             particular order or hierarchy. You can count but not order or
             measure nominal data (for example birthplace)
Ordinal      Observed values can be put into set categories which themselves
             can be ordered (for example social class)
Interval     A measurement where the difference between two values is
             meaningful, such that equal differences between values correspond
             to real differences between the quantities that the scale
             measures (for example temperature)
Ratio        Like interval scales except they have true zero points
Discrete     Observed values are confined to certain values, usually a finite
             number of whole numbers (for example the number of depressive
             relapses in a year)
Continuous   Data can take any value within a certain range (for example
             weight)
Binomial     Data may take one of two values (for example gender)
Quantitative variables
Quantitative variables take on numeric values and can be further classified into discrete and continuous
types. A discrete variable is one whose values vary by specific finite steps (e.g. number of siblings). A
continuous variable on the other hand, can take any value. Quantitative variables can also be
subdivided into interval and ratio types.
Interval scales
On interval measurement scales, one unit on the scale represents the same magnitude on the trait or
characteristic being measured across the whole range of the scale. For example, if depression were
measured on an interval scale, then a difference between a score of 10 and a score of 11 would
represent the same difference in depression as would a difference between a score of 50 and a score
of 51. Interval scales do not have a true zero point, however, and therefore it is not possible to make
statements about how many times higher one score is than another. For a depression scale, it would
not be valid to say that a person with a score of 30 was twice as depressed as a person with a score of
15. A good example of an interval scale is the Fahrenheit scale for temperature. Equal differences on
this scale represent equal differences in temperature, but a temperature of 30 degrees is not twice as
warm as one of 15 degrees.
Ratio scales
Ratio scales are like interval scales except they have true zero points. A good example is the Kelvin
scale of temperature. This scale has an absolute zero. Thus, a temperature of 300 Kelvin is twice as
high as a temperature of 150 Kelvin. Other examples of ratio scales include weight, time, and length.
Qualitative variables
Qualitative variables do not take on numerical values and are usually names. Some qualitative
variables have an inherent order in their categories (e.g. social class) and are described as ordinal.
Qualitative variables are also called categorical or nominal variables (the values they take are
categories or names). When a qualitative variable has only two categories it is called a binary
(dichotomous or attribute) variable.
Stats Skewed data
In a data set of normally distributed data a bell-shaped curve is seen that is
symmetrical. In these situations the median, mode, and mean are all equal.
Skewness is a measure of the degree of asymmetry of a distribution. In a negative
skew the bulk of data is concentrated to the right of the figure and the left tail is longer
(this is referred to as left skew). In positive skew the opposite is true.
For skewed data the median is always positioned between the mode and the mean,
because it is the halfway point. The mode always corresponds to the peak of the
distribution as it represents the most common value. The mean however moves away
from the median in the direction of the tail because of its tendency to be affected by
extreme values (outliers).
For normally distributed data: mode = median = mean
For positively (right) skewed data: mean > median > mode
For negatively (left) skewed data: mode > median > mean
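These inequalities are easy to verify on a small positively skewed data set (the values are invented for illustration):

```python
import statistics

# A positively (right) skewed data set: a long tail of high values
# drags the mean above the median, which sits above the mode.
data = [1, 1, 2, 2, 2, 3, 3, 4, 8, 14]

mean = statistics.mean(data)      # pulled rightwards by the outliers 8 and 14
median = statistics.median(data)
mode = statistics.mode(data)

print(mean, median, mode)  # 4 2.5 2
assert mean > median > mode       # mean > median > mode for positive skew
```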
Stats Standards of reporting
You must be familiar with the following standards of publication:
CONSORT (Consolidated Standards of Reporting Trials) Guidelines for
randomised controlled trials
QUOROM (Quality Of Reporting Of Meta-analyses)
MOOSE (Meta-analysis Of Observational Studies in Epidemiology)
STROBE (Strengthening the Reporting of Observational Studies in
Epidemiology)
SQUIRE (Standards for QUality Improvement Reporting Excellence)
STARD (STAndards for the Reporting of Diagnostic accuracy)
Stats Statistical test comparisons
Questions on test choice in order to make comparisons are very tricky. Don't panic: equipped with a
basic understanding, you'll be able to select the right answer without fully understanding the statistics
which underpin it.
Statistical test comparisons are used to test hypotheses. If there is no hypothesis, then no statistical
test is appropriate.
When asked to choose the test used for making a comparison between two or more groups, first you
must ask the following questions:
What type of data is used?
How is the data distributed?
Is the data paired or unpaired?
How many groups are being compared?
Once you have answered these questions, use the table to select the test.
When assessing for correlation, use the following:
For parametric data - Pearson's product-moment coefficient
For non-parametric data - Spearman's rank, Kendall's rank
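To illustrate the difference between the two approaches, both coefficients can be computed in plain Python: Pearson's works on the raw values, while Spearman's is simply Pearson's applied to the ranks. This is a sketch with invented data only, and the rank function below does not average tied ranks as a full Spearman implementation would:

```python
import math

def pearson(x, y):
    """Pearson's product-moment correlation coefficient (parametric)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def ranks(values):
    """Rank each value, 1 = smallest (no averaging of ties in this sketch)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result

def spearman(x, y):
    """Spearman's rank correlation: Pearson's applied to the ranks."""
    return pearson(ranks(x), ranks(y))

# Illustrative data: y rises monotonically but non-linearly with x,
# so Spearman's (rank-based) coefficient is perfect while Pearson's is not
x = [1, 2, 3, 4, 5]
y = [1, 2, 4, 8, 16]
print(round(pearson(x, y), 3))   # 0.933
print(round(spearman(x, y), 3))  # 1.0
```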
Stats Study design
There are two main types of study design each with their own subtypes.
1) Experimental (where an intervention is used)
Randomised trial
Non-randomised trial
2) Observational (no intervention, observation only)
Cohort study
Case-control study
Cross-sectional study
Ecological study
Case series
The table below lists the major types of studies along with their key advantages and disadvantages.
Numerous systems for appraising the quality and strength of study designs have been
developed. The consensus for the hierarchy of study designs is as follows (strongest at
top):
Systematic review/meta-analysis
Randomised controlled trial
Non-randomised controlled trial
Cohort studies
Case-control studies
Cross-sectional surveys
Case series
Case report
Expert opinion
Stats Study design: evidence and recommendations
There is a hierarchy of evidence. Unfortunately there are a lot of different classification systems,
which creates inconsistencies with regard to questions. Internet forums frequently discuss this topic,
with some claiming one thing and others another.
We provide classification systems which are most consistent with UK practice and so are most
likely to be relevant to the exams. You will find other online exam sites come up with different
answers.
The table below lists the hierarchy as it is considered by the NHS Centre for Reviews and
Dissemination (CRD).
Level of evidence - Description
1 - Experimental studies (e.g. RCTs and meta-analyses)
2 - Quasi-experimental studies (i.e. controlled trials which are not randomised)
3 - Controlled observational studies (cohort and case-control studies)
4 - Observational studies without a control group
5 - Expert opinion based on consensus
In addition, NICE publish a grading of evidence as follows.
Grading of recommendation - Description
A - Based on level 1 evidence (i.e. RCTs)
B - Based on level 2 or 3 evidence
C - Based on evidence from a panel of experts
Stats Study designs for new drugs
When a new drug is launched there are a number of options available in terms of
study design. One option is a placebo controlled trial. Whilst this may provide robust
evidence it may be considered unethical if established treatments are available and it
also does not provide a comparison with standard treatments.
If a drug is therefore to be compared to an existing treatment, a statistician will need to
decide whether the trial is intended to show superiority, equivalence or non-inferiority:
Superiority: whilst this may seem the natural aim of a trial, one problem is the
large sample size needed to show a significant benefit over an existing
treatment
Equivalence: an equivalence margin is defined (-delta to +delta) on a specified
outcome. If the confidence interval of the difference between the two drugs lies
within the equivalence margin then the drugs may be assumed to have a similar
effect
Non-inferiority: similar to equivalence trials, but only the lower confidence
interval needs to lie within the equivalence margin (i.e. above -delta). Smaller sample
sizes are needed for these trials. Once a drug has been shown to be non-inferior,
large studies may be performed to show superiority
It should be remembered that drug companies may not necessarily want to show
superiority over an existing product. If it can be demonstrated that their product is
equivalent or even noninferior then they may compete on price or convenience.
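The decision rules above can be sketched as a simple check on the confidence interval of the difference between the two drugs. The function name, margin and interval values below are all hypothetical, for illustration only:

```python
def classify_trial(ci_lower, ci_upper, delta):
    """Classify a result from the 95% CI of the difference
    (new drug minus comparator) against an equivalence margin delta."""
    if ci_lower > -delta and ci_upper < delta:
        return "equivalence shown"      # whole CI lies inside (-delta, +delta)
    if ci_lower > -delta:
        return "non-inferiority shown"  # only the lower bound must clear -delta
    return "neither shown"

# Hypothetical trial results, with an equivalence margin of delta = 2
print(classify_trial(-1.5, 1.5, 2.0))  # equivalence shown
print(classify_trial(-0.5, 3.0, 2.0))  # non-inferiority shown
```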
Stats T-tests
A t-test (aka Student's t-test) is used to assess whether the means of two groups have
a statistically significant difference.
There are 3 types of t-test:
One sample t-test. This is used to see if there is a difference between a
sample mean and a hypothesised population mean (or claimed mean). For
example, imagine a farmer was selling bags of fertiliser and claimed that each
would cover 14 acres of land (claimed mean). If you then bought 10 bags from
him (sample of 10), tested each bag, and produced your own mean of 13
acres (sample mean), you would compare these means with a one sample t-test.
Independent t-test. This test is used when you want to compare two means
from independent groups. For example, a study is conducted to see if vitamin
tablets cause children to grow. 100 children are split into two groups. One
group is given vitamins and the other a placebo. The mean height of each
group is then taken. The means are then compared with an independent t-test.
Paired t-test. This is used when comparing the means of two groups that are
considered to be paired (matched, or dependent). An example would be a
study that first takes the blood pressure from a group of people, then gives them
a drug, and then repeats the blood pressure measurements. This will produce
two means from the same sample. Another example would be a study that tried
to prove that people have bigger right hands than left hands. A sample would be
taken and the means of all the right hands and left hands would be calculated and
compared. The aspect that makes the variables paired is that the left and right
hands are taken from the same person.
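The fertiliser example can be sketched as a one sample t statistic in plain Python. The ten bag measurements are invented so that the sample mean comes out at 13 acres; the standard library has no t distribution, so the resulting statistic would still need to be compared against t tables with n - 1 = 9 degrees of freedom:

```python
import math

def one_sample_t(sample, claimed_mean):
    """One sample t statistic: (sample mean - claimed mean) / standard error."""
    n = len(sample)
    xbar = sum(sample) / n
    variance = sum((x - xbar) ** 2 for x in sample) / (n - 1)  # sample variance
    std_error = math.sqrt(variance / n)
    return (xbar - claimed_mean) / std_error

# Hypothetical acreage covered by each of the 10 bags (sample mean = 13)
bags = [12, 13, 14, 12.5, 13.5, 13, 12, 14, 13, 13]

t = one_sample_t(bags, claimed_mean=14)
print(round(t, 2))  # -4.47
```

The large negative t value reflects how far the sample mean of 13 falls below the claimed mean of 14 relative to the sampling variability.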
Stats The normal distribution
The normal distribution is also known as the Gaussian distribution or 'bell-shaped'
distribution. It describes the spread of many biological and clinical measurements.
Properties of the Normal distribution
Symmetrical i.e. Mean = mode = median
68.3% of values lie within 1 SD of the mean
95.4% of values lie within 2 SD of the mean
99.7% of values lie within 3 SD of the mean
This is often reversed, so that within 1.96 SD of the mean lie 95% of the sample
values
The range from the mean - (1.96 * SD) to the mean + (1.96 * SD) is called the 95%
confidence interval, i.e. if a repeat sample of 100 observations is taken from
the same group, 95 of them would be expected to lie in that range
Standard deviation
The standard deviation (SD) represents the average distance that each
observation in a sample lies from the sample mean
SD = square root (variance)
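These properties can be sketched with the standard library (the sample values below are invented for illustration):

```python
import statistics

# Hypothetical, roughly normally distributed measurements
values = [4.8, 5.1, 4.9, 5.0, 5.2, 5.0, 4.7, 5.3, 5.0, 5.0]

mean = statistics.mean(values)
sd = statistics.stdev(values)  # SD = square root of the sample variance

# 95% of values are expected to lie within 1.96 SD of the mean
lower = mean - 1.96 * sd
upper = mean + 1.96 * sd
print(f"mean = {mean:.2f}, SD = {sd:.2f}, 95% range = ({lower:.2f}, {upper:.2f})")
```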
Stats Treatment effects
Absolute risk reduction / increase
The absolute risk reduction (also termed risk difference) is the difference between the
absolute risk of an event in the intervention group and the absolute risk in the control group.
When working with treatment effects such as absolute and relative risk it is helpful to
construct a 2x2 table as follows:

                         Outcome present    Outcome absent
Intervention / exposed          a                 b          EER = a / (a + b)
Control / not exposed           c                 d          CER = c / (c + d)
Absolute risk reduction (ARR) is given by the following formula
ARR = CER - EER
For example, if we look at a trial comparing the use of olanzapine for psychosis compared to
placebo we get the following results.

                            Achieved remission    Did not achieve remission
                            (outcome present)     (outcome absent)
Olanzapine (intervention)          60                      40
Placebo (control)                  20                      60

ARR = CER - EER
EER = a / (a + b)
    = 60 / 100
    = 0.6
CER = c / (c + d)
    = 20 / 80
    = 0.25
ARR = CER - EER
    = 0.25 - 0.6
    = -0.35 (you can ignore the minus sign as we are interested in the absolute difference)
    = 35%
The absolute risk reduction of 35% means that there was a risk difference of 35% between
the two groups.
Relative risk
Relative risk (RR) is the ratio of risk in the intervention (experimental) group to the risk in the
control group.
RR = EER / CER
EER = rate at which events occur in the experimental group
CER = rate at which events occur in the control group
For example, if we look at the same trial
RR = EER / CER
EER = a / (a + b)
= 60 / 100
= 0.6
CER = c / (c + d)
    = 20 / 80
    = 0.25
RR = EER / CER
= 0.6 / 0.25
= 2.4
A relative risk of 2.4 means that patients in the intervention group were 2.4 times more likely
to achieve remission than those who received a placebo.
If the risk ratio is > 1 then the rate of an event (in this case achieving remission) is
increased compared to controls. It is therefore appropriate to calculate the relative
risk increase if necessary (see below).
If the risk ratio is < 1 then the rate of an event is decreased compared to controls. The
relative risk reduction should therefore be calculated (see below).
Relative risk reduction
The relative risk reduction (RRR) is the proportion by which the intervention reduces the
event rate.
Relative risk reduction (RRR) or relative risk increase (RRI) is calculated by dividing the
absolute risk reduction (risk difference) by the control event rate
RRR = ARR / CER
RRR = (CER - EER) / CER
Using the above data
RRI = (EER - CER) / CER = (0.6 - 0.25) / 0.25 = 1.4 = 140%
Numbers needed to treat
The number needed to treat (NNT) is the number of patients who need to be treated for one
to benefit.
The ARR is the reciprocal of the NNT and vice versa
ARR = 1 / NNT
or
NNT = 1 / ARR
For example, using the same trial
The ARR = 0.35
The NNT = 1 / ARR
= 1 / 0.35
= 2.9 (1 dp)
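The whole worked example can be reproduced in a few lines of Python, using the 2x2 cell counts implied by the event rates quoted above (60 of 100 on the intervention and 20 of 80 controls achieved remission):

```python
# 2x2 cell counts from the olanzapine vs placebo example
a, b = 60, 40   # intervention: achieved / did not achieve remission
c, d = 20, 60   # control:      achieved / did not achieve remission

eer = a / (a + b)        # experimental event rate = 0.6
cer = c / (c + d)        # control event rate = 0.25

arr = abs(cer - eer)     # absolute risk reduction (risk difference) = 0.35
rr = eer / cer           # relative risk = 2.4
rri = (eer - cer) / cer  # relative risk increase = 1.4, i.e. 140%
nnt = 1 / arr            # number needed to treat

print(rr, round(rri, 2), round(nnt, 1))  # 2.4 1.4 2.9
```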
Stats Validity
Validity refers to the extent to which something measures what it claims to measure. There are several
different types of validity (see below).
Validity subtype - Description

Face validity - Face validity refers to the general impression of a test. A test has face
validity if it appears to test what it is meant to.

Content validity - Content validity refers to the extent to which a test or measure
assesses the full content of a subject or area. For example, if a test is designed to help
diagnose depression, it would have poor content validity if it only asked about
psychological symptoms and neglected biological ones.

Criterion validity - A test has good criterion validity if it is useful for predicting
something.

Criterion validity (concurrent)* - In concurrent validation, the predictor and criterion
data are collected at or about the same time.

Criterion validity (predictive)* - In predictive validation, the predictor scores are
collected first and criterion data are collected at some later/future point.

Construct validity - The extent to which a test measures the construct it aims to
measure.

Construct validity (convergent)* - A test has convergent validity if it has a high
correlation with another test that measures the same construct.

Construct validity (divergent)* - A test's divergent validity is demonstrated through a
low correlation with a test that measures a different construct.

Internal validity - Internal validity is the confidence that we can place in the cause
and effect relationship in a study.

External validity - External validity is the degree to which the conclusions in a study
would hold for other persons in other places and at other times, i.e. its ability to
generalise.
* Note that concurrent and predictive validity are subcategories of criterion validity and that convergent
and divergent validity are subcategories of construct validity.
Stats Variables
The three main variables in a typical study are the independent, dependent and
controlled variables.
The independent variable is something that the experimenter purposely changes over
the course of the investigation. The dependent variable is the one that is observed
and changes in response to the independent variable. During the experiment all other
variables should be controlled. The variables that are not changed are called controlled
variables.
Dependent variables are affected by independent variables but not by controlled
variables as these do not vary throughout the study.
Study design
The following table highlights the main features of the main types of study:
Study type - Key features

Randomised controlled trial - Participants randomly allocated to intervention or
control group (e.g. standard treatment or placebo). Practical or ethical problems may
limit use.

Cohort study - Observational and prospective. Two (or more) groups are selected
according to their exposure to a particular agent (e.g. medicine, toxin) and followed
up to see how many develop a disease or other outcome. The usual outcome measure
is the relative risk. Examples include the Framingham Heart Study.

Case-control study - Observational and retrospective. Patients with a particular
condition (cases) are identified and matched with controls. Data is then collected on
past exposure to a possible causal agent for the condition. The usual outcome
measure is the odds ratio. Inexpensive and produces quick results; useful for studying
rare conditions; prone to confounding.

Cross-sectional survey - Provides a 'snapshot', sometimes called a prevalence study.
Provides weak evidence of cause and effect.
Study design: evidence and recommendations
Levels of evidence
Ia - evidence from meta-analysis of randomised controlled trials
Ib - evidence from at least one randomised controlled trial
IIa - evidence from at least one well-designed controlled trial which is not
randomised
IIb - evidence from at least one well-designed experimental trial
III - evidence from case, correlation and comparative studies
IV - evidence from a panel of experts
Grading of recommendation
Grade A - based on evidence from at least one randomised controlled trial (i.e. Ia or Ib)
Grade B - based on evidence from non-randomised controlled trials (i.e. IIa, IIb or III)
Grade C - based on evidence from a panel of experts (i.e. IV)