
Basic statistics for research

Session 1:

Descriptive Statistics, Confidence Intervals and


Test of Hypothesis

Mizanur Khondoker

Department of Biostatistics
Institute of Psychiatry, King’s College London

29 September 2011
Outline

Session 1
1.  Introduction to statistics

2.  Descriptive and inferential statistics

3.  Descriptive Statistics: Measures of centre and variability

4.  Inferential Statistics: Confidence intervals and test of hypothesis

5.  Variable types and measurement scales

6.  Choice of statistical test: Parametric and non-parametric tests

Learning outcome

At the end of this session you should be able to:

1.  Distinguish between descriptive and inferential statistics

2.  Understand the concepts of confidence intervals and tests of hypothesis

3.  Construct confidence intervals and carry out two-sample t-tests on different data sets

4.  Interpret the results of confidence intervals and two-sample t-tests


5.  Classify variables according to their types, and choose an appropriate statistical test for your data

Introduction to statistics

What is statistics?

•  Statistics is a science that deals with the collection, analysis, interpretation and presentation of numerical data

•  It is applicable to a wide variety of academic disciplines, from the physical and social sciences to the humanities

•  Statistics is also
•  used for making informed decisions, and
•  misused intentionally or accidentally.

Descriptive and Inferential Statistics

•  Descriptive statistics consists of procedures used to summarise and describe the important characteristics of a set of measurements

•  Examples of descriptive statistics are:

"  Sample mean: average height of a random sample of participants


of this course is 170 cm

"  Sample proportion: around 45% of a random sample of the UK


working population are female.

Descriptive and Inferential Statistics (cont’d.)

•  Inferential statistics consists of procedures used to make inferences about the population characteristics based on the sample data
•  The objective of inferential statistics is to make inferences, i.e.,

"  Draw conclusions, make predictions, make decisions about the


population characteristics

•  Examples of inferential statistics are:

"  Testing hypothesis: a t-test comparing the hippocampal volumes


between the controls and patients was significant at 5% level

"  Confidence interval: the 95% confidence interval for the mean reduction
(%) in right hippocampal volume was (3.8,13.3).

Descriptive statistics

•  The most commonly used descriptive measures for quantitative data are:

"  Measures of centre: e.g., mean, median

"  Measures of spread/variability: e.g., Inter quartile range,


standard deviation

•  Relatively few descriptive measures are available for nominal or categorical data:

"  The proportion of a specified category of a categorical variable (e.g., the proportion of dementia cases) is the simplest and most obvious choice in most applications
Measures of centre
•  Measures of centre are descriptive statistics that give an idea about
the location of a set of measurements

•  The two most commonly used measures of centre are:

"  The mean or average – defined as the sum of a set of measurements divided by the number of measurements

"  Mathematically mean (m) for a set of n measurements (xi) is given by:
n
∑ xi
i =1
m=
n
"  The median – is a value that falls in the middle position when the
measurements are ordered from smallest to largest

"  For even number of measurements, there will be two middle values, and
the median is estimated as the average of the two values. 8
Measures of centre (cont’d.)
•  For example, consider a set of alcohol misuse scores measured on
a random sample of n = 5 violent offenders: 2, 9, 11, 5, and 6
•  To find the sample mean we calculate:
Mean = (2 + 9 + 11 + 5 + 6)/5 = 33/5 = 6.6

•  To calculate the median, we rank the 5 measurements from smallest to largest, and locate the middle value (6)
[Figure: the scores 2, 5, 6, 9, 11 plotted on a line, with the median (6) and the mean (6.6) marked]
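The worked example above can be reproduced with Python's standard `statistics` module (a minimal sketch using the slide's data; any statistical software would give the same results):

```python
# Mean and median of the alcohol misuse scores from the example: 2, 9, 11, 5, 6
from statistics import mean, median

scores = [2, 9, 11, 5, 6]

print(mean(scores))    # 33 / 5 = 6.6
print(median(scores))  # sorted: 2, 5, 6, 9, 11 -> middle value is 6

# For an even number of measurements the median is the average of the
# two middle values, e.g. for 2, 5, 6, 9 it is (5 + 6) / 2 = 5.5
print(median([2, 5, 6, 9]))
```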
Measures of spread/variability
•  Distributions of datasets having the same centre (mean) may look different
because of the way the numbers are spread out from the centre

•  Both data sets are centred at mean = 4

•  But there is a big difference in the way the measurements are spread out, or vary
•  The data in Fig. (a) vary from 3 to 5

•  In Fig. (b) they vary from 0 to 8


Measures of spread/variability (cont’d)
•  Two of the most commonly used measures of variability
are the Inter Quartile Range (IQR) and Standard
Deviation

•  Inter Quartile Range (IQR):


"  The Quartiles of a set of data are
three values (Q1, Q2, and Q3) IQR
that divide the distribution into 4
equal parts

"  Each part contains 25% of the


data values, and

"  IQR = Q3 – Q1, the difference


between the third and first
quartiles
Q1 Median=Q2 Q3 11
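As an illustration, the quartiles and IQR can be computed with the standard library's `statistics.quantiles` (Python 3.8+); note that several quartile conventions exist, and the default "exclusive" method shown here is just one of them:

```python
# Quartiles (Q1, Q2, Q3) and interquartile range for a small data set
from statistics import quantiles

data = [1, 2, 3, 4, 5, 6, 7]
q1, q2, q3 = quantiles(data, n=4)  # three cut points dividing the data into 4 parts

print(q1, q2, q3)  # 2.0 4.0 6.0
print(q3 - q1)     # IQR = 4.0
```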
Measures of spread/variability (cont’d)
•  Standard Deviation: the most commonly used measure of variability
is the variance or standard deviation

•  The variance of a sample of measurements is defined to be the average of the squared deviations of the measurements about their mean
•  The mathematical formula for the variance is: S² = Σᵢ₌₁ⁿ (xᵢ − m)² / (n − 1), where m is the mean and n is the sample size

•  The standard deviation is obtained by taking the square root of the variance, and is given by: S = √S²
Example: Variance and Standard deviation
•  Consider n = 3 measurements : 2, 3, and 7

•  The mean value = (2+3+7)/3 = 4

•  The variance is given by:

S² = [(2 − 4)² + (3 − 4)² + (7 − 4)²] / (3 − 1) = 14/2 = 7

•  The standard deviation is just the square root of the variance: S = √7 ≈ 2.65
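The calculation can be checked in Python; `statistics.variance` and `statistics.stdev` use the same n − 1 formula as the slides (a minimal sketch):

```python
# Sample variance and standard deviation of the example data: 2, 3, 7
import math
from statistics import stdev, variance

data = [2, 3, 7]
m = sum(data) / len(data)                               # mean = 4.0
s2 = sum((x - m) ** 2 for x in data) / (len(data) - 1)  # (4 + 1 + 9) / 2 = 7.0

print(s2)             # 7.0
print(math.sqrt(s2))  # square root of 7, about 2.65

# The statistics module applies the same divisor n - 1
print(variance(data), stdev(data))
```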

Descriptive statistics and estimation
•  Descriptive statistics are actually used to estimate or
represent the unknown value of a population parameter
•  For example:
"  the sample mean (m) of a set of quantitative data can be used to
estimate the unknown population mean (µ)
"  the sample proportion (p) of a certain category can be used to
estimate the unknown population proportion (π)

•  This is also called point estimation, because a single value is used to estimate the population parameter

•  An alternative is interval estimation, or confidence intervals
Confidence Intervals
•  In point estimation the value of sample statistic from a
single sample is used to estimate the population
parameter
•  Problem: what happens if we take another sample? Or more than one sample?
•  Almost certainly, we will get different estimates. Which
one do we believe?
•  So, the motivation of an interval estimate is to give a
plausible range to the population parameter, rather than
estimating by a single value.
•  Such a plausible range (a confidence interval) can be obtained from the sampling distribution of the statistic
Confidence Interval (cont’d.)
•  The idea behind the interval estimate is to give a range
of values within which the true value of the population
parameter is believed to lie
•  When the sampling distribution of an estimator can be
assumed normal, an approximate 95% confidence
interval for the corresponding population parameter is
given by:

•  Lower limit = estimate − 2×SE
•  Upper limit = estimate + 2×SE

•  Interpretation: If we define confidence intervals in this way for repeated samples, then 95% of them will contain the true value of the population parameter (µ)
Example: Confidence Interval

•  Difference in means = 1.12, SE = 0.24
•  Lower limit = 1.12 − 2×0.24 = 0.64, Upper limit = 1.12 + 2×0.24 = 1.60
•  The 95% confidence interval is: (0.64, 1.60)
•  Interpretation: we can be 95% confident that the true difference in mean hippocampal volumes lies in this interval
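The slide's arithmetic can be written as a small helper (a sketch of the "estimate ± 2×SE" normal approximation; exact intervals use 1.96 or a t quantile instead of 2):

```python
# Approximate 95% confidence interval from an estimate and its standard error
def approx_ci95(estimate, se):
    """Normal-approximation 95% CI using the 2-standard-errors rule."""
    return estimate - 2 * se, estimate + 2 * se

# Difference in means = 1.12, SE = 0.24 (the hippocampal-volume example)
lower, upper = approx_ci95(1.12, 0.24)
print(round(lower, 2), round(upper, 2))  # 0.64 1.6
```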
The confidence level

•  The confidence level k is the pre-specified proportion of repeated samples in which the interval should overlap the true parameter

•  The more confidence required, i.e. the larger the confidence level, the wider the confidence interval

•  It is conventional to construct 95% confidence intervals (i.e. k = 0.95)
Test of Hypothesis

•  A statistical test of hypothesis is used to make decisions (or inferences) about the value of a population parameter

•  A statistical test of hypothesis consists of five parts:

1.  The null hypothesis, denoted by H0

2.  The alternative hypothesis, denoted by H1

3.  The test statistic

4.  The p-value

5.  The conclusion/decision

The null and alternative hypotheses
•  A hypothesis is a statement concerning one or more population
parameter(s)
•  It reflects the investigator’s belief about the unknown parameters
•  There are two competing hypotheses in a test problem:
"  the null hypothesis (H0), and
"  the alternative hypothesis (H1)

•  Generally, the investigator’s belief is stated in the alternative hypothesis
•  The null hypothesis is a contradiction of the alternative hypothesis: the effect the investigator believes in is assumed to be absent (null) under the null hypothesis
•  As a result, the investigator’s intention would generally be to reject the null hypothesis (accepting the alternative hypothesis ⇒ supporting his/her belief)
The philosophy of a statistical test
•  The reasoning of a statistical test is similar to the process in a court
trial
•  In trying a person for a crime, the court must decide between
innocence and guilt
•  As the trial begins, the accused person is assumed to be innocent (the
null hypothesis)

•  The prosecution collects and presents all available evidence in an attempt to disprove the innocence hypothesis

•  If there is enough evidence against innocence, the court will reject the
innocence hypothesis and declare the defendant guilty
•  Otherwise the court will find the accused not guilty

The philosophy of a statistical test (cont’d.)
•  The same philosophy applies to statistical test
•  Suppose an investigator believes that mean hippocampal volume of
healthy subjects is different from that of AD patients
•  The investigator formulates the null and alternative hypotheses:
•  H0: Mean hippocampal volume is the same in healthy controls and AD
patients (µ1- µ2 = 0)
•  H1: There is a difference between the mean volumes (µ1- µ2 ≠ 0)

•  The investigator carries out a study, and calculates the standardised size of the observed difference, t = (m1 − m2)/SE, from the sample data – the test statistic
•  The null hypothesis of no difference is rejected if the observed
difference is significantly large
•  The use of the word “significant” is justified using the p-value (to be discussed next)
The p-value
•  A p-value is the probability of obtaining a test statistic as large or larger
than that found in the studied sample assuming that there is no difference
in the underlying population

•  For the test problem in the last slide, the test statistic will be the
standardised observed difference (t-statistic):

t = (m1 − m2) / SE(m1 − m2), where m1 is the average volume in the control group and m2 is the average volume in the AD group

•  Under the null hypothesis of no difference (µ1 − µ2 = 0), the statistic will be t-distributed with n1 + n2 − 2 degrees of freedom
•  Suppose the calculated value of t = tcal
•  What is the p-value?
•  p-value = Pr(|t| ≥ |tcal|), the probability that t is as or more extreme than that observed (tcal) from the sample
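The t statistic above can be computed by hand. The sketch below uses the pooled (equal-variances) standard error with hypothetical volume data, since the slides' raw data are not shown; the p-value would then come from a t distribution with n1 + n2 − 2 degrees of freedom:

```python
# Two-sample t statistic with a pooled standard error (equal variances assumed)
import math
from statistics import mean, variance

def pooled_t(x, y):
    n1, n2 = len(x), len(y)
    # pooled variance: weighted average of the two sample variances
    s2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / (n1 + n2 - 2)
    se = math.sqrt(s2 * (1 / n1 + 1 / n2))     # SE of (m1 - m2)
    return (mean(x) - mean(y)) / se, n1 + n2 - 2

controls = [4.1, 3.8, 4.5, 4.2, 3.9]   # hypothetical volumes, for illustration only
patients = [3.2, 3.5, 3.0, 3.6, 3.3]
t, df = pooled_t(controls, patients)
print(round(t, 2), df)  # 4.8 8
```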
One tailed and two-tailed tests
•  P-value of a test is calculated from the area under one tail or
both tails of the sampling distribution of the test statistic
•  This depends on the type of the alternative hypothesis

•  For example, consider testing the equality of two population means

"  The null hypothesis is: µ1 - µ2 = 0

•  The possible alternative hypotheses are:

1.  µ1 - µ2 > 0 (right-tailed test) – one-tailed test
2.  µ1 - µ2 < 0 (left-tailed test) – one-tailed test
3.  µ1 - µ2 ≠ 0 (i.e., µ1 - µ2 > 0 or µ1 - µ2 < 0) – two-tailed test

One tailed and two-tailed tests (cont’d.)
•  We choose a right-tailed test when we are confident that a negative difference can happen only by chance

•  We choose a left-tailed test when we are confident that a positive difference can happen only by chance

•  When in doubt about the direction of the difference, choose a two-tailed test

Two-sample t-test
•  A two-sample t-test is generally used to formally compare the means
of two groups or populations
•  Assumptions:
"  Observations are independent of each other
"  The two groups to be compared are independent of each other
"  Population data from which the samples data are drawn are normally
distributed.
"  The variances of the populations to be compared are equal (a modified
version of the test is available for unequal variances)
•  Empirical studies of the t-test have demonstrated that these
assumptions can be violated to some extent without substantial
effect on the results

Example: t-test comparing two means
•  Consider the hippocampal volume dataset for 14
Controls and 14 AD patients
•  We want to test the hypothesis:

"  H0: µ1 - µ2 = 0 (there is no difference in mean AD


volume between the groups),
"  H1: µ1 - µ2 ≠ 0 (there is a difference)

•  An appropriate procedure for testing the above hypothesis (under the normality assumption) will be a t-test

[Figure: histogram of hippocampal volumes for the Control group – data are approximately normally distributed (symmetric)]
Example: t-test comparing two means (cont’d.)
SPSS Output

•  Calculated t-statistic = 4.64
•  P-value < 0.001 (significant)
•  Conclusion: There is a significant difference in mean hippocampal volume between the controls and AD patients
•  SPSS also provides 95% CI: (0.62, 1.63)

•  Note: SPSS gives output assuming both equal and unequal variances. The unequal-variances assumption may be more appropriate for these data
Relation between test and Confidence Interval
•  We can also decide about statistical significance of a hypothesis by
looking at the confidence interval of the associated parameters
•  For example, a test of the hypothesis (µ1 - µ2 = 0) can be carried
out by looking at the confidence interval for the mean difference

•  The test of the hypothesis µ1 - µ2 = 0:

"  Can be rejected at the 5% level if the null value (0) is not contained in the 95% confidence interval for (µ1 - µ2), and

"  Can be accepted if the confidence interval contains the null value

[Figure: two 95% confidence intervals on a number line – one excluding 0 (reject H0) and one containing 0 (accept H0)]
Variable types and measurement scales

Variables

•  Qualitative (categorical)
   "  Nominal (no ranking) – e.g., ethnicity: 1 = white, 2 = black, 3 = Asian
   "  Ordinal (ranking) – e.g., economic status: 1 = poor, 2 = middle class, 3 = rich

•  Quantitative (quantity/amount)
   "  Discrete (isolated values) – e.g., number of offspring: 0, 1, 2, 3, …
   "  Continuous (any value in an interval) – e.g., height, weight, age
Parametric and non-parametric tests

•  The choice of a statistical test will greatly depend on the type of the outcome measure and the underlying distribution of the data
•  Statistical tests are broadly classified as
"  Parametric tests (depend on distributional assumption of the outcome
or the effect measure)
"  Non-parametric tests (are not restricted to any distributional
assumption)
•  Parametric tests are based on specific distributional assumptions such
as the normal distribution
"  F and t-tests are examples of parametric test which are based on the
normality assumption
•  Non-parametric tests may also require some assumptions (usually
less restrictive), but do not assume a parametric form for the
distribution of response or the effect measure
"  Mann-Whitney U test and Wilcoxon rank-sum test are examples of
non-parametric test
Choice of Statistical test
•  A Flow chart for the choice of test (continuous data)

Continuous data → Normal distribution?
   Yes → Parametric tests
   No → Can the data be transformed to normality?
      Yes → Parametric tests (on the transformed data)
      No → Non-parametric tests
Choice of Statistical test –comparing two means

Comparing two means

•  Paired groups:
   "  Difference scores are normally distributed → Paired t-test
   "  Assumption not valid → nonparametric test: Wilcoxon Signed Ranks Test and others

•  Independent groups:
   "  Samples from populations with normal distributions → Independent samples t-test (versions for equal and unequal population variances)
   "  Assumption not valid → nonparametric test: Mann-Whitney U-test and others
References
•  Mendenhall, W., Beaver, R. J. and Beaver, B. M. (2008). Introduction to Probability and Statistics, Cengage Learning.
•  Agresti, A. and Finlay, B. (2009). Statistical Methods for the Social Sciences (4th edition), Pearson Prentice Hall.

End of Session 1

Basic statistics for research

Session 2:

Overview of common statistical procedures

Daniel Stahl
Department of Biostatistics
Institute of Psychiatry, King’s College London

29 September 2011

First lecture: Descriptive and inferential statistics
•  Descriptive statistics:
"  summarizing meaningful aspects of your data, such as
measures of location (mean, median) or spread/variability
(standard deviation, Interquartile range),
"  assessing the distribution of the data: normally distributed?,
outliers?
•  Inferential statistics:
"  are used to draw inferences about a population from a sample in
order to generalize (make assumptions about this wider
population) and/or make predictions about the future.
o  Null-Hypothesis testing
o  Parameter estimation (point estimation, e.g. treatment effect and
precision, e.g. Standard error or confidence interval)

Outline
1.  Scale of data and descriptive and inferential statistics
2.  Comparing two independent groups
3.  Comparing two paired (dependent) groups
4.  [Comparing more than two groups]
5.  Association between two variables
6.  Outlook: Regression

Learning outcomes

At the end of this session you should be able to:


•  Classify variables according to their scale of measurement
•  Understand the influence of the type of scale on the appropriate descriptive and inferential statistics
•  Choose appropriate statistical tests for common research problems
•  Understand the software output of common statistical analyses
•  Interpret basic descriptive and inferential statistics analyses
•  Report the results of an analysis

1. Scale of data
We get data by measuring something, i.e., assigning a number to a trait or event.
The way that the numbers are assigned determines the scale of measurement.
The type of descriptive and inferential statistics depends on the scale of the data!

The height of Mr. X is 185 cm and he weighs 76 kg. He has green eyes and received a “good” in his statistics course despite looking after his three dogs.
What did we measure?

Which scales of measurement?
•  Gender: Nominal/binary, categorical
•  Eye color: Nominal, categorical
•  Grade: Ordinal/rank, categorical
•  Weight and height: Continuous
•  Number of dogs: Discrete/Count

The type of descriptive and inferential statistics depends on the scale of the data!
Descriptive statistics: location and spread

Scale of measurement:
•  Nominal (categorical without ranking): frequencies and percentages – e.g., age groups: Males 17 (81%), Females 4 (19%)
•  Ordinal (categorical with ranking): median, lower (25%) & upper (75%) quartiles, IQR – e.g., Median = 3, LQ = 1 & UQ = 5, Min = 0, Max = 5
•  Quantitative: continuous (and discrete): mean and SD (if roughly unimodally distributed) – e.g., mean age of males: 35.5 years (SD = 5.5)
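The table can be illustrated with standard-library Python: frequencies for nominal data, median and quartiles for ordinal data, mean and SD for continuous data (all values below are made up for illustration):

```python
# Matching descriptive statistics to the scale of measurement
from collections import Counter
from statistics import mean, median, quantiles, stdev

# Nominal: frequencies and percentages
sex = ["m", "m", "f", "m", "f", "m", "m"]
counts = Counter(sex)
print(counts["m"], round(100 * counts["m"] / len(sex)))  # 5 71

# Ordinal: median and quartiles
grades = [1, 2, 2, 3, 3, 3, 4, 5]
q1, q2, q3 = quantiles(grades, n=4)
print(median(grades), q3 - q1)  # median and IQR

# Continuous: mean and standard deviation
ages = [31.0, 35.5, 40.0, 35.0]
print(mean(ages), round(stdev(ages), 2))
```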
Inference: Comparing two independent groups

•  Nominal (categorical without ranking): Chi2 test of homogeneity – e.g., “The prevalence of depressive symptoms was higher in females (30.3% vs. 20.5%; chi2 = 4.98, 1 df, p = 0.026)”
•  Ordinal (categorical with ranking): Mann-Whitney U test (rank tests) – e.g., “The treatment group scored significantly lower on a 7-point depression scale than a control group (U = 2, n1 = n2 = 6, p = 0.01)”
•  Quantitative: continuous (and discrete): independent samples t-test – e.g., “The mean IQ of anorexic girls was significantly higher than the mean IQ of the control group (t = 2.2, df = 44, p = 0.04, mean IQ difference = 5 (95% CI = 0.2–10.2))”
Data set: Depression and Diabetes
Ismail K, Winkley K, Stahl D, Chalder T, Edmonds M. (2007) A cohort study of
people with diabetes and their first foot ulcer: the role of depression on
mortality. Diabetes Care, 30(6):1473-9.

Objective:
•  The aim of the study was to evaluate over 18 months whether depression
was associated with mortality in people with their first foot ulcer.

RESEARCH DESIGN AND METHODS:


•  A prospective cohort design was used. Adults with their first diabetic foot
ulcer were recruited from foot clinics in southeast London, U.K.
•  At baseline, the Schedules for Clinical Assessment in Neuropsychiatry 2.1
was used to define those who met DSM (Diagnostic and Statistical Manual of
Mental Disorders)-IV criteria for minor and major depressive disorders….
•  The main outcome was mortality 18 months later.
•  The severity of diabetes was assessed using HbA1c, or 'glycosylated haemoglobin'. It is a measure of how well diabetes is under control. A value of 7 or less is regarded as good.
Data for 20 patients

•  Can you think about any interesting research question?

Which type of scale?

Variable Outcome Type of scale?


Depression yes/no
Died yes/no
Size of ulcer cm2
age years
Sex male/female
Social class 1= Professionals and intermediate
2= Skilled
3= partly skilled and unskilled
Ethnicity Caucasian/Afro-Caribbean/African/Asian/Other
Alcohol consumption 0=0 units; 1=1-3 units; 2=4-6 units;3=7-10 units
4=10-15 units; 5=16-20 units, 6 >20 units
Baseline HbA1c (glycated haemoglobin ) ml
HbA1c 12 months follow-up ml
Smoker yes/no

Type of scales

Variable Outcome Type of scale?


Depression yes/no nominal
Died yes/no nominal
Size of ulcer cm2 continuous
age years continuous
Sex male/female nominal
Social class 1= Professionals and intermediate ordinal or nominal
2= Skilled
3= partly skilled and unskilled
Ethnicity Caucasian/Afro-Caribbean/African/Asian/Other nominal
Alcohol consumption 0=0 units; 1=1-3 units; 2=4-6 units;3=7-10 units ordinal
4=10-15 units; 5=16-20 units, 6 >20 units
Baseline HbA1c (glycated haemoglobin ) ml continuous
HbA1c 12 months follow-up ml continuous
Smoker yes/no nominal

Are there differences in depressed and not-
depressed patients?

Outline
1.  Scale of data and descriptive and inferential statistics
2.  Comparing two independent groups
3.  Comparing two paired (dependent) groups
4.  Comparing more than two groups
5.  Association between two variables
6.  Outlook: Regression

Categorical data
•  Are there differences in mortality between depressed and not
depressed patients?
Crosstab: any DSM-IV depression × dead in first 18 months

                     no            yes           Total
  not depressed      151 (88.3%)   20 (11.7%)    171 (100.0%)
  depressed           62 (75.6%)   20 (24.4%)     82 (100.0%)
  Total              213 (84.2%)   40 (15.8%)    253 (100.0%)

In our sample, 11.7% of the not depressed patients (20 out of 171) died within 18 months, while 24.4% of the depressed patients died (20 out of 82).
Main steps of statistical analysis
•  Define the Null and alternative Hypotheses under study
"  H0: There is no difference/no effect
"  H1: There is a difference
•  Choose a statistical test
•  Collect the data
•  Present descriptive statistics
•  Calculate the test statistic specific to H0
•  Compare the value of the test statistic with a known probability distribution and obtain a p-value
•  Reject the Null-Hypothesis if the p-value is very small
(usually<0.05)
•  Estimate the parameter of interest for the true population
and a measure of precision (95% Confidence interval)
Chi2 test and Odds ratio

Chi-Square Tests (SPSS output):
  Pearson Chi-Square: value = 6.710, df = 1, p (2-sided) = .010
  Continuity Correction: 5.790, df = 1, p = .016
  Likelihood Ratio: 6.365, df = 1, p = .012
  Fisher's Exact Test: p = .016 (2-sided), .009 (1-sided)
  Linear-by-Linear Association: 6.683, df = 1, p = .010
  N of valid cases: 253
  (0 cells have an expected count less than 5; the minimum expected count is 12.96)

Risk Estimate (value, 95% CI):
  Odds ratio for dead in first 18 months (no/yes): 2.435 (1.226–4.840)
  For cohort any DSM-IV depression = not depressed: 1.418 (1.028–1.956)
  For cohort any DSM-IV depression = depressed: .582 (.400–.846)
  N of valid cases: 253

The odds ratio is a commonly estimated parameter to describe the difference between two groups: Odds ratio = odds of group 1 / odds of group 2
Main steps of statistical analysis
•  Define the Null and alternative Hypotheses under study
"  H0: The proportions of individuals who died are equal in the two populations
"  H1: The proportions of individuals who died are not equal
•  Choose a statistical test:
"  chi2 (χ²) for homogeneity
•  Collect the data
•  Calculate the test statistics specific to H0:
"  chi2=6.71
•  Compare the value of the test statistics from a known probability
distribution and obtain a p-value.
•  The probability to observe a chi2 of 6.71 or larger with 1 degree of
freedom if H0 is true is 0.01.
•  Reject the Null-Hypothesis if the p-value is very small (usually<0.05)
•  We reject the Null-hypothesis and assume that the proportion of people who died is higher among depressed individuals with diabetes
•  Estimate the parameter of interest for the true population and a measure
of precision (95% C.I.)
•  The odds of dying for a depressed individual are 2.435 times higher (95% C.I. = 1.22–4.84)
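For a 2×2 table these quantities are easy to compute from scratch. The sketch below reproduces the Pearson chi2 statistic and the odds ratio from the mortality-by-depression counts (the p-value itself would come from a chi2 distribution with 1 df, as SPSS reports):

```python
# Pearson chi-square statistic and odds ratio for a 2x2 table
table = [[151, 20],   # not depressed: alive, dead
         [62, 20]]    # depressed:     alive, dead

def chi_square(t):
    n = sum(sum(row) for row in t)
    row_totals = [sum(row) for row in t]
    col_totals = [sum(col) for col in zip(*t)]
    chi2 = 0.0
    # sum over cells of (observed - expected)^2 / expected
    for i, row in enumerate(t):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return chi2

def odds_ratio(t):
    (a, b), (c, d) = t
    # odds of dying when depressed (d/c) versus when not depressed (b/a)
    return (d / c) / (b / a)

print(round(chi_square(table), 2))  # 6.71, matching the Pearson chi-square above
print(round(odds_ratio(table), 3))  # 2.435, matching the SPSS risk estimate
```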
Categories with more than two levels

•  Are there differences in social class between depressed and not


depressed patients?
Crosstab: any DSM-IV depression × social class (3 groups)

                     social class 1 and 2   social class 3   social class 4 and 5   Total
  not depressed      60 (35.1%)             64 (37.4%)       47 (27.5%)             171 (100.0%)
  depressed          25 (30.5%)             30 (36.6%)       27 (32.9%)              82 (100.0%)
  Total              85 (33.6%)             94 (37.2%)       74 (29.2%)             253 (100.0%)

Chi-Square Tests (SPSS output):
  Pearson Chi-Square: value = .921, df = 2, p (2-sided) = .631
  Likelihood Ratio: .916, df = 2, p = .633
  Linear-by-Linear Association: .888, df = 1, p = .346
  N of valid cases: 253
  (0 cells have an expected count less than 5; the minimum expected count is 23.98)

•  The proportions of social classes did not differ significantly between depressed and not depressed individuals (chi2(2 df) = 0.921, p = 0.631).
Rank data
•  Comparing alcohol consumption between depressed and not-
depressed individuals with diabetes
Alcohol consumption by depression status (median, 25th and 75th percentiles, min, max):
  not depressed: median 3.00, P25 = 2.00, P75 = 4.00, min .00, max 8.00
  depressed: median 3.00, P25 = 2.00, P75 = 4.00, min .00, max 8.00

•  The median alcohol consumption score was 3, with an IQR of 2, for both groups.
Test Statistics (grouping variable: any DSM-IV depression):
  Mann-Whitney U = 6719.500, Wilcoxon W = 21425.500, Z = −.545, asymp. sig. (2-tailed) = .586

•  A Mann-Whitney U test showed that there were no significant differences in alcohol consumption between the two groups (U = 6719.5, n1 = 171, n2 = 82, p = 0.586)
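The U statistic itself has a simple definition: count, over all pairs, how often a value in one group beats a value in the other (ties count one half). A minimal sketch on toy data (real software adds a p-value from the U distribution, with corrections for ties):

```python
# Mann-Whitney U statistic by direct pair counting
def mann_whitney_u(x, y):
    u = 0.0
    for xi in x:
        for yj in y:
            if xi > yj:
                u += 1.0
            elif xi == yj:
                u += 0.5   # ties contribute half
    # report the smaller of U and n1*n2 - U, as most software does
    return min(u, len(x) * len(y) - u)

# Complete separation: every value in b exceeds every value in a
print(mann_whitney_u([1, 2, 3], [4, 5, 6]))  # 0.0
# Interleaved groups
print(mann_whitney_u([1, 3, 5], [2, 4, 6]))  # 3.0
```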
Continuous data
•  Are there differences in HbA1c between depressed and not-depressed
group at baseline?
Group Statistics – HbA1c baseline (999 = missing, 777 = missing because died):
  depressed: N = 79, mean = 8.427, SD = 2.1848, SE of mean = .2458
  not depressed: N = 166, mean = 8.378, SD = 1.9972, SE of mean = .1550

•  The mean baseline HbA1c level in our data set was 8.427 (SD = 2.185) for depressed individuals and 8.378 (SD = 2.00) for not depressed individuals.

Independent Samples Test (SPSS output):
  Levene's test for equality of variances: F = 1.751, sig. = .187
  Equal variances assumed: t = .174, df = 243, sig. (2-tailed) = .862, mean difference = .0489, SE = .2815, 95% CI (−.5056, .6033)
  Equal variances not assumed: t = .168, df = 141.776, sig. = .867, mean difference = .0489, SE = .2906, 95% CI (−.5256, .6233)

•  A t-test for independent samples (equal variances) showed that there was no significant difference between the two populations. The estimated mean difference is 0.049 (95% C.I. = −0.51 to 0.60).
Assumptions of t-test
and many other parametric tests

Important assumptions of many parametric tests are that


1.  the sampling distribution approximates a normal distribution,
2.  the samples to be compared were drawn from populations with approximately the same variance (homogeneity of variances), and
3.  the observations are independent.

Assessing normal distribution: Histograms
•  Plot a histogram and compare it with a normal distribution
curve for this mean and standard deviation.
[Figure: histograms of baseline HbA1c for the not depressed and depressed groups (any DSM-IV depression), each with a normal curve overlaid]

•  Always assess normality within each group!


Shapes of distributions
•  Symmetric:
"   peak is in the middle
"   left is mirror image of right
•  Negatively skewed:
"   peak on the right, tail on the left
•  Positively skewed:
"   peak on the left, tail on the right

[Figure: examples of negatively skewed, symmetric, and positively skewed distributions]
Assessing homogeneity of variances
•  The populations from which the samples are drawn should have equal
variances.
•  This can be determined by visual inspection of the data, looking at the
spread or standard deviation (SD) of the data.
•  A rule of thumb says that the SD of the group with the larger SD should
not be more than twice as large (Howell 1997).

Report – HbA1c baseline (999 = missing, 777 = missing because died):
  not depressed: mean 8.378, SD 1.9972
  depressed: mean 8.427, SD 2.1848
  Total: mean 8.393, SD 2.0552

•  The SDs of the two groups are very similar (2.2 versus 2.0). We can
assume homogeneity of variances.
•  If the assumption is violated, an unequal-variance t-test can be used.

Remember: standard deviation = √variance
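Howell's rule of thumb is straightforward to code (a sketch; formal alternatives such as Levene's test appear in the SPSS output earlier):

```python
# Rule of thumb: the larger SD should be no more than twice the smaller SD
def sds_roughly_equal(sd1, sd2):
    return max(sd1, sd2) <= 2 * min(sd1, sd2)

# SDs from the HbA1c example
print(sds_roughly_equal(2.1848, 1.9972))  # True: homogeneity is plausible
print(sds_roughly_equal(5.0, 2.0))        # False: prefer the unequal-variance t-test
```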
Non-parametric tests
•  If the assumption of normal distribution is seriously violated, an alternative non-parametric test can be selected, such as the Mann-Whitney U test instead of the independent t-test

•  Non-parametric tests do not assume normal distribution, but they do assume the same distribution and equal variances!

•  The power of non-parametric tests is usually similar to the parametric equivalent, but you are more restricted if you want to do more complex statistical analyses.

•  Non-parametric tests are also not robust against violations of the assumption of independence!
Using the Mann Whitney U Test instead of t-test
Are there differences in HbA1c between depressed and not-depressed group at
baseline?
HbA1c baseline by depression status (median, 25th and 75th percentiles, min, max):
  not depressed: median 8.2, P25 = 6.9, P75 = 9.4, min 4.5, max 15.5
  depressed: median 7.9, P25 = 6.8, P75 = 9.7, min 5.1, max 14.5

The median HbA1c level was 8.2 (25% Q = 6.9, 75% Q = 9.4) for not depressed individuals and 7.9 (25% Q = 6.8, 75% Q = 9.7) for depressed individuals.

Test Statistics (grouping variable: any DSM-IV depression):
  Mann-Whitney U = 6465.500, Wilcoxon W = 9625.500, Z = −.177, asymp. sig. (2-tailed) = .860

A Mann-Whitney U test showed that there were no significant differences in HbA1c levels between the two groups (U = 6465.5, n1 = 166, n2 = 79, p = 0.860).
Questions
•  Are there changes over time in HbA1c from baseline to
12 month follow-up?

•  Independent test?

•  Which test would you use?

•  Can you think of potential problems with the analysis?

Changes over times: paired data
•  Are there changes over time in HbA1c?
•  Independent test?
"  The same persons are observed at baseline and 12 months
follow-up. The data are not independent but paired (dependent).
•  Which test would you use?
"  A paired t-test would be appropriate
•  Could you think about potential problems of the
analysis?
"  Some people died during the 12 months. Perhaps they are the
ones who did not improve. This could cause a bias in our test
"  Perhaps depressed and non-depressed patients change
differently.
"  The assumptions of the test may be violated

66
Outline
1.  Scale of data and descriptive and inferential statistics
2.  Comparing two independent groups
3.  Comparing two paired (dependent) groups
4.  Comparing more than two groups
5.  Association between two variables
6.  Outlook: Regression

67
3. Comparing two paired (dependent) groups
e.g. the same person is observed at two occasions
Comparing
two paired groups

Nominal Ordinal Quantitative:


(categorical without (categorical with Continuous
ranking) ranking) (and discrete)

    McNemar Test              Wilcoxon Test            Paired samples t-test

There was no significant change in smoking between the first and second
time point (Chi2=2.7, N=50, p=0.1).

There was no significant change in depression scores in the patient group
(z=-1.02, p=0.31, N=140).

There was a significant weight increase in anorexic girls after treatment
(t=4.18, df=16, p<0.001, mean change=7.2 pounds, 95% C.I.=4.2-10.4).
68
Compare the change of HbA1c over time
Paired Samples Statistics

                                   Mean     N    Std. Deviation  Std. Error Mean
Pair 1  HbA1c 12 month follow-up  8.192   191       1.8050            .1306
        HbA1c baseline            8.567   191       1.9920            .1441

•  The HbA1c level in our sample (N=191) changed from 8.567 (SD=1.99) at
baseline to 8.192 (SD=1.81) at 12 months follow-up.
Paired Samples Test

Paired Differences                                      95% C.I. of Difference
                          Mean   Std. Dev.  Std. Error   Lower    Upper       t     df  Sig. (2-tailed)
Pair 1  HbA1c 12 month
        follow-up -
        HbA1c baseline   -.3749   2.0352      .1473      -.6654   -.0844   -2.546  190      .012

•  A t-test for paired data showed that there is a significant decrease of
   HbA1c levels over time in our population of individuals with diabetes
   (t(190)=-2.55, p=0.012, mean change=-0.375, 95% C.I. -0.66 to -0.084).
69
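A paired t-test of this kind can be sketched as follows, with invented before/after values (not the N=191 sample above); it assumes scipy is available.

```python
# Invented paired measurements for eight hypothetical participants.
from scipy import stats

baseline  = [8.5, 9.1, 7.8, 10.2, 8.9, 9.5, 7.4, 8.8]
follow_up = [8.1, 8.7, 7.9, 9.6, 8.2, 9.0, 7.5, 8.3]

# ttest_rel pairs the observations by position
t_stat, p_value = stats.ttest_rel(follow_up, baseline)

# The mean of the difference scores is the estimated change over time
diffs = [f - b for f, b in zip(follow_up, baseline)]
mean_change = sum(diffs) / len(diffs)
print(round(t_stat, 2), round(p_value, 3), round(mean_change, 3))
```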
Assumptions of paired t-test
•  If data are paired, the difference score (12 months follow-up –
   baseline) should be roughly normally distributed!
[Histogram of diff_HbA1c (12-month follow-up minus baseline):
Mean = -0.37, Std. Dev. = 2.035, N = 191]

•  If the assumptions of normal distribution are violated the Wilcoxon
   signed-rank test can be used.
70
Wilcoxon signed-rank test
Ranks

                                      N      Mean Rank   Sum of Ranks
HbA1c 12 month    Negative Ranks   101 (a)     97.58        9855.50
follow-up -       Positive Ranks    79 (b)     81.45        6434.50
HbA1c baseline    Ties              11 (c)
                  Total            191
a. HbA1c 12 month follow-up < HbA1c baseline
b. HbA1c 12 month follow-up > HbA1c baseline
c. HbA1c 12 month follow-up = HbA1c baseline

Test Statisticsb

HbA1c 12
month
follow-up -
HbA1c
baseline
Z -2.444a
Asymp. Sig. (2-tailed) .015
a. Based on positive ranks.
b. Wilcoxon Signed Ranks Test

•  A Wilcoxon signed-rank test showed that there is a
   significant decrease from baseline to 12 months follow-up
   (z=-2.444, N=191, p=0.015).
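A minimal sketch of the same test, on invented paired data (assumes scipy; with tied ranks scipy may fall back on a normal approximation):

```python
# Invented before/after values; most differences are negative.
from scipy import stats

baseline  = [8.5, 9.1, 7.8, 10.2, 8.9, 9.5, 7.4, 8.8, 9.9, 8.0]
follow_up = [8.1, 8.7, 7.9, 9.6, 8.2, 9.0, 7.5, 8.3, 9.1, 7.8]

# Wilcoxon signed-rank test on the paired differences
w_stat, p_value = stats.wilcoxon(follow_up, baseline)
print(w_stat, round(p_value, 3))
```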
71
Categorical dependent data
•  Is there a reduction in smoking between baseline and 12 months follow-up?
smoker * smoker after 12 months (fake data) Crosstabulation

smoker after 12 months


(fake data)
non-smoker smoker Total
smoker non-smoker Count 208 5 213
% within smoker 97.7% 2.3% 100.0%
smoker Count 16 24 40
% within smoker 40.0% 60.0% 100.0%
Total Count 224 29 253
% within smoker 88.5% 11.5% 100.0%

•  In our sample, 5 out of 213 (2.3%) non-smokers were smoking after 12
   months, while 16 out of 40 (40%) smokers did not smoke anymore.
Chi-Square Tests

Exact Sig. Exact Sig. Point


Value (2-sided) (1-sided) Probability
McNemar Test .027a .013a .010a
N of Valid Cases 253
a. Binomial distribution used.

•  The McNemar test determines whether the probability of a change is the
   same for smokers and non-smokers. The test shows that significantly more
   people changed from smoker to non-smoker than vice-versa (Exact
   McNemar Test, N=253, p=0.027).
72
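The exact McNemar test uses only the two discordant cells, so it can be reproduced by hand from the crosstabulation above with a binomial calculation. A sketch (b and c are the 5 and 16 from the fake-data table):

```python
# Discordant counts from the (fake-data) smoking crosstabulation above.
from scipy.stats import binom

b = 5    # non-smoker at baseline -> smoker at 12 months
c = 16   # smoker at baseline -> non-smoker at 12 months
n = b + c

# Under H0, changes in either direction are equally likely: B ~ Binomial(n, 0.5).
# Two-sided exact p-value: double the smaller tail probability (capped at 1).
p_exact = min(1.0, 2 * binom.cdf(min(b, c), n, 0.5))
print(round(p_exact, 3))  # matches the exact p = 0.027 reported above
```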
Outline
1.  Scale of data and descriptive and inferential statistics
2.  Comparing two independent groups
3.  Comparing two paired (dependent) groups
4.  Comparing more than two groups
5.  Association between two variables
6.  Outlook: Regression

73
4. Comparing more than 2 groups
•  Are there differences in alcohol consumption or HbA1c levels
between ethnic groups: Caucasian, African and Afro-
Caribbean?
Estimates

•  HbA1c (Dependent Variable: HbA1c baseline)

                                        95% Confidence Interval
   ethnicity        Mean   Std. Error  Lower Bound  Upper Bound
   Caucasian       8.159     .153        7.858         8.460
   African         9.687     .525        8.653        10.721
   Afro-Caribbean  9.100     .321        8.467         9.733

•  Alcohol
   ethnicity        Alcohol consumption:  Median   P25    P75   Minimum  Maximum
   Caucasian                               3.00    2.00   4.00    .00     8.00
   African                                 3.00    2.25   4.75   1.00     5.00
   Afro-Caribbean                          3.00    2.00   4.00    .00     8.00

•  Died
   ethnicity * dead in first 18 months Crosstabulation

                                       dead in first 18 months
   ethnicity                             no       yes      Total
   Caucasian       Count                152        30        182
                   % within ethnicity  83.5%     16.5%    100.0%
   African         Count                 15         1         16
                   % within ethnicity  93.8%      6.3%    100.0%
   Afro-Caribbean  Count                 36         6         42
                   % within ethnicity  85.7%     14.3%    100.0%
   Total           Count                203        37        240
                   % within ethnicity  84.6%     15.4%    100.0%
74
Analysis of variance (ANOVA)
•  ANOVA is closely related to Student's t-test, but whereas
the t-test is only suitable for comparing two treatment
means the ANOVA can be used for comparing the
means of more than two groups, for example:
"  between ethnic groups: Caucasian, African and afro-Caribbean
"  severely depressed, mildly depressed and not depressed
patients
"  Baseline, 12 months follow-up and 18 months follow-up
(repeated measurement ANOVA)

•  ANOVA can be also used in more complex situations


where we have more than one factor, for example
"  type of depression and gender
"  type of depression and time (baseline and 12 months follow up).

75
One-way ANOVA for continuous data
•  The ANOVA uses an F test to determine whether there
exists a significant difference between group means.
•  When the F test rejects the null hypothesis, we know
   that at least two groups differ from each other.
•  Usually we want to know which groups differ from each
   other, by doing pairwise comparisons (using t-statistics).
•  If we compare three or more groups we need to adjust for
   multiple testing using Tukey or Bonferroni adjustments to
   reduce the risk of obtaining a false positive result.
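The F test plus Bonferroni-adjusted pairwise comparisons can be sketched like this, with three invented groups (a simplified stand-in for the adjusted output SPSS produces):

```python
# Invented data for three hypothetical groups A, B and C.
from itertools import combinations
from scipy import stats

groups = {
    "A": [8.2, 7.9, 8.5, 8.0, 8.3, 7.7],
    "B": [9.6, 9.9, 9.4, 10.1, 9.8],
    "C": [9.0, 9.2, 8.8, 9.5, 9.1],
}

# Overall one-way ANOVA F test
f_stat, p_overall = stats.f_oneway(*groups.values())
print("overall p:", round(p_overall, 6))

# Pairwise t-tests; the Bonferroni adjustment multiplies each p-value
# by the number of comparisons (3 here), capped at 1.
n_pairs = 3
for (name1, g1), (name2, g2) in combinations(groups.items(), 2):
    t, p = stats.ttest_ind(g1, g2)
    print(name1, "vs", name2, "adjusted p:", round(min(1.0, p * n_pairs), 4))
```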

76
One-way ANOVA
•  Example comparing HbA1c levels between three ethnic
groups
Tests of Between-Subjects Effects

Dependent Variable: HbA1c baseline


Type III Sum
Source of Squares df Mean Square F Sig.
Corrected Model 54.643a 2 27.321 6.614 .002
Intercept 7460.777 1 7460.777 1806.223 .000
ethnic 54.643 2 27.321 6.614 .002
Error 945.906 229 4.131
Total 17447.840 232
Corrected Total 1000.549 231
a. R Squared = .055 (Adjusted R Squared = .046)

•  The overall F-test suggests that there are significant
   differences between the ethnic groups (F(2,229)=6.614,
   p=0.002).

77
Pairwise comparisons
Estimates

Dependent Variable: HbA1c baseline


95% Confidence Interval
ethnicity Mean Std. Error Lower Bound Upper Bound
Caucasian 8.159 .153 7.858 8.460
African 9.687 .525 8.653 10.721
Afro-Caribbean 9.100 .321 8.467 9.733
Pairwise Comparisons

Dependent Variable: HbA1c baseline

                                Mean                           95% Confidence Interval
                                Difference                     for Difference (a)
(I) ethnicity   (J) ethnicity   (I-J)      Std. Error  Sig.(a)  Lower Bound  Upper Bound
Caucasian African -1.528* .547 .017 -2.846 -.210
Afro-Caribbean -.941* .356 .026 -1.799 -.083
African Caucasian 1.528* .547 .017 .210 2.846
Afro-Caribbean .587 .615 1.000 -.897 2.071
Afro-Caribbean Caucasian .941* .356 .026 .083 1.799
African -.587 .615 1.000 -2.071 .897
Based on estimated marginal means
*. The mean difference is significant at the .05 level.
a. Adjustment for multiple comparisons: Bonferroni.

Pairwise comparisons using Bonferroni correction for multiple testing show
that Caucasians have significantly lower HbA1c levels compared with
Africans (mean difference: -1.53 (SE=0.55), p=0.017) or Afro-Caribbeans
(mean difference: -0.94 (SE=0.36), p=0.026). There was no evidence for a
difference between Africans and Afro-Caribbeans (mean difference: 0.59
(SE=0.62), p=1.000).
78
Comparing more than two independent groups
Comparing more than
two independent groups

Nominal Ordinal Quantitative:


(categorical without (categorical with Continuous
ranking) ranking) (and discrete)

Chi2 Test of                  Kruskal-Wallis Test          One-way Analysis
homogeneity                   (Rank tests)                 of variance (ANOVA)

If significant perform        If significant perform       If significant perform pairwise
pairwise comparison using     pairwise comparison using    comparisons using Tukey or
2x2 chi2 tests                Mann-Whitney U Tests         Bonferroni correction
(and Bonferroni correction)   (and Bonferroni correction)  for multiple testing
79
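The rank-based branch of the table above can be sketched with scipy's Kruskal-Wallis test on invented data:

```python
# Invented ordinal-style scores for three groups.
from scipy import stats

g1 = [3, 5, 4, 6, 2, 5]
g2 = [8, 9, 7, 10, 8]
g3 = [5, 6, 7, 5, 6, 4]

# Kruskal-Wallis H test: rank-based analogue of one-way ANOVA
h_stat, p_value = stats.kruskal(g1, g2, g3)
print(round(h_stat, 2), round(p_value, 4))
```

If the overall test is significant, pairwise Mann-Whitney U tests (with a Bonferroni correction) can follow, as the table indicates.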
Repeated or dependent measurements
•  If the same participants are observed at more than two
occasions such as measuring HbA1c levels at baseline,
12 months and 18 months follow-up a repeated
measurement ANOVA (or an equivalent nonparametric
alternative) should be used.

80
Comparing more than two paired samples
Repeated measurement ANOVA
Comparing more than
two paired groups

Nominal Ordinal Quantitative:


(categorical without (categorical with Continuous
ranking) ranking) (and discrete)

One-way repeated
Cochran’s Q Friedman Test measurement
Analysis of variance

If significant perform        If significant perform       If significant perform pairwise
pairwise comparison using     pairwise comparison using    comparisons using
2x2 McNemar tests             Wilcoxon tests               Bonferroni correction
(and Bonferroni correction)   (and Bonferroni correction)  for multiple testing
81
Outline
1.  Scale of data and descriptive and inferential statistics
2.  Comparing two independent groups
3.  Comparing two paired (dependent) groups
4.  Comparing more than two groups
5.  Association between two variables
6.  Outlook: Regression

82
Association between two variables
•  A correlation describes the strength and direction of a relationship
between two random variables.
•  The two variables are measured on each of the n individuals, e.g.
height and age of infants, age and HbA1c level
•  A scatter plot allows us to visually assess a relationship between the
two variables:
•  Is there a relationship between age of infant and height?
[Scatter plot: age in months (18–30) on the x-axis vs Height in cm (76–84) on the y-axis]
83
Association between two variables
•  There is a linear relationship between two variables if a
   straight line drawn through the midst of the data points
   provides a good approximation to the observed relationship.

84
Pearson correlation
•  The strength and direction of a linear relationship between two
continuous random variables is measured by Pearson product
correlation coefficient:
•  Denoted by r (sample) and ρ (rho) (Population)
•  Ranges from -1 (perfect negative) to +1 (perfect positive correlation)
•  Value of 0 indicates no correlation
•  Value does not depend on measurement units
•  r² is the proportion of variance in variable 1 that is explained by
   variable 2 (or vice versa)

•  There is a significant positive correlation between age and height:
   r=0.99, N=12, p<0.0001
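A sketch of the age/height correlation, with invented values chosen to be nearly linear (not the actual N=12 infants from the slide):

```python
# Invented age (months) and height (cm) values.
from scipy import stats

age    = [18, 19, 20, 21, 22, 24, 25, 26, 27, 28, 29, 30]
height = [76.1, 77.0, 78.1, 78.2, 78.8, 79.7, 79.9, 81.1,
          81.2, 81.8, 82.8, 83.5]

r, p = stats.pearsonr(age, height)
print(round(r, 2), round(r**2, 2))  # r, and r^2 = proportion of variance explained
```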

85
86
Relationship between age and HbA1c
[Scatter plot: age (20–100) on the x-axis vs HbA1c baseline (5.0–15.0) on the
y-axis, with fitted regression line; R Sq Linear = 0.022]

Correlations

                                       age     HbA1c baseline
age              Pearson Correlation    1         -.148*
                 Sig. (2-tailed)                   .020
                 N                     253          245
HbA1c baseline   Pearson Correlation  -.148*         1
                 Sig. (2-tailed)       .020
                 N                      245         245
*. Correlation is significant at the 0.05 level (2-tailed).

•  There is a small negative relationship between age and HbA1c
   baseline levels (r=-0.15, N=245, p=0.02)
87
Pearson’s correlation: Assumptions
•  Measured on continuous scale
•  Linear relationship between the two variables
•  Bivariate normal distribution

•  Anytime you report a correlation, you should examine the
   scatterplot of the two variables to check
   "   for influential outliers
   "   for a linear relationship
   "   for bivariate normality

•  If you violate the assumptions of normality and linearity, or data
   are measured on a rank scale, you should consider a robust
   alternative:
   •  e.g. Spearman correlation

88
Spearman correlation
•  is a non-parametric measure of statistical dependence
between two variables
•  transforms the data into ranks
•  Ranks can be assigned to outcomes from continuous and
ordinal scales
•  is simply a Pearson’s correlation with ranked data
•  often denoted by the Greek letter ρ (rho) or as rs. The
   coefficient is more robust against outliers.
•  The coefficient is sensitive to any monotonic relationship, not
just linear ones.
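The "Pearson on ranks" point can be verified directly: ranking both variables and computing a Pearson correlation reproduces Spearman's coefficient. The data below are invented, chosen to be monotonic but non-linear:

```python
# y increases with x, but not linearly.
from scipy import stats

x = [2.1, 5.3, 0.7, 3.3, 8.8, 4.2, 6.0]
y = [1.0, 9.4, 0.2, 2.8, 30.1, 3.5, 12.0]

rho, p = stats.spearmanr(x, y)

# Spearman = Pearson applied to the ranked data
r_on_ranks, _ = stats.pearsonr(stats.rankdata(x), stats.rankdata(y))
print(rho, r_on_ranks)
```

Because this relationship is perfectly monotonic, rho is 1, while the Pearson correlation of the raw values is lower.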

89
Relationship between age and HbA1c
Spearman’s correlation
Correlations

HbA1c
age baseline
Spearman's rho age Correlation Coefficient 1.000 -.170**
Sig. (2-tailed) . .008
N 253 245
HbA1c baseline Correlation Coefficient -.170** 1.000
Sig. (2-tailed) .008 .
N 245 245
**. Correlation is significant at the 0.01 level (2-tailed).

•  There is a small negative relationship between age
   and HbA1c baseline levels (rs=-0.17, N=245, p=0.008)

90
Rank data: Alcohol consumption versus HbA1c
[Scatter plot: Alcohol consumption (0–8) on the x-axis vs HbA1c baseline
(5.0–15.0) on the y-axis; R Sq Linear = 6.72E-5]

Correlations

HbA1c Alcohol
baseline consumption
Spearman's rho HbA1c baseline Correlation Coefficient 1.000 .022
Sig. (2-tailed) . .730
N 245 245
Alcohol consumption Correlation Coefficient .022 1.000
Sig. (2-tailed) .730 .
N 245 253

•  There is no evidence of a relationship between alcohol
   consumption and HbA1c at baseline (Spearman's rs=0.022,
   N=245, p=0.73).
91
Summary: Association between two variables

Association between
two variables

Nominal Ordinal Quantitative:


(categorical without (categorical with Continuous
ranking) ranking) (and discrete)

Phi or Pearson’s
Pearson’s
contingency coefficient Spearman’s Rank
correlation
(more common: correlation
Odds ratio) 92
Outline
1.  Scale of data and descriptive and inferential statistics
2.  Comparing two independent groups
3.  Comparing two paired (dependent) groups
4.  Comparing more than two groups
5.  Association between two variables
6.  Outlook: Regression

93
6. Outlook: Correlation and regression

94
6. Outlook: Correlation and regression
•  Correlation quantifies the strength of a relationship between two
variables but does not describe the relationship.
•  It does not tell us how much one variable changes for each
   unit increase in the other!
•  Correlation does not assume a causal relationship between the two
variables.
•  If we assume that one variable y is dependent on another variable x
(causal relationship) we can use linear regression to describe the
effect of x on y.
•  Regression can be extended to include several independent
variables, for example:
"   HbA1c at baseline on age and alcohol consumption
•  or even a mix of categorical and continuous variables:
"   HbA1c at baseline on age, alcohol consumption, sex and
depression.
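As a preview of what regression adds, scipy's `linregress` returns the slope, i.e. the change in y per unit increase in x, which correlation alone does not give. The data below are invented; the variable names only echo the examples above:

```python
# Invented (age, HbA1c-like) pairs with a roughly linear downward trend.
from scipy import stats

x = [20, 30, 40, 50, 60, 70, 80]
y = [9.5, 9.2, 8.9, 8.7, 8.2, 8.1, 7.8]

result = stats.linregress(x, y)
# slope = estimated change in y per one-unit increase in x
print(round(result.slope, 4), round(result.intercept, 2), round(result.rvalue, 3))
```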

95
Literature
•  Book chapter: Stahl, D. & Leese, M. (2010) Research Methods and Statistics. In: Psychiatry
- An Evidence-Based Text for the MRCPsych Students. Hodder Arnold
Final proof copy in: R:\Applications\Courses\MSc book chapter

•  Agresti, A. and Finlay, B. (2009) Statistical Methods for the Social Sciences (4th edition),
   Pearson Prentice Hall.

•  Other books:
"   Mendenhall, W., Beaver, R. J. & Beaver, B. M. (2008). Introduction to Probability and Statistics,
Cengage Learning.

"  Peat, J & Barton, B. (2005) Medical Statistics: A Guide to Data Analysis and Critical Appraisal.
Mass.: Blackwell Publishing Ltd.

"  Campbell, M.J., Machin, d. & Walters, S.J. (2007) Medical Statistics: A Textbook for the Health
Sciences (Medical Statistics), 4th ed. Chichester: Wiley-Blackwell.

"  Petrie, A. & Sabin, C. (2009) Medical statistics at a glance 3rd ed. Oxford: Blackwell.

•  Software:
"   Kinnear, P.R. & Gray, C.D. (2010) SPSS 18 made simple. London: Psychology Press.

"  Andy Field’s web page at http://www.statisticshell.com/ provides a good resource for ANOVA (one
way, repeated, mixed, two way, posthoc comparisons) and regression and shows screenshots of
how to do the analyses in SPSS.

"  Koehler, U. & Kreuter, F. (2010) Data analysis using STATA. College Station: STATA Press 96
End of Session 2

Biostatistics Advisory Service Help Desk:


Book online at
http://www.iop.kcl.ac.uk/departments/?locator=334

Thank you!

97
