
MEDICAL BIOSTATISTICS

P.G. Student: Dr. Shwetali Gholap


Guide: Dr. Manjeet Santre Sir
Moderator: Dr. Netto Sir
Content

Descriptive statistics
Measures of central tendency
Choosing a sample
Concept of probability
Comparing groups
Parametric & Non-parametric tests
Confidence Interval
Why Learn about Statistics?
• Several studies have reported that the error rate in reporting
and/or interpreting statistics in the medical literature is
between 30% and 90% (Novak et al., 2006).
• Understanding basic statistical concepts makes us more
critical consumers of the medical literature.
• It also helps us produce better research and make better clinical decisions.
Definitions
• Biostatistics: the application of statistical methods to biological
topics.
Experimental design
Data collection
Analysis
Interpretation.
A broad division of statistics
• Descriptive statistics: involves summarizing data.

• Inferential statistics: involves drawing conclusions or
inferences from data, often by examining the relationship
between variables or samples.
• Variable: in any given study, you are trying to measure (or
evaluate) certain elements that change value depending on
certain factors. Variables may be independent (presumed
causes or grouping factors) or dependent (the measured outcomes).
Problem Flow Chart

Independent variables: ethnicity, marital status
        ↓
Dependent variable: suicidal tendencies

©drtamil@gmail.com 2012


Scale      Assumption
Nominal    Named categories (blood groups)
Ordinal    Ordered categories (severity)
Interval   Equal intervals (1993-94 & 2001-02)
Ratio      Meaningful zero (Kelvin scale of temperature)

Descriptive statistics
• 1. Measures of central tendency – that is, the “average” value
in a group.
• 2. Measures of variability – how much the individual values
vary from this “central point”.
Measures of central tendency

• Mean – the arithmetic mean
• Median – the central value
• Mode – the most recurring value
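As a minimal sketch, all three measures can be computed with Python's standard `statistics` module (the scores below are hypothetical illustration data, not from the slides):

```python
import statistics

scores = [4, 7, 7, 8, 10, 12, 15]  # hypothetical scores

mean = statistics.mean(scores)      # arithmetic mean  -> 9
median = statistics.median(scores)  # central value    -> 8
mode = statistics.mode(scores)      # most recurring   -> 7
```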
Measures Of variability
• Range – the difference between the highest and lowest values.
• Variance – a measure of how spread out a data set is,
calculated as the average squared deviation of each value
from the mean of the data set.
• Standard deviation – the square root of the variance.
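These three measures can be sketched with the standard library as follows (the data set is hypothetical, chosen so the arithmetic is easy to check by hand):

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical data, mean = 5

rng = max(data) - min(data)       # range: 9 - 2 = 7
var = statistics.pvariance(data)  # average squared deviation from the mean
sd = statistics.pstdev(data)      # square root of the variance
```

Here the squared deviations sum to 32 over 8 values, so the variance is 4 and the standard deviation is 2.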
Choosing a sample
• Suppose we are studying CBT for depression. The entire
group of patients with depression, who would be eligible to
receive CBT, would be the population.
• If we were to select a group of, say, 100
patients from among them, and study their
depression scores or offer them CBT, this would
be a sample.
Normal distribution
• Many values in populations are expected to
follow a certain type of distribution, in which
individual values are distributed symmetrically
around the mean.
• This is known as a normal distribution, or a bell
curve because of its appearance when charted.
The bell curve
Properties of the normal
distribution
• Symmetrical distribution about the mean
(skewness = 0)
• Mean = median = mode
• One standard deviation on either side of the mean
encompasses about 34.1% of values each, so the area
bounded by ±1 S.D. is about 68.3%.
• Similarly, the area between the mean and 2 S.D. on
one side is about 47.7%, so the area bounded
by ±2 S.D. is about 95.4%.
• 95% of the values lie approximately within
±1.96 S.D. of the mean.
• The tails of the curve are asymptotic – that is,
they keep coming closer to the x-axis but never
touch it.
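These areas can be verified with the standard normal distribution from Python's standard library (`statistics.NormalDist`, Python 3.8+):

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, s.d. 1

within_1sd = z.cdf(1) - z.cdf(-1)          # ~0.683, the "68%" area
within_2sd = z.cdf(2) - z.cdf(-2)          # ~0.954, the "95.4%" area
within_196sd = z.cdf(1.96) - z.cdf(-1.96)  # ~0.950
```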
Importance of the normal
distribution
• Many population values in psychiatry are
approximately normally distributed (for
example, IQ)
• Normal distribution lends itself to statistical
analysis by powerful methods known as
parametric tests.
“Abnormal” distributions
• Not all values may be normally distributed in a
population. For example, there are more
people of younger ages in India, and more
elderly people in England, so the distribution of
ages is not normal.
• Such distributions are known as skewed
distributions.
The concept of probability
• A probability value, or p value, measures how likely a
finding at least as extreme as the one observed would be
to arise by chance alone (that is, if the null hypothesis were true).
• Thus, p = 0.02 means that if there were truly no effect,
a result this extreme would occur by chance only 2% of the
time – so the finding is unlikely to be due to chance.
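A toy example of computing such a "chance alone" probability exactly: the probability of getting at least 8 heads in 10 flips of a fair coin (a result this lopsided would make us suspect the coin is not fair):

```python
from math import comb

# P(>= 8 heads in 10 fair-coin flips): how likely a result at
# least this extreme is under chance alone.
n, k = 10, 8
p = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n  # 56/1024 ~ 0.055
```

Note that this p is just above 0.05, so by the conventional threshold the result would not be declared statistically significant.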
Reporting the results of a trial:
• “Yoga therapy was compared to swimming
therapy in 80 patients with depression. Mean
decreases in depression scores were 9 in the
yoga therapy group and 6.8 in the swimming
therapy group. This was highly significant at p =
0.01. The authors suggest that yoga therapy be
considered as a useful modality in depression.”
Statistical significance
• In this example, the p value is a measure of
statistical significance – how likely the observed
difference would be if it were simply “due to chance”.
Here, p = 0.01 means that if yoga and swimming were
truly equivalent, a difference this large would be seen
in only about 1% of such trials.
• Note that this is not the same as saying it is 99% likely
that yoga is better: the p value describes the data under
the null hypothesis, not the probability that the
hypothesis is true.
• The lower the p value, the stronger the evidence
against the null hypothesis.
Magic values?
• Fisher (1925) suggested p = 0.05 (a 5% significance
level) as a convenient threshold for research in
biostatistics.
• However, this value is merely a convention and has
no intrinsic special properties.
• Sometimes a stricter threshold, p = 0.01, is used.
Comparing groups
• This is probably one of the commonest tasks in
psychiatric research.
• For example, we may want to compare scores
on a mania rating scale in two groups, one who
received lithium, and the other who received
valproate, to decide “who did better”.
Tests used in comparing groups
• Most commonly used tests (parametric tests)
assume that data is normally distributed.
• T test: compares a continuous variable
between two groups, or in the same group at
two points of time.
• ANOVA: compares a continuous variable across
more than two groups
                        Parametric                         Non-parametric
Assumed distribution    Normal                             Any
Assumed variance        Homogeneous                        Any
Typical data            Ratio or interval                  Ordinal or nominal
Example                 Height in inches: 72, 60.5, 54.7   Male/female
Data set relationships  Independent                        Any
Usual central measure   Mean                               Median

The t test
• The t test estimates the difference in means of a
variable between two groups.
• It assumes that the values are normally distributed,
and that the variability in both groups is equal.
• If the data are not normally distributed, they are
converted into ranks, and the Mann-Whitney U
test (nonparametric) is used instead.
Student’s T-Test
 Used to compare the means of two independent groups –
for example, comparing the mean Hb between
cases and controls. Two variables are involved here:
one quantitative (i.e. Hb) and the other a
dichotomous qualitative variable (i.e. case/control).
Examples: Student’s t-test
 Comparing the level of blood cholesterol
(mg/dL) between hypertensive and
normotensive patients.
 Comparing the HAMD scores of two groups of
psychiatric patients treated with two different
drugs (i.e. fluoxetine & sertraline).
Example – comparing cholesterol levels
 Hypertensive: mean 214.92, s.d. 39.22, n = 64
 Normal: mean 182.19, s.d. 37.26, n = 36
• Comparing the cholesterol level between hypertensive and normal
patients.
• H0: there is no difference in cholesterol level between hypertensive
and normal patients.
• p < 0.05, so the null hypothesis is rejected: there is a significant
difference in cholesterol level between hypertensive and normal patients.
• Hypertensive patients have a significantly higher cholesterol level.
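As a sketch, the t statistic can be computed directly from the summary statistics above. This uses Welch's formulation (which does not pool the variances); the slide reports only p < 0.05, and its analysis may instead have used the pooled Student's t, which gives a very similar value here:

```python
from math import sqrt

# Summary statistics from the slide
m1, s1, n1 = 214.92, 39.22, 64  # hypertensive
m2, s2, n2 = 182.19, 37.26, 36  # normal

# Welch's t statistic: difference in means over its standard error
t = (m1 - m2) / sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)  # ~4.14
```

A t of about 4.1 with roughly 75 degrees of freedom corresponds to p well below 0.05, consistent with the slide's conclusion.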
Paired T-test

1. Paired samples t-test for two related
samples (group A before / after therapy)
 Comparing the HAMD score between week 0
and week 6 of treatment with sertraline for a
group of psychiatric patients.
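A minimal sketch of the paired t statistic: each patient's week-6 score is subtracted from their week-0 score, and the mean difference is divided by its standard error. The scores below are hypothetical, invented purely for illustration:

```python
import statistics
from math import sqrt

# Hypothetical HAMD scores for the same six patients (illustrative only)
week0 = [24, 28, 22, 26, 30, 25]
week6 = [14, 18, 15, 16, 20, 17]

diffs = [a - b for a, b in zip(week0, week6)]  # per-patient improvement
n = len(diffs)
t = statistics.mean(diffs) / (statistics.stdev(diffs) / sqrt(n))
```

Because the pairing removes between-patient variability, the paired test is more sensitive than treating the two columns as independent groups.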
ANOVA
• When comparing more than 2 groups, the t-
test is not useful or practical.
• In this case, analysis of variance (ANOVA) is
used.
• Like the t-test, the ANOVA is parametric. If data
are not normally distributed, the Kruskal-Wallis
test can be used after ranking the data.
Principle of the ANOVA
• The ANOVA compares the variability between
and within groups, using the variance as a
measure.
• ANOVA yields a value called the F value.
• F = (between group variance) divided by
(within group variance)
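The F value above can be computed by hand. This is a minimal sketch of a one-way ANOVA on three small hypothetical groups (illustrative data only):

```python
import statistics

# Three hypothetical treatment groups (illustrative data)
groups = [[5, 6, 7], [8, 9, 10], [12, 13, 14]]

grand = statistics.mean([x for g in groups for x in g])
k = len(groups)                  # number of groups
n = sum(len(g) for g in groups)  # total observations

# Between-group and within-group sums of squares
ss_between = sum(len(g) * (statistics.mean(g) - grand) ** 2 for g in groups)
ss_within = sum((x - statistics.mean(g)) ** 2 for g in groups for x in g)

# F = between-group variance / within-group variance
f = (ss_between / (k - 1)) / (ss_within / (n - k))
```

Here the group means (6, 9, 13) are far apart relative to the spread within each group, so F is large and the group difference would be declared significant.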
Nonparametrics: the chi-square test
• Used for qualitative (categorical) data
• Based on frequencies
• Assumes all observations are independent
Application of chi square
• Goodness of fit of a distribution – how well does the assumed
theoretical distribution fit the observed data?
• Test of independence of attributes – to explain whether or not two
attributes are associated.
• Test of homogeneity – to test whether the occurrence of events
follows uniformity or not.
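A sketch of the test of independence on a 2×2 table: expected counts are derived from the row and column totals, and the statistic sums (observed − expected)² / expected over all cells. The counts are hypothetical illustration data:

```python
# Hypothetical 2x2 contingency table (e.g. exposure vs. outcome)
table = [[30, 10],
         [20, 40]]

row_totals = [sum(r) for r in table]
col_totals = [sum(c) for c in zip(*table)]
total = sum(row_totals)

# Sum of (observed - expected)^2 / expected over all four cells
chi2 = sum(
    (table[i][j] - row_totals[i] * col_totals[j] / total) ** 2
    / (row_totals[i] * col_totals[j] / total)
    for i in range(2) for j in range(2)
)
```

With 1 degree of freedom, a chi-square value this large (about 16.7) is far beyond the 0.05 critical value of 3.84, so the two attributes would be judged associated.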
CORRELATION/CORRELATION ANALYSIS

Correlation is the tool we use when we want to find out
whether a relationship exists between the two variables
(bivariate data) under study.

Correlation analysis: the methods and techniques used for
studying and measuring the extent of the relationship
between two variables.
FIRST, UNDERSTAND THE TERM BIVARIATE
These examples of bivariate distributions will clear your concept:
• In a class: marks obtained in two subjects by all 60 students
• In a field: the height of, and number of flowers on, each of 10 plants

S.No.  Height of plant  Flowers on plant
1      4                12
2      3                10
3      4                13
4      5                15
5      5                16
6      4                11
7      6                18
8      3                9
9      5                14
10     4                12
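The strength of the relationship in the plant data above can be quantified with Pearson's correlation coefficient, computed here from first principles as a sketch:

```python
from math import sqrt

# The ten plants from the table: height and number of flowers
height = [4, 3, 4, 5, 5, 4, 6, 3, 5, 4]
flowers = [12, 10, 13, 15, 16, 11, 18, 9, 14, 12]

n = len(height)
mh = sum(height) / n
mf = sum(flowers) / n

# Pearson r: covariance over the product of the standard deviations
num = sum((h - mh) * (f - mf) for h, f in zip(height, flowers))
den = sqrt(sum((h - mh) ** 2 for h in height)
           * sum((f - mf) ** 2 for f in flowers))
r = num / den  # ~0.97: a strong positive correlation
```

Taller plants clearly carry more flowers, which is what the scatter of the data suggests visually.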
[Scatter chart: height of plant vs. flowers on plant]
TYPES OF CORRELATION

• Analytical: positive or negative
• Graphical: linear or non-linear
POSITIVE CORRELATION
The two variables move in the same direction, e.g.:
• Turbidity in a culture and OD
• Concentration of antibiotic and zone of clearance

NEGATIVE CORRELATION
The two variables move in opposite directions, e.g.:
• Volume and pressure of a gas
• Demand for grain and its price
LINEAR CORRELATION
This correlation is categorized by its graphical
representation: if the plot is a straight line, the
correlation is said to be linear.
A unit change in one variable results in a corresponding
constant change in the other variable over the entire range of values:
• e.g.
  X:  2   4   6   8  10
  Y:  7  13  19  25  31
[Linear correlation graph: Y plotted against X gives a straight line]
NON-LINEAR CORRELATION
The relation between two variables is non-linear if,
corresponding to a unit change in one variable, the other
variable does not change at a constant rate but at a
fluctuating rate, so the graph is not a straight line.
[Non-linear correlation graph: Y plotted against X gives a curve, not a straight line]
REGRESSION
If the two variables are significantly correlated, and if there is some
theoretical basis for doing so, it is possible to predict the value of one
variable from the other. This method of analysis is called regression
analysis: the estimation or prediction of the unknown value of one
variable from the known value of the other.

M. M. Blair described regression analysis as “a mathematical measure of
the average relationship between two or more variables in terms of the
original units of the data”.
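A sketch of simple linear regression by least squares, fitted to the X/Y data from the linear-correlation slide. Once the slope and intercept are estimated, a known x predicts the unknown y:

```python
# The perfectly linear X/Y data from the linear-correlation slide
x = [2, 4, 6, 8, 10]
y = [7, 13, 19, 25, 31]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares estimates of slope and intercept
slope = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
intercept = my - slope * mx

# The fitted line is y = 3x + 1, so e.g. x = 12 predicts y = 37
predicted = slope * 12 + intercept
```

Because this data is exactly linear, the regression line passes through every point; with real data the line is the best fit in the least-squares sense.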
References

• Goyal RC. Research Methodology for Health Professionals.
• Harris M, Taylor G. Medical Statistics Made Easy.
• Mansfield L. CME review article.

THANK YOU…!!
