
Medical Statistics

It is an applied science that provides a
scientific framework for the collection,
summarization, analysis, presentation and
interpretation of medical/health data.
Need for Statistics in Medicine
Ever present variation in health
measurements. Statistical tools are used to
sort out these variations.
Sources of variation in
measurements
Inherent Biological differences
Instruments used
Observers
Subjects
Environment
Note that there are variations in the :

Physiological
Anatomical
Biochemical and
Physical Characteristics of individuals
Statistics helps to sort out and explain these
variations
Type of variations
Random (inherent) cannot be controlled
Systematic (measurements) can be
controlled
Measurements in Health
Expression of any health/medical
phenomenon in numbers or categories, to
give its size, extent or magnitude
Impressions
How to measure
Use of instrument
Questionnaire
Lab equip
Senses
Clinical equip
Level of Measurements
Nominal
Ordinal
Interval
Ratio
What is data?
Information
Types of Data:
Quantitative data
Qualitative/categorical data
Sources of data
Routine Sources
Ad-hoc sources
Statistical Tools
Descriptive statistics
Inferential Statistics
Descriptive Statistics
Statistical Methods that deal with description of
characteristics(s) of a finite population
Methods of Descriptive Statistics

Frequency Tables
Diagrams (Graphs/charts)
Summary Indices
Frequency Tables

Arrangement of data by rows & columns


Qualities
Simple Information (not more than 3 variables)
Clear title to indicate what? when? where?
Good labeling of rows & columns
Indicate units of measurements
Row, column & grand total to add up
Diagrams

Quantitative or numerical Data


Histogram, Frequency Polygon
Qualitative or categorical data
Bar , Pie chart
SUMMARY INDICES
Measures of Central Tendency
Measures of Dispersion
Measures of Central Tendency

Arithmetic Mean
Median
Mode
Measures of Dispersion

Range
Interquartile range
Variance
Standard deviation
Coefficient of variation
Percentiles
Quantiles
1. Frequency Distribution Table.
- Useful to summarize data.
- Has two main columns.
- Column 1 lists all values of the variable.
- Column 2 the frequency at which each value occurs.
- For initial data exploration
2. Qualitative Variables - Frequency Tables.
- 1st column: different categories of the variable
(mutually exclusive).
- 2nd column: frequency or count with which each
category occurred.
1. Frequency distribution of continuous variables.
- The variable is quantitative.
- Need to put values into subgroups or classes.
- Find lowest value and highest value.
- Subtract the lowest value from the highest value to get the
range.
- Divide the range by the number of classes or subgroups
to get the class width.
- Class limits must not overlap.
- Intervals should preferably be equal.
- Number of subgroups or classes between 6 and 8.
- May apply Sturges' rule: k = 1 + 3.3 log10(n) (n =
sample size).
Example (use the non-overlapping limits on the left, NOT the overlapping limits on the right):
5 - 9      NOT   5 - 10
10 - 14          10 - 15
15 - 19          15 - 20
20 - 24          20 - 25
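The class-building steps above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, the data are assumed to be whole numbers, and rounding k to the nearest integer and the width upwards are assumptions rather than part of the rule itself.

```python
import math

def sturges_classes(n):
    """Suggested number of classes, k = 1 + 3.3 * log10(n), rounded (assumption)."""
    return round(1 + 3.3 * math.log10(n))

def class_limits(lowest, highest, k):
    """Build non-overlapping integer class limits covering lowest..highest."""
    # Class width = range / number of classes, rounded up so all values fit.
    width = math.ceil((highest - lowest) / k)
    limits = []
    lower = lowest
    while lower <= highest:
        # Upper limit stops one unit short of the next lower limit: no overlap.
        limits.append((lower, lower + width - 1))
        lower += width
    return limits

k_for_100 = sturges_classes(100)       # 1 + 3.3 * 2 = 7.6, rounded
example_limits = class_limits(5, 24, 4)
print(k_for_100, example_limits)
```

With a lowest value of 5, a highest value of 24 and 4 classes, the sketch reproduces the non-overlapping limits of the example (5-9, 10-14, 15-19, 20-24).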
Other columns of a frequency Table
(One) Relative Frequency
- Proportion of total observations ascribed to
that value
- Divide frequency in the class interval by total
observation.
(Two) Cumulative Frequency
- Proportion of total observations with certain
value or less.
- Must correspond to end of class interval.
Add up relative frequencies to preceding values.
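As a small illustration of these extra columns, the sketch below builds relative and cumulative frequencies for the class frequencies 10, 14, 12, 20, 10, 14 used in the marks example later in these notes. The variable names are illustrative only.

```python
from itertools import accumulate

classes = ["60-64", "65-69", "70-74", "75-79", "80-84", "85-89"]
frequencies = [10, 14, 12, 20, 10, 14]

total = sum(frequencies)                               # total observations
relative = [f / total for f in frequencies]            # frequency / total
cumulative = list(accumulate(frequencies))             # running count up to end of class
cumulative_relative = [c / total for c in cumulative]  # running proportion

for row in zip(classes, frequencies, relative, cumulative, cumulative_relative):
    print("{:<6} {:>4} {:>7.3f} {:>4} {:>7.3f}".format(*row))
```

The last cumulative entry should always equal the total (or 1 for the cumulative relative frequency), which is a quick check on the table.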
Graphical Presentation
1. Need for
- To aid the eye
- Diagrams make a better visual impression than
numbers.
2. General format.
- Plotted on rectangular co-ordinate axes at right
angles to each other (X and Y).
- Horizontal line (independent variable).
- Vertical line (dependent variable).
- Must have clear title.
- Label axes and add units.
1. Type of Diagram
- Depend on type of variable
- Qualitative or quantitative
2. Qualitative Variables
- Bar chart
- Pie chart
- Pictogram
1. Bar Chart
- Slender rectangles to represent frequency of values
of variable.
- Rectangles are separate and distinct
- Height of rectangle corresponds to frequency
Example: The following are reasons given by some physicians in
Riyadh for not smoking; 1409H.
Reasons Frequency
Health 25
Religious 15
Social 12
Profession 5
Others 3

BAR CHART
[Figure: bar chart of frequencies (vertical axis, 0 to 25) by reason (horizontal axis: Health, Religious, Social, Profession, Others)]
Pie Chart
- Use to show the components of a total
- More intelligent visual impressions
sometimes.
- Draw a big circle to represent total
observation.
- Divide circle into sectors according to the
frequency of each attribute.
- Use (n/N) x 360° to get the angle of each sector.
- Shade sectors in different colours to
distinguish.
Example: the last example on the bar chart.
- Total number of physicians is 60.
- Corresponding degrees in the pie chart are:
Reasons      Frequency   (n/N) x 360°      Degrees
Health       25          (25/60) x 360°    150°
Religious    15          (15/60) x 360°    90°
Social       12          (12/60) x 360°    72°
Profession   5           (5/60) x 360°     30°
Others       3           (3/60) x 360°     18°
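The sector angles can be checked with a short Python sketch using the frequencies from the bar chart example. The dictionary and variable names are illustrative only.

```python
# Frequencies of reasons for not smoking, from the bar chart example.
reasons = {"Health": 25, "Religious": 15, "Social": 12, "Profession": 5, "Others": 3}

total = sum(reasons.values())  # N = 60 physicians

# Sector angle = (n / N) x 360 degrees; percentage = (n / N) x 100.
angles = {name: n * 360 / total for name, n in reasons.items()}
percentages = {name: n * 100 / total for name, n in reasons.items()}

for name in reasons:
    print(f"{name:<11}{angles[name]:6.1f} degrees {percentages[name]:6.1f}%")
```

The angles should add up to 360 degrees, which is a useful check before drawing the chart.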
PIE CHART
[Figure: pie chart of reasons for not smoking. Health 42%, Religious 25%, Social 20%, Profession 8%, Others 5%]
1. Quantitative Variable
- Use histogram
- Frequency polygon
2. Histogram
- Used to show data on interval or continuous
variables
- Slender rectangles adjoin each other
- The area of each rectangle conveys the frequency of its class
- Give an appropriate title, and label the axes.
1. Example:
Represent the data on the age distribution of adult admissions
into UCH between 1985 and 1991.
Age (years) Frequency
10-19 1697
20-29 2787
30-39 2390
40-49 2445
50-59 2377
60-69 1989
70-79 1514
HISTOGRAM
[Figure: histogram of the frequency of admissions (vertical axis, 0 to 3000) by age in years (horizontal axis: 10-19, 20-29, 30-39, 40-49, 50-59, 60-69, 70-79)]
Arithmetic Mean
Most useful measure of central
tendency.
Not good when data is skewed.
Calculation (2 steps).
Add all observations.
Divide by number of observations.
Mean = Sum of all observations / Number of observations

$\bar{x} = \frac{\sum x_i}{n}$
Example:
Find the mean
Age of the first 10 1st year clinical students in
U.I 23, 19, 21, 20, 23, 21, 22, 24, 22, 22

$\bar{x} = \frac{23 + 19 + 21 + 20 + 23 + 21 + 22 + 24 + 22 + 22}{10} = \frac{217}{10}$

= 21.7 years
Median
Best measure of central tendency when
data is skewed.
Calculation (ungrouped data, 2 steps).
Arrange observations in ascending or
descending order.
Pick the observation in the middle as the median.
Note: If the number of observations is even,
take the mean of the two middle observations.
EXAMPLE ON MEDIAN
Find the Median of the first 10 1st year clinical
students in U.I 23, 19, 21, 20, 23, 21, 22, 24,
22, 22

Step 1: Arrange in ascending order

19, 20, 21, 21, 22, 22, 22, 23, 23, 24
Step 2: Pick the middle observation. With 10 observations,
take the mean of the 5th and 6th values:
(22 + 22)/2
= 22
MODE

Least used measure of central


tendency.
The observation that occurs most
frequently.
Example on Mode

In the age of 10 medical students


23, 19, 21, 20, 23, 21, 22, 24,
22, 22
The mode is 22
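The three worked examples (mean, median and mode of the ages of the 10 students) can be reproduced with Python's standard `statistics` module. This is a minimal sketch; the variable names are illustrative.

```python
import statistics

# Ages of the first 10 1st year clinical students, from the examples above.
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

mean_age = statistics.mean(ages)      # sum of observations / number of observations
median_age = statistics.median(ages)  # mean of the two middle values when n is even
mode_age = statistics.mode(ages)      # most frequently occurring observation

print(mean_age, median_age, mode_age)
```

Here the mean (21.7) is slightly below the median and mode (both 22), which is what the hand calculations show.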
PROCEDURE FOR CALCULATING
MEAN (Grouped data)
(i) Find the class mid-mark for each interval.
(ii) Multiply the class mid-mark in each interval
by the corresponding frequency.
(iii) Add the results in (ii) across all intervals.
(iv) Divide the result in (iii) by the number of
observations or total frequency.
Mean - (Grouped Data)
Example:
Marks of students in practical 1 1403H.
Marks    Frequency (f_i)   Class mid-mark (x_i)   f_i x_i
60-64    10                62.5                   10*62.5
65-69    14                67.5                   14*67.5
70-74    12                72.5                   12*72.5
75-79    20                77.5                   20*77.5
80-84    10                82.5                   10*82.5
85-89    14                87.5                   14*87.5
Total    $\sum f_i = 80$                          $\sum f_i x_i = 6040$

$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$

$\bar{x}$ = 6040/80
= 75.5
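The grouped-mean procedure can be sketched in Python as follows, using the mid-marks and frequencies from the table. The function name `grouped_mean` is hypothetical.

```python
def grouped_mean(mid_marks, frequencies):
    """Mean of grouped data: sum(f_i * x_i) / sum(f_i)."""
    total_fx = sum(f * x for f, x in zip(frequencies, mid_marks))
    total_f = sum(frequencies)
    return total_fx / total_f

mid_marks = [62.5, 67.5, 72.5, 77.5, 82.5, 87.5]
frequencies = [10, 14, 12, 20, 10, 14]

sum_fx = sum(f * x for f, x in zip(frequencies, mid_marks))  # 6040.0
marks_mean = grouped_mean(mid_marks, frequencies)            # 6040 / 80
print(sum_fx, marks_mean)
```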
Median (Grouped Data)
Use the last example.
Marks    Frequency   Cumulative Frequency (F)
60-64    10          10
65-69    14          24
70-74    12          36
75-79    20          56
80-84    10          66
85-89    14          80
Calculation of Median
Sample size (n) = 80
Median position (n/2) = 40th
Median class = 75-79
Lower boundary (bL) = 75 (for median class)
Frequency in median class (fmed) = 20
Cumulative frequency below median class (F) = 36
Class width (c) = 5
Apply formula:

$\text{Median} = b_L + \frac{n/2 - F}{f_{med}} \times c$

$= 75 + \frac{40 - 36}{20} \times 5 = 76$ marks
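A minimal Python sketch of the grouped-median formula is given below. It assumes equal class widths and uses the lower boundaries 60, 65, ..., 85 implied by the example. The function name is hypothetical.

```python
def grouped_median(lower_boundaries, frequencies, width):
    """Median = bL + ((n/2 - F) / f_med) * c for equal-width classes."""
    n = sum(frequencies)
    half = n / 2
    below = 0  # F: cumulative frequency below the current class
    for b_l, f in zip(lower_boundaries, frequencies):
        # The median class is the first class whose cumulative frequency reaches n/2.
        if below + f >= half:
            return b_l + (half - below) / f * width
        below += f
    raise ValueError("frequencies must be non-empty and positive")

marks_median = grouped_median(
    lower_boundaries=[60, 65, 70, 75, 80, 85],
    frequencies=[10, 14, 12, 20, 10, 14],
    width=5,
)
print(marks_median)
```

For the marks data this locates the median class 75-79 (F = 36, f = 20) and gives 75 + (40 - 36)/20 x 5 = 76.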
Measures of Dispersion

Candidates are:
Range
Interquartile range
Variance
Standard deviations
Coefficient of variation
Percentiles
Quantiles
Range
Difference between lowest and
highest values.
Rely on only 2 extreme values.
Easy to calculate
Quartiles
Values that divide the ordered observations into
4 equal parts.
1st quartile is the value below which 1/4 of
the observations lie.
1st quartile is equivalent to the 25th
percentile.
2nd quartile is equivalent to the median or 50th
percentile.
3rd quartile is the value above which 1/4 of
the ordered observations lie.
Interquartile Range.

Difference between the 3rd quartile and the 1st
quartile.
Concentrates on the middle 50% of the
ordered observations.
Not affected by outliers.
Percentiles.
Values that divide the ordered observations into
100 equal parts.
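As an illustration, the range, quartiles and interquartile range of the students' ages from the earlier examples can be computed with `statistics.quantiles` (Python 3.8 or later). Note that software packages use slightly different interpolation conventions for quartiles, so the 1st and 3rd quartiles may differ a little from a hand calculation. The default method of the module is assumed here.

```python
import statistics

ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

value_range = max(ages) - min(ages)          # highest value minus lowest value

# n=4 cuts the ordered data into 4 equal parts and returns the 3 cut points.
q1, q2, q3 = statistics.quantiles(ages, n=4)
iqr = q3 - q1                                # spread of the middle 50%

print(value_range, q1, q2, q3, iqr)
```

The 2nd quartile returned here is the median (22), as stated above.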
Variance

Mean of the squared deviations from the
mean value.
Square of the standard deviation.
Units of measurement are the square of the
original units.

$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$
Standard Deviation

Square root of variance.


Best measure of variation or
dispersion.
Unit same as original units.
Standard Deviation - Practical Example
The formula is this:

$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Example on standard deviation:
The numbers of crises experienced by 5 sickle cell
patients in a year are 3, 0, 2, 1, 4.
Find the mean, variance and standard deviation.

Mean: $\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 0 + 2 + 1 + 4}{5} = 2.0$

Variance: $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(3-2)^2 + (0-2)^2 + (2-2)^2 + (1-2)^2 + (4-2)^2}{5 - 1}$

$= \frac{1 + 4 + 0 + 1 + 4}{4} = \frac{10}{4}$

= 2.5

Standard deviation: $S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} = \sqrt{\frac{10}{4}} = \sqrt{2.5}$

= 1.58

Note calculators are available to do this.
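For instance, the sickle cell example can be checked in Python. The sketch below computes the sample variance step by step with the n - 1 divisor and compares it with the standard `statistics` module. Variable names are illustrative.

```python
import math
import statistics

crises = [3, 0, 2, 1, 4]  # number of crises per patient, from the example

n = len(crises)
mean = sum(crises) / n                                # 2.0
squared_deviations = [(x - mean) ** 2 for x in crises]
variance = sum(squared_deviations) / (n - 1)          # divisor n - 1
sd = math.sqrt(variance)

# The library functions use the same n - 1 (sample) definitions.
library_variance = statistics.variance(crises)
library_sd = statistics.stdev(crises)

print(mean, variance, round(sd, 2))
```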


Standard Deviation for Grouped
data.

Sometimes data is presented in a
frequency table.
You can still calculate the measure
of dispersion.

$SD = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}}$

Where f_i = frequency of observations in each class
x_i = class mid-mark
$\bar{x}$ = mean
Example:
The frequency distribution of the
weight of 100 patients with
Rheumatoid Arthritis is as
follows:

Weight (kgs) Frequency Class-Mid-Mark


60 - 69 5 65

70 - 79 15 75

80 - 89 20 85

90 - 99 25 95

100 - 109 20 105

110 - 119 15 115

Calculate the mean, variance and standard deviation


SOLUTION
Mean: $\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{5(65) + 15(75) + 20(85) + 25(95) + 20(105) + 15(115)}{100}$

$= \frac{325 + 1125 + 1700 + 2375 + 2100 + 1725}{100} = \frac{9350}{100}$

= 93.5 kg

Variance: $S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1} = \frac{5(65 - 93.5)^2 + \dots + 15(115 - 93.5)^2}{100 - 1}$

$= \frac{20275}{99} \approx 204.8$ kg²

Standard deviation: $S = \sqrt{\frac{20275}{99}} \approx 14.31$ kg
Coefficient of Variation.
- Reduces the measure of dispersion to a dimensionless
quantity.
- Calculated by dividing the standard deviation by the
mean value.
- Express the result as a percentage.
- Useful to compare variation between 2 variables
not in the same unit.

Calculate the coefficient of variation of the
weights of the subjects in the rheumatoid arthritis example.

SD = 14.31 kg
Mean = 93.5 kg

COV = (14.31 / 93.5) x 100 = 15.3% (a)

For the number of crises in the sickle cell example:

SD = 1.58
Mean = 2.0
COV = (1.58 / 2.0) x 100 = 79.0% (b)

Which of the variables has the higher variability?
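Because these calculations are tedious by hand, it is worth recomputing them directly from the data and comparing the output with the hand calculation. The sketch below does this for the weight table and the sickle cell data. The function names are hypothetical, and the class mid-marks are those given in the table.

```python
import math
import statistics

def grouped_summary(mid_marks, frequencies):
    """Mean, variance and SD of grouped data, with divisor (sum of f_i) - 1."""
    n = sum(frequencies)
    mean = sum(f * x for f, x in zip(frequencies, mid_marks)) / n
    sum_sq = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, mid_marks))
    variance = sum_sq / (n - 1)
    return mean, variance, math.sqrt(variance)

def coefficient_of_variation(sd, mean):
    """SD as a percentage of the mean (dimensionless)."""
    return sd / mean * 100

# Weights (kg) of 100 patients with rheumatoid arthritis.
w_mean, w_var, w_sd = grouped_summary(
    mid_marks=[65, 75, 85, 95, 105, 115],
    frequencies=[5, 15, 20, 25, 20, 15],
)
cov_weight = coefficient_of_variation(w_sd, w_mean)

# Number of crises in 5 sickle cell patients.
crises = [3, 0, 2, 1, 4]
cov_crises = coefficient_of_variation(statistics.stdev(crises), statistics.mean(crises))

print(round(w_mean, 1), round(w_var, 1), round(w_sd, 2))
print(round(cov_weight, 1), round(cov_crises, 1))
```

The coefficient of variation lets the two variables be compared even though one is in kilograms and the other is a count.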


Usefulness of mean and standard deviations.
- Useful to summarise data measured on at
least interval scales.
- For mathematical description of the
distribution of biological, biochemical,
haematological and physiological variables.
- Most of the values of these variables appear
in the middle of the distributions and have
symmetric distributions.
- Sometimes tail at one end more prominent
than tail on the other - skewed.
- Skewed distributions are asymmetric but
unimodal e.g. hemoglobin.
Distributions change with characteristics of subjects
like age, sex or nutrition.
Skewed Distributions.
- Positive if mean is greater than median.
- Negative if mean is less than median.
Qualitative Variables.
- Summarise by proportions with positive
attributes.
- Express in percentages.

Proportion: $p = \frac{r}{n}$ (r = number with the attribute, n = total sample)

Standard deviation $= \sqrt{\frac{p(1 - p)}{n}}$
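As a rough illustration, the sketch below applies these two formulas to the physicians example from earlier in the notes, where 25 of the 60 physicians gave health as their reason for not smoking. The function name is hypothetical.

```python
import math

def proportion_summary(r, n):
    """Proportion p = r / n and its standard deviation sqrt(p * (1 - p) / n)."""
    p = r / n
    sd = math.sqrt(p * (1 - p) / n)
    return p, sd

# 25 of 60 physicians cited health (from the bar chart example).
p_health, sd_health = proportion_summary(25, 60)

print(f"p = {p_health:.3f} ({p_health * 100:.1f}%), SD = {sd_health:.3f}")
```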
These statistics are better calculated using a
computer.
Normal and Sampling Distributions.

1. Frequency distribution of continuous variables.


- Usual or typical feature is for observations on most
biological variables to concentrate or cluster
around the central value.
- Fewer observations are observed as one moves
away from the central value to the tails.
- Carl Friedrich Gauss described a model that completely
describes the shape of this distribution.
- Today it is called a Gaussian distribution or Normal
distribution.
- It occupies a central role in statistical inference.
1. Properties of Normal Distribution.
- Bell shaped and symmetric about central value.
- Completely determined by its mean and standard
deviation.
- Mean, median and mode have same value.
- Total area under the curve is 1 (100%).
- 68% of all observations lie within one standard
deviations of the mean value.
- 95% of observations lie within 1.96 standard
deviations of the mean value.
- 99% of all observations lie within 2.58 standard
deviations.
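The 68%, 95% and 99% figures can be checked numerically with `statistics.NormalDist` from the Python standard library (Python 3.8 or later). This is only a sketch, and the helper name `area_within` is hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist(mu=0, sigma=1)  # standard normal curve

def area_within(z):
    """Area under the standard normal curve between -z and +z."""
    return std_normal.cdf(z) - std_normal.cdf(-z)

within_1 = area_within(1.0)      # about 68%
within_1_96 = area_within(1.96)  # about 95%
within_2_58 = area_within(2.58)  # about 99%

print(round(within_1, 4), round(within_1_96, 4), round(within_2_58, 4))
```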
Presentation of Normal Distribution.

As a mathematical equation
Graph
Table
1. Mathematical Equation

$y = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$

$\pi$ and e are constants
$\mu$ is the arithmetic mean
$\sigma$ is the standard deviation
Graph
Table of Area
Areas under a standard normal curve
Gives probability of falling within an
interval.
Standard normal curve has a mean = 0
and standard deviation = 1
Need to transform data to standard
normal curve to use this table.
1. Transformation to standard Normal Curve.
- Use $Z = \frac{x - \mu}{\sigma}$

Z is the standardized normal deviate or normal score.
- Read the corresponding area from the table.
- Z is in the 1st column of the table.
- The area is in the body of the table.
1. Example:
If the mean age of onset of diabetes mellitus is 28 years
with a standard deviation of 3 years, what is the
probability that the age of onset of a subject from the
population is 32 years or above?
Solution:

Find the area to the right of 32 years.

$Z = \frac{x - \mu}{\sigma} = \frac{32 - 28}{3} = \frac{4}{3} = 1.33$

The area is 0.5 - (area between Z = 0 and Z = 1.33, read from the table)
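In place of the printed table, the same probability can be obtained numerically. The sketch below uses `statistics.NormalDist` (Python 3.8 or later) with the mean of 28 years and standard deviation of 3 years from the example. The variable names are illustrative.

```python
from statistics import NormalDist

onset = NormalDist(mu=28, sigma=3)  # age of onset, as given in the example

z = (32 - 28) / 3                   # standardized normal deviate

# Probability of onset at 32 years or above = area to the right of 32.
p_32_or_above = 1 - onset.cdf(32)

# Same answer via the standard normal curve and the Z score.
p_via_z = 1 - NormalDist().cdf(z)

print(round(z, 2), round(p_32_or_above, 3))
```

The result is a little over 9%, so onset at 32 years or above is uncommon but not rare under this model.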


1. Importance of the normal distribution.

- Fits many practical distributions of variables in


Medicine.
- If variables are not normally distributed,
transformation techniques to make them normal exist.
- Sampling distributions of means and proportions are
known to have normal distributions.
- Binomial distributions can be approximated by a
normal distribution, if the sample size is large and the
probability of a success is not small.
- It is the cornerstone of all parametric tests of
statistical significance.
