Descriptive Statistics

Medical Statistics
Medical statistics is an applied science that provides a scientific framework for the collection, summarization, analysis, presentation and interpretation of medical/health data.
Need for Statistics in Medicine
Health measurements always show variation. Statistical tools are used to sort out these variations.
Sources of variation in measurements
Inherent Biological differences
Instruments used
Observers
Subjects
Environment
Note that individuals vary in their:
Physiological
Anatomical
Biochemical and
Physical characteristics
Statistics helps to sort out and explain these variations.
Types of variation
Random (inherent): cannot be controlled
Systematic (measurement): can be controlled
Measurements in Health
Expression of any health/medical phenomenon in numbers or categories, to give its size, extent or magnitude
Impressions
How to measure
Use of instruments:
Questionnaire
Laboratory equipment
Senses
Clinical equipment
Levels of Measurement
Nominal
Ordinal
Interval
Ratio
What is data?
Data are information, e.g. recorded observations or measurements.
Types of Data:
Quantitative data
Qualitative/categorical data
Sources of data
Routine Sources
Ad-hoc sources
Statistical Tools
Descriptive statistics
Inferential Statistics
Descriptive Statistics
Statistical methods that deal with the description of the characteristic(s) of a finite population
Methods of Descriptive Statistics
Frequency Tables
Diagrams (Graphs/charts)
Summary Indices
Frequency Tables
Arrangement of data by rows & columns

Qualities of a good table
Simple information (not more than 3 variables)
Clear title indicating what, when and where
Good labelling of rows and columns
Units of measurement indicated
Row, column and grand totals add up
Diagrams
Quantitative or numerical data: histogram, frequency polygon
Qualitative or categorical data: bar chart, pie chart
SUMMARY INDICES
Measures of Central Tendency
Measures of Dispersion
Measures of Central Tendency
Arithmetic Mean
Median
Mode
Measures of Dispersion
Range
Interquartile range
Variance
Standard deviation
Coefficient of variation
Percentiles
Quantiles
1. Frequency Distribution Table
- Useful to summarize data.
- Has two main columns:
- Column 1 lists all values of the variable.
- Column 2 lists the frequency with which each value occurs.
- Used for initial data exploration.
2. Qualitative Variables: Frequency Tables
- 1st column: the different categories of the variable (mutually exclusive).
- 2nd column: the frequency (count) with which each category occurred.
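As a minimal sketch (Python is chosen here only for illustration), a frequency table amounts to counting how often each value or category occurs. The data are the ages of ten clinical students used in the worked examples of these notes; the function name `frequency_table` is hypothetical. The same function works for category labels such as "Health" or "Social".

```python
from collections import Counter

# Ages (years) of ten 1st-year clinical students, from the worked examples in these notes
ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]


def frequency_table(values):
    """Return (value, frequency) pairs sorted by value: column 1 and column 2 of the table."""
    counts = Counter(values)  # works for numbers or category labels alike
    return sorted(counts.items())


table = frequency_table(ages)
print("Value  Frequency")
for value, freq in table:
    print(f"{value:>5}  {freq:>9}")
```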
1. Frequency distribution of continuous variables
- The variable is quantitative.
- Values need to be put into subgroups or classes.
- Find the lowest value and the highest value.
- Subtract the lowest value from the highest value to get the range.
- Divide the range by the number of classes (subgroups) to get the class width.
- Class limits must not overlap.
- Intervals should preferably be equal.
- Use between 6 and 8 subgroups or classes.
- Alternatively, apply Sturges' rule: $k = 1 + 3.3\log_{10} n$, where $k$ is the number of classes and $n$ is the sample size.
Example:
Correct (non-overlapping)   NOT (overlapping)
5 - 9                       5 - 10
10 - 14                     10 - 15
15 - 19                     15 - 20
20 - 24                     20 - 25
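The class-construction steps above can be sketched as follows. This is a hedged illustration: the function names are hypothetical, rounding Sturges' k to the nearest whole number and rounding the class width up (so every value fits) are assumptions, not rules stated in these notes.

```python
import math


def sturges_classes(n):
    """Suggested number of classes, k = 1 + 3.3 * log10(n), rounded to the nearest whole number."""
    return math.floor(1 + 3.3 * math.log10(n) + 0.5)


def class_width(lowest, highest, k):
    """Range divided by the number of classes, rounded up (an assumption) so every value fits."""
    return math.ceil((highest - lowest) / k)


def class_limits(lowest, width, k):
    """Non-overlapping integer class limits, e.g. 5-9, 10-14, 15-19, ..."""
    return [(lowest + i * width, lowest + (i + 1) * width - 1) for i in range(k)]


print(sturges_classes(100))  # 1 + 3.3 * 2 = 7.6, rounded to 8
print(class_limits(5, class_width(5, 24, 4), 4))
```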
Other columns of a frequency table
(1) Relative frequency
- The proportion of the total observations ascribed to that value.
- Divide the frequency in the class interval by the total number of observations.
(2) Cumulative frequency
- The proportion of the total observations with a given value or less.
- Must correspond to the end of the class interval.
- Obtained by adding each relative frequency to the sum of the preceding ones.
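A minimal sketch of the extra columns, using the practical-marks frequencies that appear in the grouped-data examples of these notes. Both the cumulative count and the cumulative relative frequency are shown; the variable names are illustrative only.

```python
from itertools import accumulate

# Practical marks used in the grouped-data examples of these notes
classes = ["60-64", "65-69", "70-74", "75-79", "80-84", "85-89"]
freqs = [10, 14, 12, 20, 10, 14]

total = sum(freqs)
relative = [f / total for f in freqs]              # frequency / total observations
cumulative = list(accumulate(freqs))               # running count up to the end of each class
cumulative_relative = list(accumulate(relative))   # running sum of relative frequencies

print("Class   f   Rel.f  Cum.f  Cum.rel.f")
for c, f, r, cf, cr in zip(classes, freqs, relative, cumulative, cumulative_relative):
    print(f"{c}  {f:>3}  {r:.3f}  {cf:>5}  {cr:.3f}")
```

The last cumulative relative frequency should always be 1 (apart from floating-point rounding), which is a quick check on the table.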
Graphical Presentation
1. Need for graphs
- To aid the eye.
- Diagrams make better visual impressions than numbers.
2. General format
- Plotted on rectangular coordinate axes at right angles to each other (X and Y).
- Horizontal axis (X): independent variable.
- Vertical axis (Y): dependent variable.
- Must have a clear title.
- Label the axes and add units.
1. Type of Diagram
- Depends on the type of variable:
- qualitative or quantitative
2. Qualitative Variables
- Bar chart
- Pie chart
- Pictogram
1. Bar Chart
- Uses slender rectangles to represent the frequency of each value of the variable.
- The rectangles are separate and distinct.
- The height of each rectangle corresponds to the frequency.
Example: The following are reasons given by some physicians in Riyadh for not smoking, 1409H.
Reason       Frequency
Health       25
Religious    15
Social       12
Profession    5
Others        3
Total        60
[Figure: Bar chart of frequencies of reasons for not smoking: Health, Religious, Social, Profession, Others]
Pie Chart
- Used to show the components of a total.
- Sometimes gives a more striking visual impression.
- Draw a big circle to represent the total number of observations.
- Divide the circle into sectors according to the frequency of each attribute.
- Use (n/N) × 360° to obtain the angle of each sector, where n is the frequency of the attribute and N is the total.
- Shade the sectors in different colours to distinguish them.
Example: using the data from the bar chart example above.
- The total number of physicians is 60.
- The corresponding degrees in the pie chart are:
Reason       Frequency   (n/N) × 360°      Degrees
Health       25          (25/60) × 360°    150°
Religious    15          (15/60) × 360°     90°
Social       12          (12/60) × 360°     72°
Profession    5          (5/60) × 360°      30°
Others        3          (3/60) × 360°      18°
Total        60                            360°
[Figure: Pie chart of reasons for not smoking: Health 42%, Religious 25%, Social 20%, Profession 8%, Others 5%]
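The sector angles can be checked with a few lines of Python. This is a sketch using the physicians' data from the bar chart example; multiplying by 360 before dividing keeps the arithmetic exact for these numbers.

```python
# Reasons for not smoking given by physicians in Riyadh (bar chart example)
reasons = {"Health": 25, "Religious": 15, "Social": 12, "Profession": 5, "Others": 3}

N = sum(reasons.values())
# Sector angle = (n / N) x 360 degrees; percentage = (n / N) x 100
degrees = {reason: n * 360 / N for reason, n in reasons.items()}
percent = {reason: n * 100 / N for reason, n in reasons.items()}

for reason in reasons:
    print(f"{reason:<10} {degrees[reason]:6.1f} deg  {percent[reason]:5.1f}%")
```

The angles must add up to 360°, which is a useful check on the hand calculation.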
1. Quantitative Variables
- Histogram
- Frequency polygon
2. Histogram
- Used to show data on interval or continuous variables.
- The slender rectangles adjoin each other (no gaps).
- The area of each rectangle conveys the frequency of its class.
Give an appropriate title and label the axes.
1. Example:
Represent the data on the age distribution of adult admissions into UCH between 1985 and 1991.
Age (years)   Frequency
10-19         1697
20-29         2787
30-39         2390
40-49         2445
50-59         2377
60-69         1989
70-79         1514
[Figure: Histogram of the number of admissions (frequency, 0 to 3000) by age group (years), 10-19 to 70-79]
Arithmetic Mean
The most useful measure of central tendency.
Not suitable when the data are skewed.
Calculation (2 steps):
Add all the observations.
Divide by the number of observations.
Mean = (sum of all observations) / (number of observations)
$$\bar{x} = \frac{\sum x_i}{n}$$
Example:
Find the mean age of the first ten 1st-year clinical students in U.I.: 23, 19, 21, 20, 23, 21, 22, 24, 22, 22
$$\bar{x} = \frac{23 + 19 + 21 + 20 + 23 + 21 + 22 + 24 + 22 + 22}{10} = \frac{217}{10} = 21.7 \text{ years}$$
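The two calculation steps translate directly into code. A minimal sketch, with the standard-library `statistics.mean` shown alongside as a cross-check; the name `arithmetic_mean` is hypothetical.

```python
import statistics

ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]


def arithmetic_mean(values):
    total = sum(values)          # step 1: add all observations
    return total / len(values)   # step 2: divide by the number of observations


print(arithmetic_mean(ages))
print(statistics.mean(ages))     # standard-library equivalent
```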
Median
The best measure of central tendency when the data are skewed.
Calculation for ungrouped data (2 steps):
Arrange the observations in ascending or descending order.
Pick the observation in the middle as the median.
Note: if the number of observations is even, take the mean of the two middle observations.
EXAMPLE ON MEDIAN
Find the median age of the first ten 1st-year clinical students in U.I.: 23, 19, 21, 20, 23, 21, 22, 24, 22, 22
Step 1: Arrange in ascending order
19, 20, 21, 21, 22, 22, 22, 23, 23, 24
Step 2: There are 10 (an even number of) observations, so take the mean of the two middle (5th and 6th) observations
(22 + 22)/2 = 22
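A sketch of the same two steps, handling both odd and even numbers of observations; `statistics.median` from the standard library gives the same answer. The function name `median` is illustrative.

```python
import statistics

ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]


def median(values):
    ordered = sorted(values)             # step 1: arrange in ascending order
    n = len(ordered)
    middle = n // 2
    if n % 2 == 1:                       # odd n: the single middle observation
        return ordered[middle]
    # even n: mean of the two middle observations
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median(ages))
print(statistics.median(ages))           # standard-library equivalent
```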
MODE
The least used measure of central tendency.
The observation that occurs most frequently.
Example on Mode
The ages of 10 medical students:
23, 19, 21, 20, 23, 21, 22, 24, 22, 22
The mode is 22 (it occurs three times).
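A minimal sketch of finding the mode by counting. It keeps every value that reaches the highest count, since a data set can have more than one mode; `statistics.multimode` (Python 3.8+) does the same.

```python
import statistics
from collections import Counter

ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

counts = Counter(ages)
highest = max(counts.values())
# Keep every value that reaches the highest count (data can have more than one mode)
modes = [value for value, count in counts.items() if count == highest]

print(modes)
print(statistics.multimode(ages))   # standard-library equivalent
```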
PROCEDURE FOR CALCULATING THE MEAN (Grouped Data)
(i) Find the class mid-mark for each interval.
(ii) Multiply the class mid-mark in each interval by the corresponding frequency.
(iii) Add the results in (ii) across all intervals.
(iv) Divide the result in (iii) by the number of observations (total frequency).
Mean (Grouped Data)
Example:
Marks of students in Practical 1, 1403H.
Marks    Frequency (f_i)   Class mid-mark (x_i)   f_i x_i
60-64    10                62.5                   10 × 62.5 = 625
65-69    14                67.5                   14 × 67.5 = 945
70-74    12                72.5                   12 × 72.5 = 870
75-79    20                77.5                   20 × 77.5 = 1550
80-84    10                82.5                   10 × 82.5 = 825
85-89    14                87.5                   14 × 87.5 = 1225
Total    Σf_i = 80                                Σf_i x_i = 6040
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{6040}{80} = 75.5$$
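The grouped-data procedure can be sketched in a few lines. The mid-marks are taken from the table as given (62.5, 67.5, ...); the function name `grouped_mean` is hypothetical.

```python
# Marks in Practical 1 (grouped data); class mid-marks as given in the table
midpoints = [62.5, 67.5, 72.5, 77.5, 82.5, 87.5]
freqs = [10, 14, 12, 20, 10, 14]


def grouped_mean(midpoints, freqs):
    # Multiply each class mid-mark by its frequency and add the products ...
    sum_fx = sum(f * x for f, x in zip(freqs, midpoints))
    # ... then divide by the total frequency
    return sum_fx / sum(freqs)


print(grouped_mean(midpoints, freqs))
```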
Median (Grouped Data)
Using the last example:
Marks    Frequency   Cumulative frequency (F)
60-64    10          10
65-69    14          24
70-74    12          36
75-79    20          56
80-84    10          66
85-89    14          80
Calculation of the median
Sample size (n) = 80
Median position (n/2) = 40th
Median class = 75-79
Lower boundary of the median class (b_L) = 75
Frequency in the median class (f_med) = 20
Cumulative frequency below the median class (F) = 36
Class width (c) = 5
Apply the formula:
$$\text{Median} = b_L + \frac{n/2 - F}{f_{med}} \times c = 75 + \frac{40 - 36}{20} \times 5 = 76 \text{ marks}$$
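A sketch of the grouped-median formula, assuming equal class widths. Like the notes, it takes 75 (the lower class limit) as the lower boundary of the 75-79 class; some texts would use 74.5 for whole-number marks. The function name `grouped_median` is hypothetical.

```python
def grouped_median(lower_bounds, freqs, width):
    """Median = bL + (n/2 - F) / f_med * c for equal-width classes."""
    n = sum(freqs)
    if n == 0:
        raise ValueError("frequencies must not all be zero")
    half = n / 2
    cumulative_below = 0                      # F: cumulative frequency below the current class
    for lower, f in zip(lower_bounds, freqs):
        if cumulative_below + f >= half:      # first class whose cumulative frequency reaches n/2
            return lower + (half - cumulative_below) / f * width
        cumulative_below += f


lower_bounds = [60, 65, 70, 75, 80, 85]      # lower boundaries, taken as in the notes
freqs = [10, 14, 12, 20, 10, 14]
print(grouped_median(lower_bounds, freqs, 5))
```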
Measures of Dispersion
Candidates are:
Range
Interquartile range
Variance
Standard deviation
Coefficient of variation
Percentiles
Quantiles
Range
Difference between the lowest and highest values.
Relies on only the 2 extreme values.
Easy to calculate.
Quartiles
Values that divide the ordered observations into 4 equal parts.
The 1st quartile is the value below which 1/4 of the observations lie.
The 1st quartile is equivalent to the 25th percentile.
The 2nd quartile is equivalent to the median or 50th percentile.
The 3rd quartile is the value above which 1/4 of the ordered observations lie.
Interquartile Range
Difference between the 3rd quartile and the 1st quartile.
Concentrates on the middle 50% of the ordered observations.
Not affected by outliers.
Percentiles
Values that divide the ordered observations into 100 equal parts.
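A sketch of range, quartiles, interquartile range and percentiles for the students' ages, using Python's `statistics.quantiles`. Note that textbooks and software use several slightly different quartile conventions; this uses Python's default ('exclusive') method, so other methods may give slightly different quartiles for the same data.

```python
import statistics

ages = [23, 19, 21, 20, 23, 21, 22, 24, 22, 22]

data_range = max(ages) - min(ages)                  # highest minus lowest value
q1, q2, q3 = statistics.quantiles(ages, n=4)        # quartiles (default 'exclusive' method)
iqr = q3 - q1                                       # spread of the middle 50% of the ordered data
percentiles = statistics.quantiles(ages, n=100)     # 99 cut points: 1st ... 99th percentile

print(data_range, (q1, q2, q3), iqr)
print(percentiles[49])                              # 50th percentile, i.e. the median
```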
Variance
The mean of the squared deviations from the mean value (dividing by n - 1 for a sample).
The square of the standard deviation.
Measured in the square of the original units.
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$$
Standard Deviation
Square root of the variance.
The best measure of variation or dispersion.
Measured in the same units as the original data.
Standard Deviation: Practical Example
The formula is:
$$S = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$$
Example on standard deviation:
The numbers of crises experienced by 5 sickle cell patients in a year are 3, 0, 2, 1, 4.
Find the mean, variance and standard deviation.
$$\bar{x} = \frac{\sum x_i}{n} = \frac{3 + 0 + 2 + 1 + 4}{5} = 2.0$$
$$S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(3-2)^2 + (0-2)^2 + (2-2)^2 + (1-2)^2 + (4-2)^2}{5 - 1} = \frac{1 + 4 + 0 + 1 + 4}{4} = \frac{10}{4} = 2.5$$
$$S = \sqrt{\frac{10}{4}} = \frac{\sqrt{10}}{2} = 1.58$$
Note: calculators are available to do this.
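A computer can do the same. This is a minimal sketch of the sample variance (dividing by n - 1) and standard deviation for the sickle cell data, with the standard-library `statistics.variance` and `statistics.stdev` as cross-checks; `sample_variance` is a hypothetical name.

```python
import math
import statistics

crises = [3, 0, 2, 1, 4]   # crises per year in 5 sickle cell patients


def sample_variance(values):
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return squared_deviations / (n - 1)     # divide by n - 1 for a sample


variance = sample_variance(crises)
sd = math.sqrt(variance)
print(variance, round(sd, 2))
print(statistics.variance(crises), statistics.stdev(crises))  # standard-library equivalents
```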

Standard Deviation for Grouped Data
Sometimes data are presented in a frequency table.
You can still calculate the measure of dispersion:
$$SD = \sqrt{\frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1}}$$
where $f_i$ = frequency of observations in each class
$x_i$ = class mid-mark
$\bar{x}$ = mean
Example:
The frequency distribution of the weight of 100 patients with rheumatoid arthritis is as follows:
Weight (kg)   Frequency (f_i)   Class mid-mark (x_i)
60 - 69       5                 65
70 - 79       15                75
80 - 89       20                85
90 - 99       25                95
100 - 109     20                105
110 - 119     15                115
Calculate the mean, variance and standard deviation.

SOLUTION
$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i} = \frac{5 \times 65 + 15 \times 75 + 20 \times 85 + 25 \times 95 + 20 \times 105 + 15 \times 115}{100}$$
$$= \frac{325 + 1125 + 1700 + 2375 + 2100 + 1725}{100} = \frac{9350}{100} = 93.5 \text{ kg}$$
$$S^2 = \frac{\sum f_i (x_i - \bar{x})^2}{\sum f_i - 1} = \frac{5(65 - 93.5)^2 + 15(75 - 93.5)^2 + 20(85 - 93.5)^2 + 25(95 - 93.5)^2 + 20(105 - 93.5)^2 + 15(115 - 93.5)^2}{100 - 1}$$
$$= \frac{4061.25 + 5133.75 + 1445 + 56.25 + 2645 + 6933.75}{99} = \frac{20275}{99} = 204.80 \text{ kg}^2$$
$$SD = \sqrt{204.80} = 14.31 \text{ kg}$$
Coefficient of Variation
- Reduces a measure of dispersion to a dimensionless quantity.
- Calculated by dividing the standard deviation by the mean value.
- The result is expressed as a percentage.
- Useful for comparing the variation of 2 variables that are not measured in the same units.

Calculate the coefficient of variation of the weight of the subjects in the rheumatoid arthritis example:
SD = 14.31 kg
Mean = 93.5 kg
$$CV = \frac{14.31}{93.5} \times 100 = 15.3\% \quad \text{(a)}$$
For the number of crises in the sickle cell example:
SD = 1.58
Mean = 2.0
$$CV = \frac{1.58}{2.0} \times 100 = 79\% \quad \text{(b)}$$
Which of the variables has the higher variability?
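A minimal sketch that reproduces the grouped-data mean, variance and standard deviation for the rheumatoid arthritis weights, and then the coefficient of variation for both variables. It is a convenient way to check the hand calculations; `coefficient_of_variation` is a hypothetical helper name.

```python
import math
import statistics

# Weights (kg) of 100 rheumatoid arthritis patients: class mid-marks and frequencies
midpoints = [65, 75, 85, 95, 105, 115]
freqs = [5, 15, 20, 25, 20, 15]

n = sum(freqs)
mean_wt = sum(f * x for f, x in zip(freqs, midpoints)) / n
sum_sq = sum(f * (x - mean_wt) ** 2 for f, x in zip(freqs, midpoints))
var_wt = sum_sq / (n - 1)                   # grouped-data variance, divided by sum(f) - 1
sd_wt = math.sqrt(var_wt)


def coefficient_of_variation(sd, mean):
    """SD as a percentage of the mean (dimensionless)."""
    return sd / mean * 100


# Crises per year in the 5 sickle cell patients
crises = [3, 0, 2, 1, 4]
cv_crises = coefficient_of_variation(statistics.stdev(crises), statistics.mean(crises))
cv_weight = coefficient_of_variation(sd_wt, mean_wt)

print(f"weight: mean {mean_wt:.1f} kg, variance {var_wt:.2f} kg^2, SD {sd_wt:.2f} kg, CV {cv_weight:.1f}%")
print(f"crises: CV {cv_crises:.1f}%")
```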

Usefulness of the mean and standard deviation
- Useful to summarise data measured on at least an interval scale.
- Used for the mathematical description of the distribution of biological, biochemical, haematological and physiological variables.
- Most values of these variables lie in the middle of the distribution, and the distributions are symmetric.
- Sometimes the tail at one end is more prominent than the tail at the other; the distribution is then skewed.
- Skewed distributions are asymmetric but unimodal, e.g. haemoglobin.
Distributions change with characteristics of the subjects such as age, sex or nutrition.
Skewed Distributions
- Positively skewed if the mean is greater than the median.
- Negatively skewed if the mean is less than the median.
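The mean-versus-median rule can be sketched as a small function. Treat it as a rough guide only; formal skewness coefficients exist, and the function name `skew_direction` is hypothetical. By this rule the students' ages (mean 21.7, median 22) come out slightly negatively skewed.

```python
import statistics


def skew_direction(values):
    """Classify skewness by comparing the mean with the median (a rule of thumb)."""
    mean = statistics.mean(values)
    median = statistics.median(values)
    if mean > median:
        return "positive"
    if mean < median:
        return "negative"
    return "symmetric (mean equals median)"


print(skew_direction([23, 19, 21, 20, 23, 21, 22, 24, 22, 22]))
print(skew_direction([3, 0, 2, 1, 4]))
```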
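Qualitative Variables
- Summarised by the proportion of subjects with the attribute of interest.
- Expressed as percentages.
$$p = \frac{\text{number with attribute}}{\text{total sample}} = \frac{r}{n}$$
Standard error of the proportion (the standard deviation of its sampling distribution):
$$SE(p) = \sqrt{\frac{p(1 - p)}{n}}$$
These statistics are better calculated using a computer.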
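As a sketch, the proportion and its standard error, sqrt(p(1 - p)/n), for an illustration borrowed from the bar chart example: 25 of the 60 physicians gave health as their reason for not smoking. The helper name `proportion_and_se` is hypothetical.

```python
import math


def proportion_and_se(r, n):
    """Proportion p = r/n and its standard error sqrt(p(1 - p)/n)."""
    p = r / n
    se = math.sqrt(p * (1 - p) / n)
    return p, se


# Physicians citing health as their reason for not smoking (bar chart example)
p, se = proportion_and_se(25, 60)
print(f"p = {p:.3f} ({p:.1%}), SE = {se:.4f}")
```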
Normal and Sampling Distributions
1. Frequency distribution of continuous variables
- The usual feature is that observations on most biological variables concentrate (cluster) around the central value.
- Fewer observations occur as one moves away from the central value towards the tails.
- Carl Friedrich Gauss described a mathematical model that completely describes the shape of this distribution.
- Today it is called the Gaussian or normal distribution.
- It occupies a central role in statistical inference.
1. Properties of the Normal Distribution
- Bell shaped and symmetric about the central value.
- Completely determined by its mean and standard deviation.
- Mean, median and mode have the same value.
- Total area under the curve is 1 (100%).
- About 68% of all observations lie within one standard deviation of the mean.
- 95% of observations lie within 1.96 standard deviations of the mean.
- 99% of all observations lie within 2.58 standard deviations of the mean.
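These percentages can be checked with the standard library's `statistics.NormalDist` (Python 3.8+). A minimal sketch; `area_within` is a hypothetical helper name.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)


def area_within(k):
    """Proportion of a normal distribution lying within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 1.96, 2.58):
    print(f"within {k} SD: {area_within(k):.4f}")
```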
Presentation of the Normal Distribution
As a mathematical equation
As a graph
As a table
1. Mathematical Equation
$$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$$
π and e are constants
μ is the arithmetic mean
σ is the standard deviation
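The normal-curve equation written out as code, as a sketch, with `statistics.NormalDist.pdf` used as an independent check. The function name `normal_density` is hypothetical.

```python
import math
from statistics import NormalDist


def normal_density(x, mu, sigma):
    """Height y of the normal curve at x, from the normal distribution equation."""
    coefficient = 1 / (sigma * math.sqrt(2 * math.pi))
    exponent = -0.5 * ((x - mu) / sigma) ** 2
    return coefficient * math.exp(exponent)


print(normal_density(0, 0, 1))          # peak height of the standard normal curve
print(NormalDist(0, 1).pdf(0))          # standard-library check
```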
Graph
Table of Areas
Areas under a standard normal curve give the probability of falling within an interval.
The standard normal curve has mean = 0 and standard deviation = 1.
Data need to be transformed to the standard normal curve to use this table.
1. Transformation to the Standard Normal Curve
- Use
$$Z = \frac{x - \mu}{\sigma}$$
Z is the standardized normal deviate or normal score.
- Read the corresponding area from the table.
- Z is in the 1st column of the table.
- The area is in the body of the table.
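1. Example:
If the mean age of onset of diabetes mellitus is 28 years with a standard deviation of 3 years, what is the probability that the age of onset of a subject from the population is 32 years or above?
Solution:
Find the area between the mean (28 years) and 32 years, then subtract it from 0.5 (the area above the mean).
$$Z = \frac{x - \mu}{\sigma} = \frac{32 - 28}{3} = \frac{4}{3} = 1.33$$
The area is 0.5 - 0.4082 = 0.0918, i.e. a probability of about 9%.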
1. Importance of the normal distribution
- Fits many practical distributions of variables in medicine.
- If variables are not normally distributed, transformation techniques exist to make them approximately normal.
- Sampling distributions of means and proportions are approximately normal for large samples.
- Binomial distributions can be approximated by a normal distribution if the sample size is large and the probability of success is not too small.
- It is the cornerstone of parametric tests of statistical significance.
