DESCRIPTIVE STATISTICS
Module I Plan
Module I Objectives
Statistics
POPULATION
Individual
SAMPLE
Sampling
Inference
N = 15
n=4
Exercise
Population
N = 11
Individual
Variable
In general...
We want to get information on an entire population for a given variable, but the number of individuals in the population is too high.
It therefore becomes tedious, costly, or impossible to measure the entire population.
We thus measure a subset of individuals, called a SAMPLE.
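The idea of drawing a sample and using it to learn about the population can be sketched in a few lines of Python. This is only an illustration: the population values below are made up, and the sizes N = 15 and n = 4 simply mirror the slide.

```python
import random
import statistics

# Hypothetical population of N = 15 measurements (illustrative values only).
population = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13, 11, 17, 14, 12, 16]

rng = random.Random(42)                # fixed seed so the sketch is repeatable
sample = rng.sample(population, k=4)   # n = 4 individuals, drawn without replacement

population_mean = statistics.mean(population)  # usually unknown in practice
sample_mean = statistics.mean(sample)          # what we can actually measure

print(f"N = {len(population)}, n = {len(sample)}")
print(f"sample mean = {sample_mean}, population mean = {population_mean}")
```

Inference consists of using the sample figure as an approximation of the population figure.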
Sample
n=3
Inference
Qualitative variable
Quantitative variable
Quantitative Variables
Examples
When people say they are 18 years old, their actual age is somewhere between 18 and 19 years.
[Figure: age axis marked 17, 18, 19]
"18 years" covers the whole interval between 18 and 19. The "age" variable is therefore continuous.
When someone says he or she has 3 children, it means they have exactly 3 children. The "number of children" variable is therefore a discrete variable.
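The difference between the two kinds of quantitative variables can be illustrated with a small sketch. The exact age used here is a made-up value, chosen only to show that many different continuous ages are all reported as "18".

```python
import math

exact_age = 18.73                      # hypothetical exact age in years (continuous)
reported_age = math.floor(exact_age)   # what the person says: the whole number of years

number_of_children = 3                 # a count: only whole values are possible (discrete)

print(reported_age, number_of_children)
```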
Continuous
Discrete
Variables identifying:
DPMO,
The number of accidents per month,
The number of participants to training,
Resistance
Test
Passes
Fails
Fails
Resistance = ?
Upper
Specification
Lower
Specification
Location Statistics
AND
To have an idea of
where data mainly
stands
Location Statistics
Description
Mean or Average
The mean or average is the center of gravity of a distribution. It
corresponds to the balance point of a scale. In general, it is
represented by the symbol X̄ (X-bar).
Location Statistics
Description
The mean or average is the most popular statistic!
[Figure: everyday examples of averages: "Average temperature for the month of July..." and "Batting average of Vladimir Guerrero..."]
Location Statistics
Description
Median
The median in a distribution is the value found in the middle of
observed data placed in an ascending sequence. Half of
observed data will be lower than this value and the other half
will be higher than this value.
I'm the median!
Location Statistics
Description
Mode
The mode is the most likely value in a distribution, i.e. it's
the most frequently observed value.
Location Statistics
Formulas
Let's say we have these 5 observed values: 5, 2, 2, 4, 7
Average: $\bar{X} = \frac{\sum X_i}{n} = \frac{5 + 2 + 2 + 4 + 7}{5} = 4$, where $n$ is the number of observed values.
Median: md = 2 2 [4] 5 7 = 4
There are 2 observed values on each side of 4. (If the number
of observed values is even, we take the average of the two
middle observed values.)
Mode: Mode = 2
(it's the most recurring observed value). It is possible for a
distribution to have more than one mode.
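As a quick check of the three location statistics, here is a minimal sketch using Python's standard `statistics` module on the same 5 observed values. The variable names are illustrative only.

```python
import statistics

observed = [5, 2, 2, 4, 7]

average = statistics.mean(observed)      # (5 + 2 + 2 + 4 + 7) / 5
median = statistics.median(observed)     # middle value of 2 2 4 5 7
modes = statistics.multimode(observed)   # list of all most frequent values

print(average, median, modes)
```

`multimode` returns a list because a distribution can have more than one mode. With an even number of observed values, `median` averages the two middle ones.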
Location Statistics
Interpretation
The average and the median are two central location statistics. The median is much less affected by outliers than the average.
0 0 1 1 2 2 2 2 3 4
Changing 4 for 34:
0 0 1 1 2 2 2 2 3 34
The median stays at 2, while the average goes from 1.7 to 4.7.
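The effect of a single outlier on the two statistics can be reproduced with a short sketch. It uses the same ten values as the slide and Python's standard `statistics` module.

```python
import statistics

original = [0, 0, 1, 1, 2, 2, 2, 2, 3, 4]
with_outlier = original[:-1] + [34]   # change the 4 for 34

for label, data in (("original", original), ("with outlier", with_outlier)):
    print(f"{label}: median = {statistics.median(data)}, "
          f"average = {statistics.mean(data)}")
```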
Dispersion Statistics
If a statistician had ice on his head
and fire under his feet, he would
say that, on average,
he's feeling good!!!
Dispersion Statistics
Description
Range
The range of a distribution is the difference between the
maximum and minimum data. It represents the width of the
distribution. In general, it is represented by the R symbol.
Dispersion Statistics
Description
Standard Deviation
The standard deviation is a quantity based on the distance between
each observed value and the average. It measures the variation
around the average. In general, it is represented by the S symbol.
Variance
The variance is the squared standard deviation. In general, it is
represented by the S² symbol.
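Dispersion Statistics
Formulas
Let's say we have these 5 observed values: 5, 2, 2, 4, 7
Standard deviation: $S = \sqrt{\frac{\sum (X_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{18}{4}} \approx 2.1$
where $\bar{X}$ is the average (here 4) and $n - 1$ is the number of observations minus 1.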
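The dispersion statistics for the same 5 observed values can be checked with a minimal sketch. It relies on Python's standard `statistics` module, whose `variance` and `stdev` functions divide by n - 1, as in the formula above.

```python
import statistics

observed = [5, 2, 2, 4, 7]

data_range = max(observed) - min(observed)   # R = maximum - minimum
variance = statistics.variance(observed)     # S^2, sum of squared deviations / (n - 1)
std_dev = statistics.stdev(observed)         # S, square root of the variance

print(f"R = {data_range}, S^2 = {variance}, S = {std_dev:.1f}")
```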
Dispersion Statistics
Interpretation
The range and standard deviation are the two most often used dispersion
statistics. Both of these statistics are strongly affected by outliers.
0 0 1 1 2 2 2 2 3 4
Changing 4 for 34:
0 0 1 1 2 2 2 2 3 34
The range goes from 4 to 34 and the standard deviation goes from 1.3 to 10.3.
The range is popular because of its simple form. However, the standard
deviation is a more accurate estimate of the "real" dispersion of a distribution.
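The sensitivity of both dispersion statistics to one outlier can be reproduced as follows. This is a sketch on the slide's ten values, and `describe` is a helper name introduced here for illustration.

```python
import statistics

original = [0, 0, 1, 1, 2, 2, 2, 2, 3, 4]
with_outlier = original[:-1] + [34]   # change the 4 for 34

def describe(data):
    """Return (range, sample standard deviation) for a list of numbers."""
    return max(data) - min(data), statistics.stdev(data)

for label, data in (("original", original), ("with outlier", with_outlier)):
    r, s = describe(data)
    print(f"{label}: range = {r}, standard deviation = {s:.1f}")
```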
The majority of data (between 60% and 75%) can be found within 1 standard
deviation from the average.
Almost all of the data (between 90% and 98%) can be found within 2
standard deviations from the average.
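To see how much of a dataset actually falls within 1 or 2 standard deviations of the average, one can count the values directly. The helper `share_within` below is a hypothetical name used only for this sketch, applied to the ten values from the slide.

```python
import statistics

def share_within(data, k):
    """Fraction of values lying within k sample standard deviations of the average."""
    avg = statistics.mean(data)
    sd = statistics.stdev(data)
    inside = [x for x in data if abs(x - avg) <= k * sd]
    return len(inside) / len(data)

data = [0, 0, 1, 1, 2, 2, 2, 2, 3, 4]
print(f"within 1 SD: {share_within(data, 1):.0%}")
print(f"within 2 SD: {share_within(data, 2):.0%}")
```

On this small dataset the shares come out at 60% and 100%. The percentages quoted above are rules of thumb, so a small dataset can land at the edge of, or slightly outside, the stated ranges.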
Dispersion Statistics
Practical tips
In general, the average and the standard deviation are calculated from a
sample and are therefore never exactly equal to the population values. They are
approximations: the larger the sample, the more precise the approximation.
The average and the standard deviation are affected by outliers. This
phenomenon must be taken into account when making calculations and
interpretations.
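The link between sample size and precision can be illustrated with a small simulation. Everything here is an assumption made for the sketch: the population is simulated (10,000 values with mean 50 and standard deviation 10), and `spread_of_sample_means` is a hypothetical helper.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: 10,000 simulated measurements (mean 50, SD 10).
population = [rng.gauss(50, 10) for _ in range(10_000)]

def spread_of_sample_means(n, repeats=200):
    """Standard deviation of the averages of `repeats` random samples of size n."""
    means = [statistics.mean(rng.sample(population, n)) for _ in range(repeats)]
    return statistics.stdev(means)

spreads = {n: spread_of_sample_means(n) for n in (5, 50, 500)}
for n, s in spreads.items():
    print(f"n = {n:3d}: sample averages vary by about {s:.2f}")
```

The averages of larger samples vary less from one sample to the next, which is what "more precise" means here.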
Introduction to Minitab
Columns Function
Rows Function
Worksheet Function
Formulas
Minitab Files
Importing Files in Minitab
Session Window
Contains numerical results of analyses
Data Window
Contains data columns. There is one data window per
worksheet.
Graphics Window
Contains graphs and analyses
Worksheet
[Screenshot labels: Missing Data, Text, Variable Name, Column Number]
Columns
Rows
Worksheet
Formulas
Minitab Files
Project Files
Include all of the analyses done on a set of data: worksheet,
numerical summaries, graphs, etc. Very useful for keeping the analyses
done on a dataset. These files have the .MPJ extension.
Graphic Files
Include graphs saved during previous analyses. These files
have the .MGF extension.
Exercise !
M&M TRIVIA
1.
2.
3.
4.
Results
Statistic                 Total    Red Ones
The mean
The median
The mode
The range
The standard deviation
Use of Minitab
Exercise