Probability and Statistics
Fundamentals of Statistics
Introduction
Statistics
- The word "statistics" is both singular and plural in meaning.
- In the plural sense, it refers to numerical or quantitative data such as age, weight, and height.
- In the singular sense, it is the scientific method of collecting, presenting, analyzing, and interpreting data for the purpose of drawing valid conclusions.
Fields of Statistics
Descriptive Statistics
- Methods concerned with the collection and description of a set of data to yield meaningful information.
- Provides information only about the collected data and does not draw inferences or conclusions about a larger set of data.
Inferential Statistics
- Methods concerned with the analysis of a smaller group of data, known as the sample, leading to predictions or inferences about the larger set of data, the population, from which the sample is drawn.
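The Language of Summation
$\sum_{i=1}^{n} x_i = x_1 + x_2 + \dots + x_n$
Laws of Summation Notation
a. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
b. $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$
c. $\sum_{i=1}^{n} c = nc$
Double summation:
$\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} = (x_{11} + x_{12} + \dots + x_{1m}) + \dots + (x_{n1} + x_{n2} + \dots + x_{nm})$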
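The summation laws can be checked numerically. The following is a minimal sketch; the lists `x` and `y`, the constant `c`, and the small `table` are hypothetical values chosen only for illustration.

```python
# Hypothetical values used only to illustrate the summation laws.
x = [2, 4, 6]
y = [1, 3, 5]
c = 3
n = len(x)

# Law a: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i.
law_a = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Law b: a constant factor can be moved outside the summation.
law_b = sum(c * xi for xi in x) == c * sum(x)

# Law c: summing a constant n times gives n * c.
law_c = sum(c for _ in range(n)) == n * c

# Double summation: add every entry x_ij of an n-by-m table, row by row.
table = [[1, 2, 3], [4, 5, 6]]
double_sum = sum(sum(row) for row in table)

print(law_a, law_b, law_c, double_sum)
```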
Parameters and Statistics
Parameter: any numerical value describing a characteristic of a population.
Statistic: any numerical value describing a characteristic of a sample.
Slovin's Formula
$n = \frac{N}{1 + N e^2}$
where
n = sample size
N = population size
e = estimated error
Example: Slovin's Formula
Problem: How large a sample should be chosen if we expect a 5% error from a population of 3000?
Given: N = 3000, e = 0.05
Required: n
Solution:
$n = \frac{3000}{1 + 3000(0.05)^2} = 352.94$
Round up to the next whole value, which is 353, as the minimum number of samples sufficient for the experiment.
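The same computation can be sketched in Python. The function name `slovin_sample_size` is a hypothetical helper, not part of any library; it applies Slovin's formula and rounds up, as in the worked example.

```python
import math

def slovin_sample_size(population_size, error):
    """Return the minimum sample size from Slovin's formula, rounded up."""
    raw = population_size / (1 + population_size * error ** 2)
    return math.ceil(raw)

# Values from the worked example: N = 3000, e = 0.05.
n = slovin_sample_size(3000, 0.05)
print(n)
```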
Sampling Techniques
1. Non-Probability Sampling: a type of sampling in which an individual subject has either a certain chance or no chance of being chosen for the sample.
a. Convenience Sampling: a sampling technique based primarily on the availability of the respondents.
b. Quota Sampling: a sampling technique in which there is a desired number of samples and the respondents are taken as they volunteer to be part of the experiment.
c. Purposive Sampling: a sampling technique in which the sample is obtained based on a certain premise.
2. Probability Sampling: eliminates bias against events that would otherwise have no chance of being selected, by listing all the possible events.
a. Simple Random Sampling: performed by arranging the population according to a certain rule, numbering each element, and taking the sample by a randomizing procedure.
b. Systematic Sampling: done by arranging the population in a certain order, dividing it into equal groups, and taking the kth element of each group as the sample.
c. Stratified Sampling: done by grouping the population into strata, which are subpopulations with generally homogeneous or similar characteristics. Random sampling is then performed in each stratum, with a sample size proportional to the size of the stratum relative to the population.
d. Cluster Sampling: done by identifying groups called clusters, which are subpopulations whose elements are as heterogeneous, or diverse in characteristics, as possible. Clusters must be similar to each other with respect to the parameter being examined. One or more clusters are then selected for the sample.
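Three of the probability sampling techniques can be sketched with Python's standard `random` module. This is an illustrative sketch under stated assumptions: the function names, the numbered population of 100 elements, and the two strata are all hypothetical, and the systematic version assumes the population size is a multiple of the sample size.

```python
import random

def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)

def systematic_sample(population, n, seed=None):
    """Split the ordered population into n equal groups and take the
    element at the same (randomly chosen) position in each group."""
    rng = random.Random(seed)
    k = len(population) // n          # size of each group
    start = rng.randrange(k)          # position taken within every group
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, seed=None):
    """Sample each stratum at random, in proportion to its size."""
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    result = {}
    for name, members in strata.items():
        share = round(n * len(members) / total)
        result[name] = rng.sample(members, share)
    return result

# Hypothetical population: elements numbered 1 to 100.
population = list(range(1, 101))
simple = simple_random_sample(population, 10, seed=1)
systematic = systematic_sample(population, 10, seed=1)
stratified = stratified_sample(
    {"A": population[:60], "B": population[60:]}, 10, seed=1
)
print(simple, systematic, stratified, sep="\n")
```

Passing a `seed` only makes the illustration repeatable; in practice the seed would be left unset.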
Types of Data
Qualitative: descriptions used to portray the attributes of data.
Quantitative: measurable quantities being collected, such as scores, weights, and grades.
Grouped data: categorized data.
Ungrouped data: raw, random data.
Classification of Data
Categorical data: items such as gender, color, civil status, and location, which are commonly recorded in nonnumeric (qualitative) form.
Numerical data: information and observations that are countable or measurable quantities, such as scores, weights, and grades (quantitative).
Levels of Measurement of Numerical Data
1. Nominal data: categorical data assigned to numbers, for example assigning 1 for males and 2 for females.
2. Ordinal data: quantities where the numbers are used to designate the rank order of the data.
3. Interval data: a data type where the range between the numeric values is constant. In this type of data, addition and subtraction can be performed, but not multiplication and division. Examples are the year, measured temperature, and final grade.
4. Ratio data: widely used in science and engineering. Almost all basic operations can be performed on this data type. One significant characteristic is the presence of a non-arbitrary zero point. Examples are length, mass, angles, charge, and energy.
Presentation of Data
1. Textual Form: a way of presenting data in terms of statements, sentences, and paragraphs.
2. Tabular Form: uses tables to present data in a way that is direct to the point and easily understood. Examples are the reference table and the summary table.
Ungrouped Data
Measures of Central Tendency
1. Mean: the average of all the data or values
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Measures of Variation of Data
How the data is spread out from the mean
1. Range
- Difference between the highest value (HV) and the lowest value (LV)
Range = HV - LV
2. Standard Deviation
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
3. Variance
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
4. Coefficient of Variation
- the ratio of the standard deviation to the mean, expressed as a percentage
Population: $V_p = \frac{\sigma}{\mu} \times 100\%$
Sample: $V_s = \frac{s}{\bar{x}} \times 100\%$
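The measures for ungrouped data map directly onto Python's standard `statistics` module. The sketch below uses a hypothetical data list chosen only for illustration; `pstdev` and `pvariance` divide by N (population), while `stdev` and `variance` divide by n - 1 (sample).

```python
import statistics

# Hypothetical ungrouped data, used only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean = statistics.mean(data)
median = statistics.median(data)
mode = statistics.mode(data)

data_range = max(data) - min(data)          # Range = HV - LV
pop_sd = statistics.pstdev(data)            # population standard deviation
sample_sd = statistics.stdev(data)          # sample standard deviation
pop_var = statistics.pvariance(data)        # population variance
sample_var = statistics.variance(data)      # sample variance

# Coefficient of variation, as a percentage of the mean.
cv_population = pop_sd / mean * 100
cv_sample = sample_sd / mean * 100

print(mean, median, mode, data_range)
print(pop_sd, sample_sd, pop_var, sample_var)
print(cv_population, cv_sample)
```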
Example
A food inspector examined a random sample of 7 cans of a certain brand of tuna to determine the percentage of foreign impurities. The following data are recorded:
Calculations
Mean: $\bar{x} = \frac{\sum_{i=1}^{7} x_i}{7} = 1.8$
Median: $\tilde{x} = 1.8$
Mode: $\hat{x} = 1.8$
Grouped Data
1. Stem-Leaf Plot: one way of summarizing ungrouped data. The table has two columns, one for the stem and the other for the leaves.
2. Frequency Distribution Table (FDT): a large amount of data can be analyzed by grouping the data into different classes with equal class intervals and determining the number of observations falling in each class.
Example: Stem-Leaf Plot
Express the following data as a stem and leaf plot with the tens digit as the stem and the ones digit as the leaves:
12, 23, 12, 11, 10, 25, 29, 39, 31, 43, 42, 54,
53, 53, 56, 57, 56, 67, 54, 65, 76, 76, 75, 74
STEM-LEAF PLOT
STEM    LEAF             FREQUENCY
1       0,1,2,2          4
2       3,5,9            3
3       1,9              2
4       2,3              2
5       3,3,4,4,6,6,7    7
6       5,7              2
7       4,5,6,6          4
TOTAL                    24
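The stem-leaf table for this data set can be reproduced with a short sketch. The helper name `stem_and_leaf` is hypothetical; it assumes two-digit whole numbers, so that `divmod(value, 10)` splits each value into its tens digit (stem) and ones digit (leaf).

```python
from collections import defaultdict

# Data from the stem-leaf example.
data = [12, 23, 12, 11, 10, 25, 29, 39, 31, 43, 42, 54,
        53, 53, 56, 57, 56, 67, 54, 65, 76, 76, 75, 74]

def stem_and_leaf(values):
    """Group sorted values by tens digit (stem); collect ones digits (leaves)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        plot[stem].append(leaf)
    return dict(plot)

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    print(stem, "|", ",".join(map(str, leaves)), "| frequency:", len(leaves))
```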
Grouped Data
Steps in constructing a frequency distribution
1. Decide on the number of class intervals required (round up the result).
Square-Root Principle: $k = \sqrt{N}$
or
Sturges Formula: $k = 1 + 3.322 \log N$
2. Determine the range.
3. Determine the size of the class interval by dividing the range by the desired number of class intervals (round off).
4. Determine the lower and upper class limits of each class interval. Start with the lowest data value or score.
5. Determine the number of observations falling within each class interval.
Example: Suppose that the following data are the average speeds of vehicles running in SLEX (in kph):
27  56  38  43  48  38  43  30  34  40
50  43  57  52  25  43  35  29  49  36
29  52  46  49  46  47  31  52  41  31
55  50  42  41  52  25  46  36  41  36
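Calculations
1. Number of classes
Square-Root Formula: $k = \sqrt{40} = 6.32$, so $k \approx 7$
or Sturges Formula: $k = 1 + 3.322 \log 40 = 6.32$, so $k \approx 7$
2. Range = 57 - 25 = 32
3. Class size = 32/7 = 4.57, rounded off to 5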
Frequency Distribution
Class    f    x (Class Mark)    Class Boundaries    <CF    >CF
55-59    3    57                54.5-59.5           40     3
50-54    6    52                49.5-54.5           37     9
45-49    7    47                44.5-49.5           31     16
40-44    9    42                39.5-44.5           24     25
35-39    6    37                34.5-39.5           15     31
30-34    4    32                29.5-34.5           9      35
25-29    5    27                24.5-29.5           5      40
Total    N = 40
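The steps for constructing the frequency distribution can be sketched as follows, using the 40 speed values from the example. The variable names are illustrative, and the sketch assumes whole-number data so that each class spans `width` consecutive integers starting at the lowest value.

```python
import math

# Average speeds (kph) from the example.
speeds = [27, 56, 38, 43, 48, 38, 43, 30, 34, 40,
          50, 43, 57, 52, 25, 43, 35, 29, 49, 36,
          29, 52, 46, 49, 46, 47, 31, 52, 41, 31,
          55, 50, 42, 41, 52, 25, 46, 36, 41, 36]

n = len(speeds)

# Step 1: number of classes (both rules round up).
k_sqrt = math.ceil(math.sqrt(n))
k_sturges = math.ceil(1 + 3.322 * math.log10(n))

# Steps 2 and 3: range and class size (rounded off).
data_range = max(speeds) - min(speeds)
width = round(data_range / k_sqrt)

# Step 4: class limits, starting from the lowest value.
lowest = min(speeds)
classes = [(lowest + i * width, lowest + (i + 1) * width - 1)
           for i in range(k_sqrt)]

# Step 5: count the observations falling in each class.
freq = [sum(lo <= v <= hi for v in speeds) for lo, hi in classes]

# Cumulative frequencies: "less than" (<CF) and "greater than" (>CF).
less_than_cf = []
running = 0
for f in freq:
    running += f
    less_than_cf.append(running)
greater_than_cf = [n - cf + f for cf, f in zip(less_than_cf, freq)]

for (lo, hi), f, lcf, gcf in zip(classes, freq, less_than_cf, greater_than_cf):
    print(f"{lo}-{hi}", f, lcf, gcf)
```

The classes are listed from lowest to highest here, the reverse of the order used in the table.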
Grouped Data
1. Measures of Central Location
a. Mean
$\bar{x} = \frac{\sum f x}{N}$
where
f = frequency
x = class mark
N = number of observations
Example Calculations
Class    f    x     fx     d      fd     u     fu
55-59    3    57    171    15     45     3     9
50-54    6    52    312    10     60     2     12
45-49    7    47    329    5      35     1     7
40-44    9    42    378    0      0      0     0
35-39    6    37    222    -5     -30    -1    -6
30-34    4    32    128    -10    -40    -2    -8
25-29    5    27    135    -15    -75    -3    -15
Total    N = 40     $\sum fx = 1675$     $\sum fd = -5$     $\sum fu = -1$
Example Calculations
a.1. Long Method
$\bar{x} = \frac{\sum f x}{N} = \frac{1675}{40} = 41.875$
Grouped Data
a.2. Coding Method
$\bar{x} = A + \left(\frac{\sum f u}{N}\right) C$
where
A = class mark of the assumed-mean class (the class interval containing the assumed mean)
C = class width
f = frequency
u = code, $u = (x - A)/C$
N = number of observations
Example Calculations
a.2. Coding Method
$\bar{x} = A + \left(\frac{\sum f u}{N}\right) C = 42 + \left(\frac{-1}{40}\right)(5) = 41.875$
Grouped Data
a.3. Short/Deviation Method
$\bar{x} = A + \frac{\sum f d}{N}$
where
A = class mark of the assumed-mean class (the class interval containing the assumed mean)
f = frequency
d = deviation, $d = x - A$
N = number of observations
Example Calculations
a.3. Short/Deviation Method
$\bar{x} = A + \frac{\sum f d}{N} = 42 + \frac{-5}{40} = 41.875$
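The three methods for the grouped mean give the same result, which a short sketch can confirm. The class marks and frequencies are those of the speed example, with assumed mean A = 42 and class width C = 5; the variable names are illustrative.

```python
# Class marks and frequencies from the speed example (highest class first).
marks = [57, 52, 47, 42, 37, 32, 27]
freqs = [3, 6, 7, 9, 6, 4, 5]

N = sum(freqs)   # number of observations
A = 42           # assumed mean (class mark of the 40-44 class)
C = 5            # class width

# Long method: sum of f*x over N.
mean_long = sum(f * x for f, x in zip(freqs, marks)) / N

# Short/deviation method: d = x - A.
sum_fd = sum(f * (x - A) for f, x in zip(freqs, marks))
mean_deviation_method = A + sum_fd / N

# Coding method: u = (x - A) / C, which is a whole number for every class mark.
sum_fu = sum(f * ((x - A) // C) for f, x in zip(freqs, marks))
mean_coding_method = A + (sum_fu / N) * C

print(mean_long, mean_deviation_method, mean_coding_method)
```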
Grouped Data
b. Median
$\tilde{x} = L_{Md} + \left(\frac{\frac{N}{2} - f_<}{f_{Md}}\right) C$
where
LMd = lower class boundary of the median class
f< = the cumulative frequency of the class preceding the median class
fMd = frequency of the median class
C = class size
Sample Calculations
Class    f    f<
55-59    3    40
50-54    6    37
45-49    7    31
40-44    9    24
35-39    6    15
30-34    4    9
25-29    5    5
Total    N = 40
N = 40, so N/2 = 20
The median is the 20th term, hence it falls in the 40-44 class interval.
LMd = 39.5
f< = 15
fMd = 9
$\tilde{x} = 39.5 + \left(\frac{20 - 15}{9}\right)(5) = 42.28$
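Grouped Data
c. Mode
$\hat{x} = L_{Mo} + C \left(\frac{d_1}{d_1 + d_2}\right)$
where
LMo = lower class boundary of the modal class
d1 = difference between the frequency of the modal class and that of the next lower class
d2 = difference between the frequency of the modal class and that of the next higher class
C = class size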
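Sample Calculations
Class    f
55-59    3
50-54    6
45-49    7
40-44    9
35-39    6
30-34    4
25-29    5
Total    N = 40
The modal class is 40-44 (f = 9), so LMo = 39.5, d1 = 9 - 6 = 3, d2 = 9 - 7 = 2.
$\hat{x} = 39.5 + 5 \left(\frac{3}{3 + 2}\right) = 42.5$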
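The grouped median and mode can be sketched as below, with the classes listed from lowest to highest. The helper names are hypothetical; the mode helper assumes a single modal class and treats a missing neighbouring class as having frequency 0.

```python
# Lower class boundaries and frequencies, lowest class first.
lower_boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [5, 4, 6, 9, 7, 6, 3]
C = 5  # class size

def grouped_median(lower_boundaries, freqs, width):
    """Interpolate inside the class that contains the (N/2)th term."""
    half = sum(freqs) / 2
    cumulative = 0  # cumulative frequency of the classes below the current one
    for boundary, f in zip(lower_boundaries, freqs):
        if cumulative + f >= half:
            return boundary + (half - cumulative) / f * width
        cumulative += f

def grouped_mode(lower_boundaries, freqs, width):
    """Apply L_Mo + C * d1 / (d1 + d2) to the class with the highest frequency."""
    i = freqs.index(max(freqs))
    f_lower = freqs[i - 1] if i > 0 else 0
    f_higher = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - f_lower
    d2 = freqs[i] - f_higher
    return lower_boundaries[i] + width * d1 / (d1 + d2)

median = grouped_median(lower_boundaries, freqs, C)
mode = grouped_mode(lower_boundaries, freqs, C)
print(round(median, 2), round(mode, 2))
```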
Grouped Data
2. Measures of Variation/Deviation
a. Mean Deviation
$MD = \frac{\sum f\,|x - \bar{x}|}{N}$
Sample Calculations
Class    f    x     |x - xbar|    f|x - xbar|
55-59    3    57    15.125        45.375
50-54    6    52    10.125        60.75
45-49    7    47    5.125         35.875
40-44    9    42    0.125         1.125
35-39    6    37    4.875         29.25
30-34    4    32    9.875         39.5
25-29    5    27    14.875        74.375
Total    N = 40                   286.25
$MD = \frac{286.25}{40} = 7.16$
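Grouped Data
b. Standard Deviation
$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{N - 1}}$
$s = \sqrt{\frac{N \sum f x^2 - \left(\sum f x\right)^2}{N(N - 1)}}$
$s = C \sqrt{\frac{N \sum f u^2 - \left(\sum f u\right)^2}{N(N - 1)}}$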
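Grouped Data
c. Variance
$s^2 = \frac{\sum f (x - \bar{x})^2}{N - 1}$
$s^2 = \frac{N \sum f x^2 - \left(\sum f x\right)^2}{N(N - 1)}$
$s^2 = C^2 \left[\frac{N \sum f u^2 - \left(\sum f u\right)^2}{N(N - 1)}\right]$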
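Sample Calculations
Class    f    x     |x - xbar|    (x - xbar)^2    f(x - xbar)^2
55-59    3    57    15.125        228.765625      686.296875
50-54    6    52    10.125        102.515625      615.09375
45-49    7    47    5.125         26.265625       183.859375
40-44    9    42    0.125         0.015625        0.140625
35-39    6    37    4.875         23.765625       142.59375
30-34    4    32    9.875         97.515625       390.0625
25-29    5    27    14.875        221.265625      1106.328125
Total    N = 40                                   3124.375
$s^2 = \frac{3124.375}{39} = 80.11$
$s = \sqrt{80.11} = 8.95$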
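The mean deviation and the three equivalent variance formulas can be checked with a short sketch. It uses the class marks and frequencies of the speed example, with assumed mean A = 42 and class width C = 5; the variable names are illustrative.

```python
import math

# Class marks and frequencies from the speed example.
marks = [57, 52, 47, 42, 37, 32, 27]
freqs = [3, 6, 7, 9, 6, 4, 5]
N = sum(freqs)
A, C = 42, 5  # assumed mean and class width used for the coding formula

mean = sum(f * x for f, x in zip(freqs, marks)) / N

# Mean deviation: average absolute distance from the mean.
mean_dev = sum(f * abs(x - mean) for f, x in zip(freqs, marks)) / N

# Variance from the definition.
var_definition = sum(f * (x - mean) ** 2 for f, x in zip(freqs, marks)) / (N - 1)

# Variance from the computational formula with sum(f*x) and sum(f*x^2).
sum_fx = sum(f * x for f, x in zip(freqs, marks))
sum_fx2 = sum(f * x ** 2 for f, x in zip(freqs, marks))
var_computational = (N * sum_fx2 - sum_fx ** 2) / (N * (N - 1))

# Variance from the coding formula with u = (x - A) / C.
codes = [(x - A) // C for x in marks]
sum_fu = sum(f * u for f, u in zip(freqs, codes))
sum_fu2 = sum(f * u ** 2 for f, u in zip(freqs, codes))
var_coding = C ** 2 * (N * sum_fu2 - sum_fu ** 2) / (N * (N - 1))

std_dev = math.sqrt(var_definition)
print(round(mean_dev, 2), round(var_definition, 2), round(std_dev, 2))
```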
Grouped Data
d. Semi-interquartile Range (Q)
$Q = \frac{Q_3 - Q_1}{2}$
e. 10-90 Percentile Range
$P_{10-90} = P_{90} - P_{10}$
Grouped Data
3. Measures of Position
From the median formula
$\tilde{x} = L_{Md} + \left(\frac{\frac{N}{2} - f_<}{f_{Md}}\right) C$
a. Quartiles
$Q_k = L_Q + \left(\frac{\frac{kN}{4} - f_<}{f_Q}\right) C$
b. Deciles
$D_k = L_D + \left(\frac{\frac{kN}{10} - f_<}{f_D}\right) C$
c. Percentiles
$P_k = L_P + \left(\frac{\frac{kN}{100} - f_<}{f_P}\right) C$
Sample Calculations
Compute Q1, D7, and P30.
Class    f    f<
55-59    3    40
50-54    6    37
45-49    7    31
40-44    9    24
35-39    6    15
30-34    4    9
25-29    5    5
Total    N = 40
Quartile 1:
k = 1, so kN/4 = 10
Q1 class is 35-39
LQ = 34.5
f< = 9
fQ = 6
$Q_1 = 34.5 + \left(\frac{10 - 9}{6}\right)(5) = 35.33$
That is, 25% of the data falls below the score 35.33.
Decile 7:
k = 7, so kN/10 = 28
D7 class is 45-49
LD = 44.5
f< = 24
fD = 7
$D_7 = 44.5 + \left(\frac{28 - 24}{7}\right)(5) = 47.36$
That is, 70% of the data falls below the score 47.36.
Percentile 30:
k = 30, so kN/100 = 12
P30 class is 35-39
LP = 34.5
f< = 9
fP = 6
$P_{30} = 34.5 + \left(\frac{12 - 9}{6}\right)(5) = 37.00$
That is, 30% of the data falls below the score 37.00.
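Quartiles, deciles, and percentiles differ only in the divisor (4, 10, or 100), so one helper can cover all three. The function name `grouped_quantile` and its `parts` parameter are hypothetical; the classes are listed from lowest to highest.

```python
# Lower class boundaries and frequencies, lowest class first.
lower_boundaries = [24.5, 29.5, 34.5, 39.5, 44.5, 49.5, 54.5]
freqs = [5, 4, 6, 9, 7, 6, 3]
C = 5  # class size

def grouped_quantile(k, parts, lower_boundaries, freqs, width):
    """Return the k-th quantile when the data are split into `parts` parts
    (4 for quartiles, 10 for deciles, 100 for percentiles)."""
    target = k * sum(freqs) / parts   # position kN/parts
    cumulative = 0                    # f<: cumulative frequency below the class
    for boundary, f in zip(lower_boundaries, freqs):
        if cumulative + f >= target:
            return boundary + (target - cumulative) / f * width
        cumulative += f
    raise ValueError("k is outside the range covered by the data")

q1 = grouped_quantile(1, 4, lower_boundaries, freqs, C)
d7 = grouped_quantile(7, 10, lower_boundaries, freqs, C)
p30 = grouped_quantile(30, 100, lower_boundaries, freqs, C)

# Semi-interquartile range: Q = (Q3 - Q1) / 2.
q3 = grouped_quantile(3, 4, lower_boundaries, freqs, C)
semi_interquartile = (q3 - q1) / 2

print(round(q1, 2), round(d7, 2), round(p30, 2))
```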
Position by Normalization
The z-score
$z = \frac{x - \bar{x}}{s}$
Sample Calculations
Subject       Grade    Mean    Std. Deviation    z
Statistics    89       80      6                 1.50
Accounting    85       78      4                 1.75
$z_{Statistics} = \frac{89 - 80}{6} = 1.50$
$z_{Accounting} = \frac{85 - 78}{4} = 1.75$
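The z-score comparison can be sketched in a few lines. The function name `z_score` is a hypothetical helper; the grades, means, and standard deviations are those of the sample calculation.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev

z_statistics = z_score(89, 80, 6)
z_accounting = z_score(85, 78, 4)

# The larger z-score marks the better relative standing within its own class.
better = "Accounting" if z_accounting > z_statistics else "Statistics"
print(z_statistics, z_accounting, better)
```

Although the Statistics grade is higher in absolute terms, the Accounting grade sits further above its class mean when measured in standard deviations.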
Graphical Representation
a. Bar Chart (Horizontal) or Histogram (Vertical)
A set of rectangles drawn using the class boundaries (x-axis for a histogram, y-axis for a bar chart) and the frequency (y-axis for a histogram, x-axis for a bar chart).
b. Frequency Polygon
A line plot of the frequency (y-axis) versus the class marks (x-axis).
[Figure: frequency polygon; x-axis: class marks from 27 to 57; y-axis: frequency from 0 to 10]
c. Ogive
A plot of the class boundary (x-axis) against the cumulative (less than or greater than) frequency (y-axis).
[Figure: ogive; x-axis: class boundaries from 24.5 to 54.5; y-axis: cumulative frequency from 0 to 45]
d. Pie Chart
A circular chart representing the percentage occurrence of each group in the sample.
[Figure: pie chart with the following shares]
25-29: 13%
30-34: 10%
35-39: 15%
40-44: 23%
45-49: 17%
50-54: 15%
55-59: 7%
Skewness
- Defined as the degree of asymmetry (departure from symmetry) of a distribution, whereas kurtosis is the degree of peakedness exhibited by the distribution.
- The skewness coefficient (Sk) of the distribution can be determined by using the Pearsonian Coefficient of Skewness.
Pearsonian Coefficient of Skewness
$Sk = \frac{3(\bar{x} - \tilde{x})}{s}$
Skewness
- Symmetric (Sk = 0)
- Positively skewed (Sk > 0)
- Negatively skewed (Sk < 0)
[Figures: one distribution curve for each case]
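As an illustration, the Pearsonian coefficient can be applied to the grouped speed example, using the rounded values computed there (mean 41.875, median 42.28, standard deviation 8.95). The function names are hypothetical helpers.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearsonian coefficient of skewness: Sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev

def describe_skewness(sk):
    """Classify a distribution by the sign of its skewness coefficient."""
    if sk > 0:
        return "positively skewed"
    if sk < 0:
        return "negatively skewed"
    return "symmetric"

# Rounded values from the grouped speed example.
sk = pearson_skewness(41.875, 42.28, 8.95)
label = describe_skewness(sk)
print(round(sk, 3), label)
```

The mean lies slightly below the median, so the coefficient is a small negative number and the distribution is mildly negatively skewed.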
Kurtosis
$\text{Kurtosis} = \frac{n(n + 1)}{(n - 1)(n - 2)(n - 3)} \sum_{i=1}^{n} \left(\frac{x_i - \bar{x}}{s}\right)^4 - \frac{3(n - 1)^2}{(n - 2)(n - 3)}$
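A minimal sketch of the sample kurtosis formula follows. The function name `sample_kurtosis` and the data list are hypothetical; `statistics.stdev` supplies the sample standard deviation s, and the formula needs at least four observations because of the (n - 3) factor.

```python
import statistics

def sample_kurtosis(data):
    """Sample kurtosis with the small-sample correction terms of the formula."""
    n = len(data)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (n - 1 divisor)
    fourth_power_sum = sum(((x - mean) / s) ** 4 for x in data)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_power_sum - correction

# Hypothetical data, used only for illustration.
data = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]
kurtosis = sample_kurtosis(data)
print(kurtosis)
```

A value near zero indicates peakedness close to that of a normal distribution; negative values indicate a flatter distribution and positive values a more peaked one.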