Engineering Probability and Statistics Intro

Engineering Probability and Statistics
Fundamentals of Statistics
Introduction
Statistics
- The word is both singular and plural in meaning.
- In the plural sense, it may refer to numerical or quantitative data such as age, weight, and height.
- In the singular sense, it is the scientific method of collecting, presenting, analyzing, and interpreting data for the purpose of drawing valid conclusions and making reasonable decisions.
Fields of Statistics
Descriptive Statistics
- Methods concerned with the collection and description of a set of data to yield meaningful information.
- Provides information only about the collected data and does not draw inferences or conclusions about a larger set of data.
Inferential Statistics
- Methods concerned with the analysis of a smaller group of data, known as the sample, leading to predictions or inferences about the larger set of data, the population from which the sample is drawn.
The Language of Summation
The symbol $\sum_{i=1}^{n} x_i$ is read as "the sum (or summation) of $x_1, x_2, x_3, \ldots, x_n$".
More technically, it is read as "the sum of the $x_i$ terms, where $i$ ranges from 1 to $n$".
The symbol $\Sigma$, the Greek capital letter sigma, is used to denote the sum.
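Laws of Summation Notation
a. $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
b. $\sum_{i=1}^{n} c\,x_i = c \sum_{i=1}^{n} x_i$
c. $\sum_{i=1}^{n} c = nc$
Double summation:
$\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} = (x_{11} + x_{12} + \cdots + x_{1m}) + (x_{21} + x_{22} + \cdots + x_{2m}) + \cdots + (x_{n1} + x_{n2} + \cdots + x_{nm})$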
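The summation laws can be checked numerically. The short sketch below is only an illustration: the lists `x` and `y`, the constant `c`, and the small `table` are made-up values, not data from these notes.

```python
# Hypothetical sample values, chosen only to illustrate the identities.
x = [2, 4, 6]
y = [1, 3, 5]
c = 10
n = len(x)

# Law a: the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i.
law_a = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)

# Law b: a constant factor can be moved outside the summation.
law_b = sum(c * xi for xi in x) == c * sum(x)

# Law c: summing a constant n times gives n * c.
law_c = sum(c for _ in range(n)) == n * c

# Double summation: add every entry x_ij of an n-by-m table, row by row.
table = [[1, 2, 3],
         [4, 5, 6]]
double_sum = sum(sum(row) for row in table)

print(law_a, law_b, law_c, double_sum)
```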
Parameters and Statistics
Parameter: any numerical value describing a characteristic of a population.
Statistic: any numerical value describing a characteristic of a sample.
Data Collection and Presentation
Data Collection
1. Data from primary sources: acquired through experiments, data-gathering devices, interviews, and surveys.
2. Data from secondary sources: taken from printed materials such as books, magazines, and journals.
Slovin's Formula
The sample size sufficient for a survey can be obtained using the formula
$n = \dfrac{N}{1 + N e^2}$
where
n = sample size
N = population size
e = estimated error (expressed as a decimal)
Example: Slovin's Formula
Problem: How large a sample should be chosen if we expect a 5% error from a population of 3000?
Given: N = 3000, e = 0.05
Required: n
Solution:
$n = \dfrac{3000}{1 + 3000(0.05)^2} = \dfrac{3000}{8.5} \approx 352.94$
Hence, we choose the next higher whole number, 353, as the minimum sample size sufficient for the survey.
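As a quick check of the worked example, Slovin's formula can be sketched in Python. The function name `slovin_sample_size` and its input validation are illustrative choices, not part of the original notes.

```python
import math


def slovin_sample_size(population_size: int, error: float) -> int:
    """Slovin's formula n = N / (1 + N e^2), rounded up to a whole number."""
    if population_size <= 0:
        raise ValueError("population_size must be positive")
    if not 0 < error < 1:
        raise ValueError("error must be a decimal between 0 and 1")
    raw = population_size / (1 + population_size * error ** 2)
    # Round up so the sample is never smaller than the formula requires.
    return math.ceil(raw)


n = slovin_sample_size(3000, 0.05)
print(n)
```

Rounding up with `math.ceil` mirrors the "next higher whole number" step in the example.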
Sampling Techniques
1. Non-Probability Sampling: a type of sampling in which an individual subject is either certain to be chosen or has no chance of being chosen as part of the sample.
a. Convenience Sampling: a sampling technique based primarily on the availability of the respondents.
b. Quota Sampling: a sampling technique where there is a desired number of samples, and respondents are taken as they volunteer to be part of the experiment.
c. Purposive Sampling: a sampling technique where the sample is obtained based on a certain premise.
2. Probability Sampling: eliminates the bias against elements that would otherwise have no chance of being selected, by listing all the possible elements.
a. Simple Random Sampling: performed by arranging the population according to a certain rule and numbering each element; the sample is then drawn using a randomizing procedure.
b. Systematic Sampling: done by arranging the population in a certain order, dividing it into equal groups, and taking the kth element of each group.
c. Stratified Sampling: done by grouping the population into strata, subpopulations with generally homogeneous or similar characteristics; random sampling is then performed in each stratum, in proportion to the size of the stratum relative to the population.
d. Cluster Sampling: done by identifying groups called clusters, subpopulations whose elements are as heterogeneous or diverse in characteristics as possible. Clusters must be similar to each other with respect to the parameter being examined. One or more clusters are selected for sampling.
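The probability sampling techniques above can be sketched as follows. This is a minimal illustration under stated assumptions: the population is a hypothetical list numbered 1 to 100, the strata sizes are made up, and all function names are illustrative.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct elements, each equally likely to be chosen."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def systematic_sample(population, n, k=0):
    """Split the ordered population into n equal groups and take the
    element at offset k within each group (leftover elements are ignored)."""
    group_size = len(population) // n
    if group_size == 0 or not 0 <= k < group_size:
        raise ValueError("k must index a position inside each group")
    return [population[g * group_size + k] for g in range(n)]


def proportional_allocation(strata_sizes, n):
    """Sample size per stratum, proportional to the stratum's size."""
    total = sum(strata_sizes.values())
    return {name: round(n * size / total) for name, size in strata_sizes.items()}


population = list(range(1, 101))           # hypothetical numbered population
srs = simple_random_sample(population, 10, seed=1)
sys_sample = systematic_sample(population, 10, k=2)
allocation = proportional_allocation({"A": 50, "B": 30, "C": 20}, 10)
print(srs, sys_sample, allocation)
```

Note that rounding in `proportional_allocation` may make the allocated sizes add up to slightly more or less than n for other inputs; a real survey design would need a rule for that.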
Types of Data
Qualitative: descriptions used to portray the attributes of the data.
Quantitative: measurable quantities such as scores, weights, and grades.
Grouped data: data organized into categories or classes.
Ungrouped data: raw data that has not been organized.
Classification of Data
Categorical data: items such as gender, color, civil status, and location, commonly answered in non-numeric (qualitative) form.
Numerical data: information and observations that are countable or measurable quantities, such as scores, weights, and grades (quantitative).
Levels of Measurement of Numerical Data
1. Nominal data: categorical data that has been assigned numbers, for example assigning 1 for males and 2 for females.
2. Ordinal data: quantities where the numbers designate the rank order of the data, for example the result of a contest or race where ranking is measured.
3. Interval data: data where the interval between consecutive numeric values is constant. Addition and subtraction can be performed on this type of data, but not multiplication and division. Examples are the year, measured temperature, and final grade.
4. Ratio data: widely used in science and engineering. Almost all basic operations can be performed on this data type. One significant characteristic is the presence of a non-arbitrary zero point. Examples are length, mass, angle, charge, and energy.
Presentation of Data
1. Textual Form: presenting data in statements, sentences, and paragraphs.
2. Tabular Form: using tables to present data in a way that is direct to the point and easily understood. Examples are the reference table and the summary table.
Ungrouped Data
Measures of Central Tendency
1. Mean: the average of all the data values
Population: $\mu = \dfrac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{x} = \dfrac{\sum_{i=1}^{n} x_i}{n}$
2. Median: the middle data value or score
Procedure:
a. Arrange the data in an array (in order of magnitude).
b. Locate the middle value.
   If the number of values is odd, the middle value is the median.
   If the number of values is even, the median is the average of the two middle values.
3. Mode: the most frequent value
   If two values occur most often with the same frequency, the data is bimodal.
   If three values occur most often with the same frequency, the data is trimodal.
Measures of Variation
These describe how the data is spread out from the mean.
1. Range
- The difference between the highest value (HV) and the lowest value (LV)
RANGE = HV - LV
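2. Standard Deviation
Population: $\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$
3. Variance
Population: $\sigma^2 = \dfrac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample: $s^2 = \dfrac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$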
4. Coefficient of Variation
- The ratio of the standard deviation to the mean, expressed as a percentage
Population: $V_p = \dfrac{\sigma}{\mu} \times 100\%$
Sample: $V_s = \dfrac{s}{\bar{x}} \times 100\%$
Example
A food inspector examined a random sample of 7 cans of a certain brand of tuna to determine the percentage of foreign impurities. The following data were recorded:
1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8
Calculations
Mean: $\bar{x} = \dfrac{\sum_{i=1}^{7} x_i}{7} = \dfrac{1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8}{7} = 1.8$
Median: the ordered array is 0.9, 1.6, 1.7, 1.8, 1.8, 2.1, 2.7, so $\tilde{x} = 1.8$
Mode: $\hat{x} = 1.8$
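The same ungrouped measures can be computed with Python's standard `statistics` module. This is a sketch using the tuna data from the example; the variable names are illustrative, and the sample (n - 1) versions of the standard deviation and variance are used because the cans are described as a sample.

```python
import statistics

impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

mean = statistics.mean(impurities)
median = statistics.median(impurities)
modes = statistics.multimode(impurities)       # a list, so bimodal data is handled
value_range = max(impurities) - min(impurities)
sample_sd = statistics.stdev(impurities)       # divides by n - 1
sample_var = statistics.variance(impurities)   # divides by n - 1
cv_percent = sample_sd / mean * 100            # coefficient of variation, in percent

print(mean, median, modes, value_range, sample_sd, sample_var, cv_percent)
```

For population data, `statistics.pstdev` and `statistics.pvariance` divide by N instead.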
Grouped Data
1. Stem-and-Leaf Plot: one way of summarizing ungrouped data. The table has two columns, one for the stems and the other for the leaves.
2. Frequency Distribution Table (FDT): a large number of data values can be analyzed by grouping them into classes with equal class intervals and determining the number of observations that fall within each class.
Example: Stem-and-Leaf Plot
Express the following data as a stem-and-leaf plot, with the tens digit as the stem and the ones digit as the leaf:
12, 23, 12, 11, 10, 25, 29, 39, 31, 43, 42, 54,
53, 53, 56, 57, 56, 67, 54, 65, 76, 76, 75, 74
Stem-and-Leaf Plot
STEM    LEAF             FREQUENCY
1       0,1,2,2          4
2       3,5,9            3
3       1,9              2
4       2,3              2
5       3,3,4,4,6,6,7    7
6       5,7              2
7       4,5,6,6          4
TOTAL                    24
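A stem-and-leaf plot like the one above can be built with a few lines of Python. This is a minimal sketch that assumes non-negative whole numbers with the tens digit as the stem; the function name `stem_and_leaf` is illustrative.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (tens digit) to its sorted list of leaves (ones digits)."""
    plot = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(value, 10)   # 53 -> stem 5, leaf 3
        plot[stem].append(leaf)
    return dict(plot)


data = [12, 23, 12, 11, 10, 25, 29, 39, 31, 43, 42, 54,
        53, 53, 56, 57, 56, 67, 54, 65, 76, 76, 75, 74]

plot = stem_and_leaf(data)
for stem, leaves in plot.items():
    # Print the stem, its leaves, and the frequency of that row.
    print(stem, ",".join(str(leaf) for leaf in leaves), len(leaves))
```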
Frequency Distribution Table Terms
Class limits: the smallest and largest values that fall within the class interval.
Class boundaries: more precise limits of the class intervals (carried to the next significant digit).
Frequency: the number of observations in a class.
Class width: the numerical difference between the upper and lower class boundaries.
Class mark: the midpoint of the class interval.
Less than cumulative frequency (<CF): frequencies added starting from the lowest class interval.
Greater than cumulative frequency (>CF): frequencies added starting from the highest class interval.
Steps in Constructing a Frequency Distribution
1. Decide on the number of class intervals required, using either
   the Square-Root Principle: $k = \sqrt{N}$
   or Sturges' Formula: $k = 1 + 3.322 \log_{10} N$
   (round the result up).
2. Determine the range.
3. Determine the size of the class interval by dividing the range by the desired number of class intervals (round off).
4. Determine the lower and upper class limits of each class interval. Start with the lowest data value or score.
5. Determine the number of observations falling under each class interval by tallying.
Example
Suppose that the following data are the average speeds (in kph) of vehicles running on SLEX:
27, 56, 38, 43, 48, 38, 43, 30, 34, 40,
50, 43, 57, 52, 25, 43, 35, 29, 49, 36,
29, 52, 46, 49, 46, 47, 31, 52, 41, 31,
55, 50, 42, 41, 52, 25, 46, 36, 41, 36
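Calculations
1. Number of classes
   Square-Root Principle: $k = \sqrt{40} \approx 6.32$, so $k = 7$
   or Sturges' Formula: $k = 1 + 3.322 \log_{10}(40) \approx 6.32$, so $k = 7$
2. Range = 57 - 25 = 32
3. Class size = $\dfrac{32}{7} \approx 4.57$, rounded off to 5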
Frequency Distribution
Class    f    x (Class Mark)    Class Boundaries    <CF    >CF
55-59    3    57                54.5-59.5           40     3
50-54    6    52                49.5-54.5           37     9
45-49    7    47                44.5-49.5           31     16
40-44    9    42                39.5-44.5           24     25
35-39    6    37                34.5-39.5           15     31
30-34    4    32                29.5-34.5           9      35
25-29    5    27                24.5-29.5           5      40
Total    N = 40
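The construction steps can be traced in code. The sketch below rebuilds the frequency distribution from the 40 speed values; it assumes whole-number data (so boundaries sit 0.5 beyond the limits) and uses illustrative variable names throughout.

```python
import math

speeds = [27, 56, 38, 43, 48, 38, 43, 30, 34, 40,
          50, 43, 57, 52, 25, 43, 35, 29, 49, 36,
          29, 52, 46, 49, 46, 47, 31, 52, 41, 31,
          55, 50, 42, 41, 52, 25, 46, 36, 41, 36]

n_obs = len(speeds)

# Step 1: number of classes, rounded up.
k_sqrt = math.ceil(math.sqrt(n_obs))
k_sturges = math.ceil(1 + 3.322 * math.log10(n_obs))

# Steps 2 and 3: range and class size (rounded off).
data_range = max(speeds) - min(speeds)
width = round(data_range / k_sqrt)

# Steps 4 and 5: class limits starting at the lowest score, then tally.
classes = []
cumulative = 0
for i in range(k_sqrt):
    lower = min(speeds) + i * width
    upper = lower + width - 1
    freq = sum(lower <= s <= upper for s in speeds)
    cumulative += freq
    classes.append({
        "limits": (lower, upper),
        "f": freq,
        "mark": (lower + upper) / 2,
        "boundaries": (lower - 0.5, upper + 0.5),  # assumes integer data
        "less_cf": cumulative,
        "greater_cf": n_obs - cumulative + freq,
    })

for row in reversed(classes):   # highest class first, as in the table
    print(row)
```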
Grouped Data
1. Measures of Central Location
a. Mean
a.1. Long Method
$\bar{x} = \dfrac{\sum fx}{N}$
where
f = frequency
x = class mark
N = number of observations
Example Calculations
Class    f    x     fx     d      fd     u     fu
55-59    3    57    171    15     45     3     9
50-54    6    52    312    10     60     2     12
45-49    7    47    329    5      35     1     7
40-44    9    42    378    0      0      0     0
35-39    6    37    222    -5     -30    -1    -6
30-34    4    32    128    -10    -40    -2    -8
25-29    5    27    135    -15    -75    -3    -15
Total    N = 40     Σfx = 1675    Σfd = -5    Σfu = -1
Example Calculations
a.1. Long Method
$\bar{x} = \dfrac{\sum fx}{N} = \dfrac{1675}{40} = 41.875$
a.2. Coding Method
$\bar{x} = A + \left(\dfrac{\sum fu}{N}\right) C$
where
A = class mark of the assumed mean class (the class interval containing the assumed mean)
C = class width
f = frequency
u = code, $u = (x - A)/C$
Example Calculations
a.2. Coding Method
$\bar{x} = A + \left(\dfrac{\sum fu}{N}\right) C = 42 + \left(\dfrac{-1}{40}\right)(5) = 41.875$
a.3. Short/Deviation Method
$\bar{x} = A + \dfrac{\sum fd}{N}$
where
A = class mark of the assumed mean class (the class interval containing the assumed mean)
f = frequency
d = deviation of the class mark from A, $d = x - A$
Example Calculations
a.3. Short/Deviation Method
$\bar{x} = A + \dfrac{\sum fd}{N} = 42 + \dfrac{-5}{40} = 41.875$
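All three methods should give the same grouped mean, which the following sketch confirms for the speed data. The dictionary `freq` (class mark to frequency) and the variable names are illustrative; A = 42 and C = 5 are the assumed mean and class width used in the example.

```python
# Class mark -> frequency, taken from the frequency distribution of the speed data.
freq = {57: 3, 52: 6, 47: 7, 42: 9, 37: 6, 32: 4, 27: 5}

N = sum(freq.values())
A = 42   # class mark of the assumed mean class
C = 5    # class width

# Long method: mean = sum(f * x) / N
mean_long = sum(f * x for x, f in freq.items()) / N

# Short/deviation method: d = x - A
sum_fd = sum(f * (x - A) for x, f in freq.items())
mean_short = A + sum_fd / N

# Coding method: u = (x - A) / C, a whole number for equal-width classes
sum_fu = sum(f * ((x - A) // C) for x, f in freq.items())
mean_coded = A + (sum_fu / N) * C

print(mean_long, mean_short, mean_coded)
```

The coding method keeps the arithmetic small when working by hand, which is why it appears alongside the long method.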
b. Median
$\tilde{x} = L_{Md} + \left(\dfrac{\frac{N}{2} - f_{<}}{f_{Md}}\right) C$
where
LMd = lower class boundary of the median class
f< = the cumulative frequency of the class preceding the median class
fMd = frequency of the median class
C = class size
Sample Calculations
Class    f    <CF
55-59    3    40
50-54    6    37
45-49    7    31
40-44    9    24
35-39    6    15
30-34    4    9
25-29    5    5
Total    N = 40
N = 40, N/2 = 20
The median is the 20th term, hence it falls in the 40-44 class interval.
LMd = 39.5
f< = 15
fMd = 9
$\tilde{x} = 39.5 + \left(\dfrac{20 - 15}{9}\right)(5) \approx 42.28$
c. Mode
$\hat{x} = L_{Mo} + \left(\dfrac{d_1}{d_1 + d_2}\right) C$
where
LMo = lower class boundary of the modal class
d1 = $f_{mod} - f_{mod-1}$ = difference between the frequencies of the modal class and the one preceding it
d2 = $f_{mod} - f_{mod+1}$ = difference between the frequencies of the modal class and the one following it
C = class size
Sample Calculations
Class    f
55-59    3
50-54    6
45-49    7
40-44    9
35-39    6
30-34    4
25-29    5
Total    N = 40
The modal class is the 40-44 class interval.
LMo = 39.5
d1 = 9 - 6 = 3
d2 = 9 - 7 = 2
$\hat{x} = 39.5 + \left(\dfrac{3}{3 + 2}\right)(5) = 42.5$
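The grouped median and mode formulas can be sketched as small functions. The following assumes whole-number class limits (so the lower boundary is the lower limit minus 0.5) and treats a missing neighbor of the modal class as having frequency 0; both are assumptions of this sketch, and the function names are illustrative.

```python
# (lower class limit, frequency), listed from the lowest class to the highest.
classes = [(25, 5), (30, 4), (35, 6), (40, 9), (45, 7), (50, 6), (55, 3)]
C = 5  # class size


def grouped_median(classes, width):
    """L_Md + ((N/2 - f<) / f_Md) * C"""
    n = sum(f for _, f in classes)
    below = 0  # cumulative frequency of the classes before the current one
    for lower, f in classes:
        if below + f >= n / 2:
            return (lower - 0.5) + (n / 2 - below) / f * width
        below += f
    raise ValueError("classes must contain at least one observation")


def grouped_mode(classes, width):
    """L_Mo + (d1 / (d1 + d2)) * C, using the first class with the top frequency."""
    freqs = [f for _, f in classes]
    m = freqs.index(max(freqs))
    prev_f = freqs[m - 1] if m > 0 else 0               # assumption: 0 if no neighbor
    next_f = freqs[m + 1] if m < len(freqs) - 1 else 0  # assumption: 0 if no neighbor
    d1 = freqs[m] - prev_f
    d2 = freqs[m] - next_f
    return (classes[m][0] - 0.5) + d1 / (d1 + d2) * width


median = grouped_median(classes, C)
mode = grouped_mode(classes, C)
print(round(median, 2), mode)
```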
2. Measures of Variation/Deviation
a. Mean Deviation
$MD = \dfrac{\sum f\,|x - \bar{x}|}{N}$
Sample Calculations
Class    f    x     |x-xbar|    f|x-xbar|
55-59    3    57    15.125      45.375
50-54    6    52    10.125      60.75
45-49    7    47    5.125       35.875
40-44    9    42    0.125       1.125
35-39    6    37    4.875       29.25
30-34    4    32    9.875       39.5
25-29    5    27    14.875      74.375
Total    N = 40                 286.25
$MD = \dfrac{286.25}{40} \approx 7.16$
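b. Standard Deviation
$s = \sqrt{\dfrac{\sum f (x - \bar{x})^2}{N - 1}}$
$s = \sqrt{\dfrac{N \sum f x^2 - \left(\sum f x\right)^2}{N (N - 1)}}$
$s = C \sqrt{\dfrac{N \sum f u^2 - \left(\sum f u\right)^2}{N (N - 1)}}$
c. Variance
$s^2 = \dfrac{\sum f (x - \bar{x})^2}{N - 1}$
$s^2 = \dfrac{N \sum f x^2 - \left(\sum f x\right)^2}{N (N - 1)}$
$s^2 = C^2 \left[\dfrac{N \sum f u^2 - \left(\sum f u\right)^2}{N (N - 1)}\right]$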
Sample Calculations
Class    f    x     |x-xbar|    sqr|x-xbar|    f(sqr|x-xbar|)
55-59    3    57    15.125      228.765625     686.296875
50-54    6    52    10.125      102.515625     615.09375
45-49    7    47    5.125       26.265625      183.859375
40-44    9    42    0.125       0.015625       0.140625
35-39    6    37    4.875       23.765625      142.59375
30-34    4    32    9.875       97.515625      390.0625
25-29    5    27    14.875      221.265625     1106.328125
Total    N = 40                                3124.375
$s^2 = \dfrac{3124.375}{39} \approx 80.11$
$s = \sqrt{80.11} \approx 8.95$
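The grouped measures of variation can be verified with a short script. This sketch computes the mean deviation, variance, and standard deviation from the definitions and cross-checks the variance with the coding formula; `freq`, `A`, and `C` follow the speed example, and the other names are illustrative.

```python
import math

# Class mark -> frequency, from the frequency distribution of the speed data.
freq = {57: 3, 52: 6, 47: 7, 42: 9, 37: 6, 32: 4, 27: 5}
N = sum(freq.values())
mean = sum(f * x for x, f in freq.items()) / N

# Mean deviation: sum of f * |x - mean|, divided by N.
mean_dev = sum(f * abs(x - mean) for x, f in freq.items()) / N

# Variance and standard deviation from the definition (N - 1 in the denominator).
sum_sq = sum(f * (x - mean) ** 2 for x, f in freq.items())
variance = sum_sq / (N - 1)
std_dev = math.sqrt(variance)

# Cross-check with the coding formula, u = (x - A) / C.
A, C = 42, 5
sum_fu = sum(f * ((x - A) // C) for x, f in freq.items())
sum_fu2 = sum(f * ((x - A) // C) ** 2 for x, f in freq.items())
variance_coded = C ** 2 * (N * sum_fu2 - sum_fu ** 2) / (N * (N - 1))

print(round(mean_dev, 2), round(variance, 2), round(std_dev, 2))
```

Computing the variance directly and taking its square root avoids the small discrepancy that appears when a rounded standard deviation is squared.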
d. Semi-interquartile Range (Q)
$Q = \dfrac{Q_3 - Q_1}{2}$
e. 10-90 Percentile Range
$P_{10-90} = P_{90} - P_{10}$
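3. Measures of Position
These follow the pattern of the median formula
$\tilde{x} = L_{Md} + \left(\dfrac{\frac{N}{2} - f_{<}}{f_{Md}}\right) C$
a. Quartiles
$Q_k = L_Q + \left(\dfrac{\frac{kN}{4} - f_{<}}{f_Q}\right) C$
b. Deciles
$D_k = L_D + \left(\dfrac{\frac{kN}{10} - f_{<}}{f_D}\right) C$
c. Percentiles
$P_k = L_P + \left(\dfrac{\frac{kN}{100} - f_{<}}{f_P}\right) C$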
Sample Calculations
Compute Q1, D7, and P30.
Class    f    <CF
55-59    3    40
50-54    6    37
45-49    7    31
40-44    9    24
35-39    6    15
30-34    4    9
25-29    5    5
Total    N = 40
Quartile 1:
k = 1, so kN/4 = 10
Q1 class is 35-39
LQ = 34.5
f< = 9
fQ = 6
$Q_1 = 34.5 + \left(\dfrac{10 - 9}{6}\right)(5) \approx 35.33$
That is, 25% of the data fall below the score 35.33.
Decile 7:
k = 7, so kN/10 = 28
D7 class is 45-49
LD = 44.5
f< = 24
fD = 7
$D_7 = 44.5 + \left(\dfrac{28 - 24}{7}\right)(5) \approx 47.36$
That is, 70% of the data fall below the score 47.36.
Percentile 30:
k = 30, so kN/100 = 12
P30 class is 35-39
LP = 34.5
f< = 9
fP = 6
$P_{30} = 34.5 + \left(\dfrac{12 - 9}{6}\right)(5) = 37.00$
That is, 30% of the data fall below the score 37.00.
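Quartiles, deciles, and percentiles share one formula, differing only in the divisor (4, 10, or 100). The sketch below captures that in a single hypothetical function, `grouped_quantile`, assuming whole-number class limits so that the lower boundary is the lower limit minus 0.5.

```python
# (lower class limit, frequency), listed from the lowest class to the highest.
classes = [(25, 5), (30, 4), (35, 6), (40, 9), (45, 7), (50, 6), (55, 3)]
C = 5  # class size


def grouped_quantile(classes, width, k, parts):
    """L + ((k*N/parts - f<) / f) * C, with parts = 4, 10, or 100."""
    n = sum(f for _, f in classes)
    target = k * n / parts
    below = 0  # cumulative frequency of the classes before the current one
    for lower, f in classes:
        if below + f >= target:
            return (lower - 0.5) + (target - below) / f * width
        below += f
    raise ValueError("k must not exceed parts")


q1 = grouped_quantile(classes, C, 1, 4)
d7 = grouped_quantile(classes, C, 7, 10)
p30 = grouped_quantile(classes, C, 30, 100)

# The same function gives the ranges defined earlier.
q3 = grouped_quantile(classes, C, 3, 4)
semi_interquartile = (q3 - q1) / 2
p10_90 = grouped_quantile(classes, C, 90, 100) - grouped_quantile(classes, C, 10, 100)

print(round(q1, 2), round(d7, 2), round(p30, 2))
```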
Position by Normalization
The z-score
$z = \dfrac{x - \bar{x}}{s}$
- transforms any observation of the variable x into an observation of a new random variable z with zero mean and unit variance.
Sample Calculations
Suppose that you are comparing your grade in Statistics against your grade in Accounting. Let us say that your grade in Statistics is 89 and your grade in Accounting is only 85. If those are the only data available, we can say that your grade in Statistics is better than your grade in Accounting. However, if the mean grade that your teacher gave is 80 in Statistics and 78 in Accounting, and the standard deviations are 6 and 4, respectively, then we can no longer say with certainty that the grade in Statistics is the better one. This is where the z-score is useful.
Sample Calculations
Subject       Grade    Mean    Std. Deviation    z-score
Statistics    89       80      6                 1.50
Accounting    85       78      4                 1.75
Statistics: $z = \dfrac{89 - 80}{6} = 1.50$
Accounting: $z = \dfrac{85 - 78}{4} = 1.75$
The z-score of the grade in Accounting turned out to be higher than that of the grade in Statistics. If we plot both on a normal curve with the mean at the middle, the z-score of the Accounting grade lies farther from the mean. Hence, relative to the rest of the class, the grade in Accounting is higher than the grade in Statistics.
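The comparison of the two grades can be reproduced with a one-line z-score function. The function name `z_score` and its guard against a non-positive standard deviation are illustrative additions.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (or below) the mean."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return (x - mean) / std_dev


z_statistics = z_score(89, 80, 6)
z_accounting = z_score(85, 78, 4)

# The larger z-score marks the grade that stands out more within its own class.
better_subject = "Accounting" if z_accounting > z_statistics else "Statistics"
print(z_statistics, z_accounting, better_subject)
```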
Graphical Representation
a. Bar Chart (Horizontal) or Histogram (Vertical)
A set of rectangles drawn using the class boundaries (x-axis for a histogram, y-axis for a bar chart) and the frequencies (y-axis for a histogram, x-axis for a bar chart).
b. Frequency Polygon
A line plot of the frequency (y-axis) versus the class marks (x-axis).
[Figure: frequency polygon; x-axis: class marks from 27 to 57, y-axis: frequency from 0 to 10]
c. Ogive
A plot of the class boundaries (x-axis) against the cumulative (less than or greater than) frequency (y-axis).
[Figure: ogive; x-axis: class boundaries from 24.5 to 54.5, y-axis: cumulative frequency from 0 to 45]
d. Pie Chart: a circular chart representing the percentage occurrence of each group in the sample.
[Figure: pie chart of the class intervals; 25-29: 13%, 30-34: 10%, 35-39: 15%, 40-44: 23%, 45-49: 17%, 50-54: 15%, 55-59: 7%]
Skewness
- Defined as the degree of asymmetry of a distribution, whereas kurtosis is the degree of peakedness exhibited by the distribution.
- The skewness coefficient (Sk) of a distribution can be determined using the Pearsonian Coefficient of Skewness:
$Sk = \dfrac{3(\bar{x} - \tilde{x})}{s}$
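As an illustration, the Pearsonian coefficient can be applied to the grouped speed data, using the mean (41.875), median (42.28), and standard deviation (8.95) computed earlier in these notes. The function name `pearson_skewness` is an illustrative choice.

```python
def pearson_skewness(mean, median, std_dev):
    """Pearsonian coefficient of skewness: Sk = 3 * (mean - median) / s."""
    if std_dev <= 0:
        raise ValueError("std_dev must be positive")
    return 3 * (mean - median) / std_dev


# Rounded results from the grouped speed example.
sk = pearson_skewness(mean=41.875, median=42.28, std_dev=8.95)

if sk > 0:
    shape = "positively skewed"
elif sk < 0:
    shape = "negatively skewed"
else:
    shape = "symmetric"
print(round(sk, 3), shape)
```

Because the mean is slightly below the median, the coefficient comes out slightly negative, so the speed data is mildly negatively skewed.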
[Figure: three distribution curves: symmetric (Sk = 0), positively skewed (Sk > 0), and negatively skewed (Sk < 0)]
Kurtosis
$Kurtosis = \dfrac{n(n+1)}{(n-1)(n-2)(n-3)} \sum_{i=1}^{n} \left(\dfrac{x_i - \bar{x}}{s}\right)^4 - \dfrac{3(n-1)^2}{(n-2)(n-3)}$
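The kurtosis formula can be sketched directly in Python. This assumes the formula is the usual sample excess kurtosis, with s the sample standard deviation and the squared term $(n-1)^2$ in the correction; the function name is illustrative, and the data must have at least four values that are not all equal.

```python
import statistics


def sample_excess_kurtosis(values):
    """n(n+1)/((n-1)(n-2)(n-3)) * sum(((x - mean)/s)^4) - 3(n-1)^2/((n-2)(n-3))"""
    n = len(values)
    if n < 4:
        raise ValueError("at least four observations are required")
    mean = statistics.mean(values)
    s = statistics.stdev(values)   # sample standard deviation (n - 1)
    fourth_powers = sum(((x - mean) / s) ** 4 for x in values)
    scale = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return scale * fourth_powers - correction


# Small hand-checkable case: evenly spaced values are flatter than a normal curve.
k_flat = sample_excess_kurtosis([1, 2, 3, 4, 5])

# The tuna impurity sample from the ungrouped-data example.
k_tuna = sample_excess_kurtosis([1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8])
print(k_flat, k_tuna)
```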
