
Engineering Probability and Statistics
Fundamentals of Statistics

Introduction
Statistics
- Both singular and plural in meaning
- May refer to numerical or quantitative data such as age, weight, height, etc.
- Is a scientific method of collection, presentation, analysis, and interpretation of data for the purpose of drawing valid conclusions and making reasonable decisions

Fields of Statistics
Descriptive Statistics
- Methods concerned with the collection and description of a set of data to yield meaningful information
- Provides information only about the collected data and does not draw inferences or conclusions about a larger set of data

Fields of Statistics
Inferential Statistics
- Composed of those methods concerned with the analysis of a smaller group of data, known as the sample, leading to predictions or inferences about the larger set of data, or the population, from which the sample is drawn.

The Language of Summation
The symbol $\sum_{i=1}^{n} x_i$ is read as the sum (or summation) of $x_1, x_2, x_3, \dots, x_n$.
Or it is technically read as the sum of $x_i$ terms where i ranges from 1 to n.
The symbol $\Sigma$, the Greek capital letter sigma, is used to denote the sum.

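Laws of Notation
a.) $\sum_{i=1}^{n} (x_i + y_i) = \sum_{i=1}^{n} x_i + \sum_{i=1}^{n} y_i$
b.) $\sum_{i=1}^{n} c x_i = c \sum_{i=1}^{n} x_i$
c.) $\sum_{i=1}^{n} c = nc$
Double summation:
$\sum_{i=1}^{n} \sum_{j=1}^{m} x_{ij} = (x_{11} + x_{12} + \dots + x_{1m}) + (x_{21} + x_{22} + \dots + x_{2m}) + \dots + (x_{n1} + x_{n2} + \dots + x_{nm})$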
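
The summation laws can be checked numerically. The sketch below is only an illustration: the lists x and y, the constant c, and the small table are made-up values, not data from these slides.

```python
# Numerical check of the summation laws on small, made-up lists.
x = [2, 4, 6, 8]
y = [1, 3, 5, 7]
c = 3
n = len(x)

# a.) the sum of (x_i + y_i) equals the sum of x_i plus the sum of y_i
law_a = sum(xi + yi for xi, yi in zip(x, y)) == sum(x) + sum(y)
# b.) a constant factor can be moved outside the summation
law_b = sum(c * xi for xi in x) == c * sum(x)
# c.) summing a constant n times gives n * c
law_c = sum(c for _ in range(n)) == n * c

# Double summation: add every entry of an n-by-m table, row by row.
table = [[1, 2, 3], [4, 5, 6]]
double_sum = sum(sum(row) for row in table)

print(law_a, law_b, law_c, double_sum)
```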

Parameters and Statistics
Parameter: any numerical value describing the characteristics of a population
Statistic: any numerical value describing the characteristics of a sample

Data Collection and Presentation
Data Collection
1. Data from primary sources: acquired through experiments, data-gathering devices, interviews, and surveys
2. Data from secondary sources: taken from printed materials such as books, magazines, and journals

Data Collection and Presentation
Slovin's Formula
The sufficiency of the sample size in surveys can be checked using the formula
$n = \frac{N}{1 + N e^2}$
where
n = sample size
N = population size
e = estimated error

Example: Slovin's Formula
Problem: How large a sample should be chosen if we expect 5% error from a population of 3000?
Given: N = 3000, e = 0.05
Required: n
Solution:
$n = \frac{3000}{1 + 3000(0.05)^2} = 352.94$
Hence, we choose the next higher whole value, which is 353, as the minimum number of samples needed for the study.
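
A minimal sketch of the same computation. The function name slovin_sample_size is a hypothetical helper, and rounding up with math.ceil follows the "next higher whole value" rule in the example.

```python
import math

def slovin_sample_size(population_size, margin_of_error):
    """Return n = N / (1 + N e^2), rounded up to the next whole number."""
    raw = population_size / (1 + population_size * margin_of_error ** 2)
    return math.ceil(raw)

sample_size = slovin_sample_size(3000, 0.05)
print(sample_size)
```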

Sampling Techniques
1. Non-Probability Sampling: type of sampling in which an individual subject is either certain to be chosen or has no chance of being chosen as part of the sample.
a. Convenience Sampling: sampling technique based primarily on the availability of the respondents.

Sampling Techniques
Non-Probability Sampling
b. Quota Sampling: sampling technique where there is a desired number of samples and the respondents are taken as they volunteer to be part of the experiment.
c. Purposive Sampling: sampling technique where the sample is obtained based on a certain premise.

Sampling Techniques
2. Probability Sampling: eliminates bias against events that would otherwise have no chance of being selected, by listing all the possible events.
a. Simple Random Sampling: performed by arranging the population according to a certain rule, numbering each element, and taking a sample by one of various randomizing principles.

Sampling Techniques
Probability Sampling
b. Systematic Sampling: done by arranging the population according to a certain order, dividing it into equal groups, and taking the kth element in each group as the sample.

Sampling Techniques
Probability Sampling
c. Stratified Sampling: technique done by grouping the population into strata, subpopulations with generally homogeneous or similar characteristics, then performing random sampling in each stratum in proportion to the size of the stratum relative to the population.

Sampling Techniques
Probability Sampling
d. Cluster Sampling: technique done by identifying groups called clusters, subpopulations whose elements are as heterogeneous or diverse in characteristics as possible. Clusters must be similar to each other with respect to the parameter being examined. One or more clusters are then selected for the sampling.
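
Three of the probability sampling techniques above can be sketched as follows. This is an illustration under stated assumptions: the population of 100 numbered elements, the two strata names, and the helper function names are all hypothetical, and the stratified version simply rounds each stratum's share, which may not add up exactly to n for other stratum sizes.

```python
import random

def simple_random_sample(population, n, rng):
    """Draw n distinct elements, each equally likely to be chosen."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Take every k-th element after a random start, with k = len(population) // n."""
    k = len(population) // n
    start = rng.randrange(k)
    return [population[start + i * k] for i in range(n)]

def stratified_sample(strata, n, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = []
    for members in strata.values():
        share = round(n * len(members) / total)
        sample.extend(rng.sample(members, share))
    return sample

rng = random.Random(0)  # fixed seed so the sketch is repeatable
population = list(range(1, 101))
strata = {"first_year": list(range(1, 61)), "second_year": list(range(61, 101))}

srs = simple_random_sample(population, 10, rng)
systematic = systematic_sample(population, 10, rng)
stratified = stratified_sample(strata, 10, rng)
print(srs, systematic, stratified, sep="\n")
```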

Types of Data
Qualitative: descriptions used to portray the attributes of data
Quantitative: measurable quantities such as scores, weights, and grades being collected
Grouped data: categorized data
Ungrouped data: raw, random data

Classification of Data
Categorical data: items like gender, color, civil status, and location, commonly recorded in non-numeric (qualitative) form.
Numerical data: information and observations that are countable or measurable quantities such as scores, weights, and grades (quantitative).

Levels of Measurement of Numerical Data
1. Nominal data: commonly categorical data assigned to numbers. An example is assigning 1 for males and 2 for females.
2. Ordinal data: quantities where the numbers are used to designate the rank order of the data. An example is the result of a contest or race where ranking is measured.

Levels of Measurement of Numerical Data
3. Interval data: data type where the range between the numeric values is constant. In this type of data, addition and subtraction can be performed but not multiplication and division. Examples are the year, measured temperature, final grade, etc.
4. Ratio data: widely used in science and engineering. Almost all basic operations can be performed on this data type. One significant characteristic is the presence of a non-arbitrary zero point. Examples are length, mass, angles, charge, and energy.

Presentation of Data
1. Textual Form: way of presenting data in terms of statements, sentences, and paragraphs.
2. Tabular Form: using tables to present data that is direct to the point and easily understood. Examples are the reference table and the summary table.

Ungrouped Data
Measures of Central Tendency
1. Mean: average of all the data or values
Population: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
Sample: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$

Ungrouped Data
2. Median ($\tilde{x}$): middle data or score
Procedure:
a. Arrange the data in an array.
b. Locate the middle value.
- If the number of values is odd, the middle value is the median.
- If even, take the average of the two middle values.

Ungrouped Data
3. Mode ($\hat{x}$): most frequent value
- If two values occur the most at the same frequency, the data is bi-modal.
- If three values occur the most at the same frequency, the data is tri-modal.

Ungrouped Data
Measures of Variation of Data
How the data is spread out from the mean
1. Range
- Difference of the highest and lowest values
RANGE = HV - LV

Ungrouped Data
2. Standard Deviation
Population: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$
Sample: $s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$

Ungrouped Data
3. Variance
Population: $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$
Sample: $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$

Ungrouped Data
4. Coefficient of Variation
- the ratio of the standard deviation to the mean, expressed as a percentage
Population: $V_p = \frac{\sigma}{\mu} \times 100\%$
Sample: $V_s = \frac{s}{\bar{x}} \times 100\%$

Example
A food inspector examined a random sample of 7 cans of a certain brand of tuna to determine the percentage of foreign impurities. The following data were recorded:

1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8

Calculations
Mean: $\bar{x} = \frac{\sum_{i=1}^{7} x_i}{7} = \frac{1.8 + 2.1 + 1.7 + 1.6 + 0.9 + 2.7 + 1.8}{7} = 1.8$
Median: 0.9, 1.6, 1.7, 1.8, 1.8, 2.1, 2.7, so $\tilde{x} = 1.8$
Mode: $\hat{x} = 1.8$
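
The measures of central tendency and variation for the tuna data can be reproduced with Python's standard statistics module. This is a sketch, not part of the original example; the variable names are illustrative, and the sample (n - 1) formulas are used because the seven cans are a sample.

```python
import statistics

impurities = [1.8, 2.1, 1.7, 1.6, 0.9, 2.7, 1.8]

mean = statistics.mean(impurities)
median = statistics.median(impurities)
mode = statistics.mode(impurities)
data_range = max(impurities) - min(impurities)   # HV - LV
sample_sd = statistics.stdev(impurities)         # divides by n - 1
sample_var = statistics.variance(impurities)     # divides by n - 1
coeff_var = sample_sd / mean * 100               # in percent

print(round(mean, 2), median, mode, round(data_range, 2))
print(round(sample_sd, 4), round(sample_var, 4), round(coeff_var, 2))
```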

Grouped Data
1. Stem-Leaf Plot: one way of summarizing ungrouped data. This table has two columns, one for the stem and the other for the leaves.
2. Frequency Distribution Table (fdt): numerous data can be analyzed by grouping the data into different classes with equal class intervals and determining the number of observations that fall within each class.

Example: Stem-Leaf Plot
Express the following data as a stem and leaf plot with the tens digit as the stem and the ones digit as the leaves:
12, 23, 12, 11, 10, 25, 29, 39, 31, 43, 42, 54,
53, 53, 56, 57, 56, 67, 54, 65, 76, 76, 75, 74

Hint: treat the tens digit as the stem and the ones digits as the leaves.

Example: Stem-Leaf Plot
STEM-LEAF PLOT
STEM    LEAF             FREQUENCY
1       0,1,2,2          4
2       3,5,9            3
3       1,9              2
4       2,3              2
5       3,3,4,4,6,6,7    7
6       5,7              2
7       4,5,6,6          4
TOTAL                    24
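
The plot above can be generated programmatically. A minimal sketch, assuming two-digit data so that divmod by 10 splits each value into its tens digit (stem) and ones digit (leaf):

```python
from collections import defaultdict

data = [12, 23, 12, 11, 10, 25, 29, 39, 31, 43, 42, 54,
        53, 53, 56, 57, 56, 67, 54, 65, 76, 76, 75, 74]

stems = defaultdict(list)
for value in sorted(data):
    stem, leaf = divmod(value, 10)   # tens digit, ones digit
    stems[stem].append(leaf)

# One row per stem: stem, its leaves in order, and the frequency.
for stem in sorted(stems):
    leaves = ",".join(str(leaf) for leaf in stems[stem])
    print(f"{stem} | {leaves} | {len(stems[stem])}")
```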

Frequency Distribution Table Terms
Class limits: smallest and largest values that fall within the class interval
Class boundaries: more precise limits of the class intervals, carried to the next significant digit
Frequency: number of observations
Class width: numerical difference between the upper and lower class boundaries
Class mark: midpoint of the class interval

Frequency Distribution Table Terms
Less than cumulative frequency: start adding from the lowest class interval
Greater than cumulative frequency: start adding from the highest class interval

Grouped Data
Steps in constructing a frequency distribution
1. Decide on the number of class intervals required.
Square-Root Principle: $k = \sqrt{N}$
or
Sturges' Formula: $k = 1 + 3.322 \log N$
(round up the result)
2. Determine the range.
3. Determine the size of the class interval by dividing the range by the desired number of class intervals (round off).
4. Determine the lower and upper class limits of each class interval. Start with the lowest data or score.
5. Determine the number of observations falling under each class interval by tallying.

Example
Suppose that the following data are the average speeds of vehicles running in SLEX (in kph):
27  56  38  43  48  38  43  30
34  40  50  43  57  52  25  43
35  29  49  36  29  52  46  49
46  47  31  52  41  31  55  50
42  41  52  25  46  36  41  36

Calculations
1. Number of class intervals
Square-Root Formula: $k = \sqrt{40} = 6.32$, so k = 7
or Sturges' Formula: $k = 1 + 3.322 \log(40) = 6.32$, so k = 7
2. Range = 57 - 25 = 32
3. Class size = 32/7 = 4.57, rounded off to 5

Frequency Distribution
Class    f    x (Class Marks)    Class Boundaries    <CF    >CF
55-59    3    57                 54.5-59.5           40     3
50-54    6    52                 49.5-54.5           37     9
45-49    7    47                 44.5-49.5           31     16
40-44    9    42                 39.5-44.5           24     25
35-39    6    37                 34.5-39.5           15     31
30-34    4    32                 29.5-34.5           9      35
25-29    5    27                 24.5-29.5           5      40
Total    N = 40
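
The construction steps can be sketched in code for the speed data. This is an illustrative sketch, not the slides' own procedure: it assumes integer data (so boundaries sit 0.5 beyond the limits), uses Sturges' formula for k, and lists classes from lowest to highest, the reverse of the table above.

```python
import math

speeds = [27, 56, 38, 43, 48, 38, 43, 30, 34, 40,
          50, 43, 57, 52, 25, 43, 35, 29, 49, 36,
          29, 52, 46, 49, 46, 47, 31, 52, 41, 31,
          55, 50, 42, 41, 52, 25, 46, 36, 41, 36]

n = len(speeds)
k = math.ceil(1 + 3.322 * math.log10(n))   # Sturges' formula, rounded up
data_range = max(speeds) - min(speeds)
width = round(data_range / k)              # class size, rounded off

rows = []
less_than_cf = 0
lower = min(speeds)                        # start with the lowest score
for _ in range(k):
    upper = lower + width - 1
    f = sum(lower <= s <= upper for s in speeds)   # tally
    less_than_cf += f
    rows.append({
        "class": (lower, upper),
        "f": f,
        "mark": (lower + upper) / 2,
        "boundaries": (lower - 0.5, upper + 0.5),
        "<CF": less_than_cf,
        ">CF": n - less_than_cf + f,
    })
    lower = upper + 1

for row in rows:
    print(row)
```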

Grouped Data
1. Measures of Central Location
a. Mean
a.1. Long Method
$\bar{x} = \frac{\sum f x}{N}$
where
f = frequency
x = class mark
N = number of observations

Example Calculations
Class    f     x     fx      d      fd     u     fu
55-59    3     57    171     15     45     3     9
50-54    6     52    312     10     60     2     12
45-49    7     47    329     5      35     1     7
40-44    9     42    378     0      0      0     0
35-39    6     37    222     -5     -30    -1    -6
30-34    4     32    128     -10    -40    -2    -8
25-29    5     27    135     -15    -75    -3    -15
Total    N = 40      $\sum fx$ = 1675     $\sum fd$ = -5     $\sum fu$ = -1

Example Calculations
a.1. Long Method
$\bar{x} = \frac{\sum f x}{N} = \frac{1675}{40} = 41.875$

Grouped Data
a.2. Coding Method
$\bar{x} = A + C \left( \frac{\sum f u}{N} \right)$
where
A = class mark of the mean class (class interval of the assumed mean)
C = class width
f = frequency
u = code
N = number of observations


Example Calculations
a.2. Coding Method
$\bar{x} = A + C \left( \frac{\sum f u}{N} \right) = 42 + 5 \left( \frac{-1}{40} \right) = 41.875$

Grouped Data
a.3. Short/Deviation Method
$\bar{x} = A + \frac{\sum f d}{N}$
where
A = class mark of the mean class (class interval of the assumed mean)
f = frequency
d = deviation
N = number of observations


Example Calculations
a.3. Short/Deviation Method
$\bar{x} = A + \frac{\sum f d}{N} = 42 + \frac{-5}{40} = 41.875$
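
The three methods give the same mean, which can be confirmed with a short sketch. The list of (class mark, frequency) pairs comes from the speed example; the variable names are illustrative, with A = 42 as the assumed mean and C = 5 as the class width.

```python
# (class mark, frequency) pairs from the speed example
classes = [(57, 3), (52, 6), (47, 7), (42, 9), (37, 6), (32, 4), (27, 5)]
N = sum(f for _, f in classes)
A = 42   # class mark of the assumed-mean class
C = 5    # class width

# a.1 Long method: sum of f*x over N
long_mean = sum(f * x for x, f in classes) / N

# a.2 Coding method: u = (x - A) / C is the integer code of each class
coding_mean = A + C * sum(f * ((x - A) // C) for x, f in classes) / N

# a.3 Short/deviation method: d = x - A
deviation_mean = A + sum(f * (x - A) for x, f in classes) / N

print(long_mean, coding_mean, deviation_mean)
```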

Grouped Data
b. Median
$\tilde{x} = L_{Md} + C \left( \frac{\frac{N}{2} - f_{<}}{f_{Md}} \right)$
where
$L_{Md}$ = lower class boundary of the median class
$f_{<}$ = the cumulative frequency of the class preceding the median class
$f_{Md}$ = frequency of the median class
C = class size

Sample Calculations
Class    f    f<
55-59    3    40
50-54    6    37
45-49    7    31
40-44    9    24
35-39    6    15
30-34    4    9
25-29    5    5
Total    N = 40

N = 40, N/2 = 20
The median is the 20th term, hence it falls in the 40-44 class interval.
$L_{Md}$ = 39.5
$f_{<}$ = 15
$f_{Md}$ = 9
$\tilde{x} = 39.5 + 5 \left( \frac{20 - 15}{9} \right) = 42.28$

Grouped Data
c. Mode
$\hat{x} = L_{Mo} + C \left( \frac{d_1}{d_1 + d_2} \right)$
where
$L_{Mo}$ = lower class boundary of the modal class
$d_1 = f_{mod} - f_{mod-1}$ = difference between the frequencies of the modal class and the one preceding it
$d_2 = f_{mod} - f_{mod+1}$ = difference between the frequencies of the modal class and the one following it
C = class size

Sample Calculations
Class    f
55-59    3
50-54    6
45-49    7
40-44    9
35-39    6
30-34    4
25-29    5
Total    N = 40

The modal class is the class interval 40-44.
$L_{Mo}$ = 39.5
$d_1$ = 9 - 6 = 3
$d_2$ = 9 - 7 = 2
$\hat{x} = 39.5 + 5 \left( \frac{3}{3 + 2} \right) = 42.5$
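
The grouped median and mode formulas can be sketched as two small functions. The function names are hypothetical, the classes are listed in ascending order as (lower class boundary, frequency) pairs, and the mode function assumes a single modal class (it takes the first class with the highest frequency).

```python
# Classes in ascending order: (lower class boundary, frequency)
classes = [(24.5, 5), (29.5, 4), (34.5, 6), (39.5, 9),
           (44.5, 7), (49.5, 6), (54.5, 3)]
C = 5  # class size

def grouped_median(classes, width):
    """L_Md + C * ((N/2 - f<) / f_Md), scanning up the cumulative frequency."""
    total = sum(f for _, f in classes)
    cumulative = 0
    for lower, f in classes:
        if cumulative + f >= total / 2:
            return lower + width * (total / 2 - cumulative) / f
        cumulative += f

def grouped_mode(classes, width):
    """L_Mo + C * d1 / (d1 + d2) for the class with the highest frequency."""
    freqs = [f for _, f in classes]
    i = freqs.index(max(freqs))
    before = freqs[i - 1] if i > 0 else 0
    after = freqs[i + 1] if i < len(freqs) - 1 else 0
    d1 = freqs[i] - before
    d2 = freqs[i] - after
    return classes[i][0] + width * d1 / (d1 + d2)

median = grouped_median(classes, C)
mode = grouped_mode(classes, C)
print(round(median, 2), mode)
```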

Grouped Data
2. Measures of Variation/Deviation
a. Mean Deviation
$MD = \frac{\sum f \lvert x - \bar{x} \rvert}{N}$

Sample Calculations
Class    f    x     |x - xbar|    f|x - xbar|
55-59    3    57    15.125        45.375
50-54    6    52    10.125        60.75
45-49    7    47    5.125         35.875
40-44    9    42    0.125         1.125
35-39    6    37    4.875         29.25
30-34    4    32    9.875         39.5
25-29    5    27    14.875        74.375
Total    N = 40                   286.25

$MD = \frac{286.25}{40} = 7.16$

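Grouped Data
b. Standard Deviation
$s = \sqrt{\frac{\sum f (x - \bar{x})^2}{N - 1}}$
$s = \sqrt{\frac{N \sum f x^2 - \left( \sum f x \right)^2}{N (N - 1)}}$
$s = C \sqrt{\frac{N \sum f u^2 - \left( \sum f u \right)^2}{N (N - 1)}}$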

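Grouped Data
c. Variance
$s^2 = \frac{\sum f (x - \bar{x})^2}{N - 1}$
$s^2 = \frac{N \sum f x^2 - \left( \sum f x \right)^2}{N (N - 1)}$
$s^2 = C^2 \left[ \frac{N \sum f u^2 - \left( \sum f u \right)^2}{N (N - 1)} \right]$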

Sample Calculations
Class    f    x     |x - xbar|    (x - xbar)^2    f(x - xbar)^2
55-59    3    57    15.125        228.765625      686.296875
50-54    6    52    10.125        102.515625      615.09375
45-49    7    47    5.125         26.265625       183.859375
40-44    9    42    0.125         0.015625        0.140625
35-39    6    37    4.875         23.765625       142.59375
30-34    4    32    9.875         97.515625       390.0625
25-29    5    27    14.875        221.265625      1106.328125
Total    N = 40                                   3124.375

$s^2 = \frac{3124.375}{39} = 80.11$
$s = \sqrt{80.11} = 8.95$
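
The mean deviation, variance, and standard deviation of the grouped speed data can be recomputed as a check. This sketch uses the first (definition) form and the second (computational) form of the variance formula, with N - 1 in the denominator as on the slides; the variable names are illustrative.

```python
import math

# (class mark, frequency) pairs from the speed example
classes = [(57, 3), (52, 6), (47, 7), (42, 9), (37, 6), (32, 4), (27, 5)]
N = sum(f for _, f in classes)
mean = sum(f * x for x, f in classes) / N

# Mean deviation: sum of f * |x - mean| over N
mean_deviation = sum(f * abs(x - mean) for x, f in classes) / N

# Definition form: sum of f * (x - mean)^2 over N - 1
variance = sum(f * (x - mean) ** 2 for x, f in classes) / (N - 1)
std_dev = math.sqrt(variance)

# Computational form: (N * sum(f x^2) - (sum(f x))^2) / (N (N - 1))
sum_fx = sum(f * x for x, f in classes)
sum_fx2 = sum(f * x * x for x, f in classes)
variance_alt = (N * sum_fx2 - sum_fx ** 2) / (N * (N - 1))

print(round(mean_deviation, 2), round(variance, 2), round(std_dev, 2))
```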

Grouped Data
d. Semi-interquartile Range (Q)
$Q = \frac{Q_3 - Q_1}{2}$
e. 10-90 Percentile Range
$P_{10-90} = P_{90} - P_{10}$

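Grouped Data
3. Measure of Position
From the median formula
$\tilde{x} = L_{Md} + C \left( \frac{\frac{N}{2} - f_{<}}{f_{Md}} \right)$
a. Quartiles
$Q_k = L_Q + C \left( \frac{\frac{kN}{4} - f_{<}}{f_Q} \right)$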

Grouped Data
b. Deciles
$D_k = L_D + C \left( \frac{\frac{kN}{10} - f_{<}}{f_D} \right)$
c. Percentiles
$P_k = L_P + C \left( \frac{\frac{kN}{100} - f_{<}}{f_P} \right)$

Sample Calculations
Compute for Q1, D7, P30.
Class    f    f<
55-59    3    40
50-54    6    37
45-49    7    31
40-44    9    24
35-39    6    15
30-34    4    9
25-29    5    5
Total    N = 40

Sample Calculations
Quartile 1:
k = 1, kN/4 = 10
Q1 class is 35-39
$L_Q$ = 34.5
$f_{<}$ = 9
$f_Q$ = 6
Therefore, $Q_1 = 34.5 + 5 \left( \frac{10 - 9}{6} \right) = 35.33$
That is, 25% of the data fall below the score 35.33.

Sample Calculations
Decile 7:
k = 7, kN/10 = 28
D7 class is 45-49
$L_D$ = 44.5
$f_{<}$ = 24
$f_D$ = 7
Therefore, $D_7 = 44.5 + 5 \left( \frac{28 - 24}{7} \right) = 47.36$
That is, 70% of the data fall below the score 47.36.

Sample Calculations
Percentile 30:
k = 30, kN/100 = 12
P30 class is 35-39
$L_P$ = 34.5
$f_{<}$ = 9
$f_P$ = 6
Therefore, $P_{30} = 34.5 + 5 \left( \frac{12 - 9}{6} \right) = 37.00$
That is, 30% of the data fall below the score 37.00.
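
Quartiles, deciles, and percentiles share one formula that differs only in the divisor (4, 10, or 100), so a single function can cover all three. A minimal sketch, with a hypothetical function name and the classes given in ascending order as (lower class boundary, frequency) pairs:

```python
# Classes in ascending order: (lower class boundary, frequency)
classes = [(24.5, 5), (29.5, 4), (34.5, 6), (39.5, 9),
           (44.5, 7), (49.5, 6), (54.5, 3)]

def grouped_quantile(classes, width, k, parts):
    """Locate the k-th of `parts` equal divisions (parts = 4, 10, or 100)."""
    total = sum(f for _, f in classes)
    target = k * total / parts          # kN/4, kN/10, or kN/100
    cumulative = 0                      # f< of the class being examined
    for lower, f in classes:
        if cumulative + f >= target:
            return lower + width * (target - cumulative) / f
        cumulative += f

q1 = grouped_quantile(classes, 5, 1, 4)
d7 = grouped_quantile(classes, 5, 7, 10)
p30 = grouped_quantile(classes, 5, 30, 100)
print(round(q1, 2), round(d7, 2), round(p30, 2))
```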

Position by Normalization
The z-score
$z = \frac{x - \bar{x}}{s}$
- transforms any observation of the variable x into a new observation of a random variable z with zero mean and unit variance.

Sample Calculations
Suppose that you are comparing your grade in Statistics against your grade in Accounting. Say your grade in Statistics is 89 and your grade in Accounting is only 85. If those are the only data available, we can say that your grade in Statistics is better than your grade in Accounting. However, if the mean grade that your teacher gave is 80 in Statistics and 78 in Accounting, and the standard deviations are 6 and 4, respectively, then we can no longer say outright that the grade in Statistics is better. This is where the z-score is useful.

Sample Calculations
Subject       Grade    Mean    Std. Deviation
Statistics    89       80      6
Accounting    85       78      4

Statistics: $z = \frac{89 - 80}{6} = 1.50$
Accounting: $z = \frac{85 - 78}{4} = 1.75$

The z-score of the grade in Accounting turned out to be higher than that of the grade in Statistics. If we plot both on a normal curve with the mean at the middle, the z-score of the grade in Accounting lies farther above the mean. Hence, relative to the rest of the class, the grade in Accounting is better than the grade in Statistics.
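
The comparison above reduces to two calls of one small function. The function name z_score is a hypothetical helper used only for this sketch.

```python
def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (or below) the mean."""
    return (x - mean) / std_dev

z_statistics = z_score(89, 80, 6)
z_accounting = z_score(85, 78, 4)

# The larger z-score marks the grade that stands higher relative to its class.
better = "Accounting" if z_accounting > z_statistics else "Statistics"
print(z_statistics, z_accounting, better)
```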

Graphical Representation
a. Bar Chart (Horizontal) or Histogram (Vertical)
Set of rectangles drawn using the class boundaries (x-axis for histogram, y-axis for bar chart) and the frequency (y-axis for histogram, x-axis for bar chart).

Graphical Representation
b. Frequency Polygon
Line plotting of the frequency (y-axis) versus the class marks (x-axis).
[Figure: frequency polygon; x-axis: class marks from 27 to 57, y-axis: frequency from 0 to 10]

Graphical Representation
c. Ogive
Plot of the class boundary (x-axis) against the cumulative (less than or greater than) frequency (y-axis).
[Figure: ogive; x-axis: class boundaries from 24.5 to 54.5, y-axis: cumulative frequency from 0 to 45]

Graphical Representation
d. Pie Chart: circular chart representing the percentage occurrence of each group in the sample.
[Figure: pie chart with slices 25-29: 13%, 30-34: 10%, 35-39: 15%, 40-44: 23%, 45-49: 17%, 50-54: 15%, 55-59: 7%]

Skewness
- Defined as the degree of asymmetry of a distribution, whereas kurtosis is the degree of peakedness exhibited by the distribution.
- The skewness coefficient (Sk) of the distribution can be determined by using the Pearsonian Coefficient of Skewness

Pearsonian Coefficient of Skewness
$Sk = \frac{3 (\bar{x} - \tilde{x})}{s}$
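Skewness
Symmetric (Sk = 0)
[Figure: symmetric distribution curve]

Skewness
Positively skewed (Sk > 0)
[Figure: distribution curve with a longer right tail]

Skewness
Negatively skewed (Sk < 0)
[Figure: distribution curve with a longer left tail]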

Kurtosis
$Kurtosis = \left[ \frac{n (n + 1)}{(n - 1)(n - 2)(n - 3)} \sum_{i=1}^{n} \left( \frac{x_i - \bar{x}}{s} \right)^4 \right] - \frac{3 (n - 1)^2}{(n - 2)(n - 3)}$
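
Both shape measures can be sketched with the standard statistics module. The function names are hypothetical and the ten data values are made up for illustration; s is the sample standard deviation (n - 1 denominator), and the kurtosis formula needs at least four observations.

```python
import statistics

def pearson_skewness(data):
    """Sk = 3 * (mean - median) / s."""
    mean = statistics.mean(data)
    median = statistics.median(data)
    return 3 * (mean - median) / statistics.stdev(data)

def sample_kurtosis(data):
    """Kurtosis as in the formula above; requires n >= 4."""
    n = len(data)
    mean = statistics.mean(data)
    s = statistics.stdev(data)
    fourth = sum(((x - mean) / s) ** 4 for x in data)
    lead = n * (n + 1) / ((n - 1) * (n - 2) * (n - 3))
    correction = 3 * (n - 1) ** 2 / ((n - 2) * (n - 3))
    return lead * fourth - correction

values = [3, 4, 5, 2, 3, 4, 5, 6, 4, 7]   # made-up sample
skewness = pearson_skewness(values)
kurtosis = sample_kurtosis(values)
print(round(skewness, 4), round(kurtosis, 4))
```

A positive Sk here means the mean lies above the median, which corresponds to the positively skewed case.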
