
ENGINEERING PROBABILITY AND STATISTICS

SYLLABUS FOR ENGINEERING PROBABILITY AND STATISTICS
Course Description:
This course focuses on the descriptive branch of statistics, which comprises data analysis and the organization of raw data into frequency tables, measures of central tendency and dispersion, an introduction to probability and counting techniques, probability laws, Bayes' rule, random variables, and discrete and continuous probability distributions and their applications to real-world settings.
COURSE OUTLINE:

PRELIM PERIOD
Definition of terms
Types of Statistics (Descriptive and Inferential statistics)
Levels of data measurement
Grouped and Ungrouped data
Measures of Central Tendency (Mean, Median and Mode)
Measures of Variability (Range, Variance and Standard Deviation)
Organizing data - Construction of a frequency table and its graphical representation (frequency histogram and polygon)
Measures of Position (Percentile, Decile and Quartile)
Shape of data (Skewness and Kurtosis)

MIDTERM
Probability concepts and theories
Events and Sample Space
Counting Rules
Tree Diagram
Venn Diagram
Addition Rule (mutually exclusive and not mutually exclusive events)
Multiplication Rule (dependent and independent events)
Conditional Probability
Bayes' Rule
Concept of random variables
- Discrete and continuous probability distributions

FINAL PERIOD
Random Variables and Mathematical Expectation
Special Discrete Probability Distributions
(binomial, multinomial, geometric, hypergeometric, negative binomial and Poisson distributions)
Special Continuous Probability Distributions
(uniform and normal distributions)
Z- and t-tests
Testing of Hypotheses
Quizzes: There are two quiz exams per period. The time limit on all exams is 1.5 hours for the T-Th class and 1 hour for the MWF class.
Assignment: There is one homework assignment (problem set) per period. Show all your calculations. You will receive credit for honest attempts to answer all the questions, even if your answers are incorrect. Homework that is sloppy or incomplete will not earn full credit.

Seatworks: There is a maximum of two seatworks per period. The problems are usually computational and cover material we have just covered in the assignment.

Departmental Exam: The departmental exam will serve as your comprehensive final exam. The time limit is 2 hours, and the schedule will be announced before final exam week. (No permit and no calculator, no examination.)
INTRODUCTION
STATISTICS is a collection of methods for planning
experiments, obtaining data, and then organizing,
summarizing, analyzing, interpreting, and drawing
conclusions based on the data.

DESCRIPTIVE STATISTICS are methods for organizing and summarizing data.
For example, tables or graphs are used to organize data,
and descriptive values such as the average score are
used to summarize data.

INFERENTIAL STATISTICS are those methods concerned with the analysis of a subset of data leading to predictions or inferences about the entire population (they allow us to make claims or draw conclusions about the population based on the sample data).
Introduction to Basic Terms

Statistics - The science of collecting, describing, and interpreting data.
Descriptive Statistics: collection, presentation, and description of sample data.
Inferential Statistics: making decisions and drawing conclusions about populations.
Variable - A characteristic about each individual element of a population or sample.
Data - The value of the variable associated with one element of a population or sample. This value may be a number, a word, or a symbol.
Population - All subjects possessing a common characteristic that is being studied. It consists of the totality of the observations with which we are concerned.

Sample - A subgroup or subset of the population.

Parameter - A characteristic or measure obtained from a population: a numerical value summarizing all the data of an entire population.

Statistic - A characteristic or measure obtained from a sample: a numerical value summarizing the sample data.
DATA
Statistical data are usually obtained by counting or measuring
items. Most data can be put into the following categories:
Qualitative data - measurements that each fall into one of several categories (e.g., hair color, ethnic group and other attributes of the population).
Quantitative data - observations that are measured on a numerical scale (e.g., distance traveled to college, number of children in a family).
Measuring Variables
To establish relationships between variables,
researchers must observe the variables and
record their observations. This requires that the
variables be measured.
The process of measuring a variable requires a
set of categories called a scale of
measurement and a process that classifies
each individual into one category.
Types of Measurement Scales
1. Nominal level of measurement is characterized by data that consist of names, labels or categories only; the data cannot be arranged in an ordering scheme.
Ex. - collection of yes, no, undecided responses to a survey
question.
- responses consisting of 10 nurses, 15 teachers,
16 engineers, 5 priests, 20 businessmen.

2. Ordinal level of measurement involves data that may be arranged in some order, but differences between data values either cannot be determined or are meaningless.

Ex. - In a sample of 24 car stereos, 15 were rated good, 6 were rated better, and 3 were rated best.
- In considering employee promotion, a manager ranked Myrna 3rd, Al 7th, and Jena 10th.
3. Interval level of measurement is like the ordinal level, with the additional property that meaningful amounts of difference between data values can be determined. However, there is no inherent zero starting point.
Ex. - room temperatures (in degrees Celsius) of the IE department.

4. Ratio level of measurement is the interval level modified to include the inherent zero starting point. For values at this level, both differences and ratios are meaningful.
Ex. - heights of pine trees along Session Road.
- temperature readings on the Kelvin scale, since the scale has an absolute zero.
Example: A college dean is interested in learning about the average age of faculty. Identify the basic terms in this situation.

The population is the age of all faculty members at the college.
A sample is any subset of that population. For example, we might select 10 faculty members and determine their ages.
The variable is the age of each faculty member.
One datum would be the age of a specific faculty member.
The experiment would be the method used to select the ages forming the sample and to determine the actual age of each faculty member in the sample.
The parameter of interest is the average age of all faculty at the college.
The statistic is the average age of all faculty in the sample.
Charts and Graphs
Frequency distributions are good ways to present the essential aspects of data collections in concise and understandable terms.
Pictures are often more effective in displaying large data collections.
Histogram
A graph that displays the data by using vertical bars of various heights to represent frequencies. The base of each bar extends from the lower class boundary to the upper class boundary.

Frequency Polygon
Constructed by plotting class frequencies against class marks and connecting the consecutive points by straight lines.
DESCRIPTIVE STATISTICS
Measures of Central Location

Any measure indicating the center of a set of data, arranged in increasing or decreasing order of magnitude, is called a measure of central location or measure of central tendency. The most commonly used measures of central location are the mean, median, and mode.
THE MEAN
The arithmetic mean or mean, of a set of measurements is the
sum of the measurements divided by the total number of
measurements.

For Ungrouped Data:

Let $x_1, x_2, x_3, \ldots, x_n$ be $n$ observations of a random variable $X$. The sample mean, denoted by $\bar{x}$ (x-bar), is the arithmetic average of these values. That is,

$$\bar{x} = \frac{x_1 + x_2 + x_3 + \cdots + x_n}{n} = \frac{\sum_{i=1}^{n} x_i}{n}$$

For Grouped Data:

$$\bar{x} = \frac{\sum f_i x_i}{\sum f_i}$$

where: $f_i$ is the frequency of class interval $i$
$x_i$ is the class midpoint of class interval $i$
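As a quick check of the two formulas above, here is a minimal Python sketch. The function names `ungrouped_mean` and `grouped_mean` are hypothetical helpers, not from any library, and the grouped version assumes the class marks $x_i$ have already been computed. The sample values reuse the reaction-time data and the final-examination frequency table that appear later in these notes.

```python
from typing import Sequence


def ungrouped_mean(values: Sequence[float]) -> float:
    """Sample mean: (x1 + x2 + ... + xn) / n."""
    if len(values) == 0:
        raise ValueError("need at least one observation")
    return sum(values) / len(values)


def grouped_mean(frequencies: Sequence[int], class_marks: Sequence[float]) -> float:
    """Grouped mean: sum(f_i * x_i) / sum(f_i), where x_i is the class midpoint."""
    if len(frequencies) != len(class_marks):
        raise ValueError("need one frequency per class mark")
    total = sum(frequencies)
    if total == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for f, x in zip(frequencies, class_marks)) / total


if __name__ == "__main__":
    reaction_times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]
    print(round(ungrouped_mean(reaction_times), 2))  # 3.3
    exam_freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
    exam_marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
    print(grouped_mean(exam_freqs, exam_marks))  # 66.0
```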
THE MEDIAN
For Ungrouped Data:

Let $x_1, x_2, x_3, \ldots, x_n$ be sample observations arranged in order from smallest to largest. The sample median for this collection is the middle observation if $n$ is odd. If $n$ is even, the sample median is the average of the two middle observations.

For Grouped Data:

When the data are grouped into a frequency distribution, the median is obtained by finding the class (cell) that contains the middle observation and then interpolating within that class:

$$\text{Median} = L_b + \left(\frac{n/2 - {<}cf_{i-1}}{f_i}\right) i \qquad \text{or} \qquad \text{Median} = U_b - \left(\frac{n/2 - {>}cf_{i+1}}{f_i}\right) i$$

where:
$L_b$ = lower class boundary of the interpolated interval
$U_b$ = upper class boundary of the interpolated interval
${<}cf_{i-1}$ = less-than cumulative frequency of the class just below the interpolated interval
${>}cf_{i+1}$ = greater-than cumulative frequency of the class just above the interpolated interval (the number of observations above $U_b$)
$f_i$ = frequency of the interpolated interval
$i$ = class size
$n$ = number of data points
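The median rules above can be sketched in Python as follows. `ungrouped_median` and `grouped_median` are illustrative names, and the grouped version implements only the $L_b$ form of the interpolation, assuming the classes are listed from lowest to highest and the caller supplies their lower class boundaries.

```python
from typing import Sequence


def ungrouped_median(values: Sequence[float]) -> float:
    """Middle observation of the ranked data (average of the two middle ones if n is even)."""
    if len(values) == 0:
        raise ValueError("need at least one observation")
    ranked = sorted(values)
    n = len(ranked)
    mid = n // 2
    if n % 2 == 1:
        return ranked[mid]
    return (ranked[mid - 1] + ranked[mid]) / 2


def grouped_median(lower_boundaries: Sequence[float], frequencies: Sequence[int],
                   class_size: float) -> float:
    """Median = Lb + ((n/2 - <cf_(i-1)) / f_i) * i, interpolated in the class holding the n/2-th value."""
    half = sum(frequencies) / 2
    cf_before = 0  # less-than cumulative frequency of the classes below the current one
    for lb, f in zip(lower_boundaries, frequencies):
        if f > 0 and cf_before + f >= half:
            return lb + (half - cf_before) / f * class_size
        cf_before += f
    raise ValueError("frequencies must have a positive total")


if __name__ == "__main__":
    print(ungrouped_median([2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]))  # 3.1
    boundaries = [9.5, 19.5, 29.5, 39.5, 49.5, 59.5, 69.5, 79.5, 89.5]
    freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
    print(round(grouped_median(boundaries, freqs, 10), 2))  # 70.93
```

With the final-examination table used later in these notes, the grouped version reproduces the interpolated median of about 70.93.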
THE MODE
The last measure of central tendency is the mode: the value that is observed most frequently. The mode is undefined for sequences in which no observation is repeated.

The drawback of this measure is that there might not be a unique mode; there might be no single number that occurs more often than any other. For this reason, the mode is not a particularly useful descriptive measure.

When the data are grouped into a frequency distribution, the midpoint of the class with the highest frequency is the mode, since this point represents the highest point (greatest frequency).
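A small sketch of both cases, assuming Python 3.8 or later for `statistics.multimode` from the standard library. The wrapper names `ungrouped_modes` and `grouped_modes` are hypothetical; returning a list rather than a single number is a design choice so that ties (several modes) and the undefined case (an empty list) are both visible.

```python
from statistics import multimode
from typing import List, Sequence


def ungrouped_modes(values: Sequence[float]) -> List[float]:
    """All most frequent values; an empty list when no value repeats (mode undefined)."""
    if len(set(values)) == len(values):
        return []
    return multimode(values)


def grouped_modes(class_marks: Sequence[float], frequencies: Sequence[int]) -> List[float]:
    """Class marks of every class that ties for the highest frequency."""
    if len(frequencies) == 0:
        raise ValueError("need at least one class")
    top = max(frequencies)
    return [x for x, f in zip(class_marks, frequencies) if f == top]


if __name__ == "__main__":
    print(ungrouped_modes([1, 2, 2, 3, 3]))  # [2, 3]
    print(ungrouped_modes([1, 2, 3]))        # []
    marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
    freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
    print(grouped_modes(marks, freqs))       # [74.5, 84.5]
```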
Data that are presented in the form of a frequency distribution are called grouped data.

A set of measurements that has not been organized numerically is called raw data.

Frequency Distribution - The organization of raw data in table form with classes and frequencies: an arrangement of a large mass of data by grouping it into different classes of the same size and determining the number of observations that fall in each class.
Ungrouped Frequency Distribution - A frequency distribution of numerical data in which the raw data are not grouped.
Grouped Frequency Distribution - A frequency distribution in which several numbers are grouped into one class.

Class Frequency - The number of observations falling in a particular class (denoted by the letter f).
The relative frequency of a class is defined as the frequency of the class divided by the total number of measurements.

Class Boundaries - Separate one class in a grouped frequency distribution from another. The boundaries have one more decimal place than the raw data and therefore do not appear in the data. There is no gap between the upper boundary of one class and the lower boundary of the next class. The lower class boundary is found by subtracting half a unit of the last recorded digit from the lower class limit, and the upper class boundary is found by adding the same amount to the upper class limit (0.5 for whole-number data, 0.05 for data recorded to one decimal place, and so on).

Class Width / Size - The difference between the upper and lower class boundaries of any class. The class width is also the difference between the lower limits of two consecutive classes or the upper limits of two consecutive classes. It is not the difference between the upper and lower limits of the same class.
Class Limits - The smallest and the largest values that can
fall in a given class interval.
Lower Class Limit - The smallest value in a class interval
Upper Class Limit - The largest value in a class interval

Class Mark (Midpoint) - The number in the middle of the class. It is found by adding the upper and lower limits and dividing by two. It can also be found by adding the upper and lower boundaries and dividing by two.

The total frequency of all values less than the upper class boundary of a given interval (that is, the sum of the frequencies of all classes up to and including that interval) is called the cumulative frequency.
Example:
One hundred families were chosen at random, and their yearly income was recorded.

Income of 100 families

Income (in thousands)    No. of families
10 - 14                         3
15 - 19                        12
20 - 24                        19
25 - 29                        20
30 - 34                        23
35 - 39                        18
40 - 44                         5
Total                         100
In this frequency distribution table:
- 10 and 14 are called class limits; 14 is the upper limit and 10 is the lower limit of the first class.
- Each row, such as 10 - 14, is a class interval, and the number of families in that row is its class frequency.
- The class width is 5 (for the first class, 14.5 - 9.5 = 5).

The difference between the upper and lower class boundaries is called the class width or class size.
1. Find the class boundaries, class marks and class widths for the following intervals.

7 - 13: Class boundaries = 6.5 - 13.5 (subtract 0.5 from the lower limit, add 0.5 to the upper limit)
Class mark = 10
Class width = 7

(-5) - (-1): Class boundaries = (-5.5) - (-0.5)
Class mark = -3
Class width = 5

10.4 - 18.7: Class boundaries = 10.35 - 18.75
Class mark = 14.55
Class width = 8.4

0.346 - 0.418: Class boundaries = 0.3455 - 0.4185
Class mark = 0.382
Class width = 0.073
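The same boundary, class-mark and width arithmetic can be written as a short Python function. `class_summary` is a hypothetical name, and the `decimals` argument (how many decimal places the raw data were recorded to) is an assumption the caller must supply; the function rounds its results to one extra decimal place to suppress floating-point noise.

```python
from typing import Tuple


def class_summary(lower: float, upper: float, decimals: int) -> Tuple[float, float, float, float]:
    """Return (lower boundary, upper boundary, class mark, class width) for one class.

    `decimals` is the number of decimal places in the raw data (supplied by the caller);
    boundaries carry one extra decimal place.
    """
    half_unit = 0.5 * 10 ** (-decimals)  # 0.5 for whole numbers, 0.05 for one decimal, ...
    places = decimals + 1
    lower_boundary = round(lower - half_unit, places)
    upper_boundary = round(upper + half_unit, places)
    class_mark = round((lower + upper) / 2, places)
    class_width = round(upper_boundary - lower_boundary, places)
    return lower_boundary, upper_boundary, class_mark, class_width


if __name__ == "__main__":
    # The four intervals from the exercise above, with their recorded decimal places.
    for lo, hi, d in [(7, 13, 0), (-5, -1, 0), (10.4, 18.7, 1), (0.346, 0.418, 3)]:
        print((lo, hi), class_summary(lo, hi, d))
```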
The weight of 50 men is depicted in the table below in the form of a frequency distribution.

Weight      Freq.   Boundaries       Class mark   R. Freq.   C. Freq.
115 - 121     2     114.5 - 121.5       118         .04          2
122 - 128     3     121.5 - 128.5       125         .06          5
129 - 135    13     128.5 - 135.5       132         .26         18
136 - 142    15     135.5 - 142.5       139         .30         33
143 - 149     9     142.5 - 149.5       146         .18         42
150 - 156     5     149.5 - 156.5       153         .10         47
157 - 163     3     156.5 - 163.5       160         .06         50

Class boundaries = 115 - 0.5 = 114.5 and 121 + 0.5 = 121.5

Class mark = (114.5 + 121.5) / 2 = 236 / 2 = 118

Relative frequency = 2 / 50 = .04

Cumulative frequency = 0 + 2 = 2 and 2 + 3 = 5
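The derived columns of the weight table can be generated from the class limits and frequencies alone. This sketch assumes whole-number class limits (hence the fixed plus or minus 0.5 for the boundaries); `ClassRow` and `frequency_table` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ClassRow:
    limits: Tuple[int, int]
    frequency: int
    boundaries: Tuple[float, float]
    class_mark: float
    relative_frequency: float
    cumulative_frequency: int


def frequency_table(limits: Sequence[Tuple[int, int]],
                    frequencies: Sequence[int]) -> List[ClassRow]:
    """Fill in boundaries, class marks, relative and cumulative frequencies."""
    if len(limits) != len(frequencies):
        raise ValueError("need one frequency per class")
    n = sum(frequencies)
    if n == 0:
        raise ValueError("total frequency must be positive")
    rows: List[ClassRow] = []
    cumulative = 0
    for (lower, upper), f in zip(limits, frequencies):
        cumulative += f  # running total up to and including this class
        rows.append(ClassRow(
            limits=(lower, upper),
            frequency=f,
            boundaries=(lower - 0.5, upper + 0.5),  # whole-number data: subtract/add 0.5
            class_mark=(lower + upper) / 2,
            relative_frequency=f / n,
            cumulative_frequency=cumulative,
        ))
    return rows


if __name__ == "__main__":
    weight_limits = [(115, 121), (122, 128), (129, 135), (136, 142),
                     (143, 149), (150, 156), (157, 163)]
    weight_freqs = [2, 3, 13, 15, 9, 5, 3]
    for row in frequency_table(weight_limits, weight_freqs):
        print(row)
```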



Compute the mean, median and mode for this table.


The frequency table represents the final examination scores for a statistics course. Find the mean, the median, and the mode.

Class Interval   Frequency   Class mark   Cumulative Frequency
10 - 19              3          14.5               3
20 - 29              2          24.5               5
30 - 39              3          34.5               8
40 - 49              4          44.5              12
50 - 59              5          54.5              17
60 - 69             11          64.5              28
70 - 79             14          74.5              42
80 - 89             14          84.5              56
90 - 99              4          94.5              60

$$\text{Mean} = \frac{\sum f_i x_i}{\sum f_i}$$

$$\text{Mean} = \frac{(3)(14.5) + (2)(24.5) + (3)(34.5) + (4)(44.5) + (5)(54.5) + (11)(64.5) + (14)(74.5) + (14)(84.5) + (4)(94.5)}{3 + 2 + 3 + 4 + 5 + 11 + 14 + 14 + 4} = \frac{3960}{60} = 66$$

The middle observation is the $n/2 = 30$th, which falls in the class 70 - 79 (cumulative frequency 28 before it and 42 through it), so $L_b = 69.5$, ${<}cf_{i-1} = 28$, $f_i = 14$ and $i = 10$:

$$\text{Median} = L_b + \left(\frac{n/2 - {<}cf_{i-1}}{f_i}\right) i = 69.5 + \left(\frac{60/2 - 28}{14}\right)(10) = 70.93$$

Mode = class mark with the highest frequency
Mode = 74.5 and 84.5 (the classes 70 - 79 and 80 - 89 tie for the highest frequency, 14, so the data are bimodal)
Guidelines for Constructing Class Frequency Distributions
1. Count the number of data points in the set of data.
2. Determine the range, R (highest value - lowest value).
3. Decide on the number of class intervals (ideally 5 to 15). To approximate the appropriate number of class intervals, use Herbert Sturges' formula: $K = 1 + 3.322 \log_{10} n$
Where K = suggested number of classes
n = total frequency
4. Determine the class width:
Class width (i) = range / number of classes, rounded up
5. Select the lower limit of the first class (the lowest score or a convenient number below it).
6. Add the class width to the starting point to get the second lower class limit. Add the class width to the second lower class limit to get the third, and so on.
7. List the lower class limits in a vertical column, and enter the upper class limits.
8. Represent each score by a tally in the appropriate class.
9. Get the total frequency count for each class.
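Steps 2 to 9 can be sketched in Python for whole-number data. The helper names are hypothetical; `sturges_classes` only suggests K, so the number of classes actually used is left to the caller, and the builder raises an error if the chosen start, width and number of classes fail to cover every observation.

```python
import math
from typing import List, Sequence, Tuple


def sturges_classes(n: int) -> float:
    """Herbert Sturges' formula K = 1 + 3.322 log10(n); round K to a convenient whole number."""
    if n < 1:
        raise ValueError("n must be positive")
    return 1 + 3.322 * math.log10(n)


def class_width(data: Sequence[float], k: int) -> int:
    """Step 4: range / number of classes, rounded up."""
    return math.ceil((max(data) - min(data)) / k)


def build_classes(data: Sequence[int], start: int, width: int,
                  k: int) -> List[Tuple[int, int, int]]:
    """Steps 5-9 for whole-number data: (lower limit, upper limit, frequency) per class."""
    classes = []
    for j in range(k):
        lower = start + j * width          # each lower limit is the previous one plus the width
        upper = lower + width - 1          # e.g. 26 - 29 when the width is 4
        count = sum(1 for x in data if lower <= x <= upper)  # tally
        classes.append((lower, upper, count))
    if sum(c for _, _, c in classes) != len(data):
        raise ValueError("classes do not cover all data; adjust start, width or k")
    return classes


if __name__ == "__main__":
    sulfur = [27, 32, 28, 32, 31, 35, 28, 44, 45, 36, 33, 40, 41, 36, 35,
              39, 37, 39, 37, 44, 41, 41, 35, 35, 33]
    print(round(sturges_classes(len(sulfur)), 2))  # 5.64
    print(class_width(sulfur, 5))                  # 4
    print(build_classes(sulfur, 26, 4, 5))
    # [(26, 29, 3), (30, 33, 5), (34, 37, 8), (38, 41, 6), (42, 45, 3)]
```

For the 25 sulfur readings below, Sturges' formula suggests about 5.6 classes; the table there uses 5 classes of width 4 starting at 26, which this sketch reproduces.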
The 25 measurements given below represent the sulfur level in the air for a sample of 25 days. The unit used is parts per million.

27 32 28 32 31
35 28 44 45 36
33 40 41 36 35
39 37 39 37 44
41 41 35 35 33

a. Make a frequency distribution table.
b. Construct a histogram.
c. Compute the relative frequencies.

Class Interval   Frequency   Relative freq.
26 - 29              3            .12
30 - 33              5            .20
34 - 37              8            .32
38 - 41              6            .24
42 - 45              3            .12
The following scores represent the final examination grades for an elementary statistics course:

23 60 79 32 57 74 52 70 82 36
80 77 81 95 41 65 92 85 55 76
52 10 64 75 78 25 80 98 81 67
41 71 83 54 64 72 88 62 74 43
60 78 89 76 84 48 84 90 15 79
34 67 17 82 69 74 63 80 85 61

Using 9 intervals with the lowest starting at 10,
a. Set up a frequency distribution table.
b. Construct a cumulative frequency distribution.
MEASURES OF VARIABILITY - refers to the extent of scatter or
dispersion around the zone of central tendency
A. RANGE
One measure of variation is the range, which has the advantage
of being very easy to compute. The range, R, of a set of n
measurements is defined as the difference between the largest and
smallest measurements.

Formula:
Range = Highest score - Lowest score, or R = H - L

B. VARIANCE and STANDARD DEVIATION

The variance of a population of N measurements is defined to be the average of the squares of the deviations of the measurements about their mean $\mu$. The population variance is denoted by $\sigma^2$ and is given by the formula

$$\sigma^2 = \frac{\sum (x - \mu)^2}{N} \quad \text{for ungrouped data}$$

$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N} \quad \text{for grouped data, where } N = \sum f$$

The variance of a sample of n measurements is defined to be the sum of the squared deviations of the measurements about their mean $\bar{x}$ divided by $(n - 1)$. The sample variance is denoted by $s^2$ and is given by the formula

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} \quad \text{for ungrouped data}$$

$$s^2 = \frac{\sum f(x - \bar{x})^2}{n - 1} \quad \text{for grouped data, where } n = \sum f$$

The standard deviation, in essence, represents the average amount of variability in a set of measures, using the mean as a reference point. Strictly speaking, the standard deviation is the positive square root of the variance, that is, of the average of the squared deviations about the mean ($\sigma = \sqrt{\sigma^2}$ for a population and $s = \sqrt{s^2}$ for a sample). The standard deviation is basically a measure of how far each score, on average, is from the mean.
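A minimal Python sketch of the range, variance and standard deviation formulas, assuming the data are already in lists. The function names are hypothetical; for ungrouped data, Python's standard `statistics.variance` (divisor n - 1) and `statistics.pvariance` (divisor N) can serve as a cross-check.

```python
import math
from typing import Sequence


def data_range(values: Sequence[float]) -> float:
    """R = highest value - lowest value."""
    return max(values) - min(values)


def ungrouped_variance(values: Sequence[float], sample: bool = True) -> float:
    """Sum of squared deviations divided by n - 1 (sample) or N (population)."""
    n = len(values)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(values) / n
    ss = sum((x - mean) ** 2 for x in values)  # sum of squared deviations
    return ss / (n - 1) if sample else ss / n


def grouped_variance(frequencies: Sequence[int], class_marks: Sequence[float],
                     sample: bool = False) -> float:
    """Same idea with class marks weighted by their frequencies; n (or N) is sum(f)."""
    n = sum(frequencies)
    if n == 0 or (sample and n < 2):
        raise ValueError("not enough observations")
    mean = sum(f * x for f, x in zip(frequencies, class_marks)) / n
    ss = sum(f * (x - mean) ** 2 for f, x in zip(frequencies, class_marks))
    return ss / (n - 1) if sample else ss / n


if __name__ == "__main__":
    times = [2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1, 4.3]
    s2 = ungrouped_variance(times)
    print(round(data_range(times), 2), round(s2, 4), round(math.sqrt(s2), 2))  # 2.0 0.6325 0.8
    freqs = [3, 2, 3, 4, 5, 11, 14, 14, 4]
    marks = [14.5, 24.5, 34.5, 44.5, 54.5, 64.5, 74.5, 84.5, 94.5]
    var_pop = grouped_variance(freqs, marks)
    print(var_pop, round(math.sqrt(var_pop), 2))  # 432.75 20.8
```

The printed values correspond to the reaction-time example and the grouped final-examination example that follow.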
1. The reaction times for a random sample of 9 subjects to a stimulant were recorded as 2.5, 3.6, 3.1, 4.3, 2.9, 2.3, 2.6, 4.1 and 4.3 seconds. Calculate the range, variance and standard deviation.
Range = HV - LV
= 4.3 - 2.3 = 2

$\bar{x} = \frac{29.7}{9} = 3.3$

$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$

$$= \frac{(2.5-3.3)^2 + (3.6-3.3)^2 + (3.1-3.3)^2 + (4.3-3.3)^2 + (2.9-3.3)^2 + (2.3-3.3)^2 + (2.6-3.3)^2 + (4.1-3.3)^2 + (4.3-3.3)^2}{9 - 1}$$

$$= \frac{5.06}{8} = 0.6325 \text{ (sample variance)}$$

$s = \sqrt{0.6325} = 0.795298686 \approx 0.80$ (sample standard deviation)
The frequency table below represents the final examination scores for a statistics course. Find the population range, population variance and population standard deviation.

Class Interval   Frequency   Class mark   Cumulative Frequency
10 - 19              3          14.5               3
20 - 29              2          24.5               5
30 - 39              3          34.5               8
40 - 49              4          44.5              12
50 - 59              5          54.5              17
60 - 69             11          64.5              28
70 - 79             14          74.5              42
80 - 89             14          84.5              56
90 - 99              4          94.5              60

Range = Highest upper class boundary - Smallest lower class boundary
= 99.5 - 9.5
= 90

Using $\mu = 66$ (the mean computed for this table earlier):

$$\sigma^2 = \frac{\sum f(x - \mu)^2}{N}$$

$$= \frac{3(14.5-66)^2 + 2(24.5-66)^2 + 3(34.5-66)^2 + 4(44.5-66)^2 + 5(54.5-66)^2 + 11(64.5-66)^2 + 14(74.5-66)^2 + 14(84.5-66)^2 + 4(94.5-66)^2}{60}$$

$$= \frac{25965}{60} = 432.75$$

$\sigma = \sqrt{432.75} = 20.80264406 \approx 20.80$
Measures of Shape
- refer to the visual characteristics of a certain distribution.
- Knowledge of the shape of the distribution can help in concluding whether the distribution is normal or not.

Two (2) Principal Measures of Shape:
SKEWNESS
KURTOSIS
Skewness refers to the symmetry of a distribution. A distribution which is not symmetric with respect to its mean can be termed either positively skewed or negatively skewed.

Kurtosis refers to the flatness or peakedness of a particular distribution.
SKEWNESS
- degree of symmetry

$$SK = \frac{\sum (X_i - \bar{X})^3}{n s^3}$$

where: $X_i$ - individual reading
$s$ - standard deviation
$\bar{X}$ - sample mean
$n$ - sample size

Grouped data, population: $SK = \frac{\sum f_i (x_i - \mu)^3}{(\sum f_i)\,\sigma^3}$
Grouped data, sample: $SK = \frac{\sum f_i (x_i - \bar{x})^3}{(\sum f_i)\, s^3}$
Ungrouped data, population: $SK = \frac{\sum (x_i - \mu)^3}{N \sigma^3}$
Ungrouped data, sample: $SK = \frac{\sum (x_i - \bar{x})^3}{n s^3}$

SK > 0: positively skewed
SK = 0: symmetric (e.g., normal)
SK < 0: negatively skewed
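The sample skewness formula can be sketched as below. `sample_skewness` and `describe_skew` are hypothetical names; the code follows the formula in these notes, with the n - 1 standard deviation in the denominator. Statistical software may use a different normalization, so its reported skewness can differ slightly.

```python
import math
from typing import Sequence


def sample_skewness(values: Sequence[float]) -> float:
    """SK = sum((x - xbar)^3) / (n * s^3), where s uses the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in values) / (n - 1))
    if s == 0:
        raise ValueError("skewness is undefined when all values are equal")
    return sum((x - mean) ** 3 for x in values) / (n * s ** 3)


def describe_skew(sk: float, tol: float = 1e-9) -> str:
    """Sign rule: > 0 positively skewed, < 0 negatively skewed, otherwise symmetric."""
    if sk > tol:
        return "positively skewed"
    if sk < -tol:
        return "negatively skewed"
    return "symmetric"


if __name__ == "__main__":
    for data in ([1, 2, 3], [2, 2, 3, 10], [1, 8, 8, 9]):
        sk = sample_skewness(data)
        print(data, round(sk, 3), describe_skew(sk))
```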
KURTOSIS
- flatness or peakedness of a distribution

$$K = \frac{\sum (X_i - \bar{X})^4}{n s^4}$$

where: $X_i$ - individual reading
$s$ - standard deviation
$\bar{X}$ - sample mean
$n$ - sample size

Grouped data, population: $K = \frac{\sum f_i (x_i - \mu)^4}{(\sum f_i)\,\sigma^4}$
Grouped data, sample: $K = \frac{\sum f_i (x_i - \bar{x})^4}{(\sum f_i)\, s^4}$
Ungrouped data, population: $K = \frac{\sum (x_i - \mu)^4}{N \sigma^4}$
Ungrouped data, sample: $K = \frac{\sum (x_i - \bar{x})^4}{n s^4}$

K = 3: mesokurtic (like the normal distribution)
K > 3: leptokurtic
K < 3: platykurtic
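Likewise for kurtosis, a sketch under the same assumptions (hypothetical names, the n - 1 standard deviation as in the formula above). Some packages report excess kurtosis, K - 3, instead, so their output should be compared against 0 rather than 3.

```python
from typing import Sequence


def sample_kurtosis(values: Sequence[float]) -> float:
    """K = sum((x - xbar)^4) / (n * s^4), where s uses the n - 1 divisor."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    s2 = sum((x - mean) ** 2 for x in values) / (n - 1)  # sample variance s^2
    if s2 == 0:
        raise ValueError("kurtosis is undefined when all values are equal")
    return sum((x - mean) ** 4 for x in values) / (n * s2 ** 2)  # s^4 = (s^2)^2


def describe_kurtosis(k: float, tol: float = 1e-9) -> str:
    """Compare with 3: above is leptokurtic, below is platykurtic."""
    if k > 3 + tol:
        return "leptokurtic"
    if k < 3 - tol:
        return "platykurtic"
    return "mesokurtic"


if __name__ == "__main__":
    flat = [1, 2, 3, 4, 5]
    peaked = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]
    for data in (flat, peaked):
        k = sample_kurtosis(data)
        print(round(k, 3), describe_kurtosis(k))
```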
Measures of Position:
PERCENTILES, DECILES AND QUARTILES
Measures of position are used to describe the location of a particular observation in relation to the rest of the data set.

Percentiles - values that divide the ranked data set into 100 equal parts.

The pth percentile is the value that separates the bottom p% of the ranked scores from the top (100 - p)%.

Quartiles - values that divide the ranked data set into four equal parts. There are three quartiles, denoted by Q1, Q2 and Q3.
Deciles - values that divide the ranked data set into ten equal parts. There are nine deciles, denoted by D1, D2, ..., D9, which partition the data into 10 groups with about 10% of the data in each group.
The table below gives the ages of commercial aircraft randomly selected
from several airlines.

2 7 11 15 19
2 7 11 15 19
2 7 12 15 20
2 7 12 15 20
4 7 12 15 20
4 10 14 15 22
4 10 14 16 24
4 10 14 16 25
5 10 14 17 25
5 10 15 17 27

Find the percentiles for the ages 10, 15, and 20: 10 = P30, 15 = P58, 20 = P84.
Find P90, D8, and Q3.
The percentile for observation x is found by dividing the number of observations less than x by the total number of observations, multiplying this quantity by 100, and rounding to the nearest whole number. For example, 15 of the 50 aircraft ages are less than 10, so 10 is at the (15/50)(100) = 30th percentile.
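The percentile-of-an-observation rule just described, as a short Python sketch. `percentile_rank` is a hypothetical name, and rounding halves upward is an assumption, since the notes do not say how a value exactly halfway between two whole numbers should be rounded.

```python
import math
from typing import Sequence


def percentile_rank(data: Sequence[float], x: float) -> int:
    """Percentile of observation x: (count of values less than x) / n * 100, rounded."""
    if len(data) == 0:
        raise ValueError("need at least one observation")
    below = sum(1 for v in data if v < x)
    # Multiply before dividing to limit float error; round half up (an assumption).
    return math.floor(below * 100 / len(data) + 0.5)


if __name__ == "__main__":
    # Aircraft ages from the table above, read row by row.
    ages = [2, 7, 11, 15, 19, 2, 7, 11, 15, 19, 2, 7, 12, 15, 20, 2, 7, 12, 15, 20,
            4, 7, 12, 15, 20, 4, 10, 14, 15, 22, 4, 10, 14, 16, 24, 4, 10, 14, 16, 25,
            5, 10, 14, 17, 25, 5, 10, 15, 17, 27]
    print(percentile_rank(ages, 10), percentile_rank(ages, 15), percentile_rank(ages, 20))  # 30 58 84
```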

The pth percentile for a ranked data set is found by computing the index

$$i = \frac{n \cdot p}{100}$$

If $i$ is not an integer, the next integer greater than $i$ locates the position of the pth percentile in the ranked data set. If $i$ is an integer, the pth percentile is the average of the observations in positions $i$ and $i + 1$ in the ranked data set.
Deciles and quartiles are determined in the same manner as percentiles, since they may be expressed as percentiles. The deciles are represented as D1, D2, D3, ..., D9 and the quartiles as Q1, Q2 and Q3:
D1 = P10, D3 = P30, D7 = P70, Q1 = P25, Q2 = P50, Q3 = P75

Median = P50 = D5 = Q2

The interquartile range (IQR) shows the spread of the middle 50% of the data and is not affected by the extremes in the data set.

IQR = Q3 - Q1
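The index method, together with the decile, quartile and IQR relations above, can be sketched as follows. All names are hypothetical. These position rules are the ones in these notes; library routines such as Python's `statistics.quantiles` use other interpolation conventions and may give different values on the same data.

```python
import math
from typing import Sequence


def percentile(data: Sequence[float], p: float) -> float:
    """pth percentile via the index i = n * p / 100 on the ranked data (1-based positions)."""
    if len(data) == 0 or not 0 < p < 100:
        raise ValueError("need data and 0 < p < 100")
    ranked = sorted(data)  # raw data must be ranked first
    i = len(ranked) * p / 100
    if not float(i).is_integer():
        return ranked[math.ceil(i) - 1]  # next integer greater than i
    i = int(i)
    return (ranked[i - 1] + ranked[i]) / 2  # average of positions i and i + 1


def decile(data: Sequence[float], k: int) -> float:
    return percentile(data, 10 * k)  # D_k = P_(10k)


def quartile(data: Sequence[float], k: int) -> float:
    return percentile(data, 25 * k)  # Q_k = P_(25k)


def iqr(data: Sequence[float]) -> float:
    return quartile(data, 3) - quartile(data, 1)


if __name__ == "__main__":
    # The 45 observations in the next example, read row by row.
    data = [3.0, 5.0, 6.2, 7.6, 9.4, 3.3, 5.2, 6.3, 7.6, 9.5, 3.5, 5.5, 6.4, 7.7, 9.5,
            3.5, 5.5, 6.6, 7.8, 10.0, 3.6, 5.5, 6.6, 7.8, 10.5, 4.0, 5.8, 6.8, 8.5, 10.8,
            4.0, 5.8, 6.8, 8.5, 10.9, 4.2, 5.9, 6.8, 8.8, 11.0, 4.6, 6.0, 7.0, 8.8, 11.0]
    print(percentile(data, 10))  # 3.6
```

For the 45 observations in the next example, `percentile(data, 10)` returns 3.6, matching the hand computation.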
RAW DATA NEED TO BE RANKED PRIOR TO
FINDING MEASURES OF POSITION.
3.0 5.0 6.2 7.6 9.4
3.3 5.2 6.3 7.6 9.5
3.5 5.5 6.4 7.7 9.5
3.5 5.5 6.6 7.8 10.0
3.6 5.5 6.6 7.8 10.5
4.0 5.8 6.8 8.5 10.8
4.0 5.8 6.8 8.5 10.9
4.2 5.9 6.8 8.8 11.0
4.6 6.0 7.0 8.8 11.0

To find the tenth percentile for the data above (n = 45, ranked down each column), compute i = (45)(10) / 100 = 4.5. The next integer greater than 4.5 is 5. The observation in the fifth position of the ranked data is 3.6; therefore P10 = 3.6. Note that at least 10% of the data are 3.6 or less and at least 90% of the data are 3.6 or more.
