Business Analytics
for Engineers
Normal Distribution
© 2018 C. Gangatharan – VIT Dec 11, 2018 – Tue MGT1051 – Business Analytics for Engineers
Data Distribution
• Data can be “distributed” (spread out) in different ways
What is a Normal (Gaussian) Distribution?
Types of Distribution
• Frequency Distribution
• Normal (Gaussian) Distribution
• Probability Distribution
• Poisson Distribution
• Binomial Distribution
• Sampling Distribution
• t distribution
• F distribution
A Bell Curve
What are some examples of things that
follow a Normal Distribution?
• Heights of people
• Size of things produced by machines
• Errors in measurements
• Blood Pressure
• Test Scores
Standard Normal Distribution
• Mean = median = mode
• Symmetric about the center
• 50% of the values are less than the mean and 50% are greater than the mean
Characteristics of Normal Distribution
The Standard Deviation
• 68% of values are within 1 standard deviation of the mean
• 95% of values are within 2 standard deviations of the mean
• 99.7% of values are within 3 standard deviations of the mean
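The three percentages above can be checked numerically. The following is a minimal sketch, assuming the data follow a normal distribution and using only Python's standard library; the helper name `prob_within` is illustrative and does not come from the slides.

```python
import math

def prob_within(k):
    """Probability that a normal value lies within k standard deviations of the mean."""
    # For any normal distribution, P(|X - mean| <= k * sd) = erf(k / sqrt(2)).
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} SD: {prob_within(k):.1%}")
```

The printed values come out close to the rounded figures of 68%, 95%, and 99.7% quoted on the slide.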
Why do we need to know
Standard Deviation?
Any value is:
• likely to be within 1 standard deviation of the mean (about 68% of values)
• very likely to be within 2 standard deviations (about 95%)
• almost certainly within 3 standard deviations (about 99.7%)
How good is the rule for real data?
68% of 120 = 0.68 × 120 ≈ 82 runners
In fact, 79 runners fall within 1 SD (15.5 lbs) of the mean.
[Figure: histogram of runner weights; x-axis: pounds (80 to 160), y-axis: percent (0 to 25)]
95% of 120 = 0.95 × 120 = 114 runners
In fact, 115 runners fall within 2 SDs of the mean.
[Figure: histogram of runner weights; x-axis: pounds (80 to 160), y-axis: percent (0 to 25)]
99.7% of 120 = 0.997 × 120 ≈ 119.6 runners
In fact, all 120 runners fall within 3 SDs of the mean.
[Figure: histogram of runner weights; x-axis: pounds (80 to 160), y-axis: percent (0 to 25)]
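The same comparison can be run on any sample. The sketch below assumes the weights are available as a plain list. The runners' raw weights are not reproduced in these slides, so the example uses a synthetic sample drawn with `random.gauss`; the mean of 125 lbs is purely hypothetical, the SD of 15.5 lbs is the value quoted on the slide, and the helper name `count_within` is illustrative.

```python
import random
import statistics

def count_within(values, k):
    """Count how many values lie within k standard deviations of the sample mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation
    return sum(1 for v in values if abs(v - mean) <= k * sd)

# Synthetic stand-in for the 120 runners (hypothetical mean; SD taken from the slide).
random.seed(0)
weights = [random.gauss(125, 15.5) for _ in range(120)]

for k, share in ((1, 0.68), (2, 0.95), (3, 0.997)):
    predicted = share * len(weights)
    print(f"{k} SD: rule predicts about {predicted:.1f}, sample has {count_within(weights, k)}")
```

As with the runners, the observed counts for a real or simulated sample will usually sit near the predicted ones without matching them exactly.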
The Normal Distribution:
as mathematical function (pdf)
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^{2}}$$
Note constants: π = 3.14159, e = 2.71828
This is a bell-shaped curve with different centers and spreads depending on μ and σ
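The density function can be evaluated directly. This is a minimal sketch using Python's standard library; the function name `normal_pdf` and its default arguments (mean 0, standard deviation 1, i.e. the standard normal) are choices made for illustration.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))

# The curve peaks at the mean and is symmetric around it.
print(normal_pdf(0.0))                      # height at the center of the standard normal
print(normal_pdf(-1.0) == normal_pdf(1.0))  # symmetry about the mean
```

Changing `mu` shifts the center of the bell, while a larger `sigma` makes it wider and lower.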
Outliers?
What is an outlier?
Outlier Detection
3 12 7 40 9 14 18 15 17
Mean is 15
Median is 14
Outlier
Outlier Detection
To find any outliers in a set of data, we first need the 5 Number Summary of the data: minimum, Q1, median, Q3, and maximum.
Find the 5 Number Summary of the following numbers:
Step 1: Sort the numbers from lowest to highest
3 7 9 12 14 15 17 18 40
Outlier Detection
A 5 Number Summary divides your data into four quarters.
3 7 9 12 14 15 17 18 40
Outlier Detection
25% of all the numbers in the set are smaller than Q1
3 7 9 12 14 15 17 18 40
Outlier Detection
3 7 9 12 14 15 17 18 40
IQR = Q3 - Q1 = 17 - 9 = 8
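The quartiles and their difference can be computed as follows. This is a sketch using Python's `statistics.quantiles`; the `"inclusive"` method is chosen here because it yields the values 9 and 17 used on the slide for this data set, and other quartile conventions can give slightly different results. The helper name `five_number_summary` is illustrative.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    data = sorted(values)
    # The 'inclusive' method matches the quartiles on the slide for this data set.
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return data[0], q1, median, q3, data[-1]

data = [3, 12, 7, 40, 9, 14, 18, 15, 17]
low, q1, median, q3, high = five_number_summary(data)
iqr = q3 - q1  # interquartile range
print(low, q1, median, q3, high)
print("IQR =", iqr)
```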
Outlier Detection
3 7 9 12 14 15 17 18 40
IQR = 8
1.5 × IQR = 12
Lower fence: Q1 - 12 = 9 - 12 = -3
Upper fence: Q3 + 12 = 17 + 12 = 29
40 lies above the upper fence, so it is an OUTLIER
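The fence check can be sketched in a few lines. The code assumes the common 1.5 × IQR rule and takes Q1 = 9 and Q3 = 17 from the preceding slide; the function name `iqr_outliers` and the parameter `k` are hypothetical names used for illustration.

```python
def iqr_outliers(values, q1, q3, k=1.5):
    """Return the lower fence, the upper fence, and the values lying outside them."""
    iqr = q3 - q1
    lower = q1 - k * iqr  # values below this are flagged
    upper = q3 + k * iqr  # values above this are flagged
    outliers = [v for v in values if v < lower or v > upper]
    return lower, upper, outliers

data = [3, 7, 9, 12, 14, 15, 17, 18, 40]
lower, upper, outliers = iqr_outliers(data, q1=9, q3=17)
print("fences:", lower, upper)
print("outliers:", outliers)
```

For this data set only the value 40 falls outside the fences.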
Outlier Detection
3 12 7 40 9 14 18 15 17
With the outlier (40): Mean is 15, Median is 14
Without the outlier: Mean is 11.875, Median is 13
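The effect of the outlier on the two measures can be reproduced with Python's `statistics` module. This is a small sketch; the variable name `trimmed` is illustrative.

```python
import statistics

data = [3, 12, 7, 40, 9, 14, 18, 15, 17]
trimmed = [v for v in data if v != 40]  # drop the flagged outlier

print("with outlier:   ", statistics.mean(data), statistics.median(data))
print("without outlier:", statistics.mean(trimmed), statistics.median(trimmed))
```

Removing the single extreme value moves the mean by more than 3 units but the median by only 1, which is why the median is considered the more robust measure of center.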