Lecture 1

INEN 270
ENGINEERING STATISTICS Spring 2011
Introduction
Agenda
- Purpose
- Statistical Concepts
- Descriptive Statistics and Some of Their Graphs
- Inferential Statistics
- Lecture Summary
Lecture 1: Introduction to Statistics

Why Study Statistics?

You need to know how to evaluate published numerical facts. Your profession or employment may require you to interpret the results of sampling or to employ statistical methods of analysis to make inferences in your work.
What Is the Purpose of Statistics?

One purpose of statistics is to make sense of your data. Statistics provide information about your data so you can answer questions and make informed business decisions.

Objectives

- Explain the use of statistics.
- Define population and sample.
- Describe the processes involved in statistical analysis.
- Compare descriptive and inferential statistics.
- Discuss the sampling plan.
Defining the Problem

Before you begin any analysis, you should complete certain tasks:
1. Outline the purpose of the study.
2. Document the study questions.
3. Define the population of interest.
4. Determine the need for sampling.
5. Define the data collection protocol.
Example: Speeding Data
[Figure: speeding data; speed limit 65 mph; recorded speeds of 45, 48, 50, and 52 mph]
Population and Sample
Basic Definition
STATISTICS: The area of science concerned with extracting information from numerical data and using that information to make inferences about a population from data obtained from a sample.
[Figure: Population (set of all measurements) → Sample (set of measurements selected from the population) → extract information → make inference about the population]
Basic Definition
Population and parameter
- Population: the set of all measurements of interest to the investigator.
- Parameter: an unknown characteristic of the population that is of interest to the investigator.
Sample and statistic
- Sample: a subset of measurements selected from the population of interest.
- Statistic: a characteristic of the sample that is of interest to the investigator.
Descriptive statistics
- Center of location: mean, median, mode
- Variability: variance, standard deviation
- Distribution
Examples of Population and Sample

Selecting the proper diet for shrimp or other sea animals is an important aspect of sea farming. A researcher wishes to estimate the average weight of shrimp maintained on a specific diet for a period of 6 months. One hundred shrimp are randomly selected from an artificial pond, and each is weighed.

- Identify the population.
- Identify the sample.
- Identify the parameter.
- Identify the statistic.
Simple Random Sampling
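A simple random sample, and the difference between a parameter and a statistic, can be sketched in Python. This is an illustration under stated assumptions: the pond weights are simulated, hypothetical values (not data from the lecture), and the population size of 5000, the normal shape, and the seed are all arbitrary choices.

```python
import random
import statistics

# Hypothetical population: simulated weights (grams) of every shrimp in the pond.
# These numbers are made up for illustration; they are not lecture data.
rng = random.Random(270)
population = [rng.gauss(30.0, 4.0) for _ in range(5000)]

# Parameter: the population mean weight (unknown to the investigator in practice).
mu = statistics.fmean(population)

# Simple random sample: every subset of 100 shrimp is equally likely to be selected.
sample = rng.sample(population, k=100)

# Statistic: the sample mean, which estimates mu.
x_bar = statistics.fmean(sample)

print(f"parameter mu    = {mu:.2f}")
print(f"statistic x_bar = {x_bar:.2f}")
```

Rerunning with a different seed draws a different sample and therefore a different value of the statistic, while the parameter stays fixed.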

Convenience Sampling
Process of Statistical Data Analysis

[Figure: Population → random sample → describe the sample with sample statistics → make inferences about the population]
Sampling Plan

Objectives

- Compute and interpret statistics describing the location of a set of values, such as the mean, median, and mode.
- Compute and interpret statistics describing the variability in a set of values, such as the range and standard deviation.
- Compute and interpret measures of shape: skewness and kurtosis.
- Produce graphical displays of data.
Some Frequently Used Statistics and Parameters

Quantity             Sample statistic    Population parameter
Mean                 x̄                   μ
Variance             s²                  σ²
Standard deviation   s                   σ
Proportion           p̂                   p
Measure of Location
Descriptive statistics that locate the center of your data are called measures of central tendency.
Sample Mean

The sample mean of a set of n measurements (x1, x2, ..., xn) is equal to the sum of the measurements divided by n:
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$$
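A minimal Python sketch of the definition above; the function name `sample_mean` is hypothetical, chosen for illustration.

```python
def sample_mean(xs):
    """Sum of the n measurements divided by n."""
    if not xs:
        raise ValueError("need at least one measurement")
    return sum(xs) / len(xs)

print(sample_mean([3, 4, 8]))  # 5.0
```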
Measure of Location

Sample Median
Median: the middle value (also known as the 50th percentile). The median of a set of n measurements (x1, x2, ..., xn) is the value that falls in the middle position when the measurements are ordered from smallest to largest:
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even} \end{cases}$$
where x1, ..., xn are arranged in increasing order of magnitude.
RULE FOR CALCULATING THE MEDIAN

1. Order the measurements from smallest to largest.
2. (a) If the sample size is odd, the median is the middle measurement.
   (b) If the sample size is even, the median is the average of the two middle measurements.
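Odd n: 1 3 3 [4] 5 8 51 (three measurements on each side of the median; median = 4)
Even n: 1 3 3 | 4 5 8 (three measurements on each side; median = (3 + 4)/2 = 3.5)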
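The two-step rule translates directly into code. This Python sketch (the helper name `sample_median` is hypothetical) reproduces the odd and even examples:

```python
def sample_median(xs):
    """Median by the rule above: sort, then take the middle value (odd n)
    or the average of the two middle values (even n)."""
    if not xs:
        raise ValueError("need at least one measurement")
    ordered = sorted(xs)                              # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                                    # step 2(a): odd n
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2      # step 2(b): even n

print(sample_median([1, 3, 3, 4, 5, 8, 51]))  # 4
print(sample_median([1, 3, 3, 4, 5, 8]))      # 3.5
```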
Percentiles
Data: 98 95 92 90 85 81 79 70 63 55 47 42
- 75th percentile (third quartile) = 91
- 50th percentile = 80
- 25th percentile (first quartile) = 59
Quartiles break your data up into quarters.
Example
A random sample of six values was taken from a population: x1 = 7, x2 = 1, x3 = 10, x4 = 8, x5 = 4, and x6 = 12. What are the sample mean and sample median for these data?
Sample mean:
$$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{n} = \frac{7 + 1 + 10 + 8 + 4 + 12}{6} = \frac{42}{6} = 7$$
CALCULATIONS FOR THE SAMPLE MEDIAN
1. Order the sample: x2 = 1, x5 = 4, x1 = 7, x4 = 8, x3 = 10, x6 = 12
2. Median: n = 6 is even, so $\tilde{x} = (7 + 8)/2 = 7.5$
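The hand calculations can be checked with Python's standard `statistics` module:

```python
import statistics

xs = [7, 1, 10, 8, 4, 12]     # x1..x6 from the example
print(statistics.mean(xs))    # 7
print(statistics.median(xs))  # 7.5
```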
Example
Given the data 1.7, 2.2, 3.11, 3.9, and 14.7:
$$\bar{x} = \frac{1.7 + 2.2 + 3.11 + 3.9 + 14.7}{5} = \frac{25.61}{5} \approx 5.12$$
$$\tilde{x} = 3.11$$
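A short Python check of this example. As an added illustration, it also drops the extreme value 14.7 to show that the mean moves much more than the median.

```python
import statistics

data = [1.7, 2.2, 3.11, 3.9, 14.7]
mean_all = statistics.fmean(data)       # 25.61 / 5
median_all = statistics.median(data)    # middle of the ordered values

# Drop the extreme value 14.7 and recompute both measures.
rest = [x for x in data if x != 14.7]
mean_rest = statistics.fmean(rest)
median_rest = statistics.median(rest)

print(f"with 14.7:    mean = {mean_all:.3f}, median = {median_all:.3f}")
print(f"without 14.7: mean = {mean_rest:.4f}, median = {median_rest:.3f}")
```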
Example
Consider the following sample: 4 18 36 39 41 42 43 46 47 48 49 49 50 51 44 53 44 54 45 60
Which measure of central tendency best describes the central location of the data: THE SAMPLE MEAN OR SAMPLE MEDIAN? Why?
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{863}{20} = 43.15, \qquad \tilde{x} = \frac{45 + 46}{2} = 45.5$$
Answer: the median.
Why? Because the data set contains an outlier (extreme value), 4, and the mean is heavily influenced by this single value. A remedy is the trimmed mean, which drops the highest and lowest extreme values and averages the rest; for example, a 5% trimmed mean drops the highest 5% and the lowest 5% of the values and averages the rest.
Sample Mode
What is the mode for the previous example?
44 (occurs twice) and 49 (occurs twice); the sample has two modes.
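All the location measures for this 20-value sample can be computed in Python. The `trimmed_mean` helper is hypothetical, and it assumes one common convention: remove floor(proportion × n) values from each end of the ordered data.

```python
import statistics

data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]

def trimmed_mean(xs, proportion):
    """Drop floor(proportion * n) values from each end of the ordered data, then average."""
    k = int(proportion * len(xs))
    ordered = sorted(xs)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

print(statistics.fmean(data))              # 43.15
print(statistics.median(data))             # 45.5
print(round(trimmed_mean(data, 0.05), 2))  # 44.39  (drops 4 and 60)
print(sorted(statistics.multimode(data)))  # [44, 49]
```

The 5% trimmed mean lands between the mean and the median, because removing the outlier 4 (and the largest value, 60) reduces its pull on the average.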
Measures of Central Tendency (Mode, mean and median)

How are they related for a given data set? It depends on the skewness of the population:
(a) A bell-shaped distribution: the mean, median, and mode coincide.
(b) A distribution skewed to the left: A: mean, B: median, C: mode (mean < median < mode).
(c) A distribution skewed to the right: A: mode, B: median, C: mean (mode < median < mean).
Suppose the IRS wants to measure the central tendency of the income of the American population. Which measure would you recommend, and why?
Hint: Bill Gates. Income is skewed to the right.
Other Measures of Location

Trimmed means
- Computed by trimming away a certain percentage of both the largest and the smallest values.
- Less sensitive to outliers than the mean, but more sensitive than the median.
- What is the relationship between the trimmed mean and the median?
Example: 0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43. Find the 10% trimmed mean, $\bar{x}_{tr(10)}$.
Ordered data: 0.28 0.32 0.36 0.37 0.38 0.42 0.43 0.43 0.47 0.53
With n = 10, trimming 10% from each end removes one value from each end (0.28 and 0.53):
$$\bar{x}_{tr(10)} = \frac{0.32 + 0.37 + 0.47 + 0.43 + 0.36 + 0.42 + 0.38 + 0.43}{8} = 0.3975$$
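A Python sketch of the 10% trimmed mean, assuming floor(proportion × n) values are removed from each end (the helper name `trimmed_mean` is hypothetical). It also hints at the answer to the relationship question: trimming 40% from each end of these ten values leaves only the two middle values, whose average is the median.

```python
import statistics

def trimmed_mean(xs, proportion):
    # Number of values removed from EACH end: floor(proportion * n), an assumed convention.
    k = int(proportion * len(xs))
    ordered = sorted(xs)
    kept = ordered[k:len(ordered) - k]
    return sum(kept) / len(kept)

data = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]
print(round(trimmed_mean(data, 0.10), 4))  # 0.3975
print(round(trimmed_mean(data, 0.40), 4))  # 0.4 (only 0.38 and 0.42 remain)
print(statistics.median(data))             # the median of the same data
```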
The Spread of a Distribution: Variation

- Range: the difference between the maximum and minimum data values
- Interquartile range (IR or IQR): the difference between the 75th and 25th percentiles
- Variance: a measure of dispersion of the data around the mean
- Standard deviation: a measure of dispersion expressed in the same units of measurement as your data (the square root of the variance)
- Coefficient of variation: the standard deviation as a percentage of the mean
Typical Variation: Standard Deviation

The variance is a measure of variation. The square root of the variance, or standard deviation, is a measure of variation in terms of the original linear scale.
$\sigma = \sqrt{\sigma^2}$ is the population standard deviation.
$s_{n-1} = \sqrt{s_{n-1}^2}$ is an estimate of the population standard deviation.
Typical Variation: Average Squared Deviation

Consider the data {3, 4, 8}:
Obs.      Data   Deviation   (Deviation)²
1         3      -2          4
2         4      -1          1
3         8       3          9
Sum       15      0          14
Average   5       0          14/3
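The table can be reproduced with exact fractions in Python, which keeps the average squared deviation as 14/3 rather than a rounded decimal:

```python
from fractions import Fraction

data = [3, 4, 8]
mean = Fraction(sum(data), len(data))    # 5
deviations = [x - mean for x in data]    # -2, -1, 3
squared = [d * d for d in deviations]    # 4, 1, 9

print(sum(deviations))                   # 0 (deviations from the mean always sum to zero)
print(sum(squared) / len(data))          # 14/3
```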
Measures of Variability
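Sample Range
$$x_{\max} - x_{\min}$$
Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1}$$
Sample Standard Deviation
$$s = \sqrt{s^2}$$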
Obs.   x_i   x_i - x̄   (x_i - x̄)²   x_i²
1       7     0          0            49
2       1    -6         36             1
3      10     3          9           100
4       8     1          1            64
5       4    -3          9            16
6      12     5         25           144
Sum    42     0         80           374
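Sample Variance
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{80}{5} = 16$$
$$s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{374 - \frac{42^2}{6}}{5} = 16$$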
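Both variance formulas can be checked in Python against the standard library's `statistics.variance`, which also divides by n - 1:

```python
import math
import statistics

xs = [7, 1, 10, 8, 4, 12]
n = len(xs)
x_bar = sum(xs) / n                                   # 7.0

# Definition: sum of squared deviations divided by n - 1
ss = sum((x - x_bar) ** 2 for x in xs)                # 80.0
s2_definition = ss / (n - 1)                          # 16.0

# Shortcut: (sum of squares - (sum)^2 / n) / (n - 1) = (374 - 294) / 5
s2_shortcut = (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)

assert statistics.variance(xs) == s2_definition       # library agrees
s = math.sqrt(s2_definition)                          # sample standard deviation
print(s2_definition, s2_shortcut, s)                  # 16.0 16.0 4.0
```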
Unbiased Estimate of Population Variance

Calculate the unbiased estimate of the population variance by averaging the squared deviations with n - 1 instead of n:
$$s_{n-1}^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1}$$
This estimator is unbiased because, on average, it equals the population variance.
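"On average" can be made concrete with a small simulation. This is a sketch under assumptions: a hypothetical normal population with variance 4, samples of size n = 5, and 20,000 repetitions are arbitrary illustrative choices. Dividing by n - 1 averages close to 4, while dividing by n averages close to (n - 1)/n × 4 = 3.2.

```python
import random

# Assumed population for illustration: normal with mean 0 and variance sigma2 = 4.
rng = random.Random(1)
sigma2 = 4.0
n, reps = 5, 20000

total_unbiased = 0.0
total_biased = 0.0
for _ in range(reps):
    ys = [rng.gauss(0.0, sigma2 ** 0.5) for _ in range(n)]
    y_bar = sum(ys) / n
    ss = sum((y - y_bar) ** 2 for y in ys)   # sum of squared deviations
    total_unbiased += ss / (n - 1)           # divide by n - 1
    total_biased += ss / n                   # divide by n

avg_unbiased = total_unbiased / reps
avg_biased = total_biased / reps
print(f"average of s^2 using n - 1: {avg_unbiased:.3f}  (sigma^2 = {sigma2})")
print(f"average of s^2 using n:     {avg_biased:.3f}  ((n - 1)/n * sigma^2 = {(n - 1) / n * sigma2})")
```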
Discrete and Continuous Data

Discrete Data
Counted: number of defective items, number of accidents
Continuous Data
Measured: all possible heights, weights, distances, etc.

Distributions
When you examine the distribution of values for speed, you can determine:
- the range of possible data values
- the frequency of data values
- whether the data values accumulate in the middle of the distribution or at one end.
Graphical Methods and Data Description

- Stem-and-Leaf Plot
- Relative Frequency Distribution
- Relative Frequency Histogram
Construction of a Stem-Leaf Display

1. List the stem values, in order, in a vertical column.
2. Draw a vertical line to the right of the stem values.
3. For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem.
4. Reorder the leaves from lowest to highest within each stem row.
5. If the number of leaves appearing in each stem is too large, divide the stems into two groups, the first corresponding to leaves 0 through 4 and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)
Car Battery Life

2.2 3.4 2.5 3.3 4.7 4.1 1.6 4.3 3.1 3.8 3.5 3.1 3.4 3.7 3.2 4.5 3.3 3.6 4.4 2.6 3.2 3.8 2.9 3.2 3.9 3.7 3.1 3.3 4.1 3.0 3.0 4.7 3.9 1.9 4.2 2.6 3.7 3.1 3.4 3.5
Stem and Leaf Plot of Battery Life

STEM | LEAF                        | Frequency
1    | 69                          | 2
2    | 25669                       | 5
3    | 0011112223334445567778899   | 25
4    | 11234577                    | 8
Double-Stem and Leaf Plot of Battery Life

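STEM | LEAF              | Frequency
1    | 69                | 2
2*   | 2                 | 1
2    | 5669              | 4
3*   | 001111222333444   | 15
3    | 5567778899        | 10
4*   | 11234             | 5
4    | 577               | 3
(A starred stem holds leaves 0 through 4; the unstarred stem holds leaves 5 through 9.)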
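The construction steps can be sketched in Python and checked against both displays. The function `stem_and_leaf` is a hypothetical helper; it assumes non-negative values with one decimal place (stem = integer part, leaf = tenths digit) and, like the double-stem display, omits empty rows.

```python
from collections import defaultdict

# Battery-life data from the slide.
battery_life = [2.2, 3.4, 2.5, 3.3, 4.7, 4.1, 1.6, 4.3, 3.1, 3.8,
                3.5, 3.1, 3.4, 3.7, 3.2, 4.5, 3.3, 3.6, 4.4, 2.6,
                3.2, 3.8, 2.9, 3.2, 3.9, 3.7, 3.1, 3.3, 4.1, 3.0,
                3.0, 4.7, 3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5]

def stem_and_leaf(values, split=False):
    """Return (stem label, leaves, frequency) rows.

    With split=True each stem gets two rows: 'stem*' for leaves 0-4
    and 'stem' for leaves 5-9.
    """
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(round(v * 10), 10)   # integer tenths avoid float error
        half = 0 if (split and leaf <= 4) else 1
        rows[(stem, half)].append(leaf)
    table = []
    for stem, half in sorted(rows):
        leaves = sorted(rows[(stem, half)])       # reorder leaves lowest to highest
        label = f"{stem}*" if half == 0 else str(stem)
        table.append((label, "".join(map(str, leaves)), len(leaves)))
    return table

for label, leaves, freq in stem_and_leaf(battery_life, split=True):
    print(f"{label:>3} | {leaves:<26} {freq}")
```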
Relative Frequency Distribution

- Group the data into different classes or intervals.
- Count the leaves belonging to each stem; each stem defines a class interval.
- Divide each class frequency by the total number of observations to obtain the proportion of the observations in each class (the relative frequency).
Relative Frequency Distribution of Battery Life

Class interval   Class midpoint   Frequency, f   Relative frequency
1.5-1.9          1.7              2              0.050
2.0-2.4          2.2              1              0.025
2.5-2.9          2.7              4              0.100
3.0-3.4          3.2              15             0.375
3.5-3.9          3.7              10             0.250
4.0-4.4          4.2              5              0.125
4.5-4.9          4.7              3              0.075
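The relative frequency table can be rebuilt in Python. This sketch hard-codes the seven class limits from the table and works in integer tenths so that values such as 2.0 fall cleanly into their class.

```python
battery_life = [2.2, 3.4, 2.5, 3.3, 4.7, 4.1, 1.6, 4.3, 3.1, 3.8,
                3.5, 3.1, 3.4, 3.7, 3.2, 4.5, 3.3, 3.6, 4.4, 2.6,
                3.2, 3.8, 2.9, 3.2, 3.9, 3.7, 3.1, 3.3, 4.1, 3.0,
                3.0, 4.7, 3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5]

n = len(battery_life)
tenths = [round(v * 10) for v in battery_life]   # integer tenths avoid float error

table = []
for i in range(7):
    lo_t = 15 + 5 * i      # lower class limit in tenths: 1.5, 2.0, ..., 4.5
    hi_t = lo_t + 4        # upper class limit in tenths: 1.9, 2.4, ..., 4.9
    f = sum(1 for t in tenths if lo_t <= t <= hi_t)
    table.append((lo_t / 10, hi_t / 10, (lo_t + hi_t) / 20, f, f / n))

print("Class      Midpoint   f   Rel. freq.")
for lo, hi, mid, f, rel in table:
    print(f"{lo:.1f}-{hi:.1f}    {mid:.1f}      {f:>2}   {rel:.3f}")
```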
Relative Frequency Histogram of Battery Life
Picturing Distributions: Histogram

Each bar in the histogram represents a group of values (a bin). The height of the bar is the percent of values in the bin.
[Figure: histogram with bins on the horizontal axis and percent on the vertical axis]
Measures of Shape: Skewness

[Figure: frequency curves of distributions that are skewed to the left, symmetric, and skewed to the right]
Measures of Shape: Kurtosis
[Figure: light-tailed, normal, and heavy-tailed distributions]
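The slides do not give formulas for the shape measures, so this Python sketch assumes one common convention, the moment coefficients: skewness g1 = m3 / m2^1.5 and excess kurtosis g2 = m4 / m2² - 3, where mk is the average of (x - x̄)^k. Texts and software packages use other variants (for example, small-sample corrections), so values may differ slightly. Negative skewness indicates a longer left tail; positive excess kurtosis suggests heavier tails than the normal. The helper name `moment_shape` is hypothetical.

```python
def moment_shape(xs):
    """Moment-based shape measures (one common convention; others exist):
    skewness g1 = m3 / m2**1.5 and excess kurtosis g2 = m4 / m2**2 - 3,
    where mk is the average of (x - mean)**k over all n values."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3

# The 20-value sample from the mean-versus-median example (outlier 4 in the left tail).
sample = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
          49, 49, 50, 51, 44, 53, 44, 54, 45, 60]
skew, excess_kurt = moment_shape(sample)
print(f"skewness = {skew:.3f}, excess kurtosis = {excess_kurt:.3f}")
```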
Data Displays and Graphical Methods

Box and Whisker Plot (Boxplot)
Pth percentile: the value $X_P$ such that P% of the measurements fall below that value and (100 - P)% of the measurements fall above it.
[Figure: $X_P$ divides the distribution into P% below and (100 - P)% above]
Quartiles: quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (lower quartile) is denoted by Q1, the second by Q2, and the third (upper quartile) by Q3.
[Figure: Q1, Q2, and Q3 dividing a distribution into four parts]
Interquartile range (IQR): IQR = Q3 - Q1
Outlier: an observation that is unusually far removed from the bulk of the data. We label an observation as an outlier when its distance from the box exceeds 1.5 times the interquartile range (in either direction).
Box: encloses the interquartile range of the data.
Whiskers: show the extreme observations in the sample.
Box and Whiskers Plot or Boxplot

Calculating Fence Values

[Figure: boxplot labeled with the minimum, lower quartile, median, upper quartile, and maximum]
Lower inner fence: Q1 - 1.5(IQR)
Upper inner fence: Q3 + 1.5(IQR)
Lower outer fence: Q1 - 3(IQR)
Upper outer fence: Q3 + 3(IQR)
A Quick Method
1. Order the data from smallest to largest value.
2. Divide the ordered data set into two data sets using the median as the dividing value.
3. Let the lower quartile be the median of the set of values consisting of the smaller values.
4. Let the upper quartile be the median of the set of values consisting of the larger values.
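The quick method can be sketched in Python and checked against the Percentiles slide (Q1 = 59, median = 80, Q3 = 91). The helper name `quick_quartiles` is hypothetical, and the slide does not say what to do with the middle value when n is odd; this sketch assumes it is left out of both halves.

```python
import statistics

def quick_quartiles(xs):
    """Return (Q1, Q2, Q3) by the quick method: split the ordered data at the
    median and take the median of each half. Assumption: for odd n, the middle
    value belongs to neither half."""
    ordered = sorted(xs)
    n = len(ordered)
    lower = ordered[: n // 2]
    upper = ordered[(n + 1) // 2 :]
    return statistics.median(lower), statistics.median(ordered), statistics.median(upper)

scores = [98, 95, 92, 90, 85, 81, 79, 70, 63, 55, 47, 42]  # data from the Percentiles slide
print(quick_quartiles(scores))  # (59.0, 80.0, 91.0)
```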
Example
Nicotine content was measured in a random sample of 40 cigarettes. The data are displayed below.
1. Order the data from smallest to largest.
2. Divide the ordered data set into two data sets using the median as the dividing value.
0.72 1.40 1.64 1.69 1.79 1.88 2.03 2.28
0.85 1.47 1.64 1.70 1.79 1.90 2.08 2.31
1.09 1.51 1.67 1.74 1.82 1.92 2.09 2.37
1.24 1.58 1.68 1.75 1.85 1.93 2.11 2.46
1.37 1.63 1.69 1.75 1.86 1.97 2.17 2.55

Find Q2, Q1, Q3, and IQR = Q3 - Q1.
Q1 = (1.63 + 1.64)/2 = 1.635
Q2 = (1.75 + 1.79)/2 = 1.77
Q3 = (1.97 + 2.03)/2 = 2.000
IQR = Q3 - Q1 = 0.365
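The quartiles, IQR, and fences for the nicotine data can be computed in Python. This sketch reuses the quick method (as a hypothetical `quick_quartiles` helper) and applies the 1.5 × IQR inner-fence rule from the boxplot definitions to list the values lying beyond the inner fences.

```python
import statistics

def quick_quartiles(xs):
    """(Q1, Q2, Q3) by the quick method; for odd n the middle value is excluded from both halves."""
    ordered = sorted(xs)
    n = len(ordered)
    return (statistics.median(ordered[: n // 2]),
            statistics.median(ordered),
            statistics.median(ordered[(n + 1) // 2 :]))

nicotine = [0.72, 0.85, 1.09, 1.24, 1.37, 1.40, 1.47, 1.51, 1.58, 1.63,
            1.64, 1.64, 1.67, 1.68, 1.69, 1.69, 1.70, 1.74, 1.75, 1.75,
            1.79, 1.79, 1.82, 1.85, 1.86, 1.88, 1.90, 1.92, 1.93, 1.97,
            2.03, 2.08, 2.09, 2.11, 2.17, 2.28, 2.31, 2.37, 2.46, 2.55]

q1, q2, q3 = quick_quartiles(nicotine)
iqr = q3 - q1
inner = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)   # lower and upper inner fences
outer = (q1 - 3 * iqr, q3 + 3 * iqr)       # lower and upper outer fences
beyond_inner = [x for x in sorted(nicotine) if x < inner[0] or x > inner[1]]

print(f"Q1 = {q1:.3f}, Q2 = {q2:.3f}, Q3 = {q3:.3f}, IQR = {iqr:.3f}")
print(f"inner fences = ({inner[0]:.4f}, {inner[1]:.4f}), outer fences = ({outer[0]:.3f}, {outer[1]:.3f})")
print("values beyond the inner fences:", beyond_inner)
```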
Box-whisker Plot
[Figure: box-whisker plot with outliers labeled]
Information Drawn from Boxplot

1. The center of the distribution is indicated by the median line in the box.
2. A measure of the variability is given by the interquartile range, the length of the box.
3. The relative position of the median line indicates the symmetry of the middle 50% of the data.
4. Skewness can be judged from the relative lengths of the whiskers.
5. The presence of outliers can be examined.
Quantile Plot
A quantile plot simply plots the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value.
$$f_i = \frac{i - \frac{3}{8}}{n + \frac{1}{4}}$$
where i is the rank of the observation when the observations are ordered from low to high.
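Computing the plotting positions $f_i$ takes one line. This Python sketch pairs them with the ordered observations of a small data set; the set {3, 4, 8} is reused from the variance slide purely for illustration, and `plotting_positions` is a hypothetical helper name. Plotting each pair (f_i, y) gives the quantile plot.

```python
def plotting_positions(n):
    """f_i = (i - 3/8) / (n + 1/4) for ranks i = 1..n."""
    return [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]

data = [8, 3, 4]
pairs = list(zip(plotting_positions(len(data)), sorted(data)))
for f, y in pairs:
    print(f"f = {f:.3f}   y = {y}")
```

The positions are symmetric about 0.5 (f_i + f_{n+1-i} = 1), so the middle observation of an odd-sized sample sits exactly at f = 0.5.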
Quantile Plot for paint data (table 8.2 page 238)
Normal Quantile Plots

The normal quantile-quantile plot is a plot of $y_{(i)}$ (the ordered observations) against $q_{0,1}(f_i)$, where
$$f_i = \frac{i - \frac{3}{8}}{n + \frac{1}{4}}, \qquad q_{0,1}(f) \approx 4.91\left[f^{0.14} - (1 - f)^{0.14}\right]$$
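The formula for $q_{0,1}(f)$ is an approximation to the standard normal quantile. This Python sketch compares it with the exact quantile from the standard library's `statistics.NormalDist.inv_cdf`; over the range checked here the two agree to within about 0.01.

```python
from statistics import NormalDist

def q01(f):
    """Approximate standard normal quantile: 4.91 * [f**0.14 - (1 - f)**0.14]."""
    return 4.91 * (f ** 0.14 - (1 - f) ** 0.14)

standard_normal = NormalDist(mu=0.0, sigma=1.0)
for f in (0.05, 0.25, 0.50, 0.75, 0.95):
    approx = q01(f)
    exact = standard_normal.inv_cdf(f)
    print(f"f = {f:.2f}   approx = {approx:+.4f}   inv_cdf = {exact:+.4f}")
```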
Normal Quantile Plots

[Figure: five example normal quantile plots, numbered 1 to 5]

Objectives

- Understand the importance of making inferences.
- Understand the steps in conducting a statistical study.
Statistical Inference
making an "INFORMED GUESS" about a parameter based on a statistic. (This is the main objective of statistics.)
[Figure: STATISTICAL INFERENCE. Gather data: population → sample. Make inferences: sample statistics (x̄, s², s, p̂, etc.) → population parameters (μ, σ², σ, p, etc.)]
Variable
A VARIABLE is a characteristic of an individual or object that may vary for different observations.
A QUANTITATIVE VARIABLE measures a variable on some sort of scale.
A QUALITATIVE VARIABLE categorizes the values of the variable.
RAISIN BRAN EXAMPLE

A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops.
A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.
Components of the Problem

- Identify the population.
- Identify the sample.
- Identify the symbol for the parameter.
- Identify the symbol for the statistic.
Five Steps in a Statistical Study:

1. Stating the problem
2. Gathering the data
3. Summarizing the data
4. Analyzing the data
5. Reporting the results
Stating the Problem

- Specifically identifying the population to be sampled
- Identifying the parameter(s) being studied
Gathering the Data

SURVEYS
- Random sampling
- Stratified sampling
- Cluster sampling
- Systematic sampling
EXPERIMENTS
- Completely randomized design
- Randomized block design
- Factorial design


Summary
- Basics of statistics
- Descriptive statistics and graphs
- Inferential statistics

Textbook
- Chapter 1 (pages 1-28)
- Chapter 8 (pages 229-243)
