
INEN 270

ENGINEERING STATISTICS Spring 2011

Introduction

Agenda
- Purpose
- Statistical Concepts
- Descriptive Statistics and Some of Their Graphs
- Inferential Statistics
- Lecture Summary

Lecture 1: Introduction to Statistics


Purpose

Why Study Statistics?

- You need to know how to evaluate published numerical facts.
- Your profession or employment may require you to interpret the results of sampling or to employ statistical methods of analysis to make inferences in your work.

What Is the Purpose of Statistics?




One purpose of statistics is to make sense of your data. Statistics provide information about your data so you can answer questions and make informed business decisions.

Statistical Concepts

Objectives
- Explain the use of statistics.
- Define population and sample.
- Describe the processes involved in statistical analysis.
- Compare descriptive and inferential statistics.
- Discuss the sampling plan.

Defining the Problem

Before you begin any analysis, you should complete certain tasks.
1. Outline the purpose of the study.
2. Document the study questions.
3. Define the population of interest.
4. Determine the need for sampling.
5. Define the data collection protocol.

Example: Speeding Data

[Figure: a speed limit sign with speed readings; the values shown are 45, 48 mph, 50 mph, 52 mph, and 65 mph.]

Population and Sample

Basic Definition
- STATISTICS: Area of science concerned with the extraction of information from numerical data and its use in making inferences about a population from data that are obtained from a sample.

[Diagram: Population (set of all measurements) and Sample (set of measurements selected from the population); information is extracted from the sample and used to make inferences about the population.]

Basic Definition
- Population and Parameter
  - Population: the set representing all measurements of interest to the investigator.
  - Parameter: an unknown population characteristic of interest to the investigator.
- Sample and Statistic
  - Sample: a subset of measurements selected from the population of interest.
  - Statistic: a sample characteristic of interest to the investigator.
- Descriptive Statistics
  - Center (location): mean, median, mode
  - Variability: variance, standard deviation
  - Distribution

Examples of Population and Sample

Selecting the proper diet for shrimp or other sea animals is an important aspect of sea farming. A researcher wishes to estimate the average weight of shrimp maintained on a specific diet for a period of 6 months. One hundred shrimp are randomly selected from an artificial pond and each is weighed.
- Identify the population.
- Identify the sample.
- Identify the parameter.
- Identify the statistic.

Simple Random Sampling
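
A minimal sketch of simple random sampling, assuming a hypothetical population of 1,000 numbered units (the unit labels, the sample size, and the seed are made up for illustration and are not from the lecture). In a simple random sample, every subset of size n has the same chance of being selected, which is the behavior `random.sample` provides.

```python
import random


def simple_random_sample(population, n, seed=None):
    """Draw n distinct units so that every subset of size n is equally likely."""
    rng = random.Random(seed)  # the seed only makes the illustration repeatable
    return rng.sample(population, n)


# Hypothetical population: units labeled 1..1000 (an assumption, not lecture data).
population = list(range(1, 1001))
sample = simple_random_sample(population, 100, seed=2011)

# The sample holds 100 distinct units drawn without replacement.
print(len(sample), len(set(sample)))
```

A convenience sample, by contrast, would take whichever units are easiest to reach (for example, the first 100 labels), so not every subset has the same chance of selection.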



Convenience Sampling

Process of Statistical Data Analysis

[Diagram: a Random Sample is drawn from the Population; Sample Statistics describe the sample and are used to make inferences about the population.]

Sampling Plan

Descriptive Statistics and Some of Their Graphs

Objectives
- Compute and interpret statistics describing the location of a set of values, such as the mean, median, and mode.
- Compute and interpret statistics describing the variability in a set of values, such as the range and standard deviation.
- Compute and interpret the measures of shape: skewness and kurtosis.
- Produce graphical displays of data.

Some Frequently Used Statistics and Parameters

                     SAMPLE STATISTICS   POPULATION PARAMETERS
MEAN                 $\bar{x}$           $\mu$
VARIANCE             $s^2$               $\sigma^2$
STANDARD DEVIATION   $s$                 $\sigma$
PROPORTION


Measure of Location
- Descriptive statistics that locate the center of your data are called measures of central tendency.
- Sample Mean: The sample mean of a set of n measurements (x1, x2, ..., xn) is equal to the sum of the measurements divided by n.

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n} \]

Measure of Location

Sample Median
- Median: the middle value (also known as the 50th percentile).
- The median of a set of n measurements (x1, x2, ..., xn) is the value that falls in the middle position when the measurements are ordered from the smallest to the largest.

\[ \tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd} \\ \dfrac{x_{n/2} + x_{n/2+1}}{2} & \text{if } n \text{ is even} \end{cases} \]

where x1, ..., xn are arranged in increasing order of magnitude.

RULE FOR CALCULATING THE MEDIAN

1. Order the measurements from the smallest to the largest.
2. A) If the sample size is odd, the median is the middle measurement.
   B) If the sample size is even, the median is the average of the two middle measurements.

Odd sample size: 1 3 3 [4] 5 8 51, with n = 3 values on each side; the median is 4.
Even sample size: 1 3 3 | 4 5 8, with n = 3 values on each side; the median is (3 + 4)/2 = 3.5.
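
The rule for calculating the median can be sketched in a few lines of Python. The function name `sample_median` is chosen here for illustration; the two calls reuse the odd and even samples from the slide above.

```python
def sample_median(values):
    """Median by the rule: order the values, then take the middle one(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty sample is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd sample size: the single middle measurement.
        return ordered[mid]
    # Even sample size: average of the two middle measurements.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([1, 3, 3, 4, 5, 8, 51]))  # odd sample size
print(sample_median([1, 3, 3, 4, 5, 8]))      # even sample size
```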

Percentiles

Data (ordered): 98 95 92 90 85 81 79 70 63 55 47 42
- 75th percentile = 91 (third quartile)
- 50th percentile = 80 (median)
- 25th percentile = 59 (first quartile)

Quartiles break your data up into quarters.

Example
A random sample of six values was taken from a population. These values were: x1=7, x2=1, x3=10, x4=8, x5=4, and x6=12. What are the sample mean and sample median for these data?

Sample Mean

\[ \bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{n} = \frac{7 + 1 + 10 + 8 + 4 + 12}{6} = 7 \]

CALCULATIONS FOR THE SAMPLE MEDIAN

1. Order the sample: x2=1, x5=4, x1=7, x4=8, x3=10, x6=12
2. Median: $\tilde{x}$ = (7 + 8) / 2 = 7.5

Example
Given a set of data: 1.7, 2.2, 3.11, 3.9, and 14.7
- Sample mean: $\bar{x} = \frac{1.7 + 2.2 + 3.9 + 3.11 + 14.7}{5} \approx 5.12$
- Sample median: $\tilde{x} = 3.11$

Example
Consider the following sample: 4 18 36 39 41 42 43 46 47 48 49 49 50 51 44 53 44 54 45 60

Which measure of central tendency best describes the central location of the data: THE SAMPLE MEAN OR SAMPLE MEDIAN? Why?

\[ \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 43.15, \qquad \tilde{x} = \frac{45 + 46}{2} = 45.5 \]

Answer: the median.

Why? Because there is an outlier (extreme value), 4, in the data set, and the mean is heavily influenced by this single outlier.

Solution: a trimmed mean drops the highest and lowest extreme values and averages the rest. For example, a 5% trimmed mean drops the highest 5% and the lowest 5% of the values and averages the rest.

Sample Mode
- What is the mode for the previous example?
  - 44 (occurs twice)
  - 49 (occurs twice)
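
As a quick check of the three measures for the 20-value sample in the previous example, here is a sketch using Python's standard `statistics` module. `multimode` is used because this sample has more than one mode; sorting its result is only for display.

```python
import statistics

# The 20-value sample from the example above.
data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]

mean = statistics.mean(data)                 # pulled down by the outlier 4
median = statistics.median(data)             # average of the 10th and 11th ordered values
modes = sorted(statistics.multimode(data))   # every value tied for the highest count

print(mean, median, modes)
```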

Measures of Central Tendency (Mode, mean and median)

- How are they related to a given data set? It depends on the skewness of the population.
  (a) A bell-shaped distribution
  (b) A distribution skewed to the left. Marked from left to right: A: mean, B: median, C: mode
  (c) A distribution skewed to the right. Marked from left to right: A: mode, B: median, C: mean

- Suppose the IRS wants to measure the central tendency of the income of the American population. Which measure would you recommend, and why?
  - Hint: Bill Gates. The income distribution is skewed to the right.

Other Measures of Location

- Trimmed means
  - Computed by trimming away a certain percent of both the largest and smallest values.
  - Less sensitive to outliers than the mean, but more so than the median.
  - What is the relationship between the trimmed mean and the median?
- Example: find the 10% trimmed mean, $\bar{x}_{tr(10)}$, of
  0.32 0.53 0.28 0.37 0.47 0.43 0.36 0.42 0.38 0.43

Ordered data: 0.28 0.32 0.36 0.37 0.38 0.42 0.43 0.43 0.47 0.53

Dropping the smallest value (0.28) and the largest value (0.53) leaves eight values:

\[ \bar{x}_{tr(10)} = \frac{0.32 + 0.37 + 0.47 + 0.43 + 0.36 + 0.42 + 0.38 + 0.43}{8} = 0.3975 \]
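
The trimmed-mean calculation above can be sketched as follows. The function name `trimmed_mean` and the choice to round the number of trimmed values down when n times the percentage is not a whole number are assumptions made for this illustration; the slides do not specify that case.

```python
def trimmed_mean(values, percent):
    """Mean after dropping `percent`% of the values from each end.

    Assumption: the count trimmed from each end is rounded down.
    """
    ordered = sorted(values)
    n = len(ordered)
    k = n * percent // 100          # values dropped from EACH end
    kept = ordered[k:n - k]
    if not kept:
        raise ValueError("trimming removed every observation")
    return sum(kept) / len(kept)


data = [0.32, 0.53, 0.28, 0.37, 0.47, 0.43, 0.36, 0.42, 0.38, 0.43]

# With 10 values, a 10% trim drops one value from each end (0.28 and 0.53).
print(round(trimmed_mean(data, 10), 4))
```

With this sketch, a 0% trim returns the ordinary mean, and the result approaches the median as the trimming percentage nears 50%.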

The Spread of a Distribution: Variation

Measure                     Definition
range                       the difference between the maximum and minimum data values
interquartile range         the difference between the 25th and 75th percentiles (IR or IQR)
variance                    a measure of dispersion of the data around the mean
standard deviation          a measure of dispersion expressed in the same units of measurement as your data (the square root of the variance)
coefficient of variation    standard deviation as a percentage of the mean

Typical Variation: Standard Deviation

The variance is a measure of variation. The square root of the variance, or standard deviation, is a measure of variation in terms of the original linear scale.
- $\sigma = \sqrt{\sigma^2}$ is the population standard deviation.
- $s_{n-1} = \sqrt{s^2_{n-1}}$ is an estimate of the population standard deviation.

Typical Variation: Average Squared Deviation

Consider the data {3, 4, 8}

Obs       Data   Deviation   (Deviation)^2
1         3      -2          4
2         4      -1          1
3         8       3          9
Sum       15      0          14
Average   5       0          14/3

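Measures of Variability
- Sample Range: $X_{Max} - X_{Min}$
- Sample Variance:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} \]

- Sample Standard Deviation: $s = \sqrt{s^2}$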

Obs.   x_i   x_i - x̄   (x_i - x̄)^2   x_i^2
1      7      0         0             49
2      1     -6         36            1
3      10     3         9             100
4      8      1         1             64
5      4     -3         9             16
6      12     5         25            144
Sum    42     0         80            374

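Sample Variance

Using the definition:

\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{80}{5} = 16 \]

Using the computational form:

\[ s^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n - 1} = \frac{374 - \frac{42^2}{6}}{5} = 16 \]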
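
Both forms of the sample variance can be checked numerically. The sketch below uses function names chosen for illustration and compares the two hand formulas with `statistics.variance`, which also divides by n - 1.

```python
import statistics


def variance_definition(x):
    """Sum of squared deviations from the mean, divided by n - 1."""
    n = len(x)
    mean = sum(x) / n
    return sum((xi - mean) ** 2 for xi in x) / (n - 1)


def variance_shortcut(x):
    """Computational form: (sum of squares - (sum)^2 / n) / (n - 1)."""
    n = len(x)
    return (sum(xi ** 2 for xi in x) - sum(x) ** 2 / n) / (n - 1)


data = [7, 1, 10, 8, 4, 12]  # the six observations from the table above

print(variance_definition(data))
print(variance_shortcut(data))
print(statistics.variance(data), statistics.stdev(data))
```

The computational form avoids a second pass over the data, but with floating-point values that are large relative to their spread it can lose precision, so the definition is the safer default in code.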

Unbiased Estimate of Population Variance

Calculate the unbiased estimate of the population variance by dividing the sum of squared deviations by n - 1 instead of n.

\[ s^2_{n-1} = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n - 1} \]

This estimator is unbiased because, on average, it equals the population variance.
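
The phrase "on average" can be made concrete with a small enumeration. This sketch treats {3, 4, 8} from the earlier slide as the whole population and assumes samples of size 2 drawn with replacement, so that all 9 ordered samples are equally likely; both the sample size and the sampling scheme are assumptions for illustration. Exact fractions are used so no rounding is involved.

```python
from fractions import Fraction
from itertools import product


def sum_sq_dev(values):
    """Sum of squared deviations from the mean, as an exact fraction."""
    mean = Fraction(sum(values), len(values))
    return sum((v - mean) ** 2 for v in values)


population = [3, 4, 8]
pop_var = sum_sq_dev(population) / len(population)  # population variance

n = 2
# Assumption: sampling with replacement, so each ordered pair is equally likely.
samples = list(product(population, repeat=n))

avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print(pop_var, avg_div_n_minus_1, avg_div_n)
```

Averaged over all possible samples, the n - 1 version equals the population variance of 14/3, while dividing by n gives a smaller value.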

Discrete and Continuous Data

- Discrete Data
  - Counted: # of defective items, # of accidents
- Continuous Data
  - Measured: all possible heights, weights, distances, etc.


Distributions

When you examine the distribution of values for speed, you can determine
- the range of possible data values
- the frequency of data values
- whether the data values accumulate in the middle of the distribution or at one end.

Graphical Methods and Data Description

- Stem and Leaf Plot
- Relative Frequency Distribution
- Relative Frequency Histogram

Construction of a Stem-Leaf Display

- List the stem values, in order, in a vertical column.
- Draw a vertical line to the right of the stem values.
- For each observation, record the leaf portion of the observation in the row corresponding to the appropriate stem.
- Reorder the leaves from the lowest to highest within each stem row.
- If the number of leaves appearing in each stem is too large, divide the stems into two groups, the first corresponding to leaves 0 through 4, and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)

Car Battery Life


2.2 3.4 2.5 3.3 4.7 4.1 1.6 4.3 3.1 3.8 3.5 3.1 3.4 3.7 3.2 4.5 3.3 3.6 4.4 2.6 3.2 3.8 2.9 3.2 3.9 3.7 3.1 3.3 4.1 3.0 3.0 4.7 3.9 1.9 4.2 2.6 3.7 3.1 3.4 3.5

Stem and Leaf Plot of Battery Life

STEM   LEAF                        Frequency
1      69                          2
2      25669                       5
3      0011112223334445567778899   25
4      11234577                    8

Double-Stem and Leaf Plot of Battery Life

STEM   LEAF              Frequency
1      69                2
2*     2                 1
2      5669              4
3*     001111222333444   15
3      5567778899        10
4*     11234             5
4      577               3
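
The construction steps for a stem-and-leaf display can be sketched in code. The function name `stem_and_leaf` is chosen for illustration, and the sketch assumes one-decimal data, with the units digit as the stem and the tenths digit as the leaf; it builds the single-stem version of the battery life plot.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Map each stem (units digit) to its sorted string of leaves (tenths digit)."""
    rows = defaultdict(list)
    for v in values:
        # round() guards against floating-point noise such as 2.2 * 10 = 22.000000000000004
        stem, leaf = divmod(round(v * 10), 10)
        rows[stem].append(leaf)
    return {stem: "".join(str(d) for d in sorted(leaves))
            for stem, leaves in sorted(rows.items())}


battery_life = [
    2.2, 3.4, 2.5, 3.3, 4.7, 4.1, 1.6, 4.3, 3.1, 3.8,
    3.5, 3.1, 3.4, 3.7, 3.2, 4.5, 3.3, 3.6, 4.4, 2.6,
    3.2, 3.8, 2.9, 3.2, 3.9, 3.7, 3.1, 3.3, 4.1, 3.0,
    3.0, 4.7, 3.9, 1.9, 4.2, 2.6, 3.7, 3.1, 3.4, 3.5,
]

plot = stem_and_leaf(battery_life)
for stem, leaves in plot.items():
    print(f"{stem} | {leaves}  ({len(leaves)})")
```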

Relative Frequency Distribution

- Group the data into different classes or intervals.
- Count the leaves belonging to each stem.
- Each stem defines a class interval.
- Dividing each class frequency by the total number of observations gives the proportion of the set of observations in each of the classes.

Relative Frequency Distribution of Battery Life

Class Interval   Class midpoint   Frequency, f   Relative frequency
1.5-1.9          1.7              2              0.050
2.0-2.4          2.2              1              0.025
2.5-2.9          2.7              4              0.100
3.0-3.4          3.2              15             0.375
3.5-3.9          3.7              10             0.250
4.0-4.4          4.2              5              0.125
4.5-4.9          4.7              3              0.075
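
A short sketch of how the midpoint and relative frequency columns are obtained, starting from the class intervals and frequencies in the table above. The variable names are chosen for illustration.

```python
# Class intervals and frequencies taken from the battery life table.
intervals = [(1.5, 1.9), (2.0, 2.4), (2.5, 2.9), (3.0, 3.4),
             (3.5, 3.9), (4.0, 4.4), (4.5, 4.9)]
frequencies = [2, 1, 4, 15, 10, 5, 3]

total = sum(frequencies)  # total number of observations

# Midpoint: average of the class limits. Relative frequency: f / total.
midpoints = [round((low + high) / 2, 1) for low, high in intervals]
relative = [f / total for f in frequencies]

for (low, high), mid, f, rel in zip(intervals, midpoints, frequencies, relative):
    print(f"{low}-{high}  midpoint={mid}  f={f}  relative={rel:.3f}")
```

The relative frequencies sum to 1, which is a useful check when building such a table by hand.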

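Relative Frequency Histogram of Battery Life

[Figure: relative frequency histogram of battery life.]

Picturing Distributions: Histogram

Each bar in the histogram represents a group of values (a bin). The height of the bar is the percent of values in the bin.

[Figure: histogram with PERCENT on the vertical axis and the bars labeled as bins.]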

Measures of Shape: Skewness

[Figure: three frequency curves: Skewed to Left, Symmetric, Skewed to Right.]

Measures of Shape: Kurtosis

[Figure: three curves: Light-tailed, Normal, Heavy-tailed.]

Data Displays and Graphical Methods

- Box and Whisker Plot or Boxplot
- Pth Percentile
  - The Pth percentile is the value Xp such that p% of the measurements will fall below that value and (100-p)% of the measurements will fall above the value.
  - [Figure: distribution divided at Xp, with P% of the area below and (100-P)% above.]
- Quartile
  - Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (Lower Quartile) is denoted by Q1, the second by Q2, and the third (Upper Quartile) by Q3.
  - [Figure: distribution with Q1, Q2, and Q3 marked.]
- InterQuartile Range (IQR)
  - IQR = Q3 - Q1
- Outlier
  - Observations that are considered to be unusually far removed from the bulk of the data.
  - We label the observations as outliers when the distance from the box exceeds 1.5 times the interquartile range (in either direction).
- The box encloses the interquartile range of the data.
- The whiskers show the extreme observations in the sample.

Box and Whiskers Plot or Boxplot

Calculating Fence Values
- Lower Inner Fence: Q1 - 1.5(IQR)
- Upper Inner Fence: Q3 + 1.5(IQR)
- Lower Outer Fence: Q1 - 3(IQR)
- Upper Outer Fence: Q3 + 3(IQR)

[Figure: boxplot labeled, from top to bottom, Maximum, Upper Quartile, Median, Lower Quartile, Minimum.]

A Quick Method
1. Order the data from the smallest to the largest value.
2. Divide the ordered data set into two data sets using the median as the dividing value.
3. Let the lower quartile be the median of the set of values consisting of the smaller values.
4. Let the upper quartile be the median of the set of values consisting of the larger values.

Example
Nicotine content was measured in a random sample of 40 cigarettes. The data are displayed below.

1. Order the data from the smallest to the largest.
2. Divide the ordered data set into two data sets using the median as the dividing value.

0.72 0.85 1.09 1.24 1.37
1.40 1.47 1.51 1.58 1.63
1.64 1.64 1.67 1.68 1.69
1.69 1.70 1.74 1.75 1.75
1.79 1.79 1.82 1.85 1.86
1.88 1.90 1.92 1.93 1.97
2.03 2.08 2.09 2.11 2.17
2.28 2.31 2.37 2.46 2.55

Q2 = ? Q1 = ? Q3 = ? IQR = Q3 - Q1 = ?
- Q1 = (1.63 + 1.64)/2 = 1.635
- Q2 = (1.75 + 1.79)/2 = 1.77
- Q3 = (1.97 + 2.03)/2 = 2.000
- IQR = Q3 - Q1 = 0.365
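
The quick method and the inner fences can be sketched together for the nicotine data. The function names are chosen for illustration. One assumption is labeled in the code: when the sample size is odd, the median itself is left out of both halves, a case the slides do not address.

```python
def median_of_sorted(ordered):
    """Median of a list that is already in increasing order."""
    n = len(ordered)
    mid = n // 2
    return ordered[mid] if n % 2 else (ordered[mid - 1] + ordered[mid]) / 2


def quick_quartiles(values):
    """Q1, Q2, Q3 by splitting the ordered data at the median."""
    s = sorted(values)
    n = len(s)
    lower = s[: n // 2]          # smaller values
    upper = s[(n + 1) // 2:]     # larger values (assumption: median excluded if n is odd)
    return median_of_sorted(lower), median_of_sorted(s), median_of_sorted(upper)


nicotine = [
    0.72, 0.85, 1.09, 1.24, 1.37, 1.40, 1.47, 1.51, 1.58, 1.63,
    1.64, 1.64, 1.67, 1.68, 1.69, 1.69, 1.70, 1.74, 1.75, 1.75,
    1.79, 1.79, 1.82, 1.85, 1.86, 1.88, 1.90, 1.92, 1.93, 1.97,
    2.03, 2.08, 2.09, 2.11, 2.17, 2.28, 2.31, 2.37, 2.46, 2.55,
]

q1, q2, q3 = quick_quartiles(nicotine)
iqr = q3 - q1
lower_inner_fence = q1 - 1.5 * iqr
upper_inner_fence = q3 + 1.5 * iqr

# Observations beyond the inner fences are flagged as outliers.
outliers = [x for x in sorted(nicotine)
            if x < lower_inner_fence or x > upper_inner_fence]

print(round(q1, 3), round(q2, 3), round(q3, 3), round(iqr, 3))
print(round(lower_inner_fence, 4), round(upper_inner_fence, 4), outliers)
```

Statistical software often uses interpolation rules for quartiles that differ slightly from this quick method, so small discrepancies with library output are expected.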

Box-whisker Plot

[Figure: box-and-whisker plot with outliers labeled.]

Information Drawn from Boxplot

1. The center of the distribution is indicated by the median line in the box.
2. A measure of the variability is given by the interquartile range, the length of the box.
3. The relative position of the median line indicates the symmetry of the middle 50% of the data.
4. The skewness can be assessed from the lengths of the whiskers.
5. The presence of outliers can be examined.

Quantile Plot
A quantile plot simply plots the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value.

\[ f_i = \frac{i - \frac{3}{8}}{n + \frac{1}{4}} \]

where i is the order of the observations when they are ranked from low to high.

Quantile Plot for paint data (table 8.2 page 238)

Normal Quantile Plots

The normal quantile-quantile plot is a plot of $y_{(i)}$ (ordered observations) against $q_{0,1}(f_i)$, where

\[ f_i = \frac{i - \frac{3}{8}}{n + \frac{1}{4}}, \qquad q_{0,1}(f) = 4.91\left[ f^{0.14} - (1 - f)^{0.14} \right] \]
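
The two formulas above can be sketched in Python and compared with the exact standard normal quantile from `statistics.NormalDist` (Python 3.8 or later). The function names and the sample size n = 10 are assumptions for illustration; no lecture data is used.

```python
from statistics import NormalDist


def plotting_fractions(n):
    """f_i = (i - 3/8) / (n + 1/4) for i = 1, ..., n."""
    return [(i - 3 / 8) / (n + 1 / 4) for i in range(1, n + 1)]


def approx_normal_quantile(f):
    """Approximate standard normal quantile: 4.91 * [f^0.14 - (1 - f)^0.14]."""
    return 4.91 * (f ** 0.14 - (1 - f) ** 0.14)


n = 10  # hypothetical sample size
fractions_ = plotting_fractions(n)

# Columns: f_i, approximate quantile, exact quantile.
for f in fractions_:
    exact = NormalDist().inv_cdf(f)
    print(f"{f:.4f}  {approx_normal_quantile(f):+.4f}  {exact:+.4f}")
```

To draw the normal quantile plot, the ordered observations would be plotted against these quantiles; points falling close to a straight line suggest the data are roughly normal.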

Normal Quantile Plots


[Figure: example normal quantile plots, with panels numbered 1 to 5.]

Inferential Statistics

Objectives
- Understand the importance of making inferences.
- Understand the steps in conducting a statistical study.

Statistical Inference
Making an "INFORMED GUESS" about a parameter based on a statistic. (This is the main objective of statistics.)

[Diagram: STATISTICAL INFERENCE. GATHER DATA: a SAMPLE is taken from the POPULATION. MAKE INFERENCES: SAMPLE STATISTICS ($\bar{x}$, $s^2$, $s$, $\hat{\pi}$, etc.) are used to infer the PARAMETERS ($\mu$, $\sigma^2$, $\sigma$, $\pi$, etc.).]

Variable
- A VARIABLE is a characteristic of an individual or object that may vary for different observations.
- A QUANTITATIVE VARIABLE measures a variable on some sort of scale.
- A QUALITATIVE VARIABLE categorizes the values of the variable.

RAISIN BRAN EXAMPLE

- A cereal company claims that the average amount of raisins in its boxes of raisin bran is two scoops.
- A random sample of five boxes was taken off the production line, and an analysis revealed an average of 1.9 scoops per box.

Components of the Problem

- Identify the population.
- Identify the sample.
- Identify the symbol for the parameter.
- Identify the symbol for the statistic.

Five Steps in a Statistical Study:

1. Stating the problem
2. Gathering the data
3. Summarizing the data
4. Analyzing the data
5. Reporting the results

Stating the Problem

- Specifically identifying the population to be sampled
- Identifying the parameter(s) being studied

Gathering the Data

- SURVEYS
  - Random Sampling
  - Stratified Sampling
  - Cluster Sampling
  - Systematic Sampling
- EXPERIMENTS
  - Completely Randomized Design
  - Randomized Block Design
  - Factorial Design


Summary
- Basics of statistics
- Descriptive statistics and graphs
- Inferential statistics
- Textbook
  - Chapter 1 (pages 1-28)
  - Chapter 8 (pages 229-243)
