Introduction
Agenda
Purpose
Statistical Concepts
Descriptive Statistics and Some of Their Graphs
Inferential Statistics
Lecture Summary
numerical facts. Your profession or employment may require you to interpret the results of sampling or to employ statistical methods of analysis to make inferences in your work.
One purpose of statistics is to make sense of your data. Statistics provide information about your data so you can answer questions and make informed business decisions.
Lecture Summary
Objectives
Explain the use of statistics.
Define population and sample.
Describe the processes involved in statistical analysis.
Compare descriptive and inferential statistics.
Discuss the sampling plan.
1. Outline the purpose of the study.
2. Document the study questions.
3. Define the population of interest.
4. Determine the need for sampling.
5. Define the data collection protocol.
[Figure: Speed Limit example showing the values 45, 48, 50, 52 and 65 mph]
Basic Definition
STATISTICS: the area of science concerned with the extraction of information from numerical data and its use in making inferences about a population from data obtained from a sample.
[Figure: Population (set of all measurements) and Sample (set of measurements selected from the population); information is extracted from the sample to make inferences about the population]
Basic Definition
Population and Parameter
Population: the set of all measurements of interest to the investigator.
Parameter: an unknown population characteristic of interest to the investigator.
Sample and Statistic
Sample: a subset of measurements selected from the population of interest.
Statistic: a sample characteristic of interest to the investigator.
Descriptive Statistics
Center of location: mean, median, mode
Variability: variance, standard deviation
Distribution
sea animals is an important aspect of sea farming. A researcher wishes to estimate the average weight of shrimp maintained on a specific diet for a period of 6 months. One hundred shrimp are randomly selected from an artificial pond and each is weighed.
Identify the population.
Identify the sample.
Identify the parameter.
Identify the statistic.
Convenience Sampling
Describe
Make Inferences
Sampling Plan
Lecture Summary
Objectives
Compute and interpret statistics describing the location of a set of values, such as the mean, median, and mode.
Compute and interpret statistics describing the variability in a set of values, such as the range and standard deviation.
Compute and interpret measures of shape: skewness and kurtosis.
Produce graphical displays of data.
Sample statistics: $\bar{x}$, $s^2$, $s$; population parameters: $\mu$, $\sigma^2$, $\sigma$
Measure of Location
Descriptive statistics that locate the center
The sample mean of a set of n measurements (x1, x2, ..., xn) is equal to the sum of the measurements divided by n.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{x_1 + x_2 + \cdots + x_n}{n}$
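The formula translates directly into code. A minimal Python sketch (the function name `sample_mean` is illustrative, not from the lecture):

```python
def sample_mean(xs):
    """Sum of the n measurements divided by n."""
    if not xs:
        raise ValueError("sample must contain at least one measurement")
    return sum(xs) / len(xs)


print(sample_mean([7, 1, 10, 8, 4, 12]))  # 7.0
```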
Measure of Location
Sample Median
Median: the middle value (also known as the 50th percentile). The median of a set of n measurements (x1, x2, ..., xn) is the value that falls in the middle position when the measurements are ordered from the smallest to the largest.
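$\tilde{x} = x_{(n+1)/2}$ if $n$ is odd
$\tilde{x} = \frac{x_{n/2} + x_{n/2+1}}{2}$ if $n$ is even
(subscripts refer to positions in the ordered data)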
1. Order the measurements from the smallest to the largest.
2. A) If the sample size is odd, the median is the middle measurement.
   B) If the sample size is even, the median is the average of the two middle measurements.
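The two-step rule can be sketched in Python as follows (the name `sample_median` is hypothetical); the odd and even cases map onto steps 2A and 2B:

```python
def sample_median(xs):
    """Median by the two-step rule: order, then take the middle value(s)."""
    if not xs:
        raise ValueError("sample must contain at least one measurement")
    ordered = sorted(xs)              # step 1: smallest to largest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # step 2A: odd n -> the middle measurement
        return ordered[mid]
    # step 2B: even n -> average of the two middle measurements
    return (ordered[mid - 1] + ordered[mid]) / 2


print(sample_median([1, 3, 3, 4, 5, 8, 51]))  # 4
print(sample_median([1, 3, 3, 4, 5, 8]))      # 3.5
```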
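Odd n (n = 7): 1 3 3 4 5 8 51; three values lie on each side of the middle value, so the median is 4.
Even n (n = 6): 1 3 3 4 5 8; three values lie on each side of the middle, so the median is (3 + 4)/2 = 3.5.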
Percentiles
Example (12 values, ordered from largest to smallest): 98 95 92 90 85 81 79 70 63 55 47 42
75th percentile (third quartile) = 91
50th percentile = 80
Example
A random sample of six values was taken from a population. These values were: x1 = 7, x2 = 1, x3 = 10, x4 = 8, x5 = 4, and x6 = 12. What are the sample mean and sample median for these data?
Sample Mean
$\bar{x} = \frac{x_1 + x_2 + x_3 + x_4 + x_5 + x_6}{n} = \frac{7 + 1 + 10 + 8 + 4 + 12}{6} = 7$
Sample Median
1. Order the sample: x2 = 1, x5 = 4, x1 = 7, x4 = 8, x3 = 10, x6 = 12
2. Median: $\tilde{x} = (7 + 8)/2 = 7.5$
Example
Given a set of data: 1.7, 2.2, 3.11, 3.9, and 14.7
Sample mean: $\bar{x} = ?$
Sample median: $\tilde{x} = 3.11$
Example
Consider the following sample: 4 18 36 39 41 42 43 46 47 48 49 49 50 51 44 53 44 54 45 60
Which measure of central tendency best describes the central location of the data: THE SAMPLE MEAN OR SAMPLE MEDIAN? Why?
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 43.15$
$\tilde{x} = \frac{45 + 46}{2} = 45.5$
Answer: the median.
Why? Because there is an outlier (extreme value), 4, in the data set, and the mean is heavily influenced by this single outlier. Solution: a trimmed mean drops the highest and lowest extreme values and averages the rest; e.g., a 5% trimmed mean drops the highest 5% and the lowest 5% of the values and averages the rest.
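To see the effect of the outlier numerically, here is a short sketch using Python's standard `statistics` module on the 20 values above, recomputing both measures with the value 4 removed:

```python
import statistics

data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]

print(statistics.mean(data))    # 43.15
print(statistics.median(data))  # 45.5

# Remove the single outlier and recompute both measures.
without_outlier = [x for x in data if x != 4]
print(round(statistics.mean(without_outlier), 2))  # 45.21
print(statistics.median(without_outlier))          # 46
```

Dropping one value shifts the mean by about 2 units but the median by only 0.5, which is why the median is the better summary here.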
Sample Mode
What is the mode for the previous example?
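Assuming "the previous example" is the 20-value sample above, Python's `statistics.multimode` (Python 3.8+) lists every value tied for the highest frequency. In that sample two values each occur twice, so it is bimodal:

```python
import statistics

data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]

# multimode returns all values that share the highest count.
print(sorted(statistics.multimode(data)))  # [44, 49]
```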
tendency of the income of the American population, which measure will you recommend and why?
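$\bar{x}_{tr(10)} = ?$
Data (n = 10), ordered from smallest to largest: 0.28, 0.32, 0.36, 0.37, 0.38, 0.42, 0.43, 0.43, 0.47, 0.53
A 10% trimmed mean drops the lowest 10% (0.28) and the highest 10% (0.53) of the 10 values and averages the remaining 8: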
$\bar{x}_{tr(10)} = \frac{0.32 + 0.37 + 0.47 + 0.43 + 0.36 + 0.42 + 0.38 + 0.43}{8} = 0.3975$
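A minimal sketch of a p% trimmed mean. The function name is hypothetical, and rounding the number of trimmed values down to floor(n * p / 100) is an assumption; texts differ when n * p / 100 is not a whole number:

```python
import math


def trimmed_mean(xs, percent):
    """Drop the lowest and highest `percent`% of the ordered values and average the rest.

    Assumption: k = floor(n * percent / 100) values are cut from each end.
    """
    ordered = sorted(xs)
    k = math.floor(len(ordered) * percent / 100)
    kept = ordered[k:len(ordered) - k]
    if not kept:
        raise ValueError("trimming removed every value")
    return sum(kept) / len(kept)


data = [0.28, 0.32, 0.36, 0.37, 0.38, 0.42, 0.43, 0.43, 0.47, 0.53]
print(round(trimmed_mean(data, 10), 4))  # 0.3975
```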
Definition
Range: the difference between the maximum and minimum data values.
Interquartile range (IR or IQR): the difference between the 25th and 75th percentiles.
Variance: a measure of dispersion of the data around the mean.
Standard deviation: a measure of dispersion expressed in the same units of measurement as your data (the square root of the variance).
Coefficient of variation: the standard deviation as a percentage of the mean.
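As a quick illustration, the sketch below applies the range, variance, standard deviation and coefficient of variation definitions to the six-value sample from the earlier example (7, 1, 10, 8, 4, 12) using Python's `statistics` module. The IQR is left out because quartile conventions differ between texts:

```python
import statistics

data = [7, 1, 10, 8, 4, 12]

data_range = max(data) - min(data)                  # maximum minus minimum
variance = statistics.variance(data)                # sample variance, divisor n - 1
std_dev = statistics.stdev(data)                    # square root of the sample variance
cv_percent = std_dev / statistics.mean(data) * 100  # SD as a percentage of the mean

print(data_range, std_dev, round(cv_percent, 2))  # 11 4.0 57.14
```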
The square root of the variance, or standard deviation, is a measure of variation in terms of the original linear scale.
$\sigma = \sqrt{\sigma^2}$ is the population standard deviation
$s_{n-1} = \sqrt{s^2_{n-1}}$ is the sample standard deviation
Measures of Variability
Sample Range
$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1}$
$s = \sqrt{s^2}$
Obs.  x_i  x_i - x̄  (x_i - x̄)^2
1      7     0        0
2      1    -6       36
3     10     3        9
4      8     1        1
5      4    -3        9
6     12     5       25
Sum   42     0       80
Sample Variance
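$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n-1} = \frac{80}{5} = 16$
$S^2 = \frac{\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}}{n-1} = \frac{374 - \frac{42^2}{6}}{5} = 16$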
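Both forms of the sample variance can be checked side by side. A minimal sketch (function names are illustrative) reproducing $S^2 = 16$ and $s = 4$ for the six observations:

```python
import math


def variance_definitional(xs):
    """s^2 = sum((x_i - xbar)^2) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    xbar = sum(xs) / n
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)


def variance_shortcut(xs):
    """s^2 = (sum(x_i^2) - (sum(x_i))^2 / n) / (n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two measurements")
    return (sum(x * x for x in xs) - sum(xs) ** 2 / n) / (n - 1)


data = [7, 1, 10, 8, 4, 12]
s2 = variance_definitional(data)
print(s2, variance_shortcut(data), math.sqrt(s2))  # 16.0 16.0 4.0
```

The shortcut form is handy by hand, but in floating-point code it subtracts two large, nearly equal numbers when the mean is large relative to the spread, so the definitional form is usually the safer choice.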
$s^2 = \frac{\sum_{i=1}^{n} (y_i - \bar{y})^2}{n-1}$
Counted: number of defective items, number of accidents.
Measured: all possible heights, weights, distances, etc.
Continuous Data
Distributions
When you examine the distribution of values, you can see:
the range of possible data values
the frequency of data values
whether the data values accumulate in the middle of the distribution or at one end.
in the row corresponding to the appropriate stem.
Reorder the leaves from lowest to highest within each stem row.
If the number of leaves appearing in each stem is too large, divide the stems into two groups: the first corresponding to leaves 0 through 4, and the second corresponding to leaves 5 through 9. (This subdivision can be increased to five groups if necessary.)
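The construction steps can be sketched in Python for non-negative integers. Assumptions not taken from the lecture: the stem is every digit except the last, the leaf is the last digit, and split rows are labelled L (leaves 0 to 4) and H (leaves 5 to 9); the function name is hypothetical:

```python
from collections import defaultdict


def stem_and_leaf(values, split_stems=False):
    """Return the rows of a stem-and-leaf display for non-negative integers."""
    if not values:
        return []
    rows = defaultdict(list)
    for v in values:
        stem, leaf = divmod(v, 10)
        part = ("H" if leaf >= 5 else "L") if split_stems else ""
        rows[(stem, part)].append(leaf)              # place the leaf in its stem row
    parts = ("L", "H") if split_stems else ("",)
    stems = [stem for stem, _ in rows]
    lines = []
    for stem in range(min(stems), max(stems) + 1):   # keep empty stems visible
        for part in parts:
            leaves = sorted(rows.get((stem, part), []))  # reorder leaves low to high
            lines.append(f"{stem}{part} | " + "".join(str(leaf) for leaf in leaves))
    return lines


data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]
for line in stem_and_leaf(data):
    print(line)
```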
number of observations, we obtain the proportion of the set of observations in each of the classes.
Frequency, f: 2, 1, 4, 15, 10, 5, 3
Each bar in the histogram represents a group of values (a bin). The height of the bar is the percent of values in the bin.
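A minimal sketch of how the bar heights are obtained: count the values in each bin and divide by the total number of observations. The bin layout (start 0, width 10, 7 bins) and the function name are assumptions for illustration, applied to the 20-value sample from the mean/median example:

```python
def percent_per_bin(values, start, width, n_bins):
    """Percent of values in each bin [start + k*width, start + (k+1)*width)."""
    counts = [0] * n_bins
    for v in values:
        k = int((v - start) // width)   # index of the bin containing v
        if not 0 <= k < n_bins:
            raise ValueError(f"{v} falls outside the bins")
        counts[k] += 1
    total = len(values)
    return [100 * c / total for c in counts]


data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]
print(percent_per_bin(data, 0, 10, 7))  # [5.0, 5.0, 0.0, 10.0, 55.0, 20.0, 5.0]
```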
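[Figure: histogram with bins on the horizontal axis and percent on the vertical axis for a symmetric distribution; a percentile X_p has P% of the values below it and (100 - P)% above it]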
Quartiles
Quartiles divide the measurements into four parts such that 25% of the measurements are contained in each part. The first quartile (lower quartile) is denoted by Q1, the second by Q2, and the third (upper quartile) by Q3.
[Figure: quartiles Q1, Q2 and Q3 dividing the ordered measurements into four parts]
Interquartile range (IQR): IQR = Q3 - Q1
Outlier: an observation that is unusually far removed from the bulk of the data. We label an observation as an outlier when its distance from the box exceeds 1.5 times the interquartile range (in either direction).
Box: encloses the interquartile range of the data.
Whiskers: show the extreme observations in the sample.
[Figure: box plot marking the minimum, lower quartile and median, with limits at Q1 - 1.5(IQR), Q3 + 1.5(IQR), Q1 - 3(IQR) and Q3 + 3(IQR)]
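The 1.5(IQR) and 3(IQR) limits can be applied in a few lines. This sketch assumes quartiles from Python's `statistics.quantiles` (its default "exclusive" method); other quartile conventions can move the limits slightly. The function name is hypothetical, and the data are the 20 values from the mean/median example:

```python
import statistics


def classify_outliers(values):
    """Flag values beyond Q1 - 1.5*IQR or Q3 + 1.5*IQR, and separately beyond 3*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4)   # assumption: 'exclusive' method
    iqr = q3 - q1
    low_15, high_15 = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    low_3, high_3 = q1 - 3 * iqr, q3 + 3 * iqr
    beyond_1_5 = [v for v in values if v < low_15 or v > high_15]
    beyond_3 = [v for v in values if v < low_3 or v > high_3]
    return q1, q3, iqr, beyond_1_5, beyond_3


data = [4, 18, 36, 39, 41, 42, 43, 46, 47, 48,
        49, 49, 50, 51, 44, 53, 44, 54, 45, 60]
print(classify_outliers(data))  # (41.25, 49.75, 8.5, [4, 18], [4])
```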
A Quick Method
1. Order the data from the smallest to the largest value.
2. Divide the ordered data set into two data sets using the median as the dividing value.
3. Let the lower quartile be the median of the set of values consisting of the smaller values.
4. Let the upper quartile be the median of the set of values consisting of the larger values.
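The quick method can be sketched in Python as below (function names are hypothetical). One assumption the steps leave open: when n is odd, this sketch excludes the median from both halves; some texts include it. Applied to the 12 values from the percentiles example, it reproduces the 50th percentile of 80 and the 75th percentile of 91:

```python
def _median_of_sorted(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quick_quartiles(values):
    """Return (Q1, Q2, Q3) by the quick method.

    Assumption: for odd n the median belongs to neither half.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    ordered = sorted(values)              # step 1: smallest to largest
    n = len(ordered)
    lower = ordered[: n // 2]             # step 2: values below the median
    upper = ordered[(n + 1) // 2 :]       # values above the median
    q1 = _median_of_sorted(lower)         # step 3
    q2 = _median_of_sorted(ordered)
    q3 = _median_of_sorted(upper)         # step 4
    return q1, q2, q3


data = [98, 95, 92, 90, 85, 81, 79, 70, 63, 55, 47, 42]
print(quick_quartiles(data))  # (59.0, 80.0, 91.0)
```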
Example
Nicotine content was measured in a random
1. Order the data from the smallest to the largest.
2. Divide the ordered data set into two data sets using the median as the dividing value.
Box-whisker Plot
[Figure: box-whisker plot with outliers marked]
whiskers.
5. The presence of outliers can be examined.
Quantile Plot
A quantile plot simply plots the data values on the vertical axis against an empirical assessment of the fraction of observations exceeded by the data value.
$f_i = \frac{i - \frac{3}{8}}{n + \frac{1}{4}}$
where i is the rank of the observation when the observations are ordered from low to high.
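A small sketch computing the plotting positions $f_i$ for each ordered observation (the function name is hypothetical). The resulting pairs would be plotted with $f_i$ on the horizontal axis and the data value on the vertical axis:

```python
def plotting_positions(values):
    """Pair each ordered value with f_i = (i - 3/8) / (n + 1/4), i = 1..n."""
    ordered = sorted(values)
    n = len(ordered)
    return [((i - 3 / 8) / (n + 1 / 4), x) for i, x in enumerate(ordered, start=1)]


for f, x in plotting_positions([7, 1, 10, 8, 4, 12]):
    print(f"{f:.3f}  {x}")
```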
$q_{0,1}(f)$
Lecture Summary
Objectives
Understand the importance of making inferences.
Understand the steps in conducting a statistical study.
Statistical Inference
making an "INFORMED GUESS" about a parameter based on a statistic. (This is the main objective of statistics.)
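[Figure: Statistical inference. Gather data from the POPULATION (parameters $\mu$, $\sigma^2$, $\sigma$, etc.) as a SAMPLE (statistics $\bar{x}$, $s^2$, $s$, etc.), then make an inference about the population.]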
Variable
A VARIABLE is a characteristic of an individual or
of the variable.
production line, and an analysis revealed an average of 1.9 scoops per box.
Sampling: random sampling, stratified sampling, cluster sampling, systematic sampling
Experiments: completely randomized design, randomized block design, factorial design
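A hedged sketch of the four sampling schemes using Python's `random` module. The function names, the equal allocation per stratum, and the way the systematic start is chosen are simplifying assumptions, not the lecture's definitions:

```python
import random


def simple_random_sample(population, k, rng):
    """Every subset of size k is equally likely to be chosen."""
    return rng.sample(population, k)


def systematic_sample(population, k, rng):
    """Every step-th unit after a random start (assumes 1 <= k <= len(population))."""
    step = len(population) // k
    start = rng.randrange(step)           # random start within the first interval
    return population[start::step][:k]


def stratified_sample(strata, k_per_stratum, rng):
    """Simple random sample inside each stratum (equal allocation is an assumption)."""
    return {name: rng.sample(units, k_per_stratum) for name, units in strata.items()}


def cluster_sample(clusters, n_clusters, rng):
    """Randomly pick whole clusters and keep every unit in them."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return {name: clusters[name] for name in chosen}


rng = random.Random(2024)                 # fixed seed so the sketch is reproducible
population = list(range(1, 101))          # hypothetical population of 100 unit IDs
print(simple_random_sample(population, 5, rng))
print(systematic_sample(population, 5, rng))
```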
Lecture Summary
Summary
Basics of statistics
Descriptive statistics and graphs
Inferential statistics
Textbook