notes in statistic, basic stat

Jan 01, 2016


Descriptive Statistics

Chapter Outline

2.1 Frequency Distributions and Their Graphs

2.2 More Graphs and Displays

2.3 Measures of Central Tendency

2.4 Measures of Variation

2.5 Measures of Position

Overview

Descriptive Statistics

Describes the important characteristics of a set of data.

Organize, present, and summarize data:

1. Graphically

2. Numerically

Important Characteristics of Quantitative Data

Shape, Center, and Spread

Center: A representative or average value that indicates where the middle of the data set is located.

Variation: A measure of the amount that the values vary among themselves.

Distribution: The nature or shape of the distribution of data (such as bell-shaped, uniform, or skewed).


Section 2.1

Frequency Distributions and Their Graphs

Frequency Distributions

Frequency Distribution

A table that lists classes of data values along with the number of values that fall in each class (frequency, f).

1. Ungrouped Frequency Distribution: for data sets with few different values. Each value is in its own class.

2. Grouped Frequency Distribution: for data sets with many different values, which are grouped together in the classes.

Frequency Distributions

Ungrouped

Courses Taken    Frequency, f
                 25
                 38
                 217
                 1462
                 932
                 15

Grouped

Age of Voters    Frequency, f
18-30            202
31-42            508
43-54            620
55-66            413
67-78            158
78-90            32

Number of Peas in a Pea Pod

Sample Size: 50

[Table: Peas per pod vs. Freq, f; surviving frequencies include 1, 18, and 12]

Frequency Histograms

Frequency Histogram

A bar graph that represents the frequency distribution.

The horizontal scale is quantitative and measures the data values.

The vertical scale measures the frequencies of the classes.

Consecutive bars must touch.

[Figure: histogram; vertical axis: frequency; horizontal axis: data values]

Larson/Farber 4th ed.


Frequency Histogram

Ex. Peas per Pod

[Figure: frequency histogram; horizontal axis: Peas per pod; vertical axis: Freq, f]

Relative Frequency Histograms

Relative Frequency Distribution

Shows the portion or percentage of the data that falls in a particular class.

$\text{relative frequency} = \dfrac{\text{class frequency}}{\text{sample size}} = \dfrac{f}{n}$

Relative Frequency Histogram

Has the same shape and the same horizontal scale as the corresponding frequency histogram.

The vertical scale measures the relative frequencies, not frequencies.
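As a minimal sketch of the relative frequency calculation (class frequency divided by sample size), the Python below uses the grouped Age of Voters frequencies from earlier in these notes. The function name `relative_frequencies` is an illustrative choice, not something the notes define.

```python
def relative_frequencies(freqs):
    """Return each class frequency f divided by the sample size n."""
    n = sum(freqs)  # sample size = sum of all class frequencies
    return [f / n for f in freqs]


# Frequencies from the grouped "Age of Voters" table
voter_freqs = [202, 508, 620, 413, 158, 32]
rel = relative_frequencies(voter_freqs)
for f, r in zip(voter_freqs, rel):
    print(f"{f:5d}  {r:.3f}")
```

The relative frequencies always sum to 1 (or 100%), which is a quick check on the arithmetic.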


Grouped Frequency Distribution

For data sets with many different values.

Groups data into 5-20 classes of equal width.

Exam Scores    Freq, f
30-39
40-49
50-59
60-69
70-79          13
80-89          10
90-99

Lower class limits: the smallest numbers that can actually belong to the different classes.

Upper class limits: the largest numbers that can actually belong to the different classes.

Class width: the difference between two consecutive lower class limits.

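Frequency Distributions

Class midpoints: the value halfway between the LCL and the UCL of a class:

$\text{class midpoint} = \dfrac{(\text{Lower class limit}) + (\text{Upper class limit})}{2}$

Class boundaries: the value halfway between a UCL and the next LCL:

$\text{class boundary} = \dfrac{(\text{Upper class limit}) + (\text{next Lower class limit})}{2}$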
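A short sketch of the two halfway-point formulas, applied to the first exam-score classes (30-39 and 40-49). The function names are illustrative assumptions.

```python
def class_midpoint(lower, upper):
    """Value halfway between a class's lower and upper limits."""
    return (lower + upper) / 2


def class_boundary(upper, next_lower):
    """Value halfway between one class's upper limit and the next lower limit."""
    return (upper + next_lower) / 2


print(class_midpoint(30, 39))   # 34.5
print(class_boundary(39, 40))   # 39.5
```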

Constructing a Frequency Distribution

1. Determine the range of the data:

Range = highest data value - lowest data value

May round up to the next convenient number.

2. Decide on the number of classes.

Usually between 5 and 20; otherwise, it may be difficult to detect any patterns.

3. Find the class width:

$\text{class width} = \dfrac{\text{range}}{\text{number of classes}}$

Round up to the next convenient number.

4. Find the class limits.

Choose the first LCL: use the minimum data entry or something smaller that is convenient.

Find the remaining LCLs: add the class width to the lower limit of the preceding class.

Find the UCLs: Remember that classes must cover all data values and cannot overlap.

5. Find the frequencies for each class. (You may add a tally column first and make a tally mark for each data value in the class.)
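The five steps can be sketched in Python as follows. This is one possible reading under stated assumptions: the data are whole numbers, the first LCL is the minimum entry, and "round up" is taken to mean the next whole number above range / number of classes. The function name and the `scores` list are hypothetical.

```python
def grouped_frequency_distribution(data, num_classes):
    """Build (lower limit, upper limit, frequency) rows for whole-number data."""
    data_range = max(data) - min(data)            # step 1: range
    # Step 3: range / number of classes, rounded up to the next whole number.
    # Adding 1 after floor division also bumps an exact quotient, so the
    # classes are guaranteed to cover the maximum entry.
    width = data_range // num_classes + 1
    rows = []
    lower = min(data)                             # step 4: first LCL
    for _ in range(num_classes):
        upper = lower + width - 1                 # UCL, so classes do not overlap
        freq = sum(1 for x in data if lower <= x <= upper)  # step 5: frequency
        rows.append((lower, upper, freq))
        lower += width                            # next LCL
    return rows


scores = [52, 67, 71, 73, 78, 80, 84, 85, 88, 91, 95, 99]  # hypothetical data
rows = grouped_frequency_distribution(scores, 5)
for lcl, ucl, f in rows:
    print(f"{lcl}-{ucl}: {f}")
```

The frequencies should add up to the sample size, which is a useful check that no value was missed or counted twice.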


Shape of Distributions

Symmetric

Data is symmetric if the left half of its histogram is roughly a mirror image of its right half.

Skewed

Data is skewed if it is not symmetric and if it extends more to one side than the other.

Uniform

Data is uniform if it is equally distributed (on a histogram, all the bars are the same height or approximately the same height).

[Figure: example histograms labeled Symmetric, Skewed Left, Uniform, and Skewed Right]

Outliers

Unusual data values as compared to the rest of the set. They may be distinguished by gaps in a histogram.

Section 2.2

More Graphs and Displays


Other Graphs

Besides histograms, there are other methods of graphing quantitative data:

Stem-and-Leaf Plots

Dot Plots

Time Series

Stem-and-Leaf Plot

Represents data by separating each data value into two parts: the stem (such as the leftmost digit) and the leaf (such as the rightmost digit).

Split each data value at the same place value to form the stem and a leaf. (Want 5-20 stems.)

Arrange all possible stems vertically so there are no missing stems.

Write each leaf to the right of its stem, in order.

Create a key to recreate the data.

Variations of stem plots:

1. Split stems

2. Back-to-back stem plots
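A minimal sketch of the basic stem plot construction, assuming two-digit whole numbers split at the tens place (stem = tens digit, leaf = ones digit). The function name and the `ages` data are hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(data):
    """Return the lines of a stem plot for two-digit whole numbers."""
    leaves = defaultdict(list)
    for x in sorted(data):                 # sorting keeps the leaves in order
        leaves[x // 10].append(x % 10)     # stem = tens digit, leaf = ones digit
    lines = []
    # List every stem from smallest to largest so there are no missing stems.
    for stem in range(min(leaves), max(leaves) + 1):
        lines.append(f"{stem} | " + " ".join(str(d) for d in leaves[stem]))
    return lines


ages = [23, 25, 31, 38, 38, 52, 54, 57]    # hypothetical data
print("\n".join(stem_and_leaf(ages)))
print("Key: 2 | 3 = 23")
```

Stem 4 is printed with no leaves, which makes the gap in the data visible.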


the values of the data.


Dot Plots

Dot plot

Consists of a graph in which each data value is plotted as a point along a scale of values.

Figure 2-5

Time Series

Data set is composed of quantitative entries taken at regular intervals over a period of time.

e.g., The amount of precipitation measured each day for one month.

Use a time series chart to graph.

[Figure: time series chart (paired data); vertical axis: quantitative data; horizontal axis: time]

Time-Series Graph

Number of Screens at Drive-In Movie Theaters

Figure 2-8

Ex. www.eia.doe.gov/oil_gas/petroleum/

Pareto Chart

A vertical bar graph in which the height of each bar represents frequency or relative frequency.

[Figure: Pareto chart; vertical axis: Frequency; horizontal axis: Categories]

Pie Chart

A circle is divided into sectors that represent categories.

Constructing a Pie Chart

Find the total sample size.

Convert the frequencies to relative frequencies (percent).

Marital Status    Frequency (in millions)    Relative frequency
Never Married     55.3                       55.3 / 219.7 = 0.25 or 25%
Married           127.7                      127.7 / 219.7
Widowed           13.9                       13.9 / 219.7
Divorced          22.8                       22.8 / 219.7

Total: 219.7
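The conversion from frequencies to relative frequencies can be sketched in Python using the marital status figures. The variable names are illustrative.

```python
marital_status = {          # frequencies in millions
    "Never Married": 55.3,
    "Married": 127.7,
    "Widowed": 13.9,
    "Divorced": 22.8,
}

total = sum(marital_status.values())                       # total sample size
shares = {k: v / total for k, v in marital_status.items()}  # relative frequencies

for category, share in shares.items():
    print(f"{category:14s} {share:6.1%}")
```

Each share is the fraction of the circle that the category's sector should occupy.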


Constructing a Pareto Chart

Create a bar for each category, where the height of the bar can represent frequency or relative frequency.

The bars are often positioned in order of decreasing height, with the tallest bar positioned at the left.

Figure 2-6

Section 2.3

Measures of Central Tendency


Measure of central tendency

A value that represents a typical, or central, entry of a data set.

Most common measures of central tendency:

Mean

Median

Mode

Mean: The sum of all the data entries divided by the number of entries.

Population mean: $\mu = \dfrac{\sum x}{N}$

Sample mean: $\bar{x} = \dfrac{\sum x}{n}$

Round-off rule for measures of center: Carry one more decimal place than is in the original values. Do not round until the last step.
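A small sketch of the sample mean and the round-off rule, using the Bank A wait times that appear later in these notes. The function name `mean` is an illustrative choice.

```python
def mean(values):
    """Sum of all the data entries divided by the number of entries."""
    return sum(values) / len(values)


wait_times = [5.2, 6.2, 7.5, 8.4, 9.2]   # Bank A wait times, in minutes
x_bar = mean(wait_times)

# The data have one decimal place, so report two; round only at the last step.
print(f"{x_bar:.2f} min")
```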

Median

The value that lies in the middle of the data when the data set is arranged in order from lowest to highest.

Measures the center of an ordered data set by dividing it into two equal parts.

A sample median is often referred to as $\tilde{x}$.

If the data set has an

odd number of entries: median is the middle data entry.

even number of entries: median is the mean of the two middle data entries.

If the data set has an:

odd number of entries: the median is the exact middle value, e.g. $\tilde{x} = 6$.

even number of entries: the median is the mean of the two middle data entries, e.g. $\tilde{x} = \dfrac{6 + 7}{2} = 6.5$.
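The odd and even cases can be sketched as below. The two data sets are hypothetical, chosen so that the medians come out to 6 and 6.5; they are not the original slide data.

```python
def median(values):
    """Middle entry of the sorted data, or the mean of the two middle entries."""
    ordered = sorted(values)          # the data must be in order first
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:                    # odd count: the exact middle entry
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2   # even count: mean of the middle two


print(median([13, 2, 6, 11, 4]))      # odd number of entries
print(median([13, 2, 6, 11, 4, 7]))   # even number of entries
```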

Mode

The data entry that occurs with the greatest frequency.

If no entry is repeated, the data set has no mode.

If two entries occur with the same greatest frequency, each entry is a mode (bimodal).

a) Mode is 1.10

b) 27 27 27 55 55 55 88 88 99: Bimodal - 27 & 55

c) 1 2 3 6 7 8 9 10: No Mode
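A minimal sketch of finding the mode, checked against examples b and c. The function name `modes` and the choice to return an empty list for "no mode" are assumptions made for illustration.

```python
from collections import Counter


def modes(values):
    """Return every entry tied for the greatest frequency; [] if nothing repeats."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []                     # no entry is repeated: no mode
    return sorted(v for v, c in counts.items() if c == top)


print(modes([27, 27, 27, 55, 55, 55, 88, 88, 99]))   # example b
print(modes([1, 2, 3, 6, 7, 8, 9, 10]))              # example c
```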

All three measures describe an average. Choose the one that best represents a typical value in the set.

Mean:

The most familiar average.

A reliable measure because it takes into account every entry of a data set.

May be greatly affected by outliers or skew.

Median:

A common average.

Not as affected by skew or outliers.

Mode: May be used if there is an overwhelming repeat.

The shape of your data and the existence of any outliers may help you choose the best average:

Section 2.4

Measures of Variation


Another important characteristic of quantitative data is how much the data varies, or is spread out.

The 2 most common methods of measuring spread are:

1. Range

2. Standard deviation and Variance

Range

The difference between the maximum and minimum data entries in the set.

The data must be quantitative.

Range = (Max. data entry) - (Min. data entry)

The wait time to see a bank teller is studied at 2 banks.

Bank A has multiple lines, one for each teller.

Bank B has a single wait line for 1st available teller.

5 wait times (in minutes) are sampled from each bank:

Bank A:

5.2 6.2 7.5 8.4 9.2

Bank B:

6.6 6.8 7.5 7.7 7.9

Find the mean, median, and range for each bank.

Bank A: Range = ?

Bank B: Range = ?

Note: The range is easy to compute, but only uses 2

values. Do the following 2 sets vary the same?

Set A: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10

Set B: 1, 10, 10, 10, 10, 10, 10, 10, 10, 10
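A short Python sketch for working through the bank example and the Set A / Set B comparison. It uses `mean` and `median` from the standard `statistics` module; the helper `data_range` is an illustrative name.

```python
from statistics import mean, median


def data_range(values):
    """Maximum data entry minus minimum data entry."""
    return max(values) - min(values)


samples = {
    "Bank A": [5.2, 6.2, 7.5, 8.4, 9.2],
    "Bank B": [6.6, 6.8, 7.5, 7.7, 7.9],
    "Set A": [1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    "Set B": [1, 10, 10, 10, 10, 10, 10, 10, 10, 10],
}

for name, values in samples.items():
    print(f"{name}: mean={mean(values):.2f}, median={median(values)}, "
          f"range={data_range(values):.1f}")
```

Set A and Set B have the same range even though their values are spread very differently, which is why the range alone is a limited measure of variation.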


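Standard Deviation

Measures the typical amount data deviates from the mean.

Sample Variance, $s^2$:

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

Sample Standard Deviation, $s$:

$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$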

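Finding the Sample Standard Deviation

1. Find the mean of the sample data set: $\bar{x} = \dfrac{\sum x}{n}$

2. Find the deviation of each entry: $x - \bar{x}$

3. Square each deviation: $(x - \bar{x})^2$

4. Add to get the sum of the deviations squared: $\sum (x - \bar{x})^2$

5. Divide by $n - 1$ to get the sample variance: $s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1}$

6. Take the square root to get the sample standard deviation: $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$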

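Standard deviation for Bank A (multi-line)

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} = 7.3$

x (in min)     Deviation: $x - \bar{x}$     Squares: $(x - \bar{x})^2$
5.2            5.2 - 7.3 = -2.1             (-2.1)^2 = 4.41
6.2            6.2 - 7.3 =                  ( )^2 =
7.5            7.5 - 7.3 =                  ( )^2 =
8.4            8.4 - 7.3 =                  ( )^2 =
9.2            9.2 - 7.3 =                  ( )^2 =
$\sum x = 36.5$     $\sum (x - \bar{x}) =$     $\sum (x - \bar{x})^2 =$

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$

$s = \sqrt{s^2} =$

Round to one more decimal than the data.

Don't round until the end.

Include the appropriate units.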

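Standard deviation for Bank B (1 wait line)

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{36.5}{5} =$

x (in min)     Deviation: $x - \bar{x}$     Squares: $(x - \bar{x})^2$
6.6
6.8
7.5
7.7
7.9
$\sum x = 36.5$     $\sum (x - \bar{x}) =$     $\sum (x - \bar{x})^2 =$

$s^2 = \dfrac{\sum (x - \bar{x})^2}{n - 1} =$

$s = \sqrt{s^2} =$

Round to one more decimal than the data.

Don't round until the end.

Include the appropriate units.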
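To check the two bank worksheets, here is a minimal Python sketch that follows the six numbered steps for the sample standard deviation. The function name `sample_std` is an illustrative assumption.

```python
import math


def sample_std(values):
    """Sample standard deviation, computed step by step."""
    n = len(values)
    x_bar = sum(values) / n                     # 1. mean of the sample
    deviations = [x - x_bar for x in values]    # 2. deviation of each entry
    squares = [d ** 2 for d in deviations]      # 3. squared deviations
    sum_sq = sum(squares)                       # 4. sum of the squared deviations
    variance = sum_sq / (n - 1)                 # 5. sample variance
    return math.sqrt(variance)                  # 6. sample standard deviation


bank_a = [5.2, 6.2, 7.5, 8.4, 9.2]
bank_b = [6.6, 6.8, 7.5, 7.7, 7.9]

# Data have one decimal place, so report two; round only at the end.
print(f"Bank A: s = {sample_std(bank_a):.2f} min")
print(f"Bank B: s = {sample_std(bank_b):.2f} min")
```

Both banks have the same mean wait time, so the difference in $s$ is what separates the multi-line and single-line systems.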

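Standard Deviation and Variance

                      Sample Statistics     Population Parameters
Mean                  $\bar{x}$             $\mu$
Standard Deviation    $s$                   $\sigma$
Variance              $s^2$                 $\sigma^2$

Standard Deviation

Note: Unlike $\bar{x}$ and $\mu$, the formulas for $s$ and $\sigma$ are not mathematically the same:

Sample Standard Deviation

$s = \sqrt{s^2} = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$

Population Standard Deviation

$\sigma = \sqrt{\sigma^2} = \sqrt{\dfrac{\sum (x - \mu)^2}{N}}$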
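The difference between the sample and population formulas can be seen with Python's standard `statistics` module, where `stdev` divides by n - 1 and `pstdev` divides by N. Treating the Bank B wait times as a whole population is done here only for comparison.

```python
from statistics import pstdev, stdev

data = [6.6, 6.8, 7.5, 7.7, 7.9]   # Bank B wait times, in minutes

s = stdev(data)        # sample formula: divides by n - 1
sigma = pstdev(data)   # population formula: divides by N

print(f"s = {s:.3f}, sigma = {sigma:.3f}")
```

For the same data, the sample value is always at least as large as the population value because it divides by the smaller number.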

$s \geq 0$ (When would $s = 0$?)

$s$ is a measure of how much the data values vary from the mean. The larger $s$ is, the more the data varies.

The units of the standard deviation $s$ are the same as the units of the original data values. (The variance has squared units.)

The value of the standard deviation $s$ can increase dramatically with the inclusion of one or more outliers (data values far away from all others).

Standard deviation is a measure of the typical amount an entry deviates from the mean.

The more the entries are spread out, the greater the standard deviation.

[Figure: Standard Deviation, with the Sample Mean and Sample Standard Deviation labeled]

Using Technology

The gas mileage of 2 cars is sampled over various

conditions:

Car A:

Car B:

25.2 19.1 18.0 24.4 20.3 (mpg)

Use a calculator to find the mean and standard deviation

for each to justify your choice.

How does s show how much the data varies?

Three methods:

1. Range Rule of Thumb

2. Chebyshev's Theorem

3. The Empirical Rule

Range Rule: For most data sets, the majority of the data lies within 2 standard deviations of the mean.

Recall: Range = High - Low

Estimate: Range ≈ 4s

Use the range rule to estimate the standard deviation:

$s \approx \dfrac{\text{Range}}{4}$

A sample of women's heights has a mean of 64 inches and a standard deviation of 2.5 inches.

Using the range rule, most women fall within what heights?

What would be an unusual height?

The sample of Exam Scores used in the class handout had a mean of 73.6. Which of the following is most likely the standard deviation of the sample?

s = 3.6

s = 12.8

s = 74.5
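A sketch of the range rule of thumb in both directions: from a mean and standard deviation to the interval holding most values, and from a range back to an estimate of s. The function names are illustrative assumptions.

```python
def usual_range(mean, s):
    """Range rule: most values lie within 2 standard deviations of the mean."""
    return mean - 2 * s, mean + 2 * s


def estimate_s(high, low):
    """Rough estimate of the standard deviation as range / 4."""
    return (high - low) / 4


low, high = usual_range(64, 2.5)    # women's heights, in inches
print(low, high)                    # 59.0 69.0
print(estimate_s(high, low))        # 2.5
```

Heights outside the printed interval would count as unusual under this rule.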

Chebyshev's Theorem

For data with any distribution, the proportion (or fraction) of any set of data lying within K standard deviations of the mean is always at least $1 - 1/K^2$, where K is any positive number greater than 1.

For K = 2, at least 3/4 (or 75%) of all values lie within 2 standard deviations of the mean.

For K = 3, at least 8/9 (or 89%) of all values lie within 3 standard deviations of the mean.

A sample of salaries at an elementary school has a mean of $32,000 and a standard deviation of $3000.

Use Chebyshev's Theorem to describe how the salaries are spread out.

Would a salary of $28,000 be unusual?

Would a salary of $45,000 be unusual?
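The bound and the salary example can be sketched in Python as follows. The function name `chebyshev_bound` is an illustrative assumption; the figures come from the salary example.

```python
def chebyshev_bound(k):
    """Minimum proportion of data within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's Theorem requires k > 1")
    return 1 - 1 / k ** 2


mean, s = 32_000, 3_000   # salary example

for k in (2, 3):
    lo, hi = mean - k * s, mean + k * s
    print(f"At least {chebyshev_bound(k):.0%} of salaries lie "
          f"between ${lo:,} and ${hi:,}")

# How many standard deviations each salary is from the mean
for salary in (28_000, 45_000):
    print(salary, round(abs(salary - mean) / s, 2))
```

A salary more than 2 or 3 standard deviations from the mean falls outside the interval that must contain most of the data.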

Empirical (68-95-99.7) Rule

For data sets having a symmetric (bell-shaped) distribution:

About 68% of all values fall within 1 standard deviation of the mean.

About 95% of all values fall within 2 standard deviations of the mean.

About 99.7% of all values fall within 3 standard deviations of the mean.

A sample of IQs has a symmetric distribution with a mean of 100 and a standard deviation of 15.

1. Sketch the distribution.

2. 68% of people have an IQ between what 2 values?

3. What percent of people have an IQ between 70 and 130?

4. What percent of people have an IQ between 100 and 115?

5. What percent of people have an IQ above 145?
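A sketch of how the 68-95-99.7 percentages answer questions 2 through 5, assuming the IQ distribution is bell-shaped so that the symmetry arguments in the comments hold. The `WITHIN` table and variable names are illustrative.

```python
# Approximate share of values within k standard deviations of the mean
WITHIN = {1: 0.68, 2: 0.95, 3: 0.997}

mean, s = 100, 15   # IQ example

low, high = mean - s, mean + s        # question 2: within 1 s of the mean
between_70_130 = WITHIN[2]            # question 3: 70 and 130 are 2 s from the mean
between_100_115 = WITHIN[1] / 2       # question 4: half of the central 68%, by symmetry
above_145 = (1 - WITHIN[3]) / 2       # question 5: one tail beyond 3 s

print(low, high)
print(f"{between_70_130:.1%}, {between_100_115:.1%}, {above_145:.2%}")
```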
