
Technology Guide for Elementary Statistics 11e: Minitab

CHAPTER 1 - LAB SESSION INTRODUCTION TO MINITAB


INTRODUCTION: This lab session is designed to introduce you to the statistical software MINITAB. During this session you will learn how to enter and exit MINITAB, how to enter data and commands, how to print information, and how to save your work for use in subsequent sessions. As with any new skill, using this software will require practice and patience.

BEGINNING AND ENDING A MINITAB SESSION

To start MINITAB
From the taskbar, choose Start > Programs > MINITAB 14 > MINITAB.

To exit MINITAB
To end a MINITAB session and exit the program, choose File from the menu bar and then choose Exit. A dialog box will appear, asking if you want to save the changes made to this worksheet. Click Yes or No. It is also possible to exit MINITAB by clicking the X in the upper right corner of the window.

In MINITAB, there are three ways to access commands: with menus, the Toolbar, and session commands. The Toolbar is a quick way to issue commands. When you click a button, MINITAB performs an action or opens a dialog box, exactly like the corresponding menu command. To be able to use session commands, we must enable the command language editor. To do this, choose Editor > Enable Command Language. Session commands are alternatives to menu commands that you can type in the session window or in the command line editor.

MINITAB WINDOWS
The main MINITAB window opens when you first start MINITAB. You will be in a window titled MINITAB - Untitled, within which a split window is shown: one part titled Session and the other titled Worksheet 1. The Session window displays text output such as tables of statistics. Data windows are where you enter, edit, and view the column data for each worksheet. Another window in the MINITAB environment, which can be accessed through the Window menu, is the Project Manager. The Project Manager summarizes each open worksheet. Within the Project Manager, the History window records all the commands you have used. Graph windows display graphs.

Session Window
The data window is active when you first start MINITAB. To move to the Session window, point the mouse at the Session window and click. In older versions of MINITAB, whenever you issue a command from a menu, its corresponding Session command appears in the Session window. In version 14, the command appears in the History folder within the Project Manager and appears in the Session window only if you have enabled the command language. You can also type Session commands directly into the Session window at the MTB> prompt. Throughout these labs, the same typographical conventions will be used as in Johnson/Kuby's Elementary Statistics, 11/e.

The Help Window in MINITAB
Information about MINITAB is stored in the computer. If you forget how to use a command or subcommand, or need general information, you can ask MINITAB for help. There are three methods for accessing Help: choose Help from the menu, select ? from the toolbar, or press F1. It would be beneficial for you to read "How to use Minitab Help" the first time you enter the program to help you understand the structures used in MINITAB.


Students: Practice using the HELP command by typing the following and reading what is presented on the screen:

Menu Commands
Choose: Help > Help
Select: Index
Enter: Help on: MEAN

The Data Window
Close Help and click in the worksheet. The worksheet is arranged by rows and columns. The columns C1, C2, C3, . . . , correspond to the variables in your data, the rows to observations. In general, a column contains all the data for one variable, and a row contains all the data for an individual subject or observation. You can refer to the columns as C1, C2, or by giving them descriptive names. Click into the column name cell (the blank space below the column number). Name column 2 Test 1, column 3 Test 2, column 4 Test 3, and column 5 Average.

ENTERING DATA
Now that we are in the data window, let's enter data in the second column:

78 94 93 81 75 62 58 50 80 79

To do this, type each value and press the down arrow key (↓) or Enter to move to the next entry position.

Suppose we wish to create a column that contains the integers 1 to 10. Although we could enter these numbers directly into the Data window by typing, there is a much easier way:

Menu Commands
Choose: Calc > Make Patterned Data > Simple Set of Numbers
Enter: Store patterned data in: C1
       From first value: 1
       To last value: 10
Click: OK

NOTE: Use the Tab key to cycle through the prompts in the dialog box.

This is what the menu choices look like:

Column 1 should now contain the integers 1 through 10. While you are in the data window, fill columns 3 and 4 with a set of ten test scores each. You should now have four columns of data.

Changing a value entered
We can edit data directly in the data window. Let's suppose we had incorrectly entered the third data item in the second column. It should have been a 73. Click cell C2 row 3 to make it active. Type in the correct value and press Enter. Double-clicking allows insertion of new characters without retyping the entire entry.

Suppose we had inadvertently left out a value and we wish to enter it in a particular position. Place the cursor in the cell in which you wish to insert the new value. Click the Insert Cells button on the toolbar. A blank cell is created and the missing value can be entered.


A cell can be deleted by making the cell active and then choosing Edit > Delete Cells (or pressing the Del key). Rows of values can also be inserted or deleted in a similar manner. The menu command to insert a row is only functional when the data window is active and a row is active. To make a row active, click the row header (i.e., the row number). An empty row will be added above the active row in the Data window and the remaining rows will be moved down.

Menu Commands
Choose: Editor > Insert Row

To print your data, choose File > Print Worksheet, make the appropriate selections, and click OK.

Suppose we wish to copy a column into another column. We can use the COPY command instead of reentering the data.

Menu Commands
Choose: Data > Copy > Copy columns to columns
Enter: Copy from columns: TEST1
Select: Store Copied Data in columns (choose Column from the drop-down list)
Click: OK

To erase an entire column, we use the ERASE command.

Menu Commands
Choose: Data > Erase Variables
Enter: Columns and constants: select appropriate variable
Click: OK

SAVING YOUR WORK
A MINITAB project contains all of your work: the data, text output from the commands, graphs, and more. When you save a project, you save all of your work at once. When you open a project, you can pick up right where you left off. The project's many pieces can be handled individually. You can create data, graphs, and output from within MINITAB. You can also add data and graphs to the project by copying them from files. The contents of most windows can be saved and printed separately from the project, in a variety of formats. You can also discard a worksheet or graph, which removes the item from the project without saving it. Let's save the project and name it Intro. Be sure to note where you are saving it.


To open, save, or close a project
To open a new project, choose File > New, click Project, and click OK.
To open a saved project, choose File > Open Project.
To save a project, choose File > Save Project.
To close a project, you must open a new project, open a saved project, or exit MINITAB.

RETRIEVING A FILE
To retrieve the project that we saved in the previous session:

Menu Commands
Choose: File > Open Project
Click: Look in drop-down list arrow
Locate the file
Double-click: INTRO.MPJ
Click: Open

The data window now displays the test data you saved previously.

A CD-ROM accompanies Johnson/Kuby's Elementary Statistics, 11/e. Follow the instructions that accompany the disk for use on your computer.

ASSIGNMENT:
1. Create a data file on your disk that consists of the heights of 15 of your classmates (in column 1) and their weights (in column 2).
2. Retrieve the data file created in #1 above, and produce a paper copy (commonly called 'hard copy') to hand in.
3. Retrieve the data file for Exercise 2.23 from the Student's Suite CD-ROM that accompanies the Johnson/Kuby text, and print a hard copy to hand in.


CHAPTER 2 - LAB SESSION 1 GRAPHIC PRESENTATION OF UNIVARIATE DATA


INTRODUCTION: Graphically representing data is one of the most helpful ways to become acquainted with the sample data. In this lab you will use MINITAB to present data graphically. You will be analyzing data using six types of graphs: Pie Charts, Pareto diagrams, Dot plots, Stem-and-leaf displays, Histograms, and cumulative (relative) frequency plots (Ogives).

GRAPHIC PRESENTATIONS OF DATA
There are several ways to display a picture of the data. These graphical displays help us get acquainted with the data and begin to get a feel for how the data is distributed. To see what is available to you, use the menu bar to select Graph. Note the different types of graphs that are listed there. We will use the menu bar to make our selections.

PIE CHARTS
Circle graphs (pie diagrams) show the amount of data that belong to each category as a proportional part of a circle. Consider Example 2.1. We are instructed to construct a pie chart, with data presented as a frequency distribution. Enter the data into the worksheet.

Menu Commands
Choose: Graph > Pie Chart
Click: Chart value from a table
Enter: Categorical variable: C1
       Summary variables: C2
Select: Labels > Title/Footnotes
Enter: Household populations
Select: Slice Labels > select desired labels
Click: OK, then OK


The chart will come up in a new Graph window.

PARETO DIAGRAMS
In attempting to get a pictorial representation of data, we must decide what type of graphic display would best present the data and their distribution. Consider Exercise 2.11. We are instructed to construct a Pareto diagram in this instance since this is a quality control application. In constructing a Pareto diagram for Exercise 2.11, we must enter words, called text data, in column 1 to indicate the category of the defect, and the corresponding frequency for each defect in column 2.

After entering the data in the worksheet and naming the columns, use the following commands to construct the Pareto diagram:


Menu Commands
Choose: Stat > Quality Tools > Pareto Chart
Click: Chart defects table
Enter: Labels in: C1
       Frequencies in: C2
       Combine all defects after 99% into one bar
Options: Title: Clothing Defects
Click: OK, then OK

Reminder: Use the Tab key to cycle through the prompts in the dialog box.


As you have already seen, each time you make a selection, MINITAB displays the corresponding command in the History folder. The History folder provides a permanent record of all the commands issued within the project. Now look at the last three commands that appear before the Pareto diagram. The display includes a subcommand, which provides additional information about the preceding command. Subcommands represent the options you select in dialog boxes.

DOT PLOTS
Dot plots are a quick and efficient way to get a preliminary understanding of the distribution of your data. A dot plot gives a picture of the data and also sorts the data into numerical order. Enter the data for Exercise 2.19 into column 3 of the current worksheet, and name the column PtsScrd. Construct a dotplot of the data listed in C3:

Menu Commands
Choose: Graph > Dotplot
Enter: Variables: PtsScrd
Select: One Y/Simple
Click: OK


Situations arise in which we wish to compare data from different populations. This can be accomplished with what is called a side-by-side dotplot. Consider Exercise 2.184. Retrieve the data from the Student's Suite CD-ROM (EX02-184). The commands to construct the multiple dotplot are as follows:

Menu Commands
Choose: Graph > Dotplot
Select: Simple
Click: Multiple graphs
Choose: In separate panels of the same graph
Graph: C1 C2
Click: OK

The menu selections look as follows:


The dot plot appears in a new window.

STEM AND LEAF DISPLAY
To illustrate the commands necessary to construct a stem-and-leaf display, let's use the data in column 3 (points scored) from the previous worksheet.

Menu Commands
Choose: Graph > Stem-and-Leaf
Enter: Variable: C3
Uncheck: Trim Outliers
Click: OK


The stem and leaf chart appears in the session window. Which part of the diagram is the stem? the leaf? What do the numbers to the left indicate?

Try doing a stem-and-leaf for this data choosing various increment values. Notice, originally we did not specify an increment. What was MINITAB's response? How does the diagram change for the different increments you chose? Which is more informative?
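To see what the increment does, the display can be imitated outside MINITAB. The following is a minimal Python sketch, not MINITAB's own algorithm: the function name stem_and_leaf is hypothetical, the leaf unit is assumed to be 1, and the data are the ten test scores typed into column 2 in Chapter 1 (the PtsScrd data for Exercise 2.19 are not reproduced in this guide).

```python
from collections import defaultdict


def stem_and_leaf(values, increment=10):
    """Group two-digit values into display lines that are `increment` wide.

    Each key is the smallest value a line can hold; each leaf is the
    units digit of a data value (leaf unit assumed to be 1).
    """
    rows = defaultdict(list)
    for value in sorted(values):
        line_start = (value // increment) * increment
        rows[line_start].append(value % 10)
    return dict(rows)


# Test scores as first typed into column 2 in Chapter 1 (assumed sample data).
scores = [78, 94, 93, 81, 75, 62, 58, 50, 80, 79]

for increment in (10, 5):
    print(f"Increment = {increment}")
    for line_start, leaves in stem_and_leaf(scores, increment).items():
        stem = line_start // 10
        print(f"{stem:>3} | {''.join(str(leaf) for leaf in leaves)}")
```

With an increment of 10 each stem gets one line; with an increment of 5 each stem is split into a low line (leaves 0 to 4) and a high line (leaves 5 to 9). Unlike MINITAB, this sketch omits the column of counts on the left and skips empty lines.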

HISTOGRAMS
Histograms are more useful for large sets of data. We expect the histogram of a sample to be similar to that of the population. To illustrate the many options under the HISTOGRAM command, let's use the data in Exercise 2.39 (on the Student's Suite CD). The HISTOGRAM command separates the data into intervals on the x-axis and draws a bar for each interval whose height, by default, is the number of observations (or frequency) in the interval.

Menu Commands
Choose: Graph > Histogram
Choose: Simple
Click: OK
Enter: Graph variables: C1
Click: Scale
Select: the Y-Scale Type tab
Choose: Frequency or Percent

[Figure: Histogram of GolfScor; x-axis: GolfScor, y-axis: Frequency]

With the most basic of histograms you do not get the detail necessary for a proper interpretation of the data. To get the following enhanced histogram, under Labels click Show data labels and select Use y-value data labels radio button. Click OK. OK
[Figure: Histogram of GolfScor with the frequency shown as a data label above each bar; x-axis: GolfScor, y-axis: Frequency]

Experiment with the many options within the HISTOGRAM command. Which of the options give you a clearer representation of the relationships within the data?


OGIVES
To construct an ogive, the class boundaries must be listed in C1 and the cumulative percentages listed in C2. Let's use Exercise 2.55 in your text as an example. The data is presented as a grouped frequency distribution. You need this same information presented as a cumulative relative frequency distribution:

Class Boundaries    Frequencies    Cumulative Relative Frequency
 0 <= x <  4             4          4/50, or 0.08
 4 <= x <  8             8         12/50, or 0.24
 8 <= x < 12             8         20/50, or 0.40
12 <= x < 16            20         40/50, or 0.80
16 <= x < 20             6         46/50, or 0.92
20 <= x < 24             3         49/50, or 0.98
24 <= x <= 28            1         50/50, or 1.00

In a new worksheet, enter the class boundaries in C1 and the cumulative percentages in C2. Be sure to enter 0 for the percent paired with the lower boundary of the first class and pair each cumulative percentage with the class upper boundary.
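The cumulative relative frequencies can be checked before typing them into C2. The following Python sketch is only an illustration done outside MINITAB; the variable names are assumptions, while the class boundaries and frequencies are the ones given for Exercise 2.55 above.

```python
from itertools import accumulate

# Upper class boundaries and class frequencies from the distribution above.
upper_boundaries = [4, 8, 12, 16, 20, 24, 28]
frequencies = [4, 8, 8, 20, 6, 3, 1]

total = sum(frequencies)
cumulative_counts = list(accumulate(frequencies))
cumulative_rel_freq = [count / total for count in cumulative_counts]

# Ogive points: start at (lower boundary of the first class, 0), then pair
# each cumulative value with the upper boundary of its class.
ogive_x = [0] + upper_boundaries
ogive_y = [0.0] + cumulative_rel_freq

for x, y in zip(ogive_x, ogive_y):
    print(f"{x:>2}  {y:.2f}")
```

The two printed columns correspond to what goes into C1 and C2: eight rows, beginning with the pair (0, 0.00).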

To plot an ogive:

Menu Commands
Choose: Graph > Scatterplot
Choose: With Connect Line
Click: OK
Enter: Y variables: C2
       X variables: C1
Click: Labels
Enter: Title: KSW TEST SCORES
Click: OK

[Figure: ogive titled KSW TEST SCORES; x-axis: C1 (0 to 30), y-axis: C2 (0.0 to 1.0)]

ASSIGNMENT: Do Exercises 2.7, 2.19, 2.43, 2.48, 2.54 in your text.


CHAPTER 2 - LAB SESSION 2 NUMERICAL PRESENTATION OF UNIVARIATE DATA


INTRODUCTION: The basic idea of descriptive statistics is to describe a set of data in a variety of abbreviated ways. In this lab you will investigate measures of central tendency and dispersion. The box-and-whiskers display, a graphical display of the 5-number summary of a set of data, will also be introduced.

MEASURES OF CENTRAL TENDENCY AND DISPERSION
Measures of central tendency and variation are the foundation of descriptive statistics, but most of these formulas are quite tedious to compute, even with a calculator. Fortunately, we can find a number of commonly used descriptive statistics using just a single command. Enter the data in Exercise 2.76 into C1. Get a dot plot of your data and visually approximate the "center".
[Figure: Dotplot of C1; x-axis: C1]

Calculate the mean and median using the following commands.

Menu Commands
Choose: Calc > Column Statistics
Select: Median
Enter: Input variable: C1
Click: OK


Median of C1
Median of C1 = 7

Choose: Calc > Column Statistics Select: Mean Click: OK


Mean of C1
Mean of C1 = 6.93333

If you are interested in a variety of statistics, including median and mean, these values can be found more easily using the following:

Menu Commands Choose: Stat > Basic Statistics > Display Descriptive Statistics Enter: Variables: C1 Click: OK

The output appears in the session window.


Descriptive Statistics: C1
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C1        15   0  6.933    0.452  1.751    4.000  6.000   7.000  8.000   11.000

We can also find values by entering a formula and storing the result in a column. For example, the midrange = (high + low)/2. To do this in MINITAB we would do the following:

Select: Calc > Calculator
Store Result In: midrange
Type in the expression: ( MAX(C1) + MIN(C1) ) / 2

There is now a new column in the worksheet named midrange, which contains the result of the expression.

midrange 7.5

Visually locate the three calculated centers on the dot plot. Notice the three measures of central tendency are approximately the same. How well did you visually approximate the center?

Now, place the values of C1 plus 4 into column C2, do a dot plot, visually locate the 'center', then determine the mean, median and midrange.

Select: Calc > Calculator
Store Result In: C2
Type in the expression: C1 + 4
[Figure: Dotplot of C2; x-axis: C2]

Descriptive Statistics: C2
Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
C2        15   0  10.933    0.452  1.751    8.000  10.000  11.000  12.000   15.000

midrange 11.5

How did the three measures of central tendency (mean, median, and midrange) change?

Next, place the values of C1 times 3 into C3, and follow the procedure above.

[Figure: Dotplot of C3; x-axis: C3]

Descriptive Statistics: C3
Variable   N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
C3        15   0  20.80     1.36   5.25    12.00  18.00   21.00  24.00    33.00

midrange 22.5

Compare the three measures of central tendency for the columns of data C1, C2 and C3. How and why did a change in the measures occur? If a different transformation was performed (such as dividing each entry in C1 by 2) could you make an educated guess about the effect on these three measures?
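One way to test a conjecture about such transformations is to try it on a small data set. The sketch below uses Python rather than MINITAB and a hypothetical five-value sample (not the Exercise 2.76 data); the helper names midrange and centers are likewise assumptions made for illustration.

```python
from statistics import mean, median


def midrange(values):
    """Midrange = (high + low) / 2, as defined earlier in this lab."""
    return (max(values) + min(values)) / 2


def centers(values):
    """Return the (mean, median, midrange) of a list of numbers."""
    return mean(values), median(values), midrange(values)


# Hypothetical sample, chosen only so the arithmetic is easy to follow.
c1 = [2, 3, 5, 8, 12]
c2 = [x + 4 for x in c1]   # same idea as the Calculator expression C1 + 4
c3 = [x * 3 for x in c1]   # same idea as the Calculator expression C1 * 3

for name, column in (("C1", c1), ("C1 + 4", c2), ("C1 * 3", c3)):
    print(name, centers(column))
```

Changing the two list comprehensions (for example to x / 2) lets you check a guess about any other transformation of the same kind.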

Consider Applied Example 2.11 in the text. Retrieve the data and do a dot-plot and calculate the mean, median and midrange. What is there about the distribution of these ten data values that causes these three averages to be so different?

[Figure: Dotplot of AnnIncom; x-axis: AnnIncom]

Descriptive Statistics: AnnIncom


Variable    N  N*   Mean  SE Mean  StDev  Minimum     Q1  Median     Q3  Maximum
AnnIncom   10   0  35400     2413   7631    25500  31500   33375  37875    54000

midrange 39750

Compare the standard deviations for each of the previous four examples, along with how similar or how different the three measures of central tendency were. Can we use the standard deviation to predict whether we expect these three measures of central tendency to be quite similar or quite different?

FREQUENCY DISTRIBUTIONS
When the sample data are in the form of a frequency distribution, we can still use MINITAB to describe the distribution. The class marks need to be listed in one column with the corresponding frequencies in another. Start a new worksheet (Choose: File > New > Worksheet), and enter the following information, where X represents the number of radios in a household and Freq is the number of households having X radios:

X       1    2    3    4    5    6    7
Freq   20   35  100   90   65   40    5

Name C1 as X and C2 as Freq. Use the calculator to create C3 as xf (the expression C1*C2) and C4 as x2f (the expression C1*C1*C2). Determine the mean using the following expression:

SUM(C3)/SUM(C2)

mean 3.80282

Use the calculator to evaluate each of the following expressions. The first gives the sum of squares (stored here under the name "sum of x^2"), the second the variance, and the third the standard deviation:

SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2)
sum of x^2 676.197

(SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2))/(SUM(C2)-1)
variance 1.91016

SQRT((SUM(C4)-(SUM(C3)*SUM(C3))/SUM(C2))/(SUM(C2)-1))
stddev 1.38209

*Reminder: in the case of a grouped frequency distribution enter the class marks in one column and the corresponding frequencies in another.
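The same arithmetic can be reproduced outside MINITAB as a check on the calculator expressions. This is a minimal Python sketch using the radio data from the table above; the variable names are assumptions chosen to mirror the column names xf and x2f.

```python
from math import sqrt

# X = number of radios per household, Freq = number of households.
x = [1, 2, 3, 4, 5, 6, 7]
freq = [20, 35, 100, 90, 65, 40, 5]

n = sum(freq)                                            # SUM(C2)
sum_xf = sum(xi * fi for xi, fi in zip(x, freq))         # SUM(C3)
sum_x2f = sum(xi * xi * fi for xi, fi in zip(x, freq))   # SUM(C4)

mean = sum_xf / n
ss_x = sum_x2f - sum_xf ** 2 / n    # sum of squares, SS(x)
variance = ss_x / (n - 1)           # sample variance
std_dev = sqrt(variance)            # sample standard deviation

print(f"mean     {mean:.5f}")
print(f"SS(x)    {ss_x:.3f}")
print(f"variance {variance:.5f}")
print(f"stddev   {std_dev:.5f}")
```

The four printed values should agree with the MINITAB results listed above to the number of digits shown there.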

BOX-AND-WHISKER DISPLAY
The boxplot (MINITAB's name for the box-and-whisker display) is a simple graph that gives a graphic 5-number summary. Information about the center, dispersion, and skewness of a data set will be illustrated. We will use the data from Ex 2.184 from Lab 1.

Menu Commands
Choose: Graph > Boxplot
Choose: One Y, Simple
Click: OK
Enter: Graph variables: Atmospheric (then Chemical)
Click: OK

A rectangle is constructed between the two quartiles, with a line across the box indicating the location of the median. The box encloses the middle half of the data. The whiskers extend in either direction to indicate the maximum and minimum values.

The BOXPLOT command can also be used to produce a side-by-side boxplot, for comparison among the variables.

Menu Commands
Choose: Graph > Boxplot
Choose: Multiple Y's, Simple
Enter: Graph variables: Atmospheric, Chemical
Click: OK

Consider again the salary data presented in Application 2 - 2. Retrieve the data from the Student Suite CD and perform a BoxPlot of the data in column A.

[Figure: boxplot titled Annual Income; y-axis: AnnIncom (25000 to 55000)]

The asterisk ( * ) included in the boxplot indicates an outlier, a data value that is far removed from the rest of the data.

ASSIGNMENT: Do Exercises 2.76, 2.118, 2.125, 2.126 and 2.128 in your text.


CHAPTER 3 -LAB SESSION 1 PRESENTATION OF BIVARIATE DATA


INTRODUCTION: It is frequently interesting to view the relationship of two variables. In this lab we will see how MINITAB can help us plot bivariate data and discover some trends in the relationship. We can set up the data as ordered pairs, with the independent variable as the x and the dependent variable as the y.

TABULAR PRESENTATION OF BIVARIATE DATA
We can arrange the data resulting from two qualitative variables in a cross tabulation or contingency table. These tables often show relative frequencies (percentages) that can be based on the entire sample, or on the subsample classification (either a row or a column). Let's use the data in the Highway Speed Limits table in Exercise 3.6. Retrieve the data (EX03006). Note the data is arranged as follows: column C1 is titled State, column C2 is titled Cars, and column C3 is titled Trucks. To construct a cross-tabulation table of the two variables, vehicle type and maximum speed limit:

Menu Commands
Choose: Stat > Tables > Cross Tabulation and Chi-Square
Enter: Categorical Variables: For Rows: C2
       For Columns: C3
Select: Counts
Click: OK


Results for: Ex03-06.MTW

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

        55    60    65    70    75   All
55       1     0     0     0     0     1
65       2     1    17     0     0    20
70       2     1     1    12     0    16
75       0     0     1     1    11    13
All      5     2    19    13    11    50

Cell Contents: Count

Now let's do the same thing, only this time select the Total percents box.

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

        55    60    65    70    75   All
55       2     0     0     0     0     2
65       4     2    34     0     0    40
70       4     2     2    24     0    32
75       0     0     2     2    22    26
All     10     4    38    26    22   100

Cell Contents: % of Total

This table would be useful in answering questions such as:

What percent of the states have a maximum speed limit of 75 for both cars and trucks? (22%)
What percent of the states have different maximums for cars and trucks? (4% + 2% + 4% + 2% + 2% + 2% + 2% = 18%)

We can repeat the procedure for Row percents and Column percents. Using Row Percents:
Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

           55      60      65      70      75     All
55     100.00    0.00    0.00    0.00    0.00  100.00
65      10.00    5.00   85.00    0.00    0.00  100.00
70      12.50    6.25    6.25   75.00    0.00  100.00
75       0.00    0.00    7.69    7.69   84.62  100.00
All     10.00    4.00   38.00   26.00   22.00  100.00

Cell Contents: % of Row

Questions easily answered using this table would be:

What percent of the states whose maximum speed for cars is 65 have a maximum speed for trucks of 60? (5.00%)
What percent of the states whose maximum speed for cars is 55 have a higher maximum speed for trucks? (0%)

Using Column Percents:

Tabulated statistics: Cars, Trucks

Rows: Cars   Columns: Trucks

           55      60      65      70      75     All
55      20.00    0.00    0.00    0.00    0.00    2.00
65      40.00   50.00   89.47    0.00    0.00   40.00
70      40.00   50.00    5.26   92.31    0.00   32.00
75       0.00    0.00    5.26    7.69  100.00   26.00
All    100.00  100.00  100.00  100.00  100.00  100.00

Cell Contents: % of Column

What type of questions would easily be answered using this table?
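The three kinds of percentages differ only in the total each count is divided by. The Python sketch below illustrates this outside MINITAB, using the counts from the first cross-tabulation table; the variable names are assumptions for illustration.

```python
# Speed-limit categories and counts from the Count table above
# (rows = Cars limit, columns = Trucks limit).
cars = [55, 65, 70, 75]
trucks = [55, 60, 65, 70, 75]
counts = [
    [1, 0, 0, 0, 0],
    [2, 1, 17, 0, 0],
    [2, 1, 1, 12, 0],
    [0, 0, 1, 1, 11],
]

grand_total = sum(sum(row) for row in counts)
row_totals = [sum(row) for row in counts]
col_totals = [sum(col) for col in zip(*counts)]

# % of Total: divide by the grand total.
total_pct = [[100 * c / grand_total for c in row] for row in counts]
# % of Row: divide by the total of the count's own row.
row_pct = [[100 * c / rt for c in row] for row, rt in zip(counts, row_totals)]
# % of Column: divide by the total of the count's own column.
col_pct = [[100 * c / ct for c, ct in zip(row, col_totals)] for row in counts]

# Percent of states with the same limit for cars and trucks, and the rest.
same = sum(
    total_pct[i][j]
    for i, car in enumerate(cars)
    for j, truck in enumerate(trucks)
    if car == truck
)
different = 100 - same

print(f"Same limit: {same:.0f}%  Different limits: {different:.0f}%")
print(f"Cars 65 with Trucks 60 (% of row): {row_pct[1][1]:.2f}")
print(f"Trucks 65 with Cars 65 (% of column): {col_pct[1][2]:.2f}")
```

Reading a question carefully for its "of those whose..." phrase tells you which total, and therefore which table, applies.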

SCATTER DIAGRAMS
To do a scatter diagram illustrating the relationship between two quantitative variables, we will enter the data into two columns. For this illustration the data from Table 3-10 will be used (TA03-10).

Menu Commands
Choose: Graph > Scatterplot
Choose: Simple
Click: OK
Enter: Y variables: C2
       X variables: C1
Select: Labels > Title: your title
Click: OK

[Figure: scatterplot titled Data for Push-ups and Sit-ups; x-axis: Sit_Ups, y-axis: Push_Ups]

For the person(s) that did 35 push-ups, how many sit-ups were they able to do? How many push-ups and sit-ups were done by the person represented by the dot in the upper right corner?

To compare these two variables in a different way, let's do a side-by-side box-and-whisker display.

[Figure: Boxplot of Push_Ups, Sit_Ups; y-axis: Data (10 to 60)]

Compare the two types of exercises. Which indicates greater range of ability? Which exercise do most of those sampled find more difficult to do (as measured by number done)?

ASSIGNMENT: Do Exercises 3.11, 3.25 in your text


CHAPTER 3 - LAB SESSION 2 CORRELATION AND REGRESSION


INTRODUCTION: Not only is it important to analyze single variables, but frequently one needs to determine if and how two variables are related. The correlation coefficient is a measure of the strength of the linear relationship between two variables. In these exercises you will use MINITAB to analyze this statistic, and these exercises will also give you a very brief introduction to linear regression.

INVESTIGATIONS OF THE CORRELATION COEFFICIENT
The data set below is a sample of weight and waist size for 11 women. You will use that data to estimate the correlation between a woman's weight and her waist size. Once that value has been determined, you will show that this value is independent of the scale of the two variables.

Weights and Waist Sizes
weight (lbs): 110 143 120 127 143 111 137 154 123 104 140
waist (ins):   22  29  27  26  27  24  28  28  26  25  23

Enter the data with C1 = 'weight' and C2 = 'waist'. Get a scatter diagram of the bivariate data set. The variable 'weight' should be on the x-axis and 'waist' on the y-axis.

Menu Commands
Choose: Graph > Scatterplot
Select: Simple
Enter: Y variables: C2
       X variables: C1
Click: Labels: Title: Weight and Waist Size
Click: OK
[Figure: scatterplot titled Weight and Waist Size; x-axis: weight, y-axis: waist]

Calculate the descriptive statistics of each variable.

Descriptive Statistics: weight, waist

Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
weight    11   0  128.36     4.89  16.21   104.00  111.00  127.00  143.00   154.00
waist     11   0  25.909    0.667  2.212   22.000  24.000  26.000  28.000   29.000

Calculate the correlation coefficient, r.

Menu Commands
Choose: Stat > Basic Statistics > Correlation
Enter: Variables: C1 C2
Click: OK

Correlations: weight, waist


Pearson correlation of weight and waist = 0.603

QUESTIONS:
1. Would you say that the variables were positively or negatively correlated? Is there a strong or weak correlation?
2. If you were to add an equal amount of weight to each woman (assume no change in waist size), would the value of r, the correlation coefficient, change? Test your conjecture by adding 25 lbs. to each woman's weight and recalculating r. NOTE: Assign the results of C1 + 25 to C3. Then calculate the correlation for C3 against C2.
3. If you were to change the scale of the variables (weight to kg and waist size to meters), would the value of r change? Test your conjecture by multiplying 'weight' by 0.453 and 'waist' by 0.0254 and recalculating r. How will the scatter diagram change when you change the scales?
4. The last observation in your data set was for a model known for her especially thin figure. If you eliminated it from the data set, how much would r change? Would you say that the statistic, r, is sensitive to extreme observations? Explain.
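The conjectures about adding a constant and changing units can also be tested outside MINITAB. The following Python sketch is one possible check, not part of the lab itself: the helper pearson_r is a hypothetical name implementing the usual formula r = SS(xy) / sqrt(SS(x) * SS(y)), and the data are the 11 weight and waist values listed above.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient: SS(xy) / sqrt(SS(x) * SS(y))."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    ss_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return ss_xy / sqrt(ss_x * ss_y)


weight = [110, 143, 120, 127, 143, 111, 137, 154, 123, 104, 140]  # lbs
waist = [22, 29, 27, 26, 27, 24, 28, 28, 26, 25, 23]              # inches

r = pearson_r(weight, waist)

# Add 25 lbs to every weight (same idea as storing C1 + 25 in C3).
r_shifted = pearson_r([w + 25 for w in weight], waist)

# Change units: pounds to kilograms and inches to meters.
r_scaled = pearson_r([w * 0.453 for w in weight], [s * 0.0254 for s in waist])

print(f"original {r:.3f}  shifted {r_shifted:.3f}  rescaled {r_scaled:.3f}")
```

Deleting the last element of each list and calling pearson_r again is a quick way to explore the effect of removing a single observation.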

INTERPRETATION OF THE CORRELATION COEFFICIENT
In this next section, we will be examining some scatter diagrams of computer-generated data to gain a more thorough understanding of just what the value of the correlation coefficient means. For each pair of variables, you will calculate r and look at the corresponding scatter diagram.

Enter the values from 0 to 50 for your first variable. (Reminder: Menu Commands Choose: Calc > Make Patterned Data > Simple Set of Numbers.) Name column C1 X and name C2 Random.

Generate a set of random numbers.

Menu Commands
Choose: Calc > Random Data > Normal
Enter: Generate 51 rows of data
       Store in column(s): C2
Click: OK

Note: By default the 51 random numbers are from the normal distribution with mean 0 and standard deviation 1.

Get a scatter diagram of the two variables and calculate r.

[Figure: Scatterplot of Random vs X; x-axis: X, y-axis: Random]

When comparing your output to that presented here, remember you are working with random data and there may be variation in results.
Correlations: X, Random
Pearson correlation of X and Random = -0.06

Generate a set of y values which has no random component using the expression 2 + 0.5 * C1 and store results in column 3. Get a scatter plot of C3 versus C1 and determine the value of r.

[Figure: Scatterplot of C3 vs X; x-axis: X, y-axis: C3]

Correlations: C3, X
Pearson correlation of C3 and X = 1.000

Generate a set of y values that have a small random component and repeat above procedure. Use the expression 2 + 0.5 * C1 + C2 storing results in C4

Correlations: C4, X
Pearson correlation of C4 and X = 0.989
[Figure: Scatterplot of C4 vs X; x-axis: X, y-axis: C4]

Generate a set of y values that are negatively correlated, and repeat the above procedure. Use the expression 2 - 0.5 * C1 + C2, storing the result in C5.

[Figure: Scatterplot of C5 vs X]

Correlations: C5, X
Pearson correlation of C5 and X = -0.990

Generate a set of y values that have a large random component and repeat previous procedure. Use the expression 5 + 0.5 * C1 + 2 * C2 storing the results in C6
[Figure: Scatterplot of C6 vs X]

Correlations: C6, X
Pearson correlation of C6 and X = 0.958

Generate a set of y values that are non-linearly related to x. Use the expression SQRT(0.1*C1), storing the results in C7.

[Figure: Scatterplot of C7 vs X]

Correlations: C7, X
Pearson correlation of C7 and X = 0.974

Generate a second set of y values which are related but not linearly related to x and repeat previous procedure. Use the expression 9 - (C1 - 25)**2 storing the results in C8
[Figure: Scatterplot of C8 vs X]

Correlations: C8, X
Pearson correlation of C8 and X = -0.000
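
The correlations reported above for the non-random columns can be reproduced outside MINITAB. The following is a minimal Python sketch, not part of the lab itself; the helper name pearson_r and the variable names are illustrative assumptions. It rebuilds C3, C7 and C8 from the expressions given in the text and computes r directly from its definition.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

x = list(range(51))                     # C1: the values 0 through 50
c3 = [2 + 0.5 * v for v in x]           # exact straight line
c7 = [math.sqrt(0.1 * v) for v in x]    # curved, but always increasing
c8 = [9 - (v - 25) ** 2 for v in x]     # parabola symmetric about x = 25

r_c3 = pearson_r(x, c3)
r_c7 = pearson_r(x, c7)
r_c8 = pearson_r(x, c8)
print(round(r_c3, 3), round(r_c7, 3), round(r_c8, 3))
```

The columns that include the random component C2 are left out because their values differ from run to run.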

QUESTIONS:
1. Using the results from above, what type of relationship can you determine between the correlation coefficient and the scatter plot? What type of pattern do you see in the scatter diagram when r is close to zero? When r is close to one? What is the pattern like when r is negative?
2. Does r being close to zero imply that the two variables are unrelated? Check C8 versus C1 before answering this question.

LINEAR REGRESSION
We will illustrate the default output generated by the Minitab Regression command using Exercise 3.75. Retrieve the data from text Exercise 3.75 (EX03-075). Get a feeling for whether years of schooling and median usual weekly earnings are correlated by doing a scatterplot.

Determine the linear correlation coefficient for this data.

Calculate the line of best fit.
Menu Commands
Choose: Stat > Regression > Regression
Enter: Response: C2
       Predictors: C3
Click: OK

Notice that a great deal of information is generated, but we only need the first two lines. We can also do the scatterplot with the regression line included. Just choose Scatterplot With Regression.

ASSIGNMENT: Do Exercises 3.20, 3.38, 3.45, 3.59, 3.99 in your text.


CHAPTER 5 - LAB SESSION RANDOM NUMBERS AND PROBABILITY


INTRODUCTION: This lab session is designed to introduce you to random numbers and their use in simulating experiments. The outcomes of many everyday events cannot be predicted with certainty, but it is possible to have an idea of what outcomes are possible. The theory of probability was developed to help analyze experiments whose outcomes are uncertain. We can use MINITAB to simulate certain experiments such as flipping a coin or rolling a die.

RANDOM NUMBERS
You were introduced to the RANDOM command in Chapter 3 - Lab Session 2. Remember, if you select the normal distribution it will generate a sequence of random numbers from the normal distribution with a mean of 0 and a standard deviation of 1.0. Different distributions require different parameters. You can specify these parameters depending on the distribution you choose.

You can specify the numbers to be integers within a certain range using the Integer subcommand. For example, if we wanted to simulate the outcomes for tossing a coin 100 times we would use the following commands:
Menu Commands
Choose: Calc > Random Data > Integer
Enter: Generate 100 rows of data
       Store in column(s): C1
       Minimum value: 1
       Maximum value: 2
Click: OK

Take a look at the results in column 1 of the worksheet. Give the relative frequency for a head (1) and a tail (2) based on the MINITAB output for C1. Rather than counting by hand, let the computer do the work:

Menu Commands
Choose: Stat > Tables > Tally Individual Variables
Enter: Variables: C1
Select: Percents
Click: OK
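
For comparison, the same coin-toss simulation and tally can be sketched in Python. This is an illustration only: the function name tally and the seed value are assumptions made here, and the counts will differ from MINITAB's because the random number streams differ.

```python
import random
from collections import Counter

def tally(values):
    """Return {outcome: (count, percent)}, similar to Tally with Percents."""
    counts = Counter(values)
    total = len(values)
    return {k: (counts[k], 100.0 * counts[k] / total) for k in sorted(counts)}

random.seed(1)  # arbitrary seed so that the run is repeatable
tosses = [random.randint(1, 2) for _ in range(100)]  # 1 = head, 2 = tail
coin_tally = tally(tosses)
for outcome, (count, percent) in coin_tally.items():
    print(outcome, count, f"{percent:.2f}")
```

Changing the range in randint to 1 through 6 gives simulated rolls of a die.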

Questions:
1. What commands would be used for simulating the rolling of a die 50 times?
2. Restart MINITAB and place 50 simulated rolls into each of columns C1 and C2. Give the relative frequency for the outcomes 1, 2, 3, 4, 5, and 6 based on the MINITAB output.

THE LAW OF LARGE NUMBERS
To see how the law of large numbers works, we need to create a third column with the sums of the two dice rolls simulated by columns C1 and C2 (created in the question set above). One way to do this is with Calc > Calculator, storing the result of the expression C1 + C2 in C3. Then tally the sums:

Menu Commands
Choose: Stat > Tables > Tally Individual Variables
Enter: Variables: C3
Select: Counts, Percents, Cumulative Counts, Cumulative Percents
Click: OK

The results appear as the following table:

Tally for Discrete Variables: C3

 C3  Count  CumCnt  Percent  CumPct
  2      3       3     6.00    6.00
  3      2       5     4.00   10.00
  4      7      12    14.00   24.00
  5      4      16     8.00   32.00
  6      6      22    12.00   44.00
  7      6      28    12.00   56.00
  8     11      39    22.00   78.00
  9      5      44    10.00   88.00
 10      4      48     8.00   96.00
 11      2      50     4.00  100.00
 N=     50

Interpreting the results:
1) What is the observed probability of obtaining a sum of 2 on the dice?
2) What is the observed probability of obtaining a sum of 7 on the dice?
3) What is the observed probability of obtaining a sum of 11 on the dice?

Using similar commands, create C4 and C5, each containing 500 simulated rolls of a single die, and C6 containing the sums of these 500 simulated rolls of 2 dice.
4) Answer the above three questions about C6. How do the answers compare to the theoretical probability? (Use both numerical and graphical evidence.)
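
The comparison between observed and theoretical probabilities can also be sketched in Python. This is only an illustration; the function names, the seed, and the extra run of 50,000 rolls are assumptions added here. The theoretical probabilities come from listing all 36 equally likely outcomes for two dice.

```python
import random
from collections import Counter
from fractions import Fraction

def theoretical_sum_probs():
    """Exact probability of each sum from the 36 equally likely outcomes."""
    counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
    return {s: Fraction(c, 36) for s, c in counts.items()}

def simulate_sums(n_rolls, rng):
    """Simulate n_rolls rolls of two dice and return the list of sums."""
    return [rng.randint(1, 6) + rng.randint(1, 6) for _ in range(n_rolls)]

rng = random.Random(2)  # arbitrary seed so that the run is repeatable
theory = theoretical_sum_probs()
print("theory", {s: float(theory[s]) for s in (2, 7, 11)})
for n_rolls in (50, 500, 50000):
    sums = Counter(simulate_sums(n_rolls, rng))
    observed = {s: sums[s] / n_rolls for s in (2, 7, 11)}
    print(n_rolls, observed)
```

As the number of rolls grows, the observed relative frequencies should settle closer to the theoretical values, which is the behaviour the law of large numbers describes.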

THE BINOMIAL PROBABILITY DISTRIBUTION
Consider the following situation: Suppose you bought four light bulbs. The manufacturer claims that 85% of its bulbs will last at least 700 hours. If the manufacturer is right, what are the chances that all four of your bulbs will last at least 700 hours? That three will last at least 700 hours, but one will fail before that?

Consider another situation. You've somehow gotten enrolled in a class in advanced Greek Mythology. You don't know anything about mythology, but you're about to take a pop quiz. You'll have to guess on every question. It's a multiple-choice test; each of the 20 questions has 3 possible answers. To pass you must get at least 12 correct. What are the chances that you'll pass?

How would you answer the above questions? MINITAB can help us with this by using the PDF (Probability Density Function) command to generate binomial probabilities. (Remember what a binomial distribution requires.)

Calculating Binomial Probabilities with PDF
To obtain the probability of each possible outcome for a binomial distribution with n = 10 and p = 0.1, you will use the following commands. To issue menu commands in this case, you must first create a column with the values for which you wish to find the corresponding probabilities. So before issuing the following menu commands, restart MINITAB, then create a column (say, C1) of patterned data containing the integers from 0 to 10.
Menu Commands
Choose: Calc > Probability Distributions > Binomial
Select: Probability
Enter: Number of trials: 10
       Probability of success: .1
       Input column: C1
Click: OK

This results in the following table:

Probability Density Function
Binomial with n = 10 and p = 0.1

  x  P( X = x )
  0    0.348678
  1    0.387420
  2    0.193710
  3    0.057396
  4    0.011160
  5    0.001488
  6    0.000138
  7    0.000009
  8    0.000000
  9    0.000000
 10    0.000000

Looking back to our original questions, to find the probability that three of your four light bulbs will be successes (last at least 700 hours) and one will fail we use:
Menu Commands
Choose: Calc > Probability Distributions > Binomial
Select: Probability
Enter: Number of trials: 4
       Probability of success: .85
Select: Input constant
Enter: 3
Click: OK

Probability Density Function
Binomial with n = 4 and p = 0.85

  x  P( X = x )
  3    0.368475

If we wish to determine the probabilities for all values 0, 1, 2, ..., n, enter 0 through n into a column, and choose Input column rather than Input constant. You also have the option of storing the probabilities in another column.

Binomial with n = 4 and p = 0.85

  x  P( X = x )
  0    0.000506
  1    0.011475
  2    0.097538
  3    0.368475
  4    0.522006
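
The probabilities in these tables follow from the binomial formula P(X = x) = C(n, x) p^x (1 - p)^(n - x). A minimal Python sketch of that calculation is given below as a cross-check; the function name binomial_pmf is an assumption made for illustration and is not a MINITAB command.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# Light bulb example: n = 4 bulbs, each lasting at least 700 hours with p = 0.85
bulbs = [binomial_pmf(x, 4, 0.85) for x in range(5)]
for x, prob in enumerate(bulbs):
    print(x, f"{prob:.6f}")
```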

Cumulative Distribution Function (CDF)
The CDF command calculates cumulative probabilities. A cumulative probability is the probability that your result will be less than or equal to a particular value. As an example, suppose we calculate the probability you will fail the test in advanced Greek Mythology. Here n = 20 and p = .3333. You will fail the test if you get 11 or fewer questions correct. (You will pass if you get 12 or more right.) The following commands can be used to calculate this probability:
Menu Commands
Choose: Calc > Probability Distributions > Binomial
Click: Cumulative probability
Enter: Number of trials: 20
       Probability of success: .333333
Click: Input constant: 11
Click: OK

Cumulative Distribution Function
Binomial with n = 20 and p = 0.333333

  x  P( X <= x )
 11     0.987027

This is the probability you will fail. So, what is the probability that you will pass? (1 - .9870 = .013) Again, we can place the results of the CDF command in a particular column of the worksheet by entering a column in the Optional storage box of the dialog box.
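
The cumulative probability is simply the sum of the individual binomial probabilities for 0 through 11 correct answers. A short Python sketch of that sum follows; the function name binomial_cdf is an illustrative assumption.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial random variable: sum of P(X = x) for x = 0..k."""
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(k + 1))

p_fail = binomial_cdf(11, 20, 0.333333)  # 11 or fewer correct out of 20
p_pass = 1 - p_fail                      # complement: 12 or more correct
print(f"{p_fail:.6f}", f"{p_pass:.6f}")
```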

Mean and Standard Deviation of the Binomial Distribution
Using the actual values for n, p and q, the calculator within Calc > Calculator in MINITAB can be used to determine the mean and standard deviation for the binomial distribution. Use the expressions below (remember: you are substituting the actual values for n, p and q):
n*p           store result in mean
SQRT(n*p*q)   store result in std dev

We could also use the columns created in the worksheet to do the calculations. Assuming C1 contains the X values and C2 contains the corresponding probabilities:
SUM(C1*C2) will produce the mean
SQRT(SUM(C1**2*C2)-(SUM(C1*C2))**2) will produce the standard deviation

ASSIGNMENT: Do Exercises 4.32, 5.36, 5.68, 5.69 in your text.
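
The two ways of finding the binomial mean and standard deviation described in this section should agree. The Python sketch below checks this for the light bulb example (n = 4, p = 0.85); the variable names are assumptions chosen for illustration.

```python
from math import comb, sqrt

n, p = 4, 0.85
q = 1 - p

# Shortcut formulas for a binomial distribution
mean_formula = n * p
sd_formula = sqrt(n * p * q)

# The same quantities from the probability table (x values and P(X = x))
xs = list(range(n + 1))
probs = [comb(n, x) * p**x * q**(n - x) for x in xs]
mean_table = sum(x * pr for x, pr in zip(xs, probs))
sd_table = sqrt(sum(x**2 * pr for x, pr in zip(xs, probs)) - mean_table**2)

print(mean_formula, round(sd_formula, 4))
print(round(mean_table, 4), round(sd_table, 4))
```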


CHAPTER 6 - LAB SESSION NORMAL APPROXIMATION OF THE BINOMIAL


INTRODUCTION: The normal distribution is one of the most important distribution functions in statistics. We will now see how binomial probabilities can be reasonably estimated by using the normal probability distribution. Later we will need to determine whether normality is a reasonable assumption. A common graphical technique for checking whether a sample comes from a normal population is to create a normal probability plot (NPP). It is a pairing of values with their corresponding z-scores. If the points fall approximately on a straight line, it is reasonable to treat the data as normally distributed; otherwise it is not.

We will start our investigation with a few specific binomial distributions.

Step 1: Entering the data. For this demonstration we will use columns C1, C4, and C7 to hold a series of numbers. The corresponding probabilities will be placed into C2, C5 and C8. Use the following commands to enter the numbers 0, 1, 2, 3, and 4 into column 1:
Menu Commands
Choose: Calc > Make Patterned Data > Simple Set of Numbers
Enter: Store patterned data in: C1
       From first value: 0
       To last value: 4
       In steps of: 1
Click: OK

Use similar commands to set C4 to the numbers 0, 1, ..., 8 and to set C7 to the numbers 0, 1, 2, ..., 24. These three columns will be used for three specific situations: n = 4, n = 8, and n = 24.

Step 2: Calculating and Storing the Probabilities. We will now place the binomial probabilities for C1 into C2 using the PDF (Probability Density Function) command with n = 4 and p = .5.
Menu Commands
Choose: Calc > Probability Distributions > Binomial
Select: Probability radio button
Enter: Number of trials: 4
       Probability of success: .5
       Input column: C1
       Optional storage: C2
Click: OK

Place the binomial probabilities for C4 into C5 and C7 into C8, being sure to use n = 8 and n = 24, respectively.

Step 3: Plotting the Probabilities. Now we will plot the probabilities of x for 0 to n for n = 4 by using the following command:
Menu Commands
Choose: Graph > Scatterplot > Simple
Enter: Y variable: C2
       X variable: C1
Click: Data View tab
Check: Project line
Click: OK

[Figure: Scatterplot of C2 vs C1]

Repeat this procedure for plotting C5 versus C4 and C8 versus C7. What can we say about the distribution as n becomes larger?

Step 4: Interpreting the results. Let's see how the normal approximates a binomial with p = .5 and n = 8. The approximating normal has mu = 8(.5) = 4 and sigma = sqrt((8)(.5)(.5)) = 1.414. First, we need to place the normal probabilities for each x (C4) into another column, say C6.
Menu Commands
Choose: Calc > Probability Distributions > Normal
Select: Probability Density radio button
Enter: Mean: 4
       Standard deviation: 1.414
       Input column: C4
       Optional storage: C6
Click: OK

Then to do a multiple plot, use the following commands:
Menu Commands
Choose: Graph > Scatterplot > Simple
Enter: Y variables   X variables
       C5            C4
       C6            C4
Click: Multiple graphs radio button
Select: Overlaid on same graph
Click: OK

[Figure: Scatterplot of C5, C6 vs C4]

The ScatterPlot command just executed plotted the pdf for the binomial and for the normal approximation on the same axes. This will help us see why we can approximate a binomial by a normal and how to do the appropriate calculations. The MPLOT plots the first pair of columns with the black dot and the second pair with the red box. When two points overlap, it plots both. In this multiple plot there are many overlaps. You should visualize the histogram corresponding to the binomial probabilities. The height of a bar is the probability the binomial variable is equal to the corresponding value. For example, the height of the bar centered at 5 is the probability that the binomial variable is equal to 5. The base of a bar is 1 unit wide. Therefore, the area of a bar is equal to its height, and is thus equal to the corresponding probability.

Also visualize the normal curve. (Draw a smooth curve through the values plotted from the normal pdf.) Here are some calculations that will help the explanation. Suppose we want the probability the binomial is from 5 to 7. This probability is the sum of the probabilities at 5, 6, and 7. (Look in rows 6, 7 and 8, where C4 holds the values 5, 6 and 7 and C5 holds their probabilities.) The area under the normal curve that goes from 4.5 to 7.5 approximates the area of the three binomial bars. How could we determine this area?

Hint: In C11 enter 4.5 and 7.5. Then calculate the normal CDF for these two numbers and store the results in C12.
Menu Commands
Choose: Calc > Probability Distributions > Normal
Select: Cumulative probability
Enter: Mean: 4
       Standard deviation: 1.414
       Input column: C11
       Optional storage: C12
Click: OK

The probability the binomial variable has a value from 5 to 7 is .359375 . The approximation obtained from the normal probability distribution is .993343 - .638183 = .355160, which is very close to the true probability. If we were to use a normal approximation for a binomial with p = .5 and n = 24 (like we did in columns C5 and C6 for n = 8), the approximation would look even better. In the exercises, we'll look at other values of p.
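
The exact binomial probability and its normal approximation with the continuity correction can be reproduced with a few lines of Python. This sketch is an illustration only and uses the standard library's NormalDist class in place of MINITAB's normal CDF command.

```python
from math import comb, sqrt
from statistics import NormalDist

n, p = 8, 0.5

# Exact binomial probability P(5 <= X <= 7)
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(5, 8))

# Normal approximation: area under the normal curve from 4.5 to 7.5
approx_dist = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p)))
approx = approx_dist.cdf(7.5) - approx_dist.cdf(4.5)

print(f"{exact:.6f}", f"{approx:.6f}")
```

Replacing n and p lets you try the other cases discussed in this lab, such as n = 24 or p = .2.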

EXTENSIONS OF THE THEORY
The Normal Probability Plot
The z-scores can be calculated by MINITAB using the following commands and specifying the column in which to place the values. We will calculate the z-scores for C1, place them in C3, and then plot them as follows:
Menu Commands
Choose: Calc > Standardize
Click: Input column: C1
Enter: Result in: C3
Click: Subtract mean and divide by std dev
Click: OK
Choose: Graph > Scatterplot > Simple
Enter: Y variable: C3
       X variable: C1
Click: OK

[Figure: Scatterplot of C3 vs C1]

Do the scores fall approximately on a straight line?

Repeat this for n = 8 and n = 24. Let's repeat this for other distributions:
1) Generate 50 random numbers, normally distributed, and place them in column 10. Calculate their z-scores and place them in column 11. Then plot C10 against C11.
2) Repeat this for 100 random numbers in C12.
3) Repeat this again for a uniform distribution in C14. (Replace the subcommand Normal with Uniform 0 1.)
Are you getting approximately straight lines?

ASSIGNMENT:
1. (a) Make plots as in the first part of the lab, but use p = .4 instead of p = .5. Use n = 4, 8 and 24.
   (b) Repeat part (a) using p = .2.
   (c) What can you say about the normal approximation to the binomial? For what values of n and p does it seem to work best?
2. Suppose X has a binomial distribution with p = .8 and n = 25. Use MINITAB to calculate each of the probabilities below exactly. Also compute the normal approximation to these probabilities. (Remember to use the continuity correction.) Compare the binomial results with the normal approximations.
   (a) P(X = 21)
   (b) P(X < 21)
   (c) P(X > 24)
   (d) P(21 < X < 24)
3. Do Exercises 6.103 and 6.133 in your text.

CHAPTER 7 - LAB SESSION SAMPLE VARIABILITY


INTRODUCTION: In an effort to estimate population parameters, we need to investigate the variability in the sample means obtained from repeated sampling. The Central Limit Theorem tells us that the sampling distribution of sample means, x̄, is approximately normally distributed when the sample size is sufficiently large. In the following lab you will test the results of the Central Limit Theorem.

GENERATING THE DISTRIBUTIONS OF SAMPLE MEANS
Uniform Distribution
Enter the values 0 through 9 into column 1 and name column 1 'X':
Menu Commands
Choose: Calc > Make Patterned Data > Simple Set of Numbers
Enter: Store patterned data in: C1
       From first value: 0
       To last value: 9
       In steps of: 1
Click: OK

Enter the probabilities into column 2. For the uniform distribution assign probabilities of .1 to the x-values 0 through 9. Name column 2 'UNIFORM'.

Generate 30 sets of 100 uniform deviates (random numbers with a uniform distribution) and store them in C6 through C35.
Menu Commands
Choose: Calc > Random Data > Discrete
Enter: Generate 100 rows of data
       Store in columns: C6-C35
       Values in: C1
       Probabilities in: C2
Click: OK

Observe the distribution of the data in C35 by creating a dotplot.

[Figure: Dotplot of C35]

To illustrate the concept of a sampling distribution we're considering the finite population {0, 1, 2, ..., 9}. We shall generate values from three very different distributions and investigate, empirically, sampling distributions of the sample means for samples of size n = 2, n = 5, and n = 30 for each of the different distributions.

(N=2) Calculate the sample mean, x̄, for each pair of values given in C6 and C7 and store the results in C41.
Menu Commands
Choose: Calc > Row Statistics
Select: Mean radio button
Enter: Input variables: C6 C7
Enter: Store result in: C41
Click: OK

Observe the distribution of the sample means in C41 by doing a dotplot as above.

[Figure: Dotplot of means over 2 columns (C6 and C7), N = 2]

Notice that this distribution of sample means does not look like the population.

(N=5) Calculate x̄ for the values in C8 through C12, storing your results in C42 as we did above.

Observe the distribution of the sample means in C42:

[Figure: Dotplot of means over 5 columns (C8-C12), N = 5]

(N=30) Repeat the above procedure for the values in C6-C35, storing your results in C43. Do a dotplot.

[Figure: Dotplot of means over 30 columns (C6-C35), N = 30]

Compare the descriptive statistics and distributions for each of the calculated means.

Descriptive Statistics: C35, N = 2, N=5, N=30

Variable  Total Count    Mean  TrMean   StDev  Minimum      Q1  Median      Q3  Maximum
C35               100   4.280   4.256   2.995    0.000   2.000   4.000   7.000    9.000
N = 2             100   4.180   4.172   1.811    0.500   2.625   4.000   5.500    8.000
N=5               100   4.562   4.571   1.386    0.800   3.650   4.600   5.400    8.200
N=30              100  4.4100  4.4159  0.4880   3.0000  4.1083  4.4000  4.6667   5.5000
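
The shrinking standard deviations in the table can be compared with what theory predicts, namely sigma divided by the square root of n. The Python sketch below is a simplified illustration under stated assumptions: it draws a fresh set of values for each sample size rather than reusing worksheet columns as the lab does, and the function name sample_means and the seed are chosen here for illustration.

```python
import random
from math import sqrt
from statistics import mean, stdev

values = list(range(10))   # the population values 0 through 9
weights = [0.1] * 10       # uniform probabilities, as in column 'UNIFORM'

# Population parameters computed from the probability distribution
pop_mean = sum(v * w for v, w in zip(values, weights))
pop_sd = sqrt(sum(w * (v - pop_mean) ** 2 for v, w in zip(values, weights)))

def sample_means(n_cols, n_rows, rng):
    """Means of n_rows random samples, each of size n_cols."""
    return [mean(rng.choices(values, weights=weights, k=n_cols))
            for _ in range(n_rows)]

rng = random.Random(7)  # arbitrary seed so that the run is repeatable
for n in (2, 5, 30):
    means = sample_means(n, 100, rng)
    # observed mean and SD of the sample means, then the theoretical SD
    print(n, round(mean(means), 3), round(stdev(means), 3),
          round(pop_sd / sqrt(n), 3))
```

Replacing the weights list with another set of probabilities that sums to 1 lets the same sketch be used for a differently shaped population.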

Now do a dotplot to compare them. Note the shape of each of the distributions of the sample means. These distributions don't look like the original data (C35), but they do have a shape we're familiar with.

[Figure: Dotplot of C35, N = 2, N=5, N=30 on a common Data axis from 0.0 to 8.4. Each symbol represents up to 2 observations.]

J-Shaped Distribution
Enter the following probabilities into column 2: .39 .26 .22 .18 .15 .13 .12 .10 .05 .02 and repeat the previous procedure.

U-Shaped Distribution
Enter the following probabilities into column 2: .18 .15 .09 .06 .02 .02 .06 .09 .15 .18 and repeat the previous procedure.

Questions:
1. What are the parameter values for each of the three distributions?
2. What happened to the means and standard deviations of the x̄'s as n got larger?
3. How did the distributions of the x̄'s compare to the normal distribution as n got larger? Were the results similar for the different distributions?
4. Do Exercises 7.9, 7.15, 7.40, 7.45 and 7.46 in your text.

CHAPTER 8 - LAB SESSION ESTIMATION AND HYPOTHESIS TESTING


INTRODUCTION: Two indispensable statistical decision-making tools for a single parameter are (i) confidence intervals, and (ii) hypothesis tests to investigate theories about parameters. In this lab you will learn how to calculate confidence intervals and perform hypothesis tests using MINITAB.

CONFIDENCE INTERVALS
Begin a new worksheet and enter 10 random numbers into C1 with a minimum value of 0 and a maximum value of 25. Do this as follows:
Menu Commands
Choose: Calc > Random Data > Integer
Enter: Generate 10 rows of data
       Store in: C1
       Minimum: 0
       Maximum: 25
Click: OK

To see the mean, standard deviation, and maximum and minimum values for the data set, use the menu selection Stat > Basic Statistics > Display Descriptive Statistics. If you click on the Statistics tab, you can choose whatever statistics you wish to display.

(Your results may be slightly different, since we are using random data.)

Find the 99% confidence interval for the mean of C1 by using the following menu commands, and record the results. Note that this command requires the standard deviation (sigma) and the specified level for the confidence interval.

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample z
Select: Samples in columns
Enter: C1
Enter: Standard deviation: 7.5
Select: Options tab
Enter: Confidence level: 99.0
       Alternative: not equal
Click: OK OK

Find the 95% and 90% confidence intervals of C1 and record the results.
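
The interval that the 1-Sample z command reports is the sample mean plus or minus z times sigma divided by the square root of n. A minimal Python sketch of that formula follows. The function name z_interval and the sample mean of 12.0 are hypothetical, since your C1 contains random data; sigma = 7.5 and n = 10 are the values used in this lab.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(sample_mean, sigma, n, level):
    """Two-sided confidence interval for a mean when sigma is known."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical z value
    margin = z_star * sigma / sqrt(n)                   # maximum error E
    return sample_mean - margin, sample_mean + margin

sample_mean = 12.0  # hypothetical value; use the mean of your own C1
for level in (0.90, 0.95, 0.99):
    low, high = z_interval(sample_mean, 7.5, 10, level)
    print(level, round(low, 3), round(high, 3))
```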

Looking at these three intervals:
1. Consider the means obtained from 100 samples of size 10. If these means were used to construct 100 confidence intervals, determine the expected number of times the population mean would be included in one of these intervals.
2. In the 99% confidence interval that you found, the level of confidence is 99%. What is the value of α? What does α represent?
3. In which of these intervals is the maximum error, E, the smallest? What does this mean? In which of these intervals are you being more certain to include the population mean?

HYPOTHESIS TESTING
To understand the results of a computer-driven hypothesis test, it is best to show one first. An example of MINITAB's output for a z-test for data in C1 is given below. The statistics that you need, the test statistic and the p-value, are the last two values on the bottom line.
Menu Commands
Choose: Stat > Basic Statistics > 1-Sample z
Select: Samples in columns
Enter: C1
Enter: Standard deviation: 7.5
       Test mean: 15 (required for test)
Select: Options tab
Enter: Confidence level: 99.0
       Alternative: less than
Click: OK OK

The alternative hypothesis is chosen in the drop-down menu.

Another example: A standard final examination in an elementary statistics course is designed to produce a mean score of 75. The hypothesis you will try to verify is: "This particular statistics class is above average." At the .05 level of significance, test the claim that the following sample scores reflect an above-average class:

79 79 78 74 82 89 74 75 78 73
74 84 82 66 84 82 82 71 72 83

Enter the data and get a preliminary graphical analysis. Your menu selections would be Graph > Boxplot > Simple.

[Figure: Boxplot of Final exam]

Descriptive Statistics: Final exam

Variable     Mean   TrMean  StDev  Minimum     Q1  Median     Q3  Maximum
Final exam  78.05    78.11   5.60    66.00  74.00   78.50  82.00    89.00

Test the hypothesis, "The mean test grade for this class is greater than 75." Assume sigma = 12.
Menu Commands
Choose: Stat > Basic Statistics > 1-Sample z
Select: Samples in columns
Enter: C1
Enter: Test mean: 75
Enter: Standard deviation: 12
Select: Options tab
Click: Alternative: greater than
Click: OK

One-Sample Z: Final exam

Test of mu = 75 vs > 75
The assumed standard deviation = 12

Variable     N     Mean   StDev  SE Mean  99% Lower Bound     Z      P
Final exam  20  78.0500  5.5958   2.6833          71.8078  1.14  0.128
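
The Z and P values in this output can be verified from the formula z = (x̄ - mu) / (sigma / sqrt(n)). The following Python sketch is an illustrative check that uses the exam scores listed above with the assumed sigma of 12; the variable names are chosen here and are not MINITAB output.

```python
from math import sqrt
from statistics import NormalDist, mean

scores = [79, 79, 78, 74, 82, 89, 74, 75, 78, 73,
          74, 84, 82, 66, 84, 82, 82, 71, 72, 83]
mu0, sigma = 75, 12            # hypothesized mean and assumed sigma

xbar = mean(scores)
se = sigma / sqrt(len(scores))         # standard error of the mean
z = (xbar - mu0) / se                  # test statistic
p_value = 1 - NormalDist().cdf(z)      # right-tailed test (Ha: mu > 75)

print(xbar, round(se, 4), round(z, 2), round(p_value, 3))
```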

Questions:
1. What are the formal null and alternative hypotheses?
2. What is the value of the test statistic, and what is your decision? Is the mean of this class above average?

ASSIGNMENT: Do Exercises 8.41 and 8.115 in your text, and the following two problems.
1. In one region of a city, a random survey of households includes a question about the number of people in the household. The results are given in the accompanying frequency table. Construct the 90% confidence interval for the mean size of all such households. Assume that the sample standard deviation can be used as an estimate of the population standard deviation.

   Household size  Frequency
                1         15
                2         20
                3         37
                4         23
                5         14
                6          4
                7          2

2. An aeronautical research team collects data on the stall speeds (in knots) of ultralight aircraft. The results are summarized in the accompanying stem-and-leaf plot. Construct the 95% confidence interval for the mean stall speed of all such aircraft. Assume σ = 1.

   MTB > Stem-and-Leaf c1.
   Stem-and-leaf of C1
   Leaf Unit = 0.10
   21. | 7 8
   22. | 3 4 4 6
   23. | 2 2 5 8 9 9
   24. | 0 1 3
   25. | 2
   N = 16

CHAPTER 9 - LAB SESSION 1 ANALYZING MEAN AND VARIANCE (SIGMA UNKNOWN)


INTRODUCTION: The t-statistic is used when making inferences concerning the population mean when sigma is an unknown quantity. We will introduce the t-test and compare the z and t distributions.

THE TINTERVAL
To generate a confidence interval using the t-statistic we use the One-Sample T command, specifying the level of confidence and the column of data for which the estimation is being made.

Consider the data presented in exercise 9.31 of your text. Enter the data into C1. Before we complete a 95% confidence interval estimate for the mean length of lunch breaks at Giant Mart, we check the normal probability plot and boxplot to verify the normality assumption.

[Figure: Normal probability plot of C1 (Percent vs C1). Mean = 29.32, StDev = 4.922, N = 22, AD = 0.461, P-Value = 0.235]

[Figure: Boxplot of C1]

The normality assumption appears reasonable, so to complete the 95% confidence interval estimate for the mean length of lunch breaks at Giant Mart, use the following commands:
Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Select: Samples in columns: C1
Select: Options
Enter: Confidence level: 95
       Alternative: not equal
Click: OK OK

One-Sample T: C1

Variable   N     Mean   StDev  SE Mean              95% CI
C1        22  29.3182  4.9221   1.0494  (27.1358, 31.5005)

Suppose you were given summarized data as in Ex 9.26. In this problem you are given the sample size n = 41, Σx = 3582.17 and Σ(x - x̄)² = 9960.336. You are asked to give a 90% confidence interval to estimate the true mean cost. We can perform the same procedure as above, but this time choose Summarized data instead of Samples in columns. You will have to calculate the mean and standard deviation by hand first: x̄ = 3582.17/41 = 87.37 and s = sqrt(9960.336/40) = 15.78. The output below was produced at the default 95% confidence level; enter 90 as the confidence level in the Options dialog to obtain the 90% interval.

One-Sample T

 N     Mean    StDev  SE Mean              95% CI
41  87.3700  15.7800   2.4644  (82.3892, 92.3508)
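
The hand calculation of the mean and standard deviation from the summarized data can be sketched in Python as follows. The variable names are assumptions for illustration; the three input values are the ones quoted in the text.

```python
from math import sqrt

n = 41
sum_x = 3582.17          # sum of the observations
sum_sq_dev = 9960.336    # sum of squared deviations from the mean

xbar = sum_x / n                  # sample mean
s = sqrt(sum_sq_dev / (n - 1))    # sample standard deviation
se = s / sqrt(n)                  # standard error of the mean

print(round(xbar, 4), round(s, 4), round(se, 4))
```

These are the Mean, StDev and SE Mean values that appear in the One-Sample T output.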

THE TTEST
Using text Exercise 9.29 as the basis of our discussion, enter the data values into column C2. Suppose we have been asked to determine whether this accelerator has decreased the drying time by significantly more than 4% at the 0.01 level. The hypotheses to be tested are:
H0: μ = 4.0
Ha: μ > 4.0

To perform the test, use the following commands:
Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Select: Samples in columns
Enter: C2
Select: Test mean: 4.0
Select: Options
Enter: Confidence level: 95
Select: Alternative: greater than
Click: OK

One-Sample T: DryTime

Test of mu = 4 vs > 4

Variable  N     Mean    StDev  SE Mean  95% Lower Bound     T      P
DryTime   8  4.56250  1.34051  0.47394          3.66458  1.19  0.137

Is there sufficient evidence to show that this accelerator has decreased the drying time significantly more than 4% at the .01 level?

As another example, consider the point spread between opposing teams in the 1996 bowl games:

5 20 19 33 6 10 7 18 29 41 6 32 9 36

Enter the data into column 3. Test the hypothesis, "The average spread between the scores of the winning and the losing teams in a college bowl game is less than 20." Assume sigma is unknown. Do the same as above, making the appropriate choices.

One-Sample T: Pt Spread

Test of mu = 20 vs < 20

Variable    N     Mean    StDev  SE Mean  95% Upper Bound      T      P
Pt Spread  14  19.3571  12.7013   3.3946          25.3687  -0.19  0.426
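
The T value in this output comes from t = (x̄ - mu) / (s / sqrt(n)), with the sample standard deviation s used in place of sigma. The Python sketch below recomputes it from the point-spread data as an illustrative check. It stops at the test statistic because Python's standard library has no t-distribution function; the p-value is taken from the MINITAB output.

```python
from math import sqrt
from statistics import mean, stdev

spreads = [5, 20, 19, 33, 6, 10, 7, 18, 29, 41, 6, 32, 9, 36]
mu0 = 20                            # hypothesized mean point spread

xbar = mean(spreads)
s = stdev(spreads)                  # sample standard deviation (n - 1 divisor)
se = s / sqrt(len(spreads))         # standard error of the mean
t = (xbar - mu0) / se               # test statistic, df = n - 1 = 13

print(round(xbar, 4), round(s, 4), round(se, 4), round(t, 2))
```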

Questions:
1. What are the formal null and alternative hypotheses?
2. What is the value of the test statistic, and what is your decision if α = .10? Is the final point spread of a bowl game less than 20?
3. What does the size of the p-value tell us?

ASSIGNMENT: Do Exercises 9.56, 9.60 in your text.

COMPARISON OF THE Z AND T DISTRIBUTION
Why do you use two different distributions depending on the availability of the standard deviation, σ? What basic assumptions are necessary to use the t-statistic? Is the basic assumption that the parent population is normally distributed a necessary one? Why? If the parent population is not known to be normally distributed, when can we use the t-statistic?

In this exercise you will generate both types of statistics from the same 100 samples and be able to compare the two empirical distributions. Open a new worksheet, then generate 100 samples of size 5 from a normal distribution with mu = 15 and sigma = 10, and store the mean and standard deviation of each of the 100 samples.
Menu Commands
Choose: Calc > Random Data > Normal
Enter: Generate 100 rows of data
       Store in columns: C1-C5
       Mean: 15
       Standard deviation: 10
Click: OK

Calculate both z and t statistics.
Recall: $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$, $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
Menu Commands
Choose: Calc > Row Statistics
Select: Mean radio button
Enter: Input variables: C1-C5
       Store results in: C6
Click: OK
Choose: Calc > Row Statistics
Select: Standard deviation radio button
Enter: Input variables: C1-C5
       Store results in: C7
Click: OK
Choose: Calc > Calculator
Enter: Store in: C8
       Expression: (C6-15)/(10/SQRT(5))
Click: OK
Choose: Calc > Calculator
Enter: Store in: C9
       Expression: (C6-15)/(C7/SQRT(5))
Click: OK

For each of the two statistics, z and t, count the number of times their value is more than 2 units away from the mean of 0. To do this, sort columns C8 and C9 and observe the data. Compare the two distributions graphically by using histograms (multiple graphs, overlay).
[Figure: Histogram of z (horizontal axis: z; vertical axis: Frequency)]

[Figure: Histogram of t (horizontal axis: t; vertical axis: Frequency)]
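The same experiment can be sketched outside MINITAB. The Python version below is only an illustration under stated assumptions: the function names and the seed are arbitrary choices, and the counts it prints will not match any particular MINITAB run, because the random samples differ.

```python
import math
import random
import statistics

MU, SIGMA, N, SAMPLES = 15, 10, 5, 100

def z_stat(sample):
    """z uses the known population standard deviation, sigma."""
    return (statistics.mean(sample) - MU) / (SIGMA / math.sqrt(len(sample)))

def t_stat(sample):
    """t replaces sigma with the sample standard deviation, s."""
    s = statistics.stdev(sample)
    return (statistics.mean(sample) - MU) / (s / math.sqrt(len(sample)))

rng = random.Random(1)  # fixed seed (arbitrary) so that reruns agree
samples = [[rng.gauss(MU, SIGMA) for _ in range(N)] for _ in range(SAMPLES)]
z_values = [z_stat(s) for s in samples]   # plays the role of C8
t_values = [t_stat(s) for s in samples]   # plays the role of C9

# Count the statistics that fall more than 2 units from 0.
z_far = sum(abs(z) > 2 for z in z_values)
t_far = sum(abs(t) > 2 for t in t_values)
print(f"|z| > 2: {z_far} of {SAMPLES}")
print(f"|t| > 2: {t_far} of {SAMPLES}")
```

Changing N and rerunning gives a quick feel for how the two distributions compare as the sample size changes.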

QUESTIONS:
1. How many of the calculated z-statistics were more than two units away from the origin? How many of the t-statistics?
2. What did the distributions for the two statistics look like? Compare their centers, spread, and overall shape.
3. Would you describe the t-distribution as bell-shaped? If so, would you say it is approximately normal? (i.e., is the z-score plot of the t-statistic a straight line?)
4. If you were to increase n, would you expect the difference between the two distributions to increase or decrease?

ASSIGNMENT: Do Exercise 9.64 in your text


CHAPTER 9 - LAB SESSION 2 ANALYZING THE POPULATION PROPORTION

INTRODUCTION: In this lab we will investigate the inferences that can be made about the binomial parameter p. Inferences concerning the population binomial parameter p are made using procedures that closely parallel the inference procedures for the population mean (see Chapter 9 Lab Session 1).

CONFIDENCE INTERVALS
Consider the following sample problem: A telephone survey was conducted to estimate the proportion of households with a personal computer. Of the 350 households surveyed, 75 had a personal computer. First, we will determine a point estimate for the proportion in the population who have a personal computer. The data to be entered will be a series of 0's and 1's, each number designating one of two categories. Since the parameter of concern is the proportion of households with a personal computer, we use 1 to represent 'has a personal computer' and use 0 to represent 'does not have a personal computer'. The easiest way to enter the data is to enable the command editor and type the following commands in the Session window.

MTB> SET C1
DATA> 75(1) 275(0)
DATA> END

This can also be accomplished in the worksheet by entering a 1 in the first row of C1, then clicking and holding the + in the lower right-hand corner of the cell and dragging through row 75. Then enter a 0 in row 76, and click, hold, and drag to row 350. Calculate the mean and store it in C2. This value is the sample proportion, the point estimate for p.

Now, if we wish to generate a confidence interval for p (let's say 95%), do the following:

Menu Commands
Choose: Stat > Basic Statistics > 1 Proportion
Select: Sample in column radio button and enter C1
Select: Options
Enter: Confidence Interval: 95
(We can ignore Test proportion and Alternative, since we're not really interested in the hypothesis test.)
Check box: Use test and interval based on normal distribution.
OK
OK


Test and CI for One Proportion: C1


Test of p = 0.5 vs p not = 0.5

Event = 1

Variable   X    N  Sample p                95% CI  Z-Value  P-Value
C1        75  350  0.214286  (0.171298, 0.257273)   -10.69    0.000
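The interval in this output follows from the normal approximation. As a rough check, here is a Python sketch (an illustration only, not MINITAB's own code) that recomputes the sample proportion, the 95% interval, and the Z-Value for the default test proportion of 0.5.

```python
import math
from statistics import NormalDist

x, n = 75, 350        # households with a computer, households surveyed
confidence = 0.95
p0 = 0.5              # test proportion shown in the output above

p_hat = x / n
z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)   # about 1.96
se = math.sqrt(p_hat * (1 - p_hat) / n)                   # uses the sample proportion
lower, upper = p_hat - z_crit * se, p_hat + z_crit * se

# The Z-Value uses the hypothesized proportion in the standard error.
z_value = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

print(f"Sample p = {p_hat:.6f}")
print(f"95% CI = ({lower:.6f}, {upper:.6f})")
print(f"Z-Value = {z_value:.2f}")
```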

HYPOTHESIS TESTING
This sample problem will take you through the steps of entering the summarized data and performing a hypothesis test for exercise 9.105 in your textbook. This time we will not simulate the data; rather, we will use the summarized statistics given in the problem. The hypotheses for this test are
H0: p = .9 vs Ha: p < .9

Menu Commands
Choose: Stat > Basic Statistics > 1 Proportion
Select: Summarized Data
Enter: Number of trials: 75
       Number of successes: 55
Select: Options
Enter: Test Proportion: .9
Select: Alternative: less than
OK
OK

Test and CI for One Proportion


Test of p = 0.9 vs p < 0.9

Sample   X   N  Sample p  95% Upper Bound  Z-Value  P-Value
1       55  75  0.733333         0.817324    -4.81    0.000
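To see where these numbers come from, the one-tailed test can be sketched in Python as follows. This is an illustration under the normal approximation, with variable names chosen for this sketch; it also shows that the printed P-Value of 0.000 is a rounded value, not an exact zero.

```python
import math
from statistics import NormalDist

x, n = 55, 75     # successes and trials from the summarized data
p0 = 0.9          # H0: p = .9 vs Ha: p < .9

p_hat = x / n
z_star = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = NormalDist().cdf(z_star)     # left-tail area, since Ha is "less than"

# One-sided 95% upper bound, based on the sample proportion.
z_crit = NormalDist().inv_cdf(0.95)
upper_bound = p_hat + z_crit * math.sqrt(p_hat * (1 - p_hat) / n)

print(f"Sample p = {p_hat:.6f}  Z-Value = {z_star:.2f}")
print(f"P-Value = {p_value:.2e}")
print(f"95% Upper Bound = {upper_bound:.6f}")
```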

1. What decision should be made based on these results?
2. What does P-Value = 0.000 tell us?

Assignment: Do exercises 9.107 and 9.109 in your text.


CHAPTER 9 - LAB SESSION 3 ANALYZING THE POPULATION VARIANCE


INTRODUCTION: In this lab we will present the hypothesis test for the standard deviation of a normal population. When sample data are skewed, just one outlier can greatly affect the standard deviation. It is very important, especially when using small samples, that the sampled population be normal; otherwise the procedures are not reliable. However, unlike the analysis for the mean, you will not have convenient computer commands to help you. To use Example 9.19 as an example of using Minitab to aid in completing the hypothesis test, let's assume the 12 samples tested yielded the following data:
165 172 180 189 181 174 165 185 211 170 198 171
Enter the data into C1, determine the descriptive statistics, and do either a dot plot or histogram.
Descriptive Statistics: C1
Variable   N  N*    Mean  SE Mean  StDev  Minimum      Q1  Median      Q3  Maximum
C1        12   0  180.08     4.01  13.89   165.00  170.25  177.00  188.00   211.00

[Figure: Dotplot of C1 (horizontal axis: C1, marked from 168 to 210)]

Recall that the manufacturer claims shelf life is normally distributed. Why is this important? The necessary expressions for completing the hypothesis test follow, using the calculator:

Store the hypothesized variance in C2: 10² or 100
Store the degrees of freedom in C3: COUNT(C1) - 1 (or 11)
Store the standard deviation in C4: STDEV(C1)
Store χ²* in C5: (C3*(C4*C4))/C2

The worksheet will look like this:

C1   hyp var  df  s        ChiSq
165  100      11  13.8922  21.2292
172
180
189
181
174
165
185
211
170
198
171

Compute the area under the curve to the left of χ²* by the following:

Menu Commands
Choose: Calc > Probability Distributions > Chi-Square
Select: Cumulative Probability
Noncentrality Parameter: 0.0
Enter: Degrees of freedom: 11
Select: Input constant
Enter: χ²* value (in this case 21.2292)
OK

(Remember, our value of chi-square is different from the one in the example, since we made up data for the problem. See the questions below about doing the test using summary statistics.)


Remember this is the left tail. The right tail would be 1-0.968927 = 0.031073.
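The calculator steps for C2 through C5 can be mirrored in a few lines of Python. This is a sketch for checking the worksheet values only; the cumulative probability is still left to MINITAB, because the Python standard library has no chi-square distribution.

```python
import statistics

# Made-up shelf-life data entered in C1 above.
shelf_life = [165, 172, 180, 189, 181, 174, 165, 185, 211, 170, 198, 171]
hyp_var = 10 ** 2                     # hypothesized variance (C2)

df = len(shelf_life) - 1              # degrees of freedom (C3)
s = statistics.stdev(shelf_life)      # sample standard deviation (C4)
chi_sq = df * s ** 2 / hyp_var        # test statistic (C5)

print(f"df = {df}  s = {s:.4f}  ChiSq = {chi_sq:.4f}")
```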

Questions:
1. What is the p-value?
2. What decision should be made?
3. Does your conclusion match that for Example 9.19?

The p-value doesn't match exactly because we made up data to fit the statistics, and the data values don't produce the given mean and standard deviation exactly. It is not necessary to have the raw data. Can you redo the test using only the summary statistics?

ASSIGNMENT: Do Exercises 9.137 and 9.144 in your text.


CHAPTER 10 LAB SESSION INFERENCES INVOLVING TWO POPULATIONS


INTRODUCTION: When comparing two populations we need two samples, one from each population. Two kinds of samples can be used: dependent or independent, determined by the source of the data. The methods of comparison are quite different.

CASE 1. DEPENDENT SAMPLE (PAIRED DATA): The two data values, one from each set, that come from the same source are called paired data. They are compared by using the difference in their values, called the paired difference, d. Because the paired difference, d = x1 - x2, will be approximately normally distributed when paired observations are randomly selected from normal populations, we will use the t-test. We wish to make inferences about μ_d, where the random variable (d) involved has an approximately normal distribution with an unknown standard deviation (σ_d).

Confidence Interval
Consider the data presented in exercise 10.16 of your text. Use MINITAB to generate the 95% confidence interval for the mean improvement in memory resulting from taking the memory course (d = after - before). Retrieve the data file for ex10-016 from the Student Suite CD. Using the calculator, form the paired difference using the expression C2 - C1 and store it in C3. To generate the interval:

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample T
Enter: Samples in Columns: C3
Click: OK

The response is shown in the session window.
One-Sample T: C3
Variable   N     Mean    StDev  SE Mean              95% CI
C3        10  6.10000  4.79467  1.51621  (2.67010, 9.52990)
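The interval can be reproduced from the summary statistics alone. The sketch below is a Python illustration; the critical value 2.262 is t(df = 9, 0.025) read from a t table, which is an input assumed here because the Python standard library has no t distribution.

```python
import math

# Summary statistics of the paired differences in C3 (from the output above).
n, d_bar, s_d = 10, 6.1, 4.79467
t_crit = 2.262          # t(df = 9, 0.025) from a t table (assumed input)

se = s_d / math.sqrt(n)             # standard error of the mean difference
lower = d_bar - t_crit * se
upper = d_bar + t_crit * se

print(f"SE Mean = {se:.5f}")
print(f"95% CI = ({lower:.4f}, {upper:.4f})")
```

The last digits differ slightly from the MINITAB output because the table value is rounded to three decimals.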

Hypothesis Testing
To demonstrate the procedure for a hypothesis test on the mean difference, we will do Exercise 10.38. Enter the data for Before in column C1 and for After in column C2 by retrieving it from the Student Suite CD (ex10-038). Using the calculator, subtract the values in C1 from the values in C2 and place the paired differences in C3.

Perform a t-test on the paired differences in C3.

Menu Commands
Choose: Stat > Basic Statistics > 1-Sample t
Enter: Variables: C3
Select: Test mean: 0.0
Click: OK

Results for: EX10-038.MTW
One-Sample T: C3

Test of mu = 0 vs not = 0

Variable   N     Mean    StDev  SE Mean               95% CI     T      P
C3        10  7.00000  5.79272  1.83182  (2.85614, 11.14386)  3.82  0.004

How would you interpret these results?

ASSIGNMENT: Do Exercises 10.19, 10.20, 10.22, 10.34 in your text.


CASE 2. INDEPENDENT SAMPLES: If two samples are selected, one from each of the populations, the two samples are independent if the selection of objects from one population is unrelated to the selection of objects from the other population. Since the samples provide the information for determining the standard error, the t distribution will be used for the test statistic, and the degrees of freedom will be calculated by MINITAB. The TWOSAMPLE command performs both the confidence interval and the hypothesis test at the same time.

a) Consider Exercise 10.49 in your text. Retrieve the data from the Student Suite CD, or enter the data for the males in C1 and for the females in C2. Then, to complete the 99% confidence interval:

Menu Commands
Choose: Stat > Basic Statistics > 2-Sample t
Select: Samples in different columns
Enter: First: C1
Enter: Second: C2
Click: Options
Enter: Confidence level: 99
Click: OK
OK

The response appears in the session window.

Verify these results by doing the calculations.

b) Complete the hypothesis test presented in Exercise 10.72 of your text. Retrieve the data from the Student Suite CD and note that the data for Diet A is in C1 and Diet B is in C2.

Menu Commands
Choose: Stat > Basic Statistics > 2-Sample t
Select: Samples in different columns
Enter: First: C1
Enter: Second: C2
Select: Alternative: less than
Click: OK

The results appear in the session window.

Do the data justify the conclusion that the mean weight gained on diet B was greater than the mean weight gained on diet A, at the α = .05 level of significance?

ASSIGNMENT: Do Exercises 10.76 and 10.79 in your text. Both sets of data are found on the Student Suite CD.

Enrichment Assignment: Do Exercise 10.80 or 10.81. Turn in a typed paper detailing your procedures and results. Include the session commands you used and a printed copy of your output to substantiate your conclusions. Review the Chapter 1 Lab Session on how to record your session commands and print out results. Remember you can output your results to a file and then import that file into your word processor to make report writing easier.

COMPARING TWO PROPORTIONS USING TWO INDEPENDENT SAMPLES

Confidence Interval
Consider exercise 10.85. We are interested in estimating the difference in the proportion of male and female teenagers who have ever gambled. The sample evidence given is that 66% of the 200 males (x = 132) and 37% of the 199 females (x = 74) have ever gambled.

Menu Commands
Choose: Stat > Basic Statistics > 2 Proportions
Select: Summarized Data
Enter: First: 200 (trials) 132 (events)
       Second: 199 (trials) 74 (events)
Select: Options
Enter: Confidence level: 95
OK
OK


The response is shown in the session window.


Test and CI for Two Proportions
Sample    X    N  Sample p
1       132  200  0.660000
2        74  199  0.371859

Difference = p (1) - p (2)
Estimate for difference: 0.288141
95% CI for difference: (0.194231, 0.382051)
Test for difference = 0 (vs not = 0): Z = 6.01  P-Value = 0.000
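As a check on this output, the estimate, interval, and Z value can be recomputed with the unpooled standard error. The Python sketch below is an illustration only, with variable names chosen for this example.

```python
import math
from statistics import NormalDist

x1, n1 = 132, 200    # males who have gambled, males sampled
x2, n2 = 74, 199     # females who have gambled, females sampled

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2
# Unpooled standard error: each sample keeps its own proportion.
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z_crit = NormalDist().inv_cdf(0.975)          # 95% two-sided
lower, upper = diff - z_crit * se, diff + z_crit * se
z_star = diff / se                            # test of difference = 0

print(f"Estimate for difference: {diff:.6f}")
print(f"95% CI for difference: ({lower:.6f}, {upper:.6f})")
print(f"Z = {z_star:.2f}")
```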

Hypothesis Test
Consider exercise 10.101.

Menu Commands
Choose: Stat > Basic Statistics > 2 Proportions
Select: Summarized Data
Enter: First: 200 (trials) 132 (events)
       Second: 199 (trials) 74 (events)
Select: Options
Enter: Confidence level: 95
       Alternative: not equal
Select: Use pooled estimate of p for test
OK
OK


The response is shown in the session window.

What conclusion should be reached?

ASSIGNMENT: Do Exercises 10.90, and 10.100 in your text.


CHAPTER 11 LAB SESSION ANALYZING ENUMERATIVE DATA


INTRODUCTION: The data used in this lab is enumerative -- that is, the data is placed in categories and counted. The observed frequencies list exactly what happened in the sample. The expected frequencies represent the theoretical expected outcomes (what is expected to happen on the average). These expected values must always add up to n. When we perform a hypothesis test on these two sets of values, we are really asking how different they are. If the difference is small, we may attribute it to the chance variation in the samples. However, if the difference is large, there may be a difference in the proportions in the population and we may reject the null hypothesis. We can use the χ² distribution in our test. We will first make inferences concerning multinomial experiments and then extend that to contingency tables.

MULTINOMIAL EXPERIMENTS
A multinomial experiment consists of n independent trials, each of whose outcomes fits into exactly one of k possible cells. The probability of each cell remains constant, and the sum of all the probabilities is 1. For multinomial experiments we will always use a right-tail critical region of the χ² distribution. The expected frequency for each cell is obtained by multiplying the probability for that cell by the total number of trials, n. We can program MINITAB to calculate the Chi-Square statistic by entering the data and the probability for each cell, then calculating the expected value and the Chi-Square contribution for each cell. We then need to sum each of these columns. Let's do Example 11-1 from the text, using MINITAB to do the calculations.

Since there are seven sections, we can assume the probability of choosing any one of them would be 1/7. So we will enter 0.142857 in seven rows of column one. Enter the seven observed values from Table 11.3 on page 550 into column 2.
Calculate the expected values and place in C3 using the expression C1 * SUM(C2)
Calculate Chi-Square and place in C4 using the expression (C2-C3)**2/C3

Next, calculate the sum of each column and place the sums in columns C5-C8. You should get the following output.

C1        C2  C3       C4       Sum C1   Sum C2  Sum C3   Sum C4
0.142857  18  17.0000  0.05883  1.00000  119     119.000  12.9412
0.142857  12  17.0000  1.47058
0.142857  25  17.0000  3.76473
0.142857  23  17.0000  2.11766
0.142857   8  17.0000  4.76469
0.142857  19  17.0000  0.23530
0.142857  14  17.0000  0.52941
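The worksheet calculation above can be sketched in Python as follows. This is only an illustration of the same column arithmetic; the list names are chosen here to match the worksheet columns.

```python
observed = [18, 12, 25, 23, 8, 19, 14]      # column C2
probs = [1 / 7] * len(observed)             # column C1

n = sum(observed)                           # SUM(C2)
expected = [p * n for p in probs]           # column C3: C1 * SUM(C2)
contrib = [(o - e) ** 2 / e                 # column C4: (C2-C3)**2/C3
           for o, e in zip(observed, expected)]
chi_sq = sum(contrib)                       # Sum C4

print(f"n = {n}")
print("expected:", [round(e, 4) for e in expected])
print(f"Chi-Square = {chi_sq:.4f}")
```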

Compare the MINITAB results to the results given in the example.

ASSIGNMENT: Do Exercises 11.15, 11.21, and 11.22 in your text.

CONTINGENCY TABLES AND THE CHI SQUARE COMMAND

This command performs a test of H0 that there is no relationship between the row and column variables in a table. We will enter the integer information directly into the worksheet. The MINITAB output will show the observed values and the expected values in each cell, the calculated value of χ², and the degrees of freedom. Note that the observed values are entered into the columns. The expected values are calculated for us by MINITAB. (Remember, from your text, that these numbers are calculated by multiplying the appropriate row and column sums and dividing by the total number of trials, n. The formula for the cell in the ith row and jth column is (Ri x Cj) / n.) Enter the data from Example 11-6 to see how it works.

Menu Commands
Choose: Stat > Tables > Chi-Square Test (Table in Worksheet)
Enter: Columns containing the table: C1 C2
Click: OK


The response is shown in the session window.


Chi-Square Test: Favor, Oppose
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

        Favor  Oppose  Total
    1     143      57    200
       101.60   98.40
       16.870  17.418

    2      98     102    200
       101.60   98.40
        0.128   0.132

    3      13      87    100
        50.80   49.20
       28.127  29.041

Total     254     246    500

Chi-Sq = 91.715, DF = 2, P-Value = 0.000
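The expected counts and the Chi-Square value in this output follow directly from the (Ri x Cj) / n formula. Below is a minimal Python sketch of that calculation, given for illustration; it uses the observed counts shown in the output above.

```python
# Observed counts: one row per table row, columns are Favor and Oppose.
observed = [
    [143, 57],
    [98, 102],
    [13, 87],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# Expected count for row i, column j is (Ri x Cj) / n.
expected = [[r * c / n for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
df = (len(row_totals) - 1) * (len(col_totals) - 1)

print("expected:", [[round(e, 2) for e in row] for row in expected])
print(f"Chi-Sq = {chi_sq:.3f}, DF = {df}")
```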


Let's perform the procedure using the data from Exercise 11.45. First name your columns with the days of the week. Enter the data in the appropriate columns. From the STAT menu, choose TABLES and then CHISQUARE. Select the columns C1 - C5.

Perform the test to obtain the following output in the session window.
Chi-Square Test: Mon, Tues, Wed, Thurs, Friday
Expected counts are printed below observed counts
Chi-Square contributions are printed below expected counts

          Mon   Tues    Wed  Thurs  Friday  Total
    1      85     90     95     95      90    455
        91.00  91.00  91.00  91.00   91.00
        0.396  0.011  0.176  0.176   0.011

    2      15     10      5      5      10     45
         9.00   9.00   9.00   9.00    9.00
        4.000  0.111  1.778  1.778   0.111

Total     100    100    100    100     100    500

Chi-Sq = 8.547, DF = 4, P-Value = 0.073

You still have to frame the null and alternative hypothesis, set the criteria, and then, using the results from the above computer printout, draw your conclusion.

ASSIGNMENT: Do the following Exercises 11.50, 11.51, 11.68, 11.74 in your text.


CHAPTER 12 LAB SESSION ANALYSIS OF VARIANCE


INTRODUCTION: In earlier sessions you have examined and compared means from two samples. We will now practice a technique that tests hypotheses about several means. While we could compare the means in pairs as we have done before, the process could become too unwieldy to be of any use. Analysis of variance (ANOVA) allows us to test all the means at the same time to see if there is any significant difference between them.

The Logic Underlying The Anova Technique
We will be forming a comparison between two estimates of the population variance: one based on the variance within each set of data and the other on the variance between the sets of data. We will use the F distribution for this comparison. If there is relatively little difference within each group and a large difference between the sample means, we will reject the null hypothesis. (Remember we always word the null hypothesis to say there is no difference...) If there is a lot of variance within a group and little between groups, we cannot conclude that the population means are different. We also need to know that the groups under investigation are approximately normally distributed and independent.

ANOVA is presented as a table, and we need to define our terms in order to understand what the table is telling us. The Factor is the variable whose means we are interested in studying. When we first set up our data charts in MINITAB, each column will represent a different Level of the Factor we are examining. Each row will be a data value from repeated samplings, called a Replicate. The ANOVA table will represent the Factor as the first row of the table. The next row is the Error, followed by the Total. The columns will be Degrees of Freedom (DF), the sum of squares (SS), and the mean square (MS), which is the ratio of the sum of squares to the degrees of freedom for the factor and for the error.

PERFORMING AN ANOVA ANALYSIS
This sample problem will take you through the steps of entering the data and generating the ANOVA table for Example 12-1 in your textbook. The FACTOR we are looking at is temperature and whether it has any effect on production. We will examine production at three different temperature levels: 68, 72, 76. These levels form our columns. The production amounts are the replicates and form the rows of the data table. You can name the columns and enter the data directly into the worksheet. Make sure you have entered the data correctly.

If we did a DOTPLOT of the three columns, some interesting things are shown.

[Figure: Sample Results on Temperature and Production. Dotplots of TEMP 68, Temp 72, and Temp 76 on a common Data axis marked from 4 to 12.]

Note that the points within each level are fairly close, but the three levels hardly overlap at all. The command for generating the ANOVA table is as follows:

Menu Commands
Choose: Stat > ANOVA > Oneway (Unstacked)
Select: C1-C3
Enter: Responses in separate columns: C1 C2 C3
Click: OK


The response is shown in the session window.


One-way ANOVA: TEMP 68, Temp 72, Temp 76
Source  DF      SS      MS      F      P
Factor   2  84.500  42.250  44.47  0.000
Error   10   9.500   0.950
Total   12  94.000

S = 0.9747   R-Sq = 89.89%   R-Sq(adj) = 87.87%

Level    N    Mean  StDev
TEMP 68  4  10.250  1.258
Temp 72  5   7.000  0.707
Temp 76  4   3.750  0.957

[Plot: Individual 95% CIs For Mean Based on Pooled StDev, one interval per level, on an axis marked 5.0, 7.5, 10.0, 12.5]

Pooled StDev = 0.975
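The ANOVA table can be rebuilt from the level summaries in this output, which shows how SS, MS, and F relate. The Python sketch below is an illustration that works from the rounded N, Mean, and StDev values printed above (the raw data are in Example 12-1 of the text), so the last digits differ slightly from MINITAB's.

```python
# (N, Mean, StDev) for each level, copied from the output above.
levels = {
    "TEMP 68": (4, 10.250, 1.258),
    "Temp 72": (5, 7.000, 0.707),
    "Temp 76": (4, 3.750, 0.957),
}

n_total = sum(n for n, _, _ in levels.values())
grand_mean = sum(n * mean for n, mean, _ in levels.values()) / n_total

# Between-level variation (Factor) and within-level variation (Error).
ss_factor = sum(n * (mean - grand_mean) ** 2 for n, mean, _ in levels.values())
ss_error = sum((n - 1) * sd ** 2 for n, _, sd in levels.values())

df_factor = len(levels) - 1
df_error = n_total - len(levels)
ms_factor = ss_factor / df_factor
ms_error = ss_error / df_error
f_star = ms_factor / ms_error

print(f"Factor: DF = {df_factor}, SS = {ss_factor:.3f}, MS = {ms_factor:.3f}")
print(f"Error:  DF = {df_error}, SS = {ss_error:.3f}, MS = {ms_error:.3f}")
print(f"F = {f_star:.2f}")
```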

Compare the output to the calculations in Example 12-1 in the text. Note in particular that the calculated value is F* = 44.47. To make our decision, we need to compare this to the critical value F(2,10,.05) = 4.10. Since F* is larger than the critical value, we reject the null hypothesis and conclude that at least one of the temperatures has an effect on the production level. The p-value given in the table can also be used to determine the conclusion. How would you interpret it?

Exercise 12.53 in the chapter exercises compares the stopping distances for four brands of tires. Using the data given there, is there sufficient evidence to conclude that there is a difference in the mean stopping distances at the α = .05 level? This data may be found on the Student Suite CD as Ex12-53.


Procedure:
a) State your null and alternative hypotheses.
b) Find your critical region and value for F.
c) 1) Enter your data in columns 1 - 4, naming them A, B, C, D respectively.
   2) Do a dotplot to get a feel for how the data interact.
   3) Perform an ANOVA to calculate F*. What does the p-value tell you? Explain.
d) Draw your conclusion about the null hypothesis and explain what it means to you. How would your conclusion change if α changed?

ASSIGNMENT: Do Exercises 12.28, 12.29, 12.51 and 12.55 in your text.


CHAPTER 13 LAB SESSION LINEAR REGRESSION ANALYSIS


INTRODUCTION: In an earlier lab, we looked at bivariate data and used the linear correlation coefficient to see if there was a relationship between the two variables. You also looked at a method of developing a line of best fit. In this lab we will look at a method of deciding whether the equation of that line is of any use to us in making point predictions and developing confidence intervals. Before beginning this lab, you should review the commands for performing a regression analysis in Chapter 3 Lab Session 2. Use the data in Exercise 13.43 just to refresh your memory.

Enter x values in C1
Enter y values in C2

Menu Commands
Choose: Stat > Regression > Regression
Enter: Response: C2
       Predictors: C1
Click: OK

Check your results in the session window.


Regression Analysis: y versus x
The regression equation is
y = - 13.4 + 2.30 x

Predictor     Coef  SE Coef      T      P
Constant   -13.414    7.168  -1.87  0.098
x           2.3028   0.1918  12.01  0.000

S = 10.1738   R-Sq = 94.7%   R-Sq(adj) = 94.1%

Analysis of Variance
Source          DF     SS     MS       F      P
Regression       1  14924  14924  144.19  0.000
Residual Error   8    828    104
Total            9  15752

There are several steps in doing a linear regression analysis. First, we obtain a least squares estimate for the model equation $y = \beta_0 + \beta_1 x + \varepsilon$. Next, we need to check our assumptions about the random error component, ε. (The mean value of the experimental error is zero. We must also assume that the distribution of the y's is approximately normal and that the variance, σ², of the distribution of random errors is constant.) Note that an estimate of σ² can be obtained from the MINITAB printout (s²). Third, assess the usefulness of the model by making inferences about the slope. Lastly, we can construct a confidence interval for our predictions. We will use Exercise 13.88 to demonstrate the procedure.

1. Do a scatterplot to visually check if there is a linear relationship.

Menu Commands
Choose: Graph > Scatterplot
Select: Simple
Click: OK
Enter: Y-variables: C2
Enter: X-variables: C3


2. Determine the correlation coefficient.

Menu Commands
Choose: Stat > Basic Statistics > Correlation
Enter: Variables: C3 C2
Click: OK

The answer will appear in the session window.


Correlations: Population, Settlement
Pearson correlation of Population and Settlement = 0.928

3. Find the Equation of the Line of Best Fit:

Menu Commands
Choose: Stat > Regression > Regression
Enter: Response: C2
       Predictors: C3
Click: OK

The response appears in the session window.


Regression Analysis: Settlement versus Population
The regression equation is
Settlement = 0.047 + 0.879 Population

Predictor      Coef  SE Coef      T      P
Constant     0.0466   0.3724   0.13  0.901
Population  0.87936  0.05191  16.94  0.000

S = 1.93233   R-Sq = 86.2%   R-Sq(adj) = 85.9%

Analysis of Variance
Source          DF      SS      MS       F      P
Regression       1  1071.4  1071.4  286.95  0.000
Residual Error  46   171.8     3.7
Total           47  1243.2


Unusual Observations

Obs  Population  Settlement     Fit  SE Fit  Residual  St Resid
  5        31.9      25.000  28.081   1.436    -3.081   -2.38RX
  8         0.7       7.750   0.680   0.349     7.070     3.72R
 30        18.2      25.000  16.033   0.751     8.967    5.04RX

R denotes an observation with a large standardized residual.
X denotes an observation whose X value gives it large influence.
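Several entries in the Settlement versus Population output are tied together by simple formulas. The Python sketch below, given only as an illustration, recomputes them from the printed values; because those values are rounded, the results agree with MINITAB only to about two or three digits.

```python
import math

# Values copied from the regression output for Settlement versus Population.
b1, se_b1 = 0.87936, 0.05191                       # slope and its standard error
ss_regression, ss_error, ss_total = 1071.4, 171.8, 1243.2
df_regression, df_error = 1, 46

t_star = b1 / se_b1                                # t for H0: slope = 0
ms_error = ss_error / df_error                     # s squared, estimate of sigma squared
s = math.sqrt(ms_error)
f_star = (ss_regression / df_regression) / ms_error
r_sq = ss_regression / ss_total
r = math.sqrt(r_sq)                                # positive, since the slope is positive

print(f"T = {t_star:.2f}  (T squared = {t_star ** 2:.1f}, F = {f_star:.1f})")
print(f"S = {s:.3f}  R-Sq = {100 * r_sq:.1f}%  r = {r:.3f}")
```

With a single predictor, the square of the slope's t statistic equals the ANOVA F statistic, and the square root of R-Sq gives the magnitude of the Pearson correlation found in step 2.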

Note the different values. Find b0, b1, s_b1, the calculated value of t*, and the p-value. Perform the hypothesis test using the information from the ANOVA results.

4. Form the confidence intervals:

Menu Commands
Choose: Stat > Regression > Fitted Line Plot
Enter: Response (Y): C2
       Predictor (X): C3
Select: Type of Regression Model: Linear
Click: OK

Now let's go back and select some options:

Menu Commands
Choose: Stat > Regression > Fitted Line Plot
Enter: Response (Y): C2
       Predictor (X): C3
Select: Type of Regression Model: Linear
Click: Options tab
Check: Display confidence interval
       Display prediction interval
Confidence level: 95.0
OK

We now get this graph. Answer the questions contained in 13.82 using this information.

ASSIGNMENT: Do Exercises 13.79, 13.87, 13.90 in your text.


CHAPTER 14 LAB SESSION ELEMENTS OF NON-PARAMETRIC STATISTICS


INTRODUCTION: All the previous methods we have studied are parametric statistics: they are based on a population that has a certain distribution and can be applied only when special criteria are met. Non-parametric statistical methods can be applied when these criteria cannot be met and assumptions about the parent population (such as normality) cannot be made, since these techniques do not rely on the distribution of the parent population. Non-parametric methods tend, unfortunately, to waste information and are less sensitive than their parametric counterparts. This, however, can be compensated for very nicely by increasing the sample size. Non-parametric techniques are generally easier to apply and are only slightly less efficient than parametric techniques.

THE SIGN TEST
The Sign test is one of the easiest tests to use, since it reduces the data to plus and minus signs. It can be used in a hypothesis test for a single median or for two dependent samples using a paired difference. The basic concept is that because the median is the middle piece of data, with 50% of the data above it (represented by +) and 50% below (represented by -), P(+) = .5 and P(-) = .5. The method is fairly simple: all zeroes are discarded and the rest of the data are assigned positive and negative signs. The test statistic, x, is the number of the less frequent sign. This is actually a binomial random variable (outcome either + or -) with a probability of 1/2. With n the number of signs remaining, z is calculated by the formula

$z = \dfrac{x - n/2}{\tfrac{1}{2}\sqrt{n}}$

We will use the data from Exercise 14.3 as a sample for using MINITAB to perform a sign test.
a) State the hypotheses:
H0: The median high temperature = 48
Ha: The median high temperature ≠ 48
b) Set test criteria: α = 0.05
First open a new worksheet, placing the data in column C1. Then use the menu to choose the nonparametric single-sample sign test and enter the test median as 48:

Menu Commands
Choose: Stat > Nonparametrics > 1-Sample Sign
Enter: Variables: C1
Select: Test median: 48
Click: OK

c) The results appear in the session window.

Note that the p-value is less than α. Notice we have only 3 temperatures above the stated median and 16 below. The actual median of the sample is 45.5.
d) We therefore reject H0 in favor of Ha.

We can get the Confidence Interval by using the same commands, checking the Confidence Interval box, and entering the desired level of confidence. The response will be displayed in the session window.
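The counts reported above (3 above and 16 below the test median) are enough to redo the test by hand. The Python sketch below is an illustration that assumes a two-tailed exact binomial calculation, and it also evaluates the normal-approximation z formula given earlier in this section.

```python
import math

n_plus, n_minus = 3, 16        # temperatures above and below the test median of 48
n = n_plus + n_minus           # values equal to 48 are already excluded
x = min(n_plus, n_minus)       # number of the less frequent sign

# Exact two-tailed p-value: P(X <= x) for X ~ Binomial(n, 1/2), doubled.
tail = sum(math.comb(n, k) for k in range(x + 1)) / 2 ** n
p_value = min(1.0, 2 * tail)

# Normal approximation: z = (x - n/2) / ((1/2) * sqrt(n)).
z = (x - n / 2) / (0.5 * math.sqrt(n))

print(f"n = {n}, x = {x}")
print(f"exact two-tailed p-value = {p_value:.4f}")
print(f"z = {z:.2f}")
```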

ASSIGNMENT: Do Exercise 14.14 in your text. The Sign test can also be used for paired differences with two dependent samples. Do Exercise 14.15 in your text.

THE MANN WHITNEY TEST

This is an alternative method for the t-test on two independent random samples in which the random variable is continuous (also called the Mann-Whitney-Wilcoxon test). By default, a two-sided test is performed. To do one-sided tests, select the test you want from the Alternative dialogue box.

The test is carried out as follows: First, the two samples are ranked together, with the smallest observation given rank 1, the next smallest given rank 2, and so on. Then the sum of the ranks of the first sample is calculated. If the sum is small, it indicates that the observations from the first sample tend to be smaller than those from the second sample, and vice versa. The attained significance level of the test is calculated using a normal approximation (with a continuity correction factor).

The following problem demonstrates Example 14-6 in your text. We first name the columns and enter the grades from exam A in C1 and the grades from exam B in C2.

Menu Commands
Choose: Stat > Nonparametrics > Mann-Whitney
Enter: First sample: C1
       Second sample: C2
       Confidence level: 95.0
       Alternative: not equal
Click: OK
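The ranking step described above can be sketched in Python. This is an illustration only: the two samples are hypothetical numbers made up for this sketch (they are not the exam grades from Example 14-6), the function names are arbitrary, and tied values are given the average of the ranks they occupy, which is an assumption about tie handling.

```python
def average_ranks(values):
    """Return ranks (1 = smallest); tied values share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1        # average of positions i+1 through j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def rank_sum(first, second):
    """Sum of the ranks of the first sample after ranking both together."""
    ranks = average_ranks(list(first) + list(second))
    return sum(ranks[: len(first)])

# Hypothetical data, for illustration only.
sample_1 = [12, 15, 19, 22, 22]
sample_2 = [14, 22, 25, 28]
w = rank_sum(sample_1, sample_2)
print(f"Rank sum of the first sample: {w}")
```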

The response will appear in the session window. Note that the p-value is not smaller than α, so we fail to reject H0.

Example 14-7 is completed in a similar manner. Enter the data found on page 680. This time choose less than for the alternative. Complete the test.

ASSIGNMENT: Do Exercises 14.29, 14.30, and 14.33 in your text. Be sure to clearly state the hypotheses and test criteria. Each data set is on the Student Suite CD.

RUNS TEST FOR RANDOMNESS

How do we really know when a set of outcomes is truly random? The answer lies not just in counting the number of outcomes, but also in looking at the order in which those outcomes arise, that is, their arrangement. A run is a sequence of consecutive outcomes that have a common property. When that property changes, the current run ends and a new one begins with the new property. The random variable to be considered is V, the number of runs. Its critical value is found in Table 14.

Example 14-10 is used to demonstrate the MINITAB technique:

a) State your hypotheses:
H0: The numbers are random
Ha: The numbers are not random

b) State criteria: A two-tailed test with α = .05 and critical values 2 and 10 from Table 14.

c) Perform the test: First enter the data and then choose Runs Test from the menu. Since there are 30 values, the median is at position 15.5 of the ordered data, and its value is 3.5.

Menu Commands
Choose: Stat > Nonparametrics > Runs Test
Enter: Variables: C1
Select: Above and below: 3.5

The solution will appear in the session window.

What would your conclusion be?
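Counting runs above and below a cutoff is easy to check by hand or with a short script. The Python sketch below is an illustration only: the data are made up (they are not the values of Example 14-10), the function name is hypothetical, and a value equal to the cutoff is treated as "below", which is an assumption that does not matter when the cutoff is 3.5 and the data are whole numbers.

```python
def count_runs(values, cutoff):
    """Return (number of runs, count above cutoff, count not above cutoff).

    Assumption: a value equal to the cutoff is classified as 'below'.
    """
    above = [v > cutoff for v in values]
    # A new run starts at the first value and at every change of side
    runs = 1 + sum(1 for prev, cur in zip(above, above[1:]) if prev != cur)
    n_above = sum(above)
    return runs, n_above, len(values) - n_above


# Hypothetical data, for illustration only
data = [2, 5, 4, 1, 3, 6, 7, 2, 1, 5]
runs, n_above, n_below = count_runs(data, 3.5)

# Expected number of runs under randomness: 2*n1*n2 / (n1 + n2) + 1
expected_runs = 2 * n_above * n_below / (n_above + n_below) + 1
print(runs, n_above, n_below, expected_runs)
```

The observed number of runs V is the value to compare with the critical values from Table 14; a V far from the expected number of runs, in either direction, is evidence against randomness.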

ASSIGNMENT: Do Exercises 14.31, 14.41, 14.44 and 14.47 in your text.

RANK CORRELATION

This test is a nonparametric alternative to the linear correlation coefficient. The test is used to determine if there is a correlation between two rankings. Let's consider Exercise 14.60.

We then determine the rankings for each list with the following menu commands.

Menu Commands
Choose: Data > Rank
Enter: Rank data in: C2
       Store ranks in: C4
Click: OK

Repeat the above commands for the data in C3 and store the ranks in C5.

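Then, to determine the Spearman rank correlation coefficient for the two rankings:

Choose: Stat > Basic Statistics > Correlation
Enter: Variables: C4 C5
Click: OK

Because C4 and C5 contain ranks, the value that MINITAB labels as the Pearson correlation in the session window is the Spearman rank correlation coefficient.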

Correlations: C4, C5
Pearson correlation of C4 and C5 = -0.291 P-Value = 0.385

What would your conclusion be?
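The two-step procedure used here (rank each column, then correlate the ranks) can be mirrored in Python to confirm that it yields the Spearman coefficient. This is a minimal sketch under stated assumptions: the data are made up (they are not the data of Exercise 14.60), the function names are hypothetical, and tied values would receive average ranks.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from smallest (rank 1) upward; ties share the average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Ordinary linear (Pearson) correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, for illustration only
first = [10, 20, 30, 40, 50]
second = [3, 1, 4, 2, 5]

rank_first = average_ranks(first)    # plays the role of C4
rank_second = average_ranks(second)  # plays the role of C5
r_s = pearson(rank_first, rank_second)

# Cross-check with the shortcut formula, valid when there are no ties
n = len(first)
d_squared = sum((a - b) ** 2 for a, b in zip(rank_first, rank_second))
r_s_shortcut = 1 - 6 * d_squared / (n * (n ** 2 - 1))
print(r_s, r_s_shortcut)
```

With no ties, correlating the ranks and applying the shortcut formula give the same coefficient, which is why the Correlation command can be used on the stored ranks.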

ASSIGNMENT: Do exercises 14.61 and 14.63 in your text.
