
Basic Statistics in the Toolbar of Minitab's Help

1. Example of Displaying Descriptive Statistics


You want to compare the height (in inches) of male (Sex=1) and
female (Sex=2) students who participated in the pulse study. You choose
to display a boxplot of the data.
  1. Open the worksheet PULSE.MTW.
  2. Choose Stat > Basic Statistics > Display Descriptive Statistics.
  3. In Variables, enter Height.
  4. In By variable, enter Sex.
  5. Click Graphs and check Boxplot of data. Click OK in each dialog box.

Session window output

Interpreting the results


The means shown in the Session window and the boxplots indicate
that males are approximately 5.3 inches taller than females, and the
spread of the data is about the same.
2. Example of a Graphical Summary
Students in an introductory statistics course participated in a simple
experiment. Each student recorded his or her resting pulse. Then they all
flipped coins, and those whose coins came up heads ran in place for one
minute. Then the entire class recorded their pulses. You want to examine
the students' resting pulse rates.
  1. Open the worksheet PULSE.MTW.
  2. Choose Stat > Basic Statistics > Graphical Summary.
  3. In Variables, enter Pulse1. Click OK.

Interpreting the results


The mean of the students' resting pulse is 72.870 (95% confidence
interval: 70.590 to 75.149). The standard deviation is 11.009 (95%
confidence interval: 9.615 to 12.878). At a significance level of
0.05, the Anderson-Darling normality test (A-Squared = 0.98, P-Value
= 0.013) indicates that the resting pulse data do not follow a normal
distribution.
3. Example of a 1-Sample Z-Test and Z-Confidence Interval
Measurements were made on nine widgets. You know that the
distribution of measurements has historically been close to normal with
σ = 0.2. Because you know σ, and you wish to test whether the population
mean is 5 and obtain a 90% confidence interval for the mean, you use the
Z-procedure.
  1. Open the worksheet EXH_STAT.MTW.
  2. Choose Stat > Basic Statistics > 1-Sample Z.
  3. In Samples in columns, enter Values.
  4. In Standard deviation, enter 0.2.
  5. Check Perform hypothesis test. In Hypothesized mean, enter 5.
  6. Click Options. In Confidence level, enter 90. Click OK.
  7. Click Graphs. Check Individual value plot. Click OK in each dialog box.

Session window output

Interpreting the results


The test statistic, Z, for testing if the population mean equals 5 is
-3.17. The p-value, or the probability of obtaining a test statistic at
least this extreme when the null hypothesis is true, is 0.002. This is
called the attained significance level, p-value, or attained α of the
test. Because the p-value of 0.002 is smaller than commonly chosen
α-levels, there is significant evidence that μ is not equal to 5, so you
can reject H0 in favor of μ not being 5.

A hypothesis test at α = 0.1 could also be performed by viewing the
individual value plot. The hypothesized value falls outside the 90%
confidence interval for the population mean (4.6792, 4.8985), and so
you can reject the null hypothesis.
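
The figures in this example can be checked by hand. The following is a minimal sketch in Python (standard library only) that recomputes Z, the two-sided p-value, and the 90% interval. The sample mean is not printed in the text, so it is assumed here to be the midpoint of the reported confidence interval; the variable names are illustrative and are not Minitab commands.

```python
from math import sqrt
from statistics import NormalDist

n = 9          # number of widgets
sigma = 0.2    # known population standard deviation
mu0 = 5.0      # hypothesized mean

# Assumption: the sample mean is the midpoint of the reported 90% interval.
xbar = (4.6792 + 4.8985) / 2

std_normal = NormalDist()                    # mean 0, standard deviation 1
se = sigma / sqrt(n)                         # standard error of the mean
z = (xbar - mu0) / se                        # Z test statistic
p_value = 2 * (1 - std_normal.cdf(abs(z)))   # two-sided p-value

z_crit = std_normal.inv_cdf(0.95)            # 5% in each tail for a 90% interval
ci = (xbar - z_crit * se, xbar + z_crit * se)

print(f"Z = {z:.2f}, p-value = {p_value:.3f}")
print(f"90% CI = ({ci[0]:.4f}, {ci[1]:.4f})")
print("Reject H0 at alpha = 0.1:", not (ci[0] <= mu0 <= ci[1]))
```

Because the hypothesized value 5 lies outside the interval, the interval check and the p-value lead to the same decision.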

4. Example of a 1-Sample t-Test and t-Confidence Interval


Measurements were made on nine widgets. You know that the
distribution of widget measurements has historically been close to normal,
but suppose that you do not know σ. To test if the population mean is 5
and to obtain a 90% confidence interval for the mean, you use a
t-procedure.
  1. Open the worksheet EXH_STAT.MTW.
  2. Choose Stat > Basic Statistics > 1-Sample t.
  3. In Samples in columns, enter Values.
  4. Check Perform hypothesis test. In Hypothesized mean, enter 5.
  5. Click Options. In Confidence level, enter 90. Click OK in each dialog box.

Interpreting the results


The test statistic, T, for H0: μ = 5 is calculated as -2.56. The
p-value of this test, or the probability of obtaining a more extreme value
of the test statistic by chance if the null hypothesis were true, is 0.034.
This is called the attained significance level, or p-value. Therefore,
reject H0 if your acceptable α level is greater than the p-value, or 0.034.
A 90% confidence interval for the population mean, μ, is (4.6357, 4.9421).
This interval is slightly wider than the corresponding Z-interval shown in
Example of 1-Sample Z.
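
To see why the t-interval is wider, the reported results can be unpacked with a little arithmetic. The sketch below (Python, standard library only) uses only numbers quoted in the two examples. It assumes the sample mean is the midpoint of the reported interval, and the "implied" quantities are back-calculated from rounded output, so they are approximate.

```python
from math import sqrt

n = 9
mu0 = 5.0
t_stat = -2.56              # reported T
t_ci = (4.6357, 4.9421)     # reported 90% t-interval
z_ci = (4.6792, 4.8985)     # reported 90% Z-interval (sigma = 0.2)

xbar = sum(t_ci) / 2               # assumption: midpoint equals the sample mean
se_t = (xbar - mu0) / t_stat       # implied estimated standard error, s / sqrt(n)
s = se_t * sqrt(n)                 # implied sample standard deviation

t_half = (t_ci[1] - t_ci[0]) / 2   # half-width of the t-interval
z_half = (z_ci[1] - z_ci[0]) / 2   # half-width of the Z-interval
t_crit = t_half / se_t             # implied t critical value (8 degrees of freedom)

print(f"implied s = {s:.3f} (the Z-procedure used sigma = 0.2)")
print(f"implied t critical value = {t_crit:.2f} (the normal value is about 1.645)")
print(f"half-widths: t = {t_half:.4f}, Z = {z_half:.4f}")
```

Both effects widen the interval here: the estimated standard deviation is larger than the assumed σ, and the t critical value exceeds the normal one.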

5. Example of a 2-Sample t with the Samples in One Column
A study was performed in order to evaluate the effectiveness of two
devices for improving the efficiency of gas home-heating systems. Energy
consumption in houses was measured after one of the two devices was
installed. The two devices were an electric vent damper (Damper=1) and
a thermally activated vent damper (Damper=2). The energy consumption
data (BTU.In) are stacked in one column with a grouping column (Damper)
containing identifiers or subscripts to denote the population. Suppose that
you performed a variance test and found no evidence for variances being
unequal (see Example of 2 Variances). Now you want to compare the
effectiveness of these two devices by determining whether or not there is
any evidence that the difference between the devices is different from
zero.
  1. Open the worksheet FURNACE.MTW.
  2. Choose Stat > Basic Statistics > 2-Sample T.
  3. Choose Samples in one column.
  4. In Samples, enter 'BTU.In'.
  5. In Subscripts, enter Damper.
  6. Check Assume equal variances. Click OK.

Interpreting the results


Minitab displays a table of the sample sizes, sample means, standard
deviations, and standard errors for the two samples. Since we previously
found no evidence for variances being unequal, we chose to use the
pooled standard deviation by choosing Assume equal variances. The
pooled standard deviation, 2.8818, is used to calculate the test statistic
and the confidence intervals. A second table gives a confidence interval
for the difference in population means. For this example, a 95%
confidence interval is (-1.450, 0.980), which includes zero, thus suggesting
that there is no difference. Next is the hypothesis test result. The test
statistic is -0.38, with a p-value of 0.701 and 88 degrees of freedom.
Since the p-value is greater than commonly chosen α-levels, there is
no evidence for a difference in energy use when using an electric vent
damper versus a thermally activated vent damper.
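
The role of the pooled standard deviation can be sketched in a few lines of Python. The function name and the summary statistics passed to it are hypothetical and chosen only for illustration; they are not the FURNACE.MTW values, which are not listed in this text.

```python
from math import sqrt

def pooled_t(n1, mean1, sd1, n2, mean2, sd2):
    """Return (t, pooled_sd, df) for a two-sample t-test assuming equal variances."""
    df = n1 + n2 - 2
    # Pooled variance: sample variances weighted by their degrees of freedom.
    pooled_sd = sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df)
    # Standard error of the difference between the two sample means.
    se = pooled_sd * sqrt(1 / n1 + 1 / n2)
    return (mean1 - mean2) / se, pooled_sd, df

# Hypothetical summary statistics, for illustration only.
t, sp, df = pooled_t(12, 10.0, 3.0, 15, 10.5, 2.5)
print(f"t = {t:.3f}, pooled SD = {sp:.4f}, df = {df}")
```

In the example above, the 88 degrees of freedom correspond to n1 + n2 - 2, so the two samples together contain 90 observations.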
6. Example of Paired t
A shoe company wants to compare two materials, A and B, for use on
the soles of boys' shoes. In this example, each of ten boys in a study wore
a special pair of shoes with the sole of one shoe made from Material A and
the sole on the other shoe made from Material B. The sole types were
randomly assigned to account for systematic differences in wear between
the left and right foot. After three months, the shoes are measured for
wear. For these data, you would use a paired design rather than an
unpaired design. A paired t-procedure would probably have a smaller error
term than the corresponding unpaired procedure because it removes
variability that is due to differences between the pairs. For example, one
boy may live in the city and walk on pavement most of the day, while
another boy may live in the country and spend much of his day on
unpaved surfaces.
  1. Open the worksheet EXH_STAT.MTW.
  2. Choose Stat > Basic Statistics > Paired t.
  3. Choose Samples in columns.
  4. In First sample, enter Mat-A. In Second sample, enter Mat-B.
  5. Click OK.

Interpreting the results


The confidence interval for the mean difference between the two
materials does not include zero, which suggests a difference between
them. The small p-value (p = 0.009) further suggests that the data are
inconsistent with H0: μd = 0, that is, the two materials do not perform
equally. Specifically, Material B (mean = 11.04) performed better than
Material A (mean = 10.63) in terms of wear over the three-month test
period. Compare the results from the paired procedure with those from an
unpaired, two-sample t-test (Stat > Basic Statistics > 2-Sample t).
The results of the paired procedure led us to believe that the data are not
consistent with H0 (t = -3.35; p = 0.009). The results of the unpaired
procedure (not shown) are quite different, however. An unpaired t-test
results in a t-value of -0.37, and a p-value of 0.72. Based on such
results, we would fail to reject the null hypothesis and would conclude
that there is no evidence of a difference in the performance of the two
materials. In the
unpaired procedure, the large amount of variance in shoe wear between
boys (average wear for one boy was 6.50 and for another 14.25) obscures
the somewhat less dramatic difference in wear between the left and right
shoes (the largest difference between shoes was 1.10). This is why a
paired experimental design and subsequent analysis with a paired t-test,
where appropriate, is often much more powerful than an unpaired
approach.
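
The contrast between the two analyses can be reproduced with a short Python sketch (standard library only). The wear values below are an assumption: they are not listed in this text, and are taken to be the contents of the Mat-A and Mat-B columns because they reproduce the means, the extreme per-boy averages, and both t-values quoted above.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Assumed contents of the Mat-A and Mat-B columns (one pair per boy).
mat_a = [13.2, 8.2, 10.9, 14.3, 10.7, 6.6, 9.5, 10.8, 8.8, 13.3]
mat_b = [14.0, 8.8, 11.2, 14.2, 11.8, 6.4, 9.8, 11.3, 9.3, 13.6]
n = len(mat_a)

# Paired analysis: work with the within-boy differences only.
diffs = [a - b for a, b in zip(mat_a, mat_b)]
t_paired = mean(diffs) / (stdev(diffs) / sqrt(n))

# Unpaired analysis (variances not pooled): between-boy variation
# stays in the standard error and swamps the small mean difference.
se_unpaired = sqrt(variance(mat_a) / n + variance(mat_b) / n)
t_unpaired = (mean(mat_a) - mean(mat_b)) / se_unpaired

print(f"paired:   t = {t_paired:.2f}, SD of differences = {stdev(diffs):.3f}")
print(f"unpaired: t = {t_unpaired:.2f}, SD of Mat-A = {stdev(mat_a):.3f}")
```

The mean difference is the same in both analyses; only the error term changes, which is why the paired statistic is so much larger in magnitude.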
7. Example of 1 Proportion

A county district attorney would like to run for the office of state
district attorney. She has decided that she will give up her county office
and run for state office if more than 65% of her party constituents support
her. You need to test H0: p = 0.65 versus H1: p > 0.65. As her campaign
manager, you collect data on 950 randomly selected party members
and find that 560 party members support the candidate. A test of
proportion was performed to determine whether or not the proportion of
supporters was greater than the required proportion of 0.65. In addition, a
95% confidence bound was constructed to determine the lower bound for
the proportion of supporters.
  1. Choose Stat > Basic Statistics > 1 Proportion.
  2. Choose Summarized data.
  3. In Number of events, enter 560. In Number of trials, enter 950.
  4. Check Perform hypothesis test. In Hypothesized proportion, enter 0.65.
  5. Click Options. Under Alternative, choose greater than. Click OK in each dialog box.

Interpreting the results


The p-value of 1.0 suggests that the data are consistent with the
null hypothesis (H0: p = 0.65), that is, the proportion of party members
that support the candidate is not greater than the required proportion of
0.65. As her campaign manager, you would advise her not to run for the
office of state district attorney.
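
A p-value this close to 1 follows directly from the sample proportion falling below the hypothesized value. The sketch below (Python, standard library only) uses a normal approximation, which is an assumption made for simplicity: Minitab may use an exact binomial method for this procedure, so its lower bound can differ slightly from the one computed here.

```python
from math import sqrt
from statistics import NormalDist

events, trials = 560, 950
p0 = 0.65                                  # hypothesized proportion

p_hat = events / trials                    # sample proportion
se0 = sqrt(p0 * (1 - p0) / trials)         # standard error under H0
z = (p_hat - p0) / se0
p_value = 1 - NormalDist().cdf(z)          # upper-tailed test, H1: p > 0.65

# Approximate one-sided 95% lower confidence bound for p.
se_hat = sqrt(p_hat * (1 - p_hat) / trials)
lower_bound = p_hat - NormalDist().inv_cdf(0.95) * se_hat

print(f"sample proportion = {p_hat:.4f}")
print(f"z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% lower bound = {lower_bound:.4f}")
```

The observed support is about 59%, well under the 65% threshold, so the data cannot provide evidence that the true proportion exceeds 0.65.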
8. Example of 2 Proportions

As your corporation's purchasing manager, you need to authorize the
purchase of twenty new photocopy machines. After comparing many
brands in terms of price, copy quality, warranty, and features, you have
narrowed the choice to two: Brand X and Brand Y. You decide that the
determining factor will be the reliability of the brands as defined by the
proportion requiring service within one year of purchase. Because your
corporation already uses both of these brands, you were able to obtain
information on the service history of 50 randomly selected machines of
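each brand. Records indicate that six Brand X machines and eight Brand Y
machines needed service, so 44 Brand X machines and 42 Brand Y machines
did not; the steps below enter these counts of machines that did not need
service as the events. Use this information to guide your choice of
brand for purchase.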
  1. Choose Stat > Basic Statistics > 2 Proportions.
  2. Choose Summarized data.
  3. In First sample, under Events, enter 44. Under Trials, enter 50.
  4. In Second sample, under Events, enter 42. Under Trials, enter 50. Click OK.

Interpreting the results


For this example, the normal approximation test is valid because, for
both samples, the number of events is greater than four, and the
difference between the numbers of trials and events is greater than four.
The normal approximation test reports a p-value of 0.564, and Fisher's
exact test reports a p-value of 0.774. Both of these p-values are larger
than commonly chosen α-levels. Therefore, the data are consistent with
the null hypothesis that the population proportions are equal. In other
words, the proportion of photocopy machines that needed service in the
first year did not differ depending on brand. As the purchasing manager,
you need to find a different criterion to guide your decision on which
brand to purchase. Because the normal approximation is valid, you can
draw the same conclusion from the 95% confidence interval. Because
zero falls in the confidence interval of (-0.0957903, 0.175790), you can
conclude that the data are consistent with the null hypothesis. If you think
that the confidence interval is too wide and does not provide precise
information as to the value of p1 - p2, you may want to collect more data
in order to obtain a better estimate of the difference.
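
The normal-approximation results can be reproduced from the four numbers entered in the dialog box. The following Python sketch (standard library only) assumes the standard error is computed from the two separate sample proportions rather than a pooled estimate; that choice matches the p-value and interval quoted above. Fisher's exact test is not reproduced here.

```python
from math import sqrt
from statistics import NormalDist

x1, n1 = 44, 50    # Brand X: machines that did not need service
x2, n2 = 42, 50    # Brand Y: machines that did not need service

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2
# Standard error of the difference, using separate (unpooled) estimates.
se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

z = diff / se
p_value = 2 * (1 - NormalDist().cdf(abs(z)))   # two-sided p-value

z_crit = NormalDist().inv_cdf(0.975)           # 95% two-sided interval
ci = (diff - z_crit * se, diff + z_crit * se)

print(f"difference = {diff:.2f}, z = {z:.2f}, p-value = {p_value:.3f}")
print(f"95% CI = ({ci[0]:.7f}, {ci[1]:.6f})")
```

The interval is wide relative to the 0.04 difference because each sample contains only 50 machines.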
9. Example of 1 Variance
You are a quality control inspector at a factory that builds high
precision parts for aircraft engines, including a metal pin that must
measure 15 inches in length. Safety laws dictate that the variance of the
pins' length must not exceed 0.001 in². Previous analyses determined that
pin length is normally distributed. You collect a sample of 100 pins and
measure their length in order to conduct a hypothesis test and create a
confidence interval for the population variance.
  1. Open the worksheet AIRPLANEPIN.MTW.
  2. Choose Stat > Basic Statistics > 1 Variance.
  3. In the drop-down menu, choose Enter variance. In Samples in columns, enter 'Pin length'.
  4. Check Perform hypothesis test. In Hypothesized variance, enter 0.001.
  5. Click Options. Under Alternative, choose less than. Click OK.
  6. Click OK.


Session window output

Interpreting the results


Because the data come from a normally distributed population, refer
to the standard method. The p-value for the one-sided hypothesis test is
0.014. This value is sufficiently low to reject the null hypothesis and
conclude that the variance of pin length is less than 0.001. You can further
hone your estimate of the population variance by considering the 95%
upper bound, which provides a value that the population variance is likely
to be below. From this analysis, you should conclude that the variance of
pin length is small enough to meet specifications and ensure passenger
safety.
10. Example of 2 Variances
A study was performed in order to evaluate the effectiveness of two
devices for improving the efficiency of gas home-heating systems. Energy
consumption in houses was measured after one of the two devices was
installed. The two devices were an electric vent damper (Damper = 1) and
a thermally activated vent damper (Damper = 2). The energy consumption
data (BTU.In) are stacked in one column with a grouping column (Damper)
containing identifiers or subscripts to denote the population. You are
interested in comparing the variances of the two populations so that you
can construct a two-sample t-test and confidence interval to compare the
two dampers.

  1. Open the worksheet FURNACE.MTW.
  2. Choose Stat > Basic Statistics > 2 Variances.
  3. Choose Samples in one column.
  4. In Samples, enter 'BTU.In'.
  5. In Subscripts, enter Damper. Click OK.

Interpreting the results

The variance test generates a plot that displays Bonferroni 95%
confidence intervals for the population standard deviation at both factor
levels. The graph also displays the side-by-side boxplots of the raw data
for the two samples. Finally, the results of the F-test and Levene's test are
given in both the Session window and the graph. Interpret the F-test when
the data come from a normal distribution and Levene's test when the data
come from a continuous, but not necessarily normal, distribution. Note
that the 95% confidence level applies to the family of intervals and the
asymmetry of the intervals is due to the skewness of the chi-square
distribution. For the energy consumption example, the p-values of 0.558
and 0.996 are greater than reasonable choices of α, so you fail to reject
the null hypothesis of the variances being equal. That is, these data do
not provide enough evidence to claim that the two populations have
unequal variances. Thus, it is reasonable to assume equal variances when
using a two-sample t-procedure.
11. Example of Correlation
We have verbal and math SAT scores and first-year college grade-point
averages for 200 students and we wish to investigate the
relatedness of these variables. We use correlation with the default choice
for displaying p-values.
  1. Open the worksheet GRADES.MTW.
  2. Choose Stat > Basic Statistics > Correlation.
  3. In Variables, enter Verbal Math GPA. Click OK.

Interpreting the results

Minitab displays the correlation for the lower triangle of the
correlation matrix when there are more than two variables. The Pearson
correlation between Math and Verbal is 0.275, between GPA and Verbal is
0.322, and between GPA and Math is 0.194. Minitab prints the p-values
for the individual hypothesis tests of the correlations being zero below
the correlations. Since all the p-values are smaller than 0.01, there is
sufficient evidence at α = 0.01 that the correlations are not zero, in
part reflecting the large sample size of 200.
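
The effect of sample size on these tests can be made concrete. The sketch below (Python, standard library only) converts each reported correlation into the usual t statistic for testing a zero correlation. The p-values use a normal approximation to the t distribution with 198 degrees of freedom, which is an assumption made to avoid external libraries, so they are slightly smaller than exact values would be.

```python
from math import sqrt
from statistics import NormalDist

n = 200
correlations = {"Math-Verbal": 0.275, "GPA-Verbal": 0.322, "GPA-Math": 0.194}

def corr_test(r, n):
    """Return (t, approximate two-sided p-value) for H0: correlation = 0."""
    t = r * sqrt((n - 2) / (1 - r**2))
    # Normal approximation to the t distribution with n - 2 degrees of freedom.
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_approx

results = {pair: corr_test(r, n) for pair, r in correlations.items()}
for pair, (t, p) in results.items():
    print(f"{pair}: r = {correlations[pair]:.3f}, t = {t:.2f}, approx p = {p:.4f}")

# The same weak correlation with a much smaller (hypothetical) sample.
t_small, p_small = corr_test(0.194, 20)
print(f"r = 0.194 with n = 20: t = {t_small:.2f}, approx p = {p_small:.2f}")
```

With 200 students even a correlation of 0.194 is clearly distinguishable from zero, whereas the same correlation in a sample of 20 would not be.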


12. Example of Normality Test
In an operating engine, parts of the crankshaft move up and down.
AtoBDist is the distance (in mm) from the actual (A) position of a point on
the crankshaft to a baseline (B) position. To ensure production quality, a
manager took five measurements each working day in a car assembly
plant, from September 28 through October 15, and then ten per day from
the 18th through the 25th. You wish to see if these data follow a normal
distribution, so you use Normality test.
  1. Open the worksheet CRANKSH.MTW.
  2. Choose Stat > Basic Statistics > Normality Test.
  3. In Variable, enter AtoBDist. Click OK.

Interpreting the results


The graphical output is a plot of normal probabilities versus the data.
The data depart from the fitted line most evidently in the extremes, or
distribution tails. The Anderson-Darling test's p-value indicates that, at α
levels greater than 0.022, there is evidence that the data do not follow a
normal distribution. There is a slight tendency for these data to be lighter
in the tails than a normal distribution because the smallest points are
below the line and the largest point is just above the line. A distribution
with heavy tails would show the opposite pattern at the extremes.
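
For readers who want to see what the Anderson-Darling statistic measures, the following is a minimal Python sketch (standard library only) of the standard A-squared formula for a normal distribution with mean and standard deviation estimated from the data. It does not compute a p-value, and whether Minitab's implementation matches it in every detail is an assumption. The two samples are synthetic and are not the CRANKSH.MTW data.

```python
from math import log
from statistics import NormalDist, mean, stdev

def anderson_darling(data):
    """Return the A-squared statistic for normality, fitting mean and SD from the data."""
    x = sorted(data)
    n = len(x)
    fitted = NormalDist(mean(x), stdev(x))
    cdf = [fitted.cdf(v) for v in x]
    # Weighted sum over order statistics; the weights emphasize both tails.
    total = sum(
        (2 * i + 1) * (log(cdf[i]) + log(1 - cdf[n - 1 - i]))
        for i in range(n)
    )
    return -n - total / n

# Synthetic samples for illustration: evenly spaced quantiles of a normal
# distribution and of a right-skewed (exponential) distribution.
m = 30
near_normal = [NormalDist().inv_cdf((i + 0.5) / m) for i in range(m)]
skewed = [-log(1 - (i + 0.5) / m) for i in range(m)]

print(f"A-squared, near-normal sample: {anderson_darling(near_normal):.3f}")
print(f"A-squared, skewed sample:      {anderson_darling(skewed):.3f}")
```

Larger values of A-squared indicate a poorer fit to the normal distribution, which is why the skewed sample scores higher.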

http://statsdata.blogspot.com/
