Stat Review - Keller

What is Statistics?
Statistics is a way to get information from data.
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.
Definitions: Oxford English Dictionary
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Key Statistical Concepts

Population
A population is the group of all items of interest to a statistics practitioner.
Frequently very large; sometimes infinite.
E.g. all 5 million Florida voters, per Example 12.5.
Sample
A sample is a set of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.

Parameter
A descriptive measure of a population.
Statistic
A descriptive measure of a sample.

A sample is a subset of the population.
Populations have parameters; samples have statistics.
Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical techniques (Chapter 2), and
Numerical techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in
measure(s) of central location? and/or
measure(s) of variability (dispersion)?
Descriptive statistics helps to answer these questions.
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Diagram: a sample is drawn from the population; inference runs from the sample statistic back to the population parameter.]
What can we infer about a population's parameters based on a sample's statistics?
Definitions
A variable is some characteristic of a population or sample.
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z.
The values of the variable are the range of possible values for a variable.
E.g. student marks (0..100).
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Interval Data
Interval data
Real numbers, e.g. heights, weights, prices, etc.
Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, thus it is meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data
Nominal data
The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!).
Nominal data are also called qualitative or categorical.
Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order; a ranking to them:
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it is still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor, or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
Graphical & Tabular Techniques for Nominal Data
The only allowable calculation on nominal data is to count the frequency of each value of the variable.
We can summarize the data in a table that presents the categories and their counts, called a frequency distribution.
A relative frequency distribution lists the categories and the proportion with which each occurs.
Refer to Example 2.1.
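As a rough illustration of the two tables described above, the following Python sketch counts categories and converts the counts to proportions. The `responses` list is made-up data for illustration only (it is not the data of Example 2.1), and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical nominal responses (marital status); NOT data from the text.
responses = ["Single", "Married", "Married", "Divorced", "Single",
             "Married", "Widowed", "Married", "Single", "Married"]

def frequency_distribution(values):
    """Count how often each category occurs."""
    return Counter(values)

def relative_frequency_distribution(values):
    """Proportion of observations that fall in each category."""
    counts = Counter(values)
    total = len(values)
    return {category: count / total for category, count in counts.items()}

freq = frequency_distribution(responses)
rel_freq = relative_frequency_distribution(responses)

# Print a small table: category, count, proportion.
for category in freq:
    print(f"{category:10s} {freq[category]:3d} {rel_freq[category]:.2f}")
```

Counting is the only arithmetic used here, which matches the restriction on nominal data stated above.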
Nominal Data (Tabular Summary and Frequency)
[Figures: a frequency table and a bar chart of the same nominal data.]
Bar charts are often used to display frequencies.
It is all the same information (based on the same data), just a different presentation.
Graphical Techniques for Interval Data
There are several graphical methods that are used when the data are interval (i.e. numeric, non-categorical).
The most important of these graphical methods is the histogram.
The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data.
3) Draw the histogram.
Histogram and Stem & Leaf
Ogive
An ogive is a graph of a cumulative frequency distribution.
We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the current class' relative frequency to the previous class' cumulative relative frequency.
(For the first class, its cumulative relative frequency is just its relative frequency.)
3) Graph the cumulative relative frequencies.
Cumulative Relative Frequencies
first class: .355
next class: .355 + .185 = .540
:
:
last class: .930 + .070 = 1.00
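The running-sum step can be sketched in a few lines of Python. Only the first two relative frequencies (.355, .185) and the last one (.070) appear in the text; the middle values below are hypothetical fillers chosen so the total is 1.

```python
from itertools import accumulate

# Relative frequencies per class. The first two and the last come from the
# text; the four middle values are HYPOTHETICAL placeholders.
relative = [0.355, 0.185, 0.120, 0.100, 0.090, 0.080, 0.070]

# Each cumulative value = current class relative frequency
# + previous class cumulative relative frequency.
cumulative = [round(c, 3) for c in accumulate(relative)]

for rel, cum in zip(relative, cumulative):
    print(f"{rel:.3f}  {cum:.3f}")
```

Plotting `cumulative` against the upper class limits would give the ogive itself.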
Ogive
The ogive can be used to answer questions like:
What telephone bill value is at the 50th percentile?
Around $35.
(Refer also to Fig. 2.13 in your textbook.)
Scatter Diagram
Example 2.9: A real estate agent wanted to know to what extent the selling price of a home is related to its size.
1) Collect the data.
2) Determine the independent variable (X: house size) and the dependent variable (Y: selling price).
3) Use Excel to create a scatter diagram.
Scatter Diagram
It appears that in fact there is a relationship, that is, the greater the house size the greater the selling price.
Patterns of Scatter Diagrams

Linearity and direction are two concepts we are interested in.
[Figure panels: Positive Linear Relationship; Negative Linear Relationship; Weak or Non-Linear Relationship.]

Time Series Data

Observations measured at the same point in time are called cross-sectional data.
Observations measured at successive points in time are called time-series data.
Time-series data are graphed on a line chart, which plots the value of the variable on the vertical axis against the time periods on the horizontal axis.
Numerical Descriptive Techniques

Measures of Central Location
Mean, Median, Mode
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing
Percentiles, Quartiles
Measures of Linear Relationship
Covariance, Correlation, Least Squares Line
Measures of Central Location

The arithmetic mean, a.k.a. average, shortened to mean, is the most popular and useful measure of central location.
It is computed by simply adding up all the observations and dividing by the total number of observations:
\[ \text{Mean} = \frac{\text{Sum of the observations}}{\text{Number of observations}} \]
Arithmetic Mean
Population mean: \( \mu = \frac{\sum_{i=1}^{N} x_i}{N} \)
Sample mean: \( \bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} \)
Statistics is a pattern language

          Population   Sample
Size      \(N\)        \(n\)
Mean      \(\mu\)      \(\bar{x}\)
The Arithmetic Mean

It is appropriate for describing measurement data, e.g. heights of people, marks of student papers, etc.
It is seriously affected by extreme values called outliers.
E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously!
Measures of Variability
Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?
For example, two sets of class grades are shown. The mean (= 50) is the same in each case.
But the red class has greater variability than the blue class.
Range
The range is the simplest measure of variability, calculated as:
Range = Largest observation − Smallest observation
E.g.
Data: {4, 4, 4, 4, 50}
Range = 46
Data: {4, 8, 15, 24, 39, 50}
Range = 46
The range is the same in both cases, but the data sets have very different distributions.
Statistics is a pattern language

          Population       Sample
Size      \(N\)            \(n\)
Mean      \(\mu\)          \(\bar{x}\)
Variance  \(\sigma^2\)     \(s^2\)
Variance
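The variance of a population is:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
where \(\mu\) is the population mean and \(N\) is the population size.
The variance of a sample is:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} \]
where \(\bar{x}\) is the sample mean.
Note! The denominator is the sample size (n) minus one!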
Application
Example 4.7. The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance.
What are we looking to calculate?
Because the data are a sample, we want \(\bar{x}\) and \(s^2\), as opposed to \(\mu\) or \(\sigma^2\).
Sample Mean & Variance

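Sample mean:
\[ \bar{x} = \frac{\sum x_i}{n} = \frac{17 + 15 + 23 + 7 + 9 + 13}{6} = \frac{84}{6} = 14 \]
Sample variance:
\[ s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} = \frac{(17-14)^2 + (15-14)^2 + \dots + (13-14)^2}{6 - 1} = \frac{166}{5} = 33.2 \]
Sample variance (shortcut method):
\[ s^2 = \frac{1}{n - 1}\left[\sum x_i^2 - \frac{\left(\sum x_i\right)^2}{n}\right] = \frac{1}{5}\left[1342 - \frac{84^2}{6}\right] = 33.2 \]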
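The Example 4.7 calculation can be checked with a short Python sketch. It computes the sample variance both from the definition and with the shortcut formula, and compares against the standard library's `statistics.variance` (which also divides by n − 1). Variable names are illustrative.

```python
import statistics

# Number of jobs applied for by six randomly selected students (Example 4.7).
jobs = [17, 15, 23, 7, 9, 13]
n = len(jobs)

# Sample mean: sum of observations divided by number of observations.
mean = sum(jobs) / n

# Sample variance from the definition (denominator n - 1).
var_definition = sum((x - mean) ** 2 for x in jobs) / (n - 1)

# Sample variance via the shortcut method.
var_shortcut = (sum(x * x for x in jobs) - sum(jobs) ** 2 / n) / (n - 1)

# Sample standard deviation is the square root of the sample variance.
std_dev = var_definition ** 0.5

print(mean, var_definition, var_shortcut, round(std_dev, 2))
```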
Standard Deviation
The standard deviation is simply the square root of the variance, thus:
Population standard deviation: \( \sigma = \sqrt{\sigma^2} \)
Sample standard deviation: \( s = \sqrt{s^2} \)
Standard Deviation
Consider Example 4.8, where a golf club manufacturer has designed a new club and wants to determine if it is hit more consistently (i.e. with less variability) than with an old club.
Using Tools > Data Analysis > Descriptive Statistics in Excel (the Data Analysis tool may need to be added in), we produce the following tables for interpretation.
You get more consistent distance with the new club.
The Empirical Rule
If the histogram is bell shaped:
Approximately 68% of all observations fall within one standard deviation of the mean.
Approximately 95% of all observations fall within two standard deviations of the mean.
Approximately 99.7% of all observations fall within three standard deviations of the mean.
Chebysheff's Theorem
(Not often used because the interval is very wide.)
A more general interpretation of the standard deviation is derived from Chebysheff's Theorem, which applies to all shapes of histograms (not just bell shaped).
The proportion of observations in any sample that lie within k standard deviations of the mean is at least:
\[ 1 - \frac{1}{k^2} \quad \text{for } k > 1 \]
For k = 2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a lower bound compared to the Empirical Rule's approximation (95%).
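To see how conservative the Chebysheff bound is next to the Empirical Rule, here is a small Python sketch. The function names are hypothetical; the normal proportions come from the standard library's `statistics.NormalDist` and apply only when the data really are bell shaped.

```python
from statistics import NormalDist

def chebysheff_lower_bound(k):
    """Minimum proportion within k standard deviations, for any distribution."""
    if k <= 1:
        raise ValueError("The bound is only informative for k > 1")
    return 1 - 1 / k ** 2

def normal_proportion_within(k):
    """Proportion within k standard deviations for a normal distribution."""
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    return z.cdf(k) - z.cdf(-k)

for k in (2, 3):
    print(k, round(chebysheff_lower_bound(k), 4),
          round(normal_proportion_within(k), 4))
```

For k = 2 the bound is 0.75, while a normal distribution places roughly 95% of its observations in the same interval.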
Box Plots
These box plots are based on data in Xm0415.
Wendy's service time is shortest and least variable.
Hardee's has the greatest variability, while Jack in the Box has the longest service times.
Methods of Collecting Data

There are many methods used to collect or obtain data for statistical analysis. Three of the most popular methods are:
Direct observation,
Experiments, and
Surveys.
Sampling
Recall that statistical inference permits us to draw conclusions about a population based on a sample.
Sampling (i.e. selecting a subset of a whole population) is often done for reasons of cost (it is less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical).
In any case, the sampled population and the target population should be similar to one another.
Sampling Plans
A sampling plan is just a method or procedure for specifying how a sample will be taken from a population.
We will focus our attention on these three methods:
Simple random sampling,
Stratified random sampling, and
Cluster sampling.
Simple Random Sampling

A simple random sample is a sample selected in such a way that every possible sample of the same size is equally likely to be chosen.
Drawing three names from a hat containing all the names of the students in the class is an example of a simple random sample: any group of three names is as likely to be picked as any other group of three names.
Stratified Random Sampling

After the population has been stratified, we can use simple random sampling to generate the complete sample:
If we only have sufficient resources to sample 400 people total, we would draw 100 of them from the low income group.
If we are sampling 1,000 people, we'd draw 50 of them from the high income group.
Cluster Sampling
A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects).
This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
Cluster sampling may increase sampling error due to similarities among cluster members.
Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
Another way to look at this is: the differences in results for different samples (of the same size) are due to sampling error.
E.g. two samples of size 10 of 1,000 households. If we happened to get the highest income level data points in our first sample and all the lowest income levels in the second, this delta is due to sampling error.
Nonsampling Error
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors:
Errors in data acquisition,
Nonresponse errors, and
Selection bias.
Note: increasing the sample size will not reduce this type of error.
Approaches to Assigning Probabilities
There are three ways to assign a probability, \(P(O_i)\), to an outcome, \(O_i\), namely:
Classical approach: make certain assumptions (such as equally likely, independence) about the situation.
Relative frequency: assigning probabilities based on experimentation or historical data.
Subjective approach: assigning probabilities based on the assignor's judgment.
Interpreting Probability
One way to interpret probability is this:
If a random experiment is repeated an infinite number of times, the relative frequency for any given outcome is the probability of this outcome.
For example, the probability of heads in a flip of a balanced coin is .5, determined using the classical approach. The probability is interpreted as being the long-term relative frequency of heads if the coin is flipped an infinite number of times.
Conditional Probability
Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
Conditional probabilities are written as P(A | B), read as "the probability of A given B", and calculated as:
\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]
Independence
One of the objectives of calculating conditional probability is to determine whether two events are related.
In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.
Two events A and B are said to be independent if
P(A | B) = P(A)
or
P(B | A) = P(B)
Complement Rule
The complement of an event A is the event that occurs when A does not occur.
The complement rule gives us the probability of an event NOT occurring. That is:
\( P(A^C) = 1 - P(A) \)
For example, in the simple roll of a die, the probability of the number 1 being rolled is 1/6. The probability that some number other than 1 will be rolled is 1 − 1/6 = 5/6.
Multiplication Rule
The multiplication rule is used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier:
\[ P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)} \]
If we multiply both sides of the equation by P(B) we have:
P(A and B) = P(A | B) P(B)
Likewise, P(A and B) = P(B | A) P(A)
If A and B are independent events, then P(A and B) = P(A) P(B)
Addition Rule
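Recall: the addition rule was introduced earlier to provide a way to compute the probability of event A or B or both A and B occurring; i.e. the union of A and B.
P(A or B) = P(A) + P(B) − P(A and B)
Why do we subtract the joint probability P(A and B) from the sum of the probabilities of A and B?
Because outcomes that belong to both A and B are counted once in P(A) and again in P(B); subtracting P(A and B) removes the double count.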
Addition Rule for Mutually Exclusive Events
If A and B are mutually exclusive, the occurrence of one event makes the other one impossible. This means that
P(A and B) = 0
The addition rule for mutually exclusive events is
P(A or B) = P(A) + P(B)
We often use this form when we add some joint probabilities calculated from a probability tree.
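The probability rules above can be checked on a single roll of a fair die. This Python sketch uses exact fractions; the events A ("even number") and B ("greater than 3") are hypothetical choices made for illustration, not examples from the text.

```python
from fractions import Fraction

outcomes = {1, 2, 3, 4, 5, 6}  # one roll of a fair die

def prob(event):
    """Classical probability: favourable outcomes / equally likely outcomes."""
    return Fraction(len(event & outcomes), len(outcomes))

A = {2, 4, 6}  # hypothetical event: an even number is rolled
B = {4, 5, 6}  # hypothetical event: a number greater than 3 is rolled

p_a_and_b = prob(A & B)                    # joint probability
p_a_given_b = p_a_and_b / prob(B)          # conditional probability
p_a_or_b = prob(A) + prob(B) - p_a_and_b   # addition rule
p_not_one = 1 - prob({1})                  # complement rule

# A and B are independent only if P(A | B) equals P(A).
independent = p_a_given_b == prob(A)

print(p_a_and_b, p_a_given_b, p_a_or_b, p_not_one, independent)
```

Here P(A | B) = 2/3 differs from P(A) = 1/2, so these two events are not independent, and the addition rule agrees with counting the union {2, 4, 5, 6} directly.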
Two Types of Random Variables

Discrete random variable
One that takes on a countable number of values.
E.g. values on the roll of dice: 2, 3, 4, …, 12
Continuous random variable
One whose values are not discrete, not countable.
E.g. time (30.1 minutes? 30.10000001 minutes?)
Analogy:
Integers are discrete, while real numbers are continuous.
Laws of Expected Value

1. E(c) = c
The expected value of a constant (c) is just the value of the constant.
2. E(X + c) = E(X) + c
3. E(cX) = cE(X)
We can pull a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).
Laws of Variance
1. V(c) = 0
The variance of a constant (c) is zero.
2. V(X + c) = V(X)
The variance of a random variable plus a constant is just the variance of the random variable (per 1 above).
3. V(cX) = c²V(X)
The variance of a random variable with a constant coefficient is the coefficient squared times the variance of the random variable.
Binomial Distribution
The binomial distribution is the probability distribution that results from doing a binomial experiment. Binomial experiments have the following properties:
1. Fixed number of trials, represented as n.
2. Each trial has two possible outcomes, a success and a failure.
3. P(success) = p (and thus: P(failure) = 1 − p), for all trials.
4. The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.
Binomial Random Variable

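The binomial random variable counts the number of successes in n trials of the binomial experiment. It can take on values from 0, 1, 2, …, n. Thus, it is a discrete random variable.
To calculate the probability associated with each value we use combinatorics:
\[ P(x) = \frac{n!}{x!\,(n - x)!}\, p^x (1 - p)^{n - x} \quad \text{for } x = 0, 1, 2, \dots, n \]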
Binomial Table
What is the probability that Pat fails the quiz?
i.e. what is P(X ≤ 4), given P(success) = .20 and n = 10?
P(X ≤ 4) = .967
What is the probability that Pat gets two answers correct?
i.e. what is P(X = 2), given P(success) = .20 and n = 10?
P(X = 2) = P(X ≤ 2) − P(X ≤ 1) = .678 − .376 = .302
Remember, the table shows cumulative probabilities.
=BINOMDIST() Excel Function
There is a binomial distribution function in Excel that can also be used to calculate these probabilities. Its arguments are, in order: # successes, # trials, P(success), and cumulative (i.e. P(X ≤ x)?).
What is the probability that Pat gets two answers correct?
=BINOMDIST(2, 10, .2, FALSE) gives P(X = 2) = .3020
What is the probability that Pat fails the quiz?
=BINOMDIST(4, 10, .2, TRUE) gives P(X ≤ 4) = .9672
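For readers without Excel or the binomial table at hand, the same two probabilities can be reproduced with a short Python sketch. The function names `binomial_pmf` and `binomial_cdf` are illustrative, not part of any library.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x): probability of exactly x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

def binomial_cdf(x, n, p):
    """P(X <= x): cumulative probability, as shown in the binomial table."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))

n, p = 10, 0.20  # Pat's quiz: 10 questions, P(success) = .20

p_two_correct = binomial_pmf(2, n, p)   # P(X = 2)
p_fails = binomial_cdf(4, n, p)         # P(X <= 4)

# Same value for P(X = 2), obtained the table way: P(X <= 2) - P(X <= 1).
p_two_from_table = binomial_cdf(2, n, p) - binomial_cdf(1, n, p)

print(round(p_two_correct, 4), round(p_fails, 4))
```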
Binomial Distribution
As you might expect, statisticians have developed general formulas for the mean, variance, and standard deviation of a binomial random variable. They are:
\[ \mu = np, \qquad \sigma^2 = np(1 - p), \qquad \sigma = \sqrt{np(1 - p)} \]
Poisson Distribution
Named for Simeon Poisson, the Poisson distribution is a discrete probability distribution and refers to the number of events (a.k.a. successes) within a specific time period or region of space. For example:
The number of cars arriving at a service station in 1 hour. (The interval of time is 1 hour.)
The number of flaws in a bolt of cloth. (The specific region is a bolt of cloth.)
The number of accidents in 1 day on a particular stretch of highway. (The interval is defined by both time, 1 day, and space, the particular stretch of highway.)
The Poisson Experiment

Like a binomial experiment, a Poisson experiment has four defining characteristic properties:
1. The number of successes that occur in any interval is independent of the number of successes that occur in any other interval.
2. The probability of a success in an interval is the same for all equal-size intervals.
3. The probability of a success is proportional to the size of the interval.
4. The probability of more than one success in an interval approaches 0 as the interval becomes smaller.
The Poisson random variable is the number of successes that occur in a period of time or an interval of space in a Poisson experiment.
E.g. On average, 96 trucks (successes) arrive at a border crossing every hour (time period).
E.g. The number of typographic errors (successes?!) in a new textbook edition averages 1.5 per 100 pages (interval).
Poisson Probability Distribution

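The probability that a Poisson random variable assumes a value of x is given by:
\[ P(x) = \frac{e^{-\mu} \mu^x}{x!} \quad \text{for } x = 0, 1, 2, \dots \]
where \(\mu\) is the mean number of successes in the interval and e is the natural logarithm base.
FYI: e ≈ 2.71828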
Example 7.12
The number of typographical errors in new editions of textbooks varies considerably from book to book. After some analysis an instructor concludes that the number of errors is Poisson distributed with a mean of 1.5 per 100 pages. The instructor randomly selects 100 pages of a new book. What is the probability that there are no typos?
That is, what is P(X = 0) given that \(\mu\) = 1.5?
\[ P(0) = \frac{e^{-1.5}\,(1.5)^0}{0!} = .2231 \]
There is about a 22% chance of finding zero errors.
As mentioned on the Poisson experiment slide:
The probability of a success is proportional to the size of the interval.
Thus, knowing an error rate of 1.5 typos per 100 pages, we can determine a mean value for a 400-page book as:
\(\mu\) = 1.5(4) = 6 typos / 400 pages.
Example 7.13
For a 400-page book, what is the probability that there are no typos?
\[ P(X = 0) = \frac{e^{-6}\,(6)^0}{0!} = .002479 \]
There is a very small chance there are no typos.
Excel is an even better alternative:
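The Excel screenshot did not survive extraction, so as a stand-in the two Poisson probabilities from Examples 7.12 and 7.13 can be reproduced with a minimal Python sketch. `poisson_pmf` is an illustrative helper name, not a library function.

```python
from math import exp, factorial

def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)

# Example 7.12: mean of 1.5 typos per 100 pages, 100 pages sampled.
p_no_typos_100_pages = poisson_pmf(0, 1.5)

# Example 7.13: the mean scales with the interval, so 400 pages -> 1.5 * 4 = 6.
mu_400_pages = 1.5 * 4
p_no_typos_400_pages = poisson_pmf(0, mu_400_pages)

print(round(p_no_typos_100_pages, 4), round(p_no_typos_400_pages, 6))
```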
Probability Density Functions

Unlike a discrete random variable, which we studied in Chapter 7, a continuous random variable is one that can assume an uncountable number of values.
We cannot list the possible values because there is an infinite number of them.
Because there is an infinite number of values, the probability of each individual value is virtually 0.
Point Probabilities are Zero

Because there is an infinite number of values, the probability of each individual value is virtually 0.
Thus, we can determine the probability of a range of values only.
E.g. with a discrete random variable like tossing a die, it is meaningful to talk about P(X = 5), say.
In a continuous setting (e.g. with time as a random variable), the probability that the random variable of interest, say task length, takes exactly 5 minutes is infinitesimally small, hence P(X = 5) = 0.
It is meaningful to talk about P(X ≤ 5).
Probability Density Function

A function f(x) is called a probability density function (over the range a ≤ x ≤ b) if it meets the following requirements:
1) f(x) ≥ 0 for all x between a and b, and
2) The total area under the curve between a and b is 1.0.
The Normal Distribution

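The normal distribution is the most important of all probability distributions. The probability density function of a normal random variable is given by:
\[ f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty \]
It looks like this:
Bell shaped,
Symmetrical around the mean.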
The Normal Distribution

Important things to note:
The normal distribution is fully defined by two parameters: its standard deviation and mean.
The normal distribution is bell shaped and symmetrical about the mean.
Unlike the range of the uniform distribution (a ≤ x ≤ b), normal distributions range from minus infinity to plus infinity.
Standard Normal Distribution

A normal distribution whose mean is zero and standard deviation is one is called the standard normal distribution (\(\mu = 0\), \(\sigma = 1\)).
As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.
Calculating Normal Probabilities

We can use the following function to convert any normal random variable to a standard normal random variable:
\[ Z = \frac{X - \mu}{\sigma} \]
Some advice: always draw a picture!

Example: The time required to build a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes.
What is the probability that a computer is assembled in a time between 45 and 60 minutes?
Algebraically speaking, what is P(45 < X < 60)?
\[ P(45 < X < 60) = P\left(\frac{45 - 50}{10} < \frac{X - \mu}{\sigma} < \frac{60 - 50}{10}\right) = P(-.5 < Z < 1) \]

We can use Table 3 in Appendix B to look up probabilities P(0 < Z < z).
We can break up P(−.5 < Z < 1) into:
P(−.5 < Z < 0) + P(0 < Z < 1)
The distribution is symmetric around zero, so we have:
P(−.5 < Z < 0) = P(0 < Z < .5)
Hence: P(−.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)

How to use Table 3
This table gives probabilities P(0 < Z < z).
First column = integer + first decimal
Top row = second decimal place
P(0 < Z < 0.5) = .1915
P(0 < Z < 1) = .3413
P(−.5 < Z < 1) = .1915 + .3413 = .5328
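The table lookup for the computer-assembly example can be cross-checked in Python with the standard library's `statistics.NormalDist`. This is a sketch: it works with the cumulative probability P(Z < z) rather than the P(0 < Z < z) form of Table 3, and the variable names are illustrative.

```python
from statistics import NormalDist

# Assembly time: normal with mean 50 minutes and standard deviation 10 minutes.
build_time = NormalDist(mu=50, sigma=10)

# Direct calculation of P(45 < X < 60) from the cumulative distribution.
p_direct = build_time.cdf(60) - build_time.cdf(45)

# Same probability after standardizing: Z = (X - mu) / sigma.
z_low = (45 - 50) / 10    # -0.5
z_high = (60 - 50) / 10   #  1.0
standard_normal = NormalDist()  # mean 0, standard deviation 1
p_standardized = standard_normal.cdf(z_high) - standard_normal.cdf(z_low)

print(round(p_direct, 4), round(p_standardized, 4))
```

Both routes give the same answer, which is the point of the conversion to Z.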
Using the Normal Table (Table 3)

What is P(Z > 1.6)?
P(0 < Z < 1.6) = .4452
P(Z > 1.6) = .5 − P(0 < Z < 1.6)
= .5 − .4452
= .0548

What is P(Z < −2.23)?
P(0 < Z < 2.23) = .4871
P(Z < −2.23) = P(Z > 2.23)
= .5 − P(0 < Z < 2.23)
= .5 − .4871
= .0129

What is P(Z < 1.52)?
P(Z < 0) = .5
P(0 < Z < 1.52) = .4357
P(Z < 1.52) = .5 + P(0 < Z < 1.52)
= .5 + .4357
= .9357

What is P(0.9 < Z < 1.9)?
P(0.9 < Z < 1.9) = P(0 < Z < 1.9) − P(0 < Z < 0.9)
= .4713 − .3159
= .1554
Finding Values of Z
Other Z values are
\(z_{.05}\) = 1.645
\(z_{.01}\) = 2.33
Using the values of Z

Because \(z_{.025}\) = 1.96 and \(-z_{.025}\) = −1.96, it follows that we can state
P(−1.96 < Z < 1.96) = .95
Similarly
P(−1.645 < Z < 1.645) = .90
Other Continuous Distributions

Three other important continuous distributions which will be used extensively in later sections are introduced here:
Student t distribution,
Chi-squared distribution, and
F distribution.
Student t Distribution
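Here the letter t is used to represent the random variable, hence the name. The density function for the Student t distribution is as follows:
\[ f(t) = \frac{\Gamma\left[(\nu + 1)/2\right]}{\sqrt{\nu\pi}\;\Gamma(\nu/2)} \left[1 + \frac{t^2}{\nu}\right]^{-(\nu + 1)/2} \]
\(\nu\) (nu) is called the degrees of freedom, and
\(\Gamma\) (Gamma function) is \(\Gamma(k) = (k - 1)(k - 2)\cdots(2)(1)\)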
Student t Distribution
In much the same way that \(\mu\) and \(\sigma\) define the normal distribution, \(\nu\), the degrees of freedom, defines the Student t distribution:
Figure 8.24
As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.
Determining Student t Values

The Student t distribution is used extensively in statistical inference. Table 4 in Appendix B lists values of \(t_{A,\nu}\).
That is, values of a Student t random variable with \(\nu\) degrees of freedom such that:
\[ P(t > t_{A,\nu}) = A \]
The values for A are predetermined critical values, typically in the 10%, 5%, 2.5%, 1% and 1/2% range.
Using the t table (Table 4) for values
For example, if we want the value of t with 10 degrees of freedom such that the area under the Student t curve to its right is .05:
Area under the curve value (\(t_A\)): COLUMN
Degrees of freedom: ROW
\(t_{.05,10}\) = 1.812
F Distribution
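The F density function is given by:
\[ f(F) = \frac{\Gamma\left(\frac{\nu_1 + \nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)} \left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2} \frac{F^{(\nu_1 - 2)/2}}{\left(1 + \frac{\nu_1 F}{\nu_2}\right)^{(\nu_1 + \nu_2)/2}}, \quad F > 0 \]
Two parameters define this distribution and, as we have already seen, these are again degrees of freedom.
\(\nu_1\) is the numerator degrees of freedom and
\(\nu_2\) is the denominator degrees of freedom.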
Determining Values of F
For example, what is the value of F for 5% of the area under the right-hand tail of the curve, with a numerator degree of freedom of 3 and a denominator degree of freedom of 7?
Solution: use the F lookup (Table 6).
There are different tables for different values of A. Make sure you start with the correct table!!
Numerator degrees of freedom: COLUMN
Denominator degrees of freedom: ROW
\(F_{.05,3,7}\) = 4.35
Determining Values of F
For areas under the curve on the left-hand side of the curve, we can leverage the following relationship:
\[ F_{1-A,\nu_1,\nu_2} = \frac{1}{F_{A,\nu_2,\nu_1}} \]
Pay close attention to the order of the terms!
Chapter 9
Sampling Distributions
Sampling Distribution of the Mean

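A fair die is thrown infinitely many times, with the random variable X = # of spots on any throw.
The probability distribution of X is:
x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6
and the mean and variance are calculated as well:
\[ \mu = \sum x\,P(x) = 3.5, \qquad \sigma^2 = \sum (x - \mu)^2 P(x) = 2.92 \]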
Sampling Distribution of Two Dice

A sampling distribution is created by looking at all samples of size n = 2 (i.e. two dice) and their means.
While there are 36 possible samples of size 2, there are only 11 values for \(\bar{x}\), and some (e.g. \(\bar{x}\) = 3.5) occur more frequently than others (e.g. \(\bar{x}\) = 1).
Sampling Distribution of Two Dice

The sampling distribution of \(\bar{x}\) is shown below:
\(\bar{x}\)   \(P(\bar{x})\)
1.0   1/36
1.5   2/36
2.0   3/36
2.5   4/36
3.0   5/36
3.5   6/36
4.0   5/36
4.5   4/36
5.0   3/36
5.5   2/36
6.0   1/36
Compare
Compare the distribution of X with the sampling distribution of \(\bar{x}\).
As well, note that:
\[ \mu_{\bar{x}} = \mu, \qquad \sigma^2_{\bar{x}} = \frac{\sigma^2}{2} \]
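The two-dice sampling distribution can be built by brute force, which makes the relationship between the population and the sample mean concrete. The sketch below enumerates all 36 samples of size 2 with exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

faces = range(1, 7)  # one fair die

# Population mean and variance of a single throw.
mu = Fraction(sum(faces), 6)
variance = sum((x - mu) ** 2 for x in faces) / 6

# Enumerate all 36 equally likely samples of size n = 2 and their means.
samples = list(product(faces, repeat=2))
mean_counts = Counter(Fraction(a + b, 2) for a, b in samples)
sampling_dist = {m: Fraction(c, len(samples))
                 for m, c in sorted(mean_counts.items())}

# Mean and variance of the sampling distribution of the sample mean.
mu_xbar = sum(m * p for m, p in sampling_dist.items())
var_xbar = sum((m - mu_xbar) ** 2 * p for m, p in sampling_dist.items())

for m, p in sampling_dist.items():
    print(float(m), p)
print(mu, variance, mu_xbar, var_xbar)
```

The sample mean has the same mean as X but half the variance, which is the n = 2 case of dividing the population variance by the sample size.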
Central Limit Theorem

The sampling distribution of the mean of a random sample drawn from any population is approximately normal for a sufficiently large sample size.
The larger the sample size, the more closely the sampling distribution of \(\bar{X}\) will resemble a normal distribution.
If the population is normal, then \(\bar{X}\) is normally distributed for all values of n.
If the population is non-normal, then \(\bar{X}\) is approximately normal only for larger values of n.
In many practical situations, a sample size of 30 may be sufficiently large to allow us to use the normal distribution as an approximation for the sampling distribution of \(\bar{X}\).
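Sampling Distribution of the Sample Mean
1. \(\mu_{\bar{x}} = \mu\)
2. \(\sigma^2_{\bar{x}} = \sigma^2 / n\) and \(\sigma_{\bar{x}} = \sigma / \sqrt{n}\)
3. If X is normal, \(\bar{X}\) is normal. If X is non-normal, \(\bar{X}\) is approximately normal for sufficiently large sample sizes.
Note: the definition of "sufficiently large" depends on the extent of non-normality of X (e.g. heavily skewed; multimodal).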
Example 9.1(a)
The foreman of a bottling plant has observed that the amount of soda in each "32-ounce" bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.
If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?
We want to find P(X > 32), where X is normally distributed and \(\mu\) = 32.2 and \(\sigma\) = .3:
\[ P(X > 32) = P\left(\frac{X - \mu}{\sigma} > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = .7486 \]
There is about a 75% chance that a single bottle of soda contains more than 32 oz.
Example 9.1(b)
The foreman of a bottling plant has observed that the amount of soda in each "32-ounce" bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.
If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?
We want to find P(\(\bar{X}\) > 32), where X is normally distributed with \(\mu\) = 32.2 and \(\sigma\) = .3.
Things we know:
1) X is normally distributed, therefore so will \(\bar{X}\).
2) \(\mu_{\bar{x}} = \mu\) = 32.2 oz.
3) \(\sigma_{\bar{x}} = \sigma / \sqrt{n} = .3 / \sqrt{4} = .15\)
\[ P(\bar{X} > 32) = P\left(\frac{\bar{X} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} > \frac{32 - 32.2}{.15}\right) = P(Z > -1.33) = .9082 \]
There is about a 91% chance the mean of the four bottles will exceed 32 oz.
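Both parts of Example 9.1 can be checked with a short Python sketch using `statistics.NormalDist`. The results differ slightly from the table-based answers because the table rounds z to two decimals; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces of soda per bottle

# (a) One bottle: X is normal with mean mu and standard deviation sigma.
p_one_bottle = 1 - NormalDist(mu, sigma).cdf(32)

# (b) Mean of four bottles: the standard error is sigma / sqrt(n).
n = 4
standard_error = sigma / sqrt(n)
p_mean_of_four = 1 - NormalDist(mu, standard_error).cdf(32)

print(round(p_one_bottle, 4), round(p_mean_of_four, 4))
```

The second probability is larger because the sample mean varies less than a single observation.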
Graphically Speaking
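[Figure: two normal curves centered at mean = 32.2. The first shades the area answering "what is the probability that one bottle will contain more than 32 ounces?"; the second, narrower curve shades the area answering "what is the probability that the mean of four bottles will exceed 32 oz?"]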
Sampling Distribution: Difference of Two Means
The final sampling distribution introduced is that of the difference between two sample means. This requires:
independent random samples be drawn from each of two normal populations.
If this condition is met, then the sampling distribution of the difference between the two sample means, i.e. \(\bar{x}_1 - \bar{x}_2\), will be normally distributed.
(Note: if the two populations are not both normally distributed, but the sample sizes are large (> 30), the distribution of \(\bar{x}_1 - \bar{x}_2\) is approximately normal.)
The mean and standard deviation of the sampling distribution of \(\bar{x}_1 - \bar{x}_2\) are given by:
mean: \( \mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2 \)
standard deviation: \( \sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \)
(also called the standard error of the difference between two means)
Estimation
There are two types of inference: estimation and hypothesis testing; estimation is introduced first.
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
E.g., the sample mean (\(\bar{x}\)) is employed to estimate the population mean (\(\mu\)).
There are two types of estimators:
Point estimator
Interval estimator
Point & Interval Estimation

For example, suppose we want to estimate the mean summer income of a class of business students. For n = 25 students, \(\bar{x}\) is calculated to be 400 $/week.
Point estimate: the mean income is 400 $/week.
Interval estimate: an alternative statement is:
The mean income is between 380 and 420 $/week.
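Estimating \(\mu\) when \(\sigma\) is known
We established in Chapter 9:
\[ P\left(-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha \]
Thus, the probability that the interval
\[ \left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\;\; \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right) \]
contains the population mean \(\mu\) is \(1 - \alpha\). This is a confidence interval estimator for \(\mu\); the sample mean is in the center of the interval.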
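Four commonly used confidence levels (Table 10.1; cut & keep handy!)
Confidence level \(1 - \alpha\)   \(\alpha\)   \(\alpha/2\)   \(z_{\alpha/2}\)
.90                               .10          .05            1.645
.95                               .05          .025           1.96
.98                               .02          .01            2.33
.99                               .01          .005           2.575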
Example 10.1
A computer company samples demand during lead time over 25 time periods:
235 421 394 261 386
374 361 439 374 316
309 514 348 302 296
499 462 344 466 332
253 369 330 535 334
It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.
Example 10.1
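CALCULATE
In order to use our confidence interval estimator, we need the following pieces of data:
\(\bar{x}\) = 370.16 (calculated from the data)
\(z_{\alpha/2} = z_{.025}\) = 1.96, \(\sigma\) = 75, n = 25 (given)
therefore:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\frac{75}{\sqrt{25}} = 370.16 \pm 29.40 \]
The lower and upper confidence limits are 340.76 and 399.56.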
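The Example 10.1 interval can be recomputed from the raw data with a small Python sketch. It assumes, as the example does, that the population standard deviation (75) is known; the z value is obtained from `statistics.NormalDist().inv_cdf` instead of a table, so it is 1.95996... rather than the rounded 1.96.

```python
from math import sqrt
from statistics import NormalDist

# Demand during lead time over 25 periods (Example 10.1).
demand = [235, 421, 394, 261, 386, 374, 361, 439, 374, 316,
          309, 514, 348, 302, 296, 499, 462, 344, 466, 332,
          253, 369, 330, 535, 334]

sigma = 75          # known population standard deviation
confidence = 0.95   # confidence level, 1 - alpha

n = len(demand)
x_bar = sum(demand) / n

# z_{alpha/2}: the value with area alpha/2 in the upper tail.
alpha = 1 - confidence
z = NormalDist().inv_cdf(1 - alpha / 2)

half_width = z * sigma / sqrt(n)
lower, upper = x_bar - half_width, x_bar + half_width

print(round(x_bar, 2), round(lower, 2), round(upper, 2))
```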
Example 10.1
INTERPRET
The estimate for the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy.
That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect.
Incidentally, the media often refer to the 95% figure as "19 times out of 20," which emphasizes the long-run aspect of the confidence level.
Interval Width
A wide interval provides little information.
For example, suppose we estimate with 95% confidence that an accountant's average starting salary is between $15,000 and $100,000.
Contrast this with: a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.
The second estimate is much narrower, providing accounting students more precise information about starting salaries.
The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size.
Selecting the Sample Size

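We can control the width of the interval by determining the sample size necessary to produce narrow intervals.
Suppose we want to estimate the mean demand to within 5 units; i.e. we want the interval estimate to be:
\[ \bar{x} \pm 5 \]
Since:
\[ \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \]
it follows that
\[ z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 5 \]
Solve for n to get the requisite sample size!
Solving the equation:
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{5}\right)^2 = \left(\frac{(1.96)(75)}{5}\right)^2 = 864.36 \]
That is, to produce a 95% confidence interval estimate of the mean (±5 units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).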
Sample Size to Estimate a Mean

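The general formula for the sample size needed to estimate a population mean with an interval estimate of:
\[ \bar{x} \pm W \]
requires a sample size of at least this large:
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 \]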
Example 10.2
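A lumber company must estimate the mean diameter of trees to determine whether or not there is sufficient lumber to harvest an area of forest. They need to estimate this to within 1 inch at a confidence level of 99%. The tree diameters are normally distributed with a standard deviation of 6 inches.
How many trees need to be sampled?
Things we know:
Confidence level = 99%, therefore \(\alpha\) = .01 and \(z_{\alpha/2} = z_{.005}\) = 2.575.
We want \(\bar{x} \pm 1\), hence W = 1.
We are given that \(\sigma\) = 6.
We compute
\[ n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{(2.575)(6)}{1}\right)^2 = 239 \]
That is, we will need to sample at least 239 trees to have a 99% confidence interval of \(\bar{x} \pm 1\).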
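The sample-size formula is easy to wrap in a helper, sketched here in Python. The function name `sample_size_for_mean` is hypothetical; the z values are the table values used in the text, and the result is rounded up because a fractional observation cannot be sampled.

```python
from math import ceil

def sample_size_for_mean(z, sigma, w):
    """Smallest n such that z * sigma / sqrt(n) <= w (estimate within +/- w)."""
    return ceil((z * sigma / w) ** 2)

# Mean demand to within 5 units at 95% confidence (z = 1.96, sigma = 75).
n_demand = sample_size_for_mean(1.96, 75, 5)

# Example 10.2: tree diameter to within 1 inch at 99% confidence
# (z = 2.575, sigma = 6).
n_trees = sample_size_for_mean(2.575, 6, 1)

print(n_demand, n_trees)
```

Halving W quadruples the required sample size, since W enters the formula squared.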
Nonstatistical Hypothesis Testing

A criminal trial is an example of hypothesis testing without the statistics.
In a trial a jury must decide between two hypotheses. The null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis is
H1: The defendant is guilty
The jury does not know which hypothesis is true. They must make a decision on the basis of evidence presented.

There are two possible errors.
A Type I error occurs when we reject a true null hypothesis. That is, a Type I error occurs when the jury convicts an innocent person.
A Type II error occurs when we don't reject a false null hypothesis. That occurs when a guilty defendant is acquitted.

The probability of a Type I error is denoted as \(\alpha\) (Greek letter alpha). The probability of a Type II error is \(\beta\) (Greek letter beta).
The two probabilities are inversely related. Decreasing one increases the other.

The critical concepts are these:
1. There are two hypotheses, the null and the alternative hypotheses.
2. The procedure begins with the assumption that the null hypothesis is true.
3. The goal is to determine whether there is enough evidence to infer that the alternative hypothesis is true.
4. There are two possible decisions:
Conclude that there is enough evidence to support the alternative hypothesis.
Conclude that there is not enough evidence to support the alternative hypothesis.
5. Two possible errors can be made.
Type I error: Reject a true null hypothesis.
Type II error: Do not reject a false null hypothesis.
P(Type I error) = \(\alpha\)
P(Type II error) = \(\beta\)
Concepts of Hypothesis Testing (1)

There are two hypotheses. One is called the null hypothesis and the other the alternative or research hypothesis. The usual notation is:
H0: the null hypothesis (pronounced "H nought")
H1: the alternative or research hypothesis
The null hypothesis (H0) will always state that the parameter equals the value specified in the alternative hypothesis (H1).
Concepts of Hypothesis Testing

Consider Example 10.1 (mean demand for computers during assembly lead time) again. Rather than estimate the mean demand, our operations manager wants to know whether the mean is different from 350 units. We can rephrase this request into a test of the hypothesis:
H0: \(\mu = 350\)
Thus, our research hypothesis becomes:
H1: \(\mu \neq 350\) (this is what we are interested in determining)
Concepts of Hypothesis Testing (4)

There are two possible decisions that can be made:
Conclude that there is enough evidence to support the alternative hypothesis
(also stated as: rejecting the null hypothesis in favor of the alternative)
Conclude that there is not enough evidence to support the alternative hypothesis
(also stated as: not rejecting the null hypothesis in favor of the alternative)
NOTE: we do not say that we accept the null hypothesis.
Concepts of Hypothesis Testing

Once the null and alternative hypotheses are stated, the next step is to randomly sample the population and calculate a test statistic (in this example, the sample mean).
If the test statistic's value is inconsistent with the null hypothesis we reject the null hypothesis and infer that the alternative hypothesis is true.
For example, if we're trying to decide whether the mean is not equal to 350, a large value of \(\bar{x}\) (say, 600) would provide enough evidence. If \(\bar{x}\) is close to 350 (say, 355) we could not say that this provides a great deal of evidence to infer that the population mean is different from 350.
Types of Errors
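A Type I error occurs when we reject a true null hypothesis (i.e. reject H0 when it is TRUE).
A Type II error occurs when we don't reject a false null hypothesis (i.e. do NOT reject H0 when it is FALSE).

Decision            H0 is TRUE        H0 is FALSE
Reject H0           Type I error      correct decision
Do not reject H0    correct decision  Type II error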
Recap I
1) Two hypotheses: H0 & H1
2) ASSUME H0 is TRUE
3) GOAL: determine if there is enough evidence to infer that H1 is TRUE
4) Two possible decisions:
   Reject H0 in favor of H1
   NOT Reject H0 in favor of H1
5) Two possible types of errors:
   Type I: reject a true H0 [P(Type I) = \(\alpha\)]
   Type II: not reject a false H0 [P(Type II) = \(\beta\)]
Example 11.1
A department store manager determines that a new billing system will be cost effective only if the mean monthly account is more than $170.
A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. The accounts are approximately normally distributed with a standard deviation of $65.
Can we conclude that the new system will be cost effective?
Example 11.1
The system will be cost effective if the mean account balance for all customers is greater than $170.
We express this belief as our research hypothesis, that is:
H1: \(\mu > 170\) (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: \(\mu = 170\) (this specifies a single value for the parameter of interest)
Example 11.1
What we want to show:
H1: \(\mu > 170\)
H0: \(\mu = 170\) (we'll assume this is true)
We know:
n = 400,
\(\bar{x} = 178\), and
\(\sigma = 65\)
Hmm. What to do next?!
Example 11.1
To test our hypotheses, we can use two different approaches:
The rejection region approach (typically used when computing statistics manually), and
The p-value approach (which is generally used with a computer and statistical software).
We will explore both in turn.
Example 11.1 Rejection Region

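The rejection region is a range of values such that if the test statistic falls into that range, we decide to reject the null hypothesis in favor of the alternative hypothesis.
\(\bar{x}_L\) is the critical value of \(\bar{x}\) to reject H0; the rejection region is \(\bar{x} > \bar{x}_L\).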
Example 11.1
All that's left to do is calculate \(\bar{x}_L\) and compare the sample mean (\(\bar{x} = 178\)) to it.
We can calculate \(\bar{x}_L\) based on any level of significance (\(\alpha\)) we want.
Example 11.1
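At a 5% significance level (i.e. \(\alpha = 0.05\)), we get
\[ \frac{\bar{x}_L - 170}{65/\sqrt{400}} = z_{.05} = 1.645 \]
Solving, we compute \(\bar{x}_L = 175.34\).
Since our sample mean (178) is greater than the critical value we calculated (175.34), we reject the null hypothesis in favor of H1, i.e. that \(\mu > 170\) and that it is cost effective to install the new billing system.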
Example 11.1 The Big Picture
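H1: \(\mu > 170\)
H0: \(\mu = 170\)
[Figure: sampling distribution of \(\bar{x}\) under H0, with critical value \(\bar{x}_L = 175.34\) and observed \(\bar{x} = 178\) in the rejection region. Reject H0 in favor of H1.]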
Standardized Test Statistic

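An easier method is to use the standardized test statistic:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46 \]
and compare its result to \(z_\alpha\) (rejection region: \(z > z_\alpha\)).
Since z = 2.46 > 1.645 (\(z_{.05}\)), we reject H0 in favor of H1.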
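Both versions of the rejection-region test for Example 11.1 can be checked numerically. The sketch below is illustrative only: the variable names are made up, and it uses the standard library's `statistics.NormalDist` for \(z_{.05}\) instead of the rounded table value 1.645, so the critical mean differs from the slide's 175.34 in the second decimal place.

```python
from math import sqrt
from statistics import NormalDist

# Example 11.1 inputs (right-tail test of H0: mu = 170 vs H1: mu > 170)
mu0, sigma, n, x_bar, alpha = 170.0, 65.0, 400, 178.0, 0.05

std_error = sigma / sqrt(n)                # standard error of the sample mean
z_alpha = NormalDist().inv_cdf(1 - alpha)  # upper-tail critical value z_alpha

# Unstandardized approach: critical value of the sample mean
x_crit = mu0 + z_alpha * std_error
reject_by_mean = x_bar > x_crit

# Standardized approach: compare z with z_alpha
z_stat = (x_bar - mu0) / std_error
reject_by_z = z_stat > z_alpha

print(round(z_stat, 2), reject_by_mean, reject_by_z)
```

The two comparisons are algebraically the same test, so they always agree.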
PLOT POWER CURVE
p-Value
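The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed, given that the null hypothesis is true.
In the case of our department store example, what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. \(\bar{x} = 178\)), given that the null hypothesis (H0: \(\mu = 170\)) is true?
\[ \text{p-value} = P(\bar{x} > 178) = P\left( Z > \frac{178 - 170}{65/\sqrt{400}} \right) = P(Z > 2.46) = .0069 \]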
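The p-value for the department store example can be computed directly from the sampling distribution of the mean under H0. This is a minimal sketch using only the standard library; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar = 170.0, 65.0, 400, 178.0

# Sampling distribution of the sample mean, assuming H0: mu = 170 is true
sampling_dist = NormalDist(mu=mu0, sigma=sigma / sqrt(n))

# Right-tail test: probability of a sample mean at least as large as 178
p_value = 1.0 - sampling_dist.cdf(x_bar)
print(round(p_value, 4))
```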
Interpreting the p-value

The smaller the p-value, the more statistical evidence exists to support the alternative hypothesis.
If the p-value is less than 1%, there is overwhelming evidence that supports the alternative hypothesis.
If the p-value is between 1% and 5%, there is strong evidence that supports the alternative hypothesis.
If the p-value is between 5% and 10%, there is weak evidence that supports the alternative hypothesis.
If the p-value exceeds 10%, there is no evidence that supports the alternative hypothesis.
We observe a p-value of .0069, hence there is overwhelming evidence to support H1: \(\mu > 170\).
Interpreting the p-value

Compare the p-value with the selected value of the significance level:
If the p-value is less than \(\alpha\), we judge the p-value to be small enough to reject the null hypothesis.
If the p-value is greater than \(\alpha\), we do not reject the null hypothesis.
Since p-value = .0069 < \(\alpha\) = .05, we reject H0 in favor of H1.
Chapter-Opening Example
The objective of the study is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the alternative hypothesis is
H1: \(\mu < 22\)
The null hypothesis is
H0: \(\mu = 22\)
The test statistic is
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
We wish to reject the null hypothesis in favor of the alternative only if the sample mean, and hence the value of the test statistic, is small enough. As a result we locate the rejection region in the left tail of the sampling distribution.
We set the significance level at 10%.
Rejection region: \( z < -z_{\alpha} = -z_{.10} = -1.28 \)
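From the data in SSA we compute
\[ \sum x_i = 4{,}759 \quad \text{and} \quad \bar{x} = \frac{\sum x_i}{220} = \frac{4{,}759}{220} = 21.63 \]
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.63 - 22}{6/\sqrt{220}} = -.91 \]
p-value = P(Z < -.91) = .5 - .3186 = .1814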
Conclusion: There is not enough evidence to infer that the mean is less than 22.
There is not enough evidence to infer that the plan will be profitable.
Since z = -.91 > \(-z_{.10}\) = -1.28, we fail to reject H0: \(\mu = 22\) at a 10% level of significance.
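The left-tail test in the chapter-opening example follows the same pattern with the inequality reversed. A short sketch, using the sum of payment periods (4,759), n = 220 and \(\sigma = 6\) from the slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 22.0, 6.0, 220, 0.10
x_bar = 4759 / n                       # sample mean, about 21.63

z_stat = (x_bar - mu0) / (sigma / sqrt(n))
z_crit = NormalDist().inv_cdf(alpha)   # lower-tail critical value, about -1.28

# Left-tail test: reject H0 only when z falls below the critical value
reject_h0 = z_stat < z_crit
p_value = NormalDist().cdf(z_stat)     # P(Z < z_stat)

print(round(z_stat, 2), round(p_value, 4), reject_h0)
```

Because the p-value is larger than 10%, both approaches lead to the same decision not to reject H0.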
PLOT POWER CURVE
Right-Tail Testing
Calculate the critical value of the mean (\(\bar{x}_L\)) and compare against the observed value of the sample mean (\(\bar{x}\)).
Left-Tail Testing
Calculate the critical value of the mean (\(\bar{x}_L\)) and compare against the observed value of the sample mean (\(\bar{x}\)).
Two-Tail Testing
Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal (\(\neq\)) to some value.
Example 11.2
AT&T argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively).
They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates.
What we want to show is whether or not:
H1: \(\mu \neq 17.09\). We do this by assuming that:
H0: \(\mu = 17.09\)
Example 11.2
The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.
That is, we set up a two-tail rejection region. The total area in the rejection region must sum to \(\alpha\), so we divide this probability by 2 and place \(\alpha/2\) in each tail.
Example 11.2
At a 5% significance level (i.e. \(\alpha = .05\)), we have \(\alpha/2 = .025\). Thus, \(z_{.025} = 1.96\) and our rejection region is:
z < -1.96 or z > 1.96
(that is, \(z < -z_{.025}\) or \(z > +z_{.025}\))
Example 11.2
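From the data, we calculate \(\bar{x} = 17.55\).
Using our standardized test statistic:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
We find that:
\[ z = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19 \]
Since z = 1.19 is not greater than 1.96, nor less than -1.96, we cannot reject the null hypothesis in favor of H1. That is, there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.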
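For the two-tail test in Example 11.2, the rejection rule compares |z| with \(z_{\alpha/2}\), and the p-value doubles the one-tail area. The following is a sketch under the slide's numbers; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, x_bar, alpha = 17.09, 3.87, 100, 17.55, 0.05

z_stat = (x_bar - mu0) / (sigma / sqrt(n))
z_half = NormalDist().inv_cdf(1 - alpha / 2)   # z_{alpha/2}, about 1.96

# Two-tail rejection region: z < -z_{alpha/2} or z > +z_{alpha/2}
reject_h0 = abs(z_stat) > z_half

# Two-tail p-value: twice the area beyond |z| in one tail
p_value = 2 * (1 - NormalDist().cdf(abs(z_stat)))

print(round(z_stat, 2), reject_h0)
```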
PLOT POWER CURVE
Summary of One- and Two-Tail Tests
[Figure: rejection regions for the One-Tail Test (left tail), the Two-Tail Test, and the One-Tail Test (right tail)]
Inference About A Population [SIGMA UNKNOWN]
[Diagram: Population (Parameter) -> Sample (Statistic) -> Inference]
We will develop techniques to estimate and test three population parameters:
Population Mean \(\mu\)
Population Variance \(\sigma^2\)
Population Proportion p
Inference With Variance Unknown

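Previously, we looked at estimating and testing the population mean when the population standard deviation (\(\sigma\)) was known or given:
\[ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \]
But how often do we know the actual population variance?
Instead, we use the Student t statistic, given by:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]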
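Testing \(\mu\) when \(\sigma\) is unknown
When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about \(\mu\) is:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
which is Student t distributed with \(\nu = n - 1\) degrees of freedom. The confidence interval estimator of \(\mu\) is given by:
\[ \bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}} \]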
Example 12.1
Will new workers achieve 90% of the level of experienced workers within one week of being hired and trained?
Experienced workers can process 500 packages/hour, thus if our conjecture is correct, we expect new workers to be able to process .90(500) = 450 packages per hour.
Given the data, is this the case?
Example 12.1
IDENTIFY
Our objective is to describe the population of the numbers of packages processed in 1 hour by new workers, that is, we want to know whether the new workers' productivity is more than 90% of that of experienced workers. Thus we have:
H1: \(\mu > 450\)
Therefore we set our usual null hypothesis to:
H0: \(\mu = 450\)
Example 12.1
COMPUTE
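Our test statistic is:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} \]
With n = 50 data points, we have n - 1 = 49 degrees of freedom.
Our hypothesis under question is:
H1: \(\mu > 450\)
Our rejection region becomes:
\[ t > t_{\alpha,\nu} = t_{\alpha,49} \]
Thus we will reject the null hypothesis in favor of the alternative if our calculated test statistic falls in this region.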
Example 12.1
COMPUTE
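From the data, we calculate \(\bar{x} = 460.38\), s = 38.83 and thus:
\[ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89 \]
Since t = 1.89 falls in the rejection region, we reject H0 in favor of H1, that is, there is sufficient evidence to conclude that the new workers are producing at more than 90% of the average of experienced workers.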
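The t statistic for Example 12.1 can be reproduced as follows. Note the assumptions: the slide text does not state the significance level, so this sketch assumes 5%, and it hard-codes the tabulated critical value \(t_{.05,50} = 1.676\) as an approximation to \(t_{.05,49}\) because the Python standard library has no Student t quantile function. Variable names are illustrative.

```python
from math import sqrt

# Example 12.1 summary statistics
mu0, x_bar, s, n = 450.0, 460.38, 38.83, 50

t_stat = (x_bar - mu0) / (s / sqrt(n))
degrees_of_freedom = n - 1

# ASSUMPTION: alpha = 0.05; critical value taken from a t table
# (t_{.05,50} = 1.676, used as an approximation for 49 degrees of freedom)
t_crit = 1.676

# Right-tail test: reject H0 when t exceeds the critical value
reject_h0 = t_stat > t_crit
print(round(t_stat, 2), degrees_of_freedom, reject_h0)
```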
Example 12.2
IDENTIFY
Can we estimate the return on investment for companies that won quality awards?
We are given a random sample of n = 83 such companies.
We want to construct a 95% confidence interval for the mean return, i.e. what is \(\bar{x} \pm t_{\alpha/2}\, s/\sqrt{n}\)?
Example 12.2
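COMPUTE
From the data, we calculate \(\bar{x}\) and s.
For the term \(t_{\alpha/2}\) we have \(\nu = n - 1 = 82\) degrees of freedom, i.e. \(t_{.025,82}\),
and so the interval estimate is \(\bar{x} \pm t_{.025,82}\, s/\sqrt{83}\).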
Check Requisite Conditions

The Student t distribution is robust, which means that if the population is nonnormal, the results of the t-test and confidence interval estimate are still valid provided that the population is not extremely nonnormal.
To check this requirement, draw a histogram of the data and see how bell shaped the resulting figure is. If a histogram is extremely skewed (say in the case of an exponential distribution), that could be considered extremely nonnormal and hence t statistics would not be valid in this case.
Inference About Population Variance
If we are interested in drawing inferences about a population's variability, the parameter we need to investigate is the population variance: \(\sigma^2\)
The sample variance (\(s^2\)) is an unbiased, consistent and efficient point estimator for \(\sigma^2\). Moreover, the statistic \(\chi^2 = \dfrac{(n-1)s^2}{\sigma^2}\) has a chi-squared distribution, with n - 1 degrees of freedom.
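Testing & Estimating Population Variance
Combining this statistic:
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]
With the probability statement:
\[ P\left( \chi^2_{1-\alpha/2} < \chi^2 < \chi^2_{\alpha/2} \right) = 1 - \alpha \]
Yields the confidence interval estimator for \(\sigma^2\):
\[ \text{lower confidence limit} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \qquad \text{upper confidence limit} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \]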
Example 12.3
IDENTIFY
Consider a container filling machine. Management wants a machine to fill 1 liter (1,000 cc's) so that the variance of the fills is less than 1 cc². A random sample of n = 25 1-liter fills was taken. Does the machine perform as it should at the 5% significance level?
We want to show that:
H1: \(\sigma^2 < 1\) (the variance is less than 1 cc²)
(so our null hypothesis becomes: H0: \(\sigma^2 = 1\)). We will use this test statistic:
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} \]
Example 12.3
COMPUTE
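Since our alternative hypothesis is phrased as:
H1: \(\sigma^2 < 1\)
we will reject H0 in favor of H1 if our test statistic falls into this rejection region:
\[ \chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,24} = 13.85 \]
We compute the sample variance to be: \(s^2 = .8088\)
And thus our test statistic takes on this value, which we compare with the rejection region:
\[ \chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(25-1)(.8088)}{1} = 19.41 \]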
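The chi-squared test for Example 12.3 can be sketched as below. The standard library has no chi-squared quantile function, so the critical value \(\chi^2_{.95,24} = 13.8484\) is hard-coded from a standard chi-squared table; treat it, and the variable names, as assumptions of this sketch.

```python
# Example 12.3: H0: sigma^2 = 1 versus H1: sigma^2 < 1, alpha = 0.05
n = 25
sample_variance = 0.8088
sigma2_0 = 1.0                      # variance specified by H0

# Test statistic: (n - 1) s^2 / sigma^2, chi-squared with n - 1 df under H0
chi2_stat = (n - 1) * sample_variance / sigma2_0

# ASSUMPTION: table value of chi^2_{.95,24} (5% of the area lies to its left)
chi2_crit = 13.8484

# Left-tail test: reject H0 only if the statistic is below the critical value
reject_h0 = chi2_stat < chi2_crit
print(round(chi2_stat, 2), reject_h0)
```

Since 19.41 is not below the critical value, H0 is not rejected.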
Example 12.4
As we saw, we cannot reject the null hypothesis in favor of the alternative. That is, there is not enough evidence to infer that the claim is true.
Note: the result does not say that the variance is greater than 1, rather it merely states that we are unable to show that the variance is less than 1.
We could estimate (at 99% confidence, say) the variance of the fills.
Example 12.4
COMPUTE
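In order to create a confidence interval estimate of the variance, we need these formulae:
\[ \text{lower confidence limit} = \frac{(n-1)s^2}{\chi^2_{\alpha/2}}, \qquad \text{upper confidence limit} = \frac{(n-1)s^2}{\chi^2_{1-\alpha/2}} \]
We know \((n-1)s^2 = 19.41\) from our previous calculation, and we have from Table 5 in Appendix B the values \(\chi^2_{\alpha/2,\,n-1} = \chi^2_{.005,\,24}\) and \(\chi^2_{1-\alpha/2,\,n-1} = \chi^2_{.995,\,24}\).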
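The 99% interval for the variance can be evaluated with the two confidence-limit formulae. In this sketch the chi-squared values for 24 degrees of freedom (45.5585 and 9.88623) are taken from a standard chi-squared table and hard-coded, since they are not given in the text above; variable names are illustrative.

```python
# Example 12.4: 99% confidence interval for the variance of the fills
n = 25
sample_variance = 0.8088
numerator = (n - 1) * sample_variance      # (n - 1) s^2, about 19.41

# ASSUMPTION: standard table values for 24 degrees of freedom
chi2_upper_tail = 45.5585   # chi^2_{.005,24}: area .005 to its right
chi2_lower_tail = 9.88623   # chi^2_{.995,24}: area .995 to its right

lcl = numerator / chi2_upper_tail   # lower confidence limit
ucl = numerator / chi2_lower_tail   # upper confidence limit
print(round(lcl, 4), round(ucl, 4))
```

The larger chi-squared value goes in the denominator of the lower limit, which is why the limits look "swapped" relative to the tail areas.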
Comparing Two Populations

Previously we looked at techniques to estimate and test parameters for one population:
Population Mean \(\mu\), Population Variance \(\sigma^2\)
We will still consider these parameters when we are looking at two populations, however our interest will now be:
The difference between two means, \(\mu_1 - \mu_2\).
The ratio of two variances, \(\sigma_1^2/\sigma_2^2\).
Difference of Two Means

In order to test and estimate the difference between two population means, we draw random samples from each of two populations. Initially, we will consider independent samples, that is, samples that are completely unrelated to one another.
Because we are comparing two population means, we use the statistic:
\[ \bar{x}_1 - \bar{x}_2 \]
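Sampling Distribution of \(\bar{x}_1 - \bar{x}_2\)
1. \(\bar{x}_1 - \bar{x}_2\) is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (\(n_1, n_2 > 30\)).
2. The expected value of \(\bar{x}_1 - \bar{x}_2\) is \(\mu_1 - \mu_2\).
3. The variance of \(\bar{x}_1 - \bar{x}_2\) is \(\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}\)
and the standard error is: \(\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}\)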
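Making Inferences About \(\mu_1 - \mu_2\)

Since \(\bar{x}_1 - \bar{x}_2\) is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large (\(n_1, n_2 > 30\)), then:
\[ z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for \(\mu_1 - \mu_2\)...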
Making Inferences About \(\mu_1 - \mu_2\)

...except that, in practice, the z statistic is rarely used since the population variances are unknown.
Instead we use a t statistic. We consider two cases for the unknown population variances: when we believe they are equal and, conversely, when they are not equal.
When are variances equal?

How do we know when the population variances are equal?
Since the population variances are unknown, we can't know for certain whether they're equal, but we can examine the sample variances and informally judge their relative values to determine whether we can assume that the population variances are equal or not.
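Test Statistic for \(\mu_1 - \mu_2\) (equal variances)

1) Calculate the pooled variance estimator as
\[ s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2} \]
2) and use it here:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2\left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}, \qquad \text{degrees of freedom: } \nu = n_1 + n_2 - 2 \]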
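CI Estimator for \(\mu_1 - \mu_2\) (equal variances)
The confidence interval estimator for \(\mu_1 - \mu_2\) when the population variances are equal is given by:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2\left( \frac{1}{n_1} + \frac{1}{n_2} \right)}, \qquad \text{degrees of freedom: } \nu = n_1 + n_2 - 2 \]
where \(s_p^2\) is the pooled variance estimator.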
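Test Statistic for \(\mu_1 - \mu_2\) (unequal variances)

The test statistic for \(\mu_1 - \mu_2\) when the population variances are unequal is given by:
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \text{degrees of freedom: } \nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \]
Likewise, the confidence interval estimator is:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \]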
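The equal-variances and unequal-variances t statistics differ only in the standard error and the degrees of freedom. The sketch below implements both from summary statistics; the function names `pooled_t` and `unequal_variances_t` and the numbers in the usage lines are hypothetical, chosen only to show that the two versions coincide when the sample sizes and sample variances are equal.

```python
from math import sqrt


def pooled_t(mean1, var1, n1, mean2, var2, n2, diff0=0.0):
    """t statistic and degrees of freedom assuming equal population variances."""
    # Pooled variance estimator: weighted average of the two sample variances
    sp2 = ((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2)
    t = (mean1 - mean2 - diff0) / sqrt(sp2 * (1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


def unequal_variances_t(mean1, var1, n1, mean2, var2, n2, diff0=0.0):
    """t statistic and (generally non-integer) degrees of freedom, unequal variances."""
    a, b = var1 / n1, var2 / n2
    t = (mean1 - mean2 - diff0) / sqrt(a + b)
    df = (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))
    return t, df


# Hypothetical summary statistics (not from the slides)
t_eq, df_eq = pooled_t(12.0, 4.0, 10, 10.0, 4.0, 10)
t_uneq, df_uneq = unequal_variances_t(12.0, 4.0, 10, 10.0, 4.0, 10)
print(round(t_eq, 3), df_eq, round(t_uneq, 3), round(df_uneq, 1))
```

When the sample variances differ a lot, the unequal-variances degrees of freedom drop below \(n_1 + n_2 - 2\), which makes the test more conservative.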
Example 13.2
IDENTIFY
Two methods are being tested for assembling office chairs. Assembly times are recorded (25 times for each method). At a 5% significance level, do the assembly times for the two methods differ?
That is, H1: \(\mu_1 - \mu_2 \neq 0\)
Hence, our null hypothesis becomes: H0: \(\mu_1 - \mu_2 = 0\)
Reminder: This is a two-tailed test.
Example 13.2
COMPUTE
The assembly times for each of the two methods are recorded and preliminary data is prepared.
The sample variances are similar, hence we will assume that the population variances are equal.
Example 13.2
COMPUTE
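Recall, we are doing a two-tailed test, hence the rejection region will be:
\[ t < -t_{\alpha/2,\nu} \quad \text{or} \quad t > t_{\alpha/2,\nu} \]
The number of degrees of freedom is:
\[ \nu = n_1 + n_2 - 2 = 25 + 25 - 2 = 48 \]
Hence our critical values of t (and our rejection region) become:
\[ t < -t_{.025,48} \quad \text{or} \quad t > t_{.025,48}, \qquad t_{.025,48} \approx 2.01 \]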
Example 13.2
COMPUTE
In order to calculate our t statistic, we need to first calculate the pooled variance estimator, followed by the t statistic.
Example 13.2
INTERPRET
Since our calculated t statistic does not fall into the rejection region, we cannot reject H0 in favor of H1, that is, there is not sufficient evidence to infer that the mean assembly times differ.
Example 13.2
INTERPRET
Excel, of course, also provides us with the information.
Compare the t statistic with the critical value, or look at the p-value.
Confidence Interval
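We can compute a 95% confidence interval estimate for the difference in mean assembly times as:
\[ (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2\left( \frac{1}{n_1} + \frac{1}{n_2} \right)} \]
That is, we estimate the mean difference between the two assembly methods to lie between -.36 and .96 minutes. Note: zero is included in this confidence interval.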
Matched Pairs Experiment

Previously when comparing two populations, we examined independent samples.
If, however, an observation in one sample is matched with an observation in a second sample, this is called a matched pairs experiment.
To help understand this concept, let's consider Example 13.4.
Identifying Factors
Factors that identify the t-test and estimator of \(\mu_D\):
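Inference about the ratio of two variances
So far we've looked at comparing measures of central location, namely the mean of two populations.
When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is: \(\sigma_1^2/\sigma_2^2\)
The sampling statistic \(F = \dfrac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}\) is F distributed with \(\nu_1 = n_1 - 1\) and \(\nu_2 = n_2 - 1\) degrees of freedom.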
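Inference about the ratio of two variances
Our null hypothesis is always:
H0: \(\sigma_1^2/\sigma_2^2 = 1\)
(i.e. the variances of the two populations will be equal, hence their ratio will be one)
Therefore, our statistic simplifies to:
\[ F = \frac{s_1^2}{s_2^2} \]
\(df_1 = n_1 - 1\)
\(df_2 = n_2 - 1\)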
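Computing the simplified F statistic from raw data takes only the two sample variances. In this sketch the function name `f_statistic` and both data sets are hypothetical (they are not the cereal data from the examples); `statistics.variance` computes the sample variance with the n - 1 divisor.

```python
from statistics import variance


def f_statistic(sample1, sample2):
    """Return F = s1^2 / s2^2 with its numerator and denominator df."""
    f = variance(sample1) / variance(sample2)
    return f, len(sample1) - 1, len(sample2) - 1


# Hypothetical data, for illustration only
sample_a = [10, 12, 14, 16, 18]   # sample variance 10
sample_b = [11, 12, 13, 14, 15]   # sample variance 2.5

f_value, df1, df2 = f_statistic(sample_a, sample_b)
print(f_value, df1, df2)
```

The resulting F would then be compared with critical values of the F distribution with (df1, df2) degrees of freedom, taken from a table or statistical software.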
Example 13.6
IDENTIFY
In Example 13.1, we looked at the variances of the samples of people who consumed high fiber cereal and those who did not and assumed they were not equal. We can use the ideas just developed to test if this is in fact the case.
We want to show: H1: \(\sigma_1^2/\sigma_2^2 \neq 1\)
(the variances are not equal to each other)
Hence we have our null hypothesis: H0: \(\sigma_1^2/\sigma_2^2 = 1\)
Example 13.6
CALCULATE
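Since our research hypothesis is: H1: \(\sigma_1^2/\sigma_2^2 \neq 1\)
We are doing a two-tailed test, and our rejection region is:
\[ F > F_{\alpha/2,\nu_1,\nu_2} \quad \text{or} \quad F < F_{1-\alpha/2,\nu_1,\nu_2} \]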
Example 13.6
CALCULATE
Our test statistic is \(F = s_1^2/s_2^2\); the rejection-region boundaries are .58 and 1.61.
Hence there is sufficient evidence to reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.
Example 13.6
INTERPRET
We may need to work with the Excel output before drawing conclusions.
Our research hypothesis H1: \(\sigma_1^2/\sigma_2^2 \neq 1\) requires two-tail testing, but Excel only gives us values for one-tail testing.
If we double the one-tail p-value Excel gives us, we have the p-value of the test we're conducting (i.e. 2 x 0.0004 = 0.0008). Refer to the text and CD Appendices for more detail.
