Statistics is a way to get information from data.
Data → Statistics → Information
Data: Facts, especially numerical facts, collected together for reference or information.
Information: Knowledge communicated concerning some particular fact.
Statistics is a tool for creating new understanding from a set of numbers.
Definitions: Oxford English Dictionary
Copyright 2005 Brooks/Cole, a division of Thomson Learning, Inc.
Sample
A sample is a set of data drawn from the population.
Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.
[Diagram: a population and a sample (a subset of it); a parameter and a statistic]
Populations have parameters; samples have statistics.
Descriptive Statistics
Descriptive statistics are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include:
Graphical Techniques (Chapter 2), and
Numerical Techniques (Chapter 4).
The actual method used depends on what information we would like to extract. Are we interested in
measure(s) of central location? and/or
measure(s) of variability (dispersion)?
Descriptive statistics helps to answer these questions.
Statistical Inference
Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample.
[Diagram: Population (parameter) → Sample (statistic) → Inference]
What can we infer about a population's parameters based on a sample's statistics?
Definitions
A variable is some characteristic of a population or sample.
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z
The values of the variable are the range of possible values for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Interval Data
Interval data are real numbers, e.g. heights, weights, prices, etc.
Also referred to as quantitative or numerical.
Arithmetic operations can be performed on interval data, thus it's meaningful to talk about 2*Height, or Price + $1, and so on.
Nominal Data
The values of nominal data are categories.
E.g. responses to questions about marital status, coded as:
Single = 1, Married = 2, Divorced = 3, Widowed = 4
Because the numbers are arbitrary, arithmetic operations don't make any sense (e.g. does Widowed ÷ 2 = Married?!).
Nominal data are also called qualitative or categorical.
Ordinal Data
Ordinal data appear to be categorical in nature, but their values have an order; a ranking to them:
E.g. College course rating system:
poor = 1, fair = 2, good = 3, very good = 4, excellent = 5
While it's still not meaningful to do arithmetic on this data (e.g. does 2*fair = very good?!), we can say things like:
excellent > poor or fair < very good
That is, order is maintained no matter what numeric values are assigned to each category.
Bar charts are often used to display frequencies.
Nominal Data
It's all the same information (based on the same data). Just different presentation.
The most important of these graphical methods is the histogram.
The histogram is not only a powerful graphical technique used to summarize interval data, but it is also used to help explain probabilities.
Building a Histogram
1) Collect the data.
2) Create a frequency distribution for the data.
3) Draw the histogram.
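The three steps can be sketched in a few lines of Python. This is a minimal illustration, not the textbook's Excel procedure: the data are the student marks quoted earlier in these notes, while the class limits (width 10, starting at 40) and the helper name `frequency_distribution` are assumptions made for the example.

```python
def frequency_distribution(data, lower, width, n_classes):
    """Count observations in each class [lower + i*width, lower + (i+1)*width)."""
    counts = [0] * n_classes
    for x in data:
        i = int((x - lower) // width)  # index of the class containing x
        if 0 <= i < n_classes:
            counts[i] += 1
    return counts


# Step 1: the data (student marks from the Definitions slide).
marks = [67, 74, 71, 83, 93, 55, 48]

# Step 2: frequency distribution over assumed classes 40-50, 50-60, ..., 90-100.
counts = frequency_distribution(marks, lower=40, width=10, n_classes=6)

# Step 3: a crude text histogram, one '#' per observation.
for i, c in enumerate(counts):
    lo = 40 + 10 * i
    print(f"{lo:3d} to under {lo + 10:3d}: {'#' * c}")
```

The choice of class width changes the shape of the histogram, so in practice it is worth trying more than one.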
Ogive
An ogive is a graph of a cumulative frequency distribution.
We create an ogive in three steps:
1) Calculate relative frequencies.
2) Calculate cumulative relative frequencies by adding the current class relative frequency to the previous class cumulative relative frequency. (For the first class, its cumulative relative frequency is just its relative frequency.)
first class: .355
next class: .355 + .185 = .540
...
last class: .930 + .070 = 1.00
3) Graph the cumulative relative frequencies.
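The running-sum step can be sketched as follows. The class counts below are illustrative assumptions, chosen only so that the first, second, and last relative frequencies match the values on the slide (.355, .185, .070); they are not taken from the slide itself.

```python
from itertools import accumulate

# Illustrative class counts (assumption): 200 observations in 8 classes.
class_counts = [71, 37, 13, 9, 10, 18, 28, 14]
n = sum(class_counts)

# Step 1: relative frequencies.
relative = [c / n for c in class_counts]

# Step 2: cumulative relative frequencies (running sum of step 1).
cumulative = list(accumulate(relative))

for rel, cum in zip(relative, cumulative):
    print(f"relative = {rel:.3f}   cumulative = {cum:.3f}")
```

Plotting `cumulative` against the upper class limits gives the ogive.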
Ogive
The ogive can be used to answer questions like:
What telephone bill value is at the 50th percentile? Around $35.
(Refer also to Fig. 2.13 in your textbook.)
Scatter Diagram
Example 2.9 A real estate agent wanted to know to what extent the selling price of a home is related to its size.
1) Collect the data.
2) Determine the independent variable (X: house size) and the dependent variable (Y: selling price).
3) Use Excel to create a scatter diagram.
Scatter Diagram
It appears that in fact there is a relationship; that is, the greater the house size, the greater the selling price.
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing
Percentiles, Quartiles
Measures of Linear Relationship
Covariance, Correlation, Least Squares Line
Arithmetic Mean
Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Notation: population size $N$ and mean $\mu$; sample size $n$ and mean $\bar{x}$.
Measures of Variability
Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value?
For example, two sets of class grades are shown. The mean (= 50) is the same in each case, but the red class has greater variability than the blue class.
Range
The range is the simplest measure of variability, calculated as:
Range = Largest observation − Smallest observation
E.g.
Data: {4, 4, 4, 4, 50}, Range = 46
Data: {4, 8, 15, 24, 39, 50}, Range = 46
The range is the same in both cases, but the data sets have very different distributions.
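Variance
The variance of a population is:
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$
where $\mu$ is the population mean and $N$ is the population size.
The variance of a sample is:
$s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
where $\bar{x}$ is the sample mean. Note! The denominator is the sample size ($n$) minus one!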
Application
Example 4.7. The following sample consists of the number of jobs six randomly selected students applied for: 17, 15, 23, 7, 9, 13.
Find its mean and variance.
What are we looking to calculate? The sample mean $\bar{x}$ and the sample variance $s^2$, as opposed to $\mu$ or $\sigma^2$.
Sample variance: $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$
Sample variance (shortcut method): $s^2 = \frac{1}{n-1}\left[\sum_{i=1}^{n} x_i^2 - \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n}\right]$
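As a check on Example 4.7, the sketch below computes the sample mean and the sample variance both ways, using Python's standard `statistics` module for the definitional form and a hand-written expression for the shortcut form. The variable names are chosen for this illustration.

```python
import statistics

jobs = [17, 15, 23, 7, 9, 13]  # sample from Example 4.7
n = len(jobs)

mean = statistics.mean(jobs)    # sample mean
s2 = statistics.variance(jobs)  # sample variance, denominator n - 1

# Shortcut method: [sum(x^2) - (sum(x))^2 / n] / (n - 1)
s2_shortcut = (sum(x * x for x in jobs) - sum(jobs) ** 2 / n) / (n - 1)

s = statistics.stdev(jobs)      # sample standard deviation = sqrt(s2)

print(f"mean = {mean}, s^2 = {s2:.1f}, shortcut s^2 = {s2_shortcut:.1f}, s = {s:.2f}")
```

Both forms give a mean of 14 and a sample variance of 33.2; `statistics.pvariance` would instead divide by n, which is the population formula.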
Standard Deviation
The standard deviation is simply the square root of the variance, thus:
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
Sample standard deviation: $s = \sqrt{s^2}$
Standard Deviation
Consider Example 4.8, where a golf club manufacturer has designed a new club and wants to determine if it is hit more consistently (i.e. with less variability) than with an old club.
Using Tools > Data Analysis > Descriptive Statistics in Excel (the Data Analysis add-in may need to be installed first), we produce the following tables for interpretation.
[Tables: Excel descriptive statistics output for the two clubs]
You get more consistent distance with the new club.
A more general interpretation of the standard deviation is derived from Chebysheff's Theorem, which applies to all shapes of histograms (not just bell shaped).
The proportion of observations in any sample that lie within $k$ standard deviations of the mean is at least:
$1 - \frac{1}{k^2}$ for $k > 1$
For k = 2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a lower bound compared to the Empirical Rule's approximation (95%).
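Chebysheff's bound, 1 − 1/k², is easy to tabulate. The short sketch below evaluates it for a few values of k; the function name is an assumption made for this illustration.

```python
def chebysheff_lower_bound(k):
    """Minimum proportion of observations within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the theorem is informative only for k > 1")
    return 1 - 1 / k ** 2


for k in (1.5, 2, 3):
    print(f"k = {k}: at least {chebysheff_lower_bound(k):.1%} of observations")
```

For k = 2 this gives 75%, well below the roughly 95% the Empirical Rule suggests for bell-shaped data, because the theorem must hold for every possible shape.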
Box Plots
These box plots are based on data in Xm04-15.
Wendy's service time is shortest and least variable. Hardee's has the greatest variability, while Jack in the Box has the longest service times.
Sampling
Recall that statistical inference permits us to draw conclusions about a population based on a sample.
Sampling (i.e. selecting a subset of a whole population) is often done for reasons of cost (it's less expensive to sample 1,000 television viewers than 100 million TV viewers) and practicality (e.g. performing a crash test on every automobile produced is impractical).
In any case, the sampled population and the target population should be similar to one another.
Sampling Plans
A sampling plan is just a method or procedure for specifying how a sample will be taken from a population.
We will focus our attention on these three methods:
Simple Random Sampling,
Stratified Random Sampling, and
Cluster Sampling.
Cluster Sampling
A cluster sample is a simple random sample of groups or clusters of elements (vs. a simple random sample of individual objects).
This method is useful when it is difficult or costly to develop a complete list of the population members or when the population elements are widely dispersed geographically.
Cluster sampling may increase sampling error due to similarities among cluster members.
Sampling Error
Sampling error refers to differences between the sample and the population that exist only because of the observations that happened to be selected for the sample.
Another way to look at this is: the differences in results for different samples (of the same size) are due to sampling error:
E.g. Two samples of size 10 of 1,000 households. If we happened to get the highest income level data points in our first sample and all the lowest income levels in the second, this delta is due to sampling error.
Nonsampling Error
Nonsampling errors are more serious and are due to mistakes made in the acquisition of data or due to the sample observations being selected improperly. Three types of nonsampling errors:
Errors in data acquisition,
Nonresponse errors, and
Selection bias.
Note: increasing the sample size will not reduce this type of error.
Approaches to Assigning Probabilities
There are three ways to assign a probability, $P(O_i)$, to an outcome, $O_i$, namely:
Classical approach: make certain assumptions (such as equally likely, independence) about the situation.
Relative frequency: assigning probabilities based on experimentation or historical data.
Subjective approach: assigning probabilities based on the assignor's judgment.
Interpreting Probability
One way to interpret probability is this:
If a random experiment is repeated an infinite number of times, the relative frequency for any given outcome is the probability of this outcome.
For example, the probability of heads in a flip of a balanced coin is .5, determined using the classical approach. The probability is interpreted as being the long-term relative frequency of heads if the coin is flipped an infinite number of times.
Conditional Probability
Conditional probability is used to determine how two events are related; that is, we can determine the probability of one event given the occurrence of another related event.
Conditional probabilities are written as $P(A \mid B)$, read as "the probability of A given B", and calculated as:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
Independence
One of the objectives of calculating conditional probability is to determine whether two events are related.
In particular, we would like to know whether they are independent, that is, if the probability of one event is not affected by the occurrence of the other event.
Two events A and B are said to be independent if
$P(A \mid B) = P(A)$
or
$P(B \mid A) = P(B)$
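The independence check can be made concrete with a small enumeration. The sketch below uses one roll of a fair die and two events chosen for illustration (they are assumptions, not from the slides): A = "even number" and B = "four or less". It applies the definition P(A | B) = P(A and B) / P(B) with exact fractions.

```python
from fractions import Fraction

sample_space = {1, 2, 3, 4, 5, 6}  # one roll of a fair die
A = {2, 4, 6}                      # illustrative event: even number
B = {1, 2, 3, 4}                   # illustrative event: four or less


def prob(event):
    """Classical probability: favourable outcomes over equally likely outcomes."""
    return Fraction(len(event & sample_space), len(sample_space))


p_a = prob(A)
p_b = prob(B)
p_a_and_b = prob(A & B)

p_a_given_b = p_a_and_b / p_b      # conditional probability P(A | B)
independent = p_a_given_b == p_a   # independence: P(A | B) = P(A)

print(f"P(A) = {p_a}, P(A|B) = {p_a_given_b}, independent: {independent}")
```

Here P(A | B) = 1/2 = P(A), so knowing that the roll is four or less tells us nothing about whether it is even.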
Complement Rule
The complement of an event A is the event that occurs when A does not occur.
The complement rule gives us the probability of an event NOT occurring. That is:
$P(A^C) = 1 - P(A)$
For example, in the simple roll of a die, the probability of the number "1" being rolled is 1/6. The probability that some number other than "1" will be rolled is 1 − 1/6 = 5/6.
Multiplication Rule
The multiplication rule is used to calculate the joint probability of two events. It is based on the formula for conditional probability defined earlier:
$P(A \mid B) = \frac{P(A \text{ and } B)}{P(B)}$
If we multiply both sides of the equation by P(B) we have:
P(A and B) = P(A | B) P(B)
Likewise, P(A and B) = P(B | A) P(A)
If A and B are independent events, then P(A and B) = P(A) P(B)
Addition Rule
Recall: the addition rule was introduced earlier to provide a way to compute the probability of event A or B or both A and B occurring; i.e. the union of A and B.
P(A or B) = P(A) + P(B) − P(A and B)
Why do we subtract the joint probability P(A and B) from the sum of the probabilities of A and B? Because the outcomes in (A and B) are counted once in P(A) and again in P(B), so the sum counts them twice.
If A and B are mutually exclusive, the occurrence of one event makes the other one impossible. This means that
P(A and B) = 0
The addition rule for mutually exclusive events is
P(A or B) = P(A) + P(B)
We often use this form when we add some joint probabilities calculated from a probability tree.
2. E(X + c) = E(X) + c
3. E(cX) = c E(X)
We can "pull" a constant out of the expected value expression (either as part of a sum with a random variable X or as a coefficient of random variable X).
Laws of Variance
1. V(c) = 0
The variance of a constant (c) is zero.
2. V(X + c) = V(X)
The variance of the sum of a random variable and a constant is just the variance of the random variable (per 1 above).
3. $V(cX) = c^2 V(X)$
The variance of a random variable multiplied by a constant coefficient is the coefficient squared times the variance of the random variable.
Binomial Distribution
The binomial distribution is the probability distribution that results from doing a "binomial experiment". Binomial experiments have the following properties:
1. Fixed number of trials, represented as n.
2. Each trial has two possible outcomes, a "success" and a "failure".
3. P(success) = p (and thus: P(failure) = 1 − p), for all trials.
4. The trials are independent, which means that the outcome of one trial does not affect the outcomes of any other trials.
Binomial Table
What is the probability that Pat fails the quiz?
i.e. what is $P(X \le 4)$, given P(success) = .20 and n = 10?
$P(X \le 4) = .967$
Binomial Table
What is the probability that Pat gets two answers correct?
i.e. what is $P(X = 2)$, given P(success) = .20 and n = 10?
$P(X = 2) = P(X \le 2) - P(X \le 1) = .678 - .376 = .302$
Remember, the table shows cumulative probabilities.
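The table look-ups for Pat's quiz can be reproduced directly from the binomial probability formula, P(X = x) = C(n, x) p^x (1 − p)^(n − x). The sketch below uses only the standard library; the function names are chosen for this illustration.

```python
from math import comb


def binomial_pmf(x, n, p):
    """P(X = x) for a binomial random variable with n trials and success probability p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


def binomial_cdf(x, n, p):
    """Cumulative probability P(X <= x), which is what the binomial table lists."""
    return sum(binomial_pmf(k, n, p) for k in range(x + 1))


n, p = 10, 0.20
p_fail = binomial_cdf(4, n, p)                        # P(X <= 4)
p_two = binomial_cdf(2, n, p) - binomial_cdf(1, n, p)  # P(X = 2) via the table method

print(f"P(X <= 4) = {p_fail:.3f}")
print(f"P(X = 2)  = {p_two:.3f}")
```

Subtracting two cumulative values mirrors the table method; calling `binomial_pmf(2, n, p)` gives the same probability directly.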
Binomial Distribution
As you might expect, statisticians have developed general formulas for the mean, variance, and standard deviation of a binomial random variable. They are:
$\mu = np$
$\sigma^2 = np(1-p)$
$\sigma = \sqrt{np(1-p)}$
Poisson Distribution
Named for Simeon Poisson, the Poisson distribution is a discrete probability distribution and refers to the number of events (a.k.a. successes) within a specific time period or region of space. For example:
The number of cars arriving at a service station in 1 hour. (The interval of time is 1 hour.)
The number of flaws in a bolt of cloth. (The specific region is a bolt of cloth.)
The number of accidents in 1 day on a particular stretch of highway. (The interval is defined by both time, 1 day, and space, the particular stretch of highway.)
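Poisson Distribution
The Poisson random variable is the number of successes that occur in a period of time or an interval of space in a Poisson experiment.
E.g. On average, 96 trucks (successes) arrive at a border crossing every hour (time period).
E.g. The number of typographic errors (successes?!) in a new textbook edition averages 1.5 per 100 pages (interval).
The probability that a Poisson random variable assumes a value of $x$ is
$P(x) = \frac{e^{-\mu}\mu^{x}}{x!}$ for $x = 0, 1, 2, \ldots$
where $\mu$ is the mean number of successes in the interval and $e$ is the natural logarithm base.
FYI: $e \approx 2.71828$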
Example 7.12
The number of typographical errors in new editions of textbooks varies considerably from book to book. After some analysis, an instructor concludes that the number of errors is Poisson distributed with a mean of 1.5 per 100 pages. The instructor randomly selects 100 pages of a new book. What is the probability that there are no typos?
That is, what is $P(X = 0)$ given that $\mu = 1.5$?
$P(0) = \frac{e^{-1.5}(1.5)^{0}}{0!} = e^{-1.5} = .2231$
There is about a 22% chance of finding zero errors.
Poisson Distribution
As mentioned on the Poisson experiment slide: the probability of a success is proportional to the size of the interval.
Thus, knowing an error rate of 1.5 typos per 100 pages, we can determine a mean value for a 400 page book as:
$\mu = 1.5(4) = 6$ typos/400 pages.
Example 7.13
For a 400 page book, what is the probability that there are no typos?
$P(X = 0) = \frac{e^{-6}\,6^{0}}{0!} = e^{-6} = .002479$
There is a very small chance there are no typos.
Excel is an even better alternative for this calculation.
[Screenshot: Excel output for Example 7.13]
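Both typo examples can also be checked with a few lines of Python. This is a minimal sketch of the Poisson probability formula P(x) = e^(−μ) μ^x / x!, with the mean written as `mu`; the function name is an assumption made for this illustration.

```python
from math import exp, factorial


def poisson_pmf(x, mu):
    """P(X = x) for a Poisson random variable with mean mu."""
    return exp(-mu) * mu ** x / factorial(x)


rate_per_100_pages = 1.5

# Example 7.12: 100 pages, so mu = 1.5.
p_none_100 = poisson_pmf(0, rate_per_100_pages)

# Example 7.13: 400 pages, so the mean scales with the interval: mu = 1.5 * 4 = 6.
p_none_400 = poisson_pmf(0, rate_per_100_pages * 4)

print(f"P(no typos in 100 pages) = {p_none_100:.4f}")
print(f"P(no typos in 400 pages) = {p_none_400:.6f}")
```

Scaling the mean by the interval length before evaluating the formula is the step that is easy to forget.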
2) The total area under the curve between a and b is 1.0.
The normal distribution looks like this:
[Figure: normal density curve]
The normal distribution is bell shaped and symmetrical about the mean.
Unlike the range of the uniform distribution ($a \le x \le b$), normal distributions range from minus infinity to plus infinity.
As we shall see shortly, any normal distribution can be converted to a standard normal distribution with simple algebra. This makes calculations much easier.
Some advice: always draw a picture!
The time to assemble a computer is normally distributed with a mean of 50 minutes and a standard deviation of 10 minutes.
What is the probability that a computer is assembled in a time between 45 and 60 minutes?
Algebraically speaking, what is $P(45 < X < 60)$?
Standardizing with $Z = \frac{X - \mu}{\sigma}$:
$P(45 < X < 60) = P\left(\frac{45 - 50}{10} < Z < \frac{60 - 50}{10}\right) = P(-.5 < Z < 1)$
We can break up $P(-.5 < Z < 1)$ into:
$P(-.5 < Z < 0) + P(0 < Z < 1)$
The distribution is symmetric around zero, so we have:
$P(-.5 < Z < 0) = P(0 < Z < .5)$
Hence: $P(-.5 < Z < 1) = P(0 < Z < .5) + P(0 < Z < 1)$
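The same probability can be computed without the table by using `statistics.NormalDist` from the Python standard library. The sketch below evaluates P(45 < X < 60) directly and again after standardizing, assuming the mean of 50 minutes and standard deviation of 10 minutes stated in the example.

```python
from statistics import NormalDist

assembly_time = NormalDist(mu=50, sigma=10)  # minutes

# Direct calculation: P(45 < X < 60) = F(60) - F(45)
p_direct = assembly_time.cdf(60) - assembly_time.cdf(45)

# Same probability after standardizing: P(-.5 < Z < 1)
z = NormalDist()  # standard normal, mean 0 and standard deviation 1
z_low = (45 - 50) / 10
z_high = (60 - 50) / 10
p_standardized = z.cdf(z_high) - z.cdf(z_low)

print(f"P(45 < X < 60) = {p_direct:.4f}")
print(f"P({z_low} < Z < {z_high}) = {p_standardized:.4f}")
```

Both routes give about .5328, which is P(0 < Z < .5) + P(0 < Z < 1) = .1915 + .3413 from the standard normal table.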
[Figures: standard normal curves with shaded areas; $P(Z < 0) = .5$]
$P(0.9 < Z < 1.9) = P(0 < Z < 1.9) - P(0 < Z < 0.9) = .4713 - .3159 = .1554$
Finding Values of Z
Other Z values are
$Z_{.05} = 1.645$
$Z_{.01} = 2.33$
Student t Distribution
Here the letter t is used to represent the random variable, hence the name. The density function for the Student t distribution is as follows:
$f(t) = \frac{\Gamma[(\nu+1)/2]}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\left[1 + \frac{t^2}{\nu}\right]^{-(\nu+1)/2}$
$\nu$ (nu) is called the degrees of freedom, and
$\Gamma$ (Gamma function) is $\Gamma(k) = (k-1)(k-2)\cdots(2)(1)$
Student t Distribution
In much the same way that $\mu$ and $\sigma$ define the normal distribution, $\nu$, the degrees of freedom, defines the Student t distribution:
[Figure 8.24]
As the number of degrees of freedom increases, the t distribution approaches the standard normal distribution.
The values for A are predetermined "critical" values, typically in the 10%, 5%, 2.5%, 1% and 1/2% range.
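For example, to find the value of t with 10 degrees of freedom such that the area under the Student t curve to its right is .05:
Area under the curve value ($t_A$): COLUMN; degrees of freedom: ROW
$t_{.05,10} = 1.812$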
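F Distribution
The F density function is given by:
$f(F) = \frac{\Gamma\left(\frac{\nu_1+\nu_2}{2}\right)}{\Gamma\left(\frac{\nu_1}{2}\right)\Gamma\left(\frac{\nu_2}{2}\right)}\left(\frac{\nu_1}{\nu_2}\right)^{\nu_1/2}\frac{F^{(\nu_1-2)/2}}{\left(1+\frac{\nu_1 F}{\nu_2}\right)^{(\nu_1+\nu_2)/2}}$
for $F > 0$. Two parameters define this distribution, and like we've already seen these are again degrees of freedom.
$\nu_1$ is the "numerator" degrees of freedom and
$\nu_2$ is the "denominator" degrees of freedom.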
Determining Values of F
For example, what is the value of F for 5% of the area under the right hand "tail" of the curve, with a numerator degree of freedom of 3 and a denominator degree of freedom of 7?
Solution: use the F look-up (Table 6).
There are different tables for different values of A. Make sure you start with the correct table!!
$F_{.05,3,7} = 4.35$
Determining Values of F
For areas under the curve on the left hand side of the curve, we can leverage the following relationship:
$F_{1-A,\nu_1,\nu_2} = \frac{1}{F_{A,\nu_2,\nu_1}}$
Pay close attention to the order of the terms!
Chapter 9
Sampling Distributions
Distribution of X (the values and probabilities match one toss of a fair die):
x      1    2    3    4    5    6
P(x)   1/6  1/6  1/6  1/6  1/6  1/6
and the mean and variance are calculated as well:
$\mu = \sum x P(x) = 3.5$ and $\sigma^2 = \sum (x - \mu)^2 P(x) = 2.92$
While there are 36 possible samples of size 2, there are only 11 values for $\bar{x}$, and some (e.g. $\bar{x} = 3.5$) occur more frequently than others (e.g. $\bar{x} = 1$).
Sampling distribution of $\bar{x}$:
$\bar{x}$      1.0   1.5   2.0   2.5   3.0   3.5   4.0   4.5   5.0   5.5   6.0
$P(\bar{x})$   1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36
Compare
Compare the distribution of X with the sampling distribution of $\bar{x}$.
As well, note that: $\mu_{\bar{x}} = \mu$ and $\sigma^2_{\bar{x}} = \sigma^2 / 2$
Example 9.1(a)
The foreman of a bottling plant has observed that the amount of soda in each "32-ounce" bottle is actually a normally distributed random variable, with a mean of 32.2 ounces and a standard deviation of .3 ounce.
If a customer buys one bottle, what is the probability that the bottle will contain more than 32 ounces?
We want to find $P(X > 32)$, where X is normally distributed and $\mu = 32.2$ and $\sigma = .3$:
$P(X > 32) = P\left(Z > \frac{32 - 32.2}{.3}\right) = P(Z > -.67) = .5 + .2486 = .7486$
There is about a 75% chance that a single bottle of soda contains more than 32 oz.
Example 9.1(b)
If a customer buys a carton of four bottles, what is the probability that the mean amount of the four bottles will be greater than 32 ounces?
We want to find $P(\bar{X} > 32)$, where X is normally distributed with $\mu = 32.2$ and $\sigma = .3$.
Things we know:
1) X is normally distributed, therefore so will $\bar{X}$.
2) $\mu_{\bar{x}} = \mu = 32.2$ oz.
3) $\sigma_{\bar{x}} = \sigma/\sqrt{n} = .3/\sqrt{4} = .15$
$P(\bar{X} > 32) = P\left(Z > \frac{32 - 32.2}{.15}\right) = P(Z > -1.33) = .5 + .4082 = .9082$
There is about a 91% chance the mean of the four bottles will exceed 32 oz.
Graphically Speaking
[Figure: distribution of X and sampling distribution of $\bar{X}$, both with mean = 32.2]
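Example 9.1 can be reproduced with `statistics.NormalDist`. The sketch below compares a single bottle with the mean of a carton of four; the only change between the two parts is that the standard deviation of the sample mean is σ/√n. Values differ slightly from the table-based answers because the table rounds z to two decimals.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma = 32.2, 0.3  # ounces, as given in Example 9.1

# (a) One bottle: X ~ Normal(mu, sigma)
p_single = 1 - NormalDist(mu, sigma).cdf(32)

# (b) Mean of n = 4 bottles: X-bar ~ Normal(mu, sigma / sqrt(n))
n = 4
standard_error = sigma / sqrt(n)
p_mean = 1 - NormalDist(mu, standard_error).cdf(32)

print(f"P(X > 32)     = {p_single:.4f}")
print(f"P(X-bar > 32) = {p_mean:.4f}  (standard error = {standard_error:.2f})")
```

The carton mean exceeds 32 oz more often than a single bottle does because averaging shrinks the spread around 32.2.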
Sampling Distribution: Difference of two means
The final sampling distribution introduced is that of the difference between two sample means. This requires:
independent random samples be drawn from each of two normal populations.
If this condition is met, then the sampling distribution of the difference between the two sample means, i.e. $\bar{X}_1 - \bar{X}_2$, will be normally distributed.
(Note: if the two populations are not both normally distributed, but the sample sizes are "large" (> 30), the distribution of $\bar{X}_1 - \bar{X}_2$ is approximately normal.)
Sampling Distribution: Difference of two means
The expected value and variance of the sampling distribution of $\bar{X}_1 - \bar{X}_2$ are given by:
mean: $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$
standard deviation: $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
(also called the standard error of the difference between two means)
Estimation
There are two types of inference: estimation and hypothesis testing; estimation is introduced first.
The objective of estimation is to determine the approximate value of a population parameter on the basis of a sample statistic.
E.g., the sample mean ($\bar{x}$) is employed to estimate the population mean ($\mu$).
There are two types of estimators:
Point Estimator
Interval Estimator
[Figure: point estimate vs. interval estimate]
An alternative statement is:
The mean income is between 380 and 420 $/week.
Estimating $\mu$ when $\sigma$ is known
We established in Chapter 9:
$P\left(-z_{\alpha/2} < \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} < z_{\alpha/2}\right) = 1 - \alpha$
Thus, the probability that the interval (the confidence interval):
$\left(\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\right)$
contains the population mean $\mu$ is $1 - \alpha$. This is a confidence interval estimator for $\mu$.
Table 10.1
Example 10.1
A computer company samples demand during lead time over 25 time periods:
235  421  394  261  386
374  361  439  374  316
309  514  348  302  296
499  462  344  466  332
253  369  330  535  334
It is known that the standard deviation of demand over lead time is 75 computers. We want to estimate the mean demand over lead time with 95% confidence in order to set inventory levels.
Example 10.1
CALCULATE
In order to use our confidence interval estimator, we need the following pieces of data:
$\bar{x} = 370.16$ (calculated from the data)
$z_{\alpha/2} = z_{.025} = 1.96$, $\sigma = 75$, $n = 25$ (given)
therefore:
$\bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}} = 370.16 \pm 1.96\,\frac{75}{\sqrt{25}} = 370.16 \pm 29.40$
The lower and upper confidence limits are 340.76 and 399.56.
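The interval in Example 10.1 can be recomputed from the 25 observations. The sketch below applies the estimator x̄ ± z·σ/√n with σ = 75 and z = 1.96 as given on the slide; the variable names are chosen for this illustration.

```python
from math import sqrt
from statistics import fmean

demand = [
    235, 421, 394, 261, 386,
    374, 361, 439, 374, 316,
    309, 514, 348, 302, 296,
    499, 462, 344, 466, 332,
    253, 369, 330, 535, 334,
]

sigma = 75   # known population standard deviation (given)
z = 1.96     # z_{.025} for 95% confidence (given)

n = len(demand)
x_bar = fmean(demand)
margin = z * sigma / sqrt(n)

lower, upper = x_bar - margin, x_bar + margin
print(f"x-bar = {x_bar:.2f}, margin = {margin:.2f}")
print(f"95% confidence interval: ({lower:.2f}, {upper:.2f})")
```

Raising the confidence level increases z and widens the interval; increasing n narrows it.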
Example 10.1
INTERPRET
The estimation for the mean demand during lead time lies between 340.76 and 399.56; we can use this as input in developing an inventory policy.
That is, we estimated that the mean demand during lead time falls between 340.76 and 399.56, and this type of estimator is correct 95% of the time. That also means that 5% of the time the estimator will be incorrect.
Incidentally, the media often refer to the 95% figure as "19 times out of 20," which emphasizes the long-run aspect of the confidence level.
Interval Width
A wide interval provides little information.
For example, suppose we estimate with 95% confidence that an accountant's average starting salary is between $15,000 and $100,000.
Contrast this with: a 95% confidence interval estimate of starting salaries between $42,000 and $45,000.
The second estimate is much narrower, providing accounting students more precise information about starting salaries.
Interval Width
The width of the confidence interval estimate is a function of the confidence level, the population standard deviation, and the sample size.
Estimating the mean to within $W$ units requires a sample size of at least this large:
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2$
For the lead time data with $W = 5$: $n = \left(\frac{1.96 \times 75}{5}\right)^2 = 864.36$, which rounds up to 865.
That is, to produce a 95% confidence interval estimate of the mean ($\pm 5$ units), we need to sample 865 lead time periods (vs. the 25 data points we have currently).
Example 10.2
A lumber company must estimate the mean diameter of trees to determine whether or not there is sufficient lumber to harvest an area of forest. They need to estimate this to within 1 inch at a confidence level of 99%. The tree diameters are normally distributed with a standard deviation of 6 inches.
How many trees need to be sampled?
Example 10.2
Things we know:
Confidence level = 99%, therefore $\alpha = .01$ and $z_{\alpha/2} = z_{.005} = 2.575$.
We want $\bar{x} \pm 1$, hence $W = 1$.
We are given that $\sigma = 6$.
Example 10.2
We compute
$n = \left(\frac{z_{\alpha/2}\,\sigma}{W}\right)^2 = \left(\frac{(2.575)(6)}{1}\right)^2 = 238.7 \approx 239$
That is, we will need to sample at least 239 trees to have a 99% confidence interval of $\bar{x} \pm 1$.
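The sample size formula n = (z·σ/W)² is simple enough to wrap in a helper. The sketch below is an illustration with a function name chosen here; it rounds up, since a fractional observation is not possible, and reproduces both the 239 trees of Example 10.2 and the 865 lead time periods mentioned under Interval Width.

```python
from math import ceil


def required_sample_size(z, sigma, w):
    """Smallest n for which the interval x-bar +/- w has the confidence level implied by z."""
    return ceil((z * sigma / w) ** 2)


# Example 10.2: 99% confidence (z_{.005} = 2.575), sigma = 6 inches, within 1 inch.
n_trees = required_sample_size(z=2.575, sigma=6, w=1)

# Lead time data: 95% confidence (z_{.025} = 1.96), sigma = 75, within 5 units.
n_periods = required_sample_size(z=1.96, sigma=75, w=5)

print(f"trees to sample: {n_trees}")
print(f"lead time periods to sample: {n_periods}")
```

Because W appears squared in the denominator, halving the desired margin quadruples the required sample size.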
Type I error: Reject a true null hypothesis.
Type II error: Do not reject a false null hypothesis.
$P(\text{Type I error}) = \alpha$
$P(\text{Type II error}) = \beta$
$H_0$: the null hypothesis
$H_1$: the alternative or research hypothesis
The null hypothesis ($H_0$) will always state that the parameter equals the value specified in the alternative hypothesis ($H_1$).
Types of Errors
A Type I error occurs when we reject a true null hypothesis (i.e. reject $H_0$ when it is TRUE).
A Type II error occurs when we don't reject a false null hypothesis (i.e. do NOT reject $H_0$ when it is FALSE).
                      H0 is true       H0 is false
Reject H0             Type I error     correct decision
Do not reject H0      correct decision Type II error
Recap I
1) Two hypotheses: $H_0$ & $H_1$
2) ASSUME $H_0$ is TRUE
3) GOAL: determine if there is enough evidence to infer that $H_1$ is TRUE
4) Two possible decisions:
Reject $H_0$ in favor of $H_1$
Do NOT reject $H_0$ in favor of $H_1$
5) Two possible types of errors:
Type I: reject a true $H_0$ [$P(\text{Type I}) = \alpha$]
Type II: not reject a false $H_0$ [$P(\text{Type II}) = \beta$]
Example 11.1
A department store manager determines that a new billing system will be cost-effective only if the mean monthly account is more than $170.
A random sample of 400 monthly accounts is drawn, for which the sample mean is $178. The accounts are approximately normally distributed with a standard deviation of $65.
Can we conclude that the new system will be cost-effective?
Example 11.1
The system will be cost-effective if the mean account balance for all customers is greater than $170.
We express this belief as our research hypothesis, that is:
$H_1: \mu > 170$ (this is what we want to determine)
Thus, our null hypothesis becomes:
$H_0: \mu = 170$ (this specifies a single value for the parameter of interest)
Example 11.1
What we want to show:
$H_1: \mu > 170$
$H_0: \mu = 170$ (we'll assume this is true)
We know:
$n = 400$,
$\bar{x} = 178$, and
$\sigma = 65$
Hmm. What to do next?!
Example 11.1
To test our hypotheses, we can use two different approaches:
The rejection region approach (typically used when computing statistics manually), and
The p-value approach (which is generally used with a computer and statistical software).
We will explore both in turn.
$\bar{x}_L$ is the critical value of $\bar{x}$ to reject $H_0$.
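Example 11.1
All that's left to do is calculate $\bar{x}_L$ and compare it to the sample mean, 178.
We can calculate this based on any level of significance ($\alpha$) we want:
$P\left(Z > \frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}}\right) = \alpha$, so $\frac{\bar{x}_L - \mu}{\sigma/\sqrt{n}} = z_{\alpha}$
Example 11.1
At a 5% significance level (i.e. $\alpha = 0.05$), we get
$\frac{\bar{x}_L - 170}{65/\sqrt{400}} = z_{.05} = 1.645$
Solving we compute $\bar{x}_L = 175.34$
Since our sample mean (178) is greater than the critical value we calculated (175.34), we reject the null hypothesis in favor of $H_1$, i.e. that $\mu > 170$ and that it is cost-effective to install the new billing system.
[Figure: sampling distribution of $\bar{x}$ under $H_0: \mu = 170$, with $\bar{x}_L = 175.34$ and $\bar{x} = 178$ in the rejection region]
Alternatively, compute the standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{178 - 170}{65/\sqrt{400}} = 2.46$
and compare its result to $z_{\alpha}$ (rejection region: $z > z_{\alpha}$).
Since $z = 2.46 > 1.645$ ($z_{.05}$), we reject $H_0$ in favor of $H_1$.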
p-Value
The p-value of a test is the probability of observing a test statistic at least as extreme as the one computed given that the null hypothesis is true.
In the case of our department store example, what is the probability of observing a sample mean at least as extreme as the one already observed (i.e. $\bar{x} = 178$), given that the null hypothesis ($H_0: \mu = 170$) is true?
p-value $= P(\bar{X} > 178) = P\left(Z > \frac{178 - 170}{65/\sqrt{400}}\right) = P(Z > 2.46) = .0069$
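Both approaches to Example 11.1 can be sketched with `statistics.NormalDist`. The code below computes the critical sample mean, the standardized test statistic, and the p-value for the right-tail test of H0: μ = 170; variable names are chosen for this illustration, and the results differ in the last digit from the slide values because the table rounds z.

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 170      # value of the mean under H0
x_bar = 178     # observed sample mean
sigma = 65      # population standard deviation (given)
n = 400
alpha = 0.05

standard_error = sigma / sqrt(n)
std_normal = NormalDist()

# Rejection region approach: critical value of x-bar for a right-tail test.
z_alpha = std_normal.inv_cdf(1 - alpha)
x_critical = mu_0 + z_alpha * standard_error

# Standardized test statistic.
z = (x_bar - mu_0) / standard_error

# p-value approach: probability of a statistic at least this large under H0.
p_value = 1 - std_normal.cdf(z)

reject_h0 = z > z_alpha  # equivalent to x_bar > x_critical and to p_value < alpha
print(f"critical x-bar = {x_critical:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

The three decision rules (compare x̄ to its critical value, compare z to z_α, compare the p-value to α) always agree; they are the same test expressed on different scales.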
Chapter-Opening Example
The objective of the study is to draw a conclusion about the mean payment period. Thus, the parameter to be tested is the population mean. We want to know whether there is enough statistical evidence to show that the population mean is less than 22 days. Thus, the alternative hypothesis is
$H_1: \mu < 22$
The null hypothesis is
$H_0: \mu = 22$
Chapter-Opening Example
The test statistic is
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
We wish to reject the null hypothesis in favor of the alternative only if the sample mean and hence the value of the test statistic is small enough. As a result we locate the rejection region in the left tail of the sampling distribution.
We set the significance level at 10%.
Chapter-Opening Example
Rejection region: $z < -z_{\alpha} = -z_{.10} = -1.28$
From the data in SSA we compute
$\sum x_i = 4{,}759$ and $\bar{x} = \frac{\sum x_i}{220} = \frac{4{,}759}{220} = 21.63$
and
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{21.63 - 22}{6/\sqrt{220}} = -.91$
p-value $= P(Z < -.91) = .5 - .3186 = .1814$
Chapter-Opening Example
Conclusion: There is not enough evidence to infer that the mean is less than 22.
There is not enough evidence to infer that the plan will be profitable.
Since $z = -.91 > -z_{.10} = -1.28$, we fail to reject $H_0: \mu = 22$ at a 10% level of significance.
Right-Tail Testing
Calculate the critical value of the mean ($\bar{x}_L$) and compare against the observed value of the sample mean ($\bar{x}$).
Left-Tail Testing
Calculate the critical value of the mean ($\bar{x}_L$) and compare against the observed value of the sample mean ($\bar{x}$).
Two-Tail Testing
Two-tail testing is used when we want to test a research hypothesis that a parameter is not equal ($\ne$) to some value.
Example 11.2
AT&T argues that its rates are such that customers won't see a difference in their phone bills between them and their competitors. They calculate the mean and standard deviation for all their customers at $17.09 and $3.87 (respectively).
They then sample 100 customers at random and recalculate a monthly phone bill based on competitors' rates.
What we want to show is whether or not:
$H_1: \mu \ne 17.09$. We do this by assuming that:
$H_0: \mu = 17.09$
Example 11.2
The rejection region is set up so we can reject the null hypothesis when the test statistic is large or when it is small.
[Figure: two-tail rejection region; reject when the statistic is "small" or when it is "large"]
That is, we set up a two-tail rejection region. The total area in the rejection region must sum to $\alpha$, so we divide this probability by 2.
Example 11.2
At a 5% significance level (i.e. $\alpha = .05$), we have $\alpha/2 = .025$. Thus, $z_{.025} = 1.96$ and our rejection region is:
$z < -1.96$ or $z > 1.96$
Example 11.2
From the data, we calculate $\bar{x} = 17.55$.
Using our standardized test statistic:
$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
We find that:
$z = \frac{17.55 - 17.09}{3.87/\sqrt{100}} = 1.19$
Since z = 1.19 is not greater than 1.96, nor less than −1.96, we cannot reject the null hypothesis in favor of $H_1$. That is, there is insufficient evidence to infer that there is a difference between the bills of AT&T and the competitor.
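The two-tail test of Example 11.2 can be sketched the same way as a one-tail test, with α split across both tails. The code below computes the test statistic, the two critical values, and a two-tail p-value (which the slides do not report; it is included here as an illustration).

```python
from math import sqrt
from statistics import NormalDist

mu_0 = 17.09    # mean under H0
sigma = 3.87    # population standard deviation (given)
x_bar = 17.55   # sample mean from the 100 recalculated bills
n = 100
alpha = 0.05

std_normal = NormalDist()
z = (x_bar - mu_0) / (sigma / sqrt(n))

# Two-tail rejection region: z < -z_{alpha/2} or z > z_{alpha/2}
z_half_alpha = std_normal.inv_cdf(1 - alpha / 2)
reject_h0 = abs(z) > z_half_alpha

# Two-tail p-value: both tails beyond |z|
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"z = {z:.2f}, critical values = ±{z_half_alpha:.2f}, p-value = {p_value:.3f}")
print("reject H0" if reject_h0 else "do not reject H0")
```

Using `abs(z)` handles both tails at once, so the same code works whether the sample mean falls above or below the hypothesized value.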
[Figures: two-tail test; one-tail test (right tail)]
[Diagram: Population (parameter) → Sample (statistic) → Inference]
We will develop techniques to estimate and test three population parameters:
Population mean $\mu$
Population variance $\sigma^2$
Population proportion $p$
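But how often do we know the actual population variance?
Instead, we use the Student t statistic, given by:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
Testing $\mu$ when $\sigma$ is unknown
When the population standard deviation is unknown and the population is normal, the test statistic for testing hypotheses about $\mu$ is:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
which is Student t distributed with $\nu = n - 1$ degrees of freedom. The confidence interval estimator of $\mu$ is given by:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$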
Example 12.1
Will new workers achieve 90% of the level of experienced workers within one week of being hired and trained?
Experienced workers can process 500 packages/hour, thus if our conjecture is correct, we expect new workers to be able to process .90(500) = 450 packages per hour.
Given the data, is this the case?
Example 12.1
IDENTIFY
Our objective is to describe the population of the numbers of packages processed in 1 hour by new workers, that is we want to know whether the new workers' productivity is more than 90% of that of experienced workers. Thus we have:
$H_1: \mu > 450$
Therefore we set our usual null hypothesis to:
$H_0: \mu = 450$
Example 12.1
COMPUTE
Our test statistic is:
$t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$
With n = 50 data points, we have n − 1 = 49 degrees of freedom. Our hypothesis under question is:
$H_1: \mu > 450$
Our rejection region becomes:
$t > t_{\alpha,\nu} = t_{\alpha,49}$
Thus we will reject the null hypothesis in favor of the alternative if our calculated test statistic falls in this region.
Example 12.1
COMPUTE
From the data, we calculate $\bar{x} = 460.38$, $s = 38.83$ and thus:
$t = \frac{460.38 - 450}{38.83/\sqrt{50}} = 1.89$
Since $t = 1.89$ falls in the rejection region, we reject $H_0$ in favor of $H_1$; that is, there is sufficient evidence to conclude that the new workers are producing at more than 90% of the average of experienced workers.
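The test statistic in Example 12.1 can be recomputed from the summary values. The Python standard library has no Student t distribution, so in the sketch below the critical value is entered by hand: both the 5% significance level and the value 1.676 (the usual table entry for about 50 degrees of freedom) are assumptions, since the slide's own rejection region was not recovered.

```python
from math import sqrt

mu_0 = 450      # mean under H0 (90% of 500 packages per hour)
x_bar = 460.38  # sample mean (from the slide)
s = 38.83       # sample standard deviation (from the slide)
n = 50

degrees_of_freedom = n - 1
t = (x_bar - mu_0) / (s / sqrt(n))

# ASSUMPTION: alpha = .05 and the table value t_{.05, 50} = 1.676, used here
# as an approximation for 49 degrees of freedom.
t_critical = 1.676

reject_h0 = t > t_critical  # right-tail test of H1: mu > 450
print(f"t = {t:.2f} with {degrees_of_freedom} degrees of freedom")
print("reject H0" if reject_h0 else "do not reject H0")
```

The only difference from the z test is that s replaces σ and the critical value comes from the t table with n − 1 degrees of freedom.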
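Example 12.2
IDENTIFY
Can we estimate the return on investment for companies that won quality awards?
We are given a random sample of n = 83 such companies. We want to construct a 95% confidence interval for the mean return, i.e. what is:
$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}}$ ??
Example 12.2
COMPUTE
From the data, we calculate $\bar{x}$ and $s$ [values not recovered].
For the term $t_{\alpha/2}$, we use $\alpha/2 = .025$ and $\nu = n - 1 = 82$ degrees of freedom,
and so the interval is $\bar{x} \pm t_{.025,82}\,\frac{s}{\sqrt{83}}$.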
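To draw inferences about a population's variability, the parameter we need to investigate is the population variance: $\sigma^2$
The sample variance ($s^2$) is an unbiased, consistent and efficient point estimator for $\sigma^2$. Moreover, the statistic $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$ has a chi-squared distribution, with n − 1 degrees of freedom.
Confidence interval estimator of $\sigma^2$:
lower confidence limit: $\frac{(n-1)s^2}{\chi^2_{\alpha/2}}$
upper confidence limit: $\frac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$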
Example 12.3
IDENTIFY
Consider a container filling machine. Management wants a machine to fill 1 liter (1,000 cc's) so that the variance of the fills is less than 1 cc$^2$. A random sample of $n = 25$ one-liter fills was taken. Does the machine perform as it should at the 5% significance level?
We want to show that:
$H_1: \sigma^2 < 1$
(so our null hypothesis becomes: $H_0: \sigma^2 = 1$). We will use this test statistic:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2}$$
Example 12.3
COMPUTE
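Since our alternative hypothesis is phrased as:
$H_1: \sigma^2 < 1$
we will reject $H_0$ in favor of $H_1$ if our test statistic falls into this rejection region:
$$\chi^2 < \chi^2_{1-\alpha,\,n-1} = \chi^2_{.95,\,24} = 13.85$$
We compute the sample variance to be: $s^2 = .8088$
And thus our test statistic takes on this value:
$$\chi^2 = \frac{(n-1)s^2}{\sigma^2} = \frac{(25-1)(.8088)}{1} = 19.41$$
Compare: $19.41 > 13.85$, so the test statistic does not fall into the rejection region.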
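As a rough check of Example 12.3, the chi-squared test can be sketched in a few lines of Python. The sample variance, sample size and hypothesized variance come from the example; the critical value $\chi^2_{.95,\,24} = 13.8484$ is taken from a standard chi-squared table, and the function name is hypothetical.

```python
def chi_squared_statistic(sample_variance, n, sigma2_0):
    """Chi-squared statistic (n - 1) * s^2 / sigma0^2 for a test about a variance."""
    return (n - 1) * sample_variance / sigma2_0

# Values from Example 12.3: s^2 = .8088, n = 25, H0: sigma^2 = 1.
chi2_stat = chi_squared_statistic(sample_variance=0.8088, n=25, sigma2_0=1.0)

# Assumed table value: chi-squared with 24 df that leaves 5% in the LEFT tail.
CHI2_CRITICAL = 13.8484

# Left-tail test (H1: sigma^2 < 1): reject H0 only for small statistics.
reject_h0 = chi2_stat < CHI2_CRITICAL

print(f"chi-squared = {chi2_stat:.2f}, reject H0: {reject_h0}")
```

Because the alternative is "less than", the rejection region sits in the left tail, which is why the comparison uses `<`.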
Example 12.4
As we saw, we cannot reject the null hypothesis in favor of the alternative. That is, there is not enough evidence to infer that the claim is true.
Note: the result does not say that the variance is greater than 1; rather, it merely states that we are unable to show that the variance is less than 1.
We could estimate (at 99% confidence, say) the variance of the fills.
Example 12.4
COMPUTE
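In order to create a confidence interval estimate of the variance, we need these formulae:
Lower confidence limit: $\mathrm{LCL} = \dfrac{(n-1)s^2}{\chi^2_{\alpha/2}}$
Upper confidence limit: $\mathrm{UCL} = \dfrac{(n-1)s^2}{\chi^2_{1-\alpha/2}}$
We know $(n-1)s^2 = 19.41$ from our previous calculation, and we have from Table 5 in Appendix B:
$$\chi^2_{\alpha/2,\,n-1} = \chi^2_{.005,\,24} = 45.559, \qquad \chi^2_{1-\alpha/2,\,n-1} = \chi^2_{.995,\,24} = 9.886$$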
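The 99% interval for Example 12.4 can be worked through in Python as a sketch. The quantity $(n-1)s^2 = 19.41$ is quoted in the example; the two chi-squared values for 24 degrees of freedom ($\chi^2_{.005} = 45.5585$, $\chi^2_{.995} = 9.88623$) are standard table values supplied here as assumptions, and the function name is illustrative.

```python
def variance_confidence_interval(n_minus_1_times_s2, chi2_upper_tail, chi2_lower_tail):
    """Confidence limits for a population variance.

    chi2_upper_tail: chi-squared value with alpha/2 area to its right.
    chi2_lower_tail: chi-squared value with 1 - alpha/2 area to its right.
    """
    lcl = n_minus_1_times_s2 / chi2_upper_tail
    ucl = n_minus_1_times_s2 / chi2_lower_tail
    return lcl, ucl

# (n - 1) s^2 = 19.41 from Example 12.3; table values assumed for 24 df, 99% confidence.
lcl, ucl = variance_confidence_interval(19.41, 45.5585, 9.88623)

print(f"LCL = {lcl:.3f}, UCL = {ucl:.3f}")
```

Under these assumed table values the interval contains 1, which agrees with the earlier failure to reject $H_0: \sigma^2 = 1$.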
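Sampling Distribution of $\bar{x}_1 - \bar{x}_2$
1. $\bar{x}_1 - \bar{x}_2$ is normally distributed if the original populations are normal, or approximately normal if the populations are nonnormal and the sample sizes are large ($n_1, n_2 > 30$).
2. The expected value of $\bar{x}_1 - \bar{x}_2$ is $\mu_1 - \mu_2$.
3. The variance of $\bar{x}_1 - \bar{x}_2$ is $\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}$
and the standard error is: $\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$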
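The three properties of the sampling distribution of the difference between two sample means can be illustrated numerically. The sketch below uses made-up population parameters (they are not from the slides) and a hypothetical function name.

```python
import math

def diff_of_means_distribution(mu1, mu2, var1, var2, n1, n2):
    """Expected value, variance and standard error of x-bar1 minus x-bar2."""
    expected = mu1 - mu2
    variance = var1 / n1 + var2 / n2
    standard_error = math.sqrt(variance)
    return expected, variance, standard_error

# Hypothetical populations: means 50 and 45, variances 16 and 9, samples of 64 and 36.
expected, variance, standard_error = diff_of_means_distribution(50, 45, 16, 9, 64, 36)

print(f"E = {expected}, Var = {variance}, SE = {standard_error:.4f}")
```

Note that the variances add even though the means are subtracted, since the two samples are independent.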
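$$z = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$$
is a standard normal (or approximately normal) random variable. We could use this to build test statistics or confidence interval estimators for $\mu_1 - \mu_2$, but the population variances are rarely known.
Instead we use a t statistic. We consider two cases for the unknown population variances: when we believe they are equal and, conversely, when they are not equal.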
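When the population variances are equal, calculate the pooled variance estimator
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$
and use it here:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}, \qquad \nu = n_1 + n_2 - 2 \text{ degrees of freedom}$$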
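CI Estimator for $\mu_1 - \mu_2$ (equal variances)
The confidence interval estimator for $\mu_1 - \mu_2$ when the population variances are equal is given by:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}, \qquad \nu = n_1 + n_2 - 2 \text{ degrees of freedom}$$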
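When the population variances are unequal, the test statistic is:
$$t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}, \qquad \nu = \frac{\left( s_1^2/n_1 + s_2^2/n_2 \right)^2}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}} \text{ degrees of freedom}$$
Likewise, the confidence interval estimator is:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$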
Example 13.2
IDENTIFY
Two methods are being tested for assembling office chairs. Assembly times are recorded (25 times for each method). At a 5% significance level, do the assembly times for the two methods differ?
That is, $H_1: (\mu_1 - \mu_2) \neq 0$
Hence, our null hypothesis becomes: $H_0: (\mu_1 - \mu_2) = 0$
Reminder: This is a two-tailed test.
Example 13.2
COMPUTE
The assembly times for each of the two methods are recorded and preliminary data are prepared.
Example 13.2
COMPUTE
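Recall, we are doing a two-tailed test, hence the rejection region will be:
$$t < -t_{\alpha/2,\,\nu} \quad \text{or} \quad t > t_{\alpha/2,\,\nu}$$
The number of degrees of freedom is:
$$\nu = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$$
Hence our critical values of t (and our rejection region) become:
$$t < -t_{.025,\,48} \quad \text{or} \quad t > t_{.025,\,48}, \qquad t_{.025,\,48} \approx 2.01$$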
Example 13.2
COMPUTE
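In order to calculate our t statistic, we need to first calculate the pooled variance estimator, followed by the t statistic:
$$s_p^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}, \qquad t = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{s_p^2 \left( \dfrac{1}{n_1} + \dfrac{1}{n_2} \right)}}$$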
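The two-step calculation (pooled variance first, then the t statistic) might look like the following Python sketch. The summary statistics are hypothetical placeholders, not the assembly-time data of Example 13.2, which is not reproduced here; only the sample sizes (25 each) match the example. The critical value 2.011 for 48 degrees of freedom at a two-tail 5% level is taken from a standard t table.

```python
import math

def pooled_variance_t(mean1, mean2, var1, var2, n1, n2, hypothesized_diff=0.0):
    """Equal-variances t test: return (pooled variance, t statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    # Step 1: pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    # Step 2: use the pooled variance in the standard error of the difference.
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t_stat = ((mean1 - mean2) - hypothesized_diff) / standard_error
    return pooled_var, t_stat, df

# Hypothetical summary statistics (NOT the Example 13.2 data).
pooled_var, t_stat, df = pooled_variance_t(
    mean1=10.0, mean2=9.0, var1=4.0, var2=6.0, n1=25, n2=25
)

# Two-tailed test at the 5% level: reject H0 when |t| exceeds the critical value.
T_CRITICAL = 2.011  # assumed table value for t_{.025, 48}
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"sp^2 = {pooled_var}, t = {t_stat:.3f}, df = {df}, reject H0: {reject_h0}")
```

Taking the absolute value of t covers both tails of the rejection region in a single comparison.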
Example 13.2
INTERPRET
Since our calculated t statistic does not fall into the rejection region, we cannot reject $H_0$ in favor of $H_1$; that is, there is not sufficient evidence to infer that the mean assembly times differ.
Example 13.2
INTERPRET
Excel, of course, also provides us with this information: compare the calculated t statistic with the two-tail critical value, or look at the p-value.
Confidence Interval
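We can compute a 95% confidence interval estimate for the difference in mean assembly times as:
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \sqrt{s_p^2 \left( \frac{1}{n_1} + \frac{1}{n_2} \right)}$$
That is, we estimate that the mean difference between the two assembly methods lies between -.36 and .96 minutes. Note: zero is included in this confidence interval.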
Identifying Factors
Factors that identify the t-test and estimator of:
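When looking at two population variances, we consider the ratio of the variances, i.e. the parameter of interest to us is: $\sigma_1^2 / \sigma_2^2$
The sampling statistic
$$F = \frac{s_1^2 / \sigma_1^2}{s_2^2 / \sigma_2^2}$$
is F distributed with $\nu_1 = n_1 - 1$ and $\nu_2 = n_2 - 1$ degrees of freedom.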
Example 13.6
IDENTIFY
In Example 13.1, we looked at the variances of the samples of people who consumed high-fiber cereal and those who did not, and assumed they were not equal. We can use the ideas just developed to test if this is in fact the case.
We want to show: $H_1: \sigma_1^2 / \sigma_2^2 \neq 1$
(the variances are not equal to each other)
Hence we have our null hypothesis: $H_0: \sigma_1^2 / \sigma_2^2 = 1$
Example 13.6
CALCULATE
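Since our research hypothesis is: $H_1: \sigma_1^2 / \sigma_2^2 \neq 1$
we are doing a two-tailed test, and our rejection region is:
$$F > F_{\alpha/2,\,\nu_1,\,\nu_2} \quad \text{or} \quad F < F_{1-\alpha/2,\,\nu_1,\,\nu_2}$$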
Example 13.6
CALCULATE
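Our test statistic is:
$$F = \frac{s_1^2}{s_2^2}$$
[Figure: F distribution with the values .58 and 1.61 marked]
Hence there is sufficient evidence to reject the null hypothesis in favor of the alternative; that is, there is a difference in the variance between the two populations.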
Example 13.6
INTERPRET
We may need to work with the Excel output before drawing conclusions.
Our research hypothesis, $H_1: \sigma_1^2 / \sigma_2^2 \neq 1$, requires two-tail testing, but Excel only gives us values for one-tail testing.
If we double the one-tail p-value Excel gives us, we have the p-value of the test we're conducting (i.e. 2 x 0.0004 = 0.0008). Refer to the text and CD Appendices for more detail.
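The adjustment from a one-tail to a two-tail p-value is simple enough to sketch in Python. The one-tail value 0.0004 is the figure quoted above; the sample variances passed to the F ratio are hypothetical, and both function names are illustrative rather than part of any library.

```python
def f_statistic(sample_var1, sample_var2):
    """F ratio s1^2 / s2^2 used to test H0: sigma1^2 / sigma2^2 = 1."""
    return sample_var1 / sample_var2

def two_tail_p_value(one_tail_p):
    """Double a one-tail p-value, capping the result at 1."""
    return min(1.0, 2 * one_tail_p)

# Hypothetical sample variances, only to show the form of the statistic.
f_stat = f_statistic(sample_var1=2.0, sample_var2=8.0)

# One-tail p-value quoted from the Excel output in Example 13.6.
p_two_tail = two_tail_p_value(0.0004)

print(f"F = {f_stat}, two-tail p-value = {p_two_tail:.4f}")
```

The cap at 1 only matters when the one-tail value exceeds 0.5; it keeps the result a valid probability.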