St1131 Cheatsheet

Descriptive statistics > Collection, presentation, and description of sample data.
Inferential statistics > Interpreting values, making decisions, drawing conclusions.
Qualitative variable (a): describes or categorizes an element of the population (hair color, satisfaction level).
(a) Nominal: variable that characterizes an element.
(a) Ordinal: variable that incorporates an ordered position/ranking.
Quantitative variable (b): quantifies an element of the population (numerical price of books, etc.).
(b) Discrete: variable that can assume a countable number of values (gaps between values).
(b) Continuous: variable that can assume an uncountable number of values (no gaps).
Experiment: Fully controlled activity to select elements and data values.
The independent (treatment/explanatory/predictor/x) variable influences the dependent (subject/response/outcome/y) variable.
An experiment is a.k.a. a randomized comparative experiment.
Random allocation: 1. Prevents bias; 2. Balances the groups on variables that you know affect the response; 3. Balances the groups on lurking variables unknown to you.
Compare primary vs secondary treatment results to analyze the effectiveness of the primary treatment.
Placebo effect = response to a dummy treatment. Show that the observed effect of the treatment is not due simply to the placebo effect.
Blinding: Subjects do not know which group they are allocated to. If the data collectors and experimenters do not know either, it is double blinding.
Advantages: 1. Effect of x on y can be studied accurately; 2. Reduces the potential for lurking variables.
E.g. experimental units: 200 volunteers; treatment: new medicine; y: blood pressure; x: treatment type.
For finding cause and effect, use an experiment.
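Random allocation can be sketched in a few lines of Python. This is only an illustration: the helper name `randomly_allocate`, the volunteer labels, the seed, and the two group names are assumptions made for the example, loosely following the 200-volunteer case in these notes.

```python
import random

def randomly_allocate(units, n_groups=2, seed=None):
    """Shuffle the experimental units and deal them into n_groups groups."""
    rng = random.Random(seed)
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal round-robin so that group sizes differ by at most one.
    return [shuffled[i::n_groups] for i in range(n_groups)]

# Hypothetical experimental units: 200 volunteers.
volunteers = [f"volunteer_{i:03d}" for i in range(1, 201)]
new_medicine, placebo = randomly_allocate(volunteers, n_groups=2, seed=1131)
print(len(new_medicine), len(placebo))
```

Because chance alone decides who lands in which group, known and lurking variables tend to balance out across the groups.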
Observational Study: No control activity.
Natural setting; draw conclusions based on observations w/o doing anything to the subjects.
Done for factors that are not modifiable.
Done when an experiment would be unethical or not practical.
Lurking variables present.
Retrospective (case-control) study: Look back to collect info. Info is available on subjects with the response outcome of interest. (past to immediate)
Prospective (cohort) study: Into the future.
E.g. response outcome of interest: lung cancer; cases: with lung cancer; controls: w/o lung cancer. Explanatory variable: past smoker?
For spotting associations, use an observational study.
It does not matter what the response variable is; there will most likely be variability in the data if the tool of measurement is precise enough.
Variability: Extent to which data values for a particular variable differ from each other.
Statistical process control: Controlling the variability in a manufacturing process is a field all its own.
Steps for constructing a grouped frequency distribution (GFD):
1. Find H (highest value) and L (lowest value) and calculate the range.
2. Select the number of classes (m = 7) and class width (c = 10) so that mc = 70 is a bit larger than the range (let's say range = 59).
3. Pick a starting point. It should be a little smaller than L. Count up by tens (the class width). We get the class boundaries (35 <= x < 45).
Each class has a lower and an upper boundary (35 and 45).
We lose the raw data when using a GFD.
Histogram: a bar graph that represents a frequency distribution of a quantitative variable.
Constructing a histogram: title, vertical scale for frequencies/relative frequencies, horizontal scale labeled using class boundaries or class midpoints.
Descriptions for histograms: Symmetrical, Normal (triangle), Uniform (rectangle), Skewed (one tail longer than the other), J-shaped (no tail on the side with the highest frequency), Bimodal.
Modal class is the class with the highest frequency.
A bimodal distribution has two high-frequency classes. Their frequencies need not be the same.
Cumulative frequency distribution (CFD): the cumulative frequency of a class is the sum of the frequency for that class and those below.
Cumulative relative frequency distribution (CRFD): cumulative frequency over total frequency.
Ogive: a line graph of a CFD or CRFD.
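A minimal sketch of the grouping steps, using the same m = 7, c = 10 and starting point 35 as the example above. The data values and the function name `grouped_frequency` are made up for illustration; the data were chosen so that the range is 59.

```python
def grouped_frequency(data, start, width, n_classes):
    """Return one row per class:
    (lower, upper, midpoint, frequency, cumulative freq, cumulative relative freq)."""
    rows = []
    cumulative = 0
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width
        # Class boundaries are lower <= x < upper, so classes never overlap.
        freq = sum(1 for x in data if lower <= x < upper)
        cumulative += freq
        rows.append((lower, upper, (lower + upper) / 2,
                     freq, cumulative, cumulative / len(data)))
    return rows

# Hypothetical raw data: L = 39, H = 98, range = 59.
scores = [39, 47, 52, 58, 58, 61, 64, 66, 69, 70,
          72, 74, 75, 77, 78, 81, 83, 86, 91, 98]
table = grouped_frequency(scores, start=35, width=10, n_classes=7)
for row in table:
    print(row)
```

The last two columns are what an ogive would plot against the upper class boundaries.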
Measure of Central Tendency
Sample mean, $\bar{x} = \frac{\sum x}{n}$. Population mean, $\mu = \frac{\sum x}{N}$.
Sample median (ranked middle position), $\tilde{x}$. Population median, $M$. Depth of median: $d(\tilde{x}) = \frac{n+1}{2}$th position. If the position is the 3.5th, find the sum of the 3rd and 4th values and divide by 2.
Mode is the value of x that is most frequent. If there is a tie, there is no mode.
Midrange $= \frac{L + H}{2}$.
Mean and median are most important.
Round-off rule: keep one more d.p. in the answer than was given in the original information. Only round off the final answer, not intermediate steps.
Biased sampling method: Produces data that systematically differs from the sampled pop. Repeated sampling is no use.
Unbiased sampling method: Data representative of the sampled population.
Convenience sample or grab sample: items chosen arbitrarily and in an unstructured manner from the pop.
Volunteer sample: results from those who chose to contribute the needed info on their own initiative. Those with strong feelings will take the survey.
Collection of data for statistical analysis:
1. Define objectives of survey or study.
2. Define the variable and population of interest.
3. Define data collection and data measuring schemes.
4. Collect sample.
5. Review sampling process upon completion of collection.
Census is for the whole population. Commonly a sample survey is used instead.
Sampling frame: list of those in the pop from which the sample will be drawn. Must be representative of the population.
The process of selecting sample elements is called the sample design.
Sample design: probability samples or judgment samples.
Judgment sample: non-random sample selected based on the opinion of an expert. Biased.
Probability sample: each element has a certain probability of being selected.
Probability sample: single-stage vs multistage.
Single-stage sampling: All elements are treated equally and there is no subdividing or partitioning of the frame.
Simple Random Sample (SRS): Every element has an equal probability of being chosen. With replacement from a finite pop, or without replacement from an infinite pop. Assign a number to each element and pick from it randomly. Unbiased.
SRS is an inefficient technique.
Inherent in the concept of randomness is the idea that the next result is not predictable.
Systematic sampling method: Selecting every kth item starting from a first element.
If we desire a 3% systematic sample, we locate the first item by randomly selecting an integer between 1 and 33 ($\frac{100}{3} \approx 33$). Suppose 23 was chosen, then pick the 23rd, 56th, 89th, etc.
Easy to describe and execute.
Disproportional if the sampling frame is repetitive or cyclical.
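The 3% systematic sample described in these notes can be sketched as follows. The function name `systematic_sample` and the 100-item frame are assumptions for the example; positions are counted from 1, as in the notes.

```python
import random

def systematic_sample(frame, percent, start=None, seed=None):
    """Pick every k-th item of the frame, where k = 100 // percent."""
    k = int(100 // percent)  # sampling interval; 3% -> every 33rd item
    if start is None:
        # Random first position between 1 and k (inclusive).
        start = random.Random(seed).randint(1, k)
    # Positions are 1-based, list indices are 0-based.
    return [frame[pos - 1] for pos in range(start, len(frame) + 1, k)]

frame = list(range(1, 101))            # hypothetical frame of 100 numbered items
picked = systematic_sample(frame, percent=3, start=23)
print(picked)                          # the 23rd, 56th and 89th items
```

If the frame itself repeats with a period close to k, every pick can land on the same kind of item, which is the cyclical problem noted above.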
Multistage methods: for very large populations.
(a) Multistage random sample: Elements are subdivided and the sample is chosen in more than 1 stage.
(b) Stratified random sample: subdivide the pop into strata (usually by naturally occurring subdivisions) and then draw a subsample from each stratum by SRS or systematic sampling (e.g. $\frac{40}{180} \times 90 = 20$). Finally combine to draw a conclusion on the population. Strata should be mutually exclusive. Strata should be collectively exhaustive: no pop element excluded.
(2b) Proportional stratified sampling is just making sure the no. of items from each stratum is proportional to the size of the stratum.
(c) Cluster sample: Use SRS or systematic sampling to select the strata (clusters) (1st stage), then use SRS or systematic sampling AGAIN to select elements from each cluster (2nd stage). Adv: Cheaper, low admin cost and travel cost. Higher sampling error.
Qualitative Data
Pie charts: Data in a circle.
Bar graphs: Data as proportionally sized rectangles.
Pareto diagram: bars are arranged from most to least, highest to lowest (for nominal). It also includes a line graph displaying cumulative % and counts for the bars.
Quantitative Data
Distribution: pattern of variability displayed by the data of a variable. Displays the frequency of each value of the variable.
Dotplot: Display horizontal/vertical. Each data value = one dot along a scale. Sorts data into its numerical order. Description: where are most of the dots; no. of peaks; skewed or symmetrical; any outliers; numerical range.
Stem-and-leaf display: Leading digits = stem. Trailing digits = leaf. (E.g. leaf unit = 0.1, stem unit = 1.0, 6.2 = 6|2.) In numerical order. 1. Number of peaks (unimodal/bimodal); 2. Symmetrical. Min. 5 stems.
Back-to-back stem-and-leaf for two distinct distributions.
Side-by-side dotplots must use the same scale.
Frequency distribution: A listing, expressed in chart form, that pairs values of a variable with their frequency.
Grouped vs ungrouped freq dist.: ungrouped when each x stands alone.
Classes of a grouped frequency distribution must be mutually exclusive and collectively exhaustive:
Class Number   Class Tallies   Class Boundaries   Frequency   Class Midpoint/mark
1              |||             50 <= x < 60       3           55
Measure of Dispersion
Range $= H - L$
Mean deviation $= \frac{\sum |x - \bar{x}|}{n}$
$\sum (x - \bar{x})$ is always zero. (check)
Sample variance $= s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population variance $= \sigma^2 = \frac{\sum (x - \mu)^2}{N}$
Sample standard deviation $= s = \sqrt{s^2}$
Empirical Rule and Testing for Normality (68%, 95%, 99.7%): within 1 s.d. 68%, 2 s.d. 95%, 3 s.d. 99.7%. If true, the distribution is normal (bell-shaped).
Chebyshev's Theorem: if the distribution is not normal, use CT to find how much data will fall within intervals centered at the mean; it holds for all distributions. The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any +ve no. greater than 1.
Measure of Position
$Q_1 = P_{25}$: at most 25% of data are smaller in value than $Q_1$ and at most 75% are larger. $Q_2 = P_{50} = \tilde{x}$ is the median.
Finding percentile $P_k$:
1. Rank the n data lowest to highest.
2. Calculate $A = \frac{nk}{100}$.
3. If A is an integer, add 0.5: $d(P_k) = A.5$. $P_k$ is halfway between the Ath and (A+1)th positions.
4. If A is a fraction, $d(P_k) = B$, where B is the next larger integer. Bth position.
Then you start counting and locate the value at $d(P_k)$.
Midquartile $= \frac{Q_1 + Q_3}{2}$. This is a measure of position.
Midrange is a measure of central tendency.
Interquartile range (IQR) $= Q_3 - Q_1$. It is resistant to outliers.
5-number summary includes: L, $Q_1$, $\tilde{x}$/$Q_2$, $Q_3$, H.
Box-and-whiskers display: L -line- $Q_1$ [box] $Q_2$ [box] $Q_3$ -line- H
Lower fence: $Q_1 - 1.5(IQR)$; Upper fence: $Q_3 + 1.5(IQR)$
z-score (standard score): $z = \frac{\text{value} - \text{mean}}{\text{st. dev.}} = \frac{x - \bar{x}}{s}$; z-scores typically range in value from -3.00 to +3.00 (Empirical Rule or Chebyshev).
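The percentile-finding procedure (rank, compute nk/100, then take either the half-way value or the next larger position) can be sketched as below, together with the IQR and the fences. The data set and the function name `percentile` are hypothetical; the rule itself is the one in these notes, which can differ slightly from the interpolation used by statistics libraries.

```python
def percentile(data, k):
    """k-th percentile (1 <= k <= 99) using the depth rule d(P_k)."""
    if not 1 <= k <= 99:
        raise ValueError("k must be between 1 and 99")
    ranked = sorted(data)
    a, remainder = divmod(len(ranked) * k, 100)   # A = nk/100 in exact integers
    if remainder == 0:
        # A is an integer: P_k is halfway between the A-th and (A+1)-th values.
        return (ranked[a - 1] + ranked[a]) / 2
    # A is a fraction: use the next larger integer position B = a + 1.
    return ranked[a]

# Hypothetical sample of n = 20 values.
data = [2, 3, 5, 6, 8, 10, 11, 13, 14, 15,
        17, 18, 20, 22, 25, 28, 30, 33, 36, 60]
q1, q2, q3 = percentile(data, 25), percentile(data, 50), percentile(data, 75)
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(min(data), q1, q2, q3, max(data))   # 5-number summary
print(lower_fence, upper_fence, outliers)
```

Values beyond the fences are the ones usually flagged as outliers on a box-and-whiskers display.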
Bivariate data: 2 variables from the same population elements.
Two qualitative:
(a) Using cross-tabulation or a contingency table with marginal totals. Can use percentages to represent frequencies. Also column totals and row totals.
(b) Using a bar graph.
One qualitative and one quantitative:
(a) Using a normal table, dotplot, or box-and-whiskers display on a common scale.
Two quantitative: predict y (dependent) based on x (independent).
(a) Scatter diagram.
Linear correlation: measures the strength of a linear relationship between two variables.
It is positive when y tends to increase as x increases, negative when y tends to decrease as x increases.
If all points fall on a straight line > perfect positive or perfect negative.
If straight horizontal or vertical line > no correlation. (r = 0)
If there's a pattern but not linear > may be a quadratic relation.
No correlation; positive; high positive; negative; high negative.
Calculating the linear correlation coefficient, r. It is always between -1 and +1. +1 means perfect positive correlation.
Pearson's product moment formula: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n-1)\, s_x s_y}$. $s_x$ and $s_y$ are the standard deviations of the x and y variables.
Another form: $r = \frac{1}{n-1} \sum \left(\frac{x - \bar{x}}{s_x}\right)\left(\frac{y - \bar{y}}{s_y}\right)$, the adjusted average of the z-score of x times the z-score of y.
Linear correlation coefficient, $r = \frac{SS(xy)}{\sqrt{SS(x)\, SS(y)}}$
$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$
$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$
$SS(xy) = \sum xy - \frac{\sum x \sum y}{n}$
r is typically rounded to the nearest hundredth.
As the value of r changes from 0.0 toward -1 or +1, the data points move closer to a straight line. 0 means no correlation.
Causation and lurking variables: Correlation $\neq$ cause-and-effect relationship.
Line of best fit is found by using the method of least squares.
$\hat{y} = b_0 + b_1 x$
Slope: $b_1 = \frac{SS(xy)}{SS(x)}$
Y-intercept: $b_0 = \frac{\sum y - (b_1 \cdot \sum x)}{n}$ OR $b_0 = \bar{y} - (b_1 \cdot \bar{x})$
Must consider if x = 0 is a realistic x value before you can conclude that $\hat{y} = b_0$ if x = 0.
Try NOT to extrapolate. Can extrapolate a bit.
Line of best fit always passes through the centroid $(\bar{x}, \bar{y})$.
Main reason for finding the regression equation is to make predictions.
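As a rough illustration of the SS shortcut formulas, the sketch below computes r, the slope and the y-intercept for a small made-up data set. The function names and the five (x, y) pairs are assumptions for the example only.

```python
import math

def ss_terms(xs, ys):
    """Return SS(x), SS(y), SS(xy) using the sum-of-squares shortcut formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return ss_x, ss_y, ss_xy

def correlation_and_line(xs, ys):
    """Return (r, b0, b1) for the least-squares line y-hat = b0 + b1 * x."""
    n = len(xs)
    ss_x, ss_y, ss_xy = ss_terms(xs, ys)
    r = ss_xy / math.sqrt(ss_x * ss_y)
    b1 = ss_xy / ss_x                       # slope
    b0 = (sum(ys) - b1 * sum(xs)) / n       # y-intercept
    return r, b0, b1

# Hypothetical bivariate sample.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
r, b0, b1 = correlation_and_line(xs, ys)
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(round(r, 2), round(b0, 2), round(b1, 2))
print(b0 + b1 * x_bar, y_bar)   # the fitted line passes through the centroid
```

Checking that the line reproduces the mean of y at the mean of x is a quick sanity test on a hand calculation.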
Property of probability numbers:
(a) Probability is 0 if the event cannot occur, 1 if it occurs every time. Otherwise it's between 0 and 1.
(b) Sum of all probabilities is 1. Events must be non-overlapping. (Exhaustive/all-inclusive)
Probability of an event can be obtained:
(a) Theoretically/classical (equally likely)
(b) Empirically/experimental (using data)
(c) Subjectively (personal judgment)
Probability = long-run proportion.
Sample space (S) is a listing of all possible outcomes from the experiment being considered. Must be equally likely sample points.
An event (A) is a subset of the sample space.
Theoretical approach: $P(A) = \frac{\text{No. of elements in } A}{\text{No. of elements in } S}$
Empirical approach: $P'(A) = \frac{\text{No. of times } A \text{ occurred}}{\text{No. of trials}}$
Law of large numbers or long-term average: the larger the number of trials, the closer the empirical $P'(A)$ is expected to be to the true $P(A)$.
Using a tree diagram.
Complement: $P(A) + P(\bar{A}) = 1.0 = P(S)$
Disjoint events, mutually exclusive = no common outcomes.
Disjoint events: P(A and B) = 0
Not disjoint, mutually inclusive = common outcomes.
P(A or B) = P(A) + P(B) - P(A and B)
$P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$
Two events are independent if the occurrence of A does not affect the probability of B. A and B are unrelated.
If $P(A \mid B) = P(A \mid \bar{B}) = P(A)$, A and B are independent.
For dependent events:
P(A and B) = P(A) x P(B|A) or
P(A and B) = P(B) x P(A|B)
For independent events:
P(A and B) = P(A) x P(B)
When sampling is done w/o replacement, it is dependent. It can be treated as independent when the population is big relative to the sample.
If the sample is below 10% of the population, independent. Anything above, it is dependent.
Making assumptions to make the experiment independent.
DISCRETE PROBABILITY DISTRIBUTION
(DISCRETE RANDOM VARIABLES)
Probability function: LIST USING A TABLE
Probability histogram = probability distribution
Constant function is when P(x) does not change even when x changes.
Every probability function must display 2 basic properties (checklist):
1) $0 \le \text{each } P(x) \le 1$
2) $\sum_{\text{all } x} P(x) = 1$
Population parameters (mean, variance and standard deviation)
Mean (expected value), $\mu$, of a discrete random variable: $\mu = \sum [x P(x)]$
Variance, $\sigma^2$, of a discrete random variable: $\sigma^2 = \sum [x^2 P(x)] - \{\sum [x P(x)]\}^2$ or $\sigma^2 = \sum [x^2 P(x)] - \mu^2$
Standard deviation, $\sigma = \sqrt{\sigma^2}$
BINOMIAL PROBABILITY DISTRIBUTION
(SUCCESS OR FAILURE)
Binomial probability possesses the following properties:
(a) There are n repeated identical independent trials.
(b) Each trial has two possible outcomes (success, failure).
(c) P(success) = p, P(failure) = q, p + q = 1
(d) Binomial random variable X is the count of the number of successful trials that occur. E.g. X = {0, 1, 2, 3, 4}
Probability function: P(x), the probability that there will be exactly x successes in n trials, is
$P(x) = \binom{n}{x} p^x q^{n-x}$ for x = 0, 1, 2, 3, ..., n
$\binom{n}{x} = \frac{n!}{x!\,(n-x)!}$ is the binomial coefficient.
$X \sim B(n = 12, p = 0.05)$ -> B(12, 0.05)
Mean of binomial distribution, $\mu = np$
Standard deviation, $\sigma = \sqrt{npq}$
Shape of binomial dist: symmetric, right skewed, left skewed.
Skewness decreases when n is increased. When n is large, the dist is approx. a normal distribution.
NORMAL PROBABILITY DISTRIBUTIONS
$y = f(x) = \frac{e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}}{\sigma \sqrt{2\pi}}$ for all real x.
$P(a \le x \le b) = \int_a^b f(x)\,dx$
Bigger $\sigma$ = less steep slope
Total area under curve = 1
Mounded and symmetrical, extends indefinitely in both directions. Asymptotic.
Mean divides the area in half.
Nearly all area is between z = -3.00 and z = 3.00
Continuous and unimodal
STANDARD NORMAL DISTRIBUTIONS
$X \sim N(\mu, \sigma)$ -> $Z = \frac{x - \mu}{\sigma}$, $Z \sim N(\mu = 0, \sigma^2 = 1)$
REFER TO STD NORMAL DISTRIBUTION TABLE FOR PROBABILITY
NORMAL APPROXIMATION OF BINOMIAL
Good approximation when:
(a) np and nq LARGER OR EQUAL TO 5, n is greater than 20
(b) $\mu = np$, $\sigma = \sqrt{npq}$
When n is large, convert binomial to normal and use the Z score to calculate probability.
Same for left or right skewed. Increase n.
Converting a binomial discrete random variable TO a normal continuous random variable:
E.g. x = 4 -> 3.5 < x < 4.5, also called the continuity correction factor.
$P(x \text{ is not more than } 3) = P(x \le 3) = P(x < 3.5) = P\left(Z < \frac{3.5 - \mu}{\sigma}\right) = P(Z < -2.05) = 0.0202$
SAMPLING DISTRIBUTION
Elements of the sampling distribution: $\{\bar{x}_1, \bar{x}_2, \bar{x}_3, \dots\}$
$\bar{x}$ is the random variable.
Complete info > sampling dist. Incomplete info > empirical.
Mean: $\mu_{\bar{x}} = \mu$ (pop mean of sample means)
Standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}} = SE$
$Z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
If the sampled population has a normal dist, then the sampling dist of $\bar{x}$ will also be normal for samples of all sizes.
Central Limit Theorem (CLT): as sample size increases, the sampling distribution of sample means (SDSM) will more closely resemble a normal distribution (even if the sampled pop is not normal). If n is big, can use the Empirical Rule to analyse.
More skewed = more n for CLT to work.
Description of SDSM: 1) The mean; 2) Standard error; 3) Indication of how it is distributed.
Uniform, J-shaped, U-shaped, Normal distributions
Sample mean becomes less variable as sample size increases. (Narrower distribution)
Inferential Statistics
Objective is to use the sample data to know more about the population.
1) estimating the value of a pop parameter; 2) testing a hypothesis
Point estimate is a single number, the best guess for the parameter.
Does not tell us HOW CLOSE.
Another sample will not yield the same result. (Questions the quality of the point estimate)
Quality can be enhanced by making the sample statistic: 1) less variable (SE is small); 2) unbiased
-ve/+ve bias (under/overestimate): left/right of mean
Unbiased: on mean
Interval estimate is an interval of numbers; a better estimate, because it incorporates a margin of error which helps to gauge the accuracy of the point estimate.
Interval estimate + level of confidence $(1 - \alpha)$ = confidence interval (CI)
Level of confidence (LoC): probability that the interval includes the parameter being estimated.
Assumptions for estimating mean, $\mu$, using a known $\sigma$:
1) The sample is randomly chosen;
2) $\sigma$ is known;
3) SDSM has a normal distribution (or n is large and use CLT)
(If assumptions are not met, LoC will be lower than stated.)
Confidence interval for mean: $\bar{x} - z(\alpha/2)\left(\frac{\sigma}{\sqrt{n}}\right)$ to $\bar{x} + z(\alpha/2)\left(\frac{\sigma}{\sqrt{n}}\right)$
1) $\bar{x}$ is the point estimate, centerpoint of the CI
2) $z(\alpha/2)$ is the confidence coefficient. $\alpha$ is the error probability. Higher LoC = wider interval
3) $\frac{\sigma}{\sqrt{n}} = \sigma_{\bar{x}}$ is the standard error
4) $z(\alpha/2)\left(\frac{\sigma}{\sqrt{n}}\right)$ is called the maximum error of estimate, E (margin of error). (use higher n to drop E)
5) Lower confidence limit (LCL) to upper confidence limit (UCL)
5 steps confidence interval:
1) Describe the population parameter of interest.
2) Check for assumptions. Identify the probability distribution and formula to use. State LoC, $1 - \alpha$.
3) Sample information. (data given)
4) Determine $z(\alpha/2)$ and E and LCL/UCL.
5) State the confidence interval.
LoC contradicts width. We want high LoC and low width, but higher level of confidence = higher E.
Finding sample size: $n = \left(\frac{z(\alpha/2) \cdot \sigma}{E}\right)^2$, always round up n
Sometimes if E is expressed as a multiple of $\sigma$, then the actual $\sigma$ is not needed.
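A small sketch tying together three formulas from these notes: the binomial probability function, the confidence interval for a mean with known sigma, and the sample-size formula. It uses only the Python standard library (`math.comb` and `statistics.NormalDist`, Python 3.8+). B(12, 0.05) comes from the notes; every other number and all function names are hypothetical.

```python
import math
from statistics import NormalDist

def binomial_pmf(x, n, p):
    """P(x) = nCx * p^x * q^(n-x), the probability of exactly x successes."""
    return math.comb(n, x) * p ** x * (1 - p) ** (n - x)

def z_interval(x_bar, sigma, n, confidence=0.95):
    """(LCL, UCL) for the mean when sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # confidence coefficient z(alpha/2)
    e = z * sigma / math.sqrt(n)              # maximum error of estimate E
    return x_bar - e, x_bar + e

def sample_size(sigma, e, confidence=0.95):
    """n = (z(alpha/2) * sigma / E)^2, always rounded up."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / e) ** 2)

# X ~ B(12, 0.05)
n, p = 12, 0.05
probabilities = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = n * p
std_dev = math.sqrt(n * p * (1 - p))
print(round(probabilities[0], 4), mean, round(std_dev, 3))

# Hypothetical sample: x-bar = 50, sigma = 10, n = 100, 95% LoC.
lcl, ucl = z_interval(50, 10, 100, confidence=0.95)
print(round(lcl, 2), round(ucl, 2))
print(sample_size(sigma=10, e=2, confidence=0.95))
```

Raising the level of confidence in `z_interval` widens the interval, while raising n narrows it, which matches the trade-off between LoC and width.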
Step 1: Null and Alt. hypotheses
Describe the population parameter of interest.
Null hypothesis (H0) is a statement that the parameter takes a particular value (no effect).
Alternative hypothesis (Ha): two-tailed, left/right-tailed.
H0 and Ha MUST be mutually exclusive.
Step 2: Assumptions
A significance test assumes: 1) random data production; 2) normal distribution; 3) $\sigma$ is known
What test statistic are you using?
Determine the level of significance, $\alpha$
Step 3: Test statistics
LIST data given.
Test statistic for mean: $z^* = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
Step 4: P-value or classical approach
P-value: the smaller the p-value, the stronger the evidence against H0. Use z* to find the p-value (two tails: times 2).
Classical: use $\alpha$ to find the critical value and critical region.
Draw the graph. (left/right/two-tailed)
Step 5: Null and Alt. hypothesis conclusion
Is the p-value smaller than $\alpha$, OR is z* in the critical region? Then reject H0.
Fail to reject H0? Not sufficient evidence at the $\alpha$ significance level to show Ha.
The higher the p-value, the weaker the evidence to reject H0.
OR construct a 90% C.I. to reject or not reject.
Advantages of the p-value approach:
1. Results of the test procedure are expressed in terms of a continuous probability scale from 0.0 to 1.0;
2. P-value can be reported; the user can decide on his own, depending on the situation;
3. Computers can do all the calculations. No need for tables.
Statistical significance $\neq$ practical significance
$\alpha$ = P(type I error) = P(reject H0 | H0 is true); $\beta$ = P(type II error) = P(do not reject H0 | H0 is false)
We want $\alpha$, $\beta$ to be as small as possible. Smaller for more serious errors.
If $\alpha$ is reduced, then either $\beta$ must increase or n must be increased; if $\beta$ is decreased, then either $\alpha$ increases or n must be increased; if n is decreased, then either $\alpha$ increases or $\beta$ increases.
The real $\mu$ must be given to calculate $\beta$.
$1 - \beta$ is called the power of the statistical test: the ability of a hypothesis test to reject a false null hypothesis.
One Population Inferences
When $\sigma$ is unknown, estimated standard error $= \frac{s}{\sqrt{n}}$ with df = n - 1; using Student's t-distribution.
If it concerns a mean, use the t-distribution. If not, use Z.
At high df (infinity), it will be the same as the normal dist.
If df is not listed, use the next smaller value of df.
Confidence interval for mean: $\bar{x} - t(df, \alpha/2)\left(\frac{s}{\sqrt{n}}\right)$ to $\bar{x} + t(df, \alpha/2)\left(\frac{s}{\sqrt{n}}\right)$
Test statistic for mean ($\sigma$ is unknown): Student's t-statistic $t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Hypothesis testing is the same as previous. Apply CLT if sample size is more than 30.
Inferences about Binomial Probability of Success, p
Sample binomial probability: $p' = \frac{x}{n}$, unbiased estimator for p
Sampling distribution of $p'$ has: 1. mean $\mu_{p'} = p$; 2. standard error $\sigma_{p'} = \sqrt{\frac{pq}{n}}$; 3. normal if n is big
Changes:
1. Check 3 points of binomial normality: np, nq and n.
2. Check for independence <10%
Using Z-statistics even though $\sigma$ is unknown, because t* -> s has extra error.
Confidence interval using $p'$ is for p (the parameter).
Confidence interval for a proportion: $p' - z(\alpha/2)\sqrt{\frac{p'q'}{n}}$ to $p' + z(\alpha/2)\sqrt{\frac{p'q'}{n}}$, where $p' = \frac{x}{n}$ and $q' = 1 - p'$
Test statistic for proportion: $z^* = \frac{p' - p}{\sqrt{pq/n}}$ with $p' = \frac{x}{n}$; p is the population proportion
Finding sample size: $n = \frac{[z(\alpha/2)]^2 \cdot p^* \cdot q^*}{E^2}$, where p* and q* are provisional values (from pilot study). Always round up n.
If p is not provided, use the conservative approach p = 0.5, q = 0.5.
Two Population Inferences
Same source (matched pair) -> dependent.
Unrelated source -> independent.
Dependent: paired difference: $d = x_1 - x_2$
MUST define your d. For example: d = before - after. It is customary to put big minus small.
Mean of the paired differences: $\mu_d$; $\bar{d}$ is the point estimate of $\mu_d$
Mean and standard deviation of the sample differences: $\bar{d} = \frac{\sum d}{n}$, $s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$
Assumptions: Paired data are randomly selected and normally distributed.
Confidence interval of the mean difference for PAIRED DATA, $\mu_d$ (DEPENDENT): $\bar{d} - t(df, \alpha/2)\frac{s_d}{\sqrt{n}}$ to $\bar{d} + t(df, \alpha/2)\frac{s_d}{\sqrt{n}}$, where df = n - 1
Test statistic for the mean difference for PAIRED DATA, $\mu_d$ (DEPENDENT): $t^* = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$, where df = n - 1; the most common value of $\mu_d$ is zero
Mean Difference between Two Independent Samples
Parameter of interest is $\mu_1 - \mu_2$.
Sampling distribution of $\bar{x}_1 - \bar{x}_2$ has mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and standard error $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, but more likely $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$
Assumptions: Samples are randomly selected from normally distributed populations, in an independent manner. Since the two populations are separate, they are independent.
Confidence interval for the mean difference (independent): $(\bar{x}_1 - \bar{x}_2) \pm t(df, \alpha/2)\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where df equals the smaller df.
df will be the smaller between the 2 df. Therefore the true LoC will be slightly higher than the reported LoC.
Test statistic for mean difference (independent): $t^* = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
$H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 > 0$
Proportion Difference betw 2 Independent Samples
Properties of the sampling distribution of $p'_1 - p'_2$:
Mean: $\mu_{p'_1 - p'_2} = p_1 - p_2$; standard error: $\sigma_{p'_1 - p'_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
Approximately normal dist. if n1 and n2 are sufficiently large.
Assumptions: The n1 and n2 are independently random selected. Population should not be changed after sampling.
$n_1 p'_1$, $n_1 q'_1$, $n_2 p'_2$, $n_2 q'_2$ all $\ge 5$; $n_1, n_2 > 20$
If samples consist of less than 10% of their respective populations;
Confidence interval for proportion difference: $(p'_1 - p'_2) \pm z(\alpha/2)\sqrt{\frac{p'_1 q'_1}{n_1} + \frac{p'_2 q'_2}{n_2}}$
Test statistic for proportion difference, population proportion known: $z^* = \frac{p'_1 - p'_2}{\sqrt{pq\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$
When p1 or p2 are not specified, use the pooled probability $p'_p = \frac{x_1 + x_2}{n_1 + n_2}$, $q'_p = 1 - p'_p$: $z^* = \frac{p'_1 - p'_2}{\sqrt{p'_p q'_p\left[\frac{1}{n_1} + \frac{1}{n_2}\right]}}$
Linear Correlation and Regression Analysis
Hypothesis testing for the linear correlation coefficient:
$H_0: \rho = 0$ (x, y unrelated); $H_a: \rho \neq 0$ (or $\rho > 0$, $\rho < 0$)
Use the new table. Assumptions: The set of (x, y) ordered pairs forms a random sample, and the y values at each x have a normal distribution.
Use the r-distribution with n - 2 df.
Use r* = r (linear correlation coefficient)
If r^2 is 0.99, can interpret as 99% of the variation can be explained by the regression line. r^2 = 1, all points on the line.
Linear model is used to explain bivariate data in the population: $y = \beta_0 + \beta_1 x + \varepsilon$
$\varepsilon$ is the random experimental error in the observed value of y at a given value of x (difference betw. the mean value of y and the experimental y).
$b_0$ is our estimate of $\beta_0$ (pop y-intercept), and $b_1$ our estimate of $\beta_1$ (pop slope).
$b_0$ and $b_1$ are not fixed because they are sample statistics.
Estimate of the experimental error is approximated by $e = y - \hat{y}$
Mean of e is 0
Variance is $\sigma_\varepsilon^2$
Variance is not the same at each $x_i$. Can see from the scatter plot.
Variance of the estimated error, e: $s_e^2 = \frac{\sum y^2 - (b_0)(\sum y) - (b_1)(\sum xy)}{n - 2}$
You need $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum y^2$, n
Estimate for variance of slope: $s_{b_1}^2 = \frac{s_e^2}{SS(x)}$, but this should be given
Confidence interval for slope: $b_1 \pm t(n-2, \alpha/2) \cdot s_{b_1}$ ($s_{b_1}$ given)
Test statistic for slope ($\beta_1 = 0$): $t^* = \frac{b_1 - \beta_1}{s_{b_1}}$
$H_0: \beta_1 = 0$; $H_a: \beta_1 > 0$ (if can justify; if not, use two-tailed)
Assumptions about the linear correlation coefficient: The set of (x, y) ordered pairs forms a random sample, and the y values at each x have a normal distribution.
Use the t-distribution with n - 2 df.
95% C.I. for $\mu_{y|x}$ -> pop mean of y at each x
95% prediction interval for $y_x$ -> individual y at random at each x
P.I. > C.I.
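Two of the two-sample test statistics in these notes, sketched with the standard library only: the paired-difference t* and the pooled two-proportion z*. All data and function names are hypothetical. The standard library has no t-distribution, so the sketch stops at t* and its df, which would then be compared against a t table.

```python
import math
from statistics import NormalDist, mean, stdev

def paired_t_statistic(before, after, mu_d=0.0):
    """Return (d_bar, s_d, t_star, df) with d defined as before - after."""
    d = [b - a for b, a in zip(before, after)]
    n = len(d)
    d_bar = mean(d)
    s_d = stdev(d)                       # sample st. dev., divisor n - 1
    t_star = (d_bar - mu_d) / (s_d / math.sqrt(n))
    return d_bar, s_d, t_star, n - 1

def pooled_two_proportion_z(x1, n1, x2, n2):
    """Return (z_star, two-tailed p-value) for H0: p1 - p2 = 0."""
    p1, p2 = x1 / n1, x2 / n2
    pooled_p = (x1 + x2) / (n1 + n2)     # pooled probability p'_p
    pooled_q = 1 - pooled_p
    z_star = (p1 - p2) / math.sqrt(pooled_p * pooled_q * (1 / n1 + 1 / n2))
    p_value = 2 * (1 - NormalDist().cdf(abs(z_star)))   # two tails: times 2
    return z_star, p_value

# Hypothetical paired data (same subjects measured twice).
before = [10, 12, 9, 11, 13]
after = [8, 11, 9, 9, 10]
d_bar, s_d, t_star, df = paired_t_statistic(before, after)
print(round(d_bar, 2), round(s_d, 3), round(t_star, 2), df)

# Hypothetical independent samples: 45 of 100 vs 30 of 100 successes.
z_star, p_value = pooled_two_proportion_z(45, 100, 30, 100)
print(round(z_star, 2), round(p_value, 4))
```

Whether H0 is rejected then follows Step 5: compare the p-value with the chosen level of significance, or t* with the critical value for the given df.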
