
Descriptive statistics: collection, presentation, and description of sample data.

Inferential statistics: interpreting values, making decisions, and drawing conclusions.
Qualitative variable (a): describes or categorizes an element of the population (hair color, satisfaction level).
(a) Nominal: a variable that characterizes an element.
(a) Ordinal: a variable that incorporates an ordered position or ranking.
Quantitative variable (b): quantifies an element of the population (numerical price of books, etc.).
(b) Discrete: a variable that can assume a countable number of values (gaps between values).
(b) Continuous: a variable that can assume an uncountable number of values (no gaps).
Experiment: a fully controlled activity to select elements and data values.
The independent (treatment/explanatory/predictor/x) variable influences the dependent (subject/response/outcome/y) variable.
An experiment is also known as a randomized comparative experiment.
Random allocation: 1. prevents bias; 2. balances the groups on variables that you know affect the response; 3. balances the groups on lurking variables unknown to you.
Compare primary vs secondary treatment results to analyze the effectiveness of the primary treatment.
Placebo effect = response to a dummy treatment. Prove that the observed effect of the treatment is not simply due to the placebo effect.
Blinding: subjects do not know which group they were allocated to. If the data collectors and experimenters do not know either, it is double blinding.
Advantages: 1. the effect of x on y can be studied accurately; 2. reduces the potential for lurking variables.
E.g. experimental units: 200 volunteers; treatment: new medicine; y: blood pressure; x: treatment type.
For finding cause and effect, use an experiment.
Observational study: no controlled activity.
Natural setting; draw conclusions based on observations without doing anything to the subjects.
Done for factors that are not modifiable.
Done when an experiment would be unethical or impractical.
Lurking variables are present.
Retrospective (case-control) study: look back to collect information. Information is available on subjects with the response outcome of interest (past to immediate).
Prospective (cohort) study: follows subjects into the future.
E.g. response outcome of interest: lung cancer; cases: with lung cancer; controls: without lung cancer. Explanatory variable: past smoker?
For spotting associations, use an observational study.
It does not matter what the response variable is; there will most likely be variability in the data if the tool of measurement is precise enough.
Variability: the extent to which data values for a particular variable differ from each other.
Statistical process control: controlling the variability in a manufacturing process is a field all its own.

1. Find H (highest value) and L (lowest value) and calculate the range.
2. Select the number of classes (m = 7) and the class width (c = 10) so that mc = 70 is a bit larger than the range (let's say range = 59).
3. Pick a starting point. It should be a little smaller than L. Count up by tens (the class width). We get the class boundaries (35 <= x < 45).
Each class has a lower and an upper boundary (35 and 45).
We lose the raw data when using a grouped frequency distribution (GFD).
Constructing a histogram: title, vertical scale for frequencies/relative frequencies, horizontal scale labeled using class boundaries or class midpoints.
Histogram: a bar graph that represents a frequency distribution of a quantitative variable.
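The three construction steps above can be sketched in Python as follows. This is only an illustration: the function name `grouped_frequency` and the `scores` data are hypothetical, chosen so that L = 36, H = 95 and the range is 59, as in the example.

```python
def grouped_frequency(data, start, width, num_classes):
    """Count values into classes start <= x < start + width, and so on."""
    counts = [0] * num_classes
    for x in data:
        index = int((x - start) // width)
        if not 0 <= index < num_classes:
            raise ValueError(f"{x} falls outside the chosen classes")
        counts[index] += 1
    # One row per class: (lower boundary, upper boundary, frequency)
    return [
        (start + i * width, start + (i + 1) * width, counts[i])
        for i in range(num_classes)
    ]


# Hypothetical data: L = 36, H = 95, range = 59, so m = 7 and c = 10 (mc = 70).
scores = [36, 41, 47, 52, 58, 58, 63, 67, 71, 74, 78, 82, 88, 95]
table = grouped_frequency(scores, start=35, width=10, num_classes=7)
for lower, upper, freq in table:
    print(f"{lower} <= x < {upper}: {freq}")
```

Raising an error for values outside the classes is a design choice here; it catches a starting point or class count that does not cover the whole range.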

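Descriptions for histograms: symmetrical, normal (triangle), uniform (rectangle), skewed (with 2 tails), J-shaped (no tail on the side with the highest frequency), bimodal.
Modal class: the class with the highest frequency.
A bimodal distribution has two high-frequency classes. Their frequencies need not be the same.
Cumulative frequency distribution: the cumulative frequency of a class is the sum of the frequency for that class and those below it.
Cumulative relative frequency distribution: the cumulative frequency divided by the total frequency.
Ogive: a line graph of a cumulative frequency distribution (CFD) or a cumulative relative frequency distribution (CRFD).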

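Measure of Central Tendency
Sample mean: $\bar{x} = \frac{\sum x}{n}$. Population mean: $\mu = \frac{\sum x}{N}$.
Sample median (ranked middle position): $\tilde{x}$. It sits at the $\frac{n+1}{2}$th position. If the position is the 3.5th, find the sum of the 3rd and 4th values and divide by 2.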
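As a small illustration of the mean and the median-by-position rule, here is a Python sketch. The function names and the `data` list are hypothetical, and the median uses the $\frac{n+1}{2}$ depth rule.

```python
def sample_mean(values):
    """Sum of the values divided by how many there are."""
    return sum(values) / len(values)


def sample_median(values):
    """Median located by its depth (n + 1) / 2 in the ranked data."""
    ranked = sorted(values)
    depth = (len(ranked) + 1) / 2        # e.g. 3.5 when n = 6
    lower = ranked[int(depth) - 1]       # value at the floor of the depth (1-based)
    if depth.is_integer():
        return lower
    upper = ranked[int(depth)]           # next ranked value
    return (lower + upper) / 2           # halfway between the two middle values


data = [6, 3, 8, 5, 3, 9]                # hypothetical sample, n = 6
print(sample_mean(data), sample_median(data))
```

With six values the depth is 3.5, so the median is the average of the 3rd and 4th ranked values.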
Mode: the value of x that is most frequent. If there is a tie, there is no mode.
Midrange = $\frac{L + H}{2}$
Mean and median are the most important.
Round-off rule: keep one more decimal place in the answer than was given in the original information. Only round off the final answer, not intermediate steps.
Biased sampling method: produces data that systematically differ from the sampled population. Repeated sampling is no use.
Unbiased sampling method: the data are representative of the sampled population.
Convenience sample or grab sample: items chosen arbitrarily and in an unstructured manner from the population.
Volunteer sample: results from those who chose to contribute the needed information on their own initiative. Those with strong feelings will take the survey.
Collection of data for statistical analysis:
1. Define the objectives of the survey or study.
2. Define the variable and population of interest.
3. Define the data collection and data measuring schemes.
4. Collect the sample.
5. Review the sampling process upon completion of collection.
Census: covers the whole population. A sample survey is commonly used instead.
Sampling frame: the list of those in the population from which the sample will be drawn. Must be representative of the population.
The process of selecting sample elements is called the sample design.
Sample design: probability samples or judgment samples.
Judgment sample: a nonrandom sample selected based on the opinion of an expert. Biased.
Probability sample: each element has a certain probability of being selected.
Probability sample: single-stage vs multistage.
Single-stage sampling: all elements are treated equally and there is no subdividing or partitioning of the frame.
Simple random sample (SRS): every element has an equal probability of being chosen. With replacement from a finite population, or without replacement from an infinite population. Assign a number to each element and pick from the numbers randomly. Unbiased.
SRS is an inefficient technique.
Inherent in the concept of randomness is the idea that the next result is not predictable.
Systematic sampling method: selecting every kth item, starting from a randomly chosen first element.

If we desire a 3% systematic sample, we locate the first item by randomly selecting an integer between 1 and 33 ($\frac{100}{3} \approx 33$). Suppose 23 was chosen; then pick the 23rd, 56th, 89th, etc.
Easy to describe and execute.
Disproportional if the sampling frame is repetitive or cyclical.
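The 3% example above can be sketched in Python. The helper names are hypothetical, and rounding 100/percent to the nearest integer to get k is an assumption made for the illustration.

```python
import random


def systematic_positions(frame_size, k, first):
    """1-based positions first, first + k, first + 2k, ... within the frame."""
    return list(range(first, frame_size + 1, k))


def systematic_sample(frame, percent, rng=random):
    """Draw roughly `percent` % of the frame by taking every k-th item."""
    k = round(100 / percent)             # 3% -> every 33rd item
    first = rng.randint(1, k)            # random starting position in 1..k
    return [frame[p - 1] for p in systematic_positions(len(frame), k, first)]


# The case from the text: start at 23 and step by 33 in a frame of 100 items.
print(systematic_positions(100, 33, 23))
```

Passing a seeded `random.Random` instance as `rng` makes a draw reproducible.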

Multistage methods: for very large populations.
(a) Multistage random sample: elements are subdivided and the sample is chosen in more than one stage.
(b) Stratified random sample: subdivide the population into strata (usually by naturally occurring subdivisions) and then draw a subsample from each stratum by SRS or systematic sampling ($90 \times \frac{40}{180} = 20$). Finally, combine the subsamples to draw a conclusion on the population. Strata should be mutually exclusive. Strata should be collectively exhaustive: no population element is excluded.
(2b) Proportional stratified sampling: make sure the number of items from each stratum is proportional to the size of the stratum.
(c) Cluster sample: use SRS or systematic sampling to select the strata (clusters) (1st stage), then use SRS or systematic sampling AGAIN to select elements from each cluster (2nd stage). Advantages: cheaper, low admin and travel costs. Disadvantage: higher sampling error.

Qualitative Data
Pie chart: data shown as slices of a circle.
Bar graph: data shown as proportionally sized rectangles.
Pareto diagram: bars are arranged from most to least frequent, highest to lowest (for nominal data). It also includes a line graph displaying the cumulative % and counts for the bars.

Quantitative Data
Distribution: the pattern of variability displayed by the data of a variable. Displays the frequency of each value of the variable.
Dot plot: displayed horizontally or vertically. Each data value = one dot along a scale. Sorts the data into numerical order. Description: where most of the dots are; number of peaks; skewed or symmetrical; any outliers; numerical range.
Stem-and-leaf display: leading digits = stem, trailing digits = leaf (e.g. leaf unit = 0.1, stem unit = 1.0, 6.2 = 6|2). In numerical order. Shows: 1. number of peaks (unimodal/bimodal); 2. symmetry. Minimum 5 stems.
Back-to-back stem-and-leaf display: for two distinct distributions.
Side-by-side dot plots must use the same scale.
Frequency distribution: a listing, expressed in chart form, that pairs values of a variable with their frequencies.
Grouped vs ungrouped frequency distribution: ungrouped when each x stands alone.

Class number   Class tallies   Class boundaries   Frequency   Class midpoint/mark
1              |||             50 <= x < 60       3           55

Constructing a grouped frequency distribution: the classes must be mutually exclusive and collectively exhaustive.

Measure of Dispersion
Range = $H - L$
Mean deviation = $\frac{\sum |x - \bar{x}|}{n}$
Sample variance: $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{\sum x^2 - \frac{(\sum x)^2}{n}}{n - 1}$
Population variance: $\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
$\sum (x - \bar{x})$ is always zero (use it as a check).
Sample standard deviation: $s = \sqrt{s^2}$
Empirical Rule and testing for normality (68%, 95%, 99.7%): about 68% of the data lie within 1 s.d. of the mean, 95% within 2 s.d., and 99.7% within 3 s.d. If true, the distribution is normal (bell shaped).
Chebyshev's Theorem: if the distribution is not normal, use Chebyshev's Theorem (CT) to find how much data will fall within intervals centered at the mean; it holds for all distributions. The proportion of any distribution that lies within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any positive number greater than 1.

Measure of Position
Q1 = P25: at most 25% of the data are smaller in value than Q1 and at most 75% are larger. Q2 = P50 = $\tilde{x}$ is the median.
Finding a percentile $P_k$:
1. Rank the n data values from lowest to highest.
2. Calculate $A = \frac{nk}{100}$.
3. If A is an integer, add 0.5: $d(P_k) = A.5$. $P_k$ is halfway between the values in the Ath and (A+1)th positions.
4. If A is a fraction, $d(P_k) = B$, where B is the next larger integer. $P_k$ is the value in the Bth position.
Midquartile = $\frac{Q_1 + Q_3}{2}$. This is a measure of position.
Midrange is a measure of central tendency.
Interquartile range (IQR) = $Q_3 - Q_1$. It is resistant to outliers.
5-number summary: L, Q1, $\tilde{x}$ (Q2), Q3, H.
Box-and-whisker display: L, line, Q1, box, Q2, box, Q3, line, H.
Lower fence: $Q_1 - 1.5(\text{IQR})$; upper fence: $Q_3 + 1.5(\text{IQR})$.
Standard score: $z = \frac{x - \bar{x}}{s}$; z-scores typically range in value from -3.00 to +3.00 (Empirical Rule or Chebyshev).
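The percentile-depth procedure and the fences can be sketched in Python as below. The function names and the `data` list are hypothetical, and k is restricted to whole numbers from 1 to 99 so that the integer-or-fraction test on A = nk/100 is exact.

```python
def percentile(values, k):
    """k-th percentile using the depth rule A = nk / 100."""
    if not (isinstance(k, int) and 0 < k < 100):
        raise ValueError("k must be an integer between 1 and 99")
    ranked = sorted(values)
    a, remainder = divmod(len(ranked) * k, 100)
    if remainder == 0:
        # A is an integer: halfway between the A-th and (A+1)-th values
        return (ranked[a - 1] + ranked[a]) / 2
    # A is a fraction: B = a + 1 is the next larger integer (1-based position)
    return ranked[a]


def fences(values):
    """Lower and upper fences: Q1 - 1.5 IQR and Q3 + 1.5 IQR."""
    q1, q3 = percentile(values, 25), percentile(values, 75)
    iqr = q3 - q1
    return q1 - 1.5 * iqr, q3 + 1.5 * iqr


data = [2, 4, 5, 7, 8, 9, 11, 12]        # hypothetical ranked sample, n = 8
print(percentile(data, 25), percentile(data, 75), fences(data))
```

For n = 8 and k = 25, A = 2 is an integer, so Q1 is halfway between the 2nd and 3rd ranked values.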
Bivariate data: two variables measured on the same population elements.
Two qualitative variables:
(a) Use a cross-tabulation or contingency table with marginal totals (row totals and column totals). Percentages can be used to represent the frequencies.
(b) Use a bar graph.
One qualitative and one quantitative variable:
(a) Use a normal table, dot plots, or box-and-whisker displays on a common scale.
Two quantitative variables: predict y (dependent) based on x (independent).
(a) Scatter diagram.
Linear correlation: measures the strength of a linear relationship between two variables.
It is positive when y tends to increase as x increases, and negative when y tends to decrease as x increases.
If all points fall on a straight line: perfect positive or perfect negative correlation.
If the points form a straight horizontal or vertical line: no correlation (r = 0).
If there is a pattern but it is not linear: it may be a quadratic relation.
Types: no correlation; positive; high positive; negative; high negative.
Calculating the linear correlation coefficient, r: it is always between -1 and +1. +1 means perfect positive correlation.

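Pearson's product moment formula: $r = \frac{\sum (x - \bar{x})(y - \bar{y})}{(n - 1) s_x s_y}$, where $s_x$ and $s_y$ are the standard deviations of the x and y variables.
Another form: $r = \frac{1}{n - 1} \sum \left( \frac{x - \bar{x}}{s_x} \right) \left( \frac{y - \bar{y}}{s_y} \right)$, the adjusted average of the z-score of x times the z-score of y.
Linear correlation coefficient: $r = \frac{SS(xy)}{\sqrt{SS(x)\,SS(y)}}$
$SS(x) = \sum x^2 - \frac{(\sum x)^2}{n}$
$SS(y) = \sum y^2 - \frac{(\sum y)^2}{n}$
$SS(xy) = \sum xy - \frac{\sum x \sum y}{n}$
r is typically rounded to the nearest hundredth.
As the value of r changes from 0.0 toward -1 or +1, the data points move closer to a straight line. 0 means no correlation.
Population median: M. Depth of the median: $d(\tilde{x}) = \frac{n + 1}{2}$. Then you start counting through the ranked data and locate the value at depth $d(\tilde{x})$.
Standard score: $z = \frac{\text{value} - \text{mean}}{\text{st. dev.}}$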
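The SS(x), SS(y), SS(xy) form of r translates directly into code. The sketch below uses hypothetical function names and a small made-up data set.

```python
import math


def sums_of_squares(xs, ys):
    """Return SS(x), SS(y), SS(xy) from the raw sums."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_y = sum(y * y for y in ys) - sum(ys) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    return ss_x, ss_y, ss_xy


def correlation(xs, ys):
    """Linear correlation coefficient r = SS(xy) / sqrt(SS(x) SS(y))."""
    ss_x, ss_y, ss_xy = sums_of_squares(xs, ys)
    return ss_xy / math.sqrt(ss_x * ss_y)


xs = [1, 2, 3, 4, 5]                     # hypothetical bivariate data
ys = [2, 4, 5, 4, 5]
print(round(correlation(xs, ys), 2))
```

Here SS(x) = 10, SS(y) = 6 and SS(xy) = 6, so r = 6 / sqrt(60), which rounds to 0.77.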
Causation and lurking variables: correlation $\neq$ cause-and-effect relationship.
The line of best fit is found by using the method of least squares: $\hat{y} = b_0 + b_1 x$
Slope: $b_1 = \frac{SS(xy)}{SS(x)}$
y-intercept: $b_0 = \frac{\sum y - b_1 \sum x}{n}$ OR $b_0 = \bar{y} - b_1 \bar{x}$
You must consider whether x = 0 is a realistic x value before you can conclude that $\hat{y} = b_0$ when x = 0.
Try NOT to extrapolate. Extrapolating a little beyond the data is acceptable.
The line of best fit always passes through the centroid $(\bar{x}, \bar{y})$.
The main reason for finding the regression equation is to make predictions.
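A minimal Python sketch of the slope and intercept formulas, with a check that the fitted line passes through the centroid. The function name and the data are hypothetical.

```python
def least_squares_line(xs, ys):
    """Return (b0, b1) for the line of best fit y-hat = b0 + b1 * x."""
    n = len(xs)
    ss_x = sum(x * x for x in xs) - sum(xs) ** 2 / n
    ss_xy = sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n
    b1 = ss_xy / ss_x                    # slope = SS(xy) / SS(x)
    b0 = (sum(ys) - b1 * sum(xs)) / n    # y-intercept
    return b0, b1


xs = [1, 2, 3, 4, 5]                     # hypothetical bivariate data
ys = [2, 4, 5, 4, 5]
b0, b1 = least_squares_line(xs, ys)

# Predicted value at the centroid's x-coordinate; it should equal y-bar.
x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
print(b0, b1, b0 + b1 * x_bar, y_bar)
```

For this data the slope is 6/10 = 0.6 and the intercept is 2.2, so the prediction at x-bar = 3 is 4, which is y-bar.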
Properties of probability numbers:
(a) Probability is 0 if the event cannot occur and 1 if it occurs every time. Otherwise it is between 0 and 1.
(b) The sum of all probabilities is 1. Events must be nonoverlapping and exhaustive (all-inclusive).
The probability of an event can be obtained:
(a) Theoretically/classically (equally likely outcomes)
(b) Empirically/experimentally (using data)
(c) Subjectively (personal judgment)
Sample space (S): a listing of all possible outcomes from the experiment being considered. The sample points must be equally likely. It can be listed using a tree diagram.
Probability = long-run proportion.
An event (A) is a subset of the sample space.
Theoretical approach: $P(A) = \frac{n(A)}{n(S)}$ = (no. of outcomes in A) / (no. of elements in S)
Empirical approach: $P'(A) = \frac{n(A)}{n}$ = (no. of times A occurred) / (no. of trials)
$P'(A)$ approaches the expected $P(A)$ if we have a large number of trials.
Law of large numbers (long-term average): the larger the number of trials, the closer the empirical $P'(A)$ is to the true $P(A)$.
Complement: $P(A) + P(\bar{A}) = 1.0 = P(S)$
Disjoint (mutually exclusive) events: no common outcomes, so P(A and B) = 0.
Not disjoint (mutually inclusive): common outcomes.
P(A or B) = P(A) + P(B) - P(A and B)
Conditional probability: $P(A|B) = \frac{P(A \cap B)}{P(B)}$, $P(B|A) = \frac{P(A \cap B)}{P(A)}$
Two events are independent if the occurrence of A does not affect the probability of B; A and B are unrelated.
If $P(A|B) = P(A|\bar{B}) = P(A)$, A and B are independent.
For dependent events:
P(A and B) = P(A) × P(B|A) or
P(A and B) = P(B) × P(A|B)
For independent events:
P(A and B) = P(A) × P(B)
When sampling is done without replacement, the draws are dependent. They can be treated as independent when the population is large relative to the sample.
If the sample is below 10% of the population, treat the draws as independent. Anything above that, treat them as dependent.
Such assumptions are made so that the trials of the experiment can be treated as independent.

DISCRETE PROBABILITY DISTRIBUTION
(DISCRETE RANDOM VARIABLES)
X is the random variable.
Probability histogram = probability distribution. LIST USING A TABLE.
A constant function is one where P(x) does not change even when x changes.
Every probability function must display 2 basic properties (checklist):
1) $0 \le \text{each } P(x) \le 1$
2) $\sum_{\text{all } x} P(x) = 1$
Population parameters (mean, variance, and standard deviation):
Mean (expected value), $\mu$, of a discrete random variable: $\mu = \sum [x P(x)]$
Variance, $\sigma^2$, of a discrete random variable: $\sigma^2 = \sum [x^2 P(x)] - \{\sum [x P(x)]\}^2$ or $\sigma^2 = \sum [x^2 P(x)] - \mu^2$
Standard deviation: $\sigma = \sqrt{\sigma^2}$

BINOMIAL PROBABILITY DISTRIBUTION
(SUCCESS OR FAILURE)
A binomial probability experiment possesses the following properties:
(a) There are n repeated, identical, independent trials.
(b) Each trial has two possible outcomes (success, failure).
(c) P(success) = p, P(failure) = q, p + q = 1.
(d) The binomial random variable X is the count of the number of successful trials that occur, e.g. X = {0, 1, 2, 3, 4}.
Probability function: P(x), the probability that there will be exactly x successes in n trials, is
$P(x) = \binom{n}{x} p^x q^{n-x}$ for x = 0, 1, 2, 3, ..., n
$\binom{n}{x} = \frac{n!}{x!(n-x)!}$ is the binomial coefficient (nCx).
Notation: $X \sim B(n = 12, p = 0.05)$, i.e. B(12, 0.05).
Mean of a binomial distribution: $\mu = np$
Standard deviation: $\sigma = \sqrt{npq}$
Shape of a binomial distribution: symmetric, right skewed, or left skewed.
Skewness decreases when n is increased. When n is large, the distribution is approximately normal.

NORMAL PROBABILITY DISTRIBUTIONS
$y = f(x) = \frac{e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}}{\sigma \sqrt{2\pi}}$ for all real x.
$P(a \le x \le b) = \int_a^b f(x)\,dx$
Bigger $\sigma$ = less steep slope.
REFER TO THE STANDARD NORMAL DISTRIBUTION TABLE FOR PROBABILITY
Total area under the curve = 1.
Mounded and symmetrical; extends indefinitely in both directions.
Asymptotic.
The mean divides the area in half.
Nearly all the area lies between z = -3.00 and z = 3.00.
Continuous and unimodal.

STANDARD NORMAL DISTRIBUTIONS
$X \sim N(\mu, \sigma) \rightarrow Z = \frac{x - \mu}{\sigma}$
$Z \sim N(\mu = 0, \sigma^2 = 1)$

NORMAL APPROXIMATION OF BINOMIAL
The normal distribution is a good approximation for the binomial when n is large: convert the binomial to a normal and use the z-score to calculate the probability.
The same holds for left- or right-skewed binomials: increase n.
(a) Assume np and nq are both larger than or equal to 5, and n is greater than 20.
(b) $\mu = np$, $\sigma = \sqrt{npq}$
Converting a binomial (discrete) random variable to a normal (continuous) random variable: e.g. x = 4 becomes 3.5 < x < 4.5, also called the continuity correction factor.
E.g. $P(x \text{ is not more than } 3) = P(x \le 3) = P(x < 3.5) = P\left(Z < \frac{3.5 - \mu}{\sigma}\right) = P(Z < -2.05) = 0.0…$

SAMPLING DISTRIBUTION
Elements of the sampling distribution: $\{\bar{x}_1, \bar{x}_2, \bar{x}_3, \ldots\}$
Mean: $\mu_{\bar{x}} = \mu$ (the population mean of the sample means)
Standard error: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$ = SE (standard error)
Complete information > sampling distribution > $z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}}$
Incomplete information > empirical.
If the sampled population has a normal distribution, then the sampling distribution of $\bar{x}$ will also be normal for samples of all sizes.
Central Limit Theorem (CLT): as the sample size increases, the sampling distribution of sample means (SDSM) will more closely resemble a normal distribution (even if the sampled population is not normal). If n is big, the Empirical Rule can be used to analyse it.
Description of the SDSM: 1) the mean; 2) the standard error; 3) an indication of how it is distributed.
Population shapes: uniform, J-shaped, U-shaped, normal distributions.
The sample mean becomes less variable as the sample size increases (narrower distribution).
More skewed = larger n needed for the CLT to work.

Inferential Statistics
The objective is to use the sample data to know more about the population:
1) estimating the value of a population parameter; 2) testing a hypothesis.
Point estimate: a single number that is the best guess for the parameter.
It does not tell us HOW CLOSE the estimate is.
Another sample will not yield the same result (this questions the quality of the point estimate).
Quality can be enhanced by making the sample statistic: 1) less variable (SE is small); 2) unbiased.
Negative/positive bias (under/overestimate): left/right of the mean.
Unbiased: on the mean.
Interval estimate: an interval of numbers; a better estimate, because it incorporates a margin of error which helps to gauge the accuracy of the point estimate.
Interval estimate + level of confidence ($1 - \alpha$) = confidence interval (CI).
Level of confidence (LoC): the probability that the interval includes the parameter being estimated.
Assumptions for estimating the mean, $\mu$, using a known $\sigma$:
1) The sample is randomly chosen;
2) $\sigma$ is known;
3) The SDSM has a normal distribution (or n is large and the CLT is used).
(If the assumptions are not met, the LoC will be lower than stated.)
Confidence interval for the mean: $\bar{x} - z(\alpha/2) \left( \frac{\sigma}{\sqrt{n}} \right)$ to $\bar{x} + z(\alpha/2) \left( \frac{\sigma}{\sqrt{n}} \right)$
1) $\bar{x}$ is the point estimate, the center point of the CI.
2) $z(\alpha/2)$ is the confidence coefficient.
3) $\alpha$ is the error probability. Higher level of confidence = wider interval.
4) $z(\alpha/2) \left( \frac{\sigma}{\sqrt{n}} \right)$ is called the maximum error of estimate, E (margin of error). Use a higher n to reduce E.
5) $\frac{\sigma}{\sqrt{n}} = \sigma_{\bar{x}}$ is the standard error.
The interval runs from the lower confidence limit (LCL) to the upper confidence limit (UCL).
5 steps for a confidence interval:
1) Describe the population parameter of interest.
2) Check the assumptions. Identify the probability distribution and the formula to use. State the LoC, $1 - \alpha$.
3) Sample information (data given).
4) Determine $z(\alpha/2)$, E, and the LCL/UCL.
5) State the confidence interval.
LoC works against width: we want a high LoC and a narrow interval, but a higher level of confidence gives a higher E.
Finding the sample size: $n = \left( \frac{z(\alpha/2) \cdot \sigma}{E} \right)^2$; always round n up.
Sometimes E is expressed as a multiple of $\sigma$; then the actual $\sigma$ is not needed.

Step 1: Null and alternative hypotheses
Describe the population parameter of interest.
The null hypothesis (H0) is a statement that the parameter takes a particular value (no effect).
Alternative hypothesis (Ha): two-tailed, or left/right-tailed.
H0 and Ha MUST be mutually exclusive.
Step 2: Assumptions
A significance test assumes:
1) random data production;
2) a normal distribution;
3) $\sigma$ is known.
What test statistic are you using?
Determine the level of significance, $\alpha$.
Step 3: Test statistic
LIST the data given.
Test statistic for the mean: $z^* = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$
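The test statistic for the mean with known $\sigma$, together with the p-value it leads to, can be sketched with Python's standard library. The function name `z_test`, its `tail` parameter, and the numbers in the example are all hypothetical; `statistics.NormalDist` stands in for the standard normal table.

```python
from math import sqrt
from statistics import NormalDist


def z_test(sample_mean, mu0, sigma, n, tail="two"):
    """Return (z*, p-value) for H0: mu = mu0 when sigma is known."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf               # standard normal cumulative area
    if tail == "left":
        p = cdf(z)
    elif tail == "right":
        p = 1 - cdf(z)
    elif tail == "two":
        p = 2 * (1 - cdf(abs(z)))        # two tails: double the one-tail area
    else:
        raise ValueError("tail must be 'left', 'right' or 'two'")
    return z, p


# Hypothetical data: x-bar = 52, H0: mu = 50, sigma = 6, n = 36.
z_star, p_value = z_test(52, 50, 6, 36)
print(round(z_star, 2), round(p_value, 4))
```

The p-value is then compared with the chosen $\alpha$; in this made-up case the standard error is 1, so the test statistic is 2.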

Step 4: P-value or classical approach
P-value: the smaller the p-value, the stronger the evidence against H0. Use $z^*$ to find the p-value (for two tails, multiply by 2).
Classical: use $\alpha$ to find the critical value and critical region.
Draw the graph (left/right/two-tailed).
Is the p-value smaller than $\alpha$, OR is the test statistic in the critical region?
Step 5: Conclusion about the null and alternative hypotheses
Fail to reject H0? Then there is not sufficient evidence at the $\alpha$ level of significance to show that the alternative holds.
The lower the p-value, the stronger the grounds to reject H0.
Advantages of the p-value approach:
1. Results of the test procedure are expressed in terms of a continuous probability scale from 0.0 to 1.0;
2. The p-value can be reported, and the user can decide on his own, depending on the situation;
3. Computers can do all the calculations. No need for tables.
OR construct a 90% C.I. to decide whether to reject or not reject.
Statistical significance $\neq$ practical significance.
We want $\alpha$ and $\beta$ to be as small as possible. Smaller for more serious errors.
If $\alpha$ is reduced, then either $\beta$ must increase or n must be increased; if $\beta$ is decreased, then either $\alpha$ increases or n must be increased; if n is decreased, then either $\alpha$ increases or $\beta$ increases.
$\beta$ = P(type II error) = P(do not reject H0 | H0 is false). The real $\mu$ must be given to calculate $\beta$, from a probability of the form $P\left( Z < \frac{\bar{x} - \mu_a}{\sigma / \sqrt{n}} \right)$, with $\mu_a$ the real mean.
$1 - \beta$ is called the power of the statistical test: the ability of a hypothesis test to reject a false null hypothesis.

One Population Inferences
When $\sigma$ is unknown, use the estimated standard error, $\frac{s}{\sqrt{n}}$, with df = n - 1, and Student's t-distribution.
If the inference concerns a mean, use the t-distribution. If not, use Z.
At high df (infinity), the t-distribution is the same as the normal distribution.
If the df is not listed in the table, use the next smaller value of df.
Changes from the known-$\sigma$ case:
Confidence interval for the mean: $\bar{x} - t(df, \alpha/2) \left( \frac{s}{\sqrt{n}} \right)$ to $\bar{x} + t(df, \alpha/2) \left( \frac{s}{\sqrt{n}} \right)$
Test statistic for the mean ($\sigma$ is unknown): Student's t-statistic, $t^* = \frac{\bar{x} - \mu}{s / \sqrt{n}}$
Hypothesis testing is the same as before. Apply the CLT if the sample size is more than 30.

Inferences about Binomial Probability of Success, p
Sample binomial probability: $p' = \frac{x}{n}$, an unbiased estimator for p.
The sampling distribution of $p'$ has: 1. mean $\mu_{p'} = p$; 2. standard error $\sigma_{p'} = \sqrt{\frac{pq}{n}}$; 3. an approximately normal distribution if n is big.
1. Check the 3 points of binomial normality: np, nq, and n.
2. Check for independence (sample < 10% of the population).
Confidence interval for a proportion: $p' - z(\alpha/2) \sqrt{\frac{p'q'}{n}}$ to $p' + z(\alpha/2) \sqrt{\frac{p'q'}{n}}$, where $p' = \frac{x}{n}$ and $q' = 1 - p'$.
The confidence interval using $p'$ is for p (the parameter).
Z-statistics are used even though $\sigma$ is unknown, because with $t^*$ the estimate s carries extra error.
Finding the sample size: $n = \frac{[z(\alpha/2)]^2 \cdot p^* \cdot q^*}{E^2}$, where $p^*$ and $q^*$ are provisional values (from a pilot study).
If p is not provided, use the conservative approach p = 0.5, q = 0.5.
Using $\sigma$ to estimate n: $n = \left( \frac{z(\alpha/2) \cdot \sigma}{E} \right)^2$. Always round n up.
Test statistic for a proportion: $z^* = \frac{p' - p}{\sqrt{pq/n}}$ with $p' = \frac{x}{n}$; p is the population proportion.

Two Population Inferences
Same source (matched pair) -> dependent.
Unrelated source -> independent.
Dependent: paired difference: $d = x_1 - x_2$
You MUST define your d. For example: d = before - after. It is customary to put big minus small.
Mean of the paired differences: $\mu_d$. $\bar{d} = \bar{x}_B - \bar{x}_A$ is the point estimate of $\mu_d$.
Mean and standard deviation of the sample differences: $\bar{d} = \frac{\sum d}{n}$, $s_d = \sqrt{\frac{\sum d^2 - \frac{(\sum d)^2}{n}}{n - 1}}$
Assumptions: the paired data are randomly selected and normally distributed.
Confidence interval of the mean difference for PAIRED DATA, $\mu_d$ (DEPENDENT): $\bar{d} - t(df, \alpha/2) \cdot \frac{s_d}{\sqrt{n}}$ to $\bar{d} + t(df, \alpha/2) \cdot \frac{s_d}{\sqrt{n}}$, where df = n - 1.
Test statistic for the mean difference for PAIRED DATA, $\mu_d$ (DEPENDENT): $t^* = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}}$, where df = n - 1; the most common value of $\mu_d$ is zero.

Mean Difference between Two Independent Samples
The parameter of interest is $\mu_1 - \mu_2$.
Assumptions: the samples are randomly selected from normally distributed populations, in an independent manner. Since the two populations are separate, they are independent.
The sampling distribution of $\bar{x}_1 - \bar{x}_2$ has mean $\mu_{\bar{x}_1 - \bar{x}_2} = \mu_1 - \mu_2$ and standard error $\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$, but more likely the estimate $\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$ is used.
Confidence interval for the mean difference (independent): $(\bar{x}_1 - \bar{x}_2) \pm t(df, \alpha/2) \cdot \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$, where df equals the smaller of the two df.
Because df is the smaller of the two df, the true LoC will be slightly higher than the reported LoC.
Test statistic for the mean difference (independent): $t^* = \frac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$
E.g. $H_0: \mu_1 - \mu_2 = 0$, $H_a: \mu_1 - \mu_2 > 0$

Proportion Difference between Two Independent Samples
Properties of the sampling distribution of $p'_1 - p'_2$:
Mean: $\mu_{p'_1 - p'_2} = p_1 - p_2$
Standard error: $\sigma_{p'_1 - p'_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
Approximately normal distribution if $n_1$ and $n_2$ are sufficiently large.
Assumptions: the $n_1$ and $n_2$ observations are independently and randomly selected. The population should not be changed after sampling.
Conditions: the samples consist of less than 10% of their respective populations; $n_1 p'_1$, $n_1 q'_1$, $n_2 p'_2$, $n_2 q'_2$ are all at least 5.
Confidence interval for the proportion difference: $(p'_1 - p'_2) \pm z(\alpha/2) \cdot \sqrt{\frac{p'_1 q'_1}{n_1} + \frac{p'_2 q'_2}{n_2}}$
Test statistic for the proportion difference (population proportion known): $z^* = \frac{p'_1 - p'_2}{\sqrt{pq \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}}$
When $p_1$ or $p_2$ is not specified, use the pooled probability $p'_p = \frac{x_1 + x_2}{n_1 + n_2}$: $z^* = \frac{p'_1 - p'_2}{\sqrt{p'_p q'_p \left[ \frac{1}{n_1} + \frac{1}{n_2} \right]}}$
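The pooled test statistic for a difference between two proportions can be sketched as follows. The function name and the counts are hypothetical, and $q'_p$ is taken to be $1 - p'_p$.

```python
from math import sqrt


def pooled_two_proportion_z(x1, n1, x2, n2):
    """z* for H0: p1 = p2 using the pooled estimate (x1 + x2) / (n1 + n2)."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


# Hypothetical counts: 60 successes out of 200 versus 45 out of 200.
z_star = pooled_two_proportion_z(60, 200, 45, 200)
print(round(z_star, 2))
```

With these counts the pooled proportion is 105/400 = 0.2625 and the observed difference is 0.075.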

Linear Correlation and Regression Analysis
If $r^2$ is 0.99, it can be interpreted as 99% of the variation being explained by the regression line. $r^2 = 1$ means all points lie on the line.
Hypothesis testing for the linear correlation coefficient:
$H_0: \rho = 0$ (x, y unrelated); $H_a: \rho \neq 0$, $\rho > 0$, or $\rho < 0$
Use $r^* = r$ (the linear correlation coefficient) as the test statistic.
Use the new table. Assumptions: the set of (x, y) ordered pairs forms a random sample, and the y values at each x have a normal distribution.
Use the r-distribution with n - 2 df.
A linear model is used to explain bivariate data in the population: $y = \beta_0 + \beta_1 x + \varepsilon$
$\varepsilon$ is the random experimental error in the observed value of y at a given value of x (the difference between the mean value of y and the experimental y).
$b_0$ is our estimate of $\beta_0$ (population y-intercept), and $b_1$ is our estimate of $\beta_1$ (population slope).
$b_0$ and $b_1$ are not fixed because they are sample statistics.
The experimental error is approximated by $e = y - \hat{y}$.
The mean of e is 0: $\sum e = 0$.
The variance is $\sigma_\varepsilon^2$.
The variance may not be the same at each $x_i$; this can be seen from the scatter plot.
Variance of the estimated error, e: $s_e^2 = \frac{\sum y^2 - b_0 \sum y - b_1 \sum xy}{n - 2}$
You need $\sum x$, $\sum y$, $\sum xy$, $\sum x^2$, $\sum y^2$, and n.
Estimate for the variance of the slope: $s_{b_1}^2 = \frac{s_e^2}{SS(x)}$ (but this should be given).
Confidence interval for the slope: $b_1 \pm t(n - 2, \alpha/2) \cdot s_{b_1}$ ($s_{b_1}$ given).
Test statistic for the slope ($\beta_1 = 0$): $t^* = \frac{b_1 - \beta_1}{s_{b_1}}$
$H_0: \beta_1 = 0$; $H_a: \beta_1 > 0$ (if it can be justified; if not, use two-tailed).
Assumptions about the linear correlation coefficient: the set of (x, y) ordered pairs forms a random sample, and the y values at each x have a normal distribution.
Use the t-distribution with n - 2 df.
95% C.I. for $\mu_{y|x}$ -> population mean of y at each x.
95% prediction interval (P.I.) for $y_x$ -> an individual y selected at random at each x.
P.I. > C.I. (the prediction interval is wider than the confidence interval).
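The slope-inference formulas above ($s_e^2$, $s_{b_1}$, and $t^*$ for $H_0: \beta_1 = 0$) can be sketched from the raw sums alone. The function name and the data are hypothetical; the critical value $t(n - 2, \alpha/2)$ would still come from a t-table.

```python
from math import sqrt


def slope_inference(xs, ys):
    """Return (b0, b1, s_e squared, s_b1, t*) for the test of beta_1 = 0."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)

    ss_x = sum_x2 - sum_x ** 2 / n
    b1 = (sum_xy - sum_x * sum_y / n) / ss_x
    b0 = (sum_y - b1 * sum_x) / n

    se2 = (sum_y2 - b0 * sum_y - b1 * sum_xy) / (n - 2)   # variance of e
    s_b1 = sqrt(se2 / ss_x)                               # std. error of slope
    t_star = (b1 - 0) / s_b1                              # H0: beta_1 = 0
    return b0, b1, se2, s_b1, t_star


xs = [1, 2, 3, 4, 5]                     # hypothetical bivariate data
ys = [2, 4, 5, 4, 5]
b0, b1, se2, s_b1, t_star = slope_inference(xs, ys)
print(round(se2, 2), round(s_b1, 4), round(t_star, 2))
```

For this data the squared residuals sum to 2.4, so $s_e^2$ = 2.4 / 3 = 0.8 with n - 2 = 3 df.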
