
INTRODUCTORY STATISTICS: class 2

Satoshi Miyata, Ph.D.

Announcements

Homework 1: posted and due on 10/17/2014.

Lecture notes 1 & 2: posted.

Office hour: C601 Visiting Faculty Office


Friday 14:00-15:00

10/17/2014

AU14 Statistics: class 2

Flow chart of data analysis
Objective
-> Sampling
-> Summary: Descriptive statistics (Numerical & Graphical summary)
-> Selection of model
-> Model building: Inferential statistics (Regression, ANOVA, etc.)
-> Model diagnostics (diagnosis of model assumptions)
-> Modification of model
-> Decision & Report

Basic definitions
Population: Collection of objects of interest.
Parameter: Characteristic unknown constant of the population.
Sample: Part of the population.
Sampling frame: Listing of the individuals to be sampled.
Variable: Random quantity in the population.
Statistic: Quantity calculated from the sample.
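The definitions above can be illustrated with a small simulation. This is only a sketch in R: the population of 10,000 values, its normal model, and the names `population`, `smp`, `parameter` and `statistic` are illustrative assumptions, not part of the lecture.

```r
set.seed(1)                                   # make the illustration reproducible

# Hypothetical population: 10,000 values of one variable (e.g. height in cm)
population <- rnorm(10000, mean = 170, sd = 10)
parameter  <- mean(population)                # parameter: population mean (unknown in practice)

# Sample: a part of the population (here 100 individuals chosen at random)
smp       <- sample(population, size = 100)
statistic <- mean(smp)                        # statistic: quantity calculated from the sample

c(parameter = parameter, statistic = statistic)
```

The statistic is close to, but not equal to, the parameter. Inference is about quantifying that gap.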

Basic Paradigm
Population (object of interest) -> Sample: Sampling (sampling frame, sampling method)
Sample -> Statistic: Calculation
Statistic -> Parameter: Inference
Based on: Variable


Procedure of Data Analysis
Objective
Population, Sample, Sampling frame,
Variable, Parameter, Statistic

Sampling
Census, Simple Random, Convenience, Voluntary Response
Randomize to avoid bias.

Summary
Numerical Summary (Mean, Median, Variance, s.d., etc.)
Graphical Summary (Histogram, Boxplot, etc.)

Procedure of Data Analysis (cont.)
Model Building
Estimation, Hypothesis Testing
ANOVA, Regression, etc.

Diagnostics
Check the model assumptions

Report

Types of Sampling Methods
Census: All individuals in the population are sampled.
Simple Random Sampling (SRS): All individuals in the population have an equal chance to be sampled.
Convenience Sampling: Individuals are sampled from a specific part of the population.
Voluntary Response Sampling: Samples are obtained only from voluntary responses.
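As a rough sketch, the first three methods could be drawn in R as follows. The population size `N`, the sample size 50, and the choice of "the first 50 listed" as the convenient part are assumptions made for illustration.

```r
set.seed(2)
N   <- 1000                                # hypothetical population size
ids <- 1:N                                 # sampling frame: listing of all individuals

census      <- ids                         # census: every individual is sampled
srs         <- sample(ids, size = 50)      # SRS: 50 individuals drawn without replacement
convenience <- ids[1:50]                   # convenience: only one specific part of the listing

length(census)
sort(srs)[1:5]
range(convenience)
```

Voluntary response sampling is not simulated here, since it depends on who chooses to respond.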

Bias due to sampling methods
All individuals in the population should have an equal chance to be sampled.

Census: unbiased but difficult
SRS: unbiased (desirable)
Convenience: may be biased
Voluntary: may be biased
[Figure: Population and Sampling frame]

In the previous example, the population and the sampling frame are different. The result may be biased.
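The effect of a sampling frame that covers only part of the population can be seen in a small simulation. The sketch below assumes a hypothetical, sorted population of incomes and a convenience sample restricted to the first 1000 listed individuals. None of these numbers come from the lecture.

```r
set.seed(3)
# Hypothetical population: 5000 incomes, sorted so that the listing order is informative
population <- sort(rlnorm(5000, meanlog = 10, sdlog = 0.5))
true_mean  <- mean(population)

# Repeat each sampling method 1000 times and record the sample mean
srs_means  <- replicate(1000, mean(sample(population, 100)))
# Convenience: always sample from the first 1000 listed (the lowest incomes)
conv_means <- replicate(1000, mean(sample(population[1:1000], 100)))

c(true = true_mean, srs = mean(srs_means), convenience = mean(conv_means))
```

On average the SRS means land near the population mean, while the convenience means are systematically too low.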

Branches of Statistics
Descriptive statistics: to summarize and describe important features of the data.
Inferential statistics: to generalize the sample information and draw a conclusion about the population.
Descriptive statistics summarize the shape of the distribution of the data:
Location
Variation
Skewness, outliers, shape of distribution, etc.

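Numerical summary: Location
Let $x_1, x_2, \ldots, x_n$ be observations.
$n$: number of observations.
Mean:
$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$$
Median: Sort $x_1, x_2, \ldots, x_n$, and let $x_1 \le x_2 \le \cdots \le x_n$.
$$\tilde{x} = \begin{cases} x_{(n+1)/2} & \text{if } n \text{ is odd,} \\ \left( x_{n/2} + x_{n/2+1} \right) / 2 & \text{if } n \text{ is even.} \end{cases}$$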
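The two formulas can be checked directly against R's built-in functions. In the sketch below, `median_by_hand` is a hypothetical helper written for this illustration and the data are made up.

```r
# Median computed directly from the definition (hypothetical helper)
median_by_hand <- function(x) {
  xs <- sort(x)                        # x_1 <= x_2 <= ... <= x_n
  n  <- length(xs)
  if (n %% 2 == 1) {
    xs[(n + 1) / 2]                    # n odd: the middle observation
  } else {
    (xs[n / 2] + xs[n / 2 + 1]) / 2    # n even: average of the two middle observations
  }
}

x <- c(3, 1, 4, 1, 5, 9, 2, 6)         # illustrative data, n = 8
sum(x) / length(x)                     # mean from the formula: 3.875, same as mean(x)
median_by_hand(x)                      # 3.5, same as median(x)
```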


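Properties of mean and median
Mean $\bar{x}$ is more sensitive to outliers than median $\tilde{x}$.

Distribution of x and the typical relation between mean and median:
Symmetric: $\bar{x} \approx \tilde{x}$
Skew to left: $\bar{x} < \tilde{x}$
Skew to right: $\bar{x} > \tilde{x}$

Let $c$ be a constant.
$$y_i = x_i + c, \quad z_i = c x_i$$
$$\bar{y} = \bar{x} + c, \quad \bar{z} = c \bar{x}, \quad \tilde{y} = \tilde{x} + c, \quad \tilde{z} = c \tilde{x}$$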
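These properties are easy to verify numerically. The following sketch uses made-up data, and the constant is stored as `c0` to avoid confusion with R's function `c()`.

```r
x  <- c(2, 4, 4, 5, 7, 9, 11)    # illustrative data: mean 6, median 5
c0 <- 3                          # the constant c

y <- x + c0                      # shift every observation by c
z <- c0 * x                      # scale every observation by c
c(mean(y),   mean(x) + c0)       # equal
c(median(y), median(x) + c0)     # equal
c(mean(z),   c0 * mean(x))       # equal
c(median(z), c0 * median(x))     # equal

x_out <- c(x, 1000)              # add one extreme observation
c(mean(x),   mean(x_out))        # the mean moves from 6 to 130.25
c(median(x), median(x_out))      # the median moves from 5 to 6
```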


Numerical summary: Location (cont.)
Percentile: the k% percentile is the point such that k% of the samples are below and (100 - k)% of the samples are above.
Quartile: 1st quartile = 25% percentile, 3rd quartile = 75% percentile.
Trimmed mean: the k% trimmed mean is calculated by discarding the lower and higher k% of the data.
Five numbers summary: minimum, 1st quartile, median, 3rd quartile, maximum.
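As a sketch of what the trimmed mean does, the following R code discards observations by hand and compares the result with `mean(x, trim = ...)`. The data and the variable names are illustrative assumptions.

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 100)      # illustrative data with one outlier
n <- length(x)
k <- 0.1                                     # trim 10% from each end
n_drop  <- floor(n * k)                      # number of observations discarded per end
trimmed <- sort(x)[(n_drop + 1):(n - n_drop)]

mean(x)                                      # 14.5, pulled up by the outlier
mean(trimmed)                                # 5.5, same as mean(x, trim = 0.1)
fivenum(x)                                   # minimum, lower hinge, median, upper hinge, maximum
```

Note that `fivenum()` reports hinges, which can differ slightly from the quartiles returned by `quantile()`, because several conventions exist for computing sample quartiles.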

Sample R code:

> x <- rnorm(100)  ## 100 random numbers ##
>
> mean(x)  ## mean ##
[1] -0.3011125
>
> median(x)  ## median ##
[1] -0.2836064
>
> quantile(x)  ## quantiles ##
        0%        25%        50%        75%       100%
-2.6852655 -0.8990262 -0.2836064  0.4137969  1.9382667
>
> quantile(x, 0.1)  ## 10% percentile ##
      10%
-1.480168
>
> mean(x, trim=0.1)  ## 10% trimmed mean ##
[1] -0.2875766
> mean(x, trim=0.5)  ## 50% trimmed mean (equals the median) ##
[1] -0.2836064
>
> summary(x)  ## Five numbers summary, plus the mean ##
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
-2.6850 -0.8990 -0.2836 -0.3011  0.4138  1.9380

Numerical summary: Variation
Variance: average of squared distance between individual observations and the sample mean.
$$s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{S_{xx}}{n - 1}$$
Standard deviation:
$$s = \sqrt{s^2}$$
Fourth Spread (Inter Quartile Range):
$f_s$ = (3rd quartile - 1st quartile)
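The formulas can be evaluated step by step and compared with R's built-in functions. This is a minimal sketch on made-up data. The names `Sxx`, `s2`, `s` and `fs` simply mirror the symbols on the slide.

```r
x <- c(2, 4, 4, 5, 7, 9, 11)                 # illustrative data, mean 6
n <- length(x)

Sxx <- sum((x - mean(x))^2)                  # sum of squared deviations from the sample mean: 60
s2  <- Sxx / (n - 1)                         # sample variance: 10, same as var(x)
s   <- sqrt(s2)                              # standard deviation, same as sd(x)
fs  <- unname(quantile(x, 0.75) - quantile(x, 0.25))  # 3rd quartile - 1st quartile: 4, same as IQR(x)

c(s2 = s2, s = s, fs = fs)
```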

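Properties of variance and s.d.
$s^2 \ge 0$, $s \ge 0$.
Let $c$ be a constant.
$$y_i = x_i + c, \quad z_i = c x_i,$$
$$\mathrm{var}(y) = \mathrm{var}(x),$$
$$\mathrm{var}(z) = c^2 \, \mathrm{var}(x), \quad \mathrm{s.d.}(z) = |c| \, \mathrm{s.d.}(x).$$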
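A quick numerical check of these identities, as a sketch on made-up data. The constant is stored as `c0` and is negative on purpose, to show why the absolute value is needed for the standard deviation.

```r
x  <- c(2, 4, 4, 5, 7, 9, 11)    # illustrative data
c0 <- -3                         # the constant c (negative on purpose)

y <- x + c0                      # shifting does not change the spread
z <- c0 * x                      # scaling multiplies the spread

all.equal(var(y), var(x))              # TRUE
all.equal(var(z), c0^2 * var(x))       # TRUE
all.equal(sd(z),  abs(c0) * sd(x))     # TRUE
```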


Sample R code:
> var(x)  ## variance ##
[1] 0.9881656
>
> sd(x)  ## standard deviation ##
[1] 0.9940652
> sqrt(var(x))  ## same value as sd(x) ##
[1] 0.9940652
>
> IQR(x)  ## inter quartile range ##
[1] 1.312823


Graphical Summary: Histogram
Classes/Bins: Subintervals of the sample range.
Frequency: # of observations in each bin.
Relative Frequency: = Frequency / Total number of observations.
Histogram: Bar chart of the frequency or the relative frequency.
[Figure: example histogram; vertical axis: Frequency]
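The quantities behind a histogram can be computed without drawing anything. The sketch below assumes made-up data and bins of width 2. The variable names are illustrative.

```r
x      <- c(1.2, 2.5, 2.7, 3.1, 3.3, 3.8, 4.4, 4.9, 5.6, 7.0)  # illustrative data
breaks <- seq(0, 8, by = 2)                  # bins (0,2], (2,4], (4,6], (6,8]

freq     <- table(cut(x, breaks = breaks))   # frequency: # of observations in each bin
rel_freq <- freq / length(x)                 # relative frequency
freq
rel_freq

h <- hist(x, breaks = breaks, plot = FALSE)  # hist() computes the same counts
h$counts
```

Calling `hist(x, breaks = breaks)` without `plot = FALSE` draws the bar chart of these frequencies.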


Types of Histogram Shapes
[Figure: four histograms (vertical axis: Frequency) with panels: unimodal and symmetric; bimodal; left skewed; right skewed]

Histogram by R
> hist(x)
[Figure: "Histogram of x"; vertical axis: Frequency]

Graphical Summary: Boxplot
The lower and the upper edges of the "box" are at the 1st and 3rd quartiles. The center line is at the median. The lower and the upper "whiskers" are basically at the min. and max.
[Figure: annotated boxplot; from top to bottom: maximum, 3rd quartile, median, 1st quartile, minimum; the height of the box is the inter quartile range]
> boxplot(x)
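The word "basically" matters in R: by default `boxplot()` extends each whisker only to the most extreme observation within 1.5 times the box height, and draws anything further out as individual points. A small sketch with made-up data, using `boxplot.stats()` to show the numbers that would be drawn:

```r
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 30)   # illustrative data; 30 is far from the rest
b <- boxplot.stats(x)                    # the numbers that boxplot(x) would draw

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker: 1 3 5.5 8 9
b$out     # points drawn individually beyond the whiskers: 30
```

With `boxplot(x, range = 0)` the whiskers extend all the way to the minimum and maximum, as on the slide.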
[Figure: histograms (vertical axis: Frequency) paired with boxplots for four distribution shapes: symmetric; skew to left; skew to right; heavy tails]

Tips on R
To copy graphics onto PowerPoint or WORD, etc.:
1. Right-click on the graphics window.
2. Select "Copy as metafile" from the pull-down menu.
3. Paste it on PPT or a word processor.
To save R code or a program:
1. You may write a program first in a text editor. Then copy and paste it into R.
2. In R, select the File menu and go to "New script". Then write, or copy and paste, a program in the R Editor.

