PROBABILITY
DISTRIBUTIONS
By Dr Gary Deng
To put it simply, a univariate random variable has a probability distribution: a set of possible values together with their probabilities, and those probabilities sum to 1 (Σ P(X = x_i) = 1).
Example:
Mean: E[X] = Σ x_i · P(X = x_i)
Variance: Var(X) = Σ (x_i − E[X])^2 · P(X = x_i)
R code:
mean = sum(X*P)
variance = sum( (X-mean)^2*P )
# equivalently, the mean as a matrix product:
mean = t(X)%*%P
For a continuous variable, the fixed set of probabilities of a discrete variable is replaced with a probability density function.
For the remainder of this lecture we will focus mainly on continuous random
variables
Uniform Distribution
The uniform distribution is the easiest of all.
It is uniform because every value in the interval [a, b] has the same density:
f(x) = 1/(b − a) for a ≤ x ≤ b, and 0 otherwise.
Uniform Distribution
R code:
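The code for this slide is not shown above; as a minimal sketch, base R already ships dunif, punif and runif for the uniform distribution:

```r
# Uniform distribution on [a, b]
a = 0
b = 10
dunif(5, min = a, max = b)           # density: 1/(b - a) = 0.1
punif(5, min = a, max = b)           # CDF: P(X <= 5) = 0.5
draws = runif(1000, min = a, max = b)
mean(draws)                          # close to (a + b)/2 = 5
```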
Normal Distribution
R code:
A normal distribution is denoted as X ~ N(μ, σ²). Let Z = (X − μ)/σ, then Z ~ N(0, 1).
install.packages("pracma")
library("pracma")
# mean of the normal distribution
mu = 5
# standard deviation of the normal distribution
sigma = 1
# storage of corresponding X values
NormalCDF = matrix(0,10000,1)
# cumulative probability ranges from 0.0001 to 1
PROB = seq(0.0001,1,0.0001)
# a simple loop
for (i in 1:10000)
{
# define the normal cumulative density function
# erf is known as the error function
NCDF = function(X) ( 0.5*(1 + erf((X - mu)/(sigma*sqrt(2)))) - PROB[i] )^2
# Notice the function is defined as squared deviation from cumulative probability
# what happens if it's not squared?
# the minimizer will seek to find the lowest value rather than a zero value
# test it yourself by removing ^2
# optimize is an optimizing function used for situations where there is one unknown
# the function is NCDF
# 0 is the lower bound of the search
# 10 is the upper bound of the search
out = optimize(NCDF, lower = 0, upper = 10 )
# stores the solved value of X in the above function
NormalCDF[i] = out$minimum
}
plot(NormalCDF,PROB)
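As a cross-check on the loop above (a sketch, not part of the original slides): base R's qnorm inverts the normal CDF directly, with no need for pracma or optimize:

```r
mu = 5
sigma = 1
PROB = seq(0.0001, 1, 0.0001)
# qnorm is the built-in inverse CDF (quantile function)
NormalCDF_direct = qnorm(PROB, mean = mu, sd = sigma)
# e.g. the median equals the mean:
qnorm(0.5, mean = mu, sd = sigma)     # = 5
qnorm(0.975, mean = mu, sd = sigma)   # about 6.96
```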
Chi-square Distribution
Let Z_1, Z_2, …, Z_k ~ N(0, 1) be independent standard normal variables. Then:
X = Z_1^2 + Z_2^2 + ⋯ + Z_k^2 ~ χ²(k)
Mean: E[X] = k; Variance: Var(X) = 2k.
R code:
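The code for this slide is not shown; a minimal sketch of the chi-square in base R, checking the mean and variance by simulation:

```r
k = 3                        # degrees of freedom
set.seed(1)
x = rchisq(100000, df = k)   # random chi-square draws
mean(x)                      # should be close to k = 3
var(x)                       # should be close to 2k = 6
dchisq(2, df = k)            # density at 2
pchisq(2, df = k)            # P(X <= 2)
```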
F Distribution
R code:
For example, your ANOVA outputs are nothing more than reported results of an F-test.
Let U ~ χ²(d1) and V ~ χ²(d2), and U and V are independent of each other;
Then:
F = (U/d1) / (V/d2) ~ F(d1, d2)
That is, an F random variable is the ratio of two independent, scaled chi-square variables.
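A quick R sketch (an addition, not from the slides) confirming the construction: for equal degrees of freedom the F distribution has median exactly 1, since U/V and V/U are equally likely to exceed 1:

```r
d1 = 5
d2 = 5
pf(1, df1 = d1, df2 = d2)    # = 0.5: the median of F(d, d) is 1
set.seed(2)
U = rchisq(100000, df = d1)
V = rchisq(100000, df = d2)
Fdraws = (U/d1) / (V/d2)     # ratio of scaled chi-squares
mean(Fdraws <= 1)            # close to 0.5
```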
t Distribution
Let Z ~ N(0, 1),
And let V ~ χ²(v), with Z and V independent of each other.
Then:
t = Z / sqrt(V/v) ~ t(v)
Mean: 0
Variance: v/(v − 2) for v > 2, where v = degrees of freedom.
As v grows large, the variance approaches 1, and t
collapses to a standard normal (mean 0, variance 1).
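This collapse can be seen numerically (a sketch, not from the slides): for a large number of degrees of freedom the t density is essentially indistinguishable from the standard normal density:

```r
grid = seq(-3, 3, 0.1)
# t with 1000 degrees of freedom vs the standard normal
max(abs(dt(grid, df = 1000) - dnorm(grid)))   # tiny
# variance of t(v) is v/(v - 2); for v = 5 that is 5/3
5 / (5 - 2)
```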
Logistic Distribution
It looks very similar to a normal distribution, but it has fatter tails.
It is used when your dependent variable is binary (0/1).
It is the foundation of the Logit model.
Logistic Distribution
R codes
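The code for this slide is not shown; a minimal sketch contrasting the logistic and normal densities, where the fatter tails are visible directly:

```r
# at the same point in the tail, the logistic puts far more mass
dlogis(4)   # about 0.0177
dnorm(4)    # about 0.00013
# overlay the two densities
curve(dnorm(x), -6, 6, col = "blue")
curve(dlogis(x), -6, 6, col = "red", add = TRUE)
```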
Exponential Distribution
PDF: f(x) = λe^(−λx) for x ≥ 0, with mean 1/λ.
Example: suppose the mean waiting time is 30 minutes, so λ = 1/30. Then
F(60) = 1 − e^(−60/30), and
P(X > 60) = 1 − F(60) = e^(−60/30) = e^(−2) = 0.1353353
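The calculation above can be reproduced with base R's pexp (note that pexp is parameterised by the rate λ = 1/mean):

```r
# mean waiting time 30, so rate = 1/30
pexp(60, rate = 1/30, lower.tail = FALSE)   # P(X > 60) = exp(-2) = 0.1353353
```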
Exponential Distribution
Poisson Distribution
R codes:
PMF: P(X = k) = λ^k e^(−λ) / k!, for k = 0, 1, 2, …
Mean: E[X] = λ; Variance: Var(X) = λ.
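The pmf can be checked against base R's dpois (a quick sketch, not from the slides):

```r
lambda = 3
k = 2
# direct formula vs the built-in function
lambda^k * exp(-lambda) / factorial(k)
dpois(k, lambda = lambda)              # identical
# mean and variance are both lambda
set.seed(3)
draws = rpois(100000, lambda)
c(mean(draws), var(draws))             # both close to 3
```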
Poisson Distribution
R codes:
day    time     size
Sun    Dinner   2
Sun    Dinner   3
Sun    Dinner   3
Sun    Dinner   2
Sun    Dinner   4
In this F distribution:
DF1 = 1, DF2 = 238
R Shiny
It's a great web application framework for R.
Go to: http://shiny.rstudio.com/tutorial/ for a comprehensive online tutorial.
LINEAR MODELS
By Dr Gary Deng
install.packages("shiny")
library(shiny)
Every Shiny App has two components:
1. a user-interface script (ui.R)
2. a server script (server.R)
The user-interface (ui) script controls the layout and appearance of your app. It is
defined in a source script named ui.R. The server.R script contains the
instructions that your computer needs to build your app.
You need to put BOTH the ui.R and server.R script in the same folder.
ui script:
Shiny example:
Here is a simple App. As you change the number of bins using the slider, the histogram of X changes accordingly.
server script
#Define server logic required to draw a histogram
shinyServer(function(input, output) {
  # (the body is truncated in these notes; it would typically contain a
  #  renderPlot call that draws the histogram using input$bins)
})
Let A be any n × k matrix.
If n = k, A is a square matrix.
If n > 1 & k = 1, A is a column vector.
If n = 1 & k > 1, A is a row vector.
If n = 1 & k = 1, A is a scalar.
The transpose, A′ or t(A), reverses the dimensions to k × n.
Matrix Multiplication
Let there be two matrices, A (n × k) and B (k × m).
You can multiply A and B only if they are conformable: the number of columns of A must equal the number of rows of B.
And the ORDER in which you multiply them matters. That is, AB ≠ BA in general.
For A (n × k) and B (k × m), the product AB will have dimension n × m.
In general, AB ≠ BA even if both products exist. In the case of A = B, of course, AB = BA = A².
Matrix Multiplication
Matrix Inverse
X = matrix(c(3,4,7,8,9,10,3,3,1),3,3)
Y = matrix(c(6,8,2,1,2,1,3,2,1),3,3)
O = matrix(c(2,2,3),3,1)
# X and O are conformable (3x1)
X%*%O
# O and X are NOT conformable: this line throws an error
try(O%*%X)
# t(O) and X are conformable (1x3)
t(O)%*%X
# X and Y are conformable
X%*%Y
# first element of X%*%Y
# is sumproduct of row1 of X and column1 of Y
sum(X[1,]*Y[,1])
# last element of X%*%Y
# is sumproduct of row3 of X and column3 of Y
sum(X[3,]*Y[,3])
# YX is NOT equal to XY
Y%*%X==X%*%Y
Matrix Inverse
For a square matrix X, the inverse X⁻¹ is the matrix that satisfies X X⁻¹ = X⁻¹ X = I, where I is the identity matrix (1s on the diagonal, 0s elsewhere).
Theory of LM
In matrix notation:
X = matrix(c(3,4,7,8,9,10,3,3,1),3,3)
# in R, matrix inverse is done through solve().
XI = solve(X)
X%*%XI
# if you foolishly did this
XI = X^-1
XI
# you have inverted every single element in X.
# XI will NOT be the true matrix inverse of X.
X%*%XI
y = Xβ + ε
Where:
y is n × 1;
β is k × 1;
ε is n × 1;
X is n × k.
n = number of observations; k = number of regressors/predictors/explanatory
variables/exogenous variables; X typically includes the intercept/constant
unless specified otherwise.
Clearly, the model is linear in the parameters β.
LS estimation
Cars example
# cars is a built-in R dataset with variables speed and dist
attach(cars)
regmodel=lm(dist~speed)
summary(regmodel)
Cars example
Your LS estimates:
R codes:
# generate a sequence of xs
x=seq(min(speed),max(speed),1)
# predicted y values from the fitted model
ypredicted=regmodel$coefficients[1]+regmodel$coefficients[2]*x
# add the fitted line to the existing scatter plot of speed vs dist
lines(x,ypredicted,col="red")
LS Principle:
Select the estimates b1 and b2 to minimize the sum of squared residuals:
S(b1, b2) = Σ e_i^2 = Σ (y_i − b1 − b2·x_i)^2
Where e_i is the residual for observation i. Graphically, this means choosing the line that minimizes the sum of squared vertical distances to the data points.
The first-order conditions are:
∂S/∂b1 = −2 Σ (y_i − b1 − b2·x_i) = 0
∂S/∂b2 = −2 Σ x_i (y_i − b1 − b2·x_i) = 0
In Matrix Algebra:
S(b) = (y − Xb)′(y − Xb)
∂S/∂b = −2X′(y − Xb) = 0
b = (X′X)⁻¹X′y
Note: Eqn(16.3) on page 211 is WRONG.
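The matrix solution can be verified on simulated data (a sketch using nothing beyond base R); the closed-form b agrees with lm to machine precision:

```r
set.seed(42)
n = 200
x = rnorm(n)
y = 1 + 2*x + rnorm(n)
X = cbind(1, x)                        # design matrix with intercept
b = solve(t(X) %*% X) %*% t(X) %*% y   # b = (X'X)^{-1} X'y
fit = lm(y ~ x)
max(abs(b - coef(fit)))                # essentially zero
```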
Properties of LS estimators
1) The residuals sum to zero: Σ e_i = 0.
2) The residuals and the regressors are independent of each other.
3) The residuals are homoscedastic.
4) The residuals are uncorrelated.
Let: X1 = 1; X2 ~ N(3, 1); X3 ~ χ²(3)
Let: n = 1,000
Let: ε ~ N(0, 0.5)
Let: β1 = 1; β2 = 0.1; β3 = 3
Simulate Y (1,000 × 1).
Simulated Example
What is the distribution of the error term?
What is the distribution of Y?
hist(Y, xlim=c(-5,60),col="blue")
hist(E,add=T,col="red")
R codes:
X1<-rep(1,1000)
X2<-rnorm(1000,3,1)
X3<-rchisq(1000,3)
E<-rnorm(1000,0,0.5)
B1=1
B2=0.1
B3=3
Y<-B1*X1+B2*X2+B3*X3+E
Note that a normal error term does not by itself give a normally distributed Y: here Y inherits skewness from the chi-square regressor X3.
Simulated Example
regmodel=lm(Y~X1+X2+X3-1)
summary(regmodel)
regmodel=lm(AvgKWH~WMAXit+WMINit+Time)
summary(regmodel)
plot(resid(regmodel))
hist(resid(regmodel))
GENERALIZED LINEAR
MODELS
By Dr Gary Deng
There are a lot of options available but you don't have to worry about them here.
Theory
R function
There are many occasions where a LS estimation of a linear regression model may not
be appropriate.
Specifically, when the residuals of the model do not follow a normal distribution (which is
very often the case in practice), LS estimation could result in biased and inconsistent
estimates.
As the name suggests, GLM is a flexible generalization of a linear regression model.
The above is taken straight from R.
The Right Hand Side remains a LINEAR function of the predictors. The Left Hand Side, the expected value of Y, is transformed via a so-called link function. More precisely:
g(E[Y]) = Xβ, where g(.) is the link function.
For all intents and purposes, in most cases you only need to know how to specify the
formula: which is the linear combination of the regressors on the right hand side.
data: which is the dataset being used.
family: which describes the error distribution and the link function.
You can just about ignore the rest of the arguments in this function
GLMs are mostly estimated using ML (which we will look at next week).
R Function
Binomial Data
One of the most common problems encountered in practice involves a binary (0/1) dependent variable; glm gives you several families to choose from.
In this lecture we will only look at:
Binomial
Poisson
Logit Model
For all intents and purposes, the logit and probit models give you very similar results.
Logit Model
Define Y* as an unobserved latent variable that underlies the observed binary outcome Y.
One way to understand the Logit model is to employ a very important concept called a latent
variable, which forms the foundation of most probability models involving categorical dependent
variables.
For example, let us suppose that we are looking at the probability of a suspect making a false
confession, i.e., the dependent variable = 1 when the confession is false and = 0 otherwise.
Intuitively, one could think of one's willingness to lie as a latent variable that determines the
outcome of the confession. It is unobserved, but when this willingness to lie passes a certain
threshold, one's confession becomes false. Formally: Y = 1 when Y* > 0.
Let us rewrite the linear regression model in terms of the latent variable. We have:
Y* = β1 + β2·X + ε
Therefore:
P(Y = 1) = P(Y* > 0)
         = P(β1 + β2·X + ε > 0)
         = P(ε > −(β1 + β2·X))
         = P(ε < β1 + β2·X)   (by symmetry of the error distribution)
Finally, if we assume that ε follows a so-called extreme value distribution, the quantity P(ε < β1 + β2·X) can be written as:
P(Y = 1) = exp(β1 + β2·X) / (1 + exp(β1 + β2·X))
which is bounded between 0 and 1. The nice thing about this latent variable approach is that it makes an intuitive link between an observed binary
outcome (such as false confession) and an unobserved yet intuitive latent variable (such as willingness to lie).
Logit Model
Another way to understand the logit model is through a functional transformation of the left-hand-side
dependent variable. More specifically, the transformation uses what is known as the logit function:
logit(p) = ln( p / (1 − p) )
Firstly, we note that the input to the logit function is p, i.e., the probability of Y = 1. Secondly, the natural log, i.e.,
ln(.), is used. Thirdly, p/(1 − p) is known as the odds, and it is simply the probability of observing Yes relative to the
probability of observing No.
The binary logistic regression is then defined as:
ln( p / (1 − p) ) = β1 + β2·X
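Base R exposes this transformation directly: qlogis is the logit function and plogis its inverse (a quick sketch, not in the original slides):

```r
p = 0.8
log(p / (1 - p))     # the logit by hand: ln(odds)
qlogis(p)            # the same, built in
plogis(qlogis(p))    # the inverse logit recovers p = 0.8
```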
Example: US homicides
There were 3,085 counties in the US in 1990. The centre for National Consortium on Violence Research (NCOVR) had compiled a
dataset containing homicide rates (per 100,000 capita) for each of the 3,085 counties in the US in 1990, along with a number of
socio-economic variables thought to be important in predicting homicide rates.
The Dependent Variable:
HomicideHotSpot = 1 if the homicide count exceeded 20 per 100,000 capita. These are the homicide hotspots.
The Explanatory (socio-economic) Variables:
Southern:
We can easily see why this transformation is useful by noticing that the left hand side of the
regression model is no longer bounded between 0 and 1. For instance, if the outcome overwhelmingly favours
Yes over No, then p → 1 and ln(p/(1−p)) → +∞. Conversely, when the odds overwhelmingly favour No over Yes,
then p → 0 and ln(p/(1−p)) → −∞.
Finally, to show that the two interpretations are two sides of the same
coin, one could easily derive the following: solving ln(p/(1−p)) = β1 + β2·X for p gives p = exp(β1 + β2·X)/(1 + exp(β1 + β2·X)), which is exactly the probability implied by the latent variable approach.
MedianAge (MA90):
PopulationStructure (PS90): a variable constructed using principal component analysis, which essentially captures the
percentage of minority races in the county population. The larger the value of this variable, the larger the percentage of
minority races.
ResourceDeprivation (RD90): a variable constructed using principal component analysis, which essentially captures
the level of deprivation of social and economic infrastructure in the county. The larger the value of this variable, the more deprived
the county is of adequate social and economic infrastructure (such as schools and hospitals).
P(HomicideHotSpot = 1) = exp(β1 + β2·SOUTH + β3·UE90 + β4·DV90 + β5·MA90 + β6·PS90 + β7·RD90) / (1 + exp(β1 + β2·SOUTH + β3·UE90 + β4·DV90 + β5·MA90 + β6·PS90 + β7·RD90))
Example: US homicides
data <- read.csv("Homicides.csv")
attach(data)
regmodel_logit <- glm(HomicideHotSpot~SOUTH+UE90+DV90+MA90+PS90+RD90, family=binomial("logit"))
summary(regmodel_logit)
Example: US homicides
Poisson Model
Y ~ Poisson(λ)
Where the log link connects the mean count to the regressors: ln(λ) = Xβ.
Two common complications with count data are overdispersion (variance > mean) and zero inflation (having too many 0 counts in
the data). They are beyond the scope of this unit.
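A small simulated sketch (hypothetical data, not the shipwreck example) showing that glm with a Poisson family and log link recovers known coefficients:

```r
set.seed(7)
x = runif(500)
lambda = exp(0.5 + 1.2 * x)   # true model: ln(lambda) = 0.5 + 1.2x
y = rpois(500, lambda)
fit = glm(y ~ x, family = poisson(link = "log"))
coef(fit)                     # close to (0.5, 1.2)
```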
Example: Shipwreck
These are the data from McCullagh and Nelder (1989). The file has
Example: Shipwreck
xtabs(~damage+type)
xtabs(~damage+construction)
xtabs(~damage+operation)
plot(damage~months)
Example: Shipwreck
months2 <- months^2
regmodel_Pois <- glm(damage~type+construction+operation+months+months2-1,family=poisson(link="log"))
summary(regmodel_Pois)
Optimization
MAXIMUM LIKELIHOOD
ESTIMATION
By Dr Gary Deng
(1) The optimization can be a minimization (such as in LS) or a maximization (such as in ML).
(2) The function that gets optimized has to be a function of controllables. For
example, in minimizing the Sum of Squares, you are choosing values of your least
squares estimates.
(3) Optimization can also be a constrained problem. I will not go into details about this as it is beyond the scope of this
unit, but there are many real life problems that are constrained optimization
problems. As an example, profit maximization could be subject to a labour
supply constraint (a maximum number of hours that your employees could
work, for instance).
LS as an optimization problem
In the linear regression model, the sum of squares is a function of the unknown coefficients, and its minimizer is a linear function of the data, because:
b = (X′X)⁻¹X′y
Where b is available in closed form, so the problem can be solved in an analytical fashion. Not every problem is so convenient.
Consider the following example in Navigation. You are at Site 12, and the exact spatial locations of
all other nineteen sites are known. Your equipment allows you to shoot laser beams and measure
the distance between yourself and the other sites. However the measurement is prone to errors.
Question: can you find out your exact spatial coordinates?
Navigation Example
Let (P1, P2) denote our unknown coordinates, and (Xj, Yj) the known coordinates of site j.
We observe/measure Euclidean distances between site
12 and all other sites with some error.
Thus the distance between site 12 and any site j is:
Dj = sqrt( (Xj − P1)^2 + (Yj − P2)^2 ) + εj
which is nonlinear in the two unknowns (P1, P2), so there is no closed-form solution. Instead, we minimize the sum of squared errors numerically.
We call on optim to do the minimization. For all intents and purposes, knowing how to use optim is
all you need here.
Navigation Example
data <- read.csv("NavigationExample.csv")
attach(data)
# First we must specify the objective function
fn = function(P) sum( ( D - ( (X-P[1])^2 + (Y-P[2])^2 )^0.5 )^2 )
# next we specify initial values/guesses to the problem
initial = c(0,0)
# call on optim to minimize sum of squares
# the first argument is the initial value
# the second argument is the function to be optimized
# the third argument selects the numerical search method
# we will talk about Hessian later.
out = optim(initial, fn, method = "BFGS", hessian = TRUE)
# problem SOLVED
out$par
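A self-contained version of the same idea (simulated sites standing in for NavigationExample.csv) confirms that optim recovers the true coordinates:

```r
set.seed(123)
sites = cbind(runif(19, 0, 10), runif(19, 0, 10))   # 19 known sites
truth = c(3, 7)                                     # our (unknown) location
D = sqrt((sites[,1] - truth[1])^2 + (sites[,2] - truth[2])^2) +
    rnorm(19, 0, 0.05)                              # noisy distance readings
fn = function(P) sum((D - sqrt((sites[,1] - P[1])^2 + (sites[,2] - P[2])^2))^2)
out = optim(c(0, 0), fn, method = "BFGS")
out$par                                             # close to (3, 7)
```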
[Figure: the optimizer's search path, starting from the initial guess (0,0) and finishing at site 12.]
Let Y = (y1, y2, …, yn)′ be an n-vector of observed sample values. Let θ = (θ1, θ2, …, θk)′ be a k-vector of unknown parameters.
Furthermore, let Y depend on θ.
In the most simple case (the only case we will explore in this unit), the sample is IID (Independent and
Identically Distributed). To put it simply, an IID sample consists of serially uncorrelated draws from one and the same distribution.
For an IID sample, the likelihood function can be written as:
L(θ; Y) = f(Y; θ) = f(y1, …, yn; θ) = f(y1; θ) × ⋯ × f(yn; θ) = Π f(yi; θ)
Note: we can only write the joint density as the product of the individual densities because the observations are independent.
In most empirical applications, it is simpler to maximize the log of the likelihood function:
ln L(θ; Y) = Σ ln f(yi; θ)
The θ that maximizes ln L(θ; Y) also maximizes L(θ; Y), as the logarithmic transformation is
monotonic.
The ML estimator is the value of θ that maximizes ln L(θ; Y). It is obtained
by setting the score, the first derivative of the log likelihood, to zero, that is:
∂ ln L(θ; Y) / ∂θ = 0
The ML estimator has two key large-sample properties.
Consistent: as the sample size n approaches infinity, the ML estimator converges in value to θ.
Asymptotically Normal:
θ̂ ~ N( θ, I(θ)⁻¹ )
In words: θ̂ is asymptotically normally distributed with mean θ and variance given by the inverse of
I(θ). I(θ) is known
as the Information Matrix. Numerically, it is often evaluated with the Hessian Matrix:
H = ∂² ln L(θ; Y) / ∂θ ∂θ′
which is the second order matrix derivative of the log likelihood function. As you will see later, this will be numerically
evaluated and generated in R.
Consistency means that under large-sample conditions ML estimators give you very good coefficient
estimates. Asymptotic normality means that you can use its asymptotic distribution to perform hypothesis
tests.
Beetles Death
The probability of death depends on the concentration x. For logistic regression, recall that we model the probability
for experiment i using:
p_i = exp(β1 + β2·x_i) / (1 + exp(β1 + β2·x_i))
You can easily find out from Wikipedia that the pmf for a binomial
distribution is:
f(y_i; n_i, p_i) = C(n_i, y_i) · p_i^(y_i) · (1 − p_i)^(n_i − y_i)
where y_i is the number of deaths and n_i the number of beetles in experiment i.
Evaluate the above pmf for each experiment observation (8 experiments in total in this sample) and multiply them
together. We have a likelihood function:
L(β; Y) = Π C(n_i, y_i) · p_i^(y_i) · (1 − p_i)^(n_i − y_i)
Now you will see why a log likelihood function is easier to work with.
Using simple algebra involving log functions, we can easily work out:
ln L = Σ [ ln C(n_i, y_i) + y_i·(β1 + β2·x_i) − n_i·ln(1 + exp(β1 + β2·x_i)) ]
This is the log likelihood function we will maximize by choosing values for β1 and β2 (in the R code, B[1] is the intercept β1
and B[2] is the slope β2).
We pick sensible starting values and do the fit. There are various numerical search methods available in optim.
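The maximization itself can be sketched in R (with simulated binomial data standing in for the beetles data, since the dataset is not reproduced here); the constant ln C(n_i, y_i) is dropped because it does not involve β:

```r
set.seed(99)
x = seq(-2, 2, length.out = 8)      # 8 "experiments" with covariate x
n = rep(50, 8)                      # 50 trials in each
b_true = c(0.3, 1.5)
y = rbinom(8, n, plogis(b_true[1] + b_true[2] * x))   # simulated deaths
# negative log likelihood (the binomial constant is omitted)
negll = function(B) {
  eta = B[1] + B[2] * x
  -sum(y * eta - n * log(1 + exp(eta)))
}
out = optim(c(0, 0), negll, method = "BFGS", hessian = TRUE)
out$par                             # ML estimates of B[1], B[2]
sqrt(diag(solve(out$hessian)))      # standard errors from the inverse Hessian
# glm gives (near) identical answers, since it also uses ML
fit = glm(cbind(y, n - y) ~ x, family = binomial("logit"))
coef(fit)
```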
Beetles Death
Recall that the inverse of the negative of the Hessian matrix approximates the
variance-covariance matrix of the coefficient estimates.
Beetles Death
Finally, as a comparison, if we had used GLM, the results would be very similar,
because GLMs are also estimated with ML.
REVISION WEEK
By Dr Gary Deng
Variable   Description
STATION    ID variable
PRICE      sales price of house i in $1,000 (MLS)
NROOM      number of rooms
DWELL      1 if detached unit, 0 otherwise
NBATH      number of bathrooms
PATIO      1 if patio, 0 otherwise
FIREPL     1 if fireplace, 0 otherwise
AC         1 if air conditioning, 0 otherwise
BMENT      1 if basement, 0 otherwise
NSTOR      number of stories
GAR        number of car spaces in garage (0 = no garage)
AGE        age of dwelling in years
CITCOU     1 if dwelling is in Baltimore County, 0 otherwise
LOTSZ      lot size in hundreds of square feet
SQFT       interior living space in hundreds of square feet
R codes
data <- read.csv("BaltimoreHousing.csv")
data<-data.matrix(data)
n = dim(data)[1]            # number of observations
m = dim(data)[2]            # number of columns in the file
y = data.matrix(data[,2])   # dependent variable: PRICE
X = data.matrix(data[,3:m]) # regressors
one = matrix(rep(1,n))      # column of 1s for the intercept
X = cbind(one,X)
k = dim(X)[2]               # number of regressors including the intercept
# negative log likelihood of the normal linear model;
# p[1] is the error variance, p[2:(k+1)] are the coefficients
lnL <- function(p)
(n/2)*log(2*pi)+(n/2)*log(p[1])+(1/(2*p[1]))*(t(y-X%*%p[2:(k+1)])%*% (y-X%*%p[2:(k+1)]))
# minimize the negative log likelihood
out <- optim(c(rep(1,(k+1))), lnL, hessian=TRUE, method = "BFGS")
beta <- matrix(out$par)
# standard errors from the inverse Hessian
stdev <- matrix(sqrt(diag(solve(out$hessian))))
t_beta <- beta/stdev
# ML estimates, standard errors and t ratios
cbind(beta[2:(k+1)],stdev[2:(k+1)],t_beta[2:(k+1)])
# compare with lm()
reg_lm <- lm(y~X-1)
cbind(beta[2:(k+1)],reg_lm$coefficient)
# and with the closed-form LS solution
solve(t(X)%*%(X))%*%(t(X)%*%y)
Description
neighborhood ID, used in GeoDa User's Guide and
tutorials
housing value (in $1,000)
household income (in $1,000)
residential burglaries and vehicle thefts per 1000
households
open space (area)
percent housing units without plumbing
distance to CBD
north-south indicator variable (North = 1)
other north-south indicator variable (North = 1)
east-west indicator variable (East = 1)
core-periphery indicator variable (Core = 1)
R codes
data <- read.csv("columbus.csv")
data<-data.matrix(data)
n = dim(data)[1]
m = dim(data)[2]
y = data.matrix(data[,2])
X = data.matrix(data[,3:m])
one = matrix(rep(1,n))
X = cbind(one,X)
k = dim(X)[2]
# negative log likelihood, same routine as the Baltimore example;
# p[1] is the error variance, p[2:(k+1)] are the coefficients
lnL <- function(p)
(n/2)*log(2*pi)+(n/2)*log(p[1])+(1/(2*p[1]))*(t(y-X%*%p[2:(k+1)])%*% (y-X%*%p[2:(k+1)]))
out <- optim(c(rep(1,(k+1))), lnL, hessian=TRUE, method = "BFGS")
beta <- matrix(out$par)
stdev <- matrix(sqrt(diag(solve(out$hessian))))
t_beta <- beta/stdev
cbind(beta[2:(k+1)],stdev[2:(k+1)],t_beta[2:(k+1)])
reg_lm <- lm(y~X-1)
cbind(beta[2:(k+1)],reg_lm$coefficient)
solve(t(X)%*%(X))%*%(t(X)%*%y)
require(forecast)
data <- read.csv("BigMacIndex.csv")
data<-data.matrix(data)