
Debre Berhan University

College of Natural and Computational Sciences


Department of Statistics

Concepts of Bayesian inference


(Stat 651)
Assignment III (Group)

Master of Statistics
Shewayiref Geremew (ID No.: PGR/009/08)
Abate Alemayehu (ID No.: PGR/010/08)
Samuel Ayele (ID No.: PGR/010/08)
Lemessa Nigusa (ID No.: PGR/006/08)
Wudneh Ketema (ID No.: PGR/005/08)

Submitted To: Dr. A. R. Muralidharan

December 1, 2016
Debre Berhan, Ethiopia


Definition of Bayes Estimators

In estimation theory and decision theory, a Bayes estimator or a Bayes action is an estimator or decision rule that minimizes the posterior expected value of a loss function (i.e., the posterior expected loss). Equivalently, it maximizes the posterior expectation of a utility function. In estimation of a parameter vector $\theta$ from $N$ observation samples $y$, a set of performance measures is used to quantify and compare the characteristics of different estimators. In general, an estimate of a parameter vector $\theta$ is a function of the observation vector $y$, the length of the observation $N$, and the process model $M$. This dependence may be expressed as

$$\hat{\theta} = f(y, N, M)$$
Different parameter estimators produce different results depending on the estimation method, the utilization of the observation, and the influence of the prior information. Due to the randomness of the observations, even the same estimator would produce different results with different observations from the same process. Therefore an estimate is itself a random variable: it has a mean and a variance, and it may be described by a probability density function. However, in most cases it is sufficient to characterize an estimator in terms of the mean and the variance of the estimation error. The most commonly used performance measures for an estimator are the following:
Expected value of estimate: $E[\hat{\theta}]$
Bias of estimate: $E[\hat{\theta} - \theta] = E[\hat{\theta}] - \theta$
Covariance of estimate: $\mathrm{Cov}[\hat{\theta}] = E[(\hat{\theta} - E[\hat{\theta}])(\hat{\theta} - E[\hat{\theta}])^T]$
Optimal estimators aim for zero bias and minimum estimation error covariance. The desirable
properties of an estimator can be listed as follows:
1. Unbiased estimator: an estimator of $\theta$ is unbiased if the expectation of the estimate is
equal to the true parameter value:

$$E[\hat{\theta}] = \theta$$

An estimator is asymptotically unbiased if for increasing length of observations $N$ we have

$$\lim_{N \to \infty} E[\hat{\theta}] = \theta$$

2. Efficient estimator: an unbiased estimator of $\theta$ is an efficient estimator if it has the
smallest covariance matrix compared with all other unbiased estimates of $\theta$:

$$\mathrm{Cov}[\hat{\theta}_{\mathrm{Efficient}}] \le \mathrm{Cov}[\hat{\theta}]$$

where $\hat{\theta}$ is any other unbiased estimate of $\theta$.

3. Consistent estimator: an estimator is consistent if the estimate improves with the increasing
length of the observation $N$, such that the estimate $\hat{\theta}$ converges in probability
to the true value $\theta$ as $N$ becomes infinitely large:

$$\lim_{N \to \infty} P[|\hat{\theta} - \theta| > \varepsilon] = 0$$

where $\varepsilon$ is arbitrarily small.
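The properties above can be checked empirically by simulation. A minimal sketch, assuming a Bernoulli($\theta$) sample and the sample mean as the estimator (both chosen here only for illustration):

```python
import random

random.seed(0)
theta = 0.3  # true parameter, chosen only for this illustration

def estimate(n):
    """Sample mean of n Bernoulli(theta) draws: an unbiased estimator of theta."""
    return sum(random.random() < theta for _ in range(n)) / n

# The empirical mean of the estimator over many replications stays close to
# theta (unbiasedness), and its variance shrinks as n grows (consistency).
for n in (10, 100, 1000):
    reps = [estimate(n) for _ in range(2000)]
    mean = sum(reps) / len(reps)
    var = sum((r - mean) ** 2 for r in reps) / len(reps)
    print(f"n={n:5d}  mean={mean:.3f}  var={var:.5f}")
```

For the sample mean, the estimation-error variance is $\theta(1-\theta)/n$, which decays like $1/n$; that decay is what drives the consistency visible in the output.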

Example
Let $X_1, \ldots, X_n$ be a random sample from $X \sim B(1, \theta)$ with $0 \le \theta \le 1$.
Prior density: $\mathrm{Beta}(\alpha, \beta)$, i.e.

$$\pi(\theta) = \begin{cases} \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)}\, \theta^{\alpha-1}(1-\theta)^{\beta-1} & \text{if } 0 \le \theta \le 1 \\ 0 & \text{otherwise} \end{cases}$$

$$\int_0^1 \Big[\prod_{i=1}^n f(x_i; \theta)\Big] \pi(\theta)\, d\theta = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \int_0^1 \theta^{\sum x_i + \alpha - 1}(1-\theta)^{n - \sum x_i + \beta - 1}\, d\theta = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + \sum x_i)\,\Gamma(\beta + n - \sum x_i)}{\Gamma(\alpha+\beta+n)} \tag{1}$$

$$\int_0^1 \theta \Big[\prod_{i=1}^n f(x_i; \theta)\Big] \pi(\theta)\, d\theta = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\Gamma(\beta)} \cdot \frac{\Gamma(\alpha + \sum x_i + 1)\,\Gamma(\beta + n - \sum x_i)}{\Gamma(\alpha+\beta+n+1)} \tag{2}$$

The Bayes estimator for $\theta$ (the posterior mean under squared error loss) is the ratio of (2) to (1):

$$\delta(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n X_i + \alpha}{n + \alpha + \beta}$$

Special case: if $\alpha = \beta = 1$, then the $\mathrm{Beta}(\alpha, \beta)$ prior becomes the $\mathrm{Un}[0,1]$ prior density and the Bayes estimator is

$$\delta(x_1, \ldots, x_n) = \frac{\sum_{i=1}^n X_i + 1}{n + 2}$$
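The closed-form estimator above is easy to check numerically; a short sketch (the Bernoulli sample and the choice $\theta = 0.7$ are assumptions for illustration):

```python
import random

random.seed(1)

def bayes_estimate(xs, alpha, beta):
    """Posterior mean of theta under a Beta(alpha, beta) prior with
    Bernoulli observations xs: (sum(xs) + alpha) / (n + alpha + beta)."""
    return (sum(xs) + alpha) / (len(xs) + alpha + beta)

theta = 0.7  # true parameter, assumed for this example
xs = [int(random.random() < theta) for _ in range(50)]

mle = sum(xs) / len(xs)             # maximum likelihood estimate
uniform = bayes_estimate(xs, 1, 1)  # the alpha = beta = 1 special case
print(f"MLE = {mle:.3f}, Bayes (uniform prior) = {uniform:.3f}")
```

The uniform-prior estimate shrinks the MLE toward $1/2$; as $n$ grows, the prior's influence vanishes and the two estimates coincide.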

Compare the Loss and Risk Function with Examples

Suppose we have a random sample from $X$ with density $f(x; \theta)$, where $\theta \in \mathbb{R}$ is an unknown parameter.
It is convenient to borrow some language from decision theory.
1. An estimate for $\theta$, a function $\delta(x_1, \ldots, x_n)$ of the observations, is often called a decision. The function $\delta: \mathbb{R}^n \to \mathbb{R}$ is called a decision function.
2. If $\theta$ is estimated by $\delta(x_1, \ldots, x_n)$, then the error is called the loss, and a measure for the error is called a loss function, i.e. a non-negative function of $\theta$ and $\delta(x_1, \ldots, x_n)$:

$$l(\theta; \delta(x_1, \ldots, x_n))$$


Examples
$l(\theta; \delta(x_1, \ldots, x_n)) = |\theta - \delta(x_1, \ldots, x_n)|$: absolute error loss
$l(\theta; \delta(x_1, \ldots, x_n)) = (\theta - \delta(x_1, \ldots, x_n))^2$: squared error loss
3. Suppose a certain loss function has been chosen. We want to choose an estimate for $\theta$,
i.e. a decision function $\delta(x_1, \ldots, x_n)$, such that the average loss is small. The average loss
is called the risk function. It is a function $R$ of $\theta$ and $\delta(x_1, \ldots, x_n)$:

$$R(\theta; \delta) = E_\theta[l(\theta; \delta(X_1, \ldots, X_n))] = \begin{cases} \sum_{x_1} \cdots \sum_{x_n} l(\theta; \delta(x_1, \ldots, x_n)) \prod_{i=1}^n f(x_i; \theta) & \text{(discrete case)} \\ \int \cdots \int l(\theta; \delta(x_1, \ldots, x_n)) \prod_{i=1}^n f(x_i; \theta)\, dx_1 \cdots dx_n & \text{(continuous case)} \end{cases}$$

Note
For the squared error loss function, the corresponding risk function is the mean-squared error
(MSE).
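The risk under squared error loss can be approximated by Monte Carlo. A sketch comparing the MLE with the uniform-prior Bayes estimator from the earlier Bernoulli example (the sample size and $\theta$ values are arbitrary illustrative choices):

```python
import random

random.seed(2)

def mse(estimator, theta, n, reps=5000):
    """Monte Carlo approximation of the risk E_theta[(theta - delta)^2]
    for Bernoulli(theta) samples of size n."""
    total = 0.0
    for _ in range(reps):
        xs = [int(random.random() < theta) for _ in range(n)]
        total += (theta - estimator(xs)) ** 2
    return total / reps

mle = lambda xs: sum(xs) / len(xs)                # maximum likelihood
bayes = lambda xs: (sum(xs) + 1) / (len(xs) + 2)  # uniform-prior Bayes

for theta in (0.1, 0.5, 0.9):
    print(f"theta={theta}: MSE(MLE)={mse(mle, theta, 20):.5f}, "
          f"MSE(Bayes)={mse(bayes, theta, 20):.5f}")
```

Neither risk function dominates the other: the Bayes estimator has lower risk near $\theta = 0.5$, while the MLE does better near the endpoints, which illustrates why no single rule is uniformly best.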

Two Optimality Criteria of Bayes Decision Making


1. Maximum likelihood decision rule
A decision based on the posterior probabilities is called the optimal Bayes decision rule.
The maximum likelihood decision rule is the special case where the threshold value is 1, i.e. a 0-1 loss function and equal class prior probabilities.
The minimax criterion, used in game theory, is derived from the Bayes criterion and seeks to minimize the maximum Bayes risk.
The minimax criterion does not require knowledge of the priors, but it needs a cost function.
The prior probability $p(y)$ is the probability of the state before we have observed it: $p(y = 1)$, $p(y = -1)$.
We combine the prior $p(y)$ with the likelihood $p(x \mid y)$ to obtain the posterior probability $p(y \mid x)$, which is the probability of the state $y$ given (i.e. conditioned on) the observation $x$:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

This is Bayes' rule. It follows from the identity $p(x \mid y)\, p(y) = p(x, y) = p(y \mid x)\, p(x)$.

If $p(y = 1 \mid x) > p(y = -1 \mid x)$, decide $y = 1$; otherwise decide $y = -1$.
Maximum a posteriori (MAP):

$$y_{\mathrm{MAP}} = \arg\max_y p(y \mid x)$$
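A sketch of the MAP rule for the two classes above, assuming Gaussian class-conditional likelihoods $p(x \mid y = \pm 1) = N(\pm 1, 1)$; the densities and priors are illustrative choices, not part of the text:

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * math.sqrt(2 * math.pi))

def map_decide(x, prior_pos=0.5):
    """Decide y = +1 if p(y=+1 | x) > p(y=-1 | x), else y = -1.
    The evidence p(x) cancels in the comparison, so the unnormalized
    posteriors p(x | y) p(y) suffice."""
    post_pos = gaussian_pdf(x, +1.0, 1.0) * prior_pos
    post_neg = gaussian_pdf(x, -1.0, 1.0) * (1.0 - prior_pos)
    return 1 if post_pos > post_neg else -1

print(map_decide(0.8))                  # 1
print(map_decide(-2.3))                 # -1
print(map_decide(-0.1, prior_pos=0.9))  # prior pulls the decision to +1
```

With equal priors the rule reduces to the maximum likelihood decision rule, whose threshold here is $x = 0$.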
2. Admissibility: A criterion for selecting between decision rules in the frequentist
framework is called admissibility. In short, it is often difficult to identify a single
best decision rule, but it can sometimes be easy to discard some bad ones, for
example if they can be shown to always be no better than (and sometimes worse
than) another rule.
In statistical decision theory, an admissible decision rule is a rule for making a
decision such that there is no other rule that is always better than it. In most
decision problems the set of admissible rules is large, even infinite, so this is not a
sufficient criterion to pin down a single rule, but as will be seen there are some good
reasons to favor admissible rules; compare Pareto efficiency.
According to the complete class theorems, under mild conditions every admissible
rule is a (generalized) Bayes rule (with respect to some prior $\pi(\theta)$, possibly an
improper one, that favors distributions where that rule achieves low risk). Thus,
in frequentist decision theory it is sufficient to consider only (generalized) Bayes
rules.