BASIC PROBABILITY
Probability of an event's complement: $P(A^c) = 1 - P(A)$, with $P(A) + P(A^c) = 1$
Bayes' Rule: $P(A \mid B) = \dfrac{P(B \mid A)\,P(A)}{P(B)}$
Definition of Independence: $P(A \cap B) = P(A)\,P(B)$
***Expected Value: $E[X] = \sum_{x} x\,P_X(x)$
NOTES: It's very possible to have an expected value that couldn't actually happen. For example, if a prof only
gives out 90s and 100s, and there is a 50% likelihood of each and an even number of students in the class, then
the expected value of the grades is 95 - even though the prof will never actually give a 95.
Expected value is LINEAR!!!! This is a beautiful thing, allowing us to do things like this: $E[aX + b] = a\,E[X] + b$
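A quick numerical sketch of linearity, reusing the 90/100 grade example above (the constants a = 3, b = 2 are arbitrary illustrative choices):

```python
import random

# Sketch: empirically check E[aX + b] = a*E[X] + b for a = 3, b = 2,
# using the 90-or-100 grade example from the notes.
random.seed(0)
grades = [random.choice([90, 100]) for _ in range(100_000)]

mean_x = sum(grades) / len(grades)                     # estimate of E[X], near 95
mean_y = sum(3 * g + 2 for g in grades) / len(grades)  # estimate of E[3X + 2]

print(round(mean_x, 2), round(mean_y, 2), round(3 * mean_x + 2, 2))
```

Note that $E[X] \approx 95$ even though no single grade is ever 95, exactly as the note above says.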
***Variance: $\mathrm{Var}[X] = E[(X - \mu_X)^2] = E[X^2] - (E[X])^2$
Standard Deviation: $\sigma_X = \sqrt{\mathrm{Var}[X]}$
Conditional PMF: $P_{X \mid B}(x) = P(X = x \mid B) = \dfrac{P(X = x,\, B)}{P(B)}$
Families of Discrete Random Variables (list is not exhaustive, but includes the "most important"):
Bernoulli $(p)$
- single trial with two possible outcomes (e.g., flipping a coin, answering a yes/no question)
For $0 < p < 1$: $P_X(x) = \begin{cases} 1 - p & x = 0 \\ p & x = 1 \end{cases}$, with $E[X] = p$ and $\mathrm{Var}[X] = p(1 - p)$
Binomial $(n, p)$
- repeated trials of Bernoullis (e.g., flipping several coins in sequence, answering several
yes/no questions in sequence)
For a positive integer $n$ and $0 < p < 1$: $P_X(x) = \binom{n}{x} p^x (1 - p)^{n - x}$, with $E[X] = np$ and $\mathrm{Var}[X] = np(1 - p)$
Geometric $(p)$
- number of trials until a (given number of) failure(s) or success(es) (e.g., running the Boston
marathon every year until the year you manage to finish; cold-calling people for donations until
you have six donations)
For $0 < p < 1$: $P_X(x) = (1 - p)^{x - 1} p$ for $x = 1, 2, \ldots$, with $E[X] = 1/p$ and $\mathrm{Var}[X] = (1 - p)/p^2$
Discrete Uniform $(k, l)$
- outcomes in a given range all have an equal likelihood of occurring (e.g., rolling a die)
For integers $k \le l$: $P_X(x) = \dfrac{1}{l - k + 1}$ for $x = k, k + 1, \ldots, l$, with $E[X] = \dfrac{k + l}{2}$
Poisson $(\alpha)$
- involves a RATE of occurrences over time (e.g., number of dodgeball hits in 3 minutes, number of
phone calls received in a given span of time)
For $\alpha > 0$: $P_X(x) = \dfrac{\alpha^x e^{-\alpha}}{x!}$ for $x = 0, 1, 2, \ldots$, with $E[X] = \mathrm{Var}[X] = \alpha$
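As a sanity check on the families above, a short simulation of a binomial RV as a sum of Bernoulli trials (the parameters n = 10, p = 0.3 are my own illustrative choices):

```python
import random

# Sketch: check E[X] = np and Var(X) = np(1-p) for a Binomial(n, p) RV
# by simulating n Bernoulli(p) trials per sample.
random.seed(1)
n, p = 10, 0.3
samples = [sum(random.random() < p for _ in range(n)) for _ in range(200_000)]

mean = sum(samples) / len(samples)
var = sum((x - mean) ** 2 for x in samples) / len(samples)

print(round(mean, 3), n * p)            # both near 3.0
print(round(var, 3), n * p * (1 - p))   # both near 2.1
```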
***Expected Value: $E[X] = \int_{-\infty}^{\infty} x\,f_X(x)\,dx$
As with all transitions from discrete to continuous values, we should expect summations to be replaced by integrals, and
that is the only change you see here:
***Variance: $\mathrm{Var}[X] = \int_{-\infty}^{\infty} (x - \mu_X)^2\,f_X(x)\,dx = E[X^2] - (E[X])^2$
Because we define variance in terms of expected values, and because we have amended our definition of expected
value to utilize the necessary integration, the definition of variance is exactly the same as before...
Standard Deviation
Again, this is the same as before: $\sigma_X = \sqrt{\mathrm{Var}[X]}$
Families of Continuous Random Variables (list is not even close to exhaustive, but includes the "most important"):
Uniform $(a, b)$
For constants $a < b$: $f_X(x) = \dfrac{1}{b - a}$ for $a \le x \le b$ (0 otherwise), with $E[X] = \dfrac{a + b}{2}$ and $\mathrm{Var}[X] = \dfrac{(b - a)^2}{12}$
Exponential $(\lambda)$
For $\lambda > 0$: $f_X(x) = \lambda e^{-\lambda x}$ for $x \ge 0$ (0 otherwise), with $E[X] = 1/\lambda$ and $\mathrm{Var}[X] = 1/\lambda^2$
Erlang $(n, \lambda)$
For $\lambda > 0$ and a positive integer $n$: $f_X(x) = \dfrac{\lambda^n x^{n-1} e^{-\lambda x}}{(n - 1)!}$ for $x \ge 0$ (0 otherwise), with $E[X] = n/\lambda$ and $\mathrm{Var}[X] = n/\lambda^2$
Gaussian $(\mu, \sigma)$
- NOTE: Gaussian RVs are HUGELY important!!!!! These babies AREN'T going away, so start
loving them now ...
For constants $\mu$ and $\sigma > 0$: $f_X(x) = \dfrac{1}{\sigma\sqrt{2\pi}}\,e^{-(x - \mu)^2 / 2\sigma^2}$, with $E[X] = \mu$ and $\mathrm{Var}[X] = \sigma^2$
If $Y = aX + b$, then: $E[Y] = a\,E[X] + b$ and $\mathrm{Var}[Y] = a^2\,\mathrm{Var}[X]$
Again, there is no surprise here! Addition of a constant just shifts the location of the density function - it does NOT
affect the spread of the pdf, which is what variance measures. Only the scaling affects the variance, and it shouldn't
surprise you that the effect is quadratic, for variance is essentially a quadratic measure (the second moment minus the
square of the expected value).
Are the above relationships true for ANY random variable? YES!!!! This is true for ANY random variable, provided that
the transformation of that random variable is LINEAR.
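A minimal simulation of this fact, assuming an exponential X and the arbitrary illustrative constants a = -2, b = 7:

```python
import random

# Sketch: numerically confirm Var(aX + b) = a^2 Var(X) for an exponential X.
# The shift b = 7 drops out; the scale a = -2 enters squared (a^2 = 4).
random.seed(2)
xs = [random.expovariate(1.5) for _ in range(100_000)]
ys = [-2 * x + 7 for x in xs]

def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / len(vals)

print(round(variance(ys), 4), round(4 * variance(xs), 4))  # equal
```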
NOTE: Linear transformations of any kind on uniform and Gaussian random variables produce uniform and Gaussian
random variables, respectively.
This is not so for exponential and Erlang random variables, the distributions for which are constrained to be 0 for values
less than 0 and are constrained as well to start at 0. Hence, ONLY linear transformations of the form
$Y = aX$, where $a > 0$, result in a Y that is an exponential or an Erlang random variable, respectively. Shifting the original X
distribution in any direction results in a random variable that is neither exponential nor Erlang (respectively); likewise,
scaling the original X distribution by a negative constant will flip the original distribution, again resulting in a random
variable that is neither exponential nor Erlang (respectively).
The standard normal CDF is $\Phi(z) = \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\,e^{-u^2/2}\,du$. Notice that the function under integration is exactly the 0 mean, variance 1 Gaussian.
Transformation of $X$, a Gaussian $(\mu, \sigma)$ Random Variable, to Standard Normal Random Variable $Z$:
We need to be able to do this transformation because there is no analytical solution for the integration of a
Gaussian pdf; thus, we need to be able to linearly transform a general Gaussian (non-zero mean and/or non-unity
variance) into the standard Gaussian. To do this: $Z = \dfrac{X - \mu}{\sigma}$
Probability that $X$ is in $[a, b]$:
Extending the logic above, this is simply: $P(a \le X \le b) = \Phi\!\left(\dfrac{b - \mu}{\sigma}\right) - \Phi\!\left(\dfrac{a - \mu}{\sigma}\right)$
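A sketch of this computation, using the standard library's `statistics.NormalDist` as the standard normal CDF (the numbers mu = 70, sigma = 10, [a, b] = [60, 85] are illustrative):

```python
from statistics import NormalDist

# Sketch: P(a <= X <= b) for Gaussian X via the standard normal CDF Phi,
# i.e. Phi((b - mu)/sigma) - Phi((a - mu)/sigma).
mu, sigma = 70, 10
a, b = 60, 85

phi = NormalDist().cdf  # standard normal (0 mean, variance 1) CDF
prob = phi((b - mu) / sigma) - phi((a - mu) / sigma)

# Cross-check against NormalDist parameterized with mu and sigma directly.
direct = NormalDist(mu, sigma).cdf(b) - NormalDist(mu, sigma).cdf(a)
print(round(prob, 4), round(direct, 4))
```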
EXAM 2 MATERIAL
Pairs of Random Variables
Relationships between Distributions/Mass Functions:
Discrete:
- Joint PMF: $P_{X,Y}(x, y) = P(X = x, Y = y)$
- Marginals: $P_X(x) = \sum_y P_{X,Y}(x, y)$ and $P_Y(y) = \sum_x P_{X,Y}(x, y)$
- Probability of Event A: $P(A) = \sum_{(x,y) \in A} P_{X,Y}(x, y)$
- Bayes' Rule: $P_{X \mid Y}(x \mid y) = \dfrac{P_{Y \mid X}(y \mid x)\,P_X(x)}{P_Y(y)}$
- Independence: Two RVs are independent if and only if $P_{X,Y}(x, y) = P_X(x)\,P_Y(y)$ for all $x$ and $y$
Continuous:
- Joint PDF: $f_{X,Y}(x, y)$
- Marginals: $f_X(x) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dy$ and $f_Y(y) = \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx$
- Probability of Event A: $P(A) = \iint_A f_{X,Y}(x, y)\,dx\,dy$
- Bayes' Rule: $f_{X \mid Y}(x \mid y) = \dfrac{f_{Y \mid X}(y \mid x)\,f_X(x)}{f_Y(y)}$
- Independence: Two RVs are independent if and only if $f_{X,Y}(x, y) = f_X(x)\,f_Y(y)$ for all $x$ and $y$
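A small worked example for the discrete case: computing marginals from a joint PMF table and testing independence (the table values are my own, chosen so that the factorization happens to hold):

```python
from itertools import product

# Sketch: marginals from a joint PMF, then check P_XY(x,y) == P_X(x)P_Y(y)
# for ALL (x, y) pairs -- the definition of independence.
pmf = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.15, (1, 1): 0.45}

px = {x: sum(p for (xx, y), p in pmf.items() if xx == x) for x in (0, 1)}
py = {y: sum(p for (x, yy), p in pmf.items() if yy == y) for y in (0, 1)}

independent = all(abs(pmf[(x, y)] - px[x] * py[y]) < 1e-12
                  for x, y in product((0, 1), repeat=2))
print(px, py, independent)
```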
Covariance/Correlation:
$\mathrm{Cov}[X, Y] = E[(X - \mu_X)(Y - \mu_Y)] = E[XY] - E[X]\,E[Y]$
NOTE: Remember, shifts (addition of scalars) don't affect covariances. Also, scalars can be pulled out, i.e., $\mathrm{Cov}[aX + b,\, cY + d] = ac\,\mathrm{Cov}[X, Y]$
Correlation coefficient: $\rho_{X,Y} = \dfrac{\mathrm{Cov}[X, Y]}{\sigma_X\,\sigma_Y}$
NOTE: $-1 \le \rho_{X,Y} \le 1$; $\rho_{X,Y} = 0 \Leftrightarrow X$ and $Y$ uncorrelated; $|\rho_{X,Y}| = 1 \Leftrightarrow X$ and $Y$ completely correlated (linear relationship, i.e., $Y = aX + b$)
X, Y Independent $\Rightarrow \mathrm{Cov}[X, Y] = 0$ (uncorrelated) - but uncorrelated does NOT imply independent!
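A quick numerical check of the covariance/correlation properties above; the model Y = 2X + noise and all constants are my own illustrative choices:

```python
import math
import random

# Sketch: estimate Cov(X, Y) and rho for Y = 2X + noise, and confirm that
# shifting both variables leaves the covariance unchanged.
random.seed(3)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [2 * x + random.gauss(0, 1) for x in xs]

def cov(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / len(a)

rho = cov(xs, ys) / math.sqrt(cov(xs, xs) * cov(ys, ys))
shifted = cov([x + 5 for x in xs], [y - 3 for y in ys])  # same as cov(xs, ys)

print(round(cov(xs, ys), 2), round(rho, 3), round(shifted, 2))
```

Here the population values are $\mathrm{Cov}[X,Y] = 2$ and $\rho = 2/\sqrt{5} \approx 0.894$.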
Iterated Expectation: $E[X] = E\big[E[X \mid Y]\big]$
Linear transformations of X and Y are invertible transformations, and hence the transformed variables
are also jointly Gaussian, i.e., $aX + b$ and $cY + d$ are not only marginally Gaussian, but also are jointly Gaussian!
The conditional $f_{X \mid Y}(x \mid y)$ is Gaussian, with:
$E[X \mid Y = y] = \mu_X + \rho_{X,Y}\dfrac{\sigma_X}{\sigma_Y}(y - \mu_Y)$ and $\mathrm{Var}[X \mid Y = y] = \sigma_X^2\,(1 - \rho_{X,Y}^2)$
The conditional $f_{Y \mid X}(y \mid x)$ is Gaussian, with:
$E[Y \mid X = x] = \mu_Y + \rho_{X,Y}\dfrac{\sigma_Y}{\sigma_X}(x - \mu_X)$ and $\mathrm{Var}[Y \mid X = x] = \sigma_Y^2\,(1 - \rho_{X,Y}^2)$
Missed detection: deciding $H_0$ when $H_1$ is true.
False alarm: deciding $H_1$ when $H_0$ is true.
Probability of Error: This is always equal to the probability of missed detection times the a priori probability of
the hypothesis plus the probability of false alarm times the a priori probability of the null hypothesis,
i.e.: $P(\mathrm{err}) = P(\mathrm{MD})\,P(H_1) + P(\mathrm{FA})\,P(H_0)$
Expected Value of the Cost of Errors: All we do here is factor in the associated costs to the probability of error
computation, i.e.: $E[C] = C_{\mathrm{MD}}\,P(\mathrm{MD})\,P(H_1) + C_{\mathrm{FA}}\,P(\mathrm{FA})\,P(H_0)$
Detectors:
NOTE: All detectors are written for the continuous case. As always, for the discrete case, the expressions are the same,
but with the big "P" of a PMF substituted for the little "f" of a pdf.
Maximum Likelihood (ML) Detector: Most basic detector; compares the likelihood ratio to 1 in order to determine
which density is larger. If the density in the numerator is larger, then the ratio is larger than 1; if the ratio is smaller
than 1, then the density in the denominator must be larger.
Choose $H_1$ if $\dfrac{f_{Y \mid H_1}(y)}{f_{Y \mid H_0}(y)} > 1$; otherwise choose $H_0$.
Maximum A Posteriori (MAP) Detector: Minimizes probability of error by weighting the likelihoods with the a
priori probabilities. Note that if the a prioris are equal, then the MAP detector simplifies to the ML detector.
Choose $H_1$ if $\dfrac{f_{Y \mid H_1}(y)}{f_{Y \mid H_0}(y)} > \dfrac{P(H_0)}{P(H_1)}$; otherwise choose $H_0$.
Minimum Cost (Bayes' Risk) Detector: Minimizes expected value of the cost of error by weighting the
likelihoods by both the a priori probabilities and the costs of missed detection/false alarm. Note that if the costs
are equal, then the minimum cost detector simplifies to the MAP detector.
Choose $H_1$ if $\dfrac{f_{Y \mid H_1}(y)}{f_{Y \mid H_0}(y)} > \dfrac{C_{\mathrm{FA}}\,P(H_0)}{C_{\mathrm{MD}}\,P(H_1)}$; otherwise choose $H_0$.
(We engineers know that the log can be taken to any base, including base e, so no need to specify
'ln'.)
*log 1 = 0
Estimation
We want to know X, but we can only observe Y; so, we have to make an educated guess at the value of X given that we
have observed some value of Y.
Biased/Unbiased: An estimator $\hat{X}$ is unbiased if $E[\hat{X}] = E[X]$; otherwise it is biased.
ML Estimation:
Choose the value of X that maximizes the conditional distribution, called the "likelihood function": $\hat{x}_{ML}(y) = \arg\max_x f_{Y \mid X}(y \mid x)$
MAP Estimation:
Choose the value of X that maximizes the conditional distribution, but now we multiply by the prior distribution on X,
because we no longer assume equal priors (as in the ML case): $\hat{x}_{MAP}(y) = \arg\max_x f_{Y \mid X}(y \mid x)\,f_X(x)$
Case 2 (A, some attribute of X, is observed, thus restricting the possibilities for X): $\hat{x} = E[X \mid A]$
Let $e = X - \hat{X}_L$ be the error in the linear estimate, where, clearly, $\hat{X}_L = \mu_X + \rho_{X,Y}\dfrac{\sigma_X}{\sigma_Y}(Y - \mu_Y)$. Then the mean square error of the estimate is: $E[e^2] = \sigma_X^2\,(1 - \rho_{X,Y}^2)$
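A simulation sketch of the linear estimate and its mean square error, assuming the illustrative model Y = X + noise, for which $\sigma_X = 1$, $\sigma_Y = \sqrt{2}$, and $\rho = 1/\sqrt{2}$:

```python
import math
import random

# Sketch: linear estimate X_hat = rho*(sigma_X/sigma_Y)*Y (means are 0 here),
# with mean square error sigma_X^2 * (1 - rho^2) = 0.5 for this model.
random.seed(5)
xs = [random.gauss(0, 1) for _ in range(100_000)]
ys = [x + random.gauss(0, 1) for x in xs]        # Y = X + noise

rho, sx, sy = 1 / math.sqrt(2), 1.0, math.sqrt(2)
estimates = [rho * (sx / sy) * y for y in ys]    # = 0.5 * y

mse = sum((x - xh) ** 2 for x, xh in zip(xs, estimates)) / len(xs)
print(round(mse, 3), round(sx ** 2 * (1 - rho ** 2), 3))  # both near 0.5
```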
EXAM 3 MATERIAL
LIMIT THEOREMS
Sums of Random Variables
Let $W_n = X_1 + X_2 + \cdots + X_n$. Then:
$E[W_n] = E[X_1] + E[X_2] + \cdots + E[X_n]$ by the linearity of expected value.
$\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i] + 2\sum_{i < j} \mathrm{Cov}[X_i, X_j]$, which, if the $X_i$ are independent (so every covariance term is 0), reduces to $\mathrm{Var}[W_n] = \sum_{i=1}^{n} \mathrm{Var}[X_i]$.
Repeat after me: "If the RVs are independent, then the variance of the sum equals the sum of the variances."
NOTE: PDFs of the sums of INDEPENDENT RVs are convolutions of the individual PDFs!!!
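A numerical illustration: convolving the rectangular pdf of Uniform(0, 1) with itself yields the triangular pdf of the sum on [0, 2] (the grid spacing dx is an arbitrary discretization choice):

```python
# Sketch: the pdf of the sum of two independent Uniform(0,1) RVs is the
# convolution of the two rectangular pdfs -- a triangle peaking at w = 1.
dx = 0.005
grid = [i * dx for i in range(201)]   # samples of [0, 1]
f = [1.0] * len(grid)                 # Uniform(0,1) pdf values

# Discrete convolution approximating the integral (f * f)(w).
conv = [0.0] * (2 * len(grid) - 1)
for i, fi in enumerate(f):
    for j, fj in enumerate(f):
        conv[i + j] += fi * fj * dx

# conv[200] ~ the peak value f_W(1) = 1; conv[100] ~ f_W(0.5) = 0.5
print(round(conv[200], 3), round(conv[100], 3))
```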
What if N is random? (i.e., we don't know how many variables we're adding ...)
Let $W = X_1 + X_2 + \cdots + X_N$, where the $X_i$ are i.i.d. and N is a random variable independent of the $X_i$. Then: $E[W] = E[N]\,E[X]$ and $\mathrm{Var}[W] = E[N]\,\mathrm{Var}[X] + \mathrm{Var}[N]\,(E[X])^2$
Markov Inequality
For non-negative RV $X$ and any $c > 0$: $P(X \ge c) \le \dfrac{E[X]}{c}$
Chebyshev Inequality (this gives a tighter bound than the Markov inequality ...)
For any RV $X$ (not necessarily non-negative) and any $c > 0$: $P(|X - \mu_X| \ge c) \le \dfrac{\mathrm{Var}[X]}{c^2}$
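A quick comparison of the two bounds against an exact tail probability, assuming X ~ Exponential(1) (so E[X] = Var[X] = 1) and the illustrative threshold c = 3:

```python
import math

# Sketch: exact P(X >= 3) for Exponential(1) vs the Markov bound E[X]/c
# and the Chebyshev bound Var(X)/(c - mu)^2 on P(|X - 1| >= 2).
c = 3.0
exact = math.exp(-c)                 # P(X >= 3) = e^{-3} ~ 0.0498
markov = 1.0 / c                     # E[X]/c ~ 0.333
chebyshev = 1.0 / (c - 1.0) ** 2     # Var(X)/(c - mu)^2 = 0.25

print(round(exact, 4), round(markov, 4), round(chebyshev, 4))
```

Both bounds hold, and Chebyshev is tighter here, matching the parenthetical note above.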
Weak Law:
Let $X_1, X_2, \ldots$ be i.i.d. (although the weak law holds for uncorrelated RVs), with sample mean $M_n = \frac{1}{n}\sum_{i=1}^{n} X_i$. Then:
$\lim_{n \to \infty} P(|M_n - \mu_X| \ge \epsilon) = 0$ for any $\epsilon > 0$
The WLLN says that as the number of RVs we're summing approaches infinity, the sample mean approaches
the true mean with a probability that approaches certainty!!!!
Strong Law:
This is very similar to the weak law, except we require that the RVs be i.i.d., and we are essentially changing
the above statement to say that as the number of samples approaches infinity, then the sample mean actually
equals the true mean with absolute certainty:
$P\left(\lim_{n \to \infty} M_n = \mu_X\right) = 1$
Central Limit Theorem:
Let $X_1, \ldots, X_n$ be i.i.d. with mean $\mu_X$ and variance $\sigma_X^2$, and let $W_n = X_1 + \cdots + X_n$. For large n, $W_n$ is approximately Gaussian, in which case we transform to a standard normal Gaussian CDF, then use the phi or Q function to get the
probability we seek. We can of course express the mean and standard deviation of $W_n$ in terms of the mean
and standard deviation of $X$ as follows: $E[W_n] = n\mu_X$ and $\sigma_{W_n} = \sigma_X\sqrt{n}$, so $P(W_n \le w) \approx \Phi\!\left(\dfrac{w - n\mu_X}{\sigma_X\sqrt{n}}\right)$
Remember: $Q(z) = 1 - \Phi(z)$
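A simulation sketch of the CLT recipe, assuming a sum of n = 100 Uniform(0, 1) RVs (so mu = 0.5, sigma^2 = 1/12) and the illustrative threshold w = 55:

```python
import random
from statistics import NormalDist

# Sketch: P(W <= 55) for W = sum of 100 Uniform(0,1) RVs, by simulation
# and by the Gaussian (CLT) approximation.
random.seed(4)
n = 100
mu, sigma = 0.5, (1 / 12) ** 0.5

sims = [sum(random.random() for _ in range(n)) for _ in range(20_000)]
empirical = sum(w <= 55 for w in sims) / len(sims)

# Transform to standard normal: z = (w - n*mu) / (sigma * sqrt(n))
z = (55 - n * mu) / (sigma * n ** 0.5)
approx = NormalDist().cdf(z)

print(round(empirical, 3), round(approx, 3))  # close agreement
```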
Confidence Intervals
Theorem: The sample mean $M_n(X)$ of n i.i.d. samples is a Gaussian
RV with unknown mean $\mu_X$. The relationship between a confidence interval
estimate of $\mu_X$, denoted $(M_n(X) - c,\ M_n(X) + c)$, and the confidence coefficient $1 - \alpha$ is given by
$P(M_n(X) - c \le \mu_X \le M_n(X) + c) = 1 - \alpha$, where $c = z_{\alpha/2}\,\dfrac{\sigma_X}{\sqrt{n}}$ and $z_{\alpha/2} = \Phi^{-1}(1 - \alpha/2)$
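A sketch of the interval computation, using the standard library's `NormalDist.inv_cdf` for the standard normal quantile (sigma, n, alpha, and the observed sample mean are all hypothetical values):

```python
from statistics import NormalDist

# Sketch: 95% confidence interval half-width c = z_{alpha/2} * sigma / sqrt(n)
# for a sample mean of n Gaussian samples with known sigma.
sigma, n, alpha = 2.0, 25, 0.05
z = NormalDist().inv_cdf(1 - alpha / 2)   # ~1.96 for alpha = 0.05
c = z * sigma / n ** 0.5

sample_mean = 10.3  # hypothetical observed sample mean
print(f"{1 - alpha:.0%} CI: ({sample_mean - c:.3f}, {sample_mean + c:.3f})")
```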
MARKOV CHAINS
Markov Property
Basically, it says we only need to know the current state in order to know the probabilities of the next state, so that
there is only a one step delay dependence:
$P(X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_0 = i_0) = P(X_{n+1} = j \mid X_n = i)$
Row numbers are the states we're leaving; column numbers are the states at which we're arriving, and the $(i, j)$-th
element $P_{ij}$ is the probability of entering state $j$ from state $i$.
The sum of each row must be 1, because at each time step, some action MUST be taken, whether it's to stay in the
same state or move to another.
Under certain conditions, the matrix will converge to the steady state, where each row will contain the same steady
state probability vector, denoted using the Greek lowercase letter pi:
$\pi = \pi P$, with $\sum_i \pi_i = 1$
$\pi$ is the (left) eigenvector of $P$ corresponding to eigenvalue 1. There can be as many steady state probability vectors as the
matrix has eigenvalues equal to 1.
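A minimal sketch of finding the steady state by repeatedly squaring the transition matrix (the 2-state chain is illustrative; its exact steady state is (5/6, 1/6)):

```python
# Sketch: steady state of a 2-state chain. Rows of P sum to 1; squaring P
# repeatedly drives every row toward the steady state vector pi.
P = [[0.9, 0.1],
     [0.5, 0.5]]

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

Pn = P
for _ in range(20):           # Pn = P^(2^20), far past convergence
    Pn = matmul(Pn, Pn)

pi = Pn[0]                    # each row -> the steady state vector
# Check pi = pi P (left eigenvector for eigenvalue 1).
pi_P = [sum(pi[i] * P[i][j] for i in range(2)) for j in range(2)]
print([round(x, 4) for x in pi], [round(x, 4) for x in pi_P])
```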
Definitions
accessible - State $j$ is accessible from state $i$ if there's a directed path from $i$ to $j$.
communicate - States $i$ and $j$ communicate if $j$ is accessible from $i$ and $i$ is accessible from $j$. A group whose members
communicate with each other is called a communicating class.
irreducible - In the state diagram or matrix, every state communicates with every other, and hence all states are active
in the steady state, so none of them can be "reduced out".
transient - State $i$ is transient if $i$ communicates with some state $j$, but $j$ doesn't communicate with $i$. You can think of
transient states as follows: Over time, "things" in that state will leak out to other non-transient (recurrent) states, until
eventually there's virtually nothing left to leak out. Hence, $\pi_i = 0$ for every transient state $i$.
NOTE: If there are multiple communicating classes with recurrent states, then the steady state probability vector
depends on the initial probability vector (where you probably started).
period - Greatest common divisor of all possible cycles from all states back to themselves.
aperiodic - Period of Markov Chain is equal to 1
NOTE: At least one self loop means that the Markov chain must have period 1, and thus must be aperiodic!!!
NOTE 2: If the period is greater than 1, then the chain will oscillate between steady state vectors, and will depend on
where you started.
b) Draw a dashed line between one recurrent state and the rest of the states. Then, the probability of flowing
into state $i$ must equal the probability of flowing out of state $i$. That is, write an equation where $\sum_{j \ne i} \pi_j P_{ji} = \pi_i \sum_{j \ne i} P_{ij}$
NOTE: Be sure not to draw dashed lines around transient states. Their steady state probabilities are zero, so
their values won't help you solve systems of equations involving them.
NOTE 2: Self-loops don't matter, because they're neither flowing into nor out of that state.
c) Repeat part b for a different recurrent state; do this until you have a system of as many equations as you
have unknowns.
d) Solve the system for the unknown elements of the steady state vector.
e) Check your work - your results should sum to 1!!!