PROPOSAL REFINEMENT
Understanding Rule Learning Dynamics as an ALGORITHM
PRIOR
Given a finite action space A, the prior θ is chosen such that Σ_{a∈A} θ(a) = 1. If the prior
places positive probability on only a proper subset A′ ⊂ A, then some positive
probability has to be placed on the excluded actions A \ A′ so that these actions are not initially excluded. Therefore let
θ(a) = ε / |A \ A′| for every a ∈ A \ A′, where ε is suitably small. Since mass has been added, the probabilities in the prior θ need
to be rescaled so that Σ_{a∈A} θ(a) = 1 still holds. Also note that the value of the utility
function u(a) is available for each action [most likely the utilities have to be consistent with the probabilities
placed on each action].
_____________________________________________________________________________________
QUESTIONS:
1. How should ε be chosen?
2. Is it sufficient to distribute the probability mass ε uniformly over the actions not included in the original prior?
3. Is it possible to spread ε in a smoother manner, to prevent our prior from looking like a field of one-point
peaks over a uniform distribution on the rest of the action space? [We may apply the intuition that a
greater portion of ε should be allocated to actions that are closer to actions with positive probability in the original
prior.]
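Questions 2 and 3 can be compared side by side in a small sketch. Everything here is illustrative: the integer action grid, the function name `spread_epsilon`, and the Gaussian distance weighting are assumptions, not part of the proposal; the sketch only contrasts uniform spreading of ε with the "closer actions get more of ε" intuition from question 3.

```python
import math

def spread_epsilon(theta, eps, b=1.0, weighted=False):
    """Extend a prior over a finite action grid so every action has
    positive probability. theta: dict {action: prob}, actions indexed
    0..n-1 (a hypothetical discretization of the action space)."""
    support = [a for a, p in theta.items() if p > 0]
    excluded = [a for a, p in theta.items() if p == 0]
    if not excluded:
        return dict(theta)
    if weighted:
        # Weight each excluded action by a Gaussian of its distance to
        # the nearest supported action: closer actions get more of eps.
        w = {a: math.exp(-min(abs(a - s) for s in support) ** 2 / (2 * b ** 2))
             for a in excluded}
        z = sum(w.values())
        add = {a: eps * w[a] / z for a in excluded}
    else:
        # Question 2: spread eps uniformly over the excluded actions.
        add = {a: eps / len(excluded) for a in excluded}
    # Rescale the original mass so the total probability is 1 again.
    new = {a: p * (1 - eps) for a, p in theta.items()}
    for a, m in add.items():
        new[a] += m
    return new
```

Both variants renormalize to a proper distribution; the weighted variant avoids the "one-point peaks over a uniform field" shape by letting ε decay with distance from the original support.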
We update the potential function in the "neighborhood" of the last played action. Note that the definition of the neighborhood
depends on the choice of similarity function. Since we will be using normalized Gaussian kernels, the
value of the potential function is updated for every action in the action space A. The domain within
which updating occurs can be restricted by using pyramidal Gaussians.
Definition:
∀ a ∈ A: φ_{t+1}(a) = φ_t(a) + u(a_t) · K_b(a, a_t), where K_b is the normalized Gaussian kernel with bandwidth b and a_t is the action played at t.
Given the new updated values we can choose the next action. Since (t+1) is now the current
period, we relabel it t. Thus action a is chosen with the following probability:
P_t(a) = φ_t(a) / Σ_{a′∈A} φ_t(a′).
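The update-then-choose step can be sketched as follows. Since the draft's formulas are not fully legible, this assumes a Roth–Erev-style reinforcement in which the payoff of the played action spills over to similar actions through the normalized kernel; the symbols φ (potential), u (payoff), and b (bandwidth) follow the definitions above, but the exact functional form is an assumption.

```python
import math

def gauss_kernel(a, a_t, b):
    # Unnormalized Gaussian similarity between actions a and a_t.
    return math.exp(-(a - a_t) ** 2 / (2 * b ** 2))

def update_potential(phi, a_t, payoff, b):
    """Reinforce every action in A in proportion to its normalized
    similarity to the action a_t that was just played."""
    k = [gauss_kernel(a, a_t, b) for a in range(len(phi))]
    z = sum(k)  # normalize so the kernel weights sum to 1 over A
    return [p + payoff * ki / z for p, ki in zip(phi, k)]

def choice_probs(phi):
    # Choice probability proportional to the current potential value.
    z = sum(phi)
    return [p / z for p in phi]
```

Because the kernel weights are normalized, each period adds exactly one unit of `payoff` mass to the potential function, spread over the neighborhood of a_t.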
_____________________________________________________________________________________
QUESTIONS:
1. What are the strategies for choosing the parameter b (bandwidth)? Is it a static parameter, or can it be
dynamically adjusted based on data generated by the algorithm?
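One candidate answer to the question above is a decay schedule: start with a broad bandwidth (updates generalize widely, encouraging exploration) and shrink it as observations accumulate, so that late updates become local. The function name, the 1/(1 + rate·t) decay form, and the floor b_min are illustrative assumptions, not part of the proposal.

```python
def bandwidth_schedule(b0, t, rate=0.01, b_min=0.25):
    """Dynamic bandwidth strategy: decay from the initial value b0
    toward a floor b_min as the period t grows."""
    return max(b_min, b0 / (1.0 + rate * t))
```

A static choice corresponds to rate = 0; any data-driven rule (e.g. shrinking b only where many observations have landed) would replace the dependence on t with a dependence on the realized action history.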
Oleksiy Mnyshenko |3
MARKOV PROCESS
The process of choosing an action from the current probability distribution, based on the current values of the potential
function, and then updating the potential function to arrive at a new probability distribution on the
action space A can be modeled as a Markov process.
[Question: if we were to run the computation, how would we record the value of the utility function over our action
space?]
Transition probabilities:
Note that the above Markov chain is inhomogeneous, since the transition probabilities depend on the last
action a_t that was used to update the potential function on A. Since the initial prior is such that θ(a) > 0 for
every a ∈ A, and the Gaussian kernel is strictly positive, there is always a positive
probability of choosing any action a ∈ A at any t.
Therefore the whole state space consists of one recurrent class, meaning that the above Markov process is
irreducible. Irreducibility has the potential to yield interesting results related to convergence.
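A sample path of this process can be simulated directly, which also answers the bracketed recording question in one simple way: store the payoff u(a) for each action on a discrete grid. The payoff vector, seed, bandwidth, and the kernel-spillover update are assumptions carried over from the sketch of the definition above, not the proposal's own specification.

```python
import math, random

def simulate(u, steps, b=1.0, seed=0):
    """One sample path of the learning process: draw an action from the
    potential-induced distribution, observe its payoff u[a], and spread
    the reinforcement over A via a normalized Gaussian kernel."""
    rng = random.Random(seed)
    n = len(u)
    phi = [1.0] * n      # positive prior potential on every action
    visits = [0] * n     # record of how often each action was played
    for _ in range(steps):
        a_t = rng.choices(range(n), weights=phi)[0]
        visits[a_t] += 1
        k = [math.exp(-(a - a_t) ** 2 / (2 * b ** 2)) for a in range(n)]
        kz = sum(k)
        phi = [p + u[a_t] * ki / kz for p, ki in zip(phi, k)]
    return phi, visits
```

Because the prior potential and the kernel are strictly positive, every action retains positive choice probability at every step, which is the irreducibility property noted above.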
The above process may get stuck at a local maximum. This can be avoided by using a perturbed Markov
process such that with probability (1 − δ) the next state is chosen in accordance with the transition probabilities,
and with probability δ a mistake is made: the next action around which the update will be
performed is chosen from the uniform distribution over the action space.
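The perturbed choice rule amounts to a single branching step per period, sketched below; the name `perturbed_step` and the explicit `rng` argument are illustrative, and δ plays the role of the mistake probability described above.

```python
import random

def perturbed_step(phi, delta, rng):
    """With probability 1 - delta, draw the next action from the
    potential-induced distribution; with probability delta, 'tremble'
    and draw uniformly over the action space."""
    n = len(phi)
    if rng.random() < delta:
        return rng.randrange(n)   # uniform mistake
    return rng.choices(range(n), weights=phi)[0]
```

The uniform tremble guarantees that no region of the action space is permanently starved of updates, which is what lets the process escape a local maximum of the potential function.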
CONVERGENCE
Our goal is to demonstrate that the statement below, which holds for the extreme case of counterfactual
thinking, is also true under the reinforcement learning algorithm outlined above:
Questions:
1. Can convergence be aided by specifying the similarity function so that the kernel has
smaller bandwidth where the utility function is sensitive to small variations in actions, and greater bandwidth
where changes in payoff across a neighborhood of an action are small?
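The payoff-sensitivity idea in question 1 can be made concrete with a per-action bandwidth rule. The finite-difference slope proxy, the 1/(1 + |slope|) shape, and the bounds b_min, b_max are all illustrative assumptions; the only fixed requirement from the question is that steep payoff regions get a narrow kernel and flat regions a wide one.

```python
def local_bandwidth(u, i, b_min=0.5, b_max=3.0):
    """Per-action bandwidth inversely related to how steeply the payoff
    u changes near action i (centered finite-difference proxy).
    Assumes the action grid has at least two points."""
    lo, hi = max(i - 1, 0), min(i + 1, len(u) - 1)
    slope = abs(u[hi] - u[lo]) / (hi - lo)
    # Steep payoff -> bandwidth near b_min (precise, local updates);
    # flat payoff  -> bandwidth near b_max (broad generalization).
    return b_min + (b_max - b_min) / (1.0 + slope)
```

Plugging such a state-dependent b into the kernel update would make the similarity function itself adapt to the shape of the utility function, which is the mechanism the question proposes for aiding convergence.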
READINGS
Topics:
Mathematics
Economics:
2. Law of effect;
3. Learning behavior (books or articles that will help me put such highly focused research in
context with other overarching topics)
- Hofbauer, Josef, and Karl Sigmund, The Theory of Evolution and Dynamical Systems (Cambridge
University Press).
Research (methodology):
Becker, Tricks of the Trade: How to Think about Your Research While You’re Doing It
(University of Chicago Press, 1998).
Booth, Colomb and Williams, The Craft of Research (University of Chicago Press, 2003).
Turabian, Booth, Colomb, and Williams, A Manual for Writers of Research Papers, Theses,
and Dissertations, Seventh Edition: Chicago Style for Students and Researchers (Chicago:
University of Chicago Press, 2007). Get the 7th edition!
Zerubavel, The Clockwork Muse: A Practical Guide to Writing Theses, Dissertations and
Books (Harvard University Press, 1999).