
Bellman equation theory we used for verification

The main goal of any investor or investment agent is to maximize return with the least possible risk, and diversification of investments is central to minimizing that risk. Various risk management techniques are used; for instance, the Sharpe ratio (Sharpe, 1994) and the Treynor ratio (Hübner, 2005) are calculated for different combinations of assets in a portfolio to determine whether the ratio of risk to return stays within a threshold. These techniques give a surface-level view of risk and return, but they do not closely track actual market movements, which depend on many factors. Many mathematical models have been developed by scholars over the years for reaching optimality over a combination of choices. The Bellman model was introduced by R. E. Bellman in 1954 in his paper "The Theory of Dynamic Programming" (O'Donoghue, et al., 2017). The model is based on the concept of dynamic programming. This study describes the essential elements of the Bellman model, the equation itself, and how it helps attain optimality in portfolio management.
The Bellman model describes the various possible states that can be reached by the agent. The agent lands in these different states based on the decisions he takes, and these states need to be represented numerically. There can be many states, given the number of alternatives the agent has at a particular time. The agent performs transactions in an environment, and these are called actions; they are essentially the inputs. The outcome of these actions is known as a reward, which measures how well the agent is performing against his desired goals. The agent seeks an optimal action plan that maximizes the long-term reward. The Bellman equation expresses the function that describes the value derived from the combination of flow payoff and discounted continuation payoff (O'Donoghue, et al., 2017). Since the model is based on dynamic programming, it is important to discuss some of its basic concepts here.
Dynamic programming is a family of algorithms used to solve a complex task by dividing it into simpler sub-tasks. These sub-tasks are then solved recursively and combined to arrive at the final solution (Salas & Powell, 2017). The Bellman equation does the same with respect to the various actions, the states they lead to, and their impact on the final reward.
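To make this idea concrete, the following minimal Python sketch shows how a multi-step decision problem can be solved by recursion over smaller sub-problems with memoization. The states, actions, reward function, action set and horizon are invented for illustration and are not part of the original model description.

```python
from functools import lru_cache

GAMMA = 0.95     # hypothetical discount factor
HORIZON = 5      # number of remaining decisions (illustrative)

def reward(state, action):
    # Toy immediate payoff; a real model would use market data.
    return action - 0.1 * state

def next_state(state, action):
    # Toy deterministic transition between numeric states.
    return state + action

@lru_cache(maxsize=None)
def value(state, steps_left):
    """Best achievable discounted reward from `state` with `steps_left` decisions."""
    if steps_left == 0:
        return 0.0
    # Each candidate action reduces the task to a smaller sub-problem,
    # which is solved recursively and reused thanks to memoization.
    return max(reward(state, a) + GAMMA * value(next_state(state, a), steps_left - 1)
               for a in (0, 1, 2))   # toy action set

print(value(0, HORIZON))
```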
The first task for an agent is to determine the maximum long-term reward he can obtain by choosing the best set of actions. In that case the state is optimal, and this maximum reward is also known as the value of the state. The best action plan yields the most value; if the plan is not optimal, the value is lower. There can be many combinations of action plans leading to different states that are not necessarily optimal. Furthermore, external forces also affect the environment, so even an optimal action plan does not necessarily result in the best overall reward. However, the agent always tries to do the best in a given situation, so the best reachable state gives the maximum value that is used in the equation to determine the value of the state. The Bellman equation aids in evaluating the expected reward across different states, whether advantageous or disadvantageous. The Bellman equation is as follows:
V(s) = \max_{a} \left[ R(s, a) + \gamma V(s') \right]

where:
V(s) is the value of a given state s.
\max_{a} denotes choosing the action "a" that maximizes the value.
R(s, a) is the reward from action "a" in state "s".
\gamma is the discounting factor that discounts the reward over time.
s' is the next state that the agent reaches every time he takes an action.
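As a hedged illustration of how the equation is applied, the short Python sketch below performs a single Bellman backup for one state over a hand-made table of actions, rewards and successor-state values. The state names, rewards and value estimates are purely hypothetical and are not taken from the original text.

```python
GAMMA = 0.95   # hypothetical discount factor

# For each (state, action): (immediate reward, next state); all values are toy numbers.
MODEL = {
    ("hold_cash", "buy_stock"): (0.0, "long_stock"),
    ("hold_cash", "stay_cash"): (0.1, "hold_cash"),
    ("long_stock", "sell"):     (1.0, "hold_cash"),
    ("long_stock", "hold"):     (0.2, "long_stock"),
}

# Assumed current estimates of the value of each successor state.
V = {"hold_cash": 0.5, "long_stock": 1.2}

def bellman_value(state):
    """One application of V(s) = max_a [ R(s, a) + gamma * V(s') ]."""
    return max(r + GAMMA * V[s_next]
               for (s, a), (r, s_next) in MODEL.items() if s == state)

print(bellman_value("hold_cash"))   # max(0.0 + 0.95*1.2, 0.1 + 0.95*0.5) = 1.14
```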
One important assumption of the Bellman equation is that the agent takes the optimal decision at each stage. The equation describes the value of a particular state through the combination of its immediate reward and the pay-off from the states that follow. The rewards are calculated at every state based on the remaining alternatives the agent has. There are always open trades for both gains and losses, and the agent always tries to end up in a winning position. Gamma is the discounting factor and is very important for determining the true value of a given state. Successful values of gamma typically lie between 0.90 and 0.99: a lower value indicates that the agent's intention is short term, while a higher value indicates a long-term view. The value of each state also depends on its distance from the terminal state; the greater the distance, the more heavily the reward is discounted. For example, with gamma = 0.95, a reward ten steps away contributes only about 0.95^10 ≈ 0.60 of its nominal value. Since there are always numerous alternatives after each decision, V(s') can also be expressed as the sum over all possible next states of their values weighted by their probability of occurrence. The equation then takes the following form:
V(s) = \max_{a} \left[ R(s, a) + \gamma \sum_{s'} P(s' \mid s, a) \, V(s') \right]
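The sketch below illustrates this stochastic form with a toy value-iteration loop in Python. The two market regimes, rewards and transition probabilities are assumptions made for the example and are not calibrated to any real asset.

```python
GAMMA = 0.95
STATES = ["bull", "bear"]
ACTIONS = ["invest", "stay_out"]

# Immediate expected reward R(s, a) -- toy numbers.
R = {
    ("bull", "invest"):   0.4,
    ("bull", "stay_out"): 0.1,
    ("bear", "invest"):  -0.2,
    ("bear", "stay_out"): 0.1,
}
# Transition probabilities P(s' | s, a) -- toy numbers, each row sums to 1.
P = {
    ("bull", "invest"):   {"bull": 0.7, "bear": 0.3},
    ("bull", "stay_out"): {"bull": 0.7, "bear": 0.3},
    ("bear", "invest"):   {"bull": 0.4, "bear": 0.6},
    ("bear", "stay_out"): {"bull": 0.4, "bear": 0.6},
}

V = {s: 0.0 for s in STATES}
for _ in range(200):  # value iteration: apply the Bellman update until it stabilises
    V = {s: max(R[(s, a)] + GAMMA * sum(P[(s, a)][s2] * V[s2] for s2 in STATES)
                for a in ACTIONS)
         for s in STATES}

print(V)  # approximate optimal value of starting in each market regime
```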
Portfolio management also requires evaluating the various actions that can affect overall performance. After selecting the assets for the portfolio, the agent needs to monitor them and take further actions. The value of the portfolio changes as factors in the environment change, so the agent faces a continuous challenge to keep the value of the portfolio at an optimal level with acceptable risk. There can be many different alternatives at a given point in time; for instance, after any recent development in the Brexit negotiations, a stock's value may be affected. The agent needs to be aware of the events that will affect the portfolio and of all the possible action alternatives to counter these external threats. With the help of the Bellman model, the agent can evaluate how different action plans will affect the performance of the portfolio. Probabilities are assigned to every alternative and sub-alternative, and the branches of these alternatives are solved recursively to arrive at a final figure that sums the products of the different values and their respective probabilities. This makes it easier for the agent or investor to know the consequences of each decision, and the action plan giving the best value is taken for decision making.
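One possible way to operationalise this, assuming a small hand-built scenario tree, is sketched below in Python. The events, probabilities and payoffs (for example, a hypothetical reaction to Brexit news) are illustrative assumptions, not real market estimates.

```python
GAMMA = 0.95   # hypothetical discount factor

# Each node is either a terminal payoff (float) or a dict
# {action: [(probability, immediate_reward, next_node), ...]}.
scenario_tree = {
    "hold": [
        (0.6,  0.02, {"hold": [(1.0, 0.02, 2.0)],     # favourable news branch
                      "sell": [(1.0, 0.01, 1.9)]}),
        (0.4, -0.05, 1.0),                            # unfavourable news branch
    ],
    "hedge": [(0.6, 0.00, 1.8), (0.4, 0.00, 1.7)],
    "sell":  [(1.0, 0.01, 1.5)],
}

def node_value(node):
    """Best expected discounted value of a node, solved recursively."""
    if isinstance(node, (int, float)):                # terminal payoff
        return float(node)
    return max(sum(p * (r + GAMMA * node_value(nxt)) for p, r, nxt in branches)
               for branches in node.values())

def best_action(node):
    """Action whose branches give the highest probability-weighted value."""
    return max(node, key=lambda a: sum(p * (r + GAMMA * node_value(nxt))
                                       for p, r, nxt in node[a]))

print(best_action(scenario_tree), node_value(scenario_tree))
```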
The Bellman equation is widely accepted by many investment banking firms, foreign institutional investors and traders. Its adoption has had a major impact on ongoing trends in the financial market, and Business Insider (2014) also describes how financial experts and traders have used the equation to identify the achievable potential of their portfolios and ultimately increase the return on their investments. With advances in technology, especially the introduction of Artificial Intelligence (AI) and the better computing speed of modern computers, the Bellman equation can be applied easily to any given task, provided the various possible outcomes and their probabilities of occurrence are known in advance; tools can also be developed to forecast these inputs better. It can be concluded that the Bellman model has a diverse range of applications and is one of the effective models for evaluating the various alternatives and their impact at a given point in time. It has been adopted by many professionals in the field of investment and portfolio management, and it has proved to be an effective tool for risk management activities.

References
Business Insider, 2014. The 17 Equations That Changed The Course Of History. [Online] Available at: https://www.businessinsider.in/The-17-Equations-That-Changed-The-Course-Of-History/articleshow/31905139.cms [Accessed 31 July 2019].

Hübner, G., 2005. The generalized Treynor ratio. Review of Finance, 9(3), pp. 415-435.

O'Donoghue, B., Osband, I., Munos, R. & Mnih, V., 2017. The uncertainty Bellman equation and exploration. arXiv preprint arXiv:1709.05380.

Salas, D. & Powell, W., 2017. Benchmarking a scalable approximate dynamic programming algorithm for
stochastic control of grid-level energy storage. INFORMS Journal on Computing, 30(1), pp. 106-123.

Sharpe, W., 1994. The Sharpe ratio. Journal of Portfolio Management, 21(1), pp. 49-58.
