The Implementation of a Hidden Markov Model in MATLAB for the Prediction of
Commodity Prices
2011 Report
Prepared by:
Taylor Sauder
Foster College of Business Administration
Bradley University
Report Distributed December 7, 2011

Abstract
The Hidden Markov Model offers an approach for modeling dynamic systems
that are observed through a time series. In this paper, a general overview of Hidden
Markov Models is presented, followed by a tutorial for implementing a model in
MATLAB. To illustrate the application of the model, the technique is used to predict
commodity prices. In addition to the predictions, a trading strategy is produced to
determine the overall profit/loss over the backtesting period.
1. Introduction
The analysis and prediction of financial time series is of utmost importance
for professionals and academics in the field of finance. Using the concepts of discrete
time series, a Hidden Markov Model (HMM) technique is used to model the dynamic
behavior of commodity prices. First, a set of historical data is used to initialize and
train the HMM. Second, the trained HMM is used to predict future
commodity prices over a selected backtesting period. Finally, to determine the
accuracy and value of the model, the HMM is compared to a random walk model,
and a trading strategy is implemented over the backtesting
period.
Section 2 of this paper describes the Hidden Markov Model process in a
general context. Section 2.1 provides a formal, mathematical explanation of the
model; Section 2.2 describes the fundamental problems that arise within HMMs and offers
well-established solutions. Section 3 provides a thorough summary of the HMM
approach for modeling commodity prices using MATLAB. Section 4 describes the
data analysis and general inputs for the model. Several performance metrics are
highlighted in Section 4.1, followed by the results in Section 4.2. After summarizing
the results, Section 4.3 calls attention to several issues with the modeling process.
Using the HMM output, a trading strategy is created and implemented in Section 5.
Finally, a summary of the findings and a discussion of future studies are provided in
Section 6.
2. Hidden Markov Model Overview
Processes are often viewed as signals, which come from either one or
multiple sources. These signals may have fixed or unfixed parameters and may even
be corrupted at times. Therefore, a mathematical signal model can be used to
process the signal in hopes of deriving a statistical process that describes the source.
This first leads to the theory of Markov Chains, which can be extended further to
Hidden Markov Models. Leonard E. Baum and several colleagues first published the
theory of HMMs during the 1960s. In 1989, Lawrence Rabiner published his tutorial
on HMMs, which explained the theory in a more general context (Lawrence). This
accessible treatment propelled the use of HMMs across several fields of study,
including speech recognition, speech analysis, video analysis, photo analysis, and
biology. While most of this research is applied to fields outside of finance, it often deals
with time series analysis; therefore, the transition to financial applications requires little effort.
2.1 Formal Definition
While a Markov process does model a stochastic process, it only describes the
observations themselves, rather than describing or predicting the underlying process
that generates them. Herein lies the importance of the Hidden Markov Model. The HMM
treats the observations as probabilistic functions of the state, where the state sequence
that generates them is a hidden stochastic process.
HMMs are characterized by the following five elements:
1) The number of hidden states, $N$, in the model. In this model, the states are
defined by price buckets.
2) The number of distinct observation symbols per state, $M$. These symbols are
denoted as $V = \{v_1, v_2, \dots, v_M\}$. This can be thought of as the number of
observations that fall in each price bucket.
3) The state transition probability distribution $A = \{a_{ij}\}$, where
$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$.
4) The emission probability distribution in state $j$, $B = \{b_j(k)\}$, where
$b_j(k) = P(v_k \text{ at } t \mid q_t = S_j)$, $1 \le j \le N$, $1 \le k \le M$.
5) The initial (prior) state distribution $\pi = \{\pi_i\}$, where
$\pi_i = P(q_1 = S_i)$, $1 \le i \le N$, is the probability of being in state $i$ at the
beginning of the observations.
The values of $N$, $M$, $A$, $B$, and $\pi$ can be used to generate the observation sequence
$O = O_1 O_2 O_3 \dots O_T$, where each $O_t$ is an observation from $V$ and $T$ is the number of
observations in the sequence (Md, Stock). To generate such a sequence, an initial state
$q_1 = S_i$ is chosen according to the prior distribution $\pi$, and $t$ is set to 1. An
observation $O_t = v_k$ is then chosen according to the emission distribution of state $S_i$,
$b_i(k)$. The model moves to state $q_{t+1} = S_j$ according to the transition probability
distribution of $S_i$, $a_{ij}$. This process continues as $t$ increments until $t = T$.
More compactly, the model is denoted by $\lambda = (A, B, \pi)$, where:
$$\sum_{j} a_{ij} = 1,\quad \sum_{k} b_j(k) = 1,\quad \sum_{i} \pi_i = 1,\quad a_{ij},\ b_j(k),\ \pi_i \ge 0 \text{ for all } i, j, k.$$
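The generative procedure above can be sketched in a few lines. This is a Python illustration rather than the MATLAB code used in this report; the function name `hmm_generate` and the toy two-state, three-symbol model are hypothetical, and states and symbols are 0-based indices rather than $S_1, \dots, S_N$ and $v_1, \dots, v_M$.

```python
import random

def hmm_generate(T, A, B, pi, rng=None):
    """Sample T states and observations from a discrete HMM lambda = (A, B, pi).

    A[i][j] = P(q_{t+1} = S_j | q_t = S_i)
    B[j][k] = P(O_t = v_k | q_t = S_j)
    pi[i]   = P(q_1 = S_i)
    """
    rng = rng or random.Random()
    states, obs = [], []
    # Choose the initial state q_1 from the prior distribution pi.
    q = rng.choices(range(len(pi)), weights=pi)[0]
    for t in range(T):
        states.append(q)
        # Emit O_t = v_k according to the emission row b_q(.).
        obs.append(rng.choices(range(len(B[q])), weights=B[q])[0])
        # Move to q_{t+1} according to the transition row a_q(.), if needed.
        if t < T - 1:
            q = rng.choices(range(len(A[q])), weights=A[q])[0]
    return states, obs

# Hypothetical toy model: 2 hidden states, 3 observation symbols.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]
states, obs = hmm_generate(10, A, B, pi, random.Random(0))
print(states, obs)
```

Passing a seeded `random.Random` makes the sampled sequence reproducible, which is convenient when comparing runs.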
2.2 HMM Fundamental Issues
There are three fundamental problems regarding HMMs that must be solved before an
HMM can be used.
1) Given $\lambda = (A, B, \pi)$, how can $P(O \mid \lambda)$ be computed efficiently for an
observation sequence $O$?
2) How can the state sequence $Q = q_1 q_2 \dots q_T$ that best explains the observation
sequence $O$ under the model $\lambda$ be chosen?
3) How can $P(O \mid \lambda)$ be maximized by adjusting the model parameters $A$, $B$, and $\pi$?
Question one can be answered in a straightforward manner by enumerating every
possible state sequence; however, this requires on the order of $2T \cdot N^T$
calculations, which is too cumbersome, so a more efficient procedure must be found
(Lawrence). Therefore, the Forward-Backward algorithm (FB) is used. The FB uses
induction to compute two sets of probabilities, the forward and the backward
probabilities. Together, these are used to create a smoothed set of values. In this
specific case, only the forward probabilities are needed:
$$\alpha_t(i) = P(O_1 O_2 \dots O_t,\ q_t = S_i \mid \lambda),$$
initialized by $\alpha_1(i) = \pi_i b_i(O_1)$ for $1 \le i \le N$ and updated by
$\alpha_{t+1}(j) = \left[\sum_{i=1}^{N} \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$. The desired
probability is then $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.
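The forward recursion can be checked against the direct enumeration it replaces. The sketch below is illustrative Python with hypothetical function names (`forward_prob`, `brute_force_prob`) and the same toy model as the generation example; it is not the MATLAB routine used in this report. The brute-force version loops over all $N^T$ state sequences, while the forward version needs only on the order of $N^2 T$ operations.

```python
from itertools import product

def forward_prob(obs, A, B, pi):
    """P(O | lambda) via the forward recursion, O(N^2 T) operations."""
    N = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(N)]
    # Induction: alpha_{t+1}(j) = [sum_i alpha_t(i) a_ij] * b_j(O_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(N)) * B[j][o]
                 for j in range(N)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

def brute_force_prob(obs, A, B, pi):
    """Direct evaluation: sum P(O, Q | lambda) over all N^T state sequences Q."""
    total = 0.0
    for Q in product(range(len(pi)), repeat=len(obs)):
        p = pi[Q[0]] * B[Q[0]][obs[0]]
        for t in range(1, len(obs)):
            p *= A[Q[t - 1]][Q[t]] * B[Q[t]][obs[t]]
        total += p
    return total

A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]
print(forward_prob([0, 2], A, B, pi))      # approximately 0.0775
print(brute_force_prob([0, 2], A, B, pi))  # same value, exponential cost
```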
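The second question can be answered by using the Viterbi algorithm. Viterbi is
commonly used because, rather than selecting the individually most likely state at
each instant, it finds the single best state sequence, accounting for the probability
of the transitions between states. The algorithm finds the sequence $Q$ that maximizes
$P(Q, O \mid \lambda)$ for a given observation sequence $O$ by means of induction. The
quantity $\delta_t(j)$ holds the highest probability of any path that ends in state $S_j$ at
time $t$, and the array $\psi_t(j)$ stores the predecessor state on that path, which is used
to backtrack the best sequence (Lawrence).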
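A compact version of the Viterbi recursion, with the $\delta$ and $\psi$ arrays described above, might look like the following. It is a Python sketch with a hypothetical function name and toy model, not the report's MATLAB code; ties between equally likely predecessors go to the lowest state index, which is an arbitrary choice.

```python
def viterbi(obs, A, B, pi):
    """Single best state path Q* and its probability P(Q*, O | lambda)."""
    N = len(pi)
    # Initialization: delta_1(i) = pi_i * b_i(O_1)
    delta = [pi[i] * B[i][obs[0]] for i in range(N)]
    psi = []  # psi[t-1][j] = best predecessor of state j at time t (t = 2..T)
    for o in obs[1:]:
        back, new_delta = [], []
        for j in range(N):
            # Recursion: best predecessor i maximizes delta_{t-1}(i) * a_ij
            i_best = max(range(N), key=lambda i: delta[i] * A[i][j])
            back.append(i_best)
            new_delta.append(delta[i_best] * A[i_best][j] * B[j][o])
        psi.append(back)
        delta = new_delta
    # Termination: pick the best final state, then backtrack through psi.
    q = max(range(N), key=lambda i: delta[i])
    best_prob = delta[q]
    path = [q]
    for back in reversed(psi):
        q = back[q]
        path.append(q)
    return path[::-1], best_prob

A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.5, 0.5]
print(viterbi([0, 2], A, B, pi))     # ([0, 0], approximately 0.0315)
print(viterbi([2, 2, 2], A, B, pi))  # ([1, 1, 1], approximately 0.06912)
```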
The third question can be answered through the Baum-Welch method (equivalently,
the Expectation-Maximization method) or through gradient techniques. Rabiner claims, “There
is no known way to analytically solve for the model which maximizes the probability
of the observation sequence. In fact, given the finite observation sequence as
training data, there is no optimal way of estimating the model parameters”
(Lawrence). The Baum-Welch method can be used to select $\lambda = (A, B, \pi)$ such that
$P(O \mid \lambda)$ is locally maximized using an iterative process. The Baum-Welch method
uses the Forward-Backward algorithm to create a re-estimated model
$\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$. By using $\bar{\lambda}$ in place of $\lambda$ and repeating, the
probability of $O$ being observed from the model can be increased up to some point. This
follows from maximizing Baum's auxiliary function
$Q(\lambda, \bar{\lambda}) = \sum_{Q} P(Q \mid O, \lambda) \log P(O, Q \mid \bar{\lambda})$:
$$\max_{\bar{\lambda}} Q(\lambda, \bar{\lambda}) \;\Rightarrow\; P(O \mid \bar{\lambda}) \ge P(O \mid \lambda),$$
so the likelihood function converges to a critical point (Lawrence).
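One Baum-Welch re-estimation step can be sketched from the forward and backward tables. This is an illustrative Python sketch, not MATLAB's `hmmtrain`: the function names, the starting model, and the observation sequence are hypothetical, and the probabilities are left unscaled, so it is only suitable for short sequences (long price series would underflow without scaling or log-space arithmetic). Repeating the step should give a non-decreasing likelihood, as the inequality above states.

```python
def forward_backward(obs, A, B, pi):
    """Unscaled forward (alpha) and backward (beta) tables for a discrete HMM."""
    N, T = len(pi), len(obs)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(N)]]
    for t in range(1, T):
        alpha.append([sum(alpha[t - 1][i] * A[i][j] for i in range(N)) * B[j][obs[t]]
                      for j in range(N)])
    beta = [[1.0] * N]                      # beta_T(i) = 1
    for t in range(T - 2, -1, -1):
        nxt = beta[0]                       # beta_{t+1}
        beta.insert(0, [sum(A[i][j] * B[j][obs[t + 1]] * nxt[j] for j in range(N))
                        for i in range(N)])
    return alpha, beta

def likelihood(obs, A, B, pi):
    """P(O | lambda) = sum_i alpha_T(i)."""
    alpha, _ = forward_backward(obs, A, B, pi)
    return sum(alpha[-1])

def baum_welch_step(obs, A, B, pi):
    """One re-estimation lambda -> lambda_bar = (A_bar, B_bar, pi_bar)."""
    N, M, T = len(pi), len(B[0]), len(obs)
    alpha, beta = forward_backward(obs, A, B, pi)
    prob = sum(alpha[-1])
    # gamma[t][i] = P(q_t = S_i | O, lambda)
    gamma = [[alpha[t][i] * beta[t][i] / prob for i in range(N)] for t in range(T)]
    # xi[t][i][j] = P(q_t = S_i, q_{t+1} = S_j | O, lambda)
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / prob
            for j in range(N)] for i in range(N)] for t in range(T - 1)]
    pi_bar = gamma[0][:]
    A_bar = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1)) for j in range(N)]
             for i in range(N)]
    B_bar = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T)) for k in range(M)]
             for j in range(N)]
    return A_bar, B_bar, pi_bar

# Hypothetical starting model and observation sequence (symbols 0..2).
obs = [0, 0, 1, 2, 2, 1, 0, 2, 2, 2]
A = [[0.6, 0.4], [0.3, 0.7]]
B = [[0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
pi = [0.6, 0.4]
history = [likelihood(obs, A, B, pi)]
for _ in range(5):
    A, B, pi = baum_welch_step(obs, A, B, pi)
    history.append(likelihood(obs, A, B, pi))
print(history)  # should be non-decreasing
```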
3. HMM Implementation in MATLAB
To implement the HMM process, MATLAB offers various functions within
the Statistics Toolbox. A summary of the implementation is described
below in 12 steps.
1. Select a dataset to be modeled and import it into MATLAB.
2. Select a time frame for the model to be trained over and a period to
backtest over. This requires knowledge of the dataset and may be
influenced by computing power. Note: each period of backtesting
requires the entire process to be repeated.
3. Determine high and low price windows and bucket sizes for the
observation prices. For example, a bucket size of $1.00 with windows of -$5
and $5 will create the buckets -5:-4, -4:-3, …, -1:0, 0:1, …, 3:4, 4:5.
4. Compute the observations-per-state matrix M.
5. Create the transition probability matrix, where
$a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$. Within this model, the
probabilities are estimated from discrete counts, so a state that was never
observed leads to a division by zero. To circumvent this issue, a loop
replaces the transition values of such a state with zero.
6. Create the emission probability matrix. These values are assumed to be
normally distributed, with the mean and variance derived from matrix M.
7. Create a prior probability matrix where the discrete probabilities are
derived from matrix M.
8. By default, MATLAB begins the HMM algorithms in state 1. To assign a
different initial state distribution, the transition and emission
matrices are augmented to include the prior matrix.
9. To generate a state sequence Q and an observation sequence O, the function
hmmgenerate(length, A, B) is used, where A and B represent the
transition and emission matrices, respectively (Jaroslav).
10. Using the observation sequence O and the initial transition and emission
matrices, hmmtrain(O, A, B) calculates the maximum likelihood estimates
of the transition and emission probabilities by means of the Baum-Welch
algorithm (Jaroslav).
11. To predict the future state and associated price, the estimated transition
matrix is used along with the current state. An algorithm examines the
current state, then searches through its row of the transition matrix to
find the most likely next state. Once determined, the future state
and associated price are stored in a matrix that remains unaffected by the
backtesting process; in other words, it is not updated along with the
transition and emission matrices during the process.
12. Repeat Steps 4-11 until all of the backtesting data has an associated
predicted state and price.
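Steps 3, 5, and 11 can be illustrated with a small Python sketch. The helper names (`bucket_index`, `bucket_price`, `estimate_transitions`, `predict_next_state`) and the sample prices are hypothetical, and two details are assumptions rather than the report's documented choices: a bucket is represented by its lower edge, and ties between equally likely next states go to the lowest index. An unobserved state keeps an all-zero row, as in step 5, and the prediction then falls back to "no change", matching the behavior described in Section 5.

```python
from decimal import Decimal, ROUND_FLOOR

def bucket_index(price, low, step):
    """0-based bucket of a price: [low, low+step) -> 0, [low+step, low+2*step) -> 1, ..."""
    # Decimal arithmetic avoids float error exactly at bucket edges.
    q = (Decimal(str(price)) - Decimal(str(low))) / Decimal(str(step))
    return int(q.to_integral_value(rounding=ROUND_FLOOR))

def bucket_price(index, low, step):
    """Lower edge of a bucket, used here as its representative price (assumption)."""
    return float(Decimal(str(low)) + Decimal(str(step)) * index)

def estimate_transitions(states, N):
    """Row-normalized transition counts; rows of unobserved states stay all zero."""
    counts = [[0] * N for _ in range(N)]
    for s, s_next in zip(states, states[1:]):
        counts[s][s_next] += 1
    A = []
    for row in counts:
        total = sum(row)
        A.append([c / total for c in row] if total else [0.0] * N)
    return A

def predict_next_state(A, current):
    """Most likely next state; ties go to the lowest index (assumption)."""
    row = A[current]
    if not any(row):
        return current  # no transitions observed: predict no change
    return max(range(len(row)), key=row.__getitem__)

# Hypothetical prices with the report's $0.005 bucket size.
low, step = 2.00, 0.005
prices = [2.00, 2.01, 2.005, 2.01, 2.015]
states = [bucket_index(p, low, step) for p in prices]   # [0, 2, 1, 2, 3]
A = estimate_transitions(states, N=4)
nxt = predict_next_state(A, states[-1])
print(states, nxt, bucket_price(nxt, low, step))
```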
For a more in-depth overview see Appendix I.
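Step 8's augmentation can be sketched as follows. MATLAB's `hmmgenerate` documentation describes building augmented matrices of the form `TRANS_HAT = [0 p; zeros(size(TRANS,1),1) TRANS]` and `EMIS_HAT = [zeros(1,size(EMIS,2)); EMIS]`, so that the mandatory start state 1 immediately transitions according to the prior `p`; the Python function below is a hypothetical stand-in that builds the same structure. Because a new state is prepended, state indices returned by the augmented model are shifted up by one.

```python
def augment_with_prior(A, B, pi):
    """Add a start state 0 whose transition row is the prior pi and that emits nothing."""
    M = len(B[0])
    A_hat = [[0.0] + list(pi)] + [[0.0] + list(row) for row in A]
    B_hat = [[0.0] * M] + [list(row) for row in B]
    return A_hat, B_hat

# Hypothetical model with a non-uniform prior.
A = [[0.9, 0.1], [0.2, 0.8]]
B = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pi = [0.3, 0.7]
A_hat, B_hat = augment_with_prior(A, B, pi)
for row in A_hat:
    print(row)
```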
4. Data Analysis and Results
To investigate the performance of the HMM, predictions are generated
for corn spot prices over a 20-day period. The initial training data includes daily
prices beginning January 2005 and ending September 2011. This period was chosen
due to the corn ethanol mandates, which began in 2005. Since 2005, 40% of corn
production has been consumed by ethanol demand, which has led to a drastic increase in
price and caused prices to enter a new regime. It should be noted that all data was
gathered via DataStream and that the prediction period was kept short due to
computing power constraints.
For the corn prices, a step or “bucket” size of $0.005 was used, reflecting
the half-cent price increments reported in the commodity markets. A
maximum price of $8.00 and a minimum price of $1.50 were chosen for the training
data set, ensuring that all past prices would be encompassed.
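For a sense of scale, and assuming one hidden state per $0.005 bucket (as in the bucketing example of step 3 in Section 3), the chosen price window implies the state count below. The quadratic growth of the transition matrix is consistent with the computing-power constraint mentioned above; the exact count in the report's MATLAB code may differ by one depending on how the window edges are handled.

```python
from decimal import Decimal

low, high, step = Decimal("1.50"), Decimal("8.00"), Decimal("0.005")
n_buckets = int((high - low) / step)   # one hidden state per price bucket
print(n_buckets)        # 1300
print(n_buckets ** 2)   # 1690000 entries in an N x N transition matrix
```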
4.1 Performance Metrics
Several performance metrics were used in this study. First, the mean
absolute deviation and standard deviation of the differences between the predicted
and actual prices were calculated. Second, the accuracy in predicting the direction
of price movements was determined and recorded as a percentage.
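The metrics above can be computed as in this Python sketch. The function name and the four-day prices are hypothetical, and two definitions are assumptions: the standard deviation is taken over the absolute deviations, and a direction counts as correct only when the predicted and actual moves from the previous day's price have the same sign.

```python
from statistics import mean, stdev

def prediction_metrics(predicted, actual, previous):
    """Return (mean absolute deviation, std of absolute deviations, hit rate)."""
    abs_dev = [abs(p - a) for p, a in zip(predicted, actual)]
    # Direction is counted as correct when the predicted and actual moves
    # from the previous day's price have the same sign; a predicted
    # "no change" therefore never counts as a hit (assumption).
    hits = [(p - b) * (a - b) > 0 for p, a, b in zip(predicted, actual, previous)]
    return mean(abs_dev), stdev(abs_dev), sum(hits) / len(hits)

# Hypothetical four-day example (not the report's data).
previous  = [5.95, 6.00, 6.10, 6.05]   # prior day's actual price
actual    = [6.00, 6.10, 6.05, 6.20]
hmm_pred  = [6.05, 5.90, 6.00, 6.30]
walk_pred = previous                   # random walk: tomorrow = today
print(prediction_metrics(hmm_pred, actual, previous))
print(prediction_metrics(walk_pred, actual, previous))
```

Under this convention the random walk, which always predicts no change, scores 0% on direction, so the directional metric is mainly informative for the HMM.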
4.2 Results
Over the 20-day backtesting period, the HMM resulted in a mean
absolute deviation of $0.149 and a standard deviation of $0.117 when predicting
prices. The model also correctly predicted the direction of 33% of the price movements
during the period. To determine whether these results are of any value to a trader or
researcher, a random walk model was created and the results were compared. This model
simply uses the previous day’s price as the prediction for the current day’s price.
This approach produced slightly better predictions: the random walk mean
absolute deviation was $0.100 and the standard deviation was $0.098. While the
random walk method reduced the prediction error, a two-sample mean
hypothesis test was conducted to determine whether the difference was significant. The
test statistic is
$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}},$$
where $\bar{X}_1$ represents the mean HMM deviation and $\bar{X}_2$ represents the mean random
walk deviation. The degrees of freedom can be calculated by
$$\text{dof} = \frac{\left(\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}\right)^2}{\dfrac{1}{n_1 - 1}\left(\dfrac{S_1^2}{n_1}\right)^2 + \dfrac{1}{n_2 - 1}\left(\dfrac{S_2^2}{n_2}\right)^2}.$$
Using these calculations, the resulting p-value is 0.1597 for a two-tailed
test (Wackerly). At conventional significance levels such as 5%, this test clearly fails
to reject the null hypothesis. Therefore, there is not enough evidence to support a
significant difference between the two mean deviations.
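The test statistic and degrees of freedom above can be reproduced from the reported summary values. This Python sketch uses a hypothetical function name and assumes $n_1 = n_2 = 20$, i.e. one absolute deviation per backtesting day.

```python
from math import sqrt

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2      # S_1^2/n_1 and S_2^2/n_2
    t = (mean1 - mean2) / sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof

# Summary statistics from Section 4.2; n1 = n2 = 20 is an assumption.
t, dof = welch_t(0.149, 0.117, 20, 0.100, 0.098, 20)
print(round(t, 3), round(dof, 1))   # 1.436 36.9
```

With about 37 degrees of freedom, a t statistic near 1.44 corresponds to a two-tailed p-value of roughly 0.16, in line with the reported 0.1597. The exact p-value needs a Student-t CDF (for example from a statistics library), which this standard-library sketch leaves out.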
4.3 Issues to Consider
Because the model examines only prices, many data points that the model
trains over are not relevant to the backtesting period. Based on this principle, it
would be more appropriate to train only over prices that may have an impact on
predicting future prices. Therefore, a regime-switching model could be
incorporated into the HMM so that only relevant data is used for training. This
would also reduce the overall computations, enabling higher-order
processes to be conducted. Another issue may arise from the choice of bucket size. By
using the smallest available size of $0.005, overfitting could potentially occur, causing
the model to fail. To address this issue, further studies should be conducted
regarding the selection of an optimal size.
5. Trading Strategy
Although the difference between the random walk and HMM models lacks statistical
significance, a trading strategy was created to determine whether the model still provides
value. Before designing a strategy, one issue with the HMM process must be
addressed. If a state was not observed during training, the model predicts that the
state remains the same; i.e., it behaves the same as the random walk model. This is
relevant because a practitioner would not want to trade on ill-informed predictions.
Therefore, a Boolean filter is used to determine whether the model predicts a
stationary state: 1 if stationarity occurs, 0 if it does not.
If a period passes the previous filter (the model does not predict a stationary state),
several criteria must be met before a trade is initiated. First, predicted returns must be
calculated for all backtested periods that pass the filter. These returns are then compared
to a required return set by the user. For our testing, the required return was 1%,
which would represent a considerable gain for daily prices. Finally, the previous day’s
price is compared to the predicted price in order to determine a buy or sell action.
These values are then stored in a matrix and summed to ascertain the overall profit
from the trading strategy.
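The filtering logic can be sketched in Python as follows. The function name and the four-day inputs are hypothetical, and two details are assumptions rather than the report's stated rules: a sell signal is triggered when the predicted return is at or below -1% (i.e. the required return is compared in absolute value), and each trade is one unit entered at the previous day's price and closed at the actual price.

```python
def backtest(previous, predicted, actual, stationary, required_return=0.01):
    """Apply the filtered strategy; return (per-trade P/L list, total P/L)."""
    pnl = []
    for prev, pred, act, stay in zip(previous, predicted, actual, stationary):
        if stay:                                 # Boolean filter: 1 = no change predicted
            continue
        expected = (pred - prev) / prev          # predicted return
        if abs(expected) < required_return:      # below the required return
            continue
        direction = 1 if pred > prev else -1     # buy (+1) or sell (-1)
        pnl.append(direction * (act - prev))     # one unit entered at prev price
    return pnl, sum(pnl)

# Hypothetical four-day example (not the report's data).
previous   = [6.00, 6.00, 6.00, 6.00]
predicted  = [6.10, 6.02, 5.90, 6.00]
actual     = [6.05, 6.10, 6.03, 5.95]
stationary = [0, 0, 0, 1]
pnl, total = backtest(previous, predicted, actual, stationary)
print(pnl, round(total, 4))    # two trades: +0.05 (buy) and -0.03 (sell)
```

Day 2 is skipped because its predicted return (about 0.33%) is below the 1% requirement, and day 4 is skipped by the stationarity filter.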
Over the 20-day backtesting period, the trading strategy initiated five trades
for a total loss of $0.265. This was not surprising, considering the model predicted
the price direction correctly only 33% of the time. A visual representation of the price
predictions, errors, and trading points can be seen below.
6. Conclusion
This paper has used a discrete time-series Hidden Markov Model in an
attempt to predict corn spot prices. Over a 20-day backtesting period, the model
resulted in a mean absolute deviation of $0.149 and a standard deviation of $0.117
when predicting prices. In addition, the model predicted the correct price
movement 33% of the time. For comparison, a random walk model was constructed and
had a mean absolute deviation of $0.100 and a standard deviation of $0.098. While
the random walk model appeared to be better, a t-test found no statistically
significant difference between the random walk and HMM models. The final step was to
implement a trading strategy to determine the value of the HMM. After a
series of filters, the strategy resulted in a total loss of $0.265.
While the model did not produce advantageous results, several
improvements are proposed for future research. First, as noted before, a procedure
should be developed to determine an optimal bucket size to prevent overfitting the
model. Second, the model should include only relevant prices to prevent
computation lag. This could be accomplished through a regime-switching
algorithm. Third, higher-order methods could be added to the HMM. By including
the probabilities of states above and below the current state, a more informed
prediction may result. Similar to the bucket size criterion, an optimal number of
nearby states would need to be selected. Fourth, to improve the trading strategy,
the Boolean filter could require a minimum number of past
observations. For example, a trade would not occur if a state had not been observed
at least three times. This would enable the strategy to trade only on well-informed
predictions. In conclusion, this first attempt at using an HMM to predict commodity
prices does appear to provide mildly beneficial results and should be further
researched.
References
Jaroslav Lajos, K. M. George, and N. Park. 2011. A Six State HMM for the S&P 500
Stock Market Index. Oklahoma State University, 218 Mathematical Sciences,
Stillwater, OK.
Lawrence R. Rabiner. 1990. A tutorial on hidden Markov models and selected
applications in speech recognition. In Readings in Speech Recognition, Alex Waibel
and Kai-Fu Lee (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
267-296.
Md. Rafiul Hassan and Baikunth Nath. 2005. Stock Market Forecasting Using
Hidden Markov Model: A New Approach. In Proceedings of the 5th International
Conference on Intelligent Systems Design and Applications (ISDA '05). IEEE Computer
Society, Washington, DC, USA, 192-196.
Md. Rafiul Hassan. 2009. A combination of hidden Markov model and fuzzy
model for stock market forecasting. Neurocomputing 72, 16-18 (October 2009).
Dennis D. Wackerly, William Mendenhall, and Richard L. Scheaffer. 2002. Chapter
10.3. Mathematical Statistics with Applications. Duxbury, Pacific Grove, CA.