
The Implementation of a Hidden Markov Model in MATLAB for the Prediction of

Commodity Prices

2011 Report

Prepared by:
Taylor Sauder

Foster College of Business Administration

Bradley University

Report Distributed December 7, 2011


Abstract

The Hidden Markov Model offers an approach for modeling dynamic systems

that are observed through a time-series. In this paper, a general overview of Hidden

Markov Models is presented, followed by a tutorial for implementing a model in

MATLAB. To illustrate the application of the model, the technique is used to predict

commodity prices. In addition to the predictions, a trading strategy is produced to

determine the overall profit/loss over the backtesting period.

1. Introduction

The analysis and prediction of financial time series is of utmost importance

for professionals and academics in the field of finance. Using the concepts of discrete

time-series, a Hidden Markov Model (HMM) technique is used to model the dynamic

behavior of commodity prices. First, a set of historical data is used to initialize and

train the HMM. Secondly, the trained HMM will be used to predict future

commodity prices over a selected backtesting period. Finally, to determine the

accuracy and value of the model, a random walk model will be compared to the

generated HMM and a trading strategy will be implemented over a backtesting

period.

Section 2 of this paper describes the Hidden Markov Model process in a

general context. Section 2.1 provides a formal, mathematical explanation of the

model; Section 2.2 describes the fundamental issues that arise with HMMs and offers

well-established solutions. Section 3 provides a thorough summary of the HMM

approach for modeling commodity prices using MATLAB. Section 4 describes the

data analysis and general inputs for the model. Several performance metrics are

highlighted in Section 4.1 followed by the results in Section 4.2. After summarizing

the results, Section 4.3 calls attention to several issues with the modeling process.

Using the HMM output, a trading strategy is created and implemented in Section 5.

Finally, a summary of the findings and discussion of future studies is provided in

Section 6.

2. Hidden Markov Model Overview

Processes are often viewed as signals, which come from either one or

multiple sources. These signals may have fixed or unfixed parameters and may even

be corrupted at times. Therefore, a mathematical signal model can be used to

process the signal in hopes of deriving a statistical process that describes the source.

This first leads to the theory of Markov Chains, which can be extended further to

Hidden Markov Models. Leonard E. Baum and several colleagues first published the

theory of HMMs during the 1960s. In 1989, Lawrence Rabiner published his tutorial

on HMMs, which explained the theory in a more general context (Lawrence). This

ease of understanding propelled the use of HMMs across several fields of study

including speech recognition, speech analysis, video analysis, photo analysis, and

biology. While most research is applied to fields outside of finance, it often deals

with time series analysis; therefore the transition requires little effort.

2.1 Formal Definition

While a Markov process does model a stochastic process, it only observes the process

that generates the observations rather than describing or predicting it. Herein lies

the importance of the Hidden Markov Model. The HMM records observations as

probability functions of the state, where the source is described as a hidden

stochastic process.

HMMs are characterized by the following five elements:

1) The number $N$ of hidden states in the model. Each state corresponds to a distinct condition of the system being modeled. In this paper's model, the states are defined by price buckets.

2) The number $M$ of distinct observation symbols per state. These symbols are denoted $V = \{v_1, v_2, \ldots, v_M\}$. This can be thought of as the number of observations that fall in each price bucket.

3) The state transition probability distribution $A = \{a_{ij}\}$, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$.

4) The emission probability distribution in state $j$, $B = \{b_j(k)\}$, where $b_j(k) = P(v_k \text{ at } t \mid q_t = S_j)$, $1 \le j \le N$, $1 \le k \le M$.

5) The prior probability distribution $\pi = \{\pi_i\}$ of being in state $i$ at the beginning of the observations, where $\pi_i = P(q_1 = S_i)$, $1 \le i \le N$.

The values of $N$, $M$, $A$, $B$, and $\pi$ can be used to generate the observation sequence $O = O_1 O_2 O_3 \ldots O_T$, where each $O_t$ is an observation from $V$ and $T$ is the number of observations in the sequence (Md, Stock). To initiate an HMM, an initial state $q_1 = S_i$ is chosen according to the prior distribution $\pi$ and $t$ is set to 1. The observation $O_t = v_k$ is then chosen according to the emission distribution of $S_i$. The model moves to state $q_{t+1} = S_j$ according to the transition probability distribution of $S_i$. This process repeats as $t$ increments until termination. More compactly, the model is denoted by $\lambda = (A, B, \pi)$, where:

$$\sum_{j} a_{ij} = 1, \quad \sum_{k} b_i(k) = 1, \quad \sum_{i} \pi_i = 1, \quad a_{ij},\, b_i(k),\, \pi_i \ge 0 \text{ for all } i, j, k$$
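The generation procedure just described can be sketched in a few lines. The paper's implementation is in MATLAB; the plain-Python version below is only an illustration, and the two-state, three-symbol model values, the function names, and the seed are all hypothetical.

```python
import random

def sample_index(probs, rng):
    """Draw an index according to a discrete probability vector."""
    u = rng.random()
    cumulative = 0.0
    for idx, p in enumerate(probs):
        cumulative += p
        if u < cumulative:
            return idx
    return len(probs) - 1  # guard against floating-point round-off

def generate(A, B, pi, T, seed=0):
    """Generate T hidden states and T observation symbols from lambda = (A, B, pi)."""
    rng = random.Random(seed)
    states, observations = [], []
    state = sample_index(pi, rng)                         # q_1 drawn from the prior
    for _ in range(T):
        states.append(state)
        observations.append(sample_index(B[state], rng))  # O_t drawn from b_i(.)
        state = sample_index(A[state], rng)               # q_{t+1} drawn from row i of A
    return states, observations

# Hypothetical model, chosen only for illustration (each row sums to 1).
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
states, observations = generate(A, B, pi, T=10)
```

Fixing the seed makes the sampled sequence reproducible, which is convenient when the generated data are later used to check a training routine.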

2.2 HMM Fundamental Issues

There are three fundamental issues regarding HMMs that must be solved before an

HMM can be used.

1) Given $\lambda = (A, B, \pi)$, how can $P(O \mid \lambda)$ be computed for a given observation sequence $O$?

2) How can a state sequence $Q = q_1 q_2 \ldots q_T$ be chosen that best explains the observation sequence $O$ under the model $\lambda$?

3) How can $P(O \mid \lambda)$ be maximized by adjusting the model parameters $A$, $B$, and $\pi$?

The first question can be answered by brute-force enumeration of every possible state sequence, but the number of calculations grows exponentially with $T$, so a more efficient procedure is needed (Lawrence). Therefore, the Forward-Backward (FB) algorithm is used. The FB algorithm uses induction to build two sets of probabilities, the forward and the backward. Together, these are used to create a smoothed set of values. In this specific case, only the forward probabilities are needed: $\alpha_T(i) = P(O_1 O_2 \ldots O_T, q_T = S_i \mid \lambda)$, initialized by $\alpha_1(i) = \pi_i b_i(O_1)$ for $1 \le i \le N$, so that $P(O \mid \lambda) = \sum_{i=1}^{N} \alpha_T(i)$.
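A minimal Python sketch of the forward recursion may help make this concrete. It uses the standard induction step $\alpha_{t+1}(j) = \left[\sum_i \alpha_t(i)\, a_{ij}\right] b_j(O_{t+1})$; the model values and the function name are hypothetical, and no scaling is applied, so it is only suitable for short sequences.

```python
def forward_likelihood(A, B, pi, obs):
    """Return P(O | lambda) using only the forward probabilities."""
    n = len(pi)
    # Initialization: alpha_1(i) = pi_i * b_i(O_1)
    alpha = [pi[i] * B[i][obs[0]] for i in range(n)]
    # Induction: alpha_{t+1}(j) = (sum_i alpha_t(i) * a_ij) * b_j(O_{t+1})
    for o in obs[1:]:
        alpha = [sum(alpha[i] * A[i][j] for i in range(n)) * B[j][o]
                 for j in range(n)]
    # Termination: P(O | lambda) = sum_i alpha_T(i)
    return sum(alpha)

# Hypothetical two-state, three-symbol model, for illustration only.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
likelihood = forward_likelihood(A, B, pi, [0, 1, 2])  # about 0.03628
```

The work per time step is proportional to $N^2$, so the total cost is linear in $T$ rather than exponential.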

The second question can be answered with the Viterbi algorithm. Viterbi is commonly used because it accounts for both the likelihood of each state at every instant and the probability of the transitions within the sequence of states. The algorithm finds the most probable state sequence $Q$ for a given observation sequence $O$ by induction. An array $\psi_t(j)$ is used to store the highest-probability paths (Lawrence).
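The following Python sketch shows one way the Viterbi induction and the back-pointer array $\psi_t(j)$ fit together. The model values and names are hypothetical, and raw probabilities are used instead of logarithms, which is adequate only for short sequences.

```python
def viterbi(A, B, pi, obs):
    """Return the most probable state path and its joint probability."""
    n = len(pi)
    delta = [pi[i] * B[i][obs[0]] for i in range(n)]  # best path probability ending in i
    psi = []                                          # back-pointers, one list per step
    for o in obs[1:]:
        back, new_delta = [], []
        for j in range(n):
            # Best predecessor of state j at this step
            best_i = max(range(n), key=lambda i: delta[i] * A[i][j])
            back.append(best_i)
            new_delta.append(delta[best_i] * A[best_i][j] * B[j][o])
        psi.append(back)
        delta = new_delta
    # Backtrack from the most probable final state
    last = max(range(n), key=lambda i: delta[i])
    path = [last]
    for back in reversed(psi):
        path.append(back[path[-1]])
    path.reverse()
    return path, delta[last]

# Hypothetical two-state, three-symbol model, for illustration only.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
path, prob = viterbi(A, B, pi, [0, 1, 2])  # path [0, 0, 1], probability about 0.01512
```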

The third question can be answered through the Baum-Welch method, the

Expectation-Maximization (EM) method, or gradient techniques. Rabiner claims, “There

is no known way to analytically solve for the model which maximizes the probability

of the observation sequence. In fact, given the finite observation sequence as

training data, there is no optimal way of estimating the model parameters”

(Lawrence). The Baum-Welch method can be used to select $\lambda = (A, B, \pi)$ such that $P(O \mid \lambda)$ is locally maximized through an iterative process. The Baum-Welch method uses the Forward-Backward algorithm to create a re-estimated model $\bar{\lambda} = (\bar{A}, \bar{B}, \bar{\pi})$. By repeatedly using $\bar{\lambda}$ in place of $\lambda$, the probability of $O$ being observed from the model can be increased up to some limiting point. This can be expressed as

$$\max_{\bar{\lambda}} \left[ Q(\lambda, \bar{\lambda}) \right] \;\Rightarrow\; P(O \mid \bar{\lambda}) \ge P(O \mid \lambda)$$

where the likelihood function converges to a critical point (Lawrence).
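To illustrate the re-estimation idea, the sketch below performs a few Baum-Welch updates on a single short observation sequence in plain Python. It is not the MATLAB routine used in the paper; the starting model, the observation sequence, and the function names are hypothetical, and no scaling is applied, so it would underflow on long sequences.

```python
def forward(A, B, pi, obs):
    n = len(pi)
    alpha = [[pi[i] * B[i][obs[0]] for i in range(n)]]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append([sum(prev[i] * A[i][j] for i in range(n)) * B[j][o]
                      for j in range(n)])
    return alpha

def backward(A, B, obs):
    n = len(A)
    beta = [[1.0] * n]                       # beta_T(i) = 1
    for o in reversed(obs[1:]):
        nxt = beta[0]
        beta.insert(0, [sum(A[i][j] * B[j][o] * nxt[j] for j in range(n))
                        for i in range(n)])
    return beta

def baum_welch_step(A, B, pi, obs):
    """One re-estimation; also returns P(O | lambda) for the input model."""
    n, m, T = len(A), len(B[0]), len(obs)
    alpha, beta = forward(A, B, pi, obs), backward(A, B, obs)
    likelihood = sum(alpha[-1])
    # gamma_t(i): probability of being in state i at time t given O
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi_t(i, j): probability of moving from i at t to j at t + 1 given O
    xi = [[[alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] / likelihood
            for j in range(n)] for i in range(n)] for t in range(T - 1)]
    new_pi = list(gamma[0])
    new_A = [[sum(xi[t][i][j] for t in range(T - 1)) /
              sum(gamma[t][i] for t in range(T - 1))
              for j in range(n)] for i in range(n)]
    new_B = [[sum(gamma[t][j] for t in range(T) if obs[t] == k) /
              sum(gamma[t][j] for t in range(T))
              for k in range(m)] for j in range(n)]
    return new_A, new_B, new_pi, likelihood

# Hypothetical starting model and observation sequence, for illustration only.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
pi = [0.6, 0.4]
obs = [0, 1, 2, 2, 1, 0, 0, 2]
likelihoods = []
for _ in range(5):
    A, B, pi, p = baum_welch_step(A, B, pi, obs)
    likelihoods.append(p)  # likelihood of the model before this update
```

Each entry of `likelihoods` should be no smaller than the one before it, which is the monotonicity property stated above.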

3. HMM Implementation in MATLAB

To implement the HMM process, MATLAB offers various algorithms within

the Statistics Toolbox. The implementation is summarized

below in 12 steps.

1. Select a dataset to be modeled and import it into MATLAB.

2. Select a time frame for the model to be trained over and a period to

backtest over. This requires knowledge of the dataset and may be

influenced by computing power. Note: each period of backtesting

requires the entire process to be repeated.

3. Determine high and low price windows and bucket sizes for the

observation prices. For example, a bucket size of $1.00 with windows of -$5

and $5 will create the buckets -5:-4, -4:-3, …, -1:0, 0:1, …, 3:4, 4:5.

4. Compute the observations-per-state matrix M.

5. Create the transition probability matrix, where $a_{ij} = P(q_{t+1} = S_j \mid q_t = S_i)$, $1 \le i, j \le N$. Within this model, the probabilities are estimated from discrete counts, so a state that is never observed leads to a division by zero when its row is normalized. To circumvent this issue, a loop replaces the affected transition values with zero.

6. Create the emission probability matrix. These values are assumed to be

normal where the mean and variance are derived from matrix M.

7. Create a prior probability matrix where the discrete probabilities are

derived from matrix M.

8. By default, MATLAB begins the HMM algorithms at state 1. To assign a

different distribution of probabilities, the transition and emission

matrices are augmented to include the prior matrix.

9. To generate a sequence of states $Q$ and observations $O$, the function

hmmgenerate(length, A, B) is used, where A and B represent the

transition and emission matrices, respectively (Jaroslav).

10. Using the observation sequence, hmmtrain(O, A, B)

calculates the maximum likelihood estimates of the transition and

emission probabilities by means of the Baum-Welch algorithm, given

the initial transition and emission matrices (Jaroslav).

11. To predict the future state and associated price, the estimated transition

matrix is used along with the current state. An algorithm examines the

current state then searches through the transition matrix to find the

maximum likelihood for the next state. Once determined, the future state

and associated prices are stored in a matrix that remains unaffected by the

backtesting process. In other words, it does not update along with the

transition and emission matrices during the process.

12. Repeat Steps 4-11 until all of the backtesting data has an associated

predicted state and price.
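Steps 3, 5, and 11 above can be sketched as follows. This is a plain-Python illustration rather than the MATLAB code behind the paper: the price series, the $2.00 to $4.00 window, the $0.50 bucket size, and all function names are hypothetical. The rule that an unobserved state predicts no change follows the behavior described in Section 5.

```python
import math

def to_state(price, low, bucket_size):
    """Map a price to a zero-based bucket index (the state)."""
    # The small epsilon guards against round-off exactly at a bucket edge.
    return int(math.floor((price - low) / bucket_size + 1e-9))

def transition_matrix(states, n_states):
    """Estimate a_ij from counts of observed state-to-state moves."""
    counts = [[0] * n_states for _ in range(n_states)]
    for cur, nxt in zip(states, states[1:]):
        counts[cur][nxt] += 1
    A = []
    for row in counts:
        total = sum(row)
        # A state that was never left has total == 0; zeros avoid a 0/0 division.
        A.append([c / total if total else 0.0 for c in row])
    return A

def predict_next_state(A, current):
    """Pick the most likely next state; an unobserved state predicts no change."""
    row = A[current]
    if max(row) == 0.0:
        return current
    return max(range(len(row)), key=lambda j: row[j])

# Hypothetical prices and bucket settings, for illustration only.
prices = [2.10, 2.60, 2.40, 2.70, 3.20, 2.90, 2.20]
low, high, bucket_size = 2.00, 4.00, 0.50
n_states = int(round((high - low) / bucket_size))        # 4 buckets
states = [to_state(p, low, bucket_size) for p in prices]
A = transition_matrix(states, n_states)
next_state = predict_next_state(A, states[-1])
predicted_price = low + (next_state + 0.5) * bucket_size  # bucket midpoint (an assumption)
```

Using the bucket midpoint as the associated price is one possible convention; the paper does not state which point of the bucket it uses.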

For a more in-depth overview see Appendix I.
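The augmentation described in Step 8 can be sketched as below. The idea, as commonly described for MATLAB's HMM functions, is to add an artificial start state that emits nothing and transitions into the real states according to the prior. The Python helper and the model values are hypothetical, and the paper's own construction may differ in detail.

```python
def augment_with_prior(A, B, prior):
    """Prepend a non-emitting start state whose outgoing row is the prior."""
    m = len(B[0])
    # New first row: never return to the start state, enter state i with prior[i].
    A_hat = [[0.0] + list(prior)] + [[0.0] + list(row) for row in A]
    # The start state emits no symbols, so its emission row is all zeros.
    B_hat = [[0.0] * m] + [list(row) for row in B]
    return A_hat, B_hat

# Hypothetical two-state, three-symbol model, for illustration only.
A = [[0.7, 0.3], [0.4, 0.6]]
B = [[0.5, 0.4, 0.1], [0.1, 0.3, 0.6]]
prior = [0.6, 0.4]
A_hat, B_hat = augment_with_prior(A, B, prior)
```

Because the chain always begins in the artificial state, the first real state is distributed according to the prior rather than fixed at state 1.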

4. Data Analysis and Results

To investigate the performance of the HMM, predictions are generated

for corn spot prices over a 20-day period. The initial training data include daily

prices beginning January 2005 and ending September 2011. This period was chosen

due to corn ethanol mandates, which began in 2005. Since 2005, 40% of corn

production has been consumed by ethanol demand, which has led to a drastic increase in

price and caused prices to enter a new regime. It should be noted that all data were

gathered via DataStream and the prediction period was kept short due to

computing power constraints.

For the corn prices, a step or “bucket” size of $0.005 was used. This reflects

the half-cent price increments reported in the commodity markets. Also, a

maximum price of $8 and a minimum price of $1.50 were chosen for training.

This ensured that all past data would be encompassed.

4.1 Performance Metrics

Several performance metrics were used in this study. First, the mean

absolute deviation and standard deviation between the predicted and actual prices

were calculated. Second, the accuracy in predicting the direction of price

movements was determined and recorded as a percentage.
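These metrics can be computed as in the Python sketch below. The exact definitions are assumptions: the standard deviation is taken over the absolute prediction errors, and a direction counts as correct when the predicted move from the previous actual price has the same sign as the realized move. The price values are made up for illustration.

```python
import statistics

def absolute_errors(predicted, actual):
    return [abs(p - a) for p, a in zip(predicted, actual)]

def directional_accuracy(predicted, actual):
    """Share of days where the predicted and realized moves have the same sign."""
    def sign(x):
        return (x > 0) - (x < 0)
    hits, total = 0, 0
    for t in range(1, len(actual)):
        predicted_move = predicted[t] - actual[t - 1]
        realized_move = actual[t] - actual[t - 1]
        hits += sign(predicted_move) == sign(realized_move)
        total += 1
    return hits / total

# Hypothetical prices: predicted[t] is the forecast made for actual[t].
actual = [6.00, 6.10, 6.05, 6.20]
predicted = [6.02, 6.05, 6.12, 6.15]
errors = absolute_errors(predicted, actual)
mad = statistics.fmean(errors)      # mean absolute deviation, about 0.0475
sd = statistics.stdev(errors)       # sample standard deviation of the errors
accuracy = directional_accuracy(predicted, actual)  # 2 of 3 moves
```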

4.2 Results

Over the 20-day backtesting period, the HMM produced a mean

absolute deviation of $0.149 and a standard deviation of $0.117 when predicting

prices. The model also correctly predicted the direction of 33% of the price movements during the

period. To determine whether these results are of any value to a trader or researcher, a

random walk model was created and the results were compared. This model

predicts each day’s price to be the previous day’s price.

This approach yielded slightly better predictions. The random walk mean

absolute deviation was $0.100 and the standard deviation was $0.098. While the

random walk method has reduced the prediction error, a two-sample mean

hypothesis test was conducted to determine if the difference was significant. The test statistic can be derived from

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2}}}$$

where $\bar{X}_1$ represents the HMM deviation and $\bar{X}_2$ represents the random walk deviation. The degrees of freedom can be calculated by

$$\mathit{dof} = \frac{\left( \dfrac{S_1^2}{n_1} + \dfrac{S_2^2}{n_2} \right)^2}{\dfrac{1}{n_1 - 1} \left( \dfrac{S_1^2}{n_1} \right)^2 + \dfrac{1}{n_2 - 1} \left( \dfrac{S_2^2}{n_2} \right)^2}.$$

Using these calculations, the resulting p-value is 0.1597 for a two-tailed

test (Wackerly). This test clearly fails to reject the null hypothesis. Therefore, there

is not enough evidence to support a significant difference between the two mean

deviations.
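As a rough check of the test above, the reported summary statistics can be plugged into the unequal-variance t statistic and its degrees of freedom. The sample sizes are an assumption here: $n_1 = n_2 = 20$, one deviation per backtesting day. The function name is hypothetical.

```python
import math

def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Two-sample t statistic and approximate degrees of freedom (unequal variances)."""
    v1, v2 = sd1 ** 2 / n1, sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    dof = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, dof

# HMM: mean 0.149, SD 0.117; random walk: mean 0.100, SD 0.098; n = 20 each (assumed).
t_stat, dof = welch_t(0.149, 0.117, 20, 0.100, 0.098, 20)
```

Under these assumptions the statistic comes out at about 1.44 on roughly 37 degrees of freedom, well short of the two-tailed 5% critical value of about 2.03, which agrees with the conclusion that the null hypothesis is not rejected.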

4.3 Issues to Consider

Because the model examines only prices, many data points that the model

trains over are not relevant to the backtesting period. Based on this principle, it

would be more relevant to train only over prices that may have an impact on

predicting future prices. Therefore, a regime switching model could be

incorporated into the HMM so that only relevant data is used for training. This

would also reduce the overall computations enabling the potential for higher order

processes to be conducted. Another issue may arise within the bucket sizes. By

using the smallest available size of $0.005, overfitting could potentially occur, causing

the model to fail. To address this issue, further studies should be conducted

regarding the selection of an optimum size.

5. Trading Strategy

While there is a lack of statistical significance between the random walk and

HMM model, a trading strategy was created to determine if the model still provides

value. Before designing a strategy, one issue with the HMM process must be

addressed. If a state has not been observed during training, the model

predicts that the state remains the same; that is, it behaves the same as the random

walk model. This is relevant because a practitioner would not want to trade on

ill-informed predictions. Therefore, a Boolean filter is used to determine whether the

model predicts a stationary state. The filter takes the value 1 if stationarity occurs and 0

if it does not.

If the previous filter is satisfied, then several criteria must be met before a

trade is initiated. First, predicted returns must be calculated for all backtested

periods where the previous filter is satisfied. These returns are then compared to

a required return set by the user. For our testing, the required return was 1%,

which would represent a considerable gain for daily prices. The final step

compares the previous day’s price to the predicted price in order to determine a buy

or sell action. The resulting values are then stored in a matrix and summed to ascertain

the overall profit from the trading strategy.
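The filtering and trade logic can be sketched as follows. Several details are assumptions rather than the paper's stated rules: a prediction equal to the previous price is treated as stationary and skipped, the required return is compared against the absolute predicted return, and each position is closed at the next actual price. The function name and the price values are hypothetical.

```python
def backtest(prev_prices, predicted, actual, required_return=0.01):
    """Return the per-trade profit or loss for a simple long/short rule."""
    trades = []
    for prev, pred, act in zip(prev_prices, predicted, actual):
        if pred == prev:
            continue  # stationary prediction: no information, so no trade
        expected_return = (pred - prev) / prev
        if abs(expected_return) < required_return:
            continue  # predicted move is too small to justify a trade
        side = 1 if pred > prev else -1    # +1 buys, -1 sells
        trades.append(side * (act - prev))  # position closed at the actual price
    return trades

# Hypothetical prices, for illustration only.
prev_prices = [6.00, 6.10, 6.05, 6.20]
predicted = [6.00, 6.20, 6.06, 6.10]
actual = [6.10, 6.05, 6.20, 6.12]
trades = backtest(prev_prices, predicted, actual)
total_profit = sum(trades)  # one losing buy and one winning sell
```

In this toy example the first day is filtered out as stationary and the third fails the 1% hurdle, leaving two trades.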

Over the 20-day backtesting period, the trading strategy initiated five trades

for a total loss of $0.265. This was not surprising considering the model predicted

the price direction correctly 33% of the time. A visual representation of the price

predictions, errors, and trading points can be seen below.

6. Conclusion

This paper has used a discrete time-series Hidden Markov Model in an

attempt to predict corn spot prices. Over a 20-day backtesting period, the model

produced a mean absolute deviation of $0.149 and a standard deviation of $0.117

when predicting prices. In addition, the model predicted the correct price

movement 33% of the time. However, a random walk model was constructed and

had a mean absolute deviation of $0.100 and standard deviation of $0.098. While

the random walk model appeared to be better, a t-test showed no statistically

significant difference between the random walk and the HMM. The final step was to

implement a trading strategy to determine the value of the HMM model. After a

series of filters, the strategy resulted in a total loss of $0.265.

While the model did not produce advantageous results, several

improvements are proposed for future research. First, as noted before, a procedure

should be generated to determine an optimum bucket size to prevent overfitting the

model. Second, the model should include only relevant prices to prevent

computation lag. This process could be accomplished through a regime-switching

algorithm. Third, higher order methods could be added to the HMM. By including

the probabilities of states above and below the current state, a more informed

prediction may result. Similar to the bucket size criteria, an optimum number of

nearby states would need to be selected. Fourth, to improve the trading strategy,

the Boolean filter could be improved by requiring a minimum number of past

observations. For example, a trade would not occur if a state had not been observed

at least three times. This would enable the strategy to trade only on well-informed

predictions. In conclusion, the first attempt at using an HMM to predict commodity

prices does appear to provide mildly beneficial results and should be further

researched.

References

Jaroslav Lajos, K. M. George, N. Park. 2011. A Six State HMM for the S&P 500

Stock Market Index. Oklahoma State University. 218 Mathematical Sciences.

Stillwater, OK.

Lawrence R. Rabiner. 1990. A tutorial on hidden Markov models and selected

applications in speech recognition. In Readings in speech recognition, Alex Waibel

and Kai-Fu Lee (Eds.). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA

267-296.

Md. Rafiul Hassan and Baikunth Nath. 2005. StockMarket Forecasting Using

Hidden Markov Model: A New Approach. In Proceedings of the 5th International

Conference on Intelligent Systems Design and Applications (ISDA '05). IEEE Computer

Society, Washington, DC, USA, 192-196.

Md. Rafiul Hassan. 2009. A combination of hidden Markov model and fuzzy

model for stock market forecasting. Neurocomput. 72, 16-18 (October 2009),

Wackerly, Dennis D., William Mendenhall, and Richard L. Scheaffer. "Chapter

10.3." Mathematical Statistics with Applications. Pacific Grove, CA: Duxbury, 2002.

Print.

