
Clustering Techniques for Financial Diversification

Zach Howard
Keith Johansen

March 16, 2009

Abstract
Holistic approaches to cluster formation for portfolio diversification rely too strongly on
qualitative, opinion-driven analysis. Qualitative techniques use factors such as industry, size,
and growth forecasts to form clusters, but these factors are poor proxies for how similarly a
group of stocks moves in response to market conditions. The quantitative techniques applied in
this paper instead attempt to form clusters from data matrices built on factors that relate
directly to the movement of a stock. This eliminates the poor proxy factors of holistic analysis
and bases the clustering on the true predictive factors of stock movement.

1 Introduction
In portfolio formation it is often advantageous to diversify by selecting stocks from several
groups that vary in how they respond to market conditions, so that the portfolio can be profitable
in positive, negative, and neutral markets. The clustering techniques described in this paper form
groups of stocks that are most like one another, using two different data matrices as input. A
fixed number of stocks is then selected each week from each of the resulting clusters to calculate
the returns achieved by the clustering technique. These returns are compared to several benchmarks
to determine whether they are significant.

2 Data Sources
The data for this project comes from the EOD Data repository (http://eoddata.com/). The chosen
data includes 1357 stocks from the NASDAQ exchange with data points from January 2nd, 1998 to
December 31st, 2002. These 1357 stocks do not comprise the entire NASDAQ; because of missing
values, only stocks with data points over the entire period are included.

3 Data Transformations
Two distinct data transformations were used to form clusters.

We form a data matrix X on which to perform the clustering; it contains the entire set of data for
the analysis. The formation of X varies based on which transformation is being used.
A moving window of training and testing data is used: the 52 weeks that serve as testing data in
one period become the training data in the subsequent period. This moving window keeps the testing
data out of sample during each testing period.
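
As an illustration, the window bookkeeping might be organized as in the following Python sketch;
the function name and structure are ours, not the paper's.

```python
# Hypothetical sketch of the 52-week moving window; assumes weekly
# observations are already ordered by time.
def moving_windows(n_weeks, window=52):
    """Yield (train, test) week-index ranges; each test window becomes
    the training window of the following period."""
    start = 0
    while start + 2 * window <= n_weeks:
        yield range(start, start + window), range(start + window, start + 2 * window)
        start += window  # slide forward by one full window

for train_weeks, test_weeks in moving_windows(260):
    pass  # cluster on the 52 training weeks, then trade over the 52 test weeks
```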

3.1 Log Returns


X = [log(R_{i,t})]

where R_{i,t} is the return of stock i at time t, with i = 1..1357 and t = 1..260.

This results in a matrix in which the rows are individual stocks and the columns are changes over
time periods for that stock. The full matrix is 1357 by 260. The input to the clustering at each
time period is 1357 by 52: the factors are the log values of the returns for the preceding 52
weeks. The log transform is taken to reduce variation in the data and make it more nearly normally
distributed.

There are no outliers present in this data matrix.
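
A minimal sketch of this transformation, assuming R_{i,t} is read as the gross weekly return
P_{i,t}/P_{i,t-1} and that prices is a 1357-by-261 array of weekly closing prices (both
assumptions ours):

```python
import numpy as np

# prices: (1357 stocks x 261 weeks) of weekly closing prices (assumed input);
# random data stands in here so the sketch runs on its own.
prices = np.random.default_rng(0).lognormal(mean=3.0, sigma=0.1, size=(1357, 261))

# Log of the weekly return; the log compresses variation and makes the
# factors closer to normally distributed.
X = np.log(prices[:, 1:] / prices[:, :-1])  # shape (1357, 260)

# One clustering period uses the preceding 52 weeks as its factors.
X_period = X[:, :52]                        # shape (1357, 52)
```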

3.2 Autocorrelation Measures


Autocorrelation measures are helpful for discovering patterns in time series data that are obscured
by noise.

X = [ρ_j(R_i) | φ_j(R_i) | ρ_j(R_i²) | φ_j(R_i²)],   j = 1..14

where R_i is the time series of prices for stock i, j is the number of lags used in calculating
the respective correlation functions, ρ_j(·) is the autocorrelation function, and φ_j(·) is the
partial autocorrelation function.

The rows of X are individual stocks; the columns are the autocorrelation measures stated above.
There are 280 columns for each stock, since each of the four autocorrelation functions is
calculated at 14 lags for each of the five yearly periods. At each training period the factors in
the clustering are the preceding 56 entries in the matrix. These 56 dimensions represent only 14
weeks of data.

There are outliers present in this data matrix.
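
One way to build such a feature row, sketched here with statsmodels' acf and pacf functions (the
helper name and the nlags default are our assumptions):

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

def autocorr_features(r, nlags=14):
    """Build one stock's 56-dimensional feature row for a training period.
    acf/pacf return the lag-0 value first, so it is dropped to keep lags 1..14."""
    return np.concatenate([
        acf(r, nlags=nlags)[1:],        # rho_j(R_i)
        pacf(r, nlags=nlags)[1:],       # phi_j(R_i)
        acf(r ** 2, nlags=nlags)[1:],   # rho_j(R_i^2), sensitive to volatility patterns
        pacf(r ** 2, nlags=nlags)[1:],  # phi_j(R_i^2)
    ])

# series: iterable of per-stock time series for the current period (assumed)
# X = np.vstack([autocorr_features(r) for r in series])
```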

4 Selection Metric Within a Cluster


The relative strength index (RSI) is a well-established method for choosing stocks that have
exhibited positive movement in the recent past. This metric is applied at the end of each week to
select the best stocks to hold for the upcoming week; only long positions are considered. We use
the Cutler RSI, which involves simple moving averages. The chosen stocks are those with the
highest RSI measures:

RSI = SMA(U) / SMA(D)

where SMA denotes a simple moving average. For each period, U and D are calculated: for an up
period, U_{i,t} = R_{i,t} − R_{i,t−1} and D_{i,t} = 0; for a down period, U_{i,t} = 0 and
D_{i,t} = R_{i,t−1} − R_{i,t}, where R_{i,t} is the closing price at the end of period t for
stock i.
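
A small sketch of this metric, assuming closes holds one stock's weekly closing prices; the
averaging length n is not stated above, so the conventional 14 is assumed here:

```python
import numpy as np

def cutler_rs(closes, n=14):
    """Ratio of simple moving averages of up-moves to down-moves (as above).
    The familiar RSI = 100 - 100/(1 + RS) is a monotone transform of this
    ratio, so ranking stocks by either quantity picks the same stocks."""
    diffs = np.diff(closes)
    U = np.where(diffs > 0, diffs, 0.0)    # gains on up periods
    D = np.where(diffs < 0, -diffs, 0.0)   # losses on down periods
    sma_u = U[-n:].mean()                  # simple moving averages distinguish
    sma_d = D[-n:].mean()                  # Cutler's RSI from Wilder's smoothed RSI
    return sma_u / sma_d if sma_d > 0 else np.inf
```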

5 Benchmarks
5.1 Market Index
The return on a NASDAQ market index is used as a general measure of market performance over a
period. The index allows us to compare the performance of our cluster-diversified portfolio to the
return of the market as a whole, in order to separate the gains from our methods from the overall
performance of the market.

5.2 Random Clusters and No Clusters


To isolate the effects of the clustering techniques from the effects of the selection metric within a
cluster, we will randomly create clusters and select stocks from these clusters in the same manner
as we did from the machine learning based clusters.

We will also use a benchmark formed by using the RSI measure to select stocks each week without
forming clusters.

If these benchmarks were not used, it could be argued that the selection metric alone was powerful
enough to select the best stocks, so that clustering and then selecting would be equivalent to
applying the metric to the total universe of stocks without clustering.
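
A sketch of the random-cluster benchmark under these definitions; the stand-in RSI values and the
6-cluster/4-stock split are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
n_stocks, k, per_cluster = 1357, 6, 4

labels = rng.integers(0, k, size=n_stocks)  # uniformly random cluster labels
rsi = rng.random(n_stocks)                  # stand-in for the per-stock RSI values

picks = []
for c in range(k):
    members = np.flatnonzero(labels == c)
    # same selection rule as for the learned clusters: top RSI within cluster
    picks.extend(members[np.argsort(rsi[members])[-per_cluster:]])
# len(picks) == 24
```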

6 Clustering Methods
6.1 K-means
The K-Means algorithm is a clustering technique that forms clusters by iteratively assigning each
point to a mean and then re-computing each mean based on the points assigned to it.

In the standard version of K-Means, a user-supplied number of means, K, are initialized randomly
throughout the space of the data. So, for each dimension, d, of each mean, m:

P(M_d = x) = Uniform[MIN_d, MAX_d](x)    (1)

where MIN_d and MAX_d are the minimum and maximum values of dimension d in the dataset.

At each iteration, each data point is assigned to the closest mean. Euclidean distance is used to
determine closeness, with each dimension corresponding to a metric for the stock (either the log of
the change in price, or the autocorrelation measure).

After assigning (or re-assigning) each data point, the means must be adjusted. Each mean is
recomputed by averaging all of the points currently assigned to it. So each dimension, d, of a new
mean, m, with n data points assigned to it, is computed as:

m_d = ( Σ_{p∈m} p_d ) / n    (2)

The assignment and mean-adjustment steps are repeated until the means no longer change.
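
A minimal K-Means sketch following Equations (1) and (2); this is our own illustrative
implementation, not the code used in the experiments:

```python
import numpy as np

def kmeans(X, k, rng=None):
    rng = rng or np.random.default_rng(0)
    lo, hi = X.min(axis=0), X.max(axis=0)
    means = rng.uniform(lo, hi, size=(k, X.shape[1]))            # Eq. (1)
    while True:
        # assign every point to its closest mean (Euclidean distance)
        dists = np.linalg.norm(X[:, None, :] - means[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # recompute each mean as the average of its assigned points (Eq. (2))
        new_means = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else means[c]
            for c in range(k)
        ])
        if np.allclose(new_means, means):                        # no change: done
            return labels, means
        means = new_means
```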

6.2 K-means++
The K-Means++ algorithm is a modified version of K-Means (see
http://www.stanford.edu/~darthur/kMeansPlusPlus.pdf). It behaves identically, except that the
means are more carefully initialized: instead of scattering the means randomly, each mean is
initialized on top of a data point chosen with probability weighted by the point's squared
distance to the nearest mean.

The first mean is initialized randomly, as in standard K-Means. Then each successive mean, m, is
initialized as equal to a data point, p′, from the set of all points P, with probability:

P(m = p′) = D(p′)² / Σ_{p∈P} D(p)²    (3)

where D(p) denotes the distance from point p to the nearest mean chosen so far.

Once all means are initialized, the algorithm continues with the standard K-Means algorithm.
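
A sketch of this initialization; here the first mean is taken as a uniformly random data point
(the canonical K-Means++ choice), and the names are ours:

```python
import numpy as np

def kmeanspp_init(X, k, rng=None):
    rng = rng or np.random.default_rng(0)
    means = [X[rng.integers(len(X))]]          # first mean: random data point
    for _ in range(k - 1):
        # D(p)^2: squared distance from each point to its nearest chosen mean
        d2 = np.min([np.sum((X - m) ** 2, axis=1) for m in means], axis=0)
        means.append(X[rng.choice(len(X), p=d2 / d2.sum())])     # Eq. (3)
    return np.array(means)  # hand these to the standard K-Means loop
```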

6.3 K-medoids
The K-medoids algorithm is a modification of the K-means algorithm that requires the center of
each cluster to be an actual data point, which makes the method more robust in the presence of
outliers. In normal K-means, a small number of extreme outliers can attract a mean toward them and
away from the true cluster. Since K-medoids requires the center to be a data point, the center
cannot be pulled out of the true cluster: it may be attracted to one side of the cluster, but it
remains inside it (Hastie et al., The Elements of Statistical Learning, 2001).

The algorithm begins by randomly selecting data points to be the centers of each cluster, then
uses Euclidean distance to assign each data point to its closest center. The new center of each
cluster C_i is the point h in that cluster minimizing the sum of all pairwise distances:

h_i = argmin_{h∈C_i} Σ_{j∈C_i} D(h, j)    (4)
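
The medoid update of Equation (4) might look as follows; `labels` is assumed to come from the
usual closest-center assignment step, and every cluster is assumed non-empty:

```python
import numpy as np

def update_medoids(X, labels, k):
    """For each cluster, pick the member point minimizing the sum of
    pairwise distances to the other members (Eq. (4))."""
    medoids = np.empty((k, X.shape[1]))
    for c in range(k):
        members = X[labels == c]
        pairwise = np.linalg.norm(members[:, None, :] - members[None, :, :], axis=2)
        medoids[c] = members[pairwise.sum(axis=1).argmin()]
    return medoids
```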

6.4 Kernel K-means


The kernel K-means algorithm uses a kernel function to cast the data into a higher dimension,
which is helpful when the data are not linearly separable in the original dimensions. The
intuition is the same as for normal K-means, but analogs develop for the distance metric and for
selecting the new mean. The version of kernel K-means that we used was designed for weighted
kernel K-means (http://www.cs.utexas.edu/users/kulis/pubs/spectral_techreport.pdf), but all
weights were held constant and equal. We used a Gaussian kernel; the kernel matrix K is formed by
applying the kernel function to each pair of data points:

K_ij = exp( −∥x_i − x_j∥² / σ² )    (5)
The algorithm randomly selects initial means for each cluster. At each iteration, for each point
x_i and each cluster c, it calculates

d(x_i, m_c) = K_ii − 2 ( Σ_{x_j∈Π_c^(t)} w_j K_ij ) / ( Σ_{x_j∈Π_c^(t)} w_j )
              + ( Σ_{x_j,x_l∈Π_c^(t)} w_j w_l K_jl ) / ( Σ_{x_j∈Π_c^(t)} w_j )²    (6)

The new class of point x_i is then

c* = argmin_c d(x_i, m_c)    (7)

The cluster vectors Π_c are updated at the end of each iteration, and the iteration continues
until there is no change in the cluster vectors between iterations.
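
With all weights equal, the update of Equations (6) and (7) reduces to kernel-matrix block sums;
the following sketch of one reassignment pass is our own, not the implementation described below:

```python
import numpy as np

def kernel_kmeans_step(K, labels, k):
    """One reassignment pass of kernel K-means with unit weights.
    K is the precomputed kernel matrix of Eq. (5)."""
    n = K.shape[0]
    d = np.full((n, k), np.inf)
    for c in range(k):
        idx = np.flatnonzero(labels == c)
        m = len(idx)
        if m == 0:
            continue  # empty cluster is never the nearest center
        # Eq. (6) with w_j = 1: squared feature-space distance to the centroid
        d[:, c] = (np.diag(K)
                   - 2.0 * K[:, idx].sum(axis=1) / m
                   + K[np.ix_(idx, idx)].sum() / m ** 2)
    return d.argmin(axis=1)  # Eq. (7)
```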

Although we successfully programmed an implementation of this method, we did not apply it to the
data because of its computational requirements. Forming the kernel matrix is computationally
intensive, but is done only once per clustering. Calculating d(·,·) is also expensive, but the most
prohibitive factor was the need to cross-validate in order to learn the parameter σ in the kernel
function. Since our other methods performed well, there was also no reason to believe that the
data were not linearly separable and needed the dimensionality increase of the kernel K-means
method.

7 Results
7.1 Clustering Results
The values presented in the table are the cumulative geometric returns of the portfolio over the
chosen 5-year period. The column headers are in the format #C #S, where the number preceding C is
the number of clusters and the number preceding S is the number of stocks selected from each
cluster. The total number of stocks selected is always 24.

                     Autocorrelation Data                    Log Data
            6C 4S   4C 6S   3C 8S   2C 12S     6C 4S   4C 6S   3C 8S   2C 12S
K-Means++   13.8%    7.5%    8.7%    4.9%      10.5%   18.7%    7.6%    5.0%
K-Means      6.5%    9.2%    8.8%   10.0%      14.7%   15.3%   14.4%    5.3%
K-Medoids   11.3%    6.8%    9.4%    7.7%      14.2%    9.8%    8.5%    2.9%
Random       5.4%    8.4%    5.0%    3.3%       5.8%    6.0%    6.3%    6.2%

7.2 Benchmark Results


The random-clustering benchmarks are presented in the previous section. The total return on the
NASDAQ composite over the 5-year period was -17.5%. The return with no clusters, selecting 24
stocks each week based solely on RSI values, was 4.6%.

8 Conclusions
Before starting this project, we hypothesized that the autocorrelation data would outperform the
log data. This, however, is not evident in our results. The autocorrelation transform that we used
spanned only the 14 weeks before the time when the clusters were formed, while the log data had
values for the 52 previous weeks. Fourteen weeks of autocorrelation measures were not long enough
to find long-term patterns that would hold over the year subsequent to the clustering. Thus the
autocorrelation data did not perform as well as the log data, since the log data contained more
robust information about long-term patterns. Given more time we would extend the number of lags of
the autocorrelation data to be closer to 52; this, however, is computationally expensive, and we
did not have time for it in this project.

It is clear from the results that the K-Means clustering algorithm performed better than the
benchmarks for both types of data. This supports our prediction that clustering stocks based on
their movement in response to various market conditions can be beneficial: even when the means are
randomly initialized, clustering on these factors adds value. It is also evident that K-Means
performed significantly better on the log data than on the autocorrelation data. This suggests
that the random initialization of means can lead to poorer clustering when outliers are present,
as they are in the autocorrelation data. Therefore, K-Means++ or K-Medoids are preferred for data
with outliers.

According to the results, K-Means++ was the most successful of the three clustering techniques
used. Since the only difference between K-Means++ and standard K-Means is the process of
initializing the means, the special initialization in K-Means++ must be the source of this
improvement. Because K-Means++ picks its means to be equal to data points with the probability
given in Equation (3), it ensures that the means are placed near or within clusters from the
start, so the means converge more quickly and the clusters are formed more accurately.
Furthermore, since the probability of choosing a point is weighted by its squared distance to the
nearest mean, the means are almost guaranteed to be distributed evenly throughout the data. This
avoids cases where several means are initialized very close together and end up splitting what
should be one cluster. K-Means++ is therefore the best choice for clustering, regardless of
whether outliers are present.

The authors of the K-means++ procedure proved bounds showing that the algorithm achieves lower
expected clustering error than the normally initialized K-means algorithm. Our results qualify
this slightly in practice: K-means++ outperformed normal K-means only when the clustering was
performed with the ideal number of clusters; otherwise there was no such guarantee.

Our research and results show that machine learning methods can be effectively used to create
profitable portfolios even during a period when the market as a whole has a negative return. It
would be interesting to obtain the results of real mutual funds that have been holistically
clustered with human intervention and compare them with our performance.
