
Cluster Analysis for Evaluating Trading Strategies1

1 This is the submitted version of the following article: Cluster Analysis for Evaluating Trading Strategies,
Jeff Bacidore, Kathryn Berkow, Ben Polidore, and Nigam Saraiya, The Journal of Trading Vol. 7 No. 3, ©
2012, Institutional Investor, Inc., which has been published in final form at:
www.iijournals.com/doi/abs/10.3905/jot.2012.7.3.006

CONTRIBUTORS

Jeff Bacidore
Managing Director, Head of Algorithmic Trading, ITG, Inc.
Jeff.Bacidore@itg.com
+1.212.588.4327

Kathryn Berkow
Quantitative Analyst, Algorithmic Trading, ITG, Inc.
Kathryn.Berkow@itg.com
+1.212.444.6146

Ben Polidore
Director, Algorithmic Trading, ITG, Inc.
Benjamin.Polidore@itg.com
+1.212.323.3408

Nigam Saraiya
Vice President, Algorithmic Trading, ITG, Inc.
Nigam.Saraiya@itg.com
1.212.444.6479

CONTACT

Asia Pacific
+852.2846.3500
Canada
+1.416.874.0900
EMEA
+44.20.7670.4000
United States
+1.212.588.4000
info@itg.com
www.itg.com

ABSTRACT
In this paper, we introduce a new methodology to empirically identify the primary
strategies used by a trader using only post-trade fill data. To do this, we apply a
well-established statistical clustering technique called k-means to a sample of
“progress charts,” representing the portion of the order completed by each point in
the day as a measure of a trade’s aggressiveness. Our methodology identifies the
primary strategies used by a trader and determines which strategy the trader used
for each order in the sample. Having identified the strategy used for each order,
trading cost analysis (TCA) can be done by strategy. We also discuss ways to exploit
this technique to characterize trader behavior, assess trader performance, and
suggest the appropriate benchmarks for each distinct trading strategy.

BACKGROUND
Assessing trader performance is challenging because traders often vary their
strategies depending on the objectives of each trade. For example, when orders are
benchmarked to the open, traders may front-load their trades, perhaps executing a
large portion of the trade in the opening auction. For larger, more impactful orders,
traders may choose to trade more passively, stretching the order over a longer period
of time. Ideally, trading cost analysis (TCA) should take into account the trader’s
underlying strategy. In reality, doing so is challenging because 1) it is often unclear
how to characterize the underlying strategies used by the trader and 2) even if the
strategies were known, determining which orders apply to which strategy can be
difficult if that information is not captured in post-trade databases.

In light of these challenges, one common approach to assessing trader performance
is to group trades by algorithm as a proxy for the trader’s underlying strategy. If
traders use specific algorithms to meet their objectives (e.g. using Close Algorithms
for trades benchmarked to the close, VWAP Algorithms for trades benched to VWAP,
etc.), this approach makes sense because the algorithm is the strategy. However,
high-touch traders often use algorithms as tactics rather than strategies, switching
between different algorithms within a given order. As a result, TCA by algorithm will
not yield information about the effectiveness of the trader’s hybrid strategy.
Another commonly used approach to evaluate trader performance is to assess their
performance in the context of average aggressiveness. For example, one could look
at the average progress chart of a trader to see how passively or aggressively the
trader tends to work orders, and assess performance in that context. Such averages
may not be meaningful, however, as they aggregate across underlying strategies. For
example, Figure 1 shows the aggregate fill progress chart for a single trader. From
the graph, it would appear that this trader’s underlying strategy is VWAP. However, in
reality, this trader may have used multiple strategies that resemble VWAP in
aggregate, even if the trader never actually targeted full-day VWAP on a single order.

Figure 1. This is an example of the aggregate fill progress chart for all orders in a sample
dataset. The horizontal axis represents time from 9:30 AM – 9:45 AM (bin 1) to 3:45 PM – 4:00
PM (bin 26); the vertical axis represents percent of the order completed.

Analyzing trader performance correctly requires first identifying the different
underlying strategies used by a trader and then aggregating orders by these
strategies. In this paper, we present a new methodology that allows us to both
identify the core trading strategies used by a trader and classify each of the trader’s
orders into these strategies empirically, without having to tag orders prior to
execution. To do this, we first create a progress chart for each order and then apply a
well-established statistical clustering methodology called k-means to identify the
primary strategies used to execute these orders. The k-means methodology
classifies each order within one of the strategies, allowing for analysis by strategy.
This new approach to identifying trading strategies can be very useful when doing
TCA, especially for high touch trading. First, our methodology can identify the
underlying strategies used by each trader. Because of its dynamic nature, any new
strategies employed will be uncovered even if traders change them over time.
Second, for desks with multiple traders, our approach can be used to report which
strategies are used by the desk as a whole and divide strategy usage by trader. Third,
this type of granular trader-level analysis allows desks to assess relative trader
performance as a means to share best practices, instead of simply measuring which
trader is “best.” In particular, this analysis not only identifies which traders
outperformed, but also helps explain why they outperformed. Finally, since these
strategies can be represented graphically, we are able to infer what the trader’s
benchmark may have been for a given trade. For example, for highly front-loaded
trades, the open may be the most relevant benchmark, while for back-loaded trades,
the closing price may be more appropriate. As noted before, all this can be done
empirically on a post-trade basis, so our approach does not require traders to enter
additional data or for systems to be adapted to accommodate new post-trade
strategy information.

METHODOLOGY
Our methodology uses the intuition of a progress chart when characterizing a trading
strategy, but applies a common clustering technique called k-means to divide the
aggregate strategy into its component strategies in the same way a prism divides
light into its component colors (as shown in Figure 2). The process begins by
creating a progress chart for each order. Specifically, for each 15-minute period in
the trading day (26 in total), it computes the cumulative fraction of the order that
was completed by the end of that period, i.e., the progress of the order at that point.
The trading strategy itself is represented by the collection of these 26 “progress
points,” an example of which is given in Figure 1. These charts will always begin at 0%
and end at 100%, and will increase as we move from left to right along the x-axis to
represent the order’s cumulative fill progress over the day. We then apply k-means to
group them into k distinct trading strategies.
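
The progress-chart construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the fill layout (a list of timestamp and share-count pairs), the function name progress_chart, and the choice to clamp fills outside the 9:30 AM to 4:00 PM session into the first or last bin are hypothetical and are not taken from the paper.

```python
from datetime import datetime

N_BINS = 26        # 15-minute bins from 9:30 AM to 4:00 PM
BIN_MINUTES = 15

def progress_chart(fills, session_open, n_bins=N_BINS, bin_minutes=BIN_MINUTES):
    """Return the cumulative fraction of an order filled by the end of each bin.

    fills: list of (timestamp, shares) pairs for one order (assumed layout).
    """
    total = sum(shares for _, shares in fills)
    if total <= 0:
        raise ValueError("order has no filled shares")
    per_bin = [0.0] * n_bins
    for ts, shares in fills:
        minutes = (ts - session_open).total_seconds() / 60.0
        idx = int(minutes // bin_minutes)
        # Assumption: fills outside the session are clamped into the edge bins.
        idx = min(max(idx, 0), n_bins - 1)
        per_bin[idx] += shares
    chart, running = [], 0.0
    for shares in per_bin:
        running += shares
        chart.append(running / total)
    return chart

# Hypothetical 10,000-share order with three fills during the day.
session_open = datetime(2011, 1, 3, 9, 30)
fills = [
    (datetime(2011, 1, 3, 9, 35), 2000),
    (datetime(2011, 1, 3, 12, 10), 1000),
    (datetime(2011, 1, 3, 15, 50), 7000),
]
chart = progress_chart(fills, session_open)
```

Each order then becomes a vector of 26 non-decreasing values ending at 1.0, which is the input that the clustering step operates on.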

Figure 2. The methodology takes an aggregate progress chart and splits it into its underlying
component strategies.

To understand how k-means works intuitively, assume that we break the trading day
into 3 bins instead of 26 bins. For each order, we determine the percent of the order
that was complete at the end of each bin. For example, suppose the trader executed
a 10,000-share order by executing 2000 shares in bin 1, 1000 shares in bin 2, and
7000 shares in bin 3. Our methodology would characterize this order as a progress
chart with the values 20%, 30%, and 100%, to represent the percent complete at the
end of each bin. Since all orders are completed by the end of the last bin, all orders
will have a value of 100% in bin 3. For this reason, we only need to look at the
progress at the end of the first two bins when attempting to distinguish between
strategies.2
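
As a small check of the three-bin example, the sketch below converts shares per bin into a progress chart and compares two orders with and without the final bin. It assumes Euclidean distance as the distance metric, which the paper does not specify, and the second order is a hypothetical front-loaded order added only for comparison.

```python
import math

def to_progress(shares_per_bin):
    """Cumulative fraction of the order completed at the end of each bin."""
    total = sum(shares_per_bin)
    running, out = 0, []
    for shares in shares_per_bin:
        running += shares
        out.append(running / total)
    return out

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

order_x = to_progress([2000, 1000, 7000])  # the 10,000-share order from the text
order_y = to_progress([6000, 3000, 1000])  # hypothetical front-loaded order

# The last bin is 100% for every order, so it adds nothing to the distance.
full_distance = euclidean(order_x, order_y)
trimmed_distance = euclidean(order_x[:-1], order_y[:-1])
```

Because the two distances are identical, dropping the last bin leaves every nearest-center comparison unchanged under this metric.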
In Figure 3, we plot a sample of orders, where each black dot on the graph represents
an order. The x-axis represents the percent of the order completed by the end of bin
1, and the y-axis represents the percent completed by the end of bin 2. In the
example of the 10,000-share order above, the order can be represented graphically
as the dot labeled X in Figure 3A. Since this order was 20% complete at the end of bin
1 and 30% complete by the end of bin 2, the point is represented with an x-axis value
of 20% and a y-axis value of 30%.

2 Adding the third bin, where all orders take on a value of 100%, to the k-means methodology does not provide any
useful information in helping us differentiate between how the different orders were traded. So one can exclude
the third bin from the k-means methodology without influencing the results.

Figure 3. Illustration of k-means algorithm. In Figure 3A, the black dots are the existing, classified observations.
The triangle in Figure 3B represents a new order that must be classified, and the squares represent the centers of
the two existing clusters. The grey arrows show the distance between the new point and the existing clusters’
centers. The algorithm classifies the new point with the cluster whose center is the shortest distance from it. The
black squares in Figure 3C represent the original cluster centers. The grey square is the updated center of the
cluster with the additional order.

Looking at Figure 3A, there are clearly two distinct groups of dots—one cluster in the
lower left quadrant and another in the upper right quadrant. Intuitively, these
clusters represent the two distinct strategies that the trader used. The former
represents orders that are executing slowly, i.e., those that have made relatively little
progress after both bin 1 (x-axis) and bin 2 (y-axis). The latter represents orders that
are being executed more quickly, where progress in both bin 1 and bin 2 is
significantly higher.
In two dimensions with a small amount of data, one could do cluster analysis
visually, as in Figure 3A. When the data set is large or the number of dimensions is
higher, as is the case here where we could have thousands of orders each split into
26 distinct bins, one must rely on statistical techniques to manage the clustering.
This is where k-means methodology comes into play.
The k-means algorithm begins by assigning k initial cluster centers, which can be
specified by the user or selected randomly by the algorithm. Iteratively, the algorithm
works through the sample, using a distance metric to assign each observation to the
nearest cluster. Figure 3B provides an example of an iteration of k-means. Suppose
we were to add a new observation, represented by the triangle in Figure 3B. K-means
computes the distance between that point and the two existing cluster centers,
represented by the squares in Figure 3B, to determine the nearest cluster. Since the
triangle is closer to the left cluster, k-means assigns it to the left cluster. With the
addition of a new data point, however, k-means must now compute a new cluster
center. Figure 3C shows the new cluster center, represented by the grey square,
which has shifted in the direction of the new observation.
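
The single step illustrated in Figures 3B and 3C can be sketched as follows. The center coordinates, member counts, and the new order are hypothetical values chosen to resemble the figure, and Euclidean distance is an assumption, since the text only refers to "a distance metric."

```python
import math

def nearest_center(point, centers):
    """Index of the cluster center closest to the point."""
    distances = [math.dist(point, c) for c in centers]
    return distances.index(min(distances))

def add_to_cluster(center, n_members, point):
    """Running-mean update: the center after one more point joins the cluster."""
    new_n = n_members + 1
    new_center = [c + (p - c) / new_n for c, p in zip(center, point)]
    return new_center, new_n

# Hypothetical state: two clusters of four orders each (the squares in Figure 3B).
centers = [[0.15, 0.30], [0.70, 0.90]]
counts = [4, 4]
new_order = [0.35, 0.45]  # the triangle in Figure 3B

k = nearest_center(new_order, centers)  # the left cluster is closer
centers[k], counts[k] = add_to_cluster(centers[k], counts[k], new_order)
```

The updated left center moves a fifth of the way toward the new order, which corresponds to the shift of the grey square in Figure 3C.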
When cluster centers and assignments of observations stop changing dramatically,
the algorithm stops. At this point, the output contains information on the k cluster
centers, which can be used to characterize the group itself, as well as the assignment
of each observation into a cluster.3 In our specific application, the center point of a
group characterizes the “average” progress chart of that strategy and the
assignments indicate the strategy that each order most closely resembles.
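
Putting the steps together, a minimal batch version of k-means might look like the sketch below. It is not the authors' implementation: it assumes Euclidean distance, recomputes all centers once per pass rather than after each observation, and uses a hypothetical tolerance (tol) to decide when the centers have stopped moving.

```python
import math
import random

def kmeans(points, k, init_centers=None, tol=1e-6, max_iter=100, seed=0):
    """Return (centers, labels) for a list of equal-length numeric vectors."""
    rng = random.Random(seed)
    # Initial centers are user-specified or drawn at random from the sample.
    centers = [list(c) for c in (init_centers or rng.sample(points, k))]
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each order goes to its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # centers have effectively stopped moving
            break
    return centers, labels

# Hypothetical two-bin progress points: three slow orders, three fast orders.
points = [[0.10, 0.25], [0.15, 0.30], [0.20, 0.35],
          [0.65, 0.85], [0.70, 0.90], [0.75, 0.95]]
centers, labels = kmeans(points, k=2, init_centers=[[0.0, 0.0], [1.0, 1.0]])
```

Each returned center is the "average" progress chart of a strategy, and labels gives the strategy each order most closely resembles.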

3 See Johnson & Wichern (2007) and MacQueen (1967) for a detailed discussion of k-means.
EXAMPLE
To demonstrate the methodology’s effectiveness, we apply it to a sample of orders
sent to two different algorithms over two different trading horizons to determine
whether it can identify these four distinct algorithm-trading horizon combinations.
Specifically, the sample includes both half-day and full-day4 not-held market orders
sent to either a VWAP or implementation shortfall (IS) algorithm5 between January 1,
2011 and September 30, 2011. We limit our sample to orders greater than five
hundred shares, ensuring orders were worked over time and not executed in one slice
by the algorithm.
With no strategy context, k-means identified the four trading strategies and
classified orders within them with a high degree of accuracy. The results in Figure 4
show the trading strategies identified in the sample that comprise the VWAP-like
aggregate progress chart shown in Figure 1. Figure 4A represents half-day VWAP
orders, Figure 4B represents full-day VWAP orders, Figure 4C represents full-day IS algo
orders (those arriving before 9:40 AM), and Figure 4D represents half-day IS algo orders.
K-means was able to classify over 98% of the orders correctly. As shown in Table 1,
VWAP orders were correctly identified more than 99.5% of the time. IS orders were
identified correctly more than 98% of the time. Therefore, k-means was able to both
correctly identify the four different strategies and assign orders to each strategy with
precision.

Figure 4. Trading styles identified from post-trade data; example results for sample full- and half-day VWAP and
IS algo orders.

Order Type        Accuracy
Half-Day VWAP     99.73%
Full-Day VWAP     99.54%
Full-Day IS       98.58%
Half-Day IS       98.19%

Table 1. Accuracy of k-means in assigning orders to strategies.
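
One way accuracy figures of this kind could be computed is sketched below. The paper does not describe its scoring procedure, so this is an assumption: each cluster is mapped to the order type that appears most often within it, and an order counts as correct when its cluster maps to its own type. The labels and cluster ids are made-up illustrative data, not the sample used in Table 1.

```python
from collections import Counter, defaultdict

def cluster_accuracy(true_labels, cluster_ids):
    """Per-label accuracy after mapping each cluster to its majority label."""
    by_cluster = defaultdict(Counter)
    for label, cluster in zip(true_labels, cluster_ids):
        by_cluster[cluster][label] += 1
    cluster_to_label = {c: counts.most_common(1)[0][0]
                        for c, counts in by_cluster.items()}
    hits, totals = Counter(), Counter()
    for label, cluster in zip(true_labels, cluster_ids):
        totals[label] += 1
        if cluster_to_label[cluster] == label:
            hits[label] += 1
    return {label: hits[label] / totals[label] for label in totals}

# Hypothetical data: one full-day IS order lands in the VWAP cluster.
true_labels = ["Full-Day VWAP"] * 4 + ["Full-Day IS"] * 4
cluster_ids = [0, 0, 0, 0, 1, 1, 1, 0]
accuracy = cluster_accuracy(true_labels, cluster_ids)
```

A majority mapping can assign two clusters to the same order type when the clustering is poor, so a stricter one-to-one matching may be preferable in practice.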

APPLICATIONS
This methodology can be used to assess trader performance in several ways. First,
k-means can be used to identify underlying trading strategies for large client orders.
Figure 5 shows the output for a hypothetical client. For this client, we see three
distinct fill trajectories—trading into the close (strategy A), front-loaded trading
(strategy B), and participation-based trading throughout the day (strategy C). Another
benefit of k-means is the ability to uncover less dominant strategies used by a trader.
This is evidenced in Table 2, which shows that only 5% of value was executed via
strategy C. Here, k-means uncovered a minority strategy that may have been
overlooked in a traditional analysis. In effect, our methodology gives traders the
ability to experiment with trading strategies in real time without having to change
their workflow to capture any strategy-level information.

4 Orders considered “full-day” arrived before 9:40 AM; orders considered “half-day” arrived between 12:00 and
12:50 PM. All VWAP orders ended after 3:20 PM, but there was no restriction on end time of IS orders.
5 Specifically, we include orders sent to ITG Active Algorithm, a single stock implementation shortfall algorithm.

Figure 5. Hypothetical client trades aggregated over the day and grouped by style via k-means. Three distinct
trading strategies emerge from the data.

Second, for desks with multiple traders, k-means can be used to help characterize
strategies by trader. The diagrams in Figure 6 show trader usage of the strategies
identified by k-means. For example, we can see that Trader 1 is the dominant user of
strategy C, but C makes up only 25% of Trader 1’s trading. Using the k-means results,
we can report how often each strategy was used and understand the trades
composing each strategy—by trader, fund, order size, market capitalization, time
period, market conditions, or any combination thereof.
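
The two usage views described above (strategies within traders and traders within strategies) can be tabulated from the k-means assignments as sketched below. The sketch counts orders rather than traded value, which is an assumption, and the trader names and order counts are hypothetical, chosen so that strategy C is 25% of Trader 1's orders.

```python
from collections import Counter

def usage_tables(orders):
    """orders: list of (trader, strategy) pairs, one per classified order."""
    pair_counts = Counter(orders)
    by_trader = Counter(trader for trader, _ in orders)
    by_strategy = Counter(strategy for _, strategy in orders)
    # Share of each trader's orders that used each strategy.
    strategies_within_traders = {
        (t, s): n / by_trader[t] for (t, s), n in pair_counts.items()}
    # Share of each strategy's orders that came from each trader.
    traders_within_strategies = {
        (s, t): n / by_strategy[s] for (t, s), n in pair_counts.items()}
    return strategies_within_traders, traders_within_strategies

# Hypothetical desk with two traders and three strategies.
orders = ([("Trader 1", "A")] * 4 + [("Trader 1", "B")] * 2 + [("Trader 1", "C")] * 2
          + [("Trader 2", "A")] * 3 + [("Trader 2", "B")] * 4 + [("Trader 2", "C")] * 1)
within_traders, within_strategies = usage_tables(orders)
```

In this made-up sample, Trader 1 accounts for two thirds of strategy C orders even though C is only a quarter of Trader 1's own flow.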

Figure 6. Breakdown of trader usage of strategies for the hypothetical client analysis shown in
Figure 5 and Table 2. Strategies within traders (Figure 6A) and traders within strategies
(Figure 6B).

Beyond usage patterns, the k-means output allows us to evaluate trades according
to appropriate benchmarks, identifying which strategies are most successful. Why
compare all executions to the close benchmark if 10% of orders were actually
front-loaded and 5% traded in a VWAP algorithm? The k-means results implicitly
provide suggestions concerning the benchmark a given trader may have been
targeting, which can help to better evaluate performance. For example, Trader 1 may
use strategy A when benchmarked to close, B when benched to the open and C when
benched to VWAP. Table 2 indicates that strategy A is performing well versus the close
benchmark, strategy B is performing well versus arrival and open, and strategy C is
performing well versus VWAP benchmarks. These results are intuitive since traders
likely target different benchmarks with different strategies. The ability to infer
benchmarks is especially useful for traders whose systems do not permit benchmark
information to flow to their post-trade databases.

Performance (bps)
Prev. Day Interval
Strategy Orders % Value Arrival Open Close
Close VWAP VWAP
A 10,334 46% -3 -1 1 1 -8 8
B 17,957 49% -6 4 -2 -12 -2 2
C 3,940 5% -17 -13 -6 -9 2 1

Table 2. Performance results for hypothetical client orders grouped into the trading styles
illustrated in Figure 5.
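
A per-strategy performance summary of this kind could be produced along the lines of the sketch below. The order layout, the field names, the value weighting, and the sign convention (positive means the order beat the benchmark) are all assumptions for illustration; the paper does not state how its performance figures were computed.

```python
from collections import defaultdict

def slippage_bps(side, avg_price, benchmark):
    """Signed performance in basis points; positive = outperformance (assumed)."""
    sign = 1 if side == "buy" else -1
    return sign * (benchmark - avg_price) / benchmark * 1e4

def performance_by_strategy(orders, benchmark_key):
    """Value-weighted average performance versus one benchmark, per strategy."""
    weighted, value = defaultdict(float), defaultdict(float)
    for o in orders:
        notional = o["shares"] * o["avg_price"]
        bps = slippage_bps(o["side"], o["avg_price"], o[benchmark_key])
        weighted[o["strategy"]] += notional * bps
        value[o["strategy"]] += notional
    return {s: weighted[s] / value[s] for s in weighted}

# Hypothetical orders already tagged with their k-means strategy.
orders = [
    {"strategy": "A", "side": "buy", "shares": 1000, "avg_price": 99.0, "close": 100.0},
    {"strategy": "A", "side": "buy", "shares": 1000, "avg_price": 101.0, "close": 100.0},
    {"strategy": "B", "side": "sell", "shares": 500, "avg_price": 100.5, "close": 100.0},
]
versus_close = performance_by_strategy(orders, "close")
```

Repeating the call with a different benchmark field (for example a hypothetical "open" or "vwap" key) would fill in the remaining columns of a table like Table 2.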

Finally, our methodology can help to evaluate trader performance in the context of
the underlying trading strategies. If a given trader is under- or outperforming his
peers, our methodology can help identify the strategies driving his relative
performance. For example, if Trader 1 strongly underperforms his peers, it may be
due to his overuse of strategy C, which Table 2 shows is the worst-performing
strategy relative to the pre-trade cost benchmark. More generally, Table 2 shows
which strategies do best against each benchmark, implicitly making suggestions for
how to execute future trades.

CONCLUSION
In this paper, we provide a new methodology for identifying trading strategies using
only post-trade data. Specifically, we apply a well-established statistical technique
called k-means to both identify the primary strategies used by a trader and classify
each order into one of these strategies. This approach is particularly useful since it
does not require changes to trader workflows or post-trade systems to capture
strategy or benchmark information. Once the underlying strategies have been
identified and orders classified, TCA can be done by strategy. Analysis by strategy is
crucial because the choice of strategy can often be the primary determinant of a
trader’s performance. Visual representations of the underlying strategies naturally
suggest the trader’s benchmark, yielding relevant and useful analysis. Results can be
communicated both visually and numerically, making this a practical tool for any
trader.

REFERENCES

Johnson, R. A. and D. W. Wichern. Applied Multivariate Statistical Analysis, Sixth Edition.
Upper Saddle River, New Jersey: Pearson Prentice Hall, 2007.

MacQueen, J. B. Some Methods for Classification and Analysis of Multivariate Observations.
Proceedings of 5th Berkeley Symposium on Mathematical Statistics and Probability,
1, Berkeley, CA: University of California Press (1967), 281-297.

© 2012 Investment Technology Group, Inc. All rights reserved. Not to be reproduced or retransmitted without permission. 50112-22067

Broker-dealer products and services offered by ITG Inc., member FINRA, SIPC. These materials are for informational purposes only, and are not
intended to be used for trading or investment purposes or as an offer to sell or the solicitation of an offer to buy any security or financial
product. The information contained herein has been taken from trade and statistical services and other sources we deem reliable but we do
not represent that such information is accurate or complete and it should not be relied upon as such. No guarantee or warranty is made as to
the reasonableness of the assumptions or the accuracy of the models or market data used by ITG or the actual results that may be achieved.
These materials do not provide any form of advice (investment, tax or legal). ITG Inc. is not a registered investment adviser and does not
provide investment advice or recommendations to buy or sell securities, to hire any investment adviser or to pursue any investment or trading
strategy. The positions taken in this document reflect the judgment of the individual author(s) and are not necessarily those of ITG.
