
DeepThought 1.4.2

Machine Learning for Financial Trading Systems

Deep Thought Software (NZ) Ltd
www.deep-thought.co

© September 2014

Contents

1 Introduction   1
  1.1 Software Requirements   1
  1.2 Data   2
  1.3 Configuration   2
  1.4 Output Files   2

2 Data   3
  2.1 Importing Historical Data from MT4   3
      2.1.1 Exporting data as CSV from MT4   3
      2.1.2 Importing MT4 CSV data into a DeepThought Database   4
  2.2 Importing from Dukascopy   4
  2.3 Importing from HistData   4

3 Terminology   5

4 Machine Learning   6
  4.1 Support Vector Machines (SVM)   6
  4.2 Gradient Boosted Trees (GBT)   7
  4.3 Random Forests   7
  4.4 Extremely Randomised Trees   7
  4.5 Multi-Layer Perceptron   8
  4.6 Ensembles   8
  4.7 Continuous Features   9
      4.7.1 Feature Normalisation   9
  4.8 Categorical Features   9

5 Backtesting   11
  5.1 Backtesting Setup   11
  5.2 Recording and Using Recorded Signals   12
  5.3 Order Fill Simulation   12
  5.4 Paper Trading   13
  5.5 Files Produced During Backtesting and Paper Trading   13

6 Genetic Algorithm for Parameter Search   14
  6.1 Configuration   15
  6.2 Running the Genetic Algorithm   16
      6.2.1 Database   18
  6.3 Genetic Algorithm Results   18
  6.4 Using Recorded Results   18
  6.5 The Condor Submit File   18
  6.6 Trouble Shooting   19

7 Live and Paper Trading   20
  7.1 Manual Trading   20
  7.2 Automated Trading   21
  7.3 Trouble Shooting   22

8 Python Scripting   24
  8.1 Python Installation   24
  8.2 Python Feature   25
  8.3 Python Target   28
  8.4 Python Predictor   30
  8.5 Python Signal Generation   32
  8.6 The deep thought intf Interface Object   33

9 Configuration Details   35
  9.1 bar-series   35
      9.1.1 Renko Bars   38
      9.1.2 Summary of bar-series Options   40
  9.2 bar-series-collection   40
  9.3 model   41
  9.4 Features   42
      9.4.1 hour-of-day   43
      9.4.2 day-of-week   44
      9.4.3 bar-diff   45
      9.4.4 bar-attribute   47
      9.4.5 moving-average   49
      9.4.6 csv-feature   51
      9.4.7 python-script   53
  9.5 Targets   55
      9.5.1 bars-in-future   55
      9.5.2 python-script   56
  9.6 Predictors   58
      9.6.1 svm-predictor   58
      9.6.2 linear-svm-predictor   62
      9.6.3 gbt-predictor   64
      9.6.4 random-forest-predictor   65
      9.6.5 extremely-randomised-trees-predictor   66
      9.6.6 multi-layer-perceptron-predictor   66
      9.6.7 python-predictor   68
  9.7 predictor-ensemble   68
  9.8 signal-generator   70
  9.9 trader   72
  9.10 backtest   73
  9.11 genetic-algo   74

10 Commandline Tools   77
  10.1 Candle Statistics (--stats)   78
  10.2 Generate Bars (--generate-bars)   79
  10.3 Generating a Manual Signal   80
  10.4 Generating Feature Statistics (--generate-feature-stats)   80
  10.5 Extracting a Training Set (--extract-training-set)   81
  10.6 SVM Grid Search (--svm-param-search-c)   81
  10.7 GBT Grid Search (--gbt-param-search-c)   82
  10.8 Printing XML Configuration Documentation (--print-config)   83

11 Fundamental Indicators (Experimental)   92
  11.1 Fundamental Feature   92

12 Tutorial: Preparing the Commandline   94
  12.1 Step 1: Open the commandline   94
  12.2 Step 2: Open the defaults window   95
  12.3 Step 3: Change the font   95
  12.4 Step 4: Change the default window size   96

13 Tutorial: Backtesting in DeepThought and MT4   97
  13.1 Step 1: Edit the configuration   97
  13.2 Step 2: Start the DeepThought backtest   98
  13.3 Step 3: Copy files to Metatrader   99
  13.4 Step 4: Modify the EA   99
  13.5 Step 5: Running Metatrader Strategy Tester   99
  13.6 Step 6: Optimisation with MT Strategy Tester   100
  13.7 Step 7: Analyse the Results   102

A Sample Configuration   104

B Condor Setup and Operation   108
  B.1 Installation   108
      B.1.1 Adding a Condor User   112
  B.2 Useful Commands   113
      B.2.1 condor_status   113
      B.2.2 condor_q   113
      B.2.3 condor_rm   114

List of Figures

4.1  Overfitting   7
4.2  Multi-Layer Perceptron with 3 inputs, 5 hidden and 2 output neurons.   8

9.1  Type 1 Renko Bars   38
9.2  Type 2 Renko Bars   38

12.1 Opening the DeepThought Commandline   94
12.2 Opening the Defaults Window   95
12.3 Changing the Font   95
12.4 Changing the Window Size/Layout   96

13.1  Editing the Configuration   98
13.2  Starting the Backtest   98
13.3  The Completed Backtest   99
13.4  Metatrader tester setup   100
13.5  The Completed Backtest   100
13.6  Enabling the Genetic Optimisation in Metatrader   101
13.7  Selecting which Parameters to Optimise   101
13.8  Enabling Optimisation in the Strategy Tester   102
13.9  List of the Best Results of the Metatrader Optimiser   102
13.10 Report of the Optimum Settings   102
13.11 Graph of a Test With Optimum Settings   103

B.1  Condor Setup 1   108
B.2  Condor Setup 2   109
B.3  Condor Setup 3   109
B.4  Condor Setup 4   110
B.5  Condor Setup 5   110
B.6  Condor Setup 6   111
B.7  Condor Setup 7   111
B.8  Condor Setup 8   112
B.9  Condor Setup 9   112

List of Tables

4.1  Normalisation and Scaling Schemes   9
4.2  Binarising categorical variables.   10

5.1  Files Produced During Backtesting/Paper Trading   13

6.1  parameter configuration options for the genetic-algo configuration section.   16

7.1  DeepThought parameters for the Metatrader EA.   22

8.1  Summary of the deep thought intf interface object   34

9.1  Sections in the XML configuration file   36
9.2  The effect of the delay-minutes-offset parameter on intraday candles.   37
9.3  bar-series configuration options.   40
9.4  Features used as independent inputs to machine learning models.   42
9.5  hour-of-day feature.   43
9.6  day-of-week feature.   44
9.7  Price difference examples for the bar-diff feature.   45
9.8  bar-diff feature parameter options.   46
9.9  bar-attribute feature parameter options.   47
9.10 moving-average feature parameter options.   50
9.11 csv-feature parameter options.   52
9.12 python-script feature parameter settings.   53
9.13 bars-in-future target.   55
9.14 python-script target.   57
9.15 svm-predictor configuration options.   60
9.16 params configuration options for the svm-predictor.   61
9.17 linear-svm-predictor configuration options.   63
9.18 params configuration options for the linear-svm-predictor.   63
9.19 params configuration options for the gbt-predictor.   64
9.20 gbt-predictor options.   64
9.21 params configuration options for the random-forest-predictor.   65
9.22 Random Forest random-forest-predictor options.   65
9.23 Multi-layer Perceptron multi-layer-perceptron-predictor options.   67
9.24 params configuration options for the multi-layer-perceptron-predictor.   67
9.25 python-predictor parameter settings.   69
9.26 retrain-period options for the predictor-ensemble.   69
9.27 signal-generator configuration options   70
9.28 trader configuration options.   72
9.29 backtest options.   73
9.30 genetic-algo options.   75
9.31 parameter configuration options for the genetic-algo configuration section.   76

10.1 Column meanings using the --stats commandline option.   79
10.2 --generate-bars parameters.   80

11.1 title values for the fundamental-indicator feature.   93

B.1  The Job States in Condor.   114

Listings

8.1  Python feature configuration.   25
8.2  Python script example defining a feature.   27
8.3  Python target configuration.   28
8.4  Python script example defining a target.   29
8.5  Python predictor configuration example.   30
8.6  Python script example for a predictor.   31
8.7  Python signal generator configuration example.   32
8.8  Python script example for the signal generator.   33

Chapter 1

Introduction
DeepThought is a sophisticated software package for creating trading systems utilising state-of-the-art
machine learning algorithms. Currently supported are Support Vector Machines (SVM),
Linear Support Vector Machines (LSVM), Gradient Boosted Trees (GBT) and Random Forests.
Other methods will be added over time if there is a potential benefit to trading.
This software tool is designed for people who are serious about their trading. It does have a
learning curve, so be prepared to spend some time understanding and researching before rushing
to live trading. If you are looking for a $99 get-rich-quick EA[1] and do not want to spend any
time developing your own system, then this probably isn't the tool for you. If you believe that
$99 get-rich-quick EAs actually exist and do what they claim, then this definitely isn't the tool
for you.
Predicting financial markets is a difficult problem. The patterns we are attempting to forecast are
extremely weak. There are many academic papers that discuss which machine learning algorithm
is best: SVM versus Neural Networks versus Random Forests, etc. We have found that the
features that make up an observation used for forecasting are much more important
than the actual algorithm. If you have a feature set which does not contain any patterns, then
whichever technique you use will not work. Thus it is better to spend the majority of your
time working on feature selection and engineering rather than fussing about SVM versus Neural
Networks.
DeepThought is a command line tool that operates on XML configuration files. A DLL version
integrates with Metatrader for live trading. Both the DLL and the command line EXE are built
from the same source code. The configuration file contains all the settings necessary for both
backtesting and live trading, thus once a good configuration has been found, the DLL can use
the same configuration without modification. A genetic algorithm can be used for parameter
search.

1.1 Software Requirements

Scripts in Python are provided to perform analysis on backtested configurations, and to import
data into DeepThought databases. It is suggested that Python(x,y) is used as it provides a
full Python development environment, including common scientific, mathematical and plotting
libraries. It can be downloaded from code.google.com/p/pythonxy/

[1] EA stands for Expert Advisor, Metatrader's terminology for a script which trades automatically without
human intervention.


1.2 Data

DeepThought needs access to historical data for backtesting (when not connected to a trading platform),
live trading and paper-trading, so data is stored in separate Sqlite databases. These can be
inspected using any Sqlite tool such as Sqliteman, available for free from sqliteman.com.
If you have access to reliable historical data, then you should import this into the database.
Python scripts have been supplied for this purpose: see sections 2.1.2 and 2.3 for more details.
This database is also used for live trading. When Metatrader is running, the DeepThought EA
collects market ticks and passes them to the DLL. The ticks are used to create 1 minute candles,
which are then stored in the database and used for signal generation. Metatrader is only
used as an order placement/management system; all signal logic is contained in the DLL.
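The candle-building step is handled inside the DLL and is not exposed to the user. Purely as an illustration of the idea, a minimal Python sketch of aggregating ticks into 1 minute candles might look like the following; the tick format and function name are assumptions, not DeepThought's actual interface:

from collections import OrderedDict

def ticks_to_m1_candles(ticks):
    """Aggregate (epoch_seconds, bid) ticks into 1 minute OHLC candles.

    Illustrative only: the DeepThought DLL does this internally before
    writing the candles to the Sqlite database.
    """
    candles = OrderedDict()
    for ts, bid in ticks:
        minute = ts - (ts % 60)          # truncate to the start of the minute
        c = candles.get(minute)
        if c is None:
            candles[minute] = {"open": bid, "high": bid, "low": bid, "close": bid}
        else:
            c["high"] = max(c["high"], bid)
            c["low"] = min(c["low"], bid)
            c["close"] = bid
    return candles

# Example: three ticks falling inside the same minute
print(ticks_to_m1_candles([(1393632000, 1.3740), (1393632030, 1.3744), (1393632059, 1.3738)]))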

1.3 Configuration

This is the heart of the system. Use a text editor such as Notepad++ to edit XML configuration
files. A few samples are supplied in the examples file. Configuration files must be in their own
unique directory with the filename config.xml or config.xml. This is because you will likely
be working on several config files at the same time, or want to keep previous config files with
their output for reference purposes. It is easier to keep all related files together in the same
directory as it minimises clutter.

1.4 Output Files

During backtesting, paper-trading and live trading, various files are produced recording the
signals generated, the log, the PnL and the feature statistics. These are listed in table 5.1 on page
13.

Chapter 2

Data
Data is stored in Sqlite databases. Each instrument has its own database. These are generally
stored in the directory C:\FX Database (in Windows). These databases are used both for
backtesting and for live trading. There are certain limits with EA access to data in MT4
which can only be overcome by not using MT4 for historical data access. Data can also be
imported from other sources such as www.histdata.com, and a Python script is provided for
this purpose.
The DeepThought EA running in MT4 builds 1 minute candles from ticks passed from MT4.
These are automatically stored in the database as they are created.

2.1 Importing Historical Data from MT4

Data can be exported as CSV files from MT4. The first task is to ensure MT4 has the maximum
amount of data available from the broker.

2.1.1 Exporting data as CSV from MT4

You can export data from MT4 to a CSV file in the following way:
- Open a 1 minute chart on the instrument that you want data for.
- Select Tools → Options and, in the Charts tab, make sure Max bars in history and Max bars in chart are set to something huge. If not, enter something like 9999999999999.
- Make sure auto-scrolling is off by checking Charts → Auto Scroll.
- Press and hold the Page Up key. This forces MT4 to download older data and is more reliable than Tools → History Centre → Download. This can take a while and the amount of data available depends on your broker.
- Once the chart stops downloading data, select Tools → History Center and navigate to the 1 Minute (M1) option of the desired instrument. Click on Export and save as a CSV file.


2.1.2 Importing MT4 CSV data into a DeepThought Database

The python directory contains scripts to import historical data. To import data from a CSV file exported from MT4, use the following command:
python import_mt4_csv.py -d C:\FX_database\EURUSD.db -c EURUSDm1.csv -n
The above command, run from the python directory, will create a database EURUSD.db in the
C:\FX Database directory and import the data in EURUSDm1.csv. The script assumes the file
EURUSDm1.csv is in the same directory as the script. The -n parameter will create a new
database. If you have an existing database, omit this parameter and the new data will be
merged with existing data. It will not overwrite any conflicts, but will fill in the gaps of any
missing data.
It is useful to run the above script once a week, maybe at the weekend, to ensure any data gaps
caused due to network outages, or other interruptions to the DeepThought EA running in MT4,
are filled.

2.2 Importing from Dukascopy

Dukascopy (www.dukascopy.com) makes historical tick data available for free. This can be
downloaded using a free tool at www.strategyquant.com/tickdatadownloader/. Note that
DeepThought is not associated or affiliated with Dukascopy or Strategyquant in any way.
Once the tick data has been downloaded using the above tool, it can be imported using the
command:
deepthought --import-dukascopy-csv D:\TickDataDownloader\tickdata\EURUSD.csv
--dbname C:\FX Database\EURUSD.db

where the tick downloader has downloaded and created a single CSV file in
D:\TickDataDownloader\tickdata\EURUSD.csv. A new database will be created if it doesn't
exist; otherwise the new data will be merged with an existing database. When merging, the old
data will not be overwritten.

2.3 Importing from HistData

In the python directory there is a script for importing historical data files downloaded from
www.histdata.com. Run this script in a similar way to the MT4 import script:
python import_histdata.py --db <db file> --dir <mt4 csv file> --createdb --unzip
The --createdb and --unzip parameters are optional. If the --createdb option is present, a
new database will be created. The process will fail if a database with the same name already
exists to prevent accidental overwriting. HistData files are downloaded in zip format. You can
unzip these manually yourself, or supply the --unzip option to have the script do this before
importing.

Chapter 3

Terminology
We define a Feature as a type of information used in the training/forecasting sets. Examples
for features are Hour of Day and Close price difference between two candles. Each
feature has at least one attribute. An attribute is the actual number or value used in the
training/forecasting set. The Hour of Day feature has one attribute, the hour, and the Close
price difference between two candles feature can have as many attributes as defined.
A feature vector is a series of feature attributes that form an observation. This could comprise,
for example, Hour of Day, Day of Week, 30 differences in close price and 10 differences in moving
averages, thus the feature vector would have 42 attributes.
An attribute is classed as either a continuous or a categorical variable. Continuous variables
are variables whose values are real numbers such as a change in price. Categorical variables
are variables that can only take specific values, such as day of week, which can only be one of
{sun,mon,tue,wed,thu,fri,sat}.
A label is the forecast variable, i.e. the thing we are trying to predict. When used during the
(supervised) training phase, each of the feature vectors used for training must be assigned a label. The
set of feature vectors together with their labels is the training set.
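To make the terminology concrete, a single labelled observation and a tiny training set could be written out as follows. The attribute values are invented for illustration only; DeepThought assembles these vectors internally from the configured features:

# One observation: a feature vector (hour of day, day of week, close-price
# differences, ...) paired with a label (+1 = close was higher one candle later).
feature_vector = [14, 2, 0.0012, -0.0003, 0.0021]   # illustrative attribute values
label = 1

# A training set is simply a collection of such (feature_vector, label) pairs.
training_set = [
    ([14, 2, 0.0012, -0.0003, 0.0021], 1),
    ([15, 2, -0.0007, 0.0004, -0.0015], -1),
]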
The current version of DeepThought focuses mainly on classification problems. That is, the
labels are 1 (for true) and -1 (for false). Regression problems attempt to forecast magnitude
as well as direction. There is limited support for regression problems, and this will be enhanced
in future versions. We have found that it is hard enough to predict whether the market will
move up or down, let alone by how much.
A label is typically something like "the close price is higher/lower at the end of the next candle
in the future". A label of 1 would indicate higher, and a label of -1 would indicate lower.
A model is the collection of parameters that define a feature vector. This would include
parameters such as how many previous close price differences to include, and how to scale the
values.
A predictor is a self-contained forecaster, such as an individual SVM or GBT.
After the training phase, a final model is built for each predictor. Note that this is a different usage of
the term model to the one given above. Currently these are stored in memory as retraining is
frequent. The model is used to forecast a label for an unlabelled feature vector.

Chapter 4

Machine Learning
It is beyond the scope of this manual to describe each of the machine learning algorithms in
detail. The interested reader should consider the Stanford and/or Caltech machine learning
course offered via iTunesU (for free).
Machine learning problems tend to be divided into two main approaches: classification, where
the goal is to forecast discrete classes; and regression, where the goal is to forecast a real
number.
DeepThought supports both classification and regression. For trading systems, it is probably
best to focus on classification as it is difficult enough to forecast whether the market will move up or down, let alone
by how much. Most classification problems are two-class. Multi-class problems are generally handled by
reducing the problem to several two-class problems, or a one-versus-all setting.
Often the process of applying machine learning to trading systems involves an offline step of
building a model, then deploying the model to the trading system. DeepThought takes a different
approach by enabling the system to continuously retrain. While it is possible to build a single
model and then forecast using only this model, the preferred mode of operation is to retrain after
the forecast and signal has been sent to the market. The sequence of events is:
1. At system spin-up, train all predictors.
2. New candle (or Renko bar) received and saved to database.
3. Forecasts made by ensemble of predictors and combined into a single signal.
4. Signal sent to trading platform (e.g. Metatrader).
5. All predictors retrained, ready for the next candle to complete to trigger the next forecast.
Thus your system can continuously adapt to the market.
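The sequence above can be pictured as a simple loop. The following is only a schematic Python sketch of that flow; the object and function names are invented and the real logic lives inside the DeepThought EXE/DLL:

def combine_forecasts(forecasts):
    # Majority vote: +1 if more predictors say "up" than "down", else -1.
    return 1 if sum(forecasts) > 0 else -1

def run_trading_loop(predictors, candle_feed, trading_platform):
    # 1. At system spin-up, train all predictors on the available history.
    for p in predictors:
        p.train()

    for candle in candle_feed:          # 2. a new candle (or Renko bar) completes
        candle.save_to_database()

        # 3. each predictor forecasts; the ensemble combines them into one signal
        forecasts = [p.forecast(candle) for p in predictors]
        signal = combine_forecasts(forecasts)

        trading_platform.send(signal)   # 4. signal sent to e.g. Metatrader

        for p in predictors:            # 5. retrain, ready for the next candle
            p.train()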

4.1 Support Vector Machines (SVM)

The parameters associated with SVMs are: the kernel type, C (penalty), g (gamma, used with the radial basis function
kernel) and e (epsilon, only for regression). Generally the radial basis function kernel is used with classification, so the only parameters to select are C and g. The DeepThought command line tool has an
option to perform a grid search using 5-fold cross validation. This means the results provided
are for out-of-sample data, avoiding over-fitting. See 10.6 on page 81 for details on how to do a
grid search.
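DeepThought performs this grid search itself from the command line. Purely to illustrate what a 5-fold cross-validated search over an exponential C/gamma grid looks like, here is a sketch using scikit-learn on synthetic data; scikit-learn is not used by DeepThought and the grid bounds are arbitrary:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

# Synthetic two-class data standing in for a real feature matrix and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(400, 10))
y = np.where(X[:, 0] + 0.5 * X[:, 1] > 0, 1, -1)

# Exponential grid for C and gamma, evaluated with 5-fold cross validation,
# so the reported score is estimated on out-of-sample folds.
param_grid = {"C": [2.0 ** k for k in range(-2, 6)],
              "gamma": [2.0 ** k for k in range(-8, 0)]}
search = GridSearchCV(SVC(kernel="rbf"), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_, search.best_score_)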


DeepThought supports linear SVMs and kernel SVMs. Linear SVMs are faster than kernel
SVMs, but may not perform well if the problem is non-linear; i.e. the dependent variable
(the thing we are forecasting) is not a linear combination of the inputs. Kernel SVMs use a
kernel function such as a Gaussian to map inputs into a higher-dimensional space, then a linear
algorithm is applied to these higher dimensional features. This enables them to model non-linear
relationships. The trade-off is that they can be prone to overfitting and care must be taken to
avoid this by properly evaluating on out-of-sample test data. Overfitting is where the model has
very accurately fitted the training data but does not generalise the underlying relationship well.
This is illustrated in figure 4.1.

Figure 4.1: Overfitting where the green line has overfitted the training data. A better fit is the
black line where the underlying function has been modelled by allowing a few mis-classifications
in the training data.

4.2 Gradient Boosted Trees (GBT)

GBT is a decision-tree process. It works by creating an initial decision tree, then creating successive
trees that are trained on the errors of the previous trees. This is termed a greedy algorithm. The
more trees the better, and this method is good at avoiding overfitting. An advantage of decision-tree
based methods, including the Random Forests detailed below, is that no normalisation or
outlier removal is required. Two parameters are required by the GBT: the number of trees and the tree
depth.
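As a rough illustration of the boosting idea (each new tree fitted to the errors of the trees before it), the toy regression sketch below uses scikit-learn decision trees. It is not DeepThought's GBT implementation, and the learning rate is an extra detail not discussed above:

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)

trees, learning_rate = [], 0.1
prediction = np.zeros_like(y)
for _ in range(200):                       # the "number of trees" parameter
    residual = y - prediction              # errors of the ensemble so far
    t = DecisionTreeRegressor(max_depth=2) # the "tree depth" parameter
    t.fit(X, residual)                     # each tree is trained on the residuals
    prediction += learning_rate * t.predict(X)
    trees.append(t)

print("training MSE:", float(np.mean((y - prediction) ** 2)))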

4.3 Random Forests

Random forests work by combining the results of many weak predictors to form a stronger
predictor. The weak predictor is a decision tree, and each decision tree is built on a random
subset of features and samples. When forecasting, the final prediction is the most common class
for classification, or an average of each tree's prediction for regression.

4.4 Extremely Randomised Trees

This is a variation of Random Forests where a different method is used to select the features/data for each tree.


4.5 Multi-Layer Perceptron

Also known as Neural Networks, this is probably the most widely familiar form of machine
learning. DeepThought supports two methods of training a multi-layer perceptron: back-propagation and Rprop. For details on back-propagation see http://en.wikipedia.org/wiki/Backpropagation, and for Rprop see http://en.wikipedia.org/wiki/Rprop.
A multi-layer perceptron is made up of a number of neurons, connected in layers. The first
layer takes the input so the number of neurons in this layer is always equal to the number
of attributes in the input. The output layer is where we read the forecast so the number of
neurons is equal to the number of attributes that make up the forecast variable. For a two-class
classification problem there will be two output neurons, and for a regression problem there will
only be one.
The multi-layer perceptron can also contain hidden layers. Normally there is only one hidden
layer, but we can have more than one. The topology is illustrated in figure 4.2.

Figure 4.2: Multi-Layer Perceptron with 3 inputs, 5 hidden and 2 output neurons.
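To connect figure 4.2 to the arithmetic, the forward pass of a 3-5-2 network can be sketched in a few lines of numpy. The weights here are random, whereas in practice they are learned by back-propagation or Rprop:

import numpy as np

rng = np.random.default_rng(2)
W1, b1 = rng.normal(size=(5, 3)), np.zeros(5)   # input layer -> 5 hidden neurons
W2, b2 = rng.normal(size=(2, 5)), np.zeros(2)   # hidden layer -> 2 output neurons

def forward(x):
    hidden = np.tanh(W1 @ x + b1)     # hidden-layer activations
    return W2 @ hidden + b2           # two outputs, one per class

x = np.array([0.2, -1.0, 0.5])        # one feature vector with 3 attributes
print(forward(x))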

4.6 Ensembles

Ensembles have been described as the closest thing to a free lunch in machine learning. Much
effort has been put into the implementation of ensembles in DeepThought. Each of the predictor
types can have one or more sets of parameters. One of the drawbacks of SVMs is that hyper-parameters must be selected. For classification these are C and gamma. As the patterns we are
attempting to predict are extremely weak, we can never be sure that a single set of specific
values will perform as well as indicated during cross-validation. A way around this is simply to
use an ensemble of all values and use the majority vote as the final prediction. This is what
each predictor does.
We can also mix different predictor types, and have multiple models per predictor type. The
number and variety of predictors is limited only by computational power. DeepThought
will use all available cores on your PC during backtesting, but it can still be slow when large
ensembles are used. A future version will be able to spread a single backtest across several
machines. Note that the genetic algorithm can use an unlimited cluster of machines by utilising
the Condor system.
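The exact combination rule is governed by the configuration (each predictor carries a forecast-weight, see chapter 9). As a hedged sketch of the general idea only, a weighted majority vote over two-class forecasts could look like this:

def ensemble_vote(forecasts, weights=None):
    """Combine two-class forecasts (+1/-1) into a single signal.

    Illustrative weighted majority vote, not DeepThought's exact rule.
    """
    if weights is None:
        weights = [1.0] * len(forecasts)
    score = sum(w * f for w, f in zip(weights, forecasts))
    if score > 0:
        return 1
    if score < 0:
        return -1
    return 0   # tied vote: no signal

# e.g. three SVMs with different (C, gamma) pairs and one GBT
print(ensemble_vote([1, 1, -1, 1], weights=[1.0, 1.0, 1.0, 0.5]))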


4.7 Continuous Features

Continuous features are features whose value is a floating point number, i.e. they can take any
value. An example is the difference between two close prices. Different features can have different
ranges. For example, comparing a feature of the difference between two close prices with a feature
of the difference between moving average values 100 bars apart, we can see that the latter will
have a greater range than the former. To adjust for this, feature values must be normalised
to bring them into more-or-less the same range. This prevents a feature with a large range
squashing or overwhelming features with smaller ranges. DeepThought has several methods of
approaching this.

4.7.1 Feature Normalisation

Normalisation is the process that scales the values of each feature in the same range. The
parameters found during normalisation are used to scale the feature vector used for forecasting.
DeepThought supports several techniques for normalising training/forecasting features, listed in
table 4.1.
Table 4.1: Normalisation and Scaling Schemes

Scaling Type   Description
min-max        scale all values between -1 and 1 using the minimum and maximum values for the feature value
zscore         for each feature value, subtract the mean and divide by the standard deviation. The resulting scaled feature has a mean of zero and a standard deviation of one.
div-sd         divide each feature by the standard deviation
div-max        divide each feature by the maximum of the absolute values of the maximum and minimum
log10          take the base-10 logarithm of each feature value

Which scheme works best is a task for trial-and-error, but starting with min-max and zscore is
recommended.
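All of the schemes in table 4.1 are simple element-wise transforms. As a sketch of the first two (the scaling parameters found on the training data are reused to scale the forecasting vector, as described above):

import numpy as np

def min_max_scale(train_values, x):
    """Scale into [-1, 1] using the training minimum and maximum (min-max)."""
    lo, hi = float(np.min(train_values)), float(np.max(train_values))
    return 2.0 * (x - lo) / (hi - lo) - 1.0

def zscore_scale(train_values, x):
    """Subtract the training mean and divide by the training standard deviation (zscore)."""
    mu, sd = float(np.mean(train_values)), float(np.std(train_values))
    return (x - mu) / sd

train = np.array([-0.004, 0.001, 0.003, -0.002, 0.002])  # e.g. close-price differences
print(min_max_scale(train, 0.001), zscore_scale(train, 0.001))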

4.8 Categorical Features

Categorical features have specific values. An example is day of week. We could map days of
week to an integer in the range 0...6 and use that as a continuous value and treat it as above, or
we could binarise it into multiple attributes. DeepThought supports both approaches. When a
feature is binarised, it is mapped into multiple attributes that can take values of zero or one. A
set of attributes for the feature can have only one attribute with the 1 value and all others are
0. This is sometimes called a one-hot vector approach. Table 4.2 illustrates this process.


Table 4.2: Binarising categorical variables.

Day of Week   Binarised encoding
Sunday        1000000
Monday        0100000
Tuesday       0010000
Wednesday     0001000
Thursday      0000100
Friday        0000010
Saturday      0000001
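The encoding in table 4.2 is straightforward to reproduce; a small sketch, with the day ordering matching the table:

DAYS = ["sun", "mon", "tue", "wed", "thu", "fri", "sat"]

def binarise_day(day):
    """Return the one-hot ('binarised') encoding of a day-of-week value."""
    vec = [0] * len(DAYS)
    vec[DAYS.index(day)] = 1
    return vec

print(binarise_day("wed"))   # [0, 0, 0, 1, 0, 0, 0]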

Chapter 5

Backtesting
You will probably spend a lot of time in backtesting as there is a lot of trial-and-error involved in
creating a system. The built-in backtester is capable of simulating market and limit orders, take
profit, stop loss and move to break even. It operates on 1 minute candles. The backtester also
operates when in live trading mode so you can compare actual results with simulated ones in real
time. It also functions as a paper-trader when running in live trading mode and the EA is
set to not place any actual trades.
Chapter 9 on page 35 describes configuration settings in detail. This chapter focuses on the
process of backtesting.

5.1 Backtesting Setup

The first step is to create a unique directory that contains a configuration file. This has the
filename either config.xml or config.xml. The latter filename ensures the configuration
file is always at the top of a directory listing. Each configuration file is kept in a separate
directory as files are created during the backtest for later analysis. If you are working on several
configurations at the same time, or want to keep previous configuration files with their results,
then having each in a separate directory avoids clutter.
The configuration file contains a section named backtest. A typical setup is shown below:
<backtest>
    <start-date>2013-01-01</start-date>
    <stop-date>2013-12-08</stop-date>
    <use-recorded-signals>True</use-recorded-signals>
    <display-progress>False</display-progress>
    <execute-when-complete>python "C:\DeepThought\python\analyse_backtest_results.py" %CONFIG_LOCATION%</execute-when-complete>
</backtest>

The display-progress setting turns on/off the display of trades as they are closed in the console.
Windows display of text in the console is slow (compared to Linux), so if you are using recorded
signals as described below, turning the progress display off can speed up the backtest further. You
don't lose anything by turning the progress off as all results, including progress, are logged in the
various output files.
The actual backtest is started by the command:
deepthought --backtest C:\configs\EURUSD MA TEST

where C:\configs\EURUSD MA TEST is the directory where the config.xml is located.

5.2 Recording and Using Recorded Signals

During the backtesting (and paper-trading) process, the signals are recorded to a file and stored
in the same directory as the configuration file. If you are using a large ensemble of machine
learning predictors, a backtest over a year can take hours or even days. Sometimes you may
not be changing the machine learning settings, but experimenting with other settings such as
the take profit or trigger. In this situation you can run the backtest once to generate and record
the signals. Before running the next backtest, set the use-recorded-signals setting to True;
the next time the backtest is run, the machine learning training and forecasting will be bypassed
and the signals looked up from the recorded signals file. This dramatically shortens the time to
run a backtest, provided none of the machine learning settings have been altered.
Another use of the recorded signals file is for backtesting in Metatrader. An EA has been
provided which uses these signals in Metatrader's strategy tester. The recorded signals file is
named recorded.signals.csv and must be copied to:
<Metatrader-install-dir>\tester\files
For example, if your broker was InterbankFX this directory would be
C:\Program Files (x86)\InterbankFX\tester\files
This is a restriction by Metatrader as EAs run in the strategy tester cannot access files outside
this location. The source has been provided for this EA so you could adapt an existing system
to utilise the signals.

5.3 Order Fill Simulation

At the close of each 1 minute candle, the simulator looks at the high and low prices and decides
if order prices have been hit. Orders can have optional take-profit and stop-loss prices.
There is also an optional break-even setting. If this has been set, a stop-loss is automatically
set with a price equal to entry price plus 1 pip for a buy order, and entry price minus 1 pip for
a sell order. The typical sequence of events in order fill simulation is:
1. Signal indicates an order is to be placed.
2. Order is placed. If it is a market order, it is immediately filled at the last known bid price
for a sell, and the last known bid + spread for a buy. If it is a limit order, the price is
checked at the end of the next 1 minute candle.
3. At the end of the 1 minute candle, limit orders are checked for fills by looking at the candle's
high and low prices.
4. Check take-profit, stop-loss and break-even. If the take-profit or stop-loss has
been hit, close the position. If break-even has been set and the price has been reached,
set a stop-loss at break-even +1 pip.
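As a rough sketch of the fill checks just described (candle highs and lows deciding whether a limit price, take-profit or stop-loss was touched), the logic could look like the following; the prices are illustrative and this is not the simulator's actual code:

def limit_filled(order_price, side, candle_high, candle_low):
    """A buy limit fills if the candle traded down to the price; a sell limit if it traded up."""
    if side == "buy":
        return candle_low <= order_price
    return candle_high >= order_price

def check_exit(position_side, take_profit, stop_loss, candle_high, candle_low):
    """Return 'tp', 'sl' or None for an open position after a 1 minute candle closes."""
    if position_side == "buy":
        if take_profit and candle_high >= take_profit:
            return "tp"
        if stop_loss and candle_low <= stop_loss:
            return "sl"
    else:  # sell
        if take_profit and candle_low <= take_profit:
            return "tp"
        if stop_loss and candle_high >= stop_loss:
            return "sl"
    return None

# Break-even: once the break-even price is touched, the stop-loss is moved to entry +/- 1 pip.
print(check_exit("buy", take_profit=1.3750, stop_loss=1.3680,
                 candle_high=1.3755, candle_low=1.3710))  # 'tp'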

5.4 Paper Trading

The provided MT4 EA has a setting named do live trade. When this is set to false, no live
trades are placed, but DeepThought will continue to simulate trades using live market data, and
populate the database. It is worthwhile to always run a paper-trader for the purpose of keeping
the database for each instrument you use up to date. The files produced during paper-trading
are identical to the files produced during backtesting as the same process is used.

5.5 Files Produced During Backtesting and Paper Trading

Various files are produced by the trade simulator during backtesting and paper trading. These
are detailed in table 5.1.
Table 5.1: Files Produced During Backtesting/Paper Trading

backtest.log                 The log file detailing all events during backtesting. Useful for debugging.
daily.returns.csv            The daily returns of the backtest. The trade open date-time is used to group trades to the same day.
pnl.csv                      A record of each individual trade.
recorded.signals.csv         The signals generated by the ensemble. Used in playback during subsequent backtests, and used by MT4 in the strategy tester.
statistics.h4-features.csv   The statistics (min, max, mean, stddev) of each attribute in a model. Useful to spot data errors as values should be reasonably stable over time. Any sharp or large changes should be investigated. This filename example is for a model named h4-features in the configuration.
svm-c-rbf.forecasts.csv      For each predictor, a file is generated detailing the forecasts it made. Useful for external analysis. This filename example is for a predictor named svm-c-rbf in the configuration.
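The supplied analyse_backtest_results.py script works on these files. Purely as a hedged illustration (the exact column layout of daily.returns.csv is not documented in this chapter, so the column name below is an assumption), the daily returns could be summarised like this:

import csv, math

def summarise_daily_returns(path, column="return"):
    """Compute mean, standard deviation and a naive annualised Sharpe-style ratio.

    The column name is an assumption for illustration; inspect the CSV header first.
    """
    with open(path, newline="") as f:
        returns = [float(row[column]) for row in csv.DictReader(f)]
    mean = sum(returns) / len(returns)
    var = sum((r - mean) ** 2 for r in returns) / len(returns)
    sd = math.sqrt(var)
    return {"days": len(returns), "mean": mean, "stddev": sd,
            "sharpe": mean / sd * math.sqrt(252) if sd else float("nan")}

# print(summarise_daily_returns("daily.returns.csv"))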

Chapter 6

Genetic Algorithm for Parameter Search
A genetic algorithm is a way of searching a large search space using methods inspired by biology.
In DeepThought the genetic algorithm is used for parameter selection. We could use a brute-force
approach and test every combination of parameters available; however, the (usually) very
large number of combinations makes this infeasible.
A genome defines a list of parameters. This list of parameters is tested in a backtest to produce
a score which is used to rank the parameter set. In genetic algorithm terminology the backtest
generates the objective function which is specified in the configuration as Sharpe Ratio,
Accuracy or Profit (in Pips). You could potentially run separate genetic algorithms and
optimise on all objective functions and combine the results in an ensemble.
DeepThought uses the Condor high performance computing clustering system. It is available for
download from http://research.cs.wisc.edu/htcondor/. Condor is a system that clusters
individual computers together to form a high performance cluster. It can operate on a single
computer, so you can still run the genetic algorithm if you only have access to a single computer.
Condor is a system for running many jobs in parallel, so it has uses well beyond our use of it for
genetic algorithms. Although it is beyond the scope of this document to provide detailed information
on installing and using Condor, we explain the parts relevant to DeepThought. Further detail is
given in appendix B.
The genetic algorithm in DeepThought operates in the following way:
1. DeepThought is started as a GA Server.
2. A random population of genomes is created. This is the first generation.
3. A configuration file is produced from a template for each genome.
4. A Condor submit file is produced and the population submitted to Condor. Each individual configuration file is run on one core of the cluster in parallel with other configuration
files.
5. DeepThought listens on a TCP port for backtests to finish and send a summary of results.
6. As each backtest completes, a summary is transmitted via UDP to DeepThought GA
Server. The log files and other outputs are sent back to the server and stored in individual
directories for later analysis if required.
7. After a configurable timeout has been reached, all running jobs are terminated. This step
is skipped if all jobs complete before the timeout.

8. The backtest results produced by the genomes are assessed and, using the parameters detailed
in section 9.11, the next generation of genomes is produced.
9. Steps 3 to 8 are repeated until the number of generations specified in the configuration
has been reached. Alternatively, the GA will stop if all possible combinations have been
tested.
DeepThought keeps a list of all genomes and their results. This is to prevent the same parameter
combination being tested more than once. This file is persisted to disk after each generation;
it also operates as a save point and can be used to resume a genetic algorithm run in the event that
it was interrupted.

6.1 Configuration

A configuration file is supplied to DeepThought in the same format as for backtesting and
live/paper trading. It is used as a template: a configuration file is created for each parameter
combination (genome) to be tested. The configuration file must contain a genetic-algo section
similar to the configuration snippet below:
<genetic-algo>
    <ga-server>tcp://wraith</ga-server>
    <ga-server-port>55566</ga-server-port>
    <genome-id>-1</genome-id>
    <objective-function>sortino</objective-function>
    <timeout-minutes>360</timeout-minutes>
    <population-size>20</population-size>
    <mutation-probability>10</mutation-probability>
    <num-breeders-percent>30</num-breeders-percent>
    <min-num-breeders>30</min-num-breeders>
    <num-new-random-genomes>2</num-new-random-genomes>
    <num-generations>10</num-generations>
    <parameter id="stop-loss"   type="integer" low="10" high="200" step="5" />
    <parameter id="take-profit" type="integer" low="10" high="200" step="5" />
    <parameter id="time-of-day" type="categorical" values="h1,h4,single,none" />
    <parameter id="SVM-Penalty" type="exp-2" low="1" high="15" step="1" />
    <parameter id="SVM-gamma"   type="exp-2" low="-8" high="2" step="1" />
</genetic-algo>

See section 9.11 for a detailed explanation of the options. To have the genetic algorithm modify
values in a configuration file, the file must have the XML attribute ga-subst defined on each value that
can vary, where the value of ga-subst is equal to the parameter id defined in the genetic-algo
section. The example below illustrates this:
<feature>
    <type>hour-of-day</type>
    <period ga-subst="time-of-day">h4</period>
</feature>
<feature>
    <type>bar-attribute</type>
    <attribute-type>average-close</attribute-type>
    <number ga-subst="average-close-num">30</number>
    <value-type>diff</value-type>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
</feature>
...
<svm-predictor>
    <identifier>svm-c-rbf</identifier>
    <model>h4-features</model>
    <continuous-tune>false</continuous-tune>
    <params> <!-- 48.6% -->
        <penalty ga-subst="SVM-penalty">8</penalty>
        <gamma ga-subst="SVM-gamma">0.015625</gamma>
        <forecast-weight>1.0</forecast-weight>
        <svm-type>SVC</svm-type>
        <kernel>rbf</kernel>
    </params>
    <num-training-observations>2000</num-training-observations>
    <num-training-skip>1</num-training-skip>
</svm-predictor>
...
<signal-generator>
    <entry-times>
        <hour>all</hour>
        <day-of-week>all</day-of-week>
    </entry-times>
    <entry-threshold>0.0</entry-threshold>
    <forecast-type>SVC</forecast-type>
    <take-profit ga-subst="take-profit">0.0</take-profit>
    <stop-loss ga-subst="stop-loss">0.0</stop-loss>
    <break-even>20.0</break-even>
    <exit-all-hour>-1</exit-all-hour>
    <trade-bar-series>EURUSDm1</trade-bar-series>
    <reverse-all>False</reverse-all>
</signal-generator>
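Behind the scenes, the GA server writes a concrete configuration file per genome by filling in every element that carries a ga-subst attribute. A minimal sketch of that substitution using Python's ElementTree is shown below; this is not DeepThought's actual code, and the output filename and genome values are invented:

import xml.etree.ElementTree as ET

def apply_genome(template_path, output_path, genome):
    """Replace the text of every element whose ga-subst id appears in the genome dict."""
    tree = ET.parse(template_path)
    for elem in tree.getroot().iter():
        param_id = elem.get("ga-subst")
        if param_id in genome:
            elem.text = str(genome[param_id])
    tree.write(output_path)

# e.g. one hypothetical genome produced by the genetic algorithm:
# apply_genome("config.xml", "config-genome-3003.xml",
#              {"stop-loss": 50, "take-profit": 120, "time-of-day": "h4",
#               "SVM-penalty": 2 ** 7, "SVM-gamma": 2 ** -8})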

The parameter types are defined in table 6.1.


Table 6.1: parameter configuration options for the genetic-algo configuration section.

Option        Description
integer       Used when the parameter can be modelled as an integer. The options available for the integer type are:
              low    The lowest value that the integer can take.
              high   The highest value that the integer can take.
              step   The value to increment/decrement for different values of this parameter.
categorical   Used when the parameter can only take certain (string) values. The options available for the categorical type are:
              values Comma separated list of values that this parameter can take.
exp-2         Used when the parameter is best suited to an exponential grid search. For example SVM penalty, SVM gamma and SVM epsilon are best searched using an exponential grid search. This means that rather than use values that are linearly spaced such as 5, 10, 15, 20, ..., we use values such as 2^1, 2^2, 2^3, 2^4, .... This results in final values of 2, 4, 8, 16, .... Note that negative numbers can be used and result in the final values being less than 1, e.g. 2^-5, 2^-4, 2^-3, 2^-2, ... become 0.03125, 0.0625, 0.125, 0.25, .... The options available for the exp-2 type are:
              low    The lowest value that the exponent can take.
              high   The highest value that the exponent can take.
              step   The value to increment/decrement the exponent.
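The exp-2 values are easy to enumerate. For instance, the SVM-gamma parameter above (low=-8, high=2, step=1) expands to the following grid (a quick check in Python, not DeepThought code):

low, high, step = -8, 2, 1          # the SVM-gamma parameter from the example
grid = [2.0 ** e for e in range(low, high + 1, step)]
print(grid)   # [0.00390625, 0.0078125, ..., 2.0, 4.0]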

6.2 Running the Genetic Algorithm

The genetic algorithm is started with the following command:


DeepThought --genetic-algo C:\configs\EURUSD MA TEST


This will use the file config.xml or \config.xml in the directory C:\configs\EURUSD MA TEST
in the same way as for backtesting described in section 5.1.
The progress of the genetic algorithm is printed to the console similar to the example below. In
this example, we are using a population of 20 on a single machine with 8 cores. As each backtest
completes, a summary of the results is displayed. The beginning of each line contains three
numbers: the first is the generation number, the second the genome number and the last the
number of genomes in a population. The Compute host is the name of the machine that the
backtest ran on. This is useful for monitoring a cluster of machines to see which machines are quicker
and running more backtests.
C:\DeepThought>DeepThought --genetic-algo C:\DeepThought_Configs\EURUSD_GA
DeepThoughtLib::GeneticAlgo::SubmitToCluster
Submitting job(s)....................
20 job(s) submitted to cluster 41.
Submitted to Condor cluster 41
2014-Jan-12 17:08:59.307337 Info: DeepThoughtLib::GeneticAlgo::WaitForResults Waiting for (20) results for generation 3. Num of jobs is 20. Max wait time is 06:00:00
3/1/20 3003: Obj=-0.102615 Sharpe=-0.102615 PnL=-259.2 dd=-2357.7 num=1616 %=50.5569
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=7 . Time left is 05:38:49.
3/2/20 3006: Obj=-0.474801 Sharpe=-0.474801 PnL=-1489 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=4 . Time left is 05:38:47.
3/3/20 3002: Obj=-1.028 Sharpe=-1.028 PnL=-2684.7 dd=-5193.9 num=1378 %=48.4761
Compute host=Slartibartfast svm-gamma=0 svm-penalty=12 . Time left is 05:38:31.
3/4/20 3004: Obj=-1.37488 Sharpe=-1.37488 PnL=-5296.8 dd=-6997.9 num=1268 %=46.6088
Compute host=Slartibartfast svm-gamma=0 svm-penalty=0 . Time left is 05:38:30.
3/5/20 3014: Obj=-1.34507 Sharpe=-1.34507 PnL=-3374 dd=-4027.3 num=1619 %=48.7338
Compute host=Slartibartfast svm-gamma=-6 svm-penalty=5 . Time left is 05:38:09.
3/6/20 3008: Obj=-1.21692 Sharpe=-1.21692 PnL=-2878.6 dd=-3719.6 num=1619 %=48.2397
Compute host=Slartibartfast svm-gamma=-6 svm-penalty=8 . Time left is 05:34:31.
3/7/20 3001: Obj=-1.3953 Sharpe=-1.3953 PnL=-3371.4 dd=-3995.2 num=1619 %=47.8073
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=11 . Time left is 05:26:46.
3/8/20 3038: Obj=-0.365059 Sharpe=-0.365059 PnL=-1207.7 dd=-4145.4 num=1600 %=51.625
Compute host=Slartibartfast svm-gamma=-5 svm-penalty=0 . Time left is 05:17:44.
3/9/20 3021: Obj=-0.102615 Sharpe=-0.102615 PnL=-259.2 dd=-2357.7 num=1616 %=50.5569
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=7 . Time left is 05:17:22.
3/10/20 3018: Obj=-1.57754 Sharpe=-1.57754 PnL=-4522.9 dd=-4974.1 num=1618 %=48.8257
Compute host=Slartibartfast svm-gamma=-4 svm-penalty=3 . Time left is 05:17:21.
3/11/20 3043: Obj=-0.577919 Sharpe=-0.577919 PnL=-1692.1 dd=-2551.6 num=1596 %=48.7469
Compute host=Slartibartfast svm-gamma=-1 svm-penalty=4 . Time left is 05:17:01.
3/12/20 3022: Obj=-1.028 Sharpe=-1.028 PnL=-2684.7 dd=-5193.9 num=1378 %=48.4761
Compute host=Slartibartfast svm-gamma=0 svm-penalty=4 . Time left is 05:16:59.
3/13/20 3009: Obj=-1.21359 Sharpe=-1.21359 PnL=-3082.6 dd=-4291.5 num=1618 %=48.7021
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=12 . Time left is 05:16:48.
3/14/20 3044: Obj=-0.0356163 Sharpe=-0.0356163 PnL=-83 dd=-3019.5 num=1619 %=50.8956
Compute host=Slartibartfast svm-gamma=-6 svm-penalty=3 . Time left is 05:14:11.
3/15/20 3052: Obj=-0.365059 Sharpe=-0.365059 PnL=-1207.7 dd=-4145.4 num=1600 %=51.625
Compute host=Slartibartfast svm-gamma=-5 svm-penalty=0 . Time left is 05:00:50.
3/16/20 3046: Obj=-0.469054 Sharpe=-0.469054 PnL=-1470.5 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=3 . Time left is 05:00:43.
3/17/20 3048: Obj=-1.57754 Sharpe=-1.57754 PnL=-4522.9 dd=-4974.1 num=1618 %=48.8257
Compute host=Slartibartfast svm-gamma=-4 svm-penalty=3 . Time left is 05:00:30.
3/18/20 3062: Obj=-0.474801 Sharpe=-0.474801 PnL=-1489 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=5 . Time left is 05:00:16.
3/19/20 3056: Obj=-0.469054 Sharpe=-0.469054 PnL=-1470.5 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=3 . Time left is 05:00:15.
3/20/20 3045: Obj=-1.21359 Sharpe=-1.21359 PnL=-3082.6 dd=-4291.5 num=1618 %=48.7021
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=12 . Time left is 04:53:48.
Received enough results (20) for generation 3
All jobs in cluster 41 have been marked for removal
***********************************
Best 20 results for generation 3
***********************************
2024: Obj=1.04909 Sharpe=1.04909 PnL=2929.9 dd=-1373 num=1619 %=53.7986
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=3
2025: Obj=1.04909 Sharpe=1.04909 PnL=2929.9 dd=-1373 num=1619 %=53.7986
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=3
1002: Obj=0.167274 Sharpe=0.167274 PnL=298.4 dd=-1526.8 num=1620 %=60.1852
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=5
1003: Obj=0.0850919 Sharpe=0.0850919 PnL=170.2 dd=-1966.8 num=1620 %=60.6173
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=4
3044: Obj=-0.0356163 Sharpe=-0.0356163 PnL=-83 dd=-3019.5 num=1619 %=50.8956
Compute host=Slartibartfast svm-gamma=-6 svm-penalty=3
3003: Obj=-0.102615 Sharpe=-0.102615 PnL=-259.2 dd=-2357.7 num=1616 %=50.5569
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=7
3021: Obj=-0.102615 Sharpe=-0.102615 PnL=-259.2 dd=-2357.7 num=1616 %=50.5569
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=7
3038: Obj=-0.365059 Sharpe=-0.365059 PnL=-1207.7 dd=-4145.4 num=1600 %=51.625
Compute host=Slartibartfast svm-gamma=-5 svm-penalty=0
3052: Obj=-0.365059 Sharpe=-0.365059 PnL=-1207.7 dd=-4145.4 num=1600 %=51.625
Compute host=Slartibartfast svm-gamma=-5 svm-penalty=0
3046: Obj=-0.469054 Sharpe=-0.469054 PnL=-1470.5 dd=-4545.9 num=1616 %=49.1337


Compute host=Slartibartfast svm-gamma=-2 svm-penalty=3


3056: Obj=-0.469054 Sharpe=-0.469054 PnL=-1470.5 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=3
2002: Obj=-0.474801 Sharpe=-0.474801 PnL=-1489 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=12
2034: Obj=-0.474801 Sharpe=-0.474801 PnL=-1489 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=8
3006: Obj=-0.474801 Sharpe=-0.474801 PnL=-1489 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=4
3062: Obj=-0.474801 Sharpe=-0.474801 PnL=-1489 dd=-4545.9 num=1616 %=49.1337
Compute host=Slartibartfast svm-gamma=-2 svm-penalty=5
2036: Obj=-0.508462 Sharpe=-0.508462 PnL=-1231.8 dd=-3003.5 num=1620 %=49.6296
Compute host=Slartibartfast svm-gamma=-8 svm-penalty=8
2018: Obj=-0.570008 Sharpe=-0.570008 PnL=-1397.2 dd=-2994.2 num=1618 %=50.1236
Compute host=Slartibartfast svm-gamma=-6 svm-penalty=4
2031: Obj=-0.577919 Sharpe=-0.577919 PnL=-1692.1 dd=-2551.6 num=1596 %=48.7469
Compute host=Slartibartfast svm-gamma=-1 svm-penalty=3
3043: Obj=-0.577919 Sharpe=-0.577919 PnL=-1692.1 dd=-2551.6 num=1596 %=48.7469
Compute host=Slartibartfast svm-gamma=-1 svm-penalty=4
1004: Obj=-0.630181 Sharpe=-0.630181 PnL=-1419.7 dd=-2004.3 num=1616 %=57.5495
Compute host=Slartibartfast svm-gamma=-1 svm-penalty=5

6.2.1 Database

A copy of the database must exist on all machines in the cluster in an identical location. Normally
the database(s) are in C:\FX Database and this should be copied to all machines in the cluster.
This is so that the genetic algorithm does not need to send a copy of the data to each node
(there would be 8 copies of the same data on a single machine with 8 cores).

6.3 Genetic Algorithm Results

While the genetic algorithm is running, results are accumulated in the directory given on the command line. In the above example this is C:\configs\EURUSD MA TEST. A separate directory is created for each generation, named generation-1, generation-2, etc. For each individual genome a results.zip file is created containing the generated configuration and all output files. This filename is prefixed with the genome-id.
In the directory containing the configuration template, a file genetic-algo-cache.xml is created and updated each time a backtest completes on a Condor node. This contains the genome-id along with a summary of results and a list of values assigned to the parameters that the genetic algorithm is optimising. A sample of this file is given below. The best results are always at the top. This file is also used as a save point in the event that the genetic algorithm is interrupted.

6.4 Using Recorded Results

If use-recorded-results is set to True in the backtest configuration section, you must ensure that a file named recorded.signals.csv is in the same directory as the configuration. This file is generated by a backtest as explained in section 5.2.

6.5 The Condor Submit File

Condor operates on submit files. These are plain text files that list the jobs to be run on the
cluster. DeepThought generates submit files for each generation. These are created in the same
directory as the genetic algorithm configuration file. You should not normally need to view


these files, and altering them will have no effect as they are always generated by the genetic
algorithm.

6.6 Trouble Shooting

Most problems occur because of a problem with the configuration file. First check the backtest.log file for errors. Other things to check are dates: are the backtest start/stop dates contained within the data? Also check database filenames, and that the database exists in the same directory on all machines in the cluster and is populated. Normally the database(s) are in C:\FX Database and this should be copied to all machines in the cluster.
The genetic algorithm is slightly harder to debug as there is less direct access to what is happening, and there is a reliance on a third-party component (Condor). In the results.zip file of a genome, located in the generation-n directory for generation n, check the backtest.log file for errors. If it is empty, or the problem is not evident, try directly backtesting the configuration file.
If you are using a multi-machine cluster, try disabling firewalls and other things that may prevent network access. You can also check the Condor log file. Although this tends to be a little cryptic, it may provide clues about where to start looking.

Chapter 7

Live and Paper Trading


Once you have been through the research and development process and have found a configuration that you are happy with, the next stage is to paper trade. We strongly suggest doing this before live trading to ensure that paper trading results match (in a statistical sense) your backtested results.
The process for live and paper trading is identical, except that in paper trading orders are not placed in a live market.

7.1 Manual Trading

DeepThought can be traded manually. One use of manual trading is for end-of-day systems
where it is feasible for a human to make every trade manually.
One challenge with manual trading is populating the database. To do this we suggest using
the provided Metatrader EA as described below, but setting the parameter do-live-trade to
false. This will populate the database while placing no trades. The EA can be left running as
it is possible for more than one program to access the database at any one time.
Manual trading is done with two commands. The first (--manual-trade-train-and-persist) will train all models in the configuration and save them in the configuration directory. The command below is an example of manually training from the configuration in C:\DeepThought Configs\EURUSD Strategy 1:
deepthought --manual-trade-train-and-persist C:\DeepThought Configs\EURUSD Strategy 1
The output should simply show Ok if the training could be done. If not, check the log file in the configuration directory for hints on what went wrong. Once the models have been trained the
forecasts can be generated using the --manual-trade-generate-signal option:
deepthought --manual-trade-generate-signal C:\DeepThought Configs\EURUSD Strategy 1
The output will be similar to the following:
DeepThought built on Jan 7 2014 at 16:48:35
BUY
Consensus=25
NumberOfPredictors=45
The output is formatted in this way to make it easy for other scripts to parse the output if
DeepThought manual signals form part of a larger trading strategy. When generating signals
manually the normal sequence of events should be:
1. Run --manual-trade-train-and-persist to generate the initial model.
2. Wait for the candle to complete on the time-frame that you are forecasting on.
3. When the candle completes run --manual-trade-generate-signal and act on the forecast.
4. After the forecast has been processed, re-run --manual-trade-train-and-persist to
re-generate the models on the latest data.
The events are sequenced in this way as forecasting can take a while if you are using a large ensemble. Using the sequence above, there is as much time as it takes a candle to complete in which to train the models.
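As an illustration only, the sketch below shows one way another script could run the signal command and parse this output. The configuration path is the example used above, and the assumption that a SELL signal is printed in the same single-word form as BUY is ours; adapt both to your own setup.

import subprocess

# Example configuration directory (from the command shown above); assumes the
# deepthought executable is on the PATH.
config_dir = r"C:\DeepThought Configs\EURUSD Strategy 1"

# Run the manual signal generation and capture the console output.
output = subprocess.check_output(
    ["deepthought", "--manual-trade-generate-signal", config_dir]).decode("utf-8")

signal = None
fields = {}
for line in output.splitlines():
    line = line.strip()
    if line in ("BUY", "SELL"):          # the one-word signal line
        signal = line
    elif "=" in line:                     # e.g. Consensus=25, NumberOfPredictors=45
        name, _, value = line.partition("=")
        fields[name] = value

print(signal)
print(fields.get("Consensus"))
print(fields.get("NumberOfPredictors"))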

7.2 Automated Trading

DeepThought is able to auto-trade with Metatrader 4 using the supplied Expert Advisor. Links to other trading platforms will be added over time. Please contact us with a request to create a link to the platform you are using (if not Metatrader); the more requests a platform receives, the higher its implementation priority will become.
An Expert Advisor (EA), Metatrader's term for an automated trading script, is provided which accesses the DeepThought DLL. The source of the EA is provided so you can add your own trading logic to the signals generated by DeepThought; it performs only basic trading of those signals and is intended as a starting point for your own trading rules. You can backtest any trading logic using recorded signals following the process described in section 5.2 on page 12.
The EA, named DeepThought.mq4, is in the Metatrader directory of the DeepThought installation. It must be placed in the experts directory where Metatrader is installed. For example, if your broker was InterbankFX this directory would be
C:\Program Files (x86)\InterbankFX\experts
The DLL, named DeepThought.Dll, located in the DeepThought installation directory, needs to be copied to the experts\library directory where Metatrader is installed. For example, if your broker was InterbankFX this directory would be
C:\Program Files (x86)\InterbankFX\experts\library
When you next start Metatrader the DeepThought EA should be available in the Experts folder in Metatrader. Add it to a chart in the normal way. It can be added to any time-frame, as ticks are used to generate candles; however, we suggest adding it to the 1 minute time frame.
You also need to place the licence file you received when purchasing DeepThought in the
same directory as the Metatrader executable. For example if your broker was InterbankFX this
directory would be
C:\Program Files (x86)\InterbankFX\
If you are trading several instruments, we suggest a separate Metatrader instance for each
instrument as Metatrader will likely crash when loading the same DLL into more than one EA.


To create a new instance of Metatrader, simply copy the Metatrader installation to another
location so each instance has a completely separate set of files.
Table 7.1: DeepThought parameters for the Metatrader EA.
files location (string)
    The directory where the XML configuration is located.

gmt offset (int)
    The hour offset from GMT of your broker. If your historical data is in UTC (i.e. GMT) time then you will need an offset to ensure there are no gaps in the data caused by timezone changes.

max trade duration seconds (int)
    Automatically close trades after this many seconds. Set to 0 to leave trades open (they will close with an opposite signal).

deep thought db (string, default EURUSDm1)
    The identifier of the 1 minute bar-series in DeepThought that price ticks will be sent to, to build 1 minute candles.

trade lot size (double, default 0.1)
    The trade size in lots.

do limit orders (bool, default true)
    Use limit orders. If set to false, market orders will be used.

limit order offset (double, default 0.0002)
    The price offset to use for placing limit orders.

magic number (int, default 1600)
    The number that Metatrader inserts with trade info. Enables you to track which trades came from what system if you are running multiple systems.

do live trade (bool, default false)
    Set to true for live trading; set to false for paper trading.

add to position (bool, default true)
    If set to true, new positions will be added to existing positions, sometimes known as pyramiding.

7.3 Trouble Shooting

Metatrader can be unstable when it is working with external DLLs. It can be particularly bad
when changing parameters in the EA and a Metatrader crash is unfortunately all too common.
We hope that these stability issues will be fixed in Metatrader 5.
If you are having problems changing parameters in the DeepThought EA, follow these steps:
1. Save the EA parameters.
2. Delete the EA from the chart.
3. Exit Metatrader.
4. Open the Windows Task Manager and check whether Terminal is still running. If it is, highlight it and click End Process.
5. Restart Metatrader.
6. Add the EA back to the chart, load the parameters saved in step 1 and make the changes.
This may seem a bit odd, but since the stability of Metatrader is beyond our control this is all we can offer. If Metatrader still crashes, a reboot of the computer is probably required.

Chapter 8

Python Scripting
DeepThought uses the Python language for scripting. There are no restrictions on the Python scripts. DeepThought uses the Python system installed on your PC, so you are able to use whatever libraries and modules (e.g. scipy, numpy, pandas, etc.) you require. When a function is called from DeepThought, an interface object is passed to your script, enabling it to access elements in DeepThought such as candle data and to pass back values such as the forecast or training label value.
The use of embedded Python enables unlimited customisation in the following areas:
1. Features - custom features from any datasource accessible to your Python scripts.
2. Target and trigger - define a trigger for when forecasts are made and define the target that the predictors are forecasting.
3. Predictor - works with the built-in machine learning predictors, or supply your own. A predictor does not need to be machine learning based, essentially allowing DeepThought to be used as a standard algorithmic platform.
4. Signal Generator - combine forecasts from the predictors to produce buy/sell signals.
The use of Python is optional and it is entirely possible to produce a working system without
the use of Python.
There is a <python> section in the configuration file.
This contains one or more
<script-filename> entries. Each script can contain one or more functions, or all functions
can be in a single script file. If you are using multiple script files, they all operate in the same
namespace. An example <python> section is given below.
<python>
  <script-filename>target_num_pips.py</script-filename>
</python>

Your Python scripts reside in the same directory as the config.xml configuration file.

8.1 Python Installation

DeepThought uses Python version 2.7. It uses the 32-bit version on Windows. The installation process installs an optional Python distribution, MiniConda. This is a cut-down version of the
free Anaconda distribution available at http://store.continuum.io/cshop/anaconda/. You


can bypass the installation provided with DeepThought and install the full Anaconda distribution, or install another distribution. The only requirement is that it is version 2.7 32-bit
(Windows). The Linux and MacOS versions of DeepThought use 64-bit. We use the 2.7 version
rather than the 3.3 version as most large distributions (e.g. Anaconda, Python(x,y)) are still
based on 2.7.
We strongly recommend using the Numpy and Pandas libraries for numerical and time-series processing and Matplotlib for visualisation. These are installed by default with Anaconda and, if you are using Miniconda, can be installed with the following at a command prompt:
conda install pandas
conda install matplotlib

The Numpy library will be installed automatically as both Pandas and Matplotlib depend on it.
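As a quick check that the interpreter DeepThought will use matches these requirements, you can run a few lines of Python from a command prompt. This is only a sketch; the expected values are the ones described above.

import sys
import struct

print(sys.version)                  # expect a 2.7.x release
print(struct.calcsize("P") * 8)     # pointer size in bits: 32 on Windows, 64 on Linux/MacOS

# Confirm the recommended libraries are importable.
import numpy
import pandas
import matplotlib
print(numpy.__version__, pandas.__version__, matplotlib.__version__)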

8.2 Python Feature

A feature comprises one or more numerical values (attributes). You can have as many Python-generated features as you wish. Each feature must be generated using a unique function name.
To add a Python-generated feature to your model, add a feature of type python-script. A complete example configuration is given below.
<config>
  <bar-series>
    <identifier>EURUSDm1</identifier>
    <bar-series-type>const-time</bar-series-type>
    <source type="database">eurusd.db</source>
    <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
    <average-spread>0.0</average-spread>
    <bar-duration-minutes>1</bar-duration-minutes>
    <const-bar-price>0.0</const-bar-price>
  </bar-series>
  <bar-series>
    <identifier>EURUSDh4</identifier>
    <bar-series-type>const-time</bar-series-type>
    <source type="bar-series">EURUSDm1</source>
    <history-source-type>bar-series</history-source-type>
    <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
    <average-spread>0.0</average-spread>
    <bar-duration-minutes>240</bar-duration-minutes>
    <delay-minutes-offset>0</delay-minutes-offset>
  </bar-series>
  <bar-series-collection>
    <data-file-dir>C:\FX_Database</data-file-dir>
  </bar-series-collection>
  <python>
    <script-filename>ema_diff_feature.py</script-filename>
  </python>
  <model>
    <identifier>h4-features</identifier>
    <target>
      <type>bars-in-future</type>
      <identifier>target-1-bar-in-future</identifier>
      <bar-series>EURUSDh4</bar-series>
      <number>1</number>
      <price-type>up-down</price-type>
    </target>
    <feature>
      <type>hour-of-day</type>
      <period ga-subst="time-of-day">h4</period>
    </feature>
    <feature>
      <type>python-script</type>
      <set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name>
      <get-number-of-attributes-func-name>GetNumberOfAttributes</get-number-of-attributes-func-name>
      <get-features-func-name>GetFeatures</get-features-func-name>
      <parameter name="ma_short_period" type="int">20</parameter>
      <parameter name="ma_long_period" type="int">50</parameter>
      <identifier>python-test-1</identifier>
      <scale-type>min-max</scale-type>
    </feature>
    <feature>
      <type>bar-attribute</type>
      <attribute-type>average-close</attribute-type>
      <number ga-subst="average-close-num">30</number>
      <value-type>diff</value-type>
      <bar-series>EURUSDh4</bar-series>
      <scale-type>min-max</scale-type>
    </feature>
  </model>
  <svm-predictor>
    <identifier>svm-c-rbf</identifier>
    <model>h4-features</model>
    <continuous-tune>false</continuous-tune>
    <params> <!-- 56.1% -->
      <penalty ga-subst="svm-penalty">512</penalty>
      <gamma ga-subst="svm-gamma">0.0625</gamma>
      <forecast-weight>1.0</forecast-weight>
      <svm-type>SVC</svm-type>
      <kernel>rbf</kernel>
    </params>
    <num-training-observations>500</num-training-observations>
    <num-training-skip>1</num-training-skip>
  </svm-predictor>
  <predictor-ensemble>
    <retrain-period>Weekly</retrain-period>
  </predictor-ensemble>
  <signal-generator>
    <entry-times>
      <hour>all</hour>
      <day-of-week>all</day-of-week>
    </entry-times>
    <target-trigger>h4-features</target-trigger>
    <entry-threshold>0.0</entry-threshold>
    <take-profit>0.0</take-profit>
    <stop-loss>0.0</stop-loss>
    <break-even>0.0</break-even>
    <trade-bar-series>EURUSDm1</trade-bar-series>
    <reverse-all>False</reverse-all>
  </signal-generator>
  <trader>
    <hold-minutes>0</hold-minutes>
    <hold-bars>0</hold-bars>
    <max-drawdown>100000</max-drawdown>
    <close-at-weekend>False</close-at-weekend>
    <scale-out>False</scale-out>
    <max-position>100</max-position>
    <limit-orders offset="0.0">False</limit-orders>
  </trader>
  <backtest>
    <start-date>2013-01-01</start-date>
    <stop-date>2014-01-01</stop-date>
    <use-recorded-signals>False</use-recorded-signals>
    <display-progress>True</display-progress>
    <execute-when-complete>python C:\DeepThought\python\analyse_backtest_results.py %CONFIG_LOCATION%</execute-when-complete>
  </backtest>
</config>

Listing 8.1: Python feature configuration.


Here we have defined three functions:


SetParameterValue
We can set parameters in the configuration file which will be passed to this function once on
spin-up. These parameters can be controlled by the Genetic Algorithm described in Chapter
6 on page 14. The parameters are defined in the configuration using <parameter> entries as
shown in the example above.
GetNumberOfAttributes
A function that returns the number of attributes that make up the feature.
GetFeatures
A function that is responsible for generating the actual numerical attributes. A DeepThought
interface object is provided to this function to pass the attributes back to DeepThought.
An example script that implements the above functions is given below.
import pandas as pd
import numpy as np
import sys

ma_short_period = None # int
ma_long_period = None # int
number_of_diffs = 30
num_required_candles = None

def ExpMovingAverage(values, period):
    weights = np.exp(np.linspace(-1., 0., period))
    weights /= weights.sum()
    ema = np.convolve(values, weights)[:len(values)]
    ema[:period] = ema[period]
    return ema

def GetNumberOfAttributes(deep_thought_intf):
    deep_thought_intf.SetNumAttributes(2)

def SetParameterValue(param_name, param_value):
    global ma_short_period
    global ma_long_period
    global num_required_candles
    if param_name == "ma_short_period":
        ma_short_period = param_value
    elif param_name == "ma_long_period":
        ma_long_period = param_value
        num_required_candles = ma_long_period + number_of_diffs + 2
    else:
        print("Unknown parameter:", param_name)

def GetFeatures(deep_thought_intf):
    if (ma_short_period == None):
        print("Error: ma_short_period has not been set!")
        return -1

    csv_file_name = deep_thought_intf.GetLastBars(num_required_candles, "EURUSDh4")
    candles = pd.read_csv(csv_file_name, index_col=False)

    if len(candles.index) < num_required_candles:
        return -1

    close_values = candles['close'].values
    reversed_close_values = close_values[::-1]

    ema_short = ExpMovingAverage(reversed_close_values, ma_short_period)
    ema_long = ExpMovingAverage(reversed_close_values, ma_long_period)

    for i in range(1, number_of_diffs, 1):
        deep_thought_intf.SetAttribute(i-1, ema_short[-i] - ema_long[-i])

Listing 8.2: Python script example defining a feature.


This script uses the Python library pandas, which provides data analysis functions including a data-frame, and numpy for numeric analysis. An interface object deep_thought_intf is passed to the GetNumberOfAttributes() and GetFeatures() functions. This interface is the mechanism for passing data back and forth between DeepThought and your scripts.
The methods of the deep_thought_intf interface object are detailed in table 8.1 on page 34.

8.3 Python Target

The target script has two functions: to detect a forecast trigger and to label a training instance with a target. Detecting a forecast trigger can be as simple as forecasting each 4-hourly bar, or more complex, such as only forecasting when a pair of moving averages has crossed. If a trigger has been detected, your GetIsTargetTrigger() function returns True and a training sample is created. If the criteria have not been met, your script returns False.
Your script must also supply a GetTarget() function. This function is passed the candle at observation time for a sample where the target trigger was met, along with the current candle. Your function can then compare the two and decide whether the target criteria have been met.
The following examples should make this a little clearer.
This example is the setup for a system that forecasts whether a 20 pip target will be hit first by price moving up or price moving down. A new forecast is created every four hours, so that every four hours this system will enter a trade with a target of +20 pips for an up forecast and -20 pips for a down forecast. To do this the scripts use the 1 minute candles. The script is set up in the config in the <python> section. The target is configured in the <model> section. More detail on configuration is given in chapter 9 on page 35. Detail on the <model> configuration section is on page 41.
<python>
  <script-filename>target_num_pips.py</script-filename>
</python>
<model>
  <identifier>h4-features</identifier>
  <target>
    <type>python-script</type>
    <identifier>target-next-pip-movement</identifier>
    <bar-series>EURUSDm1</bar-series>
    <parameter name="pip-movement" type="double">20.0</parameter>
    <check-target-trigger-func-name>GetIsTargetTrigger</check-target-trigger-func-name>
    <get-target-func-name>GetTarget</get-target-func-name>
    <set-parameter-value-func-name>TargetSetParameterValue</set-parameter-value-func-name>
  </target>
  <feature>
    <type>hour-of-day</type>
    <period ga-subst="time-of-day">h4</period>
  </feature>
  <feature>
    <type>bar-attribute</type>
    <attribute-type>average-close</attribute-type>
    <number ga-subst="average-close-num">30</number>
    <value-type>diff</value-type>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>5</period>
    <number ga-subst="average-close-num">30</number>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>10</period>
    <number ga-subst="average-close-num">30</number>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>20</period>
    <number ga-subst="average-close-num">30</number>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>50</period>
    <number ga-subst="average-close-num">30</number>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>100</period>
    <number ga-subst="average-close-num">30</number>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
</model>

Listing 8.3: Python target configuration.


The following is the listing of the target_num_pips.py script defined in the configuration file. We have defined a single parameter pip-movement in the configuration. This is passed to the script using the TargetSetParameterValue() function when DeepThought starts up.
The function GetIsTargetTrigger() tests to see if the trigger criteria have been reached. As this script is called every time a 1 minute candle completes (set up in the config using the <bar-series> element in the <target> section), the script must check to see if a four hour candle has just completed.
The function GetTarget() sets the target to either -1.0 or 1.0 if a target has been reached. If no target has been reached, a target is not set.
import pandas as pd
import numpy as np
import sys
from datetime import datetime

num_pips = None # double
last_close_datetime = "none"

def TargetSetParameterValue(param_name, param_value):
    if param_name == "pip-movement":
        global num_pips
        num_pips = param_value/10000.0

def GetIsTargetTrigger(deep_thought_intf, latest_candle):
    # Check if we are at the close of an H4 candle. We need to do it this way as we are
    # triggering from M1 candles.
    csv_file_name = deep_thought_intf.GetLastBars(2, "EURUSDh4")
    candles = pd.read_csv(csv_file_name, index_col=False)
    if len(candles.index) < 1:
        return False
    candle_close_datetime = candles.iloc[0]['close_date_time']
    global last_close_datetime
    if (last_close_datetime != candle_close_datetime):
        last_close_datetime = candle_close_datetime
        return True
    return False

def GetTarget(deep_thought_intf, candle_at_observation, latest_candle):
    if (latest_candle.ClosePrice() - candle_at_observation.ClosePrice() >= num_pips):
        deep_thought_intf.SetTarget(1.0)
    elif (latest_candle.ClosePrice() - candle_at_observation.ClosePrice() <= -num_pips):
        deep_thought_intf.SetTarget(-1.0)

Listing 8.4: Python script example defining a target.

8.4 Python Predictor

A predictor is a component that takes training data, performs training, builds a model and, when given an unlabelled example, performs a forecast. There are built-in predictors in DeepThought which implement algorithms such as Support Vector Machines, Gradient Boosted Trees, Neural Networks, etc. You can use these in conjunction with your own predictor, or simply use your own predictor(s) by themselves.
Your predictor does not have to implement a machine learning algorithm. You could, for example, use a simple moving average cross as a predictor. You could also use Python's sklearn library (http://scikit-learn.org) to use other machine learning algorithms.
A complete configuration file using a single Python predictor is given below, followed by the Python script predictor.py. This example implements a moving-average cross predictor. It will trade for one bar when a fast moving average crosses a slow moving average. No machine learning is used; the example is for demonstration purposes only and we don't recommend the use of a moving average cross by itself. In the example below, the model does not require any features as the Python script only requires the bar-series data (close prices), which it gets from the deep_thought_intf object.
<config>
  <bar-series>
    <identifier>EURUSDm1</identifier>
    <bar-series-type>const-time</bar-series-type>
    <source type="database">eurusd.db</source>
    <load-from-date>2012-06-01</load-from-date>
    <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
    <average-spread>0.0</average-spread>
    <bar-duration-minutes>1</bar-duration-minutes>
    <const-bar-price>0.0</const-bar-price>
  </bar-series>
  <bar-series>
    <identifier>EURUSDh4</identifier>
    <bar-series-type>const-time</bar-series-type>
    <source type="bar-series">EURUSDm1</source>
    <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
    <average-spread>0.0</average-spread>
    <bar-duration-minutes>240</bar-duration-minutes>
    <delay-minutes-offset>0</delay-minutes-offset>
  </bar-series>
  <bar-series-collection>
    <data-file-dir>C:\FX_Database</data-file-dir>
  </bar-series-collection>
  <python>
    <script-filename>predictor.py</script-filename>
  </python>
  <model>
    <identifier>h4-features</identifier>
    <target>
      <type>bars-in-future</type>
      <identifier>target-1-bar-in-future</identifier>
      <bar-series>EURUSDh4</bar-series>
      <number>1</number>
      <price-type>up-down</price-type>
    </target>
  </model>
  <python-predictor>
    <model>h4-features</model>
    <identifier>python-predictor-h4</identifier>
    <predictor-weight>1.0</predictor-weight>
    <set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name>
    <predict-func>Predict</predict-func>
    <train-func>Train</train-func>
    <parameter name="ma-long" type="int">20</parameter>
    <parameter name="ma-short" type="int">5</parameter>
    <num-training-observations>25</num-training-observations>
    <num-training-skip>1</num-training-skip>
  </python-predictor>
  <predictor-ensemble>
    <retrain-each-bar>True</retrain-each-bar>
  </predictor-ensemble>
  <signal-generator>
    <entry-times>
      <hour>all</hour>
      <day-of-week>all</day-of-week>
    </entry-times>
    <target-trigger>h4-features</target-trigger>
    <entry-threshold>0.0</entry-threshold>
    <take-profit>0.0</take-profit>
    <stop-loss>0.0</stop-loss>
    <break-even>0.0</break-even>
    <exit-all-hour>-1</exit-all-hour>
    <trade-bar-series>EURUSDm1</trade-bar-series>
    <reverse-all>False</reverse-all>
  </signal-generator>
  <trader>
    <hold-minutes>0</hold-minutes>
    <hold-bars>0</hold-bars>
    <max-drawdown>100000</max-drawdown>
    <close-at-weekend>False</close-at-weekend>
    <scale-out>False</scale-out>
    <max-position>100</max-position>
    <limit-orders offset="0.0">False</limit-orders>
  </trader>
  <backtest>
    <start-date>2013-01-01</start-date>
    <stop-date>2014-01-01</stop-date>
    <use-recorded-signals>False</use-recorded-signals>
    <display-progress>True</display-progress>
  </backtest>
</config>

Listing 8.5: Python predictor configuration example.


import numpy as np
import pandas as pd

# global variables
ma_long_period = None #int
ma_short_period = None #int

def ExpMovingAverage(values, period):
    weights = np.exp(np.linspace(-1., 0., period))
    weights /= weights.sum()
    ema = np.convolve(values, weights)[:len(values)]
    ema[:period] = ema[period]
    return ema

def Sign(value):
    if (value >= 0):
        return (1)
    else:
        return (-1)

def SetParameterValue(param_name, param_value):
    global ma_long_period
    global ma_short_period
    if param_name == "ma-long":
        ma_long_period = param_value
    if param_name == "ma-short":
        ma_short_period = param_value

def Train(deep_thought_intf, training_csv):
    # This example does not need to train anything but we could use
    # the following line to read a training set into a Pandas data frame:
    # training_df = pd.read_csv(training_csv, index_col=False)
    return True

def Predict(deep_thought_intf, attributes_csv):
    # Get an array of close prices
    csv_file_name = deep_thought_intf.GetLastBars(ma_long_period + 4, "EURUSDh4")
    candles = pd.read_csv(csv_file_name, index_col=False)
    close_values = candles['close'].values
    reversed_close_values = close_values[::-1]

    # As numpy convolve (moving average) calculates from lowest index to highest,
    # we must reverse the array of values as a bar series has the most recent
    # values with the lowest index and we want to compute the moving average
    # moving forward in time (i.e. from the back of the array of close prices
    # forwards).
    ema_short = ExpMovingAverage(reversed_close_values, ma_short_period)
    ema_long = ExpMovingAverage(reversed_close_values, ma_long_period)

    # Calculate the difference between the moving averages at the most
    # recent candle, and the one before that
    diff_current = ema_short[-1] - ema_long[-1]
    diff_previous = ema_short[-2] - ema_long[-2]

    # Look for a cross. If we find one, predict in the direction of
    # the cross.
    if Sign(diff_current) != Sign(diff_previous):
        deep_thought_intf.SetForecast(Sign(diff_current))
    else:
        # Not strictly necessary as the forecast defaults to 0 if
        # not set, but set here for completeness.
        deep_thought_intf.SetForecast(0)

Listing 8.6: Python script example for a predictor.

8.5 Python Signal Generation

The signal generator is the component that transforms the forecasts produced by the predictors
into buy and sell signals. It also controls trading parameters such as take profit and stop loss.
More detail on the signal generator can be found in section 9.8 on page 70.
As there is only one signal generator, you only need to provide optional Python function names
to the signal-generator component. An example is given below.
<python>
  <script-filename>signal_generator.py</script-filename>
</python>
<signal-generator>
  <entry-times>
    <hour>all</hour>
    <day-of-week>all</day-of-week>
  </entry-times>
  <target-trigger>h4-features</target-trigger>
  <set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name>
  <combine-forecasts-func-name>CombineForecasts</combine-forecasts-func-name>
  <parameter name="threshold" type="double">20.0</parameter>
  <trade-bar-series>EURUSDm1</trade-bar-series>
</signal-generator>

Listing 8.7: Python signal generator configuration example.


# Simple demonstration of the signal generator calling a Python function.
#
# This example simply buys/sells if the combined forecasts of the
# predictor ensemble have exceeded a threshold given by the
# "threshold" parameter in the configuration file.

import pandas as pd
import numpy as np

# Globals
threshold = None # double

# We could make these parameters in the configuration, but hard code them here
# for the moment.
take_profit = 20.0
stop_loss = 25.0

def SetParameterValue(param_name, param_value):
    if param_name == "threshold":
        global threshold
        threshold = param_value
        print("set threshold to ", threshold)

def CombineForecasts(deep_thought_intf, predictions_csv):
    # read the predictions into a Pandas dataframe
    predictions = pd.read_csv(predictions_csv)

    # Remove all limit orders and tag them in the log file as "Missed"
    deep_thought_intf.DeleteLimitOrders("Missed")
    average_forecast = predictions.forecast.mean()

    if average_forecast >= threshold:
        deep_thought_intf.SendBuyOrder("EURUSDm1", take_profit, stop_loss)
    elif average_forecast <= -threshold:
        deep_thought_intf.SendSellOrder("EURUSDm1", take_profit, stop_loss)
    else:
        deep_thought_intf.CloseAllTrades("Threshold not reached")

Listing 8.8: Python script example for the signal generator.

8.6 The deep_thought_intf Interface Object

Some of the Python functions described above are passed a deep_thought_intf object. This is an interface that is used between DeepThought and your script to pass values back and forth. It is used to set values such as targets and feature values, as well as enabling your script to access historical candles, forecasts, etc. Table 8.1 summarises the methods provided on this object.
Note that not all methods would be used in a given function. For example, the functions used to set features should not set targets; in fact, if you do this the target will be ignored. You would need to set the target in the appropriate target function.


Table 8.1: Summary of the deep_thought_intf interface object

GetLastBars(num_bars, bar_series)
    Returns the file name of a CSV file containing the last num_bars candles of the bar series bar_series.

GetNumAttributes()
    Returns the number of attributes of this feature. Set using SetNumAttributes().

SetAttribute(index, value)
    Set the value of the attribute with the given index. Indexes are zero indexed.

SetNumAttributes(num_attributes)
    Set the number of attributes of this feature.

SetTarget(value)
    Set the target value if a target has been reached when a GetTarget() function is called.

SetForecast(value)
    Set the forecast value when a Predict() function has been called.

SendBuyOrder(bar_series_id, take_profit, stop_loss)
    Send a buy order to the bar series specified by bar_series_id at the current market price. Optionally set take_profit or stop_loss to be non-zero if required.

SendSellOrder(bar_series_id, take_profit, stop_loss)
    Send a sell order to the bar series specified by bar_series_id at the current market price. Optionally set take_profit or stop_loss to be non-zero if required.

CloseAllTrades(comment)
    Close all open trades with an optional comment. The comment appears in the log file.

DeleteLimitOrders(comment)
    Remove all unfilled limit orders with an optional comment. The comment appears in the log file.

Chapter 9

Configuration Details
DeepThought is driven by XML configuration files. A GUI will be available in a future version.
The XML is divided into sections as listed in table 9.1. Some sections are required, some
are optional and some can contain their own sections. Also, some sections can have only one
definition (e.g. bar-series-collection), while others can have as many as desired, e.g. model.
Each section is detailed individually in this chapter.
Where the option value is a string, e.g. True, False, RBF, the text is case insensitive. You can check the first part of the log file to see what default values were used for missing settings.
Table 9.1 details the configuration sections.

9.1 bar-series

A bar-series is the raw data. For Forex trading, DeepThought operates by using 1 minute candles to generate longer duration bars (candles). Renko bars can also be generated. An example configuration snippet that generates 4 hour (240 minute) candles is:
<bar-series>
  <identifier>EURUSDm1</identifier>
  <bar-series-type>const-time</bar-series-type>
  <source type="database">eurusd.db</source>
  <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
  <average-spread>1.5</average-spread>
  <bar-duration-minutes>1</bar-duration-minutes>
</bar-series>
<bar-series>
  <identifier>EURUSDh4</identifier>
  <bar-series-type>const-time</bar-series-type>
  <source type="bar-series">EURUSDm1</source>
  <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
  <average-spread>0.0</average-spread>
  <bar-duration-minutes>240</bar-duration-minutes>
  <delay-minutes-offset>0</delay-minutes-offset>
</bar-series>

In the above example, a pip as defined by the broker is a price change of 0.0001. Therefore we
must multiply by 10000 to use pips in a human readable form, so a price change of 0.0005 is 5
pips. This is controlled by the price-to-pip-multiplier setting. It exists entirely to make
things easier to read for humans. When this is set, all price movements in the configuration
must be set accordingly. So a take profit of 20 pips can be set as 20 rather than 0.002. We could
set price-to-pip-multiplier to 1.0 (the default if not defined) and we would need to enter a
20 pip take profit as 0.002. All output files use this multiplier.
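As a small worked illustration of this arithmetic (the prices below are made up):

price_to_pip_multiplier = 10000.0

# A broker pip of 0.0001 means a 0.0005 price change is 5 pips.
price_change = 0.0005
print(price_change * price_to_pip_multiplier)        # roughly 5.0 pips

# With the multiplier set, a 20 pip take profit is entered as 20 in the
# configuration; with the default multiplier of 1.0 it would be entered as 0.002.
take_profit_pips = 20
print(take_profit_pips / price_to_pip_multiplier)    # 0.002 in raw price terms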


Table 9.1: Sections in the XML configuration file


bar-series
    Requires at least one. Normally there are at least two: one to define the 1 minute candles, and one to define a longer duration candle series that is used by the predictor.

bar-series-collection
    Required, and only one definition allowed. Defines parameters common to all bar-series, e.g. database locations.

python
    Optional. Only required if using Python scripting for one or more other components.

model
    Requires at least one, and can have as many as desired. You would normally have several in an ensemble learning setting. Defines the features to be used by a predictor. Different predictors can use the same model.

svm-predictor
    Optional. Defines the parameters for a Support Vector Machine predictor.

linear-svm-predictor
    Optional. Defines the parameters for a Linear Support Vector Machine predictor.

gbt-predictor
    Optional. Defines the parameters for a Gradient Boosted Tree predictor.

random-forest-predictor
    Optional. Defines the parameters for a Random Forest predictor.

extremely-randomised-trees-predictor
    Optional. Defines the parameters for an Extremely Randomised Trees predictor.

predictor-ensemble
    Required and only one definition allowed. Defines parameters common to all predictors.

signal-generator
    Required and only one definition allowed. Defines how forecasts from individual predictors are combined to create a signal (buy, sell, do nothing).

trader
    Required and only one definition allowed. Defines trading parameters such as take-profit and stop-loss.

backtest
    Only required if backtesting. Defines parameters for backtesting such as start and stop dates.

genetic-algo
    Only required if running a genetic algorithm. Parameters which control a genetic algorithm for parameter search. Detailed in chapter 6 on page 14.

Each bar-series has an identifier parameter. This is used throughout the configuration by
other sections that need to use bar series data.
The first bar-series defined has a source type parameter of database. This means it will
use the database defined in source in the directory specified in the
bar-series-collection section.
The second bar-series defined has a bar-series-type of const-time. This means it will
generate constant duration candles with the number of minutes defined by the setting
bar-duration-minutes. In this example 240 minutes is used to generate H4 candles. We could equally use 90 to generate 90 minute candles, 20 to generate 20 minute candles, etc. We are not limited to the standard candle durations. Also defined is the source type of bar-series and the source of EURUSDm1. This means
it will source its data from the previously defined bar-series. The delay-minutes-offset is
set to 0 in this example. This means that the candle series will be generated from 00:00 on the
date of the first candle in the historical set. We could use a setting of 15 for example, meaning


that the candles will be generated 15 minutes later than the previous setting.
Table 9.2 illustrates the effect of this setting using 90 minute candles. It can be used as a way of not starting candles until news announcements have been absorbed by the market.
Table 9.2: The effect of the delay-minutes-offset parameter on intraday candles.
Candle Number   Offset 0                  Offset 15
                Open Time   Close Time    Open Time   Close Time
1               00:00       01:30         00:15       01:45
2               01:30       03:00         01:45       03:15
3               03:00       04:30         03:15       04:45
4               04:30       06:00         04:45       06:15
5               06:00       07:30         06:15       07:45
6               07:30       09:00         07:45       09:15
7               09:00       10:30         09:15       10:45
8               10:30       12:00         10:45       12:15
9               12:00       13:30         12:15       13:45
10              13:30       15:00         13:45       15:15
11              15:00       16:30         15:15       16:45
12              16:30       18:00         16:45       18:15
13              18:00       19:30         18:15       19:45
14              19:30       21:00         19:45       21:15
15              21:00       22:30         21:15       22:45
16              22:30       00:00         22:45       00:15
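The open times in Table 9.2 can be reproduced with a few lines of Python. This is only an illustration of how the offset shifts the candle boundaries; the start date used is arbitrary.

from datetime import datetime, timedelta

def candle_open_times(first_date, duration_minutes, offset_minutes, count):
    # Candles start at 00:00 on the first date plus the offset, then repeat
    # every duration_minutes.
    start = datetime(first_date.year, first_date.month, first_date.day) + timedelta(minutes=offset_minutes)
    return [start + timedelta(minutes=duration_minutes * i) for i in range(count)]

# 90 minute candles with a 15 minute offset, as in Table 9.2.
for open_time in candle_open_times(datetime(2014, 1, 6), 90, 15, 4):
    print(open_time.strftime("%H:%M"))   # 00:15, 01:45, 03:15, 04:45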

9.1.1 Renko Bars

DeepThought can generate Renko bars. These are bars where the price movement is constant and the duration is variable. Two types are available, the difference being when a new bar is generated. Figures
9.1 and 9.2 illustrate the difference between the two types.

Figure 9.1: Type 1 Renko Bars

Figure 9.2: Type 2 Renko Bars


For type 1 Renko bars, a new bar is created when the price moves up or down by const-bar-price pips. For type 2 bars a new bar is created when the price moves const-bar-price pips above the previous high, or const-bar-price pips below the previous low. Note that for Renko bars the high is equal to the open for a down bar and to the close for an up bar, and conversely for the low. For Renko type 2 bars the price must move twice const-bar-price pips when the bar reverses direction relative to the previous bar. This can have the effect of twice the loss (or gain) if a reversal forecast is wrong (or correct) compared with forecasting a Renko bar in the same direction as the previous bar. In figures 9.1 and 9.2 note the relative positions of the bars labelled 1 and 2.
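To make the type 1 rule concrete, the following sketch builds type 1 Renko bar closes from a made-up price series. It illustrates the bar-building rule only and is not DeepThought's implementation.

def renko_type1_closes(prices, bar_size):
    # A new bar closes every time the price moves bar_size away (up or down)
    # from the close of the previous bar.
    closes = [prices[0]]
    for price in prices[1:]:
        while abs(price - closes[-1]) >= bar_size:
            direction = 1 if price > closes[-1] else -1
            closes.append(round(closes[-1] + direction * bar_size, 4))
    return closes

# 20 pip bars (0.0020 in raw price terms) on a made-up EURUSD price path.
prices = [1.3000, 1.3012, 1.3025, 1.3041, 1.3030, 1.2995, 1.2979]
print(renko_type1_closes(prices, 0.0020))
# [1.3, 1.302, 1.304, 1.302, 1.3, 1.298]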
A configuration snippet that generates Renko bars with a 20 pip price movement is given below.
<bar-series>
  <identifier>EURUSDm1</identifier>
  <bar-series-type>const-time</bar-series-type>
  <source type="database">eurusd.db</source>
  <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
  <average-spread>1.5</average-spread>
  <bar-duration-minutes>1</bar-duration-minutes>
</bar-series>
<bar-series>
  <identifier>EURUSDh4</identifier>
  <bar-series-type>const-price-method-2</bar-series-type>
  <source type="bar-series">EURUSDm1</source>
  <price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
  <average-spread>0.0</average-spread>
  <const-bar-price>20.0</const-bar-price>
</bar-series>

Fixed duration and Renko bars can be mixed.

9.1.2 Summary of bar-series Options

Table 9.3 summarises the options available when defining bar-series objects.
Table 9.3: bar-series configuration options.
identifier
    A unique identifier that identifies this bar series to other configuration sections.

bar-series-type
    One of:
    const-time            for fixed duration candles.
    const-price-method-1  for type 1 Renko bars.
    const-price-method-2  for type 2 Renko bars.

source type
    One of:
    database    if the bar series is 1 minute candles stored in a database.
    bar-series  if the bar series is generated from a 1 minute source.
    The source value defines which database or bar-series to use.

price-to-pip-multiplier
    The multiplier applied to the smallest price change to give 1 pip. For example, if a pip is defined (by the broker) to be a price change of 0.0001, the multiplier would be 10000 to get 1 pip.

average-spread
    The average spread in pips to use during backtesting and paper-trading.

bar-duration-minutes
    If the bar-series-type has been defined as const-time, this option is mandatory. It is the duration of a candle in minutes.

const-bar-price
    If the bar-series-type has been defined as either const-price-method-1 or const-price-method-2, this is the price movement, in pips, that defines a bar.

delay-minutes-offset
    Optional parameter if the bar-series-type has been defined as const-time. Specifies the offset in minutes as described above.

load-from-date
    Optional parameter. Used to load data from a specific date in the format YYYY-MM-DD. Useful for live trading to speed up spin-up time by not loading data that is not required. This should be set to a date no later than is required to create a training set.

9.2 bar-series-collection

This section specifies where Sqlite databases are located. The configuration snippet is given
below.
<bar-series-collection>
  <data-file-dir>C:\FX_Database</data-file-dir>
</bar-series-collection>

The data-file-dir option sets the directory of the Sqlite databases.

9.3 model

The model defines the features, target and when to forecast. It contains a single target section
which also defines the trigger (i.e. when to forecast) and as many feature sections as we need. A sample configuration snippet is given below:
<model>
  <identifier>h4-features</identifier>
  <target>
    <type>bars-in-future</type>
    <identifier>target-1-bar-in-future</identifier>
    <bar-series>EURUSDh4</bar-series>
    <number>1</number>
    <price-type>up-down</price-type>
  </target>
  <feature>
    <type>hour-of-day</type>
    <period>h4</period>
  </feature>
  <feature>
    <type>bar-attribute</type>
    <attribute-type>average-close</attribute-type>
    <number>30</number>
    <value-type>diff</value-type>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
    <outlier-percentile>1</outlier-percentile>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>5</period>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>10</period>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
  <feature>
    <type>moving-average</type>
    <ma-attribute-type>average-close</ma-attribute-type>
    <period>20</period>
    <selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
    <bar-series>EURUSDh4</bar-series>
    <scale-type>min-max</scale-type>
  </feature>
</model>

This example model comprises a target of whether the price will be higher or lower by 1 bar in the future, and the following features:
- The hour of day binarised into 6 attributes.
- The close price differences between the previous 30 bars.
- Exponential moving averages of periods 5, 10 and 20, with the attributes being 16 differences spread across the past 100 values.
All continuous features are normalised using the min-max scheme.

9.4 Features

The currently available feature list is given in table 9.4. There are plans to increase this list
in future releases and include the ability to create your own features using Python scripts. All
features must define a type which tells the system what the feature is. The configuration for
each feature is placed within a model section as a model comprises features and one target.
Table 9.4: Features used as independent inputs to machine learning models.
hour-of-day
    The hour of day. Can be continuous (0-23), categorical to nearest H4 (6 binarised attributes), or categorical to H1 (24 binarised attributes).

day-of-week
    Day of the week. Can be continuous (0-6) or categorical with 7 binarised attributes.

bar-diff
    The difference between two candle prices such as high-high or close-close.

bar-attribute
    Similar to bar-diff. Can be absolute values such as volume, or differences between candle attributes such as average price (the average close of the 1 minute candles contained within the candle).

moving-average
    The popular and ever-present moving average.

python-script
    Custom feature in a Python script.

Each individual feature and its options are detailed in the following sections.

9.4.1 hour-of-day

An example hour-of-day configuration snippet is given below and table 9.5 details the parameter options.
<feature>
  <type>hour-of-day</type>
  <period>h4</period>
</feature>

Table 9.5: hour-of-day feature.


type
    Must be hour-of-day.

period
    Defines the way in which the hour is encoded as attributes. Takes one of the following values:
    H1      Binarises with 24 attributes. Refer to section 4.8 for details on binarising features.
    H4      Discretises the hour to the most recent H4 open time so we have 6 possible values. Thus hours 0,1,2,3 become 1,0,0,0,0,0 and 4,5,6,7 become 0,1,0,0,0,0 and 20,21,22,23 become 0,0,0,0,0,1 etc.
    single  Treats the hour as a continuous variable, using min-max scaling.
    none    Disable this feature. Useful as a value for the genetic algorithm to turn this feature on and off.
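As an illustration of the H4 encoding described in the table, the following sketch maps an hour to the 6 binarised attributes (illustrative only; DeepThought does this internally):

def hour_of_day_h4(hour):
    # Discretise the hour to the most recent H4 open (0, 4, 8, 12, 16, 20)
    # and binarise into 6 attributes.
    attributes = [0, 0, 0, 0, 0, 0]
    attributes[hour // 4] = 1
    return attributes

print(hour_of_day_h4(2))    # [1, 0, 0, 0, 0, 0]  (hours 0-3)
print(hour_of_day_h4(5))    # [0, 1, 0, 0, 0, 0]  (hours 4-7)
print(hour_of_day_h4(23))   # [0, 0, 0, 0, 0, 1]  (hours 20-23)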

9.4.2 day-of-week

The day-of-week configuration snippet is:


<feature>
  <type>day-of-week</type>
  <representation>binary</representation>
</feature>

Table 9.6 details the options for the day-of-week feature.


Table 9.6: day-of-week feature.
type
    Must be day-of-week.

representation
    Defines the way in which the day is encoded as attributes. Takes one of the following values:
    binary  Binarises with 7 attributes. Refer to section 4.8 for details on binarising features.
    single  Treats the day as a continuous variable, using min-max scaling.
    none    Disable this feature. Useful as a value for the genetic algorithm to turn this feature on and off.

9.4.3 bar-diff

An example bar-diff snippet that extracts price differences between close prices is:
<feature>
  <type>bar-diff</type>
  <diff-type>close</diff-type>
  <bar-series>EURUSDh4</bar-series>
  <min-max-clamp>0.015</min-max-clamp>
  <scale-type>none</scale-type>
  <outlier-percentile>1</outlier-percentile>
  <selection-list>1,2,3,5,7,13,20,55</selection-list>
</feature>

This example creates eight attributes with the values given in table 9.7.
Table 9.7: Price difference examples for the bar-diff feature.
Attribute 1    The price difference between the close price of the last completed candle at the forecast/sample time and the close price of 1 candle before.
Attribute 2    The price difference between 1 and 2 candles before the forecast/sample time.
Attribute 3    The price difference between 2 and 3 candles before the forecast/sample time.
Attribute 4    The price difference between 3 and 5 candles before the forecast/sample time.
Attribute 5    The price difference between 5 and 7 candles before the forecast/sample time.
Attribute 6    The price difference between 7 and 13 candles before the forecast/sample time.
Attribute 7    The price difference between 13 and 20 candles before the forecast/sample time.
Attribute 8    The price difference between 20 and 55 candles before the forecast/sample time.
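The attribute values in Table 9.7 are simply close-price differences taken at the listed candle offsets. The sketch below shows that calculation on a made-up list of close prices (most recent candle first); it is illustrative only and not DeepThought's implementation.

def bar_diff_attributes(closes, selection_list):
    # closes[0] is the last completed candle at forecast time, closes[1] the
    # candle before it, and so on. The attributes are the differences between
    # consecutive entries of the selection list, starting from candle 0.
    indexes = [0] + list(selection_list)
    return [round(closes[indexes[i]] - closes[indexes[i + 1]], 5)
            for i in range(len(indexes) - 1)]

# A made-up series of close prices, most recent first.
closes = [round(1.30 + 0.0001 * i, 4) for i in range(60)]
print(bar_diff_attributes(closes, [1, 2, 3, 5, 7, 13, 20, 55]))   # eight attributes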

These price differences are set by the selection-list option. We can also use the number
option instead of selection-list. For example:
<feature>
  <type>bar-diff</type>
  <diff-type>close</diff-type>
  <bar-series>EURUSDh4</bar-series>
  <min-max-clamp>0.015</min-max-clamp>
  <scale-type>zscore</scale-type>
  <number>80</number>
</feature>

will generate 80 attributes of the price differences between the 80 candles immediately before
the forecast/sample time.
Table 9.8 lists all the options available for the bar-diff feature.


Table 9.8: bar-diff feature parameter options.

Option               Description
type                 Must be bar-diff.
diff-type            The type of data used to generate the attributes. Takes
                     one of the following values:
                     close               The close price.
                     high                The price of the high.
                     low                 The price of the low.
                     high-open           The high price minus the open price
                                         of the same candle.
                     high-close          The high price minus the close price
                                         of the same candle.
                     close-low           The close price minus the low price
                                         of the same candle.
                     open-to-close       The close price minus the open price
                                         of the same candle.
                     prev-close-to-open  The open price minus the close of the
                                         previous candle.
selection-list       Comma separated list of candle indexes from the candle at
                     the time of forecast/sample for which to calculate the
                     price differences.
bar-series           The identifier of the bar-series.
scale-type           The type of scaling used to normalise the features. Takes
                     one of the following values:
                     min-max   Scale all values between -1 and 1 using the
                               minimum and maximum values for the feature value.
                     zscore    For each feature value, subtract the mean and
                               divide by the standard deviation. The resulting
                               scaled feature has a mean of zero and a standard
                               deviation of one.
                     div-sd    Divide each feature by the standard deviation.
                     div-max   Divide each feature by the maximum of the
                               absolute values of the maximum and minimum.
                     log10     Take the base-10 logarithm of each feature value.
                     none      Do not use any scaling.
outlier-percentile   Optional. The percentile to use to remove outliers. If
                     outlier-percentile is not supplied, then no outliers are
                     removed. If set to 1, this setting will use the values at
                     the 1% and 99% percentiles as the min and max. All values
                     higher/lower than this will be trimmed to these percentile
                     values.
number               The number of attributes in this feature. Use this if
                     selection-list is not used.
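
The scale-type and outlier-percentile options can be visualised with a small numpy sketch. This is purely illustrative and not how DeepThought computes scaling internally.

import numpy as np

def clamp_outliers(values, percentile):
    # Trim everything outside the given percentile band, as outlier-percentile does.
    lo, hi = np.percentile(values, [percentile, 100 - percentile])
    return np.clip(values, lo, hi)

def scale(values, scale_type):
    v = np.asarray(values, dtype=float)
    if scale_type == "min-max":                       # into [-1, 1]
        return 2.0 * (v - v.min()) / (v.max() - v.min()) - 1.0
    if scale_type == "zscore":                        # zero mean, unit sd
        return (v - v.mean()) / v.std()
    if scale_type == "div-sd":
        return v / v.std()
    if scale_type == "div-max":
        return v / max(abs(v.max()), abs(v.min()))
    if scale_type == "log10":                         # assumes positive values
        return np.log10(v)
    return v                                          # "none"

raw = np.random.normal(0.0, 0.002, 1000)              # synthetic price differences
print(scale(clamp_outliers(raw, 1), "zscore")[:5])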

9.4.4  bar-attribute

This feature has many options. In a future version this feature configuration and the bar-diff
feature configuration will be merged. An example bar-attribute snippet that extracts the
most recent 30 price differences of the average-close price is:
<feature>
<type>bar-attribute</type>
<attribute-type>average-close</attribute-type>
<value-type>diff</value-type>
<number ga-subst="average-close-num">30</number>
<bar-series>EURUSDh4</bar-series>
<scale-type>min-max</scale-type>
</feature>

The average-close is the average close price of all the 1 minute candles contained within an
individual candle in the EURUSDh4 bar-series. Table 9.9 lists all the options available for the
bar-attribute feature.
Table 9.9: bar-attribute feature parameter options.

Option               Description
type                 Must be bar-attribute.
attribute-type       The type of data used to generate the attributes. Takes
                     one of the following values:
                     average-close          The average of the close prices of
                                            the 1 minute candles contained
                                            within the candle.
                     average-hlc            The average of the high, low and
                                            close prices.
                     volume                 The volume traded during the candle.
                                            For Forex this is the number of
                                            times the price changed, a de facto
                                            proxy for volume.
                     minute-high            The number of minutes since the open
                                            that the high price was reached.
                     minute-low             The number of minutes since the open
                                            that the low price was reached.
                     mins-between-high-low  The number of minutes between the
                                            time that the high price was reached
                                            and the time that the low price was
                                            reached.
value-type           The way the feature is constructed. Takes one of the
                     following values:
                     value   The raw value of the feature. Use with care so as
                             not to expose the predictors to values they have
                             not seen in training.
                     diff    The differences between the values at the candle
                             indexes given by selection-list (or over the most
                             recent number candles).
bar-series           The identifier of the bar-series.
selection-list       Comma separated list of candle indexes from the candle at
                     the time of forecast/sample for which to calculate the
                     differences.
scale-type           The type of scaling to use to normalise the features.
                     Takes one of the following values:
                     min-max   Scale all values between -1 and 1 using the
                               minimum and maximum values for the feature value.
                     zscore    For each feature value, subtract the mean and
                               divide by the standard deviation. The resulting
                               scaled feature has a mean of zero and a standard
                               deviation of one.
                     div-sd    Divide each feature by the standard deviation.
                     div-max   Divide each feature by the maximum of the
                               absolute values of the maximum and minimum.
                     log10     Take the base-10 logarithm of each feature value.
                     none      Do not use any scaling.
outlier-percentile   Optional. The percentile to use to remove outliers. If
                     outlier-percentile is not supplied, then no outliers are
                     removed. If set to 1, this setting will use the values at
                     the 1% and 99% percentiles as the min and max. All values
                     higher/lower than this will be trimmed to these percentile
                     values.
number               The number of attributes in this feature.

9.4.5  moving-average

This feature is the ubiquitous moving average. Future versions of DeepThought will enable you
to code your own indicators using Python or similar. A sample configuration snippet:

<feature>
<type>moving-average</type>
<ma-attribute-type>average-close</ma-attribute-type>
<period>10</period>
<selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
<bar-series>EURUSDh4</bar-series>
<scale-type>zscore</scale-type>
</feature>
<feature>
<type>moving-average</type>
<ma-attribute-type>average-close</ma-attribute-type>
<period>20</period>
<selection-list>1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
<bar-series>EURUSDh4</bar-series>
<scale-type>zscore</scale-type>
</feature>

This example is for two features with 16 attributes each. The selection-list option controls
which candle indexes are used to generate the attributes, as shown in table 9.7. The options for
the moving-average feature are in table 9.10.
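
For orientation, a simple moving average over the average-close values can be computed with a few lines of Python. This is only an illustration; the assumption that the attributes are drawn from the moving-average series at the candle indexes in selection-list is ours, not a statement of DeepThought's exact internals.

def sma(values, period):
    # Simple moving average; values[0] is assumed to be the most recent candle.
    return [sum(values[i:i + period]) / period
            for i in range(len(values) - period + 1)]

average_closes = [1.3500 - 0.0004 * i for i in range(120)]   # synthetic data
ma10 = sma(average_closes, 10)
selection_list = [1, 2, 3, 4, 5, 7, 9, 13, 16, 20, 25, 31, 45, 55, 70, 100]
print([round(ma10[i], 5) for i in selection_list])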


Table 9.10: moving-average feature parameter options.

Option               Description
type                 Must be moving-average.
ma-attribute-type    The type of data used to generate the attributes. Takes
                     one of the following values:
                     open                   The open price.
                     high                   The price of the high.
                     low                    The price of the low.
                     close                  The price at the close of the
                                            candle.
                     average-close          The average of the 1 minute candles
                                            that are contained within the
                                            candle.
                     average-hlc            The average of the high, low and
                                            close.
                     volume                 The volume traded during the candle.
                                            For Forex this is the number of
                                            times the price changed, a de facto
                                            proxy for volume.
                     bar-duration           The duration in minutes of the Renko
                                            bar. Not relevant for fixed-duration
                                            candles.
                     mins-between-high-low  The number of minutes between the
                                            time the high was reached and the
                                            time that the low was reached.
                     time-high              The number of minutes since the open
                                            that the high price was reached.
                     time-low               The number of minutes since the open
                                            that the low price was reached.
period               The period of the moving average.
bar-series           The identifier of the bar-series.
scale-type           The type of scaling to use to normalise the features.
                     Takes one of the following values:
                     min-max   Scale all values between -1 and 1 using the
                               minimum and maximum values for the feature value.
                     zscore    For each feature value, subtract the mean and
                               divide by the standard deviation. The resulting
                               scaled feature has a mean of zero and a standard
                               deviation of one.
                     div-sd    Divide each feature by the standard deviation.
                     div-max   Divide each feature by the maximum of the
                               absolute values of the maximum and minimum.
                     log10     Take the base-10 logarithm of each feature value.
                     none      Do not use any scaling.
outlier-percentile   Optional. The percentile to use to remove outliers. If
                     outlier-percentile is not supplied, then no outliers are
                     removed. If set to 1, this setting will use the values at
                     the 1% and 99% percentiles as the min and max. All values
                     higher/lower than this will be trimmed to these percentile
                     values.

9.4.6  csv-feature

The CSV feature enables you to use your own data in CSV format. For backtesting, the data
can be generated from Metatrader via a script; one is provided in the Metatrader Scripts
folder in the DeepThought install directory. When live/paper trading the CSV must also be
generated by the EA, and the example expert advisors demonstrate how to do this. If you are
using another trading platform, you should be able to generate this file from that platform.
A sample configuration snippet is given below.

<feature>
<type>csv-feature</type>
<filename>C:\IBFX-MT4-AU\experts\files\EURUSD_CCI.csv</filename>
<identifier>cci_feature</identifier>
<value-type>value</value-type>
<selection-list>1,2,3,5,8,13,21,34</selection-list>
</feature>

File Format
The file format of the CSV file is:
YYYY.mm.DD,HH:MM,%value%

where %value% is a double/float value. The dates must be in decreasing order from the top of
the file, i.e. newest at the beginning and oldest at the end of the file. When backtesting,
DeepThought will look in the CSV file to find the closest previous value with a date equal to
or earlier than the date that the forecast is being taken. The <selection-list> indexes are
defined from this point. The first few entries of an example CSV file are given below:
2014.02.04,08:00,-16.56445518
2014.02.04,04:00,-7.55706836
2014.02.04,00:00,-46.66780044
2014.02.03,20:00,-29.3612079
2014.02.03,16:00,-24.97550049
2014.02.03,12:00,-46.92997607
2014.02.03,08:00,-64.23733597
2014.02.03,04:00,-97.95026325
2014.02.03,00:00,-97.8667956
2014.02.02,23:00,-108.98997431
2014.01.31,20:00,-123.76219274
2014.01.31,16:00,-147.9225595
2014.01.31,12:00,-171.87121273
2014.01.31,08:00,-111.88052857
...
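
For reference, a file in this format could be produced with a short Python sketch such as the following. The rows reuse the example values above and the file name matches the configuration snippet; this is not the supplied Metatrader script, only an illustration of the required layout.

import csv

# Rows must be ordered newest first, as required by the csv-feature.
rows = [
    ("2014.02.04", "08:00", -16.56445518),
    ("2014.02.04", "04:00", -7.55706836),
    ("2014.02.04", "00:00", -46.66780044),
]

with open("EURUSD_CCI.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for date, time, value in rows:
        writer.writerow([date, time, value])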

Table 9.11 lists the options for the csv-feature.


Table 9.11: csv-feature parameter options.

Option               Description
type                 Must be csv-feature.
filename             The fully qualified location of the file where the CSV
                     data is located, e.g.
                     C:\IBFX-MT4-AU\experts\files\EURUSD_CCI.csv
identifier           A unique identifier for this feature. Used to assign the
                     correct scaling and to identify items in output files.
                     Inbuilt features are able to generate this from their
                     parameters; however, as this is a user-defined feature the
                     identifier must be supplied manually.
value-type           Defines how to process the CSV values. Takes one of the
                     following values:
                     diff    The difference between values whose indexes are
                             specified by the selection-list. For example, if
                             the list is specified as 1,2,5 then the differences
                             used in the model will be the difference between
                             the values at indexes 0 and 1, 1 and 2, and 2 and 5.
                     value   Use the value directly. The first element in the
                             selection-list should be 0.
selection-list       The indexes of the values to take from the CSV file. The
                     indexes are relative to the date of the sample that is
                     being extracted. For example, if the backtester is
                     forecasting on 5th May 2013 at 8 am, index 0 (if value-type
                     is diff) will correspond to the closest matching prior
                     value to 2013.05.05,08:00. If there is a value at this time
                     it will be used, otherwise the first value before this date
                     is used.
scale-type           The type of scaling to use to normalise the features.
                     Takes one of the following values:
                     min-max   Scale all values between -1 and 1 using the
                               minimum and maximum values for the feature value.
                     zscore    For each feature value, subtract the mean and
                               divide by the standard deviation. The resulting
                               scaled feature has a mean of zero and a standard
                               deviation of one.
                     div-sd    Divide each feature by the standard deviation.
                     div-max   Divide each feature by the maximum of the
                               absolute values of the maximum and minimum.
                     log10     Take the base-10 logarithm of each feature value.
                     none      Do not use any scaling.
outlier-percentile   Optional. The percentile to use to remove outliers. If
                     outlier-percentile is not supplied, then no outliers are
                     removed. If set to 1, this setting will use the values at
                     the 1% and 99% percentiles as the min and max. All values
                     higher/lower than this will be trimmed to these percentile
                     values.

9.4.7  python-script

The python-script feature is detailed in section 8.2 on page 25 with example scripts. It
enables you to implement virtually any feature using Python scripting. An example configuration
snippet is given below.

<feature>
<type>python-script</type>
<set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name>
<get-number-of-attributes-func-name>GetNumberOfAttributes</get-number-of-attributes-func-name>
<get-features-func-name>GetFeatures</get-features-func-name>
<parameter name="ma\_short\_period" type="int">20</parameter>
<parameter name="ma\_long\_period" type="int">50</parameter>
<identifier>python-test-1</identifier>
<scale-type>min-max</scale-type>
</feature>

Table 9.12: python-script feature parameter settings.

Option                               Description
type                                 Must be python-script.
set-parameter-value-func-name        The name of the function that is used to
                                     set parameters used by other functions in
                                     the script. Typically SetParameterValue.
get-number-of-attributes-func-name   The name of the function that returns the
                                     number of attributes set by the function
                                     set in get-features-func-name. Typically
                                     GetNumberOfAttributes.
get-features-func-name               The name of the function that is
                                     responsible for generating the numerical
                                     values of the attributes. Typically
                                     GetFeatures.
identifier                           A unique identifier for this feature.
parameter                            An optional parameter that is passed via
                                     set-parameter-value-func-name. You can
                                     have as many parameters defined as you
                                     need. The values are able to be set using
                                     the genetic algorithm if desired. Two
                                     attributes must be set with this element:
                                     name   Name of the parameter which will be
                                            passed as a string to the function
                                            defined by
                                            set-parameter-value-func-name.
                                     type   Takes one of the following values:
                                            int, string, double. The value of
                                            parameter is passed as this type to
                                            the function defined by
                                            set-parameter-value-func-name.
scale-type                           The type of scaling to use to normalise
                                     the features. Takes one of the following
                                     values:
                                     min-max   Scale all values between -1 and
                                               1 using the minimum and maximum
                                               values for the feature value.
                                     zscore    For each feature value, subtract
                                               the mean and divide by the
                                               standard deviation. The resulting
                                               scaled feature has a mean of zero
                                               and a standard deviation of one.
                                     div-sd    Divide each feature by the
                                               standard deviation.
                                     div-max   Divide each feature by the
                                               maximum of the absolute values of
                                               the maximum and minimum.
                                     log10     Take the base-10 logarithm of
                                               each feature value.
                                     none      Do not use any scaling.
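
A minimal sketch of a script that such a configuration could call is shown below. The authoritative examples are in section 8.2; the function bodies and the exact arguments DeepThought passes (here assumed to be a list of recent close prices, newest first) are assumptions made only for illustration.

# Hypothetical python-script feature: short/long moving-average spread.
params = {}

def SetParameterValue(name, value):
    # Called for each <parameter> element, e.g. ("ma_short_period", 20).
    params[name] = value

def GetNumberOfAttributes():
    # This sketch produces a single attribute.
    return 1

def GetFeatures(closes):
    # 'closes' is assumed to be recent close prices, newest first.
    short_n = params.get("ma_short_period", 20)
    long_n = params.get("ma_long_period", 50)
    ma_short = sum(closes[:short_n]) / short_n
    ma_long = sum(closes[:long_n]) / long_n
    return [ma_short - ma_long]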

9.5  Targets

A target, also termed a label, is the dependent variable that we are trying to predict. Currently
DeepThought supports targeting future price changes, as well as a custom target where you
provide a Python script to compute the target. The target configuration forms part of a model
configuration. Each model configuration must have one and only one target section.

9.5.1  bars-in-future

This is a built-in target where we are predicting the change in price at the close of one or more
candles in the future. A sample configuration snippet is:

<target>
<type>bars-in-future</type>
<identifier>target-1-bar-in-future</identifier>
<bar-series>EURUSDh4</bar-series>
<number>1</number>
<price-type>up-down</price-type>
</target>

This example labels a training set with a target of 1 bar in the future on the EURUSDh4 bar series.
A forecast will attempt to predict this label. The price-type option in this case returns +1 if the
price will move up and -1 if the price will move down. Table 9.13 summarises the options.
Table 9.13: bars-in-future target.

Option        Description
type          Must be bars-in-future.
identifier    A unique identifier for this target.
bar-series    The bar-series that the target is calculated on.
price-type    The type of data used to generate the target. Takes one of the
              following values:
              up-down   For classification, +1 if the price moves up, -1 if the
                        price moves down.
              close     For regression, the change in price of the close at
                        number bars in the future.
              high      For regression, the change in price of the high at
                        number bars in the future.
              low       For regression, the change in price of the low at
                        number bars in the future.
number        The number of bars to look into the future.
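
The up-down labelling can be illustrated with a couple of lines of Python (illustration only, not DeepThought source):

def up_down_label(close_now, close_n_bars_later):
    # +1 when the close moves up over the horizon, -1 when it moves down.
    return 1 if close_n_bars_later > close_now else -1

print(up_down_label(1.3520, 1.3531))   # -> 1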

9.5.2  python-script

You can provide a Python script to compute a target. This is covered in more detail with example
scripts in section 8.3 on page 28. An example configuration snippet is given below.

<target>
<type>python-script</type>
<identifier>target-next-pip-movement</identifier>
<bar-series>EURUSDm1</bar-series>
<set-parameter-value-func-name>TargetSetParameterValue</set-parameter-value-func-name>
<check-target-trigger-func-name>GetIsTargetTrigger</check-target-trigger-func-name>
<get-target-func-name>GetTarget</get-target-func-name>
<parameter name="pip-movement" type="double">20.0</parameter>
</target>

This example labels a training example with a target of +1 or -1 depending on whether the price
will move 20 pips up or 20 pips down. There is no time limit for the price movement, although
this could be coded in the Python script. Table 9.14 summarises the options.
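
A sketch of what such a script might look like is given below. The function names come from the configuration snippet above, but the arguments DeepThought actually passes are documented in section 8.3, so the signatures and trigger logic here are assumptions made only for illustration.

# Hypothetical python-script target: label +/-1 on a 20 pip move.
params = {}

def TargetSetParameterValue(name, value):
    params[name] = value                      # e.g. ("pip-movement", 20.0)

def GetIsTargetTrigger():
    # Assume a training example / forecast is triggered at every candle close.
    return True

def GetTarget(entry_price, current_price):
    # Compare the price now with the price when the example was triggered.
    pips = (current_price - entry_price) / 0.0001
    threshold = params.get("pip-movement", 20.0)
    if pips >= threshold:
        return 1
    if pips <= -threshold:
        return -1
    return None                               # target not yet reached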


Table 9.14: python-script target.

Option                           Description
type                             Must be python-script.
identifier                       A unique identifier for this target.
bar-series                       The bar-series to trigger from. At the close of
                                 each candle in this series, the functions
                                 defined by check-target-trigger-func-name and
                                 get-target-func-name are called.
set-parameter-value-func-name    The name of the function that is used to set
                                 parameters used by other functions in the
                                 script. Typically SetParameterValue.
check-target-trigger-func-name   The name of the function that checks whether
                                 the criteria to trigger a forecast has been
                                 reached. If it has, a training example is
                                 created and a forecast made. The training
                                 example is cached until the criteria to label a
                                 target has been met, as defined by the function
                                 given in get-target-func-name. Typically
                                 GetIsTargetTrigger.
get-target-func-name             The name of the function that checks whether a
                                 target has been reached. This function compares
                                 the state of the example when the trigger was
                                 reached with the current state of the market
                                 and assigns a target value if the target
                                 criteria has been reached. Typically GetTarget.
parameter                        An optional parameter that is passed via
                                 set-parameter-value-func-name. You can have as
                                 many parameters defined as you need. The values
                                 are able to be set using the genetic algorithm
                                 if desired. Two attributes must be set with
                                 this element:
                                 name   Name of the parameter which will be
                                        passed as a string to the function
                                        defined by
                                        set-parameter-value-func-name.
                                 type   Takes one of the following values: int,
                                        string, double. The value of parameter
                                        is passed as this type to the function
                                        defined by
                                        set-parameter-value-func-name.

9.6  Predictors

Your configuration can have as many predictor sections as you desire; the number is only
limited by the computing power of your hardware. The forecasts are combined and a majority
vote decides the signal direction. If there is no net agreement among the predictors, the signal
will be either hold or exit all. A threshold can be set in the signal-generator configuration
section so that signals are only generated if the sum of the forecasts is higher than this threshold.
Several predictor types are implemented, detailed in the following sections.
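
The voting behaviour can be pictured with a small Python sketch; this is a simplified illustration of the behaviour described above, not DeepThought's implementation.

def combine_forecasts(forecasts, weights, threshold=0.0):
    # Weighted sum of the individual predictor forecasts (e.g. +1/-1 values).
    total = sum(w * f for w, f in zip(weights, forecasts))
    if total > threshold:
        return "buy"
    if total < -threshold:
        return "sell"
    return "hold"

print(combine_forecasts([1, -1, 1], [1.0, 1.0, 1.0]))   # -> "buy"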

9.6.1  svm-predictor

The svm-predictor supports both classification and regression forecasts, and several kernel
types are available. We recommend starting with classification and the RBF kernel.
Below is a sample configuration snippet:

<svm-predictor>
<identifier>svm-c-rbf</identifier>
<predictor-weight>1.0</predictor-weight>
<model>h4-features</model>
<continuous-tune>false</continuous-tune>
<continuous-tune-num-param-sets>1</continuous-tune-num-param-sets>
<model-min-accuracy>54.0</model-min-accuracy>
<params> <!-- 56.1% -->
<penalty>512</penalty>
<gamma>0.25</gamma>
<forecast-weight>1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 55.1% -->
<penalty>128</penalty>
<gamma>0.25</gamma>
<forecast-weight>1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 54.7% -->
<penalty>2048</penalty>
<gamma>0.25</gamma>
<forecast-weight>1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<num-training-observations>2000</num-training-observations>
<num-training-skip>1</num-training-skip>
</svm-predictor>

This example uses the features generated by the h4-features model. It has three sets of
parameters, so it is actually a mini ensemble of three predictors. The forecast value for signal
generation purposes is the sum of the three predictors, so this example can only output the
following values: -3, -1, 1, 3. The predictor-weight is set at 1.0; if we had other predictors
we could adjust this option to weight the individual predictors.
In the example configuration above, a training set comprises 2000 training examples using a
sliding window of 1 defined by the num-training-skip option. This means that the 2000
training examples are sampled at the close of each candle using the most recent history and
working back in time. If we had set num-training-skip to 5, for example, then training
samples would be created every 5th candle.
Each individual SVM has its own set of parameters. In the above example all the SVMs are
classifiers, as svm-type is set to SVC, with an RBF kernel. This combination requires that the


penalty and gamma (Gaussian width of the RBF kernel) be defined.


Ensembles: Bucket of Models
A bucket of models is a type of ensemble where a number of models are tested and the
best one is selected for forecasting. DeepThought achieves this by providing the parameters
continuous-tune and model-min-accuracy. The continuous-tune parameter is set to True
to enable continuous selection of models. After each candle completes, a forecast is performed
and orders are placed. After order placement the system retrains. It first performs a
cross-validation across penalty and gamma for classification, and additionally epsilon
for regression. The best model is selected provided that its cross-validation accuracy is at
least the accuracy specified in model-min-accuracy. If not, the action specified by
no-model-behaviour is performed. To summarise the steps (a conceptual sketch of the
selection loop follows the list):
1. At strategy spin-up, run cross validation to find the best model. Alternatively load a
previously created model.
2. At the end of the candle, forecast and trade using the current model.
3. Run a cross-validation, including the newly completed candle in the training set.
4. If the best model(s) in the cross-validation set has accuracy of at least the accuracy specified by model-min-accuracy, then replace the model to be used at the next forecast.
5. Wait for the close of the candle, then repeat from step 2.
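
The following is a conceptual sketch of steps 3 and 4, using scikit-learn purely for illustration; DeepThought performs the equivalent cross-validation internally with its own SVM implementation, so nothing here is DeepThought code.

import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

def select_best_model(X, y, model_min_accuracy=0.54):
    best_acc, best_clf = -1.0, None
    for penalty in [2.0 ** e for e in range(5, 12)]:      # exp-2 style grid
        for gamma in [2.0 ** e for e in range(-4, 1)]:
            clf = SVC(C=penalty, gamma=gamma, kernel="rbf")
            acc = cross_val_score(clf, X, y, cv=5).mean()
            if acc > best_acc:
                best_acc, best_clf = acc, clf
    if best_acc >= model_min_accuracy:
        return best_clf.fit(X, y)       # replace the model used at the next forecast
    return None                         # fall back to no-model-behaviour

X = np.random.randn(200, 10)            # synthetic training set
y = np.where(np.random.rand(200) > 0.5, 1, -1)
model = select_best_model(X, y)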
svm-predictor Options
Table 9.15 lists all options for the svm-predictor.
Table 9.16 lists the options for the params sections of the svm-predictor.


Table 9.15: svm-predictor configuration options.

Option                           Description
identifier                       A unique identifier for this predictor. Used in
                                 log and output files.
predictor-weight                 The weight given to this predictor when
                                 multiple predictors are used. Can be negative.
model                            The identifier of the model that the predictor
                                 operates on.
continuous-tune                  Values are True or False. If set to True, a
                                 parameter search (for penalty, gamma) is
                                 conducted after each forecast, before
                                 retraining. The top
                                 continuous-tune-num-param-sets parameter sets
                                 are used. Note that any params sections are
                                 ignored if this option is set, as it produces
                                 params sections which can be continuously
                                 changing.
continuous-tune-num-param-sets   The number of params sections to produce when
                                 the continuous-tune parameter is set to True.
model-min-accuracy               The minimum accuracy for a new model to be
                                 selected with continuous-tune (see above for a
                                 detailed explanation).
no-model-behaviour               The action when a model cannot be found with
                                 accuracy of at least model-min-accuracy. Takes
                                 one of the following values:
                                 dont-trade           Close all orders and do
                                                      not forecast.
                                 use-last-model       Use the last best model(s).
                                 use-default-params   Use the parameters defined
                                                      in the <params> section.
params                           The parameter set of an individual SVM. It is
                                 mandatory to have at least one params section.
num-training-observations       The number of training examples in a training
                                 set. The larger this number, the further back
                                 in time samples are drawn from.
num-training-skip                A sliding window is used to select training
                                 examples. This parameter sets the number of
                                 bars that the sliding window is moved.


Table 9.16: params configuration options for the svm-predictor.

Option     Description
penalty    The penalty (sometimes written as C).
gamma      The Gaussian width of an RBF kernel when the kernel is set to rbf.
epsilon    The epsilon insensitivity, used when the svm-type is SVR.
degree     The polynomial degree when kernel is set to polynomial.
coeff      A coefficient used in the polynomial and sigmoid kernels.
svm-type   The prediction type of the SVM. Takes one of the following values:
           SVC   Classification (two class).
           SVR   Regression (continuous values).
kernel     The kernel to use. All kernels require the penalty and gamma options.
           Takes one of the following values:
           rbf          Gaussian Radial Basis Function. Requires the epsilon
                        option when svm-type is SVR.
                        k(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2)
           linear       Linear kernel.
                        k(x_i, x_j) = x_i . x_j
           polynomial   Polynomial kernel. Requires degree to specify the
                        polynomial degree.
                        k(x_i, x_j) = (x_i . x_j + c)^d, where c = coeff and
                        d = degree
           sigmoid      Sigmoid (hyperbolic tangent) kernel.
                        k(x_i, x_j) = tanh(gamma * x_i . x_j + c), where
                        c = coeff

9.6.2  linear-svm-predictor

The linear-svm-predictor is an SVM supporting only linear models. Use it with caution, as
linear modelling may not be the best way to model financial markets; however, the
linear-svm-predictor may be useful as part of an ensemble containing different predictor types.
More information on linear SVMs is at http://www.csie.ntu.edu.tw/~cjlin/liblinear/. Below is
a sample configuration snippet:

<linear-svm-predictor>
<identifier>linear-svm-1</identifier>
<predictor-weight>1.0</predictor-weight>
<model>15min-features</model>
<params>
<penalty>1.0</penalty>
<solver-type>L2R_L2LOSS_SVC_DUAL</solver-type>
</params>
<params>
<penalty>256.0</penalty>
<epsilon>0.05</epsilon>
<solver-type>L2R_L2LOSS_SVR_DUAL</solver-type>
</params>
<num-training-observations>1000</num-training-observations>
<num-training-skip>1</num-training-skip>
</linear-svm-predictor>

This example uses the features generated by the 15min-features model. It has two sets of
SVM parameters, so it is actually a mini ensemble of two predictors. The first set defines a
classification predictor and the second set defines a regression predictor. The forecast value for
signal generation purposes is the sum of the two predictors. If we had other predictors, we could
adjust predictor-weight to weight the final value. This example uses a training set comprising
1000 training examples using a sliding window of 1 defined by the num-training-skip option.
This means that the 1000 training examples are sampled at the close of each candle using the
most recent history and working back in time. If we had set num-training-skip to 5, for
example, then training samples would be created every 5th candle.
Each individual SVM has its own set of parameters. In this example the first SVM is a classifier,
as solver-type is set to L2R_L2LOSS_SVC_DUAL, and the second forecasts a value (regression), as
the solver-type is L2R_L2LOSS_SVR_DUAL. It is probably not a good idea to mix classification
and regression like this; we show it here only as a configuration example.
Table 9.17 lists all options for the linear-svm-predictor and table 9.18 lists the options for
the params sections of the linear-svm-predictor.


Table 9.17: linear-svm-predictor configuration options.

Option                       Description
identifier                   A unique identifier for this predictor. Used in log
                             and output files.
predictor-weight             The weight given to this predictor when multiple
                             predictors are used. Can be negative.
model                        The identifier of the model that the predictor
                             operates on.
params                       The parameter set of an individual SVM. It is
                             mandatory to have at least one params section.
num-training-observations   The number of training examples in a training set.
                             The larger this number, the further back in time
                             samples are drawn from.
num-training-skip            A sliding window is used to select training
                             examples. This parameter sets the number of bars
                             that the sliding window is moved.

Table 9.18: params configuration options for the linear-svm-predictor.

Option        Description
penalty       The penalty (sometimes written as C).
epsilon       The epsilon insensitivity, used for regression problems. Not used
              for classification.
solver-type   The solver used in the SVM. Takes one of the following values
              (note SVC for classification and SVR for regression):
              L2R_LR                L2-regularized logistic regression (primal)
              L2R_L2LOSS_SVC_DUAL   L2-regularized L2-loss support vector
                                    classification (dual)
              L2R_L2LOSS_SVC        L2-regularized L2-loss support vector
                                    classification (primal)
              L2R_L1LOSS_SVC_DUAL   L2-regularized L1-loss support vector
                                    classification (dual)
              MCSVM_CS              Support vector classification by Crammer
                                    and Singer
              L1R_L2LOSS_SVC        L1-regularized L2-loss support vector
                                    classification
              L1R_LR                L1-regularized logistic regression
              L2R_LR_DUAL           L2-regularized logistic regression (dual)
              L2R_L2LOSS_SVR        L2-regularized L2-loss support vector
                                    regression (primal)
              L2R_L2LOSS_SVR_DUAL   L2-regularized L2-loss support vector
                                    regression (dual)
              L2R_L1LOSS_SVR_DUAL   L2-regularized L1-loss support vector
                                    regression (dual)

9.6.3  gbt-predictor

Gradient boosted trees are a decision-tree-based predictor. An initial decision tree is constructed
and subsequent trees are trained on the errors of the previous trees. They are good at generalising,
and overfitting tends not to be an issue. Below is an example configuration snippet:

<gbt-predictor>
<identifier>gbt-1</identifier>
<predictor-weight>1.0</predictor-weight>
<model>15min-features</model>
<params>
<num-trees>500</num-trees>
<depth>6</depth>
</params>
<params>
<num-trees>800</num-trees>
<depth>5</depth>
</params>
<num-training-observations>1000</num-training-observations>
<num-training-skip>1</num-training-skip>
</gbt-predictor>

The above example has a mini ensemble of two GBT predictors defined by the two sets of
parameters. These param sets apply in the same way as for the SVM predictor described in
section 9.6.1. Table 9.19 details the options for the gbt-predictor configuration section.
Table 9.19: gbt-predictor configuration options.

Option                       Description
identifier                   A unique identifier for this predictor. Used in log
                             and output files.
predictor-weight             The weight given to this predictor when multiple
                             predictors are used. Can be negative.
model                        The identifier of the model that the predictor
                             operates on.
params                       The parameter set of an individual GBT. It is
                             mandatory to have at least one params section.
num-training-observations   The number of training examples in a training set.
                             The larger this number, the further back in time
                             samples are drawn from.
num-training-skip            A sliding window is used to select training
                             examples. This parameter sets the number of bars
                             that the sliding window is moved.

Table 9.20 lists the options for the params section of the gbt-predictor.

Table 9.20: params configuration options for the gbt-predictor.

Option      Description
num-trees   The number of decision trees to use. The higher the number, the
            more accurate the prediction; however, it increases computation
            time with diminishing returns.
depth       The depth of the individual decision trees.

9.6.4  random-forest-predictor

The random forest predictor works by randomly selecting features and training samples from
the training set. Another way to view this: if we consider the training set to be a matrix,
decision trees are constructed on random selections of its rows and columns. This creates a
forest of decision trees. When forecasting, the majority class produced by all the trees is the
final prediction. The configuration is similar to the gbt-predictor configuration. Below is an
example configuration snippet:

<random-forest-predictor>
<identifier>gbt-1</identifier>
<predictor-weight>1.0</predictor-weight>
<model>15min-features</model>
<params>
<num-trees>500</num-trees>
<depth>6</depth>
</params>
<params>
<num-trees>800</num-trees>
<depth>5</depth>
</params>
<num-training-observations>1000</num-training-observations>
<num-training-skip>1</num-training-skip>
</random-forest-predictor>

The above example has a mini ensemble of two random forest predictors defined by the two
sets of parameters. These param sets apply in the same way as for the SVM predictor described
in section 9.6.1. Table 9.21 details the options for the random-forest-predictor
configuration section and table 9.22 lists the options for the params section of the
random-forest-predictor.
Table 9.21: random-forest-predictor configuration options.

Option                       Description
identifier                   A unique identifier for this predictor. Used in log
                             and output files.
predictor-weight             The weight given to this predictor when multiple
                             predictors are used. Can be negative.
model                        The identifier of the model that the predictor
                             operates on.
params                       The parameter set of an individual random forest.
                             It is mandatory to have at least one params
                             section.
num-training-observations   The number of training examples in a training set.
                             The larger this number, the further back in time
                             samples are drawn from.
num-training-skip            A sliding window is used to select training
                             examples. This parameter sets the number of bars
                             that the sliding window is moved.

Table 9.22: params configuration options for the random-forest-predictor.

Option      Description
num-trees   The number of decision trees to use in the forest.
depth       The maximum depth of the individual decision trees.

9.6.5  extremely-randomised-trees-predictor

Extremely randomised trees are a variation of random forests. The parameters are identical
to the random-forest parameters. The only difference is that the configuration section is
defined by extremely-randomised-trees-predictor. See section 9.6.4 above for details on
the configuration.

9.6.6  multi-layer-perceptron-predictor

Also known more popularly as a neural network. This predictor comprises one or more hidden
layers with a variable number of neurons per layer. The topology is illustrated in figure 4.2 on
page 8. Three variations are available for the multi-layer perceptron (MLP). These are:

Regression - forecasting the price move.
Classification - forecasting +1 or -1 for up/down.
Classification - forecasting a value between -1 and 1, where the sign (positive or negative)
indicates direction and the magnitude indicates the probability or certainty.

A sample configuration snippet is:

<multi-layer-perceptron-predictor>
<identifier>nn-1</identifier>
<predictor-weight>1.0</predictor-weight>
<model>h4-features</model>
<params>
<training-algo>rprop</training-algo> <!-- rprop|backpropagation -->
<hidden-layers>20</hidden-layers>
<activation-function>sigmoid</activation-function> <!-- identity|sigmoid|gaussian -->
<max-iterations>1000</max-iterations>
<termination-epsilon>0.01</termination-epsilon>
<forecast-type>classification</forecast-type> <!-- classification|regression. -->
<classification-output>value</classification-output> <!-- binary|value -->
</params>
<num-training-observations>1000</num-training-observations>
<num-training-skip>1</num-training-skip>
</multi-layer-perceptron-predictor>

Table 9.23 details the options for the multi-layer-perceptron-predictor configuration section and table 9.24 lists the options for the params sections.


Table 9.23: Multi-layer Perceptron multi-layer-perceptron-predictor options.

Option                       Description
identifier                   A unique identifier for this predictor. Used in log
                             and output files.
predictor-weight             The weight given to this predictor when multiple
                             predictors are used. Can be negative.
model                        The identifier of the model that the predictor
                             operates on.
params                       The parameter set of an individual multi-layer
                             perceptron. It is mandatory to have at least one
                             params section.
num-training-observations   The number of training examples in a training set.
                             The larger this number, the further back in time
                             samples are drawn from.
num-training-skip            A sliding window is used to select training
                             examples. This parameter sets the number of bars
                             that the sliding window is moved.

Table 9.24: params configuration options for the multi-layer-perceptron-predictor.

Option                  Description
training-algo           DeepThought supports two training algorithms.
                        training-algo must be one of:
                        rprop             Resilient backpropagation.
                        backpropagation   Standard back-propagation.
hidden-layers           Comma delimited list of the number of neurons in the
                        hidden layer(s). Use a single number for one hidden
                        layer.
activation-function     The function applied to the output of a neuron's value.
                        Must be one of:
                        identity   Use the output value as is.
                        sigmoid    The most commonly used function.
                        gaussian   An experimental Gaussian function.
max-iterations          Stop training after this many iterations.
termination-epsilon     Stop training when the error drops below this value.
                        Use 0 to disable.
forecast-type           Must be one of:
                        classification   Up/down classification problem.
                        regression       Regression problem forecasting the
                                         change in price.
classification-output   If forecast-type has been set as classification, two
                        variations are available. These are:
                        binary   Output is +1 or -1 for up/down.
                        value    Output is a value between -1 and 1, where the
                                 sign (positive or negative) indicates direction
                                 and the magnitude indicates the probability or
                                 certainty.

9.6.7  python-predictor

This is a custom predictor where you supply Python code. It is discussed in more detail
in section 8.4 on page 30 with example Python scripts. The predictor need not be machine
learning based; in fact it is possible to use standard mechanical technical analysis with a
Python predictor. A sample configuration snippet is given below.

<python-predictor>
<model>h4-features</model>
<identifier>python-predictor-h4</identifier>
<predictor-weight>1.0</predictor-weight>
<set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name>
<predict-func>Predict</predict-func>
<train-func>Train</train-func>
<parameter name="ma-long" type="int">20</parameter>
<parameter name="ma-short" type="int">5</parameter>
<num-training-observations>25</num-training-observations>
<num-training-skip>1</num-training-skip>
</python-predictor>

The above example is a configuration for a moving-average cross predictor where two parameters
are supplied to the Python script. Note that the number of training observations is set to only
25; in this example we only need enough observations to compute the moving averages. A sketch
of such a script is given below, and table 9.25 details the options for the python-predictor.
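
This is a hypothetical sketch only; the authoritative examples are in section 8.4, and the argument conventions assumed below (Train receiving recent close prices newest first, Predict returning +1/-1) may differ from the real interface.

# Hypothetical python-predictor: moving-average cross.
params = {}
state = {}

def SetParameterValue(name, value):
    params[name] = value                 # e.g. ("ma-short", 5), ("ma-long", 20)

def Train(closes):
    # No model fitting needed; just remember the recent closes (newest first).
    state["closes"] = list(closes)

def Predict():
    closes = state["closes"]
    short_n = params.get("ma-short", 5)
    long_n = params.get("ma-long", 20)
    ma_short = sum(closes[:short_n]) / short_n
    ma_long = sum(closes[:long_n]) / long_n
    return 1 if ma_short > ma_long else -1     # +1 buy bias, -1 sell bias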

9.7  predictor-ensemble

The predictor-ensemble section defines only a single option. A sample configuration snippet is given below.

<predictor-ensemble>
<retrain-period>Weekly</retrain-period>
</predictor-ensemble>

The retrain-period option controls when all predictors are retrained after forecasting and
signal generation. The values for retrain-period are given in table 9.26.


Table 9.25: python-predictor parameter settings.

Option                          Description
identifier                      A unique identifier for this predictor.
predictor-weight                The weight given to this predictor when multiple
                                predictors are used. Can be negative.
model                           The identifier of the model that the predictor
                                operates on.
set-parameter-value-func-name   The name of the function that is used to set
                                parameters used by other functions in the
                                script. Typically SetParameterValue.
predict-func                    The name of the function that performs the
                                prediction. Typically Predict.
train-func                      The name of the function that does training on
                                historical data. Typically Train.
parameter                       An optional parameter that is passed via
                                set-parameter-value-func-name. You can have as
                                many parameters defined as you need. The values
                                are able to be set using the genetic algorithm
                                if desired. Two attributes must be set with this
                                element:
                                name   Name of the parameter which will be
                                       passed as a string to the function
                                       defined by
                                       set-parameter-value-func-name.
                                type   Takes one of the following values: int,
                                       string, double. The value of parameter is
                                       passed as this type to the function
                                       defined by
                                       set-parameter-value-func-name.
num-training-observations      The number of training examples in a training
                                set. The larger this number, the further back in
                                time samples are drawn from.
num-training-skip               A sliding window is used to select training
                                examples. This parameter sets the number of bars
                                that the sliding window is moved.

Table 9.26: retrain-period options for the predictor-ensemble.

Option     Description
none       Don't retrain after the initial training.
each-bar   Train after each bar of the bar-series defined by the target in each
           model.
daily      Retrain each day at 00:00.
weekly     Retrain weekly after the first forecast on Monday.
monthly    Retrain monthly after the first forecast on the first Monday of the
           month.

9.8  signal-generator

The signal-generator configuration section defines how a signal is created from a predictor
ensemble. A signal in this sense is an action to do something: this could be buy, sell, place a
limit order, cancel unfilled orders, do nothing, or close all trades. You can control the action
of the signals in the Metatrader EA provided. You can also supply optional Python script to
combine forecasts into a signal. More detail on Python scripting in the signal generator is given
in section 8.5 on page 32. A configuration snippet is shown below.

<signal-generator>
<entry-times>
<hour>all</hour>
<day-of-week>all</day-of-week>
</entry-times>
<entry-threshold>0.0</entry-threshold>
<set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name>
<combine-forecasts-func-name>CombineForecasts</combine-forecasts-func-name>
<parameter name="threshold" type="double">20.0</parameter>
<take-profit>50.0</take-profit>
<stop-loss>0.0</stop-loss>
<break-even>20.0</break-even>
<exit-all-hour>-1</exit-all-hour>
<trade-bar-series>EURUSDm1</trade-bar-series>
<reverse-signals>False</reverse-signals>
</signal-generator>

This example will place trades at any time of the day on any day of the week. There is no
threshold, so if the ensemble returns a positive value it will place a buy and if the ensemble
returns a negative value it will place a sell. Take profit is set to 50 pips, and a stop loss will
be moved to break-even +1 pip when an open position is 20 pips in profit. Table 9.27 lists the
options for the signal-generator section.
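
The break-even behaviour can be illustrated with a small Python sketch (illustration only, not DeepThought's order management code; a pip size of 0.0001 is assumed):

def breakeven_stop(entry_price, current_price, break_even_pips=20.0,
                   pip_size=0.0001, is_long=True):
    # Once profit reaches break_even_pips, move the stop to break-even +1 pip.
    profit_pips = (current_price - entry_price) / pip_size
    if not is_long:
        profit_pips = -profit_pips
    if profit_pips >= break_even_pips:
        return entry_price + (pip_size if is_long else -pip_size)
    return None                                   # stop loss left unchanged

print(breakeven_stop(1.3500, 1.3525))             # -> 1.3501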
Table 9.27: signal-generator configuration options.

Option                          Description
entry-times                     Defines the hours and days that the signal
                                generator will generate signals for. The
                                parameters in this subsection are:
                                hour          Use all for all hours, or a comma
                                              separated list of the allowed
                                              hours. For example, to trade only
                                              at 10am, 12pm and 4pm the entry
                                              would be <hour>10,12,16</hour>.
                                day-of-week   Use all for all days, or a comma
                                              separated list of the allowed
                                              days. Numbers (0 is Sunday) or
                                              three character representations
                                              can be used. For example, to only
                                              trade on Tuesday, Wednesday and
                                              Thursday the entry would be
                                              <day-of-week>Tue,Wed,Thu</day-of-week>
                                              or <day-of-week>2,3,4</day-of-week>.
entry-threshold                 This is the value that the ensemble must exceed
                                to generate a signal.
set-parameter-value-func-name   Optional. Only needed if you are using Python
                                scripting. The name of the function that is
                                used to set parameters used by other functions
                                in the script. Typically SetParameterValue.
combine-forecasts-func-name     Optional. Only needed if you are using Python
                                scripting. The name of the function that is
                                used to combine the forecasts from the
                                predictors into a signal. This function must
                                send buy/sell/close signals as it overrides the
                                rules built into DeepThought. Typically
                                CombineForecasts.
parameter                       Only needed if using Python scripting and your
                                Python function requires parameters. Defines a
                                parameter that is passed via
                                set-parameter-value-func-name. You can have as
                                many parameters defined as you need. The values
                                are able to be set using the genetic algorithm
                                if desired. Two attributes must be set with this
                                element:
                                name   Name of the parameter which will be
                                       passed as a string to the function
                                       defined by
                                       set-parameter-value-func-name.
                                type   Takes one of the following values: int,
                                       string, double. The value of parameter is
                                       passed as this type to the function
                                       defined by
                                       set-parameter-value-func-name.
take-profit                     Optional. Set to 0.0 to disable. The take profit
                                in pips set when an order is sent to the broker.
stop-loss                       Optional. Set to 0.0 to disable. The stop loss
                                in pips set when an order is sent to the broker.
break-even                      Optional. Set to 0.0 to disable. If an open
                                position exceeds this value in pips, a stop loss
                                is set to break even +1 pip.
exit-all-hour                   Optional. Set to -1 to disable. The hour of the
                                day to close all trades. For example, if this
                                was set to 12, then at 12pm each day all trades
                                will be closed.
trade-bar-series                The identifier of the bar-series that the orders
                                are placed on.
reverse-signals                 If set to True, will reverse all signals: sell
                                instead of buy and buy instead of sell. Use with
                                caution and only when you are certain that your
                                ensemble is reliably wrong.

9.9  trader

The trader is responsible for simulating trades and passing signals to the trading platform. A
sample configuration snippet is given below and table 9.28 lists the options for trader.

<trader>
<hold-minutes>0</hold-minutes>
<hold-bars>0</hold-bars>
<max-drawdown>100000</max-drawdown>
<scale-out>False</scale-out>
<max-position>100</max-position>
<direction>both</direction>
<limit-orders offset="0.0">False</limit-orders>
</trader>

Table 9.28: trader configuration options.

Option         Description
hold-minutes   The number of minutes to hold an open position. After a position
               has been open for the number of minutes given here, the position
               will be automatically closed. This auto-close function is
               disabled if the value is set to 0.
hold-bars      The number of bars to hold an open position. After a position
               has been open for the number of bars given here, the position
               will be automatically closed. This auto-close function is
               disabled if the value is set to 0.
max-drawdown   A backtest will be halted if a drawdown of this many pips is
               encountered. Useful for the genetic algorithm to abandon bad
               configurations.
scale-out      If set to True, reduces the number of positions (by two) when a
               reverse signal is encountered. For example, if we are long by 5
               positions and a sell signal is received, the position will be
               reduced to 3. If this option is set to False then the 5
               positions will all be closed and a sell position opened (or a
               sell limit order placed if the limit-orders option has been
               set).
max-position   The maximum number of open positions at any point in time.
direction      The direction to trade in. Takes one of the following values:
               both    Trade in both directions.
               long    Only take long trades.
               short   Only take short trades.
limit-orders   If this is True, limit orders will be placed, offset pips below
               the bid for a buy order and offset pips above the ask for a sell
               order. If limit-orders is False then market orders are sent.

9.10  backtest

The backtest configuration section is where parameters only related to backtesting are set.
This section is ignored when live and paper trading. A sample configuration snippet is given
below and table 9.29 lists the options.

<backtest>
<start-date>2013-01-01</start-date>
<stop-date>2013-12-08</stop-date>
<use-recorded-signals>False</use-recorded-signals>
<display-progress>True</display-progress>
<execute-when-complete>python "C:\DeepThought\python\analyse_backtest_results.py" %CONFIG_LOCATION%</execute-when-complete>
</backtest>

Table 9.29: backtest options.

Option                  Description
start-date              The date the backtest is to start from, in the format
                        yyyy-mm-dd.
stop-date               The date the backtest is to finish, in the format
                        yyyy-mm-dd.
use-recorded-signals    During a backtest (and live/paper trading) forecasts
                        from the predictors are recorded to a CSV file. These
                        forecasts can be used in backtests if you are not
                        changing any predictor options. For example, if you are
                        only experimenting with take-profit and stop-loss then
                        the backtesting will be quicker by several orders of
                        magnitude.
display-progress        When backtesting using recorded signals, the backtest
                        is normally very quick, but can be slowed down if the
                        progress is printed to the console. Set
                        display-progress to False to turn off progress printing
                        and speed the backtest up even further. Nothing is lost
                        as everything is still logged to the log file.
execute-when-complete   A script to execute when the backtest has been
                        completed. You may have as many
                        <execute-when-complete> entries as you require. If the
                        macro %CONFIG_LOCATION% is present, it will be replaced
                        with the full path of the directory containing the
                        configuration file. This enables scripts to parse the
                        various output files. In this example we are using a
                        Python script, but it can be anything.

9.11  genetic-algo

The genetic algorithm functionality for parameter selection is described in detail in chapter 6.
A sample configuration snippet is given below and table 9.30 lists the available options.

<genetic-algo>
<ga-server>tcp://wraith</ga-server>
<ga-server-port>55566</ga-server-port>
<genome-id>-1</genome-id>
<timeout-minutes>360</timeout-minutes>
<population-size>20</population-size>
<objective-function>sharpe</objective-function>
<mutation-probability>10</mutation-probability>
<num-breeders-percent>30</num-breeders-percent>
<min-num-breeders>30</min-num-breeders>
<num-new-random-genomes>2</num-new-random-genomes>
<num-generations>10</num-generations>
<parameter id="stop-loss"
type="integer" low="10" high="200" step="5" />
<parameter id="take-profit" type="integer" low="10" high="200" step="5" />
<parameter id="time-of-day" type="categorical" values="h1,h4,single,none" />
</genetic-algo>


Table 9.30: genetic-algo options.

Option                   Description
ga-server                The name of the machine that the genetic algorithm is
                         being run from. The GA is designed to run across
                         multiple machines, so this tells the remote machines
                         where to send their results.
ga-server-port           The TCP port to use. This can be any number 1-65535;
                         however, it cannot conflict with any existing network
                         services.
genome-id                This is set by the genetic algorithm as a way of
                         identifying genomes being tested with backtest
                         results. Leave this option value as -1.
timeout-minutes          If this is set to a value greater than zero, a
                         generation will time out after this many minutes. This
                         caters for the case where most genomes in a population
                         have been tested and the system is waiting for 1 or 2
                         to complete. The GA server will cancel all running
                         jobs after this timeout and the (incomplete) results
                         are abandoned.
population-size          This is the number of genomes that make up a
                         generation. You will generally want to set this to be
                         around the number of cores you have available in your
                         cluster. The larger this number, the more genomes can
                         be tested simultaneously. Even if you only have access
                         to a single machine we recommend making this value at
                         least 20.
objective-function       The objective that we are optimising. This is one of
                         the following values:
                         pnl        Maximise profit in pips.
                         sharpe     Maximise the Sharpe ratio.
                         sortino    Maximise the Sortino ratio.
                         accuracy   Maximise the accuracy.
mutation-probability     When two genomes are being crossed, this parameter
                         controls the probability of a mutation in the child
                         genomes.
num-breeders-percent     As each generation of genomes is tested the results
                         for all genomes are kept and the top
                         num-breeders-percent are used to breed more genomes.
min-num-breeders         This is the minimum number of breeders required to
                         create a new generation. This setting will override
                         num-breeders-percent in the event that the number of
                         genomes produced by num-breeders-percent is lower than
                         the number defined by min-num-breeders. This can
                         happen in the first couple of generations if the
                         population size is small. Another way to think about
                         these parameters is that the number of breeders is the
                         higher of num-breeders-percent and min-num-breeders.
num-new-random-genomes   This is the number of new randomly created genomes
                         added at each generation. This is used to ensure that
                         new genetic material is added at each generation.
parameter                We can have as many parameter sections as we need.
                         Each parameter section defines an individual parameter
                         to be included in the optimisation. The options for
                         parameter are documented in detail in table 9.31.


Table 9.31: parameter configuration options for the genetic-algo configuration section.

Option        Description
integer       Used when the parameter can be modelled as an integer. The
              options available for the integer type are:
              low    The lowest value that the integer can take.
              high   The highest value that the integer can take.
              step   The value to increment/decrement for different values of
                     this parameter.
categorical   Used when the parameter can only take certain (string) values.
              The options available for the categorical type are:
              values   Comma separated list of values that this parameter can
                       take.
exp-2         Used when the parameter is best suited to an exponential grid
              search. For example, SVM penalty, SVM gamma and SVM epsilon are
              best searched using an exponential grid search. This means that
              rather than use values that are linearly spaced, such as
              5, 10, 15, 20, ..., we use values such as 2^1, 2^2, 2^3, 2^4, ....
              This results in final values of 2, 4, 8, 16, .... Note that
              negative exponents can be used and result in final values less
              than 1, e.g. 2^-5, 2^-4, 2^-3, 2^-2, ... become
              0.03125, 0.0625, 0.125, 0.25, .... The options available for the
              exp-2 type are:
              low    The lowest value that the exponent can take.
              high   The highest value that the exponent can take.
              step   The value to increment/decrement the exponent.
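
The exp-2 expansion can be reproduced with a couple of lines of Python (illustration only):

def exp2_grid(low, high, step):
    # low/high/step apply to the exponent; the final value is 2 ** exponent.
    return [2.0 ** e for e in range(low, high + 1, step)]

print(exp2_grid(-5, -2, 1))   # [0.03125, 0.0625, 0.125, 0.25]
print(exp2_grid(1, 4, 1))     # [2.0, 4.0, 8.0, 16.0]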


Chapter 10

Commandline Tools
The DeepThought commandline tool has more options than just backtesting. It can be used to
generate data for analysis in other tools such as Python, R and Excel. It can also perform
diagnostic functions on your data. If a command does not appear to have worked as expected,
check the DeepThought.log file for errors and/or warnings.
To see a list of options at any time, type

deepthought

at the command prompt, in the DeepThought installation directory. The output is similar to:

DeepThought built on Apr  9 2014 at 23:09:36
Usage:
--backtest <directory>
Backtest the configuration file in
<directory>. Various files are output
to the same location. The configuration
file must be named config.xml or
_config.xml.
--extract-training-set <directory>
Uses the config in <directory> to
generate training files for all <model>
sections in the config. Use
--num-samples to define the number of
samples, --num-skip X to take every Xth
sample, and --extract-to-date to only
extract samples before the supplied
date.
--extract-to-date YYYY-MM-DD
Used with --extract-training-set to
define the date to extract samples
before.
--generate-bars <database>
Generate a bar series from the given
database and save to csv with the given
filename given with the
--output-filename parameter. Requires
the --bar-type, --output-filename
parameters and --duration for
const-time, --price-movement for
const-price (Renko) candles.
--bar-type <string>
Used with --generate-bars. Type of bars
to generate. const-time | const-price-1
(Renko) | const-price-2 (Renko).
--output-filename <filename>
Used with --generate-bars. The name of
the CSV file to create. Created in the
same directory that DeepThought is run
from.
--price-movement <price delta>
Use with --generate-bars. When the
bar-type is specified as const-price
(Renko), this parameter defines the
magnitude of the price movement.
--generate-feature-stats <directory> Run through a backtest without
generating signals and generates
feature statistics.
--genetic-algo <config template>
Run a genetic algo using the supplied
file as the template. Requires a Condor
cluster.
--import-dukascopy-csv <csv filename> Import a CSV file downloaded from
Dukascopy using the tool from
http://www.strategyquant.com/tickdatadownloader/. Also need the --dbname
property.
--dbname <db name>
The full path of a database to load
Dukascopy data into.


--manual-trade-train-and-persist <directory>
Use the configuration in the supplied
directory to train the models for use
in generating a signal.
--manual-trade-generate-signal <directory>
Use the configuration in the supplied
directory to generate the latest
signal.
--num-samples <number of samples>
Used with --extract-training-set to
define the number of samples to
extract. Set to 0 to extract all
available samples.
--num-skip <number of samples to skip>
Used with --extract-training-set to
define the number of samples to skip
per extracted sample.
--print-config
Print example config.
--stats <database>
Print out some interesting statistics
for the data in the given candle
database. Requires the --duration
parameter and optional --delay,
--start-date and --data-file-dir
parameters.
--data-file-dir <directory>
Used with --stats and --check-db.
Optional directory where database files
are located. Default is C:\FX_Database.
--delay <Int>
Used with --stats. Indicates the delay
to use when creating candles. Optional.
--duration <minutes>
Used with --stats. The duration in
minutes of the candles in the datafile.
--multiplier <int>
Used with --stats. Optional price to
pips multiplier. Default is 10000.
--start-date <date>
Used with --stats. Optional start date
for when the stats are generated, in
the format YYYY-MM-DD.
--gbt-param-search-c <Filename>
Perform a parameter grid search for GBT
classification problem on the supplied
file in libSVM format.
--svm-param-search-c <Filename>
Perform a parameter grid search for an
SVM classification problem on the
supplied file in libSVM format.
--svm-param-search-r <Filename>
Perform a parameter grid search for an
SVM regression problem on the supplied
file in libSVM format.
--version
Print version info.

The various options are detailed in the following sections.

10.1 Candle Statistics (--stats)

This function prints some interesting statistics on candles generated from 1 minute candles. This
can be useful to get a feel for the average move of a bar at a certain time of day. The command is:
deepthought --stats EURUSD --data-file-dir C:\FX Database --duration 90
This example will generate statistics on the database EURUSD.db located in C:\FX Database
with a duration of 90 minutes. The output is similar to:
Hour  High-Low  High-Open  Open-Close  Open-Low  DownCandle:H-O  UpCandle:O-L
0     32.1042   14.2428    15.1128     17.8614   7.18116         8.09004
1     27.2416   13.9522    11.7623     13.2895   6.35232         7.90242
3     23.9615   11.6187    9.69082     12.3428   5.92409         6.88904
4     26.1151   12.7399    10.4736     13.3752   6.86057         7.9061
6     40.6504   19.5089    17.7179     21.1415   10.5305         12.0605
7     45.0207   21.5749    20.7954     23.4458   10.494          12.1731
9     42.4698   21.0607    19.0148     21.4091   10.5383         11.5586
10    38.3129   17.2918    16.6603     21.0211   8.9854          11.2163
12    57.4895   23.2958    22.0288     34.1937   12.2934         13.4976
13    54.2737   27.0638    25.4632     27.2098   13.3921         13.8574
15    47.0676   22.5515    21.0145     24.5161   11.274          12.0082
16    34.3027   17.4355    15.2099     16.8672   8.24733         9.64635
18    33.2836   15.9249    14.4872     17.3587   7.99955         9.24595
19    27.5136   13.5833    12.5106     13.9303   6.70084         7.73969
21    24.2859   11.1304    9.15172     13.1555   6.46652         7.9826
22    25.9693   12.0976    10.7947     13.8717   6.12017         8.07086

The meanings of each column are given in table 10.1, where a Down Candle is defined as the
close price being lower than the open price, and an Up Candle is defined as the close price
being higher than the open price.


Table 10.1: Column meanings using the --stats commandline option.

Column          Description
Hour            The hour of the day.
High-Low        Average price difference in pips between the high and low prices.
High-Open       Average price difference in pips between the high and open prices.
Open-Close      Average price difference in pips between the open and close prices.
Open-Low        Average price difference in pips between the open and low prices.
DownCandle:H-O  The average price difference, for Down Candles only, between the high and open
                prices. Useful for optimising the level to place limit orders.
UpCandle:O-L    The average price difference, for Up Candles only, between the open and low
                prices. Useful for optimising the level to place limit orders.
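As a hedged illustration of how these statistics might feed back into a configuration: if the average DownCandle:H-O value around the hours you trade is roughly 10 pips, a limit-order offset of a similar size could be tried in the trader section. The element and attribute names below are the ones shown in the configuration reference in section 10.8; the offset value itself is only an example, not a recommendation:

<trader>
    <!-- offset in pips, assuming the bar-series price-to-pip-multiplier is 10000 -->
    <limit-order offset="10.0">True</limit-order>
</trader>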

Note that the command above generates the statistics using all data in the database; it shows
nothing about how the statistics change over time. To generate statistics using only recent data,
use the --start-date option in the format YYYY-MM-DD. For example:
deepthought --stats EURUSD --data-file-dir C:\FX Database
--duration 90 --start-date 2013-01-01
produces the following output:
Hour  High-Low  High-Open  Open-Close  Open-Low  DownCandle:H-O  UpCandle:O-L
0     35.377    15.5626    16.6329     19.8144   7.80126         9.05009
1     29.8979   15.1253    12.7776     14.7726   6.95526         8.81505
3     26.3702   12.7431    10.4709     13.6271   6.59142         7.7593
4     28.7024   13.9491    11.3823     14.7533   7.50945         8.84543
6     44.1595   21.1497    19.0823     23.0098   11.4971         13.2004
7     48.3887   23.01      22.2336     25.3787   11.2253         13.3212
9     45.716    22.6288    20.4367     23.0872   11.4784         12.4819
10    41.5242   18.72      17.9344     22.8041   9.85978         12.2913
12    61.8966   24.3663    22.9623     37.5303   12.899          14.9961
13    57.4965   28.4277    26.5514     29.0688   14.2292         15.0963
15    50.4498   23.8847    22.2757     26.5651   12.1771         13.0914
16    36.5071   18.3648    16.1003     18.1423   8.78008         10.3921
18    35.4195   16.9111    15.2546     18.5084   8.61007         10.0501
19    30.0237   14.7538    13.6816     15.2699   7.25489         8.50051
21    26.8316   12.1506    10.029      14.681    7.13001         8.89532
22    28.6849   13.1876    11.8746     15.4972   6.72975         9.02975

10.2 Generate Bars (--generate-bars)

The --generate-bars function creates candles in a CSV file for analysis. Standard constant
time and Renko (see section 9.1.1) type 1 and type 2 bars can be generated. The following
command generates 45 minute candles and stores them in EURUSDm45.csv:
deepthought --generate-bars EURUSD --bar-type const-time --duration 45
--output-filename EURUSDm45
To generate Renko bars, use the following example:
deepthought --generate-bars EURUSD --bar-type const-price-1 --price-movement
0.002 --output-filename EURUSD Renko 20

This will generate type 1 Renko bars with a price difference of 0.002 (20 pips). Table 10.2
lists the required parameters for the --generate-bars function.

Table 10.2: --generate-bars parameters.


Parameter          Description
--generate-bars    The argument immediately following is the database. In the above example the
                   database file is eurusd.db in C:\FX Database.
--bar-type         The type of bar to generate. Must be one of:
                   const-time     Standard constant time candles. Must also provide the
                                  --duration parameter.
                   const-price-1  Renko type 1 bars. Must also provide the
                                  --price-movement parameter.
                   const-price-2  Renko type 2 bars. Must also provide the
                                  --price-movement parameter.
--duration         The duration, in minutes, of a constant time candle specified by a
                   --bar-type of const-time.
--price-movement   The price movement of a Renko bar specified when --bar-type is
                   const-price-1 or const-price-2.
--data-file-dir    Optional parameter specifying the location of the database. The default of
                   C:\FX Database is used if this parameter is not supplied.
--output-filename  The name of the CSV file to write the results. Will be created in the
                   same directory as DeepThought.

10.3 Generating a Manual Signal

This is covered in more detail in section 7.1. The commands given below are an example of generating a manual signal from the configuration in C:\DeepThought Configs\EURUSD Strategy 1:
deepthought --manual-trade-train-and-persist C:\DeepThought
Configs\EURUSD Strategy 1
Once the models have been trained, we can now generate the actual signal with:
deepthought --manual-trade-generate-signal C:\DeepThought
Configs\EURUSD Strategy 1
The output will be similar to below:

DeepThought built on Jan  7 2014 at 16:48:35
BUY
Consensus=25
NumberOfPredictors=45

10.4 Generating Feature Statistics (--generate-feature-stats)

During a backtest and live/paper trading, feature statistics are generated for each model in
the configuration. The --generate-feature-stats function generates these statistics without


performing any of the other backtest functions, thus it is a quick way to generate the feature
statistics.
Feature statistics are useful for checking data, and for ideas around what static stop loss and
take profit levels should be. The statistics are minimum, maximum, mean and standard
deviation. They are generated for each training set so are regenerated at the end of each
candle. Although the statistics can change over time, there should be no spikes or large changes.
The statistics are generated using the following command:
deepthought --generate-feature-stats C:\DeepThought Configs\EURUSD GA

10.5 Extracting a Training Set (--extract-training-set)

A training set can be extracted using the following command:


deepthought --extract-training-set ExampleConfigs\EURUSD MA
This will extract a training set for all models defined in the configuration in the directory
ExampleConfigs\EURUSD MA. It will create files in libSVM format and CSV. The CSV file is
useful for examination in Excel or any other package. It also gives you more clarity on what
features are being generated and what the data actually looks like. Note that the extracted
data has had scaling applied, so if you want to see the raw data, set the scale-type of each
feature to none.
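For example, a minimal sketch of a feature definition with scaling disabled (the element names follow the configuration reference in section 10.8; the bar-series identifier is just an example):

<feature>
    <type>bar-diff</type>
    <diff-type>close</diff-type>
    <bar-series>EURUSDh4</bar-series>
    <number>30</number>
    <scale-type>none</scale-type> <!-- no scaling, so raw price differences appear in the extracted CSV -->
</feature>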
We can also optionally add --num-samples to set the number of training examples in the
training set extracted and --num-skip to define how many candles the window moves between
each sample extraction. For example to extract a training set of 1500 samples, sampled at every
second bar, use the following command:
deepthought --extract-training-set ExampleConfigs\EURUSD MA
--num-samples 1500 --num-skip 2

10.6 SVM Grid Search (--svm-param-search-c)

A grid search is a brute-force approach to finding parameters. Support vector machines
require hyper-parameters to be set. These are the Penalty and Gaussian width for classification,
and Penalty, Gaussian width and Epsilon for regression. The steps for a grid search are:
deepthought --extract-training-set ExampleConfigs\EURUSD MA
deepthought --svm-param-search-c h4-features.training.data
The first line extracts training sets for the models defined in config.xml in the directory ExampleConfigs\EURUSD MA. In this example there is only one model and the name of
the file produced is h4-features.training.data in libSVM format. The second command
performs the parameter search.


The results are in the log file. The log file also contains configuration text for each parameter
set. Below is a sample of what is written to the log file for a regression model. This can be cut
and pasted into your configuration file. You can either take all parameter sets that had a
cross-validated accuracy of greater than 50%, the top n parameter sets, or just use all parameter sets.
If some combinations of parameters produce an accuracy of 0%, this means that the problem
could not converge and the parameter combination should be discarded.

<params> <!-- 55.4% -->


<penalty>2</penalty>
<gamma>0.25</gamma>
<epsilon>0.00390625</epsilon>
<forecast-weight>1.0</forecast-weight>
<svm-type>SVR</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 55.1% -->
<penalty>8192</penalty>
<gamma>0.00390625</gamma>
<epsilon>0.00390625</epsilon>
<forecast-weight>1.0</forecast-weight>
<svm-type>SVR</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 55% -->
<penalty>8</penalty>
<gamma>0.0625</gamma>
<epsilon>0.015625</epsilon>
<forecast-weight>1.0</forecast-weight>
<svm-type>SVR</svm-type>
<kernel>rbf</kernel>
</params>
...

For regression use the following commands:


deepthought --extract-training-set ExampleConfigs\EURUSD MA
deepthought --svm-param-search-r h4-features.training.data
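As a hedged sketch of how the pasted output fits together, the chosen <params> sections sit inside an <svm-predictor> section. The element names below are the ones used in the configuration reference in section 10.8; the identifier, model name and training-set sizes are placeholders:

<svm-predictor>
    <identifier>svm-1</identifier>
    <model>h4-features</model>
    <params> <!-- 55.4% -->
        <penalty>2</penalty>
        <gamma>0.25</gamma>
        <epsilon>0.00390625</epsilon>
        <forecast-weight>1.0</forecast-weight>
        <svm-type>SVR</svm-type>
        <kernel>rbf</kernel>
    </params>
    <!-- further parameter sets pasted from the log file go here -->
    <num-training-observations>1000</num-training-observations>
    <num-training-skip>1</num-training-skip>
</svm-predictor>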

10.7 GBT Grid Search (--gbt-param-search-c)

This is similar to the support vector machine parameter grid search above. In this case we are
optimising for the number of trees (num-trees) and tree depth (depth). The steps for a GBT
grid search are:
deepthought --extract-training-set ExampleConfigs\EURUSD Single GBT
deepthought --gbt-param-search-c h4-features.training.data
The first line extracts training sets for the models defined in config.xml in the directory ExampleConfigs\EURUSD Single GBT. In this example there is only one model and the
name of the file produced is h4-features.training.data in libSVM format. The second
command performs the parameter search. The results are in the log file. The log file also
contains configuration text for each parameter set. Below is a sample of what is written to the
log file for a regression model. This can be cut and pasted into your configuration file. You can
either take all parameter sets that had a cross validated accuracy of greater than 50%, the top
n parameter sets or just use all parameter sets. If some combinations of parameters produce an
accuracy of 0%, this means that the problem couldnt converge and the parameter combination


should be discarded.
<params> <!-- 56% -->
<num-trees>850</num-trees>
<depth>4</depth>
</params>
<params> <!-- 55.5% -->
<num-trees>450</num-trees>
<depth>6</depth>
</params>
<params> <!-- 55.3% -->
<num-trees>250</num-trees>
<depth>12</depth>
</params>
<params> <!-- 55.2% -->
<num-trees>1200</num-trees>
<depth>5</depth>
</params>
...
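In the same way as for the SVM case above, the selected parameter sets can be pasted into a <gbt-predictor> section to form a mini ensemble. A minimal sketch, using the element names from the configuration reference in section 10.8 (identifier and model names are placeholders):

<gbt-predictor>
    <identifier>gbt-1</identifier>
    <model>h4-features</model>
    <predictor-weight>1.0</predictor-weight>
    <params> <!-- 56% -->
        <num-trees>850</num-trees>
        <depth>4</depth>
    </params>
    <params> <!-- 55.5% -->
        <num-trees>450</num-trees>
        <depth>6</depth>
    </params>
    <num-training-observations>1000</num-training-observations>
    <num-training-skip>1</num-training-skip>
</gbt-predictor>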


10.8 Printing XML Configuration Documentation (--print-config)

A detailed configuration will be printed to the console with the following command:

deepthought --print-config

This can be saved to a file using output redirection:

deepthought --print-config > config-details.xml

where the configuration will be saved in the file config-details.xml. The output is
shown on the following pages and is useful as a reference guide.


<config>
<!-- Values are case insensitive. -->
<bar-series-collection>
<data-file-dir>C:\FX_Database</data-file-dir> <!-- Specifies where historical database files are kept. -->
</bar-series-collection>
<!-- We define as many bar-series as we need. Normally we'd define one which takes 1 min candles, then others -->
<!-- that feed from this bar-series to generate longer time frames, renko bars, etc. -->
<bar-series>
<identifier>EURUSDm15</identifier> <!-- Identifier for this BarSeries. Used by other sections
to reference this instance. -->
<bar-series-type>const-time</bar-series-type> <!-- const-time* | const-price-method-1 | const-price-method-2
const-time is standard candle, others are Renko. -->
<save-to-history-source>True</save-to-history-source> <!-- If save is called, this flag indicates whether
to save this BarSeries's data or not. -->
<history-source-type>bar-series</history-source-type> <!-- bar-series | sqlite. Where to get history from. -->
<history-source-in>EUR.USD</history-source-in> <!-- Detail on where to find history. If the data is from
another bar-series, this is the identifier of the other series.
If the data is sourced from an sqlite database,
this is the filename of the sqlite database. -->
<price-to-pip-multiplier>1</price-to-pip-multiplier> <!-- Generally used to make pips human readable,
e.g. 5 pips instead of 0.0005. -->
<average-spread>0.00015</average-spread> <!-- Used by the backtester when calculating trade profit. -->
<bar-duration-minutes>15</bar-duration-minutes> <!-- If the BarSeries-type is const-time, this is the duration in
minutes of the bar. -->
<const-bar-price>0.0</const-bar-price> <!-- If the BarSeries-type generates Renko bars, this is the
price movement for a new bar to be generated. -->
<delay-minutes-offset>5</delay-minutes-offset> <!-- Optional. Used to delay the start of const-time candles.
For example if 5 is given with 1 hour candles then each
bar will start and finish as 5 mins past the hour. Useful to
allow the market to absorb news before forecasting. -->
</bar-series>
<!-- we can have as many feature definitions as we need. -->
<model>
<identifier>test-features</identifier> <!-- An identifier for this model. -->
<!-- Each model has one target and an unlimited number of features. --> <!-- Target x bars in the future -->
<target-definition>
<type>bars-in-future</type>
<identifier>target-1-bar</identifier> <!-- Identifier used by other sections (e.g. SVM) that use this target. -->
<bar-series>EURUSDrenko</bar-series> <!-- bar-series that the forecasts are for. -->
<number>1</number> <!-- The number of bars in the future to forecast. -->
<target-value-multiplier>10000.0</target-value-multiplier> <!-- multiply the final target by this value. Can make
a difference for some regression predictors. -->
<price-type>close</price-type> <!-- high | low | close | up-down where up-down is used for classification. -->
</target-definition>
<!-- Day of Week -->
<feature>
<type>day-of-week</type>



<representation>single</representation> <!-- single | binary*
                                             single: means a single number (0-6)
                                             binary: returns a one-hot vector. -->
</feature>
<!-- Hour of day -->
<feature>
<type>hour-of-day</type>
<period>H1</period> <!-- H1 | H4* | single. H1 and H4 return one-hot binary vectors. single return hour of day
as a single number. -->
</feature>
<!-- Moving Average -->
<feature>
<type>moving-average</type>
<ma-attribute-type>close</ma-attribute-type> <!-- The price type to compute the MA on. One of:
volume | num-trades | vwap | bar-duration |
mins-between-high-low | time-high | time-low |
open | high | low | close | average-close |
average-hlc -->
<value-type>diff</value-type> <!-- Takes one of the following values:
value : use the MA value as is.
diff : use the differences between consecutive MA values.
diff-with-price : use the difference between the
MA and the close price. -->
<period>21</period> <!-- The moving average period. -->
<selection-list>1,2,3,4,5,7,9,13,16,20,25</selection-list> <!-- The indexes which the MA
differences are calculated. -->
<number>5</number> <!-- If selection-list is not given, specifies the number of past values to use
as features. -->
<bar-series>EURUSDH4</bar-series> <!-- The bar-series to calculate the moving average on. -->
</feature>
<!-- Bar difference -->
<feature>
<type>bar-diff</type>
<diff-type>close</diff-type> <!-- high | low | close* | up-down (used for Renko, will output +1 or -1
depending on if the bar is up or down) | high-open | close-low |
high-close | open-to-close | prev-close-to-open -->
<bar-series>EURUSDr20</bar-series> <!-- The bar-series that the differences are extracted from. -->
<scale-type>min-max</scale-type> <!-- zscore | none* | min-max | div-sd | div-max | log10 -->
<min-max-clamp>0.015</min-max-clamp> <!-- Optional price clamp - any values greater (or less than the -ve value)
are truncated. -->
<number>30</number> <!-- The number of historical bars to compute the difference between. If set to 0 the
selection-list will be used. -->
<selection-list>1,2,3,4,5,7,9,13,16,20,25</selection-list> <!-- If specified, this is the index of the bars
to compute the difference between. -->
</feature>
<!-- Bar/Candle Attribute -->
<feature>


<type>bar-attribute</type>
<attribute-type>volume</attribute-type> <!-- volume | num-trades | vwap | duration-minutes | mins-between-high-low
minute-low : the number of minutes from the candle open to the low
minute-high : the number of minutes from the candle open to the high
average-close : average of the 1 minute close prices that the candle
is generated from.
average-hlc : average high/low/close. -->
<value-type>value</value-type> <!-- value* (actual unmodified value) | diff (difference between bars) | num-bars
(the smallest number of bars that the supplied volume parameter will fit into). -->
<const-volume>1000</const-volume> <!-- Used when the volume-type is defined as num-bars. -->
<bar-series>emini-5min</bar-series> <!-- The bar-series that the differences are extracted from. -->
<number>30</number> <!-- The number of historical bars used to compute the attributes. -->
<scale-type>min-max</scale-type> <!-- zscore | none* | min-max | div-sd | div-max | log10 -->
</feature>
<!-- CSV Feature -->
<feature>
<type>csv-feature</type>
<filename>C:\IBFX-MT4-AU\experts\files\EURUSD_CCI.csv</filename>
<identifier>cci_feature</identifier>
<value-type>diff</value-type> <!-- Defines how to process the CSV values. Must be one of:
diff - the difference between values whose indexes are specified
by the selection-list. For example if the list is
specified as 1,2,5 then the differences used in the
model will be the difference between the value at
0 and 1, 1 and 2, 2 and 5.
value - use the value directly. The first element in the
selection-list should be 0. -->
<selection-list>1,2,3,5,8,13,21,34</selection-list> <!-- The indexes of the values in the CSV
file to compute the difference between. -->
</feature>
<!-- Fundamental Indicator -->
<feature>
<type>fundamental-indicator</type>
<bar-series>EURUSDh4</bar-series> <!-- Used to calculate the number of candles before and after an event. -->
<title>Change in Non-farm Payrolls</title> <!-- One of the following values:
Change in Non-farm Payrolls
Advance Retail Sales
Consumer Confidence
Consumer Price Index (yoy)
consumer price index ex food & energy (yoy)
Durable Goods Orders
Federal Open Market Committee Rate Decision
Gross Domestic Product (Annualized)
Gross Domestic Product Price Index
ISM Manufacturing
Personal Consumption
U. of Michigan Confidence

Unemployment Rate
Initial Jobless Claims -->

<currency>USD</currency> <!-- USD | EUR -->


<scale-type>zscore</scale-type> <!-- zscore | none* | min-max | div-sd | div-max | log10 -->
</feature>
</model>
<!-- Predictor Ensemble -->
<predictor-ensemble>
<use-recorded-signals>false</use-recorded-signals> <!-- true | false. If signals have not changed but you want to
test new stoploss/take profit, etc, set to true -->
<retrain-period>Weekly</retrain-period> <!-- Defines when the retraining is done. Takes one of:
each-bar : retrain after each bar.
daily
: retrain each day at 00:00.
weekly
: retrain after the first bar on Monday.
monthly : retrain after the first bar on the first Monday
of the month. -->
</predictor-ensemble>
<!-- Extremely Randomised Trees predictor. -->
<extremely-randomised-trees-predictor>
<model>15min-features</model> <!-- The model where we get training data. -->
<identifier>random-forest-1</identifier> <!-- The name of this predictor, printed in log files. -->
<predictor-weight>1.0</predictor-weight> <!-- The weighting to use for this predictor by the
signal generator when assessing all predictors. -->
<params> <!-- Parameters for a mini ensemble. Need at least one set, can have as many as we want. -->
<num-trees>1000</num-trees> <!-- Number of trees. -->
<depth>8</depth> <!-- Depth of an individual tree. -->
</params>
<num-training-observations>1000</num-training-observations> <!-- The number of training instances used to
create a training set. -->
<num-training-skip>1</num-training-skip> <!-- When generating the training set, skip this many samples
between collecting samples. -->
</extremely-randomised-trees-predictor>
<!-- Gradient boosted tree predictor. -->
<gbt-predictor>
<model>15min-features</model> <!-- The model where we get training data. -->
<identifier>gbt-1</identifier> <!-- The name of this predictor, printed in log files. -->
<predictor-weight>1.0</predictor-weight> <!-- The weighting to use for this predictor by the
signal generator when assessing all predictors. -->
<params> <!-- Parameters for a mini ensemble. Need at least one set, can have as many as we want. -->
<num-trees>500</num-trees> <!-- Number of trees. -->
<depth>8</depth> <!-- Depth of an individual tree. -->
</params>
<num-training-observations>1000</num-training-observations> <!-- The number of training vectors used to
create a training set. -->
<num-training-skip>1</num-training-skip> <!-- When generating the training set, skip this many samples
between collecting samples. -->
</gbt-predictor>


<!-- Linear SVM predictor. -->


<linear-svm-predictor>
<model>15min-features</model> <!-- The model where we get training data. -->
<identifier>linear-svm-1</identifier> <!-- The name of this SVM, printed in log files. -->
<predictor-weight>1.0</predictor-weight> <!-- The weighting to use for this predictor by the
signal generator when assessing all predictors. -->
<params> <!-- Definition of a linear SVM instance. We can have as many of these as we want. -->
<penalty>1.0</penalty> <!-- SVM parameter C. -->
<epsilon>0.5</epsilon> <!-- SVM parameter for e-insensitivity. Only required for regression. -->
<solver-type>L2R_L2LOSS_SVC_DUAL</solver-type> <!-- L2R_LR | L2R_L2LOSS_SVC_DUAL | L2R_L2LOSS_SVC |
L2R_L1LOSS_SVC_DUAL | MCSVM_CS | L1R_L2LOSS_SVC |
L1R_LR | L2R_LR_DUAL | L2R_L2LOSS_SVR |
L2R_L2LOSS_SVR_DUAL|L2R_L1LOSS_SVR_DUAL -->
</params>
<num-training-observations>1000</num-training-observations> <!-- The number of training vectors used to
create a training set. -->
<num-training-skip>1</num-training-skip> <!-- When generating the training set, skip this many samples
between collecting samples. -->
</linear-svm-predictor>
<!-- Multi-Layer Perceptron predictor. -->
<multi-layer-perceptron-predictor>
<model>15min-features</model> <!-- The model where we get training data. -->
<identifier>random-forest-1</identifier> <!-- The name of this predictor, printed in log files. -->
<predictor-weight>1.0</predictor-weight> <!-- The weighting to use for this predictor by the
signal generator when assessing all predictors. -->
<params> <!-- Parameters for a mini ensemble. Need at least one set, can have as many as we want. -->
<training-algo>rprop</training-algo>
<hidden-layers>20</hidden-layers> <!-- Comma separated list of the number of neurons in hidden
layers. Example: a single hidden layer with 20 neurons use "20"
Two hidden layers with 40 and 20 neurons, use "40,20". -->
<activation-function>sigmoid</activation-function> <!-- identity|sigmoid|gaussian -->
<max-iterations>1000</max-iterations> <!-- Stop training after this many iterations. -->
<termination-epsilon>0.01</termination-epsilon> <!-- Stop training when the change in error drops below
this value. Use 0.0 to disable. -->
<forecast-type>classification</forecast-type> <!-- classification|regression. Use classification to forecast
up/down, and regression to forecast the actual
price move. -->
<classification-output>binary</classification-output> <!-- binary|value. If the forecast type is classification
the predictor can output either a binary (1,-1)
or a value relating to the confidence,
eg, 0.54 for up, -0.28 for down. -->
</params>
<num-training-observations>1000</num-training-observations> <!-- The number of training vectors used to
create a training set. -->
<num-training-skip>1</num-training-skip> <!-- When generating the training set, skip this many samples
between collecting samples. -->
</multi-layer-perceptron-predictor>


<!-- Random forest predictor. -->


<random-forest-predictor>
<model>15min-features</model> <!-- The model where we get training data. -->
<identifier>random-forest-1</identifier> <!-- The name of this predictor, printed in log files. -->
<predictor-weight>1.0</predictor-weight> <!-- The weighting to use for this predictor by the
signal generator when assessing all predictors. -->
<params> <!-- Parameters for a mini ensemble. Need at least one set, can have as many as we want. -->
<num-trees>1000</num-trees> <!-- Number of trees. -->
<depth>8</depth> <!-- Depth of an individual tree. -->
</params>
<num-training-observations>1000</num-training-observations> <!-- The number of training vectors used to
create a training set. -->
<num-training-skip>1</num-training-skip> <!-- When generating the training set, skip this many samples
between collecting samples. -->
</random-forest-predictor>
<!-- SVM Predictor -->
<svm-predictor>
<model>15min-features</model> <!-- The model where we get training data. -->
<continuous-tune>False</continuous-tune> <!-- True|False. Set to True to perform a parameter search after
each forecast. Note that params sections are ignored if this
is set to True as it will generate params sections using the
best parameters. -->
<continuous-tune-num-param-sets>5</continuous-tune-num-param-sets> <!-- The number of the top parameter sets to use
after a parameter search if continuous-tune
has been set to True. -->
<model-min-accuracy>54.0</model-min-accuracy> <!-- When continuous tune is true, only use parameter sets
that produce a model with accuracy of this value
or higher. If a new model cannot be found, the existing
one is retained. -->
<no-model-behaviour>dont-trade</no-model-behaviour> <!-- Defines what to do when a model with cross-validated accuracy
set by <model-min-accuracy> cannot be found.
Takes one of the following values:
dont-trade - Close all orders and do not forecast
use-last-model - Use the last best model(s)
use-default-params - Use the parameters defined in the
<params> section.-->
<predictor-weight>1.0</predictor-weight> <!-- The weighting to use for this predictor by the
signal generator when assessing all predictors. -->
<identifier>svm-1</identifier> <!-- The name of this SVM, printed in log files. -->
<params> <!-- Definition of an SVM instance. We can have as many of these as we want. -->
<penalty>32.0</penalty> <!-- SVM parameter C. -->
<gamma>0.03125</gamma>
<!-- SVM parameter for Gaussian width. -->
<epsilon>0.5</epsilon>
<!-- SVM parameter for e-insensitivity. Only required for SV regression. -->
<coeff>0.0</coeff>
<!-- SVM parameter used for the polynomial and sigmoid kernels. -->
<svm-type>SVR</svm-type> <!-- SVM type, SVC for classification, SVR for regression. -->
<kernel>rbf</kernel> <!-- SVM kernel. linear | polynomial | rbf | sigmoid -->
</params>

<num-training-observations>1000</num-training-observations> <!-- The number of training vectors used to
create a training set. -->
<num-training-skip>1</num-training-skip> <!-- When generating the training set, skip this many samples
between collecting samples. -->

</svm-predictor>
<signal-generator>
<entry-times> <!-- Defines the days and hours-of-day that orders can be sent. -->
<hour>all</hour> <!-- Comma delimited list of hours. use all for all hours. -->
<day-of-week>all</day-of-week> <!-- Comma delimited list of days where days are
represented by a three character abbreviation,
e.g tue,wed,thu. Use all for all days. -->
</entry-times>
<combine-forecasts-func-name>CombineForecasts</combine-forecasts-func-name> <!-- Optional Python function to
combine the individual forecasts
to generate buy/sell/exit signals. -->
<set-parameter-value-func-name>SetParameterValue</set-parameter-value-func-name> <!-- Optional Python function to
set parameters used by the
CombineAlphas function. -->
<entry-threshold>0</entry-threshold> <!-- The predictor ensemble must return an absolute value
of at least this to trade. -->
<take-profit>60.0</take-profit> <!-- Static take profit in pips. -->
<stop-loss>40.0</stop-loss> <!-- Static stop loss in pips. -->
<break-even>20.0</break-even> <!-- Move to break even + 1 pip when a trade is
this many pips in profit. -->
<exit-all-hour>-1</exit-all-hour> <!-- Set to an hour to exit all positions, or -1 to
leave positions open indefinitely. -->
<trade-bar-series>EURUSDrenko</trade-bar-series> <!-- bar-series to place orders. -->
<reverse-signals>False</reverse-signals> <!-- Buy when we get a sell signal and Sell when
we get a buy signal. -->
</signal-generator>
<trader>
<add-to-existing>true</add-to-existing> <!-- Use pyramiding. -->
<max-position>10</max-position> <!-- Max number of positions. -->
<scale-out>false</scale-out> <!-- When we get a reverse signal, reduce the position if True,
otherwise close and reverse. -->
<limit-orders>True</limit-orders> <!-- True|False. Set to True to use limit orders, False for market orders. -->
<limit-order offset="2.0">True</limit-order> <!-- Place limit orders in the backtester rather than market orders.
This example uses an offset of two pips, with the
bar-series.price-to-pip-multiplier set to 10000. If the
bar-series.price-to-pip-multiplier was set to 1, the offset
would be 0.0002. -->
<hold-minutes>0</hold-minutes> <!-- Close any position open for this many minutes. -->
<max-drawdown>10000</max-drawdown> <!-- Stop backtest if drawdown exceeds this amount (in pips). -->
</trader>
<genetic-algo>
<ga-server>tcp://wraith</ga-server> <!-- Server where the GA is run. -->


<ga-server-port>55566</ga-server-port> <!-- TCP port on the machine where the GA is run. -->
<genome-id>-1</genome-id> <!-- Used internally. ID for the genome under test. -->
<timeout-minutes>360</timeout-minutes> <!-- Stop all backtests after this many minutes and
start the next generation. -->
<population-size>20</population-size> <!-- Number of genomes in the population. -->
<objective-function></objective-function> <!-- The objective that we are optimising. One of:
pnl | sharpe | accuracy. -->
<mutation-probability>10</mutation-probability> <!-- Probability of a mutation in a genome. -->
<num-breeders-percent>30</num-breeders-percent> <!-- Top percentage of genomes in a population
to use as breeders for the next generation. -->
<min-num-breeders>30</min-num-breeders> <!-- The minimum number of genomes to use as breeders. -->
<num-new-random-genomes>2</num-new-random-genomes> <!-- Number of random genomes to create
for each generation. -->
<num-generations>15</num-generations> <!-- Stop after this many generations. -->
<!-- The entries below are examples on how to define parameters for optimisation. -->
<parameter id="stop-loss" type="integer" low="10" high="200" step="5" />
<parameter id="time-of-day" type="categorical" values="h1,h4,single,none" />
<parameter id="SVC-penalty" type="exp-2" low="1" high="15" step="1" />
</genetic-algo>
<backtest>
<start-date>2013-01-01</start-date>
<stop-date>2013-12-08</stop-date>
<use-recorded-signals>False</use-recorded-signals>
<display-progress>False</display-progress> <!-- display the trade progress as the backtest runs. -->
</backtest>
</config>


Chapter 11

Fundamental Indicators
(Experimental)
Currently under development is support for fundamental indicators. These are news items that
influence exchange rates such as non-farm payroll, GDP, etc. This functionality is still under development, but you are free to experiment. It will be moved into mainstream functionality only
after thorough testing and tidying of loose ends (such as automated updating of values).
A Python script has been provided which should create and populate a fundamental database.
The script is run using the following command:
python create_fundamentals_db.py --dir <where to save db>
This will create a database named fundamentals.db in the directory specified by the parameter
--dir. Data is downloaded from dailyfx.com.

11.1 Fundamental Feature

A sample configuration snippet to define a fundamental indicator is given below.


<feature>
<type>fundamental-indicator</type>
<bar-series>EURUSDh4</bar-series>
<title>Advance Retail Sales</title>
<currency>USD</currency>
<scale-type>zscore</scale-type>
</feature>

The title element defines what the indicator is. The currently available indicators are listed
in table 11.1. We must also include a fundamentals-db section that defines the location of the
fundamentals database. Below is an example configuration snippet:
<fundamentals-db>
<data-file-dir>c:\fx_database</data-file-dir>
<db-file>fundamentals.db</db-file>
</fundamentals-db>


Table 11.1: title values for the fundamental-indicator feature.

Fundamental Indicator title                    Currency
Advance Retail Sales                           USD
Consumer Confidence                            USD
Consumer Price Index (yoy)                     USD
Consumer price index ex food & energy (yoy)    USD
Durable Goods Orders                           USD
Federal Open Market Committee Rate Decision    USD
Gross Domestic Product (Annualized)            USD
Gross Domestic Product Price Index             USD
ISM Manufacturing                              USD
Personal Consumption                           USD
U. of Michigan Confidence                      USD
Unemployment Rate                              USD
Initial Jobless Claims                         USD

The attributes generated for each fundamental-indicator feature are:


1. Most recent release consensus forecast value.
2. Most recent release actual value.
3. Most recent release previous value.
4. Next release consensus forecast value (if the release is in the next 7 days).
5. Binary missing value indicator if no value for the next release consensus forecast value.
6. Binary missing value indicator if no value for the previous release.
7. Number of bars since the previous release.
8. Number of bars to the next release.

Chapter 12

Tutorial: Preparing the Commandline

This quick tutorial shows how to customise the Windows commandline window. We prefer a
larger default window size and a better-looking font. Follow these steps to set up your own
nicer-looking commandline window.

12.1 Step 1: Open the commandline

Assuming Windows 7, from the start menu, select All Programs → DeepThought →
DeepThought CommandLine.

Figure 12.1: Opening the DeepThought Commandline

12.2 Step 2: Open the defaults window

When the Windows commandline window appears, click on the top-right icon to drop down a
menu. Select Defaults from this menu.

Figure 12.2: Opening the Defaults Window

12.3 Step 3: Change the font

In the defaults window, first change the font to something nicer. We prefer Consolas. Click
the Font tab to access the font settings.

Figure 12.3: Changing the Font

12.4 Step 4: Change the default window size

You can also change the default window size by clicking the Layout tab.

Figure 12.4: Changing the Window Size/Layout


Now click Ok to save your changes. You will need to exit and re-open the commandline window
to see the changes.

Chapter 13

Tutorial: Backtesting in
DeepThought and MT4
This tutorial details the steps in running a backtest in DeepThought and optimising EA parameters using Metatrader's strategy tester.

13.1 Step 1: Edit the configuration

The first step is to use your favourite text editor to edit the XML configuration. We use the
freeware application Notepad++ available from http://notepad-plus-plus.org/. In this
tutorial we are using the sample configuration in ExampleConfigs\EURUSD Single GBT (named
config.xml). You can copy this directory as a starting point for your own experiments. This
particular configuration contains a single gradient boosted tree.


Figure 13.1: Editing the Configuration

13.2 Step 2: Start the DeepThought backtest

Open the DeepThought commandline from the start menu: select All Programs → DeepThought →
DeepThought CommandLine. Use the --backtest option as shown in figure 13.2.

Figure 13.2: Starting the Backtest
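For example, using the tutorial configuration directory above, the command is of the form (adjust the path if you copied the configuration elsewhere):

deepthought --backtest ExampleConfigs\EURUSD Single GBT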


After the backtest has completed the commandline window will look similar to figure 13.3


Figure 13.3: The Completed Backtest

13.3 Step 3: Copy files to Metatrader

The Metatrader EA for testing signals produced by DeepThought is located in C:\DeepThought\Metatrader EA. Copy this to the Metatrader experts folder. In this example we are using the broker InterbankFX, installed into a custom location C:\IBFX-MT4-AU.
Copy DeepThought Signal Tester.mq4 from C:\DeepThought\Metatrader EA to C:\IBFX-MT4-AU\experts.
Next we copy the recorded signals where Metatrader can read them. It is a limitation of
Metatrader that an EA can only read/write from a single directory. This directory changes
depending on whether the EA is being run on a chart or in the strategy tester. For our purpose
here we will be running the EA in the strategy tester, so we must copy ExampleConfigs\EURUSD Single GBT\recorded.signals.csv to C:\IBFX-MT4-AU\tester\files. Note that we
do not need to copy the DLL as the EA is only reading from recorded.signals.csv and not
generating any new signals.
If Metatrader is running, exit and restart it to load the EA.

13.4 Step 4: Modify the EA

At this stage we could make a copy of DeepThought Signal Tester.mq4 and include any custom
logic we wish, combining the DeepThought signals with any existing system. In this
example we'll use the EA as is and use Metatrader's strategy tester to optimise the take-profit
and stop-loss settings.

13.5 Step 5: Running Metatrader Strategy Tester

If the Strategy Tester is not visible in Metatrader, open it by selecting View → Strategy
Tester. Select the DeepThought signal tester EA and currency. Set the timeframe to M1 for


the most accurate results, as shown in figure 13.4. The EA name and currency may be different
if you have renamed or modified the EA or are testing a different currency pair. It is best just
to run the EA first to see that everything works. Check that the strategy tester Report tab shows
a Total Trades figure roughly the same as the number of signals in the recorded.signals.csv
file. In this example we made roughly 34% profit over the year 2013 with a 14% drawdown,
with starting capital of $10,000 and a lot size of 1.

Figure 13.4: Metatrader tester setup

Figure 13.5: The Completed Backtest

13.6 Step 6: Optimisation with MT Strategy Tester

The final step is to see if we can improve on the raw results above. In your EA you may have
other parameters related to trading logic you've added. In this example we'll only optimise the
take-profit and stop-loss settings.
Click Expert Properties and tick the Genetic Algorithm check box in the Testing tab as
shown in figure 13.6.


Figure 13.6: Enabling the Genetic Optimisation in Metatrader


Click on the Inputs tab and set the stop-loss and take-profit entries as shown in figure
13.7. Here we are optimising both stop-loss and take-profit to take values between 10 and
120 pips, with increments of 10 pips. Note that the EA takes these values as changes in actual
price rather than pips, so 120 pips is entered as 0.0120.

Figure 13.7: Selecting which Parameters to Optimise


Click Ok to save and close the window. Make sure the Optimisation option in the strategy
tester window is checked as shown in figure 13.8, then click Start to begin the optimisation.
This may take a while depending on what options have been set. You can watch the progress
by clicking on the Optimisation Results tab.


Figure 13.8: Enabling Optimisation in the Strategy Tester

13.7 Step 7: Analyse the Results

After the Metatrader optimiser has finished we should have a list of results similar to the results
shown below.

Figure 13.9: List of the Best Results of the Metatrader Optimiser


Here we see that a take-profit of 0.0120 and a stop-loss of 0.0110 (representing 120 and
110 pips respectively) produce the best results in terms of profit. Double click on this line and
the values will be automatically loaded into the tester. Click Start to start the test with these
values.
Below we can see that we have moderately increased the profit from 34% to 37% with around
the same drawdown.

Figure 13.10: Report of the Optimum Settings


Figure 13.11: Graph of a Test With Optimum Settings


Appendix A

Sample Configuration
The configuration below uses an ensemble of 10 SVM predictors. Many more are possible, but
only 10 are shown for brevity. These were selected after performing an SVM parameter grid
search. The features are generated from EURUSD on an H4 timeframe. The forecast is for
the close of the next H4 candle. Historical data is stored in the database eurusd.db located
in C:\FX Database. The features used are hour-of-day, the previous 80 average-close price
differences, and 16 differences of moving average with periods 5, 10, 20, 50, 100. The differences
are spaced out over 100 candles as defined by the selection-list. The scaling is min-max
for all features. A genetic-algo section is present and will be ignored during backtest, live and
paper trading. You can add XML comments, and extra XML sections are ignored.

<!-- Multiple SVM example. -->


<config>
<bar-series>
<identifier>EURUSDm1</identifier>
<bar-series-type>const-time</bar-series-type>
<source type="database">eurusd.db</source>
<price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
<average-spread>0.0</average-spread>
<bar-duration-minutes>1</bar-duration-minutes>
<const-bar-price>0.0</const-bar-price>
</bar-series>
<bar-series>
<identifier>EURUSDh4</identifier>
<bar-series-type>const-time</bar-series-type>
<source type="bar-series">EURUSDm1</source>
<history-source-in>EURUSDm1</history-source-in>
<history-source-type>bar-series</history-source-type>
<price-to-pip-multiplier>10000.0</price-to-pip-multiplier>
<average-spread>0.0</average-spread>
<bar-duration-minutes>240</bar-duration-minutes>
<delay-minutes-offset>0</delay-minutes-offset>
</bar-series>
<bar-series-collection>
<data-file-dir>C:\FX_Database</data-file-dir>
</bar-series-collection>
<model>
<identifier>h4-features</identifier>
<target>
<type>bars-in-future</type>
<identifier>target-1-bar-in-future</identifier>
<bar-series>EURUSDh4</bar-series>
<number>1</number>
<price-type>up-down</price-type>
</target>
<feature>
<type>hour-of-day</type>
<period ga-subst="time-of-day">h4</period>
</feature>
<feature>




<type>bar-attribute</type>
<attribute-type>average-close</attribute-type>
<value-type>diff</value-type>
<bar-series>EURUSDh4</bar-series>
<number>80</number>
<scale-type>min-max</scale-type>
<outlier-percentile>1</outlier-percentile>
</feature>
<feature>
<type>moving-average</type>
<ma-attribute-type>average-close</ma-attribute-type>
<period>5</period>
<selection-list>0,1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
<value-type>diff-with-price</value-type>
<bar-series>EURUSDh4</bar-series>
<scale-type>min-max</scale-type>
<outlier-percentile>1</outlier-percentile>
</feature>
<feature>
<type>moving-average</type>
<ma-attribute-type>average-close</ma-attribute-type>
<period>10</period>
<selection-list>0,1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
<value-type>diff-with-price</value-type>
<bar-series>EURUSDh4</bar-series>
<scale-type>min-max</scale-type>
<outlier-percentile>1</outlier-percentile>
</feature>
<feature>
<type>moving-average</type>
<ma-attribute-type>average-close</ma-attribute-type>
<period>21</period>
<selection-list>0,1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
<value-type>diff-with-price</value-type>
<bar-series>EURUSDh4</bar-series>
<scale-type>min-max</scale-type>
<outlier-percentile>1</outlier-percentile>
</feature>
<feature>
<type>moving-average</type>
<ma-attribute-type>average-close</ma-attribute-type>
<period>50</period>
<selection-list>0,1,2,3,4,5,7,9,13,16,20,25,31,45,55,70,100</selection-list>
<value-type>diff-with-price</value-type>
<bar-series>EURUSDh4</bar-series>
<scale-type>min-max</scale-type>
<outlier-percentile>1</outlier-percentile>
</feature>
</model>
<fundamentals-db>
<data-file-dir>c:\fx_database</data-file-dir>
<db-file>fundamentals.db</db-file>
</fundamentals-db>
<svm-predictor>
<identifier>svm-c-rbf</identifier>
<model>h4-features</model>
<continuous-tune>false</continuous-tune>
<params> <!-- 53.1% -->
<penalty>128</penalty>
<gamma>0.125</gamma>
<forecast-weight ga-subst="w1">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 52.8% -->
<penalty>2048</penalty>
<gamma>0.125</gamma>
<forecast-weight ga-subst="w2">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 52.7% -->
<penalty>32768</penalty>


<gamma>0.000488281</gamma>
<forecast-weight ga-subst="w3">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 52.5% -->
<penalty>128</penalty>
<gamma>0.0078125</gamma>
<forecast-weight ga-subst="w4">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 52.4% -->
<penalty>32768</penalty>
<gamma>0.125</gamma>
<forecast-weight ga-subst="w5">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 51.7% -->
<penalty>2048</penalty>
<gamma>0.0078125</gamma>
<forecast-weight ga-subst="w6">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 51.3% -->
<penalty>0.5</penalty>
<gamma>0.0078125</gamma>
<forecast-weight ga-subst="w7">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 51.3% -->
<penalty>32768</penalty>
<gamma>0.0078125</gamma>
<forecast-weight ga-subst="w8">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 50.7% -->
<penalty>0.03125</penalty>
<gamma>3.05176e-005</gamma>
<forecast-weight ga-subst="w9">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<params> <!-- 50.7% -->
<penalty>0.5</penalty>
<gamma>3.05176e-005</gamma>
<forecast-weight ga-subst="w10">1.0</forecast-weight>
<svm-type>SVC</svm-type>
<kernel>rbf</kernel>
</params>
<num-training-observations>800</num-training-observations>
<num-training-skip>1</num-training-skip>
</svm-predictor>
<predictor-ensemble>
<retrain-period>weekly</retrain-period> <!-- each-bar | daily | weekly | monthly -->
</predictor-ensemble>
<signal-generator>
<entry-times>
<hour>all</hour>
<day-of-week>all</day-of-week>
</entry-times>
<target-trigger>h4-features</target-trigger>
<entry-threshold>0.0</entry-threshold>
<take-profit>0.0</take-profit>
<stop-loss>0.0</stop-loss>
<break-even>0.0</break-even>
<trade-bar-series>EURUSDm1</trade-bar-series>
<reverse-signals>False</reverse-signals>
</signal-generator>


<trader>
<hold-minutes>0</hold-minutes>
<hold-bars>0</hold-bars>
<max-drawdown>100000</max-drawdown>
<close-at-weekend>False</close-at-weekend>
<scale-out>False</scale-out>
<max-position>100</max-position>
<direction>both</direction>
<limit-orders offset="0.0">False</limit-orders>
</trader>
<backtest>
<start-date>2013-01-01</start-date>
<stop-date>2014-03-30</stop-date>
<use-recorded-signals>False</use-recorded-signals>
<display-progress>True</display-progress>
<execute-when-complete>python "C:\DeepThought\python\analyse_backtest_results.py" %CONFIG_LOCATION%</execute-when-complete>
</backtest>
<genetic-algo>
<ga-server>tcp://localhost</ga-server>
<ga-server-port>55566</ga-server-port>
<genome-id>-1</genome-id>
<timeout-minutes>360</timeout-minutes>
<population-size>20</population-size>
<mutation-probability>10</mutation-probability>
<num-breeders-percent>30</num-breeders-percent>
<min-num-breeders>30</min-num-breeders>
<num-new-random-genomes>2</num-new-random-genomes>
<num-generations>10</num-generations>
<parameter id="w1" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w2" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w3" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w4" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w5" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w6" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w7" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w8" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w9" type="categorical" values="-1.0,0.0,1.0" />
<parameter id="w10" type="categorical" values="-1.0,0.0,1.0" />
</genetic-algo>
</config>

Appendix B

Condor Setup and Operation


The purpose of this chapter is to enable you to install and configure Condor. Condor is capable
of much more than is covered here; the details below are only for the purpose of running the
genetic algorithm in DeepThought.

B.1 Installation

The following screen shots in figures B.1 to B.9 show the installation steps to install Condor on
a single computer.

Figure B.1: Condor Setup 1


Figure B.2: Condor Setup 2

Figure B.3: Condor Setup 3


Figure B.4: Condor Setup 4

Figure B.5: Condor Setup 5


Figure B.6: Condor Setup 6

Figure B.7: Condor Setup 7


Figure B.8: Condor Setup 8

Figure B.9: Condor Setup 9

B.1.1 Adding a Condor User

C:\>condor_store_cred add -u myusername@slartibartfast


Account: myusername@slartibartfast
Enter password:
Operation succeeded.

B.2 Useful Commands

B.2.1 condor_status

The condor_status command shows the status of all nodes (cores). The example below shows
the status when no jobs are running:
C:\>condor_status
Name                OpSys    Arch    State      Activity  LoadAv  Mem   ActvtyTime

slot1@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:23
slot2@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.130   1015  0+00:00:05
slot3@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:25
slot4@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:26
slot5@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:27
slot6@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:28
slot7@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:29
slot8@Slartibartfa  WINDOWS  X86_64  Unclaimed  Idle      0.000   1015  0+00:00:22

                Total  Owner  Claimed  Unclaimed  Matched  Preempting  Backfill
X86_64/WINDOWS      8      0        0          8        0           0         0
         Total      8      0        0          8        0           0         0

C:\>

B.2.2 condor_q

The condor_q command lists the jobs that are in the Condor queue. The example below shows
the queue when no jobs are running:
C:\>condor_q
-- Submitter: Slartibartfast : <192.168.1.110:1072> : Slartibartfast
 ID    OWNER       SUBMITTED   RUN_TIME   ST PRI SIZE CMD
0 jobs; 0 completed, 0 removed, 0 idle, 0 running, 0 held, 0 suspended
C:\>

The example below shows jobs in three states:


C:\>condor_q
-- Submitter: Slartibartfast : <192.168.1.110:1072> : Slartibartfast
 ID    OWNER       SUBMITTED   RUN_TIME   ST PRI SIZE CMD
 5.0   myusername  1/2 23:30   0+00:00:02 H  0   3.2  DeepThought.exe
 5.1   myusername  1/2 23:30   0+00:00:03 H  0   9.8  DeepThought.exe
 5.2   myusername  1/2 23:30   0+00:00:02 H  0   3.2  DeepThought.exe
 5.3   myusername  1/2 23:30   0+00:00:03 H  0   3.2  DeepThought.exe
 5.4   myusername  1/2 23:30   0+00:00:03 H  0   7.3  DeepThought.exe
 5.5   myusername  1/2 23:30   0+00:00:03 H  0   3.2  DeepThought.exe
 5.6   myusername  1/2 23:30   0+00:00:02 H  0   3.2  DeepThought.exe
 5.7   myusername  1/2 23:30   0+00:00:02 H  0   3.2  DeepThought.exe
 5.8   myusername  1/2 23:30   0+00:00:05 R  0   3.2  DeepThought.exe
 5.9   myusername  1/2 23:30   0+00:00:05 R  0   3.2  DeepThought.exe
 5.10  myusername  1/2 23:30   0+00:00:04 R  0   3.2  DeepThought.exe
 5.11  myusername  1/2 23:30   0+00:00:04 R  0   3.2  DeepThought.exe
 5.12  myusername  1/2 23:30   0+00:00:04 R  0   3.2  DeepThought.exe
 5.13  myusername  1/2 23:30   0+00:00:04 R  0   3.2  DeepThought.exe
 5.14  myusername  1/2 23:30   0+00:00:04 R  0   3.2  DeepThought.exe
 5.15  myusername  1/2 23:30   0+00:00:04 R  0   3.2  DeepThought.exe
 5.16  myusername  1/2 23:30   0+00:00:00 I  0   3.2  DeepThought.exe
 5.17  myusername  1/2 23:30   0+00:00:00 I  0   3.2  DeepThought.exe
 5.18  myusername  1/2 23:30   0+00:00:00 I  0   3.2  DeepThought.exe
 5.19  myusername  1/2 23:30   0+00:00:00 I  0   3.2  DeepThought.exe
20 jobs; 0 completed, 0 removed, 4 idle, 8 running, 8 held, 0 suspended
C:\>

The job state is shown in the column headed ST. The possible states are listed in Table B.1.

H   Held      Something went wrong with the job and it could not complete
              properly. Jobs in this state should normally disappear after a
              minute or two.
I   Idle      The job is waiting to be run. In the above example we are running
              the genetic algorithm with a population size of 20, therefore 20
              jobs are created per generation. However, we are only running
              Condor on a single machine with 8 cores, so when the algorithm is
              first run, 8 jobs should be in the R state and 12 in the I state,
              waiting for a slot to free.
R   Running   The job is running normally.
X   Stuck     Some jobs do not clear after being in the H state due to some
              quirk in Condor. You can manually remove them by adding the
              -forcex option to the condor_rm command as detailed below.

Table B.1: The Job States in Condor.
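When jobs sit in the H state, Condor records a hold reason that can help diagnose the problem. The standard condor_q options -hold (list held jobs with their hold reasons) and -analyze (explain why a particular job is not running) are useful here; the job number below is illustrative:

C:\>condor_q -hold
C:\>condor_q -analyze 5.0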

B.2.3 condor_rm

Sometimes you may wish to stop all jobs on the cluster, for example because the wrong configuration was submitted or to clear jobs in the held state (they sometimes appear to get stuck). Use the condor_rm command to do this. For a single user, type the following:

C:\>condor_rm myusername
All jobs of user "myusername" have been marked for removal

To remove jobs for all users, type the following:

C:\>condor_rm -all
All jobs have been marked for removal

When jobs are in the X state, they are generally stuck. Add the -forcex option to the condor_rm commands above.
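condor_rm can also target a single cluster or a single job rather than everything belonging to a user. These are standard Condor options rather than anything specific to DeepThought; the cluster and job numbers below are illustrative, and the confirmation messages are similar to those shown above:

C:\>condor_rm 5
C:\>condor_rm 5.3
C:\>condor_rm -forcex 5.3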

Index

Attribute, 5
Backtest, 73
    Options, 73
Backtesting, 11
Bar Series, 35
    Options, 40
Candle
    Duration, 36
Candles, 35
Cluster
    Condor, 14
Commandline
    Defaults, 94
Condor
    Installation, 108
    Useful Commands, 113
Condor Cluster, 8
Configuration, 11, 35
    Example, 104
    XML Reference, 84
Data
    Importing from Dukascopy, 4
    Importing from histdata.com, 4
    Importing from Metatrader, 4
Database, 36
    location, 40
Ensemble, 8, 36, 68
Ensembles
    Bucket of Models, 59
Extremely Randomised Trees, 7, 66
Feature, 5
    Bar Attribute, 47
    Binarised, 10, 41
    Categorical, 5, 9
    Configuration, 42
    Continuous, 5, 9, 41
    CSV Input, 51
    CSV Input Options, 52
    Day of Week, 44
    Fundamental Indicator, 93
    Hour of Day, 43
    Moving Average Options, 50
    Normalisation, 9
    Price Difference, 45, 46
    Selection List, 45
Feature Vector, 5
Genetic Algorithm, 14, 36
    Configuration, 74
    Options, 75
    Parameter Types, 16
    Parameters, 76
Gradient Boosted Trees, 7, 64
    Options, 64
    Parameter Grid Search, 82
Label, 5, 55
Linear Support Vector Machine, 36
    Options, 63
Live Trading, 20
Manual Trading, 20, 80
Metatrader, 3
    EA Options, 22
    Importing Data From, 4
Model, 5, 41
Multi-layer Perceptron, 8, 66
Multi-layer Perceptron Options, 67
Multicore Processing, 8
Neural Network, 8, 66
Neural Network Options, 67
Normalisation, 9
Output Files, 13
Paper Trading, 13
Pip, 35
    Multiplier, 35
Predictor, 5, 8, 36, 58
    Ensemble, 68
    GBT Options, 64
    GBT Parameter Options, 64
    Label, 55
    Linear SVM Options, 63
    Linear SVM Parameter Options, 63
    Random Forest Options, 65
    Random Forest Parameter Options, 65
    SVM Options, 60
    SVM Parameter Options, 61
Python
    DeepThought Interface, 33
    Feature, 25, 53
    Moving Average Cross, 30
    numpy, 28
    Pandas, 27
    Predictor, 30, 68
    Signal Generator, 32, 70
    Target, 28, 56
Random Forest, 7, 65
    Options, 65
Recording Signals, 12
Renko Bars, 38
Signal Generator, 32, 70
    Options, 70
Support Vector Machine, 6, 8, 36, 58
    Bucket of Models, 59
    Hyper Parameters, 8
    Kernel Functions, 61
    Linear, 62
    Options, 60
    Parameter Grid Search, 81
Target, 55
    Bars in the Future Options, 55
    Future Price Change, 55
    Python Script, 56
    Python Script Options, 57
Trader, 72
    Options, 72
Training, 5
