Year            Invention
2,000,000 BC    Stone tools
400,000 BC      Use of fire
10,000 BC       Agriculture
5,000 BC        Metalworking
4,000 BC        Writing
3,000 BC        Cities
3,000 BC        The wheel
1440            Printing
1765            Steam engines
1800            Electricity
1879
1885            Automobile
1903            Airplanes
1926            Television
1928            Penicillin
1944            Electronic computer
1951
1961            Space travel
1979            Wireless phone
1981            Personal computers
1983            The Internet
2000            Mobile computing
1. The exponentially rising trend in both can be explained by the fact that every
invention facilitated new discoveries and inventions. An interesting digression
is to ask ourselves: will this trend continue?
2. We see an evolving desire for functionality as society evolves through these
breakthroughs. With the progression of time, these breakthroughs are solving more
complex problems.
The first observation challenges entrepreneurs and businesses more than ever to stay
abreast of new technologies to find and maintain their place in the market amidst
frequent technological disruptions. The second observation suggests that technology
is heading towards intelligent machines. This motivates a discussion on AI.
Science or Fiction?
An artificially intelligent machine exhibits intelligent behavior. In computer science,
the Turing test is a commonly used criterion for intelligent behavior. The test says that
a machine's behavior is considered intelligent if a blinded human evaluator is unable
to distinguish its performance from that of a human. State-of-the-art AI applications
successfully pass the Turing test when this definition is applied to a specific task.
Artificial General Intelligence (a.k.a. strong AI or full AI) refers to hypothetical
machines that can think like humans and perform with full autonomy. These would
pass the Turing test without the need to constrain its definition to a specific task.
However, AGI does not appear achievable in the foreseeable future with current
technologies. It is still in its infancy and currently caters to the interest of researchers,
science fiction writers and futurists.
This trend appears exciting, but its extrapolation into the shaded area might not be
realistic. The industry is facing greater challenges than ever to sustain growth at this
rate. So far, the industry has relied on technology scaling, i.e., the miniaturization of
transistors. However, in the early 2010s we reached a point where quantum effects and
the wavelength of the lithography light source began limiting the practical extent of
miniaturization. Quantum effects result in uncertain electrical charge distribution when
separation structures are made too small. The wavelength limitation causes diffraction
during photolithography, making photolithography masks ineffective at smaller feature
sizes. This has been mitigated to an extent by techniques such as immersion
lithography and optical proximity correction. To keep up with the trend, the
microprocessor industry is currently using and developing alternative approaches.
Clearly, computer scientists and engineers have been dealing with difficulties
creatively. The momentum of advancements, it seems, will continue to support the
growing computation requirements of AI development for the foreseeable future.
As illustrated above [3], with the growing use of digital technology and Internet
connectivity, the amount of electronic information available to humankind is trending
similarly to Moore's Law. Starting with innovations in web search in the late 1990s, the
science of storing and processing large-scale data has been rapidly evolving
under the term Big Data.
The recent Internet of Things (IoT) approach to product design takes data collection
a step further. In this approach, products are connected to a cloud-hosted backend
through the Internet, extending the reach of businesses to consumers.
Businesses can provide new services while collecting usage data for adapting their
services to consumer behavior. Smartphone apps have been doing the same by
delivering functionality through interactive interfaces instead of physical products.
The phenomenon of massive data growth is enabled by advances in storage media.
Throughout most of computing history, we have stored data on magnetic
hard disk drives (HDDs). The amount of storage available at a given price point has been
rising exponentially, similar to Moore's Law. By the late 2000s, solid-state drives (SSDs),
a Flash-memory-based storage technology, became a serious contender to the magnetic
hard disk market. SSDs are being widely adopted by consumers and data centers because
they not only improve performance but also reduce energy, cooling and
space requirements [4]. SSDs' performance benefits are enabling more low-latency
and high-throughput data processing applications.
Another promising storage technology, Phase Change Memory (PCM), in development
since the 1970s, was commercially introduced in 2015 by Intel and Micron
under the 3D XPoint trademark. The engineering samples released in 2016 showed a 2.4 to
3 times speedup compared to a modern SSD [5]. PCM not only packs more storage
but could offer a new level of performance scalability. At a certain point, it could become
possible to unify main memory and storage memory in computers, resulting in
computers that persist state in the absence of power. Among other benefits, this could
yield large energy savings for cloud infrastructures.
The implication of these continued advances in data storage is that AI algorithms are
being exposed to data about human expressions and processes at an increasing rate.
3. The Forefront of AI
AI is the study of how to make computers do things at which, at the moment,
humans are better [7]. As machines become increasingly capable, capabilities once
thought to require intelligence are removed from the definition. For example, optical
character recognition is no longer perceived as an exemplar of "artificial intelligence",
having become a routine technology. [8]
Intelligent Agents
AI literature frequently deals with the term "intelligent agent". An intelligent agent is
an abstract entity that acts on a human's behalf to maximally achieve a given goal
at minimum cost. Cognitive tasks such as planning, prediction, pattern or anomaly
detection, visual recognition and natural language processing can all be goals for
an intelligent agent.
sq. ft.    Price ($)
1000       150,200
2000       225,500
3000       451,800
4000       684,500

[Figure: scatter plot of Price ($) against sq. ft. for the table above]
Step 2 - Learn from data: There are many model frameworks to choose from. A
model is picked by the designer based on their judgement of its suitability to the problem.
That effectively introduces a prior. In this case, we use a linear model because we
know for a fact that housing prices increase somewhat linearly with land area.

y = w x

This simple equation says that the house price y is w times the land area x. It may be
noted that the equation represents a line with a slope determined by w.
Now a training algorithm systematically picks a value for w such that the equation is
maximally consistent with the data. Intuitively, it is fitting the line to the data. In this
example, the algorithm could choose a value of 150.
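The fit described above can be sketched with ordinary least squares through the origin. The data comes from the housing table above; the closed-form slope sum(x*y)/sum(x*x) is a standard result, and the exact fitted value depends on the loss and the data, so it need not come out at exactly 150:

```python
# Least-squares fit of price = w * area, with no intercept.
# Data from the housing table above (sq. ft. -> price in $).
areas = [1000, 2000, 3000, 4000]
prices = [150_200, 225_500, 451_800, 684_500]

# Closed-form solution minimizing the sum of squared errors:
# w = sum(x*y) / sum(x*x)
w = sum(x * y for x, y in zip(areas, prices)) / sum(x * x for x in areas)

print(round(w, 2))   # fitted slope, about 156.5 $ per sq. ft.
```

With a squared-error loss the slope lands near 156 rather than 150; a different loss or more data would move it.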
The challenge is to tune the model in such a way that only the necessary detail is
captured by the learning model. Ensemble methods and regularization are
widely used approaches to this problem.
Feature Engineering
From the discussion of the machine learner vs. statistician approaches, it seems both have
their strengths and weaknesses. In such situations engineers ask the golden question:
why not both?
One way to combine the best of both is to use the statistical approach and domain expertise
to understand the properties of the data and transform it into a representation that
augments machine learning. This process is called feature engineering. However, it is
manual, expensive and can be ineffective if the problem is complex enough. As we will
see later in the discussion of deep learning, there is an automated approach for
generating representations.
Wall Street analysts' consensus earnings estimates are used by the market to judge
the stock performance of a company. Investors seek a sound estimate of this year's and
next year's earnings per share (EPS), as well as a strong sense of how much the
company will earn even farther down the road [12]. The approach published by
Bloomberg is as follows:
Step 1: Acquire data
As always, we start by acquiring a dataset containing signals that could indicate the
outcome. They acquired the following data for 39 tickers:
1) Daily stock data (OLCV) for 2000-2014 from Yahoo! Finance
2) Corresponding actual and predicted earnings from Estimize and Zacks
Investment Research respectively.
From these two, a combined dataset was prepared, one file per ticker;
what's obtained is time-series data.
Step 2: Feature Engineering
They aggregated rows for each quarter and calculated the following features:
Feature Name     Description
yr               Year, as-is.
qtr              Quarter, as-is.
up_day           Count the up-days in the quarter; if the ratio of up-days to the
                 total number of days is > 50%, set the feature to 1, else 0.
p_over_20
p_over_10_ema
p_mom_1
v_mom_1
target
The corresponding feature-engineered dataset is an abstract representation of the
combined data: significantly reduced in size, losing most information but retaining
only the information assumed by the domain expert to be important. Note that it
required human expertise to derive these features.
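As a concrete sketch of the up_day feature described above (the input format here is assumed for illustration; the source derives it from the daily rows of the combined dataset):

```python
def up_day_feature(closes):
    """1 if more than 50% of the day-over-day moves in the quarter are up, else 0.

    `closes` is a list of daily closing prices for one quarter
    (input format assumed for illustration).
    """
    ups = sum(1 for prev, cur in zip(closes, closes[1:]) if cur > prev)
    total = len(closes) - 1  # number of day-over-day moves
    return 1 if total > 0 and ups / total > 0.5 else 0

# Example: three of the four daily moves are up, so the feature is 1.
print(up_day_feature([10.0, 11.0, 12.0, 11.5, 13.0]))  # -> 1
```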
Step 3: Learn from Data
For this classification problem they applied logistic regression, decision trees and
random forests (an ensemble of varied decision trees).
[Confusion matrices (Actual No/Yes vs. Pred. No/Yes) for the three models]

                  Logistic Regression   Decision Tree   Random Forest
ACCURACY %        65.71                 62.2            68.03
PRECISION %       54                    54              61
RECALL %          66                    62              68
F1 %              53                    53              56
Conclusion: Quoting the author: "It's a work in progress, but the best model had a
recall of 68% and precision of 61%, which is above the 50% mark that is equivalent to
randomly guessing. The models built can be improved by including more stocks and
getting data over a longer period of time, while adding parameter search and cross
validation to the process."
In the next topic, we will attempt to improve this model by using a deep learning
approach.
4. Deep Learning
In the discussion of feature engineering in the previous topic, the importance of data
representation was emphasized. Deep Learning is a machine learning paradigm that
learns multiple levels of data representation, where each level of representation is
more abstract than the previous one. It has dramatically improved the state-of-the-art
in speech recognition, visual object recognition, object detection and many other
domains such as drug discovery and genomics. [10] Deep learning is also applicable
in finance wherever improved machine learning performance can be an advantage.
It began in the 1950s when the Nobel laureates Hubel and Wiesel accidentally noticed
neuron activity in the visual cortex of a cat as they moved a bright line across its
retina. During these recordings, they made interesting observations: (1) the neurons
fired only when the line was in a specific place on the retina, (2) the activity of these
neurons changed with the orientation of the line, and (3) sometimes the neurons fired only
when the line was moving in a particular direction. [13] Through a series of experiments
they found that there is a hierarchy of pattern filters, with increasing levels of
abstraction, across the visual cortex. This eventually revealed the process of visual
perception in the brain. A simplified form of this model is illustrated below.
The image is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.
It is attributed to Randall C. O'Reilly and original work can be found at [14]
Deep Learning borrows two important aspects of the visual perception model:
1) Representation Learning Along Depth: As illustrated in the figure above, the
first set of neuron layers, V1, learns elementary features from the raw image
coming from the retina. The second set, V2, learns a more abstract representation
of the features generated by V1. As the model layers go deeper, the learned
representations become increasingly abstract.
A Brief History
Since 1943, many ANN designs have been published, but the 1986 model that used the
backpropagation training algorithm was the first ANN model that deep learning
borrows from. Another model, an unsupervised learning model called the
Neocognitron [18], was published in 1980. It laid the foundation for the now widely
used deep model called the Convolutional Neural Network (CNN), which Yann LeCun et al.
trained in 1989 using backpropagation for handwritten digit recognition in US postal
mail. [19] CNNs are more efficient for image recognition as they take advantage of
spatial properties in image data.
When backpropagation was first introduced its most exciting use was for training
recurrent neural networks (RNNs). [20] RNNs are suitable for speech, language and
other sequential data. RNNs process an input sequence one element at a time,
maintaining in their hidden units a state vector that implicitly contains information
about the history of all the past elements of the sequence. [10]
Researchers had difficulty during the 1990s in training RNNs due to the vanishing
gradient problem, which worsens with recursion. This problem arises when weights
(the model parameters being learned) are small, so that repeated multiplication during
training drives the gradients toward zero. [21]
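The mechanism can be seen with plain arithmetic: a gradient that is scaled by a factor below 1 at each time step shrinks geometrically with the sequence length. The factor 0.9 and the 100 steps below are illustrative numbers, not from the source:

```python
# Toy illustration of the vanishing gradient problem:
# backpropagating through T time steps multiplies the gradient
# by roughly the same factor at each step.
factor = 0.9      # illustrative per-step scaling (weight < 1)
steps = 100       # illustrative sequence length

gradient = 1.0
for _ in range(steps):
    gradient *= factor

print(gradient)   # about 2.66e-05: effectively zero for learning
```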
For RNNs, one workaround was to use a history compression method proposed by
Jürgen Schmidhuber in 1992. [22] Another was to use a gating mechanism to
retain information indefinitely if required. These models were called Long Short-Term
Memory (LSTM). [23]
A CNN-based model broke image classification error records in the 2012 ImageNet
competition, which was another major breakthrough, attracting significant research
interest. [24] The Google Trends data shows how interest in deep learning has been
trending in the general public since 2012, pushing machine learning along with it.
LSTM confusion matrix:

              Pred. No   Pred. Yes
Actual No     2167       16759
Actual Yes    969        37424

Random Forest confusion matrix (Actual Yes row, carried over from the previous topic):

              Pred. No   Pred. Yes
Actual Yes    285        613

              LSTM     Random Forest
ACCURACY %    69.07    68.03
PRECISION %   68.07    61
RECALL %      97.48    68
F1 %          80.85    56
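The headline metrics can be recomputed from the LSTM confusion matrix cells as a quick consistency check, using the standard definitions of accuracy, recall and F1:

```python
# Cells of the LSTM confusion matrix above.
tn, fp = 2167, 16759   # Actual No:  Pred. No, Pred. Yes
fn, tp = 969, 37424    # Actual Yes: Pred. No, Pred. Yes

total = tn + fp + fn + tp
accuracy = (tp + tn) / total
precision = tp / (tp + fp)
recall = tp / (tp + fn)
f1 = 2 * precision * recall / (precision + recall)

print(round(accuracy * 100, 2))   # 69.07
print(round(recall * 100, 2))     # 97.48
print(round(f1 * 100, 2))         # 80.85
```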
Though a comparable result would have sufficed to make the point, a higher performance
was obtained. Notice that the confusion matrix has larger values because the time-series
was not aggregated as in the feature engineering approach.
Before the LSTM model used here is described, we first describe how an RNN works in
general and how the LSTM cell improves on it.
Source: [25]
An RNN is a specialized neural network architecture that learns patterns in sequential data.
When applied to the time domain, it models dynamic systems. The left side of the figure
above shows that the RNN cell A is a function of the input sample x_t at time t and the
cell state at the previous time step. The recursions are shown unfolded on the right-hand
side of the figure. h_t is the output for the next (hidden) layer and is a non-linear
function of the cell state. This means that the output of an RNN is not just a function of
its input, but a function of the input as well as the cell's history, allowing it to respond
to a trend instead of just the absolute value of the input as a plain ANN would.
The problem of vanishing gradients with RNNs was mentioned before. Its effect is that
the model is not capable of learning dependencies that are distant enough in the
sequence. To understand this, consider the example shown below:
Source: [25]
Here the dependency means that the cell state and output h_{t+1} at time t+1 depend
on the inputs at times 0 and 1. The vanishing gradient problem says that
such dependencies cannot be learnt effectively if the dependency distance t is large
enough.
The Long Short-Term Memory (LSTM) resolved this problem. Consider another
example:
In this language model, predicting the word "French" depends strongly on the word
"France", which came three words before. Here an RNN can work well. But in a case where
there may be paragraphs between these two parts, the vanishing gradient problem
makes it difficult for an RNN to learn the dependency. This is where the LSTM shines,
through its gating mechanism. An LSTM cell looks like the following:
Source: [26]
x_t is the input vector at time t, c_t is the cell-state vector, and o_t is the output-gate
activation vector. The output gate modulates, through a multiplier, how much of the
cell state propagates to the hidden layer. It was mentioned before that an RNN responds
to two things: 1) the current input vector and 2) the past cell-state vector. The input gate
i_t modulates, through a multiplier, how much of the current input vector is given
weightage in the learning process. The forget gate f_t modulates how much of the
previous cell-state vector is given weightage in the learning process. The activations
of these gates together allow the cell to persist and forget long- and short-term
dependencies. Note that the gate activation values lie in the range
0-1, as they are outputs of sigmoid functions.
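The gating arithmetic described above can be sketched directly. This is a minimal single-step LSTM cell in NumPy; the dimensions and random weights are illustrative, not the model from the text:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, U, b):
    """One LSTM time step. W, U, b hold the input-gate, forget-gate,
    output-gate and candidate parameters stacked along the first axis."""
    W_i, W_f, W_o, W_c = W
    U_i, U_f, U_o, U_c = U
    b_i, b_f, b_o, b_c = b
    i = sigmoid(W_i @ x + U_i @ h_prev + b_i)        # input gate: weigh current input
    f = sigmoid(W_f @ x + U_f @ h_prev + b_f)        # forget gate: weigh previous cell state
    o = sigmoid(W_o @ x + U_o @ h_prev + b_o)        # output gate: weigh exposed state
    c_tilde = np.tanh(W_c @ x + U_c @ h_prev + b_c)  # candidate cell state
    c = f * c_prev + i * c_tilde                     # persist and/or forget
    h = o * np.tanh(c)                               # output to the next layer
    return h, c

# Illustrative sizes: 3 inputs, 4 hidden units.
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 4, 3))
U = rng.normal(size=(4, 4, 4))
b = np.zeros((4, 4))
h, c = lstm_step(rng.normal(size=3), np.zeros(4), np.zeros(4), W, U, b)
print(h.shape)   # (4,)
```

Because the gate activations are sigmoid outputs in (0, 1) and the cell state passes through tanh, the hidden state h stays bounded in (-1, 1).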
This prepares us to describe the model used in the demonstration, shown below:
The LSTM layer, shown in dark, has 200 units. The representation of the input data is
the vector h. This is then given as input to logistic regression, a binary classifier. The
output of the logistic regression gives the probability of the company beating the
consensus estimate at time t. The program may be found at [27].
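A minimal NumPy sketch of this architecture: an LSTM layer whose final hidden state feeds a logistic-regression readout. The 200-unit width is from the text; everything else, including the random weights, input dimension and sequence length, is illustrative:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

H, D, T = 200, 8, 12   # hidden units (from the text); input dim and length are illustrative
rng = np.random.default_rng(1)

# LSTM parameters, stacked as one matrix over [x; h]: rows are the i, f, o, c~ blocks.
W = rng.normal(scale=0.1, size=(4 * H, D + H))
b = np.zeros(4 * H)

# Logistic-regression readout on the final hidden state.
w_out = rng.normal(scale=0.1, size=H)
b_out = 0.0

def predict_beat_probability(sequence):
    """sequence: array of shape (T, D) holding the quarterly time-series features."""
    h = np.zeros(H)
    c = np.zeros(H)
    for x in sequence:
        z = W @ np.concatenate([x, h]) + b
        i, f, o, g = np.split(z, 4)
        i, f, o = sigmoid(i), sigmoid(f), sigmoid(o)
        c = f * c + i * np.tanh(g)      # cell-state update
        h = o * np.tanh(c)              # hidden state -> representation vector h
    return sigmoid(w_out @ h + b_out)   # P(company beats the consensus estimate)

p = predict_beat_probability(rng.normal(size=(T, D)))
print(0.0 < p < 1.0)   # True
```

In practice the weights would be trained end to end with backpropagation through time rather than drawn at random; the sketch only shows how the pieces connect.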
2. Given that the laws of nature are captured by simple physics-based
functions whose order never seems to exceed 4, each layer of a deep model can
efficiently learn a function that represents a causal process in the hierarchy.
The exceptional simplicity of physics-based functions hinges on properties such as
symmetry, locality, compositionality and polynomial log-probability of the input
data. [28]
So, the depth of a deep model is more efficient at capturing the hierarchy of causal
processes in the statistical process generating the observed data. Therefore, deep
neural networks don't have to approximate every possible mathematical function but
only a tiny subset of them.
Spam filtering
Malware detection
Biometric recognition
All strategies learnt in Go stay valid throughout gameplay, while in the game
of portfolio management that may not be the case: strategies may need to evolve with time.
The computer simulated a large number of games to learn strategies. Since we cannot
simulate the portfolio game, a large amount of historical data is required for the
intelligent agent to learn strategies from. But then, those strategies could be
outdated to an extent.
Behavioral Finance
Behavioral finance studies the effects of psychological, social, cognitive, and emotional
factors on the economic decisions of individuals and institutions and the consequences
for market prices, returns, and resource allocation, although not always that narrowly,
but also more generally the impact of different kinds of behavior, in different
environments of varying experimental values. [35]
Opinion mining, sentiment analysis and subjectivity analysis use natural language
processing to understand information retrieved from social media, news releases and
reports. Deep learning has substantially improved the ability to pick out opinions,
sentiments and subjectivity from human expressions. Predictive modelling can
estimate the relationship between this information and the financial outcome. Although
the general techniques are well known [36], it is a complex phenomenon to capture.
Proprietary models such as those used by IBM and Bloomberg L.P. gain a competitive
edge by using more advanced models, data engineering and AI-based targeted web
crawlers.
Retail Banking
There is opportunity in increased automation and better risk models to reduce delays
in the service pipeline, making the end product more appealing to the smartphone-enabled
generation. AI can also gather insights from consumer data and help engineer products
that better engage clients. The state of the art in computational vision and language
abilities has drastically improved, and there is strong potential in incorporating
these to provide a more natural interaction experience in client-side applications.
Personalized engagement is effective at building and maintaining relationships with
clients. Insights from product-use data, and from data in the other channels that a client
engages with, make it possible to offer this personalized experience at large scale.
Risk Management
New financial risks evolve and regulations increase with time. The increasing overhead
of modelling financial risk can be managed by making the process of creating new risk
models more efficient. Using a data-driven approach, especially deep learning to
eliminate feature engineering, can improve model performance and make the
modelling process economical and agile.
The resulting model performance benefits would enable more automation of transactions
in the pursuit of delivering a seamless banking experience to clients. And the resulting
agility of the modelling process would make it easier to prepare new risk models for new
regulations and evolving risks.
The automation aspect not only benefits clients but also reduces the operating cost of
services at a given scale, allowing the workforce to be used for more intellectual tasks.
Systematic Trading
Systematic trading is a methodical approach to investments and trading decisions
based on well-defined goals and risk controls [37]. It may be partially or fully
automated. Since it is hard for humans to understand, predict, and regulate the trading
activity, there is opportunity to leverage AI. An intelligent agent can respond
instantly to ever-shifting market conditions, taking into account thousands or millions
of data points every second. The resulting system is a market ruled by precision and
mathematics rather than emotion and fallible judgment [38].
It is possible that ToF cameras will make their way into smartphones to provide face
sign-in and gesture recognition capabilities. A reliable and convenient multifactor
authentication is then possible by combining 3D face recognition with the fingerprint
recognition already found in the Apple iPhone 5s.
Data from gaze tracking applied to human-computer interaction: Gaze
tracking has been used in marketing research for a long time, but it is also capable of
providing a computing experience where the interface reacts to the user's attention and
intent. Since 2010, there appears to be a patent race on gaze tracking technology
between Google, Microsoft, Apple and a Swedish company, Tobii, which is the leader in
eye tracking products.
The technology has substantially improved over the years and as a result it has recently
entered the gaming industry. Tobii has released its EyeX sensor in the consumer
market and introduced gaze tracking in major game titles such as Assassin's Creed
Syndicate [41], Deus Ex: Mankind Divided [42] and Tom Clancy's The Division [43], to
name a few. Several products in the market have integrated Tobii's gaze tracking
sensors, e.g. the MSI GT72S G laptop and Acer's Predator series gaming displays. Tobii
has recently received an order from Dell to integrate its sensor in the Alienware IS4
series gaming laptops [44].
Computing interfaces that react to intentions are a new experience for consumers and
could be a revolution in computing. In this pursuit, Tobii's gaming sensor is already
augmenting the Microsoft Windows 10 interface by providing on-screen gaze pointing
abilities that reduce the use of mouse and keyboard.
The real opportunity for banking lies in the fact that, if gaze tracking catches on, the
data available from these sensors is far more indicative of users' interests, and
presents a big opportunity for marketing and product engineering in retail banking, as
well as for predictive modelling in behavioral finance. A similar opportunity exists in
virtual reality (VR) and augmented reality (AR) applications, where users' attention can
be approximated by their head movements.
The invasiveness of this technology and data privacy concerns are noteworthy. To
ensure its adoption, there is a challenge in creating a compelling value proposition to
counter a possible backlash from consumers.
Data for which simple techniques such as thresholds yield substantial false
positives and false negatives.
Ability to handle multiple predictions and branching sequences with high order
statistics.
showed, this is sufficient to implement any computable program, as long as you have
enough runtime and memory. By learning how to manipulate their memory, Neural
Turing Machines can infer algorithms from input and output examples alone. In other
words, they can learn how to program themselves.
NTMs take inspiration from the biological workings of memory and attention, and from
the design of computers. Unlike a machine learning model that learns an input-to-output
mapping, NTMs are capable of learning algorithms, i.e. instructions that lead to the
completion of a task. Alex Graves' research [47] introduced a model that successfully
learnt and performed elementary operations like copy and sort.
Although this research is emergent, having algorithms synthesize new algorithms could
be ground-breaking in AI.
Acknowledgement
This content is an extended discussion of the case study titled "Opportunity for Banking
in Data-Driven Predictive Analytics" by my team: Jacqueline Zhang, Nicholas Mancini,
Indraneel Bende, Ricky He and myself. It was presented to the domain leads at DB
Global Technology, Cary, NC as a part of the 2016 summer analyst program.
I'm thankful for the contributions of my team members to the case study and for the
inspiring feedback given by the domain leads. I am grateful to Bryan Cardillo, Shambhu
Sharan and the rest of the dbTradeStore team for keeping me inspired and motivated
throughout this internship program.
References
[1] L. Grossman, "2045: The Year Man Becomes Immortal," Time Magazine, 10
February 2011.
[2] J. Demmel, "Communication-Avoiding Algorithms for Linear Algebra and
Beyond," in IPDPS, 2013.
[3] M. Hilbert and P. López, "The world's technological capacity to store,
communicate, and compute information," Science, pp. 60-65, 2011.
[4] D. Floyer, "The IT Benefits of an All-Flash Data Center," 23 March 2015.
[Online]. Available: http://wikibon.com/the-it-benefits-of-an-all-flash-data-center/.
[5] I. Cutress, "Intel's 140GB Optane 3D XPoint PCIe SSD Spotted at IDF,"
AnandTech, 26 August 2016. [Online]. Available:
http://www.anandtech.com/show/10604/intels-140gb-optane-3d-xpoint-pcie-ssd-spotted-at-idf.
[6] K. Freund, "Intel Acquires Nervana Systems Which Could Significantly Enhance
Future Machine Learning Capabilities," Forbes, 9 August 2016. [Online].
Available: http://www.forbes.com/sites/moorinsights/2016/08/09/intel-acquires-nervana-systems-which-could-significantly-enhance-future-machine-learning-capabilities. [Accessed 7 September 2016].
[7] E. Rich and K. Knight, Artificial Intelligence (second edition), McGraw-Hill,
1991.
[8] R. C. Schank, "Where's the AI?," AI magazine, p. 38, 1991.
[9] S. Russell and P. Norvig, Artificial Intelligence: A Modern Approach (third
edition), Prentice Hall, 2010.
[10] Y. LeCun, Y. Bengio and G. Hinton, "Deep learning," Nature, 2015.
[11] K. P. Roberto Martin, "Can Machine Learning Predict a Hit or Miss on Estimated
Earnings?," Bloomberg L.P., 4 February 2016. [Online]. Available:
[44] Tobii AB, "Tobii Receives Order from Alienware Regarding the IS4 Eye-Tracking
Platform," 2 September 2016. [Online]. Available:
http://www.businesswire.com/news/home/20160901006614/en/.
[45] Y. Cui, S. Ahmad and J. Hawkins, Continuous online sequence learning with an
unsupervised neural network model, arXiv.org, 2015.
[46] P. Gabrielsson, R. König and U. Johansson, "Evolving Hierarchical Temporal
Memory-Based Trading Models," in Applications of Evolutionary Computation,
Vienna, Austria, 2013.
[47] A. Graves, G. Wayne and I. Danihelka, Neural Turing Machines, arXiv.org,
2014.
[48] A. Jakulin, "What is the difference between statistics and machine learning?,"
Quora, 22 December 2012. [Online]. Available: https://www.quora.com/What-is-the-difference-between-statistics-and-machine-learning/answer/Aleks-Jakulin?srid=OlUS. [Accessed 7 September 2016].