An Introduction to Machine Learning
eMag Issue 50 - April 2017
Book Review: Andrew McAfee and Erik Brynjolfsson's The Second Machine Age
Andrew McAfee and Erik Brynjolfsson begin their book The Second Machine Age with a simple question: what innovation has had the greatest impact on human history?

Real-World, Man-Machine Algorithms
In this article, Edwin Chen and Justin Palmer talk about the end-to-end flow of developing machine learning models: where you get training data, how you pick the ML algorithm, what you must address after your model is deployed, and so forth.
CONTACT US
GENERAL FEEDBACK feedback@infoq.com
ADVERTISING sales@infoq.com
EDITORIAL editors@infoq.com
A LETTER FROM THE EDITOR
Machine learning has long powered many products we interact with daily: from intelligent assistants like Apple's Siri and Google Now, to recommendation engines like Amazon's that suggest new products to buy, to the ad-ranking systems used by Google and Facebook.
More recently, machine learning has entered the public consciousness because of advances in deep learning; these include AlphaGo's defeat of Go grandmaster Lee Sedol and impressive new products around image recognition and machine translation.
While much of the press around machine learning has focused on achievements that were not previously possible, the full range of machine learning methods, from traditional techniques that have been around for decades to more recent approaches with neural networks, can be deployed to solve many important (but perhaps more prosaic) problems that businesses face. Examples of these applications include, but are by no means limited to, fraud prevention, time-series forecasting, and spam detection.
InfoQ has curated a series of articles for this introduction to machine learning eMagazine covering everything from the very basics of machine learning (what are typical classifiers and how do you measure their performance?), to production considerations (how do you deal with changing patterns in data after you've deployed your model?), to newer techniques in deep learning. After reading through this series, you should be ready to start on a few machine learning experiments of your own.
Michael Manapat
Machine learning at a high level has been covered in previous InfoQ articles (see, for example, Getting Started with Machine Learning in the Getting a Handle on Data Science e-mag and series), and this article and the ones that follow it elaborate on many of the concepts and methods discussed earlier with emphasis on concrete examples and venture into some new areas, including neural networks and deep learning.

We'll begin, in this article, with an extended case study in Python: how can we build a machine-learning model to detect credit-card fraud? (While we'll use the language of fraud detection, much of what we do may apply with little modification to other classification problems, for example, ad-click prediction.) Along the way, we'll encounter many of the key ideas and terms in machine learning.

Target: Credit-card fraud
Businesses that sell products online inevitably have to deal with fraud. In a typical fraudulent transaction, the fraudster will obtain stolen credit-card numbers and use them to purchase goods online. The fraudsters will then sell those goods elsewhere at a discount, pocketing the proceeds, while the business must bear the cost of the chargeback. You can read more about the details of credit-card fraud here.
Let's say we're an online business that has been experiencing fraud for some time, and we'd like to use machine learning to help with the problem. More specifically, every time a transaction is made, we'd like to predict whether or not it'll turn out to be fraudulent (i.e., whether or not the authorized cardholder is making the purchase) so that we can take appropriate action. This type of machine-learning problem is known as classification, as we are assigning every incoming payment to one of two classes: fraud or not-fraud. A few sample rows of transaction data:

False,2015-12-31T23:59:59Z,2359,US,0
False,2015-12-31T23:59:59Z,1480,US,3
False,2015-12-31T23:59:59Z,535,US,3
False,2015-12-31T23:59:59Z,1632,US,0
False,2015-12-31T23:59:59Z,10305,US,1
False,2015-12-31T23:59:59Z,2783,US,0
... production and in batch. Depending on the definition of the feature, this can be highly non-trivial. These problems together are frequently referred to as feature engineering and are often the most involved (and impactful) parts of industrial machine learning.

For every payment, we'll plug in the values of amount, card_country, and card_use_24h into the formula above, and if the probability is greater than 0.5, we'll predict that the payment is fraudulent and otherwise we'll predict that it's legitimate.

Even before we discuss how to compute a, b, and Z, there are two immediate problems to address:

- Probability(fraud) needs to be a number between 0 and 1, but the quantity on the right side can get arbitrarily large (in absolute value) depending on the values of amount and card_use_24h (if those feature values are sufficiently large and one of a or b is nonzero).
- card_country is a categorical value, not a number, so it can't simply be multiplied by a coefficient.
Logit function
To address the first problem, instead of directly modeling p = Probability(fraud), we'll model what is known as the log-odds of fraud, so our model becomes:

If an event has probability p, its odds are p/(1-p), which is why the left side is called the log odds or logit.
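The formula itself appears as an image in the original; reconstructed from the coefficients named in the surrounding text (a and b for the two numeric features, Z for the intercept), the model should read:

```latex
\ln\left(\frac{p}{1-p}\right) = a \cdot \text{amount} + b \cdot \text{card\_use\_24h} + Z
```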
Categorical variables
To address the second problem in our list, we'll take the categorical variable card_country (which, say, takes one of N distinct values) and expand it into N-1 dummy variables. These new features will be Booleans of the form card_country=AU, card_country=GB, etc. We only need N-1 dummies because the Nth value is implied when the N-1 dummies are all false. For simplicity, let's say that card_country can take just one of three values here: AU, GB, or US. Then we need two dummy variables to encode it, and the model we would like to fit (i.e., find the coefficient values for) is:
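This formula is also an image in the original; based on the coefficients a, b, c, d, and Z referenced just below, it should read:

```latex
\ln\left(\frac{p}{1-p}\right) = a \cdot \text{amount} + b \cdot \text{card\_use\_24h}
  + c \cdot 1\{\text{card\_country}=\text{AU}\}
  + d \cdot 1\{\text{card\_country}=\text{GB}\} + Z
```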
For every sample that actually was fraudulent, we'd like p to be close to 1, and for every sample that was not fraudulent, we'd like p to be close to 0 (and so 1-p should be close to 1). Thus, we take the product of p over all fraudulent samples with the product of 1-p over all non-fraudulent samples to assess the accuracy of our guesses for a, b, c, d, and Z. We'd like to make the likelihood function as large as possible (i.e., as close as possible to 1). Starting with our guess, we'll iteratively tweak a, b, c, d, and Z to improve the likelihood until we find that we can no longer increase it by perturbing the coefficients. One common method for this optimization is stochastic gradient descent.
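The likelihood function described in this paragraph can be written out as (with p_i the model's predicted fraud probability for sample i):

```latex
L(a, b, c, d, Z) = \prod_{i \,\in\, \text{fraud}} p_i \;\times\; \prod_{j \,\in\, \text{not fraud}} (1 - p_j)
```

In practice, one maximizes the logarithm of L, which turns the products into sums and is numerically better behaved.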
Implementation in Python
Now we'll use some standard open-source tools in Python to put into practice the theory we've just discussed. We'll use pandas, which brings R-like data frames to Python, and scikit-learn, a popular machine-learning package. Let's say the sample data we described above is in a CSV file named data.csv; we can load the data and take a peek at it with the following:
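The loading code is not reproduced in this version of the article. A sketch of what it plausibly looked like, recreating the sample rows shown earlier (the column names here are assumptions based on the features the article uses):

```python
import io
import pandas as pd

# Recreate the sample rows shown above; with a real file this would be
# pd.read_csv('data.csv'). Column names are assumed from the article's features.
csv = io.StringIO("""\
fraudulent,charge_time,amount,card_country,card_use_24h
False,2015-12-31T23:59:59Z,2359,US,0
False,2015-12-31T23:59:59Z,1480,US,3
False,2015-12-31T23:59:59Z,535,US,3
False,2015-12-31T23:59:59Z,1632,US,0
False,2015-12-31T23:59:59Z,10305,US,1
False,2015-12-31T23:59:59Z,2783,US,0
""")
data = pd.read_csv(csv)
print(data.head())

# Expand the categorical card_country into dummy Boolean columns;
# drop_first keeps N-1 dummies, as described in the text.
data = pd.get_dummies(data, columns=['card_country'], drop_first=True)

y = data['fraudulent']                                # target
X = data.drop(['fraudulent', 'charge_time'], axis=1)  # features
```

Note that pd.get_dummies performs exactly the N-1 dummy-variable expansion described in the "Categorical variables" section.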
Now the data frame has all the data we need, dummy variables and all, to train our model. We've split up the target (the variable we're trying to predict, which in this case is fraudulent) and the features, as scikit-learn takes them as different parameters.

Before proceeding with the model training, there's one more issue to discuss. We'd like our model to generalize well, i.e., it should be accurate when classifying payments that we haven't seen before, and it should not just capture the idiosyncratic patterns in the payments we happen to have already seen. To make sure that we don't overfit our models to the noise in the data we have, we'll separate the data into two sets: a training set that we'll use to estimate the model parameters (a, b, c, d, and Z) and a validation set (also called a test set) that we'll use to compute metrics of model performance (see the next section on what these are).

If a model overfits, it will perform well on the training set (as it will have learned the patterns in the set) but poorly on the validation set. There are other approaches to cross-validation (for example, k-fold cross-validation), but a train-test split will serve our purposes here.

We can easily split our data into training and testing sets with scikit-learn as follows:
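The split itself is elided in this version; a minimal sketch with scikit-learn (the toy X and y below stand in for the features and target built during loading, and the test_size and random_state values are illustrative assumptions):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins for the feature matrix and target vector built earlier
# (e.g., columns could be amount and card_use_24h).
X = np.column_stack([np.arange(10), np.arange(10) % 3])
y = np.array([0, 0, 0, 1, 0, 0, 1, 0, 0, 0])

# Hold out a third of the data as a validation (test) set; the fixed
# random_state makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=17)
```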
We're now ready to train the model, which at this point is a triviality:
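The training snippet is also elided here; with scikit-learn it plausibly looked like the following (the toy arrays stand in for the real training set):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy training data standing in for X_train/y_train from the split above:
# columns are (amount, card_use_24h), labels are fraud (1) / not fraud (0).
X_train = np.array([[100, 0], [5000, 5], [200, 1], [9000, 7],
                    [150, 0], [7000, 6]], dtype=float)
y_train = np.array([0, 1, 0, 1, 0, 1])

model = LogisticRegression()
model.fit(X_train, y_train)

print(model.coef_)       # the fitted a, b (and c, d when dummies are present)
print(model.intercept_)  # the fitted Z
```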
The fit function runs the fitting procedure (which maximizes the likelihood function described above), and then we can query the returned object for the values of a, b, c, and d (in coef_) and Z (in intercept_). Our final model is:
While there are many measures of classifier performance, we'll focus on two:

- the false-positive rate: the fraction of all legitimate charges that are incorrectly classified as fraudulent, and
- the true-positive rate (also known as recall or sensitivity): the fraction of all fraudulent charges that are correctly classified as fraudulent.
Ideally, the false-positive rate will be close to 0 and the true-positive rate will be close to 1. As we vary the probability threshold at which we classify a charge as fraudulent (above we said it was 0.5, but we can choose any value between 0 and 1; low values mean we're more aggressive in labeling payments as fraudulent and high values mean we're more conservative), the false-positive rate and true-positive rate trace out a curve that depends on how good our model is. This is known as the ROC curve and can be computed easily with scikit-learn:
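The scikit-learn call is elided in this version; it is roc_curve from sklearn.metrics. A sketch (the labels and probabilities below are illustrative stand-ins for y_test and the model's predict_proba output):

```python
from sklearn.metrics import roc_curve

# True labels for the validation set and the model's predicted fraud
# probabilities, e.g. probs = model.predict_proba(X_test)[:, 1].
y_test = [0, 0, 1, 1, 0, 1]
probs  = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

fpr, tpr, thresholds = roc_curve(y_test, probs)
# Each (fpr[i], tpr[i]) pair is one point on the ROC curve, obtained by
# classifying charges as fraudulent when probs >= thresholds[i].
```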
Of course, when we put a model into production to take an action, we generally need to action the model-outputted probabilities by comparing them to a threshold as we did above, saying that a charge is predicted to be fraudulent if Probability(fraud) > 0.5. Thus, the performance of our model for a specific application corresponds to a point on the ROC curve; the curve just controls the tradeoff between false-positive rate and true-positive rate, i.e., the policy options we have at our disposal.
It's often the case that the relationship between predictive features and the target variable we're trying to predict is nonlinear, in which case we should use a nonlinear model to capture the relationship. One powerful and intuitive type of nonlinear model is a decision tree like the following:
At each node, we compare the value of a specified feature to some threshold and branch either to the left or the right depending on the output of the comparison. We continue in this manner (like a game of 20 Questions, though trees do not need to be 20 levels deep) until we reach a leaf of the tree. The leaf consists of all the samples in our training set for which the comparisons at each node satisfied the path we took down the tree, and the fraction of samples in the leaf that are fraudulent is the predicted probability of fraud that the model reports. When we have a new sample to be classified, we generate its features and play the 20 Questions game until we reach a leaf, which contains a predicted probability of fraud we can assign to that transaction.

In brief, we create a decision tree by selecting a feature and threshold at each node to maximize some notion of information gain or discriminatory power (the "gini" shown in the figure above) and proceed recursively until we hit some pre-specified stopping criterion. While we won't go further into the details of producing the decision tree, training such a model with scikit-learn is as easy as training a logistic regression (or any other model, in fact):
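The tree-training snippet is elided in this version; a minimal sketch with scikit-learn (the toy data and the max_depth value are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Toy training data standing in for the features/labels built earlier.
X_train = np.array([[100, 0], [5000, 5], [200, 1], [9000, 7],
                    [150, 0], [7000, 6]], dtype=float)
y_train = np.array([0, 1, 0, 1, 0, 1])

# max_depth caps tree growth, one common pre-specified stopping criterion;
# the default split criterion is the gini impurity mentioned above.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X_train, y_train)

# The fraction of fraudulent samples in each leaf becomes the
# predicted probability of fraud.
probs = tree.predict_proba(X_train)[:, 1]
```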
Productionizing machine-learning models
Training a machine-learning model is just one step in the process of using machine learning to solve a business problem. As described above, model training generally must be preceded by the work of feature engineering. And once we have a model, we need to put it into production to take appropriate actions (by blocking payments assessed to be fraudulent, ...
Alyssa Frazee is a machine-learning engineer at Stripe, where she builds models to detect fraud
in online credit-card payments. Before Stripe, she did a Ph.D. in biostatistics and fell in love with
programming at the Recurse Center. Find her on Twitter at @acfrazee.
... from observing the outcome for observations in one of the classes: we never get to see whether a website visitor would have clicked an ad if we don't show it, and we never get to see if a credit-card charge was actually fraudulent unless we process it, since we're missing the data to evaluate. Luckily, there are statistical methods for addressing this.

Finally, we may be using a black-box model: a model that makes accurate, fast predictions that computers easily understand but that aren't designed to be examined post hoc by a human (random forests are a canonical example). Do our users want understandable explanations for decisions that the model made? Simple modeling techniques can handle that problem too.

One of my favorite things about being a statistician-turned-ML-practitioner is the optimism of the field. It feels strange to highlight optimism in fields concerned with data analysis: statisticians have a bit of a reputation for being party poopers when they point out to collaborators flaws in experimental designs, violations of model assumptions, or issues arising because of missing data. But the optimism I've seen derives from the fact that ML practitioners have been doing their very best to develop techniques for overcoming these sorts of problems. We can correct expensive but badly designed biology experiments after the fact. We can build regression models even if our data is correlated in surprising or unquantifiable ways that rule out standard linear regression. We can empirically estimate what could have been if we had missing data.

I mention these examples because they (and countless others like them) have led me to believe that you can solve most data problems with relatively simple techniques. I'm loath to give up on answering an empirical machine learning question just because, at first glance, our data set isn't quite textbook. What follows are a few examples of ML problems that at one point seemed insurmountable but that can be tackled with some straightforward solutions.

Problem 1: Your model becomes its own adversary
Adversarial machine learning is a fascinating subfield of ML that deals with model-building within a system whose data changes over time due to an external adversary, i.e., someone trying to exploit weaknesses in the current model or someone who benefits from the model making a mistake. Fraud and security are two huge application areas in adversarial ML.

I work on machine learning at Stripe, a company that builds payments infrastructure for the Internet. Specifically, I build ML models to automatically detect and block fraudulent payments across our platform. My team aims to decline charges being made without the consent of the cardholder. We identify fraud using disputes: cardholders file disputes against businesses where their cards are used without their authorization.

In this scenario, our obvious adversaries are fraudsters: people trying to charge stolen credit-card numbers for financial gain. Intelligent fraudsters are generally aware that banks and ...
... charge never happens and so we can't determine if it would have been fraudulent. This means we can't estimate model performance. Any increase in observed ...

... a technique called inverse probability weighting to reconstruct a fully labeled data set of charges with labeled outcomes. The idea behind inverse probability weighting ...
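The description of inverse probability weighting is truncated above. A minimal sketch of the idea (the function name and data layout are invented for illustration): each charge whose outcome we only observed with some known probability is weighted by the inverse of that probability, reconstructing an unbiased estimate over all charges.

```python
def estimate_fraud_rate(charges):
    """Estimate the overall fraud rate when outcomes are only observed for
    some charges (e.g., because we let a random sample of risky charges
    through). Each tuple is (observed, is_fraud, p_observe), where p_observe
    is the known probability that the charge's outcome was observed.
    Observed charges are weighted by 1 / p_observe."""
    total_weight = 0.0
    fraud_weight = 0.0
    for observed, is_fraud, p_observe in charges:
        if not observed:
            continue  # outcome never seen; it contributes nothing directly
        w = 1.0 / p_observe
        total_weight += w
        if is_fraud:
            fraud_weight += w
    return fraud_weight / total_weight
```

A charge we let through only half the time counts double when it is observed, compensating for its unobserved twin.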
This article introduces neural networks, including brief descriptions of feed-forward neural networks and recurrent neural networks, and describes how to build a recurrent neural network that detects anomalies in time-series data. To make our discussion concrete, we'll show how to build a neural network using Deeplearning4j, a popular open-source deep-learning library for the JVM.

What are neural networks?
Artificial neural networks are algorithms initially conceived to emulate biological neurons. The analogy, however, is a loose one. The features of biological neurons that artificial neural networks mirror include connections between the nodes and an activation threshold, or trigger, for each neuron to fire.

By building a system of connected artificial neurons, we obtain systems we can train to learn higher-level patterns in data and to perform useful functions such as regression, classification, clustering, and prediction.

The comparison to biological neurons only goes so far. An artificial neural network is a collection of compute nodes. We pass data, represented as a numeric array, into a network's input layer, and the data proceeds through the network's so-called hidden layers until the network generates an output or decision about the data. We then compare the net's resulting output to expected results (ground-truth labels applied to the data, for example) and use the difference between the network's guess and the right answer to incrementally correct the activation thresholds of the net's nodes. As we repeat this process, the net's outputs converge on the expected results.

A whole neural network of many nodes can run on a single machine. It is important to note, for those coming from distributed systems, that a neural network is not necessarily a distributed system of multiple machines. Node, here, means a place where computation occurs.

Each node in the input layer takes a value that represents the input data. For example, each pixel in an image may be represented by a scalar that is then fed to a node. That input data passes through the coefficients, or parameters, of the net, and through multiplication those coefficients will amplify or mute the input, depending on its learned importance, i.e., whether or not that pixel should affect the net's decision about the entire input.

Initially, the coefficients are random; i.e., the network is created knowing nothing about the structure of the data. The activation function of each node determines the output of that node given an input or set of inputs. So the node either fires or does not, depending on whether or not the strength of the stimulus it receives, the product of the input and the coefficient, surpasses the threshold of activation.

In a so-called dense or fully connected layer, the output of each node passes to all nodes of the subsequent layer. This continues through all hidden dense layers, ending with the output layer, where the network reaches a decision about the input. At the output layer, the net's decision about the input is evaluated against the expected decision (e.g., do the pixels in this image represent a cat or a dog?). The error is calculated by comparing ...
Training process
To build a neural network, we need a basic understanding of the training process and how the net generates output. While we won't go deep into the equations, a brief description follows.
... loss, cost, or error function). The activation function determines whether and to what extent a signal should be sent to ...

... quickly through many combinations of hyperparameters to find the right architecture. Larger data sets are being gen...
Recurrent neural
networks
Unlike feed-forward neural net-
works, the hidden layer nodes
of a recurrent neural network
(RNN) maintain an internal state,
a memory, that updates with
new input fed into the network.
Those nodes make decisions
based both on the current input
and on what has come before.
RNNs can use that internal state
to process relevant data in arbi-
trary sequences of inputs, such
as time series.
Sample code
The configuration of a recurrent neural network might look something like this:
.seed(123)
This sets a random seed to initialize the neural net's weights, in order to obtain reproducible results. Typically, coefficients are initialized randomly, so to obtain consistent results while adjusting other hyperparameters, we need to set a seed so we can use the same random weights over and over as we tune and test.
.optimizationAlgo(OptimizationAlgorithm.STOCHASTIC_GRADIENT_DESCENT).iterations(1)
This determines which optimization algorithm to use (in this case, stochastic gradient descent) to determine
how to modify the weights to improve the error score. We probably wont have to modify this.
.learningRate(0.005)
When using stochastic gradient descent, the error gradient (that is, the relation of a change in coefficients to a
change in the nets error) is calculated and the weights are moved along this gradient in an attempt to move the
error towards a minimum.Stochastic gradient descent gives us the direction of less error, and the learning rate
determines how big of a step is taken in that direction. If the learning rate is too high, we may overshoot the
error minimum; if it is too low, our training will take forever. This is a hyperparameter that we may need to adjust.
Getting help
There is an active community of Deeplearning4J users who can be found on several support channels on Gitter.
Edwin Chen works at Hybrid, a platform for machine learning and human labor. He used to
build machine-learning systems for Google, Twitter, Dropbox, and quantitative finance.
Justin Palmer is founder of topfunnel, software for recruiters, and works on Hybrid. He was
most recently VP of data at LendingHome and has built ML products for speech recognition and
natural language processing at Basis Technology and MITRE.
However, even these systems need to move beyond simple click data once they reach large enough scale and sophistication; for instance, because they're heavily biased towards clicks, it can be difficult to tune the systems to show new ads and new videos to users, and so explore-exploit algorithms become necessary.

What's more, many of these systems eventually incorporate explicitly human-generated labels as well. For instance, Netflix employs over 40 people to hand-tag movies and TV shows in order to make better recommendations and generate labels like "Foreign movies featuring a strong female lead", YouTube hand-labels every ad to have better features when making ad-click predictions, and Google trains its search algorithm in part on scores that a large, internal team of dedicated raters gives to query-webpage pairs.

Suppose we're an e-commerce site like eBay or Etsy. We've started to see a lot of spammy profiles selling Viagra, drugs, and other blacklisted products, and we want to fight the problem with machine learning. But how do we do this?

1. First, we're going to need people to label training data. We can't use logs; our users aren't flagging things for us and even if they were, they're surely wildly biased (and spammers themselves would misuse the system). But gathering training data is a generally difficult problem in and of itself. We'll need hundreds of thousands of labels, requiring thousands of hours of work. Where will we get these?

2. Next, we'll need to build and deploy an actual ML algorithm. Even with ML expertise, this is a difficult and time-consuming process: how do we choose an algorithm, how do we choose the features to input into the algorithm, and how do we do this in a repeatable manner, so that we can easily experiment with different models and parameters?

3. We can't rest after we've deployed our first spam classifier. As we get new sources of users, or spammers get more creative, the types of spam appearing on our website will quickly change, so we'll need to continually rerun steps 1 and 2, which is a surprisingly difficult process to automate, especially while maintaining the accuracy levels we need.

4. Even with a working, mature ML pipeline, we're not finished. We don't want to accidentally flag and remove legitimate users, so there will always be cases of ML decision boundaries which we need a human to go and look at. But how do we build a scalable human-labor pipeline that seamlessly integrates into our ML and returns results in real time?
Behind the scenes, the same call automatically and invisibly decides whether a ML classifier is reliable enough to classify the example on its own or whether it needs human intervention. Models get built automatically, they're continually retrained, and the caller never has to worry whether more data is needed.

In the rest of this article, we'll go into more detail on the problems we described above, problems that are common to all efforts to deploy ML to solve real-world problems.

Labels for training
In order to train any spam classifier, we first need a training set of spam and not spam labels. One way to provide these is to use our site's visitors and logs. Just add a button that allows visitors to mark profiles as spam and use the results as a training set.

However, this can be a problem for several reasons. Most of our visitors will ignore the button, so our training set is likely to be very small.

It's easily gamed: spammers can simply start marking legitimate profiles as spammy.

It's also likely to be biased in unknown ways (after all, plenty of people are fooled by spammy Nigerian e-mail).

Another way to come up with training data is to label a bunch of profiles ourselves. But this is almost certainly a waste of time and resources: spam probably constitutes less than 1-2% of all profiles, so we'd need hundreds of thousands of profile classifications (and thousands of hours) in order to form a reasonable training set.

What we need, then, is a large group of workers to comb through a large set of profiles and mark them as spam or not spam according to a set of instructions. Common ways to find workers to perform these types of tasks include hiring off of Craigslist or using online crowdsourcing platforms like Amazon Mechanical Turk, Crowdflower, or Hybrid.

However, the work generated by Craigslist or Mechanical Turk workers is often low quality; at Hybrid, we've often seen spam rates, where workers randomly click on labels, as high as 80-90%. So we'll need to monitor worker output for accuracy.

One common monitoring technique is to label a number of profiles as spam or not spam and randomly send them to your workers in order to see if the workers agree with our labels.

Another potential approach is to use statistical distribution tests to catch outlier workers. For example, imagine a simple image-labeling task: if most workers label 80% of the images as cat and 20% as not cat, then a worker who labels only 45% of images as cat should probably be flagged.

One difficulty, though, is that workers deviate from each other in completely legitimate ways. For example, people may tend to upload more cat images during the day, or spammers may tend to operate during the night. In these cases, daytime workers will have higher cat and not spam labels compared to those who work at night. To account for this kind of natural deviation, a more sophisticated approach is to apply non-parametric Bayesian techniques to cluster worker output, which we then measure for deviations.
by Charles Humble
Erik Brynjolfsson is the director of the MIT Center for Digital Business and one of the most cited scholars in information systems and economics. He is a cofounder of MIT's Initiative on the Digital Economy, along with Andrew McAfee. He and McAfee are the only people named to both the Thinkers 50 list of the world's top management thinkers and the Politico 50 group of people transforming American politics.

Andrew McAfee is a principal research scientist at the MIT Center for Digital Business and the author of Enterprise 2.0. He is a cofounder of MIT's Initiative on the Digital Economy, along with Erik Brynjolfsson. He and Brynjolfsson are the only people named to both the Thinkers 50 list of the world's top management thinkers and the Politico 50 group of people transforming American politics.
Erik Brynjolfsson and Andrew McAfee begin The Second Machine Age
with a simple question: what innovation has had the greatest impact on
human history?
Innovation is meant in the broadest sense: agriculture and the domestication of animals were innovations, as were the advent of various religions and forms of government, the printing press, and the cotton gin. But which of these changed the course of humanity the most (and how even is that determined)?

To start, Brynjolfsson and McAfee suggest population and measures of social development as approximate yardsticks. Using either of them, the arc of human history decisively moves up and to the right (as Silicon Valley startups would have all of their metrics) starting around 1765. The authors argue that the trigger for this growth was James Watt's steam engine, a general-purpose technological innovation more than three times as efficient as its predecessors and one that essentially kicked off the Industrial Revolution.

Brynjolfsson and McAfee, researchers at the MIT Center for Digital Business who have made careers studying the impact of the Internet on business, believe that we're on the precipice of another such revolution, a second machine age, and provide some anecdotal evidence for this. These examples all have the same form: a decade ago we were frustratingly far from progress in the area and almost overnight, the problems had been solved (generally by advances in machine learning). The work here progressed in the same way that Ernest Hemingway described how people go bankrupt in The Sun Also Rises: gradually, then suddenly.

Among the examples are self-driving cars, now completely unremarkable on the freeways of Northern California, which only a decade ago seemed out of reach. As recently as 2004, the DARPA Grand Challenge to build a car that could autonomously navigate a course in the desert ended disastrously, with all the entrants failing just a few hours in (Popular Science derided the competition as a "Debacle in the Desert"). There was also IBM's Jeopardy-winning Watson, which thoroughly demolished the two most successful human Jeopardy contestants. Watson absorbed massive amounts of information, including the entirety of Wikipedia, and was able to answer instantaneously and correctly even when the clues involved typical-for-Jeopardy puns and indirection (it correctly offered "pentathlon" as the answer to "A 1976 entree in the 'modern' this was kicked out for wiring his epee to score points without touching his foe"). And although it was developed after the book was published, we could add Deepmind's AlphaGo, the first Go program ever to beat a professional player. In October 2015, AlphaGo defeated the reigning three-time European champion Fan Hui 5-0, and in March 2016, it defeated Lee Sedol, the top Go player in the world over the past decade, 4-1. Because Go is so combinatorially complex (on average, the number of possible moves a player can make is almost an order of magnitude more than the equivalent number in chess), it was generally believed that we were still several years away from achievements like those of AlphaGo.

Why has the progress here been so sudden in the past several years? One plausible, specific answer for many of these advances goes unmentioned: developments in neural networks and deep learning. But Brynjolfsson and McAfee focus on three higher-level explanations.

First, there's the exponential growth described by Moore's Law: transistor density doubles every 18 months. Citing Ray Kurzweil's rough rule of thumb that things meaningfully change after 32 doublings (once you're in the "second half of the chess board") and the fact that the Bureau of Economic Analysis first cited information technology as a corporate investment category in 1958, the authors peg 2006 as when Moore's Law put us into a new "regime" of computing.

Second, there's the trend of the digitization of everything: maps, books, speech; they're all being stored digitally in a form that's amenable for processing and analysis. For example, the navigation app Waze uses several streams of information: digitized street maps, location coordinates ...