
Book Review

The Numerati
Reviewed by Jeffrey Shallit

The Numerati
Stephen Baker
Houghton Mifflin Co., 2008
US$26.00, 256 pages
ISBN-13: 978-0618784608

Jeffrey Shallit is professor of computer science at the University of Waterloo, Ontario, Canada. His email address is shallit@cs.uwaterloo.ca.

Is it possible for a nonmathematician to write both accurately and entertainingly about a mathematical topic while still conveying something nontrivial about the mathematics? The answer is yes, but good examples are rare. Constance Reid did the trick with her book about Hilbert, and, to a lesser extent, with her book about Courant, but she had the advantage of having Julia Robinson for a sister. And of course Martin Gardner, who had little formal mathematical training, wrote the “Mathematical Games” column of Scientific American for many years, and introduced the beauty of mathematics to many young readers, including this reviewer.

Stephen Baker, the author of The Numerati, is, unfortunately, no Martin Gardner. By numerati (the word apparently first appeared in a 1990 review of a British art exhibit, written by Doron Swade) Baker means the kind of people who, were they working in the financial industry, would be called “quants”: people with very strong mathematical and computer skills who can analyze real-world problems. While “quants” study financial markets and build mathematical models, Baker’s numerati analyze large volumes of data collected electronically, in order to make predictions about human behavior in a variety of spheres: voting, employment, consumption, crime, illness, blogging, and marriage. Each of these activities gets a chapter devoted to it, in which Baker interviews several people who analyze the relevant data and try to come up with marketable conclusions. “[T]hese mathematicians and computer scientists,” Baker intones sternly, “are in a position to rule the information of our lives.”

In a book whose subject is data, equations, and mathematical models, Baker is surprisingly shy about presenting any actual mathematics. Or perhaps it is not so surprising. Stephen Hawking once wrote, “Someone told me that each equation I included in the book would halve the sales. I therefore resolved not to have any equations at all. In the end, however, I did put in one equation, Einstein’s famous equation $E = mc^2$. I hope that this will not scare off half of my potential readers.” [2] Baker has taken Hawking one equation further.

Baker’s approach is almost entirely anecdotal. All told, he interviews about two dozen of the numerati, ranging from IBM’s Samer Takriti to Yahoo’s head of research, Prabhakar Raghavan, and asks them some not-very-revealing questions. We learn very little about their personalities and even less about what it is they do on a day-to-day basis.

Some of the anecdotes are, admittedly, interesting. I particularly enjoyed the plan to “put a wireless computer on half a million cows in Kansas”; with the data collected, researchers hope to determine what behavior patterns of cows are correlated with higher-quality meat. But some are not so interesting. Baker opens with a puzzle: why do people who rent romantic movies online also tend to click on an ad for rental cars, much more than the average user? The answer, when it comes, is not that surprising: lovers of romantic movies were attracted by the ads that promoted weekend “escapes”.

There is very little in The Numerati to interest the professional or amateur mathematician; this is the kind of book that a business executive

October 2009 Notices of the AMS 1109


might buy in an airport bookstore, hoping to learn something about mathematical modeling and the Internet, but I imagine even the business executive will find insufficient novelty in Baker’s modest survey. There’s just not enough detail provided to tell the reader very much about the main subject: the models and algorithms that extract meaning from large volumes of data.

As an example, consider this passage: “If one of Raghavan’s scientists gives an imprecise computer command while trawling through Yahoo’s data, he can send the company’s servers whirring madly through the noise for days on end. But a timely tweak in these instructions can speed up the hunt by a factor of 30,000. That reduces a 24-hour process to about three seconds. His point is that people with the right smarts can summon meaning from the nearly bottomless sea of data. It’s not easy, but they can find us there.”

Reading this, I can only wonder, what is an “imprecise computer command”? Does the passage concern a new breakthrough at Yahoo in search optimization, or something obvious that every undergraduate computer science student learns, such as binary search? Baker just doesn’t give enough detail to decide.

Baker emphasizes that the volume of data collected by the numerati requires new techniques, but he doesn’t really explain why. It would have been nice to read something along these lines: if we are working with small data sets, with hundreds or thousands of items, we can afford to use algorithms that run in linear, O(n log n), or even quadratic time. But, as my colleague Alex López-Ortiz has noted [3], when you are dealing with $2^{30}$ or even $2^{40}$ data points, the log factor is the difference between a query that completes in a second and one that completes in half a minute.

Too often Baker relies on clichés. Over and over, we are told that the goal of the numerati is to “turn us into dizzying combinations of numbers” (p. 13), to “turn IBM’s workers into numbers” (p. 20), and that they will view people as “boiled down to numbers” (p. 23) or “represented as a series of numbers” (p. 35). Of these, only the last is accurate. Sometimes, though, Baker says we are actually equations: “each of us [is] represented by scores of equations” (p. 42); “I had ... no clue as to what kind of equation I would become” (p. 99). This, even metaphorically, seems incorrect. People might be represented by numbers, and their relationships might be governed by equations, but it makes little sense to claim that an individual’s attributes are represented by an equation.

Although most of his account is accurate (as far as it goes), Baker does get some of the history wrong. He claims, for example, that “Google’s breakthrough, which transformed a simple search engine into a media giant, was the discovery that our queries, the words we type when we hunt for Web pages, are of immense value to advertisers”. This is incorrect. The site Goto.com allowed advertisers to bid on search results as early as February 1998, two years before Google did so. Google’s original noteworthy accomplishment (and the one that made it the search engine of choice) was its new algorithm, called PageRank, for deciding what Web pages provide good matches for a query.

PageRank represented the Web as a directed graph. Nodes are pages, and there’s a directed edge from page A to page B if A links to B. In its simplest form, PageRank assigned a weight W to the edge (A, B) with

\[
W = \frac{\text{number of links from } B \text{ to } A}{\text{total number of pages that } B \text{ links to}}.
\]

The resulting square matrix, called the “link matrix”, is column stochastic and has an eigenvalue of 1. The associated eigenvector, if it is unique and suitably normalized, gives the “rank” or importance of each page. (There is now more actual mathematics in this review than in all 244 pages of Baker’s book.) To make this idea work well in practice, we need uniqueness of the eigenvector and a fast way to calculate it, so the mathematical story doesn’t end here. But even in its infancy, PageRank helped Google give results so much better than those of other search engines that it quickly became the search engine of choice. The results are good enough that Google’s home page cockily offers an option labeled “I’m Feeling Lucky”, where only a single search result, the top one, is revealed. Although Google’s search engine has since moved far past PageRank, a mathematically savvy writer could have easily summarized these elementary ideas, or at least referred to the paper of Bryan and Leise [1].

Even when a simple geometric diagram would have enlightened the reader, Baker refuses to provide it. In talking with Mark Steitz, a Democratic consultant, Baker describes a “simplex triangle” that represents voters in an election. Each voter is represented by a point with two coordinates that represent (a) the likelihood of favoring one party over another and (b) the likelihood of actually going to the polls in any election. “Steitz draws a vertical line up the triangle, a so-called isoquant. Each voter along this line is of equal value, he says.” Although I imagine every reader of this review could produce the diagram Steitz has in mind, one picture here would be worth more than a hundred words.

In the chapter on politics, Baker discusses the difficulty of obtaining good data on who people are likely to vote for. Because of this, “proxies” are used; if you bought a Volvo and shop at Trader Joe’s, you might be more likely to vote for a Democrat than someone who’s an NRA member and drives a pickup truck. Geographical proxies can be good predictors, too, but Baker’s account

1110 Notices of the AMS Volume 56, Number 9


is superficial compared to others, such as Michael Weiss’s The Clustering of America [5].

The contrast between this book and some related ones published recently is startling. For example, Emanuel Derman’s My Life as a Quant [4] is a memoir of the author’s career as a physicist, computer programmer, and financial wizard. Along the way, Derman provides portraits of Tsung-Dao Lee, the physicist who co-discovered the asymmetry of the weak interaction with C. N. Yang, and Fischer Black, co-creator of the Black-Scholes equation for the value of an option. Here is Derman on T. D. Lee:

    ... every speaker felt compelled to focus on him; as they spoke, their eyes fixated only on him, and he let no statement he did not fully agree with pass him by. No matter who lectured at the seminar, T. D. concentrated intensely on their argument, and interrupted at the first instant something was not satisfactory. At times he broke in on the initial sentence of the talk, refusing to let a speaker proceed until the point was clarified. Sometimes clarification never came; I once witnessed the humiliation of a visiting postdoc who was forced to defend the first sentence he uttered for the entire hour and a half allowed for his seminar.

Derman’s writing is witty, insightful, and moving; his prose is eloquent, and accurately captures the joys and sorrows of doing research. Derman’s book is not filled with equations, either, but he uses diagrams effectively to make his points, and describes, in a clear if nontechnical way, some of the ideas that excited him in physics and finance. As someone who has actually worked in mathematics, physics, and finance, Derman writes with an authority and insight that Baker cannot approach.

Very little of The Numerati is devoted to an analysis of the ethical and privacy concerns that data collection raises. Although Baker briefly discusses one way of hiding from the numerati (an initiative called Attention Trust), he says almost nothing about technologies for cryptography and anonymity. Modern cryptography, which is strongly mathematically based, offers us the hope that many of our transactions can take place veiled from the prying eyes of the numerati. And anonymous Web-surfing, based (for example) on technology from anonymizer.com or the Tor project, can prevent data collectors from linking online behavior with the specific person who is doing the surfing.

Ultimately, I did not find The Numerati a very satisfying account of its subject. I wanted more insight, something that Baker, with his nonmathematical background, could not provide. Perhaps I am unfair in criticizing Stephen Baker for not writing the book I would have wanted to read. The problem is, I don’t think he wrote the book that most people would have wanted to read.

References
[1] Kurt Bryan and Tanya Leise, The $25,000,000,000 eigenvector: The linear algebra behind Google, SIAM Review 48 (2006), 569–581.
[2] Stephen Hawking, A Brief History of Time, Bantam Books, 1998.
[3] Alejandro López-Ortiz, Algorithmic foundations of the Internet, Combinatorial and Algorithmic Aspects of Networking, Lecture Notes in Computer Science, Vol. 3405, Springer, Berlin, 2005, pp. 155–158.
[4] Emanuel Derman, My Life as a Quant: Reflections on Physics and Finance, Wiley, 2004.
[5] Michael J. Weiss, The Clustering of America, Tilden Press, 1988.
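
As a rough illustration of the link-matrix description of PageRank given in this review, the following Python sketch builds the matrix for a small web and approximates the eigenvector for eigenvalue 1 by repeated multiplication (power iteration). The four page names, their link structure, the function names, and the choice of power iteration are all assumptions made for the example; the review does not say how the eigenvector is computed. The sketch also assumes that every page links to at least one other page, that a page's outgoing links are distinct, and that the eigenvector is unique.

```python
def link_matrix(pages, links):
    """Build the column-stochastic link matrix described in the review.

    `links` maps each page to the list of distinct pages it links to.
    Entry M[a][b] = (links from page b to page a) / (pages that b links to).
    Assumes every page has at least one outgoing link.
    """
    index = {page: i for i, page in enumerate(pages)}
    n = len(pages)
    M = [[0.0] * n for _ in range(n)]
    for b, targets in links.items():
        for a in targets:
            M[index[a]][index[b]] += 1.0 / len(targets)
    return M


def page_ranks(M, iterations=100):
    """Approximate the eigenvector for eigenvalue 1 by power iteration.

    Starts from the uniform vector and repeatedly applies M, rescaling
    so the entries sum to 1 (the "suitable normalization").
    """
    n = len(M)
    v = [1.0 / n] * n
    for _ in range(iterations):
        v = [sum(M[i][j] * v[j] for j in range(n)) for i in range(n)]
        total = sum(v)
        v = [x / total for x in v]
    return v


# Hypothetical four-page web, invented purely for illustration.
pages = ["A", "B", "C", "D"]
links = {
    "A": ["B", "C", "D"],
    "B": ["C", "D"],
    "C": ["A"],
    "D": ["A", "C"],
}

M = link_matrix(pages, links)
ranks = page_ranks(M)
for page, r in zip(pages, ranks):
    print(page, round(r, 3))
```

For this toy web the ranks come out proportional to 12 : 4 : 9 : 6, so page A ranks highest even though page C receives more incoming links. A page's importance depends on the importance of the pages linking to it, not just on how many there are.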

