Communications of the ACM
cacm.acm.org, 03/2011, Vol. 54, No. 3
Plug-and-Play Macroscopes
Data Structures in the Multicore Age
Fumbling the Future
The Informatics Philharmonic
Testable System Administration
Memristors: Pass or Fail?
Association for Computing Machinery
Communications of the ACM | March 2011 | Vol. 54 | No. 3
03/2011 vol. 54 no. 03
Research Highlights
Technical Perspective: Concerto for Violin and Markov Model
By Juan Bello, Yann LeCun, and Robert Rowe
Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.
T (212) 869-7440; F (212) 869-0481 Jason I. Hong; Jeff Johnson; Wendy E. MacKay Printed in the U.S.A.
editor’s letter
DOI:10.1145/1897852.1897854
In “Regulating the Information Gatekeepers” (Nov. 2010), Patrick Vogl and Michael Barrett said a counterargument against the regulation of search-engine bias is that “Search results are free speech and therefore cannot be regulated.” While I have no quarrel as to whether this claim is true, I’m astounded that anyone could seriously make such a counterargument, or any judge accept it.

Search results are the output of an algorithm. I was unaware the field of artificial intelligence had advanced to the point that we must now consider granting algorithms the right of free speech. To illustrate such absurdity, suppose I was clever enough to have devised an algorithm that could crawl the Web and produce opinionated articles, rather than search results, as its output. Would anyone seriously suggest the resulting articles be granted all the constitutional protections afforded the works of a human author? Taking the analogy further, suppose, too, my algorithm produced something equivalent to shouting “Fire!” in a crowded theater. Or, further still, perhaps it eventually produced something genuinely treasonous.

If we accept the idea that the output of an algorithm can be protected under the right of free speech, then we ought also to accept the idea that it is subject to the same limitations we place on truly unfettered free speech in a civilized society. But who would we go after when these limitations are exceeded? I may have created the algorithm, but I’m not responsible for the input it found that actually produced the offensive output. Who’s guilty? Me? The algorithm? (Put the algorithm on trial?) The machine that executed the algorithm? How about those responsible for the input that algorithmically produced the output?

Unless humans intervene to modify the output of algorithms producing search results, arguments involving search results and free speech are absurd. At least until artificial intelligence has advanced to where machines must indeed be granted the same rights we grant our fellow humans.
Roger Neate, Seattle, WA

Authors’ Response:
Neate touches a nerve concerning the increasingly complex relationship between humans and material technologies in society. Accountability in these socio-material settings is challenging for judge and regulator alike. In the 2003 case of SearchKing vs. Google Technology, a U.S. District Court noted the ambiguity of deciding whether PageRank is mechanical and objective or subjective, ruling that PageRank represents constitutionally protected opinions. Whether search results are indeed free speech remains controversial, meaning we can expect the debate to continue.
Patrick Vogl and Michael Barrett, Cambridge, U.K.

Science Has 1,000 Legs
It’s great to reflect on the foundations of science in Communications, as in Tony Hey’s comment “Science Has Four Legs” (Dec. 2010) and Moshe Y. Vardi’s Editor’s Letter “Science Has Only Two Legs” (Sept. 2010), but also how the philosophy of science sheds light on questions involving the number of legs in a natural science.

Willard Van Orman Quine’s 1951 paper “Two Dogmas of Empiricism” convincingly argued that the attempt to distinguish experiment from theory fails in modern science because every observation is so theory-laden; for example, as a result of a Large Hadron Collider experiment, scientists will not perceive, say, muons or other particles, but rather some visual input originating from the computer screen displaying experimental data. The interpretation of this perception depends on the validity of many non-empirical factors, including physics theories and methods. With computation, even more factors are needed, including the correctness of hardware design and the validity of the software packages being used, as argued by Nick Barnes in his comment “Release the Code” (Dec. 2010) concerning Dennis McCafferty’s news story “Should Code Be Released?” (Oct. 2010).

For such a set of scientific assumptions, Thomas S. Kuhn coined the term “paradigm” in his 1962 book The Structure of Scientific Revolutions. Imre Lakatos later evolved the concept into the notion of “research program” in his 1970 paper “Falsification and the Methodology of Scientific Research Programs.”

In this light, neither the two-leg nor the four-leg hypothesis is convincing. Citing the leg metaphor at all, science is perhaps more accurately viewed as a millipede.
Wolf Siberski, Hannover, Germany

Certify Software Professionals and their Work
As a programmer for the past 40 years, I wholeheartedly support David L. Parnas’s Viewpoint “Risks of Undisciplined Development” (Oct. 2010) concerning the lack of discipline in programming projects. We could be sitting on a time bomb and should take immediate action to prevent potential catastrophic consequences of the carelessness of software professionals. I agree with Parnas that undisciplined software development must be curbed.

I began with structured programming and moved on to objects and now to Web programming and find that software is a mess today. When I travel on a plane, I hope its embedded software does not execute some untested loop in some exotic function never previously recognized or documented. When I conduct an online banking transaction, I likewise hope nothing goes wrong.

See the Web site “Software Horror Stories” (http://www.cs.tau.
letters to the editor
in the virtual extension
DOI:10.1145/1897852.1897855
Reaching Out to the Media: Become a Computer Science Ambassador
Frances Rosamond et al.

Science communication or public outreach can be seen as taking a lot of time and effort compared to the perceived payoffs these types of initiatives provide. In effect, there’s a tragedy of the commons: we all benefit from those who do it, so there is incentive to let other people shoulder the load.

The rationale behind science communication is fairly obvious, and it is often difficult to provide compelling arguments that appeal to skeptics. Public outreach is related to the reputation of the scientific field, funding, and the integration of the science community into society. More locally and perhaps more relevantly, it is related to the reputation of your university and to the quality of your students.

Other sciences have established long-lasting traditions of transmitting their key issues, raising public awareness including highlights such as the Nobel Prize or the Fields Medal (however, rarely is the general public aware of the ACM A.M. Turing Award).

Computer science is not yet where it should be regarding public awareness. The reasons for this situation may lie in the relative youth of the area, the rapid advances in the field, as well as the fast-moving technology that computer science is related to. Computer scientists face the myriad drawbacks of lacking public awareness. They are confronted with low enrollment numbers and low funding, and to some extent, they feel ignored and misunderstood. The authors of this article provide suggestions for what can be pragmatically done to increase coverage of computer science in the media.

The Internet Electorate
R. Kelly Garrett and James N. Danziger

The Internet was a prominent feature of the 2008 U.S. presidential election, regularly noted for its role in the Obama campaign’s successful fundraising and supporter-mobilization efforts and for its widespread use by interested voters. This article reports on a national telephone survey conducted in the weeks following that election to assess how Americans’ experience of elections was changing in response to the increasing availability and use of the digital communication network.

The Internet has long been heralded as an efficient means of acquiring political information, but the increasing presence of user-created content means the network is also becoming an important mode of political expression. The article examines these complementary roles, focusing on how Americans used the Internet to learn about the 2008 campaign, share political information, and voice their own opinions. Also considered is which individuals are most likely to engage in online information acquisition and expression, examining the influence of these practices on voters. Analyses are based on the national random-digit dial telephone survey of 600 adult Americans conducted two weeks after the 2008 election (November 6–20), with a response rate of 26.2%.

In the lead-up to the U.S. election in 2008, nearly two-thirds (64%) of Americans got campaign news online, a marked increase from 2004, when only about one-quarter (27%) of Americans said they got campaign news online. Equally notable is the fact that in 2008 two-fifths (38%) of respondents reported seeking online campaign news almost every day.

Governing Web 2.0
Steven DeHertogh, Stijn Viaene, and Guido Dedene

Web 2.0 applications aspire to make maximal use of the level playing field for engagement offered by the Internet, both technologically and socially. The World Wide Web has thereby entered “the realm of sociality,” where software becomes fused with everyday social life. This evolution has taken huge strides: Web 2.0 environments such as Wikipedia, Facebook, and MySpace have all become household names.

Both practitioners and researchers are converging on the usefulness of Web 2.0 for professional organizations. In and around enterprises, Web 2.0 platforms have been professed to support a profound change in intra- and inter-enterprise communication patterns. It is still early in terms of available management research on so-called “enterprise 2.0” experiences. Nevertheless, we have observed, as have others, that the way for organizations to capture benefits from Web 2.0 technology in the enterprise probably differs substantially from the way they attended to other enterprise information system projects in the past.

This article proposes a set of grounding principles to get the most out of enterprise 2.0 investments. The principles represent a synthesis of existing management theory and the author’s own case research of companies with recent experience in introducing Web 2.0 into their enterprises. The successful introduction of Web 2.0 for the enterprise will require a move away from predesigned paternalistically imposed communication strategies and structures, toward carefully stimulating a many-to-many, decentralized emergence of bottom-up communicative connections.
The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.
doi:10.1145/1897852.1897856 http://cacm.acm.org/blogs/blog-cacm
blog@cacm
exception handling when they are actually working in JavaScript.

Maybe we should be teaching scientists and engineers about computer science more generally. But as Greg Wilson points out, they don’t want much; they see computer science as a “tax.” What’s the core of computer science that even scientists and engineers ought to know? Alan Kay recently suggested a “Triple Whammy” (http://computinged.wordpress.com/2010/05/24/the-core-of-computer-science-alan-kays-triple-whammy/) defining the core of computer science:
1. Matter can be made to remember, discriminate, decide, and do.
2. Matter can remember descriptions and interpret and act on them.
3. Matter can hold and interpret and act on descriptions that describe anything that matter can do.

That’s a pretty powerful set. It goes way beyond Python vs. Java, or using Perl to check genome sequences with regular expressions vs. using MATLAB for analyzing data from ecological simulations. How do we frame the Triple Whammy in a way that fledgling scientists and engineers would find valuable and learnable?

Reader’s comment
The worrying trend I see is that many computer engineering graduates are interested in learning only a large set of programming languages, but dislike courses like algorithm design, not realizing that these languages are merely tools for implementing solutions. The end result is what you could call technicians but not engineers.
Farhan Ahmad

Greg Linden
“Research in the Wild: Making Research Work in Industry”
http://cacm.acm.org/blogs/blog-cacm/97467

How to do research in academia is well established. You get grants to fund your group, attract students, publish papers, and pump out Ph.Ds. Depending on who you ask and how cynical they have become, the goals are some combination of impacting the field, educating students, and personal aggrandizement.

Research in industry is less established. How to organize is not clear. The purpose is not even well understood. The business strategy behind forming a research group sometimes seems to be little more than a variant of the Underpants Gnomes’ plan in South Park. Phase 1: Hire Ph.Ds. Phase 2: ? Phase 3: Profit!

Generally, researchers in industry are supposed to yield some combination of long-term innovation, improving the sophistication of technology and products beyond simple and obvious solutions, and helping to attract talented and enthusiastic developers. To take one example in search, without researchers who know the latest work, it would be hard for a company to build the thousands of classifiers that ferret out subtleties of query intent, document meaning, and spamminess, all of which is needed for a high-quality search experience. Information retrieval is a field that benefits from a long history of past work, and researchers often are the ones that know the history and how to stand on giants’ shoulders.

Even so, there are many in industry that consider researchers an expensive luxury that their company can ill afford. Part of this comes from the historically common organizational structure of having a separate and independent research lab, which sometimes looks to be a gilded ivory tower to those who feel they are locked outside.

The separate research lab is the traditional structure, but a problematic one, not only for the perception of the group by the rest of the company, but also because the researchers can be so far removed from the company’s products as to have little ability to make an impact. Many companies appear to be trying other ways of organizing researchers into the company.

For example, Google is well known for integrating many of its researchers into product groups and shifting them among product groups, working side-by-side with different development teams. While on a particular project, a researcher might focus on the part of the problem that requires esoteric knowledge of particular algorithms, but they are exposed to and work on many problems in the product. When this group comes together, everyone shares knowledge, and then people move to another group, sharing their knowledge again. Moreover, these ephemeral teams get people to know people, yielding valuable peer networks. When a tough research problem later comes up and no one nearby knows how to solve it, finding the person in the company who can solve it becomes much easier.

Many other companies, including Microsoft, Facebook, and Twitter, maintain separate research organizations, but try to keep the researchers working very closely with the product teams. At these companies, the impetus for novel research often is a problem in the product, usually a problem that would not be obvious in academia because of their lack of access to big data and scale.

What organizational structure works best in industry may depend on your goals. For immediate impact, having researchers integrated into product groups provides a lot of value; they are directly solving today’s hard problems. But what about the problem that might hit in a year or two? And what about long-term breakthroughs, entirely new products, enabled by new technology no one has thought of yet?

My personal opinion leans mostly toward integrating researchers on projects, much like Google does, but also giving researchers 20% time (as all developers should get) and occasionally turning a 20% time project into a full project (again, as all developers should get, but the threshold for what is considered impactful might differ for a researcher, given the speculative gamble that is the nature of research). This strikes a balance between immediate impact, doing novel research, and taking advantage of a long-term opportunity when inspiration hits.

What do you think? How should researchers be organized in companies? Why?

Reader’s comment
Research as a process and a profession and as a mind set is quite different than product-making. Pushing the two too close together or expecting people to be good at both may not always be optimal. See my “Research as product” post on the FXPAL Blog.
Gene Golovchinsky

Mark Guzdial is a professor at the Georgia Institute of Technology. Greg Linden is the founder of Geeky Ventures.

© 2011 ACM 0001-0782/11/0300 $10.00
cacm online
DOI:10.1145/1897852.1897857 David Roman

ACM Member News
David Harel Elected to Israel Academy of
news

In recent years, several powerful research grids consisting of thousands of computing nodes, dozens of data centers, and massive amounts of bandwidth have emerged, but few of these grids have received much attention in the mainstream media. Unlike SETI@home, Folding@home, and other highly focused grid projects that have captured the popular imagination by allowing home users to donate compute cycles, the big research grids are not accessible to the public and their fame does not extend far beyond the researchers who use them. Outreach teams and usability engineers at the largest of these new grids, such as NAREGI, EGEE, and TeraGrid, are trying to change that reality by helping to facilitate the adoption of grid technologies in fields that have not traditionally used grid-based supercomputing resources.

[Figure: A grid-based computer simulation of the gravitational waves produced as two black holes merge with each other to form a larger black hole. Scientific visualization by Werner Benger, AEI/ZIB/LSU/UIBK.]

TeraGrid, said to be the world’s largest distributed network infrastructure for open scientific research, is one such network that has quietly been making waves in research communities outside computer science, and is helping to solve complex problems in biology, physics, medicine, and numerous other fields. TeraGrid consists of 11 data-center sites located around the U.S. and is tied to a 10-gigabyte backbone that connects primary network facilities in Los Angeles, Denver, and Chicago. At maximum capacity, TeraGrid can produce more than a petaflop of computing power and store more than 30 petabytes of data. The project, started by the National Science Foundation in August 2001, has grown in the past five years from fewer than 1,000 active users in 2005 to nearly 5,000 active users working on almost 1,600 projects at the end of 2009.

Matthew Heinzel, director of TeraGrid’s grid infrastructure group, says that TeraGrid’s outreach teams have done an excellent job drawing researchers from fields outside computer science. To obtain CPU time on the grid, scientists simply submit a request that describes the work to be done; extensive CPU-time requests are subject
high level of accuracy requires systems that are too large for a single university. For example, the nodes processing Schnetter’s modeling code require communication with other nodes several times per second and a data-exchange rate of about one gigabyte per second. “To finish a simulation in a reasonable time, we need to use hundreds or thousands of cores,” he says. “That means we need to split the problem into many pieces, and we need to ensure that each of these pieces remains as independent from the others as possible.” By using this modular technique, Schnetter’s team can replace or exchange grid code if it becomes necessary to apply new physics or use a different hardware architecture.

Another project run on TeraGrid is an effort to understand the environmental impact of aviation. Conducted out of Stanford University by doctoral candidate Alexander Naiman and overseen by Sanjiva Lele, a professor in the department of aeronautics and astronautics, the project models condensation trails, the ice clouds formed by aircraft emissions. Naiman, whose research group specializes in computational fluid dynamics and turbulence simulations, says the difficulty of contrail modeling becomes increasingly acute as the complexity of the model increases. “The more complex the flow, the higher resolution required to simulate it, and the more resources are needed,” he says.

While Stanford has local supercomputing resources, they are in high demand. “TeraGrid provides relatively large and modern supercomputing resources to projects like ours that have no other supercomputing support,” he says. The simulation code that Naiman and his team run on TeraGrid was written at the Center for Turbulence Research at Stanford. Naiman says it was easy to get that code running on TeraGrid. The research group parallelized the program, a type of large eddy simulation, using standard message-passing interface strategies that Naiman says have been highly scalable on TeraGrid.

The contrail modeling project is ongoing, but so far the Stanford team

dimensional flow fields and other data, such as time histories for cloud ice mass. Naiman says that although the TeraGrid data is still undergoing analysis, it is likely to help improve understanding of the development of contrails and their environmental impact. “TeraGrid performs as advertised, providing us with CPU hours that we would not have had access to otherwise,” he says. “We also take advantage of the large archival storage available on TeraGrid to ensure that important data is backed up.”

Improving Grid Software
As for the future of research on grid networks, TeraGrid’s Heinzel says he remains optimistic, but points out that improvements must be made in grid software not only to enhance ease of use for researchers such as Schnetter and Naiman, but also to take complete advantage of new generations of hardware. “You have to be almost a systems admin to set your parameters on data movement correctly so you can take full advantage of these systems,” says Heinzel. “So the software really needs to mature.”

Echoing these concerns, LSU’s Schnetter points out that his research groups consist of people with widely varying degrees of supercomputer experience. “Teaching everybody how to use the different systems, and staying on top of what works best on what system, and which parameters need to be tuned in what way to achieve the best performance, is like herding cats,” he says. “There are almost no GUIs for supercomputers, and most of the ones that exist are really bad, so that using them requires some arcane knowledge.”

Schnetter says he hopes that grid-based supercomputing will have a much larger influence on the curriculum than it does today, especially with so few universities teaching scientific programming at the level required to effectively use grid resources. “The good students in my group learned programming by themselves, on the side, because they were interested,” he says. Still, Schnetter suggests that such self-taught programming might not be sustainable in a world in which

Stanford’s Naiman offers a similar observation. He says that while his work on the grid has been positive, the usability of the technology could be improved. For his part, TeraGrid’s Heinzel says he remains optimistic about the accessibility of grids, and predicts that major usability improvements are on the way. He likens the evolutionary pace of these grid developments to how the Web quickly emerged from the Internet and now requires little more than a browser and a basic knowledge of hyperlinks. “If you know exactly how to use grid tools, they work effectively,” says Heinzel. “Now we need to make them more user-friendly so we can get a wider audience.”

In the future envisioned by Heinzel, grids will be manipulated easily by computer scientists while still providing friendly interfaces for researchers coming from other fields. Rather than predicting that the arrival of such technologies will take decades or more, Heinzel says that much progress will be made in the next few years alone. “We’re going to see some big improvements in the usability of the grid and grid software in the next two to four years,” he says. “Future systems will be very user-friendly with a high degree of abstracting the inner workings of what’s going on from the end users.”

Further Reading

Ferreira, L., Lucchese, F., Yasuda, T., Lee, C.Y., Queiroz, C.A., Minetto, E., and Mungioli, A.S.R.
Grid Computing in Research And Education. IBM Redbooks, Armonk, NY, 2005.

Magoulès, F.
Fundamentals of Grid Computing: Theory, Algorithms, and Technologies. Chapman & Hall, Boca Raton, FL, 2009.

Neeman, H., Severini, H., Wu, D., and Kantardjieff, K.
Teaching high performance computing via videoconferencing, ACM Inroads 1, 1, March 2010.

Scavo, T., and Welch, V.
A grid authorization model for science gateways, International Workshop on Grid Computing Environments 2007, Reno, NV, Nov. 11, 2007.

Wong, J. (Ed.)
Grid Computing Research Progress. Nova Science Publishers, Hauppauge, NY, 2008.
has simulated the first 20 minutes of computers are becoming increasingly Based in Los Angeles, Kirk L. Kroeker is a freelance
editor and writer specializing in science and technology.
contrail development for several sce- complex. “I hope that this changes in
narios, producing terabytes of three- the next decade,” he says. © 2011 ACM 0001-0782/11/0300 $10.00
Twitter as Medium
and Message
Researchers are mining Twitter’s vast flow of data to measure public sentiment,
follow political activity, and detect earthquakes and flu outbreaks.
Twitter generates a lot of noise. One hundred sixty million users send upward of 90 million messages per day, 140-character musings (studded with misspellings, slang, and abbreviations) on what they had for lunch, the current episode of “Glee,” or a video of a monkey petting a porcupine that you just have to watch.

Individually, these tweets range from the inane to the arresting. But taken together, they open a surprising window onto the moods, thoughts, and activities of society at large. Researchers are finding they can measure public sentiment, follow political activity, even spot earthquakes and flu outbreaks, just by running the chatter through algorithms that search for particular words and pinpoint message origins.
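To make the idea of searching messages for particular words concrete, here is a minimal sketch in Python. It computes, for each week, the fraction of messages that contain a tracked keyword, then measures how closely that series follows a reference series using a Pearson correlation. Everything in it is an assumption made for illustration: the function names, the toy messages, and the reference numbers are invented, and it is not the method of any researcher quoted in this article.

```python
import re
from math import sqrt


def tokens(text):
    """Lower-case a message and return its set of word tokens."""
    return set(re.findall(r"[a-z']+", text.lower()))


def keyword_rate(messages, keywords):
    """Fraction of messages that contain at least one tracked keyword."""
    if not messages:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = sum(1 for m in messages if tokens(m) & wanted)
    return hits / len(messages)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length >= 2")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(vx * vy)


# Invented toy data: one list of messages per week.
weekly_messages = [
    ["home sick with flu", "lunch was great", "new episode tonight", "so tired"],
    ["flu shot line is long", "have the flu, ugh", "flu is going around", "nice weather"],
    ["still have a headache and flu", "flu everywhere", "flu again", "watching tv"],
]
# Invented reference series, standing in for an official weekly illness index.
reference = [1.0, 2.5, 3.0]

rates = [keyword_rate(week, ["flu"]) for week in weekly_messages]
print(rates)                                # [0.25, 0.75, 0.75]
print(round(pearson(rates, reference), 3))  # correlation with the reference
```

Note that the message about a flu shot is counted as a hit even though nobody in it is ill. A bare keyword match cannot tell the difference, which is one reason a real system would likely need some form of message classification on top of it.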
“Social media give us an opportunity we didn’t have until now to track what everybody is saying about everything,” says Filippo Menczer, associate director of the Center for Complex Networks and Systems Research at Indiana University. “It’s amazing.”

The results can be surprisingly accurate. Aron Culotta, assistant professor of computer science at Southeastern Louisiana University, found that tracking a few flu-related keywords allowed him to predict future flu outbreaks. He used a simple keyword search to look at 500 million messages sent from September 2009 to May 2010. Just finding the word “flu” produced an 84% correlation with statistics collected by the U.S. Centers for Disease Control and Prevention (CDC). Adding a few other words, like “have” and “headache,” increased the agreement to 95%.

The CDC’s counts of what it terms influenza-like illness are based on doctors’ reports of specific symptoms in their patients, so they’re probably a more accurate measure of actual illness than somebody tweeting “home sick with flu.” But it can take a week or two for the CDC to collect the data and disseminate the information, by which time the disease has almost certainly spread. Twitter reports, though less precise, are available in real time, and cost a lot less to collect. They could draw health officials’ attention to an outbreak in its earlier stages. “We’re certainly not recommending that the CDC stop tracking the flu the way they do it now,” Culotta says. “It would be nice to use this as a first-pass alarm.”

Google Flu Trends does something similar. One potential point in Twitter’s favor is that a tweet contains more words, and therefore more clues to meaning, than the three or four words of a typical search engine query. And training algorithms to classify messages (filtering out the tweets that talk about flu shots or Bieber Fever) improves the accuracy further.

There are other physical phenomena where Twitter can be an add-on to existing monitoring methods. Air Twitter, a project at Washington University in St. Louis, looks for comments and photos about events like fires and dust storms as a way to get early indications about air quality. And the U.S. Geological Survey (USGS) has explored using Twitter messages as a supplement to its network of seismographic monitors that alert the federal agency when an earthquake occurs. Paul Earle, a seismologist at the USGS, is responsible for quickly getting out information about seismic activity. He has searched for spikes in keywords (“OMG, earthquake!” is a popular tweet) for a quick alert of an event. “It’s another piece of information in the seconds and minutes when things are just unfolding,” Earle says. “It comes in earlier. Some or most of its value is replaced as we get more detailed or science-derived information.”

Earle says Twitter might help weed out the occasional false alarm from automated equipment, when no tweets follow an alert. The content of tweets might also supplement Web-based forms that collect people’s experiences of an earthquake and are used to map the event’s intensity, a more subjective measure of impact that includes factors such as building damage. A recent earthquake in Indonesia, for instance, produced a spike of tweets, in Indonesian. There’s no Web form in that language for intensity, but Earle says Indonesian tweets might help fill in the blanks.

Sentiment Analysis

Many researchers are doing sentiment analysis of tweets. Using tools from psychology, such as the Affective Norms for English Words, which rates the emotional value of many words, Alan Mislove tracked national moods, and found that Americans tend to be a lot happier on Sunday morning than Thursday evening, and that West Coast residents seem happier than those on the East Coast.

“I think this is going to be one of the most important datasets of this era, because we are looking at what people are talking about in real time at the scale of an entire society,” says Mislove, an assistant professor of computer science at Northeastern University. He says there’s no easy way to validate those results, but as a proof-of-concept it shows the sorts of information that might be derived from the Twitter data stream, even without taking additional steps to filter out false positives. “There’s just simply so much data that you can do pretty decently, even by taking naïve approaches,” says Mislove.

This type of inquiry, of course, has limitations. Researchers readily admit that Twitter data is noisy, and it’s not always simple to know what a word means; in some parlances, “sick” is a good thing. But with hundreds of millions of messages, the errors tend to shrink. Another worry is hysteria; people worried about swine flu might tweet about it more, leading others to worry and tweet (or retweet), so there’s a spike in mentions without any increase in actual cases.

There’s also sample bias; certain segments of the population use Twitter more than others. But researchers seeking to glean insights from tweets can apply corrections to the sample, just as traditional pollsters do. And as a wider variety of people send more tweets, the bias is reduced. “The more data you have, the closer you get to a true representation of what the underlying population is,” says Noah Smith, an assistant professor of computer science at Carnegie Mellon University.

Smith is examining how Twitter can supplement more familiar polling. One advantage is that pollsters can influence the answers they get by the way they phrase a question; people are fairly consistent, for example, in being more supportive of “gay marriage” than of “homosexual marriage,” just because of the word choice. Studying tweets, which people send out of their own accord, removes that problem. “We’re not actually talking to anyone. We’re not asking a question. We’re just taking found data,” Smith says. “We can get a much larger population of people participating passively.”

Indeed, he says, Twitter data may help researchers answer all sorts of sociological questions that are otherwise hard to approach, because polling enough subjects is too expensive and time-consuming using traditional methods. For instance, Smith says, a researcher might study how linguistic patterns correlate to socioeconomic status, and perhaps learn something about communication patterns among people in different demographic groups. That, in turn, could reveal something about their access to information, jobs, or government services.

Of course, the power of widespread, unfiltered information invites the possibility of abuse. Two researchers at Wellesley College, Panagiotis Metaxas and Eni Mustafaraj, found that during a special election in Massachusetts for U.S. Senate, the Democratic candidate, Martha Coakley, was the subject of a “Twitter bomb” attack. A conservative group in Iowa, the American Future Fund, sent out 929 tweets in just over two hours with a link to a Web site that attacked Coakley. The researchers estimate the messages could have been seen by more than 60,000 people before being shut down as spam.

Indiana’s Menczer developed a tool to distinguish between organized partisan spamming and grass-roots activism. He calls it Truthy, from comedian Stephen Colbert’s coinage describing a statement that sounds factual but isn’t. Starting with a list of keywords that includes all candidates, parties, and campaign names, the system detects what he calls memes, messages about a specific topic or candidate. It then displays a graphic representation of how each meme propagates, with retweets in blue and mentions of the topic in orange. If someone sets up two accounts and repeatedly sends the same tweets back and forth between them (an effort to show up in Twitter’s popular “trending topics” list), it appears as a thick blue bar. Networks of automated tweets pushing a particular meme show up as regular, orange starbursts. More natural propagations look like fuzzy dandelions. In some cases, the tweets carry links to Web sites with questionable claims or even strident propaganda. Others turn out to be pitching a product.

Figure: Truthy shows how a tweet propagates, with retweets in blue and topic mentions in orange. Tweets that are sent back and forth between two Twitter accounts appear as a thick blue bar. (Image courtesy of truthy.indiana.edu.)

The patterns, along with information about when each Twitter account was created and whether the owner is known, allow voters to distinguish actual political dialogue from organized attacks. Menczer hopes to add sentiment analysis to analyze the content of the messages as well as their dispersal patterns.

At Xerox Palo Alto Research Center, Research Manager Ed H. Chi is also looking at message propagation. “Twitter is kind of this perfect laboratory for understanding how information spreads,” Chi says. Such a study can improve theoretical models of information dispersal, and also give people and businesses better strategies for delivering their messages or managing their reputations.

Much Twitter-based research is still preliminary. Some findings can be validated through other sources, such as CDC statistics or public opinion polls, but others remain unproven. Still, the scientists are excited at the prospects of what they might find by mining such a large, raw stream of data. “As Twitter and other social media grow, you’ll be able to ask much more fine-grained questions,” says Smith. “If you think about our society as being a big organism, this is just another tool to look inside of it.”

Further Reading

Chen, J., Nairn, R., Nelson, L., and Chi, E. H.
Short and tweet: experiments on recommending content from information streams, ACM Conference on Human Factors in Computing Systems, Atlanta, GA, April 10–15, 2010.

Culotta, A.
Detecting influenza outbreaks by analyzing Twitter messages, KDD Workshop on Social Media Analytics, Washington, D.C., July 25, 2010.

Earle, P., Guy, M., Buckmaster, R., Ostrum, C., Horvath, S., and Vaughan, A.
OMG earthquake! Can Twitter improve earthquake response? Seismological Research Letters 81, 2, March/April 2010.

Metaxas, P.T. and Mustafaraj, E.
From obscurity to prominence in minutes: political speech and real-time search, Web Science Conference, Raleigh, NC, April 26–27, 2010.

O’Connor, B., Balasubramanyan, R., Routledge, B., and Smith, N.
From Tweets to polls: linking text sentiment to public opinion time series, Proceedings of the International AAAI Conference on Weblogs and Social Media, Washington, D.C., May 23–26, 2010.

Neil Savage is a science and technology writer based in Lowell, MA.

© 2011 ACM 0001-0782/11/0300 $10.00

IBM’s 2015 Predictions

IBM recently unveiled its fifth annual “Next Five in Five,” a list of technology innovations the company says have the potential to change how people work, live, and play over the next five years. The IBM predictions are:

• 3D interfaces will let people interact via 3D holograms in real time. As 3D and holographic cameras become more sophisticated and miniaturized to fit into mobile phones, users will be able to interact with photos, surf the Web, and chat in novel ways.
• Scientific advances in transistors and battery technology will allow devices to last about 10 times longer than they do now. Instead of today’s heavy lithium-ion batteries, scientists are working on batteries that use air to react with energy-dense metal.
• Sensors in phones, cars, and other objects will collect data to give scientists a real-time picture of the environment. IBM recently patented a technique that enables a system to accurately conduct post-event analysis of seismic events, as well as provide early warnings for tsunamis.
• Advanced analytics technologies will provide personalized recommendations that get commuters where they need to go in the fastest time. Using new mathematical models and IBM’s predictive analytics technologies, researchers will analyze and combine multiple possible scenarios that can deliver the best routes for daily travel.
• Innovations in computers and data centers are enabling the excessive heat and energy that they give off to help heat buildings in the winter and power air conditioning in the summer. With new technologies, such as novel on-chip water-cooling systems, the thermal energy from processors can be efficiently recycled to provide hot water for an office or home.

The 2015 predictions are based on emerging technologies from IBM’s labs around the world as well as market and societal trends.
– Bob Violino
Milestones
AAAS Fellows
In December, the American Association for the Advancement of Science (AAAS) elected 503 members as Fellows in recognition of their meritorious efforts to advance science or its applications. Election as an AAAS Fellow is an honor bestowed upon members by their peers.

Of the new Fellows, 16 members were selected for the Section on Information, Computing, and Communication. They are: Srinivas Aluru, Iowa State University; Victor Bahl, Microsoft Research; David R. Boggs, Consulting Electrical Engineer; Geoffrey Charles Bowker, University of Pittsburgh; John M. Carroll, Pennsylvania State University; J. J. Garcia-Luna-Aceves, University of California, Santa Cruz/Palo Alto Research Center; Venu Govindaraju, University at Buffalo, The State University of New York; Hamid Jafarkhani, University of California, Irvine; Farnam Jahanian, University of Michigan; Phokion G. Kolaitis, University of California, Santa Cruz; C. C. Jay Kuo, University of Southern California; Dinesh Manocha, University of North Carolina, Chapel Hill; Hanan Samet, University of Maryland; Abraham Silberschatz, Yale University; Manuela M. Veloso, Carnegie Mellon University; and Barry Wessler, Wessler Consulting.

The new Fellows were honored at the Fellows Forum held in February during the AAAS Annual Meeting in Washington, D.C.
Evaluating
Government Funding
Presidential report asserts the value of U.S. government
funding and specifies areas needing greater focus.
Is computer science rightly part of public education? How much does the U.S. government spend on basic networking and IT research? Should industry provide that funding instead? How important is supercomputing?

These are some of the questions addressed by a 148-page report released by the President’s Council of Advisors on Science and Technology (PCAST) last December. Titled Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology, the report looked at U.S. investments in the cross-agency Networking and Information Technology Research and Development (NITRD) program, currently totaling approximately $4.3 billion per year.

Figure: President Obama enjoys a lighthearted moment with members of the President’s Council of Advisors on Science and Technology during a meeting at the White House. (Official White House photo by Pete Souza.)

Among other points, the council called for affirmation of computer science as a part of education in science, technology, engineering, and math; increased investment in the areas of privacy, human-computer interaction, massive data stores, and physical instrumentation such as sensors and robotics; long-term, multi-agency networking and information technology (NIT) initiatives for health, energy, transportation, and security; better coordination among agencies by the Office of Science and Technology Policy and the National Science and Technology Council; and a standing committee to provide ongoing strategic perspectives.

The report also warned against single-minded performance metrics when evaluating high-performance computing (HPC) projects, a subject made timely by Top500’s most-recent ranking of the world’s fastest supercomputers, which appeared five weeks before PCAST released its report. The Chinese-built Tianhe-1A supercomputer topped that list, bumping U.S.-made computers from the lead spot for the first time in six years. The report stated that “comparative rankings of the world’s fastest supercomputers” are “relevant to only some of our national priorities,” and said that they shouldn’t “ ‘crowd out’ the fundamental research in computer science and engineering that will be required to develop truly transformational next-generation HPC systems.”

PCAST called for an increase of $1 billion in funding for “new, potentially transformative NIT research” and recommended more specific accounting to separate basic NIT research from infrastructure costs. The report’s Working Group Co-chair and University of Washington Professor Ed Lazowska was quick to point out that the money was used in ways that are “appropriate and important” (for example, large-scale genome databases), although not “pushing the forefront of NIT.”

Independent technology reviews of this sort were mandated under the High-Performance Computing Act of 1991. The previous review, published in 2007, “found many of the same issues” according to that report’s co-chair, Microsoft Corporate Vice President Daniel Reed. But he did note some changes of focus, such as the marked increase in data. “We’ve gone from a world where data was rare and precious to where we’re drowning in it,” Reed says.

The report also documented NIT’s importance to U.S. competitiveness, and the payback for NIT investment. An example of high payback was given at the report’s public release by Akamai founder Tom Leighton. He related a story of U.S. Defense Advanced Research Projects Agency funding he received in the 1990s to study “highly mathematical and highly theoretical subjects ... the living example of high-risk research.” When the research was finished, Internet companies weren’t interested in the results, even for free.

“So we started a company called Akamai Technologies,” Leighton says. “We [now] carry over a third of Web traffic ... and are probably paying over a $100 million in taxes this year. But it wasn’t the kind of research that companies fund.”

Tom Geller is an Oberlin, OH-based science, technology, and business writer.

© 2011 ACM 0001-0782/11/0300 $10.00
Memristors:
Pass or Fail?
The device may revolutionize data storage, replacing flash memory
and perhaps even disks. Whether it can be reliably and cheaply
manufactured, though, is an open question.
A fundamental electronic device, whose existence was postulated five decades ago but which proved hard to understand, let alone build, is ready to emerge from the lab, corporate and university researchers say. If so, the memristor (or memory resistor), as it is called, may arrive just in time to save the information storage industry from the transistor’s collision with the scaling wall at the end of Moore’s Law.

Hewlett-Packard announced last August that it would team with the South Korean computer memory maker Hynix Semiconductor to develop memristor-based memory chips, called resistive RAM (ReRAM), which they say will be on the market in about three years. The companies say their titanium-based chips could replace flash memory (which has become nearly ubiquitous in mobile applications) and would be 10 times faster and 10 times more energy efficient. Meanwhile, Rice University has joined with Austin, TX-based PrivaTran, a semiconductor design company specializing in custom integrated systems, to develop an all-silicon ReRAM chip that could be a substitute for flash memory. But a senior research official at Intel says it is far from certain that either effort will succeed.

Figure: An image of a circuit with 17 memristors captured by an atomic force microscope at Hewlett-Packard’s Information and Quantum Systems Lab. (Image by R. Stanley Williams, HP Senior Fellow, Quantum Science Research, HP Labs.)

A memristor is a tiny two-terminal electronic component that can be made from a variety of materials (including polymers, metal oxides, and conventional semiconductors like silicon) whose resistance varies with the voltage applied across it and with the length of time the voltage is applied. Its initial applications are likely to be as binary memory devices, but it could work in an analog fashion and could eventually become the basis for certain types of logic circuits. “That [logic ability] could change the standard paradigm of computing, by enabling computation to one day be performed in chips where data is stored, rather than on a specialized CPU,” says Gilberto Medeiros Ribeiro, a senior scientist at HP Labs.

The memristor has several qualities that make it attractive for memory chips. First, it is nonvolatile, so that it remembers its state after electrical current is switched off. Second, it can be scaled to a single nanometer (nm) in size, researchers believe, whereas the one-bit flash memory cell is expected to reach its scaling limit at about 20 nm. And Leon Chua, a professor of electrical engineering and computer science at the University of California, Berkeley, says the memristor’s size advantage isn’t its sole advantage. “You can not only build them smaller, but use fewer of them,” he says. “Ten memristors might do the same thing as 50 transistors, so it’s a new ball game.”

In 1971, Chua published a paper, “Memristor–The Missing Circuit Element,” in IEEE Transactions on Circuit Theory, which outlined the mathematical underpinnings of memristors, which he called the fourth fundamental building block of electronics (along with resistors, capacitors, and inductors). The existence of memristance had been reported earlier (in 1960 by Bernard Widrow at Stanford University, for example) but it was not well understood.

Earlier researchers had erroneously regarded memristance as a hysteresis relationship (one in which effect lags cause) between voltage and current, when in fact it is based on flux and charge, the time integrals of voltage and current, says Chua. He likens the pre-1971 view of memristance to Aristotle’s belief that force is proportional to velocity and not, as Newton correctly demonstrated 2,000 years later, as proportional to the change in velocity, or acceleration.

In 2006, HP designed and built a titanium memristor that worked predictably and retained its state when powered off, based on the mathematical framework proposed by Chua. “For years people built [memristance] devices almost by accident. It’s to the great credit of HP that they finally figured it out,” he says. Figuring it out, according to HP’s Stanley Williams, the chief architect of the company’s memristor, meant “understanding the mathematical framework for memristors.”

Almost 40 years seems a long time between the emergence of Chua’s framework and the ability to reliably produce memristors, but enormous engineering hurdles had to be overcome. It required methods and tools, such as scanning tunneling microscopy, that could work at atomic scales. HP says it experimented with an enormous number of device types, many based on exotic materials and structures, but the results were often inconsistent and unexplainable. It was not until 2006 that HP developed equations that explained just what was occurring in its titanium memristors.

Women and Tenure

Over the last couple of decades, women have played an increasingly important role in the research sciences. However, according to Keeping Women in the Science Pipeline, a recent study by University of California, Berkeley researchers, there’s trouble brewing. Women, who now receive more than 50% of the Ph.D.s granted by institutions, are more likely to leave the profession than men. This, combined with growing demand for talent in Europe and Asia, puts U.S. preeminence in the sciences at risk.

The authors, who collected data from multiple sources and surveyed 62 academic institutions, found considerable differences in men and women attaining tenure track positions. Married women with young children are 35% less likely than their male counterparts to enter a tenure-track position after receiving a Ph.D. in science. What’s more, married women with children are 27% less likely than men with children to receive tenure after entering a tenure-track job in the sciences. On the other hand, single women without young children are about as successful as married men with children in attaining tenure-track jobs.

According to the report, both men and women view tenure-track positions in research-intensive universities as less than a family-friendly career choice. Only 46% of the men and 29% of the women rate their institutions “somewhat” or “very” family friendly. Numerous work hours and maternity leave of less than six weeks were cited as common problems.

The upshot? The academic world needs to adopt more family-friendly policies and provide greater opportunities for female tenure-track candidates. “America’s researchers do not receive enough family-responsive benefits,” the report concludes. “Academia needs to be more flexible… research universities should look to build a family-friendly package of policies and resources.”
– Samuel Greengard

More Speed, Less Power

The breakthrough achieved by HP in 2006 could revolutionize memory technology, the company says. “Memristor memory chips promise to run at least 10 times faster and use 10 times less power than an equivalent flash memory chip,” according to Williams, director of HP’s Information and Quantum Systems Lab. “Experiments in our lab also suggest that memristor memory can be erased and written over many more times than flash memory. We believe we can create memristor ReRAM products that, at any price point, will have twice the capacity of flash memory.”

However, not everyone has been impressed by the recent announcements from HP and Rice. “The memristor is only one of several interesting [recent] flash technologies, and by no means the most interesting,” says Justin Rattner, Intel’s chief technology officer. “Any time someone hypes a particular memory technology before building a large memory chip, you should be suspicious, very suspicious. It’s one thing to demonstrate a storage device in the lab, but it’s an entirely different thing to demonstrate it can be built in high volume at low cost and with exceptional reliability.”

Rattner acknowledges that flash memory, which is a $20 billion-plus market, is rapidly approaching its scaling limit. But rather than memristors, Intel is concentrating on nonvolatile phase-change memory, by which certain types of glass can be made to switch between two states by the application of heat. In the amorphous state, the atomic structure of the glass is highly disordered and has high resistivity. But when switched to its crystalline state, the glass has a regular atomic structure and low resistivity.

“We built commercial-grade, phase-change memories of sufficient size to fully understand the pros and cons of the technology in a high-volume environment,” Rattner says. Intel is looking at additional novel approaches to nonvolatile memories such as the spin torque transfer memory, which exploits magnetic spin states to electrically change the magnetic orientation of a material.
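Chua's remark earlier in this article, that memristance is a relationship between flux and charge instead of one between voltage and current, can be written compactly. The block below is a sketch in standard circuit-theory notation; the symbols (φ for flux, q for charge, M for memristance) are assumptions introduced here for illustration and do not appear in the article.

```latex
% Flux and charge are the time integrals of voltage and current:
\varphi(t) = \int_{-\infty}^{t} v(\tau)\, d\tau ,
\qquad
q(t) = \int_{-\infty}^{t} i(\tau)\, d\tau

% A memristor ties a change in flux to a change in charge:
d\varphi = M(q)\, dq

% Dividing by dt gives a charge-dependent version of Ohm's law:
v(t) = M\bigl(q(t)\bigr)\, i(t)
```

When M is constant, the last line reduces to the ordinary resistor law v = R i. When M depends on q, the resistance at any moment depends on how much charge has already passed through the device, which is consistent with the memory behavior the article describes.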
The memristor prototype chip built at Rice is a 1-kilobyte ReRAM with sub-5 nm switches, according to Jim Tour, a synthetic organic chemist at Rice and a leading memristor researcher. Of Rattner’s concern about manufacturing the devices, he says, “All so far looks good (materials cost, fabrication needs, scalability and switching times) except the switch voltage is a bit higher than we’d like, but we have some ideas to reduce that.”
Rice claims to have an edge over HP’s silicon-and-titanium memristor chip with its all-silicon model. “There are lots of engineering barriers to be overcome before this really takes off,” says Doug Natelson, a professor of physics and astronomy at Rice. “But the use of all silicon makes the manufacturing very understandable.”
Memristance comes from reduction-oxidation chemistry, in which atoms or molecules gain or lose their affinity for oxygen atoms, and in which the physical structure of materials can change. The Rice memristor chip, a thin layer of silicon oxide sandwiched between two electrodes, is made to convert back and forth between silicon (a conductor) and silicon oxide (an insulator). A sufficiently large voltage (up to 13 volts) applied across the silicon oxide converts some of it into pure silicon nanocrystals that conduct current through the layer. The switch, according to Natelson, shows robust nonvolatile properties, a high ratio of current “on” to current “off” (>10^5), fast switching (sub-100 ns), and good endurance (10^4 write-erase cycles).
The HP version is conceptually similar, but works by the alternating oxidation and reduction of titanium. Titanium dioxide (TiO2) is a semiconductor and is highly resistive in its pure state. However, oxygen-deficient TiO2, which has oxygen “vacancies” where an oxygen atom would normally appear, is highly conductive. By applying a bias voltage across a thin film of semiconductor with oxygen-deficient TiO2 on one side, the oxygen vacancies move into the pure TiO2 on the other side of the semiconductor, thus lowering the resistance. Running current in the other direction will move the oxygen vacancies back to the other side, increasing the resistance of the TiO2 again.
In the short term, memristors are most likely to be used in storage devices, but eventually may be used in artificial neural networks, in applications such as pattern recognition or real-time analysis of the signals from sensor arrays, in a way that mimics the human brain. A memristor works like a biological synapse, with its conductance varying with experience, or with the current flowing through it over time. Similarly, the brain learns and configures itself by varying the strength of synaptic connections between neurons. The ability of memristors to remember and to work as analog devices allows them to assume any of many values over a range, just as synapses do.
“The memristor self-learns from experience, and the brain is made of memristors,” Chua says. “That in the long run is much more interesting and important [than data storage], because it’s how you can design intelligent machines. But it’s in the next 50 years, not the next 10.”
Further Reading
Chua, L. Memristor–the missing circuit element, IEEE Transactions on Circuit Theory 18, 5, Sept. 1971.
Jo, S.H., Chang, T., Ebong, I., Bhadviya, B.B., Mazumder, P., and Lu, W. Nanoscale memristor device as synapse in neuromorphic systems, Nano Letters 10, 4, March 1, 2010.
Strukov, D.B., Snider, G.S., Stewart, D.R., and Williams, R.S. The missing memristor found, Nature 453, May 1, 2008.
Tour, J.M. and He, T. Electronics: the fourth element, Nature 453, May 1, 2008.
Yao, J., Sun, Z., Zhong, L., Natelson, D., and Tour, J.M. Resistive switches and memories from silicon oxide, Nano Letters 10, 10, Aug. 31, 2010.
Gary Anthes is a technology writer and editor based in Arlington, VA.
© 2011 ACM 0001-0782/11/0300 $10.00
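For readers who want to see the switching behavior in numbers, the resistance change described in the memristor article above can be sketched with the linear-drift idealization often associated with the Strukov et al. paper listed under Further Reading. This is a toy model, not HP's or Rice's device physics: the function name `simulate_memristor` and every parameter value (on and off resistance, film thickness, vacancy mobility, starting boundary position) are illustrative assumptions.

```python
def simulate_memristor(currents, dt, r_on=100.0, r_off=16_000.0,
                       thickness=10e-9, mobility=1e-14, w0=1e-9):
    """Euler-integrate a linear-drift memristor model.

    currents  -- sequence of current samples in amperes (sign = direction)
    dt        -- time step in seconds
    Returns the resistance in ohms after each step.
    All default values are assumed, illustrative numbers.
    """
    w = w0  # width of the oxygen-deficient (conductive) region, in meters
    resistances = []
    for i in currents:
        # The boundary drifts in proportion to the charge that has passed.
        w += mobility * (r_on / thickness) * i * dt
        # The boundary cannot leave the film.
        w = min(max(w, 0.0), thickness)
        x = w / thickness
        # Doped and undoped regions act as two resistors in series.
        resistances.append(r_on * x + r_off * (1.0 - x))
    return resistances


# Drive 1 mA one way for 50 ms, then 1 mA the other way for 50 ms.
forward = [1e-3] * 50
reverse = [-1e-3] * 50
trace = simulate_memristor(forward + reverse, dt=1e-3)
print(trace[0], trace[49], trace[-1])
```

In this sketch the forward current lowers the resistance and the reverse current raises it back to its starting value, and the device keeps whatever value it had when the current stops, which is the "memory" the article describes.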
Technology
Flexible Screens
Hewlett-Packard plans to deliver a prototype of a solar-powered, lightweight device with a flexible plastic screen, which HP researchers are affectionately terming “a Dick Tracy wristwatch,” to the U.S. Army later this year. Roughly the size of an index card, the low-power device will enable soldiers to read digital maps, directions, and other data on a screen that won’t break or shatter like glass.
Researchers expect the HP prototype to inspire a new generation of products with flexible plastic screens, including clothing, household furnishings, and more. “You can start thinking about putting electronic displays on things where you wouldn’t ordinarily think of having them,” said Nicholas Colaneri, director of the Flexible Display Center at Arizona State University, in an interview with The San Jose Mercury News. “How about a stack of thin displays that I can peel off and stick on things, sort of like a pad of Post-it notes?”
HP hopes to manufacture flexible displays continuously, like newspapers rolling off a printing press, which could be less expensive than the current batch production of glass displays.
The flexible display and e-reader market is expected to grow dramatically from $431 million in 2009 to $9.8 billion in 2018, according to DisplaySearch.
Graeme Stemp-Morlock
news
Gary Chapman,
Technologist: 1952–2010
He raised important public issues, such as the impact
of computers and the Internet on society, and encouraged
social responsibility for computer professionals.
Photograph by Sasha Haagensen
Chapman helped bring ethics and human values to the world of computing.
Gary Chapman, an American technologist, Internet expert, and ethicist, died Dec. 14, 2010, at the age of 58. He suffered a massive heart attack while on a kayaking trip in Guatemala.
Over the last two decades, Chapman established himself as an international authority on Internet and technology policy. He was among the first technologists to draw public attention to the issues that computing technology, including the Internet, presents to society. Chapman helped to insert ethics and human values into the world of computing by focusing on a mélange of issues, including how to address the digital divide in society, preventing the misuse of technology by government agencies, especially the military, and encouraging young people to use the Internet responsibly.
At the time of his death, Chapman was a senior lecturer at the LBJ School of Public Affairs at the University of Texas at Austin. He began teaching at the school in 1994. He also served as director of the 21st Century Project, a research and education resource for policymakers and the public.
In addition, Chapman lectured internationally and wrote for many prominent publications, including The New York Times, Technology Review, and The New Republic. His syndicated Digital Nation column, which appeared in more than 200 newspapers and Web sites, ran from 1995 to 2001.
Chapman’s big break came in 1984 when, while a graduate student in political science at Stanford University, he learned the newly formed Computer Professionals for Social Responsibility (CPSR) was hiring its first executive director. Chapman contacted CPSR cofounder Severo Ornstein for a job interview, but was told he was too late. Ornstein had already decided on a candidate. Chapman insisted, and got an interview. “When we heard him speak, we knew he was the perfect person for the job,” Ornstein recalls. “Everyone on the board agreed that we had to hire him.”
“Gary was a real pioneer in linking the lives and careers of computer professionals to the social impact of the work they do and calling for us to take responsibility for the fruits of our labors,” Mitch Kapor, the founder of Lotus Development Corp., said in an email interview.
Chapman served as CPSR executive director from 1984 to 1991. Under Chapman’s leadership, CPSR flourished and grew, and emerged as an outspoken critic of the use of technology by the military. “With Gary at the helm, CPSR raised serious questions in public forums about the Strategic Defense Initiative [Star Wars],” Ornstein says. “Gary and I shared the notion that too much technology was being developed for military purposes. We felt strongly that it would have been better if technology funding came through the National Science Foundation rather than the Department of Defense.”
However, after the defeat of Star Wars and the collapse of the Berlin Wall in 1989, the threat of nuclear war diminished, and Chapman became more concerned about the effects of computers and, later, the Internet on society.
Born on Aug. 8, 1952 in Los Angeles, Chapman served as a medic in the U.S. Army Special Forces during the Vietnam War. He earned a B.A. in political science from Occidental College in 1979 and attended Stanford University’s political science Ph.D. program. He left Stanford in 1984 to lead CPSR.
“Gary was deeply concerned about society plunging ahead with technology without giving adequate thought to the social implications,” Ornstein says. “He helped provide much-needed direction, and he has left behind students and others who will continue to monitor and analyze technology policy.”
Samuel Greengard is an author and journalist based in West Linn, OR.
© 2011 ACM 0001-0782/11/0300 $10.00
viewpoints
Legally Speaking
Do You Own the
Software You Buy?
Examining the fine print concerning your rights
in your copies of purchased software.
Software companies have mass-marketed computer programs for the past few decades on terms that typically purport to restrict the right of end users to resell or otherwise transfer their interests in copies of software they have purchased. The restrictions are usually stated in documents known as shrink-wrap or click-through “licenses.” Vendors of other types of digital content sometimes distribute their works with similar restrictions.
Shrink-wraps are documents inserted in packaged software, often just under the clear plastic wrap surrounding the package, informing purchasers they are not owners of copies of programs they just bought, but instead have rights in the program that are limited by the terms of a license agreement. Click-throughs are similar in substance, although the “license” terms only become manifest when you try to install the software and are directed to click “I agree” to certain terms.
Most of us ignore these documents and the restrictions they contain. Because we bought our copy of that software, we think we own it, regardless of what any “license” document says. We are also quite confident the vendor won’t take any action against us, even if we do violate one of the terms, because realistically the vendor can’t monitor every end user of its products.
The debate over whether mass-market transactions like these are really “sales” of goods, notwithstanding the “license” label, has been going on for decades. Strangely enough, it has yet to be definitively resolved. A recent appellate court ruling has upheld the license characterization, but a further appeal is under way in that case. This ruling is also at odds with other appellate court decisions. So things are still up in the air on the ultimate issue. This column will explain what is at stake in these battles over your rights in your copies of purchased software.
Three Legal Options
The distinction between “sales” and “licenses” really matters when assessing the risk of liability to copyright owners if resale restrictions are ignored.
Copyright law allows rights holders to control only the first sale of a copy of a protected work to members of the public. After the first sale, the owner of that copy is entitled to resell or otherwise transfer (for example, give it away as a gift or lend it to others) the copy free from risk of copyright liability. Bookstores and libraries are among the institutions made possible by copyright law’s first-sale doctrine.
Photograph by Windell Oskay
What happens if copyright owners try to restrict resales through license restrictions? There are three possible outcomes.
First, the effort to restrict resales may be deemed a nullity, as it was in a famous 1908 Supreme Court case, Bobbs-Merrill Co. v. Straus. Bobbs-Merrill sold books to Straus containing a prominent notice that resale of the books except at a stated price would be treated as copyright infringement. When Straus sold the books for a lower price, Bobbs-Merrill sued for infringement. The Court refused to enforce this resale restriction, saying that the copyright owner was entitled to control only the first sale of copies of its works to the public.
Second, a resale restriction may be enforceable against the purchaser insofar as he has agreed not to resell his copy, but it would be unenforceable against anyone to whom the purchaser might subsequently sell his copy.
This result might seem odd, but there is a fundamental difference between contract and property rights: Contracts only bind those who have agreed to whatever terms the contract provides; property rights create obligations that are good against the world. A first-sale purchaser may thus have breached a contractual obligation to the copyright owner if he resells his copy of the work in violation of a resale restriction, but he is not a copyright infringer.
Those who purchase copies of copyrighted works from owners of first-sale copies are not at risk of either copyright or contract liability. These third-party purchasers are also free to resell their copies to a fourth party without fear that either is at risk of any liability to the copyright owner.
Third, courts may rule that the first-sale rule does not apply to mass-market “license” transactions involving copies of copyrighted works because no “sale” has taken place. Under this interpretation, secondary markets in those copies are illegal. Anyone who purports to resell the copies is a copyright infringer for distributing copies of copyrighted works without getting permission from the copyright owner.
Vernor and Augusto
My March 2009 Legally Speaking column (“When is a ‘License’ Really a Sale?”) discussed two lower-court decisions in which copyright owners challenged the resale of copies of copyrighted works on eBay. In both cases, the courts ruled that copyright’s first-sale rule applied, notwithstanding transfer restrictions, because of economic realities of the transactions.
The plaintiff in Vernor v. Autodesk asked the court to declare that he was the owner of copies of Autodesk software he purchased from one of Autodesk’s customers and that he was entitled under the first-sale doctrine to resell those copies on eBay.
Autodesk claimed no sale had taken place because the software was licensed on terms that forbade transfer of the copy to third parties. Autodesk asked the court to declare that sales of these copies on eBay constituted copyright infringement.
UMG v. Augusto involved promotional CDs of music. Augusto purchased these CDs at flea markets, online auctions, and used CD stores. Language on the CD packaging indicated they were licensed for personal use only and could not lawfully be sold or otherwise transferred to third parties. When Augusto started selling UMG promotional CDs on eBay, UMG sued him for infringement.
My March 2009 column predicted that Autodesk and UMG would appeal the trial court rulings against them, and that the software industry could be expected to push very hard for a reversal, particularly in the Vernor case.
The Ninth Circuit Court of Appeals heard the arguments in Vernor and Augusto on the same day. In September 2010, the appellate court ruled in favor of Autodesk. Yet it upheld Augusto’s first-sale claim.
In assessing whether the first-sale rule should apply to mass-market transactions like these, it is useful to compare the economic realities test used by the trial courts in the Vernor and Augusto cases and the labeling and restrictions test adopted by the Ninth Circuit in Vernor.
Economic Realities Test
Under this test, a copyright owner’s characterization of a transaction as a license, rather than a sale, is not dispositive. It is instead only one factor among many that should be weighed in determining the true nature of the transaction.
Other factors include whether the purchaser has the right of perpetual possession of the copy, whether the rights holder has the right and ability to reclaim the copy if the license terms are violated, whether the purchaser has paid substantial sums for the privilege of permanent possession, and whether the purchaser has the right to discard or destroy the copy. The marketing channels through which the copy was obtained (such as purchasing packages of software at Walmart or Office Depot) may also be relevant.
Under the economic realities test,
these copies were infringing because eration of those markets. Vernor and
they were not authorized by Peak. Augusto’s cases are important to the fu-
MAI v. Peak cited no authority and of- ture of competition in product markets
of their copies of Peak software. Peak’s law that the first-sale rule represents.
license was, for that panel, dispositive. Pamela Samuelson (pam@law.berkeley.edu) is the
The three-judge panel decision in Richard M. Sherman Distinguished Professor of Law and
Information at the University of California, Berkeley.
Vernor did not rely on the license label
alone as a basis for rejecting Vernor’s Copyright held by author.
viewpoints
Computing Ethics
Surrounded
by Machines
A chilling scenario portends a possible future.
Illustration by Viktor Koen
I predict that in the near future a low-budget movie will become a phenomenon. It will be circulated on the Internet, shared in the millions via mobile telephones, and dominate Facebook for a full nine days. It will show ordinary people going about their everyday lives as slowly, subtly, everything starts to go wrong as described in the following events.
A middle-aged man notices the advertisements that pop up with his Web searches are no longer related to the search, but to odd and random products and services: hair replacement, sports cars, retirement homes, second career counseling, fantasy vacations, divorce lawyers, treatments for depression and impotence.
A young professional woman, recently laid off because of the bad economy, posts an Internet ad to sell her piano. The ad doesn’t mention that she needs the money to pay her rent. None of the offers are as high as her asking price, but two offer exactly what she owes for rent (to the penny) and one offers exactly $100 more.
The seven most troublesome students at a high school notice that wherever they go, they run into one or more of their three meanest teachers.
An elderly couple starts hearing voices in their assisted-living apartment, faint whispers they can barely discern:
“He’s awake.”
“She just got out of bed.”
“The coffee machine is on.”
These merely perplexing events become ever more ominous as thousands of people, then millions, realize they are always being watched, they have no privacy, and their every decision is controlled by some unseen force. Four-sevenths of moderate Americans who are likely to vote begin to slide from the middle to the extreme right or left, not knowing why. It gets worse and worse. It seems like Armageddon.
Just as the 1956 film Invasion of the Body Snatchers encapsulated the Red Scare zeitgeist with its depiction of ordinary people being replaced by exact replicas who are actually aliens bent on taking over the world (as many feared that Communist spies and sympathizers were taking over America), The Invasion of the Info Snatchers will play on our high-tech anxiety as our online lifestyles, position-broadcasting cellphones, and protective monitoring devices are inexorably compromised, exploited, and joined by ever more subtle devices.
The preceding descriptions are intended to be satirical, but all of these scenarios are possible today, though their probability varies. What seems most unlikely to me, though, is that people are and will be nervous about being swept away in the rising tide of pervasive information technology.
Pervasive computing, ubiquitous computing, ambient intelligence, and
everyware are all commonly used terms for generally similar phenomena: the increasing presence in our everyday lives of information and communication technologies (ICT) that are too small to notice, or integrated into appliances or clothing or automobiles, or are aspects of services we use willingly. Some technologists have professed a goal for such technologies to be invisible to the users, taken-for-granted or simply unnoticed by most people, continuous with our background environment, existing and operating below and behind the realm of real-time human intervention or awareness.
These technologies were the focus of a two-day workshop held last year: Ethical Guidance for Research and Application of Pervasive and Autonomous Information Technology (PAIT). The workshop was funded by the National Science Foundation (grant number SES-0848097), with additional support from the Poynter Center for the Study of Ethics and American Institutions (http://poynter.indiana.edu) at Indiana University Bloomington, and hosted by the Association for Practical and Professional Ethics (http://www.indiana.edu/~appe). Thirty-six scholars, including ethicists, engineers, social scientists, lawyers, and geographers, participated in the meeting, discussed ethical issues in pervasive IT, and began crafting approaches to ethical guidance for the development and use of such devices, including public policy guidance. The workshop schedule, a list of participants, and more can be found at http://poynter.indiana.edu/pait/. In this space, I cannot hope to do justice to the rich and wide-ranging conversations we had at the workshop, so I will focus on three significant topics we discussed there.
Machines on the go…
When presented properly, the benefits of pervasive IT are obvious. The popularity and benefits of the Internet and cellphones need not be defended or even described, but the amount of personal information in circulation over the former is tremendous and increasing, as are the unsanctioned uses of personal data. The position-broadcasting function of the latter is not nefarious in intent. Both can be used in knowledge creation, but also for stalking of various sorts.
At the workshop, Katie Shilton, a doctoral candidate in the Department of Information Studies and a researcher at the Center for Embedded Network Sensing (CENS) at the University of California at Los Angeles, described three intriguing CENS projects using mobile phones; I’ll describe two.
Participants in the Personal Environmental Impact Report (PEIR) program record and submit a continuous location trace using their mobile devices. A participant’s location is captured every few seconds, allowing the system to determine the location and infer the most likely mode of locomotion: foot, car, or bus. The participant’s travel profile is then correlated with Southern California air quality and weather data, allowing PEIR to estimate the participant’s carbon footprint, as well as her or his exposure to air pollution. The accuracy of the data gives an unprecedented look into the environmental harms people create and suffer.
Through the Biketastic project, bicyclists carrying GPS-enabled mobile phones transmit data on their routes through L.A. The information is not limited to position, but also includes data on the volume of noise and, using the device’s accelerometer, the roughness of the path. The data is transmitted to a Web site (http://biketastic.com) and displayed. Future improvements could display existing data about the route, such as air quality, traffic conditions, and traffic accidents. The Biketastic riders can also share their information with other riders to create a detailed profile of the rideability of numerous routes.
Shilton described not only the benefits and uses of these projects, but the potential down-side and abuses. Summarizing the latter, she asked, “How do we build and shape such pervasive sensing systems without slipping into coercion, surveillance, and control?”
...in the home…
Kalpana Shankar, an assistant professor in the School of Informatics and Computer Science and an Adjunct Assistant Professor in the School of Library and Information Science at Indiana University Bloomington, distributed a case study to participants before the workshop. The case study, “Sensing Presence and Privacy: The Presence Clock,” was developed as part of an NSF-funded research project, “Ethical Technology in the Homes of Seniors,” or ETHOS (http://ethos.indiana.edu/).
An increasing number of people want to live in their own homes as they age and begin to become less self-reliant due to slowly increasing frailty of various sorts. Their offspring want to ensure they are safe and that responses to mishaps are rapid and certain. The ETHOS project investigates how Internet-connected devices that alert responsible parties to troubling changes in routine (such as a person falling in the living room and not getting up) can give care providers peace of mind and elders more autonomy than they would enjoy in an assisted-living facility as well as life-saving interventions in an ethical manner acceptable to both elders and their offspring.
The Presence Clock case can be found at the PAIT blog (http://ethical-pait.blogspot.com/2009/08/case-study-presence-clock.html) along with commentary. Briefly described, the Presence Clock is an analog clock that comes in pairs. One clock is installed in the elders’ living space and the second in the living space of their caretakers. The two clocks are connected via the Internet. Both clocks sense movement and presence, and lights on the remote clock show roughly how much time someone spent at any given hour in the room with the local clock; the time spent is indicated by the brightness of a light next to the relevant hour marker (for example, a dull light at 1 and a bright light at 4 indicate someone spent little time near the clock at 1 and a good deal of time there at 4). A different-colored light blinks next to the hour marker when someone most recently entered the room.
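To make the Presence Clock's display rule concrete, here is a minimal sketch of how detected visits might be turned into light levels. It is not the ETHOS project's code. The function name `presence_lights`, the representation of visits as minute offsets within one 12-hour window starting at 12:00, and the linear brightness scale (minutes present divided by 60, capped at full brightness) are all assumptions made for illustration.

```python
def presence_lights(visits):
    """Turn room visits into per-hour-marker brightness for a 12-hour clock face.

    visits -- list of (start_min, end_min) pairs, in minutes counted from the
              start of a 12-hour window that begins at 12:00 (assumed format)
    Returns (brightness, latest_marker): brightness maps hour markers 1..12 to
    a level between 0.0 (off) and 1.0 (full); latest_marker is the hour marker
    of the most recent entry, where the different-colored light would blink.
    """
    minutes = {marker: 0.0 for marker in range(1, 13)}
    latest_start = None
    for start, end in visits:
        start, end = max(0, start), min(720, end)  # clip to the window
        if end <= start:
            continue
        if latest_start is None or start > latest_start:
            latest_start = start
        t = start
        while t < end:
            slot = int(t // 60)                    # 0..11; slot 0 is the 12 o'clock hour
            chunk_end = min(end, (slot + 1) * 60)  # split visits at hour boundaries
            minutes[slot or 12] += chunk_end - t
            t = chunk_end
    # Cap at 1.0 in case overlapping sensor readings double-count time.
    brightness = {m: min(1.0, spent / 60.0) for m, spent in minutes.items()}
    latest_marker = None if latest_start is None else (int(latest_start // 60) or 12)
    return brightness, latest_marker


# The article's example: a short stay around 1 o'clock, a long one around 4.
lights, latest = presence_lights([(65, 70), (240, 290)])
print(lights[1], lights[4], latest)
```

Note how little the remote viewer learns under this scheme: only a coarse per-hour level and the hour of the latest entry, not exact times or activities. That coarseness is the kind of design choice the case study invites readers to weigh.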
viewpoints
The Profession of IT
Managing Time
Professionals overwhelmed with information glut
can find hope from new insights about time management.
Time management is a persistently hot issue for many computing professionals. Almost every day we hear (or have) laments about information overload, about a relentlessly increasing rate of input from Internet and other sources, and about feelings of overwhelm, data drowning, inadequacy, and even victimization. The consequences from poor time management can be significant: loss of trust, loss of reputation, negative assessments about our competence and sincerity, and inability to get the jobs and projects we want. Books and seminars on time management continue to be popular. Software tools to help keep track of calendars and to-do lists sell well.
The same issues plague us as decision makers. We wanted larger networks and more sensors for better situational awareness, and now those networks overwhelm us. We still complain about the quality of our decisions.
In my own research on this subject I have turned up new insights that are very helpful, especially if viewed as a coherent framework. I discuss these insights here. There are opportunities here for all computing professionals to become more productive and for some to design new software tools.
From Time Management to Commitment Management
It is very important to frame the question properly. Although we often complain about not having enough time, lack of time is the symptom, not the problem. The problem is commitment management. Time is one of the resources needed to manage commitments. Other resources, such as money, space, and personnel, may be needed as well. From now on, let us talk about commitment management.
In managing commitments we need to know only four things. I’ll call them practices because you can learn them as skills and get tools to help you do them better (see the figure here).
1. How to track commitments to their completions;
2. How to choose what commitments to make or decline;
3. How to organize the conversations that lead to completions of commitments; and
4. How to manage mood and capacity.
These four practices go together. If we pay attention to only one, we will see some headway but not a lasting solution to our problem.
Tracking Commitments to Completion
Much of the literature on time management focuses on the first practice. That practice directly addresses one of the biggest breakdowns with commitment management: missed or forgotten commitments. When the world gets demanding, we can find ourselves in a state of constant worry about whether we forgot commitments or their due dates and whether we have the capacity to get everything done.
David Allen has written a hugely popular book about how to organize our records so that nothing is lost and we can eliminate from our minds all concerns about whether every commitment is being taken care of.1 He has defined an operating system for managing commitments. His system can be implemented with a few simple rules and folders. The folders and structure of flows among them are remarkably similar to the job-scheduling part of a computer operating system. After you set up your system and practice its rules for a short time, you soon become skilled at commitment tracking. That so many people have found his book really helpful illustrates that the record-keeping part of commitment tracking is a huge struggle for many.
Allen’s story begins with “stuff” arriving before you. Stuff is anything that demands your attention and possible future action. Think of stuff as incoming requests. A request can be anything from the really simple (such as “read me” or “take note”) to the complex (such as “write an analytic report” or
Figure 1. Four practices of commitment management: 1-tracking, 2-selecting, 3-executing, 4-capacity planning. (Diagram labels: time, money, material, personnel; now; execute; importance; queued; mission.)
“implement a software tool”). Allen says to sort the incoming items into trash (ignore and delete), possibly useful (save in tickler file), reference (save in reference file), and actionable. You do actionable items immediately if they require two minutes or less (for example, a quick answer to an email message); otherwise you enqueue them in your to-do list and calendar, or you delegate them. You review your queues periodically to see if your delegations have completed and the orderings of lists reflect your current priorities. Once an item is in this system, you do not have to think about it and your mind is clear to focus on the tasks needing completion.
This story is incomplete in three ways. (1) It does not address the possibility of controlling the flow of stuff. (2) It does not make explicit that much of the stuff originates with you and your teams as you design actions to fulfill your own commitments. And (3), it does not deal with limitations on your capacity and the mood of overwhelm when you are beyond capacity. These three aspects take us to the next three practices.

Trump the Urgent With the Important
Stephen Covey has discussed at length the notion of controlling what commitments you enter or decline.2 The central question is: what exactly do you commit to? Covey maintains that the answers come from having a clear sense of mission. Just as organizations have mission statements, individuals should have personal mission statements. We can ignore requests that do not serve our mission, and we can (politely) ask the people making them to leave us alone. Covey counsels each of us to write down a mission statement, including our ongoing personal and professional commitments. Then we arrange our calendars to make sure that we allocate time sufficient for each major commitment.
Covey argues that good mission statements help people distinguish important requests from urgent requests. Many people find themselves overwhelmed with urgent but unimportant requests that consume all their time. This is a double whammy: they are frustrated at being unable to find time for the important things and exasperated over the sheer number of urgent, time-wasting requests. The irony is that many urgent requests are the result of previously neglected important tasks. For example, if you make sure you give excellent service to your customers, you will not spend a lot of time answering complaints.
Covey tells an engaging story about a time-management seminar leader who did a demonstration involving placing rocks, then gravel, sand, and water into a large glass jar. After his students struggled with getting all these items successfully into the jar, he asked, “What is the point about time management?” He got many answers, including that there is always more room to fit more things in your schedule if they are small or liquid enough, and that you may therefore have more capacity to get things done than you think. He said, “No. The point is that if you don’t put the big rocks in at the beginning, you can’t get them in at all.”
The moral for commitment management is: let your mission statement inform you about what tasks are most important, then set aside sufficient time in your schedule to do them.
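Allen's sorting procedure for incoming items, as summarized in this column, can be pictured as a small triage routine. The following is only a sketch under stated assumptions: the names `Kind`, `Item`, `Tracker`, `triage`, and the `delegate_to` field are hypothetical and chosen for illustration; they are not taken from Allen's book or from any of the tools named in this column.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    TRASH = "trash"
    POSSIBLY_USEFUL = "possibly useful"
    REFERENCE = "reference"
    ACTIONABLE = "actionable"


@dataclass
class Item:
    description: str
    kind: Kind
    minutes: int = 0                   # estimated effort, used for actionable items
    delegate_to: Optional[str] = None  # hypothetical field: who could do it instead


@dataclass
class Tracker:
    tickler: List[Item] = field(default_factory=list)
    reference: List[Item] = field(default_factory=list)
    todo: List[Item] = field(default_factory=list)
    delegated: List[Item] = field(default_factory=list)
    done: List[Item] = field(default_factory=list)

    def triage(self, item: Item) -> str:
        """File one incoming item and return the name of the place it went."""
        if item.kind is Kind.TRASH:
            return "trash"                 # ignore and delete
        if item.kind is Kind.POSSIBLY_USEFUL:
            self.tickler.append(item)      # save in tickler file
            return "tickler"
        if item.kind is Kind.REFERENCE:
            self.reference.append(item)    # save in reference file
            return "reference"
        # Actionable: apply the two-minute rule, then delegate or enqueue.
        if item.minutes <= 2:
            self.done.append(item)         # do it immediately
            return "done"
        if item.delegate_to is not None:
            self.delegated.append(item)
            return "delegated"
        self.todo.append(item)
        return "todo"


tracker = Tracker()
for incoming in [
    Item("newsletter", Kind.TRASH),
    Item("reply to a short email", Kind.ACTIONABLE, minutes=1),
    Item("write an analytic report", Kind.ACTIONABLE, minutes=240),
    Item("book a meeting room", Kind.ACTIONABLE, minutes=15, delegate_to="assistant"),
]:
    print(incoming.description, "->", tracker.triage(incoming))
```

The periodic review described in the text would then amount to walking the `todo` and `delegated` lists; this sketch deliberately leaves out the three gaps the column goes on to name (controlling the inflow, self-generated items, and capacity).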
Mastering Conversations for Context, Possibility, Action
The third practice begins with the realization that all commitments are made in conversations.3 The practice is to become an observer and facilitator of those conversations. There are three basic kinds of conversations.
• Context. Define the purpose, meaning, and value of actions.
• Possibility. Invent possibilities for future action (in the context).
• Action. Elicit the commitments that will realize specific possibilities and see them through to completion.
It would be a misunderstanding of Allen’s model (practice 1) to interpret his “actionable” items only in the third sense. Professionals who do not create context will find it difficult to get anyone to work with them. Although the action itself is performed in the third conversation, the other two are needed before people are willing to engage in a conversation for action. Sometimes you need to schedule time for context and possibility conversations, but more often you can insert them as needed as prefaces to your requests and offers (which open conversations for action).
A conversation for action takes place between a customer and performer; the customer makes a request (or accepts an offer) that the performer commits to fulfilling. The transaction between them can be visualized as a closed loop with four segments: request, negotiate, perform, accept.5 Performers often make requests of others to get components for their own deliveries; thus a single request can evoke coordination in a larger network of people (for details on conversations for action and their skilled management, see 3–5).
To manage commitments means to manage the conversations leading to the fulfillment of those commitments. Have you or someone made the appropriate requests or offers? Who is responsible for performing each action? Who is responsible for accepting and declaring satisfaction with the result? Do you trust promises made to you by others along the way?

Managing Capacity and Mood
The final aspect of the picture is your ability to manage your capacity and mood. You have the capacity for a commitment if you have the time and other resources needed to fulfill the commitment. If you do not have the resources, you will need to initiate conversations to get them, and you must manage those conversations as well. Generally, if you have accepted too many commitments relative to your capacity, you will feel overwhelmed, victimized, and sometimes panicked: poor moods for productivity. When you do not have the capacity, you can find yourself in a death spiral of an increasing backlog of broken promises, negative assessments about your performance, lack of willingness to trust you, and a personal sense of powerlessness. Over time, these bad moods increase stress and anxiety in your body and lead to chronic diseases. Not a pretty picture.
With a simple exercise, you can assess whether you have the capacity for your commitments and take corrective steps when they are beyond your capacity.3,4 On a three-column spreadsheet, make one row for each commitment. Put a description of the commitment in the first column, the number of weekly hours you need to do it well in the second column, and the number of weekly hours you actually spend in the third. Make sure to include all your “big rock” commitments including time for fam-

ally do not spend more than 60–80 hours per week on professional commitments.
You need to reduce your load if you are over capacity. First, look at your mission statement and recall what is most important to you. Make sure that the time you allocate for your “big rock” commitments is sufficient to do them right. All other commitments need to be modified or eliminated. Modified means you negotiate new terms with the person(s) who expects the results. Eliminated means you cancel the commitment. In both cases you need to work with the customers of your commitments to reset their expectations and take care of any consequences resulting from your scale-back or cancellation.

Conclusion
Commitment management presents a big software challenge. There are software tools that help with some of the four practices separately. For example, OmniFocus (omnigroup.com), Things (culturedcode.com), and Taskwarrior (taskwarrior.org) conform to Allen’s workflows in practice 1. Orchestrator (orchmail.com) tracks conversations for action through their stages in practice 3; ActionWorks (actiontech.com) goes further, mapping and managing entire business processes. Can someone design a coherent system that supports all four together?
If you learn the four commitment-management practices, you will be able to execute all your commitments productively and in a mood of fulfillment and satisfaction. All your customers will be satisfied and you will enjoy a strong, trustworthy reputation.

References
1. Allen, D. Getting Things Done: The Art of Stress-Free Productivity. Penguin, 2001.
2. Covey, S.R., Merrill, R., and Merrill, R. First Things First. Simon & Schuster, 1994.
3. Denning, P. and Dunham, R. The Innovator’s Way: Essential Practices for Successful Innovation. MIT
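The three-column capacity exercise described under "Managing Capacity and Mood" in this column reduces to simple arithmetic, which the sketch below works through. It is an illustration only: the names `Commitment` and `assess`, the sample rows, and the weekly capacity figure of 70 hours are assumptions made for the example, not values given by the author.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Commitment:
    description: str      # column 1: what the commitment is
    needed_hours: float   # column 2: weekly hours needed to do it well
    actual_hours: float   # column 3: weekly hours actually spent


def assess(commitments: List[Commitment], capacity_hours: float) -> Dict[str, object]:
    """Total the spreadsheet columns and flag where load exceeds capacity."""
    total_needed = sum(c.needed_hours for c in commitments)
    total_actual = sum(c.actual_hours for c in commitments)
    # Commitments that currently receive less time than they need.
    underserved = [c.description for c in commitments
                   if c.actual_hours < c.needed_hours]
    return {
        "total_needed": total_needed,
        "total_actual": total_actual,
        "over_capacity_by": max(0, total_needed - capacity_hours),
        "underserved": underserved,
    }


# Hypothetical rows; the capacity of 70 hours per week is an assumed figure.
rows = [
    Commitment("teaching", 20, 18),
    Commitment("research project", 30, 20),
    Commitment("committee work", 10, 12),
    Commitment("family", 25, 15),
]
report = assess(rows, capacity_hours=70)
print(report)
```

A positive `over_capacity_by` signals that some commitments must be modified or eliminated, in the column's terms; the `underserved` list shows where the shortfall currently lands, which is a starting point for the renegotiation conversations the text describes.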
Broadening Participation
A Program Greater
than the Sum of Its Parts:
The BPC Alliances
Changing the trajectory of participation in computing
for students at various stages of development.
There is virtually no discipline or aspect of our daily lives that is not positively impacted by advances in computer science. It has become the backbone of our technologically dependent society. In fact, computer software engineers are among the occupations projected to grow the fastest and add the most new jobs over the 2008–2018 decade.6 Yet, bachelor’s, master’s, and Ph.D. degrees earned by U.S. citizens and permanent residents continue to decline.3,7 Further, degrees earned by women, persons with disabilities, and underrepresented minorities (American Indians/Alaskan Natives, African Americans, Native Hawaiians/Pacific Islanders or Hispanics) lag those of non-resident Aliens, Asians, and White males.

Students at the CAHSI 2009 annual meeting held at Google headquarters.
Photo courtesy of AAAS Center for Advancing Science & Engineering Capacity

Program Focus
Rather than focus on the problems that beset computing, we will emphasize solutions in the form of the National Science Foundation’s (NSF) Broadening Participation in Computing (BPC) program.a The BPC-A program supports three categories of awards: Alliances; Demonstration projects (DPs); and Leveraging, Scaling, or Adapting Projects, or Demonstration Projects (LSA). Typical DPs pilot innovative programs that, once fully developed, could be incorporated into the activities of an Alliance or otherwise scaled for wider impact. LSA projects can leverage, scale, and adapt the work of Alliances or DPs, as well as efforts by other organizations to extend the impact of effective practices. Alliance and Alliances Extension Projects (Alliances) represent broad coalitions of academic institutions of higher learning, secondary and middle schools, government, industry, professional societies, and other not-for-profit organizations designing and carrying out comprehensive programs to reduce underrepresentation in the computing disciplines. Projects may target stages of the academic pipeline from middle school through the early faculty ranks, and are expected to have significant impact on both the quality of opportunities afforded to participants and the number of participants potentially served.5
NSF funding for the Alliances began in 2005/2006 with most programs operating with students approximately one year later. Ten alliances constitute the core of BPC as of 2009. An eleventh alliance, the National Center for Women & IT (NCWIT), predated the BPC program, but has served as a focal point and resource for all the Alliances, par-

a For additional information on the BPC program, visit: http://www.bpcportal.org/bpc/shared/home.jhtml.
Viewpoint
Computer and Information
Science and Engineering:
One Discipline, Many Specialties
Mathematics is no longer the only foundation for computing
and information research and education in academia.
During the last 60 years we have seen the beginning of a major technological revolution, the Information Revolution. IT has spawned large new economic sectors and has, over a long period, doubled the rate of increase in labor productivity in the U.S.1,16 Over two-thirds of job openings in science and engineering in the coming decade are in IT.12 Intellectual property, rather than physical assets, has become the main means of production: control over intangibles (such as patents and copyrights) is at the forefront of the national and international business agenda;6,23 investment by industry in intangible assets has overtaken investment in tangible means of production.7,19
The information revolution is far from having run its course: “machine-thought” has not yet replaced “brain-thought,” to the extent that “machine-made” has replaced “hand-made.” One can be confident that the use of digital technologies will continue to spread; that more and more workers will move from the physical economy to the information economy; and that people will spend more and more of their work and leisure time creating, manipulating, and communicating information.
The fast evolution of IT motivates a periodic reexamination and reorganization of computing and information (C&I) research and education in academia. We seem to be in one such period. Many universities have established or expanded schools and programs that integrate a broad range of subdisciplines in C&I; and NSF is affecting the scope of research and education in C&I through the creation of programs such as the Cyber-Enabled Discovery and Innovation (CDI) and Pathways to Revitalized Undergraduate Computing Education (CPATH) programs.21,22
I strongly believe that C&I is one broad discipline, with strong interactions between its various components. A coherent view of the whole must precede any discussion of the best ways of dividing it into subdisciplines. The dominant discourse in our community should be about building a coherent view of the broad discipline, building bridges between its constituents, and building bridges to other disciplines as we engage in interdisciplinary research. I hope this column will contribute to these goals.

C&I Is a Use-Driven Research Discipline
I am discussing in this column the broad field of Computing and Information Science and Engineering (CISE): the study of the design and use of digital systems that support storing, processing, accessing and communicating information. To prevent possibly misleading connotations, I shall call this broad field Computing and Information (C&I).
We still seem to be debating whether computer science is science, engineering, or something unlike any other academic discipline (see, for example,9,11). The debate is often rooted in a linear view of science and engineering: Scientists seek knowledge, for knowledge’s sake; through a mysterious process, this knowledge turns out to have practical consequences and is picked up by applied scientists, next engineers, and then used to develop better technologies. This view encourages an implicit value system whereby science is seen as a higher calling than engineering.
Donald Stokes, in his book Pasteur’s Quadrant,25 leads a powerful attack against this simplistic view of science. He points out that, over the centuries, fundamental research has often been motivated by considerations of use (by the desire to implement certain processes and achieve certain goals), and not only by the desire to acquire knowledge for knowledge’s sake. His paradigmatic example is Pasteur, who founded modern microbiology, driven by the practical goal of preserving food.
According to Stokes, research should be described as a two-dimensional space, as shown in Figure 1. Stokes further argues that “Pasteur’s Quadrant,” namely use-inspired basic research, is increasingly prevalent in modern research institutes. The argument of Stokes strongly resonates with schools of engineering, or computer science. Most of their faculty members pursue scientific research that has a utilitarian justification; their research is in “Pasteur’s Quadrant.”

Figure 1. Pasteur’s Quadrant (adapted from Stokes25).
Quest for fundamental      Consideration of use?
understanding?             No                            Yes
No                         (none)                        Pure applied research (Edison)
Yes                        Pure basic research (Bohr)    Use-inspired basic research (Pasteur)

Any engineering department in a modern research university is a science and engineering department. This is often indicated by the department’s name: Material Science and Engineering, Nuclear Science and Engineering, or even Engineering Science (at Oxford University). Figure 2 describes the research activities in such a department. Faculty members perform basic use-inspired or applied research related to the applications of their discipline.
The diagram in Figure 2 describes not only engineering departments, but also other use-oriented disciplines such as medicine or agriculture. Furthermore, concern about impact and use, and research in “Pasteur’s Quadrant,” are increasingly prevalent in science departments, be it life sciences, social sciences, or physical sciences. Only a few purists would claim that departments are weakened by such concerns.
The diagram in Figure 2 clearly applies to C&I. Our discipline is use-inspired: We want to build better computing, communication, and information systems. This occasionally motivates use-inspired basic research (for example, complexity, cryptography), and often involves applied research (such as architecture, databases, graphics). The design and experimentation with prototypes is essential in system research. C&I scientists use scientific methods in their research;8,10 and there is a continued back and forth between basic and applied research and between academic research and the development of digital products and services by industry.

C&I Needs Broader Foundations
C&I was lucky to develop early on mathematical abstractions that represented important constraints on computing devices, such as time and space complexity; this enabled C&I to develop useful artifacts while being fully contained within the confines of mathematics: The early development of algorithms, programming languages, compilers or operating systems required no knowledge beyond C&I and its mathematical foundations.
Mathematics continues to be the most important foundation for C&I: The artifacts produced by C&I researchers and practitioners are algorithms, programs, protocols, and schemes for organizing information; these are mathematical or logical objects, not physical objects. Algorithms, programs or protocols are useful once realized, executed or embodied in a physical digital device; but they are mostly studied as mathematical objects and the properties studied do not depend on their physical embodiment. Indeed, one might call much of C&I “mathematical engineering”b: it is focused on the creation of new mathematical objects under constraints, such as low time and space complexity for discrete algorithms, good numerical convergence for numerical algorithms, or good precision and recall for classifiers; the difference between mathematics and “mathematical engineering” is precisely the emphasis on such constraints. As technology progresses, new constraints need to be considered. For example, time complexity is increasingly irrelevant when communication (to memory, disk, and network) replaces computation as the main performance bottleneck, and when energy consumption becomes the critical constraint. New technologies that will take us “Beyond Moore’s Law” (quantum computing, molecular computing) will require new mathematical abstractions.
Part of C&I, namely computer engineering, has always been concerned with the interplay between the mathematical abstractions and their physical embodiment. In addition to mathematics, physics is foundational for this specialty, and will continue to be so. Physics is also important for cyber-physical systems that directly interact with their physical environment.
I believe, however, that physical constraints are a small fraction of the constraints relevant to the design of C&I systems. For example, software engineering research has strived for decades, with limited success, to define code metrics that represent how complex a code is (hence, what effort is required to program or debug it). Such a code metric would measure how difficult it is for a programmer to comprehend a code. But this is a cognitive issue: It is highly unlikely that one can develop successful theories on this subject without using empirically validated cognitive models that are based on our best understanding of human cognition. Unfortunately, traditional software engineering research has not been rooted in cognitive sciences.
Cognitive, cultural, social, organizational, and legal issues are increasingly important to engineering, in general.5 This is a fortiori true for C&I. In the early days of computing, only a few people interacted directly with computers: the psychology of programmers or users could be ignored without too much inconvenience, since these few people would adapt to the computer. Today, the situation is vastly changed: Billions of people interact daily with digital devices and C&I systems become intimately involved in many cognitive and social processes; it is not possible anymore to ignore the human in the loop. Indeed, interesting research increasingly occurs at the intersection of the social and the technical: One may well argue that the essential insight that enabled efficient Web search and led to the creation of companies such as Google is that the structure of the Web carries information about the usefulness of Web pages, which is a socio-technical insight. Progress in graphics and animation increasingly requires an understanding of human vision: otherwise, one makes progress in quality metrics that have low correlation to the subjective quality of an image; examples can be easily multiplied.
Another important aspect of the evolution of our field is the increasing importance of applications. Precisely because software is so malleable and universal, one can develop very specialized systems to handle the needs of various disciplines: computer-aided design, medical imaging, DNA matching, and Web auctions are but a few examples of application areas that have motivated significant specialized C&I research. Such research cannot be successful without a good understanding of the application area.
This suggests a new view for the organization of C&I that is described in Figure 3: Mathematics is no longer the only foundation. For those working close to hardware or working on cyber-physical systems, a good foundation in physics continues to be important. An increasing number of C&I research areas (such as human-computer interaction, social computing, graphics and visualization, and information retrieval) require insights from the social sciences (cognitive psychology, sociology, anthropology, economics, law, and so forth); human subject experiments become increasingly important for such research. At a more fundamental level, the development of artificial cognitive systems provides a better understanding of natural cognitive systems (of the brain and its function); and paradigms borrowed from C&I become foundational in biology. Insights from neuroscience provide a better way of building artificial intelligent systems and biology may become the source of future computing devices. Finally, research in C&I is strongly affected by the multiple application areas where information technology is used (such as science, humanities, art, and business), and profoundly affects these areas.

Figure 3. C&I: An inclusive view.
[Diagram labels: Foundations (mathematics, statistics, social sciences, physical sciences…); Application areas (science, engineering, arts, humanities, business, medicine); Use-inspired basic research; Applied research; Prototypes; Products (systems, applications, data repositories).]

b Mathematical engineering was apparently used as a synonym for “computer science” in Holland, in the early days of the discipline. It is now used by some schools as a synonym for “scientific computing.”
the teaching style of the instructor.20 Rather than focusing on the use of IT to improve the lecture experience, we should probably focus on the use of IT and social networking tools to make individual and group self-study more productive by multiplying the interaction channels between students and between students and faculty.15
As the half-life of knowledge grows shorter, it becomes less important to impart specific knowledge to students (and to test them on this knowledge) and more important to teach them how to learn, how to identify and leverage sources of knowledge and expertise, and how to collaborate with experts in other areas, creating collective knowledge. Yet our education is still strongly focused on acquiring domain-specific individual knowledge; and students mostly collaborate with other students that have similar expertise. Projects and practicums that involve teams of students from different programs, with different backgrounds, could refocus education so as to train more foxes and fewer hedgehogse (a change I believe will benefit many of our students). Such collaborative learning-by-doing empowers students, increases motivation, improves retention and teaches skills that are essential for success in the information society. A skillful use of IT, both for supporting course activities and for assessing teaching and learning, can facilitate this education style.f
IT changes the way research is pursued: For example, it enables citizen science projects where many volunteers collect data. Such projects have become prevalent in environmental sciences24 and are likely to have a large impact on health sciences. They not only provide researchers with data that cannot be obtained otherwise but also change in fundamental ways the relation of the scientist to the object of study. The volunteers are unlikely to be motivated by pure scientific curiosity; they want the research they participate in to have an impact: to save the environment or cure cancer. The researcher that uses their data has an implicit or explicit obligation to use the data collected for that common purpose and not use it for other purposes. Research becomes engaged and obligated to a large community.17
IT enables the fast dissemination of scientific observations and results. Research progresses faster if observational data and preliminary results are shared as quickly and as broadly as possible. One obstacle to such unimpeded sharing is that academic careers are fostered by the publication of polished analyses, not by the publication of raw data or partial results: Research groups tend to hold on to their data until they can analyze it and obtain conclusive results. Better ways of tracking the provenance of data used by researchers and the web of mutual influences among researchers would make it possible to track the impact of contributions other than polished publications and to develop a merit system that encourages more information sharing. We can and should develop an environment where no scientist has an incentive to withhold information.
C&I has been, for years, an amazingly vibrant, continuously renewing intellectual pursuit that has had a profound impact on our society. It has succeeded in being so by continuously pursuing new uses of IT and continuously adjusting disciplinary focus in research and education so as to address the new problems. This fast evolution must continue for our discipline to stay vital. IT will continue to be a powerful agent of change in our society and, to drive this change, we must continuously change and strive to change our academic environment.

e “The fox knows many things, but the hedgehog knows one big thing.”3 Sir Isaiah Berlin distinguishes between hedgehogs, thinkers “who relate everything to a single central vision,” and foxes, thinkers who “pursue many ends, often unrelated and even contradictory, connected only in some de facto way.” Although the essay of Isaiah Berlin focuses on Russian writers, I see “foxiness” as being very much the tradition of American Pragmatism. Both types are needed in our society, but “hedgehogs” who prize the hedgehog way of thinking seem to dominate in academia, especially in science and engineering.
f The recently started International Journal on Computer-Supported Collaborative Learning provides several useful references.

References
1. Atkinson, R.D. and McKay, A.S. Digital Prosperity: Understanding the Economic Benefits of the Information Technology Revolution. Information Technology and Innovation Foundation, 2007.
2. Baumol, W.J. and Bowen, W.G. Performing Arts: The Economic Dilemma, 1966.
3. Berlin, I. The Hedgehog and the Fox. Simon & Schuster, New York, 1953.
4. Choy, S.P., Bradburn, E.M., and Carroll, C.D. Ten Years After College: Comparing the Employment Experiences of 1992–93 Bachelor’s Degree Recipients With Academic and Career-Oriented Majors, National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, 2008.
5. Clough, G.W. The Engineer of 2020: Visions of Engineering in the New Century. National Academy of Engineering Press, Washington, D.C. (2004).
6. Cohen, W.M. and Merrill, S.A. Patents in the Knowledge-based Economy. National Academies Press, 2003.
7. Corrado, C.A., Sichel, D.E., and Hulten, C.R. Intangible Capital and Economic Growth. Board of Governors of the Federal Reserve System City, 2006.
8. Denning, P.J. Computer science: The discipline. Encyclopedia of Computer Science (2000).
9. Denning, P.J. Is computer science science? Commun. ACM 48, 4 (Apr. 2005), 27–31.
10. Denning, P.J. et al. Computing as a discipline. Commun. ACM 32, 1 (Jan. 1989).
11. Denning, P.J. and Freeman, P.A. Computing’s paradigm. Commun. ACM 52, 12 (Dec. 2009), 28–30.
12. Dohm, A. and Shniper, L. Occupational employment projections to 2016. Monthly Labor Review Online 130, 11 (Nov. 2007).
13. Ehrenberg, R. et al. Financial forces and the future of American higher education. Academe 90, 4 (Apr. 2004), 28–31.
14. Furst, M. and DeMillo, R.A. Creating symphonic-thinking computer science graduates for an increasingly competitive global environment. White Paper, College of Computing, Georgia Institute of Technology (2004).
15. Haythornthwaite, C. et al. New theories and models of and for online learning. First Monday 12, 8 (August 2007).
16. Jorgenson, D.W., Ho, M.S. and Stiroh, K.J. Information Technology and the American Growth Resurgence. MIT Press, Cambridge, Mass., 2005.
17. Krasny, M.E. and Bonney, R. Environmental education through citizen science and participatory action research. Environmental Education and Advocacy: Changing Perspectives of Ecology and Education (2005), 292–319.
18. Light, R.J. Making the Most of the College: Students Speak their Minds. Harvard University Press, 2004.
19. Marrano, M.G., Haskel, J.E. and Wallis, G. What Happened to the Knowledge Economy? ICT, Intangible Investment and Britain’s Productivity Record Revisited. Department of Economics, Queen Mary, University of London, 2007.
20. Morello, D. The IT Professional Outlook: Where Will We Go From Here? Gartner, 2005.
21. National Science Foundation, Directorate Computing and Information Science and Engineering. CISE Pathways to Revitalized Undergraduate Computing Education (CPATH). NSF, Arlington, VA.
22. National Science Foundation, Directorate Computing and Information Science and Engineering. Cyber-Enabled Discovery and Innovation (CDI). NSF, Arlington, VA.
23. Sell, S.K. Private Power, Public Law: The Globalization of Intellectual Property Rights. Cambridge University Press, 2003.
24. Silvertown, J. A new dawn for citizen science. Trends in Ecology and Evolution 24, 9 (Sept. 2009).
25. Stokes, D.E. Pasteur’s Quadrant. Brookings Institution Press, 1997.
26. Thelwall, M. Can Google’s PageRank be used to find the most important academic Web pages? Journal of Documentation 59, 2 (Feb. 2003), 205–217.
27. Triplett, J.E. and Bosworth, B.P. ‘Baumol’s disease’ has been cured: IT and multifactor productivity in U.S. services industries. Edward Elgar Publishing, City, 2006.
28. Wulf, W.A. The Urgency of Engineering Education Reform. The Bridge 28, 1 (Jan. 1998), 48.

Marc Snir (snir@illinois.edu) is Michael Faiman and Saburo Muroga Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign.

I thank Martha Pollack for her careful reading of an early version of this Viewpoint and for her many suggestions. Some of the ideas presented here were inspired by a talk by John King. This Viewpoint greatly benefited from the detailed feedback of one of the referees.

Copyright held by author.
practice
doi:10.1145/1897852.1897868
Article development led by queue.acm.org

Models of determinism are changing IT management.

by Mark Burgess

Testable System Administration

The methods of system administration have changed little in the past 20 years. While core IT technologies have improved in a multitude of ways, for many if not most organizations system administration is still based on production-line build logistics (aka provisioning) and reactive incident handling—an industrial-age method using brute-force mechanization to amplify a manual process. As we progress into an information age, humans will need to work less like the machines they use and embrace knowledge-based approaches. That means exploiting simple (hands-free) automation that leaves us unencumbered to discover patterns and make decisions. This goal is reachable if IT itself opens up to a core challenge of automation that is long overdue—namely, how to abandon the myth of determinism and expect the unexpected.

We don’t have to scratch the surface very hard to find cracks in the belief system of deterministic management. Experienced system practitioners know deep down that they cannot think of system administration as a simple process of reversible transactions to be administered by hand; yet it is easy to see how the belief stems from classical teachings. At least half of computer science stems from the culture of discrete modeling, which deals with absolutes as in database theory, where idealized conditions can still be simulated to an excellent approximation. By contrast, the stochastic models that originate from physics and engineering, such as queueing and error correction, are often considered too difficult for most basic CS courses. The result is that system designers and maintainers are ill prepared for the reality of the Unexpected Event. To put it quaintly, “systems” are raised in laboratory captivity under ideal conditions, and released into a wild of diverse and challenging circumstances. Today, system administration still assumes, for the most part, that the world is simple and deterministic, but that could not be further from the truth.

In the mid-1990s, several research practitioners, myself included, argued for a different model of system administration, embracing automation for consistency of implementation and using policy to describe an ideal state. The central pillar of this approach was stability.2,4 We proposed that by placing stability center stage, one would achieve better reliability (or at the very least predictability). A tool such as IT is, after all, useful only if it leads to consistently predictable outcomes. This is an evolutionary approach to management: only that which survives can be successful.

As a physicist by training, I was surprised by the lack of a viable model for explaining actual computer behavior. It seemed that, instead of treating behavior as an empirical phenomenon full of inherent uncertainties, there was an implicit expectation that com-
because they know no simple way of discovering what changed without an arduous manual investigation. The process is crude, like tearing down a building to change a lightbulb. But the reason is understandable. Current tools are geared for building, not

of paint where needed. Such an approach required the conceptual leap to a computable notion of maintenance. Maintenance can be defined by referring to a policy or model for an ideal system state. If such a model could somehow be described in terms

is that, when you can repair system state (both static configuration and runtime state), then the initial condition of the system becomes unimportant, and you may focus entirely on the desired outcome. This is the way businesses want to think about IT—in
terms of goals rather than “building projects”—thus also bringing us closer to a modern IT industry.

Convergence to a Desired State
Setting up a reference model for repair sounds like a simple matter, but it requires a language with the right properties. Common languages used in software engineering are not well suited for the task, as they describe sequential steps from a fixed beginning rather than end goals. Generally, we don’t know the starting state of a machine when it fails. Moreover, a lot of redundant computation is required to track a model, and that would intrude on clarity. The way around this has been to construct declarative DSLs (domain-specific languages) that hide the details and offer predictable semantics. Although Cfengine was the first attempt to handle indeterminism, special languages had been proposed even earlier.9

Many software engineers are not convinced by the declarative DSL argument: they want to use the familiar tools and methods of traditional programming. For a mathematician or even a carpet fitter, however, this makes perfect sense. If you are trying to fit a solution to a known edge state, it is cumbersome to start at the opposite end with a list of directions that assume the world is fixed. When you program a GPS, for example, you enter the desired destination, not the start of the journey, because you often need to recompute the path when the unexpected occurs, such as a closed road. This GPS approach was taken by Cfengine5 in the mid-1990s. It says: work relative to the desired end-state of your model, not an initial baseline configuration, because the smallest unexpected change breaks a recipe based on an initial state. This has been likened to Prolog.7

In simple terms, the approach works by making every change satisfy a simple algorithm:

    Change (arbitrary_state) → desired_state (1)
    Change (desired_state) → desired_state (2)

This construction is an expression of “dumb” stability, because if you perturb the desired state into some arbitrary state, it just gets pushed back into the desired state again, like an automated course correction. It represents a system that will recover from accidental or incidental error, just by repeating a dumb mantra—without the need for intelligent reasoning.

For example: suppose you want to reconfigure a Web server to support PHP and close a security hole. The server and all of its files are typically part of a software package and are configured by a complex file with many settings:

    # >10kB of complex stuff
    MODULES = SECURITY_HOLE JAVA OTHERS
    # >10kB of complex stuff

To fix both problems, it is sufficient to alter only this list (for example, a desired outcome):

    # >10kB of complex stuff
    MODULES = JAVA OTHERS PHP
    # >10kB of complex stuff

Traditionally, one replaces the whole file with a hand-managed template or even reinstalls a new package, forcing the end user to handle everything from the ground up. Using a desired state approach, we can simply say: in the context of file webserver.config, make sure that any line matching “MODULES = something” is such that “something” contains “PHP” and does not contain “SECURITY_HOLE.” Figure 1 illustrates how this might look in Cfengine.

Figure 1. Reconfiguring a Web server in Cfengine.

    bundle agent webserver_config
    {
    vars:
      "add_modules" slist => { "PHP", "php5" };
      "del_modules" slist => { "SECURITY_HOLE", "otherstuff" };

    column_edits:
      "APACHE_MODULES=.*"
        edit_column => edit_listvar("$(add_modules)","append");
      "APACHE_MODULES=.*"
        edit_column => edit_listvar("$(del_modules)","delete");
    }

[Note: The syntax (which incorporates implicit guards and iteration) has the form:

    type_of_promise:
      "Atom"
        property_type => desired_end_state;
]

Thus, the code defines two internal list variables for convenience and passes these to the specially defined method edit_listvar, which is constructed from convergent primitives. For each item in the list, Cfengine will assure the presence or absence of the listed atoms without touching anything else. With this approach, you don’t need to reconstruct the whole Web server or know anything about how it is otherwise configured (for example, what is in “complex stuff”) or even who is managing it: a desired end-state relative to an unknown start-state has been specified. It is a highly compressed form of information.

I referred to this approach as convergent maintenance (also likening the behavior to a human immune system2), as all changes converge on a destination or healthy state for the system in the frame of reference of the policy. Later, several authors adopted the mathematical term idempotence (meaning invariance under repetition), focusing on the fact that you can apply these rules any number of times and the system will only get better.

Guarded Policy
In the most simplistic terms, this approach amounts to something like Dijkstra’s scheme of guarded commands.8 Indeed, Cfengine’s language implementation has as much in common with Guarded Command Language as it does with Prolog.7 The assertion of X as a statement may be interpreted as:

    If not model(X), set model(X)

For example:

tion suggests a panacea, ushering in a new and perfect world. Alas, the approach can be applied only partially to actual systems because no actual systems are built using these pure constructions. Usually, multiple change mechanisms tether such atoms togeth-
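The convergent edit described for the MODULES line can be sketched in Python. This is a minimal illustration under stated assumptions, not how Cfengine itself is implemented: the function name `converge_modules`, its parameters, and the sample file contents are hypothetical. Each atom is handled with a guard in the spirit of "If not model(X), set model(X)", so a second application leaves the text unchanged.

```python
import re

def converge_modules(text, present, absent, key="MODULES"):
    """Hypothetical helper: make every `key = ...` line contain all atoms in
    `present` and none in `absent`, leaving every other line untouched."""
    pattern = re.compile(rf"^(\s*{re.escape(key)}\s*=\s*)(.*)$")
    out = []
    for line in text.splitlines():
        m = pattern.match(line)
        if m:
            # Drop unwanted atoms; keep the rest in their original order.
            atoms = [a for a in m.group(2).split() if a not in absent]
            for atom in present:
                if atom not in atoms:  # guard: act only if the model is not yet satisfied
                    atoms.append(atom)
            line = m.group(1) + " ".join(atoms)
        out.append(line)
    return "\n".join(out)

config = (
    "# >10kB of complex stuff\n"
    "MODULES = SECURITY_HOLE JAVA OTHERS\n"
    "# >10kB of complex stuff"
)

once = converge_modules(config, present=["PHP"], absent=["SECURITY_HOLE"])
twice = converge_modules(once, present=["PHP"], absent=["SECURITY_HOLE"])

print(once.splitlines()[1])  # MODULES = JAVA OTHERS PHP
print(once == twice)         # True: repeating the change has no further effect
```

The first call corresponds to rule (1), moving an arbitrary state to the desired state; the second corresponds to rule (2), where the desired state maps to itself. The surrounding "complex stuff" lines are never read for meaning or rewritten.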
• You do not trust the measuring device completely; or
• There is a dependence on something that prevents the measurement from being made.

If you believe in classic first-order logic, any assertion must be either true or false, but in an indeterminate world following any of these cases, you simply do not know, because there is insufficient information from which to choose either true or false. The system has only two states, but you cannot know which of them is the case. Moreover, suppose you measure at some time t; how much time must elapse before you can no longer be certain of the state?

This situation has been seen before in, of all places, quantum mechanics. Like Schrödinger’s cat, you cannot know which of the two possibilities (dead or alive) is the case without an active measurement. All you can know is the outcome of each measurement reported by a probe, after the fact. The lesson of physics, on the other hand, is that one can actually make excellent progress without complete knowledge of a system—by using guiding principles that do not depend on the uncertain details.

Back to Stability?
A system might not be fully knowable, but it can still be self-consistent. An obvious example that occurs repeatedly in nature and engineering is that of equilibrium. Regardless of whether you know the details underlying a complex system, you can know its stable states because they persist. A persistent state is an appropriate policy for tools such as computers—if tools are changing too fast, they become useless. It is better to have a solid tool that is almost what you would like, rather than the exact thing you want that falls apart after a single use (what you want and what you need are not necessarily the same thing). Similarly, if system administrators cannot have what they want, they can at least choose from the best we can do.

Systems can be stable, either because they are unchanging or because many lesser changes balance out over time (maintenance). There are countless examples of very practical tools that are based on this idea: Lagrange points (optimization), Nash equilibrium (game theory), the Perron-Frobenius theorem (graph theory), and the list goes on. If this sounds like mere academic nonsense, then consider how much of this nonsense is in our daily lives through technologies such as Google PageRank or the Web of Trust that rely on this same idea.

Note, however, that the robustness advocated in this article, using the principle of atomization and independence of parts, is in flat contradiction with modern programming lore. We are actively encouraged to make hierarchies of dependent, specialized objects for reusability. In doing so we are bound to build fragilities and limitations implicitly into them. There was a time when hierarchical organization was accepted wisdom, but today it is becoming clear that hierarchies are fragile and unmanageable structures, with many points of failure. The alternative of sets of atoms promising to stabilize patches of the configuration space is tantamount to heresy. Nevertheless, sets are a more fundamental construction than graphs.

For many system administrators, these intellectual ruminations are no more pertinent than the moon landings were to the users of Teflon pans. They do not see themselves in these issues, which is why researchers, not merely developers, need to investigate them. Ultimately, I believe there is still great progress to be made in system administration using these approaches. The future of system administration lies more in a better understanding of what we already have to work with than in trying to oversimplify necessary complexity with industrial force.

Conclusion
It is curious that embracing uncertainty should allow you to understand something more fully, but the simple truth is that working around what you don’t know is both an effective and low-cost strategy for deciding what you actually can do.

Major challenges of scale and complexity haunt the industry today. We now know that scalability is about not only increasing throughput but also being able to comprehend the system as it grows. Without a model, the risk of not knowing the course you are following can easily grow out of control. Ultimately, managing the sum knowledge about a system is the fundamental challenge: the test-driven approach is about better knowledge management—knowing what you can and cannot know.

Whether system administration is management or engineering is an oft-discussed topic. Certainly without some form of engineering, management becomes a haphazard affair. We still raise computers in captivity and then release them into the wild, but there is now hope for survival. Desired states, the continual application of “dumb” rule-based maintenance, and testing relative to a model are the keys to quantifiable knowledge.

Related articles on queue.acm.org

A Plea to Software Vendors from Sysadmins—10 Do’s and Don’ts
Thomas A. Limoncelli
http://queue.acm.org/detail.cfm?id=1921361

Self-Healing in Modern Operating Systems
Michael W. Shapiro
http://queue.acm.org/detail.cfm?id=1039537

A Conversation with Peter Tippett and Steven Hofmeyr
January 10, 2009
http://queue.acm.org/detail.cfm?id=1071725

References
1. Burgess, M. An approach to understanding policy based on autonomy and voluntary cooperation. Submitted to IFIP/IEEE 16th International Workshop on Distributed Systems Operations and Management (2005).
2. Burgess, M. Computer immunology. In Proceedings of the 12th System Administration Conference, 1998.
3. Burgess, M. Configurable immunity for evolving human-computer systems. Science of Computer Programming 51, 3 (2004), 197–213.
4. Burgess, M. On the theory of system administration. Science of Computer Programming 49 (2003), 1–46.
5. Cfengine; http://www.cfengine.org.
6. Couch, A., Daniels, N. The maelstrom: network service debugging via `ineffective procedures.’ Proceedings of the 15th Systems Administration Conference (2001), 63.
7. Couch, A., Gilfix, M. It’s elementary, dear Watson: Applying logic programming to convergent system management processes. In Proceedings of the 13th Systems Administration Conference (1999), 123.
8. Dijkstra, E. http://en.wikipedia.org/wiki/Guarded_Command_Language.
9. Hagemark, B., Zadeck, K. Site: A language and system for configuring many computers as one computer site. Proceedings of the Workshop on Large Installation Systems Administration III (1989); http://www2.parc.com/csl/members/jthornton/Thesis.pdf.
10. Opscode; http://www.opscode.com/chef.
11. Puppet Labs; http://www.puppetlabs.com/.
12. Sloman, M. S., Moffet, J. Policy hierarchies for distributed systems management. Journal of Network and System Management 11, 9 (1993), 404.

Mark Burgess is a professor of network and system administration, the first with this title, at Oslo University College. His current research interests include the behavior of computers as dynamic systems and applying ideas from physics to describe computer behavior. He is the author of Cfengine and is the founder, chairman, and CTO of Cfengine, Oslo, Norway.

© 2011 ACM 0001-0782/11/0300 $10.00
doi:10.1145/1897852.1897869

National Internet Defense—Small States on the Skirmish Line

Illustration by Alex Williamson

Despite the global and borderless nature of the Internet’s underlying protocols and driving philosophy, there are significant ways in which it remains substantively territorial. Nations have policies and laws that govern and attempt to defend “their Internet”—the portions of the global network that they deem to most directly impact their commerce, their citizens’ communication, and their national means to project social, political, and commercial activity and influence. This is far less palpable than a nation’s physical territory or even than “its air” or “its water”—one could, for example, establish by treaty how much pollution Mexican and American factories might contribute to the atmosphere along their shared border, and establish metrics and targets fairly objectively. Cyberspace is still a much wilder frontier, difficult to define and measure. Where its effects are noted and measurable, all too often they are hard to attribute to responsible parties.

Nonetheless, nation-states are taking steps to defend that space, and some have allegedly taken steps to attack that of others. Two recent events illustrate the potential vulnerabilities faced by small nation-states and suggest steps that others may take to mitigate those vulnerabilities and establish a more robust and defensible Internet presence. The first was an attack on Estonian Internet infrastructure and Web sites in May and June 2007. The second was a cyber attack against the Georgian infrastructure that accompanied the Russian incursion into South Ossetia in August 2008.

Estonia
Tensions had been building in Estonia in the spring of 2007 over the country’s plans to relocate the Bronze Soldier, a Soviet war memorial, and the capital, Tallinn, experienced several nights of rioting. The subsequent cyber attacks are believed to be a consequence of the memorial’s relocation.

An attack on Estonian Internet infrastructure and Web sites began at 11 p.m. local time, midnight Moscow time, Tuesday, May 8. The attack was effectively mitigated by 7 a.m. the following day but continued to be visible in traffic logs for exactly 30 days thereafter. That time period, together with the fact that the attacking botnets’ signature was identical to that used in prior Russian Business Network spam-sending campaigns, suggests that this was a one-month attack for hire (or was intended to look like one). Unfortunately, such attacks, either threatened or launched for commercial extortion, have become commonplace. Based on offers visible on the black market at the time, the attack likely cost between $200 and $2,000 to hire. Like many politically motivated attacks, it combined a distributed denial-of-service (DDoS) attack against Internet infrastructure with DDoS and attempted defacement attacks against the Web sites of Estonian banks, media outlets, and government.

The Estonian defense was notably
successful, and there are a number of lessons to be taken from it by other countries wishing to avoid a cyberwarfare defeat. The simplest summary of the dynamics of a DDoS-based cyber attack is as a numbers game. An attacker with greater network capacity than the defender will be able to overwhelm the defender’s network, while retaining sufficient capacity to support its own needs at the same time. Such an attack would be deemed successful. An attacker with less bandwidth than the defender would exhaust itself in consuming the defender’s capacity, while the defender might well retain enough excess capacity that its population would not be significantly inconvenienced; such an attack would be considered unsuccessful.

Viewed in closer detail, there are different kinds of network capacity and different mechanisms for improving and defending each. They can be placed in four categories: local or internal capacity; external connectivity; name resolution capability; and defensive coordination.

Local capacity, or bandwidth, is most familiar as one’s initial connection to the Internet. This local loop, or last mile, is the copper wire or fiber line in the ground or on poles, or the wireless link that carry signals from the customer to an ISP (Internet service provider). A robust local-loop infrastructure consists of buried fiber-optic cable interconnecting each business or residence with multiple ISPs over different physical paths. Ideally, these service providers ought to be in competition so they cannot be collectively suborned or sabotaged, and so their prices are low enough that people can actually choose fluidly among them. A sparsely supplied market for local connectivity can create bottlenecks and make attractive targets. In Estonia’s case, multiple independent fiber infrastructure operators existed, and many different ISPs built a healthy, competitive marketplace on top of that. More—and more diverse—domestic fiber is always better, but Estonia’s was more than sufficient.

External connectivity. More important to defensibility is the ecosystem for the providers’ own connectivity within that domestic context. The modern means to create an effective mesh of providers is via Internet exchange points, commonly abbreviated IXP. The world has about 330 IXPs at the moment, and that number has been steadily increasing. Each IXP has a specific physical location and connects a community of ISPs that meet as peers at the exchange. Some countries, such as the U.S., have many IXPs. Others, such as the Netherlands and Germany, have very large IXPs. Many smaller countries have exactly one exchange, located in the capital city. But the greatest number of countries, typically the smallest ones, has no IXP at all. This means that they are heavily dependent for their domestic connectivity upon international data circuits. Imagine a situation in which there were no local telephone calls, only calls overseas; to reach someone next door, you would have to make a call that went overseas and then back again, at twice the cost.

This is the situation in most less-developed countries, as a result of misunderstanding Internet economics and topology. Countries in this situation are extremely vulnerable to having those external lines of communications cut or overburdened, since that causes not only international but also domestic communications to fail, and thus the ability to coordinate a defense fails as well. A strong domestic Internet exchange point is the first and most critical component of a cyberwarfare defense. A redundant pair of IXPs, or one in each major city, is the desirable goal. A redundant pair of IXPs in Tallinn formed the linchpin of the Estonian defense.

International communications capability is necessary for conducting business in a global economy. It’s also needed for defensive coordination with outside allies in order to protect a nation’s international capacity. International capacity is the asset most easily targeted from the outside, and it is perhaps the most challenging to defend from the perspective of the state, since it’s a multinational private-sector resource. In most countries, each circuit that crosses the border is controlled by one company at one end, another company at the other end, and a third in between. In turn, many of these companies are themselves consortia of other multinational companies. On the domestic end of a circuit regulatory jurisdiction is generally clear, though limited and perhaps difficult to enforce; but on the other end it is nearly impossible even to influence. Thus, diversity is key to optimizing the survivability of international connectivity.

Estonia had numerous privately controlled data circuits crossing its borders, with the other ends located in several different countries. Of these, the most significant were large Scandinavian and Western European ISPs with which Estonian ISPs had commercial relationships and that were based in diplomatically friendly neighboring countries. This is an optimal situation, and when push came to shove, Estonia received fast and effective aid from the ISPs at the other ends of those circuits.

Name resolution. The ability to resolve domain names domestically is another critical infrastructure capability. The Domain Name System (DNS) is the Internet’s directory service, providing Internet-connected computers with the ability to map the human-readable domain names in email and Web addresses to the machine-readable binary IP addresses used to route traffic within the network. Domain names are resolved to IP addresses (and vice versa) by iterating through a delegation hierarchy of DNS directory servers, starting at the “root” and progressing through top-level domain (TLD) name servers such as .com and .net, to the organization-specific name servers that hold the particular answer one is looking for.

If connectivity is broken between users and any one of the name servers in the delegation chain from the root down to the specific one they are looking for, then the users will be unable to resolve the domain name they’re looking for, and unable to reach the corresponding Web site or send the email, regardless of whether they have connectivity to the Web site or email addressee. If the directory service is broken, you can’t find things, even if you could, hypothetically, reach them.

Estonia did not have any root servers within the country at the time of the attack, and still does not today. This is one of the few weak points of the Estonian defense and would have become more debilitating over the course of an attack that had been more effective for a longer period of time.
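The dependence on every link of the delegation chain can be made concrete with a small Python sketch. It is a simplified model under stated assumptions, not a DNS implementation: the helper names `delegation_chain` and `can_resolve` and the sample name `example.ee` are hypothetical, each zone is treated as either reachable or not, and resolver caching (which would mask a short outage) is ignored.

```python
def delegation_chain(name):
    """Return the zones consulted for `name`, from the root downward.
    For "example.ee" this is [".", "ee", "example.ee"]."""
    labels = name.rstrip(".").split(".")
    return ["."] + [".".join(labels[i:]) for i in range(len(labels) - 1, -1, -1)]

def can_resolve(name, reachable_zones):
    """A name resolves only if every zone in its delegation chain is reachable."""
    return all(zone in reachable_zones for zone in delegation_chain(name))

# Normal operation: root, TLD, and organization servers are all reachable.
print(can_resolve("example.ee", {".", "ee", "example.ee"}))  # True

# International links cut, no root server in the country: the domestic TLD and
# organization servers are still up, but the chain is broken at the root.
print(can_resolve("example.ee", {"ee", "example.ee"}))       # False
```

In this model, hosting a root server instance domestically amounts to keeping "." in the reachable set when external circuits fail, which is the weak point the text identifies in the Estonian case.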
systems and processes, from factories to pipelines to transportation networks.) Rather, the Estonia incident was a pure information-warfare attack, attempting to convince Estonians that the information-economy infrastructure of which they were so proud was vulnerable and unsound, that their work in that sector was of little value, that their adversary was more capable and better prepared, and in a more pitched conflict, their defeat would be inevitable. A population that would take such a message to heart would indeed be unwilling to support conflict against the attacker.

The Estonia attack had very little success in concrete terms, and little more success in information-warfare terms, relative to the Estonians against whom it was directed. Because of its apparent state-on-state nature, and Estonia’s status at the time as the most recently admitted NATO ally, the attack managed to garner a surprising degree of attention elsewhere, though. The attacks against Georgia were far more effective, but Georgia did not have as far to fall and the conflict on the Internet paled in comparison to the actual shooting war in its territory. One might accurately term both the Estonia and Georgia cyber assaults as skirmishing; the attack on Estonia amounted to little more than a nuisance, in part because of its scale and in part because of the effectiveness of the response.

Without a doubt, any major war would see complementary attacks against the adversaries’ information infrastructure, including their national presence on the Internet—suppression of the means to coordinate and organize has long been a basic tenet of warfare. It is perhaps early to assess the impact of cyberwar, absent “real war”; the attack against Estonia was too slight to measure significant effects, while the attack on Georgia was just a sideshow to a widely, physically destructive conflict.

The ultimate source of both attacks remains murky. Many assertions have been made, but there has been little actual discussion of the question of state involvement in cyber attacks. Plausible deniability has become the watchword in cyberwarfare, and accordingly, attribution has become a major focus of effort, consuming far more resources than does actual defense.

Much of what had worked well in the case of Estonia did not in the Georgia attack…Georgian international connectivity was far more limited, hence more vulnerable.

Defending the Small Nation-State
Ensuring the Internet security of a small nation-state entails investment in four areas: ensuring physical network robustness; securing the interconnection of participating networks through exchange points; securing the data and services required to keep the Internet running; and developing an effective response community.

In advance of any threat, a nation should take steps to ensure that its networks are connected to the rest of the world via diverse international transit links to different unrelated transit providers in different, unaligned countries. A significant factor in why Georgia was so affected by its cyber attack was its extremely limited connectivity to the outside world; Estonia was in a far better position, with a more diverse mesh of connectivity to friendlier neighbors. Submarine cables are also worth noting as a clear point of vulnerability in international transit. There have been a number of accidental submarine cable cuts in the past several years, and a coordinated, willful effort to take those out would be fairly simple to mount and would have significant effect in certain regions.

In the case of Estonia, DoS attacks effectively stopped at the country’s IXP and had minimal impact on domestic Internet traffic. In countries lacking IXPs, even domestic traffic may end up routed internationally, at greater expense than if there had been an IXP to broker exchanges before incurring higher international transit costs, and at greater risk of disruption.

It is critical that countries have root and TLD name servers well connected to their domestic IXPs, such that all of their domestic ISPs can provide uninterrupted DNS service to their customers. In the case of ISO country-code TLD name servers, such as those for Estonia’s .ee domain, that’s relatively easily accomplished, though not yet universally done. In the case of root name servers, it requires the cooperation and goodwill of a foreign organization, the operator of the root name server, and generally some small investment in infrastructure support for the remotely operated root server. This
might amount to an expenditure of some $15,000 (U.S.) per year, per root server installation within the country.
(It’s worth noting that all of the investments required for cyberwarfare defense are equally applicable to general economic development. Just as the cyberwarfare field of conflict is a private-sector space, this, too, is unlike traditional military expenditures. A tank or a bunker is purely a cost center, whereas an IXP or domain name server is a profit center, generating new, concrete, and monetized value for its users from the moment it’s established. The return on investment of a newly established IXP is typically less than three weeks, and often less than one week.)
The CERT is a widely employed model for computer and network incident response. CERTs are directly responsible for systems under their own control, and, with other CERTs, collaborate on collective network security. FIRST (Forum of Incident Response and Security Teams), an association of CERTs, brings CERTs and their staffs together to build the most fundamental links in a web of trust.1 A CERT should also have already established lines of communication with ISPs, law enforcement, and other elements of government concerned with infrastructure security.
Network operators’ groups promote community and cooperation between a country’s Internet operators and their foreign counterparts. Participation in Inter-network Operations Center Dial-by-ASN (INOC-DBA) and Network Service Provider Security (NSP-SEC) can also aid in coordinating incident response. INOC-DBA is a voice over Internet Protocol (VoIP) hotline system, interconnecting network operation centers; it uses the networks’ own numeric identifiers as dialing numbers so that a NOC operator observing problematic traffic can merely enter the address of the offending network to place a call to the responsible party.2 NSP-SEC is an informal organization of security professionals at the largest Internet infrastructure providers: “Membership in NSP-SEC is restricted to those actively involved in the mitigation of [Network Service Provider] security incidents within organizations in the IP transit, content, and service provider community. Therefore, it will be limited to operators, vendors, researchers, and people in the FIRST community working to stop NSP security incidents.”3
New members of the “culture of security” come out of academic and training programs (which must be established), intern in a CERT (internationally or domestically), and go on to careers as CSOs (chief security officers) in CERTs, academia, law enforcement, or government. This is fundamentally analogous to the peopling of a national health environment with doctors.
In the U.S., the Department of Homeland Security has included CERTs and information assurance analysts and operators in a new research and development solicitation. In a draft of the solicitation, DHS notes, “While we have a good understanding of the technologies involved in [cybersecurity incident response teams], we have not adequately studied the characteristics of individuals, teams, and communities that distinguish the great [cybersecurity incidence] responders from the average technology contributor. In other areas where individual contributions are essential to success, for example, first responders, commercial pilots, and military personnel, we have studied the individual and group characteristics essential to success. To optimize the selection, training, and organization of CSIR personnel to support the essential cyber missions of DHS, a much greater understanding and appreciation of these characteristics must be achieved.”

Conclusion
It would be fair to describe these two incidents (Estonia in 2007, and Georgia a year later) as “cyberskirmishing.” The attacks on Estonia amounted to little more than a nuisance, though a quite visible and much discussed one. Georgia had far greater problems to deal with in an armed incursion into its territory, and the Internet was not a factor in that fight.
The difference in responsiveness between the two, however, recommends that the small nation-state ought to make investments in Internet defensibility akin to those seen in Estonia:
• Through policy and regulation, and perhaps government investment, foster a robust physical infrastructure.
• Similarly, take steps to ensure a diversity of international connections.
• Encourage (or directly sponsor) creation of one or more IXPs.
• Ensure the domestic availability of DNS resolution, through root servers.
• Foster the growth of a collaborating community of security professionals.
A diversity of interconnections, both international and domestic, facilitated by the efficient peering afforded by IXPs, provides a more robust logical infrastructure, and local DNS resolution further lessens dependence on more exposed international connections. With that technical infrastructure ensured, nations should then foster development of the human infrastructure, the information security personnel needed to anticipate threats, the ability to intercede inventively to restore services, and the ability to support incident forensic collection and analysis.

Related articles on queue.acm.org
Cybercrime 2.0: When the Cloud Turns Dark
Niels Provos, Moheeb Abu Rajab, Panayiotis Mavrommatis
http://queue.acm.org/detail.cfm?id=1517412
CTO Roundtable: Malware Defense
http://queue.acm.org/detail.cfm?id=1731902
The Evolution of Security
Daniel E. Geer
http://queue.acm.org/detail.cfm?id=1242500

References
1. FIRST; http://first.org/about/.
2. Inter-network Operations Center Dial-by-ASN (INOC-DBA), a Resource for the Network Operator Community; http://www2.computer.org/portal/web/csdl/doi/10.1109/CATCH.2009.36.
3. NSP Security Forum; http://puck.nether.net/mailman/listinfo/nsp-security.

Bill Woodcock is a founder and research director of Packet Clearing House, a nonprofit research institute dedicated to understanding and supporting Internet traffic exchange technology, policy, and economics. He entered the field of Internet routing research in 1989 while serving as the network architect and operations director for an international multiprotocol service-provision backbone network. Woodcock has participated in the establishment of more than 70 public Internet exchange points in Europe, Africa, Asia, and the Americas.
Ross Stapleton-Gray is research program manager at Packet Clearing House. Prior to joining PCH, he served as an intelligence analyst for the CIA, in information policy positions with the American Petroleum Institute and the University of California Office of the President, and has worked with several IT security start-ups, including as a cofounder of Sandstorm Enterprises.

© 2011 ACM 0001-0782/11/0300 $10.00
doi:10.1145/1897852.1897870
Article development led by queue.acm.org

B.Y.O.C (1,342 Times and Counting)
Why can’t we all use standard libraries for commonly needed algorithms?
by Poul-Henning Kamp

Although seldom articulated clearly, or even at all, one of the bedrock ideas of good software engineering is reuse of code libraries holding easily accessible implementations of common algorithms and facilities. The reason for this reticence is probably that there is no way to state it succinctly, without sounding like a cheap parody of Occam’s razor: Frustra fit per plura quod potest fieri per pauciora (it is pointless to do with several where few will suffice).
Obviously, choice of programming language means that “few” will never be “a single one,” and until somebody releases a competent implementation under an open source license, we may have several more versions floating around than are strictly necessary, for legal rather than technological reasons.
It also never hurts to have a few competing implementations to inspire improvement; in fact, there seems to be a distinct lack of improvement where a single implementation becomes too “golden.”
So much for theory. Does any of this hold in practice?
One of the nice side effects of the “software tools” concept is that programs are data, too. We can apply data mining methods to program source code, allowing us to investigate such questions.
Cryptography algorithms provide a good example because they are easier to identify than other algorithms. Magic numbers in crypto algorithms make for good oracular answers to their presence: you are not likely to encounter both 0xc76c51a3 and 0xd192e819 anywhere other than an implementation of SHA-2. Creating an oracle to detect sorting algorithms in source code with (p>0.9) would be a good student project (albeit, likely impossible).
For data mining FOSS (free and open source software) programs, the FreeBSD operating system ships with a handy facility called the Ports Collection, containing strategic metadata for 22,003 pieces of FOSS. A small number of these “ports” are successive versions of the same software (Perl 5.8, Perl 5.10, among others), but the vast majority are independent pieces of software, ranging from trivialities such as XLogo to monsters such as Firefox and OpenOffice.
A simple command downloads and extracts the source code to as many ports as possible into an easily navigated directory tree:
    cd /usr/ports ; make -k extract
You will obviously need both sufficient disk space and patience. (Using
    cd /usr/ports ; make -k -j 10 extract
will do 10 pieces of software in parallel, but will be a bandwidth hog.)
The results are worse. I had not expected to see 1,342, as shown in the accompanying table.a I expect that these numbers will trisect my readers into three somewhat flippantly labeled seg-
a Sorry, I forgot to include the DES algorithm in the search.
2d-rewriter-1.4 -- Cellular automata simulator amoebax-0.2.1_4 -- Cute and addictive Puyo Puyo like game
2dhf-2005.05_4 -- A Numerical Hartree-Fock Program for Diatomic Molecules amp-0.7.6,1 -- Another mp3 player
2ping-1.0 -- A bi-directional ping utility ampache-3.5.4_1 -- A Web-based Audio file manager
3dc-0.8.1_3 -- 3-Dimensional Chess for X Window System ampasCTL-1.4.0 -- Color Transformation Language interpreter
3ddesktop-0.2.9_8 -- 3D Virtual Desktop Switcher amphetadesk-0.93.1_6 -- RSS client that serves HTTP to your local web browser
3dm-2.11.00.009_1,1 -- 3ware RAID controller monitoring daemon and web server amphetamine-0.8.10_7 -- A 2D - Jump'n'run shooter
3dpong-0.5_2 -- X Window 3D Pong game for 1 or 2 players with a ball and paddles ample-0.5.7 -- Allows you to listen to your own MP3's away from home
3proxy-0.6.1 -- Proxy servers set (support HTTP(S), FTP, SOCKS, POP3, TCP & UDP) amqp08-20090705 -- Vendor neutral AMQP 0.8 XML specification
44bsd-csh-20001106 -- The traditional 4.4BSD /bin/csh C-shell amrcoder-1.0 -- AMR encoder/decoder for Mbuni MMS Gateway (www.mbuni.org)
44bsd-more-20000521 -- The pager installed with FreeBSD before less(1) was imported amrstat-20070216 -- Utility for LSI Logic's MegaRAID RAID controllers
44bsd-rdist-20001111 -- The traditional 4.4BSD rdist amsn-0.98.3_1 -- Alvano\'s MSN Messenger
4stattack-2.1.4_6 -- Connect four stones in a straight line amspsfnt-1.0_5 -- AMSFonts PostScript Fonts (Adobe Type 1 format)
4va-1.21_2 -- Four-Dimensional graphics tumbler for X11 amule-2.2.6 -- The all-platform eMule p2p client
54321-1.0.2001.11.16_9 -- 54321 is five games in four-, three-, or two-dimensions for one
amule-devel-10390
player -- The all-platform eMule p2p client
6tunnel-0.11.r2_2 -- TCP proxy for applications that don't speak IPv6 an-0.95_1 -- Fast anagram generator
8kingdoms-1.1.0_3 -- 3D turn-based fantasy strategic game anacron-2.3_4 -- Schedules periodic jobs on systems that are not permanently up
915resolution-0.5.3_1,1 -- Resolution tool for Intel i915 video cards anagramarama-0.2_5 -- Anagramarama - a word game for Linux, Windows and BeOS
9base-b20090309 -- Port of various original plan9 tools analog-6.0_5,1 -- An extremely fast program for analysing WWW logfiles
9box-0.2.1_3 -- 9box can "pack" windows inside itself and-1.2.2 -- Auto Nice Daemon
9e-1.0 -- Explode Plan9 archives angband-3.1.2.v2 -- Rogue-like game with color, X11 support
9libs-1.0.1_2 -- Plan9 compatibility libraries angst-0.4b_2 -- An active sniffer
9menu-1.8_2 -- A simple menu patterned after Plan9 animenfo-client-20020819 -- AnimeNfo client
ACH-1.0.2 -- A free, open source tool for complex research problems animenfo-client-gui-gtk-20020819_9 -- AnimeNfo client with GTK support
ADMsmb-0.3 -- Security scanner for Samba animorph-0.3 -- Morphing engine for MakeHuman
ADMsnmp-0.1 -- SNMP audit scanner anjuta-2.32.1.0 -- Integrated Development Environment for C and C++
AquaGatekeeper-1.22_4 -- Aqua H323 Gatekeeper and proxy anjuta-extras-2.32.0.0 -- Extra anjuta plugins.
AquaGatekeeper-2.0_3 -- Aqua H323 Gatekeeper and proxy anki-1.0.1 -- Flashcard trainer with spaced repetition
Atlas-0.5.2_1 -- A C++ reference implementation of the Atlas protocol ann-1.1.2 -- A Library for Approximate Nearest Neighbor Searching
Atlas-0.6.2 -- A C++ reference implementation of the Atlas protocol annelid-1_4 -- Remake of the ubiquitous "Snake" and "Worm" games
AtomicParsley-0.9.0_1 -- Command line program for reading parsing and setting MP4/M4A metadata
annextools-10.0_1 -- BSD tools for the MicroAnnex-XL Terminal Server
AutoIndex-1.5.4_1 -- PHP 4.x script that makes a table that lists the files in a directory
annoyance-filter-1.0d -- Adaptive Bayesian Junk Mail Filter
AutoIndex-2.2.4 -- PHP 5.x script that makes a table that lists the files in a directoryanomy-sanitizer-1.76_4 -- Sanitize and clean incoming/outgoing mail
BillardGL-1.75_6 -- An OpenGL billiard simulator anope-1.8.5 -- A set of IRC services for IRC networks
BitchX-1.1.0.1_4 -- Feature-rich scriptable IRC client ansifilter-1.4 -- Customizable ANSI Code Converter
CKEditor-3.4 -- CKEditor is a WYSIWYG editor to be used inside web page ansiprint-1.0 -- Prints through a terminal with ANSI escape sequences
CalculiX-2.2 -- A Three-Dimensional Structural Finite Element Program ant-xinclude-task-0.2_2 -- XInclude task for Jakarta Ant
CaribbeanStud-1.0_2 -- Caribbean Stud gambling game for X Window System anteater-0.4.5 -- A MTA log analyzer
Cgraph-2.04_2 -- A PostScript plotting library in C antipolix-2.1_2 -- Simple multiplayer game for X Window System
Clp-1.11.0 -- Linear Programming Solver antivirus-3.30_6 -- Sendmail milter wich uses Mcafee Virus Scan or clamav
Coin-3.1.3_1 -- C++ 3D graphics library based on the Open Inventor 2.1 API antiwm-0.0.5 -- A minimalist window manager inspired by Ratpoison
DFileServer-1.1.3 -- A compact webserver designed to make sharing files easy antiword-0.37_1 -- An application to display Microsoft(tm) Word files
DTraceToolkit-0.99 -- Collection of useful scripts for DTrace antlr-2.7.6_2 -- ANother Tool for Language Recognition
DarwinStreamingServer-6.0.3 -- Darwin Streaming Server, a MP3, MPEG4 and QuickTime streaming
antlrworks-1.3.1,1
server -- The ANTLR GUI Development Environment
DirComp-1.3.10_2 -- Compare two directories antrix-1477_1 -- Free stable dedicated-server for World of Warcraft
E-Run-1.2_10 -- A simple epplet for launching arbitrary programs anubis-3.6.2_10 -- Outgoing SMTP mail processor
E-buttons-0.2_11 -- A simple epplet that contains several buttons used to launch programs
anyremote-5.2 -- Remote control service over Bluetooth, infrared or tcp/ip networking
EZWGL-1.50_6 -- The EZ Widget and Graphics Library anyremote2html-1.4 -- A web interface for anyRemote
FSViewer.app-0.2.5_9 -- X11 file manager using WINGS library. Dockable in WindowMaker anyterm-1.1.29 -- A terminal emulator on a Web page
FlightGear-2.0.0_3 -- The FlightGear flight simulator aoe-1.2.0 -- FreeBSD driver for ATA over Ethernet (AoE)
FlightGear-aircrafts-20100302 -- Additional aircrafts for the FlightGear flight simulator
aoi-2.5.1_2 -- An open source Java written 3D modelling and rendering studio
FlightGear-data-2.0.0_1 -- FlightGear scenery, textures and aircraft models aolserver-4.5.1_1 -- A multithreaded web server with embedded TCL interpreter
Fnorb-1.3_1 -- A CORBA 2.0 ORB written in Python aolserver-nsencrypt-0.4_2 -- OpenSSL data encryption module for AOLserver
FreeMat-4.0_1 -- An environment for rapid engineering and scientific processing aolserver-nsgd-2.0_8 -- Graphics module for AOLserver
FreeMat-mpi-4.0_1 -- An environment for rapid engineering and scientific processing aolserver-nsmcrypt-1.1_3 -- AOLserver interface to mcrypt library
Fudgit-2.41_1 -- Multi-purpose data-processing and fitting program aolserver-nsmhash-1.1_2 -- AOLserver interface to mhash library
GNUDoku-0.93_5 -- A free program for creating and solving Su Doku puzzles aolserver-nsmysql-1.0_2 -- Internal MySQL database driver for AOLserver
GSubEdit-0.4.p1_9 -- GNOME Subtitle Editor is a tool for editing/converting video subtitles
aolserver-nsopenssl-3.0.b26_1 -- OpenSSL socket encryption module for AOLserver
GTKsubtitler-0.2.4_8 -- A small GNOME program for editing and converting subtitles aolserver-nspostgres-4.1_3 -- Internal PostgreSQL driver for AOLserver
Gdtclft-2.2.5_9 -- A TCL interface to the Thomas Boutell's Gd library aolserver-nssha1-0.1_1 -- AOLserver module to perform SHA1 hashes
Generic-NQS-3.50.9_2 -- Generic Network Queuing System aolserver-nszlib-1.1_2 -- Zlib library interface for AOLserver
GeoIP-1.4.6 -- Find the country that any IP address or hostname originates from aolserver-xotcl-1.6.6 -- Object-oriented scripting language based on Tcl
GiNaC-1.5.6 -- A C++ library for symbolic mathematical calculations aop-0.6 -- A curses based arcade game with only 64 lines of code
GimpUserManual-HTML-2_1 -- The user manual for the GNU Image Manipulation Program (GIMP)ap-utils-1.4.1_4 -- A set of utilities to configure and monitor wireless access points
GimpUserManual-PDF-2_1 -- The user manual for the GNU Image Manipulation Program (GIMP) ap13-mod_accessCookie-0.4 -- Supply access control based cookies stored in a MySQL database
GraphicsMagick-1.1.15_3,1 -- Fast image processing tools based on ImageMagick ap13-mod_access_identd-1.2.0 -- Apache module to supply access control based on ident reply
GraphicsMagick-1.2.10_1 -- Fast image processing tools based on ImageMagick ap13-mod_access_referer-1.0.2_1 -- Provides access control based on Referer HTTP header for Apache
GraphicsMagick-1.3.12_1 -- Fast image processing tools based on ImageMagick ap13-mod_accounting-0.5_7 -- An Apache module that records traffic statistics into a database
HVSC-Update-2.8.4 -- Update program for the HVSC C= 64 SID tune collection ap13-mod_auth_any-1.5 -- Apache module to use any command line program to authenticate a user
Hermes-1.3.3_2 -- Fast pixel formats conversion library ap13-mod_auth_cookie_mysql-1.0 -- Allows authentication against a MySQL database via a secure cookie
HeroesOfMightAndMagic-3_1 -- BSD Installation of the Linux game "Heroes of Might and Magic
ap13-mod_auth_external-2.1.19_1
III" -- Enables the use of external mechanisms for user authentication
Hyperlatex-2.9.a_2 -- Produce HTML and printed documents from LaTeX source ap13-mod_auth_imap-1.1 -- An Apache module to provide authentication via an IMAP mail server
IExtract-0.9.30_1 -- Extract meta-information from files ap13-mod_auth_kerb-5.3,1 -- An Apache module for authenticating users with Kerberos v5
IMHear-1.0 -- An MSN Messenger event/message sniffer ap13-mod_auth_mysql-3.2 -- Allows users to use MySQL databases for user authentication
IMP-1.0.7_1 -- Monadic interpreter of a simple imperative language ap13-mod_auth_mysql_another-3.0.0_2 -- Allows users to use MySQL databases for user authentication
IPA-1.07 -- Image Processing Algorithms ap13-mod_auth_pam-1.1.1_1 -- Allows users to use PAM modules for user authentication
Ice-3.4.1 -- A modern alternative to object middleware such as CORBA/COM/DCOM/COM+ ap13-mod_auth_pgsql-0.9.12_4 -- Allows users to use PostgreSQL databases for user authentication
IglooFTP-0.6.1_6 -- Easy to use FTP client for X Window System ap13-mod_auth_pubtkt-0.6 -- An Apache module to provide public key ticket based authentication
ImageMagick-6.6.5.10 -- Image processing tools ap13-mod_auth_useragent-1.0 -- Allows you to forbid clients based on their User-Agent
InsightToolkit-2.8.1_2 -- Insight Toolkit ap13-mod_backhand-1.2.2_1 -- Apache module that allows seamless redirection and load balancing of HTTP requests
Judy-1.0.5 -- General purpose dynamic array ap13-mod_bandwidth-2.1.0 -- Bandwidth management module for the Apache webserver
KPackageKit-0.6.2 -- KDE interface for PackageKit ap13-mod_bf-0.2 -- A brainf*ck module for Apache
KSubeditor-0.2_9,1 -- A video subtitle editor for KDE ap13-mod_blosxom-0.05_1 -- Apache module to build the extremely lightweight Weblog environment
KeePassX-0.4.3 -- Cross Platform Password Manager ap13-mod_blowchunks-1.3_1 -- Apache module for rejecting and logging chunked requests
KrossWordPlayer-1.4_10 -- KDE crossword puzzle game ap13-mod_bunzip2-1 -- Apache module for server-side decompression of bzip2 files
LBreeder-1.0_15 -- Allows you to display and breed L-system forms ap13-mod_cgi_debug-0.7 -- Apache module to make debugging server-side scripts easier
LPRng-3.8.32_2 -- An Enhanced Printer Spooler ap13-mod_chroot-0.5 -- The mod_chroot makes running Apache in a chroot easy
LPRngTool-1.3.2_5 -- Configuration Tool for LPRng ap13-mod_color-0.3 -- Apache module that provides syntax coloring for various languages
LaBrea-2.4_2 -- Security tarpit defense tool ap13-mod_curb-1.1 -- A per-server bandwidth limiter module for Apache 1.3
LabPlot-1.6.0.2_11 -- LabPlot : Data analysis and visualisation ap13-mod_cvs-0.5 -- A module that makes Apache CVS aware
Lila-xfwm4-0.3.1_7 -- XFce 4 Lila window decoration theme for xfwm4 ap13-mod_dav-1.0.3_4 -- An Apache module that provides DAV capabilities
LinNeighborhood-0.6.5_11 -- GTK+ gui for browsing and mounting SMB filesystems ap13-mod_dtcl-0.12.0_1 -- Embeds a TCL8 interpreter in the Apache server
MT-5.03_1 -- A web-based personal publishing system for weblogs ap13-mod_encoding-20021209_2 -- Apache module for non-ascii filename interoperability
Maaate-0.3.1_3 -- MPEG audio analysis toolkit ap13-mod_evasive-1.10.1 -- An Apache module to try to protect the HTTP Server from DoS/DDoS attacks
MailScanner-4.81.4_1 -- Powerful virus/spam scanning framework for mail gateways ap13-mod_extract_forwarded-1.4 -- An Apache module that can make proxied requests appear with client IP
MathPlanner-3.1.3_6 -- A mathematical design and publishing application ap13-mod_fastcgi-2.4.6_1 -- A fast-cgi module for Apache
Mixminion-0.0.8.a3 -- A Type III Anonymous Remailer ap13-mod_filter-1.4.1_1 -- Filter output from other modules inside of Apache
Mowitz-0.2.1_4 -- This is the Mowitz ("More widgets") library ap13-mod_geoip-1.3.4_1 -- An Apache module that provides the country code of the client's IP
MuSE-0.9.2_9 -- Multiple Streaming Engine ap13-mod_gzip-1.3.26.1a -- An Internet Content Acceleration module for Apache
MyPasswordSafe-20061216_6 -- Easy-to-use password manager compatible with Password Safe ap13-mod_hosts_access-1.1.0 -- Apache module that makes Apache respect hosts.allow and hosts.deny
NagiosAgent-1.2.0.1_1 -- A QT-based frontend to Nagios ap13-mod_index_rss-1.0 -- Apache module to provides RSS output for directories
Nefarious-1.2.0 -- IRC server used by evilnet based off of Undernet\'s ircu ap13-mod_jail-0.4_2 -- Apache 1.3.x/2.0.xx module to enable an easy alternative to mod_chroot
Net-IMAP-Server-1.29 -- A single-threaded multiplexing IMAP server implementation ap13-mod_jk-1.2.30,1 -- Apache JK module for connecting to Tomcat using AJP1X
NetHirc-0.94 -- Perl-based IRC client that uses Net::IRC ap13-mod_layout-3.4 -- Apache module to wrap served pages with a header and/or footer
NetPIPE-3.7.1 -- A self-scaling network benchmark ap13-mod_limitipconn-0.04_1 -- Limit the number of simultaneous connections from a single IP address
NetRexx-2.05_3 -- Human-oriented programming language for writing/using Java classes ap13-mod_log_spread-1.0.4_1 -- An Apache module interfacing with spread
NetSpades-4.2.0_6 -- Very popular card game for 1-4 players over a network ap13-mod_log_sql-1.101 -- Allows Apache to log to a MySQL database
NunniMCAX-1.4.1 -- C, non validating XML parser with SAX-like API ap13-mod_log_sql-1.18_3 -- Allows Apache to log to a MySQL database
O2-tools-2.00 -- Huge image processing tools and libraries ap13-mod_macro-1.1.2b -- Apache module for use macros in config files
OQTEncoder-0.1_6 -- A simple encoder using OpenQuicktime (TM) ap13-mod_mod_scgi-1.12 -- Apache module that implements the client side of the SCGI protocol
OQTPlayer-0.5_9 -- A very very small, not functionnal, video OpenQuicktime (TM) player ap13-mod_mp3-0.40_1 -- Apache module to allow MP3 streaming
ORBit-0.5.17_5 -- High-performance CORBA ORB with support for the C language ap13-mod_mylo-0.2.2 -- An Apache module to make Apache log to MySQL
ORBit2-2.14.19 -- High-performance CORBA ORB with support for the C language ap13-mod_ntlm-0.4 -- NTLM authentication module for the Apache webserver
ORBit2-reference-2.14.19 -- Programming reference for devel/ORBit2 ap13-mod_perl-1.31_1 -- Embeds a Perl interpreter in the Apache server
Ocsinventory-Agent-1.1.2.1,1 -- Keep track of the computers configuration and software ap13-mod_proxy_add_forward-20020710 -- Apache module that adds a client IP header to outgoing proxy requests
OpenEXR-1.6.1_3 -- A high dynamic-range (HDR) image file format ap13-mod_put-1.3_1 -- An Apache module that provides PUT and DELETE methods
OpenSP-1.5.2_2 -- This package is a collection of SGML/XML tools called OpenSP ap13-mod_python-2.7.11 -- Apache 1.3 module for integrating Python
OpenSSH-askpass-1.2.4.1 -- Graphical password applet for entering SSH passphrase ap13-mod_realip-2.0 -- Apache module to fix IP addresses in proxied requests
OpenVerse-0.8.7_3 -- A visual chat program written in Tcl/Tk ap13-mod_roaming-1.0.2_1 -- An Apache module that works as a Netscape Roaming Access server
PDL-2.4.4_7 -- Perl Data Language ap13-mod_rpaf-0.6 -- Make proxied requests appear with client IP
PTlink-IRCd-6.19.6 -- PTlink IRC daemon ap13-mod_ruby-1.3.0_1 -- An Apache module that embeds Ruby interpreter within
PTlink-Services-3.9.2 -- PTlink IRC services ap13-mod_sed-0.1 -- An apache module that embeds a copy of the sed(1) command
PackageKit-0.6.10 -- A DBUS packaging abstraction layer ap13-mod_sequester-1.8 -- Apache module that controls access to the website using secure info
PackageKit-qt4-0.6.10 -- Qt4 bindings to packagekit ap13-mod_shapvh-1.0 -- Apache module that provides virtual hosts from a database
ParMetis-3.1_5 -- A package for parallel (mpi) unstructured graph partitioning ap13-mod_sqlinclude-1.4_1 -- An Apache module implementing config inclusion from MySQL databases
PenguinTV-4.1.0_3 -- Graphical RSS feed reader with incorperated playback functions - development
ap13-mod_ticket-1.0
version -- Apache module for a digitally signed ticket in URL
PicMonger-0.9.6_9 -- An automated USENET (NNTP) picture decoding client ap13-mod_trigger-1.1 -- Apache module to launch triggers if certain actions occur
Pymacs-0.22_4 -- A Python package for scripting emacs ap13-mod_tsunami-3.0_1 -- Apache module which dynamically limits a site's slot usage
QNetChess-1.1_6 -- Qt based chess multiplayer game ap13-mod_uid-1.1.0 -- A module issuing the "correct" cookies for counting the site visitors
R-2.11.1 -- A language for statistical computing and graphics ap13-mod_webkit-1.1b1 -- A apache module for WebWare WebKit AppServer
R-cran-RSvgDevice-0.6.4.1_4 -- A R SVG graphics device ap13-mod_wsgi-2.8 -- Python WSGI adapter module for Apache
R-cran-Zelig-3.4.8 -- Everyone's Statistical Software ap13-mod_wsgi-3.3 -- Python WSGI adapter module for Apache
R-cran-car-1.2.16 -- Companion to Applied Regression for R ap20-mod_antiloris-0.4 -- Protect Apache 2.x against the Slowloris HTTP DoS attack
R-cran-gpclib-1.5.1 -- General Polygon Clipping Library for R ap20-mod_auth_cas-1.0.8 -- Apache 2.x module that supports the CASv1 and CASv2 protocols
R-cran-igraph-0.5.2_4 -- R extension package for igraph ap20-mod_auth_cookie_mysql2-0.9.a -- Allows authentication against a MySQL database via a secure cookie
R-cran-inline-0.3.6 -- Inline C, C++, Fortran function calls from R ap20-mod_auth_external-2.2.11 -- Allows users authentication based on external mechanisms
R-cran-psych-1.0.91 -- Psych package for the R project ap20-mod_auth_form-2.05_1 -- MySQL based form authentication module for Apache 2.x
R-cran-sm-2.2.4 -- Smoothing methods for nonparametric regression and density estimationap20-mod_auth_imap-2.2.0 -- An Apache 2 module to provide authentication via an IMAP mail server
R-cran-sp-0.9.62 -- R Classes and Methods for Spatial Data ap20-mod_auth_kerb-5.4_2 -- An Apache module for authenticating users with Kerberos v5
REminiscence-0.1.9_4 -- A rewritten engine for Flashback ap20-mod_auth_ldap-2.12_1 -- Apache module to authenticate against an LDAP directory
Radiator-4.7_1 -- Radiator Radius Server by Open System Consultants ap20-mod_auth_mysql-1.10 -- MySQL-based authentication module with VirtualHost support
RealTimeBattle-1.0.8_8 -- Robot programming game for UNIX ap20-mod_auth_openid-0.5 -- An authentication module for the Apache 2 webserver with OpenID
Ri-li-2.0.1_3 -- Drive a toy wood train in many levels - snake-like arcade game ap20-mod_auth_pam-1.1.1_3 -- Allows users to use PAM modules for user authentication
SNMP4Nagios-0.4 -- Vendor specific SNMP plugins for Nagios ap20-mod_auth_pgsql-2.0.3_1 -- Allows users to use PostgreSQL databases for user authentication
SPE-0.8.4.h_2 -- Stani's Python Editor ap20-mod_auth_remote-1.0 -- Allows users to authenticate on a remote web server
STk-4.0.1_2 -- A scheme interpreter with full access to the Tk graphical package ap20-mod_auth_xradius-0.4.6 -- Enables RADIUS authentication
Sablot-1.0.3 -- XML toolkit implementing XSLT 1.0, XPath 1.0 and DOM Level2 ap20-mod_authenticache-2.0.8_1 -- A generic credential caching module for Apache 2.0.x
SciPlot-1.36_2 -- A full-featured Xt widget to display 2D data in a graph ap20-mod_backtrace-1.0 -- Collects backtraces when a child process crashes
SearchAndRescue-1.1.0 -- A flight simulator in which the player rescues people ap20-mod_bw-0.8 -- Bandwidth and Connection control per Virtual Host or Directory
SearchAndRescue-data-1.0.0 -- The data files for SearchAndRescue flight simulator ap20-mod_cband-0.9.7.5_2 -- A per-virtualhost bandwidth limiter module for Apache 2
SimGear-2.0.0_3 -- A toolkit for 3D games and simulations ap20-mod_cfg_ldap-1.2_1 -- Allows you to keep your virtual host configuration in a LDAP directory
SoQt-1.5.0_1 -- Qt4 toolkit library for Coin ap20-mod_cplusplus-1.5.4_1 -- Apache module for loading C++ objects as handlers
SoXt-1.2.2_8 -- GUI binding for using Open Inventor with Xt/Motif ap20-mod_cvs-0.5.91_1 -- A module that makes Apache 2 CVS aware
Sockets-2.3.9.2 -- A C++ wrapper for BSD-style sockets ap20-mod_domaintree-1.6 -- Hostname to filesystem mapper for Apache 2
SoftMaker-Office-2006_2 -- Microsoft Word/Excel OpenDocument and OpenOffice.org editor ap20-mod_extract_forwarded-2.0.2_2 -- An Apache module that can make proxied requests appear with client IP
SpecTcl-1.1_4 -- Free drag-and-drop GUI builder for Tk and Java from Sun ap20-mod_fcgid-2.3.5 -- An alternative FastCGI module for Apache2
TclExpat-1.1_6 -- The TCL interface to Expat library ap20-mod_fileiri-1.15 -- A http IRIs module for Apache 2
Tee-3.4 -- An enhanced version of tee(1) ap20-mod_flickr-1.0_1 -- Apache module for Flickr API access
TekNap-1.3.g_3 -- Console napster client ap20-mod_geoip2-1.2.5 -- An Apache module that provides the country code of the client's IP
TenDRA-4.20051112 -- A portable BSD-licensed compiler suite ap20-mod_gzip2-2.1.0_1 -- An Internet Content Acceleration module for Apache2+
Terminal-0.4.5 -- Terminal emulator for the X windowing system ap20-mod_jk-ap2-1.2.30_1 -- Apache2 JK module for connecting to Tomcat using AJP1X
TestU01-1.2.3_1 -- Utilities for statistical testing of uniform random number generators ap20-mod_layout-4.1 -- Apache2 module to wrap served pages with a header and/or footer
Thunar-1.0.2 -- XFce 4 file manager ap20-mod_limitipconn-0.23_2 -- Allows you to limit the number of simultaneous connexions
Tk-FileDialog-1.3_3 -- Tk::FileDialog - A file selector dialog for perl/Tk ap20-mod_lisp2-1.3.1_1 -- Apache2 module for use with Common Lisp
TkTopNetFlows-0.4_4 -- GUI tool for NetFlow data visualisation ap20-mod_log_config-st-1.0_1 -- A modified version of mod_log_config for apache2
Unreal-3.2.8.1_2 -- Unreal - the next generation ircd ap20-mod_log_data-0.0.3_1 -- Module for Apache 2.0 which logs incoming and outgoing data
UserManager-2.1 -- Easily create, change, or delete virtual PureFTPd users ap20-mod_log_firstbyte-1.01 -- Log the time between request and the first byte of the response served
VisualOS-1.0.5_8 -- A visual simulator of an operating system to help understand how OSes work ap20-mod_log_mysql-1.0_1 -- Allows Apache 2 to log to a MySQL database
WWWdb-0.8.3 -- A Perl based generic WWW DB interface / frontend ap20-mod_macro-1.1.6 -- Apache 2.0.x module for use macros in config files
WadcomBlog-0.3 -- Simple open-source static blog engine written in Python ap20-mod_mono-2.6.3 -- Apache module for serving ASP.NET applications
WebCalendar-1.0.5_2 -- A web-based calendar application ap20-mod_musicindex-1.3.5 -- Apache module that allows downloading and streaming of audio
WebCalendar-devel-1.2.1 -- A web-based calendar application ap20-mod_ntlm2-0.1_3 -- NTLM authentication module for the Apache2 webserver
WebMagick-2.03p3_39,1 -- Image Web Generator - recursively build HTMLs, imagemaps, thumbnails
ap20-mod_perl2-2.0.4_2,3 -- Embeds a Perl interpreter in the Apache2 server
WhistlerK-200010142358_5 -- A GTK theme engine inspired by the Windows Whistler ap20-mod_proctitle-0.3 -- Set httpd process titles to reflect currently processed request
WildMagic-4.p9 -- The Wild Magic Real-Time 3D Graphics Engine ap20-mod_proxy_xml-0.1 -- Apache module for rewriting URI references in XML
Wingz-142_2 -- A Commercial Spreadsheet ap20-mod_pubcookie-3.3.0 -- A single sign-on system for websites (apache module)
WordNet-3.0_2 -- Dictionaries and thesauri with devel. libraries (C, TCL) and browsers ap20-mod_roaming2-2.0.0 -- An Apache module that works as a Netscape Roaming Access server
WowzaMediaServerPro-1.7.2 -- Commercial flash media server written in java ap20-mod_rpaf-ap2-0.6 -- Make proxied requests appear with client IP
XBone-3.2_5 -- Deploys and manages IP-based VPNs (aka "virtual Internets") ap20-mod_security-2.5.12 -- An intrusion detection and prevention engine
XBone-GUI-3.2_5 -- The GUI for XBone, a tool to deploy and manage IP-based VPNs ap20-mod_security21-2.1.7 -- An intrusion detection and prevention engine
XNap-2.5.r3_3 -- A pure java napster client; also, supports OpenNap & giFT (FastTrack) ap20-mod_tidy-0.5.5 -- Validates the HTML output of your apache2 webserver
XPostitPlus-2.3_3 -- PostIt (R) messages onto your X11 screen ap20-mod_traf_thief-0.01 -- Allows you to redirect part of the traffic to your url
XScreenSaver.App-2.3_3 -- WindowMaker dockapp to control XScreenSaver ap20-mod_transform-0.6.0 -- An XSLT and XIncludes Filter module for Apache 2.0
Xaw3d-1.5E_4 -- A 3-D Athena Widget set that looks like Motif ap20-mod_tsa-1.0_1 -- Time stamping authority (RFC 3161) module for apache
XawPlus-3.1.0_4 -- A replacement for Xaw with a nicer 3-D look and some extensions ap20-mod_vdbh-1.0.3 -- Allows mass virtual hosting using a MySQL backend with Apache 2.0.x
Xbae-4.60.4 -- A Motif-based widget which displays a grid of cells as a spreadsheet ap20-mod_vhost_ldap-1.0_1 -- Virtual Hosting from ldap built on top of mod_ldap
XmHTML-1.1.7_9 -- A Motif widget set for displaying HTML 3.2 documents ap20-mod_whatkilledus-2.0 -- Logs a report when a child process crashes
ZendFramework-1.11.1 -- A framework for developing PHP web applications ap20-mod_xmlns-0.97 -- Apache module for XML namespaces
ZendOptimizer-3.3.0.a -- An optimizer for PHP code ap20-mod_xsendfile-0.12 -- An Apache2 module that processes X-SENDFILE headers
a2dev-1.2_1 -- Apple II 6502 assembler, linker, loader, and object file viewer ap22-mod_authn_sasl-1.1 -- Allows user authentication based on libsasl2 mechanisms on apache 2.2
a2pdf-1.13 -- Text to PDF converter ap22-mod_authnz_external-3.1.2_2 -- Allows users authentication based on external mechanisms on apache 2.2
a2png-0.1.5_4 -- Converts plain ASCII text into PNG bitmap images ap22-mod_authz_unixgroup-1.0.1_2 -- A unix group access control module for Apache 2.1 and later
a2ps-a4-4.13b_4 -- Formats an ascii file for printing on a postscript printer ap22-mod_clamav-0.23_4 -- Scans content delivered by the Apache20 proxy module for viruses
a2ps-letter-4.13b_4 -- Formats an ascii file for printing on a postscript printer ap22-mod_dnssd-0.6_8 -- An Apache module that provides DNS-SD capabilities
a2ps-letterdj-4.13b_4 -- Formats an ascii file for printing on a postscript printer ap22-mod_h264_streaming-2.2.7_1 -- Apache H264 streaming module
aXe-6.1.2_3 -- Simple to use text editor for X ap22-mod_layout-5.1_5 -- Apache2.2 module to wrap served pages with a header and/or footer
aa-56_2 -- Self-contained ephemeris calculator ap22-mod_line_edit-1.0.0_1 -- Apache module for simple text rewriting
aacgain-1.8 -- Normalizes the volume of mp3 and AAC (mp4/m4a/QuickTime) media files ap22-mod_log_dbd-0.2_3 -- Uses APR DBD to store Apache access logs in a database
aacplusenc-0.17.1 -- aacPlus v2 command-line encoder ap22-mod_log_sql-dtc-1.101_3 -- Allows Apache to log to a MySQL database
aafid2-0.10_3 -- A distributed monitoring and intrusion detection system ap22-mod_macro-1.1.11 -- Apache 2.2.x module for use macros in config files
aalib-1.4.r5_5 -- An ascii art library ap22-mod_memcache-0.1.0_4 -- Apache 2.2.x module to manage apr_memcache connections
aamath-0.3_1 -- Renders ASCII art from mathematical expressions ap22-mod_proxy_html-3.1.2 -- Apache module for rewriting HTML links in proxied content
aap-1.091 -- A build tool alternative to make with internet access and CVS support ap22-mod_python-3.3.1_3 -- Apache module that embeds the Python interpreter within the server
aaphoto-0.39_1 -- Auto Adjust Photo, automatic color correction of photos ap22-mod_remoteip-2.3.5.a -- Replaces the client IP address/hostname with that given by a proxy
abacus-0.9.13_4 -- Spread sheet for X Window System ap22-mod_smooth_streaming-1.0.8_1 -- Apache smooth streaming module
abakus-0.91_9 -- Michael Pyne's Abakus Calculator ap22-mod_vhs-1.1.0 -- Mass virtual hosting using mod_ldap or mod_dbd with Apache 2.2.x
abby-0.4.8_2 -- Front-end for c/clive apache+ipv6-1.3.42 -- The extremely popular Apache http server. Very fast, very clean
abc2mtex-1.6.1 -- Music TeX converter from "abc" to MusiXTeX format apache+mod_perl-1.3.42 -- The Apache 1.3 webserver with a statically embedded perl interpreter
abcde-2.3.3_4 -- Front-end shell script to encode CDs in flac/mp3/ogg/speex format apache+mod_ssl+ipv6-1.3.41+2.8.31_2 -- The Apache 1.3 webserver with SSL/TLS and IPv6 functionality
abck-2.2 -- Manage intrusion attemps recorded in the system log apache+mod_ssl-1.3.41+2.8.31_2 -- The Apache 1.3 webserver with SSL/TLS functionality
abcl-0.0.10_3 -- An implementation of ANSI Common Lisp in Java apache+ssl-1.3.41.1.59_1 -- Apache secure webserver integrating OpenSSL
abclock-1.0d_2 -- Clock for X that displays hours and minutes in an analog fashion apache-1.3.42 -- The extremely popular Apache http server. Very fast, very clean
abcm2ps-5.9.16 -- Converts ABC to music sheet in PostScript format apache-2.0.64 -- Version 2.0.x of Apache web server with prefork MPM.
abcmidi-2010.02.23 -- Convert abc music files to MIDI and PostScript apache-2.2.17_1 -- Version 2.2.x of Apache web server with prefork MPM.
abcselect-1.5 -- Extract parts, movements, etc from abc music files apache-ant-1.8.1 -- Java- and XML-based build tool, conceptually similar to make
abe-1.1_4 -- Abe's Amazing Adventure apache-contrib-1.0.8_1 -- Third-party modules contributed to the Apache HTTP server project
abgx360-1.0.5 -- Verify and repair Xbox 360 backup images apache-event-2.2.17_1 -- Version 2.2.x of Apache web server with event MPM.
abgx360gui-1.0.2_2 -- A wxWidgets frontend for abgx360 apache-forrest-0.8_3 -- A tool for rapid development of small sites
abi-compliance-checker-1.21.7 -- Checks binary compatibility of two versions of a C/C++ shared library apache-itk-2.2.17_1 -- Version 2.2.x of Apache web server with itk MPM.
abills-0.51 -- Billing system for dialup, VPN and VoIP management apache-mode.el-2.0 -- [X]Emacs major mode for editing Apache configuration files
abinit-5.7.3_8 -- Abinit calculates electronic structure of systems apache-peruser-2.2.17_1 -- Version 2.2.x of Apache web server with peruser MPM.
abiword-2.8.4_1 -- An open-source, cross-platform WYSIWYG word processor apache-solr-1.4.1 -- High performance search server built using Lucene Java
abiword-docs-2.8.4 -- AbiWord help files apache-tomcat-4.1.36_2 -- Open-source Java web server by Apache, stable 4.1.x branch
abntex-0.8.2_3 -- Both classes and styles for both LaTex and bibtex for ABNT rules apache-worker-2.2.17_1 -- Version 2.2.x of Apache web server with worker MPM.
abook-0.5.6_4 -- An addressbook program with mutt mail client support apache-xml-security-c-1.4.0 -- Apache XML security libraries C version
abraca-0.4_2 -- Abraca is a GTK2 client for the XMMS2 music player apachetop-0.12.6_2 -- Apache RealTime log stats
abs-0908_3 -- A free spreadsheet with graphical user interface apc-1.0_4 -- An xforms based Auto Payment Calculator
abuse-2.0_3 -- The classic 2D action game Abuse apcpwr-1.2_1 -- Control APC 9211 MasterSwitchs via snmp
abuse_sdl-0.7.1 -- An SDL port of the Abuse game engine apcupsd-3.14.8_1 -- Set of programs for controlling APC UPS
abyssws-2.6 -- Abyss Web Server is a compact and easy to use web server apel-emacs21-10.8 -- A Portable Emacs Library for emacs21
accerciser-1.12.1 -- Interactive Python accessibility explorer for GNOME apel-emacs22-10.8 -- A Portable Emacs Library for emacs22
accessx-0.951_5 -- Customise accessibility features for X apel-emacs23-10.8 -- A Portable Emacs Library for emacs
accrete-1.0 -- Accrete is a physical simulation of solar system planet formation apercu-1.0.2 -- Summarize information from Apache logs
ace+tao-5.4.2+1.4.2 -- The Adaptive Communication Environment (ACE) with The ACE ORB (TAO)
apertium-3.1.1 -- A toolbox to build shallow-transfer machine translation systems
ace+tao-doc-5.5.0 -- The ACE+TAO HTML documentation apg-2.3.0b_1 -- An automated password generator
ace-5.5.2_3 -- The Adaptive Communication Environment for C++ api-sanity-autotest-1.11 -- Quickly generate sanity tests for the API of a C/C++ shared library
acfax-0.981011_3 -- Receive faxes using sound card and radio apinger-0.6.1_2 -- An IP device monitoring tool
achievo-1.1.0_1 -- A flexible web-based resource management tool apngasm-2.2 -- Create Animated PNG from a sequence of files
acidlaunch-0.5_7 -- An application launcher with simple XML-based configuration syntax apollon-1.0.2.1_4 -- KDE client for giFT daemon
acidrip-0.14_8 -- GTK2::Perl wrapper for MPlayer and MEncoder for ripping DVDs apoolGL-0.99.22_4 -- Another billiard simulator
acidwarp-1.0 -- SVGAlib demo which displays trippy mathematical images in cycling colors app_notify-2.0.r1_6 -- Notify application module for the Asterisk PBX
aclgen-2.02 -- Optimize Cisco routers ip access lists apparix-20081026 -- Bookmark directories and apparate inside them
aclock-0.3 -- Analog Clock for GNUstep appres-1.0.2 -- Program to list application's resources
acm-5.0_2 -- A flight simulator for X11 appwrapper-0.1_2 -- GNUstep application wrapper
acovea-5.1.1_1 -- Tool to find the "best" compiler options using genetic algorithm apr-0.9.19.0.9.19 -- Apache Portability Library
acovea-gtk-1.0.1_5 -- GTK+ front-end to ACOVEA apr-ipv6-devrandom-gdbm-db42-1.4.2.1.3.10 -- Apache Portability Library
acpicatools-20030523.0 -- Some utilities for Intel ACPICA (Debugger, ASL Compiler and etc.)
apr-ipv6-devrandom-gdbm-db42-2.0.20100610211336_1 -- Apache Portability Library
acrobatviewer-1.1_2 -- Viewer for the PDF files written in Java(TM) apricots-0.2.6_2 -- Fly a little plane around and shoot things and drop bombs
acron-1.0 -- Database of acronyms and abbreviations aprsd-2.2.515 -- Server daemon providing Internet access to APRS packet data
acroread8-8.1.7_2 -- Adobe Reader for view, print, and search PDF documents (ENU) apsfilter-7.2.8_8 -- Magic print filter with print preview, duplex printing and more
acroread9-9.3.4 -- Adobe Reader for view, print, and search PDF documents (ENU) apt-0.6.46.4.1_5 -- Advanced front-end for dpkg
acroreadwrapper-0.0.20100806 -- Wrapper script for Adobe Reader apvlv-0.0.9.8_1 -- Apvlv is a PDF Viewer Under Linux and its behaviour like Vim
activemq-5.4.1 -- Messaging and Integration Patterns provider apwal-0.4.5_9 -- Simple and powerful application launcher
activitymail-1.26 -- A program for sending email messages for CVS repository commits aqbanking-4.2.4_3 -- Online banking interface and financial data framework
actx-1.23_2 -- Window sitter for X11 aqbubble-0.3_10 -- Game similar to snow bros
acx-6.1,1 -- Texas Instruments (TI) ACX100 and ACX111 IEEE 802.11 driver aqemu-0.8.0 -- Qt4 based Qemu frontend
adabooch-20030309 -- Library which provide container classes as well as powertools for Ada
aqmoney-0.6.3 -- Manage your credit institute accounts using openhbci
adabooch-doc-20030309 -- Manual for adabooch aqsis-1.6.0_3 -- A photorealistic rendering system
adacurses-5.7 -- Curses library for Ada aqualung-0.9.b11_9 -- Music player with rich features
adamem-1.0_2 -- ADAMEm is a portable Coleco ADAM and ColecoVision emulator ar-ae_fonts1_ttf-1.1_2 -- A collection of truetype Arabic fonts created by Arabeyes.org
adasdl-20010504_9 -- An Ada thin binding to SDL ar-ae_fonts_mono-1.0_2 -- A collection of PCF fonts that include Arabic glyphs
adblock-0.5.d_6 -- A content filtering plug-in for seamonkey ar-arabtex-3.11_4 -- A TeX/LaTeX package to generate the arabic writing
adcomplain-3.52 -- Complain about inappropriate commercial use (f.e. SPAM) of usenet/e-mail
ar-aspell-1.2.0_1,1 -- Aspell Arabic dictionaries
add-20100708 -- Full-screen editing calculator ar-kacst_fonts-2.01 -- Truetype Arabic fonts created by KACST
add-css-links-1.0_1 -- Add one or more CSS <link> elements to an XHTML document ar-kde-i18n-3.5.10_4 -- Arabic messages and documentation for KDE3
addresses-0.4.7_2 -- A versatile addressbook for GNUstep ar-kde-l10n-4.5.4 -- Arabic messages and documentation for KDE4
addresses-goodies-0.4.7_1 -- Goodies for addressbook for GNUstep ar-khotot-1.0_2 -- A meta-port of the most popular Arabic font packages
adesklets-0.6.1_8 -- An interactive Imlib2 console for the X Window system ar-koffice-i18n-1.5.2_6 -- Arabic messages and documentation for koffice
adgali-0.2.4_8 -- An open source game library useful for 2D game development ar-libitl-0.7.0 -- An API abstraction to common Islamic calculations
adime-2.2.1_2 -- Generate Allegro dialogs in a very simple way arc-5.21o_1 -- Create & extract files from DOS .ARC files
admesh-0.95_1 -- Program for processing STL triangulated solid meshes arcconf-v6.50.18570 -- Adaptec SCSI RAID administration tool
adminer-3.1.0 -- A full-featured MySQL management tool written in PHP archivemail-0.8.2 -- Archive or delete mail older than N days
adms-2.2.9 -- A model generator for SPICE simulators archivemount-0.6.0 -- Mount archives with FUSE
admuser-2.3.2 -- Handle your Squid or Web users and passwords using your browser archiveopteryx-3.1.3 -- An advanced PostgreSQL-based IMAP/POP server
adns-1.4_1 -- Easy to use, asynchronous-capable DNS client library and utilities archivesmtp-1.1.b1 -- SMTP mail archiver
adobe-cmaps-20051217_1 -- Adobe CMap collection archmage-0.2.4 -- Extensible reader/decompiler of files in CHM format
adocman-0.13_1 -- Automated sourceforge administration tool archmbox-4.10.0 -- Email archiver written in perl; parses mailboxes and performs actions
adodb-4.99.2 -- Database library for PHP ardour-2.8.2_4 -- A multichannel digital audio workstation
adodb-5.11 -- Database library for PHP arduino-0019 -- Open-source electronics prototyping platform
adom-1.1.1_2 -- An rogue-like advanced rpg with color support (binary port) areca-cli-i386-1.83.091103 -- Command Line Interface for the Areca ARC-xxxx RAID controllers
adonthell-0.3.5_6 -- A free role playing game arena-0.9.13 -- C-like scripting language with automatic memory management
adpcm-1.2 -- An Intel/DVI IMA ADPCM codec library ares-1.1.1 -- An asynchronous DNS resolver library
adplay-1.7_3 -- AdLib player using adplug library argouml-0.30.2_1 -- A UML design tool with cognitive support
adstudio-9.0.5 -- A database query and administration tool argp-standalone-1.3_2 -- Standalone version of arguments parsing functions from GLIBC
adtool-1.3_1 -- Active Directory administration tool argtable-2.12 -- An ANSI C library for parsing GNU style command line arguments
adun-0.81 -- Molecular Simulator for GNUstep argus-2.0.6_1 -- A generic IP network transaction auditing tool
advancecomp-1.15 -- Recompression utilities for .ZIP, .PNG, .MNG and .GZ files argus-clients-2.0.6_1 -- Client programs for the argus IP network transaction auditing tool
advancemame-0.106.1 -- SDL MAME port with advanced TV and monitor video support argus-clients-sasl-3.0.2 -- Client programs for the argus IP network transaction auditing tool
advancemenu-2.5.0 -- A frontend for AdvanceMAME, MAME, MESS, RAINE argus-monitor-20060722_4 -- Argus - The All Seeing System and Network Monitoring Software
advancemess-0.102.0.1_2 -- SDL MESS port with advanced TV and monitor video support argus-sasl-3.0.2 -- A generic IP network transaction auditing tool
advi-1.9 -- Active-DVI viewer ari-yahoo-1.10_3 -- A console Yahoo! messenger client
adzap-20090301 -- Filter out animated ad banners from web pages aria-1.0.0_5 -- Yet another download tool
aee-2.2.15b_1 -- An easy editor with both curses and X11 interfaces aria2-1.10.0 -- Yet another download tool
aegis-4.24_5 -- Transaction-based software configuration management system aria2fe-0.0.5_3 -- Aria2 QT front-end
aegisub-2.1.8_2 -- Aegisub Project is a cross-platform subtitle editor ariadne-1.3 -- Programs to compare protein sequences and profiles
aescrypt-0.7_1 -- A command-line AES encryption/decryption suite aribas-1.64 -- Interpreter for big integer/multi-precision floating point arithmetic
aeskulap-0.2.1_1 -- A medical image viewer ario-1.5 -- Ario is a GTK2 client for MPD
aespipe-v2.3.e -- An AES encrypting or decrypting pipe arirang-2.00,1 -- Powerful webserver security scanner for network
aestats-5.39 -- An advanced HTML statistics generator for various games arista-0.9.5 -- An easy to use multimedia transcoder for the GNOME Desktop
aewan-1.0.01 -- Curses-based program for the creation and editing of ascii art arj-3.10.22_4 -- Open-source ARJ
aewm-1.2.7_3 -- ICCCM-compliant window manager based on 9wm arkpandora-2.04_2 -- Arkpandora TrueType fonts
af-aspell-0.50.0_1,1 -- Aspell Afrikaans dictionary arm-elf-binutils-2.14_2 -- GNU binutils for vanilla ARM cross-development
af-kde-i18n-3.5.10_4 -- Afrikaans localized messages and documentation for KDE3 arm-rtems-binutils-2.20 -- GNU binutils port for cross-target development
af-koffice-i18n-1.5.2_6 -- Afrikaans messages and documentation for koffice arm-rtems-gcc-4.4.2_2 -- GNU gcc for cross-target development
afay-041111 -- Improved aflex and ayacc Ada 95 native scanner and parser generators arm-rtems-gdb-7.1 -- GNU gdb port for cross-target development
afbinit-1.0_4 -- Sun AFB aka Sun Elite 3D microcode firmware loader armagetron-0.2.8.2.1_5 -- A multiplayer networked Tron clone in 3D
affenspiel-1.0_2 -- Little puzzle game with monkey for X Window System arora-0.11.0 -- Simple Qt4 based browser
affiche-0.6.0_2 -- Affiche allows people to stick notes aros-sdk-0.20060207 -- The Software development kit (SDK) for the AROS Operating System
afflib-3.6.4 -- The Advanced Forensics Format library and utilities arp-scan-1.7 -- ARP Scanning and Fingerprinting Tool
afio-2.5 -- Archiver & backup program w/ builtin compression arp-sk-0.0.16_2 -- A tool designed to manipulate ARP tables of all kinds of equipment
afm-1.0 -- Adobe Font Metrics arpack++-1.2_3 -- ARPACK++ is an object-oriented version of the ARPACK package
afni-2008.01.02.1043_5 -- Advanced Functional Neuro Imaging arpack-96_7 -- Argand Library: large eigenvalue subroutines (serial version)
afnix-1.9.0 -- A multi-threaded functional programming language arpalert-2.0.11_1 -- ARP traffic monitoring
afsp-8.2 -- Audio file conversion utilities and library arpdig-0.5.2 -- ARP Digger utility
aft-5.098,1 -- A document preparation system using an Almost Free Text input format arping-2.09 -- ARP level "ping" utility
aften-0.0.8 -- ATSC A/52 audio encoder arprelease-1.2_2 -- Libnet tool to flush arp cache entries from devices (eg. routers)
afterglow-1.6.0 -- A collection of graph-generating scripts arpscan-0.3 -- Simple arp scanner
afternoonstalker-1.1.4 -- A clone of the 1981 Night Stalker video game arpwatch-2.1.a15_6 -- Monitor arp & rarp requests
afterstep-1.0_3 -- Window manager originally based on the Bowman NeXTstep clone arss-0.2.3 -- Additive Image Synthesizer (convert audio to images, images to audio)
afterstep-2.2.9_2 -- A stable version of the AfterStep window manager artemis-9_1 -- A DNA sequence viewer and annotation tool
afterstep-i18n-1.0_4 -- The NeXTstep clone window manager with Fontset support arts++-1.1.a13,1 -- A network data storage and analysis library from CAIDA
aftp-1.0 -- A ftp-like shell for accessing Apple II disk images arts-1.5.10_5,1 -- Audio system for the KDE integrated X11 desktop
agame-1577_7 -- A simple tetris-like game artswrapper-1.5.3 -- Setuid wrapper for arts
agave-0.4.2_8 -- A color scheme builder for the GNOME desktop artwiz-aleczapka-de-1.3_2 -- A set of (improved) artwiz fonts
agef-3.0 -- Show disk usage of file sizes and counts sorted by file age artwiz-aleczapka-en-1.3_2 -- A set of (improved) artwiz fonts
aget-0.4.1 -- A multithreaded HTTP download accelerator artwiz-aleczapka-se-1.3_2 -- A set of (improved) artwiz fonts
agg-2.5_6 -- A High Quality Rendering Engine for C++ artwiz-fonts-1.0_3 -- A set of free fonts for X11 desktops
aggregate-1.6_1 -- Optimise a list of route prefixes to help make nice short filters as31-2.0.b3_6 -- A free 8051 assembler
agrep-2.04_2 -- Approximate grep (fast approximate pattern-matching tool) asWedit-4.0.1_3 -- An easy to use HTML and text editor
aguri-0.7_1 -- An Aggregation-based Traffic Profiler asapm-3.1_2 -- Laptop battery status display for X11
ah-tty-0.3.12 -- Ah-tty is an automatic helper for command prompts and shells asbutton-0.3_3 -- A dockapp that displays 4 or 9 buttons to run apps of your choice
ahwm-0.90_2 -- An X11 window manager asc-2.4.0.0 -- A turn based, multiplayer strategic game with very nice graphics
aide-0.13.1_3 -- A replacement and extension for Tripwire ascd-0.13.2_1 -- A dockable cd player for AfterStep or WindowMaker
aifad-1.0.27_2 -- Machine learning system ascii2binary-2.14 -- Convert between textual representations of numbers and binary
aiksaurus-1.2.1_2 -- A set of libraries and applications which provide a thesaurus ascii2pdf-0.9.1 -- A perl script to convert text files to PDF files
aiksaurus-gtk-1.2.1_10 -- A GTK+2 front-end for Aiksaurus, a thesaurus asciidoc-8.6.1 -- A text document format for writing short documents and man pages
aim-1.5.286_4 -- AOL's Instant Messenger (AIM) client asciio-1.02.71_2 -- A Perl/GTK application that lets you draw
aimage-3.2.4 -- Advanced Disk Imager
aimsniff-0.9d -- AOL Instant Messanger Sniffing and Reading Tool
aircrack-ng-1.1 -- An 802.11 WEP and WPA-PSK keys cracking program
airoflash-1.7 -- Flash utiltity for Cisco/Aironet 802.11 wireless cards
airport-2.0.1_3 -- Apple Airport / Lucent RG-1000 configuration program
airrox-0.0.4_8 -- An 3D Air Hockey, which uses SDL & OpenGL
aish-1.13 -- Ish/uuencode/Base64 converter
akamaru-0.1_6 -- Simple, but fun, physics engine prototype
akode-2.0.2_1,1 -- Default KDE audio backend
akode-plugins-ffmpeg-2.0.2_1,1 -- FFMPEG decoder plugin for akode
akode-plugins-jack-2.0.2,1 -- Jack output plugin for akode
akode-plugins-mpc-2.0.2,1 -- Musepack decoder plugin for akode
akode-plugins-mpeg-2.0.2,1 -- MPEG audio decoder plugin for akode
akode-plugins-oss-2.0.2,1 -- OSS output plugin for akode
akode-plugins-pulseaudio-2.0.2_4 -- Pulseaudio output plugin for akode
akode-plugins-resampler-2.0.2,1 -- Resampler plugin for akode
akode-plugins-xiph-2.0.2_3,1 -- FLAC/Speex/Vorbis decoder plugin for akode
akonadi-1.4.1_1 -- Storage server for kdepim
akonadi-googledata-1.1.0_1 -- Akonadi Resources for Google Contacts and Calendar
akpop3d-0.7.7 -- POP3 daemon aimed to be small and secure
alabastra-0.21b_1 -- C++ Editor writen with QT4
alac-0.2.0 -- Basic decoder for Apple Lossless Audio Codec files (ALAC)
alacarte-0.13.2 -- An editor for the freedesktop.org menu specification
alarm-clock-1.4 -- Alarm Clock for the GNOME desktop
albumart-1.6.6_3 -- GUI application for downloading album cover art
albumshaper-2.1_4 -- A drag-n-drop hierarchal photo album creation
ald-0.1.7 -- Debugger for assembly level programs
aldo-0.7.5_2 -- Morse code training program
ale-0.8.11.2_6 -- Anti-Lamenessing Engine
alephone-20100424_1 -- The open source version of Bungie's Marathon game
alephone-data-1.0_6 -- Released Marathon data files for the Aleph One port
alephone-scenarios-1.0_3 -- Free scenarios for the Aleph One engine
alevt-1.6.2_1 -- X11 teletext decoding and display program
alf-0.1_1 -- Abstract Large File
algae-4.3.6_4 -- A programming language for numerical analysis
algol68g-2.0.3 -- Alogol 68 Genie compiler
algotutor-0.8.6_3 -- An interactive tutorial for algorithms and data structures
alienarena-2010.745 -- Alien Arena (native version)
alienarena-data-2010.745 -- Alien Arena (data)
am-aspell-0.03.1_1,2 -- Aspell Amharic dictionary
am-utils-6.1.5,1 -- The Berkeley Automounter Suite of Utilities
amanda-client-2.5.1p3_4,1 -- The Advanced Maryland Automatic Network Disk Archiver (client)
amanda-client-2.6.1p2_3,1 -- The Advanced Maryland Automatic Network Disk Archiver (client)
amanda-client-3.2.0_2,1 -- The Advanced Maryland Automatic Network Disk Archiver (client)
amanda-perl-wrapper-1.01 -- Perl wrapper to use with Amanda (with libthr.so.* linked)
amanda-server-2.5.1p3_7,1 -- The Advanced Maryland Automatic Network Disk Archiver (server)
amanda-server-2.6.1p2_3,1 -- The Advanced Maryland Automatic Network Disk Archiver (server)
amanda-server-3.2.0_2,1 -- The Advanced Maryland Automatic Network Disk Archiver (server)
amanith-0.3_8 -- C++ CrossPlatform framework designed for 2d & 3d vector graphics
amap-5.2 -- Application mapper
amarok-1.4.10_12 -- Media player for KDE
amarok-2.3.2 -- Media player for KDE4
amarok-fs-0.5_9 -- A full screen application for Amarok
practice
Crypto algorithms search results.
Cryptographic Algorithm    Implementations Detected
MD2                        6
MD4                        49
MD5                        920
SHA-1                      136
SHA-2                      192
AES                        39
Total                      1,342
ments: “huh?,” “sigh,” and “aghast.”
The “huh?” segment wonders what the big deal is: the absence of a standardized system library with these functions means that you have to “Bring Your Own Crypto” if you want some.
The “sigh” segment thinks this is the least of our troubles.
The “aghast” segment will see this as a total failure of good software engineering practices, a call to arms for better education, and reason for a stake through the heart of the Open Zombie Group.
And they are all correct, of course, each from its own vantage point.
Fortunately, what this is not, is The Next Big Security Issue, even though I would not be surprised if one or more “security researchers” would claim so from their parents’ basement.[b] If these had been independent implementations, then there would be reason to worry about the security implications, but they are not.
[b] The fact that MD5 seems to be more in demand (yes, I may indeed be to blame for that myself, but that is a story for another day; search for “md5crypt” if you cannot wait) than its quality warrants is a matter of choice of algorithm, not a matter of implementation of the algorithm chosen.
In a few cases, optimized or license-sanitized versions have been written, but overwhelmingly this is just pointless copy-and-paste of identical source code in blatant disregard of Occam’s three-quarters-millennia-old advice.
I am a card-carrying member of the “aghast” segment. My membership card is a FreeBSD commit message shown in the figure here.
My libmd, which is as unencumbered by copyright issues as it can be, later grew more cryptographic hash algorithms, such as RIPEMD-160 and the SHA family, and it has been adopted by some other operating systems.
I am also in the “sigh” segment, because not all mainstream operating systems have adopted libmd, despite having 16 years to do so, and if they have, they do not agree what should be in it. For example, Solaris seems to leave MD2 out (see http://hub.opensolaris.org/bin/view/Project+crypto/libmd), which begs the question: Which part of “software portability” don’t they understand?
I am, sadly, also in the “huh?” segment, because there seems to be no hope. The rational thing to expect would be that somebody from The Open Group reads this article, reproduces my statistics, and decides that yes, there is indeed demand for a “libstdcrypto” filled with the usual bunch of crypto algorithms. That, I am told, is impossible. The Open Group does not write new standards; they just bicker over the usability of ${.CURDIR} in make(1) and probably also the potential market demand for fire that can be applied nasally.
The other possible avenue of hope is that the ISO-C standardization group would address this embarrassing situation. Before getting your hopes too high, bear in mind they have still not managed to provide for specification of integer endianness, even though CPUs can do it and hardware and protocols have needed it since the days of the ARPANET.
If the ISO-C crew decided to do it, their process for doing so would undoubtedly consume 5–10 years before a document came out at the other end, by which time SHA-3 would likely be ready, rendering the standard instantly obsolete.
But it is all a pipe dream, if ISO is still allergic to standards with ITAR restrictions. And you can forget everything about a benevolent dictator laying down the wise word as law: Linus doesn’t do userland.
To be honest, what I have identified here is probably the absolutely worst-case example.
First, if you need SHA-2, you need SHA-2, and it has to do the right and correct thing for SHA-2. There is little or no room for creativity or improvements, apart from performance.
Second, crypto algorithms are everywhere these days. Practically all communication methods, from good old email over VPNs (virtual private networks) and torrent sites to VoIP (voice over IP), offers strong crypto.
But aren’t those exactly the same two reasons why we should not be in this mess to begin with?
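The point that SHA-2 "has to do the right and correct thing" can be made concrete with a known-answer test: any conforming implementation, copied or not, must reproduce the published digests bit for bit. The following is a minimal sketch, assuming Python's standard hashlib as the implementation under test; the helper names conforms, stdlib_sha256, and broken_sha256 are hypothetical and exist only for illustration.

```python
import hashlib

# Published SHA-256 known-answer vectors: (message, expected hex digest).
SHA256_VECTORS = [
    (b"", "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"),
    (b"abc", "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"),
]


def conforms(sha256_hex, vectors=SHA256_VECTORS):
    """Return True if sha256_hex(message) matches every known-answer vector."""
    return all(sha256_hex(msg) == expected for msg, expected in vectors)


def stdlib_sha256(message):
    """Candidate implementation: the SHA-256 shipped with Python's hashlib."""
    return hashlib.sha256(message).hexdigest()


def broken_sha256(message):
    """Hypothetical 'creative' variant that hashes a modified input (wrong on purpose)."""
    return hashlib.sha256(message + b"\x00").hexdigest()


if __name__ == "__main__":
    print("stdlib conforms:", conforms(stdlib_sha256))
    print("broken conforms:", conforms(broken_sha256))
```

Because the expected digests are fixed, two correct implementations can differ only in speed or licensing, which is why a private copy in every package buys nothing that one shared library would not.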
Related articles
A card-carrying member of the “aghast” segment. on queue.acm.org
Languages, Levels, Libraries, and Longevity
John R. Mashey
src/lib/libmd/Makefile: http://queue.acm.org/detail.cfm?id=1039532
Gardening Tips
r1802 | phk | 1994-07-24 03:29:56 +0000 (Sun, 24 Jul 1994) Kode Vicious
http://queue.acm.org/detail.cfm?id=1870147
Imported libmd. This library contains MD2, MD4, and MD5. Poul-Henning Kamp (phk@FreeBSD.org) has
programmed computers for 26 years and is the inspiration
These three boggers pop up all over the place all of the time, so I behind bikeshed.org. His software has been widely
adopted as “under the hood” building blocks in both open
decided we needed a library with them. In general, they are used for source and commercial products. His most recent project
is the Varnish HTTP accelerator, which is used to speed up
large Web sites such as Facebook.
security checks, so if you use them you want to link them static.
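The column's point that SHA-2 "has to do the right and correct thing" can be made concrete with a known-answer test: any implementation, whichever library it ships in, must reproduce the published digests bit for bit. The sketch below is only an illustration and is not taken from the article. It checks Python's standard `hashlib` against two widely published SHA-256 vectors; the helper names `passes_known_answers` and `stdlib_sha256` are made up for this example.

```python
import hashlib

# Widely published known-answer vectors for SHA-256 (a member of the SHA-2 family).
KNOWN_ANSWERS = {
    b"": "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855",
    b"abc": "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad",
}


def passes_known_answers(digest_fn):
    """Return True only if digest_fn reproduces every known digest exactly."""
    return all(digest_fn(msg) == expected for msg, expected in KNOWN_ANSWERS.items())


def stdlib_sha256(message):
    """Hex digest from the standard library implementation."""
    return hashlib.sha256(message).hexdigest()


def broken_sha256(message):
    """Hypothetical 'creative' implementation: right output length, wrong digest."""
    return "0" * 64


ok = passes_known_answers(stdlib_sha256)
rejected = not passes_known_answers(broken_sha256)
print(ok, rejected)
```

A test of this shape leaves no room for "improvements" other than speed, which is the property that makes such algorithms natural candidates for one shared library.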
contributed articles
DOI:10.1145/1897852.1897871
Plug-and-Play Macroscopes
Compose “dream tools” from continuously evolving bundles of software to make sense of complex scientific data sets.
By Katy Börner

key insights
• OSGi/CIShell-powered tools improve decision making in e-science, government, industry, and education.
• Non-programmers can use OSGi/CIShell to assemble custom “dream tools.”
• New plug-ins are retrieved automatically via OSGi update services or shared via email and added manually; they can be plugged and played dynamically, without restarting the tool.

Decision making in science, industry, and politics, as well as in daily life, requires that we make sense of data sets representing the structure and dynamics of complex systems. Analysis, navigation, and management of these continuously evolving data sets require a new kind of data-analysis and visualization tool we call a macroscope (from the Greek macros, or “great,” and skopein, or “to observe”) inspired by de Rosnay’s futurist science writings.8
Just as the microscope made it possible for the naked human eye to see cells, microbes, and viruses, thereby advancing biology and medicine, and just as the telescope opened the human mind to the immensity of the cosmos and the conquest of space, the macroscope promises to help make sense of yet another dimension: the infinitely complex. Macroscopes provide a “vision of the whole,” helping us “synthesize” the related elements and detect patterns, trends, and outliers while granting access to myriad details.18,19 Rather than make things larger or smaller, macroscopes let us observe what is at once too great, slow, or complex for the human eye and mind to notice and comprehend.
Many of the best micro-, tele-, and macroscopes are designed by scientists keen to observe and comprehend what no one has seen or understood before. Galileo Galilei (1564–1642) recognized the potential of a spyglass for the study of the heavens, ground and polished his own lenses, and used the improved optical instruments to make discoveries like the moons of Jupiter, providing quantitative evidence for the Copernican theory. Today, scientists repurpose, extend, and invent new hardware and software to create macroscopes that may solve both local and global challenges20 (see the sidebar “Changing Scientific Landscape”).
My aim here is to inspire computer scientists to implement software frameworks that empower domain scientists to assemble their own continuously evolving macroscopes, adding and upgrading existing (and removing obsolete) plug-ins to arrive at a set that is truly relevant for their work, with little or no help from computer scientists. Some macroscopes may resemble cyberinfrastructures (CIs),1 providing user-friendly access to massive amounts of data, services, computing resources, and expert communities. Others may be Web services or standalone tools. While microscopes and telescopes are physical instruments, macroscopes resemble continuously changing bundles of software plug-ins. Macroscopes make it easy to select and combine algorithm and tool plug-ins but also interface plug-ins, workflow support, logging, scheduling, and other plug-ins needed for scientifically rigorous work. They make it easy to share
plug-ins via email, flash drives, or online. To use new plug-ins, simply copy the files into the plug-in directory, and they appear in the tool menu ready for use. No restart of the tool is necessary. Sharing algorithm components, tools, and novel interfaces becomes as easy as sharing images on Flickr or videos on YouTube. Assembling custom tools is as quick as compiling your custom music collection.

UCSD Map of Science with data overlays of MEDLINE publications that acknowledge NIH funding. (Courtesy of Cyberinfrastructure for Network Science Center, http://cns.iu.edu)

The macroscopes presented here were built using the Open Services Gateway Initiative Framework (OSGi) industry standard and the Cyberinfrastructure Shell (CIShell) that supports integration of new and existing algorithms into simple yet powerful tools. As of January 2011, six different research communities were benefiting from OSGi and/or CIShell powered tools. Several other tool-development efforts are considering adoption.
Related Work
Diverse commercial and academic efforts support code sharing; here, I discuss those most relevant for the design and deployment of plug-and-play macroscopes:
Google Code and SourceForge.net provide the means to develop and distribute software; for example, in August 2009, SourceForge.net hosted more than 230,000 software projects by two million registered users (285,957 in January 2011); also in August 2009 ProgrammableWeb.com hosted 1,366 application programming interfaces and 4,092 mashups (2,699 APIs and 5,493 mashups in January 2011) that combine data or functionality from two or more sources to arrive at a service.
Web services convert any Web browser into a universal canvas for information and service delivery. In addition, there are diverse e-science infrastructures supporting researchers in the composition and execution of analysis and/or visualization pipelines or workflows. Among them are several cyberinfrastructures serving large biomedical communities: the cancer Biomedical Informatics Grid (caBIG) (http://cabig.nci.nih.gov); the Biomedical Informatics Research Network (BIRN) (http://nbirn.net); and the Informatics for Integrating Biology and the Bedside (i2b2) (https://www.i2b2.org). The HUBzero (http://hubzero.org) platform for scientific collaboration uses the Rappture toolkit to serve Java applets, employing the TeraGrid, the Open Science Grid, and other national grid-computing resources for extra cycles. The collaborative environment of myExperiment (http://myexperiment.org) (discussed later) supports the sharing of scientific workflows and other research objects.
Missing so far is a common standard for the design of modular, compatible algorithm and tool plug-ins (also called modules or components) that are easily combined into scientific workflows (also called pipelines or compositions). This leads to duplication of work, as even in the same project, different teams might develop several “plug-ins” that have almost identical functionality yet are incompatible. Plus, adding a new algorithm plug-in to an existing cyberinfrastructure or bundling and deploying a subset of plug-ins as a new tool/service requires extensive programming skills. Consequently, many innovative new algorithms are never integrated into common CIs and tools due to resource limitations.
Web sites like IBM’s Many Eyes (http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations) and Swivel (http://swivel.com) demonstrate the power of community data sharing and visualization. In 2009 alone, Many Eyes
had more than 66,429 data sets and 35,842 visualizations, while Swivel had 14,622 data sets and 1,949,355 graphs contributed and designed by 12,144 users. Both sites let users share data (not algorithms), generate and save different visualization types, and provide community support. By January 2011, the numbers for Many Eyes had increased to 165,124 data sets and 79,115 visualizations, while Swivel had ceased to exist.

Figure 1. The NWB tool interface (I) with menu (a), Console (b), Scheduler (c), and Data Manager (d). The two visualizations of Renaissance Florentine families used the GUESS tool plug-in (II) and prefuse.org algorithm plug-in (III). Nodes denote families labeled by name; links represent marriage and business relationships among families. In GUESS, nodes are size-coded by wealth and color-coded by degree; marriage relationships are in red using the Graph Modifier (d). The “Pazzi” family in (c) was selected to examine properties in the Information Window (b).

Data analysis and visualization is also supported by commercial tools like Tableau (http://tableausoftware.com), Spotfire (http://spotfire.tibco.com), and free tools; see Börner et al.6 for a review of 20 tools and APIs. While they offer valuable functionality and are widely used in research, education, and industry, none makes it easy for users to share and bundle their algorithms into custom macroscopes.
Plug-and-Play Software Architectures
When discussing software architectures for plug-and-play macroscopes, it is beneficial to distinguish among: (1) the “core architecture” facilitating the plug-and-play of data sets and algorithms; (2) the “dynamic filling” of this core comprising the actual algorithm, tool, user interface, and other plug-ins; and (3) the bundling of all components into “custom tools.” To make all three parts work properly, it is important to understand who takes ownership of which ones and what general features are desirable (see the sidebar “Desirable Features and Key Decisions”).
Core architecture. To serve the needs of scientists (see both sidebars) the core architecture must empower non-programmers to plug, play, and share their algorithms and to design custom macroscopes and other tools. The solution proposed here is based on OSGi/CIShell:
Open Services Gateway Initiative. Developed by the OSGi Alliance (http://osgi.org), this service platform has
been used since 1999 in industry, including by Deutsche Telekom, Hitachi, IBM, Mitsubishi Electric, NEC, NTT, Oracle, Red Hat, SAP AG, and Siemens Enterprise Communications. It is a dynamic module system for Java, supporting interoperability of applications and services in a mature and comprehensive way with an effective yet efficient API. The platform is interface-based, easing plug-and-play integration of independent components by managing class and dependency issues when combining components. It is also dynamic; that is, new components can be added without stopping the program. It also comes with a built-in mechanism for retrieving new components through the Internet. As a service-oriented architecture, OSGi is an easy way to bundle and pipeline algorithms into “algorithm clouds.” A detailed description of the OSGi specification and existing reference implementations is beyond the scope of this article but can be explored through http://www.osgi.org/Specifications.
Leveraging OSGi provides access to a large amount of industry-standard code (prebuilt, pretested, continuously updated components) and know-how that would otherwise take years to reinvent/re-implement, thus helping reduce time to market as well as development and maintenance costs. OSGi bundles can be developed and run using a number of frameworks, including the Equinox project from Eclipse (http://eclipse.org/equinox), the reference implementation of the OSGi R4 core framework. Eclipse includes extensive add-ons for writing and debugging code, interacting with code repositories, bug tracking, and software profiling that greatly extend the base platform.
Cyberinfrastructure Shell (http://cishell.org). This open-source software specification adds “sockets” to OSGi into which data sets, algorithms, and tools can be plugged using a wizard-driven process.11 CIShell serves as a central controller for managing data sets and seamlessly exchanging data and parameters among various implementations of algorithms. It also defines a set of generic data-model APIs and persistence APIs. Extending the data-model APIs makes it possible to implement and integrate various data-model plug-ins. Each data model requires a “persister” plug-in to load, view, and save a data set from/to a data file in a specific format. Some data models lack a persister plug-in, instead converting data to or from some other data format that does have one. CIShell also defines a set of algorithm APIs that allows developers to develop and integrate diverse new or existing algorithms as plug-ins.
Though written in Java, CIShell supports integration of algorithms written in other programming languages, including C, C++, and Fortran. In practice, a pre-compiled algorithm must be wrapped as a plug-in that implements basic interfaces defined in the CIShell Core APIs. Pre-compiled algorithms can be integrated with CIShell by providing metadata about their input and output. Various templates are available for facilitating integration of algorithms into CIShell. A plug-in developer simply fills out a sequence of forms for creating a plug-in and exports it to the installation directory, and the new algorithm appears in the CIShell graphical user interface (GUI) menu. This way, any algorithm or tool that can be executed from a command line is easily converted into a CIShell-compatible plug-in.
CIShell’s reference implementation also includes a number of basic services, including a default menu-driven interface, a work-log-tracking module, a data manager, and a scheduler (see Figure 1, left). Work logs (displayed in a console and saved in log files) comprise all algorithm calls and parameters used, references to original papers and online documentation, data loaded or simulated, and any errors. The algorithm scheduler shows all currently scheduled or running processes, along with their progress. CIShell can be deployed as a standalone tool or made available as either a Web or peer-to-peer service. The CIShell Algorithm Developer’s Guide7 details how to develop and integrate Java and non-Java algorithms or third-party libraries.
OSGi/CIShell combined. Software designed using OSGi/CIShell is mainly a set of Java Archive bundles, also called plug-ins. OSGi services, CIShell services, and data set/algorithm services all run in the OSGi container. The CIShell framework API is itself an OSGi bundle that does not register OSGi services, providing instead interfaces for data-set and algorithm services, basic services (such as logging and conversion), and application services (such as scheduler and data manager). Each bundle includes a manifest file with a dependency list stating which packages and other bundles it needs in order to run; all bundles are prioritized. Upon application start-up, the bundles with highest priority start first, followed by bundles of second, third, fourth, and lower priority. Bundles can also be started at runtime.
A bundle can create and register an object with the OSGi service registry under one or more interfaces. The services layer connects bundles dynamically by offering a “publish-find-bind” model for Java objects. Each service registration has a set of standard and custom properties. An expressive filter language is available to select relevant services. Services are dynamic; that is, bundles can be installed and uninstalled on the fly, while other bundles adapt, and the service registry accepts any object as a service. However, registering objects under (standard) interfaces (such as OSGi and CIShell) helps ensure reuse. Due to the declarative specification of bundle metadata, a distributed version of CIShell could be built without changing most algorithms.
The result is that domain scientists can mix and match data sets and algorithms, even adding them dynamically to their favorite tool. All plug-ins that agree on the CIShell interfaces can be run in software designed with the OSGi/CIShell core architecture. No common central data format is needed. Plug-ins can be shared in a flexible, decentralized fashion.
Dynamic filling. As of January 2011, the OSGi/CIShell plug-in pool included more than 230 plug-ins, including approximately 60 “core” OSGi/CIShell plug-ins and a “filling” of more than 170 algorithm plug-ins, plus 40 sample data sets, as well as configuration files and sample data files. Nearly 85% of the algorithm plug-ins are implemented in Java, 5% in Fortran, and the other 10% in C, C++, Jython, and OCaml; see http://cishell.wiki.cns.iu.edu.
Custom tools. The OSGi/CIShell framework is at the core of six plug-and-play tools that resemble simple macroscopes and serve different scientific communities; for example, the Information Visualization Cyberinfrastructure (IVC) was developed for research and education in information visualization; the Network Workbench (NWB) tool was designed for large-scale network analysis, modeling, and visualization; the Science of Science (Sci2) tool is used by science-of-science researchers, as well as by science-policy analysts; the Epidemics (EpiC) tool is being developed for epidemiologists; TEXTrend supports analysis of text; and DynaNets will be used to advance theory on network diffusion processes. Here, NWB and Sci2 are covered in detail:
The NWB tool (http://nwb.cns.iu.edu) supports the study of static and dynamic networks in biomedicine, physics, social science, and other research areas. It uses 39 OSGi plug-ins and 18 CIShell plug-ins as its core architecture; two of them define the functionality of the simple GUI in Figure 1 (I), top left, with the menu (I.a) for users to load data and run algorithms and tools. The Console (I.b) logs all data and algorithm operations, listing acknowledgment information on authors, programmers, and documentation URLs for each algorithm. The Data Manager (I.d) displays all currently loaded and available data sets. A Scheduler (I.c) lets users keep track of the progress of running algorithms. Worth noting is that the interface is easily branded or even replaced (such as with a command-line interface).
The NWB tool includes 21 converter plug-ins that help load data into in-memory objects or into formats the algorithms read behind the scenes. Most relevant for users are the algorithm plug-ins that can be divided into algorithms for preprocessing (19), analysis (56), modeling (10), and visualization (19). Three standalone tools (Discrete Network Dynamics (DND), GUESS, and GnuPlot) are available via the NWB menu system. GUESS is an exploratory data-analysis-and-visualization tool for graphs and networks (http://graphexploration.cond.org), as shown in Figure 1, II, containing a domain-specific embedded language called Gython (an extension of Python, or more specifically Jython) that supports the customization of graph designs. GnuPlot is a portable, command-line-driven, interactive plotting utility for data and related functions (http://gnuplot.info). NWB uses 15 supporting libraries, including Colt, JUNG, Jython, and Prefuse (see Prefuse layouts in Figure 1, III); detailed listings are provided in the NWB tutorial3 and wiki (http://nwb.wiki.cns.iu.edu).
A common network-science workflow includes data loading and/or modeling, preprocessing, analysis, visualization, and export of results (such as tables, plots, and images). More than 10 different algorithms may be run in one workflow, not counting data converters. Common workflows and references to peer-reviewed papers are given in Börner et al.3 Here are six exemplary NWB workflows from different application domains:
• Error-tolerance and attack-tolerance analysis in physics and computer science requires loading or modeling a network and deleting random nodes (such as by error) or deleting highly connected hub nodes (such as in an attack);
• Peer-to-peer network analysis in computer science can include simulation of various networks and an analysis of their properties;
• Temporal text analysis in linguistics, information science, and computer science might apply the burst-detection algorithm to identify a sudden increase in the usage frequency of words, with results visualized;
• Social-network analysis in social science, sociology, and scientometrics might compare properties of scholarly and friendship networks for the same set of people; the scholarly network can be derived from publications and the friendship network from data acquired via questionnaires;
• Discrete network dynamics (biology) can be studied through the DND tool, which bundles loading and modeling of a multistate discrete network model, generating the model’s state-space graph, analyzing the attractors of the state space, and generating a visualization of an attractor basin; and
• Data conversion across sciences can use multiple converter algorithms to translate among more than 20 data formats.
Most workflows require serial application of algorithms developed in different areas of science and contributed by different users. Much of the related complexity is hidden; for example, users do not see how many converters are involved in workflow execution. Only those algorithms that can be applied to a currently selected data set can be selected and run, with all others grayed out. Expert-workflow templates and tutorials provide guidance through the vast space of possible algorithm combinations.
The Science of Science tool (http://sci2.cns.iu.edu). The Sci2 tool supports the study of science itself through scientific methods; science-of-science studies are also known as scientometric, bibliometric, or informetric studies. Research in social science, political science, physics, economics, and other areas further increases our understanding of the structure and dynamics of science.2,5,16 The tool supports the study of science at micro (individual), meso (institution, state), and global (all science, international) levels using temporal, geospatial, topical, network-analysis, and visualization techniques (http://sci2.wiki.cns.iu.edu).
Algorithms needed for these analyses are developed in diverse areas of science; for example, temporal-analysis algorithms come from statistics and computer science; geospatial-analysis algorithms from geography and cartography; semantic-analysis algorithms from cognitive science, linguistics, and machine learning; and network analysis and modeling from social science, physics, economics, Internet studies, and epidemiology. These areas have highly contrasting preferences for data formats, programming languages, and software licenses, yet the Sci2 tool presents them all through a single common interface thanks to its OSGi/CIShell core. Moreover, new algorithms are added easily; in order to read a novel data format, only one new converter must be implemented to convert the new format into an existing format.
Multiple workflows involve more data converters than algorithms, as multiple converters are needed to bridge output and input formats used by consecutive algorithms. Workflows are frequently rerun several times due to imperfect input data, to optimize parameter settings, or to compare different algorithms. Thanks to the Sci2 tool, an analysis that once required weeks
or months to set up and run can now be designed and optimized in a few hours. Users can also share, rerun, and improve automatically generated work logs. Workflows designed, validated, and published in peer-reviewed works can be used by science-policy analysts and policymakers alike. As of January 2011, the Sci2 tool was being used by the National Science Foundation, the National Institutes of Health, the U.S. Department of Energy, and private foundations adding novel plug-ins and workflows relevant for making decisions involving science policy.

Figure 2. Exemplary Sci2 tool workflows: horizontal-bar-graph visualization of NSF funding for one investigator (I); circular layout of a hierarchically clustered co-author network of network-science researchers, with zoom into Eugene Garfield’s network (II); citation network of U.S. patents on RNAi and patents they cite, with highly cited patents labeled (III); and UCSD science base map with overlay of publications by network-science researchers (IV).

Changing Scientific Landscape
As science becomes increasingly data driven and computational, as well as collaborative and interdisciplinary, there is increased demand for tools that are easy to extend, share, and customize:
• Star scientist → Research teams. Traditionally, science was driven by key scientists. Today, science is driven by collaborating co-author teams, often comprising experts from multiple disciplines and geospatial locations5,17;
• Users → Contributors. Web 2.0 technologies empower users to contribute to Wikipedia and exchange images, videos, and code via Flickr, YouTube, and SourceForge.net. Wikispecies, WikiProfessionals, and WikiProteins combine wiki and semantic technology to support real-time community annotation of scientific data sets14;
• Disciplinary → Cross-disciplinary. The best tools frequently borrow and synergistically combine methods and techniques from different disciplines of science, empowering interdisciplinary and/or international teams of researchers, practitioners, and educators to collectively fine-tune and interpret results;
• Single specimen → Data streams. Microscopes and telescopes were originally used to study one specimen at a time. Today, many researchers must make sense of massive data streams of multiple data types and formats and of different dynamics and origin; and
• Static instrument → Evolving cyberinfrastructure. The importance of hardware instruments that are static and expensive tends to decrease relative to software tools and services that are highly flexible and evolving to meet the needs of different sciences. Some of the most successful tools and services are decentralized, increasing scalability and fault tolerance.
Good software-development practices make it possible for “a million minds” to design flexible, scalable software that can be used by many:
• Modularity. Software modules with well-defined functionality that accept contributions from multiple users reduce costs and increase flexibility in tool development, augmentation, and customization;
• Standardization. Standards accelerate development, as existing code is leveraged, helping pool resources, support interoperability, and ease migration from research code to production code and hence the transfer of research results into industry applications and products; and
• Open data and open code. The practice of making data sets and code freely available allows users to check, improve, and repurpose data and code, easing replication of scientific studies.

The Sci2 tool supports many different analyses and visualizations used to communicate results to a range of stakeholders. Common workflows and references to peer-reviewed papers are given in Börner et al.3 and the Sci2 wiki (http://sci2.wiki.cns.iu.edu). Four sample studies are discussed here and included in Figure 2, I–IV:
• Funding portfolios (such as funding received by investigators and institutions, as well as provided by agencies) can be plotted using a horizontal bar graph (HBG); for example, all funding for one researcher was downloaded from the National Science Foundation Award Search site (http://nsf.gov/awardsearch), loaded into Sci2, and visualized in HBG, as in Figure 2, I. Each project is represented by a bar starting to the left at a certain start date and ending right at an end date, with bar width representing project duration. Bar-area size encodes a numeric property (here total awarded dollar amount), and equipment grants show as narrow bars of significant height. A label (here project name) is given to the left of the bar. Bars can be color-coded
terns among major network-science researchers, the publications of four major researchers were downloaded from the Web of Science by Thomson Reuters (http://wokinfo.com). The data was then loaded into Sci2, the co-author network extracted, the Blondel community-detection algorithm applied to extract hierarchical clusters of the network, and the result laid out using the Circular Hierarchy visualization, with author names plotted in a circle and connecting lines representing co-author links (see Figure 2, II). Two of the researchers share a combined network, while the others are at the centers of unconnected networks. Also shown is a zoom into Eugene Garfield’s network;
• To understand what patents exist on the topic of RNA interference (RNAi) and how they built on prior work, data was retrieved from the Scholarly Database (http://sdb.cns.iu.edu).6 Specifically, a query was run over all text in the U.S. patent data set covering 1976–2010. The U.S. Patent and Trademark Office citation table was downloaded and read into the Sci2 tool, the patent-citation network extracted, the “indegree” (number of citations within the set) of all patent nodes calculated, and the network displayed in GUESS (see Figure 2, III). The network represents 37 patents (in red) matching the term RNAi and the 487 patents they cite (in orange). Nodes are size-coded by indegree (number of times a patent is cited); patents with at least five citations are labeled by their patent number. One of the most highly cited is no. 6506559 on “Genetic Inhibition by Double-Stranded RNA”; and
• The topical coverage of publication output is revealed using a base map of science (such as the University of California, San Diego map in Figure 2, IV). The map represents 13 major disciplines of science in a variety of colors, further subdivided into 554 research areas. Papers are matched to research areas via their journal names. Multiple journals are associated with each area, and highly interdisciplinary journals (such as Nature and Science) are fractionally associated with multiple areas.
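The journal-based matching just described can be pictured as a weighted lookup: each journal carries a weight per research area, and a paper adds its journal's weights to a running profile. The following is a minimal sketch under stated assumptions, not the Sci2 implementation. The function name `topical_profile`, the tiny `JOURNAL_AREAS` table, and all weights are hypothetical and are not taken from the UCSD map.

```python
from collections import defaultdict

# Hypothetical journal -> {research area: weight} table. Weights per journal sum to 1,
# so an interdisciplinary journal contributes fractions of a paper to several areas.
JOURNAL_AREAS = {
    "Physical Review E": {"Physics": 1.0},
    "Nature": {"Biology": 0.5, "Physics": 0.25, "Chemistry": 0.25},
    "Scientometrics": {"Social Sciences": 1.0},
}


def topical_profile(paper_journals, journal_areas=JOURNAL_AREAS):
    """Return (area -> fractional paper count, journals that could not be matched)."""
    profile = defaultdict(float)
    unmatched = []
    for journal in paper_journals:
        areas = journal_areas.get(journal)
        if areas is None:
            # No entry for this journal name: report it instead of guessing an area.
            unmatched.append(journal)
            continue
        for area, weight in areas.items():
            profile[area] += weight
    return dict(profile), unmatched


# One entry per paper, identified only by the journal it appeared in.
papers = ["Physical Review E", "Nature", "Nature", "Unknown Letters"]
profile, unmatched = topical_profile(papers)
print(profile)
print(unmatched)
```

With this scheme the fractional counts over matched papers still add up to the number of matched papers, so overlays for different authors or institutions stay comparable.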
based software frameworks improves, and as the number and diversity of data-set and algorithm plug-ins increases, so too will the capabilities of custom macroscopes.

individual, institution, or country can be mapped to indicate core competencies. Most publication output of the four network-science researchers is in physics.
These and many other Sci2 analyses and corresponding visualizations are highly scalable; thousands of authors, references, and projects can be viewed simultaneously, and visualizations can be saved in vector format for further manipulation.

Macroscope Synergies
Just as the value of the earliest telephones increased in proportion to the number of people using them, plug-and-play macroscopes gain value relative to the increase in their core functionality; numbers of data-set and algorithm plug-ins; and the researchers, educators, and practitioners using and advancing them.
OSGi/CIShell-compliant plug-ins can be shared among tools and projects; for example, network-analysis algorithms implemented for the NWB tool can be shared as Java Archive files through email or other means, saved in the plug-in directory of another tool, and made available for execution in the menu system of that tool. Text-mining algorithms originally developed in TEXTrend (discussed later) can be plugged into the Sci2 tool to support semantic analysis of scholarly texts. Though National Science Foundation funding for the NWB tool formally ended in 2009, NWB’s functionality continues to increase, as plug-ins developed for other tools become available. Even if no project or agency were to fund the OSGi/CIShell core for some time, it would remain functional, because it is lightweight and easy to maintain. Finally, the true value of OSGi/CIShell is due to the continuously evolving algorithm filling and the “custom tools” continuously developed and shared by domain scientists.
Over the past five years, a number of projects have adopted OSGi (and in two cases, CIShell):

state data.15 Inspired by a workshop on software infrastructures in July 2007 (https://nwb.slis.indiana.edu/events/ivsi2007), Mike Smoot and Bruce W. Herr implemented a proof-of-concept OSGi-based Cytoscape core several months later; OSGi bundles are available at http://chianti.ucsd.edu/svn/core3. Once the new Cytoscape 3.0 core is implemented (projected mid-2011), sharing plug-ins between the NWB tool and Cytoscape will be much easier, thereby extending the functionality and utility of both;
• Taverna Workbench (http://taverna.org.uk). Developed by the myGrid team (http://mygrid.org.uk) led by Carol Goble at the University of Manchester, U.K., this suite of free open-source software tools helps design and execute workflows,12 allowing users to integrate many different software tools, including more than 8,000 Web services from diverse domains, including chemistry, music, and social sciences. The workflows are designed in the Taverna Workbench and can then be run on a Taverna Engine, in the Workbench, on an external server, in a portal, on a computational grid, or on a compute cloud. Raven (a Taverna-specific classloader and registry mechanism) supports an extensible and flexible architecture (with approximately 20 plug-ins), but an implementation using an OSGi framework, with an alpha release, was scheduled for February 2011. The myExperiment (http://myexperiment.org) social Web site supports the finding and sharing of workflows and provides special support for Taverna workflows9;
• MAEviz (https://wiki.ncsa.uiuc.edu/display/MAE/Home). Managed by Shawn Hampton of the National Center for Supercomputing Applications, this open-source, extensible software platform supports seismic risk assessment based on Mid-America Earthquake Center research in the Consequence-Based Risk Management framework.10 It also uses the Eclipse Rich Client Platform, including Equinox, a com-
into algorithm plug-ins, it is algorithmically possible to modularize visualization and interaction design. Future work will focus on developing “visualization layers” supporting selection and combination of reference systems, projections/distortions, graphic designs, clustering/grouping, and interactivity.

Streaming data. The number of data sets that are generated and must be understood in real time is increasing; examples are patient-surveillance data streams and models of epidemics that predict the numbers of susceptible, infected, and recovered individuals in a population over time. EpiC tool development funded by the National Institutes of Health contributes algorithms that read and/or output streams of data tuples, enabling algorithms to emit their results as they run, not only on completion. Data-graph visualizations plot these tuple streams in real time, resizing (shrinking) the temporal axis over time.

Web services. The OSGi/CIShell-based tools discussed here are standalone desktop applications supporting offline work on possibly sensitive data, using a GUI familiar to target users. However, some application domains also benefit from online deployment of macroscopes. While the OSGi specification provides basic support for Web services, CIShell must still be extended to make it easy for domain scientists to design their own macroscope Web services.

Incentive design. Many domain experts have trouble trying to use an evolving set of thousands of possibly relevant data sets compiled for specific studies of inconsistent quality and coverage, saved in diverse formats, and tagged using terminology specific to the original research domains. In addition, thousands of algorithms that support different functionality and diverse input and output formats are written in different languages by students and experts in a range of scientific domains and packaged as algorithms or tools using diverse licenses. More-effective means are needed to help domain experts find the data sets and algorithms most relevant for their work, bundle them into efficient workflows, and relate the results to existing work. Scholarly markets resembling a Web 2.0 version of Craigslist.org can help ease the sharing, navigation, and utilization of scholarly data sets and algorithms, reinforcing reputation mechanisms by, say, providing ways to cite and acknowledge users who share, highlight most downloaded and highest-rated contributions, and offer other means for making data sets, algorithms, workflows, and tutorials part of a valued scholarly record.

Acknowledgments
I would like to thank Micah Linnemeier and Russell J. Duhon for stimulating discussions and extensive comments. Bruce W. Herr II, George Kampis, Gregory J. E. Rawlins, Geoffrey Fox, Shawn Hampton, Carol Goble, Mike Smoot, Yanbo Han, and anonymous reviewers provided valuable input and comments to an earlier draft. I also thank the members of the Cyberinfrastructure for Network Science Center (http://cns.iu.edu), the Network Workbench team (http://nwb.cns.iu.edu), and Science of Science project team (http://sci2.cns.iu.edu) for their contributions toward this work. Software development benefits greatly from the open-source community. Full software credits are distributed with the source, but I especially acknowledge Jython, JUNG, Prefuse, GUESS, GnuPlot, and OSGi, as well as Apache Derby, used in the Sci2 tool.
This research is based on work supported by National Science Foundation grants SBE-0738111, IIS-0513650, and IIS-0534909 and National Institutes of Health grants R21DA024259 and 5R01MH079068. Any opinions, findings, and conclusions or recommendations expressed here are those of the author and do not necessarily reflect the views of the National Science Foundation.

References
1. Atkins, D.E., Drogemeier, K.K., Feldman, S.I., Garcia-Molina, H., Klein, M.L., Messerschmitt, D.G., Messian, P., Ostriker, J.P., and Wright, M.H. Revolutionizing Science and Engineering Through Cyberinfrastructure. Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. National Science Foundation, Arlington, VA, 2003.
2. Börner, Katy. Atlas of Science: Visualizing What We Know. MIT Press, Cambridge, MA, 2010; supplemental material at http://scimaps.org/atlas
3. Börner, K., Barabási, A.-L., Schnell, S., Vespignani, A., Wasserman, S., Wernert, E.A., Balcan, D., Beiró, M., Biberstine, J., Duhon, R.J., Fortunato, S., Herr II, B.W., Hidalgo, C.A., Huang, W.B., Kelley, T., Linnemeier, M.W., McCranie, A., Markines, B., Phillips, P., Ramawat, M., Sabbineni, R., Tank, C., Terkhorn, F., and Thakre, V. Network Workbench Tool: User Manual 1.0.0., 2009; http://nwb.cns.iu.edu/Docs/NWBTool-Manual.pdf
4. Börner, K., Chen, C., and Boyack, K.W. Visualizing knowledge domains. In Annual Review of Information Science & Technology, B. Cronin, Ed. Information Today, Inc./American Society for Information Science and Technology, Medford, NJ, 2003, 179–255.
5. Börner, K., Dall’Asta, L., Ke, W., and Vespignani, A. Studying the emerging global brain: Analyzing and visualizing the impact of co-authorship teams. Complexity (Special Issue on Understanding Complex Systems) 10, 4 (Mar./Apr. 2005), 57–67.
6. Börner, K., Huang, W.B., Linnemeier, M., Duhon, R.J., Phillips, P., Ma, N., Zoss, A., Guo, H., and Price, M.A. Rete-Netzwerk-Red: Analyzing and visualizing scholarly networks using the Network Workbench tool. Scientometrics 83, 3 (June 2010), 863–876.
7. Cyberinfrastructure for Network Science Center. Cyberinfrastructure Shell (CIShell) Algorithm Developer’s Guide, 2009; http://cishell.wiki.cns.iu.edu
8. de Rosnay, J. Le Macroscope: Vers une Vision Globale. Editions du Seuil. Harper & Row Publishers, Inc., New York, 1975.
9. De Roure, D., Goble, C., and Stevens, R. The design and realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future-Generation Computer Systems 25 (2009), 561–567.
10. Elnashai, A., Hampton, S., Lee, J.S., McLaren, T., Myers, J. D., Navarro, C., Spencer, B., and Tolbert, N. Architectural overview of MAEviz–HAZTURK. Journal of Earthquake Engineering 12, 1 Suppl.2, 01 (2008), 92–99.
11. Herr II, B.W., Huang, W.B., Penumarthy, S., and Börner, K. Designing highly flexible and usable cyberinfrastructures for convergence. In Progress in Convergence: Technologies for Human Wellbeing, W.S. Bainbridge and M.C. Roco, Eds. Annals of the New York Academy of Sciences, Boston, 2007, 161–179.
12. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M.R., Li, P., and Oinn, T. Taverna: A tool for building and running workflows of services. Nucleic Acids Research (Web Server Issue) 34, Suppl. 2 (July 1, 2006), W729–W732.
13. Kampis, G., Gulyas, L., Szaszi, Z., and Szakolczi, Z. Dynamic social networks and the TEXTrend/CIShell framework. Presented at the Conference on Applied Social Network Analysis (University of Zürich, Aug. 27–28). ETH Zürich, Zürich, Switzerland, 2009.
14. Mons, B., Ashburner, M., Chicester, C., Van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G.-J., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Mejissen, G., Moeller, E., Roes, P.J., Börner, K., and Bairoch, A. Calling on a million minds for community annotation in WikiProteins. Genome Biology 9, 5 (2008), R89.
15. Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. Cytoscape: A software environment for integrating models of biomolecular interaction networks. Genome Research 13, 11 (2002), 2498–2504.
16. Shiffrin, R. and Börner, K. Mapping knowledge domains. Proceedings of the National Academy of Sciences 101, Suppl. 1 (Apr. 2004), 5183–5185.
17. Shneiderman, B. Science 2.0. Science 319, 5868 (Mar. 2008), 1349–1350.
18. Shneiderman, B. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages (Boulder, CO, Sept. 3–6). IEEE Computer Society, Washington, D.C., 1996, 336–343.
19. Thomas, J.J. and Cook, K.A., Eds. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Center, Richland, WA, 2005; http://nvac.pnl.gov/agenda.stm
20. World Bank and International Monetary Fund. Global Monitoring Report 2009: A Development Emergency. The World Bank, Washington, D.C., 2009.

Katy Börner (katy@indiana.edu) is the Victor H. Yngve Professor of Information Science at the School of Library and Information Science, Adjunct Professor at the School of Informatics and Computing, and Founding Director of the Cyberinfrastructure for Network Science Center (http://cns.iu.edu) at Indiana University, Bloomington, IN.

© 2011 ACM 0001-0782/11/0300 $10.00
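As an illustration of the streaming-data direction named in the future-work discussion of this article (algorithms that emit tuples as they run, with a plot whose time axis grows as data arrives), the idea can be sketched with a Python generator. This is an assumption-laden toy, not EpiC code: the discrete-time update rule, the parameter names `beta` and `gamma`, and the helper names `sir_stream` and `consume` are all chosen here for illustration.

```python
from typing import Iterator, List, NamedTuple, Tuple


class SIRTuple(NamedTuple):
    """One data tuple in the stream: a time step and the three compartment sizes."""
    t: int
    susceptible: float
    infected: float
    recovered: float


def sir_stream(population: float, initial_infected: float,
               beta: float, gamma: float, steps: int) -> Iterator[SIRTuple]:
    """Yield one tuple per time step instead of returning a finished table."""
    s = population - initial_infected
    i = initial_infected
    r = 0.0
    yield SIRTuple(0, s, i, r)
    for t in range(1, steps + 1):
        new_infections = beta * s * i / population  # simple discrete-time step (assumed)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        yield SIRTuple(t, s, i, r)  # result is available before the run completes


def consume(stream: Iterator[SIRTuple]) -> Tuple[List[SIRTuple], int]:
    """Stand-in for a live plot: keep every point and extend the time axis as tuples arrive."""
    points: List[SIRTuple] = []
    axis_max = 0
    for tup in stream:
        points.append(tup)
        axis_max = max(axis_max, tup.t)  # a wider axis means earlier points shrink on screen
    return points, axis_max
```

A consumer such as `consume(sir_stream(1000.0, 1.0, 0.3, 0.1, 50))` sees each tuple as soon as it is produced; a real visualization would redraw inside the loop rather than collect the points.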
doi:10.1145/1897852.1897872

Understanding Scam Victims: Seven Principles for Systems Security

Effective countermeasures depend on first understanding how users naturally fall victim to fraudsters.

By Frank Stajano and Paul Wilson

Photograph by Collin Parker

strengthened system is usually its human element; an attack is possible because the designers thought only about their strategy for responding to threats, without anticipating how real users would react.
We need to understand how users behave and what traits of that behavior make them vulnerable, then design systems security around them. To gain this knowledge, we examine a variety of scams, distilling some general principles of human behavior that explain why the scams work; we then show how they also apply to broader attacks on computer systems insofar as they involve humans. Awareness of the aspects of human psychology exploited by con artists helps not only the public avoid these particular scams but also security engineers build more robust systems.
Over nine series of the BBC TV documentary The Real Hustle (http://www.bbc.co.uk/realhustle/) Paul Wilson and Alexis Conran researched the scams most commonly carried out in Britain and, with Jessica-Jane Clement, replicated hundreds of them on unsuspecting victims while filming the action with hidden cameras. The victims were later debriefed, given their money back, and asked for their consent to publish the footage so others would learn not to fall for the same scams (see the sidebar “Representative Scams,” to which we refer throughout the main text).
The objective of the TV show was to help viewers avoid being ripped off by similar scams. Can security researchers do more? By carefully dissecting dozens of scams, we extracted seven recurring behavioral patterns and related principles exhibited by victims and exploited by hustlers. They are not merely small-scale opportunistic scams (known as “short cons”) but inherent security vulnerabilities of the human element in any complex system. The security engineer must understand them thoroughly and consider their implications toward computer and system security.

These principles cause vulnerabilities in computer systems but were exploited by fraudsters for centuries before computers were invented and are rooted in human nature.
Users fall prey to these principles not because they are gullible but because they are human. Instead of blaming users, understand that these inherent vulnerabilities exist, then make your system robust despite them.

Distraction Principle
While we are distracted by what grabs our interest, hustlers can do anything to us and we won’t notice.
The young lady who falls prey to the recruitment scam is so engrossed in her job-finding task that she totally fails to even suspect that the whole agency might be a fraud.
Distraction is at the heart of innumerable fraud scenarios. It is also a fundamental ingredient of most magic performances,5 which is not surprising if we see such performances as a “benign fraud” for entertainment purposes. Distraction is used in all cases involving sleight of hand, including pickpocketing and the special “throw” found in the Monte.
The very presence of “sexy swindler” Jess among the hustlers owes to Distraction, as well as to Need and Greed (discussed later), since sex is such a fundamental human drive. The 2000 computer worm “ILOVEYOU,” which reportedly caused $5 billion–$8 billion damage worldwide, exploited these two principles.
In computing, the well-known tension between security and usability is also related to Distraction. Users care only about what they want to access and are essentially blind to the fact that “the annoying security gobbledygook” is there to protect them. Smart crooks exploit this mismatch to their advantage; a lock that is inconvenient to use is often left open.
Distraction also plays a role in the “419,” or Nigerian, scam. The hustler, posing as a Nigerian government officer with access to tens of millions of dollars of dodgy money, wants the mark to help transfer the money out of the country in exchange for a slice of it. When the mark accepts the deal, the hustler demands some amount of advance money to cover expenses. New unexpected expenses come up repeatedly, always with the promise that the money is just about to be transferred. These “convincers” keep the mark focused solely on the huge sum he is promised to receive.
Are only unsophisticated 419 victims gullible? Abagnale1 showed the Distraction principle works equally well on highly educated CTOs and CIOs. In 1999, he visited a company full of programmers frantically fixing code to avert the Y2K bug. He asked the executives how they found all the programmers and was told “these guys from India” knew computers well and were inexpensive. But, Abagnale thought, any dishonest programmer from an offshore firm fixing Y2K problems could also easily implant a backdoor… People focused on what they want to do are distracted from the task of protecting themselves. Security engineers who don’t understand this principle have already lost the battle.

Social Compliance Principle
Society trains people to not question authority. Hustlers exploit this “suspension of suspiciousness” to make us do what they want.
The jeweler in a jewelry-shop scam gratefully hands over necklace and cash when “policeman” Alex says they’re needed as evidence, believing him saying they’ll be returned later.
Access control to sensitive databases may involve an exploitable human element. For example, social-engineering-expert Mitnick7 impersonates a policeman to nothing less than a law-enforcement agency. He builds up credibility and trust by exhibiting knowledge of the lingo, procedures, and phone numbers. He makes the clerk consult the National Crime Information Center database and acquires confidential information about a chosen victim. His insightful observation is that the police and military, far from being a tougher target, are inherently more vulnerable to social engineering as a consequence of their strongly ingrained respect for rank.
Social Compliance is the foundation for phishing. For example, our banks, which hold all our money, order us to type our password, and, naturally, we do. It’s difficult to fault nontechnical users on this one if they fail to notice the site was only a lookalike. Note the conflict between a bank’s security department telling customers “never click on email links” and the marketing department of the same bank sending them clickable email advertisements for new financial products, putting the customers in double jeopardy.
System architects must coherently align incentives and liabilities with overall system goals. If users are expected to perform sanity checks rather than blindly follow orders, then social protocols must allow “challenging the authority”; if, on the contrary, users are expected to obey authority unquestioningly, those with authority must relieve them of liability if they obey a fraudster. The fight against phishing and all other forms of social engineering can never be won unless this principle is understood.

Herd Principle
Even suspicious marks let their guard down when everyone around them appears to share the same risks. Safety in numbers? Not if they’re all conspiring against us.
In the Monte, most participants are shills. The whole game is set up to give the mark confidence and make him think: “Yes, the game looks dodgy, but other people are winning money,” and “Yes, the game looks difficult, but I did guess where the winning disc was, even if that guy lost.” Shills are a key ingredient.
In online auctions, a variety of frauds are possible if bidders are in cahoots with the auctioneer. eBay pioneered a reputation system in which bidders and auctioneers rate each other through public feedback. But fraudsters might boost their reputations through successful transactions with shills. Basic reputation systems are largely ineffective against shills.
In online communities and social networks, multiple aliases created by certain participants to give the impression that others share their opinions are known as “sock-puppets.” In political elections, introducing fake identities to simulate grass-roots support for a candidate is called “astroturfing.” In reputation systems in peer-to-peer networks, as opposed to reputation systems in human communities, multiple entities controlled by the same attacker are called “Sybils.” The variety of terms created for different contexts testifies to the wide applicability of the Herd principle to many kinds of multiuser systems.

Principles to which victims respond, as identified by three sets of researchers.
Principle | Cialdini (1985–2009) | Lea et al. (2009) | Stajano-Wilson (2009)
Distraction ~
Social Compliance (a.k.a. “Authority”)
Herd (a.k.a. “Social Proof”)
Dishonesty
Kindness ~
Need and Greed (a.k.a. “Visceral Triggers”) ~
Scarcity (related to our “Time”) ~
Commitment and Consistency
Reciprocation ~
First identified this principle
Also lists this principle
~ Lists a related principle

Dishonesty Principle
Our own inner larceny is what hooks us initially. Thereafter, anything illegal we do will be used against us by fraudsters.
In the Monte, the shills encourage the mark to cheat the operator and even help him do it. Then, having fleeced the mark, the operator pretends to notice the mark’s attempt at cheating, using it as a reason for closing the game without giving him a chance to argue.
When hustlers sell stolen goods, the implied message is “It’s illegal; that’s why you’re getting such a good deal,” so marks won’t go to the police once they discover they’ve been had.
The Dishonesty Principle is at the core of the 419; once a mark realizes it’s a scam, calling the police is scary because the mark’s part of the deal (essentially money laundering) was in itself illegal and punishable. Several victims have gone bankrupt, and some have even committed suicide, seeing no way out of this tunnel.
The security engineer must be aware of the Dishonesty Principle. A
number of attacks on the system go his or her personal situation; if the Time Principle
unreported because the victims won’t mark is on the verge of bankruptcy, When under time pressure to make an
confess to their “evil” part in the pro- needs major surgery, or is otherwise important choice, we use a different de-
cess. When corporate users fall prey in dire straits, then questioning the cision strategy, and hustlers steer us to-
to a Trojan horse program purporting offer of a solution is very difficult. In ward one involving less reasoning.
to offer, say, free access to porn, they such cases the mark is not greedy, just In the ring-reward rip-off, the mark
have strong incentives not to cooper- depressed and hopeful. If someone is made to believe he must act quickly
ate with the forensic investigations of prays every day for an answer, an email or lose the opportunity. When caught
system administrators to avoid the as- message from a Nigerian Prince might in such a trap, it’s very difficult for
sociated stigma, even if the incident seem like the heaven-sent solution. people to stop and assess the situation
affected the security of the whole cor- The inclusion of sexual appetite as properly.
porate network. Executives for whom a fundamental human need justifies, Unlike the theory of rational choice,
righteousness is not as important as through this principle, the presence that is, that humans take their deci-
the security of their enterprise might of a “sexy swindler” in most scams en- sion after seeking the optimal solution
consider reflecting such priorities in acted by "the trio." As noted, the Need based on all the available information,
the corporate security policy, perhaps and Greed Principle and the Distrac- Simon8 suggested that “organisms
by guaranteeing discretion and immu- tion Principle are often connected; adapt well enough to ‘satisfice’; they do
nity from “internal prosecution” for victims are distracted by (and toward) not, in general, ‘optimize’.”
victims who cooperate with forensic that which they desire. This drive is ex- They may “satisfice,” or reach a
investigations. ploited by a vast proportion of fraudu- “good-enough” solution, through sim-
lent email messages (such as those plifying heuristics rather than the com-
Kindness Principle involving length enhancers, dates with plex, reasoned strategies needed for
People are fundamentally nice and will- attractive prospects, viruses, and Tro- finding the best solution, despite heu-
ing to help. Hustlers shamelessly take ad- jans, including ILOVEYOU). ristics occasionally failing, as studied
vantage of it. An enlightened system administra- by Tversky and Kahneman.10
This principle is, in some sense, the tor once unofficially provided a few Though hustlers may have never
dual of the Dishonesty Principle, as gigabytes of soft porn on an intranet formally studied the psychology of de-
perfectly demonstrated by the Good Sa- server in order to make it unnecessary cision making, they intuitively under-
maritan scam. In it, marks are hustled for local users to go looking for such stand the shift. They know that, when
primarily because they volunteer to material on dodgy sites outside the forced to take a decision quickly, a
help. It is loosely related to Cialdini’s corporate firewall, thereby reducing at mark will not think clearly, acting on
Reciprocation Principle (people return the same time connection charges and impulse according to predictable pat-
favors)2 but applies even in the absence exposure to malware. terns. So they make their marks an of-
of a “first move” from the hustler. A va- If we want to con someone, all we fer they can’t refuse, making it clear
riety of scams that propagate through need to know is what they want, even to them that it’s their only chance to
email or social networks involve tear- if it doesn’t exist. If security engineers accept it. This pattern is evident in
jerking personal stories or follow disas- do not understand what users want, the 419 scam and in phishing (“You’ll
ter news (tsunami, earthquake, hurri- and that they want it so badly they’ll go lose access to your bank account if you
cane), taking advantage of the generous to any lengths to get it, then they won’t don’t confirm your credentials imme-
but naïve recipients following their understand what drives users and diately”) but also in various email of-
spontaneous kindness before suspect- won’t be able to predict their behavior. fers and limited-time discounts in the
ing anything. Many “social engineer- Engineers always lose against fraud- gray area between acceptable market-
ing” penetrations of computer systems7 sters who do understand how they can ing techniques and outright swindle.
also rely on victims’ innate helpfulness. lead their marks. This brings us back As modern computerized marketing
to the security/usability trade-off: Lec- relies more and more on profiling indi-
Need and Greed Principle turing users about disabling ActiveX vidual consumers to figure out how to
Our needs and desires make us vulner- or Flash or Javascript from untrusted press their buttons, we might periodi-
able. Once hustlers know what we want, sites is pointless if these software com- cally have to revise our opinions about
they can easily manipulate us. ponents are required to access what us- which sales methods, while not yet ille-
Loewenstein4 speaks of “visceral ers want or need (such as their online gal, are ethically acceptable.
factors such as the cravings associated social network site or online banking From a systems point of view, the
with drug addiction, drive states (such site or online tax return site). Fraud- Time Principle is particularly impor-
as hunger, thirst, and sexual desire), sters must merely promise some entic- tant, highlighting that, due to the hu-
moods and emotions, and physical ing content to enroll users as unwitting man element, the system’s response
pain.” We say “Need and Greed” to re- accomplices who unlock the doors to the same stimulus may be radically
fer to this spectrum of human needs from inside. different depending on the urgency
and desires—all the stuff we really The defense strategy should also in- with which it is requested. In military
want, regardless of moral judgement. clude user education; as the Real Hustle contexts this is taken into account by
In the 419 scam, what matters most is TV show often says, “If it sounds too wrapping dangerous situations that re-
not necessarily the mark’s greed but good to be true, it probably is.” quire rapid response (such as challeng-
ma r c h 2 0 1 1 | vo l . 5 4 | n o. 3 | c o mmu n i c at i o n s o f t he acm 73
contributed articles
Representative Scams
Since 2006, the Real Hustle TV show operator undetectably switches two operator has made the switch. More
has recreated hundreds of scams during cards. One might therefore imagine the important, even if the cards were marked
which Paul, Alex, and Jess defrauded basic scam to consist of performing a few in some way, there is absolutely no way
unsuspecting victims before hidden cameras. Here are five instructive ones:
In the lingo of this peculiar “trade,” the victim of the scam is the mark, the perpetrator is the operator, and any accomplice pretending to be a regular customer is a shill.
Monte. This classic scam involves an operator manipulating three cards (or disks or shells: there are many variations), one of which wins, while the other two lose. The operator shows the player the cards, turns them over face down, then moves them around on the table in full view. Players must follow the moves and put money on the card they believe to be the winner. The operator pays out an equal amount if the player guessed correctly or otherwise pockets the player’s money.
Technically, at the core of the scam is a sleight-of-hand trick whereby the
“demo runs” where marks are allowed to guess correctly, then have them bet with real money and at that point send the winning card elsewhere.
But this so-called “game” is really a cleverly structured piece of street theater designed to attract passersby and hook them into the action. The sleight-of-hand element is actually least important; it is the way marks are manipulated, rather than the props, that brings in the money. It’s all about the crowd of onlookers and players (all shills) betting in a frenzy and irresistibly sucking marks into wanting a piece of the action.
The Monte is an excellent example that nothing is what it seems, even if the marks think they know what to expect. Many people claim to be able to beat the game, purely because they understand the mechanics of the secret move. But it’s impossible to tell whether an experienced
for a legitimate player to secure a win; should a mark consistently bet on the correct position, then other players, actually shills, would over-bet him, “forcing” the operator to take the larger bet. This frustrates the mark, who often increases his bet to avoid being topped. One shill will then pretend to help the mark by bending a corner of the winning card while the operator is distracted, making the mark think he has an unbeatable advantage. This is a very strong play; marks have been seen to drop thousands of dollars only to find the bent card is actually a loser. While mixing the cards, it is possible for a skilled operator to switch the cards and switch the bend from one card to another.
The idea that one can beat the game at all reveals a key misunderstanding: in fact, it is not a game in the first place. Monte mobs never pay out to the
ing strangers at a checkpoint or being ordered to launch a nuclear missile) in special “human protocols” meant to enforce, even under time pressure, some of the step-by-step rational checks the heuristic strategy would otherwise omit.
The security architect must identify the situations in which the humans in the system may suddenly be put under time pressure by an attacker and whether the resulting switch in decision strategy might open a vulnerability. This directive applies to anything from retail situations to stock trading and online auctions and from admitting visitors into buildings to handling medical emergencies. Devising a human protocol to guide and pace the response of the potential victim toward the desired goal may be an adequate safeguard and also relieve the victim from stressful responsibility.
Related Work
While a few narrative accounts of scams and frauds are available, from Maurer’s study of the criminal world6 that inspired the 1973 movie The Sting to the autobiographical works of notable fraudsters,1,7 the literature contains little about systematic studies of fraudsters’ psychological techniques. But we found two notable exceptions: Cialdini’s outstanding book Influence: Science and Practice,2 based on undercover field research, revealed how salespeople’s “weapons of influence” are remarkably similar to those of fraudsters; indeed, all of his principles apply to our scenario and vice versa. Meanwhile, Lea et al.3 examined postal scams, based on a wealth of experimental data, including interviews with victims and lexical analysis of fraudulent letters. Even though our approaches were quite different, our findings are in substantial agreement. The table here summarizes and compares the principles identified in each of these works.
Conclusion
We supported our thesis, that systems involving people can be made secure only if designers understand and acknowledge the inherent vulnerabilities of the “human factor,” with three main contributions:
First is a vast body of original research on scams, initially put together by Wilson and Conran. It started as a TV show, not as a controlled scientific experiment, but our representative write-up9 still offers valuable firsthand data not otherwise available in the literature;
Second, from these hundreds of scams, we abstracted seven principles. The particular principles are not that important, and others have found
marks; they keep all the money moving between the shills and the operator. The marks are allowed to place a bet only if it’s already a loser. Having studied Monte all over the world, we can say it’s nothing short of a polite way to mug people.
Ring reward rip-off. The gorgeous Jess buys a cheap ring from a market stall for $5. She then goes to a pub and seductively befriends the barman (the mark). She makes it obvious she’s very rich; showing off to her friend (a shill), she makes sure the mark overhears that she just received this amazing $3,500 diamond ring for her birthday. She then leaves.
Paul and Alex arrive at the pub, posing as two blokes having a pint. Jess then phones the pub, very worried, calling her friend the barman by name, saying she lost that very precious ring. Could he check if it’s there somewhere? The mark checks, and, luckily, a customer (Paul) found the ring. However, instead of handing it over, Paul demands a reward. The barman gets back to the phone and Jess, very relieved to hear the ring is there, says, without prompting, that she’ll give $200 to the person who found it. But the barman goes back to Paul and says the reward is only $20. That’s when the hustlers know they’ve got him; he’s trying to make some profit for himself. Paul haggles a bit and eventually returns the ring to the barman for $50. The mark is all too happy to advance the money to Paul, expecting to get much more from Jess. Jess, of course, never calls back.
A convicted criminal proudly says he once made a $2,000 profit with this particular hustle.
Jewelry-shop scam. Jess attempts to buy an expensive necklace but is “arrested” by Alex and Paul posing as plainclothes police officers who expose her as a well-known fraudster, notorious for paying with counterfeit cash. The “cops” collect as evidence the “counterfeit” (actually genuine) cash and, crucially, the necklace, which will, of course, “be returned.” The jeweler is extremely grateful the cops saved her from the evil fraudster.
Ironically, as Jess is taken away in handcuffs, the upset jeweler spits out a venomous “Bitch! You could have cost me my job. You know that?”
Recruitment scam. Hustlers set up a fake recruitment agency and, as part of the sign-on procedure, collect all of the applicants’ personal details, including mother’s maiden name, date of birth, bank-account details, passport number, even PIN (by asking them to protect their data with a four-digit code, as many people memorize only one PIN and use it for everything). With this loot, the hustlers are free to engage in identity theft on everyone who came in for an interview.
Good Samaritan scam. In a parking lot, Jess has jacked up her car but seems stuck. When another car stops nearby, she politely asks the newcomers to help her change the tire, which they do. Apologizing for her cheekiness, she then also asks them if she could get into their car, as she’s been out in the cold for a while and is freezing. The gentleman gives her the keys to his car (required to turn on the heat) and, while the marks are busy changing her tire, she drives off with the car. But didn’t Jess just lose her original car? No, because it wasn’t hers to start with; she just jacked up a random one in the parking lot. To add insult to injury, the marks will also have some explaining to do when the real owners of the car arrive.
[Photo: A mark, debriefed by accompanying TV crew, is dismayed to learn the hustlers just got hold of all her sensitive personal details in the Recruitment Scam.]
[Photo: From right to left: Jess gets two marks to change her tire before tricking them into handing over their own car keys in the Good Samaritan Scam.]
slightly different ones. What matters is recognizing the existence of a small set of behavioral patterns that ordinary people exhibit and that hustlers have been exploiting forever; and
Third, perhaps most significant, we applied the principles to a more general systems point of view. The behavioral patterns are not just opportunities for small-scale hustles but also vulnerabilities of the human component of any complex system.
Our message for the system-security architect is that it is naïve to lay blame on users and whine, “The system I designed would be secure, if only users were less gullible.” The wise security designer seeking a robust solution will acknowledge the existence of these vulnerabilities as an unavoidable consequence of human nature and actively build safeguards that prevent their exploitation.
Acknowledgments
Special thanks to Alex Conran for co-writing the TV series and to Alex and Jess Clement for co-starring in it. Thanks to Joe Bonneau, danah boyd, Omar Choudary, Saar Drimer, Jeff Hancock, David Livingstone Smith, Ford-Long Wong, Ross Anderson, Stuart Wray, and especially Roberto Viviani for useful comments on previous drafts. This article is updated and abridged from the 2009 technical report9 by the same authors.
References
1. Abagnale, F.W. The Art of the Steal: How to Protect Yourself and Your Business from Fraud. Broadway Books, New York, 2001.
2. Cialdini, R.B. Influence: Science and Practice, Fifth Edition. Pearson, Boston, MA, 2009; (First Edition 1985).
3. Lea et al. The Psychology of Scams: Provoking and Committing Errors of Judgement. Technical Report OFT1070. University of Exeter School of Psychology. Office of Fair Trading, London, U.K., May 2009.
4. Loewenstein, G. Out of control: Visceral influences on behavior. Organizational Behavior and Human Decision Processes 65, 3 (Mar. 1996), 272–292.
5. Macknik, S.L., King, M., Randi, J., Robbins, A., Teller, Thompson, J., and Martinez-Conde, S. Attention and awareness in stage magic: Turning tricks into research. Nature Reviews Neuroscience 9, 11 (Nov. 2008), 871–879.
6. Maurer, D.W. The Big Con: The Story of the Confidence Man. Bobbs-Merrill, New York, 1940.
7. Mitnick, K.D. The Art of Deception: Controlling the Human Element of Security. John Wiley & Sons, Inc., New York, 2002.
8. Simon, H.A. Rational choice and the structure of the environment. Psychological Review 63, 2 (Mar. 1956), 129–138.
9. Stajano, F. and Wilson, P. Understanding Scam Victims: Seven Principles for Systems Security. Technical Report UCAM-CL-TR-754. University of Cambridge Computer Laboratory, Cambridge, U.K., 2009.
10. Tversky, A. and Kahneman, D. Judgment under uncertainty: Heuristics and biases. Science 185, 4157 (Sept. 1974), 1124–1131.
Frank Stajano (frank.stajano@cl.cam.ac.uk) is a university senior lecturer in the Computer Laboratory of the University of Cambridge, Cambridge, U.K.
Paul Wilson (info@conartist.tv) is an expert on cheating, award-winning conjuror, and magic inventor. He works in film and television in London and Los Angeles.
© 2011 ACM 0001-0782/11/0300 $10.00
review articles
doi:10.1145/1897852.1897873
Data Structures in the Multicore Age
The advent of multicore processors as the standard computing platform will force major changes in software design.
by Nir Shavit
Illustration by Andy Gilmore
key insights
We are experiencing a fundamental shift in the properties required of concurrent data structures and of the algorithms at the core of their implementation.
The data structures of our childhood (stacks, queues, and heaps) will soon disappear, replaced by looser […]
Future software engineers will need to learn how to program using these novel structures, understanding their performance benefits and their fairness limitations.
“Multicore processors are about to revolutionize the way we design and use data structures.”
You might be skeptical of this statement; after all, are multicore processors not a new class of multiprocessor machines running parallel programs, just as we have been doing for more than a quarter of a century?
The answer is no. The revolution is partly due to changes multicore processors introduce to parallel architectures; but mostly it is the result of the change in the applications that are being parallelized: […] mainstream computing.
Before the introduction of multicore processors, parallelism was largely dedicated to computational problems with regular, slow-changing (or even static) communication and coordination patterns. Such problems arise in scientific computing or in graphics, but rarely in systems.
The future promises us multiple cores on anything from phones to laptops, desktops, and servers, and therefore a plethora of applications characterized by complex, fast-changing interactions and data exchanges.
Why are these dynamic interactions and data exchanges a problem? The formula we need in order to answer this question is called Amdahl’s Law. It captures the idea that the extent to which we can speed up any complex computation is limited by how much of the computation must be executed sequentially.
Define the speedup S of a computation to be the ratio between the time it takes one processor to complete the computation (as measured by a wall clock) versus the time it takes n concurrent processors to complete the same computation. Amdahl’s Law characterizes the maximum speedup S that can be achieved by n processors collaborating on an application, where p is the fraction of the computation that can be executed in parallel. Assume, for simplicity, that it takes (normalized) time 1 for a single processor to complete the computation. With n concurrent processors, the parallel part takes time p/n, and the sequential part takes time 1 − p. Overall, the parallelized computation takes time 1 − p + p/n. Amdahl’s Law says the speedup, that is, the ratio between
the sequential (single-processor) time and the parallel time, is:
$$S = \frac{1}{1 - p + \frac{p}{n}}$$
In other words, S does not grow linearly in n. For example, given an application and a 10-processor machine, Amdahl’s Law says that even if we manage to parallelize 90% of the application, but not the remaining 10%, then we end up with roughly a fivefold speedup (1/(0.1 + 0.9/10) ≈ 5.3), but not a 10-fold speedup. Doubling the number of cores to 20 will only raise us to roughly a sevenfold speedup (1/(0.1 + 0.9/20) ≈ 6.9). So the remaining 10%, those we continue to execute sequentially, cut our utilization of the 10-processor machine in half, and limit us to a 10-fold speedup no matter how many cores we add.
What are the 10% we found difficult to parallelize? In many mainstream applications they are the parts of the program involving interthread interaction and coordination, which on multicore machines are performed by concurrently accessing shared data structures. Amdahl’s Law tells us it is worthwhile to invest an effort to derive as much parallelism as possible from these 10%, and a key step on the way to doing so is to have highly parallel concurrent data structures.
Unfortunately, concurrent data structures are difficult to design. There is a kind of tension between correctness and performance: the more one tries to improve performance, the more difficult it becomes to reason about the resulting algorithm as being correct. Some experts blame the widely accepted threads-and-objects programming model (that is, threads communicating via shared objects), and predict its eventual demise will save us. My experience with the alternatives suggests this model is here to stay, at least for the foreseeable future. So let us, in this article, consider correctness and performance of data structures on multicore machines within the threads-and-objects model.
In the concurrent world, in contrast to the sequential one, correctness has two aspects: safety, guaranteeing that nothing bad happens, and liveness, guaranteeing that eventually something good will happen.
The safety aspects of concurrent data structures are complicated by the need to argue about the many possible interleavings of methods called by different threads. It is infinitely easier and more intuitive for us humans to specify how abstract data structures behave in a sequential setting, where there are no interleavings. Thus, the standard approach to arguing the safety properties of a concurrent data structure is to specify the structure’s properties sequentially, and find a way to map its concurrent executions to these “correct” sequential ones. There are various approaches for doing this, called consistency conditions. Some familiar conditions are serializability, linearizability, sequential consistency, and quiescent consistency.
When considering liveness in a concurrent setting, the good thing one expects to happen is that method calls eventually complete. The terms under which liveness can be guaranteed are called progress conditions. Some familiar conditions are deadlock-freedom, starvation-freedom, lock-freedom, and wait-freedom. These conditions capture the properties an implementation requires from the underlying system scheduler in order to guarantee that method calls complete. For example, deadlock-free implementations depend on strong scheduler support, while wait-free ones do all the work themselves and are independent of the scheduler.
Finally, we have the performance of our data structures to consider. Historically, uniprocessors are modeled as Turing machines, and one can argue the theoretical complexity of data structure implementations on uniprocessors by counting the number of steps (the machine instructions) that method calls might take. There is an immediate correlation between the theoretical number of uniprocessor steps and the observed time a method will take.
In the multiprocessor setting, things are not that simple. In addition to the actual steps, one needs to consider whether steps by different threads require a shared resource or not, because these resources have a bounded capacity to handle simultaneous requests. For example, multiple instructions accessing the same location in memory cannot be serviced at the same time. In its simplest form, our theoretical complexity model requires us to consider a new element: stalls.2,7–10 When threads concurrently access a shared resource, one succeeds and others incur stalls. The overall complexity of the algorithm, and hence the time it might take to complete, is correlated to the number of operations together with the number of stalls (obviously this is a crude model that does not take into account the details of cache coherence). From an algorithmic design point of view, this model introduces a continuum starting from centralized structures where all threads share data by accessing a small set of locations, incurring many stalls, to distributed structures with multiple locations, in which the number of stalls is greatly reduced, yet the number of steps necessary to properly share data and move it around increases significantly.
How will the introduction of multicore architectures affect the design of concurrent data structures? Unlike on uniprocessors, the choice of algorithm will continue, for years to come, to be greatly influenced by the underlying machine’s architecture. In particular, this includes the number of cores, their layout with respect to memory and to each other, and the added cost of synchronization instructions (on a multiprocessor, not all steps were created equal).
However, I expect the greatest change we will see is that concurrent data structures will go through a substantive “relaxation process.” As the number of cores grows, in each of the categories mentioned, consistency conditions, liveness conditions, and the level of structural distribution, the requirements placed on the data structures will have to be relaxed in order to support scalability. This will put a burden on programmers, forcing them to understand the minimal conditions their applications require, and then use as relaxed a data structure as possible in the solution. It will also place a burden on data structure designers to deliver highly scalable structures once the requirements are relaxed.
This article is too short to allow a survey of the various classes of concurrent data structures (such a survey can be found in Moir and Shavit17) and how one can relax their definitions and implementations in order to make them
scale. Instead, let us focus here on one abstract data structure, a stack, and use it as an example of how the design process might proceed.
I use as a departure point the acceptable sequentially specified notion of a Stack<T> object: a collection of items (of type T) that provides push() and pop() methods satisfying the last-in-first-out (LIFO) property: the last item pushed is the first to be popped.
We will follow a sequence of refinement steps in the design of concurrent versions of stacks. Each step will expose various design aspects and relax some property of the implementation. My hope is that as we proceed, the reader will grow to appreciate the complexities involved in designing a correct scalable concurrent data structure.
A Lock-based Stack
We begin with a LockBasedStack<T> implementation, whose Java pseudocode appears in figures 1 and 2. The pseudocode structure might seem a bit cumbersome at first; this is done in order to simplify the process of extending it later on.
Figure 1. A lock-based Stack<T>: in the push() method, threads alternate between trying to push an item onto the stack and managing contention by backing off before retrying after a failed push attempt.
 1 public class LockBasedStack<T> {
 2   private AtomicBoolean lock =
 3     new AtomicBoolean(false);
 4   ...
 5   protected boolean tryPush(Node node) {
 6     boolean gotLock = lock.compareAndSet(false, true);
 7     if (gotLock) {
 8       Node oldTop = top;
 9       node.next = oldTop;
10       top = node;
11       lock.set(false);
12     }
13     return gotLock;
14   }
15   public void push(T value) {
16     Node node = new Node(value);
17     while (true) {
18       if (tryPush(node)) {
19         return;
20       } else {
21         contentionManager.backoff();
22       }
23     }
24   }
Figure 2. The lock-based Stack<T>: The pop() method alternates between trying to pop and backing off before the next attempt.
 1   protected Node tryPop() throws EmptyException {
 2     boolean gotLock = lock.compareAndSet(false, true);
 3     if (gotLock) {
 4       Node oldTop = top;
 5       if (oldTop == null) {
 6         lock.set(false);
 7         throw new EmptyException();
 8       }
 9       top = oldTop.next;
10       lock.set(false);  // release the lock before returning
11       return oldTop;
12     }
13     else return null;
14   }
15   public T pop() throws EmptyException {
16     while (true) {
17       Node returnNode = tryPop();
18       if (returnNode != null) {
19         return returnNode.value;
20       } else {
21         contentionManager.backoff();
22       }
23     }
24   }
The lock-based stack consists of a linked list of nodes, each with value and next fields. A special top field points to the first list node or is null if the stack is empty. To help simplify the presentation, we will assume it is illegal to add a null value to a stack.
Access to the stack is controlled by a single lock, and in this particular case a spin-lock: a software mechanism in which a collection of competing threads repeatedly attempt to choose exactly one of them to execute a section of code in a mutually exclusive manner. In other words, the winner that acquired the lock proceeds to execute the code, while all the losers spin, waiting for it to be released, so they can attempt to acquire it next.
The lock implementation must enable threads to decide on a winner. This is done using a special synchronization instruction called a compareAndSet() (CAS), available in one form or another on all of today’s mainstream multicore processors. The CAS operation executes a read operation followed by a write operation, on a given memory location, in one indivisible hardware step. It takes two arguments: an expected value and an update value. If the memory location’s value is equal to the expected value, then it is replaced by the update value, and otherwise the value is left unchanged. The method call returns a Boolean indicating whether the value changed. A typical CAS takes significantly more machine cycles than a read or a write, but luckily, the performance of CAS is improving as new generations of multicore processors roll out.
In Figure 1, the push() method creates a new node and then calls tryPush() to try to acquire the lock. If the CAS is successful, the lock is set to true and the method swings the top reference from the current top-of-stack to its successor, and then releases the lock by setting it back to false. Otherwise, the tryPush() lock acquisition attempt is repeated. The pop() method
in Figure 2 calls tryPop(), which attempts to acquire the lock and remove the first node from the stack. If it succeeds, it throws an exception if the stack is empty, and otherwise it returns the node referenced by top. If tryPop() fails to acquire the lock it returns null and is called again until it succeeds.
What are the safety, liveness, and performance properties of our implementation? Well, because we use a single lock to protect the structure, it is obvious its behavior is “atomic” (the technical term used for this is linearizable15). In other words, the outcomes of our concurrent execution are equivalent to those of a sequential execution in which each push or pop takes effect at some non-overlapping instant during their method calls. In particular, we could think of them taking effect when the executing thread acquired the lock. Linearizability is a desired property because linearizable objects can be composed without having to know anything about their actual implementation.
But there is a price for this obvious atomicity. The use of a lock introduces a dependency on the operating system: we must assume the scheduler will not involuntarily preempt threads (at least not for long periods) while they are holding the lock. Without such support from the system, all threads accessing the stack will be delayed whenever one is preempted. Modern operating systems can deal with these issues, and will have to become even better at handling them in the future.
In terms of progress, the locking scheme is deadlock-free, that is, if several threads all attempt to acquire the lock, one will succeed. But it is not starvation-free: some thread could be unlucky enough to always fail in its CAS when attempting to acquire the lock.
The centralized nature of the lock-based stack implementation introduces a sequential bottleneck: only one thread at a time can complete the update of the data structure’s state. This, Amdahl’s Law tells us, will have a very negative effect on scalability, and performance will not improve as the number of cores/threads increases.
But there is another separate phenomenon here: memory contention. Threads failing their CAS attempts on the lock retry the CAS again even while the lock is still held by the last CAS “winner” updating the stack. These repeated attempts cause increased traffic on the machine’s shared bus or interconnect. Since these are bounded resources, the result is an overall slowdown in performance, and in fact, as the number of cores increases, we will see performance deteriorate below that obtainable on a single core. Luckily, we can deal with contention quite easily by adding a contention manager into the code (Line 21 in figures 1 and 2).
The most popular type of contention manager is exponential backoff: every time a CAS fails in tryPush() or tryPop(), the thread delays for a certain random time before attempting the CAS again. A thread will double the range from which it picks the random delay upon CAS failure, and will cut it in half upon CAS success. The randomized nature of the backoff scheme makes the timing of the thread’s attempts to acquire the lock less dependent on the scheduler, reducing the chance of threads falling into a repetitive pattern in which they all try to CAS at the same time and end up starving. Contention managers1,12,19 are key tools in the design of multicore data structures, even when no locks are used, and I expect them to play an even greater role as the number of cores grows.
Figure 3. The lock-free tryPush() and tryPop() methods.
 1 public class LockFreeStack<T> {
 2   private AtomicReference<Node> top =
 3     new AtomicReference<Node>(null);
 4   ...
 5
 6   protected boolean tryPush(Node node) {
 7     Node oldTop = top.get();
 8     node.next = oldTop;
 9     return top.compareAndSet(oldTop, node);
10   }
11
12   protected Node tryPop() throws EmptyException {
13     Node oldTop = top.get();
14     if (oldTop == null) {
15       throw new EmptyException();
16     }
17     Node newTop = oldTop.next;
18     if (top.compareAndSet(oldTop, newTop)) {
19       return oldTop;
20     } else {
21       return null;
22     }
23   }
Figure 4. The EliminationBackoffStack<T>. Each thread selects a random location in the array. If thread A’s pop() and thread B’s push() calls arrive at the same location at about the same time, then they exchange values without accessing the shared lock-free stack. A thread C that does not meet another thread eventually pops the shared lock-free stack.
[Diagram: A:pop() and B:push(b) meet in the elimination array and complete as A:return(b) and B:ok; C:pop() reaches the shared stack (top, holding d, e, f) and completes as C:return(d).]
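As a rough illustration of the exponential backoff scheme described in the text, the following Java sketch pairs a spin lock acquired by CAS with a per-thread randomized backoff. It is a minimal sketch under stated assumptions, not the article's implementation: the class names ExponentialBackoff, BackoffLockStack, and BackoffDemo, the nanosecond bounds, the per-thread ThreadLocal state, and the choice to have pop() return null on an empty stack (the figures throw EmptyException instead) are all hypothetical choices made for the example.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.LockSupport;

// Hypothetical contention manager: randomized exponential backoff.
final class ExponentialBackoff {
    private final int minLimitNanos;
    private final int maxLimitNanos;
    private int limitNanos; // upper bound of the current random delay range

    ExponentialBackoff(int minLimitNanos, int maxLimitNanos) {
        if (minLimitNanos < 1 || maxLimitNanos < minLimitNanos) {
            throw new IllegalArgumentException("need 1 <= min <= max");
        }
        this.minLimitNanos = minLimitNanos;
        this.maxLimitNanos = maxLimitNanos;
        this.limitNanos = minLimitNanos;
    }

    // After a failed CAS: pause for a random time, then double the range.
    void backoff() {
        int delay = ThreadLocalRandom.current().nextInt(limitNanos) + 1;
        limitNanos = (int) Math.min((long) maxLimitNanos, 2L * limitNanos);
        LockSupport.parkNanos(delay);
    }

    // After a successful CAS: cut the range in half.
    void onSuccess() { limitNanos = Math.max(minLimitNanos, limitNanos / 2); }

    int currentLimitNanos() { return limitNanos; }
}

// Spin-lock stack in the spirit of Figures 1 and 2, backing off on CAS failure.
final class BackoffLockStack<T> {
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private final AtomicBoolean lock = new AtomicBoolean(false);
    private Node<T> top; // only read or written while holding the lock
    // Assumption: one backoff state per thread, so each adapts its own range.
    private final ThreadLocal<ExponentialBackoff> manager =
        ThreadLocal.withInitial(() -> new ExponentialBackoff(1_000, 1_000_000));

    private void acquire() {
        ExponentialBackoff m = manager.get();
        while (!lock.compareAndSet(false, true)) {
            m.backoff(); // lost the CAS race: wait before retrying
        }
        m.onSuccess();
    }

    void push(T value) {
        Node<T> node = new Node<>(value);
        acquire();
        node.next = top;
        top = node;
        lock.set(false);
    }

    // Returns null on an empty stack (an assumption made for this sketch).
    T pop() {
        acquire();
        Node<T> oldTop = top;
        if (oldTop != null) {
            top = oldTop.next;
        }
        lock.set(false);
        return oldTop == null ? null : oldTop.value;
    }
}

final class BackoffDemo {
    // Pushes perThread items from each thread, then counts how many pop back.
    static int run(int threads, int perThread) {
        BackoffLockStack<Integer> stack = new BackoffLockStack<>();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) stack.push(i);
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        int popped = 0;
        while (stack.pop() != null) popped++;
        return popped;
    }

    public static void main(String[] args) {
        System.out.println(BackoffDemo.run(4, 1000)); // counts 4 * 1000 pushes
    }
}
```

Keeping the delay range per thread mirrors the description of each thread doubling and halving its own range; a shared range would itself become a contended location. The upper bound on the range is an added safeguard so that a long losing streak cannot grow the delay without limit.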
It is lock-free because we can easily implement a lock-free exchanger using a CAS operation, and the shared stack itself is already lock-free.
In the EliminationBackoffStack, the EliminationArray is used as a backoff scheme to a shared lock-free stack. Each thread first accesses the stack, and if it fails to complete its call (that is, the CAS attempt on top fails) because there is contention, it attempts to eliminate using the array instead of simply backing off in time. If it fails to eliminate, it calls the lock-free stack again, and so on. A thread dynamically selects the subrange of the array within which it tries to eliminate, growing and shrinking it exponentially in response to the load. Picking a smaller subrange allows a greater chance of a successful rendezvous when there are few threads, while a larger range lowers the chances of threads waiting on a busy Exchanger when the load is high.
In the worst case a thread can still fail on both the stack and the elimination. However, if contention is low, threads will quickly succeed in accessing the stack, and as it grows, there will be a higher number of successful eliminations, allowing many operations to complete in parallel in only a constant number of steps. Moreover, contention at the lock-free stack is reduced because eliminated operations never access the stack. Note that we described a lock-free implementation, but, as with many concurrent data structures, on some systems a lock-based implementation might be more fitting and deliver better performance.
An Elimination Tree
A drawback of the elimination backoff stack is that under very high loads the number of un-eliminated threads accessing the shared lock-free stack may remain high, and these threads will continue to have linear complexity. Moreover, if we have, say, bursts of push calls followed by bursts of pop calls, there will again be no elimination and therefore no parallelism. The problem seems to be our insistence on having a linearizable stack: we devised a distributed solution that cuts down on the number of stalls, but the theoretical worst-case linear-time scenario can happen too often.
This leads us to try an alternative approach: relaxing the consistency condition for the stack. Instead of a linearizable stack, let’s implement a quiescently consistent one.4,14 A stack is quiescently consistent if in any execution, whenever there are no ongoing push and pop calls, it meets the LIFO stack specification for all the calls that preceded it. In other words, quiescent consistency is like a game of musical chairs: we map the object to the sequential specification when and only when the music stops. As we will see, this relaxation will nevertheless provide quite powerful semantics for the data structure. In particular, as with linearizability, quiescent consistency allows objects to be composed as black boxes without having to know anything about their actual implementation.
Consider a binary tree of objects called balancers with a single input wire and two output wires, as depicted in Figure 5. As threads arrive at a balancer, it repeatedly sends them to the top wire and then the bottom one, so its top wire always has at most one more thread than the bottom wire. The Tree[k] network is a binary tree of balancers constructed inductively by placing a balancer before two Tree[k/2] networks of balancers and not shuffling their outputs.22
We add a collection of lock-free stacks to the output wires of the tree. To perform a push, threads traverse the balancers from the root to the leaves and then push the item onto the appropriate stack. In any quiescent state, when there are no threads in the tree, the output items are balanced out so that the top stacks have at most one more than the bottom ones, and there are no gaps.
We can implement the balancers in a straightforward way using a bit that threads toggle: they fetch the bit and then complement it (a CAS operation), exiting on the output wire they fetched (zero or one). How do we perform a pop? Magically, to perform a pop threads traverse the balancers in the opposite order of the push, that is, in each balancer, after complementing the bit, they follow this complement, the opposite of the bit they fetched. Try this; you will see that from one quiescent state to the next, the items removed are the last ones pushed onto the stack. We thus have a collection of stacks that are accessed in parallel, yet act as one quiescent LIFO stack.
The bad news is that our implementation of the balancers using a bit means that every thread that enters the tree accesses the same bit in the root balancer, causing that balancer to become a bottleneck. This is true, though to a lesser extent, with balancers lower
Figure 5. A Tree[4] network leading to four lock-free stacks.
[Diagram: lock-free balancers, each with output wires 0 and 1, route pushed items 1 through 5 to four lock-free stacks.]
Threads pushing items arrive at the balancers in the order of their numbers, eventually pushing items onto the stacks located on their output wires. In each balancer, a pushing thread fetches and then complements the bit, following the wire indicated by the fetched value (if the state is 0, the pushing thread will change it to 1 and continue on wire 0; if it was 1, it will change it to 0 and continue on wire 1). The tree and stacks will end up in the balanced state seen in the figure. The state of the bits corresponds to 5 being the last item, and the next location a pushed item will end up on is the lock-free stack containing item 2. Try it! A popping thread does the opposite of the pushing one: it complements the bit and follows the complemented value. Thus, if a thread executes a pop in the depicted state, it will end up switching a
1 to a 0 at the top balancer, and leave on wire 0, then reach the top 2nd level balancer, again switching a 1 in the tree.
to a 0 and following its 0 wire, ending up popping the last value 5 as desired. This behavior will be true for We can parallelize the tree by ex-
concurrent executions as well: the sequences of values in the stacks in all quiescent states can be shown ploiting a simple observation similar
to preserve LIFO order.
to one we made about the elimination
backoff stack:
If an even number of threads passes through a balancer, the outputs are evenly balanced on the top and bottom wires, but the balancer’s state remains unchanged.

The idea behind the EliminationTree<T> [20, 22] is to place an EliminationArray in front of the bit in every balancer as in Figure 6. If two popping threads meet in the array, they leave on opposite wires, without a need to touch the bit, as anyhow it would have remained in its original state. If two pushing threads meet in the array, they also leave on opposite wires. If a push or pop call does not manage to meet another in the array, it toggles the bit and leaves accordingly. Finally, if a push and a pop meet, they eliminate, exchanging items as in the EliminationBackoffStack. It can be shown that this implementation provides a quiescently consistent stack,[a] in which, in most cases, it takes a thread O(log k) steps to complete a push or a pop, where k is the number of lock-free stacks on its output wires.

Figure 6. The EliminationTree<T>. Each balancer in Tree[4] is an elimination balancer. The state depicted is the same as in Figure 5. From this state, a push of item 6 by thread A will not meet any others on the elimination arrays and so will toggle the bits and end up on the 2nd stack from the top. Two pops by threads B and C will meet in the top balancer’s array and end up going up and down without touching the bit, ending up popping the last two values 5 and 6 from the top two lock-free stacks. Finally, threads D and E will meet in the top array and “eliminate” each other, exchanging the value 7 and leaving the tree. This does not ruin the tree’s state since the states of all the balancers would have been the same even if the threads had both traversed all the way down without meeting: they would have anyhow followed the same path down and ended up exchanging values via the same stack.

[a] To keep things simple, pop operations should block until a matching push appears.

A Pool Made of Stacks
The collection of stacks accessed in parallel in the elimination tree provides quiescently consistent LIFO ordering with a high degree of parallelism. However, each method call involves a logarithmic number of memory accesses, each involving a CAS operation, and these accesses are not localized, that is, threads are repeatedly accessing locations they did not access recently.

This brings us to the final two issues one must take into account when designing concurrent data structures: the machine’s memory hierarchy and its coherence mechanisms. Mainstream multicore architectures are cache coherent, where on most machines the L2 cache (and in the near future the L3 cache as well) is shared by all cores. A large part of the machine’s performance on shared data is derived from the threads’ ability to find the data cached. The shared caches are unfortunately a bounded resource, both in their size and in the level of access parallelism they offer. Thus, the data structure design needs to attempt to lower the overall number of accesses to memory, and to maintain locality as much as possible.

What are the implications for our stack design? Consider completely relaxing the LIFO property in favor of a Pool<T> structure in which there is no temporal ordering on push() and pop() calls. We will provide a concurrent lock-free implementation of a pool that supports high parallelism, high locality, and has a low cost in terms of the overall number of accesses to memory. How useful is such a concurrent pool? I would like to believe that most concurrent applications can be tailored to use pools in place of queues and stacks (perhaps with some added liveness conditions)… time will tell.

Figure 7. The concurrent Pool<T>. Each thread performs push() and pop() calls on a lock-free stack and attempts to steal from other stacks when a pop() finds the local stack empty. In the figure, thread C will randomly select the top lock-free stack, stealing the value 5. If the lock-free stacks are replaced by lock-free deques, thread C will pop the oldest value, returning 1.

[b] One typically adds a termination detection protocol [14] to the structure to guarantee that threads will know when there remain no items to pop.

Our overall concurrent pool design is quite simple. As depicted in Figure 7, we allocate a collection of n concurrent lock-free stacks, one per computing thread (alternatively we could allocate one stack per collection of threads on the same core, depending on the specific machine architecture). Each thread will push and pop from its own assigned stack. If, when it attempts to pop, it finds its own stack is empty, it will repeatedly attempt to “steal” an item from another randomly chosen stack.[b] The pool has, in
the common case, the same O(1) complexity per method call as the original lock-free stack, yet provides a very high degree of parallelism. The act of stealing itself may be expensive, especially when the pool is almost empty, but there are various techniques to reduce the number of steal attempts if they are unlikely to succeed. The randomization serves the purpose of guaranteeing an even distribution of threads over the stacks, so that if there are items to be popped, they will be found quickly. Thus, our construction has relaxed the specification by removing the causal ordering on method calls and replacing the deterministic liveness and complexity guarantees with probabilistic ones.

As the reader can imagine, the O(1) step complexity does not tell the whole story. Threads accessing the pool will tend to pop items that they themselves recently pushed onto their own designated stack, therefore exhibiting good cache locality. Moreover, since chances of a concurrent stealer are low, most of the time a thread accesses its lock-free stack alone. This observation allows designers to create a lock-free “stack-like” structure called a Deque [c] that allows the frequently accessing local thread to use only loads and stores in its methods, resorting to more expensive CAS-based method calls only when chances of synchronization with a conflicting stealing thread are high [3, 6].

The end result is a pool implementation that is tailored to the costs of the machine’s memory hierarchy and synchronization operations. The big hope is that as we go forward, many of these architecture-conscious optimizations, which can greatly influence performance, will move into the realm of compilers and concurrency libraries, and the need for everyday programmers to be aware of them will diminish.

What Next?
The pool structure ended our sequence of relaxations. I hope the reader has come to realize how strongly
the machine’s size and the application’s concurrency requirements. For example, small collections of threads can effectively share a lock-based or lock-free stack, slightly larger ones an elimination stack, but for hundreds of threads we will have to bite the bullet and move from a stack to a pool (though within the pool implementation threads residing on the same core or machine cluster could use a single stack quite effectively).

In the end, we gave up the stack’s LIFO ordering in the name of performance. I imagine we will have to do the same for other data structure classes. For example, I would guess that search structures will move away from being comparison based, allowing us to use hashing and similar naturally parallel techniques, and that priority queues will have a relaxed priority ordering in place of the strong one imposed by deleting the minimum key. I can’t wait to see what these and other structures will look like.

As we go forward, we will also need to take into account the evolution of hardware support for synchronization. Today’s primary construct, the CAS operation, works on a single memory location. Future architectures will most likely support synchronization techniques such as transactional memory [13, 21], allowing threads to instantaneously read and write multiple locations in one indivisible step. Perhaps more important than the introduction of new features like transactional memory is the fact that the relative costs of synchronization and coherence are likely to change dramatically as new generations of multicore chips roll out. We will have to make sure to consider this evolution path carefully as we set our language and software development goals.

Concurrent data structure design has, for many years, been moving forward at a glacial pace. Multicore processors are about to heat things up, leaving us, the data structure designers and users, with the interesting job of directing which way they flow. Let’s try

2. Anderson, J. and Kim, Y. An improved lower bound for the time complexity of mutual exclusion. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (2001), 90–99.
3. Arora, N.S., Blumofe, R.D. and Plaxton, C.G. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems 34, 2 (2001), 115–144.
4. Aspnes, J., Herlihy, M. and Shavit, N. Counting networks. J. ACM 41, 5 (1994), 1020–1048.
5. Blumofe, R.D. and Leiserson, C.E. Scheduling multithreaded computations by work stealing. J. ACM 46, 5 (1999), 720–748.
6. Chase, D. and Lev, Y. Dynamic circular work-stealing deque. In Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures (2005). ACM Press, NY, 21–28.
7. Cypher, R. The communication requirements of mutual exclusion. In ACM Proceedings of the Seventh Annual Symposium on Parallel Algorithms and Architectures (1995), 147–156.
8. Dwork, C., Herlihy, M. and Waarts, O. Contention in shared memory algorithms. J. ACM 44, 6 (1997), 779–805.
9. Fich, F.E., Hendler, D. and Shavit, N. Linear lower bounds on real-world implementations of concurrent objects. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (2005). IEEE Computer Society, Washington, D.C., 165–173.
10. Gibbons, P.B., Matias, Y. and Ramachandran, V. The queue-read queue-write PRAM model: Accounting for contention in parallel algorithms. SIAM J. Computing 28, 2 (1999), 733–769.
11. Hendler, D., Shavit, N. and Yerushalmi, L. A scalable lock-free stack algorithm. J. Parallel and Distributed Computing 70, 1 (Jan. 2010), 1–12.
12. Herlihy, M., Luchangco, V., Moir, M. and Scherer III, W.N. Software transactional memory for dynamic-sized data structures. In Proceedings of the 22nd Annual Symposium on Principles of Distributed Computing. ACM, NY, 2003, 92–101.
13. Herlihy, M. and Moss, E. Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21, 2 (1993), 289–300.
14. Herlihy, M. and Shavit, N. The Art of Multiprocessor Programming. Morgan Kaufmann, San Mateo, CA, 2008.
15. Herlihy, M. and Wing, J. Linearizability: A correctness condition for concurrent objects. ACM Trans. Programming Languages and Systems 12, 3 (July 1990), 463–492.
16. Moir, M., Nussbaum, D., Shalev, O. and Shavit, N. Using elimination to implement scalable and lock-free FIFO queues. In Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures. ACM Press, NY, 2005, 253–262.
17. Moir, M. and Shavit, N. Concurrent data structures. Handbook of Data Structures and Applications, D. Mehta and S. Sahni, eds. Chapman and Hall/CRC Press, 2007, 47-14, 47-30.
18. Scherer III, W.N., Lea, D. and Scott, M.L. Scalable synchronous queues. In Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, NY, 2006, 147–156.
19. Scherer III, W.N. and Scott, M.L. Advanced contention management for dynamic software transactional memory. In Proceedings of the 24th Annual ACM Symposium on Principles of Distributed Computing. ACM, NY, 2005, 240–248.
20. Shavit, N. and Touitou, D. Elimination trees and the construction of pools and stacks. Theory of Computing Systems 30 (1997), 645–670.
21. Shavit, N. and Touitou, D. Software transactional memory. Distributed Computing 10, 2 (Feb. 1997), 99–116.
22. Shavit, N. and Zemach, A. Diffracting trees. ACM Transactions on Computer Systems 14, 4 (1996), 385–428.
23. Treiber, R.K. Systems programming: Coping with parallelism. Technical Report RJ 5118 (Apr. 1986). IBM Almaden Research Center, San Jose, CA.
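The toggle-bit balancers and the Tree[k] routing described in the article can be tried out with a small sketch. What follows is an illustration under stated assumptions, not the article's implementation: Python has no CAS instruction, so a lock stands in for the atomic fetch-and-complement; the stacks are plain lists guarded by locks rather than lock-free stacks; and `pop` returns `None` on an empty stack instead of blocking. The names `Balancer`, `TreeStack`, `push_exit`, and `pop_exit` are hypothetical.

```python
import threading


class Balancer:
    """A toggle bit. The lock stands in for a CAS-based fetch-and-complement."""

    def __init__(self):
        self._bit = 0
        self._lock = threading.Lock()

    def push_exit(self):
        # Pushers fetch the bit, complement it, and leave on the fetched wire.
        with self._lock:
            fetched = self._bit
            self._bit ^= 1
            return fetched

    def pop_exit(self):
        # Poppers complement the bit and leave on the complemented wire.
        with self._lock:
            self._bit ^= 1
            return self._bit


class TreeStack:
    """Tree[width] of balancers feeding `width` stacks (width a power of two)."""

    def __init__(self, width):
        if width < 2 or width & (width - 1):
            raise ValueError("width must be a power of two >= 2")
        self._depth = width.bit_length() - 1
        self._balancers = [Balancer() for _ in range(width - 1)]  # heap layout
        self.stacks = [[] for _ in range(width)]
        self._stack_locks = [threading.Lock() for _ in range(width)]

    def _route(self, pushing):
        node, wire = 0, 0
        for level in range(self._depth):
            balancer = self._balancers[node]
            bit = balancer.push_exit() if pushing else balancer.pop_exit()
            wire |= bit << level       # root bit is the least significant one,
            node = 2 * node + 1 + bit  # so successive pushes fill stacks 0, 1, 2, ...
        return wire

    def push(self, item):
        wire = self._route(pushing=True)
        with self._stack_locks[wire]:
            self.stacks[wire].append(item)

    def pop(self):
        wire = self._route(pushing=False)
        with self._stack_locks[wire]:
            # Simplification: return None instead of blocking for a matching push.
            return self.stacks[wire].pop() if self.stacks[wire] else None


# Reproduce the state of Figure 5: push items 1..5, then pop them all.
tree = TreeStack(4)
for item in range(1, 6):
    tree.push(item)
sizes_after_pushes = [len(s) for s in tree.stacks]
popped = [tree.pop() for _ in range(5)]
```

Run sequentially, the five pushes leave the first stack with two items and the others with one each, and the pops return the items in reverse push order, which is the quiescent LIFO behavior the Figure 5 caption walks through.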
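The pool design from "A Pool Made of Stacks" (one stack per thread, with random stealing when the local stack is empty) can be sketched in the same spirit. This is a minimal illustration, not the article's lock-free code: lists guarded by locks replace the lock-free stacks, and a hypothetical `max_steals` bound replaces the termination detection protocol mentioned in the footnote. The class and parameter names are assumptions.

```python
import random
import threading


class Pool:
    """Unordered pool: each owner has its own stack and steals when it is empty."""

    def __init__(self, n_stacks, seed=None):
        self._stacks = [[] for _ in range(n_stacks)]
        self._locks = [threading.Lock() for _ in range(n_stacks)]
        self._rng = random.Random(seed)

    def push(self, owner, item):
        # A thread always pushes onto its own assigned stack.
        with self._locks[owner]:
            self._stacks[owner].append(item)

    def _try_pop(self, index):
        with self._locks[index]:
            stack = self._stacks[index]
            return (True, stack.pop()) if stack else (False, None)

    def pop(self, owner, max_steals=64):
        # Common case: the local stack has an item, so one O(1) access suffices.
        found, item = self._try_pop(owner)
        n = len(self._stacks)
        attempts = 0
        while not found and n > 1 and attempts < max_steals:
            victim = self._rng.randrange(n - 1)
            if victim >= owner:
                victim += 1  # skip the caller's own stack
            found, item = self._try_pop(victim)
            attempts += 1
        if not found:
            # Assumption: give up after a bounded number of steal attempts.
            raise LookupError("no item found after bounded stealing")
        return item


pool = Pool(2, seed=0)
pool.push(0, 1)
pool.push(0, 5)
local = pool.pop(0)   # owner 0 pops its own most recent item
stolen = pool.pop(1)  # owner 1 is empty and steals from stack 0
```

The local pop touches only the caller's own stack, which is where the locality argument in the text comes from; stealing is the slow path.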
research highlights

p. 86
Technical Perspective: Concerto for Violin and Markov Model
By Juan Bello, Yann LeCun, and Robert Rowe

p. 87
The Informatics Philharmonic
By Christopher Raphael

p. 94
Technical Perspective: VL2
By Jennifer Rexford

p. 95
VL2: A Scalable and Flexible Data Center Network
By Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta
doi:10.1145/1897852.1897874
doi:10.1145/1897852.1897875
3.1. The listen model
Our HMM approach views the audio data as a sequence of “frames,” $y_1, y_2, \ldots, y_T$, with about 30 frames per second, while modeling these frames as the output of a hidden Markov chain, $x_1, x_2, \ldots, x_T$. The state graph for the Markov chain, described in Figure 1, models the music as a sequence of sub-graphs, one for each solo note, arranged so that the process enters the start of the $(n+1)$th note as it leaves the $n$th note. From the figure, one can see that each note begins with a short sequence of states meant to capture the attack portion of the note. This is followed by another sequence of states with self-loops meant to capture the main body of the note, and to account for the variation in note duration we may observe, as follows.

If we chain together $m$ states which each either move forward, with probability $p$, or remain in the current state, with probability $q = 1 - p$, then the total number of state visits (audio frames), $L$, spent in the sequence of $m$ states has a negative binomial distribution.

[Figure 1. State graph of the note models (Note 1, Note 2, Note 3, etc.), each entered at its start state and made up of attack (atck) states followed by sustain (sust) states.]

Our data model is composed of three features $b_t(y_t)$, $e_t(y_t)$, $s_t(y_t)$ assumed to be conditionally independent given the state:

$$P(b_t, e_t, s_t \mid x_t) = P(b_t \mid x_t)\, P(e_t \mid x_t)\, P(s_t \mid x_t).$$

The first feature, $b_t$, measures the local “burstiness” of the signal, particularly useful in distinguishing between note attacks and steady-state behavior; observe that we distinguished between the attack portion of a note and steady-state portion in Figure 1. The second feature, $e_t$, measures the local energy, useful in distinguishing between rests and notes. By far, however, the vector-valued feature $s_t$ is the most important, as it is well-suited to making pitch discriminations, as follows.

We let $f_n$ denote the frequency associated with the nominal pitch of the $n$th score note. As with any quasi-periodic signal with frequency $f_n$, we expect that the audio data from the $n$th note will have a magnitude spectrum composed of “peaks” at integral multiples of $f_n$. This is modeled by the Gaussian mixture model depicted in Figure 2.

[Figure 2. Gaussian mixture model of the magnitude spectrum; horizontal axis: Frequency (Hz), 0 to 1000.]
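The note-duration model of Section 3.1 can be checked numerically. The sketch below assumes that each of the $m$ chained states is visited at least once, so the frame count $L$ takes values $m, m+1, \ldots$ with probability $\binom{l-1}{m-1} p^m q^{l-m}$; this closed form follows from the chain as described but is not quoted from the article. The function names are hypothetical.

```python
import math
import random


def duration_pmf(frames, m, p):
    """P(L = frames) for a chain of m states, each left with probability p per frame."""
    if frames < m:
        return 0.0  # every state is visited at least once
    q = 1.0 - p
    return math.comb(frames - 1, m - 1) * p ** m * q ** (frames - m)


def sample_duration(m, p, rng):
    """Simulate the chain directly: count frames until all m states are left."""
    frames = 0
    for _ in range(m):
        frames += 1                # first visit to this state
        while rng.random() >= p:   # self-loop with probability q = 1 - p
            frames += 1
    return frames


m, p = 4, 0.25
support = range(m, 2000)
total_mass = sum(duration_pmf(l, m, p) for l in support)
mean_frames = sum(l * duration_pmf(l, m, p) for l in support)
one_draw = sample_duration(m, p, random.Random(1))
```

Under these assumptions the mean duration is $m/p$ frames, so `m` and `p` together set both the typical length of a note and how much it may vary.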
$s_t$ as the histogram of a random sample of size $C$. Thus our data model becomes the multinomial distribution (1).

It is worth noting that the model generalizes in a straightforward way to situations in which multiple pitches sound at once, simply by mixing several distributions of the form of Equation 1. In this way our approach accommodates anything from double stops on the violin to large ensemble performances.

This modeling approach describes the part of the audio spectrum due to the soloist reasonably well. However, our actual signal will receive not only this solo contribution, but also audio generated by our accompaniment system itself. If the accompaniment audio contains frequency content that is confused with the solo audio, the result is the highly undesirable possibility of the accompaniment system following itself: in essence, chasing its own shadow. To a certain degree, the likelihood of this outcome can be diminished by “turning off” the score follower when the soloist is not playing; of course we do this. However, there is still significant potential for shadow-chasing since the pitch content of the solo and accompaniment parts is often similar.

Our solution is to directly model the accompaniment contribution to the audio signal we receive. Since we know what the orchestra is playing (our system generates this audio), we add this contribution to the data model. More explicitly, if $q_t$ is the magnitude spectrum of the orchestra’s contribution in frame $t$, we model the conditional distribution of $s_t$ using Equation 1, but with $p_{t,n} = \lambda p_n + (1 - \lambda) q_t$ for $0 < \lambda < 1$ instead of $p_n$.

This addition creates significantly better results in many situations. The surprising difficulty in actually implementing the approach, however, is that there seems to be only weak agreement between the known audio that our system plays through the speakers and the accompaniment audio that comes back through the microphone. Still, with various averaging tricks in the estimation of $q_t$, we can nearly eliminate the undesirable shadow-chasing behavior.

3.2. Online interpretation of audio
One of the worst things a score follower can do is report events before they have occurred. In addition to the sheer impossibility of producing accurate estimates in this case, the musical result often involves the accompanist arriving at a point of coincidence before the soloist does. When the accompanist “steps on” the soloist in this manner, the soloist must struggle to regain control of the performance, perhaps feeling desperate and irrelevant in the process. Since the consequences of false positives are so great, the score follower must be reasonably certain that a note event has already occurred before reporting its location. The probabilistic formulation of online score following is the key to avoiding such false positives, while navigating the accuracy-latency trade-off in a reasonable manner.

Every time we process a new frame of audio we recompute the “forward” probabilities, $p(x_t \mid y_1, \ldots, y_t)$, for our current frame, $t$. Listen waits to detect note $n$ until we are sufficiently confident that its onset is in the past. That is, until

$$P(x_t \ge \mathrm{start}_n \mid y_1, \ldots, y_t) \ge \tau$$

for some constant, $\tau$. In this expression, $\mathrm{start}_n$ represents the initial state of the $n$th note model, as indicated in Figure 1, which is either before, or after all other states in the model ($x_t \ge \mathrm{start}_n$ makes sense here). Suppose that $t^*$ is the first frame where the above inequality holds. When this occurs, our knowledge of the note onset time can be summarized by the function of $t$:

$$P(x_t = \mathrm{start}_n \mid y_1, \ldots, y_{t^*})$$

which we compute using the forward–backward algorithm. Occasionally this distribution conveys uncertainty about the onset time of the note, say, for instance, if it has high variance or is bimodal. In such a case we simply do not report the onset time of the particular note, believing it is better to remain silent than provide bad information. Otherwise, we estimate the onset as

$$\hat{t}_n = \arg\max_{t \le t^*} P(x_t = \mathrm{start}_n \mid y_1, \ldots, y_{t^*}) \qquad (2)$$

and deliver this information to the Predict module.

Several videos demonstrating the ability of our score following can be seen at the aforementioned web site. One of these simply plays the audio while highlighting the locations of note onset detections at the times they are made, thus demonstrating detection latency: what one sees lags slightly behind what one hears. A second video shows a rather eccentric performer who ornaments wildly, makes extreme tempo changes, plays wrong notes, and even repeats a measure, thus demonstrating the robustness of the score follower.

4. PREDICT: MODELING MUSICAL TIMING
As discussed in Section 2, we believe a purely responsive accompaniment system cannot achieve acceptable coordination of parts in the range of common practice “classical” music we treat, thus we choose to schedule our accompaniment through prediction rather than response. Our approach is based on a probabilistic model for musical timing. In developing this model, we begin with three important traits we believe such a model must have.

1. Since our accompaniment must be constructed in real time, the computational demand of our model must be feasible in real time.
2. Our system must improve with rehearsal. Thus our model must be able to automatically train its parameters to embody the timing nuances demonstrated by the live player in past examples. This way our system can better anticipate the future musical evolution of the current performance.
3. If our rehearsals are to be successful in guiding the system toward the desired musical end, the system must “sightread” (perform without rehearsal) reasonably well. Otherwise, the player will become distracted by the poor ensemble and not be able to demonstrate what he or she wants to hear. Thus there must be a neutral setting of parameters that allows the system to perform reasonably well “out of the box.”

Figure 3. Top: Two musical parts generate a composite rhythm when superimposed. Bottom: The resulting graphical model arising from the composite rhythm.
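The detection rule of Section 3.2 (report note $n$ only once the filtered probability that its onset is in the past reaches a threshold) can be sketched with a generic HMM forward recursion. This is a minimal illustration, not the Listen module itself: it assumes states are indexed left to right so that "$x_t \ge \mathrm{start}_n$" is a sum over a suffix of the state vector, and it takes per-frame likelihoods as given rather than computing them from audio features. All names and the toy numbers are hypothetical.

```python
def forward_step(prev, trans, lik):
    """One forward update: predict with the transition matrix, weight, normalize."""
    n = len(prev)
    predicted = [sum(prev[i] * trans[i][j] for i in range(n)) for j in range(n)]
    unnorm = [predicted[j] * lik[j] for j in range(n)]
    z = sum(unnorm)
    return [u / z for u in unnorm]


def detect_onset_frame(init, trans, liks, start_state, tau):
    """Return the first frame t* with P(x_t >= start_state | y_1..y_t) >= tau, else None."""
    alpha = None
    for t, lik in enumerate(liks):
        if alpha is None:
            unnorm = [init[j] * lik[j] for j in range(len(init))]
            z = sum(unnorm)
            alpha = [u / z for u in unnorm]
        else:
            alpha = forward_step(alpha, trans, lik)
        if sum(alpha[start_state:]) >= tau:
            return t
    return None


# Toy two-state chain: state 0 is the previous note, state 1 starts the next one.
init = [1.0, 0.0]
trans = [[0.9, 0.1],
         [0.0, 1.0]]
# Hypothetical likelihoods P(y_t | x_t): three frames favor state 0, then three favor state 1.
liks = [[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3

t_star = detect_onset_frame(init, trans, liks, start_state=1, tau=0.9)
never = detect_onset_frame(init, trans, liks, start_state=1, tau=0.999)
```

In this toy run the evidence changes at frame 3 but the threshold is only crossed one frame later, which is the accuracy-latency trade-off the text describes; raising `tau` delays or suppresses the report.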
$$s_{n+1} = s_n + \sigma_n \qquad (3)$$

$$t_{n+1} = t_n + l_n s_n + \tau_n \qquad (4)$$
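A short sketch can make the timing recursion concrete. It assumes the reading $s_{n+1} = s_n + \sigma_n$ and $t_{n+1} = t_n + l_n s_n + \tau_n$, with $t_n$ the onset time of note $n$, $s_n$ the local tempo in seconds per beat, $l_n$ the note's score length in beats, and $\sigma_n$, $\tau_n$ the tempo and note-length variations; these interpretations are assumptions made for the example, and the function name is hypothetical. In the article's model the variations are random; here they are supplied as fixed numbers.

```python
def onset_times(lengths, t0, s0, sigmas=None, taus=None):
    """Iterate s_{n+1} = s_n + sigma_n and t_{n+1} = t_n + l_n * s_n + tau_n."""
    n = len(lengths)
    sigmas = [0.0] * n if sigmas is None else sigmas
    taus = [0.0] * n if taus is None else taus
    times, tempos = [t0], [s0]
    t, s = t0, s0
    for l_n, sigma_n, tau_n in zip(lengths, sigmas, taus):
        t = t + l_n * s + tau_n  # uses s_n, before the tempo is updated
        s = s + sigma_n
        times.append(t)
        tempos.append(s)
    return times, tempos


lengths = [1.0, 0.5, 0.5]  # note lengths in beats
steady_times, steady_tempos = onset_times(lengths, t0=0.0, s0=0.5)
# Slow down after the first note and lengthen the last note slightly.
shaped_times, shaped_tempos = onset_times(
    lengths, t0=0.0, s0=0.5, sigmas=[0.25, 0.0, 0.0], taus=[0.0, 0.0, 0.125]
)
```

With all variations set to zero the recursion reduces to a metronomic performance, which corresponds to the kind of neutral parameter setting the third requirement above asks for.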
solo notes culminating in a point of coincidence with the orchestra. As each solo note is detected we refine our estimate of the desired point of coincidence, thus gradually “honing in” on this point of arrival. It is worth noting that very little harm is done when Listen fails to detect a solo note. We simply predict the pending orchestra note conditioning on the variables we have observed.

The web page given before contains a video demonstrating this process. The video shows the estimated solo times from our score follower appearing as green marks on a spectrogram. Predictions of our accompaniment system are shown as analogous red marks. One can see the pending orchestra time “jiggling” as new solo notes are estimated, until finally the currently predicted time passes. In the video, one can see occasional solo notes that are never marked with green lines. These are notes for which the posterior onset time was not sufficiently peaked to merit a note detection. This happens most often with repeated pitches, for which our data model is less informative, and notes following longer notes, where our prior model is less opinionated. We simply treat such notes as unobserved and base our predictions only on the observed events.

The role of Predict is to “schedule” accompaniment notes, but what does this really mean in practice? Recall that our program plays audio by phase-vocoding (time-stretching) an orchestra-only recording. A time-frequency representation of such an audio file for the first movement of the Dvořák Cello concerto is shown in Figure 4. If you know the piece, you will likely be able to follow this spectrogram. In preparing this audio for our accompaniment system, we perform an off-line score alignment to determine where the various orchestra notes occur, as marked with vertical lines in the figure. Scheduling a note simply means that we change the phase-vocoder’s play rate so that it arrives at the appropriate audio file position (vertical line) at the scheduled time. Thus the play rate is continually modified as the performance evolves. This is our only “control” of the orchestra performance.

Figure 4. A “spectrogram” of the opening of the first movement of the Dvořák Cello concerto. The horizontal axis of the figure represents time while the vertical axis represents frequency. The vertical lines show the note times for the orchestra.

After one or more “rehearsals,” we adapt our timing model to the soloist to better anticipate future performances. To do this, we first perform an off-line estimate of the solo note times using Equation 2, only conditioning on the entire sequence of frames, $y_1, \ldots, y_T$, using the forward–backward algorithm to identify the most likely onset time for each note. Using one or more such rehearsals, we can iteratively reestimate the model parameters $\{\mu_n\}$ using the EM algorithm, resulting in both measurable and perceivable improvement of prediction accuracy. While, in principle, we can also estimate the $\{\Gamma_n\}$ parameters, we have observed little or no benefit from doing so.

In practice, we have found the soloist’s interpretation to be something of a “moving target.” At first this is because the soloist tends to compromise somewhat in the initial rehearsals, pulling the orchestra in the desired direction, while not actually reaching the target interpretation. But even after the soloist seems to settle down to a particular interpretation on a given day, we often observe further “interpretation drift” over subsequent meetings. Of course, without this drift one’s ideas could never improve! For this reason we train the model using the most recent several rehearsals, thus facilitating the continually evolving nature of musical interpretation.

5. MUSICAL EXPRESSION AND MACHINE LEARNING
Our system learns its musicality through “osmosis.” If the soloist plays in a musical way, and the orchestra manages to closely follow the soloist, then we hope the orchestra will inherit this musicality. This manner of learning by imitation works well in the concerto setting, since the division of authority between the players is rather extreme, mostly granting the “right of way” to the soloist.

In contrast, the pure following approach is less reasonable when the accompaniment needs a sense of musicality that acts independently, or perhaps even in opposition, to what
other players do. Such a situation occurs with the early-stage accompaniment problem discussed in Section 1, as here one cannot learn the desired musicality from the live player. Perhaps the accompaniment antithesis of the concerto setting is the opera orchestra, in which the “accompanying” ensemble is often on equal footing with the soloists. We observed the nadir of our system’s performance in an opera rehearsal where our system served as rehearsal pianist. What these two situations have in common is that they require an accompanist with independent musical knowledge and goals.

How can we more intelligently model this musicality? An incremental approach would begin by observing that our timing model of Equations 3 and 4 is over-parametrized, with more degrees of freedom than there are notes. We make this modeling choice because we do not know which degrees of freedom are needed ahead of time, so we use the training data from the soloist to help sort this out. Unnecessary learned parameters may contribute some noise to the resulting timing model, but the overall result is acceptable.

One possible line of improvement is simply decreasing the model’s freedom: surely the player does not wish to change the tempo and apply tempo-independent note length variation on every note. For instance, one alternative model adds a hidden discrete process that “chooses,” for each note, between three possibilities: variation of either tempo or note length, or no variation of either kind. Of these, the choice of neither variation would be the most likely a priori, thus biasing the model toward simpler musical interpretations. The resulting model is a Switching Kalman Filter [15]. While exact inference is no longer possible with such a model, we expect that one can make approximations that will be good enough to realize the full potential of the model.

Perhaps a more ambitious approach analyzes the musical score itself to choose the locations requiring degrees of freedom. One can think of this approach as adding “joints” to the musical structure so that it deforms into musically reasonable shapes as a musician applies external force. Here there is an interesting connection with the work on expressive

understanding of the musical meaning, on which the interpretive decisions are based. This meaning comes from several different aspects of the music. For example, some comes from musical structure, as in the way one might slow down at the end of a phrase, giving a sense of musical closure. Some meaning comes from prosodic aspects, analogous to speech, such as a local point of arrival, which may be emphasized or delayed. A third aspect of meaning describes an overall character or affect of a section of music, such as excited or calm. While there is no official taxonomy of musical interpretation, most discussions on this subject revolve around intermediate identifications of this kind, and the interpretive actions they require [10].

From the machine learning point of view, it is impossible to learn anything useful from a single example, thus one must group together many examples of the same musical situation in order to learn their associated interpretive actions. Thus it seems natural to model the music in terms of some latent variables that implicitly categorize individual notes or sections of music. What should the latent variables be, and how can one describe the dependency structure among them? While we cannot answer these questions, we see in them a good deal of depth and challenge, and recommend this problem to the musically inclined members of the readership with great enthusiasm.

Acknowledgments
This work was supported by NSF Grants IIS-0812244 and IIS-0739563.

References
1. Cemgil, A.T., Kappen, H.J., Barber, D. A generative model for music transcription. IEEE Trans. Audio Speech Lang. Process. 14, 2 (Mar. 2006), 679–694.
2. Cont, A., Schwarz, D., Schnell, N. From Boulez to ballads: Training ircam’s score follower. In Proceedings of the International Computer Music Conference (2005), 241–248.
3. Dannenberg, R., Mont-Reynaud, B. Following an improvisation in real time. In Proceedings of the 1987 International Computer Music Conference (1987), 241–248.
4. Dannenberg, R., Mukaino, H. New tech
systems. Inf. Process. Soc. Jpn. SIG Notes, 123 (2002), 1–6.
9. Pachet, F. Beyond the cybernetic jam fantasy: The continuator. IEEE Comput. Graph. Appl. 24, 1 (2004), 31–35.
10. Palmer, C. Music performance. Annu. Rev. Psychol. 48 (1997), 115–138.
11. Raphael, C. A Bayesian network for real-time musical accompaniment. In Advances in Neural Information Processing Systems (NIPS) 14. MIT Press, 2002.
12. Rowe, R. Interactive Music Systems. MIT Press, 1993.
13. Sagayama, T.N.S., Kameoka, H.
niques for enhanced quality of computer Specmurt anasylis: A piano-roll-
synthesis, such as Widmer and Goebl,16 in which one algo- accompaniment. In Proceedings of the visualization of polyphonic music
rithmically constructs an expressive rendition of a previously 1988 International Computer Music signal by deconvolution of log-
Conference (1988), 243–249. frequency spectrum. In Proceedings
unseen piece of music, using ideas of machine learning. One 5. Flanagan, J.L., Golden, R.M. Phase 2004 ISCA Tutorial and Research
approach here associates various score situations, defined vocoder. Bell Syst. Tech. J. 45 Workshop on Statistical and Perceptual
(Nov. 1966), 1493–1509. Audio Processing (SAPA2004) (2004).
in terms of local configurations of score features, with 6. Franklin, J. Improvisation and 14. Schwarz, D. Score following
interpretive actions. The associated interpretive actions are learning. In Advances in Neural commented bibliography, 2003.
Information Processing Systems 14. 15. Shumway, R.H., Stoffer, D.S. Dynamic
learned by estimating timing and loudness parameters from MIT Press, Cambridge, MA, 2002. linear models with switching. J. Am.
a performance corpus, over all “equivalent” score locations. 7. Klapuri, A., Davy, M. (editors). Signal Stat. Assoc. 86 (1991), 763–769.
Processing Methods for Music Transcrip 16. Widmer, G., Goebl, W. Computational
Such approaches are far more ambitious than our present tion. Springer-Verlag, New York, 2006. models for expressive music
approach to musicality, as they try to understand expression 8. Lippe, C. Real-time interaction among performance: The state of the art.
composers, performers, and computer J. New Music Res. 33, 3 (2004), 203–216
in general, rather than in a specific musical context.
The understanding and synthesis of musical expression Christopher Raphael (craphael@indiana.
edu), School of Informatics and Computing,
is one of the most interesting music-science problems, and Indiana University, Bloomington, IN.
while progress has been achieved in recent years, it would
still be fair to call the problem “open.” One of the principal
challenges here is that one cannot directly map observable
surface-level attributes of the music, such as pitch contour
or local rhythm context, into interpretive actions, such as
delay, or tempo or loudness change. Rather, there is a murky
intermediate level in which the musician comes to some © 2011 ACM 0001-0782/11/0300 $10.00
research highlights
doi:10.1145/1897852.1897876
Technical Perspective
VL2
By Jennifer Rexford
The Internet is increasingly a platform for online services (such as email, Web search, social
networks, and virtual worlds) running on rack after rack of servers in data centers. The servers
not only communicate with end users, but also with each other to analyze data (for example, to
build a search index) or compose Web pages (for example, by combining data from multiple backend
servers). With the advent of large data centers, the study of the networks that interconnect
these servers has become an important topic to researchers and practitioners alike.
Data-center networking presents unique opportunities and challenges, compared to traditional
backbone and enterprise networks:
• In a data center, the same company controls both the servers and the network elements, enabling
new network architectures that implement key functionality on the end-host computers.
• Servers are installed in fixed units, such as racks or even trucks filled with racks driven
directly into the data center. This leads to very uniform wiring topologies, such as fat trees or
Clos networks, reminiscent of the massively parallel computers designed in the 1990s.
• The traffic load in data centers is often quite heavy and non-uniform, due to new backend
applications like MapReduce; the traffic can also be quite volatile, varying dramatically and
unpredictably over time.
In light of these new characteristics, researchers have been revisiting everything in networking,
from addressing and congestion control to routing and the underlying topology, with the unique
needs of data centers in mind.
The following paper presents one of the first measurement studies of network traffic in data
centers, highlighting specifically the volatility of the traffic even on a relatively small
timescale. These observations led the authors to design an “agile” network engineered for
all-to-all connectivity with no contention inside the network. This gives data-center operators
the freedom to place applications on any servers, without concern for the performance of the
underlying network. Having an agile network greatly simplifies the task of designing and running
online services.
More generally, the authors propose a simple abstraction: a single “virtual” layer-two switch
(hence the name “VL2”) for each service, with no interference with the many other services
running in the same data center. They achieve this goal through several key design decisions,
including flat addressing (so service instances can run on any server, independent of its
location) and Valiant Load Balancing (to spread traffic uniformly over the network). A Clos
topology ensures the network has many paths between each pair of servers. To scale to large data
centers, the servers take responsibility for translating addresses to the appropriate “exit
point” from the network, obviating the need for the networking equipment to keep track of the
many end hosts in the data center.
In addition to proposing an effective design, the authors illustrate how to build the solution
using mechanisms available in existing network switches (for example, equal-cost multipath
routing, IP anycast, and packet encapsulation). This allows data centers to deploy VL2 with no
changes to the underlying switches, substantially lowering the barrier for practical deployment.
This paper is a great example of rethinking networking from scratch, while coming full circle to
work with today’s equipment. Indeed, the work depicted in the VL2 paper has already spawned
substantial follow-up work in the networking research community, and likely will for years to
come.

Jennifer Rexford is a professor in the Department of Computer Science at Princeton University, Princeton, NJ.

© 2011 ACM 0001-0782/11/0300 $10.00
doi:10.1145/1897852.1897877
ware should be able to assign any IP address the service requests to any server, and virtual
machines should be able to migrate to any server while keeping the same IP address. Finally,
features like link-local broadcast, on which many legacy applications depend, should work.

We design, implement, and evaluate VL2, a network architecture for data centers that meets these
three objectives and thereby achieves agility.
Design philosophy: In designing VL2, a primary goal was to create a network architecture that
could be deployed today, so we limit ourselves from making any changes to the hardware of the
switches or servers, and we require that legacy applications work unmodified. Our approach is to
build a network that operates like a very large switch, choosing simplicity and high performance
over other features when needed. We sought to use robust and time-tested control plane protocols,
and we avoid adaptive routing schemes that might theoretically offer more bandwidth but open
thorny problems that might not need to be solved and would take us away from vanilla, commodity,
high-capacity switches.
We observe, however, that the software and operating systems on data center servers are already
extensively modified (e.g., to create hypervisors for virtualization or blob file systems to
store data across servers). Therefore, VL2’s design explores a new split in the responsibilities
between host and network, using a layer 2.5 shim in servers’ network stack to work around
limitations of the network devices. No new switch software or switch APIs are needed.
Topology: VL2 consists of a network built from low-cost switch ASICs arranged into a Clos
topology that provides extensive path diversity between servers. This design replaces today’s
mainframe-like large, expensive switches with broad layers of low-cost switches that can be
scaled out to add more capacity and resilience to failure. In essence, VL2 applies the principles
of RAID (redundant arrays of inexpensive disks) to the network.
Traffic engineering: Our measurements show data centers have tremendous volatility in their
workload, their traffic, and their failure patterns. To cope with this volatility in the simplest
manner, we adopt Valiant Load Balancing (VLB) to spread traffic across all available paths
without any centralized coordination or traffic engineering. Using VLB, each server independently
picks a path at random through the network for each of the flows it sends to other servers in the
data center. Our experiments verify that using this design achieves both uniform high capacity
and performance isolation.
Control plane: The switches that make up the network operate as layer-3 routers with routing
tables calculated by OSPF, thereby enabling the use of multiple paths while using a time-tested
protocol. However, the IP addresses used by services running in the data center must not be tied
to particular switches in the network, or the ability for agile reassignment of servers between
services would be lost. Leveraging a trick used in many systems,9 VL2 assigns

the actual location of the destination and then tunnels the original packet there. The shim layer
also helps eliminate the scalability problems created by ARP in layer-2 networks, and the
tunneling improves our ability to implement VLB. These aspects of the design enable VL2 to
provide layer-2 semantics, eliminating the fragmentation and waste of server pool capacity that
the binding between addresses and locations causes in the existing architecture.
Contributions: In the course of this paper, we describe the current state of data center networks
and the traffic across them, explaining why these are important to designing a new architecture.
We present VL2’s design, which we have built and deployed into an 80-server cluster. Using the
cluster, we experimentally validate that VL2 has the properties set out as objectives, such as
uniform capacity and performance isolation. We also demonstrate the speed of the network, such as
its ability to shuffle 2.7TB of data among 75 servers in 395s (averaging 58.8Gbps). Finally, we
describe our experience applying VLB in a new context, the inter-switch fabric of a data center,
and show that VLB smooths utilization while eliminating persistent congestion.

2. BACKGROUND
In this section, we first explain the dominant design pattern for data center architecture
today.5 We then discuss why this architecture is insufficient to serve large cloud-service data
centers.

Figure 1. A conventional network architecture for data centers (adapted from figure by Cisco5).
[Diagram: Internet, core routers, access routers (layer 3), aggregation switches and switches
(layer 2), ToR switches, servers. Key: CR = L3 Core Router; AR = L3 Access Router; AS = L2 Aggr
Switch; S = L2 Switch; ToR = Top-of-Rack Switch.]

As shown in Figure 1, the network is a hierarchy reaching from a layer of servers in racks at the
bottom to a layer of core routers at the top. There are typically 20–40 servers per rack, each
singly connected to a Top of Rack (ToR) switch with a 1Gbps link. ToRs connect to two aggregation
switches for redundancy, and these switches aggregate further, connecting to access routers. At
the top of the hierarchy, core routers carry traffic between access routers and manage traffic
into and out of the data center. All links use Ethernet as a physical-layer protocol, with a mix
of copper and fiber cabling. All switches below each pair of access routers form a single layer-2
domain. The number of servers in a single
layer-2 domain is typically limited to a few hundred due to Ethernet scaling overheads (packet
flooding and ARP broadcasts). To limit these overheads and to isolate different services or
logical server groups (e.g., e-mail, search, web front ends, web back ends), servers are
partitioned into virtual LANs (VLANs) placed into distinct layer-2 domains. Unfortunately, this
conventional design suffers from three fundamental limitations:
Limited server-to-server capacity: As we go up the hierarchy, we are confronted with steep
technical and financial barriers in sustaining high bandwidth. Thus, as traffic moves up through
the layers of switches and routers, the oversubscription ratio increases rapidly. For example,
servers typically have 1:1 oversubscription to other servers in the same rack; that is, they can
communicate at the full rate of their interfaces (e.g., 1 Gbps). We found that uplinks from ToRs
are typically 1:2 to 1:20 oversubscribed (i.e., 1–10 Gbps of uplink for 20 servers), and paths
through the highest layer of the tree can be 1:240 oversubscribed. This large oversubscription
factor fragments the server pool by preventing idle servers from being assigned to overloaded
services, and it severely limits the entire data center’s performance.
Fragmentation of resources: As the cost and performance of communication depend on distance in
the hierarchy, the conventional design encourages service planners to cluster servers nearby in
the hierarchy. Moreover, spreading a service outside a single layer-2 domain frequently requires
the onerous task of reconfiguring IP addresses and VLAN trunks, since the IP addresses used by
servers are topologically determined by the access routers above them. Collectively, this
contributes to the squandering of computing resources across the data center. The consequences
are egregious. Even if there is plentiful spare capacity throughout the data center, it is often
effectively reserved by a single service (and not shared), so that this service can scale out to
nearby servers to respond rapidly to demand spikes or to failures. In fact, the growing resource
needs of one service have forced data center operations to evict other services in the same
layer-2 domain, incurring significant cost and disruption.
Poor reliability and utilization: Above the ToR, the basic resilience model is 1:1. For example,
if an aggregation switch or access router fails, there must be sufficient remaining idle capacity
on the counterpart device to carry the load. This forces each device and link to be run up to at
most 50% of its maximum utilization. Inside a layer-2 domain, use of the Spanning Tree Protocol
means that even when multiple paths between switches exist, only a single one is used. In the
layer-3 portion, Equal Cost Multipath (ECMP) is typically used: when multiple paths of the same
length are available to a destination, each router uses a hash function to spread flows evenly
across the available next hops. However, the conventional topology offers

much data to whom and when?) and churn (how often does the state of the network change due to
switch/link failures and recoveries, etc.?). We studied the production data centers of a large
cloud service provider, and we use the results to drive our choices in designing VL2. Details of
the methodology and results can be found in other papers.10, 16 Here we present the key findings
that directly impact the design of VL2.
Most traffic is internal to the data center: The ratio of traffic volume between servers in our
data centers to traffic entering/leaving our data centers is currently around 4:1 (excluding CDN
applications). An increasing fraction of the computation in data centers involves back-end
computations, and these are driving the demands for network bandwidth.
The network bottlenecks computation: Data center computation is focused where high-speed access
to data on memory or disk is fast and cheap. Even inside a single data center, the network is a
bottleneck to computation: we frequently see switches whose uplinks are above 80% utilization.
Intense computation and communication on data does not straddle data centers due to the cost of
long-haul links.
Structured flow sizes: Figure 2 illustrates the nature of flows within the monitored data center.
The flow size statistics (marked as “+”s) show that the majority of flows are small (a few KB);
most of these small flows are hellos and meta-data requests to the distributed file system. To
examine longer flows, we compute a statistic termed total bytes (marked as “o”s) by weighting
each flow size by its number of bytes. Total bytes tells us, for a random byte, the distribution
of the flow size it belongs to. Almost all the bytes in the data center are transported in flows
whose lengths vary from about 100MB to about 1GB. The mode at around 100MB springs from the fact
that the distributed file system breaks long files into 100-MB-long chunks. Importantly, there
are almost no flows over a few GB.

Figure 2. Mice are numerous; 99% of flows are smaller than 100MB. However, more than 90% of bytes
are in flows between 100MB and 1GB.
[Plots: PDF and CDF of Flow Size and Total Bytes versus Flow Size (Bytes), log scale from 1 to
1e+12.]

Figure 3 shows the probability density function (as a fraction of time) for the number of
concurrent flows going in and out of a machine. There are two modes. More
than 50% of the time, an average machine has about ten concurrent flows, but at least 5% of the
time it has greater than 80 concurrent flows. We almost never see more than 100 concurrent flows.

Figure 3. Number of concurrent connections has two modes: (1) 10 flows per node more than 50% of
the time and (2) 80 flows per node for at least 5% of the time.
[Plot: PDF (Fraction of Time) and Cumulative versus Number of Concurrent flows in/out of each
Machine, log scale from 1 to 1000.]

The distributions of flow size and number of concurrent flows both imply that flow-based VLB will
perform well on this traffic. Since even big flows are only 100MB (1s of transmit time at 1Gbps),
randomizing at flow granularity (rather than packet) will not cause perpetual congestion if there
is unlucky placement of too many flows in the same link.
Volatile traffic patterns: While the sizes of flows show a strong pattern, the traffic patterns
inside a data center are highly divergent. When we cluster the traffic patterns, we find that
more than 50 representative patterns are required to describe the traffic in the data center.
Further, the traffic pattern varies frequently: 60% of the time the network spends only 100 s in
one pattern before switching to another.
Frequent failures: As discussed in Section 2, conventional data center networks apply 1 + 1
redundancy to improve reliability at higher layers of the hierarchical tree. This hierarchical
topology is intrinsically unreliable: even with huge effort and expense to increase the
reliability of the network devices close to the top of the hierarchy, we still see failures on
those devices resulting in significant downtime. In 0.3% of failures, all redundant components in
a network device group became unavailable (e.g., the pair of switches that comprise each node in
the conventional network (Figure 1) or both the uplinks from a switch). The main causes of
failures are network misconfigurations, firmware bugs, and faulty components.
With no obvious way to eliminate failures from the top of the hierarchy, VL2’s approach is to
broaden the top levels of the network so that the impact of failures is muted and performance
degrades gracefully, moving from 1 + 1 redundancy to n + m redundancy.

4. VIRTUAL LAYER 2 NETWORKING
Before detailing our solution, we briefly discuss our design principles and preview how they will
be used in our design.
Randomizing to cope with volatility: The high divergence and unpredictability of data center
traffic matrices suggest that optimization-based approaches to traffic engineering risk
congestion and complexity to little benefit. Instead, VL2 uses VLB: destination-independent
(e.g., random) traffic spreading across the paths in the network. VLB, in theory, ensures a
noninterfering packet-switched network6 (the counterpart of a non-blocking circuit-switched
network) as long as (a) traffic spreading ratios are uniform, and (b) the offered traffic
patterns do not violate edge constraints (i.e., line card speeds). We use ECMP to pursue the
former and TCP’s end-to-end congestion control to pursue the latter. While these design choices
do not perfectly ensure the two assumptions (a and b), we show in Section 5.1 that our scheme’s
performance is close to the optimum in practice.
Building on proven networking technology: VL2 is based on IP routing and forwarding technologies
already available in commodity switches: link-state routing, ECMP forwarding, and IP any-casting.
VL2 uses a link-state routing protocol to maintain the switch-level topology, but not to
disseminate end hosts’ information. This strategy protects switches from needing to learn
voluminous, frequently changing host information. Furthermore, the routing design uses ECMP
forwarding along with anycast addresses to enable VLB while minimizing control plane messages and
churn.
Separating names from locators: To be able to rapidly grow or shrink server allocations and
rapidly migrate VMs, the data center network must support agility, which means support for
hosting any service on any server. This, in turn, calls for separating names from locations.
VL2’s addressing scheme separates servers’ names, termed application-specific addresses (AAs),
from their locations, termed location-specific addresses (LAs). VL2 uses a scalable, reliable
directory system to maintain the mappings between names and locators. A shim layer running in the
networking stack on every host, called the VL2 agent, invokes the directory system’s resolution
service.
Embracing end systems: The rich and homogeneous programmability available at data center hosts
provides a mechanism to rapidly realize new functionality. For example, the VL2 agent enables
fine-grained path control by adjusting the randomization used in VLB. The agent also replaces
Ethernet’s ARP functionality with queries to the VL2 directory system. The directory system
itself is also realized on regular servers, rather than switches, and thus offers flexibility,
such as fine-grained access control between application servers.

4.1. Scale-out topologies
As described in Sections 2 and 3, conventional hierarchical data center topologies have poor
bisection bandwidth and are susceptible to major disruptions due to device failures. Rather than
scale up individual network devices with more capacity and features, we scale out the devices,
building a broad network offering huge aggregate capacity using a large number of simple,
inexpensive devices, as shown in Figure 4. This is an example of a folded Clos network6 where the
links between the intermediate switches and the aggregation switches form a complete bipartite
graph. As in the conventional topology, ToRs connect to two aggregation switches, but the large
number of paths between any two aggregation switches means that if there are n intermediate
switches, the failure of any one of them reduces the bisection bandwidth by only 1/n, a desirable
property we call graceful degradation
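
The 1/n claim for graceful degradation can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming every intermediate switch carries an equal share of the bisection bandwidth (which is what the complete bipartite wiring suggests); the function name and the sample switch counts are illustrative and do not come from the VL2 implementation.

```python
from fractions import Fraction


def remaining_bisection_fraction(n_intermediate: int, n_failed: int) -> Fraction:
    """Fraction of bisection bandwidth left after some intermediate switches fail.

    Assumption: each of the n intermediate switches contributes an equal 1/n
    share, as in a folded Clos network whose intermediate and aggregation
    layers form a complete bipartite graph.
    """
    if n_intermediate <= 0:
        raise ValueError("need at least one intermediate switch")
    if not 0 <= n_failed <= n_intermediate:
        raise ValueError("n_failed must be between 0 and n_intermediate")
    return Fraction(n_intermediate - n_failed, n_intermediate)


if __name__ == "__main__":
    # Hypothetical switch counts, chosen only to show the trend.
    for n in (2, 3, 10, 72):
        left = remaining_bisection_fraction(n, 1)
        print(f"n={n}: one failure leaves {float(left):.1%} of bisection bandwidth")
```

With n = 2 a single failure halves the capacity, which mirrors the 50% utilization ceiling of the conventional paired design; as n grows, the loss from one failure shrinks toward zero.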
associated with an LA, the IP address of the ToR switch to which the application server is
connected. The ToR switch need not be physical hardware; it could be a virtual switch or
hypervisor implemented in software on the server itself!
[Figure residue: source S (20.0.0.55) and destination D (20.0.0.56), each in an IP subnet with
AAs (20/8).]
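
The association between an AA and the LA of its ToR can be pictured as a table lookup followed by encapsulation. The sketch below is only an illustration under stated assumptions: the AA values and the 20/8 range come from the figure residue above, while the LA values, the class names, and the function names are hypothetical and not part of VL2.

```python
import ipaddress
from dataclasses import dataclass

# AA range taken from the figure residue (20/8).
AA_SUBNET = ipaddress.ip_network("20.0.0.0/8")

# Hypothetical AA -> LA table; the LA values are invented for illustration.
AA_TO_LA = {
    ipaddress.ip_address("20.0.0.55"): ipaddress.ip_address("10.0.0.4"),
    ipaddress.ip_address("20.0.0.56"): ipaddress.ip_address("10.0.0.6"),
}


@dataclass(frozen=True)
class Packet:
    src_aa: ipaddress.IPv4Address
    dst_aa: ipaddress.IPv4Address
    payload: bytes


@dataclass(frozen=True)
class Encapsulated:
    outer_dst_la: ipaddress.IPv4Address  # LA of the destination's ToR
    inner: Packet                        # original AA-addressed packet


def encapsulate_to_tor(pkt: Packet) -> Encapsulated:
    """Wrap an AA-addressed packet in an outer header aimed at the ToR's LA."""
    if pkt.dst_aa not in AA_SUBNET:
        raise ValueError(f"{pkt.dst_aa} is not an application address")
    # A KeyError here stands in for a failed directory lookup.
    return Encapsulated(outer_dst_la=AA_TO_LA[pkt.dst_aa], inner=pkt)


def decapsulate(enc: Encapsulated) -> Packet:
    """What the destination ToR would do: strip the outer header."""
    return enc.inner


if __name__ == "__main__":
    s = ipaddress.ip_address("20.0.0.55")
    d = ipaddress.ip_address("20.0.0.56")
    enc = encapsulate_to_tor(Packet(s, d, b"hello"))
    print(enc.outer_dst_la, decapsulate(enc).dst_aa)
```

Because the application only ever sees AAs, moving a server to another rack changes a table entry rather than the server's address.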
spot–free performance for arbitrary traffic matrices, VL2 uses VLB as its traffic engineering
philosophy. As illustrated in Figure 5, VL2 achieves VLB using a combination of ECMP routing
implemented by the switches and packet encapsulation implemented by the shim on each server.
ECMP, a mechanism already implemented in the hardware of most switches, will distribute flows
across the available paths in the network, with the packets with the same source and destination
address taking the same path to avoid packet reordering. To leverage all the available paths in
the network and overcome some limitations in ECMP, the VL2 agent on each sender encapsulates each
packet to an intermediate switch. Hence, the packet is first delivered to one of the intermediate
switches, decapsulated by the switch, delivered to the ToR’s LA, decapsulated again, and finally
sent to the destination server. The source address in the outer headers of the encapsulated
packet is set to a hash of the inner packet’s addresses and ports; this provides additional
entropy to better distribute flows between the same servers across the available paths.
One potential issue for both ECMP and VLB is the chance that uneven flow sizes and random
spreading decisions will cause transient congestion on some links. Our evaluation did not find
this to be a problem on data center workloads (Section 5), but should it occur, the VL2 agent on
the sender can detect and deal with it via simple mechanisms. For example, it can change the hash
used to create the source address periodically or whenever TCP detects a severe congestion event
(e.g., a full window loss) or an Explicit Congestion Notification.

4.3. Maintaining host information using the VL2 directory system
The VL2 directory system provides two key functions: (1) lookups and updates for AA-to-LA
mappings and (2) a reactive cache update mechanism that ensures eventual consistency of the
mappings with very little update overhead (Figure 6).

Figure 6. VL2 Directory System Architecture.
[Diagram: agents, directory servers (DS), and RSM servers. “Lookup” path: 1. Lookup, 2. Reply.
“Update” path: 1. Update, 2. Set, 3. Replicate, 4. Ack, 5. Ack, (6. Disseminate).]

We expect the lookup workload for the directory system to be frequent and bursty because servers
can communicate with up to hundreds of other servers in a short time period, with each new flow
generating a lookup for an AA-to-LA mapping. The bursty nature of workload implies that lookups
require high throughput and low response time to quickly establish a large number of connections.
Since lookups replace ARP, their response time should match that of ARP, that is, tens of
milliseconds. For updates, however, the workload is driven by server-deployment events, most of
which are planned ahead by the data center management system and hence can be batched. The key
requirement for updates is reliability, and response time is less critical.
Our directory service replaces ARP in a conventional L2 network, and ARP ensures eventual
consistency via timeout and broadcasting. This implies that eventual consistency of AA-to-LA
mappings is acceptable as long as we provide a reliable update mechanism. Nonetheless, we intend
to support live VM migration in a VL2 network; our directory system should be able to correct all
the stale entries without breaking any ongoing communications.
The differing performance requirements and workload patterns of lookups and updates lead us to a
two-tiered directory system architecture consisting of (1) a modest number (50–100 servers for
100 K servers) of read-optimized, replicated lookup servers that cache AA-to-LA mappings and that
communicate with VL2 agents, and (2) a small number (5–10 servers) of write-optimized,
asynchronous replicated state-machine (RSM) servers offering a strongly consistent, reliable
store of AA-to-LA mappings. The lookup servers ensure low latency, high throughput, and high
availability for a high lookup rate. Meanwhile, the RSM servers ensure strong consistency and
durability for a modest rate of updates using the Paxos19 consensus algorithm.
Each lookup server caches all the AA-to-LA mappings stored at the RSM servers and independently
replies to lookup queries from agents using the cached state. Since strong consistency is not
required, a lookup server lazily synchronizes its local mappings with the RSM every 30s. To
achieve high availability and low latency, an agent sends a query to k (two in our prototype)
randomly chosen lookup servers and simply chooses the fastest reply. Since AA-to-LA mappings are
cached at lookup servers and at VL2 agents’ cache, an update can lead to inconsistency. To
resolve inconsistency, the cache-update protocol leverages a key observation: a stale host
mapping needs to be corrected only when that mapping is used to deliver traffic. Specifically,
when a stale mapping is used, some packets arrive at a stale LA, that is, a ToR which does not
host the destination server anymore. The ToR forwards such non-deliverable packets to a lookup
server, triggering the lookup server to correct the stale mapping in the source’s cache via
unicast.
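
The lookup path described in Section 4.3 (query k randomly chosen lookup servers, keep the fastest reply, and fix a cached mapping only after it proves stale) can be sketched as a small simulation. Everything here is an assumption made for illustration: the class and method names are hypothetical, reply latency is a fixed number per server rather than a measured network delay, and the k concurrent queries are modeled by comparing those numbers instead of issuing real requests.

```python
import random
from dataclasses import dataclass, field


@dataclass
class LookupServer:
    name: str
    latency_ms: float                             # simulated reply latency
    mappings: dict = field(default_factory=dict)  # cached AA -> LA

    def lookup(self, aa):
        """Return the cached LA (or None) together with the simulated latency."""
        return self.mappings.get(aa), self.latency_ms


@dataclass
class Agent:
    servers: list
    k: int = 2                                    # the prototype in the text uses k = 2
    cache: dict = field(default_factory=dict)
    rng: random.Random = field(default_factory=random.Random)

    def resolve(self, aa):
        """Resolve an AA, asking k random lookup servers on a cache miss."""
        if aa in self.cache:
            return self.cache[aa]
        chosen = self.rng.sample(self.servers, self.k)
        replies = [server.lookup(aa) for server in chosen]
        la, _ = min(replies, key=lambda reply: reply[1])  # fastest reply wins
        self.cache[aa] = la
        return la

    def correct(self, aa, fresh_la):
        """Reactive update: a lookup server pushes the fresh mapping by unicast."""
        self.cache[aa] = fresh_la


if __name__ == "__main__":
    servers = [
        LookupServer(f"ls{i}", latency_ms=float(i + 1),
                     mappings={"20.0.0.56": "10.0.0.6"})
        for i in range(4)
    ]
    agent = Agent(servers=servers, rng=random.Random(0))
    print(agent.resolve("20.0.0.56"))    # served by one of two sampled servers
    # The destination moves; the old ToR reports the stale use and the
    # lookup server corrects the agent's cache.
    agent.correct("20.0.0.56", "10.0.0.9")
    print(agent.resolve("20.0.0.56"))
```

The design point the sketch tries to show is that staleness is tolerated until it matters: the agent never polls for changes, and a mapping is repaired only when traffic actually hits the old location.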
5. EVALUATION
RSM RSM
In this section, we evaluate VL2 using a prototype running on
2. Set 4. Ack an 80-server testbed and 10 commodity switches (Figure 7).
(6. Disseminate)
Our goals are first to show that VL2 can be built from compo-
... ... ... Directory nents available today, and second, that our implementation
DS DS DS
Servers
meets the objectives described in Section 1.
2. Reply 2. Reply
1. Lookup
5. Ack The testbed is built using the Clos network topology of
1. Update
Figure 4, consisting of three intermediate switches, three
Agent Agent aggregation switches, and 4 ToRs. The aggregation and
“Lookup” “Update” intermediate switches have 24 10 Gbps Ethernet ports, of
which 6 ports are used on each aggregation switch and 3
Figure 7. VL2 testbed comprising 80 servers and 10 switches.
Figure 8. Aggregate goodput during a 2.7TB shuffle among 75 servers. [Plot: aggregate goodput and active flows versus time (s).]
In this experiment, we add two services to the network. The first service has a steady network workload, while the workload of the second service ramps up and down. Both services’ servers are intermingled among the 4 ToRs, so their traffic mixes at every level of the network. Figure 9 shows the aggregate goodput of both services as a function of time. As seen in the figure, there is no perceptible change to the aggregate goodput of service one as the flows in service two start up or complete, demonstrating performance isolation when the traffic consists of large long-lived flows. In Figure 10, we perform a similar experiment, but service two sends bursts of small TCP connections, each burst containing progressively more connections. These two experiments demonstrate that TCP’s enforcement of the hose model is sufficient to provide performance isolation across services at timescales greater than a few RTTs (i.e., 1–10ms in data centers).
Figure 9: Aggregate goodput of two services with servers intermingled on the ToRs. Service one’s goodput is unaffected as service two ramps traffic up and down.
5.3. VL2 directory system performance
Finally, we evaluate the performance of the VL2 directory system, which provides the equivalent semantics of ARP in layer 2. We perform this evaluation through macro- and micro-benchmark experiments on the directory system. We run our prototype on up to 50 machines: 3–5 RSM nodes, 3–7 directory server nodes, and the remaining nodes emulating multiple instances of VL2 agents generating lookups and updates.
Our evaluation supports four main conclusions. First, the directory system provides high throughput and fast response time for lookups: three directory servers can handle 50K lookups/s with latency under 10ms (99th percentile latency). Second, the directory system can handle updates at rates significantly higher than the expected churn rate in typical environments: three directory servers can handle 12K updates/s within 600ms (99th percentile latency). Third, our system is incrementally scalable: each directory server increases the processing rate by about 17K for lookups and 4K for updates. Finally, the directory system is robust to component (directory or RSM server) failures and offers high availability under network churn.
To understand the incremental scalability of the directory system, we measured the maximum lookup rates (ensuring sub-10ms latency for 99% of requests) with 3, 5, and 7 directory servers. The result confirmed that the maximum lookup rate increases linearly with the number of directory servers (with each server offering a capacity of 17K lookups/s). Based on this result, we estimate the worst-case number of directory servers needed for a 100K-server data center. Using the concurrent flow measurements (Figure 3), we use the median of 10 correspondents per server in a 100s window. In the worst case, all 100K servers may perform 10 lookups at the same time, resulting in a million simultaneous lookups per second. As noted above, each directory server can handle about 17K lookups/s under 10ms at the 99th percentile. Therefore, handling this worst case will require a directory system of about 60 servers (1M / 17K ≈ 59, or 0.06% of all servers).
6. DISCUSSION
In this section, we address several remaining concerns about the VL2 architecture, including whether other traffic engineering mechanisms might be better suited to the DC than VLB, and the cost of a VL2 network.
Optimality of VLB: As noted in Section 4.2.2, VLB uses randomization to cope with volatility, potentially sacrific-
all TMs so as to minimize the maximum link utilization.
of performance that the theory predicts. Experiments with two data center services showed that churn (e.g., dynamic reprovisioning of servers, change of link capacity, and microbursts of flows) has little impact on TCP goodput. VL2’s implementation of VLB splits flows evenly and VL2 achieves high TCP fairness. On all-to-all data shuffle communications, the prototype achieves an efficiency of 94% with a TCP fairness index of 0.995.
Acknowledgments
Insightful comments from David Andersen, Jon Crowcroft, and the anonymous reviewers greatly improved the final version of this paper.
Albert Greenberg, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, Sudipta Sengupta, Microsoft Research
James R. Hamilton, Amazon Web Services
© 2011 ACM 0001-0782/11/0300 $10.00
Barry University
Assistant Professor of Computer Science
The Department of Mathematics and Computer Science invites applications for a continuing contract faculty position in Computer Science with the rank of Assistant Professor, starting in Fall 2011.
Strong teaching skills, a commitment to research, and service to the Department and the University are expected. The Department has undergraduate majors in Mathematical Sciences, Computer Science and Computer Information Science. It also has a pre-Engineering program. The Department offers courses for the majors and service courses for all other Schools.
Qualifications:
The search is open to all areas of Computer Science, with a particular emphasis on candidates with research interests in software engineering, computer security or mobile computing.
A Ph.D. in Computer Science or closely related field is required.
Review of applications will start 02/14/2011 and will continue until the position is filled.
Interested and qualified candidates should send the following:
Complete curriculum vitae
Transcripts and Three letters of reference to:
CS Faculty Search Committee,
Dept. of MATH and CS, Barry University,
11300 NE 2nd Ave.,
Miami Shores, FL 33161.
Contact us by Fax (305)899-3610
By email to: mathcs@mail.barry.edu.
Barry University is a Catholic institution grounded in the liberal arts tradition and is committed to an inclusive community, social justice, and collaborative service.
Barry University is an Equal Employment Opportunity Employer.
Barry University does not discriminate applicants or employees for terms of employment on the basis of race, color, sex, religion, national origin, disability, veteran status, political affiliation or any other terms prohibited under the county ordinance, state or federal law.
NEC Laboratories America, Inc.
Research Staff Member - Machine Learning & Computer Vision
NEC Laboratories America, Inc. is a vibrant industrial research center, conducting research in support of NEC’s U.S. and global businesses. Our research program covers many areas, reflecting the breadth of NEC business, and maintains a balanced mix of fundamental and applied research.
The Media Analytics Department in Cupertino, CA, is seeking an outstanding and enthusiastic researcher, with background in machine learning and computer vision, to work on developing visual recognition technologies for novel mobile applications, web services, and HCI solutions. We expect the candidates to be strong in conducting cutting edge research, and also passionate about turning research into high impact products and services. We encourage researchers to establish leadership in the research community, and maintain active research collaborations with top universities in the US.
Required Skills or Experience:
• PhD in Computer Science (or equivalent)
• Strong publication in top machine learning or computer vision conferences & journals
• Solid knowledge in math, optimization, and statistical inference
• Hands-on experiences in implementing large-scale learning algorithms and systems
• Great problem solving skills, with a strong desire for quality and engineering excellence
• Expert knowledge developing and debugging in C/C++
Desired Skills a Plus:
• Good knowledge developing and debugging on Linux
• Good knowledge developing in Java
• Experience with scripting languages such as Python, PHP, Perl, and shell scripts
• Experience with parallel/distributed computing
• Experience with algorithm implementation on GPU
• Experience with mobile or embedded systems
• Experience with image classification, object recognition, and visual scene parsing
• Ability to work on other media data, like textual and audio data
For consideration please submit your resume and a one-page research statement at http://www.nec-labs.com/careers/index.php.
EOE/AA/MFDV
NEC Laboratories America, Inc.
Research Staff Member - Energy Management
NEC Laboratories America, Inc.’s research program covers many areas, reflecting the breadth of NEC business, and maintains a balanced mix of fundamental and applied research. We focus on topics with strong innovations in the U.S. and place emphasis on developing deep competence in selective areas that are important to NEC business and which are ripe for technical breakthrough.
The Energy Management Department in Cupertino, CA, is seeking an outstanding and enthusiastic researcher with background in energy systems modeling and optimization to work on design of energy micro-grids. Candidates are expected to be strong in conducting cutting edge research, and also passionate about leading research threads and turning research into high impact products and services. We encourage researchers to establish leadership in the research community, and maintain active research collaborations with top universities in the US.
Required Skills or Experience:
• PhD in ME/EE(Power Systems)/CS/OR (or equivalent)
• Solid knowledge in math, optimization, and statistical analysis
• Hands-on experiences in implementing energy system models
• Great problem solving skills, with a strong desire for quality and engineering excellence
• Expert knowledge of optimization theory and tools
• Working knowledge of power and energy systems
Desired Skills:
• Knowledge of thermodynamic principles
• Knowledge of GAMS or equivalent optimization tools
• Experience in power sector
For consideration, please visit our career center at http://www.nec-labs.com/careers/index.php to submit your resume and a research statement.
EOE/AA/MFDV
Reykjavik University
School of Computer Science
Faculty position in computer systems
The School of Computer Science at Reykjavik University seeks to hire a faculty member in the field of computer systems. We are looking for a highly-qualified academic who, apart from developing her/his research programme, is interested in working with existing faculty, and in bridging between research, in one or more of the research areas within the School, in particular artificial intelligence, software engineering and theoretical computer science.
The level of the position can range from assistant professor to full professor, depending on the qualifications of the applicant. For information on the position, how to apply and the School of Computer Science at Reykjavik University, see http://www.ru.is/faculty/luca/compsysjob.html
last byte
Puzzled
Solutions and Sources
Last month (February 2011, p. 112) we posted a trio of brainteasers, including
one as yet unsolved, concerning partitions of Ms. Feldman’s fifth-grade class.
Here, we offer solutions to at least two of them. How did you do?
1. Monday, Tuesday.
Solution. Recall that on Monday, Ms. Feldman partitioned her class into k subsets and on Tuesday repartitioned the same students into k+1 subsets. We were asked to show that at least two students were in smaller subsets on Tuesday than they were on Monday.
It turns out that a nice way to see this is to consider how much work each student contributed to his or her assigned project. Assume that all projects (both days) were equally demanding, with each requiring a total of one unit of effort. Assume, too, that the work was divided perfectly equitably, so a student in a subset of size m contributed 1/m units of effort. The total effort contributed on Monday was, of course, k and on Tuesday k+1, so some students must have contributed more of their effort on Tuesday than on Monday. But no individual student could have made up the full unit difference; the difference between 1/m and 1/n is less than 1 for any positive integers m and n. At least two students thus put in more effort on Tuesday and were therefore in smaller groups the second day.
This surprisingly tricky puzzle was brought to my attention by Ori Gurel-Gurevich of the University of British Columbia; it had appeared in the 1990 Australian Mathematics Olympiad.
2. Unfriendly Partitions.
Solution. A partition of the kind Ms. Feldman wants, into two subsets, such that no student has more than half his/her friends in that student’s own group, is called an “unfriendly partition.” To see that an unfriendly partition exists, consider, for any partition, the number of broken friendships; that is, the number of pairs of friendly students who have been separated. Now choose a partition that maximizes this number; it must be unfriendly. Why? Because if student X has more friends in his/her subset than in the other subset, moving X from one to the other subset would yield a partition with more broken friendships.
3. Countably Infinite Graphs.
Unsolved. On Friday, when the class suddenly has a countably infinite number of students, Ms. Feldman can no longer apply these arguments to show that an unfriendly partition exists. The difficulty is that now there may be no partition with the maximum number of broken friendships. Moreover, even if there is a partition that breaks infinitely many friendships, the argument fails because switching student X as in Solution 2 doesn’t give a contradiction.
Amazingly, no substitute argument has been found, nor has anyone come up with an example where no unfriendly partition exists. For (much) more information, see the marvelous article by Saharon Shelah and the late Eric Milner, “Graphs with No Unfriendly Partitions,” in A Tribute to Paul Erdős, A. Baker, B. Bollobás, and A. Hajnal, Eds., Cambridge University Press, 1990, 373–384. Shelah and Milner constructed an “uncountable” graph with no unfriendly partitions; they also showed that every graph has an unfriendly partition into three subsets. The case of countably infinite graphs, when seeking an unfriendly partition into two subsets, remains tantalizingly open (until, perhaps, you solve it).
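For a finite class, the exchange argument in Solution 2 can be turned into a small procedure: keep moving any student who has more friends in his or her own subset than in the other one. Each move strictly increases the number of broken friendships, which is bounded by the total number of friendships, so the process must stop, and it stops at an unfriendly partition. The Python sketch below illustrates this; it reaches a local maximum rather than the global maximum used in the proof, and the function names and the example friendship graph are hypothetical.

```python
def unfriendly_partition(friends):
    """Split students into subsets 0 and 1 so nobody has more than half
    of their friends in their own subset.

    `friends` maps each student to the set of that student's friends and
    is assumed to be symmetric (if A lists B, then B lists A).
    """
    side = {student: 0 for student in friends}
    improved = True
    while improved:
        improved = False
        for student, pals in friends.items():
            same = sum(1 for p in pals if side[p] == side[student])
            other = len(pals) - same
            if same > other:
                # Moving the student breaks `same` friendships and mends
                # `other`, so the broken count rises by same - other > 0.
                side[student] = 1 - side[student]
                improved = True
    return side


def is_unfriendly(friends, side):
    """Check that no student has more than half of their friends on their own side."""
    return all(
        2 * sum(1 for p in pals if side[p] == side[s]) <= len(pals)
        for s, pals in friends.items()
    )


if __name__ == "__main__":
    friends = {
        "A": {"B", "C", "D"},
        "B": {"A", "C"},
        "C": {"A", "B", "D"},
        "D": {"A", "C"},
    }
    side = unfriendly_partition(friends)
    print(side, is_unfriendly(friends, side))
```

The termination argument depends on the graph being finite, which is exactly what breaks down in the countably infinite case discussed in item 3.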
Peter Winkler (puzzled@cacm.acm.org) is Professor of Mathematics and of Computer Science and Albert Bradley
Third Century Professor in the Sciences at Dartmouth College, Hanover, NH.
All readers are encouraged to submit prospective puzzles for future columns to puzzled@cacm.acm.org.
Future Tense, one of the revolving features on this page, presents stories and
essays from the intersection of computational science and technological speculation,
their boundaries limited only by our ability to imagine what will and could be.
Future Tense
Catch Me If You Can
Or how to lose a billion in your spare time…
So I wrote some simple code and sent it along in my next ARPANet transmission. Just a few lines in Fortran told the computer to attach them to programs being transmitted to a particular terminal. Soon it popped up in other programs and began propagating. By the next day it was in a lot of
vented the idea into the 1980s, when a virus called Elk Cloner infected early Apple computers. It was fixed quickly, but Microsoft software proved more vulnerable, and in 1986 a virus called Brain started booting up with Microsoft’s disk operating system and spread through floppy disks, stimulat-
everything viruses of robust ability and myriad variations: Trojan horses, chameleons (acts friendly, turns nasty), software bombs (self-detonating agents, destroying without cloning themselves), logic bombs (go off given specific cues), time bombs (keyed by clock time), [continued on p. 111]