
COMMUNICATIONS OF THE ACM

CACM.ACM.ORG 03/2011 VOL.54 NO.3

Plug-and-Play
Macroscopes

Data Structures
in the Multicore Age
Fumbling the Future
The Informatics
Philharmonic
Testable System
Administration
Memristors:
Pass or Fail?

Association for
Computing Machinery
Advance Your Career with ACM Tech Packs…

For Serious
Computing Professionals.

Searching through technology books, magazines, and websites to find the authoritative information you need takes time. That’s why ACM created “Tech Packs.”

• Compiled by subject matter experts to help serious computing practitioners, managers, academics, and students understand today’s vital computing topics.
• Comprehensive annotated bibliographies: from ACM Digital Library articles, conference proceedings, and videos to ACM Learning Center Online Books and Courses to non-ACM books, tutorials, websites, and blogs.
• Peer-reviewed for relevance and usability by computing professionals, and updated periodically to ensure currency.

Current topics include Cloud Computing and Parallel Computing. In development are Gaming, Globalization/Localization, Mobility, Security, and Software as a Service (SaaS).

Suggestions for future Tech Packs? Contact:
Yan Timanovsky
ACM Education Manager
timanovsky@hq.acm.org

Access to ACM resources in Tech Packs is available to ACM members at http://techpack.acm.org or http://learning.acm.org.

Departments

Editor’s Letter
Fumbling the Future
By Moshe Y. Vardi

Letters To The Editor
Free Speech for Algorithms?

In the Virtual Extension

BLOG@CACM
Scientists, Engineers, and Computer Science; Industry and Research Groups
Mark Guzdial discusses what scientists and engineers should know about computer science, such as Alan Kay’s “Triple Whammy.” Greg Linden writes about industry’s different approaches to research and how to organize researchers in a company.

CACM Online
Time to Change
By David Roman

Calendar

Careers

Last Byte

Puzzled
Solutions and Sources
By Peter Winkler

Future Tense
Catch Me If You Can
Or how to lose a billion in your spare time…
By Gregory Benford

News

Grid Computing’s Future
Outreach programs and usability improvements are drawing many researchers to grid computing from disciplines that have not traditionally used such resources.
By Kirk L. Kroeker

Twitter as Medium and Message
Researchers are mining Twitter’s vast flow of data to measure public sentiment, follow political activity, and detect earthquakes and flu outbreaks.
By Neil Savage

Evaluating Government Funding
Presidential report asserts the value of U.S. government funding and specifies areas needing greater focus.
By Tom Geller

Memristors: Pass or Fail?
The device may revolutionize data storage, replacing flash memory and perhaps even disks. Whether they can be reliably and cheaply manufactured, though, is an open question.
By Gary Anthes

Gary Chapman, Technologist: 1952–2010
He raised important public issues, such as the impact of computers and the Internet on society, and encouraged social responsibility for computer professionals.
By Samuel Greengard

Viewpoints

Legally Speaking
Do You Own the Software You Buy?
Examining the fine print concerning your rights in your copies of purchased software.
By Pamela Samuelson

Computing Ethics
Surrounded by Machines
A chilling scenario portends a possible future.
By Kenneth D. Pimple

The Profession of IT
Managing Time
Professionals overwhelmed with information glut can find hope from new insights about time management.
By Peter J. Denning

Broadening Participation
A Program Greater than the Sum of Its Parts: The BPC Alliances
Changing the trajectory of participation in computing for students at various stages of development.
By Daryl E. Chubin and Roosevelt Y. Johnson

Viewpoint
Computer and Information Science and Engineering: One Discipline, Many Specialties
Mathematics is no longer the only foundation for computing and information research and education in academia.
By Marc Snir

Reaching Out to the Media: Become a Computer Science Ambassador
Why computer scientists should come out from “behind the scenes” more often and work with the media to draw public attention to their fundamental innovations.
By Frances Rosamond et al.

Association for Computing Machinery
Advancing Computing as a Science & Profession


Practice

Testable System Administration
Models of determinism are changing IT management.
By Mark Burgess

National Internet Defense—Small States on the Skirmish Line
Attacks in Estonia and Georgia highlight key vulnerabilities in national Internet infrastructure.
By Ross Stapleton-Gray and William Woodcock

B.Y.O.C (1,342 Times and Counting)
Why can’t we all use standard libraries for commonly needed algorithms?
By Poul-Henning Kamp

Articles’ development led by queue.acm.org

Contributed Articles

Plug-and-Play Macroscopes
Compose “dream tools” from continuously evolving bundles of software to make sense of complex scientific data sets.
By Katy Börner

Understanding Scam Victims: Seven Principles for Systems Security
Effective countermeasures depend on first understanding how users naturally fall victim to fraudsters.
By Frank Stajano and Paul Wilson

The Internet Electorate
The 2008 U.S. presidential election demonstrated the Internet is a major source of both political information and expression.
By R. Kelly Garrett and James N. Danziger

Governing Web 2.0
Grounding principles to get the most out of enterprise 2.0 investments.
By Steven De Hertogh, Stijn Viaene, and Guido Dedene

Review Articles

Data Structures in the Multicore Age
The advent of multicore processors as the standard computing platform will force major changes in software design.
By Nir Shavit

Research Highlights

Technical Perspective
Concerto for Violin and Markov Model
By Juan Bello, Yann LeCun, and Robert Rowe

The Informatics Philharmonic
By Christopher Raphael

Technical Perspective
VL2
By Jennifer Rexford

VL2: A Scalable and Flexible Data Center Network
By Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta

Illustration by Alex Williamson, Photograph by Collin Parker

About the Cover:
Artist/photographer Eric Fischer created the Geotaggers’ World Atlas by collecting geographical data from Flickr photos, revealing where people take pictures in major cities around the world. Each city was ranked on the density of photographs taken around its center; our cover image of New York City topped the list.



communications of the acm
Trusted insights for computing’s leading professionals.

Communications of the ACM is the leading monthly print and online magazine for the computing and information technology fields.
Communications is recognized as the most trusted and knowledgeable source of industry information for today’s computing professional.
Communications brings its readership in-depth coverage of emerging areas of computer science, new trends in information technology,
and practical applications. Industry leaders use Communications as a platform to present and debate various technology implications,
public policies, engineering challenges, and market trends. The prestige and unmatched reputation that Communications of the ACM
enjoys today is built upon a 50-year commitment to high-quality editorial content and a steadfast dedication to advancing the arts,
sciences, and applications of information technology.

ACM, the world’s largest educational and scientific computing society, delivers resources that advance computing as a science and profession. ACM provides the computing field’s premier Digital Library and serves its members and the computing profession with leading-edge publications, conferences, and career resources.

Executive Director and CEO
John White
Deputy Executive Director and COO
Patricia Ryan
Director, Office of Information Systems
Wayne Graves
Director, Office of Financial Services
Russell Harris
Director, Office of Membership
Lillian Israel
Director, Office of SIG Services
Donna Cappo
Director, Office of Publications
Bernard Rous
Director, Office of Group Publishing
Scott Delman

ACM Council
President
Alain Chesnais
Vice-President
Barbara G. Ryder
Secretary/Treasurer
Alexander L. Wolf
Past President
Wendy Hall
Chair, SGB Board
Vicki Hanson
Co-Chairs, Publications Board
Ronald Boisvert and Jack Davidson
Members-at-Large
Vinton G. Cerf; Carlo Ghezzi; Anthony Joseph; Mathai Joseph; Kelly Lyons; Mary Lou Soffa; Salil Vadhan
SGB Council Representatives
Joseph A. Konstan; G. Scott Owens; Douglas Terry

Publications Board
Co-Chairs
Ronald F. Boisvert; Jack Davidson
Board Members
Nikil Dutt; Carol Hutchins; Joseph A. Konstan; Ee-Peng Lim; Catherine McGeoch; M. Tamer Ozsu; Holly Rushmeier; Vincent Shen; Mary Lou Soffa

ACM U.S. Public Policy Office
Cameron Wilson, Director
1828 L Street, N.W., Suite 800
Washington, DC 20036 USA
T (202) 659-9711; F (202) 667-1066

Computer Science Teachers Association
Chris Stephenson
Executive Director
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA
T (800) 401-1799; F (541) 687-1840

Association for Computing Machinery (ACM)
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA
T (212) 869-7440; F (212) 869-0481

STAFF

Director of Group Publishing
Scott E. Delman
publisher@cacm.acm.org
Executive Editor
Diane Crawford
Managing Editor
Thomas E. Lambert
Senior Editor
Andrew Rosenbloom
Senior Editor/News
Jack Rosenberger
Web Editor
David Roman
Editorial Assistant
Zarina Strakhan
Rights and Permissions
Deborah Cotton

Art Director
Andrij Borys
Associate Art Director
Alicia Kubista
Assistant Art Directors
Mia Angelica Balaquiot
Brian Greenberg
Production Manager
Lynn D’Addesio
Director of Media Sales
Jennifer Ruzicka
Public Relations Coordinator
Virgina Gold
Publications Assistant
Emily Eng

Columnists
Alok Aggarwal; Phillip G. Armour; Martin Campbell-Kelly; Michael Cusumano; Peter J. Denning; Shane Greenstein; Mark Guzdial; Peter Harsha; Leah Hoffmann; Mari Sako; Pamela Samuelson; Gene Spafford; Cameron Wilson

Contact Points
Copyright permission
permissions@cacm.acm.org
Calendar items
calendar@cacm.acm.org
Change of address
acmcoa@cacm.acm.org
Letters to the Editor
letters@cacm.acm.org

Web Site
http://cacm.acm.org

Author Guidelines
http://cacm.acm.org/guidelines

Advertising
ACM Advertising Department
2 Penn Plaza, Suite 701, New York, NY 10121-0701
T (212) 869-7440
F (212) 869-0481
Director of Media Sales
Jennifer Ruzicka
jen.ruzicka@hq.acm.org
Media Kit acmmediasales@acm.org

Editorial Board

Editor-in-Chief
Moshe Y. Vardi
eic@cacm.acm.org

News
Co-chairs
Marc Najork and Prabhakar Raghavan
Board Members
Hsiao-Wuen Hon; Mei Kobayashi; William Pulleyblank; Rajeev Rastogi; Jeannette Wing

Viewpoints
Co-chairs
Susanne E. Hambrusch; John Leslie King; J Strother Moore
Board Members
P. Anandan; William Aspray; Stefan Bechtold; Judith Bishop; Stuart I. Feldman; Peter Freeman; Seymour Goodman; Shane Greenstein; Mark Guzdial; Richard Heeks; Rachelle Hollander; Richard Ladner; Susan Landau; Carlos Jose Pereira de Lucena; Beng Chin Ooi; Loren Terveen

Practice
Chair
Stephen Bourne
Board Members
Eric Allman; Charles Beeler; David J. Brown; Bryan Cantrill; Terry Coatta; Mark Compton; Stuart Feldman; Benjamin Fried; Pat Hanrahan; Marshall Kirk McKusick; George Neville-Neil; Theo Schlossnagle; Jim Waldo
The Practice section of the CACM Editorial Board also serves as the Editorial Board of .

Contributed Articles
Co-chairs
Al Aho and Georg Gottlob
Board Members
Yannis Bakos; Elisa Bertino; Gilles Brassard; Alan Bundy; Peter Buneman; Andrew Chien; Peter Druschel; Anja Feldmann; Blake Ives; James Larus; Igor Markov; Gail C. Murphy; Shree Nayar; Lionel M. Ni; Sriram Rajamani; Jennifer Rexford; Marie-Christine Rousset; Avi Rubin; Fred B. Schneider; Abigail Sellen; Ron Shamir; Marc Snir; Larry Snyder; Manuela Veloso; Michael Vitale; Wolfgang Wahlster; Andy Chi-Chih Yao; Willy Zwaenepoel

Research Highlights
Co-chairs
David A. Patterson and Stuart J. Russell
Board Members
Martin Abadi; Stuart K. Card; Jon Crowcroft; Shafi Goldwasser; Monika Henzinger; Maurice Herlihy; Dan Huttenlocher; Norm Jouppi; Andrew B. Kahng; Gregory Morrisett; Michael Reiter; Mendel Rosenblum; Ronitt Rubinfeld; David Salesin; Lawrence K. Saul; Guy Steele, Jr.; Madhu Sudan; Gerhard Weikum; Alexander L. Wolf; Margaret H. Wright

Web
Co-chairs
James Landay and Greg Linden
Board Members
Gene Golovchinsky; Marti Hearst; Jason I. Hong; Jeff Johnson; Wendy E. MacKay

ACM Copyright Notice
Copyright © 2011 by Association for Computing Machinery, Inc. (ACM). Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and full citation on the first page. Copyright for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, to republish, to post on servers, or to redistribute to lists, requires prior specific permission and/or fee. Request permission to publish from permissions@acm.org or fax (212) 869-0481.

For other copying of articles that carry a code at the bottom of the first or last page or screen display, copying is permitted provided that the per-copy fee indicated in the code is paid through the Copyright Clearance Center; www.copyright.com.

Subscriptions
An annual subscription cost is included in ACM member dues of $99 ($40 of which is allocated to a subscription to Communications); for students, cost is included in $42 dues ($20 of which is allocated to a Communications subscription). A nonmember annual subscription is $100.

ACM Media Advertising Policy
Communications of the ACM and other ACM Media publications accept advertising in both print and electronic formats. All advertising in ACM Media publications is at the discretion of ACM and is intended to provide financial support for the various activities and services for ACM members. Current Advertising Rates can be found by visiting http://www.acm-media.org or by contacting ACM Media Sales at (212) 626-0654.

Single Copies
Single copies of Communications of the ACM are available for purchase. Please contact acmhelp@acm.org.

Communications of the ACM (ISSN 0001-0782) is published monthly by ACM Media, 2 Penn Plaza, Suite 701, New York, NY 10121-0701. Periodicals postage paid at New York, NY 10001, and other mailing offices.

POSTMASTER
Please send address changes to
Communications of the ACM
2 Penn Plaza, Suite 701
New York, NY 10121-0701 USA

Printed in the U.S.A.

Please recycle this magazine.

editor’s letter

DOI:10.1145/1897852.1897853 Moshe Y. Vardi

Fumbling the Future


Fumbling the Future: How Xerox Invented, Then Ignored, the First Personal Computer is the title of a classic 1998 book by D.K. Smith and R.C. Alexander that tells the gripping story of how Xerox invented the personal-computing technology in the 1970s, and then “miscalculated and mishandled” the opportunity to fully exploit it. To “fumble the future” has since become a standard phrase in discussions of advanced technology and its commercialization.

Another example definitely worth watching is a recently discovered copy of a 1993 AT&T commercial (http://www.geekosystem.com/1993-att-video/), with a rather clear vision of the future, predicting what was then revolutionary technology, such as paying tolls without stopping and reading books on computers. As we know today, the future did not work out too well for AT&T; following the telecom crash of the early 2000s, the telecom giant had to sell itself in 2005 to its former spin-off, SBC Communications, which then took the name AT&T.

This editorial is a story of how I fumbled the future. It is not widely known, but I almost invented the World-Wide Web, twice (smiley face here).

In 1988 I was a research staff member at the IBM Almaden Research Center. In response to a concern that IBM was not innovative enough, a group of about 20 researchers, including me, from different IBM Research labs, was tasked with envisioning an exciting information-technology-enabled 21st-century future. The “CS Future Work Group” met several times over a period of 18 months and produced a “vision” titled “Global Multi-Media Information Utilities.” What did this somewhat clunky name refer to? To quote: “The utility is assumed to have a geographical coverage similar to that of today’s telephone system, over which it can deliver, on a flexible pay-as-you-go basis, the most varied multimedia information services, such as multimedia electronic messaging, broadcasting, multimedia reference and encyclopedia database querying, rich information services, teleconferences, online simulation, visualization, and exploration services.” In other words, while we did not predict Google, Facebook, or Wikipedia, we did describe, rather presciently I think, the World-Wide Web.

What happened with that exciting vision? Very little, I am afraid. Our report was issued as an IBM Research Report. It was deemed significant enough to be classified as “IBM Confidential,” which ensured that it was not widely read, and it had no visibility outside IBM. The publication of the report practically coincided with the start of IBM’s significant business difficulties in the early 1990s, and the corporation’s focus naturally shifted to its near-term future. In other words, we fumbled the future. (At my request, the report was recently declassified; see https://researcher.ibm.com/researcher/files/zurich-pj/vision1%20-%202010.pdf.)

In the early 1990s, the computing environment at IBM Research was shifting from mainframe-based to workstation-based. Users requested an interface that would integrate the two environments. I got involved, together with several other people, in the development of ARCWorld, a software tool with a graphical user interface that allowed a user to manage and manipulate files on multiple Internet-connected computer systems. Our focus was on file manipulation, rather than information display, but ARCWorld could have been described as a “file browser.” It was clear to all involved that ARCWorld was an innovative approach to wide-area information management, but it was difficult to see how it could fit within IBM’s product strategy at the time. Some feeble attempts at commercialization were not successful. We fumbled the future, again.

What is the moral of these reminiscences? My main lesson is that fumbling the future is very easy. I have done it myself! The future looks clear only in hindsight. It is rather easy to practically stare at it and not see it. It follows that those who did make the future happen deserve double and triple credit. They not only saw the future, but also trusted their vision to follow through, and translated vision to execution. We should all recognize the incredible contributions of those who did not fumble the future.

Moshe Y. Vardi, editor-in-chief



letters to the editor

DOI:10.1145/1897852.1897854

Free Speech for Algorithms?

In “Regulating the Information Gatekeepers” (Nov. 2010), Patrick Vogl and Michael Barrett said a counterargument against the regulation of search-engine bias is that “Search results are free speech and therefore cannot be regulated.” While I have no quarrel as to whether this claim is true, I’m astounded that anyone could seriously make such a counterargument, or any judge accept it.

Search results are the output of an algorithm. I was unaware the field of artificial intelligence had advanced to the point that we must now consider granting algorithms the right of free speech. To illustrate such absurdity, suppose I was clever enough to have devised an algorithm that could crawl the Web and produce opinionated articles, rather than search results, as its output. Would anyone seriously suggest the resulting articles be granted all the constitutional protections afforded the works of a human author? Taking the analogy further, suppose, too, my algorithm produced something equivalent to shouting “Fire!” in a crowded theater. Or, further still, perhaps it eventually produced something genuinely treasonous.

If we accept the idea that the output of an algorithm can be protected under the right of free speech, then we ought also to accept the idea that it is subject to the same limitations we place on truly unfettered free speech in a civilized society. But who would we go after when these limitations are exceeded? I may have created the algorithm, but I’m not responsible for the input it found that actually produced the offensive output. Who’s guilty? Me? The algorithm? (Put the algorithm on trial?) The machine that executed the algorithm? How about those responsible for the input that algorithmically produced the output?

Unless humans intervene to modify the output of algorithms producing search results, arguments involving search results and free speech are absurd. At least until artificial intelligence has advanced to where machines must indeed be granted the same rights we grant our fellow humans.
Roger Neate, Seattle, WA

Authors’ Response:
Neate touches a nerve concerning the increasingly complex relationship between humans and material technologies in society. Accountability in these socio-material settings is challenging for judge and regulator alike. In the 2003 case of SearchKing vs. Google Technology, a U.S. District Court noted the ambiguity of deciding whether PageRank is mechanical and objective or subjective, ruling that PageRank represents constitutionally protected opinions. Whether search results are indeed free speech remains controversial, meaning we can expect the debate to continue.
Patrick Vogl and Michael Barrett, Cambridge, U.K.

Science Has 1,000 Legs
It’s great to reflect on the foundations of science in Communications, as in Tony Hey’s comment “Science Has Four Legs” (Dec. 2010) and Moshe Y. Vardi’s Editor’s Letter “Science Has Only Two Legs” (Sept. 2010), but also how the philosophy of science sheds light on questions involving the number of legs in a natural science.

Willard Van Orman Quine’s 1951 paper “Two Dogmas of Empiricism” convincingly argued that the attempt to distinguish experiment from theory fails in modern science because every observation is so theory-laden; for example, as a result of a Large Hadron Collider experiment, scientists will not perceive, say, muons or other particles, but rather some visual input originating from the computer screen displaying experimental data. The interpretation of this perception depends on the validity of many nonempirical factors, including physics theories and methods.

With computation, even more factors are needed, including the correctness of hardware design and the validity of the software packages being used, as argued by Nick Barnes in his comment “Release the Code” (Dec. 2010) concerning Dennis McCafferty’s news story “Should Code Be Released?” (Oct. 2010).

For such a set of scientific assumptions, Thomas S. Kuhn coined the term “paradigm” in his 1962 book The Structure of Scientific Revolutions. Imre Lakatos later evolved the concept into the notion of “research program” in his 1970 paper “Falsification and the Methodology of Scientific Research Programs.”

In this light, neither the two-leg nor the four-leg hypothesis is convincing. Citing the leg metaphor at all, science is perhaps more accurately viewed as a millipede.
Wolf Siberski, Hannover, Germany

Certify Software Professionals and their Work
As a programmer for the past 40 years, I wholeheartedly support David L. Parnas’s Viewpoint “Risks of Undisciplined Development” (Oct. 2010) concerning the lack of discipline in programming projects. We could be sitting on a time bomb and should take immediate action to prevent potential catastrophic consequences of the carelessness of software professionals. I agree with Parnas that undisciplined software development must be curbed.

I began with structured programming and moved on to objects and now to Web programming and find that software is a mess today. When I travel on a plane, I hope its embedded software does not execute some untested loop in some exotic function never previously recognized or documented. When I conduct an online banking transaction, I likewise hope nothing goes wrong.

See the Web site “Software Horror Stories” (http://www.cs.tau.ac.il/~nachumd/horror.html) showing why the facts can no longer be ignored. Moreover, certification standards like CMMI do not work. I have been part of CMMI-certification drives and find that real software-development processes have no relation to what is ultimately certified. Software development in real life starts with ambiguous specifications. When a project is initiated and otherwise unrelated employees assembled into a team, the project manager creates a process template and fills it with virtual data for the quality-assurance review. But the actual development is an uncontrolled process, where programs are assembled from random collections of code available online, often taken verbatim from earlier projects.

Most software winds up with an unmanageable set of bugs, a scenario repeated in almost 80% of the projects I’ve seen. In them, software for dropped projects might be revived, fixed by a new generation of coders, and deployed in new computer systems and business applications ultimately delivered to everyday users.

Software developers must ensure their code puts no lives at risk and enforce a licensing program for all software developers. Proof of professional discipline and competency must be provided before they are allowed to write, modify, or patch any software to be used by the public.

As suggested by Parnas [1, 2], software should be viewed as a professional engineering discipline. Science is limited to creating and disseminating knowledge. When a task involves creating products for others, it becomes an engineering discipline and must be controlled, as it is in every other engineering profession. Therefore, software-coding standards should be included in penal codes and country laws, as in the ones that guide other engineering, as well as medical, professions. Moreover, software developers should be required to undergo periodic relicensing, perhaps every five or 10 years.
Basudeb Gupta, Kolkata, India

References
1. Parnas, D.L. Licensing software engineers in Canada. Commun. ACM 45, 11 (Nov. 2002), 96–98.
2. Parnas, D.L. Software engineering: An unconsummated marriage. Commun. ACM 40, 9 (Sept. 1997), 128.

Unicode Not So Unifying
Poul-Henning Kamp’s attack in “Sir, Please Step Away from the ASR-33!” on ASCII as the basis of modern programming languages was somewhat misplaced. While, as Kamp said, most operating systems support Unicode, a glance at the keyboard shows that users are stuck with an ASCII subset (or regional equivalent).

My dubious honor learning and using APL* while at university in the 1970s required a special “golf ball” and stick-on key labels for the IBM Selectric terminals supporting it. A vexing challenge in using the language was finding one of the many Greek or other special characters required to write even the simplest code.

Also, while Kamp mentioned Perl, he failed to mention that the regular expressions made popular by that language, employing many special characters as operators, are virtually unintelligible to all but the most diehard fans. The prospect of a programming language making extensive use of the Unicode character set is a frightening proposition.
William Hudson, Abingdon, U.K.

*APL stands for “A Programming Language,” so “the APL programming language” deconstructs as “the a programming language programming language.”

The Merchant Is Still Liable
In his Viewpoint “Why Isn’t Cyberspace More Secure?” (Nov. 2010), Joel F. Brenner said that in the U.K. the customer, not the bank, usually pays in cases of credit-card fraud. I would like to know the statistical basis for this claim, since for transactions conducted in cyberspace the situation in both the U.K. and the U.S. is that liability generally rests with the merchant, unless it provides proof of delivery or has used the 3-D Secure protocol to enable the card issuer to authenticate the customer directly. While the rates of uptake of the 3-D Secure authentication scheme may differ, I have difficulty believing that difference translates into a significant related difference in levels of consumer liability.

The process in the physical retail sector is quite different in the U.K. as a result of the EMV, or Europay, MasterCard, and VISA protocol, or “Chip & PIN,” though flaws in EMV and hardware mean, in practice, the onus is still on the bank to demonstrate its customer is at fault.
Alastair Houghton, Fareham, England

Author’s Response:
The U.K. Financial Services Authority took over regulation of this area November 1, 2009, because many found the situation, as I described it, objectionable. In practice, however, it is unclear whether the FSA’s jurisdiction has made much difference. While the burden of proof is now on the bank, one source (see Dark Reading, Apr. 26, 2010) reported that 37% of credit-card fraud victims get no refund. The practice in the U.S. is not necessarily better but is different.
Joel F. Brenner, Washington, D.C.

Format Migration or Unforgiving Obsolescence
David S.H. Rosenthal’s response (Jan. 2011) to Robin Williams’ comment “Interpreting Data 100 Years On” said he was unaware of a single format widely used that has actually become obsolete. Though I understand the sentiment, it brought to mind Apple’s switch from PowerPC to Intel architecture about six years ago. Upgrading the computers in my company in response to that switch required migrating all our current and legacy data to the new format used by Intel applications at the time. Though we didn’t have to do it straightaway, as we could have kept running our older hardware and software, we had no choice but to commence a process to migrate over time. This decision directly affected only my company, not the entire computing world, but when addressing data exchange and sharing, it was an additional factor we had to consider. Rather than face some general obsolescence, we may inevitably all have to address format obsolescence that is a natural consequence of IT’s historically unforgiving evolution.
Bob Jansen, Erskineville, NSW, Australia

Communications welcomes your opinion. To submit a Letter to the Editor, please limit your comments to 500 words or less and send to letters@cacm.acm.org.

© 2011 ACM 0001-0782/11/0300 $10.00



ACM, Advancing Computing as a Science and a Profession

Dear Colleague,

The power of computing technology continues to drive innovation to all corners of the globe, bringing with it opportunities for economic development and job growth. ACM is ideally positioned to help computing professionals worldwide stay competitive in this dynamic community.

ACM provides invaluable member benefits to help you advance your career and achieve success in your chosen specialty. Our international presence continues to expand and we have extended our online resources to serve needs that span all generations of computing practitioners, educators, researchers, and students.

ACM conferences, publications, educational efforts, recognition programs, digital resources, and diversity initiatives are defining the computing profession and empowering the computing professional.
ACM conferences, publications, educational efforts, recognition programs, digital resources, and diversity
initiatives areare
This year we defining the computing
launching professionlearning
Tech Packs, integrated and empowering
packages onthecurrent
computing professional.
technical topics created and
reviewed by expert ACM members. The Tech Pack core is an annotated bibliography of resources from the
This year weACM
renowned are launching Tech–Packs,
Digital Library integrated
articles learning
from journals, packages
magazines, on current
conference technical topics
proceedings, created
Special and
Interest
reviewed by expert ACM members. The Tech Pack core is an annotated bibliography of resources
Group newsletters, videos, etc. – and selections from our many online books and courses, as well an non- from the
renowned ACM where
ACM resources Digitalappropriate.
Library – articles from journals, magazines, conference proceedings, Special Interest
Group newsletters, videos, etc. – and selections from our many online books and courses, as well an non-
ACM resources where
BY BECOMING AN ACM appropriate.
MEMBER YOU RECEIVE:

Timely
BY accessAN
BECOMING toACMrelevant
MEMBER information
YOU RECEIVE:
Communications of the ACM magazine • ACM Tech Packs • TechNews email digest • Technical Interest Alerts and
Timely access• to
ACM Bulletins ACM relevant
journalsinformation
and magazines at member rates • full access to the acmqueue website for practi-
Communications
tioners • ACM SIG the ACM magazine
of conference discounts• ACM
• theTech PacksACM
optional • TechNews email digest • Technical Interest Alerts
Digital Library
and ACM Bulletins • ACM journals and magazines at member rates • full access to the acmqueue website for
practitioners
Resources that• ACM SIGenhance
will conference discounts
your career• and
the optional
follow youACM toDigital
newLibrary
positions
Career & Job Center • online books from Safari® featuring O’Reilly and Books24x7® • online courses in multiple
Resources
languages •that will
virtual enhance
labs your career
• e-mentoring servicesand follow you
• CareerNews emailtodigest
new positions
• access to ACM’s 34 Special Interest
Career
Groups&•Job Center • email
an acm.org The Learning
forwardingCenter • online
address withbooks
spamfrom Safari® featuring O’Reilly and Books24x7® •
filtering
online courses in multiple languages • virtual labs • e-mentoring services • CareerNews email digest • access to
ACM’s36
ACM’s worldwide network
Special Interest of more
Groups than
• an 97,000
acm.org members
email rangesaddress
forwarding from students to seasoned
with spam filtering professionals and
includes many renowned leaders in the field. ACM members get access to this network and the advantages that
come worldwide
ACM’s from their expertise
network of to more
keep you
thanat100,000
the forefront of the
members technology
ranges world. to seasoned professionals and
from students
includes many renowned leaders in the field. ACM members get access to this network and the advantages that
Pleasefrom
come taketheir
a moment
expertise to to
consider
keep youtheatvalue of an ACM
the forefront membership
of the your career and your future in the
technologyforworld.
dynamic computing profession.
Please take a moment to consider the value of an ACM membership for your career and your future in the
Sincerely,computing profession.
dynamic

Sincerely,

Alain Chesnais
President
Alain Chesnais
Association for Computing Machinery
President
Association for Computing Machinery

Advancing Computing as a Science & Profession


membership application & digital library order form
Priority Code: AD10

You can join ACM in several easy ways:

Online: http://www.acm.org/join
Phone: +1-800-342-6626 (US & Canada), +1-212-626-0500 (Global)
Fax: +1-212-944-1318
Or, complete this application and return with payment via postal mail

Special rates for residents of developing countries: http://www.acm.org/membership/L2-3/
Special rates for members of sister societies: http://www.acm.org/membership/dues.html

Please print clearly

Name
Address
City / State/Province / Postal code/Zip
Country / E-mail address
Area code & Daytime phone / Fax / Member number, if applicable

Purposes of ACM
ACM is dedicated to:
1) advancing the art, science, engineering, and application of information technology
2) fostering the open interchange of information to serve both professionals and the public
3) promoting the highest professional and ethics standards

I agree with the Purposes of ACM:
Signature

ACM Code of Ethics: http://www.acm.org/serving/ethics.html

choose one membership option:

PROFESSIONAL MEMBERSHIP:
o ACM Professional Membership: $99 USD
o ACM Professional Membership plus the ACM Digital Library: $198 USD ($99 dues + $99 DL)
o ACM Digital Library: $99 USD (must be an ACM member)

STUDENT MEMBERSHIP:
o ACM Student Membership: $19 USD
o ACM Student Membership plus the ACM Digital Library: $42 USD
o ACM Student Membership PLUS Print CACM Magazine: $42 USD
o ACM Student Membership w/Digital Library PLUS Print CACM Magazine: $62 USD

All new ACM members will receive an ACM membership card.
For more information, please visit us at www.acm.org

Professional membership dues include $40 toward a subscription to Communications of the ACM. Student membership dues include $15 toward a subscription to XRDS. Member dues, subscriptions, and optional contributions are tax-deductible under certain circumstances. Please consult with your tax advisor.

payment:
Payment must accompany application. If paying by check or money order, make payable to ACM, Inc. in US dollars or foreign currency at current exchange rate.

o Visa/MasterCard   o American Express   o Check/money order

o Professional Member Dues ($99 or $198) $ ______________________
o ACM Digital Library ($99) $ ______________________
o Student Member Dues ($19, $42, or $62) $ ______________________
Total Amount Due $ ______________________

Card #
Expiration date
Signature

RETURN COMPLETED APPLICATION TO:
Association for Computing Machinery, Inc.
General Post Office
P.O. Box 30777
New York, NY 10087-0777

Questions? E-mail us at acmhelp@acm.org
Or call +1-800-342-6626 to speak to a live representative

Satisfaction Guaranteed!


10 Years of Celebrating Diversity in Computing

2011 Richard Tapia


Celebration of Diversity in Computing Conference
April 3-5, 2011 http://tapiaconference.org/2011/
San Francisco, CA

Register now to attend the 2011 Richard Tapia Celebration of Diversity in Computing Conference and save $75. Advance registration rates are valid through Tuesday, March 8.

Since 2001, the Tapia Celebration of Diversity in Computing has served as a leading forum for bringing together students, professors and professionals to discuss and strengthen their passion and commitment to computing. The 2011 program will include stellar speakers who are exemplary leaders in academia and industry, such as:

• Irving Wladawsky-Berger, former chair of the IBM Academy of Engineering and the 2001 HENAAC Hispanic Engineer of the Year, will give the Ken Kennedy Memorial Lecture on “The Changing Nature of Research and Innovation in the 21st Century.”

• Deborah Estrin, the Jon Postel Professor of Computer Science at UCLA and a member of the National Academy of Engineering, will talk on “Participatory Sensing: from Ecosystems to Human Systems.”

• Alan Eustace, Senior Vice President of Engineering and Research at Google, will give an after dinner talk entitled “Organizing the World’s Information.”

• Ayanna Howard, Associate Professor in the ECE School at Georgia Tech who Technology Review selected as a 2003 Young Innovator, will give the talk “SnoMotes - Robotic Scientific Explorers for Understanding Climate Change.”

• Blaise Aguera y Arcas, Architect with Microsoft who was selected by Technology Review as a 2008 Young Innovator and was a celebrated speaker at the TED Technology Entertainment Design conference.

• Illya Hicks, Associate Professor in the Computational and Applied Mathematics Department at Rice University and recipient of the 2005 Optimization Prize for Young Researchers by the Optimization Society.

• John Kubiatowicz, Professor of Computer Science at the University of California, Berkeley, was selected by Scientific American in 2002 as one of 50 scientists for outstanding achievements in science and technology.

• Patty Lopez, Component Design Engineer with Intel, a winner of Hewlett Packard’s Technical Leadership Award in 2001, and co-founder of Latinas in Computing.

Based on surveys from past attendees, for Tapia Conference 2011 we’ve added new programs to connect students with computing professionals, thereby opening the door to future opportunities, and a special outing to take in the sights of San Francisco. We will continue past popular sessions, including the Student Poster Session, Town Hall Meeting, Banquet, and the Doctoral Consortium, a day-long program designed to help equip students for the grueling challenge of finishing their doctorate. There will also be Resume, Grad School and Early Career Advice Workshops and attendee-proposed BOFs and panels.

The Conference program, news and registration information can be found at:
http://tapiaconference.org/2011/

Tapia Conference 2011 supporters include:
Google and National Science Foundation (Platinum); Intel (Gold); Cisco, Microsoft and NetApp (Silver); IBM and Symantec (Bronze); Amazon, Freddie Mac, Lawrence Berkeley National Laboratory, Lawrence Livermore National Laboratory, National Center for Atmospheric Research, National Security Agency and SAP (Supporters).

The Tapia Conference 2011 is organized by the Coalition to Diversify Computing and is co-sponsored by the Association for Computing Machinery and the IEEE Computer Society, in cooperation with the Computing Research Association.
DOI:10.1145/1897852.1897855

In the Virtual Extension


To ensure the timely publication of articles, Communications created the Virtual Extension (VE)
to expand the page limitations of the print edition by bringing readers the same high-quality
articles in an online-only format. VE articles undergo the same rigorous review process as those
in the print edition and are accepted for publication on merit. The following synopses are from
articles now available in their entirety to ACM members via the Digital Library.

viewpoint
DOI: 10.1145/1897852.1897880

Reaching Out to the Media: Become a Computer Science Ambassador
Frances Rosamond et al.

Science communication or public outreach can be seen as taking a lot of time and effort compared to the perceived payoffs these types of initiatives provide. In effect, there’s a tragedy of the commons: we all benefit from those who do it, so there is incentive to let other people shoulder the load.

The rationale behind science communication is fairly obvious, and it is often difficult to provide compelling arguments that appeal to skeptics. Public outreach is related to the reputation of the scientific field, funding, and the integration of the science community into society. More locally and perhaps more relevantly, it is related to the reputation of your university and to the quality of your students.

Other sciences have established long-lasting traditions of transmitting their key issues, raising public awareness including highlights such as the Nobel Prize or the Fields Medal (however, rarely is the general public aware of the ACM A.M. Turing Award).

Computer science is not yet where it should be regarding public awareness. The reasons for this situation may lie in the relative youth of the area, the rapid advances in the field, as well as the fast-moving technology that computer science is related to. Computer scientists face the myriad drawbacks of lacking public awareness. They are confronted with low enrollment numbers and low funding, and to some extent, they feel ignored and misunderstood. The authors of this article provide suggestions for what can be pragmatically done to increase coverage of computer science in the media.

contributed article
DOI: 10.1145/1897852.1897881

The Internet Electorate
R. Kelly Garrett and James N. Danziger

The Internet was a prominent feature of the 2008 U.S. presidential election, regularly noted for its role in the Obama campaign’s successful fundraising and supporter-mobilization efforts and for its widespread use by interested voters. This article reports on a national telephone survey conducted in the weeks following that election to assess how Americans’ experience of elections was changing in response to the increasing availability and use of the digital communication network.

The Internet has long been heralded as an efficient means of acquiring political information, but the increasing presence of user-created content means the network is also becoming an important mode of political expression. The article examines these complementary roles, focusing on how Americans used the Internet to learn about the 2008 campaign, share political information, and voice their own opinions. Also considered is which individuals are most likely to engage in online information acquisition and expression, examining the influence of these practices on voters.

Analyses are based on the national random-digit dial telephone survey of 600 adult Americans conducted two weeks after the 2008 election (November 6–20), with a response rate of 26.2%.

In the lead-up to the U.S. election in 2008, nearly two-thirds (64%) of Americans got campaign news online, a marked increase from 2004, when only about one-quarter (27%) of Americans said they got campaign news online. Equally notable is the fact that in 2008 two-fifths (38%) of respondents reported seeking online campaign news almost every day.

contributed article
DOI: 10.1145/1897852.1897882

Governing Web 2.0
Steven DeHertogh, Stijn Viaene, and Guido Dedene

Web 2.0 applications aspire to make maximal use of the level playing field for engagement offered by the Internet, both technologically and socially. The World Wide Web has thereby entered “the realm of sociality,” where software becomes fused with everyday social life. This evolution has taken huge strides: Web 2.0 environments such as Wikipedia, Facebook, and MySpace have all become household names.

Both practitioners and researchers are converging on the usefulness of Web 2.0 for professional organizations. In and around enterprises, Web 2.0 platforms have been professed to support a profound change in intra- and inter-enterprise communication patterns. It is still early in terms of available management research on so-called “enterprise 2.0” experiences. Nevertheless, we have observed, as have others, that the way for organizations to capture benefits from Web 2.0 technology in the enterprise probably differs substantially from the way they attended to other enterprise information system projects in the past.

This article proposes a set of grounding principles to get the most out of enterprise 2.0 investments. The principles represent a synthesis of existing management theory and the authors’ own case research of companies with recent experience in introducing Web 2.0 into their enterprises. The successful introduction of Web 2.0 for the enterprise will require a move away from predesigned paternalistically imposed communication strategies and structures, toward carefully stimulating a many-to-many, decentralized emergence of bottom-up communicative connections.

The Communications Web site, http://cacm.acm.org,
features more than a dozen bloggers in the BLOG@CACM
community. In each issue of Communications, we’ll publish
selected posts or excerpts.

Follow us on Twitter at http://twitter.com/blogCACM

doi:10.1145/1897852.1897856 http://cacm.acm.org/blogs/blog-cacm

Scientists, Engineers, and Computer Science; Industry and Research Groups

Mark Guzdial discusses what scientists and engineers should know about computer science, such as Alan Kay’s “Triple Whammy.” Greg Linden writes about industry’s different approaches to research and how to organize researchers in a company.

Mark Guzdial
“What do Scientists and Engineers Need to Know About Computer Science?”
http://cacm.acm.org/blogs/blog-cacm/96699

A new effort at the Texas Advanced Computing Center is aimed at teaching scientists and engineers about supercomputing. They argue that “Anyone looking to do relevant computational work today in the sciences and engineering must have these skills.” They offer a certificate or portfolio in “Scientific Computation.”

Greg Wilson has been going after this same goal using a different strategy. He suggests that before we can teach scientists and engineers about high-performance computing, we first have to teach them about computing. He leads an effort called “Software Carpentry” to figure out what to teach scientists and engineers about computing:

I’ve been trying to construct a better answer for the past 13 years; Software Carpentry (http://software-carpentry.org/blog/) is what I’ve arrived at. It’s the 10% of software engineering (with a very small “e”) that scientists and engineers need to know before they tackle GPUs, clusters, and other fashionable Everests. Like sanitation and vaccination, the basic skills it teaches are cheap and effective; unfortunately, the other characteristic they share is that they’re not really what photo ops are made of. We’ve also found a lot of resistance based on survivor bias: all too often, senior scientists who have managed to get something to work on a supercomputer say, “Well, I didn’t need version control or unit testing or any of that guff, so why should my students waste their time on it?” Most scientists rightly regard computing as a tax they have to pay in order to get results.

The evidence is that the problem of teaching everyone else about computer science is bigger than teaching computer science majors about computer science. Chris Scaffidi, Mary Shaw, and Brad Myers have estimated that, by 2012, there will be about three million professional software developers in the U.S., but there will also be about 13 million end-user programmers, people who program as part of their work but do not primarily develop software. This result suggests that for every student in your computer science classes, there are four more students who could use some help in learning computer science. Those scientists and engineers who will be programming one day are in those other four.

Brian Dorn and I have a paper, “Learning on the Job: Characterizing the Programming Knowledge and Learning Strategies of Web Designers,” in the 2010 ACM International Computing Education Research workshop on Brian’s work studying graphic designers who program. Brian finds that these end-user programmers don’t know a lot about computer science, and that lack of knowledge hurts them. He finds that they mostly learn to program through Google. In his most recent work, he is finding that not knowing much about computer science means that they’re inefficient at searching. When they see “try-catch” in a piece of code that they’re trying to understand, they don’t know to look up “exception handling,” and they can easily spend hours reading about Java exception handling when they are actually working in JavaScript.

Maybe we should be teaching scientists and engineers about computer science more generally. But as Greg Wilson points out, they don’t want much: they see computer science as a “tax.” What’s the core of computer science that even scientists and engineers ought to know? Alan Kay recently suggested a “Triple Whammy” (http://computinged.wordpress.com/2010/05/24/the-core-of-computer-science-alan-kays-triple-whammy/) defining the core of computer science:

1. Matter can be made to remember, discriminate, decide, and do.
2. Matter can remember descriptions and interpret and act on them.
3. Matter can hold and interpret and act on descriptions that describe anything that matter can do.

That’s a pretty powerful set. It goes way beyond Python vs. Java, or using Perl to check genome sequences with regular expressions vs. using MATLAB for analyzing data from ecological simulations. How do we frame the Triple Whammy in a way that fledgling scientists and engineers would find valuable and learnable?

Reader’s comment
The worrying trend I see is that many computer engineering graduates are interested in learning only a large set of programming languages, but dislike courses like algorithm design, not realizing that these languages are merely tools for implementing solutions. The end result is what you could call technicians but not engineers.
Farhan Ahmad

Greg Linden
“Research in the Wild: Making Research Work in Industry”
http://cacm.acm.org/blogs/blog-cacm/97467

How to do research in academia is well established. You get grants to fund your group, attract students, publish papers, and pump out Ph.D.s. Depending on who you ask and how cynical they have become, the goals are some combination of impacting the field, educating students, and personal aggrandizement.

Research in industry is less established. How to organize is not clear. The purpose is not even well understood. The business strategy behind forming a research group sometimes seems to be little more than a variant of the Underpants Gnomes’ plan in South Park. Phase 1: Hire Ph.D.s. Phase 2: ? Phase 3: Profit!

Generally, researchers in industry are supposed to yield some combination of long-term innovation, improving the sophistication of technology and products beyond simple and obvious solutions, and helping to attract talented and enthusiastic developers.

To take one example in search, without researchers who know the latest work, it would be hard for a company to build the thousands of classifiers that ferret out subtleties of query intent, document meaning, and spamminess, all of which are needed for a high-quality search experience. Information retrieval is a field that benefits from a long history of past work, and researchers often are the ones that know the history and how to stand on giants’ shoulders.

Even so, there are many in industry that consider researchers an expensive luxury that their company can ill afford. Part of this comes from the historically common organizational structure of having a separate and independent research lab, which sometimes looks to be a gilded ivory tower to those who feel they are locked outside.

The separate research lab is the traditional structure, but a problematic one, not only for the perception of the group by the rest of the company, but also because the researchers can be so far removed from the company’s products as to have little ability to make an impact. Many companies appear to be trying other ways of organizing researchers into the company.

For example, Google is well known for integrating many of its researchers into product groups and shifting them among product groups, working side-by-side with different development teams. While on a particular project, a researcher might focus on the part of the problem that requires esoteric knowledge of particular algorithms, but they are exposed to and work on many problems in the product. When this group comes together, everyone shares knowledge, and then people move to another group, sharing their knowledge again. Moreover, these ephemeral teams get people to know people, yielding valuable peer networks. When a tough research problem later comes up and no one nearby knows how to solve it, finding the person in the company who can solve it becomes much easier.

Many other companies, including Microsoft, Facebook, and Twitter, maintain separate research organizations, but try to keep the researchers working very closely with the product teams. At these companies, the impetus for novel research often is a problem in the product, usually a problem that would not be obvious in academia because of their lack of access to big data and scale.

What organizational structure works best in industry may depend on your goals. For immediate impact, having researchers integrated into product groups provides a lot of value; they are directly solving today’s hard problems. But what about the problem that might hit in a year or two? And what about long-term breakthroughs, entirely new products, enabled by new technology no one has thought of yet?

My personal opinion leans mostly toward integrating researchers on projects, much like Google does, but also giving researchers 20% time (as all developers should get) and occasionally turning a 20% time project into a full project (again, as all developers should get, but the threshold for what is considered impactful might differ for a researcher, given the speculative gamble that is the nature of research). This strikes a balance between immediate impact, doing novel research, and taking advantage of a long-term opportunity when inspiration hits.

What do you think? How should researchers be organized in companies? Why?

Reader’s comment
Research as a process and a profession and as a mind set is quite different than product-making. Pushing the two too close together or expecting people to be good at both may not always be optimal. See my “Research as product” post on the FXPAL Blog.
Gene Golovchinsky

Mark Guzdial is a professor at the Georgia Institute of Technology. Greg Linden is the founder of Geeky Ventures.

© 2011 ACM 0001-0782/11/0300 $10.00

cacm online

DOI:10.1145/1897852.1897857 David Roman

Time to Change

The list of add-on features for Communications’ Web site began to take form shortly after the site was launched two years ago, and was a starting point for revisions now under way. Over the course of the last several months, suggestions to enhance the site were solicited and explored and are now spiriting the changes that will lead to a streamlined and improved site later this year.

Always a work in progress, Communications’ site has changed in some ways since launch, most visibly with the addition of author-generated videos on the homepage. Some backend changes have also taken place. The upcoming changes are intended to remove elements that some users find crowded or confusing.

Communications’ 2010 Readership Survey, conducted by Harvey Research Inc., points out some of the site’s favored features. Not surprisingly, the “Current Issue” is the most widely used feature, cited by 57.8% of survey respondents. The magazine archive, cited by 40.3%, ranks second. The News section ranks third, at 32.3%.

The readership survey also shows there is work to be done, particularly to encourage more frequent and longer visits to the site.

Participants in the survey were asked for suggestions to improve the site. It is impossible to implement all suggestions, not for lack of time, interest, or resources, but because of the wide variety of opinions expressed. For example, one 13-year veteran of ACM reads articles online only and asked ACM to “dispense with the print edition,” while a 17-year member said the opposite. “I sit at the computer all day for work. I don’t want to sit at it to read magazines.” To accommodate such diverse preferences, both print and online formats will continue for the foreseeable future (http://cacm.acm.org/magazines/2011/2/104384).

Guiding these site plans are Communications’ savvy Web Board, ACM volunteers, and HQ staffers, plus site testers and developers. All have a voice in the revised site. Plans are not finalized, but some upcoming changes will affect search functionality, social media features, and the addition of other ACM content. We look forward to announcing the next iteration of Communications’ Web site in coming months.

ACM Member News

David Harel Elected to Israel Academy of Sciences and Humanities

David Harel, the William Sussman Professor of Computer Science and Applied Mathematics at the Weizmann Institute of Science, was inducted last December into the Israel Academy of Sciences and Humanities. He is the first Academy member to be inducted as a computer scientist. Several computer scientists, including Noga Alon, Amir Pnueli, Michael Rabin, and Adi Shamir, had been previously inducted into the Academy, but as mathematicians. In the past Harel has worked in several areas of theoretical computer science, but in recent years his research has focused on areas such as software and systems engineering, visual languages, and the modeling and analysis of biological systems, and taken a more practical direction. At the induction ceremony, Harel says, the Academy’s head of natural sciences made a point of telling Harel that he was “happy to have me in because of my practical work.”

In January, Harel was honored by the A.M.N. Foundation for the Advancement of Science, Art, and Culture in Israel, which named him one of seven recipients of the $1 million EMET Prize. Harel was recognized for his studies on a “wide variety of topics within computer science, for his results that are at the forefront of scientific research, and for his achievements that have become a standard and working tools in many industries around the world.”

Harel suggests his selection by the Academy may bring greater recognition to computer science and “make it easier to get people in on this ticket.” But “far more important to me,” Harel says, is that Israel’s three A.M. Turing Award winners (Pnueli, Rabin, and Shamir) were previously elected into the Academy. “This makes me extremely proud and humbled by my own election. Great company indeed!”
David Lindley

news

Science | doi:10.1145/1897852.1897858 Kirk L. Kroeker

Grid Computing’s Future

Outreach programs and usability improvements are drawing many researchers to grid computing from disciplines that have not traditionally used such resources.

In recent years, several powerful research grids consisting of thousands of computing nodes, dozens of data centers, and massive amounts of bandwidth have emerged, but few of these grids have received much attention in the mainstream media. Unlike seti@home, folding@home, and other highly focused grid projects that have captured the popular imagination by allowing home users to donate compute cycles, the big research grids are not accessible to the public and their fame does not extend far beyond the researchers who use them. Outreach teams and usability engineers at the largest of these new grids, such as Naregi, Egee, and TeraGrid, are trying to change that reality by helping to facilitate the adoption of grid technologies in fields that have not traditionally used grid-based supercomputing resources.

[Figure: A grid-based computer simulation of the gravitational waves produced as two black holes merge with each other to form a larger black hole. Scientific visualization by Werner Benger, AEI/ZIB/LSU/UIBK.]

TeraGrid, said to be the world’s largest distributed network infrastructure for open scientific research, is one such network that has quietly been making waves in research communities outside computer science, and is helping to solve complex problems in biology, physics, medicine, and numerous other fields. TeraGrid consists of 11 data-center sites located around the U.S. and is tied to a 10-gigabyte backbone that connects primary network facilities in Los Angeles, Denver, and Chicago. At maximum capacity, TeraGrid can produce more than a petaflop of computing power and store more than 30 petabytes of data. The project, started by the National Science Foundation in August 2001, has grown in the past five years from fewer than 1,000 active users in 2005 to nearly 5,000 active users working on almost 1,600 projects at the end of 2009.

Matthew Heinzel, director of TeraGrid’s grid infrastructure group, says that TeraGrid’s outreach teams have done an excellent job drawing researchers from fields outside computer science. To obtain CPU time on the grid, scientists simply submit a request that describes the work to be done; extensive CPU-time requests are subject

ma r c h 2 0 1 1 | vo l . 5 4 | n o. 3 | c o mmu n i c at i o n s o f t he acm 15
news

code. To accommodate those who do


not have the skills needed to translate
Computers in their problems into compatible code,
TeraGrid has allocated funding to em-
Scientific Research bed dedicated supercomputer special-
ists in research teams for up to one year.
In 2009, Greg Wilson published details of a research project designed not only to Heinzel says that these specialists, who
determine how scientists learn what they know about using computers and how widely
they share their software, but also to gauge the ratios of scientists who use desktop write and refine project-specific code,
systems versus local clusters and grid-computing resources. Conventional wisdom are among TeraGrid’s most valuable
suggests that as clusters and grids become more sophisticated and increasingly user resources, and represent a large part of
friendly, more scientists will use them instead of less powerful desktop PCs. The
the infrastructure group’s budget.
findings Wilson and his colleagues produced run counter to that thinking.
While hard numbers indicate that more scientists each year are using grid-based “You can’t just put a big piece of
resources to conduct their research, Wilson’s project indicates that amount of work is hardware on the floor and say, ‘OK
relatively small compared to the vast number of projects still being run on the desktop. guys, here you go,’ ” says Heinzel. “You
Some 81% of those participating in the research said they primarily use desktop
machines, with only 13% saying they use intermediate-sized machines such as local need somebody with the skills to help
clusters, and 6% saying they use supercomputers for their research. people not only run the code but also
Wilson, until recently a computer science professor at the University of Toronto improve the code.”
and now at work full time on a course designed to address what he perceives as a
computer-skills shortfall among scientists, says these figures might very well change in
the years ahead. “If we’re lucky, though, most scientists won’t notice,” he says. “Behind Modeling Black Holes
the scenes, an increasing number of data services will migrate upward, but the vast One researcher who has conducted ex-
majority of scientists will reach them through desktop interfaces, because that’s all tensive work on TeraGrid is Erik Sch-
they’ll have the skills for.”
That trend might already be materializing. Wilson says that in the year since the
netter, an assistant research professor
details of his research were published, he has witnessed an increased interest in in the department of physics and as-
commodity clouds, such as Amazon’s Elastic Compute Cloud. “The course Titus Brown tronomy at Louisiana State University.
just taught at Michigan State [http://bit.ly/aE5Qpg] would have been a lot harder on His research has modeled black holes,
everyone without pay-and-play,” says Wilson, who predicts that more researchers will
begin using such systems, which can be run independently from supercomputing-style neutron stars, and other highly com-
applications. plex astrophysical objects. The most
As for the efforts to make grids easier to use and to draw researchers from fields that interesting of these objects, he says,
don’t typically train scientists in supercomputer programming, Wilson says these efforts
will fail outright unless scientists first invest in basic computing skills. “Asking someone
are gamma-ray bursts, which are bright
who doesn’t know how to modularize or test a program to use clouds and GPUs is as bursts of high-energy photons that are
sensible as taking someone who’s just learning how to drive a family car and putting visible from Earth and are said to be
them behind the wheel of a transport truck on the highway at rush hour,” he says. generated by the most energetic events
The best solution, according to Wilson, is to teach scientists some basic
computational skills so they can tackle the challenges posed by advanced resources in the universe. “It turns out that these
without expending heroic effort, which is the primary goal of the project Wilson is bursts emanate from billions of light
working on now, called Software Carpentry. As for future research to explore ways of years away, essentially at the other end
assessing the impact of computers on scientists’ productivity, Wilson wasn’t able to of the universe,” says Schnetter. “The
find funding. “This is one of the reasons I left academia and returned to industry,” he
says. “As in health care, it seems that most people are more interested in pushing new fact that they are still so brightly visible
things to market than in finding out what works and what doesn’t.” —Kirk L. Kroeker here means that they must come from
truly tremendous explosions.”
The mechanism that creates these
to a quarterly peer-review process. Sur- Laboratory, extend beyond mere com- explosions, the source of the energy,
prisingly, molecular biosciences, phys- pute cycles and bandwidth. As more is not completely understood. After
ics, astronomical sciences, chemistry, users from outside computer science decades of research, the astrophys-
and materials research top the list of are drawn to TeraGrid to run complex ics community found that one model,
disciplines whose researchers are us- computational problems, one ongo- called the collapsar model, might help
ing the grid. “In some cases, we are vic- ing and key challenge is usability for explain the gamma-ray bursts. “What
tims of our own success,” says Heinzel. those who do not have the necessary we do in one of our projects is model
“We’ve extended our user base well be- computer science skills. (The sidebar stars that form a supernova, then form
yond our goals.” But the downside to “Computers in Scientific Research,” a black hole at their center, and then
this success, he says, is that TeraGrid above, describes one project designed we study how the remaining material
is being asked for far more CPU cycles to address this skills shortfall.) behaves,” says Schnetter. “These are
than it can provide. TeraGrid’s resources, coordinated very complex systems, and modeling
Still, Heinzel says that having more by UNIX-based Globus grid software, them is a large task.”
demand than capacity is a positive are integrated through a service-orient- The computer code used to calcu-
force that generates an impetus for the ed architecture, with all systems run- late these models not only is complex,
network to grow regularly. But the chal- ning a common user interface. While but also requires significant compu-
lenges facing Heinzel and TeraGrid’s the interface includes job-submission tational power. Schnetter’s group has
infrastructure group, which is run and data-management tools, the net- performed simulations on local work-
through the University of Chicago in work requires problems to be trans- stations and clusters, but he says that
partnership with the Argonne National lated into specific types of grid-ready any kind of production work that has a

16 comm unications of th e ac m | ma r c h 2 0 1 1 | vo l . 5 4 | n o. 3
news

high level of accuracy requires systems dimensional flow fields and other Stanford’s Naiman offers a simi-
that are too large for a single university. data, such as time histories for cloud lar observation. He says that while his
For example, the nodes processing Sch- ice mass. Naiman says that although work on the grid has been positive, the
netter’s modeling code require com- the TeraGrid data is still undergoing usability of the technology could be im-
munication with other nodes several analysis, it is likely to help improve proved. For his part, TeraGrid’s Heinzel
times per second and a data-exchange understanding of the development says he remains optimistic about the
rate of about one gigabyte per second. of contrails and their environmental accessibility of grids, and predicts that
“To finish a simulation in a reason- impact. “TeraGrid performs as adver- major usability improvements are on
able time, we need to use hundreds tised, providing us with CPU hours the way. He likens the evolutionary
or thousands of cores,” he says. “That that we would not have had access to pace of these grid developments to
means we need to split the problem otherwise,” he says. “We also take ad- how the Web quickly emerged from the
into many pieces, and we need to en- vantage of the large archival storage Internet and now requires little more
sure that each of these pieces remains available on TeraGrid to ensure that than a browser and a basic knowledge
as independent from the others as important data is backed up.” of hyperlinks. “If you know exactly how
possible.” By using this modular tech- to use grid tools, they work effectively,”
nique, Schnetter’s team can replace or Improving Grid Software says Heinzel. “Now we need to make
exchange grid code if it becomes nec- As for the future of research on grid them more user-friendly so we can get
essary to apply new physics or use a dif- networks, TeraGrid’s Heinzel says he a wider audience.”
ferent hardware architecture. remains optimistic, but points out that In the future envisioned by Heinzel,
Another project run on TeraGrid is improvements must be made in grid grids will be manipulated easily by
an effort to understand the environ- software not only to enhance ease of computer scientists while still provid-
mental impact of aviation. Conducted use for researchers such as Schnetter ing friendly interfaces for researchers
out of Stanford University by doctoral and Naiman, but also to take complete coming from other fields. Rather than
candidate Alexander Naiman and advantage of new generations of hard- predicting that the arrival of such tech-
overseen by Sanjiva Lele, a professor ware. “You have to be almost a systems nologies will take decades or more,
in the department of aeronautics and admin to set your parameters on data Heinzel says that much progress will
astronautics, the project models con- movement correctly so you can take be made in the next few years alone.
densation trails, the ice clouds formed full advantage of these systems,” says “We’re going to see some big improve-
by aircraft emissions. Naiman, whose Heinzel. “So the software really needs ments in the usability of the grid and
research group specializes in compu- to mature.” grid software in the next two to four
tational fluid dynamics and turbulence Echoing these concerns, LSU’s years,” he says. “Future systems will be
simulations, says the difficulty of con- Schnetter points out that his research very user-friendly with a high degree
trail modeling becomes increasingly groups consist of people with widely of abstracting the inner workings of
acute as the complexity of the model varying degrees of supercomputer expe- what’s going on from the end users.”
increases. “The more complex the rience. “Teaching everybody how to use
flow, the higher resolution required to the different systems, and staying on
Further Reading
simulate it, and the more resources are top of what works best on what system,
needed,” he says. and which parameters need to be tuned Ferreira, L., Lucchese, F., Yasuda, T., Lee, C.Y.,
Queiroz, C.A., Minetto, E., and Mungioli, A.S.R.
While Stanford has local supercom- in what way to achieve the best perfor-
Grid Computing in Research And Education.
puting resources, they are in high de- mance, is like herding cats,” he says. IBM Redbooks, Armonk, NY, 2005.
mand. “TeraGrid provides relatively “There are almost no GUIs for super-
Magoulès, F.
large and modern supercomputing re- computers, and most of the ones that Fundamentals of Grid Computing: Theory,
sources to projects like ours that have exist are really bad, so that using them Algorithms, and Technologies. Chapman &
no other supercomputing support,” he requires some arcane knowledge.” Hall, Boca Raton, FL, 2009.
says. The simulation code that Naiman Schnetter says he hopes that grid- Neeman, H., Severini, H., Wu, D.,
and his team run on TeraGrid was based supercomputing will have a and Kantardjieff, K.
written at the Center for Turbulence much larger influence on the curricu- Teaching high performance computing via
Research at Stanford. Naiman says it lum than it does today, especially with videoconferencing, ACM Inroads 1, 1, March
2010.
was easy to get that code running on so few universities teaching scientific
TeraGrid. The research group paral- programming at the level required to Scavo, T., and Welch, V.
A grid authorization model for science
lelized the program, a type of large effectively use grid resources. “The gateways, International Workshop on Grid
eddy simulation, using standard mes- good students in my group learned Computing Environments 2007, Reno, NV,
sage-passing interface strategies that programming by themselves, on the Nov. 11, 2007.
Naiman says have been highly scalable side, because they were interested,” Wong, J. (Ed.)
on TeraGrid. he says. Still, Schnetter suggests that Grid Computing Research Progress. Nova
The contrail modeling project is such self-taught programming might Science Publishers, Hauppauge, NY, 2008.
ongoing, but so far the Stanford team not be sustainable in a world in which
has simulated the first 20 minutes of computers are becoming increasingly Based in Los Angeles, Kirk L. Kroeker is a freelance
editor and writer specializing in science and technology.
contrail development for several sce- complex. “I hope that this changes in
narios, producing terabytes of three- the next decade,” he says. © 2011 ACM 0001-0782/11/0300 $10.00
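
Schnetter's point about splitting a problem into many pieces that stay as independent as possible is commonly called domain decomposition. The sketch below is only an illustration of that idea, not his group's code: a hypothetical one-dimensional smoothing update stands in for the physics, the pieces are advanced in a serial loop rather than on separate nodes, and the names `step`, `split`, `step_decomposed`, and `flatten` are assumptions made for this example. Each piece needs just one boundary value from each neighbor per step, which is the kind of small, regular exchange the article describes between nodes.

```python
def step(u):
    """Apply one smoothing update to interior cells; the two end cells stay fixed."""
    if len(u) < 3:
        return list(u)  # no interior cells to update
    interior = [u[i] + 0.25 * (u[i - 1] - 2.0 * u[i] + u[i + 1])
                for i in range(1, len(u) - 1)]
    return [u[0]] + interior + [u[-1]]


def split(u, parts):
    """Cut the grid into `parts` contiguous chunks of nearly equal size."""
    if parts < 1 or parts > len(u):
        raise ValueError("parts must be between 1 and len(u)")
    base, extra = divmod(len(u), parts)
    chunks, start = [], 0
    for p in range(parts):
        size = base + (1 if p < extra else 0)
        chunks.append(list(u[start:start + size]))
        start += size
    return chunks


def step_decomposed(chunks):
    """Advance every chunk one step using only one halo value per neighbor."""
    updated_chunks = []
    for p, chunk in enumerate(chunks):
        has_left = p > 0
        has_right = p < len(chunks) - 1
        # Pad the chunk with the neighbors' boundary cells (the "halo").
        padded = list(chunk)
        if has_left:
            padded.insert(0, chunks[p - 1][-1])
        if has_right:
            padded.append(chunks[p + 1][0])
        updated = step(padded)
        # Drop the halo cells again; they belong to the neighbors.
        lo = 1 if has_left else 0
        hi = len(updated) - (1 if has_right else 0)
        updated_chunks.append(updated[lo:hi])
    return updated_chunks


def flatten(chunks):
    """Join the chunks back into a single grid."""
    return [x for chunk in chunks for x in chunk]


if __name__ == "__main__":
    grid = [0.0] * 16
    grid[8] = 100.0  # a single hot spot, purely illustrative
    whole = grid
    pieces = split(grid, 4)
    for _ in range(10):
        whole = step(whole)
        pieces = step_decomposed(pieces)
    # Largest difference between the undivided and the decomposed run.
    print(max(abs(a - b) for a, b in zip(whole, flatten(pieces))))
```

Because every chunk touches only its own cells plus two halo values, the update rule inside `step` could be swapped for different physics without changing how the pieces are split or exchanged, which mirrors the modularity the article attributes to this technique.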


Society | doi:10.1145/1897852.1897860 Neil Savage

Twitter as Medium
and Message
Researchers are mining Twitter’s vast flow of data to measure public sentiment,
follow political activity, and detect earthquakes and flu outbreaks.

Twitter generates a lot of noise. One hundred sixty million users send upward of 90 million messages per day, 140-character musings—studded with misspellings, slang, and abbreviations—on what they had for lunch, the current episode of “Glee,” or a video of a monkey petting a porcupine that you just have to watch.

Individually, these tweets range from the inane to the arresting. But taken together, they open a surprising window onto the moods, thoughts, and activities of society at large. Researchers are finding they can measure public sentiment, follow political activity, even spot earthquakes and flu outbreaks, just by running the chatter through algorithms that search for particular words and pinpoint message origins.

“Social media give us an opportunity we didn’t have until now to track what everybody is saying about everything,” says Filippo Menczer, associate director of the Center for Complex Networks and Systems Research at Indiana University. “It’s amazing.”

[Figure: Truthy shows how a tweet propagates, with retweets in blue and topic mentions in orange. Tweets that are sent back and forth between two Twitter accounts appear as a thick blue bar. Image courtesy of truthy.indiana.edu.]

The results can be surprisingly accurate. Aron Culotta, assistant professor of computer science at Southeastern Louisiana University, found that tracking a few flu-related keywords allowed him to predict future flu outbreaks. He used a simple keyword search to look at 500 million messages sent from September 2009 to May 2010. Just finding the word “flu” produced an 84% correlation with statistics collected by the U.S. Centers for Disease Control and Prevention (CDC). Adding a few other words, like “have” and “headache” increased the agreement to 95%.

The CDC’s counts of what it terms influenza-like illness are based on doctors’ reports of specific symptoms in their patients, so they’re probably a more accurate measure of actual illness than somebody tweeting “home sick with flu.” But it can take a week or two for the CDC to collect the data and disseminate the information, by which time the disease has almost certainly spread. Twitter reports, though less precise, are available in real time, and cost a lot less to collect. They could draw health officials’ attention to an outbreak in its earlier stages. “We’re certainly not recommending that the CDC stop tracking the flu the way they do it now,” Culotta says. “It would be nice to use this as a first-pass alarm.”

Google Flu Trends does something similar. One potential point in Twitter’s favor is that a tweet contains more words, and therefore more clues to meaning, than the three or four words of a typical search engine query. And training algorithms to classify messages—filtering out the tweets that talk about flu shots or Bieber Fever—improves the accuracy further.

There are other physical phenomena where Twitter can be an add-on to existing monitoring methods. Air Twitter, a project at Washington University in St. Louis, looks for comments and photos about events like fires and dust storms as a way to get early indications about air quality. And the U.S. Geological Survey (USGS) has explored using Twitter messages as a supplement to its network of seismographic monitors that alert the federal agency when an earthquake occurs. Paul Earle, a seismologist at the USGS, is responsible for quickly getting out information about seismic activity. He has searched for spikes in keywords—“OMG, earthquake!” is a popular tweet—for a quick alert of an event. “It’s another piece of information in the seconds and minutes when things are just unfolding,” Earle says. “It comes in earlier. Some or most of its value is replaced as we get more detailed or science-derived information.”

Earle says Twitter might help weed out the occasional false alarm from automated equipment, when no tweets follow an alert. The content of tweets might also supplement Web-based forms that collect people’s experiences of an earthquake and are used to map the event’s intensity, a more subjective measure of impact that includes factors such as building damage. A recent earthquake in Indonesia, for instance, produced a spike of tweets—in Indonesian. There’s no Web form in that language for intensity, but Earle says Indonesian tweets might help fill in the blanks.

Technology
IBM’s 2015 Predictions

IBM recently unveiled its fifth annual “Next Five in Five,” a list of technology innovations the company says have the potential to change how people work, live, and play over the next five years. The IBM predictions are:
• 3D interfaces will let people interact via 3D holograms in real time. As 3D and holographic cameras become more sophisticated and miniaturized to fit into mobile phones, users will be able to interact with photos, surf the Web, and chat in novel ways.
• Scientific advances in transistors and battery technology will allow devices to last about 10 times longer than they do now. Instead of today’s heavy lithium-ion batteries, scientists are working on batteries that use air to react with energy-dense metal.
• Sensors in phones, cars, and other objects will collect data to give scientists a real-time picture of the environment. IBM recently patented a technique that enables a system to accurately conduct post-event analysis of seismic events, as well as provide early warnings for tsunamis.
• Advanced analytics technologies will provide personalized recommendations that get commuters where they need to go in the fastest time. Using new mathematical models and IBM’s predictive analytics technologies, researchers will analyze and combine multiple possible scenarios that can deliver the best routes for daily travel.
• Innovations in computers and data centers are enabling the excessive heat and energy that they give off to help heat buildings in the winter and power air conditioning in the summer. With new technologies, such as novel on-chip water-cooling systems, the thermal energy from processors can be efficiently recycled to provide hot water for an office or home.
The 2015 predictions are based on emerging technologies from IBM’s labs around the world as well as market and societal trends.
—Bob Violino

Sentiment Analysis

Many researchers are doing sentiment analysis of tweets. Using tools from psychology, such as the Affective Norms for English Words, which rates the emotional value of many words, Alan Mislove tracked national moods, and found that Americans tend to be a lot happier on Sunday morning than Thursday evening, and that West Coast residents seem happier than those on the East Coast.

“I think this is going to be one of the most important datasets of this era, because we are looking at what people are talking about in real time at the scale of an entire society,” says Mislove, an assistant professor of computer science at Northeastern University. He says there’s no easy way to validate those results, but as a proof-of-concept it shows the sorts of information that might be derived from the Twitter data stream, even without taking additional steps to filter out false positives. “There’s just simply so much data that you can do pretty decently, even by taking naïve approaches,” says Mislove.

This type of inquiry, of course, has limitations. Researchers readily admit that Twitter data is noisy, and it’s not always simple to know what a word means—in some parlances, “sick” is a good thing. But with hundreds of millions of messages, the errors tend to shrink. Another worry is hysteria; people worried about swine flu might tweet about it more, leading others to worry and tweet (or retweet), so there’s a spike in mentions without any increase in actual cases.

There’s also sample bias; certain segments of the population use Twitter more than others. But researchers seeking to glean insights from tweets can apply corrections to the sample, just as traditional pollsters do. And as a wider variety of people send more tweets, the bias is reduced. “The more data you have, the closer you get to a true representation of what the underlying population is,” says Noah Smith, an assistant professor of computer science at Carnegie Mellon University.

Smith is examining how Twitter can supplement more familiar polling. One advantage is that pollsters can influence the answers they get by the way they phrase a question; people are fairly consistent, for example, in being more supportive of “gay marriage” than of “homosexual marriage,” just because of the word choice. Studying tweets, which people send out of their own accord, removes that problem. “We’re not actually talking to anyone. We’re not asking a question. We’re just taking found data,” Smith says. “We can get a much larger population of people participating passively.”

Indeed, he says, Twitter data may help researchers answer all sorts of sociological questions that are otherwise hard to approach, because polling enough subjects is too expensive and time-consuming using traditional methods. For instance, Smith says, a researcher might study how linguistic patterns correlate to socioeconomic status, and perhaps learn something about communication patterns among people in different demographic groups. That, in turn, could reveal something about their access to information, jobs, or government services.

Of course, the power of widespread, unfiltered information invites the possibility of abuse. Two researchers at Wellesley University, Panagiotis Metaxas and Eni Mustafaraj, found that during a special election in Massachusetts for U.S. Senate, the Democratic candidate, Martha Coakley, was the subject of a “Twitter bomb” attack. A conservative group in Iowa, the American Future Fund, sent out 929 tweets in just over two hours with a link to a Web site that attacked Coakley. The researchers estimate the messages could have been seen by more than 60,000 people before being shut down as spam.

Indiana’s Menczer developed a tool to distinguish between organized partisan spamming and grass-roots activism. He calls it Truthy, from comedian Stephen Colbert’s coinage describing a statement that sounds factual but isn’t. Starting with a list of keywords that includes all candidates, parties, and campaign names, the system detects what he calls memes, messages about a specific topic or candidate. It then displays a graphic representation of how each meme propagates, with retweets in blue and mentions of the topic in orange. If someone sets up two accounts and repeatedly sends the same tweets back and forth between them—an effort to show up in Twitter’s popular “trending topics” list—it appears as a thick blue bar. Networks of automated tweets pushing a particular meme show up as regular, orange starbursts. More natural propagations look like fuzzy dandelions. In some cases, the tweets carry links to Web sites with questionable claims or even strident propaganda. Others turn out to be pitching a product.

The patterns, along with information about when each Twitter account was created and whether the owner is known, allow voters to distinguish actual political dialogue from organized attacks. Menczer hopes to add sentiment analysis to analyze the content of the messages as well as their dispersal patterns.

At Xerox Palo Alto Research Center, Research Manager Ed H. Chi is also looking at message propagation. “Twitter is kind of this perfect laboratory for understanding how information spreads,” Chi says. Such a study can improve theoretical models of information dispersal, and also give people and businesses better strategies for delivering their messages or managing their reputations.

Much Twitter-based research is still preliminary. Some findings can be validated through other sources, such as CDC statistics or public opinion polls, but others remain unproven. Still, the scientists are excited at the prospects of what they might find by mining such a large, raw stream of data. “As Twitter and other social media grow, you’ll be able to ask much more fine-grained questions,” says Smith. “If you think about our society as being a big organism, this is just another tool to look inside of it.”

Further Reading

Chen, J., Nairn, R., Nelson, L., and Chi, E. H.
Short and tweet: experiments on recommending content from information streams, ACM Conference on Human Factors in Computing Systems, Atlanta, GA, April 10–15, 2010.

Culotta, A.
Detecting influenza outbreaks by analyzing Twitter messages, KDD Workshop on Social Media Analytics, Washington, D.C., July 25, 2010.

Earle, P., Guy, M., Buckmaster, R., Ostrum, C., Horvath, S., and Vaughan, A.
OMG earthquake! Can Twitter improve earthquake response? Seismological Research Letters 81, 2, March/April 2010.

Metaxas, P.T. and Mustafaraj, E.
From obscurity to prominence in minutes: political speech and real-time search, Web Science Conference, Raleigh, NC, April 26–27, 2010.

O’Connor, B., Balasubramanyan, R., Routledge, B., and Smith, N.
From Tweets to polls: linking text sentiment to public opinion time series, Proceedings of the International AAAI Conference on Weblogs and Social Media, Washington, D.C., May 23–26, 2010.

Neil Savage is a science and technology writer based in Lowell, MA.

© 2011 ACM 0001-0782/11/0300 $10.00
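
The keyword approach attributed to Culotta, counting how often flu-related words appear and comparing that series with CDC figures, can be illustrated with a short sketch. This is not his method or data: the tokenization is deliberately naive, the function names `keyword_rate` and `pearson` are assumptions made for this example, and the weekly tweets and CDC-style rates in the usage section are invented purely to show the mechanics.

```python
import re
from math import sqrt


def keyword_rate(messages, keywords):
    """Return the fraction of messages containing at least one keyword."""
    if not messages:
        return 0.0
    wanted = {k.lower() for k in keywords}
    hits = 0
    for text in messages:
        # Naive tokenization: lowercase runs of letters and apostrophes.
        words = set(re.findall(r"[a-z']+", text.lower()))
        if words & wanted:
            hits += 1
    return hits / len(messages)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented example data: three "weeks" of tweets and made-up illness rates.
    weekly_tweets = [
        ["home sick with flu", "great lunch today", "watching Glee"],
        ["flu season again", "have the flu and a headache", "new phone"],
        ["monkey video", "traffic is awful", "coffee time"],
    ]
    weekly_illness_rate = [1.8, 3.1, 0.9]  # hypothetical values, not CDC data
    rates = [keyword_rate(week, ["flu"]) for week in weekly_tweets]
    print(pearson(rates, weekly_illness_rate))
```

A matcher this simple also counts tweets about flu shots or "Bieber Fever"-style figures of speech, which is why the article notes that training a classifier to filter such messages improves accuracy further.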

Milestones

AAAS Fellows
In December, the American Association for the Advancement of Science (AAAS) elected 503 members as Fellows in recognition of their meritorious efforts to advance science or its applications. Election as an AAAS Fellow is an honor bestowed upon members by their peers.

Of the new Fellows, 16 members were selected for the Section on Information, Computing, and Communication. They are: Srinivas Aluru, Iowa State University; Victor Bahl, Microsoft Research; David R. Boggs, Consulting Electrical Engineer; Geoffrey Charles Bowker, University of Pittsburgh; John M. Carroll, Pennsylvania State University; J. J. Garcia-Luna-Aceves, University of California, Santa Cruz/Palo Alto Research Center; Venu Govindaraju, University at Buffalo State, The University of New York; Hamid Jafarkhani, University of California, Irvine; Farnam Jahanian, University of Michigan; Phokion G. Kolaitis, University of California, Santa Cruz; C. C. Jay Kuo, University of Southern California; Dinesh Manocha, University of North Carolina, Chapel Hill; Hanan Samet, University of Maryland; Abraham Silberschatz, Yale University; Manuela M. Veloso, Carnegie Mellon University; and Barry Wessler, Wessler Consulting.

The new Fellows were honored at the Fellows Forum held in February during the AAAS Annual Meeting in Washington, D.C.


Research and Development | doi:10.1145/1897852.1897861 Tom Geller

Evaluating
Government Funding
Presidential report asserts the value of U.S. government
funding and specifies areas needing greater focus.

President Obama enjoys a lighthearted moment with members of the President’s Council of Advisors on Science and Technology during a meeting at the White House.
Official White House photo by Pete Souza

Is computer science rightly part of public education? How much does the U.S. government spend on basic networking and IT research? Should industry provide that funding instead? How important is supercomputing?

These are some of the questions addressed by a 148-page report released by the President’s Council of Advisors on Science and Technology (PCAST) last December. Titled Designing a Digital Future: Federally Funded Research and Development in Networking and Information Technology, the report looked at U.S. investments in the cross-agency Networking and Information Technology Research and Development (NITRD) program, currently totaling approximately $4.3 billion per year.

Among other points, the council called for affirmation of computer science as a part of education in science, technology, engineering, and math; increased investment in the areas of privacy, human-computer interaction, massive data stores, and physical instrumentation such as sensors and robotics; long-term, multi-agency NIT initiatives for health, energy, transportation, and security; better coordination among agencies by the Office of Science and Technology Policy and the National Science and Technology Council; and a standing committee to provide ongoing strategic perspectives.

The report also warned against single-minded performance metrics when evaluating high-performance computing (HPC) projects, a subject made timely by Top500’s most-recent ranking of the world’s fastest supercomputers, which appeared five weeks before PCAST released its report. The Chinese-built Tianhe-1A supercomputer topped that list, bumping U.S.-made computers from the lead spot for the first time in six years. The report stated that “comparative rankings of the world’s fastest supercomputers” are “relevant to only some of our national priorities,” and said that they shouldn’t “‘crowd out’ the fundamental research in computer science and engineering that will be required to develop truly transformational next-generation HPC systems.”

PCAST called for an increase of $1 billion in funding for “new, potentially transformative NIT research” and recommended more specific accounting to separate basic NIT research from infrastructure costs. The report’s Working Group Co-chair and University of Washington Professor Ed Lazowska was quick to point out that the money was used in ways that are “appropriate and important” (for example, large-scale genome databases), although not “pushing the forefront of NIT.”

Independent technology reviews of this sort were mandated under the High-Performance Computing Act of 1991. The previous review, published in 2007, “found many of the same issues” according to that report’s co-chair, Microsoft Corporate Vice President Daniel Reed. But he did note some changes of focus, such as the marked increase in data. “We’ve gone from a world where data was rare and precious to where we’re drowning in it,” Reed says.

The report also documented NIT’s importance to U.S. competitiveness and the payback for NIT investment. An example of high payback was given at the report’s public release by Akamai founder Tom Leighton. He related a story of U.S. Defense Advanced Research Projects Agency funding he received in the 1990s to study “highly mathematical and highly theoretical subjects ... the living example of high-risk research.” When the research was finished, Internet companies weren’t interested in the results, even for free.

“So we started a company called Akamai Technologies,” Leighton says. “We [now] carry over a third of Web traffic ... and are probably paying over a $100 million in taxes this year. But it wasn’t the kind of research that companies fund.”

Tom Geller is an Oberlin, OH-based science, technology, and business writer.

© 2011 ACM 0001-0782/11/0300 $10.00

Technology | doi:10.1145/1897852.1897859 Gary Anthes

Memristors:
Pass or Fail?
The device may revolutionize data storage, replacing flash memory
and perhaps even disks. Whether they can be reliably and cheaply
manufactured, though, is an open question.

An image of a circuit with 17 memristors captured by an atomic force microscope at Hewlett-Packard’s Information and Quantum Systems Lab.
Image by R. Stanley Williams, HP Senior Fellow, Quantum Science Research, HP Labs

A fundamental electronic device, whose existence was postulated five decades ago but which proved hard to understand, let alone build, is ready to emerge from the lab, corporate and university researchers say. If so, the memristor (or memory resistor), as it is called, may arrive just in time to save the information storage industry from the transistor’s collision with the scaling wall at the end of Moore’s Law.

Hewlett-Packard announced last August that it would team with the South Korean computer memory maker Hynix Semiconductor to develop memristor-based memory chips, called resistive RAM (ReRAM), which they say will be on the market in about three years. The companies say their titanium-based chips could replace flash memory, which has become nearly ubiquitous in mobile applications, and would be 10 times faster and 10 times more energy efficient. Meanwhile, Rice University has joined with Austin, TX-based PrivaTran, a semiconductor design company specializing in custom integrated systems, to develop an all-silicon ReRAM chip that could be a substitute for flash memory. But a senior research official at Intel says it is far from certain that either effort will succeed.

A memristor is a tiny two-terminal electronic component that can be made from a variety of materials (including polymers, metal oxides, and conventional semiconductors like silicon) whose resistance varies with the voltage applied across it and with the length of time the voltage is applied. Its initial applications are likely to be as binary memory devices, but it could work in an analog fashion and could eventually become the basis for certain types of logic circuits. “That [logic ability] could change the standard paradigm of computing, by enabling computation to one day be performed in chips where data is stored, rather than on a specialized CPU,” says Gilberto Medeiros Ribeiro, a senior scientist at HP Labs.

The memristor has several qualities that make it attractive for memory chips. First, it is nonvolatile, so that it remembers its state after electrical current is switched off. Second, it can be scaled to a single nanometer (nm) in size, researchers believe, whereas the one-bit flash memory cell is expected to reach its scaling limit at about 20 nm. And Leon Chua, a professor of electrical engineering and computer science at the University of California, Berkeley, says the memristor’s size advantage isn’t its sole advantage. “You can not only build them smaller, but use fewer of them,” he says. “Ten memristors might do the same thing as 50 transistors, so it’s a new ball game.”

In 1971, Chua published a paper, “Memristor–The Missing Circuit Element,” in IEEE Transactions on Circuit Theory, which outlined the mathematical underpinnings of memristors, which he called the fourth fundamental building block of electronics (along with resistors, capacitors, and inductors). The existence of memristance had been reported earlier (in 1960 by Bernard Widrow at Stanford University, for example), but it was not well understood.

Earlier researchers had erroneously
interpreted memristance as a hysteresis relationship (one in which effect lags cause) between voltage and current, when in fact it is based on flux and charge, the time integrals of voltage and current, says Chua. He likens the pre-1971 view of memristance to Aristotle’s belief that force is proportional to velocity and not, as Newton correctly demonstrated 2,000 years later, as proportional to the change in velocity, or acceleration.

In 2006, HP designed and built a titanium memristor that worked predictably and retained its state when powered off, based on the mathematical framework proposed by Chua. “For years people built [memristance] devices almost by accident. It’s to the great credit of HP that they finally figured it out,” he says. Figuring it out, according to HP’s Stanley Williams, the chief architect of the company’s memristor, meant “understanding the mathematical framework for memristors.”

Almost 40 years seems a long time between the emergence of Chua’s framework and the ability to reliably produce memristors, but enormous engineering hurdles had to be overcome. It required methods and tools, such as scanning tunneling microscopy, that could work at atomic scales. HP says it experimented with an enormous number of device types, many based on exotic materials and structures, but the results were often inconsistent and unexplainable. It was not until 2006 that HP developed equations that explained just what was occurring in its titanium memristors.

More Speed, Less Power
The breakthrough achieved by HP in 2006 could revolutionize memory technology, the company says. “Memristor memory chips promise to run at least 10 times faster and use 10 times less power than an equivalent flash memory chip,” according to Williams, director of HP’s Information and Quantum Systems Lab. “Experiments in our lab also suggest that memristor memory can be erased and written over many more times than flash memory. We believe we can create memristor ReRAM products that, at any price point, will have twice the capacity of flash memory.”

However, not everyone has been impressed by the recent announcements from HP and Rice. “The memristor is only one of several interesting [recent] flash technologies, and by no means the most interesting,” says Justin Rattner, Intel’s chief technology officer. “Any time someone hypes a particular memory technology before building a large memory chip, you should be suspicious, very suspicious. It’s one thing to demonstrate a storage device in the lab, but it’s an entirely different thing to demonstrate it can be built in high volume at low cost and with exceptional reliability.”

Rattner acknowledges that flash memory, which is a $20 billion-plus market, is rapidly approaching its scaling limit. But rather than memristors, Intel is concentrating on nonvolatile phase-change memory, by which certain types of glass can be made to switch between two states by the application of heat. In the amorphous state, the atomic structure of the glass is highly disordered and has high resistivity. But when switched to its crystalline state, the glass has a regular atomic structure and low resistivity. “We built commercial-grade, phase-change memories of sufficient size to fully understand the pros and cons of the technology in a high-volume environment,” Rattner says. Intel is looking at additional novel approaches to nonvolatile memories such as the spin torque transfer memory, which exploits magnetic spin states to electrically change the magnetic orientation of a material.

Society
Women and Tenure
Over the last couple of decades, women have played an increasingly important role in the research sciences. However, according to Keeping Women in the Science Pipeline, a recent study by University of California, Berkeley researchers, there’s trouble brewing. Women, who now receive more than 50% of the Ph.D.s granted by institutions, are more likely to leave the profession than men. This, combined with growing demand for talent in Europe and Asia, puts U.S. preeminence in the sciences at risk.

The authors, who collected data from multiple sources and surveyed 62 academic institutions, found considerable differences in men and women attaining tenure track positions. Married women with young children are 35% less likely than their male counterparts to enter a tenure-track position after receiving a Ph.D. in science. What’s more, married women with children are 27% less likely than men with children to receive tenure after entering a tenure-track job in the sciences. On the other hand, single women without young children are about as successful as married men with children in attaining tenure-track jobs.

According to the report, both men and women view tenure-track positions in research-intensive universities as less than a family-friendly career choice. Only 46% of the men and 29% of the women rate their institutions “somewhat” or “very” family friendly. Numerous work hours and maternity leave of less than six weeks were cited as common problems.

The upshot? The academic world needs to adopt more family-friendly policies and provide greater opportunities for female tenure-track candidates. “America’s researchers do not receive enough family-responsive benefits,” the report concludes. “Academia needs to be more flexible… research universities should look to build a family-friendly package of policies and resources.”
Samuel Greengard

The memristor prototype chip built at Rice is a 1-kilobyte ReRAM with sub-5 nm switches, according to Jim Tour, a synthetic organic chemist at Rice and a leading memristor researcher. Of Rattner’s concern about manufacturing the devices, he says, “All so far looks good (materials cost, fabrication needs, scalability and switching times) except the switch voltage is a bit higher than we’d like, but we have some ideas to reduce that.”

Rice claims to have an edge over HP’s silicon-and-titanium memristor chip with its all-silicon model. “There are lots of engineering barriers to be overcome before this really takes off,” says Doug Natelson, a professor of physics and astronomy at Rice. “But the use of all silicon makes the manufacturing very understandable.”

Memristance comes from reduction-oxidation chemistry, in which atoms or molecules gain or lose their affinity for oxygen atoms, and in which the physical structure of materials can change. The Rice memristor chip, a thin layer of silicon oxide sandwiched between two electrodes, is made to convert back and forth between silicon (a conductor) and silicon oxide (an insulator). A sufficiently large voltage (up to 13 volts) applied across the silicon oxide converts some of it into pure silicon nanocrystals that conduct current through the layer. The switch, according to Natelson, shows robust nonvolatile properties, a high ratio of current “on” to current “off” (>10^5), fast switching (sub-100 ns), and good endurance (10^4 write-erase cycles).

The HP version is conceptually similar, but works by the alternating oxidation and reduction of titanium. Titanium dioxide (TiO2) is a semiconductor and is highly resistive in its pure state. However, oxygen-deficient TiO2, which has oxygen “vacancies” where an oxygen atom would normally appear, is highly conductive. By applying a bias voltage across a thin film of semiconductor with oxygen-deficient TiO2 on one side, the oxygen vacancies move into the pure TiO2 on the other side of the semiconductor, thus lowering the resistance. Running current in the other direction will move the oxygen vacancies back to the other side, increasing the resistance of the TiO2 again.

In the short term, memristors are most likely to be used in storage devices, but eventually may be used in artificial neural networks, in applications such as pattern recognition or real-time analysis of the signals from sensor arrays, in a way that mimics the human brain. A memristor works like a biological synapse, with its conductance varying with experience, or with the current flowing through it over time. Similarly, the brain learns and configures itself by varying the strength of synaptic connections between neurons. The ability of memristors to remember and to work as analog devices allows them to assume any of many values over a range, just as synapses do.

“The memristor self-learns from experience, and the brain is made of memristors,” Chua says. “That in the long run is much more interesting and important [than data storage], because it’s how you can design intelligent machines. But it’s in the next 50 years, not the next 10.”

Further Reading

Chua, L.
Memristor–the missing circuit element, IEEE Transactions on Circuit Theory 18, 5, Sept. 1971.

Jo, S.H., Chang, T., Ebong, I., Bhadviya, B.B., Mazumder, P., and Lu, W.
Nanoscale memristor device as synapse in neuromorphic systems, Nano Letters 10, 4, March 1, 2010.

Strukov, D.B., Snider, G.S., Stewart, D.R., and Williams, R.S.
The missing memristor found, Nature 453, May 1, 2008.

Tour, J.M. and He, T.
Electronics: the fourth element, Nature 453, May 1, 2008.

Yao, J., Sun, Z., Zhong, L., Natelson, D., and Tour, J.M.
Resistive switches and memories from silicon oxide, Nano Letters 10, 10, Aug. 31, 2010.

Gary Anthes is a technology writer and editor based in Arlington, VA.

Technology
Flexible Screens
Hewlett-Packard plans to deliver a prototype of a solar-powered, lightweight device with a flexible plastic screen (which HP researchers are affectionately terming “a Dick Tracy wristwatch”) to the U.S. Army later this year. Roughly the size of an index card, the low-power device will enable soldiers to read digital maps, directions, and other data on a screen that won’t break or shatter like glass.

Researchers expect the HP prototype to inspire a new generation of products with flexible plastic screens, including clothing, household furnishings, and more. “You can start thinking about putting electronic displays on things where you wouldn’t ordinarily think of having them,” said Nicholas Colaneri, director of the Flexible Display Center at Arizona State University, in an interview with The San Jose Mercury News. “How about a stack of thin displays that I can peel off and stick on things, sort of like a pad of Post-it notes?”

HP hopes to produce flexible displays that can be produced continuously, like newspapers rolling off a printing press, which could be more inexpensive than the current batch production of glass displays.

The flexible display and e-reader market is expected to grow dramatically from $431 million in 2009 to $9.8 billion in 2018, according to DisplaySearch.
Graeme Stemp-Morlock

In Memoriam | doi:10.1145/1897852.1897862 Samuel Greengard

Gary Chapman,
Technologist: 1952–2010
He raised important public issues, such as the impact
of computers and the Internet on society, and encouraged
social responsibility for computer professionals.

Photograph by Sasha Haagensen

Gary Chapman, an American technologist, Internet expert, and ethicist, died Dec. 14, 2010, at the age of 58. He suffered a massive heart attack while on a kayaking trip in Guatemala.

Over the last two decades, Chapman established himself as an international authority on Internet and technology policy. He was among the first technologists to draw public attention to the issues that computing technology, including the Internet, presents to society. Chapman helped to insert ethics and human values into the world of computing by focusing on a mélange of issues, including how to address the digital divide in society, preventing the misuse of technology by government agencies, especially the military, and encouraging young people to use the Internet responsibly.

At the time of his death, Chapman was a senior lecturer at the LBJ School of Public Affairs at the University of Texas at Austin. He began teaching at the school in 1994. He also served as director of the 21st Century Project, a research and education resource for policymakers and the public.

In addition, Chapman lectured internationally and wrote for many prominent publications, including The New York Times, Technology Review, and The New Republic. His syndicated Digital Nation column, which appeared in more than 200 newspapers and Web sites, ran from 1995 to 2001.

Chapman’s big break came in 1984 when, while a graduate student in political science at Stanford University, he learned the newly formed Computer Professionals for Social Responsibility (CPSR) was hiring its first executive director. Chapman contacted CPSR cofounder Severo Ornstein for a job interview, but was told he was too late. Ornstein had already decided on a candidate. Chapman insisted, and got an interview. “When we heard him speak, we knew he was the perfect person for the job,” Ornstein recalls. “Everyone on the board agreed that we had to hire him.”

“Gary was a real pioneer in linking the lives and careers of computer professionals to the social impact of the work they do and calling for us to take responsibility for the fruits of our labors,” Mitch Kapor, the founder of Lotus Development Corp., said in an email interview.

Chapman served as CPSR executive director from 1984 to 1991. Under Chapman’s leadership, CPSR flourished and grew, and emerged as an outspoken critic of the use of technology by the military. “With Gary at the helm, CPSR raised serious questions in public forums about the Strategic Defense Initiative [Star Wars],” Ornstein says. “Gary and I shared the notion that too much technology was being developed for military purposes. We felt strongly that it would have been better if technology funding came through the National Science Foundation rather than the Department of Defense.”

However, after the defeat of Star Wars and the collapse of the Berlin Wall in 1989, the threat of nuclear war diminished, and Chapman became more concerned about the effects of computers and, later, the Internet on society.

Born on Aug. 8, 1952 in Los Angeles, Chapman served as a medic in the U.S. Army Special Forces during the Vietnam War. He earned a B.A. in political science from Occidental College in 1979 and attended Stanford University’s political science Ph.D. program. He left Stanford in 1984 to lead CPSR.

“Gary was deeply concerned about society plunging ahead with technology without giving adequate thought to the social implications,” Ornstein says. “He helped provide much-needed direction, and he has left behind students and others who will continue to monitor and analyze technology policy.”

Samuel Greengard is an author and journalist based in West Linn, OR.
viewpoints

doi:10.1145/1897852.1897863 Pamela Samuelson

Legally Speaking
Do You Own the
Software You Buy?
Examining the fine print concerning your rights
in your copies of purchased software.

Software companies have mass-marketed computer programs for the past few decades on terms that typically purport to restrict the right of end users to resell or otherwise transfer their interests in copies of software they have purchased. The restrictions are usually stated in documents known as shrink-wrap or click-through “licenses.” Vendors of other types of digital content sometimes distribute their works with similar restrictions.

Shrink-wraps are documents inserted in packaged software, often just under the clear plastic wrap surrounding the package, informing purchasers they are not owners of copies of programs they just bought, but instead have rights in the program that are limited by the terms of a license agreement. Click-throughs are similar in substance, although the “license” terms only become manifest when you try to install the software and are directed to click “I agree” to certain terms.

Most of us ignore these documents and the restrictions they contain. Because we bought our copy of that software, we think we own it, regardless of what any “license” document says. We are also quite confident the vendor won’t take any action against us, even if we do violate one of the terms, because realistically the vendor can’t monitor every end user of its products.

The debate over whether mass-market transactions like these are really “sales” of goods, notwithstanding the “license” label, has been going on for decades. Strangely enough, it has yet to be definitively resolved. A recent appellate court ruling has upheld the license characterization, but a further appeal is under way in that case. This ruling is also at odds with other appellate court decisions. So things are still up in the air on the ultimate issue. This column will explain what is at stake in these battles over your rights in your copies of purchased software.

Three Legal Options
The distinction between “sales” and “licenses” really matters when assessing the risk of liability to copyright owners if resale restrictions are ignored.

Copyright law allows rights holders to control only the first sale of a copy of a protected work to members of the public. After the first sale, the owner of that copy is entitled to resell or otherwise transfer (for example, give it away as a gift or lend it to others) the copy free from risk of copyright liability. Bookstores and libraries are among the institutions made possible by copyright law’s first-sale doctrine.


Photograph by Windell Oskay

What happens if copyright owners try to restrict resales through license restrictions? There are three possible outcomes.

First, the effort to restrict resales may be deemed a nullity, as it was in a famous 1908 Supreme Court case, Bobbs-Merrill Co. v. Straus. Bobbs-Merrill sold books to Straus containing a prominent notice that resale of the books except at a stated price would be treated as copyright infringement. When Straus sold the books for a lower price, Bobbs-Merrill sued for infringement. The Court refused to enforce this resale restriction, saying that the copyright owner was entitled to control only the first sale of copies of its works to the public.

Second, a resale restriction may be enforceable against the purchaser insofar as he has agreed not to resell his copy, but it would be unenforceable against anyone to whom the purchaser might subsequently sell his copy.

This result might seem odd, but there is a fundamental difference between contract and property rights: Contracts only bind those who have agreed to whatever terms the contract provides; property rights create obligations that are good against the world. A first-sale purchaser may thus have breached a contractual obligation to the copyright owner if he resells his copy of the work in violation of a resale restriction, but he is not a copyright infringer. Those who purchase copies of copyrighted works from owners of first-sale copies are not at risk of either copyright or contract liability. These third-party purchasers are also free to resell their copies to a fourth party without fear that either is at risk of any liability to the copyright owner.

Third, courts may rule that the first-sale rule does not apply to mass market “license” transactions involving copies of copyrighted works because no “sale” has taken place. Under this interpretation, secondary markets in those copies are illegal. Anyone who purports to resell the copies is a copyright infringer for distributing copies of copyrighted works without getting permission from the copyright owner.

Vernor and Augusto
My March 2009 Legally Speaking column (“When is a ‘License’ Really a Sale?”) discussed two lower-court decisions in which copyright owners challenged the resale of copies of copyrighted works on eBay. In both cases, the courts ruled that copyright’s first-sale rule applied, notwithstanding transfer restrictions, because of economic realities of the transactions.

The plaintiff in Vernor v. Autodesk asked the court to declare that he was the owner of copies of Autodesk software he purchased from one of Autodesk’s customers and that he was entitled under the first-sale doctrine to resell those copies on eBay.

Autodesk claimed no sale had taken place because the software was licensed on terms that forbade transfer of the copy to third parties. Autodesk asked the court to declare that sales of these copies on eBay constituted copyright infringement.

UMG v. Augusto involved promotional CDs of music. Augusto purchased these CDs at flea markets, online auctions, and used CD stores. Language on the CD packaging indicated they were licensed for personal use only and could not lawfully be sold or otherwise transferred to third parties. When Augusto started selling UMG promotional CDs on eBay, UMG sued him for infringement.

My March 2009 column predicted that Autodesk and UMG would appeal the trial court rulings against them, and that the software industry could be expected to push very hard for a reversal, particularly in the Vernor case.

The Ninth Circuit Court of Appeals heard the arguments in Vernor and Augusto on the same day. In September 2010, the appellate court ruled in favor of Autodesk. Yet it upheld Augusto’s first-sale claim.

In assessing whether the first-sale rule should apply to mass-market transactions like these, it is useful to compare the economic realities test used by the trial courts in the Vernor and Augusto cases and the labeling and restrictions test adopted by the Ninth Circuit in Vernor.

Economic Realities Test
Under this test, a copyright owner’s characterization of a transaction as a license, rather than a sale, is not dispositive. It is instead only one factor among many that should be weighed in determining the true nature of the transaction.

Other factors include whether the purchaser has the right of perpetual possession of the copy, whether the rights holder has the right and ability to reclaim the copy if the license terms are violated, whether the purchaser has paid substantial sums for the privilege of permanent possession, and whether the purchaser has the right to discard or destroy the copy. The marketing channels through which the copy was obtained (such as purchasing packages of software at Walmart or Office Depot) may also be relevant.

Under the economic realities test,
Vernor and Augusto seem to be owners of copies. Those from whom they obtained the products had, it seems, the right of perpetual possession in the copies, and they could destroy or discard the copies if they wished. The software in Vernor had been purchased through a mass-market transaction, and the CDs in Augusto had been mailed for free to people who had not requested the CDs; indeed, UMG had not even kept track of the persons to whom the promotional CDs had been sent.

In a previous case, U.S. v. Wise, the Ninth Circuit reversed a conviction for criminal copyright infringement because the actress from whom Wise obtained a copy of a movie was the owner of that copy, notwithstanding various restrictions on what she could do with the copy, including transfers to third parties.

In Vernor’s petition for rehearing by the full Ninth Circuit Court of Appeals, he argues the Ninth Circuit’s ruling is in conflict with Wise and with precedents from other appellate courts, including Bobbs-Merrill.

Labeling and Restrictions Test
The Ninth Circuit in Vernor relied in part on MAI Systems Corp. v. Peak Computer, in which a Ninth Circuit panel in 1993 ruled that customers of MAI’s computers, on which MAI software was installed, were not owners of copies of this software, but rather licensees. Owners of copies of copyrighted software are entitled to make copies for their use and to authorize third parties to make use-copies; non-owners are not entitled to this privilege.

Peak provided maintenance services for MAI computers to MAI customers. When Peak technicians turned on MAI computers to service them, they made temporary copies of MAI software in the random access memory. MAI argued, and the Ninth Circuit agreed, that these copies were infringing because they were not authorized by MAI.

MAI v. Peak cited no authority and offered no analysis in support of its ruling that MAI’s customers were non-owners of their copies of MAI software. MAI’s characterization of the transaction as a license was, for that panel, dispositive.

first-sale argument. But the license label was, as in MAI v. Peak, given considerable weight. The court directed that two other factors be taken into account: whether the license restricted transfers of the copies and whether it contained other substantial restrictions. The panel ruled that Autodesk should prevail against Vernor under this test. The restrictions in Augusto, by contrast, were less substantial than those in Vernor.

Conclusion
Software companies have been cheered by the Ninth Circuit’s ruling in Vernor. But the rest of us should be worried about its implications. Think about what Vernor may mean for flea markets, bookstores, libraries, garage sales, and auction sites. Even selling a used computer loaded with software is infringing on this theory. Think also about how easy it is for a vendor to put a “license” label on a mass-marketed product with copyrighted or patented components that states that any transfer of that copy to third parties will subject the transferor to copyright or patent infringement charges.

Consumers enjoy significant benefits from the existence of secondary markets. The first-sale limit on patents and copyrights is essential to the operation of those markets. Vernor and Augusto’s cases are important to the future of competition in product markets and to preservation of the long-standing balancing principle in copyright law that the first-sale rule represents.

Pamela Samuelson (pam@law.berkeley.edu) is the

The three-judge panel decision in Richard M. Sherman Distinguished Professor of Law and
Information at the University of California, Berkeley.
Vernor did not rely on the license label
alone as a basis for rejecting Vernor’s Copyright held by author.

28 Communications of the ACM | March 2011 | Vol. 54 | No. 3
viewpoints

doi:10.1145/1897852.1897864 Kenneth D. Pimple

Computing Ethics
Surrounded
by Machines
A chilling scenario portends a possible future.

I predict that in the near future a low-budget movie will
become a phenomenon. It
will be circulated on the In-
ternet, shared in the millions
via mobile telephones, and dominate
Facebook for a full nine days. It will
show ordinary people going about their
everyday lives as slowly, subtly, every-
thing starts to go wrong as described in
the following events.
A middle-aged man notices the ad-
vertisements that pop up with his Web
searches are no longer related to the
search, but to odd and random prod-
ucts and services—hair replacement,
sports cars, retirement homes, second
career counseling, fantasy vacations,
divorce lawyers, treatments for depres-
sion and impotence.
A young professional woman, re-
cently laid off because of the bad econ-
omy, posts an Internet ad to sell her These merely perplexing events be- pathizers were taking over America—
piano. The ad doesn’t mention that she come ever more ominous as thousands The Invasion of the Info Snatchers will
needs the money to pay her rent. None of people, then millions, realize they play on our high-tech anxiety as our
of the offers are as high as her asking are always being watched, they have online lifestyles, position-broadcast-
price, but two offer exactly what she no privacy, and their every decision is ing cellphones, and protective moni-
owes for rent (to the penny) and one of- controlled by some unseen force. Four- toring devices are inexorably compro-
fers exactly $100 more. sevenths of moderate Americans who mised, exploited, and joined by ever
The seven most troublesome stu- are likely to vote begin to slide from the more subtle devices.
dents at a high school notice that wher- middle to the extreme right or left, not The preceding descriptions are in-
ever they go, they run into one or more knowing why. It gets worse and worse. It tended to be satirical, but all of these
of their three meanest teachers. seems like Armageddon. scenarios are possible today, though
An elderly couple starts hearing Just as the 1956 film Invasion of the their probability varies. What seems
Illustration by Viktor Koen

voices in their assisted-living apart- Body Snatchers encapsulated the Red most unlikely to me, though, is that
ment, faint whispers they can barely Scare zeitgeist with its depiction of or- people are and will be nervous about
discern: dinary people being replaced by exact being swept away in the rising tide of
“He’s awake.” replicas who are actually aliens bent pervasive information technology.
“She just got out of bed.” on taking over the world—as many Pervasive computing, ubiquitous
“The coffee machine is on.” feared that Communist spies and sym- computing, ambient intelligence, and


everyware are all commonly used terms Machines on the go… do we build and shape such pervasive
for generally similar phenomena—the When presented properly, the ben- sensing systems without slipping into
increasing presence in our everyday efits of pervasive IT are obvious. The coercion, surveillance, and control?”
lives of information and communica- popularity and benefits of the Internet
tion technologies (ICT) that are too and cellphones need not be defended ...in the home…
small to notice, or integrated into ap- or even described, but the amount of Kalpana Shankar, an assistant professor
pliances or clothing or automobiles, or personal information in circulation in the School of Informatics and Com-
are aspects of services we use willingly. over the former is tremendous and in- puter Science and an Adjunct Assistant
Some technologists have professed a creasing, as are the unsanctioned uses Professor in the School of Library and
goal for such technologies to be invis- of personal data. The position-broad- Information Science at Indiana Uni-
ible to the users, taken-for-granted casting function of the latter is not ne- versity Bloomington, distributed a case
or simply unnoticed by most people, farious in intent. Both can be used in study to participants before the work-
continuous with our background envi- knowledge creation, but also for stalk- shop. The case study, “Sensing Pres-
ronment, existing and operating below ing of various sorts. ence and Privacy: The Presence Clock,”
and behind the realm of real-time hu- At the workshop, Katie Shilton, a was developed as part of an NSF-funded
man intervention or awareness. doctoral candidate in the Department research project, “Ethical Technology
These technologies were the focus of Information Studies and a research- in the Homes of Seniors,” or ETHOS
of a two-day workshop held last year: er at the Center for Embedded Network (http://ethos.indiana.edu/).
Ethical Guidance for Research and Ap- Sensing (CENS) at the University of Cal- An increasing number of people
plication of Pervasive and Autonomous ifornia at Los Angeles, described three want to live in their own homes as they
Information Technology (PAIT). The intriguing CENS projects using mobile age and begin to become less self-reli-
workshop was funded by the National phones; I’ll describe two. ant due to slowly increasing frailty of
Science Foundation (grant number Participants in the Personal Envi- various sorts. Their offspring want to
SES-0848097), with additional support ronmental Impact Report (PEIR) pro- ensure they are safe and that responses
from the Poynter Center for the Study gram record and submit a continuous to mishaps are rapid and certain. The
of Ethics and American Institutions location trace using their mobile devic- ETHOS project investigates how In-
(http://poynter.indiana.edu) at Indiana es. A participant’s location is captured ternet-connected devices that alert re-
University Bloomington, and hosted by every few seconds, allowing the system sponsible parties to troubling changes
the Association for Practical and Pro- to determine the location and infer the in routine—such as a person falling in
fessional Ethics (http://www.indiana. most likely mode of locomotion—foot, the living room and not getting up—
edu/~appe). Thirty-six scholars, includ- car, or bus. The participant’s travel pro- can give care providers peace of mind
ing ethicists, engineers, social scien- file is then correlated with Southern and elders more autonomy than they
tists, lawyers, and geographers, California air quality and weather data,
participated in the meeting, allowing PEIR to estimate the partici-
discussed ethical issues in pervasive pant’s carbon footprint, as well as her an ethical manner acceptable to both
IT, and began crafting approaches to or his exposure to air pollution. The elders and their offspring.
ethical guidance for the development accuracy of the data gives an unprec- The Presence Clock case can be
and use of such devices, including edented look into the environmental found at the PAIT blog (http://ethical-
public policy guidance. The workshop harms people create and suffer. pait.blogspot.com/2009/08/case-
schedule, a list of participants, and Through the Biketastic project, bi- study-presence-clock.html) along with
more can be found at http://poynter. cyclists carrying GPS-enabled mobile commentary. Briefly described, the
indiana.edu/pait/. In this space, I can- phones transmit data on their routes Presence Clock is an analog clock that
not hope to do justice to the rich and through L.A. The information is not comes in pairs. One clock is installed
wide-ranging conversations we had at limited to position, but also includes in the elders’ living space and the sec-
the workshop, so I will focus on three data on the volume of noise and, using ond in the living space of their caretak-
significant topics we discussed at the the device’s accelerometer, the rough- ers. The two clocks are connected via
workshop. ness of the path. The data is transmit- the Internet. Both clocks sense move-
ted to a Web site (http://biketastic.com) ment and presence and lights on the
and displayed. Future improvements remote clock show roughly how much
When presented could display existing data about the time someone spent at any given hour
route, such as air quality, traffic con- in the room with the local clock; the
properly, the benefits ditions, and traffic accidents. The time spent is indicated by the bright-
of pervasive IT are Biketastic riders can also share their ness of a light next to the relevant hour
information with other riders to create marker (for example, a dull light at 1
obvious. a detailed profile of the rideability of and a bright light at 4 indicate some-
numerous routes. one spent little time near the clock at
Shilton described not only the ben- 1 and a good deal of time there at 4). A
efits and uses of these projects, but the different-colored light blinks next to
potential down-side and abuses. Sum- the hour marker when someone most
marizing the latter, she asked, “How recently entered the room.
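
To make the Presence Clock's display rule concrete, here is a minimal Python sketch of how time spent near one clock might be turned into the brightness of each hour light on the paired clock. The 0 to 255 brightness scale, the linear mapping, and the names `MAX_BRIGHTNESS` and `brightness_levels` are illustrative assumptions; the case study as described here does not specify them.

```python
MAX_BRIGHTNESS = 255  # assumed 8-bit light scale; not taken from the case study


def brightness_levels(minutes_by_marker):
    """Map minutes of sensed presence per hour marker (1-12) to a light level.

    minutes_by_marker: dict of hour marker -> minutes someone was near the
    local clock during that hour. Missing markers count as zero minutes.
    Returns a dict covering all twelve markers.
    """
    levels = {}
    for marker in range(1, 13):
        minutes = minutes_by_marker.get(marker, 0)
        minutes = max(0, min(60, minutes))  # clamp to one hour
        # Linear scaling: a full hour of presence gives the brightest light.
        levels[marker] = MAX_BRIGHTNESS * minutes // 60
    return levels


# The example from the text: little time near the clock at 1, a good deal at 4.
example = brightness_levels({1: 5, 4: 50})
print(example[1], example[4])
```

Under these assumptions the light at 1 is dim and the light at 4 is bright, which matches the behavior described for the remote clock. The separate blinking light for the most recent entry is left out of the sketch.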


The goal of the Presence Clock is to give the elder and caregiver a sense of mutual presence, even at a distance. A glance at the clock can give either party a sense of what the other has been doing; a change in routine might prompt a telephone call. It is less intrusive than a true surveillance system with an audio or visual feed, but could afford a great deal of comfort to both parties and enable an elder to stay in her or his own home longer than the caretakers would otherwise feel comfortable.

But it could feel intrusive to the elder, a trade-off between privacy and security that he or she does not want to make, but the caretaker insists upon. Responsible development and deployment of the Presence Clock is not just a technical and marketing challenge, but also a challenge in human relations and customer/user education.

…and thinking for themselves.
Perhaps even more troubling than machines that hide or drop from our awareness are machines that make choices without direct human intervention, generally termed “autonomous systems.” Keith W. Miller, the Louise Hartman Schewe and Karl Schewe Professor of Computer Science at the University of Illinois at Springfield (to whom I owe the title of this column), highlighted one concern about autonomous systems in a presentation called “The problem of many hands when some of the hands are robotic,” in which he revisited Helen Nissenbaum’s 1994 article, “Computing and accountability.”1 The problem of many hands lies in discerning or assigning responsibility when something goes wrong. The more people involved in a project, the more people there are to whom the blame can be shifted. When the technology is believed to be able to learn on its own and make its own decisions, there can arise a temptation (perhaps even a compulsion) to blame the machine itself, allowing the humans involved in its design, construction, and deployment to wash their hands of it.

I think it’s fair to say that most of the workshop participants deplored this tendency. Determining moral responsibility is a serious endeavor, and dodging or shifting blame (if that’s all one does) is irresponsible in itself. At the workshop Miller started advocating for an effort to make a strong statement about the importance of accepting moral responsibility even in circumstances of complicated causality. Our time was too short to make much progress, but Miller has pushed the project forward in the interim. As I write this column in early 2011, Miller is working on the 27th draft of “Moral Responsibility for Computing Artifacts: Five Rules” (The Rules, for short) and has assembled a 50-member, international Ad Hoc Committee for Responsible Computing to improve drafts. It’s remarkable to get such cooperation and consensus from scholars solely over email; more remarkable still is that the document is only four pages long (see https://edocs.uis.edu/kmill2/www/TheRules/).

Conclusion
I have only touched on what happened at the workshop itself, and mentioned only one of the ongoing projects the workshop inspired. More is happening and still more will be accomplished thanks to the enthusiasm of this remarkable group of scholars, the ripple effect that will let this workshop touch many more people than those who attended, and a small grant from the National Science Foundation.

Reference
1. Nissenbaum, H. Computing and accountability. Commun. ACM 37, 1 (Jan. 1994), 72–80.

Kenneth D. Pimple (pimple@indiana.edu) is Director of Teaching Research Ethics Programs at the Poynter Center for the Study of Ethics and American Institutions, an endowed center at Indiana University-Bloomington, and on the Affiliate Faculty of the Indiana University Center for Bioethics.

Copyright held by author.

Calendar of Events

March 19–23
Computer Supported Cooperative Work,
Hangzhou, China,
Sponsored: SIGCHI,
Contact: John Tang,
Email: johntang@microsoft.com

March 21–25
Tenth International Conference on Aspect-Oriented Software Development,
Porto de Galinhas, Brazil,
Contact: Borba Paulo,
Email: phmb@cin.ufpe.br

March 22–24
6th ACM Symposium on Information, Computer and Communications Security,
Hong Kong,
Contact: Lucas Chi Kwong Hui,
Email: hui@cs.hku.hk

March 22–24
4th International ICST Conference on Simulation Tools and Techniques,
Barcelona, Spain,
Contact: Liu Jason,
Email: liux@cis.fiu.edu

March 28–29
International Cross-Disciplinary Conference on Web Accessibility,
Andhra Pradesh, India,
Contact: Ashley Cozzi,
Email: cozzi@hq.acm.org

March 30–April 1
8th USENIX Symposium on Networked Systems Design and Implementation,
Boston, MA,
Contact: David G. Andersen,
Email: dga@cs.cmu.edu

April 1–2
Consortium for Computing Sciences and Colleges (CCSC) Midsouth,
Conway, AR,
Contact: Larry Morell,
Email: lmorell@atu.edu

April 1–2
Consortium for Computing Sciences and Colleges (CCSC) Southwestern,
Los Angeles, CA,
Contact: Stephanie August,
Email: saugust@lmu.edu

viewpoints

doi:10.1145/1897852.1897865 Peter J. Denning

The Profession of IT
Managing Time
Professionals overwhelmed with information glut
can find hope from new insights about time management.

Tracking Commitments to Completion
Time management is a persistently hot issue for many
computing professionals. Although we often Much of the literature on time man-
Almost every day we hear complain about agement focuses on the first practice.
(or have) laments about That practice directly addresses one of
information overload, about a relent- not having enough the biggest breakdowns with commit-
lessly increasing rate of input from time, lack of time ment management—missed or forgot-
Internet and other sources, and about ten commitments. When the world
feelings of overwhelm, data drowning, is the symptom, gets demanding, we can find ourselves
inadequacy, and even victimization. not the problem. in a state of constant worry about
The consequences from poor time whether we forgot commitments or
management can be significant: loss their due dates and whether we have
of trust, loss of reputation, negative the capacity to get everything done.
assessments about our competence David Allen has written a hugely
and sincerity, and inability to get the popular book about how to organize
jobs and projects we want. Books and our records so that nothing is lost and
seminars on time management con- lack of time is the symptom, not the we can eliminate from our minds all
tinue to be popular. Software tools to problem. The problem is commit- concerns about whether every commit-
help keep track of calendars and to-do ment management. Time is one of ment is being taken care of.1 He has de-
lists sell well. the resources needed to manage com- fined an operating system for manag-
The same issues plague us as de- mitments. Other resources, such as ing commitments. His system can be
cision makers. We wanted larger money, space, and personnel, may be implemented with a few simple rules
networks and more sensors for bet- needed as well. From now on, let us and folders. The folders and structure
ter situational awareness—and now talk about commitment management. of flows among them are remarkably
those networks overwhelm us. We still In managing commitments we need similar to the job-scheduling part of
complain about the quality of our de- to know only four things. I’ll call them a computer operating system. After
cisions. practices because you can learn them you set up your system and practice its
In my own research on this subject as skills and get tools to help you do rules for a short time, you soon become
I have turned up new insights that are them better (see the figure here). skilled at commitment tracking. That
very helpful especially if viewed as a 1. How to track commitments to so many people have found his book re-
coherent framework. I discuss these their completions; ally helpful illustrates that the record-
insights here. There are opportunities 2. How to choose what commitments
here for all computing professionals to to make or decline; is a huge struggle for many.
become more productive and for some 3. How to organize the conversa- Allen’s story begins with “stuff” ar-
to design new software tools. tions that lead to completions of com- riving before you. Stuff is anything that
mitments; and demands your attention and possible
From Time Management to 4. How to manage mood and capacity. future action. Think of stuff as incom-
Commitment Management These four practices go together. If ing requests. A request can be anything
It is very important to frame the ques- we pay attention to only one, we will from the really simple (such as “read
tion properly. Although we often com- see some headway but not a lasting so- me” or “take note”) to the complex
plain about not having enough time, lution to our problem. (such as “write an analytic report” or


Figure 1. Four practices of commitment management: 1-tracking, 2-selecting, 3-executing, 4-capacity planning.
[Diagram labels: time, money, material, personnel, now, world, delegated, results, execute, importance, queued, mission.]

“implement a software tool”). Allen Trump the Urgent asperated over the sheer number of ur-
says to sort the incoming items into With the Important gent, time-wasting requests. The irony
trash (ignore and delete), possibly use- Stephen Covey has discussed at length is that many urgent requests are the re-
ful (save in tickler file), reference (save the notion of controlling what com- sult of previously neglected important
in reference file), and actionable. You mitments you enter or decline.2 The tasks. For example, if you make sure
do actionable items immediately if central question is: what exactly do you give excellent service to your cus-
they require two minutes or less (for you commit to? Covey maintains that tomers, you will not spend a lot of time
example, a quick answer to an email the answers come from having a clear answering complaints.
message); otherwise you enqueue sense of mission. Just as organizations Covey tells an engaging story about
them in your to-do list and calendar, have mission statements, individuals a time-management seminar leader
or you delegate them. You review your should have personal mission state- who did a demonstration involving
queues periodically to see if your dele- ments. We can ignore requests that placing rocks, then gravel, sand, and
gations have completed and the order- do not serve our mission, and we can water into a large glass jar. After his
ings of lists reflect your current pri- of us to write down a mission state-
orities. Once an item is in this system, to leave us alone. Covey counsels each these items successfully into the jar, he
you do not have to think about it and of us to write down a mission state- asked, “What is the point about time
your mind is clear to focus on the tasks ment, including our ongoing personal management?” He got many answers
needing completion. and professional commitments. Then including there is always more room
This story is incomplete in three we arrange our calendars to make sure to fit more things in your schedule if
ways. (1) It does not address the pos- that we allocate time sufficient for each they are small or liquid enough, and
sibility of controlling the flow of stuff. major commitment. you may therefore have more capacity
(2) It does not make explicit that much Covey argues that good mission to get things done than you think. He
of the stuff originates with you and statements help people distinguish said, “No. The point is that if you don’t
your teams as you design actions to important requests from urgent re- put the big rocks in at the beginning,
fulfill your own commitments. And quests. Many people find themselves you can’t get them in at all.”
(3), it does not deal with limitations overwhelmed with urgent but unim- The moral for commitment man-
on your capacity and the mood of over- portant requests that consume all their agement is: let your mission statement
whelm when you are beyond capacity. time. This is a double whammy—they inform you about what tasks are most
These three aspects take us to the next are frustrated at being unable to find important, then set aside sufficient
three practices. time for the important things and ex- time in your schedule to do them.
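
The sorting rules attributed to Allen on this page (trash, tickler file, reference file, and actionable items that are done at once, delegated, or queued) can be sketched as a small decision function. This is only an illustration under stated assumptions: the `Item` fields, the `Bin` names, and the order in which the non-actionable cases are checked are hypothetical, not part of Allen's book or of this column.

```python
from dataclasses import dataclass
from enum import Enum


class Bin(Enum):
    TRASH = "trash"
    TICKLER = "tickler file"
    REFERENCE = "reference file"
    DO_NOW = "do immediately"
    DELEGATE = "delegate"
    QUEUE = "to-do list or calendar"


@dataclass
class Item:
    description: str
    actionable: bool = False
    reference: bool = False        # worth keeping for lookup
    possibly_useful: bool = False  # might matter later
    minutes_needed: float = 0.0
    can_delegate: bool = False     # hypothetical flag: someone else can do it


def triage(item):
    """Return the bin an incoming item ("stuff") is sorted into."""
    if not item.actionable:
        if item.reference:
            return Bin.REFERENCE
        if item.possibly_useful:
            return Bin.TICKLER
        return Bin.TRASH
    if item.minutes_needed <= 2:   # the two-minute rule from the text
        return Bin.DO_NOW
    if item.can_delegate:
        return Bin.DELEGATE
    return Bin.QUEUE


print(triage(Item("quick email reply", actionable=True, minutes_needed=1)))
print(triage(Item("write an analytic report", actionable=True, minutes_needed=600)))
```

The periodic review step (checking whether delegations have completed and whether list order still reflects current priorities) is deliberately left out; it would operate on the queues this function fills.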


Mastering Conversations for To manage commitments means to ally do not spend more than 60–80
Context, Possibility, Action manage the conversations leading to hours per week on professional com-
The third practice begins with the real- the fulfillment of those commitments. mitments.
ization that all commitments are made Have you or someone made the appro- You need to reduce your load if you
in conversations.3 The practice is to priate requests or offers? Who is re- are over capacity. First, look at your
become an observer and facilitator of sponsible for performing each action? mission statement and recall what
those conversations. There are three Who is responsible for accepting and is most important to you. Make sure
basic kinds of conversations. declaring satisfaction with the result? that the time you allocate for your
• Context. Define the purpose, mean- Do you trust promises made to you by “big rock” commitments is sufficient
ing, and value of actions. others along the way? to do them right. All other commit-
• Possibility. Invent possibilities for ments need to be modified or elimi-
future action (in the context). Managing Capacity and Mood nated. Modified means you negotiate
• Action. Elicit the commitments The final aspect of the picture is your new terms with the person(s) who ex-
that will realize specific possibilities ability to manage your capacity and pects the results. Eliminated means
and see them through to completion. mood. You have the capacity for a you cancel the commitment. In both
It would be a misunderstanding of commitment if you have the time and those cases you need to work with the
Allen’s model (practice 1) to interpret other resources needed to fulfill the customers of your commitments to re-
his “actionable” items only in the third commitment. If you do not have the set their expectations and take care of
sense. Professionals who do not create resources, you will need to initiate any consequences resulting from your
context will find it difficult to get anyone conversations to get them—and you scale-back or cancellation.
to work with them. Although the action must manage those conversations
itself is performed in the third conver- as well. Generally, if you have accept- Conclusion
sation, the other two are needed before ed too many commitments relative Commitment management presents a
people are willing to engage in a conver- to your capacity, you will feel over- big software challenge. There are soft-
sation for action. Sometimes you need whelmed, victimized, and sometimes ware tools that help with some of the
to schedule time for context and possi- panicked—poor moods for productiv- four practices separately. For example,
bility conversations, but more often you ity. When you do not have the capacity, OmniFocus (omnigroup.com), Things
can insert them as needed as prefaces you can find yourself in a death spiral (culturedcode.com), and Taskwarrior
to your requests and offers (which open of an increasing backlog of broken (taskwarrior.org) conform to Allen’s
conversations for action). promises, negative assessments about workflows in practice 1. Orchestrator
A conversation for action takes your performance, lack of willingness (orchmail.com) tracks conversations
place between a customer and per- to trust you, and a personal sense of for action through their stages in prac-
former; the customer makes a request powerlessness. Over time, these bad tice 3; ActionWorks (actiontech.com)
(or accepts an offer) that the perform- moods increase stress and anxiety in goes further, mapping and managing
er commits to fulfilling. The transac- your body and lead to chronic diseas- entire business processes. Can some-
tion between them can be visualized es. Not a pretty picture. one design a coherent system that
as a closed loop with four segments: With a simple exercise, you can supports all four together?
request, negotiate, perform, accept.5 assess whether you have the capacity If you learn the four commitment-
Performers often make requests of for your commitments and take cor- management practices, you will be
others to get components for their rective steps when they are beyond able to execute all your commitments
own deliveries; thus a single request your capacity.3,4 On a three-column productively and in a mood of fulfill-
can evoke coordination in a larger spreadsheet, make one row for each ment and satisfaction. All your custom-
network of people (for details on con- commitment. Put a description of the ers will be satisfied and you will enjoy a
versations for action and their skilled commitment in the first column, the strong, trustworthy reputation.
management, see 3–5). number of weekly hours you need to
do it well in the second column, and References
1. Allen, D. Getting Things Done: The Art of Stress-Free
the number of weekly hours you ac- Productivity. Penguin, 2001.
Commitment tually spend in the third. Make sure 2. Covey, S.R., Merrill, R., and Merrill, R. First Things First.
Simon & Schuster, 1994.
to include all your “big rock” com-
management mitments including time for fam-
3. Denning, P. and Dunham. R. The Innovator’s Way:
Essential Practices for Successful Innovation. MIT

presents ily, sleep, and exercise. Many people Press, 2010.


4. Denning, P. Accomplishment. Commun. ACM
who feel chronically overwhelmed 46, 7 (July 2003), 19–23. DOI= http://doi.acm.
a big software discover from the exercise that their
org/10.1145/792704.792722.
5. Winograd, T. and Flores, F. Understanding Computers
challenge. column-two total exceeds 168, the and Cognition. Addison-Wesley, 1987.

number of hours in a week. Even if the


column-two total fits, they discover Peter J. Denning (pjd@nps.edu) is Distinguished
Professor of Computer Science and Director of the
that their column-three total exceeds Cebrowski Institute for innovation and information
100 hours per week for professional superiority at the Naval Postgraduate School in Monterey,
CA and is a past president of ACM.
commitments. In contrast, people
who feel productive and satisfied usu- Copyright held by author.
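
The three-column capacity exercise described in this column (one row per commitment, weekly hours needed, weekly hours actually spent) is easy to try in a few lines of Python. The sketch below is a hedged illustration: the `Commitment` class, the `professional` flag, the function name, and the sample rows are all hypothetical. Only the three columns and the 168-hour week come from the text.

```python
from dataclasses import dataclass

HOURS_PER_WEEK = 168  # 7 days x 24 hours


@dataclass
class Commitment:
    description: str      # column one
    hours_needed: float   # column two: weekly hours to do it well
    hours_spent: float    # column three: weekly hours actually spent
    professional: bool = False  # assumed flag, used to total work hours


def capacity_report(rows):
    """Total the columns and flag a column-two total above 168 hours."""
    total_needed = sum(r.hours_needed for r in rows)
    total_spent = sum(r.hours_spent for r in rows)
    professional_spent = sum(r.hours_spent for r in rows if r.professional)
    return {
        "total_needed": total_needed,
        "total_spent": total_spent,
        "professional_spent": professional_spent,
        "over_capacity": total_needed > HOURS_PER_WEEK,
    }


# Made-up sample rows, including the "big rock" commitments.
rows = [
    Commitment("sleep", 56, 49),
    Commitment("family", 30, 20),
    Commitment("exercise", 5, 2),
    Commitment("project A", 50, 55, professional=True),
    Commitment("teaching", 40, 40, professional=True),
]
print(capacity_report(rows))
```

With these invented numbers the column-two total is 181 hours, more than a week contains, so some commitments would have to be renegotiated or cancelled, as the column recommends.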

viewpoints

doi:10.1145/1897852.1897866 Daryl E. Chubin and Roosevelt Y. Johnson

Broadening Participation
A Program Greater
than the Sum of Its Parts:
The BPC Alliances
Changing the trajectory of participation in computing
for students at various stages of development.

There is virtually no discipline or aspect of our daily
lives that is not positively im-
pacted by advances in com-
puter science. It has become
the backbone of our technologically
dependent society. In fact, computer
software engineers are among the oc-
cupations projected to grow the fast-
est and add the most new jobs over the
2008–2018 decade.6 Yet, bachelor’s,
master’s, and Ph.D. degrees earned by
U.S. citizens and permanent residents
continue to decline.3,7 Further, degrees
earned by women, persons with dis-
abilities, and underrepresented mi-
norities (American Indians/Alaskan
Natives, African Americans, Native Hawaiians/Pacific Islanders, or Hispanics) lag those of non-resident aliens, Asians, and White males.

Students at the CAHSI 2009 annual meeting held at Google headquarters.
Photo courtesy of AAAS Center for Advancing Science & Engineering Capacity

Program Focus
Rather than focus on the problems that beset computing, we will emphasize solutions in the form of the National Science Foundation’s (NSF) Broadening Participation in Computing (BPC) program.a The BPC-A program supports three categories of awards: Alliances; Demonstration Projects (DPs); and Leveraging, Scaling, or Adapting Projects, or Demonstration Projects (LSA). Typical DPs pilot innovative programs that, once fully developed, could be incorporated into the activities of an Alliance or otherwise scaled for wider impact. LSA projects can leverage, scale, and adapt the work of Alliances or DPs, as well as efforts by other organizations to extend the impact of effective practices. Alliance and Alliances Extension Projects (Alliances) represent broad coalitions of academic institutions of higher learning, secondary and middle schools, government, industry, professional societies, and other not-for-profit organizations designing and carrying out comprehensive programs to reduce underrepresentation in the computing disciplines. Projects may target stages of the academic pipeline from middle school through the early faculty ranks, and are expected to have significant impact on both the quality of opportunities afforded to participants and the number of participants potentially served.5

a For additional information on the BPC program, visit: http://www.bpcportal.org/bpc/shared/home.jhtml.

NSF funding for the Alliances began in 2005/2006 with most programs operating with students approximately one year later. Ten alliances constitute the core of BPC as of 2009. An eleventh alliance, the National Center for Women & IT (NCWIT), predated the BPC program, but has served as a focal point and resource for all the Alliances, par-

ma r c h 2 0 1 1 | vo l . 5 4 | n o. 3 | c o mmu n i c at i o n s o f t he acm 35
viewpoints

NSF BPC Alliances* year, and four-year) to develop a com-


mon data framework enabling them to
focus on students and educational sys-
A4RC (N. Carolina A&T) www.a4rc.org tems as well as operate on various levels
AccessComputing (U. Washington) www.washington.edu/accesscomputing of the education pathway.
CAHSI (U.Texas-El Paso) http://cahsi.org ˲˲ Focusing on undergraduates. Allianc-
Computing Research Association-Women/Coalition www.cra-w.org and cdc-computing.org es employ varied methods, such as in-
to Diversify Computing troductory computer classes designed
STARS (U. N. Carolina) www.starsalliance.org to attract majors and bolster under-
ARTSI (Spelman College) http://artsialliance.org/ prepared students; peer-facilitation in
CAITE (U. Massachusetts) http://caite.cs.umass.edu the gatekeeper courses; undergraduate
El (Rice U) http://empoweringleadership.org professional socialization and research
GeorgiaComputes! (Georgia Tech) http://gacomputes.cc.gatech.edu/ experiences; mentoring; developing
Into the Loop (U. California, LA) http://intotheloop.gseis.ucla.edu undergraduates’ technical excellence,
NCWIT (Nat’l Ctr for Women & IT) http://www.ncwit.org leadership skills, and civic engagement
*as of 2009 around computing; and, partnering un-
dergraduates with younger students so
that both are motivated to reach their
ticularly for those focusing on gender. tives that enable participation by all. personal best in computing.
Jointly, these 11 Alliances have blanket- The BPC Alliances are not about ˲˲ Connecting unlike institutions/Cre-
ed computer science with alternatives quick fixes. Rather, they aim to produce ating new partnership models. Alliances
for diversifying participation in com- systemic changes—changing the tra- build productive relationships between
puter careers (see the table here). jectory of participation in computing dissimilar institutions (for example,
In unison, the BPC Alliances en- for students at various stages of devel- the University of California, Los Ange-
deavor to significantly increase the opment. Such change comes not only les with the Los Angeles Unified School
number of U.S. citizens and permanent through impacts on individual students District; Historically Black Colleges
residents receiving degrees in the com- and educators, but also as institutions and Universities with top research uni-
puting disciplines, with an emphasis adjust their approaches and structures versities). These models feature novel
on students from communities long to enhance the teaching and learning of research collaborations, team learning,
underserved in computing. Cohorts computing. The Alliances embody goal- and multiple educational pathways.
of students—those steeped in poverty, directed change across the educational ˲˲ Creating national, interlocking net-
first-generation college-goers, ignored spectrum. Simply put, they develop tal- works. Alliances socialize computer
through stereotyping and low expecta- ent. Students are encouraged to work science students at all levels, providing
tions, academically underprepared or hard and be successful, whether that students and educators with opportu-
not resembling what computer scien- means entering the work force upon nities to share experiences and develop
tists traditionally look like—are being high school graduation or pursuing a professional skills and knowledge.
reached, gaining confidence and skills, college or advanced degree. Regardless of approach, the Alli-
and making progress toward degrees The American Association for the ances are committed to collaboration,
and careers in computing. Advancement of Science (AAAS) Center serving on each others’ boards, con-
By working to create a critical mass for Advancing Science & Engineering ducting face-to-face meetings of senior
on campuses, the BPC Alliances have Capacityb staff conducted a three-year personnel, contributing to the BPC Por-
built their own infrastructure(s), en- portfolio assessment of the Alliance tal, and disseminating results. Virtually
compassing both the physical (facilities, component of the BPC program. The all of the Alliances work with Access-
instrumentation) and the social (net- Capacity Center found the 11 BPC Alli- Computing to increase their inclusion
works, partnerships) components of ances are implementing various meth- of persons with disabilities, send stu-
their activities. The Alliances extend or- ods to attract, nurture, and retain stu- dents and faculty to the annual STARS
ganizational commitments to educate, dents, using innovative practices and Celebration, and encourage their stu-
train, and utilize science, technology, strategies.2 Four approaches are dem- dents to join the Empowering Leader-
engineering, and mathematics (STEM) onstrating success: ship (EL) Alliance and participate in the
professionals in various communities. ˲˲ Reforming statewide systems. Alli- CRA/CDC’s programs. Together, they
Individually impressive, the BPC Alli- ances work across different institutions are forming a national infrastructure
ances are more than the sum of their and systems (for example, K–12, two- for change. This includes devising a
parts and greater than the sum of their common core of indicators to measure
experiments. Some Alliances target spe- b The AAAS Center for Advancing Science & En- and monitor Alliance progress (see BPC
cific races or ethnicities, others direct gineering Capacity is a fee-for-service consult- Common Core Indicators1).
their efforts toward women or persons ing group that provides institutions of higher
with disabilities, and several Alliances education with assistance in improving deliv- A Snapshot of the Alliances
ery of their educational mission, especially in
reach to all underrepresented groups. science, technology, engineering and math-
Each BPC Alliance has a storyline that
Together, the Alliances are a cohesive ematics fields. Details can be found at www. conveys the excitement of its work. We
entity, providing the field with alterna- aaascapacity.org. encourage readers to visit the Alliance

Web sites listed in the accompanying table to obtain in-depth and updated information.

While seeking to increase African-Americans' entry into computing research careers, the Alliance for the Advancement of African American Researchers in Computing (A4RC) and the Advancing Robotics Technology for Societal Impact (ARTSI) connect students at Historically Black Colleges and Universities with the resources of top research institutions. A4RC covers a range of research topics; ARTSI focuses entirely on robotics.

The AccessComputing Alliance strives to increase the number of students with disabilities who complete postsecondary computing degrees and enter the computing work force. The program leads capacity-building institutes for computing departments. The Computing Alliance of Hispanic-Serving Institutions' (CAHSI) interventions center on undergraduates and the gateway of introductory courses, as well as the power of peer groups, to increase the number of Hispanic students entering the computing professoriate and work force.

With a goal of increasing women's participation in information technology, the NCWIT has programs in K–12 education, college-level outreach and curriculum reform, corporate recruitment and retention, and entrepreneurial endeavors.

Other Alliances work to increase the numbers of all minorities. For example, Students & Technology in Academia, Research & Service (STARS) targets undergraduates and directs its efforts toward all underrepresented groups, including those with disabilities. Its centerpiece is the STARS Leadership Corps, a program that draws students from all member institutions in yearlong, team-based leadership projects. Also working to increase the number of minorities is the EL Alliance, a program that provides a safety net to its participants by fostering networking opportunities, ongoing communication, and a shared learning experience.

Using community colleges as its centerpiece, the Commonwealth Alliance for Information Technology Education (CAITE) focuses on women and minorities in groups that are underrepresented in the Massachusetts innovation economy, that is, economically, academically, and socially disadvantaged students. Georgia Computes! works to attract women and minorities into computing by building a computing education pipeline across the state of Georgia. Into the Loop aims to increase the computer science learning opportunities of students in the Los Angeles Unified School District and broaden the participation of African Americans, Hispanics, and girls in computing via the Computing Science Equity Alliance.

Looking Ahead
The BPC Alliances have made the field of computing more real and more attainable for many students. In accomplishing this feat, BPC Alliances highlight at least two key characteristics of good alliances: the ability to collaboratively adjust approaches, structures, and practices; and the ability to develop new communication infrastructures to more effectively plan, implement, evaluate, and broadly disseminate effective practices.

In FY 2011, the NSF's Division of Computer and Network Systems is investing in a comprehensive Education and Work Force (EWF) Program. The BPC Alliance Program, Computing Education for the 21st Century (CE21), and the Graduate Research Fellowship Program will be funded as part of that activity. The CE21 program seeks to increase competencies for all students, regardless of gender, race, ethnicity, disability status, or socioeconomic status. By promoting and enhancing K–14 computing education, it will enhance interest in and student preparation for careers in computing-intensive fields. CE21 will support Type I (smaller-scale studies of the effectiveness of new instructional materials and interventions and strategies to develop K–14 teaching expertise), Type II (proven effective implementations taken to scale), and Planning proposals (support for the establishment of new partnerships and collaborations to develop Type I and Type II proposals).5

Through the EWF program, NSF seeks to build on the foundation of the BPC Alliances to reach more students and illuminate pathways into computing. An important stated goal of the EWF program, in fact, is to “transform computing education at all levels and on a national scale to meet the opportunities of a world where computing is increasingly essential to all sectors of society.” Considering the multiple stakeholders involved, the importance of cultivating interpersonal relationships, forging and embracing shared values, and using process and outcome data to monitor and evaluate Alliance contributions to computing, the BPC Alliances are fulfilling the expectation of how transformative models of intervention in a STEM discipline look and function.

References
1. BPC Common Core Indicators, Post-Workshop Version. Washington, DC: AAAS Working Paper, Feb. 9, 2010; http://php.aaas.org/programs/centers/capacity/07_Engagement/07_BPCProgram.php.
2. Chubin, D.E. and Johnson, R.Y. Telling the Stories of the BPC Alliances: How One NSF Program Is Changing the Face of Computing. AAAS, Washington, D.C., June 2010; http://php.aaas.org/programs/centers/capacity/documents/BPC%20Stories.pdf.
3. Computing Research News. Computing Research Association, Washington, D.C., 2010; http://www.cra.org/uploads/documents/resources/taulbee/0809.pdf.
4. Congressional Testimony, Oct. 6, 1999. Computing Research Association, Washington, D.C., 2004; http://archive.cra.org/Policy/testimony/lazowska-5.html.
5. National Science Foundation: Broadening Participation in Computing Program. National Science Foundation, Arlington, VA; http://www.bpcportal.org/bpc/shared/about.jhtml.
6. U.S. Bureau of Labor Statistics, Office of Occupational Statistics and Employment Projections. Government Printing Office, Washington, D.C., 2009; http://www.bls.gov/oco/ocos303.htm.
7. WebCASPAR. National Science Foundation, Arlington, VA, 2010; https://webcaspar.nsf.gov/.

Daryl E. Chubin (dchubin@aaas.org) is the director of the Center for Advancing Science & Engineering Capacity at the American Association for the Advancement of Science in Washington, D.C.

Roosevelt Y. Johnson (rjohnson@aaas.org), at the time of the work reported here, was a Fellow at the Center for Advancing Science & Engineering Capacity at the American Association for the Advancement of Science in Washington, D.C., on leave from the National Science Foundation.

Copyright held by author.


doi:10.1145/1897852.1897867 Marc Snir

Viewpoint
Computer and Information
Science and Engineering:
One Discipline, Many Specialties
Mathematics is no longer the only foundation for computing
and information research and education in academia.

During the last 60 years we have seen the beginning of a major technological revolution, the Information Revolution. IT has spawned large new economic sectors and has, over a long period, doubled the rate of increase in labor productivity in the U.S.1,16 Over two-thirds of job openings in science and engineering in the coming decade are in IT.12 Intellectual property, rather than physical assets, has become the main means of production: control over intangibles (such as patents and copyrights) is at the forefront of the national and international business agenda;6,23 investment by industry in intangible assets has overtaken investment in tangible means of production.7,19

The information revolution is far from having run its course: “machine-thought” has not yet replaced “brain-thought,” to the extent that “machine-made” has replaced “hand-made.” One can be confident that the use of digital technologies will continue to spread; that more and more workers will move from the physical economy to the information economy; and that people will spend more and more of their work and leisure time creating, manipulating, and communicating information.

The fast evolution of IT motivates a periodic reexamination and reorganization of computing and information (C&I) research and education in academia. We seem to be in one such period. Many universities have established or expanded schools and programs that integrate a broad range of subdisciplines in C&I; and NSF is affecting the scope of research and education in C&I through the creation of programs such as the Cyber-Enabled Discovery and Innovation (CDI) and Pathways to Revitalized Undergraduate Computing Education (CPATH) programs.21,22

I strongly believe that C&I is one broad discipline, with strong interactions between its various components. A coherent view of the whole must precede any discussion of the best ways of dividing it into subdisciplines. The dominant discourse in our community should be about building a coherent view of the broad discipline, building bridges between its constituents, and building bridges to other disciplines as we engage in interdisciplinary research. I hope this column will contribute to these goals.

C&I Is a Use-Driven Research Discipline
I am discussing in this column the broad field of Computing and Information Science and Engineering (CISE): the study of the design and use of digital systems that support storing, processing, accessing, and communicating information. To prevent possibly misleading connotations, I shall call this broad field Computing and Information (C&I). We still seem to be debating whether computer science is science, engineering, or something unlike any other academic discipline (see, for example,9,11). The debate is often rooted in a linear view of science and engineering: Scientists seek knowledge, for knowledge's sake; through a mysterious process, this knowledge turns out to have practical consequences and is picked up by applied scientists, next engineers, and then used to develop better technologies. This view encourages an implicit value system whereby science is seen as a higher calling than engineering.

Donald Stokes, in his book Pasteur's Quadrant,25 leads a powerful attack against this simplistic view of science. He points out that, over the centuries, fundamental research has often been motivated by considerations of use, that is, by the desire to implement certain processes and achieve certain goals, and not (or not only) by the desire to acquire knowledge for knowledge's sake. His paradigmatic example is Pasteur, who founded modern microbiology, driven by the practical goal of preserving food.

According to Stokes, research should be described as a two-dimensional space, as shown in Figure 1. Stokes further argues that “Pasteur's Quadrant,” namely use-inspired basic research, is increasingly prevalent in modern research institutes. The argument of Stokes strongly resonates with schools of engineering, or computer science. Most of their faculty members pursue scientific research that has a utilitarian justification; their research is in “Pasteur's Quadrant.”

Figure 1. Pasteur's Quadrant (adapted from Stokes28).
                                       Consideration of use?
Quest for fundamental understanding?   No                           Yes
No                                     (empty)                      Pure applied research (Edison)
Yes                                    Pure basic research (Bohr)   Use-inspired basic research (Pasteur)

Any engineering department in a modern research university is a science and engineering department. This is often indicated by the department's name: Material Science and Engineering, Nuclear Science and Engineering, or even Engineering Science (at Oxford University). Figure 2 describes the research activities in such a department. Faculty members perform basic use-inspired or applied research related to the applications of their discipline. The foundations guiding this research and constraining the engineering design space are natural sciences, mostly physics.a The practical goal of their research is to enable the production of better artifacts or better processes. The design of and experimentation with prototypes often is an essential step in the transfer of knowledge from research to practice, as they provide a proof of concept, a test and validation for theories, and a platform to experiment with design alternatives. I believe it is the richness of the feedback loops between research and practice and between basic research and applied research that best characterizes top engineering departments.

a Using the definition of engineering as “design under constraints.”28

Figure 2. Modern engineering research. (Diagram elements: foundations; use-inspired basic research; applied research; prototypes; artifacts, processes; products.)

The diagram in Figure 2 describes not only engineering departments, but also other use-oriented disciplines such as medicine or agriculture. Furthermore, concern about impact and use, and research in “Pasteur's Quadrant,” are increasingly prevalent in science departments, be it life sciences, social sciences, or physical sciences. Only a few purists would claim that departments are weakened by such concerns.

The diagram in Figure 2 clearly applies to C&I. Our discipline is use-inspired: We want to build better computing, communication, and information systems. This occasionally motivates use-inspired basic research (for example, complexity, cryptography), and often involves applied research (such as architecture, databases, graphics). The design of, and experimentation with, prototypes is essential in system research. C&I scientists use scientific methods in their research;8,10 and there is a continued back and forth between basic and applied research and between academic research and the development of digital products and services by industry.

C&I Needs Broader Foundations
C&I was lucky to develop early on mathematical abstractions that represented important constraints on computing devices, such as time and space complexity; this enabled C&I to develop useful artifacts while being fully contained within the confines of mathematics: The early development of algorithms, programming languages, compilers, or operating systems required no knowledge beyond C&I and its mathematical foundations.

Mathematics continues to be the most important foundation for C&I: The artifacts produced by C&I researchers and practitioners are algorithms, programs, protocols, and schemes for organizing information; these are mathematical or logical objects, not physical objects. Algorithms, programs, or protocols are useful once realized, executed, or embodied in a physical digital device; but they are mostly studied as mathematical objects and the properties studied do not depend on their physical embodiment. Indeed, one might call much of C&I “mathematical engineering”b: it is focused on the creation of new mathematical objects under constraints, such as low time and space complexity for discrete algorithms, good numerical convergence for numerical algorithms, or good precision and recall for classifiers; the difference between mathematics and “mathematical engineering” is precisely the emphasis on such constraints. As technology progresses, new constraints need to be considered. For example, time complexity is increasingly irrelevant when communication (to memory, disk, and network) replaces computation as the main performance bottleneck, and when energy consumption becomes the critical constraint. New technologies that will take us “Beyond Moore's Law” (quantum computing, molecular computing) will require new mathematical abstractions.

b Mathematical engineering was apparently used as a synonym for “computer science” in Holland, in the early days of the discipline. It is now used by some schools as a synonym for “scientific computing.”

Part of C&I, namely computer engineering, has always been concerned with the interplay between the mathematical abstractions and their physical embodiment. In addition to mathematics, physics is foundational for this specialty, and will continue to be so. Physics is also important for cyber-physical systems that directly interact with their physical environment.

I believe, however, that physical constraints are a small fraction of the constraints relevant to the design of C&I systems. For example, software engineering research has strived for decades to define code metrics that represent how complex a code is (hence, what effort is required to program or debug it), with limited success. Such a code metric would measure how difficult it is for a programmer to comprehend a code. But this is a cognitive issue: It is highly unlikely that one can develop successful theories on this subject without using empirically validated cognitive models that are based on our best understanding of human cognition. Unfortunately, traditional software engineering research has not been rooted in cognitive sciences.

Cognitive, cultural, social, organizational, and legal issues are increasingly important to engineering, in general.5 This is a fortiori true for C&I. In the early days of computing, only a few people interacted directly with computers: the psychology of programmers or users could be ignored without too much inconvenience, since these few people would adapt to the computer. Today, the situation is vastly changed: Billions of people interact daily with digital devices and C&I systems become intimately involved in many cognitive and social processes; it is not possible anymore to ignore the human in the loop. Indeed, interesting research increasingly occurs at the intersection of the social and the technical: One may well argue that the essential insight that enabled efficient Web search and led to the creation of companies such as Google is that the structure of the Web carries information about the usefulness of Web pages, a socio-technical insight. Progress in graphics and animation increasingly requires an understanding of human vision: otherwise, one makes progress in quality metrics that have low correlation to the subjective quality of an image; examples can be easily multiplied.

Another important aspect of the evolution of our field is the increasing importance of applications. Precisely because software is so malleable and universal, one can develop very specialized systems to handle the needs of various disciplines: computer-aided design, medical imaging, DNA matching, Web auctions; these are but a few examples of application areas that have motivated significant specialized C&I research. Such research cannot be successful without a good understanding of the application area.

This suggests a new view for the organization of C&I that is described in Figure 3: Mathematics is no longer the only foundation. For those working close to hardware or working on cyber-physical systems, a good foundation in physics continues to be important. An increasing number of C&I research areas (such as human-computer interaction, social computing, graphics and visualization, and information retrieval) require insights from the social sciences (cognitive psychology, sociology, anthropology, economics, law, and so forth); human subject experiments become increasingly important for such research. At a more fundamental level, the development of artificial cognitive systems provides a better understanding of natural cognitive systems (of the brain and its function); and paradigms borrowed from C&I become foundational in biology. Insights from neuroscience provide a better way of building artificial intelligent systems and biology may become the source of future computing devices. Finally, research in C&I is strongly affected by the multiple application areas where information technology is used (such as science, humanities, art, and business), and profoundly affects these areas.

Figure 3. C&I: An inclusive view. (Diagram elements: foundations (mathematics, statistics, social sciences, physical sciences…); use-inspired basic research; applied research; prototypes; systems, applications, data repositories; products; application areas (science, engineering, arts, humanities, business, medicine).)


Organizational Implications To clarify: A common core is not


Similar to a school of medicine, a col- about what every student in C&I must
lege of agriculture, or an engineering We can and know: most of the specific knowl-
department, I believe the correct orga- should develop an edge we teach will be obsolete long
nizational principle for a use-driven re- before our students reach retirement
search area such as C&I is not common environment where age. A common core is about C&I
foundations, but shared concerns about no scientist has an “education,”c not about C&I know-how.
the use of C&I systems. The view illus- It is about educating students in ways
trated in Figure 3 does not imply that incentive to withhold of thinking and problem solving that
each C&I researcher needs to be an ex- information. characterize our community and dif-
pert in all core sciences or application ferentiate us from other communities:
areas. Rather, it implies that C&I re- a system view of the world, a focus on
searchers with different foundational mathematical and computational rep-
knowledge and knowledge of different resentations of systems, information
application domains will often need representation and transformation,
to work together in order to design, Undergraduate Curriculum and so forth. The selection of courses
implement, and evaluate C&I systems I discussed in the previous section the for the core will not be based (only or
and provide students with the educa- increasing variety of C&I research. In mostly) on the usefulness of the facts
tion needed to do so. addition, there is a tremendous di- taught, but on the skills and concepts
The broad, integrated view of C&I versification of the professional ca- that are acquired by the students.
is reflected at the NSF in the name reers in IT. Less than half of students I believe such a common core is
of the Directorate for Computer and who graduated in computer science extremely important: It is, to a large
Information Science and Engineer- in 1992–1993 were employed in tradi- extent, what defines a discipline: You
ing (CISE). It is no surprise these days tional computer science professions can expect a student of physics to take
to find a linguist, anthropologist, or 10 years after graduation (compared a sequence of physics courses that start
economist in a research lab at Micro- for 57% in engineering and 69% in with mechanics and end with quantum
soft or Yahoo. Some U.S. universities health sciences).4 In many computer physics. This is not necessarily what
(including Carnegie-Mellon, Cornell, science departments, more than half those students will need in their fu-
Georgia Tech, Indiana, Michigan, of the students graduating with bach- ture careers; but those courses define
and the University of California at Ir- elor’s degrees are hired by companies the physics canon. If we take ourselves
vine) are establishing or expanding in finance, services, or manufactur- seriously as a discipline, we should be
schools or colleges that bring under ing, not by IT companies; this is where able to define the C&I canon. Like phys-
one roof computer science, informa- most of the growth in IT jobs is expect- ics, this core should be concise—say
tion science, applied informatics (C&I ed to be.12 The Bureau of Labor Statis- four courses: A common core does not
research that is application domain tics tracks a dozen different occupa- preclude variety and specialization in
specific) as well as interdisciplinary tions within computing12 (although its junior and senior years.
research and education programs. categories are somewhat obsolete). A
These universities are still a minority. recent Gartner report20 suggests the IT Eating Our Own Dog Food
The broad, inclusive model is com- profession will split into four distinct IT has a profound impact on the way
mon in Japan (University of Tokyo, professions: technology infrastruc- the information economy works. It can
Kyoto University, Tokyo Institute of ture and services, information design and should have a profound impact
Technology, Osaka University), and is and management, process design and on the operation of universities that
becoming more prevalent in the U.K. management, and relationship and are information enterprises par excel-
(Edinburgh, Manchester). sourcing management. lence. The C&I academic community
While organization models will dif- These trends imply an increasing can and should have a major role in
fer from university to university, it is diversification of C&I education. Cur- pioneering this change. We should be
essential that all C&I units on a cam- rently, ABET accredits three different ahead of the curve in using advanced IT
pus develop an integrative view of their types of computing programs; ACM in our professional life, and using it in
field, and jointly develop coordinated has developed recommendations for ways that can revolutionize our enter-
research and education programs. five curricula. Many schools experi- prise. I illustrate the possibilities with
This may require a change of attitude ment with more varied majors and a few examples here.
from all involved. Many cognitive and interdisciplinary programs—in par- William J. Baumol famously ob-
social aspects of system design are ticular, Georgia Tech.17 This evolution served that labor productivity of musi-
not amenable to quantitative stud- could lead to an increasing balkaniza- cians has not increased for centuries:
ies; however, the engineering culture tion of our discipline: It is fair to assert it still takes four musicians to play
is often suspicious of social sciences that we are still more concerned with a string quartet.2 This has become
and dismissive of qualitative sciences. differentiating the various programs
Conversely, the importance of proto- than defining their common content. c “Education is what remains after one has
types and artifacts is not always well In particular, should there be a core forgotten everything he learned in school”—
appreciated outside engineering. common to all programs in C&I? A. Einstein.


known as “Baumol’s cost disease”: Some sectors are labor intensive, require highly qualified personnel, and see no increases in labor productivity from improved technology. This is true for higher education: As long as a main measure of the quality of higher education is the student/faculty ratio, teaching productivity of faculty cannot increase; as long as faculty salaries keep up with inflation, the cost of higher education will keep up with inflation.d Such a situation will lead to the same pressures we see now in the health sector, and will force major changes. IT is, in many service sectors, the cure for Baumol’s cost disease;27 can it be in higher education?

IT often cures Baumol’s cost disease not by increasing labor productivity, but by enabling a cheaper replacement service. It still takes four musicians to play a string quartet, but digital recording enables us to enjoy the music where and when we want to hear it. ATMs replace bank tellers; Internet shopping replaces sales clerks. The convenience of getting a service where and when we want it, and the lower cost of self-service, compensate for the loss of personal touch. To many of our students, the idea that one must attend a lecture at a particular place and time in order to obtain a piece of information chosen by the lecturer is as antiquated as pre-Web shopping. Increasingly, students will want to obtain the information they need when and where they want it. An increasing shift to “self-service” education that is “student pull” based, rather than “lecturer push” based, may well be the cure to Baumol’s cost disease in higher education, as well as the cure to the depressing passivity of many students.

“Self-service” education need not imply a lack of social interaction. The study of Richard J. Light, at Harvard, indicated that participation in a small student study group is a stronger determinant of success in a course than
d Note, however, that the recent fast rise in the
cost of higher education in the U.S. is not due
to increases in faculty salaries. According to
the AAUP, faculty salaries have risen in real
terms by 7% in the last three decades (http://
www.aaup.org/AAUP/GR/CapHill/2008/rising-
costs.htm); state support to public universi-
ties has shrunk by more than one-third during
this period.13


the teaching style of the instructor.20 Rather than focusing on the use of IT to improve the lecture experience, we should probably focus on the use of IT and social networking tools to make individual and group self-study more productive by multiplying the interaction channels between students and between students and faculty.15

As the half-life of knowledge grows shorter, it becomes less important to impart specific knowledge to students (and to test them on this knowledge) and more important to teach them how to learn, how to identify and leverage sources of knowledge and expertise, and how to collaborate with experts in other areas, creating collective knowledge. Yet our education is still strongly focused on acquiring domain-specific individual knowledge; and students mostly collaborate with other students who have similar expertise. Projects and practicums that involve teams of students from different programs, with different backgrounds, could refocus education so as to train more foxes and fewer hedgehogse—a change I believe will benefit many of our students. Such collaborative learning-by-doing empowers students, increases motivation, improves retention, and teaches skills that are essential for success in the information society. A skillful use of IT, both for supporting course activities and for assessing teaching and learning, can facilitate this education style.f

IT changes the way research is pursued: For example, it enables citizen science projects where many volunteers collect data. Such projects have become prevalent in environmental sciences24 and are likely to have a large impact on health sciences. They not only provide researchers with data that cannot be obtained otherwise but also change in fundamental ways the relation of the scientist to the object of study. The volunteers are unlikely to be motivated by pure scientific curiosity; they want the research they participate in to have an impact—save the environment or cure cancer. The researcher who uses their data has an implicit or explicit obligation to use the data collected for that common purpose and not use it for other purposes. Research becomes engaged and obligated to a large community.17

IT enables the fast dissemination of scientific observations and results. Research progresses faster if observational data and preliminary results are shared as quickly and as broadly as possible. One obstacle to such unimpeded sharing is that academic careers are fostered by the publication of polished analyses, not by the publication of raw data or partial results: Research groups tend to hold on to their data until they can analyze it and obtain conclusive results. Better ways of tracking the provenance of data used by researchers and the web of mutual influences among researchers would make it possible to track the impact of contributions other than polished publications and to develop a merit system that encourages more information sharing. We can and should develop an environment where no scientist has an incentive to withhold information.

C&I has been, for years, an amazingly vibrant, continuously renewing intellectual pursuit that has had a profound impact on our society. It has succeeded in being so by continuously pursuing new uses of IT and continuously adjusting disciplinary focus in research and education so as to address the new problems. This fast evolution must continue for our discipline to stay vital. IT will continue to be a powerful agent of change in our society and, to drive this change, we must continuously change and strive to change our academic environment.

e “The fox knows many things, but the hedgehog knows one big thing.”3 Sir Isaiah Berlin distinguishes between hedgehogs—thinkers “who relate everything to a single central vision,” and foxes—thinkers who “pursue many ends, often unrelated and even contradictory, connected only in some de facto way.” Although the essay of Isaiah Berlin focuses on Russian writers, I see “foxiness” as being very much the tradition of American Pragmatism. Both types are needed in our society, but “hedgehogs” who prize the hedgehog way of thinking seem to dominate in academia, especially in science and engineering.

f The recently started International Journal on Computer-Supported Collaborative Learning provides several useful references.

References
1. Atkinson, R.D. and McKay, A.S. Digital Prosperity: Understanding the Economic Benefits of the Information Technology Revolution. Information Technology and Innovation Foundation, 2007.
2. Baumol, W.J. and Bowen, W.G. Performing Arts: The Economic Dilemma, 1966.
3. Berlin, I. The Hedgehog and the Fox. Simon & Schuster, New York, 1953.
4. Choy, S.P., Bradburn, E.M., and Carroll, C.D. Ten Years After College: Comparing the Employment Experiences of 1992–93 Bachelor’s Degree Recipients With Academic and Career-Oriented Majors, National Center for Education Statistics, Institute of Education Sciences, U.S. Department of Education, 2008.
5. Clough, G.W. The Engineer of 2020: Visions of Engineering in the New Century. National Academy of Engineering Press, Washington, D.C. (2004).
6. Cohen, W.M. and Merrill, S.A. Patents in the Knowledge-based Economy. National Academies Press, 2003.
7. Corrado, C.A., Sichel, D.E., and Hulten, C.R. Intangible Capital and Economic Growth. Board of Governors of the Federal Reserve System City, 2006.
8. Denning, P.J. Computer science: The discipline. Encyclopedia of Computer Science (2000).
9. Denning, P.J. Is computer science science? Commun. ACM 48, 4 (Apr. 2005), 27–31.
10. Denning, P.J. et al. Computing as a discipline. Commun. ACM 32, 1 (Jan. 1989).
11. Denning, P.J. and Freeman, P.A. Computing’s paradigm. Commun. ACM 52, 12 (Dec. 2009), 28–30.
12. Dohm, A. and Shniper, L. Occupational employment projections to 2016. Monthly Labor Review Online 130, 11 (Nov. 2007).
13. Ehrenberg, R. et al. Financial forces and the future of American higher education. Academe 90, 4 (Apr. 2004), 28–31.
14. Furst, M. and DeMillo, R.A. Creating symphonic-thinking computer science graduates for an increasingly competitive global environment. White Paper, College of Computing, Georgia Institute of Technology (2004).
15. Haythornthwaite, C. et al. New theories and models of and for online learning. First Monday 12, 8 (August 2007).
16. Jorgenson, D.W., Ho, M.S. and Stiroh, K.J. Information Technology and the American Growth Resurgence. MIT Press, Cambridge, Mass., 2005.
17. Krasny, M.E. and Bonney, R. Environmental education through citizen science and participatory action research. Environmental Education and Advocacy: Changing Perspectives of Ecology and Education (2005), 292–319.
18. Light, R.J. Making the Most of the College: Students Speak their Minds. Harvard University Press, 2004.
19. Marrano, M.G., Haskel, J.E. and Wallis, G. What Happened to the Knowledge Economy? ICT, Intangible Investment and Britain’s Productivity Record Revisited. Department of Economics, Queen Mary, University of London, 2007.
20. Morello, D. The IT Professional Outlook: Where Will We Go From Here? Gartner, 2005.
21. National Science Foundation—Directorate Computing and Information Science and Engineering. CISE Pathways to Revitalized Undergraduate Computing Education (CPATH). NSF, Arlington, VA.
22. National Science Foundation—Directorate Computing and Information Science and Engineering. Cyber-Enabled Discovery and Innovation (CDI). NSF, Arlington, VA.
23. Sell, S.K. Private Power, Public Law: The Globalization of Intellectual Property Rights. Cambridge University Press, 2003.
24. Silvertown, J. A new dawn for citizen science. Trends in Ecology and Evolution 24, 9 (Sept. 2009).
25. Stokes, D.E. Pasteur’s Quadrant. Brookings Institution Press, 1997.
26. Thelwall, M. Can Google’s PageRank be used to find the most important academic Web pages? Journal of Documentation 59, 2 (Feb. 2003), 205–217.
27. Triplett, J.E. and Bosworth, B.P. ‘Baumol’s disease’ has been cured: IT and multifactor productivity in U.S. services industries. Edward Elgar Publishing, City, 2006.
28. Wulf, W.A. The Urgency of Engineering Education Reform. The Bridge 28, 1 (Jan. 1998), 48.

Marc Snir (snir@illinois.edu) is Michael Faiman and Saburo Muroga Professor in the Department of Computer Science at the University of Illinois at Urbana-Champaign.

I thank Martha Pollack for her careful reading of an early version of this Viewpoint and for her many suggestions. Some of the ideas presented here were inspired by a talk by John King. This Viewpoint greatly benefited from the detailed feedback of one of the referees.

Copyright held by author.

practice
doi:10.1145/1897852.1897868
Article development led by queue.acm.org

Models of determinism are changing IT management.

by Mark Burgess

Testable System Administration

Illustration by Stuart Bradford

The methods of system administration have changed little in the past 20 years. While core IT technologies have improved in a multitude of ways, for many if not most organizations system administration is still based on production-line build logistics (aka provisioning) and reactive incident handling—an industrial-age method using brute-force mechanization to amplify a manual process. As we progress into an information age, humans will need to work less like the machines they use and embrace knowledge-based approaches. That means exploiting simple (hands-free) automation that leaves us unencumbered to discover patterns and make decisions. This goal is reachable if IT itself opens up to a core challenge of automation that is long overdue—namely, how to abandon the myth of determinism and expect the unexpected.

We don’t have to scratch the surface very hard to find cracks in the belief system of deterministic management. Experienced system practitioners know deep down that they cannot think of system administration as a simple process of reversible transactions to be administered by hand; yet it is easy to see how the belief stems from classical teachings. At least half of computer science stems from the culture of discrete modeling, which deals with absolutes as in database theory, where idealized conditions can still be simulated to an excellent approximation. By contrast, the stochastic models that originate from physics and engineering, such as queueing and error correction, are often considered too difficult for most basic CS courses. The result is that system designers and maintainers are ill prepared for the reality of the Unexpected Event. To put it quaintly, “systems” are raised in laboratory captivity under ideal conditions, and released into a wild of diverse and challenging circumstances. Today, system administration still assumes, for the most part, that the world is simple and deterministic, but that could not be further from the truth.

In the mid-1990s, several research practitioners, myself included, argued for a different model of system administration, embracing automation for consistency of implementation and using policy to describe an ideal state. The central pillar of this approach was stability.2,4 We proposed that by placing stability center stage, one would achieve better reliability (or at the very least predictability). A tool such as IT is, after all, useful only if it leads to consistently predictable outcomes. This is an evolutionary approach to management: only that which survives can be successful.

As a physicist by training, I was surprised by the lack of a viable model for explaining actual computer behavior. It seemed that, instead of treating behavior as an empirical phenomenon full of inherent uncertainties, there was an implicit expectation that computers would behave as programmed. Everyone knows this to be simplistic; yet still, a system administrator would worry about behavior only when an incident reported something to the contrary.

From Demolition to Maintenance
When a problem occurs, many organizations take affected systems out of service, wipe them, and restore them from backup or reinstall from scratch. This is the only way they know to assure the state of the system because they know no simple way of discovering what changed without an arduous manual investigation. The process is crude, like tearing down a building to change a lightbulb. But the reason is understandable. Current tools are geared for building, not repairing—and if you’ve only got a sledgehammer...then you rebuild.

There is growing acceptance of a test-driven or diagnostic approach to the problem. This was originally ushered in by Cfengine,5 and then partially adopted in other software such as Puppet.11 In a test-driven approach, system state is regulated by continual reappraisal at a microscopic level, like having a groundskeeper watch continuously over an estate, plucking the weeds or applying a lick of paint where needed. Such an approach required the conceptual leap to a computable notion of maintenance. Maintenance can be defined by referring to a policy or model for an ideal system state. If such a model could somehow be described in terms of predictable, actionable repairs, in spite of environmental indeterminism, then automating maintenance would become a simple reality. This, in essence, is how Cfengine changed the landscape of IT management.

The term compliance is often used today for correctness of state with respect to a model. If a system deviates from its model, then with proper automation it self-repairs,2,4 somewhat like an autopilot that brings systems back on course. What is interesting is that, when you can repair system state (both static configuration and runtime state), then the initial condition of the system becomes unimportant, and you may focus entirely on the desired outcome. This is the way businesses want to think about IT—in
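As a rough illustration of maintenance defined relative to a model, the following Python sketch compares an observed state with a policy and repairs only what deviates. The setting names, the `MODEL` dictionary, and the `maintain` function are hypothetical and chosen for illustration; this is not how Cfengine is implemented, and a real agent would act on the system itself rather than on a dictionary.

```python
from typing import Dict, List

# Hypothetical policy: the ideal state of a few settings (illustrative names).
MODEL: Dict[str, str] = {
    "sshd": "running",
    "/etc/motd mode": "0644",
    "ntp": "running",
}


def deviations(actual: Dict[str, str], model: Dict[str, str]) -> List[str]:
    """Return the model keys whose actual value is missing or differs."""
    return [key for key, want in model.items() if actual.get(key) != want]


def maintain(actual: Dict[str, str], model: Dict[str, str]) -> List[str]:
    """Repair only what deviates from the model; leave everything else alone."""
    repaired = deviations(actual, model)
    for key in repaired:
        actual[key] = model[key]
    return repaired


# Two machines with different histories (assumed starting states).
freshly_built = {"sshd": "running", "/etc/motd mode": "0644", "ntp": "running"}
drifted = {"sshd": "stopped", "/etc/motd mode": "0777", "local tweak": "keep me"}

print(maintain(freshly_built, MODEL))  # nothing deviates, so nothing is repaired
print(maintain(drifted, MODEL))        # the three deviating items are repaired
print(maintain(drifted, MODEL))        # a second pass finds nothing left to do
print(drifted["local tweak"])          # settings outside the model are untouched
```

Both machines end up satisfying the same model whatever their starting point, which is the sense in which the initial condition stops mattering.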


terms of goals rather than “building projects”—thus also bringing us closer to a modern IT industry.

Convergence to a Desired State
Setting up a reference model for repair sounds like a simple matter, but it requires a language with the right properties. Common languages used in software engineering are not well suited for the task, as they describe sequential steps from a fixed beginning rather than end goals. Generally, we don’t know the starting state of a machine when it fails. Moreover, a lot of redundant computation is required to track a model, and that would intrude on clarity. The way around this has been to construct declarative DSLs (domain-specific languages) that hide the details and offer predictable semantics. Although Cfengine was the first attempt to handle indeterminism, special languages had been proposed even earlier.9

Many software engineers are not convinced by the declarative DSL argument: they want to use the familiar tools and methods of traditional programming. For a mathematician or even a carpet fitter, however, this makes perfect sense. If you are trying to fit a solution to a known edge state, it is cumbersome to start at the opposite end with a list of directions that assume the world is fixed. When you program a GPS, for example, you enter the desired destination, not the start of the journey, because you often need to recompute the path when the unexpected occurs, such as a closed road. This GPS approach was taken by Cfengine5 in the mid-1990s. It says: work relative to the desired end-state of your model, not an initial baseline configuration, because the smallest unexpected change breaks a recipe based on an initial state. This has been likened to Prolog.7

In simple terms, the approach works by making every change satisfy a simple algorithm:

    Change (arbitrary_state) → desired_state    (1)
    Change (desired_state) → desired_state      (2)

This construction is an expression of “dumb” stability, because if you perturb the desired state into some arbitrary state, it just gets pushed back into the desired state again, like an automated course correction. It represents a system that will recover from accidental or incidental error, just by repeating a dumb mantra—without the need for intelligent reasoning.

For example: suppose you want to reconfigure a Web server to support PHP and close a security hole. The server and all of its files are typically part of a software package and are configured by a complex file with many settings:

    # >10kB of complex stuff
    MODULES = SECURITY_HOLE JAVA OTHERS
    # >10kB of complex stuff

To fix both problems, it is sufficient to alter only this list (for example, a desired outcome):

    # >10kB of complex stuff
    MODULES = JAVA OTHERS PHP
    # >10kB of complex stuff

Traditionally, one replaces the whole file with a hand-managed template or even reinstalls a new package, forcing the end user to handle everything from the ground up. Using a desired state approach, we can simply say: in the context of file webserver.config, make sure that any line matching “MODULES = something” is such that “something” contains “PHP” and does not contain “SECURITY_HOLE.” Figure 1 illustrates how this might look in Cfengine.

Thus, the code defines two internal list variables for convenience and passes these to the specially defined method edit_listvar, which is constructed from convergent primitives. For each item in the list, Cfengine will assure the presence or absence of the listed atoms without touching anything else. With this approach, you don’t need to reconstruct the whole Web server or know anything about how it is otherwise configured (for example, what is in “complex stuff”) or even who is managing it: a desired end-state relative to an unknown start-state has been specified. It is a highly compressed form of information.

Figure 1. Reconfiguring a Web server in Cfengine.

    bundle agent webserver_config
    {
    vars:
      "add_modules" slist => { "PHP", "php5" };
      "del_modules" slist => { "SECURITY_HOLE", "otherstuff" };

    column_edits:
      "APACHE_MODULES=.*"
        edit_column => edit_listvar("$(add_modules)","append");
      "APACHE_MODULES=.*"
        edit_column => edit_listvar("$(del_modules)","delete");
    }

    [Note: The syntax (which incorporates implicit guards and iteration) has the form:

    type_of_promise:
      "Atom"
        property_type => desired_end_state;
    ]

I referred to this approach as convergent maintenance (also likening the behavior to a human immune system2), as all changes converge on a destination or healthy state for the system in the frame of reference of the policy. Later, several authors adopted the mathematical term idempotence (meaning invariance under repetition), focusing on the fact that you can apply these rules any number of times and the system will only get better.

Guarded Policy
In the most simplistic terms, this approach amounts to something like Dijkstra’s scheme of guarded commands.8 Indeed, Cfengine’s language implementation has as much in common with Guarded Command Language as it does with Prolog.7 The assertion of X as a statement may be interpreted as:

    If not model(X), set model(X)

For example:

    "/etc/passwd" create => "true";

Repeating an infinite number of times does not change the outcome. With hindsight, this seems like a trivial observation, and hardly a revolutionary technology, yet it is the simplest of insights that are often the hardest won. The implication of this statement is that X is not just what you want, but a model for what should be. The separation of the intended from the actual is the essence of the relativity.

There is one more piece to the puzzle: Knowing the desired state is not enough; we also have to know that it is achievable. We must add reachability of the desired state to the semantics.

Getting Stuck in Local Minima
It is well known from artificial intelligence and its modern applications that algorithms can get stuck while searching parameter landscapes for the optimum state. When you believe yourself at the bottom of the valley, how do you know there is not a lower valley just over the rise? To avoid the presence of false or local minima, you have to ensure that each independent dimension of the search space has only a single minimum, free of obstacles. Then there are two things at work: independence and convergence. Independence can be described by many names: atomicity, autonomy, orthogonality, and so on. The essence of them all is that the fundamental objects in a system should have no dependencies.

What we are talking about is a theory of policy atoms in an attribute space. If you choose vectors carefully (such as file permissions, file contents, and processes) so that each change can be made without affecting another, no ordering of operations is required to reach a desired end-state, and there can be only one minimum. Indeed, order-independence can be proven with periodic maintenance as long as the operators form irreducible groups.3,6

The discovery of such a simple solution suggests a panacea, ushering in a new and perfect world. Alas, the approach can be applied only partially to actual systems because no actual systems are built using these pure constructions. Usually, multiple change mechanisms tether such atoms together in unforeseeable ways (for example, packages that bundle up software and prevent access to details). The approximation has worked remarkably well in many cases, however, as evidenced by the millions of computers running this software today in the most exacting environments. Why? Most likely because a language that embodies such principles encourages administrators to think in these terms and keep to sound practices.

Tangled by Dependency: The Downside of Packaging
The counterpoint to this free atomization of system parts is what software designers are increasingly doing today: bundling atoms and changes together into packages. In modern systems packaging is a response to the complexity of the software management process. By packaging data to solve one management problem, however, we lose the resolution needed to customize what goes on inside the packages and replace it with another. Where a high degree of customization is needed, unpacking a standard “package update” is like exploding a smart bomb in a managed environment—wiping out customization—and going back to the demolition school of management.

We don’t know whether any operating system can be fully managed with convergent operations alone, nor whether it would even be a desirable goal. Any such system must be able to address the need of surgically precise customization to adapt to the environment. The truly massive data centers of today (Google and Facebook) are quite monolithic and often less complex than the most challenging environments. Institutions such as banks or the military are more representative, with growth and acquisition cultures driving diverse challenges to scale. What is known is that no present-day operating system makes this a completely workable proposition. At best one can approximate a subset of management operations, but even
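The convergence rules (1) and (2), the MODULES example, and the claim about order-independence can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Cfengine's implementation: `ensure_items`, `converge`, and the three toy operators are hypothetical names, and the configuration file is modeled as a plain string.

```python
import itertools
import re
from typing import Dict, List, Set, Tuple


def ensure_items(line: str, present: List[str], absent: List[str]) -> str:
    """Guarded edit of one 'KEY = a b c' line: change only what deviates."""
    key, _, value = line.partition("=")
    items = [item for item in value.split() if item not in absent]
    for item in present:
        if item not in items:  # "if not model(X), set model(X)"
            items.append(item)
    return f"{key.strip()} = {' '.join(items)}"


def converge(text: str) -> str:
    """Apply the policy to MODULES lines; every other line passes through."""
    out = []
    for line in text.splitlines():
        if re.match(r"\s*MODULES\s*=", line):
            line = ensure_items(line, present=["PHP"], absent=["SECURITY_HOLE"])
        out.append(line)
    return "\n".join(out)


before = "# complex stuff\nMODULES = SECURITY_HOLE JAVA OTHERS\n# complex stuff"
after = converge(before)
print(after.splitlines()[1])     # MODULES = JAVA OTHERS PHP
print(converge(after) == after)  # rule (2): the desired state is a fixed point


# Order-independence: operators that touch separate attributes commute.
def set_mode(state: Dict[str, str]) -> Dict[str, str]:
    return {**state, "mode": "0644"}


def set_owner(state: Dict[str, str]) -> Dict[str, str]:
    return {**state, "owner": "root"}


def start_service(state: Dict[str, str]) -> Dict[str, str]:
    return {**state, "service": "running"}


OPERATORS = [set_mode, set_owner, start_service]


def final_states(start: Dict[str, str]) -> Set[Tuple[Tuple[str, str], ...]]:
    """Collect the end states reached under every ordering of the operators."""
    outcomes = set()
    for order in itertools.permutations(OPERATORS):
        state = dict(start)
        for operator in order:
            state = operator(state)
        outcomes.add(tuple(sorted(state.items())))
    return outcomes


print(len(final_states({"mode": "0777", "service": "stopped"})))  # one end state
```

The sketch only covers the independent case. If two operators wrote to the same attribute, different orderings could end in different states, which is the kind of coupling the article attributes to packaging.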


this leads to huge improvements in scalability and consistency of process—by allowing humans to be taken out of the process.

From Demanding Compliance to Offering Capability
What is the future of this test-driven approach to management? To understand the challenges, you need to be aware of a second culture that pervades computer science: the assumption of management by obligation. Obligations are modal statements: for example, X must comply with Y, A should do B, C is allowed to do D, and so on. The assumption is that you can force a system to bow down to a decision made externally. This viewpoint has been the backbone of policy-based systems for years,12 and it suffers from a number of fundamental flaws.

The first flaw is that one cannot generally exert a mandatory influence on another part of a software or hardware system without its willing consent. Lack of authority, lack of proximity, lack of knowledge, and straightforward impossibility are all reasons why this is impractical. For example, if a computer is switched off, you cannot force it to install a new version of software. Thus, a model of maintenance based on obligation is, at best, optimistic and, at worst, futile. The second flaw is that obligations lead to contradictions in networks that cannot be resolved. Two different parties can insist that a third will obey quite different rules, without even being aware of one another.1

Realizing these weaknesses has led to a rethink of obligations, turning them around completely into an atomic theory of “voluntary cooperation,” or promise theory.1 After all, if an obligation requires a willing consent to implement it, then voluntary cooperation is the more fundamental point of view. It turns out that a model of promises provides exactly the kind of umbrella under which all of the aspects of system administration can be modeled. The result is an agent-based approach: each system part should keep its own promises as far as possible without external help, expecting as little as possible of its unpredictable environment.

Independence of parts is represented by agents that keep their own promises; the convergence to a standard is represented by the extent to which a promise is kept; and the insensitivity to initial conditions is taken care of by the fact that promises describe outcomes, not initial states.

Promise theory turns out to be a rather wide-ranging description of cooperative model building that thinks bottom-up instead of top-down. It can be applied to humans and machines in equal measure and can also describe human workflows—a simple recipe for federated management. It has not yet gained widespread acceptance, but its principal findings are now being used to restructure some of the largest organizations in banking and manufacturing, allowing them to model complexity in terms of robust intended states. Today, only Cfengine is intentionally based on promise theory principles, but some aspects of Chef’s decentralization10 are compatible with it.

The Limits of Knowledge
There are subtler issues lurking in system measurement that we’ve only glossed over so far. These will likely challenge both researchers and practitioners in the years ahead. To verify a model, you need to measure a system and check its compliance with the model. Your assessment of the state of the system (does it keep its promises?) requires trust in the measurement process itself to form a conclusion. That one dependence is inescapable.

What happens when you test a system’s compliance with a model? It turns out that every intermediate part in a chain of measurement potentially distorts the information you want to observe, leading to less and less certainty. Uncertainty lies at the very heart of observability. If you want to govern systems by pretending to know them absolutely, you will be disappointed.

Consider this: environmental influences on systems and measurers can lead directly to illogical behavior, such as undecidable propositions. Suppose you have an assertion (for example, a promise that a system property is true). In logic this assertion must either be true or false, but consider these cases:
• You do not observe the system (so you don’t know);
• Observation of the system requires interacting with it, which changes its state;

48 communications of the acm | march 2011 | vol. 54 | no. 3
practice

• You do not trust the measuring device completely; or
• There is a dependence on something that prevents the measurement from being made.

If you believe in classic first-order logic, any assertion must be either true or false, but in an indeterminate world following any of these cases, you simply do not know, because there is insufficient information from which to choose either true or false. The system has only two states, but you cannot know which of them is the case. Moreover, suppose you measure at some time t; how much time must elapse before you can no longer be certain of the state?

This situation has been seen before in, of all places, quantum mechanics. Like Schrödinger’s cat, you cannot know which of the two possibilities (dead or alive) is the case without an active measurement. All you can know is the outcome of each measurement reported by a probe, after the fact. The lesson of physics, on the other hand, is that one can actually make excellent progress without complete knowledge of a system, by using guiding principles that do not depend on the uncertain details.

Back to Stability?
A system might not be fully knowable, but it can still be self-consistent. An obvious example that occurs repeatedly in nature and engineering is that of equilibrium. Regardless of whether you know the details underlying a complex system, you can know its stable states because they persist. A persistent state is an appropriate policy for tools such as computers: if tools are changing too fast, they become useless. It is better to have a solid tool that is almost what you would like, rather than the exact thing you want that falls apart after a single use (what you want and what you need are not necessarily the same thing). Similarly, if system administrators cannot have what they want, they can at least choose from the best we can do.

Systems can be stable, either because they are unchanging or because many lesser changes balance out over time (maintenance). There are countless examples of very practical tools that are based on this idea: Lagrange points (optimization), Nash equilibrium (game theory), the Perron-Frobenius theorem (graph theory), and the list goes on. If this sounds like mere academic nonsense, then consider how much of this nonsense is in our daily lives through technologies such as Google PageRank or the Web of Trust that rely on this same idea.

Note, however, that the robustness advocated in this article, using the principle of atomization and independence of parts, is in flat contradiction with modern programming lore. We are actively encouraged to make hierarchies of dependent, specialized objects for reusability. In doing so we are bound to build fragilities and limitations implicitly into them. There was a time when hierarchical organization was accepted wisdom, but today it is becoming clear that hierarchies are fragile and unmanageable structures, with many points of failure. The alternative of sets of atoms promising to stabilize patches of the configuration space is tantamount to heresy. Nevertheless, sets are a more fundamental construction than graphs.

For many system administrators, these intellectual ruminations are no more pertinent than the moon landings were to the users of Teflon pans. They do not see themselves in these issues, which is why researchers, not merely developers, need to investigate them. Ultimately, I believe there is still great progress to be made in system administration using these approaches. The future of system administration lies more in a better understanding of what we already have to work with than in trying to oversimplify necessary complexity with industrial force.

Conclusion
It is curious that embracing uncertainty should allow you to understand something more fully, but the simple truth is that working around what you don’t know is both an effective and low-cost strategy for deciding what you actually can do.

Major challenges of scale and complexity haunt the industry today. We now know that scalability is about not only increasing throughput but also being able to comprehend the system as it grows. Without a model, the risk of not knowing the course you are following can easily grow out of control. Ultimately, managing the sum knowledge about a system is the fundamental challenge: the test-driven approach is about better knowledge management, knowing what you can and cannot know.

Whether system administration is management or engineering is an oft-discussed topic. Certainly without some form of engineering, management becomes a haphazard affair. We still raise computers in captivity and then release them into the wild, but there is now hope for survival. Desired states, the continual application of “dumb” rule-based maintenance, and testing relative to a model are the keys to quantifiable knowledge.

Related articles on queue.acm.org

A Plea to Software Vendors from Sysadmins - 10 Do’s and Don’ts
Thomas A. Limoncelli
http://queue.acm.org/detail.cfm?id=1921361

Self-Healing in Modern Operating Systems
Michael W. Shapiro
http://queue.acm.org/detail.cfm?id=1039537

A Conversation with Peter Tippett and Steven Hofmeyr
January 10, 2009
http://queue.acm.org/detail.cfm?id=1071725

References
1. Burgess, M. An approach to understanding policy based on autonomy and voluntary cooperation. Submitted to IFIP/IEEE 16th International Workshop on Distributed Systems Operations and Management (2005).
2. Burgess, M. Computer immunology. In Proceedings of the 12th System Administration Conference, 1998.
3. Burgess, M. Configurable immunity for evolving human-computer systems. Science of Computer Programming 51, 3 (2004), 197–213.
4. Burgess, M. On the theory of system administration. Science of Computer Programming 49 (2003), 1–46.
5. Cfengine; http://www.cfengine.org.
6. Couch, A., Daniels, N. The maelstrom: network service debugging via ‘ineffective procedures.’ Proceedings of the 15th Systems Administration Conference (2001), 63.
7. Couch, A., Gilfix, M. It’s elementary, dear Watson: Applying logic programming to convergent system management processes. In Proceedings of the 13th Systems Administration Conference (1999), 123.
8. Dijkstra, E. http://en.wikipedia.org/wiki/Guarded_Command_Language.
9. Hagemark, B., Zadeck, K. Site: A language and system for configuring many computers as one computer site. Proceedings of the Workshop on Large Installation Systems Administration III (1989); http://www2.parc.com/csl/members/jthornton/Thesis.pdf.
10. Opscode; http://www.opscode.com/chef.
11. Puppet Labs; http://www.puppetlabs.com/.
12. Sloman, M. S., Moffet, J. Policy hierarchies for distributed systems management. Journal of Network and System Management 11, 9 (1993), 404.

Mark Burgess is a professor of network and system administration, the first with this title, at Oslo University College. His current research interests include the behavior of computers as dynamic systems and applying ideas from physics to describe computer behavior. He is the author of Cfengine and is the founder, chairman, and CTO of Cfengine, Oslo, Norway.

© 2011 ACM 0001-0782/11/0300 $10.00

practice
doi:10.1145/1897852.1897869

Article development led by queue.acm.org

National Internet Defense: Small States on the Skirmish Line

Attacks in Estonia and Georgia highlight key vulnerabilities in national Internet infrastructure.

by Ross Stapleton-Gray and William Woodcock

Despite the global and borderless nature of the Internet’s underlying protocols and driving philosophy, there are significant ways in which it remains substantively territorial. Nations have policies and laws that govern and attempt to defend “their Internet”: the portions of the global network that they deem to most directly impact their commerce, their citizens’ communication, and their national means to project social, political, and commercial activity and influence. This is far less palpable than a nation’s physical territory or even than “its air” or “its water”; one could, for example, establish by treaty how much pollution Mexican and American factories might contribute to the atmosphere along their shared border, and establish metrics and targets fairly objectively. Cyberspace is still a much wilder frontier, difficult to define and measure. Where its effects are noted and measurable, all too often they are hard to attribute to responsible parties.

Nonetheless, nation-states are taking steps to defend that space, and some have allegedly taken steps to attack that of others. Two recent events illustrate the potential vulnerabilities faced by small nation-states and suggest steps that others may take to mitigate those vulnerabilities and establish a more robust and defensible Internet presence. The first was an attack on Estonian Internet infrastructure and Web sites in May and June 2007. The second was a cyber attack against the Georgian infrastructure that accompanied the Russian incursion into South Ossetia in August 2008.

Illustration by Alex Williamson

Estonia
Tensions had been building in Estonia in the spring of 2007 over the country’s plans to relocate the Bronze Soldier, a Soviet war memorial, and the capital, Tallinn, experienced several nights of rioting. The subsequent cyber attacks are believed to be a consequence of the memorial’s relocation.

An attack on Estonian Internet infrastructure and Web sites began at 11 p.m. local time, midnight Moscow time, Tuesday, May 8. The attack was effectively mitigated by 7 a.m. the following day but continued to be visible in traffic logs for exactly 30 days thereafter. That time period, together with the fact that the attacking botnets’ signature was identical to that used in prior Russian Business Network spam-sending campaigns, suggests that this was a one-month attack for hire (or was intended to look like one). Unfortunately, such attacks, either threatened or launched for commercial extortion, have become commonplace. Based on offers visible on the black market at the time, the attack likely cost between $200 and $2,000 to hire. Like many politically motivated attacks, it combined a distributed denial-of-service (DDoS) attack against Internet infrastructure with DDoS and attempted defacement attacks against the Web sites of Estonian banks, media outlets, and government.

The Estonian defense was notably


successful, and there are a number of lessons to be taken from it by other countries wishing to avoid a cyberwarfare defeat. The simplest summary of the dynamics of a DDoS-based cyber attack is as a numbers game. An attacker with greater network capacity than the defender will be able to overwhelm the defender’s network, while retaining sufficient capacity to support its own needs at the same time. Such an attack would be deemed successful. An attacker with less bandwidth than the defender would exhaust itself in consuming the defender’s capacity, while the defender might well retain enough excess capacity that its population would not be significantly inconvenienced; such an attack would be considered unsuccessful.

Viewed in closer detail, there are different kinds of network capacity and different mechanisms for improving and defending each. They can be placed in four categories: local or internal capacity; external connectivity; name resolution capability; and defensive coordination.

Local capacity, or bandwidth, is most familiar as one’s initial connection to the Internet. This local loop, or last mile, is the copper wire or fiber line in the ground or on poles, or the wireless link, that carries signals from the customer to an ISP (Internet service provider). A robust local-loop infrastructure consists of buried fiber-optic cable interconnecting each business or residence with multiple ISPs over different physical paths. Ideally, these service providers ought to be in competition so they cannot be collectively suborned or sabotaged, and so their prices are low enough that people can actually choose fluidly among them. A sparsely supplied market for local connectivity can create bottlenecks and make attractive targets. In Estonia’s case, multiple independent fiber infrastructure operators existed, and many different ISPs built a healthy, competitive marketplace on top of that. More (and more diverse) domestic fiber is always better, but Estonia’s was more than sufficient.

External connectivity. More important to defensibility is the ecosystem for the providers’ own connectivity within that domestic context. The modern means to create an effective mesh of providers is via Internet exchange points, commonly abbreviated IXP. The world has about 330 IXPs at the moment, and that number has been steadily increasing. Each IXP has a specific physical location and connects a community of ISPs that meet as peers at the exchange. Some countries, such as the U.S., have many IXPs. Others, such as the Netherlands and Germany, have very large IXPs. Many smaller countries have exactly one exchange, located in the capital city. But the greatest number of countries, typically the smallest ones, has no IXP at all. This means that they are heavily dependent for their domestic connectivity upon international data circuits. Imagine a situation in which there were no local telephone calls, only calls overseas; to reach someone next door, you would have to make a call that went overseas and then back again, at twice the cost.

This is the situation in most less-developed countries, as a result of misunderstanding Internet economics and topology. Countries in this situation are extremely vulnerable to having those external lines of communications cut or overburdened, since that causes not only international but also domestic communications to fail, and thus the ability to coordinate a defense fails as well. A strong domestic Internet exchange point is the first and most critical component of a cyberwarfare defense. A redundant pair of IXPs, or one in each major city, is the desirable goal. A redundant pair of IXPs in Tallinn formed the linchpin of the Estonian defense.

International communications capability is necessary for conducting business in a global economy. It’s also needed for defensive coordination with outside allies in order to protect a nation’s international capacity. International capacity is the asset most easily targeted from the outside, and it is perhaps the most challenging to defend from the perspective of the state, since it’s a multinational private-sector resource. In most countries, each circuit that crosses the border is controlled by one company at one end, another company at the other end, and a third in between. In turn, many of these companies are themselves consortia of other multinational companies. On the domestic end of a circuit, regulatory jurisdiction is generally clear, though limited and perhaps difficult to enforce; but on the other end it is nearly impossible even to influence. Thus, diversity is key to optimizing the survivability of international connectivity.

Estonia had numerous privately controlled data circuits crossing its borders, with the other ends located in several different countries. Of these, the most significant were large Scandinavian and Western European ISPs with which Estonian ISPs had commercial relationships and that were based in diplomatically friendly neighboring countries. This is an optimal situation, and when push came to shove, Estonia received fast and effective aid from the ISPs at the other ends of those circuits.

Name resolution. The ability to resolve domain names domestically is another critical infrastructure capability. The Domain Name System (DNS) is the Internet’s directory service, providing Internet-connected computers with the ability to map the human-readable domain names in email and Web addresses to the machine-readable binary IP addresses used to route traffic within the network. Domain names are resolved to IP addresses (and vice versa) by iterating through a delegation hierarchy of DNS directory servers, starting at the “root” and progressing through top-level domain (TLD) name servers such as .com and .net, to the organization-specific name servers that hold the particular answer one is looking for.

If connectivity is broken between users and any one of the name servers in the delegation chain from the root down to the specific one they are looking for, then the users will be unable to resolve the domain name they’re looking for, and unable to reach the corresponding Web site or send the email, regardless of whether they have connectivity to the Web site or email addressee. If the directory service is broken, you can’t find things, even if you could, hypothetically, reach them. Estonia did not have any root servers within the country at the time of the attack, and still does not today. This is one of the few weak points of the Estonian defense and would have become more debilitating over the course of an attack that had been more effective for a longer period of time.


Defensive coordination. The final component of an effective cyberwarfare defense is coordination. Knowing that one is under attack is an intelligence function. Identifying and characterizing the attack is a forensic analytical function. Communicating this information to the ISPs that can mitigate the attack is a communications function. These functions are most often coordinated by a computer emergency response team (CERT), sometimes called a CIRT (computer incident response team). A CERT is the glue that holds a defense together, providing expertise, analytical facilities, and open lines of communication between the many organizations that are party to the defense or have some stake in its success.

CERTs provide training and preparedness workshops, maintain and exercise contact lists, and observe trends and find patterns in online criminal, military, and espionage activity. When a country is under attack, CERTs help individual organizations identify which portions of the attack are directed against them particularly, as opposed to those that they’re feeling the effects of incidentally. CERTs provide the expertise to help those organizations with the very specialized tasks of discerning attack traffic from legitimate traffic and developing filters that will block the attack while protecting their ability to conduct business. CERTs will then communicate those filters up the path of ISPs toward the attackers, blocking the malicious traffic at each step, pushing the boundary of the cleaned network away from the victims and toward the attackers.

Georgia
A little more than a year after the Estonian incident, Georgia was subjected to cyber attacks in conjunction with the Russian incursion into South Ossetia in August 2008. This more complex attack combined Georgian targets with domestic media outlets that were perceived to be reporting news from a Georgian perspective.

Much of what had worked well in the case of Estonia did not in the Georgia attack. Relative to Estonia, Georgia suffered from two crippling deficiencies: Georgian international connectivity was far more limited, hence more vulnerable. Most of its international links were through Russian territory; and unlike Estonia, Georgia had no IXPs. As with Estonia, Georgia lacked a DNS root server, but that was mooted by its limited infrastructure being easily overwhelmed.

Given the relatively modest infrastructure and comparative lack of e-commerce to be affected (and all dwarfed in significance by an actual shooting war), it may be more difficult to extract lessons from Georgia’s experience than from Estonia’s. One noteworthy issue in the case of Georgia, however, was the number of offers made by governments and corporations to “mirror” Georgian Web content. If the Georgian government desired to reach a non-Georgian audience for sympathy and support, then distributing that message to parties outside Georgia and in regions of the Internet far less amenable to denial-of-service attacks would be a worthwhile strategy.

Why Cyberwar?
The mere fact that significant conversation is still occurring more than three years after the attacks on Estonia indicates that even if the destructive impact was minimal, the overall information warfare effect was significant. The return on a very small investment was disproportionately high; these margins suggest that cyberwarfare techniques will continue to be applied until they become considerably more expensive or less noticed.

It is worth understanding what was successful about the attack and what was successful about the defense. Viewed in the large, the Chinese cyberwarfare doctrine upon which the attacks were patterned states that one of the principal goals of an attack is to dispirit an adversary’s civilian population, reduce their productivity, and cause them to withdraw economic, and eventually moral, support from their country’s engagement in the conflict. This was not the SCADA attack, an attack on the cyber aspects of physical systems with the intent to cripple the latter, that is so often warned of in the U.S. (SCADA, for supervisory control and data acquisition, is a catchall label for the various systems used to manage industrial


systems and processes, from factories to pipelines to transportation networks.) Rather, the Estonia incident was a pure information-warfare attack, attempting to convince Estonians that the information-economy infrastructure of which they were so proud was vulnerable and unsound, that their work in that sector was of little value, that their adversary was more capable and better prepared, and in a more pitched conflict, their defeat would be inevitable. A population that would take such a message to heart would indeed be unwilling to support conflict against the attacker.

The Estonia attack had very little success in concrete terms, and little more success in information-warfare terms, relative to the Estonians against whom it was directed. Because of its apparent state-on-state nature, and Estonia’s status at the time as the most recently admitted NATO ally, the attack managed to garner a surprising degree of attention elsewhere, though. The attacks against Georgia were far more effective, but Georgia did not have as far to fall and the conflict on the Internet paled in comparison to the actual shooting war in its territory. One might accurately term both the Estonia and Georgia cyber assaults as skirmishing; the attack on Estonia amounted to little more than a nuisance, in part because of its scale and in part because of the effectiveness of the response.

Without a doubt, any major war would see complementary attacks against the adversaries’ information infrastructure, including their national presence on the Internet; suppression of the means to coordinate and organize has long been a basic tenet of warfare. It is perhaps early to assess the impact of cyberwar, absent “real war”; the attack against Estonia was too slight to measure significant effects, while the attack on Georgia was just a sideshow to a widely, physically destructive conflict.

The ultimate source of both attacks remains murky. Many assertions have been made, but there has been little actual discussion of the question of state involvement in cyber attacks. Plausible deniability has become the watchword in cyberwarfare, and accordingly, attribution has become a major focus of effort, consuming far more resources than does actual defense.

Defending the Small Nation-State
Ensuring the Internet security of a small nation-state entails investment in four areas: ensuring physical network robustness; securing the interconnection of participating networks through exchange points; securing the data and services required to keep the Internet running; and developing an effective response community.

In advance of any threat, a nation should take steps to ensure that its networks are connected to the rest of the world via diverse international transit links to different unrelated transit providers in different, unaligned countries. A significant factor in why Georgia was so affected by its cyber attack was its extremely limited connectivity to the outside world; Estonia was in a far better position, with a more diverse mesh of connectivity to friendlier neighbors. Submarine cables are also worth noting as a clear point of vulnerability in international transit. There have been a number of accidental submarine cable cuts in the past several years, and a coordinated, willful effort to take those out would be fairly simple to mount and would have significant effect in certain regions.

In the case of Estonia, DoS attacks effectively stopped at the country’s IXP and had minimal impact on domestic Internet traffic. In countries lacking IXPs, even domestic traffic may end up routed internationally, at greater expense than if there had been an IXP to broker exchanges before incurring higher international transit costs, and at greater risk of disruption.

It is critical that countries have root and TLD name servers well connected to their domestic IXPs, such that all of their domestic ISPs can provide uninterrupted DNS service to their customers. In the case of ISO country-code TLD name servers, such as those for Estonia’s .ee domain, that’s relatively easily accomplished, though not yet universally done. In the case of root name servers, it requires the cooperation and goodwill of a foreign organization, the operator of the root name server, and generally some small investment in infrastructure support for the remotely operated root server. This


might amount to an expenditure of some $15,000 (U.S.) per year, per root server installation within the country. (It’s worth noting that all of the investments required for cyberwarfare defense are equally applicable to general economic development. Just as the cyberwarfare field of conflict is a private-sector space, this, too, is unlike traditional military expenditures. A tank or a bunker is purely a cost center, whereas an IXP or domain name server is a profit center, generating new, concrete, and monetized value for its users from the moment it’s established. The return on investment of a newly established IXP is typically less than three weeks, and often less than one week.)

The CERT is a widely employed model for computer and network incident response. CERTs are directly responsible for systems under their own control, and, with other CERTs, collaborate on collective network security. FIRST (Forum of Incident Response and Security Teams), an association of CERTs, brings CERTs and their staffs together to build the most fundamental links in a web of trust.¹ A CERT should also have already established lines of communication with ISPs, law enforcement, and other elements of government concerned with infrastructure security.

Network operators’ groups promote community and cooperation between a country’s Internet operators and their foreign counterparts. Participation in Inter-network Operations Center Dial-by-ASN (INOC-DBA) and Network Service Provider Security (NSP-SEC) can also aid in coordinating incident response. INOC-DBA is a voice over Internet Protocol (VoIP) hotline system, interconnecting network operation centers; it uses the networks’ own numeric identifiers as dialing numbers so that a NOC operator observing problematic traffic can merely enter the address of the offending network to place a call to

content, and service provider community. Therefore, it will be limited to operators, vendors, researchers, and people in the FIRST community working to stop NSP security incidents.”³

New members of the “culture of security” come out of academic and training programs (which must be established), intern in a CERT (internationally or domestically), and go on to careers as CSOs (chief security officers) in CERTs, academia, law enforcement, or government. This is fundamentally analogous to the peopling of a national health environment with doctors.

In the U.S., the Department of Homeland Security has included CERTs and information assurance analysts and operators in a new research and development solicitation. In a draft of the solicitation, DHS notes, “While we have a good understanding of the technologies involved in [cybersecurity incident response teams], we have not adequately studied the characteristics of individuals, teams, and communities that distinguish the great [cybersecurity incident] responders from the average technology contributor. In other areas where individual contributions are essential to success, for example, first responders, commercial pilots, and military personnel, we have studied the individual and group characteristics essential to success. To optimize the selection, training, and organization of CSIR personnel to support the essential cyber missions of DHS, a much greater understanding and appreciation of these characteristics must be achieved.”

Conclusion
It would be fair to describe these two incidents (Estonia in 2007, and Georgia a year later) as “cyberskirmishing.” The attacks on Estonia amounted to little more than a nuisance, though a quite visible and much discussed one. Georgia had far greater problems to deal with in an armed incursion into its territory, and the Internet was not a

and perhaps government investment, foster a robust physical infrastructure.
• Similarly, take steps to ensure a diversity of international connections.
• Encourage (or directly sponsor) creation of one or more IXPs.
• Ensure the domestic availability of DNS resolution, through root servers.
• Foster the growth of a collaborating community of security professionals.

A diversity of interconnections, both international and domestic, facilitated by the efficient peering afforded by IXPs, provides a more robust logical infrastructure, and local DNS resolution further lessens dependence on more exposed international connections. With that technical infrastructure ensured, nations should then foster development of the human infrastructure, the information security personnel needed to anticipate threats, the ability to intercede inventively to restore services, and the ability to support incident forensic collection and analysis.

Related articles on queue.acm.org

Cybercrime 2.0: When the Cloud Turns Dark
Niels Provos, Moheeb Abu Rajab, Panayiotis Mavrommatis
http://queue.acm.org/detail.cfm?id=1517412

CTO Roundtable: Malware Defense
http://queue.acm.org/detail.cfm?id=1731902

The Evolution of Security
Daniel E. Geer
http://queue.acm.org/detail.cfm?id=1242500

References
1. FIRST; http://first.org/about/.
2. Inter-network Operations Center Dial-by-ASN (INOC-DBA), a Resource for the Network Operator Community; http://www2.computer.org/portal/web/csdl/doi/10.1109/CATCH.2009.36.
3. NSP Security Forum; http://puck.nether.net/mailman/listinfo/nsp-security.

Bill Woodcock is a founder and research director of Packet Clearing House, a nonprofit research institute dedicated to understanding and supporting Internet traffic exchange technology, policy, and economics. He entered the field of Internet routing research in 1989 while serving as the network architect and operations director for an international multiprotocol service-provision backbone network. Woodcock has participated in the establishment
of more than 70 public Internet exchange points in
the responsible party.2 NSP-SEC is an factor in that fight. Europe, Africa, Asia, and the Americas.
informal organization of security pro- The difference in responsiveness Ross Stapleton-Gray is research program manager at
fessionals at the largest Internet infra- between the two, however, recom- Packet Clearing House. Prior to joining PCH, he served as
an intelligence analyst for the CIA, in information policy
structure providers: “Membership in mends that the small nation-state positions with the American Petroleum Institute and the
NSP-SEC is restricted to those actively ought to make investments in Inter- University of California Office of the President, and has
worked with several IT security start-ups, including as a
involved in the mitigation of [Network net defensibility akin to those seen in cofounder of Sandstorm Enterprises.
Service Provider] security incidents Estonia:
within organizations in the IP transit, ˲˲ Through policy and regulation, © 2011 ACM 0001-0782/11/0300 $10.00

practice
doi:10.1145/1897852.1897870
Article development led by queue.acm.org

B.Y.O.C (1,342 Times and Counting)

Why can’t we all use standard libraries for commonly needed algorithms?

by Poul-Henning Kamp

Although seldom articulated clearly, or even at all, one of the bedrock ideas of good software engineering is reuse of code libraries holding easily accessible implementations of common algorithms and facilities. The reason for this reticence is probably that there is no way to state it succinctly without sounding like a cheap parody of Occam’s razor: Frustra fit per plura quod potest fieri per pauciora (it is pointless to do with several where few will suffice).

Obviously, choice of programming language means that “few” will never be “a single one,” and until somebody releases a competent implementation under an open source license, we may have several more versions floating around than are strictly necessary, for legal rather than technological reasons.

It also never hurts to have a few competing implementations to inspire improvement; in fact, there seems to be a distinct lack of improvement where a single implementation becomes too “golden.”

So much for theory. Does any of this hold in practice?

One of the nice side effects of the “software tools” concept is that programs are data, too. We can apply data mining methods to program source code, allowing us to investigate such questions.

Cryptography algorithms provide a good example because they are easier to identify than other algorithms. Magic numbers in crypto algorithms make for good oracular answers to their presence: you are not likely to encounter both 0xc76c51a3 and 0xd192e819 anywhere other than an implementation of SHA-2. Creating an oracle to detect sorting algorithms in source code with (p>0.9) would be a good student project (albeit, likely impossible).

For data mining FOSS (free and open source software) programs, the FreeBSD operating system ships with a handy facility called the Ports Collection, containing strategic metadata for 22,003 pieces of FOSS. A small number of these “ports” are successive versions of the same software (Perl 5.8, Perl 5.10, among others), but the vast majority are independent pieces of software, ranging from trivialities such as XLogo to monsters such as Firefox and OpenOffice.

A simple command downloads and extracts the source code to as many ports as possible into an easily navigated directory tree:

    cd /usr/ports ; make -k extract

You will obviously need both sufficient disk space and patience. (Using cd /usr/ports ; make -k -j 10 extract will do 10 pieces of software in parallel, but will be a bandwidth hog.)

The results are worse. I had not expected to see 1,342, as shown in the accompanying table.[a] I expect that these numbers will trisect my readers into three somewhat flippantly labeled seg-

[a] Sorry, I forgot to include the DES algorithm in the search.
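As a rough illustration of the magic-number oracle idea, the following Python sketch reports files that contain both of the SHA-2 constants quoted in the text. The function names, the hex-literal pattern, and the choice to read every file as Latin-1 text are assumptions made for this example and are not taken from the original survey; constants written in other notations (decimal, assembler-style suffixes) would be missed.

```python
import os
import re

# The two constants named in the article; both occur in the SHA-256
# round-constant table.
SHA2_MAGIC = ("c76c51a3", "d192e819")


def looks_like_sha2(text):
    """Return True if every magic constant occurs as a 0x hex literal in text."""
    lowered = text.lower()
    # The negative lookahead rejects longer hex numbers that merely start with
    # the constant, while still allowing C suffixes such as UL.
    return all(
        re.search(r"0x" + magic + r"(?![0-9a-f])", lowered)
        for magic in SHA2_MAGIC
    )


def find_sha2_sources(root):
    """Walk root and return the sorted paths of files containing all constants."""
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                # Latin-1 maps every byte to a character, so decoding never fails.
                with open(path, "r", encoding="latin-1") as handle:
                    text = handle.read()
            except OSError:
                continue  # unreadable file: skip it rather than abort the scan
            if looks_like_sha2(text):
                hits.append(path)
    return hits if not hits else sorted(hits)
```

Calling `find_sha2_sources("/usr/ports")` after the extract step would list candidate SHA-2 implementations. Turning those hits into a per-port count would still need path handling that is left out here.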

[Illustration: excerpt from a FreeBSD ports listing (port name, version, and one-line description), in alphabetical order.]
aafid2-0.10_3 -- A distributed monitoring and intrusion detection system ap22-mod_macro-1.1.11 -- Apache 2.2.x module for use macros in config files
aalib-1.4.r5_5 -- An ascii art library ap22-mod_memcache-0.1.0_4 -- Apache 2.2.x module to manage apr_memcache connections
aamath-0.3_1 -- Renders ASCII art from mathematical expressions ap22-mod_proxy_html-3.1.2 -- Apache module for rewriting HTML links in proxied content
aap-1.091 -- A build tool alternative to make with internet access and CVS support ap22-mod_python-3.3.1_3 -- Apache module that embeds the Python interpreter within the server
aaphoto-0.39_1 -- Auto Adjust Photo, automatic color correction of photos ap22-mod_remoteip-2.3.5.a -- Replaces the client IP address/hostname with that given by a proxy
abacus-0.9.13_4 -- Spread sheet for X Window System ap22-mod_smooth_streaming-1.0.8_1 -- Apache smooth streaming module
abakus-0.91_9 -- Michael Pyne's Abakus Calculator ap22-mod_vhs-1.1.0 -- Mass virtual hosting using mod_ldap or mod_dbd with Apache 2.2.x
abby-0.4.8_2 -- Front-end for c/clive apache+ipv6-1.3.42 -- The extremely popular Apache http server. Very fast, very clean
abc2mtex-1.6.1 -- Music TeX converter from "abc" to MusiXTeX format apache+mod_perl-1.3.42 -- The Apache 1.3 webserver with a statically embedded perl interpreter
abcde-2.3.3_4 -- Front-end shell script to encode CDs in flac/mp3/ogg/speex format apache+mod_ssl+ipv6-1.3.41+2.8.31_2 -- The Apache 1.3 webserver with SSL/TLS and IPv6 functionality
abck-2.2 -- Manage intrusion attemps recorded in the system log apache+mod_ssl-1.3.41+2.8.31_2 -- The Apache 1.3 webserver with SSL/TLS functionality
abcl-0.0.10_3 -- An implementation of ANSI Common Lisp in Java apache+ssl-1.3.41.1.59_1 -- Apache secure webserver integrating OpenSSL
abclock-1.0d_2 -- Clock for X that displays hours and minutes in an analog fashion apache-1.3.42 -- The extremely popular Apache http server. Very fast, very clean
abcm2ps-5.9.16 -- Converts ABC to music sheet in PostScript format apache-2.0.64 -- Version 2.0.x of Apache web server with prefork MPM.
abcmidi-2010.02.23 -- Convert abc music files to MIDI and PostScript apache-2.2.17_1 -- Version 2.2.x of Apache web server with prefork MPM.
abcselect-1.5 -- Extract parts, movements, etc from abc music files apache-ant-1.8.1 -- Java- and XML-based build tool, conceptually similar to make
abe-1.1_4 -- Abe's Amazing Adventure apache-contrib-1.0.8_1 -- Third-party modules contributed to the Apache HTTP server project
abgx360-1.0.5 -- Verify and repair Xbox 360 backup images apache-event-2.2.17_1 -- Version 2.2.x of Apache web server with event MPM.
abgx360gui-1.0.2_2 -- A wxWidgets frontend for abgx360 apache-forrest-0.8_3 -- A tool for rapid development of small sites
abi-compliance-checker-1.21.7 -- Checks binary compatibility of two versions of a C/C++ shared library
apache-itk-2.2.17_1 -- Version 2.2.x of Apache web server with itk MPM.
abills-0.51 -- Billing system for dialup, VPN and VoIP management apache-mode.el-2.0 -- [X]Emacs major mode for editing Apache configuration files
abinit-5.7.3_8 -- Abinit calculates electronic structure of systems apache-peruser-2.2.17_1 -- Version 2.2.x of Apache web server with peruser MPM.
abiword-2.8.4_1 -- An open-source, cross-platform WYSIWYG word processor apache-solr-1.4.1 -- High performance search server built using Lucene Java
abiword-docs-2.8.4 -- AbiWord help files apache-tomcat-4.1.36_2 -- Open-source Java web server by Apache, stable 4.1.x branch
abntex-0.8.2_3 -- Both classes and styles for both LaTex and bibtex for ABNT rules apache-worker-2.2.17_1 -- Version 2.2.x of Apache web server with worker MPM.
abook-0.5.6_4 -- An addressbook program with mutt mail client support apache-xml-security-c-1.4.0 -- Apache XML security libraries C version
abraca-0.4_2 -- Abraca is a GTK2 client for the XMMS2 music player apachetop-0.12.6_2 -- Apache RealTime log stats
abs-0908_3 -- A free spreadsheet with graphical user interface apc-1.0_4 -- An xforms based Auto Payment Calculator
abuse-2.0_3 -- The classic 2D action game Abuse apcpwr-1.2_1 -- Control APC 9211 MasterSwitchs via snmp
abuse_sdl-0.7.1 -- An SDL port of the Abuse game engine apcupsd-3.14.8_1 -- Set of programs for controlling APC UPS
abyssws-2.6 -- Abyss Web Server is a compact and easy to use web server apel-emacs21-10.8 -- A Portable Emacs Library for emacs21
accerciser-1.12.1 -- Interactive Python accessibility explorer for GNOME apel-emacs22-10.8 -- A Portable Emacs Library for emacs22
accessx-0.951_5 -- Customise accessibility features for X apel-emacs23-10.8 -- A Portable Emacs Library for emacs
accrete-1.0 -- Accrete is a physical simulation of solar system planet formation apercu-1.0.2 -- Summarize information from Apache logs
ace+tao-5.4.2+1.4.2 -- The Adaptive Communication Environment (ACE) with The ACE ORB (TAO)
apertium-3.1.1 -- A toolbox to build shallow-transfer machine translation systems
ace+tao-doc-5.5.0 -- The ACE+TAO HTML documentation apg-2.3.0b_1 -- An automated password generator
ace-5.5.2_3 -- The Adaptive Communication Environment for C++ api-sanity-autotest-1.11 -- Quickly generate sanity tests for the API of a C/C++ shared library
acfax-0.981011_3 -- Receive faxes using sound card and radio apinger-0.6.1_2 -- An IP device monitoring tool
achievo-1.1.0_1 -- A flexible web-based resource management tool apngasm-2.2 -- Create Animated PNG from a sequence of files
acidlaunch-0.5_7 -- An application launcher with simple XML-based configuration syntax apollon-1.0.2.1_4 -- KDE client for giFT daemon
acidrip-0.14_8 -- GTK2::Perl wrapper for MPlayer and MEncoder for ripping DVDs apoolGL-0.99.22_4 -- Another billiard simulator
acidwarp-1.0 -- SVGAlib demo which displays trippy mathematical images in cycling colors app_notify-2.0.r1_6 -- Notify application module for the Asterisk PBX
aclgen-2.02 -- Optimize Cisco routers ip access lists apparix-20081026 -- Bookmark directories and apparate inside them
aclock-0.3 -- Analog Clock for GNUstep appres-1.0.2 -- Program to list application's resources
acm-5.0_2 -- A flight simulator for X11 appwrapper-0.1_2 -- GNUstep application wrapper
acovea-5.1.1_1 -- Tool to find the "best" compiler options using genetic algorithm apr-0.9.19.0.9.19 -- Apache Portability Library
acovea-gtk-1.0.1_5 -- GTK+ front-end to ACOVEA apr-ipv6-devrandom-gdbm-db42-1.4.2.1.3.10 -- Apache Portability Library
acpicatools-20030523.0 -- Some utilities for Intel ACPICA (Debugger, ASL Compiler and etc.)
apr-ipv6-devrandom-gdbm-db42-2.0.20100610211336_1 -- Apache Portability Library
acrobatviewer-1.1_2 -- Viewer for the PDF files written in Java(TM) apricots-0.2.6_2 -- Fly a little plane around and shoot things and drop bombs
acron-1.0 -- Database of acronyms and abbreviations aprsd-2.2.515 -- Server daemon providing Internet access to APRS packet data
acroread8-8.1.7_2 -- Adobe Reader for view, print, and search PDF documents (ENU) apsfilter-7.2.8_8 -- Magic print filter with print preview, duplex printing and more
acroread9-9.3.4 -- Adobe Reader for view, print, and search PDF documents (ENU) apt-0.6.46.4.1_5 -- Advanced front-end for dpkg
acroreadwrapper-0.0.20100806 -- Wrapper script for Adobe Reader apvlv-0.0.9.8_1 -- Apvlv is a PDF Viewer Under Linux and its behaviour like Vim
activemq-5.4.1 -- Messaging and Integration Patterns provider apwal-0.4.5_9 -- Simple and powerful application launcher
activitymail-1.26 -- A program for sending email messages for CVS repository commits aqbanking-4.2.4_3 -- Online banking interface and financial data framework
actx-1.23_2 -- Window sitter for X11 aqbubble-0.3_10 -- Game similar to snow bros
acx-6.1,1 -- Texas Instruments (TI) ACX100 and ACX111 IEEE 802.11 driver aqemu-0.8.0 -- Qt4 based Qemu frontend
adabooch-20030309 -- Library which provide container classes as well as powertools for Ada
aqmoney-0.6.3 -- Manage your credit institute accounts using openhbci
adabooch-doc-20030309 -- Manual for adabooch aqsis-1.6.0_3 -- A photorealistic rendering system
adacurses-5.7 -- Curses library for Ada aqualung-0.9.b11_9 -- Music player with rich features
adamem-1.0_2 -- ADAMEm is a portable Coleco ADAM and ColecoVision emulator ar-ae_fonts1_ttf-1.1_2 -- A collection of truetype Arabic fonts created by Arabeyes.org
adasdl-20010504_9 -- An Ada thin binding to SDL ar-ae_fonts_mono-1.0_2 -- A collection of PCF fonts that include Arabic glyphs
adblock-0.5.d_6 -- A content filtering plug-in for seamonkey ar-arabtex-3.11_4 -- A TeX/LaTeX package to generate the arabic writing
adcomplain-3.52 -- Complain about inappropriate commercial use (f.e. SPAM) of usenet/e-mail
ar-aspell-1.2.0_1,1 -- Aspell Arabic dictionaries
add-20100708 -- Full-screen editing calculator ar-kacst_fonts-2.01 -- Truetype Arabic fonts created by KACST
add-css-links-1.0_1 -- Add one or more CSS <link> elements to an XHTML document ar-kde-i18n-3.5.10_4 -- Arabic messages and documentation for KDE3
addresses-0.4.7_2 -- A versatile addressbook for GNUstep ar-kde-l10n-4.5.4 -- Arabic messages and documentation for KDE4
addresses-goodies-0.4.7_1 -- Goodies for addressbook for GNUstep ar-khotot-1.0_2 -- A meta-port of the most popular Arabic font packages
adesklets-0.6.1_8 -- An interactive Imlib2 console for the X Window system ar-koffice-i18n-1.5.2_6 -- Arabic messages and documentation for koffice
adgali-0.2.4_8 -- An open source game library useful for 2D game development ar-libitl-0.7.0 -- An API abstraction to common Islamic calculations
adime-2.2.1_2 -- Generate Allegro dialogs in a very simple way arc-5.21o_1 -- Create & extract files from DOS .ARC files
admesh-0.95_1 -- Program for processing STL triangulated solid meshes arcconf-v6.50.18570 -- Adaptec SCSI RAID administration tool
adminer-3.1.0 -- A full-featured MySQL management tool written in PHP archivemail-0.8.2 -- Archive or delete mail older than N days
adms-2.2.9 -- A model generator for SPICE simulators archivemount-0.6.0 -- Mount archives with FUSE
admuser-2.3.2 -- Handle your Squid or Web users and passwords using your browser archiveopteryx-3.1.3 -- An advanced PostgreSQL-based IMAP/POP server
adns-1.4_1 -- Easy to use, asynchronous-capable DNS client library and utilities archivesmtp-1.1.b1 -- SMTP mail archiver
adobe-cmaps-20051217_1 -- Adobe CMap collection archmage-0.2.4 -- Extensible reader/decompiler of files in CHM format
adocman-0.13_1 -- Automated sourceforge administration tool archmbox-4.10.0 -- Email archiver written in perl; parses mailboxes and performs actions
adodb-4.99.2 -- Database library for PHP ardour-2.8.2_4 -- A multichannel digital audio workstation
adodb-5.11 -- Database library for PHP arduino-0019 -- Open-source electronics prototyping platform
adom-1.1.1_2 -- An rogue-like advanced rpg with color support (binary port) areca-cli-i386-1.83.091103 -- Command Line Interface for the Areca ARC-xxxx RAID controllers
adonthell-0.3.5_6 -- A free role playing game arena-0.9.13 -- C-like scripting language with automatic memory management
adpcm-1.2 -- An Intel/DVI IMA ADPCM codec library ares-1.1.1 -- An asynchronous DNS resolver library
adplay-1.7_3 -- AdLib player using adplug library argouml-0.30.2_1 -- A UML design tool with cognitive support
adstudio-9.0.5 -- A database query and administration tool argp-standalone-1.3_2 -- Standalone version of arguments parsing functions from GLIBC
adtool-1.3_1 -- Active Directory administration tool argtable-2.12 -- An ANSI C library for parsing GNU style command line arguments
adun-0.81 -- Molecular Simulator for GNUstep argus-2.0.6_1 -- A generic IP network transaction auditing tool
advancecomp-1.15 -- Recompression utilities for .ZIP, .PNG, .MNG and .GZ files argus-clients-2.0.6_1 -- Client programs for the argus IP network transaction auditing tool
advancemame-0.106.1 -- SDL MAME port with advanced TV and monitor video support argus-clients-sasl-3.0.2 -- Client programs for the argus IP network transaction auditing tool
advancemenu-2.5.0 -- A frontend for AdvanceMAME, MAME, MESS, RAINE argus-monitor-20060722_4 -- Argus - The All Seeing System and Network Monitoring Software
advancemess-0.102.0.1_2 -- SDL MESS port with advanced TV and monitor video support argus-sasl-3.0.2 -- A generic IP network transaction auditing tool
advi-1.9 -- Active-DVI viewer ari-yahoo-1.10_3 -- A console Yahoo! messenger client
adzap-20090301 -- Filter out animated ad banners from web pages aria-1.0.0_5 -- Yet another download tool
aee-2.2.15b_1 -- An easy editor with both curses and X11 interfaces aria2-1.10.0 -- Yet another download tool
aegis-4.24_5 -- Transaction-based software configuration management system aria2fe-0.0.5_3 -- Aria2 QT front-end
aegisub-2.1.8_2 -- Aegisub Project is a cross-platform subtitle editor ariadne-1.3 -- Programs to compare protein sequences and profiles
aescrypt-0.7_1 -- A command-line AES encryption/decryption suite aribas-1.64 -- Interpreter for big integer/multi-precision floating point arithmetic
aeskulap-0.2.1_1 -- A medical image viewer ario-1.5 -- Ario is a GTK2 client for MPD
aespipe-v2.3.e -- An AES encrypting or decrypting pipe arirang-2.00,1 -- Powerful webserver security scanner for network
aestats-5.39 -- An advanced HTML statistics generator for various games arista-0.9.5 -- An easy to use multimedia transcoder for the GNOME Desktop
aewan-1.0.01 -- Curses-based program for the creation and editing of ascii art arj-3.10.22_4 -- Open-source ARJ
aewm-1.2.7_3 -- ICCCM-compliant window manager based on 9wm arkpandora-2.04_2 -- Arkpandora TrueType fonts
af-aspell-0.50.0_1,1 -- Aspell Afrikaans dictionary arm-elf-binutils-2.14_2 -- GNU binutils for vanilla ARM cross-development
af-kde-i18n-3.5.10_4 -- Afrikaans localized messages and documentation for KDE3 arm-rtems-binutils-2.20 -- GNU binutils port for cross-target development
af-koffice-i18n-1.5.2_6 -- Afrikaans messages and documentation for koffice arm-rtems-gcc-4.4.2_2 -- GNU gcc for cross-target development
afay-041111 -- Improved aflex and ayacc Ada 95 native scanner and parser generators arm-rtems-gdb-7.1 -- GNU gdb port for cross-target development
afbinit-1.0_4 -- Sun AFB aka Sun Elite 3D microcode firmware loader armagetron-0.2.8.2.1_5 -- A multiplayer networked Tron clone in 3D
affenspiel-1.0_2 -- Little puzzle game with monkey for X Window System arora-0.11.0 -- Simple Qt4 based browser
affiche-0.6.0_2 -- Affiche allows people to stick notes aros-sdk-0.20060207 -- The Software development kit (SDK) for the AROS Operating System
afflib-3.6.4 -- The Advanced Forensics Format library and utilities arp-scan-1.7 -- ARP Scanning and Fingerprinting Tool
afio-2.5 -- Archiver & backup program w/ builtin compression arp-sk-0.0.16_2 -- A tool designed to manipulate ARP tables of all kinds of equipment
afm-1.0 -- Adobe Font Metrics arpack++-1.2_3 -- ARPACK++ is an object-oriented version of the ARPACK package
afni-2008.01.02.1043_5 -- Advanced Functional Neuro Imaging arpack-96_7 -- Argand Library: large eigenvalue subroutines (serial version)
afnix-1.9.0 -- A multi-threaded functional programming language arpalert-2.0.11_1 -- ARP traffic monitoring
afsp-8.2 -- Audio file conversion utilities and library arpdig-0.5.2 -- ARP Digger utility
aft-5.098,1 -- A document preparation system using an Almost Free Text input format arping-2.09 -- ARP level "ping" utility
aften-0.0.8 -- ATSC A/52 audio encoder arprelease-1.2_2 -- Libnet tool to flush arp cache entries from devices (eg. routers)
afterglow-1.6.0 -- A collection of graph-generating scripts arpscan-0.3 -- Simple arp scanner
afternoonstalker-1.1.4 -- A clone of the 1981 Night Stalker video game arpwatch-2.1.a15_6 -- Monitor arp & rarp requests
afterstep-1.0_3 -- Window manager originally based on the Bowman NeXTstep clone arss-0.2.3 -- Additive Image Synthesizer (convert audio to images, images to audio)
afterstep-2.2.9_2 -- A stable version of the AfterStep window manager artemis-9_1 -- A DNA sequence viewer and annotation tool
afterstep-i18n-1.0_4 -- The NeXTstep clone window manager with Fontset support arts++-1.1.a13,1 -- A network data storage and analysis library from CAIDA
aftp-1.0 -- A ftp-like shell for accessing Apple II disk images arts-1.5.10_5,1 -- Audio system for the KDE integrated X11 desktop
agame-1577_7 -- A simple tetris-like game artswrapper-1.5.3 -- Setuid wrapper for arts
agave-0.4.2_8 -- A color scheme builder for the GNOME desktop artwiz-aleczapka-de-1.3_2 -- A set of (improved) artwiz fonts
agef-3.0 -- Show disk usage of file sizes and counts sorted by file age artwiz-aleczapka-en-1.3_2 -- A set of (improved) artwiz fonts
aget-0.4.1 -- A multithreaded HTTP download accelerator artwiz-aleczapka-se-1.3_2 -- A set of (improved) artwiz fonts
agg-2.5_6 -- A High Quality Rendering Engine for C++ artwiz-fonts-1.0_3 -- A set of free fonts for X11 desktops
aggregate-1.6_1 -- Optimise a list of route prefixes to help make nice short filters as31-2.0.b3_6 -- A free 8051 assembler
agrep-2.04_2 -- Approximate grep (fast approximate pattern-matching tool) asWedit-4.0.1_3 -- An easy to use HTML and text editor
aguri-0.7_1 -- An Aggregation-based Traffic Profiler asapm-3.1_2 -- Laptop battery status display for X11
ah-tty-0.3.12 -- Ah-tty is an automatic helper for command prompts and shells asbutton-0.3_3 -- A dockapp that displays 4 or 9 buttons to run apps of your choice
ahwm-0.90_2 -- An X11 window manager asc-2.4.0.0 -- A turn based, multiplayer strategic game with very nice graphics
aide-0.13.1_3 -- A replacement and extension for Tripwire ascd-0.13.2_1 -- A dockable cd player for AfterStep or WindowMaker
aifad-1.0.27_2 -- Machine learning system ascii2binary-2.14 -- Convert between textual representations of numbers and binary
aiksaurus-1.2.1_2 -- A set of libraries and applications which provide a thesaurus ascii2pdf-0.9.1 -- A perl script to convert text files to PDF files
aiksaurus-gtk-1.2.1_10 -- A GTK+2 front-end for Aiksaurus, a thesaurus asciidoc-8.6.1 -- A text document format for writing short documents and man pages
aim-1.5.286_4 -- AOL's Instant Messenger (AIM) client asciio-1.02.71_2 -- A Perl/GTK application that lets you draw
aimage-3.2.4 -- Advanced Disk Imager
aimsniff-0.9d -- AOL Instant Messanger Sniffing and Reading Tool
aircrack-ng-1.1 -- An 802.11 WEP and WPA-PSK keys cracking program
airoflash-1.7 -- Flash utiltity for Cisco/Aironet 802.11 wireless cards
airport-2.0.1_3 -- Apple Airport / Lucent RG-1000 configuration program
airrox-0.0.4_8 -- An 3D Air Hockey, which uses SDL & OpenGL
aish-1.13 -- Ish/uuencode/Base64 converter
akamaru-0.1_6 -- Simple, but fun, physics engine prototype
akode-2.0.2_1,1 -- Default KDE audio backend
akode-plugins-ffmpeg-2.0.2_1,1 -- FFMPEG decoder plugin for akode
akode-plugins-jack-2.0.2,1 -- Jack output plugin for akode
akode-plugins-mpc-2.0.2,1 -- Musepack decoder plugin for akode
akode-plugins-mpeg-2.0.2,1 -- MPEG audio decoder plugin for akode
akode-plugins-oss-2.0.2,1 -- OSS output plugin for akode
akode-plugins-pulseaudio-2.0.2_4 -- Pulseaudio output plugin for akode
akode-plugins-resampler-2.0.2,1 -- Resampler plugin for akode
akode-plugins-xiph-2.0.2_3,1 -- FLAC/Speex/Vorbis decoder plugin for akode
akonadi-1.4.1_1 -- Storage server for kdepim
akonadi-googledata-1.1.0_1 -- Akonadi Resources for Google Contacts and Calendar
akpop3d-0.7.7 -- POP3 daemon aimed to be small and secure
alabastra-0.21b_1 -- C++ Editor writen with QT4
alac-0.2.0 -- Basic decoder for Apple Lossless Audio Codec files (ALAC)
alacarte-0.13.2 -- An editor for the freedesktop.org menu specification
alarm-clock-1.4 -- Alarm Clock for the GNOME desktop
albumart-1.6.6_3 -- GUI application for downloading album cover art
albumshaper-2.1_4 -- A drag-n-drop hierarchal photo album creation
ald-0.1.7 -- Debugger for assembly level programs
aldo-0.7.5_2 -- Morse code training program
ale-0.8.11.2_6 -- Anti-Lamenessing Engine
alephone-20100424_1 -- The open source version of Bungie's Marathon game
alephone-data-1.0_6 -- Released Marathon data files for the Aleph One port
alephone-scenarios-1.0_3 -- Free scenarios for the Aleph One engine
alevt-1.6.2_1 -- X11 teletext decoding and display program
alf-0.1_1 -- Abstract Large File
algae-4.3.6_4 -- A programming language for numerical analysis
algol68g-2.0.3 -- Alogol 68 Genie compiler
algotutor-0.8.6_3 -- An interactive tutorial for algorithms and data structures
alienarena-2010.745 -- Alien Arena (native version)
alienarena-data-2010.745 -- Alien Arena (data)
alienblaster-1.1.0_4 -- Alien Blaster
alienwah-1.13_1 -- Paul Nasca's AlienWah LADSPA Plugin
alienwave-0.3.0 -- Shoot'em up game written using ncurses
align-1.7.1 -- Text column alignment filter
alignmargins-1.0_1 -- Utility script to generate custom margins in PPDs for CUPS
alisp-20060917 -- An interpreter for purely symbolic LISP
allacrost-1.0.2 -- A single player 2D role-playing game
allegro-4.2.2_3 -- A cross-platform library for games and multimedia programming
allegro-devel-4.3.1_5 -- A cross-platform library for games and multimedia programming
allegrogl-0.4.3 -- OpenGL inteface for Allegro library
alliance-5.0.20090901_1 -- A complete set of CAD tools and libraries for VLSI design
alltraxclock-2.0.2_10 -- An analog clock plugin for gkrellm2
alltray-0.70_3 -- Dock any application with no native tray icon
alpine-2.00_2 -- Mail and news client descended from Pine
alpng-1.3 -- Library for display PNG images in programs
alsa-lib-1.0.23 -- ALSA compatibility library
alsa-plugins-1.0.23_1 -- ALSA compatibility library plugins
alsa-utils-1.0.23_1 -- ALSA compatibility utils
altermime-0.3.11.a1 -- Small C program which is used to alter your mime-encoded mailpacks
althea-0.5.7_5 -- Yet another GTK-based mail reader for X. Supports IMAP

am-aspell-0.03.1_1,2 -- Aspell Amharic dictionary
am-utils-6.1.5,1 -- The Berkeley Automounter Suite of Utilities
amanda-client-2.5.1p3_4,1 -- The Advanced Maryland Automatic Network Disk Archiver (client)
amanda-client-2.6.1p2_3,1 -- The Advanced Maryland Automatic Network Disk Archiver (client)
amanda-client-3.2.0_2,1 -- The Advanced Maryland Automatic Network Disk Archiver (client)
amanda-perl-wrapper-1.01 -- Perl wrapper to use with Amanda (with libthr.so.* linked)
amanda-server-2.5.1p3_7,1 -- The Advanced Maryland Automatic Network Disk Archiver (server)
amanda-server-2.6.1p2_3,1 -- The Advanced Maryland Automatic Network Disk Archiver (server)
amanda-server-3.2.0_2,1 -- The Advanced Maryland Automatic Network Disk Archiver (server)
amanith-0.3_8 -- C++ CrossPlatform framework designed for 2d & 3d vector graphics
amap-5.2 -- Application mapper
amarok-1.4.10_12 -- Media player for KDE
amarok-2.3.2 -- Media player for KDE4
amarok-fs-0.5_9 -- A full screen application for Amarok

Crypto algorithms search results.

Cryptographic Implementations
Algorithm    Detected
MD2          6
MD4          49
MD5          920
SHA-1        136
SHA-2        192
AES          39
Total        1,342

ments: "huh?," "sigh," and "aghast."

The "huh?" segment wonders what the big deal is: the absence of a standardized system library with these functions means that you have to "Bring Your Own Crypto" if you want some.

The "sigh" segment thinks this is the least of our troubles.

The "aghast" segment will see this as a total failure of good software engineering practices, a call to arms for better education, and reason for a stake through the heart of the Open Zombie Group.

And they are all correct, of course, each from its own vantage point.

Fortunately, what this is not, is The Next Big Security Issue, even though I would not be surprised if one or more "security researchers" would claim so from their parents' basement.[b] If these had been independent implementations, then there would be reason to worry about the security implications, but they are not.

In a few cases, optimized or license-sanitized versions have been written, but overwhelmingly this is just pointless copy-and-paste of identical source code in blatant disregard of Occam's three-quarters-millennia-old advice.

I am a card-carrying member of the "aghast" segment. My membership card is a FreeBSD commit message shown in the figure here.

My libmd, which is as unencumbered by copyright issues as it can be, later grew more cryptographic hash algorithms, such as RIPEMD-160 and the SHA family, and it has been adopted by some other operating systems.

I am also in the "sigh" segment, because not all mainstream operating systems have adopted libmd, despite having 16 years to do so, and if they have, they do not agree what should be in it. For example, Solaris seems to leave MD2 out (see http://hub.opensolaris.org/bin/view/Project+crypto/libmd), which begs the question: Which part of "software portability" don't they understand?

I am, sadly, also in the "huh?" segment, because there seems to be no hope. The rational thing to expect would be that somebody from The Open Group reads this article, reproduces my statistics, and decides that yes, there is indeed demand for a "libstdcrypto" filled with the usual bunch of crypto algorithms. That, I am told, is impossible. The Open Group does not write new standards; they just bicker over the usability of ${.CURDIR} in make(1) and probably also the potential market demand for fire that can be applied nasally.

The other possible avenue of hope is that the ISO-C standardization group would address this embarrassing situation. Before getting your hopes too high, bear in mind they have still not managed to provide for specification of integer endianness, even though CPUs can do it and hardware and protocols have needed it since the days of the ARPANET.

If the ISO-C crew decided to do it, their process for doing so would undoubtedly consume 5–10 years before a document came out at the other end, by which time SHA-3 would likely be ready, rendering the standard instantly obsolete.

But it is all a pipe dream, if ISO is still allergic to standards with ITAR restrictions. And you can forget everything about a benevolent dictator laying down the wise word as law: Linus doesn't do userland.

To be honest, what I have identified here is probably the absolutely worst-case example.

First, if you need SHA-2, you need SHA-2, and it has to do the right and correct thing for SHA-2. There is little or no room for creativity or improvements, apart from performance.

Second, crypto algorithms are everywhere these days. Practically all communication methods, from good old email over VPNs (virtual private networks) and torrent sites to VoIP (voice over IP), offer strong crypto.

But aren't those exactly the same two reasons why we should not be in this mess to begin with?

[b] The fact that MD5 seems to be more in demand (yes, I may indeed be to blame for that myself, but that is a story for another day; search for "md5crypt" if you cannot wait) than its quality warrants is a matter of choice of algorithm, not a matter of implementation of the algorithm chosen.
Related articles on queue.acm.org

Languages, Levels, Libraries, and Longevity
John R. Mashey
http://queue.acm.org/detail.cfm?id=1039532

Gardening Tips
Kode Vicious
http://queue.acm.org/detail.cfm?id=1870147

Poul-Henning Kamp (phk@FreeBSD.org) has programmed computers for 26 years and is the inspiration behind bikeshed.org. His software has been widely adopted as "under the hood" building blocks in both open source and commercial products. His most recent project is the Varnish HTTP accelerator, which is used to speed up large Web sites such as Facebook.

A card-carrying member of the "aghast" segment.

src/lib/libmd/Makefile:
r1802 | phk | 1994-07-24 03:29:56 +0000 (Sun, 24 Jul 1994)

Imported libmd. This library contains MD2, MD4, and MD5.
These three boggers pop up all over the place all of the time, so I
decided we needed a library with them. In general, they are used for
security checks, so if you use them you want to link them static.
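
The commit message above describes a single shared library as the home for commonly needed hash functions. As a rough illustration of what that looks like from the caller's side, here is a minimal sketch using Python's standard hashlib module as a stand-in. This is an assumption made purely for illustration: libmd itself is a C library with its own interface, and the helper name hex_digests is hypothetical.

```python
import hashlib


def hex_digests(data: bytes, algorithms=("sha1", "sha256")) -> dict:
    """Return a mapping of algorithm name to hex digest for the given bytes.

    hex_digests is a hypothetical helper; only hashlib.new(), update() and
    hexdigest() come from the Python standard library.
    """
    results = {}
    for name in algorithms:
        hasher = hashlib.new(name)  # look the algorithm up by name
        hasher.update(data)         # feed the input bytes
        results[name] = hasher.hexdigest()
    return results


if __name__ == "__main__":
    # Hash the classic three-byte test input with each algorithm.
    for name, digest in hex_digests(b"abc").items():
        print(f"{name}: {digest}")
```

The caller names an algorithm and never carries its own copy of the implementation, which is the property the article finds missing at the C system-library level.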

© 2011 ACM 0001-0782/11/0300 $10.00

contributed articles
d oi:10.1145/1897852.1897871
Compose “dream tools” from continuously evolving bundles of software to make sense of complex scientific data sets.

By Katy Börner

Plug-and-Play Macroscopes

key insights
• OSGi/CIShell-powered tools improve decision making in e-science, government, industry, and education.
• Non-programmers can use OSGi/CIShell to assemble custom “dream tools.”
• New plug-ins are retrieved automatically via OSGi update services or shared via email and added manually; they can be plugged and played dynamically, without restarting the tool.

Decision making in science, industry, and politics, as well as in daily life, requires that we make sense of data sets representing the structure and dynamics of complex systems. Analysis, navigation, and management of these continuously evolving data sets require a new kind of data-analysis and visualization tool we call a macroscope (from the Greek macros, or “great,” and skopein, or “to observe”) inspired by de Rosnay’s futurist science writings.8

Just as the microscope made it possible for the naked human eye to see cells, microbes, and viruses, thereby advancing biology and medicine, and just as the telescope opened the human mind to the immensity of the cosmos and the conquest of space, the macroscope promises to help make sense of yet another dimension: the infinitely complex. Macroscopes provide a “vision of the whole,” helping us “synthesize” the related elements and detect patterns, trends, and outliers while granting access to myriad details.18,19 Rather than make things larger or smaller, macroscopes let us observe what is at once too great, slow, or complex for the human eye and mind to notice and comprehend.

Many of the best micro-, tele-, and macroscopes are designed by scientists keen to observe and comprehend what no one has seen or understood before. Galileo Galilei (1564–1642) recognized the potential of a spyglass for the study of the heavens, ground and polished his own lenses, and used the improved optical instruments to make discoveries like the moons of Jupiter, providing quantitative evidence for the Copernican theory. Today, scientists repurpose, extend, and invent new hardware and software to create macroscopes that may solve both local and global challenges20 (see the sidebar “Changing Scientific Landscape”).

My aim here is to inspire computer scientists to implement software frameworks that empower domain scientists to assemble their own continuously evolving macroscopes, adding and upgrading existing (and removing obsolete) plug-ins to arrive at a set that is truly relevant for their work, with little or no help from computer scientists. Some macroscopes may resemble cyberinfrastructures (CIs),1 providing user-friendly access to massive amounts of data, services, computing resources, and expert communities. Others may be Web services or standalone tools. While microscopes and telescopes are physical instruments, macroscopes resemble continuously changing bundles of software plug-ins. Macroscopes make it easy to select and combine not only algorithm and tool plug-ins but also interface plug-ins, workflow support, logging, scheduling, and other plug-ins needed for scientifically rigorous work. They make it easy to share

UCSD Map of Science with data overlays of MEDLINE publications that acknowledge NIH funding.

Courtesy of Cyberinfrastructure for Network Science Center, http://cns.iu.edu

plug-ins via email, flash drives, or online. To use new plug-ins, simply copy the files into the plug-in directory, and they appear in the tool menu ready for use. No restart of the tool is necessary. Sharing algorithm components, tools, and novel interfaces becomes as easy as sharing images on Flickr or videos on YouTube. Assembling custom tools is as quick as compiling your custom music collection.

The macroscopes presented here were built using the Open Services Gateway Initiative Framework (OSGi) industry standard and the Cyberinfrastructure Shell (CIShell) that supports integration of new and existing algorithms into simple yet powerful tools. As of January 2011, six different research communities were benefitting from OSGi- and/or CIShell-powered tools. Several other tool-development efforts are considering adoption.

Related Work
Diverse commercial and academic efforts support code sharing; here, I discuss those most relevant for the design and deployment of plug-and-play macroscopes:

Google Code and SourceForge.net provide the means to develop and distribute software; for example, in August 2009, SourceForge.net hosted more than 230,000 software projects by two million registered users (285,957 in January 2011); also in August 2009 ProgrammableWeb.com hosted 1,366 application programming interfaces and 4,092 mashups (2,699 APIs and 5,493 mashups in January 2011) that combine data or functionality from two or more sources to arrive at a service.

Web services convert any Web browser into a universal canvas for information and service delivery. In addition, there are diverse e-science infrastructures supporting researchers in the composition and execution of analysis and/or visualization pipelines or workflows. Among them are several cyberinfrastructures serving large biomedical communities: the cancer Biomedical Informatics Grid (caBIG) (http://cabig.nci.nih.gov); the Biomedical Informatics Research Network (BIRN) (http://nbirn.net); and the Informatics for Integrating Biology and the Bedside (i2b2) (https://www.i2b2.org). The HUBzero (http://hubzero.org) platform for scientific collaboration uses the Rappture toolkit to serve Java applets, employing the TeraGrid, the Open Science Grid, and other national grid-computing resources for extra cycles. The collaborative environment of myExperiment (http://myexperiment.org) (discussed later) supports the sharing of scientific workflows and other research objects.

Missing so far is a common standard for the design of modular, compatible algorithm and tool plug-ins (also called modules or components) that are easily combined into scientific workflows (also called pipelines or compositions). This leads to duplication of work, as even in the same project, different teams might develop several “plug-ins” that have almost identical functionality yet are incompatible. Plus, adding a new algorithm plug-in to an existing cyberinfrastructure or bundling and deploying a subset of plug-ins as a new tool/service requires extensive programming skills. Consequently, many innovative new algorithms are never integrated into common CIs and tools due to resource limitations.

Web sites like IBM’s Many Eyes (http://manyeyes.alphaworks.ibm.com/manyeyes/visualizations) and Swivel (http://swivel.com) demonstrate the power of community data sharing and visualization. In 2009 alone, Many Eyes


Figure 1. The NWB tool interface (I) with menu (a), Console (b), Scheduler (c), and Data Manager (d). The two visualizations of Renaissance
Florentine families used the GUESS tool plug-in (II) and prefuse.org algorithm plug-in (III). Nodes denote families labeled by name;
links represent marriage and business relationships among families. In GUESS, nodes are size-coded by wealth and color-coded by degree;
marriage relationships are in red using the Graph Modifier (d). The “Pazzi” family in (c) was selected to examine properties in the
Information Window (b).
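
The "copy the files into the plug-in directory, and they appear in the tool menu" behavior described above can be pictured with a small sketch. This is a toy model under stated assumptions, not how OSGi or CIShell actually detect plug-ins: the function name `discover_plugins`, the `.jar` filter, and the sample file names are all hypothetical.

```python
import tempfile
from pathlib import Path


def discover_plugins(plugin_dir, known):
    """Return names of .jar files in plugin_dir that are not yet in `known`.

    `known` is updated in place, so a later call reports only newer files.
    """
    found = sorted(p.name for p in Path(plugin_dir).glob("*.jar"))
    new = [name for name in found if name not in known]
    known.update(new)
    return new


def demo():
    """Simulate copying plug-ins into a directory while the tool keeps running."""
    with tempfile.TemporaryDirectory() as plugin_dir:
        known = set()
        menu = []

        # First scan: one plug-in is present at start-up.
        Path(plugin_dir, "layout_algorithm.jar").touch()
        menu.extend(discover_plugins(plugin_dir, known))

        # A user copies in another plug-in later; a rescan picks it up
        # without restarting anything.
        Path(plugin_dir, "burst_detection.jar").touch()
        Path(plugin_dir, "notes.txt").touch()  # ignored: not a .jar file
        menu.extend(discover_plugins(plugin_dir, known))
        return menu
```

Calling `demo()` returns the menu entries in the order the plug-ins were discovered. A real tool would presumably also validate each archive's metadata before offering it in a menu.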

had more than 66,429 data sets and 35,842 visualizations, while Swivel had 14,622 data sets and 1,949,355 graphs contributed and designed by 12,144 users. Both sites let users share data (not algorithms), generate and save different visualization types, and provide community support. By January 2011, the numbers for Many Eyes had increased to 165,124 data sets and 79,115 visualizations, while Swivel had ceased to exist.

Data analysis and visualization is also supported by commercial tools like Tableau (http://tableausoftware.com), Spotfire (http://spotfire.tibco.com), and free tools; see Börner et al.6 for a review of 20 tools and APIs. While they offer valuable functionality and are widely used in research, education, and industry, none makes it easy for users to share and bundle their algorithms into custom macroscopes.

Plug-and-Play Software Architectures
When discussing software architectures for plug-and-play macroscopes, it is beneficial to distinguish among: (1) the “core architecture” facilitating the plug-and-play of data sets and algorithms; (2) the “dynamic filling” of this core comprising the actual algorithm, tool, user interface, and other plug-ins; and (3) the bundling of all components into “custom tools.” To make all three parts work properly, it is important to understand who takes ownership of which ones and what general features are desirable (see the sidebar “Desirable Features and Key Decisions”).

Core architecture. To serve the needs of scientists (see both sidebars), the core architecture must empower non-programmers to plug, play, and share their algorithms and to design custom macroscopes and other tools. The solution proposed here is based on OSGi/CIShell:

Open Services Gateway Initiative. Developed by the OSGi Alliance (http://osgi.org), this service platform has


been used since 1999 in industry, including by Deutsche Telekom, Hitachi, IBM, Mitsubishi Electric, NEC, NTT, Oracle, Red Hat, SAP AG, and Siemens Enterprise Communications. It is a dynamic module system for Java, supporting interoperability of applications and services in a mature and comprehensive way with an effective yet efficient API. The platform is interface-based, easing plug-and-play integration of independent components by managing class and dependency issues when combining components. It is also dynamic; that is, new components can be added without stopping the program. It also comes with a built-in mechanism for retrieving new components through the Internet. As a service-oriented architecture, OSGi is an easy way to bundle and pipeline algorithms into “algorithm clouds.” A detailed description of the OSGi specification and existing reference implementations is beyond the scope of this article but can be explored through http://www.osgi.org/Specifications.

Leveraging OSGi provides access to a large amount of industry-standard code (prebuilt, pretested, continuously updated components) and know-how that would otherwise take years to reinvent/re-implement, thus helping reduce time to market and the cost of development and maintenance. OSGi bundles can be developed and run using a number of frameworks, including the Equinox project from Eclipse (http://eclipse.org/equinox), the reference implementation of the OSGi R4 core framework. Eclipse includes extensive add-ons for writing and debugging code, interacting with code repositories, bug tracking, and software profiling that greatly extend the base platform.

Cyberinfrastructure Shell (http://cishell.org). This open-source software specification adds “sockets” to OSGi into which data sets, algorithms, and tools can be plugged using a wizard-driven process.11 CIShell serves as a central controller for managing data sets and seamlessly exchanging data and parameters among various implementations of algorithms. It also defines a set of generic data-model APIs and persistence APIs. Extending the data-model APIs makes it possible to implement and integrate various data-model plug-ins. Each data model requires a “persister” plug-in to load, view, and save a data set from/to a data file in a specific format. Some data models lack a persister plug-in, instead converting data to or from some other data format that does have one. CIShell also defines a set of algorithm APIs that allows developers to develop and integrate diverse new or existing algorithms as plug-ins.

Though written in Java, CIShell supports integration of algorithms written in other programming languages, including C, C++, and Fortran. In practice, a pre-compiled algorithm must be wrapped as a plug-in that implements basic interfaces defined in the CIShell Core APIs. Pre-compiled algorithms can be integrated with CIShell by providing metadata about their input and output. Various templates are available for facilitating integration of algorithms into CIShell. A plug-in developer simply fills out a sequence of forms for creating a plug-in and exports it to the installation directory, and the new algorithm appears in the CIShell graphical user interface (GUI) menu. This way, any algorithm or tool that can be executed from a command line is easily converted into a CIShell-compatible plug-in.

CIShell’s reference implementation also includes a number of basic services, including a default menu-driven interface, work-log-tracking module, a data manager, and a scheduler (see Figure 1, left). Work logs, displayed in a console and saved in log files, comprise all algorithm calls and parameters used, references to original papers and online documentation, data loaded or simulated, and any errors. The algorithm scheduler shows all currently scheduled or running processes, along with their progress. CIShell can be deployed as a standalone tool or made available as either a Web or peer-to-peer service. The CIShell Algorithm Developer’s Guide7 details how to develop and integrate Java and non-Java algorithms or third-party libraries.

OSGi/CIShell combined. Software designed using OSGi/CIShell is mainly a set of Java Archive bundles, also called plug-ins. OSGi services, CIShell services, and data set/algorithm services all run in the OSGi container. The CIShell framework API is itself an OSGi bundle that does not register OSGi services, providing instead interfaces for data-set and algorithm services, basic services (such as logging and conversion), and application services (such as scheduler and data manager). Each bundle includes a manifest file with a dependency list stating which packages and other bundles it requires in order to run; all bundles are prioritized. Upon application start-up, the bundles with highest priority start first, followed by bundles of second, third, fourth, and lower priority. Bundles can also be started at runtime.

A bundle can create and register an object with the OSGi service registry under one or more interfaces. The services layer connects bundles dynamically by offering a “publish-find-bind” model for Java objects. Each service registration has a set of standard and custom properties. An expressive filter language is available to select relevant services. Services are dynamic; that is, bundles can be installed and uninstalled on the fly, while other bundles adapt, and the service registry accepts any object as a service. However, registering objects under (standard) interfaces (such as OSGi and CIShell) helps ensure reuse. Due to the declarative specification of bundle metadata, a distributed version of CIShell could be built without changing most algorithms.

The result is that domain scientists can mix and match data sets and algorithms, even adding them dynamically to their favorite tool. All plug-ins that agree on the CIShell interfaces can be run in software designed with the OSGi/CIShell core architecture. No common central data format is needed. Plug-ins can be shared in a flexible, decentralized fashion.

Dynamic filling. As of January 2011, the OSGi/CIShell plug-in pool included more than 230 plug-ins, including approximately 60 “core” OSGi/CIShell plug-ins and a “filling” of more than 170 algorithm plug-ins, plus 40 sample data sets, as well as configuration files and sample data files. Nearly 85% of the algorithm plug-ins are implemented in Java, 5% in Fortran, and the other 10% in C, C++, Jython, and OCaml; see http://cishell.wiki.cns.iu.edu.

Custom tools. The OSGi/CIShell framework is at the core of six plug-and-play tools that resemble simple macroscopes and serve different scientific communities; for example, the Information Visualization Cyberinfrastructure (IVC) was developed for research and education in information visualization; the Network Workbench (NWB) tool was designed for large-scale network analysis, modeling, and visualization; the Science of Science (Sci2) tool is used by science-of-science researchers, as well as by science-policy analysts; the Epidemics (EpiC) tool is being developed for epidemiologists; TEXTrend supports analysis of text; and DynaNets will be used to advance theory on network diffusion processes. Here, NWB and Sci2 are covered in detail:

The NWB tool (http://nwb.cns.iu.edu) supports the study of static and dynamic networks in biomedicine, physics, social science, and other research areas. It uses 39 OSGi plug-ins and 18 CIShell plug-ins as its core architecture; two of them define the functionality of the simple GUI in Figure 1 (I), top left, with the menu (I.a) for users to load data and run algorithms and tools. The Console (I.b) logs all data and algorithm operations, listing acknowledgment information on authors, programmers, and documentation URLs for each algorithm. The Data Manager (I.d) displays all currently loaded and available data sets. A Scheduler (I.c) lets users keep track of the progress of running algorithms. Worth noting is that the interface is easily branded or even replaced (such as with a command-line interface).

The NWB tool includes 21 converter plug-ins that help load data into in-memory objects or into formats the algorithms read behind the scenes. Most relevant for users are the algorithm plug-ins that can be divided into algorithms for preprocessing (19), analysis (56), modeling (10), and visualization (19). Three standalone tools, Discrete Network Dynamics (DND), GUESS, and GnuPlot, are available via the NWB menu system. GUESS is an exploratory data-analysis-and-visualization tool for graphs and networks (http://graphexploration.cond.org), as shown in Figure 1, II, containing a domain-specific embedded language called Gython (an extension of Python, or more specifically Jython) that supports the customization of graph designs. GnuPlot is a portable, command-line-driven, interactive plotting utility for data and related functions (http://gnuplot.info). NWB uses 15 supporting libraries, including Colt, JUNG, Jython, and Prefuse (see Prefuse layouts in Figure 1, III); detailed listings are provided in the NWB tutorial3 and wiki (http://nwb.wiki.cns.iu.edu).

A common network-science workflow includes data loading and/or modeling, preprocessing, analysis, visualization, and export of results (such as tables, plots, and images). More than 10 different algorithms may be run in one workflow, not counting data converters. Common workflows and references to peer-reviewed papers are given in Börner et al.3 Here are six exemplary NWB workflows from different application domains:
• Error-tolerance and attack-tolerance analysis in physics and computer science requires loading or modeling a network and deleting random nodes (such as by error) or deleting highly connected hub nodes (such as in an attack);
• Peer-to-peer network analysis in computer science can include simulation of various networks and an analysis of their properties;
• Temporal text analysis in linguistics, information science, and computer science might apply the burst-detection algorithm to identify a sudden increase in the usage frequency of words, with results visualized;
• Social-network analysis in social science, sociology, and scientometrics might compare properties of scholarly and friendship networks for the same set of people; the scholarly network can be derived from publications and the friendship network from data acquired via questionnaires;
• Discrete network dynamics (biology) can be studied through the DND tool, which bundles the steps of loading and modeling a multistate discrete network model, generating the model’s state-space graph, analyzing the attractors of the state space, and generating a visualization of an attractor basin; and
• Data conversion across sciences can use multiple converter algorithms to translate among more than 20 data formats.

Most workflows require serial application of algorithms developed in different areas of science and contributed by different users. Much of the related complexity is hidden; for example, users do not see how many converters are involved in workflow execution. Only those algorithms that can be applied to a currently selected data set can be selected and run, with all others grayed out. Expert-workflow templates and tutorials provide guidance through the vast space of possible algorithm combinations.

The Science of Science tool (http://sci2.cns.iu.edu). The Sci2 tool supports the study of science itself through scientific methods; science-of-science studies are also known as scientometric, bibliometric, or informetric studies. Research in social science, political science, physics, economics, and other areas further increases our understanding of the structure and dynamics of science.2,5,16 The tool supports the study of science at micro (individual), meso (institution, state), and global (all science, international) levels using temporal, geospatial, topical, network-analysis, and visualization techniques (http://sci2.wiki.cns.iu.edu).

Algorithms needed for these analyses are developed in diverse areas of science; for example, temporal-analysis algorithms come from statistics and computer science; geospatial-analysis algorithms from geography and cartography; semantic-analysis algorithms from cognitive science, linguistics, and machine learning; and network analysis and modeling from social science, physics, economics, Internet studies, and epidemiology. These areas have highly contrasting preferences for data formats, programming languages, and software licenses, yet the Sci2 tool presents them all through a single common interface thanks to its OSGi/CIShell core. Moreover, new algorithms are added easily; in order to read a novel data format, only one new converter must be implemented to convert the new format into an existing format.

Multiple workflows involve more data converters than algorithms, as multiple converters are needed to bridge output and input formats used by consecutive algorithms. Workflows are frequently rerun several times due to imperfect input data, to optimize parameter settings, or to compare different algorithms. Thanks to the Sci2 tool, an analysis that once required weeks


Figure 2. Exemplary Sci2 tool workflows: horizontal-bar-graph visualization of NSF funding for one investigator (I); circular layout
of a hierarchically clustered co-author network of network-science researchers, with zoom into Eugene Garfield’s network (II);
citation network of U.S. patents on RNAi and patents they cite, with highly cited patents labeled (III); and UCSD science base map
with overlay of publications by network-science researchers (IV).
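
The “publish-find-bind” model described earlier for the OSGi services layer can be illustrated with a toy registry. This is a minimal sketch, not the OSGi API: the class and method names (`ServiceRegistry`, `publish`, `find`, `unpublish`), the `DegreeCounter` service, and the property names are hypothetical, and simple equality matching stands in for OSGi's LDAP-style filter language.

```python
class ServiceRegistry:
    """Toy publish-find-bind registry (an illustration, not the OSGi API)."""

    def __init__(self):
        # Each entry: (interface names, properties, service object).
        self._entries = []

    def publish(self, service, interfaces, **properties):
        """Register `service` under one or more interface names."""
        entry = (frozenset(interfaces), dict(properties), service)
        self._entries.append(entry)
        return entry

    def unpublish(self, entry):
        """Remove a registration, as when a bundle is uninstalled at runtime."""
        self._entries.remove(entry)

    def find(self, interface, **wanted):
        """Return services under `interface` whose properties equal `wanted`."""
        return [
            service
            for interfaces, props, service in self._entries
            if interface in interfaces
            and all(props.get(key) == value for key, value in wanted.items())
        ]


class DegreeCounter:
    """Hypothetical algorithm service: counts links per node in an edge list."""

    def execute(self, edges):
        degree = {}
        for a, b in edges:
            degree[a] = degree.get(a, 0) + 1
            degree[b] = degree.get(b, 0) + 1
        return degree


registry = ServiceRegistry()
handle = registry.publish(
    DegreeCounter(), ["Algorithm"], in_format="edge-list", category="analysis"
)

# Find by interface plus property filter, then bind (call) the service.
matches = registry.find("Algorithm", in_format="edge-list")
result = matches[0].execute([("a", "b"), ("a", "c")])

# Services are dynamic: after removal, the same query finds nothing.
registry.unpublish(handle)
after_removal = registry.find("Algorithm", in_format="edge-list")
```

The caller never names the `DegreeCounter` class; it asks only for an interface and properties. That decoupling is what lets plug-ins be added or removed while the rest of the tool keeps running.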

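
The hidden converter chains mentioned for NWB and Sci2 workflows can be thought of as paths in a graph whose nodes are data formats and whose edges are converters. The sketch below uses breadth-first search to find a shortest chain; this is one plausible approach, not necessarily how CIShell selects converters, and the function name and format names are hypothetical.

```python
from collections import deque


def find_converter_chain(converters, source, target):
    """Return the shortest list of (from, to) converter steps, or None.

    `converters` is a list of (from_format, to_format) pairs.
    """
    graph = {}
    for src, dst in converters:
        graph.setdefault(src, []).append(dst)

    queue = deque([(source, [])])
    seen = {source}
    while queue:
        fmt, path = queue.popleft()
        if fmt == target:
            return path
        for nxt in graph.get(fmt, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [(fmt, nxt)]))
    return None


# Hypothetical format names, for illustration only.
CONVERTERS = [
    ("csv", "table"),
    ("table", "edge-list"),
    ("edge-list", "graph"),
    ("graphml", "graph"),
]

chain = find_converter_chain(CONVERTERS, "csv", "graph")

# Supporting a novel format needs only one new converter into an existing one.
extended = CONVERTERS + [("new-format", "csv")]
new_chain = find_converter_chain(extended, "new-format", "graph")
```

Under this model, a single new converter makes a novel format reachable from every algorithm the existing chains already serve, which matches the article's point about reading novel data formats.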

or months to set up and run can now be designed and optimized in a few hours. Users can also share, rerun, and improve automatically generated work logs. Workflows designed, validated, and published in peer-reviewed works can be used by science-policy analysts and policymakers alike. As of January 2011, the Sci2 tool was being used by the National Science Foundation, the National Institutes of Health, the U.S. Department of Energy, and private foundations adding novel plug-ins and workflows relevant for making decisions involving science policy.

The Sci2 tool supports many different analyses and visualizations used to communicate results to a range of stakeholders. Common workflows and references to peer-reviewed papers are given in Börner et al.3 and the Sci2 wiki (http://sci2.wiki.cns.iu.edu). Four sample studies are discussed here and included in Figure 2, I–IV:
• Funding portfolios (such as funding received by investigators and institutions, as well as provided by agencies) can be plotted using a horizontal bar graph (HBG); for example, all funding for one researcher was downloaded from the National Science Foundation Award Search site (http://nsf.gov/awardsearch), loaded into Sci2, and visualized in HBG, as in Figure 2, I. Each project is represented by a bar starting to the left at a certain start date and ending right at an end date, with bar width representing project duration. Bar-area size encodes a numeric property (here total awarded dollar amount), and equipment grants show as narrow bars of significant height. A label (here project name) is given to the left of the bar. Bars can be color-coded by award type (such as Small Business Innovation Research and Career);
• To look for collaboration patterns among major network-science researchers, the publications of four major researchers were downloaded from the Web of Science by Thomson Reuters (http://wokinfo.com). The data was then loaded into Sci2, the co-author network extracted, the Blondel community-detection algorithm applied to extract hierarchical clusters of the network, and the result laid out using the Circular Hierarchy visualization, with author names plotted in a circle and connecting lines representing co-author links (see Figure 2, II). Two of the researchers share a combined network, while the others are at the centers of unconnected networks. Also shown is a zoom into Eugene Garfield’s network;
• To understand what patents exist on the topic of RNA interference (RNAi) and how they built on prior work, data was retrieved from the Scholarly Database (http://sdb.cns.iu.edu).6 Specifically, a query was run over all text in the U.S. patent data set covering 1976–2010. The U.S. Patent and Trademark Office citation table was downloaded, read into the Sci2 tool, the patent-citation network extracted, the “indegree” (number of citations within the set) of all patent nodes calculated, and the network displayed in GUESS (see Figure 2, III). The network represents 37 patents (in red) matching the term RNAi and the 487 patents they cite (in orange). Nodes are size-coded by indegree (number of times a patent is cited); patents with at least five citations are labeled by their patent number. One of the most highly cited is no. 6506559 on “Genetic Inhibition by Double-Stranded RNA”; and
• The topical coverage of publication output is revealed using a base map of science (such as the University of California, San Diego map in Figure 2, IV). The map represents 13 major disciplines of science in a variety of colors, further subdivided into 554 research areas. Papers are matched to research areas via their journal names. Multiple journals are associated with each area, and highly interdisciplinary journals (such as Nature and Science) are fractionally associated with multiple areas.

Changing Scientific Landscape

As science becomes increasingly data driven and computational, as well as collaborative and interdisciplinary, there is increased demand for tools that are easy to extend, share, and customize:
• Star scientist → Research teams. Traditionally, science was driven by key scientists. Today, science is driven by collaborating co-author teams, often comprising experts from multiple disciplines and geospatial locations5,17;
• Users → Contributors. Web 2.0 technologies empower users to contribute to Wikipedia and exchange images, videos, and code via Flickr, YouTube, and SourceForge.net. Wikispecies, WikiProfessionals, and WikiProteins combine wiki and semantic technology to support real-time community annotation of scientific data sets14;
• Disciplinary → Cross-disciplinary. The best tools frequently borrow and synergistically combine methods and techniques from different disciplines of science, empowering interdisciplinary and/or international teams of researchers, practitioners, and educators to collectively fine-tune and interpret results;
• Single specimen → Data streams. Microscopes and telescopes were originally used to study one specimen at a time. Today, many researchers must make sense of massive data streams of multiple data types and formats and of different dynamics and origin; and
• Static instrument → Evolving cyberinfrastructure. The importance of hardware instruments that are static and expensive tends to decrease relative to software tools and services that are highly flexible and evolving to meet the needs of different sciences. Some of the most successful tools and services are decentralized, increasing scalability and fault tolerance.

Good software-development practices make it possible for “a million minds” to design flexible, scalable software that can be used by many:
• Modularity. Software modules with well-defined functionality that accept contributions from multiple users reduce costs and increase flexibility in tool development, augmentation, and customization;
• Standardization. Standards accelerate development, as existing code is leveraged, helping pool resources, support interoperability, and ease migration from research code to production code and hence the transfer of research results into industry applications and products; and
• Open data and open code. The practice of making data sets and code freely available allows users to check, improve, and repurpose data and code, easing replication of scientific studies.
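
The indegree step in the RNAi patent workflow above (count citations within the set, then label patents with at least five) is simple enough to sketch directly. This is a minimal illustration, not the Sci2 implementation: the function names are hypothetical and the citation pairs are made-up placeholders rather than real patent data.

```python
from collections import Counter


def indegree(citations):
    """Count how many times each patent is cited within the set.

    `citations` is a list of (citing_patent, cited_patent) pairs.
    """
    counts = Counter(cited for _citing, cited in citations)
    # Patents that cite others but are never cited get an indegree of 0.
    for citing, _cited in citations:
        counts.setdefault(citing, 0)
    return dict(counts)


def labeled(counts, threshold=5):
    """Return the patents whose indegree meets the labeling threshold."""
    return sorted(p for p, n in counts.items() if n >= threshold)


# Placeholder identifiers, for illustration only.
CITATIONS = [
    ("A", "X"), ("B", "X"), ("C", "X"), ("D", "X"), ("E", "X"),
    ("A", "Y"), ("B", "Y"),
]

counts = indegree(CITATIONS)
to_label = labeled(counts)
```

Here only `X` reaches the threshold of five citations, so it is the only node that would receive a label. Node size could then be scaled by the same counts.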


Circle size represents number of papers published per research area; the number of publications per discipline is given below the map. The knowledge input (such as in terms of read or cited papers) and output (such as in terms of published or funded papers) of an individual, institution, or country can be mapped to indicate core competencies. Most publication output of the four network-science researchers is in physics.

These and many other Sci2 analyses and corresponding visualizations are highly scalable; thousands of authors, references, and projects can be viewed simultaneously, and visualizations can be saved in vector format for further manipulation.

As the functionality of OSGi/CIShell-based software frameworks improves, and as the number and diversity of data-set and algorithm plug-ins increases, so too will the capabilities of custom macroscopes.

Macroscope Synergies
Just as the value of the earliest telephones increased in proportion to the number of people using them, plug-and-play macroscopes gain value relative to the increase in their core functionality; numbers of data-set and algorithm plug-ins; and the researchers, educators, and practitioners using and advancing them.

OSGi/CIShell-compliant plug-ins can be shared among tools and projects; for example, network-analysis algorithms implemented for the NWB tool can be shared as Java Archive files through email or other means, saved in the plug-in directory of another tool, and made available for execution in the menu system of that tool. Text-mining algorithms originally developed in TEXTrend (discussed later) can be plugged into the Sci2 tool to support semantic analysis of scholarly texts. Though National Science Foundation funding for the NWB tool formally ended in 2009, NWB’s functionality continues to increase, as plug-ins developed for other tools become available. Even if no project or agency were to fund the OSGi/CIShell core for some time, it would remain functional, because it is lightweight and easy to maintain. Finally, the true value of OSGi/CIShell is due to the continuously evolving algorithm filling and the “custom tools” continuously developed and shared by domain scientists.

Over the past five years, a number of projects have adopted OSGi (and in two cases, CIShell):
• Cytoscape (http://cytoscape.org). Led by Trey Ideker at the University of California, San Diego, this open-source bioinformatics software platform enables visualization of molecular-interaction networks, gene-expression profiles, and other state data.15 Inspired by a workshop on software infrastructures in July 2007 (https://nwb.slis.indiana.edu/events/ivsi2007), Mike Smoot and Bruce W. Herr implemented a proof-of-concept OSGi-based Cytoscape core several months later; OSGi bundles are available at http://chianti.ucsd.edu/svn/core3. Once the new Cytoscape 3.0 core is implemented (projected mid-2011), sharing plug-ins between the NWB tool and Cytoscape will be much easier, thereby extending the functionality and utility of both;
• Taverna Workbench (http://taverna.org.uk). Developed by the myGrid team (http://mygrid.org.uk) led by Carol Goble at the University of Manchester, U.K., this suite of free open-source software tools helps design and execute workflows,12 allowing users to integrate many different software tools, including more than 8,000 Web services from diverse domains, including chemistry, music, and social sciences. The workflows are designed in the Taverna Workbench and can then be run on a Taverna Engine, in the Workbench, on an external server, in a portal, on a computational grid, or on a compute cloud. Raven (a Taverna-specific classloader and registry mechanism) supports an extensible and flexible architecture (with approximately 20 plug-ins), but an implementation using an OSGi framework, with an alpha release, was scheduled for February 2011. The myExperiment (http://myexperiment.org) social Web site supports the finding and sharing of workflows and provides special support for Taverna workflows9;
• MAEviz (https://wiki.ncsa.uiuc.edu/display/MAE/Home). Managed by Shawn Hampton of the National Center for Supercomputing Applications, this open-source, extensible software platform supports seismic risk assessment based on Mid-America Earthquake Center research in the Consequence-Based Risk Management framework.10 It also uses the Eclipse Rich Client Platform, including Equinox, a com-

ma r c h 2 0 1 1 | vo l . 5 4 | n o. 3 | c o mmu n i c at i o n s o f t he acm 67
contributed articles

under development is Dyneta, which


uses OSGi/CIShell as its core to sup-
Desirable Features port the study of dynamically evolving
networks. The tool is able to generate
and Key Decisions networks corresponding to different
network models, execute a specific
The socio-technical design of plug-and-play software architectures involves major event chain on them, and analyze the
decisions based on domain requirements to arrive at powerful tools and services:
interplay of network structure and
• Division of labor. The design of a “core architecture” requires extensive computer dynamics at runtime. The tool will be
science expertise and a close collaboration with domain experts. Data-set and used to develop a theory of spreading
algorithm plug-ins—the “filling”—are typically provided by domain experts most
in networks (such as HIV infections
invested in the data and knowledgeable about the inner workings and utility of
different algorithms. The design of “custom tools” is best performed by domain and transmission of drug resistance).
experts, as only they have the expertise needed to bundle different plug-ins relevant An initial set of plug-ins is available
for diverse workflows. Technical manuals on how to use, improve, or extend the at http://egg.science.uva.nl/dynanets/
“core” need to be compiled by computer scientists, while data-set, algorithm, and
tool descriptions are written by domain experts; nightly/latest.
• Ease of use. As most plug-in contributions come from domain experts with limited As the functionality of OSGi/CIS-
programming skills, non-computer scientists must be able to contribute, share, and hell-based software frameworks im-
use plug-ins without having to write new code. What seems to work well is wizard- proves and the number and diversity
driven integration of algorithms and data sets, sharing of plug-ins through email
and online sites, deploying plug-ins by adding them to the “plug-in directory,” and of data-set and algorithm plug-ins in-
running them via a menu-driven user interface, as in word-processing systems and crease, so too will the capabilities of
Web browsers; custom macroscopes.
• Core vs. plug-ins. The “core architecture” and the plug-in filling can be implemented
as sets of plug-in bundles. Determining whether the graphical user interface,
interface menu, scheduler, and data manager should be part of the core or its filling Outlook
depends on the types of tools and services to be delivered; Instead of working at the Library of
• Plug-in content and interfaces. Should a plug-in be a single algorithm or an entire
Alexandria, the Large Hadron Col-
tool? What about data converters needed to make the output of one algorithm
compatible with the input of another algorithm? Should they be part of the lider, or any of the world’s largest
algorithm plug-in? Should they be packaged separately? What general interfaces optical telescopes, many researchers
are needed to communicate parameter settings, input, and output data? Answers have embraced Web 2.0 technology
are domain-specific, depending on existing tools and practices and the problems
as a way to access and share images
domain experts aim to solve;
• Supported (central) data models. Some tools (such as Cytoscape) use a central data and videos, along with data sets, al-
model to which all algorithms conform. Others (such as NWB and Sci2) support gorithms, and tools. They are learn-
many internal data models and provide an extensive set of data converters. The ing to navigate, manage, and utilize
former often speeds execution and visual rendering, and the latter eases integration
of new algorithms. In addition, most tools support an extensive set of input and the massive amounts of new data
output formats, since a tool that cannot read or write a desired data format is usually (streams), tools, services, results, and
of little use by domain experts; and expertise that become available every
• Supported platforms. Many domain experts are used to standalone tools (like MS
moment of every day. Computer sci-
Office and Adobe products) running on a specific operating system. A different
deployment (such as Web services) is necessary if the software is to be used via Web entists can help make this a produc-
interfaces. tive experience by empowering bi-
ologists, physicists, social scientists,
and others to share, reuse, combine,
ponent framework based on the OSGi incubator.apache.org/uima); the da- and extend existing algorithms and
standard (https://wiki.ncsa.uiuc.edu/ ta-mining, machine-learning, classi- tools across disciplinary and geospa-
display/MAE/OSGI+Plug-ins). fication, visualization toolset WEKA tial boundaries in support of scien-
˲˲ TEXTrend (http://textrend.org). Led (http://cs.waikato.ac.nz/ml/weka); tific discovery, product development,
by George Kampis at Eötvös Loránd Cytoscape; Arff2xgmml converter; R and education. Computer scientists
University, Budapest, Hungary, this (http://r-project.org) via iGgraph and will have succeeded in the design of
E.U.-funded project is developing a scripts (http://igraph.sourceforge. the “core architecture” if they are not
framework for flexible integration, net); yEd (http://yworks.com); and the needed for the filling or bundling of
configuration, and extension of plug- CFinder clique percolation-analysis- components into custom tools.
in-based components in support of and-visualization tool (http://cfinder. The Cyberinfrastructure for Net-
natural-language processing, classifi- org). In addition, TEXTrend extended work Science Center (http://cns.iu.edu)
cation/mining, and graph algorithms CIShell’s workflow support and now at Indiana University is working on
for analysis of business and govern- offers Web services to researchers. the following OSGi/CIShell core ex-
mental text corpuses with an inher- ˲˲ DynaNets (http://www.dynanets.org). tensions, as well as on more effective
ently temporal component.13 In 2009, Coordinated by Peter M.A. Sloot at the means for sharing data sets and algo-
TEXTrend adopted OSGi/CIShell as University of Amsterdam, The Nether- rithms via scholarly markets:
its core architecture and has since lands, DynaNets is an E.U.-funded proj- Modularity. The OSGi/CIShell core
added a number of plug-ins, includ- ect for studying and developing a new supports modularity at the algorithm
ing: the Unstructured Information paradigm of computing that employs level but not at the visualization level.
Management Architecture (http:// complex networks. One related tool Like the decomposition of workflows


into algorithm plug-ins, it is algorithmically possible to modularize visualization and interaction design. Future work will focus on developing “visualization layers” supporting selection and combination of reference systems, projections/distortions, graphic designs, clustering/grouping, and interactivity.

Streaming data. The number of data sets that are generated and must be understood in real time is increasing; examples are patient-surveillance data streams and models of epidemics that predict the numbers of susceptible, infected, and recovered individuals in a population over time. EpiC tool development funded by the National Institutes of Health contributes algorithms that read and/or output streams of data tuples, enabling algorithms to emit their results as they run, not only on completion. Data-graph visualizations plot these tuple streams in real time, resizing (shrinking) the temporal axis over time.

Web services. The OSGi/CIShell-based tools discussed here are standalone desktop applications supporting offline work on possibly sensitive data, using a GUI familiar to target users. However, some application domains also benefit from online deployment of macroscopes. While the OSGi specification provides basic support for Web services, CIShell must still be extended to make it easy for domain scientists to design their own macroscope Web services.

Incentive design. Many domain experts have trouble trying to use an evolving set of thousands of possibly relevant data sets that were compiled for specific studies, are of inconsistent quality and coverage, are saved in diverse formats, and are tagged using terminology specific to the original research domains. In addition, thousands of algorithms that support different functionality and diverse input and output formats are written in different languages by students and experts in a range of scientific domains and packaged as algorithms or tools using diverse licenses. More effective means are needed to help domain experts find the data sets and algorithms most relevant for their work, bundle them into efficient workflows, and relate the results to existing work. Scholarly markets resembling a Web 2.0 version of Craigslist.org can help ease the sharing, navigation, and utilization of scholarly data sets and algorithms, reinforcing reputation mechanisms by, say, providing ways to cite and acknowledge users who share, highlight the most downloaded and highest-rated contributions, and offer other means for making data sets, algorithms, workflows, and tutorials part of a valued scholarly record.

Acknowledgments
I would like to thank Micah Linnemeier and Russell J. Duhon for stimulating discussions and extensive comments. Bruce W. Herr II, George Kampis, Gregory J. E. Rawlins, Geoffrey Fox, Shawn Hampton, Carol Goble, Mike Smoot, Yanbo Han, and anonymous reviewers provided valuable input and comments to an earlier draft. I also thank the members of the Cyberinfrastructure for Network Science Center (http://cns.iu.edu), the Network Workbench team (http://nwb.cns.iu.edu), and the Science of Science project team (http://sci2.cns.iu.edu) for their contributions toward this work. Software development benefits greatly from the open-source community. Full software credits are distributed with the source, but I especially acknowledge Jython, JUNG, Prefuse, GUESS, GnuPlot, and OSGi, as well as Apache Derby, used in the Sci2 tool.

This research is based on work supported by National Science Foundation grants SBE-0738111, IIS-0513650, and IIS-0534909 and National Institutes of Health grants R21DA024259 and 5R01MH079068. Any opinions, findings, and conclusions or recommendations expressed here are those of the author and do not necessarily reflect the views of the National Science Foundation.

References
1. Atkins, D.E., Droegemeier, K.K., Feldman, S.I., Garcia-Molina, H., Klein, M.L., Messerschmitt, D.G., Messina, P., Ostriker, J.P., and Wright, M.H. Revolutionizing Science and Engineering Through Cyberinfrastructure. Report of the National Science Foundation Blue-Ribbon Advisory Panel on Cyberinfrastructure. National Science Foundation, Arlington, VA, 2003.
2. Börner, Katy. Atlas of Science: Visualizing What We Know. MIT Press, Cambridge, MA, 2010; supplemental material at http://scimaps.org/atlas
3. Börner, K., Barabási, A.-L., Schnell, S., Vespignani, A., Wasserman, S., Wernert, E.A., Balcan, D., Beiró, M., Biberstine, J., Duhon, R.J., Fortunato, S., Herr II, B.W., Hidalgo, C.A., Huang, W.B., Kelley, T., Linnemeier, M.W., McCranie, A., Markines, B., Phillips, P., Ramawat, M., Sabbineni, R., Tank, C., Terkhorn, F., and Thakre, V. Network Workbench Tool: User Manual 1.0.0., 2009; http://nwb.cns.iu.edu/Docs/NWBTool-Manual.pdf
4. Börner, K., Chen, C., and Boyack, K.W. Visualizing knowledge domains. In Annual Review of Information Science & Technology, B. Cronin, Ed. Information Today, Inc./American Society for Information Science and Technology, Medford, NJ, 2003, 179–255.
5. Börner, K., Dall’Asta, L., Ke, W., and Vespignani, A. Studying the emerging global brain: Analyzing and visualizing the impact of co-authorship teams. Complexity (Special Issue on Understanding Complex Systems) 10, 4 (Mar./Apr. 2005), 57–67.
6. Börner, K., Huang, W.B., Linnemeier, M., Duhon, R.J., Phillips, P., Ma, N., Zoss, A., Guo, H., and Price, M.A. Rete-Netzwerk-Red: Analyzing and visualizing scholarly networks using the Network Workbench tool. Scientometrics 83, 3 (June 2010), 863–876.
7. Cyberinfrastructure for Network Science Center. Cyberinfrastructure Shell (CIShell) Algorithm Developer’s Guide, 2009; http://cishell.wiki.cns.iu.edu
8. de Rosnay, J. Le Macroscope: Vers une Vision Globale. Editions du Seuil. Harper & Row Publishers, Inc., New York, 1975.
9. De Roure, D., Goble, C., and Stevens, R. The design and realisation of the myExperiment Virtual Research Environment for Social Sharing of Workflows. Future-Generation Computer Systems 25 (2009), 561–567.
10. Elnashai, A., Hampton, S., Lee, J.S., McLaren, T., Myers, J.D., Navarro, C., Spencer, B., and Tolbert, N. Architectural overview of MAEviz–HAZTURK. Journal of Earthquake Engineering 12, 1 Suppl.2, 01 (2008), 92–99.
11. Herr II, B.W., Huang, W.B., Penumarthy, S., and Börner, K. Designing highly flexible and usable cyberinfrastructures for convergence. In Progress in Convergence: Technologies for Human Wellbeing, W.S. Bainbridge and M.C. Roco, Eds. Annals of the New York Academy of Sciences, Boston, 2007, 161–179.
12. Hull, D., Wolstencroft, K., Stevens, R., Goble, C., Pocock, M.R., Li, P., and Oinn, T. Taverna: A tool for building and running workflows of services. Nucleic Acids Research (Web Server Issue) 34, Suppl. 2 (July 1, 2006), W729–W732.
13. Kampis, G., Gulyas, L., Szaszi, Z., and Szakolczi, Z. Dynamic social networks and the TEXTrend/CIShell framework. Presented at the Conference on Applied Social Network Analysis (University of Zürich, Aug. 27–28). ETH Zürich, Zürich, Switzerland, 2009.
14. Mons, B., Ashburner, M., Chicester, C., Van Mulligen, E., Weeber, M., den Dunnen, J., van Ommen, G.-J., Musen, M., Cockerill, M., Hermjakob, H., Mons, A., Packer, A., Pacheco, R., Lewis, S., Berkeley, A., Melton, W., Barris, N., Wales, J., Mejissen, G., Moeller, E., Roes, P.J., Börner, K., and Bairoch, A. Calling on a million minds for community annotation in WikiProteins. Genome Biology 9, 5 (2008), R89.
15. Shannon, P., Markiel, A., Ozier, O., Baliga, N.S., Wang, J.T., Ramage, D., Amin, N., Schwikowski, B., and Ideker, T. Cytoscape: A software environment for integrating models of biomolecular interaction networks. Genome Research 13, 11 (2003), 2498–2504.
16. Shiffrin, R. and Börner, K. Mapping knowledge domains. Proceedings of the National Academy of Sciences 101, Suppl. 1 (Apr. 2004), 5183–5185.
17. Shneiderman, B. Science 2.0. Science 319, 5868 (Mar. 2008), 1349–1350.
18. Shneiderman, B. The eyes have it: A task by data type taxonomy for information visualizations. In Proceedings of the IEEE Symposium on Visual Languages (Boulder, CO, Sept. 3–6). IEEE Computer Society, Washington, D.C., 1996, 336–343.
19. Thomas, J.J. and Cook, K.A., Eds. Illuminating the Path: The Research and Development Agenda for Visual Analytics. National Visualization and Analytics Center, Richland, WA, 2005; http://nvac.pnl.gov/agenda.stm
20. World Bank and International Monetary Fund. Global Monitoring Report 2009: A Development Emergency. The World Bank, Washington, D.C., 2009.

Katy Börner (katy@indiana.edu) is the Victor H. Yngve Professor of Information Science at the School of Library and Information Science, Adjunct Professor at the School of Informatics and Computing, and Founding Director of the Cyberinfrastructure for Network Science Center (http://cns.iu.edu) at Indiana University, Bloomington, IN.

© 2011 ACM 0001-0782/11/0300 $10.00
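
The “Streaming data” extension discussed in the Outlook (algorithms that emit data tuples as they run, for example epidemic models tracking susceptible, infected, and recovered individuals) can be illustrated with a minimal Python sketch. This is not CIShell or EpiC code: the function names `sir_stream` and `running_peak`, the discrete-time update rule, and all parameter values are assumptions chosen purely for illustration.

```python
from typing import Iterable, Iterator, Tuple

# One stream element: (time step, susceptible, infected, recovered).
SirTuple = Tuple[int, float, float, float]


def sir_stream(population: int, initial_infected: int,
               beta: float, gamma: float, steps: int) -> Iterator[SirTuple]:
    """Hypothetical discrete-time SIR model that yields one tuple per step.

    Because this is a generator, a consumer sees each result as soon as it
    is computed, rather than waiting for the whole simulation to finish.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    for t in range(steps + 1):
        yield t, s, i, r
        # Clamp so the susceptible count can never go negative.
        new_infections = min(s, beta * s * i / population)
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries


def running_peak(stream: Iterable[SirTuple]) -> Iterator[Tuple[int, float]]:
    """Downstream 'algorithm' that reads a tuple stream and emits its own.

    For every incoming tuple it emits (t, largest infected count so far).
    """
    peak = 0.0
    for t, _s, i, _r in stream:
        peak = max(peak, i)
        yield t, peak


if __name__ == "__main__":
    # Chain the two stages; each line is printed as soon as it is available.
    for t, peak in running_peak(sir_stream(1000, 10, 0.3, 0.1, 20)):
        print(t, round(peak, 1))
```

Chaining generators in this way mirrors the idea of plug-ins that read and/or output tuple streams: a visualization stage could consume `running_peak` in the same manner and redraw after every tuple.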

doi:10.1145/1897852.1897872
Understanding Scam Victims: Seven Principles for Systems Security

Effective countermeasures depend on first understanding how users naturally fall victim to fraudsters.

By Frank Stajano and Paul Wilson

Key insights
• We observed and documented hundreds of frauds, but almost all of them can be reduced to a handful of general principles that explain what victims fall for.
• These principles cause vulnerabilities in computer systems but were exploited by fraudsters for centuries before computers were invented and are rooted in human nature.
• Users fall prey to these principles not because they are gullible but because they are human. Instead of blaming users, understand that these inherent vulnerabilities exist, then make your system robust despite them.

From a holistic security engineering point of view, real-world systems are often vulnerable to attack despite being protected by elaborate technical safeguards. The weakest point in any security-strengthened system is usually its human element; an attack is possible because the designers thought only about their strategy for responding to threats, without anticipating how real users would react.

We need to understand how users behave and what traits of that behavior make them vulnerable, then design systems security around them. To gain this knowledge, we examine a variety of scams, distilling some general principles of human behavior that explain why the scams work; we then show how they also apply to broader attacks on computer systems insofar as they involve humans. Awareness of the aspects of human psychology exploited by con artists helps not only the public avoid these particular scams but also security engineers build more robust systems.

Over nine series of the BBC TV documentary The Real Hustle (http://www.bbc.co.uk/realhustle/), Paul Wilson and Alexis Conran researched the scams most commonly carried out in Britain and, with Jessica-Jane Clement, replicated hundreds of them on unsuspecting victims while filming the action with hidden cameras. The victims were later debriefed, given their money back, and asked for their consent to publish the footage so others would learn not to fall for the same scams (see the sidebar “Representative Scams,” to which we refer throughout the main text).

The objective of the TV show was to help viewers avoid being ripped off by similar scams. Can security researchers do more? By carefully dissecting dozens of scams, we extracted seven recurring behavioral patterns and related principles exhibited by victims and exploited by hustlers. These patterns are not confined to small-scale opportunistic scams (known as “short cons”); they are inherent security vulnerabilities of the human element in any complex system. The security engineer must understand them thoroughly and consider their implications toward computer and system security.

Photograph by Collin Parker

Distraction Principle
While we are distracted by what grabs our interest, hustlers can do anything to us and we won’t notice.

The young lady who falls prey to the recruitment scam is so engrossed in her job-finding task that she totally fails to even suspect that the whole agency might be a fraud.

Distraction is at the heart of innumerable fraud scenarios. It is also a fundamental ingredient of most magic performances,5 which is not surprising if we see such performances as a “benign fraud” for entertainment purposes. Distraction is used in all cases involving sleight of hand, including pickpocketing and the special “throw” found in the Monte.

The very presence of “sexy swindler” Jess among the hustlers is owed to Distraction, as well as to Need and Greed (discussed later), since sex is such a fundamental human drive. The 2000 computer worm “ILOVEYOU,” which reportedly caused $5 billion–$8 billion in damage worldwide, exploited these two principles.

In computing, the well-known tension between security and usability is also related to Distraction. Users care only about what they want to access and are essentially blind to the fact that “the annoying security gobbledygook” is there to protect them. Smart crooks exploit this mismatch to their advantage; a lock that is inconvenient to use is often left open.

Distraction also plays a role in the “419,” or Nigerian, scam. The hustler, posing as a Nigerian government officer with access to tens of millions of dollars of dodgy money, wants the mark to help transfer the money out of the country in exchange for a slice of it. When the mark accepts the deal, the hustler demands some amount of advance money to cover expenses. New unexpected expenses come up repeatedly, always with the promise that the money is just about to be transferred. These “convincers” keep the mark focused solely on the huge sum he is promised to receive.

Is it only unsophisticated 419 victims who are gullible? Abagnale1 showed that the Distraction principle works equally well on highly educated CTOs and CIOs. In 1999, he visited a company full of programmers frantically fixing code to avert the Y2K bug. He asked the executives how they found all the programmers and was told “these guys from India” knew computers well and were inexpensive. But, Abagnale thought, any dishonest programmer from an offshore firm fixing Y2K problems could also easily implant a backdoor… People focused on what they want to do are distracted from the task of protecting themselves. Security engineers who don’t understand this principle have already lost the battle.

Social Compliance Principle
Society trains people to not question authority. Hustlers exploit this “suspension of suspiciousness” to make us do what they want.

The jeweler in a jewelry-shop scam gratefully hands over the necklace and cash when “policeman” Alex says they’re needed as evidence, believing his assurance that they will be returned later.

Access control to sensitive databases may involve an exploitable human element. For example, social-engineering-expert Mitnick7 impersonates a policeman to nothing less than a law-enforcement agency. He builds up credibility and trust by exhibiting knowledge of the lingo, procedures, and phone numbers. He makes the clerk consult the National Crime Information Center database and acquires confidential information about a chosen victim. His insightful observation is that the police and military, far from being a tougher target, are inherently more vulnerable to social engineering as a consequence of their strongly ingrained respect for rank.

Social Compliance is the foundation for phishing. For example, our banks, which hold all our money, order us to type our password, and, naturally, we do. It’s difficult to fault nontechnical users on this one if they fail to notice the site was only a lookalike. Note the conflict between a bank’s security department telling customers “never click on email links” and the marketing department of the same bank sending them clickable email advertisements for new financial products, putting the customers in double jeopardy.

System architects must coherently align incentives and liabilities with overall system goals. If users are expected to perform sanity checks rather than blindly follow orders, then social protocols must allow “challenging the authority”; if, on the contrary, users are expected to obey authority unquestioningly, those with authority must relieve them of liability if they obey a fraudster. The fight against phishing and all other forms of social engineering can never be won unless this principle is understood.

Herd Principle
Even suspicious marks let their guard down when everyone around them appears to share the same risks. Safety in numbers? Not if they’re all conspiring against us.

In the Monte, most participants are shills. The whole game is set up to give the mark confidence and make him think: “Yes, the game looks dodgy, but other people are winning money,” and “Yes, the game looks difficult, but I did guess where the winning disc was, even if that guy lost.” Shills are a key ingredient.

In online auctions, a variety of frauds are possible if bidders are in cahoots with the auctioneer. eBay pioneered a reputation system in which bidders and auctioneers rate each other through public feedback. But fraudsters might boost their reputations through successful transactions with shills. Basic reputation systems are largely ineffective against shills.

In online communities and social networks, multiple aliases created by certain participants to give the impression that others share their opinions are known as “sock-puppets.” In political elections, introducing fake identities to simulate grass-roots support for a candidate is called “astroturfing.” In reputation systems in peer-to-peer networks, as opposed to reputation systems in human communities, multiple entities controlled by the same attacker are called “Sybils.” The variety of terms created for different contexts testifies to the wide applicability of the Herd principle to many kinds of multi-user systems.

Principles to which victims respond, as identified by three sets of researchers.
Columns: Principle; Cialdini (1985–2009); Lea et al. (2009); Stajano-Wilson (2009)
Distraction ~ 
Social Compliance (a.k.a. “Authority”)   
Herd (a.k.a. “Social Proof”)  
Dishonesty 
Kindness ~ 
Need and Greed (a.k.a. “Visceral Triggers”) ~  
Scarcity (related to our “Time”)   ~
Commitment and Consistency  
Reciprocation  ~
 First identified this principle
 Also lists this principle
~ Lists a related principle

Dishonesty Principle
Our own inner larceny is what hooks us initially. Thereafter, anything illegal we do will be used against us by fraudsters.

In the Monte, the shills encourage the mark to cheat the operator and even help him do it. Then, having fleeced the mark, the operator pretends to notice the mark’s attempt at cheating, using it as a reason for closing the game without giving him a chance to argue.

When hustlers sell stolen goods, the implied message is “It’s illegal; that’s why you’re getting such a good deal,” so marks won’t go to the police once they discover they’ve been had.

The Dishonesty Principle is at the core of the 419; once a mark realizes it’s a scam, calling the police is scary because the mark’s part of the deal (essentially money laundering) was in itself illegal and punishable. Several victims have gone bankrupt, and some have even committed suicide, seeing no way out of this tunnel.

The security engineer must be aware of the Dishonesty Principle. A


number of attacks on the system go unreported because the victims won’t confess to their “evil” part in the process. When corporate users fall prey to a Trojan horse program purporting to offer, say, free access to porn, they have strong incentives not to cooperate with the forensic investigations of system administrators to avoid the associated stigma, even if the incident affected the security of the whole corporate network. Executives for whom righteousness is not as important as the security of their enterprise might consider reflecting such priorities in the corporate security policy, perhaps by guaranteeing discretion and immunity from “internal prosecution” for victims who cooperate with forensic investigations.

Kindness Principle
People are fundamentally nice and willing to help. Hustlers shamelessly take advantage of it.

This principle is, in some sense, the dual of the Dishonesty Principle, as perfectly demonstrated by the Good Samaritan scam. In it, marks are hustled primarily because they volunteer to help. It is loosely related to Cialdini’s Reciprocation Principle (people return favors)2 but applies even in the absence of a “first move” from the hustler. A variety of scams that propagate through email or social networks involve tear-jerking personal stories or follow disaster news (tsunami, earthquake, hurricane), taking advantage of generous but naïve recipients who follow their spontaneous kindness before suspecting anything. Many “social engineering” penetrations of computer systems7 also rely on victims’ innate helpfulness.

Need and Greed Principle
Our needs and desires make us vulnerable. Once hustlers know what we want, they can easily manipulate us.

Loewenstein4 speaks of “visceral factors such as the cravings associated with drug addiction, drive states (such as hunger, thirst, and sexual desire), moods and emotions, and physical

his or her personal situation; if the mark is on the verge of bankruptcy, needs major surgery, or is otherwise in dire straits, then questioning the offer of a solution is very difficult. In such cases the mark is not greedy, just depressed and hopeful. If someone prays every day for an answer, an email message from a Nigerian prince might seem like the heaven-sent solution.

The inclusion of sexual appetite as a fundamental human need justifies, through this principle, the presence of a “sexy swindler” in most scams enacted by “the trio.” As noted, the Need and Greed Principle and the Distraction Principle are often connected; victims are distracted by (and toward) that which they desire. This drive is exploited by a vast proportion of fraudulent email messages (such as those involving length enhancers, dates with attractive prospects, viruses, and Trojans, including ILOVEYOU).

An enlightened system administrator once unofficially provided a few gigabytes of soft porn on an intranet server in order to make it unnecessary for local users to go looking for such material on dodgy sites outside the corporate firewall, thereby reducing at the same time connection charges and exposure to malware.

If we want to con someone, all we need to know is what they want, even if it doesn’t exist. If security engineers do not understand what users want, and that they want it so badly they’ll go to any lengths to get it, then they won’t understand what drives users and won’t be able to predict their behavior. Engineers always lose against fraudsters who do understand how they can lead their marks. This brings us back to the security/usability trade-off: Lecturing users about disabling ActiveX or Flash or JavaScript from untrusted sites is pointless if these software components are required to access what users want or need (such as their online social network site or online banking site or online tax return site). Fraudsters must merely promise some enticing content to enroll users as unwitting

Time Principle
When under time pressure to make an important choice, we use a different decision strategy, and hustlers steer us toward one involving less reasoning.

In the ring-reward rip-off, the mark is made to believe he must act quickly or lose the opportunity. When caught in such a trap, it’s very difficult for people to stop and assess the situation properly.

In contrast to the theory of rational choice, which holds that humans make decisions after seeking the optimal solution based on all the available information, Simon8 suggested that “organisms adapt well enough to ‘satisfice’; they do not, in general, ‘optimize’.” They may “satisfice,” or reach a “good-enough” solution, through simplifying heuristics rather than the complex, reasoned strategies needed for finding the best solution, even though heuristics occasionally fail, as studied by Tversky and Kahneman.10

Though hustlers may have never formally studied the psychology of decision making, they intuitively understand the shift. They know that, when forced to take a decision quickly, a mark will not think clearly, acting on impulse according to predictable patterns. So they make their marks an offer they can’t refuse, making it clear to them that it’s their only chance to accept it. This pattern is evident in the 419 scam and in phishing (“You’ll lose access to your bank account if you don’t confirm your credentials immediately”) but also in various email offers and limited-time discounts in the gray area between acceptable marketing techniques and outright swindle. As modern computerized marketing relies more and more on profiling individual consumers to figure out how to press their buttons, we might periodically have to revise our opinions about which sales methods, while not yet illegal, are ethically acceptable.

From a systems point of view, the Time Principle is particularly important, highlighting that, due to the human element, the system’s response
pain.” We say “Need and Greed” to re- accomplices who unlock the doors to the same stimulus may be radically
fer to this spectrum of human needs from inside. different depending on the urgency
and desires—all the stuff we really The defense strategy should also in- with which it is requested. In military
want, regardless of moral judgement. clude user education; as the Real Hustle contexts this is taken into account by
In the 419 scam, what matters most is TV show often says, “If it sounds too wrapping dangerous situations that re-
not necessarily the mark’s greed but good to be true, it probably is.” quire rapid response (such as challeng-


Representative Scams
Since 2006, the Real Hustle TV show has recreated hundreds of scams during which Paul, Alex, and Jess defrauded unsuspecting victims before hidden cameras. Here are five instructive ones:
In the lingo of this peculiar “trade,” the victim of the scam is the mark, the perpetrator is the operator, and any accomplice pretending to be a regular customer is a shill.
Monte. This classic scam involves an operator manipulating three cards (or disks or shells: there are many variations), one of which wins, while the other two lose. The operator shows the player the cards, turns them over face down, then moves them around on the table in full view. Players must follow the moves and put money on the card they believe to be the winner. The operator pays out an equal amount if the player guessed correctly or otherwise pockets the player’s money.
Technically, at the core of the scam is a sleight-of-hand trick whereby the operator undetectably switches two cards. One might therefore imagine the basic scam to consist of performing a few “demo runs” where marks are allowed to guess correctly, then have them bet with real money and at that point send the winning card elsewhere.
But this so-called “game” is really a cleverly structured piece of street theater designed to attract passersby and hook them into the action. The sleight-of-hand element is actually least important; it is the way marks are manipulated, rather than the props, that brings in the money. It’s all about the crowd of onlookers and players (all shills) betting in a frenzy and irresistibly sucking marks into wanting a piece of the action.
The Monte is an excellent example that nothing is what it seems, even if the marks think they know what to expect. Many people claim to be able to beat the game, purely because they understand the mechanics of the secret move. But it’s impossible to tell whether an experienced operator has made the switch. More important, even if the cards were marked in some way, there is absolutely no way for a legitimate player to secure a win; should a mark consistently bet on the correct position, then other players, actually shills, would over-bet him, “forcing” the operator to take the larger bet. This frustrates the mark, who often increases his bet to avoid being topped. One shill will then pretend to help the mark by bending a corner of the winning card while the operator is distracted, making the mark think he has an unbeatable advantage. This is a very strong play; marks have been seen to drop thousands of dollars only to find the bent card is actually a loser. While mixing the cards, it is possible for a skilled operator to switch the cards and switch the bend from one card to another.
The idea that one can beat the game at all reveals a key misunderstanding: that, in fact, it is not a game in the first place. Monte mobs never pay out to the

All images courtesy of Objective Productions

From right to left: Paul, with Alex as a shill, scams two marks at the three-shells game (one of several variants of the Monte).
From right to left: Paul and Alex haggle with the mark over the reward in the Ring Reward Rip-off.
Alex, flashing a fake police badge, pretends to arrest Jess in the Jewelry Shop Scam.

ing strangers at a checkpoint or being ordered to launch a nuclear missile) in special “human protocols” meant to enforce, even under time pressure, some of the step-by-step rational checks the heuristic strategy would otherwise omit.
The security architect must identify the situations in which the humans in the system may suddenly be put under time pressure by an attacker and whether the resulting switch in decision strategy might open a vulnerability. This directive applies to anything from retail situations to stock trading and online auctions and from admitting visitors into buildings to handling medical emergencies. Devising a human protocol to guide and pace the response of the potential victim toward the desired goal may be an adequate safeguard and also relieve the victim from stressful responsibility.

Related Work
While a few narrative accounts of scams and frauds are available, from Maurer’s study of the criminal world6 that inspired the 1973 movie The Sting to the autobiographical works of notable fraudsters,1,7 the literature contains little about systematic studies of fraudsters’ psychological techniques. But we found two notable exceptions: Cialdini’s outstanding book Influence: Science and Practice,2 based on undercover field research, revealed how salespeople’s “weapons of influence” are remarkably similar to those of fraudsters; indeed, all of his principles apply to our scenario and vice versa. Meanwhile, Lea et al.3 examined postal scams, based on a wealth of experimental data, including interviews with victims and lexical analysis of fraudulent letters. Even though our approaches were quite different, our findings are in substantial agreement. The table here summarizes and compares the principles identified in each of these works.

Conclusion
We supported our thesis (that systems involving people can be made secure only if designers understand and acknowledge the inherent vulnerabilities of the “human factor”) with three main contributions:
First is a vast body of original research on scams, initially put together by Wilson and Conran. It started as a TV show, not as a controlled scientific experiment, but our representative write-up9 still offers valuable firsthand data not otherwise available in the literature;
Second, from these hundreds of scams, we abstracted seven principles. The particular principles are not that important, and others have found


marks; they keep all the money moving between the shills and the operator. The marks are allowed to place a bet only if it’s already a loser. Having studied Monte all over the world, we can say it’s nothing short of a polite way to mug people.
Ring reward rip-off. The gorgeous Jess buys a cheap ring from a market stall for $5. She then goes to a pub and seductively befriends the barman (the mark). She makes it obvious she’s very rich; showing off to her friend (a shill), she makes sure the mark overhears that she just received this amazing $3,500 diamond ring for her birthday. She then leaves.
Paul and Alex arrive at the pub, posing as two blokes having a pint. Jess then phones the pub, very worried, calling her friend the barman by name, saying she lost that very precious ring. Could he check if it’s there somewhere? The mark checks, and, luckily, a customer (Paul) found the ring. However, instead of handing it over, Paul demands a reward. The barman gets back to the phone and Jess, very relieved to hear the ring is there, says, without prompting, that she’ll give $200 to the person who found it. But the barman goes back to Paul and says the reward is only $20. That’s when the hustlers know they’ve got him; he’s trying to make some profit for himself. Paul haggles a bit and eventually returns the ring to the barman for $50. The mark is all too happy to advance the money to Paul, expecting to get much more from Jess. Jess, of course, never calls back.
A convicted criminal proudly says he once made a $2,000 profit with this particular hustle.
Jewelry-shop scam. Jess attempts to buy an expensive necklace but is “arrested” by Alex and Paul posing as plainclothes police officers who expose her as a well-known fraudster, notorious for paying with counterfeit cash. The “cops” collect as evidence the “counterfeit” (actually genuine) cash and, crucially, the necklace, which will, of course, “be returned.” The jeweler is extremely grateful the cops saved her from the evil fraudster.
Ironically, as Jess is taken away in handcuffs, the upset jeweler spits out a venomous “Bitch! You could have cost me my job. You know that?”
Recruitment scam. Hustlers set up a fake recruitment agency and, as part of the sign-on procedure, collect all of the applicants’ personal details, including mother’s maiden name, date of birth, bank-account details, passport number, even PIN (by asking them to protect their data with a four-digit code, as many people memorize only one PIN and use it for everything). With this loot, the hustlers are free to engage in identity theft on everyone who came in for an interview.
Good Samaritan scam. In a parking lot, Jess has jacked up her car but seems stuck. When another car stops nearby, she politely asks the newcomers to help her change the tire, which they do. Apologizing for her cheekiness, she then also asks them if she could get into their car, as she’s been out in the cold for a while and is freezing. The gentleman gives her the keys to his car (required to turn on the heat) and, while the marks are busy changing her tire, she drives off with the car. But didn’t Jess just lose her original car? No, because it wasn’t hers to start with; she just jacked up a random one in the parking lot. To add insult to injury, the marks will also have some explaining to do when the real owners of the car arrive.

A mark, debriefed by accompanying TV crew, is dismayed to learn the hustlers just got hold of all her sensitive personal details in the Recruitment Scam.
From right to left: Jess gets two marks to change her tire before tricking them into handing over their own car keys in the Good Samaritan Scam.

slightly different ones. What matters is recognizing the existence of a small set of behavioral patterns that ordinary people exhibit and that hustlers have been exploiting forever; and
Third, perhaps most significant, we applied the principles to a more general systems point of view. The behavioral patterns are not just opportunities for small-scale hustles but also vulnerabilities of the human component of any complex system.
Our message for the system-security architect is that it is naïve to lay blame on users and whine, “The system I designed would be secure, if only users were less gullible.” The wise security designer seeking a robust solution will acknowledge the existence of these vulnerabilities as an unavoidable consequence of human nature and actively build safeguards that prevent their exploitation.

Acknowledgments
Special thanks to Alex Conran for co-writing the TV series and to Alex and Jess Clement for co-starring in it. Thanks to Joe Bonneau, danah boyd, Omar Choudary, Saar Drimer, Jeff Hancock, David Livingstone Smith, Ford-Long Wong, Ross Anderson, Stuart Wray, and especially Roberto Viviani for useful comments on previous drafts. This article is updated and abridged from the 2009 technical report9 by the same authors.

References
1. Abagnale, F.W. The Art of the Steal: How to Protect Yourself and Your Business from Fraud. Broadway Books, New York, 2001.
2. Cialdini, R.B. Influence: Science and Practice, Fifth Edition. Pearson, Boston, MA, 2009; (First Edition 1985).
3. Lea et al. The Psychology of Scams: Provoking and Committing Errors of Judgement. Technical Report OFT1070. University of Exeter School of Psychology. Office of Fair Trading, London, U.K., May 2009.
4. Loewenstein, G. Out of control: Visceral influences on behavior. Organizational Behavior and Human Decision Processes 65, 3 (Mar. 1996), 272–292.
5. Macknik, S.L., King, M., Randi, J., Robbins, A., Teller, Thompson, J., and Martinez-Conde, S. Attention and awareness in stage magic: Turning tricks into research. Nature Reviews Neuroscience 9, 11 (Nov. 2008), 871–879.
6. Maurer, D.W. The Big Con: The Story of the Confidence Man. Bobbs-Merrill, New York, 1940.
7. Mitnick, K.D. The Art of Deception: Controlling the Human Element of Security. John Wiley & Sons, Inc., New York, 2002.
8. Simon, H.A. Rational choice and the structure of the environment. Psychological Review 63, 2 (Mar. 1956), 129–138.
9. Stajano, F. and Wilson, P. Understanding Scam Victims: Seven Principles for Systems Security. Technical Report UCAM-CL-TR-754. University of Cambridge Computer Laboratory, Cambridge, U.K, 2009.
10. Tversky, A. and Kahneman, D. Judgment under uncertainty: Heuristics and biases. Science 185, 4157 (Sept. 1974), 1124–1131.

Frank Stajano (frank.stajano@cl.cam.ac.uk) is a university senior lecturer in the Computer Laboratory of the University of Cambridge, Cambridge, U.K.

Paul Wilson (info@conartist.tv) is an expert on cheating, award-winning conjuror, and magic inventor. He works in film and television in London and Los Angeles.

© 2011 ACM 0001-0782/11/0300 $10.00

review articles
doi:10.1145/1897852.1897873

Data Structures in the Multicore Age

The advent of multicore processors as the standard computing platform will force major changes in software design.

by Nir Shavit

Illustration by Andy Gilmore

key insights
• We are experiencing a fundamental shift in the properties required of concurrent data structures and of the algorithms at the core of their implementation.
• The data structures of our childhood (stacks, queues, and heaps) will soon disappear, replaced by looser “unordered” concurrent constructs based on distribution and randomization.
• Future software engineers will need to learn how to program using these novel structures, understanding their performance benefits and their fairness limitations.

“Multicore processors are about to revolutionize the way we design and use data structures.”
You might be skeptical of this statement; after all, are multicore processors not a new class of multiprocessor machines running parallel programs, just as we have been doing for more than a quarter of a century?
The answer is no. The revolution is partly due to changes multicore processors introduce to parallel architectures; but mostly it is the result of the change in the applications that are being parallelized: multicore processors are bringing parallelism to mainstream computing.
Before the introduction of multicore processors, parallelism was largely dedicated to computational problems with regular, slow-changing (or even static) communication and coordination patterns. Such problems arise in scientific computing or in graphics, but rarely in systems.
The future promises us multiple cores on anything from phones to laptops, desktops, and servers, and therefore a plethora of applications characterized by complex, fast-changing interactions and data exchanges.
Why are these dynamic interactions and data exchanges a problem? The formula we need in order to answer this question is called Amdahl’s Law. It captures the idea that the extent to which we can speed up any complex computation is limited by how much of the computation must be executed sequentially.
Define the speedup S of a computation to be the ratio between the time it takes one processor to complete the computation (as measured by a wall clock) versus the time it takes n concurrent processors to complete the same computation. Amdahl’s Law characterizes the maximum speedup S that can be achieved by n processors collaborating on an application, where p is the fraction of the computation that can be executed in parallel. Assume, for simplicity, that it takes (normalized) time 1 for a single processor to complete the computation. With n concurrent processors, the parallel part takes time p/n, and the sequential part takes time 1 − p. Overall, the parallelized computation takes time $1 - p + \frac{p}{n}$. Amdahl’s Law says the speedup, that is, the ratio between


the sequential (single-processor) time and the parallel time, is:

$$S = \frac{1}{1 - p + \frac{p}{n}}$$

In other words, S does not grow linearly in n. For example, given an application and a 10-processor machine, Amdahl’s Law says that even if we manage to parallelize 90% of the application, but not the remaining 10%, then we end up with a fivefold speedup, but not a 10-fold speedup. Doubling the number of cores to 20 will only raise us to a sevenfold speedup. So the remaining 10%, those we continue to execute sequentially, cut our utilization of the 10-processor machine in half, and limit us to a 10-fold speedup no matter how many cores we add.
What are the 10% we found difficult to parallelize? In many mainstream applications they are the parts of the program involving interthread interaction and coordination, which on multicore machines are performed by concurrently accessing shared data structures. Amdahl’s Law tells us it is worthwhile to invest an effort to derive as much parallelism as possible from these 10%, and a key step on the way to doing so is to have highly parallel concurrent data structures.
Unfortunately, concurrent data structures are difficult to design. There is a kind of tension between correctness and performance: the more one tries to improve performance, the more difficult it becomes to reason about the resulting algorithm as being correct. Some experts blame the widely accepted threads-and-objects programming model (that is, threads communicating via shared objects), and predict its eventual demise will save us. My experience with the alternatives suggests this model is here to stay, at least for the foreseeable future. So let us, in this article, consider correctness and performance of data structures on multicore machines within the threads-and-objects model.
In the concurrent world, in contrast to the sequential one, correctness has two aspects: safety, guaranteeing that nothing bad happens, and liveness, guaranteeing that eventually something good will happen.
The safety aspects of concurrent data structures are complicated by the need to argue about the many possible interleavings of methods called by different threads. It is infinitely easier and more intuitive for us humans to specify how abstract data structures behave in a sequential setting, where there are no interleavings. Thus, the standard approach to arguing the safety properties of a concurrent data structure is to specify the structure’s properties sequentially, and find a way to map its concurrent executions to these “correct” sequential ones. There are various approaches for doing this, called consistency conditions. Some familiar conditions are serializability, linearizability, sequential consistency, and quiescent consistency.
When considering liveness in a concurrent setting, the good thing one expects to happen is that method calls eventually complete. The terms under which liveness can be guaranteed are called progress conditions. Some familiar conditions are deadlock-freedom, starvation-freedom, lock-freedom, and wait-freedom. These conditions capture the properties an implementation requires from the underlying system scheduler in order to guarantee that method calls complete. For example, deadlock-free implementations depend on strong scheduler support, while wait-free ones do all the work themselves and are independent of the scheduler.
Finally, we have the performance of our data structures to consider. Historically, uniprocessors are modeled as Turing machines, and one can argue the theoretical complexity of data structure implementations on uniprocessors by counting the number of steps (the machine instructions) that method calls might take. There is an immediate correlation between the theoretical number of uniprocessor steps and the observed time a method will take.
In the multiprocessor setting, things are not that simple. In addition to the actual steps, one needs to consider whether steps by different threads require a shared resource or not, because these resources have a bounded capacity to handle simultaneous requests. For example, multiple instructions accessing the same location in memory cannot be serviced at the same time. In its simplest form, our theoretical complexity model requires us to consider a new element: stalls.2,7–10 When threads concurrently access a shared resource, one succeeds and others incur stalls. The overall complexity of the algorithm, and hence the time it might take to complete, is correlated to the number of operations together with the number of stalls (obviously this is a crude model that does not take into account the details of cache coherence). From an algorithmic design point of view, this model introduces a continuum starting from centralized structures where all threads share data by accessing a small set of locations, incurring many stalls, to distributed structures with multiple locations, in which the number of stalls is greatly reduced, yet the number of steps necessary to properly share data and move it around increases significantly.
How will the introduction of multicore architectures affect the design of concurrent data structures? Unlike on uniprocessors, the choice of algorithm will continue, for years to come, to be greatly influenced by the underlying machine’s architecture. In particular, this includes the number of cores, their layout with respect to memory and to each other, and the added cost of synchronization instructions (on a multiprocessor, not all steps were created equal).
However, I expect the greatest change we will see is that concurrent data structures will go through a substantial “relaxation process.” As the number of cores grows, in each of the categories mentioned, consistency conditions, liveness conditions, and the level of structural distribution, the requirements placed on the data structures will have to be relaxed in order to support scalability. This will put a burden on programmers, forcing them to understand the minimal conditions their applications require, and then use as relaxed a data structure as possible in the solution. It will also place a burden on data structure designers to deliver highly scalable structures once the requirements are relaxed.
This article is too short to allow a survey of the various classes of concurrent data structures (such a survey can be found in Moir and Shavit17) and how one can relax their definitions and implementations in order to make them


scale. Instead, let us focus here on one abstract data structure, a stack, and use it as an example of how the design process might proceed.
I use as a departure point the accepted sequentially specified notion of a Stack<T> object: a collection of items (of type T) that provides push() and pop() methods satisfying the last-in-first-out (LIFO) property: the last item pushed is the first to be popped.
We will follow a sequence of refinement steps in the design of concurrent versions of stacks. Each step will expose various design aspects and relax some property of the implementation. My hope is that as we proceed, the reader will grow to appreciate the complexities involved in designing a correct scalable concurrent data structure.

A Lock-based Stack
We begin with a LockBasedStack<T> implementation, whose Java pseudocode appears in figures 1 and 2. The pseudocode structure might seem a bit cumbersome at first; this is done in order to simplify the process of extending it later on.
The lock-based stack consists of a linked list of nodes, each with value and next fields. A special top field points to the first list node or is null if the stack is empty. To help simplify the presentation, we will assume it is illegal to add a null value to a stack.
Access to the stack is controlled by a single lock, and in this particular case a spin-lock: a software mechanism in which a collection of competing threads repeatedly attempt to choose exactly one of them to execute a section of code in a mutually exclusive manner. In other words, the winner that acquired the lock proceeds to execute the code, while all the losers spin, waiting for it to be released, so they can attempt to acquire it next.
The lock implementation must enable threads to decide on a winner. This is done using a special synchronization instruction called a compareAndSet() (CAS), available in one form or another on all of today’s mainstream multicore processors. The CAS operation executes a read operation followed by a write operation, on a given memory location, in one indivisible hardware step. It takes two arguments: an expected value and an update value. If the memory location’s value is equal to the expected value, then it is replaced by the update value, and otherwise the value is left unchanged. The method call returns a Boolean indicating whether the value changed. A typical CAS takes significantly more machine cycles than a read or a write, but luckily, the performance of CAS is improving as new generations of multicore processors roll out.
In Figure 1, the push() method creates a new node and then calls tryPush() to try to acquire the lock. If the CAS is successful, the lock is set to true and the method swings the top reference from the current top-of-stack to the new node, and then releases the lock by setting it back to false. Otherwise, the thread backs off and the tryPush() lock acquisition attempt is repeated. The pop() method

Figure 1. A lock-based Stack<T>: in the push() method, threads alternate between trying to push an item onto the stack and managing contention by backing off before retrying after a failed push attempt.

 1  public class LockBasedStack<T> {
 2    private AtomicBoolean lock =
 3      new AtomicBoolean(false);
 4    ...
 5    protected boolean tryPush(Node node) {
 6      boolean gotLock = lock.compareAndSet(false, true);  // try to acquire the spin-lock
 7      if (gotLock) {
 8        Node oldTop = top;
 9        node.next = oldTop;
10        top = node;
11        lock.set(false);                                  // release the lock
12      }
13      return gotLock;
14    }
15    public void push(T value) {
16      Node node = new Node(value);
17      while (true) {
18        if (tryPush(node)) {
19          return;
20        } else {
21          contentionManager.backoff();
22        }
23      }
24    }

Figure 2. The lock-based Stack<T>: The pop() method alternates between trying to pop and backing off before the next attempt.

 1  protected Node tryPop() throws EmptyException {
 2    boolean gotLock = lock.compareAndSet(false, true);    // try to acquire the spin-lock
 3    if (gotLock) {
 4      Node oldTop = top;
 5      if (oldTop == null) {
 6        lock.set(false);                                  // release before reporting empty
 7        throw new EmptyException();
 8      }
 9      top = oldTop.next;
10      lock.set(false);                                    // release before returning
11      return oldTop;
12    }
13    else return null;
14  }
15  public T pop() throws EmptyException {
16    while (true) {
17      Node returnNode = tryPop();
18      if (returnNode != null) {
19        return returnNode.value;
20      } else {
21        contentionManager.backoff();
22      }
23    }
24  }
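As a rough, compilable illustration of the lock-based design in Figures 1 and 2, the sketch below uses `AtomicBoolean.compareAndSet` from `java.util.concurrent.atomic` as the spin-lock. The class name `SpinLockStackSketch`, the `tryLock()` helper, the use of `Thread.onSpinWait()` (Java 9 or later) as a stand-in for `contentionManager.backoff()`, and returning `null` instead of throwing `EmptyException` are all assumptions made for this example, not part of the article's pseudocode.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative sketch only: names not found in Figures 1 and 2 are hypothetical.
final class SpinLockStackSketch<T> {
    private static final class Node<E> {
        final E value;
        Node<E> next;
        Node(E value) { this.value = value; }
    }

    private final AtomicBoolean lock = new AtomicBoolean(false);
    private Node<T> top; // only read or written while holding the lock

    // One CAS attempt: succeeds only if the lock was free (false).
    private boolean tryLock() {
        return lock.compareAndSet(false, true);
    }

    public void push(T value) {
        Node<T> node = new Node<>(value);
        while (!tryLock()) {
            Thread.onSpinWait(); // stand-in for contentionManager.backoff()
        }
        try {
            node.next = top;
            top = node;
        } finally {
            lock.set(false); // release the lock on every path
        }
    }

    // Returns null on an empty stack (the figures throw EmptyException instead).
    public T pop() {
        while (!tryLock()) {
            Thread.onSpinWait();
        }
        try {
            Node<T> oldTop = top;
            if (oldTop == null) {
                return null;
            }
            top = oldTop.next;
            return oldTop.value;
        } finally {
            lock.set(false);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        SpinLockStackSketch<Integer> stack = new SpinLockStackSketch<>();
        Thread[] pushers = new Thread[4];
        for (int t = 0; t < pushers.length; t++) {
            pushers[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    stack.push(i);
                }
            });
            pushers[t].start();
        }
        for (Thread t : pushers) {
            t.join();
        }
        int count = 0;
        while (stack.pop() != null) {
            count++;
        }
        System.out.println("popped " + count + " items");
    }
}
```

Releasing the lock in a `finally` block is a design choice of this sketch: it guarantees the lock is freed on every exit path, including the empty-stack case. Because only one thread at a time can hold the lock, no pushed item is lost, so the demo should count all 4 × 1000 pushes.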


in Figure 2 calls tryPop(), which at- in which each push or pop take effect from the system, all threads accessing
tempts to acquire the lock and remove at some non-overlapping instant dur- the stack will be delayed whenever one
the first node from the stack. If it suc- ing their method calls. In particular, is preempted. Modern operating sys-
ceeds, it throws an exception if the we could think of them taking effect tems can deal with these issues, and
stack is empty, and otherwise it returns when the executing thread acquired will have to become even better at han-
the node referenced by top. If tryPop() the lock. Linearizability is a desired dling them in the future.
fails to acquire the lock it returns null property because linearizable objects In terms of progress, the locking
and is called again until it succeeds. can be composed without having to scheme is deadlock-free, that is, if sev-
What are the safety, liveness, and know anything about their actual im- eral threads all attempt to acquire the
performance properties of our imple- plementation. lock, one will succeed. But it is not
mentation? Well, because we use a But there is a price for this obvious starvation-free: some thread could be
single lock to protect the structure, it atomicity. The use of a lock introduces unlucky enough to always fail in its CAS
is obvious its behavior is “atomic” (the a dependency on the operating system: when attempting to acquire the lock.
technical term used for this is lineariz- we must assume the scheduler will not The centralized nature of the lock-
able15). In other words, the outcomes of involuntarily preempt threads (at least based stack implementation introduces
our concurrent execution are equiva- not for long periods) while they are a sequential bottleneck: only one thread
lent to those of a sequential execution holding the lock. Without such support at a time can complete the update of the
data structure’s state. This, Amdahl’s Law tells us, will have a very negative effect on scalability, and performance will not improve as the number of cores/threads increases.

But there is another separate phenomenon here: memory contention. Threads failing their CAS attempts on the lock retry the CAS again even while the lock is still held by the last CAS “winner” updating the stack. These repeated attempts cause increased traffic on the machine’s shared bus or interconnect. Since these are bounded resources, the result is an overall slowdown in performance, and in fact, as the number of cores increases, we will see performance deteriorate below that obtainable on a single core. Luckily, we can deal with contention quite easily by adding a contention manager into the code (Line 21 in Figures 1 and 2).

Figure 3. The lock-free tryPush() and tryPop() methods.

    import java.util.concurrent.atomic.AtomicReference;

    public class LockFreeStack<T> {
      // Reference to the top node; every update goes through a CAS.
      private AtomicReference<Node> top =
          new AtomicReference<Node>(null);
      ...

      // One push attempt: succeeds only if top is unchanged since it was read.
      protected boolean tryPush(Node node) {
        Node oldTop = top.get();
        node.next = oldTop;
        return top.compareAndSet(oldTop, node);
      }

      // One pop attempt: returns the removed node, or null if the CAS failed.
      protected Node tryPop() throws EmptyException {
        Node oldTop = top.get();
        if (oldTop == null) {
          throw new EmptyException();
        }
        Node newTop = oldTop.next;
        if (top.compareAndSet(oldTop, newTop)) {
          return oldTop;
        } else {
          return null;
        }
      }
    }
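Figure 3 shows only the single-attempt tryPush() and tryPop() methods. As a minimal sketch of how they might be wrapped into complete push() and pop() calls that retry after a failed CAS, consider the following. The class name LockFreeStackSketch, the Node layout, the use of Thread.yield() as a stand-in for a contention manager, and the choice to return null on an empty stack (instead of throwing EmptyException) are all assumptions made for illustration and are not taken from the article.

```java
import java.util.concurrent.atomic.AtomicReference;

// Sketch only: class, node, and method names here are illustrative.
final class LockFreeStackSketch<T> {
    // Hypothetical node type; Figure 3 uses a Node class without showing it.
    private static final class Node<T> {
        final T value;
        Node<T> next;
        Node(T value) { this.value = value; }
    }

    private final AtomicReference<Node<T>> top = new AtomicReference<>(null);

    // One push attempt, as in Figure 3: the CAS succeeds only if top
    // still holds the node that was read at the start of the attempt.
    private boolean tryPush(Node<T> node) {
        Node<T> oldTop = top.get();
        node.next = oldTop;
        return top.compareAndSet(oldTop, node);
    }

    // Retries until the CAS succeeds. Thread.yield() stands in for a
    // contention manager such as exponential backoff.
    public void push(T value) {
        Node<T> node = new Node<>(value);
        while (!tryPush(node)) {
            Thread.yield();
        }
    }

    // Returns the popped value, or null if the stack was empty
    // (Figure 3 throws EmptyException in that case).
    public T pop() {
        while (true) {
            Node<T> oldTop = top.get();
            if (oldTop == null) {
                return null;
            }
            if (top.compareAndSet(oldTop, oldTop.next)) {
                return oldTop.value;
            }
            Thread.yield();
        }
    }

    public static void main(String[] args) {
        LockFreeStackSketch<Integer> stack = new LockFreeStackSketch<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // 2
        System.out.println(stack.pop()); // 1
        System.out.println(stack.pop()); // null (empty)
    }
}
```

Because every state change is a single compareAndSet on top, a preempted thread never leaves the stack in a half-updated state; it can only cause other threads' attempts to succeed or fail.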
Figure 4. The EliminationBackoffStack<T>. Each thread selects a random location in the array. If thread A’s pop() and thread B’s push() calls arrive at the same location at about the same time, then they exchange values without accessing the shared lock-free stack. A thread C that does not meet another thread eventually pops the shared lock-free stack.
[Diagram: A:pop() and B:push(b) meet in the array, so A returns b and B returns ok; C:pop() reaches the shared stack, whose top holds d, e, f, and returns d.]

The most popular type of contention manager is exponential backoff: every time a CAS fails in tryPush() or tryPop(), the thread delays for a certain random time before attempting the CAS again. A thread will double the range from which it picks the random delay upon CAS failure, and will cut it in half upon CAS success. The randomized nature of the backoff scheme makes the timing of the thread’s attempts to acquire the lock less dependent on the scheduler, reducing the chance of threads falling into a repetitive pattern in which they all try to CAS at the same time and end up starving. Contention managers [1, 12, 19] are key tools in the design of multicore data structures, even when no locks are used, and I expect them to play an even greater role as the number of cores grows.
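The exponential backoff policy described above can be sketched in a few lines. This is an illustrative sketch under stated assumptions, not the article's code: the class name BackoffSketch, the nanosecond bounds, the use of LockSupport.parkNanos() for the delay, and the incrementWithBackoff() helper are all hypothetical. Each thread is assumed to own its own BackoffSketch instance, since the object is not thread-safe.

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Sketch only: a per-thread exponential backoff helper (hypothetical names).
final class BackoffSketch {
    private final long minNanos;
    private final long maxNanos;
    private long limit; // current upper bound of the random delay range

    BackoffSketch(long minNanos, long maxNanos) {
        if (minNanos <= 0 || maxNanos < minNanos) {
            throw new IllegalArgumentException("need 0 < min <= max");
        }
        this.minNanos = minNanos;
        this.maxNanos = maxNanos;
        this.limit = minNanos;
    }

    // After a failed CAS: wait a random time in [0, limit), then double
    // the range (capped at maxNanos) for the next failure.
    void onFailure() {
        long delay = ThreadLocalRandom.current().nextLong(limit);
        limit = Math.min(maxNanos, limit * 2);
        LockSupport.parkNanos(delay);
    }

    // After a successful CAS: halve the range (floored at minNanos).
    void onSuccess() {
        limit = Math.max(minNanos, limit / 2);
    }

    long currentLimit() {
        return limit;
    }

    // Example use: a CAS retry loop on a shared counter.
    static int incrementWithBackoff(AtomicInteger counter, BackoffSketch backoff) {
        while (true) {
            int old = counter.get();
            if (counter.compareAndSet(old, old + 1)) {
                backoff.onSuccess();
                return old + 1;
            }
            backoff.onFailure();
        }
    }
}
```

The same onFailure()/onSuccess() calls could be placed around the CAS in tryPush() and tryPop(); the delay bounds would need tuning for a particular machine.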

80 communications of the acm | march 2011 | vol. 54 | no. 3
review articles

“I expect the greatest change we will see is that concurrent data structures will go through a substantive ‘relaxation process.’”

A Lock-Free Stack
As noted, a drawback of our lock-based implementation, and in fact, of lock-based algorithms in general, is that the scheduler must guarantee that threads are preempted infrequently (or not at all) while holding the locks. Otherwise, other threads accessing the same locks will be delayed, and performance will suffer. This dependency on the capriciousness of the scheduler is particularly problematic in hard real-time systems where one requires a guarantee on how long method calls will take to complete.

We can eliminate this dependency by designing a lock-free stack implementation [23]. In the LockFreeStack<T>, instead of acquiring a lock to manipulate the stack, threads agree who can modify it by directly applying a CAS to the top variable. To do so, we only need to modify the code for the tryPush() and tryPop() methods, as in Figure 3. As before, if unsuccessful, the method calls are repeated after backing off, just as in the lock-based algorithm.

A quick analysis shows the completion of a push (respectively pop) method call cannot be delayed by the preemption of some thread: the stack’s state is changed by a single CAS operation that either completes or not, leaving the stack ready for the next operation. Thus, a thread can only be delayed by scheduling infinitely many calls that successfully modify the top of the stack and cause tryPush() to continuously fail. In other words, the system as a whole will always make progress no matter what the scheduler does. We call this form of progress lock-freedom. In many data structures, having at least some of the structure’s methods be lock-free tends to improve overall performance.

It is easy to see that the lock-free stack is linearizable: it behaves like a sequential stack whose methods “take effect” at the points in time where their respective CAS on the top variable succeeded (or threw the exception in case of a pop on an empty stack). We can thus compose this stack with other linearizable objects without worrying about the implementation details: as far as safety goes, there is no difference between the lock-based and lock-free stacks.

An Elimination Backoff Stack
Like the lock-based stack, the lock-free stack implementation scales poorly, primarily because its single point of access forms a sequential bottleneck: method calls can proceed only one after the other, ordered by successful CAS calls applied to the stack’s lock or top fields. A sad fact we should acknowledge is that this sequential bottleneck is inherent: in the worst case it takes a thread at least Ω(n) steps and/or stalls (recall, a stall is the delay a thread incurs when it must wait for another thread taking a step) to push or pop a linearizable lock-free stack [9]. In other words, the theory tells us there is no way to avoid this bottleneck by distributing the stack implementation over multiple locations; there will always be an execution of linear complexity.

Surprisingly, though, we can introduce parallelism into many of the common-case executions of a stack implementation. We do so by exploiting the following simple observation: if a push call is immediately followed by a pop call, the stack’s state does not change; the two calls eliminate each other and it is as if both operations never happened. By causing concurrent pushes and pops to meet and pair up in separate memory locations, the thread calling push can exchange its value with a thread calling pop, without ever having to access the shared lock-free stack.

As depicted in Figure 4, in the EliminationBackoffStack<T> [11] one achieves this effect by adding an EliminationArray to the lock-free stack implementation. Each location in the array is a coordination structure called an exchanger [16, 18], an object that allows a pair of threads to rendezvous and exchange values.

Threads pick random array entries and try to pair up with complementary operations. The calls exchange values in the location in which they met, and return. A thread whose call cannot be eliminated, either because it has failed to find a partner, or because it found a partner with the wrong type of method call (such as a push meeting a push), can either try again to eliminate at a new location, or can access the shared lock-free stack. The combined data structure, array and stack, is linearizable because the lock-free stack is linearizable, and we can think of the eliminated calls as if they occurred at the point in which they exchanged values.
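The rendezvous on an array of exchangers can be sketched with the standard java.util.concurrent.Exchanger class. This is a hedged illustration, not the article's EliminationArray: the class name EliminationArraySketch, the POP marker object, the fixed timeout, and the two method names are assumptions, and pushed values are assumed to be non-null. A push offers its value and succeeds only if its partner offered the POP marker; a pop offers the marker and succeeds only if its partner offered a real value.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.concurrent.Exchanger;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch only: an elimination array built from standard Exchanger objects.
final class EliminationArraySketch<T> {
    private static final Object POP = new Object(); // marker offered by pops
    private final List<Exchanger<Object>> slots = new ArrayList<>();
    private final long timeoutMicros;

    EliminationArraySketch(int capacity, long timeoutMicros) {
        for (int i = 0; i < capacity; i++) {
            slots.add(new Exchanger<>());
        }
        this.timeoutMicros = timeoutMicros;
    }

    // Waits at a random slot; returns the partner's offer, or null if no
    // partner arrived in time (or the thread was interrupted).
    private Object visit(Object offer) {
        int i = ThreadLocalRandom.current().nextInt(slots.size());
        try {
            return slots.get(i).exchange(offer, timeoutMicros, TimeUnit.MICROSECONDS);
        } catch (TimeoutException e) {
            return null;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return null;
        }
    }

    // True if the value was handed directly to a concurrent pop.
    boolean tryEliminatePush(T value) {
        Objects.requireNonNull(value, "values must be non-null in this sketch");
        return visit(value) == POP;
    }

    // Returns a value taken from a concurrent push, or null if the call
    // timed out or met another pop.
    @SuppressWarnings("unchecked")
    T tryEliminatePop() {
        Object other = visit(POP);
        return (other == null || other == POP) ? null : (T) other;
    }
}
```

A caller whose elimination attempt fails would fall back to the shared lock-free stack, as the text describes. Note that when two pushes meet, both see a non-marker value and both report failure, so no item is lost.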


It is lock-free because we can easily implement a lock-free exchanger using a CAS operation, and the shared stack itself is already lock-free.

In the EliminationBackoffStack, the EliminationArray is used as a backoff scheme to a shared lock-free stack. Each thread first accesses the stack, and if it fails to complete its call (that is, the CAS attempt on top fails) because there is contention, it attempts to eliminate using the array instead of simply backing off in time. If it fails to eliminate, it calls the lock-free stack again, and so on. A thread dynamically selects the subrange of the array within which it tries to eliminate, growing and shrinking it exponentially in response to the load. Picking a smaller subrange allows a greater chance of a successful rendezvous when there are few threads, while a larger range lowers the chances of threads waiting on a busy Exchanger when the load is high.

In the worst case a thread can still fail on both the stack and the elimination. However, if contention is low, threads will quickly succeed in accessing the stack, and as it grows, there will be a higher number of successful eliminations, allowing many operations to complete in parallel in only a constant number of steps. Moreover, contention at the lock-free stack is reduced because eliminated operations never access the stack. Note that we described a lock-free implementation, but, as with many concurrent data structures, on some systems a lock-based implementation might be more fitting and deliver better performance.

An Elimination Tree
A drawback of the elimination backoff stack is that under very high loads the number of un-eliminated threads accessing the shared lock-free stack may remain high, and these threads will continue to have linear complexity. Moreover, if we have, say, bursts of push calls followed by bursts of pop calls, there will again be no elimination and therefore no parallelism. The problem seems to be our insistence on having a linearizable stack: we devised a distributed solution that cuts down on the number of stalls, but the theoretical worst-case linear-time scenario can happen too often.

This leads us to try an alternative approach: relaxing the consistency condition for the stack. Instead of a linearizable stack, let’s implement a quiescently consistent one [4, 14]. A stack is quiescently consistent if in any execution, whenever there are no ongoing push and pop calls, it meets the LIFO stack specification for all the calls that preceded it. In other words, quiescent consistency is like a game of musical chairs: we map the object to the sequential specification when and only when the music stops. As we will see, this relaxation will nevertheless provide quite powerful semantics for the data structure. In particular, as with linearizability, quiescent consistency allows objects to be composed as black boxes without having to know anything about their actual implementation.

Consider a binary tree of objects called balancers with a single input wire and two output wires, as depicted in Figure 5. As threads arrive at a balancer, it repeatedly sends them to the top wire and then the bottom one, so its top wire always has at most one more thread than the bottom wire. The Tree[k] network is a binary tree of balancers constructed inductively by placing a balancer before two Tree[k/2] networks of balancers and not shuffling their outputs [22].

We add a collection of lock-free stacks to the output wires of the tree. To perform a push, threads traverse the balancers from the root to the leaves and then push the item onto the appropriate stack. In any quiescent state, when there are no threads in the tree, the output items are balanced out so that the top stacks have at most one more than the bottom ones, and there are no gaps.

We can implement the balancers in a straightforward way using a bit that threads toggle: they fetch the bit and then complement it (a CAS operation), exiting on the output wire they fetched (zero or one). How do we perform a pop? Magically, to perform a pop threads traverse the balancers in the opposite order of the push, that is, in each balancer, after complementing the bit, they follow this complement, the opposite of the bit they fetched. Try this; you will see that from one quiescent state to the next, the items removed are the last ones pushed onto the stack. We thus have a collection of stacks that are accessed in parallel, yet act as one quiescent LIFO stack.

Figure 5. A Tree[4] network leading to four lock-free stacks. Threads pushing items arrive at the balancers in the order of their numbers, eventually pushing items onto the stacks located on their output wires. In each balancer, a pushing thread fetches and then complements the bit, following the wire indicated by the fetched value (if the state is 0, the pushing thread will change it to 1 and continue on wire 0, and if it was 1, it will change it to 0 and continue on wire 1). The tree and stacks will end up in the balanced state seen in the figure. The state of the bits corresponds to 5 being the last item, and the next location a pushed item will end up on is the lock-free stack containing item 2. Try it! A popping thread does the opposite of the pushing one: it complements the bit and follows the complemented value. Thus, if a thread executes a pop in the depicted state, it will end up switching a 1 to a 0 at the top balancer, and leave on wire 0, then reach the top 2nd level balancer, again switching a 1 to a 0 and following its 0 wire, ending up popping the last value 5 as desired. This behavior will be true for concurrent executions as well: the sequences of values in the stacks in all quiescent states can be shown to preserve LIFO order.
[Diagram: a tree of lock-free balancers, each with output wires 0 and 1, leading to four lock-free stacks that together hold items 1 through 5.]

The bad news is that our implementation of the balancers using a bit means that every thread that enters the tree accesses the same bit in the root balancer, causing that balancer to become a bottleneck. This is true, though to a lesser extent, with balancers lower in the tree.

We can parallelize the tree by exploiting a simple observation similar to one we made about the elimination backoff stack:


If an even number of threads passes through a balancer, the outputs are evenly balanced on the top and bottom wires, but the balancer’s state remains unchanged.

The idea behind the EliminationTree<T> [20, 22] is to place an EliminationArray in front of the bit in every balancer as in Figure 6. If two popping threads meet in the array, they leave on opposite wires, without a need to touch the bit, as anyhow it would have remained in its original state. If two pushing threads meet in the array, they also leave on opposite wires. If a push or pop call does not manage to meet another in the array, it toggles the bit and leaves accordingly. Finally, if a push and a pop meet, they eliminate, exchanging items as in the EliminationBackoffStack. It can be shown that this implementation provides a quiescently consistent stack [a], in which, in most cases, it takes a thread O(log k) steps to complete a push or a pop, where k is the number of lock-free stacks on its output wires.

[a] To keep things simple, pop operations should block until a matching push appears.

Figure 6. The EliminationTree<T>. Each balancer in Tree[4] is an elimination balancer. The state depicted is the same as in Figure 5. From this state, a push of item 6 by thread A will not meet any others on the elimination arrays and so will toggle the bits and end up on the 2nd stack from the top. Two pops by threads B and C will meet in the top balancer’s array and end up going up and down without touching the bit, ending up popping the last two values 5 and 6 from the top two lock-free stacks. Finally, threads D and E will meet in the top array and “eliminate” each other, exchanging the value 7 and leaving the tree. This does not ruin the tree’s state since the states of all the balancers would have been the same even if the threads had both traversed all the way down without meeting: they would have anyhow followed the same path down and ended up exchanging values via the same stack.
[Diagram: A:push(6) returns ok; B:pop() returns 6; C:pop() returns 5; D:pop() returns 7; E:push(7) returns ok. The root is an elimination balancer and the second level uses half-width elimination balancers.]

A Pool Made of Stacks
The collection of stacks accessed in parallel in the elimination tree provides quiescently consistent LIFO ordering with a high degree of parallelism. However, each method call involves a logarithmic number of memory accesses, each involving a CAS operation, and these accesses are not localized, that is, threads are repeatedly accessing locations they did not access recently.

This brings us to the final two issues one must take into account when designing concurrent data structures: the machine’s memory hierarchy and its coherence mechanisms. Mainstream multicore architectures are cache coherent, where on most machines the L2 cache (and in the near future the L3 cache as well) is shared by all cores. A large part of the machine’s performance on shared data is derived from the threads’ ability to find the data cached. The shared caches are unfortunately a bounded resource, both in their size and in the level of access parallelism they offer. Thus, the data structure design needs to attempt to lower the overall number of accesses to memory, and to maintain locality as much as possible.

What are the implications for our stack design? Consider completely relaxing the LIFO property in favor of a Pool<T> structure in which there is no temporal ordering on push() and pop() calls. We will provide a concurrent lock-free implementation of a pool that supports high parallelism, high locality, and has a low cost in terms of the overall number of accesses to memory. How useful is such a concurrent pool? I would like to believe that most concurrent applications can be tailored to use pools in place of queues and stacks (perhaps with some added liveness conditions)…time will tell.

Our overall concurrent pool design is quite simple. As depicted in Figure 7, we allocate a collection of n concurrent lock-free stacks, one per computing thread (alternatively, we could allocate one stack per collection of threads on the same core, depending on the specific machine architecture). Each thread will push and pop from its own assigned stack. If, when it attempts to pop, it finds its own stack is empty, it will repeatedly attempt to “steal” an item from another randomly chosen stack [b]. The pool has, in the common case, the same O(1) complexity per method call as the original lock-free stack, yet provides a very high degree of parallelism. The act of stealing itself may be expensive, especially when the pool is almost empty, but there are various techniques to reduce the number of steal attempts if they are unlikely to succeed. The randomization serves the purpose of guaranteeing an even distribution of threads over the stacks, so that if there are items to be popped, they will be found quickly. Thus, our construction has relaxed the specification by removing the causal ordering on method calls and replacing the deterministic liveness and complexity guarantees with probabilistic ones.

[b] One typically adds a termination detection protocol [14] to the structure to guarantee that threads will know when there remain no items to pop.

Figure 7. The concurrent Pool<T>. Each thread performs push() and pop() calls on a lock-free stack and attempts to steal from other stacks when a pop() finds the local stack empty. In the figure, thread C will randomly select the top lock-free stack, stealing the value 5. If the lock-free stacks are replaced by lock-free deques, thread C will pop the oldest value, returning 1.
[Diagram: thread A’s stack holds 5 and 1 after A:push(5); thread B’s stack holds 6 and 2 after B:push(6); thread C’s pop() chooses a random stack to steal from; thread D’s stack holds 4.]

As the reader can imagine, the O(1) step complexity does not tell the whole story. Threads accessing the pool will tend to pop items that they themselves recently pushed onto their own designated stack, therefore exhibiting good cache locality. Moreover, since chances of a concurrent stealer are low, most of the time a thread accesses its lock-free stack alone. This observation allows designers to create a lock-free “stack-like” structure called a Deque [c] that allows the frequently accessing local thread to use only loads and stores in its methods, resorting to more expensive CAS-based method calls only when chances of synchronization with a conflicting stealing thread are high [3, 6].

[c] This Deque supports push() and pop() methods with the traditional LIFO semantics and an additional popTop() method for stealers that pops the first-in (oldest) item [5].

The end result is a pool implementation that is tailored to the costs of the machine’s memory hierarchy and synchronization operations. The big hope is that as we go forward, many of these architecture-conscious optimizations, which can greatly influence performance, will move into the realm of compilers and concurrency libraries, and the need for everyday programmers to be aware of them will diminish.

What Next?
The pool structure ended our sequence of relaxations. I hope the reader has come to realize how strongly the choice of structure depends on the machine’s size and the application’s concurrency requirements. For example, small collections of threads can effectively share a lock-based or lock-free stack, slightly larger ones an elimination stack, but for hundreds of threads we will have to bite the bullet and move from a stack to a pool (though within the pool implementation threads residing on the same core or machine cluster could use a single stack quite effectively).

In the end, we gave up the stack’s LIFO ordering in the name of performance. I imagine we will have to do the same for other data structure classes. For example, I would guess that search structures will move away from being comparison based, allowing us to use hashing and similar naturally parallel techniques, and that priority queues will have a relaxed priority ordering in place of the strong one imposed by deleting the minimum key. I can’t wait to see what these and other structures will look like.

As we go forward, we will also need to take into account the evolution of hardware support for synchronization. Today’s primary construct, the CAS operation, works on a single memory location. Future architectures will most likely support synchronization techniques such as transactional memory [13, 21], allowing threads to instantaneously read and write multiple locations in one indivisible step. Perhaps more important than the introduction of new features like transactional memory is the fact that the relative costs of synchronization and coherence are likely to change dramatically as new generations of multicore chips roll out. We will have to make sure to consider this evolution path carefully as we set our language and software development goals.

Concurrent data structure design has, for many years, been moving forward at a glacial pace. Multicore processors are about to heat things up, leaving us, the data structure designers and users, with the interesting job of directing which way they flow. Let’s try to get it right.

References
1. Agarwal, A. and Cherian, M. Adaptive backoff synchronization techniques. In Proceedings of the 16th International Symposium on Computer Architecture (May 1989), 396–406.
2. Anderson, J. and Kim, Y. An improved lower bound for the time complexity of mutual exclusion. In Proceedings of the 20th Annual ACM Symposium on Principles of Distributed Computing (2001), 90–99.
3. Arora, N.S., Blumofe, R.D. and Plaxton, C.G. Thread scheduling for multiprogrammed multiprocessors. Theory of Computing Systems 34, 2 (2001), 115–144.
4. Aspnes, J., Herlihy, M. and Shavit, N. Counting networks. J. ACM 41, 5 (1994), 1020–1048.
5. Blumofe, R.D. and Leiserson, C.E. Scheduling multithreaded computations by work stealing. J. ACM 46, 5 (1999), 720–748.
6. Chase, D. and Lev, Y. Dynamic circular work-stealing deque. In Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures (2005). ACM Press, NY, 21–28.
7. Cypher, R. The communication requirements of mutual exclusion. In ACM Proceedings of the Seventh Annual Symposium on Parallel Algorithms and Architectures (1995), 147–156.
8. Dwork, C., Herlihy, M. and Waarts, O. Contention in shared memory algorithms. J. ACM 44, 6 (1997), 779–805.
9. Fich, F.E., Hendler, D. and Shavit, N. Linear lower bounds on real-world implementations of concurrent objects. In Proceedings of the 46th Annual IEEE Symposium on Foundations of Computer Science (2005). IEEE Computer Society, Washington, D.C., 165–173.
10. Gibbons, P.B., Matias, Y. and Ramachandran, V. The queue-read queue-write PRAM model: Accounting for contention in parallel algorithms. SIAM J. Computing 28, 2 (1999), 733–769.
11. Hendler, D., Shavit, N. and Yerushalmi, L. A scalable lock-free stack algorithm. J. Parallel and Distributed Computing 70, 1 (Jan. 2010), 1–12.
12. Herlihy, M., Luchangco, V., Moir, M. and Scherer III, W.N. Software transactional memory for dynamic-sized data structures. In Proceedings of the 22nd Annual Symposium on Principles of Distributed Computing. ACM, NY, 2003, 92–101.
13. Herlihy, M. and Moss, E. Transactional memory: architectural support for lock-free data structures. SIGARCH Comput. Archit. News 21, 2 (1993), 289–300.
14. Herlihy, M. and Shavit, N. The Art of Multiprocessor Programming. Morgan Kaufmann, San Mateo, CA, 2008.
15. Herlihy, M. and Wing, J. Linearizability: A correctness condition for concurrent objects. ACM Trans. Programming Languages and Systems 12, 3 (July 1990), 463–492.
16. Moir, M., Nussbaum, D., Shalev, O. and Shavit, N. Using elimination to implement scalable and lock-free fifo queues. In Proceedings of the 17th Annual ACM Symposium on Parallelism in Algorithms and Architectures. ACM Press, NY, 2005, 253–262.
17. Moir, M. and Shavit, N. Concurrent data structures. Handbook of Data Structures and Applications, D. Mehta and S. Sahni, eds. Chapman and Hall/CRC Press, 2007, 47-14, 47-30.
18. Scherer III, W.N., Lea, D. and Scott, M.L. Scalable synchronous queues. In Proceedings of the 11th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming. ACM Press, NY, 2006, 147–156.
19. Scherer III, W.N. and Scott, M.L. Advanced contention management for dynamic software transactional memory. In Proceedings of the 24th Annual ACM Symposium on Principles of Distributed Computing. ACM, NY, 2005, 240–248.
20. Shavit, N. and Touitou, D. Elimination trees and the construction of pools and stacks. Theory of Computing Systems 30 (1997), 645–670.
21. Shavit, N. and Touitou, D. Software transactional memory. Distributed Computing 10, 2 (Feb. 1997), 99–116.
22. Shavit, N. and Zemach, A. Diffracting trees. ACM Transactions on Computer Systems 14, 4 (1996), 385–428.
23. Treiber, R.K. Systems programming: Coping with parallelism. Technical Report RJ 5118 (Apr. 1986). IBM Almaden Research Center, San Jose, CA.

Nir Shavit is a professor of computer science at Tel-Aviv University and a member of the Scalable Synchronization Group at Oracle Labs. He is a recipient of the 2004 ACM/EATCS Gödel Prize.

© 2011 ACM 0001-0782/11/0300 $10.00

research highlights

p. 86  Technical Perspective: Concerto for Violin and Markov Model
       By Juan Bello, Yann LeCun, and Robert Rowe
p. 87  The Informatics Philharmonic
       By Christopher Raphael
p. 94  Technical Perspective: VL2
       By Jennifer Rexford
p. 95  VL2: A Scalable and Flexible Data Center Network
       By Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim, Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta
doi:10.1145/1897852.1897874

Technical Perspective
Concerto for Violin and Markov Model
By Juan Bello, Yann LeCun, and Robert Rowe

In the opening moments of Jean Sibelius’ Violin Concerto, the young soloist plays delicately, almost languidly. The orchestra responds in kind, muting the repeated string motif to a whisper. As the piece progresses, soloist and orchestra alternately perform the main motifs in increasing measures of power and virtuosity, which inexorably lead toward the movement’s stirring resolution. The soloist looks relieved as she crosses the stage to shake the conductor’s hand.

This violinist, like most others in music education, can benefit enormously from interacting with large ensembles in honing her performing skills. However, the demand far exceeds the number and capabilities of existing orchestras, ensuring most of these students won’t have access to this experience. Our soloist is no exception. The previous paragraph describes her interaction with Chris Raphael’s Music Plus One system: a machine learning-driven alternative to working with orchestras that retains much of the expressivity and interactivity that makes concerto performance such a rewarding and educational experience. The following paper details the approach; for videos, see http://www.music.informatics.indiana.edu/papers/icml10/.

Automatic music accompaniment has been actively researched since the 1980s, starting with the work of such pioneers as Barry Vercoe and Roger Dannenberg. The problem can be broken into three subparts [1]: tracking the playing of a human soloist, matching it to a known musical score, and synthesizing an appropriate accompaniment to that solo part in real time. Solutions usually involve ingenious pattern-matching mechanisms for dealing with expressive, incorrectly played or missing notes in the soloist performance, while using the output of the pattern match to drive the scheduling of accompaniment events. However, as Raphael notes, it is impossible to accomplish score following by reaction alone. The system must incorporate a predictive component that attempts to align upcoming notes of the accompaniment with imminent attacks of the human player. Failing to solve this problem can result in potentially disastrous consequences for the performance.

The proposed approach starts by using a hidden Markov model-based score follower tasked with estimating the start time of the notes played by the soloist and matching them to a position in the score. The model considers the sequence of frame-wise signal features, characterizing transient and pitch information on the audio input, as its output, and the state graph for the Markov chain as a sequence of note sub-graphs modeling the soloist’s performance. In a process akin to echo cancellation, the contribution of the accompanist to the audio signal is explicitly modeled to avoid the system following itself.

The estimated sequence of note onset times could be used to adaptively control the playback rate of the orchestral recording and match the player’s position in the score. However, it is not possible to accurately estimate the soloist’s timing without a certain amount of latency, thus causing the orchestra to consistently lag behind. The author’s solution is to use a Gaussian graphical model to predict the timing of the next orchestra event based on previous observation of both solo and orchestra note occurrences. In this context, the orchestra’s playback rate, modulated using the well-established phase vocoder method, is continuously re-estimated as new events are observed, a formulation that is robust to missing notes, as pending orchestra notes are conditioned only on those events that have been observed. Crucially, Raphael uses information from rehearsals to adapt the predictive model to the soloist’s interpretation style, thus mimicking the process by which human performers learn to anticipate each other’s actions through practice.

The system’s architecture is constructed using common staples of the machine learning literature such as hidden Markov models and Gaussian graphical models. Yet, these elements and others are combined using a healthy dose of musically meaningful insights, as well as the engineering acumen necessary to make the system work robustly in real time, as emphatically demonstrated by the accompanying videos. The result is an effective, albeit limited, model of a human accompanist that has been extensively tested by student performers at one of the country’s premier conservatories, the Jacobs School of Music at Indiana University.

The system is an important milestone toward machine musicianship. Through the use of machine learning, computers are acquiring new skills once thought to be uniquely human. If music communicates human emotions, isn’t it futile to teach computers to play music? The beauty of Music Plus One is that it follows and amplifies the emotions of the human player. In that sense, it is very much like a traditional musical instrument, albeit a highly sophisticated one. Music Plus One relies on a predetermined musical score. Perhaps the next step would be to create an automatic rhythm section that reacts to a soloist the way jazz musicians instantly react to each other’s improvisations. It would require a new level of machine musicianship, and would constitute a major challenge for machine learning, one that will increase our understanding and appreciation of the human mind’s ability to create and improvise.

Reference
1. Dannenberg, R.B. An on-line algorithm for real-time accompaniment. In Proceedings of the International Computer Music Conference (Paris, France, 1984), 193–198.

Juan Bello is an assistant professor of music technology in the Department of Music and Performing Arts Professions at NYU’s Steinhardt School of Culture, Education and Human Development.

Yann LeCun is Silver Professor of Computer Science and Neural Science at NYU and co-founder of MuseAmi, a music technology company.

Robert Rowe is a professor and vice chair in the Department of Music and Performing Arts Professions at NYU’s Steinhardt School of Culture, Education and Human Development, where he directs the Music Composition Program.

© 2011 ACM 0001-0782/11/0300 $10.00

86 co mm unications of th e acm | ma r c h 2 0 1 1 | vo l . 5 4 | n o. 3
doi:10.1145/1897852.1897875

The Informatics Philharmonic


By Christopher Raphael

Abstract

A system for musical accompaniment is presented in which a computer-driven orchestra follows and learns from a soloist in a concerto-like setting. The system is decomposed into three modules: the first computes a real-time score match using a hidden Markov model; the second generates the output audio by phase-vocoding a preexisting audio recording; the third provides a link between these two, by predicting future timing evolution using a Kalman Filter-like model. Several examples are presented showing the system in action in diverse musical settings. Connections with machine learning are highlighted, showing current weaknesses and new possible directions.

The original version of this chapter is entitled “Music Plus One and Machine Learning” and was published in Proceedings of the International Conference on Machine Learning, Haifa, 2010.

1. MUSICAL ACCOMPANIMENT SYSTEMS
Musical accompaniment systems are computer programs that serve as musical partners for live musicians, usually playing a supporting role for music centering around the live player. The types of possible interaction between live player and computer are widely varied. Some approaches create sound by processing the musician’s audio, often driven by analysis of the audio content itself, perhaps distorting, echoing, harmonizing, or commenting on the soloist’s audio in largely predefined ways.8, 12 Other orientations are directed toward improvisatory music, such as jazz, in which the computer follows the outline of a score, perhaps even composing its own musical part “on the fly,”3 or evolving as a “call and response” in which the computer and human alternate the lead role.6, 9 Our focus here is on a third approach that models the traditional “classical” concerto-type setting in which the computer performs a precomposed musical part in a way that follows a live soloist.2, 4, 11 This categorization is only meant to summarize some past work, while acknowledging that there is considerable room for blending these scenarios, or working entirely outside this realm of possibilities.

The motivation for the concerto version of the problem is strikingly evident in the Jacobs School of Music (JSoM) at Indiana University, where most of our recent experiments have been performed. For example, the JSoM contains about 200 student pianists, for whom the concerto literature is central to their daily practice and aspirations. However, in the JSoM, the regular orchestras perform only two piano concerti each year using student soloists, thus ensuring that most of these aspiring pianists will never perform as orchestral soloist while at IU. We believe this is truly unfortunate since nearly all of these students have the necessary technical skills and musical depth to greatly benefit from the concerto experience. Our work in musical accompaniment systems strives to bring this rewarding experience to the music students, amateurs, and many others who would like to play as orchestral soloist, though, for whatever reason, do not have the opportunity.

Even within the realm of classical music, there are a number of ways to further subdivide the accompaniment problem, requiring substantially different approaches.

The JSoM is home to a large string pedagogy program beginning with students at 5 years of age. Students in this program play solo pieces with piano even in their first year. When accompanying these early-stage musicians, the pianist’s role is not simply to follow the young soloist, but to teach as well, by modeling good rhythm, steady tempo where appropriate, while introducing musical ideas. In a sense, this is the hardest of all classical music accompaniment problems, since the accompanist must be expected to know more than the soloist, thus dictating when the accompanist should follow, as well as when and how to lead. A coarse approximation to this accompanist role provides a rather rigid accompaniment that is not overly responsive to the soloist’s interpretation (or errors); there are several commercial programs that take this approach. The more sophisticated view of the pedagogical music system, one that follows and leads as appropriate, is almost completely untouched, possibly due to the difficulty of modeling the objectives. However, we see this area as fertile for lasting research contributions and hope that we, and others, will be able to contribute to this cause.

An entirely different scenario deals with music that evolves largely without any traditional sense of rhythmic flow, such as in some compositions of Penderecki, Xenakis, Boulez, Cage, and Stockhausen, to name some of the more famous examples. Such music is often notated in terms of seconds, rather than beats or measures, to emphasize the irrelevance of regular pulse. For works of this type involving soloist and accompaniment, the score can indicate points of synchronicity, or time relations, between various points in the solo and accompaniment parts. If the approach is based solely on audio, a natural strategy is simply to wait until various solo events are detected, and then to respond to these events. This is the approach taken by the IRCAM score follower, with some success in a variety of pieces of this type.2

A third scenario, which includes our system, treats works for soloist and accompaniment having a continuing musical pulse, including the overwhelming majority of “common practice” art music. This music is the primary focus of most of our performance-oriented music students at the JSoM, and is the music where our accompaniment system is most at home. Music containing a regular, though not rigid, pulse requires close synchronization between the solo and accompanying parts, as the overall result suffers greatly as this

research highlights

synchrony degrades.

Our system is known interchangeably as the “Informatics Philharmonic,” or “Music Plus One” (MPO), due to its alleged improvement on the play-along accompaniment records from the Music Minus One company that inspired our work. For several years, we have collaborated with faculty and students in the JSoM on this traditional concerto setting, in an ongoing effort to improve the performance of our system while exploring variations on this scenario. The web page http://www.music.informatics.indiana.edu/papers/icml10 contains a video of violinist Yoo-jin Cho, accompanied by our system on the first movement of the Sibelius violin concerto, taken from a lecture/concert for our Art’s Week festival of 2007. We will present a description of the overall architecture of our system in terms of its three basic components: Listen, Predict, and Play, including several illuminating examples. We also identify open problems or limitations of proposed approaches that are likely to be interesting to the Machine Learning community, and may well benefit from their contributions.

The basic technology required for common practice classical music extends naturally to the avant garde domain. In fact, we believe one of the greatest potential contributions of the accompaniment system is in new music composed specifically for human–computer partnerships. The computer offers essentially unlimited virtuosity in terms of playing fast notes and coordinating complicated rhythms. On the other hand, at present, the computer is comparatively weak at providing aesthetically satisfying musical interpretations. Compositions that leverage the technical ability of the accompaniment system, while humanizing the performance through the live soloist’s leadership, provide an open-ended musical meeting place for twenty-first-century composition and technology. Several compositions of this variety, written specifically for our accompaniment system by Swiss composer and mathematician Jan Beran, are presented at the web page referenced above.

2. OVERVIEW OF MUSIC PLUS ONE
Our system is composed of three sub-tasks called “Listen,” “Predict,” and “Play.” The Listen module interprets the audio input of the live soloist as it accumulates in real time. In essence, Listen annotates the incoming audio with a “running commentary,” identifying note onsets with variable detection latency, using the hidden Markov model discussed in Section 3. A moment’s thought here reveals that some detection latency is inevitable since a note must be heard for an instant before it can be identified. For this reason, we believe it is hopeless to build a purely “responsive” system, one that waits until a solo note is detected before playing a synchronous accompaniment event: our detection latency is usually in the 30–90-ms range, enough to prove fatal if the accompaniment is consistently behind by this much. For this reason, we model the timing of our accompaniment on the human musician, continually predicting future evolution, while modifying these predictions as more information becomes available. The module of our system that performs this task, Predict, is a Gaussian graphical model quite close to a Kalman Filter, discussed in Section 4. The Play module uses phase-vocoding5 to construct the orchestral audio output using audio from an accompaniment-only recording. This well-known technique warps the timing of the original audio without introducing pitch distortions, thus retaining much of the original musical intent including balance, expression, and tone color. The Play process is driven by the output of the Predict module, in essence by following an evolving sequence of future targets like a trail of breadcrumbs.

While the basic methodology of the system relies on old standards from the ML community (HMMs and Gaussian graphical models), the computational challenge of the system should not be underestimated, requiring accurate real-time two-way audio computation in musical scenarios complex enough to be of interest in a sophisticated musical community. The system was implemented for off-the-shelf hardware in C and C++ over a period of more than 15 years by the author. Both Listen and Play are implemented as separate threads which both make calls to the Predict module when either a solo note is detected (Listen) or an orchestra note is played (Play).

What follows is a more detailed look at Listen and Predict.

3. LISTEN: HMM-BASED SCORE FOLLOWING
Blind music audio recognition1, 7, 13 treats the automatic transcription of music audio into symbolic music representations, using no prior knowledge of the music to be recognized. This problem remains completely open, especially with polyphonic (several independent parts) music, where the state of the art remains primitive. While there are many ways one can build reasonable data models quantifying how well a particular audio instant matches a hypothesized collection of pitches, what seems to be missing is the musical language model. If phonemes and notes are regarded as the atoms of speech and music, there does not seem to be a musical equivalent of the word. Furthermore, while music follows simple logic and can be quite predictable, this logic is often cast in terms of higher-level constructs such as meter, harmony, and motivic transformation. Computationally tractable models such as note n-grams seem to contribute very little here, while a computationally useful music language model remains uncharted territory.

Our Listen module deals with the much simpler situation in which the music score is known, giving the pitches the soloist will play along with their approximate durations. Thus, the score following problem is one of alignment rather than recognition. Score following, otherwise known as online alignment, is more difficult than its offline cousin, since an online algorithm cannot consider future audio data in estimating the times of audio events. A score follower must “hear” a little bit of a note before the note’s onset can be detected, thus always resulting in some degree of latency, the lag between the estimated onset time and the time the estimate is made. One of the principal challenges of online alignment is navigating the trade-off between latency and accuracy. Schwarz14 gives a nice annotated bibliography of the many contributions to score following.

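The difference between online and offline alignment can be illustrated with a small sketch. This is not the author's implementation: it is a toy two-state, left-to-right HMM (state 0 = first note, state 1 = second note) whose transition and emission probabilities, and the names `forward`, `smooth`, `INIT`, `TRANS`, `EMIS`, and `OBS`, are all hypothetical values chosen for illustration. The online (filtered) posterior uses only the frames heard so far, while the offline (smoothed) posterior also uses later frames.

```python
def forward(init, trans, emis, obs):
    """Filtered posteriors P(x_t | y_1..y_t) and per-frame scaling constants."""
    n = len(init)
    alphas, scales = [], []
    prev = None
    for y in obs:
        if prev is None:
            a = [init[j] * emis[j][y] for j in range(n)]
        else:
            a = [sum(prev[i] * trans[i][j] for i in range(n)) * emis[j][y]
                 for j in range(n)]
        c = sum(a)
        prev = [v / c for v in a]  # normalize so each frame sums to 1
        alphas.append(prev)
        scales.append(c)
    return alphas, scales


def smooth(init, trans, emis, obs):
    """Smoothed posteriors P(x_t | y_1..y_T) via the forward-backward recursion."""
    n = len(init)
    alphas, scales = forward(init, trans, emis, obs)
    beta = [1.0] * n
    post = [None] * len(obs)
    for t in range(len(obs) - 1, -1, -1):
        g = [alphas[t][i] * beta[i] for i in range(n)]
        z = sum(g)
        post[t] = [v / z for v in g]
        if t > 0:
            # Backward step: fold in the evidence of frame t for frame t-1.
            beta = [sum(trans[i][j] * emis[j][obs[t]] * beta[j] for j in range(n))
                    / scales[t] for i in range(n)]
    return post


# Hypothetical toy model: the chain starts in note 1 and may only move forward.
INIT = [1.0, 0.0]
TRANS = [[0.8, 0.2], [0.0, 1.0]]
# Each frame yields a noisy symbol: 0 "sounds like note 1", 1 "sounds like note 2".
EMIS = [[0.7, 0.3], [0.3, 0.7]]
OBS = [0, 0, 1, 1, 1, 1]

if __name__ == "__main__":
    filtered, _ = forward(INIT, TRANS, EMIS, OBS)
    smoothed = smooth(INIT, TRANS, EMIS, OBS)
    for t, (f, s) in enumerate(zip(filtered, smoothed)):
        print(t, "online P(note 2) = %.3f" % f[1], "offline P(note 2) = %.3f" % s[1])
```

In this toy run the offline posterior already favors the second note at frame 2, while the online posterior needs two further frames before it exceeds 0.9. That gap is the latency the text describes: confidence about an onset arrives only after some of the note has been heard.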
3.1. The listen model
Our HMM approach views the audio data as a sequence of “frames,” $y_1, y_2, \ldots, y_T$, with about 30 frames per second, while modeling these frames as the output of a hidden Markov chain, $x_1, x_2, \ldots, x_T$. The state graph for the Markov chain, described in Figure 1, models the music as a sequence of sub-graphs, one for each solo note, arranged so that the process enters the start of the $(n+1)$th note as it leaves the $n$th note. From the figure, one can see that each note begins with a short sequence of states meant to capture the attack portion of the note. This is followed by another sequence of states with self-loops meant to capture the main body of the note, and to account for the variation in note duration we may observe, as follows.

If we chain together $m$ states which each either move forward, with probability $p$, or remain in the current state, with probability $q = 1 - p$, then the total number of state visits (audio frames), $L$, spent in the sequence of $m$ states has a negative binomial distribution

\[ P(L = l) = \binom{l-1}{m-1} p^{m} q^{\,l-m} \]

for $l = m, m+1, \ldots$. While it is convenient to represent this distribution with a Markov chain, the asymmetric nature of the negative binomial is also musically reasonable: while it is common for an inter-onset interval (IOI) to be much longer than its nominal length, the reverse is much less common. For each note, we choose the parameters $m$ and $p$ so that $E(L) = m/p$ and $\mathrm{Var}(L) = mq/p^2$ reflect our prior beliefs. Before any rehearsals, the mean is chosen to be consistent with the note value and the nominal tempo given in the score, while the variance is chosen to be a fixed increasing function of the mean. However, once we have rehearsed a piece a few times, we choose $m$ and $p$ according to the method of moments, so that the empirical mean and variance agree with the mean and variance from the model.

In reality, we use a wider variety of note models than depicted in Figure 1, with variants for short notes, notes ending with optional rests, notes that are rests, etc., though all follow the same essential idea. The result is a network of thousands of states.

Figure 1. The state graph for the hidden sequence, $x_1, x_2, \ldots$, of our HMM. (Diagram: the models for Note 1, Note 2, Note 3, etc. are chained in sequence; each begins at a start state, followed by attack states “atck” and sustain states “sust,” with self-loop probability q and forward probability p.)

Figure 2. An idealized note spectrum modeled as a mixture of Gaussians. (Plot: horizontal axis Frequency (Hz), 0 to 1000; vertical axis 0.0 to 0.8.)

Our data model is composed of three features $b_t(y_t)$, $e_t(y_t)$, $s_t(y_t)$ assumed to be conditionally independent given the state:

\[ P(b_t, e_t, s_t \mid x_t) = P(b_t \mid x_t)\, P(e_t \mid x_t)\, P(s_t \mid x_t). \]

The first feature, $b_t$, measures the local “burstiness” of the signal, particularly useful in distinguishing between note attacks and steady-state behavior (observe that we distinguished between the attack portion of a note and the steady-state portion in Figure 1). The second feature, $e_t$, measures the local energy, useful in distinguishing between rests and notes. By far, however, the vector-valued feature $s_t$ is the most important, as it is well-suited to making pitch discriminations, as follows.

We let $f_n$ denote the frequency associated with the nominal pitch of the $n$th score note. As with any quasi-periodic signal with frequency $f_n$, we expect that the audio data from the $n$th note will have a magnitude spectrum composed of “peaks” at integral multiples of $f_n$. This is modeled by the Gaussian mixture model depicted in Figure 2

\[ p_n(j) = \sum_{h=1}^{H} w_h \, N\!\left(j;\ \mu = h f_n,\ \sigma^2 = (\rho h)^2\right) \]

where $\sum_h w_h = 1$ and $N(j; \mu, \sigma^2)$ is a discrete approximation of a Gaussian distribution. The model captures the note’s “spectral envelope,” describing the way energy is distributed over the frequency range. In addition, due to the logarithmic nature of pitch, frequency “errors” committed by the player are proportional to the desired frequency. This is captured in our model by the increasing variance of the mixture components. We define $s_t$ to be the magnitude spectrum of $y_t$, normalized to sum to a constant value, $C$. If we believe the $n$th note is sounding in the $t$th frame, we regard


$s_t$ as the histogram of a random sample of size $C$. Thus our data model becomes the multinomial distribution

\[ P(s_t \mid x_t \in \text{note } n) = \frac{C!}{\prod_j s_t(j)!} \prod_j p_n(j)^{s_t(j)}. \tag{1} \]

It is worth noting that the model generalizes in a straightforward way to situations in which multiple pitches sound at once, simply by mixing several note spectrum models of the Gaussian mixture form depicted in Figure 2. In this way our approach accommodates anything from double stops on the violin to large ensemble performances.

This modeling approach describes the part of the audio spectrum due to the soloist reasonably well. However, our actual signal will receive not only this solo contribution, but also audio generated by our accompaniment system itself. If the accompaniment audio contains frequency content that is confused with the solo audio, the result is the highly undesirable possibility of the accompaniment system following itself, in essence chasing its own shadow. To a certain degree, the likelihood of this outcome can be diminished by “turning off” the score follower when the soloist is not playing; of course we do this. However, there is still significant potential for shadow-chasing since the pitch content of the solo and accompaniment parts is often similar.

Our solution is to directly model the accompaniment contribution to the audio signal we receive. Since we know what the orchestra is playing (our system generates this audio), we add this contribution to the data model. More explicitly, if $q_t$ is the magnitude spectrum of the orchestra’s contribution in frame $t$, we model the conditional distribution of $s_t$ using Equation 1, but with $p_{t,n} = \lambda p_n + (1 - \lambda) q_t$ for $0 < \lambda < 1$ instead of $p_n$.

This addition creates significantly better results in many situations. The surprising difficulty in actually implementing the approach, however, is that there seems to be only weak agreement between the known audio that our system plays through the speakers and the accompaniment audio that comes back through the microphone. Still, with various averaging tricks in the estimation of $q_t$, we can nearly eliminate the undesirable shadow-chasing behavior.

3.2. Online interpretation of audio
One of the worst things a score follower can do is report events before they have occurred. In addition to the sheer impossibility of producing accurate estimates in this case, the musical result often involves the accompanist arriving at a point of coincidence before the soloist does. When the accompanist “steps on” the soloist in this manner, the soloist must struggle to regain control of the performance, perhaps feeling desperate and irrelevant in the process. Since the consequences of false positives are so great, the score follower must be reasonably certain that a note event has already occurred before reporting its location. The probabilistic formulation of online score following is the key to avoiding such false positives, while navigating the accuracy-latency trade-off in a reasonable manner.

Every time we process a new frame of audio we recompute the “forward” probabilities, $p(x_t \mid y_1, \ldots, y_t)$, for our current frame, $t$. Listen waits to detect note $n$ until we are sufficiently confident that its onset is in the past. That is, until

\[ P(x_t \ge \mathrm{start}_n \mid y_1, \ldots, y_t) \ge \tau \]

for some constant, $\tau$. In this expression, $\mathrm{start}_n$ represents the initial state of the $n$th note model, as indicated in Figure 1, which is either before or after every other state in the model (so $x_t \ge \mathrm{start}_n$ makes sense here). Suppose that $t^*$ is the first frame where the above inequality holds. When this occurs, our knowledge of the note onset time can be summarized by the function of $t$:

\[ P(x_t = \mathrm{start}_n \mid y_1, \ldots, y_{t^*}) \]

which we compute using the forward–backward algorithm. Occasionally this distribution conveys uncertainty about the onset time of the note, say, for instance, if it has high variance or is bimodal. In such a case we simply do not report the onset time of the particular note, believing it is better to remain silent than provide bad information. Otherwise, we estimate the onset as

\[ \hat{t}_n = \arg\max_{t \le t^*} P(x_t = \mathrm{start}_n \mid y_1, \ldots, y_{t^*}) \tag{2} \]

and deliver this information to the Predict module.

Several videos demonstrating the ability of our score follower can be seen at the aforementioned web site. One of these simply plays the audio while highlighting the locations of note onset detections at the times they are made, thus demonstrating detection latency: what one sees lags slightly behind what one hears. A second video shows a rather eccentric performer who ornaments wildly, makes extreme tempo changes, plays wrong notes, and even repeats a measure, thus demonstrating the robustness of the score follower.

4. PREDICT: MODELING MUSICAL TIMING
As discussed in Section 2, we believe a purely responsive accompaniment system cannot achieve acceptable coordination of parts in the range of common practice “classical” music we treat, thus we choose to schedule our accompaniment through prediction rather than response. Our approach is based on a probabilistic model for musical timing. In developing this model, we begin with three important traits we believe such a model must have.

1. Since our accompaniment must be constructed in real time, the computational demand of our model must be feasible in real time.
2. Our system must improve with rehearsal. Thus our model must be able to automatically train its parameters to embody the timing nuances demonstrated by the live player in past examples. This way our system can better anticipate the future musical evolution of the current performance.
3. If our rehearsals are to be successful in guiding the system toward the desired musical end, the system must “sightread” (perform without rehearsal) reasonably well. Otherwise, the player will become distracted by the poor ensemble and not be able to demonstrate what he or she wants to hear. Thus there must be a neutral setting of parameters that allows the system to perform reasonably well “out of the box.”

4.1. The timing model
We first consider a timing model for a single musical part. Our model is expressed in terms of two hidden sequences, $\{t_n\}$ and $\{s_n\}$, where $t_n$ is the time, in seconds, of the $n$th note onset and $s_n$ is the tempo, in seconds per beat, for the $n$th note. These sequences evolve according to the model

\[ s_{n+1} = s_n + \sigma_n \tag{3} \]
\[ t_{n+1} = t_n + l_n s_n + \tau_n \tag{4} \]

where $l_n$ is the length of the $n$th event, in beats.

With the “update” variables, $\{\sigma_n\}$ and $\{\tau_n\}$, set to 0, this model gives a literal and robotic musical performance with each inter-onset interval, $t_{n+1} - t_n$, consuming an amount of time proportional to its length in beats, $l_n$. The introduction of the update variables allows time-varying tempo through the $\{\sigma_n\}$, and elongation or compression of note lengths with the $\{\tau_n\}$. We further assume that the $\{(\sigma_n, \tau_n)^t\}$ are independent with $(\sigma_n, \tau_n)^t \sim N(\mu_n, \Gamma_n)$, $n = 1, 2, \ldots$, and $(s_0, t_0)^t \sim N(\mu_0, \Gamma_0)$, thus leading to a joint Gaussian model on all model variables. The rhythmic interpretation embodied by the model is expressed in terms of the $\{\mu_n, \Gamma_n\}$ parameters. In this regard, the $\{\mu_n\}$ vectors represent the tendencies of the performance, that is, where the player tends to speed up ($\sigma_n < 0$), slow down ($\sigma_n > 0$), and stretch ($\tau_n > 0$), while the $\{\Gamma_n\}$ matrices capture the repeatability of these tendencies.

It is simplest to think of Equations 3 and 4 as a timing model for a single musical part. However, it is just as reasonable to view these equations as a timing model for the composite rhythm of the solo and orchestra. That is, consider the situation, depicted in Figure 3, in which the solo, orchestra, and composite rhythms have the following musical times (in beats):

solo    0  1/3       2/3  1  4/3       5/3  2
accomp  0       1/2       1       3/2       2
comp.   0  1/3  1/2  2/3  1  4/3  3/2  5/3  2

The $\{l_n\}$ for the composite would be found by simply taking the differences of rational numbers forming the composite rhythm: $l_1 = 1/3$, $l_2 = 1/6$, etc. In what follows, we regard Equations 3 and 4 as a model for this composite rhythm of the solo and orchestra parts.

Figure 3. Top: Two musical parts generate a composite rhythm when superimposed. Bot: The resulting graphical model arising from the composite rhythm. (Top panel staves: Solo, Accomp. Bottom panel rows: Listen, Updates, Composite, Accomp.)

The observable variables in this model are the solo note onset estimates produced by Listen and the known note onsets of the orchestra (our system constructs these during the performance). Suppose that $n$ indexes the events in the composite rhythm having associated solo notes, estimated by the $\{\hat{t}_n\}$. Additionally, suppose that $n'$ indexes the events having associated orchestra notes with onset times, $\{o_{n'}\}$. We model

\[ \hat{t}_n = t_n + \epsilon_n \]
\[ o_{n'} = t_{n'} + \delta_{n'} \]

where $\epsilon_n \sim N(0, \rho_s^2)$ and $\delta_{n'} \sim N(0, \rho_o^2)$. The result is the Gaussian graphical model depicted in the bottom panel of Figure 3. In this figure, the row labeled “Composite” corresponds to the $\{(s_n, t_n)\}$ variables of Equations 3 and 4, while the row labeled “Updates” corresponds to the $\{(\sigma_n, \tau_n)\}$ variables. The “Listen” row is the collection of estimated solo note onset times, $\{\hat{t}_n\}$, while the “Accompaniment” row corresponds to the orchestra times, $\{o_{n'}\}$.

4.2. The model in action
With the model in place we are now ready for real-time accompaniment. For our first rehearsal we initialize the model so that $\mu_n = 0$ for all $n$. This assumption in no way precludes our system from correctly interpreting and following tempo changes or other rhythmic nuances of the soloist. Rather, it states that, whatever we have seen so far in a performance, we expect future timing to evolve according to the current tempo.

In real-time accompaniment, our system is concerned only with scheduling the currently pending orchestra note time, $o_{n'}$. The time of this note is initially scheduled when we play the previous orchestra note, $o_{n'-1}$. At this point we compute the new mean of $o_{n'}$, conditioning on $o_{n'-1}$ and whatever other variables have been observed, and schedule $o_{n'}$ accordingly. While we wait for the currently scheduled time to occur, the Listen module may detect various solo events, $\hat{t}_n$. When this happens we recompute the mean of $o_{n'}$, conditioning on this new information. Sooner or later the actual clock time will catch up to the currently scheduled time of the $n'$th event, at which point the orchestra note is played. Thus an orchestra note may be rescheduled many times before it is actually played.

A particularly instructive example involves a run of many


solo notes culminating in a point of coincidence with the evolves. This is our only “control” the orchestra performance.
orchestra. As each solo note is detected we refine our esti- After one or more “rehearsals,” we adapt our timing model
mate of the desired point of coincidence, thus gradually to the soloist to better anticipate future performances. To do
“honing in” on this point of arrival. It is worth noting that this, we first perform an off-line estimate of the solo note times
very little harm is done when Listen fails to detect a solo using Equation 2, only conditioning on the entire sequence of
note. We simply predict the pending orchestra note condi- frames, y1, . . . , yT, using the forward–backward algorithm to
tioning on the variables we have observed. identify the most likely onset time for each note. Using one or
The web page given before contains a video demonstrating more such rehearsals, we can iteratively reestimate the model
this process. The video shows the estimated solo times from parameters {mn} using the EM algorithm, resulting in both
our score follower appearing as green marks on a spectro- measurable and perceivable improvement of prediction accu-
gram. Predictions of our accompaniment system are shown racy. While, in principle, we can also estimate the {Gn} param-
as analogous red marks. One can see the pending orchestra eters, we have observed little or no benefit from doing so.
time “jiggling” as new solo notes are estimated, until finally In practice, we have found the soloist’s interpretation to
the currently predicted time passes. In the video, one can see be something of a “moving target.” At first this is because the
occasional solo notes that are never marked with green lines. soloist tends to compromise somewhat in the initial rehears-
These are notes for which the posterior onset time was not suf- als, pulling the orchestra in the desired direction, while not
ficiently peaked to merit a note detection. This happens most actually reaching the target interpretation. But even after the
often with repeated pitches, for which our data model is less soloist seems to settle down to a particular interpretation on
informative, and notes following longer notes, where our prior a given day, we often observe further “interpretation drift”
model is less opinionated. We simply treat such notes as unob- over subsequent meetings. Of course, without this drift one’s
served and base our predictions only on the observed events. ideas could never improve! For this reason we train the model
The role of Predict is to “schedule” accompaniment notes, using the most recent several rehearsals, thus facilitating the
but what does this really mean in practice? Recall that our continually evolving nature of musical interpretation.
program plays audio by phase-vocoding (time-stretching) an
orchestra-only recording. A time-frequency representation 5. MUSICAL EXPRESSION AND MACHINE LEARNING
of such an audio file for the first movement of the Dvor̂ák Our system learns its musicality through “osmosis.” If the
Cello concerto is shown in Figure 4. If you know the piece, soloist plays in a musical way, and the orchestra manages
you will likely be able to follow this spectrogram. In prepar- to closely follow the soloist, then we hope the orchestra will
ing this audio for our accompaniment system, we perform inherit this musicality. This manner of learning by imita-
an off-line score alignment to determine where the various tion works well in the concerto setting, since the division
orchestra notes occur, as marked with vertical lines in the of authority between the players is rather extreme, mostly
figure. Scheduling a note simply means that we change the granting the “right of way” to the soloist.
phase-vocoder’s play rate so that it arrives at the appropri- In contrast, the pure following approach is less reasonable
ate audio file position (vertical line) at the scheduled time. when the accompaniment needs a sense of musicality that
Thus the play rate is continually modified as the performance acts independently, or perhaps even in opposition, to what

Figure 4. A “spectrogram” of the opening of the first movement of the Dvořák Cello concerto. The horizontal axis of the figure represents
time while the vertical axis represents frequency. The vertical lines show the note times for the orchestra.
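
As a rough illustration of what "scheduling" a note means here, the sketch below computes the play rate needed to reach a marked audio position at the scheduled time. It is only a minimal sketch under stated assumptions: the function name `play_rate`, its parameters, and the clamping bounds are hypothetical and do not come from the article.

```python
def play_rate(audio_pos, target_pos, now, scheduled_time,
              min_rate=0.25, max_rate=4.0):
    """Return a play rate that reaches target_pos at scheduled_time.

    audio_pos, target_pos: positions in the orchestra recording (seconds of audio).
    now, scheduled_time: wall-clock times (seconds).
    min_rate, max_rate: illustrative clamping bounds (assumed, not from the article).
    """
    remaining_audio = target_pos - audio_pos
    remaining_time = scheduled_time - now
    if remaining_time <= 0:
        # The scheduled time has already passed: catch up as fast as allowed.
        return max_rate
    rate = remaining_audio / remaining_time
    # Keep the time-stretch factor within the assumed bounds.
    return max(min_rate, min(max_rate, rate))
```

Each time a new solo note changes the predicted time, the rate would simply be recomputed from the current audio position, which matches the "continually modified" behavior the text describes.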

other players do. Such a situation occurs with the early-stage accompaniment problem discussed in Section 1, as here one cannot learn the desired musicality from the live player. Perhaps the accompaniment antithesis of the concerto setting is the opera orchestra, in which the “accompanying” ensemble is often on equal footing with the soloists. We observed the nadir of our system’s performance in an opera rehearsal where our system served as rehearsal pianist. What these two situations have in common is that they require an accompanist with independent musical knowledge and goals.

How can we more intelligently model this musicality? An incremental approach would begin by observing that our timing model of Equations 3 and 4 is over-parametrized, with more degrees of freedom than there are notes. We make this modeling choice because we do not know which degrees of freedom are needed ahead of time, so we use the training data from the soloist to help sort this out. Unnecessary learned parameters may contribute some noise to the resulting timing model, but the overall result is acceptable.

One possible line of improvement is simply decreasing the model’s freedom; surely the player does not wish to change the tempo and apply tempo-independent note length variation on every note. For instance, one alternative model adds a hidden discrete process that “chooses,” for each note, between three possibilities: variation of either tempo or note length, or no variation of either kind. Of these, the choice of neither variation would be the most likely a priori, thus biasing the model toward simpler musical interpretations. The resulting model is a Switching Kalman Filter.15 While exact inference is no longer possible with such a model, we expect that one can make approximations that will be good enough to realize the full potential of the model.

Perhaps a more ambitious approach analyzes the musical score itself to choose the locations requiring degrees of freedom. One can think of this approach as adding “joints” to the musical structure so that it deforms into musically reasonable shapes as a musician applies external force. Here there is an interesting connection with the work on expressive synthesis, such as Widmer and Goebl,16 in which one algorithmically constructs an expressive rendition of a previously unseen piece of music, using ideas of machine learning. One approach here associates various score situations, defined in terms of local configurations of score features, with interpretive actions. The associated interpretive actions are learned by estimating timing and loudness parameters from a performance corpus, over all “equivalent” score locations. Such approaches are far more ambitious than our present approach to musicality, as they try to understand expression in general, rather than in a specific musical context.

The understanding and synthesis of musical expression is one of the most interesting music-science problems, and while progress has been achieved in recent years, it would still be fair to call the problem “open.” One of the principal challenges here is that one cannot directly map observable surface-level attributes of the music, such as pitch contour or local rhythm context, into interpretive actions, such as delay, or tempo or loudness change. Rather, there is a murky intermediate level in which the musician comes to some understanding of the musical meaning, on which the interpretive decisions are based. This meaning comes from several different aspects of the music. For example, some comes from musical structure, as in the way one might slow down at the end of a phrase, giving a sense of musical closure. Some meaning comes from prosodic aspects, analogous to speech, such as a local point of arrival, which may be emphasized or delayed. A third aspect of meaning describes an overall character or affect of a section of music, such as excited or calm. While there is no official taxonomy of musical interpretation, most discussions on this subject revolve around intermediate identifications of this kind, and the interpretive actions they require.10

From the machine learning point of view, it is impossible to learn anything useful from a single example, thus one must group together many examples of the same musical situation in order to learn their associated interpretive actions. Thus it seems natural to model the music in terms of some latent variables that implicitly categorize individual notes or sections of music. What should the latent variables be, and how can one describe the dependency structure among them? While we cannot answer these questions, we see in them a good deal of depth and challenge, and recommend this problem to the musically inclined members of the readership with great enthusiasm.

Acknowledgments
This work was supported by NSF Grants IIS-0812244 and IIS-0739563.

References
1. Cemgil, A.T., Kappen, H.J., Barber, D. A generative model for music transcription. IEEE Trans. Audio Speech Lang. Process. 14, 2 (Mar. 2006), 679–694.
2. Cont, A., Schwarz, D., Schnell, N. From Boulez to ballads: Training ircam’s score follower. In Proceedings of the International Computer Music Conference (2005), 241–248.
3. Dannenberg, R., Mont-Reynaud, B. Following an improvisation in real time. In Proceedings of the 1987 International Computer Music Conference (1987), 241–248.
4. Dannenberg, R., Mukaino, H. New techniques for enhanced quality of computer accompaniment. In Proceedings of the 1988 International Computer Music Conference (1988), 243–249.
5. Flanagan, J.L., Golden, R.M. Phase vocoder. Bell Syst. Tech. J. 45 (Nov. 1966), 1493–1509.
6. Franklin, J. Improvisation and learning. In Advances in Neural Information Processing Systems 14. MIT Press, Cambridge, MA, 2002.
7. Klapuri, A., Davy, M. (editors). Signal Processing Methods for Music Transcription. Springer-Verlag, New York, 2006.
8. Lippe, C. Real-time interaction among composers, performers, and computer systems. Inf. Process. Soc. Jpn. SIG Notes, 123 (2002), 1–6.
9. Pachet, F. Beyond the cybernetic jam fantasy: The continuator. IEEE Comput. Graph. Appl. 24, 1 (2004), 31–35.
10. Palmer, C. Music performance. Annu. Rev. Psychol. 48 (1997), 115–138.
11. Raphael, C. A Bayesian network for real-time musical accompaniment. In Advances in Neural Information Processing Systems (NIPS) 14. MIT Press, 2002.
12. Rowe, R. Interactive Music Systems. MIT Press, 1993.
13. Sagayama, T.N.S., Kameoka, H. Specmurt anasylis: A piano-roll-visualization of polyphonic music signal by deconvolution of log-frequency spectrum. In Proceedings 2004 ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing (SAPA2004) (2004).
14. Schwarz, D. Score following commented bibliography, 2003.
15. Shumway, R.H., Stoffer, D.S. Dynamic linear models with switching. J. Am. Stat. Assoc. 86 (1991), 763–769.
16. Widmer, G., Goebl, W. Computational models for expressive music performance: The state of the art. J. New Music Res. 33, 3 (2004), 203–216.

Christopher Raphael (craphael@indiana.edu), School of Informatics and Computing, Indiana University, Bloomington, IN.

© 2011 ACM 0001-0782/11/0300 $10.00

research highlights
doi:10.1145/1897852.1897876

Technical Perspective
VL2
By Jennifer Rexford

The Internet is increasingly a platform for online services (such as email, Web search, social networks, and virtual worlds) running on rack after rack of servers in data centers. The servers not only communicate with end users, but also with each other to analyze data (for example, to build a search index) or compose Web pages (for example, by combining data from multiple backend servers). With the advent of large data centers, the study of the networks that interconnect these servers has become an important topic to researchers and practitioners alike.

Data-center networking presents unique opportunities and challenges, compared to traditional backbone and enterprise networks:
• In a data center, the same company controls both the servers and the network elements, enabling new network architectures that implement key functionality on the end-host computers.
• Servers are installed in fixed units, such as racks or even trucks filled with racks driven directly into the data center. This leads to very uniform wiring topologies, such as fat trees or Clos networks, reminiscent of the massively parallel computers designed in the 1990s.
• The traffic load in data centers is often quite heavy and non-uniform, due to new backend applications like MapReduce; the traffic can also be quite volatile, varying dramatically and unpredictably over time.

In light of these new characteristics, researchers have been revisiting everything in networking, from addressing and congestion control to routing and the underlying topology, with the unique needs of data centers in mind.

The following paper presents one of the first measurement studies of network traffic in data centers, highlighting specifically the volatility of the traffic even on a relatively small timescale. These observations led the authors to design an “agile” network engineered for all-to-all connectivity with no contention inside the network. This gives data-center operators the freedom to place applications on any servers, without concern for the performance of the underlying network. Having an agile network greatly simplifies the task of designing and running online services.

More generally, the authors propose a simple abstraction, a single “virtual” layer-two switch (hence the name “VL2”) for each service, with no interference with the many other services running in the same data center. They achieve this goal through several key design decisions, including flat addressing (so service instances can run on any server, independent of its location) and Valiant Load Balancing (to spread traffic uniformly over the network). A Clos topology ensures the network has many paths between each pair of servers. To scale to large data centers, the servers take responsibility for translating addresses to the appropriate “exit point” from the network, obviating the need for the networking equipment to keep track of the many end hosts in the data center.

In addition to proposing an effective design, the authors illustrate how to build the solution using mechanisms available in existing network switches (for example, equal-cost multipath routing, IP anycast, and packet encapsulation). This allows data centers to deploy VL2 with no changes to the underlying switches, substantially lowering the barrier for practical deployment. This paper is a great example of rethinking networking from scratch, while coming full circle to work with today’s equipment. Indeed, the work depicted in the VL2 paper has already spawned substantial follow-up work in the networking research community, and likely will for years to come.

Jennifer Rexford is a professor in the Department of Computer Science at Princeton University, Princeton, NJ.

© 2011 ACM 0001-0782/11/0300 $10.00

doi:10.1145/1897852.1897877

VL2: A Scalable and Flexible Data Center Network
By Albert Greenberg, James R. Hamilton, Navendu Jain, Srikanth Kandula, Changhoon Kim,
Parantap Lahiri, David A. Maltz, Parveen Patel, and Sudipta Sengupta

Abstract
To be agile and cost effective, data centers must allow dynamic resource allocation across large server pools. In particular, the data center network should provide a simple flat abstraction: it should be able to take any set of servers anywhere in the data center and give them the illusion that they are plugged into a physically separate, noninterfering Ethernet switch with as many ports as the service needs. To meet this goal, we present VL2, a practical network architecture that scales to support huge data centers with uniform high capacity between servers, performance isolation between services, and Ethernet layer-2 semantics. VL2 uses (1) flat addressing to allow service instances to be placed anywhere in the network, (2) Valiant Load Balancing to spread traffic uniformly across network paths, and (3) end-system-based address resolution to scale to large server pools without introducing complexity to the network control plane. VL2’s design is driven by detailed measurements of traffic and fault data from a large operational cloud service provider. VL2’s implementation leverages proven network technologies, already available at low cost in high-speed hardware implementations, to build a scalable and reliable network architecture. As a result, VL2 networks can be deployed today, and we have built a working prototype. We evaluate the merits of the VL2 design using measurement, analysis, and experiments. Our VL2 prototype shuffles 2.7 TB of data among 75 servers in 395 s, sustaining a rate that is 94% of the maximum possible.

A previous version of this paper was published in the Proceedings of SIGCOMM ’09 (Barcelona, Spain, Aug. 17–21, 2009). ACM, New York.

1. INTRODUCTION
Cloud services are driving the creation of huge data centers, holding tens to hundreds of thousands of servers, that concurrently support a large and dynamic number of distinct services (web apps, e-mail, map-reduce clusters, etc.). The case for cloud service data centers depends on a scale-out design: reliability and performance achieved through large pools of resources that can be rapidly reassigned between services as needed. With data centers being built with over 100,000 servers, at an amortized cost approaching $12 million per month,14 the most desirable property for a data center is agility: the ability to assign any server to any service. Anything less inevitably results in stranded resources and wasted money.

Unfortunately, the data center network is not up to the task, falling short in several ways. First, existing architectures do not provide enough capacity between the servers they interconnect. Conventional architectures rely on treelike network configurations built from expensive hardware. Due to the high equipment cost, the capacity between different levels of the tree is typically oversubscribed by factors of 1:5 or more, with paths near the root oversubscribed by factors of 1:80 to 1:240. This oversubscription limits communication between servers to the point that it fragments the server pool: congestion and computation hot spots are prevalent even when spare capacity is available elsewhere. Second, while data centers host multiple services, the network does little to prevent a traffic flood in one service from affecting other services around it; when one service experiences a traffic flood, it is common for all those sharing the same network subtree to suffer collateral damage. Third, the routing design in conventional networks achieves scale by assigning servers topologically significant IP addresses and dividing servers up among VLANs. However, this creates an enormous configuration burden when servers must be reassigned among services, further fragmenting the resources of the data center. The human involvement typically required in these reconfigurations undermines speed of deployment.

To overcome these limitations in the current network and achieve agility, we arrange for the network to implement a familiar and concrete model: give each service the illusion that all the servers assigned to it, and only those servers, are connected by a single noninterfering Ethernet switch (a Virtual Layer 2) and maintain this illusion even as the size of each service varies from 1 server to 100,000. Realizing this vision for the data center network concretely translates into building a network that meets the following three objectives:

• Uniform high capacity: The maximum rate of a server-to-server traffic flow should be limited only by the available capacity on the network-interface cards of the sending and receiving servers, and it should be possible to assign servers to a service without having to consider network topology.
• Performance isolation: Traffic of one service should not be affected by the traffic of any other service, just as if each service were connected by a separate physical switch.
• Layer-2 semantics: The servers in each service should experience the network as if it were an Ethernet Local Area Network (LAN). Data center management software should be able to assign any IP address the service requests to any server, and virtual machines should be able to migrate to any server while keeping the same IP address. Finally, features like link-local broadcast, on which many legacy applications depend, should work.

We design, implement, and evaluate VL2, a network architecture for data centers that meets these three objectives and thereby achieves agility.

Design philosophy: In designing VL2, a primary goal was to create a network architecture that could be deployed today, so we refrain from making any changes to the hardware of the switches or servers, and we require that legacy applications work unmodified. Our approach is to build a network that operates like a very large switch, choosing simplicity and high performance over other features when needed. We sought to use robust and time-tested control plane protocols, and we avoid adaptive routing schemes that might theoretically offer more bandwidth but open thorny problems that might not need to be solved and would take us away from vanilla, commodity, high-capacity switches. We observe, however, that the software and operating systems on data center servers are already extensively modified (e.g., to create hypervisors for virtualization or blob file systems to store data across servers). Therefore, VL2’s design explores a new split in the responsibilities between host and network, using a layer 2.5 shim in servers’ network stack to work around limitations of the network devices. No new switch software or switch APIs are needed.

Topology: VL2 consists of a network built from low-cost switch ASICs arranged into a Clos topology that provides extensive path diversity between servers. This design replaces today’s mainframe-like large, expensive switches with broad layers of low-cost switches that can be scaled out to add more capacity and resilience to failure. In essence, VL2 applies the principles of RAID (redundant arrays of inexpensive disks) to the network.

Traffic engineering: Our measurements show data centers have tremendous volatility in their workload, their traffic, and their failure patterns. To cope with this volatility in the simplest manner, we adopt Valiant Load Balancing (VLB) to spread traffic across all available paths without any centralized coordination or traffic engineering. Using VLB, each server independently picks a path at random through the network for each of the flows it sends to other servers in the data center. Our experiments verify that using this design achieves both uniform high capacity and performance isolation.

Control plane: The switches that make up the network operate as layer-3 routers with routing tables calculated by OSPF, thereby enabling the use of multiple paths while using a time-tested protocol. However, the IP addresses used by services running in the data center must not be tied to particular switches in the network, or the ability for agile reassignment of servers between services would be lost. Leveraging a trick used in many systems,9 VL2 assigns servers IP addresses that act as names alone, with no topological significance. When a server sends a packet, the shim layer on the server invokes a directory system to learn the actual location of the destination and then tunnels the original packet there. The shim layer also helps eliminate the scalability problems created by ARP in layer-2 networks, and the tunneling improves our ability to implement VLB. These aspects of the design enable VL2 to provide layer-2 semantics, eliminating the fragmentation and waste of server pool capacity that the binding between addresses and locations causes in the existing architecture.

Contributions: In the course of this paper, we describe the current state of data center networks and the traffic across them, explaining why these are important to designing a new architecture. We present VL2’s design, which we have built and deployed into an 80-server cluster. Using the cluster, we experimentally validate that VL2 has the properties set out as objectives, such as uniform capacity and performance isolation. We also demonstrate the speed of the network, such as its ability to shuffle 2.7 TB of data among 75 servers in 395 s (averaging 58.8 Gbps). Finally, we describe our experience applying VLB in a new context, the inter-switch fabric of a data center, and show that VLB smooths utilization while eliminating persistent congestion.

2. BACKGROUND
In this section, we first explain the dominant design pattern for data center architecture today.5 We then discuss why this architecture is insufficient to serve large cloud-service data centers.

As shown in Figure 1, the network is a hierarchy reaching from a layer of servers in racks at the bottom to a layer of core routers at the top. There are typically 20–40 servers per rack, each singly connected to a Top of Rack (ToR) switch with a 1 Gbps link. ToRs connect to two aggregation switches for redundancy, and these switches aggregate further, connecting to access routers. At the top of the hierarchy, core routers carry traffic between access routers and manage traffic into and out of the data center. All links use Ethernet as a physical-layer protocol, with a mix of copper and fiber cabling. All switches below each pair of access routers form a single layer-2 domain. The number of servers in a single

Figure 1. A conventional network architecture for data centers (adapted from figure by Cisco5).
[Diagram: the Internet connects to core routers (CR), which connect to access routers (AR) at layer 3 of the data center; below them, at layer 2, aggregation switches (AS), switches (S), and Top-of-Rack switches (ToR) with their servers form a single layer-2 domain.]
Key: CR = L3 Core Router; AR = L3 Access Router; AS = L2 Aggr Switch; S = L2 Switch; ToR = Top-of-Rack Switch.
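
The division of labor described under "Traffic engineering" and "Control plane" above (a directory lookup, a per-flow random path choice, and tunneling) can be pictured with a small sketch. This is a toy model under stated assumptions, not VL2's implementation: the class `Shim`, the dictionary-based directory, the path labels, and the packet fields are all hypothetical names introduced for illustration.

```python
import random


class Shim:
    """Toy model of a server-side shim: resolve, pick a path per flow, tunnel."""

    def __init__(self, directory, paths, seed=None):
        self.directory = directory      # name (IP used as a name) -> location (assumed dict)
        self.paths = paths              # equivalent paths through the fabric (assumed labels)
        self.rng = random.Random(seed)
        self.flow_paths = {}            # remembers the path chosen for each flow

    def path_for(self, flow):
        # Choose a path at random the first time a flow is seen, then reuse it,
        # so the randomization happens per flow rather than per packet.
        if flow not in self.flow_paths:
            self.flow_paths[flow] = self.rng.choice(self.paths)
        return self.flow_paths[flow]

    def send(self, packet):
        # Directory lookup: where does the destination name currently live?
        location = self.directory[packet["dst"]]
        flow = (packet["src"], packet["dst"], packet["sport"], packet["dport"])
        # Tunnel the unmodified original packet toward that location.
        return {"outer_dst": location, "via": self.path_for(flow), "inner": packet}


# Hypothetical names and locations, purely for illustration.
shim = Shim({"10.0.0.9": "tor-42"}, ["path-A", "path-B", "path-C"], seed=1)
pkt = {"src": "10.0.0.5", "dst": "10.0.0.9", "sport": 5000, "dport": 80}
first = shim.send(pkt)
second = shim.send(pkt)
```

Because the server's address appears only as a key in the directory, moving a service to another rack would change one directory entry in this model and nothing in the switches.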

layer-2 domain is typically limited to a few hundred due to Ethernet scaling overheads (packet flooding and ARP broadcasts). To limit these overheads and to isolate different services or logical server groups (e.g., e-mail, search, web front ends, web back ends), servers are partitioned into virtual LANs (VLANs) placed into distinct layer-2 domains. Unfortunately, this conventional design suffers from three fundamental limitations:

Limited server-to-server capacity: As we go up the hierarchy, we are confronted with steep technical and financial barriers in sustaining high bandwidth. Thus, as traffic moves up through the layers of switches and routers, the oversubscription ratio increases rapidly. For example, servers typically have 1:1 oversubscription to other servers in the same rack; that is, they can communicate at the full rate of their interfaces (e.g., 1 Gbps). We found that uplinks from ToRs are typically 1:2 to 1:20 oversubscribed (i.e., 1–10 Gbps of uplink for 20 servers), and paths through the highest layer of the tree can be 1:240 oversubscribed. This large oversubscription factor fragments the server pool by preventing idle servers from being assigned to overloaded services, and it severely limits the entire data center’s performance.

Fragmentation of resources: As the cost and performance of communication depends on distance in the hierarchy, the conventional design encourages service planners to cluster servers nearby in the hierarchy. Moreover, spreading a service outside a single layer-2 domain frequently requires the onerous task of reconfiguring IP addresses and VLAN trunks, since the IP addresses used by servers are topologically determined by the access routers above them. Collectively, this contributes to the squandering of computing resources across the data center. The consequences are egregious. Even if there is plentiful spare capacity throughout the data center, it is often effectively reserved by a single service (and not shared), so that this service can scale out to nearby servers to respond rapidly to demand spikes or to failures. In fact, the growing resource needs of one service have forced data center operations to evict other services in the same layer-2 domain, incurring significant cost and disruption.

Poor reliability and utilization: Above the ToR, the basic resilience model is 1:1. For example, if an aggregation switch or access router fails, there must be sufficient remaining idle capacity on the counterpart device to carry the load. This forces each device and link to be run up to at most 50% of its maximum utilization. Inside a layer-2 domain, use of the Spanning Tree Protocol means that even when multiple paths between switches exist, only a single one is used. In the layer-3 portion, Equal Cost Multipath (ECMP) is typically used: when multiple paths of the same length are available to a destination, each router uses a hash function to spread flows evenly across the available next hops. However, the conventional topology offers at most two paths.

3. MEASUREMENTS AND IMPLICATIONS
Developing a new network architecture requires a quantitative understanding of the traffic matrix (who sends how much data to whom and when?) and churn (how often does the state of the network change due to switch/link failures and recoveries, etc.?). We studied the production data centers of a large cloud service provider, and we use the results to drive our choices in designing VL2. Details of the methodology and results can be found in other papers.10, 16 Here we present the key findings that directly impact the design of VL2.

Most traffic is internal to the data center: The ratio of traffic volume between servers in our data centers to traffic entering/leaving our data centers is currently around 4:1 (excluding CDN applications). An increasing fraction of the computation in data centers involves back-end computations, and these are driving the demands for network bandwidth.

The network bottlenecks computation: Data center computation is focused where high-speed access to data on memory or disk is fast and cheap. Even inside a single data center, the network is a bottleneck to computation: we frequently see switches whose uplinks are above 80% utilization. Intense computation and communication on data does not straddle data centers due to the cost of long-haul links.

Structured flow sizes: Figure 2 illustrates the nature of flows within the monitored data center. The flow size statistics (marked as “+”s) show that the majority of flows are small (a few KB); most of these small flows are hellos and meta-data requests to the distributed file system. To examine longer flows, we compute a statistic termed total bytes (marked as “o”s) by weighting each flow size by its number of bytes. Total bytes tells us, for a random byte, the distribution of the flow size it belongs to. Almost all the bytes in the data center are transported in flows whose lengths vary from about 100 MB to about 1 GB. The mode at around 100 MB springs from the fact that the distributed file system breaks long files into 100-MB-long chunks. Importantly, there are almost no flows over a few GB.

Figure 3 shows the probability density function (as a fraction of time) for the number of concurrent flows going in and out of a machine. There are two modes. More

Figure 2. Mice are numerous; 99% of flows are smaller than 100 MB. However, more than 90% of bytes are in flows between 100 MB and 1 GB.
[Two panels, each plotted against Flow Size (Bytes) on a logarithmic axis from 1 to 1e+12: the PDF (top) and the CDF (bottom) of Flow Size and Total Bytes.]
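
To make the "total bytes" statistic concrete, the sketch below compares the fraction of flows in a size range with the fraction of bytes carried by those flows. The workload is synthetic and invented for illustration only (it is not the paper's data), and the helper name `flow_and_byte_fractions` is an assumption of this sketch.

```python
def flow_and_byte_fractions(flow_sizes, lo, hi):
    """Return (fraction of flows, fraction of bytes) for flows with lo <= size < hi."""
    total_flows = len(flow_sizes)
    total_bytes = sum(flow_sizes)
    in_range = [size for size in flow_sizes if lo <= size < hi]
    # Counting flows weights every flow equally; summing sizes weights each
    # flow by the number of bytes it carries (the "total bytes" view).
    return len(in_range) / total_flows, sum(in_range) / total_bytes


KB, MB, GB = 10**3, 10**6, 10**9

# Made-up workload: many tiny control flows plus a few 100 MB chunk transfers.
flows = [2 * KB] * 990 + [100 * MB] * 10

small_flows, small_bytes = flow_and_byte_fractions(flows, 0, 100 * MB)
big_flows, big_bytes = flow_and_byte_fractions(flows, 100 * MB, GB + 1)
```

In this toy workload nearly every flow is small, yet nearly every byte travels in a large flow, which is the same qualitative shape that the Figure 2 caption reports for the measured traffic.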


Figure 3. Number of concurrent connections has two modes: (1) 10 flows per node more than 50% of the time and (2) 80 flows per node for at least 5% of the time.
[Plot: PDF (Fraction of Time, left axis) and CDF (Cumulative, right axis) versus Number of Concurrent flows in/out of each Machine, on a logarithmic axis from 1 to 1000.]

theory, ensures a noninterfering packet-switched network6 (the counterpart of a non-blocking circuit-switched network) as long as (a) traffic spreading ratios are uniform, and (b) the offered traffic patterns do not violate edge constraints (i.e., line card speeds). We use ECMP to pursue the former and TCP’s end-to-end congestion control to pursue the latter. While these design choices do not perfectly ensure the two assumptions (a and b), we show in Section 5.1 that our scheme’s performance is close to the optimum in practice.
Building on proven networking technology: VL2 is based on IP routing and forwarding technologies already available in commodity switches: link-state routing, ECMP forwarding, and IP any-casting. VL2 uses a link-state routing
protocol to maintain the switch-level topology, but not to
than 50% of the time, an average machine has about ten disseminate end hosts’ information. This strategy protects
­concurrent flows, but at least 5% of the time it has greater switches from needing to learn voluminous, frequently
than 80 ­concurrent flows. We almost never see more than changing host information. Furthermore, the routing
100 ­concurrent flows. design uses ECMP forwarding along with anycast addresses
The distributions of flow size and number of concurrent to enable VLB while minimizing control plane messages
flows both imply that flow-based VLB will perform well on and churn.
this traffic. Since even big flows are only 100MB (1s of trans- Separating names from locators: To be able to ­rapidly
mit time at 1Gbps), randomizing at flow granularity (rather grow or shrink server allocations and rapidly migrate
than packet) will not cause perpetual congestion if there is VMs, the data center network must support agility, which
unlucky placement of too many flows in the same link. means supporting the hosting of any service on any server. This,
Volatile traffic patterns: While the sizes of flows show a in turn, calls for separating names from locations. VL2’s
strong pattern, the traffic patterns inside a data center are ­addressing scheme separates servers’ names, termed
highly divergent. When we cluster the traffic patterns, we ­application-­specific addresses (AAs), from their loca-
find that more than 50 representative patterns are required tions, termed ­location-specific addresses (LAs). VL2 uses
to describe the traffic in the data center. Further, the traf- a ­scalable, reliable directory system to maintain the map-
fic pattern varies frequently—60% of the time the network pings between names and locators. A shim layer running
spends only 100 s in one pattern before switching to another. in the networking stack on every host, called the VL2 agent,
Frequent failures: As discussed in Section 2, conventional invokes the directory ­system’s resolution service.
data center networks apply 1 + 1 redundancy to improve reli- Embracing end systems: The rich and homogeneous
ability at higher layers of the hierarchical tree. This hierar- ­programmability available at data center hosts provides a
chical topology is intrinsically unreliable—even with huge mechanism to rapidly realize new functionality. For exam-
effort and expense to increase the reliability of the network ple, the VL2 agent enables fine-grained path control by
devices close to the top of the hierarchy, we still see failures ­adjusting the randomization used in VLB. The agent also
on those devices resulting in significant downtime. In 0.3% replaces Ethernet’s ARP functionality with queries to the
of failures, all redundant components in a network device VL2 directory system. The directory system itself is also
group became unavailable (e.g., the pair of switches that ­realized on regular servers, rather than switches, and thus
comprise each node in the conventional network (Figure 1) offers ­flexibility, such as fine-grained access control between
or both the uplinks from a switch). The main causes of fail- application servers.
ures are network misconfigurations, firmware bugs, and
faulty components. 4.1. Scale-out topologies
With no obvious way to eliminate failures from the top As described in Sections 2 and 3, conventional hierarchical
of the hierarchy, VL2’s approach is to broaden the top levels data center topologies have poor bisection bandwidth and
of the network so that the impact of failures is muted and are susceptible to major disruptions due to device failures.
performance degrades gracefully, moving from 1 + 1 redun- Rather than scale up individual network devices with more
dancy to n + m redundancy. capacity and features, we scale out the devices—building a
broad network offering huge aggregate capacity using a large
4. VIRTUAL LAYER 2 NETWORKING number of simple, inexpensive devices, as shown in Figure 4.
Before detailing our solution, we briefly discuss our design This is an example of a folded Clos network6 where the links
principles and preview how they will be used in our design. between the intermediate switches and the aggregation
Randomizing to cope with volatility: The high divergence switches form a complete bipartite graph. As in the conven-
and unpredictability of data center traffic matrices suggest tional topology, ToRs connect to two aggregation switches,
that optimization-based approaches to traffic engineering but the large number of paths between any two aggregation
risk congestion and complexity to little benefit. Instead, switches means that if there are n intermediate switches, the
VL2 uses VLB: destination-independent (e.g., random) failure of any one of them reduces the bisection bandwidth
traffic spreading across the paths in the network. VLB, in by only 1/n—a desirable property we call graceful degradation



of bandwidth. Further, it is easy and inexpensive to build a Clos network for which there is no oversubscription (further discussion on cost is given in Section 6). For example, in Figure 4, we use D_A-port aggregation and D_I-port intermediate switches, and connect these switches such that the capacity between each layer is D_I × D_A/2 times the link capacity.

Figure 4. An example Clos network between aggregation and intermediate switches provides a richly connected backbone well suited for VLB. The network is built with two separate address families: topologically significant locator-specific addresses (LAs) and flat application-specific addresses (AAs).
[Figure 4 diagram: D_A/2 intermediate switches, each with D_I 10G ports; D_I aggregation switches, each with D_A/2 10G ports up and D_A/2 10G ports down; D_A·D_I/4 ToR switches, each with 2 × 10G uplinks and 20 servers, for 20(D_A·D_I/4) servers in total. The link-state network carries only LAs (e.g., 10/8); the fungible pool of servers owns AAs (e.g., 20/8).]

The Clos topology is exceptionally well suited for VLB in that by forwarding traffic through an intermediate switch that is chosen in a destination-independent fashion (e.g., randomly chosen), the network can provide bandwidth guarantees for any traffic matrices that obey the hose model [8]. Meanwhile, routing remains simple and resilient on this topology: take a random path up to a random intermediate switch and a random path down to a destination ToR switch.

4.2. VL2 addressing and routing
This section explains the motion of packets in a VL2 network, and how the topology, routing design, VL2 agent, and directory system combine to virtualize the underlying network fabric and create the illusion that hosts are connected to a big, noninterfering, data center-wide layer-2 switch.

Address Resolution and Packet Forwarding: VL2 uses two separate classes of IP addresses, illustrated in Figure 4. The network infrastructure operates using LAs; all switches and interfaces are assigned LAs, and switches run an IP-based (layer-3) link-state routing protocol that disseminates only these LAs. This allows switches to obtain complete knowledge about the switch-level topology, as well as forward any packets encapsulated with LAs along the shortest paths. On the other hand, applications use permanent AAs, which remain unaltered no matter how servers' locations change due to VM migration or reprovisioning. Each AA (server) is associated with an LA, the IP address of the ToR switch to which the application server is connected. The ToR switch need not be physical hardware; it could be a virtual switch or hypervisor implemented in software on the server itself!

The VL2 directory system stores the mapping of AAs to LAs, and this mapping is created when application servers are provisioned to a service and assigned AA addresses. Resolving these mappings through a unicast-based custom protocol eliminates the ARP and DHCP scaling bottlenecks that plague large Ethernets.

Packet forwarding: Since AA addresses are not announced into the routing protocols of the network, for a server to receive a packet the sending server must first encapsulate the packet (Figure 5), setting the destination of the outer header to the LA of the destination AA. Once the packet arrives at the LA (the destination ToR or hypervisor), the switch (the ToR or the VM switch in the destination hypervisor) decapsulates the packet and delivers it to the destination AA given in the inner header.

Figure 5. VLB in an example VL2 network. Sender S sends packets to destination D via a randomly chosen intermediate switch using IP-in-IP encapsulation. AAs are from 20/8, and LAs are from 10/8. H(ft) denotes a hash of the five tuple.

Address resolution and access control: Servers in each service are configured to believe that they all belong to the same IP subnet, so when an application sends a packet to an AA for the first time, the networking stack on the host generates a broadcast ARP request for the destination AA. The VL2 agent running in the source's networking stack intercepts the ARP request and converts it to a unicast query to the VL2 directory system. The directory system answers the query with the LA of the ToR to which packets should be tunneled. During this resolution process, the directory server can additionally evaluate the access-control policy between the source and destination and selectively reply to the resolution query, enforcing necessary isolation policies between applications.

These addressing and forwarding mechanisms were chosen for two main reasons. First, they make it possible to use low-cost switches, which often have small routing tables (typically just 16K entries) that can hold only LA routes, without concern for the huge number of AAs. Second, they allow the control plane to support agility with very little overhead; the design obviates frequent link-state advertisements to disseminate host-state changes and host/switch reconfiguration.

Random Traffic Spreading over Multiple Paths: To offer hot


spot-free performance for arbitrary traffic matrices, VL2 uses VLB as its traffic engineering philosophy. As illustrated in Figure 5, VL2 achieves VLB using a combination of ECMP routing implemented by the switches and packet encapsulation implemented by the shim on each server. ECMP, a mechanism already implemented in the hardware of most switches, will distribute flows across the available paths in the network, with packets that have the same source and destination addresses taking the same path to avoid packet reordering. To leverage all the available paths in the network and overcome some limitations in ECMP, the VL2 agent on each sender encapsulates each packet to an intermediate switch. Hence, the packet is first delivered to one of the intermediate switches, decapsulated by the switch, delivered to the ToR's LA, decapsulated again, and finally sent to the destination server. The source address in the outer headers of the encapsulated packet is set to a hash of the inner packet's addresses and ports; this provides additional entropy to better distribute flows between the same servers across the available paths.

One potential issue for both ECMP and VLB is the chance that uneven flow sizes and random spreading decisions will cause transient congestion on some links. Our evaluation did not find this to be a problem on data center workloads (Section 5), but should it occur, the VL2 agent on the sender can detect and deal with it via simple mechanisms. For example, it can change the hash used to create the source address periodically or whenever TCP detects a severe congestion event (e.g., a full window loss) or an Explicit Congestion Notification.

4.3. Maintaining host information using the VL2 directory system
The VL2 directory system provides two key functions: (1) lookups and updates for AA-to-LA mappings and (2) a reactive cache update mechanism that ensures eventual consistency of the mappings with very little update overhead (Figure 6).

Figure 6. VL2 Directory System Architecture.

We expect the lookup workload for the directory system to be frequent and bursty because servers can communicate with up to hundreds of other servers in a short time period, with each new flow generating a lookup for an AA-to-LA mapping. The bursty nature of the workload implies that lookups require high throughput and low response time to quickly establish a large number of connections. Since lookups replace ARP, their response time should match that of ARP, that is, tens of milliseconds. For updates, however, the workload is driven by server-deployment events, most of which are planned ahead by the data center management system and hence can be batched. The key requirement for updates is reliability, and response time is less critical.

Our directory service replaces ARP in a conventional L2 network, and ARP ensures eventual consistency via timeout and broadcasting. This implies that eventual consistency of AA-to-LA mappings is acceptable as long as we provide a reliable update mechanism. Nonetheless, we intend to support live VM migration in a VL2 network; our directory system should be able to correct all the stale entries without breaking any ongoing communications.

The differing performance requirements and workload patterns of lookups and updates lead us to a two-tiered directory system architecture consisting of (1) a modest number (50–100 servers for 100K servers) of read-optimized, replicated lookup servers that cache AA-to-LA mappings and that communicate with VL2 agents, and (2) a small number (5–10 servers) of write-optimized, asynchronous replicated state-machine (RSM) servers offering a strongly consistent, reliable store of AA-to-LA mappings. The lookup servers ensure low latency, high throughput, and high availability for a high lookup rate. Meanwhile, the RSM servers ensure strong consistency and durability for a modest rate of updates using the Paxos [19] consensus algorithm.

Each lookup server caches all the AA-to-LA mappings stored at the RSM servers and independently replies to lookup queries from agents using the cached state. Since strong consistency is not required, a lookup server lazily synchronizes its local mappings with the RSM every 30 s. To achieve high availability and low latency, an agent sends a query to k (two in our prototype) randomly chosen lookup servers and simply chooses the fastest reply. Since AA-to-LA mappings are cached at lookup servers and in VL2 agents' caches, an update can lead to inconsistency. To resolve inconsistency, the cache-update protocol leverages a key observation: a stale host mapping needs to be corrected only when that mapping is used to deliver traffic. Specifically, when a stale mapping is used, some packets arrive at a stale LA, that is, a ToR that no longer hosts the destination server. The ToR forwards such non-deliverable packets to a lookup server, triggering the lookup server to correct the stale mapping in the source's cache via unicast.
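The lookup path and the reactive cache correction described in Section 4.3 can be sketched as a small single-process simulation. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `LookupServer`, `Agent`, and `deliver` are hypothetical, reply latencies are fixed numbers rather than measurements, and the RSM tier and lazy synchronization are left out.

```python
import random


class LookupServer:
    """Hypothetical read-optimized cache of AA -> LA mappings."""

    def __init__(self, name, latency_ms):
        self.name = name
        self.latency_ms = latency_ms  # assumed fixed reply latency
        self.cache = {}

    def lookup(self, aa):
        # Reply with (latency, LA); LA is None if the mapping is unknown.
        return self.latency_ms, self.cache.get(aa)


class Agent:
    """Hypothetical VL2 agent: caches mappings and resolves misses via k servers."""

    def __init__(self, servers, k=2, rng=None):
        self.servers = servers
        self.k = k
        self.rng = rng or random.Random()
        self.cache = {}

    def resolve(self, aa):
        if aa in self.cache:
            return self.cache[aa]
        # Query k randomly chosen lookup servers and keep the fastest reply.
        chosen = self.rng.sample(self.servers, self.k)
        _, la = min((s.lookup(aa) for s in chosen), key=lambda reply: reply[0])
        self.cache[aa] = la
        return la

    def correct(self, aa, la):
        # Reactive update: a lookup server unicasts the fresh mapping to us.
        self.cache[aa] = la


def deliver(agent, aa, hosted_at, lookup_server):
    """Send one packet; hosted_at maps each AA to the LA that hosts it now."""
    la = agent.resolve(aa)
    if hosted_at[aa] == la:
        return "delivered"
    # Stale LA: the old ToR hands the packet to a lookup server, which
    # corrects the mapping in the source agent's cache.
    agent.correct(aa, lookup_server.cache[aa])
    return "stale"


servers = [LookupServer("ds%d" % i, lat) for i, lat in enumerate([3.0, 1.0, 2.0])]
for s in servers:
    s.cache["20.0.0.56"] = "10.0.0.6"
hosted_at = {"20.0.0.56": "10.0.0.6"}
agent = Agent(servers, k=2, rng=random.Random(0))
print(deliver(agent, "20.0.0.56", hosted_at, servers[0]))  # delivered
```

In this sketch a stale entry costs exactly one misdirected packet before the agent's cache is corrected, which mirrors the observation that a mapping only needs fixing when it is actually used.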
[Figure 6 diagram labels. Lookup path: agent, 1. Lookup, 2. Reply (from directory servers, DS). Update path: agent, 1. Update, 2. Set, 3. Replicate (among RSM servers), 4. Ack, 5. Ack, (6. Disseminate).]

5. EVALUATION
In this section, we evaluate VL2 using a prototype running on an 80-server testbed and 10 commodity switches (Figure 7). Our goals are first to show that VL2 can be built from components available today, and second, that our implementation meets the objectives described in Section 1.

The testbed is built using the Clos network topology of Figure 4, consisting of three intermediate switches, three aggregation switches, and 4 ToRs. The aggregation and intermediate switches have 24 10 Gbps Ethernet ports, of which 6 ports are used on each aggregation switch and 3

ports on each intermediate switch. The ToR switches have two 10-Gbps ports and 24 1-Gbps ports. Each ToR is connected to two aggregation switches via 10-Gbps links, and to 20 servers via 1-Gbps links. Internally, the switches use commodity ASICs (Broadcom ASICs 56820 and 56514), although any switch that supports line-rate L3 forwarding, OSPF, ECMP, and IP-in-IP decapsulation will work.

Figure 7. VL2 testbed comprising 80 servers and 10 switches.

Overall, our evaluation shows that VL2 provides an effective substrate for a scalable data center network; VL2 achieves (1) 94% of optimal network capacity, (2) a TCP fairness index of 0.995, (3) graceful degradation under failures with fast reconvergence, and (4) 50K lookups/s under 10 ms for fast address resolution.

5.1. VL2 provides uniform high capacity
A central objective of VL2 is uniform high capacity between any two servers in the data center. How closely do the performance and efficiency of a VL2 network match those of a layer-2 switch with 1:1 oversubscription? To answer this question, we consider an all-to-all data shuffle stress test: all servers simultaneously initiate TCP transfers to all other servers. This data shuffle pattern arises in large-scale sorts, merges, and join operations in the data center, for example, in Map/Reduce or DryadLINQ jobs [7, 22]. Application developers use these operations with caution today, because they are so expensive in network resources. If data shuffles can be supported efficiently, it would have a large impact on the overall algorithmic and data storage strategy.

We create an all-to-all data shuffle traffic matrix involving 75 servers. Each of the 75 servers must deliver 500 MB of data to each of the 74 other servers, a shuffle of 2.7 TB from memory to memory. Figure 8 shows how the sum of the goodput over all flows varies with time during a typical run of the 2.7 TB data shuffle. During the run, the sustained utilization of the core links in the Clos network is about 86%, and VL2 achieves an aggregate goodput of 58.8 Gbps. The goodput is very evenly divided among the flows for most of the run, with a fairness index between the flows of 0.995 [15], where 1.0 indicates perfect fairness (mean goodput per flow 11.4 Mbps, standard deviation 0.75 Mbps). This goodput is more than 10x what the network in our current data centers can achieve with the same investment.

Figure 8. Aggregate goodput during a 2.7TB shuffle among 75 servers.
[Figure 8 plot: aggregate goodput (Gbps, 0 to 60) and number of active flows (0 to 6000) against time (0 to 400 s).]

We measure how close VL2 gets to the maximum achievable throughput in this environment by computing the goodput efficiency for this data transfer. Goodput efficiency is defined as the sent goodput summed over all interfaces divided by the sum of the interface capacities. An efficiency of 1.0 would mean that all the capacity on all the interfaces is entirely used carrying useful bytes from the time the first flow starts to when the last flow ends. The VL2 network achieves an efficiency of 94%, with the difference from perfect being due to the encapsulation headers (3.8%), TCP congestion control dynamics, and TCP retransmissions.

This 94% efficiency combined with the fairness index of 0.995 demonstrates that VL2 can achieve uniform high bandwidth across all servers in the data center.

5.2. VL2 provides performance isolation
One of the primary objectives of VL2 is agility, which we define as the ability to assign any server, anywhere in the data center, to any service (Section 1). Achieving agility critically depends on providing sufficient performance isolation between services so that if one service comes under attack or a bug causes it to spray packets, it does not adversely impact the performance of other services.

Performance isolation in VL2 rests on the mathematics of VLB: any traffic matrix that obeys the hose model is routed by splitting to intermediate nodes in equal ratios (through randomization) to prevent any persistent hot spots. Rather than have VL2 perform admission control or rate shaping to ensure the traffic offered to the network conforms to the hose model, we instead rely on TCP to ensure that each flow offered to the network is rate limited to its fair share of its bottleneck.

A key question we need to validate for performance isolation is whether TCP reacts sufficiently quickly to control the offered rate of flows within services. TCP works with packets and adjusts their sending rate at the time scale of RTTs. Conformance to the hose model, however, requires instantaneous feedback to avoid oversubscription of traffic ingress/egress bounds. Our next set of experiments shows that TCP is "fast enough" to enforce the hose model for traffic in each service so as to provide the desired performance isolation across services.
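To make the hose-model condition concrete, the following sketch checks whether a traffic matrix respects the ingress/egress bounds of every server. It is only an illustration under stated assumptions: the function name `obeys_hose_model` is hypothetical, all servers are assumed to share one NIC speed, and rates are integers in Mbps to avoid floating-point rounding.

```python
def obeys_hose_model(tm, line_rate):
    """Return True if no server sends or receives more than line_rate.

    tm[i][j] is the offered rate from server i to server j, in the same
    unit as line_rate. Diagonal entries are ignored.
    """
    n = len(tm)
    for i in range(n):
        egress = sum(tm[i][j] for j in range(n) if j != i)
        ingress = sum(tm[j][i] for j in range(n) if j != i)
        if egress > line_rate or ingress > line_rate:
            return False
    return True


LINE_RATE_MBPS = 1000  # 1-Gbps server NIC, as in the testbed

# All-to-all among four servers at 300 Mbps per pair:
# every server sends and receives 900 Mbps.
shuffle = [[0 if i == j else 300 for j in range(4)] for i in range(4)]

# Three servers each send 400 Mbps to server 0: 1200 Mbps into one NIC.
incast = [[0] * 4 for _ in range(4)]
for src in (1, 2, 3):
    incast[src][0] = 400

print(obeys_hose_model(shuffle, LINE_RATE_MBPS))  # True
print(obeys_hose_model(incast, LINE_RATE_MBPS))   # False
```

VL2 does not run such a check itself; as the text explains, it relies on TCP to push offered traffic toward matrices for which this predicate holds.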


In this experiment, we add two services to the network. The first service has a steady network workload, while the workload of the second service ramps up and down. Both services' servers are intermingled among the 4 ToRs, so their traffic mixes at every level of the network. Figure 9 shows the aggregate goodput of both services as a function of time. As seen in the figure, there is no perceptible change to the aggregate goodput of service one as the flows in service two start up or complete, demonstrating performance isolation when the traffic consists of large long-lived flows. In Figure 10, we perform a similar experiment, but service two sends bursts of small TCP connections, each burst containing progressively more connections. These two experiments demonstrate that TCP's enforcement of the hose model is sufficient to provide performance isolation across services at timescales greater than a few RTTs (i.e., 1–10 ms in data centers).

Figure 9. Aggregate goodput of two services with servers intermingled on the ToRs. Service one's goodput is unaffected as service two ramps traffic up and down.
[Figure 9 plot: aggregate goodput (Gbps) of Service 1 and Service 2 against time (s).]

5.3. VL2 directory system performance
Finally, we evaluate the performance of the VL2 directory system, which provides the equivalent semantics of ARP in layer 2. We perform this evaluation through macro- and micro-benchmark experiments on the directory system. We run our prototype on up to 50 machines: 3–5 RSM nodes, 3–7 directory server nodes, and the remaining nodes emulating multiple instances of VL2 agents generating lookups and updates.

Our evaluation supports four main conclusions. First, the directory system provides high throughput and fast response time for lookups: three directory servers can handle 50K lookups/s with latency under 10 ms (99th percentile latency). Second, the directory system can handle updates at rates significantly higher than the expected churn rate in typical environments: three directory servers can handle 12K updates/s within 600 ms (99th percentile latency). Third, our system is incrementally scalable: each directory server increases the processing rate by about 17K lookups/s and 4K updates/s. Finally, the directory system is robust to component (directory or RSM server) failures and offers high availability under network churn.

To understand the incremental scalability of the directory system, we measured the maximum lookup rates (ensuring sub-10 ms latency for 99% of requests) with 3, 5, and 7 directory servers. The result confirmed that the maximum lookup rate increases linearly with the number of directory servers (with each server offering a capacity of 17K lookups/s). Based on this result, we estimate the worst-case number of directory servers needed for a 100K-server data center. Using the concurrent flow measurements (Figure 3), we use the median of 10 correspondents per server in a 100 s window. In the worst case, all 100K servers may perform 10 lookups at the same time, resulting in a million simultaneous lookups per second. As noted above, each directory server can handle about 17K lookups/s under 10 ms at the 99th percentile. Therefore, handling this worst case will require a directory system of about 60 servers (0.06% of all servers).

6. DISCUSSION
In this section, we address several remaining concerns about the VL2 architecture, including whether other traffic engineering mechanisms might be better suited to the DC than VLB, and the cost of a VL2 network.
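The worst-case directory sizing estimate from Section 5.3 (100K servers, 10 simultaneous lookups each, about 17K lookups/s per directory server) can be reproduced with a few lines of arithmetic. The function name below is hypothetical and the sketch only restates the numbers given in the text.

```python
import math


def directory_servers_needed(num_servers, lookups_per_server, per_server_capacity):
    """Return (total lookups/s, directory servers needed) for a worst-case burst."""
    total = num_servers * lookups_per_server
    return total, math.ceil(total / per_server_capacity)


total, needed = directory_servers_needed(
    num_servers=100_000,         # size of the data center
    lookups_per_server=10,       # median correspondents per server (Figure 3)
    per_server_capacity=17_000,  # measured lookups/s per directory server
)
print(total, needed)  # 1000000 59
```

The exact quotient rounds up to 59 servers, which the text reports as "about 60", or roughly 0.06% of a 100K-server data center.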
Figure 10. Aggregate goodput of service one as service two creates bursts containing successively more short TCP connections.
[Figure 10 plot: aggregate goodput (Gbps) and number of mice started against time (s).]

Optimality of VLB: As noted in Section 4.2, VLB uses randomization to cope with volatility, potentially sacrificing some performance for a best-case traffic pattern by turning all traffic patterns (including both best-case and worst-case) into the average case. This performance loss will manifest itself as the utilization of some links being higher than it would be under a more optimal traffic engineering system. To quantify the increase in link utilization VLB will suffer, we compare VLB's maximum link utilization with that achieved by other routing strategies on the VL2 topology for a full day's traffic matrices (TMs) (at 5 min intervals) from the data center traffic data reported in Section 3.

We first compare to adaptive routing, which routes each TM separately so as to minimize the maximum link utilization for that TM, essentially upper-bounding the best performance that real-time adaptive traffic engineering could achieve. Second, we compare to best oblivious routing over all TMs so as to minimize the maximum link utilization. (Note that VLB is just one among many oblivious routing strategies.) For adaptive and best oblivious routing, the routings are computed using respective linear programs in CPLEX. The overall utilization for a link in all schemes is computed as the maximum utilization over all routed TMs.
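The "maximum utilization over all routed TMs" metric can be illustrated for VLB with a deliberately simplified model. This sketch is an assumption-laden toy, not the evaluation code behind Figure 11: it tracks only the uplinks from aggregation switches to intermediate switches, assumes each aggregation switch splits its outgoing traffic equally across all intermediate switches, and uses hypothetical function names. The adaptive and best-oblivious baselines require linear programs and are not reproduced here.

```python
def vlb_uplink_utilization(tm, n_int, link_capacity):
    """Uplink utilization per aggregation switch for one traffic matrix.

    tm[a][b] is the traffic from aggregation switch a to aggregation
    switch b. With equal splitting over n_int intermediate switches,
    all uplinks of switch a carry the same load.
    """
    n = len(tm)
    return [
        sum(tm[a][b] for b in range(n) if b != a) / n_int / link_capacity
        for a in range(n)
    ]


def overall_utilization(tms, n_int, link_capacity):
    """Per-switch uplink utilization, taken as the maximum over all TMs."""
    per_tm = [vlb_uplink_utilization(tm, n_int, link_capacity) for tm in tms]
    return [max(values) for values in zip(*per_tm)]


# Two traffic matrices over three aggregation switches (units: Gbps).
tm1 = [[0, 6, 0], [0, 0, 0], [0, 0, 0]]
tm2 = [[0, 0, 0], [3, 0, 3], [0, 0, 0]]

print(overall_utilization([tm1, tm2], n_int=2, link_capacity=10))
# [0.3, 0.3, 0.0]
```

Because the split is independent of the destination, the utilization of a switch's uplinks depends only on how much that switch sends in total, not on where the traffic is headed.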
In Figure 11, we plot the CDF for link utilizations for the three schemes. We normalized the link utilization numbers so that the maximum utilization on any link for



adaptive routing is 1.0. The results show that for most links, VLB performs about the same as the other two schemes. For the most heavily loaded link in each scheme, VLB's link capacity usage is at worst 20% higher than that of the other two schemes. Thus, evaluations on actual data center workloads show that the simplicity and universality of VLB costs relatively little capacity when compared to much more complex traffic engineering schemes. Moreover, adaptive routing schemes might be difficult to implement in the data center. Since even the "elephant" flows are about 100 MB (see Figure 2), lasting about 1 s on a server with a 1-Gbps NIC, any reactive traffic engineering scheme will need to run at least as frequently if it wants to react to individual flows.

Figure 11. CDF of normalized link utilizations for VLB, adaptive, and best oblivious routing schemes, showing that VLB (and best oblivious routing) come close to matching the link utilization performance of adaptive routing.
[Figure 11 plot: percentile rank (0 to 1) against normalized link utilization (0 to 1.2) for VLB, Adaptive, and Best-Oblivious.]

Cost and Scale: With the range of low-cost commodity devices currently available, the VL2 topology can scale to create networks with no oversubscription between all the servers of even the largest data centers. For example, switches with 144 ports (D = 144) are available today for $150K, enabling a network that connects 100K servers using the topology in Figure 4 and up to 200K servers using a slight variation. Using switches with D = 24 ports (which are available today for $8K each), we can connect about 3K servers. Comparing the cost of a VL2 network for 35K servers with a conventional one found in one of our data centers shows that a VL2 network with no oversubscription can be built for the same cost as the current network that has 1:240 oversubscription. Building a conventional network with no oversubscription would cost roughly 14× the cost of an equivalent VL2 network with no oversubscription.

7. RELATED WORK
Data center network designs: There is great interest in building data center networks using commodity switches and a Clos topology [2, 11, 20, 21]. The designs differ in whether they provide layer-2 semantics, their traffic engineering strategy, the maturity of their control planes, and their compatibility with existing switches. Other approaches use the servers themselves for switching data packets [1, 12, 13]. VL2 also leverages the programmability of servers; however, it uses servers only to control the way traffic is routed, as switch ASICs forward packets at less cost in power and dollars per Mbps.

Valiant load balancing: Valiant introduced VLB as a randomized scheme for communication among parallel processors interconnected in a hypercube topology [6]. Among its recent applications, VLB has been used inside the switching fabric of a packet switch [3]. VLB has also been proposed, with modifications and generalizations [18, 23], for oblivious routing of variable traffic on the Internet under the hose traffic model [8].

Scalable routing: The Locator/ID Separation Protocol [9] proposes "map-and-encap" as a key principle to achieve scalability and mobility in Internet routing. VL2's control plane takes a similar approach (i.e., demand-driven host-information resolution and caching) but adapted to the data center environment and implemented on end hosts. SEATTLE [17] proposes a distributed host-information resolution system running on switches to enhance Ethernet's scalability.

Commercial networks: Data Center Ethernet (DCE) [4] by Cisco and other switch manufacturers shares VL2's goal of increasing network capacity through multipath. These industry efforts are primarily focused on consolidation of IP and storage area network (SAN) traffic, and there are few SANs in cloud-service data centers. Due to the requirement to support loss-less traffic, their switches need much bigger buffers (tens of MBs) than commodity Ethernet switches do (tens of KBs), hence driving their cost higher.

8. SUMMARY
VL2 is a new network architecture that puts an end to the need for oversubscription in the data center network, a result that would be prohibitively expensive with the existing architecture.

VL2 benefits the cloud service programmer. Today, programmers have to be aware of network bandwidth constraints and constrain server-to-server communications accordingly. VL2 instead provides programmers the simpler abstraction that all servers assigned to them are plugged into a single layer-2 switch, with hot spot-free performance regardless of where the servers are actually connected in the topology. VL2 also benefits the data center operator, as today's bandwidth and control plane constraints fragment the server pool, leaving servers (which account for the lion's share of data center cost) underutilized even while demand elsewhere in the data center is unmet. Instead, VL2 enables agility: any service can be assigned to any server, while the network maintains uniform high bandwidth and performance isolation between services.

VL2 is a simple design that can be realized today with available networking technologies, and without changes to switch control and data plane capabilities. The key enablers are an addition to the end-system networking stack, through well-established and public APIs, and a flat addressing scheme, supported by a directory service.
VL2 is efficient. Our working prototype, built using commodity switches, approaches in practice the high level of performance that the theory predicts. Experiments with two data center services showed that churn (e.g., dynamic reprovisioning of servers, change of link capacity, and microbursts of flows) has little impact on TCP goodput. VL2's implementation of VLB splits flows evenly and VL2 achieves high TCP fairness. On all-to-all data shuffle communications, the prototype achieves an efficiency of 94% with a TCP fairness index of 0.995.

Acknowledgments
Insightful comments from David Andersen, Jon Crowcroft, and the anonymous reviewers greatly improved the final version of this paper.

References
1. Abu-Libdeh, H., Costa, P., Rowstron, A., O'Shea, G., Donnelly, A. Symbiotic routing in future data centers. In SIGCOMM (2010).
7. Dean, J., Ghemawat, S. MapReduce: simplified data processing on large clusters. In OSDI (2004).
8. Duffield, N.G., Goyal, P., Greenberg,
[...] generation data center architecture: Scalability and commoditization. In PRESTO Workshop at SIGCOMM (2008).
12. Guo, C., Wu, H., Tan, K., Shiy, L., Zhang, Y., Lu, S. DCell: a scalable and fault-tolerant network structure for data centers. In SIGCOMM (2008).
13. Guo, C., Wu, H., Tan, K., Shiy, L., Zhang, Y., Lu, S. BCube: a high performance, server-centric network architecture for modular data centers. In SIGCOMM (2009).
14. Hamilton, J. Cems: Low-cost, low-power servers for internet-scale services. In Conference on Innovative Data Systems Research (Jan 2009).
15. Jain, R. The Art of Computer Systems Performance Analysis. John Wiley and Sons, Inc., 1991.
16. Kandula, S., Sengupta, S., Greenberg, A., Patel, P., Chaiken, R. The nature of datacenter traffic: Measurements and analysis. In IMC (2009).
17. Kim, C., Caesar, M., Rexford, J. Floodless in SEATTLE: a scalable ethernet architecture for large enterprises. In SIGCOMM (2008).
18. Kodialam, M., Lakshman, T.V., Sengupta, S. Efficient and robust routing of highly variable traffic. In HotNets (2004).
19. Lamport, L. The part-time parliament. ACM Trans. Comput. Syst. 16 (1998), 133–169.
20. Mudigonda, J., Yalagandula, P., Al-Fares, M., Mogul, J.C. Spain: Cots data-center ethernet for multipathing over arbitrary topologies. In NSDI (2010).
21. Touch, J., Perlman, R. Transparent interconnection of lots of links (TRILL): Problem and applicability statement. IETF RFC 5556 (2009).
22. Yu, Y., Isard, M., Fetterly, D., Budiu, M., Erlingsson, U., Gunda, P.K., Currey, J. DryadLINQ: a system for general-purpose distributed data-parallel computing using a high-level language. In OSDI (2008).
23. Zhang-Shen, R., McKeown, N. Designing a Predictable Internet Backbone Network. In HotNets (2004).
2. Al-Fares, M., Loukissas, A., Vahdat, A. A.G., Mishra, P.P., Ramakrishnan,
A scalable, commodity data center K.K., van der Merwe, J.E. A flexible
Albert Greenberg, Navendu Jain, James R. Hamilton,
network architecture. In SIGCOMM model for resource management in
Srikanth Kandula, Changhoon Kim, Amazon Web Services
(2008). virtual private network. In SIGCOMM
Parantap Lahiri, David A. Maltz,
3. Chang, C., Lee, D., Jou, Y. Load (1999).
Parveen Patel, Sudipta Sengupta,
balanced Birkhoff-von Neumann 9. Farinacci, D., Fuller, V., Oran, D., Meyer,
Microsoft Research
switches, part I: one-stage buffering. D., Brimm, S. Locator/ID Separation
IEEE HPSR (2001). Protocol (LISP). Internet-draft, Dec.
4. Cisco. Data center Ethernet. http:// 2008.
www.cisco.com/go/dce. 10. Greenberg, A., Jain, N., Kandula, S.,
5. Cisco: Data center: Load balancing Kim, C., Lahiri, P., Maltz, D., Patel,
data center services, 2004. P., Sengupta, S. Vl2: A scalable and
6. Dally, W.J., Towles, B. Principles flexible data center network. In
and Practices of Interconnection SIGCOMM (2009).
Networks. Morgan Kaufmann 11. Greenberg, A., Lahiri, P., Maltz, D.A.,
Publishers, 2004. Patel, P., Sengupta, S. Towards a next © 2011 ACM 0001-0782/11/0300 $10.00
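The fairness figure quoted in the VL2 conclusion can be made concrete with a small sketch. It assumes the "TCP fairness index" is Jain's fairness index, which this section does not spell out; the function name and the sample throughput values are hypothetical and are not measurements from the paper.

```python
from typing import Sequence


def jain_fairness_index(throughputs: Sequence[float]) -> float:
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when every flow gets the same throughput and 1/n when a
    single flow gets everything. Assumption: this is the index meant by
    "TCP fairness index" in the text above.
    """
    n = len(throughputs)
    sum_sq = sum(x * x for x in throughputs)
    if n == 0 or sum_sq == 0:
        raise ValueError("need at least one flow with nonzero throughput")
    total = sum(throughputs)
    return (total * total) / (n * sum_sq)


if __name__ == "__main__":
    # Hypothetical per-flow goodput values, for illustration only.
    flows = [9.4, 9.6, 9.5, 9.3, 9.7]
    print(jain_fairness_index(flows))
```

Under this definition, a value close to 1 means the flows received nearly equal shares of bandwidth.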

You’ve come a long way.


Share what you’ve learned.

ACM has partnered with MentorNet, the award-winning nonprofit e-mentoring network in engineering,
science and mathematics. MentorNet’s award-winning One-on-One Mentoring Programs pair ACM
student members with mentors from industry, government, higher education, and other sectors.
• Communicate by email about career goals, course work, and many other topics.
• Spend just 20 minutes a week - and make a huge difference in a student’s life.
• Take part in a lively online community of professionals and students all over the world.

Make a difference to a student in your field.


Sign up today at: www.mentornet.net
Find out more at: www.acm.org/mentornet
MentorNet’s sponsors include 3M Foundation, ACM, Alcoa Foundation, Agilent Technologies, Amylin Pharmaceuticals, Bechtel Group Foundation, Cisco
Systems, Hewlett-Packard Company, IBM Corporation, Intel Foundation, Lockheed Martin Space Systems, National Science Foundation, Naval Research
Laboratory, NVIDIA, Sandia National Laboratories, Schlumberger, S.D. Bechtel, Jr. Foundation, Texas Instruments, and The Henry Luce Foundation.



careers

Barry University
Assistant Professor of Computer Science

The Department of Mathematics and Computer Science invites applications for a continuing contract faculty position in Computer Science with the rank of Assistant Professor, starting in Fall 2011.
Strong teaching skills, a commitment to research, and service to the Department and the University are expected. The Department has undergraduate majors in Mathematical Sciences, Computer Science and Computer Information Science. It also has a pre-Engineering program. The Department offers courses for the majors and service courses for all other Schools.

Qualifications:
The search is open to all areas of Computer Science, with a particular emphasis on candidates with research interests in software engineering, computer security or mobile computing.
A Ph.D. in Computer Science or closely related field is required.
Review of applications will start 02/14/2011 and will continue until the position is filled.
Interested and qualified candidates should send the following:
Complete curriculum vitae
Transcripts and Three letters of reference to:

CS Faculty Search Committee,
Dept. of MATH and CS, Barry University,
11300 NE 2nd Ave.,
Miami Shores, FL 33161.
Contact us by Fax (305)899-3610
By email to: mathcs@mail.barry.edu.

Barry University is a Catholic institution grounded in the liberal arts tradition and is committed to an inclusive community, social justice, and collaborative service.
Barry University is an Equal Employment Opportunity Employer.
Barry University does not discriminate applicants or employees for terms of employment on the basis of race, color, sex, religion, national origin, disability, veteran status, political affiliation or any other terms prohibited under the county ordinance, state or federal law.

NEC Laboratories America, Inc.
Research Staff Member - Machine Learning & Computer Vision

NEC Laboratories America, Inc. is a vibrant industrial research center, conducting research in support of NEC’s U.S. and global businesses. Our research program covers many areas, reflecting the breadth of NEC business, and maintains a balanced mix of fundamental and applied research.
The Media Analytics Department in Cupertino, CA, is seeking an outstanding and enthusiastic researcher, with background in machine learning and computer vision, to work on developing visual recognition technologies for novel mobile applications, web services, and HCI solutions. We expect the candidates to be strong in conducting cutting edge research, and also passionate about turning research into high impact products and services. We encourage researchers to establish leadership in the research community, and maintain active research collaborations with top universities in the US.

Required Skills or Experience:
• PhD in Computer Science (or equivalent)
• Strong publication in top machine learning or computer vision conferences & journals
• Solid knowledge in math, optimization, and statistical inference
• Hands-on experiences in implementing large-scale learning algorithms and systems
• Great problem solving skills, with a strong desire for quality and engineering excellence
• Expert knowledge developing and debugging in C/C++

Desired Skills a Plus:
• Good knowledge developing and debugging on Linux
• Good knowledge developing in Java
• Experience with scripting languages such as Python, PHP, Perl, and shell scripts
• Experience with parallel/distributed computing
• Experience with algorithm implementation on GPU
• Experience with mobile or embedded systems
• Experience with image classification, object recognition, and visual scene parsing
• Ability to work on other media data, like textual and audio data

For consideration please submit your resume and a one-page research statement at http://www.nec-labs.com/careers/index.php.

EOE/AA/MFDV

NEC Laboratories America, Inc.
Research Staff Member - Energy Management

NEC Laboratories America, Inc.’s research program covers many areas, reflecting the breadth of NEC business, and maintains a balanced mix of fundamental and applied research. We focus on topics with strong innovations in the U.S. and place emphasis on developing deep competence in selective areas that are important to NEC business and which are ripe for technical breakthrough.
The Energy Management Department in Cupertino, CA, is seeking an outstanding and enthusiastic researcher with background in energy systems modeling and optimization to work on design of energy micro-grids. Candidates are expected to be strong in conducting cutting edge research, and also passionate about leading research threads and turning research into high impact products and services. We encourage researchers to establish leadership in the research community, and maintain active research collaborations with top universities in the US.

Required Skills or Experience:
• PhD in ME/EE(Power Systems)/CS/OR (or equivalent)
• Solid knowledge in math, optimization, and statistical analysis
• Hands-on experiences in implementing energy system models
• Great problem solving skills, with a strong desire for quality and engineering excellence
• Expert knowledge of optimization theory and tools
• Working knowledge of power and energy systems

Desired Skills:
• Knowledge of thermodynamic principles
• Knowledge of GAMS or equivalent optimization tools
• Experience in power sector

For consideration, please visit our career center at http://www.nec-labs.com/careers/index.php to submit your resume and a research statement.

EOE/AA/MFDV

Reykjavik University
School of Computer Science
Faculty position in computer systems

The School of Computer Science at Reykjavik University seeks to hire a faculty member in the field of computer systems. We are looking for a highly-qualified academic who, apart from developing her/his research programme, is interested in working with existing faculty, and in bridging between research, in one or more of the research areas within the School, in particular artificial intelligence, software engineering and theoretical computer science.
The level of the position can range from assistant professor to full professor, depending on the qualifications of the applicant. For information on the position, how to apply and the School of Computer Science at Reykjavik University, see http://www.ru.is/faculty/luca/compsysjob.html


University of Detroit Mercy
Computer Science/Software Engineering
Tenure Track Faculty

The University of Detroit Mercy is currently seeking qualified candidates for the position of Assistant Professor beginning in August 2011. Please visit http://engsci.udmercy.edu/opportunities/index.htm for details and qualifications.

University of Wisconsin-Superior
Assistant Professor

Faculty Position, Computer Science, UW-Superior. Requires Doctorate in Computer Science, Electrical Engineering, Discrete Mathematics, or closely related field. See complete ad at http://www.uwsuper.edu/hr/employment/index.cfm
CBC required. AA/EOE

Utah State University
Assistant Professor

Applications are invited for a faculty position at the Assistant Professor level, for employment beginning Fall 2011. Applicants must have completed a PhD in computer science by the time of appointment. The position requires demonstrated research success, a significant potential for attracting external research funding, excellence in teaching both undergraduate and graduate courses, the ability to supervise student research, and excellent communication skills.
USU offers competitive salaries and outstanding medical, retirement, and professional benefits (see http://www.usu.edu/hr/ for details). The department currently has approximately 280 undergraduate majors, 80 MS students and 27 PhD students. There are 17 full time faculty. The BS degree is ABET accredited.
Utah State University is a Carnegie Research Doctoral extensive University of over 23,000 students, nestled in a mountain valley 80 miles north of Salt Lake City, Utah. Opportunities for a wide range of outdoor activities are plentiful. Housing costs are at or below national averages, and the area provides a supportive environment for families and a balanced personal and professional life. Women, minority, veteran and candidates with disabilities are encouraged to apply. USU is sensitive to the needs of dual-career couples. Utah State University is an affirmative action/equal opportunity employer, with a National Science Foundation ADVANCE Gender Equity program, committed to increasing diversity among students, faculty, and all participants in university life.
Applications must be submitted using USU’s online job-opportunity system. To access this job opportunity directly and begin the application process, visit https://jobs.usu.edu/applicants/Central?quickFind=55484.
The review of the applications will begin on January 15, 2011 and continue until the position is filled. The salary will be competitive and depend on qualifications.

APPLIED AND COMPUTATIONAL ANALYSIS PROGRAM OFFICER
(Mathematician, Statistician, Computer Scientist, Electrical Engineer, Or Physicist)

The Office of Naval Research is seeking a qualified individual to plan, initiate, manage and coordinate sponsored basic and applied research programs in the broad area of information science with potential applications in C4ISR (Command, Control, Communications, Computers, Intelligence, Surveillance and Reconnaissance). This is a Civil Service position at the NP-IV level ($105,211 - $155,500) depending on individual qualifications.

The intended focus concerns interdisciplinary research at the interface of mathematics and computer science to tackle large and disparate data representation, analysis, and integration problems with potential Navy and DoD applications in image and signal understanding and computational decision making. Research experiences in functional analysis; probability theory and stochastic analysis; statistical theory and computation; information theory; machine learning; and information processing, analysis, and integration are desirable.

This is a future vacancy to be announced. Interested parties should send resumes to bernadette.sterling.ctr@navy.mil. When the formal announcement is posted interested parties will be notified and advised how to apply.

U.S. CITIZENSHIP REQUIRED
AN EQUAL OPPORTUNITY EMPLOYER



DEAN OF THE SCHOOL OF COMPUTER SCIENCE
FUDAN UNIVERSITY, SHANGHAI, CHINA

Founded 1905, Fudan University is one of the top five universities in China with over 26,000 undergraduate and graduate students, and over 3,600 international students earning more than 300 degrees through 28 schools and departments. In its quest to become recognized as a world class research university, Fudan has invested heavily on modern educational and research environments.

The newly established School of Computer Science is continuing a long tradition of computer science research at Fudan University which began in 1956 with the construction of China’s first computer. The school is well positioned to remain at the forefront of international computing research and to continue to provide world-class education to its students. As Shanghai is rapidly growing into a world financial and business center, the School of Computer Science is strategically positioned to develop into a world-class center of excellence in computer science.

Learn more about Fudan University by visiting http://www.fudan.edu.cn.

Fudan University is seeking an innovative leader for the position of Dean of the School of Computer Science with academic and administrative leadership, and international vision.

Duties and Responsibilities
The Dean will be charged with the responsibilities of continuing to build the School towards achieving international stature. The successful candidate will do this as follows:
• Retain and recruit high caliber staff
• Continue to improve academic and research facilities
• Continue to improve curriculum
• Seek to strengthen strategic and cooperative relationships with internationally recognized universities

Qualification requirements
The successful candidate will demonstrate competency with a variety of experience:
• Leadership experience in a university department or research facility
• Carried out pioneered successful national and/or international research projects in computer science
• Published papers in leading international academic journals or international conference in recent six years
• No nationality restriction, Chinese as working language, fluent in English

Term and Salary
• 5 years each term
• Full-time assignment
• Compensation package negotiated during interview

Rights during the term
• The Dean will lead the school steering committee to operate and manage the financial and human resources based upon the guidelines and policies delegated by the university

To Apply:
Application timeline: From March 1 to March 31, 2011
Submit resume, degree certificates, certification of current employment, list of published papers during the last six years, 3-5 references and strategic vision and goal for the position.

Contact Information:
Fudan University, Office of Personnel
GE Qinghua, FENG Tao
Tel: 86-21-65642953 65654795 55664593
Email: yj@fudan.ac.cn

Fudan University School of Computer Science
YAO Xiaozhi
Tel: 86-21-51355555 ext. 28, Fax: 86-21-51355558
Email: xzyao@fudan.edu.cn
Website: http://www.cs.fudan.edu.cn

Take Advantage of
ACM’s Lifetime Membership Plan!
• ACM Professional Members can enjoy the convenience of making a single payment for their entire tenure as an ACM Member, and also be protected from future price increases by taking advantage of ACM's Lifetime Membership option.
• ACM Lifetime Membership dues may be tax deductible under certain circumstances, so becoming a Lifetime Member can have additional advantages if you act before the end of 2011. (Please consult with your tax advisor.)
• Lifetime Members receive a certificate of recognition suitable for framing, and enjoy all of the benefits of ACM Professional Membership.

Learn more and apply at:


http://www.acm.org/life



ACM TechNews Goes Mobile
iPhone & iPad Apps Now Available in the iTunes Store
ACM TechNews—ACM’s popular thrice-weekly news briefing service—is now
available as easy-to-use mobile apps downloadable from the Apple iTunes Store.
These new apps allow nearly 100,000 ACM members to keep current with
news, trends, and timely information impacting the global IT and Computing
communities each day.

TechNews mobile app users will enjoy:


• Latest News: Concise summaries of the most
relevant news impacting the computing world
• Original Sources: Links to the full-length
articles published in over 3,000 news sources
• Archive access: Access to the complete
archive of TechNews issues dating back to
the first issue published in December 1999
• Article Sharing: The ability to share news
with friends and colleagues via email, text
messaging, and popular social networking sites
• Touch Screen Navigation: Find news
articles quickly and easily with a
streamlined, fingertip scroll bar
• Search: Simply search the entire TechNews
archive by keyword, author, or title
• Save: One-click saving of latest news or archived
summaries in a personal binder for easy access
• Automatic Updates: By entering and saving
your ACM Web Account login information,
the apps will automatically update with
the latest issues of TechNews published
every Monday, Wednesday, and Friday

The Apps are freely available to download from the Apple iTunes Store, but users must be registered
individual members of ACM with valid Web Accounts to receive regularly updated content.
http://www.apple.com/iphone/apps-for-iphone/ http://www.apple.com/ipad/apps-for-ipad/

ACM TechNews
last byte

DOI:10.1145/1897852.1897878 Peter Winkler

Puzzled
Solutions and Sources
Last month (February 2011, p. 112) we posted a trio of brainteasers, including
one as yet unsolved, concerning partitions of Ms. Feldman’s fifth-grade class.
Here, we offer solutions to at least two of them. How did you do?

1. Monday, Tuesday.
Solution. Recall that on Monday, Ms. Feldman partitioned her class into k subsets and on Tuesday repartitioned the same students into k+1 subsets. We were asked to show that at least two students were in smaller subsets on Tuesday than they were on Monday.
It turns out that a nice way to see this is to consider how much work each student contributed to his or her assigned project. Assume that all projects (both days) were equally demanding, with each requiring a total of one unit of effort. Assume, too, that the work was divided perfectly equitably, so a student in a subset of size m contributed 1/m units of effort. The total effort contributed on Monday was, of course, k and on Tuesday k+1, so some students must have contributed more of their effort on Tuesday than on Monday. But no individual student could have made up the full unit difference; the difference between 1/m and 1/n is less than 1 for any positive integers m and n. At least two students thus put in more effort on Tuesday and were therefore in smaller groups the second day.
This surprisingly tricky puzzle was brought to my attention by Ori Gurel-Gurevich of the University of British Columbia; it had appeared in the 1990 Australian Mathematics Olympiad.

2. Unfriendly Partitions.
Solution. A partition of the kind Ms. Feldman wants, into two subsets, such that no student has more than half his/her friends in that student’s own group, is called an “unfriendly partition.” To see that an unfriendly partition exists, consider, for any partition, the number of broken friendships; that is, the number of pairs of friendly students who have been separated. Now choose a partition that maximizes this number; it must be unfriendly. Why? Because if student X has more friends in his/her subset than in the other subset, moving X from one to the other subset would yield a partition with more broken friendships.

3. Countably Infinite Graphs.
Unsolved. On Friday, when the class suddenly has a countably infinite number of students, Ms. Feldman can no longer apply these arguments to show that an unfriendly partition exists. The difficulty is that now there may be no partition with the maximum number of broken friendships. Moreover, even if there is a partition that breaks infinitely many friendships, the argument fails because switching student X as in Solution 2 doesn’t give a contradiction.
Amazingly, no substitute argument has been found, nor has anyone come up with an example where no unfriendly partition exists. For (much) more information, see the marvelous article by Saharon Shelah and the late Eric Milner, “Graphs with No Unfriendly Partitions,” in A Tribute to Paul Erdős, A. Baker, B. Bollobás, and A. Hajnal, Eds., Cambridge University Press, 1990, 373–384. Shelah and Milner constructed an “uncountable” graph with no unfriendly partitions; they also showed that every graph has an unfriendly partition divided into three subsets. The case of countably infinite graphs, when seeking to divide an unfriendly partition into two subsets, remains tantalizingly open, until, perhaps, you solve it.
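The switching argument in Solution 2 also suggests a procedure for a finite class, sketched here under stated assumptions. The function names and the sample friendship graph are hypothetical, friendships are assumed to be symmetric, and the sketch stops at a partition where no single move helps, which is all the argument needs, rather than searching for the true maximum.

```python
from typing import Dict, Set


def broken_friendships(friends: Dict[str, Set[str]], side: Dict[str, int]) -> int:
    """Count pairs of friends who have been placed in different subsets."""
    # Each separated pair is seen once from each endpoint, hence the // 2.
    return sum(1 for s, fs in friends.items() for f in fs if side[s] != side[f]) // 2


def unfriendly_partition(friends: Dict[str, Set[str]]) -> Dict[str, int]:
    """Two-subset partition in which nobody has more than half their friends
    in their own subset. Assumes a finite, symmetric friendship relation."""
    side = {s: 0 for s in friends}  # start with everyone in subset 0
    moved = True
    while moved:
        moved = False
        for s, fs in friends.items():
            same = sum(1 for f in fs if side[f] == side[s])
            # Student s has more friends at home than across: switch s.
            # Each switch strictly increases the broken-friendship count,
            # which is bounded by the number of friendships, so this stops.
            if same > len(fs) - same:
                side[s] = 1 - side[s]
                moved = True
    return side


if __name__ == "__main__":
    # Hypothetical class of four students.
    friends = {
        "ann": {"bob", "cat", "dan"},
        "bob": {"ann", "cat"},
        "cat": {"ann", "bob", "dan"},
        "dan": {"ann", "cat"},
    }
    side = unfriendly_partition(friends)
    print(side, broken_friendships(friends, side))
```

The loop depends on the broken-friendship count being a bounded integer that rises with every switch, which is the step that fails for the infinite class in Puzzle 3.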

Peter Winkler (puzzled@cacm.acm.org) is Professor of Mathematics and of Computer Science and Albert Bradley
Third Century Professor in the Sciences at Dartmouth College, Hanover, NH.

All readers are encouraged to submit prospective puzzles for future columns to puzzled@cacm.acm.org.



[continued from p. 112] replicators (“rabbits” clone until they fill all memory), worms (traveling through networked computer systems, laying eggs), and plenty more.
Viruses were not a legacy I sought. Inevitably, someone would invent them; the idea requires only a simple biological analogy. But once it would escape into the general culture, there would be no way back, and I didn’t want to make my professional life around it, lucrative as it might be. The manufacturers of spray-paint cans likely feel the same way…
Consider that our cities will get smart and be able to track us with cameras on the street and with microwaves that read the chips in our phones, computers, even embedded beneath our skin. The first commercial use will likely be to feed advertising to us, as in the 2002 Steven Spielberg film Minority Report. We’ll inevitably live in an arms race against intrusive eyes, much as we guard against computer viruses today.
Stuxnet, the virus known to have invaded Iran’s nuclear facilities, is apparently the first malicious code deliberately designed to disrupt targeted industrial processes, mutating on a schedule to avoid erasure, interrogating the computers it invades, and sending data back to its inventors. Stuxnet is able to reprogram Siemens-manufactured programmable logic controllers and hide the changes it introduces into them. Commands in Stuxnet code increase the frequency of rotors in centrifuges at Iran’s Natanz uranium-enrichment plant so they fly apart. Yet much of Stuxnet’s code is unremarkable, standard stuff, lacking advanced cloaking techniques.
Still, it’s a wholly new thing: a smart virus with a grudge, evolving, self-aware, self-educating, craftily fulfilling its mission. Expect more to come. Countries hostile to the U.S. could likewise launch malware attacks against U.S. facilities, using Stuxnet-like code to attack the national power grid or other critical infrastructure.
Though seldom discussed, U.S. policy has traditionally been to lead in technology while selling last-generation tech to others. Thus we are able to defeat our own prior inventions, along with sometimes deliberately installed defects we might exploit later.
Stuxnet looks like a kluge with inventive parts. It does not hide its payload well or cover its tracks. It will not take much effort to greatly improve such methods (with, say, virtual machine-based obfuscation and novel techniques for anti-debugging), whatever the target. Once major players use them in nation-state rivalries, they will surely leak into commerce, where the stakes are immense for all of us. If Stux-type, untraceable malware becomes a weapon of commerce, our increasingly global commercial competitiveness will take on a nastier edge.
Meanwhile, if living in space becomes routine, the related systems will demand levels of maintenance and control seldom required on Earth. Consider that the International Space Station spends most of its crew time just keeping the place running, and potentially can be corrupted with malware. So can many systems to come, as our environment becomes smarter and interacts with us invisibly, around the clock. Increasing interconnection of all systems will make smart sabotage a compelling temptation. So will malware that elicits data from our lives or corrupts systems we already have, in hopes we’ll be compelled to replace them.
Now think beyond these early stages. What secondary effects could emerge? Seeds of mistrust and suspicion travel far. But that’s the world we’ll live in, with fresh problems we’ll be able to attack but only if we’ve thought them through first.

Gregory Benford (gbenford@uci.edu) is a professor of physics at the University of California, Irvine, and a novelist, including of Timescape, winner of the 1980 Nebula and British Science Fiction Awards.

© 2011 ACM 0001-0782/11/0300 $10.00

ACM Transactions on Accessible Computing
This quarterly journal publishes refereed articles addressing issues of computing as it impacts the lives of people with disabilities. The journal will be of particular interest to SIGACCESS members and delegates to its affiliated conference (i.e., ASSETS), as well as other international accessibility conferences.
www.acm.org/taccess
www.acm.org/subscribe



Future Tense, one of the revolving features on this page, presents stories and
essays from the intersection of computational science and technological speculation,
their boundaries limited only by our ability to imagine what will and could be.

DOI:10.1145/1897852.1897879 Gregory Benford

Future Tense
Catch Me If You Can
Or how to lose a billion in your spare time…

I envisioned and wrote the first com-


puter virus in 1969 but failed to see
that viruses would become wide-
spread. Technologies don’t always
evolve as we’d like. I learned this then
but failed to catch the train I knew,
even then, would soon leave the sta-
tion. Further, I failed to see the levels
of mistrust that would derive from
malware generally. I also did not antic-
ipate that seeds of mistrust could be
blown by the gales of national rivalry
through an Internet that would some-
day infiltrate every aspect of our lives.
At the Lawrence Radiation Labo-
ratory I used the Advanced Research
Projects Administration’s network, or
ARPANet, to send brief messages to colleagues in other labs running over the big, central computers we worshipped then. However, ARPANet email had a potentially pernicious problem: “bad code” that could arise when researchers sent something new (maybe accidentally), possibly sending yet other things awry.

One day I thought maybe I could add such code intentionally, making a program that would copy itself deliberately. The biological analogy was obvious; evolution would favor it, especially if designed to use clever methods to hide itself and tap other programs’ energy (computing time) to further its own genetic ends.

So I wrote some simple code and sent it along in my next ARPANet transmission. Just a few lines in Fortran told the computer to attach them to programs being transmitted to a particular terminal. Soon it popped up in other programs and began propagating. By the next day it was in a lot of otherwise unrelated code, so I wrote a memo, emphasizing to the mavens of the Main Computer that what I had done could likewise be done with considerably more malevolent intent. Moreover, viruses could move.

I avoided “credit” for the idea for a long time but gradually realized the virus-infection metaphor was inevitable, fairly obvious in fact. In the early 1970s it surfaced again at Livermore when a self-replicating program called Creeper infected ARPANet, just printing on a user’s video screen “I’m the creeper, catch me if you can!” In response, users quickly wrote the first antivirus program, called Reaper, to erase Creeper. Various people reinvented the idea into the 1980s, when a virus called Elk Cloner infected early Apple computers. It was fixed quickly, but Microsoft software proved more vulnerable, and in 1986 a virus called Brain started booting up with Microsoft’s disk operating system and spread through floppy disks, stimulating creation of the antivirus industry I had anticipated in 1970.

It is some solace, I suppose, that the 2010 second-best-selling virus-protection software was a neat little package called Vaccine. The same basic idea was adapted into a different kind of currency in the hands of renowned British biologist Richard Dawkins, coining the term “memes” to describe cultural notions that catch on and propagate through human cultural mechanisms. Ranging from pop songs we can’t get out of our heads all the way up to the Catholic Church, memes express how cultural evolution occurs so quickly, as old memes give way to voracious new ones.

Nowadays there are nasty scrub-everything viruses of robust ability and myriad variations: Trojan horses, chameleons (acts friendly, turns nasty), software bombs (self-detonating agents, destroying without cloning themselves), logic bombs (go off given specific cues), time bombs (keyed by clock time), [continued on p. 111]

Illustration by John David Bigl III



The 2012 ACM Conference on Computer Supported Cooperative Work
February 11-15, 2012 | Seattle, Washington

Call for Submissions

CSCW is an international and interdisciplinary conference on technical and social aspects of communication, collaboration, and coordination. The conference addresses the design and use of technologies that affect groups, organizations, and communities. CSCW topics continue to expand as we increasingly use technologies to live, work and play with others.

This year we have adopted a new review process for papers and notes intended to increase their diversity and quality. The submission deadline is early to avoid conflicting with the CHI 2012 deadline. This enables us to include a revision cycle: Qualifying authors can revise their submissions after the initial reviews.

For complete details about CSCW’s venues, please review the Call for Participation on our website www.cscw.2012.org. And please follow our progress at www.twitter.com/cscw2012 or by joining the CSCW 2012 Facebook group.

Submission Deadlines

Papers & Notes: 3 June 2011
Workshops: 28 July 2011
Panels, Interactive Posters, Demonstrations, CSCW Horizon, Videos, Student Volunteers: 9 September 2011
Doctoral Colloquium: 16 October 2011

Conference Chairs: Steve Poltrock, Carla Simone
Papers and Notes Chairs: Gloria Mark, John Riedl, Jonathan Grudin

Also consider attending WSDM (wsdm2012.org) immediately before CSCW 2012.

http://www.cscw2012.org

Think Parallel.....
It’s not just what we make.
It’s what we make possible.
Advancing Technology Curriculum
Driving Software Evolution
Fostering Tomorrow’s Innovators

Learn more at: www.intel.com/thinkparallel


Copyright © 2009 Intel Corporation. All rights reserved. Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries.
*Other names and brands may be claimed as the property of others.
