
Army Research Laboratory

Network Science
Collaborative Technology Alliance

Initial Program Plan


March 17, 2010
version 1.4

Applicable Period: September 28 2009 – September 27 2010


Table of Contents
Cover Sheet
Table of Contents
1 Introduction
2 Alliance Overview
3 Research Overview
4 CCRI: Trust in Distributed Decision Making
5 CCRI EDIN: Evolving Dynamic Integrated (Composite) Networks
6 Non-CCRI Research: Interdisciplinary Research Center (IRC)
7 Non-CCRI Research: Information Networks Academic Research Center (INARC)
8 Non-CCRI Research: Social/Cognitive Academic Research Center (SCNARC)
9 Non-CCRI Research: Communication Networks Academic Research Center (CNARC)
10 NS CTA Meetings Schedule (First 15 Months)
11 First Year Budget by Project and Organization
12 Five Year Roadmap

A detailed Table of Contents is provided at the start of each of Sections 2 through 9 and 11.



1. Introduction
The Network Science Collaborative Technology Alliance (NS CTA) is a
collaborative research alliance between the US Army Research
Laboratory (ARL), other government researchers, and a Consortium of
four research centers: an Academic Research Center (ARC) focused on
social/cognitive networks (the SCNARC), an ARC focused on
information networks (the INARC), an ARC focused on communications
networks (the CNARC), and an Interdisciplinary Research Center (the
IRC) focused on interdisciplinary research and technology transition. The
Alliance unites research across organizations, technical disciplines, and
research areas to address the critical technical challenges of the Army and Network-Centric
Warfare (NCW). Its purpose is to perform foundational cross-cutting research on network
science, resulting in greatly enhanced human performance for network-enabled warfare and in
greatly enhanced speed and precision for complex military operations.

The Alliance will conduct interdisciplinary research in network science and transition the results
of this research to benefit network-centric military operations. Network science is the study of
the properties, models, and theories that apply to all varieties of networks, and the use of this
understanding in the analysis, prediction, design, and control of all varieties of networks. The NS
CTA research program exploits intellectual synergies across network science by uniting parallel
fundamental (6.1) and applied (6.2) research across the disciplines of social/cognitive,
information, and communications network research. It will drive the synergistic combination of
these technical areas for network-centric warfare and network-enabling capabilities in support of
all missions required of today's military forces, including humanitarian support, peacekeeping,
and full combat operations in any kind of terrain, but especially in complex and urban terrain. It
will also support and stimulate dual-use applications of this research and technology to benefit
commercial use. As a critical element of this program, the Alliance is creating a sustainable
world-class network science research facility, with a critical mass of researchers at the NS CTA Facility in
Cambridge, MA, as well as shared distributed experimental resources throughout the Alliance.
The NS CTA also serves the Army’s NCW needs through an Education Component, which acts
to increase the pool of network science expertise in the Army and the nation, while bringing
greater awareness of Army needs into the academic and industrial research community.

This document presents the Initial Program Plan (IPP) for the Alliance, which describes the
projects and technical activities to be undertaken for the first year of its existence, Sep 28, 2009
through Sep 27, 2010. The current document incorporates all changes for the first modification
of the IPP (mod-1). The research is structured into six areas: two Cross-Cutting Research Issues
(CCRIs) that entail close intellectual integration across all four Centers, and the non-CCRI
research conducted at each Center. An essential aspect of all research (both CCRI and non-
CCRI) conducted within and between the Centers is that it is addressing critical NCW technical
challenges in the context of composite networks of networks: not simply multi-technology
environments, but environments where all genres of network (social/cognitive, information, and
communications) inherently interact. The two CCRIs are “Trust in Distributed Decision Making”
and “Evolving Dynamic Integrated (Composite) Networks” (EDIN).



This document is structured as follows. Section 2 gives an overview of the Alliance and the
manner in which it will conduct its research. Section 3 provides an overview of the research to be
conducted in each of six technical areas in the first year. This overview is followed by Sections 4
through 9, which present the detailed research plans for the two CCRIs and for the non-CCRI
research of each of the four Centers. Section 10 provides the schedule of meetings that will be
conducted in the first year. Section 11 provides an organizational breakdown of the first year
budget for the Consortium. Finally, Section 12 provides a 5-year roadmap for the research
program.



2. Alliance Overview

Table of Contents
2. Alliance Overview ................................................................................................................ 2-1
2.1 The nature of the Alliance............................................................................................... 2-1
2.2 Oversight of the NS CTA Program ................................................................................. 2-2
2.3 Interdisciplinary Research Center (IRC) ........................................................................ 2-3
2.3.1 Principal and General Member institutions ............................................ 2-3
2.3.2 Responsibilities ........................................................................................................ 2-3
2.3.3 Key personnel and their roles .................................................................................. 2-4
2.4 Social/Cognitive Networks Academic Research Center (SCNARC) ............................. 2-5
2.4.1 Principal and General Member institutions ............................................................. 2-5
2.4.2 Research focus ......................................................................................................... 2-5
2.4.3 Key personnel and their roles .................................................................................. 2-5
2.5 Information Networks Academic Research Center (INARC) ........................................ 2-6
2.5.1 Principal and General Member institutions ............................................................. 2-6
2.5.2 Research focus ......................................................................................................... 2-6
2.5.3 Key personnel and their roles .................................................................................. 2-6
2.6 Communications Networks Academic Research Center (CNARC) ............................... 2-7
2.6.1 Principal and General Member institutions ............................................................. 2-7
2.6.2 Research focus ......................................................................................................... 2-7
2.6.3 Key personnel and their roles .................................................................................. 2-7
2.7 Alliance summary ........................................................................................................... 2-8

2.1 The nature of the Alliance

As described in the Introduction (Section 1, above), the Network Science Collaborative
Technology Alliance (NS CTA or the Alliance) is a close collaboration between ARL, other
Government and non-Government researchers, and a Consortium of four research centers: the
Social/Cognitive Networks Academic Research Center (SCNARC), the Information Networks
ARC (INARC), the Communications Networks ARC (CNARC), and the Interdisciplinary
Research Center (IRC). Each Center is led by a Principal Member institution and is supported by
General Member institutions and subawardees.

The essence of the Alliance is threefold: close intellectual and programmatic collaboration,
interdisciplinary network science research, and parallel mutually-reinforcing basic (6.1) and
applied (6.2) research activities. The driving vision of this unified network science approach is
to capture the fundamental underlying commonalities across social/cognitive, information, and
communications networks, and to exploit this understanding in support of network-centric
operations. The result of our research is both to advance understanding and control of composite
systems embracing all these intensely interacting elements, and also to cross-fertilize insights,
methods, and technologies so that our command of each kind of network is accelerated. We see a
world, a decade from now, where the rapidly maturing field of network science allows us to
predict and control the behaviors of composite interdisciplinary network systems so complex that
they are opaque to the best science today.

In support of this ambitious program of fundamental research, the Consortium has an NS CTA
Facility in Cambridge, MA, with a significant stable core of advanced research staff from across
the Alliance, working together with rotational and shorter-term researchers. The Facility
provides the Alliance with a central point for close interaction, as well as support for distributed
multi-user experimentation and distributed interactions across the Alliance.

A key aspect of the program is its Education Component: the Centers, working with ARL and other
government researchers, are explicitly charged with expanding the scope and relevance of the
entire network science community. The Education Component will enhance the access of Army
researchers to the latest academic and industrial research, expand the pool of network science
experts by promoting network science educational programs, and enhance the awareness of
Army technical challenges throughout the network science community.

In addition to the Cooperative Agreement shared by ARL and all the Principal and General
Members, the IRC also holds a separate contract vehicle with ARL for the IRC’s Technology
Transition Component, which will support transition into specific DoD and commercial
programs. The IRC is charged with aggressively promoting transition of research products and
technologies from 6.1 and 6.2 research into direct benefit to Army programs and further
government and commercial use. This emphasis on practical impact, in turn, feeds back new
challenges into the guidance of existing and future Alliance research.

2.2 Oversight of the NS CTA Program

The NS CTA is guided throughout its research and transition activities by a highly collaborative
structure for oversight; key elements include:
Collaborative Alliance Manager (CAM). Overall technical management and fiscal
responsibility for the NS CTA resides in Dr. Alexander Kott of ARL, the designated NS CTA
CAM. Dr. Kott is assisted by six Government Area Technical Leads (GATLs): Dr. Ananthram
Swami (IRC), Dr. Lance Kaplan (INARC), Dr. Jeff Hansberger (SCNARC), Mr. Greg
Cirincione (CNARC), Mr. David Dent (EDIN CCRI), and Dr. Jerry Powell (Trust CCRI).
Deputy CAM. The CAM is assisted by Dr. Robert Cole of CERDEC, the designated Deputy
CAM for the NS CTA. Dr. Cole’s focus is on experimentation and technology transition.
Program Director. The NS CTA Program Director, Dr. Will Leland of BBN, is the
Consortium’s technical representative charged with the Consortium’s overall responsibility for
the network science basic research, coordination of research results, and management of the
cooperative agreement. Dr. Leland is assisted by Dr. David Sincoskie of the University of
Delaware, the Associate Program Director.



Transition Administrator. The NS CTA Transition Administrator, Dr. Isidro Castineyra of
BBN, is the Consortium’s representative charged with the responsibility for executing transition.
Dr. Castineyra also serves as Deputy Director of the IRC, in support of the Program Director.
Academic Research Center Directors. Each ARC is led by a Center Director. The Director is
that Center’s technical representative charged with the Center’s technical leadership,
management, and guidance.
The Technical Management Group (TMG) is chaired by the Collaborative Alliance Manager
and consists of the Program Director, Transition Administrator, and the three ARC Directors, as
well as the corresponding Government technical leads. The TMG will assist the CAM and the
Program Director in carrying out their duties concerning the NS CTA.
The Consortium Management Committee (CMC) consists of one representative from each
Principal and General Member of the Centers. The CAM participates as an ex officio member in
all discussions except those that deal with purely internal Consortium matters. The CMC chair
will be the Program Director, Dr. Leland. The CMC makes recommendations that concern the
composition of the Consortium, the definition of the tasks and goals of the participants, and the
distribution of funding to the participants.
The Research Management Board (RMB) identifies and develops collaborative opportunities,
advises and assists the CAM in setting research goals, and facilitates transition to development
programs. The RMB includes representatives from Army and other service organizations and
other government agencies with interest or expertise in technologies related to the NS CTA.

The following four subsections summarize the composition and focus of each of the Centers.

2.3 Interdisciplinary Research Center (IRC)

2.3.1 Principal and General Member institutions


BBN Technologies, Inc Principal Member
ArtisTech, Inc General Member
University of California, Riverside General Member
University of Delaware General Member

2.3.2 Responsibilities
The IRC is focused on performing interdisciplinary research (both basic and applied) that spans
the interests in the Alliance and leads to cross-cutting insights into network science and
innovative technologies. The IRC transitions basic research from across the Consortium (its own
and the ARCs’) into applied research, and, through the Technology Transition Component,
promotes the rapid transition of technology to meet the specific needs of a network-centric
Army. The IRC leads the Education Component as a cooperative program across all four
Centers. The IRC also operates the NS CTA Facility, which supports and coordinates distributed
collaborative research and experimentation.



Programmatically, the IRC is the leader of the Consortium, both intellectually (responsible for
setting research directions for the four Centers to ensure that research is focused on fundamental
network science issues that are relevant to network-centric operations and the Army mission) and
administratively (responsible for financial management and for tracking and reporting on the
Consortium’s work). The research, transition, and leadership roles of the IRC are summarized in
the accompanying figure.

2.3.3 Key personnel and their roles


Dr. Ananthram Swami, ARL – Government Lead for the IRC
Dr. Will E. Leland, BBN – Program Director
Dr. David Sincoskie, University of Delaware – Associate Program Director; Deputy Transition Administrator; Education Component Coordinator
Dr. Isidro Castineyra, BBN – IRC Deputy Director; Transition Administrator
Dr. Prithwish Basu, BBN – Coordinator for EDIN CCRI
Dr. Mike Dean, BBN – IRC Liaison to Information Networks ARC; co-Lead for research project R2 (see Section 3, Research Overview)
Dr. Michalis Faloutsos, University of California, Riverside – Leadership in network science research; Project Lead for research project R1
Dr. Karen Haigh, BBN – Co-Lead for IRC Trust CCRI research
Mr. John P. Hancock, ArtisTech, Inc. – Co-Lead for research project R3
Dr. Jim Hendler, RPI – IRC Liaison to Social/Cognitive Networks ARC; co-Lead for research project R2
Dr. Alice Leung, BBN – Co-Lead for research project R3
Mr. Jeff Opper, BBN – Co-Lead for IRC Trust CCRI research
Dr. Craig Partridge, BBN – Leadership in network science research
Dr. Don Towsley, University of Massachusetts – IRC Liaison to Communications Networks ARC



2.4 Social/Cognitive Networks Academic Research Center
(SCNARC)

2.4.1 Principal and General Member institutions

Rensselaer Polytechnic Institute Principal Member


City University of New York General Member
IBM General Member
Northeastern University General Member

2.4.2 Research focus


The modern military increasingly needs to rely on bottom-up network processes, as compared to
top-down hierarchical processes. How does the pattern of interactions within a military unit affect
performance of tasks? What kinds of ties external to the Army are necessary for success? How
can we use the massive streams of data to detect adversarial networks? How can a social and
cognitive network quickly extract the most meaningful information for the soldier and
decision maker, information that is useful in all aspects of their operations, from humanitarian
support to force protection and full combat operations? These are but a sample of network-
related questions with which the 21st century Army must wrestle. The long-term objective of the
Center is to advance the scientific understanding of how social networks form, operate, and
evolve and how they affect the functioning of large, complex organizations such as the Army;
how adversary networks hidden in large social networks can be detected, monitored or dissolved;
and how human cognition directs and is impacted by the network-centric interactions. The
Center will undertake research to gain a fundamental understanding of the underlying theory, as
well as create scientific foundations for modeling, simulation, measurements, analysis,
prediction, and control of social/cognitive networks and their impact on the U.S. Army.

2.4.3 Key personnel and their roles


Government Lead: Jeffrey Hansberger (ARL)
Center Director: Boleslaw Szymanski (RPI) (Project E3 co-Lead)
Principal Member, RPI, Participants:
Sibel Adali (CCRI Trust Coordinator, Project T2 Lead and Task T2.2 co-Lead), Malik
Magdon-Ismail (Project S2 and Tasks S2.1, S2.2 Lead), Wayne Gray (Project S3 and Tasks
S3.1, S3.2 Lead), Jim Hendler (Collaboration Coordinator), Gyorgy Korniss
(Project S4 and Tasks S4.1, S4.2 Lead), Mark Goldberg, Chjan Lim (Task T2.1 co-Lead),
Michael Shoelles, William Wallace
General Member Northeastern Participants:
Albert-Laszlo Barabasi (Task E4.1 Lead), David Lazer (Chief Social Scientist and Task
E3.2 Lead)
General Member IBM Participants:
Ching-Yung Lin (Project S1 and Task S1.2 Lead), Ravi Konuru (Task S1.1 Lead), Zhen
Wen (Task S1.3 Lead), Spiros Papadimitriou
General Member CUNY Participants:
Ted Brown (Education Lead), Simon Parsons (Task 1.3 Lead), Hernan Makse, Jennifer
Mangels
Subawardees:
Sinan Aral (NYU)
Jennifer Golbeck (UMD)
Alex Pentland (MIT)
Brian Uzzi (Northwestern Univ.)
Zoltan Toroczkai, Nitesh Chawla, David Hachen, Omar Lizzardo (Notre Dame)
Alex Vespignani and Stan Wasserman (Indiana)

2.5 Information Networks Academic Research Center (INARC)

2.5.1 Principal and General Member institutions


University of Illinois at Urbana-Champaign Principal Member
City University of New York General Member
IBM General Member
University of California, Santa Barbara General Member

2.5.2 Research focus


INARC is aimed at developing the information network technologies required to improve the
capabilities of the US Army and providing users with reliable and actionable intelligence across
the full spectrum of Network-Centric Operations. INARC will systematically develop the
foundation, methodologies, algorithms, and implementations needed for effective, scalable,
hierarchical, and most importantly, dynamic and resilient information networks for military
applications. The center will focus on five research projects: (i) EDIN: Foundation of
Evolving, Dynamic Information Networks, (ii) Trust-CCRI: Foundation of Trusted Information
Networks, (iii) Distributed and Real Time Data Integration and Information Fusion, (iv)
Scalable, Human-Centric Information Network System, and (v) Knowledge Discovery in
Information Networks.

2.5.3 Key personnel and their roles


Government Lead: Lance Kaplan (ARL)
Center Director: Jiawei Han (UIUC)
Principal Member, UIUC, Participants:
Tarek Abdelzaher (Project I1 Co-lead, Task I1.1 Lead), Thomas Huang (Task I1.2 lead),
Dan Roth (Task I3.3 lead)
General Member UCSB Participants:
Ambuj Singh (Project E3 co-Lead), Xifeng Yan (Project I2 and Task I2.2 Lead), Tobias
Hollerer (Task T2.3 lead), B. S. Manjunath
General Member IBM Participants:
Charu Aggarwal (Project I1 co-lead), Dakshi Agrawal (Project T1 INARC-Lead),
Mudhakar Srivatsa (Task T1.2 lead), Spyridon Papadimitriou (Task I3.2 co-lead),
Anastasios Kementsietsidis, Min Wang
General Member CUNY Participants:
Amotz Bar-Noy, Ted Brown (Education Lead), Heng Ji (Task I1.3 Lead)
Subawardees:
Christos Faloutsos (CMU)
Lada Adamic (Univ. of Michigan)
Noshir Contractor (Northwestern Univ.)
Peter Pirolli (PARC: Task T1.2 lead)

2.6 Communications Networks Academic Research Center


(CNARC)

2.6.1 Principal and General Member institutions

Penn State University Principal Member


City University of New York General Member
University of California, Davis General Member
University of California, Santa Cruz General Member
University of Southern California General Member

2.6.2 Research focus


The CNARC will focus its research on characterizing complex communications networks, such
as those used for network-centric warfare and operations, so that their behavior may be
accurately predicted and they may be configured for optimal information sharing and gathering.
In particular, the CNARC will focus on characterizing and controlling the operational information
content capacity (OICC) of a tactical network. OICC is a function of the quality and amount of
information that is delivered to decision makers. This includes data delivery and security
properties of the network. Thus it is vastly different from other measures of network capacity that
are traditionally modeled. In essence, the CNARC models treat the network as an information
source.
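
To make this notion concrete, one minimal illustrative formulation (a simplification introduced here for exposition, not the CNARC definition, which is itself a product of this research) treats OICC over a mission interval as the QoI-weighted volume of information actually delivered to decision makers:

    \mathrm{OICC}(T) \;=\; \frac{1}{T} \sum_{i \in \mathcal{D}(T)} q_i \, r_i

where \mathcal{D}(T) is the set of information items delivered to decision makers during interval T, q_i is the quality of information (QoI) of item i (capturing, e.g., accuracy, timeliness, and provenance), and r_i is its relevance to the decisions at hand; degraded data delivery or security properties would appear as discounts on q_i. The symbols \mathcal{D}(T), q_i, and r_i are notational assumptions used only for this illustration.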

To perform this work the CNARC will work closely with the other centers to fully leverage the
existence of underlying social and information networks. These networks will impact the
configuration of network flows and the mobility of nodes. The relative importance of
information and the need for multiple pieces of information to make decisions will impact how
information is transferred across a network and which security properties are applied.

2.6.3 Key personnel and their roles

The CNARC team has experts in the areas of network modeling, security, optimization, systems,
protocol analysis, and experimental evaluation. Key personnel include:

Thomas La Porta (PSU, Center Director) – expertise in wireless and mobile networks, security
and protocol analysis;



Prasant Mohapatra (UC Davis) – expertise in wireless and mobile networks, resource allocation
and experimentation;

J.J. Garcia-Luna-Aceves (UC Santa Cruz) – expertise in formal analysis of network structure,
scaling laws, wireless networks;

Ramesh Govindan (USC) – expertise in wireless networks, networked applications and quality of
information;

Amotz Bar-Noy (CUNY) – expertise in algorithms and complexity, scheduling;

Karl Levitt (UC Davis) – expert on network security.

2.7 Alliance summary

The ARL Network Science CTA performs foundational, cross-cutting research on network
science to achieve: A fundamental understanding of the interplay and common underlying
science among social/cognitive, information, and communications networks; Determination of
how processes and parameters in one of these networks affect and are affected by those in other
networks; and, Prediction and control of the individual and composite behavior of these complex
interacting networks.

The Alliance provides a unique, and uniquely powerful, program of fundamental research that
addresses the critical technical and scientific needs of the Army in network-centric operations. The
most critical and Army-relevant phenomena emerge in the intertwining of social, information, and
communications networks, not in any one of them individually. These phenomena have had a dramatic
impact on both conventional and irregular warfare, as well as humanitarian and peacekeeping
missions. As the first and only major Army research program to study the mutual interdependencies
and relations of these three dissimilar (and most influential) genres of networks, it is positioned
to achieve significant advancement in scientific understanding and practical impact. Its intensely
collaborative organization as four Centers working closely with ARL and other government
researchers serves to balance (a) in-depth expertise in each network genre; (b) relations,
dependencies, and mutual influences of three networking genres; and (c) sustained focus on the
Army’s technical challenges in network-centric warfare and operations.

The result of the NS CTA’s research, and its parallel technology transition, will be optimized
human performance in network-enabled warfare and greatly enhanced speed and precision for
complex military operations.



3. Research Overview

Table of Contents
3. Research Overview ............................................................................................................... 3-1
3.1 Military need .................................................................................................................... 3-1
3.2 Defining the IPP Research Plan ....................................................................................... 3-4
3.2.1 From military needs to research tasks ........................................................................ 3-4
3.2.2 The fundamental network science research questions ................................................ 3-5
3.3 Roadmap to the rest of Section 3...................................................................................... 3-5
3.4 Trust in Distributed Decision-Making CCRI ................................................................... 3-6
3.5 Evolving Dynamic Integrated (Composite) Networks (EDIN) CCRI ............................. 3-9
3.6 IRC non-CCRI research ................................................................................................. 3-12
3.7 INARC non-CCRI research............................................................................................ 3-15
3.8 SCNARC non-CCRI research ........................................................................................ 3-18
3.8.1 Research Challenges: ................................................................................................ 3-18
3.8.2 Research Approach ................................................................................................... 3-19
3.9 CNARC non-CCRI research .......................................................................................... 3-21

This section gives an overview of the entire research program in the NS CTA.

3.1 Military need

The Army faces a world in which every mission is profoundly influenced by the interactions
of complex, rapidly evolving systems of composite social/cognitive, information, and
communications networks. Warfighter performance and mission effectiveness are critically
affected by our ability to understand, design, predict, and ultimately control the structure and
behavior of this vast composite system. To bring these general observations down to concrete
cases, we illustrate here the network science-related technical challenges faced by the modern
Army with two examples, and present some of the key research questions that directly arise from
these examples.



Figure 3.1: Streamlining the complex relationships among systems and units all performing
simultaneous Battle Command is a requirement for enhancing the speed and precision of
network-enabled warfare.
Battle Command. As the Army continues its transformation from a division-oriented structure
to one of modular, independent Brigade Combat Teams, it must extend its information and
communications architectures down to individual soldiers, and its decision-making tools down to
squad leaders through the Joint Battle Command – Platform initiative. Thus, the need for detailed
understanding of the impact of these changes on network-centric warfare has never been greater.

Consider Figure 3.1. All decision-makers, at every level of command, follow the Battle
Command framework depicted. The information needs of the squad in the street (bottom center
of Figure 3.1) are different from those of the battalion that sent them there (upper center).
Staff interactions impact decision making at the battalion command post (upper right), as do their
interactions with the deployed company and the formal and informal command structures found
there (lower right). Communication exchanges throughout the battlespace are dependent on the
type of communication resources available (left).

This challenging environment directly defines key network science research questions that will
be addressed in the NS CTA: for example, what is the most effective (social) organizational
structure given the realities of battlefield information and communications networking? What is
the most effective information network structure given the realities of social structures and
communications networks? What is the impact of errors and losses in communications and
information networks on warfighter decision-making and mission effectiveness? How do
individuals adapt (or not) to advances in these technologies as they alter social structures?
How can human assessment of trust be made more accurate with algorithmic or heuristic analysis
of observations from all genres of networking? How can such assessments be subverted? The
insight and tools necessary to understand, design, predict, and control these complex systems of
multi-genre networks must be found in network science.

Counter-insurgency (COIN). Nowhere is the need for network science research greater than in
the support of Stability Operations, where the threat uses asymmetric warfare tactics, discovering
and exploiting weaknesses in our networks and network interactions for its own gain.
Consider Figure 3.2. The insurgent fights a political battle, most of it outside the traditional
focus of tactical intelligence collection and analysis methods and systems.

US forces have adopted the “Cultural Intelligence Preparation of the Battlefield” (lower left
corner of Figure 3.2), which explains what we want to know but not how we are going to learn it.
Consider the upper right hand corner of the figure. There are three different explanations of Iraqi
tribal organization: not only do we have difficulty establishing who the leaders are within the
structure, we also have difficulty identifying the structure itself and its interaction with social
and government networks. In short, we are trying to identify all the networks in play, the
individuals (or elements) present in those networks, their relative importance vis-à-vis their
primary, secondary, and tertiary networks, and at what points these networks intersect. In
parallel, we are analyzing our own networks to ensure their interactions are organized to have
maximum effect on the operation we are supporting. We seek to strengthen our own networks
while identifying the exploitable points in those of our foes.

In this environment, network science, through its disciplined approach to ontology formation,
creation of operationally useful metrics, and development of mathematical tools and modeling
techniques culminating in simulations of composite networks, has the potential to provide the
means to understand the operational environment, identify the key individuals/elements of that
environment, and wargame our own Courses of Actions within those networks.

Here again, the technical challenges of the mission environment directly drive key research
questions that the NS CTA program will address. How, for example, can analysis of composite
social/information/communications networks reveal hidden communities? How can we
manipulate controllable parts of the composite system of networks to drive their evolution in
desirable ways (even though the composite includes networks for which we have no direct
control or even observation)? What is an objective measure of Quality of Information (QoI), so
our information and communications networks support social/cognitive systems by prioritizing
QoI, not merely QoS (Quality of Service)? How can effective modeling, simulation, and
hypothesis validation be performed for such composite systems of networks?



[Figure 3.2 graphic: panels depicting national versus tribal hierarchies in Iraq, a tribal “chain of command,” a hypothetical insurgent communications network in Fallujah, an insurgent activity cycle, and a document tree of source intelligence and doctrine references.]
Figure 3.2 – Network science is the key to unraveling the organizational structure and
weaknesses of an insurgency operating in a framework of political, cultural, and personal
networks, where its communication networks or impact on local economies (such as
commodity-related activities) may be the only observable indicators of its presence, strength,
organization, and intentions.

3.2 Defining the IPP Research Plan

3.2.1 From military needs to research tasks


These examples, and many others, show the profound technical and
scientific challenges created by military operations in an increasingly
network-driven world. Creating a coherent, productively focused long-
term plan for network science research requires stepping back for careful
consideration of the broad research questions that must be addressed by
the NS CTA. By establishing that broad intellectual framework, we can
then proceed to define feasible points of attack, and express them in
defined research projects: each with a clear vision of its long-term
research questions, goals, validation approaches, and potential research
products. These long-term research projects are then prioritized to select
those that meet our requirements of research relevance, scientific
importance, potential military benefit, and realistic points of attack:
technical approaches that show promise of yielding answers to the
questions and technologies to transition. Finally, for this Initial Program Plan, we identify the
near-term research tasks that provide essential insights and capabilities for driving future
research, attack significant immediately-addressable sub-questions of their project’s long-term
research questions, and/or promise verifiable significant research results starting in the first year.

3.2.2 The fundamental network science research questions

Fundamental Network Science Research Questions


Q1: What are the fundamental attributes of a network? Is the list in the 2005 NRC report
sufficient? [Committee on Network Science for Future Army Applications, National Research
Council, Network Science. The National Academies Press, 2005. ISBN 0-309-10026-7.]
Q2: What factors affect network formation? How do networks evolve over time? How do
changes in one network drive changes in other genres of interacting networks? How do these
changes in turn reflect back to cause chains of network behavioral and structural response?
Q3: How do network formation rules impact network properties? How do properties interact and
evolve as multiple social/information/communications networks are integrated?
Q4: How do various forms of network dynamics affect network properties? How do temporal
dynamics in one network propagate across another? For example, social trust can be enabled by
communications networks but trust does not go away if communications are interrupted.
Q5: How can we predict properties and scaling laws for networks of finite and often modest size
and also control them? The tendency in theory is to use asymptotic analysis, but many military
networks are of finite and often modest size and we need concrete results.
Q6: How can we extract network properties from observables such as network traces, especially
if the observations contain incomplete, delayed, or unreliable information?
Q7: What is the most effective way of visualizing networks, especially networks composing or
integrating multiple social/information/communications networks?
Q8: When do composite networks enhance each other’s function and where do they interfere?
Are there special issues composing human (social) networks with technical networks or are the
composition challenges generic?

3.3 Roadmap to the rest of Section 3

The remaining subsections of Section 3 provide an overview of each of the major research
programs in the NS CTA Initial Program Plan: the Trust in Distributed Decision-Making Cross-
Cutting Research Issue (CCRI), the Evolving Dynamic Integrated (Composite) Networks (EDIN)
CCRI, and the non-CCRI research programs of each Center. An essential aspect of all research
(both CCRI and non-CCRI) conducted within and between the Centers is that it is addressing
critical NCW technical challenges in the context of composite networks of networks: not simply
multi-technology environments, but environments where all genres of network (social/cognitive,
information, and communications) inherently interact.

Much further detail is provided in the corresponding Sections of this IPP (Sections 4 through 9,
below). In particular, the key long-term research questions are provided for each project, and the
focused initial research questions are provided for each task. The overall structure for technical
descriptions is roughly recursive: at each level (project and task), we provide explicit key
research questions being addressed, representative initial hypotheses (short term and long term)
that will be explored, technical approaches, prior work, validation approach, and research
products. Projects will be at a higher level of generality than tasks: for example, the project
presents the larger-scale and longer-term vision of its research program. At the project level, the
narrative also rolls up the initial research staff, military relevance driving the project’s research
questions, research linkages to other research across the Alliance, collaborations and rotations,
research milestones for the first year, and the project’s year-1 research budget by organization.

3.4 Trust in Distributed Decision-Making CCRI

Trust in distributed decision making is one of the two cross-cutting research issues in the NS
CTA, studied collaboratively by all the centers from different and complementary
perspectives. Trust is a relationship involving two entities, a trustor and a trustee, in a specific
context under conditions of uncertainty and vulnerability. It is widely agreed that trust is
a subjective matter and that it involves expectations of future outcomes. In a trust relationship, the trustor
encounters uncertainty from a lack of information or an inability to verify the integrity,
competence, predictability, and other characteristics of the trustee, and is vulnerable to suffering a
loss if expectations of future outcomes turn out to be incorrect. Trust allows the trustor to take
actions under uncertainty and from a position of vulnerability by depending on the trustee.

Distributed decision making in Network-Centric Operations (NCO), especially IW and COIN
operations, requires command, control and coordination of numerous and heterogeneous nodes,
from hardware and software infrastructure to information and people: sensors connected with a
communication network, information available from many different sources such as known
people, anonymous sources, sensors, automated programs for mining these, and people
connected to each other through different and complex interpersonal relationships. A basic tenet
of NCO is that the strategic design and use of networks will improve information sharing and
collaboration, shared situational awareness, and actionable intelligence – all of which will
dramatically increase mission effectiveness. All the networks involved in decision making,
communication, information and social networks are interdependent. They exist and evolve
simultaneously. The networks may contain adversaries who may lie, nodes may fail, information
may be inconsistent. Warfighters accessing and producing data are typically spread across these
networks. Network-centric operations will require soldiers to actively interact with people who
are not necessarily from their own organization, monitor the progress of mission based on
feedback obtained from a distributed network and act quickly to important events – all of which
produce conditions of uncertainty and vulnerability. Under such conditions, trust plays a crucial
role in decision making and it is critical that the interfaces to the network facilitate access to
trusted information and people and help foster development of trust. Equally critical is the ability
to adapt to changes and disruptions in the network and to compensate for actions and events that
may cause trust to diminish.

As opposed to the traditional settings of electronic marketplaces or other closed or limited
settings, the trust relevant for NCO must be derived in a distributed fashion, in time-critical and
stressful situations, in environments where node capture and subversion are likely; and where the
underlying communications network is resource-constrained, mobile, and dynamic; and where
decision makers’ reliance on and compliance with an information system are subject to
numerous internal and external influences. Furthermore, trust will have many components,
each of which will be derived depending on its own context; trust components will have varying
uncertainties; and they will involve the network’s ability to be available, accurate, reliable, timely,
comprehensible, etc. The ability to achieve trust and incorporate it into decision-making methods
in such environments is fundamental to the success of networked forces and is the subject of
scrutiny for this research effort.

To address the challenges raised by understanding and establishing trust in composite networks,
the Trust CCRI is organized into three main projects that incorporate highly collaborative
research in all aspects of the problem. An overview of the projects in the Trust CCRI is given in
Figure 1, which outlines the centers participating in the different tasks in FY10. The three main
projects in the Trust CCRI are the following:
T1. Trust models and metrics: computation, inference and aggregation
T2. Understanding the interactions between network characteristics and Trust
T3. Fundamental paradigms for enhancing Trust
These projects center on identifying how to model and compute trust
in composite networks (Project T1), understanding how the composite network may impact trust
and how trust may impact the network (Project T2), and developing paradigms for enhancing trust and
propagating trust-related information in the network (Project T3). In addition to the highly
collaborative nature of each task in these projects, there are close linkages between the three
projects, as described in Section 4. Furthermore, all the tasks outlined in the Trust CCRI have
close linkages to other tasks in the CTA that concentrate on various characteristics of different
network types and the dynamics of networks in the EDIN CCRI. In the following, we describe
the research that will be conducted in FY10 in each project.

Figure 1: Trust Projects

T1: Trust models and metrics: computation, inference and aggregation: The overall goal of
this project is to investigate the fundamental question of how the trust relationship between two
entities in the composite network can be computed and aggregated. There is a substantial body of
literature that addresses this question from many different perspectives originating in different
disciplines – the goal of this project will be to espouse a view that promotes an end-to-end
understanding of these issues, starting from the point when information and data are gathered,
stored, transported by a communication network, processed by an information network, and finally
used in decision making by humans. The main aim of this project is to develop a unifying view
of trust across all these different domains. Dakshi Agrawal from INARC and Prasant Mohapatra
from CNARC will lead this project.

In FY10, this project will concentrate on providing unified models and ontologies for computing
trust by incorporating factors and models from different networks. We will conduct an in-depth
investigation of the critical trust factors that are most relevant to NCO. In year 1, we will concentrate
on two key factors that play a crucial role in determining quality of information and therefore
trust in decision making: provenance (the origin of the information, the path it traveled, and the
changes made to it) and credibility (how believable the information is). We will develop
cognitive models of credibility and technology-enabled human interactions to better understand
the perception of trust.
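
As a purely illustrative sketch of the kind of aggregation that T1 will formalize, the fragment below combines a provenance-derived score and a credibility-derived score for a single information item into one trust estimate with an attached uncertainty. The factor names, weights, and the simple weighted-average rule are assumptions made here for exposition only, not the project's models.

    # Illustrative only: a toy aggregation of trust factors for one information item.
    # The factors (provenance, credibility), weights, and combination rule are
    # placeholders; Project T1 will develop the actual models and ontologies.

    from dataclasses import dataclass

    @dataclass
    class TrustFactor:
        value: float        # score in [0, 1] for this factor
        uncertainty: float  # 0 = fully certain, 1 = no information
        weight: float       # relative importance assigned by the decision context

    def aggregate_trust(factors):
        """Weighted average of factor scores; the aggregate uncertainty grows
        with the uncertainty of the factors that carry the most weight."""
        total_weight = sum(f.weight for f in factors)
        trust = sum(f.weight * f.value for f in factors) / total_weight
        uncertainty = sum(f.weight * f.uncertainty for f in factors) / total_weight
        return trust, uncertainty

    # Example: an item with well-documented provenance but middling credibility.
    provenance = TrustFactor(value=0.9, uncertainty=0.1, weight=0.4)
    credibility = TrustFactor(value=0.6, uncertainty=0.3, weight=0.6)
    print(aggregate_trust([provenance, credibility]))  # roughly (0.72, 0.22)

Even this toy version makes visible the questions T1 must answer rigorously: where the factor scores come from, how their uncertainties are modeled, and how context determines the weights.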

T2: Understanding the interactions between network characteristics and Trust: The main
research challenge in this project will be the identification of the various interactions between
network characteristics and trust. Some of the main research topics to be studied in this project
are how trust enables the various types of interactions in the network, how the different trust-related
behavior dynamics, such as conversation or mobility patterns, can be detected, how
these different behaviors are correlated with each other, and to what degree a network can sustain
these behaviors. In FY10, we will investigate economic models that deal with risk in the context
of trust, network characteristics related to patterns of reciprocity, mobility, heterogeneity, and
variations in channel conditions. The theory developed in Project T2 will build on the
fundamental work in Project T1 and will be used to develop the necessary paradigms for
enhancing trust, studied in Project T3. This project will be led by Sibel Adali from SCNARC.

T3: Fundamental paradigms for enhancing Trust: This project looks at questions relating to
trust meta-data – how do we propagate the trust models about entities in the network, how do we
support dynamics of trust, and how can we modify the network to increase the trustworthiness of
entities. In FY10, we will look at how to scalably disseminate the trust meta-data, how to reliably
establish trust for newcomers, and how to revoke trust as the environment changes. In later years,
we will examine how to tie the properties of the network to the emergent trust of the entities, and
possibly optimize the structure of the networks to improve the overall trust behavior. The effort
in this project requires a comprehensive unified trust model developed in Project T1, and will
provide a framework for validating ideas in both Projects T1 and T2. Lessons learned in T3 will
feed the development of ideas in T1 and T2. This project will be led by Karen Haigh from IRC.
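
A minimal sketch of one dissemination idea in the spirit of T3 appears below: gossip-style propagation of trust metadata with per-hop decay plus explicit revocation. The data structures, decay rule, and revocation handling are illustrative assumptions, not the mechanisms the project will ultimately adopt.

    # Illustrative only: spread trust metadata through a network with per-hop
    # decay, and drop it everywhere on revocation. All rules are placeholders.

    from collections import deque

    def propagate_trust(graph, source, reports, decay=0.8, floor=0.1):
        """graph: dict mapping each node to the set of its neighbors.
        reports: dict mapping a subject to the trust score asserted by `source`.
        Returns, per node, the decayed trust metadata that node ends up holding."""
        held = {node: {} for node in graph}
        held[source] = dict(reports)
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                for subject, score in held[node].items():
                    decayed = score * decay
                    if decayed < floor:
                        continue  # too attenuated to be worth forwarding
                    if held[neighbor].get(subject, 0.0) < decayed:
                        held[neighbor][subject] = decayed
                        queue.append(neighbor)
        return held

    def revoke(held, subject):
        """The environment changed: drop all metadata about `subject` everywhere."""
        for metadata in held.values():
            metadata.pop(subject, None)

The per-hop decay stands in for the harder questions the project actually targets, such as how trust for newcomers is established and how network structure shapes the emergent trust of entities.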



3.5 Evolving Dynamic Integrated (Composite) Networks (EDIN)
CCRI

A basic tenet of Blue Forces Network-Centric Operations (NCO) is that the mission
effectiveness of networked forces can be improved by information sharing and collaboration,
shared situational awareness, and actionable intelligence. The effectiveness of such networks is
dependent on our ability to accurately anticipate the evolution of the structure and dynamics of
social-cognitive (SCN), information (IN) and communication (CN) networks that are constantly
influencing each other. Understanding the structure of these component networks and the
dynamics therein, and of the dynamic interactions between these networks is crucial to the
design of robust composite networks which is one of the primary goals of the NS CTA program.

Understanding the evolution and dynamics of a network entails understanding both the structural
properties of dynamic networks and the dynamics of processes (or behaviors) of
interest embedded in the network. Typically, the dynamics of network structure impacts certain
processes (e.g., how information propagates through the network); but at the same time the
dynamics of processes (or behaviors) may result in alteration of network structures. Therefore,
gaining a fundamental understanding of such relationships under several kinds of network
dynamics is of paramount importance for obtaining significant insights into the behavior of
complex military tactical networks as well as adversarial networks.

Decision-makers in this complex network of networks generally take individual or collaborative
decisions that are impacted by their dynamic surroundings, which may include the current state
of various network variables or extraneous non-network variables; and they expect to get a
certain benefit from taking such a decision while incurring a certain cost (in a generalized sense).
They ideally want to make near-optimal decisions assuming certain knowledge of past, present,
and expected future state of the network and non-network variables but are often forced to
operate with incomplete knowledge under a time deadline. This situation is made even more
complex in the case of integrated networks where a particular decision may have a significant
impact on the future structure of the network. While the individual networks (CN, IN, and SCN)
have been studied in isolation to some extent, the lack of a common mathematical language and
mathematical formalism to jointly represent these interacting networks has hindered
understanding of such complex networks as a whole.



Figure 2: EDIN Projects and Tasks

To address these challenges, the NS CTA has formulated this CCRI, named EDIN (Evolving
Dynamic Integrated (Composite) Networks), whose focus is to characterize, analyze, and design
composite networks that consist of social/cognitive, information, and communication network
components. The key research questions that will be addressed by this research include:

How should dynamic composite networks be represented? What are their fundamental
attributes? How do we represent dynamic network structures succinctly but with
sufficient richness? How do we incorporate probabilistic or fuzzy information?
How to model time-varying dependencies and interaction between devices, information
objects, and humans? How to model conditional triggers/interactions between pairs of
networks? How to model interaction with adversary networks, which may be only
partially known or only partially observable?
What are the factors and rules of network formation and the subsequent co-evolution?
How do we predict the integrated effects of both emergent properties from simple rules
and complex engineered behavior from complex rules?
What are the objectives of an integrated network? What are good cross-cutting shared
metrics for measuring the network’s effectiveness in accomplishing the goal? Can the
overall network function (metrics) be systematically composed from constituent network
functions (metrics)?



What are the dynamic processes executing inside static or dynamic networks? How do
they act as stimuli and impact the modification of network structure and properties? How
do changes in local structure affect global properties/behaviors, subject to network
formation constraints?
What is the effect of deterministic and stochastic evolution in one network (e.g., spatio-
temporal dynamics in SCN) on another network (e.g., CN) at widely varying time scales?
How to optimally control network structure with respect to a given cross-cutting metric
by modifying a few key network variables while taking into account constraints imposed
by network formation rules?

To answer these research questions, we have broken down the research efforts under EDIN into
four major projects, summarized below. Each project consists of two to three tasks, which are named in
Figure 2; the high-level linkages between these research projects are also shown there.
Project E1: Ontology and Shared Metrics for Composite Military Networks (Leads:
Prithwish Basu, BBN (IRC); Jim Hendler, RPI (SCNARC, IRC)) This project is focused on
developing a shared vocabulary and ontology across social, information, and communication
networks. Specifically, it will identify the entities in a composite network and their attributes; the
relationships between them and how they affect network formation; and the
metrics that need to be defined across composite networks irrespective of their representation
structure (this will include metrics relevant to tactical missions that use all three networks).
Project E2: Mathematical Modeling of Composite Networks (Leads: J. J. Garcia-Luna-
Aceves, UC Santa Cruz (CNARC); Prithwish Basu, BBN (IRC)) This project will focus on
the development of mathematical representations, models, and tools to capture the salient aspects
of dynamic composite networks and their evolution. Since our knowledge about composite
network structures is in its infancy, we will explore a number of orthogonal mathematical
modeling approaches rather than focusing on one or two. We will investigate composite multi-
layered graph theory, tensor analysis tools, temporal graphlets, dynamic random graphs,
and constrained optimization. Since each technique is likely to have advantages and disadvantages with respect to the specific scenario, we must be careful not to declare winners or losers too early in the research.
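As a purely illustrative sketch of the first of these approaches, and not a commitment to any particular formalism, a composite network can be encoded as a single multi-layered graph in which every node carries a layer tag (CN, IN, or SCN) and every edge is marked as an intra- or inter-layer link. All node names, attributes, and the query below are hypothetical.

    # Minimal multi-layered encoding of a composite network (illustrative only).
    import networkx as nx

    G = nx.Graph()
    # Hypothetical nodes, one per constituent network layer
    G.add_node(("CN", "radio-1"), layer="CN")
    G.add_node(("CN", "radio-2"), layer="CN")
    G.add_node(("SCN", "soldier-A"), layer="SCN")
    G.add_node(("IN", "report-17"), layer="IN")

    # Intra-layer edge: two radios that can communicate
    G.add_edge(("CN", "radio-1"), ("CN", "radio-2"), kind="intra")
    # Inter-layer couplings: a soldier carries a radio and authored a report
    G.add_edge(("SCN", "soldier-A"), ("CN", "radio-1"), kind="inter")
    G.add_edge(("SCN", "soldier-A"), ("IN", "report-17"), kind="inter")

    # A simple cross-cutting question: which information objects are reachable
    # (through any mix of layers) from radio-2?
    component = nx.node_connected_component(G, ("CN", "radio-2"))
    print([n for n in component if G.nodes[n]["layer"] == "IN"])   # [('IN', 'report-17')]

The same structure can also be read as a supra-adjacency matrix or a sparse tensor indexed by layer and node pairs, which is where the tensor-analysis line of investigation would pick up.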
Project E3: Dynamics and Evolution of Composite Networks (Leads: Ambuj Singh, UC
Santa Barbara (INARC); Boleslaw Szymanski, RPI (SCNARC)) This project is focused on
the investigation of the temporal evolution of various structural properties of integrated
networks. We will study the short-term effects of network stimuli and dynamics (e.g., the arrival of new information flows or the deletion of nodes) on the properties of a given composite network, and also investigate how composite networks co-evolve over longer time scales due to endogenous and exogenous influences. This project will also focus on the development of cross-cutting
approaches for the prediction of dynamic evolution of networked communities (groups of
soldiers, clusters of similar documents etc.) using multi-theoretic multi-level modeling
techniques.
Project E4 - Modeling Mobility and its Impact on Composite Networks (Lead: Thomas La
Porta, Penn State University): The goal of this project is to develop a suite of mobility models
that capture metrics of specific interest to the evolution of different types of networks, and
ultimately the evolution of composite networks. We also plan to develop models that make use of the motivation for movement. These models will allow synthetic traces to be generated that are
statistically close to actual traces, but that may represent a large set of scenarios. These models
may then be used by the core programs of the CTA to determine the impact of mobility on
social, information and communication networks.
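As one hedged illustration of the kind of baseline that these richer, motivation-driven models would extend, the sketch below generates a synthetic trace from the classical random-waypoint model; the area size, speed range, and time step are arbitrary placeholders rather than parameters proposed by E4.

    # Random-waypoint trace generator (baseline illustration only).
    import random

    def random_waypoint_trace(steps=1000, area=1000.0, vmin=1.0, vmax=5.0, dt=1.0):
        x, y = random.uniform(0, area), random.uniform(0, area)
        trace = []
        while len(trace) < steps:
            tx, ty = random.uniform(0, area), random.uniform(0, area)   # next waypoint
            speed = random.uniform(vmin, vmax)
            dist = ((tx - x) ** 2 + (ty - y) ** 2) ** 0.5
            n = max(1, int(dist / (speed * dt)))                        # time steps to reach it
            for i in range(1, n + 1):
                trace.append((x + (tx - x) * i / n, y + (ty - y) * i / n))
                if len(trace) == steps:
                    break
            x, y = tx, ty
        return trace

    trace = random_waypoint_trace()   # list of (x, y) positions, one per time step

Statistical closeness to actual traces would then be judged on derived metrics such as contact times, link durations, and visit frequencies rather than on raw coordinates.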

3.6 IRC non-CCRI research

A primary goal of the IRC is to conduct fundamental cutting-edge research in network science pertaining to the areas of communication, social/cognitive, and information networks, and how they interact with each other to yield synergistic benefits. To meet this goal, the IRC has developed a
research plan consisting of both fundamental basic research (6.1) and applied research (6.2)
components. The 6.1 component consists of a diverse set of topics including development of
advanced mathematical models for network science; development of efficient techniques for
extracting knowledge about networks; and the characterization of the interdependencies of
military network components. The 6.2 component consists of several modeling, simulation, and
experiment design tasks that will be instrumental in the validation and verification of the various
6.1 research ideas not only in the IRC but also across the Consortium.

R1: Methods for Understanding Composite Networks The goal of this project is to work with
a variety of alternative approaches to understanding networks composed from communication,
social/cognitive and information networks and, as these efforts uncover insights, to provide those
insights to the EDIN project. A core question for this project is: Are integrated multi-genre
networks better understood and controlled using techniques outside classic structurally-focused
approaches?

Advanced Mathematical Models from Economics. Mathematical economics provides a well-developed mathematical theory to model the behavior of large systems of (approximately)
rational agents as they act to deploy limited resources for value creation. A fundamental
principle is that rational (i.e., utility maximizing) agents respond to incentives, and so
identifying the operable incentives is always a first step of analysis. Economics provides a
repertoire of mechanisms, such as call markets, competitive equilibrium, and auction
mechanisms, in support of effective coordinated control for a range of situations. In adopting an
economic viewpoint for the understanding and control of complex, heterogeneous networks, the
goal of a network is construed as that of promoting fast, high-quality decision-making. Networks
that support effective decentralized decision-making are high-utility networks. We will pursue
the task of economic modeling of integrated networks at two levels. First, we will use market
modeling to capture specific network resource allocation scenarios, and analyze the resulting
models as economic systems. Second, we will seek to effect useful coordination across and
within networks by inferring the utility of actors within the networks, identifying simple
parameterizations of the decision environment that facilitate automated control to improve
behavior, and developing incentive-compatible mechanisms to elicit additional information as
necessary from participants.
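As a hedged, minimal illustration of the mechanism-design repertoire mentioned above, the sketch below runs a single-item Vickrey (second-price) auction over a hypothetical contested network resource such as a transmission slot; because truthful bidding is a dominant strategy in this mechanism, reported bids can be read as the agents' utilities. The agent names and values are invented for the example.

    # Single-item second-price (Vickrey) auction -- illustrative only.
    def vickrey_auction(bids):
        """bids: dict mapping agent id -> reported value for the resource."""
        ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
        winner = ranked[0][0]
        price = ranked[1][1] if len(ranked) > 1 else 0.0   # winner pays the second-highest bid
        return winner, price

    winner, price = vickrey_auction({"uav_feed": 9.0, "voice_net": 6.5, "logistics": 4.0})
    print(winner, price)   # uav_feed wins the slot and pays 6.5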

Compositional Methods: One of the primary tasks for the IRC is to provide the understanding of
how global network properties or behaviors can be composed from local properties of
information, social/cognitive, and communications networks. The non-homogeneity of the different genres of networks involved calls for abstract mathematical modeling tools that are
capable of capturing the commonalities and the differences among various components and still
capable of providing means for deriving models of such complicated structures. We propose to
use category theory – a mathematical framework that has proved to be very efficient in capturing
representations of many, seemingly disparate, mathematical structures. This approach to
modeling relies on the colimit operator of category theory, which can be intuitively understood
as an extension of the shared union operator of set theory. This research will contribute to both
the modeling of composite networks and to the composite metrics on networked systems. We
plan to leverage this work in year 2 of the E2 project.
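For concreteness, the LaTeX fragment below sketches the "shared union" intuition in its simplest case: the pushout of two component networks G_1 and G_2 (say, a communication network and a social network) glued along a shared part S (say, the soldiers who carry the radios) via inclusion maps f_1 and f_2. The categorical machinery to be developed in this task is considerably richer than this illustration.

    % Pushout (a simple colimit) of two networks along a shared subnetwork -- illustrative only.
    % Requires amsmath for \xrightarrow.
    \[
    \begin{array}{ccc}
    S & \xrightarrow{\;f_1\;} & G_1 \\
    \downarrow \scriptstyle f_2 & & \downarrow \\
    G_2 & \longrightarrow & G_1 \sqcup_S G_2
    \end{array}
    \qquad
    G_1 \sqcup_S G_2 \;=\; \bigl(G_1 \sqcup G_2\bigr) \,/\, \{\, f_1(s) \sim f_2(s) : s \in S \,\}
    \]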

Efficient Knowledge Extraction from Integrated Networks. This task will develop theory,
methods, and tools for efficiently extracting knowledge from large networks of various types,
such as communication, social/cognitive, and information networks, and especially, integrated
compositions of the above. The problems addressed here are fundamental to our ability to move
from the plane of measured data and observed behavior to “knowledge”. A key goal here is to
measure and understand the fundamental relationships that define the networks and their
characteristics. Such knowledge can then be utilized not only for later analysis but also for
modeling, simulation, and experimental design in Project R3. In addition, we want to understand the
interplay between network structure and function, specifically, when function pertains to
information retrieval and information flow.

R2: Characterizing the Interdependencies Among Military Network Components This project is predicated upon the fact that the vision of NCW/NCO is largely driven by
“information”. Decision-makers and soldiers often search for and exchange large amounts of
information for accomplishing any mission. The path followed by the information is usually
dictated both by social network structures and by communication networks. Hence, this project
focuses on multiple aspects of information propagation across socio-cognitive networks. First,
we propose to extend the notion of Shannon capacity of a physical communication channel that
transports bits to a notion of “semantic information capacity” of an information communication
channel between humans. The information that flows through social networks cannot be viewed
just as bits; they are bits that encode specific meanings, and those meanings are only comprehensible in the presence of specific semantics. Information networks carry this encoded
meaning, but it is only useful when interpreted by humans, and that notion of interpretation (or
understanding) is not covered in Shannon’s work. Thus, there is a crucial link between social and
information networks that needs to be better explored to be able to develop a network science
that can explain these newer forms of networks.
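For reference, Shannon's channel capacity for a discrete memoryless channel is the first formula below; the second is offered only as a hypothetical placeholder for how a "semantic information capacity" might discount the same mutual information by an interpretability factor alpha determined by the semantics shared between sender and receiver. The precise definition is exactly the open question this task will pursue, not a result assumed here.

    % Shannon capacity (standard), and a purely hypothetical semantic discounting of it.
    \[
    C \;=\; \max_{p(x)} I(X;Y),
    \qquad
    C_{\mathrm{sem}} \;\approx\; \max_{p(x)} \, \alpha\, I(X;Y), \quad 0 \le \alpha \le 1.
    \]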

Then, we will explore how loss or disruption of communications and information networks will affect the function of social networks, and how to prevent the effects of this loss when networks that we must maintain would be impacted. We will focus on issues such as the characteristics of
information loss and error scenarios, intentional and inadvertent information loss and error, error
propagation through network topologies, estimation of information loss and error levels given
data, socio-cognitive network metrics that are robust in the face of information loss and error,
etc.



Finally, we must understand information from the point of view of its utility over time or in a
particular context. We will investigate how various forms of suboptimal information (non-
relevant, redundant, untrustworthy or corrupted information) returned by information sources as
a result of a query by an operator can impact the expected utility of the information for decision-
making purposes.

R3: Experimentation With Composite Networks A central goal of the NS CTA is to demonstrate the ability of network science to assess, understand, analyze, measure, and predict
the performance of combined social, cognitive, information and communications networks. This
project is a multi-year project employing 6.2 Applied Research funding with fundamental
support from a 6.1 task to evaluate cross genre, combined network research. It is a consumer of
much of the 6.1 research in the ARCs and the IRC. The project seeks to inject that research into
experimental environments to evaluate our ability to understand and measure the behavior of
combined networks. The project produces output both back to researchers in the centers and
forward to technology transition. Output to researchers is information about how well their work
performed in realistic emulated or simulated test environments. The researchers can use this
information as guidance in refining or redirecting their research. Work that proves particularly
promising when tested in this project can be recommended for technology transition to Army
consumers.

R4: Liaison In order to facilitate cross-center communication, collaboration, and coordination at a technical level, the IRC has charged specific researchers with assuring effective liaison with
the ARCs. This project provides dedicated funding for their work in enabling and promoting
these goals.

R5: Technical and Programmatic Leadership This project consolidates the consortium’s
technical and programmatic management.

Education Component Fundamental to the research process is the rapid dissemination of research results, the discussion within the community of these results, and the education of new
participants including new researchers. This Component will address these issues as two tasks:
research seminars and education. Each Center will appoint a member of a seminar coordination
committee (SCC). The SCC will establish an interdisciplinary seminar series for the Research
Centers and CCRIs (Trust and EDIN). Each seminar will be multicast, and the IRC will set up a
repository for archiving the seminars and making past seminars available within the CTA.

The IRC will also form an Educational Coordination Committee (ECC), with membership drawn
as appropriate from the academic members of the NS-CTA. Given that Network Science is a nascent discipline, the purpose of the ECC will be to promote the establishment of Network Science
educational programs of benefit to the NS-CTA client base. Initially, the ECC will formulate
and produce short courses, archived by the IRC, on NS topics. The ECC will also develop
workshops for participants in the NS-CTA. The ECC will compile and disseminate information
on existing degree programs of interest to Network Science. The ECC will discuss the potential
for establishing educational programs at multiple institutions, and ultimately publish one or more
model curricula for degree programs, particularly graduate degrees, in Network Science.



Shared facilities and research resources
The NS CTA requires – and is creating – an extensive infrastructure of resources and expertise in
performing research experimentation and validation. The NS CTA Facility provides one central
location for such Alliance-wide capabilities; in addition, the Alliance has a highly diverse and
rich ensemble of research experimentation resources among its many organizations: testbeds,
emulation and simulation tools, data sets, and analysis and visualization tools, to name a few
representative classes of resource.

In order for the shared facilities and research experimentation resources to be ready and actually
usable as 6.1 research needs emerge and as 6.1 research results become ready to transition into
6.2 research, we must start in year one to build the tools, the systems, the expertise, and the
shared culture to allow effective creation and continuing extension of shared facilities and
resources. This responsibility is led by the IRC. Basic scientific questions must be answered by
year-1 pure research in order to enable effective research experimentation in the network science
domain of intensely interacting interdisciplinary networks of networks; Project R3 concurrently
develops the needed applied research insights and results; and each project informs and advances
the other. We will use these projects not only to enable shared use of the Facility resources but to
enable shared use of an increasing pool of distributed resources throughout the course of the NS
CTA program. As Alliance network science research advances, we will use the insights and
capabilities created by the projects to identify highly valuable resources and facilities throughout
the Alliance that can technically and pragmatically be united with the growing distributed facility
base to allow more widespread use by Alliance researchers. It is the emerging experience of all
Alliance researchers that must drive the selection of shared research resources and the
investment of effort required to make them widely accessible and easily usable to the Alliance.

3.7 INARC non-CCRI research

The Information Network Academic Research Center (INARC) is aimed at (1) investigating the
general principles, methodologies, algorithms, and implementations of information networks,
and the ways in which information networks work together with communication networks and social/cognitive networks, and (2) developing the information network technologies required to
improve the capabilities of the US Army and provide users with reliable and actionable
intelligence across the full spectrum of Network Centric Operations.

An information network is a logical network of data, information, and knowledge objects that are
acquired and extracted from disparate sources, such as geographical maps, satellite images,
sensors, text, audio, and video, through devices ranging from hand-held GPS to high-
performance computers. Moreover, for the systematic development of information network technologies, it is essential to deal with large-scale information networks whose nodes and/or edges are of heterogeneous types, that link multi-typed objects, and that are highly distributed, dynamic, volatile, and laden with uncertain information.

INARC will systematically investigate the foundations, methodologies, algorithms, and implementations needed for fusing multiple types of data and information, constructing effective, scalable, hierarchical, and most importantly, dynamic and resilient information networks, discovering patterns and knowledge from such networks, and applying network science technologies for military applications. In particular, the center will be working on the following five research projects:

• Project E (EDIN-CCRI): Foundation of Evolving, Dynamic Information Networks
• Project T (Trust-CCRI): Foundation of Trusted Information Networks
• Project I1: Distributed and Real-Time Data Integration and Information Fusion
• Project I2: Scalable, Human-Centric Information Network Systems
• Project I3: Knowledge Discovery in Information Networks

Among the five projects, the first two, E and T, are cross-center research initiative (CCRI) projects. For these two projects, we will work closely with the three other research centers, CNARC, SCNARC, and IRC, to contribute to the development of cross-center technologies; these two CCRI projects are therefore detailed in their corresponding CCRI project descriptions. This portion of the INARC IPP is dedicated to the remaining three INARC-centered projects. Moreover, INARC will, together with the other centers, actively contribute to a comprehensive CTA-wide education plan.

Here we provide a general overview of the three dedicated INARC research projects, and outline
for every project the major research problems, the tasks to be solved, the organization, and the
plan for collaboration with other centers.

I1: Distributed and Real-Time Data Integration and Information Fusion This project aims
to answer two fundamental questions: (i) “how to integrate and fuse heterogeneous data (sensor,
visual, and textual) that may be delivered over resource-constrained communication networks to
infer and organize implicit relationships to enable comprehensive exploitation of information,
and how to improve fusion by incorporating human feedback in the process?” and (ii) “how to
model uncertainty resulting from resource-constrained communication, data integration, and
information fusion to enable assessment of the quality and value of information?” The focus of
this project is on large scale information extraction and fusion in the context of a large linked
information network. The goals of the project include both the derivation of such logical
linkages as well as their use during the fusion process. This project will be co-led by Charu
Aggarwal (IBM) and Tarek Abdelzaher (UIUC).

The project consists of three tasks. (1) Signal Data Integration (led by Tarek Abdelzaher),
which is to answer the key research question: “How to integrate and fuse sensor feeds (e.g.,
scalar, streaming data sources) into a semantic information network so as to a) select the most
appropriate sensors and sensor modalities for maximizing the global information value of the
network given current resource constraints; b) create effective summarization of sensor data that
accounts for underlying uncertainty in the data, for example that imparted by tactical networks and
sources of questionable provenance?” (2) Human and Visual Data Integration (Led by Thomas
Huang), which is to answer the key question: “How to integrate and fuse human generated and
visual data (e.g., unstructured, multi-dimensional data sources) so that a) network analysis is
enriched by virtual linkages embedded in different kinds of data leading to information gain; b)
semi-automated analysis of activities in social networks (e.g., occupancy levels in a building,
distribution of human movement, etc.) can be performed through powerful data representation
models. How can human and visual data be used to uncover the structures of social networks?”

and (3) Modeling Uncertainty in Heterogeneous Information Sources (led by Heng Ji), which is to answer the key questions: “(i) How to model the uncertainty that results from fusing massive volumes of data over imperfect communication networks and from data sources of questionable provenance? and (ii) how to combine information, consistency verification and maintenance, and information propagation from multiple sources by using extended joint inference that includes end-user feedback?”
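A hedged, minimal illustration of the simplest case this task generalizes: fusing independent noisy estimates of a single quantity (say, a target coordinate reported by two sensors of different quality) by inverse-variance weighting. The numbers are invented, and the project's actual uncertainty models (provenance, network-induced loss, joint inference with user feedback) go far beyond this.

    # Inverse-variance (Gaussian) fusion of independent estimates -- illustrative only.
    def fuse(estimates):
        """estimates: list of (value, variance) pairs from independent sources."""
        weights = [1.0 / var for _, var in estimates]
        fused_value = sum(w * v for (v, _), w in zip(estimates, weights)) / sum(weights)
        fused_variance = 1.0 / sum(weights)
        return fused_value, fused_variance

    # A well-calibrated sensor and a degraded (lossy-link) sensor
    print(fuse([(10.2, 0.5), (11.0, 4.0)]))   # fused value ~10.29, variance ~0.44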

I2: Scalable, Human-Centric Information Network Systems

This project is concerned with the following research problems: (1) “how to organize and
manage distributed and volatile information networks?” and (2) “how to analyze and visualize
information networks to provide end-users information tailored to their context?” A
sophisticated information network system should present a human-centric, simple and intuitive
interface that automatically scales according to the context, information needs, and cognitive
state of users, and should maintain its integrity under uncertainties, physical constraints (communication
capacity, power limitation, device computation capability, etc.), and the evolving data
underneath. This project will be led by Xifeng Yan (UCSB).

The project consists of three tasks: (1) information network organization and management
(led by Xifeng Yan and Ambuj Singh), which is to answer the key questions: (i) “how to
organize provenance metadata to enable robust, multi-resolution computation of QoI, and (ii)
how to design and implement graph query languages to enable flexible network information
access?” (2) information network online analytical processing (led by Xifeng Yan), which is
to answer the key questions: “(i) how to perform multi-dimensional analytics that allows a non-
expert to explore information networks in real time, and (ii) what are new fundamental
operators that support network-wise graph modeling and analytics?” and (3) information
network visualization (led by Tobias Höllerer), which is to answer the key question: “what is essential in situation-aware information network visualization?”
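To make the flavor of flexible network information access concrete, the sketch below answers a toy query ("reports less than six hours old within two hops of unit X") directly over an attributed graph. The node names, attributes, and query helper are all hypothetical; the project's actual query language and analytical operators remain open research questions.

    # Toy attributed-graph neighborhood query -- illustrative only.
    import networkx as nx

    def neighborhood_query(G, start, max_hops, predicate):
        """Nodes within max_hops of start whose attribute dict satisfies predicate."""
        hops = nx.single_source_shortest_path_length(G, start, cutoff=max_hops)
        return [n for n in hops if n != start and predicate(G.nodes[n])]

    G = nx.Graph()
    G.add_node("unit-X", type="unit")
    G.add_node("report-1", type="report", age_hours=2)
    G.add_node("report-2", type="report", age_hours=9)
    G.add_edge("unit-X", "report-1")
    G.add_edge("report-1", "report-2")

    print(neighborhood_query(
        G, "unit-X", 2,
        lambda a: a.get("type") == "report" and a.get("age_hours", 1e9) < 6))   # ['report-1']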

I3: Knowledge Discovery in Information Networks

This project considers a key research problem: “how to develop efficient and effective knowledge
discovery mechanisms in distributed and volatile information networks?” Knowledge discovery
in information networks involves the development of scalable and effective algorithms to
uncover patterns, correlations, clusters, outliers, rankings, evolutions, and abnormal relationships
or sub-networks in information networks. Knowledge discovery in information networks is a new
research frontier, and the Army applications pose many new challenges and call for in-depth
research for effective knowledge discovery in distributed and volatile information networks. The
project is led by Jiawei Han (UIUC).

The project consists of three tasks: (1) new principles for scalable mining of dynamic,
heterogeneous information networks (led by Jiawei Han), which is to answer the key question:
“what are the new principles and methods for mining distributed, incomplete, dynamic, and
heterogeneous information networks to satisfy the end-user needs?” (2) real-time methods for
spatiotemporal context analysis (co-led by Spiros Papadimitriou and Jiawei Han), which is to
answer the question: “how to mine spatiotemporal patterns in information networks in a distributed, mobile environment?” and (3) text and unstructured data mining in information
network analysis (led by Dan Roth), which is to answer the question: “how to develop effective
mechanisms for mining knowledge from text and unstructured data in information networks in a noisy, volatile environment?”

Since information networks are intertwined with communication networks and social/cognitive networks in many respects, concrete collaboration plans with the three other centers have been laid out in each project’s IPP, along with plans for the exploration of military applications.

3.8 SCNARC non-CCRI research

3.8.1 Research Challenges:


In large organizations such as the U.S. Army, social networks involve interplay between a formal
network imposed by the hierarchy of the organization and informal ones based on operational
requirements, historical interactions, and personal ties of the personnel. Traditionally it has been
difficult to extract data that can document the interplay between these informal and formal
networks. We address this challenge by planning to utilize data collection developed by the IBM
team of the Center. The data covers employee interactions, communications, activity, and
performance across the IBM Corp. This data will enable us to conduct research to further our
understanding of the fundamental aspects of human communication within an organization and
the impact of social and cognitive networks on issues ranging from team performance to the
emergence of groups and communities.

Currently, and in the foreseeable future, the U.S. Army is or will likely be operating in coalition with allied armies and deeply entangled within the foreign societies in which its missions are conducted. Social networks of the allies and the involved societies invariably include groups that
are hostile or directly opposed to U.S. Army goals and missions. Such groups are embedded in
larger social networks, often attempting to remain hidden to conduct covert, subversive
operations. The challenging research issues of studying such adversary social networks include
their discovery, the construction of tools for monitoring their activities, composition, and
hierarchy, as well as understanding their dynamics and evolution. We will address these challenges using statistical methods for analyzing large social networks.

Ultimately, the benefits and impact of networks are limited by humans’ capability to understand and act upon the information that networks are capable of providing. Hence, human cognition is an essential component of understanding how networks impact people. Important challenges in studying human cognition in relation to network-centric interactions include finding how limits on cognitive resources or cognitive processing influence social (or asocial) behavior, and what demands social behaviors make on cognitive resources and processing that may limit basic information processing mechanisms such as encoding, memory, and attention. We would also like
to understand what social behaviors (e.g., trust) are most important to or most influenced by Net-
Centric mediated human communications. In terms of performance, the central challenge is to
understand how the design of Net-Centric technologies or the form and format of Net-Centric
information interact with the human cognitive processes responsible for heuristics and biases in
human performance. Finally, there is a challenge to create predictive computational cognitive

models of individuals interacting via Net-Centric technologies (i.e., interactive cognitive agents).
Those are the issues that we will address in our research on cognitive aspects of social networks.
Fundamental research challenges of dynamic and evolving integrated networks are the subject of the CCRI EDIN, to which SCNARC researchers bring an important social, cognitive, and network science perspective. Since these challenges and the associated research are described elsewhere, no further discussion of these issues is given here.

Central to the operation and efficiency of social networks is the level of trust among interacting
agents. The challenges of studying trust in networked systems are the focus of the CCRI research
in this area to which the SCNARC researchers will fundamentally contribute, as ultimately trust
is a social issue with technology and information playing an important but supportive role. The
corresponding challenges are discussed in a separate Trust section of the IPP, and so are not discussed here.

3.8.2 Research Approach


The research effort is organized into the following four projects.

S1: Networks in Organization Project S1 focuses on analyzing and understanding the fundamental principles guiding the formation and evolution of large organizational networks and their impact on the performance of the teams embedded in them. This project consists of three tasks, each targeting a major aspect of social network research: the capture of networks, the impact of networks, and the understanding of networks.

The first task, Infrastructure Challenges for Large-Scale Network Discovery and Processing, will
study the infrastructure challenges in gathering and handling large-scale heterogeneous streams for social network research, in the context of information and communication networks. We will conduct system-level research to consider how to incorporate real-time
network processing requirements into the existing SmallBlue socio-info network infrastructure.
Given that informal social network data reside intrinsically in different data sources, informal
networks can usually only be inferred from sampling larger networks. Since partially observed
data is a norm, we will derive mathematical theories to investigate the robustness of graph
sampling and its implications under various conditions. We will investigate what types of
sampling strategies are required to obtain a good estimation on the entire network. We will also
investigate analytic methods to conduct network analysis on only partially observed data.
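As a hedged illustration of why sampling strategy matters for partially observed networks, the sketch below estimates one global statistic (mean degree) of a synthetic scale-free graph under uniform node sampling and under snowball (breadth-first) sampling, which is biased toward hubs. The graph model, sizes, and seeds are arbitrary stand-ins for real organizational data, not choices made by this task.

    # Comparing two sampling strategies on a synthetic network -- illustrative only.
    import random
    import networkx as nx

    G = nx.barabasi_albert_graph(2000, 3, seed=1)
    true_mean = sum(d for _, d in G.degree()) / G.number_of_nodes()

    # Uniform node sampling
    sample = random.sample(list(G.nodes()), 200)
    uniform_est = sum(G.degree(n) for n in sample) / len(sample)

    # Snowball (breadth-first) sampling from one random seed node
    seed_node = random.choice(list(G.nodes()))
    snowball = list(nx.bfs_tree(G, seed_node, depth_limit=2).nodes())[:200]
    snowball_est = sum(G.degree(n) for n in snowball) / len(snowball)

    print(true_mean, uniform_est, snowball_est)   # the snowball estimate typically overshoots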

Task 2 of Project S1, Impact Analysis of Informal Organization Networks, will analyze the impacts of informal social networks in an organization. We want to learn what informal networks affect the performance of people, and how. We shall model, measure, and quantify the impact of dynamic informal organizational networks on the productivity and performance of individuals and teams; develop and apply methods to identify causal relationships between dynamic networks and productivity, performance, and value; model and measure peer influence in social networks by statistically distinguishing influence effects from homophily and confounding factors; and examine how individuals change their use of social networks under duress.



Task S1.3, Multi-Channel Networks of People, will investigate the multi-channel networks between people. With IBM’s unique large-scale info-social network system, SmallBlue, we are able to capture multiple facets of people’s relationships via means such as email, instant messaging, and teleconferencing. Based on this data, we will explore whether channel capacity and coding theories from communications and information theory can be extended to the human domain to model the variation and distribution of relationships across different channels.

S2: Adversary Social Networks: Detection and Evolution Project S2 focuses on adversary
networks. The broad research questions which we address in this project include (i) identification
of communities in a dynamic social network, especially hidden and informal groups, based on
measurable interactions between the members, but without looking at details of the interactions,
(ii) uncovering relations between communities in terms of membership, trust and opposition or
support, (iii) observing evolution and the stable cores of communities, especially anomalous and
adversary communities or groups, and their relationships, (iv) understanding how information
flows within such communities and between communities.

To address the key research questions of this project, we defined two tasks. In the first task, S2.1
Detection of Hidden Communities and their Structures, we use interaction data over time to build
a picture of community structure and community evolution, including information pathways and
inter-community relationships. This is an appropriate first step in understanding the core of
social networks. In the second task, S2.2 Information Flow via Trusted Links and Communities,
we build agent-based models to study how information pathways are affected by the varying
degrees of trust between individuals and communities in heterogeneous networks which contain
adversary (non-trusted) as well as non-adversary (trusted) networks.
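For orientation only, the sketch below runs a standard modularity-based community detection pass over a small interaction graph; it is the kind of structural baseline against which this project's methods for hidden, evolving, and adversary communities would be compared, and the example graph is a stand-in for real interaction data.

    # Baseline modularity-based community detection -- illustrative only.
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities

    G = nx.karate_club_graph()                 # placeholder for observed interaction data
    for i, community in enumerate(greedy_modularity_communities(G)):
        print(f"community {i}: {sorted(community)}")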

S3: The Cognitive Social Science of Net-Centric Interactions This project will bring the
computational modeling techniques of cognitive science together with the tools and techniques
of cognitive neuroscience to ask how the design of the technology (human-technology
interaction), the form and format of information (human-information interaction), or features of
communication (human-human interaction) shape the success of net-centric interactions. It
includes three tasks.

The first task, S3.1: The Cognitive Social Science of Human-Human Interactions, investigates cognitive mechanisms that influence human interactions. Our initial topic will temporarily combine this effort with the CCRI on trust to investigate how cognitive mechanisms are affected by trust and how human evaluations of trust influence subsequent cognitive processing of information. A year 1 focus of Project S3 will be to examine the effect of trust on cognitive processing and variations in human trust across human-human versus human-agent interactions.
Specifically, we hypothesize that differences in trust are signaled by differences in cognitive
brain mechanisms and that these differences can be detected by event-related brain potential
(ERP) measures and related to established cognitive science constructs, which in turn can be
incorporated as improvements in the ACT-R cognitive architecture. A key element in the study
will be the analysis of cognitive brain data collected from humans as they receive information
from the interactive cognitive agent or other humans.



The second task, S3.2: Human-Technology and Human-Information Interactions, will in the first year develop a Net-Centric Simulated Task Environment to collect data across multiple locations on Cognitive Social Science constructs of interest. The focus will be on technology-mediated human-human interactions, with an emphasis on interaction design and information form, format, and order. These data will be collected and analyzed across at least four sites: Rensselaer, CUNY, ARL APG, and PARC.

3.9 CNARC non-CCRI research

The overarching goal of CNARC research is to understand and characterize complex communications networks, such as those used for network-centric warfare and operations, so that
their behavior may be accurately predicted and they may be configured for optimal information
sharing and gathering. The objective of such a network is to deliver the highest quality of
information to support decision making and to provide comprehensive situational awareness,
increased mission tempo and overall supremacy in managing resources engaged in a mission.
Network science must embody the vision of a network as an information source delivering
quality information to support decision making. Therefore, in this center we aim to characterize
and optimize network behavior in a way that maximizes the useful information delivered to its
users.

To this end we define a new currency by which we evaluate a network: its operational
information content capacity (OICC). We believe that this approach will truly capture the value
of a network and allow a science to be developed that fundamentally characterizes the volume of
useful information that a network can transfer to a set of users. Our goal is to understand and
control network behavior so that the operational information content capacity of a network may
be increased by an order of magnitude over what is possible today.

The key challenges to achieving this involve creating comprehensive models for data delivery
capabilities that embody OICC. A key inhibitor to reaching this goal is the inability to model
complex ad hoc networks with a sufficient level of fidelity and at sufficient scale to: (1) develop
the fundamental knowledge to enable the a priori prediction of the behaviors of diverse and
dynamic communications networks; (2) understand how communication networks impact or are
impacted by information and social networks; and (3) understand trade-offs and impact of
various protocols and techniques under a wide variety of dynamic adverse conditions. These
models must accurately capture characteristics (such as heterogeneity, uncertainty, mobility),
interactions (such as between communications, information, and social networks), constraints
(such as processing, bandwidth, protocol limitations), and dynamics of these networks.

To develop these comprehensive models, one must consider the information needs of tactical
applications that share a network, and must cast the behavior of the network in light of these
needs. As the network delivers data to applications, the data is transformed into information.
Different applications require different types of sources and different network behavior in terms
of data delivery characteristics and security to distill useful information from the data received.
The data delivery characteristics and security properties of the network may vary depending on
the location of the source of data. The goal of the network is to deliver the data from which the highest quality of information (QoI) may be derived, as perceived by the
application. QoI is at least a function of the source information (e.g., video stream or acoustic
sensor output or message), the ability of the network to deliver this information (e.g., delay may
be important if the quality of information degrades quickly with time for an application), and the
security services that the network provides (e.g., authenticated sources, authenticated routes).
The QoI can thus initially be represented in compact form as QoI = f(I, D, S), where I, D, and S
represent a characterization of the source information, the delivery capabilities of the network,
and security provided by the network, respectively.
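To make the notation concrete, the toy instantiation below composes a source-quality term, an exponential delay decay, and a security factor multiplicatively. Every functional form and constant here is hypothetical: determining the actual structure of f is a research output of CNARC, not an input assumed in this plan.

    # Toy QoI = f(I, D, S) instantiation -- illustrative only.
    import math

    def qoi(source_quality, delay_s, delay_sensitivity, security_level):
        """source_quality and security_level in [0, 1]; value decays with delivery delay."""
        delivery_factor = math.exp(-delay_sensitivity * delay_s)
        return source_quality * delivery_factor * security_level

    # A delay-sensitive video feed vs. a delay-tolerant logistics report
    print(qoi(0.9, delay_s=2.0, delay_sensitivity=0.5, security_level=0.8))    # ~0.26
    print(qoi(0.7, delay_s=2.0, delay_sensitivity=0.01, security_level=0.8))   # ~0.55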

We propose to achieve our high level research objective by:

Developing comprehensive models for data delivery capabilities that support QoI.
First developing comprehensive models defining OICC and QoI and the properties and
constraints of tactical networks that impact them. Ultimately models developed to
characterize the network behavior must account for the interactions of the physical
layer(s) present in the network with the higher layer protocols to accurately describe the
composite behavior seen by the applications in terms of their ability to gather and share
information. The focus of the network models is on the data delivery characteristics of
the network and the security properties. We recognize that achieving security properties
impacts network delivery capabilities and we will formally capture this interaction in
terms of QoI across applications. We will also leverage the existence of underlying
social networks and information networks to determine the fundamental limits of the
communications network. Social and information networks may allow a network more
degrees of freedom in terms of what type of information must be transferred, from where
information may be retrieved, and what security properties must be present for different
types of information and may also allow the communications network to better
accommodate mobility and reduce uncertainty.

Focusing on emerging communication networking paradigms that have promise to significantly increase OICC. Then, using the results from these models, we refine our
models to include networking paradigms that will improve OICC. We will leverage the
intelligence and storage capabilities of nodes, their ability to learn and collaborate and
mobility.

Identifying fundamental limitations in protocol structures. As we better understand the impact of network paradigms on OICC, we will analyze protocol structures to
determine if they prevent networks from reaching their optimal QoI, and if so, explore
new protocol structures that alleviate these bottlenecks.

Our research program is unique in the following ways:

We define a new objective for network science: the modeling and optimization of a
network to deliver useful information to applications.

We develop comprehensive models for the data delivery capabilities of networks that
differ from ongoing work in that they explicitly target the impact on QoI. These models

include fundamental limits of realistic mobile tactical networks, and advanced
communication technologies including concurrent communications, collaborative
networking, and in-network storage and processing. We explicitly include mobility
models, and account for heterogeneity in terms of network node capabilities,
communication technologies, and applications.

We incorporate security as a first class citizen in network modeling and account for the
interactions between security and data delivery in terms of QoI.

We fully leverage the existence of social and information networks so that the
communication network can optimize the delivery of information (or sets of equivalent
information) so that the operational information content capacity of the network is
optimized.

We iterate between extensive experimentation and modeling to capture unforeseen interactions between components of the network, limitations of implementation, and other effects derived from the geometry and size of networks representative of battalion-size deployments (e.g., 100-1000 nodes). We have access to several state-of-the-art testbeds for this program.

We analyze protocol structures to determine fundamental limitations that prevent networks from reaching their full QoI potential, and design new protocol structures to
alleviate these bottlenecks.

The CNARC will execute three projects:

C1: Modeling Data Delivery in Dynamic, Heterogeneous, Mobile Networks – This project
focuses on the behavior of OICC and QoI and the factors that impact them and will seek to
answer the research questions: What are the fundamental limits of Operational Information
Content Capacity in dynamic, heterogeneous mobile networks jointly considering data delivery
and security properties? What are the relative impacts of properties and constraints of tactical
MANETs on OICC, and why?

C2: Characterizing the Increase of QoI due to Networking Paradigms – This project focuses
on the impact of select networking paradigms in increasing QoI and OICC and will seek to
answer the research questions: What are the networking paradigms that will most improve
OICC? What are the gains achieved by these paradigms and why?

C3: Achieving QoI Optimal Networking – This project focuses on the structure of protocols
that may limit the QoI achieved in practical networks and will seek to answer the research
questions: What are the structural limitations that protocols impose on achieving QoI? How can
these limitations be removed?

In the first year the largest effort will be on C1 with a significant effort on C2. Project C3 will be
deferred until the second year of the program when we have an understanding of the factors that
impact OICC and QoI and the networking paradigms that hold the most promise.



4 Cross-Cutting Research Issue: Trust in Distributed Decision Making

Coordinator: Sibel Adali, RPI (SCNARC)
Email: sibel@cs.rpi.edu, Phone: 518-276-8047
Government Lead: Gerald (Jerry) Powell, ARL
Email: gerald.m.powell@us.army.mil, Phone: 732-532-0028

Project Leads / Lead Collaborators
Project T1: D. Agrawal, IBM (INARC); G. Powell, ARL; P. Mohapatra, UCD (CNARC)
Project T2: S. Adali, RPI (SCNARC)
Project T3: K. Haigh, BBN (IRC)

Table of Contents
4 Cross-Cutting Research Issue: Trust in Distributed Decision Making ...................... 4-1
4.1 Overview .............................................................................................................. 4-3
4.2 Motivation ............................................................................................................ 4-4
4.2.1 Challenges of Network-Centric Operations ............................................................. 4-4
4.2.2 Example Military Scenarios ..................................................................................... 4-5
4.2.3 Impact on Network Science...................................................................................... 4-5
4.3 Background on Trust ............................................................................................ 4-6
4.4 Key Research Questions ...................................................................................... 4-8
4.5 Technical Approach ............................................................................................. 4-9
4.6 Project T1: Trust models and metrics: computation, inference and aggregation4-11
4.6.1 Project Overview .................................................................................................... 4-12
4.6.2 Project Motivation .................................................................................................. 4-13
4.6.3 Key Project Research Questions ............................................................................ 4-13
4.6.4 Initial Hypotheses ................................................................................................... 4-13
4.6.5 Technical Approach ................................................................................................ 4-14
4.6.6 Task T1.1: Unified Models and Metrics of Trust (J. Opper, BBN (IRC); S. Adali, RPI
(SCNARC); D. Agrawal, IBM (INARC); J. Golbeck, UMD (SCNARC); K. Haigh, BBN
(IRC); K. Levitt, UCDavis (CNARC); S. Parsons, CUNY (SCNARC); P. Pirolli, PARC
(INARC); D. Roth, UIUC (INARC); M. Singh, NCSU (CNARC); Collaborators: S. Aral,
MIT (SCNARC); J. Cho, ARL; P. Mohapatra, UCD (CNARC); C. Partridge, BBN (IRC); M.
Srivatsa, IBM (INARC); Red team reviewers).................................................................... 4-15
4.6.7 Task T1.2: Computing Trust Factors (M. Srivatsa, IBM (INARC); R. Govindan,
USC (CNARC); T. Abdelzaher, UIUC (INARC); S. Adali, RPI (SCNARC); D. Agrawal,
IBM (INARC); T. Brown, CUNY (INARC); J. Han, UIUC (INARC); H. Ji, CUNY

(INARC); T. Kementsietsidis, IBM (INARC); C. Lin, IBM (SCNARC); M. Magdon-Ismail,
RPI (SCNARC); P. Mohapatra, UCD (CNARC); D. Roth, UIUC (INARC); M. Singh,
NCSU (CNARC); B. Uzzi, NW (SCNARC); Z. Wen, IBM (SCNARC); Collaborators: M.
Goldberg, RPI (SCNARC); S. Krishnamurthy, UCR (CNARC); S. Parsons, CUNY (INARC);
T. La Porta, PSU (CNARC); W. Wallace, RPI (SCNARC)) ................................................ 4-26
4.6.8 Task T1.3: Cognitive Models of Trust in Human-Information, Human-Human,
Human-Agent Interactions (P. Pirolli, PARC (INARC); W. Gray, RPI (SCNARC); T.
Hollerer, UCSB (INARC); M. Schoelles, RPI (SCNARC); X. Yan, UCSB (INARC);
Collaborators: S. Adali, RPI (SCNARC); G. Powell, ARL)................................................ 4-41
4.6.9 Linkages with Other Projects ................................................................................. 4-48
4.6.10 Collaborations and Staff Rotations ....................................................................... 4-50
4.6.11 Relation to DoD and Industry Research ............................................................... 4-50
4.6.12 Project Research Milestones................................................................................. 4-50
4.6.13 Project Budget by Organization ........................................................................... 4-53
4.7 Project T2: Understanding the interactions between network characteristics and trust ........ 4-53
4.7.1 Project Overview .................................................................................................... 4-54
4.7.2 Project Motivation .................................................................................................. 4-55
4.7.3 Key Research Questions ......................................................................................... 4-55
4.7.4 Initial Hypotheses ................................................................................................... 4-56
4.7.5 Technical Approach ................................................................................................ 4-57
4.7.6 Task T2.1: Interaction of trust with the network under models of trust as a risk
management mechanism (C. Lim, RPI (SCNARC); A. Goel, Stanford (CNARC); R.
Govindan, USC (CNARC); W. Wallace, RPI (SCNARC); Collaborators: S. Adali, RPI
(SCNARC); G. Korniss, RPI (SCNARC); D. Parkes, Harvard (IRC); S. Parsons, CUNY
(SCNARC); M. Srivatsa, IBM (INARC); M. Wellman, UMich (IRC)) ............................... 4-57
4.7.7 Task T2.2: Network Behavior Based Indicators of Trust (S. Adali, RPI (SCNARC); P.
Mohapatra, UCD (CNARC); N. Chawla, ND (SCNARC); K. Haigh, BBN (IRC); M.
Goldberg, RPI (SCNARC); D. Hachen, ND (SCNARC); K. Levitt, UC Davis (CNARC); O.
Lizardo, ND (SCNARC); M. Magdon-Ismail, RPI (SCNARC); J. Opper, BBN (IRC); Z.
Torackai, ND (SCNARC); W. Wallace, RPI (SCNARC); F. Wu, UCD (CNARC); M.
Faloutsos, UC Riverside (IRC); Collaborators: J. Garcia-Luna-Aceves, UCSC (CNARC) R.
Govindan, USC (CNARC), S. Krishnamurthy, UCR (CNARC), B. Szymanski, RPI
(SCNARC), B. Uzzi, NW (SCNARC)) ................................................................................. 4-66
4.7.8 Linkages with Other Projects ................................................................................. 4-75
4.7.9 Collaborations and Staff Rotations ......................................................................... 4-76
4.7.10 Relation to DoD and Industry Research ............................................................... 4-76
4.7.11 Project Research Milestones ................................................................................. 4-76
4.7.12 Project Budget by Organization ........................................................................... 4-77
4.8 Project T3: Fundamental paradigms for enhancing trust .................................. 4-78
4.8.1 Project Overview .................................................................................................... 4-79
4.8.2 Project Motivation .................................................................................................. 4-79
4.8.3 Key Research Questions ......................................................................................... 4-80
4.8.4 Initial Hypotheses ................................................................................................... 4-80
4.8.5 Technical Approach ................................................................................................ 4-80



4.8.6 Task T3.1: Trust Establishment via Distributed Oracles (K. Haigh, BBN (IRC); C.
Cotton, UDel (IRC); M. Faloutsos, UCR (IRC); A. Iyengar, IBM (INARC); A.
Kementsietsidis, IBM (INARC); S. Krishnamurthy, UCR (CNARC); C. Lin, IBM
(SCNARC); J. Opper, BBN (IRC); Z. Wen, IBM (SCNARC); F. Wu, UCD (CNARC); S.
Zhu, PSU (CNARC); Collaborators: N. Ivanic, ARL; M. Srivatsa, IBM (INARC)) .......... 4-81
4.8.7 Linkages with Other Projects ................................................................................. 4-89
4.8.8 Collaborations and Staff Rotations ......................................................................... 4-89
4.8.9 Relation to DoD and Industry Research ................................................................. 4-89
4.8.10 Project Research Milestones................................................................................. 4-89
4.8.11 Project Budget by Organization............................................................................ 4-90

4.1 Overview

Trust is a relationship involving two entities, a trustor and a trustee, in a specific context under conditions of uncertainty and vulnerability. It is universally agreed that trust is a subjective matter and that it involves expectations of future outcomes. In a trust relationship, the trustor encounters uncertainty from a lack of information or an inability to verify the integrity, competence, predictability, and other characteristics of the trustee, and is vulnerable to suffering a loss if expectations of future outcomes turn out to be incorrect. Trust allows the trustor to take actions under uncertainty and from a position of vulnerability by depending on the trustee. Trust is not a singular or simple concept (Taylor, 1989): it is broader than the concepts of cooperation and confidence (Mayer, Davis & Schoorman, 1995) but narrower than the concepts of love and liking, both of which are very broad and difficult-to-define terms (Brehm, 1992). Trustworthiness is a concept closely related to trust, based on the expected outcomes that the trust relation depends on. The trustor expects the trustee to act in a certain way, and if the trustee does so, it is considered trustworthy.
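Purely as an illustration of how this expectation-of-outcomes view can be made quantitative, and not as a model adopted by this CCRI, the sketch below computes the widely used Beta-reputation estimate of trustworthiness from counts of positive and negative past interactions in a single context, together with a simple uncertainty measure that shrinks as evidence accumulates.

    # Beta-reputation trustworthiness estimate -- illustrative only.
    def beta_trust(positive, negative):
        """Expected trustworthiness and remaining uncertainty, assuming a uniform prior."""
        expected = (positive + 1) / (positive + negative + 2)
        uncertainty = 2.0 / (positive + negative + 2)    # probability mass left on the prior
        return expected, uncertainty

    print(beta_trust(8, 1))   # (~0.82, ~0.18): mostly positive history, fairly low uncertainty
    print(beta_trust(0, 0))   # (0.5, 1.0): no evidence yet, maximal uncertainty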
The goal of the Trust CCRI is to enhance the distributed decision making capabilities of the Army in the context of Network-Centric Operations (NCO), in particular for IW and COIN, by understanding the role trust plays in composite networks: large systems with complex interactions between communication, information, and social/cognitive networks. Towards that goal, we structure the research program in three projects. The first project will focus on models and metrics for trust; methods for computing factors related to trust; and methods for aggregating multiple factors, models, and metrics into a uniform trust framework. The second
project will focus on how the trust relationships mediate the different interactions between
networks: how the network characteristics influence factors related to trust and how trust
modulates various network characteristics. The third project will examine paradigms for
establishment of trust and propagation of trust related information in the composite network.
Before going into the details of these projects, we will briefly describe why trust-based decision
making is crucial to NCO and which technical barriers need to be overcome to effectively use
trust in distributed decision making. This will be followed by a background review of prior work
in trust and detailed project descriptions.



4.2 Motivation

4.2.1 Challenges of Network-Centric Operations


A basic tenet of Network-Centric Operations (NCO) is that the strategic design and use of
networks will improve information sharing and collaboration resulting in shared situational
awareness and actionable intelligence – all of which will dramatically increase mission
effectiveness. These operations require interaction with a large number of interconnected
elements – sensors connected with a communication network; information available from many
different sources in an information network: known people, anonymous sources, automated
programs for mining information; and people connected to each other through different and
complex interpersonal relationships in a social/cognitive network. All these networks exist and
evolve simultaneously in form of a composite network. The composite network may contain
adversaries who lie, nodes that fail, and information which may be inconsistent. Network-centric
operations will require soldiers to actively interact with people who are not necessarily from their
own organization, monitor the progress of mission based on feedback obtained from a distributed
network and act quickly to important events – all of which produce conditions of uncertainty and
vulnerability. Under such conditions, trust plays a crucial role in decision making and it is
critical that the interfaces to the network facilitate access to trusted information and people and
help foster development of trust. Equally critical is the ability to adapt to changes and disruptions
in the network and to compensate for actions and events that may cause trust to diminish.
As opposed to the traditional settings of electronic marketplaces or other closed or limited
settings, the trust relevant for NCO must be derived in a distributed fashion, in time-critical and
stressful situations, in environments where node capture and subversion are likely; and where the
underlying communications network is resource-constrained, mobile, and dynamic; and where
decision makers' reliance on and compliance with an information system are subject to numerous internal and external influences. Furthermore, trust will have many components, each of which will be derived depending on its own context; trust components will have varying uncertainties; and they will involve the network's ability to be available, accurate, reliable, timely, comprehensible, etc. The ability to achieve trust and incorporate it into decision-making methods
under such environments is fundamental to the success of networked forces and is the subject of
scrutiny for this research effort.



Figure 1: Modern warfighting operations need a means to assess trustworthiness of communication,
information, and socio-interactive resources to ensure mission success in coalition, joint, and time-
critical operations.

4.2.2 Example Military Scenarios


As a motivating example, consider coalition partners working in joint operations (cross-service
and cross-nation) in a closely confined area, such as the streets of Baghdad or the mountains of
Tora Bora during Operation Anaconda (Naylor, 2005). Mission success depends on each
participant having a joint view of the battlespace and factors feeding into command and tactical
decisions which requires open sharing of information. However, unconstrained open sharing of
information could also open vulnerabilities and leak information such as the knowledge of sensor
placement, intelligence operatives, or weapon capabilities that could compromise ongoing or
future operations. Under these conditions, information sharing decisions carry inherent risks that
cannot be eliminated. Instead, we must focus our attention on trust in information as a key
element and how the modalities of information creation, sharing, and transmission affect trust in
the networks. The other key element is trust in the humans (or software agents) who produce,
assess, spread, and otherwise use this information for their missions. In uncertain and vulnerable
conditions, trust allows humans to interact with each other and act on information, and therefore,
a deep understanding of trust is crucial to enhance the effectiveness of NCO.

4.2.3 Impact on Network Science


A research agenda that advances the understanding of trust for NCO will have a substantial
impact on Network Science. To derive and achieve trust in decision-making networks,
significant advancements of the state-of-the-art in information, social/cognitive, and
communication network research areas are required. To address the true complexity of the trust
problem, the research must be conducted jointly by different disciplines.
For example, information networks must derive trust based on the credibility, provenance,
accuracy, and timeliness of data sources and evolve this composite trust as data is transformed
into information through pattern identification; into knowledge through the application of theory;
and ultimately into understanding by a human through visualizations. In social/cognitive
networks, models for trust in distributed decision making are needed that consider the interaction
of individual, team, and organizational trust in the information and communications technology
to include not only the usability of the human-system interface but also the mission utility and
other cognitive, social, and emotional factors. Theories must also model the joint assessment of
trustworthiness of information and its source (e.g., the human), the impact of trustworthiness of
the information on the perceived trustworthiness of the human, and vice versa. For
communications networks, quantitative methods for establishing and analyzing trust in dynamic
mobile network environments are needed that consider the connectivity, reliability, and latency
of the communications network. The impact of reliability of the communication network on the
expected trust among its users and the impact of the quality of the information such as its
credibility and provenance on the network flows must be studied.
For distributed decision making in composite networks, there are few analytical models for
deriving trust that consider these myriad factors and network dynamics. In particular, there is a
limited understanding of how trust can be derived from various distributed sources of evidence.
There is also a limited understanding of network-of-networks composability of trust among
different networks. Furthermore, completely new trust dynamics are being developed in social
networks where trust relationships are formed, developed, and dissolved in virtual communities.
This has a deep impact on information and communication networks within which these social
networks are embedded.
One of the crucial steps in developing network science theory that addresses these issues is to
study real world and large scale network data from many different types of networks and find
correlations and other dependencies between different types of networks. There is very little
work that examines trust evolution in different network types from a collaborative perspective.
Advancing the state-of-the-art in trust-based distributed decision-making requires first an
understanding of the factors that contribute to trust in each type of network. In addition, methods
need to be developed to combine these factors into uniform measures of trust that overcome
challenges posed by incompatible inputs, missing data, and possible inconsistencies. Trust
metrics might be separately defined in each of the networks, but the key issue is to elucidate the
mapping of qualitative metrics across the networks, to define an end-to-end notion of composite
trust, to determine the attributes (presumably many besides trust) in the different networks
that affect this composite metric, and to identify those that can be controlled and those that cannot.
As networks are open to manipulation by adversaries, sound network design principles are
needed to ensure that the desired behavior emerges with respect to trust. This requires an
understanding of how trust modulates network dynamics such as data flows, co-evolution of
groups, etc., and how network dynamics influence the formation and propagation of trust in the
network.

4.3 Background on Trust

There are various ways to view trust. A rational view of trust suggests that the decision to
place trust in a specific trustee depends on a specific context: the context defines the potential
gains and losses. It defines the specific scenario under which trust is placed, to whom, and for
what. Given the complexity of a trust relationship - in particular, its subjective and context-
dependent nature - it is useful to think of a trust relationship in terms of its trust antecedents and
consequences (Mayer, Davis & Schoorman, 1995). A trust relationship has multiple trust
antecedents and consequences. A trust antecedent leads to, or contributes to, a trust relationship.
An antecedent involves one or more trust indicators or trust factors that contribute to the decision to
trust or not. The trustor bases the decision to trust on these factors. Trust factors not only involve
properties of trustor and trustee but also include properties of the trust system supporting the
trust decision - for example, an intermediary (e.g., an advisor, guarantor, or entrepreneur) may be
involved in facilitating the trust relation, or a physical or social system (e.g., a communication,
information, or legal system) may provide critical support in a trust relationship.
A trust consequence is a result enabled by a trust relationship. Various trust models provide a
bridge between trust antecedents and trust consequences to complete the picture. A trust model
specifies the trust system, how trust factors contribute to trust, what dependencies exist between
various trust factors, which behavioral influences a trust relationship has on a trustor, etc. In
particular, the goal of some trust models is to derive a trust metric for a trust relationship by
quantifying and combining trust antecedents relevant to that trust relationship. The trust metrics
derived from these models can be used as a trust proxy to predict and control network behavior.
By introducing trust metrics, we introduce a layer of abstraction that hides the complex interaction
of trust factors and the issue of how these factors contribute to trust, so that a subclass of trust
problems can be analyzed solely in terms of trust metrics.
One of the complexities of trust is that trust models do not always follow this rational pattern. In
the absence of sufficient information about another entity, due to the inability to verify the
integrity of some information, the decision to trust is typically made using simple heuristics and
biases. These heuristics and biases may have antecedents that are not dependent on the context or
the characteristics of the trustee relevant to the task for which trust is required. Instead, these
heuristics may be based on similarities between the trustor and trustee, a specific property of the
trustee (for example, one might trust a cleric, or trust a piece of information more if it has clean pictorial
cues) or other cues that have no relevance to the task. The advantage of this approach is its
simplicity and the ability to deal with uncertain situations. However, the chance of making an
incorrect decision is higher. It is important that a complete model of trust take into account such
non-rational views of trust.
Another view of trust evolves from embeddedness (Granovetter, 1985) that mediates between
heuristic and rational models of trust - it incorporates the historical interaction patterns between
the trustor and the trustee into a decision of trust even if these interactions may not have any
relationship with the current task at hand and may not involve any risk. For example, interactions
with others in social settings that resemble interactions with close friends and family may result
in the development of trust between two actors. This trust may later be utilized for a situation
involving risk and uncertainty. Army training, as an example, offers many shared activities
that bring together individuals from different backgrounds. Some of these activities are not
related directly to a mission requiring trust, but they provide a reference point that can be used to
derive trust. There are several benefits to trust that originates from embeddedness (as
opposed to pure rationality or heuristics); for example, a trustor gathering information from
sources of high (rational) trust may obtain very similar types of information from these sources.
This type of "group think" results in what is called "echo chambers" where all entities provide
the same point of view. However, weakly connected entities may provide new points of view or

NS CTA IPP v1.4 4-7 March 17, 2010


information that breaks these echo chambers and makes it possible to access more risky but
richer information sources. This process of achieving trust is of crucial importance in IW and
COIN operations that require close interactions between many different types of actors.
Even though trust has been studied in the contexts of security, cryptography, and reliability, there are
very few known metrics for measuring trust based on quality of information, credibility,
provenance, cognitive and social factors in networks. Some of the work in this area has to be
revisited for the new types of networks that have not yet been studied. The scientific study in this
area has to be grounded in reality both with large-scale data sets and realistic experiments.
Integrative experiments and studies are crucial in understanding trust across multiple dimensions,
for example, interacting with another person through an unreliable system or retrieving unreliable
information from a trusted person.
Few results exist for the complex network dynamics related to trust. This is an area that needs to
be closely coordinated with the work being performed in the EDIN CCRI. Understanding how
different network operations are affected by trust in the network is crucial to developing
effective methods for achieving mission objectives. The design of networks from the perspective
of ensuring quality of information in communication and information networks in conjunction
with human trust in social networks is a new area of investigation. Ensuring the desired
properties in the presence of possibly unexpected events requires the development of new design
principles that are cross-cutting and integrative.
The Trust CCRI will address these challenges holistically in three interconnected projects:
Representing trust and its impact on ongoing and future missions is a foundational capability we
will investigate in Project 1. The goal is to create a unified, well-defined metric system capable of
describing trust relationships and their mission impact across entities involved in
communication, information, and social networks.
Understanding the network dynamics involved in the development of trust is an important first
step in the design of network operations that make use of trust strategically. This is the main goal
of Project 2. This requires answering many hard questions: What is the expected result of an
interaction? What are the important events that should be monitored in a network? Where should
the interaction take place?
As the understanding of trust and its impact in the network matures, it becomes possible to
develop methods to establish and distribute trust-related information in the network and to design
networks and processes that help foster trust. This is the goal of Project 3.
Together these three projects aim to provide the science needed to enhance distributed decision
making in NCO with the trust dimension.

4.4 Key Research Questions

It is clear that to address the Network Science challenges posed by the types of scenarios described
earlier, there is a need to develop methods that address trust in each individual type of network as
well as across different types of networks. Collaborations have to be established early on to seek
common theories and methods that address the concerns of individual networks but provide the
necessary interfaces to be combined into an integrated common picture of trust. We intend to study
trust from the following perspectives:



Metrics and methods for computing trust:
How do we integrate trust measures obtained from distributed, multidimensional
situational awareness metrics into a unified trust picture with well-defined semantics?
How do we structure the representation to accommodate tradeoffs between completeness
and expressiveness? How do we make the unified trust picture available to human
mission participants? How do we make the unified trust picture available to automated
reasoning services?
What are the indicators of trust, also called trust factors, in each type of network? How are
they computed? What are their interrelationships across different networks? What are the
factors related to quality of information and to trust in humans and in information?
What are cognitive models of trust and credibility? What inputs are used, and how, to
arrive at a decision of trust and/or credibility?
Understanding the relationship between network dynamics and trust:
What are mathematical models of trust as a risk minimization method? What are the
properties of decentralized markets? Can they be sustained over time? How do we
generate objective metrics for the usefulness/correctness of information and for the
overall trustworthiness of an embedded network of agents and use these metrics to
optimize the collection of information on a larger dynamic social network under
uncertainty and time-dependence?
What network behaviors are related to trust? What social network factors facilitate (or
hinder) the extent to which formed social ties become reciprocal? Are more reciprocal
links more likely to persist in the future? What are the relationships between variations in
link quality, or changes in network topology and node mobility, and network
performability?
Fundamental paradigms for enhancing trust:
How can we disseminate trust models through the network(s) in a scalable way, establish
trust for newcomers, and maintain trust in a dynamic environment?
How can we modify the network(s) to improve the trustworthiness of entities in the
network?

4.5 Technical Approach

The Trust CCRI will attempt to facilitate constructive synergy and interactions among the ARCs
and the IRC with respect to the formalization, establishment, composition, propagation, and
rescinding of trust, and with respect to trust-based distributed decision making. We will also
exploit interactions with the other CCRI and other topics among the ARCs and within the IRC,
especially in helping to integrate the distributed monitoring and assessment mechanisms
developed within the CNARC, INARC, SCNARC to enable trust-based distributed decision
making. The overall research activities in this CCRI will be associated with three primary
projects with distinct but mutually supportive aims. The three projects are:
Project T1: Trust models and metrics: computation, inference and aggregation (Leads:
Dakshi Agrawal, IBM (INARC) and P. Mohapatra, UC Davis (CNARC))



The overall goal of this project is to investigate the fundamental question of how the trust
relationship between two entities in the composite network can be computed and aggregated.
There is a substantial body of literature that addresses this question from many different
perspectives originating in different disciplines – the goal of this project will be to espouse a
view that promotes an end-to-end understanding of these issues, starting from the point when
information and data are gathered, stored, and transported by the communication network, processed by
the information network, and finally used in decision making by humans. The main aim of this
project is to develop a unifying view of trust across all these different domains.
Project T2: Understanding the interactions between network characteristics and trust
(Lead: Sibel Adali, RPI (SCNARC))
The main research challenge in this project will be the identification of the various interactions
between the network characteristics and trust. Some of the main research topics to be studied in
this project are how trust enables the various types of interactions in the network, how the
different behavioral dynamics related to trust, such as conversation or mobility patterns, can
be detected, how these different behaviors are correlated with each other, and to what degree a
network can sustain these behaviors. The theory developed in Project T2 will build on the
fundamental work in Project T1 and will be used to develop the necessary paradigms for
enhancing trust, studied in Project T3.
Project T3: Fundamental paradigms for enhancing Trust (Lead: Karen Haigh, BBN (IRC))
This project looks at questions relating to trust meta-data – how do we propagate the trust models
about entities in the network, how do we support dynamics of trust, and how can we modify the
network to increase the trustworthiness of entities. We will look at how to scalably disseminate
the trust meta-data, how to reliably establish trust for newcomers, and how to revoke trust as the
environment changes. In later years, we will examine how to tie the properties of the network to
the emergent trust of the entities, and possibly optimize the structure of the networks to improve
the overall trust behavior. The effort in this project requires a comprehensive unified trust
model developed in Project T1, and will provide a framework for validating ideas in both
Projects T1 and T2. Lessons learned in T3 will feed the development of ideas in T1 and T2.



Figure 2: Projects in the Trust CCRI
It is our belief that the three projects enumerated above have the potential in combination to
considerably enhance integration, interoperability, and harmonization across the entire Trust
CCRI. In particular, the first project will provide a unified basis for the overall CCRI; the
second project will provide a framework to assess how trust impacts various network entities and
processes while the third project will show how we can design the network to enhance trust in
distributed decision making.
References

S. S. Brehm. (1992) "Intimate Relationships". New York: McGraw Hill.
M. Granovetter. (1985) "Economic action and social structure: The problem of
embeddedness." American Journal of Sociology 91:481-510.
R. C. Mayer, J. H. Davis & F. D. Schoorman. (1995) "An integrative model of
organizational trust." Academy of Management Review, Vol. 20, 709-729.
J. R. Taylor. (1989) "Linguistic Categorization: Prototypes in Linguistic Theory." Oxford University
Press, USA.

4.6 Project T1: Trust models and metrics: computation, inference and aggregation

Project Lead: Dakshi Agrawal, IBM (INARC)


Email: agrawal@us.ibm.com; Phone: 914-784-6016
Project Lead: Prasant Mohapatra, UC Davis (CNARC)
Email: prasant@cs.ucdavis.edu; Phone: 530-754-8016



Primary Research Staff: R. Govindan, USC (CNARC); K. Haigh, BBN (IRC); P. Pirolli, PARC (INARC);
M. Srivatsa, IBM (INARC); J. Opper, BBN (IRC); S. Adali, RPI (SCNARC); J. Golbeck, UMD (SCNARC);
K. Levitt, UCD (CNARC); S. Parsons, CUNY (SCNARC); D. Roth, UIUC (INARC); M. Singh, NCSU (CNARC);
T. Abdelzaher, UIUC (INARC); D. Agrawal, IBM (INARC); T. Brown, CUNY (INARC); J. Han, UIUC (INARC);
H. Ji, CUNY (INARC); T. Kementsietsidis, IBM (INARC); C. Lin, IBM (SCNARC); M. Magdon-Ismail, RPI (SCNARC);
P. Mohapatra, UCD (CNARC); B. Uzzi, NW (SCNARC); Z. Wen, IBM (SCNARC); W. Gray, RPI (SCNARC);
T. Hollerer, UCSB (INARC); M. Schoelles, RPI (SCNARC); X. Yan, UCSB (INARC)

Collaborators: K. Chan, ARL; J. Cho, ARL; G. Powell, ARL; T. La Porta, PSU (CNARC); S. Aral, MIT (SCNARC);
C. Partridge, BBN (IRC); M. Goldberg, RPI (SCNARC); S. Krishnamurthy, UCR (CNARC); W. Wallace, RPI (SCNARC)

4.6.1 Project Overview


The overall goal of this project is to investigate how trust can be modeled; how it can be
measured, inferred, or estimated; and how it can be presented to the entities in the network. In
our investigation, we will promote an end-to-end understanding of the issues starting from how
data and information is gathered, stored, and transported by the communication network; how it
is processed by the information network; and finally, how it is used in decision making by the
humans. To make progress in this broad research agenda, we will take a stratified approach
where in the lower strata, we will conduct an in-depth investigation of the trust factors deemed
most relevant to the military needs to produce insights into how trust is influenced by the
underlying elements and processes in the networks. In the upper strata, the focus will be on end-
to-end taxonomy, models, and metrics for trust so that the unique strength of this consortium,
namely, a confluence of researchers from multiple disciplines, can be used not only to advance
the frontiers of network science but also to promote a common understanding of an important
issue in the design of a complex system.

4.6.2 Project Motivation


Without proper assessment of trust, the warfighters will be left with the conundrum of either
acting on uncertain information that may be controlled by the enemy or refusing to take any
actions – both of which are detrimental to a mission, for example, the Operation Anaconda
described in the introduction. By developing end-to-end trust models and metrics and by
developing means to measure, infer, or estimate trust, this project will advance the current
understanding of trust in two important ways: first, it will provide a common basis for trust
research which will facilitate greater synergies between researchers from multiple disciplines and
fill critical gaps that fall at the discipline boundaries and get overlooked. Second, the in-depth
investigation of critical trust factors in three networks will provide insights into the design and
operation of composite networks that can be exploited to enhance trust-based decision making.
Since enhancing trust-based decision making is a core objective of the research in NS-CTA, this
project will also provide critical input to the research agenda of several non-CCRI tasks being
carried out in individual centers (see Section 4.6.9 for project linkages).

4.6.3 Key Project Research Questions


The research agenda of this project calls for a penetrating analysis of trust factors that contribute
towards trust in distributed decision making; synthesis of models to compose these trust factors
into trust metrics relevant for different networks; synthesis of unified measures and abstractions
to compose trust in different networks; and finally, mechanisms to evaluate and estimate the
degree of influence of various trust factors and metrics in a given scenario, and whenever
possible, mechanisms to optimize the composite, integrated network to enhance the effectiveness
of decision making in military scenarios.
In the first two years of the program, this project will focus on outlining required components of
a unified model of trust across network types, including deep-dive investigations into key factors
(notably, provenance and credibility of information), and acquiring a better understanding of
cognitive trust models in the brain.

4.6.4 Initial Hypotheses


Our initial hypothesis is based on the assertion that we can construct a unified trust concept that
will include the assessment of entities from social, communication, and information networks.
This unified trust model will be based on a common trust taxonomy. The unified trust model will
be able to aggregate trust computed by domain-specific trust models that may use probabilistic,
logic, experimental, or other techniques to assess trust. The unified trust model will help identify
and isolate untrustworthy entities, thereby enhancing the accuracy of trust assessments.
For in-depth study of trust factors, we will focus on credibility and provenance as the most
important attributes of information that determine trust in information. We hypothesize that
assessment of credibility of information and its sources can be substantially enhanced by
exploiting the latent characteristics of the networks (credibility can also be established by direct
assessments, for example, of the accuracy of an information processing algorithm – we will
pursue these direct assessments in the Center-specific portion of the NS-CTA research). Since
information provenance plays a critical role in determining its credibility, we hypothesize that by
improving tracking, storage, and presentation of provenance metadata, we can make a substantial
improvement in the assessment of trust in information even when the provenance metadata is
incomplete.
We hypothesize that a cognitive model of trust can be constructed for situations in which humans
make credibility judgments about a piece of information and subsequently make decisions based
on the credibility of information. We hypothesize that beliefs regarding whether the source is a
human or an agent play a major role in subsequent actions and that the differences can be captured
in a quantitative cognitive model of trust.

4.6.5 Technical Approach


Overview
In the first year, we will undertake the following three tasks:

Task T1.1 Unified Models and Metrics of Trust (Lead: J. Opper, BBN (IRC))
In this task, we will develop a framework to enable representation of trust in a system-wide
manner, integrating trust measures across multiple abstraction layers from communication,
information, social, and cognitive networks. The work in the first year will focus on creating a
common trust taxonomy and on identifying the trust factors that should be incorporated in the
aggregate model, how these trust factors influence each other, and how they should be
represented individually.
Task T1.2 Computing Trust Factors (Leads: M. Srivatsa, IBM (INARC); R. Govindan,
USC (CNARC))
In this task, we will conduct an in-depth investigation of two critical aspects of information:
the credibility of information and its sources, and provenance as a critical factor that
determines the credibility of information. The focus in this CCRI will be on latent network
characteristics that can be exploited to better assess the credibility of information and how
provenance metadata can be modeled, organized, stored, and transmitted so that credibility of
information can be accurately assessed even when provenance metadata cannot be presented to
the user in its totality.
Task T1.3 Cognitive Trust Models (Lead: P. Pirolli, PARC/INARC)
This task concentrates on the development of a theory of the human cognitive machinery
involved in judgments of credibility (of the information itself and of other humans as sources of
information). The behavioral sciences have for the most part treated such judgments as "black
boxes". Our aim is to start the development of a computational/quantitative theory of the
perceptual and cognitive processing inside of that black box, with a focus on how people process
cues (e.g., from user interfaces, information visualizations of provenance, etc.) into inferences
about credibility and how those inferences feed into decision making and performance.

4.6.6 Task T1.1: Unified Models and Metrics of Trust (J. Opper, BBN (IRC); S.
Adali, RPI (SCNARC); D. Agrawal, IBM (INARC); J. Golbeck, UMD
(SCNARC); K. Haigh, BBN (IRC); K. Levitt, UCDavis (CNARC); S. Parsons,
CUNY (SCNARC); P. Pirolli, PARC (INARC); D. Roth, UIUC (INARC); M.
Singh, NCSU (CNARC); Collaborators: S. Aral, MIT (SCNARC); J. Cho,
ARL; P. Mohapatra, UCD (CNARC); C. Partridge, BBN (IRC); M. Srivatsa,
IBM (INARC); Red team reviewers)
Task Overview
The goal of this task is to develop a framework that enables representation of Trust in a holistic
system-wide way, integrating trust measures across multiple abstraction layers from
communication, information, social and cognitive networks. The benefit of a unified framework
is the ability to provide a unified trust picture that can be used by both automated reasoners and
humans for dynamic mission planning and distributed decision making.
To establish a cross-cutting research thrust on Trust, researchers require an understanding of the
different vocabularies, key research questions and motivating problems. Researchers need to
understand how factors change as they cross network-type boundaries, and how factors in one
network-type impact the factors of another network-type.
Task Motivation
Today's forward deployed net-centric operations involve a large number of concurrent mission
participants and resources linked by complex interaction networks. As each mission participant
continuously performs trust assessments to guide his personal decision making, these individual
assessments need to be combined into a unified trust picture to be useful and applicable to
distributed collaborative decision making. This involves combination and normalization of trust
metrics at different layers, measured along different dimensions, and represented in formats that
vary across the armed forces and coalition partners.
A good example to illustrate the complexity of trust models is to consider an end-to-end flow of
information as the information is gathered from a battlefield, processed, and presented to the
end-user for a decision involving significant risk. The scenario is that of a battlefield under
surveillance by sensors (time-series, image, and video sources mounted on
different platforms) and covered by text reports from human sources. The decision maker is trying to assess the
situation on the battlefield and decide whether the hostile forces have emerged with unanticipated
capabilities, making it necessary to counteract by sending reinforcements.
First, in this scenario, it is important to assess security properties of the sources as well as the
capabilities of the sources and technical sophistication of the adversary forces. For human
sources, it is important to understand how much they are trusted for a specific decision making
context and why. This information can originate from many sources, including other users in a
social network. Depending on the situation and full context, this results in a credibility
assessment of the sources. Various attributes of sources (identity, credibility, etc.) become part of
the provenance of the information. As this information flows through the communication
network, properties of the communication network become part of provenance as well (e.g., was
the information cryptographically secured? Was it sent on secure network?). Within the
communication and information networks, information is fused and otherwise processed. Each
step of the processing and transmission pipeline results in additional provenance metadata.
Clearly, the total aggregate of provenance metadata cannot be stored indefinitely so at many
stages of this pipeline, provenance metadata may be summarized irrevocably. Along with
provenance, the accuracy and timeliness of the processed information also change, and these are
themselves critical trust factors. For example, for unstructured textual information, fusion
may increase the accuracy of reports by combining them with facts extracted from video sources or
by combining multiple reports into a single set of facts. This brings the aspect of algorithmic
credibility assessment of information into the picture, which quantifies how fusion algorithms
increase the credibility of information by mutually consolidating potentially conflicting information.
As the information flows through the combination of the information and communication networks,
its eventual use may not be known. It follows that information must be stored
for future use in a way that makes relevant information easy to retrieve. Going back to provenance
metadata, it may be stored at multiple levels of granularity and propagated in the network
depending on available resources.
From the end-user perspective, perhaps the most critical question is how to direct a user so that
their cognitive overload is reduced and they find the most relevant, credible information quickly.
There are two important trust aspects here: First, a relevant, credible piece of information may be
available in the information network but it is simply lost in the sea of information and the end
user does not find the relevant information. This calls for cognitive models of trust in
information processing so that the information navigation can be structured in a way that leads
the end-user to the relevant, credible information quickly and efficiently. Second, when the end-
user eventually gets a piece of information, it may not induce sufficient trust to take a decision.
For example, there may not be enough provenance details for the user to trust it in the particular
context and situation. Even more importantly, the information may be highly credible from an
algorithmic perspective, but the decision maker may not trust the source. Similarly, the
information may be somewhat stale, and the user may want fresher information before making a
decision.
Thus, provenance (more generally, credibility as a function of provenance and other security and
expertise properties), timeliness, and accuracy of information all play a major part in how
much trust a decision maker has in a particular decision based on that information. In such cases
where trust is insufficient to make a decision, the trust factor most critically impacting a
particular decision needs to be determined. In one case, fresher information from a single source
can immediately increase the trust in decision making while in another, additional detailed
provenance of the existing information from a single source can be sufficient to increase trust in
decision making. In another case, information that is highly contradictory with other facts from a
highly trusted source such as a human may be deemed more trusted. Furthermore, as human
decision makers make decisions to send, suppress, or change the information they receive based on
their trust judgments, the provenance of information becomes even more important in making a
trust judgment and also in understanding the basis of others' trust judgments.
Key Research Questions



The top-level question is - How do we define a Unified Trust Model that is:
Comprehensive — touching all network types
Flexible — allowing extensions as new concepts are identified, and allowing different
representation types (particularly since different networks may have very different natural
representations of Trust)
Scalable / Distributable — nodes must be able to actually use the model to make decisions
and propagate the information, without overwhelming the infrastructure
This question can be decomposed to the following:
How do we integrate trust measures obtained from distributed, multidimensional situational
awareness metrics into a unified trust picture with well-defined semantics?
How do we structure the unified model so that it is decomposable, so that it can be scalably
reconstructed in a distributed network?
How do we structure the representation to accommodate trade-offs between completeness and
expressiveness?
How do we make the unified trust picture available to human mission participants and
automated decision-makers?
Initial Hypotheses
We can construct a unified trust picture that includes trust assessments of entities from
communications, information, and social networks. This unified trust picture will increase the
accuracy of assessments, and thus allow the network(s) to isolate untrustworthy entities in a
more expedient, reliable manner. Trust assessments will be justifiable under human examination.
Prior Work
Factors that contribute to Trust have been extensively studied in each of the component network
types, including communications networks (Sun et al 2006), (Theodorakopoulos & Baras, 2006),
information networks (Ailon et al, 2005), (Srivatsa et al, 2009, Srivatsa et al, 2008), social
networks (Zhang et al 2006), (Wang and Sun 2009) and cognitive networks (Zak 2008),
(Damasio 2005). As part of this task, we will conduct a much more extensive literature survey;
we will organize and collate this prior art according to which
network types are touched upon, which factors are utilized, and the main conclusions about
which components are key.
Aggregate (composite-network) models are much rarer, however. The following frameworks
collate factors from at least one network type, and appear to be general enough that we can learn
lessons from their structure, and potentially even derive our composite model from the concepts
described. Focusing on information quality, Stvilia et al (2007) have a model of information
quality that comprises 41 factors, a few of which come from other network types. Yu and Singh
(2002) store the trust evidence, and combine values with Dempster-Shafer theory. Xiong & Liu
(2004) introduce five trust parameters in PeerTrust and present a general metric that combines
these parameters. Procaccia et al (2007) study how to aggregate trust information based on
Gossip protocols. Benjamin et al (2008) and Pal (2007) studied dynamic algorithms for assessing
trustworthiness through a combination of knowledge of different types, including symptomatic,
reactive, teleological, malicious, and relational. One interesting aspect of this research is that it
explicitly models trustworthiness of probes in addition to trustworthiness of subjects, which
allows construction of algorithms that limit the risk of performing actions based on
untrustworthy information. We will study whether (and how) each of these trust models and
evaluation methods may be used in evaluating the impact of multi-source capabilities on trust.
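To make the flavor of such evidence combination concrete (e.g., the Dempster-Shafer combination used by Yu and Singh), the sketch below applies the standard Dempster rule to two independent trust mass assignments over the frame {trustworthy, untrustworthy}; it is a generic textbook construction given for illustration, not a reproduction of any cited system.

    def combine_dempster(m1, m2):
        """Dempster's rule of combination over the frame {'T', 'U'}, with the
        uncommitted mass assigned to the full set 'TU'. Each input maps
        {'T', 'U', 'TU'} to masses summing to 1."""
        sets = {'T': {'T'}, 'U': {'U'}, 'TU': {'T', 'U'}}
        combined = {'T': 0.0, 'U': 0.0, 'TU': 0.0}
        conflict = 0.0
        for a, ma in m1.items():
            for b, mb in m2.items():
                inter = sets[a] & sets[b]
                if not inter:
                    conflict += ma * mb          # contradictory evidence
                elif inter == {'T', 'U'}:
                    combined['TU'] += ma * mb
                else:
                    combined[inter.pop()] += ma * mb
        norm = 1.0 - conflict
        return {k: v / norm for k, v in combined.items()}

    # Two independent bodies of trust evidence about the same source.
    evidence_a = {'T': 0.6, 'U': 0.1, 'TU': 0.3}
    evidence_b = {'T': 0.5, 'U': 0.2, 'TU': 0.3}
    print(combine_dempster(evidence_a, evidence_b))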
Technical Approach
Activities for this task are:
Developing a common Taxonomy for Factors. This task will focus on establishing a common
taxonomy for terms across network types. We will develop a list of Observables (Intrinsic,
Contextual and Reputational), Artifacts, Actors, Actions, and Key Questions that will enable
Trust researchers across network types to understand how terminology and key concepts
change in different networks. The final product of this effort will be a list of the trust factors
from each of the communication, information, social and cognitive networks, including
annotations of how factors in one network type influence factors in other network types.
Developing an Aggregate Trust Model. This task will focus on the development of a single,
cohesive model that incorporates factors from each of the participating network types. The
key challenge will be to support expressive, scalable and distributed computation of trust
factors that are vital to decision makers. We will develop a unified framework that can be
used to predict whether a trustor will or will not trust a trustee, to quantify the risk to
information from specific network configurations, and to function correctly in the presence of
missing or uncertain data. We will validate these against a variety of real-world datasets when
available.
This task will closely collaborate with T1.2 and T2.2, incorporating their ideas and lessons
learned as appropriate. T1.2 focuses on specific computation approaches for important factors
across network types (year 1 focuses on credibility and provenance). T2.2 is studying network-
based indicators.
The descriptions below outline a longer-term roadmap; Year 1 activities are outlined in the
paragraphs annotated Development Approach.
Developing a common Taxonomy for Factors. We know that trust is viewed from diverse
perspectives by the communication, information, social and cognitive networks groups.
Therefore, to establish a cross-cutting research thrust on Trust, researchers require an
understanding of the different vocabularies, key research questions and motivating problems.
Terminology differences cause confusion, misunderstanding, and disagreement, thus creating an
inherent barrier to cross-cutting research. A clear definition of terms will enable richer cross-
cutting research interactions. Key questions to address include:
Which concepts are key components of a Unified Trust Metric?
How do concepts change across network-type boundaries?
Which factors in network-type A impact which factors in network-type B, and how?
Our approach in developing a taxonomy is to clarify network terms in isolation, then look at
representation gaps and interactions across networks; this effort will occur in collaboration with
EDIN's task E1: Ontology and Shared Metrics, through our liaison Craig Partridge (BBN/IRC).



We believe that an iterative approach will be necessary, in that the identification of a
representation difference between a pair of network types will help clarify the definitions of
component concepts. The first step in developing a comprehensive taxonomy is to identify each
of the following attributes of the different kinds of networks (a minimal representational sketch
follows this list):
Definitions
   o Nodes. What are nodes in this network type? How do nodes of different types behave
     in different ways?
   o Links. What are links in this network type? What flows along the links? How can
     links be bundled?
   o Other Concepts
     - What is the main focus of research? Are links or nodes more interesting?
       Which subgroups of a network are important and interesting? What are
       motivating problems in this network?
     - What properties does the network exhibit as a whole?
     - What concepts are specific to Trust, and how do they complicate or
       simplify the problem?
Observables (Factors)
   o Intrinsic / Objective factors. Attributes that persist and depend little on context.
   o Contextual / Subjective factors. Relationships between nodes and the embedding
     context. These can change in time and space.
   o Reputational factors. Authority, reputation, and cultural bias. These capture the
     historical behavior of nodes.
   o How should these factors be subdivided into institutional, economic, social, and
     infrastructure categories? These would be reflected in different ways in the different
     layers. For example, economic factors include power conservation for a sensor and
     monetary gain for a spy.
   o How can specific factors be captured and represented? Context, for example, is
     particularly important for Trust, and describing context in sufficient detail will be
     a challenge.
Actuation
   o Actors. What are the entities that act on/in the network?
   o Actions. What can these entities do to effect change?
   o Artifacts. What do they produce?
   o Actuators. How do they effect change? How do they produce artifacts?
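To make the intended product of this step concrete, the minimal sketch below shows one possible machine-readable form for factor classes (intrinsic, contextual, reputational) and cross-network influence links; the class names and example factors are assumptions made for illustration, not the taxonomy this task will actually produce.

    from dataclasses import dataclass, field
    from enum import Enum

    class NetworkType(Enum):
        COMMUNICATION = "communication"
        INFORMATION = "information"
        SOCIAL = "social"
        COGNITIVE = "cognitive"

    class FactorClass(Enum):
        INTRINSIC = "intrinsic"        # persists, depends little on context
        CONTEXTUAL = "contextual"      # relation to embedding context; varies in time/space
        REPUTATIONAL = "reputational"  # authority, reputation, historical behavior

    @dataclass
    class TrustFactor:
        name: str
        network: NetworkType
        factor_class: FactorClass
        influences: list = field(default_factory=list)  # names of factors in other networks

    # Illustrative entries only; the real factor list is a Year 1 deliverable of this task.
    link_verifiability = TrustFactor("link_verifiability", NetworkType.COMMUNICATION,
                                     FactorClass.INTRINSIC,
                                     influences=["timeliness", "authority"])
    timeliness = TrustFactor("timeliness", NetworkType.INFORMATION, FactorClass.CONTEXTUAL)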
The second step is to identify and clearly define the factors that contribute to Trust in each
network type. Many of these factors will be directly derivable from the network taxonomy, but
we expect the gaps and differences between networks to generate new ideas and clarify key research questions.
For example, if a factor is important for an information network, but not discussed (or discussed
quite differently) in a communications network, we anticipate cross-fertilization of ideas in the
community.
The third step will be to identify a set of simple questions about how the different network types
interact. These questions will highlight the key touch-points of networks, and shape further
research. To answer each question operationally, we will need an underlying mechanism to
transform trust information across the bridge so that it can be accurately and effectively utilized.
For example:
                    Communication         Information          Social           Cognitive
                    Network               Network              Network          Network
Communication
Network
Information         How does the          What informs
Network             information get       what?
                    where it is needed?
Social Network                            Who needs what       Who knows
                                          information?         who?
Cognitive
Network

To help understand these touch-points, we will use both visual and quantitative analysis of trust
patterns in networks. Exploratory visual analysis is a useful tool for hypothesis generation,
identifying potential correlations or non-correlations. The quantitative analysis will validate
those theories against existing trust networks, and identify how important the factors are. In
particular, we will examine patterns of trust relationships to understand how trust between people
develops in social networks, in an information system, and to what degree trust correlates with
cognitive factors such as profile characteristics and personality. We will look at different levels
of the network (overall, ego networks, and network patterns) to determine trust interaction and
distribution.
Finally, we will also identify how factors of one network type influence different factors in a
different network type. For example, the objective factors of accessibility and verifiability of a
communication link affect the timeliness and authority of nodes (both contextual factors) in an
information network.
Our development approach is to involve individuals with experience in all four of the network
types and all four of the centers, as shown in the following table. In addition to the names listed
below, we have verbal agreements with many other ARL-CTA researchers who have agreed to
provide red-team style reviews to the documents we generate.
                  IRC                    SCNARC                    CNARC                INARC
Communications    Haigh, Opper,          --                        Singh,               (Srivatsa)
                  (Partridge)                                      (Mohapatra)
Social            Haigh                  Golbeck, Parsons, Adali   Singh                Agrawal
Information       Haigh, (Partridge)     Parsons, Adali            --                   Roth, (Srivatsa)
Cognitive         --                     Golbeck, Parsons, Adali   Singh                Agrawal, Roth, Pirolli

As part of this activity, we intend to form collaborative reading groups and seminars so that
relevant literature can be identified, presented, and explained to the audience. The cross-
disciplinary group will be able to elaborate unclear terminology, extend terminology when it
reaches limited conclusions, identify any gaps in the discussion, and perhaps most importantly,
identify potential early collaboration opportunities.
As described below, we will perform a variety of experiments and analyses to validate our
hypotheses about which factors influence each other.
Developing an Aggregate Trust Model. The goal of this activity is to develop a framework that
enables representation of trust in a holistic system-wide way, integrating trust measures across
multiple abstraction layers present in communication, information, and social networks. Trust is
a multidimensional highly subjective measure relevant to a wide range of multiple properties
(e.g., trust with respect to data correctness, data integrity, timeliness, freshness, resistance to
unauthorized alteration and denials of service, and unauthorized propagation, as well as system
integrity, and trustworthiness with respect to security, reliability, survivability, people, etc.). The
benefit of a unified framework is the ability to provide a unified trust picture that can be used by
both automated reasoners and humans for dynamic mission planning and distributed decision
making.
What is needed is a suite of mechanisms that provide necessary flexibility (since end-user
missions have different requirements and therefore different context-sensitive models of trust in
information) and scalability (since communication and information processing resources are
precious in military scenarios). A unified trust picture increases the accuracy of assessments and
therefore allows entities to isolate untrustworthy entities in a more expedient, reliable manner.
Being able to efficiently compute the quality of information as dictated by the end-user mission
is essential for computing trust in information-based decision making.
The IRC will lead the effort to create a single, unified trust metric that takes into account factors
(and aggregate trust models) from each of the network types.
We believe we can construct a unified trust model, including trust assessments of nodes linked
through communication, information, and social networks at different layers. The trust model will be
scalable, composable, and distributable. Key questions to address include:

How do we integrate trust measures obtained from distributed, multidimensional
situational awareness metrics into a unified trust picture with well-defined semantics?
How do we structure the representation to accommodate trade-offs between completeness
and expressiveness?
How do we aggregate and harmonize the evidence imported into the unified trust picture?
How does the aggregation process affect Trust?
How do we make the unified trust picture available to human mission participants?
How do we make the unified trust picture available to automated reasoning services?
(How does the network configuration impact how trust can or should be aggregated?)

We will also monitor the following Key Questions that will be addressed in other tasks:



T3.1 How can we disseminate and propagate trust models; how can we enhance trust;
how can we limit trust-propagation errors?
T1.2 How do we compute trust factors, including provenance and credibility?
T1.3 What dimensions make a complete model of Human trust?

The taxonomy of factor types above will outline which factors from each network type affect
which factors in another network type.
We will design a composable Trust metric based on the ideas for the optimization function
described in (Haigh 2007). The function is a weighted sum of factors, designed in such a way as
to be easily distributable around the network. The key components will be the important factors
identified by the taxonomy, normalized, and with initial weights based on importance values
identified from literature searches. The advantage of this approach is that the weights or
component factors can easily accommodate considerations such as cost, risk, and benefit. The key
advantage to this metric is that it can incorporate trust models (and lessons learned) from the
three ARCs and also from prior art (see list below), while also balancing the concerns of the
different network types in a composable manner.
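A minimal sketch of such a weighted-sum composition appears below. It illustrates only the composable form (normalized factors and per-factor weights that can fold in cost, risk, and benefit); the factor names and weights are assumptions made for illustration, not values taken from (Haigh 2007) or from the taxonomy still to be developed.

    def composite_trust(factors, weights):
        """Weighted sum of normalized trust factors.
        factors: factor name -> value in [0, 1] (reported by some network type)
        weights: factor name -> non-negative importance weight
        Missing factors contribute nothing, which keeps the metric computable
        from partial, distributed evidence."""
        present = [name for name in factors if name in weights]
        total_weight = sum(weights[name] for name in present)
        if total_weight == 0:
            return 0.0
        return sum(weights[name] * factors[name] for name in present) / total_weight

    # Hypothetical factors reported by different network types for one entity.
    factors = {
        "comm_link_reliability": 0.92,   # communication network
        "info_provenance_score": 0.65,   # information network
        "social_reputation": 0.80,       # social network
    }
    weights = {
        "comm_link_reliability": 1.0,
        "info_provenance_score": 2.0,    # a weight can encode mission risk or cost
        "social_reputation": 1.5,
    }
    print(round(composite_trust(factors, weights), 3))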
For example, Task T1.2 investigates provenance tracking algorithms by propagating provenance
metadata in the form of annotations across database queries/views; in particular, Task T1.2
addresses similar challenges such as expressiveness and scalability, but within the limited
context of one intrinsic QoI attribute, namely, provenance. Task 1.1 will explore the applicability
(and possible extensions) of such annotation propagation algorithms (along the lines of
expressiveness, scalability and distributability) for a broader range of trust factors.
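As a rough illustration of annotation propagation of this kind (not the actual T1.2 algorithm), the sketch below attaches source annotations to tuples and takes the union of annotations whenever tuples are joined, so that every derived tuple carries the provenance of all of its inputs.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class AnnotatedTuple:
        values: tuple          # the data values of the tuple
        provenance: frozenset  # identifiers of the sources/steps that produced it

    def annotated_join(left, right, key_left, key_right):
        """Join two annotated relations on one column each, unioning the
        provenance annotations of the contributing tuples."""
        out = []
        for l in left:
            for r in right:
                if l.values[key_left] == r.values[key_right]:
                    out.append(AnnotatedTuple(l.values + r.values,
                                              l.provenance | r.provenance))
        return out

    # Hypothetical report and sensor-track relations with per-tuple source annotations.
    reports = [AnnotatedTuple(("sector7", "convoy sighted"), frozenset({"HUMINT-12"}))]
    tracks = [AnnotatedTuple(("sector7", "3 vehicles"), frozenset({"UAV-cam-4"}))]
    for t in annotated_join(reports, tracks, 0, 0):
        print(t.values, sorted(t.provenance))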
Wang and M. Singh have proposed a Bayesian approach that captures trustworthiness in terms of
the certainty with which an agent performs at a specific level. Trust will have a component of
moral hazard wherein agents may alter their behaviors to maximize their utility, and agents seek
to establish incentives under which others will be well behaved. Hazard and M. Singh have
recently argued that, under some weak assumptions, the trustworthiness of a rational agent is
isomorphic to its intertemporal discount factor.
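The flavor of such evidence-based approaches can be conveyed with a simple Beta-distribution sketch in which positive and negative interaction outcomes drive both an expected trustworthiness and a crude certainty term; this is a generic construction assumed for illustration, not the specific Wang-Singh formulation.

    def evidence_trust(positives, negatives):
        """Expected trustworthiness and a simple certainty proxy from interaction
        evidence, using a Beta(positives + 1, negatives + 1) posterior."""
        a, b = positives + 1, negatives + 1
        expectation = a / (a + b)          # posterior mean
        certainty = 1.0 - 1.0 / (a + b)    # grows with total evidence (illustrative proxy)
        return expectation, certainty

    print(evidence_trust(8, 1))   # substantial, mostly positive evidence
    print(evidence_trust(1, 0))   # little evidence: positive mean but low certainty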
Argumentation systems, studied by Parsons, give us another way to develop a framework.
Argumentation is a symbolic reasoning mechanism that constructs reasons (arguments) for and
against propositions of interest. In a social context, for example, where we are interested in the
reliability of some information, a reason to believe it might be the fact that this source has been
reliable over a long period. A reason against believing it might be that it is outside the realm of
expertise of the source. Argumentation provides methods for taking this structure of interlocking
reasons and determining which propositions should be accepted. This symbolic model can be
combined with numerical measures like those developed by Singh. Further, we can use the
argument structures as plausible explanations for human users, and extend the argumentation
framework not only to handle information about trust but also to aid decision making using
information of questionable trustworthiness, taking into account the decision attitudes (such
as risk aversion) of the human users. The steps in the reasoning process that argumentation
captures do not have to be homogeneous. In the context of a unified model of trust, this would
make it possible to have arguments which represent reasons based on different aspects of trust
(trustworthiness of acquaintance, quality of information and so on), with the mechanism used to
combine these arguments summarizing the way that the different kinds of trust should be used in
combination.
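A toy sketch of this style of reasoning is given below: arguments for and against a proposition carry numeric strengths (which could come from the numerical trust measures above), and a proposition is accepted when support outweighs attack. It is only a schematic illustration of combining symbolic arguments with numerical measures, not any specific published argumentation system.

    from dataclasses import dataclass

    @dataclass
    class Argument:
        claim: str        # proposition the argument bears on
        pro: bool         # True if it supports the claim, False if it attacks it
        strength: float   # e.g., derived from trust in the argument's premises or source

    def accept(claim, arguments, threshold=0.0):
        """Accept a claim when the summed strength of supporting arguments exceeds
        the summed strength of attacking arguments by more than the threshold."""
        support = sum(a.strength for a in arguments if a.claim == claim and a.pro)
        attack = sum(a.strength for a in arguments if a.claim == claim and not a.pro)
        return support - attack > threshold, support, attack

    args = [
        Argument("source S is reliable", pro=True, strength=0.7),   # long history of reliability
        Argument("source S is reliable", pro=False, strength=0.4),  # topic outside S's expertise
    ]
    print(accept("source S is reliable", args))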



Reconciling these statistical or numerical approaches with symbolic or logical approaches will be
an interesting topic area to examine. Our goals include analyzing the incentives of agents and
evaluating the impact of their individual strategies, taking into account models of reliability, and
social, communicative, and institutional relationships. We would like our approach to be
modular so that the above domain-dependent models may be developed by others on the team.
We seek an abstract, dynamic, tractable model based on minimal assumptions that gives us
expectations on the quality and reliability of information based on context.
Our development approach directly addresses the significant challenge of developing a single
unified model that can capture all domain- and situation-specific nuances. We will therefore
focus initially on scenarios for a specific situation in a given domain. We believe that a
sensitivity analysis (in particular, guided by end-user feedback) can be performed and, once
the most critical trust factor is determined, the network can be (re-)configured so that the
information delivered is sufficient to make a trust-based decision. This calls for communication
and information networks that are designed with this reconfigurability and feedback in mind.
Note that reconfiguration of these networks is not an independent action carried out in each
network in isolation; rather, these networks work together to deliver the relevant information to the
end user.

During year 1, the primary foci will be to:

Identify which factors should be incorporated into the aggregate model,
describe how they influence each other, and clearly define how they are represented
(mathematically, logically, symbolically, or otherwise),
Identify different aggregate Trust models that have been developed in the prior art, and
examine them for their strengths and weaknesses, and
Develop clear, structured use cases (with accompanying validation datasets) against
which we can develop a framework.

To the extent that prior work has been performed by individuals participating in these tasks, we
intend to compare and contrast these approaches directly, including evaluating them against
datasets that are readily available.

In future years, we anticipate looking at challenges such as:

- How to model other entities' internal motivations,
- How to incorporate conflicting evidence,
- How to generate appropriate incentives that induce other entities to cooperate,
- How to capture uncertainty, and
- How to present the trust model to human mission participants, and how to allow them to control how the automated decisions are made.
Validation Approach
Working closely with task R3: Experimentation With Composite Networks, we will develop
specific decision making scenarios and validate them against ground truth to the extent possible,

e.g., for the social network, using a system in which users explicitly describe how much they trust
each other, and, for the interactions between the social and information networks, using
something akin to Wikipedia articles.
Datasets simulating battlefields with dynamically unfolding events are the most
desirable means to validate the research approach and products; we can use these to validate the
possible usage of trust models in military applications.
We will build upon PARC's data mining and user interface experience with systems such as
micro-blogging, wikis, social tagging, RSS feeds, and email. The experiments will involve
meaningful scenarios involving decision-making and prediction tasks that require utilization of
information recommended by others and stated opinions of others. While these experiments do
not simulate a military mission, we believe that the setup provides a rich environment for the
development of cognitive models of trust that will be applicable to many military missions.
Components of aggregate models should also be tested in isolation. For example, we can test
models of trust in information networks by monitoring daily (or even hourly or continuously
updated) news datasets: (1) news reports have multiple contexts (e.g., sports, politics, business,
etc.); (2) news often describes an unfolding situation (e.g., for the earthquake in Haiti, casualty
estimates started at 50,000 and later proved to be too low); (3) news can be obtained in a
streaming manner and can be incrementally updated (e.g., from Google News); and (4) news
provides unstructured real-world data. We will construct information networks on several news
datasets and test how to use interconnected datasets to enhance the credibility of news when
encountering conflicting information. Other potentially useful datasets include Wikipedia pages
with infoboxes and the NIST Knowledge Base Population evaluation corpora.
We will also conduct experiments of how trust is assigned in social networks, complementing
the work in T3.1 (Oracles). In addition, Golbeck is currently recruiting participants from her
trust-based social network to participate in a study. They will be given the Big Five Personality
Test, a standard and widely accepted personality test that scores subjects' personalities on five
axes. She will then correlate their results on this test with their pattern of trust ratings in the
social network. If she can identify connections between personality and trust behavior, it will
help us better understand and compute trust by refining our computations based on individual
behavior.
Research Products
Our research products will include a comprehensive taxonomy of factors that contribute to trust,
integrating trust measures across multiple abstraction layers from communication, information,
social and cognitive networks. We will specifically highlight the interactions of factors as they
cross from one network type to another. The report will also describe various characteristics of
the network types, how they impact each other, and how they impact trust.
We will also develop a framework that calculates Trust in a unified way across network types. It
will support different definitions and representations of trust. We will study the trade-off
between representation comprehensiveness and scalability. In year 1, this unified trust picture
will be extremely simple, based closely on prior art.
We will also develop a suite of use cases and accompanying validation datasets, and run
experiments to validate our unified trust model.
References

N. Ailon, M. Charikar and A. Newman. Aggregating inconsistent information: ranking and
clustering, In Proceedings of the Thirty-Seventh Annual ACM Symposium on theory of
Computing (STOC '05). New York, NY, 684-693. 2005.
P. Benjamin, P. Pal, F. Webber, M. Atighetchi, Using a Cognitive Architecture to Automate
Cyberdefense Reasoning. Proceedings of the 2008 ECSIS Symposium on Bio-inspired, Learning,
and Intelligent Systems for Security (BLISS 2008), IEEE Computer Society, August 4-6, 2008,
Edinburgh, Scotland.
A. Damasio. Human behaviour: Brain trust. Nature 435, 571-572 (2 June 2005)
K. Z. Haigh, O. Olofinboba, C. Y. Tang, Designing an Implementable User-Oriented Objective
Function for MANETs, IEEE International Conference On Networking, Sensing and Control,
2007, London, UK.
P. Pal, R. Schantz and F. Webber. Survivability Metrics—A View from the Trenches. DSN
Workshop on Assurance Cases for Security - The Metrics Challenge, Edinburgh, June 2007.
A. D. Procaccia, Y. Bachrach and J. S. Rosenschein. Gossip-based aggregation of trust in
decentralized reputation systems. In Proceedings of the 20th International Joint Conference on
Artificial Intelligence (IJCAI), pages 1470-1475, 2007.
Y. L. Sun, W. Yu, Z. Han and K. J. R. Liu, Information theoretic framework of trust modeling
and evaluation for ad hoc networks, IEEE Journal on Selected Area in Communications. 2006.
M. Srivatsa, D. Agrawal and S. Reidt. A Metadata Calculus for Secure Information Sharing.
ACM Conference on Computer and Communication Security (CCS), 2009.
M. Srivatsa, P. Rohatgi, S. Balfe and S. Reidt. Secure Information Flows: A Metadata
Framework. IEEE Workshop on Quality of Information for Sensor Networks (QoISN), 2008.
B. Stvilia, L. Gasser, M. Twidale, and L. C. Smith, A framework for information quality
Assessment. JASIST, 58(12), 1720-1733, 2007.
G. Theodorakopoulos and J. S. Baras. On trust models and trust evaluation metrics for ad hoc
networks, IEEE Journal on Selected Areas in Communications, Volume 24, Issue 2, 318-328,
February 2006.
J. Wang and H.J. Sun. A new evidential trust model for open communities. Computer Standards
& Interfaces, Volume 31, Issue 5, September 2009, Pages 994-1001
L. Xiong, L. Liu. PeerTrust: Supporting Reputation-Based Trust in Peer-to-Peer Communities.
IEEE Transactions on Knowledge and Data Engineering (TKDE), Special Issue on Peer-to-Peer
Based Data Management, 16(7), July, 2004.
B. Yu and M. P. Singh, An evidential model of distributed reputation management. In Proc 1st
International Joint Conference on Autonomous Agents and MultiAgent Systems, Bologna Italy
2002, pp 294-301.
Zak, P. J. (2008). The Neurobiology of Trust. Scientific American, June: 88-95.
Y. Zhang, H. J. Chen and Z. H. Wu, A Social Network-Based Trust Model for the Semantic
Web, Autonomic and Trusted Computing, Springer, 2006.

4.6.7 Task T1.2: Computing Trust Factors (M. Srivatsa, IBM (INARC); R.
Govindan, USC (CNARC); T. Abdelzaher, UIUC (INARC); S. Adali, RPI
(SCNARC); D. Agrawal, IBM (INARC); T. Brown, CUNY (INARC); J. Han,
UIUC (INARC); H. Ji, CUNY (INARC); T. Kementsietsidis, IBM (INARC);
C. Lin, IBM (SCNARC); M. Magdon-Ismail, RPI (SCNARC); P. Mohapatra,
UCD (CNARC); D. Roth, UIUC (INARC); M. Singh, NCSU (CNARC); B.
Uzzi, NW (SCNARC); Z. Wen, IBM (SCNARC); Collaborators: M. Goldberg,
RPI (SCNARC); S. Krishnamurthy, UCR (CNARC); S. Parsons, CUNY
(INARC); T. La Porta, PSU (CNARC); W. Wallace, RPI (SCNARC))

Task Overview
The goal of this task is to conduct an in-depth investigation of critical trust factors that are highly
relevant to NCO but for which the state of the art contains research gaps that hinder their
effective and efficient assessment, and therefore assessment of the overall trust in decision
making. In year 1, we have identified two such key trust factors: provenance and credibility.
Both play a crucial role in determining quality of information and therefore trust in decision
making. The results of this research will feed the overall modeling efforts in T1.1 and T1.3
and the work being done to enhance trust in composite networks in T3.1.
Subtask T1.2a: Information Credibility: Definitions, Assessments, and Fundamental
Tradeoffs (Dan Roth, UIUC (INARC); Heng Ji, CUNY (INARC); Jiawei Han, UIUC
(INARC); Ramesh Govindan, USC (CNARC); Prasant Mohapatra, UCDavis (CNARC);
Sibel Adali, RPI (SCNARC); Malik Magdon-Ismail, RPI (SCNARC); Brian Uzzi, NW
(SCNARC))

Task Overview
In the context of NCO, a pre-requisite for trust-based decision making is the ability to assess
credibility of information and information sources. Different fields of study have examined
credibility (Rieh and Danielson, 2007) and have used subtly differing definitions. For our
purposes, we use credibility in the sense of believability: credible people are believable people,
and a piece of information is said to be credible if it is believable without specific evidence or
proof. Credible information is information that has face validity and so appears trustworthy on
the basis of circumstantial evidence rather than specific testing (Tseng and Fogg, 1999). Some
languages use the same word for credibility and believability, and for our purposes the two terms
are synonyms (Tseng and Fogg, 1999).
Credibility is not binary, since information can have different degrees of believability.
Credibility is different from other attributes of information – for example, credibility is different
from accuracy: credible information might be inaccurate. Similarly, credibility is different from
information freshness or timeliness. The final value of a piece of information will depend on its
credibility, its accuracy, and its freshness, in addition to several other factors. Note that
credibility judgments and perceptions may also be wrong, based on heuristic factors that are not
related to the accuracy or some other rational, testable attribute of the information. An aspect of
computing credibility is to understand how credible the information will be judged or perceived
to be when the judgment or perception is based partly on heuristic measures and partly on
intrinsic quality attributes of the information that are rational and objective. For example, on a personal

level, credibility assessment of a piece of information provided by a person could consider whether the
person could have had access to the type of information they are passing along, whether they are
capable of the mental judgment and reasoning behind the information, or whether they could
provide more evidence to substantiate the information even if such evidence is not asked for.
Trust and credibility are not synonymous, although in the literature of psychology and human-
computer interaction the two terms are often used interchangeably (Granovetter 1985;
Coleman 1988; Tseng and Fogg, 1999). As discussed earlier, in a trust relationship the trustor
encounters uncertainty from a lack of information or an inability to verify the integrity,
competence, predictability, and other characteristics of the trustee, and is vulnerable to suffering a
loss if expectations of future outcomes turn out to be incorrect. Trust allows the trustor to take
actions under uncertainty and from a position of vulnerability by depending on the trustee. One
useful (though simplistic) rule of thumb is that trust is about dependability while credibility is
about believability. Credibility is a trust antecedent or a trust factor – a trustor believes in the trustee
before deciding to depend on the trustee.
In addition to the intrinsic quality attributes, information credibility depends upon the credibility
of the nodes that originate, transmit, store, or otherwise process information. Clearly, the
competence of these nodes in performing these operations is a key determinant of their
credibility. However, in many situations, other mechanisms may also play key roles in
determining node credibility. For example, in an information network, credibility is often
conferred upon a source by some notion of authority or by reputation. Authority is formally
conferred upon a source while a reputation is earned by appropriate long-term behavior. Within a
communication network, it is possible to assess another form of credibility which we call
situational credibility. Simply put, situational credibility is the answer to the question: Could the
source have been in a situation where it could possibly have delivered credible information? For
example, a generally credible source might provide less credible information in the presence of
cognitive confusion (e.g., in a battlefield, a generally credible sensor source may happen to be
located in an area with too much noise, etc.).
The social dynamics in a network also affect node credibility. For example, in a social network,
individuals in a clique often draw on the same information and facts rather than on diverse points
of view and tend to have a myopic information perspective. As a consequence, the credibility of any
one person in the clique is often seen as high by any other person in the clique because they
"echo" the common assumptions, beliefs, and information that support credibility. Conversely,
actors in networks that bridge cliques may suffer from the lowest level of credibility since they
do not belong in the clique, even though they may actually be in the best position to learn new
things (i.e., things that are beyond the myopic view of any one person in the clique). At other
times, actors that bridge networks can have high credibility if they pass on new information from
one clique to another that is later found to be true, creating credibility reinforcement (Uzzi
1997; Burt 1999).
In this task, we will explore factors that affect information credibility and develop methods that
can be used to assess credibility jointly across the social, information, and communication
networks. The goal of this task is to develop an understanding of the fundamental trade-offs
involved in network design and operation that influence information credibility.

Research Question

How can we assess credibility of information in the social, information, and communication
networks, in particular, by exploiting latent factors other than the competency of network nodes?
How can ambiguity and information conflicts be resolved automatically and efficiently in a
dynamic situation?
Hypothesis
It is possible to improve on estimates of credibility by accurately assessing individual factors that
affect credibility, some of which can be determined individually within the social, information,
and communication networks and some jointly across these networks. These factors include the
situational assessment of sources, observed characteristics of communication, information, and
human nodes, dynamics of information propagation and social interactions as well as the degree
of corroboration obtained for a piece of information.
Prior Work
Credibility assessment has received significant attention in several fields such as sociology,
psychology, and education (Granovetter 1985; Coleman 1988; Burt 1992; Uzzi 1997; Rieh and
Danielson, 2007). Typically, within each field, experts have determined factors that affect
credibility in their respective areas. In the military context, (Wright and Laskey, 2006) have
examined definitions of credibility that depend upon the veracity of the information delivered
and the objectivity of the source. They have explored Bayesian models for learning these factors
and assessing credibility using multi-entity Bayesian networks. Hilligoss and Rieh (Hilligoss and
Rieh, 2007) have attempted a unifying definition of credibility based on a detailed user study of
several undergraduate students. Finally, (McKnight and Kacmar, 2007) have studied the factors
that influence the credibility of Web information sources.
Significant progress has been made recently towards developing a set of effective analysis
methods for heterogeneous information networks. In particular, it is possible to perform effective
classification over heterogeneous information networks (Yin et al., 2006); cluster heterogeneous
information networks efficiently by taking advantage of power-law distributions and hierarchical
relationships among objects (Yin et al., 2006b); integrate ranking and clustering in information
network analysis and improve the quality of both (Sun et al., 2009); and assess the
"trustworthiness" of information objects provided by multiple conflicting information sources
(Yin et al., 2008). All of these analysis methods are relevant to the proposed work.
In particular, the general framework for the trustworthiness analysis laid out by the TruthFinder
(Yin, et al. 2008) provides a foundation for the solutions that will be required to resolve the
impasse referred to earlier2. This framework examines the relationships between information
providers and the information they provided, with the following two major heuristics: (1) an
assertion that several information providers agree on is usually more credible than that only one
provider suggests; and (2) an information provider is credible if it provides many pieces of true
information, and a piece of information is likely to be true if it is provided by many credible
providers. The method links three types of information—(i) the information providers, (ii)
entities of interest, and (iii) stated facts on different entities—into a heterogeneous (i.e., multi-
typed) information network and performs in-depth information network analysis. It starts with no
bias on a particular piece of information, but uses the above heuristics to derive initial weights on

the credibility of the stated facts and information providers. It then consolidates the credibility
through a progressive, iterative enhancement process with weight propagation and consolidation
across this information network. The process is similar to the authoritative page ranking process
proposed in the HITS algorithm (Chakrabarti et al., 1998), except that the iterative refinement
computes credibility rather than authority scores.
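To make the iterative weight propagation concrete, the sketch below shows a TruthFinder-style update in which fact confidences and source credibilities reinforce each other until a fixed point; the claim data, the dampening factor, and the update rules are illustrative assumptions rather than the exact formulation of Yin et al.

```python
from math import prod

# A simplified TruthFinder-style iteration (illustrative only).  claims[s] is
# the set of facts asserted by source s; the data are made up.
claims = {
    "sensor_A":  {"convoy_at_grid_12", "bridge_intact"},
    "sensor_B":  {"convoy_at_grid_12"},
    "informant": {"bridge_destroyed"},
}
facts = set().union(*claims.values())
cred = {s: 0.5 for s in claims}           # start with no bias on any source
DAMP = 0.8                                # keeps confidences away from 0/1

for _ in range(20):                       # iterate toward a fixed point
    # Heuristic (1): a fact asserted by many credible sources is more believable
    # (probability that at least one asserting source is right, dampened).
    conf = {f: 1.0 - prod(1.0 - DAMP * cred[s] for s in claims if f in claims[s])
            for f in facts}
    # Heuristic (2): a source asserting believable facts is more credible.
    cred = {s: sum(conf[f] for f in claims[s]) / len(claims[s]) for s in claims}

print({f: round(v, 2) for f, v in conf.items()})   # corroborated facts score high
print({s: round(v, 2) for s, v in cred.items()})
```

In this toy run the corroborated convoy report ends with high confidence while the uncorroborated, conflicting claim decays, which is the qualitative behavior the cross-checking framework relies on.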
The simple framework proposed by Yin et al. is not sufficient for our purposes, as the scenario
considered in their work deals with static facts from relatively homogeneous information sources
(e.g., web bookstores). The requirements of military scenarios call for much more
sophistication; in particular, the following three technical issues emerge as key challenges: (1)
tracking dynamic situations by using heterogeneous information sources; (2) incorporating
unstructured data with structured data; and (3) detection of malicious information and sources
within the information network.
Technical Approach
Within the composite networks, a variety of mechanisms can be employed to assess information
credibility. This task will be a collaboration among the three centers – CNARC, INARC, and
SCNARC. In the following, we outline the focal points of activity within each center and
concrete plans for collaboration with the other centers:
Communication Networks
Assessing Node Credibility: Node credibility critically depends on the level of security that a
node employs in its communications, its susceptibility to intrusions, and the validity of its
information processing algorithms. Node credibility can be assessed by two methods: passive
observation of the node's behavior by other nodes within its vicinity, or active challenge-response
from other nodes. More sophisticated node credibility assessments are also possible. For
example, a "credibility agent (CA)" can walk through the network, establish interactions with the
entity nodes, and measure their credibility. The CA can be a single node or multiple nodes (a multi-
agent system), the latter providing higher robustness. This kind of credibility modeling and
validation technique, based on "expert assessments", is widely used both because it
can provide insight about the credibility model and because adequate real-world (experimental)
data are often not available to allow robust quantitative validation. To make expert assessments
more effective, there should be an effective way to select, manage, and use the experts (agents).
In addition, there should be a mechanism to effectively acquire and aggregate the expert
assessment opinions in the case of a multi-agent credibility assessment scheme [9, 10]. We will
make a detailed comparison of these techniques and derive appropriate tunings of these techniques
so that they can be successfully employed in tactical networks.
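As one hedged illustration of the passive-observation method, the sketch below maintains a Beta-distribution reputation for a neighboring node from counts of observed well-behaved and misbehaved interactions; the priors, the events, and the use of the posterior mean as the credibility estimate are assumptions, not the technique we will ultimately adopt.

```python
from dataclasses import dataclass

@dataclass
class PassiveObserver:
    """Beta-reputation estimate of a neighbor's credibility from passive
    observation (illustrative sketch; priors are assumed pseudo-counts)."""
    good: float = 1.0   # prior pseudo-count of well-behaved interactions
    bad: float = 1.0    # prior pseudo-count of misbehaviors

    def observe(self, well_behaved: bool) -> None:
        if well_behaved:
            self.good += 1
        else:
            self.bad += 1

    @property
    def credibility(self) -> float:
        # Expected value of the Beta(good, bad) posterior.
        return self.good / (self.good + self.bad)

obs = PassiveObserver()
for event in [True, True, False, True, True, True]:   # e.g., correct forwarding
    obs.observe(event)
print(round(obs.credibility, 2))   # ~0.75 after these observations
```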
Within the communication network, it is also possible to assess situational credibility of nodes.
Recall that situational credibility assesses whether the source could have been in a situation in
which it could have delivered credible information. Some examples of situations where the
source cannot deliver credible information are: being far away from an event, suffering from
cognitive confusion due to multiple concurrent activities or other diversionary events, being in
environments not conducive to accurate observation (low light, fog, etc.). Thus, situational
credibility in general depends upon: location, ambient activity and other ambient environmental
factors, and the activity of the (human) source itself (running versus walking, etc.).
In conjunction with the information network, the communication network can develop feedback
mechanisms to enable node credibility and in-network information credibility assessments (as

described below). The information network can track the long-term behavior of the source (either
a sensor or a human) and judge the reputation of the source. It can do this by judging the value of
information delivered by the source over a span of time: sources that consistently deliver
higher-value information are assigned a higher reputation. Reputation may be estimated
using statistical methods based on the history of data (Podolny 1993; Rina and Araki, 2004;
Makov). This reputation can be fed back into the communication network to enable in-network
information credibility assessments.
Assessing Information Credibility. Within a communication network, it is also possible to assess
information credibility. As discussed above, information credibility depends upon the credibility
of the source and the credibility of nodes that process or forward information. Information
credibility may also be intrinsic to the type of information: video may be more credible than
audio, which in turn may be more credible than text. Finally, within the communication network
it is possible to assess information credibility by corroborating information from multiple sources
(it is also possible to do this within the information network, as discussed below).
Credibility can be evaluated using different metrics and in very different ways. For example, a
node's/agent's opinion about the credibility of another node can be described by a continuous
value in [0, 1]; Bayesian approaches generally use this kind of metric (Herzog, 1990). Some
schemes use probabilities or statistics as credibility metrics (Landsman and Makov,
1999b). Others employ linguistic descriptions of credibility (Tseng and
Fogg, 1999; Fogg and Tseng, 1999). More generally, we assume that information credibility can
be assessed by a computable function of the various factors described above. The output of this
function is a qualitative assessment of the credibility (high/medium/low credibility), and the
function itself can take several forms: a set of rules, a statistically trained model (such as a
Bayesian network), etc. Thus, the credibility function reduces several factors within the network
into an estimate of credibility computed within the communication network. We highlight three
important aspects of this in-network credibility assessment, which point to interesting research
directions:
- The assessment of credibility may evolve over time. For example, the network may have assessed the credibility of a piece of information at time T, but at some later time T', it may receive corroborating information which may increase the credibility of the information already delivered by the network.
- Credibility assessments may be examined post-facto. The communication network may deliver qualitative credibility assessments which can be used by the information network. However, the factors used by the communication network to reach its decision (e.g., source and situational credibility, the corroborating pieces of information) may be stored within the network and may be retrieved later by the information network to enable it to make more nuanced credibility assessments. This tension between push-and-pull is an important area of collaboration between CNARC and INARC.
- Proactive credibility assessments can enhance credibility. The communications network, either unilaterally or in conjunction with the information network, can task assets in order to proactively determine or enhance the credibility of a piece of information. For example, it can increase the degree of corroboration of a piece of information by selecting a human or sensor asset close to the target of interest. In determining the set of assets to select, the network should consider the cost/benefit tradeoff for different sources, as well as consider the situations the sources may be in (which can bias the reported information).
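As an illustration of the kind of in-network credibility function described above (a computable reduction of several factors to a qualitative high/medium/low assessment), the rule-based sketch below combines source reputation, situational credibility, and corroboration; the factor names, weights, and thresholds are assumptions chosen for exposition only.

```python
def assess_credibility(source_reputation: float,
                       situational_credibility: float,
                       corroborating_reports: int) -> str:
    """Reduce several in-network factors to a qualitative credibility label.
    Weights and thresholds are illustrative, not validated values."""
    # Corroboration saturates: a few independent reports already help a lot.
    corroboration = min(corroborating_reports, 3) / 3.0
    score = (0.4 * source_reputation +
             0.3 * situational_credibility +
             0.3 * corroboration)
    if score >= 0.7:
        return "high"
    if score >= 0.4:
        return "medium"
    return "low"

# A reputable source observing under poor conditions, with one corroborating report:
print(assess_credibility(0.9, 0.3, 1))   # -> "medium"
```

A rule set of this kind could later be replaced by a statistically trained model (such as a Bayesian network) without changing the qualitative output delivered to the information network.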

Thus, from the CNARC perspective, the research conducted on this task will examine
models and methods for assessing node credibility, determine methods for computing
information credibility in-network, and proactively enhance the credibility of information. In
the short term we will work on the following problems:
- Develop a taxonomy of credibility and the objects of credibility assessment (sources, nodes, etc.), and exemplify various credibility measures that may be used in tactical settings.
- Explore expert assessment methods for node credibility assessment.
- Understand the credibility frontier of a network (the amount of credible information that could be delivered by a network) and how it depends upon the type of credibility functions, as well as the type of network.
- Discuss with INARC the push-vs-pull tradeoffs for the information used to assess credibility.
- Find classifiers for patterns of networked activity that can classify relationships as credible or not.
- Use natural experiments in real data to reveal trust and credibility.

Information Networks
Within the information network, by constructing a rich information network and by analyzing
interconnected information in this network, it is possible to cross-check and mutually consolidate
the information from multiple sources so that ambiguities and conflicts can be resolved
automatically and efficiently in a dynamic situation. The diversity of sources produces rich data
about the history (more generally, spatiotemporal properties) of the information objects that can
facilitate information analysis. The philosophy is to make use of interconnected entities and the
relationships among them in an information network to cross-check and mutually consolidate the
information and thus assess various quality attributes of information and information sources in a
specific context. Such an information consolidation process that works by cross-checking and
mutual consolidation of interconnected entities and their relationships in an information network
is colloquially referred to as the self-boosting of information networks.
As mentioned earlier, the credibility of a sensor or information source will be highly linked with
the context in a heterogeneous information network. For example, some sensors will be excellent
at detecting low speed, heavy objects while others will be tuned for cold, low-resolution
environments, and yet others for highly noisy environments.
For example, a battle-field may be under observation by sensors of different modalities as well as
by personnel belonging to different organizations (and therefore, with different allegiance,
equipment, training, etc.). Each information source has a different vantage point of the battle-
field in which a highly dynamic and volatile situation is unfolding. In this scenario, it may be
impractical to rely only on sensors or people with the highest capabilities, pre-determined credibility,
etc., as these information sources may not be located in optimal places to observe a phenomenon.
Furthermore, even the sensors or people with the highest capabilities or pre-determined credibility
may provide conflicting information about the same entity. To overcome this impasse, the idea is

to acquire additional information from sources that may not be so credible in general but can
provide credible information in a particular context by cross-checking and mutual enhancement.
To handle different contexts, we propose to define the notion of credibility in a multidimensional
space. In particular, we consider that a credibility assessment is associated with multiple
conditions, such as time, location, environmental, and other factors. An information
source (such as a sensor) can be reliable in one environment but rather unreliable in another.
Also, a stated fact could be accurate at one time or in one environment but inaccurate in another. Thus
credibility analysis will be performed at a fine granularity, partitioned by environment:
credibility scores will be associated with each environment, and the evaluation process will
be carried out across multiple environments. Moreover, cross-environment analysis will be
conducted to derive maximally integrated environments so that credibility can be transferred
or generalized to different environments.
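To make the multidimensional notion concrete, the sketch below keeps separate credibility estimates per (source, context) pair, where a context is a coarse discretization of time, location, and environment; the context keys and the back-off to an overall per-source estimate are illustrative assumptions.

```python
from collections import defaultdict

# Per-(source, context) credibility scores; a context is an illustrative
# discretization such as ("night", "urban", "high_noise").
scores = defaultdict(lambda: [1.0, 1.0])   # [agreements, disagreements] pseudo-counts

def update(source, context, agreed_with_ground_truth):
    key = (source, context)
    scores[key][0 if agreed_with_ground_truth else 1] += 1

def credibility(source, context):
    """Context-specific estimate, backing off to the source's overall record
    when the exact context has not been observed (an assumed policy)."""
    key = (source, context)
    if key in scores:
        a, d = scores[key]
    else:
        pairs = [v for (s, _), v in scores.items() if s == source] or [[1.0, 1.0]]
        a = sum(p[0] for p in pairs)
        d = sum(p[1] for p in pairs)
    return a / (a + d)

update("acoustic_sensor_7", ("night", "urban", "high_noise"), False)
update("acoustic_sensor_7", ("day", "open_terrain", "low_noise"), True)
print(round(credibility("acoustic_sensor_7", ("night", "urban", "high_noise")), 2))  # ~0.33
print(round(credibility("acoustic_sensor_7", ("fog", "urban", "low_noise")), 2))     # backs off, ~0.5
```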
To account for highly dynamic environments, we will consider incremental updates of the current
information network and subsequent revision of credibility assessments. The new incremental
information will be either in the form of stored records or in the form of dynamic data streams.
Training can be integrated into this framework to learn the weights associated with
different features based on training data or user feedback.
Furthermore, we will focus on unstructured data, as a significant amount of data in today's
information networks (some studies say over 85%) is unstructured, mostly textual. Information
extraction from text is a difficult and involved process that requires machine learning and
inference methods due to (a) the variability with which meaning can be expressed in the surface
form of natural language, (b) the inherent ambiguity of natural language, and (c) the context
sensitivity of the interpretation. Consequently, multiple pipelined decisions need to be made in
order to extract information from text, and errors may accumulate in the process. For
each step in this pipeline, we can think of the information extraction tool as a machine
learning algorithm that provides a distribution over possible predictions, leading to the concept of
algorithmic credibility. The iterative framework of cross-checking and mutual consolidation
should then account for this representation of uncertainty when verifying consistency and propagating
credibility assessments in the information network.
Note that our focus will be on the inclusion of algorithmic credibility in the iterative framework,
as opposed to information extraction from the text itself. For the latter, we will make use of the
state-of-the-art tools already developed at the Cognitive Computation Group at the University of
Illinois (http://l2r.cs.uiuc.edu/~cogcomp/demos.php) and the BLENDER Lab at the City
University of New York (http://nlp.cs.qc.cuny.edu/software.html). Our focus in this program will
be on learning good models from different sources with different kinds of associated uncertainty
(context sensitivity) and how to make use of these models in estimating and enhancing
credibility of unstructured information and sources thereof.
The final aspect that we will study is to include background knowledge while assessing
algorithmic credibility. This is especially important when we begin to consider malicious data
sources which inject false information in the information network. Enforcing background
knowledge will not only allow the detection of malicious information and sources but also allow
us to correct this type of information. Specifically, we propose to use Constrained Conditional
Models (CCMs) (Chang, Ratinov, Rizzolo & Roth, 2008, Punyakanok et. al., 2005, Roth & Yih,
2004) as a framework for integrating different information sources and prior information

provided by human experts. The CCMs framework augments probabilistic models with
declarative constraints and provides a way to integrate multiple statistical models with an
expressive output space while maintaining modularity and tractability of parameter estimation.
Declarative knowledge, which will encode the background knowledge, can be represented as
hard or soft constraints with associated weights or penalties, which can be learned. Soft
constraints could be used to model agreements across multiple sources in the network or
conditions on the number of simultaneous signals of a specific type. The resulting composite
model can be regarded as a collection of components responsible for statistical modeling of
different information sources or different aspects of these information sources. The key research
issue here is how to make the best use of background knowledge that can be expressed as
declarative statements in credibility assessment to reject or identify malicious information or
information sources. Here we note that various policies (security, management, etc.) are also
specified as declarative statements. Thus the framework developed here could potentially be
applied when credibility assessment needs to conform to certain policy requirements. The
research team will further develop implications of these connections in the later years of the
program.
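The following sketch conveys the flavor of a constrained inference step of the kind described above: scores from independent statistical extractors are combined, and a declarative soft constraint penalizes assignments that contradict background knowledge. The slots, scores, constraint, and penalty are invented for illustration and do not reproduce the CCM formulation of Chang, Ratinov, Rizzolo & Roth.

```python
from itertools import product

# Candidate labels proposed by two (hypothetical) extractors for two slots,
# with their model scores (log-probabilities would work the same way).
slot_scores = {
    "vehicle_type": {"tank": 0.7, "truck": 0.3},
    "vehicle_count": {"1": 0.4, "12": 0.6},
}

def constraint_penalty(assignment):
    """Declarative background knowledge as a soft constraint (assumed):
    this particular type/count combination is considered implausible."""
    if assignment["vehicle_type"] == "tank" and assignment["vehicle_count"] == "12":
        return 0.5          # learned-style penalty for violating the constraint
    return 0.0

def best_assignment():
    """Constrained inference: maximize summed model scores minus penalties."""
    slots = list(slot_scores)
    best, best_score = None, float("-inf")
    for labels in product(*(slot_scores[s] for s in slots)):
        assignment = dict(zip(slots, labels))
        score = sum(slot_scores[s][assignment[s]] for s in slots)
        score -= constraint_penalty(assignment)
        if score > best_score:
            best, best_score = assignment, score
    return best, best_score

print(best_assignment())   # the soft constraint flips the count to "1"
```

The same pattern suggests how declaratively specified policies could later be enforced during credibility assessment, by treating them as additional constraints in the inference step.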

Social Networks
Another notion of credibility is based on the status and prominence of individuals in the social
network (Podolny 1993). We postulate that prominence is a self-dependent mechanism: the
prominence of individuals in a society is a function of the prominence of the communities they
belong to, and the prominence of a community is a function of the prominence of the individuals
participating in that community. The more prominent an individual is, the more likely it is that the
information they provide will be considered credible. In order to validate our hypothesis,
we will develop quantitative methods for inferring prominence based on community and meta-
community structure. The theory should be general enough to apply to a wide variety of social
network domains. The theory should also come with efficient algorithms so that we can measure
and validate the prominence measures on large real data. To that end, we will develop external
measures of prominence so that we can validate the quantitative predictions as implied by the
theory and as measured by the algorithms.
We will first develop methods for identifying communities and how communities interact with
each other based on observed interactions between individuals. This will rely heavily on the
work done in task S.2 (Adversary networks: detection and evolution, which builds a basic
understanding of how to detect communities). We will need algorithms that scale up to million
node networks, and so we will develop two types of algorithms: i) rapid metric based algorithms
for clustering nodes in a graph into overlapping clusters based first on a low dimensional
embedding of the graph and then on rapid spectral methods for estimation of mixtures; and ii)
iterative local optimization of clusters based on density metrics which find communities with
more internal connectivity than external.
Given communities discovered by these algorithms, we will develop an iterative algorithm
that bootstraps an initial crude estimate of prominence based on interactions into a refined
measure of prominence that takes into account the community structure. Since our algorithm is similar
in construction to TruthFinder, it provides us with a natural meta-algorithm that combines the
information and social networks. We will test whether it is possible to improve information
credibility computation by considering the degree of corroboration and the prominence of the nodes.

Similarly, we will examine to what degree communities favor "consistency" in determining the
prominence of individuals. We will then examine whether these findings can be combined in a
meta-algorithm that iterates across the two different network types.
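The mutually recursive definition of prominence (individuals from their communities, communities from their members) can be computed by a fixed-point iteration of the kind sketched below; the membership data, the crude interaction-based initialization, and the normalization are illustrative assumptions rather than the algorithm we will ultimately evaluate.

```python
# Illustrative fixed-point refinement of the self-dependent prominence notion:
# an individual's prominence averages that of their communities, and a
# community's prominence averages that of its members (made-up data).
membership = {
    "alice": {"analysts", "field_ops"},
    "bob":   {"analysts"},
    "carol": {"field_ops", "logistics"},
}
communities = {c for cs in membership.values() for c in cs}

# Crude initial estimate, e.g., proportional to observed interaction counts.
person_prom = {"alice": 3.0, "bob": 1.0, "carol": 2.0}

for _ in range(50):
    comm_prom = {c: sum(v for p, v in person_prom.items() if c in membership[p]) /
                    sum(1 for p in membership if c in membership[p])
                 for c in communities}
    person_prom = {p: sum(comm_prom[c] for c in membership[p]) / len(membership[p])
                   for p in membership}
    # Normalize so prominence stays on a fixed relative scale.
    total = sum(person_prom.values())
    person_prom = {p: v * len(person_prom) / total for p, v in person_prom.items()}

print({p: round(v, 2) for p, v in person_prom.items()})
```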
Validation Approach
Ideally, datasets simulating battlefields with dynamically unfolding situations are the most
desirable means to validate the research approach and products. Short of that, daily (or even
hourly or continuously updated) news provides a rich dataset to test our research approach, since
(1) it has multiple contexts (e.g., sports, politics, business, etc.); (2) it often describes an unfolding
situation (e.g., for the earthquake in Haiti, casualty estimates started at 50,000 and later proved to
be too low); (3) it can be obtained in a streaming manner and can be incrementally updated
(e.g., from Google News); and (4) it provides unstructured real-world data. We will construct
information networks on several news datasets and test how to use interconnected datasets to
enhance the credibility of news when encountering conflicting information. If datasets simulating
battlefields are available, they will be used to validate the possible usage of our methods and to
judge their effectiveness in military applications. Other potentially useful datasets include Wikipedia pages
with infoboxes and the NIST Knowledge Base Population evaluation corpora.
These datasets can be used both by INARC and CNARC to provide synthetic workloads for
credibility assessment techniques. In addition, mathematical analysis and simulations can help
assess fundamental questions about trade-offs across various mechanisms.
For testing prominence, we have two data sets in mind: the DBLP coauthorship database (where
interaction is coauthorship); the Twitter network (where interaction is communication). These
two datasets provide us with the possibility of providing external measures of prominence as
well as the link between prominence and credibility (by way of forwarding behavior). They also
provide us with the ability to look at different types of communities: such as those based on peer
review as well as open/ad-hoc communities.
The last aspect of testing our algorithms will be to apply them to rigorous tests with real data.
Brian Uzzi has a unique dataset containing all 2 million instant messages sent between 66 traders
at a hedge fund. All traders communicate regularly with their network through these IMs in order
to get information about trading. The information is itself ambiguous – it is based on speculation
and hunches – and it cannot be directly verified. (If it could be directly verified, it would be priced
into the market.) The data permit us to examine the relationship between traders' IMing
(content, inter-event time, size of network, repeatedness of network ties, and so forth) before
and after the 2008 crash. The crash provides a natural experiment in which the overall level
of uncertainty in the system increased dramatically. One can think of the crash as a shock to the
system not unlike shocks that arise from new terrorist attacks, attacks of new types (double
agents at CIA bases), and new information about tactics. By looking at how the network of
relationships adapts to the new uncertainty before and after the crash, we can understand what
signal in the network of transmissions classifies credibility and trust. We know this is possible
because when the crash occurs the IM network changes significantly – presumably being
refocused on the most trusted links. Further, we can test the value of these ties by checking
whether they help traders make better trades. Consequently, we can track IM network
behavior before and after the crash, see how the network changes after the crash, find an
algorithmic classifier for which ties persist after the crash, and then see whether different changes
in the network lead to better or worse performance under uncertainty.

Products
Algorithms that can be used for estimating and enhancing the credibility of sources, nodes, and
information, hence improving the quality of information networks; an assessment of the
effectiveness and an exploration of the limitations of these algorithms for different classes of data
sources.
References
Ronald S. Burt 1992. "Structural holes: The social structure of competition." Boston, MA:
Harvard University Press.
James S. Coleman, 1988. "Social capital in the creation of human capital." American Journal of
Sociology 94:S95-S120.
Fei Liu, Ming Yang, Zicai Wang, "Study on simulation credibility metrics", Proceedings of the
37th Conference on Winter Simulation, Orlando, Florida, pp. 2554-2560, 2005.
B. J. Fogg and H. Tseng, "The Elements of Computer Credibility", Proceedings of CHI'99,
Pittsburgh, PA, May 15-20, 1999, 80-87.
K. Giffin, "The Contribution of Studies of Source Credibility to a Theory of Interpersonal
Trust in the Communication Process", Psychological Bulletin, 68(2), 1967, 104-120.
Mark Granovetter. 1985. "Economic action and social structure: The problem of
embeddedness." American Journal of Sociology 91:481-510.
D. Harrison McKnight and Charles J. Kacmar, "Factors and Effects of Information Credibility",
ICEC'07, August 19-22, 2007, pp. 423-432.
Herzog, T.N. (1990). Credibility: The Bayesian model versus Bühlmann's model. Trans. Soc. of
Act. 41, 43-88.
B. Hilligoss and S. Y. Rieh, Developing a Unifying Framework for Credibility: Construct,
heuristics and interaction in context, Information Processing and Management 44 (2008) 1467–
1484.
Landsman, Z. and Makov, U.E. (1999b). "On stochastic approximation and credibility".
Scandinavian Actuarial Journal, 1, 15-31.
Udi E. Makov, "Credibility evaluation for heterogeneous populations, ASTIN topic: Risk
evaluation", http://hevra.haifa.ac.il/stat/actuary/article/makov.pdf
Joel M. Podolny. 1993. "A status-based model of market competition." American Journal of
Sociology 98:829-872.
S. Y. Rieh & D. R. Danielson (2007). Credibility: A multidisciplinary framework. In B. Cronin
(Ed.), Annual Review of Information Science and Technology (Vol. 41, pp. 307-364). Medford,
NJ: Information Today.
S. Tseng and B. J. Fogg, "Credibility and Computing Technology", Communications of the
ACM, 42, 5, 1999, 39-44.
Brian Uzzi. 1997. "Social structure and competition in interfirm networks: The paradox of
embeddedness." Administrative Science Quarterly 42:35-67.
E. Wright and K. Laskey, Credibility Models for Multi-Source Fusion - Proceedings of the 9th
International Conference on Information Fusion, July 2006.

Wu Rina and Kenji Araki, "Credibility Evaluation of Candidates for Input Prediction Method",
International Symposium on Communications and Information Technologies (ISCIT 2004),
Sapporo, Japan, October 26-29, 2004.

Subtask T1.2b Models and Maintenance of Provenance Metadata for Enhancing Trust in
Information (Dakshi Agrawal, IBM (INARC); Tasos Kementsietsidis, IBM (INARC);
Mudhakar Srivatsa, IBM (INARC); Tarek Abdelzaher, UIUC (INARC); Ted Brown,
CUNY (INARC); Ramesh Govindan, USC (CNARC); Prasant Mohapatra, UCD
(CNARC); Munindar Singh, NCSU (CNARC); Ching-Yung Lin, IBM (SCNARC); Zhen
Wen, IBM (SCNARC))

Task Overview
The goal of this task is to model provenance metadata and develop methods for maintaining
provenance metadata within the information and communication networks so that trust in
information can be estimated accurately and enhanced appropriately. It is well known that the
quality of information is a major contributor to the trust that decision makers place in
information. Each piece of information has certain important intrinsic quality attributes, and
within the context of NS-CTA, provenance has been identified as one such key intrinsic QoI
attribute. During the first year, we will focus our attention on modeling provenance and
providing scalable provenance tracking across the information and communication networks, and
analyzing the impact of these techniques on credibility assessments.
Task Motivation
Primary sources of information in military domains (e.g., sensor readings, UAV data, satellite
images, reports from spies, coalition partners and local militia, etc.) are inherently heterogeneous
in terms of their capabilities as well as their affiliations and motives. We envision that in the
future: (i) raw data and processed information will be augmented with QoI attributes (more
specifically, with provenance metadata); (ii) end-users will query for information that is most
valuable to their mission goals and objectives (for example, a user may only want information
whose provenance can be traced to information sources from some selected organizations); and
(iii) end-users will make trust-based decisions in which credibility of the information, and
therefore, the provenance of information plays a major role.
In dynamic networks, provenance modeling and tracking becomes one of the main technical
challenges in assessing credibility for the following reasons: (a) provenance metadata grows as
the information is processed through the information pipeline; (b) provenance metadata could be
sensitive and its sensitivity may be time-dependent; (c) given the constraints of communication
networks, provenance information may sometimes be inaccurate or incomplete. Hence, we need
a suite of provenance models and network mechanisms that are flexible so that they can address
end-user missions with different requirements and cope with varying availability of
communication, information, and cognitive resources. Moreover, since provenance information
is necessary to establish the credibility of information, it is important to understand how the
incompleteness of provenance can impair credibility assessments and how the impact of
incomplete provenance can be minimized.
Key Research Questions
The overarching research problem can be summarized as follows: How does provenance affect
credibility? What are appropriate provenance models? How can trust in distributed decision

making be enhanced by flexible tracking of provenance metadata (along with other security-
related intrinsic quality attributes) of information?
Initial Hypotheses
We hypothesize that appropriately designed provenance summaries can be used to assess
credibility in a majority of cases, ensuring that detailed provenance information needs to be
pulled from the network relatively infrequently. We also hypothesize that annotations provide an
intuitive solution for succinctly tracking the provenance metadata of sensitive information products,
and that one can extend traditional relational algebra to support expressive and scalable query
answering systems over the provenance metadata. Moreover, additional corroboration can be
used to assess credibility even in the presence of incomplete provenance information, thereby
minimizing the impact of missing provenance metadata.
Prior Work
Provenance ("to come from") in general refers to the origin, the source, or the history of
ownership or location of an information product. Provenance tracking and querying has been an
active topic of research for more than a decade (Chiticariu and Tan, 2006; Buneman et al., 2002;
Bowers et al., 2007). One of the most common (but often overlooked) challenges is the size of
the provenance metadata itself, which in several cases could be many orders of magnitude larger
than the size of the information product. To see why this is so, consider a simple battlefield report in plain text,
delivered through the information network to an army commander. To generate this report, the
information network might have combined written reports from field units, reports from
confidential army archives, satellite pictures of the terrain, and recent photographs and video
taken from army drones. All of this input data is part of the data provenance of the field report;
while the field report might only be a few kilobytes in size, its provenance includes satellite
images and photographs/video, which are commonly on the order of megabytes. Another aspect
of provenance, commonly referred to as workflow provenance, includes the complete process
graph used within the information network to produce the information product. So provenance
considers not only the raw data used to produce an output but the whole process used to combine
and analyze these raw data. Clearly, keeping track of the associations between the data that flows
in the information network and its associated (data/workflow) provenance is very important for
trust in information, and it presents many challenges, some of which are rather unique to the
military context. We review these challenges next.
Sensitivity of tactical military information poses an unexplored challenge in provenance tracking.
Traditional applications of provenance tracking have focused on "public" information such as
scientific workflows (Bowers et al., 2007) or assumed that all data items of interest are of the
same sensitivity level such as medical records (Geerts et al., 2006). On the other hand, in military
applications, information producers (e.g., electronic sensor, spy, coalition partner, etc.) and
consumers (e.g., battlefield soldier, field commander, etc.) can be heterogeneous; hence, it may
be necessary to conceal provenance data in order to protect a sensitive information product or
source. The key challenge here is to build security conscious provenance models that account for
the clearance levels of information producers and consumers and the sensitivity of the
information product.
Dynamism in the information network poses another unique challenge in military applications. In
such applications, provenance data may be striped across distributed storage units over a weak or
disconnected communication network. The challenge here is to identify smart placement
strategies for provenance data over an unreliable communication network with the goal of
enhancing its real-time availability. Traditional data placement strategies (e.g., LRU-based data
caching) do not apply to provenance data, typically because provenance queries (unlike
Web/Internet traffic) do not exhibit temporal locality. Hence, it becomes important to
devise quality-aware caching strategies that improve real-time access to the most vital
provenance data – sometimes even forgoing the completeness of provenance data. Identifying
and presenting the most vital provenance data may also be useful in addressing the first
challenge (provenance data summarization) and mitigating the information overload problem for
human decision makers.
Provenance data summarization refers to the problem of scalably maintaining and answering
expressive queries over provenance data in information networks. While scalability and
expressiveness have been actively explored in the context of databases (see the survey paper by
Tan, 2007), traditional approaches do not address multi-resolution queries over provenance data,
which can provide users with different trade-offs between cost (e.g., cognitive attention) and
benefit (e.g., better determination of the credibility of information) in the decision making process. For
example, while delivering data to a soldier with cognitive overload, quality attributes such as
provenance can be conveyed by a simple number in the [0, 1] interval (or, more intuitively, by a
simple color indicator with, say, green indicating high confidence and orange or red indicating
lower confidence). The soldier can quickly inspect this indicator and decide whether to trust the
information or not in the context of their particular mission. On the other hand, field
commanders with long term mission objectives may need detailed provenance of an information
product used in decision making. Hence, there is a need to support queries over provenance data
at multiple levels of granularity that can be tailored to the end-user needs.
Technical Approach
Sensitivity of tactical information requires a security-aware provenance model; for instance, the
provenance may be aware of security clearance levels (e.g., unclassified, confidential, secret,
etc.) or organizational roles and role hierarchies (say, derived from a RBAC model – Role Based
Access Control). To this end, an investigation of annotation-based provenance models
(Srivastava & Velegrakis, 2007; Geerts et al., 2006) is necessary since annotations provide a
convenient method to attach security metadata to data, be it atomic values or more complex data
structures (e.g., hierarchies). For example, Geerts et al. used atomic value annotations (called
colors – such as red, blue, etc.) and developed a color relational algebra for tracking such
annotations in medical record data. However, these provenance models lack the ability to
represent more complex data structures (e.g., hierarchical security levels) and as such, lack many
features necessary to support the security requirements of military missions. Extensions of the
provenance models necessary to address security requirements will potentially affect both the
storage model and the query algebra.
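To illustrate the style of annotation-based provenance model discussed above (though neither the color algebra of Geerts et al. nor the specific algebra we will develop), the sketch below attaches security-level annotations to data items and propagates the most restrictive annotation through a join-like combination; the lattice of clearance levels and the propagation rule are assumptions.

```python
# Illustrative annotation propagation: each derived item carries the sources it
# came from and the most restrictive security level among them (assumed rule).
LEVELS = {"unclassified": 0, "confidential": 1, "secret": 2}

def combine(*items):
    """Join-like combination: union the provenance, take the max sensitivity."""
    return {
        "sources": set().union(*(i["sources"] for i in items)),
        "level": max((i["level"] for i in items), key=LEVELS.__getitem__),
    }

field_report = {"sources": {"unit_12_report"}, "level": "unclassified"}
satellite    = {"sources": {"sat_pass_0413"}, "level": "secret"}

fused = combine(field_report, satellite)
print(fused)
# {'sources': {'unit_12_report', 'sat_pass_0413'}, 'level': 'secret'}
# A consumer cleared only to 'confidential' would see the fused product's
# provenance filtered or summarized rather than the full annotation set.
```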
Provenance data summarization requires effective mechanisms to filter and gradually retrieve
provenance metadata at varying levels of granularity. Further, we need to determine when
provenance retrieval is initiated and which provenance metadata should be chosen first for
delivery. A promising approach in this direction relies on further exploiting the connections
between provenance and QoI. While provenance is one aspect of quality, quality might also help
us decide which data items are more important. We will explore quality-aware provenance
summarization and caching schemes, with the goal of delivering the end users with the highest
quality information (and its metadata) amidst resource constraints (e.g., handheld wireless device
with a battlefield soldier versus a powerful server with a field commander). Since this piece of

work requires a certain level of maturity in QoI research, we anticipate that part of it will
be deferred to subsequent years.
The communications network is an important component in provenance tracking, since
information is generally expected to be sourced from this network. Research in the core CNARC
program will explore mechanisms for tracking provenance within the communications network.
There are likely to be several mechanisms that lie at different points in the accuracy-versus-cost
space; in general, keeping more accurate provenance metadata, or ensuring high security of
that metadata, imposes greater network cost. In the Trust CCRI, we will explore the implications
of these choices on credibility and trust assessments. In a tactical network, since even small
pieces of information can have significantly large associated provenance metadata (as described
in the example above), the communications network will likely push only a provenance summary
along with the information that it delivers. In many cases, a provenance summary might suffice for
making useful credibility assessments. When it does not, more detailed provenance metadata can
be pulled from the network. An interesting question we will explore is this push-pull tradeoff:
how much provenance data can we push, in the form of provenance summaries, and what precise
form should the summaries take, so that credibility assessments only occasionally need to pull
data from the network.
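A minimal sketch of this push-pull division of labor follows, with an assumed ProvenanceSummary record, an assumed 0.7 reliability threshold, and a pull_full callback standing in for the (costly) network retrieval; none of these names or values are committed design choices.

    # Sketch of the push-pull tradeoff for provenance metadata.
    # ProvenanceSummary, the threshold, and the chain format are illustrative assumptions.

    import hashlib
    from dataclasses import dataclass

    @dataclass
    class ProvenanceSummary:
        num_sources: int        # how many distinct sources contributed
        min_reliability: float  # weakest link in the processing chain
        digest: str             # hash of the full provenance chain (integrity check)

    def summarize(full_chain):
        """Build the compact summary that is pushed along with the data item."""
        blob = "|".join(step["id"] for step in full_chain).encode()
        return ProvenanceSummary(
            num_sources=len({step["source"] for step in full_chain}),
            min_reliability=min(step["reliability"] for step in full_chain),
            digest=hashlib.sha256(blob).hexdigest(),
        )

    def assess_credibility(summary, pull_full):
        """Use the pushed summary when it suffices; otherwise pull details."""
        if summary.min_reliability >= 0.7 and summary.num_sources >= 2:
            return "credible (summary only, no pull needed)"
        full_chain = pull_full()            # costly network pull
        # ... detailed, step-by-step credibility analysis would go here ...
        return f"pulled {len(full_chain)} provenance steps for detailed analysis"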
As discussed above, in some cases, it may be important to have more complete provenance
metadata in order to make high confidence credibility assessments. More detailed provenance
information can be stored in distributed fashion within the communications network, and can be
dynamically retrieved when it is necessary to make such an assessment. However, the
communication network cannot be expected to reliably store, in perpetuity, all provenance
metadata. Nodes may fail or run out of storage, and complete provenance
metadata may therefore be unavailable. We will study how incomplete provenance information impacts the
quality of credibility assessments. For example, when only partial provenance information is
available, a credibility assessment may place higher reliance on corroboration from additional
sources. In addition, we will study in-network provenance metadata management techniques
that intelligently manage metadata storage, for example by aging entries or by purging
metadata that is unlikely to be required in the future.
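As one illustration of such in-network management, the sketch below keeps a bounded metadata store and purges the entries whose last access is oldest; the class, its capacity, and the purge rule are assumptions for illustration only.

    # Sketch of in-network provenance metadata management with aging.
    # MetadataStore, its capacity, and the purge rule are illustrative assumptions.

    import time
    import heapq

    class MetadataStore:
        """Bounded store that purges the least recently accessed entries first."""
        def __init__(self, capacity=1000):
            self.capacity = capacity
            self.entries = {}          # item_id -> (provenance, last_access_time)

        def put(self, item_id, provenance):
            self.entries[item_id] = (provenance, time.time())
            if len(self.entries) > self.capacity:
                self._purge(len(self.entries) - self.capacity)

        def get(self, item_id):
            if item_id not in self.entries:
                return None            # incomplete provenance: caller must degrade gracefully
            prov, _ = self.entries[item_id]
            self.entries[item_id] = (prov, time.time())   # refresh age on access
            return prov

        def _purge(self, k):
            oldest = heapq.nsmallest(k, self.entries.items(), key=lambda kv: kv[1][1])
            for item_id, _ in oldest:
                del self.entries[item_id]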
Validation Approach
We will validate our research products along two dimensions: scalability and expressiveness. We
will study the scalability of our approach by analyzing the space-time complexity of various
provenance tracking algorithms with respect to their expressiveness characterized by the richness
of query answering supported over provenance metadata (e.g., what class of provenance queries
can be supported at what cost by a provenance tracking system?).
Beyond the theoretical studies of scalability and expressiveness, it will be crucial to validate the
research by integrating it with realistic gaming scenarios in which the performance of two
decision makers – one with provenance information as discussed above and the other without it –
is compared. We anticipate that such a framework will be available in later years through IRC
project R3 which will develop a network science testbed for experimentation.
In the meantime, we will explore using the SCNARC SmallBlue dataset, which contains data
from a unique large-scale info-social network system
(http://smallblue.research.ibm.com). This dataset provides information from numerous sources
(e.g., social network, email transactions, personal qualifications, etc.). An interesting validation
direction would be to inject corruptions in this dataset through an information processing
pipeline and study the effects of provenance on trust-based decision-making by simulating two
software agents – one agent possesses the provenance information and the other does not.
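A schematic version of this validation design is sketched below, with an assumed record format, an assumed corruption model tied to compromised sources, and two stand-in agents; it illustrates the comparison only and does not use the SmallBlue schema.

    # Sketch of the planned validation: inject corrupted records into a dataset and
    # compare an agent that uses provenance against one that ignores it.
    # The record format, corruption model, and agents are illustrative assumptions.

    import random

    def make_dataset(n=1000, n_sources=20, compromised=("source_3", "source_7"), seed=0):
        rng = random.Random(seed)
        data = []
        for i in range(n):
            source = f"source_{rng.randrange(n_sources)}"
            corrupted = source in compromised and rng.random() < 0.8
            data.append({"id": i, "value": rng.random(), "source": source,
                         "corrupted": corrupted})
        return data

    def agent_without_provenance(record, blacklist):
        return True                                  # accepts everything it receives

    def agent_with_provenance(record, blacklist):
        return record["source"] not in blacklist     # filters known-bad sources

    def run_trial():
        data = make_dataset()
        # Suppose the provenance-aware agent has learned that two sources are compromised.
        blacklist = {"source_3", "source_7"}
        for name, agent in [("no provenance", agent_without_provenance),
                            ("with provenance", agent_with_provenance)]:
            accepted = [r for r in data if agent(r, blacklist)]
            bad = sum(r["corrupted"] for r in accepted)
            print(f"{name}: accepted {len(accepted)} records, {bad} corrupted")

    if __name__ == "__main__":
        run_trial()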
Summary of Military Relevance
Provenance of information products will be a key factor in trust-based distributed decision
making. Military environments are inherently uncertain and are under a constant threat of
network compromise. Being able to efficiently track provenance of information products that are
crucial to an end-user mission is essential for trust-based decision-making.
Research Products
We expect a majority of our initial effort will be spent in developing scalable and expressive
provenance models, summarization techniques, and tracking and querying algorithms for
information products in a dynamic information network, and in understanding how provenance
and the push-pull tradeoff affect credibility assessments. We will also exploit synergy with
Projects I1.3 and C1.2 to investigate non-security related QoI attributes (e.g., uncertainty, data
freshness) arising from fusion of information from heterogeneous information sources and
network delivery modalities. We will also leverage Project I2 to explore solutions for expressive
and scalable QoI-aware query answering through optimal organization of QoI data. We will
collaborate with Project T3.1 to examine the impact of provenance and credibility on enhancing
trust; in the subsequent years, we will also attempt to integrate provenance and credibility
models with a unified trust model researched under Project T1.1. This project has synergies with
C1.4, which models the impact of security properties on QoI, considering, for example, the joint
impact of security and data delivery on QoI.
In the subsequent years, this task seeks to expand the capabilities of data management techniques
to encompass a larger spectrum of QoI attributes (beyond provenance and other security-related
intrinsic quality attributes) so that the full richness of QoI metadata provided by communication
and sensor networks can be used by the end-users in their decision making in the context of their
mission.
References

S. Bowers, T. McPhillips and B. Ludascher, 2007. Provenance in Collection-Oriented Scientific
Workflows. Concurrency and Computation: Practice & Experience, special issue on the First
Provenance Challenge.
P. Buneman, S. Khanna and W. C. Tan, 2002. On propagation of deletions and annotations
through views. In Proceedings of the ACM PODS Conference.
L. Chiticariu and W. C. Tan, 2006. Debugging Schema Mappings with Routes. In Proceedings of
the VLDB Conference.
F. Geerts, A. Kementsietsidis and D. Milano, 2006. MONDRIAN: Annotating and Querying
Databases through Colors and Blocks. In Proceedings of the International Conference on Data
Engineering (ICDE).
D. Srivastava and Y. Velegrakis, 2007. Intentional associations between data and metadata. In
Proceedings of the ACM SIGMOD Conference, pp. 401-412.
W. C. Tan, 2007. Provenance in Databases: Past, Current, and Future. In IEEE Data Eng. Bull.,
30(4): 3-12.
J. Wang, 1999. A Survey of Web Caching Schemes for the Internet. In ACM SIGCOMM
Computer Communication Review, Vol. 29, Issue 5.
L. Wu, C.-Y. Lin, S. Aral, and E. Brynjolfsson, 2009. Value of Social Network – A Large-Scale
Analysis on Network Structure Impact to Financial Revenues of Information Technology
Consultants, Winter Information Systems Conference.

4.6.8 Task T1.3: Cognitive Models of Trust in Human-Information, Human-Human, Human-Agent Interactions (P. Pirolli, PARC (INARC); W. Gray, RPI (SCNARC); T. Hollerer, UCSB (INARC); M. Schoelles, RPI (SCNARC); X. Yan, UCSB (INARC); Collaborators: S. Adali, RPI (SCNARC); G. Powell, ARL)
Task Overview
Users (e.g., soldiers, field commanders, intelligence officers) interact with information (e.g., seek
relevant information from a repository) as well as with other users and automated agents via user
interfaces. It is well known that the dynamics and performance characteristics of these human-
information, human-human, and human-agent interactions are dramatically affected by the
representations and interaction methods available at the user interfaces and by individual
differences in human knowledge and cognitive skills. In this task, our goal is to develop a
theory of the human cognitive machinery involved in two important aspects of these interactions:
judgments of credibility (of information and of other humans and agents as sources of
information) and influences of credibility and trust on the subsequent cognitive processing of
information.
The behavioral sciences have for the most part treated many aspects of human interactions as
"black boxes". Our aim is to start the development of a computational/quantitative theory of the
perceptual and cognitive processing inside of that black box, with a focus on how people process
cues (e.g., from user interfaces, information visualizations of provenance, etc.) into inferences
about credibility and how those inferences feed into trust-based decision making. In future years,
we intend to develop different test task environments as part of the Army-Argus framework and
use these environments to continue exploring various aspects of behavioral and cognitive-
neurological data on trust.
Task Motivation
The goal of this task is to explore cognitive models of trust where human heuristics and biases
play a major role in trust formation and subsequent trust-based decision making in the context of
information acquisition, processing, and propagation. The advantage of such non-rational views
of trust (see http://www.ns-cta.org/projects/netsci/wiki/whatistrust) is their simplicity and ability
to deal with uncertain situations arising out of incomplete information. Specifically, the focus of
the first subtask is on modeling how representations of information content (including its
provenance, etc.) and people (including their profiles, history, etc.) are processed in human
cognition to come up with cognitive representations of information and source credibility and
how these representations affect subsequent acquisition, processing, and propagation of
information. The emphasis in the second task is on the development of a computational model of
human decision-making and performance, with a focus on active memory and on the cognitive
mechanisms that give rise to well-known human heuristics and biases leading to departures of
human decision making from classical rational models.
The efforts in these two subtasks will complement each other. The second subtask will not
directly investigate the induction of semantic categories relevant to human trust computations,
which is the focus of the first subtask. Preliminary discussions between Wayne Gray (SCNARC)
and Peter Pirolli (INARC) have already sketched how the two models could be combined by
incorporating Pirolli's "trust" induction model into the ACT-R interaction simulation model.
Subtask T1.3.a: Cognitive Models of Interaction with Information and Information Sources
(P. Pirolli, PARC (INARC); T. Hollerer, UCSB (INARC); X. Yan, UCSB (INARC))

Key Research Problems


Can we develop predictive theory and models of human cognition and action involved in
processing flows of content arriving via computer and mobile interfaces from other people and
information sources?
Initial Hypotheses
Currently, there are no predictive computational cognitive models of learning and judgments of
credibility for human-computer interaction (HCI) tasks. Our hypothesis is that a computational
cognitive model of the judgment of credibility can be fit to data from HCI studies by assuming a
form of Random Utility Model with credibility assessments stochastically based on latent factors
in a probabilistic semantic representation of the information and the information sources (e.g., as
an extension of a Latent Dirichlet Allocation (LDA) approach). Techniques such as LDA
(Blei, Ng, & Jordan, 2003) can be used to model the gist and the latent semantic categories that
occur in collections of information that can be discretized (e.g., text). Such models (often called
topic models) are based on a generative probability model that assumes that information (e.g.,
text documents or messages) is generated from probabilistic mixtures of underlying concepts or
topics, and topics are probabilistic mixtures of discrete features (such as words or other cues).
Statistical learning techniques can be applied to invert this process and infer the topic and
feature distributions involved in a given information collection.
Interestingly, such models can be used to make many predictions about human semantic memory
and category judgments (Griffiths, Steyvers, & Tenenbaum, 2007). Our working assumption is
that people making judgments about information and information sources are processing cues
streaming onto their user interfaces and not only inferring underlying semantic categories to
represent the content itself, but also to represent the trustworthiness and credibility of the
sources.
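One way to make this hypothesis concrete, under assumed notation (the weight vector \beta and the latent factor vector \theta_s below are illustrative placeholders, not committed model choices), is a Random Utility Model over the latent factors inferred for an item or source s:

U(s) = \beta^\top \theta_s + \epsilon_s,

where \theta_s is the topic/credibility factor representation inferred (e.g., by an LDA-style model) from the cues about s, and \epsilon_s is i.i.d. random noise. If \epsilon_s is Gumbel-distributed, the probability that s is judged credible (chosen) among alternatives takes the familiar logit form

P(s) = \frac{\exp(\beta^\top \theta_s)}{\sum_{s'} \exp(\beta^\top \theta_{s'})},

which can be fit to choice data from the planned HCI studies.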

Prior Work
The prior work in this area includes the computational and mathematical cognitive models
developed in information foraging theory (Fu & Pirolli, 2007; Pirolli, 2007; Pirolli & Card,
1999), models of human interaction with information visualizations (Budiu, Pirolli, Fleetwood
& Heiser, 2006; Pirolli, Card & Van Der Wege, 2003), and models of human category formation
while interacting with exploratory information browsers (Pirolli, 2004). All of these models deal
with people making or learning utility judgments based on the relevance of information.
Information foraging theory (Pirolli, 2007) models how human semantic memory is involved in
assessing the relevance of snippets of information, such as link text on a Web page, and how that
is used to assess the utility of various information gathering actions. This approach has been
extended somewhat to modeling interaction with highly interactive information visualizations
(e.g., Budiu et al., 2006). A model related to LDA topic models has been used to model the
semantic categories that people induce by interacting with information (Pirolli, 2004).
This previous theoretical and empirical work did not deal with human perception and reasoning
about the credibility of information. Rather, the past work cited above is mainly about very
simple judgments of information relevance: Given that the goal is to solve a problem X, and I
have some representation of an information item on my screen (an icon or some other
visualization element or some text cues), how likely is it that the information item will be
relevant to solving X? For trust-based decision making, not only is the relevance of the
information critical but the credibility of the information also becomes a key factor.
There are no detailed cognitive studies of how people perceptually and cognitively process
external representations and past experiences to judge the credibility of incoming information or
information sources. One could imagine that such a model, if developed, would be immensely
useful in many different ways; for instance, providing accurate provenance cues when needed.
More generally, this could extend cognitive theories of category formation that deal with simple
relevance-based judgments of information to the area of trust so that we can better predict how
credibility judgments can modulate (for better or worse) the utility judgments made about
incoming information. This extension is a challenging, high-risk/high-reward stretch of the prior
work in the cognitive theory of human-information interactions.

Technical Method
Some of the observed cues about content (e.g., as presented in a user interface) are inferred by
people to be manifestations of latent topics that the information is about (the semantic gist), but
for our research we will now additionally assume that other cues, more specifically, information
provenance visualizations and people profiles, are inferred to be manifestations of latent
categories about consistency, expertise, etc. that will be involved in credibility judgments.
We will develop theory and models of human cognition with a focus on credibility assessments
in individual human information-seeking, information-monitoring, and information-propagation
decisions. Our model development will follow the rational analysis methodology that has
gained currency in the cognitive sciences: (1) precise specification of the computational goal, (2)
formal analysis of the structure of the task/information environment, (3) specification of an
optimization approach assuming minimal cognitive costs, (4) tests against data, (5) iteration of
steps 1-4, and (6) specification as mechanisms in a cognitive computational architecture (e.g., ACT-R).
In recent years, in the cognitive sciences, the optimality analysis of steps 1-4 has derived from
Bayesian approaches, which is the route we intend to take, building upon our work in information
foraging theory and theories of human category induction. Many recent theories of human
category formation bear similarity to Latent Dirichlet Allocation (LDA) and Topic Modeling in
machine learning.
The LDA generative model is a three-level hierarchical Bayesian model. An LDA topic model
assumes an inferred latent structure, L, that represents the gist of a set of discrete information
items (e.g., documents or messages), g, as a probability distribution over some T topics. Each
topic is, itself, a probability distribution over discrete cues (e.g., words), and those cues can be
associated with multiple topics. The generative statistical process selects the gist as a distribution
over the topics contained in a document, and then selects words using this topic-information item
distribution and the distribution of cues within topics. This can be specified as

P(w_i \mid g) = \sum_{j=1}^{T} P(w_i \mid z_i = j)\, P(z_i = j \mid g),

where g is the gist distribution for an information item. For the ith cue-token occurring in the
information item, the topic z_i is selected conditionally on the gist distribution, and the cue w_i is
selected conditionally on the topic z_i. Thus, in essence, P(z|g) reflects the prevalence of topics
within an information item, and P(w|z) reflects the prevalence of cues within a topic. The
probabilities of an LDA model can be estimated using Gibbs sampling.

Validation Method
The modeling efforts will be driven by experimental studies that manipulate human-information
interaction techniques in ways that are theoretically predicted to affect individual human
cognitive computations involved in judging and processing incoming information as well as
predicted to affect communication of information, social action, and task performance. These
empirical studies will build upon PARC's data mining and user interface experience with
systems such as micro-blogging, wikis, social tagging, RSS feeds, and email. The experiments
will involve meaningful scenarios involving decision-making and prediction tasks that require
utilization of information recommended by others and of others' stated opinions. While these
experiments do not simulate a military mission, we believe that the setup provides a rich
environment for the development of cognitive models of trust that will be applicable to many
military missions.

Research Products
This research will produce a report on initial specification (e.g., in ACT-R) of a credibility
judgment model and empirical predictions of the effects of variations in information provenance
and the personal reputation of the information sources on credibility assessments. We will
develop an ecologically valid task scenario and experimental test harness involving PARC's
experimental interfaces to fused information from streaming content that aims to test the model
predictions, and obtain preliminary results from the execution of the experiment developed in
this task including the tests of the model against data collected in these experiments.

Subtask T1.3.b: Cognitive science basis for human trust during interactions with human
versus nonhuman agents (W. Gray, RPI (SCNARC); M. Schoelles, RPI (SCNARC))

Key Research Questions


How does the social psychology construct of trust vary in human-human versus human-agent
interactions? Specifically, how do human evaluations of trust influence our subsequent
cognitive processing of information?
Initial Hypotheses
Our initial hypothesis is that the differences in trust in human-human versus human-agent
interactions are signaled by differences in cognitive brain mechanisms and that these differences
can be detected by event-related brain potential (ERP) measures. Using ERP measures, these
differences can be related to established cognitive science constructs which in turn can be
incorporated as improvements in the ACT-R (Anderson, et al., 2008) cognitive architecture
leading to better computational models of human trust. A key question for the architecture is
determining how variations in trust influence the subsequent cognitive processing of information.
A key to integrating the two subtasks will be to show differences in decision-making outcomes
due to varying degrees of trust.
Prior Work
Beliefs and goals have been shown to influence the second-by-second allocation of cognitive
resources to information in a real-time data acquisition task (Mangels, Butterfield, Lamb, Good,
& Dweck, 2006). Likewise, lying to another human has been shown to increase demands on
cognitive resources of the liar (Carrión, Keenan, & Sebanz, 2010). Prior work on playing
complex games such as Warcraft™ (Lim & Reeves, 2010) and classic economic decision
making games such as paper, rock, scissors (Gallagher, Jack, Roepstorff, & Frith, 2002) have
shown robust differences across a variety of cognitive measures when the opposing player is
believed to be a human versus an artificial agent. These studies indicate that examining social
science constructs from a cognitive science and cognitive neuroscience perspective can be a very
productive enterprise. Just as this approach has shed new light on such social science constructs
as beliefs, goals, and lying, we are confident that it will provide insights into trust.
In our approach, the cognitive science and the cognitive neuroscience approaches to
understanding trust and other social science constructs are related by the use of cognitive
neuroscience to set parameters in computational cognitive models created in a modified ACT-R
architecture. Prior research on human-technology and human-information interaction has shown
that transient differences in allocation of cognitive resources can be captured and predicted by
computational cognitive modeling (Fu & Gray, 2006; Gray & Fu, 2004; Gray, Sims, Fu, &
Schoelles, 2006). The EEG/event-related brain potential (ERP) work will inform the processing
times for ACT-R modules that are implicated in producing the social construct of trust.
Technical Approach
Scenario Development. In a non-CCRI SCNARC task, we plan to develop a Net-Centric
simulated task environment, Argus-Army, in which teams of human players or mixed teams of
human and agent players can cooperate to achieve various tactical objectives. The Argus-Army
simulated task environment builds on the Argus simulated task environment engine developed by
Schoelles (Schoelles & Gray, 2001) under prior Air Force funding. Unlike prior work on game
playing (Gallagher, et al., 2002; Lim & Reeves, 2010), players will play the simulation for hours
rather than minutes, thereby providing longitudinal data. At least one player and possibly two
will be instrumented with EEG caps as they play.
This infrastructure created in the SCNARC task will provide a platform to collect data and
perform test scenarios required for the execution of this task. Argus-Army will support both
human users and Interactive Cognitive Agents based on the ACT-R cognitive architecture
(Anderson, 2007). For this task, the radar displays of Air-Sea space from Argus will be replaced
with terrain maps as is common in military ground wargames. As is the case for Argus, Argus-
Army will require three human or artificial cognitive agents, each of whom receives partial
information of the current situation. For Argus-Army, one player may play the role of the Squad
Leader on the ground, another the uninhabited air vehicle (UAV) Operator, and the third the
Battalion Commander at Headquarters. For purposes of our initial study, it will be the UAV
operator's role that will either be played by a human team member or by an Interactive Cognitive
Agent.
In both types of teams, team members will communicate with each other via menu selections and
typing simple commands (i.e., no voice or complex linguistic data). The human members of the
teams will be instrumented with 32 electrode, EEG caps. All system states, all human mouse
movements, responses, and so on will be saved to a log file and timestamped to the nearest 8 ms.
The log file will be complete enough to be "played back" for qualitative
assessment of strategies. It will be complete enough for quantitative assessment of human eye
movements during game play (for example, to examine characteristics of scan paths during play
by the human agents and to compare them with the predicted scan paths made by the Interactive
Cognitive Agent).
Electroencephalogram (EEG): A key feature of our trust analyses will be a reliance on the
EEG data using the event-related potential technique. We will look at stimulus-locked
components of brain functioning that have been shown to relate to memory processes in other
studies. If there are differences (in amplitude or scalp distribution) between ERPs recorded
during auditory and visual conditions, we can conclude that there is a difference between
auditory and visual processing.
Our laboratory has successfully used ERPs to provide insight into performance of complex
videogame play (for an example see Figure 3) and, hence, we are confident that the use of ERPs
in our proposed work will help determine issues such as response time for various cognitive
modules and modalities, and localization effects (Wibral, Turi, Linden, Kaiser, & Bledowski,
2008). In our use of ERP measures, we will sample from the growing list of ERP components
discovered and validated by other researchers.
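To fix ideas, the stimulus-locked averaging that underlies such ERP components can be sketched in a few lines; the sampling rate, epoch window, and baseline-correction choice below are illustrative assumptions, and the actual analyses will use the laboratory's established pipeline.

    # Sketch of stimulus-locked ERP extraction by epoching and averaging.
    # Sampling rate, epoch window, and inputs are illustrative assumptions.

    import numpy as np

    def erp_average(eeg, event_samples, sfreq=256.0, tmin=-0.1, tmax=0.6):
        """eeg: (n_channels, n_samples) array; event_samples: stimulus onsets (sample indices).
        Returns the per-channel average waveform around the events, i.e., the ERP."""
        pre, post = int(round(-tmin * sfreq)), int(round(tmax * sfreq))
        epochs = []
        for s in event_samples:
            if s - pre >= 0 and s + post <= eeg.shape[1]:
                seg = eeg[:, s - pre:s + post]
                baseline = seg[:, :pre].mean(axis=1, keepdims=True)   # pre-stimulus baseline
                epochs.append(seg - baseline)                         # baseline correction
        return np.mean(epochs, axis=0)                                # average over trials

Comparing, for example, "human teammate" versus "agent teammate" trials then amounts to computing this average separately per condition and contrasting component amplitudes (e.g., the P300) across conditions.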
As Argus-Army is being developed, we plan to obtain preliminary data on the perceptual versus
semantic aspects of trust by first replicating and then extending a paradigm developed by Rudoy
and Paller (2009).

Figure 3: Example of ERP analysis of P300 response during play of a complex video game. Figure taken from
ONR Workshop presentation of Gray and Anderson (2009).
Validation Approach
Primarily by using the Argus-Army framework but also by using other paradigms as appropriate,
we will examine differences in candidate event-related brain potentials (ERPs) that are captured
in human subjects in trust situations, either individually or while they play in human-agent and
human-human teams. In team play, each player plays from a different computer located in a
different room. Initially, in all-human teams, one member will be a confederate of the
experimenter who plays his/her part of the game according to a fixed script. In human-agent
teams, the confederate will not be introduced; rather, the two human players will be told that the
third player is an artificial agent. Later in the project, one of the players will be an artificial
interactive cognitive agent (see SCNARC 3 for more details).

Research Products
Q2. Transformation of Argus (Schoelles & Gray, 2001) into Argus-Army. Investigation of the
ability to synchronize EEG collected from two humans at either a single or multiple locations
(taking advantage of the speeds offered by Internet2 technology). Q3. Pilot testing of
experimental setup including Argus-Army and synchronization of EEG data. Q4. Collection of
data from teams of pure human and mixed human-agent players and preliminary analyses of data
with priority going to analyses of the ERP data.

Collaborations
SCNARC Task 3 focuses on the cognitive social science of Net-Centric interactions. As such it
seeks to investigate topics inherent to the social sciences from a perspective informed by modern
theories, models, and neuroscience methods of cognitive science. Focusing on the social science
construct of human trust is part of the larger agenda of Task 3 and may have the effect of helping
to advance the agenda of the Trust CCRI. In addition, many or most of the developments of this
initial effort will be reused in SCNARC Task 3 in further studies of trust and to examine other
social science constructs that are relevant to Net-Centric interactions.

References
J.R. Anderson, C.S. Carter, J. M. Fincham, Y. Qin, S. M. Ravizza & M. Rosenberg-Lee (2008).
Using fMRI to Test Models of Complex Cognition. Cognitive Science, 32(8), 1323-1348.
D. Blei, A. Ng & M. Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning
Research, 3, 993-1022.
R. Budiu, P. Pirolli, M. Fleetwood & J. Heiser (2006). Navigation in Degree of Interest Trees. In
Proceedings of AVI 2006, Venezia, Italy.
R. Carrión, J.P. Keenan & N. Sebanz (2010). A truth that's told with bad intent: An ERP study of
deception. Cognition, 114(1), 105-110.
W.-T. Fu & W.D. Gray (2006). Suboptimal tradeoffs in information seeking. Cognitive
Psychology, 52(3), 195-242.
W. Fu & P. Pirolli (2007). SNIF-ACT: A model of user navigation on the World Wide Web.
Human Computer Interaction, 22(4), 355-412.
H.L. Gallagher, A. I. Jack, A. Roepstorff & C. D. Frith (2002). Imaging the Intentional Stance in
a Competitive Game. Neuroimage, 16(3, Part 1), 814-821.
W. D. Gray & J. R. Anderson (2009). Space Fortress. Paper presented at the ONR Workshop,
Palo Alto, CA.
W. D. Gray & W.-T. Fu, (2004). Soft constraints in interactive behavior: The case of ignoring
perfect knowledge in-the-world for imperfect knowledge in-the-head. Cognitive Science, 28(3),
359-382.
W. D. Gray, C. R. Sims, W.-T. Fu & M. J. Schoelles (2006). The soft constraints hypothesis: A
rational analysis approach to resource allocation for interactive behavior. Psychological Review,
113(3), 461-482.
S. Lim & B. Reeves (2010). Computer agents versus avatars: Responses to interactive game
characters controlled by a computer or other player. International Journal of Human-Computer
Studies, 68(1-2), 57-68.
J. A. Mangels, B. Butterfield, J. Lamb, C. Good & C. S. Dweck (2006). Why do beliefs about intelligence
influence learning success? A social cognitive neuroscience model. Scan, 1, 75-86.
P. Pirolli (2004). The InfoCLASS model: Conceptual richness and inter-person conceptual consensus
about information collections. Cognitive Studies: Bulletin of the Japanese Cognitive Science Society,
11(3), 197-213.

P. Pirolli (2007). Information foraging: A theory of adaptive interaction with information. New
York: Oxford University Press.
P. Pirolli & S. K. Card (1999). Information foraging. Psychological Review, 106, 643-675.
P. Pirolli, S. K. Card & M. M. Van Der Wege (2003). The effects of information scent on visual
search in the Hyperbolic Tree Browser. ACM Transactions on Computer-Human Interaction,
10(1), 20-53.
J. D. Rudoy & K.A. Paller (2009). Who can you trust? Behavioral and neural differences
between perceptual and memory-based influences. Frontiers in Human Neuroscience, 3.
M. J. Schoelles & W. D. Gray (2001). Argus: A suite of tools for research in complex cognition.
Behavior Research Methods, Instruments, & Computers, 33(2), 130–140.
M. Wibral, G. Turi, D.E.J. Linden, J. Kaiser & C. Bledowski (2008). Decomposition of working memory-
related scalp ERPs: Crossvalidation of fMRI-constrained source analysis and ICA. International Journal
of Psychophysiology, 67(3), 200-211.

4.6.9 Linkages with Other Projects


As a project in one of the two main cross-cutting research issues in NS-CTA, we have planned
interactions with the EDIN CCRI and specific non-CCRI projects in all centers. This project has a
clear relationship with E1.1 and E1.2. E1.1 deals with unified ontologies and shared metrics as
trust is a key metric for integrated networks. E1.2 involves network metrics and their relationship
to trust.
The tasks in SCNARC (S1.1 and S1.2) require input from task T1.1 in understanding the data
types that should be collected for trust metrics and for the computation of provenance and
credibility. S1.2 will also help feed in the trust factors related to organizational settings. S2.3 will
help in understanding relationships between communities and prominence. S3.1 will help in T1.3
in understanding the cognitive mechanisms that influence human interactions. Tasks in CNARC
(C1.1, C1.2) will need to use the trust metrics from this project. Models characterizing the
impact of provenance (C1.4) would help the effort in T1.2. The work on provenance and
credibility in T1.2 would impact the modeling of OICC (C1.1) and QoI characterization (C1.2).
Efforts on non-security related QoI attributes in INARC (I1.3) and the scalable organization of
QoI metadata (I2.1) would help in T1.2. The INARC task on Information Network Visualization will
benefit from the results of task T1.3, which constructs the cognitive models of trust. The IRC tasks
(R1, R2) will help in exchanging approaches towards compositional methods and for
understanding the interdependencies of network types.
The cross-dependencies between the various tasks of this project are described within the task
descriptions.
IPP Task Linkage

S1.1 ← T1.1    What type of data should be collected for trust metrics
S1.2 → T1.1    Trust factors related to an organizational setting
T1.2 → T1.1    Incorporating provenance and credibility into metrics
T1.3 → T1.1    Incorporating cognitive models into trust models
T2.2 → T1.1    Incorporating network-based indicators
T2.2 ← T1.1    Suggestions for tests of possible network-based indicators
T3.1 ↔ T1.1    Ensuring that the model can be propagated; how can we limit trust-propagation errors
R1 ↔ T1.1      Exchanging approaches towards compositional methods
R2 ↔ T1.1      Understanding interdependencies of network types; understanding the effect of information utility in decision making
R3 ← T1.1      Validation of composite models
C1.1 ← T1.1    Modeling OICC and the factors that impact it will need the trust metrics and ideas about the unified model of trust
C1.2 ↔ T1.1    Characterizing and controlling QoI will use the metrics from this task and will provide feedback for the unified model
C1.4 → T1.1    The models for the impact of provenance and credibility will help the efforts discussed in this task
E1.1 ↔ T1.1    Ontology for composite networks as it relates to trust factors
E1.2 ↔ T1.1    Network metrics and their relationship to trust
S2.3 → T1.2b   Community formation and dissolution in understanding the relationship between communities and prominence
S1.1 ← T1.2    Requires input from this task in understanding what type of data should be collected for provenance and credibility
C1.1 ← T1.2    The work on provenance and credibility will feed into the modeling effort of OICC
C1.2 ← T1.2    The QoI characterization effort will be impacted by this task
I1.3 ← T1.2    Investigation of non-security related QoI attributes (e.g., uncertainty and freshness)
I2.1 ← T1.2    Scalable organization of QoI metadata to support scalable QoI-aware query answering
S3.1 → T1.3    Understanding cognitive mechanisms that influence human interactions

4.6.10 Collaborations and Staff Rotations


Collaborations between the three ARCs and the IRC have already germinated during the
preparation of this IPP. All the centers will be actively collaborating on T1, which forms the
foundational study for the entire Trust CCRI.
IBM Research scientists from INARC located at the NS-CTA facility will spend a majority of their
time working on this project. Our initial planning also calls for a staff rotation during the summer
between one of the academic partners and IBM Research in the form of a summer internship. The
staff rotation plans from the other centers are being worked out.

4.6.11 Relation to DoD and Industry Research


The Network and Information Sciences International Technology Alliance (NIS-ITA) has worked
in the area of Quality of Information (QoI) and Value of Information (VoI), establishing and
investigating these concepts in the context of sensor networks. The concept of trust itself has also
been investigated in NIS-ITA with a focus on enabling a trust management system in coalition
MANETs.
In contrast, the key thrust of this NS-CTA project is an investigation of interdependencies and
combinations of trust metrics and mechanisms that have been traditionally investigated in
relative isolation from each other in communication, information, and social/cognitive networks.
Thus the scope of work in this project is much more comprehensive, and wherever appropriate
we will leverage research in NIS-ITA and other research programs.

Recently, IARPA has started a new initiative on TRUST which aims to develop protocols to
measure trust between individuals based on neurological, psychological, behavioral and
physiological inputs. The main aim of this work is to measure trust in face to face encounters of
dyads or small groups of people. This work is complementary to the emphasis of the work on
CTA on networks, technology driven interactions and large groups. Both CTA and IARPA
efforts are based firmly on social trust. There is a great deal of synergy between this approach
and task T1.3 and we are already initiating discussions with the researchers interested in this
work to seek possible collaborations and knowledge transfer between the projects that will
emerge under this new program.

4.6.12 Project Research Milestones


Project Research Milestones

Due  Task    Description
Q2   T1.1    Taxonomy document complete with (a) taxonomy definitions and actuation description, (b) outline of observable factors; terms completed for each network type but not complete definitions.
Q3   T1.1    Taxonomy document with (a) complete definitions of observable factors, (b) outline of how to capture and represent factors, and (c) outline of key touch-points for Trust. Demonstration of two or more touch-point interactions through visual and quantitative analyses.
Q4   T1.1    Complete taxonomy document, with definitions, factors, factor representation, and touch-points; supported by experimental demonstrations.
Q2   T1.1    Verbal comparison of the aggregate Trust models from existing approaches of internal PIs; identification of other existing aggregate model approaches. Outlines of testing scenarios that will highlight strengths and weaknesses of each approach.
Q3   T1.1    Experimental pair-wise comparisons of existing aggregate Trust approaches. Identification of collaboration opportunities among existing approaches; identification of potential extensions based on the literature.
Q4   T1.1    Continued experimental comparisons of existing aggregate Trust approaches; outline of an aggregate framework that collates synergistic components.
Q2   T1.2a   Initial investigation into security-aware provenance models: explore extensions to the color algebra to support hierarchical labels.
Q3   T1.2a   Design and analysis of provenance tracking algorithms.
Q3   T1.2a   Analysis of the impact of incomplete provenance information on trust assessments.
Q4   T1.2a   Implementation and testing; report and/or research paper on provenance tracking algorithms and the impact of provenance on credibility.
Q2   T1.2b   Define factors that affect credibility; derive models for credibility assessments.
Q2   T1.2b   Design the general problem-solving framework for the proposed work: (1) extension of the TruthFinder framework; (2) exploitation of event chain knowledge to reduce the uncertainty in cross-document entity coreference resolution.
Q2   T1.2b   Development of models of credibility of information based on the social network; algorithm design for prominence computation.
Q3   T1.2b   Algorithm design, implementation, and testing.
Q3   T1.2b   Explore the fundamental limits on credibility assessments, or the credibility frontier.
Q3   T1.2b   Implementation of tests for credibility of information based on the social network and prominence of nodes. Development of models that combine social- and information-based credibility methods.
Q4   T1.2b   Algorithm refinement, additional testing, and research paper writing on credibility assessments (submission).
Q4   T1.2b   Testing and refinement of algorithms on socially based credibility computations and prominence. Preliminary implementation and testing of algorithms based on social- and information-based credibility methods.
Q2   T1.3a   Report on initial specification of a credibility judgment model and empirical predictions of the effects of variations in information provenance and the personal reputation of the information sources.
Q3   T1.3a   Report on the development of an ecologically valid task scenario and experimental test harness involving PARC's experimental interfaces to fused information from streaming content that aims to test the model predictions developed in Q2.
Q4   T1.3a   Report on preliminary results from the execution of the experiment developed in Q3. Paper describing more detailed tests of the model against data collected in Q3 and Q4.
Q2   T1.3b   Transformation of Argus (Schoelles & Gray, 2001) into Argus-Army. Investigation of the ability to synchronize EEG collected from two humans at either a single location or multiple locations (taking advantage of the speeds offered by Internet2 technology).
Q3   T1.3b   Pilot testing of the experimental setup, including Argus-Army and synchronization of EEG data.
Q4   T1.3b   Collection of data from teams of pure human and mixed human-agent players and preliminary analyses of data, with priority going to analyses of the ERP data.

4.6.13 Project Budget by Organization
The budget in the following table is for the first year.

Budget By Organization

Organization Government Funding ($) Cost Share ($)


BBN (IRC) 203,130
CUNY (INARC) 80,849
CUNY (SCNARC) 65,115
IBM (INARC) 192,205
IBM (SCNARC) 23,557
NCState (CNARC) 110,240
NWU (SCNARC) 56,000
PARC (INARC) 94,341
RPI (SCNARC) 138,020 25,196
UCD (CNARC) 129,014
UCSB (INARC) 37,604
UIUC (INARC) 188,936
UMD (SCNARC) 71,000
USC (CNARC) 45,000
Total 1,435,011 25,196

4.7 Project T2: Understanding the interactions between network characteristics and trust

Project Lead: Sibel Adali, RPI (SCNARC)


Email: sibel@cs.rpi.edu, Phone: 518-276-8047

Primary Research Staff: S. Adali, RPI (SCNARC); A. Goel, Stanford (CNARC); C. Lim, RPI (SCNARC); P. Mohapatra, UCD (CNARC); R. Govindan, USC (CNARC); W. Wallace, RPI (SCNARC); N. Chawla, ND (SCNARC); K. Haigh, BBN (IRC); M. Goldberg, RPI (SCNARC); D. Hachen, ND (SCNARC); K. Levitt, UCD (CNARC); O. Lizardo, ND (SCNARC); M. Magdon-Ismail, RPI (SCNARC); J. Opper, BBN (IRC); Z. Torackai, ND (SCNARC); F. Wu, UCD (CNARC); M. Faloutsos, UCR (IRC)

Collaborators: S. Parsons, CUNY (SCNARC); G. Korniss, RPI (SCNARC); D. Parkes, Harvard (IRC); M. Srivatsa, IBM (INARC); M. Wellman, UMich (IRC); J. J. Garcia-Luna-Aceves, UCSC (CNARC); S. Krishnamurthy, UCR (CNARC); B. Szymanski, RPI (SCNARC); B. Uzzi, NW (SCNARC)

4.7.1 Project Overview


As has been discussed earlier, in this CCRI we define trust as a relation between a trustor and a
trustee. The trustor is a decision making entity and the trustee can be a person, a piece of
information, a processing unit with or without reasoning capability, or a system encompassing a
combination of these. Trust relation incorporates a number of different characteristics such as the
context of decision making, the uncertainty involved in the available information at the decision
time and a specific benefit to cost trade off involved in the trust decision identifying the risk.
These examples of the trust characteristics are by no means exhaustive, but are expected to serve
as examples. In the networking context, trust is one of the fundamental relationships that underlie
the formation and evolution of networks. Trusting nodes exchange information, causing
information flow to be initiated. Nodes that interact with each other develop trusting
relationships. The circularity of this relationship is not a mistake: trust relationships
continuously change as a function of how nodes choose to or are able to interact with each other.
Nodes that are unable to communicate may stop trusting each other. The trust relationships also
create new dynamics that alter the network characteristics. Improvement of trust between two
entities may alter the trust between other pairs of entities: trust within a tightly knit group can lead to
distrust of those outside the group, and trust that allows the flow of trusted information may also
make it more difficult for new and novel information to enter the system, as nodes choose to
spread information without checking its credibility because of the trust placed in the source.
These dynamics are crucial in designing networks that are able to detect and adapt to changes
resulting from trust relationships and their evolution. The research in this project will help
develop network design paradigms for ensuring that networks have robust performance,
support appropriate functions despite specific failures or the presence of adversaries, and are
capable of making effective use of trust to accomplish the mission. Furthermore, the research will
make it possible to develop methods and algorithms for detecting expected or unexpected
events in the network and the emergence of ideas and groups. It will make it possible to design
operations that explicitly manipulate network parameters, such as the dissemination or suppression of
certain information or ideas and the breaking down of silos in information sharing.

4.7.2 Project Motivation


Distributed decision making in NCO, especially in IW and COIN operations, requires command,
control, and coordination of numerous heterogeneous nodes, from hardware and software
infrastructure to information and people. Envision the situation on the urban battleground, e.g.,
Operation Anaconda described earlier, where soldiers receive real-time tactical data relevant
to their mission priorities through a system similar to the Tactical Ground Reporting (TIGR) system. This system
allows soldiers to communicate with each other and provide real-time data about their environment
(video, voice, text) to relevant members of the team. Soldiers also interact with many civilians
and informants with dubious allegiance, and input data provided by them to the system. They can
input intelligence that they gather during their mission, contact other members of their team for
clarification on people, information or systems they are interacting with, read sensor data to
check validity of different claims, transmit video to their team to get their opinion on specific
aspects of their environment. The main purpose of the system is to assist the soldier in gathering the
relevant information for his/her mission at the right time.
This is an example of a distributed decision-making system in which soldiers cooperate to
accomplish a mission with the system's help. The soldier has to make many decisions:
who to consult for information, who to pass their information to, how much to trust a specific
piece of information they receive based on how it relates to what else is known, and who to pass this
information to next to achieve their goal. There are many interdependent dynamics involved in
such a system: mobility of the nodes, conversation patterns based on who is likely to exchange
information, information propagation patterns based on which crucial information is circulating
in the system. Ultimately, understanding these dependencies is a crucial first step in designing
composite networks and systems that will make the best use of available resources and offer the
best performance in solving distributed decision making problems.

4.7.3 Key Research Questions


The research agenda for the first year concentrates on the relationship between network
characteristics and trust, various dynamics created by trust and the impact of specific network
dynamics on trust. Two main problems will be considered: the network characteristics related to
the formation or dissolution of trusting relationships and management of risk.
The function of trust as a risk management mechanism will be investigated in Task 2.1. The
main research problems that will be studied in this task are: (1) how and to what extent can the
concepts of incomplete virtual markets, non-monetary price mechanisms, self-financing non-
monetary budgetary constraints and existence of equalizing discount factor and uniform risk-
neutral or Martingale measures be realized in computational network models of trust? (2) Given
various network topologies, and given various stochastic models of transactions, how long can a
decentralized trust network continue to serve before trust relationships start to degrade and the
trust network starts to break down? How robust are trust networks to adversarial infiltration?
The relationships between network behavior characteristics and trust in different types of networks
will be investigated in Task 2.2. The main research problems that will be studied under this task
are: What are network topological and flow (interaction) characteristics in social, information
and communication networks that are indicative of the existence or lack of trust? Which network
features and dynamics characteristics in social, information and communication networks affect
the emergence or dissolution of trust? How do we quantify the impact of network features and
dynamics on trust (using the metrics from T1) and provide guidelines for altering network
features in order to enhance trust (feeding to T3)?

4.7.4 Initial Hypotheses


In this project, we start with the initial hypothesis that trust is a relationship: between two
humans, between a human and an agent, between a human and information or an interface, or between a
communication node and the network. These relationships alter the way networks evolve and are used. Trusted
systems and links are used more frequently, and information flows along trusted paths across all
networks.
Starting with this hypothesis, we concentrate on specific relationships. We postulate that:
1. There exist trust mechanisms that allow individuals to cooperate with each other consistently while satisfying the goals of each player. We further posit that these cooperative behaviors, which clearly bear on the issue of trust, can be modeled by market/credit models, albeit non-monetary ones in many of the relevant applications of network science.
2. Credit networks can be sustained over long periods of time and provide robust performance. Moreover, network credit models can be applied to cyber-warfare and proxy/circumvention technologies. Network trust models can also be applied to information collection, which is an essential ingredient in any task devoted to the formulation and computation of trust in networks.
3. In social networks, at the dyadic level, four properties are crucial to understanding social tie evolution and therefore the extent to which trust exists between two people: reciprocity, tie strength, homophily (node similarity), and embeddedness (the extent to which connected nodes have common neighbors); see the sketch following this list. Stronger and more reciprocal ties are more likely to persist and therefore are more indicative of a trust relationship. Weak and/or non-reciprocal ties are also likely to persist if the nodes are similar, so weak and non-reciprocal ties under this condition are also indicative of a trust relationship. Strong and more reciprocal ties will tend to be embedded within communities, while weaker and less reciprocal ties will tend not to be as embedded, indicating that trust relationships can persist in both embedded and non-embedded situations.
4. In communication networks, it is possible to derive statistical features that characterize the network elements (nodes, links, paths) that are impacted by the network dynamics due to security, mobility, and social networking. These statistical features can be used to improve the network architecture and protocols to compensate for performance degradations due to variations in network dynamics.
The network behaviors of (3) and (4) are correlated with each other.
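The sketch referenced in hypothesis 3 above: a minimal computation of tie strength, reciprocity, and embeddedness from a weighted, directed interaction graph. The graph construction and the example weights are illustrative assumptions.

    # Sketch of the dyadic indicators in hypothesis 3: reciprocity, tie strength,
    # and embeddedness (shared neighbors), computed from a weighted interaction graph.

    import networkx as nx

    def dyad_features(g, u, v):
        """g: directed graph whose edge attribute 'weight' counts interactions."""
        w_uv = g[u][v]["weight"] if g.has_edge(u, v) else 0
        w_vu = g[v][u]["weight"] if g.has_edge(v, u) else 0
        strength = w_uv + w_vu
        reciprocity = min(w_uv, w_vu) / max(w_uv, w_vu) if max(w_uv, w_vu) > 0 else 0.0
        nbrs_u = set(g.successors(u)) | set(g.predecessors(u))
        nbrs_v = set(g.successors(v)) | set(g.predecessors(v))
        common = (nbrs_u & nbrs_v) - {u, v}
        union = (nbrs_u | nbrs_v) - {u, v}
        embeddedness = len(common) / len(union) if union else 0.0
        return {"strength": strength, "reciprocity": reciprocity,
                "embeddedness": embeddedness}

    g = nx.DiGraph()
    g.add_weighted_edges_from([("a", "b", 9), ("b", "a", 7), ("a", "c", 2), ("b", "c", 4)])
    print(dyad_features(g, "a", "b"))   # strong, reciprocal, and embedded via "c"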

4.7.5 Technical Approach


Overview
This project consists of two tasks as follows:
Task T2.1 Interaction of trust with the network under models of trust as a risk
management mechanism (Leads: A. Goel, Stanford (CNARC); C. Lim, RPI (SCNARC))
This task concentrates on modeling the network as an abstract entity in which the function of
trust can be examined using advanced economic models that aim to reduce risk and optimize
performance in different tasks of military relevance. Research will concentrate on the
development of the necessary scientific methods for the analysis of these models and the
examination of the various properties of these networks from an analytical and experimental
perspective.
Task T2.2 Network Behavior Based Indicators of Trust (Leads: S. Adali, RPI (SCNARC);
P. Mohapatra, UCD (CNARC))
The goal of this task is to study statistical features of the network behavior characteristics that
could signal a change in trust, either positive or negative. From a social networking perspective,
regularity and persistence of communication patterns between people may signal trust among
them. From a communication network perspective, the variation in link quality or changes in
network topology and node mobility signal changes in network performability (performance and
reliability) that impacts trust. From an information network perspective, the pattern of
transmission of some content may signal that a consensus is being reached on a topic. In all networks,
it is important to develop methods to compute such statistical metrics using distributed
algorithms operating on local information, and to pursue integrative research that is able to aggregate and
translate network changes across multiple network types.
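As a small illustration of the kind of distributed, local-information computation envisioned here, the sketch below runs a simple gossip-averaging procedure in which each node repeatedly averages a locally measured quantity (e.g., a link-quality or trust score) with a randomly chosen neighbor; the topology, values, and round count are illustrative assumptions.

    # Sketch of distributed computation of a network statistic with only local
    # information: gossip averaging of a per-node quantity.

    import random

    def gossip_average(neighbors, values, rounds=50, seed=0):
        """neighbors: dict node -> list of adjacent nodes; values: dict node -> float.
        Each round, every node averages its value with one randomly chosen neighbor."""
        rng = random.Random(seed)
        x = dict(values)
        for _ in range(rounds):
            for node in x:
                peer = rng.choice(neighbors[node])
                avg = (x[node] + x[peer]) / 2.0
                x[node] = x[peer] = avg            # pairwise averaging conserves the sum
        return x                                   # entries converge toward the global average

    topology = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
    scores = {"a": 0.9, "b": 0.4, "c": 0.7, "d": 0.2}
    print(gossip_average(topology, scores))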

4.7.6 Task T2.1: Interaction of trust with the network under models of trust as a
risk management mechanism (C. Lim, RPI (SCNARC); A. Goel, Stanford
(CNARC); R. Govindan, USC (CNARC); W. Wallace, RPI (SCNARC);
Collaborators: S. Adali, RPI (SCNARC); G. Korniss, RPI (SCNARC); D.
Parkes, Harvard (IRC); S. Parsons, CUNY (SCNARC); M. Srivatsa, IBM
(INARC); M. Wellman, UMich (IRC))

Task Overview
This task concentrates on developing the science needed to study economic models that represent
and quantify trust in a network. Many military operations, such as those involving COIN and IW,
require soldiers to interact with network nodes (communication, information, or social) having
various levels of trust and possibly different values and objectives. In these networks, nodes are
decision makers that initiate interactions to achieve a possible gain. In some cases, the
interactions require cooperation between two nodes (in the case of two people interacting with
each other) and in other cases the interactions are in the form of flows where one node simply
decides to pass information to another node or block it (in case of a decision maker receiving
information and processing it). In these settings the gain can be thought of as a change in the
node's state, such as gaining better situational awareness or achieving a mission objective.
Economic models make it possible to study large systems of decision makers acting with certain
motives of optimizing their gain, either for their next action (single-period optimization) or for
their overall mission planning (multi-period optimization). It is possible to model cases where
nodes have limited information about the state of their neighbors and their motives, resulting in
uncertainty. In this type of setting, there is a certain risk to taking an action: the risk of obtaining
incorrect information vs. the risk of not getting valuable information, the risk of interacting with
an adversary versus a friend. The decision maker has to use different mechanisms to balance risk
when taking actions, to increase the likelihood of achieving a desired goal. One of the main functions of
trust is to enable the decision maker to take risk. The higher the risk, the higher the amount of
trust required to take an action in that scenario. The economic framework is a crucial step
towards studying the requirements for trust and the function of trust in the composite network for
decision making applications.
Two subareas which complement each other in this task are (1) pricing models for multi-period
multi-player networks under uncertainty extended to non-monetary settings of conflict and
natural disaster and (2) decentralized trust networks which are robust to infiltration and provide
almost the same performance as optimally designed centralized trust networks. It will be
explained where these two areas overlap and connect with each other and also how they will
interact with the network science initiative. The two areas together comprise a unified approach
to study trust in social networks, cyber-warfare applications, for non-monetary markets and with
incomplete information.
The Arrow-Debreu-Modigliani-Miller single-period framework in area (1) for K tradable
securities in an uncertain world consisting of S contingent states will be extended to the multi-
period setting. Incomplete markets and their associated return / payoff matrix Z provide a
suitable analytical and computational framework that can be extended to cover many scenarios
under conflict or natural disaster circumstances. The key ideas in this overarching theory for
non-monetary pricing mechanisms and risk management of trust/distrust under conflict are (a) incomplete markets, (b) non-monetary price mechanisms, (c) self-financing non-monetary budgetary constraints or simple extensions of such, (d) an equalizing discount factor for the networked community of players/consumers, and (e) the existence of risk-neutral or Martingale measures that are uniform across the community of consumers in the network. Chjan Lim will
lead this area.
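For reference, the single-period structure being extended can be summarized as follows. This is a standard textbook formulation rather than a result of this task; the symbols p, q, and r below are illustrative notation for asset prices, the risk-neutral state probabilities, and the riskless rate.

```latex
% Single-period market with S contingent states and K tradable assets:
% Z \in R^{S \times K} is the payoff matrix and p \in R^K the current price vector.
% The market is complete iff rank(Z) = S; otherwise it is incomplete.
% No-arbitrage is equivalent to the existence of a risk-neutral (Martingale)
% measure q over the states such that, with riskless rate r,
\[
  p_k \;=\; \frac{1}{1+r}\sum_{s=1}^{S} q_s\, Z_{sk},
  \qquad q_s \ge 0, \qquad \sum_{s=1}^{S} q_s = 1 .
\]
% In an incomplete market q need not be unique, but any such q supports the
% same equalized discount factor 1/(1+r) across the community of players.
```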
A trust network, as defined in area (2), is a specific type of network where edges in the graph
represent trust between entities. The nodes could belong to a social network, a content network such as bitTorrent, or a proxy network for circumventing Internet censorship in repressive
countries. We will model a trust network as a "credit network", where every node acts as a bank
and prints its own currency. Further, a weighted edge (X,Y) in the network with weight W
implies that Y has extended a credit line of W currency units to X, i.e. Y is willing to trust up to
W units of X's currency, at which point this credit line is saturated. Every service is performed in
return for trusted currency, and hence, these kinds of trust networks are quite robust to
infiltrators even when some of the nodes in the network are unknown or adversarial, which
makes them particularly suited for defense applications and cyber warfare, as described later. Our
conjecture is that credit trust networks can be sustained over long periods of time and provide
robust performance. The long survival time of a trust network is equivalent to the notion of
"sustained liquidity" in a market. In this proposal we will study (a) the survival time (i.e., liquidity),
(b) the robustness properties, and (c) decentralized choice of trust network parameters. For (a)
and (c) we will study how decentralized trust networks compare with centralized trust networks
in terms of liquidity and how decentralized parameter choices compare to optimal centralized
choice of parameters. Ashish Goel will lead this area and will collaborate primarily with Ramesh
Govindan.
The models and methods in (2) clearly interact with other parts of the network science effort. It
will be important to understand the underlying economic mechanisms and utility functions which
the participants will use to decide their trust weights, and area (1) will help area (2) in this
regard. Another area of interaction would be to decide how trust weights get replenished. In
addition to coordination between areas (1) and (2), we will also communicate and coordinate
with David Parkes and Michael Wellman from other parts of the network science effort. In future years, we will pursue these linkages more closely to understand how the different economic models are related to each other.
Not all social networks meet the precise economic (Arrow-Debreu) conditions for a single
equalized discount factor that holds across the network, and fewer still meet the requirements for a uniform risk-neutral pricing mechanism – hence the need for research over the next two years.
Fortunately, even in incomplete markets where not all contingent transactions or wagers are
allowed, there are many situations where a Martingale measure exists. Then this probability
measure provides a powerful method (instead of the computationally inefficient Dynamic
Programming) to solve complicated multi-period stochastic optimization problems that arise, for example, in the dynamic management of embedded sensor networks under uncertainty, such as in conflicts and natural disasters.
Task Motivation
Consider the following example: A set of proxies on the public Internet is being used to serve
traffic to nodes in a country with a repressive censorship regime. The IP addresses of these
proxies are passed to trusted nodes outside the repressive country. The number of IP addresses
provided to a trusted node depends on the trust weight. These trusted nodes then pass these IP
addresses to their trusted friends inside the repressive country, which then pass them on to other
trusted nodes, etc. In the repressive country, agents are trying to infiltrate this network. Small
trust weights mean that only a small number of proxy IPs can be compromised when some
honest node makes a mistake and starts to trust an infiltrator (even if the trust the honest
node places in the infiltrator is very high, since the overall number of IPs the infiltrator can
receive depends on the minimum trust value on the path from outside the country to the
infiltrator). In this setting, it does not make sense to set the weights centrally or think of an
insurer. It is best to let each node evaluate its own trust relationship in a decentralized fashion;
hence the models in area (2) of this task. It is important to note that proxy and circumvention
technologies are of great interest not just in cyber warfare, but also as a means of diplomacy.
Officials from the US Department of State, on a recent visit to Stanford University, posed this as
a challenge problem for technologists and entrepreneurs. Further, the trust network also maps
directly to content networks such as bitTorrent.
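To make the bottleneck property in this example concrete, the following minimal sketch (our own illustration; the graph, node names, and function are hypothetical rather than part of the task design) computes, for every node, the largest achievable minimum trust weight over all paths from the proxy distribution source, using a widest-path variant of Dijkstra's algorithm.

```python
import heapq

def bottleneck_trust(graph, source):
    """Widest-path search: for each node, the largest achievable minimum
    trust weight over all paths from `source`.
    `graph` maps node -> {neighbor: trust_weight}."""
    best = {source: float("inf")}        # the source trusts itself fully
    heap = [(-best[source], source)]     # max-heap via negated keys
    while heap:
        neg_width, u = heapq.heappop(heap)
        width = -neg_width
        if width < best.get(u, 0):
            continue                     # stale heap entry
        for v, w in graph.get(u, {}).items():
            cand = min(width, w)         # a path is limited by its weakest edge
            if cand > best.get(v, 0):
                best[v] = cand
                heapq.heappush(heap, (-cand, v))
    return best

# An infiltrator trusted highly by one honest node still only receives IPs
# proportional to the minimum trust on the whole chain from the source.
g = {"outside": {"A": 5}, "A": {"B": 2}, "B": {"infiltrator": 100}}
print(bottleneck_trust(g, "outside"))   # infiltrator is capped at 2
```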
The Arrow-Debreu-Modigliani-Miller-Merton-Markowitz framework provides many
opportunities to model and study different military scenarios. Consider an information collection
scenario: Standard procedures for running a small network of informers call for cross-partial verification of one agent against another, and of all agents against a dependable and independently verifiable source. In this situation, the "variance" of an agent's reports on M different events gives the Army a useful rough measure of that agent's reliability. Another agent characteristic is the "mean", the agent's overall value computed by aggregating over the M tasks. The Army's portfolio now consists of N active agents in the field, each submitting multiple (Mj) reports on purposely overlapping and inter-related events to facilitate cross-partial verification. To compile a useful summary of intelligence on the battleground, a quick and reliable real-time algorithm is to maximize the mean of the portfolio while minimizing its variance, each of these depending on the agents' historical means and variances on record. It is similarly possible to
model actions of civilians based on possible coalitions, the spread of information based on its
value, etc.
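The informer-portfolio example maps directly onto the standard Markowitz single-period objective. The following formalization is our illustration rather than text from the task: w denotes the allocation of attention or resources over the N agents, mu their historical mean report values, Sigma the covariance of their report errors, and lambda a risk-aversion parameter.

```latex
% Single-period mean-variance selection over N field agents:
% w_i     = allocation to agent i,
% \mu_i   = historical mean value of agent i's reports,
% \Sigma  = covariance matrix of report errors (cross-partial verification),
% \lambda = risk-aversion parameter (> 0).
\[
  \max_{w \in \mathbb{R}^{N}}\; w^{\top}\mu \;-\; \lambda\, w^{\top}\Sigma\, w
  \qquad \text{subject to}\qquad
  \sum_{i=1}^{N} w_i = 1,\ \ w_i \ge 0 .
\]
% Maximizing the portfolio "mean" while minimizing its "variance" reproduces
% the quick real-time aggregation rule described in the text.
```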
In addition to the single-period portfolio optimization model just mentioned, it is useful to consider multi-period, multi-player models under uncertain conflict and natural disaster settings. Consider the dynamic management of a very large set of embedded sensors (such as the
millions of nanobots planted by HP worldwide) in the field surrounding a Fortress on very high
ground that is inundated by flood waters and/or enemy troops. A robust sensor placed on the
highest point of the Fortress (or a high-flying satellite sensor) can for practical purposes serve as
the riskless asset candidate in a non-monetary incomplete market model since its high position
guarantees its safety while its distance from the front / waterline makes it a relatively weak but
steady source of information. Under the no-arbitrage (no free lunch) assumption of all viable economic models, which will be verified and/or explicitly built into the details of the model given below, there is therefore a single discounting factor, equalized across all the virtual managers/consumers of sensor portfolios, that will allow us to compute the relative benefits of different portfolios which return information at different times before a finite fixed time horizon T (a setting with many references in the monetary literature but almost none in the present non-monetary one, hence highly original research).
This assumption also provides a uniform risk-neutral or Martingale probability measure that
offers a valuable computational alternative to the expensive Dynamic Programming often used
in such problems to dynamically fine tune the selection of sensors at the end of each time period
before T, in order to maximize for example the expected utility of total information collected at
T.
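For concreteness, the computational shortcut invoked here can be sketched as follows; this is a generic statement of the Martingale method under the stated no-arbitrage assumption, not a result specific to this task.

```latex
% Martingale (risk-neutral) valuation of the terminal information payoff X_T:
\[
  V_0 \;=\; \frac{1}{(1+r)^{T}}\;\mathbb{E}^{Q}\!\left[\, X_T \,\right].
\]
% Two-stage procedure: (i) choose the terminal payoff X_T maximizing expected
% utility subject to the (non-monetary) budget constraint priced under Q;
% (ii) construct the period-by-period sensor-selection policy that replicates
% X_T, avoiding a full backward-induction Dynamic Programming recursion.
```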
In addition to being well motivated in its own right, area (1) will also provide input to area (2) in
terms of understanding how decentralized nodes should (or would in a game theoretic setting)
choose trust weights on edges.
Key Research Question
In area (1), the key question will be: how and to what extent can the concepts of (a) incomplete virtual markets, (b) non-monetary price mechanisms, (c) self-financing non-monetary budgetary constraints, and (d) the existence of an equalizing discount factor and uniform risk-neutral or Martingale measures be realized in computational network models of trust? These concepts are what allow the application of powerful economic thinking to the non-monetary setting of this task.
The key research questions for area (2) will be: (a) Given various network topologies, and given
various stochastic models of transactions, how long can a decentralized trust network continue to
serve before trust relationships start to degrade and the trust network starts to break down? In
other words, how much liquidity is there in a credit trust network? How does this compare to
having a centralized trusted broker who mediates in all transactions? (b) Under various network
topologies, and stochastic models on the behavior of good nodes, how robust are trust networks
to adversarial infiltration, and (c) How efficient is it to allow nodes to choose trust weights in a
decentralized fashion as opposed to imposing these weights centrally?
Initial Hypotheses
We hypothesize that:
There exist trust mechanisms that allow individuals to cooperate with each other
consistently while satisfying the goals of each player. We further posit that these cooperative behaviors, which clearly impinge on the issue of trust, can be modeled by market/credit models, albeit non-monetary ones in many of the relevant network science applications.
Credit networks can be sustained over long periods of time and provide robust performance. Moreover, network credit models can be applied to cyber-warfare and proxy/circumvention technologies. Network trust models can also be applied to information collection, which is an essential ingredient in any task devoted to the formulation and computation of trust in networks.
These hypotheses will in the near future provide connections between the current task and other parts of the Network Science effort.
Prior Work
Relevant to the work in area (1) are the stochastic optimization methods of (Sahasrabudhe and Kar, 2008) and (Subramani, 2008), which we extend from wireless network spectrum allocation to multi-period information collection by agent subnets embedded in a dynamic social network. The impact of this recent work is that it provides an economic, trading, and game-theoretic framework for implementing a Martingale pricing algorithm that will reduce the complexity of the associated optimization procedure from exponential in N to N log N, where N is the number of time periods used. This allows us to study very large models that were not tractable under previous methods.
Martingale pricing theory has been used in the past to facilitate the dynamic stochastic
optimization of a portfolio of mixed assets (Schreve, Dixit). Markov Chain models including
Monte-Carlo methods (Lim and Nebus, 2006, Zhang and Lim, 2009) have been applied to a
range of Potts models. The extensive computational work in (Lu, Korniss, Szymanski, 2008) on
the Naming Game and on leadership and community in evolving social networks has shown that, under weak assumptions on asymmetric initial agent preferences or tastes, this Markov chain model converges at an explicitly calculable rate to invariant measures that represent either a single name across the random geometric graph (RGG) or a finite number of single-named cliques (Zhang and Lim, 2009).
With respect to area (2), structural properties of trust networks modeled as graphs with positive
and negative edges (denoting trust and distrust) were first studied by Cartwright and Harary in
the 1950s (Cartwright & Harary, 1956). More recently (Guha, Kumar, Raghavan & Tomkins,
2004) proposed a framework to model trust and distrust as two separate graphs (e.g. a trust graph
and a distrust graph) each with edge weights in [0,1] representing the degree of trust/distrust.
They used the framework to predict missing trust/distrust values on an Epinions dataset. (Ghosh,
Mahdian, Reeves & Pennock, 2007; DeFigueiredo & Barr, 2005) model a trust network as a "credit network". Individual transactions in this model are well understood. However, for
tactical applications, it is important to understand how this network functions over time. Does it
remain reliable over long periods or does it quickly get to a state where trust relationships are
lost and transactions either stop happening or become inefficient? If the latter, then how can we
modify the trust model to address this problem? There is no prior understanding of the long-term
behavior and evolution of these credit networks, and this will be the main topic of study in this
task.
Technical Approach
Single-period mean-variance models are useful starting points for modeling trust in uncertain
environments such as conflicts and natural disasters. See the motivational example above of modeling and quantifying trust and distrust on the basis of maximizing information content
while minimizing risk or variance based on the Modern Portfolio Theory of Markowitz
(Markowitz 1959). One approach that is suitable for multi-period, multi-player models under uncertainty, and thus requiring trust elements and measurements of trustworthiness, is based on equalizing a strategic discount factor amongst the set of N players in the network, and on the related issue of the existence of uniform risk-neutral or Martingale measures for all N players. As
is well-known (Milne 1990, Dixit) such a uniform discount factor is needed in a multi-period
setting to bridge between any pair of time periods separated in time, and can be generated from a
single riskless asset in the model. There are several ways to come up with such a riskless asset in
the war-fighting and natural disaster setting which will be discussed in the actual work projected
in the first two years (Lim, Lecture Notes for Financial Math and Simulations RPI, copyrighted
2010 and available online http://www.rpi.edu/~limc). The motivational application gives an
example of such a riskless asset.
In such extensions, the single-period return matrix Z that we develop for any given situation will
be supplemented by more sophisticated models consisting of a finite but possibly large set of
consumers/producers interacting through a social network. This extension to a multi-player setting already assumes a secondary level of network connectivity above the basic network linking the sensors or primary assets in the field. Such extensions will be carried out by
customizing nearly neoclassical (Milne 1990) utility functions to the key players in a conflict
situation who are, for instance, semi-independent runners or controllers of agents / sensors
portfolios. Instead of tradable basic securities in a traditional market model, we will substitute a
self-financing type budgetary constraint on the finite set of sensors - wireless, drones, human
agents – assumed to be infinitely divisible, already embedded (awaiting various degrees of
activation) and exchangeable between them according to a non-monetary price mechanism at
significant points in time before the fixed horizon at time t = T.
Next, Martingale pricing methods will allow us to bypass the computational obstacles in these multi-period stochastic optimization problems, in the sense that traditional Dynamic
Programming – state of the art in many current applications – can be replaced by a vastly more
efficient two-stage procedure based on Martingales (Milne 1990, Schreve).
Area 2: As mentioned earlier, a trust network is a specific type of network where edges in the
graph represent trust between entities. We will model a trust network as a "credit network",
where every node acts as a bank and prints its own currency. Further, a weighted edge (X,Y) in
the network with weight W implies that Y has extended a credit line of W currency units to X,
i.e. Y is willing to trust up to W units of X's currency, at which point this credit line is saturated.
However, this results in an edge forming in the reverse direction (i.e. from Y to X) since Y now
has some currency from X which it can return to X in exchange for some service. Hence, these
networks can potentially sustain transactions for a long time, even though the initial trust
capacities may be small. Every service is performed in return for trusted currency, and hence,
these kinds of trust networks are quite robust to infiltrators even when some of the nodes in the
network are unknown or adversarial, which makes them particularly attractive for defense
applications and cyber-warfare. If there is no direct trust relationship between X and Y, X can
still provide a service to Y if there exists a "chain of trust" from X to Y, in which case some
amount of trust gets reversed on all links along this chain. In graph theory, this is equivalent to
continuously routing "flow" from a source to a destination in a residual capacity graph. Under
stochastic arrivals, our problem therefore reduces to the problem of continuous max-flow in
residual graphs. This is what we will study in the first year. Specifically, we will build a simulator for credit trust networks and also study the fundamental properties of these networks mathematically. We will formulate a Markov chain in which every state corresponds to a possible residual graph, analyze this Markov chain formally, show the conditions under which it has a stationary distribution, and study this stationary distribution for simple graphs.
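A minimal sketch of such a simulator is given below. The class and method names and the unit-payment routing rule are illustrative assumptions rather than the planned implementation; each directed credit line is treated as a residual capacity, a transaction succeeds when a chain of positive residual credit exists from payer to payee, and capacities along the chain are updated exactly as in a residual flow graph.

```python
from collections import deque

class CreditNetwork:
    """Toy credit network: credit[u][v] = remaining credit that v extends to u
    (i.e., how many more units of u's currency v is willing to accept)."""
    def __init__(self, credit_lines):
        # credit_lines: dict {(u, v): capacity}, meaning v trusts u up to capacity
        self.credit = {}
        for (u, v), cap in credit_lines.items():
            self.credit.setdefault(u, {})[v] = cap
            self.credit.setdefault(v, {}).setdefault(u, 0)

    def _find_chain(self, payer, payee):
        """BFS for a chain of positive residual credit from payer to payee."""
        parent = {payer: None}
        queue = deque([payer])
        while queue:
            u = queue.popleft()
            if u == payee:
                path, node = [], u
                while parent[node] is not None:
                    path.append((parent[node], node))
                    node = parent[node]
                return list(reversed(path))
            for v, cap in self.credit.get(u, {}).items():
                if cap > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        return None

    def pay(self, payer, payee, amount=1):
        """Route `amount` units of trusted currency; update residual credit.
        Returns True on success (liquidity available), False otherwise."""
        for _ in range(amount):
            chain = self._find_chain(payer, payee)
            if chain is None:
                return False
            for u, v in chain:
                self.credit[u][v] -= 1                             # credit line drawn down
                self.credit[v][u] = self.credit[v].get(u, 0) + 1   # residual edge reverses
        return True

# Example: a line network A - B - C with unit credit in both directions.
net = CreditNetwork({("A", "B"): 1, ("B", "A"): 1, ("B", "C"): 1, ("C", "B"): 1})
print(net.pay("A", "C"))   # True: routed via B, residual edges reverse
print(net.pay("A", "C"))   # False: liquidity in that direction is exhausted
```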

Validation Approach
Each of the following aspects (a) incomplete markets, (b) non-monetary price mechanisms, (c)
self-financing non-monetary budgetary constraints and (d) existence of equalizing discount
factor and uniform risk-neutral or Martingale measures, that are vital to the Research Program
proposed here, will have to be theoretically deduced as theorems and/or verified empirically –
only the minimum set of a priori assumptions will be made concerning these basic notions. In
this respect the work by other components of SCNARC, INARC and CNARC will be useful.
In particular, the empirical work by (Hui, Magdon-Ismail, Goldberg and Wallace, 2009) on trust mechanisms in natural disasters, the cognitive perspective on trust provided by Task T1.3, and the general equilibrium microeconomic models posited by members of the IRC will positively
contribute to the validation efforts of our theoretical models. As part of the validation of first-
order models in this task, which combines the effects of stochastic dynamic optimization and
changes in the underlying social network, we will carefully check that various realistic scenarios
of local short-time changes in the social network are indeed suitably modeled by stochastic
simulation of sensing ranges. Extending this simple model to other more complex social-
cognitive networks, the notion that asset price provides an objective metric for the current
usefulness/correctness of the data collected by a given agent is part of a powerful coarse-graining
or aggregation methodology that has served the social-political-economic sciences well (Dixit)
and is expected to provide reliable metrics for information collection tasks in the current social-
cognitive contexts. In this direction we will use the Naming Game algorithm (Lu, Korniss,
Szymanski, 2008) as an efficient diagnostic and validation tool to discover underlying persistent
community-structure in a given dynamic social network. To gain further insight and construct
network-influenced trust metrics, we will also investigate fundamental models for reinforcement
learning (Arthur, 1994; Challet and Zhang, 1997) on dynamic co-evolving networks. In
particular, we will investigate the emergence of leadership structure influenced by an evolving
trust landscape (Anghel et al., 2004; Kirley, 2006).
For area (2), we will validate our findings by (a) making sure that the simulation and formal studies support each other and, (b) equally importantly, consulting experts in the area of proxy and circumvention, and possibly State Department officials, to ensure that we are on the right track.
Research Products
Further issues concerning the dissemination of information after its efficient collection and
trustworthiness have been established will be related to the work of the IN and CN-ARCs where
the different time and spatial scales involved in the underlying dynamic socially-stratified
network will interact in complex and tightly coupled ways. The specific aims of streamlining
information collection can be accomplished within the first 18 months. Specific milestones
include specification of the concrete models, Monte-Carlo simulations of test systems and
analyses of the martingale pricing approach, resulting in two long papers, two conference papers,
and extensive lecture notes for a graduate course on networked finance offered at RPI in 2010 by
Lim. The research products of area (2) are refined trust/credit network models, which will have
direct and significant applications in the study of the impact of different trust levels and
mechanisms on the efficiency and robustness of social networks under external and internal
threats.
In area (1) we will provide technical reports, papers, and lecture notes that will be disseminated
to other parts of the Network Science effort. In particular, reports on results implementing the
Martingale pricing method in a non-monetary setting will offer a valuable alternative to Dynamic
Programming to others with related problems. Some of these findings will be announced at the
upcoming NetSci 2010 Boston conference and international school organized by Barabasi et al.
We will provide reports detailing theoretical understanding of the range of parameters where
credit-based trust networks can be sustained over long periods of time in area (2). Specifically,
the second quarter (Q2) deliverables will be a simple text-based simulator for credit-based networks and an analysis of very simple networks such as lines. The Q3 deliverable will extend this analysis to trees and possibly complete graphs. The Q4 deliverable will add more UI elements to the simulator and compare the liquidity of our decentralized models of trust with that of centralized models, for simple network topologies, using both simulations and formal analysis. This simulator will be shared with other parts of the network science effort and will
become a resource for studying trust networks. We will write a paper that synthesizes the
deliverables and circulate a preprint among network science researchers.
In the long term, in area (2), we will (a) study the robustness of these networks under various
attack models for infiltrators and behavior models for honest nodes, (b) study mechanisms for
decentralized parameter choices, and (c) consult with experts in proxy/circumvention and, if
possible, State department officials to understand how we can make our work directly useful for
cyber warfare.

References

M. Anghel, Z. Toroczkai, K.E. Bassler & G. Korniss. Competition-Driven Network Dynamics: Emergence of a Scale-Free Leadership Structure and Collective Efficiency, Phys. Rev. Lett. 92, 058701 (2004).
W.B. Arthur. Bounded rationality and inductive behavior (the El Farol problem), Amer. Econ.
Rev. 84, 406–411 (1994).
R. Bhattacharjee & A. Goel. Avoiding ballot stuffing in eBay-like reputation systems, Third
Workshop on Economics of Peer-to-Peer systems, Aug 2005.
R. Bhattacharjee & A. Goel. Algorithms and incentives for robust ranking, Proceedings of the
18th ACM-SIAM Symposium on Discrete Algorithms, Jan 2007.
D. Cartwright and F. Harary (1956), Structural balance: a generalization of Heider's theory.
Psychological Review, vol. 63, pp. 277-293.
D. Challet & Y.C. Zhang. Emergence of cooperation and organisation in an evolutionary game,
Physica A 246 (1997) 407.
D. DeFigueiredo, B. Venkatachalam & S.F. Wu. Bounds on the performance of p2p networks
using tit-for-tat strategies, in P2P: Proceedings of the Seventh IEEE International Conference on
Peer-to-Peer Computing (P2P), (Washington, DC, USA), pp. 11–18, IEEE Computer Society,
2007.
D. DeFigueiredo & E. T. Barr. TrustDavis: A non-exploitable online reputation system in CEC:
Proceedings of the Seventh IEEE International Conference on E-Commerce Technology (CEC),
(Washington, DC, USA), pp. 274–283, IEEE Computer Society, 2005.
A. Dixit. Econ Theory and Optimization, MIT Press, 1990.
A. Ghosh, M. Mahdian, D.M. Reeves, D.M. Pennock and R. Fugger. Mechanism Design on
Trust Networks, WINE, 2007, 257-268.
R. Guha, R. Kumar , P. Raghavan and A. Tomkins. Propagation of trust and distrust, WWW '04:
Proceedings of the 13th International Conference on World Wide Web, 2004, pp. 403-412.
C. Hui, M. Magdon-Ismail, M. Goldberg & W.A. Wallace. "The Impact of Changes in Network Structure on Diffusion of Warnings", Proc. Workshop on Analysis of Dynamic Networks (SIAM International Conference on Data Mining), 2009.
M. Kirley. Evolutionary minority games with small-world interaction, Physica A 365, 521–528
(2006).
C. Lim and J. Nebus. Vorticity, Statistical Mechanics and Monte-Carlo Simulations, Springer-Verlag, 2006.
Q. Lu, G. Korniss and B. Szymanski. Naming games in two-dimensional and small-world-
connected random geometric networks, Phys Rev E, 77:016111, 2008.
F. Milne. Economic Theory and Asset Pricing, Oxford University Press, 1990.
H. Markowitz. Portfolio Selection Theory, Addison-Wesley, 1959.
A. Sahasrabudhe and K. Kar. "Bandwidth allocation games under budget and access constraints", in Proceedings of CISS, Princeton, NJ, March 2008.
S. Schreve. Math of Finance, Vol 1, Springer-Verlag, 1995.
S. Subramani, T. Basar, S. Armour, D. Kaleshi and Z. Fan. "Noncooperative equilibrium solutions for spectrum access in distributed cognitive radio networks", in Proceedings of New Frontiers in Dynamic Spectrum Access Networks, DySPAN 2008, pages 1-5, 2008.
W. Zhang and C. Lim. An exactly-solved Markov Chain Model for the Naming Game, preprint
December 2009.

4.7.7 Task T2.2: Network Behavior Based Indicators of Trust (S. Adali, RPI
(SCNARC); P. Mohapatra, UCD (CNARC); N. Chawla, ND (SCNARC); K.
Haigh, BBN (IRC); M. Goldberg, RPI (SCNARC); D. Hachen, ND
(SCNARC); K. Levitt, UC Davis (CNARC); O. Lizardo, ND (SCNARC); M.
Magdon-Ismail, RPI (SCNARC); J. Opper, BBN (IRC); Z. Toroczkai, ND
(SCNARC); W. Wallace, RPI (SCNARC); F. Wu, UCD (CNARC); M.
Faloutsos, UC Riverside (IRC); Collaborators: J. Garcia-Luna-Aceves, UCSC
(CNARC); R. Govindan, USC (CNARC); S. Krishnamurthy, UCR (CNARC); B.
Szymanski, RPI (SCNARC); B. Uzzi, NW (SCNARC))

Task Overview
Trust is defined as a relationship between a trustor and a trustee. It enables interactions between
nodes: the more trust there is between two nodes, the more likely they are to interact with each
other. In addition, one can argue that even in cases where trust did not exist at first, a series of
interactions between nodes may cause trust to emerge over time. The key idea then is that
various network behaviors and patterns can be indicative of the presence (or absence) of trust, or
situations in which trust may be forming (or eroding). These two processes are interdependent:
trust has impacts on network behavior, and network behaviors can impact trust. This is not a simple cycle but rather an ongoing process in which interactions within social networks are intertwined with the evolution of trust relations between social actors.
Social networks have a tendency to exhibit triangular closure and transitivity such that if A
interacts with B and B interacts with C, then it is very likely that A will interact with C (Davis
1963; Granovetter 1973). This clustering can occur through a variety of mechanisms, such as in
functional groups (i.e. a task directed project) (Feld 1982, Uzzi and Spiro 2004), because of
social balance processes (Davis 1963; Hummon and Doreian 2003) or even affinity/similarity
(McPherson, Smith-Lovin and Cook 2001). The end result is clusters or islands within social
networks in which social interactions including information flows are more likely to occur within
communities and much less likely to occur between communities unless there are "bridges"
connecting communities (Granovetter 1973; Watts and Strogatz 1998). Communities and more
generally the topology of social networks are created through the formation and dissolution of
social ties (Burt 2000; Wellman et al 1997; Feld, Suitor and Hoegh 2007). To some extent social
tie formation and dissolution is endogenous, i.e. a function of node traits and social processes
within a social network/group such as homophily, proximity, task directives, and norms of
reciprocity (Toivonen et al 2009). But tie formation and dissolution can also be impacted by (1)
information networks (and most importantly the information flows) through influence processes
(A persuades B to be more like A) and (2) the tools and configurations of communication
systems designed to facilitate information transfer and processing. But just as important, and
maybe more important, is how information networks (and their flows) and communication
networks impact social network interactions and evolution (Doreian 2002). This is clearest in the
case of limited bandwidth resulting in actors who want/need to communicate but are unable to do
so (Aral and Alstyne 2007). Communication networks have to evolve with social networks and
the information transfer needs they generate; otherwise a communication network's design and
structure is likely to inhibit interaction and not facilitate it. What is needed then are smart
communication networks, i.e. networks which are aware of the interactions occurring within
them and can modify themselves (e.g., switch communication between bands) based on those
interactions. A similar point can be made with information networks. As social ties form and
dissolve, the flows of information, both the levels and the paths, will change.
From the opposite end, the behavior of communication and information networks impacts the use
of networks to facilitate interactions and the formation of ties within social networks. A change
in network dynamics will have an impact on who can communicate with whom and what social
actors can hear and see. In addition, network dynamics can alter the paths through which
information flows and the quality of the information that arrives at a given point in the social
network. Hence, interactions observed between individuals can contain artifacts resulting from
the fact that they are embedded in a communication and information network structure. In the
communication domain, network dynamics is impacted by node mobility, heterogeneity, and the
variations in channel conditions. These variations cause the link quality and the network topology to change, which in turn impacts the network reliability and performance. These factors can, therefore, impact the trustworthiness of the network. Statistical analysis of the network dynamics in terms of link quality and topological variance could reveal patterns that relate to the quantification of trust factors. For example, specific statistical characteristics could be linked to link failures or to abnormal behaviors related to trustworthiness. Similarly, topological changes coupled with heterogeneity may impact the provenance of a flow, and thereby relate to trust. A careful and detailed analysis of such statistical attributes, along with their impact on trust, will be one of the goals of this task.
Hence, certain characteristics of the communication and information network behaviors may
facilitate interactions that are crucial to trust while others may inhibit it. There is substantial
interdependence between social, information and communication network phenomena. It is
crucial to study the touch points between different networks in order to understand how trust
evolves in such composite and multilayered networks. In this task, we will study statistical
features of the network behavior characteristics that could signal the existence of trust or a
change in the level of trust, either positive or negative. From a social networking perspective the
regularity, intensity, reciprocity, embeddedness, and persistence of communication patterns and
interactions between people may signal trust among them. From a communication network
perspective, the variation in link quality or changes in network topology and node mobility can
signal changes in network performability (performance and reliability) thereby impacting trust.
From an information networking perspective, the sharing of valuable information early and often
may be related to the development of trust. Therefore for social, communication and information
networks it is essential to (1) develop methods for computing statistical metrics of key network
processes and flows using distributed algorithms operating with local information, and (2) develop tools that facilitate integrative research, enabling researchers to aggregate network changes across multiple network types.
The ultimate aim of this task is to develop the science necessary to provide guidelines for
designing communication and information networks that both facilitate trust in social networks
and which operate more effectively by better utilizing the resources available based on the
constraints imposed on them by social networks. In year 1, we will concentrate on identifying the
significant patterns related to trust and possible touch points between these patterns. In the
subsequent years, we will continue to develop the theory of patterns in different network types,
but at the same time concentrate on the correlations of these patterns across different networks
and how these correlations can be used to design better networks. Can we design better
communication and information networks? Can we understand social networks better or
facilitate better social interactions based on the communication and information networks they are built on?
Task Motivation
Tactical networks operate under several adverse conditions. They are impacted by the inherent
variations within the network, the nature of information flows, and the social relationship
between the sender(s) and receiver(s). For example, in a battlefield, the wireless link qualities
may vary because of interferences and changes in topology. The information flow may vary in
terms of content, urgency, quality, and modality. Similarly, the social relationships between the
source(s) and destination(s) could change because of varying trust depending on the scenario and
the personnel or equipment involved. To achieve success in a high risk mission, it is important
that soldiers participating in the mission share valuable information in a timely manner. The
structure and functioning of a unit's social, communication and information networks, and the
interconnections between these networks can affect essential information sharing and
acquisition. Within the social network, soldiers will most likely share information with others
they trust and transmit information they find credible. Hence, trust plays an important role in
determining who will speak to whom and how information will flow in such social networks.
Within the communication network, the underlying channels through which information is
shared and acquired must be reliable and trustworthy. Within the information network there need to be mechanisms through which new, different, and even divergent information is
obtained and routed to people who need that information. Meeting these "needs" of each network can be, and often is, done independently, but given the interdependence between social,
communication and information networks, it is important to understand how the operation of
each network can facilitate or inhibit what the other networks are trying to achieve. For
example, strong ties among soldiers together with clustering can facilitate trust and information
sharing. However the information shared is likely to be redundant and low risk. Obtaining
valuable but new (or different) information may require links to social actors who are not
strongly connected to other social actors in the unit. Therefore, communication networks have to
be flexible enough so that they can reliably facilitate both the circulation of information among
trusted social actors and the acquisition and distribution of valuable information that is new.
Key Research Question

What are the important network topological and flow (interaction) characteristics in social,
information and communication networks that are indicative of the existence or lack of trust?
Which network features and dynamic characteristics of social, information and communication
networks affect the emergence or dissolution of trust?
How do we quantify the impact of network features and dynamics on trust (using the metrics
from T1) and provide guidelines for altering network features in order to enhance trust (feeding
to T3)?

Initial Hypotheses

Social Networks: At the dyadic level, four properties are crucial to understanding social tie
evolution and therefore the extent to which trust exists between two people: reciprocity, tie
strength, homophily (node similarity) and embeddedness (the extent to which connected nodes
have common neighbors). Our model focuses on tie persistence and decay (in contrast to tie
formation). We expect that:
Stronger and more reciprocal ties are more likely to persist and therefore such ties are
more indicative of a trust relationship.
However, weak ties can and often do persist, but we expect this to occur only when the
interacting nodes are similar in important ways. Such weak ties may even be non-reciprocal.
Their persistence could be indicative of another route to forming trusting relationships. So we
also expect that:
Weak and/or non-reciprocal ties are also likely to persist if the nodes are similar and
therefore weak and non-reciprocal ties under this condition are also indicative of a trust
relationship.
Finally, we expect that:
Strong and more reciprocal ties will tend to be embedded within communities, while
weaker and less reciprocal ties will tend to not be as embedded, indicating that trust
relationships can persist in both embedded and non-embedded situations.

Communication Networks: It is possible to derive statistical features that can characterize the
network elements (nodes, links, paths) that are impacted by the network dynamics due to
security, mobility and social networking. These statistical features can be used to improve the
network architecture and protocols to compensate for performance degradations due to variations in network dynamics.
Prior Work
Prior work in this area considers the dynamics of reciprocity in social networks in different contexts (Garlaschelli and Loffredo, 2004; Gouldner, 1960; Hallinan, 1978; Hallinan, 1980; Hammer, 1985; Mandel, 2000; Skvoretz and Agneessens, 2007; Zamora-López et al., 2008). Our work extends this work to large-scale communication networks (Hachen et al., 2009a, 2009b) and complex networks in general (Barrat et al., 2004). The impact of mobility on social networks has been studied by (Eagle, Pentland and Lazer, 2008, 2009). We intend to extend this work to issues of reciprocity (Hachen et al., 2009a, 2009b) and social influence (Hachen and Davern, 2006).
In a preliminary effort, we have shown the probability distribution function (PDF) of the RSS
and its variations in the wireless links with respect to node mobility (Govindan, et al, 2010). A
closed form expression has been derived to characterize the PDF and relate it to a set of
mobility patterns. In a recent feature in Nature (Gammon, 2010), Professor Wu introduced the
idea of integrating social networking concepts (from the application-layer) into the foundations
of network design. Instead of provisioning global connectivity (which is the current mantra of
Internet), connectivity could be established based on social interactions and relationships.
This approach emulates the normal human communication behavior. Various security threats can
be eliminated by this fundamental design philosophy. In addition, this approach would also
impact the network dynamics. For example, a node will no longer have a unique identity; it will have a contextual social identity, and information will be routed based on that contextual social identity.
Analysis of network dynamics, especially mobility, and its impact on communications is carried out in (Syrotiuk, 2006). Variation of network topology in mobile ad hoc networks and methods of obtaining a stable topology are analyzed in (Ramanathan, 2000; Cabrera, 2007). A realistic mobility model for mobile ad hoc networks is proposed in (Jardosh, 2003). There is prior work on routing algorithm design that captures link dynamics and node failures; however, there is no comprehensive analysis that derives statistical attributes of the various dynamics of the network. There has been some work on the impact of dynamics on trust in social networks and business networks (Kale, 2007, 2005); however, we are not aware of any work on the statistical characterization of network dynamics and its impact on trust.

Technical Approach

Social Networks

One of the main problems we will study is the evolution within social networks of reciprocity in
dyadic relationships and its relationship to trust. We expect that most ties begin in a more non-reciprocal (unbalanced) state, and that over time either the tie dies, as non-reciprocation by one node (A) leads the other node (B) to decrease the interactions it initiates with that node, or the tie evolves into a more reciprocal one as node A initiates more interactions with node B. We will examine whether these are the only two evolutionary paths, or whether a stable non-reciprocal relationship can also emerge and, if so, under what conditions.
Essential to modeling the evolution of social networks is directional weighted network data
indicative of the extent to which a social actor initiates interaction with another social actor. We
define these weights as $m_{ij}$, the count of the number of interactions (e.g., communication events) initiated by i and directed towards j. Reciprocity is then defined as the level of balance in a dyad, i.e., the degree to which $m_{ij} \approx m_{ji}$. The strength of a tie (edge) can be defined as the geometric mean of the weights, i.e., $\sqrt{m_{ij}\, m_{ji}}$. An important type of similarity is degree assortativity, i.e., the degree to which $d_i \approx d_j$, where $d_i$ is the out-degree of node i, e.g., the number of people with whom node i communicates. Embeddedness of a tie can be defined as the number of neighbors that i and j have in common. There are both weighted and non-weighted versions of this common neighbor measure.
The first stage of the research focuses on reciprocity and its correlates. Preliminary analyses
show that degree dissimilar ties are almost always reciprocal, indicating that degree dissimilar
dyads tend not to persist. However, among dyads in which $d_i \approx d_j$ we find both reciprocal and
non-reciprocal ties, indicating that non-reciprocal ties can persist when the nodes are similar.
The second stage involves developing algorithms and simulations to predict how newly formed
ties (which tend to be non-reciprocal) become reciprocal over time. We will examine the role
that homophily (including degree assortativity, but also similarity in terms of age, gender, and
residential location), and embeddedness play.
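The sketch below illustrates these dyad-level quantities on toy data. The helper name, the min/max ratio used to operationalize balance, and the unweighted common-neighbor count are our assumptions for illustration, not the project's chosen definitions.

```python
import math

def dyadic_measures(m, neighbors, i, j):
    """Dyad-level quantities for the pair (i, j), following the definitions
    sketched above (illustrative helper, not the project's code).
    m[(a, b)] = number of interactions a initiates toward b;
    neighbors[a] = set of a's network neighbors."""
    mij, mji = m.get((i, j), 0), m.get((j, i), 0)
    # one simple operationalization of balance: ratio of smaller to larger flow
    reciprocity = min(mij, mji) / max(mij, mji) if max(mij, mji) > 0 else 0.0
    strength = math.sqrt(mij * mji)                   # geometric mean of the weights
    embeddedness = len(neighbors[i] & neighbors[j])   # shared neighbors (unweighted)
    return {"reciprocity": reciprocity,
            "strength": strength,
            "embeddedness": embeddedness}

# Example dyad: A initiates 8 interactions toward B, B initiates 2 toward A.
m = {("A", "B"): 8, ("B", "A"): 2}
nbrs = {"A": {"B", "C", "D"}, "B": {"A", "C"}}
print(dyadic_measures(m, nbrs, "A", "B"))
# {'reciprocity': 0.25, 'strength': 4.0, 'embeddedness': 1}
```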
The third stage focuses on tie persistence or the longevity of a tie. The continuous-time hazard rate of a tie dissolving in the next moment of time,
\[
  h(t) \;=\; \lim_{\Delta t \to 0} \frac{R(t) - R(t+\Delta t)}{\Delta t\, R(t)},
\]
where R(t) is the probability that the tie survives to time t, can be modeled as a function of time itself (new ties have high failure rates and failure rates decline with a tie's age) and of time-varying covariates pertaining to the extent to which a tie is reciprocal, strong and embedded.
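One standard parameterization consistent with this description (an assumption made here for illustration, not a commitment of the task) is a proportional-hazards form with a baseline hazard that declines with tie age and time-varying dyadic covariates:

```latex
% h_0(t): baseline hazard declining with tie age;
% recip_ij(t), strength_ij(t), embed_ij(t): time-varying dyadic covariates;
% beta_1, beta_2, beta_3: coefficients estimated from the longitudinal data.
\[
  h\big(t \mid x_{ij}(t)\big) \;=\; h_0(t)\,
    \exp\!\big(\beta_1\,\mathrm{recip}_{ij}(t)
             + \beta_2\,\mathrm{strength}_{ij}(t)
             + \beta_3\,\mathrm{embed}_{ij}(t)\big).
\]
```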

The fourth stage involves modeling change over time in the information flows and levels of
interaction between communicating nodes (the $m_{ij}$'s) using change and growth models. We will investigate the extent to which growth/decline in flows is a function of the level of reciprocity already evident in the dyad, as well as the similarity between nodes. Also, recalling Granovetter's strength-of-weak-ties thesis that non-redundant (new) information is more likely to arrive at a node (A) from a neighbor that is not connected to any of node A's other neighbors, we intend to examine whether and how changes in flows are affected by embeddedness.

The final stage will involve disaggregating data so that instead of looking at counts of directional
interactions and their changes over discrete periods of time we will focus on the events
themselves and their sequencing. We will examine different types of dyadic communication
patterns or motifs, such as conversations that entail back and forth messages and sequences
indicative of the propagation of information (e.g., A calls B, and then B calls C). While
conversations do not necessarily signal trust relationships, they may be indicative of a
relationship that will evolve into trust. Purposeful forwarding of information from one node to
another can be interpreted as a trust vote on the content and its sender. Discovering a
conversation on a specific topic that is taking place may be too costly if it is based on textual analysis, for graphs involving millions of nodes. As a result, our aim will be to first develop methods to
discover and analyze these behaviors solely based on statistical analyses of the timing and
sequencing of these actions. We will extract relationships such as A trusts B (a directed link), or
there is a mutual trust relationship between A and B (an undirected link) from this graph.

Figure 4. Indicators of trust based on communication behavior

Considering conversations first, we postulate that if two nodes converse, then they are more likely
to trust each other; and a prolonged conversation reinforces this conclusion. We first propose to
partition the message exchanges into conversations based on the times between two exchanges.
The strength of the conversational trust indicator Tc between A and B is then given by:

where H(Ci) is a measure of the balance in the conversation Ci. One possible measure of balance
is based on entropy:
where p is the fraction of the messages in the conversation sent by A.
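The expressions for the indicator and the balance measure did not survive in this copy of the document; one plausible reading consistent with the surrounding definitions (the summation over conversations is our assumption) is the following.

```latex
% Conversational trust indicator between A and B over conversations C_1, ..., C_n
% (summation over conversations is an assumed aggregation rule):
\[
  T_c(A,B) \;=\; \sum_{i=1}^{n} H(C_i),
  \qquad
  H(C_i) \;=\; -\,p\log_2 p \;-\; (1-p)\log_2(1-p),
\]
% where p is the fraction of messages in C_i sent by A. H(C_i) equals 1 for a
% perfectly balanced exchange and 0 when only one side speaks.
```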

For the propagation of information, we compute the following. If A sends a message to B, and if
B, within some time interval, propagates the message to some third person X (or Y), this is
indicative of trust. Similarly, a repeated propagation makes the conclusion stronger. We also
construct a propagation graph based on the interevent times, without checking the content. There
are multiple ways to assign weights to this interaction: prop_AB/prop_B measures the fraction of all the messages propagated by B that come from A, i.e., the amount of effort B spends propagating messages from A. Similarly, prop_AB/m_AB measures the fraction of the messages from A that B considers important enough to propagate.
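The following minimal sketch shows how these propagation-based weights could be estimated from a timestamped message log. The function name and, in particular, the rule that any message sent by B within a fixed time window of receiving from A counts as a propagation of A's message are simplifying assumptions for illustration.

```python
from collections import defaultdict

def propagation_indicators(messages, window=3600.0):
    """Estimate prop_AB/prop_B and prop_AB/m_AB from a timestamped log.
    messages: list of (timestamp, sender, receiver), sorted by timestamp."""
    m = defaultdict(int)           # m[(A, B)]: messages A sends to B
    prop = defaultdict(int)        # prop[(A, B)]: messages from A that B propagated
    prop_total = defaultdict(int)  # prop_total[B]: all messages B propagated
    inbox = defaultdict(list)      # inbox[B]: (time, sender) of messages B received

    for t, sender, receiver in messages:
        m[(sender, receiver)] += 1
        # any recent incoming message to `sender` is treated as now propagated
        for t_in, src in inbox[sender]:
            if t - t_in <= window and src != receiver:
                prop[(src, sender)] += 1
                prop_total[sender] += 1
        inbox[receiver].append((t, sender))

    return {
        (a, b): {"prop_share": c / prop_total[b],   # prop_AB / prop_B
                 "msg_share": c / m[(a, b)]}        # prop_AB / m_AB
        for (a, b), c in prop.items()
    }

# Example: A -> B at t=0, then B -> C at t=10 looks like B propagating A's message.
log = [(0.0, "A", "B"), (10.0, "B", "C")]
print(propagation_indicators(log, window=60.0))
# {('A', 'B'): {'prop_share': 1.0, 'msg_share': 1.0}}
```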

Communication Networks

The primary causes of the dynamics from a communication network viewpoint are related to the
variations in channel quality, failures of nodes and links, and node mobility. In addition, security
attacks or perceived threats create adversarial dynamicity. Changes and variances in application
requirements and the social contexts also cause network dynamism. Social relationships create
both spatial and temporal dynamism in the network. These causes of network dynamics have an
impact on the path or links through which information is transferred through the network.
Changes in paths translate into a change in the set of nodes and links with various degrees of
trustworthiness associated with them. They also impact trust factors such as provenance,
credibility, security, confidentiality, among other aspects.

We propose to first derive a set of statistical features such as various distribution functions that
can characterize the network elements (nodes, links, paths) that are impacted by the network
dynamics. For example, in our preliminary effort, we have derived closed form expressions for
the probability distribution function of the variation in link qualities due to various node mobility
patterns in wireless networks. We propose to derive similar statistical characterizations for the
link quality variations due to interferences, and the topological variations due to node/link
failures. Deriving the correlation of social dynamics and the network characteristics in the
statistical domain will be an interesting task that will need strong collaborations between
SCNARC and CNARC researchers.
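As a purely illustrative sketch of the kind of statistical characterization intended here (the function name and synthetic fading model are assumptions, and real QuRiNet traces would replace the generated samples), the per-link empirical distribution and variability of received signal strength can be computed as follows.

```python
import numpy as np

def link_quality_statistics(rss_samples, bins=20):
    """Empirical characterization of per-link quality variation from
    received-signal-strength time series.
    rss_samples: dict {link_id: sequence of RSS readings in dBm}."""
    stats = {}
    for link, samples in rss_samples.items():
        samples = np.asarray(samples, dtype=float)
        hist, edges = np.histogram(samples, bins=bins, density=True)
        stats[link] = {
            "mean_dbm": samples.mean(),
            "std_dbm": samples.std(),          # variability that drives topology churn
            "empirical_pdf": (hist, edges),    # estimate of the RSS distribution
        }
    return stats

# Example with synthetic mobility-induced fading around -70 dBm:
rng = np.random.default_rng(0)
stats = link_quality_statistics({"n1-n2": rng.normal(-70, 6, size=1000)})
print(round(stats["n1-n2"]["mean_dbm"], 1), round(stats["n1-n2"]["std_dbm"], 1))
```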

The correlation between the statistical characterizations and their impact on various trust factors
will be the second scope of our study under this task. We will analyze how the statistical
characterizations of the link quality variations impact the changes in topology, and thereby affect
the provenance and credibility of information flow through the network.

The combination of the above two scopes will provide us insight into the correlation between network characteristics and trust. We will use the metrics developed in Project T1 to quantify these correlations, and the inferences derived from them will help in the trust propagation designs envisioned in Task T3.1.

Validation Approach
The reciprocity evolution and persistence hypotheses will be validated using cellular telephone
network data on the calling and texting patterns of over 7 million subscribers of one cellular
telephone company over an extended period of time. With these data we can identify who
communicates with whom within a given time period and, among those dyads, calculate how often each node initiates the communication event. Information on callers and callees, including age, gender and residential location, is used to develop measures of homophily. Three-dimensional plots and other visualization techniques are used to detect and identify network patterns and relationships among network variables. Hazard rate, logistic regression and growth models are used to model the effects of covariates on tie persistence and growth in tie flows over time, supplemented by machine learning and data mining techniques. Agent-based modeling is used to simulate the network evolution of both reciprocity and tie persistence.

We will test the reciprocity and trust relationship by studying communications in Twitter. We
will extract conversation and behavior relationships from the Twitter data set and test whether
they represent similar or different types of relationships. We will then test whether our statistical
measures capture the relationships induced from actual propagation behavior that will be used as
a proxy for trust. As trust is a founding element of communities, we will examine the communities formed by trusting relationships and test the degree to which communities formed by different types of relationships are similar to or different from each other.

From the communication perspective, we are in the process of collecting a large set of
experimental data from our existing QuRiNet (Wu and Mohapatra, 2010) to validate the
proposed technical approaches. QuRiNet is a unique test-bed in the sense that it can be
configured for various sensitivity analyses. Large scale experiments can be conducted using
QuRiNet to validate the statistical behaviors derived in the proposed study. The network
topology can be varied and the path of information flow can be changed to generate input data
for the validation model. In parallel, a large amount of experimental data can be analyzed to help
design better models to capture the impact of network dynamics on the reliability of the network.

Research Products
The major product that will come out of this research is a series of papers designed to advance
the theoretical understanding of how reciprocity develops in dyadic relationships. Theories about
how reciprocity emerges and the role of reciprocity in social networks are underdeveloped
because until recently researchers did not have longitudinal weighted edge data with which they
could assess the degree of reciprocity in a dyad and changes in that degree over time. Analysis
and models of the determinants of reciprocity will help us understand tie persistence and decay,
given that we expect more reciprocal ties to persist. Advancing theories about reciprocity and
balance within dyads through this research will then provide the foundation for exploring how
reciprocity (balanced flows) is related to other flow aspects, in particular the redundancy of
information and the amount and quality of the information that is being conveyed.
References

S. Aral and M. Van Alstyne. "Network Structure & Information Advantage." Proceedings of the Academy of Management Conference, Philadelphia, PA, 2007.
A. Barrat, M. Barthelemy, R. Pastor-Satorras & A. Vespignani. "The architecture of complex weighted networks". Proceedings of the National Academy of Sciences, 101(11):3747–3752, 2004.
R. S. Burt. "Decay Functions." Social Networks 22:1–28, 2000.
J. B. D. Cabrera, R. Ramanathan, C. Gutiérrez, and R. K. Mehra. "Stable Topology Control for Mobile Ad-Hoc Networks", IEEE Communications Letters, Vol. 11, No. 7, July 2007.
M. Ciglaric, A. Krevl, M. Pancur, and T. Vidmar. "Effects of Network Dynamics on Routing Efficiency in P2P Networks", World Academy of Science, Engineering and Technology, 8, 2005.
J. A. Davis. "Structural Balance, Mechanical Solidarity, and Interpersonal Relations." American Journal of Sociology 68:444–62, 1963.
P. Doreian. "Event Sequences as Generators of Network Evolution." Social Networks 24:93–119, 2002.
N. Eagle, A. S. Pentland & D. Lazer. "Mobile Phone Data for Inferring Social Network Structure". Social Computing, Behavioral Modeling, and Prediction, pages 79–88, 2008.
S. L. Feld. "The Focused Organization of Social Ties." American Journal of Sociology 86:1015–1035, 1982.
S. L. Feld, J. J. Suitor and J. G. Hoegh. "Describing changes in personal networks over time." Field Methods 19:218–236, 2007.
K. Gammon. Networking: Four ways to reinvent the Internet, Nature 463, 602–604, 2010.
D. Garlaschelli and M. I. Loffredo. Patterns of link reciprocity in directed networks. Physical Review Letters, 93(26):268701, 2004.
K. Govindan, K. Zeng, P. Mohapatra. Probability Density of the Received Power in Mobile Networks, submitted to IEEE Transactions on Wireless Communications, 2010.
M. S. Granovetter. "The Strength of Weak Ties." American Journal of Sociology 78:1360–80, 1973.
A. W. Gouldner. The norm of reciprocity: A preliminary statement. American Sociological Review, 25(2):161–178, 1960.
D. Hachen & M. Davern. "The role of information and influence in social networks: examining the association between social network structure and job mobility". American Journal of Economics and Sociology 65(2):269–293, 2006.
D. Hachen, O. Lizardo, C. Wang, Z. Toroczkai, N. V. Chawla, R. Lichtenwalter, T. Raeder, A. Strathman & Z. Zhou (2009a). "Reciprocity: The Missing Link". NetSci'09, International Workshop on Network Science, Venice, Italy, June 2009.
D. Hachen, O. Lizardo, C. Wang & Z. Zhou (2009b). "Correlates of Reciprocity in a Large-Scale Communication Network: A Weighted Edge Approach". International Sunbelt Social Network Conference, San Diego, March 2009.
M. T. Hallinan & E. E. Hutchins. "Structural Effects on Dyadic Change". Social Forces, 59(1):225–245, 1980.
M. Hammer. Implications of behavioral and cognitive reciprocity in social network data. Social Networks, 7:189–201, 1985.
N. P. Hummon and P. Doreian. "Some dynamics of social balance processes: bringing Heider
back into balance theory." Social Networks 25:17–49, 2003.
A. Jardosh, E. M. Belding-Royer, K. C. Almeroth, and S. Suri. "Towards realistic mobility
models for mobile ad hoc networks." In Proceedings of the 9th Annual ACM International
Conference on Mobile Computing and Networking (MobiCom'03), pages 217–229, September
2003.
A. Kale, A. Karandikar, P. Kolari, A. Java, and A. Joshi. "Modeling Trust and Influence in the
Blogosphere Using Link Polarity." In Proceedings of the International Conference on Weblogs
and Social Media (ICWSM 2007), March 2007.
M. McPherson, L. Smith-Lovin, and J. M. Cook. "Birds of a Feather: Homophily in Social
Networks." Annual Review of Sociology 27:415–444, 2001.
M. Mandel. "Measuring tendency towards mutuality in a social network." Social Networks,
22(4):285–298, 2000.
R. Ramanathan and R. Rosales-Hain. "Topology control of multihop wireless networks using
transmit power adjustment." In Proc. IEEE Infocom, 2000.
J. Skvoretz and F. Agneessens. "Reciprocity, multiplexity, and exchange: Measures." Quality and
Quantity, 41(3):341–357, 2007.
V. K. Syrotiuk, K. Shaukat, Y. J. Kwon, M. Kraetzl, and J. Arnold. "Application of a Network
Dynamics Analysis Tool to Mobile Ad Hoc Networks." Proceedings of the 9th ACM International
Symposium on Modeling, Analysis and Simulation of Wireless and Mobile Systems, pages 36–43,
2006.
R. Toivonen, L. Kovanen, M. Kivela, J. P. Onnela, J. Saramaki, and K. Kaski. "A comparative
study of social network models: network evolution models and nodal attribute model." Social
Networks 31:240–254, 2009.
B. Uzzi and J. Spiro. "Collaboration and Creativity: The Small World Problem." American
Journal of Sociology 111:447–504, 2005.
D. J. Watts and S. H. Strogatz. "Collective Dynamics of 'Small-World' Networks." Nature
393:440–442, 1998.
B. Wellman, R. Y.-L. Wong, D. Tindall, and N. Nazer. "A Decade of Network Change: Turnover,
Persistence and Stability in Personal Communities." Social Networks 19:27–50, 1997.
D. Wu and P. Mohapatra. "A Wide-Area Wireless Mesh Testbed for Research and Experimental
Evaluations." COMSNETS, January 2010.
G. Zamora-López, V. Zlatić, C. Zhou, H. Štefančić, and J. Kurths. "Reciprocity of networks
with degree correlations and arbitrary degree sequences." Physical Review E, 77(1):016106,
2008.
4.7.8 Linkages with Other Projects


The tasks in this project will build on the economic models and the metrics of trust, connectivity,
communication, and communities developed in other tasks in the CTA. The output of these tasks
will provide useful input to the many tasks that concentrate on building networks with specific
properties.
IPP Task Linkage
T1.1 → T2.1: Understanding how trust models impact the economic models studied
R1 ↔ T2.1: Using economic models for resource allocation and node coordination
R3 ↔ T2.1, T2.2: Validation of composite models
C2.1 ← T2.1: Characterizing the performance of collaborative networking with concurrency
S1.3 → T2.2: How people use different channels for their relationships
S2.3, E3.2 ← T2.2: How the evolution of dyadic relationships contributes to community formation and dissolution
T1.1, E1.1 ↔ T2.2: How trust is measured
T3.1, E2.1, E2.2, E3.1 ↔ T2.2: How trust information is propagated in the network
R2 ↔ T2.2: Understanding how loss or disruption in one network affects other networks
C1.3 ↔ T2.2: Tasks on characterizing connectivity and information capacity and this task will complement each other in their modeling and behavioral analysis
C2.1 ← T2.2: Efforts on this task will help in characterizing the performance of collaborative networking

4.7.9 Collaborations and Staff Rotations


The tasks in this project require all centers to work closely with each other. The first year will be
spent developing collaborations within the tasks as much as possible. We will coordinate with
BBN on planning the staff rotations that are most useful to these tasks.

4.7.10 Relation to DoD and Industry Research


Foundations for trust, particularly in real-time decision making, are fundamental to the DoD.
Previous work on trust has concentrated on a single network type or on a specific factor of trust,
such as security or provenance, and the emphasis in those efforts has been on computing or
ensuring trust. This project is unique in its emphasis on the interactions between trust and the
network characteristics and on the linkages between different network types.

4.7.11 Project Research Milestones


Project Research Milestones

Due  Task  Description
Q2   T2.1  A simple text-based simulator for credit-based networks, and an analysis of very simple networks such as lines
Q3   T2.1  Extension of this analysis to trees and possibly complete graphs
Q4   T2.1  Adding more UI elements to the simulator, and comparison of the liquidity of decentralized models of trust with those of centralized models, for simple network topologies, using both simulations and formal analysis
Q2   T2.1  Specification of the concrete computational network model of trust
Q3   T2.1  Monte-Carlo simulations of test systems based on the models from Q2
Q4   T2.1  Analysis of the martingale pricing approach in models from Q2
Q2   T2.2  Initial model of how reciprocity emerges and the role of reciprocity in social networks
Q3   T2.2  Analysis of the model from Q2 using the test data sets
Q4   T2.2  Revision of the model from Q2 and validation
Q2   T2.2  Statistical analysis of the variation in link qualities due to interferences
Q3   T2.2  Analysis of the variation of topological behavior due to node mobility
Q4   T2.2  Derivation of statistical inferences of the network behavioral changes due to variations in link quality and topology

4.7.12 Project Budget by Organization


The budget in the following table is for the first year.
Budget By Organization

Organization Government Funding ($) Cost Share ($)


BBN (IRC) 103,667
ND (SCNARC) 38,500
RPI (SCNARC) 171,997 31,399
Stanford (CNARC) 110,240
UCD (CNARC) 110,000 14,500
UCR (IRC) 62,255
USC (CNARC) 10,000
Total 606,659 45,899

4.8 Project T3: Fundamental paradigms for enhancing trust

Project Lead: Karen Haigh, BBN (IRC)


Email: khaigh@bbn.com, Phone: 763-477-6500

Primary Research Staff:
C. Cotton, UDel (IRC)
M. Faloutsos, UCR (IRC)
K. Haigh, BBN (IRC)
A. Iyengar, IBM (INARC)
A. Kementsietsidis, IBM (INARC)
S. Krishnamurthy, UCR (CNARC)
C. Lin, IBM (SCNARC)
J. Opper, BBN (IRC)
Z. Wen, IBM (SCNARC)
F. Wu, UCD (CNARC)
S. Zhu, PSU (CNARC)

Collaborators:
M. Srivatsa, IBM (INARC)
N. Ivanic, ARL

4.8.1 Project Overview


This project looks at questions relating to trust meta-data: how to share trust meta-data so that
entities can make the right decisions in their current context, and how to design the network to
optimize the trustworthiness of the entities in the network.
Each time one network entity decides whether (or not) to trust another network entity, it factors
in a combination of its own evidence and evidence that was collected by others. It is our goal in
this task to determine what evidence needs to be propagated among entities, and how to support
that in a reliable, scalable way. We will explore the tradeoff between comprehensiveness and
scalability, and how to convey trust information across networks of different types.
In later years, we will use this understanding to explore how to design and modify the network to
enhance the trustworthiness of entities. We know that networks play a crucial role in the
development of trust. Several lines of research suggest that certain network configurations may
be related to (or necessary for) the emergence of trust (Guimera et al. 2005; Uzzi and Spiro
2005). Moreover, the different types of networks are inherently connected to each other:
individuals interact through communication networks, and they create and link information
through these interactions. The trustworthiness of individuals affects the trustworthiness of the
information they provide, and trusted communication channels are used more frequently,
facilitating interaction among the individuals reachable through those channels. Analyzing the
influence of networks on trust is therefore crucial to understanding what type of network
architecture is needed to achieve the level of trustworthiness required for a specific mission. Our
aim is to develop the network science tools that will help us design networks that can satisfy and
ensure trusted behavior.

4.8.2 Project Motivation


Military net-centric operations are defined by complex interaction patterns among a large
number of mission participants, and these patterns change dynamically. While these
characteristics yield distinct advantages in enabling dissemination of the right information to the
right people at the right time, the complexity of such operations also necessitates means to act on
participants that have become untrustworthy. For example, warfighters in the urban battleground
can be overrun, and adverse cyber attacks can penetrate deployed computer systems and cause
loss of confidentiality by stealing sensitive information (e.g., blue force tracking and enemy
locations), loss of integrity by corrupting information (e.g., changes to attack plan orders), and
loss of availability by delaying timely information beyond usefulness and forcing mission
participants to act on stale information. Alliances may also shift, and new groups may form
whose cooperation is unlikely.

The goal of this project is to determine how to propagate the trust meta-data so that changes in
evidence can be easily understood and accounted for, and to assess how changes in the network
can impact trustworthiness of participating nodes. For instance, sudden differences in
communication patterns might indicate a node compromise. In addition, disagreement about
information received by multiple parties could be held against the observer and therefore reduce
his trustworthiness.

4.8.3 Key Research Questions


This project is interested in how to propagate trust meta-data in a dynamic environment. We will
investigate the tradeoff between comprehensiveness and scalability, with the intent of providing
enough meta-data to other nodes such that they can make the right trust decision for their
particular context. The question of how to convey trust from one network-type to another will
play a key role. We will examine how to maintain the trust model in a dynamic environment,
focusing on how to establish trust for newcomer nodes and revoke trust from existing nodes
when they are no longer cooperating.

4.8.4 Initial Hypotheses


This task looks at how to share trust meta-data among entities of networks of different types. We
postulate that this sharing of evidence will enable entities to make better trust decisions in the
right context, thus providing a more robust, trustworthy infrastructure for the warfighter.
Our initial hypothesis in this project is that a distributed approach to monitoring and
disseminating trust will provide increased accuracy and scalability over a centralized solution.

4.8.5 Technical Approach


Overview
The tasks in this project will study how to scalably and rapidly disseminate the trust models
developed in project T1. In later years, we will examine how to modify the network(s) to
enhance the trustworthiness of entities.
Task T3.1: Trust Establishment via Distributed Oracles (Lead: K. Haigh, BBN (IRC)). The
goal of this task is to develop methods to disseminate and propagate trust models through the
network, establish trust for newcomer nodes, and maintain the trust model in a dynamic
environment. Trustworthy Distributed Oracles (TDOs) are based on the idea of multiple nodes
sharing trust information to collaboratively support decisions of whether or not to trust members.
TDOs fundamentally develop trust assessments on entities in their neighborhood (i.e., sphere of
influence, area of responsibility) through direct interaction, through observations in their local
environment, through information conveyed by intermediaries, and through interactions with
other oracles. Specifically, how these assessments are made and how local knowledge is
conveyed to peers is of particular interest in this task.
4.8.6 Task T3.1: Trust Establishment via Distributed Oracles (K. Haigh, BBN
(IRC); C. Cotton, UDel (IRC); M. Faloutsos, UCR (IRC); A. Iyengar, IBM
(INARC); A. Kementsietsidis, IBM (INARC); S. Krishnamurthy, UCR
(CNARC); C. Lin, IBM (SCNARC); J. Opper, BBN (IRC); Z. Wen, IBM
(SCNARC); F. Wu, UCD (CNARC); S. Zhu, PSU (CNARC); Collaborators:
N. Ivanic, ARL; M. Srivatsa, IBM (INARC))
Task Overview
The Trustworthy Distributed Oracles (TDOs) concept exemplifies collaboration, synergy, and
functional usefulness within a hierarchical architecture. The Oracles concept naturally mirrors
the hierarchical abstractions of the communication, information, and social/cognitive networks,
and is intended to provide the glue among them as well as the ability to integrate the distributed
analytic capabilities of those layers of abstraction at a higher layer. TDOs are based on the idea
of multiple nodes sharing trust information to collaboratively support decisions of whether or not
to trust members.
TDOs develop trust assessments on entities in their neighbourhood (i.e., sphere of influence, area
of responsibility) through direct interaction, through observations in their local environment,
through information conveyed by intermediaries, and through interactions with other oracles.
How these assessments are made and how local knowledge is conveyed to peers is of specific
interest in this task.
TDOs can also proactively modulate data flows based upon local trust. By modulation, we mean
that the Oracle applies some trust-dependent function to the data within a flow, or to the flow
itself; this is particularly important for an Oracle that spans an interface between network types
or between human and oracle interaction modalities. This requires codifying the actions that an
oracle may take in each distributed context and will depend upon the initial artifacts of the
foundational work in Project T1. As such, we do not plan on conducting substantive work in this
area in year one.
Task Motivation
Today's net-centric operations involve a large number of concurrent mission participants and
resources linked by complex interaction networks. As algorithms and mechanisms for assessing
trust evolve, important changes in the assessment picture need to trigger associated changes in
the networks. For instance, communication resources (e.g., laptops, sensors) that have become
untrustworthy need to be isolated (e.g., through firewall configurations), and information that
originated from those nodes needs to be retracted (e.g., using provenance-based mechanisms), to
ensure mission continuity even in the presence of untrustworthy nodes, including insiders and
computers taken over by the enemy.
Key Research Questions
How do we scalably disseminate and propagate trust models through the network?
How do we maintain the trust model in a dynamic environment?
How can we establish trust for newcomer nodes and revoke trust from existing nodes?
During year 1, we will be monitoring and working with other projects on the following
questions:
In cooperation with T1.1 (Unified Trust Model), we will define how to structure the trust
metric to support scalable dissemination.
In cooperation with T2.2 (Network Behaviour Based Indicators of Trust), we will examine
how to use the trust model to drive network decisions, and how to alter, modulate, or
terminate information flows within and between each type of network. We do not plan on
conducting substantive work in this area during year 1.
Initial Hypotheses
A distributed approach to trust management provides increased scalability over a centralized
solution, reacts more quickly to local observations, and survives outages, partitions and attacks
better.
Prior Work
A large body of prior work exists on construction of trustworthy systems and the propagation of
trust elements in multiple domains. This includes research in the specification and
communication of policy in systems such as KeyNote [Blaze et al., 1999] and Ponder [Damianou
et al., 2001], localized [Golbeck and Hendler, 2006] and distributed trust inferencing [Lee et al.,
2003], and distributed reputation management [Yu and Singh, 2002]. In this task, we will use the
foundations of trust established in Project 1 to develop an oracular ontology for trust
establishment and propagation that unifies each network view. For NS-CTA, we must take an
integrative approach to develop insights into how trust may be conveyed within a network and
brokered or translated at network interfaces.
Distributed hash tables have also been widely studied with implementations that include Chord
[Stoica et al., 2001], Pastry [Rowstron and Druschel, 2001], and Tapestry [Zhao et al., 2003].
Additionally, DHT propagation strategies have been evaluated generally for efficient content
delivery [Leong et al., 2009], in the security domain for access control [Ingram, 2005] and worm
containment [Hwang et al., 2006], and specifically in reputation management [Kamvar et al.,
2003]. Our work will extend prior research in disruption-tolerant DHTs where content is
replicated to network cliques each supported by a content server.
Research in fault-tolerant systems [Anderson, 1990] has led to a number of technologies that
assess trust of various networked nodes and can tolerate arbitrary (Byzantine) corruption of some
number of nodes [Lamport, 1982] [Pal et al., 2006]. Research sponsored by DARPA under the
Self Regenerative Systems (SRS) program designed automated distributed reasoning capabilities
that can dynamically create trust assessments from a combination of knowledge of different
types, including symptomatic, reactive, malicious, and relational knowledge [Benjamin, 2008],
and take automated action to isolate nodes from the network [Pal et al., 2008]. In large-scale networks,
efficient schemes for communication of the revocation space have been developed using logical
key hierarchies [Wallner et al., 1998], subset difference revocation [Naor et al., 2001], and one-
way function trees [McGrew and Sherman, 1998]. In our research, we will investigate a hybrid
approach to trust revocation that considers unilateral (irreversible) revocation and partial
revocation with incentives for suicide and possible redemption.
Technical Approach
Each TDO will maintain its own personal observations regarding interactions with other nodes.
Within a given network type, each node stores the evidence, rather than derived trust attributes,
since transformations generally lose information. Each node will make a local decision of
whether to trust based on the Unified Trust Model. Our approach of using Distributed Hash
Tables safely and securely distributes the trust models, while distributed Certification Agents
allow newcomers to quickly join the ensemble.
We will address the following challenges:
Scalable propagation of evidence
How to reliably establish trust for newcomers
How to rescind trust
In year 1, we will select one or two simple trust models as a basis. In later years, we will look at
supporting a more complex Unified Trust Model. We anticipate challenges such as (1) managing
the tradeoff between accuracy and increased communications load, (2) adversarial models of
propagation, (3) conveyance of trust across network boundaries, and (4) propagation, replication
and security for storing the trust information in disconnected networks.
Scalable Trust Propagation. A key problem when propagating trust models over networks is the
tradeoff between scalability and comprehensiveness. Not only is it near-impossible to spread all
the knowledge to all of the nodes, it is also not a conceptually reasonable thing to do. Social
networks most clearly highlight this second issue: if a (human) individual A trusts an individual
B, across how many links can that information propagate before an individual C should ignore
the information or otherwise moderate the effect? In addition, due to the inherent characteristics
of a wireless network, the propagation of information across a large number of links cannot be
achieved with high reliability. Finally, the trustworthiness of the relays (from A to C in the above
example) could affect the information that is relayed.
To address the challenge of scalability, we will leverage the work done in cybersecurity,
communications networks, and information networks on Distributed Hash Tables (DHT), such as
Chord, Kademlia or Pastry. DHT designs seek to achieve decentralization, scalability, and fault
tolerance. The DHT manages multiple replicas of the data so that a given piece of information is
distributed throughout the network, and therefore continuous node joining, leaving, and failing
can be tolerated. DHTs store (key,value) pairs that allow nodes to quickly look up the location of
a particular piece of information, and scale to extremely large numbers of nodes. Although the
original DHTs were not designed with malicious attacks in mind, their security issues, such as
user anonymity, data integrity, Sybil attacks, and routing and storage attacks, have since been
studied. Meanwhile, innovative work on DHTs in MANET environments has been carried out at
BBN (Vikas Kawadia, unpublished). Shao et al. [2007] have also proposed a key management
framework for DHTs where different types of keys are used to securely determine the storage
locations for different types of data to prevent outsider attacks.
In this particular context, each Oracle will maintain its own personal observations regarding
interactions with other nodes. The DHT replicates this information to some number of other
TDOs; only the replicate nodes know the true identity of the original. The DHT ensures that a
given node cannot predict where its replicates will be cached, thus increasing the difficulty of
successfully sabotaging the knowledge base. Whenever a node has a new observation, the data is
replicated.
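The following sketch illustrates, under simplified assumptions, the replica-placement property just described; the node and oracle identifiers are hypothetical and the hashing scheme is a generic Chord-style stand-in rather than the specific DHT we will ultimately adopt.

    import hashlib

    # Minimal sketch of the replication idea (hypothetical names and parameters):
    # a node's trust evidence is placed on the DHT ring by hashing the node's
    # identifier, so the originator cannot choose or predict its replica set.

    def ring_position(label):
        """Map an arbitrary string label to a position on a 2**32 hash ring."""
        return int(hashlib.sha256(label.encode()).hexdigest(), 16) % (2**32)

    def replica_nodes(observer_id, nodes, k=3):
        """Chord-style placement: hash the observer id to a ring key, then cache
        replicas on the key's successor and the next k-1 nodes along the ring."""
        ring = sorted(nodes, key=ring_position)
        key = ring_position(observer_id)
        start = next((i for i, n in enumerate(ring) if ring_position(n) >= key), 0)
        return [ring[(start + i) % len(ring)] for i in range(min(k, len(ring)))]

    print(replica_nodes("oracle-17", ["tdo-%d" % i for i in range(20)]))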
Supporting the DHTs is an underlying scheme that the DHT has to use to disseminate content
and allow nodes to perform fetches. The communications network will need to develop a concept
of trusted paths, which are resilient to disruption and to attack. We will leverage our ongoing
efforts in designing Sprout, a routing protocol developed for ad hoc networks, which can
mitigate the vast majority of known routing layer attacks, even when under assault from a large
number of colluding attackers. Sprout probabilistically generates a multiplicity of routes from a
source to a destination without concern for metrics, focussing on generating diverse routes; then
a route is selected based on performance. As end-to-end acknowledgements are received, the
quality of each active route is adjusted.
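The fragment below is only a toy illustration of this general route-maintenance idea (it is not the Sprout implementation): several candidate routes are kept, the best-performing route is usually chosen, a random alternative is occasionally explored, and each route's quality estimate is updated from end-to-end acknowledgements.

    import random

    # Toy illustration of the general idea (not the actual Sprout protocol).
    class RouteSet:
        def __init__(self, routes, alpha=0.2):
            self.quality = {tuple(r): 0.5 for r in routes}   # optimistic prior
            self.alpha = alpha                               # smoothing factor

        def pick(self, explore=0.1):
            """Mostly exploit the best route; explore a random one occasionally."""
            if random.random() < explore:
                return list(random.choice(list(self.quality)))
            return list(max(self.quality, key=self.quality.get))

        def report(self, route, ack_received):
            """Exponentially weighted update of the chosen route's quality."""
            q = self.quality[tuple(route)]
            self.quality[tuple(route)] = (1 - self.alpha) * q + self.alpha * (1.0 if ack_received else 0.0)

    rs = RouteSet([["A", "B", "D"], ["A", "C", "D"], ["A", "E", "F", "D"]])
    rs.report(["A", "B", "D"], ack_received=False)   # missed acknowledgement
    print(rs.pick())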
We will then examine the question of how much of the trust metadata needs to be propagated so
that nodes can make the right decisions in their context. At one end of the spectrum, TDOs can
represent trust values as fuzzy variables, where trust is represented as a tuple consisting of a
value and an associated uncertainty. As corroborating evidence is gathered, the uncertainty goes
down; conflicting evidence will increase uncertainty. At the other end of the spectrum, TDOs can
propagate each observation of each interaction. The "right answer" for the tradeoff along this
spectrum may depend on a specific situation, incorporating factors such as the size of the
network, how well connected the network is, the risk associated with the decisions that must be
made, etc. Moreover, TDOs will also need to address issues of corroborating and conflicting
evidence; the section Revoking Trust discusses potential approaches. We will also exploit the
work being conducted in T1.2 where provenance metadata propagation will be investigated – the
idea is to augment abstract models for provenance metadata so that they are capable of not only
representing provenance but a wider set of trust metadata. This wider set will impose its own
structure and constraints that need to be accommodated while information is transported, stored,
and processed by the TDOs.
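As one possible, purely hypothetical encoding of the "value plus uncertainty" end of this spectrum, the sketch below shrinks uncertainty when incoming reports corroborate the current estimate and widens it when they conflict; the update constants are illustrative, not tuned values.

    # Hypothetical encoding: trust as a (value, uncertainty) pair in [0, 1].
    class TrustEstimate:
        def __init__(self, value=0.5, uncertainty=1.0):
            self.value = value              # current belief about trustworthiness
            self.uncertainty = uncertainty  # 1.0 = no information yet

        def update(self, reported_value, reporter_weight=1.0):
            agree = 1.0 - abs(reported_value - self.value)   # 1 = corroborating
            # Corroboration shrinks uncertainty; conflict widens it.
            self.uncertainty = min(1.0, max(0.05,
                self.uncertainty + 0.2 * reporter_weight * (0.5 - agree)))
            # Move the value toward the report, more so while we remain uncertain.
            step = reporter_weight * self.uncertainty * 0.5
            self.value += step * (reported_value - self.value)

    t = TrustEstimate()
    for report in (0.9, 0.85, 0.8):   # three corroborating observations
        t.update(report)
    print(round(t.value, 2), round(t.uncertainty, 2))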
In social networks, TDOs can be human. To support human-human trust propagation, we will
develop agent "wrappers" for each individual in the network. A user interface will allow the
human to score other individuals according to different trust factors, assign a conclusion trust
value, and select which trust meta-data is propagated to other humans (customized to the
recipients).3 We also plan to model trust dissemination in social networks using our Dynamic
Probabilistic Complex Network (DPCN) model, which predicts people's behaviour in diffusing
different types of information in social networks [Lin 2007]. DPCN models the states of nodes
and edges with Markov models, in which nodes and edges move into different states as information
spreads. We will also experiment with using these models to suggest to the user what
information to propagate to whom; this activity will help determine more detailed validation of
the models.
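The following is a much-simplified stand-in for such a diffusion process (it is not the DPCN model itself): nodes move from an "unaware" to an "informed" state as each edge independently transmits the item with a per-edge probability at every step.

    import random

    # Much-simplified probabilistic diffusion on a directed graph (NOT DPCN).
    def simulate_spread(edges, seeds, steps=5, rng_seed=1):
        """edges: dict mapping (u, v) -> per-step transmission probability."""
        random.seed(rng_seed)
        informed = set(seeds)
        for _ in range(steps):
            newly = set()
            for (u, v), p in edges.items():
                if u in informed and v not in informed and random.random() < p:
                    newly.add(v)
            informed |= newly
        return informed

    edges = {("a", "b"): 0.8, ("b", "c"): 0.5, ("a", "d"): 0.2, ("d", "e"): 0.9}
    print(sorted(simulate_spread(edges, seeds={"a"})))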
In year 1, we will select one or two simple models for Trust, and work on the issues relating to
propagating these. Specific challenges we will address include:
Representation and aggregation of trust metadata in small- and large-scale networks
Tradeoffs between trust dissemination (push) and fetching (pull)
Establishing Trust. Newcomers present a particular problem for distributed recommendation
systems. We would like to allow newcomers to participate, but without opening the network to
vulnerabilities if the newcomer is malicious. It has been established that Certification Agencies
of some form are required to counter coordinated attacks from groups of newcomers [Douceur
2002]. However, a single certification agency is clearly a central point-of-failure. Multiple
certification agencies, meanwhile, allow attackers to get multiple pseudonyms.

3
We can use this "ground truth" information to validate social and cognitive trust models developed in other CTA
activities, notably T1.1 (Golbeck), T1.3 (Pirolli et al.), and T2.2 (Faloutsos).
Each TDO will act as a distributed certification agency, similar to the approach of Luo et al.
[2002] or ENTRAPPED [Ingram 2005]. ENTRAPPED operated in a fully connected network;
Luo et al. operated in a MANET environment as long as a minimum threshold of nodes was
reachable, but did not support the creation of initial certificates (true newcomers). Our approach
will attempt to hybridize these ideas, supporting disconnected operations while also supporting
newcomers. Because DHTs attempt to maintain a single (centralized) knowledge store over
multiple (distributed) locations, DHTs will restrict the total number of pseudonyms that can be
generated. Our prior work on DHTs in MANET environments will be a starting point for this
work.
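One way to make this hybrid admission idea concrete, shown below under hypothetical interfaces, is to admit a newcomer only when a threshold number of the DHT-designated certifier TDOs vouch for it, which also bounds how many pseudonyms a single attacker can mint.

    # Sketch of threshold admission (hypothetical interfaces and ids): a newcomer
    # is admitted only if at least t of the k certifier TDOs chosen via the DHT
    # vouch for it.

    def admit_newcomer(newcomer_id, certifiers, vouch, t):
        """certifiers: list of TDO ids selected by the DHT.
        vouch: callable (certifier_id, newcomer_id) -> bool, the local trust check."""
        votes = sum(1 for c in certifiers if vouch(c, newcomer_id))
        return votes >= t

    # Toy policy: a certifier vouches only if it has directly observed the newcomer.
    observations = {"tdo-3": {"node-42"}, "tdo-7": {"node-42"}, "tdo-11": set()}
    vouch = lambda c, n: n in observations.get(c, set())
    print(admit_newcomer("node-42", ["tdo-3", "tdo-7", "tdo-11"], vouch, t=2))  # True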
Despite having a certification agency, it may still be the case that a node has been compromised.
In other words, the fact that a node has been certified does not mean it is trustworthy. For that
reason, nodes will have to make local decisions about whether to trust other nodes, based on their
own local observations, and specifically choosing whether and how much evidence to use from
other nodes' observations. This approach is how social reputation systems are built today,
although not in disconnected environments.
In year 1, we will investigate (1) the efficiency of DHTs in facilitating collaborative admission
control, particularly in disconnected networks, and (2) how trust values evolve as local nodes
make local decisions.
Revoking Trust. There are two critical research issues to be addressed in trust revocation. First,
how do we determine that a node should no longer be trusted (therefore revoked)? Second, how
do we conduct trust revocation?
To answer the first question, we need to evaluate the trust level of every node and gather such
information. Given the distributed nature of the network, it is natural to request every node to be
evaluated by its direct neighbours. Indeed, most existing intrusion detection systems for
MANETs are based on neighbours' observations. Trust evidence will be uploaded to the DHT so
that everyone else can acquire it. The challenges come from the following two aspects.
First, although in our TDO model we may fully trust the Oracles, in practice, because of sudden
dynamics in the network (in the extreme case only one Oracle, or none, may be present, making
the system fully distributed), we cannot assume that Oracles always have first-hand trust
evidence about every node. That is, other regular nodes will also generate trust evidence and
input it to the system. Clearly, in this case the trust evaluation system could be abused by
malicious nodes to accuse benign nodes or to support other malicious nodes. Therefore, we need
to aggregate the (probably inconsistent) observations to reach the decision about whether a node
should be revoked. Ailon et al. [2005] address optimization problems in which one is given
contradictory pieces of input information and the goal is to find a globally consistent solution
that minimizes the number of disagreements with the respective inputs. We are considering
leveraging their algorithms to decide the optimal trust level and to evaluate both the risk of errors
and the scheme's resistance to abuse.
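As a much simpler stand-in for this aggregation step (not the Ailon et al. algorithm), the sketch below weights each, possibly contradictory, report by the reporter's own trust score and recommends revocation only when the weighted accusation mass crosses a threshold.

    # Simplified aggregation of possibly inconsistent reports (illustrative only).
    def should_revoke(reports, reporter_trust, threshold=0.6):
        """reports: dict reporter_id -> True (accuse) / False (vouch).
        reporter_trust: dict reporter_id -> weight in [0, 1]."""
        total = sum(reporter_trust.get(r, 0.0) for r in reports)
        if total == 0:
            return False
        accusing = sum(reporter_trust.get(r, 0.0)
                       for r, accuse in reports.items() if accuse)
        return accusing / total >= threshold

    reports = {"n1": True, "n2": True, "n3": False}
    trust = {"n1": 0.9, "n2": 0.7, "n3": 0.4}
    print(should_revoke(reports, trust))   # True: 1.6 / 2.0 = 0.8 >= 0.6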
Second, because the intrusion detection systems (IDSes) deployed on every node are statistical in
nature and therefore imperfect, we want to understand, through both analysis and experiments, the
impact of their individual detection accuracy on the accuracy of the whole system in revocation
decision making.
The answer to our second question (i.e., how to conduct trust revocation) is related to the first
one. Because trust is not simply a binary value, there is a fundamental tradeoff between
revocation latency and accuracy. To improve the detection accuracy, more evidence is needed,
taking more time, and thus greater damage could result if the suspect node is indeed a
malicious one. To address this challenge, we are considering two approaches.
First, we propose the idea of partial revocation. The basic idea is to increase/limit the functions
and participation of a node in the system based on its trust level. For example, a node may
forward messages of different priority levels through different neighbours based on their trust
levels and forward fewer messages for neighbours with lower trust levels.
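An illustrative (hypothetical) policy of this kind is sketched below: higher-priority traffic is routed only through neighbours above a per-priority trust floor, so a partially revoked node still participates but carries less sensitive traffic.

    # Illustrative partial-revocation policy; the priority levels and trust
    # floors are hypothetical, not doctrinal values.
    TRUST_FLOOR = {"routine": 0.2, "priority": 0.5, "flash": 0.8}

    def eligible_relays(neighbours, message_priority):
        """neighbours: dict node_id -> current trust level in [0, 1]."""
        floor = TRUST_FLOOR[message_priority]
        return [n for n, trust in neighbours.items() if trust >= floor]

    neighbours = {"alpha": 0.9, "bravo": 0.55, "charlie": 0.3}
    print(eligible_relays(neighbours, "flash"))      # ['alpha']
    print(eligible_relays(neighbours, "routine"))    # ['alpha', 'bravo', 'charlie']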
Second, for better revocation immediacy and abuse resistance, we will leverage the idea of
suicide-based trust revocation [Reidt et al., 2009]. Here, when a node accuses another node, both
of them will be temporarily revoked. This will allow immediate revocation of nodes while
preventing a malicious node from abusing the system. Later on, when the investigation (by
TDOs in our setting) turns out to support the accusation, the accused one will be permanently
revoked, while the accuser will be reincarnated. To provide an incentive for volunteering as the
accuser, a bonus (e.g., a fractional additional identity b, with 0 < b < 1) may be awarded in the
case of a justified suicide. We are investigating the following questions about this scheme. First,
in a tactical network, what are the right incentive choices, and how should they be integrated
with partial revocation (e.g., partial suicide)? Second, in the original scheme every node simply
rates another as either trusted or not trusted, and node mobility is not considered; with trust
evidence provided through DHTs in our system, we will study the efficiency of such a suicide
scheme under various mobility models.
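The sketch below captures only the essentials of this mechanism as described above (it omits most of the detail of the Reidt et al. scheme): an accusation suspends both parties, the later verdict permanently revokes the guilty party and reinstates the other, and a justified accuser is credited a bonus 0 < b < 1.

    # Simplified suicide-based revocation bookkeeping (illustrative only).
    class RevocationLedger:
        def __init__(self, bonus=0.5):
            self.status = {}          # node_id -> 'active' | 'suspended' | 'revoked'
            self.credit = {}          # accumulated accuser bonus credit
            self.bonus = bonus

        def accuse(self, accuser, accused):
            self.status[accuser] = "suspended"    # accuser "commits suicide"
            self.status[accused] = "suspended"

        def verdict(self, accuser, accused, accusation_upheld):
            if accusation_upheld:
                self.status[accused] = "revoked"
                self.status[accuser] = "active"   # reincarnated
                self.credit[accuser] = self.credit.get(accuser, 0.0) + self.bonus
            else:
                self.status[accuser] = "revoked"  # false accusation is costly
                self.status[accused] = "active"

    ledger = RevocationLedger()
    ledger.accuse("n7", "n3")
    ledger.verdict("n7", "n3", accusation_upheld=True)
    print(ledger.status, ledger.credit)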
In year 1, we will focus on the second question, i.e., how to conduct trust revocation. In later
years, we will address the first question and integrate it with DHTs.
Validation Approach
Working closely with task R3: Experimentation with Composite Networks, we will develop
specific decision making scenarios and validate them against ground truth to the extent possible.
We will run these scenarios trying various points in the spectrum of trust meta-data,
establishment approaches and revocation schemes, for each of the network types and for the
composite networks.
One important way to evaluate TDOs as a trust mechanism is through economic analysis. By
modelling participants as economic agents, we can characterize under what conditions they will
use TDOs as designed, revealing relevant information to these nodes and incorporating the TDO
reports and certifications in their own decision making. Identifying opportunities for strategic
behaviour within TDOs can in turn make it possible to identify alternative designs that make
different tradeoffs between robustness against rational behaviour and informativeness when used
to aggregate data provided by fully cooperative participants. Economic analysis can also quantify
the improvements in efficiency (better coordination, deterring deceptive activity, etc.), if any,
that are enabled by a TDO mechanism. To perform this analysis, we will develop a multiagent
economic model for the TDO use scenario, in collaboration with the project R1: Economic
Modeling (IRC, Parkes and Wellman).
In addition to the social network validation effort described above, we will investigate the
increase in people's social influence because of the TDO mechanisms. The social influence can
be measured by shared information and interests between social neighbours [Wen 2010a]. In
addition, we validate TDOs by people's productivity improvement that results from the TDO
mechanisms. People's productivity can be measured by the revenue they generated in our IBM
SmallBlue dataset. Alternatively, when the revenue data are not available, we will measure
people's productivity by classifying their communication data, using both content and network
features [Wen 2010b].
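As a hypothetical stand-in for the shared-interest measure (not the actual SmallBlue metric), the sketch below scores a person's social influence as the average topic overlap (Jaccard similarity) with each social neighbour.

    # Hypothetical influence proxy: mean Jaccard overlap of discussed topics.
    def influence_score(person_topics, neighbour_topics):
        """neighbour_topics: dict neighbour_id -> set of topics they discuss."""
        scores = []
        for topics in neighbour_topics.values():
            union = person_topics | topics
            scores.append(len(person_topics & topics) / len(union) if union else 0.0)
        return sum(scores) / len(scores) if scores else 0.0

    alice = {"logistics", "route planning", "fuel"}
    nbrs = {"bob": {"route planning", "fuel"}, "carol": {"medevac", "fuel"}}
    print(round(influence_score(alice, nbrs), 2))   # about 0.46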
Research Products
Year 1 will focus on developing a complete architecture description, with APIs between the
DHT models, routing models, trust models, trust revocation, and user interfaces. Key milestones
include the architecture document, complete with interactions among components, and strategies
for collaboration.
Given that several of the component modules already have initial implementations, we hope to
begin integration efforts this year. We will start modifying each component to support the
capabilities desired for NS-CTA, and conduct experiments to validate that the approach is
appropriate for the problem.
References
N. Ailon, M. Charikar and A. Newman. Aggregating inconsistent information: ranking and
clustering. In Proceedings of the Thirty-Seventh Annual ACM Symposium on theory of
Computing (STOC '05). 2005
T. Anderson, Fault Tolerance: Principles and Practice, 2nd edition, 1990. Springer-Verlag New
York, Inc
P. Benjamin, P. Pal, F. Webber and M. Atighetchi, Using a Cognitive Architecture to Automate
Cyberdefense Reasoning. Proceedings of the 2008 ECSIS Symposium on Bio-inspired, Learning,
and Intelligent Systems for Security (BLISS 2008), IEEE Computer Society, August 4-6, 2008,
Edinburgh, Scotland.
M. Blaze, J. Feigenbaum, J. Ioannidis, and A. D. Keromytis. The KeyNote Trust Management
System Version 2. Internet RFC 2704, September 1999.
N. Damianou, N. Dulay, E. Lupu and M. Sloman. The Ponder Policy Specification Language, in
Policy 2001: Workshop on Policies for Distributed Systems and Networks, 2001, Bristol, UK:
Springer-Verlag LNCS 1995.
J. Golbeck and J. Hendler. 2006. Inferring Trust Relationships in Web-Based Social Networks,
ACM Transactions on Internet Technology, 6(4).
K. Hwang, Y.-K. Kwok, S. Song, M. C. Y. Chen, Y. Chen, R. Zhou, X. Lou, GridSec: Trusted
Grid Computing with Security Binding and Self-Defense against Network Worms and DDoS
Attacks, In Int’l Workshop on Grid Computing Security and Resource Management (GSRM 05),
Springer-Verlag, 2005
D. Ingram, An Evidence Based Architecture for Efficient, Attack-Resistant Computational Trust
Dissemination in Peer-to-Peer Networks, Third International Conference on Trust Management,
May 2005.
S. D. Kamvar, M. T. Schlosser, and H. Garcia-Molina. EigenRep: Reputation Management in
P2P Networks. In World-Wide Web Conference, 2003.
L. Lamport, R. Shostak, and M. Pease. The Byzantine Generals Problem. ACM Transactions on
Programming Languages and Systems, 4(3):382–401, July 1982.
S. Lee, R. Sherwood, and B. Bhattacharjee. Cooperative Peer Groups in NICE. In IEEE Infocom,
San Francisco, CA, Apr. 2003.
D. Leong, T. Ho, R. Cathey, Optimal Content Delivery with Network Coding, In Conference on
Information Sciences and Systems, 2009.
C.-Y. Lin. Information flow prediction by modeling dynamic probabilistic social network. In
International Conference on Network Science, New York, May 2007.
H. Luo, J. Kong, P. Zerfos, S. Lu and L. Zhang, Self-securing Ad Hoc Wireless Networks, in
IEEE ISCC, 2002.
D. McGrew and T. Sherman. Key establishment in large dynamic groups using one-way function
trees. Technical Report 0755, TIS Labs at Network Associates Inc., Glenwood, MD, May 1998.
D. Naor, M. Naor, and J. Lotspiech. Revocation and tracing schemes for stateless receivers. In
Advances in Cryptology (CRYPTO), 2001.
P. Pal, P. Rubel, M. Atighetchi, F. Webber, W. H. Sanders, M. Seri, H. Ramasamy, J. Lyons, T.
Courtney, A. Agbaria, M. Cukier, J. Gossett, I. Keidar, An architecture for adaptive intrusion-
tolerant applications, Software: Practice and Experience, Volume 36, Issue 11-12. (September -
October 2006) (p 1331-1354)
P. Pal, F. Webber, P. Rubel, M. Atighetchi, P. Benjamin. Automating Cyber Defense
Management. Second International Workshop on Recent Advances in Intrusion Tolerant Systems
(WRAITS 2008), Glasgow, UK, April, 2008.
A. Rowstron and P. Druschel. Pastry: Scalable, distributed object location and routing for
large-scale peer-to-peer systems. In Proceedings of the 18th IFIP/ACM International Conference
on Distributed Systems Platforms (Middleware 2001), November 2001.
M. Shao, S. Zhu, W. Zhang, and G. Cao. pDCS: Security and Privacy Support for Data-Centric
Sensor Networks. In Proc. of IEEE Infocom'07, May 2007.
D. Wallner, E. Harder, and R. Agee, Key management for multicast: issues and architectures,
RFC 2627, 1998.
Z. Wen and C.-Y. Lin. Towards Finding Valuable Topics. To appear in SIAM conference on
Data Mining, April 2010.
B. Yu and M. P. Singh. An Evidential Model of Distributed Reputation Management. In
Proceedings of the 1st International Joint Conference on Autonomous Agents and MultiAgent
Systems (AAMAS), 2002.
B. Y. Zhao, J. D. Kubiatowicz, and A. D. Joseph. Tapestry: An infrastructure for fault- resilient
wide-area location and routing. Technical Report UCB//CSD-01-1141, U. C. Berkeley, April
2001.
4.8.7 Linkages with Other Projects
The work in this project will build on the work being done in T1 and T2, by taking into account
models, metrics of trust and network dynamics related to trust.
IPP Task Linkages
T1.1 ← T3.1: What is the unified trust model to disseminate
T1.1, T1.3, T2.2 → T3.1: Validation approaches for human trust models
T2.2 ← T3.1: How can we use the trust model to drive network decisions
R1 ↔ T3.1: Using economic models to validate the TDOs and trust mechanism
R3 ← T3.1: Validation of composite models
C1.4 → T3.1: The provenance and credibility model could help in building the trust propagation models considered here
E1.1 ↔ T3.1: Network and oracular ontology development
E2.1/E2.2/E3.1 → T3.1: Validation approaches for models of trust propagation
E3.2 → T3.1: Co-evolution relationship with trust revocation and reinstatement
E4.2 → T3.1: Mobility and DHT clique formation

4.8.8 Collaborations and Staff Rotations


We expect intensive online interactions as well as extensive personal interactions through
extended visits and temporary relocations that can foster joint efforts. A specific staff rotation
schedule has not yet been finalized.

4.8.9 Relation to DoD and Industry Research


Trust is exceedingly difficult to ensure, existing information infrastructures are largely
inadequate for future needs, and many past and ongoing R&D efforts have been only partially
successful. The emphasis in this project is on the propagation of meta-information about trust as
a whole, which in the past has been addressed only piecemeal and mostly inadequately.

4.8.10 Project Research Milestones

Project Research Milestones

Due  Task  Description
Q2   T3.1  Initial architecture description
Q3   T3.1  (a) Complete architecture description, with APIs and interactions among components; (b) initial modifications to component algorithms to support the API and expected behaviour
Q4   T3.1  Validation of the utility of component algorithms for this domain

4.8.11 Project Budget by Organization


The budget in the following table is for the first year.

Budget By Organization

Organization Government Funding ($) Cost Share ($)


BBN (IRC) 203,128
IBM (INARC) 41,967
IBM (SCNARC) 23,557
PSU (CNARC) 35,000
UCD (CNARC) 30,000
UCR (CNARC) 27,480
UCR (IRC) 62,256
UDEL (IRC) 102,989
Total 526,377
5 CCRI EDIN: Evolving Dynamic Integrated (Composite)
Networks

Coordinator: Prithwish Basu, BBN (IRC)


Email: pbasu@bbn.com, Phone: 617-873-7742
Government Lead: David Dent, ARL
Email: dave.dent@arl.army.mil, Phone: 301-394-1865

Project Leads:
Project E1: P. Basu, BBN (IRC); J. Hendler, RPI (IRC, SCNARC)
Project E2: J. Garcia-Luna-Aceves, UC Santa Cruz (CNARC); P. Basu, BBN (IRC)
Project E3: A. Singh, UC Santa Barbara (INARC); B. Szymanski, RPI (SCNARC)
Project E4: T. La Porta, Penn State University (CNARC)

Lead Collaborators:
A. Swami, ARL

Table of Contents
5 CCRI EDIN: Evolving Dynamic Integrated (Composite) Networks .................................... 5-1
5.1 Overview ......................................................................................................................... 5-3
5.2 Motivation ....................................................................................................................... 5-4
5.2.1 Challenges of Network-Centric Operations ............................................................. 5-4
5.2.2 Example Military Scenarios ..................................................................................... 5-5
5.2.3 Impact on Network Science ..................................................................................... 5-6
5.3 Key Research Questions ................................................................................................. 5-8
5.4 Technical Approach ........................................................................................................ 5-9
5.5 Project E1: Ontology and Shared Metrics for Composite Military Networks .............. 5-11
5.5.1 Project Overview ................................................................................................... 5-11
5.5.2 Project Motivation ................................................................................................. 5-11
5.5.3 Key Project Research Questions ............................................................................ 5-13
5.5.4 Initial Hypotheses .................................................................................................. 5-13
5.5.5 Technical Approach ............................................................................................... 5-13
5.5.6 Task E1.1: Harmonized Vocabulary and Ontology for Composite Network
Modeling (J. Hendler, RPI (IRC, SCNARC); C. Partridge, BBN (IRC); A. Singh, UC Santa,
Barbara (INARC); A. Bar-Noy, CUNY (CNARC). D. Dent, Cpt. S. Shaffer, and A. Swami,
ARL) 5-14
5.5.7 Task E1.2: Shared metrics for Composite Military Network Analysis (W. Leland, P.
Basu, BBN (IRC); A. Bar-Noy, CUNY (CNARC); J. Hendler, RPI (IRC, SCNARC); A.
Singh, UC Santa Barbara (INARC)) .................................................................................. 5-19
5.5.8 Linkages with Other Projects ................................................................................. 5-22
5.5.9 Collaborations and Staff Rotations ........................................................................ 5-23
5.5.10 Relation to DoD and Industry Research .............................................................. 5-23
5.5.11 Project Research Milestones ................................................................................ 5-24
5.5.12 Project Budget by Organization ........................................................................... 5-24
5.6 Project E2: Mathematical Modeling of Composite Networks ..................................... 5-25
5.6.1 Project Overview ................................................................................................... 5-25
5.6.2 Project Motivation ................................................................................................. 5-26
5.6.3 Key Research Questions ........................................................................................ 5-26
5.6.4 Initial Hypothesis ................................................................................................... 5-27
5.6.5 Technical Approach ............................................................................................... 5-27
5.6.6 Task E2.1: Unifying Graph Representations of Composite Networks (P. Basu, I.
Castineyra, BBN (IRC); C. Aggarwal, IBM (INARC); A. Bar-Noy, CUNY (CNARC); A.
Singh, X. Yan, UCSB (INARC); R. D‘Souza, UC Davis (CNARC); A. Swami, ARL)... 5-27
5.6.7 Task E2.2: Network Models that Capture Time (J.J. Garcia-Luna-Aceves, H.
Sadjadpour, UCSC (CNARC); C. Faloutsos, CMU (INARC); P. Basu, BBN (IRC); R.
Ramanathan, BBN (CNARC); A. Singh, UCSB (INARC); A. Swami, ARL).................. 5-30
5.6.8 Task E2.3: Interplay of Power Laws vs. Optimization Models and Random Graphs
(J.J. Garcia-Luna-Aceves, H. Sadjadpour, UCSC (CNARC); C. Faloutsos, CMU (INARC);
P. Basu, BBN (IRC); M. Faloutsos, UC Riverside (IRC); D. Towsley, UMass (IRC)) .... 5-32
5.6.9 Linkages to Other Projects ..................................................................................... 5-34
5.6.10 Relevance to US Military Visions/Impact on Network Science .......................... 5-34
5.6.11 Collaborations and Staff Rotations ...................................................................... 5-35
5.6.12 Relation to DoD and Industry Research .............................................................. 5-35
5.6.13 Project Budget by Organization ........................................................................... 5-36
5.7 Project E3: Dynamics and Evolution of Composite Networks .................................... 5-37
5.7.1 Project Overview ................................................................................................... 5-38
5.7.2 Project Motivation ................................................................................................. 5-38
5.7.3 Key Project Research Questions ............................................................................ 5-39
5.7.4 Initial Hypotheses .................................................................................................. 5-39
5.7.5 Technical Approach ............................................................................................... 5-40
5.7.6 Task E3.1 Analysis of Causal Structure of Network Interactions (L. Adamic,
Michigan (INARC); P. Basu and W. Leland, BBN (IRC); C. Faloutsos, CMU (INARC); J.J.
Garcia-Luna-Aceves, UCSC (CNARC); A. Singh, UCSB (INARC); Q. Zhao, UC Davis
(CNARC); Post-doc researcher (PSU); A. Swami, ARL) ................................................. 5-40
5.7.7 Task E3.2: Co-evolution of Social, Information, and Communication Networks with
Dynamic Endogenous and Exogenous Processes (A. Vespignani, Indiana (SCNARC); S.
Wasserman, Indiana (SCNARC); A. Pentland, MIT (SCNARC); H. Makse, CUNY
(SCNARC); G. Korniss, RPI (SCNARC); L. Adamic, Michigan (INARC); N. Contractor,
Northwestern (INARC); J. Han, UIUC (INARC); C.-Y. Lin, IBM (SCNARC); Z.
Toroczkai, ND (SCNARC); N. Chawla, ND (SCNARC); R. D‘Souza, UC Davis (CNARC);
A. Swami, ARL) ................................................................................................................ 5-44
5.7.8 Task E3.3: Data-driven Modeling and Simulation of Dynamic Community
Evolution (N. Contractor, Northwestern (INARC); J. Han, UIUC (INARC); Z. Toroczkai,
ND (SCNARC); N. Chawla, ND (SCNARC); David Hachen, ND (SCNARC); Omar
Lizardo, ND (SCNARC); A.-L. Barabasi, NEU (SCNARC); B. Szymanski, RPI
(SCNARC))........................................................................................................................ 5-48
5.7.9 Linkages with Other Projects ................................................................................. 5-54
5.7.10 Collaborations and Staff Rotations ...................................................................... 5-56
5.7.11 Relation to DoD and Industry Research .............................................................. 5-56
5.7.12 Project Research Milestones ................................................................................ 5-56
5.7.13 Project Budget by Organization ........................................................................... 5-57
5.8 Project E4: Modelling Mobility and its Impact on Composite Networks ................... 5-59
5.8.1 Project Overview ................................................................................................... 5-59
5.8.2 Project Motivation ................................................................................................. 5-59
5.8.3 Key Project Research Questions ............................................................................ 5-60
5.8.4 Initial Hypotheses .................................................................................................. 5-60
5.8.5 Technical Approach ............................................................................................... 5-61
5.8.6 Task E4.1: Deriving Data-Driven Models and Understanding Human Mobility (A.-
L. Barabasi and D. Lazer, NEU (SCNARC); B. Szymanski, RPI (SCNARC); S. Pentland,
MIT (SCNARC); H. Makse, CUNY (SCNARC); T. La Porta, PSU (CNARC)).............. 5-61
5.8.7 Task E4.2 - Deriving Metric-Driven Mobility Models (K. Psounis, USC (CNARC);
P. Mohapatra, UC Davis (CNARC); T. La Porta, PSU (CNARC); P. Basu, BBN (IRC); T.
Brown, CUNY (CNARC); A. Swami, ARL) .................................................................... 5-64
5.8.8 Task E4.3: Modeling the Impact of Mobility on Composite Network Properties
(deferred task) .................................................................................................................... 5-66
5.8.9 Linkages with Other Projects ................................................................................. 5-66
5.8.10 Collaborations and Staff Rotations ...................................................................... 5-67
5.8.11 Relation to DoD and Industry Research .............................................................. 5-67
5.8.12 Project Research Milestones ................................................................................ 5-67
5.8.13 Project Budget by Organization ........................................................................... 5-67
References .................................................................................................................................. 5-69

5.1 Overview

EDIN CCRI will focus on the modeling and mathematical representation of dynamic, composite
networks comprised of social, information and communication networks, and how stimuli, both
internal (within a specific type of network) and external (from a different network type) impact
the dynamic co-evolution of the composite network structure. EDIN will also investigate a
particular type of network dynamics, i.e., mobility, which is particularly important in the military
networks context.
5.2 Motivation

A basic tenet of Blue Forces Network-Centric Operations (NCO) is that the mission
effectiveness of networked forces can be improved by information sharing and collaboration,
shared situational awareness, and actionable intelligence. The effectiveness of such networks is
dependent on our ability to accurately anticipate the evolution of the structure and dynamics of
social-cognitive (SCN), information (IN) and communication (CN) networks that are constantly
influencing each other. While partly controlled by design and by TTPs (Tactics, Techniques and
Procedures), these networks also exhibit complex emergent restructuring and dynamics that
depend on mission context and the adversarial environment.

The objective of a tactical network may be viewed as delivering the right information at the right
time to the right user (persons, applications, and systems) so as to enable timely and accurate
decision-making (e.g., to enable the soldier to effectively "shoot, move, and communicate"), and
therefore, mission success. The tactical network is composed of multiple interacting networks:
communications networks, information networks, and command-and-control or social networks.
Understanding the structure of the component networks and the dynamics therein, and the
dynamic interactions between these networks, is crucial to the design of robust interdisciplinary
(or composite1) networks, which is one of the primary goals of the NS CTA program.
Understanding the evolution and dynamics of a network entails understanding both the structural
properties of dynamic networks and understanding the dynamics of processes (or behaviors) of
interest embedded in the network. Typically, the dynamics of network structure impacts certain
processes (e.g., how information propagates through the network); but at the same time the
dynamics of processes (or behaviors) may result in alteration of network structures. Therefore,
gaining a fundamental understanding of such relationships under several kinds of network
dynamics is of paramount importance for obtaining significant insights into the behavior and
evolution of complex military tactical networks as well as adversarial networks.

5.2.1 Challenges of Network-Centric Operations


Decision-makers in this complex network of networks generally make individual or collaborative
decisions that are influenced by their dynamic surroundings, which may include the current state
of various network variables or extraneous non-network variables; they expect to gain a certain
benefit from such a decision while incurring a certain cost (in a generalized sense).
They ideally want to make optimal decisions assuming certain knowledge of past, present, and
expected future state of the network and non-network variables but are often forced to operate
with incomplete knowledge under a time deadline. This situation is made even more complex in
the case of composite networks where a particular decision may have a significant impact on the
future structure of the network. While the individual networks (CN, IN, and SCN) have been
studied in isolation to some extent, the lack of a common mathematical language and
mathematical formalism to jointly represent these interacting networks has hindered
understanding of such complex networks as a whole. Our understanding of the structure and
dynamics of these networks, and of how they interact with one another, is limited at best.

1
We use the terms interdisciplinary and composite interchangeably in the context of EDIN.



Few theoretical results and fewer yet analytical tools exist for estimating the temporal evolution
of structure and internal dynamics of networks that comprise and interweave large-scale social-
cognitive domains, information elements of diverse nature and origins, and communications that
range from non-verbal body-language cues to modern wireless and internet links. There is a
limited understanding of how to take into account the diversity of interactions at different time
scales (from seconds to years) and spatial scales (from village to a country; from one-hop
neighborhood to the global internet) both within and across networks; and the heterogeneity in
the nodes, links, and behaviors of social networks, information sources, and communication
networks. There is also a limited understanding of how the relevant theories and tools can be
experimentally confirmed, calibrated, and validated; how to obtain appropriate data to initialize
the models; and how to confirm or disconfirm the analytical models' specific quantitative or
qualitative assumptions, especially when operating in an environment that is adversarial and
often subject to deliberate disinformation.

5.2.2 Example Military Scenarios


Consider an Infantry Brigade Combat Team (IBCT) with approximately 3000 soldiers playing
one or more roles toward several objectives. Due to its flexible design, an IBCT is highly capable
in mixed terrain defense, urban combat, mobile security missions, and stability operations, and
additionally, has sufficient motor transport to support most missions [IbctURL].
From the doctrinal characteristics, it is clear that an IBCT is a dynamic "on-the-move" entity that
depends on dynamic communications networks to achieve high-tempo operations, as well as
performing ISR (Intelligence, Surveillance, and Reconnaissance) operations by making use of
sensor information from a variety of dynamic information sources. Also, it consists of soldiers
with roles and levels of expertise spanning a large range. Since they are highly deployable and
versatile, spatio-temporal dynamics are common in all three types of networks embedded in an
IBCT over the lifetime of a mission. For example, frequent mobility of soldiers makes the
underlying CN structure extremely dynamic; the information requirements of a mission may
evolve over time based on events on the ground thus affecting the structure of the IN pertaining
to the current goals of the mission; changing the fraction of senior personnel participating in a
particular exercise or assigning multiple roles to soldiers causes dynamics in the underlying
social networks. Dynamics in each network typically impacts the properties of the composite
network (CN+IN+SCN) that evolves and morphs itself over time. For example, if an information
source is inaccessible due to jamming, soldiers move themselves or fuse information from other
sources on the fly to accomplish the mission. Sometimes forging operational social relationships
that are not mandated by the doctrine might get a particular task done in an expeditious manner.
Since such dynamic network reconfiguration steps can have a profound impact on mission
success/failure, it is very important to study the theoretical underpinnings of composite network
evolution.
Models should not only have predictive power, but should also ground their predictions in data
collected and stored by previous military commands that conducted similar operations in the same
region. Units typically conduct battle-space handovers when one unit takes over for another; a
large amount of information is passed to the gaining unit over a 1-2 month transition, during
which the gaining unit must decipher the information and begin to apply it to its current mission.
Any technology that performs this for the commander will make his/her unit highly effective for a
longer period of time, especially in the beginning phases; decision barriers will be broken down by
this technology, and commanders will make more effective en-route decisions.
Another problem arises in Army counterinsurgency (COIN): a basic tenet of the COIN warfare
doctrine is to focus on influencing the civilian population — a complex network of people and
groups, along with their information, knowledge and beliefs, and their means of communications
— by providing it with security and applying a broad range of social, economic and
informational inputs in conjunction with minimal necessary force. A critical element in realizing
the COIN vision is that planning and executing counterinsurgency operations must anticipate the
short and long-term impact of the military, economic, and informational actions taken by Blue
(coalition) forces. This prediction of anticipated evolution over time in structure and dynamics of
the Green (population) and Red (insurgent forces) networks must be performed in a largely
distributed fashion, using insights and potentially inconsistent information distributed through
multiple organizations, in a highly dynamic and adversarial environment, in a broad range of
contexts, subject to partial observability, uncertainty, and information warfare. Development of
models with predictive power can help a decision-maker play out various scenarios and make a
decision that optimizes a certain objective.

5.2.3 Impact on Network Science


The EDIN research program promises to have a significant impact on the science of composite
networks, which simply does not exist right now, or is nascent at best. Social, information, and
communication networks will all impose constraints on, as well as provide degrees of freedom
in, how information may be aggregated and delivered. CN structures evolve due to mobility,
fading, adversarial jamming, changes in traffic patterns, and power usage; SN structures evolve
due to changes in relationships that can be caused by large scale mobility or other factors; IN
structures may evolve due to changes in the characteristics of the sources, media, consumers, or
even the information fusion scheme. Not only does the network structure evolve due to such
dynamics in their own network, but it also evolves because of dynamics in another network.
Most critically, the constraints that these composite networks impose on the available knowledge
and capabilities of the entities acting at each level will profoundly influence the decisions of
these many cooperating and competing entities (people, knowledge services, and devices) – and
thereby constrain the actions and decisions by which these entities modify and evolve the
structures of these networks. Therefore, we are faced with a key challenge of uncovering the
rules of composite network formation and evolution with the objective of constructing a
predictive model.
There are several fundamental problems that are common to the three networks, even when they
are viewed separately:
- Discovery and characterization of resources (discovery of services in CN, information
  objects in IN, and known acquaintances in SCN).
- Characterization of substructures (identification of communities in SN and clusters in
  CN; identification of weak links in SN, bottleneck nodes and links in CN).
- Time-scale decompositions (characterizing effects in space and time, and the scales at
  which different components work and interact).
- Characterization of dynamic network structures from partial observations (e.g., discovery
  of command structures in SN; intrusion detection in CN; network monitoring in CN).
- Characterization of the constraints on the decision-making capabilities of entities
  embedded in the structures, and on the modifications that these entities can make to the
  networks.
- Characterization of the goals of entities of every network from partial observations (e.g.,
  at each network level: observed trade-offs made and resources allocated; observed
  responses to probe actions; observed responses to battle damage).
- Network stability (often measured by resilience to node/link failures and adversarial
  attacks).
- Stability of goals, potentially inferred from resilience of trade-offs and resource
  allocations in the face of losses or structural network changes.
However, the primary goal of the EDIN CCRI projects is to understand the analogues of the
above problems in the context of composite network operation. For example:
- Discovery of an SCN node or relationship using information from SCN, IN, and CN.
- Inference of goals in CN and IN from SCN relationships (e.g., organizational structures).
- Characterization of dynamic structure of SCN or IN (of, say, an adversary network) from
  gathered CN or SCN traces.
- Prediction of the evolution of CN structure driven by IN or SCN goals and costs.
- Time-scale decomposition in composite CN, SCN, IN: e.g., when end-to-end user traffic
  demand and types of traffic grow (relatively faster time scale), the CN service providers
  may be forced to add more routers or increase their capacity (slower upgrade cycle).
  Similarly, when a CN is disrupted (fast time scale), the IN may be forced to rely on
  Disruption-Tolerant Networking (DTN) techniques (e.g., data mules or message ferrying
  — to connect isolated networks it may be necessary to carry data physically [Shah03]),
  ultimately forcing dramatic changes in SCN control costs, goals, and strategies for
  extended periods.
- Composite network stability under different time scales: changes in CN technologies can
  affect SN relationships, and degrees of freedom in an SN can affect CN capabilities.
  Changes in capabilities will in turn change the constraints on the available strategies of
  the interacting networks; changed strategies and goals in one network genre may ensure,
  say, SCN stability in the face of CN instability. Indeed, connectivity in one network (e.g.,
  CN or SN) need not imply connectivity in another network (e.g., IN or CN), and
  heterogeneity or homogeneity in one does not imply the same characteristic in another. In
  general, resource allocation and dynamics in one network can clearly affect metrics of
  interest in the other networks, possibly at different time scales.



5.3 Key Research Questions

The driving goal of EDIN research is to understand structure and dynamics of co-evolving
composite networks leading to predictive models that enable control of composite networks.

It is clear that a common mathematical language and mathematical formalism is required to
jointly represent communication, social, and information networks and the interactions between
them. There are several fundamental research questions pertaining to the science of evolving and
dynamic composite networks that need to be addressed separately and jointly by the Centers,
which we outline below. Note that this list is not exhaustive and is only a representative list of
broad questions regarding modeling, analysis, and design concerning the science of interacting
and composite networks. Moreover, these research questions constitute the longer-term vision
for EDIN, and we will only address a smaller subset of these questions in the first year of the
program.
Joint representation and modeling of dynamic composite networks
Since structures and dynamics of one network impact those of the other networks, it is
imperative that a common mathematical model and common approaches be used in representing
and studying these networks jointly.
- How should dynamic composite networks be represented? What are the fundamental
  attributes? How do we represent dynamic network structures succinctly but with
  sufficient richness? How do we incorporate probabilistic or fuzzy information?
- How to model time-varying dependencies (from seconds to years) and interaction
  between devices, information objects, and humans? How to model conditional
  triggers/interactions between pairs of networks? How to model interaction with adversary
  networks, which may only be partially known or even observable?
- How to represent these networks at multiple levels of granularity? Are continuous
  domain methods useful?
- How can we efficiently query such networks for substructures inside them?
Dynamics and evolution of composite networks and analysis of properties
Given a network composed of SCN, IN, and CN component networks, we need to analyze and
develop methods to predict and model how the composite network behaves and performs.
- What are the factors and rules of network formation and the subsequent co-evolution?
  How do we predict the composite effects of both emergent properties from simple rules
  and complex engineered behavior from complex rules?
- What are the objectives of a composite network? What are good cross-cutting shared
  metrics for measuring the network's effectiveness in accomplishing the goal? Can the
  overall network function (metrics) be systematically composed from constituent network
  functions (metrics)?
- What are the dynamic processes executing inside static or dynamic networks? How do
  they act as stimuli and impact the modification of network structure and properties? How
  do changes in local structure affect global properties/behaviors being subject to network
  formation constraints? How does composite network topology affect fault tolerance, i.e.,
  with respect to the mission objective?
- What is the effect of deterministic and stochastic evolution in one network (e.g., spatio-
  temporal dynamics in SCN) on another network (e.g., CN) at widely varying time scales?
- What dynamic coupling between a cognitive network and known/observed human
  interaction can help determine the right method to connect to the network of information?
- How can we formalize the dynamics between a friendly network with known structure
  and an adversary network (whose structure is only partially known)?
- Are there invariants (in a statistical sense) of a network as it evolves over time? What are
  appropriate joint scaling laws and when do they manifest themselves? Given a co-
  evolution process where SCN and CN are co-evolving, what are the fundamental limits
  on information flow?
Design of a composite network that meets desired goals
The ultimate goal is to design a composite network (CN+IN+SN) structure and to be able to
monitor and control its evolution to meet desirable mission objectives. Key questions here
include the following (NOTE: the "network design" aspects will be investigated in later years of
the program):
- What factors are directly controllable and which ones can be controlled indirectly?
- What network state is directly observable and what can only be inferred and how?
- How to optimally control network structure with respect to a given cross-cutting metric
  by modifying a few key network variables while taking into account constraints imposed
  by network formation rules?

5.4 Technical Approach

The complexity of the evolving dynamics of composite networks and its importance to Network-
Centric Operations and the Army mission require a composite CCRI research plan that exploits
and unifies a wide range of approaches.
In the subsections below, we describe in detail four complementary projects that involve tight
collaboration between CNARC, INARC, IRC, and SCNARC to study different aspects of
dynamic evolving networks while addressing the common theme.

Project E1: Ontology and Shared Metrics for Dynamic Composite Networks
This project is focused on developing a shared vocabulary and ontology across social,
information, and communication networks. Specifically, it will identify the entities in a composite
network and their attributes; the relationships between them and how they affect network
formation; and the metrics that need to be defined across composite networks irrespective of
their representation structure (this will include metrics relevant to tactical missions that use all
three networks).

NS CTA IPP v1.4 5-9 March 17, 2010


Project E2: Mathematical Modeling of Composite Networks
This project will focus on the development of mathematical representations, models, and tools to
capture the salient aspects of dynamic composite networks and their evolution. Since our
knowledge about composite network structures is in its infancy, we will explore a number of
orthogonal mathematical modeling approaches rather than focusing on one or two. We will
investigate composite multi-layered graph theory, tensor analysis tools, temporal graphlets,
dynamic random graphs, constrained optimization, etc. Since each technique is likely to have
advantages and disadvantages with respect to the specific scenario, we have to be careful not to
declare winners or losers too early in the research.
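As a purely illustrative aid (not one of the modeling approaches selected for the project), the
Python sketch below shows how a toy composite network could be written down as a three-way
adjacency tensor with one slice per network genre; the layer names, nodes, and edges are
assumptions made up for the example.

import numpy as np

# Minimal sketch: a composite network as an adjacency tensor A[layer, i, j],
# one slice per genre, sharing a common node index across layers.
layers = ["CN", "IN", "SCN"]          # communication, information, social/cognitive
nodes = ["a", "b", "c", "d"]          # illustrative shared node identifiers
idx = {n: i for i, n in enumerate(nodes)}

A = np.zeros((len(layers), len(nodes), len(nodes)))

def add_edge(layer, u, v):
    """Add an undirected intra-layer edge to the tensor."""
    k = layers.index(layer)
    A[k, idx[u], idx[v]] = A[k, idx[v], idx[u]] = 1.0

add_edge("CN", "a", "b"); add_edge("CN", "b", "c")   # radio links
add_edge("IN", "c", "d")                             # document similarity
add_edge("SCN", "a", "d")                            # reports-to relationship

degrees = A.sum(axis=2)    # per-layer degree of each node, shape (layers, nodes)
aggregate = A.sum(axis=0)  # "flattened" adjacency: union of layers with multiplicity
print(degrees)
print(aggregate)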

Project E3: Structure and Co-evolution of Composite Networks


This project is focused on the investigation of the temporal evolution of various structural
properties of integrated networks. We will study the short-term effects of network stimuli and
dynamics (e.g., arrival of new information flows, deletion of nodes) on the properties of a given
composite network, and also investigate how composite networks co-evolve over longer time
scales due to endogenous and exogenous influences. This project will also focus on the
development of cross-cutting approaches for the prediction of the dynamic evolution of networked
communities (groups of soldiers, clusters of similar documents, etc.) using multi-theoretic
multi-level modeling techniques.

Project E4 - Modeling Mobility and its Impact on Composite Networks


The goal of this project is to develop a suite of mobility models that capture metrics of specific
interest to the evolution of different types of networks, and ultimately the evolution of composite
networks. We also plan for models that will make use of the motivation for movement. These
models will allow synthetic traces to be generated that are statistically close to actual traces, but
that may represent a large set of scenarios. These models may then be used by the core programs
of the CTA to determine the impact of mobility on social, information and communication
networks.
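As a point of reference only, the sketch below implements one classical synthetic mobility
model (random waypoint) in Python; the area size, speed range, and time step are illustrative
assumptions, and the project's models are expected to go well beyond this by encoding the
motivation for movement.

import random

def random_waypoint(steps, area=(1000.0, 1000.0), speed=(1.0, 5.0), dt=1.0):
    """Yield (t, x, y) positions for a single node under random waypoint motion."""
    x, y = random.uniform(0, area[0]), random.uniform(0, area[1])
    t = 0.0
    while t < steps * dt:
        # Pick a new waypoint and a speed, then move toward it in dt-sized steps.
        tx, ty = random.uniform(0, area[0]), random.uniform(0, area[1])
        v = random.uniform(*speed)
        dist = ((tx - x) ** 2 + (ty - y) ** 2) ** 0.5
        n = max(1, int(dist / (v * dt)))
        for k in range(1, n + 1):
            if t >= steps * dt:
                return
            yield t, x + (tx - x) * k / n, y + (ty - y) * k / n
            t += dt
        x, y = tx, ty

trace = list(random_waypoint(steps=50))   # a short synthetic trace for one node
print(trace[:3])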
We believe that the above four projects together give us a well-rounded research program for the
development of the science for dynamic composite networks. We will track the progress of these
projects regularly and will add or subtract projects and tasks from the mix over the next five
years in order to keep the research both scientifically and militarily relevant.



5.5 Project E1: Ontology and Shared Metrics for Composite
Military Networks

Project Lead: Jim Hendler, RPI (IRC, SCNARC);


Email: hendler@cs.rpi.edu, Phone: 518-276-4401
Project Lead: Prithwish Basu, BBN (IRC)
Email: pbasu@bbn.com, Phone: 617-873-7742

Primary Research Staff: A. Bar-Noy, CUNY (CNARC); P. Basu, BBN (IRC); J. Hendler, RPI
(IRC, SCNARC); W. Leland, BBN (IRC); C. Partridge, BBN (IRC); A. Singh, UC Santa Barbara
(INARC)
Collaborators: D. Dent, ARL; Cpt. S. Shaffer, ARL; A. Swami, ARL

5.5.1 Project Overview


This project will focus on developing a shared vocabulary across each of the three types of
networks. The first part of the project will identify entities in a composite network and their
attributes, and the relationships between these attributes and how they affect network
formation. Harmonization and disambiguation of similar concepts across different network
genres will be key goals of this project. The second part of the project will identify new "shared
metrics" that need to be defined across composite networks irrespective of their underlying
mathematical representation structure. This will include metrics relevant to tactical missions that
use all three networks.
This is a foundational project that will publish a machine-processable ontology for composite
networks, which will be eventually used by other projects across the consortium.

5.5.2 Project Motivation


Network science has been an active area of research for the past few decades. Typically network
science researchers have focused on specific types of network and have discovered specific
structural properties or laws that hold in those networks, e.g., Internet, World Wide Web,
scientific collaboration networks, social networks, and organizational networks, to name a few.
This has resulted in an explosion of "networks vocabulary and terminology" which is tailored to
each type of network, even though concepts in each network type have common semantics. For
example, there is a notion of homophily in social networks [McPherson01], which refers to the
tendency of individuals to associate and bond with other "similar" individuals. While this
concept was proposed for social networks, even information networks may exhibit "clustering
behaviors" where similar blogs may end up linking to each other; and wireless ad hoc networks
may often form clusters based on proximity due to common mission goals. Thus there is a great
deal of commonality in the basic concepts of "similarity" and "clustering", which permeate three
completely different types of networks, even though they are referred to by different names in
each field. Another example is that of a network node which lies in the "center" of a given
network – social network researchers refer to it as a node with high "betweenness centrality",
whereas a communication networks researcher refers to it as a "highly congested node under
shortest path routing".
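To make this terminological overlap concrete, the short Python sketch below computes
betweenness centrality on a toy topology (an illustrative assumption, using the networkx library);
under uniform shortest-path routing, essentially the same quantity measures how much relayed
traffic a node carries, i.e., the communication-network notion of a congested relay.

import networkx as nx

# Minimal sketch: the SN term "betweenness centrality" and the CN notion of
# "load under shortest-path routing" measure essentially the same quantity.
G = nx.Graph()
G.add_edges_from([("a", "b"), ("b", "c"), ("c", "d"), ("b", "e"), ("e", "d")])

# Social-network vocabulary: betweenness centrality of each node.
bc = nx.betweenness_centrality(G, normalized=False)

# Communication-network reading: if every node pair sends one unit of traffic,
# split evenly over its shortest paths, bc[v] is the expected traffic relayed by v.
print(bc)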
Since the NS CTA program is supposed to develop the science of composite interacting
networks, it is important to tease out such concepts that have varying degrees of commonality
between them, to model the relationships between various networks at the structural level as well
as at the level of network properties as mentioned earlier.
Challenges of Network-Centric Operations
Military missions inherently entail interactions between multiple networks in a joint
environment, both friendly (this not only includes inter-service networks, but also includes those
with other military units and U.S. allies) and adversarial, involving all genres: information,
social/cognitive, and communication networks. Usually, operators in each of these networks
operate independently and hence are unaware of the vocabulary and concepts used another
network. However we envision future network-centric operations that use these three genres of
networks synergistically – hence closing the ―vocabulary gap‖ and defining explicit operational
metrics that straddle multiple genres of military networks will be essential for improving the
operations.
Example Military Scenarios
Consider a military commander who is looking at how best to disrupt the operations of a non-
nation-state terrorist organization. A physical weapon could be deployed to attack its
communication networks infrastructure, an information operation could be deployed that would
disrupt the ability of the enemy to rely on their information space, or a special operations mission
could remove a particular human from the social network by force. In deciding how best to
impact the adversary, we will eventually need a robust network science that can let us compare
these options in a more coherent way, or more importantly that could let us use a ―combined‖
attack that would let us, for example, use a more targeted information operation to remove that
human from the command and control loop, without putting US warfighters at risk. The ability
to model the combined operations and determine their effect requires that many different kinds of
systems (physical weapons, information operations, human forces, etc.) can be directly modeled
and compared, extending the commanders options across the entire range of possible actions
through an integrated network operations suite of tools.
Impact on Network Science
In order to achieve understanding of this overall network of networks, we need to develop a
shared network-centric vocabulary and ontology for referring to the concepts in each network
and more importantly, to the dependency relationships between them. This harmonized ontology
is also needed for cross-referencing concepts, models, and theories from each of the network-
specific ARCs in order to exploit insights from one genre to accelerate progress in other genres.
In addition to correlating various terminologies across these networks of different genres, we
need to develop new vocabulary and ontology for referring to network composites and their co-
evolutionary properties. These have not been studied before in the broad area of network science;
hence new research is needed to establish new metrics that span all three types of networks
(social or SN, information or IN, and communication or CN) and capture the synergistic benefits
(or problems) that these networks receive from each other.
The focus of EDIN CCRI is on studying the dynamics and evolution of composite interacting
networks, and this project will develop the modeling vocabulary for capturing dynamics and
evolution. But first, it is necessary to develop the vocabulary of basic interactions between
networks before addressing the issue of developing the ontology and metrics for dynamics and
(co)-evolution.

5.5.3 Key Project Research Questions


This project aims at answering the following key research challenges that are important for the
development of the science of composite interdisciplinary networks:
- What concepts in social, communication, and information networks have similar analogs?
- How do we model the structure of a composite network so as to delineate the specific
  dependency relationships between its component network entities?
- How do we represent the dynamics of these composite networks as they change over
  time?
- Do metrics in individual networks have analogues in composite military networks?
- Can a composite network metric be defined as a function of basic attributes and metrics
  of its constituent social, information, and communication networks?
Project T1 also has a focus on developing metrics and vocabulary for interdisciplinary networks
but from the very specific point-of-view of "trust". Although we will be influenced by T1
regarding the specific "trust"-related issues (since trust attributes may indeed play an important
role in network evolution, among other attributes), we have a broader charter in some sense,
since we have to investigate network attributes beyond trust.

5.5.4 Initial Hypotheses


A machine-readable ontology will allow us to better integrate results from all parts of EDIN, and
will enable us to better design scientific experiments for use in evaluating theories resulting
throughout the project. Mismatches in terminology can lead to fundamental flaws in
experimental design, methodology or analysis, and this project will enable us to avoid these. We
hypothesize that a machine-readable ontology can be realized early in current standard
languages, but that in the long run we will need to extend these languages to properly encode
some of the key Network Science concepts.

5.5.5 Technical Approach


Overview
Our approach is to separate the logical modeling aspects of composite network structures and the
definition of interesting network properties from the specific mathematical models that can be
used to represent these logical relationships and efficient algorithms to compute/estimate the
aforementioned network properties. We will focus on the latter activities in project E2 while
building upon the knowledge built in E1.1 and E1.2.



This project consists of two tasks: the first task concentrates on the development of a
harmonized vocabulary and ontologies for composite network modeling, while the second task
concentrates on the identification and specification of metrics for composite networks. These two
tasks will involve tight collaboration, since the introduction of a new entity in a composite
network will have an impact on existing as well as novel shared metrics that can be defined on
composite networks.

5.5.6 Task E1.1: Harmonized Vocabulary and Ontology for Composite Network
Modeling (J. Hendler, RPI (IRC, SCNARC); C. Partridge, BBN (IRC); A. Singh,
UC Santa Barbara (INARC); A. Bar-Noy, CUNY (CNARC); D. Dent, Cpt. S.
Shaffer, and A. Swami, ARL)
Task Overview
The primary goal of this task is to model a composite network. This involves modeling it at the
communication, information, and social networking levels. We define a composite military
network (CMN) as consisting of one or more interconnected networks composed from each of
the three disparate types of networks that interact with and typically influence each other (and
military operations); these are communications networks (CN), information networks (IN) and
social cognitive networks (SCN). In this task, we will identify the fundamental entities that
constitute CMNs.
Task Motivation
We believe that a dictionary or handbook of network science will be needed if the composite
military networks we describe in this section are to be modeled and if those models are to inform
the eventual military need to optimize our network infrastructures, to protect our networks
against attack, and to enable us to find minimum effort means to upset enemy networks,
especially those used by non-nation-state actors.
Key Research Questions
- How do we model the structure of a composite network so as to delineate the specific
  dependency relationships between structural entities at the SCN, IN, and CN levels?
- How do we represent the dynamics of these composite networks as they change over
  time?
Initial Hypotheses
OWL will provide a sufficient level of expressivity for expressing composite network
structure. Extensions of OWL will be needed, and designed, to better represent the
dynamic changes in networks over time.
Prior Work
Since dynamic composite networks are a new topic of research, existing modeling approaches
cannot be applied directly to develop a model in a straightforward manner. However, modeling
nodes, edges, and flow information in various classes of networks has received a lot of attention
in the past. Entity-relationship modeling is a database modeling method, used to produce a
semantic data model of a system [Chen76] – more recent variants of this exist. Ontology
languages such as standards SKOS, RDF(S) [Rdfs04] and OWL 2 RL [Owl09] exist for semantic
modeling of complex relationships between concepts. These will be leveraged in this task.
However, these approaches are not directly amenable to a succinct representation of spatio-
temporal dynamics or probabilistic representation of various network attributes. Hence, new
modeling approaches may be necessary to cover aspects of composite networks such as temporal
modeling, causality modeling, etc.
Technical Approach
Mathematically, a network is represented as a tuple of entities that are related to each other by
labeled, binary relations² that represent the transport of certain objects from one entity to another,
either within or between network types, by leveraging the given set of relations. More formally,
each type of network consists of three basic entities, namely, nodes, edges (or links), and flows.
Examples of each entity for each of SCN, CN, and IN are depicted in Table 1. Note that while
this may sound like graph-theoretic terminology, it is more general than that, since the same
concept could be represented using tools other than graph theory. (See task R1.3 for an
example.)
Table 1: Entities in Social, Communications, and Information Networks

Node
  SCN: Entities with intelligent decision-making capabilities (humans, robots)
  IN:  Information objects: concepts, documents, content metadata
  CN:  Devices that can source, sink, store, and forward data packets

Edge
  SCN: Relationships, e.g., is-a-friend, reports-to, is-similar-to, communicates-with,
       seeks-advice-from, etc.
  IN:  Concept relationships; hyperlinks between documents; similarity between content objects
  CN:  Connectivity (and potential connectivity) relationships between nodes, e.g.,
       point-to-point, point-to-multipoint

Flow
  SCN: Abstract objects traversing a SN, e.g., information, ideas, beliefs, opinions,
       command/order, advice, affinities, etc.
  IN:  Queries and responses; web crawlers; indexing and mining process
  CN:  Streams of data bits sourced at a CN node that traverse intermediate nodes and edges

System
  SCN: Social norms, values, rules, incentives, risks, bounded cognition, human adaptability
       to task environment
  IN:  Resource constraints, linkages, background information content
  CN:  Protocols, security, operational information, reliability

² In principle these relations can be n-ary. Mathematically, n-ary networks can be reduced to binary ones by the
introduction of redundant or specially-labeled arcs. In practice, however, visualization and analysis tools may need
to include representations of the n-ary relations. Note that a relation (edge) can have continuous-valued attributes.



In this task, we will first identify various types of nodes, edges, and flows that exist in each type
of network. Additionally, we will capture certain conceptual relationships between classes of
nodes (edges, or flows), e.g., subclass/superclass; and attribute/value assignments for each type
of node, edge or flow. Such information will not be too difficult to collect since every center will
have an active participant in this project who will have sufficient domain knowledge.

Figure 1: Composite network of Communications, Information, and Social Networks

Having modeled relationships between nodes inside each type of network, we will focus on
modeling the relationships between nodes, edges, and flows of SCN and IN, IN and CN, and CN
and SCN.
Examples of common types of relationships include:
- Mapping relationships between nodes: a (human) node in SCN is mapped to a (wireless
  ad hoc) node in CN if the former uses the latter as his communications device; the human
  node is mapped to an information node in IN if the former is a consumer of that
  information, etc.
- Producer/consumer dependency relationships: a node in IN with its producer and
  consumer nodes in SCN can be mapped to a flow in CN.
- Observer relationships: a human operator could be observing/monitoring/controlling a
  CN link.
- Conditional triggers: information starts flowing from a group of sensors when a malicious
  node is detected.
- Adversarial relationships: a group of red-force operators may be attempting to jam the
  CN links of a blue-force.
Figure 1 illustrates some of the above types of relations and mappings between nodes within
information, social, and communication networks. This shows a specific representation of the
dependencies and interactions between various nodes in each network. The "cross-edges"
between the vertices of CN, IN, and SN indicate mappings between them, and they are directed
edges in general. For example, "A generates B" models the fact that a certain user A in the
military social network generates a piece of knowledge B in the information network. Similarly,
"B fused-at C" means that this piece of knowledge B is a result of information fusion that
happens at node C in the communication network.
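The following Python sketch (using the networkx library) shows one simple way such intra- and
cross-network relationships could be recorded as a typed, directed graph; the node names, genres,
and relation labels are illustrative assumptions echoing the Figure 1 discussion, not a prescribed
representation.

import networkx as nx

# Minimal sketch: a typed multigraph whose nodes carry a network genre and
# whose edges carry a relation label, covering both intra- and cross-network edges.
M = nx.MultiDiGraph()
M.add_node("A", genre="SCN")    # a human decision-maker
M.add_node("B", genre="IN")     # a piece of knowledge / information object
M.add_node("C", genre="CN")     # a communication device performing fusion
M.add_node("C2", genre="CN")    # a hypothetical second CN node

M.add_edge("C", "C2", relation="link")          # intra-CN connectivity
M.add_edge("A", "B", relation="generates")      # SCN node produces an IN object
M.add_edge("B", "C", relation="fused-at")       # IN object fused at a CN node
M.add_edge("A", "C", relation="uses-device")    # SCN-to-CN mapping

# One simple query a formation rule might use: all cross-genre edges.
cross = [(u, v, d["relation"]) for u, v, d in M.edges(data=True)
         if M.nodes[u]["genre"] != M.nodes[v]["genre"]]
print(cross)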
Modeling rules of network formation. We will investigate the various mechanisms for modeling
these relationships and how they are formed. While some of these relationships are simple binary
relations between entities in two different networks, others are more complex n-ary relations. We
will start our research with syntax-based approaches and then progress to more semantic
approaches, for example looking at the RDF(S) [Rdfs04] and OWL [Owl09] standards. It is
important to note that the existence of multiple relationships and certain attributes on a relevant
set of nodes and edges may give rise to more relationships. This can be modeled by means of
rules of network formation. Exploring the modeling and application of rules to networks will be
a linkage to the INARC Information Fusion task, which will be exploring the processing of large
RDF datasets.
Note that a composite network instance can have multiple instances of type SN, or IN; for
example, a composite network to model Army Counterinsurgency operations typically involves
three disparate social networks: enemy (red) forces, blue (friendly) forces, and green (civilian)
forces. Alternatively one could model the conglomerate of these three networks as one giant
social network. We will investigate the pros and cons of these approaches from the point of view
of scalability.
Modeling time-varying attributes and relationships. Since we are dealing with dynamic
networks, time is a very important factor that needs to be modeled – this can be done by
modeling absolute time under certain circumstances, or by modeling causal ordering in time. We
will explore the use of Allen's interval algebra [Allen83] to model relations that change with
respect to pairs of time intervals. Additionally, nodes, edges and flows may have dynamic
processes (that have explicit dependence on time) associated with them, a representational
research area that we will attack in this project (none of the currently known temporal
approaches directly covers our needs in this area). Hence, we will need to model the
relationships between time-varying attributes in a succinct manner. Multi-scalar representations
of time-varying attributes will also be considered.
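As a small illustration of the kind of interval reasoning mentioned above, the Python sketch
below encodes a few of Allen's interval relations and applies them to two hypothetical intervals
(a CN link being up and an SCN collaboration); the intervals themselves are assumptions made
up for the example.

# Minimal sketch of a few Allen interval relations over (start, end) pairs.
def before(i, j):    return i[1] < j[0]
def meets(i, j):     return i[1] == j[0]
def overlaps(i, j):  return i[0] < j[0] < i[1] < j[1]
def during(i, j):    return j[0] < i[0] and i[1] < j[1]
def equals(i, j):    return i == j

link_up = (0, 10)        # illustrative: a CN link is up over [0, 10)
collaboration = (4, 8)   # illustrative: an SCN collaboration over [4, 8)
print(during(collaboration, link_up))   # True: the collaboration occurs while the link is up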
Modeling uncertainty. Relationships and attribute values in integrated networks may be
uncertain. Much research has been done in databases on efficient methods for modeling and
representing such uncertainty. Tuple-level uncertainty, attribute-level uncertainty, and
probabilistic graphical models such as Bayesian networks have traditionally been used. We will
understand and analyze the specific kinds of uncertainties that arise in integrated networks and
devise the best methods for representing them.
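As an illustrative sketch only, the Python fragment below attaches existence probabilities to
uncertain relationships and estimates an expected connectivity property by Monte Carlo
sampling; the probabilities, and the choice of sampling rather than, say, an exact probabilistic
graphical model, are assumptions made for the example.

import random
import networkx as nx

# Minimal sketch: uncertain cross-network relationships with existence probabilities.
edges = [("A", "B", 0.9), ("B", "C", 0.6), ("A", "C", 0.3)]   # (u, v, P[edge exists])

def sample_graph():
    """Draw one possible world by keeping each edge with its probability."""
    g = nx.Graph()
    g.add_nodes_from({u for e in edges for u in e[:2]})
    for u, v, p in edges:
        if random.random() < p:
            g.add_edge(u, v)
    return g

trials = 10000
hits = sum(1 for _ in range(trials) if nx.has_path(sample_graph(), "A", "C"))
print(hits / trials)   # estimated probability that A can reach C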

NS CTA IPP v1.4 5-17 March 17, 2010


In the first year, we will focus on the following:
Vocabulary documentation and standardization for the CCRIs: Many of the critical terms in the
ARC descriptions are used inconsistently between the different communities involved, and have
many possible definitions (for example, the thesis of SCNARC researcher Golbeck has an
eleven-page chapter on the definitions and properties of the term "trust" in the social networking
area alone). The start to harmonizing our vocabularies is not just to recognize the many ways the
terms are used, but to create operational definitions that can be tested and modified as we learn
to better understand the dynamics of networks. As the CCRIs have the most critical need for
early harmonization, year 1 efforts will focus on identifying the key terms and relationships
needed for smooth functioning of the TRUST and EDIN CCRIs. This will be achieved through a
coordination between E1 and T1 projects.

Figure 2: Harmonization of vocabulary and lightweight ontology development

"Lightweight" vocabulary formalization: "Lightweight" is a term of the art in ontology
development, relating to the development of the core taxonomy of the class structure for a
domain, and the identification of key properties. The languages used for such ontologies include
the standards SKOS, RDF(S) [Rdfs04] and the new OWL 2 RL [Owl09]. This will provide a
standardized vocabulary and will indicate where further formalization (with more expressive
languages such as OWL 2 Full and RIF) will be needed in the later years. This will also provide
a "skeletal ontology" that can be used in text extraction and data mining activities in the INARC
and SCNARC. One component of this vocabulary will address similar metrics across networks
that may have different names or syntax but have substantially the same semantics.
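As a hedged illustration of what such a lightweight encoding might look like, the Python sketch
below uses the rdflib library to state a handful of RDF(S) classes and one cross-genre property;
the namespace URI, class names, and property are hypothetical placeholders rather than the
ontology this task will deliver.

from rdflib import Graph, Namespace
from rdflib.namespace import RDF, RDFS

# Minimal sketch of a "lightweight" RDF(S) vocabulary for composite networks.
NSC = Namespace("http://example.org/nscta/composite#")   # hypothetical namespace
g = Graph()
g.bind("nsc", NSC)

# Core taxonomy: nodes, edges, and flows, specialized per network genre.
for cls in ("Node", "Edge", "Flow", "SCNNode", "INNode", "CNNode"):
    g.add((NSC[cls], RDF.type, RDFS.Class))
for sub in ("SCNNode", "INNode", "CNNode"):
    g.add((NSC[sub], RDFS.subClassOf, NSC.Node))

# A cross-genre mapping property, e.g., "a human uses a radio as a device".
g.add((NSC.mappedTo, RDF.type, RDF.Property))
g.add((NSC.mappedTo, RDFS.domain, NSC.SCNNode))
g.add((NSC.mappedTo, RDFS.range, NSC.CNNode))

print(g.serialize(format="turtle"))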



Tool/language requirement specification: There are a number of tools for ontology development
that range from research developments (e.g., SWOOP) to open-source tools (e.g., Protégé) and
commercial off-the-shelf products (e.g., TopBraid). For the NS ARC, we will need to use these
tools where appropriate, but it is not unlikely that Army needs may mandate specializations or
extensions of current technology³. This is particularly true for the needs of the CMNs defined
above, as there must be a way for tool users to specify dependencies between network components
and types, not just to specify that such dependencies exist. A requirements process, to determine
the needs of the ARC beyond the first year, will be an IPP deliverable.

³ For example, the Protégé tool was developed for the needs of the bioinformatics community;
TopBraid is based on SWOOP, which was developed under funding from the intelligence
community but is primarily aimed at industrial enterprise needs. In each case, specific
functionalities were added for these "primary customer" groups.
Validation Approach
Our approach is to develop a mathematically sound, web-accessible knowledge base that will not
only support the individual efforts of the ARCs and the interdisciplinary goals of the IRC and
Army, but also will provide a common knowledge base for the use of the network-science
community at large. The Semantic Web standards can be leveraged for creating an initial starting
place for such a harmonized ontology. The use of machine-oriented knowledge representation
standards will support the evolution of socio-cognitive simulations.
Summary of Military Relevance
As described in section 5.5.2 above, which discusses providing a palette of options to a
commander needing to disrupt a terrorist network, the ability to model effects that cross the
boundaries of current network types is absolutely crucial. The modeling of composite networks
is also critical in defining protective technologies for our networks as dynamic change
occurs on the battlefield (for example, if we must move a communications link, when is the best
time to do so with respect to some other operation that the communications officer may not be
aware of), in optimizing our network resources when we have uncertain information, or in
allowing a warfighter to make best decisions by being better able to exploit the techniques being
developed in the trust CCRI. Composite models are crucial for composite effects which can, as
these examples show, be used to minimize cost, effort, or risk while optimizing effect.
Research Products
This task will produce an ontology for describing the relationships between communication,
social, and information networks and the possible dynamic changes and effects therein.

5.5.7 Task E1.2: Shared metrics for Composite Military Network Analysis (W.
Leland, P. Basu, BBN (IRC); A. Bar-Noy, CUNY (CNARC); J. Hendler, RPI (IRC,
SCNARC); A. Singh, UC Santa Barbara (INARC))
Task Overview
This task is responsible for the identification of cross-cutting metrics for composite networks in
general and composite military networks in particular.
Task Motivation
Clearly, each of the separate Centers will have to develop core network metrics by which they
can meaningfully measure the properties of the dynamics of the specific kinds of networks that
fall in their specialized areas. While a few of these metrics may be common to the study of all
networks, most of these metrics are specific to the particular type of network, be it CN, IN, or
SCN. For example, metrics measuring the structural properties of a network such as degree
centrality, betweenness centrality, giant component size, clustering coefficient, network
resilience are common to all types of networks that can be represented as graphs [Newman03].
However, these properties don't easily translate into useful metrics at the level of military
function or need. To address these, each network type must also have specific metrics. For
example, CNARC will have metrics for measuring the core performance of communication
networks such as throughput capacity, latency, latency jitter, fairness, energy efficiency etc.
INARC will have metrics for measuring the efficacy of the information carried within the
network; e.g., the value of a certain information flow in a mission, which will in turn depend
upon several CNARC metrics. SCNARC will have metrics to measure the contribution of social
and organizational structures toward mission effectiveness.
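For concreteness, the Python sketch below computes two of the genre-independent structural
metrics named above (average clustering coefficient and giant-component size) on a toy random
topology; the graph generator and its parameters are illustrative assumptions only.

import networkx as nx

# Minimal sketch: structural metrics that apply to any network representable as a graph.
G = nx.erdos_renyi_graph(n=50, p=0.06, seed=1)     # toy random topology
giant = max(nx.connected_components(G), key=len)   # largest connected component
print("average clustering:", nx.average_clustering(G))
print("giant component size:", len(giant))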
For composite military networks, we will need to create methodologies that can combine these or
develop new metrics that can be used across network types. For example, the metric of Quality
of Information, originally proposed as a CNARC focus, will ultimately involve all network
genres.
Key Research Questions
- Do metrics in individual networks have analogues in composite military networks?
- How can a shared metric be defined as a function of basic SCN, IN, and CN
  attributes/metrics, constraints, and enablers?
Initial Hypothesis
- Basic network metrics such as "connectivity" and "reachability" have an analog in
  composite networks, and can be expressed in terms of the corresponding individual
  network metrics (such as connectivity in a social or communication network) and the
  mapping function between the multiple genres of networks.
- QoI can be expressed as a composition of individual network metrics and the mapping
  function between multiple networks.
Prior Work
Not much work has been done on defining metrics for composite networks. However, there has
been work done in decision theory literature for characterizing conditional relationships and
dependencies (both deterministic and stochastic) between various variables in a system – these
are known as influence diagrams [Howard84]. While this approach makes the model more
complex, it does provides significant expressive power to describe real networks and define
shared metrics on them (for capturing the influence of multiple individual metrics on the shared
metric). This however, has not been applied in the networks context before.
Technical Approach
To address these issues, we will define some theoretical metrics for CMN structures, to answer
questions such as:
- What does reachability mean for the network shown in Figure 1? Reachability between a
  social network node A and an information network node B could be defined as follows:
  A can reach B when A has an active mapping to a CN node C, and a valid copy of B either
  exists on a CN node D that is reachable from C, or is derivable from other nodes in the IN
  that are reachable from C. (A minimal sketch of this notion appears after this list.)
- When is a composite network connected? When does it get disconnected?
- What is the meaning of a sub-network in this context?
- How do we measure the rate of evolution in a composite network? This metric should
  measure the dynamics of mapping between nodes of individual networks and the rate of
  structural change at both short and long time scales.
- What is the minimum time to broadcast a piece of information from one user to a set of
  users using structural knowledge of the underlying SN, CN, and IN?
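As a hedged illustration of the reachability notion in the first item above, the Python sketch
below (using networkx) checks whether a social-network user can reach a valid copy of an
information object over the communication network; the topology, the SCN-to-CN mapping, and
the object placement are illustrative assumptions.

import networkx as nx

# Minimal sketch of composite reachability: user -> device -> path to a copy of the object.
cn = nx.Graph()                                   # communication network
cn.add_edges_from([("r1", "r2"), ("r2", "r3")])   # radios/routers

scn_to_cn = {"A": "r1"}                           # user A carries radio r1
in_copies = {"B": {"r3"}}                         # a valid copy of object B is stored at r3

def reachable(user, info_object):
    """True if the user's device can reach some CN node holding a valid copy."""
    device = scn_to_cn.get(user)
    if device is None:
        return False
    return any(nx.has_path(cn, device, host) for host in in_copies.get(info_object, ()))

print(reachable("A", "B"))   # True for this toy example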
Formal definitions of such properties and metrics will be essential for developing the science of
dynamic composite networks, and this will have to take into account more general constraints
and enablers than the ones mentioned above.
A very important aspect of our work will focus on working out how these "theoretical" metrics
can be the basis for more practical military needs. In particular, we will define mission
effectiveness as a shared metric for measuring success of a Composite Military Network that has
formed over time or which was assembled for a particular mission. Our initial mission
effectiveness metrics will be based on ARL and Army expertise and our experience with
warfighter needs in DoD programs such as FCS, e.g., least amount of resources to accomplish a
mission; degree of understanding of a mission environment, etc. Effectiveness metrics can be
either qualitative or quantitative (examples above), while many subsidiary network measures
(such as the many aspects of network performance and capacity) are typically quantitative.
An unambiguous quantitative definition of this overarching metric requires cross-cutting
research across multiple networks. The success of a military mission (say, capturing a target)
could depend on several factors, such as:
1. The geographic distribution of various types of decision-makers and soldiers relative to
the target
2. The mission plan, which is hierarchically decomposed into subtasks that could have
dependencies among themselves; this includes information requests and feedback⁴
3. The type, number, and placement of various information sources that gather and process
information about the target (cameras, social databases, etc.) to aid the soldiers
4. The capabilities, trustworthiness, and robustness of the underlying communication
network
Each of these factors can be further decomposed into simpler elements, and then metrics can be
assigned to them.
We propose to conduct research about the relative importance of each factor with respect to the
overall mission. High-level "mission" metrics will be decomposed into lower-level "network"
metrics through collaborative research across the Alliance. We will also systematically determine
the explicit or implicit dependency of one metric on another.

⁴ With this structure of subtasks being pushed down from higher- to lower-echelon commanders,
time is of the essence in many cases. To assist, it would be prudent to build into this network a
means of preventing, or at least identifying, duplicated information gathering: if a commander is
attempting to gather specific information and, without his/her knowledge, that information is
already available on the network, the network should find it and provide the data back to the
commander. This would prevent duplicated effort and give the commander more time to complete
other tasks.
In the first year, we will start by defining metrics that measure the impact of perturbation inside
one network on the others, like an "impulse response", and then specify higher-level metrics in
terms of these impulse response functions. In the second year, this approach will be augmented
by use of the theory of marginal utilities as applied to this problem for capturing sensitivity of
higher-level metrics to changes in lower-level ones.
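One way such an impulse-response-style metric could be operationalized is sketched below in
Python: a single CN node is removed (the impulse) and the change in a simple higher-level
delivery metric is recorded as the response; the topology, mappings, and object placement are
illustrative assumptions rather than a proposed definition.

import networkx as nx

# Minimal sketch: perturb the CN and measure the change in a delivery-level metric.
cn = nx.Graph()
cn.add_edges_from([("r1", "r2"), ("r2", "r3"), ("r1", "r4"), ("r4", "r3")])
users = {"A": "r1", "B": "r4"}            # SCN-to-CN mapping
objects = {"map": "r3", "orders": "r2"}   # IN-object placement on CN nodes

def delivery_fraction(g):
    """Fraction of (user, object) pairs for which a path to the object's host exists."""
    pairs = [(u, o) for u in users for o in objects]
    ok = sum(1 for u, o in pairs
             if users[u] in g and objects[o] in g
             and nx.has_path(g, users[u], objects[o]))
    return ok / len(pairs)

baseline = delivery_fraction(cn)
perturbed = cn.copy()
perturbed.remove_node("r2")               # the "impulse": losing one relay
print(baseline, delivery_fraction(perturbed), baseline - delivery_fraction(perturbed))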
In particular, we will focus on the following:
- Mapping between similar metrics across the three NS CTA network genres: In addition to
  the harmonized metrics vocabulary to be developed in E1.1, further research analysis will
  articulate and model mappings between metrics in specific network genres and their
  impact on related metrics in other genres. This research entails quantitative understanding
  of the dependencies and their sensitivity analysis.
- Components of the Quality of Information metrics: Working across all Centers, we will
  develop and quantify an initial influence diagram of metrics and the associated derivation
  of composite QoI metrics for the composite dynamic network.
Validation Approach
In the first year, we will formalize and evaluate the initial mission effectiveness metrics based on
the analysis of a specific exemplar Army scenario, to be developed in conjunction with ARL and
other Army subject matter experts.
Summary of Military Relevance
The driver for this research is its relevance to enhancing warfighter performance and the
precision and effectiveness of military missions. Because Army missions inherently entail close
interactions between multiple composite networks, both friendly (including both inter-service
and coalition allies) and adversarial, across all network genres, we need to develop a shared set
of network vocabulary, ontology, modeling and metrics for the military value of individual
networks and of the composite network of networks to be able to develop a fundamental
understanding of the interactions between Army communication, information, and social
networks. The metrics will embody both combinations of genre-specific metrics and
interdisciplinary metrics that address the combined effectiveness of the composite networks,
their dynamic behavior, and their evolution over time (under the influence of these metrics), which
will result in a better understanding of various Army mission-specific metrics.
Research Products
This research will produce a list of cross-cutting metrics for measuring the properties of
composite military networks.

5.5.8 Linkages with Other Projects


IPP Tasks                  Linkage
E1.1, E1.2 ↔ R1.3          Develop new methods of modeling composite networks by leveraging
                           results from category theory
E1.1, E1.2 ↔ T1            Borrow from specific trust metrics and models to enrich composite
                           network ontology and metrics. Also influence composite network
                           modeling aspects of Trust
E1 ↔ C1                    Models of QoI and OICC
E1.1, E1.2 ↔ I2            Develop means for the representation of network relations in RDF
                           using the data modeling techniques defined by the INARC task
E1.1 ↔ S2                  Models for adversarial social network relationships and their evolution
E1.2 ↔ R1.2                Develop metrics based on economic utility measures
E1.1, E1.2 ↔ E3, E4        Borrow network formation rules developed under mobility modeling
                           and dynamic community modeling
E1.1, E1.2 ↔ E2, E3, E4    Use network relationships and metrics for determining impact of
                           network dynamics on spatio-temporal evolution

5.5.9 Collaborations and Staff Rotations


This project needs collaboration with researchers from all four centers and thus has the potential
to forge bonds between all the groups. The focus in the first year will be on the needs of the
EDIN CCRI and this will assure that the effects are immediately applicable to the entire NS
CTA. The eventual development of a first machine-readable ontology in E1.1 will be useful for
the INARC and SCNARC text-mining related projects.
Each Center will develop metrics pertaining to their type of network; hence E1.2 will have
participants from each center to reflect the current thought processes of each center while
addressing the charge of developing cross-cutting metrics beyond the individual network metrics.
E1.2 will collaborate with R1.2 for gaining insights into utility metrics defined in market-based
approaches developed by the IRC.
RPI will provide a post-doctoral research assistant, funded through the IRC, with a primary
responsibility for the knowledge representation aspects of this project. This post-doc will spend
at least half their time at the NS-CTA facility.

5.5.10 Relation to DoD and Industry Research


The Web Ontology Language OWL grew out of the DARPA Agent Markup Language (DAML)
which was developed specifically to meet US military needs in integrating information systems
and modeling military domains for extended software and technology development. In 2002, the
World Wide Web Consortium, an organization of over 400 of the major Web and IT companies
and organizations in the world, formed the "Web Ontology Working Group" (WOWG) to
develop an industrial standard based on the DAML language developed under DARPA support.
(The WOWG was chaired by IRC researcher James Hendler). The OWL language that emerged
became an international standard in 2004. Other languages that have evolved to further
Semantic Web standards, especially RDFa and SPARQL, are now being supported by many
large companies including Google, Microsoft, Oracle, Yahoo and many others. However, the
level of representation needed for this project is likely to exceed those currently supported in
these standards, and we will work with industry and standards organizations to explore how the
new technologies we develop can be used by these companies.

5.5.11 Project Research Milestones

Research Milestones

Due  Task  Description
Q2   E1.1  Collect "glossary" of CMN terminology from ARC documents, IPP, and long-term
           planning discussions. (RPI, BBN)
Q2   E1.2  Report detailing initial list of basic shared metrics for composite networks.
           (BBN, CUNY)
Q3   E1.1  Report/paper on modeling network formation rules while capturing temporal models
           and probabilistic relationships. (BBN, UCSB, RPI)
Q3   E1.2  Report detailing advanced metrics for composite networks (time-varying
           dependencies, etc.). (BBN, CUNY, UCSB)
Q4   E1.1  Requirements document for representational needs in CMN modeling and for
           Army-specific tool development needs; create a wiki with both human- and
           machine-readable markup for collecting and tracking CMN terminology use.
           (RPI, BBN)
Q4   E1.2  Report detailing military effectiveness metric and identifying key technical
           challenges for NS CTA in this area. (BBN, CUNY)

5.5.12 Project Budget by Organization

Budget By Organization

Organization Government Funding ($) Cost Share ($)


BBN (IRC) 147,286
CUNY (CNARC) 22,000
RPI (IRC) 132,854
RPI (SCNARC) 15,250 2,784
UCSB (INARC) 45,323
TOTAL 362,713 2,784



5.6 Project E2: Mathematical Modeling of Composite Networks

Project Leads: J. J. Garcia-Luna-Aceves, UC Santa Cruz (CNARC)


Email: jj@soe.ucsc.edu, Phone: 831-459-4153
Project Lead: Prithwish Basu, BBN (IRC)
Email: pbasu@bbn.com, Phone: 617-873-7742

Primary Research Staff
C. Aggarwal, IBM (INARC)
A. Bar-Noy, CUNY (CNARC)
P. Basu, BBN (IRC)
C. Faloutsos, CMU (INARC)
M. Faloutsos, UC Riverside (IRC)
J.J. Garcia-Luna-Aceves, UCSC (CNARC)
R. Ramanathan, BBN (CNARC)
H. Sadjadpour, UCSC (CNARC)
A. Singh, UCSB (INARC)
X. Yan, UCSB (INARC)
D. Towsley, UMass (IRC)
R. D'Souza, UC Davis (CNARC)

Collaborators
I. Castineyra, BBN (IRC)
A. Swami, ARL

5.6.1 Project Overview


This project will study the development of mathematical models for composite networks
composed of communication networks, information networks, and social networks.
Any aggregation of resources, information, or individuals in the context of a common purpose or
description of a problem constitutes a network. Arguably, the first formal mathematical study of
a network dates back to Leonhard Euler's work on the seven bridges of Königsberg in 1735.
However, old as graph theory is, it and other mathematical disciplines have been
applied mostly to the characterization of infrastructure networks used for communication,
transmission, or transportation. The characterization of information networks and social
networks is much more recent, and more recent still is the realization that all such networks
interact with one another [net-science].



There are many contributions on the structure of social networks and information networks, but
very few on their capacity. Furthermore, many studies on the performance of social and
information networks assume, implicitly or explicitly, that the underlying physical network
structure is the same as that of the social or information network under consideration.
In reality, people do not simply use or participate in a communication network, a social network,
or an information network. They use, participate in, and modify composite networks consisting of
many different structures that evolve and interact with one another in space and time. The
structure of a tactical communication network is defined by its dynamic wireless connectivity
and is very different from the structure of the social networks defined by common interests and
social links defined among the users of the communication network, or the information networks
defined by the relationships among information objects. Accordingly, any one component of a
composite network is a spatiotemporal structure that needs to be examined in the context of the
other two types of networks in the composite.

5.6.2 Project Motivation


The motivation for this project is simple: no modeling framework exists today that can be
used to characterize the structure and performance of composite networks. The implications of
establishing such a framework for the understanding of composite networks are far reaching. First,
an understanding of the true fundamental limits of tactical networks will evolve that takes into
account all the dynamics of such networks, including node mobility, physical-layer impact on
communication, social preferences, and correlation of information. Second, a new perspective on
the design of architectures, protocols, tools, and applications will evolve that considers how users
and applications adapt to the underlying infrastructures and how the underlying infrastructures
adapt their operation according to the way in which users and applications share and create
information. Third, a modeling framework for composite networks will yield results that reflect
the performance of realistic tactical networks more accurately.

5.6.3 Key Research Questions


The basic research question that we seek to answer is: What are the best mathematical structures
for representing relationships between components of a composite network, especially ones that
capture time-dependence? In approaching answers to this question, we will also address several
other questions, including:
What are the best mathematical structures for representing spatiotemporal dependencies
of composite networks?
Are multiple models needed to characterize all the dynamics of a composite network?
Is there a relationship between the interactions of the three components of a composite
network and its structure?
What is the impact of the evolution of composite networks over dynamic scales and
granularities on the protocols and tools used in such networks?



5.6.4 Initial Hypothesis
Our main research hypothesis is that the new models of composite networks we will develop will
provide a better approximation of the structure and operation of real tactical networks than the
current models that focus exclusively on communication, social, or information networks.

5.6.5 Technical Approach


Given that network science applied to composite networks is in a nascent stage, we plan to
investigate several promising approaches in parallel. Over the course of our research during this
first year, we will compare these approaches and establish the advantages and disadvantages of
each approach relative to the others.
The research activities of this project are intimately linked to other EDIN projects and other
network-specific projects. For example, aspects of mobility models studied under E3 are
intimately connected to the development of a mathematical model of dynamic networks.
Similarly, group dynamics studied in E4 are related to the spatiotemporal nature of the models
we seek to develop.
This project is organized into three tasks. Each task addresses the same basic question and starts
with the same basic hypothesis. The main differences among the tasks stem from the graph models
chosen to begin the research. Task E2.1 seeks to develop unifying models of composite
networks. Task E2.2 focuses on modeling time and spatiotemporal properties of composite
graphs. Task E2.3 seeks to take advantage of the interplay between random graphs and
optimization models, both of which have been used in the past to model communication
networks.

5.6.6 Task E2.1: Unifying Graph Representations of Composite Networks (P. Basu,
I. Castineyra, BBN (IRC); C. Aggarwal, IBM (INARC); A. Bar-Noy, CUNY
(CNARC); A. Singh, X. Yan, UCSB (INARC); R. D’Souza, UC Davis (CNARC); A.
Swami, ARL)
Task Overview
This task aims at developing graph theoretic representations of composite networks and studying
their basic properties. It also aims at studying summary representations of massive networks,
which are useful for fast extraction of useful information.
Task Motivation
Composite networks consist of social, information, and communication networks, each of which
consists of constituent entities (nodes, edges, flows, formation rules) that have been summarized
earlier. Since graphs are the most natural representation for each of the network genres, it is
natural to think of how individual graph representations for each network can be "composed" to
represent a composite network. Metrics can then be defined on such "composite graphs", and this
graph representation can then be used effectively for various purposes, such as the following:
Distributed search for a piece of information
Determination of critical bottlenecks in the joint CN+IN+SN space
Prediction of evolution of network attributes



Key Research Questions
How should dynamic composite networks be represented in terms of graph theoretic
models?
How do we represent dynamic composite graphs succinctly but with sufficient richness?
How to represent these networks at multiple levels of granularity?
How to model time-varying dependencies (from seconds to years) and interaction
between devices, information objects, and human beings? Can we model interactions
between graphs representing SCN, IN, and CN using graph theoretic operations?
How do we model probabilistic or fuzzy information in composite graphs?
Can we extend traditional graph theory metrics naturally to these composite networks?
Initial Hypotheses
Efficient graph algorithms can be extended to function efficiently (in a theoretical sense,
such as polynomial time) on composite graphs, but there exist "rich" composite graphs in
which composition causes algorithmic complexity to explode.
Prior Work
The join G1 + G2 of graphs G1, G2 with disjoint point sets V1, V2 and edge sets E1, E2 is defined as
the graph union G1 ∪ G2 together with all the edges joining V1 and V2 [Harary94]. This essentially
retains all the vertices and internal edges of the individual networks but adds "cross-edges"
between pairs of vertices. Thus "graph join" can potentially model composite structure.
Technical Approach
Composite networks can be represented by aggregating the underlying graph structures for the
individual constituent networks. For instance, if G_CN(V_CN, E_CN) represents the underlying
communications network, G_IN(V_IN, E_IN) represents the underlying information network, and
G_SN(V_SN, E_SN) represents the underlying social network, then we can define and model the
composite network using the aforementioned graph join operation as G_Int(V_Int, E_Int), where

  V_Int = V(G_CN + G_IN + G_SN) = V_CN ∪ V_IN ∪ V_SN, and
  E_Int = E(G_CN + G_IN + G_SN) = E_CN ∪ E_IN ∪ E_SN ∪ E*_CN-IN ∪ E*_IN-SN ∪ E*_SN-CN,

where E*_CN-IN ⊆ V_CN × V_IN, and so on. This is illustrated in Figure 1.
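As a concrete illustration of this composite-graph representation, the following minimal sketch builds a small composite graph with NetworkX (assuming NetworkX is available). The node names, edge kinds ("comm", "knows", "related-to"), and cross-edge labels ("uses", "stores", "fused-at") are hypothetical placeholders chosen for the example, not the terminology that Task E1.1 will ultimately define.

```python
import networkx as nx

# Constituent networks (toy data; node names and edge kinds are hypothetical).
G_cn = nx.Graph()                                        # communication network
G_cn.add_edges_from([("radio1", "radio2"), ("radio2", "radio3")], kind="comm")

G_sn = nx.Graph()                                        # social network
G_sn.add_edge("alice", "bob", kind="knows", trust=0.8)   # time-varying attribute

G_in = nx.Graph()                                        # information network
G_in.add_edge("reportA", "reportB", kind="related-to")

# Composite graph: union of the three vertex and edge sets ...
G = nx.compose_all([G_cn, G_sn, G_in])

# ... plus cross-edges (E*_SN-CN, E*_CN-IN, etc.) carrying their own attributes.
G.add_edge("alice", "radio1", kind="uses")                         # SN-CN mapping
G.add_edge("radio3", "reportA", kind="stores")                     # CN-IN mapping
G.add_edge("radio2", "reportB", kind="fused-at", last_fusion_time=1042)

# Node, edge, and cross-edge attributes can be updated as events occur.
G["alice"]["bob"]["trust"] = 0.6
print(G.number_of_nodes(), "nodes,", G.number_of_edges(), "edges in the composite graph")
```

Shared metrics of the kind defined in Task E1.2 could then be evaluated directly on such a graph, for instance by filtering edges on their kind attribute.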
Each node, edge, and flow in the above graph can have attributes that can vary over time. For
example, the "trusts" relationship in the SN part of the network in Figure 1 could have a weight
associated with it reflecting the degree of trustworthiness – this attribute could vary over time. In
addition to the above attributes, the "cross-edges" also possess attributes; e.g., the edge labeled
"fused-at" can have a time-varying attribute, namely "last fusion time", which is updated
whenever a new fusion event occurs.
The edges in a composite network graph also need to model complex relationships between
nodes and may be derivable by executing rules – for example, the existence of a certain mapping
edge between a social network node A and an information network node (object) B may be
contingent upon a communication network node C being located near A with high probability
and storing object B. Such complex, yet realistic constraints add richness
to the resultant graph structure, thus necessitating the development of new algorithms for
analysis. If node and edge attributes have probabilistic measures associated with them, the
existence of a relationship becomes probabilistic, and that adds another dimension to the
modeling problem.
These mappings may be highly dynamic in a spatio-temporal sense – an SN node may always be
mapped to the nearest available CN node as the user moves around, so the SN → CN mapping may
be constantly changing. Similarly, an IN node may be mapped to a particular CN node and it
may be replicated based on demands from the social network, so the IN → CN
mapping may change rapidly.
It is not surprising that traditional graph properties (e.g., connectivity, reachability, max-
flow) and the algorithms that compute them (e.g., depth-first search, Dijkstra's shortest path, or Ford-
Fulkerson) do not directly apply in this setting. For example, a basic question that needs to be
answered is what reachability in a composite network means, and how it can be defined in
terms of reachability in and connectivity of the constituent networks and the mapping edges
between their nodes. The notion of reachability is related to the problem of searching for nodes in
such networks. More generally, we will evaluate how the shared network metrics defined in Task
E1.2 can be mapped to this representation and will develop efficient algorithms to operate on this
new class of graphs. We expect some composite networks not to be amenable to efficient
algorithm design – we will explore such boundaries carefully.
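To make the reachability question concrete, one possible (hypothetical) reading is sketched below: a social actor can "reach" an information object if a path exists that crosses an SN-CN mapping edge, traverses communication links, and crosses a CN-IN "stores" edge. The edge kinds, node names, and the particular reachability rule are illustrative assumptions, not settled definitions for the task.

```python
import networkx as nx

# Toy composite graph; edge "kind" labels are hypothetical.
G = nx.Graph()
G.add_edge("alice", "radio1", kind="uses")        # SN-CN mapping
G.add_edge("radio1", "radio2", kind="comm")       # CN links
G.add_edge("radio2", "radio3", kind="comm")
G.add_edge("radio3", "reportA", kind="stores")    # CN-IN mapping
G.add_edge("alice", "bob", kind="knows")          # SN link (not usable for access here)

def composite_reachable(G, actor, info_object, allowed=("uses", "comm", "stores")):
    """One candidate notion of composite reachability: restrict the search to
    edges whose kind is allowed for actor-to-information access."""
    H = nx.Graph([(u, v) for u, v, d in G.edges(data=True) if d["kind"] in allowed])
    return H.has_node(actor) and H.has_node(info_object) and nx.has_path(H, actor, info_object)

print(composite_reachable(G, "alice", "reportA"))   # True: uses + comm + stores path
print(composite_reachable(G, "bob", "reportA"))     # False: only a social link to alice
```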
We will also study summary representations of massive networks, which can be used for
extraction of useful information from the underlying network. We will design data structures that
are extremely compact, but can retain important information about the underlying network for
querying and mining purposes. We will investigate the different kinds of queries that can be
performed in a network using this summary representation. The need for this arises in different
kinds of networks, though the query may be different. For example, in a communication
network, we may wish to determine the connectivity between a given pair of nodes, whereas in a
social network, we may wish to characterize the frequency behavior of the different edges. Both
of these queries may be resolved with the use of a summary representation of the underlying
network. The summary representation may need to be general enough, so as to accommodate
different kinds of queries that are relevant to the composite network scenario. In the first year,
we will study summary representations specific to particular kinds of networks. The longer-term
plan is to generalize these techniques to composite networks, by increasing the spectrum of
queries that a particular kind of representation can resolve. The graph-based and tensor-based
mathematical models mentioned subsequently in this section may be relevant to designing
effective summary structures that are useful for modeling network dynamics and evolution.
A particular challenge arises in the context of a dynamic information network in which the links
may evolve over time. For such cases, it is useful to assume a stream model for the underlying
information network, so that the underlying graph is also treated as arriving as a stream.
Another challenge may arise from the fact that the domain size of the
underlying graph may be very large. As a result, it may not even be possible to represent the
underlying network explicitly. Our summarization techniques will be designed to work with this
dynamic, evolving stream scenario.
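As one illustration of the kind of compact summary discussed above, the sketch below maintains a fixed-size reservoir sample of edges from an edge stream and answers approximate connectivity queries from the sample alone. This is a minimal stand-in for the summarization structures the task will actually design; the reservoir-sampling choice and the toy stream are our assumptions, not the project's method.

```python
import random
import networkx as nx

class EdgeReservoirSummary:
    """Fixed-size summary of a (possibly unbounded) edge stream that
    supports approximate connectivity queries over the sampled subgraph."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.rng = random.Random(seed)
        self.edges = []      # reservoir of sampled edges
        self.seen = 0        # number of stream edges observed so far

    def observe(self, u, v):
        self.seen += 1
        if len(self.edges) < self.capacity:
            self.edges.append((u, v))
        else:
            # classic reservoir sampling: each edge survives with prob capacity/seen
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.edges[j] = (u, v)

    def maybe_connected(self, u, v):
        """Approximate query on the sampled subgraph only: a True answer is
        always correct, while a False answer may be a false negative."""
        H = nx.Graph(self.edges)
        return H.has_node(u) and H.has_node(v) and nx.has_path(H, u, v)

# Usage on a toy stream (a 50-node ring observed repeatedly):
summary = EdgeReservoirSummary(capacity=100)
for t in range(10000):
    summary.observe(f"n{t % 50}", f"n{(t + 1) % 50}")
print(summary.maybe_connected("n0", "n25"))
```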
Linkages
IRC project R1 is focused on extracting knowledge about very large networks by developing
efficient graph sampling techniques. This has interesting linkages with the summarization idea in
this task – we plan to explore this further during the research.



5.6.7 Task E2.2: Network Models that Capture Time (J.J. Garcia-Luna-Aceves, H.
Sadjadpour, UCSC (CNARC); C. Faloutsos, CMU (INARC); P. Basu, BBN (IRC);
R. Ramanathan, BBN (CNARC); A. Singh, UCSB (INARC); A. Swami, ARL)
Task Overview
In this task we will investigate the use of graph models aimed at representing spatiotemporal
relationships. In particular, we will consider Kronecker graphs, tensors, and a new model that we
call temporal graphlets.
Key Research Questions
What mathematical techniques can be used to study temporal graphs, whose structure
changes (rapidly or slowly) over time, and their properties?
How to use tensors to study time-varying graphs?
Initial Hypotheses
Recursive models, like Kronecker graphs and variations (e.g., RTG, a recursive realistic graph
generator based on random typing), will lead to realistic-looking graphs in multiple settings:
directed/undirected, time-evolving/static, bipartite/unipartite.
Tensor models will achieve good compression of real graphs.
Realistic composite networks can be modeled well using Simplicial Complexes and
hypergraphs.
Technical Approach and Prior Work
We will study methods using hypergraphs (or set systems) and Kronecker graphs for
representing networks in which nodes and edges have multiple attributes (e.g., desire for different
types of data, distinct edge capacities, etc.) or relate one node to multiple nodes. We will model
the separation of different types of traffic into distinct flows and, potentially, the interactions amongst
the distinct flows. The combination of hypergraphs and Kronecker graphs lends itself to the
modeling of temporal evolution patterns and leads to tractable analysis and rigorous proofs.
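To illustrate the Kronecker construction referenced above, the sketch below grows a stochastic Kronecker graph by repeatedly taking the Kronecker product of a small initiator probability matrix and then sampling edges independently. The 2x2 initiator values are arbitrary placeholders rather than parameters fitted to any real dataset.

```python
import numpy as np

def stochastic_kronecker(initiator, k, seed=0):
    """Sample an adjacency matrix whose edge probabilities are the k-fold
    Kronecker power of a small initiator matrix (a standard recursive generator)."""
    rng = np.random.default_rng(seed)
    P = np.array(initiator, dtype=float)
    for _ in range(k - 1):
        P = np.kron(P, np.array(initiator, dtype=float))   # probability matrix grows to n^k x n^k
    A = (rng.random(P.shape) < P).astype(int)               # sample each edge independently
    return A

# Hypothetical 2x2 initiator; k = 8 gives a 256-node graph.
initiator = [[0.9, 0.5],
             [0.5, 0.2]]
A = stochastic_kronecker(initiator, k=8)
print(A.shape, A.sum(), "edges sampled")
```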
We will also investigate use of modeling frameworks that are more general than basic graph
theory since graphs only capture relations between pairs of entities and not groups of entities.
We will borrow mathematical tools from the rich theory of Simplicial Complexes to
appropriately model the structure of individual as well as composite networks and study their
properties.
We will investigate suitable generative models for time-evolving graphs. We recently published a
survey of graph models [Chakrabarti and Faloutsos, ACM CSUR '06]. The difference from other
efforts is that we are (a) collecting as many real graph datasets as possible and (b) looking for
patterns in them, such as the number of triangles [Tsourakakis'08], properties of the edge weights
and of the distribution of the sizes of connected components [McGlohon+'08], and the number of
cliques [Du+'09]. We will measure the success of graph generating models in two ways: (1)
qualitatively and (2) quantitatively. In the first, we will try to prove that the generator indeed has
the properties of real graphs (small diameter, densification, etc.). In the second, we will use the
maximum likelihood that our generative model assigns to a given real graph – the higher the likelihood, the
better the model. We expect to discover new patterns, and to show that the recursive generators
match these patterns, or to develop new graph generators with even better matches.



Another approach we will investigate is tensors, which are well suited to modeling composite
networks. For example, consider a network of humans where links show that person 'A' e-mails
person 'B', and another network on the same humans where a link means that 'A' calls 'B', and so
on. In such a case, the edge type would be the third mode of the tensor, where the first two are the
source-id and the destination-id. Tensor analysis, like Kruskal decomposition, will reveal
patterns of the form, e.g., "engineers typically talk to officers on the phone, but rarely e-mail
them". Such rules are likely to help us spot anomalies, as well as compress the data.
Tensors are suitable for time-evolving graphs, because we can treat time as the third mode. In
fact, tensors can even handle time-evolving, composite networks, because such networks can be
modeled as 4-mode tensors (source, destination, time, type-of-edge). We have studied tensors for
the temporal evolution of graphs [Sun06Beyond], low rank approximations [Tong08Colibri],
wavelets [Papadimitriou03Adaptive] and Kalman filters [Tao04Prediction, Li09DynaMMo].
We believe that tensors will achieve good compression of real graphs, because they will capture
the correlations across all the modes, the same way that SVD captures correlations across rows
and columns of a matrix. Most of the prior work is on theoretical or algorithmic aspects of
tensors, with little study of social networks. We plan to study tensors for real, time-evolving
social networks. We will use the RMSE of the reconstruction as the measure of success, and also
subjective evaluation of the resulting patterns using domain experts.
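As a minimal, self-contained illustration of the tensor view (assuming only NumPy), the sketch below builds a small 3-mode tensor indexed by (source, destination, edge type) and computes a rank-1 CP (Kruskal) approximation by alternating power iterations, reporting the reconstruction RMSE mentioned above as a success measure. The interaction counts are invented purely for the example, and real analyses would use higher ranks and real data.

```python
import numpy as np

# Toy 3-mode tensor: T[source, destination, edge_type], edge types = (email, call).
rng = np.random.default_rng(0)
T = rng.poisson(lam=2.0, size=(6, 6, 2)).astype(float)

def rank1_cp(T, n_iter=100):
    """Rank-1 CP (Kruskal) approximation via alternating power iterations."""
    a = np.ones(T.shape[0]); b = np.ones(T.shape[1]); c = np.ones(T.shape[2])
    for _ in range(n_iter):
        a = np.einsum('ijk,j,k->i', T, b, c); a /= np.linalg.norm(a)
        b = np.einsum('ijk,i,k->j', T, a, c); b /= np.linalg.norm(b)
        c = np.einsum('ijk,i,j->k', T, a, b); c /= np.linalg.norm(c)
    lam = np.einsum('ijk,i,j,k->', T, a, b, c)   # scale of the rank-1 component
    return lam, a, b, c

lam, a, b, c = rank1_cp(T)
T_hat = lam * np.einsum('i,j,k->ijk', a, b, c)
rmse = np.sqrt(np.mean((T - T_hat) ** 2))
print(f"reconstruction RMSE of the rank-1 model: {rmse:.3f}")
```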
Lastly, we will investigate a new approach to capture the structural properties of dynamic
networks. We propose to study Temporal Graphlets, which extend classical graph theory in two
dimensions – time and computation. A temporal graphlet G(t) = {V(t), E(t)} is a t-snapshot of a
graph that changes over time. We propose to develop mathematical concepts, temporal structural
properties, and computation-theoretic properties across an arbitrary sequence of such graphlets.
While traditional graph theory only considers properties in the "horizontal" (space) dimension,
we consider properties across the "vertical" (time) dimension as well. For instance, u → v is t-
reachable iff there exists a sequence of nodes and edges u = v1(1), e1(2), v2(3), ..., ek(j – 1),
vk(j), ..., v = vm(t) in the window [1, t]. Similarly, a t-cut is the removal of a set of vertices X ⊂ V(1) ∪ V(2) ∪ V(3) ... ∪ V(t) that results in some u and v losing their t-reachability property.
Finally, just as planar, regular, and other restricted graph classes exist, special/restricted temporal
graphlets are possible: e.g., a t-k-regular graph is one in which every node makes unique contact
exactly k times during its lifetime. We shall properly define useful analogs of classical
graph-theoretic concepts. For instance, one could define t-adjacency: nodes u and v are t-adjacent
if there exists an edge u → v in the window [1, t]. We shall derive new results and answer new
questions on temporal graphlets. For instance, is the t-chromatic number ≤ MAX-CLIQUE(t)? Or
what family of edge processes E(t) would increase the probability of occurrence of CLIQUE(k, T)?
Abstract models of packet transport "methods" can be used to derive results on the classes of
temporal graphlets for which a method "computes" (in this case, transports the packet). In our
preliminary work [Ramanathan07] we used the concept of Forwarding, which specifies
actions on an incoming packet, to classify methods based on replication (single copy, k-copy,
maximal copy, etc.) and knowledge (oracular, non-oracular). We expect that this line of work will
provide a complementary approach to the characterization of dynamic networks that can be used
to model mobile and dynamic networks and to study the performance of specific protocols in the
future. In future years, we will examine how temporal graphlets can be used to examine
fundamental limits of dynamic networks.
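A small sketch of the t-reachability notion defined above, under the assumption that a dynamic network is given as an ordered list of edge-set snapshots: a node v is t-reachable from u if a time-respecting sequence of edges leads from u to v within the window [1, t]. The topology below is a made-up example.

```python
# snapshots[i] is the edge set at time i + 1.
snapshots = [
    {("u", "a")},             # t = 1
    {("a", "b")},             # t = 2
    {("b", "v"), ("u", "c")}  # t = 3
]

def t_reachable(snapshots, source, target, t):
    """True if target is reachable from source via a time-respecting path
    using edges from snapshots 1..t (edges treated as undirected)."""
    reached = {source}
    for edges in snapshots[:t]:
        # within one snapshot, propagate until no new nodes are added
        changed = True
        while changed:
            changed = False
            for x, y in edges:
                if x in reached and y not in reached:
                    reached.add(y); changed = True
                if y in reached and x not in reached:
                    reached.add(x); changed = True
    return target in reached

print(t_reachable(snapshots, "u", "v", t=3))   # True: u-a (t=1), a-b (t=2), b-v (t=3)
print(t_reachable(snapshots, "u", "v", t=2))   # False: the path is not complete by t=2
```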



5.6.8 Task E2.3: Interplay of Power Laws vs. Optimization Models and Random
Graphs (J.J. Garcia-Luna-Aceves, H. Sadjadpour, UCSC (CNARC); C. Faloutsos,
CMU (INARC); P. Basu, BBN (IRC); M. Faloutsos, UC Riverside (IRC); D.
Towsley, UMass (IRC))
Task Overview
This task will study the interplay between various existing network models such as power laws,
random graphs, and constrained optimization in the representation of composite networks.
Task Motivation and Prior Work
Three graph-based approaches have been used extensively in the past to model the structure and
evolution of different types of networks, namely: random graphs, scale-free graphs, and
constrained-optimization models.
Random geometric graphs (RGGs) have been used extensively over the past 10 years for the
characterization of the structure and performance of wireless communication networks. The
advantage of random graphs, and RGGs in particular, is that they provide a simple building block
for modeling the structure and performance of wireless ad hoc networks in which the
locations of nodes and the communication links connecting them are not carefully engineered in
order to attain a given performance. The key limitation of this type of graph has been that it
represents static topologies, which is insufficient to capture the true structure of a tactical
communication network subject to mobility and other types of dynamics.
Scale-free graphs are exemplified by the well-known papers stating that at the core of complex
evolved networks such as the Internet [Faloutsos99, Barabasi99], the WWW, citation
networks, and others often lie very simple "natural" organizing principles, such as preferential
attachment. These organizing principles make scale-free networks amenable to sophisticated
mathematical analysis. However, contrary to what many recent modeling efforts on complex
networks would tend to indicate, scale-free graphs are not applicable to every type of real
network. In particular, Willinger, Alderson and Doyle [Willinger09] showed that the power-law
claims made in many papers about the structure of the Internet are an artifact of incorrect data
collection techniques, and that "engineered" systems often evolve as a result of heuristic
optimization of various tradeoffs related to underlying costs of network formation, not
necessarily simplistic rules such as preferential attachment. Similarly, a tactical wireless network
cannot possibly have a small-world topology, because nodes can only interact with a relatively
small number of neighbors compared to the total number of nodes in the theater, even when
these nodes have radios more powerful than others.
Willinger, Alderson and Doyle [Willinger09] and other authors have proposed the use of
constrained optimization models as an alternative way to capture the structure of communication
networks. Constrained optimization models have the advantage that they are expressed as a set of
equations stating the demands and constraints of the problem, and they can model the
impact of social and information networks on the resources of the underlying communication
network. However, constrained optimization formulations that capture node mobility and
channel dynamics are by necessity much more complex than those that have been suggested to
model the Internet [Willinger09] and other static networks. In addition, a constrained
optimization formulation typically assumes an underlying solution approach for information
dissemination (e.g., shortest-path routing) because of the way the constraints are expressed as
equations, and so it need not be the best tool to answer questions about the limits of dynamic
networks given any optimal set of protocols.
Key Research Questions
Can one decompose a composite network into a natural component that models the
structure of social and information networking aspects and an engineered or random
component that captures the structure of the underlying communication network?
How does the interplay of power-laws, random graphs, and constrained optimization
affect the structure of the composite network?
Initial Hypotheses
Most real-world composite networks can be naturally decomposed into interacting
natural, engineered, and random components, and their properties can be predicted as a
function of the underlying interacting components.
Technical Approach
We propose to investigate the interplay of these existing graph models in the representation of
the structure of composite networks. Several modeling questions arise from the fact that the
social networks and information networks operating over a wireless tactical network can very
well be modeled using scale-free graphs while the underlying communication network can be
modeled using random graphs and constrained optimization.
The first question pertains to the way in which the structure of a composite network should be
modeled. Can one decompose a composite network into a natural component that models the
structure of social and information networking aspects and an engineered or random component
that captures the structure of the underlying communication network? If scale-free graphs model
information and social networks operating over a random or engineered graph representing the
communication network, how should these scale-free graphs become an overlay on the graph
describing physical connectivity?
Modeling the structure of a composite network must also capture the spatio-temporal properties
of communication, information, and social links. In this regard, we believe that seeking an initial
decomposition of the modeling task is a good approach. The modeling of dynamic
communication networks is in its infancy, and the vast majority of work on fundamental limits of
communication networks has been limited to static topologies.
During the first year, we will consider a number of alternatives of increasing complexity, and
will address node mobility and channel dynamics. We will consider using
random graphs in which node-to-node distances are given by probability density functions that
depend on the random mobility of nodes [Bettstetter03]; the connectivity among nodes will be
modeled according to the protocol or physical models used in the past for static wireless
networks, and the effect of social and information structures will be modeled by power-law
distributions. An important challenge in the representation of scale-free graphs as overlays on a
random graph is the mapping of a social or information link onto a physical path in the random
graph. In the past, this step has been avoided by assuming that all flows in a network are
uniformly distributed, which is clearly not the case in a real network. The representation of
dynamic communication networks using random graphs is in itself a new problem. In the past,
RGGs have been used to model static communication networks. We will study approaches aimed
at dynamic random geometric graphs (DRGGs) in which nodes are allowed to move randomly. In
this context, the work by Diaz, Mitsche and Perez-Gimenez [Diaz08] offers some preliminary
insight. The basic intent is to view each snapshot in time of an RGG as a random variable, and
the DRGG resulting from considering the RGG at different points in time as a random process
that reflects properties over time.
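The snapshot-as-random-variable view lends itself to simple simulation. The sketch below (assuming NumPy and NetworkX) moves nodes randomly in the unit square, rebuilds the geometric graph at each time step, and records one property, whether the snapshot is connected, across snapshots. The step size, radius, and node count are illustrative choices only, not values drawn from any tactical scenario.

```python
import numpy as np
import networkx as nx

rng = np.random.default_rng(1)
n, radius, step, T = 50, 0.2, 0.05, 20
pos = rng.random((n, 2))                      # initial positions in the unit square

connected = []
for t in range(T):
    # random-walk mobility: small random displacement, clipped to the unit square
    pos = np.clip(pos + rng.uniform(-step, step, size=(n, 2)), 0.0, 1.0)
    G_t = nx.random_geometric_graph(n, radius, pos={i: tuple(p) for i, p in enumerate(pos)})
    connected.append(nx.is_connected(G_t))

# Each snapshot is one realization; the sequence is the DRGG viewed as a random process.
print(f"fraction of connected snapshots: {sum(connected) / T:.2f}")
```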
We will also study the use of constrained optimization models to address more explicit modeling
of the various classes of entities modifying these networks, including their cost and benefit
functions and their available strategies. The objective is to determine whether these graph-based
approaches can achieve greater fidelity and predictive power.
Validation Approach
We will validate our modeling approaches using data available from military deployment
scenarios. For example, the distribution of soldiers in a Brigade Combat Team and their various
social network relationships (reporting structures, friendship etc.) can be studied as a
combination of natural and engineered evolution, whereas the spatial distribution of their
communication devices may be closely modeled by random geometric graphs. We can use these
to predict properties of the composite military network and validate those against qualitative or
quantitative mission metrics. We hope to receive guidance from ARL on how to get access to
such information.

5.6.9 Linkages to Other Projects


IPP Tasks            Linkage
E2.1 ↔ E1.2          Analysis of a subset of the shared metrics (for composite networks) being defined and developed in the ontology/metrics task.
I2.1 ↔ E2.1          Representation of networks will be used to develop new ways of organization and management of information networks.
I2.1 ↔ E2.1          New kinds of causality and evolution queries will be considered.
E2.3 ↔ E3.3          Models and analysis of group behavior will be used in studying interplay of various existing network models.
E2.1 ↔ I3.1, I3.2    How to discover causal behavior from networks?
E2.2 ↔ E4.1, E4.2    How do mobility patterns affect temporal models of composite networks?
E2.1, E2.2 ↔ E3.2    Contribute concepts, data and theory on network dynamics and co-evolution for the mathematical model of time-dependent networks.

5.6.10 Relevance to US Military Visions/Impact on Network Science


This project is of a foundational nature, as it sets out to create the fundamental mathematical
underpinnings of the structure of composite networks. Therefore, it is highly relevant to US
Army operations. As mentioned earlier in this chapter, both tactical and counter-insurgency
operations involve a non-trivial interplay of dynamic social, information, and communication
networks; hence mathematical models and tools that can predict interesting properties of such
networks are of tremendous value, as they would improve the tempo of network-centric operations
and decision-making. What are the appropriate models and representations for military
communication, information, and social networks? How are the individual networks linked
together? How do each individual network and the composite network evolve? These are the key
scientific questions of network science that will be addressed in this project. These answers will
eventually lead to sound design and analysis techniques that can be used to analyze existing
military networks and to develop new networks so that military operations can be carried out
successfully.
The impact of this project on network science will be significant since, as of now, a science of
dynamically interacting composite networks simply does not exist; this project proposes a
set of fundamental mathematical approaches in that grand direction.

5.6.11 Collaborations and Staff Rotations


Each task will have monthly teleconferences to coordinate the planned research work. Each task
itself will naturally decompose into subtasks and these will have closer collaborations in the form
of joint papers and joint advising on PhD theses. There will be organized visits by the faculty and
staff of the collaborating institutions. Wherever possible, meetings will be coordinated
opportunistically with travel to conferences and meetings.

5.6.12 Relation to DoD and Industry Research


While there are several theory programs investigating large information networks, dynamic
communication networks, or social networks, there is no known project along the lines of the
proposed research in either DoD or industry.

Research Milestones

Due  Task  Description
Q2   E2.1  Generative graph models for weighted, time-evolving graphs (BBN, CUNY, CMU)
Q2   E2.1  Design information network stream summarization methods in an evolving environment (IBM, UCSB)
Q3   E2.1  Analysis of queries on dynamic integrated networks (UCSB)
Q3   E2.2  Development of algorithms for tensors on time-evolving graphs (CMU)
Q3   E2.3  Initial analysis of the interplay between power-law and random graph models with constrained optimization (BBN, UCSC)
Q4   E2.2  Initial analysis of DRGG and temporal graphlet models (UCSC, BBN)
Q4   E2.1  Test information network stream summarization methods with a variety of queries which are relevant to information networks (IBM, UCSB)
Q4   E2.3  Analysis of constrained optimization and power-law models on dynamic composite networks (BBN, UCSC)

5.6.13 Project Budget by Organization


Budget By Organization

Organization    Government Funding ($)    Cost Share ($)
BBN (CNARC) 44,000
BBN (IRC) 177,147
CMU (INARC) 66,852
CUNY (CNARC) 24,614
IBM (INARC) 35,070
UCD (CNARC) 50,000
UCR (IRC) 62,256
UCSB (INARC) 129,389
UCSC (CNARC) 84,614 48,000
UMass (IRC) 25,144
TOTAL 699,086 48,000



5.7 Project E3: Dynamics and Evolution of Composite Networks

Project Leads: Ambuj K. Singh, UCSB (INARC)


Email: ambuj@cs.ucsb.edu, Phone: 805-893-3236
Project Lead: Boleslaw K. Szymanski, RPI (SCNARC)
Email: szymansk@cs.rpi.edu, Phone: 518-276-2714

Primary Research Staff
L. Adamic, Michigan (INARC)
A. Barabasi, NEU (SCNARC)
P. Basu, BBN (IRC)
N. Chawla, ND (SCNARC)
N. Contractor, Northwestern (INARC)
R. D'Souza, UC Davis (CNARC)
J. J. Garcia-Luna-Aceves, UCSC (CNARC)
D. Hachen, ND (SCNARC)
J. Han, UIUC (INARC)
G. Korniss, RPI (SCNARC)
W. Leland, BBN (IRC)
C. Lin, IBM (SCNARC)
A. Pentland, MIT (SCNARC)
A. Singh, UCSB (INARC)
B. Szymanski, RPI (SCNARC)
Z. Toroczkai, ND (SCNARC)
A. Vespignani, Indiana (SCNARC)
S. Wasserman, Indiana (SCNARC)
Q. Zhao, UC Davis (CNARC)
TBD Post-doc Researcher, PSU (CNARC)

Collaborators
C. Faloutsos, CMU (INARC)
D. Lazer, NEU (SCNARC)
O. Lizardo, ND (SCNARC)
H. Makse, CUNY (SCNARC)
Z. Wen, IBM (SCNARC)
A. Swami, ARL



5.7.1 Project Overview
This project will study the dynamics and evolution of integrated networks comprising
communication networks, information networks, and social networks, a problem that is fundamental to
network science. An integrated network consists of many different structures that evolve and
interact with one another in space and time. The structure of a tactical communication network is
defined by its dynamic wireless connectivity. The structure of social networks is defined by
common interests and social links defined among users. The structure of information networks is
defined by the relationships among information objects. The function of the integrated network is
defined by how information is disseminated, how these networks evolve, and how a
change in one network causally affects changes in another. Any one component is a
spatiotemporal structure that needs to be examined in the context of the other networks.
This project is organized into three tasks. Task E3.1 considers causality in composite networks
and the relationship of causality to network structure and information. In this task, we build time-
resolved models of information spread that will combine the structure of the communication
network with topic modeling, allowing us to predict the depth and path of information spread.
Task E3.2 examines the co-evolution of the three networks with dynamic endogenous and
exogenous processes. This task focuses on how nodes and edges simultaneously and
interdependently co-evolve. It involves the development of both computational models and
empirical examination of these co-evolutionary processes. This task is complementary to Task E3.1
and extends some of its objectives, aiming at the study of how information
delivery/alteration might be reflected in changes of the network structure. Here we focus mostly
on social and behavioral processes and the continuous structure-dynamics feedback loop in
network evolution. The two tasks as a whole encompass the necessary elements needed for a
complete description of dynamical networks. Task E3.3 investigates group and community
formation using real-world data to identify the key socio-psychological and environmental
factors (internal and external), and computational algorithms for network analysis at several levels
of resolution (multi-scale analysis).
The research activities of this project are intimately linked to other EDIN projects and other
network-specific projects. The mathematical models investigated in E2.2 for capturing temporal
structure and dynamics of networks are closely related to the co-evolutionary processes studied
in E3.2. Various aspects of mobility models studied under E4 are intimately connected to
evolutionary aspects studied here. Similarly, group dynamics studied in E3.3 are related to the
study of temporal behavior and causality in E3.2. The discovery of causal interactions and how
large networks can be managed have strong tie-ins to center-specific projects (e.g., I2.1 on
network management and I3.2 on knowledge discovery).

5.7.2 Project Motivation


The basic research issue that motivates the proposed work is how to detect the existence of, and then
predict the dynamical evolution of, group formation and the emergence of clusters, phenomena, and
organizations. Modeling the flow of information across time scales can give a wealth of
knowledge about how entities in the network interact and how information propagates through a
network. In particular, the possibility of building models of information flow between users has
many interesting and important applications. For example, these models could be used
to predict, given a datum and a "transmitting" user, the set of receivers to whom this information
may spread. Another application could be to predict the topic of a datum given the graph
structure of its information flow. From just these two possibilities, it can be seen that the building
and analysis of such models can be useful in areas such as security or counter-terrorism.
Co-evolution is endemic in network phenomena. Changes in social interaction patterns lead to
changes in the information network (as the interconnections between logical objects evolve) and
in the communication network (as the communication patterns evolve). Similarly, changes in the
communication network or the information network can drive changes in the other two networks.
Co-evolution is a fundamental component of any formal model of network formation as well as
of the understanding of the network response in case of exogenous or endogenous stress
conditions. Understanding how network structure and nodal characteristics co-evolve addresses
one of the core conundrums of network science: understanding how people affect the
configuration of their local network while simultaneously being affected by that network. The
long-term objective is to characterize the multi-scale dynamical features of large-scale social and
technological networks and to develop methods that will finally allow the definition of layered
computational approaches in which different dynamical scales, co-evolution processes, and
granularities can be used consistently. The results obtained in this task are crucial for defining
predictive computational approaches for the behavior of techno-social networks, including
adaptive social behavior in the case of emergencies such as WMD attacks and bioterrorism.

5.7.3 Key Project Research Questions


Our long-term objective is to discover laws and properties of information flow, evolution, and
co-evolution of dynamic heterogeneous integrated networks. Our key research questions are as
follows:
How to quantify the relationship between interactions and network structure? What is the
appropriate time scale for representing interactions?
How do each individual network and the integrated network evolve? What are the causal
paths in the composite network?
How do social, demographic and economic factors constrain network topology and co-
evolution?
How do the embedding space and administrative boundaries affect network topology and
co-evolution?
How do groups, social, and communication networks co-evolve? How to model the co-
evolution dynamics?
How does group formation depend on the attributes of the individuals (such as skill,
level, role, resources) as well as the extant links (such as communication, financial
transactions, exchange of materials or services) among individuals within the network?
How do we analyze the network for such behavior rules? Or, alternatively how do we
verify that a given set of group formation rules fit a given dynamic network?

5.7.4 Initial Hypotheses


Social and communication network structure affects the speed and depth of information
spread. Patterns of information flow are intimately tied to the topic and the association
remains stable over sufficiently long time periods to enable analysis and prediction.



The dynamics of large scale networks is the outcome of the co-evolution mechanisms of
all the network components and dynamical processes. These co-evolution mechanisms
cannot be neglected in the description of the adaptation, resilience and response of
networks to critical situations.
The social/behavioral theories about network dynamics can be modeled in silico on large-scale
real data.

5.7.5 Technical Approach


The approach to group evolutionary models will be based on a multi-theoretical multilevel model
designed to explain individuals' motivations to create, maintain, dissolve, and reconstitute links
with others in a network. The model explains these motivations on the basis of attributes of the
individuals (such as skill, level, role, resources) as well as the extant links (such as
communication, financial transactions, exchange of materials or services) among individuals
within the network. Since our research question here is to understand the mechanism of group
formation, the proposed effort extends our prior research by specifically addressing individuals'
motivations to create, maintain, dissolve, or reconstitute a group linkage with another individual.
We also propose to exploit the dynamics of dyadic behavior, as captured in the notions of
reciprocity and homophily, to discover communities and their evolution. Here again we posit that
group linkages can be explained on the basis of characteristics of the individuals and other
linkages among the individuals.
Our work on group discovery and evolutionary analysis will focus on methods for clustering and
classification of networked data. The former will lead to discovery of communities, and the latter
will largely be focused on the classification and prediction of the formation or persistence of links
between two nodes, which are essential to the dynamics of group evolution. We also posit that
there will be scenarios in which only a limited amount of labeled training data is available. It is
possible for the labeling process to be associated with a selection bias, such that the distributions
of data points in the labeled and unlabeled sets are different. Not correcting for such bias can
result in biased function approximation with potentially poor performance. To that end, we will
also investigate semi-supervised clustering and semi-supervised classification, especially in the
presence of shifting distributions. The proposed work will also extend those methods to the
environment of heterogeneous composite networks by examining different ways to transform a
heterogeneous network into multiple homogeneous ones or to cluster directly on heterogeneous
networks. We will also add a time dimension and study methods for mining the evolution of sub-
network clusters. This work will leverage linkages with the IRC's task R1.1 on efficient
identification of clusters in composite networks; the latter proposes to start with local measures
of clustering at each node, such as the Scaled Coverage Measure [Li08].
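As a small illustration of the link classification/prediction theme above (not the project's actual method), the sketch below scores candidate node pairs with standard neighborhood-based link-prediction heuristics in NetworkX. The toy graph, the candidate pairs, and the choice of Jaccard and Adamic-Adar scores are assumptions made for illustration only.

```python
import networkx as nx

# Toy social graph; in practice this would come from observed interaction data.
G = nx.karate_club_graph()

# Candidate (currently unlinked) node pairs to score for possible future link formation.
candidates = [(0, 9), (1, 33), (5, 11)]

# Neighborhood-based heuristics as simple baselines for link prediction.
jaccard = {(u, v): p for u, v, p in nx.jaccard_coefficient(G, candidates)}
adamic_adar = {(u, v): p for u, v, p in nx.adamic_adar_index(G, candidates)}

for pair in candidates:
    print(pair, f"jaccard={jaccard[pair]:.3f}", f"adamic_adar={adamic_adar[pair]:.3f}")
```

In a full treatment these scores would be features in a (possibly semi-supervised) classifier rather than used directly.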

5.7.6 Task E3.1 Analysis of Causal Structure of Network Interactions (L. Adamic,
Michigan (INARC); P. Basu and W. Leland, BBN (IRC); C. Faloutsos, CMU
(INARC); J.J. Garcia-Luna-Aceves, UCSC (CNARC); A. Singh, UCSB (INARC);
Q. Zhao, UC Davis (CNARC); Post-doc researcher (PSU); A. Swami, ARL)



Task Overview
Social, communication, and information networks co-evolve as a single complex system.
Information flows over communication networks, and communication networks are in turn shaped by
social ties. In this task, we build time-resolved models of information spread that will combine
the structure of the communication network with topic modeling, allowing us to predict the depth
and path of information spread. We also propose a novel time-series approach to detecting the
arrival of new information based on changes in the structure of the communication network. Benefits
include being able to evaluate how efficiently information can diffuse in an existing network and
being able to detect new information in an adversary's network.
Task Motivation
Analysis of the structure of networks (at all levels in the hierarchy) is essential to Network Science.
The fact that these networks evolve over time adds extra information and complexity to their
investigation, as compared to the analysis of static networks. By modeling the flow of
information between users (social network nodes) over a range of time scales, one can predict the
set of nodes to which the information may spread. One could also predict the "topic" of the
information flow given its underlying graph structure. This can have important applications in
areas such as security (predicting to whom and how far data from a message will spread) or
counter-terrorism (discovering possible terrorist message threads from only the communication
structure).
Key Research questions
How to quantify the relationship between interactions and network structure? What is the
appropriate time scale for representing interactions?
Can interactions within a network be predicted based on past history?
Can the arrival of new information be detected from changes in the network structure?
How do each individual network and the integrated network evolve? What are the causal
paths in the composite network?
Initial Hypotheses
Social and communication network structure affects the speed and depth of information spread.
The arrival of new information is accompanied by predictable changes in communication
network structure. Patterns of information flow are intimately tied to the topic and the
association remains stable over sufficiently long time periods to enable analysis and prediction.
Prior work
Previous research into the prediction of communication by modeling on graphs can be divided
into two areas. In the first, the content or subject of the messages and threads is ignored, and any
messages sent close in time between two users are considered relevant and included in one
large model. Some examples of this approach are time-series link prediction
(Huang09, Tylenda09) and event prediction (O'Madadhain05). The second area considers only
the flow of information one time step into the future, predicting the recipients of a single
message instead of the entire ensuing message thread (Carvalho08).
Other areas of related research include work on information diffusion in blogs
(Adar05, Lesk07). Here, frequent patterns/models are found for the flow of information across
the blogosphere network. However, two main differences in that work are that similarities in post
content are not considered when analyzing the threads for frequent patterns, and that any two
posts with similar content and timestamps are assumed to indicate a direct link between
the two users. In other relevant research, models of communication for a single subject have
been built, for use in identifying and tracking malicious subnets (Chung06). In this case, though,
the models were created only for a single subject and not used for the prediction of future events.
There is a body of research on monotone properties of random Bernoulli
graphs (Friedgut96) and random geometric graphs (Goel05), and on robustness and fragility aspects
of large networked systems such as the Internet (Doyle05). However, the study of the impact of
network stimuli on graph properties under causal dependencies between networks is new.
Technical approach
Overall, we propose a three-step process for building these models on information flow graphs.
First, given user A, all communication threads which include messages sent by user A (using A
as the source) are found. This can be done, for example in the case of an email graph, by
grouping candidate messages to optimize a weighted scoring function based on the
similarities in subject line, message content, and time. Next, the specific threads found can be
clustered into general "topics" using the text contents and unsupervised document clustering.
From these training threads, a probabilistic model of the information flow for each "topic" can
be built. There are several different possible types of models that can be built at this step, from
simple correlation matrices to weak process models (Chung06). Additional factors that might
influence spread include decay in transmission probability with distance from the source
(Wu04), decay in interest with time, and social-influence-based thresholds. We expect to utilize
and modify several of these different classes of models, and to find which model type performs best
on this class of communication graph data. After the models are built, they can be used for
prediction; for example, given a message and sender, a mixture model may be built using
Expectation Maximization to find weights for how well the given message fits each of the
possible "topics" for that sender (using the word distributions within the message and the
topics). These weights can then be used to combine the respective models for each topic and find
the overall probability for each user to be a part of the ensuing message thread.
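A compact sketch of the first two steps described above, assuming scikit-learn is available: cluster message threads into rough "topics" with TF-IDF and k-means, then estimate a per-topic forwarding probability for each (sender, recipient) pair from the training threads. The thread data, the number of clusters, and the simple frequency-count estimator are placeholder assumptions, not the task's final modeling choices.

```python
from collections import Counter, defaultdict
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy training threads: (thread text, list of (sender, recipient) transmissions).
threads = [
    ("supply convoy route update fuel", [("A", "B"), ("B", "C")]),
    ("fuel resupply convoy schedule",   [("A", "B"), ("B", "D")]),
    ("patrol report checkpoint north",  [("A", "C"), ("C", "D")]),
    ("checkpoint patrol status report", [("A", "C")]),
]

# Step 1: cluster thread texts into coarse "topics".
texts = [t for t, _ in threads]
X = TfidfVectorizer().fit_transform(texts)
topics = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# Step 2: per-topic forwarding frequencies as a crude probabilistic flow model.
sent = defaultdict(Counter)       # sent[topic][sender] = messages sent
forwarded = defaultdict(Counter)  # forwarded[topic][(sender, recipient)] = count
for topic, (_, transmissions) in zip(topics, threads):
    for sender, recipient in transmissions:
        sent[topic][sender] += 1
        forwarded[topic][(sender, recipient)] += 1

def spread_probability(topic, sender, recipient):
    """Estimated probability that `sender` passes a message of `topic` to `recipient`."""
    if sent[topic][sender] == 0:
        return 0.0
    return forwarded[topic][(sender, recipient)] / sent[topic][sender]

print(spread_probability(topics[0], "A", "B"))   # spread estimate within the first thread's topic
```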
Furthermore, we will utilize a novel time-series approach to correlate changes in communication
network structure with the arrival of new information. We will construct several time series, each
corresponding to a different network property, for example, the density, the size of the giant
connected component, clustering, centralization, and degree correlations. Using time-series
analysis, we will identify the properties that correlate with the arrival of new information, as
discerned using topic modeling or evident in other kinds of detailed virtual-world data, i.e.,
transfers of assets, monetary transactions, and sharing of landmarks. This will allow us to
identify information-arrival events in a network where the content of the communication cannot
always be obtained. It will also allow us to see whether the network responds as expected when
new information arrives. A lack of response in the network variables may indicate inefficiency in
information transfer.
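The sketch below illustrates that time-series idea under simplifying assumptions: given a list of per-interval communication snapshots and a 0/1 series marking intervals where new information is known to have arrived, it computes structural property series and their correlation with the arrival series. The synthetic snapshots, the chosen properties, and plain Pearson correlation are stand-ins for the analysis the task will actually perform.

```python
import numpy as np
import networkx as nx

def property_series(snapshots):
    """Per-snapshot structural properties of a communication network."""
    series = {"density": [], "giant_component": [], "clustering": []}
    for G in snapshots:
        series["density"].append(nx.density(G))
        giant = max(nx.connected_components(G), key=len)
        series["giant_component"].append(len(giant))
        series["clustering"].append(nx.average_clustering(G))
    return {name: np.array(vals) for name, vals in series.items()}

# Toy data: sparse chatter normally, a denser burst when "new information" arrives.
arrivals = np.array([0, 0, 1, 0, 1, 0, 0, 1])
snapshots = [nx.gnp_random_graph(30, 0.15 if a else 0.05, seed=i)
             for i, a in enumerate(arrivals)]

series = property_series(snapshots)
for name, values in series.items():
    corr = np.corrcoef(values, arrivals)[0, 1]
    print(f"{name:>16s}: correlation with information arrival = {corr:+.2f}")
```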
In addition to studying the arrival of information as a source of network stimulus, we will also
study the short-term impact of causality relationships between networks (IN, CN, and SCN) on
network properties under certain network stimuli. If a critical node (document) is erased from the
information network, this may have an impact on the information fusion process and on the
information flow through the communication network between various social network entities;
this can affect properties of the underlying IN, SCN, and CN, and hence the overall composite
network.

Validation approach
In the first year, we will concentrate on a small number of network stimuli (namely, deletion of
information node, failure of a communication device, change of a relationship from ally to
adversary) and analytically study their impact on key shared network metrics such as number of
affected information flows over a period of time. The key observation here is that the above events
will not only change the structural properties of the network but also impose constraints on the
existing information flows in the network as well as their communication pathways; for example,
routing information through a non-ally or fusing information at a communication device owned
by a non-ally is no longer a viable option. This may hurt the performance of some flows since it
may reduce the number of potential paths, but on the other hand, it may improve performance of
some other flows that were competing for resources but were not impacted by the network
stimulus. We will begin our research by studying such behaviors analytically for simple static
topologies of IN, CN, and SCN and simple causal mappings between them. Eventually we will
undertake more realistic models of dependencies when such models become available from other
research in the Consortium. This task will maintain close collaboration with task E2.
As mentioned in E2, mappings between networks may be highly dynamic in a spatio-temporal
sense. We will analytically study such classes of dynamics and their impact in this task. In
particular we will analyze the rate of change of composite network metrics as a function of the
rate of change of the aforementioned inter-network mappings.
Empirically, we will study many co-evolving networks in the virtual world Second Life,
including chat, social, asset transfer, and monetary transactions. Activities in virtual worlds
resemble many real-world activities, such as coordination and information distribution. New
information may include the arrival of a new asset or landmark designating a new assembly
point. A time series analysis will establish correlation between information arrival and chat
network structure.
Summary of Military Relevance
What are the appropriate models and representations for military communication, information,
and social networks? How are the individual networks linked together? How do each individual
network and the composite network evolve? What are the causal paths in the composite network?
What are the fundamental scientific underpinnings of this evolution? These are the key scientific
questions of network science that will be addressed in this project. These answers will
eventually lead to sound design and analysis techniques that can be used to analyze existing
military networks and to develop new networks so that military operations can be carried out
successfully.
Research Products
We will study relationships between information flow and network structure, and how best to
represent the relationships across multiple time-scales. These studies will lead to
recommendations for analyzing integrated networks (including military networks), and research
manuscripts.



5.7.7 Task E3.2: Co-evolution of Social, Information, and Communication
Networks with Dynamic Endogenous and Exogenous Processes (A. Vespignani,
Indiana (SCNARC); S. Wasserman, Indiana (SCNARC); A. Pentland, MIT
(SCNARC); H. Makse, CUNY (SCNARC); G. Korniss, RPI (SCNARC); L. Adamic,
Michigan (INARC); N. Contractor, Northwestern (INARC); J. Han, UIUC
(INARC); C.-Y. Lin, IBM (SCNARC); Z. Toroczkai, ND (SCNARC); N. Chawla,
ND (SCNARC); R. D’Souza, UC Davis (CNARC); A. Swami, ARL)

Task Overview
This task focuses on how nodes and edges simultaneously and interdependently co-evolve. It
involves the development of both computational models and empirical examination of these co-
evolutionary processes.
Task Motivation
Co-evolution is endemic in network phenomena. If someone becomes ill, it changes their
interaction patterns, which then affects the spreading process. If political preferences affect
interaction patterns, and interaction patterns affect political preferences, then a wide variety of
emergent patterns are possible. These in turn lead to changes in the information network (as the
interconnections between logical objects evolve) and in the communication network (as the
communication patterns evolve). Similarly, changes in the communication network or the
information network can drive changes in the other two networks. Despite the importance of co-
evolution, it has been a neglected topic in network science until recent years, because of the
methodological challenges of studying changes in edges and nodes of a single network and
changes across networks. Our aspiration is to push forward the analytic and conceptual
machinery for the study of the co-evolution of integrated networks. This is a fundamental component
of any formal model of network formation, as well as of the understanding of the network response
under exogenous or endogenous stress conditions. This task complements and extends some of the
objectives of task E3.1, which studies how information delivery/alteration might be reflected in
changes of the network structure. Here we focus mostly on social and behavioral processes and on
the continuous structure-dynamics feedback loop in network evolution. Together, the two tasks
encompass the elements needed for a complete description of dynamical networks.
Initial Hypothesis
Networks and the dynamics of agents change continuously, and any communication, mobility, or
contagion process in large-scale networks is the outcome of the co-evolution of all the network
components and dynamical processes. These co-evolution mechanisms cannot be neglected in describing
the adaptation, resilience, and response of networks to critical situations. We begin with the
assumptions (1) that the behavior of the elements of any networked system is influenced by the
interaction pattern encoded in the network structure and (2) that the network structure changes
according to the behavior of the elements of the network. This co-evolution mechanism, in the form
of a continuous feedback loop, is a key element of the dynamical evolution of networks across
different systems. Infrastructures and communication networks change according to user behavior
and vice versa. People are influenced by those they interact with, and individuals tend to choose
to interact with those who are similar to them. Critical ancillary questions remain under-explored,
such as: on what kinds of similarity do people tend to select discussion partners; what is the time
scale of social influence and network change; and so on. Furthermore, environmental factors such as
geography, demographics, and economics constrain the co-evolution mechanisms while at the same time
driving the evolution dynamics. We assume that the characterization of constrained networks will
allow us to draw general statistical associations between network structures and socio-demographic
factors. This will allow the distillation of general principles that will shed light on the
co-evolution dynamics and its appropriate formal modeling.
Prior Work
In general, network science has treated the structural features of large-scale networks and the
dynamics occurring on them as isolated components, leading to an understanding of these two
components as independent of each other. However, any realistic model intended for use in a
real-world context must include the interaction between them. Most processes of interest in
networks, such as resilience to damage or the alteration of communication efficiency under stress
conditions, must include the adaptation of the network: its topology restructures in response to
changing dynamics and vice versa. The definition of basic models and topology generators for
networks must likewise take into account the co-evolution feedback as well as the various factors
constraining the dynamical forces that shape networks. In the first year we plan to study and
characterize the co-evolution of various multi-scale networks and to develop the mathematical
formalism for modeling multi-scale dynamical processes. We aim at the mathematical and statistical
characterization of the multi-scale nature of these networks and at the statistical correlation and
association of the network structure with the embedding spatial layout (in both geographical and
social space). The aim is to go beyond the usual topological or weighted-network characterization
of large-scale networks (Albert99, Newman03).
Another objective we will start pursuing during the first year is the formulation of an appropriate
mathematical formalism for the theoretical understanding and computational analysis of co-evolution
processes. We identify the particle-network framework as the appropriate mathematical and
conceptual tool for multi-scale networks. The key idea builds on the proven success of
reaction-diffusion processes for network flow modeling. In their simplest formulation,
transportation infrastructures, information and communication processes, and social and biological
contagion are equivalent to classic reaction-diffusion processes used in many physical, chemical,
and biological systems (Colizza08). The challenge in adapting reaction-diffusion models to large
techno-social networks lies in the need to deal simultaneously with multiple time and length scales
(scale mixing) in the mobility processes of coupled populations. The particle-network framework is
an ideal setting in which to study spreading, mobility, and diffusion processes in a wide range of
problems. The reaction-diffusion framework allows the occupation number N_i of each node to take
any nonnegative integer value, so that the total particle population of the system is
N = \sum_i N_i. Each particle diffuses along edges connecting nodes with a diffusion coefficient
d_{ij} that depends on the node degree, node attributes, and/or the mobility matrix. Within each
node, particles may react according to various schemes that represent possible interactions among
particles/individuals. This framework has the advantage of dealing with arbitrary network
structures (heavy-tailed, homogeneous, etc.) as well as combinations of those. In particular, it is
possible to integrate partly engineered topologies into the framework, simulating networks that
emerge from the interplay of stochastic and supervised dynamical principles.

Key Research Questions

How do social, demographic, and economic factors constrain network topology and co-evolution?
How do the embedding space and administrative boundaries affect network topology and co-evolution?
How do groups, social networks, and communication networks co-evolve? To tackle this question we
will test several hypotheses, such as the following: Are social ties that lie within groups more
likely to be active, as measured in chat activity, and also more likely to become embedded in
additional groups? Are individuals who are "physically" proximate in the online world more likely
to chat with one another? Are individuals who are more likely to chat with one another more likely
to develop a game partnership relation (engaging in joint gaming activities)? Are individuals who
engage in joint activities more likely to trade with one another? Are individuals who trade with
one another more likely to co-locate in the virtual world?
How can the co-evolution dynamics be plugged into the particle-network framework through general
nonlinear coupling mechanisms?
Is it possible to define general types of co-evolution mechanisms that can be mapped onto specific
classes of particle-network equations?
Is it possible to extend the particle-network framework to non-Markovian processes via ad hoc
approximations and quasi-stationary effective coupling terms?

Technical Approach
We plan to study the co-evolution of integrated networks using two interrelated mathematical
formalisms: multi-scale networks and particle dynamics.
Characterization of multi-scale networks. The nodes of techno-social multi-scale networks usually
have fractal distributions over multiple scales in space. These geographical distributions and the
corresponding demographic (population) distributions strongly constrain and define the evolution
and the structural and transport properties of these networks. We will analyze the correlations of
topological and dynamical properties with the actual demographic, geographical, and economic
factors underlying the structure of these networks, including: (a) the spatial distribution of
nodes and edges and their correlation with population density, and (b) the correlation of traffic
flows and network evolution on edges with geographical and population attributes.
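The following sketch illustrates one such correlation analysis, assuming SciPy is available: it
computes a Spearman rank correlation between node degree and a population-density covariate. Both
arrays are synthetic stand-ins for the geo-localized data described above.

    # Sketch: rank correlation between node degree and local population density.
    # Synthetic data; real inputs would be node locations joined to demographic data.
    import numpy as np
    from scipy.stats import spearmanr

    rng = np.random.default_rng(1)
    n = 500
    pop_density = rng.lognormal(mean=3.0, sigma=1.0, size=n)        # people per km^2
    # toy assumption: expected degree grows sublinearly with local population
    degree = rng.poisson(2 + 3 * np.sqrt(pop_density / pop_density.mean()))

    rho, pval = spearmanr(pop_density, degree)
    print(f"Spearman rho = {rho:.2f} (p = {pval:.1e})")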
Particle-network framework. The reaction-diffusion framework has the advantage of allowing suitable
approximations that explicitly include the discrete nature of particle packets and the underlying
topology of the network. In particular, one of the main issues to be considered in multi-scale
networks is the ubiquitous presence of very heterogeneous topologies dominated by statistical
distributions with heavy tails. In the reaction-diffusion framework it is possible to introduce
explicitly in the description of the system classes of statistically equivalent nodes of degree k.
A convenient representation of the system is therefore provided by quantities defined in terms of
the degree k,

    N_k = \frac{1}{V_k} \sum_{i | k_i = k} N_i ,

where V_k is the number of nodes with degree k and the sum runs over all nodes i having degree k_i
equal to k. The degree-block variable N_k therefore represents the average number of particles over
all nodes with degree k. While this representation corresponds to a homogeneous approximation, it
allows working explicitly with arbitrary network topologies and progressively introducing
higher-order structural properties such as clustering and multi-point correlations. In addition, it
allows for an explicit analytic solution while taking into account a wide range of dynamical
routing strategies as well as specific processes modeling the injection or absorption of
particles/individuals/information in the network. This is due to the particular nature of the
general framework, in which the dynamics of particles is represented by a mean-field dynamical
equation expressing the variation in time of the sub-populations N_k(t) in each degree block as:

    \partial_t N_k = -p_k N_k(t) + k \sum_{k'} P(k'|k) \, d_{k'k} N_{k'}(t)

This basic reaction-diffusion framework can be used to study the propagation of information
particles as well as the mobility of individuals whose dynamics depend on, or are modulated by, the
structural properties (e.g., node degrees) of the underlying network. The framework can be further
extended by including multiple particle types, changes of state or awareness, and birth-death (or
other) processes at nodes, for example to model the injection/absorption of
information/individuals. Such an extended framework is particularly useful for modeling networks
under critical conditions or stress. In this task the reaction-diffusion formulation on networks
will allow us to explore the correlation, feedback, and co-evolution of the network structure and
the equilibrium stationary distribution of particles and their flows. An ambitious objective is the
formulation of classes of particle-network equations that correspond to the dynamical types of
co-evolving networks categorized in the data analysis. We also plan to study networks combining
engineered and emergent properties, to discriminate their effects on the co-evolution process.
Another main objective during the first year is the extension of the particle-network framework,
through approximate and quasi-stationary treatments, to the non-Markovian dynamics that in many
cases characterize social mobility and interaction processes.
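A minimal numerical sketch of the degree-block equation above is given below (Python/NumPy), under
the simplifying assumptions of an uncorrelated network with P(k'|k) = k' P(k') / <k>, unbiased
hopping d_{k'k} = p_{k'} / k', and a uniform leaving rate p_k = 1; none of these choices is
prescribed by the text, they are made only to keep the example self-contained.

    # Sketch: degree-block mean-field integration of the equation above, assuming an
    # uncorrelated network (P(k'|k) = k' P(k') / <k>), unbiased hopping, and p_k = 1.
    import numpy as np

    kmin, kmax, gamma = 2, 100, 2.5
    k = np.arange(kmin, kmax + 1, dtype=float)
    Pk = k ** (-gamma)
    Pk /= Pk.sum()                         # heavy-tailed degree distribution P(k)
    avg_k = (k * Pk).sum()

    p_k = np.ones_like(k)                  # rate at which particles leave a degree-k node
    Nk = np.full_like(k, 10.0)             # initial average occupation per degree class
    dt, steps = 0.01, 5000

    for _ in range(steps):
        inflow = (k / avg_k) * (Pk * p_k * Nk).sum()   # particles arriving per degree class
        Nk = Nk + dt * (-p_k * Nk + inflow)

    # at stationarity the occupation should grow linearly with degree, N_k ~ k
    for kk in (2, 10, 100):
        print(f"k = {kk:3d}   N_k / k = {Nk[kk - kmin] / kk:.3f}")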
Validation
Validation will occur through the accumulation of results through multiple technical approaches,
and, most importantly, through examination of multiple data sets. In particular, the critical
validation questions will be: (1) which results are robust across a wide variety of data sets;
(2) where there are powerful results that are heterogeneous across data sets, what contextual
details create this heterogeneity; and (3) what general mathematical classes can be found in the
formal description of co-evolution processes that find evidence in the real data analysis. The data
analyzed will range from the dynamics of person-to-person interaction, obtained with experiments
that assess face-to-face social interactions with active RFID, to large-scale multimodal mobility
networks in more than 35 countries and worldwide long-range transportation systems such as the
airline transportation network. This will allow us to leverage more than 40 large-scale datasets,
most of them geo-localized. At the individual scale, we will use two related data sets that are
especially well suited to addressing these questions (a) longitudinal data on the self reported ties
and individual attributes from 800 students in 14 dorms, and (b) longitudinal data on the self
reported ties and individual attributes from 80 students in a single dorm, supplemented by data
on physical proximity and location from mobile phones that had been circulated to students at
the beginning of the academic year. Our objectives in processing these two data sets will be to
parse social selection and social influence. Further, we will drill down into minute details of
how time related factors affect behavior. For example, does the increased prominence of the
election in the Fall of 2009 increase political homophily? If someone is sick, do they tend to
interact less with others? These are key issues in understanding the drivers of social
organization, and should yield insight into the process of diffusion in groups more generally. We
will also study explicit group formation in the context of virtual world interactions. Using time-
resolved data on user-to-user social network ties, game-partnership ties, in-game trading ties,
chat interactions with approximate virtual locations, and group affiliations, we will map the co-
evolution of these three networks: for instance, social ties influencing group joining, groups
fostering chat, which in turn can lead to new social tie formation.
Research Products
At the end of year one we plan a set of deliverables that will include: i) a publication/report on
the characterization and statistical analysis of networks constrained in geographical and social
space; ii) a publication reporting on computationally efficient schemes/algorithms for the
simulation of dynamical processes (particle-network framework) in multi-scale networks; iii) a
preliminary technical report on the extension of the particle-network framework to non-Markovian
processes and the definition of threshold mechanisms in spreading and contagion phenomena; iv) a
publication/report on the co-evolution of political attitudes and physical proximity; v) a
publication/report on physical proximity patterns and contagion processes in a small, bounded
population; vi) a working paper on the extension of multi-theoretical multilevel models of
communication networks to specifically investigate co-evolution of multiple networks; vii) a
working paper on the extension of p*/Exponential Random Graph Models to multivariate as well as
spatial networks; and viii) parallel algorithms for p*/ERGM estimation that leverage petascale
computers.
Summary of Military Relevance
The co-evolution task is relevant to the US military in multiple ways. Within the military,
although soldiers are initially molded by their direct supervisors, they are then shaped/molded by
their peers once a good relationship has been established within the unit. This research would
help understand the dynamics by which relationships are forged among soldiers, and the effect
that those relationships have on soldiers. This task will also provide conceptual tools for the
analysis and forecast of ad-hoc emergent social and communication networks in combat areas
where defined exogenous or endogenous constraints are present. Particular elements of the co-
evolutionary processes are especially important. For example, the process by which a pathogen
might spread, and the effect it has on the network, is relevant to military settings, which are often
vulnerable to the spread of disease (e.g., the flu). The team component will help understand the
formation of effective teams, which is important because teams are performing an increasing
fraction of mission critical tasks in the military.

5.7.8 Task E3.3: Data-driven Modeling and Simulation of Dynamic Community
Evolution (N. Contractor, Northwestern (INARC); J. Han, UIUC (INARC); Z.
Toroczkai, ND (SCNARC); N. Chawla, ND (SCNARC); David Hachen, ND
(SCNARC); Omar Lizardo, ND (SCNARC); A.-L. Barabasi, NEU (SCNARC); B.
Szymanski, RPI (SCNARC))

Task Overview
The primary goals of this task include: analysis of groups and their formation using real-world
data to identify the key socio-psychological and environmental factors (internal and external);
methods of community detection; development of agent-based computational models for in-silico
testing and prediction of networked group behavior; and computational algorithms for network
analysis at several levels of resolution (multi-scale analysis).
Task Motivation
In network science today, efficient algorithms for large-scale dynamic network analysis, including
the detection of communities, remain an ongoing, central, and far-from-finished research area. Once
such algorithms are developed, social/behavioral theories can be modeled in silico, and the various
hypotheses put forward by social scientists can be validated. Armed with knowledge of the most
important factors and mechanisms for adversarial group formation, evolution, and dissolution, a
multi-scale agent-based environment can be created.
The modern military increasingly needs to rely on bottom-up network processes, as compared to
top-down hierarchic processes. This is particularly true in interactions with the social networks
existing in the foreign societies in which many Army missions are conducted. How can we use the
massive streams of data to detect adversarial networks? How can we quickly extract the most
meaningful information for the soldier and decision maker that is useful in all aspects of their
operations, from supporting humanitarian operations to force protection and full combat operations?
The models and methods investigated in this task will be evaluated for potential military
information network applications. Although we may not be able to obtain real military mission
datasets, we plan to use available data from civilian applications to investigate possible cases
and scenarios, to generate simulated data for proofs of concept, and to promote the potential
applications of the developed theory and technology in military settings. The multi-scale
agent-based environment that we will be developing can be used as a computational framework to
predict network behavior based on partial data and input from the military intelligence services,
either from past, current, or future operational areas of the US Army.
Key Research Questions
The research question here is to understand the mechanisms of group formation and evolution. Why,
given a wide set of potential group members in the network, might an individual be more likely to
join one group of individuals rather than another? Does group formation occur on the basis of
attributes of the individuals (such as skill, level, role, and resources) as well as the extant
links (such as communication, financial transactions, and exchange of materials or services) among
individuals within the network? How do we analyze the network for such behavior rules? Or,
alternatively, how do we verify that a given set of group formation rules fits a given dynamic
network?
Initial Hypotheses
We hypothesize that social/behavioral theories about network dynamics can be modeled in silico on
large-scale real data. Currently, there is a lack of efficient and scalable algorithms for
community detection and for modeling network dynamics. To address this issue, we will investigate
several promising approaches to designing such algorithms, as described in the technical sections
of this task description.
Prior Work
PW1. Group Evolution Models. The preponderance of prior research has taken group
composition as given and hence has considered it a source rather than the object of explanation.
However, with the dawn of the "e-lance" economy and virtual organizations, individuals are
increasingly at liberty to select with whom they might want to assemble a "group." Communication
technologies can dramatically enlarge the network of potential group members to select from,
unfettered, say, by proximity constraints. It is therefore even more important than before to
advance our understanding of the social motivations for why, given a wider set of potential group
members in the network, an individual might be more likely to join one group of individuals rather
than another. In prior and ongoing research, we have developed a multi-theoretical multilevel
(MTML) model to explain individuals' motivations to create, maintain, dissolve, and reconstitute
links with others in a network (Contractor06, Monge03). The model explains these motivations on the
basis of attributes of the individuals (such as skill, level, role, resources) as well as the
extant links (such as communication, financial transactions, exchange of materials or services)
among individuals within the network.
The MTML model includes eight families of theoretical mechanisms for creating group linkages:
(1) Theories of self-interest focus on how people make choices that favor their personal
preferences and desires, creating group ties that enable them to seek goals they wish to achieve.
Two primary theories in this area are the theory of social capital and transaction cost economics.
(2) Theories of mutual interest and collective action examine how forging links produces
collective outcomes unattainable by individual action. People create collaboration ties because
they believe they serve their mutual interests in accomplishing common or complementary goals.
(3) Contagion theories address questions pertaining to the spread of ideas, messages, attitudes,
and beliefs through direct or indirect groups links (Burt87). Similarly, collaboration can be
blocked by isolating parts of the network or by inoculating against infection. (4) Cognitive
theories explore the role that meaning, knowledge, and perceptions play in development of
groups. Decisions to forge group ties with others are influenced by who or what people think
others know. (5) Exchange and dependency theories explain the emergence of groups on the
basis of the distribution of information and material resources among network members. People
seek group ties with those whose resources they need and who in turn seek resources they
possess. (6) Homophily and proximity theories account for the emergence of group links on the basis
of trait similarity and similarity of place (McPherson01). (7) Balance theories posit a tendency
toward consistency in relations; that is, individuals are more likely to create transitive ties.
(8) Finally,
coevolutionary theory posits that group linkages are typically created in the belief that they will
increase group fitness, measured as performance, survivability, adaptability, and robustness.
Coevolutionary theory articulates how communities of groups populations linked by intra- and
inter-group networks compete and cooperate with each other for scarce resources.
In our prior and ongoing research, we have empirically tested MTML predictions in over four dozen
networks, using recent advances in exponential random graph modeling techniques (Contractor06). Our
findings across these networks indicate that individuals' motivations to create, maintain, and
dissolve ties with other individuals or knowledge repositories are a complex combination of
multi-theoretical motivations. No one theoretical motivation is consistently superior or inferior
to the others; instead, they tend to work in ensembles.

Since our research question here is to understand the mechanism of group formation, the
proposed effort extends our prior research by specifically addressing individuals' motivations to
create, maintain, dissolve, or reconstitute a group linkage with another individual. Here again we
posit that group linkages can be explained on the basis of characteristics of the individuals and
other linkages among the individuals.
PW2. Group Discovery and Evolutionary Analysis. Most existing network modeling and analysis
methods consider homogeneous, static networks. However, networks in the real world are
heterogeneous, interacting and evolving, which poses great challenges in terms of effectiveness,
scalability, and comprehensive analysis of such information networks, and especially so for the
case of military networks. This task will study methods for discovering the evolution regularities
and exceptions of dynamic heterogeneous information networks. We assume such networks consist of
multi-typed, interconnected objects, such as soldiers, commanders, armed vehicles, buildings
(including bridges and some geospatial objects), documents, and other artifacts, each associated
with multiple properties (called attributes). Previous research has focused on the evolution of
homogeneous networks, such as networks of friends, authors, or web pages. It is new and important
to study the evolution of dynamic heterogeneous information networks.
PW3. Formation of elite/leadership groups. While current adversarial/terrorist networks have a
cellular and distributed structure, there is always a hierarchical organization of influence within
cells and among the cells. Prior to our own work, there was no computational/modeling research in
the literature focusing on the formation of leadership hierarchies ("elites") from network
dynamics. It was only after the publication of our work that this approach was adopted
and expanded by others (James Hazy, from Adelphi U.). In our research, using a multi-agent
modeling approach we have uncovered some of the fundamental mechanisms behind the
emergence of leadership structures in competitive environments with limited resources. We have
shown that effective cooperation can emerge through network interactions within a completely
individualistic community of competing individuals and how properties of these cooperating
groups (size, network structure, stability, composition) are influenced by the properties of the
underlying social networks. We have developed a mathematical framework (Toroczkai08) describing
these dynamical influence network structures, coined gradient networks. The
mathematical and computational modeling approach developed there can be directly extended to
study distributed, adversarial networks, under the same framework of influence graphs (gradient
networks).
PW4. Community Detection. Community detection in networks has a long history, for review see
(Newman06). However, almost exclusively, those methods are graph structure based. A radically
different approach was provided in some of our prior work (Steinhaeuser09), which uses
non-topology-based measures as well. We proposed a community detection algorithm and demonstrated
its scalability on a network with millions of nodes and edges. We have shown that the commonly used
maximum-modularity evaluation criterion does not necessarily coincide with the correct division of
the network. In such cases, algorithms that maximize modularity converge on a suboptimal solution;
that is, they miss the actual and meaningful communities in a social network. We showed that our
method not only has lower computational complexity than other methods but is also more accurate in
reflecting the true community structure.
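For orientation only, the sketch below runs a standard modularity-maximizing community detection
routine from networkx on a small benchmark graph and compares the modularity of the detected
partition with that of the observed two-way split, illustrating the caveat that maximum modularity
and the meaningful division need not coincide; it does not implement the Consortium's own
(Steinhaeuser09) method.

    # Sketch: modularity maximization on a benchmark graph versus the observed split,
    # using standard networkx routines (not the Consortium's algorithm).
    import networkx as nx
    from networkx.algorithms.community import greedy_modularity_communities, modularity

    g = nx.karate_club_graph()
    detected = list(greedy_modularity_communities(g))
    observed = [{n for n, d in g.nodes(data=True) if d["club"] == c}
                for c in ("Mr. Hi", "Officer")]

    print("communities found by modularity maximization:", len(detected))
    print("modularity of detected partition:", round(modularity(g, detected), 3))
    print("modularity of observed two-way split:", round(modularity(g, observed), 3))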

Technical Approach
In order to develop an understanding and predictive modeling capability for dynamic community
evolution one needs to seamlessly integrate previous research and new ideas coming from
disparate areas, ranging from behavioral sciences to computational modeling. This integrative
effort is precisely the greatest strength of this research module.
The technical approach has several phases and interdependent steps. The analysis side takes
datasets available to us and uses them to learn about communities and their dynamics. This analysis
requires, however, efficient methods of community detection (PW4) in the first place and, in the
case of large-scale social networks, methods for a multi-scale decomposition of community
structures (detailed below). The data-driven analysis of communities will be used to validate the
hypotheses described in PW1 (MTML models) and in PW3 (leadership structures), and to suggest new
ones where needed (described below). Once the relevant factors (individual and external) and
mechanisms for group dynamics are identified from data, and the corresponding theories developed,
the next stage is to fuse this knowledge into an agent-based predictive computational framework,
which can be applied to novel data of a military/intelligence nature for strategy development
purposes. Next we give a high-level description of the methodology and technical approach for
attacking some of the major technical questions.
We conjecture that variation across networks reflects the diverse tasks that are being
accomplished in these networks. The contingency framework proposes that the likelihood of a
theoretical mechanism explaining group formation will depend on the goals of the group. We have
identified five goals commonly found in the groups we have investigated. Exploring refers to groups
that are in search of new information or undiscovered resources. Exploiting refers to groups that
seek to maximize their ability to exploit the resources that already exist in the group. Mobilizing
refers to groups that are trying to organize towards some collective action. Bonding refers to
groups where the main objective is to provide social support. Swarming refers to groups where the
ability to gear up for a rapid response is a high priority. These goals are by no means exhaustive,
nor are they exclusive. From our research, theories of self-interest, cognition, and contagion are
more influential in explaining group formation in groups whose goal is exploring. By contrast,
theories of balance, exchange, homophily, and proximity are more influential in the formation of
groups whose goal is bonding.
In the case of group discovery and evolutionary analysis, we will focus on methods for clustering
and classifying network data. The reason we focus on clustering methods first is that most
evolution regularities in the dynamic world may not have training samples available beforehand.
When small numbers of training samples are available, we will consider semi-supervised clustering
and semi-supervised classification. We will also study typical classification problems in composite
networks.
First, we will perform a systematic study of clustering and classification methods for mining
evolution regularities of heterogeneous composite networks. Our proposed first task is to extend
existing clustering methods for homogeneous information networks to the environment of
heterogeneous composite networks, by examining different ways to transform a heterogeneous network
into multiple homogeneous ones or to cluster directly on heterogeneous networks. Similarly, we will
study how to extend our previous work on micro-clustering-based discovery of evolution regularities
of information networks (Kim09) to heterogeneous composite networks. We will also apply rank-based
clustering to evolution analysis. The new aspect of this work is adding a time dimension to mine
the evolution of sub-network clusters. For mining the evolution of composite networks, an important
direction is to determine how to use limited training examples to construct evolution models for
such networks. Since we assume there will usually be only a small number of training sets, we
propose to study semi-supervised learning by integrating classification and clustering approaches.
To discover efficient community detection algorithms, we will formulate objective functions for
edge weighting that not only capture the key structural and node-level parameters such as
expansiveness (calling behavior/out-degree), reciprocity (mutuality of communication) and
transitivity (friends of friends are more likely to communicate with one another), but also the
attributes and characteristics of the nodes. The proposed work builds on this prior work, but
incorporates the dynamics of human behavior and interaction to discover and monitor
communities over time. The community detection algorithms will incorporate the attributes of
nodal behavior, as well as the weights on the edges, which can be a functional form of
interaction between two individuals. Another critical element in community detection is the
notion of labeled versus unlabeled data. We could potentially have a small sample of "known
communities," which will serve as labeled training data. While we can then exploit the labeled data
under a semi-supervised learning and clustering framework, the evolving and shifting distributions
will challenge that framework. In our prior work, we proposed a novel strategy for semi-supervised
learning under changing distributions, albeit for non-networked data. We demonstrated that it
outperforms the state of the art in the presence of biased distributions; see (Chawla05). However,
there is no work, to the best of our knowledge, on semi-supervised learning with changing network
distributions. We will tackle this fundamental problem as part of this proposal.
For studying the evolution of communities, one fundamental issue is modeling the rate of link
persistence. Our prior work has looked at incorporating topological metrics governing the formation
and persistence of links; see (Raeder09, Lichtenwalter09). We will build on our knowledge in this
domain to incorporate more sociological as well as behavioral attributes at different snapshots in
time to capture the dynamics of the network, and to build more robust and accurate predictive
models that enable us to answer questions such as: under what conditions is a tie present in a
social network at time t likely to decay by some future time t + ∆t? Previous research addressing
this issue suggests that the network range of the people involved in the tie, the extent to which
the tie is embedded in a surrounding structure, and the age of the tie all play a role in tie
decay. We propose to use the weighted social network data to determine the importance of tie
strength to tie decay. In particular, we will study the relative predictive power of sociological
properties with respect to tie decay and assess how accurately data mining models can predict tie
decay, both in the presence and in the absence of strength information.
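As a hedged sketch of the kind of tie-decay model envisioned, the code below fits a logistic
regression to synthetic edge features named in the text (tie strength, embeddedness, tie age) using
scikit-learn; the feature construction and decay process are invented for illustration and would be
replaced by features extracted from real network snapshots.

    # Sketch: predicting whether a tie present at time t decays by t + dt from edge
    # features named in the text (tie strength, embeddedness, tie age). Synthetic data.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import roc_auc_score

    rng = np.random.default_rng(2)
    n = 5000
    strength = rng.exponential(1.0, n)       # e.g., call volume on the tie
    embedded = rng.poisson(2.0, n)           # e.g., number of common neighbours
    age = rng.integers(1, 36, n)             # tie age in months
    # toy decay process: weak, poorly embedded, young ties decay more often
    logit = 0.5 - 1.2 * strength - 0.6 * embedded - 0.05 * age
    decayed = (rng.random(n) < 1.0 / (1.0 + np.exp(-logit))).astype(int)

    X = np.column_stack([strength, embedded, age])
    X_tr, X_te, y_tr, y_te = train_test_split(X, decayed, test_size=0.3, random_state=0)
    model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    print("held-out AUC:", round(roc_auc_score(y_te, model.predict_proba(X_te)[:, 1]), 3))
    print("coefficients (strength, embeddedness, age):", model.coef_.round(2))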
We will also build this family of predictive models as follows. We first define objective functions
for edges or links using individual-specific information and other parameters, including
sociological and behavioral attributes. Essentially, we start with the network at large, use the
properties to cluster, develop models on each cluster, and then make predictions. Our goal is to
first decompose the data space and then generate predictive models tuned for the decomposed space.
Effective clustering of the data space will not only reveal unique dynamics but also increase the
signal-to-noise ratio. Developing models for the processes underlying behavior at the node and
group levels will yield probabilistic predictions.

Algorithmic and computational research will focus on developing (1) computationally fast and
parallelizable algorithms for non-local structural analysis measures (e.g., path distributions,
diameter, betweenness centrality, etc.); (2) advanced data mining methods for community detection
in massive social networks, and methods for evaluating their structural and compositional stability
over time; and (3) novel computational tools for longitudinal data analysis and prediction for
large-scale, longitudinal social network data (nonlinear time series methods, and data mining
algorithms/methods for link prediction/persistence, especially in the presence of highly skewed and
changing distributions). The efficiency of simulations hinges upon the decoupling of spatial,
behavioral, and temporal scales into fundamental classes in social networks when going from the
individual to the group, and finally to diameter-scale network properties. A systematic
coarse-grained methodology will allow us to use more effective and computationally less expensive
models, coupled across the scales, to produce agent-based models that have the same statistical
behavior as the real-world data. The dynamic, data-driven modeling suite will be developed so that
it can both generate predictions/forecasts about emergent phenomena/behavior and test scenarios and
conduct what-if analyses.
Validation Approach
The goals of the IPP focus on the development of efficient and scalable algorithms for community
detection and for modeling network dynamics. We will thoroughly analyze these aspects of the
algorithms we develop, using extensive social network data available to the Consortium researchers
via collaborations with teams from Northeastern University, the University of Notre Dame, and RPI.
Both efficiency and scalability will be measured and compared to those of existing algorithms.
Summary of Military Relevance
The long-term goal of the task is to create a multi-scale agent-based environment that can be used
as a computational framework to predict network behavior based on partial data and input from the
military intelligence services. Having such a tool will enable military network analysts to use
data from past or current operational areas of the US Army to identify the most important factors
and mechanisms for adversarial group formation, evolution, and dissolution, and to predict network
behavior from such partial data and intelligence input.
Research Products
The products of the IPP will be efficient and scalable algorithms for modeling network dynamics and
for community detection, together with the associated analysis of their performance and
scalability.

5.7.9 Linkages with Other Projects

IPP Tasks Linkage


E3.1 ← I3.1, I3.2 How to discover causal behavior from networks?
E3.1 ← S1.3 Modeling multi-channel networks of people will
inform models of information diffusion.
E3.1 ↔ E1 What are the appropriate representations for
evolving networks?
S2.1 ← E3.2 Adversary community detection
S2.2 ← E3.2 Role recognition for adversary network members and the inter-relationship of
different adversary networks to each other
E3.2 ← E4.1 How do mobility patterns affect co-evolution?
E3.2 ← I2.3 Information network constructed from different
sources will be incorporated in modeling suite.
E3.2 ← E4 How do mobility patterns affect co-evolution?
I2.3 ← E3.2 Social network models
E2 ← E3.2 Contribute concepts, data and theory on network
dynamics and co-evolution for the mathematical
model of time-dependent networks
E3.2 ← E3.3 Models and analysis of group behavior will be
used to understand co-evolution of networks.
E3.3 → S2.1 The persistent link model developed under E3.3 will be used in adversary network
detection planned under S2.1
E3.3 ← R1.1 Contribute clustering concepts and metrics such
as SCM that are appropriate for identification of
local clusters in composite networks.
E3.3 → S2.2 Clustering of nodes and classification of
information flows developed under E3.3 will be
used in analysis of information flows in trusted
communities planned in S2.2
E3.3 → S4 The community detection in composite networks
will influence the analysis of the community
disintegration and engineering planned in S4.
E3.3 → E3.1 The social network modeling suite will provide
basis for investigation of temporal and spatial
patterns studied in E3.1
E3.3 ↔ I2.3 Input from I2.3 on information networks constructed from different sources will be
incorporated into the modeling suite of E3.3, while output on social network models
will be provided from E3.3 to task I2.3.
E3.3 ↔ I3.1 E3.3 will guide the studies of I3.1 on clustering heterogeneous information
networks, and the techniques developed in I3.1 will be useful for studying the
clustering and evolution of general networks in E3.3.
E3.3 ↔ I3.3 E3.3 will guide the studies of I3.3 on text mining of information networks, and the
techniques developed in I3.3 will be useful for mining text and unstructured data in
social and information networks in E3.3.

5.7.10 Collaborations and Staff Rotations

Each task will have monthly teleconferences to coordinate the planned research work. Each task
itself will naturally decompose into subtasks and these will have closer collaborations in the form
of joint papers and joint advising on PhD theses. There will be organized visits by the faculty and
staff of the collaborating institutions. Wherever possible, meetings will be coordinated
opportunistically with travel to conferences and meetings. The UCSB research scientist located at
the NS CTA Facility will spend part of his/her time working on this project.

5.7.11 Relation to DoD and Industry Research


Understanding the fundamentals of appropriate models and representations for military
communication, information, and social networks, and the ways in which the individual networks are
linked together, is of primary importance to DoD missions. Being able to predict how each
individual network layer and the composite network evolve and how the causal paths in the composite
network form, and discovering the fundamental scientific underpinnings of this evolution, are also
important for both DoD-oriented and industrial research, as they will eventually lead to sound
design and analysis techniques that can be used to analyze existing military networks and to
develop new networks so that military operations can be carried out successfully.

5.7.12 Project Research Milestones

Research Milestones

Due   Task   Description
Q2    E3.1   Classification and characterization of the relationship of information flow with content and network structure (UCSB, Michigan)
Q2    E3.2   Initial analyses of network dynamics and multiple temporal scales (NEU, MIT, Northwestern, Michigan, IU)
Q2    E3.3   A report on the design of methods for clustering and evolution of heterogeneous information and social networks (UIUC, RPI)
Q3    E3.1   Construct and analyze a set of metrics suitable for time series on online chat network and blog citation network (Michigan)
Q3    E3.2   Preliminary results on social influence/diffusion processes and group formation processes (NU, IU, and NW)
Q3    E3.3   New algorithm development and testing for clustering and evolution analysis for heterogeneous networks (UIUC, RPI)
Q4    E3.1   Model topic bursts and correlate burst events with extrema in network variables; also quantify the total amount of unique information exchanged and correlate with network variable time series (Michigan, UCSB)
Q4    E3.2   Working paper on co-evolutionary processes (NU)
Q4    E3.3   A research paper prepared to be submitted for publication on clustering and evolution analysis in heterogeneous networks (UIUC, RPI)

5.7.13 Project Budget by Organization

Budget By Organization

Organization        Government Funding ($)    Cost Share ($)
BBN (IRC)           154,436
CMU (INARC)         20,891
CUNY (SCNARC)       26,046
IBM (SCNARC)        47,115
IU (SCNARC)         99,000
MIT (SCNARC)        28,017
ND (SCNARC)         49,500
NEU (SCNARC)        53,235
NWU (INARC)         69,601
PSU (CNARC)         30,000
RPI (SCNARC)        67,528                    12,327
UCD (CNARC)         100,000
UCSB (INARC)        48,814
UCSC (CNARC)        25,000
UIUC (INARC)        91,961
UMich (INARC)       76,253
TOTAL               987,397                   12,327

5.8 Project E4: Modelling Mobility and its Impact on Composite
Networks

Project Lead: T. La Porta, Penn State (CNARC)


Email: tlp@cse.psu.edu, Phone: 814-574-6295

Primary Research Staff:
A. Barabasi, NEU (SCNARC)
P. Basu, BBN (IRC)
T. Brown, CUNY (CNARC)
T. La Porta, PSU (CNARC)
D. Lazer, NEU (SCNARC)
H. Makse, CUNY (SCNARC)
P. Mohapatra, UC Davis (CNARC)
S. Pentland, MIT (SCNARC)
K. Psounis, USC (CNARC)
B. Szymanski, RPI (SCNARC)

Collaborators:
A. Swami (ARL)

5.8.1 Project Overview


The goal of this project is to develop models of mobility that may be used to drive the evolution
of social, information, and communication networks. These models will provide synthetic traces that
may be used to generate network dynamics at the appropriate level of granularity and with the
appropriate metrics.

5.8.2 Project Motivation


Mobility is a key characteristic of tactical networks. The impact of this mobility must be
understood to be able to accurately characterize, predict and control the behavior of composite
social, information and communication networks. Likewise, mobility is an important factor in
determining the formation of adversarial networks. Existing mobility models do not adequately
address the needs of military-type networks, and do not address the joint evolution of social,
information and communication networks.
A number of more realistic mobility models have been proposed that capture the mobility patterns in
specific scenarios [Royer99, Hui05, Henderson04]. Examples include mobility models for freeway
movement, human mobility in campus environments, the mobility of nodes in groups, etc. These
models, however, tend to be very complicated and are very hard to analyze theoretically. In
addition, they are often designed without regard to specific metrics that allow their impact on
properties of interest to be determined.
As an example, prior mobility models do not allow one to infer the impact of mobility, at the scale
that affects node communications, on the formation or maintenance of social networks. Likewise,
models that capture mobility at the scale of social networks do not allow one to determine the
impact on communications. Furthermore, models for these two types of network (social and
communication) have not been developed to be consistent with one another, albeit operating at
different spatial and time scales.
Challenges of Network-Centric Operations
Mobility is a key component of network-centric operations. Resources that move range from
individuals, to groups, to communications equipment. Mobility has a drastic effect on the
evolution of networks.
Example Military Scenarios
Relevant military scenarios include troop movements (individual or group), the movement of
data-gathering resources such as sensors, and the movement of communications equipment, for example
aircraft or mobile radios that assist in communications.
Impact on Network Science
Existing mobility models do not allow one to infer the impact of mobility, at the scale that
affects node communications, on the formation or maintenance of social networks. Likewise, models
that capture mobility at the scale of social networks do not allow one to determine the impact on
communications. Furthermore, models for these two types of network (social and communication) have
not been developed to be consistent with one another, albeit operating at different spatial and
time scales. This project will lead to models that are modular and useful across a range of network
types and performance metrics of interest.

5.8.3 Key Project Research Questions


This project aims to answer the following questions:
Can a suite of consistent mobility models scale from potentially minor movements (e.g.,
change of orientation) through longer-range mobility (e.g., international travel)?
How is mobility in one network correlated with mobility in other networks? How does
mobility affect shared network metrics (special case of E.2)?

5.8.4 Initial Hypotheses


Our expectation for the first question is that, yes, such a suite of models may be developed. In
the short term we expect to develop models that are accurate for the movement trends of large
groups of people and for movements that affect neighborhoods from a communications perspective.
These models will be able to produce traces within statistical error on metrics to be determined
during year 1. In the long term we expect to be able to define metrics of interest and to develop
models that can produce traces within statistical error for movements ranging in granularity from
those that affect signal quality from a communications perspective to those that influence the
formation of communities.

5.8.5 Technical Approach
Overview
In answer to this dichotomy between realism and analytical tractability, we propose to develop a
suite of mobility models that capture metrics of specific interest to the evolution of different
types of network, and ultimately to the evolution of composite networks. We also plan models that
make use of the motivation for movement. These models will allow synthetic traces to be generated
that are statistically close to actual traces but that may represent a large set of scenarios.
These models may then be used by the core programs of the CTA to determine the impact of mobility
on social, information, and communication networks. They will also be used by E2 to capture the
impact of mobility on the structure and co-evolution of networks. Moreover, they will be used in
the later years of this project to determine the specific impact of mobility at different scales on
the joint structure of composite social, information, and communication networks.
We define three tasks to meet our goals. In the first task, E4.1 Deriving data-driven models and
understanding human mobility, we use data traces from actual mobility and real-life scenarios to
generate models (a minimal synthetic-trace generator in this spirit is sketched below). In the
second, E4.2 Deriving metric-driven mobility models, we define metrics of interest and develop
models that capture the behavior of these metrics in a way that is useful for determining network
evolution and its impact on networks of different types. These two classes of models are important
for determining the impact of mobility on the structure of communication, information, and social
networks. Ultimately we will launch a third task, E4.3 Characterizing the impact of mobility on
properties of composite networks, but this task is deferred until sufficient progress is made on
E4.1 and E4.2.
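The fragment below is a minimal synthetic-trace generator in the spirit described above: a group
reference point performs waypoint-style movement while members jitter around it, and a simple
displacement statistic is computed as one candidate metric for comparison against real traces. All
parameters (area, speed, group size, jitter) are illustrative assumptions, not values drawn from
the program.

    # Sketch: group-reference-point mobility with member jitter, plus one summary
    # statistic. All parameters are illustrative.
    import numpy as np

    rng = np.random.default_rng(3)
    area, speed, n_members, steps = 1000.0, 5.0, 8, 500    # metres, m/step, nodes, steps

    ref = rng.uniform(0, area, 2)             # group reference point
    waypoint = rng.uniform(0, area, 2)
    trace = np.zeros((steps, n_members, 2))

    for t in range(steps):
        heading = waypoint - ref
        dist = np.linalg.norm(heading)
        if dist < speed:                      # waypoint reached: choose a new one
            waypoint = rng.uniform(0, area, 2)
        else:
            ref = ref + speed * heading / dist
        trace[t] = ref + rng.normal(0, 20.0, size=(n_members, 2))   # members near the group

    step_disp = np.linalg.norm(np.diff(trace, axis=0), axis=2)
    print("mean per-step displacement (m):", round(float(step_disp.mean()), 1))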

5.8.6 Task E4.1: Deriving Data-Driven Models and Understanding Human


Mobility (A.-L. Barabasi and D. Lazer, NEU (SCNARC); B. Szymanski, RPI
(SCNARC); S. Pentland, MIT (SCNARC); H. Makse, CUNY (SCNARC); T. La
Porta, PSU (CNARC))

Task Overview
The long-term goal of this task is to gain a basic quantitative understanding of the fundamental
forces that shape the topology as well as the spatial and geographical properties of social
networks. This requires extensive tool development and database preparation. During the first
year we plan to start this exploratory analysis and lay the groundwork for our and the other team
members' work during the coming years.
Task Motivation
The motivation for this task is to understand the interplay between the physical space (location)
and social network.
Key Research Questions
In this task we address the following research question: Can mobility models be generated from
large sets of trace data and military planning scenarios that can be used for understanding the
formation of communities of interest and communication networks? The answer to this question will
enable analysis of the impact of mobility on network structure.

Initial Hypotheses
Our initial hypothesis is that from studying the sources of data described above, we will be able
to identify dynamic sub-structures within social and communication networks to the level of
granularity that will allow us to generate analytical models from which we can understand the
impact of mobility on social and communication networks.
Technical Approach
The overall goal, understanding the interplay between the physical space (location) and social
network, is rather wide and far-reaching, so we plan to focus on three issues that need to be
addressed first for the later success of the program:
Objective 1: The bulk of research in the past ten years has focused on the structure of static
networks, describing topological aspects, such as the degree distribution and its origin, degree
correlations, network motifs, and network communities. Changes in network topology and
communities of interest as a function of time, however, have received much less attention. To
make progress on this important problem, we plan to apply longitudinal data, covering a period of
three months up to one year, from a network of millions of cell-phone users in an industrialized
European country, to analyze how a real social network changes over time.
We plan to develop tools to identify dynamical substructures inside dense regions (communities)
of the network, and identify the distinct archetypes that characterize communication patterns.
The results of this objective will allow us to understand how communication topologies among
communities of interest change over time with mobility. It will also allow us to characterize the
social and mobile behavior of adversarial social networks.
The methods developed here will facilitate the analysis of digital trace data, and especially
mobile phone data, in ways highly relevant to the military. Specifically, current military efforts
increasingly involve understanding the dynamics of local populations. In part this could be done
through examination of trace data from mobile phones. Can one detect anomalous patterns of
call behavior when the adversary moves into a territory, for example? Such patterns may be
recognizable from call patterns (the threat from such a move might result in increased call volume
to friends and family, for example), or from movement patterns (where an incipient attack might
be preceded by signature movements from informed local populations).
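A deliberately simple sketch of such anomaly detection is shown below: a rolling z-score flags
hours whose aggregate call volume departs sharply from recent history. The call-count series and
the injected burst are synthetic, and the threshold is an arbitrary illustrative choice.

    # Sketch: rolling z-score flags hours with anomalous aggregate call volume.
    # The series and the injected burst are synthetic; the threshold is arbitrary.
    import numpy as np

    rng = np.random.default_rng(4)
    hours = 24 * 30
    calls = rng.poisson(100, hours).astype(float)
    calls[500:503] += 250                     # injected disruption-like burst

    window, threshold = 48, 4.0
    for t in range(window, hours):
        hist = calls[t - window:t]
        z = (calls[t] - hist.mean()) / (hist.std() + 1e-9)
        if z > threshold:
            print(f"hour {t}: call volume {calls[t]:.0f}, z = {z:.1f}")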

The team currently has in its possession the best curated and best understood mobile phone
dataset in existence. It is therefore the ideal resource with which to develop methods to analyze
the network and mobility information flows that result from mobile phones. While we would
anticipate substantial adaptation would be required in moving from these data to data from, for
example, areas characterized by violent conflict, these are, at this time, the ideal data with which
to start developing models and tools to understand normal and atypical patterns in networks and
mobility. These data are particularly useful because we have identified events, such as
bombings, earthquakes, power outages, etc, that would serve as models for detecting anomalous
movement and communication events associated with societal disruptions. Over the longer term
we seek to cultivate a broader range of data sets, including, potentially, areas characterized by
strife, where we could link known conflict events to patterns observed in the data.

Objective 2: In aiming to understand the interplay between physical space and social networks,
we first need to understand the mobility patterns. We therefore plan to start a modeling effort to
understand the mechanistic models that capture the basic scaling laws of individual human
mobility. These models have the potential to expand our understanding of social processes as
well, and will serve as the starting point of the proposed applications. As a first step we plan to
study a number of key quantities that characterize user trajectories: the time dependence of a user's radius of gyration (capturing the range of distances the user covers on a regular basis); the number of towers visited by the user; the recurrence of the visitation pattern (the frequency with which a user visits the same location); the growth in the number of towers visited over time (a measure of the territory covered by the user); and the waiting time distribution (the time users spend at a specific location). Note that each of these measures needs to be properly corrected for the potential biases resulting from data incompleteness.
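The quantities listed above can be estimated directly from tower-level trace records. The sketch below is illustrative only: it assumes each user's records are available as (timestamp, tower id, x, y) tuples, uses helper names that are hypothetical, and ignores the bias corrections noted above.

```python
# Minimal sketch (illustrative only): per-user mobility statistics from a list of
# (timestamp_hours, tower_id, x_km, y_km) records, already grouped by user.
import math
from collections import Counter

def radius_of_gyration(points):
    """points: list of (x, y) visit locations; returns the root mean squared
    distance from the center of mass of the trajectory."""
    cx = sum(p[0] for p in points) / len(points)
    cy = sum(p[1] for p in points) / len(points)
    return math.sqrt(sum((p[0] - cx) ** 2 + (p[1] - cy) ** 2 for p in points) / len(points))

def visitation_frequencies(towers):
    """Recurrence of the visitation pattern: how often each tower is revisited."""
    counts = Counter(towers)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()}

def territory_growth(towers):
    """Number of distinct towers seen after each record (territory covered over time)."""
    seen, growth = set(), []
    for t in towers:
        seen.add(t)
        growth.append(len(seen))
    return growth

def waiting_times(records):
    """Time spent at a location before moving: gaps between consecutive records
    at different towers (a crude proxy; real data needs the bias corrections)."""
    waits = []
    for (t0, tw0, _, _), (t1, tw1, _, _) in zip(records, records[1:]):
        if tw1 != tw0:
            waits.append(t1 - t0)
    return waits

if __name__ == "__main__":
    recs = [(0, 'A', 0.0, 0.0), (2, 'A', 0.0, 0.0), (5, 'B', 3.0, 4.0), (9, 'A', 0.0, 0.0)]
    pts = [(x, y) for _, _, x, y in recs]
    print(radius_of_gyration(pts), visitation_frequencies([r[1] for r in recs]))
    print(territory_growth([r[1] for r in recs]), waiting_times(recs))
```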
Objective 3: To capture the mobility in tactical networks, we will work with the Army Research
Lab to obtain mobility traces and develop realistic mobility profiles. We have similar ongoing
work as part of the International Technology Alliance (ITA) project in which we work with the
Army to develop mission scripts and ontologies of personnel roles in missions with respect to
information requirements. We will collaborate with the ITA program and ARL to extend these
two models specifically to include mobility. We will also work with ARL and other DoD
agencies that gather data and publish reports with respect to mobility models and
characterizations. For example, we will reference the NATO Reference Mobility Model and its
updated version [Vong99]. In [Birkel03] an attempt is made to develop mobility models
consistent with the NATO Reference Mobility Model II. This document provides models for
ground vehicles and groups, such as the speed they may attain, deceleration and acceleration
characteristic and turning ability on different surfaces. The Deep Green effort
(http://www.darpa.mil/ipto/Programs/dg/dg.asp) from DARPA tries to develop mission
alternatives using simulations to assist in on-going missions. As part of this work mobility
models are used. We will determine if they are accessible and relevant to our efforts.
The work in the present task contributes to and benefits from continuous interaction with Tasks S1.2 and S3.1. Objective 1 will provide the empirical and mathematical groundwork for understanding the temporal and spatial aspects of multilayer networks (Task S1.2), and it will be of crucial importance for Task S3.1, which focuses on community formation. Furthermore, Objective 2 will offer the tools needed to begin incorporating spatial aspects into these tasks in the future. The models generated here will impact C1.3 and C2.2 specifically in CNARC, and the overall CNARC program in general, which relies heavily on the dynamics of networks.
Validation Approach
The validation approach of Objective 3 is to compare the models to realistic missions that may
be carried out in tactical networks. To do this we will work with ARL experts to evaluate the
output from the models. We will also determine the scalability of the models in terms of
different environments (dense vs. sparse, high vs. low mobility) by evaluating their realism in a
military context.
Summary of Military Relevance
The formation of CoIs is important to understand for two reasons: first, it sheds light on how information is shared by humans, which will help identify the best methods of sharing information in a military setting; second, it sheds light on how information is shared by civilians, allowing the models to be applied to potential adversarial communities operating outside a military setting. The impact of communication networks is directly related to the mobility of armed forces.
Research Products
At the end of year one we expect the following deliverables: (1) a paper, in preparation or submitted for publication, on Objective 1, the temporal aspects of communities in social networks; (2) a report, which will serve as the basis for potential papers in year 2, on the basic statistics of mobility patterns in civilian settings (Objective 2); and (3) a similar report on mobility in tactical networks.

5.8.7 Task E4.2 - Deriving Metric-Driven Mobility Models (K. Psounis, USC
(CNARC); P. Mohapatra, UC Davis (CNARC); T. La Porta, PSU (CNARC); P.
Basu, BBN (IRC); T. Brown, CUNY (CNARC); A. Swami, ARL)

Task Overview
In this task we consider the phenomena that are important to the networks being impacted by mobility, together with the performance metrics of interest. We generate models that produce traces conforming to the statistics of these metrics. The type of mobility being modeled will also impact the communities of interest that form based on locality.
Task Motivation
The results of this task will be used by E2 to determine network evolution and by several tasks in CNARC that deal with the dynamics of communications networks. They will also be used by Task S1.2.
Key Research Questions
In this task we seek to answer the question: Can models be developed that generate traces that are statistically close to real mobility and that scale across vastly different scenarios?
Initial Hypothesis
We expect that we can generate statistics-based models that scale from mobility that impacts connectivity to mobility that impacts proximity within hundreds of yards. As a longer-term goal, we expect to generate models that capture impacts on channel conditions at a time scale of a few seconds.
Technical Approach
When considering mobility models for determining the impact on communication networks, it is
crucial to understand the type of phenomena that must be captured to keep models tractable.
This in turn requires an understanding of the type of network being modeled. When modeling
mobility for cellular networks, for example, the metrics of interest are typically how often
handoffs occur and how many users may be in the coverage area of a single radio. The first
metric is used to help engineer the capacity of signaling links and network processors so that
they can handle the control load generated by mobility. Simple fluid flow models that
approximate the amount of movement during congested time periods often suffice for this
purpose. The second metric is used to determine how to allocate resources, for example
frequencies to cells. Models that incorporate expected human movements given the layout of a
network are often used for this purpose (e.g., movement on roadways, movement between cities
and suburbs). This second type of model is addressed in E4.1.
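For concreteness, a commonly used fluid-flow approximation estimates the rate of cell-boundary crossings (handoffs) as R = ρvL/π, where ρ is the user density, v the mean speed, and L the cell perimeter, under the assumptions of uniformly distributed users and uniformly distributed movement directions. A minimal sketch of this calculation, with hypothetical numbers:

```python
# Minimal sketch (illustrative only): fluid-flow approximation of the handoff rate
# out of a cell, R = rho * v * L / pi, assuming uniform user density rho (users/km^2),
# mean speed v (km/h), uniformly distributed directions, and cell perimeter L (km).
import math

def handoff_rate(user_density, mean_speed, perimeter):
    """Expected boundary crossings (handoffs) per hour out of the cell."""
    return user_density * mean_speed * perimeter / math.pi

if __name__ == "__main__":
    # Hypothetical numbers: 500 users/km^2, 5 km/h average speed,
    # a roughly circular cell of radius 1 km (perimeter ~ 6.28 km).
    print(round(handoff_rate(500, 5.0, 2 * math.pi * 1.0), 1), "handoffs/hour")
```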
When modeling mobility in tactical networks, a different set of metrics is important. For
communication networks, slight movements, such as a change in the angle of a node, may impact its ability to maintain connectivity with neighboring nodes. From a networking perspective, mobility in
MANETs typically impacts routing protocol overhead and the latency with which data is
delivered across a network. Thus metrics of stability and cardinality are often important.
Finally, the characteristics of mobility in tactical MANETs are different from those in civilian
networks. Mobility in tactical networks may be based on planned missions or a group reaction to
an event, thus movement in groups is often present [Venkateswaran09, Traynor06].
To model mobility in tactical MANETs we will define metrics to quantify or describe the node
mobility, link stability, and network connectivity. Then, we plan to study the evolution of network
connectivity and link stability over time due to different mobility models based on temporal
random graphs. Second, we will look into the correlation between consecutive random graphs,
and quantify the relationship between this correlation, the nodes' mobility patterns/properties, and
network parameters (such as node number, area, transmission power, etc.). Third, based on the
analysis, we will develop tractable mathematical models to analyze instant and asymptotic
performance of network connectivity and link stability under each mobility model. We will also
validate our model through extensive experimental work. We have performed a pilot study at
USC where we instrumented 25 individuals with embedded radios for a week to log who they
met, when, and for how long. We are able to extract many useful statistics from such
experiments including determination of groups of users that spend significant time with each
other, distributions of the sizes of these groups and the dynamics of their interaction. Fitting an
appropriate parameterized distribution will help greatly in the design of performance prediction
tools and protocol designs.
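The following is a minimal sketch of the style of analysis described above, under simplifying assumptions (a bounded random walk in place of a realistic mobility model, and a unit-disk connectivity model): it evolves node positions, builds the geometric graph at each step, and reports link stability between consecutive snapshots and the relative size of the largest connected component. It is illustrative only; the actual analysis will use the temporal-random-graph machinery and the traces discussed above.

```python
# Minimal sketch (illustrative only): evolve node positions with a simple random walk,
# build the disk graph at each step, and measure two of the metrics discussed above:
# link stability (fraction of links persisting between consecutive snapshots) and
# connectivity (relative size of the largest connected component).
import random
import networkx as nx

def snapshot(positions, radius):
    g = nx.Graph()
    g.add_nodes_from(positions)
    pts = list(positions.items())
    for (u, (xu, yu)) in pts:
        for (v, (xv, yv)) in pts:
            if u < v and (xu - xv) ** 2 + (yu - yv) ** 2 <= radius ** 2:
                g.add_edge(u, v)
    return g

def simulate(n=50, steps=30, area=10.0, radius=2.0, step_size=0.5, seed=1):
    rng = random.Random(seed)
    pos = {i: (rng.uniform(0, area), rng.uniform(0, area)) for i in range(n)}
    prev, stability, giant = None, [], []
    for _ in range(steps):
        g = snapshot(pos, radius)
        giant.append(max(len(c) for c in nx.connected_components(g)) / n)
        if prev is not None and prev.number_of_edges() > 0:
            kept = sum(1 for e in prev.edges() if g.has_edge(*e))
            stability.append(kept / prev.number_of_edges())
        prev = g
        # Bounded random-walk mobility; a community-based model would go here instead.
        pos = {i: (min(area, max(0.0, x + rng.uniform(-step_size, step_size))),
                   min(area, max(0.0, y + rng.uniform(-step_size, step_size))))
               for i, (x, y) in pos.items()}
    return stability, giant

if __name__ == "__main__":
    stab, giant = simulate()
    print("mean link stability:", sum(stab) / len(stab))
    print("mean largest-component fraction:", sum(giant) / len(giant))
```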
An important family of models we consider is that of community-based mobility models. Each node
preferentially moves within its own community (e.g. the department building on a campus), and
occasionally roams through the network (e.g. the whole campus). In our preliminary work we
have analyzed the basic properties of this mobility model and extended it to be time-variant by
dividing the time into multiple time-periods. This allows us to model periodicities, e.g. daytime
versus nighttime mobility patterns [Jindal09, Hsu09]. The preliminary model also supports
multiple-tier communities, e.g. common attraction points.
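To illustrate the flavor of a time-variant community-based model (this is not the model of [Hsu09], only a simplified stand-in), the sketch below assigns each node to a community anchor, lets it roam the whole area with a probability that differs between daytime and nighttime periods, and emits (time, node, x, y) records. All parameter values are hypothetical.

```python
# Minimal sketch (illustrative only, not the model of [Hsu09]): a time-variant
# community-based trace generator. Each node mostly moves near its community's
# anchor point and occasionally roams the whole area; daytime and nighttime
# periods use different roaming probabilities.
import random

def generate_trace(n_nodes=20, n_communities=4, steps=48, area=10.0,
                   local_range=1.0, p_roam_day=0.2, p_roam_night=0.05, seed=7):
    rng = random.Random(seed)
    anchors = [(rng.uniform(0, area), rng.uniform(0, area)) for _ in range(n_communities)]
    home = {i: i % n_communities for i in range(n_nodes)}   # community assignment
    trace = []                                              # (t, node, x, y) records
    for t in range(steps):
        p_roam = p_roam_day if (t % 24) < 12 else p_roam_night
        for i in range(n_nodes):
            if rng.random() < p_roam:                        # epoch spent roaming
                x, y = rng.uniform(0, area), rng.uniform(0, area)
            else:                                            # epoch spent near "home"
                ax, ay = anchors[home[i]]
                x = ax + rng.uniform(-local_range, local_range)
                y = ay + rng.uniform(-local_range, local_range)
            trace.append((t, i, x, y))
    return trace

if __name__ == "__main__":
    trace = generate_trace()
    print(len(trace), "position records;", trace[:3])
```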
This task will draw upon the research on network mobility modeling in the US/UK ITA program, while addressing research issues not covered by the corresponding ITA project. The latter focuses on building universal mobility modeling frameworks that characterize tactical mobility patterns through the composition of basic building blocks, namely target selection; steering behaviors for reaching the target, collision avoidance, etc.; and locomotion through a physical navigation graph, and on studying the connectivity properties of the resultant communication network structures as a function of universal model parameters. In contrast, research in the
current task will focus on the analysis of the time-evolution of key network properties given the
time-evolution of mobility model parameters. We will go beyond studying connectivity metrics
and will investigate ones that pertain to information flow in the network, e.g., max-flow between
instantiations of two IN nodes on the physical network as a function of mobility of intermediate
CN nodes.
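As a simple illustration of the information-flow metrics mentioned above, the sketch below computes the max-flow between two designated endpoints (standing in for IN nodes) across a sequence of communication-network snapshots, using unit link capacities; the endpoints, capacities, and toy snapshots are all hypothetical.

```python
# Minimal sketch (illustrative only): track the max-flow between two designated
# endpoints (standing in for IN nodes) across a sequence of communication-network
# snapshots produced by a mobility model, using unit link capacities.
import networkx as nx

def maxflow_over_time(snapshots, src, dst, capacity=1.0):
    """snapshots: list of nx.Graph; returns the max-flow value at each time step
    (0.0 when src or dst is absent or disconnected)."""
    values = []
    for g in snapshots:
        if src not in g or dst not in g:
            values.append(0.0)
            continue
        h = nx.DiGraph()                     # model each link as two directed arcs
        h.add_nodes_from(g.nodes())
        for u, v in g.edges():
            h.add_edge(u, v, capacity=capacity)
            h.add_edge(v, u, capacity=capacity)
        flow, _ = nx.maximum_flow(h, src, dst)
        values.append(flow)
    return values

if __name__ == "__main__":
    # Toy sequence: a 5-node path that loses an intermediate relay at t = 1.
    g0 = nx.path_graph(5)                    # 0-1-2-3-4
    g1 = nx.path_graph(5)
    g1.remove_node(2)
    print(maxflow_over_time([g0, g1], src=0, dst=4))
```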
During the course of this proposal, we will carefully select the right mobility model for each
situation. In year one, we plan to use our time-variant community-based model, since it is the
most realistic among analytically amenable models. To further improve its relevance, we will
carefully compute the various parameters of the model, e.g. the number of communities per
node, the size of communities, the probability of moving outside one's community, the periodicity
parameters, etc., such that synthetic traces generated by the model match important properties of
available application-specific real traces.
As in E4.1, to augment models that depend solely on statistics, we will work with the Army Research Lab to obtain mobility traces and develop realistic mobility profiles.
Validation Approach
To validate this work we will generate traces from our models and check the statistics of key
metrics defined during the first year. We will compare the statistics of the traces to our targets to
see if there is any significant difference. Longer term, we will evaluate the metrics by
determining their importance on the evolution of networks of all types.
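One simple way to perform such a comparison, assuming SciPy is available, is a two-sample Kolmogorov-Smirnov test on a chosen statistic (e.g., link or contact durations) from the synthetic and target traces; the sketch below is illustrative only, and the acceptance threshold is a placeholder.

```python
# Minimal sketch (illustrative only): compare a statistic of synthetic traces
# (e.g., contact or link durations) against the same statistic from a target
# trace using a two-sample Kolmogorov-Smirnov test. Assumes SciPy is available.
from scipy.stats import ks_2samp

def traces_match(synthetic_durations, target_durations, alpha=0.05):
    """Return (match, p_value): match is True when the KS test does NOT reject
    the hypothesis that both samples come from the same distribution."""
    stat, p_value = ks_2samp(synthetic_durations, target_durations)
    return p_value >= alpha, p_value

if __name__ == "__main__":
    synthetic = [1.2, 0.8, 2.5, 1.1, 0.9, 3.0, 1.7]
    target = [1.0, 1.3, 2.2, 0.7, 1.5, 2.8, 1.9]
    print(traces_match(synthetic, target))
```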
Summary of Military Relevance
Relevant military scenarios include troop movements (individual or group), the movement of data-gathering resources such as sensors, and the movement of communications equipment such as aircraft or mobile radios that assist in communications.
Research Products
At the end of year one we expect to have statistical models on connectivity and link stability
derived from trace data.

5.8.8 Task E4.3: Modeling the Impact of Mobility on Composite Network Properties (deferred task)
A special case of mobility is controlled mobility, i.e., mobility guided by a benefit to some node or person other than the one moving, perhaps to establish a communication link or to position a sensor to monitor a certain location. We term one class of this type of mobility influenced mobility. For example, if an information item of interest is posted on a
bulletin board, a certain community of people may move toward the location of the information
item. Likewise, if an Internet access point is available in one part of a facility, people may move
from a part of the facility where there is no Internet access into the area where access is
provided. To understand these types of mobility one must understand the motivation for
movement and the interaction between social, information and communication networks in terms
of mobility.
This is a "deferred task" that will be started in year 2 of the NS CTA program, after the initial research results from E1.1, E2, E3, E4.1, and E4.2 are available. As our understanding of mobility increases with the execution of E4.1 and E4.2, we will develop controlled and influenced mobility models. This will allow us to directly model the mutual influence between mobility and the structure of social, information, and communication networks.

5.8.9 Linkages with Other Projects


This project is closely related to E2 and E3. Mobility is a primary factor in the evolution of the
structure of networks. This project will both use the results of and feed into E2 and E3. The metrics
defined in this project will be harmonized with those in E1.1.
Mobility is an important factor in all three types of networks. This task will have a major impact
on CNARC tasks C1.1, C1.3, C2.1 and C2.2. All of these tasks use mobility as a driver for
dynamics (C1.1, C1.3 and C2.1) or as something that may be leveraged to help deliver data
(C2.2). In SCNARC, mobility is a prime factor in the formation of communities (see S1.2 and
S2.1).
This project also feeds the Trust CCRI.
5.8.10 Collaborations and Staff Rotations
Each task will have monthly teleconferences to coordinate the planned research work. Each task
itself will naturally decompose into subtasks and these will have closer collaborations in the form
of joint papers and joint advising on PhD theses. There will be organized visits by the faculty and
staff of the collaborating institutions. Wherever possible, meetings will be coordinated
opportunistically with travel to conferences and meetings.

5.8.11 Relation to DoD and Industry Research


The US/UK ITA program addresses network mobility modeling. Because one of the PIs (Basu) is also the Project Lead for the corresponding ITA project, we expect this to foster synergies between the EDIN and ITA mobility modeling projects while minimizing overlap in scope.

5.8.12 Project Research Milestones


Research Milestones

Due   Task   Description
Q2    E4.1   Characterize mobility traces obtained from ARL and other DoD documents
Q3    E4.1   Analysis of how a real social network changes in time by applying longitudinal data, covering a period of one year, from a network of millions of cell-phone users in an industrialized European country
Q3    E4.2   Extension of existing mobility models to capture various military-relevant group mobility constraints
Q4    E4.2   Statistical models on connectivity and link stability derived from trace data

5.8.13 Project Budget by Organization

Budget By Organization

Organization       Government Funding ($)    Cost Share ($)
BBN (IRC)          47,313
CUNY (CNARC)       15,000
CUNY (SCNARC)      45,580
MIT (SCNARC)       28,017
NEU (SCNARC)       114,714
PSU (CNARC)        55,589                    20,000
RPI (SCNARC)       18,385                    3,356
UCD (CNARC)        30,000
USC (CNARC)        73,000
TOTAL              427,598                   23,356
References

[Adar05] Adar, E. and Adamic, L.A., ―Tracking information epidemics in blogspace,‖


Proceedings of the 2005 IEEE/WIC/ACM International Conference on Web Intelligence, 207--
214, 2005.
[Albert02] Albert R. and Barabasi A.-L. (2002) ―Statistical mechanics of complex networks,‖
Rev. Mod. Phys. 74, 47-97.
[Allen83] J. F. Allen, "Maintaining knowledge about temporal intervals," Communications of the ACM, 26(11):832-843, 1983.
[Backstrom06] L. Backstrom, D. Huttenlocher, J. Kleinberg, X. Lan, ―Group formation in large
social networks: membership, growth, and evolution,‖ in KDD‘06, 2006.
[Barabasi99] A. L. Barabási and R. Albert, ―Emergence of Scaling in Random Networks,‖
Science, 1999.
[Barrat04] Barrat, A., Barthelemy, M., Pastor-Satorras, R., & Vespignani, A. (2004). ―The
architecture of complex weighted networks,‖ Proc. National Academy of Science, 101 (10),
3747-3752.
[Barrat09] Barrat, A, Barthelemy , M, Vespignani, A., ―Dynamical Processes on complex
Networks,‖ Cambridge University Press (2009).
[Bettstetter03] C. Bettstetter, ―Topology Properties of Ad Hoc Networks with Random
Waypoint Mobility,‖ Proc. ACM SIGMOBILE Mobile Computing and Communications
Review, vol. 7, July 2003, pp. 50–52.
[Birkel03] P. A. Birkel, ―Terrain trafficability in modeling and simulation,‖ in TECHNICAL
PAPER SEDRIS 2003 – 1, 2003.
[Blatt96] Blatt, M., Wiseman, S., & Domany, E. (1996). ―Superparamagnetic clustering of data,‖
Phys. Rev. Lett 76, 3251--3254.
[Cai05] D. Cai, Z. Shao, X. He, X. Yan, and J. Han (2005) ―Community mining from multi-
relational networks,‖ Proc. 2005 European Conf. Principles and Practice of Knowledge
Discovery in Databases (PKDD'05), pp. 445-452.
[Carvalho08] Vitor Carvalho and William Cohen, ―Ranking users for intelligent message
addressing,‖ Advances in Information Retrieval, pages 321—333, 2008.
[Chawla05] N. V. Chawla and G. Karakoulas (2005) ―Learning From Labeled And Unlabeled
Data: An Empirical Study Across Techniques And Domains,‖ Journal of Artificial Intelligence
Research (JAIR), Volume 23, pages 331-366.
[Chen76] P. P. Chen, ―The Entity-Relationship Model: Toward a Unified View of Data,‖ ACM
Trans. Database Systems, 1976.
[Colizza06] Colizza V, Barrat A, Barthelemy M., Vespignani A., (2006) ―The role of the airline
transportation networks in the prediction and predictability of global epidemics,‖ Proc. Natl.
Acad. Sci. USA, 103, 2015-2020.
[Colizza07] Colizza V. and Vespignani A. (2007) ―Invasion threshold in heterogeneous
metapopulation networks,‖ Phys. Rev. Lett. 99, 14870.
[Colizza08] Colizza V. and Vespignani A. (2008) ―Epidemic modeling in metapopulation
systems with heterogeneous coupling patterns: Theory and simulations,‖ Journal of Theoretical
Biology 251, 450-467 (2008)
[Contractor06] Contractor, N., Wasserman, S., & Faust, K. (2006). ―Testing multi-theoretical
multilevel hypotheses about organizational networks: An analytic framework and empirical
example,‖ Academy of Management Review, 31, 681-703.
[Chakrabarti06] Deepayan Chakrabarti and Christos Faloutsos, ―Graph mining: Laws,
generators, and algorithms,‖ ACM Computing Surveys, 38(1), 2006.
[Chung06] W. Chung, R. Savell, J.-P. Schutt, and G. Cybenko, ―Identifying and tracking
dynamic processes in social networks,‖ Society of Photo-Optical Instrumentation Engineers
(SPIE) Conference Series, volume 6201, June 2006.
[Diaz08] J. Diaz, D. Mitsche, X. Perez-Gimenez, ―On the Connectivity of Dynamic Random
Geometric Graphs,‖ Proc. of the Nineteenth Annual ACM-SIAM symposium on Discrete
Algorithms (SODA‘08), San Francisco, California, 2008.
[Doyle05] J. Doyle, D. L. Alderson, L. Li, S. Low, M. Roughan, S. Shalunov, R. Tanaka, and W.
Willinger, ―The ``robust yet fragile`` nature of the Internet,‖ Proceedings of the National
Academy of Sciences, vol. 102, no. 41, 14497-14502, October 2005.
[Du09] Nan Du, Christos Faloutsos, Bai Wang, and Leman Akoglu, ―Large human
communication networks: Patterns and a utility-driven generator,‖ KDD, 2009.
[Faloutsos99] M. Faloutsos, P. Faloutsos, and C. Faloutsos, ―On Power-Law Relationships of
The Internet Topology,‖ ACM Comp. Commun. Review, Vol. 29, No. 4, 1999.
[Friedgut96] E. Friedgut and G. Kalai, ―Every monotone graph property has a sharp threshold,‖
Proc. Amer. Math. Soc. 124 2993–3002. 1996.
[Goel05] A. Goel, S. Rai, and B. Krishnamachari, ―Monotone properties of random geometric
graphs have sharp thresholds,‖ The Annals of Applied Probability, 2005, Vol. 15, No. 4, 2535–
2552.
[Harary69] F. Harary, Graph Theory (1969), Addison–Wesley, Reading, MA.
[Henderson04] T. Henderson, D. Kotz and I. Abyzov, ―The changing usage of a mature campus-
wide wireless network,‖ Proceedings of ACM MOBICOM, 2004.
[Howard84] R. A. Howard and J. E. Matheson. ―Influence diagrams,‖ In Readings on the
principles and applications of decision analysis, Vol. I (1984), pp. 721-762, R. A. Howard and J.
E. Matheson, eds. Menlo Park, CA: Strategic Decisions Group.
[Huang06] Zan Huang and D.D. Zeng, ―A link prediction approach to anomalous email
detection,‖ Systems, Man and Cybernetics, 2006, SMC '06, volume 2, pages 1131--1136, Oct.
2006.
[Huang09] Zan Huang and Dennis K. J. Lin, ―The time-series link prediction problem with
applications in communication surveillance,‖ INFORMS J. on Computing, 21(2):286--303,
2009.
[Hui05] P. Hui, A. Chaintreau, J. Scott, R. Gass, J. Crowcroft and C. Diot, ―Pocket Switched
Networks and Human Mobility in Conference Environments,‖ Proceedings of ACM SIGCOMM
workshop on Delay Tolerant Networking (WDTN), 2005.
[Hsu09] W.-J. Hsu, T. Spyropoulos, K. Psounis and A. Helmy, ―Modeling Spatial and Temporal
Dependencies of User Mobility in Wireless Mobile Networks,‖ IEEE/ACM Transactions on
Networking, 17(5), Oct. 2009.
[IbctURL] Infantry Brigade Combat Team description.
http://www.globalsecurity.org/military/agency/army/bct-infantry.htm
[Jindal09] A. Jindal and K. Psounis, ―Contention-Aware Performance Analysis of Mobility-
Assisted Routing,‖ IEEE Transactions on Mobile Computing, 8(2), Feb 2009.
[Kim09] M.-S. Kim and J. Han (2009). ―A particle-and-density based evolutionary clustering
method for dynamic networks,‖ Proc. 2009 Int. Conf. on Very Large Data Bases (VLDB'09), pp.
622-633.
[Lesk07] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, M. Hurst, ―Cascading Behavior
in Large Blog Graphs,‖ in SIAM International Conference on Data Mining (SDM) 2007.
[Li08] Yan Li, Jun-Hong Cui, Dario Maggiorini, and Michalis Faloutsos, "Characterizing and Modeling Clustering Features in AS-Level Internet Topology," IEEE INFOCOM'08, 2008.
[Li09] Lei Li, James McCann, Nancy~S. Pollard, and Christos Faloutsos, ―Dynammo: mining
and summarization of coevolving sequences with missing values,‖ KDD, pages 507--516, 2009.
[Lu09] Q. Lu, G. Korniss, and B.K. Szymanski (2009). ―The Naming Game in Social Networks:
Community Formation and Consensus Engineering,‖ J. Econ. Interact. Coord. 4, 221—235
(2009)
[McPherson01] McPherson, M., L. Smith-Lovin, and J. Cook. (2001). ―Birds of a Feather:
Homophily in Social Networks,‖ Annual Review of Sociology. 27:415-44.
[McGlohon08] Mary McGlohon, Leman Akoglu, and Christos Faloutsos, ―Weighted graphs and
disconnected components: patterns and a generator,‖ Proceeding of the 14th ACM SIGKDD
international conference on Knowledge discovery and data mining, pages 524--532, Las Vegas,
Nevada, USA, 2008.
[Monge03] Monge, P. & Contractor, N. (2003) Theories of communication networks. New
York: Oxford University Press.
[Newman03] M. E. J. Newman, ―The Structure and Function of Complex Networks,‖ SIAM
Review, 45(2), pp. 167-256. 2003.
[Newman04] Newman, M. E. J. and Girvan, M. (2004). ―Finding and evaluating community
structure in networks,‖ Phys. Rev. E 69, 026113.
[Newman05] Newman, M. E. J. (2005). ―Detecting Community Structure in Networks,‖ Eur.
Phys. J. B. 38, 321–330.
[Newman06] Newman, M. E. J. (2006). ―Finding community structure in networks using the
eigenvectors of matrices,‖ Phys Rev E 74:036,104.
[O‘Madadhain05] Joshua O'Madadhain, Jon Hutchins, and Padhraic Smyth, ―Prediction and
ranking algorithms for event-based network data,‖ SIGKDD Explor. Newsl., 7(2):23--30,
December 2005.
[Owl09] OWL 2 Web Ontology Language: New Features and Rationale. W3C Recommendation
27 October 2009 URL: http://www.w3.org/TR/owl2-new-features/
[Palla05] Palla G, Derenyi I, Farkas I, and Vicsek T (2005). ―Uncovering the overlapping
community structure of complex networks in nature and society,‖ Nature 435:814–818.
[Papadimitriou03] Spiros Papadimitriou, Anthony Brockwell, and Christos Faloutsos, ―Adaptive
hands-off stream mining,‖ VLDB, September 2003.
[Raeder09] T. Raeder and N. V. Chawla (2009), ―Modeling the Product Space as a Social
Network,‖ ACM/IEEE International Conference on Advances in Social Network Analysis and
Modeling.
[Raeder09] T. Raeder, O. Lizardo, N. V. Chawla, and D. Hachen, ―Precursors of Tie Decay in a
Large-Scale Communication Network,‖ Social Networks, UNDER REVIEW.
[Ramanathan07] R. Ramanathan, P. Basu, and R. Krishnan, ―Towards a formalism for routing in
challenged networks,‖ in Proceedings of ACM Mobicom workshop on Challenged Networks
(CHANTS 2007), Montreal, Canada, September 2007.
[Rdfs04] RDF Vocabulary Description Language 1.0: RDF Schema. W3C Recommendation 10
February 2004. URL: http://www.w3.org/TR/rdf-schema/
[Reichardt04] Reichardt, J., & Bornholdt, S. (2004). ―Detecting fuzzy community structures in
complex networks with a potts model,‖ Phys. Rev. Lett. 93, 218701.
[Royer99] E. Royer and C.-K. Toh, ―A Review of Current Routing Protocols for Ad-Hoc Mobile
Wireless Networks,‖ IEEE Communications Magazine, April, 1999, pp 46-55.
[Scott00] Scott, J (2000). Social Network Analysis: A Handbook. Sage, London, 2nd ed.
[Shah03] Shah, R.C.; Roy, S.; Jain, S.; Brunette, W., ―Data MULEs: modeling a three-tier
architecture for sparse sensor networks,‖ IEEE Workshop on Sensor Network Protocols and
Applications, May 2003.
[Steinhaeuser08] K. Steinhaeuser and N. V. Chawla (2008), ―Community Detection in a Large
Real-World Social Network,‖ Social Computing, Behavioral Modeling, and Prediction, pages
168—175.
[Steinhaeuser09] K. Steinhaeuser and N. V. Chawla (2009), ―Identifying and Evaluating
Communities in Networks,‖ Pattern Recognition Letters, TO APPEAR.
[Stewart07] Avare Stewart, Ling Chen, Raluca Paiu, and Wolfgang Nejdl, ―Discovering
information diffusion paths from blogosphere for online advertising,‖ ADKDD '07: Proceedings
of the 1st international workshop on Data mining and audience intelligence for advertising, pages
46--54, New York, NY, USA, 2007.
[Sun06] Jimeng Sun, Dacheng Tao, and Christos Faloutsos, ―Beyond streams and graphs:
dynamic tensor analysis,‖ KDD, pages 374--383, 2006.
[Tao04] Yufei Tao, Christos Faloutsos, Dimitris Papadias, and Bin Liu, ―Prediction and indexing
of moving objects with unknown motion patterns,‖ SIGMOD Conference, pages 611--622, Paris,
2004.
[Tong08] Hanghang Tong, Spiros Papadimitriou, Jimeng Sun, Philip~S. Yu, and Christos
Faloutsos, ―Colibri: fast mining of large static and dynamic graphs,‖ Proceeding of the 14th
ACM SIGKDD international conference on Knowledge discovery and data mining (KDD),
pages 686--694, Las Vegas, Nevada, USA, 2008.
[Toroczkai04] Toroczkai Z., K.E. Bassler. (2004) ―Network dynamics: Jamming is limited in
scale-free systems,‖ Nature 428, 716.
[Toroczkai08] Toroczkai Z, B. Kozma, K.E. Bassler, N.W. Hengartner, G. Korniss (2008)
―Gradient Networks,‖ J. Phys. A: Math. Theor. 41 155103.
[Traynor06] P. Traynor, J. Shin, B. Madan, S. Phoha, and T. La Porta, ―Efficient Group Mobility
for Heterogeneous Sensor Networks,‖ in Proceedings of IEEE VTC-Fall, September 2006.
[Tsourakakis08] Charalampos Tsourakakis, ―Counting of triangles in large real networks,
without counting: Algorithms and laws,‖ International Conference on Data Mining (ICDM),
2008.
[Tylenda09] Tomasz Tylenda, Ralitsa Angelova, and Srikanta Bedathur, ―Towards time-aware
link prediction in evolving social networks,‖ In Lee Giles, Prasenjit Mitra, Igor Perisic, John
Yen, and Haizheng Zhang, editors, Proceedings of the 3rd ACM Workshop on Social Network
Mining and Analysis (SNA-KDD), Paris, France, 2009.
[Venkateswaran09] Venkateswaran, T. La Porta, R. Acharya, and V. Sarangan, ―A mobility
prediction based relay deployment framework for conserving power in MANETs,‖ in IEEE
Transactions on Mobile Computing, vol. 8, pp. 750–765, 2009.
[Vong99] T. T. Vong, G. A. Haas, and C. L. Henry, "NATO Reference Mobility Model (NRMM) Modeling of the Demo III Experimental Unmanned Ground Vehicle (XUV)," ARL-MR-435, Army Research Laboratory, 1999.
[Willinger09] W. Willinger, D. Alderson, and J. C. Doyle, ―Mathematics and The Internet: A
source of Enormous Confusion and Great Potential,‖ Notices of the AMS, Vol. 56, No. 5, pp.
586-599, May 2009.
[WuHuberman04] Wu, F., and Huberman, B. A. (2004). ―Finding communities in linear time: a
physics approach,‖ Eur Phys J B 38, 331–338.
[Wu04] Wu, F., Huberman, B. A., Adamic, L. A., and Tyler, J. R., "Information flow in social groups," Physica A: Statistical Mechanics and its Applications, 337(1-2), pages 327--335, 2004.
[Yi08] Y. Yi and M. Chiang. ―Stochastic Network Utility Maximization: A tribute to Kelly‘s
paper published in this journal a decade ago,‖ European Transactions On Telecommunications
00: 1–22 (2008). http://www.princeton.edu/~chiangm/stochastic_num.pdf
6 Non-CCRI Research: Interdisciplinary Research
Center (IRC)

Director: Will Leland, BBN Technologies


Email: wel@bbn.com, Phone: 908-464-9364
Government Lead: Ananthram Swami (ARL)
Email: aswami@arl.army.mil, Phone: 301-394-2486

Project Leads                                      Lead Collaborators
Project R1: M. Faloutsos, UCR                      A. Swami, B. Rivera, ARL
Project R2: M. Dean, BBN; J. Hendler, RPI          B. Szymanski, RPI (SCNARC)
Project R3: J. Hancock, ArtisTech; A. Leung, BBN   J. Han, UIUC (INARC)

Table of Contents
6 Non-CCRI Research: Interdisciplinary Research Center (IRC) ............................................ 6-1
6.1 Overview ......................................................................................................................... 6-3
6.2 Motivation ....................................................................................................................... 6-3
6.2.1 Challenges of Network-Centric Operations ............................................................. 6-3
6.2.2 Example Military Scenarios ..................................................................................... 6-4
6.2.3 Impact on Network Science ..................................................................................... 6-4
6.3 Key Research Questions ................................................................................................. 6-4
6.4 Technical Approach ........................................................................................................ 6-4
6.5 Project R1: Methods for Understanding Composite Networks ...................................... 6-7
6.5.1 Project Overview ..................................................................................................... 6-7
6.5.2 Project Motivation ................................................................................................... 6-7
6.5.3 Key Project Research Questions .............................................................................. 6-8
6.5.4 Initial Hypotheses .................................................................................................... 6-9
6.5.5 Technical Approach ................................................................................................. 6-9
6.5.6 Task R1.1: Extracting Network Knowledge: Graph Sampling and Clustering in
Integrated Networks (M. Faloutsos, UC Riverside (IRC); D. Towsley, UMass (IRC); P.
Basu, BBN (IRC); J. Srivastava, UMinn (IRC)) ............................................................... 6-10
6.5.7 Task R1.2: Advanced Mathematical Models: Economic / Market-based Approach to
Modeling Integrated Networks (D. Parkes, Harvard (IRC); M. Wellman, UMich (IRC); V.
Kawadia, BBN (IRC)) ....................................................................................................... 6-18
6.5.8 Task R1.3: Category Theory Based Approach to Modeling Composite Networks (M.
Kokar, NEU (IRC); V. Kawadia, BBN (IRC); Collaborators: P. Basu, BBN (IRC); J.
Hendler, RPI (IRC); C. Cotton, D. Sincoskie, UDel (IRC)) .............................................. 6-25
6.5.9 Linkages with Other Projects ................................................................................. 6-30
6.5.10 Collaborations and Staff Rotations ...................................................................... 6-31
6.5.11 Relation to DoD and Industry Research .............................................................. 6-32
6.5.12 Project Research Milestones ................................................................................ 6-32
6.5.13 Project Budget by Organization ........................................................................... 6-33
6.6 Project R2: Characterizing the Interdependencies Among Military Network Components ........ 6-34
6.6.1 Project Overview ................................................................................................... 6-34
6.6.2 Project Motivation ................................................................................................. 6-35
6.6.3 Key Project Research Questions ............................................................................ 6-38
6.6.4 Initial Hypotheses .................................................................................................. 6-39
6.6.5 Technical Approach ............................................................................................... 6-39
6.6.6 Task R2.1: Semantic Information Theory (J. Hendler, RPI (IRC); P. Basu, BBN
(IRC); N. Contractor, NWU (IRC); B. Carterette, UDel (IRC)) ....................................... 6-40
6.6.7 Task R2.2: Impact of Information Loss and Error (K. Carley, M. Martin, M.
Kowalchuk, J. Reminga, CMU (IRC); Collaborator: B. Syzmanski, RPI (SCNARC)) ..... 6-46
6.6.8 Linkages with Other Projects ................................................................................. 6-50
6.6.9 Collaborations and Staff Rotations ........................................................................ 6-50
6.6.10 Relation to DoD and Industry Research .............................................................. 6-50
6.6.11 Project Research Milestones ................................................................................ 6-51
6.6.12 Project Budget by Organization ........................................................................... 6-52
6.6.13 Relevance to US Military Visions/Impact on Network Science .......................... 6-52
6.7 Project R3: Experimentation with Composite Networks ............................................. 6-55
6.7.1 Project Overview ................................................................................................... 6-55
6.7.2 Project Motivation ................................................................................................. 6-55
6.7.3 Key Project Research Questions ............................................................................ 6-57
6.7.4 Initial Hypotheses .................................................................................................. 6-57
6.7.5 Technical Approach ............................................................................................... 6-58
6.7.6 Task R3.1: Shared Environment for Experimentation in Composite Networks (J.
Hancock, ArtisTech (IRC); A. Leung, BBN (IRC)) .......................................................... 6-58
6.7.7 Task R3.2: Basic Research to Enable Experimentation in Composite Networks (D.
Williams, USC (IRC); A. Leung, BBN (IRC); J. Hancock, ArtisTech (IRC); N. Contractor,
NWU (IRC); M. Poole, UIUC (IRC); J. Srivastava, UMinn (IRC)) ................................ 6-65
6.7.8 Task R3.3: Applied Experimentation in Composite Networks (A. Leung, BBN
(IRC); J. Hancock, ArtisTech (IRC); D. Williams, USC (IRC)) ....................................... 6-70
6.7.9 Task R3.4: Dissonance in Combined Networks (D. Sincoskie, C. Cotton, U. Del
(IRC)) 6-73
6.7.10 Linkages with Other Projects ............................................................................... 6-78
6.7.11 Collaborations and Staff Rotations ...................................................................... 6-79
6.7.12 Relation to DoD and Industry Research .............................................................. 6-79
6.7.13 Project Research Milestones ................................................................................ 6-79
6.7.14 Project Budget by Organization ........................................................................... 6-80
6.8 Project R4: Liaison ....................................................................................................... 6-81
6.8.1 Project Overview ................................................................................................... 6-82
6.8.2 Project Motivation ................................................................................................. 6-82
6.8.3 Task R4.1 6.1 Liaison (M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D. Towsley,
UMass (IRC))..................................................................................................................... 6-82
6.8.4 Task R4.2 6.2 Liaison (M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D. Towsley,
UMass (IRC))..................................................................................................................... 6-82
6.9 Project R5: Technical and Programmatic Leadership .................................................. 6-84
6.9.1 Project Overview ................................................................................................... 6-84
6.9.2 Project Motivation ................................................................................................. 6-84
6.9.3 Task R5.1 6.1 Technical and Programmatic Leadership (W. Leland, BBN (IRC); I.
Castineyra, BBN (IRC))..................................................................................................... 6-84
6.9.4 Task R5.2 6.2 Technical and Programmatic Leadership (I. Castineyra, BBN (IRC);
W. Leland, BBN (IRC)) ..................................................................................................... 6-84
6.9.5 Project Budget by Organization ............................................................................. 6-84
6.10 Project EDUC: Education Planning ............................................................................ 6-86
6.10.1 Project Overview ................................................................................................. 6-86
6.10.2 Task EDUC.1 Education and Transition Planning (D. Sincoskie and C. Cotton,
UDEL (IRC)) ..................................................................................................................... 6-86
6.10.3 Project Budget by Organization ........................................................................... 6-86

6.1 Overview

A primary goal of IRC is to conduct fundamental cutting-edge research in network science
pertaining to the areas of social/cognitive, information, and communications networks, and how
they interact with each other to yield synergistic benefits. To meet this goal, IRC has developed a
research plan consisting of both fundamental basic research (6.1) and applied research (6.2)
components. The 6.1 component consists of a diverse set of topics including development of
advanced mathematical models for network science; development of efficient techniques for
extracting knowledge about networks; and the characterization of the interdependencies of
military network components. The 6.2 component consists of several modeling, simulation, and
experiment design tasks that will be instrumental in the validation and verification of the various
6.1 research ideas not only in the IRC but also across the Consortium. In addition, there is a
liaison project designed to ensure cross-center coordination and communication.

6.2 Motivation

6.2.1 Challenges of Network-Centric Operations


The problem addressed here is fundamental to our ability to move from the plane of measured data and observed behavior to "knowledge". Here are some concrete examples: (a) we can identify communications nodes or sensors that have passed into the hands of the enemy and are being used to impede communication functions; (b) we can identify information exchanges, such as email, that point to abnormal behavior, such as someone attempting to contact or access people or information that they should not; and (c) we can discover and maintain key military relationships and communication channels, which are essential for the success of a mission. The capabilities developed in this project will directly benefit our ability to develop trusted networks that are reliable in their operation and protected from malicious users.
Furthermore, a rare piece of critical information may significantly affect mission
accomplishment. Tight social connections should be preserved to maintain a smooth flow of
information. Sudden failure of a trust relationship may dramatically impact the structure of the
information flow, and mandate changes in the underlying communications network.

6.2.2 Example Military Scenarios


For example, an individual needs certain information, but doesn't know exactly where it is or
who can provide it. Traditionally, that person would ask friends in their social network, and they
might ask their friends, until the information has been found. By leveraging the interconnections
between the networks, we can design a system-of-networks that will automatically mimic and
extend this behavior while reducing the cognitive impact on the users. The overall goal is to
predict what information will be needed by users (cognitive and information networks), identify
candidate sources for that information (information network), and select sources based on
information quality and delay (communications and social networks).

6.2.3 Impact on Network Science


Enhance our ability to characterize and predict the behavior of multi-genre networks by applying
non-structural analysis techniques.
Create a theory of "information" for the social/cognitive networks arena.
Build a validation methodology for the science of composite networks.

6.3 Key Research Questions

Are integrated multi-genre networks better-understood and controlled using techniques outside
classic structurally focused approaches?
How can we describe and characterize the propagation of information across social cognitive
networks in a way akin to the characterization of capacity, noise, and distortion in
communication networks: e.g., define semantic capacity, impact of loss and error of information
on decision processes?
What is the best methodology for validating theories, models and characterizations of multi-
genre networks?

6.4 Technical Approach

Our research consists of the following six projects.


R1: Methods for Understanding Composite Networks:
This project addresses the following research question: Can integrated multi-genre networks be
better understood and controlled using techniques outside classic structurally-focused
approaches? The project combines three threads: economic (utility-based) models, category
theory for modeling networks in a compositional way, and knowledge extraction from composite
networks using graph sampling and clustering.
Mathematical Economics: Mathematical economics provides a well-developed mathematical
theory to model the behavior of large systems of (approximately) rational agents as they act to
deploy limited resources for value creation. A fundamental principle is that rational (i.e., utility
maximizing) agents respond to incentives, and so identifying the operable incentives is always a
first step of analysis. Economics provides a repertoire of mechanisms, such as call markets,
competitive equilibrium, and auction mechanisms, in support of effective coordinated control for
a range of situations. In adopting an economic viewpoint for the understanding and control of
complex, heterogeneous networks, the goal of a network is construed as that of promoting fast,
high-quality decision-making. Networks that support effective decentralized decision-making are
high-utility networks. We will pursue the task of economic modeling of integrated networks at
two levels. First, we will use market modeling to capture specific network resource allocation
scenarios, and analyze the resulting models as economic systems. Second, we will seek to effect
useful coordination across and within networks by inferring the utility of actors within the
networks, identifying simple parameterizations of the decision environment that facilitate
automated control to improve behavior, and developing incentive-compatible mechanisms to
elicit additional information as necessary from participants.
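As a toy illustration of this economic viewpoint (not one of the mechanisms we will ultimately develop), the sketch below allocates a limited network resource, such as bandwidth units, among competing agents via a sealed-bid, uniform-price style allocation; the agent names, bids, and pricing rule are hypothetical.

```python
# Minimal sketch (illustrative only): a toy sealed-bid allocation of a limited
# network resource (e.g., bandwidth units) among competing agents, as a stand-in
# for the richer mechanisms (call markets, competitive equilibrium, auctions)
# discussed above. Each agent bids a per-unit value and a quantity; units go to
# the highest per-unit bids, and the clearing price is the highest rejected bid.
def allocate(bids, supply):
    """bids: list of (agent, per_unit_value, quantity). Returns (allocation, price)."""
    # Expand each bid into unit bids and sort by value, highest first.
    units = sorted(((value, agent) for agent, value, qty in bids for _ in range(qty)),
                   reverse=True)
    winners, losers = units[:supply], units[supply:]
    price = losers[0][0] if losers else 0.0        # highest rejected unit bid
    allocation = {}
    for value, agent in winners:
        allocation[agent] = allocation.get(agent, 0) + 1
    return allocation, price

if __name__ == "__main__":
    bids = [("sensor_feed", 9.0, 3), ("video", 6.0, 4), ("chat", 2.0, 2)]
    print(allocate(bids, supply=5))   # e.g., ({'sensor_feed': 3, 'video': 2}, 6.0)
```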
Compositional Methods: One of the primary tasks for the IRC is to provide the understanding of
how global network properties or behaviors can be composed from local properties of
information, social-cognitive, and communications networks. The non-homogeneity of the
different genres of networks involved calls for abstract mathematical modeling tools that are capable of capturing the commonalities and differences among the various components while still providing means for deriving models of such complicated structures. We propose to
use category theory – a mathematical framework that has proved to be very efficient in capturing
representations of many, seemingly disparate, mathematical structures. This approach to
modeling relies on the colimit operator of category theory, which can be intuitively understood
as an extension of the shared union operator of set theory. This research will contribute to both
the modeling of composite networks and to the composite metrics on networked systems. We
plan to leverage this work in year 2 of the E2 project.
Knowledge Extraction: This task aims to develop a theory, methods, and tools for efficiently
extracting knowledge from large networks of various types, such as communication,
social/cognitive, and information networks, and especially, integrated compositions of the above.
A key goal here is to measure and understand the fundamental relationships that define the
networks and their characteristics. Such knowledge can then be utilized not only for later
analysis but also for modeling, simulation and experiment design. In addition, we want to
understand the interplay between network structure and function, specifically, when function
pertains to information retrieval and information flow. In particular, we attempt to address the
following problem: given measured, potentially inaccurate and incomplete topology of an
integrated network, how can we extract knowledge from its structure? This knowledge can then
be used to reveal the nature of the network, the rules of how it is being created and maintained,
and provide insight into its operational functions. More specifically, one can identify an emerging hierarchy, "central" nodes, or outliers: nodes or groups of nodes that do not seem to conform to the typical behavior of the network. This task has strong linkages with project I3 of
INARC and over time is expected to feed into various EDIN projects.
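To make the graph-sampling thread concrete, the following minimal sketch draws a random-walk sample from a large synthetic graph and checks how well one simple property (average clustering) is preserved; it is illustrative only and is not the sampling methodology of Task R1.1.

```python
# Minimal sketch (illustrative only): random-walk sampling of a large graph and a
# quick check of how well the sample preserves one simple property (average
# clustering), in the spirit of the knowledge-extraction goals described above.
import random
import networkx as nx

def random_walk_sample(g, sample_size, seed=0):
    rng = random.Random(seed)
    current = rng.choice(list(g.nodes()))
    visited = {current}
    while len(visited) < sample_size:
        nbrs = list(g.neighbors(current))
        if not nbrs:                           # dead end: restart the walk
            current = rng.choice(list(g.nodes()))
            continue
        current = rng.choice(nbrs)
        visited.add(current)
    return g.subgraph(visited).copy()

if __name__ == "__main__":
    full = nx.barabasi_albert_graph(2000, 3, seed=42)
    sample = random_walk_sample(full, sample_size=200)
    print("full clustering:  ", round(nx.average_clustering(full), 3))
    print("sample clustering:", round(nx.average_clustering(sample), 3))
```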

R2: Characterizing the Interdependencies among Military Network Components:
In 1948, Claude Shannon [Shannon48] identified some of the key properties of information
flowing through networks. In particular, Shannon explored fundamental limits on compressing,
reliably storing, and communicating data. Modern military communication networks, as well as
much of the modern IT industry, use the information theory derived from Shannon as a key
mathematical backbone of their work. Unfortunately, when it comes to the composite networks
being studied in the NS CTA, current information theory only goes so far. In this project we
explore how to extend the information theory that has been so important to communications to
complex military networks. Specifically, we explore the extension of Shannon's definitions to modern networks, we examine how the effects of information loss can be modeled at the level of social and cognitive networks, and we explore decision and utility models that can help in formulating the overall network. In short, a key focus of this project is creating a formal model of the interdependencies between the different components of composite networks. This project
includes both 6.1 research in the formulation of the basic models, and 6.2 research (especially in
the later years) that will focus on how these models can help us understand how to disrupt enemy
networks and on how to best protect our own.

R3: Experimentation with Composite Networks:


A key responsibility of IRC is to provide a facility for joint experimentation on composite
networks. The multi-year project will prepare the distributed Centers and ARL for
experimentation with and thorough evaluation of NS CTA ideas tailored to Army challenges.
This project consists mostly of 6.2 applied research tasks, plus one 6.1 basic research task that is fundamentally based on networked research and is ready for further experimentation. We bring together an interdisciplinary team to specify (6.1) the experimental requirements and methods for the data, representations, interfaces, and research-driven models and simulations needed to conduct (6.2) distributed validation experiments. We bring modern distributed computing practice to bear on populating the IRC facility with useful distributed experimental environment automation, using existing resources like MNMI as well as the ontologies, metrics, and other products of associated
CTA research. A key to reaching validation requirements is to sufficiently specify and support
experiment design to produce scientifically reliable results. The interdisciplinary composite
network research of this CTA will take us into new experimental territory. There are no
boundaries to the collaboration on this task, and it is not limited to the team above. This task is dedicated to creating IRC-supported experiment environments and procedures that serve CTA-wide, cross-genre experiment needs. Connecting this environment to ARL is critical for work on Army-relevant scenarios and for interaction with Army systems that will push advances toward
Technology Transition.
R4: Liaison:
This project serves to fund individuals tasked with enabling cross-center communication,
collaboration and coordination at a technical level.
R5: Technical and Programmatic Leadership:
This project consolidates the Consortium's technical and programmatic management.
EDUC: Education Planning:
The education task covers coordination of education-related activities for the NS CTA.
6.5 Project R1: Methods for Understanding Composite Networks

Project Lead: M. Faloutsos, UCR


Email: michalis@cs.ucr.edu, Phone: 951 827-2480

Primary Research Staff          Collaborators

M. Faloutsos, UCR               A. Swami, ARL
M. Wellman, UMich. (IRC)
D. Parkes, Harvard (IRC)
M. Kokar, NEU (IRC)
J. Srivastava, UMinn (IRC)
D. Towsley, UMass (IRC)
P. Basu, BBN (IRC)
I. Castineyra, BBN (IRC)
W. Leland, BBN (IRC)
V. Kawadia, BBN (IRC)

6.5.1 Project Overview


The project combines three threads: economic (utility-based) models, category theory for
modeling networks in a compositional way, and knowledge extraction from composite networks
using graph sampling and clustering.
One of the goals of this project is to develop a theory, methods, and tools for efficiently
extracting knowledge from large networks of various types, such as communication,
social/cognitive, and information networks, and especially, integrated compositions of the above.
A key goal here is to measure and understand the fundamental relationships that define the
networks and their characteristics. In addition, we want to understand the interplay between
network structure and function, specifically, when function pertains to information retrieval and
information flow, as we explain below.

6.5.2 Project Motivation


Understanding the behavior of a multi-genre network depends on the availability of models that characterize its behavior to the point where it can be predicted. Individual models of each
one of the genres are not useful if they cannot be composed into an overarching description.
This project focuses on non-structural techniques that promise the capability of creating
descriptive models that can be used to characterize a multi-genre network. Other projects of the
Consortium are looking at modeling multi-genre networks using structure-focused approaches:
e.g., projects E1 and E2.
The problem addressed here is fundamental to our ability to move from the plane of measured data and observed behavior to "knowledge". Here are some concrete examples: (a) we can identify communications nodes or sensors that have passed into the hands of the enemy and are being used to impede communication functions; (b) we can identify information exchanges, such as email, that point to abnormal behavior, such as someone attempting to contact or access people or information that they should not; and (c) we can discover and maintain key military relationships and communication channels, which are essential for the success of a mission. The capabilities developed in this project will directly benefit our ability to develop trusted networks that are reliable in their operation and protected from malicious users.
Furthermore, a rare piece of critical information may significantly affect mission
accomplishment. Tight social connections should be preserved to maintain a smooth flow of
information. Sudden failure of a trust relationship may dramatically impact the structure of the
information flow, and mandate changes in the underlying communications network.

Impact on Network Science:


Mathematical Economics: modeling a multi-genre network as a conglomerate of interacting
markets of rational entities that attempt to maximize individual goals should lead to ways of
understanding and controlling such a network.
Compositional Methods: a fundamental look at what types of models can be composed
Graph Sampling and Clustering: will enable us to characterize a large composite network by efficiently sampling a smaller subset of it.

6.5.3 Key Project Research Questions


A prerequisite to developing economic network models will be to understand the ways in which different network planes (information, communication, social) come together in competition for resources and in service of each other's functions. Key research questions include:
What resources are in supply and demand by each network?
What does this suggest about where to position a small number of market processes that will provide coordinated allocation for those resources and services?
What kind of mathematical formalism is needed to model complex heterogeneous networks in a
compositional way?
Can we identify patterns and "typical" node behavior from the point of view of network connectivity? This implies the development of profiles per node or group of nodes, and of their interactions. The node could be a device, a piece of information, or a user. A pattern here implies "commonly manifested" behavior on graphs with verifiably small or no malicious activity.
Can we identify structure and understand how this affects the operation of the network and its
capability to meet its goals? For example, we want to identify network articulation points, which
could correspond to potential points of failure in a communication network or an influential
person in a social network. Another example is the discovery of implicit or explicit hierarchical
structures.
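As a simple, concrete instance of this structural question, the sketch below identifies articulation points (potential single points of failure) and the highest-betweenness nodes (potentially influential actors or choke points) in a toy topology; it is illustrative only.

```python
# Minimal sketch (illustrative only): two simple instances of the structural
# questions above, computed with networkx: articulation points (potential single
# points of failure) and the highest-betweenness nodes (potentially influential
# actors or choke points).
import networkx as nx

def critical_nodes(g, top_k=3):
    cut_vertices = set(nx.articulation_points(g))
    btw = nx.betweenness_centrality(g)
    central = sorted(btw, key=btw.get, reverse=True)[:top_k]
    return cut_vertices, central

if __name__ == "__main__":
    # Toy topology: two dense clusters bridged by a single relay node.
    g = nx.barbell_graph(6, 1)
    cuts, central = critical_nodes(g)
    print("articulation points:", cuts)
    print("top betweenness nodes:", central)
```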
Can we identify abnormal substructures and outliers? This could point to "misbehaving" nodes, peculiar access patterns to a piece of information, or surprising relationships between people,
which can translate directly to malicious behavior, such as a spy in a social network or an attacker in a communication network. Identifying "atypical" behavior will tie in with the identification of patterns discussed above, and the distinction between typical and atypical will be the focus of the second and third years.
In an integrated network, how can we identify and retrieve important information by considering
the interaction of the structure and the availability of information?

6.5.4 Initial Hypotheses


This project addresses the following research question: Can integrated multi-genre networks be
better understood and controlled using techniques outside classic structurally focused
approaches?

6.5.5 Technical Approach


Overview
Mathematical Economics: Mathematical economics provides a well-developed mathematical
theory to model the behavior of large systems of (approximately) rational agents as they act to
deploy limited resources for value creation. A fundamental principle is that rational (i.e., utility
maximizing) agents respond to incentives, and so identifying the operable incentives is always a
first step of analysis. Economics provides a repertoire of mechanisms, such as call markets,
competitive equilibrium, and auction mechanisms, in support of effective coordinated control for
a range of situations. In adopting an economic viewpoint for the understanding and control of
complex, heterogeneous networks, the goal of a network is construed as that of promoting fast,
high-quality decision-making. Networks that support effective decentralized decision-making are
high-utility networks. We will pursue the task of economic modeling of integrated networks at
two levels. First, we will use market modeling to capture specific network resource allocation
scenarios, and analyze the resulting models as economic systems. Second, we will seek to effect
useful coordination across and within networks by inferring the utility of actors within the
networks, identifying simple parameterizations of the decision environment that facilitate
automated control to improve behavior, and developing incentive-compatible mechanisms to
elicit additional information as necessary from participants.
Compositional Methods: One of the primary tasks for the IRC is to provide the understanding of
how global network properties or behaviors can be composed from local properties of
information, social-cognitive, and communications networks. The non-homogeneity of the
different genres of networks involved calls for abstract mathematical modeling tools that are
capable of capturing the commonalities and the differences among the various components while
still providing means for deriving models of such complicated structures. We propose to
use category theory – a mathematical framework that has proved to be very efficient in capturing
representations of many, seemingly disparate, mathematical structures. This approach to
modeling relies on the colimit operator of category theory, which can be intuitively understood
as an extension of the shared union operator of set theory. This research will contribute to both
the modeling of composite networks and to the composite metrics on networked systems. We
plan to leverage this work in year 2 of the E2 project.
Knowledge Extraction: This task aims to develop a theory, methods, and tools for efficiently
extracting knowledge from large networks of various types, such as communication,
social/cognitive, and information networks, and especially, integrated compositions of the above.
A key goal here is to measure and understand the fundamental relationships that define the
networks and their characteristics. Such knowledge can then be utilized not only for later
analysis but also for modeling, simulation and experiment design. In addition, we want to
understand the interplay between network structure and function, specifically, when function
pertains to information retrieval and information flow. In particular, we attempt to address the
following problem: given the measured, potentially inaccurate and incomplete topology of an
integrated network, how can we extract knowledge from its structure? This knowledge can then
be used to reveal the nature of the network and the rules by which it is created and maintained,
and to provide insight into its operational functions. More specifically, one can identify an
emerging hierarchy, "central" nodes, or outliers, i.e., nodes or groups of nodes that do not seem to
conform to the typical behavior of the network. This task has strong linkages with project I3 of
INARC and over time is expected to feed into various EDIN projects.

6.5.6 Task R1.1: Extracting Network Knowledge: Graph Sampling and Clustering
in Integrated Networks (M. Faloutsos, UC Riverside (IRC);
D. Towsley, UMass (IRC); P. Basu, BBN (IRC); J. Srivastava, UMinn (IRC))
NOTE: This is a 6.1 basic research task.

Task Overview
In this task, we will explore two complementary approaches for extracting knowledge about the
structure of a network: (a) graph sampling, as a means to overcome measurement limitations,
and (b) clustering, as a means to extract patterns and information from the structure of the
network. Note that these two problems are synergistic: they both provide different insights into
the structure of the observed network, as we discuss below.
In the first year, we will focus on the question: how can we identify patterns and "typical" node
behavior? This is not a trivial task, as it is an open-ended question. Then, in subsequent years, we
will develop methods to detect outliers and anomalous behavior, i.e., nodes that interact with the
rest of the network in surprising ways. All tasks here are closely related: outliers are nodes or
groups of nodes that deviate from the observed structural rules and patterns. The work will
leverage the prior work of PIs Faloutsos [He08] [Ilio07] and Towsley [Jaiswal04] [Bu02] in this area.
Task Motivation
The problem addressed here is fundamental to our ability to move from the plane of measured
data and observed behavior to "knowledge". Here are some concrete examples: (a) we can
identify communications nodes or sensors that have passed into the hands of the enemy and are
used to impede communication functions, (b) we can identify information exchanges, such as
email, that point to abnormal behavior, e.g., someone attempting to contact or access people
and information that they should not, and (c) we can discover and maintain key military
relationships and communication channels, which are essential for the success of a mission. The
capabilities developed in this project will directly benefit our ability to develop trusted networks
that are reliable in their operation and protected from malicious users.
We present the key research questions and the approach for each of the two components of this
task separately.
Subtask R1.1.1: Graph Sampling
The first approach to explore the structure of a network is graph sampling.



Key Research Questions
Graph sampling can answer two different but related questions:
a. Given a large graph, we want to sample it to obtain a smaller graph that maintains most of
the fundamental properties of the initial graph.
b. Given a sample from a graph, what can we tell about the properties of the larger unknown
graph that the sample came from? The whole graph may be difficult or even impossible to
measure.
Initial Hypothesis
Can a sample of an integrated network provide provably accurate estimates of the real
underlying topology in terms of key graph metrics?
The project in year one and the discussion below initially focus on unweighted and unlabelled
graphs. However, in subsequent years we will consider directed, weighted and labeled graphs,
which incorporate additional information, such as proximity of nodes in some metric space, or
roles of nodes.
We propose to develop a theoretical framework within which to understand the advantages of
various approaches to sampling a graph, with the goal of developing accurate yet resource-efficient
algorithms. At first glance it seems straightforward to sample a graph, but it is
surprisingly challenging. The work here will address two key challenges: (a) defining what a
good sample is, in terms of properties and metrics with respect to the original graph, and (b)
determining how the effectiveness of a sampling method depends on the type of network under
study. As our starting point, we will leverage the work of PIs Faloutsos, Basu, and Towsley on
graph sampling in several different contexts, using sampling methods suitable for different types
of graphs [He08, Krishna07].
Prior work
We propose to evaluate, but ultimately go beyond, existing ways to sample a graph [Krishna07],
such as random walks (a technique used by web crawlers), random walks punctuated with random
jumps (as in Google's page-ranking algorithm), random node or edge sampling, etc., and to
determine which technique works best for specific classes of networks. We will also carefully
consider respondent-driven sampling (RDS) [Heckathorn97], a variation of snowball sampling
[Goodman61]. RDS represents a recent advance in sampling methodology because it has been
shown to produce asymptotically unbiased estimates from snowball samples under certain
conditions, and to perform well even for groups of nodes that are small relative to the general
population and for which no exhaustive list of population members is available. In addition, we
will research the application of diffusion wavelets [Coifman06] in conjunction with
random-walk-based sampling for uncovering network properties. We may be able to validate
sampling performance by correlating the actual latency of a random walk on the network with
analytical results on hitting times of weighted random walks on sampled graphs, using work done
by Chau and Basu in the ongoing US/UK ITA program [Chau09]. For evaluating the
effectiveness of a sampling method, we will start with the degree distribution and then use more
advanced metrics such as the Fisher information, a standard measure of the information provided
by a sample for estimating some graph property.
In more detail, there exist two main classes of sampling methods. First, we have node/edge
sampling, where nodes are sampled according to some distribution, typically uniformly at
random. Second, we have snowball sampling, where nodes are sampled by traversing edges
beginning at one node in the graph, for example using a breadth-first, depth-first, or random-walk
exploration. Each method has relative advantages. Node sampling is possible only when all
nodes have already been identified and thus can be selected; with this sampling, it is easier to
make assertions about the statistical properties of the resulting network. On the other hand, some
form of snowball sampling is easier to execute in practice, e.g., exploring the friends lists of
people in a social network, even in cases where knowing the whole network is not feasible.
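To illustrate the two classes of methods, the following Python sketch (assuming the open-source
networkx library; all function names, parameters, and the synthetic graph are ours and purely
illustrative) contrasts uniform node sampling with a simple random-walk, snowball-style sample.

    # Minimal sketch contrasting uniform node sampling with random-walk
    # (snowball-style) sampling. Assumes networkx; names are illustrative only.
    import random
    import networkx as nx

    def uniform_node_sample(G, k):
        """Sample k nodes uniformly at random (requires knowing all nodes)."""
        return random.sample(list(G.nodes()), k)

    def random_walk_sample(G, k, seed=None):
        """Collect k distinct nodes by a simple random walk from a seed node."""
        current = seed if seed is not None else random.choice(list(G.nodes()))
        visited = {current}
        while len(visited) < k:
            neighbors = list(G.neighbors(current))
            if not neighbors:                      # dead end: restart the walk
                current = random.choice(list(G.nodes()))
            else:
                current = random.choice(neighbors)
            visited.add(current)
        return list(visited)

    if __name__ == "__main__":
        G = nx.barabasi_albert_graph(10000, 3)     # synthetic power-law-like graph
        for name, sample in [("node sampling", uniform_node_sample(G, 500)),
                             ("random walk", random_walk_sample(G, 500))]:
            degrees = [G.degree(v) for v in sample]
            print(name, "mean sampled degree:", sum(degrees) / len(degrees))

On a synthetic power-law-like graph, such a walk typically over-represents high-degree nodes,
which is precisely the bias discussed in the technical approach below.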



Figure: The complementary cumulative distribution function (CCDF) of the true and sampled
degree distributions under snowball sampling (x-axis: degree).

Technical Approach
First, it is easy to show that uniform node sampling provides an unbiased estimate of the degree
distribution. Second, in the context of online social networks, many studies based on snowball
sampling have produced significantly erroneous estimates of the degree distribution (see the
figure above).
When both node sampling and RW sampling are applicable, which is better? Here we take mean
square error (MSE) as the metric for comparing different methods. Our initial results, based on
simulation estimates of the degree distribution and the corresponding mean square errors under
both methods, show that node sampling provides better estimates for small degrees and that RW
sampling (specifically, the RDS variant) provides better estimates for high-degree nodes. This
suggests that RW-based sampling is more appropriate for characterizing the degree distribution
of networks with highly variable degree distributions and, in particular, those whose degree
distributions follow a power law.



In the future, we propose to address the following questions: (a) Can we characterize analytically
the above tradeoffs in the presence of degree correlations? (b) Suppose that we have different
costs associated with making queries under node and RW sampling. How does this affect our
conclusions?
In addition, we propose to extend our work in the following directions.
Evaluating sampling using an extended set of network metrics. Graph metrics can be classified
into local metrics and global metrics. The former include node degree, assortativity (the correlation
of the degrees of nodes connected by an edge), clustering coefficient, etc. The latter include
network diameter (the maximum number of links on the shortest path between any pair of nodes),
average shortest path length, etc. We will evaluate the effectiveness of sampling in estimating
both global and local properties of the network.
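To make the comparison concrete, the short Python sketch below (assuming the networkx
library; the generator, sample size, and names are purely illustrative) computes a few
representative local and global metrics on a full graph and on a sampled subgraph so that the two
can be compared.

    # Illustrative computation of local and global graph metrics with networkx.
    import networkx as nx

    def summarize(G):
        """Return a few local and global metrics for a graph G."""
        metrics = {
            "mean_degree": sum(d for _, d in G.degree()) / G.number_of_nodes(),
            "assortativity": nx.degree_assortativity_coefficient(G),
            "clustering": nx.average_clustering(G),
        }
        if nx.is_connected(G):              # global metrics need a connected graph
            metrics["diameter"] = nx.diameter(G)
            metrics["avg_shortest_path"] = nx.average_shortest_path_length(G)
        return metrics

    G = nx.barabasi_albert_graph(1000, 3)                 # illustrative full graph
    sample = G.subgraph(list(G.nodes())[:200])            # illustrative sampled subgraph
    print("full graph:", summarize(G))
    print("sampled subgraph:", summarize(sample))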
Considering networks characterized by directed graphs. Most current work focuses on networks
whose underlying graphs are undirected. We will also consider the problem of characterizing
directed graphs. This poses significant challenges, especially for snowball sampling: if we assume
that an edge can be traversed only in its direction, then the out-degree of a node can be observed
each time the node is queried, but its in-degree cannot. This leads to an interesting and
challenging sampling problem in which node or RW sampling can pick up out-degrees, construct
a "sampled" in-degree graph, and produce an estimate of the average in-degree (since total
out-degree equals total in-degree).
Time-varying networks. We will extend the work to consider time-varying networks. The first
challenge in extending the proposed techniques is that all we can obtain is the state of a node at a
particular instant of time. This suggests the need to perform repeated sampling over time so as to
obtain the time-varying behavior of the node. In the case of RW sampling, the question arises as
to how we define a random walk. Does one still choose a neighbor uniformly at random from
those that the node is currently connected to? What action should be performed in the case that a
node is temporarily disconnected? We will consider all of these questions with the goal of
extending our methods to such time-varying networks. In this task, we will collaborate with
EDIN and their work on modeling evolving networks and their communities.
Integrated networks. We will extend our methodology to cover integrated networks. A
particularly interesting question is to establish how representative a network sample that spans
multiple types of networks is. Clearly, this will require a new set of metrics that explicitly take
into consideration the correlations of properties across different types of networks. A
characterization of an online social network is often more abstract: a user can have many
"friends" but not equally active interactions with all of them. However, if we combine information
from traffic measurements with network sampling, we can identify a subnetwork that characterizes
the active relationships among nodes within the social network. Going one step further, the
snowball sampling methods can be purposely "biased" to follow such active links, essentially
using information from the communication network to sample the social network.
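Such a traffic-biased snowball sample could be prototyped along the following lines (a Python
sketch assuming the networkx library; the 'traffic' edge attribute and the proportional weighting
scheme are our own illustrative assumptions, not a committed design).

    # Sketch of a traffic-biased snowball (random-walk) sample: neighbors are
    # chosen with probability proportional to observed traffic on the edge.
    # The 'traffic' attribute and all names here are illustrative.
    import random
    import networkx as nx

    def biased_snowball(G, start, budget, weight="traffic"):
        visited, current = {start}, start
        while len(visited) < budget:
            nbrs = list(G.neighbors(current))
            if not nbrs:
                break
            weights = [G[current][v].get(weight, 1.0) for v in nbrs]
            current = random.choices(nbrs, weights=weights, k=1)[0]
            visited.add(current)
        return visited

    # Example: a social graph annotated with message counts from traffic logs.
    G = nx.Graph()
    G.add_edge("a", "b", traffic=40)
    G.add_edge("a", "c", traffic=1)
    G.add_edge("b", "d", traffic=25)
    print(biased_snowball(G, "a", budget=3))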
Subtask R1.1.2: Extracting knowledge through clustering
The goal of this subtask is to identify ―clusters‖ of nodes to obtain insight on the structure and
the dynamics of the network.



Prior work
Although clustering is a very well studied area in general, there are still many open problems that
need to be explored, especially in the context of dynamic and integrated networks. PIs Faloutsos
and Towsley have a significant body of work in modeling the topology of communication
networks, and quantifying their clustering properties [Ilio09] [Beye08] [Li08] [Jaiswal04].
Key Research Questions
Can we identify structure and understand how this affects the operation of the network
and its capability to meet its goals? An example is the discovery of implicit or explicit
hierarchical structures.
Initial Hypotheses
Can clustering reveal structure and patterns in integrated networks?
Technical Approach
The typical definition of a cluster refers to nodes that are more tightly connected among
themselves than with the rest of the network. However, in our case we start from real data, not an
already cleanly defined graph: the concept of a connection, represented by an edge, can take
many different meanings. In practical scenarios, an edge can carry information such as the
frequency, intensity, or type of the communication. In other words, the choice of what
constitutes an "edge" can create many different connectivity patterns from a single initial real
network description. We propose to address this question in depth and, in fact, to use multiple
different definitions of an edge in order to see different "slices" and aspects of the
network behavior.
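For illustration, the following Python sketch (assuming the networkx library) shows how several
graph "slices" might be derived from a single interaction log by varying the edge definition; the
record fields, thresholds, and names are illustrative assumptions only.

    # Sketch: deriving different graph "slices" from one interaction log by
    # changing what counts as an edge. Field names and thresholds are illustrative.
    import networkx as nx

    log = [  # (source, destination, interaction type, count)
        ("a", "b", "email", 12),
        ("a", "c", "voice", 1),
        ("b", "c", "email", 3),
    ]

    def build_slice(records, edge_rule):
        G = nx.Graph()
        for src, dst, kind, count in records:
            if edge_rule(kind, count):
                G.add_edge(src, dst, kind=kind, count=count)
        return G

    frequent = build_slice(log, lambda kind, count: count >= 5)         # intensity slice
    email_only = build_slice(log, lambda kind, count: kind == "email")  # type slice
    print(list(frequent.edges()), list(email_only.edges()))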
In the first year, the focus will be to develop methods and tools for efficiently identifying clusters
in integrated networks, which combine communication, information, and social networks. In
fact, we need first to understand the clustering properties of each network in isolation before we
attempt to model clusters across different types of networks. An interesting question here is: how
will the inter-network clusters (e.g. between communication and social networks) relate to intra-
network clusters (within a single type of network)? We will address this question from year two
onwards, after having established an arsenal of analytical tools.
Our technical approach will be based on the identification of cluster metrics, which will help us
define the quality of clustering.



Among the several ways to assess clustering quality, we propose to start with the Scaled
Coverage Measure (SCM) [Li08], which captures how "good" the clustering is for each node in
the network. Specifically, each node in a topology has a local SCM value determined by its
connectivity and the structure of its cluster. A node vi in a topology that has been partitioned by
a clustering method is associated with two sets of nodes: NNBR(vi), the set of nodes in its cluster
that are not directly connected to it, and XNBR(vi), the set of neighbors of vi that are not in the
same cluster as vi. The SCM for node vi is defined to be:
SCM(vi) = 1 - |NNBR(vi) ∪ XNBR(vi)| / |NBR(vi) ∪ Cluster(vi)|,
where NBR(vi) is the set of neighbors of vi and Cluster(vi) is the set of nodes in the same cluster as vi.
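For concreteness, the following Python sketch (assuming the networkx library; function and
variable names are ours, and the choice of excluding the node itself from its cluster set is an
illustrative convention) computes this per-node SCM for a given partition of a graph into clusters.

    # Sketch: per-node Scaled Coverage Measure (SCM) for a given clustering.
    # 'clusters' maps each node to a cluster identifier. Names are illustrative.
    import networkx as nx

    def scm(G, clusters):
        scores = {}
        for v in G.nodes():
            nbrs = set(G.neighbors(v))
            cluster = {u for u in G.nodes() if clusters[u] == clusters[v] and u != v}
            nnbr = cluster - nbrs            # same cluster, not directly connected
            xnbr = nbrs - cluster            # neighbors outside the cluster
            denom = len(nbrs | cluster)
            if denom:
                scores[v] = 1.0 - len(nnbr | xnbr) / denom
            else:
                scores[v] = 0.0
        return scores

    G = nx.karate_club_graph()
    clusters = {v: G.nodes[v]["club"] for v in G.nodes()}   # two known groups
    print(sorted(scm(G, clusters).items())[:5])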
In the context of our work, we expect to have to expand beyond this definition, especially when
we consider evolving integrated networks. In addition, we are going to characterize a clustering
using other metrics, which include: (a) the cluster size distribution, (b) inter-cluster
connectivity, which captures the connectivity between clusters, and (c) intra-cluster connectivity,
which captures the connectivity inside the clusters.

Figures: (left) The complementary cumulative distribution function of the cluster size of the
Internet topology in 2001; the distribution is highly skewed, with a few very large clusters [Li08].
(right) Distinguishing network applications by the clusters they generate: the size of the largest
connected component plotted against the number of clusters that a network application generates [Ilio07].
Using these metrics, we propose to explore different clustering algorithms and adapt them to the
needs of our networks. Note that different clustering algorithms perform differently depending on
the properties of the network and on how the goal of clustering is defined. A critical issue, for
example, is the degree distribution and the existence of degree correlations (assortativity). The
PIs have extensive experience with clustering algorithms in specific contexts, namely clustering
and modeling the Internet topology and identifying groups of communicating nodes by analyzing
network traffic, as illustrated in the figures above.
Specifically, for the first year, we propose to see how much information we can extract from the
topology of a network, and we will focus on: (a) the topology of communication networks, (b) the
network-wide interactions between computer devices, and (c) social networks, focusing on online
social networks such as eBay and YouTube, as well as the Enron data set. For eBay and YouTube,
we have already collected data that spans a few years for eBay and a few months for YouTube
[Beye08]. Most of the data will be made available to CTA members. We will develop suitable
algorithms for clustering and methods to interpret the clustering results. The goal in subsequent
years will be to extend the work to integrated networks.
Validation Approach
The validation of the proposed work will involve applying the developed algorithms to test data
whose structure and properties are well understood and that is known in its entirety. This test
data will be either (a) real data that we know in depth, or (b) artificially generated data, produced
by generators designed to yield realistic networks, at least with respect to some graph metrics.
Specifically, the PIs have significant experience with communication networks, the Internet
topology, and ad hoc networks, both in terms of real network data and in terms of realistic
generators. In addition, we will use all possible data made available to the CTA by Army agencies.
Summary of Military Relevance
The problem addressed here is fundamental in our ability to move from the plane of measured
data and observed behavior to ―knowledge‖. Here are some concrete examples: (a) we can
identify communications nodes or sensors that have passed to the hands of the enemy and are
used to impede the communication functions, (b) we can identify information exchanges such as
email, that can point to abnormal behavior, of someone attempting to contact or access people
and information that they should not, and (c) we can discover and maintain key military
relationships and communication channels, which are essential for the success of a mission . The
capabilities developed in this project will benefit directly our ability to develop trusted networks
that are reliable in their operation and protected from malicious users.
Research Products
In the first year of this project, we expect to provide a foundation of tools, models, and
understanding of how to extract knowledge from integrated networks. More concretely, we
expect to deliver: (a) a comprehensive toolset for efficiently sampling and clustering real network
data, and (b) an in-depth study of fundamental clustering properties and patterns of real
networks.
References

[Beye08] "The eBay Graph: How Do Online Auction Users Interact?" Yordanos Beyene and M.
Faloutsos and Polo Chau and Christos Faloutsos IEEE Global Internet, 2008
[Bu02] Tian Bu, Donald F. Towsley: On Distinguishing between Internet Power Law Topology
Generators. INFOCOM 2002
[He08] "Policy-Aware Topologies for Efficient Inter-Domain Routing Evaluations," Yihua He,
Michalis Faloutsos, Srikanth V. Krishnamurthy, and Marek Chrobak, IEEE INFOCOM 2008
Mini-Conference.
[Heckathorn97] D. Heckathorn, "Respondent-Driven Sampling: A New Approach to the Study of
Hidden Populations," Social Problems, 1997.
[Ilio07] "Network Monitoring Using Traffic Dispersion Graphs (TDGs)" Marios Iliofotou , P.
Pappu, M. Faloutsos, M. Mitzenmacher, S. Singh, G. Varghese ACM/USENIX Internet
Measurement Conference (IMC 07) 2007.



[Ilio09] "Graph-based P2P Traffic Classification at the Internet Backbone" M. Iliofotou, H.
Kim, M. Faloutsos, M. Mitzenmacher, P. Pappu, G. Varghese IEEE Global Internet
Symposium, 2009
[Jaiswal04] Sharad Jaiswal, Arnold L. Rosenberg, Donald F. Towsley: Comparing the Structure
of Power-Law Graphs and the Internet AS Graph. ICNP 2004
[Krishna07] "Sampling Large Internet Topologies for Simulation Purposes" V.Krishnamurthy,
M. Faloutsos, M. Chrobak, L. Lao, JH Cui, and A. G. Percus Computer Networks, 51(5), 2007.
[Li08] "Characterizing and Modeling Clustering Features in AS-Level Internet Topology," Yan
Li, Jun-Hong Cui, Dario Maggiorini, and Michalis Faloutsos, IEEE INFOCOM 2008.

6.5.7 Task R1.2: Advanced Mathematical Models: Economic / Market-based Approach to
Modeling Integrated Networks (D. Parkes, Harvard (IRC); M. Wellman, UMich (IRC);
V. Kawadia, BBN (IRC))
NOTE: This is a 6.1 basic research task.

Task Overview
We will pursue the task of economic modeling of composite networks at two levels. We will
seek to effect useful coordination across and within networks by inferring the utility of actors
within the networks, identifying simple parameterizations of the decision environment that
facilitate automated control to improve behavior, and developing incentive-compatible
mechanisms to elicit additional information as necessary from participants. We will also use
market modeling to capture specific network resource allocation scenarios, and analyze the
resulting models as economic systems.
Task Motivation
Mathematical economics provides a well-developed mathematical theory to model the behavior
of large systems of (approximately) rational agents as they act to deploy limited resources for
value creation. A fundamental principle is that rational (i.e., utility maximizing) agents respond
to incentives, and so identifying the operable incentives is always a first step of analysis. The
economic perspective extends naturally to decision making over networks that restrict
information of participants, provide structure to interdependencies between participants, and
constrain available actions.
Key Research Questions
A prerequisite to developing economic network models will be to understand the ways in which
different network planes (information, communication, social) come together in competition for
resources, and in service of each other‘s functions. Key research questions include:
What resources are supplied and demanded by each network?
What does this suggest about where to position a small number of market processes that
will provide coordinated allocation for those resources and services?
Initial Hypotheses
An economic and market-based approach to modeling complex interconnected systems bears on
two distinct but equally important dimensions of network analysis: understanding and control.
These two dimensions of analysis apply both within individual networks and across networks,
and they are of direct relevance to the study of heterogeneous and dynamic networked scenarios
and to intervention that promotes effective utilization by all participants.
Technical Approach
One way to integrate the network planes in a composite economic model is to view each plane as
consuming/producing resources and services from/for the others, with the details of the
production abstracted and kept within the boundaries of that plane. The market serves to
coordinate the activities of the network planes. We can then evaluate the composite network with
respect to its overall goal of promoting fast, high-quality decision making by its ultimate users.
Note that this is a modeling and control research agenda, and distinct in year one from an explicit
market design agenda.
Our research consists of two subtasks, one of which (on market design) will start in year 1, while
the other (on market modeling) will be deferred to year 2.
Subtask R1.2.1: Market design and Environment Design on Networks (D. Parkes,
Harvard (IRC); M. Wellman, UMich (IRC); V. Kawadia, BBN (IRC))
Task Overview
The broad thrust of this "control"-based element in our economically principled approach to
network science is that we will seek to study:
market-design: design markets to intermediate resource allocation across different
network planes,
environment-design: perturb the decision problems of network actors to change behaviors
in useful ways,
revealed-preference: seek to bring economic control paradigms to bear with passive
methods that do not insist on active elicitation of the preferences of participants.
Task Motivation
Because networks represent inherently decentralized activities, achieving coordinated behavior
of any kind requires some mediating mechanisms (implemented by technical means or simply
rules or conventions) for entities to interact. For example, a communications network requires
some mechanism to determine what bandwidth resources are allocated to what activity at what
time. Economics provides a repertoire of mechanisms, such as call markets, competitive
equilibrium, and auction mechanisms, in support of effective coordinated control for a range of
situations. In engineered networks, a central challenge is to design economic mechanisms to
meet specified control goals.
Prior Work
Computational mechanism design (CMD) [Chapter 10, Nisan et al. 2007, Dash et al. 2003,
Feigenbaum et al. 2002, Feigenbaum and Shenker 2002, Levin et al. 2008, Shneidman and
Parkes 2004] and market design [Roth and Peranson 1999, Roth 2002, Milgrom 2004, Lubin et
al. 2007] seek to design allocation mechanisms to satisfy particular design criteria; e.g., stability
so that no coalition can improve their payoffs by deviation, incentive-compatibility so that no
agent can usefully manipulate the outcome of the mechanism by misreporting preferences,
efficiency so that the aggregate utility of participants is maximized, and fairness (e.g. no-envy) so
that the outcome is viewed as equitable by participants. The proposed work is novel because it
seeks to intermediate across the different functional planes of a complex network, in which the
agents represent functional components rather than individual actors.



Computational environment design [Zhang and Parkes 2008, Zhang et al. 2009] has developed
algorithmic methods for single-agent environments in which the goal is to change the decision
problem of an agent in order to usefully perturb its behavior; e.g., policy teaching seeks to
allocate reward in order to achieve a particular policy. The novelty here is to seek to extend this
agenda to a networked setting. Turning to passive learning of agent preferences: economic
theory insists on agents making decisions that are consistent with utility theory [von Neumann
and Morgenstern 1944], and through the theory of revealed preferences [Afriat 1967] it allows
an agent's utility function to be learned; if an agent chooses decision A over decision B, then this
implies that its utility for A is greater than its utility for B under the current decision context.
For market design, we will need to identify the resources that are under contention across
networks. Only once these are delineated can we pose precise questions about the right form of
mechanism or market coordination. There is a vast wealth of knowledge on the design of
combinatorial auctions [Cramton et al. 2005], matching markets [Roth 2008, Hatfield and
Milgrom 2005], assignment problems without money [Sönmez and Ünver 2008], with the
appropriate market framework heavily dependent on details such as property rights (who has
them), whether payments are available (we intend to ask this question in the context of Army
networks), and the most important design constraints. Further steps will require an understanding
of interactions across the networks and shared metrics for performance of networks, in order to
define the problem of allocating resources across networks. For year one, we will seek to
identify the resources to be allocated across networks, with specific market design deferred to
years two or three and contingent on receiving appropriate metrics and inputs from other centers.
Technical Approach
For environment design, the first step will be to formulate the problem in a network setting with
agents following local, partially-observed Markov decision processes. For this, we have in mind
developing one or two canonical examples of agents acting on networks, modeled as graphs in
which the agents are associated with vertices and edges indicate a utility dependence between
one agent and another agent (so that the utility to one agent is conditionally independent of the
actions of all other agents on the network once the actions of neighbors are determined.) We will
first assume that each agent has a fixed utility function that is unknown to the "principal", who is
responsible for designing interventions to effect useful network-wide behavior, i.e., for maximizing
its utility for the joint action profile on the network through optimal reward/action interventions. For this,
we will need to adopt a particular model for how agents reason about the decisions of
neighboring agents (e.g., myopic best-response or best-response to an empirical average of
utility). It should be reasonable to have a formal definition and an initial algorithmic approach to
solving the problem by the end of year one.
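As a deliberately simplified illustration of the kind of canonical example we have in mind, the
following Python sketch (assuming the networkx library; the coordination-style utilities, reward
values, and names are our own illustrative assumptions) simulates myopic best-response dynamics
for binary-action agents on a graph, with a hypothetical principal adding per-agent rewards to
perturb the resulting joint action profile.

    # Sketch: myopic best-response dynamics on a graph with a principal's
    # reward intervention. Utilities, rewards, and names are illustrative.
    import networkx as nx

    def best_response_dynamics(G, base_value, reward, rounds=20):
        """Each agent picks action 1 iff its coordination utility plus the
        principal's reward exceeds the utility of action 0."""
        action = {v: 0 for v in G.nodes()}
        for _ in range(rounds):
            changed = False
            for v in G.nodes():
                agreeing = sum(action[u] for u in G.neighbors(v))
                u1 = base_value[v] + agreeing + reward.get(v, 0.0)  # utility of action 1
                u0 = G.degree(v) - agreeing                          # utility of action 0
                new = 1 if u1 > u0 else 0
                if new != action[v]:
                    changed = True
                action[v] = new
            if not changed:
                break
        return action

    G = nx.erdos_renyi_graph(30, 0.2, seed=1)
    base = {v: -1.0 for v in G.nodes()}            # agents mildly prefer action 0
    no_intervention = best_response_dynamics(G, base, reward={})
    seeded = best_response_dynamics(G, base, reward={v: 100.0 for v in list(G.nodes())[:5]})
    print(sum(no_intervention.values()), "adopters without intervention,",
          sum(seeded.values()), "with targeted rewards")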
For the direction of revealed preference, which allows the market approach to be hidden from
participants and to embrace both automated and human components, our current work has
assumed complete observation of relevant local environment variables and consistent behavior.
To make progress in realistic network domains, we will need to adopt a robust approach to
revealed-preference learning. In the first year we propose to explore the relative merits of two
different approaches: one will retain a constraint-based approach but allow for some slack in
satisfying constraints, finding a utility model that is robust in the sense of minimizing the
maximum slack (or violation) over all constraints; a second will adopt a machine learning
approach and find the model with the best generalization power. The network aspect of this
research is that the presence of other agents on a network will change the inference about agent
preferences because of utility interdependence. We will insist that the utility model can be used
together with environment or market design. A reasonable goal for the end of year one will be to
develop a couple of robust methods for revealed-preference elicitation and to be ready to test the
methods on noisy data, e.g., data collected from an online labor market, a large-scale online
game, or a network testbed.
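A first cut at the constraint-based variant could be formulated as a small linear program, sketched
below in Python using scipy.optimize.linprog; the feature representation, example data, and
margin convention are illustrative assumptions rather than a committed design.

    # Sketch: robust revealed-preference learning as a linear program.
    # Each observation "agent chose A over B" yields a constraint
    # w . (x_A - x_B) >= 1 - s, and we minimize the shared slack s.
    import numpy as np
    from scipy.optimize import linprog

    # Observed choices: (features of the chosen option, features of the rejected option).
    choices = [
        (np.array([2.0, 1.0]), np.array([1.0, 3.0])),
        (np.array([0.5, 2.0]), np.array([0.5, 1.0])),
    ]
    d = 2                                    # number of utility features (illustrative)

    # Decision variables: [w_1, ..., w_d, s]; each choice gives -w.(x_A - x_B) - s <= -1.
    A_ub, b_ub = [], []
    for x_chosen, x_rejected in choices:
        diff = x_chosen - x_rejected
        A_ub.append(np.concatenate([-diff, [-1.0]]))
        b_ub.append(-1.0)

    c = np.zeros(d + 1)
    c[-1] = 1.0                              # objective: minimize the shared slack s
    bounds = [(-1.0, 1.0)] * d + [(0.0, None)]
    res = linprog(c, A_ub=np.array(A_ub), b_ub=b_ub, bounds=bounds)
    print("inferred weights:", res.x[:d], "maximum slack:", res.x[-1])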
Validation Approach
We will investigate the potential of advanced market designs, particularly combinatorial
[Cramton et al., 2005] and multiattribute [Lochner & Wellman, 2009] auctions, to improve
efficiency when the resources to be allocated are highly interdependent and/or multidimensional.
We will use data collected from the game-based simulation/experimentation environment
(developed in Project R3) to study revealed preference learning of utility functions.

Subtask R1.2.2: Market Modeling (D. Parkes, Harvard (IRC); M. Wellman, UMich (IRC);
V. Kawadia, BBN (IRC))
NOTE: This activity will be deferred to year 2.

In a direct approach to economic modeling, we can map specific network resource allocation
scenarios—or generic resource allocation problems on networks—to literal market systems,
where agents interact with neighbors through the explicit exchange of goods and services at
negotiated prices.
Many previous works have developed computational market-based methods for resource
allocation on networks or other decentralized environments. One of the most active areas has
been in markets for computational resource allocation, starting over 40 years ago, and revisited
periodically thereafter in a range of computational settings [Clearwater, 1995; Kurose et al.,
1985; Mullen & Wellman, 1995; Nisan et al., 1998; Stonebraker et al., 1996; Waldspurger et al.,
1992; Wellman et al., 2001]. Attempts to ground computational markets in microeconomic
general equilibrium theory led to a general methodology for market-oriented programming
[Wellman, 1993; Ygge & Akkermans, 1999]. Recent work in grid and utility computing has
naturally brought a resurgence of interest in the market-based control idea [Buyya &
Bubendorfer, 2009; Buyya et al., 1999; Lai et al., 2005, Lubin et al., 2009; Wolski et al., 2001].
The successes and failures of these various systems—particularly those operated on networks—
are highly instructive for development of computational market infrastructure for all kinds of
resources.
We start from the general-equilibrium perspective, but extend the classical microeconomic
model to operate dynamically over time, and through explicit computational market mechanisms.
Our basic price-determination mechanism is the call market—a two-sided auction that matches
trades periodically according to well-defined bidding and clearing rules [Wurman et al., 1998].
Network structure defines the possible trade pathways, and market functions are implemented by
message passing on these networks. Depending on the scenario, market operations (bidding,
clearing) may be conducted synchronously or asynchronously, with informational latencies
governed by network capacities and market rules.
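To fix ideas, the following Python sketch shows a minimal single-unit call-market clearing step;
the names and the midpoint pricing rule are our own illustrative choices, and real bidding and
clearing rules such as those of [Wurman et al., 1998] involve additional detail (tie-breaking,
price quotes, multi-unit orders).

    # Sketch of a single-unit call market: collect bids and asks over a period,
    # then clear by matching the highest bids with the lowest asks at a uniform
    # price. The midpoint pricing rule is one illustrative choice.
    def clear_call_market(bids, asks):
        """bids/asks: lists of (agent, price). Returns (trades, clearing_price)."""
        bids = sorted(bids, key=lambda b: b[1], reverse=True)   # highest bids first
        asks = sorted(asks, key=lambda a: a[1])                 # lowest asks first
        trades = []
        for (buyer, bid), (seller, ask) in zip(bids, asks):
            if bid < ask:
                break
            trades.append((buyer, seller))
        if not trades:
            return [], None
        k = len(trades)
        price = (bids[k - 1][1] + asks[k - 1][1]) / 2.0         # midpoint of marginal pair
        return trades, price

    bids = [("c1", 10.0), ("c2", 7.0)]
    asks = [("a3", 6.0), ("a4", 9.0)]
    print(clear_call_market(bids, asks))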
As an example, consider a simple supply chain network in which oblong nodes represent agents
and circular nodes represent resources (goods or services) and their corresponding markets.
Agents a1 and a2 are suppliers, who can provide raw materials to the producers a3, a4, and a5,
who in turn can provide finished tasks to the end consumers c1 and c2. For instance, c1 and c2
may be military commanders who seek to achieve tasks 3 and 4, respectively. Depending on the
relative values (priorities) of these commanders, the costs of production, and alternative uses for
the resources, the agents will form exchange connections on the network [Walsh & Wellman,
2003], and one or both of the end tasks will be achieved.

The computational market on a network will directly model canonical resource allocation
scenarios, such as supply chain formation and management (as described above) and allocation
of generic goods (where the network captures communication links).
Validation Approach
We will evaluate the effect of operational market parameters (e.g., periodicity of call markets,
communication latency), network structure, and other environmental conditions (e.g., demand
volatility) on allocation outcomes. The key measure of outcome quality is efficiency: how well
does the market allocate resources to their most valued uses over time.
References

S. N. Afriat, The Construction of Utility Functions from Expenditure Data, International
Economic Review 8 (1) (1967) 67–77.
Rajkumar Buyya and Kris Bubendorfer, editors. Market-Oriented Grid and Utility Computing.
Wiley, 2009.
Rajkumar Buyya, Chee Shin Yeo, and Srikumar Venugopal. Market-oriented cloud computing:
Vision, hype, and reality for delivering IT services as computing utilities. In Tenth IEEE
International Conference on High Performance Computing and Communications, Dalian, China,
2009.
Scott Clearwater, editor. Market-Based Control: A Paradigm for Distributed Resource
Allocation. World Scientific, 1995.
Peter Cramton, Yoav Shoham, and Richard Steinberg, editors. Combinatorial Auctions. MIT
Press, 2005.
Rajdeep K Dash, Nicholas R. Jennings, and David C. Parkes, Computational-Mechanism
Design: A Call to Arms. IEEE Intelligent Systems, November 2003, pages 40-47 (Special Issue
on Agents and Markets).



Joan Feigenbaum, Christos Papadimitriou, Rahul Sami, and Scott Shenker. A BGP-based
mechanism for lowest-cost routing. In Proc. of the 2002 ACM Symp. on Princ. of Distr.
Comput., pages 173–182, 2002.
Joan Feigenbaum and Scott Shenker. Distributed Algorithmic Mechanism Design: Recent
Results and Future Directions. In Proc. 6th Int. W. on Discrete Alg. and Methods for Mobile
Computing and Commun., pages 1–13, 2002.
John William Hatfield and Paul Milgrom, Matching with Contracts, American Economic Review,
95(4), September, 913-935, 2005.
K. Jain and V. Vazirani. Eisenberg-Gale Markets: Algorithms and Game-Theoretic Properties. In
Proc. STOC 2007. To appear in Games and Economic Behavior.
Frank Kelly, Charging and rate control for elastic traffic, EuropeanTransactions on
Telecommunications, 8(1), 1997, 33–37.
James F. Kurose, Mischa Schwartz, and Yechiam Yemini. Amicroeconomic approach to
decentralized optimization of channel access policies in multiaccess networks. In Fifth
International Conference on Distributed Computing Systems, pages 70–77, 1985.
Kevin Lai, Lars Rasmusson, Eytan Adar, Li Zhang, and Bernardo A. Huberman. Tycoon: An
implementation of a distributed, market-based resource allocation system. Multiagent and Grid
Systems, 1:169–182, 2005.
Hagay Levin, Michael Schapira and Aviv Zohar . Interdomain Routing and Games. SIAM
Journal on Computing (SICOMP) Special Issue with selected papers from STOC 08 (accepted
subject to minor revision).
Kevin M. Lochner and Michael P. Wellman. Information feedback and efficiency in
multiattribute double auctions. In First Conference on Auctions, Market Mechanisms and their
Applications, pages 26–39, Boston, 2009.
Benjamin Lubin, Jeffrey O. Kephart, Rajarshi Das, and David C. Parkes. Expressive power-
based resource allocation for data centers. In Twenty-First International Joint Conference on
Artificial Intelligence, pages 1451–1456, 2009.
Benjamin Lubin, Adam Juda, Ruggiero Cavallo, Sébastien Lahaie, Jeffrey Shneidman, and
David C. Parkes. ICE: An Expressive Iterative Combinatorial Exchange. Journal of Artificial
Intelligence Research 33, 2008, pages 33-77
Paul Milgrom. Putting Auction Theory to Work. Cambridge: Cambridge University Press, 2004.
Tracy Mullen and Michael P. Wellman. A simple computational market for network information
services. In First International Conference on Multiagent Systems, pages 283–289, San
Francisco, 1995.
Noam Nisan, Shmulik London, Ori Regev, and Noam Camiel. Globally distributed computation
over the Internet: The POPCORN project. In Eighteenth International Conference on Distributed
Computing Systems, pages 592–601, Amsterdam, 1998.
Noam Nisan, Tim Roughgarden, Eva Tardos, and Vijay Vazirani (eds.). Algorithmic Game
Theory, Cambridge University Press, 2007



Alvin E. Roth, "The Economist as Engineer: Game Theory, Experimental Economics and
Computation as Tools of Design Economics," Fisher Schultz Lecture, Econometrica, 70(4), July
2002, 1341-1378.
Alvin E. Roth "Deferred Acceptance Algorithms: History, Theory, Practice, and Open
Questions," International Journal of Game Theory, Special Issue in Honor of David Gale on his
85th birthday, 36, March, 2008, 537-569.
Alvin E. Roth and Elliot Peranson, The redesign of the matching market for American
physicians: Some engineering aspects of economic design. AER 89(4): 748-780, 1999
Jeffrey Shneidman and David C. Parkes. Specification Faithfulness in Networks with Rational
Nodes. In Proc. 23rd ACM Symp. on Principles of Distributed Computing (PODC'04), St.
John's, Canada, pages 88-97, 2004.
T. Sönmez and U. Ünver. Matching, Allocation and Exchange of Discrete Resources, in
Handbook of Social Economics, 2008
Michael Stonebraker, Paul M. Aoki, Witold Litwin, Avi Pfeffer, Adam Sah, Jeff Sidell, Carl
Staelin, and Andrew Yu. Mariposa: A wide-area distributed database system. VLDB Journal,
5:48–63, 1996.
John von Neumann and Oskar Morgenstern: Theory of Games and Economic Behavior,
Princeton University Press (1944).
Carl A. Waldspurger, Tad Hogg, Bernardo A. Huberman, Jeffrey O. Kephart, and Scott
Stornetta. Spawn: A distributed computational economy. IEEE Transactions on Software
Engineering, 18:103–117, 1992.
Michael P. Wellman. The economic approach to Artificial Intelligence. ACM Computing
Surveys, 1995.
Michael P. Wellman. A market-oriented programming environment and its application to
distributed multicommodity flow problems. Journal of Artificial Intelligence Research, 1:1–23,
1993.
Michael P. Wellman, William E. Walsh, Peter R. Wurman, and Jeffrey K. MacKie-Mason.
Auction protocols for decentralized scheduling. Games and Economic Behavior, 35:271–303,
2001.
Rich Wolski, James S. Plank, John Brevik, and Todd Bryan. Analyzing market-based resource
allocation strategies for the computational grid. International Journal of High Performance
Computing Applications, 15:258–281, 2001.
Peter R. Wurman, William E. Walsh, and Michael P. Wellman. Flexible double auctions for
electronic commerce: Theory and implementation. Decision Support Systems 24:17–27, 1998.
Fredrik Ygge and Hans Akkermans. Decentralized markets versus central control: A comparative
study. Journal of Artificial Intelligence Research, 11:301–333, 1999.
Haoqi Zhang and David C. Parkes. Enabling Environment Design via Active Indirect Elicitation.
In Proc. AAAI Workshop on Preference Handling, Chicago, IL, July 2008
Haoqi Zhang, David C. Parkes, and Yiling Chen. Policy Teaching Through Reward Function
Learning. In Proc. 10th ACM Electronic Commerce Conference (EC'09), pages 295-304, 2009.



6.5.8 Task R1.3: Category Theory Based Approach to Modeling Composite
Networks (M. Kokar, NEU (IRC); V. Kawadia, BBN (IRC);
Collaborators: P. Basu, BBN (IRC); J. Hendler, RPI (IRC); C. Cotton, D.
Sincoskie, UDel (IRC))
NOTE: This is a 6.1 basic research task.

Task Overview
This task focuses on using key ideas from category theory to model composite networks. This
promises to give us an alternative mathematical modeling toolset to augment the structure-focused
tools that will be developed in Projects E1 and E2.
Task Motivation
One of the primary tasks for the IRC is to provide the understanding of how global network
properties or behaviors can be composed from properties of information, social-cognitive, and
communications networks. Consequently, model development must also follow the
compositional pattern: the model development process must capture how the composition
influences the overall structure, and how human effectiveness with respect to the mission can be
assessed and influenced both by the properties of particular networks and by the interactions
among them within the composed model of the network of networks. For this reason, the model
must be crafted systematically, with a focus on providing a high-fidelity representation of the
behavior of the real system.
The problem of derivation of models of composed networks is exacerbated by the fact that in this
project we are not dealing with a homogenous collection of simple objects, but rather with a
collection of networks, which are very complex objects to model and analyze. On top of this
complexity is the additional layer of complication, perhaps the most difficult one, resulting from
the fact that three different types of network are involved. This non-homogeneity, both within
one type of network and among the three types of network, calls for mathematical modeling
tools that are very abstract and thus capable of capturing the commonalities and the differences
among the various components while still providing means for deriving models of such
complicated structures. This is the main reason why we are proposing to use category theory, a
mathematical framework that has proved to be very efficient in capturing representations of
many, seemingly disparate, mathematical structures.
Key Research Questions
What kind of mathematical formalism is needed to model complex heterogeneous
networks in a compositional way?
Prior Work
In this research thrust we are proposing to use the approach to modeling of heterogeneous
networks based on the principles of category theory (c.f. [Goldblatt, 1984; Pierce, 1991]). This
approach to modeling relies on the colimit operator of category theory. Intuitively, the colimit
operator is an extension of the shared union operator of set theory. The shared union operator is
applicable only to sets. The colimit operator, on the other hand, is applicable to any objects in a
category. For instance, if the objects are processes (or algorithms) on a particular network –
social, information, or communications - then the colimit of two processes allows for a
systematic and consistent weaving of the two processes into one, in such a way that pieces of
each process (sub-processes) contribute their own parts to the overall process and their
contributions remain clearly identified. When each of the component processes carries
information about the quality (e.g., the uncertainty measure), they can then be combined into an
integrated quality measure of the composite network, which in turn assesses the effectiveness of
decision making based on the whole system.
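As a simple, concrete analogue, the colimit of two graphs that share a common sub-network can
be thought of as gluing the graphs together along the shared part (a pushout). The following
Python sketch (assuming the networkx library) illustrates this set-level intuition only; the
shared-node convention stands in for the morphisms of a full categorical treatment, and all names
are illustrative.

    # Sketch: composing two networks by "gluing" them along a shared sub-network,
    # a set-level analogue of the categorical pushout (colimit). Node labels that
    # appear in the shared set are identified; everything else is kept distinct.
    import networkx as nx

    def glue(G1, G2, shared):
        """Return the union of G1 and G2 with the nodes in `shared` identified."""
        H = nx.Graph()
        def rename(g, tag):
            return {v: v if v in shared else (tag, v) for v in g.nodes()}
        for g, tag in [(G1, "G1"), (G2, "G2")]:
            m = rename(g, tag)
            H.add_nodes_from(m.values())
            H.add_edges_from((m[u], m[v]) for u, v in g.edges())
        return H

    social = nx.Graph([("alice", "bob"), ("bob", "carol")])        # social network
    comms = nx.Graph([("bob", "radio1"), ("carol", "radio1")])     # comms network
    composite = glue(social, comms, shared={"bob", "carol"})
    print(list(composite.nodes()), list(composite.edges()))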
In this task, the components of the solution will involve the three networks – social, information,
and physical – and the combined effectiveness measure will account for the contributions of the
sub-solutions to the overall goal of the system. This research will contribute to both the modeling
of composite networks and to the composite metrics on networked systems. Since category
theory is such an abstract mathematical theory, it also can potentially serve as a mathematical
tool that can be used to integrate the various mathematical tools within this whole research effort.
This research will rely on the previous work done by this project lead (cf. [Kokar et al., 2004]).
In this work, category theory was used to represent fusion nodes and relationships among the
nodes. The fusion operator was expressed using the colimit concept of category theory.
Technical Approach
We plan to pursue two parallel tracks of research:
Thrust A: Theoretical studies of the use of category theory for the purpose of modeling
composite networks and decision making within such networks.
Thrust B: Experimentation with representation and inference within networked structures
using the theoretical models developed in thrust A. In particular, we will pursue two sub-
thrusts within this thrust:
o Development of a meta-model for representing different kinds of network.
o Analysis of examples of various network structures.
The idea of using category theory for network science was proposed by Bonick [Bonick, 2006].
The main idea there was to model sensor and agent nodes as category objects and information
flows as category arrows. Additionally, the relations among the category objects can be
characterized in a logic, which again can be interpreted in categorical terms. This approach leads
to the use of topos [Goldblatt, 1984]. It is clear, however, that the relations among network
nodes, in particular among networks of different types, are very complex and the modeling of
such relations might require the use of higher categories, i.e., categories whose objects are
arrows from a lower-level category (in the second-order case). This kind of construction
can be carried out for more than two levels of abstraction. We plan to pursue such an approach in
Thrust A of this project.
In Thrust B, we will investigate the use of meta-modeling with the specific focus on the three
kinds of network. This approach is related to the work presented, e.g., in [Krackhardt & Carley,
1998]. In this approach, we propose to develop a meta-model, or a representation language, for
representing various networks. This approach is similar to the modeling practices in software
engineering, where the Meta-Object Facility (MOF) is used to specify modeling languages, like
UML, and then such languages are used to capture models of software systems. This approach
can also be related to the ontological modeling used in the domain of Semantic Web. We
anticipate that results of other projects in this overall effort will be utilized for our purposes, but in
the beginning we will develop a simple meta-model of our own.
In particular, we foresee two levels of modeling network composition:



The macro level modeling in which each network is considered as an object with
properties associated with each network.
The micro level modeling in which each node on the network is considered as an object
with properties associated with each such node.

Subtask R1.3.1: Categories for modeling heterogeneous networks and networks of networks

This task will address the question – what are the categories that are appropriate for representing
complex heterogeneous networks and networks of such networks. This task is the realization of
Thrust A above.
The first seemingly natural attempt would be to use the category of algebras to model this kind
of network. Such a category can be understood as the category of specifications of software
components, i.e., collections of sets and some operations on sets. The operations are always
functions. However, it has been already recognized by the researchers in computer science (cf.
[Anlauff et al., 2004]) that such categories are not adequate for modeling behavioral components.
For instance, while such categories are very useful for modeling functional programs (cf.
[Pavlovic & Smith, 2003]), they cannot model programs with state and global variables. Since
networks considered in this project are not assumed to be functional (i.e., we do not assume that
a network node will always respond to given informational inputs in the same manner,
independently of the recent history of such inputs), we need to look further,
beyond the category of algebras. At this point we anticipate that even the category of Espec,
partially formalized in [Anlauff et al., 2004], will not suffice for this particular problem. We will
use Espec category as a starting point and will propose a new category that will faithfully
represent heterogeneous networks of networks, including the evolution of networks due to both
the passage of time and of external events.
In this way we will pursue (constructively) the plan of research suggested by Bonick. In other
words, the result of this task will be a higher category along with justification of the selection of
the category. One of the important issues in this task will be the decision on how to assess the
appropriateness of a proposed category to the modeling task at hand. For this purpose we will
use the results of Subtask R1.3.2, in which we will develop examples of representations of networks of
networks. These examples will serve as the test cases for the theory we develop in this task. The
feature assessed by this testing will be the fidelity of the representation of a system (in this case
these will be simulated systems) by the proposed modeling formalisms. Towards this goal, the
developed test cases must provide information on what the expected results of the tests should
be.
Subtask R1.3.2: Meta-models for modeling heterogeneous networks and networks of
networks

This task will address the question of what kind of meta-modeling language is needed to
faithfully represent heterogeneous networks and networks of such networks. This task is the
realization of Thrust B above.



At the macro level, this task will focus on the characterization of networks, in particular of their
properties, based upon the properties of their component-networks. In particular, we will
consider the effects of composition in terms of the influences of one of the network types –
Social/Cognitive, Information, Communication – on the other types of network.
We assume that there will be an ontology developed by other teams that will provide a language
for the characterization of such networks. While this ontology will be in the development stage
in the beginning, we will start our research with a simple prototype ontology.
In our first-cut approach, the meta-model for capturing each of the three network types will
include Nodes and Edges. Nodes will be modeled as ontology classes, while edges will be
modeled mainly as ontology properties. In the case of higher-arity properties, such properties
may be reified so that they can be represented in an ontology language like OWL. However, our
approach will not be limited by the restrictions imposed by ontology languages such as OWL.
Examples of such meta-properties may include such relations as connectedTo (network A is
connected to network B), subNetworkOf (network A is a sub-network of network B). Moreover,
classes specific to particular network types will be provided, e.g., Agent (mainly for social
networks), Human, Bot, as well as special properties like knows (Agent A knows Agent B),
trusts (Agent A trusts Agent B), controls (Agent A controls Agent B), influences (Agent A
influences Agent B). Additionally, some attributes (also referred to as datatype properties) will
be provided, e.g., size (of a network).
For modeling Information Networks, the basic class may be Ontology, while the properties may
include such relations as imports (ontology A imports all of ontology B), overlaps (ontology A
has concepts in common with ontology B). Attributes may include size of an ontology in terms
of the number of classes, properties and constraints.
For modeling Communication Networks, the main class may be termed Device, while properties can
include connected (network A is connected to network B). Attributes may include Boolean flags, e.g.,
disconnected (network A is not connected to any network) and temporarily-disconnected (network A
is not connected to any other network for a limited period of time), as well as numeric attributes such
as throughput.
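The sketch below writes this first-cut macro-level meta-model down as plain data. It is illustrative
only: the class and property names are our working assumptions rather than a committed ontology,
and an OWL encoding would follow the same structure.

    # Illustrative first-cut macro-level meta-model (hypothetical names only).
    meta_model = {
        "classes": ["Network", "SocialNetwork", "InformationNetwork",
                    "CommunicationNetwork", "Agent", "Human", "Bot",
                    "Ontology", "Device"],
        "object_properties": {                  # property -> (domain, range)
            "connectedTo":  ("Network", "Network"),
            "subNetworkOf": ("Network", "Network"),
            "knows":        ("Agent", "Agent"),
            "trusts":       ("Agent", "Agent"),
            "controls":     ("Agent", "Agent"),
            "influences":   ("Agent", "Agent"),
            "imports":      ("Ontology", "Ontology"),
            "overlaps":     ("Ontology", "Ontology"),
        },
        "datatype_properties": {                # attribute -> (domain, datatype)
            "size":                    ("Network", "integer"),
            "disconnected":            ("CommunicationNetwork", "boolean"),
            "temporarilyDisconnected": ("CommunicationNetwork", "boolean"),
            "throughput":              ("CommunicationNetwork", "float"),
        },
    }

    # A toy instance-level description expressed against the meta-model.
    instance_triples = [
        ("CompanyNetA", "subNetworkOf", "BrigadeNet"),
        ("CompanyNetA", "connectedTo",  "CompanyNetB"),
        ("Analyst1",    "trusts",       "Analyst2"),
    ]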
At the micro level, the concepts for the different network types will have less in common than at the
macro level; however, some overlap in concepts must exist in order to provide points of reference for
mapping concepts between types. For instance, for Information Networks the meta-classes may be
Class, Property, and Constraint; these are all concepts that are part of ontology modeling languages.
The properties can then be any properties expressible in a language like OWL. For Communication
Networks, nodes can be specialized to NetworkComponent, while properties can include
isComponentOf, contains, and others.
The next step after developing such a meta-model (or ontology) will be to develop a set of cases,
i.e., a set of specific networks represented in the developed formalism. These representations will
serve as test cases for the theoretical concepts developed in Task 1 described above.
It is important to stress here that the specifications of particular networks will contain specifications
of various functions associated with nodes. These functions will be of higher arity, i.e., they will take
possibly many arguments and return vectors of values. Model composition will have to compose the
specifications of such functions consistently, so that the resulting functions are woven together rather
than simply placed next to each other in the resulting specification. This is where category theory
shows its strength in modeling complex compositions.
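The toy sketch below illustrates what we mean by weaving rather than juxtaposition. It is a
drastically simplified, set-level stand-in for a colimit, with hypothetical specification names: shared
symbol declarations must agree and are identified once, while everything else is kept.

    # Toy "weaving" of two node specifications that share symbols; a set-level
    # stand-in for a colimit, for illustration only.
    spec_sensor = {
        "symbols": {"Message": "type", "Position": "type",
                    "report": "Position -> Message"},
        "axioms":  ["a report is emitted every t seconds"],
    }
    spec_radio = {
        "symbols": {"Message": "type", "forward": "Message -> Message"},
        "axioms":  ["forward preserves message contents"],
    }

    def weave(spec_a, spec_b):
        merged = {"symbols": {}, "axioms": []}
        for spec in (spec_a, spec_b):
            for name, decl in spec["symbols"].items():
                if name in merged["symbols"] and merged["symbols"][name] != decl:
                    raise ValueError("shared symbol %r declared inconsistently" % name)
                merged["symbols"][name] = decl       # shared symbols identified once
            merged["axioms"].extend(spec["axioms"])
        return merged

    woven = weave(spec_sensor, spec_radio)
    print(sorted(woven["symbols"]))   # ['Message', 'Position', 'forward', 'report']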



While it might seem that the development of such a meta-model should be a simple matter, it is not.
Meta-models developed in an ad hoc manner will not serve the purpose and will not have a long life.
A good meta-model must capture all the commonalities of a domain while still being able to express
all of its differences. Typically such meta-models are developed for narrow domains. For instance,
while MOF is a good meta-modeling facility for software, it is not well suited to expressing concepts
in the bio-medical domain. Even for software, the history of MOF goes back twenty years, and it is
still recognized that MOF needs to be revised (efforts are underway at the OMG to produce a better
MOF, in particular one with a formal semantics). The lack of a formal semantics for MOF is the
biggest problem with this framework: without it there is no capability for formal, automated
inference. In other words, one cannot mathematically verify a representation of a system with respect
to MOF, nor write software to perform this kind of formal verification.
OWL, by contrast, is a language with a formal semantics, which gives it an advantage over UML;
but OWL has issues that UML does not. For instance, UML's support for behavioral aspects
(although with semantics defined in natural language, and thus not formal) can be considered an
advantage of UML.
However, we anticipate that the need for modeling the behavior of networks, and in particular its
quantitative aspects, will push the requirements beyond the capabilities of both UML and OWL. We
expect that the category developed in Task 1 will constitute the basis for a more expressive language
appropriate for modeling heterogeneous networks of networks.
Relevance to US Military Visions/Impact on Network Science
Consider a scenario in which a platoon leader needs to decide whether to enter a specific area known
for harboring a terrorist group. He needs information that is timely and reliable. He submits a query
to the system. The system then traces all the links (defined in category theory as morphisms) to
identify relevant information sources (social network) and possibly information items (information
network); both are represented as theories or ontologies. The system infers which pieces of these
representations refer to the same things (establishing a graph of morphisms), fuses information from
multiple theories into one using the colimit operation, and finally infers the answer to the submitted
query. In this process, the system obtains timeliness metrics from the communication layer, while
relevance and reliability metrics come from the information and communication subnets.
References

[Anlauff et al., 2004] M. Anlauff, D. Pavlovic, and D. Smith, "Specification-Carrying Software:
Evolving Specifications for Dynamic System Composition," Kestrel Institute preprint, 2004.
[Bonick, 2006] James R. Bonick, "Higher category theory as a paradigm for network applications,"
Proceedings of the SPIE Conference on Signal Processing, Sensor Fusion, and Target Recognition,
Kissimmee, Florida, USA, April 2006.
[Goldblatt, 1984] R. Goldblatt, Topoi: The Categorial Analysis of Logic, North-Holland, 1984.
[Kokar et al., 2004] M. M. Kokar, J. A. Tomasik, and J. Weyman, "Formalizing classes of
information fusion systems," Information Fusion: An International Journal on Multi-Sensor,
Multi-Source Information Fusion, 5(3):189-202, 2004.
[Krackhardt & Carley, 1998] David Krackhardt and Kathleen M. Carley, "A PCANS Model of
Structure in Organization," in Proceedings of the 1998 International Symposium on Command and
Control Research and Technology, pp. 113-119, Monterey, CA, 1998.
[Pavlovic & Smith, 2003] Dusko Pavlovic and Douglas R. Smith, "Software Development by
Refinement," in UNU/IIST 10th Anniversary Colloquium, Formal Methods at the Crossroads: From
Panacea to Foundational Support, Springer-Verlag, 2003.
[Pierce, 1991] B. C. Pierce, Basic Category Theory for Computer Scientists, MIT Press, 1991.

6.5.9 Linkages with Other Projects


While most of the projects can (and will) contribute results that may influence the modeling efforts
of this particular project, the following projects are in direct linkage with Task R1.3.1 and Task 2.4.2
of this project.
Task 2.4.1 has linkages with projects E1.1, E1.2, and R3.4. The theory developed in Task 2.4.1 can
provide insights into how to develop ontologies in E1.1 and how to construct composite metrics from
the metrics assessing social/cognitive, information, and communications networks. Feedback from
E1.1 and E1.2 will be very important for assessing the appropriateness of the theoretical direction
taken in Task 1 of R1.3.
Task 2.4.2 has linkages with E1.1 and E1.2. In particular, the ontologies developed in E1.1 could be
used for the purpose of meta-modeling in Task 2.4.2. Task 2.4.2 will also develop an initial ontology
that could be considered by E1.1.
Both Task 2.4.1 and Task 2.4.2 have strong connections to R3.4. The PIs in R3.4 will develop
concrete descriptions of multi-layer networks that are accurate representations of real networks of
networks. This will enable us to test the theoretical results developed in R1.3 on real-world cases.
Conversely, the meta-modeling techniques developed in R1.3 will serve as a way of representing the
descriptions developed in R3.4 in a formal meta-modeling language.
Task R1.1 is tightly related to several other efforts within the IRC and other centers. Specifically,
IRC R1.2 and R1.3 will provide fundamental theory and models that will be used as a reference point
in this project. In addition, there is good synergy with SCNARC S2 (Malik Magdon-Ismail, RPI),
especially its Task 1, Detection of Hidden Communities and Their Structures, and with task E3.3 on
Evolution of Networked Communities and Impact on Integrated Networks. Both of these projects
will help guide the definition of communities and shape the clustering algorithms developed in this
project. Finally, this project will provide useful tools to the IRC's Intelligent Network Visualization
research (year 2), while the visualization tools can in turn help researchers in this task by providing
visual feedback on the effectiveness of the developed algorithms and tools.
Note that the project lead, Faloutsos, is also involved in other tasks (Trust and EDIN), which will
provide natural linkages. Specifically, Faloutsos is involved in Trust 2.2 (Task 2: Network Behavior
Based Indicators of Trust) and T3.1 (Trust Establishment via Distributed Oracles). The project here
will provide tools and models (sampling and clustering toolkits) to facilitate the study of trust
establishment in large real networks. In addition, Faloutsos is also involved in EDIN E3: Dynamics
and Evolution of Composite Networks, specifically Task E2.1 (Mathematical models and
representations of inter-linked time-dependent networks) and E3 (Evolution of Networked
Communities and their Impact). Clustering is closely related to the identification of communities and
their evolution in time, so there will be a two-way interaction between the needs of EDIN tasks E2.1
and E3.3 and the development of clustering methods here. Finally, concepts from R1.2 will be used
in defining communities and their functional roles.

IPP Tasks Linkage

R1.1 → T2.2, T3.1: Methods and tools in R1.1 will be provided to the Trust research.
R1.1 → E2.1, E3.3: R1.1 will provide clustering methods and tools, while E2.1 and E3.3 will define
the meaning and requirements of what clusters/communities should capture.
R1.2 → E2.1, E3.3: Market-based graph concepts from R1.2 will be used in defining communities by
considering their functional role.
R1.3 → E1.1: Ontologies developed in E1.1 will be reconciled with the ontologies developed in R1.3.
R1.3 → E1.1, E2.2: New methods of modeling composite networks, leveraging results from category
theory, will provide means for developing the models in E1.1 and E2.2.
R1.3 → E1.2: Metrics developed in E1.2 will be used for modeling the impact of particular network
components (social/information/communications) on the composed metric.
R1.3 → E1.2: Methods of composition of metrics developed in R1.3 will be used in E1.2 to test newly
proposed metrics.
R3.4 → R1.3: Concrete realizations of models of multi-layer networks developed in R3.4 will be used
in R1.3 to test the appropriateness of the developed meta-models and ontologies. In general, the PIs
in R3.4 will provide subject matter expertise in capturing the relevant aspects of multi-layer networks.
R1.3 → R3.4: The meta-model developed in R1.3 will be used in R3.4 to express models of
multi-layer networks developed in R3.4.

6.5.10 Collaborations and Staff Rotations


Resource allocation problems in CNARC. Thomas La Porta and Amotz Bar-Noy are addressing
various problems in resource allocation on communication networks (from an optimization
perspective) and have expressed interest in working with us. One linkage would be for us to take
some of their problems as subjects for economic mechanism design. Another would be for them to
specify optimization algorithms for agents to use as participants in economic mechanisms.



Economic models of social networks. Jaideep Srivastava (IRC participant from U Minnesota) does
data-mining work on social network data, in particular large-scale online games. He has expressed
interest in working with us on economic modeling. One linkage would be for him to provide us with
data about economic interactions. Another would be for us to propose ways to model the economic
activity in these environments, for them to exploit in data analysis.
Task R1.1 will receive input from IRC R1.2 and R1.3 in terms of fundamental theory and models.
Collaboration with SCNARC S2, especially its Task 1, Detection of Hidden Communities and Their
Structures, will be facilitated through Malik Magdon-Ismail (RPI) and Basu from this project.
Contact with EDIN Project E3 will be developed through the interaction of the respective leads,
Szymanski and Faloutsos. Finally, this project will provide useful tools to IRC R3 through Hancock
(ArtisTech).

6.5.11 Relation to DoD and Industry Research


The project makes fundamental contributions to our ability to extract knowledge from measured data
and observed behavior. Some concrete examples: (a) we can identify communications nodes or
sensors that have fallen into enemy hands and are being used to impede the operation of the network,
whether by limiting communication capabilities (communication layer) or by introducing erroneous
and misleading data (information layer); (b) we can identify information exchanges, such as email,
that point to abnormal behavior, e.g., someone attempting to contact or access people and information
that they should not. The capabilities developed in this project will directly benefit our ability to
establish trusted networks that are reliable in their operation and protected from malicious users.

6.5.12 Project Research Milestones

Research Milestones

Due  Task    Description
Q2   R1.2.1  Identify the resources to be allocated across network planes in composite network
             scenario. Responsible: Harvard/Michigan.
Q2   R1.3.1  Representation of a network of networks in the Espec category. Responsible:
             Northeastern University.
Q2   R1.3.2  Development of the macro meta-model for modeling networks of heterogeneous
             networks. Responsible: Northeastern University.
Q2   R1.1    Extend sampling methods to provide accurate and realistic samples. Develop and
             evaluate an initial toolset of clustering algorithms. Responsible: UC Riverside.
Q3   R1.2.1  Develop methods for robust revealed preference elicitation, and prepare to test the
             methods on real behavioral data. Responsible: Harvard/Michigan.
Q3   R1.3.1  Identification of potential candidate categories and identification of extensions needed.
             Responsible: Northeastern University.
Q3   R1.3.2  Development of the micro meta-model for modeling networks of heterogeneous
             networks. Responsible: Northeastern University.
Q3   R1.1    Evaluate sampling methods analytically and experimentally. Extend clustering methods
             to the specific requirements of realistic networks. Responsible: UC Riverside.
Q4   R1.2.1  Develop a formal definition of environment design for networked actors and an initial
             algorithmic approach. Responsible: Harvard/Michigan.
Q4   R1.3.1  Development of a new higher category for representing networks of heterogeneous
             networks. Initial testing of the category on cases of Task 2. Responsible: Northeastern
             University.
Q4   R1.3.2  Development of descriptions of a set of networks in the meta-model developed in the
             previous quarters. Responsible: Northeastern University.
Q4   R1.1    Deliver a comprehensive toolset for sampling and clustering real network data
             efficiently. Deliver reports on fundamental properties and patterns of real networks.
             Responsible: UC Riverside.

6.5.13 Project Budget by Organization

Budget By Organization

Organization     Government Funding ($)    Cost Share ($)
BBN (IRC)        478,743
Harvard (IRC)    79,915
NEU (IRC)        77,693
UCR (IRC)        124,511
UMass (IRC)      50,288
UMich (IRC)      93,559
UMinn (IRC)      45,611
TOTAL            950,320

6.6 Project R2: Characterizing the Interdependencies Among Military Network Components

Project Lead: J. Hendler, RPI
Email: hendler@cs.rpi.edu, Phone: 518-276-4401
Project Lead: M. Dean, BBN
Email: mdean@bbn.com, Phone: 734-997-7439

Primary Research Staff Collaborators


K. Carley, CMU (IRC) M. Martin, CMU (IRC)
B. Carterette, UDel (IRC) M. Kowalchuk, CMU (IRC)
P. Basu, BBN (IRC) J. Reminga, CMU (IRC)
N. Contractor, NWU (IRC) J. Han, UIUC (INARC)
J. Hendler, RPI (IRC) B. Szymanski, RPI (SCNARC)
M. Dean, BBN (IRC) TBD, CNARC

6.6.1 Project Overview


In 1948, Claude Shannon [Shannon48] identified some of the key properties of information
flowing through networks. In particular, Shannon explored fundamental limits on compressing,
reliably storing, and communicating data. Modern military communication networks, as well as
much of the modern IT industry, use the information theory derived from Shannon as a key
mathematical backbone of their work. Unfortunately, when it comes to the composite networks being
studied in the NS CTA, current information theory only goes so far. In this project we explore how to
extend the information theory that has been so important to so many fields so that it applies to
complex composite military networks. Specifically, we explore the extension of Shannon's
definitions to modern networks, we examine how the effects of information loss can be modeled at
the level of social and cognitive networks, and we explore decision and utility models that can help
in formulating the overall network. In short, a key focus of this project is creating a formal model of
the interdependencies between the different components of composite networks.
This project is broken into two parts: 6.1 research on the formulation of the basic models, and 6.2
research (especially in the later years) that will focus on how these models can help us understand
how to disrupt enemy networks and best protect our own, as well as on how to use this work to
enhance team effectiveness for military units.

6.6.2 Project Motivation


The fundamental mathematics used in understanding the transmission of information in
communication networks is Shannon's information theory [Shannon48]. While this theory does not
account for every facet of communication networks (for example, it does not address the capacity of
networks of many interacting nodes), it is the seminal work that has provided the basis for much of
the mathematical work in communication networks: it provided a mathematical formalization of the
notion of information (a bit stream) being transmitted between entities, and allows for the specific
definition of notions such as information loss, encoding and decoding errors, and information
entropy. In short, it allows us to probe the fundamental limits on many aspects of transmitting data.
However, Shannon's theory and its many descendants are deficient in an important way when
considering modern military networks, which must include not just the communications network but
also the overall information network and the social and cognitive connections it enables. While
Shannon's theory and its descendants yield critical insights into many aspects of networking, they do
not properly account for the transmission of complex information with respect to whether the
receiver "understands" that information and acts upon it appropriately. Simplifying greatly, we can
say that under Shannon's definitions information transmission is essentially a matter of encoding and
decoding, and not of human communication or understanding of that information.
In the creation of the NS-CTA, it is clear that a key challenge will be to create better understandings
of information and social networks, including coming up with a precise definition of what the
modern information that is transmitted through an information network actually is. Clearly the
purpose of a communication network is to transfer information, and the people in social networks are
linked together not simply by edges in a graph labeled "friend," but by the information they can share
through those links. Thus, although there are many definitions of "information," defining information
networks and differentiating them from social networks remains a challenge.
In social networks, we also see that a new model is needed when we examine some of the typical
encodings of these networks. For example, a model of the Facebook system with respect to the
underlying graph of "friends" can reveal important aspects of network structure: for example, which
nodes are central, that the network is primarily scale-free, and that some particular features of the
network follow Zipf's law. However, such a model cannot explain why people have chosen the
particular friends they have, what information has been transmitted between those friends, and how
that information is interpreted. In a military setting, these simple links do not properly encode the
hierarchical links between warfighters and their command structures, nor can they account for the
changes in structure that occur on the battlefield (for example, when a participant is injured or killed
and the command structure must be modified immediately for continued mission success).



Similarly, while we may be able to understand the network structure at the graph level, we may not
be able to explain why that structure arises: preferential attachment, for example, gives us a model of
network formation, but we cannot explain what sorts of information lead to preferential attachment
in the real world. We know as well that there are macro-effects that describe many kinds of networks,
from the formation of galaxies to the food chains of microscopic life in a drop of water, but models
that ignore the flow of information cannot possibly account for some key aspects of human network
use. For example, a military team may not be able to function properly if a critical message is lost,
which could lead to the destruction of the team (and thus to a change in the network structure);
identifying what information might be impacted, and how, is crucial both to being able to disrupt an
adversary's function and to securing our own.
In the social sciences, the analysis of social networks (a term usually attributed to [Barnes54]) takes
on a very different flavor, looking precisely at the human factors that determine who might join a
network, why someone might leave a network, who the critical nodes are, and so on. Until recently,
these analyses have generally focused on relatively small networks, and they have allowed the
exploration of "signatures" that reflect network formation with respect to aspects of human social and
cognitive life such as family, health, incentives, and religion. Network analysis has helped us define
concepts such as social capital, measures of node placement and importance, and various kinds of
social ranking structures. While these are crucial in looking at specific networks, the current models
tend to be descriptive, telling us, for example, that homophily (the tendency to associate with
individuals with similar characteristics, [Lazarsfeld and Merton54]) will form around social class in
some cases (such as MySpace) and locality in others (such as online gaming); they do not provide
explanatory mechanisms that let us know why these effects arise in some cases and not others, or
how they affect macro structure (for example, community formation) in larger networks. Further, and
more importantly, they do not generally take into account the issue of information loss: for example,
how is a mission's success affected if it is discovered that a network link has been compromised?
Oversimplifying to some degree, the traditional sciences are founded on the need to explain the
relation between micro and macro effects. In biology, we must understand subcellular interactions if
we are ever to cure diseases like cancer. In physics, we know that any individual particle is affected
at the quantum level by many things, but given enough of them we can predict the behavior of a gas.
Analogously, social scientists have generally looked at the microstructures of social networks, while
physicists and computer scientists have explored the macro. New work is needed to bring these
together, to scale the analysis, to explain the relationships between the levels, and to link our
understanding of individual behaviors to the larger network structures they produce. This
micro-to-macro linkage is at the heart of network science, and it is necessarily connected to a much
more complex notion of information (and social relations) than is reflected in the pure mathematics
of Shannon's theories.
In order to study systematically the linkages between information, social/cognitive, and
communication networks, it is vitally important that we develop a new network-centric
vocabulary for referring to the concepts in each network, and to describe the relationships
between them. Our goal is to extend traditional information theory to provide mathematically
sound definitions that will not only support the individual efforts of the ARCs and the



interdisciplinary goals of the IRC, but also will provide a common knowledge base for the use of
the network-science community at large. The starting place is to understand the interaction
between communication, information and social networks and to model more completely what
the dependencies are.
The long-term goal of this project is thus to explore the extension of the mathematics of information
theory to cover all of the areas of network science, not just communication networks. To do this, we
must explore how the definitions of information that have been so powerful for communication
networks can be revised to help us model information networks and their interaction with social
networks. We need to explore how the loss or disruption of information networks will affect the
function of military networks, and how to prevent or mitigate the effects of such loss on the networks
we must maintain. Finally, we must understand information from the point of view of its utility over
time or in a particular context: knowing the enemy's plan of battle the day after it occurred is of no
value; knowing it the day before is priceless.
Challenges of Network-Centric Operations
The US military commander has tremendous resources under his command and can often choose
among many methods for achieving an end. Often, disrupting the enemy can be done by an
immediate kinetic effect, ranging from aerial attack to ground force actions. The decision is made by
the commander based on command intuition, honed through a combination of training and the
modeling of potential effects. Making a correct choice of option can save lives, time, and military
resources; a poor choice can be a significant setback or even, in some cases, a tragic error.
Unfortunately, when it comes to longer-term effects, especially against a non-nation-state actor in a
complex geo-political situation, it is much harder for commanders to know what to do. There may be
many non-kinetic options available, including different kinds of information operations (some of
which may not have effects until triggered at a later time) and even psychological operations aimed
at a long-term effect on the political context of the situation in which the end must be achieved. In
these cases, the commander may be called upon to decide between options that have never been used
before, whose effects may not be well modeled, and for which training may not have been available.
Further, kinetic effects may also be needed as part of the action, for example physically disrupting a
communications network to cover the insertion of misinformation. The purpose of this project is to
help do the necessary science to provide the models, and eventually to support their use in both
planning and training, for network-based information actions against an adversary.
Example Military Scenarios
The leader of an anti-US militia is known to be hiding somewhere in a tribal area under cover of
civilians. A physical attack on a communications network is not likely to work, as the savvy foe is
using a combination of Internet sites, cell phone drops, and physical messages to obtain his
information. The US commander, having learned from intelligence sources that the enemy is not
actively planning an attack in the immediate future but is gathering resources for a longer-term
operation, wants to determine how best to disrupt the planning. INSCOM informs him that
misinformation could be inserted into the enemy's network that might have an effect in the next few
weeks, but the adversary would be much more alert once the deception is uncovered. A national
intelligence asset suggests an approach that, at a later time of the commander's choosing, could
totally corrupt the enemy's situational awareness, but it could be used only once and for a short
duration, and an air strike on a communication node might be needed to cover the insertion. A
representative of the 96th CA Battalion suggests that a better approach would be to take a longer
time and strengthen US ties with a refugee community in the same area; this would create significant
intelligence potential, make the adversary's command and control of the militia much more difficult,
and potentially strengthen US supply lines by lessening hostile activity in the area should a direct
military option be needed. Based on the underlying theories developed in Task R2.1, and tested for
transition in R2.2, the commander is able to run simulations based on validated models and to
determine which of these three competing options has the highest chance of success with the fewest
unintended consequences.
Impact on Network Science
In the creation of the NS-CTA, it is clear that a key challenge will be to create powerful models of
information and social networks, including coming up with a more precise definition of what the
information that is transmitted through a modern information network actually is. Clearly the purpose
of a communication network is to transfer information, and the people in social networks are linked
together not simply by edges in a graph labeled "friend," but by the information they can share
through those links. Thus, although there are many definitions of "information," defining information
networks and differentiating them from social networks remains a challenge. This project focuses on
providing both key theoretical results and validated models that can help provide the crisp
distinctions among different types of information (and thus, for the military, information operations)
that are needed for better modeling of composite networks.

6.6.3 Key Project Research Questions


It is clear that to make significant progress on this important task over the duration of the NS-CTA,
we must address several key issues:
How can we extend Shannon's theories, or develop new mathematics of similar stature, to account
for the phenomena we must model in modern military networks of all kinds?
How can we devise appropriate utility measures that can evaluate the change in the value of
information over time and distance, in the physical case, or against social factors such as a change of
context (for example, where the information came from) or a change in trust between the
participants?
A particularly important use of this new theory would be to exploit, in the case of adversary
networks, or protect against, in the case of our own, the loss of information in social networks and
the cognitive aspects of decision makers. How can this be modeled mathematically? In addition, how
can we use these models to help better understand and enhance team performance?
For the initial program plan, these two problems will be explored in parallel, but with strong links
between them to ensure that each is developed in the context of the other. In the following years, we
expect to evolve these tasks into a single unified framework which provides the key underlying
mathematics for modeling modern military networks.
Key linkages will be working with the INARC team (project I3.1) to help define information
networks, and working with the SCNARC team to help explore how information issues can
affect adversary social networks (project S2.1). We will also work closely with the CNARC on
the extensions of Semantic Information Theory and how the work in this task can be used as part



of the overall QoI definitions. We will also work closely with EDIN task E1, which focuses on the
modeling of dynamic effects in networks; we note that the personnel overlap between E1 and R2.1 is
specifically designed to permit joint modeling and to keep these tasks synchronized with each other.

6.6.4 Initial Hypotheses


We hypothesize
That modeling the information flow in composite networks will require extending the
definitions from Shannon‘s Information Theory to understand how terms that have
successfully been used in modeling communications networks can be applied to these
new network types.
That we must have a model of the utility of information in a specific context if we are to
be able to model the impact of information disruption on adversary networks (or block
those disruptions to our systems).
That we can use these results to provide validated models as to how to upset adversary
networks, and defend our own, in practice.

6.6.5 Technical Approach


This project focuses on providing formal underpinnings to work in defining, modeling and
differentiating Information Networks and Social Networks at a fundamental level. The project
has two tasks:
Task R2.1: Semantic Information Theory: How can we extend Shannon‘s theories, or develop
new mathematics of similar stature, to account for the phenomena we must model in modern
military networks of all kinds? How can we devise appropriate utility measures that can evaluate
the change in value of information over time and distance, in the physical case, or against social
issues such as change of context (for example where the information came from) or change in
trust between the participants?
Task R2.2: Impact of Information Loss and Error on Social Networks: How can we use this new
theory, especially the modeling of explicit information loss, to exploit, in the case of adversary
networks, or protect against, in the case of our own, the loss of information on social networks
and the cognitive aspects of decision makers.
We believe these tasks, taken together, will provide a mathematical foundation for modeling
networks; an approach to modeling, and later affecting, our adversaries' networks (and protecting our
own); and an approach to more precisely understanding key design issues in composite networks. In
year 1, we focus on key definitions and scientific experimentation for validation. In later years we
expect that technological development and transition will result from this early work and form a key
part of the transition strategy with respect to network design, defense, and attack.
These tasks are described below.



6.6.6 Task R2.1: Semantic Information Theory (J. Hendler, RPI (IRC); P. Basu,
BBN (IRC); N. Contractor, NWU (IRC); B. Carterette, UDel (IRC))
NOTE: This is a 6.1 basic research task.

Task Overview
The fundamental mathematics used in understanding the transmission of information in
communication networks is Shannon‘s information theory. This seminal work provided
mathematical formalization to the notion of information (a bit stream) being transmitted between
entities, and allows for the specific definition of notions such as information loss, encoding and
decoding errors, and information entropy. In short, it allowed us to probe the fundamental limits
on compressing, reliably storing, and transmitting data. The impact of Shannon‘s work is huge,
and applications of the theory include lossless data compression (e.g. ZIP files), lossy data
compression (e.g. MP3s), and channel coding (e.g. for DSL lines). Further, the field is at the
intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical
engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the
invention of the compact disc, the feasibility of mobile phones, the development of the Internet,
the study of linguistics and of human perception, the understanding of black holes, and numerous
other fields. Important sub-fields of information theory are source coding, channel coding,
algorithmic complexity theory, algorithmic information theory, and measures of information.
However, Shannon's theory and its many descendants are deficient in an important way when
considering modern information networks. While the theory talks about "information," it really
focuses on data, that is, the transmission of bits and the ability of a device to reconstitute those bits at
the receiving end. It does not take into account the critical aspects of the transmission of information
with respect to whether the receiver "understands" that information. Thus, under Shannon's
definitions, information transmission is essentially a matter of encoding and decoding, and not of
human communication of that information.
Task Motivation:
Modeling the modern information network thus requires a significant extension of Shannon's work.
Social networks are made up of nodes that represent people (or organizations) and links that
represent their relationships. The information that flows through those networks cannot be viewed as
just bits; it is bits that encode specific meanings, and those meanings are comprehensible only in the
presence of specific semantics. Information networks carry this encoded meaning, but it is only
useful when interpreted by humans, and that notion of interpretation (or understanding) is not
covered by Shannon's work. Thus, there is a crucial link between social and information networks
that we must better explore, understand, and explicate in order to develop a network science that can
explain these newer forms of networks.
Key Research Questions
Can the concepts of information theory be extended to the composite networks studied in the NS
CTA in order to model modern composite military networks?
Initial Hypotheses
We hypothesize that modeling the information flow in composite networks will require
extending the definitions from Shannon‘s Information Theory to understand how terms that have
successfully been used in modeling communications networks can be applied to these new
network types. We will start by extending definitions from information theory as applied to



communication networks to information networks and Semantic and Cognitive networks. We also
hypothesize that we must have a model of the utility of information in a specific context if we are to
be able to model the impact of information disruption on adversary networks (or block those
disruptions to our systems). We will begin this work by extending utility theory to information
networks and defining the requirements of a framework for adding this kind of utility into the
extended information theory.
Prior Work
Traditionally, data sharing between computational networks has followed schemes largely derived
from functional flows that identify known information-sharing links, as opposed to creating an
overall network. Military data networking has primarily been ad hoc, requiring a plethora of systems
and servers to be deployed and special-purpose mappings to be defined pre-conflict or in theatre.
More recently, centralized approaches using XML or other interoperability formats have been
proposed, leading to improvement in the situation, but at a high a priori design cost and with
significantly limited flexibility in the field (cf. [Hendler02]).
In the area of communication networks in particular, there is much well-understood underlying
mathematics, most of it dependent on Shannon's information theory [Shannon48] and the large body
of work derived from it over the past fifty years. The ability to analyze information transfer and other
scaling factors in such networks lets us, for example, increase robustness to failures or improve QoI
in communications networks (as proposed by the CNARC). However, for information sharing in
large data networks, or in other data-exchange systems, there has been great difficulty in applying
such models. The ability to generate quantifiable models, and corresponding optimization procedures,
would be a major step forward for information networks.
One of the key approaches to data integration has been the development of common schemas, or
more recently ontologies, to allow semantic integration of data sets ([Batini86] and [Noy04] provide
overviews of work in these areas). In [Hendler06], we proposed a model of information fluidity (see
Figure 1.9) that would allow qualitative models of information network design, and corresponding
optimal flows, to be developed. The approach applies an information-theoretic view to data sharing,
based on the approaches used in communication networks, but with a twist that adds in the effort of
information-sharing design. To simplify somewhat, in traditional information theory, if we have two
sources exchanging a signal, then their communication is dictated by signal-to-noise ratios. However,
if we look at the exchange of data, we must add a translation cost if the sources use different formats
or come from different domains. We balance this against an effort calculation needed to model the
complexity that results from the fact that reaching useful agreement in larger communities (i.e.,
standardizing all data) is harder than in smaller communities (arranging data sharing between a small
number of systems).
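A purely illustrative formalization of this balance, in our own placeholder notation (this is not the
model of [Hendler06]): Shannon gives the capacity of a noisy channel of bandwidth B and
signal-to-noise ratio S/N as

    C = B log2(1 + S/N),

and a data-exchange analogue might discount the usable information rate by a translation cost,

    I_eff = C - T(O_s, O_r),

where T(O_s, O_r) >= 0 is the expected per-message cost of translating between the sender's and
receiver's ontologies O_s and O_r, balanced against an up-front agreement effort E(k) that grows
with the number k of systems that must standardize their data.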
Technical Approach
This task has two subtasks, one focusing on the specific definitions for Semantic Information Theory
and the other on developing utility mechanisms for use in modeling composite networks. [Shannon48]
explicitly excluded meaning (semantic aspects of communication) and utility (which is based on
meaning) from consideration; efforts to extend the theory should presumably incorporate both in a
coordinated fashion.
Subtask R2.1.1: Defining Semantic Information Theory terms
Our starting place in this study, and the focus of our IPP work, is to explore whether we can model
data mismatch (e.g., in translating between ontologies employing different assumptions or levels of
granularity) analogously to noise introduced in transmission, so that something akin to entropy
minimization could produce optimal information network designs. To do this we need to work with
INARC task I3.1 to define information networks more rigorously, and especially to come up with a
proper framework for talking about information loss (to be used in subtask 2 below). We must also
explore the work on social network modeling from the SCNARC, project S2, and see how the formal
notion of information networks (and particularly the formalization of information utility we will use
in task 3.3 below) fits into these models. Thus, for example, we must be able to describe a network
effect such as preferential attachment not just in terms of the network graph, but with respect to what
information decision-makers need and how their networks develop to provide it.
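As a minimal, purely illustrative stand-in for such a mismatch measure (not a committed definition;
the Jaccard form and the vocabulary names below are placeholders of our own), the "noise"
introduced by translating between two ontologies can be approximated from the overlap of their
concept vocabularies:

    # Toy stand-in for a data-mismatch ("translation noise") measure between two
    # ontologies, based only on the overlap of their concept vocabularies.
    def mismatch_cost(concepts_a, concepts_b):
        a, b = set(concepts_a), set(concepts_b)
        if not a and not b:
            return 0.0
        return 1.0 - len(a & b) / len(a | b)    # Jaccard distance in [0, 1]

    brigade_ontology   = {"Unit", "Position", "Track", "Report"}
    coalition_ontology = {"Unit", "Location", "Track", "SpotReport"}
    print(mismatch_cost(brigade_ontology, coalition_ontology))   # 0.666...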
Longer term, and eventually transitioning to the 6.2 level, we must work on turning the mathematics
developed in this task into a design approach that can be used to identify the key mappings for
real-world systems and to grow the optimal networks from those key links. We believe that a design
methodology that uses operational concepts to derive key links, a preferential-attachment or similar
model to identify "clusters" of sharing needs, and an optimization model over these would be both
possible and desirable for designing the information-exchange needs for military information network
design and deployment.

Figure: Information fluidity may allow us to more rigorously define quantifiable effects of
information transmission errors based on changes in social or semantic contexts.

Validation Approach
As a primarily mathematical approach, the model will mainly be validated through application to
scenarios (e.g. use cases) and intuitions coming from more applied collaborating tasks (e.g.
information loss in R2.2). Early results (year 1) will primarily focus on definitions that attempt
to expand those of Shannon theory to the new network types; later work will include proofs of
key theorems and development of scientific experiments to test key hypotheses of the model.
One key aspect of validation will be showing that the new definitions, which extend Shannon‘s
approach, will



Summary of Military Relevance
This task will provide the formal basis for the approaches being used in the remaining tasks in this
project and for many of the key information-related models needed throughout the NS CTA. In the
long run we expect that, like the Shannon information theory on which it builds, this work will
provide fundamental new insights into technology development for military networking.
Research Products
Primary research products will be definitions, mathematics, and formalism on which other work
in the NS CTA can build. Initial formulations will be posted on the NS CTA Wiki for review and
comments and incrementally evolved. In the later years, we expect that experimentation based on
this model will help us to build technological solutions, at the 6.2 level, that enable better
exploitation of the interdependencies between composite network components.
Subtask R2.1.2 Modeling Utility in Composite Networks
Another important theoretical approach to modeling networks with information flow is utility theory.
Knowing from Task R2.1.1 that it is possible, say, to provide significant information flow with little
entropy under certain conditions, we still need to model the utility of that flow if we are to determine
whether those conditions are desirable to obtain. This is because much of the information flowing
through networks at the socio-cognitive level will ultimately be used by a human operator to make
some decision. For this information to be useful to that operator, it must provide utility in making
those decisions. This is information that can be easily processed by humans, but the mathematical
models required for automatic processing do not (yet) exist with enough generality for the modeling
needs of complex composite networks. Evaluating the utility of this information, and specifically the
utility of the information provided by a particular network configuration (including nodes, algorithms
running at nodes and between nodes, connections between nodes, etc.), is a difficult but important
problem.
The broad scenario we envision is a specialist/operator at one node in an information network
issuing a query that propagates through the network; at each node that receives the query, it is
processed and potentially relevant material is retrieved, packaged, and returned to the querying
node for consumption. The operator then uses that information to make some decision. In doing
so, the operator must deal with at least some of the following:
1. Non-relevant information returned by nodes. This is likely to happen when querying
highly unstructured databases comprising natural language (such as full-text databases of
written reports), images, video, and so on.
2. Redundant information.
3. Untrustworthy information returned by nodes.
4. Corrupted information returned by nodes (due to communications failures or other
problems).
5. Information returned by distrusted nodes.
Generally speaking, each of these negatively affects the utility of the information presented to the
operator. We would like to be able to evaluate the expected utility of the information retrieved in
response to an arbitrary query within a particular information network, defined by the nodes in the
network, the type of data stored at those nodes, the representation of that data, and the algorithms
used to retrieve it and package it for consumption. We will in fact need to be able to evaluate
expected utility in order to improve the overall performance of the network.
Our research questions, therefore, are:
1. How do we measure the utility of information returned in response to a query?
2. Given a particular utility measurement, how do we compute it in an efficient way without
losing any accuracy?
3. Given a particular utility measurement calculated over expert human judgments, how can
we amortize the cost of those judgments over many such measurements (or experiments)?
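One simple illustrative form that a utility measure addressing the first question might take
(placeholder notation of our own, not a committed metric) scores each returned item by its judged
relevance and then discounts it for the factors listed above:

    U(q) = sum over items i of rel_i * trust_i * (1 - corrupt_i) * novelty_i,

where rel_i is the human-judged relevance of item i, trust_i combines the trustworthiness of the item
and of the node that returned it, corrupt_i estimates the degree of corruption, and novelty_i discounts
redundancy with items already returned.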

Technical Approach
The field of Information Retrieval (IR) offers many tools to approach these research questions.
We define IR as the study of computational methods for searching, organizing, and analyzing
very large, semi-structured, heterogeneous databases of text, images, video, and other media that
are easily understood at the level of human cognition but for which mathematical models can at
best only capture the semantics at rudimentary levels. We therefore view IR as a problem in
Artificial Intelligence. The basic function of an IR system is to take a query entered by a human
user and return a set of documents ranked in decreasing order of probability of relevance to the
user.
One of the key problems in IR is measuring the utility of results ranked in response to a query.
Utility is generally measured in terms of the relevance of the results; two of the most common
measures are precision (the proportion of retrieved material that is relevant) and recall (the
proportion of relevant material retrieved). Since relevance is a semantic (or pragmatic) property,
it can only be well-judged by human assessors. These assessors incur a cost in time and money,
and the cost is potentially very high: in particular, if they must have expert knowledge to be able
to assess relevance, or if the databases are very large, costs can quickly grow to the point that no
evaluation can be done without a significant investment.
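Concretely, if R is the set of relevant items in a database and A is the set of items retrieved in
response to a query, these two measures are

    precision = |A ∩ R| / |A|        recall = |A ∩ R| / |R|,

and it is membership in R, which requires human judgment, that is expensive to establish.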
This is an important problem in IR, and we believe it will be even more important in network
science: we expect that the types of information flowing through the network will be so varied, and
the algorithms used to process queries so diverse across nodes, that it will cost significantly more to
evaluate the utility of a network than it does to evaluate a standard IR system. Fortunately the
problem has been studied, and there are tools we can build on. Much of the previous work focuses on
selecting individual results (i.e., individual documents, images, videos, etc.) to be judged by human
assessors, with the goal of being able to accurately measure utility with only a small amount of
assessor input. Some of this work is based on statistical sampling approaches that attempt to find a
set of documents that provides a low-variance, zero-bias estimate of a measure like recall
([Aslam06], [Carterette08b]). One approach is an algorithm using a priority queue that is updated
after each assessment ([Cormack98]). Alternatively, some approaches assume an initial set of
assessments from which additional ones can be inferred ([Jensen07]).
Our approach builds on algorithmic work published in a series of papers by Carterette et al.
([Carterette06], [Carterette07], [Carterette08a], [Carterette08b]). The focus in that work is on
comparative evaluation: given two algorithms for ranking possibly-relevant material, determine
which is better (and report the confidence in that decision). Specifically, given some utility measure
μ expressed as a function of relevance judgments x_i to individual documents d_i (1 ≤ i ≤ N), and
calculated for two underlying systems S1 and S2, we write Δμ = μ(S1, {x_1, ..., x_N}) − μ(S2,
{x_1, ..., x_N}). We then attempt to find the smallest set of documents to assess that would prove
Δμ < 0, Δμ > 0, or Δμ = 0, by calculating the expected information about the random variable Δμ
given a hypothetical judgment of a document d_i. For comparing two systems, we have proved both
theoretically and empirically that our algorithms are optimal in the sense that they are expected to
require the fewest judgments of any algorithm with equivalent prior knowledge.
Our work would extend that previous work in three ways:
1. Instead of assessing documents for relevance alone, we will take into account trust, trust
of the source, redundancy with other documents, and corruption of information.
2. We will develop measures of utility that take into account all the individual judgments
about all aspects (trust, relevance, and so on).
3. We will develop new optimal algorithms to find the most informative judgments that
need to be made in order to minimize the cost of expert human judgments while also
minimizing the probability of errors in making decisions about the relative quality of
information network algorithms.
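The sketch below is a deliberately simplified illustration of this style of selection loop, not the
algorithm of the papers cited above: each unjudged document that can affect Δμ (here taken to be a
simple precision difference between two fixed-length rankings) is scored by how much a judgment
on it would reduce the variance of Δμ under independent Bernoulli relevance priors, and the
highest-scoring document is judged next. The document identifiers, priors, and simulated assessor
are hypothetical.

    # Simplified sketch: choose which document to judge next so as to most reduce
    # uncertainty about dmu = precision@k(S1) - precision@k(S2). Documents returned
    # by both systems contribute identically to both precisions, so only documents
    # in the symmetric difference of the two top-k lists matter.
    def pick_next_judgment(ranked_1, ranked_2, judged, prior):
        candidates = (set(ranked_1) ^ set(ranked_2)) - set(judged)
        if not candidates:
            return None
        # Variance contributed by doc d is prior[d]*(1-prior[d])/k^2; k is constant.
        return max(candidates, key=lambda d: prior[d] * (1.0 - prior[d]))

    def delta_precision(ranked_1, ranked_2, judged):
        # Current estimate of dmu from judged documents (unjudged count as non-relevant).
        k = len(ranked_1)
        rel_1 = sum(judged.get(d, 0) for d in ranked_1)
        rel_2 = sum(judged.get(d, 0) for d in ranked_2)
        return (rel_1 - rel_2) / float(k)

    ranked_1 = ["d1", "d2", "d3", "d4"]                  # top-k of system S1
    ranked_2 = ["d1", "d5", "d3", "d6"]                  # top-k of system S2
    prior = {"d2": 0.5, "d4": 0.2, "d5": 0.6, "d6": 0.9} # prior P(relevant)
    truth = {"d2": 1, "d4": 0, "d5": 1, "d6": 1}         # simulated assessor

    judged = {}
    while True:
        d = pick_next_judgment(ranked_1, ranked_2, judged, prior)
        if d is None:
            break
        judged[d] = truth[d]                             # ask the (simulated) assessor
        print(d, "judged; current estimate of dmu:",
              delta_precision(ranked_1, ranked_2, judged))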
One of the particular challenges of this work will be finding data that can be used to simulate and
validate our approaches to the problem. As a first step, we intend to model the problem using
newswire data assembled for previous and on-going DARPA projects such as TIDES and GALE.
This data includes full-text news articles in English, Arabic, and Mandarin Chinese (providing an
additional challenge of dealing with multi-lingual information), as well as full queries that have
been developed by intelligence analysts. We can simulate nodes in a network by partitioning the
data either randomly or in particular ways that skew certain types of data to certain nodes. Then
we can simulate deteriorating trust in individual nodes, data corruption occurring at the node or
over the physical network (using particular models for how natural language text becomes
corrupted), untrustworthy information, and so on. Redundancy is built into the dataset by virtue
of its aggregation of multiple news sources over a fixed period of time. (Part of this work will
involve humans reading and assessing these newswire documents. Local students may be able to
do this, but it is also possible that Amazon‘s Mechanical Turk could provide a cheap and fast
source of judgments.)
As a result of this work, we will have produced measures of utility of information that take into
account redundancy and trust as well as relevance. It is very likely that such measures will
depend on some human expert input, so we will also have algorithms to estimate those measures
with minimum cost in human expert time. These algorithms will be based on particular models
of the data, which will be another product of this work.
Finally, we will explore whether the utility theories developed in this subtask and the information
theory described in Subtask R2.1.1 can be coupled into a "unified theory" that supports both kinds of
work. We note that even if this proves impossible (reconciling the continuous mathematics of
information theory with the more discrete formulation used here is always mathematically complex),
the separate results will each feed into Task R2.2, which focuses on better modeling of the impact of
information loss (and error) and its effect on adversary networks.



6.6.7 Task R2.2: Impact of Information Loss and Error (K. Carley, M. Martin, M. Kowalchuk,
J. Reminga, CMU (IRC); Collaborator: B. Szymanski, RPI (SCNARC))
NOTE: This is a 6.1 basic research task.

Task Overview
As discussed in the military scenario described earlier, measuring, reasoning about, and
forecasting the impact of information loss in communication networks and information error in
information networks on the structure and performance of socio-cognitive networks is a key
challenge that systems will face when integrating socio-cognitive models with information and
communication models. A related challenge, necessary to transitioning the theoretical results of
task 1.1 is representing and visualizing the impact of information loss and error on socio-
cognitive networks in a manner that will help warfighters identify socio-cognitive networks of
interest (e.g., adversarial networks, boundaries of trust within coalitions, etc.).
Task Motivation
We believe that advances in network science will occur more quickly if we simultaneously
address the IRC research challenges above. Our goal is to develop theory, models, tools, metrics
and visualizations for reasoning about the impact of information loss and error on socio-
cognitive networks. Insights gained from our research will inform research concerned with how
information loss and error impact, for example, Blue and Red force capability, their socio-
cognitive networks, and the adequacy of warfighters' knowledge of adversaries' socio-cognitive
networks. In the long term, our research will lay the foundation for a capability to perform
dynamic error-impact analysis and visualization on-line as network-data streams in from
evolving battlefields.
Key research questions
Our key research question is how can missing or incorrect network data be characterized using
the metrics (extant and forthcoming) that practitioners might use to describe and reason about
networks? Our secondary research question is how does information error affect behavior in
socio-cognitive networks?
To answer these questions we will address a number of core issues, including (progressing from
6.1 to 6.2):
What are the characteristics of information loss and error scenarios?
How do we differentiate intentional vs. inadvertent information loss and error?
What is the effect of network topology on error propagation?
How can we provide a high level estimation of information loss and error levels given
specific networking data?
What is the robustness of socio-cognitive network metrics in the face of information loss
and error?
What procedures do we have for assessing confidence in network metrics?
What is the impact of information loss and error on socio-cognitive networks and how do
these affect the resultant human behaviors at the tactical, operational and strategic levels?



How can we visualize the impact of information loss and error on socio-cognitive
networks?
To address these issues we will use a combined statistical, machine learning, and simulation
approach. Statistical and machine learning techniques will be used to characterize and
discriminate errors across diverse topologies under a range of information loss/error scenarios
using real and simulated data, resulting in estimates of metric robustness. Agent-based simulation
will be used to evolve populations and explore error conditions. Dynamic-Network Analysis
(DNA) techniques will collectively be used to develop a framework and the metrics and
visualizations associated with the impact of information loss and error on socio-cognitive
networks.
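As a concrete, hedged example of the kind of robustness estimate this approach produces, the following sketch uses networkx and a random topology purely for illustration (the project's own analyses will use *ORA and the scenario data described below):

    # Illustrative sketch only (networkx stands in for *ORA): estimate robustness of a
    # centrality metric by repeatedly deleting a fraction of edges (simulated
    # information loss) and correlating the perturbed ranking with the original one.
    import random
    import networkx as nx
    from scipy.stats import spearmanr

    def centrality_robustness(graph, loss_fraction, trials=100, seed=0):
        rng = random.Random(seed)
        nodes = list(graph.nodes())
        base = nx.betweenness_centrality(graph)
        base_scores = [base[n] for n in nodes]
        correlations = []
        for _ in range(trials):
            g = graph.copy()
            drop = rng.sample(list(g.edges()), int(loss_fraction * g.number_of_edges()))
            g.remove_edges_from(drop)                       # simulate lost links
            perturbed = nx.betweenness_centrality(g)
            rho, _ = spearmanr(base_scores, [perturbed[n] for n in nodes])
            correlations.append(rho)
        return sum(correlations) / len(correlations)        # mean rank correlation

    # Example: betweenness robustness on a random topology under 10% edge loss.
    print(centrality_robustness(nx.erdos_renyi_graph(100, 0.05, seed=1), loss_fraction=0.10))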
As the proposed work is multi-level and has a forecasting and information assurance component,
both CCRIs (EDIN and Trust) will benefit from this research. It will also feed into and benefit
from the SCNARC task concerning the other side of the information-loss coin – information
overload (assuming that closer examination of this potential linkage continues to indicate a
substantive relationship). It will likewise feed into and benefit from the CNARC's tactical
MANET research tasks, such as estimating the information loss due to rerouting in mobile
networks and suggesting which parts need protection to prevent critical information loss.
Models and data from the
INARC will be beneficial, particularly those on network resiliency.
Prior Work
Prior work has produced a basic understanding of the robustness of standard metrics used to
characterize socio-cognitive, communication, and information networks under conditions of
information loss and error on different network topologies [Frantz10] [Borgatti06]. A challenge
that remains for the NS-CTA is determining which topology applies in a given situation so that
these error models can be utilized. These metrics and basic confidence integrators for simple
topologies have been implemented as part of *ORA, and are available in executable form to
NS-CTA members. These metrics have also been integrated into an overall system that enables
warfighters to track and identify hostile actors and groups given communications, information,
and socio-cognitive network data [Carley09b], and are currently in use at SOCOM and SKOPE.
We expect to leverage this system, thus fostering the development of new metrics and enabling
the rapid transition of new results.
Prior and on-going work concerned with information diffusion and cyber warfare has produced a
robust agent-based dynamic-network analysis simulation engine – Construct [Carley09c]. This
prior work led to an understanding of how to use agent-based dynamic-network analysis
simulation engines to address issues of change in socio-cognitive networks and how different
interventions impact said networks [Carley09a]. This model already takes actual network data
into account. Thus, we expect to use Construct to "play out" nonlinear interactions among the
nodes in communication, information, and socio-cognitive networks. Ultimately (beyond year 1)
we expect to develop a reduced form version of Construct so that warfighters can enter their
scenario of information loss/error and extant network to see the impact. Note that both ORA and
Construct can already be downloaded as executables from the CASOS website and used by NS-
CTA members.

Technical Approach
This task focuses on modeling adversary networks.



We propose a two-pronged research strategy that involves simulated data on the one hand and
real-world data on the other. Both lines of research are concerned with assessing the impact of
information loss and error on: (1) the evolution, transformation and behavior of socio-cognitive
networks, and (2) the ability to accurately identify socio-cognitive networks. They differ because
simulated and real-world network data often have very different structures; the technical
demands and tools required to process the two data-types differ, too.
In Prong 1, we will utilize multi-agent dynamic-network simulation and DNA techniques to
evolve and assess joint communication, information, and socio-cognitive networks. Specifically,
we propose to conduct a series of virtual experiments using theory-based, multi-agent
simulations (e.g., Construct) to explore how different levels and types of information loss/error
impact socio-cognitive networks with different topologies. Results from these simulations will be
assessed using dynamic network, change detection, and over-time analysis techniques to
determine whether different types of information loss and errors produce common or unique
network responses, where network responses are conceptualized in terms of changes in socio-
cognitive network structure and the repercussions of such changes for individuals' roles and
positions. The second prong of our research strategy involves extracting socio-cognitive
networks from available real-world data (e.g., newspaper articles, websites, mock intel reports,
etc.). This line of research thus requires a combination of network extraction techniques (e.g.,
AutoMap, web scrapers, etc.) and DNA techniques. Once the real-world networks are extracted,
we will use the same metrics and visualizers as in Prong 1 to assess how the same levels and
types of information loss and error will impact the ability to identify,
characterize and understand the socio-cognitive networks as observed by field operatives and
analysts.
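As a hedged illustration of the extraction step (a toy stand-in for tools such as AutoMap; the sentence-level co-occurrence rule and the hand-supplied actor list are assumptions):

    # Illustrative sketch only (a toy stand-in for extraction tools such as AutoMap):
    # link named actors that appear together in the same sentence into a weighted network.
    import itertools
    from collections import Counter

    def extract_cooccurrence_network(documents, actors):
        """Return weighted actor-actor edges based on sentence-level co-occurrence."""
        edges = Counter()
        for doc in documents:
            for sentence in doc.split("."):
                present = sorted({a for a in actors if a.lower() in sentence.lower()})
                for pair in itertools.combinations(present, 2):
                    edges[pair] += 1
        return edges

    # Toy usage with invented text and an invented actor list.
    docs = ["Ali met Omar at the market. Omar later called Hassan.",
            "Hassan and Ali attended the council meeting."]
    print(extract_cooccurrence_network(docs, ["Ali", "Omar", "Hassan"]))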
At the tool level, both lines of research require the same advances in error-impact measurement
and visualization. More importantly, over time (beyond year 1) the two lines of research will
converge to provide an integrated model of how the information loss and error that is present
during the identification of an adversary's socio-cognitive network can impact the warfighters'
understanding of the impact of future information loss and error on the behavior of Blue or
adversarial forces. Furthermore, convergence of these research lines – as we envision – will
enhance the workflow among network tools that is necessary for on-line analysis of streaming
battlefield data. Namely, convergence will enhance the workflow from network extraction of
real-world data to descriptive network analysis of current battlefield states to network evolution
via multi-agent simulation and back to predictive network analysis of future battlefield states
based on simulated data. Enhanced workflows such as this are an important aspect of on-line,
dynamic network analytics.
Enhanced workflows could be used, for example, to support the integration of tactical MANETs
with formal and informal socio-cognitive networks during mission planning and execution.
During planning, error-impact analysis could be used to identify strengths and weaknesses in
how the planned communications network would support decision makers during the upcoming
mission. During execution, mission performance could be monitored to assess how information
loss and error are influencing the distribution of information deemed necessary for mission
success. Alternative options concerning information access during mission execution (push or
pull) could be provided as the mission unfolds. Thus, decision-makers who should have shared
access to particular pieces of information would be more likely to receive them.



In year 1, we will complete the following subgoals associated with Prong 1: designing and
conducting virtual experiments to assess the impact of key classes of information loss and error
on socio-cognitive networks with different topologies. First, we will identify a scenario that
includes socio-cognitive, information, and communication networks for use in our virtual
experiments and informed by the open source data. To design the scenario, we will characterize
the levels and key types of information loss/error and develop a set of procedures for
representing information loss in communication networks and information error in information
networks for use in dynamic network models. Ideally, the ARCs will provide at least some of the
data and/or models needed to populate this scenario. We will then represent the scenario in our
network representation language, DyNetML (see [Belov09] for justification), and design and
conduct at least one virtual experiment that explores how key classes of information loss and
error impact socio-cognitive networks with different topologies.
Continuing with year 1 subgoals to be completed, we will develop a set of dynamic network
metrics for assessing the impact of information loss and error on socio-cognitive networks to
support our analysis of the results of the virtual experiment(s). This will require determining
which, if any, of the current metrics for impact assessment such as those in ORA (i.e., the only
network analysis package currently available for analyzing the multi-mode, multiplex networks
under consideration) are usable. New metrics will be defined as needed and operationalized. For
the warfighter, however, it is not enough to have these metrics; the metrics need to be placed in a
visualization environment that fosters understanding of the impact of information loss and error
on behavior. Thus, we will also develop initial procedures for representing and visualizing the
impact of information loss and error on socio-cognitive networks. Finally, we will demonstrate
and test these with simulated and, as possible, real data.
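A minimal sketch of two such error-injection procedures follows (illustrative only; plain Python structures stand in for DyNetML representations, and the loss/error models shown are placeholders for those to be characterized in the scenario design):

    # Illustrative sketch only: two error-injection procedures of the kind the scenario
    # design calls for -- link loss in a communication network and attribute error in an
    # information network. Plain tuples stand in for DyNetML network representations.
    import random

    def inject_link_loss(edges, loss_prob, rng=random):
        """Remove each communication link independently with probability loss_prob."""
        return [e for e in edges if rng.random() >= loss_prob]

    def inject_attribute_error(facts, error_prob, corrupt, rng=random):
        """Replace each information-network fact with a corrupted value with probability error_prob."""
        return [corrupt(f) if rng.random() < error_prob else f for f in facts]

    # Toy usage: 20% link loss and 10% fact corruption on invented inputs.
    edges = [("alpha", "bravo"), ("bravo", "charlie"), ("charlie", "hq")]
    facts = [("bridge-7", "intact"), ("route-2", "clear")]
    print(inject_link_loss(edges, 0.2))
    print(inject_attribute_error(facts, 0.1, corrupt=lambda f: (f[0], "unknown")))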
Validation Approach
CMU-CASOS has a long history of taking basic research ideas and creating usable systems that
can be, and are, used by various practitioners. For example, the ORA system developed in CASOS is
now in use at SKOPE, RDEC within SOCOM and at various agencies. We propose, as we move
from 6.1 to 6.2, to take the models and techniques that work best for error reasoning, and to
harden and include them in various tools, such as ORA, for testing and evaluation at ARL and in
the IRC distributed test bed. We plan to work simultaneously on using simulation to conduct
virtual experiments to assess the potential impact of information loss/error on network formation,
evolution, and performance and to assess the ability of the Blue force to estimate the adversaries'
socio-cognitive network under the same types and levels of information loss/error. Project 1.3.2
will exploit the data collected by the SCNARC and INARC for testing the model, and will
provide to them a way of visualizing the impact of information loss on socio-cognitive networks.
Summary of military relevance
The work done here will support DoD efforts by developing (at the 6.2 level) specific indicators
of the confidence one should place in the network metrics extracted from the data given specific
information loss (or interruption) scenarios. Previously conducted work on metric robustness
and link inference (e.g., work supported by the ONR) will be leveraged in this project as that
other work provides tools that we will employ in assessing simulation results. The proposed 6.2
work here will provide new capabilities by enabling the visualization of the impact of
information loss/error on metrics of concern in information operations.



6.6.8 Linkages with Other Projects
Key linkages will be working with the INARC team (project I3.1, led by Jiawei Han) to help
define information networks, and working with the SCNARC team to help explore how
information issues can affect adversary social networks (project S2.1). We will also work
closely with the CNARC on the extensions of Semantic Information Theory and how the work in
this task can be used as part of the overall QoI definitions. We will also work closely with EDIN
task E1, which focuses on the modeling of dynamic effects in networks and we note that the
personnel overlap between E1 and R2.1 is specifically designed to permit joint modeling and
keep these tasks synchronized with each other.
In particular:
Task R2 (and especially R2.1) will directly collaborate with the definitional tasks being
undertaken within the EDIN and Trust projects (E1.1 and T1.1 respectively).
Task R2.1 will collaborate with the CNARC on semantic information theory definition.
INARC task I3.1 will collaborate with R2.1 on the definition, and eventual formalization,
of Information Networks.
Task R2.2 will collaborate with the SCNARC on adversary networks (S2.1).

6.6.9 Collaborations and Staff Rotations


A key collaboration that is needed throughout the project is to make sure that the modeling tasks
in EDIN, Trust and this R2 task are coordinated. To this end, IRC funding will provide a post-
doctoral researcher who is specifically tasked to spend half his time on vocabulary definition
tasks and who will reside at the NS-CTA center in Boston. The post-doc's time will be funded
through the subcontract to Dr. Hendler at RPI, and Dr. Hendler will directly supervise his efforts
to make sure that these ontology and modeling-related tasks are coordinated and that they use
the best technologies.
Work in task R2.2 is clearly related to, and will be coordinated with, the SCNARC project on
modeling adversarial networks (S2). Dr. Boleslaw Szymanski and other SCNARC researchers
will work with Dr. Kathleen Carley and her team on task R2.2 and the CMU researchers will
similarly coordinate efforts on project S2. As the IRC task is partially supported by 6.2 funds,
with more expected in the outyears, while the SCNARC project is completely 6.1 funded, we do
not integrate these tasks (or their budgets) in the IPP, but we do specifically identify this
collaboration.

6.6.10 Relation to DoD and Industry Research


Work in task R2.2 will support other DoD and Industry Research efforts by providing (at the 6.1
level) basic insights on general impacts of information loss and (at the 6.2 level) specific
indicators of the confidence one should place in the network metrics extracted from the data
given the information loss scenario. Previously conducted work on metric robustness and link
inference (e.g., work supported by the ONR) will be leveraged in this project as that other work
provides tools that we will employ in assessing simulation results. Further, proposed 6.2 work
here could provide new capabilities by enabling the visualization of the impact of information
loss/error on metrics.



6.6.11 Project Research Milestones

Research Milestones

Due | Task | Description
Q2 | Task R2.1 | White paper on social network and information network definitions. Responsible Parties: RPI, BBN.
Q2 | Task R2.1 | Develop measures of the utility of information flowing through an information network that take into account relevance, redundancy, and trust. Responsible Party: UDel.
Q2 | Task R2.2 | Develop a set of procedures for representing information loss in communication networks and error in information networks for use in dynamic network models. Responsible Party: CMU.
Q3 | Task R2.1 | Derive definition of information loss and, if possible, entropy with respect to non-communication networks under varying semantic conditions. Responsible Parties: RPI, BBN.
Q3 | Task R2.1 | Develop models and algorithms for low-cost computation of information network utility measures that require input from human assessors. Responsible Party: UDel.
Q3 | Task R2.2 | Develop a set of dynamic network metrics for assessing the impact of loss and error on the socio-cognitive network. Conduct virtual experiment. Develop procedures for representing and visualizing the impact of information loss and error on socio-cognitive networks. Responsible Party: CMU.
Q4 | Task R2.1 | Report on key mathematical properties to explore with respect to semantic effects on information propagation; identification of promising approach for year two APP. Responsible Parties: BBN, RPI.
Q4 | Task R2.1 | Perform simulations to validate information utility measures and low-cost algorithmic procedures; report on key findings. Responsible Party: UDel.
Q4 | Task R2.1 | Report on the requirements of a combined framework, to be developed in year two or beyond, that could unify work in R2.1.1 and R2.1.2. Responsible Party: BBN.
Q4 | Task R2.2 | Report on virtual experiment concerning how key classes of information loss/error impact socio-cognitive networks with different topologies (6.1). Demonstration and test of visualization approach (6.2). Identify and analyze coordination costs in a networked environment. Responsible Party: CMU.



6.6.12 Project Budget by Organization
Budget By Organization

Organization | Government Funding ($) | Cost Share ($)
BBN (IRC) | 189,264 |
CMU (IRC) | 38,983 |
NWU (IRC) | 60,940 |
RPI (IRC) | 66,427 |
UDEL (IRC) | 51,494 |
TOTAL | 407,108 |

6.6.13 Relevance to US Military Visions/Impact on Network Science


Task R2.1 will provide the formal basis for the approaches being used in the remaining tasks in
this project and for many of the key information-related models needed throughout the NS CTA.
In the long run we expect that, like the Shannon information theory on which it will be based, this
work will provide fundamental new insights into technology development for military networking.
Task R2.2 will extend these results in the area of information loss, particularly as applied to
social-cognitive networks.

References
[Aslam06] Javed A. Aslam, Virgil Pavlu, and Emine Yilmaz. A statistical method for system
evaluation using incomplete judgments. In Proceedings of the 29th Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, pp. 541-548, 2006.
[Barnes54] Barnes, J. A. "Class and Committees in a Norwegian Island Parish", Human
Relations 7:39-58, 1954.
[Batini86] C. Batini, M. Lenzerini, and S. B. Navathe. A Comparative Analysis of Methodologies
for Database Schema Integration. ACM Computing Surveys, 1986.
[Belov09] Belov, N., Martin M.K., Patti, J., Reminga, J., Pawlowski, A., & Carley, K.M. (2009).
Dynamic networks: Rapid assessment of changing scenarios. In the Proceedings of the 2nd
International Workshop on Social Computing, Behavior Modeling, and Prediction: Phoenix, AZ
(March, 2009).
[Borgatti06] Stephen Borgatti, Kathleen M. Carley, and David Krackhardt, 2006, "Robustness of
Centrality Measures under Conditions of Imperfect Data," Social Networks, 28(2), pp. 124-136.



[Carley09a] Kathleen M. Carley, Michael K. Martin, John P. Hancock, 2009, "Dynamic
Network Analysis Applied to Experiments from the Decision Architectures Research
Environment," Advanced Decision Architectures for the Warfighter: Foundation and
Technology, ch. 4.
[Carley09b] Kathleen M. Carley, 2009, "Computational Modeling for Reasoning about the Social
Behavior of Humans," Computational and Mathematical Organization Theory, 15(1): 47-59.
[Carley09c] Kathleen M. Carley, Michael K. Martin, and Brian Hirshman, 2009, "The Etiology
of Social Change," Topics in Cognitive Science, 1(4).
[Carterette06] Ben Carterette, James Allan, and Ramesh K. Sitaraman. Minimal test collections
for retrieval evaluation. In Proceedings of the 29th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, pages 268–275, 2006.
[Carterette07] Ben Carterette. Robust test collections for retrieval evaluation. In Proceedings of
the 30th Annual International ACM SIGIR Conference on Research and Development in
Information Retrieval, 2007.
[Carterette08a] Ben Carterette. Low-Cost and Robust Evaluation of Information Retrieval
Systems. PhD thesis, University of Massachusetts Amherst, 2008.
[Carterette08b] Ben Carterette, Virgil Pavlu, Evangelos Kanoulas, James Allan, and Javed A.
Aslam. Evaluation over thousands of queries. In Proceedings of the 30th Annual International
ACM SIGIR Conference on Research and Development in Information Retrieval, 2008.
[Cormack98] Gordon V. Cormack, Christopher R. Palmer, and Charles L.A. Clarke. Efficient
construction of large test collections. In Proceedings of the 21st Annual International ACM
SIGIR Conference on Research and Development in Information Retrieval, pages 282–289,
1998.
[Frantz10] Terrill L. Frantz, Marcelo Cataldo, and Kathleen M. Carley, in press/2010,
"Robustness of centrality measures under uncertainty: Examining the role of network topology,"
Computational and Mathematical Organization Theory.
[Hendler02] J. Hendler, B. Thuraisingham, E. Nelson, G. Wiederhold and Scott Saunders,
Database Migration for Command and Control, Report of the Scientific Advisory Board (Air
Force), Report A025214, Nov., 2002.
[Hendler06] Guofei Jiang, George Cybenko, James A. Hendler. Semantic Interoperability and
Information Fluidity. Int. J. Cooperative Inf. Syst., 15(1), 2006
[Jensen07] Eric C. Jensen, Steven M. Beitzel, Abdur Chowdhury, and Ophir Frieder. Repeatable
evaluation of search services in dynamic environments. Transactions on Information Systems
26(1) pp. 1—38, 2007.
[Lazarsfeld and Merton54] Lazarsfeld, P., and R. K. Merton. (1954). Friendship as a Social
Process: A Substantive and Methodological Analysis. In Freedom and Control in Modern
Society, Morroe Berger, Theodore Abel, and Charles H. Page, eds. New York: Van Nostrand,
18-66.
[Noy04] Noy, N. Semantic integration: a survey of ontology-based approaches, ACM SIGMOD
Record, 2004.



[Shannon48] Claude E. Shannon: A Mathematical Theory of Communication, Bell System
Technical Journal, Vol. 27, pp. 379–423, 623–656, 1948.



6.7 Project R3: Experimentation with Composite Networks

Project Lead: J. Hancock, ArtisTech
Email: johnh@artistech.com, Phone: 703-383-3077
Project Lead: A. Leung, BBN
Email: aleung@bbn, Phone: 617-873-5617

Primary Research Staff:
A. Leung, BBN (IRC)
J. Hancock, ArtisTech (IRC)
D. Williams, USC (IRC)
M. Poole, UIUC (IRC)
J. Srivastava, UMinn (IRC)
N. Contractor, NWU (IRC)
D. Sincoskie, UDel (IRC)
C. Cotton, UDel (IRC)

Collaborators:
C. Partridge, BBN (IRC)
B. Roberts, BBN (IRC)
CCRI Leads and ARC Experimentation Liaisons
IRC Facility Staff

6.7.1 Project Overview


The four main goals for this project are to:
Lead consortium collaboration and interdisciplinary sharing of theories, tools, data sets,
ontologies, methods, and approaches.
Coordinate collaboration across the consortium to plan, organize, and discover new
methods for experimental validation of theories and models of joint composite networks.
Lead efforts to experimentally assess the potential military impact of ARC and CCRI
research results.
Establish a composite network experimental validation methodology.
To accomplish these goals, the R3 Experimentation with Composite Networks effort will include
6.2 applied research to establish a shared platform for experimentation in composite networks
(R3.1) and to design and perform composite network experiments using the shared platform, in
order to validate theories and assess potential military impact (R3.3). We will also perform
applied research to understand dissonance and failure between different genres of networks, and
how these phenomena can be experimentally examined (R3.4). The effort will also include 6.1
basic research to establish and validate experimental methodology for studying composite
networks (R3.2).

6.7.2 Project Motivation


As Network Science mechanisms, models, and theories are identified, they will require
experimental research to study and verify them. Beyond initial verification of basic research
findings, there will also be experiments to determine the compatibility of new approaches with



each other and with traditional approaches. Finally, approaches found to be useful in Army challenge
domains need to be tested for system metrics such as scalability.
This Project will establish the geographically distributed computation capabilities and
interoperability across the IRC, ARCs, and ARL locations needed to accomplish envisioned
experiments with composite networks. Validation and metrics will vary across network genres
as will scenarios. The scenario for an information-driven experiment interacting with varying
communications networks may be fundamentally different from the scenario for humans using a
dynamic mobile network for voice communications. The scenarios are nevertheless expected to
be related, and most experiments will have themes that affect all ARCs. It is expected that a
range of IN, CN, and SCN models and Army-relevant scenarios will need to be constructed and used in varying
combinations to support specific experiment designs.
We anticipate several levels of experimental interaction, varying from batch-like tasking of
unique assets, to piecewise incremental processing such as Kepler can provide, to scaled real
time (especially in simulation), to real-time simulation and emulation with human users in the
loop. The non-real-time experimental work is not usually useful for HITL
(human in the loop) experiments, but it should prove useful in IN/CN experiments, and when
used in conjunction with SCN behavior models.
Beyond the agile experimental environment this effort is also focused on novel experimental
design approaches to control and validate interdisciplinary studies of composite networks.
Challenges of Network-Centric Operations
It is difficult to assess the potential performance impact of new communications and information
networking research products combined with the formal and informal social networks which
overlay any real-world organization. The time and funding required to develop and
operationalize such research products not only delays assessing their effectiveness, but delays the
development of suitable doctrine and procedures to achieve that potential effectiveness – and
delays feedback to the research community on the relevance of their efforts to network-centric
operations. The challenge is to create effective means to experiment on emerging research
products, in the context of network centric operations, prior to their realization in deployable
devices, software, or systems.
Example Military Scenarios
The effectiveness of new technologies in, for example, the composite network of networks used
in a tactical ground mission cannot be readily determined by single-discipline metrics (such as
communications bandwidth or effectiveness of information retrieval). Our goal is integrated,
multi-discipline experimentation that allows validation of specific research products and research
directed toward their combined effects in realistically modeled multi-genre environments.
Imagine that consortium research leads to a method for analyzing information networks to locate
detrimental nodes whose overall effect is to decrease information signal and increase information
noise. As the first step to transitioning this research into a useful product, experimentation could
be performed to demonstrate that during a simulated mission planning and execution scenario, a
platoon using a communications system that automatically suppresses message traffic from
detrimental nodes performs more effectively. Similarly, if consortium research led to a new
communications network routing algorithm which was 40% more robust, but other research
suggested that increased network reliability sometimes triggered unexpected system failures due



to a spike in demand for network services, experimentation could be performed to determine
whether distributed teams gathering intelligence about tribal leaders reacted to improved
reliability in their communications network by increasing the frequency and quantity of
messaging.
Impact on Network Science
This research will make two key contributions to network science. The first is the fundamental
understanding required to define experimental requirements and methods that network science
research needs in order to perform effective experimentation; the second is the applied research
that will enable distributed, interoperable, systems for performing such experimentation.

6.7.3 Key Project Research Questions


The overall CTA must quickly realize a geographically distributed experiment environment
which is built for flexibility and extensibility. We are starting with pieces, some as yet
unidentified, and few well-defined experimental needs. Thus the key research questions in this
project are:
What experiment environment capabilities should be established in order to meet near-
term composite network experiments *across* the CTA?
What experiment network, computing, data, and visualization capabilities need to begin planning
and development in order to be ready for experimental use later in the CTA?
Can one or more unified experiment design approaches be put into practice in the IRC
that will serve near and far-term CTA experimental needs to ensure validation and other
NS CTA goals?
What are the fundamental assumptions and common approaches which underlie
composite networks experimentation?
Inherent in these questions is the underlying theme that any NS CTA experiment is dependent on
many other CTA projects and tasks. We will be putting the ontologies, metrics, models, data
sets, and visualization approaches developed throughout the CTA to use in the distributed
experiment environment.

6.7.4 Initial Hypotheses


As this project is primarily applied research interacting with basic research throughout the CTA,
the operational hypotheses are more applied in nature than those of other CTA tasks.
Until basic connectivity with reasonable security is established, combined network
experimentation is limited or cost prohibitive. Our hypothesis is that a basic package of
communications tools can be created that will enable all willing CTA Centers,
institutions, companies, and individual researchers to form a heterogeneous network that
provides a practical basis for interoperable distributed support of network science
experimentation.
A few of the interdisciplinary projects in the CTA will be ready for near-term
experiments, or be fundamentally experimental in nature. We hypothesize that analysis
of these near-term ready experimental projects will allow creation of experiment



environment approaches that have demonstrable sustained value for CTA research
experimentation and validation.
We further hypothesize that the sustained development of research environments using
emerging research results can sufficiently anticipate and support long-lead experiment
needs (in the absence of definitive requirements) to allow rapid, effective
experimentation for future CTA research products.

6.7.5 Technical Approach


Overview
The general approach of this project is to aggressively work the current and long-lead elements
of experiment environment architecture and experimental design in the order that best supports
CTA requirements. The tasks below reflect this approach. We are stepping out in this first year
to establish the environment for near-term experimental research and to build an agile general
architecture to serve the CTA over the longer term. Project leadership will be in tight
coordination with CCRI and other cross network-genre leads to coordinate and synchronize
products (e.g. ontologies) with experiment planning. All tasks lead toward experiment capability
that will be used widely across the CTA as the program progresses.

6.7.6 Task R3.1: Shared Environment for Experimentation in Composite Networks
(J. Hancock, ArtisTech (IRC); A. Leung, BBN (IRC))
NOTE: This is a 6.2 applied research task.

Task Overview
This task will extend through the course of the project. The main goals are to:
Analyze the opportunities and requirements for experimentation in composite networks,
based on the initial consortium research directions and military needs.
Develop a shared environment for experimentation in composite networks, selecting and
integrating resources (such as datasets, models, simulations, ontologies, methodologies,
software tools) created and used throughout the consortium.
Lead collaborative experimentation in composite networks through outreach, education,
coordination, and planning across the consortium.
Creation of the shared experimentation platform will start in the first year, and additional
components and capabilities will be added as the composite experimentation needs of the
consortium grow. A key aspect of the initial year effort will be to identify a small number of
experimentation components which are mature enough to deploy as a shared resource and to
begin the integration work necessary to deliver capabilities for cross-genre network
experimentation. We anticipate that one likely component is a game-based interface suitable for
presenting a scenario and capturing human behaviors from teams, in order to provide realistic
action input and mission performance measurements for experiments on underlying
communication and information networks.
A central objective of this multi-year task is to establish the fundamental interoperability of a
distributed experiment environment to make Cross-Cutting Research Initiative (CCRI)
experimental needs and experimentation with combined network research across the ARCs
possible. As the program progresses we intend to stay in front of research and experimental needs,



taking on the long-lead challenges in experiment environment establishment, data set access,
parallel processing availability and readiness, and distributed research computing architecture.
This task will also include making an inventory of sharable resources such as simulations,
models, visualization tools, analysis tools, data sets, testbeds, and networked connectivity
resources available for the CTA. This is especially important in year one as the new Centers get
to know each other. We will use the CTA web-site to develop a virtual team experimentation
center enumerating experiment assets, integration, timelines, and planned experiment timing and
status. Information on resource capability descriptions and usage requirements will also be
included.
Task Motivation
The Network Science CTA is a hybrid 6.1 basic and 6.2 applied research program. A key
purpose of the applied research projects is to determine, through experimental validation and
assessment, how basic research products might be used to address military relevant problems.
This task is specifically designed to actively create broad collaboration across the CTA to gather
and coordinate planned and emerging experiment artifacts and needs. In this task we begin on
fundamental connectivity and computing architecture while simultaneously actively coordinate
plans and needs while working long-lead environment, scenario and instrumentation
requirements.
The role of a comprehensive modeling and simulation toolset is vital to the success of NS CTA.
A shared experimentation platform and coordinated experimentation approach is required,
because no single research group has the resources needed to create an experimentation
environment that encompasses all three types of networks and the interactions between them.
Research progress should also be more efficient if groups across the alliance are able to leverage
each other's experimentation resources.
We anticipate that many of the theoretical models initially developed across the consortium will
be descriptive rather than quantitative. By providing a platform where computer simulations of
different network genres can be combined with behavior input from agent models or actual
humans, we will make it possible to iteratively compare and refine these models.

Key Research Questions


Which existing experimentation resources, if integrated into a shared platform, are most
likely to satisfy both near-term and long-term experimentation needs, and be of broadest
general use?
Based on anticipated experimentation needs, what necessary additional experimentation
infrastructure (such as data collection, storage, or visualization tools) must be developed?
How can the shared experimentation platform be deployed, publicized, and demonstrated
so that it is most accessible and useful to consortium researchers?
How can we efficiently integrate diverse components, such as communication network
simulators, information network models, and game-based simulators of social networks,
to represent scenarios of military interest?



Figure 1: High-level capabilities of a composite network simulator

Initial Hypothesis
The overall CTA must quickly realize a geographically distributed experiment environment
which is built for flexibility and extensibility. We are starting with pieces, some as yet
unidentified, and a few well-defined experimental needs. Thus the key initial research
hypotheses in this project are:
Multiple consortium groups have experimentation resources which should be of general
use for experimentation in composite networks.
A basic package of security and distributed computing software can rapidly enable sharing of
data and models in a substantial but non-real-time way across the ARCs.
A shared platform for experimentation is feasible to develop, deploy, and evolve in a
dynamic CTA research environment.
After examining the initial hypotheses, this task will then go on to determine how:
Consortium researchers will perform collaborative experimentation with composite
networks, using the shared platform.
One can realistically assess the potential effectiveness of network science research
products by placing human subjects in simulated environments and measuring their
responses and decisions in reaction to properties of and events in their social, cognitive,
information and communications networks.
Inherent in these explorations is the underlying theme that any NS CTA experiment is dependent
on many other CTA projects and tasks. We will be putting research products like ontologies,
metrics, models, data sets, and visualization approaches to use in the distributed experiment
environment.



Prior Work
The IRC and each ARC listed a number of testbeds, models, data sets and tools in their
proposals. BBN and ArtisTech [Carley09] have each successfully integrated research products
from a variety of Government, Industry, and Academic sources for R&D purposes.
There is strong potential to develop deep understanding of behavior when the environments
(including communications, information, and social networks), goals, and social contexts can be
altered. This is the hallmark and the appeal of virtual environment testing. The military makes
good use of virtual environment for training (such as DARWARS Ambush!, VBS2, and
OLIVE), but training platforms do not suffice for composite network experimentation, because
they do not include the capability for flexible incorporation of models and simulations of
complex information and communications environments. The military also performs extensive
modeling and simulation experimentation and integration, and we will draw heavily on work
from these domains. We will build on the methods and approaches of the Virtual World
Exploratorium (VWE), applying insights from a multi-theoretical and multi-level perspective
[Contractor05] of experimentation on human behavior in complex situations to the design of a
composite network experimentation environment.
There are two critical limitations of prior work for the purposes of addressing the experimental
platform needs of the NS CTA. First, existing systems tend to gloss over details such as
distributed protocol operation and inconsistency in state shared between entities. Second, and
critically for the needs of the NS CTA research program, existing simulation-based training
environments are not designed for rapid, flexible integration of newly envisioned network
capabilities across all three networking genres. Thus, they are poorly suited to experimentation
with, and validation of, emerging 6.1 and 6.2 research results and technologies. Our team has
experience running long-term experimental models of game-based worlds, testing social
variables as well as changes in communication network architectures [Williams06a,
Williams06b, Williams07]. Our platform will support multiple levels of behavior and interaction,
ranging from individuals to teams to multi-team systems interacting with communication and
information networks under various conditions of task dynamism, member composition, and
environmental conditions.
Technical Approach
Integrating extant disparate simulations, hardware, data sets, and visualization and analysis assets
is always a challenge in engineering. Research experimentation needs are often unpredictable
due to unknown and surprising discoveries. We have selected starting points based on ongoing
research efforts; we also have a plan to work with the CCRI and Collaborative efforts to support
emerging experimental needs. We expect that particular CTA research products will directly
support integration efforts, (e.g. the EDIN ontology project.) We will initially and periodically
canvas the research centers and interdisciplinary project leaders to gather a set of plausible
experiment concepts with ARL-approved scenarios that we will use to focus and prioritize IRC
readiness activities.
We will establish an Experimental Coordination Committee to assist with outreach to the other
ARCs. The committee will gather information from consortium researchers about their
composite network experimentation needs and possible shared resources. We will also use the
alliance seminar series as a means of publicizing and championing collaborative composite
experimentation. The initial seminar will detail how collaborative research teams can plan,



schedule, execute and analyze joint network experiments with the IRC facility and overview the
general approach and initial tools and network that will be available in the first year.
In creating the shared platform for experimentation, we will use existing resources for network
modeling and simulation. We will select from resources currently used by CTA researchers,
identifying a few which are broadly applicable to network centric military scenarios and mature
enough for shared use. Eventually, the shared platform will include models and simulations of
communication, information, and social networks, as well as a way to present scenarios
involving these networks to groups of people and measure their task performance. Though the
capability for independently varying each of the three network genres will be supported,
particular experiments may focus on the interaction between two network genres. The high-level
capabilities of such a validation environment are illustrated in Figure 1.
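A minimal sketch of one way such components might be coupled follows (illustrative only, in the spirit of Figure 1; the interface names, the event-passing convention, and the toy simulator are assumptions rather than a specification of the shared platform):

    # Illustrative sketch only: a minimal interface for coupling per-genre simulators
    # behind a common scenario driver; events produced by one genre at step t are
    # delivered to all genres at step t+1.
    from abc import ABC, abstractmethod

    class GenreSimulator(ABC):
        @abstractmethod
        def step(self, t, events):
            """Advance to time t, consuming cross-genre events and returning new ones."""

    class ScenarioDriver:
        def __init__(self, simulators):
            self.simulators = simulators            # e.g., {"comm": ..., "info": ..., "social": ...}

        def run(self, steps):
            pending = []
            for t in range(steps):
                produced = []
                for sim in self.simulators.values():
                    produced.extend(sim.step(t, pending))
                pending = produced                  # hand events to the next time step
            return pending

    class EchoSim(GenreSimulator):                  # trivial simulator used only for illustration
        def step(self, t, events):
            return [("echo", t)] if t % 2 == 0 else []

    print(ScenarioDriver({"comm": EchoSim()}).run(4))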
Leveraging existing resources from ARL and the consortium, we will not need to develop the
proposed experimentation platform from scratch. Instead we will build it upon the collective
wisdom and toolsets developed by the networks research community. We will add any composite
network experimentation components which are needed for consortium-wide collaboration,
drawing as much as possible on COTS, GOTS, and existing open source solutions. Capabilities
for data storage, collection, visualization, or analysis are examples of components that may need
to be tailored for use in the shared platform. One example of an existing resource is the VBS2
game-based platform, which is already in use by the Army for training. By starting from an
endorsed training platform, we can be assured that the virtual environment will be able to present
realistic, relevant military scenarios and will have a reasonable cost. Additionally, it will be
easier to recruit military partners and participants for experimentation in the future if they are
already familiar with the game platform.
When the ability to alter the teams and conditions is added to the system, it will allow for
experimental designs testing different team types and compositions, e.g., varying size,
experience, age, or demographic mixes and the like. On the software side, we need to put
sufficient 'data collection/logging hooks' into the system so that all events/variables of interest
are logged at the appropriate resolution. Ideally the event logging system would be configurable
and non-intrusive. This turns a one-off simulator into a Petri dish for low-cost experimentation on
team performance and psychology [Poole05], which can evaluate how mission effectiveness
depends on the effectiveness of the composite networks that the team has available.
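A minimal sketch of such a configurable, non-intrusive logging hook follows (illustrative only; the event names, record fields, and file-based store are assumptions, and a deployed platform would likely log to shared, instrumented storage):

    # Illustrative sketch only: a configurable, non-intrusive event-logging hook.
    import json
    import time

    class EventLogger:
        def __init__(self, enabled_events, path="experiment_log.jsonl"):
            self.enabled = set(enabled_events)      # configuration: which events to record
            self.path = path

        def log(self, event_type, **fields):
            if event_type not in self.enabled:      # non-intrusive: disabled events cost almost nothing
                return
            record = {"t": time.time(), "event": event_type, **fields}
            with open(self.path, "a") as f:
                f.write(json.dumps(record) + "\n")

    # Toy usage: record only message-send and task-completion events for this run.
    logger = EventLogger({"msg_send", "task_complete"})
    logger.log("msg_send", sender="team_lead", receiver="squad_2", size_bytes=412)
    logger.log("ui_click", widget="map")            # ignored; not in the enabled set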
For the experimentation platform, we will also utilize the capabilities of ARL's WEL (Wireless
Emulation Lab) and MNMI (Mobile Network Modeling Institute) for simulation, emulation, and
experimentation (this is the SEE cycle) of large-scale, highly mobile MANETs. The
difference between MNMI's approach and other simulators is its focus on fidelity and
performance from both technical and operational perspectives.
Basic Internet information exchange and storage are already established at BBN and the NS-
CTA.org website for information and documentation exchange. In this task, initial establishment
of connectivity will focus on IRC Facility interoperability with CISD (for IN and CN) at
Adelphi (ALC) and possibly Aberdeen (APG), with the HRED Cognitive Assessment, Simulation, and
Engineering Laboratory (CASEL) (for SCN) at APG, and with the ARCs as they come up to readiness.
We expect that the evolving and growing distributed NS CTA infrastructure will be quite
heterogeneous.



We propose to create a set of packages including the correct version and configuration of
OpenVPN, and probably Kepler and Hadoop for the anticipated collaborative platforms in the
CTA. We plan to test using models and simulations from ArtisTech and BBN that will probably
be useful in future experiments. We will test these between the IRC Facility and ArtisTech and
work with ARL to select pilot sites for initial installations. Optimally these would include the
HRED CASEL laboratory and several CISD laboratories. These tools will enable research
teams to open in-genre simulations, models, databases, etc. to other CTA researchers with
commercial (password) and VPN-level security. These are short-lead capabilities that are
known, available, and immediately actionable. These capabilities mainly speak to non-real-time
or scaled-time research and experimental computing work.
With the Internet, web services, workflow, and VPN under way quickly we also propose to start
down the longer path towards High Performance Computing (HPC) availability in the APG
Major Shared Resource Center (MSRC) [ARL-center] and the standup of a connection between
the IRC and ALC and APG. High speed networking and HPC capabilities will be critical as we
scale up the network experimentation to real-world problems. Information (e.g., the Internet),
social (e.g., Facebook), and communications (e.g., cellular telephone) networks all have instances
with very large installed bases. Further, we may find in some cross-genre experiments that
accurate time representation (real time or controlled scaled real time) is an important experiment
design feature. Any one of these challenges points to the need for a high-speed computing and
networking architecture.
Validation Approach
Validation that we have designed and implemented a useful shared platform for composite
experimentation can be measured in three ways:
The perceived usefulness and the degree to which consortium-wide requirements are met
by the shared experimentation platform can be measured by the level of interest and
collaboration from across the consortium.
The usability of the experimentation platform will be examined and improved by our own
use of it (Subtask R3.3).
The feasibility of such platform creation, deployment, and evolution will be demonstrated
by the level of effort required to deliver the product.
Subtask R3.3, Applied Experimentation in Composite Networks, will discuss in more detail the
types of experimentation that we plan to perform.
In this applied (6.2) research task we will establish the fundamental workings of the IRC Facility
for interaction with ARL and the ARCs, providing access to assets for cross-genre basic research
experimentation, applied research validation, applied research adaptation, prototyping, and
experiment preparation; we will also work the long-lead issues that will allow the CTA to
achieve large-scale multi-network experimentation. It is essential that the IRC lead efforts to
jointly validate research findings, assess system impact of results, and push research results
toward realistic prototypes that apply research breakthroughs to Army challenges in military
systems operation, design, and team dynamics.
Summary of Military Relevance
Military actions often involve interrelated communication, information and social networks all
utilized during mission planning and execution. From the largest operation to the shortest



mission, soldiers rely on communication networks, sort through large amounts of information,
and work in teams among different organizations. Thus, it is critical to measure the mission
performance of various teams of soldiers that are using information networks and
communication networks, in the context of existing and emerging social networks, within an
experimental scenario. This task will enable instrumentation and measurement of such
performance and enable the study of joint network effects on a multitude of mission metrics.
Linkages and Collaboration
This task will create the validation platform through which much of the NS CTA effort will be
evaluated and thus implicitly and explicitly collaborates with virtually the entire NS CTA. In
particular, experimentation in future years may involve linkages across the CCRIs and ARCs
such as:
Demonstrating practical, behavioral measures of trust and information quality.
Demonstrating the evolving dynamic behavior of networks and how network usage
patterns adapt to events like disturbances in the communication layer, deliberate
misinformation propagation, or abrupt removal of key nodes in a social network.
The Experimentation Coordination Committee will consist of liaisons from the other three
ARCs, ARL representatives, and the R3 leads. Through the members of this committee, we will
stay informed about experimentation needs and priorities from the other ARCs, which will
influence the selection of components for inclusion in the shared platform.
The experimentation platform from this task will be used in IRC task R3.3 to perform composite
network experiments. Simultaneously, the experimental design guidance developed under R3.3
will feed into this task‘s platform design. Conclusions from IRC tasks R3.2 and R3.4 will
influence the experimentation platform evolution in future years.
Research Products
Incrementally establish a distributed Shared Experimentation Environment centered in the IRC
Facility. [R3.1-M4]
Compile an Experiment and Environment readiness schedule that takes into account CTA-wide
project needs, goals, plans, and product timelines. Publish this on the website and keep it current
as opportunities or changes occur. [R3.1-M3]
Create an evolving "NS-CTA Experiment Capability, Tools, Models, Data Sets, Metrics, and
Methods" living document to enhance CTA-wide understanding of experiment possibilities.
[R3.1-M3]
References
[Carley09] Carley, K, Martin, M, Hancock, J. 2009, Dynamic Network Analysis Applied to
Experiments from the Decision Architectures Research Environment. Chapter 1 of Advanced
Decision Architectures for the Warfighter: Foundations and Technology.
[Contractor05] Contractor, N., Wasserman, S., & Faust, K. (2005). Testing multitheoretical,
multilevel hypotheses about organizational networks: An analytic framework and empirical
example. Academy of Management Review, 31(3), 681-703.
[Poole05] Poole, M. S., & Zhang, H. (2005). Virtual teams. In S. Wheelan (Ed.), Handbook of
group research (pp. 363-386). Thousand Oaks, CA: Sage.



[Williams06a] Williams, D. (2006). Groups and goblins: The social and civic impact of an online
game. Journal of Broadcasting and Electronic Media, 50(4), 651-670.
[Williams06b] Williams, D. (2006). Virtual cultivation: Online worlds, offline perceptions.
Journal of Communication, 56(1), 69-87.
[Williams07] Williams, D., Caplan, S., & Xiong, L. (2007). Can you hear me now? The impact
of voice in an online gaming community. Human Communication Research, 33(4), 427-449.

6.7.7 Task R3.2: Basic Research to Enable Experimentation in Composite Networks
(D. Williams, USC (IRC); A. Leung, BBN (IRC); J. Hancock, ArtisTech (IRC);
N. Contractor, NWU (IRC); M. Poole, UIUC (IRC); J. Srivastava, UMinn (IRC))
NOTE: This is a 6.1 basic research task.

Task Overview
In order to realistically reflect performance in military-relevant scenarios ranging from operation
planning to analysis of intelligence to execution of humanitarian relief missions, we must be able
to incorporate human behaviors. The experimentation platform developed under R3.1 will
eventually include methods to simulate simple human actions using automated computer
agents. However, initial development of the composite experimentation platform will emphasize
integration of a virtual environment as an interface to capture human behaviors. Task R3.2
focuses on developing and validating a methodology for practical experimentation in composite
networks, particularly experimentation using human participants acting in virtual environments.
This task differs from R3.3 because this task focuses on experimentally understanding some key
principles required to enable general experimentation in composite networks, while the emphasis
in task R3.3 is to perform experimentation to validate and assess specific composite network
theories.
Task Motivation
Experimentation in full-scale, real-world composite networks is most realistic, yet not usually
feasible due to expense and lack of experimental control. However, composite network simulations
that rely only on scripted or automated behaviors to model human decisions may not produce
realistic results. We must understand which types of network characteristics and behaviors are
accurately modeled in simulation, and which require more real-world input of actual human
behaviors. New controlled, virtual experimentation environments can incorporate some degree of
more realistic human behaviors, but their limitations in this domain are not fully understood. In
particular, it is important to understand which types of behaviors and phenomena (e.g., formation
of trust, sharing of information, use of communications, emergence of leadership) unfold
differently in virtual environments than in live environments. Before we can rely on results from
virtual world experimentation, we must assess how closely virtual world behaviors map to the
real world.
Similarly, much research in network science depends on civilian datasets and civilian experiment
participants, due to limited availability of military data and participants. It is important to
understand whether and how networked interactions among military populations may differ from
civilian analogs.

Additionally, controlled experimentation is most easily performed with scenarios that play out
within minutes or hours. Yet many composite network phenomena unfold over weeks or months
in the real world. Thus, an important basic research question for experimental study of
composite networks is how long time-scale phenomena can be effectively tested. Potential
approaches include faster-than-real-time virtual environments, web-based experimentation, or
experimentation within persistent virtual worlds.
Key Research Questions
This ongoing task focuses on several basic research questions:
- How do observed network phenomena, such as information sharing, establishing and maintaining trust, or robustness to failure, differ when a significant amount of interactions take place in live versus virtual environments?
- How do people's network-relevant behaviors, decisions, and relationships differ when they interact in live environments versus virtual environments?
- How can long time-scale network behaviors be most feasibly studied in a composite network context?
- Are there significant differences between networked interactions in military and civilian populations?
- Do two major theoretical frameworks, (1) an integrative model of teamwork expanding on the Salas et al. [Salas05] "Big Five" model and (2) the multi-team systems model proposed by Mathieu et al. [Mathieu01], provide a useful approach for analyzing team task performance within composite networks?
All of these questions are important for experiment design and results interpretation, but we will
focus on one or two of the key questions during the first year.
Initial Hypotheses
We will select one or more of these initial hypotheses for the first investigation. Selection will be
influenced by (1) availability of existing experimental datasets, (2) availability of analogous
live/virtual exercises for study, and (3) availability of military experimentation collaborators and
participants.
- There is a significant but not perfect correlation between network-relevant human behaviors in virtual and live environments.
- Military training exercises, both live and virtual, can provide scenarios in which significant and important composite network behavior can be tested.
- There is a significant but not perfect correlation between observed networked interactions in military and civilian populations.
- Military relevant, long time-scale, composite network phenomena can be satisfactorily studied using a combination of pre-existing and deliberately constructed environments and scenarios.
- The Integrative Teamwork Model can be extended to distributed, virtual teams [Poole05] whose members are connected in composite networks.

Prior Work
Prior investigation of the correlation between behaviors in live and virtual environments has
been performed, but it has not focused on phenomena most relevant to understanding composite
networks [Yee07]. It has primarily examined issues of "virtual presence," such as eye-gaze or
personal space, in the context of dyadic interactions. Our emphasis will be on teams and group
interactions.
Prior investigation of the effect of virtual environment training on live exercise performance has
been performed, but while such work also studies live/virtual analogs, it does not focus on the
question of how "true to life" observed behaviors in virtual worlds are [Hussain09]. We are
interested in examining training exercises because they provide a valuable source of military
relevant scenarios designed to take place in a limited time frame and to be staged using limited
resources. However, our interest is not in training effectiveness, or how virtual world training
can improve real world performance. Instead, our focus is on establishing the mapping between
observed behaviors in virtual and live environments. Another shortcoming of existing game-
based training systems as applied to composite network research is that such platforms do not
generally support long-term longitudinal experimentation. Sessions (whether short-term or multi-
day) stand alone and are not continued later. This limitation must be overcome in order to
perform experimentation and validation of the mutual evolution over time of interacting social,
information, and communications networks.
In addition to using training exercise design as a starting point for experimental design, we will
also use the Salas Big Five Model of teamwork. Based on an extensive review of models of
teamwork effectiveness, this model posits five components of effective teamwork—team
leadership, team orientation, mutual performance monitoring, back-up behavior, and
adaptability—and three coordinating mechanisms that promote effective execution of the five
components—shared mental models, closed-loop communication, and mutual trust. The model
posits causal relations among the five components and interaction effects between the
coordinating mechanisms and various components that moderate their impact on team
effectiveness. Each of the eight constructs is described in terms of specific behaviors required
to execute the construct and so can be easily connected to experiment design.
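To make this operationalization concrete, the sketch below is a minimal, purely illustrative coding schema for the eight constructs named above; the construct names follow the Salas model as summarized here, while the Python structure, the 0-1 scoring convention, and all field names are our own assumptions rather than a project deliverable.

# Illustrative sketch only: a minimal coding schema for the Salas "Big Five"
# teamwork constructs, assuming each construct is scored from observed
# behaviors on a simple 0-1 scale. Field names are assumptions, not a standard.
from dataclasses import dataclass, field
from typing import Dict

COMPONENTS = [
    "team_leadership",
    "team_orientation",
    "mutual_performance_monitoring",
    "backup_behavior",
    "adaptability",
]
COORDINATING_MECHANISMS = [
    "shared_mental_models",
    "closed_loop_communication",
    "mutual_trust",
]

@dataclass
class TeamObservation:
    """Scores for one team in one experimental session (all in [0, 1])."""
    scores: Dict[str, float] = field(default_factory=dict)

    def validate(self) -> None:
        # Every construct must be scored exactly once and lie in [0, 1].
        for name in COMPONENTS + COORDINATING_MECHANISMS:
            value = self.scores.get(name)
            if value is None or not 0.0 <= value <= 1.0:
                raise ValueError(f"missing or out-of-range score: {name}")

# Example usage: code a single observed session with placeholder scores.
obs = TeamObservation(scores={name: 0.5 for name in COMPONENTS + COORDINATING_MECHANISMS})
obs.validate()

A schema of this kind would let coders working from live and virtual recordings of the same exercise produce directly comparable construct-level scores.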
Mathieu et al. [Mathieu01] describe multi-team systems and argue that input, process, and output
interdependencies and goal hierarchies are key constructs for describing multi-team systems
(MTS), that the environment is a key influence on MTS, and that multitasking and coordination
are particularly important processes in MTS. Research in this area is still in its infancy (see
[Marks05] for an example).
Technical Approach
We plan to use military training exercises as the basis for creating experimental scenarios that
include realistic and important tasks performed by teams. To study the mapping between live and
virtual behaviors, we will seek out existing live/virtual training exercise analogs for observation
and instrumentation. If no appropriate existing analogs are found, we can develop a virtual
analog for an existing live exercise, using an existing game-based training platform (such as
VBS2). Such experiments require collaboration and cooperation from military training sites. We
will draw on BBN, ArtisTech, and ARL contacts with sites such as the USMA at West Point,
Fort Lewis, or FLTC to seek such collaboration.¹ If we are unable to secure first-year cooperation
from a training site, we will fall back on more easily available populations (such as ROTC or
other university students). While it would be most realistic to run experiments involving military
participants, we could also use civilian populations to study behavior mapping or methodologies
for simulating long time-scale events.
¹ BBN has done game-based experimentation or training work at more than 20 Army and military training sites for previous efforts, but we have not yet arranged collaboration for these experiments.
We will examine time-compression approaches used in exercises or games and evaluate whether
any are appropriate for use in experimental scenarios. In determining appropriate experimental
methods for studying long time-scale phenomena in virtual environments, the world of
commercial virtual games offers two possible counterpoints to the single-session model:
persistent worlds and time compression. In persistent world games, the virtual world is never
"off"; it runs continuously and users log into and out of it. Thus, the teams who inhabit these
systems form, meet, advance, and stick together. These kinds of longer-term associations are
much more analogous to real-world deployments than single-session training exercises.
Consequently, they offer the ability to track, model, and analyze real teamwork over real time
periods. In contrast, some games use time compression techniques, such as playing out key
episodes in real time but then having time pass quickly between episodes. Both the employment
of persistent virtual worlds and time compression will be evaluated in the context of enabling
composite network experimentation.
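As an illustration of the episode-based time-compression approach described above, the sketch below is our own simplified example (the episode schedule, the compression factor, and the class name are assumptions, not features of any existing platform): a scenario clock plays key episodes in real time and fast-forwards the intervals between them.

# Illustrative sketch of episode-based time compression: key episodes run in
# real time, and simulated time between episodes advances at a faster rate.
from dataclasses import dataclass
from typing import List, Tuple

@dataclass
class ScenarioClock:
    compression: float = 60.0  # assumed: 1 real minute between episodes = 1 simulated hour
    sim_minutes: float = 0.0

    def run_episode(self, real_minutes: float) -> None:
        # During an episode, simulated time tracks real time one-to-one.
        self.sim_minutes += real_minutes

    def skip_interval(self, real_minutes: float) -> None:
        # Between episodes, simulated time advances 'compression' times faster.
        self.sim_minutes += real_minutes * self.compression

# Example: three 30-minute episodes separated by 10-minute compressed breaks.
clock = ScenarioClock()
schedule: List[Tuple[str, float]] = [
    ("episode", 30), ("break", 10), ("episode", 30), ("break", 10), ("episode", 30),
]
for kind, minutes in schedule:
    if kind == "episode":
        clock.run_episode(minutes)
    else:
        clock.skip_interval(minutes)
print(f"{clock.sim_minutes / 60:.1f} simulated hours elapsed in 110 real minutes")

Under these assumed parameters, 110 real minutes of participant time cover roughly 21.5 simulated hours, which is the kind of ratio that would need experimental validation against phenomena that genuinely unfold over days.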
The types of behavioral observations most relevant for mapping (both live/virtual and
military/civilian) or long time-scale investigations will be guided by two major theoretical
frameworks of team and group behavior. These are: (1) an integrative model of teamwork
expanding on the Salas "Big Five" model and (2) the multi-team systems model proposed by
Mathieu et al. [Mathieu01], augmented by multi-theoretical, multi-level network models. Each of
these will be discussed briefly in terms of work to be performed, both theoretical and applied.
While there have been tests of portions of the Big Five Model, there have been few evaluations
of the entire model. One goal of this project is to operationalize and refine the Big Five Model so
that we can study the impact of task complexity (affected by information networks and
information quality and quantity), task dynamism, environmental stress (affected by
communication network reliability), and group composition (affected by social networks) on
performance. In this continuing task, we anticipate developing the Integrative Teamwork Model,
a contingency model that specifies the relative levels of importance of the team components and
coordinating mechanisms for team effectiveness under varying combinations of the task
complexity, task dynamism, environmental stress, and group composition variables. The result
will be a situationally-adaptive systemic model of teamwork.
Another continuing goal is to extend the Integrative Teamwork Model to distributed, virtual
teams [Poole05] whose members are connected through composite social, information, and
communications networks. This extension will have two purposes. First, the Integrative
Teamwork Model can be built into simulations such as those described in §7.4 so that the
simulations connect to important aspects of teamwork. For example, information loss is likely to
impact teamwork via its impact on shared mental models and leadership capacity, so results of
the simulation can be filtered through these constructs to determine impacts on teamwork.
Second, results of integrative and composite network simulations can be empirically evaluated in
experiments in the game environment. One avenue we will consider is to apply recently
developed methods that incorporate fuzzy set and rough set approaches to test equifinal sets of
factors that promote effective teamwork under various combinations of contingencies.
We will investigate how coordination occurs both within and across teams described in terms of
the Integrative Teamwork Model, (a) for various tasks and environments (as defined previously),
(b) for goal hierarchies with varying properties (their depth and whether they are hierarchical or
heterarchical), and (c) for various types of configurations of composite and integrative networks
and processes that happen within them (e.g. information propagation, information loss). The
teams will connect via the three coordination mechanisms (shared mental models, mutual trust,
and closed-loop communication) and the impact of configurations of composite and integrative
networks can be assessed through their effects on the system of coordinating mechanisms that
connects the multi-team system. This model has connections to the work on trust in networks,
on composite networks and on integrative modeling of networks and so can be used to
experimentally evaluate simulation results.
In particular we will study how coordination is structured over time through interactions among
the teams within the system [DeSanctis08]. This structuring is a dynamic process that occurs as
the teams work together and various strategies that reflect effective and ineffective structuring
can be identified. On the practical front, this can enable us to develop advice and training
strategies for coordinating multi-team systems.
Development and testing of the two theoretical frameworks will occur throughout the life of the
project. Throughout the first year, we will evaluate games platforms as interfaces to capture
human behaviors, in terms of how they will enable us to study teamwork and multi-team systems
using the constructs from the theoretical frameworks.
This is a particularly collaborative task. Expertise in experimentation in virtual environments and
communication theory will be provided by D. Williams. M. Poole will lead the work on team
models and small group theory. J. Srivastava will lead analysis of large or complex datasets and
contribute to data instrumentation. N. Contractor provides expertise in social network theory.
ArtisTech will provide expertise in military systems and assist with coordination with military
sites. BBN will provide overall coordination and expertise in military training with virtual
environments.
Linkages and Collaboration
The results from this task will provide validation and guidance for experimental methods used to
study performance and behavior in composite networks. Thus, it is directly relevant to
experimental design and performance in task R3.3, but also relevant to other experimentation
throughout the consortium.
Validation Approach
Through a combination of analysis of existing datasets and/or new experimentation results, we
will (1) identify characteristics that do and do not differ between composite network interactions
examined in live and virtual environments, (2) identify long time-scale phenomena which are
and are not realistically reproduced through various time-compression methods, and (3) identify
characteristics that do and do not differ between civilian and military populations interacting
with composite networks.

Summary of Military Relevance
The military has pioneered the use of virtual environments for training, and is increasingly using
virtual environments as a means to test the usability of proposed system interfaces and even to
develop tactics, techniques, and procedures (TTPs) for the use of new equipment prior to
complete system development. Despite widespread common-sense acknowledgement that
actions in a virtual world may not map exactly to real-world behaviors, the extent of such
mapping is not well understood. Any increased understanding of how teams or groups interact
differently within composite networks manifested in a virtual environment, compared to the real
world, can be applied to (1) improve design and interpretation of composite network
experimentation, (2) improve interpretation of usability or TTP tests using virtual environments,
and (3) refine game-based training exercises to compensate for behavioral artifacts triggered by
virtual environments.
References
[DeSanctis08] DeSanctis, G., Poole, M. S., Zigurs, I. & Associates (2008). The Minnesota GDSS
Research Project: Group support systems, group processes, and group outcomes. Journal of the
Association of Information Systems, 9, 551-608.
[Hussain09] Hussain, T., Roberts, B., Menaker, E., Coleman, S., Pounds, K., Bowers, C.,
Cannon-Bowers, J., Murphy, C., Koenig, A., Wainess, R., Lee, J., (2009). Proceedings of the
Interservice/Industry Training, Simulation & Education Conference (I/ITSEC).
[Marks05] Marks, M. A., DeChurch, L. A., Mathieu, J. M., Panzer, F. J., & Alonso, A. (2005).
Teamwork in multi-team systems. Journal of Applied Psychology, 90, 964-971.
[Mathieu01] Mathieu, J. M., Marks, M. A., & Zaccaro, S. J. (2001). Multiteam systems. In N.
Anderson, D. Ones, H. K. Sinangil, C. Viswesvaran (Eds.) Handbook of Industrial, Work, and
Organizational Psychology (vol. 2, pp. 286-313). London: Sage.
[Poole05] Poole, M. S., & Zhang, H. (2005). Virtual teams. In S. Wheelan (Ed.), Handbook of
group research (pp. 363-386). Thousand Oaks, CA: Sage.
[Salas05] Salas, E., Sims, D. E., & Burke, C. S. (2005). Is there a "big five" in teamwork? Small
Group Research, 36, 555-599.
[Yee07] Yee, N., Bailenson, J., Urbanek, M., Chang, F., & Merget, D. (2007). The unbearable
likeness of being digital: The persistence of nonverbal social norms in online virtual
environments. CyberPsychology & Behavior, 10, 115-121.

6.7.8 Task R3.3: Applied Experimentation in Composite Networks (A. Leung, BBN
(IRC); J. Hancock, ArtisTech (IRC); D. Williams, USC (IRC))
NOTE: This is a 6.2 applied research task.

Task Overview
In this task, we will perform applied experimentation to validate composite network theories and
assess the potential military operational applications of these theories. Additionally, we will
develop guidelines and processes for experimental design and shared experimentation platform
use, paving the way for other consortium researchers to do more composite network
experimentation.

This task will use the shared experimentation platform developed in R3.1 to perform experiments
on composite networks. The specific experiments designed for this task will take into account the
consortium-wide requirements and priorities gathered in R3.1. Additionally, scenario design and
results interpretation will draw on research results from R3.2 and R3.4. However, this task
differs from the basic research experimentation of task R3.2 because this task focuses on the goal
of validating and assessing network science theories, while R3.2 focuses on understanding the
fundamental validity of the experimental method.
Task Motivation
The consortium will perform groundbreaking basic research in network science, but the
envisioned pay-off depends on the application of these theories to real military needs. A measure
of the success of network science research must be the production of concrete tools, procedures,
or systems which serve operational needs. Therefore, the consortium's basic research
conclusions must be validated using composite network experimental platforms, and the
potential application of composite network theories must be experimentally assessed.
To perform such validation and assessment, we will facilitate and perform experiments that
involve people performing tasks within military relevant scenarios, using simulated or modeled
social, information, and communication networks within a virtual environment. We will draw on
scenario examples from training exercises because such exercises focus on skills and situations
which are highly relevant to operational needs, and because such exercises are designed to elicit
important behaviors within a limited time frame. Additionally, we anticipate that experimental
scenarios with similarities to training exercises will be of more interest to training sites, making
it easier to secure cooperation and recruit participants.
Another motivation for this task is to lead the way in using a shared experimentation platform.
Consortium researchers will need examples of how to design and perform composite network
experiments with the shared platform produced in task R3.1, in order to facilitate their own
future use of this platform. During this task, we will develop such guidance, and also validate the
usability of the shared platform.
Key Research Questions
In this ongoing task, we will tackle a number of broad questions about composite network
experimentation.
- How can composite network theories and models be applied to solve military relevant problems?
- How can composite network experiments be structured to produce valid, analytically sound results while remaining practical to design, construct, conduct, and analyze?
We will start with some more specific questions.
- What are some experimental scenarios which convey the utility and range of capabilities for composite network experimentation, illustrating how changes in each of the three types of networks can affect overall system performance?
- What are key considerations in experiment design for studies that incorporate (1) a virtual environment interface for presenting scenarios, immersively eliciting human behaviors, and measuring group performance, and (2) models or simulations of the social, information, and communications networks within the scenario?

Initial Hypotheses
A primary and continuing purpose of this task is to show that composite network theories and
models can be experimentally demonstrated to improve overall performance of a group or team
within a virtual environment. Because the experimentation component of this task relies on the
platform developed in task R3.1, initial efforts will focus on experiment design and scenario
development. Initial hypotheses are:
- Collaborations can be established at training sites for performing composite network experimentation with military participants.
- Experiment design guidelines based on processes developed for controlling single-network simulation tests and for human subjects experimentation can be applied to develop composite network experiments.
- Experimental scenarios can be developed which take advantage of the shared composite network experimentation platform to deliver realistic re-creation of how social, information, and communications networks interact in military relevant operations.
Prior Work
General experimental design best practices and guides have been developed, some in the context
of experiments on communications networks or other complex systems, others in the context of
team performance. However, the complexities of composite network experimentation will
require us to extend and tailor an experimental design process that can help guide researchers
who typically focus on one of the three network genres when they are designing experiments for
composite networks.
Experimentation using models, simulations, agents, and people to study aspects of network
performance has been done, but prior cross-genre composite network experimentation to look at
interactions between multiple types of networks is limited. Composite network experiments will
be more complex to design, perform, and analyze than single network tests. Good experimental
design in networks that include humans is challenging. For instance, [Jain91] identifies 10
distinct steps to designing an experiment and notes 23 common errors that can invalidate an
experiment.
Technical Approach
To develop guidelines for experimental design, we will start with methods used in the
computational network domain and team/organization dynamics domains. Aspects of each will
be necessary for composite network experimentation; for example, experiments with simulated
communications networks must specify software configuration and version information, while
experiments on human teams must specify the minimum number of teams required to achieve
statistical power.
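As a simple illustration of the statistical-power consideration mentioned above, the sketch below is our own example (the effect size, alpha, and power values are assumptions chosen for illustration, not project requirements) estimating the number of teams per condition for a two-condition comparison with a standard normal-approximation power calculation.

# Illustrative sketch: approximate number of teams per condition needed to
# detect an assumed effect size d with a two-sided two-sample comparison.
from math import ceil
from statistics import NormalDist

def teams_per_condition(effect_size: float, alpha: float = 0.05, power: float = 0.8) -> int:
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)   # critical value for the two-sided test
    z_beta = z.inv_cdf(power)            # quantile corresponding to desired power
    n = 2 * ((z_alpha + z_beta) / effect_size) ** 2
    return ceil(n)

# Example: a "large" effect (d = 0.8) needs far fewer teams than a "medium" one (d = 0.5).
for d in (0.8, 0.5):
    print(f"effect size {d}: about {teams_per_condition(d)} teams per condition")

Even this rough calculation shows that detecting moderate effects can require dozens of teams per condition, which directly shapes how many training-site sessions an experiment must secure.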
We will base our experimental scenarios on training exercises because they already reflect the
types of situations and skills that the military finds most critical to mission success. Scenario
development can be very time consuming, but starting from an existing exercise will enable
faster progress. We will survey a range of training exercises to identify scenarios in which the
three genres of networks play an important role. Scenarios are likely to involve several
participants who act in the role of US Army personnel, and also some representation of local
nationals, allied forces, NGOs, or other non-US military groups. Representation of these other
groups may be through computer-controlled agents or human participants. Training is not
representative of all Army operations. However, it offers a solid experimental approach to
constructing instrumented and scientifically repeatable military scenarios which more
realistically represent team dynamics operating in composite-network-enabled scenarios.
Any experiments will be pilot tested first with locally available participants, to refine the
scenario, data collection system, network simulation parameters, etc. We will seek partnerships
with a military training site that we have previously worked with, in order to involve military
participants. If we are unable to secure collaboration quickly, we will run early experiments with
civilian participants.
Validation Approach
The usefulness and completeness of the experimental design approach will be evaluated during
analysis of experiments performed using the shared platform. Any weaknesses or incompleteness
in the experimental design approach which are uncovered during our experimentation can be
corrected to improve future composite experimentation in this continuing task and to aid other
consortium researchers.
The usefulness of any performed experiments will be evaluated based on any demonstrated
improvement in system performance between different conditions in the experiment, and also by
the perceived potential for future military application, demonstrated by the level of interest from
potential technology transition partners.
Military Relevance
The Army and DoD have an ever-increasing number of networks that are continually interacting
with each other and being integrated (sometimes unintentionally), while new technologies and
social practices enter the mix without announcement. Combined network experimentation which
enables better understanding of the underlying models and mechanisms can inform military
equipment and network design, application, procurement, training, and operations. An example of
a future capability stemming from improved understanding of composite networks is targeted
strengthening or weakening of groups via introduction or disruption of communications
networks. Imagine if models of composite networks were predictive enough to suggest that a
fledgling democracy movement in a war-torn region could be strengthened by making low-cost
mobile phones available on the open market. Making these sorts of future applications a reality
depends on establishing reliable approaches for designing and performing composite network
experimentation.
References
[Jain91] R. Jain, The Art of Computer Systems Performance Analysis, John Wiley, 1991.

6.7.9 Task R3.4: Dissonance in Combined Networks (D. Sincoskie, C. Cotton, UDel (IRC))
NOTE: This is a 6.2 applied research task.

Task Overview
This task looks into the future of combining network models to understand the kinds of hazards
or dissonances that occur when networks are combined. It looks at existing examples of
combined networks, such as BitTorrent (an information network) on top of the Internet (a
communications network), and the various attempts to layer communications networks with
social networks, to understand the implications of combining network models and the types of

hazards and dissonances that occur. The resulting set of "lessons learned" in the form of a set of
guidelines can be used to predict potential failure modes of composite networks.
Task Motivation
Consider a composite network model. In the NS CTA project, one might pick a three-layer
abstraction where the overall networked system is a mesh of communications, information, and
social/cognitive networks. A simpler example is the Internet, where the layered communications
model (physical, data link, network, transport, and application) really involves the interactions of
five individual complex systems; unlike the prior meshed example, however, this is typically a
vertical arrangement with interfaces only pairwise to the components above and below. In either
of these composite network systems, each network or component provides services to other
layers via interfaces through which the other components may request service. The interfaces
also hide details of the implementation from other components in order to reduce complexity.
Unfortunately, there are always assumptions, often unstated, about how the service may be used,
which under some operating conditions may cause the overall composite network system to be
inefficient or, worse, to have unexpected failure modes. For example, an internet may provide a
point-to-point, network-layer service such as IP unicast, but implement it over a data link layer
wireless broadcast network such as 802.11. An information networking content application such
as video distribution may then implement its service by building an n-ary tree of content
distributors on top of the communications network. The resultant inefficiencies in the
communications network, caused by multiple transmissions of the same data between content
distributors, may then make it difficult to provide the desired content service, not because there is
not enough bandwidth allocated to the task but because the network-layer IP unicast service
model does not make use of the inherent broadcast capability of the wireless 802.11 link layer.
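To make the cost of this mismatch concrete, the sketch below is a deliberately simplified model (the fan-out and tree depth are assumptions chosen only for illustration, not measurements of any real system): it counts the unicast transmissions that cross a shared broadcast channel for an n-ary distribution tree, versus the single transmission a true link-layer broadcast would need.

# Illustrative sketch: how many times one content item is transmitted over a
# shared wireless broadcast channel when distribution uses an n-ary unicast tree.
def unicast_transmissions(fanout: int, depth: int) -> int:
    # Each non-root node receives the item once via a separate unicast send,
    # even though every send occupies the same shared broadcast medium.
    receivers = sum(fanout ** level for level in range(1, depth + 1))
    return receivers

fanout, depth = 3, 3  # assumed values for illustration
sends = unicast_transmissions(fanout, depth)
print(f"{sends} unicast sends to reach {sends} receivers; "
      f"an ideal link-layer broadcast would need 1 transmission")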
This effect is sometimes called the "law of unintended (network) consequences." Examples such
as the one given above abound. Consider the "Slashdot effect": a large social network of
technologists follows and exchanges information in near real-time about new technologies via an
Internet-enabled information network (http://slashdot.org/). But often, when a web site is
referenced that contains information about a new technology item of interest, the target site lacks
sufficient capacity to deal with its announcement to such a large audience over a short period of
time and fails under the load. The same effect could be seen in well-engineered military
information systems, which under unusual topically-driven client loads might become unstable or
unavailable without explicit prioritization or load-shedding features. These are examples of
interactions between the composite network components for which point solutions exist.
However, no general solution exists. In a related example, also dealing with failed assumptions
around connectivity and capacity, the field of disruption tolerant networking (DTN) has recently
sprung up and is developing cross-layer solutions to specific network disruptions, reminiscent of
the time when the physical network channel (analog modems) had such a high error rate that
hop-by-hop recovery was required to make forward progress (e.g., X.25).
A much more serious example exists in the information assurance domain. Systems security
today is built layer by layer from the computer hardware up through the operating system, the
communications network, applications (an information network), and users themselves (a social
network). If this vertical chain is broken at any point, the security of information at any of the
other layers is rendered ineffective. Perfect hardware, software, network, and application
security can be rendered useless by as simple an act as a user writing a password on a piece of
paper, or copying a classified file onto a removable memory device and walking out the door
with it. The security of the entire system is only as good as the weakest link.
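The sketch below is a minimal, purely illustrative way to express that weakest-link property; the layer names follow the paragraph above, while the numeric assurance scores are invented solely for the example.

# Illustrative sketch: overall assurance of a layered system is bounded by its
# weakest layer. The 0-1 assurance scores below are invented for illustration.
layer_assurance = {
    "hardware": 0.99,
    "operating_system": 0.97,
    "communications_network": 0.95,
    "applications": 0.93,
    "users": 0.40,   # e.g., a password written on a piece of paper
}

weakest_layer = min(layer_assurance, key=layer_assurance.get)
print(f"system assurance ~ {layer_assurance[weakest_layer]:.2f}, "
      f"limited by the '{weakest_layer}' layer")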
Key Research Questions
The central research question is thus: what hazards may exist in composite network abstractions
if we adapt the models used today in communications networks to the domain of network
science? In particular:
- Can we understand and model failures in composite networks which arise from dissonances between the underlying social, information, and communication networks?
- How can these types of failures be prevented or mitigated?
Other questions would include:
- What layered or composite abstractions are appropriate for NS?
- What hidden assumptions are being made in implementation experiments being conducted across the NS CTA?
- What are the unintended consequences of these assumptions?
Prior Work
The synthesis of large systems using logical decomposition or layering is a proven design
technique that allows complexity to be controlled through the use of modularity and information
hiding. This technique is often reinforced with the goal of building simple and elegant solutions
[RFC3439] that are often easier to both construct and maintain [Hoare80], and successes due to
these practices dominate technology in systems, hardware, software, and networks [RFC1122].
Little progress, however, appears to have been made in providing direction for decomposition
selection decisions, from efforts starting with early communications networks [Saltzer84] up to
modern holistic systems engineering practices [NASA07]. Some insights into partitioning have
recently been driven by stakeholder interests [Clark02].
In some cases, performance requirements, such as in wireless networking, may dictate deviating
from this design practice toward a more fluid sharing of information and mechanisms between
components [Shakkottai03]. But often, the tradeoff made for the benefit of reduced design
complexity causes failures and other unintended consequences to occur [Tennenhouse89]. Many
examples exist where interface complexities and poor or inadequate assumptions bound to the
inter-component services design cause failures [Halderman08]. Unfortunately, this issue
sometimes comes at great cost (e.g., the Space Shuttle Challenger), especially as individual
layers become large systems in themselves [Tucker09].
Nor does guidance exist on how to test the strength of our assumptions about individual
components, or the potential impact across components of flaws in those assumptions.
Managing the complexity and safety of the rapidly expanding social, information, and
communication ecosystem is one of the larger technology challenges we face today, but the
problem may become even more acute as researchers begin to contemplate vastly different
architectures for a future [Clark09] which may involve the coexistence of simultaneous virtual
networks, possibly designed for specific social applications [Wu09] or even for specific
communities or countries. A similar situation will surely be seen in Network Science CTA
experiments that begin to combine social/cognitive, information, and communications networks
into systems and applications focused on Army needs.
We plan to use the term "network dissonance" in our investigations to describe this conflict
between the desired design simplicity due to decomposition or layering and the lack of
complete information about assumptions, as well as the loss of overall function of individual
components of the entire system due to this partitioning. This is not unlike the psychological term
cognitive dissonance, defined as the discomfort caused by holding two contradictory ideas
simultaneously. Note that network dissonance is a narrower term than Clark's Tussle
[Clark02] and mainly focuses on categorizing, identifying, and reducing flaws that might arise
due to this dissonance between network components.
Initial Hypotheses
In this task, we will first concentrate on reasoning about historic examples, then use these results
as input into experimental scenario design.
- A layered network abstraction and security model can be used to explain historic examples of dissonance and failure in composite networks.
- A layered network abstraction and security model, and a set of associated guidelines, can be used to predict potential failure in composite networks with enough specificity that experimental scenarios can be designed to study occurrences of such dissonance.
Technical Approach
This task will formulate a unified approach to the composite network abstraction problem, using
at first simple network models and then meshed models proposed by the other centers involving
combinations of information, social/cognitive, and communications network components. We
will begin this task by constructing at least two simple multi-layer models relevant to the types of
systems studied in the NS CTA. One will be a simple model of the interaction between social,
information, and communications networks and the other a unified information security model
that spans the same three networks.
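As a sketch of what such a multi-layer model might look like in machine-checkable form, the example below is our own illustration under stated assumptions (the layer names, the declared properties, and the checking logic are invented for the example, not the models this task will actually deliver): each layer declares what it assumes of the layer beneath it, and mismatches are flagged as candidate dissonance hazards.

# Illustrative sketch: a toy multi-layer composite network model in which each
# layer declares assumptions about the layer below it, and mismatches are
# reported as candidate "dissonance" hazards.
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class Layer:
    name: str
    provides: Dict[str, bool]                                     # properties this layer actually offers
    assumes_below: Dict[str, bool] = field(default_factory=dict)  # what it expects of the layer below

def find_hazards(stack: List[Layer]) -> List[str]:
    hazards = []
    for upper, lower in zip(stack, stack[1:]):
        for prop, expected in upper.assumes_below.items():
            actual = lower.provides.get(prop)
            if actual != expected:
                hazards.append(
                    f"{upper.name} assumes '{prop}'={expected} of {lower.name}, "
                    f"which provides {actual}")
    return hazards

# Example stack (top to bottom): social layer over information layer over communications layer.
stack = [
    Layer("social", provides={}, assumes_below={"timely_delivery": True}),
    Layer("information", provides={"timely_delivery": True},
          assumes_below={"broadcast_efficient": True}),
    Layer("communications", provides={"broadcast_efficient": False}),
]
for hazard in find_hazards(stack):
    print("hazard:", hazard)

Even a toy check of this kind makes the point that hazard identification depends on forcing each component's assumptions to be stated explicitly, which is exactly the information that clean layered interfaces tend to hide.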
With these models and network examples we should be able to show reasoning steps that provide
insights for improving the design or architecture of composite networked systems. We would
hope to be able to collect enough examples of flawed assumptions, impacts, and remediations
that we could eventually begin to form a taxonomy covering these situations. The examples and
insights may also allow an exploration of what component specifications might be needed to
identify and foresee these issues prior to implementation. These results will eventually be used
to spur progress in the NS CTA, by leading the way in developing terminology for the types of
problems we are seeing and will see, and providing real world examples that we could then test
against the theory and models developed elsewhere in the CTA (to see if the models revealed
issues). The results can also be used to guide composite network experimentation design towards
conditions where failures may occur.
We believe our approach will be successful for several reasons. The layered model used in the
Internet is quite mature (35 years old), and the implementation experience is extensive, as it has
been combined with many information networks and, more recently, with mechanized
instantiations of social networking applications. This allows us, in retrospect, to observe the
hazards and limitations of the model caused by the abstract interfaces used at each layer. We have
begun preliminary work to apply this technique to the information security problem faced by the
Army today, and have already observed that the vast majority of security research and
development being performed today will not succeed in staunching the leakage of information
because no abstract layered model of the complete network under attack exists. Without the
model, research is horizontally fragmented and single-layer specific, and the attackers just move
to another layer. With the security model we are designing, we have made specific observations
on how operating systems may be constructed that would make the applications they run more
security-tolerant. These very tentative and preliminary results lead us to believe there is merit in
applying this same methodology to the layered composite network problem that faces us in
network science.
Validation Approach
The challenge will be to see if our technique extends beyond internets and security to as-yet-
undesigned NS networks, and can be made predictive rather than retrospective. To be useful, a
composite network abstraction must provide simplification and information hiding to reduce the
complexity of implementations. To be successful, we will construct an NS model and use it to
identify hazards in advance, before they are implemented and widely deployed. Existence of a
model then used to identify one or more hazards is thus our success metric.
Linkages
Task R1.3: Category Theory Based Approach to Modeling Composite Networks: Exchange of
approaches and example networks with this task group may improve results of both efforts.
Task R2.1: Semantic Information Theory: Results and insights from this task may be beneficial
in formulating what layer-specific information might be useful in the identification of inter-layer
design hazards.
Task R2.2: Impact of Information Loss and Error: Joint discussion of the nature of flaws in this
task and the inter-layer hazards may help to sharpen and improve results from both tasks.
Task R3.3: Applied Experimentation in Composite Networks: It is expected that design insights
from this task will be beneficial to the design aspects of Task R3.3.
Research products
We expect to construct two simple multi-layer models relevant to the types of systems studied in
the NS CTA: one a simple model of the interaction between social, information, and
communications networks, and the other a unified information security model that spans the same
three networks. Further, we will produce metrics and an ontology of types of hazards (based on
historic instances of dissonance) and provide these as feedback to the ontology and metrics
efforts in EDIN and to the experimental design effort (Task R3.3 above). [R3.4-M1]
Then, we will apply these models to describe potential experimental scenarios which can induce
cross-network hazards or reproduce conditions for dissonance between networks. The scenario
descriptions will provide guidance on the necessary composite network experimentation
capabilities to implement such scenarios. [R3.4-M2]
References
[Clark02] Clark, D. D., Wroclawski, J., Sollins, K. R., and Braden, R., "Tussle in cyberspace:
defining tomorrow's internet," IEEE/ACM Trans. Netw. 13, 3 (Jun. 2005), pp. 462-475.
[Clark09] Clark, D. D., NSF Future Internet Summit, October 12-15, "2009 Meeting summary,
Version 6.0 of November 10th, 2009", to be published, http://www.fisummit.info.
[Halderman08] J. Alex Halderman, Seth D. Schoen, Nadia Heninger, William Clarkson,
William Paul, Joseph A. Calandrino, Ariel J. Feldman, Jacob Appelbaum, Edward W. Felten,
"Lest We Remember: Cold Boot Attacks on Encryption Keys," Proc. 17th USENIX Security
Symposium, San Jose, CA, July 2008.
[Heylighen06] Heylighen Francis, Cilliers Paul, Gershenson Carlos, "Complexity and
philosophy", in Robert Geyer and Jan Bogg (eds.), Complexity, Science, and Society, Radcliffe
Press, 2006.
[Hoare80] C. A. R. Hoare, 1980 Turing Award Lecture, Communications of the ACM 24 (2),
February 1981, pp. 75-83.
[NASA07] Systems Engineering Handbook, NASA/SP-2007-6105 Rev1, NASA, December 2007.
[RFC1122] RFC 1122, "Requirements for Internet Hosts - Communication Layers," October
1989.
[RFC3439] RFC 3439, "Some Internet Architectural Guidelines and Philosophy," December
2002.
[Saltzer84] J. H. Saltzer, D. P. Reed, D. D. Clark, "End-to-end arguments in system design,"
ACM Transactions on Computer Systems, v.2 n.4, pp. 277-288, Nov. 1984.
[Shakkottai03] S. Shakkottai, T. S. Rappaport, and P. C. Karlsson, "Cross-layer design for
wireless networks," IEEE Communications Mag., vol. 41, pp. 74-80, Oct. 2003.
[Tennenhouse89] D. L. Tennenhouse, "Layered Multiplexing Considered Harmful," Protocols
for High Speed Networks, Elsevier Science Publishers, 1990.
[Tucker09] Walter L. Tucker, "When Complexity Exceeds the Capability to Understand It,"
INCOSE, http://www.incose.org/orlando/Attach/200906/ComplexityExceedsCapability.pdf
[Wu09] Lerone Banks, Prantik Bhattachayya, Matt Spear, S. Felix Wu, "Social Network Kernel
for Future Internet Design," http://cyrus.cs.ucdavis.edu/~wu/DSL/DSL_0722_2009_FIST.ppt.

6.7.10 Linkages with Other Projects


This project depends on models, metrics, ontologies, data sets, visualization approaches, and
experiment plans from all projects across the CTA. We will reach out directly to ARC and CCRI
leaders to gather emerging experiment needs and prioritize a schedule of experiments. We will
form an Experimentation Collaboration Committee, with representatives from each of the ARCs;
members of this committee will function as liaisons and will collaborate in setting priorities for
composite experimentation research. In this project we take a two-pronged approach to
establishing an NS CTA distributed experiment environment: we will support near-term
experiments with the best architectures and tools that are available, while also working long-lead
issues that we expect for the future. This approach puts us in the role of provider for joint
combined-network experiment environment needs. We will be in frequent public communication
with the ARCs, CCRIs, and individual researchers to make sure that experiment timelines align
and achieve anticipated validation and research needs. Further, this project specifically lists
technologies (such as MSRI) as concrete starting points for collaborating with ARL to achieve
access to NS CTA activities.

6.7.11 Collaborations and Staff Rotations
This project will collaborate closely and frequently with the IRC staff, specifically the Facility
staff. Hancock expects to visit the Facility a number of times and, more importantly, to
coordinate with the Facility's computational and network evolution to support experimentation
timelines.
We also plan to schedule deliberate meetings with CCRI and other CTA leads to document
experimental requirements. Computational, statistical, analytic, and human participatory
requirements will all be considered.

6.7.12 Relation to DoD and Industry Research


Innovation in the field of network science is vital to the Army's (and DoD's) future. Network
science takes the challenges of communications superiority and information dominance and
allows us to intertwine them with each other and with the cognitive networks of the squad and
command center.
A central goal of this effort is to ensure that intertwining takes place through the creation of a
testing and validation platform that integrates experiments across the different networks, such
that the impact of innovations in one type of network on other networks will be visible.
Another goal is to use this validation platform as a transition platform in which relevance to
military needs and missions is made clear. For that reason we are using immersive, soldier-in-
the-loop environments that allow us to combine real-world scenarios with innovative networks
predicted by theory but not yet realized in the real world. Through this approach we will make
innovations visible to the Army, DoD, and, where relevant, industry, early and often.

6.7.13 Project Research Milestones

Research Milestones

Due  Task  Description
Q3   R3.1  R3.1-M1 Establish an experimental coordination committee, including liaisons to ARCs. Responsible Party: BBN
Q3   R3.1  R3.1-M2 Publicize shared composite network experimentation goals through a CTA seminar and the CTA web pages. Responsible Party: ArtisTech
Q3   R3.2  R3.2-M1 Analyze time-compression techniques used in commercial games or training exercises to evaluate applicability for experimentation in composite networks. Responsible Party: USC
Q3   R3.3  R3.3-M1 Create a composite network experiment design guide. Responsible Party: BBN
Q3   R3.4  R3.4-M1 Categorize historic instances of composite network failure, identifying and modeling key characteristics of cross-genre network interactions relevant to failures. Responsible Party: UDel
Q4   R3.1  R3.1-M3 Identify simulations, models, tools, and other resources from alliance researchers and external DoD sources which may provide required shared composite experimentation capabilities. Maintain CTA web page listings. Responsible Party: ArtisTech
Q4   R3.1  R3.1-M4 Initial installation of shared experimentation infrastructure at the alliance facility. Responsible Party: ArtisTech
Q4   R3.2  R3.2-M2 Analyze live and virtual training exercises to apply the Salas Big Five and multi-team systems models. Identify necessary scenario elements as a basis for experimentation. Responsible Party: Univ. Illinois
Q4   R3.3  R3.3-M2 Implement a simplified scenario representing a military relevant task using a game-based interface to demonstrate elicitation of behaviors and task performance which depends on composite network parameters. Responsible Party: BBN
Q4   R3.4  R3.4-M2 Define simplified experimental scenarios and produce an associated set of guidelines for studying cross-genre network dissonance. Identify key experimentation infrastructure capabilities which would be required to support such scenarios. Responsible Party: UDel

6.7.14 Project Budget by Organization

6.2 Applied Research Budget By Organization

Organization       Government Funding ($)   Cost Share ($)
ArtisTech (IRC)    133,773
BBN (IRC)          363,013
UDEL (IRC)         119,080
Williams (IRC)     12,142
TOTAL              628,008

6.1 Basic Research Budget By Organization

Organization       Government Funding ($)   Cost Share ($)
ArtisTech (IRC)    48,722
BBN (IRC)          112,995
NWU (IRC)          32,814
UIUC (IRC)         75,920
UMinn (IRC)        30,407
Williams (IRC)     28,330
TOTAL              329,188

6.8 Project R4: Liaison

Project Lead: I. Castineyra, BBN
Email: isidro@bbn.com, Phone: 617-873-6233

Primary Research Staff: M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D. Towsley, UMass (IRC)

6.8.1 Project Overview


The purpose of this project is to enable coordination, communication, and collaboration across
the research centers of the CTA at a technical level (vs. institutions such as the TMG, which are
focused on administrative coordination). This project has two tasks: one for 6.1 liaison and one
for 6.2 liaison.

6.8.2 Project Motivation


The idea behind this project is that by encouraging interactions between top-flight researchers at
the IRC and researchers at the ARCs, we will enable more technical interaction and improve
coordination, communication, and collaboration.

6.8.3 Task R4.1: 6.1 Liaison (M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D. Towsley, UMass (IRC))
NOTE: This is a 6.1 basic research task.

This task is centered on encouraging collaboration and communication of 6.1 projects across the
NS CTA. In particular, the liaisons will regularly talk with researchers at their respective ARCs
regarding technical activities and then talk among themselves, the project leader, and other IRC
staff, looking for potential unexplored or unrealized synergies.
Mike Dean is responsible for liaison with INARC. Jim Hendler is responsible for liaison with
SCNARC. Don Towsley is responsible for liaison with CNARC. Note that both Hendler and
Towsley receive funding from both the IRC and the ARCs with which they will conduct liaison
work, making this task very straightforward.

6.8.4 Task R4.2: 6.2 Liaison (M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D. Towsley, UMass (IRC))
NOTE: This is a 6.2 applied research task.

This task is centered on identifying promising research at the ARCs that should be rapidly
moved to 6.2 experimentation, validation, and possible tech transfer. It is well known that
researchers often become so engrossed in the research problems in their work that they do not
realize when the work has reached the point at which it should transition. This task addresses
that risk.
The liaisons for the ARCs will be the same as in Task R4.1.

6.1 Basic Research Budget By Organization

Organization    Government Funding ($)   Cost Share ($)
BBN (IRC)       17,543
RPI (IRC)       17,714
UMass (IRC)     6,705
TOTAL           41,962

6.2 Applied Research Budget By Organization

Organization    Government Funding ($)   Cost Share ($)
BBN (IRC)       4,386
RPI (IRC)       4,428
UMass (IRC)     1,676
TOTAL           10,490

6.9 Project R5: Technical and Programmatic Leadership

Project Lead: W. Leland, BBN
Email: wel@bbn.com, Phone: 908-227-1139

Primary Research Staff: I. Castineyra, BBN (IRC)
Collaborators: ARC Leads; BBN Staff

6.9.1 Project Overview


This project consolidates the consortium's technical and programmatic management. W. Leland
is the project technical lead. I. Castineyra is the program manager.

6.9.2 Project Motivation


Technical and programmatic leadership and coordination will be fundamental to the success of
the program. The program's emphasis on collaboration among geographically dispersed
specialists in different disciplines necessitates strong coordination.
The program includes over one hundred researchers at over twenty-five different institutions.
Pervasive programmatic coordination is a must.

6.9.3 Task R5.1: 6.1 Technical and Programmatic Leadership (W. Leland, BBN (IRC); I. Castineyra, BBN (IRC))
NOTE: This is a 6.1 basic research task.

This task addresses technical and programmatic leadership of 6.1 funds.

6.9.4 Task R5.2: 6.2 Technical and Programmatic Leadership (I. Castineyra, BBN (IRC); W. Leland, BBN (IRC))
NOTE: This is a 6.2 applied research task.

This task addresses technical and programmatic leadership of 6.2 funds. This task also includes
management of the NS CTA Facility in Cambridge, MA.

6.9.5 Project Budget by Organization

6.1 Leadership Budget By Organization

Organization   Government Funding ($)   Cost Share ($)
BBN (IRC)      412,062
TOTAL          412,062

6.2 Leadership Budget By Organization

Organization   Government Funding ($)   Cost Share ($)
BBN (IRC)      361,502
TOTAL          361,502

6.10 Project EDUC: Education Planning

Project Lead: D. Sincoskie, UDel
Email: sincos@udel.edu, Phone: 302-831-7173

Primary Research Staff: C. Cotton, UDel (IRC)
Collaborators: ARC Leads; BBN Staff

6.10.1 Project Overview


The Education task covers coordination of education-related activities for the NS CTA. These
activities include providing educational opportunities for Government personnel, organizing
seminar series, and placing faculty and graduate students in internships and fellowships on
relevant ARL projects.

6.10.2 Task EDUC.1: Education and Transition Planning (D. Sincoskie and C. Cotton, UDEL (IRC))
NOTE: This is a 6.1 basic research task.

This project consists of one task.

6.10.3 Project Budget by Organization

EDUC: Education Planning (6.1)

Organization   Government Funding ($)   Cost Share ($)
UDEL (IRC)     48,276                   20,260
TOTAL          48,276                   20,260

7. Non-CCRI Research: Information Networks Academic Research Center (INARC)

Director: Jiawei Han, UIUC
Email: hanj@cs.uiuc.edu, Phone: 217-333-6903
Government Lead: Lance Kaplan
Email: lkaplan@arl.army.mil, Phone: 301-394-0807

Project Leads and Lead Collaborators:
Project I1: C. Aggarwal, IBM; T. Abdelzaher, UIUC
Project I2: X. Yan, UCSB
Project I3: J. Han, UIUC

Table of Contents
7. Non-CCRI Research: Information Networks Academic Research Center (INARC) ........... 7-1
7.1 Project I1: Distributed and Real-time Data Fusion and Information Extraction ............ 7-4
7.1.1 Project Overview ..................................................................................................... 7-5
7.1.2 Project Motivation ................................................................................................... 7-5
7.1.3 Key Research Questions .......................................................................................... 7-6
7.1.4 Initial Hypothesis ..................................................................................................... 7-6
7.1.5 Technical Approach ................................................................................................. 7-6
7.1.6 Task I1.1 Quality-of-Information-Aware Signal Data Fusion (T. Abdelzaher, UIUC
(INARC); C. Aggarwal, IBM (INARC); A. Bar-Noy, CUNY (INARC); S. Papadimitriou,
IBM (INARC); R. Govindan, USC (CNARC)) ................................................................... 7-7
7.1.7 Task I1.2 Human and Visual Data Fusion (T. Huang, UIUC (INARC); B. S.
Manjunath, UCSB (INARC); H. Ji, CUNY (INARC); T. Höllerer, UCSB (INARC); C. Lin,
IBM (SCNARC); A. Pentland, MIT (SCNARC); Z. Wen, IBM (SCNARC)) .................. 7-10
7.1.8 Task I1.3 Modeling Uncertainty for Quality-of-Information Awareness in
Heterogeneous Information Network Sources (H. Ji, CUNY (INARC); C. Aggarwal, IBM
(INARC); D. Roth, UIUC (INARC); A. Singh, UCSB (INARC)).................................... 7-16
7.1.9 Linkages Table to Other Projects/Centers ............................................................. 7-20
7.1.10 Collaborations and Staff Rotations ...................................................................... 7-20
7.1.11 Relevance to US Military Visions/Network Science ........................................... 7-20
7.1.12 Relation to DoD and Industry Research .............................................................. 7-21
7.2 Project I2: Scalable, Human-Centric Information Network Systems ........................... 7-24
7.2.1 Project Summary/Research Issues Addressed ....................................................... 7-25

7.2.2 Key Research Questions ........................................................................................ 7-25
7.2.3 Technical Approach ............................................................................................... 7-25
7.2.4 Task I2.1 Information Network Organization and Management (X. Yan, A. Singh,
UCSB (INARC); C. Aggarwal, IBM (INARC); G. Cao, PSU (CNARC); J. Hendler, RPI
(IRC)) 7-26
7.2.5 Task I2.2 Information Network Online Analytical Processing (X. Yan, UCSB
(INARC); J. Han, UIUC (INARC); C. Lin, IBM (SCNARC)) ......................................... 7-30
7.2.6 Task I2.3 Information Network Visualization (Tobias Höllerer, UCSB (INARC); P.
Pirolli, PARC (INARC); X. Yan, UCSB (INARC); W. Gray, RPI (SCNARC)).............. 7-33
7.2.7 Linkages to Other Projects ..................................................................................... 7-37
7.2.8 Relevance to US Military Visions/Impact on Network Science ............................ 7-38
7.2.9 Relation to DoD and Industry Research ................................................................ 7-38
7.3 Project I3. Knowledge Discovery in Information Networks ........................... 7-42
7.3.1 Project Motivation and Overview .......................................................................... 7-42
7.3.2 The key research question of the project ............................................................... 7-43
7.3.3 Technical Approach ............................................................................................... 7-43
7.3.4 Task I3.1 Methods for scalable mining of dynamic, heterogeneous information
networks (J. Han, UIUC (INARC); C. Faloutsos, CMU (INARC); X. Yan, UCSB (INARC);
M. Faloutsos, UCR (IRC); J. Hendler, RPI (IRC); B. Szymanski, RPI (SCNARC)) ....... 7-44
7.3.5 Task I3.2 Real-Time Methods for mining Spatiotemporal Information-Related
Cyber-physical Networks (S. Papadimitriou, IBM (INARC); J. Han, UIUC (INARC); X.
Yan, UCSB (INARC); S. Adali, RPI (SCNARC); T. La Porta, PSU (CNARC)) ............. 7-47
7.3.6 Task I3.3 Text and Unstructured Data Mining for Information Network Analysis (D.
Roth, UIUC (INARC); J. Han, UIUC (INARC); H. Ji, CUNY (INARC); X. Yan, UCSB
(INARC); M. Faloutsos, UCR (IRC); B.Szymanski, RPI (SCNARC)) ............................ 7-50
7.3.7 Linkages to Other Projects/Centers ....................................................................... 7-54
7.3.8 Relevance to US Military Visions/Impact on Network Science ............................ 7-55
7.3.9 Relation to DoD and Industry Research ................................................................ 7-55

The Information Network Academic Research Center (INARC) is aimed at (1) investigating the
general principles, methodologies, algorithms, and implementations of information networks,
and the ways in which information networks work together with communication networks and social and
cognitive networks, and (2) developing the information network technologies required to
improve the capabilities of the US Army and provide users with reliable and actionable
intelligence across the full spectrum of Network Centric Operations.

An information network is a logical network of data, information, and knowledge objects that are
acquired and extracted from disparate sources, such as geographical maps, satellite images,
sensors, text, audio, and video, through devices ranging from hand-held GPS to high-
performance computers. For systematic development of information network technologies, it is
essential to deal with large-scale information networks whose nodes and/or edges are of
heterogeneous types, which link multi-typed objects, and which are highly distributed, dynamic,
volatile, and laden with uncertain information.



INARC will systematically investigate the foundations, methodologies, algorithms, and
implementations needed for fusing multiple types of data and information; constructing effective,
scalable, hierarchical, and, most importantly, dynamic and resilient information networks;
discovering patterns and knowledge from such networks; and applying network science
technologies to military applications. In particular, the center will work on the following
five research projects:

• Project E (EDIN-CCRI): Foundation of Evolving, Dynamic Information Networks
• Project T (Trust-CCRI): Foundation of Trusted Information Networks
• Project I1: Distributed and Real-time Data Fusion and Information Extraction
• Project I2: Scalable, Human-Centric Information Network Systems
• Project I3: Knowledge Discovery in Information Networks

Among the five projects, the first two, E and T, are cross-center research initiative (CCRI)
projects. For these two projects, we will work closely with the three other research centers,
i.e., CNARC, SCNARC, and IRC, to contribute to the systematic investigation and development of
cross-center network science technologies. These two CCRI projects are therefore detailed in
their corresponding CCRI project descriptions, and this portion of the IPP is dedicated to the
remaining three INARC-centered projects. Moreover, INARC will actively contribute, together
with the other centers, to a comprehensive CTA-wide education plan.

Here we provide a general overview of the three dedicated INARC research projects, outlining
for each project the major research problems, the tasks to be addressed, the organization, and the
plan for collaboration with other centers.

Project I1: Distributed and Real-Time Data Integration and Information Fusion (Leads: C.
Aggarwal, IBM (INARC) and Tarek Abdelzaher, UIUC (INARC))

This project aims to answer two fundamental questions to ensure quality of information (QoI) in
data and information fusion: (1) how to integrate and fuse heterogeneous data (sensor, visual,
and textual) that may be delivered over resource-constrained communication networks, so as to infer
and organize implicit relationships that enable comprehensive exploitation of information, and
how to improve fusion by incorporating human feedback in the process; and (2) how to model the
uncertainty resulting from resource-constrained communication, data integration, and
information fusion to enable assessment of the quality and value of information. The focus of
this project is on large-scale information extraction and fusion in the context of a large linked
information network. The goals of the project include both the derivation of such logical
linkages and their use during the fusion process.

The project consists of the following three tasks:


Task I1.1. Quality-of-information Aware Signal Data Fusion
Task I1.2. Human and Visual Data Fusion
Task I1.3. Modeling Uncertainty in Information Extraction from Heterogeneous
Information Networks



Project I2: Scalable, Human-Centric Information Network Systems (Lead: X. Yan, UCSB
(INARC))

This project is concerned with the following research problems: (1) how to organize and
manage distributed and volatile information networks, and (2) how to analyze and visualize
information networks to provide end users with information tailored to their context. A
sophisticated information network system should present a human-centric, simple, and intuitive
interface that automatically scales according to the context, information needs, and cognitive
state of its users, and should maintain its integrity under uncertainty, physical constraints
(communication capacity, power limitations, device computation capability, etc.), and the
evolving data underneath.

The project consists of the following three tasks:


Task I2.1. Information Network Organization and Management
Task I2.2. Information Network Online Analytical Processing
Task I2.3. Information Network Visualization (Lead: Höllerer)

Project I3: Knowledge Discovery in Information Networks (Lead: J. Han, UIUC (INARC))

This project considers a key research problem: how to develop efficient and effective knowledge
discovery mechanisms for distributed and volatile information networks. Knowledge discovery
in information networks involves the development of scalable and effective algorithms to
uncover patterns, correlations, clusters, outliers, rankings, evolutions, and abnormal relationships
or sub-networks in information networks. It is a new research frontier, and Army applications
pose many new challenges that call for in-depth research on effective knowledge discovery in
distributed and volatile information networks.

This project consists of the following three tasks:

Task I3.1: Methods for scalable mining of dynamic, heterogeneous information networks
Task I3.2: Real-time methods for mining spatiotemporal information-related cyber-
physical networks
Task I3.3: Text and Unstructured Data Mining for Information Network Analysis

In general, information networks are intertwined with communication networks and social and
cognitive networks in many respects. Concrete collaboration plans with all three other centers,
together with plans for exploring military applications, are therefore laid out in each
project's description.

7.1 Project I1: Distributed and Real-time Data Fusion and Information Extraction



Project Lead: C. Aggarwal, IBM
Email: charu@us.ibm.com, Phone: 914-784-6699
Project Lead: T. Abdelzaher, UIUC
Email: zaher@cs.uiuc.edu, Phone: 217-265-6793

Primary Research Staff Collaborators

T. Abdelzaher, UIUC (INARC) R. Govindan, USC (CNARC)

C. Aggarwal, IBM (INARC) C. Lin, IBM (SCNARC)

A. Bar-Noy, CUNY (INARC) A. Pentland, MIT (SCNARC)

T. Höllerer, UCSB (INARC) Z. Wen, IBM (SCNARC)

T. Huang, UIUC (INARC) D. Roth, UIUC (INARC)

H. Ji, CUNY (INARC) A. Singh, UCSB (INARC)

B. S. Manjunath, UCSB (INARC)

S. Papadimitriou, IBM (INARC)

7.1.1 Project Overview


This project addresses the lower-level building blocks of information network
construction. Information network construction requires us to collect data from diverse
information sources, infer the virtual links and ontologies among them, and fuse them in order to
build semantically relevant objects that can be leveraged for the knowledge discovery and
processing discussed in the other projects. The project develops a number of techniques for
multimodal data sources such as sensors, text, video, and audio. Two general themes cut across
all tasks in this project: (i) the notion of optimizing quality of information as a
driver and framework for formulating the various data fusion and information extraction
problems, and (ii) the notion of quantifying uncertainty as a first-class attribute of data and
objects derived from the use of disparate, heterogeneous data sources.

7.1.2 Project Motivation


One of the key challenges in information network construction is to be able to integrate data
from heterogeneous sources into summary information that can be effectively leveraged for
knowledge discovery in a variety of military scenarios. In the context of an information
network, a rich amount of logical linkage information exists, which can be leveraged for fusion.
Data sources of military relevance can be broadly categorized as signal data sources, human data
sources and visual data sources. Due to the inherently different nature of signal, human, and
imagery data sources, we anticipate that different techniques will be needed to integrate them
into the information network, and to fuse the underlying information with the help of information
network linkages (further organization of, and knowledge discovery from, the extracted
information will be addressed in Project I2, and Project I3, respectively). In military scenarios,
data sources are also likely to have varied degrees of inherent uncertainty. Therefore we need to
model and analyze this non-uniform uncertainty in the context of the fusion process, and how it
propagates into the extracted information so that the trust in and quality of information can be
assessed on a principled basis. The focus of this project is on large scale information extraction
and fusion that can be assimilated into a linked information network. The goals of the project
include both the derivation of information network objects and linkages as well as their use
during the fusion process.

7.1.3 Key Research Questions


How to infer information objects and linkages among them from widely distributed and diverse
data sources in order to build a semantically meaningful information network for inference and
knowledge discovery? How to quantify the uncertainty regarding the state of such objects and
linkages? Is it possible to utilize object linkages to reduce the uncertainty in the inference
process itself? How to address the above problems in a way that optimizes the quality of
information derived from the data sources?

7.1.4 Initial Hypothesis


It is possible to infer reliable information network objects and linkages among them from widely
distributed data sources, and at the same time leverage and utilize these objects and linkages for
improving the inference and fusion process itself, making it superior to conventional techniques
which do not work with a network-centric approach.

7.1.5 Technical Approach


The tasks in this project are organized around processing different kinds of data to extract
information objects and the linkages inherent both within and across them, in the context of a
logically networked environment. Special emphasis is placed on optimizing the quality of
information as well as quantifying and reducing uncertainty. We examine the propagation of
quality of information and uncertainty when multiple data sources are fused together.
Accordingly, we propose the following three tasks:
1. Quality-of-information Aware Signal Data Fusion. In this task, we will develop distributed
algorithms to transform sensory information feeds into uniquely identifiable and addressable
local (information network) objects with well-defined semantics that are easy to integrate
into an information network. These algorithms are cast as resource-constrained optimization
of quality of information and attain bounded uncertainty.
2. Human and Visual Data Fusion. In this task, we will develop techniques for fusing a variety
of data sources such as sensor, text, images, and video. From an information network
perspective, our unique approach is to use the virtual linkages during the fusion process.
3. Modeling Uncertainty for Quality-of-Information Awareness in Heterogeneous Networked
Data Sources. In this task, we will design methods for quantifying and reducing the
uncertainty from different information network sources. The framework for characterizing
and reducing uncertainty will be essential in understanding QoI and trust.

7.1.6 Task I1.1 Quality-of-Information-Aware Signal Data Fusion (T. Abdelzaher,


UIUC (INARC); C. Aggarwal, IBM (INARC); A. Bar-Noy, CUNY (INARC);
S. Papadimitriou, IBM (INARC); R. Govindan, USC (CNARC))
Task Overview
This task will study methods for distributed and real-time fusion of sensor data sources into an
information network in a manner aware of quality of information, degree of uncertainty, and
resource constraints. In particular, we close a feedback loop between the information network
and the physical sensing system to improve attained information quality, bound the degree of
uncertainty and reduce sensing resource needs.

Task Motivation
Information networks rest on two fundamental abstractions: the abstraction of information
objects and the abstraction of links that describe their inter-relations. Examples of nodes include
information entities residing in images, text, and sensors, concepts such as threats and
vulnerabilities, events such as attacks, and locations. Links have multiple types that specify
semantic relations between objects. For example, links might indicate acquisitions,
communications, or command chains. There are two main challenges in assimilating signal data
into the information network. First, the same physical object, concept or event is often sensed or
reported by multiple sensors from different perspectives, different degrees of reliability, and
possibly different veracity. This creates data that must be collected, consolidated, and linked
at different levels of abstraction. In turn, such linkage can aid the identification and
maintenance of individual information objects, for example by helping prioritize further data
acquisition needs. Second, sensors may generate large amounts of streaming data that need to
be summarized both for known, specific applications and for as-yet-unknown applications that
may be required in the future. Feedback from the information network into the data fusion system
could improve situation awareness and resource efficiency of fusion by maintaining and sharing
an up-to-date representation of key measured or inferred features of the environment, guide the
degree of summarization, determine and reduce uncertainty, and help compute the most efficient
sensing resource allocation.

Key Research Questions


How to fuse the sensor feeds from physical sources in a manner cognizant of quality of
information needs, resource constraints, and data uncertainty? How to utilize information
network feedback? How to analyze timeliness of real-time information fusion?

Hypothesis
This task will test the hypothesis that the resource requirements of data fusion and the value of
information resulting from data fusion can be significantly improved by exploiting feedback from
the information network regarding object uncertainty, data linkages, and models of sensed
phenomena. This improvement is with respect to techniques that do not use feedback from the
information network.

Prior Work



The problem of sensor stream fusion has been studied extensively in prior work (see [Agg2007]
for a detailed survey). We have also studied this problem in the military domain, especially in
the context of battlefield awareness and tracking [Abdel2004, Luo2006]. Our proposed research
will differ from prior work in its ability to exploit the information network for prioritizing needs
for further data extraction and for guiding data summarization. In turn, this exploitation will
improve the quality of information attained, bound the degree of uncertainty, and reduce
resources used in the process of extracting and constructing semantically meaningful information
objects from the underlying sensor streams. We refer to data fusion systems that are guided by
feedback from the information networks they help build as information distillation networks.

Proposed Work
In a joint vision with the CNARC, we aim to explore the concept of networks as information
sources, as opposed to networks as mere connectivity providers between end-points. In addition,
we close the loop between the physical sensing system and the information network in order to
improve information quality, bound the degree of uncertainty and reduce resource needs. An
information distillation network must integrate a large number of human and data sources of
different degrees of reliability, noise, uncertainty, resource cost, and veracity into higher-level
pieces of actionable information with a quantifiable level of quality and low uncertainty. Current
data fusion techniques are "open loop" in that the logical information flow is one-directional,
from sensors to fusion engines. We hypothesize that closing the loop by utilizing knowledge
extracted from the information network can significantly improve these techniques. There are
two aspects to consider in that regard: the quality of information (e.g., how reliable,
accurate, or uncertain it is) and the value of information (e.g., how much the user cares about
this information in view of other known information). For example, a determination that
something is a threat with high confidence makes the information of higher quality than the same
determination with low confidence. Likewise, new reports from ground sensors about nearby tanks
are of more value when current knowledge about those tanks is limited and the tanks are in
close proximity. To achieve the vision of information distillation networks, we
propose to investigate and develop three fundamental enabling components, as discussed below.

The first component and outcome of this task is a suite of distributed algorithms, leveraging our
prior work on battlefield awareness and tracking [Abdel2004, Luo2006], to transform sensory
information feeds into a set of uniquely identifiable and addressable logical (information
network) objects with well-defined operationally-relevant semantics and a quantified degree of
uncertainty. Information objects thus formed could range from physical models of distributed
phenomena measured by the network to the representation of physical objects in the
environment. The aforementioned algorithms will employ feedback from fusion results back to
the data collection system to optimize the quality of collected information. The feedback
recognizes that data is only as valuable as its contribution to the quality of information objects.
Towards that end, we shall develop protocols for data collection and fusion that maximize the
quality of object representation, taking into account both the structure of the information network
and user-defined value of information on each object. Explicit consideration will be made to
resource constraints. Hence, an optimization problem will be solved whose objective is to select
the most appropriate sensors and sensor modalities for maximizing the global information value
of the network, given current resource constraints and degree of uncertainty of different sources.
Distributed and localized algorithms will be explored that take into account the communication
model and bound sensor communication. The task will also be concerned with quantifying the
impact of resource constraints on feasible regions, defined as those state subspaces in which
fusion meets its end-to-end performance constraints. Of particular interest will be incorporating
end-to-end deadline constraints into the optimization of real-time sensor data processing and data
fusion.
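
As an illustrative aside, the selection step described above can be viewed as a budgeted maximization. The following Python sketch (not a planned deliverable; the sensor names, costs, values, and the additive-value assumption are all hypothetical) shows a simple greedy value-per-cost heuristic for choosing sensors under a resource budget.

    # Hedged sketch: greedy budgeted selection of sensors by information value per unit cost.
    # Costs, values, and the budget are illustrative placeholders, not program values.

    def greedy_sensor_selection(sensors, budget):
        """Pick a subset of sensors maximizing summed information value under a cost budget.

        sensors: list of (name, cost, value) tuples, where 'value' is an estimated
                 marginal contribution to global information value (assumed additive here).
        budget:  total resource budget (e.g., bandwidth or energy units).
        """
        # Rank candidates by value density (value per unit cost).
        ranked = sorted(sensors, key=lambda s: s[2] / s[1], reverse=True)
        chosen, spent = [], 0.0
        for name, cost, value in ranked:
            if spent + cost <= budget:
                chosen.append(name)
                spent += cost
        return chosen, spent

    if __name__ == "__main__":
        candidates = [("acoustic-1", 2.0, 0.9), ("ir-cam-3", 5.0, 1.8),
                      ("vibration-7", 1.0, 0.3), ("magnetic-2", 3.0, 1.5)]
        subset, used = greedy_sensor_selection(candidates, budget=6.0)
        print(subset, used)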

The second main component and outcome of this task is an analytic framework for reasoning
about uncertainty propagation in the data fusion system, together with mechanisms that bound
such propagation. In particular, as sensory data are combined in the data fusion system,
inferences can be made about the degree to which individual data sources contribute to the
quality of information at the destination and to the error in information object representation.
These inferences can be used to reconfigure data collection such that error is reduced and quality
is improved. An analytic framework will be developed to offer a principled approach for such
reconfiguration.
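
To make the flavor of such uncertainty propagation concrete, the minimal sketch below uses standard inverse-variance weighting of independent scalar estimates; it is offered only as an illustration of how per-source variances combine into a fused variance, not as the analytic framework to be developed.

    # Hedged sketch: inverse-variance (precision-weighted) fusion of independent scalar estimates.
    # Illustrates how per-source uncertainty propagates into the fused estimate's uncertainty.

    def fuse_estimates(estimates):
        """estimates: list of (mean, variance) pairs from independent sources."""
        precision = sum(1.0 / var for _, var in estimates)      # total precision
        fused_var = 1.0 / precision                             # never larger than the smallest input variance
        fused_mean = fused_var * sum(mean / var for mean, var in estimates)
        return fused_mean, fused_var

    if __name__ == "__main__":
        # Three hypothetical sensors reporting the same quantity with different noise levels.
        sources = [(10.2, 4.0), (9.7, 1.0), (10.9, 9.0)]
        print(fuse_estimates(sources))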

The third component considers signal data sources such as infrared or vibration monitors that
generate a significant amount of data that arrives in the form of streams. Since the amount
of historical data may be too large to be held explicitly and may be received from a large number
of diverse sensors, we need to develop companion techniques that can fuse the
information from multiple networked sensors, and construct compressed summaries for a wide
variety of information extraction applications. The logical information network linkages among
the networked sensors may be used during the summarization process. These summarization
techniques must be general enough to be reusable in a variety of scenarios. We will explore how
summaries such as sketches, histograms, wavelets and sampling [Agg2007] can be leveraged in
military applications. Effective techniques will be designed to archive and retrieve such
summaries.
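
As one small, generic example of a reusable stream summary (a sketch assuming a scalar reading stream; it is not one of the techniques to be designed), reservoir sampling maintains a fixed-size uniform sample of an unbounded stream in constant memory.

    import random

    # Hedged sketch: reservoir sampling keeps a uniform fixed-size summary of an unbounded stream.
    def reservoir_sample(stream, k, seed=0):
        rng = random.Random(seed)
        reservoir = []
        for i, reading in enumerate(stream):
            if i < k:
                reservoir.append(reading)
            else:
                j = rng.randint(0, i)          # each earlier item survives with probability k/(i+1)
                if j < k:
                    reservoir[j] = reading
        return reservoir

    if __name__ == "__main__":
        synthetic_stream = (x * 0.1 for x in range(10000))   # placeholder sensor readings
        print(reservoir_sample(synthetic_stream, k=8))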

Validation
Validation will proceed experimentally by comparing performance of data fusion algorithms in
terms of resources needed and value achieved in the case where fusion does not utilize object and
phenomena models gathered from the information network and in the case where it leverages
such information. The PIs are in possession of a number of simple sensors including light, sound,
magnetic, infrared (camera), motion, and acceleration sensors. In addition, a number of sensor
data sets are publicly available, such as the UC Berkeley sensor data set. Experiments
will proceed in three steps. An initial comparison of the algorithms (in year 1) will be conducted in
simulation using common network simulators such as NS2 or equivalent that have the capability
to account for communication models and constraints. In later years, we will use models
provided by CNARC. Sensors will be abstracted by noisy detectors or noisy scalar sources. This
evaluation will lead to a general understanding of the strengths and limitations of the data fusion
algorithms considered as a function of abstract sensor models. Next, validation will be conducted
using data from the physical sensors mentioned above. These results will shed light on
performance characteristics of fusion, such as the quality of collected information and the degree
of resulting uncertainty, given specific sensors and targets. Finally, physical experiments will be
performed on (existing) laboratory testbeds comprised of multiple distributed sensors with
wireless interfaces. For the summarization portion, the PIs will use publicly available sensor
data sets. The PIs are further open to suggestions regarding sensing modalities to use in their
studies, as well as the possibility of synthetically generating military-relevant data sets with
feedback from ARL.

Products

The research from this effort will result in the following: (1) Methods for creating semantically
linked objects from sensor streams; (2) Analytic and performance results on information quality
and uncertainty that quantify the end-to-end behavior of fusion that takes information network
linkages into account; (3) Methods for stream summarization; (4) Experimental results validating
the hypothesis on the usefulness of exploiting the information network in fusion; and (5) Research
reports and published papers describing the aforementioned items.

Linkages with other projects


This work ties directly with CNARC tasks for collecting and interpreting data as well as macro-
programming (the latter task of CNARC has been deferred to the second year). The former
linkage is obvious since the ability of the communication network to determine the quality and
value of information depends on feedback from the information network and the output of the
optimization problem described above. Our collaborator on this task will be Ramesh Govindan
from CNARC. The latter linkage is because the abstraction of a network as a source of high-
value information objects lends itself nicely to a new network programming paradigm, called
environmentally-immersive programming [Abdel2004, Luo2006], that allows the user to
manipulate such objects directly, as opposed to programming low-level communication patterns.
The operators' view of the system is thus made up of logical objects (entities in the information
network) representing the key relevant elements of the battlespace (e.g., soldiers, assets, fires,
models, locations, contamination levels, activities, and threats), each encapsulating its own data
and state. Values may be attributed to different object types. The joint exploitation of both
sensor feeds and information network feedback to maximize the value of information objects
derived from the network sets this work apart from prior attempts at providing better
communication channels and sensor network abstractions of the physical world [Li2004,
Liu2003A, Liu2003B, Mad2005].

7.1.7 Task I1.2 Human and Visual Data Fusion (T. Huang, UIUC (INARC); B. S.
Manjunath, UCSB (INARC); H. Ji, CUNY (INARC); T. Höllerer, UCSB
(INARC); C. Lin, IBM (SCNARC); A. Pentland, MIT (SCNARC); Z. Wen,
IBM (SCNARC))

Task Overview
This task studies the fusion of multimodal data (with a special focus on visual and human data)
in the context of information network ontology and linkage structure. The core idea is to utilize
the virtual linkages in the information network during the fusion process. Some examples of
such fusion techniques are based on the following kinds of virtual linkages: (1) Co-reference in
multiple documents (2) Text linked by images and video (sensor) feeds (3) Links between
different sensors implicit in a distributed network such as a camera sensor network (4) Network
links between different kinds of entities; for example a document may link to an image in an
information network environment, and therefore provide hints for effective fusion and inferences
which may be derived from such a linkage.



Task Motivation
The ability of the US Army to monitor and fuse large amounts of heterogeneous data in sensors,
images, text, video, and audio is critical for maintaining situational awareness on the battlefield.
It is also extremely important to quickly extract compact representations of relevant information
from large amounts of heterogeneous data, in which the relationships may be inferred only from
the virtual linkages in the information network. The goal of this task is to enable the fusion of
different kinds of heterogeneous data. In a classic information network, links are used to
represent relationships between subjects in the network. For example, on the World Wide Web, a linkage
between two web documents is denoted by a hyperlink. In a sensor network, a linkage usually
represents whether two sensors directly exchange information in the network. Such linkages
play an important role for network analysis but are often limited to a single type of information.
In this project, we propose to enrich the network analysis with virtual linkages embedded
inherently in different kinds of data. The first kind of virtual linkage is constructed from a visual
similarity graph among different kinds of visual objects. This similarity graph may be
considered a small information network, and is derived from the underlying feature descriptors.
Virtual linkages can also be used to analyze web-scale image collections and surveillance sensor
networks. For example, when the images in two web documents contain the same object,
we can raise our confidence that the documents are semantically correlated even if they
come from different sources. Similarly, if the same subject appears twice in the sensor
network, we can estimate various aspects of the subject more accurately.
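
For illustration only, the sketch below constructs such a visual similarity graph by thresholding the cosine similarity between feature descriptors; the descriptors, threshold, and item names are placeholders rather than outputs of the planned methods.

    import numpy as np

    # Hedged sketch: build a visual similarity graph by thresholding cosine similarity
    # between feature descriptors. Descriptors and threshold are illustrative placeholders.
    def similarity_graph(descriptors, threshold=0.8):
        """descriptors: dict mapping item id -> 1-D feature vector (numpy array)."""
        ids = list(descriptors)
        edges = []
        for i, a in enumerate(ids):
            for b in ids[i + 1:]:
                va, vb = descriptors[a], descriptors[b]
                cos = float(np.dot(va, vb) / (np.linalg.norm(va) * np.linalg.norm(vb)))
                if cos >= threshold:
                    edges.append((a, b, cos))   # virtual linkage with its similarity weight
        return edges

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        feats = {f"img{i}": rng.random(16) for i in range(5)}
        print(similarity_graph(feats, threshold=0.7))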

Key Research Question


How to fuse multimodal data such as text, audio and video with the use of critical information
network links? How to derive logical links between information network sources?

Hypothesis
Inference methods which utilize network ontology and links during the fusion process of
multimodal data are significantly superior to conventional inference methods.

Prior Work
The problem of fusion in the context of multimedia data has been widely studied; see the survey
reference [Wu2004] for a detailed study of such techniques. However, these methods construct
fusion tools in isolation, and do not take into account the rich information which is available in
the virtual linkages of an information network. This task studies the problem of multimedia
fusion in the context of the virtual linkages of an information network; a scenario in which the
rich linkage information can be leveraged for fusion and knowledge discovery.

Proposed Work
Information networks often involve multiple types of data, and how best to combine these
features is a challenging problem. The fusion problem becomes more important when we
introduce virtual network linkages as discussed above. We have proposed a probabilistic
approach to model subject relationships using heterogeneous kinds of data sources in
[Cao2008], and showed that the model with heterogeneous data significantly outperforms models
built from uniform data. However, the approach in [Cao2008] is limited, since it requires that the
nodes in the network take only one of two states (0 or 1). We are working on a more complex model
named Heterogeneous Feature Machines [Cao2009A], and plan to apply it to the analysis of data in
large networks. A number of tasks are naturally suited to leveraging network information.
An example is the design of methods for image ranking and recommendation. Given an
association network, the most popular image is the one with the largest number of connections.
The broad idea is to efficiently recommend and summarize the representative images based on
the association network. Such an association network can easily incorporate heterogeneous
information, including visual similarity, user votes, and physical relationships.
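
A minimal sketch of the degree-based ranking idea follows; the edge list is a toy placeholder, and a weighted or PageRank-style variant would be a natural refinement.

    from collections import defaultdict

    # Hedged sketch: rank images by (weighted) degree in an association network.
    # The edge list is a toy placeholder; edges could encode visual similarity, user votes, etc.
    def rank_by_degree(edges):
        """edges: iterable of (image_a, image_b, weight)."""
        score = defaultdict(float)
        for a, b, w in edges:
            score[a] += w
            score[b] += w
        return sorted(score.items(), key=lambda kv: kv[1], reverse=True)

    if __name__ == "__main__":
        assoc = [("img1", "img2", 1.0), ("img1", "img3", 0.5),
                 ("img2", "img4", 2.0), ("img1", "img4", 1.5)]
        print(rank_by_degree(assoc))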

We will also design methods for fusing human data (in the form of text, lectures, oral histories,
speech, images, and audio sensor feeds) from multiple networked sources with the use of
linkage information. As possible scenarios, we build the information networks using (a) web-
scale multimedia data, especially news and blogs on the web, and (b) data from a camera
sensor network augmented by human observers' recordings. At the core of human data fusion lie
techniques to identify 'facts' (entities, relations, and events) of a particular type within different
kinds of media, such as documents, images, sensor information or video, which are subsequently
converted into structured representations (e.g., databases). Most current information extraction
(IE) systems focus on processing one source at a time. This is not well suited to large
information networks containing many disconnected, unranked, redundant (and some erroneous)
facts. A related challenge is that the information network associated with images or videos often
contains unlabeled data. Some unlabeled samples exhibit distinctive characteristics, such as
familiar faces and common activities, that can be recognized by general visual understanding
systems. The other samples, however, are difficult to interpret. We will leverage a label
propagation approach based on the association network, which is built according to the similarity
between visual subjects. To handle the missing labels, our approach first annotates the visual
subjects with distinctive characteristics and then propagates the labels to the other "hard"
samples according to the association network. We believe this approach has potential in general
network interpretation tasks, and we will continue working in this direction. When we combine
information from images and their associated text (e.g., metadata, captions, surrounding text,
transcription), one of the challenges lies in the uncertainty of text representation. The
descriptions are usually generated by humans and thus are prone to error. The images, especially
the web images, are typically labeled by different users with different languages and cultural
backgrounds, so it is unrealistic to expect the descriptions to be consistent. Without rich and
accurate descriptions, information network images cannot be searched or processed correctly. We
therefore propose the following:

(1) We will design methods for integrating such heterogeneous data into a canonical document
representation graph that unifies data coming from heterogeneous formats and media. By
mining the connections (or virtual links) between these meta-data and their models, it is possible
to obtain a unifying coherent representation of the structured network. This representation is
designed such that it is able to accommodate, for every supported document format, enough
information to allow an inference algorithm to run. Furthermore, a typical information
distillation system primarily uses prior knowledge, which is not updated during the extraction
process. We will take a broader view by exploiting posterior knowledge derived from related
documents and other data available on the entire information network. The underlying
philosophy for such integrated information networks is to leverage the redundancy of information
between data of different types [Dow2005, Ji2008]. As is well known, this redundancy can boost
the inference ability of the system and make it more resistant to noise in the
network. Moreover, by mining the redundancy, missing or damaged data nodes in the
network can be recovered, especially in extreme environments such as battlefields.
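
The sketch below illustrates, in hypothetical terms, the kind of canonical representation graph intended here (it uses the networkx library purely for convenience; node names, types, and relation labels are invented for the example): documents, images, and extracted entities become typed nodes, and cross-media redundancy becomes explicit edges that downstream inference can traverse.

    import networkx as nx

    # Hedged sketch: a canonical representation graph over heterogeneous media.
    # Node names, types, and edge labels are illustrative placeholders.
    def build_representation_graph():
        g = nx.Graph()
        g.add_node("doc:report-17", kind="text")
        g.add_node("img:frame-042", kind="image")
        g.add_node("entity:vehicle-A", kind="entity")
        # Links derived from captions, co-occurrence, or extraction output.
        g.add_edge("doc:report-17", "entity:vehicle-A", relation="mentions")
        g.add_edge("img:frame-042", "entity:vehicle-A", relation="depicts")
        g.add_edge("doc:report-17", "img:frame-042", relation="caption-of")
        return g

    if __name__ == "__main__":
        graph = build_representation_graph()
        print(graph.number_of_nodes(), graph.number_of_edges())
        print(list(graph.neighbors("entity:vehicle-A")))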

We will also leverage an existing approach based on Markov Logic Networks [Dom2008] to
capture the global inference rules. Such a statistical relational learning approach will provide a
more unified framework to combine the power of both uncertainty (global confidence metrics)
and complex relational structure (interactions among different events, arguments and roles).

Exploiting this approach will also provide greater flexibility to encode probabilistic graphical
relations besides first-order logic, and thus allow us to fuse dynamic background knowledge as
required to effectively interpret a multimedia document in a more holistic way in the context of
the information network. Besides written text, ever-increasing amounts of human-generated data are
available as speech recordings, including news broadcasts, meetings, debates, lectures, hearings,
oral histories, and webcasts. This medium involves many difficulties related to its variability
in terms of quality, environment, speaker, and language. In this project, we will attempt a novel
approach of linking diverse facts extracted from the information network as feedback to enhance
the fusion process. Our approach is not restricted to groups of multiple documents; it also extends
to cross-media fusion.

For the camera sensor network scenario, we will use the extensive network data available at
UCSB. This unique infrastructure has a large number of video cameras, both static and mobile,
distributed across the campus and surrounding areas. For this project, we will augment the
camera network data with human observers who provide verbal information that may cover
areas not covered by the camera network. On the analysis side, one of the primary challenges is
to discover the relationships between the non-visual data and the visual information.
Specifically, how the visual data can be used to provide a summarization similar to the narrated
descriptions and, if available, how verbal annotations can help the visual recognition process.
Another challenge is to detect anticipated and unanticipated abnormal behavior occurring in
the camera network based on the historical information available through other network sources
including the past camera data itself. For example, these could include prior network information
about possible motion patterns at different times of the day/week at specified locations in the
network.

Unfortunately, automatically inferred information is never perfect, as there are many factors that
determine the quality of information in visual data, starting with the image acquisition process
and accumulating uncertainty all the way to the decision engines. Modeling and quantifying
these degradations is especially important in a networked environment, where the information
nodes need to know how trustworthy the information coming from other nodes is. In the camera
network setting, we will model the accuracy of our algorithms with respect to time dependent
factors such as outside lighting, crowd conditions and traffic patterns. This approach to modeling
information quality over time is well aligned with our goal of temporal integration of the sources. In
addition, modeling the quality of the data through several variables will improve the robustness of
the proposed query and data synopsis interfaces for handling uncertainty in Task I1.3.



For visualization and sensemaking of the obtained data, we will link up the output of the
distributed camera network to compute servers feeding the UCSB Allosphere, our three-story
immersive visualization chamber. Our plan is to fit feeds from static and mobile cameras to
aerial photographs, surround-view panoramas and, if available, 3D reconstructions (acquired and
constructed off-line) and to generate a live navigable overview of the information gathered.

(2) Modeling and reducing the uncertainty in information networks is critical in practical
problems. In essence, the data uncertainty in these information networks usually comes from
noise caused by incorrect labeling of linkages due to human error or subjectivity
[Weinberger2008]. In particular, in collaborative annotation tasks, people jointly label the
linkages between cross-media data to produce training examples. They annotate them based on
individual subjectivity, which may vary widely among people due to their different
educational and cultural backgrounds, knowledge, and life experiences. To reduce the uncertainty
of their annotations, some quality control procedures are required. Redundant information in
annotations from different users can be used to improve annotation quality. In
collaborative annotation, each linkage is often annotated by several users simultaneously. By
mining the redundancy among these simultaneous annotations, the uncertainty can be reduced.
A direct way is to vote among the annotations. For example, game-based human
annotation systems have been developed for collaboratively annotating cross-media linkages
[Ho2009]. These systems use Games With A Purpose (GWAP) [Ahn2006] to attract users to actively
find consensual annotations in the game so that annotation quality can be guaranteed. Such
systems have successfully generated large amounts of high-quality linkage annotations
through the information network.
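
As a small illustration of the voting idea (a generic majority-vote aggregator, not the GWAP systems cited above), the sketch below combines redundant annotations of the same linkage and reports the vote share as a crude confidence; weighting annotators by reliability would be a natural extension.

    from collections import Counter

    # Hedged sketch: reduce annotation uncertainty by majority vote over redundant labels.
    def aggregate_annotations(annotations):
        """annotations: dict mapping linkage id -> list of labels from different annotators."""
        consensus = {}
        for linkage, labels in annotations.items():
            counts = Counter(labels)
            label, votes = counts.most_common(1)[0]
            consensus[linkage] = (label, votes / len(labels))   # winning label and its vote share
        return consensus

    if __name__ == "__main__":
        toy = {"img7<->caption3": ["same-object", "same-object", "unrelated"],
               "img2<->doc9": ["depicts", "depicts", "depicts"]}
        print(aggregate_annotations(toy))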

Beyond leveraging redundant information to design collaborative annotation interfaces,
statistical models can be built to analyze the data structures underlying the redundancy. For
example, [Weinberger2008] proposed a probabilistic framework to resolve, with the help of human
effort, ambiguous tags that are likely to occur but appear in different contexts. In this
proposal, to reduce the noise-induced uncertainty in the linkage data, we propose to use a noise
minimization algorithm to recover the noise-free data using the redundant information in the cross-
media linkages between different modalities [Qi2010]. In detail, this approach assumes that in an
ideal case where no uncertainty exists, the noise-free linkages often have redundant structure, i.e.,
they are not independent but correlate with each other. This observation about the (linearly)
dependent structures in the linked data reveals that these linkages have a corresponding low-rank
structure if they are equivalently represented in matrix or tensor form. In our previous work
[Qi2010], we showed how this low-rank prior can be used to reduce the uncertainty in the cross-media
linkages between image and text data. The idea can be extended to environments with more
complicated information networks. For example, besides the cross-media linkages between
image and text data, more entities can be involved in the linked network, ranging
from web links and human users to the social structures associated with them. Due to
limitations of and errors in the data acquisition procedure, we cannot assume that the acquired
information is noise-free. By formulating the linked data in compact forms (e.g., matrices
and tensors) and applying a properly biased prior, the uncertainty can be substantially reduced,
and the outputs can be used for further data processing and analysis. Moreover, we note that
uncertainty reduction and further data analysis, such as recognition and prediction, can
be completed in a unifying framework. In [Qi2010], for example, concept indexing is integrated
with the reduction of noise-induced uncertainty by mining the cross-media linkages, which makes it
possible to unify the processing and analysis of linked networks, both theoretically and
technically.
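
A minimal sketch of the low-rank intuition follows. It uses a standard truncated SVD rather than the specific algorithm of [Qi2010], and the matrix sizes and noise level are arbitrary: noisy cross-media linkages are arranged as a matrix, and keeping only the top singular components suppresses noise when the true linkage structure is approximately low rank.

    import numpy as np

    # Hedged sketch: truncated-SVD denoising of a noisy linkage matrix,
    # assuming the noise-free linkages are approximately low rank.
    def low_rank_denoise(linkage_matrix, rank):
        u, s, vt = np.linalg.svd(linkage_matrix, full_matrices=False)
        s[rank:] = 0.0                       # discard small singular values (treated as noise)
        return (u * s) @ vt

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        clean = rng.random((20, 3)) @ rng.random((3, 15))    # rank-3 ground truth
        noisy = clean + 0.05 * rng.standard_normal(clean.shape)
        denoised = low_rank_denoise(noisy, rank=3)
        print(np.linalg.norm(noisy - clean), np.linalg.norm(denoised - clean))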

Validation
Since the linkage-based approach involves two scenarios, corresponding to web-scale linked data and
the camera sensor network data, we will design validation methods for both scenarios. For
multiple data source fusion in the first scenario, we will build a collection from the World Wide
Web by harvesting large amounts of media relevant to an army-related domain (e.g., vehicles).
Specifically, the data will consist of text, images, and images with captions as well as other text
that links to the source documents. We will compare the performance of the algorithms to
unbiased human annotators. We will also compare our techniques to conventional methods
which do not use network-based linkage structure in the fusion process. This method will yield
qualitative and quantitative assessment for the capabilities of the proposed methods. For the
combined representation of visual and text data we will work closely with project I2 to identify
the opportunities of developing fusion and dissemination methods that are optimally beneficial to
the remainder of the information network. For the distributed camera network case, we will focus
on specific activity scenarios wherein the activities cannot be recognized from any single camera
but can be detected collectively across the network, and on optimal ways of fusing human
observer data with sensor network data. The data will include both indoor and outdoor activities,
and in both cases, analysis and reasoning need to happen across the network, and would present
challenges in fusing the information between the nodes in the network that may have non-
overlapping visual fields. We will develop protocols for comparing the automated analysis with
human interactive analysis. The PIs will also work with ARL to identify military relevant data
sets in order to test the effectiveness of these techniques.

Products
The research from this effort will result in the following: (1) Methods for data fusion of
heterogeneous data sources; (2) Data collected from the camera sensor effort; (3) Experimental
results validating the above techniques; and (4) Research reports and published papers describing
the aforementioned items.

Linkages to Other Projects


The visualization portions of this project link up directly to Project I2. In coordination with
project I2, we will work towards the most suitable visualizations/interfaces to present gathered
information in the Allosphere and on mobile platforms.
The context of network research creates several interesting linkages to social network research,
specifically projects S1.3 and S2.1 of SCNARC, which are concerned with uncovering the
structures of social networks and social networks of the adversaries respectively. By analyzing
occurrence patterns of individuals over time in the various media, our research can uncover
social links that are not explicitly defined within the formal social network. For example, if two
entities occur in images from the same collection as well as in spatio-temporally related videos
from the sensor network, this suggests that the two individuals are related to one another. The idea
is that such logical linkages across different media translate to links in the information network,
which can be utilized for data fusion. In this task, we will explore methods for determining
linkages in such scenarios. Our collaborators in this effort will be Alex Pentland (MIT) and
Ching-yung Lin (IBM).

7.1.8 Task I1.3 Modeling Uncertainty for Quality-of-Information Awareness in


Heterogeneous Information Network Sources (H. Ji, CUNY (INARC); C.
Aggarwal, IBM (INARC); D. Roth, UIUC (INARC); A. Singh, UCSB
(INARC))

Task Overview
The task studies the uncertainty which results from the fusion of different kinds of networked
data. The uncertainty is analyzed in the context of network linkage issues such as co-reference
between objects.

Task Motivation
The ability to handle heterogeneous data sources, many of which are unstructured—text or
images—in a networked environment provides unparalleled challenges and opportunities for
improved decision making. Data can be noisy, incorrect, or misleading. Unstructured data,
mostly text, is difficult to interpret. In a large, diverse, and interconnected system, it is difficult
to assure accuracy or even coherence among the data sources. The fusion process may itself
cause data uncertainties, which may be challenging both from a modeling and usage perspective.
Our underlying premise is that uncertainties in these scenarios must be explicitly accounted for
to achieve truly significant advances in network sciences. The need to handle such uncertainties
in the context of the structural framework of a massive information network, and support
querying, search, retrieval, mining methods over such new structures is a highly critical
requirement. Our vision is to build on these techniques to develop a theory of integrating
information from various sources into an information network, and further to model the use of all
available information from the information network. The goal of our proposed research is to
study both how to learn good models from different information network sources with different
kinds of associated uncertainty, and how to make use of these, along with their level of
uncertainty in supporting coherent decisions, taking into account characteristics of the data as
well as of its source. We propose a model centered on extended joint inference over learned,
discriminative or generative models over the entire information network, where the level of
uncertainty is represented explicitly, and inference is done within a constrained optimization
framework. This allows us to generate multiple models for information network data sources,
take into account their level and type of uncertainty, and combine and propagate coherent
decisions that respect domain- and task-specific constraints. Such an approach is particularly
well suited to the information network domain.

Key Research Questions


How to model the uncertainty resulting from fusion of different kinds of networked data? How to
resolve linkage issues such as co-reference? How to reduce the uncertainty resulting from fusing
massive volumes of uncertain data from multiple networked sources?

Hypothesis



We hypothesize that a constrained optimization framework, which allows us to generate multiple
models for information network data sources, takes into account their level and type of
uncertainty, and combines and propagates coherent decisions that respect domain- and task-
specific constraints, is more appropriate than an approach based purely upon
confidence estimation metrics.

Prior Work
The field of information theory provides fundamental measures to quantify and explicitly
account for uncertainties in data and provides the corresponding sound means to formulate
performance criteria for optimal algorithm development, as well as overall performance bounds
[Cover1991]. The database community has also studied the concept of uncertainty and its
representation in the form of probabilistic databases [Agg2009]. Multiple systems have
addressed the problem in traditional databases in some context [Corm2009,Jag2008, Jag2004,
Dalvi2007, Suc2004, Jam2008, Benj2006, Sen2007].

Proposed Work
We will conduct research on the design of statistical or machine learning approaches for
determining correctness of information extraction output. We will adapt the node centrality
problem in graph theory to our global confidence estimation research. Our basic underlying
hypothesis is that the salience of an entity should be calculated by taking into consideration both
its confidence and the confidence of other entities connected to it, which is inspired by PageRank
and LexRank. This implies that the confidence of an entity should be calculated by examining
the entire information network surrounding that particular node. In this way we intend to explore
more than each individual co-reference or relation link, and also analyze the entities that cast the
vote. For example, a vote by linked entities that are themselves highly voted on by other entities
is more valuable than a vote from unlinked entities. This is the essence of the PageRank paradigm.
These methods will also be integrated into the social/cognitive network
paradigm in which users may provide feedback during search and monitoring.
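
For illustration, the sketch below runs a plain PageRank-style power iteration over a toy entity graph; the real formulation would additionally fold in extraction confidence and link types, so the graph, damping factor, and iteration count here are placeholders.

    # Hedged sketch: PageRank-style propagation of confidence over an entity graph.
    # Graph, damping factor, and iteration count are illustrative placeholders.
    def propagate_confidence(links, damping=0.85, iters=50):
        """links: dict mapping entity -> list of entities it 'votes' for (co-reference/relation links)."""
        nodes = set(links) | {t for ts in links.values() for t in ts}
        score = {n: 1.0 / len(nodes) for n in nodes}
        for _ in range(iters):
            new = {n: (1.0 - damping) / len(nodes) for n in nodes}
            for src, targets in links.items():
                if targets:
                    share = damping * score[src] / len(targets)
                    for t in targets:
                        new[t] += share
            score = new
        return score

    if __name__ == "__main__":
        toy = {"e1": ["e2", "e3"], "e2": ["e3"], "e3": ["e1"], "e4": ["e3"]}
        print(propagate_confidence(toy))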

In addition, text and image processing methods are typically organized as a pipeline architecture
of processing stages (e.g. from pattern recognition, to information fusion, and to summarization).
Each of these stages has been studied separately and quite intensively over the past decade.
There has clearly been a great deal of progress on some of these components. However, the
output of each stage is chosen locally and passed to the next step, and there is no feedback from
later stages to earlier ones. Although this makes the systems comparatively easy to assemble, it
comes at a high price: errors accumulate as information progresses through the pipeline, and an
error once made cannot be corrected. There is little work on using logic to model the
interpretation of facts. Classical logical inference, however, is unable to deal with the
combinations of disparate, conflicting, uncertain evidence that shape such events in discourse.
We intend to move away from approaches that make chains of independent local decisions, and
instead toward methods that make multiple decisions jointly using global information. We
propose to address this by combining logical inference with probabilistic methods. We will focus
on extraction of facts with the following property: they are neither yes nor no, but they convey
information that can be used to infer such a fact with some degree of confidence, though often
not with enough confidence to count as resolving. We will develop techniques for improving the
extraction performance of multi-media data by explicitly modeling the errors in the extraction
output. Therefore we plan to transform the integration of text and image data sources into a
benefit by using the joint inference between them to reduce the errors in individual stages. In
doing so, we will take advantage (among other properties) of the coherence of a discourse: that a
correct analysis of a text discourse reveals a large number of connections from the image
information in its context, and so (in general) a more tightly connected analysis is more likely to
be correct. More specifically, we shall apply supervised re-ranking techniques, including the p-
norm push ranking algorithm introduced in our prior work (Ji et al., 2006a), to enhance the
performance of extraction components based on generative models.

We shall further extend the aforementioned techniques to the integration of a large number of
inaccurate and possibly inconsistent physical sensing sources. Physical signals obey physical
laws of nature that are given by known models (of unknown parameters). When multiple sensors
observe overlapping phenomena, these physical models present constraints on possible relations
between correct data values. An iterative algorithm can then be executed where the collected
sensor values of different degrees of confidence help quantify the parameters of a physical
model, which then in turn helps quantify the confidence in individual data sources. By estimating
the veracity of the individual sources in view of both known physical data models and estimated
linkages between data flows, the approach can further be used to reconfigure the data collection
system as described in Task I1.1 to improve quality of information.
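
A hedged sketch of such an iterative scheme appears below. It is a simple truth-discovery loop rather than the physical-model-constrained algorithm itself: the consensus value and per-source reliability weights are re-estimated in alternation, so sources that persistently disagree with the consensus are down-weighted. The readings and convergence settings are placeholders.

    # Hedged sketch: alternately estimate a consensus value and per-source reliability weights.
    def truth_discovery(readings, iters=20, eps=1e-6):
        """readings: dict mapping source id -> observed scalar value of the same quantity."""
        weights = {s: 1.0 for s in readings}
        estimate = sum(readings.values()) / len(readings)
        for _ in range(iters):
            # Re-estimate the consensus as a weighted mean.
            total = sum(weights.values())
            estimate = sum(weights[s] * v for s, v in readings.items()) / total
            # Re-estimate reliability: sources far from the consensus get lower weight.
            weights = {s: 1.0 / (eps + (v - estimate) ** 2) for s, v in readings.items()}
        return estimate, weights

    if __name__ == "__main__":
        obs = {"sensor-a": 10.1, "sensor-b": 9.9, "sensor-c": 14.5}   # sensor-c is an outlier
        est, w = truth_discovery(obs)
        print(round(est, 2), {s: round(x, 3) for s, x in w.items()})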

As the volumes of uncertain data increase, users are forced to become more reliant on data
exploration techniques to identify the interesting portions of their data. Unfortunately, query
processing under the intuitive all-possible-worlds semantics in these systems has proven to be a
computationally difficult task and is shown to be #P-complete in the general case [Suc2004].
Due to this limitation, accurately approximating query results (and their associated probabilities)
is an important problem. We aim to design methods for building a compact data synopsis
capable of providing approximate answers to simple count queries over probabilistic databases.
This will provide a mechanism which allows users to quickly explore large uncertain datasets by
circumventing the standard query processing engine. In practice, it is often the case that there
are multiple tables which need to be compressed and only a global space budget is provided. A
simple approach would be to distribute available space evenly across all of the tables. However,
this can result in wasted resources as it may be possible to represent some data accurately with
less space than others. Additionally, the goal in this scenario should be to minimize error in a
global manner, over the set of all representations, not just locally for each dataset.

In the proposed work, we address this problem and provide an optimal solution which is
independent of the method used to summarize each individual data source. Specifically, given a
global space budget, B, and a set of data sources, S, we compute an optimal space allocation in
which the global L2 error is minimized. We will consider two algorithms to compute an optimal
space allocation. The first is a dynamic programming approach which works without restriction.
The second is a local update method which is faster, but requires that the error function of the
approximation method is strictly decreasing with respect to the amount of space allocated to a
signal.
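
The following sketch illustrates the dynamic programming variant, under the assumption that the (squared) L2 error of each source's best synopsis has already been pre-computed for every candidate space value; the function name and the example error curves below are hypothetical:

def allocate_space(err, B):
    """Dynamic-programming sketch of the global space-allocation problem.

    err[i][b] : pre-computed squared L2 error of the best synopsis of source i using b space units
    B         : global space budget (integer number of units)
    Returns (minimal total squared error, list of per-source allocations).
    Minimizing the sum of squared per-source errors also minimizes the global L2 error.
    """
    n = len(err)
    INF = float("inf")
    dp = [[INF] * (B + 1) for _ in range(n + 1)]
    choice = [[0] * (B + 1) for _ in range(n + 1)]
    dp[0] = [0.0] * (B + 1)                      # no sources: no error
    for i in range(1, n + 1):
        for b in range(B + 1):
            for a in range(min(b, len(err[i - 1]) - 1) + 1):   # give source i-1 'a' units
                cand = dp[i - 1][b - a] + err[i - 1][a]
                if cand < dp[i][b]:
                    dp[i][b], choice[i][b] = cand, a
    # trace back the optimal allocation
    alloc, b = [0] * n, B
    for i in range(n, 0, -1):
        alloc[i - 1] = choice[i][b]
        b -= choice[i][b]
    return dp[n][B], alloc

# Example with three sources whose error curves decrease at different rates (hypothetical numbers):
# sources 0 and 2 benefit from extra space, source 1 barely does, so it receives little.
err = [[9.0, 4.0, 1.0, 0.5], [9.0, 8.5, 8.0, 7.9], [9.0, 2.0, 0.2, 0.1]]
print(allocate_space(err, 6))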

For tuples with large domains, the typical techniques for representing distributions (i.e.,
histograms) become costly to compute and may induce large errors. Additionally, although



sometimes the data collected may only provide discrete data points, there is often an implicit
smoothness in the distribution which one would like to model. Although current systems may be
able to handle continuous data, most do so by discretizing it. However, if we can take advantage
of this continuity and smoothness, we may be able to represent these distributions more
compactly and compute the probability mass in a given range more efficiently. At a more
specific level, our basic problem is as follows. Given a collection of probabilistic attribute values
and a space budget B, construct a synopsis which minimizes the maximum error and allows for
quick selection and range query frequency distribution estimation with error bounds. We will
develop algorithms which, given a space budget and a set of signals to be approximated,
allocate space to each signal in order to minimize the global L∞ error. In our technique, we will
apply Chebyshev polynomials to the problem of compactly summarizing multiple probability
distributions. We choose Chebyshev polynomials because they are simple to compute, accurate,
closely connected to the minimax polynomial of any function, and easy to
integrate analytically. We will provide upper and lower bounds on the economization of a
function represented in Chebyshev space. This result is also applied to show that we can cluster
in Chebyshev space and still minimize the error in the signal space.
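
As a concrete illustration of the intended synopsis, the sketch below fits a Chebyshev series to a hypothetical smooth density, economizes it to a coefficient budget while retaining a bound on the truncation error, and answers a range-probability query by analytic integration; it is a simplified sketch of the idea, not the final algorithm:

import numpy as np
from numpy.polynomial import chebyshev as C

def chebyshev_synopsis(pdf, lo, hi, degree=16, budget=8):
    """Summarize a continuous attribute distribution with a few Chebyshev coefficients
    and keep a bound on the truncation (economization) error (illustrative sketch)."""
    # Sample the density at Chebyshev nodes of the interval and fit a Chebyshev series.
    k = np.arange(degree + 1)
    nodes = np.cos(np.pi * (k + 0.5) / (degree + 1))          # nodes on [-1, 1]
    x = 0.5 * (hi - lo) * (nodes + 1.0) + lo
    coeffs = C.chebfit(nodes, pdf(x), degree)
    # Economize: keep only `budget` coefficients; since |T_k| <= 1, the sum of the
    # absolute values of the dropped coefficients bounds the truncation error.
    kept, dropped = coeffs[:budget], coeffs[budget:]
    return kept, np.abs(dropped).sum()

def range_probability(kept, lo, hi, a, b):
    """Estimate P(a <= X <= b) by integrating the stored Chebyshev series analytically."""
    antideriv = C.chebint(kept)
    to_unit = lambda t: 2.0 * (t - lo) / (hi - lo) - 1.0
    scale = 0.5 * (hi - lo)                                    # dx/dt on the unit interval
    return scale * (C.chebval(to_unit(b), antideriv) - C.chebval(to_unit(a), antideriv))

# Example with a hypothetical Gaussian-like density restricted to [0, 10].
pdf = lambda x: np.exp(-0.5 * ((x - 4.0) / 1.5) ** 2) / (1.5 * np.sqrt(2 * np.pi))
kept, bound = chebyshev_synopsis(pdf, 0.0, 10.0, degree=16, budget=8)
print(range_probability(kept, 0.0, 10.0, 3.0, 5.0), bound)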

In the first year, we will develop techniques for summarizing the uncertainty in a single
component of a collection of objects. Beyond the initial year, we will extend these techniques to
summarizing the entirety of the collection, possibly with uncertainty in multiple components. We
will also develop methods that compute these summaries in an online manner. Thus, we
will be able to take any dynamic collection of uncertain objects and summarize
it in a compact representation.

Validation
We will validate our techniques on a variety of real data sets that can be used to simulate
information networks and other forms of uncertain data associated with them. We will compare
the techniques proposed in this task against more conventional methods and assess whether our
approaches are in fact superior. We will test our methods on the following data sets.

1. NIST Automatic Content Extraction Program's 2002-2009 training corpora: These include
thousands of documents with facts annotated (entities, relations, and events) in different
genres (broadcast conversation, broadcast news, newswire, newsgroups, weblogs, etc.).
We will use them to validate the confidence estimation methods in joint
inference modeling for the information network domain, and model the conflicts and
consistency of different facts in conjunction with the links and the structure of the overall
information network.
2. University of California at Irvine (UCI) Repository: This includes hundreds of data sets from a
variety of domains, including noisy and linked data. We will also design synthetic
techniques to introduce uncertainty into these data sets and test the corresponding algorithms
on them.
3. Terrorism Database (http://wits.nctc.gov/Export.do): We will use this data to validate our
progress on information extraction and the various co-reference resolution aspects.
4. Military Relevant Data Sets: We will work with ARL to explore the possibility of
generating synthetic data sets which are relevant to military scenarios.



Products:

The research from this effort will result in the following: (1) models for uncertainty estimation
from data fusion; (2) methods for processing the data obtained from different sources and
reducing its uncertainty; (3) experimental results validating the above; and (4) research reports or
published papers which describe the aforementioned items.

Linkages to other Projects:

The research from this effort will allow us to summarize a collection of uncertain objects in a
compact representation. These compact representations can be used to develop models of
uncertain data for querying and mining (Project I2), and for assessing the QoI of data. The
framework for characterizing and reducing uncertainty will be essential for understanding QoI
and trust. When there are conflicting data sources, the algorithms which have been developed in
this task allow us to estimate a confidence measure for the extracted information. In the Trust
CCRI, we will show how we will use the models developed from our information extraction
algorithms to develop a more comprehensive theory of distributed trust over the entire pipeline
of information processing.

7.1.9 Linkages Table to Other Projects/Centers

IPP Tasks Linkages


I1.1 ↔ C1.2   The QoI models can be developed only by taking communication network issues into account. Joint work is required for modeling (Govindan).
I1.2 ↔ S1.3   The linkages between similar entities present in network sensors can be used to determine the structure and evolution of networks (Pentland, C.-Y. Lin).
I1.2 ↔ S2.1   Spatio-temporal analysis of entities present in multimedia feeds to determine hidden connections (Pentland, C.-Y. Lin).
I1.3 ↔ T1     How does fusion QoI affect trust? (Collaborator: Mudhakar Srivatsa)

7.1.10 Collaborations and Staff Rotations


In the first year, this project will build closer linkages to SCNARC and CNARC, as suggested by
the linkage table above. We will also collaborate with the trust project in order to transition our
results on uncertainty into an understanding of their impact on trust. The research scientist
provided by IBM to NS CTA will also collaborate with this project in order to analyze the effects
of lower level uncertainty on trust.

7.1.11 Relevance to US Military Visions/Network Science


Since information networks contain objects which are essentially networked entities connected
by links, it follows that one of the key parts of constructing an information network is to be able
to perform the lower level processing required in order to derive the objects, the virtual links
between them, and the inferences which are implicit in such links. This task essentially performs



both the lower level processing as well as the fusion necessary in order to enable this. From a
military perspective, this will be a key part in being able to build the information network which
is needed in order to make higher level decisions. For example, sensor data which is collected in
the field needs to be logically linked with other data sources in order to create the information
network. As other projects (I2 and I3) will show, such an information network can be leveraged
for a variety of querying and retrieval tasks. The different tasks in this project focus on the
process of deriving these linkages as well as the derivation of inferences from a combination of
the different kinds of available data.

7.1.12 Relation to DoD and Industry Research


The field of information networks is relatively new, and the linkage structure inherent in
information networks provides new ideas and challenges which are not present in other research.
While much research has been performed in the area of data fusion, the context of information
networks provides a different perspective in which the fusion is performed in the context of the
linkages of the underlying network structure. The ability to use different sources in one
integrated (logically) networked view is a vision which distinguishes this program from other
DoD research. From an industrial perspective, there is some research on linked entities such as
the web and social networks; however information networks provide newer and much richer
abstraction which can be leveraged during information extraction and fusion. This work studies
such networked fusion problems on a massive scale; challenges which have not been discussed
in previous work.

References

[Abdel2004] T. Abdelzaher, et al. Envirotrack: Towards an environmental computing paradigm
for distributed sensor networks. ICDCS Conference, 2004.
[Agg2007] C. C. Aggarwal. Data Streams: Models and Algorithms. Springer, 2007.
[Agg2009] C. C. Aggarwal. Managing and Mining Uncertain Data. Springer, 2009.
[Ahn2006] Luis von Ahn. Games with a Purpose. IEEE Computer Magazine, 39(6):92-94, 2006.
[Cao2008] L. Cao, J. Luo, T. S. Huang, "Annotating Photo Collections by Label Propagation
According to Multiple Similarity Cues", ACM Conf. Multimedia (ACM-MM), 2008.
[Cao2009A] L. Cao, J. Luo, F. Liang, and T. S. Huang, "Heterogeneous Feature Machines for
Visual Recognition", IEEE International Conf. Computer Vision (ICCV), 2009.
[Cao2009B] L. Cao, J. Yu, J. Luo, and T. S. Huang, "Enhancing Semantic and Geographic
Annotation of Web Images via Logistic Canonical Correlation Regression", ACM Conf. on
Multimedia (ACM-MM), 2009.
[Cao2010] L. Cao, A. Del Pozo, X. Jin, J. Luo, J. Han, and T. S. Huang, "RankCompete:
Simultaneous Ranking and Clustering of Web Photos", submitted to International Conference on
World Wide Web (WWW), 2010.
[Corm2009] Graham Cormode and Minos Garofalakis. Histograms and wavelets on
probabilistic data. In ICDE, 2009.
[Cover1991] T. M. Cover and J. A. Thomas. Elements of Information Theory. Wiley, 1991.



[Dalvi2007] Nilesh Dalvi, Christopher Re, and Dan Suciu. Efficient top-k query evaluation on
probabilistic data. In Proceedings of ICDE, 2007.
[Dow2005] Doug Downey, Oren Etzioni, and Stephen Soderland. A probabilistic model of
redundancy in information extraction. In Proc. International Joint Conferences on Artificial
Intelligence (IJCAI 2005), 2005.
[Ho2009] Chien-Ju Ho, Tao-Hsuan Chang, Jong-Chuan Lee, Jane Yung-jen Hsu and Kuan-Ta
Chen. KissKissBan: a Competitive Human Computation Game for Image Annotation.
Proceedings of the ACM SIGKDD Workshop on Human Computation, Paris, France, 2009.
[Jag2004] H. V. Jagadish, Raymond T. Ng, Beng Chin Ooi, and Anthony K. H. Tung.
Itcompress: An iterative semantic compression algorithm. In the 20th International Conference
on Data Engineering, page 646, 2004.
[Jag2008] H. V. Jagadish, Jason Madar, and Raymond Ng. Semantic compression and pattern
extraction with fascicles. In VLDB, pages 186–197, 1999.
[Jam2008] R. Jampani, F. Xu, M. Wu, L.L. Perez, C. Jermaine, and P.J. Haas. Mcdb: a monte
carlo approach to managing uncertain data. In SIGMOD, 2008.
[Ji2008] Heng Ji and Ralph Grishman. Refining event extraction through cross-document
inference. In Proc. Annual Meeting of the Association for Computational Linguistics (ACL
2008), 2008.
[Li2004] S. Li, S. Son, and J. Stankovic. Event detection services using data service middleware
in distributed sensor networks. Telecommunication Systems, 26:351–368, 2004.
[Liu2003A] Jie Liu, Maurice Chu, Juan Liu, James Reich, and Feng Zhao. State-centric
programming for sensor-actuator network systems. IEEE Pervasive Computing, 2(4):50–62,
2003.
[Liu2003B] Juan Liu, Jie Liu, James Reich, Patrick Cheung, and Feng Zhao. Distributed group
management for track initiation and maintenance in target localization applications. In Proc. of
2nd workshop on Information Processing in Sensor Networks (IPSN), pages 113–128, 2003.
[Luo2006] Liqian Luo, Tarek F. Abdelzaher, Tian He, and John A. Stankovic. Envirosuite: An
environmentally immersive programming framework for sensor networks. Trans. on Embedded
Computing Sys., 5(3):543–576, 2006.
[Mad2005] Samuel R. Madden, Michael J. Franklin, Joseph M. Hellerstein, and Wei Hong.
Tinydb: an acquisitional query processing system for sensor networks. ACM Trans. Database
Syst. 30(1):122–173, 2005.
[Qi2010] Guo-Jun Qi, Yong Rui, and Thomas S. Huang. Low-Rank Sparse Transfer Indexing
by Mining Community-Contributed User Tags. Submitted to International Conference on
Knowledge Discovery and Data Mining, 2010.
[Weinberger2008] K. Weinberger, M. Slaney and R.V. Zwol. Resolving Tag Ambiguity.
Proceedings of International ACM Conference on Multimedia, 2008.
[Wu2004] Yi Wu, Edward Y. Chang, Kevin Chen-Chuan Chang, and John R. Smith. Optimal
multimodal fusion for multimedia data analysis. 12th annual ACM international conference on
Multimedia, 572-579, 2004.



Project Research Milestones

Due  Task    Description

Q2   Task 1  Formulation of the problem of bounded-error object representation and summarization for fusing or compressing multiple distributed streams with logical information network links; stream optimization for bounded-error representation (UIUC, IBM, CUNY)

Q2   Task 2  Formulate the problem of associating visual data with key concepts in the information network. Data sets to be used include web data and data from a camera sensor network, including mobile cameras and human observer data (UIUC, UCSB, CUNY)

Q2   Task 3  Formulate the problem of using node centrality and correlation analysis on information network sources to increase information gain (CUNY, IBM). Develop Chebyshev approximation and clustering techniques for uncertain data tuples from multiple sources. Begin evaluation of confidence estimation methods for IE and co-reference resolution (UCSB, UIUC, CUNY)

Q3   Task 1  Algorithms for fusing multiple sensor streams with information network linkages into a single representation. End-to-end latency analysis. Analysis of methods for sensor data reduction (UIUC, CUNY, IBM)

Q3   Task 2  Develop inference algorithms and construct graphical representations in the context of an information network scenario of ON and distributed camera network data (UIUC, UCSB, CUNY)

Q3   Task 3  Design techniques to use node centrality and correlation analysis on information network sources to increase information gain (CUNY, IBM). Experiment with uncertain approximations for count and join queries and with joint inference methods, and characterize quality and performance. Study a framework for error estimation and bounds within a joint inference framework (UCSB, UIUC, CUNY)

Q4   Task 1  Multi-modal information value optimization subject to time and resource constraints. Explore the utility of the fused and compressed data from an information network perspective (UIUC, CUNY, IBM)

Q4   Task 2  Test the effectiveness of the algorithms developed in the context of a practical information network scenario for web and network data. Work with I3 to design methods for recognizing specific spatio-temporal entities, based on a collaboratively defined common scenario (UIUC, UCSB, CUNY)

Q4   Task 3  Determine methods and results for uncertain data approximation. Develop error guarantees and improve joint inference methods (UCSB, CUNY, UIUC)

Budget By Organization

Organization Government Funding ($) Cost Share ($)


CUNY (INARC) 101,193
IBM (INARC) 118,761
UCSB (INARC) 111,873
UIUC (INARC) 208,536
TOTAL 540,363

7.2 Project I2: Scalable, Human-Centric Information Network Systems

Project Lead: X. Yan, UCSB


Email: xyan@cs.ucsb.edu, Phone: (805) 699-6018

Primary Research Staff Collaborators


C. Aggarwal, IBM (INARC) G. Cao, PSU (CNARC)
J. Han, UIUC (INARC) W. Gray, RPI (SCNARC)
T. Höllerer, UCSB (INARC) J. Hendler, RPI (IRC)
P. Pirolli, PARC (INARC) C. Lin, IBM (SCNARC)
A. Singh, UCSB (INARC)
X. Yan, UCSB (INARC)



7.2.1 Project Summary/Research Issues Addressed

Project Motivation and Overview


An information network is a logical network of data, information, and knowledge objects that are
acquired and extracted from disparate sources such as geographical maps, satellite images,
sensors, text, audio, video etc., through devices ranging from hand-held GPS to high-
performance supercomputers. A sophisticated information network infrastructure should present
a human-centric, simple and intuitive interface that automatically scales according to the context,
information needs, and cognitive state of users, and maintain its integrity under uncertainties,
physical constraints (communication capacity and topology, power limitation, device
computation capability, etc.), and the evolving data underneath.
The situations and available resources differ dramatically for various military units in a full
combat mission – a field soldier has a different information view from a battalion commander.
Global headquarters, on the other hand, need to maintain a holistic view of the situation,
including views of all available assets, mission status, threats, and what-if scenarios. We will
consider the scalability of situation awareness in the design of information network systems so
that information and knowledge can be delivered in a timely and appropriate manner to our soldiers and
commanders. In this project, we aim to achieve two major research goals:
1. Organize and manage large heterogeneous information networks for scalable information
search and efficient knowledge discovery,
2. Design and evaluate a human-centric, intuitive, scalable interface that automatically takes
into account the situation and cognitive states of users, as well as physical constraints.

7.2.2 Key Research Questions


(1) How to organize, manage, and query information networks in a scalable manner
(2) How to analyze and visualize information networks to provide situation awareness to
end-users

7.2.3 Technical Approach


The above key research questions will take multiple years of research efforts for concept
formulations and innovative solutions. For the first year, we propose the following three tasks in
the initial program plan (IPP):
(Task I2.1) Information Network Organization and Management (Yan and Singh): This
project is to answer the key research question: ―How to develop a network query
language that enables flexible network information access and how to index information
networks for scalable access?” We will study appropriate data models for information
networks, investigate stochastic information dissemination and study graph indexing and
query languages for managing information networks with time and text.
(Task I2.2) Information Network Online Analytical Processing (Yan): The key research
question in this project is “How to perform multi-dimensional analysis that allows a
non-expert to explore information networks in real time? What are new fundamental
operators that could facilitate network-wise graph modeling and analytics?” We will



design a multi-dimensional OLAP framework for information network analysis, study
new path-based similarity ranking of information objects, and develop graph aggregation
and association operators in information networks.
(Task I2.3) Information Network Visualization (Höllerer): The key research question in this
project is "How to best visualize information networks for Army personnel in different
situations? What are the principles behind effective information network dissemination
on various interaction platforms and how is situation awareness achieved?” We will
collect representative sets of meaningful data and task scenarios, investigate scalable
graph visualizations, design a framework to capture and communicate information
network content tailored to a variety of users in a variety of context situations, and study
visualization techniques that work well on different platforms, from mobile to large-
scale situation rooms, studying trade-offs in these situations.

7.2.4 Task I2.1 Information Network Organization and Management (X. Yan, A.
Singh, UCSB (INARC); C. Aggarwal, IBM (INARC); G. Cao, PSU
(CNARC); J. Hendler, RPI (IRC))

Task Overview
An information network is a conceptual representation of not only the data which is explicitly
stored on the military network of distributed repositories, but also the implicit knowledge present
in various human-intelligence sources and public domains. A given query, for which results are
desired in real-time, may need to draw on any of these explicit and implicit information sources
for the most appropriate resolution. This task will study the science and principles behind
different data models and management mechanisms for heterogeneous network information
access and query answering.

Task Motivation
Military domain data and information is often widely spread across a disparate collection of data
sources. It is important to recover implicit links between nodes not only in one information
network, but also from multiple heterogeneous information networks. In addition, physical
constraints and human factors will significantly influence information networks. In network
centric military operations, for example, battlefield soldiers need a simple, intuitive query
interface for fast information access, while headquarters agents might rely on complicated
analytical tools and powerful machines to discover knowledge hidden deeply in an information
network. These differences pose great challenges for studying information network models,
languages and systems that allow situation-aware network information access.

Key Research Questions


How to develop a network language that enables flexible network information access?
How to index large graphs in order to support scalable access methods?
How to organize information for fast dissemination?



Initial Hypothesis
Our information network management system consists of three novel network-centric
components: graph query language, graph index, and information dissemination strategy. These
components will facilitate network information access that is not well supported by the existing
relational database techniques in prior work. The system could also be used as a platform to
answer network queries related to QoI.

Prior Work
The Resource Description Framework (RDF) [Manola2004], e.g., used in Linked Data
[Bizer2008], has an abstract syntax that could represent network data. However, there is a lack
of formal semantics that can capture two ubiquitous issues present in most military data
sources: uncertainty and trust. In this project, we are going to study how to augment/design new
data models with statistical measures that are suitable for information networks. There have
been many studies on data cleaning, information integration, and trustworthiness analysis, e.g.,
[Dasu2003, Raman2001, Bhattacharya2004, Culotta2007, Han2004, Udrea2007] to fuse data
together. However, they do not consider managing data from a network science point of view.
There is a lack of languages and systems to organize and manage network information.

Existing data models, query languages, and database systems do not offer adequate support for
the modeling, management, and querying of graph data. There are a number of reasons for
developing native graph-based data management systems. Consider expressiveness of queries:
we need query languages that manipulate graphs in their full generality. This means the ability
to define constraints (graph-structural and value) on nodes and edges not in an iterative one-
node-at-a-time manner but simultaneously on the entire object of interest. This also means the
ability to return a graph (or a set of graphs) as the result and not just a set of nodes. Another
need for native graph databases is prompted by efficiency considerations. There are heuristics
and indexing techniques that can be applied only if we operate in the domain of graphs.

A number of query languages have been proposed for graphs [Consens90, Guting94, Leser05,
Sheng99]. GraphQL is different from these languages in that graphs are taken as the basic units
in a fundamental way. Some of the recent interest in graph query languages has been spurred by
the Semantic Web and the accompanying SPARQL query language. This language works
primarily through a pattern which is a constraint on a single node. All possible matchings of the
pattern are returned from the graph database. A general graph query language should be more
powerful by providing primitives for expressing constraints on the entire result graph
simultaneously. Graph grammars have been used previously for modeling visual languages and
graph transformations in various domains. Our work is different in that our emphasis has been
on a query language and database implementations. Furthermore, GraphQL can be efficiently
implemented using graph specific optimizations.

Proposed Approaches
We will study the science behind different data models for appropriate information network
representation. In order to resolve queries in a comprehensive, accurate, and timely manner, we
propose to investigate innovative techniques for information fusion to link entities from



disparate data sources together, and meta-information organization to map the different
categories of queries onto information sources properly. To support fast and flexible access to
information networks, we will develop an operational network which includes graph query
language, a means to index the information network, and a means to determine how to
disseminate information along logical links to its destination.
Information network management needs a query language to help users access interconnected
information. Abstractly, a graph query takes a graph pattern as input, retrieves graphs from the
database which contain (or are similar to) the query pattern, and returns the retrieved graphs or
new graphs composed from the retrieved graphs. Examples of graph queries can be found in
various domains: 1) find all occurrences of information exchange that follow a given branching
structure; 2) find all occurrences of information flow in which an event of a given type (or
occurring in a specified spatial region) led to an event of another type. An effective network
query language presents a simple and unified interface to all the graph related managing and
query requests across information networks. The same set of language primitives can also be
applied to communication networks, social networks, and the cross-network links between them.
In the first year, we will adapt the GraphQL query language [He2008] to the integrated collection
of three different networks and consider the temporal dimension. We will define the appropriate
sequence of operators for querying dynamic graphs. This will allow the specification of
temporally evolving patterns and dynamic causal patterns in graphs. Our initial approach will be
to extend the primitives of GraphQL with the dimension of time. For such a language to be
useful, it also has to be supported by an efficient implementation. In the second part of the thrust,
we will consider implementations of the language primitives. Techniques for reducing the
overall search space and optimizing the search order will be developed. As in relational
databases, we seek a logical separation between a query language and its implementation.
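
The sketch below is not GraphQL; it merely illustrates, on a toy networkx graph, the kind of time-extended pattern primitive we intend to add, namely a chain of typed nodes whose connecting events occur in increasing temporal order within a bounded gap. All names, node types, and attributes in the example are our own illustrative choices:

import networkx as nx

def find_temporal_chains(G, type_chain, max_gap):
    """Find node chains v0 -> v1 -> ... whose node types follow `type_chain` and whose
    edge timestamps strictly increase, with consecutive events at most `max_gap` apart.
    G is a MultiDiGraph with node attribute 'type' and edge attribute 'time'. (Sketch.)"""
    def extend(chain, last_time):
        depth = len(chain)
        if depth == len(type_chain):
            yield list(chain)
            return
        for _, v, data in G.out_edges(chain[-1], data=True):
            t = data["time"]
            if (G.nodes[v].get("type") == type_chain[depth]
                    and (last_time is None or 0 < t - last_time <= max_gap)):
                yield from extend(chain + [v], t)

    for v, data in G.nodes(data=True):
        if data.get("type") == type_chain[0]:
            yield from extend([v], None)

# Example: an "observation" followed by a "report" and then a "decision" within 5 time units.
G = nx.MultiDiGraph()
G.add_nodes_from([(1, {"type": "observation"}), (2, {"type": "report"}), (3, {"type": "decision"})])
G.add_edge(1, 2, time=10)
G.add_edge(2, 3, time=12)
print(list(find_temporal_chains(G, ["observation", "report", "decision"], max_gap=5)))  # [[1, 2, 3]]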

Information network management relies on a powerful indexing engine to speed up the


processing of graph queries and mining operations over complex networks. Information network
queries might access huge numbers of network links, exhibiting I/O patterns that are never
encountered in traditional query processing engines like those in RDBMS. Thus, we have to
resort to new indexing methodologies for fast graph access. In the first year, we will design and
implement innovative graph indexing to address the growing need to process graph queries in
large-scale information networks, e.g., find the aggregate value of an attribute for neighbors
within h-hops or find the link density in a neighborhood of a node. While these basic queries are
common in a wide range of information network search tasks, surprisingly they have not been
examined systematically in literature. Our preliminary study found two properties unique in
network space: First, the aggregate value for the neighboring nodes in a network should be
similar in most cases. Second, given the distribution of attribute values in nodes, it is possible to
estimate an upper bound on the aggregate values. These two properties point to novel
indexing techniques such as a differential index. In addition to graph indexing that accelerates the
execution of graph queries, we will also study how to leverage distributed computing
environments such as MapReduce [Dean2004] to shorten graph query process time.
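
As a simple baseline against which the proposed differential index can be compared, the following sketch pre-computes h-hop neighborhood aggregates by brute-force breadth-first search; the attribute and function names are illustrative only:

import networkx as nx

def build_hop_aggregate_index(G, attr, max_h):
    """For every node, pre-compute the sum of `attr` over all nodes within h hops,
    for h = 0 .. max_h, so that 'aggregate within h hops' queries become lookups. (Sketch.)"""
    index = {}
    for v in G:
        dist = nx.single_source_shortest_path_length(G, v, cutoff=max_h)
        sums = [0.0] * (max_h + 1)
        for u, d in dist.items():
            sums[d] += G.nodes[u].get(attr, 0.0)
        # prefix-sum over hop counts so index[v][h] covers the whole h-hop ball
        for h in range(1, max_h + 1):
            sums[h] += sums[h - 1]
        index[v] = sums
    return index

# Example: total "reliability" mass within 2 hops of each node (hypothetical attribute).
G = nx.path_graph(5)
nx.set_node_attributes(G, {i: float(i) for i in G}, "reliability")
idx = build_hop_aggregate_index(G, "reliability", max_h=2)
print(idx[2][2])   # nodes 0..4 are all within 2 hops of node 2 -> 10.0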

Information networks typically have information in different nodes which are logically linked
with each other. These logical links often directly lead to further dissemination. For example,
the links within blogs, social networks or information networks naturally lead to further
dissemination. A question naturally arises as to which nodes provide the best representatives at



which the information can be naturally transmitted to participants at other nodes. Clearly nodes
which have the best linkage structure are likely to be the best representatives; in other words, if
a node is linked to a lot of other informative nodes, it is also likely to be informative. Therefore,
the best authorities can be determined by only doing a probabilistic analysis of the structure of
the linkages and information transmission probabilities in the information network. In the first
year, we will investigate a stochastic information linkage analysis model, and use it to determine
the authoritative representatives in the information network. Specifically, we will study a Bayesian
probabilistic model in order to determine the most relevant set of information flow
representatives in the network. The idea in the Bayesian model is to do a backward estimation of
the probabilities starting with a network in which the information is disseminated as widely as
possible. This backward estimation will then converge on a small set of representatives with the
highest probability. Our techniques derive their motivation from PageRank like concepts
[Brin1998]. While PageRank attempts to determine the steady-state probabilities in a forward
traversal, our methods determine the transient starting points which result in the most uniform
information spread in a small number of steps. We plan to investigate the effectiveness of this
approach for determination of the authoritative information representatives.
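
For comparison purposes, the following sketch scores candidate representatives with a simple forward-propagation heuristic, namely the expected probability mass a seed pushes through the network within k steps; the Bayesian backward-estimation model described above is intended to replace this kind of heuristic, and the transmission matrix in the example is hypothetical:

import numpy as np

def seed_scores(P, k=3):
    """Score each candidate seed by the expected probability mass it delivers to the rest of
    the network within k steps, where P[i, j] is the one-step transmission probability from
    node i to node j. (Illustrative baseline, not the proposed Bayesian model.)"""
    n = P.shape[0]
    reach = np.zeros((n, n))
    step = np.eye(n)
    for _ in range(k):
        step = step @ P                      # probability mass arriving after one more step
        reach += step
    return reach.sum(axis=1)                 # total expected mass delivered by each seed

# Example on a small hypothetical 4-node information network.
P = np.array([[0.0, 0.6, 0.3, 0.0],
              [0.1, 0.0, 0.5, 0.2],
              [0.0, 0.2, 0.0, 0.7],
              [0.0, 0.0, 0.1, 0.0]])
scores = seed_scores(P, k=3)
print(scores, int(np.argmax(scores)))        # the highest-scoring node is the preferred representative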

Validation Approach
The design of the proposed information network management system will be validated on a set
of public and synthetic datasets. We will perform comparative studies to evaluate the
effectiveness and efficiency of data access, with and without the information network system.
The validation should demonstrate significant improvement over the existing relational database
techniques on accessing information networks. We will first test our system using the following
datasets.
1. DBLP Information Network: The DBLP graph is downloaded from www.informatik.uni-trier.de/~ley/db/.
There are 684,911 distinct authors and more than 1.3 million publications.
The publications include venues, years, areas, etc., thus forming a large special-topic
information network.
2. WebGraph: We downloaded the 9 GB uk-2007-05 web graph from http://webgraph.dsi.unimi.it/
[Boldi2004]. This web graph is a collection of UK web sites. It contains
30,500,000 nodes and 956,894,306 edges. The edges are directional in this case. This dataset could
be used to test the scalability of the proposed information network system.
3. Biological Networks. Functional gene networks integrate multiple interaction sources into a
single network. One common repository for interaction data sources is the BioGRID
database. Different interaction types from high-throughput experiments can be assigned a
confidence value and combined together.
4. Temporal datasets: Dynamic graph datasets will be crucial to some of the research. Such
datasets can be obtained from blog networks.

These publicly available network datasets might contain much more information for testing
than would be available in a military relevant environment. Working with ARL researchers, we
will adopt two approaches to alleviate this issue. First, we could generate synthetic networks
from these datasets by injecting noise or reducing data points so that they behave similarly to
military data. Second, we will collaborate with military researchers to obtain more extensive
datasets for validation.



Summary of Military Relevance
Military operations usually involve limited training exemplars. It becomes important to augment
data with linkages to other data sources. Using the proposed information network system, we are
able to provide enriched, linked data and scalable information access to soldiers and
commanders.

Products
(i) technical reports on research issues in information network organization and management,
(ii) scalable algorithms and frameworks generated from the proposed studies, and (iii) research
papers and submissions to international conferences and journals.

Collaborations with other projects and other network centers:


1. We will jointly work with Project I1 to derive models for quality of information resulting
from information fusion.
2. We will work with Dr. Guohong Cao at CNARC for information network placement
including caching so that the network is more accessible for different users.
3. Time varying aspects of networks will be studied in EDIN Tasks 2.2, 2.3 and 4.2. The
research there will provide the kinds of queries that will be useful for integrated networks.

7.2.5 Task I2.2 Information Network Online Analytical Processing (X. Yan, UCSB
(INARC); J. Han, UIUC (INARC); C. Lin, IBM (SCNARC))
Task Overview
Given an information network with nodes and edges associated with multiple attributes, a
multidimensional network model can be built to help military operators to perform on-line
analysis over the network. In this case, networks can be generalized or specialized dynamically
for any portions of the data. This model could provide multiple, versatile views of information
networks. In this task, we will study the mechanisms of online analytical processing (OLAP) in
complex information networks.

Task Motivation
To reduce information overload and provide real-time responses to military users, the
exploration of information networks will be done only through the slices of data that are
relevant to a user's current mission and situation. For example, a battlefield information
network may be formed by nodes representing commanders, soldiers, tanks, supporting units,
and enemy units. A commander in the field may want to roll up the network to see how different
battalions are spatially related and changing across the entire battlefield, or drill down to check a
particular spot or soldier to see if reinforcement is needed when the enemy is approaching.
Unfortunately, the lack of a general analytical model makes such sensible navigation and human
comprehension virtually impossible in an environment with complex information networks.

Key Research Questions


How to perform multi-dimensional analytics that allows a non-expert to explore information



networks in real time?
What are new fundamental operators that could facilitate network-wise graph modeling and
mining?

Initial Hypothesis
It is feasible to provide an information network analytical framework that allows a non-expert to
perform hierarchical, multi-dimensional exploration of information networks in real time.

Prior Work
OnLine Analytical Processing (OLAP) concepts have been popular in industry for multi-
dimensional analysis of relational databases and data warehouses [Gray97, Chaudhuri97].
OLAP relies on pre-computed summaries for multi-dimensional data in order to provide fast
responses to flexible drill-down/roll-up styled queries for online data analysis. Unfortunately,
the OLAP framework is not available in the context of networks.
Proposed Approaches
In this task, we will study the mechanisms of OLAP in complex networks, and develop a
novel information network OLAP framework. Conceptually, OLAP on informational
dimensions is similar to overlaying multiple information networks without changing the
granularity of the network, e.g., "merging" the coauthor networks of multiple years and/or
multiple conferences into one. On the other hand, OLAP on topological dimensions is similar
to the zooming out/in of information networks, which merges a set of nodes into one (thus hides
its internal structure) or splits one node into many (thus discloses its internal structure). In this
sense, we distinguish two types of OLAP on information networks: (i) informational OLAP
(i.e., I-OLAP), that drills along informational dimensions; and (ii) topological OLAP (i.e., T-
OLAP) that drills along topological dimensions. In the first year, we will develop a multi-
dimensional OLAP framework. It will be accomplished by finding suitable techniques to
summarize the graphs by determining how to identify the salient nodes via ranking and how to
calculate localized graph properties in an accurate manner.
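
As a minimal illustration of a topological roll-up, the following sketch merges all nodes that share a value of a chosen dimension into super-nodes and sums the edge weights between groups; drilling down is the inverse operation. The attribute names and the toy graph are hypothetical:

import networkx as nx

def roll_up(G, dimension):
    """Topological roll-up sketch: merge all nodes sharing the same value of `dimension`
    into one super-node, summing edge weights between groups and hiding internal edges."""
    H = nx.Graph()
    group_of = {v: data.get(dimension) for v, data in G.nodes(data=True)}
    H.add_nodes_from(set(group_of.values()))
    for u, v, data in G.edges(data=True):
        gu, gv = group_of[u], group_of[v]
        if gu == gv:
            continue                          # internal structure is hidden by the roll-up
        w = data.get("weight", 1.0)
        if H.has_edge(gu, gv):
            H[gu][gv]["weight"] += w
        else:
            H.add_edge(gu, gv, weight=w)
    return H

# Example: roll a soldier-level communication graph up to the unit level (hypothetical attributes).
G = nx.Graph()
G.add_nodes_from([("s1", {"unit": "A"}), ("s2", {"unit": "A"}), ("s3", {"unit": "B"})])
G.add_edges_from([("s1", "s2", {"weight": 3.0}), ("s2", "s3", {"weight": 1.0})])
print(list(roll_up(G, "unit").edges(data=True)))   # a single A-B edge with weight 1.0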

First, heterogeneous information networks involving different types of objects are becoming
ubiquitous in military operations. How to define similarity among objects using different
relations in heterogeneous information networks and how to efficiently return top-k most similar
objects given a query accordingly are challenging research problems. In the first year, we will
examine link-based similarity definition in homogeneous networks such as Personalized
PageRank and SimRank, and then develop entity similarity operators in heterogeneous networks
where objects belong to different types with different semantic meanings and sizes.
We will formalize an intuitive similarity definition based on path schemas given by users, i.e., a
user can specify the relation and their orders to decide the similarity among objects in a
network. Multiple path schemas can also be selected to calculate the combined similarity
results. Since path schemas can be arbitrarily given, on the one hand, we can not fully
materialize all the possible similarity results given different path schemas and their
combinations; on the other hand, online calculation for queries involves matrix multiplications,



which is unacceptable for real applications. We will study efficient solutions to make a trade-off
between materialization and online computation.
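
The following sketch illustrates one possible path-schema-based similarity: path instances along a user-specified symmetric schema are counted by multiplying adjacency matrices, and the cross count between two objects is normalized by their self counts. The schema and the numbers in the example are hypothetical, and this is only one candidate definition among those we will study:

import numpy as np

def path_schema_similarity(adjacency_chain):
    """Similarity under a symmetric path schema encoded as adjacency matrices
    [A1, A2, ..., Ak], e.g. author-venue followed by venue-author.  M = A1 @ ... @ Ak
    counts path instances; sim(x, y) = 2*M[x, y] / (M[x, x] + M[y, y]). (Sketch.)"""
    M = adjacency_chain[0]
    for A in adjacency_chain[1:]:
        M = M @ A
    diag = np.diag(M)
    return 2.0 * M / (diag[:, None] + diag[None, :] + 1e-12)

# Example: two authors (rows) and three venues (columns) under the schema
# author -> venue -> author, with hypothetical publication counts.
AV = np.array([[3.0, 1.0, 0.0],
               [2.0, 0.0, 4.0]])
sim = path_schema_similarity([AV, AV.T])
print(np.round(sim, 3))   # off-diagonal entries give the similarity between the two authors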

Second, to gain a deep understanding of the structures and functions of a complex information
network, it is fundamental to investigate various properties of the network and its constituent
components, i.e., nodes, edges, sub-networks, and their associated features and attributes.
Different kinds of network analysis have been proposed and conducted over the past decades,
offering useful insights into a great number of network data, e.g., small world phenomena
[Watts1998, Barabasi2003] and power-law degree distribution [Faloutsos1999]. While these
global properties provide important observations of the real-world networks as a whole, there is
a growing need to introduce new graph operators that search local structures characterizing
objects and their neighbors in an information network. For example, "evaluate the reliability of
an information point via its related information sources", "find alert associations in intrusion
networks, which could correspond to multi-step intrusions", and "discover semantic relationships
between nodes and generate hierarchies for OLAP." The discovered knowledge for hierarchies
or semantic relationships can be used for OLAP. In this project, we will first study the
emerging needs for graph aggregation and association in information networks and then develop
scalable implementation for these two new graph OLAP operators. We are going to formulate
the problem of association mining as a probabilistic ranking problem, and propose a time-
constrained probabilistic factor graph (TPFG) model to model the dynamic information
network. Furthermore, we will design an efficient algorithm to optimize the joint probability
via a process of message propagation [Frey1998] on the network.

The proposed information network OLAP is also important for QoI. Node ranking and graph
information aggregation provide mechanisms to validate information and detect anomalies
existing in an information network.

Validation
The data sets used in Task I2.1 will also be utilized in this task. We plan to include an
information network crawled from Last.fm for information network OLAP and graph
association study. We will demonstrate that Network OLAP will provide mechanisms for
searching knowledge and patterns hidden in large complex information networks. We will
study the scalability of our approach by analyzing the space-time complexity of various new
graph OLAP operators. In the subsequent years, we will integrate Information Network OLAP
with visualization techniques developed in I2.3 and conduct case studies.

Summary of Military Relevance


Military users need a simple and fast way to navigate complex information networks to find
information quickly. The proposed Network OLAP framework will address their need. It will
not only be available for information networks, but also applicable to all network types including
communication, social and cognitive networks. The framework can be used as a tool to integrate
and jointly search information from different military networks.

Products



(i) technical reports on research issues in information network OLAP, (ii) algorithms for graph
aggregation and association, and (iii) research papers to international conferences and journals.
Collaborations with other projects and other network centers:
1. The OLAP framework will feed into Project I3, where sophisticated knowledge discovery
algorithms will be built.
2. The proposed network OLAP framework is not only available for information networks, but
also applicable to all network types including communication, social and cognitive
networks. Network OLAP offers algorithms and tools for sensible navigation and human
comprehension in large-scale complex networks. We will work with the IRC to further
extend the network OLAP framework to CNARC and SCNARC.
3. We will collaborate with Dr. Ching-Yung Lin at SCNARC on large-scale network
processing and graph analysis. OLAP as a tool could be used to analyze the influence
between social networks and information networks.

7.2.6 Task I2.3 Information Network Visualization (Tobias Höllerer, UCSB (INARC);
P. Pirolli, PARC (INARC); X. Yan, UCSB (INARC); W. Gray, RPI (SCNARC))
Task Overview
While Tasks I2.1 and I2.2 focus on the underlying data and processing models to facilitate real-
time data queries and meaningful knowledge discovery, this research task is aimed at devising
and evaluating the best visualization techniques and user interface options to enable human
operators of different interaction platforms (mobile and stationary) to get comprehensive, timely,
and useful information from the network, for different task scenarios, in a wide variety of user
situations and cognitive states.

Task Motivation
Different users of the information network need different views of the available information. An
urban war fighter tracking down a sniper may need fast access to a floor plan representation of a
particular building as well as answers to several simple yes/no questions, whereas a commander
planning a new mission may need a comprehensive overview of intelligence regarding a tactical
situation, including an analysis of sources, reliability, time stamps, alternative resources, etc.
For many tasks, we anticipate the necessity of visualizing (parts of) the information network as
graphs consisting of nodes and edges. Hence, we will be researching flexible, comprehensible
representations to depict large graphs interactively in different user contexts. We also anticipate
users being equipped with a wide variety of interaction platforms, ranging from ultra-mobile
devices to surround-view immersive situation rooms. We will work towards a framework for
automatically tailoring information to different user contexts. Our goal is superior situational
awareness for information network users.

Key Research Questions


How to best visualize (potentially large) information networks for Army personnel in
different user situations?



What are the principles behind effective information network dissemination on various
interaction platforms?
How is situation awareness for information networks achieved in different user contexts?
Initial Hypothesis
We hypothesize that our novel information network visualization techniques, which will adapt
to the recipients' information needs and the available presentation/interaction platforms, will
result in superior situational awareness compared to the application of best known practices
from prior work. Identifying the best means of enabling human-centric, effective
specific information needs of the recipient.

Prior Work
There has been extensive research on Situational Awareness [Endsley2000, Endsley2003,
Gawron2008] and how to achieve it on different presentation and interaction platforms
[Rosenblum97, McCarley2002, Bell2002]. We will work towards achieving Situational
Awareness for heterogeneous information networks, considering scalability in data size, platform
variability, and user context. Interactive graph visualization and mining [Heer2005, Wong2006,
Cui2008, Gretarsson2009] can play an important role in the representation of large
heterogeneous data sets, and we will use this representation scheme for a central component of
our scalable visualization and interaction agenda. Visual analytics approaches to sensemaking in
large-scale data repositories [Viegas2007, Stasko2007] have occasionally explored the role of
different types of provenance [Gotz2008] with good results, and we feel that our agenda will
result in considerable new contributions to the state of the art in this area as well.

Automatic generation of visualizations and multimedia briefings has been explored in the
intelligent user interface research community [Maybury1998, Dalal1996, Green2004,
Rousseau2006]. We focus our work on the specific tailoring of information network data to
different users with different presentation and interaction platforms in different user contexts.

Proposed Approaches
Our first-year efforts on optimally visualizing general information network content will focus on
a crucial component of network information visualization, interactive graph visualization, as well
as the design of an integrated framework to tailor information content to user context and
availability of presentation/interaction platforms. In particular, we will work on the following
two tasks:
1. Design scalable graph representations of information networks;
2. Develop and evaluate a framework for adapting information network content to different
user contexts and presentation and interaction platforms (ultra-mobile to immersive
surround-view).
1. Scalable Graph Representations of Information Networks
In the first year, we will develop novel graph-based representations for information networks.
The interactive visualizations and interfaces that we plan to develop and evaluate will be based
on large heterogeneous data collections as well as a careful analysis of meaningful situational
awareness tasks in the INARC domain. We will perform a detailed task analysis and collect



representative data sets that drive and support our visualization and interactive interfaces. Based
on the resultant task scenarios, we will design efficient interactive graph visualizations
representing the information network content and its provenance, supporting interactivity for
large information networks with tens or hundreds of thousands of entities and relationships.

Extending our previous work on scalable visualization and constrained interaction for large
graphs [Gretarsson2009], we will focus on real-time interactive visualization of graphs
representing entire information networks and selective clusters of interest. Our motivation
comes from the insight that real-time interaction and dynamic probing of large data sets can lead
to a much clearer mental representation and better understanding of the available data than
partial views and text-based analysis and queries alone. We will be working towards powerful
graphical tools enabling our users to form their own comprehensive views of the information
universe and allowing them to navigate and explore it as comfortably as possible. We strongly
believe that interactive exploration of very large networks is not just feasible but also essential
for forming an increased understanding of the available resources. The first step is to display the
data universe as a whole and make it feasible to interact with it in real time. The second step is to
leverage interaction to let each individual user predictably downscale the network to form
various level-of-detail representations of the space. This structure will become their mental model
of the data universe. While the representation will be dynamic and continuously adapt to newly
arriving data, it will stay comprehensible and predictable because it is formed under the user's
direct control.
2. Adapting Information Networks to Different User Contexts and Presentation Platforms
While the previous subproject deals with the specific case of graph representations, the
following two subprojects are concerned with general multimedia information presentation.
Apart from the scalability to large amounts of data, we want our information network
visualizations to be flexible in terms of the type and cognitive state of the recipient (scalability
to user context) and the presentation and interaction platforms they use (scalability to available
infrastructure).

To this end we will design a representation and presentation framework that takes into account
user models (from our domain and user analysis) and cognitive models (using extensions to
PARC's Information Foraging Theory), as well as detailed information about the capabilities and
constraints of various candidate presentation and interaction platforms, ranging from ultra-
mobile handheld/wearable devices to the Allosphere, our three-story immersive situation room
at UCSB. Our framework will allow us to tailor task-specific information from the
representations inside the information network to the interaction platforms and general context
of the recipient. We will start our agenda with simple ultra-mobile interfaces, integrating
iPhones with the Allosphere infrastructure. In conjunction with Project I1, we will develop
interactive situation room interfaces to visualize an information network originating from a
large indoor/outdoor camera network deployed at UCSB.

To study the effectiveness and efficiency of the graph visualizations, UCSB and PARC will
collaborate on an empirical study of trade-offs among task conditions, user interface constraints



and resources (e.g., display and interaction functionalities), visual presentation and interaction
techniques, the impact of human perception and cognition constraints on information network
displays, and outcome measures such as judgments of information relevance and provenance,
human source credibility (trust; expertise), and task performance. These studies will aim beyond
simple evaluations towards more general theoretical models that build upon the mathematical
and computational modeling approaches developed in Information Foraging Theory. In
particular, these models will be developed to correctly represent how to modulate information
network representations for optimizing performance given various contextual and resource
constraints (e.g., small mobile screen vs. big display). The evaluation will be based on carefully
chosen tasks and information to present to users under different visualization conditions, a
measurement framework driven by a task environment analysis from a psychological
perspective, and an information environment analysis driven by the task environment analysis.

Validation
We will validate the usefulness of our novel scalable visualizations and interfaces through the
usability evaluation agenda proposed above. Specifically, in the first year, we will design
controlled formative user studies for scalable graph visualizations and provenance data. UCSB
and PARC will collaborate to determine meaningful variables and trade-offs to study for the
identified tasks, as well as to set up a systematic infrastructure and procedure for replicable,
controlled, collaborative user evaluation.

We will make use of the data sets used in tasks I2.1 and I2.2, and work with Project I1 to obtain
meaningful graph representations of heterogeneous information networks. Furthermore, UCSB
and PARC will draw upon their capabilities for mining publicly available media systems such as
tagging systems, microblogging, RSS feeds, etc. These systems are frequently used in everyday
life to support social, communication, and information network functionality. All these datasets
together will provide representative samples of real network structures and dynamics. If
necessary, these data could be enhanced with synthetic data (e.g., geolocation) to have high
similarity to anticipated Army scenarios.

Summary of Military Relevance


Allowing military personnel of different positions and ranks to effectively extract valuable and
task-relevant knowledge from an information network requires the careful design of appropriate
visualizations and user interfaces. Meaningful scalable graph visualizations will be important
for many task scenarios. Such graph representations, along with other information formats, need
to be conveyed on a variety of user platforms, for a variety of task contexts, and cognitive states.

A flexible representation and interactive visualization framework that can automatically adapt to
different interaction platforms will be critical for military operations. For example, an urban
war fighter clearing houses will need a different information view than a battalion commander,
who assesses and dispatches incoming intelligence. Global command centers, on the other hand,
need to maintain a holistic view of the situation, including interactive views of all available
assets, mission-critical assets, mission status, threats, and what-if scenarios. The proposed graph



visualizations, context-adaptive visualization framework, and evaluations of information
network knowledge extraction will improve situation awareness for military users.

Products
(i) Novel scalable graph visualizations and interfaces to information provenance. (ii) User study
design on information network visualization. (iii) Research reports and published scientific
papers on effective situation-aware visualization of heterogeneous information networks.

Collaborations with other projects and other network centers:


1. Information network visualization is related to the Trust-CCRI. We will study the visualization
of trust and uncertainty.
2. We will work with the cognition experts at SCNARC to design visualization methodologies
in a way that leads to maximum understanding not only for individuals but also for groups
of users. Our visualization tools will be integrated with the computational cognitive
model (a simulated user) developed by Dr. Wayne Gray at SCNARC. Programming
interfaces will be coordinated so that simulated users can access UCSB's graphical UIs, and
we will develop logging protocols to record user (or simulated user) behavior.

7.2.7 Linkages to Other Projects


IPP Tasks                     Linkage
I2.1 ↔ E2                     Representation of networks will be used to develop new ways of organization and management of information networks.
I2.1 ↔ E3.1, E3.2, E3.3       Time-varying aspects of networks will be studied in EDIN Tasks 3.1, 3.2 and 3.3. The research there will provide the kinds of queries that will be useful for our studies.
I2.1 ↔ E1                     We will work with E1 to ensure that we adopt a consistent ontology.
I2.1 ↔ C2.2                   We will work with Dr. Guohong Cao at CNARC for information network placement including caching so that the network is more accessible for different users.
I2.2 → I3.1, I3.3             The OLAP framework will feed into Project I3, where sophisticated knowledge discovery algorithms will be built.
I2.2 ↔ S1.1                   We will collaborate with Dr. Ching-Yung Lin at SCNARC on large-scale network processing and graph analysis.
I2.3 ↔ I1.2                   The data fusion work in I1.2 will influence the visualization techniques devised in I2.3, and the interfaces developed in I2.3 will stipulate more kinds of multimodal data fusion.
I2.3 ↔ T1                     We will study visualization questions for trust and uncertainty.
I2.3 ↔ S3.2                   Our visualization tools will be integrated with the computational cognitive model (a simulated user) developed by Dr. Wayne Gray at SCNARC.



7.2.8 Relevance to US Military Visions/Impact on Network Science
While many military information network-related applications are seemingly independent, they
usually share common information access and knowledge discovery needs, such as
integrating heterogeneous information sources and finding important nodes, structures, and
associations. Nevertheless, due to the lack of an information network system, users have to re-
implement the same functions repeatedly in an ad hoc manner. In many cases, important
network properties are ignored and the influence among communication networks, information
networks, and social networks is not taken into consideration, leading to inefficient designs in
existing implementations. This project aims to change the state of the art by developing a
scalable, human-centric information system that is network-centric in its specification and that
can address the needs of managing and searching various kinds of information in military
physical and logical networks. The system will consider the impact of changes in
communication networks and social networks on information networks, and vice versa. Such
impact is obvious when soldiers store, move, and disseminate information. The proposed
system will significantly contribute to network science and provide systematic support for fine-
granularity scientific analysis of network structures and content.

7.2.9 Relation to DoD and Industry Research


Information network organization, management, and dissemination are critical to many DoD
tasks that involve information fusion and information access. Our team members have strong
backgrounds in, and direct connections to, industry research and startups. The design of
a scalable, human-centric information network system will consider real military scenarios. We
expect that the research will lead to insights on how to design scalable and effective information
networks, and that these design principles will lead to significant technology transition
opportunities.

Project Research Milestones

Due  Task    Description
Q2   Task 1  Study application scenarios of large scale network aggregation (UCSB, IBM)
Q2   Task 2  Formalize a multi-dimensional OLAP framework for graph data analysis (UIUC, UCSB)
Q2   Task 3  Assemble representative sets of meaningful data and task scenarios (UCSB, PARC)
             Design scalable graph representations of information networks (UCSB, IBM, UIUC)
Q3   Task 1  Investigate stochastic information dissemination models and Bayes probabilistic models (IBM, UCSB)
             Design graph indexing for large scale network aggregation (UCSB, IBM)
             Adapt the GraphQL query language to include the notion of time (UCSB, UIUC)
Q3   Task 2  Design a multi-dimensional OLAP framework (UIUC, UCSB)
Q3   Task 3  Develop a framework for adapting information network content to different user contexts and presentation and interaction platforms (ultra-mobile to immersive surround-view) (UCSB, PARC)
             Investigate models of how to modulate visualizations to optimize performance given various contextual & resource constraints (e.g., small mobile screen vs. big display) (PARC, UCSB)
Q4   Task 1  Develop implementations of the GraphQL language (UCSB, UIUC)
             Implement graph indexing for information aggregation (UCSB, UIUC)
Q4   Task 2  Develop two new graph OLAP operators: graph aggregation and graph association (UCSB, UIUC)
             Study new path-based node similarity ranking (UIUC, UCSB)
             Investigate reverse path-based similarity ranking (UIUC, UCSB)
Q4   Task 3  Design user studies to iteratively improve the effectiveness of our visualizations and interfaces and assess the achieved situational awareness (UCSB, PARC)

Budget By Organization

Organization Government Funding ($) Cost Share ($)


IBM (INARC) 89,756
PARC (INARC) 35,646
UCSB (INARC) 274,239
UIUC (INARC) 56,664
TOTAL 456,305

References
[Barabasi2003] Barabasi A.-L., Linked: How everything is connected to everything else and
what it means. Plume, 2003.



[Bell2002] Bell B., Höllerer T. and Feiner S., An annotated situation-awareness aid for
augmented reality. In Proceedings of User Interface Software and Technology, pages 213–
216, 2002.
[Bhattacharya2004] Bhattacharya I. and Getoor L. Iterative record linkage for cleaning and
integration. In Proc. SIGMOD Workshop on Research Issues on Data Mining and Knowledge
Discovery, pages 11–18, 2004.
[Bizer2008] Bizer C., Heath T., Idehen K., and Berners-Lee T. Linked data on the web. In
Proceedings of 17th Int. World Wide Web Conf., 2008.
[Boldi2004] Boldi P. and Vigna S. The WebGraph framework I: Compression techniques. In
Proceedings of 17th Int. World Wide Web Conf., pages 595-601, 2004.
[Brin1998] Brin S. and Page L. The Anatomy of a Large-Scale Hypertextual Web Search
Engine. Proceedings of 7th Int. World Wide Web Conf, pages 107-117, 1998.
[Chaudhuri1997] Chaudhuri S. and Dayal U., An overview of data warehousing and OLAP
technology. SIGMOD Record, 26:65–74, 1997.
[Consens90] M. P. Consens and A.O. Mendelzon, GraphLog: a visual formalism for real life
recursion, ACM PODS, 1990.
[Cui2008] Cui, W., Zhou, H., Qu, H., Wong, P.C., and Li, X., Geometry-based edge clustering
for graph visualization, IEEE Transactions on Visualization and Computer Graphics, vol.14,
no.6, pages 1277-1284, 2008.
[Culotta2007] Culotta A., Wick M., Hall R., Marzilli M., and McCallum A., Canonicalization of
database records using adaptive similarity measures. In Proceedings of ACM Int. Conf.
Knowledge Discovery in Databases, 2007.
[Dalal1996] Dalal, M., Feiner, S., McKeown, K., Pan, S., Zhou, M., Höllerer, T., Shaw, J., Feng,
Y., and Fromer, J. Negotiation for automated generation of temporal multimedia presentations.
In Proceedings of the Fourth ACM International Conference on Multimedia (Boston,
Massachusetts, United States, November 18 - 22, 1996). MULTIMEDIA '96. ACM, New York,
NY, 55-64, 1996.
[Dasu2003] Dasu T. and Johnson T. Exploratory Data Mining and Data Cleaning. John Wiley &
Sons, 2003.
[Dean2004] Dean J. and Ghemawat S., MapReduce: simplified data processing on large clusters,
Proceedings of the 6th conference on Symposium on Opearting Systems Design &
Implementation, pages 137-150, 2004
[Endsley2000] Endsley, M.R. and Garland, D.J. (Editors), Situation Awareness Analysis and
Measurement, Taylor & Francis, 2000.
[Endsley2003] Endsley, M.R., Designing for Situation Awareness: An Approach to User-
Centered Design, Taylor & Francis, 2003.
[Faloutsos1999] Faloutsos M., Faloutsos P., and Faloutsos C., On power-law relationships of the
internet topology. In Proc. of ACM Conf. Applications, Technologies, Architectures, and
Protocols for Computer Communication, pages 251-262, 1999.
[Frey1998] Frey B. Graphical models for machine learning and digital communication, MIT
Press, Cambridge, 1998
[Gawron2008] Gawron, V.J., Human Performance, Workload, and Situational Awareness
Measures Handbook, Second Edition, Taylor & Francis, 2008.
[Gotz2008] Gotz, D. and Zhou, M.X., Characterizing users' visual analytic activity for insight
provenance, IEEE Symposium on Visual Analytics Science and Technology, pages 123-130,
2008.



[Gray1997] Gray J., Chaudhuri S., Bosworth A., Layman A., Reichart D., Venkatrao M.,
Pellow F. and Pirahesh, H., Data cube: A relational aggregation operator generalizing
group-by, cross-tab and sub-totals. Data Mining and Knowledge Discovery, 1:29–54, 1997.
[Green2004] Green, N. L., Carenini, G., Kerpedjiev, S., Mattis, J., Moore, J. D., and Roth, S. F.
2004. Autobrief: an experimental system for the automatic generation of briefings in integrated
text and information graphics. Int. J. Hum.-Comput. Stud. 61, 1, pp. 32-70, July, 2004.
[Gretarsson2009] Gretarsson, B., O'Donovan, J., Bostandjiev, S., and Höllerer, T., WiGis: A
framework for scalable web-based interactive graph visualizations. Proc. GraphDrawing 2009
(17th Int'l Symposium on Graph Drawing), 2009.
[Guting94] R. H. Guting, GraphDB: Modeling and querying graphs in databases, Proc. of
VLDB, pages 297--308, 1994.
[Han2004] Han H., Giles L., Zha H., Li C., and Tsioutsiouliklis K. Two supervised learning
approaches for name disambiguation in author citations. In Proceedings of Int. Conf. on Digital
Libraries, 2004.
[He2008] He H. and Singh A. K., Graphs-at-a-time: query language and access methods for
graph databases. In Proceedings of the ACM International Conference on Management of Data,
pages 405-418, 2008.
[Heer2005] Heer, J. and Boyd, D., Vizster: visualizing online social networks, IEEE Symposium
on Information Visualization, pages 32-39, 2005.
[Leser05] U.Leser, A query language for biological networks, Bioinformatics, 21:ii33--ii39,
2005.
[Manola2004] Manola F. and Miller E., Rdf Primer. In W3C Recommendation, 2004.
[Maybury1998] Maybury, M.T. and Wahlster, W., Eds. Readings in Intelligent User Interfaces.
Morgan Kaufmann Publishers Inc., 1998.
[McCarley2002] McCarley J., Wickens C., Goh J., and Horrey W., A computational model of
attention / situation awareness. In Proceedings of 46th Annual Meeting of the Human Factors
and Ergonomics Society, 2002.
[Raman2001] Raman V. and Hellerstein J. Potter's wheel: An interactive data cleaning system.
In Proceedings of Int. Conf. on Very Large Data Bases, pages 381–390, 2001.
[Rosenblum1997] Rosenblum L., Durbin J., Doyle R., Tate D., and King R., Situational
awareness using the VR responsive workbench. IEEE Computer Graphics and
Applications, 17(4):12–13, 1997.
[Rousseau2006] Rousseau, C., Bellik, Y., Vernier, F., and Bazalgette, D. 2006. A framework for
the intelligent multimodal presentation of information. Signal Process. 86, 12 (Dec. 2006), 3696-
3713.
[Sheng99] L. Sheng, Z. M. Ozsoyoglu, and G. Ozsoyoglu, A graph query language and its query
processing, ICDE, 1999.
[Stasko2007] Stasko, J., Gorg, C., Liu Z., and Singhal, K., Jigsaw: Supporting investigative
analysis through interactive visualization, IEEE Symposium on Visual Analytics Science and
Technology, pages 131-138, 2007.
[Udrea2007] Udrea O., Getoor L., and Miller R. J., Leveraging data and structure in ontology
integration. In Proceedings of ACM Int. Conf. Management of Data, pages 449–460, 2007.
[Viegas2007] Viegas, F.B., Wattenberg, M., van Ham, F., Kriss, J., and McKeon, M.,
ManyEyes: A site for visualization at internet scale , IEEE Transactions on Visualization and
Computer Graphics, vol.13, no.6, pages 1121-1128, 2007.



[Watts1998] Watts D. J. and Strogatz S. H., Collective dynamics of 'small-world' networks.
Nature, 393:440-442, 1998.
[Wong2006] Wong, P.C., Chin, G., Foote, H., Mackey, P., and Thomas, J., Have green - A
visual analytics framework for large semantic graphs, IEEE Symposium on Visual Analytics
Science and Technology, pages 67-74, 2006.

7.3 Project I3. Knowledge Discovery in Information Networks

Project Lead: Jiawei Han, UIUC


Email: hanj@cs.uiuc.edu, Phone: 217-333-6903

Primary Research Staff: C. Faloutsos, CMU (INARC); J. Han, UIUC (INARC); H. Ji, CUNY (INARC); S. Papadimitriou, IBM (INARC); D. Roth, UIUC (INARC); X. Yan, UCSB (INARC)

Collaborators: S. Adali, RPI (SCNARC); M. Faloutsos, UCR (IRC); W. Gray, RPI (SCNARC); J. Hendler, RPI (IRC); T. La Porta, PSU (CNARC); B. Szymanski, RPI (SCNARC)

7.3.1 Project Motivation and Overview


Knowledge discovery in information networks involves the development of scalable and
effective algorithms to uncover patterns, correlations, clusters, outliers, rankings, evolutions, and
abnormal relationships or sub-networks in information networks. Although a large number of
statistics, machine learning, data mining, and other data-intensive methods have been developed
for knowledge discovery, knowledge discovery in information networks is a new research
frontier. Moreover, Army applications pose many new challenges and call for in-depth research
on effective knowledge discovery in distributed and volatile information networks. We identify
the following major and critical tasks in knowledge discovery in information networks:
1. Most data mining methods for information networks assume that the network nodes and/or
edges are of homogeneous types (e.g., friends in social networks, or web pages on the
Internet). However, many large information networks are heterogeneous, with links
intertwined among multi-typed objects such as persons (e.g., soldiers, commanders, and
enemies), equipment (e.g., tanks, helicopters, and satellites), and geo-locations (e.g., cities,
bridges, and rivers), forming heterogeneous information networks.



2. Military data, especially sensor data for battlefield situation awareness, emergency handling,
and situation control, are often highly dynamic, with new information constantly fed into the
system in the form of real-time data streams.
3. The networks are critically associated with spatiotemporal dimensions, e.g., the locations and
times associated with tanks, soldiers, and helicopters.
4. Many military applications contain a great deal of text and unstructured data, and thus text and
unstructured data mining in information networks is critically important for understanding
and utilizing such networks.
5. Since communications can be broken, sensors can be destroyed, batteries may run low,
and combat personnel and equipment can be disabled, it is essential to discover trustable
knowledge in information networks under dynamic, volatile, incomplete, and unreliable
conditions. Mining trustable knowledge in information networks will be part of the
Trust-CCRI and part of the second- and third-year efforts of the I1 and I3 projects, although
our design and methodology studies in the first year also aim to advance this frontier
as much as possible.

7.3.2 Key Research Question of the Project

"How can we develop efficient and effective knowledge discovery mechanisms in distributed,
volatile, and heterogeneous information networks?"

7.3.3 Technical Approach


The above research challenges will require multiple years of effort to reach effective, robust, and
scalable solutions. For the first year, we propose to accomplish the following three tasks:

Task I3.1: Methods for scalable mining of dynamic, heterogeneous information networks
(Lead: Jiawei Han). This task addresses the key research question: "What are the new
principles and methods for mining distributed, incomplete, dynamic, and heterogeneous
information networks to satisfy the end-user needs?"

Task I3.2: Real-time methods for mining spatiotemporal information-related cyber-physical
networks (Lead: Spiros Papadimitriou and Jiawei Han). This task addresses the key research
question: "How can we mine spatiotemporal patterns in cyber-physical information networks in
distributed and mobile environments?"

Task I3.3: Text and Unstructured Data Mining for Information Network Analysis (Lead:
Dan Roth). This task addresses the key research question: "How can we develop effective
mechanisms for mining knowledge from text and unstructured data in information networks in
noisy and volatile environments?"

We will systematically investigate these research issues, develop robust and effective methods
and algorithms for solving these problems, and test our solutions in military and/or similar
applications. Moreover, since information networks are closely linked with communication
networks and with social and cognitive networks in many respects, we will pay close attention to
collaborating with the other research centers in this project. We will also invest effort in
collaborating with the IRC on potential technology transfer of the algorithms and methods developed


in this project for large-scale military applications. The details of our proposed approach are
outlined as follows.

7.3.4 Task I3.1 Methods for scalable mining of dynamic, heterogeneous information networks (J. Han, UIUC (INARC); C. Faloutsos, CMU (INARC); X. Yan, UCSB (INARC); M. Faloutsos, UCR (IRC); J. Hendler, RPI (IRC); B. Szymanski, RPI (SCNARC))
Task Overview
This task investigates new principles and effective methods for mining various kinds of
patterns and knowledge from dynamic, heterogeneous information networks. We assume that
data in the information network are multi-typed (i.e., heterogeneous), incomplete, dynamic,
and noisy. Many issues must be studied in order to effectively discover interesting and
mission-critical knowledge in information networks. For the first year, our focus will be on
effective methods for classification of heterogeneous information networks, discovery of
evolutionary regularities in dynamic information networks, and outlier detection for
dynamic, heterogeneous information networks. Data stream mining, in-depth pattern discovery,
network structure discovery, and many other issues will be investigated in subsequent years.

Task Motivation
It is critically important in military and other applications to construct models (i.e., perform
classification) from limited training data and to discover evolutionary regularities and anomalies
in massive, interrelated datasets. Such massive, interrelated data form heterogeneous information
networks, and objects in such networks may mutually reinforce one another; information network
analysis will therefore enhance the overall quality of data, information, and knowledge and help
us make intelligent and informed decisions.

Key research question


What are the new and effective methods for mining distributed, incomplete, dynamic, and
heterogeneous information networks to satisfy the end-user needs?

Initial Hypothesis
We assume that networks in the real world are heterogeneous, interacting, and evolving, with new
data streaming into the system dynamically. Many military applications require the system to
perform effective, scalable, comprehensive, and real-time analysis of such information networks.
We further assume that such networks consist of multi-typed, interconnected objects, such as
soldiers, commanders, armed vehicles, geospatial objects (such as bridges, rivers, highways, and
villages), text messages and documents, and other artifacts, each associated with multiple
properties (attributes). Such networks pose many new challenges for analysis systems that handle
only homogeneous objects, such as people-to-people networks. We hypothesize that our proposed
approaches will be able to systematically discover mission-critical knowledge from such dynamic,
heterogeneous information networks.

Prior work
Most existing network modeling and analysis methods consider homogeneous, static networks
(Girvan02). However, networks in the real world are heterogeneous, interacting, distributed, and
dynamically evolving, which poses great challenges for effective, scalable, and



comprehensive analysis of such information networks (Sun09a). Military networks pose even
greater challenges due to their diversity, heterogeneity, and dynamics. Previous studies have
focused on homogeneous networks, such as networks of friends, authors, or web pages. It is
therefore important to investigate new principles and methods for mining dynamic, heterogeneous
information networks.

Technical approaches
Mining distributed, incomplete, dynamic, and heterogeneous information networks is a multi-
year task. The first-year work contains three subtasks: (i) developing methods for effective
classification of heterogeneous information networks, (ii) developing methods for pattern
discovery in evolving heterogeneous information networks, and (iii) developing methods
for detecting outliers and exceptions in dynamic, heterogeneous information networks. The first
subtask lays the foundation and develops effective algorithms for model construction in
heterogeneous information networks. The second subtask investigates pattern discovery in dynamic,
heterogeneous information networks; we expect the results of this study to also contribute to
the study of the evolution and dynamics of general networks, i.e., the EDIN project. The third
subtask detects outliers in dynamic information networks where data can stream into the network
in real time. The details of these subtasks follow.

1. Methods for effective classification of heterogeneous information networks: For
heterogeneous information network analysis, we have recently developed ranking-based
clustering methods: RankClus (Sun09a) and NetClus (Sun09b). The former performs
link-based clustering on bipartite information networks by progressively ranking the
objects to be clustered while clustering the ranked objects; the latter extends the
methodology to multi-typed, interrelated objects that form a star schema. Such clustering
methods do not use any class labels but discover ranking and clustering simultaneously
through link analysis. However, label information is sometimes available for a portion of the
objects: one may know, for example, that some armed vehicles are ours and some belong to
the enemy, while others are unknown. Such knowledge can be used to partially label the data,
and learning from such labeled and unlabeled networked data may lead to better extraction of
the hidden network structure. Classification on homogeneous information networks has been
studied for over a decade, but classification on heterogeneous networks has not been explored
so far. We propose a novel graph-based regularization framework to model the link structure
in heterogeneous information networks with arbitrary network schemas and arbitrary numbers
of object/link types. Specifically, we explicitly differentiate the multi-typed link information
by incorporating it into different relation graphs. Based on this framework, we use the label
information on part of the objects to predict labels for the unlabeled objects (a minimal
illustrative sketch of this kind of label propagation appears after this list).
2. Methods for pattern discovery in evolutionary heterogeneous information networks:
We have recently developed methods for rank-based clustering of heterogeneous information
networks (Sun09a; Sun09b), with promising results. However, these studies do not
consider the dynamics and evolution of information networks. For example, a research
publication network contains multiple years of data, and observing the evolution regularities
of the network, such as which communities are emerging or vanishing, could be important in
many applications. Moreover, one may wish to discover interesting patterns in such evolving
networks. We propose to develop effective knowledge discovery methods that uncover the
evolution regularities of heterogeneous information networks and to apply frequent and



discriminative pattern mining (Cheng08) to such evolving clusters or refined hierarchical
clusters. We will also investigate the use of such patterns for network-based ontology
discovery and multi-dimensional data or network summarization. The discovered patterns
and constructed hierarchies can be fed into tasks in other projects, such as the OLAP
analysis of information networks in Project I2.3 and the trustworthiness analysis in the Trust-CCRI.
3. Outlier detection for dynamic heterogeneous information networks: In our data mining
research, we have developed several outlier detection methods. However, detecting outliers
in heterogeneous information networks is new and challenging. We propose to develop two
approaches, supervised and unsupervised, as follows.
Supervised approach: Most classification methods assume relatively balanced data sets
and cannot handle highly skewed distributions (e.g., few positives but many negatives).
However, anomalies or rare events (treated as positives) may be the most interesting
signals in reconnaissance and other military operations. Work on rare-event data stream
classification mines such data by estimating reliable posterior probabilities using an
ensemble of models that matches the distribution over under-samples of negatives and
repeated samples of positives. In many applications, however, experts may only have
time to identify a small set of anomalies and leave the majority of the data unlabeled. We
propose to extend work on classification with positive examples and unlabeled data and
integrate it with information network analysis.
Unsupervised approach: We want to use link behavior to determine significant
outliers. Outliers may be interesting linkage behaviors that are not commonly noticed
and may be useful for military purposes, e.g., a movie linked to a genre and an actor that
are not normally associated, or a paper written by authors who do not normally write
papers together. We want to ignore noise and identify significant events in an
unsupervised way. We need a dynamic, scalable approach that can be used on massive
information network streams. We will summarize "typical" link behavior with a dynamic
clustering approach and define outliers in the context of dynamic clustering using tilted
time frames. In this way, we hope to be able to identify outliers in information network
stream data.
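
As referenced in subtask 1 above, the following is a minimal numerical sketch of graph-regularized label propagation over a heterogeneous network with multiple relation graphs. It is illustrative only, not the proposed algorithm itself: the relation graphs, per-relation weights, labels, and parameters are invented for the example.

    # Illustrative sketch: propagate partial labels over a heterogeneous network by
    # combining typed relation graphs with per-relation weights, then iterating
    # F <- alpha*S*F + (1-alpha)*Y, where S is the symmetrically normalized adjacency.
    import numpy as np

    n = 6  # objects 0-2: armed vehicles, 3-5: locations (invented toy data)
    A_comm = np.zeros((n, n))   # vehicle-vehicle communication links
    A_geo = np.zeros((n, n))    # vehicle-location sighting links
    for i, j in [(0, 1), (1, 2)]:
        A_comm[i, j] = A_comm[j, i] = 1.0
    for i, j in [(0, 3), (1, 3), (2, 4), (4, 5)]:
        A_geo[i, j] = A_geo[j, i] = 1.0

    weights = {"comm": 0.7, "geo": 0.3}              # per-relation importance (assumed)
    A = weights["comm"] * A_comm + weights["geo"] * A_geo

    d = A.sum(axis=1)
    d[d == 0] = 1.0                                  # guard isolated nodes
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    S = D_inv_sqrt @ A @ D_inv_sqrt                  # normalized combined graph

    Y = np.zeros((n, 2))                             # partial labels: object 0 friendly, object 2 hostile
    Y[0, 0] = 1.0
    Y[2, 1] = 1.0

    alpha, F = 0.8, Y.copy()
    for _ in range(50):                              # label propagation iterations
        F = alpha * (S @ F) + (1 - alpha) * Y

    print("predicted class per object:", F.argmax(axis=1))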

Validation Approach
We will use the following data sets to design, develop, and validate our proposed methods, not
only for this task but also for the remaining tasks in this project.
DBLP bibliographic networks: This is a typical heterogeneous information network in which
authors, titles (bags of keywords), conferences, and research papers are linked together. With
the year information, one can study its evolution regularities and mine interesting patterns.
Note that such a multi-typed network is also typical of military networks, which can contain
interconnected, multi-typed entities representing soldiers, commanders, equipment, times,
locations, documents, etc.
NASA aviation safety databases: We have been using the NASA aviation incident report
database (ASRS) to study text mining and information network analysis and will
continue using it in this study. Note that incident reports of a similar nature could be common
in military applications.
News data sets: We plan to use news datasets, including Google News and other typical news
collections, to construct information networks, observe the evolution of popular events,
and determine how to perform mining on such datasets. Note that news, radio broadcasts, blogs,


messages, documents, and conversation records are also common in military applications, and
they can be linked with other types of entities in military information networks. In our study
of Google News, we will pay attention to linking these scenarios with military applications in
which there are numerous and important text and multimedia information exchanges.
Simulated data sets for military information network usage: We will work with ARL
researchers to generate simulated datasets that are closely linked to the various requirements
of military missions, for proof of concept and for testing military applications of our
technology.

Summary of Military Relevance


Military missions involve collecting and handling massive, interrelated, dynamic and unreliable
data, which form heterogeneous information networks. Mining knowledge from such networks
is critical to the mastery of timely and comprehensive information and thus to the success of any
mission. Such relevance has been emphasized throughout the task description and will be reflected
in our research, algorithm development, and experiments.

Research Products

1. A set of algorithms and methods generated from this study, together with a series of reports on
their effectiveness and efficiency as tested on the datasets described above, and
2. A set of research papers to be published in international conferences and journals.

Collaborations with other projects/centers


For this and other tasks in this project, please see the collaboration section and table at the end of
the project description.

7.3.5 Task I3.2 Real-Time Methods for Mining Spatiotemporal Information-Related Cyber-Physical Networks (S. Papadimitriou, IBM (INARC); J. Han, UIUC (INARC); X. Yan, UCSB (INARC); S. Adali, RPI (SCNARC); T. La Porta, PSU (CNARC))
Task Overview
This task develops effective data mining methods to discover interesting patterns and
knowledge in cyber-physical information networks that contain spatiotemporal components in
distributed, mobile environments. Most military information networks contain spatiotemporal
components in their nodes and edges, such as soldiers, commanders, enemies, tanks, rivers,
bridges, and cities. With sensors and communication networks linked to such
information networks, spatiotemporal data will flow into them
continuously and dynamically. Moreover, such dynamically changing information may contain
noise, uncertainty, and incompleteness. Recently, a new research discipline, called cyber-
physical systems, has emerged; it investigates systems that contain both physical devices, such
as sensors, cameras, mobile phones, and other physical and mobile devices, and information
components, such as interrelated multimodal data, including structured data, text data, and
images. In our study, we investigate new principles and methodologies for networks that link
both sensors and information components, which we therefore call cyber-physical networks. We
propose to study effective and scalable methods for data mining and knowledge discovery in
cyber-physical networks in distributed, mobile environments.



Task Motivation
A network that links both physical devices and information components is a cyber-physical
network. Understanding, representing, and accounting for the spatiotemporal context of a
network (e.g., its physical constraints and uncertainty) are key issues in cyber-physical network
analysis. Physical (e.g., monitoring, communication) networks and information networks have
traditionally been studied in their respective communities separately, focusing on performance
analysis methods for the former and search and mining methods for the latter. A significant
innovation of the proposed work lies in a multidisciplinary approach that puts them under a
common analytic foundation and accounts for their interdependencies, when present. For
example, in a dynamic scenario, a resource-constrained or partially damaged monitoring and/or
communication network might throttle the information needed for maintaining and updating
parameters of an evolving battlefield model. In turn, real-time analysis of battlefield information
streams may require additional information from the monitoring/communication network to
reduce uncertainty. The efficacy of battlefield analysis in detecting distributed anomalies or
predicting an opponent's move may thus depend on proper understanding and exploitation of the
resulting feedback loop and the imposed environmental constraints. Of particular interest is
developing an understanding of the sensitivity of such loops to both damage (e.g., to the physical
network) and uncertainty (e.g., in the state of the information network).

Key research question


"How can we mine patterns and knowledge in cyber-physical networks that contain information
networks associated with spatiotemporal context in distributed and mobile environments?"

Initial hypotheses
We hypothesize that our proposed approach, as outlined below, will be able to effectively discover
patterns and knowledge in cyber-physical networks in distributed, mobile environments.

Prior work
Sensor networks and the related data analysis have been studied extensively in previous research,
and many important issues have been addressed, including formal modeling, debugging, data-link
streaming, and privacy-preserving data aggregation. Our aim here is to integrate sensor network
analysis with information network analysis to create a new generation of cyber-physical network
systems. Social network analysis, including Web community mining, has attracted much
attention in recent years (Chakrabarti03). Abundant literature has been dedicated to social
network analysis, ranging from network properties, such as power-law distributions
(Newman03) and the small-world phenomenon (Watt03), to more complex network structural
analysis (Flake02; Girvan02), evolution analysis (Leskovec05; Backstrom06), and statistical
learning and prediction (Aleman-Meza06). The static behavior of large graphs and networks has
been studied extensively, with the derivation of power laws of in- and out-degrees, communities,
and small-world phenomena (Faloutsos99; Chakrabarti03). Our proposed work establishes a
general analytical framework with which users can easily manipulate and explore massive
cyber-physical networks to uncover interesting patterns, measures, and sub-networks.

Technical approaches



It is challenging to mine interesting knowledge and patterns from cyber-physical networks. In the
first year of research, our focus will be on the development of effective clustering methods, through
the following two proposed subtasks.

1. Clustering static cyber-physical networks: Taking the battlefield as a typical military
application of cyber-physical networks, we investigate principles and methods for
effective clustering of static cyber-physical networks (the following subtask addresses
dynamic networks). We take soldiers, weapons, armed vehicles, local field
commanders, their communication networks/devices, and the related spatial objects as multi-
typed physical objects; various kinds of links connect them together, and they are also
associated with various kinds of information objects. Clustering often needs to be performed
on such cyber-physical networks, based on location, time, functionality, and other
requirements, to enable effective analysis and subsequent data mining or functioning.
However, how to cluster them effectively (for accomplishing certain tasks) and efficiently is
an open problem. In our previous research, we proposed a user-guided clustering methodology
(Yin07), and we believe this methodology can generate the desired clusters for diverse
application requirements. For example, to attack enemies within minutes, one may take the
available weapons, soldiers, tanks, field commanders, and communication devices that are
physically close to each other as a cluster, and analyze multiple clusters of firing power on our
side as well as on the enemy side to work out an effective battle plan. For a longer-term
strategic offensive, however, one may need to consider more remote roads, bridges, multiple
air/sea fighting units, transportation, and longer-range communications, and the clustering
features could be rather different. Therefore, a user-guided clustering mechanism can be
effective, since it takes users' instructions and hints as guidance in cluster analysis and works
out the most relevant clustering features based on information and physical network properties
as well as user preference. We propose to systematically develop this method by creating new
methodologies that integrate user-guided clustering (Yin07), rank-based clustering of
heterogeneous information networks (Sun09a; Sun09b), and spatial clustering methods.

2. Online clustering of dynamic and evolving cyber-physical networks: Beyond the clustering
of static cyber-physical networks, our research will proceed to clustering dynamic cyber-
physical networks, that is, cyber-physical networks that change with time. Again taking
a battlefield scenario as an example, both the physical and the informational components of a
cyber-physical network may change over time: a sensor could die, a tank could be disabled,
an enemy commander could be killed, a bridge could be destroyed, or even the weather could
change. The question then becomes how to refine the clusters, or redo the clustering, in such a
volatile and dynamic environment. This poses great challenges for the development of
effective and efficient methods for clustering dynamic and evolving cyber-physical
networks. Moreover, this subtask is naturally linked with communication networks and social
networks, since the change of physical and informational components could be the result of


situation changes in communication and social networks. Therefore, our research in this
direction will involve close collaboration with two other centers, CNARC and SCNARC. In our
own study, we propose to investigate methods for incrementally updating clustering results
obtained from a relatively stable cyber-physical network, as well as stream-based dynamic
clustering of cyber-physical networks (a minimal sketch of such incremental clustering follows
this list). Moreover, we plan to share data and possibly social network analysis methods and
results (user proximity) with SmallBlue (SCNARC). This work has potential extensions to
intelligence analysis, where information is typically collected from a multitude of different
sources. In addition, not all of these sources are always available, so understanding the
relationships and redundancies present may be even more crucial to accurate mining: it will lead
to an understanding of which data sources one should seek to obtain better and necessary
inferences and to decrease the uncertainty in the dynamics of cyber-physical networks. Thus this
is an exciting subtask that links communication networks, physical networks, information
networks, and social networks, and potentially leads to a reduction of uncertainty and an
improvement in the quality and trustability of the overall cyber-physical networks.
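
As noted in subtask 2 above, the following is a minimal sketch of incremental, leader-style clustering of cyber-physical network nodes as position updates stream in. It is illustrative only, not the proposed method: the mixed spatial/link distance, the thresholds, the link set, and the update stream are invented for the example.

    # Illustrative sketch: incrementally (re)assign nodes to clusters as updates
    # arrive; distance mixes spatial proximity with a discount for nodes directly
    # linked to a cluster member in the communication network.
    import math

    RADIUS = 5.0          # join an existing cluster if the mixed distance is below this
    LINK_DISCOUNT = 2.0   # directly linked nodes are treated as "closer"

    clusters = []         # each cluster: {"leader_pos": (x, y), "members": set()}
    links = {("tank1", "squad_a"), ("uav1", "squad_b")}   # current communication links

    def mixed_distance(pos, node, cluster):
        d = math.dist(pos, cluster["leader_pos"])
        if any((node, m) in links or (m, node) in links for m in cluster["members"]):
            d -= LINK_DISCOUNT                     # linked to a member: pull closer
        return d

    def update(node, pos):
        # Re-assign the node to the nearest cluster, or start a new one.
        for c in clusters:
            c["members"].discard(node)             # drop any stale membership first
        best = min(clusters, key=lambda c: mixed_distance(pos, node, c), default=None)
        if best is None or mixed_distance(pos, node, best) > RADIUS:
            best = {"leader_pos": pos, "members": set()}
            clusters.append(best)
        best["members"].add(node)

    # Stream of (node, position) updates, e.g. as sensors report movement.
    for node, pos in [("squad_a", (0, 0)), ("tank1", (3, 1)), ("squad_b", (20, 20)),
                      ("uav1", (18, 22)), ("tank1", (19, 21))]:
        update(node, pos)

    for c in clusters:
        print(c["leader_pos"], sorted(c["members"]))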

Validation Approach
We will use battlefield-scenario data that simulate military applications in our study. In
the meantime, the other datasets outlined in Task I3.1, as well as the MoveBank datasets (at
movebank.org), which contain spatiotemporal information about object movements, will also be
used for testing in this study.

Summary of Military Relevance


This task is critical and closely linked to military applications, since most mission-critical tasks
involve spatial and temporal dimensions and demand clear and intelligent real-time knowledge
about dynamic cyber-physical networks.

Products

1. A set of methods for effective and efficient mining (i.e., clustering) of cyber-physical
networks, and
2. A report and a set of research papers to be published in international conferences and
journals.

7.3.6 Task I3.3 Text and Unstructured Data Mining for Information Network Analysis (D. Roth, UIUC (INARC); J. Han, UIUC (INARC); H. Ji, CUNY (INARC); X. Yan, UCSB (INARC); M. Faloutsos, UCR (IRC); B. Szymanski, RPI (SCNARC))

Task Overview
This task investigates the integration of text mining and information network
analysis to enhance knowledge discovery in text-rich information networks. Most



military applications need to handle text data, including documents, e-mails,
telecommunication messages, and conversations. We assume that an information network
contains nodes and links, and that some nodes and links may carry valuable
text or unstructured data. In this context, text analysis will play an important role in the
analysis of text-intensive information networks. In the first year we will study topic modeling
and topic cubes in multi-dimensional information network analysis, dynamic language
modeling, and text mining in information network analysis.

Task Motivation
Information networks often contain abundant text and unstructured data, in the form
of electronic documents, reports, e-mails, blogs, conversations, news, web pages,
and other narratives. We envision that such text-based datasets will also play an essential
role in military applications. Besides integrating traditional text information retrieval
methods with information network analysis, it is essential to study multidimensional
text analysis in information networks.

Key research question


"How can we develop effective mechanisms for mining knowledge from text and unstructured
data in information networks in noisy and volatile environments?"

Initial hypothesis
We hypothesize that the proposed methods outlined below, which integrate text mining
with information network analysis, will be effective for knowledge discovery in
information networks containing valuable text data.

Prior work
There have been extensive studies on information retrieval with text data and on text mining
(Chakrabarti03; Manning08). However, the multidimensional analysis of text data through the
construction of text cubes and topic cubes has only recently been proposed in our studies
(Lin08; Zhang09). There have been even fewer studies on the integration of information
network analysis with text data. Recently, our group proposed a topic modeling
approach that integrates information network analysis with text data analysis
(Sun09c).

Proposed approaches
Three research subtasks are proposed as follows.

1. Topic modeling and topic cubes in multi-dimensional information network
analysis: Because text data related to a time-critical application are usually sparse and high
dimensional, it is essential to represent text data in a compact and efficient way. Statistical
topic models have been successfully applied to multiple text mining tasks to discover a
number of topics from text. In topic models, documents are represented by
multinomial distributions over topics instead of keyword vectors. We will employ topic
models to discover the major topics in the text as well as to reduce the dimensionality of
the text representation (an illustrative sketch appears after this list). Moreover, we have
recently developed a method for constructing topic



cubes (Zhang09) to facilitate effective analysis of multidimensional text databases. We
will extend this model to multidimensional analysis of information networks containing
text data. We will also study applications of the text cube to the analysis of
information networks in contexts similar to military applications. Currently, the
candidate datasets are Google News and some news-related blogs.

2. Dynamic language modeling: For queries and mining in natural language speech or
text, we plan to analyze, parse, and expand queries in order to retrieve more
accurate and reliable information. We will consider query-time dynamic language model
adaptation for speech recognition and segmentation, and we will leverage the information in
user queries for topic analysis and language model (LM) adaptation. We intend to
investigate extensively the underlying characteristics of different kinds of topic
modeling approaches, including conventional MAP-based adaptation, latent Dirichlet
allocation (LDA), and clustering. Their performance will be analyzed and verified in the
information network setting. In this way we will achieve the fusion of global topical
information, local contextual information, and feedback from natural language queries. In the
first year, we will implement a dynamic query expansion method based on entity and event
co-reference and finish experiments on using the query expansion methods in template-based
data fusion.

3. Text mining in text-rich information network analysis: We will continue to
provide annotations of corpora, including named entities, co-reference, time expressions
(with date normalization), predicate-argument representations, relations, and events, to the
members of both the information network and the social/cognitive network projects. We will
extract and align information and social networks from comparable corpora for subsequent
data/text mining. This study will focus on experimenting with bootstrapping methods.
Since different data sources that provide direct or indirect associations between entities
(e.g., users, skill sets, bookmarks, and documents) have significant redundancy, they can
be used to improve the robustness both of estimating low-level proximity/similarity
metrics and of higher-level recommendations. Thus the use of linked data from
heterogeneous information networks will be able to boost the quality of information.
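
As referenced in subtask 1 above, the following is a minimal sketch of fitting a topic model over text attached to network nodes and then aggregating the resulting topic mixtures along one dimension, as a rudimentary topic-cube cell. It is illustrative only: the corpus, the "region" dimension, the parameters, and the use of scikit-learn are assumptions made for the example, not the project's actual tooling.

    # Illustrative sketch: LDA topic mixtures per document, then averaged per
    # "region" (one dimension of a would-be topic cube). Data and settings invented.
    import numpy as np
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    docs = [
        {"text": "convoy delayed at the bridge checkpoint", "region": "north"},
        {"text": "bridge repair crew requests escort", "region": "north"},
        {"text": "market crowd gathering near the mosque", "region": "south"},
        {"text": "crowd dispersed after the market closed", "region": "south"},
    ]

    vectorizer = CountVectorizer(stop_words="english")
    X = vectorizer.fit_transform([d["text"] for d in docs])

    lda = LatentDirichletAllocation(n_components=2, random_state=0)
    theta = lda.fit_transform(X)                  # per-document topic mixtures

    # One "topic cube" cell per region: the average topic mixture of its documents.
    for region in sorted({d["region"] for d in docs}):
        rows = [i for i, d in enumerate(docs) if d["region"] == region]
        print(region, np.round(theta[rows].mean(axis=0), 2))

    # Top words per topic, for inspection.
    terms = vectorizer.get_feature_names_out()
    for k, comp in enumerate(lda.components_):
        print("topic", k, [terms[i] for i in comp.argsort()[-3:][::-1]])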

Validation Approach
The data sets used in Task I3.1 will also be used in this task. News datasets in particular, such as
Google News, could be the most useful example databases for studying topic modeling, topic
cubes, and text mining. Moreover, we plan to use the following additional datasets:
NIST Automatic Content Extraction Program's 2002-2009 training corpora: These include
thousands of documents with annotated facts (entities, relations, and events) in different
genres (broadcast conversation, broadcast news, newswire, news groups, and weblogs).
DARPA Global Autonomous Language Exploitation Program's question answering Year 1
and Year 2 training corpora: These corpora include a set of 17 different question templates
and their gold-standard answer snippets.

We plan to use these data sets to construct multi-dimensional text cubes and develop information
network analysis and text mining methods for such datasets.



Summary of Military Relevance
Most military applications involve text data, especially military-related news, newspapers,
documents, text messages, conversations, and blogs. Multi-dimensional text analysis and text
mining will be a critical component in information network analysis.

Products

1. A set of methods generated in this work, together with reports on the effectiveness and
efficiency of mining text data in information networks, and
2. A set of research papers to be published in international conferences and journals.

References
C. C. Aggarwal, J. Han, J. Wang and P.S. Yu (2003), "A framework for clustering evolving data
streams", in Proc. 2003 Int. Conf. Very Large Data Bases (VLDB'03).
C. C. Aggarwal, J. Han, J. Wang and P.S. Yu (2004), "On demand classification of data
streams", in Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases
(KDD'04).
B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. P. Sheth, I. B. Arpinar,
A. Joshi, and T. Finin (2006), "Semantic analytics on social networks: experiences in addressing
the problem of conflict of interest detection", in Proc. 15th Int. Conf. World Wide Web
(WWW'06).
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan (2006), "Group formation in large
social networks: Membership, growth, and evolution", in Proc. 2006 ACM SIGKDD Int. Conf.
Knowledge Discovery in Databases (KDD'06).
D. Bortner and J. Han (2010), "Progressive Clustering of Networks Using Structure-Connected
Order of Traversal", in Proc. 2010 Int. Conf. on Data Engineering (ICDE'10).
S. Chakrabarti (2003), Mining the Web: Discovering Knowledge from Hypertext Data. Morgan
Kaufmann, 2003.
H. Cheng, X. Yan, J. Han, and P. S. Yu (2008), "Direct discriminative pattern mining for
effective classification", in Proc. 2008 Int. Conf. Data Engineering (ICDE'08).
M. Faloutsos, P. Faloutsos, and C. Faloutsos (1999), "On power-law relationships of the internet
topology", in Proc. ACM SIGCOMM'99 Conf. Applications, Technologies, Architectures, and
Protocols for Computer Communication, pp. 251-262.
G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee (2002), "Self-organization and identification
of web communities", IEEE Computer, 35:66-71, 2002.
M. Girvan and M. E. J. Newman (2002), "Community structure in social and biological
networks", in Proc. Nat. Acad. Sci. USA, pp. 7821-7826, 2002.
J. Han, Y. Chen, G. Dong, J. Pei, B. W. Wah, J. Wang, and Y. D. Cai (2005), "Stream Cube: An
architecture for multi-dimensional analysis of data streams", Distributed and Parallel Databases,
18:173-197, 2005.
M.-S. Kim and J. Han (2009a), "A Particle-and-Density Based Evolutionary Clustering Method
for Dynamic Networks", in Proc. 2009 Int. Conf. on Very Large Data Bases (VLDB'09).
M.-S. Kim and J. Han (2009b), "CHRONICLE: A Two-Stage Density-based Clustering Algorithm
for Dynamic Networks", in Proc. 12th Int. Conf. on Discovery Science (DS'09).
J. Leskovec, J. Kleinberg, and C. Faloutsos (2005), "Graphs over time: Densification laws,
shrinking diameters and possible explanations", in Proc. 2005 ACM SIGKDD Int. Conf. on
Knowledge Discovery and Data Mining (KDD'05), pp. 177-187.
C. X. Lin, B. Ding, J. Han, F. Zhu, and B. Zhao (2008), "Text Cube: Computing IR measures for
multidimensional text database analysis", in Proc. 2008 Int. Conf. on Data Mining (ICDM'08).
C. D. Manning, P. Raghavan, and H. Schutze (2008), Introduction to Information Retrieval.
Cambridge University Press, 2008.
M. Masud, J. Gao, L. Khan, J. Han and B. Thuraisingham (2008), "A practical approach to
classify evolving data streams: Training with limited amount of labeled data", in Proc. 2008 Int.
Conf. on Data Mining (ICDM'08), Pisa, Italy, Dec. 2008.
L. Mendes, B. Ding, and J. Han (2008), "Stream sequential pattern mining with precise error
bounds", in Proc. 2008 Int. Conf. on Data Mining (ICDM'08), Pisa, Italy, Dec. 2008.
M. E. J. Newman (2003), "The structure and function of complex networks", SIAM Review,
45:167-256, 2003.
Y. Sun, J. Han, P. Zhao, Z. Yin, H. Cheng, and T. Wu (2009a), "RankClus: Integrating Clustering
with Ranking for Heterogeneous Information Network Analysis", in Proc. 2009 Int. Conf. on
Extending Data Base Technology (EDBT'09).
Y. Sun, Y. Yu and J. Han (2009b), "Ranking-Based Clustering of Heterogeneous Information
Networks with Star Network Schema", in Proc. 2009 ACM SIGKDD Int. Conf. on Knowledge
Discovery and Data Mining (KDD'09).
Y. Sun, J. Han, J. Gao, and Y. Yu (2009c), "iTopicModel: Information network-integrated topic
modeling", in Proc. 2009 Int. Conf. on Data Mining (ICDM'09).
H. Wang, W. Fan, P.S. Yu, and J. Han (2003), "Mining concept-drifting data streams using
ensemble classifiers", in Proc. 2003 ACM SIGKDD Int. Conf. Knowledge Discovery and Data
Mining (KDD'03).
D. J. Watts (2003), Small Worlds: The Dynamics of Networks between Order and
Randomness. Princeton University Press, 2003.
X. Yin, J. Han, and P. S. Yu (2007), "CrossClus: User-Guided Multi-Relational Clustering", Data
Mining and Knowledge Discovery, 16(1), 2007.
D. Zhang, C. Zhai and J. Han (2009), "Topic cube: Topic modeling for OLAP on
multidimensional text databases", in Proc. 2009 SIAM Int. Conf. on Data Mining (SDM'09).

7.3.7 Linkages to Other Projects/Centers

Knowledge discovery in information networks has close ties to many projects in INARC and in the
three other centers, especially SCNARC, EDIN, and the Trust-CCRI. The mining work will need
guidance and receive requests from other projects, and in turn will provide patterns, hierarchies, and
new knowledge to other projects and centers. Therefore, we provide the following table and
propose to work closely with other projects and centers.

IPP Tasks              Linkages
I3.1 ↔ E3.3            E3.3 will guide I3.1 on mining information networks, and the methods developed in I3.1 will be used in studying clustering and evolution of general networks in E3.3.
I3.1 → I2.3, R2, R3    Network mining will be used in I2.3 (hierarchy construction in information networks) and in R3 (knowledge extraction and characterization of social/cognitive and information networks) at IRC (J. Hendler & M. Faloutsos).
I3.1 ↔ S2.2            Mining of information networks developed under I3.1 will be used in the analysis of information flows in trusted communities in S2.2. S2.2 will provide concrete examples and tasks for I3.1. (w. B. Szymanski, RPI)
I3.2 ↔ C2, I1, I2      Network updating ties in with sensor and communication network input from C2 (Tom La Porta, PSU/CNARC) and with information network data fusion and construction (I1, I2), and also with military applications.
I3.2 ↔ T3              Network updating has many semantic issues, especially network impact on trust; ties in with T3 (Sibel Adali and Lada Adamic).
I3.3 ↔ E3.3            E3.3 will guide I3.3 on text mining of information networks, and the methods developed in I3.3 will be used for text mining in networks in E3.3. (w. J. Hendler, IRC)
I3.3 ↔ I2, R3, S3      The multidimensional text analysis of I3.3 will be used for trustable network construction and OLAP in I2 and in the Trust-CCRI. These tasks will guide text mining in integrated networks (w. J. Hendler and N. Contractor at IRC, and B. Szymanski and W. Gray at SCNARC).

7.3.8 Relevance to US Military Visions/Impact on Network Science


To ensure that soldiers and commanders can master and analyze complex, dynamic, and interconnected
information on time, it is essential to turn data into information and high-level knowledge, to
incrementally build up and dynamically update information networks online, and to dynamically
discover knowledge in such networks despite noise and uncertainty on the battlefield or in other
mission-critical environments, so that people can understand the current situation and predict
trends and outcomes. Such analysis will help military users make timely and intelligent
decisions. Therefore, this project is critically relevant to many military missions. Moreover, the
systematic development of effective knowledge discovery methods for information networks will
also have a strong impact on knowledge discovery methods for other networks, and therefore on the
other network science projects.

7.3.9 Relation to DoD and Industry Research


Knowledge discovery in information networks is highly relevant to many DoD tasks and to
industry research. We will design and develop methods with extensible applications in mind
and will collaborate with other DoD projects and industry research labs on related efforts.

Project Research Milestones

Due  Task    Description
Q2   Task 1  Investigate methods for classification of heterogeneous information networks (UIUC, IBM, UCSB, CUNY)
             Formulate problem and investigate possible solutions of evolutionary pattern discovery in dynamic information networks (UIUC, UCSB)
             Formulate problem and investigate possible solutions of determining outliers in an evolving stream environment in information networks (IBM, UIUC)
Q2   Task 2  Formulate problem and investigate possible solutions of clustering in cyber-physical networks (UIUC, IBM, UCSB)
Q2   Task 3  Formulate problem and investigate possible solutions of text mining for information network analysis (UIUC, CUNY, UCSB)
Q3   Task 1  Design techniques for determining outliers in an evolving stream environment in information networks (IBM, UIUC)
             Design techniques for pattern discovery in information networks (UIUC, UCSB)
Q3   Task 2  Design techniques for real-time methods for cyber-physical network clustering (UIUC, IBM, UCSB)
Q3   Task 3  Design techniques for text mining for information network analysis (UIUC, CUNY, UCSB)
Q4   Task 1  Develop and test techniques for determining outliers in an evolving stream environment for information networks (IBM, UIUC)
             Develop and test techniques for pattern discovery in information networks (UIUC, UCSB)
             Write research reports and submit research papers based on the work done (UIUC, IBM, UCSB)
Q4   Task 2  Develop and test techniques for cyber-physical network clustering, as specified in the two subtasks of Task I3.2 (UIUC, IBM, UCSB, CUNY)
             Write research reports and submit research papers based on the work done (UIUC, IBM, UCSB)
Q4   Task 3  Develop and test techniques for text mining for information network analysis, including the three subtasks specified in IPP Task I3.3 (UIUC, CUNY, UCSB)
             Write research reports and submit research papers based on the work done (UIUC, IBM, UCSB)

Budget By Organization

Organization Government Funding ($) Cost Share ($)


CMU (INARC) 32,590
CUNY (INARC) 36,664

IBM (INARC) 110,227
UCSB (INARC) 89,624
UIUC (INARC) 269,912
TOTAL 539,017

8 Non-CCRI Research: Social/Cognitive Networks
Academic Research Center (SCNARC)

Director: Boleslaw Szymanski, RPI


Email: szymansk@cs.rpi.edu, Phone: 518-276-2714
Government Lead: Jeffrey Hansberger
Email: jeff.hansberger@us.army.mil, Phone: 757-203-3431

Project Leads Lead Collaborators


Project S1: C. Lin, IBM K. Carley, CMU (IRC)
Project S2: M. Magdon-Ismail, RPI D. Cassenti, ARL
Project S3: W. Gray, RPI J. Hansberger, ARL
P. Mohapatara, UC Davis (CNARC)
P. Pirolli, PARC (INARC)

Table of Contents
8 Non-CCRI Research: Social/Cognitive Networks Academic Research Center (SCNARC) . 8-1
8.1 Overview ......................................................................................................................... 8-3
8.2 Motivation ....................................................................................................................... 8-3
8.2.1 Challenges of Network-Centric Operations ............................................................. 8-4
8.2.2 Example Military Scenarios ..................................................................................... 8-4
8.2.3 Impact on Network Science ..................................................................................... 8-5
8.3 Key Research Questions ................................................................................................. 8-5
8.4 Technical Approach ........................................................................................................ 8-6
8.5 Project S1: Networks in Organization ........................................................................... 8-9
8.5.1 Project Overview ..................................................................................................... 8-9
8.5.2 Project Motivation ................................................................................................. 8-10
8.5.3 Key Project Research Questions ............................................................................ 8-13
8.5.4 Initial Hypotheses .................................................................................................. 8-14
8.5.5 Technical Approach ............................................................................................... 8-15
8.5.6 Task S1.1: Infrastructure Challenges of Large-Scale Network Discovery and
Processing (R. Konuru, IBM (SCNARC); S. Papadimitriou, IBM (SCNARC); Z. Wen, IBM
(SCNARC); C.-Y. Lin, IBM, (SCNARC); T. Brown, CUNY (SCNARC); A. Pentland, MIT
(SCNARC); A.-L. Barabasi, NEU (SCNARC); D. Lazer, NEU, (SCNARC); J. Hendler,
RPI (SCNARC); W. Wallace, RPI (SCNARC); N. Chawla, ND (SCNARC); A. Vespignani,
Indiana (SCNARC))........................................................................................................... 8-16
8.5.7 Task S1.2: Impact Analysis of Informal Organizational Networks (C.-Y. Lin, NYU
(SCNARC); S. Aral, NYU (SCNARC); E. Brynjolfsson, MIT (SCNARC)).................... 8-19

8.5.8 Task S1.3: Multi-Channel Networks of People (Z. Wen, IBM (SCNARC); C.-Y.
Lin, IBM (SCNARC); Brian Uzzi, Northwestern (SCNARC)) ........................................ 8-23
8.5.9 Linkages with Other Projects ................................................................................. 8-26
8.5.10 Collaborations and Staff Rotations ...................................................................... 8-27
8.5.11 Relation to DoD and Industry Research .............................................................. 8-28
8.5.12 Project Research Milestones ................................................................................ 8-29
8.6 Project S2: Adversary Social Networks: Detection, Evolution, and Dissolution ......... 8-32
8.6.1 Project Overview ................................................................................................... 8-32
8.6.2 Project Motivation ................................................................................................. 8-33
8.6.3 Key Project Research Questions ............................................................................ 8-35
8.6.4 Initial Hypotheses .................................................................................................. 8-35
8.6.5 Technical Approach ............................................................................................... 8-36
8.6.6 Task S2.1: Detection of Hidden Communities and their Structures (M. Magdon-
Ismail, RPI (SCNARC); M. Goldberg, RPI (SCNARC); B. Szymanski, RPI (SCNARC); W.
Wallace, RPI (SCNARC); D. Lazer, NEU (SCNARC); Z. Wen, IBM (SCNARC)) ........ 8-37
8.6.7 Task S2.2: Information Flow via Trusted Links and Communities (M. Magdon-
Ismail, RPI (SCNARC); S. Adali, RPI (SCNARC); M. Goldberg, RPI (SCNARC); W.
Wallace, RPI (SCNARC)) ................................................................................................. 8-41
8.6.8 Task S2.3: Community Formation and Dissolution in Social Networks (G. Korniss,
RPI (SCNARC); B. Szymanski, RPI (SCNARC); C. Lim, RPI (SCNARC); M. Magdon-
Ismail, RPI (SCNARC); A.-L. Barabasi, NEU (SCNARC); T. Brown, CUNY (SCNARC);
Z. Toroczkai, ND (SCNARC)) .......................................................................................... 8-44
8.6.9 Linkages with Other Projects ................................................................................. 8-47
8.6.10 Collaborations and Staff Rotations ...................................................................... 8-49
8.6.11 Relation to DoD and Industry Research .............................................................. 8-49
8.6.12 Project Research Milestones ................................................................................ 8-49
8.7 Project S3: The Cognitive Social Science of Net-Centric Interactions ....................... 8-54
8.7.1 Project Overview ................................................................................................... 8-54
8.7.2 Project Motivation ................................................................................................. 8-54
8.7.3 Key Project Research Questions and Hypotheses ................................................. 8-55
8.7.4 Summary ................................................................................................................ 8-57
8.7.5 Technical Approach ............................................................................................... 8-57
8.7.6 Task S3.1: The Cognitive Social Science of Human-Human Interactions (W. Gray,
RPI (SCNARC); M. Schoelles, RPI (SCNARC); J. Mangels, CUNY (SCNARC)) ......... 8-58
8.7.7 Task S3.2: Human-Technology and Human-Information Interactions: Develop
Argus-Army, a Net-Centric Simulated Task Environment to collect data on Cognitive
Social Science constructs of interest (W. Gray, RPI (SCNARC); M. Schoelles, RPI
(SCNARC))........................................................................................................................ 8-61
8.7.8 Linkages with Other Projects ................................................................................. 8-63
8.7.9 Relevance to U.S. Military Visions/Impact on Network Science .......................... 8-63
8.7.10 Collaborations and Staff Rotations ...................................................................... 8-64
8.7.11 Relation to DoD and Industry Research .............................................................. 8-64
8.7.12 Project Research Milestones ................................................................................ 8-64

8.1 Overview

The long-term objective of the Center is to advance the scientific understanding of how social networks form, operate, and evolve, and how they affect the functioning of large, complex organizations such as the Army. We will also address the issue of adversary networks hidden in large social networks. For successful operations within a foreign society, such adversary networks must be detected, monitored, and, when necessary, dissolved. Finally, it is clear that human cognition both directs and is impacted by network-centric interactions. Furthering our understanding of this complex and important feedback loop is of primary importance for social network science. The Center will undertake research to gain a fundamental understanding of the underlying theory; to create scientific foundations for modeling, simulation, measurement, analysis, prediction, and control of social/cognitive networks and of humans interacting with such networks; and to understand, model, and engineer the impact that these networks have on the U.S. Army.

8.2 Motivation

In large organizations such as the U.S. Army, social networks involve interplay between a formal
network imposed by the hierarchy of the organization and informal ones based on operational
requirements, historical interactions, and personal ties of the personnel. Traditionally it has been
difficult to extract data that can document the interplay between these informal and formal
networks. We address this challenge by planning to utilize a data collection developed by the IBM team of the Center. The data cover employee interactions, communications, activity, and performance across IBM Corp. These data will enable us to conduct research to further our
understanding of the fundamental aspects of human communication within an organization and
the impact of social and cognitive networks on issues ranging from team performance to the
emergence of groups and communities.

Currently, and for the foreseeable future, the U.S. Army is or will likely be operating in coalition with allied armies and deeply entangled within the foreign societies in which its missions are conducted. The social networks of the allies and of the involved societies invariably include groups that are hostile or directly opposed to U.S. Army goals and missions. Such groups are embedded in larger social networks, often attempting to remain hidden in order to conduct covert, subversive operations. The challenging research issues in studying such adversary social networks include their discovery; the construction of tools for monitoring their activities, composition, and hierarchy; and the understanding of their dynamics and evolution. We will address these challenges using statistical methods for analyzing large social networks.

Ultimately, the benefits and impact of networks are limited by the human capability to understand and act upon the information that networks can provide. Hence, human cognition is an important component of understanding how networks impact people. Important challenges in studying human cognition in relation to network-centric interactions include determining how limits on cognitive resources or cognitive processing influence social (or asocial) behavior, and what demands social behaviors place on cognitive resources and processing that may limit basic information-processing mechanisms such as encoding, memory, and attention. We would also like to understand which social behaviors (e.g., trust) are most important to, or most influenced by, Net-Centric mediated human communications. In terms of performance, the central challenge is to understand how the design of Net-Centric technologies, or the form and format of Net-Centric information, interacts with the human cognitive processes responsible for heuristics and biases in human performance. Finally, there is the challenge of creating predictive computational cognitive models of individuals interacting via Net-Centric technologies (i.e., interactive cognitive agents).

Finally, social networks are an important conduit for, and reservoir of, ideologies and opinions. Hence, understanding the dynamics of community creation, existence, and dissolution around ideas or opinions is an important aspect of social network science. We plan to investigate different models of representing idea and opinion diversity and its impact on community formation and disintegration. Engineering such processes on real social networks may be important for U.S. Army missions in foreign societies.

These are the issues that we will address in our research on the cognitive aspects of social networks. The fundamental research challenges of dynamic and evolving integrated networks are the subject of the CCRI EDIN, to which SCNARC researchers bring an important social, cognitive, and network science perspective. Since these challenges and the associated research are described elsewhere, no further discussion of these issues is given here.

Central to the operation and efficiency of social networks is the level of trust among interacting agents. The challenges of studying trust in networked systems are the focus of the CCRI research in this area, to which SCNARC researchers will fundamentally contribute, as ultimately trust is a social issue with technology and information playing an important but supportive role. The corresponding challenges are discussed in a separate Trust section of the IPP, so they are not discussed here.

8.2.1 Challenges of Network-Centric Operations


Social, information, and communication networks are today ubiquitous in everyday life. They are a tool for accomplishing desired goals at the level of an individual, an informal group, an organization, or an entire society. Hence, they affect both how large organizations, such as the U.S. Army, operate internally and how they interact with the societies within which these organizations operate. Our understanding of how such basic processes as human cognition, social interaction, and societal attitudes and ideologies are impacted by being nested in a multitude of networks, and in the communities enabled by such networks, is still very crude. The fundamental challenges for network science and its military and civilian applications are therefore to further develop social theory, to deepen our understanding of how cognition impacts and is impacted by net-centric environments, to improve our ability to process collected network data, and to build predictive models of network interactions and dynamics.

8.2.2 Example Military Scenarios


A network-centric environment creates the potential for interacting with a large number of other agents (including humans) and with data sources of not always clear provenance, and it creates the potential for distortion of the received information, not only by the sources but also by the links and nodes through which those sources convey the information. Thus, network-centric environments create challenges of building trust in social communities, both formal and informal, of separating valuable information from noise or even deliberately false information, and of making decisions based on such information, decisions which in military situations affect the lives of soldiers, allies, neutrals, and enemies.

One frequently arising scenario is that of a commander taking over the management of a town in a country with an insurgency, where societal attitudes toward the U.S. Army range from collaboration, to neutrality, to antipathy, and finally to full and active opposition. The society within which the commander operates therefore contains a full spectrum of networks: formal and informal, open and hidden, overlapping and influencing each other. Understanding how the dynamics of such network interactions influences the information the commander receives, and how such networks can be influenced, either suppressed or encouraged to grow, is of fundamental importance for the success of the commander's mission.

Increasingly, missions of the U.S. Army involve allies with varied levels of training and trust. Understanding not only formal, by-the-book interactions with allies, but also how to create informal networks among commanders of allied forces and build trust among them, is crucial for the success of allied missions.

The social network theories and predictive models that we plan to research, and the algorithms for processing relevant social data that we plan to develop, will create a solid foundation for building tools that would be helpful in the described scenarios.

8.2.3 Impact on Network Science


The proposed research addresses a central current challenge of network science: scaling our ability to analyze and predict the behavior of dynamical processes on large networks. The research also benefits from the involvement of international leaders of network science. Successful modeling of large networks, both analytically and in silico, taking advantage of the large data sets that have already been collected and are being collected by the participating research groups, will advance network science. It will also enable us to verify various existing theories of the social interactions that drive social network dynamics, currently a grand challenge of social network science.

8.3 Key Research Questions

The modern military increasingly needs to rely on bottom-up network processes, as compared to top-down hierarchical processes. This paradigm shift creates unique challenges and raises several important questions. How does the pattern of interactions within a military unit affect the performance of tasks? How do formal and informal social networks interact with each other? How does technology (communication networks, information networks) foster or weaken connections between people? What kinds of ties external to the Army are necessary for success? How can we use massive streams of data to detect adversarial networks? How can a social and cognitive network quickly extract the most meaningful information for the soldier and decision maker, information useful in all aspects of their operations, from supporting humanitarian operations to force protection and full combat operations? How can informal social networks be influenced, engineered, or even dissolved? How do human cognitive processes impact the performance of the soldier and, more generally, of a human in a network-centric environment? The research that attempts to answer these questions, and to address the fundamental network science challenges associated with the answers, is the subject of this chapter.

8.4 Technical Approach

The research within the Social/Cognitive Networks Academic Research Center is organized into the following three projects.

Project S1 – Networks in Organization

Project Lead: Ching-Yung Lin, IBM


Email: chingyung@us.ibm.com, Phone: 914-784-7822

This project focuses on analyzing and understanding the fundamental principles guiding the formation and evolution of large organizational networks and their impact on the performance of the teams embedded in them. This project consists of three tasks, each targeting a major aspect of social network research: the capture of networks, the impact of networks, and the understanding of networks.

Task S1.1, Infrastructure Challenges of Large-Scale Network Discovery and Processing, will study the infrastructure challenge of gathering and handling large-scale heterogeneous streams for social network research, in the context of information and communication networks. We will conduct system-level research to consider how to incorporate real-time network processing requirements into the existing SmallBlue socio-info network infrastructure. Given that informal social network data reside intrinsically in different data sources, informal networks can usually only be inferred by sampling larger networks. Since partially observed data is the norm, we will derive mathematical theories to investigate the robustness of graph sampling and its implications under various conditions. We will investigate what types of sampling strategies are required to obtain a good estimate of the entire network. We will also investigate analytic methods to conduct network analysis on only partially observed data.

Task S1.2, Impact Analysis of Informal Organizational Networks, will analyze the impacts of informal social networks in an organization. We want to learn which informal networks affect people's performance and how. We shall model, measure, and quantify the impact of dynamic informal organizational networks on the productivity and performance of individuals and teams; develop and apply methods to identify causal relationships between dynamic networks and productivity, performance, and value; model and measure peer influence in social networks by statistically distinguishing influence effects from homophily and confounding factors; and examine how individuals change their use of social networks under duress.

Task S1.3, Multi-Channel Networks of People, will investigate the multi-channel networks between people. With IBM's unique large-scale info-social network system, SmallBlue, we are able to capture multiple facets of people's relationships via such means as email, instant messaging, and teleconferencing. Based on this data, we will explore whether channel capacity and coding theories from communications and information theory can be extended to the human domain to model relationship variation and distribution across different channels.

Project S2 – Adversary Social Networks: Detection, Evolution, and Dissolution

Project Lead: Malik Magdon-Ismail, RPI


Email: magdon@cs.rpi.edu, Phone: 518-276-4857

The overall goal of this project is to study adversary networks through the communication and information signatures that such networks create during internal interactions. The broad research questions we address in this project include (i) identification of communities in a dynamic social network, especially hidden and informal groups, based on measurable interactions between the members but without looking at the details of the interactions; (ii) uncovering relations between communities in terms of membership, trust, and opposition or support; (iii) observing the evolution and stable cores of communities, especially anomalous and adversary communities or groups, and their relationships; (iv) understanding how information flows within and between such communities; (v) identifying communities in social networks that manifestly emerge as a result of communication and information flowing across the links; and (vi) developing efficient strategies to dissolve communities in social networks corresponding to adversarial communities with hostile, extremist, and/or militant ideologies. To address the key research questions of this project, we have defined three tasks.

In the first task of this project, S2.1 Detection of Hidden Communities and their Structures, we
use interaction data over time to build a picture of community structure and community
evolution, including information pathways and inter-community relationships. This is an
appropriate first step in understanding the core of social networks.

In the second task, S2.2 Information Flow via Trusted Links and Communities, we build agent-based models to study how information pathways are affected by varying degrees of trust between individuals and communities in heterogeneous networks that contain adversary (non-trusted) as well as non-adversary (trusted) networks.
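
To make this concrete, the sketch below is a minimal, purely illustrative agent-based diffusion model under our own assumptions (the "trust" edge attribute, its values, and the graph generator are ours, not the task's design): a message spreads along a link with probability equal to that link's trust weight.

```python
# Minimal illustrative agent-based diffusion: a message spreads along an edge
# with probability equal to that edge's "trust" attribute (our assumption).
import random
import networkx as nx

def simulate_diffusion(G, seeds, steps=10, rng=None):
    """Return the set of informed nodes after up to `steps` rounds of spreading."""
    rng = rng or random.Random(0)
    informed = set(seeds)
    for _ in range(steps):
        newly = set()
        for u in informed:
            for v in G.neighbors(u):
                if v not in informed and rng.random() < G[u][v].get("trust", 0.5):
                    newly.add(v)
        if not newly:
            break
        informed |= newly
    return informed

# Toy heterogeneous network mixing low-trust (adversary) and high-trust ties.
G = nx.erdos_renyi_graph(50, 0.1, seed=1)
rng = random.Random(1)
for u, v in G.edges():
    G[u][v]["trust"] = rng.choice([0.1, 0.9])
print(len(simulate_diffusion(G, seeds={0})), "of", G.number_of_nodes(), "nodes informed")
```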

In the third task, S2.3, Community Formation and Dissolution in Social Networks, we will
develop individual-based models for opinion formation in order to detect communities in social
networks. Further, we will develop efficient strategies, and analyze the associated trade-offs, for attacking and disintegrating adversarial communities with hostile, extremist, and/or militant ideologies. Our
long-term objective is to develop generically applicable frameworks and computational methods
for extracting individual- to community-level behavioral patterns from the underlying social
graphs.

Project S3 – The Cognitive Social Science of Net-Centric Interactions

Project Lead: Wayne Gray, RPI
Email: grayw@rpi.edu, Phone: 518-276-6576

This project will bring the computational modeling techniques of cognitive science together with
the tools and techniques of cognitive neuroscience to ask how the design of the technology
(human-technology interaction), the form and format of information (human-information
interaction), or features of communication (human-human interaction) shape the success of net-
centric interactions. It includes two tasks.

The first task, S3.1: The Cognitive Social Science of Human-Human Interactions, investigates cognitive mechanisms that influence human interactions. Our initial work will temporarily combine this effort with the CCRI on Trust to investigate how cognitive mechanisms are affected by trust and how human evaluations of trust influence our subsequent cognitive processing of information. A Year 1 focus of Project S3 will be to examine the effect of trust on cognitive
processing and variations in human trust over human-human versus human-agent interactions.
Specifically, we hypothesize that differences in trust are signaled by differences in cognitive
brain mechanisms and that these differences can be detected by event-related brain potential
(ERP) measures and related to established cognitive science constructs, which in turn can be
incorporated as improvements in the ACT-R cognitive architecture. A key element in the study
will be the analysis of cognitive brain data collected from humans as they receive information
from the interactive cognitive agent or other humans.

The second task, S3.2: Human-Technology and Human-Information Interactions, will in the first year develop a Net-Centric Simulated Task Environment to collect data across multiple locations on Cognitive Social Science constructs of interest. The focus will be on technology-mediated human-human interactions, with an emphasis on interaction design and on information form, format, and order. These data will be collected and analyzed across at least four sites: RPI, CUNY, ARL APG, and PARC.

Together, these three projects consider the whole spectrum of human aspects of network science, from cognition to social interactions, and the whole spectrum of social networks, from formal and open, to informal, to highly organized but hidden. Together, they promise to create a scientific foundation for our understanding of the dynamics of large-scale, diverse social and cognitive networks.

8.5 Project S1: Networks in Organization

Project Lead: Ching-Yung Lin, IBM


Email: chingyung@us.ibm.com, Phone: 914-784-7822

Primary Research Staff Collaborators


S. Aral, NYU/MIT (SCNARC) S. Adali, RPI (SCNARC)
A. Barabasi, NEU (SCNARC) L. Adamic, U Michigan (INARC)
T. Brown, CUNY (SCNARC) C. Aggarawal, IBM (INARC)
E. Brynjolfsson, MIT (SCNARC) D. Agrawal, IBM (INARC)
N. Chawla, ND (SCNARC) K. Carley, CMU (IRC)
J. Hendler, RPI (SCNARC) C. Faloutsos, CMU (INARC)
R. Konuru, IBM (SCNARC) T. Huang, UIUC (INARC)
D. Lazer, NEU (SCNARC) P. Mohapatara, UC Davis (CNARC)
C. Lin, IBM (SCNARC) D. Parkes, Harvard (IRC)
S. Papadimitriou, IBM (SCNARC) J. Srivastava, U. Minnesota (IRC)
A. Pentland, MIT (SCNARC) B. Szymanski, RPI (SCNARC)
B. Uzzi, Northwestern (SCNARC) Z. Toroczkai, ND (SCNARC)
A. Vespignani, Indiana (SCNARC) X. Yan, UCSB (INARC)
W. Wallace, RPI (SCNARC)
Z. Wen, IBM (SCNARC)

8.5.1 Project Overview


Progress in any science relies on large-scale practical data collection and analysis; no discipline can be considered a science without real observations. Project S1 focuses on understanding human behavior and social networks, based on modeling and experimentation with large-scale, multi-relational human data, and on understanding social network formation and impact as well as the interaction of social networks with information and communication networks.

Based on our existing expertise in building large-scale distributed socio-info network systems, we shall also conduct network-centric systems research to understand the architecture needed for real-time decision making.

Project S1 shall include 13 researchers in the SCNARC, from highly diverse disciplines, including computer science, electrical engineering, economics, management, physics, political science, and others. It shall also collaborate with 12 other researchers in INARC, IRC, CNARC, and SCNARC. Many first-stage studies will utilize IBM, a global organization of 400,000 employees, as the test-bed for model and theory formation and experimentation, using the large-scale human data currently being collected inside IBM. In the future, after the models have matured and the privacy issues of Army personnel are adequately handled, more experiments and analyses will be conducted directly in the Army.

8.5.2 Project Motivation


There is a large body of literature on social networks in organizations going back many decades [Borgatti03]. Typical findings demonstrate that individuals who span structural holes in the network are more likely to be productive [Burt92], and that teams with large communication gaps in their structures are less likely to be successful [Cummings03]. However, little research has leveraged the ample data created by people's interactions, such as e-mail, call logs, text messaging, document repositories, Web 2.0 tools, and so on. Moreover, the literature on organizational networks suffers from the same deficits as much of the social network literature, tending to focus on small, static networks. As a result, important questions such as "what information flows through a network, and which links are more effective at transmitting the information?" and "what is the appropriate timing of communication?" have been completely neglected.

Military Scenarios and Challenges of Network-Centric Operations

In principle, social networks in the Army are informal organizational networks that relate to the formal hierarchical networks. How do social networks form and evolve in an organization? What roles do the formal organizational hierarchy and informal social networks play in the performance of Army personnel? How do different divisions of people work together, and in what ways can social networks enhance interdivision collaboration? Can trust be propagated from one division to another?

To understand social networks in the Army, it is important to gather and analyze the real
relationship data of Army personnel. However, due to privacy and security concerns, it would be
very difficult to conduct this type of data collection, especially for researchers outside the Army.
Therefore, social network research will have to heavily rely on external data – either public or
gathered in other organizations. After theories are developed and verified in external sources,
then experiments can be done inside Army organizations to validate or modify the theories.

Research on people networks in organizations provides insights and predictive models on how people in big, hierarchical organizations work with each other. These models shall serve as foundations for modeling Army social networks. Research on infrastructure requirements in an organization of a size comparable to the Army will help us understand the infrastructure needed for social network analysis, as well as cross-network analysis, in the Army. We target several theories: (1) a Network Sampling Theory that explains how best to observe social networks, (2) a Networked Social Capital Theory that quantifies the value of social networks, and (3) a Networked Social Capacity Theory that describes the different channels of human interaction and how people should allocate their capacity over those channels. These theories will help people better understand networks and should have a significant impact on Network Science.

Prior Work

We have been conducting qualitative and quantitative studies of organizational networks by building methods, tools, and theories to examine the many digital traces of collaboration and communication left within organizations. Our objective is to use behavioral data to understand the relational structures in an organization: to find which behaviors correlate with friendship, advice giving, and information sharing, and which patterns are associated with success at both the individual and the collective level, focusing particularly on the dimension of network dynamics.

Figure S1-1 System overview of the original SmallBlue social network analysis and expertise mining engine.
Tens of thousands of distributed social sensors were deployed in 76 countries to capture communications
between people as well as the term frequencies of the communication contents.

IBM Corporation has 400,000 worldwide employees with complex hierarchical structures. There can be as many as 10 to 15 layers of hierarchy from the CEO down to general employees, and there are many organizational report-to and management relationships inside this large organization – similar to the Army.

Since 2006, our team at IBM has been inventing and deploying in more than 70 countries an
organizational social network analysis system, SmallBlue [Lin08], to quantitatively infer the
social networks of 400,000 employees within IBM organizations. SmallBlue deploys social
sensors to gather, crawl, and mine a significant number of data sources, including the content and properties of individual email and instant message communications, calendars, organizational hierarchical structure, project and role assignments, employee performance measurements, personal and project revenue, and so on. These sensors have been placed on more than 10,000 volunteers' machines. Millions of continuously arriving dynamic data items have been processed on the server in order to discover valuable business intelligence about who knows what, who knows whom, and who knows what about whom within an organization. SmallBlue also unlocks the value of social networks by analyzing social network data in conjunction with individual and project financial gain [Wu09]. The aim of SmallBlue is to automatically locate knowledgeable
colleagues, communities, and knowledge networks in organizations. It also helps users manage
their personal networks, reach out to their extended network to find and access expertise, reveal
personalized relevant information such as documents or webpages that are shared or found useful
by their extended network, and visualize millions of keyword-based social networks of subject
experts in organizations. SmallBlue provides Google-like expertise search capabilities.

The SmallBlue framework is a general platform for analysis, indexing and querying of social
networks, derived from continuous streams of information such as email, instant messaging,
blogs, and wikis. The input consists of arbitrary feeds, which are aggregated at the appropriate
granularities, to derive and infer both social links (i.e., relationships among people) and expertise
(i.e., relationships between people and content). The initial version of SmallBlue shown in
Figure S1-1 focused on social networks. A later version of SmallBlue, shown in Figure S1-2, also crawls webpages and databases and receives data feeds to infer information networks and the cross-network relationships between people and information. This integrated system includes social, info, and socio-info networks, and has been applied inside IBM for network-centric personalized search and recommendation.
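
For concreteness, the following sketch illustrates the kind of aggregation just described; the record fields, names, and counts are hypothetical and do not reflect SmallBlue's actual implementation. It derives weighted person-to-person links and person-to-term expertise links from message records.

```python
# Illustrative aggregation of message feeds into social links (person-person)
# and expertise links (person-term); not the actual SmallBlue implementation.
from collections import Counter, defaultdict

messages = [
    {"from": "alice", "to": ["bob"], "terms": {"routing": 3, "mesh": 1}},
    {"from": "bob", "to": ["alice", "carol"], "terms": {"routing": 2}},
]

social_links = Counter()          # (sender, recipient) -> message count
expertise = defaultdict(Counter)  # person -> term frequencies

for msg in messages:
    for recipient in msg["to"]:
        social_links[(msg["from"], recipient)] += 1
    expertise[msg["from"]].update(msg["terms"])

print(social_links.most_common(3))
print(expertise["alice"].most_common(2))
```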

By November 2009, the SmallBlue system had captured 20,000,000 emails and instant messages (with the communicating parties and term-frequency statistics of the content), 1,000,000 items of Learning asset access data (including which Learning courses and materials employees accessed), 10,000,000 items of Knowledge asset access data (including who accessed which knowledge assets, e.g., technical documents, presentations, market analyses, business intelligence, etc.), 1,000,000 items of Web 2.0 social software usage (social bookmarks, blogs, file sharing, etc.), 200,000 employees' external financial billing databases, and 400,000 employees' organization profiles (including hierarchy, location, demographic data, job roles, job categories, self-reported skills, resumes, etc.). The system keeps acquiring live data inside the company to provide timely, real-time services for social network analysis, expertise mining, social-network-enabled information recommendation, and search. It is unique in its scale, live environment, and trustworthy information, as well as in the multiple facets of people it captures. It can help network scientists better understand human behavior and thus build models that can be experimented on and verified.

Figure S1-2: Layered structure of the current SmallBlue socio-info network analysis and application
architecture. The infrastructure includes Acquisition, Analysis, Index, and Service layers. It is used for
graph analysis, people analysis and content analysis, and provides services for information and social
relationship search, recommendation, visualization, etc.

Impact on Network Science

Project S1 will focus on generating the following outcomes, which will have a long-term impact on Network Science:

(1) Perform system research suggesting what kinds of infrastructure design will best fit the goals of network science for real-time, large-scale distributed decision making and for network data capture and analysis.

(2) Derive Network Sampling Theories to help future network systems understand the
resources needed to deploy sensors, either physical or virtual, for data gathering.

(3) Derive Networked Social Capital Theories to quantify the value of social networks. The
theories will include both micro-scale (e.g., person, or a small team) and macro-scale
(overview of the network structure of a division or organization).

(4) Derive Networked Social Capacity Theories to understand how people utilize networks
using multiple channels of interaction given their constrained capacity (e.g., time).

8.5.3 Key Project Research Questions

How do people networks form, operate, and have impact in large organizations?

The various specific research questions addressed in this project include:
What infrastructure features will best support efficient and secure storage, processing,
indexing, and queries against dynamic networks?
In terms of network data capturing & sampling, what will be the equivalent of the
Nyquist Rate of Networks?
What are the impacts of dynamic informal organizational networks on the productivity
and performance of individuals and teams?
How do people utilize social networks across divisions?
How can we quantify the quality and value of social networks?
How do people utilize social networks under stress?
How do people networks form through multiple relationships, e.g., project collaborations, information exchange, communities, formal organizations, etc.?
How does a network in one channel relate to the others, e.g., how do social networks derived from communications drive the accuracy of targeted knowledge sharing?
How do people distribute resources through multiple channels of relationships?

8.5.4 Initial Hypotheses


First, we shall study the infrastructure challenge of gathering and handling large-scale heterogeneous streams for social network research, in the context of information and communication networks. Today's SmallBlue should be able to handle social networks and information networks in a global organization of half a million personnel. Although these networks evolve, they usually do not require decisions or analyses within seconds; they usually allow batch processing. However, communication networks and cognitive networks will need more real-time analysis with millisecond decisions, considering ever-changing communication channel capacity/quality and human cognition capacity/quality. We will conduct system-level research to consider how to incorporate this type of real-time network processing requirement into the existing SmallBlue socio-info network infrastructure. One of the technologies we will be looking at is the novel stream processing system that we developed for the Department of Defense -- IBM InfoSphere Stream, which enables aggressive intelligence extraction of information and knowledge from heterogeneous continuous data streams of 10 to 100 Gbits/sec. We shall conduct research to incorporate an InfoSphere-like system for real-time network decision making.

In practice, it is difficult to capture all information about every individual in the same degree of detail. For instance, data streams from physical sensors on soldiers in the combat field may not always be available. Sometimes information is missing, making it impossible for the system to understand the current social and cognitive network status of individuals. In another scenario, global privacy laws enacted in many countries prohibit communication data from being processed by the service provider beyond the scope of providing the communication service. Therefore, gathering a complete set of non-anonymized communication information from a service provider usually raises legal and privacy concerns. Instead, we collect data from volunteers and use that data to infer the network on a much larger scale. Given that informal social network data reside intrinsically in different data sources, informal networks can usually only be inferred by sampling larger networks. Since partially observed data is the norm, we will derive mathematical theories to investigate the robustness of graph sampling and its implications under various conditions. We call this a Network Sampling Theory. We will investigate what types of sampling strategies are required to obtain a good estimate of the entire network. We will also investigate analytic methods to conduct network analysis on only partially observed data.

Second, we will analyze the impacts of informal social networks in an organization. We want to learn which informal networks affect people's performance and how. We shall model, measure, and quantify the impact of dynamic informal organizational networks on the productivity and performance of individuals and teams; develop and apply methods to identify causal relationships between dynamic networks and productivity, performance, and value; model and measure peer influence in social networks by statistically distinguishing influence effects from homophily and confounding factors; and examine how individuals change their use of social networks under duress. With an understanding of the impact of social networks, we can then derive the real 'value' or 'quality' of social networks. We refer to this as Network Capital Theories. Being able to quantify the quality of a social network will be a critical contribution to the research in CNARC, which operates communication networks based on quality measurements of other networks.
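
As one illustrative (and deliberately simple) proxy for what a Network Capital Theory might quantify, the sketch below computes Burt-style structural-hole measures over a stand-in network using the networkx library; the choice of measures is ours, not the theory this task will develop.

```python
# Illustrative proxy only: Burt-style structural-hole measures as one possible
# ingredient of a "network capital" score, computed with networkx.
import networkx as nx

G = nx.karate_club_graph()               # stand-in for an observed informal network
effective_size = nx.effective_size(G)    # larger => more non-redundant contacts
constraint = nx.constraint(G)            # smaller => more brokerage opportunity

# Rank people by effective size and show both measures for the top five.
for node in sorted(effective_size, key=effective_size.get, reverse=True)[:5]:
    print(node, round(effective_size[node], 2), round(constraint[node], 2))
```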

Third, we will investigate the multi-channel networks between people. With IBM's unique large-scale info-social network system, SmallBlue, we are able to capture multiple facets of people's relationships, including email communications, instant messaging communications, teleconference or face-to-face meetings, file sharing, social bookmark sharing, blog interaction, wiki collaborative document composition, knowledge sharing, etc. The SmallBlue system also collects the content of these multi-channel interactions. Thus, it provides a unique opportunity to observe how people networks form and interchange between different channels. We shall also explore whether channel capacity and coding theories from communications and information theory can be extended to the human domain to model relationship variation and distribution across different channels.
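
As a crude illustration of how channel-capacity thinking might be carried over to human interaction channels, the following sketch (channel names and counts are hypothetical) computes the Shannon entropy of one person's interaction distribution across channels, a simple measure of how their limited interaction capacity is spread.

```python
# A crude, illustrative analogue of "capacity allocation" across channels:
# the Shannon entropy of one person's interaction distribution over channels.
import math
from collections import Counter

interactions = Counter({"email": 120, "instant_message": 300, "meeting": 40, "blog": 10})

total = sum(interactions.values())
entropy = -sum((c / total) * math.log2(c / total) for c in interactions.values())
print(f"channel-usage entropy: {entropy:.2f} bits (max {math.log2(len(interactions)):.2f})")
```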

8.5.5 Technical Approach


Overview
Project S1 consists of three tasks: (1) Infrastructure Challenges of Large-Scale Network Discovery and Processing, (2) Impact Analysis of Informal Organizational Networks, and (3) Multi-Channel Networks of People. Each of these tasks targets one major aspect of social network research: the capture of networks, the impact of networks, and the understanding of networks.

8.5.6 Task S1.1: Infrastructure Challenges of Large-Scale Network Discovery and
Processing (R. Konuru, IBM (SCNARC); S. Papadimitriou, IBM (SCNARC);
Z. Wen, IBM (SCNARC); C.-Y. Lin, IBM, (SCNARC); T. Brown, CUNY
(SCNARC); A. Pentland, MIT (SCNARC); A.-L. Barabasi, NEU (SCNARC);
D. Lazer, NEU, (SCNARC); J. Hendler, RPI (SCNARC); W. Wallace, RPI
(SCNARC); N. Chawla, ND (SCNARC); A. Vespignani, Indiana (SCNARC))

Task Overview
We plan to conduct basic research on how to design infrastructure that will support efficient storage, updates, and queries against social network data. Data items that contain associations between entities, along with a timestamp, are continuously collected from multiple feeds. Each of these items may have content and several different metadata fields that need to be stored. A scalable and efficient infrastructure to store and retrieve these dynamic data sets is necessary. Specifically, the infrastructure should be powerful, flexible, and simple to use for all these sources. Appending new data items should be fast, updating existing items should be possible, and accessing the entire corpus for aggregate analyses should be efficient and scalable. Furthermore, the infrastructure should support distributed processing and analysis of the data. Finally, the infrastructure should be interoperable with systems and models of other types of networks, e.g., the information network systems from INARC and the communication network systems from CNARC.
Task Motivation
We will conduct system-level research to consider how to incorporate this type of real-time network processing requirement into the existing SmallBlue socio-info network infrastructure. One of the technologies we will be looking at is the novel stream processing system that we developed for the Department of Defense -- IBM InfoSphere Stream, which enables aggressive intelligence extraction of information and knowledge from heterogeneous continuous data streams of 10 to 100 Gbits/sec. We shall conduct research to incorporate an InfoSphere-like system for real-time network decision making.
Since partially observed data is the norm, we will derive mathematical theories to investigate the robustness of graph sampling and its implications under various conditions. We call this a Network Sampling Theory. We will investigate what types of sampling strategies are required to obtain a good estimate of the entire network. We will also investigate analytic methods to conduct network analysis on only partially observed data.
Key Research Questions
Design a set of benchmarks and use cases to evaluate infrastructure needs for social network data
storage and processing. In particular, to what extent can web crawling, social sensor mining, full-
text storage, indexing, processing, and search infrastructure and methods be leveraged? How can
these be integrated with large-scale data processing software (e.g., Apache Hadoop DFS and
MapReduce, HBase, etc)?

As social networks grow exponentially over time, any conclusion drawn from partial network observations could be biased and misleading. It is often impossible to obtain the entire network because of resource constraints and continuous network growth. Because of time constraints, in many applications it is impossible to compute network measures and statistics, e.g., average connectivity, closeness, and density, over the entire social network. However, this is often unnecessary. One can sample a portion of the underlying network and perform an approximate calculation. If the sample captures the characteristics of the whole network, graph analysis on the sample will generate results with good precision much faster. Nevertheless, we need to model partially observed (or sampled) networks with special care. We plan to address two research issues related to the analysis of partially observed networks. First, what is the error range for a measure calculated from a partially observed network? Second, how can we develop a model to reduce the error by predicting the growth of observed networks? The solutions to these questions will help us measure the confidence of patterns and knowledge discovered in partially observed networks and the validity of these discoveries for predicting the future development of these networks.
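
One simple way to make the first question concrete, as a sketch under our own assumptions rather than the theory this task will derive, is to repeat the sampling many times on a known graph and report the empirical spread of the resulting estimates:

```python
# Empirical sketch: estimate the spread (error range) of a network statistic
# computed from repeated partial observations of a known graph.
import random
import statistics
import networkx as nx

def sampled_density(G, fraction, rng):
    nodes = rng.sample(list(G.nodes()), int(fraction * G.number_of_nodes()))
    return nx.density(G.subgraph(nodes))

rng = random.Random(42)
G = nx.watts_strogatz_graph(2000, 10, 0.1, seed=1)
estimates = [sampled_density(G, 0.1, rng) for _ in range(100)]

print("true density:", round(nx.density(G), 5))
print("sample mean:", round(statistics.mean(estimates), 5),
      "+/-", round(statistics.stdev(estimates), 5))
```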

Initial Hypotheses
In order to sample a large-scale social network, we will develop several graph sampling methods: (1) randomly sample nodes from the original graph; (2) sample globally and, for each sampled node, also sample its neighbors; and (3) choose one node and sample its neighbors within k hops. We will investigate which sampling strategy is most effective for different network analytical tasks. In addition, we will study whether the network topology plays a significant role in selecting the right sampling technique for finding high-quality patterns.
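
A minimal sketch of these three candidate sampling strategies, assuming the networkx library and with all function names our own, might look like this:

```python
# Sketch of the three candidate graph-sampling strategies (illustrative names,
# using networkx): random node sampling, seed-plus-neighbor sampling, k-hop sampling.
import random
import networkx as nx

def random_node_sample(G, n, rng):
    """(1) Randomly sample n nodes and keep the induced subgraph."""
    nodes = rng.sample(list(G.nodes()), min(n, G.number_of_nodes()))
    return G.subgraph(nodes).copy()

def neighbor_sample(G, n_seeds, rng):
    """(2) Sample seeds globally and, for each seed, also keep its neighbors."""
    seeds = rng.sample(list(G.nodes()), min(n_seeds, G.number_of_nodes()))
    keep = set(seeds)
    for s in seeds:
        keep.update(G.neighbors(s))
    return G.subgraph(keep).copy()

def k_hop_sample(G, seed, k):
    """(3) Choose one node and keep its neighborhood within k hops."""
    reachable = nx.single_source_shortest_path_length(G, seed, cutoff=k)
    return G.subgraph(reachable).copy()

# Compare how well each sample preserves a simple statistic (average clustering).
rng = random.Random(0)
G = nx.barabasi_albert_graph(1000, 3, seed=0)
samples = {"random": random_node_sample(G, 200, rng),
           "neighbor": neighbor_sample(G, 50, rng),
           "k-hop": k_hop_sample(G, 0, 2)}
for name, S in samples.items():
    print(name, round(nx.average_clustering(S), 3),
          "vs full", round(nx.average_clustering(G), 3))
```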

Prior Work
The core abstraction for these kinds of data is sparse matrices, or equivalently, adjacency lists.
There are a number of approaches to store, index, and query such data. Traditional information
retrieval systems such as Lucene [Lucene09] store document-term matrices. However, these are
hard to scale to large clusters of commodity hardware. In response to challenges posed by
requirements for analyzing the web graph, a number of approaches have emerged, including the
Google Filesystem [Ghemawat03], MapReduce [Dean04], BigTable [Chang06], Hadoop, Pig,
Hive [Hadoop09], and so on.

Technical Approach
We feel that none of the existing solutions is suitable for our setting out of the box, for the following reasons. First, in graphs arising from social interactions, both the indegree and outdegree distributions may be heavily skewed; in a web graph, by contrast, the indegree may be heavily skewed (e.g., the CNN front page versus personal homepages) but the outdegree distribution usually is not. Processing such "super-nodes" therefore incurs significant performance penalties. Second, performing traversals naively on the graph is slow: if the graph is not clustered so that graph neighbors are stored together, traversal is slow. Finally, we need a clean way to deal with different information sources (e.g., emails, profile data, webpages, bookmarks, etc.). We currently use different stores for each (e.g., pure relational databases for profile data, Lucene for emails, a mix of both for bookmark data, Solr for data with text content and typed faceted information, and so on), and joining different data sources is often cumbersome and time-consuming.

These are problems that need to be addressed, whether by adapting or modifying existing approaches, by writing sufficiently general "glue" components, or by building completely new components (to be determined), as necessary. At this stage, SmallBlue employs fairly rudimentary data analytics and simple statistics, but as we move forward, we expect both the data volume and the required sophistication of analytics to increase. Although our goal is not to build a complete infrastructure from scratch, we need to have in place whatever is essential for further research and more advanced analyses.

We plan to work on the following issues in Year 1:

Q2. Design a set of benchmarks and use cases to evaluate infrastructure needs for social network
data storage and processing. In particular, to what extent can web crawling, social sensor mining,
full-text storage, indexing, processing, and search infrastructure and methods be leveraged?
How can these be integrated with large-scale data processing software (e.g., Apache Hadoop
DFS and MapReduce, HBase, etc)?

Q3. Start designing and experimenting with a prototype infrastructure to store and query graphs, which also allows: (i) efficient retrieval based on graph metrics (e.g., weighted neighbor range queries, shortest path, etc.); (ii) efficient aggregation based on node relationships; (iii) support for heterogeneous types of nodes and edges; and (iv) support for both original, observed data (e.g., emails, webpages, etc.) as well as derived information (e.g., document topics, user profiles and expertise, etc.).

Q4. We initially plan to start from homogeneous graphs (single type of nodes and edges) and
then expand into dynamic large-scale heterogeneous graphs.

In the next step at Year 2, we shall investigate how to incorporate streaming updates. Different
data sources need to be updated at different frequencies and/or will produce data at different
volumes. For some data sources, the rate of updates may be high enough to warrant a different
approach for storage and indexing.
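
As a point of departure only (the actual design choices are explicitly to be determined above), a minimal in-memory sketch of the kinds of queries listed for Q3, weighted neighbor range queries and shortest paths over typed edges, could look like this:

```python
# Illustrative in-memory adjacency store (not a proposed design) exposing the
# kinds of queries discussed above: weighted neighbor range queries and
# shortest paths over heterogeneous edge types.
from collections import defaultdict, deque

class GraphStore:
    def __init__(self):
        self.adj = defaultdict(dict)   # node -> {neighbor: {"weight": w, "type": t}}

    def add_edge(self, u, v, weight=1.0, etype="email"):
        self.adj[u][v] = {"weight": weight, "type": etype}
        self.adj[v][u] = {"weight": weight, "type": etype}

    def neighbors_in_range(self, u, low, high):
        """Weighted neighbor range query."""
        return [v for v, attrs in self.adj[u].items() if low <= attrs["weight"] <= high]

    def shortest_path(self, src, dst):
        """Unweighted shortest path by breadth-first search."""
        prev, frontier = {src: None}, deque([src])
        while frontier:
            u = frontier.popleft()
            if u == dst:
                path = []
                while u is not None:
                    path.append(u)
                    u = prev[u]
                return path[::-1]
            for v in self.adj[u]:
                if v not in prev:
                    prev[v] = u
                    frontier.append(v)
        return None

store = GraphStore()
store.add_edge("alice", "bob", weight=0.8)
store.add_edge("bob", "carol", weight=0.3, etype="wiki")
print(store.neighbors_in_range("bob", 0.5, 1.0), store.shortest_path("alice", "carol"))
```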

Validation Approach
We plan to evaluate the needs that streaming updates pose; the open, Internet version of
SmallBlue, which will be developed as part of another project, will hopefully provide the
necessary high data volume testbed to evaluate alternatives.
Summary of Military Relevance
This study is important because it allows the Army to understand the system-level challenges of analyzing large-scale networks in network-centric systems. In particular, it helps us understand how the system should be designed to consider the various networks – social, information, and communications – together.
Research Products
The research products will include an understanding of the system needs – how the network-centric system should be designed – together with design recommendations.

8.5.7 Task S1.2: Impact Analysis of Informal Organizational Networks (C.-Y. Lin,
NYU (SCNARC); S. Aral, NYU (SCNARC); E. Brynjolfsson, MIT
(SCNARC))

Task Overview
The purpose of this task is to model, measure and quantify the impact of dynamic informal
organizational networks on the productivity and performance of individuals and teams; to
develop and apply methods to identify causal relationships between dynamic networks and
productivity, performance and value; to model and measure peer influence in social networks by
statistically distinguishing influence effects from homophily and confounding factors; and to
examine how individuals change their use of social networks under duress.

Task Motivation
Productivity and Social Network: It is well accepted that social dynamics and chemistry can dramatically affect team performance. One way to measure the productivity of individuals or groups is based on revenues or other quantifiable success measures.
Causality of Network Impacts: Determining the direction of causality is central to understanding how to help employees improve their productivity. Currently, our analyses focus on correlations. The results would be much stronger if we could establish a causal mechanism for how knowing more executives helps improve a worker's productivity, and truly debunk the alternative hypothesis that high performers are simply more likely to attract managerial interaction. Using our existing social network infrastructure, we can learn about this mechanism through randomized experiments and through extensive interviews in the field.
Peer Influence of Social Networks: We propose to study the degree to which networks `spread' productivity by making those who are connected to productive peers more productive, or, alternatively, the degree to which productive workers attract network contacts.
Utilization of Social Networks under Duress: We shall explore how people activate their social networks when they are under duress. We hypothesize that how people use their network ties can have a profound impact on how well they internalize the stress as well as on their chances of obtaining new opportunities.
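
To make the correlation-versus-causality point concrete, the naive analysis sketched below (with synthetic data; variable names are ours) correlates each worker's productivity with the average productivity of their network neighbors. By construction, such a correlation cannot by itself separate peer influence from homophily or confounding, which is precisely why the causal methods above are needed.

```python
# Naive peer-productivity correlation; by construction this cannot separate
# peer influence from homophily or confounding. Data here are synthetic.
import random
import statistics
import networkx as nx

rng = random.Random(0)
G = nx.barabasi_albert_graph(500, 3, seed=0)
productivity = {n: rng.gauss(100, 15) for n in G.nodes()}

own, peer_avg = [], []
for n in G.nodes():
    nbrs = list(G.neighbors(n))
    if nbrs:
        own.append(productivity[n])
        peer_avg.append(statistics.mean(productivity[v] for v in nbrs))

corr = statistics.correlation(own, peer_avg)  # Pearson r (Python 3.10+)
print(f"own vs. peer-average productivity correlation: {corr:.3f}")
```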

Key Research Questions


What are the impacts of dynamic informal organizational networks on the productivity
and performance of individuals and teams?
How do people utilize social networks across divisions?
How can we quantify the quality and value of social networks?
How do people utilize social networks under stress?

Initial Hypotheses
Anecdotal evidence from a recent article in New York Times shows that some people tend to
immerse themselves in a web of close relationship. Deepening their relationship with friends and
loved ones help them cope with the psychological stress from being unemployed. A recent Wall
Street journal article also profiled a Wall Street banker graduated from MIT who also was laid

NS CTA IPP v1.4 8-19 March 17, 2010


off from a prominent financial firm. However, on the contrary, he immediately expands his
network ties, making contacts to anyone that he could think of. He made sure that he is on
everybody‘s radar for any possible job openings. In this paper, we plan to explore how people
use their network when they are laid off. Specially, we like to answer questions such as what
types of people tend to activate what type of ties and what is the structure for the activated
network. If possible, we hope to understand if leveraging network ties has any effect on
eventually landing a new position.

Prior Work
To the best of our knowledge, we published the first large-scale quantitative study on the
connection between social dynamics and productivity [Wu09]. We derived the social network
data of 7500 employees from their electronic communications at a large information technology
firm over 2 years. In total, about 400,000 people were in the aggregated social network. We
focused our study on the 2500 consultants in our sample and collected detailed data on the 2,592
projects these consultants participated in from June 2007 to July 2008. The sheer volume of the
data allowed us to more precisely estimate how population level topology in a network
contributed to information worker productivity, after controlling for human capital, work
characteristics, and demographics.
Social networks have long been theorized to help people realize better outcomes in the labor
market (e.g., [Granovetter73], [Castilla05]). Network contacts can offer job seekers information
about where to find jobs, and may also influence the hiring process in the seeker's favor.
However, in certain situations or network configurations, job seekers are limited in activating
their networks, particularly their weak and long-ranging ties [Granovetter73]. For example, Smith
[Smith05] explores how the black urban poor are forced to leverage their strong ties even when
weaker and longer-ranging ties exist. She attributes this phenomenon to a lack of trust: the weakly
connected individuals fear that referring the job seeker may harm their own reputation at the
workplace. Her work shows that the network people activate is different from their potential
network. Social psychologists have long emphasized the importance of studying the cognitive
network as opposed to the objective network [Krackhardt87]. In this study we examine how
individuals cognitively activate their social networks under duress. This is an important and
largely unexplored area in social network research.

Technical Approach

Productivity and Social Network: Specifically, we uncovered four key results. First, we found
that the structural diversity of social networks is positively correlated with performance,
corroborating previous work [Aral07]. Second, network size was positively correlated with
higher productivity. However, when we separated network size into in-degree and out-degree,
we found that while in-degree was positively correlated with higher work performance, out-
degree was not correlated with performance in the project network, in which each node was a
project rather than a person. Third, for both the employee and the project networks, knowing many
executives was positively associated with work performance. However, having many managers
on a project was negatively correlated with project revenues. Fourth, we found that betweenness
centrality was negatively correlated with individual productivity, while it was positively
correlated with project revenues.



Causality of Network Impacts: This aspect of our investigation focuses on the identification
of influence in social networks. While observations of correlation between network position and
productivity and performance are important pursuits, correlation does not demonstrate causation.
If, for example, we find that the most productive employees in an organization are those whose
networks display the highest betweenness centrality, we might conclude that these workers'
positions in the social network afford them timely access to information that helps them to be
more productive or to outperform their peers. An equally likely explanation, however, is that
high-performing employees are more likely to be sought after when others seek advice. They are
essentially star performers, who perform well because of their personal attributes and find
themselves in certain network positions as a result of their performance. To explore the precise
mechanism of how network attributes enable a person to be successful above and beyond that
person's inherent ability, we propose to conduct experiments that randomly assign certain people
desirable connections. We then measure the performance differences before and after the
connections are made. If work performance improves after the intervention, we can reasonably
determine that network position indeed causes a person to be more productive. Understanding the
causal relationship between network properties and performance would be a breakthrough in the
field of social networks and organizations. Establishing this relationship in a large dynamic
network is even more powerful, since results using data spanning multiple years over a large
group of people would be more convincing than results coming from a small or biased sample.
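
As a concrete, hedged illustration of the intended before/after comparison (not the project's
finalized estimator), the following Python sketch computes a simple difference in pre/post
performance changes between randomly connected (treated) and control employees; the column
names (treated, performance_pre, performance_post) are hypothetical placeholders rather than
actual SmallBlue variables.

# Sketch: comparing performance changes for randomly "connected" vs. control employees.
# Column names are hypothetical placeholders, not actual SmallBlue fields.
import pandas as pd
from scipy import stats

def estimate_treatment_effect(df: pd.DataFrame) -> dict:
    """Difference in mean pre/post performance change, treated vs. control."""
    df = df.copy()
    df["delta"] = df["performance_post"] - df["performance_pre"]
    treated = df.loc[df["treated"] == 1, "delta"]
    control = df.loc[df["treated"] == 0, "delta"]
    t_stat, p_value = stats.ttest_ind(treated, control, equal_var=False)
    return {
        "effect": treated.mean() - control.mean(),  # estimated average treatment effect
        "t_stat": t_stat,
        "p_value": p_value,
    }

With a genuinely randomized assignment of connections, a positive and statistically significant
effect estimate would support the causal interpretation discussed above.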

Peer Influence of Social Networks: We propose to study the degree to which networks
'spread' productivity by making those who are connected to productive peers more productive,
or alternatively the degree to which productive workers attract network contacts. We intend to
develop and employ several different methods in order to separate selection, homophily, and
influence from exogenous, contemporaneously correlated confounding shocks to individuals'
productivity. We intend to do this using the following methods:
1. Peer effect models of social influence which extend previously developed theory from the
class of spatial autoregressive models. When groups vary in size or structure, deviations from
group means can under certain assumptions be identified using various subsets and supersets of
the graph as instrumental variables.
2. Actor-oriented models which model the dynamic co-evolution of networks and behavior as
continuous time Markov models. These are based on panel network data estimated with Markov
Chain Monte-Carlo methods to reduce the dimensionality of the state space.
3. Matched sample estimators, which simulate experimental settings by matching treated nodes
with comparison nodes that are likely to be similar on observed dimensions (a minimal sketch
follows below). These methods will help us address the role of influence in social networks.
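
One possible, hedged realization of the matched sample estimator in item 3 is nearest-neighbor
matching on estimated propensity scores; the covariate list and the logistic propensity model in
the sketch below are illustrative assumptions, not the project's chosen specification.

# Sketch: nearest-neighbor propensity-score matching (covariate and column names hypothetical).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

def matched_sample_effect(df, covariates, treat_col, outcome_col):
    """Match each treated node to the control node closest in propensity score,
    then average the outcome differences as a rough effect estimate."""
    model = LogisticRegression(max_iter=1000)
    model.fit(df[covariates], df[treat_col])
    df = df.assign(pscore=model.predict_proba(df[covariates])[:, 1])

    treated = df[df[treat_col] == 1]
    control = df[df[treat_col] == 0]
    diffs = []
    for _, row in treated.iterrows():
        # index of the control unit with the closest propensity score
        j = (control["pscore"] - row["pscore"]).abs().idxmin()
        diffs.append(row[outcome_col] - control.loc[j, outcome_col])
    return float(np.mean(diffs))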

Utilization of Social Networks under Duress: The recent financial downturn provides us a
unique opportunity to study how people activate their social ties under distress. In January 2009,
IBM, the largest information technology company in the world, underwent a restructuring
process and eliminated 10% of its work force worldwide, resulting in more than 40,000 people
losing their jobs. However, one of the unemployment benefits at IBM is that laid-off employees
do not have to leave IBM immediately; for the next two months, they can still use all the
resources of an active employee to look for a position within the firm. We expect that a large part
of their daily communication during this period is devoted to looking for a new position. Since
SmallBlue has been capturing people's daily electronic communication (email and IM) since
2005, we are able to construct social networks before and after the layoff to examine how people
activate their social networks when they lose their jobs. Using two years of network data prior to
the layoff allows us to assess the potential or latent social network a person has. We can then
examine how this latent network is activated during the two months in which they can still
access company resources. We suspect people will activate their networks differently but in
consistent ways. For example, certain people, such as low-status individuals and minorities, may
not be able to activate their weak ties [Smith05]. First, we will look at how the layoff induces
changes in network characteristics, such as size, various measures of centrality, and social
cohesion, by running the following model:

Change_in_Network_Characteristics = a + b*layoff + c*demographic_information + d*job_role + e

We are interested in the size and statistical significance of b.
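
The model above can be estimated with ordinary least squares; a minimal sketch using the
statsmodels formula interface follows, where the column names (network_change, layoff, age,
gender, job_role) are hypothetical stand-ins for the actual SmallBlue variables.

# Sketch: OLS estimate of b in the layoff model. Column names are hypothetical placeholders.
import statsmodels.formula.api as smf

def fit_layoff_model(df):
    """Regress the change in a network characteristic on the layoff indicator,
    controlling for demographics and job role; returns the fitted results."""
    results = smf.ols(
        "network_change ~ layoff + age + C(gender) + C(job_role)",
        data=df,
    ).fit()
    # The coefficient and p-value on 'layoff' correspond to the size and
    # statistical significance of b in the model above.
    return results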

Next, we plan to study the dynamics of network activation over time. People may initially
embrace their strong ties and immerse themselves in a densely connected web for moral support
and for coping with the psychological stress induced by the job loss. Then, perhaps, we will see
people activating their weak and long-ranging ties to find future opportunities. The duration of
each stage may vary, and it is important to see whether any personal or network characteristics
moderate the duration of each stage.

The ideal outcome is that we see a distinctive pattern of how people activate their social networks
under distress. The activation pattern should differ from that of those who were not laid off. From
our initial analysis, we see a communication peak, as measured by total email exchange, for
those who were laid off as compared to those who were not. Next, we plan to delve deeper into
these communication exchanges to see whom people talked to and what their relationships were.
We hope to find a pattern of activation strategies when people are under duress.

Validation Approach
We plan to evaluate the models, especially the causality studies, based on the SmallBlue system.

Summary of Military Relevance


This task shall generate novel models for understanding the impact of social networks on human
performance. It may lead to the establishment of a quantitative network capital theory, which
characterizes the quality of a social network and its potential impact on a person's performance.
It shall then provide guidelines to the information and communication centers on what the
important social network factors are, in order to make them more successful.

Research Products
Novel study results and an improved understanding of human networks and of network-affected
performance. There is no precedent for a large-scale study of this kind. The results shall be
insightful for the progress of network science.



8.5.8 Task S1.3: Multi-Channel Networks of People (Z. Wen, IBM (SCNARC); C.-
Y. Lin, IBM (SCNARC); Brian Uzzi, Northwestern (SCNARC))

Task Overview
This task will investigate the multi-channel networks between people. With the unique large-
scale info-social network system IBM SmallBlue, we are able to capture multiple facets of
people's relationships, including email communications, instant messaging communications,
teleconference or face-to-face meetings, file sharing, social bookmarking, blog interaction,
collaborative wiki document composition, knowledge sharing, etc. The SmallBlue system also
includes the content of these multi-channel interactions. Thus, it provides a unique opportunity to
observe how people's networks form and shift between different channels. We shall also develop
theories to model the effectiveness of the channels through which people exchange information
and build relationships, and thus help people appropriately allocate their limited capacity over the
channels. In particular, we will explore whether these theories can be built by extending channel
capacity and coding theories from communications and information theory to the human domain.
One way to understand the multi-faceted relationships of people is to know how and whether
people with similar interests voluntarily gather together, interact more, share more information,
and automatically form communities. In other words, if "Birds of a Feather Flock Together" is
correct, how true is it? How diverse are the social networks people form?
Modeling user interests is important for search and recommender systems to provide
personalized information to meet individual user needs [Teevan05]. Towards this goal, existing
works have studied a user's explicit interests specified in his profile, or implicit interests
indicated by his prior interactions with various types of information, such as content the user has
created or read, including web pages, documents, and email. Recently, the proliferation of online
social networks has sparked interest in leveraging social networks to infer user interests [White09],
based on the existence of social influence and correlation among neighbors in social networks
[Singla08]. For applications that can directly observe a user's behavior (e.g., logs of search
engines he uses), inferring interests from his friends in social networks provides one extra useful
enhancement. For many other applications, however, it is difficult to observe sufficient behavior
of a large number of users. In such scenarios, inferring their interests from their friends can be
the only viable solution. For example, for a new user in a social application, the application may
only have information about his friends who are already using it. To motivate the new user to
actively participate, the application may want to provide personalized recommendations of
relevant content. To this end, the application has to infer his interests from friends.
However, there is huge variation in the types and amount of information available from social
interactions. According to existing studies of enterprise social networks [Brzozowski09], only a
small percentage of online users (e.g., <10%) actively contribute social content using one or
more social software tools (e.g., blogs, social bookmarking, and file sharing), while a large
fraction of users (e.g., >90%) seldom do so. Moreover, certain user-contributed data may not be
accessible (e.g., private files) or cannot be associated with a particular user (e.g., anonymous
data). This creates both a demand and a challenge for accurate user interest modeling, especially
for inactive users. On one hand, accurate user interest modeling can provide personalized search
and recommendation results, and thus may help to increase the usage of social software. On the
other hand, the available observations of users are sparse and spread across multiple types of
media. In this task, we shall investigate how strongly people are related to their social neighbors
and how far a person's social sphere can affect him or her. We shall also investigate algorithms
to estimate how accurately a person's interests can be inferred from social networks. This
research shall be critical in offering clues for information recommendation, as well as for
assessing the Quality of Information in a social network context.
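
As a minimal, hedged sketch of neighbor-based interest inference (an illustration of the idea, not
the project's actual algorithm), a sparse user's interests can be estimated as a tie-strength-weighted
average of the observed interest vectors of that user's neighbors; the data structures below are
assumptions.

# Sketch: inferring a user's topic-interest vector from social neighbors, weighted by tie strength.
# The data structures (tie_strength, interest_vectors) are illustrative assumptions.
from collections import defaultdict

def infer_interests(user, neighbors, tie_strength, interest_vectors):
    """neighbors: iterable of neighbor ids; tie_strength[(user, v)]: edge weight;
    interest_vectors[v]: dict mapping topic -> score for users with observed behavior."""
    scores = defaultdict(float)
    total_weight = 0.0
    for v in neighbors:
        w = tie_strength.get((user, v), 0.0)
        if v not in interest_vectors or w <= 0.0:
            continue  # skip neighbors with no usable observations
        total_weight += w
        for topic, score in interest_vectors[v].items():
            scores[topic] += w * score
    if total_weight == 0.0:
        return {}  # nothing can be inferred for this user
    return {topic: s / total_weight for topic, s in scores.items()}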

Task Motivation
People interact with others in many different ways and have different types of relationships. The
research in this task takes the human as the basic unit, seeking to understand how a person handles
and allocates different relationships with other people under resource constraints such as time or
information.
Taking the human as the basic unit of study, we are also interested in investigating how a
person's interactions with information relate to other people in his or her social networks. This
type of scenario can help the Army understand how people's information sharing behavior relates
to their longer-term social interaction behavior, and, from another angle, how the people in one's
personal social network may affect one's decisions or reveal one's own interests.

Key Research Questions


How do people's networks form through multiple relationships, e.g., project
collaborations, information exchange, communities, formal organizations, etc.?
How does the network in one channel relate to those in other channels, e.g., how do social
networks derived from communications drive the accuracy of targeted knowledge sharing?
How do people distribute resources across multiple channels of relationships?

Initial Hypotheses
People's capacity is limited: how do people allocate their resources in maintaining relationships?
We assume it is possible to derive relationship channels through Dynamic Probabilistic Complex
Network (DPCN) modeling. Through the DPCN models, we shall then be able to consider these
channels in conjunction with channel capacity theories from information theory to model human
capacity constraints, especially with regard to people-to-people networks and people-to-
information networks.

Prior Work
We proposed a Dynamic Probabilistic Complex Network (DPCN) model to predict people's
behavior in diffusing different types of information through a network [Lin07]. DPCN models
the states of nodes and edges as Markov models, whose states change as information spreads.

Technical Approach
Figure S1-3 illustrates the DPCN model. DPCN models the time and probabilistic factors in
information propagation, and estimates the time required to spread information throughout a
region of the network, based on prior modeling of individuals' behavior in topic spreading.



Figure S1-3: Dynamic Probabilistic Complex Network (DPCN) Model. State transitions of edges: S-D-A-R
model. (Susceptible, Dormant, Active, and Removed) This indicates the time-aspect changes of the state of
edges. States of nodes: S-A-I model. (Susceptible, Active, and Informed) Trigger occurs when the start node
of the edge changes from state S to state I. Topic of Information triggers state transition throughout network.

[Definition -- DPCN] A Dynamic Probabilistic Complex Network can be represented by a
Dynamic Transition Matrix P(t), a Dynamic Vertex Status Random Vector Q(t), and two
dependency functions f_M and g_M:

\[
P(t) = \begin{pmatrix}
p_{1,1}(t) & p_{2,1}(t) & \cdots & p_{N,1}(t) \\
p_{1,2}(t) & p_{2,2}(t) & \cdots & p_{N,2}(t) \\
\vdots & \vdots & \ddots & \vdots \\
p_{1,N}(t) & p_{2,N}(t) & \cdots & p_{N,N}(t)
\end{pmatrix},
\qquad
Q(t) = \begin{pmatrix} q_1(t) \\ q_2(t) \\ \vdots \\ q_N(t) \end{pmatrix},
\]

\[
P(t+\Delta t) = f_M\big(Q(t), P(t)\big),
\qquad
Q(t+\Delta t) = g_M\big(P(t+\Delta t), Q(t), P(t)\big),
\]

where each entry is itself a state-probability vector,

\[
p_{i,j}(t) = \begin{pmatrix} \Pr(y_{i,j}(t)=S) \\ \Pr(y_{i,j}(t)=D) \\ \Pr(y_{i,j}(t)=A) \\ \Pr(y_{i,j}(t)=R) \end{pmatrix},
\qquad
q_i(t) = \begin{pmatrix} \Pr(x_i(t)=S) \\ \Pr(x_i(t)=A) \\ \Pr(x_i(t)=I) \end{pmatrix},
\]

with the components of each \(p_{i,j}(t)\) summing to 1 and the components of each \(q_i(t)\)
summing to 1. Here \(x_i(t)\) is the status value of vertex i at time t, and \(y_{i,j}(t)\) is the
status value of edge i→j at time t.

The network topology follows the characteristics of a complex network, e.g., (1) the node
degrees follow a power law,

\[
\Pr\Big(\textstyle\sum_j u(p_{i,j}) \ge l\Big) \sim l^{-d},
\qquad
u(p_{i,j}) =
\begin{cases}
1, & \text{if } \exists\, t:\ \Pr\big(y_{i,j}(t) \neq \text{null}\big) > 0, \\
0, & \text{otherwise},
\end{cases}
\]

where d is typically in the range 2 to 2.5; and (2) the small-world phenomenon, with clustering
coefficient

\[
C = \Pr\big(u(p_{j,k})=1 \,\big|\, u(p_{i,j})=1,\ u(p_{i,k})=1\big) \ge C_{TH},
\]

where C_TH is typically around 0.2.

In this definition, each entry of the transition/relation matrix P and the vertex status vector Q
represents one type of relationship channel between nodes (i.e., people). Each relationship
channel can be a topic relationship, a type of communication medium, etc. By modeling the
multiple channels in the above DPCN model, the network can then be further studied under
constraints, e.g., the capacity of each channel, whether that capacity reflects a time or resource
constraint of each node or of each edge.
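
To make the state machinery concrete, the following Python sketch simulates a single DPCN-style
channel on a toy directed graph, using the S-A-I node states and S-D-A-R edge states described
above; the transition probabilities are arbitrary placeholders rather than values estimated in
[Lin07].

# Sketch: toy simulation of one DPCN channel. Node states: S, A, I; edge states: S, D, A, R.
# Transition probabilities (p_activate, p_inform, p_remove) are arbitrary placeholders.
import random

def simulate_dpcn(edges, seed_nodes, steps=20, p_activate=0.3, p_inform=0.5, p_remove=0.1):
    """edges: list of directed (u, v) pairs; seed_nodes: nodes initially Informed."""
    nodes = {u for e in edges for u in e}
    node_state = {n: "I" if n in seed_nodes else "S" for n in nodes}
    edge_state = {e: "S" for e in edges}

    for _ in range(steps):
        for (u, v) in edges:
            # An edge is triggered (becomes Dormant) once its start node is Informed.
            if edge_state[(u, v)] == "S" and node_state[u] == "I":
                edge_state[(u, v)] = "D"
            # A Dormant edge may become Active, i.e., start carrying the information.
            elif edge_state[(u, v)] == "D" and random.random() < p_activate:
                edge_state[(u, v)] = "A"
            # An Active edge may inform its end node, and may eventually be Removed.
            elif edge_state[(u, v)] == "A":
                if node_state[v] == "S" and random.random() < p_inform:
                    node_state[v] = "A"  # v becomes Active (aware of the topic)
                if random.random() < p_remove:
                    edge_state[(u, v)] = "R"
        # Active nodes settle into the Informed state at the end of the step.
        for n in nodes:
            if node_state[n] == "A":
                node_state[n] = "I"
    return node_state, edge_state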

In Year 1, we shall investigate the large data set we have collected using this DPCN model and
then derive theories and algorithms to describe the channels. In the following years, we shall
further investigate how people allocate their capacities across different types of channels, and
consider the capacity issues from the macro network viewpoint.

Validation Approach
We shall use the data collected in the SmallBlue system to conduct research and validate models.

Summary of Military Relevance


This study will help the Army understand how people handle their social relationships and how
people prioritize relationships when there are multiple channels of interaction. Through intensive
study and observation of how people interact with information, this study shall also help the
Army understand, e.g., people's capacity for handling information across these different channels
of interaction. Given that every human has only limited resources and time, which relationships
are the most important to keep, and which channels are the most effective? How can the
importance of relationships and the effectiveness of channels be ranked, especially from a
personal perspective? With a better understanding of these questions, network-centric decision
making and information routing can better suit Army personnel's needs.

Research Products
Study results and models will be derived, tested, and published.

8.5.9 Linkages with Other Projects


Project S1 is related to Projects I1, I2, and I3; to the CNARC project on communication sensor
network data collection and processing, which will also utilize the output of our Network Capital
measurements; and to the IRC projects on social networks, economic analysis, and joint research
on creating a system for monitoring and processing information about the social network of
researchers involved in the NS CTA. It is also related to the two CCRIs.

The SmallBlue system, a mature, product-level social network and information network analysis
platform, can be an important experimental vehicle for the entire CTA. To foster collaboration
with all other CTA projects, the SmallBlue platform shall gradually evolve to include Internet
data processing and collection capabilities, as well as direct handling of communication network
data.
This project provides important insight into deep human behavior, interests, and intentions,
based on its unique capability to thoroughly capture multi-faceted data about people. To the best
of our knowledge, no other academic or industrial research institute has been able to capture such
detailed information about people. Such understanding shall provide a foundation for all other
CTA projects.

IPP Tasks Linkage

S1.1 ↔ S2: System requirements of adversary network analysis
S1.1 ↔ S3: System requirements of cognitive network analysis
S1.3 ↔ I1.2: Video and image analysis to infer the relationships of people
S1.1 ↔ I1.1: General information network gathering and processing requirements
S1.1 ↔ I3.2: Spatio-temporal aspects of stream data to be considered in the infrastructure study
S1.1 ↔ I2: Requirements of graph processing systems
S1.1 ↔ CNARC: Requirements of handling communication sensor network data
S1.1 ↔ IRC: System requirement research for large-scale social, information, and communication networks
S1.1 ↔ T1.2.a: Information and social network activities for trust research on provenance
S1.2 ↔ CNARC: Quality of social networks
S1.3 ↔ E3.2: Multiple relationships of people for co-evolutionary networks of social, information, and communication
S1.3 ↔ T3.1: Trust relationships of people for distributed oracles
S1.3 ↔ CNARC: People's capacity in handling multiple relationships

8.5.10 Collaborations and Staff Rotations


Project S1 will have very close collaboration with INARC, IRC, CNARC, and other team
members in SCNARC. The Project Leads of S1 (Ching-Yung Lin), I1 (Charu Aggarwal), and
I2 (Xifeng Yan) all originally worked on the same IBM team for the DoD Distillery project at
IBM. The Task Lead of I3.2 (Spiros Papadimitriou) is a primary leading researcher of Task S1.1.
The Task Lead of I1.2 (Thomas Huang) was the thesis advisor of the Task S1.3 Lead (Zhen Wen).
The Project Lead of I2 (Xifeng Yan) was on the SmallBlue team until December 2008. There
have been significant collaborations between Project S1 and INARC.



Project S1 has been collaborating with the CCRI Task T1.2.a Lead (Mudhakar Srivatsa) on
provenance research since June 2009. We have also been collaborating with Christos Faloutsos
and Lada Adamic of the INARC for years. We shall strengthen our close collaborations with
these INARC researchers.

Since late 2008, the key researchers in Project S1 have been working closely with other team
members in the SCNARC to create the Center proposal. There has been tremendous mutual trust
and many interesting discussions between the teams, so strong collaborations are expected in the
future. In particular, RPI and IBM have a very strong collaboration history. Many key IBM
researchers, including the current head of the worldwide IBM Research organization, are RPI
alumni. The collaboration between IBM and New York State universities (e.g., CUNY and NYU)
is especially supported by the New York State Government. IBM welcomes visiting students and
researchers of the NS Consortium to come to IBM to conduct research on anonymized data inside
the firewall.

In addition, we are building new collaborations with SCNARC researchers from Northeastern,
Northwestern, Indiana, MIT, and Notre Dame. We are also initiating new collaborations with
IRC and CNARC researchers. For instance, we shall collaborate with IRC researchers Kathleen
Carley on social network analysis systems, Jaideep Srivastava on social/information network
search and recommendations, and David Parkes on the economic impact of networks, and with
CNARC researcher Prasant Mohapatra on communication network issues related to social/
information networks. We look forward to building up more collaborations in the years to come.

Project S1 will provide a researcher (with a Ph.D. degree) to the NS center every other year.

8.5.11 Relation to DoD and Industry Research


Project S1 is related to the DoD Distillery project, a multiyear $40M research program
awarded to IBM Research by the Maryland Procurement Office. This award ended in 2009.
Distillery is a "Grand Challenge" research collaboration aimed at creating the technology for a
new era of data-driven computing. As the amount of information potentially available to
enterprises and government organizations rises, the need to extract knowledge from the data
efficiently and effectively will become more acute. Since there will always be too much raw data
to study and too much information to analyze, it will be necessary to change focus and priority,
to introduce new algorithms and analysis techniques, and to accommodate changes in the
technology and information content, in a continuously operating dependable system. Distillery is
establishing a new paradigm of computing, with requirements different from those of classic
high-end transaction processing, database management, web serving, and scientific computing.
Systems must be arbitrarily scalable in many dimensions, able to change and improve at all
levels of granularity, useful for a variety of organizational purposes by numerous people, and
capable of safeguarding the confidentiality and privacy of information. The project seeks
breakthroughs rather than incremental progress. Since 2008, this system has been converted into
a commercial product, IBM InfoSphere Streams, which has been provided to customers for
real-time, large-scale stream data analysis.



Project S1 is based on IBM's internal SmallBlue project, which is provided to customers as IBM
Lotus Atlas. Internally, IBM's Global Business Services division and IBM's Advanced Learning
Institute have sponsored the internal projects. These divisions provide use cases, requirements,
real users, and experimental platforms for research. After research results mature, IBM's Lotus
software division converts the research prototypes into hardened, robust, product-level Atlas
software services.

8.5.12 Project Research Milestones

Research Milestones

Q2, Task S1.1: Report on the design of a set of benchmarks and use cases to evaluate
infrastructure needs for social network data storage and processing.

Q2, Task S1.2: Report on the design of randomized experiments to show the causal relationship
between network structure and performance; process data for those who left the firm after the
layoff, as well as their network usage.

Q2, Task S1.3: Prepare, process, and index heterogeneous data about people based on time units.

Q3, Task S1.1: Begin designing and experimenting with a prototype infrastructure to store and
query social network graphs; derive the network features that need to be preserved under
network sampling.

Q3, Task S1.2: Collect social network and performance data during the randomized experiments
to infer causal relations (the data collection process should take at least 6 months). Report on
how workers under duress activate ties and whether there is any discernible pattern.

Q3, Task S1.3: Report on the accuracy of the Dynamic Probabilistic Complex Network model of
human behaviors.

Q4, Task S1.1: Report on system design issues in moving from homogeneous graphs (single type
of nodes and edges) to dynamic large-scale heterogeneous graphs; report on the relationships
between sampling rates and the network features that can be preserved.

Q4, Task S1.2: Extend the recommendation system for SmallBlue so that it can make beneficial
recommendations, in order to start observing the impact on performance and productivity after
system intervention; report on the observations. Derive algorithms to determine whether it is
possible to predict the success of finding new jobs inside the company based on the patterns of
social networks.

Q4, Task S1.3: Derive models that incorporate people's capacity constraints into the way they
interact with each other.

Budget By Organization

Organization          Government Funding ($)    Cost Share ($)
CUNY (SCNARC)                 32,557
IBM (SCNARC)                 329,804
IU (SCNARC)                   11,000
MIT (SCNARC)                  14,009
MIT (SCNARC)                  40,000
ND (SCNARC)                   11,000
NEU (SCNARC)                  96,969
NWU (SCNARC)                  14,000
NYU (SCNARC)                  40,000
RPI (SCNARC)                  45,921                8,383
TOTAL                        635,260                8,383

References

[Aral07] S. Aral, E. Brynjolfsson, and M. Van Alstyne. Productivity effects of information diffusion
in networks. In Proceedings of the 28th Annual International Conference on Information
Systems, Montreal, CA, 2007.
[Borgatti03] S. Borgatti and P. Foster. The network paradigm in organizational research: A review
and typology. Journal of Management, 29(6):991--1013, 2003.
[Burt92] R. Burt. The Social Structure of Competition. 1992.
[Castilla05] Castilla, E. Social networks and employee performance in a call center. Amer. J. of
Sociology 100(5) 1243-1283, 2005.



[Chang06] Fay Chang, Jeffrey Dean, Sanjay Ghemawat, Wilson C. Hsieh, Deborah A. Wallach, Mike
Burrows, Tushar Chandra, Andrew Fikes, and Robert E. Gruber, "Bigtable: A Distributed Storage
System for Structured Data," OSDI 2006.
[Cummings03] J. Cummings and R. Cross. Structural properties of work groups and their
consequences for performance. Social Networks, 25(3):197--210, July 2003.
[Dean04] Jeffrey Dean, Sanjay Ghemawat, "MapReduce: Simplified Data Processing on Large
Clusters", OSDI 2004.
[Granovetter73] Granovetter, M. "The strength of weak ties." American Journal of Sociology, 78(6):
1360-1380, 1973.
[Ghemawat03] Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, "The Google File System",
SOSP 2003.
[Hadoop09] Apache Hadoop http://hadoop.apache.org/
[Krackhardt87] Krackhardt, D. Cognitive social structures. Soc. Networks. 9 109-134, 1987.
[Lin07] C.-Y. Lin. Information flow prediction by modeling dynamic probabilistic social network. In
International Conference on Network Science, New York, May 2007.
[Lucene09] Apache Lucene http://lucene.apache.org/
[Lin08] C.-Y. Lin, K. Ehrlich, V. Griffiths-Fisher, and C. Desforges. Smallblue: People mining for
expertise search. IEEE Multimedia Magazine, Jan.-Mar. 2008.
http://smallblue.research.ibm.com.
[Papadimitriou08] S. Papadimitriou and J. Sun. Disco: Distributed co-clustering with map-reduce. In
IEEE International Conference on Data Mining, Pisa, Italy, December 2008.
[Singla08] P. Singla and M. Richardson. Yes, there is a correlation – from social networks to
personal behavior on the web. In WWW Conference, pp. 655-664, Beijing, April 2008.
[Smith05] Smith, S. S. "Don't put my name on it": Social capital activation and job-finding
assistance among the black urban poor. Amer. J. Sociology. 111 1-57., 2005.
[Teevan05] J. Teevan, S. T. Dumais, and E. Horvitz. Personalizing search via automated analysis of
interests and activities. In ACM SIGIR, pp. 449-456, Salvador, Brazil, August 2005.
[White09] R. W. White, P. Bailey, and L. Chen. Predicting user interests from contextual
information. In ACM SIGIR, pp. 363-370, Boston, July 2009.
[Wu09] L. Wu, C.-Y. Lin, S. Aral, and E. Brynjolfsson. Network structure and information worker
productivity: New evidence from the global consulting services industry. In Proceedings of
Winter Information Systems Conference, University of Utah, Salt Lake City, UT, February 2009.



8.6 Project S2: Adversary Social Networks: Detection, Evolution,
and Dissolution

Project Lead: M. Magdon-Ismail, RPI


Email: magdon@cs.rpi.edu, Phone: 518-276-4857

Primary Research Staff:
S. Adali, RPI (SCNARC)
A. Barabasi, NEU (SCNARC)
T. Brown, CUNY (SCNARC)
M. Goldberg, RPI (SCNARC)
G. Korniss, RPI (SCNARC)
D. Lazer, NEU (SCNARC)
C. Lim, RPI (SCNARC)
M. Magdon-Ismail, RPI (SCNARC)
B. Szymanski, RPI (SCNARC)
Z. Toroczkai, ND (SCNARC)
W. Wallace, RPI (SCNARC)
Z. Wen, IBM (SCNARC)

Collaborators:
K. Carley, CMU (IRC)
N. Chawla, ND (SCNARC)
M. Faloutsos, UCR (IRC)
J. Han, UIUC (INARC)
J. Hendler, RPI (IRC)
C. Lin, IBM (SCNARC)

8.6.1 Project Overview


Adversary Networks are social communities that do not advertise themselves; they may be an
informal spontaneous gathering into a group, a coalition, or a movement, or they may be
organized, planning some activity (typically malicious). The goal of this project is to identify the
communities, understand how they evolve and relate to each other and to develop a theory of
how community structure and heterogeneities in social networks together with trust relationships
affect the flow of information through the network, through trusted and non-trusted communities.
Social networks are the unseen global conduits of information relying on the underlying
communication and information network infrastructure. A fundamental understanding of this
community structure and how information flows through it is therefore one of the important
questions at the forefront of network science. The particular challenges are in developing a
theory for discovering overlapping communities and tracking their evolution; in understanding
the relationships between these communities, including adversary (non-trusted) versus trusted;
and in developing a theory of how information flows through such a structure. There are
significant linkages between this project, the Trust CCRI, and the projects in INARC and
CNARC, since the dynamics of the information and communication networks will influence the
formation, structure, and dynamics of social networks, and vice versa. Further, we will also
consider the following fundamental research question: What are the efficient
strategies and trade-offs for attacking and disintegrating adversarial communities with hostile,
extremist, and/or militant ideologies? Our long-term objective is to develop generically
applicable frameworks and computational methods for extracting individual- to community-level
behavioral patterns from the underlying social networks. In particular, by implementing
stochastic agent-based models for opinion formation on empirical networks, we will develop
models with predictive power, applicable to networks of various scales. As the availability of
information affects social interactions, and likewise, the integrity of the communication
infrastructure impacts social interactions, our project will rely on models and methods developed
by the INARC, CNARC, and IRC.

8.6.2 Project Motivation

Challenges of Network-Centric Operations


Network centric operations occur in social networks with information flowing to and from
military decision makers. In order to understand the capabilities of a social network, it is
necessary to understand the structure and evolution of its inter-related communities. One
immediate challenge is that the social networks are not observable. All we can observe are the
behaviors of the individuals on the underlying communication and information networks. Based
on this behavior, it is thus necessary to build an understanding of the underlying social networks,
both adversary and non-adversary, which are consistent with the observed behavior. This is the
challenge addressed by this project: how can we understand the structure, relationships and
evolution of the community structure in a social network, based only on minimal information
regarding the behavior of the individuals in these networks (interactions between individuals).
The challenge becomes more apparent when one observes that the observed interactions are vast
and random in nature.

Example Military Scenarios


As already mentioned, Adversary Networks are social communities that do not advertise
themselves; they may be an informal spontaneous gathering into a group, a coalition, or a
movement, or they may be organized, planning some activity (typically malicious). Examples
which highlight the importance of studying the detailed community structure of a social network,
including all its adversary networks are: (1) A hidden group that uses vast background
communications to mask its own communications. The hidden group communications may be
related to planning some malicious activity; (2) an initial (benign) social network that has been
functioning for some time. Suddenly some malicious group starts communicating with some of
the members of the social network, building some support. Gradually this support grows as the
adversary group starts becoming part of the original network and influencing its evolution; (3) an
idea (piece of information) or ideology that begins to spread through the communication
network, creating a group of individuals who are spontaneously forming a movement. Will this
movement grow or fizzle out? (4) A piece of information has been received from a member of
adversary network X regarding the operations of adversary network Y. What is the observed
relationship between networks X and Y? Should the information be trusted? Further, our
individual-based models for opinion formation will provide an array of strategies for "what if"
scenarios that will enable us to answer such questions as where and how to defend network
communities with neutral or tolerant "ideologies" against militant infiltration and, conversely,
where and how to attack and disintegrate adversarial communities exhibiting hostile, extremist,
and/or militant ideologies.

For example, irregular warfare (IW) is a complex, ambiguous, and inherently social phenomenon:
"insurgency and counterinsurgency operations focusing on the control or influence of populations,
not on the control of an adversary's forces or territory" (IW Joint Operating Concept, 2007).
Likewise, "an operation that kills five insurgents is counterproductive if collateral damage leads
to the recruitment of fifty more insurgents" (COIN). To help operations in such an environment,
in this project we will develop individual-based models to investigate social influencing and
associated strategies in weighted social networks. Our methods and models for community
detection, community stability, and social influencing will be applicable to any data sets,
spanning vast scales, including those collected by the military.

Impact on Network Science


Social network science aims to build theories to model and predict properties of social networks.
A first and fundamental building block of any such network science is the ability to understand
the evolving community structure of existing and functioning social networks. This project aims
at developing the methods to understand the underlying structure of general networks of varying
natures and scales based on observable interactions. The project will develop the statistical
network science methods (which will bring together statistics and graph theory) necessary to
build such an understanding. The ability to understand evolving network structure will feed into
models of formation, stability, dynamics and network evolution, to name a few of the tasks under
the EDIN CCRI of this project. In addition, it will allow us to describe the adversary versus non-
adversary networks, a topic which should be of fundamental to the Army, especially from the
viewpoint of social corroboration of intelligence, which should be essential to decision making
processes involving asymmetric threats. Further, community structure and its evolution will have
consequences on measures of trust; in turn trust will have a big impact on how we identify the
communities, their relationships and internal structure. Interactions are based on information
exchange via communication, and so the evolution of the information networks and
communication networks will play a significant role on community structure evolution (input
from INARC and CNARC) and in turn, community structure will drive information and
communication network evolution (output to INARC and CNARC).

In conclusion, being able to identify the hidden networks and communities (some adversary and
some supportive) in a social network is a fundamental task to building a theory of network
science. Sound methods for this project would thus lay one of the foundations for much research
in the field.



8.6.3 Key Project Research Questions
In general communities are hidden or informal, and many adversary communities actively
attempt to remain hidden within a larger casual social network. In many cases, communities are
autonomously formed with individuals not necessarily aware of the communities they have
implicitly become members of.

The broad research questions which we address in this project include:

Research Question 1. How can we identify communities in a dynamic social network, especially
hidden and informal groups, based on measurable interactions between the members, but
without looking at details of the interactions? For example, if the interactions are
communications, can we understand the underlying social structure of the network by only
looking at the communication dynamics of the network, and not the communication
contents?
Research Question 2. How do the communities relate to each other in terms of membership,
trust and opposition or support?
Research Question 3. How do we discover the evolution and the stable cores of communities,
and identify anomalous communities or groups, some of which may be adversary?
Research Question 4. How do the relationships between communities evolve? For example, do
intersections tend to grow before a merger? Do the relationships between large communities
evolve in a similar way to the relationships between small communities?
Research Question 5. How does information flow within such communities and between
communities?
Research Question 6. How do we identify communities in social networks which manifestly
emerge as the result of communication and information flowing across the links (S2.3.1)? In
a military setting, these communities correspond to adversarial communities.
Research Question 7. How does the frequency of communication across the links (edges) affect
the emerging community structure (S2.3.1)?
Research Question 8. How do we influence/dissolve communities in social graphs (S2.3.2)? In a
military context, the answers shall yield ways to disintegrate adversarial communities with
hostile, extremist and/or militant ideologies.

In general, we would like to address these research questions in a general and scalable way, so
that they may apply to diverse networks ranging into the tens of millions of nodes. Thus, we
envision validating all conclusions on real, stochastic networks ranging from thousands to
millions of nodes. Traditionally, social science research involving detailed investigations of
community structure has either not taken the overlap of communities into account or has applied
only to very small networks.

8.6.4 Initial Hypotheses

The technical path to achieving these research goals will require innovations in a number of
interlinked areas, which naturally leads to the initial hypotheses we will investigate:



Hypothesis 1 Any definition of a community should satisfy certain minimal "community
axioms". In particular, communities in social networks overlap; communities are local
structures which display more introvertedness than extrovertedness in terms of observed
interactions. We propose efficient algorithms to discover communities satisfying such a
definition, and we will study whether there are general definitions of introvertedness which
may be applied to a variety of networks, versus specific definitions which apply only to
particular networks.
Hypothesis 2 Community evolution can be stitched together based on the community structure
over different time steps. This would be based on a definition of community evolution (input
from EDIN CCRI) and efficient algorithms for matching community structures to identify
when a community has grown, shrunk, split into two or more communities or died.
Hypothesis 3 Trust and community structure interrelate; trust is one of the essential building
blocks of strong communities. Measures of trust can be used to further refine the definition of a
community (input from the Trust CCRI); in addition, the discovered community structure can
help to further refine measures of trust. Thus, through a process of bootstrapping, we can
develop better trust measures and in turn better community structure.
Hypothesis 4 The relationships between communities can be discovered through (i) the overlap
between the communities; and, (ii) the interactions between nodes of the different
communities. In particular, the prolonged intense interactions and regularity of interactions
can be measured and used to identify detailed internal community structure as well as inter-
community relationships. Further refinements may be obtained via more detailed analysis of
the interactions (e.g. the text of the communications). In this way, we may distinguish the
relationships between communities (support versus adversary) and the internal structure of
communities (leaders, bridgers, bonders).
Hypothesis 5 It is possible to develop initial community structure, and relationships, purely using
statistical interaction data (without detailed semantics of the interactions). This is the only
feasible approach to make inferences on the community structure for large scale networks.
We believe that individual-based models for opinion dynamics can be effectively used to
detect and identify communities in social graphs (S2.3.1).
Hypothesis 6 Based on the emerging community structure, influencing a small number of
selected individuals with critical positioning in the social network will be sufficient and will
provide an efficient way to ideologically dissolve or disintegrate adversarial or hostile
communities (S2.3.2).

8.6.5 Technical Approach


Overview
The basis of this work is to use measurable interactions between members of the social networks
as the lens for understanding communities and information flow. There is considerable prior
work on identifying community structure, predominantly in a static network, focusing on non-
overlapping communities [Clauset04, Newman04a, Newman04b]. Neither static networks, nor
restriction to non-overlapping community structure is relevant to dynamic social networks. This
project will build from initial work [Baumes05, Baumes07a, Baumes07b, Baumes08,
Goldberg08a, Goldberg08b] on discovering overlapping community structure, and how it can be
used to model network evolution, and information flow both when the social network is normally
operating and under stress conditions [Hui08a, Hui08b, Magdon-Ismail05, Magdon-Ismail06].
The main innovation which we propose to develop is that, unlike SIR type models studied in
mathematical epidemiology for infection spread, information flow is an active process, requiring
an agent based model. We will build agent based axioms for information flow, which derive
from the underlying community and structure of the social network. We will test our methods for
understanding community structure within a software system SIGHTS [Baumes07a] which will
provide a framework for general analysis of social network structure and evolution. In particular,
the methods and conclusions of this work would be available to other core areas of the CTA with
the specific goals of understanding how changes in the social network drive changes and
dynamics of the underlying information and communication networks, and in turn how changes
in the information and communication networks drive the evolution of the social network.
Most traditional methods to find community structure utilize various forms of hierarchical
clustering, spectral bisection [Scott00, Newman06, Newman05, Wu04], clique optimization
[Palla06], or iterative edge removal [Newman04]. In contrast, in this project we will utilize an
array of individual-based models for opinion dynamics, in which, during the evolution of the
system, communities explicitly manifest themselves as meta-stable clusters of shared stylized
opinions (e.g., religions or cultures). Tracking the evolution of the system not only allows us to
identify underlying community structures [Blatt96, Reichardt04, Lu09, Kim09, Cai05], but also
enables us to find agents with critical roles and answer such questions as how to attack and
disintegrate adversarial communities exhibiting hostile, extremist, and/or militant ideologies.
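
As a hedged illustration of how communities can manifest through opinion dynamics (a simplified
stand-in for the stochastic agent-based models described above, not the project's model), the
following sketch runs a majority-rule opinion update on a weighted graph; nodes that converge to
the same opinion form candidate community clusters.

# Sketch: majority-rule opinion dynamics on a weighted graph; clusters of agreeing
# nodes after the updates serve as candidate communities. Parameters are placeholders.
import random
from collections import defaultdict

def opinion_communities(adj, n_opinions=4, steps=50, seed=0):
    """adj: dict node -> {neighbor: weight}. Returns dict opinion -> set of nodes."""
    rng = random.Random(seed)
    opinion = {n: rng.randrange(n_opinions) for n in adj}
    for _ in range(steps):
        order = list(adj)
        rng.shuffle(order)
        for n in order:
            # Adopt the opinion with the largest total edge weight among neighbors.
            weight_by_opinion = defaultdict(float)
            for v, w in adj[n].items():
                weight_by_opinion[opinion[v]] += w
            if weight_by_opinion:
                opinion[n] = max(weight_by_opinion, key=weight_by_opinion.get)
    clusters = defaultdict(set)
    for n, o in opinion.items():
        clusters[o].add(n)
    return dict(clusters)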

To address the key research questions of this project, we define two tasks. In the first task, S2.1
Detection of Hidden Communities and their Structures, we use interaction data over time to build
a picture of community structure and community evolution, including information pathways and
inter-community relationships. This is an appropriate first step because it aims at understanding
the core of social networks. In the second task, S2.2 Information Flow via Trusted Links we build
agent-based models to study how information pathways are affected by the varying degrees of
trust between individuals and communities in heterogeneous networks which contain adversary
(non-trusted) as well as non-adversary (trusted) networks.

8.6.6 Task S2.1: Detection of Hidden Communities and their Structures (M.
Magdon-Ismail, RPI (SCNARC); M. Goldberg, RPI (SCNARC); B.
Szymanski, RPI (SCNARC); W. Wallace, RPI (SCNARC); D. Lazer, NEU
(SCNARC); Z. Wen, IBM (SCNARC))

Task Overview
The long-term goal addressed by this task is to understand all the communities and their
evolution in large functioning social networks - the hidden and informal ones in addition to the
self-advertising ones. This is the first step toward understanding the community structure in a
dynamic social network.

Task Motivation
Challenges: As discussed above, social networks are typically not seen; however, the random,
statistical interactions are observed, and based on these interactions, we would like to understand
the evolving community structure, as well as the relationships between communities, including
adversary versus non-adversary.
Military Scenarios: Corroborating information in asymmetric warfare requires an understanding
of the landscape of adversary networks. Hence an understanding of which networks are
adversary and of the community structure in general is relevant to Military operations.
Impact on Network Science: Social network science aims to build theories to model and predict
properties of social networks. Most work has focused on static network using methods for
finding communities which require the groups to be non-overlapping. We need methods for
finding the community structure which allows communities to overlap, and to make them
efficient, these methods should use only statistical interaction data. The theory we will develop
in this project will be the first step in building the models of dynamic networks proposed in the
EDIN CCRI of this IPP.
Key Research Questions
We can decompose this long term goal into a set of smaller main research questions:

Research Question 1. How can we identify communities and internal community hierarchy,
including information pathways, stable and unstable points?
Research Question 2. How can we identify leaders and influential members in communities?
Research Question 3. How can we build an initial understanding of the relationships between
communities, for example opposition versus cooperation?
Research Question 4. How can we track the evolution of these communities and their
relationships over time, including their stable cores?

Initial Hypotheses
It is possible to discover community structure using statistical graph theoretical analysis of the
interaction data. Deeper analysis can lead to internal structure of the communities and overlap
and interactions between members of different communities can discover relationships between
communities. Identifying communities over successive time steps, together with matching
communities in successive time steps can lead to identifying the evolution of communities.
Prior Work
There is considerable prior work on identifying community structure, predominantly in a static
network, focusing on non-overlapping communities [Clauset04, Newman04a, Newman04b].
Neither static networks, nor restriction to non-overlapping community structure is relevant to
dynamic social networks. This project will build from initial work [Baumes05, Baumes07a,
Baumes07b, Baumes08, Goldberg08a, Goldberg08b] on discovering overlapping community
structure, and how it can be used to model network evolution. Our work focuses on discovering
overlapping communities from vast random interactions.
Technical Approach
The basis of our approach is to use measurable behaviors of individuals to identify local
interactions between pairs of nodes. Though our methods will be general, the initial basis will be
dyadic communications and information exchange. The first step will be to give a precise and
formal definition of a community, which satisfies a set of minimal requirements:
1) It should be a local definition.
2) It should allow groups to overlap.
3) Groups can be heterogeneous in the same social network.
4) It conforms to an intuitive notion that a group should be more intense internally versus
externally.
5) It has some measure of non-improvability -- i.e. at a minimum, moving one node does not
locally improve the situation -- local optimality.

For example, the very popular methods in [Clauset04, Newman04a, Newman04b] fail to satisfy
the most basic property 1) above. Given such a definition we will develop algorithms which find
such groups whose interactions display a notion of persistence of the locally intense
communication flow between members of the community; in building these notions, it is
imperative that social communities can overlap. Accordingly, we will develop a theory for
identifying overlapping communities in social networks whose members are connected by
patterns of persistent locally intense communication. More specifically, we define a notion of
density E(G) for a group G, and propose that
1. E(G) is a local measure which captures the notion of intensity.
2. G should be locally optimal with respect to E.
Given such a definition, the task will be to find all such G which persist; to identify overlaps
(relationships) which are significant; to identify the evolution and stable cores; and to identify
the internal hierarchy.
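
A minimal sketch of the locally optimal density idea described above (an illustration under
assumptions, not the project's final algorithm): starting from a seed set, nodes are greedily added
or removed while a simple density measure E(G), internal edge weight relative to internal-plus-
boundary weight, improves; because seeds are grown independently, the resulting groups may
overlap.

# Sketch: greedy local optimization of a simple density measure E(G) to find one
# locally optimal (possibly overlapping) community around a seed set.
def density(group, adj):
    """E(G): internal edge weight divided by internal plus boundary edge weight."""
    internal = boundary = 0.0
    for u in group:
        for v, w in adj[u].items():
            if v in group:
                internal += w
            else:
                boundary += w
    total = internal + boundary
    return internal / total if total > 0 else 0.0

def grow_community(seed, adj):
    """Greedily add frontier nodes (or drop non-seed members) while E(G) improves."""
    group = set(seed)
    improved = True
    while improved:
        improved = False
        frontier = {v for u in group for v in adj[u]} - group
        for candidate in list(frontier) + list(group - set(seed)):
            trial = group | {candidate} if candidate not in group else group - {candidate}
            if density(trial, adj) > density(group, adj):
                group, improved = trial, True
    return group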

Our approach is statistical in that we do not appeal to the semantic contents of the interactions.
In particular, this will allow our methods to be applied to massive networks. Additionally, we
have to develop measures of statistical significance and reliability of the discovered
communities, and community membership. Such measures can themselves be useful for Army
decision making.
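As a hedged illustration of what such a significance measure could look like, the sketch below scores a
discovered group against a degree-preserving null model; the z-score criterion and the number of rewired
replicates are assumptions for illustration only, and any group-quality measure (for example, the density
function in the previous sketch) can be plugged in.

import statistics
import networkx as nx

def significance(G, group, quality, n_null=100, seed=0):
    """Z-score of quality(G, group) against degree-preserving rewirings of G."""
    observed = quality(G, group)
    null_scores = []
    for i in range(n_null):
        R = G.copy()
        nx.double_edge_swap(R, nswap=2 * R.number_of_edges(), max_tries=10**5, seed=seed + i)
        null_scores.append(quality(R, group))
    mu = statistics.mean(null_scores)
    sigma = statistics.pstdev(null_scores) or 1e-9   # avoid division by zero
    return (observed - mu) / sigma

Applied with the density measure above, significance(G, community, density) would flag groups whose
internal intensity is unlikely under a degree-matched random network.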

Based on our ability to understand community structure, the next task we address is to
understand community evolution; in particular, we will develop methods to understand when a
community has changed its composition or structure, in turn identifying its stable cores which
persist over the long time scale; when it has grown substantially; when it has died; and, when it
has split into two or more sub-communities. The fundamental building block here will be an
understanding of how to match communities to identify similarities and dissimilarities. One
dimension of the relationship between communities is adversarial versus cooperative, and how
that changes with time. Our approach to understanding the finer structure within the
communities, such as role, hierarchy, and adversarial versus cooperative relationships, will be based on
three main approaches: (i) statistical communication content analysis (recursive data mining) for role
identification; (ii) statistical communication pattern analysis for topic detection, and (iii) social
network representation based on bi-colored edges representing opposition versus support for
improved community detection. Topic and sentiment analysis are already well studied, and will
not be the topic of our research; we will adapt existing research to fit our goals (see for example
[Pang05, Taboda04, Bethard04]).
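The sketch below illustrates one simple way such community matching could be operationalized:
communities in successive snapshots are linked by Jaccard overlap, and events (birth, growth, merge,
death) and stable cores are read off from the matches. The overlap threshold and the event labels are
illustrative assumptions, not the project's validated criteria.

def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if (a | b) else 0.0

def match_communities(prev, curr, threshold=0.3):
    """Label each current community by its overlap with the previous snapshot's communities."""
    events, matched = [], set()
    for c in curr:
        parents = [i for i, p in enumerate(prev) if jaccard(p, c) >= threshold]
        matched.update(parents)
        if not parents:
            events.append(("birth", c))
        elif len(parents) == 1:
            label = "growth" if len(c) > len(prev[parents[0]]) else "continuation"
            events.append((label, c))
        else:
            events.append(("merge", c))
    events += [("death", p) for i, p in enumerate(prev) if i not in matched]
    return events

def stable_core(snapshots):
    """Members present in every snapshot of a tracked community (its stable core)."""
    core = set(snapshots[0])
    for members in snapshots[1:]:
        core &= set(members)
    return core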

Validation Approach
We will test all our methods on real data from multiple networks at multiple scales. Currently, the
data available to us are email networks from the Enron email corpus and the IBM SmallBlue
project, the LiveJournal blog network, and the Twitter communication network. These networks are
large functioning networks of different natures, varying in scale from hundreds of thousands to
millions of nodes, and will serve as robust tests for the models. In particular, we will run our
algorithms on these networks, for which we have already been collecting data, and we will test the
validity of the discovered communities using external corroboration.

Summary of Military Relevance


This project gets at the root of social networks: their communities and hidden networks.
Understanding the social networks at play in asymmetric warfare is an important goal from the
U.S. Military point of view. For example, social corroboration of intelligence is important in the
Army decision-making process. Understanding the social structure within any social network
requires, as a fundamental first step, an understanding of its communities. Some communities
advertise themselves, such as friendship networks and alliances, but many do not. Knowing the
network of hidden cooperatives and how they relate to each other can only help the Army in its
decision making.

Though the networks we will study and validate on are not of particular interest to the Military,
they are typical social communication networks, which in some cases are very random
and rapidly evolving. They are diverse networks, so our methods will have to be general
enough to apply to such diversity, and thus would apply to networks of importance to the
Military. Further, some of these networks, like the Twitter network, are very harsh environments for
statistical algorithms (vast, very dynamic, and very random), so the ability to discover community
structure in such an environment would indicate the robustness of our algorithms.

Research Products
By the end of the first year, we anticipate that we will develop preliminary methods for hidden
community detection, based on statistical communication patterns, establishing overlap as a
defining property. We anticipate one or more publications and/or reports submitted related to this
topic. Further, we will use these methods to begin to develop an understanding of typical
evolution of communities; in particular, the study of their stable cores, and understanding of
what constitutes anomalous evolution.

8.6.7 Task S2.2: Information Flow via Trusted Links and Communities (M.
Magdon-Ismail, RPI (SCNARC); S. Adali, RPI (SCNARC); M. Goldberg,
RPI (SCNARC); W. Wallace, RPI (SCNARC))
Task Overview
This task aims to understand how information flows through a social network, in particular
through trusted and non-trusted (adversary) communities. Such a model should consider the
social community structure as well as the trust relationships between communities and between
individuals within the same community and within different communities. Over the first year, we
will build preliminary models and perform preliminary validation on simulated as well as realistic
social networks.

Task Motivation
Challenges: Social networks are conduits for information. Any model for information flow must
take into account the community structure, the relationships between communities, and the
dynamics of the community structure. Dynamics of the community structure result in
dynamics of interactions, and interactions are the medium over which information flows.
Thus, we need to build low-level agent based models which capture interaction dynamics during
information flow. These models will involve trust between agents, and so we need to build a
model of information flow which takes into account trust and community structure.
Military Scenarios: How far will an ideology spread in a social network? How valuable is a
piece of information, given the path it took in getting to you (through trusted and non-trusted
communities)? These are issues of interest to Military decision making.
Impact on Network Science: We need new models of agent behavior which consider community
structure, so that we may model how agents disseminate information through a social network. In
addition, by observing information flow through the networks, we can calibrate these models, as
well as infer more details about the underlying social network and trust structure – information
tends to flow along trusted paths, through trusted communities. Such a model would allow us to
study the interplay between trust, community structure and information flow.

Key Research Questions


Understanding the information dynamics of social networks is a very general and long term goal.
We break it down into three shorter-term research questions, as well as some longer-term
research questions:
Research Question 1. How do agents react to information coming in, process it and forward it
along?
Research Question 2. How do the community structure and trust within the social network
(including opposing versus cooperative communities), together with the agent dynamics,
affect the macroscopic properties of information flow through the social network?
Research Question 3. How does information cascade through the social groups via trusted and
un-trusted communities?

Ultimately, we would also like to address the following longer term objectives:

Research Question 4. How can we use our understanding of the information dynamics to
immunize against the diffusion of bad ideologies and enhance the spread of important
information? How should information sources or collection points be distributed so that
information is most efficiently disseminated or collected?
Research Question 5. What are the most important links to maintain at a high trust value for
purposes of information flow?

As can be seen, there will be a significant linkage to the Trust CCRI, which should be expected
because trust is what underlies the formation of communities and so should be relevant both in
detecting community structure and in understanding information flow. Importantly, given the
communication structure and the model for information flow, with trust as an input, we could
"reverse engineer" the trust values consistent with the observed information dynamics –
behaviorally measured trust.

Initial Hypotheses
Agents can be modelled as following a simple automaton for processing information and
interacting during the course of information flow through the social network. This model should
take as inputs the community and trust structure of the network, together with parameters
governing the information flow and agent dynamics. Within this model, we conjecture that
different community and trust structures lead to drastically different information-dynamics
footprints. We will build a scalable, realistic model for analysing such information flow through
communities for large, dynamic networks.

Prior Work
The most relevant prior work consists of our own initial models for social network dynamics and agent-
based processing of information in networks operating under normal and high-stress conditions
[Hui08a, Hui08b, Magdon-Ismail05, Magdon-Ismail06]. The main innovation which we propose
to develop here is that, unlike the SIR-type models studied in mathematical epidemiology for
infection spread, information flow is an active process, requiring an agent-based model. We will
build agent-based axioms for information flow, which derive from the underlying community
and trust structure of the social network.

Technical Approach
Our model will have two main building blocks. First, we will develop the small-scale agent-based
dynamics which ensue when an agent receives an important piece of information to react upon;
the desired reaction may be some action like "retreat", or the desired action may be "forward this
message". In either case, the agent must first decide whether to believe the information; if so, act;
if not, decide whether to seek additional information or simply ignore it. Our model will incorporate this
complexity of agent-based micro-modeling to ensure a realistic model of cognitive agents, based
on the social and cognitive sciences. In particular, we formulate three axioms:

Axiom 1. The value of information changes depending on the path it takes through trusted and
non-trusted communities.
Axiom 2. Agents combine information from different sources depending on the nature of the
information, and on the nature of the action they are asked to perform based on the information.
Axiom 3. Based on the value of the information and internal agent parameters, the agent acts.

These three axioms are parameterized so that they may accommodate a variety of environments.
Some setting of the parameters should apply to any particular social network.

The other input to the model will be the community structure. An inhomogeneous community
structure will lead to inhomogeneous trust relationships. We would then be able to investigate
how community structure influences the information dynamics with trust as a major player.

Our task will then be to understand such agent-based models from both the theoretical and
simulation viewpoints: in particular, (i) what are the important source points and cut points; (ii)
how to immunize against the diffusion of bad ideologies and enhance the spread of important
information; and (iii) with simulation, we will study the impact of communities and
heterogeneous trust relations at the massive scale – million-node networks.
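A minimal sketch of the kind of agent-based cascade implied by Axioms 1-3 is given below. The specific
value-update rule (multiplicative decay along trusted links), the max-combination of sources, and the
per-agent belief thresholds are illustrative parameter choices, not the calibrated model.

from collections import deque

def simulate_cascade(trust, thresholds, seeds, decay=0.9):
    """trust[u][v]: trust of the u->v link in [0, 1]; thresholds[v]: belief threshold of agent v."""
    value = {v: 0.0 for v in trust}            # current value each agent assigns to the item
    believed = set()
    queue = deque()
    for s in seeds:
        value[s] = 1.0
        believed.add(s)
        queue.append((s, 1.0))
    while queue:
        u, incoming = queue.popleft()
        for w, t in trust[u].items():
            arriving = decay * t * incoming    # Axiom 1: value depends on the path taken
            value[w] = max(value[w], arriving) # Axiom 2: combine sources (here: take the best)
            if w not in believed and value[w] >= thresholds[w]:
                believed.add(w)                # Axiom 3: act (here: believe and forward)
                queue.append((w, value[w]))
    return believed

if __name__ == "__main__":
    # Two hypothetical three-agent communities joined by one low-trust bridge (c-d).
    trust = {
        "a": {"b": 0.9, "c": 0.9}, "b": {"a": 0.9, "c": 0.9}, "c": {"a": 0.9, "b": 0.9, "d": 0.2},
        "d": {"c": 0.2, "e": 0.9}, "e": {"d": 0.9, "f": 0.9}, "f": {"e": 0.9},
    }
    thresholds = {v: 0.5 for v in trust}
    print(simulate_cascade(trust, thresholds, seeds=["a"]))   # cascade stops at the low-trust bridge

In this toy configuration the item seeded at agent "a" saturates the trusted community {a, b, c} but fails
to cross the low-trust bridge, illustrating how trust heterogeneity bounds the information basin.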

Validation Approach
We have already collected data on the evacuation of San Diego during the wild fires of 2007.
Our data set contains information on the communication network, the social network (Hispanic
and non-Hispanic communities) as well as data on the evacuation dynamics resulting from a set
of reverse 911 phone calls. This is an ideal data set to study many aspects of the information
cascade: How do heterogeneous trust relations between multiple communities affect the
information cascade which ultimately results in a set of evacuated nodes? How does the
underlying communication network facilitate the information dissemination? How important are
social communities to the success of the information dissemination? We will model the San
Diego network on the scale of millions of nodes and simulate a variety of different models,
testing them against the observed evacuation dynamics. In addition, we will generate simulated
social networks obeying macroscopic properties (such as scaling laws) as discovered in our
EDIN tasks on network dynamics; we will use these networks to study in detail how different
community structures, trust structures and network structures affect information flow. In
particular, we will study how different information-seeding mechanisms perform in different environments.

Summary of Military Relevance


Social networks are conduits of information. In particular, an ad hoc social network within the
field may only have a few points of contact with the centralized command center. Intelligence
agents may only have a few hidden operatives in an insurgent social network. These points of
contact can be information collection points, or, as is often the case, sources for information
dissemination. How should these contact points be distributed so that information is most
efficiently disseminated? What is the information basin, i.e., the set of individuals who can reliably
convey information to the contacts, bearing in mind that information cascades through a
diffusive process in which the trustworthiness of the links plays an imperative role in
determining the extent of the information cascade? Our goal in this task is to build a model which
governs the information dynamics through a social network.

Research Products
By the end of the first year, we will develop preliminary large scale models of information flow
through social networks via trusted links and communities. Our models will take into account
heterogeneities in the network as expressed through communities via homophily. In particular,
we will develop our algorithms for large-scale (million+ node) analysis of information dynamics
through social networks. We expect preliminary validation of our models of information
dynamics on the San Diego wildfire evacuation data, which we expect to lead to one or more
submitted publications and/or reports.

8.6.8 Task S2.3: Community Formation and Dissolution in Social Networks (G.
Korniss, RPI (SCNARC); B. Szymanski, RPI (SCNARC); C. Lim, RPI
(SCNARC); M. Magdon-Ismail, RPI (SCNARC); A.-L. Barabasi, NEU
(SCNARC); T. Brown, CUNY (SCNARC); Z. Toroczkai, ND (SCNARC))

Task Overview
The fundamental research question that we are trying to address in this project is: What are the
efficient strategies and trade-offs for attacking and disintegrating adversarial communities with
hostile, extremist and/or militant ideologies? Our long-term objective is to develop generically
applicable frameworks and computational methods for extracting individual- to community-level
behavioral patterns from the underlying social networks. In particular, by implementing
stochastic agent-based models for opinion formation on empirical networks, we will develop
models with predictive power, applicable to networks of various scales. As the availability of
information affects social interactions, and likewise, the integrity of the communication
infrastructure impacts social interactions, our project will rely on models and methods developed
by the INARC, CNARC, and IRC.

Task Motivation
Our individual-based models for opinion formation will provide an array of strategies for "what
if" scenarios that will enable us to answer such questions as where and how to defend network
communities with neutral or tolerant "ideologies" against militant infiltration, and conversely,
where and how to attack and disintegrate adversarial communities exhibiting hostile, extremist,
and/or militant ideologies.

For example, IW is a complex, ambiguous, and inherently social phenomenon: "insurgency and
counterinsurgency operations focusing on the control or influence of populations, not on the
control of an adversary's forces or territory" (IW, Joint Operating Concept, 2007). Likewise, "an
operation that kills five insurgents is counterproductive if collateral damage leads to the
recruitment of fifty more insurgents" (COIN). To help operations in such an environment, in this
project we will develop individual-based models to investigate social influencing and associated
strategies in weighted social networks. Our methods and models for community detection,
community stability, and social influencing will be applicable to any data sets, spanning vast
scales, including those collected by the military.

Prior Work
Most traditional methods to find community structure utilize various forms of hierarchical
clustering or spectral bisection (Scott00; Newman06; Newman05; WuHuberman04), clique optimization
(Palla05), or iterative edge removal (Newman04). In contrast, in this project, we will utilize an
array of individual-based models for opinion dynamics, where, during the evolution of the
systems, communities explicitly manifest themselves through meta-stable clusters of shared
stylized opinions (e.g., religions or cultures). Tracking the evolution of the system not only
allows us to identify underlying community structures (Blatt96; Reichardt04; Lu09; Kim09;
Cai05), but will also enable us to find agents with critical roles and answer such questions as
how to attack and disintegrate adversarial communities exhibiting hostile, extremist, and/or
militant ideologies.

Key Research Questions


Research Question 1. How do we identify communities in social networks which manifestly
emerge as the result of communication and information flowing across the links (S2.3.1)? In
a military setting, these communities correspond to adversarial communities.
Research Question 2. How does the frequency of communication across the links (edges) affect
the emerging community structure (S2.3.1)?
Research Question 3. How do we influence/dissolve communities in social graphs (S2.3.2)? In a
military context, the answers shall yield ways to disintegrate adversarial communities with
hostile, extremist, and/or militant ideologies.

Initial Hypotheses
1. We believe that individual-based models for opinion dynamics can be effectively used to
detect and identify communities in social graphs (S2.3.1).
2. Based on the emerging community structure, influencing a small number of selected
individuals with critical positioning in the social network will be sufficient and will provide an
efficient way to ideologically dissolve or disintegrate adversarial or hostile communities (S2.3.2).

Technical Approach
We will develop individual-based models to investigate social influence and associated processes
in large-scale social networks. Tracing how individuals' opinions change over time will enable
us to identify communities in the underlying social network. Clusters of nodes sharing the same
opinion for some time reflect the inherent community structure through this network-dynamics
probe. Furthermore, by tracing communication patterns and investigating and understanding
information flow in the social network, we can identify nodes with high importance in the
corresponding social graph.

We will also perform a systematic comparative analysis of social engineering and influencing:
we will employ different strategies, such as indoctrinating an optimally chosen small set of
agents vs. removing agents (nodes) from the network, and analyze the social costs, time scales, and
trade-offs associated with reaching the desired state (such as breaking up hostile
communities). Our methods and models for community detection, community stability, and
social influencing will be applicable to any data sets, spanning vast scales, including those
collected by the military. As the availability of information affects social interactions, and
likewise, the integrity of the communication infrastructure impacts social interactions, our
project will rely on models and methods of information and communication networks. In
particular, we will use methods developed allowing inference of appropriate link weights
(strength of the effective social interactions between individuals, with possible temporal
variation and uncertainties) from the information and communication layers, which, in turn, in
our models will represent social influence.

Subtask S2.3.1: Identifying communities in social graphs by employing models for opinion
formation
We will employ individual-based models for opinion formation, such as the Naming Game.
Different words carried by individuals represent different opinions, or ideological standings.
More specifically, in models for social dynamics, our hypothesis states that communities
manifest themselves in the context in which distinct stylized opinions (e.g., religions, cultures,
and languages) have evolved and emerged over time. Thus, if at the late stages of the social
dynamics on the networks several communities persist and co-exist (different opinions survive),
they will be authentic signatures of the community structure of the underlying graphs. We will
also implement a weighted-link version of the model, applicable to weighted social networks.
The research and analysis on weighted social networks will be an important part of this task, as
the availability of information affects social interactions, and likewise, the integrity of the
communication infrastructure impacts social interactions. Therefore we will also collaborate with
M. Faloutsos (UCR) and J. Han (UIUC) on methods allowing inference of appropriate link
weights (strength of the effective social interactions between individuals, with possible temporal
variation and uncertainties) from the information and communication layers, which in turn, in
our models will represent social influence. Also, clusters and communities are often blurred as
concepts, and IRC researchers led by M. Faloutsos will investigate their interplay starting from
scratch, i.e., from appropriately projected edge weights.
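For concreteness, the sketch below runs a weighted Naming Game of the kind described above on a small
graph and reads off the surviving word clusters as candidate communities. The initialization (one unique
word per node) and the weight-proportional listener selection are illustrative modeling choices, not the
exact variant we will ultimately implement.

import random
import networkx as nx

def naming_game(G, steps=100_000, rng=random.Random(0)):
    vocab = {v: {f"w{v}"} for v in G}                  # start: every node holds its own word
    nodes = list(G)
    for _ in range(steps):
        speaker = rng.choice(nodes)
        nbrs = list(G[speaker])
        if not nbrs:
            continue
        weights = [G[speaker][n].get("weight", 1.0) for n in nbrs]
        listener = rng.choices(nbrs, weights=weights, k=1)[0]   # weighted social interaction
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[listener]:                    # success: both collapse to the shared word
            vocab[speaker] = {word}
            vocab[listener] = {word}
        else:                                          # failure: listener learns the word
            vocab[listener].add(word)
    return vocab

if __name__ == "__main__":
    G = nx.karate_club_graph()
    vocab = naming_game(G)
    # Surviving words at late times mark meta-stable opinion clusters (candidate communities).
    clusters = {}
    for v, words in vocab.items():
        clusters.setdefault(frozenset(words), set()).add(v)
    for words, members in clusters.items():
        print(sorted(words), "->", sorted(members))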

Validation Approach
We will validate our methods by implementing individual-based models (Naming Game) on
empirical social networks of various scales. These weighted empirical networks include high-
school friendship networks (on the order of 10^3 nodes) and a large-scale mobile communication graph
(on the order of 10^6 nodes).

Research Products
By the end of the first year, we anticipate that we will develop individual-based models and
theories for social dynamics applicable for community detection in various social networks. We
anticipate a paper on this effort in preparation by the end of this period.

Subtask S2.3.2: Dissolving communities in social networks


The results of subtask S2.3.1 will facilitate the detection and identification of communities in
social graphs. Subtask S2.3.2 will address methods to ideologically dissolve or disintegrate
communities, which in a military context, correspond to adversarial communities exhibiting
hostile, extremist, and/or militant ideologies. We will achieve this by selecting a small number of
individuals of critical positioning in the network, e.g., according to their degree, centrality,
betweenness, or communication frequency across ideological borders. Then we will also perform
a systematic comparative analysis of social engineering and influencing: we will employ
different strategies, such as indoctrinating an optimally chosen small set of agents vs. removing
agents (nodes) from the network, and analyze the social costs, time scales, and trade-offs
associated with reaching the desired state (such as breaking up hostile communities). Our
methods and models for community detection, community stability, and social influencing will
be applicable to any data sets, spanning vast scales, including those collected by the military.
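As a hedged illustration of the comparative analysis described above, the sketch below selects a small set
of critically positioned members of a detected community by betweenness centrality and contrasts a
removal strategy with an indoctrination (committed-agent) strategy. The ranking criterion and the
committed-agent stub are simplifying assumptions, not the full dynamics we will study.

import networkx as nx

def critical_members(G, community, k=3):
    """Rank community members by betweenness within the community subgraph (one possible criterion)."""
    sub = G.subgraph(community)
    bc = nx.betweenness_centrality(sub)
    return sorted(community, key=lambda v: bc[v], reverse=True)[:k]

def removal_strategy(G, community, k=3):
    """Remove the k most central members and report how fragmented the community becomes."""
    targets = critical_members(G, community, k)
    remaining = G.subgraph(set(community) - set(targets))
    n_pieces = nx.number_connected_components(remaining) if len(remaining) else 0
    return targets, n_pieces                           # more pieces -> community broken up

def indoctrination_strategy(G, community, k=3):
    """Mark the same k members as committed agents; in the Naming Game sketch above they would
    keep a fixed opinion and never adopt the listener's word (a stub for the full dynamics)."""
    targets = critical_members(G, community, k)
    return {v: (v in set(targets)) for v in community}

if __name__ == "__main__":
    G = nx.karate_club_graph()
    community = [v for v, d in G.nodes(data=True) if d["club"] == "Mr. Hi"]
    print(removal_strategy(G, community))

Comparing the two strategies on the same targets exposes the trade-off we intend to quantify: removal
fragments the community immediately but at higher social cost, while indoctrination works through the
slower opinion dynamics.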

Validation Approach
We will validate the effectiveness of our social engineering methods by implementing
individual-based models (Naming Game) on empirical social networks of various scales. These
weighted empirical networks include high-school friendship networks (on the order of 10^3 nodes) and a
large-scale mobile communication graph (on the order of 10^6 nodes). We will use the emerging
communities exhibited by these empirical social graphs as test beds to develop efficient
strategies for ideologically dissolving or disintegrating communities.

Research Products
By the end of the first year, we anticipate that we will develop models, theories, and methods to
efficiently ideologically disintegrate and dissolve communities in social graphs. We anticipate a
report on this effort by the end of this period which will serve as the basis for a future paper.

8.6.9 Linkages with Other Projects


The fundamental mode of interaction in the social network is through communication, so the
communication network plays a fundamental role in determining who communicates with whom,
in the social network; similarly information is the object of the communication, so the
information network is a fundamental component driving the social interactions. Thus, changes in
the communication and information networks fundamentally affect how the social agents
interact in forming communities and hidden groups; thus, the community structure intimately
depends on the structure and function of the information and communication networks. In turn,
the social network structure should govern the evolution of the information and communication
networks. This three-way interplay is a direct link between this project and the core activities of
INARC and CNARC.

Modeling the evolution dynamics of integrated networks (EDIN) will require first an
understanding of the community structure and its evolution; this means that we will need
methods for detecting the hidden groups, their structure, their information pathways, and their
evolution. These are primary research issues of interest to this project, hence our work will feed
into the EDIN CCRI; in addition, an understanding of how communities form and evolve will
help with the detection of those communities, so the work of the EDIN CCRI will certainly feed
into this project.

Also, information flow between two parties is a display of some measure of trust. Trust between
communities and between individuals will be fundamental to the information dynamics of the
social network. Thus, the ability to measure trust and to refine it based on observed community
structure and information flow will be a major input-output relationship between this project and
the Trust CCRI of the CTA, at all levels from social trust to information and communication
trust. Further, the small-scale cognitive dynamics of the interacting agents play a big role in the
information flow. Thus, efficient agent models should be rooted in sound cognitive science theory
(Project S3). In addition, there will be collaboration with Project R3 of the IRC, especially with task
R3.2, which investigates the impact of adversary networks on information flow.
Further, the research and analysis on weighted social networks will be an important part of our
tasks, as the availability of information affects social interactions, and likewise, the integrity of
the communication infrastructure impacts social interactions. Therefore we will also collaborate
with M. Faloutsos (UCR) leading project IRC R.2, and J. Han (UIUC) on methods allowing
inference of appropriate link weights (strength of the effective social interactions between
individuals, with possible temporal variation and uncertainties) from the information and
communication layers, which in turn, in our models will represent social influence. These results
will serve as input for our tasks. Also, clusters and communities are often blurred as concepts.
Our methods to detect communities, employing individual-based models, will serve as an input
for IRC researchers and projects (IRC R.2) led by M. Faloutsos.
In summary, we see fundamental linkages to the EDIN and Trust CCRIs of SCNARC, together with
synergy with the IRC, INARC, and CNARC, because community/hidden network structure,
communications, and information are intertwined.

Summary of linkages:

IPP Tasks                  Linkage
S2.1 ↔ I1.2                Video and image analysis to infer the relationships of people to help form
                           communities.
S2.1 ↔ E2, E4              Community structure input to models for composite network structure and
                           dynamics.
S2.1 ↔ I3, R3.2            Knowledge extraction from text communication to refine internal community
                           structure as well as relationships between communities.
S2.1 ↔ CNARC, INARC        Evolving community structure to understand interplay between
                           info-communication network evolution and social network evolution, to better
                           manage information and communication networks.
S2.1 ↔ CNARC, INARC        Communication, interaction, and information flow data to study how dynamic
                           communication and interaction networks affect social community structure and
                           evolution.
S2.1 ↔ IRC                 Methods for identification of community structure based on statistical
                           interaction data.
S2.1 ↔ T1                  Trust metrics enhance community detection.
S2.2 ↔ T1                  Trust metrics needed for information flow.
S2.1 ↔ T1                  Community structure used to improve trust metrics.
S2.2 ↔ IRC, INARC, CNARC   Models of information flow through communities to integrate with IN and CN
                           and help understand how to better manage IN and CN.
S2.3 ↔ IRC                 Determining strength of effective social interactions; inference of
                           appropriate tie strength.

8.6.10 Collaborations and Staff Rotations


Project S2 will establish close collaboration with Kathleen Carley (CMU) of IRC. Kathleen
Carley has a major research interest in understanding social network structure, and our work in
S2.1 develops methods to understand overlapping structure in social networks, together with
dynamics. Kathleen Carley in IRC will be the primary liaison with coordinating this research
with INARC and CNARC research to understand how social network community structure co-
evolves with IN and CN evolution. Also, we will establish collaborations with M. Faloutsos
(UCR) in IRC and J. Han (UIUC) in INARC.

Project S2 will provide two researchers (with Ph.D. degree) to the NS center every third year of
the program.

8.6.11 Relation to DoD and Industry Research


Adversary networks are of increasing importance to DoD missions and the nation's security; hence,
our team has been supported by ONR and CIA in building a foundation for the proposed
research.

8.6.12 Project Research Milestones

Research Milestones

Due   Task        Description
Q2    Task S2.1   Progress report on hidden community detection.
Q2    Task S2.2   Initial evaluation of approaches to model information flow in trusted communities.
Q2    Task S2.3   Report on employing individual-based models for community detection in social graphs.
Q3    Task S2.1   Report on detecting persistent and overlapping communities in different techno-social
                  networks such as the blogosphere and Twitter communities.
Q3    Task S2.2   Progress report on modeling information flow in large social networks.
Q3    Task S2.3   Report on the efficiency and comparative methods for community dissolution in social
                  graphs.
Q4    Task S2.1   Final report on hidden community detection principles.
Q4    Task S2.2   Report on validation of the model on San Diego fires data.
Q4    Task S2.3   Paper or preprint on employing individual-based models for community detection in
                  social networks and on the efficiency and comparative methods for community
                  dissolution in social networks.

Budget By Organization

Organization        Government Funding ($)    Cost Share ($)
CUNY (SCNARC)                26,046
IBM (SCNARC)                 47,115
ND (SCNARC)                  11,000
NEU (SCNARC)                 66,230
RPI (SCNARC)                503,704                  91,954
TOTAL                       654,095                  91,954

References
[Baumes05] Baumes, J., M. Goldberg, and M. Magdon-Ismail. "Efficient identification of overlapping
communities." IEEE International Conference on Intelligence and Security Informatics, Springer,
pp. 27-36, 2005.
[Baumes07a] Baumes, J., M. Goldberg, M. Hayvanovich, S. Kelley, M. Magdon-Ismail, K. Mertsalov, and
W. Wallace. "SIGHTS: A software system for finding coalitions and leaders in a social network."
IEEE International Conference on Intelligence and Security Informatics, 2007.
[Baumes07b] Baumes, J., M. Goldberg, M. Magdon-Ismail, and W. Wallace. "Identifying hidden groups in
communication networks." Handbooks in Information Systems -- National Security, no. 2 (2007): 209-242.
[Baumes08] Baumes, J., H.-C. Chen, M. Francisco, M. Goldberg, M. Magdon-Ismail, and W. Wallace.
"Visage: A virtual laboratory for simulation and analysis of social group evolution." ACM Transactions
on Autonomous and Adaptive Systems, 2008.
[Bethard04] Bethard, S., H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky. "Automatic extraction
of opinion propositions and their holders." The AAAI Spring Symposium, 2004.
[Blatt96] Blatt, M., Wiseman, S., and Domany, E. (1996). Superparamagnetic clustering of data.
Phys. Rev. Lett. 76, 3251-3254.
[Cai05] Cai, D., Z. Shao, X. He, X. Yan, and J. Han (2005). Community mining from multi-relational
networks. Proc. 2005 European Conf. Principles and Practice of Knowledge Discovery in Databases
(PKDD'05), pp. 445-452.
[Clauset04] Clauset, A., M.E.J. Newman, and C. Moore. "Finding community structure in very large
networks." Physical Review E 70 (2004): 066111.
[Goldberg06] Goldberg, A. B., and X. Zhu. "Seeing stars when there aren't many stars: Graph-based
semi-supervised learning for sentiment categorization." HLT-NAACL 2006 Workshop on Textgraphs:
Graph-based Algorithms for Natural Language Processing, 2006.
[Goldberg08a] Goldberg, M., M. Hayvanovich, A. Hoonlor, S. Kelley, M. Magdon-Ismail, H. Mertsalov,
B.K. Szymanski, and W. Wallace. "Discovery, Analysis and Monitoring of Hidden Social Networks and
Their Evolution." IEEE Homeland Security Technologies Conference, Boston, MA: IEEE Computer Science
Press, 2008.
[Goldberg08b] Goldberg, M., S. Kelley, M. Magdon-Ismail, and K. Mertsalov. "Communication dynamics of
blog networks." Interdisciplinary Studies in Information Privacy and Security, 2008.
[Hui08a] Hui, C., M. Magdon-Ismail, M. Goldberg, and W. A. Wallace. "The Impact of Changes in Network
Structure on Diffusion of Warnings." Proc. Workshop on Analysis of Dynamic Networks (SIAM
International Conference on Data Mining), 2009.
[Hui08b] Hui, C., M. Magdon-Ismail, M. Goldberg, and W. A. Wallace. "Micro-Simulation of Diffusion of
Warnings." Proc. 5th Int. Conf. on Information Systems for Crisis Response and Management (ISCRAM),
pp. 424-430, 2008.
[Kim09] Kim, M.-S., and J. Han (2009). A particle-and-density based evolutionary clustering method for
dynamic networks. Proc. 2009 Int. Conf. on Very Large Data Bases (VLDB'09), pp. 622-633.
[Lu09] Lu, Q., G. Korniss, and B.K. Szymanski (2009). The Naming Game in Social Networks: Community
Formation and Consensus Engineering. J. Econ. Interact. Coord. 4, 221-235.
[Magdon-Ismail05] Magdon-Ismail, M., W. A. Wallace, and M. Goldberg. "SGER: Using global communication
systems as early warning systems for natural disasters." NSF IIS-0522672, 2005.
[Magdon-Ismail06] Magdon-Ismail, M., W. A. Wallace, and M. Goldberg. "Social communication networks
for early warning in disasters." NSF IIS-0621303, 2006.
[Newman04] Newman, M. E. J., and Girvan, M. (2004). Finding and evaluating community structure in
networks. Phys. Rev. E 69, 026113.
[Newman04a] Newman, M.E.J. "Fast algorithm for detecting community structure in networks."
Physical Review E 69 (2004).
[Newman04b] Newman, M.E.J., and M. Girvan. "Finding and evaluating community structure in networks."
Phys. Rev. E 69 (2004): 026113.
[Newman05] Newman, M. E. J. (2005). Detecting Community Structure in Networks. Eur. Phys. J. B 38,
321-330.
[Newman06] Newman, M. E. J. (2006). Finding community structure in networks using the eigenvectors of
matrices. Phys. Rev. E 74, 036104.
[Palla05] Palla, G., Derenyi, I., Farkas, I., and Vicsek, T. (2005). Uncovering the overlapping
community structure of complex networks in nature and society. Nature 435, 814-818.
[Pang05] Pang, B., and L. Lee. "Seeing stars: Exploiting class relationships for sentiment
categorization with respect to rating scales." Proceedings of ACL-05, 43rd Meeting of the Association
for Computational Linguistics, 2005, pp. 115-124.
[Reichardt04] Reichardt, J., and Bornholdt, S. (2004). Detecting fuzzy community structures in complex
networks with a Potts model. Phys. Rev. Lett. 93, 218701.
[Scott00] Scott, J. (2000). Social Network Analysis: A Handbook. Sage, London, 2nd ed.
[Taboda04] Taboada, M., and J. Grieve. "Analyzing appraisal automatically." Symposium on Exploring
Attitude and Affect in Text: Theories and Applications, 2004, pp. 158-161.
[WuHuberman04] Wu, F., and Huberman, B. A. (2004). Finding communities in linear time: a physics
approach. Eur. Phys. J. B 38, 331-338.


8.7 Project S3: The Cognitive Social Science of Net-Centric
Interactions

Project Lead: Wayne D. Gray, RPI


Email: grayw@rpi.edu, Phone: 518-276-6576

Primary Research Staff           Collaborators
W. Gray, RPI (SCNARC)            S. Adali, RPI (SCNARC)
M. Schoelles, RPI (SCNARC)       W. Wallace, RPI (SCNARC)
J. Mangels, CUNY (SCNARC)        P. Pirolli, PARC (INARC)
                                 T. Hollerer, UCSB (INARC)
                                 X. Yan, UCSB (INARC)
                                 J. Hansberger, ARL
                                 D. Cassenti, ARL

8.7.1 Project Overview


Project S3 represents the main cognitive science component of the four ARCs. Our goal is to
apply the theories, behavioral and neuroscience data collection techniques, and modeling
(computational and mathematical) approaches of the Cognitive Sciences to uncovering and
modeling the cognitive mechanisms that underlie human-human, human-technology, and human-
information social interactions especially when these interactions are mediated by or entail the
use of Net-Centric technologies. Since the SCNARC project is intended by the Army to be 6.1
research, many of our research products will advance the emerging sub-discipline of cognitive
social science by focusing on issues and problems important in the design, evaluation, and use of
Net-Centric systems. The longer term applied products of this work will include the development
of modeling approaches that predict the performance impact of alternative Net-Centric designs,
that produce prescriptive models for the visual display of analytic Net-Centric data, and that
facilitate the mediation of human-human interactions via Net-Centric systems.

8.7.2 Project Motivation


The title of the SCNARC Project S3, the Cognitive Social Science of Net-Centric Interactions,
should be parsed into three key ideas that are somewhat novel in their combination. First,
cognitive social science implies a concern with topics inherent to the social sciences but from a
perspective informed by modern theories, models, and neuroscience methods of cognitive
science. Most social science theories are disconnected from the modeling approaches and
theories of contemporary cognitive science. Hence, they lack an awareness of how human social
behavior is shaped by human cognitive limits and the interactions of these limits with the
demands placed on human performance by various natural and designed task environments (e.g.,
see Sun07; Turner01).

Second, the term interactions is broadly interpreted to imply human-human, human-technology,
and human-information interactions. The focus in all three types of interaction is on the ways in
which human cognition adapts or fails to adapt (i.e., "cognitive overload") due to the nature of
the other human(s), technology design, or information design. An initial focus (as discussed
below) will encompass the exchange of information among teams of humans or mixed teams of
humans and interactive cognitive agents, which are simulated humans based on the ACT-R
unified cognitive theory or architecture (Anderson07).

Third, the intersection of the cognitive with the social defines a vast space. It would be easy for a
basic researcher to wander aimlessly in this space and end up wasting years pursuing issues that
shed a trivial light on the influence of human cognition in shaping social interactions. A better
strategy is to focus on Pasteur's Quadrant (Stokes97) by conducting a program of fundamental
scientific studies that focus on solving a problem that someone (society, the U. S. Army) actually
cares about solving. Hence, net-centric interaction grounds our cognitive social science proposal
by focusing our efforts on an applied technology of growing national importance.

8.7.3 Key Project Research Questions and Hypotheses


In this section we state the broad questions and hypotheses that will drive our research. Our
focus throughout is on immediate interactive behavior; that is, cognitive resources and processes
at the 300 to 3000 ms temporal resolution that typically operate beneath conscious awareness to
select and implement steps in service of consciously pursued goals and subgoals.

Question 1: Human-Human Interactions. We will identify a candidate set of social
behaviors (e.g., trust) that are most important to or influenced by Net-Centric mediated
human communications. Within this set we will determine the relationship between
cognitive resources or cognitive processing and social (or asocial) behavior.
Hypothesis 1a: Evidence-based versus superficial knowledge. Limits on cognitive
resources or cognitive processing at the 300 to 3000 ms time scale affect social behavior by
shifting the beliefs and goals used by human actors from a reliance on current assessment
of the other actors' beliefs and goals to a reliance on stereotypic knowledge based on the
actors' roles, ethnicity, or gender, or to non-relevant perceptual aspects of the form and
format of information. [For example, beliefs and goals have been shown to influence the
second-by-second allocation of attention to information in a real-time data acquisition
task (Mangels06), as have perceptual aspects (Rudoy09).]
Hypothesis 1b: This hypothesis is the complement of Hypothesis 1a. Social behaviors place
demands on cognitive resources and processing that may limit basic information-
processing mechanisms such as encoding, memory, and attention. [For example, lying to
another human has been shown to increase demands on cognitive resources of the liar
(Carrión10).] Resource demands at the temporal resolution of immediate interactive
behavior may influence the successful performance of higher-level goals for the use of
Net-Centric technologies. Contemporary research documents an instance of this by
showing that talking on a cell phone while driving has the same effect on driving
performance as a blood alcohol level of .08 (i.e., legally intoxicated in most states).

An extended example for Question 1 is provided by Jennifer Mangels' (CUNY) work (Mangels06)
showing how students' beliefs and goals (a factor usually discussed in social psychology
journals) have cognitive consequences (i.e., negatively or positively affecting learning
outcomes). A common distinction in the literature is between people who view intelligence as
fixed versus those who view it as malleable. Those who view it as fixed see performance as a
public test of how smart they are. In contrast, those who see it as malleable see performance as a
means of identifying and overcoming shortcomings, hence becoming more intelligent. Guided by
cognitive neuroscience models of top–down, goal-directed behavior, Mangels found a
differential brain response (the P3 response) in the anterior frontal cortex following negative
performance feedback that was positively correlated with concerns about proving ability relative
to others. The fixed group also showed less sustained memory-related activity (left temporal
negativity) to corrective information than did the malleable group. This difference suggested
reduced attention by the fixed group to learning the correct answers following feedback on
errors. The strategy of avoiding informative feedback contributed to the fixed group‘s reduced
performance on a subsequent surprise retest.

Question 2: Human-Technology and Human-Information Interactions. How do the design
of Net-Centric technologies or the form and format of Net-Centric information interact
with the human cognitive processes responsible for heuristics and biases in human
performance?

Hypothesis: Successful higher-level cognitive performance is rooted in the demands made on
immediate interactive behavior at the 300 to 3000 ms level of cognitive processing and resource
allocation. Small differences in how a Net-Centric system is designed may have a large impact
on successful task performance. Predictions of this general hypothesis are specific to the design
of individual Net-Centric systems. However, an approach rooted in the development of
predictive computational cognitive models should be capable of being generally applied to
particular systems to yield particular hypotheses. [See our discussion of year 1 tasks that
follows.]

There is much empiricism, that is, trial and error, in attempts to optimize the design of human-
information or human-technology interaction. For example, Google reports a study
(http://googleblog.blogspot.com/2009/02/eye-tracking-studies-more-than-meets.html) that
discusses how they arrived at a webpage design that allowed users to shave seconds off of
their searches and thereby facilitated successful search. As Stokes discusses in the context of
Pasteur's 19th Century research, "one of the most valuable properties of applied research is
'reducing the degree of empiricism in a practical art'" (Stokes97, p. 8). Any one design can be
tweaked by trial and error (as in the Google example) so as to prevent users from making
premature selections or premature rejections; i.e., to avoid the heuristics and biases to which
human cognition falls prey. However, a more robust approach is to understand the underlying
cognitive processes and to be able to guide the design of technology or information to avoid
suboptimal performance (Fu04). For example, the human information acquisition system is
exquisitely sensitive to time and can adjust to time costs measured in hundreds of milliseconds.
Gray and Schoelles have shown dramatic shifts between the reliance on perfect information in-
the-world versus imperfect information in-the-head as the time costs to visually access
information in-the-world increase (Gray04; Gray06). Another example is the approach that
treats language as a form of joint activity drawing on physical interactions to disambiguate
meaning. In the course of two people cooperating to perform a single task, it has been shown
(Shintel09) that those who relied on language alone to disambiguate meaning imposed on
themselves a cognitively demanding and time-consuming constraint. Better coordination was
achieved by the less cognitively demanding and faster strategy of focusing on the perceptual-
motor cues available in the interpersonal interactions.

Question 3: Can we create predictive computational cognitive models of individuals
interacting via Net-Centric technologies (i.e., interactive cognitive agents)?

Hypotheses. This research question does not lend itself to a fixed set of hypotheses as much as it
does to a challenge to the current state of the art in computational cognitive modeling. Most
applications of computational cognitive modeling are limited to small, laboratory phenomena.
Few attempts take on the complexity of even the computer-mediated world [Gray03]. Recent
years have seen an increase in the attempts to apply models developed for basic research to
cognitive engineering applications [Gray08]. The sweet spot to which question 3 is aimed is the
creation of fully embodied models (i.e., those with the same perceptual-motor and cognitive
constraints as humans) that can predict performance success and failures due to characteristics of
the technology (human-technology interaction), information form and format (human-
information interaction), and interpersonal exchanges (human-human interactions). Over the past
decade, progress has been made on the first two components (for an overview see, Gray07). The
challenge will be in extending the computational cognitive modeling approach (a) to Net-Centric
operations and (b) to encompass the interaction of cognitive with social processes. This last
challenge lies at the heart of the emerging sub-discipline of cognitive social science (Sun07;
Turner01).

8.7.4 Summary
The Cognitive Social Science of Net-Centric Interactions will bring the computational modeling
techniques of cognitive science together with the tools and techniques of cognitive neuroscience
to ask how the design of the technology (human-technology interaction), the form and format of
information (human-information interaction), or features of communication (human-human
interaction) shape the success of net-centric interactions.

8.7.5 Technical Approach


Our initial efforts will focus on three tasks.
In support of the Trust CCRI, the first cognitive social science construct we will study is
trust (this is an example of Research Question 1, above, a key social science construct in
human-human interaction).
The second task entails the building of a simulated task environment that can be used
over the next several years as a complex laboratory task within which to study Net-Centric
interactions among 2-3 human agents or some combination of human and
artificial agent. Our longer-term goal is to collect detailed behavioral and neuroscience
data from users of the Net-Centric prototype systems that emerge from research among
the four ARCs. However, having a simulated task environment of our own will allow us
to quickly modify and extend this task to test specific hypotheses that may be harder to
directly address in other contexts.
The third task entails the establishment of software standards for human data collection
and the interaction of computational cognitive models with Net-Centric systems that can
be used by all ARC groups who are doing human testing or who are building prototype
Net-Centric systems for use in human testing.

8.7.6 Task S3.1: The Cognitive Social Science of Human-Human Interactions (W.
Gray, RPI (SCNARC); M. Schoelles, RPI (SCNARC); J. Mangels, CUNY
(SCNARC))

Overview: How does the social psychology construct of trust vary in human-human versus
human-agent interactions? Specifically, what cognitive mechanisms are affected by trust
and how do human evaluations of trust influence our subsequent cognitive processing of
information?

A year 1 focus of SCNARC S3 will be to examine the effect of trust on cognitive processing and
variations in human trust over human-human versus human-agent interactions. Specifically, we
hypothesize that differences in trust are signaled by differences in cognitive brain mechanisms
and that these differences can be detected by event-related brain potential (ERP) measures and
related to established cognitive science constructs, which in turn can be incorporated as changes
to the ACT-R (Anderson07) cognitive architecture.

Task S3.1 Motivation


Trust is a CCRI that has important cognitive and social components as well as computer science
components. Although trust has been researched and studied from the perspective of social
psychology, the cognitive social science of trust is a recent topic. Hence, the importance of trust
to the SCNARC effort justifies its status as the first social science construct that we study.
Furthermore, we expect that trust will provide a good topic on which to try out and refine our
various tools for behavior and neuroscience data collection and analysis, and computational
cognitive modeling.

Key Research Questions


The key question for this initial effort is whether we can identify an event-related brain potential
that reliably and validly signals the social science construct of trust.

Initial Hypotheses
Very recent work from one laboratory (Rudoy09) has
suggested that degree of trustworthiness is signaled by three
event-related brain potential responses, "an early frontal correlate of consensus trustworthiness, a
later correlate of memory retrieval with a parietal topography, and an even later correlate . . . that
also exhibited a parietal topography." Our initial hypothesis is that these ERPs provide a general
assessment of trust in human-human Net-Centric interactions. We will also investigate the
relationship between trust and the ERP construct of error-related negativity (ERN).

[Figure S3-1: anterior cingulate]

Prior Work
The cognitive science study of human trust is a wide-open area with little use of EEG data and
no incorporation of EEG findings into computational cognitive models. The work of Rudoy and
Paller (Rudoy09) is the only example that we know of in which EEG correlates of trust have been
sought. This pioneering effort has not been replicated, and the paradigm used was extremely
artificial and limited in scope. Of key interest to our work is their manipulation of time pressure,
from which they concluded that as time pressure to perform increased, people's assessment of
trust relied less on their past experience and more on non-predictive, perceptual factors.

From related work (not on trust), we know that the posterior medial frontal cortex (pMFC,
including anterior cingulate and pre-SMA, see Figure S3-1) plays a role in the detection of errors
in one's own performance as well as in others' performance (Bekkering09). Such error detection
is signaled in EEG data by the error-related negativity (ERN) response. Hence, to the extent that
the expectation of other people making an error is an index of one type of trustworthiness, the ERN
event-related potential (ERP) will be explored as an index of trust-reliability in our initial
study.

Technical Approach
Paradigms: As cautious scientists we plan two empirical studies that will start us on the
exploration of the cognitive social science of trust. The first will attempt to replicate as well as
vary some of the conditions used by Rudoy and Paller (Rudoy09). A tactical advantage of this
replication is that it will allow our work to get started in Q2 while we are building the more
complex Argus-Army simulation discussed in Task S3.2.

As is the case for Argus (Schoelles01), Argus-Army will require three human players or some
combination of human and cognitive agents, each of whom receives partial information about the
current situation. The players have different roles (e.g., squad leader, UAV operator, and Battalion
Commander) and access to different sources of information. Successful play requires the exchange
and integration of the information needed to complete the ground mission. (We elaborate on
Argus-Army in Task S3.2.)

[Figure S3-2: Graduate student demonstrating stylish EEG cap with 32 electrodes.]

For purposes of our initial study, the UAV operator's role will be played by a human team member
or by an interactive cognitive agent. In both types of teams, team members will communicate with
each other via menu selections and typing simple commands (i.e., no voice or complex linguistic
data).

Data Collection. For both paradigms, one or more human members of each team will be
instrumented with 32-electrode EEG caps (see Figure S3-2). All system states, all human mouse
movements, responses, and so on will be saved to a log file and time-stamped to the nearest 8
milliseconds. The log file will be complete enough so as to be able to be "played back" for
qualitative assessment of strategies. It will be complete enough for quantitative assessment of
human eye movements during game play (for example, to examine characteristics of scan paths
during play by the human agents and to compare them with the predicted scan paths made by the
Interactive Cognitive Agent).
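As a hedged illustration of how event-related potentials could be extracted from such time-stamped logs,
the sketch below epoch-averages multichannel EEG around logged event times. The sampling rate, epoch
window, and baseline-correction choices are illustrative assumptions, not the study's analysis pipeline.

import numpy as np

def erp(eeg, event_samples, fs=250, tmin=-0.1, tmax=0.6):
    """eeg: (n_channels, n_samples) array; event_samples: sample indices of logged events."""
    pre, post = int(-tmin * fs), int(tmax * fs)
    epochs = []
    for s in event_samples:
        if s - pre < 0 or s + post > eeg.shape[1]:
            continue                                   # skip events too close to the recording edge
        epoch = eeg[:, s - pre: s + post].astype(float)
        baseline = epoch[:, :pre].mean(axis=1, keepdims=True)
        epochs.append(epoch - baseline)                # baseline-correct each trial
    return np.mean(epochs, axis=0)                     # (n_channels, pre + post) grand average

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    fake_eeg = rng.normal(size=(32, 250 * 60))         # 32 channels, one minute at 250 Hz (synthetic)
    events = rng.integers(500, 250 * 60 - 500, size=40)
    print(erp(fake_eeg, events).shape)

Condition-specific averages of this kind (e.g., trusted versus non-trusted team members) are what would
be compared against the three ERP components reported by Rudoy and Paller.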

Hyperscanning. Babiloni and colleagues [Babiloni06] have recently introduced the notion of
hyperscanning; the simultaneous collection of EEG data from two or more human subjects in
real time as they engage in a group task. The technical challenges of this technique are
considerable, but its promise for the study of cognitive social science is high, and it
will be pursued as part of our research effort.

Validation Approach
As outlined above, our Task S3.1 enables three types of validation. First, our partial replication of
Rudoy and Paller's study will provide an important replication of their finding that three ERP
components signal trustworthiness. Second, our variations on their study will be the first step in
generalizing their results to different paradigms. Third, our use of their measures in Argus-Army
will be a strong test of the validity of these factors as measures of trust.

Summary of Military Relevance


We will leave the question of why the construct of trust has military relevance to the Trust CCRI.
Here we will address the narrower question of why a valid neuroscience measure of trust is of
military interest. Most measures of trust rely on questionnaire data collected either during or
after task performance. Inferring differences in social constructs based on questionnaire data is
notoriously weak and unreliable. Having a neuroscience measure that can discriminate differences in trust (or other social science constructs) within a single individual across multiple occasions would be an important step toward increasing the trustworthiness of Army systems.

Research Products
The main product of this effort will be to establish a reliable and valid neuroscience measure of
trust that can be used by us and others in more applied work with future Net-Centric systems. A
second product will be the establishment of time parameters for the emergence of a "trust" appraisal that can be used in computational cognitive models to predict trust in complex technology-mediated human-human interactions. Of course, as academic researchers we see much potential in this work for research reports and papers that contribute to the basic research foundation of cognitive social science and to the 6.1 goals of this project.



8.7.7 Task S3.2: Human-Technology and Human-Information Interactions:
Develop Argus-Army, a Net-Centric Simulated Task Environment to collect
data on Cognitive Social Science constructs of interest (W. Gray, RPI
(SCNARC); M. Schoelles, RPI (SCNARC))

Overview
We will repurpose an existing team simulation environment for research in cognitive social
science constructs of Net-Centric interactions.

Motivation
Although we expect to work with Net-Centric software being developed by other groups, there is
a need for a research vehicle that can be rapidly reconfigured to emphasize or isolate important
social science constructs.

Key Research Questions


The proposed Argus-Army simulated task environment builds on the Argus simulated task environment engine developed by Schoelles (Schoelles01) under prior Air Force funding. It will support both human users and interactive cognitive agents based on the ACT-R cognitive architecture (Anderson07). The radar displays of Air-Sea space from Argus will be replaced with terrain maps, as is common in military ground war games (see Figure S3-3). As is the case for Argus, Argus-Army will require three humans or some combination of human and artificial cognitive agents, each of whom receives partial information about the current situation.

Figure S3-3: Argus-Army will use a variety of terrain maps, developed to support scenarios recommended by the U. S. Army, similar to the ground terrain maps used in military war games. Note that the map shown is not our terrain map but an example pulled off the web.

It is important to emphasize that Argus-Army will be designed as a flexible vehicle for designing and building scenarios relevant to the U. S. Army. The military expertise of our team is limited to Gray's limited and aging experience in small-unit tactical team training and in Battalion-level operations at the National Training Center (Fort Irwin, circa 1984). We recognize the limits of this experience relative to the current Army and will seek ARL expertise in developing scenarios for irregular warfare in urban and nonurban terrain.

Initial Hypotheses
Argus-Army is a vehicle for research and not a research topic per se. To the extent that it is appropriate to say that a simulated task environment entails a hypothesis, the hypothesis is that a complex but manageable research environment is the best tool for advancing basic research in an applied domain [Gray02].

Technical Approach
Initial activities for Argus-Army include the following:



- Transform Argus from a radar-inspired, Air Force task environment to an Army-inspired simulation focusing on resource allocation and tactical decision-making among agents with distinct roles (e.g., squad leader on the ground, UAV operators elsewhere in the world, and the Battalion Commander in the rear).
- Build interactive cognitive agents in the ACT-R cognitive architecture that will play the role of one or more human agents (initially we plan on building an interactive cognitive agent of the UAV operator).

The transformation of Argus into Argus-Army, the integration of Argus-Army with software that
can collect and synchronize the time stamps of EEG data at remote locations (to millisecond
accuracy), and the construction of cognitively plausible interactive cognitive agents will not be
trivial. Efforts on this line can start immediately but are expected to continue to be refined over
the next several years. The good news is that intermediate products should be useful in the 1st year (e.g., Argus-Army usable at one location to collect baseline human data from 3 human
participants). In designing Argus-Army we will work closely with Dr. Daniel Cassenti (ARI
APG) and colleagues to ensure that Army-relevant scenarios are built in.

We need to devote significant first-year effort to developing software and software standards in four areas:

A. Standards for Human Data Collection: Facilitate the collection and sharing of human
performance, process, and neuroscience data from human interactions with Net-Centric
software.
B. Design Specifications for Developers of Net-Centric Software: Support the direct
interaction of software interactive cognitive agents with the same Net-Centric software
used by human users.
C. Hyperscanning Support: Hyperscanning is the simultaneous collection of EEG data from two or more humans at single or distributed locations. Hyperscanning for EEG is a new technique that is primarily being developed and used by an Italian group [Babiloni06]. In-house development is required to ensure millisecond-level synchronization across all 32 EEG channels from two systems plus the task software.
D. Simultaneous collection and synchronization of data collected in remote locations:
Enable the simultaneous collection and synchronization of EEG and other performance
data from multiple people at different physical locations (e.g., Mangels‘ laboratory at
CUNY and the Gray-Schoelles laboratory at RPI).

Validation Approach
Three types of validation are sought for Argus-Army. The first type would occur if the research
that uses Argus-Army advances our understanding of the cognitive factors underlying social
constructs in complex Net-Centric operations. A second type of validation would come from the
adoption of Argus-Army by other groups associated with the Alliance, such as the Army Research Laboratory. The third type of validation is the validation of Argus-Army as a complex software system for accurately and reliably collecting human data at the temporal resolution of milliseconds. This validation includes the completeness of the log files, the accuracy of the timestamps, and the ability of interactive cognitive agents to interact with Net-Centric software and to create the same log file records as human users. Each of these problems will be addressed using standard software validation techniques.

Summary of Military Relevance


Argus-Army is not meant to have direct relevance to military operations; that is, it is not
intended as a realistic simulation of military operations that might be used by the U. S. Army for
training or for tactical planning. Rather, Argus-Army‘s relevance will be in its ability to aid
researchers in isolating and understanding cognitive science constructs that can be applied to
more complex Net-Centric systems to facilitate military operations. Hence, accomplishing this
task correctly will enable the collection of data at multiple sites by different teams of investigators and will ensure that different teams of researchers will be able to access the same data sets to analyze data relevant to the research questions that are of interest to them.

Research Products
Argus-Army is its own research product. It is intended to be a flexible vehicle that fully
integrates the collection of behavioral and neuroscience data with performance data. Likewise, it
is intended to support the development of complex computational cognitive models that can
interact directly with the Argus-Army software as a 3rd human player might. Such a capability
will allow us to design interactive cognitive agents that, for example, vary in their trustworthiness so as to facilitate understanding and investigation of the hypotheses discussed in Task S3.1.

8.7.8 Linkages with Other Projects

IPP Tasks                       Linkage
S3.1 ↔ CCRI Trust               Cognitive Social Science basis of Trust
S3.1 ↔ IN 2 (PARC)              Linkage between the cognitive and social levels of analysis
S3.2 ↔ CCRI Trust               Net-Centric simulated task environments within which to study human-human trust
S3.2 ↔ ARL, PARC                All other groups who plan to collect or analyze human Net-Centric data
S3.2 ↔ IN 2 (PARC & UCSB)       Visual representations of Net-Centric data
S3.2 ↔ IN 2 (PARC)              Collection and synchronization of data collected in multiple locations; required by any group working on human interactions of Net-Centric systems

8.7.9 Relevance to U.S. Military Visions/Impact on Network Science


SCNARC Project S3: Cognitive Social Science of Net-Centric Interactions will bring the computational modeling techniques of cognitive science together with the tools and techniques of cognitive neuroscience to ask how the design of the technology (human-technology interaction), the form and format of information (human-information interaction), or features of communication (human-human interaction) shape the success of net-centric interactions.

8.7.10 Collaborations and Staff Rotations


The collaborations identified above are with the CCRI Trust and with INARC T2.2: Human Dimension of Trust (Pirolli/PARC; Tobias/UCSB). We will develop plans for staff rotations when postdoctoral researchers are hired.

8.7.11 Relation to DoD and Industry Research


The work we propose on the cognitive social science of Net-Centric interactions is congruent
with work being supported by the AFRL HE laboratories at Mesa, AZ on the cognitive basis of
interactive behavior and the use of interactive cognitive agents in models of real-time safety
critical tasks. Likewise, the AFRL HE group is supporting work on team performance by Drs.
Nancy Cooke and Chris Myers. Some of the work we propose is also related to work performed at ARL by Troy Kelley, Dan Cassenti, and Kevin Oie. However, to our knowledge, all of the work being done by AFRL and ARL focuses on individuals or on collocated teams. Likewise, the AFRL and ARL work focuses on individual cognitive factors and not on the cognitive basis of social constructs.

8.7.12 Project Research Milestones

Research Milestones

Due    Task        Description
Q1/2   Task S3.1   Identification of event-related brain potential responses that seem most likely to be related to the social construct of human trust
Q1/2   Task S3.2   Begin transformation of Argus into Argus-Army
Q2     Task S3.1   Purchase of 2nd EEG system for use in hyperscanning
Q2     Task S3.2   Unit testing of Argus-Army
Q3     Task S3.1   Begin collection of pilot data from teams of pure human players
Q3     Task S3.2   Pilot testing of experimental setup, including Argus-Army and synchronization of EEG data; initial development of interactive cognitive agents for Argus-Army
Q4     Task S3.1   Preliminary analyses of pilot data, with priority going to analyses of the ERP data
Q4     Task S3.1   Begin Argus-Army data collection with mixed teams of human and interactive cognitive agents
Q4     Task S3.2   Construction of interactive cognitive agents for use in Argus-Army data collection

Budget By Organization

Organization       Government Funding ($)    Cost Share ($)
CUNY (SCNARC)      65,115
RPI (SCNARC)       221,138                   40,370
TOTAL              286,253                   40,370

References
[Anderson07] Anderson, J. R. (2007). How can the human mind occur in the physical universe?
New York: Oxford University Press.
[Babiloni06] Babiloni, F., Cincotti, F., Mattia, D., Mattiocco, M., De Vico Fallani, F., Tocci, A.,
et al. (2006). Hypermethods for EEG hyperscanning. Paper presented at the Engineering in
Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the
IEEE.
[Bekkering09] Bekkering, H., Bruijn, E. R. A. d., Cuijpers, R. H., Newman-Norlund, R., Schie,
H. T. v., & Meulenbroek, R. (2009). Joint Action: Neurocognitive mechanisms supporting
human interaction. Topics in Cognitive Science, 1(2), 340-352.
[Carrión10] Carrión, R., Keenan, J. P., & Sebanz, N. (2010). A truth that's told with bad intent:
An ERP study of deception. Cognition, 114(1), 105-110.
[Fu04] Fu, W.-T., & Gray, W. D. (2004). Resolving the paradox of the active user: Stable
suboptimal performance in interactive tasks. Cognitive Science, 28(6), 901-935.
[Gray08] Gray, W. D. (2008). Cognitive architectures: Choreographing the dance of mental
operations with the task environments. Human Factors, 50(3), 497-505.
[Gray07] Gray, W. D. (Ed.). (2007). Integrated models of cognitive systems. New York: Oxford
University Press.
[Gray04] Gray, W. D., & Fu, W.-T. (2004). Soft constraints in interactive behavior: The case of
ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head. Cognitive
Science, 28(3), 359-382.
[Gray06] Gray, W. D., Sims, C. R., Fu, W.-T., & Schoelles, M. J. (2006). The soft constraints
hypothesis: A rational analysis approach to resource allocation for interactive behavior.
Psychological Review, 113(3), 461-482.



[Gray03] Gray, W. D., Schoelles, M. J., & Myers, C. W. (2003). Meeting Newell's other
challenge: Cognitive architectures as the basis for cognitive engineering. Behavioral & Brain
Sciences, 26(5), 609-610.
[Gray02] Gray, W. D. (2002). Simulated task environments: The role of high-fidelity
simulations, scaled worlds, synthetic environments, and microworlds in basic and applied
cognitive research. Cognitive Science Quarterly, 2(2), 205–227.
[Mangels06] Mangels, J. A., Butterfield, B., Lamb, J., Good, C., & Dweck, C. S. (2006). Why do
beliefs about intelligence influence learning success? A social cognitive neuroscience model.
Scan, 1, 75-86.
[Rudoy09] Rudoy, J. D., & Paller, K. A. (2009). Who can you trust? Behavioral and neural
differences between perceptual and memory-based influences. Frontiers in Human
Neuroscience, 3.
[Schoelles01] Schoelles, M. J., & Gray, W. D. (2001). Argus: A suite of tools for research in
complex cognition. Behavior Research Methods, Instruments, & Computers, 33(2), 130–140.
[Shintel09] Shintel, H., & Keysar, B. (2009). Less Is More: A Minimalist Account of Joint
Action in Communication. Topics in Cognitive Science, 1(2), 260-273.
[Stokes97] Stokes, D. E. (1997). Pasteur's quadrant: Basic science and technological
innovation. Washington, DC: Brookings Institution Press.
[Sun07] Sun, R., & Naveh, I. (2007). Social institution, cognition, and survival: A cognitive-
social simulation. Mind and Society, 6(2), 115-142.
[Turner01] Turner, M. (2001). Toward the founding of Cognitive Social Science.
http://markturner.org/checss.html



9 Non-CCRI Research: Communication Networks
Academic Research Center (CNARC)

Director: Thomas F. La Porta, Penn State


Email: tlp@cse.psu.edu, Phone: 814-865-6725
Government Lead: Greg Cirincione, ARL
Email: cirincione@arl.army.mil, Phone: 301-394-4809

Project Leads                                    Lead Collaborators
Project C1: A. Yener/R. Govindan, Penn State/USC
Project C2: T. F. La Porta, Penn State
Project C3: Deferred

Table of Contents
9 Non-CCRI Research: Communication Networks Academic Research Center (CNARC) .... 9-1
9.1 Overview ......................................................................................................................... 9-2
9.2 Motivation ....................................................................................................................... 9-3
9.2.1 Challenges of Network-Centric Operations ............................................................. 9-3
9.2.2 Example Military Scenarios ..................................................................................... 9-3
9.2.3 Impact on Network Science ..................................................................................... 9-4
9.3 Key Research Questions ................................................................................................. 9-4
9.4 Technical Approach ........................................................................................................ 9-4
9.5 Project C1: Modeling Data Delivery in Dynamic, Heterogeneous, Mobile Networks . 9-6
9.5.1 Project Overview ..................................................................................................... 9-7
9.5.2 Project Motivation ................................................................................................... 9-8
9.5.3 Key Research Questions .......................................................................................... 9-8
9.5.4 Initial Hypothesis ..................................................................................................... 9-9
9.5.5 Technical Approach ................................................................................................. 9-9
9.5.6 Task C1.1: Modeling Operational Information Content Capacity (OICC) and Factors
that Impact OICC (G. Kramer and K. Psounis, USC (CNARC), R. Ramanathan, BBN
(CNARC), A. Yener, Penn State (CNARC)) ..................................................................... 9-10
9.5.7 Task C1.2: Characterizing and Controlling QoI (R. Govindan and M. Neely, USC
(CNARC); S. Krishnamurthy, UC Riverside (CNARC); Q. Zhao, UC Davis (CNARC);
A. Bar-Noy, CUNY (CNARC); T.F. La Porta, Penn State (CNARC); M. Srivatsa, IBM
(INARC); T. Abdelzaher, UIUC (INARC)) ............................................................... 9-16
9.5.8 Task C1.3: Characterizing Connectivity and Information Capacity for Dynamic
Networks (Q. Zhao, UC Davis (CNARC), N. Young, UC Riverside (CNARC), A. Yener,
Penn State (CNARC), P. Brass, CUNY (CNARC), A. Swami, ARL) .............................. 9-25



9.5.9 Task C1.4: Modeling the Impact of Data Provenance and Confidentiality Properties
on QoI (K. Levitt and P. Mohapatra (UC Davis); A. Smith, S. Zhu, and A. Yener (Penn
State); S. Krishnamurthy (UC Riverside)) ......................................................................... 9-29
9.5.10 Linkages with Other Projects ............................................................................... 9-39
9.5.11 Collaborations and Staff Rotations ...................................................................... 9-39
9.5.12 Relation to DoD and Industry Research .............................................................. 9-39
9.6 Project C2: Characterizing the Increase of QoI Due to Networking Paradigms ......... 9-42
9.6.1 Project Overview ................................................................................................... 9-42
9.6.2 Project Motivation ................................................................................................. 9-43
9.6.3 Key Research Questions ........................................................................................ 9-43
9.6.4 Initial Hypothesis ................................................................................................... 9-44
9.6.5 Technical Approach ............................................................................................... 9-44
9.6.6 Task C2.1: Characterizing Performance of Collaborative Networking for
Concurrency in Dynamic networks with Realistic Traffic (J.J. Garcia-Luna-Aceves and H.
Sadjadpour, UCSC (CNARC); Q. Zhao, UC Davis (CNARC), S. Krishnamurthy, UC
Riverside; M. Faloutsos, UC Riverside (IRC); B. Sadler, ARL) ....................................... 9-44
9.6.7 Task C2.2: Characterizing the Benefits of In-Network Storage and Cooperative
Caching (G. Cao and T.F. La Porta, Penn State (CNARC); B. Krishnamachari, USC
(CNARC); I. Iyengar, IBM (INARC), T. Abdelzaher, UIUC (INARC)) ......................... 9-48
9.6.8 Task C2.3: Characterizing the Impact of Scheduling on QoI (M. Neely and B.
Krishnamachari, USC (CNARC), A. Yener and T. F. La Porta, Penn State (CNARC), A.
Bar-Noy, CUNY (CNARC)) ............................................................................................. 9-54
9.6.9 Linkages with Other Projects ................................................................................. 9-61
9.6.10 Collaborations and Staff Rotations ...................................................................... 9-61
9.6.11 Relation to DoD and Industry Research .............................................................. 9-61
9.7 Project C3: Achieving QoI Optimal Networking (Deferred) ...................................... 9-63
9.7.1 Project Overview ................................................................................................... 9-63
9.7.2 Project Motivation ................................................................................................. 9-64
9.7.3 Key Project Research Questions ............................................................................ 9-64
9.7.4 Initial Hypotheses .................................................................................................. 9-64
9.7.5 Technical Approach ............................................................................................... 9-64

9.1 Overview

Our goal is to understand and characterize the capabilities of complex communications networks,
such as those used for network-centric warfare and operations, so that their behavior may be
accurately predicted and they may be configured for optimal information sharing and gathering. The objective of such a network is to deliver the highest quality of information, on the basis of which correct decisions can be made and comprehensive situational awareness provided. This will increase mission tempo and provide overall supremacy in managing the resources engaged in a
mission. Network science must embody the vision of a network as an information source.
Therefore, in the CNARC we aim to characterize and optimize network behavior in a way that
maximizes the useful information delivered to its users. To this end we define a new currency by
which we evaluate a network: its operational information content capacity (OICC).



We believe that this approach will truly capture the value of a network and allow a science to be
developed that fundamentally characterizes the volume of useful information that a network can
transfer to a set of users. To do this, one must consider the information needs of tactical
applications that share a network, and must cast the behavior of the network in light of these
needs. As the network delivers data to applications, the data is transformed into information.
Different applications require different types of sources and different network behavior in terms
of data delivery characteristics and security to distill useful information from the data received.
The data delivery characteristics and security properties of the network may vary depending on
the location of the source of data. The goal of the network is to deliver the data from which the highest quality of information (QoI), as perceived by the application, may be derived.

We work with the INARC to understand the characteristics of information and of the underlying information networks. These will be leveraged to determine the relative importance of information and how it must be treated as it is transported across a network. We will work with the SCNARC to understand underlying social networks and how information is used in making decisions.

9.2 Motivation

Our ultimate target is to understand and control network behavior so that the operational
information content capacity of a network may be increased by an order of magnitude over what
is possible today.

9.2.1 Challenges of Network-Centric Operations


The purpose of a tactical network is to disseminate the information required to make decisions
and increase mission tempo. This research directly addresses the definition of the quality of information in terms of parameters that can be controlled or monitored within a network. With this definition one can determine the information capacities of networks and determine which network controls, on which parameters, must be in place.

This project establishes the limits of capacity of communication networks in terms of their
operational information content capacity. We explicitly consider tactical network characteristics
of size, dynamics, mobility and heterogeneity. The project ultimately accounts for interactions
with information and social networks. These underlying networks must be leveraged to a) fully
understand what information is important and how it is being used, and b) allocate
communication network resources intelligently to optimize the OICC.

9.2.2 Example Military Scenarios


Communications are critical for virtually all military operations, ranging from troop movements and intelligence gathering to mission planning. These networks operate in a variety of
conditions. Communication networks may be established in highly populated areas with
competing networks, or in the presence of adversarial networks so that communications must be
made robust and secure. We specifically examine wireless networks, both single- and multi-hop,
in which nodes are highly mobile and communications conditions are highly dynamic.



9.2.3 Impact on Network Science
The results of this work will lead to a new characterization of communication networks: understanding, optimizing, and controlling their operational information content capacity. This
new metric will lead to fundamentally new models and a new understanding of the properties of
networks that impact the OICC: data delivery characteristics, security properties in the network,
mobility, dynamics of communications environment and traffic, and information sources. These
new models and understanding will lead to new algorithms to control and optimize the OICC
obtained from these networks, and will ultimately lead to new protocol designs to achieve the
maximum OICC.

9.3 Key Research Questions

This Center aims to answer the following meta-level questions:


- How can OICC, which is a composite of the quality of information and the amount of information that may be delivered, be modeled to capture the network properties that most impact it?
- Given the network properties that impact OICC, and underlying social and
information networks, what mechanisms are best suited to control and optimize (if
possible) the OICC of a network?
- Given the mechanisms that will most impact OICC, and the existence of optimal
solutions or known bounds, what protocol structures will come closest to achieving
optimal OICC in an operational network?

9.4 Technical Approach

We will achieve our high level objective by first developing comprehensive models capturing the
behavior of OICC and QoI and the properties of tactical networks that impact them (e.g.,
dynamics). From these models we expect to learn which factors have the largest impact OICC
and will be able to model network paradigms that mitigate and control these factors so that we
can improve the achievable OICC. With this understanding we will analyze protocol structures
to determine if they prevent networks from reaching their optimal QoI, and if so, explore new
protocol structures that alleviate these bottlenecks.

Thus, the CNARC will execute three projects:

C1: Modeling Data Delivery in Dynamic, Heterogeneous, Mobile Networks – This project
focuses on the behavior of OICC and QoI and the factors that impact them. In the first year, the
largest effort will be on C1. This is so that we may develop a fundamental definition of OICC
and QoI that is accepted across the CTA program; in fact this work will be done collaboratively
with the other centers. Within C1 we will also define the first models of OICC and QoI in terms
of network parameters, properties, and constraints. INARC is participating in this project, and ARL is collaborating.



C2: Characterizing the Increase of QoI due to Networking Paradigms – This project focuses on
the impact of select networking paradigms in increasing QoI and OICC. We will also have a
significant effort on C2 in year 1. The networking paradigms selected for the first year are based
on our experience with network research and will be tuned as the project progresses. As C1
matures we will be better able to quantify OICC and select network paradigms that we expect to
have the largest impact on OICC. INARC, IRC and ARL are collaborating on this project.

C3: Achieving QoI Optimal Networking – This project focuses on the structure of protocols that
may limit the QoI achieved in practical networks. Project C3 will be deferred until the second
year of the program. We will select protocols to analyze once we have an understanding of the
factors that impact OICC and QoI and the networking paradigms that hold the most promise.



9.5 Project C1: Modeling Data Delivery in Dynamic,
Heterogeneous, Mobile Networks

Project Lead: A. Yener, Penn State


Email: yener@ee.psu.edu Phone: 814-865-4337
Project Lead: R. Govindan, USC
Email: ramesh@usc.edu Phone: 213-740-4509

Primary Research Staff                         Collaborators
A. Bar-Noy, CUNY (CNARC)                       A. Swami, ARL
P. Brass, CUNY (CNARC)
R. Govindan, USC (CNARC)
G. Kramer, USC (CNARC)
B. Krishnamachari, USC (CNARC)
S. Krishnamurthy, UC Riverside (CNARC)
T. F. La Porta, Penn State (CNARC)
K. Levitt, UC Davis (CNARC)
P. Mohapatra, UC Davis (CNARC)
M. Neely, USC (CNARC)
K. Psounis, USC (CNARC)
R. Ramanathan, BBN (CNARC)
A. Smith, Penn State (CNARC)
A. Yener, Penn State (CNARC)
N. Young, UC Riverside (CNARC)
Q. Zhao, UC Davis (CNARC)
S. Zhu, Penn State (CNARC)
M. Srivatsa, IBM (INARC)
T. Abdelzaher, UIUC (INARC)



9.5.1 Project Overview
In this project we will characterize the impact of the data delivery capabilities of the network on
Quality-of-Information (QoI), and identify the fundamental performance limits of the network.
Our overarching objective is to model and determine the operational information content
capacity (OICC) of a network with realistic constraints. We emphasize that this is a new
performance metric unlike any considered before for communication links or networks.

Traditionally, capacity refers to the number of bits per unit resource that is reliably
communicated between source(s) and destination(s). It is a limit that, in theory, is achievable and
one that is provably the upper bound beyond which reliable communication is not possible. Since
Shannon [1] characterized the channel capacity of a single transmitter-to-single receiver link
sixty years ago, there has been an intense effort in realizing or at least coming close to this limit
in real systems. Information theory, in the past few decades, has evolved to include many new
communication paradigms using the mathematical framework of Shannon’s: for example,
multiple transmitters or receivers for which capacity is a region that consists of the collection of
all rate tuples at which reliable communication is possible and beyond which it is not, or a model
where a helper node relays information between the transmitter and the receiver. Though the
models considered appear deceivingly simple – a transmitter and two receivers [2] [3]; two
interfering transmitters and with two corresponding receivers [4]; a three-node network including
a relay [5] -- finding the exact channel capacity in the sense of Shannon has so far eluded the
information theory community.
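
For reference, and as the single-link baseline against which the OICC is being positioned, Shannon's channel capacity for a discrete memoryless channel, together with its familiar band-limited additive white Gaussian noise special case, can be written as

    C \;=\; \max_{p(x)} I(X;Y),
    \qquad
    C_{\mathrm{AWGN}} \;=\; W \log_2\!\left(1 + \frac{P}{N_0 W}\right)\ \text{bits/s},

where I(X;Y) is the mutual information between channel input and output, W is the bandwidth, P the received signal power, and N_0 the one-sided noise power spectral density.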

At the other extreme, another line of work starting with the seminal work by Gupta and Kumar
[6] considered networks consisting of an asymptotically large number of nodes, and provided an initial modeling tool for the discussion of the fundamental limits on the capacity of large wireless
networks. They concluded that the capacity of wireless networks decreases as the number of
nodes increases, and since the publication of their results almost 10 years ago, many researchers,
including ourselves, have addressed ways to alleviate the impact of multiple access interference
(MAI) on the throughput capacity of wireless networks. However, many limitations must be
revisited in order to fully understand the true limits of a tactical network consisting of social
networks, information networks and an underlying communication and storage network in a
dynamic environment consisting of heterogeneous nodes.
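
For concreteness, the scaling result of Gupta and Kumar [6] referred to here states that, for randomly located nodes under their protocol model, the throughput obtainable by each node for a randomly chosen destination is

    \lambda(n) \;=\; \Theta\!\left(\frac{W}{\sqrt{n \log n}}\right)\ \text{bits/s},

where n is the number of nodes and W the channel rate; the per-node share therefore vanishes as the network grows, which is the sense in which capacity "decreases as the number of nodes increases."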

We posit that the fundamental limits of tactical networks are inherently different than the limits
of broadcast links, and that we should consider all the inherent factors that constitute a tactical
network, and which will be reflected in QoI. We believe that an essential goal of the proposed
center is to develop analytical tools that further our understanding of realistic tactical networks.

We certainly acknowledge that the proposed effort constitutes uncharted territory for
information, communication, and network theorists, as well as network protocol experts. We
submit that recognizing the above four key aspects of fundamental limits of a tactical network
and the adoption of the proper central theme to glue our efforts together are critical. As a result, defining a new currency of information is key to obtaining results for a new theory of networks that is as fundamental as Shannon capacity is for a single link. We call this fundamental metric the operational information content capacity of the network.



To fully account for the social and information networking aspects of tactical networks that impact the nature of their information flows, we will collaborate with INARC and SCNARC researchers, seeking common modeling ground derived from complex networks.

9.5.2 Project Motivation


Existing models of networks have many limitations that must be removed to characterize the
achievable QoI and OICC in a network. First, in all models of wireless network capacity to date, the fundamental limit has intrinsically been assumed to be the number of bits per unit resource that the network can communicate, without regard to the meaning of those bits. Second, the models
adopted to date fail to reflect the social and information network aspects of a tactical network, or
any network in general. Third, the analytic tools, and hence the insights, developed over the last
decade have been largely limited to homogeneous communication networks in which nodes have
identical capabilities, nodes have identical demands, node densities are uniform, the level of
cooperation is identical throughout the network, and so forth. Fourth, while a few efforts have
considered the impact of mobility, there are no results on the capacity of wireless networks when
nodes follow realistic mobility patterns.
Challenges of Network-Centric Operations
The goal of a network in a military environment is to enable rapid decision making to increase mission tempo. This requires that a sufficient amount of information, at the required quality, be delivered to decision makers. Challenges include understanding what information is important (in collaboration with the SCNARC) and with what quality (e.g., delay, security properties) information must be delivered for knowledge extraction to be effective (in collaboration with the INARC). This requires defining QoI in terms of network properties and translating this into the OICC.
Example Military Scenarios
Communications are critical for virtually all military operations, ranging from troop movements and intelligence gathering to mission planning. We consider all of these cases in our models and analysis. We also consider all military communications environments: multi-hop wireless, dynamic conditions, high mobility, heterogeneous nodes, and heterogeneous traffic.
Impact on Network Science
The results from this project will allow the design of networks and algorithms that approach the
theoretical limits of operational information content capacity. With knowledge of bounds, such
networks and algorithms may be properly evaluated and controlled to maximize decision-making
capabilities and command and control operations.

9.5.3 Key Research Questions


We seek to answer the following research questions:
- What are the fundamental limits of operational information content capacity (OICC)
in dynamic, heterogeneous mobile networks jointly considering data delivery and
security properties?
- How do we characterize QoI? How do we characterize the impact of QoI on the
OICC and vice versa?
- What are the relative impacts of properties and constraints of tactical MANETs on
OICC, and why?



9.5.4 Initial Hypothesis
Our research hypothesis can be summarized as follows: we forecast that our modeling of OICC will yield a new measure of capacity that is vastly different from any measure of capacity studied thus far. We expect this new metric to differ from traditional metrics with respect to its sensitivity to various node, network, and security properties and capabilities. Consequently, we forecast that these capabilities and properties will have a profound impact on OICC.

9.5.5 Technical Approach


Overview
This project is organized into four tasks. The first two develop models for OICC and quality of information (QoI), the two tenets by which networks should be evaluated. The subsequent two tasks focus on properties of networks (dynamics of different types) and on security, both of which impact OICC and QoI.

In particular, the tasks are:

1. Modeling Operational Information Content Capacity and Factors that Impact OICC – In this
task, we will develop models to determine the limits of OICC in the face of realistic
constraints. We will consider scaling to the size of large tactical networks and define
network parameters that impact network performance. We will work with INARC to
determine the importance of bits of information. We will also interact with Tasks C1.2-4
towards modeling the OICC for dynamic heterogeneous networks and its impact on QoI and
vice versa.

2. Characterizing and Controlling QoI - In this task, our goal is to systematically understand the issues pertaining to the relationship between the network and the quality of information (QoI) ultimately delivered to the end user. QoI can be impacted by the data delivery characteristics of the network (e.g., loss, delay, and jitter) and by the security services offered in the network. This is collaborative work with INARC.

3. Characterizing Connectivity and Information Capacity for Dynamic Networks – In this task
we explicitly model heterogeneity and dynamics and their effect on QoI. We consider both
temporal and spatial dynamics. We will work with the EDIN CCRI task on mobility
modeling to capture mobility dynamics. We will also work with Task C1.4 toward understanding the impact of security properties on connectivity and capacity.

4. Modeling the Impact of Data Provenance and Confidentiality Properties on QoI - QoI is impacted by the security characteristics of the information source(s), the network that transports it, and the hosts that process it. Our aim is not only to characterize the impact of security on the quality of information but also to mitigate that impact.



References
[1] C. E. Shannon, “A mathematical theory of communication,” Bell. Sys. Tech. Journal, vol. 27,
pp. 379–423, 623–656, 1948.

[2] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information Theory, vol. 18, pp.
2–14, January 1972.

[3] R. G. Gallager, “Capacity and coding for degraded broadcast channels,” Problemy Peredachi
Informatsii, vol. 10, no. 3, pp. 3–14, 1974.

[4] T. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE
Transactions on Information Theory, vol. 27, pp. 49–60, January 1981.

[5] T. M. Cover and A. A. E. Gamal, “Capacity theorems for the relay channel,” IEEE
Transactions on Information Theory, vol. 25, pp. 572–584, September 1979.

[6] P. Gupta and P. R. Kumar, "The capacity of wireless networks," IEEE Transactions on Information Theory, vol. 46, pp. 388–404, March 2000.

9.5.6 Task C1.1: Modeling Operational Information Content Capacity (OICC) and
Factors that Impact OICC (G. Kramer and K. Psounis, USC (CNARC), R.
Ramanathan, BBN (CNARC), A. Yener, Penn State (CNARC))

Task Overview
The operational information content capacity is a new paradigm where we seek to understand
performance limits of communication networks that take into account (i) realistic limitations on
node and network capability and complexity; and (ii) the quality of information (QoI) that is
communicated between the various entities. We will seek the limits of a communication network
under realistic constraints, with performance metrics that reflect the impact of information content and QoI. As alluded to in the overall summary of the project, this is a performance metric that has not previously been considered or defined.

Task Motivation
Only by understanding the limits of a communication network in terms of its OICC can we start
to make progress on controlling and optimizing the performance of a network.

Key Research Questions


In this task we seek to answer the question: What is the OICC of a communication network?

Initial Hypothesis
We expect to find that certain constraints, such as the number of nodes in a network, the varying capabilities of the nodes, and the communication environment (and the resulting topology), will have varying impact on QoI, and we will use these results to focus the work in subsequent tasks. We expect that these properties will jointly affect the QoI and OICC in a way that magnifies the variation of OICC far beyond traditional measures of network capacity considered to date.
Technical Approach
Our approach is to define a network model that includes several parameters of interest to
traditional networks and additional parameters such as security and information-specific
requirements to define the OICC of a network. Our goal in the first year is to define this
framework and capture the interdependencies of these parameters. Since this framework is a fundamental departure from "capacity" definitions to date, this task is an overarching task of the whole project; while the PIs listed will be the primary drivers, we expect tight collaboration across the center in tackling the many facets of this challenge.

Given the grand challenge this task presents, we plan to tackle it from a number of angles as
outlined below.

Fundamental Building Blocks for OICC


Towards establishing the OICC framework, we will address the following main steps: (i)
developing a concurrent communication network model; and (ii) establishing the operational
information content capacity. In subsequent tasks we will use these results to design communication strategies that can attain the OICC. Our recent work on graph-theoretic bounds based on fd-separation [1][2], which applies to any kind of cooperative coding, can be extended to include all aspects of the metric QoI = f(I, D, S); in particular, understanding precisely what the security parameter S means promises to be interesting. This will be pursued in the first year. Once the performance limits are established, communication strategies will be designed in subsequent tasks, where we expect that unifying the basic coding methods for interference networks [2] in a smart way will be needed, including treating interference as noise, interference avoidance (TDM/FDM), precoding, power control, partial interference cancellation, interference alignment, interference forwarding, computation over channels, and the use of feedback to adapt to time variations.
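
As a purely illustrative placeholder for the composite metric QoI = f(I, D, S) (its actual functional form is a research product of this task), the sketch below composes hypothetical information-value, delivery, and security terms into a single score so that sensitivity to individual network parameters can be explored numerically.

    import math

    def qoi(info_value, delay_s, loss_rate, security_level,
            deadline_s=1.0, required_security=0.8):
        """Illustrative composite QoI = f(I, D, S); every term here is a placeholder.
        info_value: application-assigned value of the content, in [0, 1] (I term)
        delay_s, loss_rate: data delivery characteristics of the network (D term)
        security_level: fraction of required security properties satisfied (S term)"""
        timeliness = math.exp(-delay_s / deadline_s)              # decays past the deadline
        delivery = (1.0 - loss_rate) * timeliness                 # D
        security = min(1.0, security_level / required_security)   # S
        return info_value * delivery * security

    # Toy sensitivity exploration over a grid of network operating points
    for delay in (0.1, 0.5, 2.0):
        for loss in (0.0, 0.2):
            print(f"delay={delay}s loss={loss}: QoI={qoi(0.9, delay, loss, 0.9):.3f}")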

Impact of Realistic Constraints on Network Operations


A main characteristic of tactical networks is their dynamic nature, owing both to the mobility of the nodes and to the communication environment. A tactical network of considerable size is bound by realistic constraints imposed by the environment or by the communication scenario. In order to determine the OICC, we must account for these constraints explicitly. For example, dynamic resource allocation that includes link scheduling and power and rate allocation is a must, as the information to be communicated over the network is subject to time-varying link capacities, interference, and time-varying link connectivity. The network must be designed such that performance can be guaranteed over a wide range of network parameters, i.e., the system design must be robust. For example, the scheduling or routing algorithms must not fail if some of the wireless links fail or if the link connectivity changes. Further, resource allocation algorithms must be distributed, i.e., they must function using only local information about queue lengths and channel state. Due to mobility, the nodes may not even have accurate information about their neighborhood, i.e., some neighboring nodes may leave or new neighboring nodes may join. We will characterize the loss in performance, in terms of the achievable stability region, when the topology information is time-varying and inaccurate. This task will interact with Tasks C1.3 and C2.3 and serve as a bridge towards defining realistic performance limits.
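
One standard point of departure for the distributed, queue- and channel-aware resource allocation described above is max-weight (backpressure-style) scheduling, in which each link is weighted by its local backlog times its current achievable rate and a non-conflicting set of links with large total weight is activated. The sketch below is a minimal greedy illustration over a hypothetical conflict graph; it is meant only to ground the discussion, not as the algorithm whose stability region this task will analyze.

    def max_weight_schedule(queues, rates, conflicts):
        """Greedy max-weight link activation using only local queue and channel state.
        queues[l]: backlog of link l; rates[l]: current achievable rate of link l;
        conflicts: set of frozensets {l, m} of links that cannot be active together."""
        weights = {link: queues[link] * rates[link] for link in queues}
        active = set()
        for link in sorted(weights, key=weights.get, reverse=True):
            if weights[link] <= 0:
                break
            if all(frozenset((link, other)) not in conflicts for other in active):
                active.add(link)
        return active

    # Toy example: three links; links 0 and 1 interfere and cannot be scheduled together
    queues = {0: 12, 1: 7, 2: 3}
    rates = {0: 1.0, 1: 2.0, 2: 1.5}
    conflicts = {frozenset((0, 1))}
    print(max_weight_schedule(queues, rates, conflicts))   # expected output: {1, 2}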

Realistic Scaling
Parallel with the above realistic modeling effort, we will also pursue the understanding of
performance and scalability of finite-sized networks. Over the years, a number of theoretical
results have appeared in the academic literature regarding fundamental limits. However, the implication of these asymptotic results for finite, medium-sized networks is not clear. For example, the scope of military wireless networks is likely to be around the brigade level (a few thousand nodes).

This effort complements work on asymptotic scalability by investigating in-practice scalability.


Asymptotic scalability, under which the bulk of work along the lines of [3] lies, refers to the order of growth of some metric (e.g., capacity) as a function of size. In contrast, we define in-practice scalability as the number of nodes (or other parameter) beyond which a network will not work "adequately"; as part of the task, we shall develop a formal definition of "adequately". Unlike asymptotic scalability, which is unqualified ("Network (of type) X does not scale"), in-practice scalability is qualified ("Network X with parameter set {P} scales to 1000 nodes"). Information theorists and network scientists and engineers use these two different interpretations, but there has been no science to bridge the gap.

The notion of in-practice scalability is essentially one of practical performance modeling by


considering the various features of a military network. In year one we start with four input
parameter vectors:

1. A network definition N that captures the size, mobility, density, degree, diameter,
connectivity and other properties. The idea is to capture in simple terms the topological
properties that affect the capacity and protocols underlying the network. For instance, a
line network (military convoy), a clique (parking lot) and a random deployment are all
quite different from each other in the constraints that they present. We shall draw upon
our work in density-adaptation [4] and topology characteristics [5] in defining this.

2. A node parameter definition P that captures the data rate, number of transceivers,
directionality [6], energy constraints, processing constraints, storage constraints and other
such properties. The idea is to capture in simple terms limitations on information carrying
capability that directly affects the performance.

3. A protocol definition R that captures the overhead induced by MAC, network, and transport layer protocols, including security protocols. These may be specific to particular networks, but a first-order model can have choices like TDMA/CSMA-CA, link-state/on-demand routing, and TCP/UDP, respectively. We shall draw upon our prior work in link-state routing [7], mobility-assisted on-demand routing [8][9], and TDMA versus CSMA-CA MAC [10] in informing this vector.

4. An information profile load definition L that captures the quality, nature, scope and other properties of the information that is transported across the network for user consumption. This includes the traditional concept of offered load, including rate, packet size, and hop-distribution, but goes beyond that to capture QoI.

Given a tuple of vectors (N, P, R, L), each of which has a number of to-be-defined variables, we shall derive expressions that characterize the performance. The expressions can be used to estimate any one value given the others, or a k-dimensional envelope for a set of k parameters given an instantiation of the others.
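
To make the intent concrete, the sketch below strings together a deliberately crude first-order model in the spirit of the (N, P, R, L) tuple: per-node demand (relayed offered load plus control overhead) is compared against a per-node capacity estimate that shrinks with network size, and the largest n that still fits is reported as the in-practice scale. Every formula and constant is a placeholder for the models this task will actually develop.

    import math

    def per_node_capacity(n, link_rate, mac_efficiency):
        """Placeholder N/P term: per-node capacity shrinking roughly as 1/sqrt(n log n)."""
        return mac_efficiency * link_rate / math.sqrt(n * math.log(n))

    def per_node_demand(n, offered_load, avg_hops, routing_overhead_per_node):
        """Placeholder L/R term: traffic relayed per node plus control overhead."""
        return offered_load * avg_hops + routing_overhead_per_node * n

    def in_practice_scale(link_rate=1e6, mac_efficiency=0.6, offered_load=2e3,
                          avg_hops=4, routing_overhead_per_node=5.0, n_cap=10000):
        """Largest n (up to n_cap) for which demand still fits within capacity."""
        n_max = 2
        for n in range(2, n_cap):
            demand = per_node_demand(n, offered_load, avg_hops, routing_overhead_per_node)
            if demand <= per_node_capacity(n, link_rate, mac_efficiency):
                n_max = n
        return n_max

    print("in-practice scale with these toy parameters:", in_practice_scale(), "nodes")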

We note that the parameter vectors may be inter-related within and outside of the
communications network – for example, the protocol overhead (R) depends upon the topology
(N), which in turn may depend upon the underlying social and information pathways. Similarly,
the load definition (L) depends upon the social network (military hierarchy) and the information
network underlying it. In investigating these dependencies, we expect to collaborate heavily with
the INARC, SCNARC and IRC.

We shall inform and refine our model with real military networks being developed at BBN Technologies (a part of Raytheon Corp.), as available (subject to sensitivity constraints). In particular, we plan to use the DARPA WNaN network as a basis for "sounding out" the accuracy of our model, and progressively refining it.

Figure 1: Example framework for analytical model of operational performance based on an evolving
interconnection of specific models. Orthogonal refinements can be made of each model to progressively
increase accuracy. Note: This is for illustration purposes only, not all components or relationships shown.



We shall first develop a framework for capturing the various components of the system
performance, and specify their inter-dependencies. The framework encapsulates a hierarchical
structure where a more general model is recursively composed of more specific models, some of
which may use known theoretical results. An example is shown in Figure 1. At the “highest”
level are the four components – network topology, node parameters, protocols and information
profile discussed earlier. Developing these requires capturing specific aspects – for instance, the
Topology model requires an understanding of dynamics, which in turn requires an understanding
of interference and mobility. Other parts of the NS-CTA effort will produce results that can then
be drawn into these models – in this example, CNARC is working on both interference and
mobility models. Of course, the framework is not strictly hierarchical since there are several
inter-dependencies, a couple of which (e.g. topology affecting routing overhead) are shown in
the figure.

Other centers, such as SCNARC and INARC, will be tapped, for instance, in the characterization of load. Similarly, computing multicast branching factors and tree shapes (which determine loading), or computing routing overhead (which depends on network structure), could leverage sophisticated graph-theoretic results from the EDIN part of the project. Indeed, our effort can be a source of specific problems and tasks for information theorists, graph theorists, experimenters, etc.: problems that stem from real-world needs.

As a result of our work, it will be possible to theoretically predict rough-order-of-magnitude


performance for large networks and perform sensitivity analysis. Currently, models exist for a few of the specific components (e.g., the routing overhead of proactive/reactive protocols), but there is no theoretical framework or model that combines all the elements of a system into a single evolvable model. Initial, very preliminary work along these lines in the context of DARPA WNaN has already proven valuable: somewhat counter-intuitively, small changes in average degree or MAC efficiency, for instance, can be seen to have a far greater influence than routing protocol overhead reduction. We hope to provide similar practical insights that complement asymptotic analyses.

Validation Approach
We will validate our framework by testing the sensitivity of the OICC against several of the parameters described above. We will compare these results to traditional measures of network capacity to quantify the difference in sensitivity.

Summary of Military Relevance


The goal of a network in a military environment is to enable rapid decision making to increase
mission tempo. This requires understanding and optimizing the OICC.

Research Products
In the first year we will
1. Develop a preliminary OICC framework including impact of realistic constraints
identified to be crucial.
2. Develop the framework of model hierarchies and inter-relationships both within and
outside of CNARC.



3. Create a first instantiation of the models to derive expressions for at least the information
carrying capacity of a given network.

References

[1] G. Kramer, “Communication on line networks with deterministic or erasure broadcast


channels," 2009 IEEE Inf. Theory Workshop, Taormina, Italy, pp. 404-405, October
2009.

[2] Y. Tian and A. Yener, “The Gaussian Interference Relay Channel with a Potent Relay,”
IEEE Global Telecommunications Conference, Globecom'09, Honolulu, Hawaii,
December 2009.

[3] P. Gupta and P.R. Kumar, “The Capacity of Wireless Networks,”, IEEE Transactions
on Information Theory, 2000

[4] R. Ramanathan, "Making Ad Hoc Networks Density Adaptive", Proc. IEEE MILCOM
2001, Tysons Corner, Virginia, October 2001.

[5] E.L. Lloyd, R. Liu, M.V. Marathe, R. Ramanathan, S.S. Ravi, "Algorithmic Aspects of
Topology Control Problems for Ad Hoc Networks," Proc. ACM MOBIHOC 2002,
Lausanne, Switzerland, June 2002

[6] R. Ramanathan, "On the performance of ad hoc networks using beamforming


antennas", Proc. ACM MOBIHOC 2001, Long Beach, California, USA, October 2001.

[7] C. Santivanez, R. Ramanathan, I. Stavrakakis, "Making Link-State Routing Scale for


Ad Hoc Networks", Proc. ACM MOBIHOC 2001, Long Beach, California, USA,
October 2001.

[8] T. Spyropoulos, K. Psounis, and C. Raghavendra, Efficient Routing in Intermittently


Connected Mobile Networks: The Single-copy Case, IEEE/ACM Transactions on
Networking, Vol. 16, Iss. 1, pp. 63-76, February 2008.

[9] T. Spyropoulos, K. Psounis, and C. Raghavendra, Efficient Routing in Intermittently


Connected Mobile Networks: The Multiple-copy Case, IEEE/ACM Transactions on
Networking, Vol. 16, Iss. 1, pp. 77-90, February 2008.

[10] A. Jindal and K. Psounis, Characterizing the Achievable Rate Region of Wireless
Multi-hop Networks with 802.11 Scheduling, IEEE/ACM Transactions on Networking,
Vol. 17, Iss. 4, pp. 1118-1131, August 2009.



9.5.7 Task C1.2: Characterizing and Controlling QoI (R. Govindan and M. Neely,
USC (CNARC); S. Krishnamurthy, UC Riverside (CNARC); Q. Zhao, UC
Davis (CNARC); A.Bar-Noy, CUNY (CNARC); T.F. La Porta, Penn State
(CNARC); M. Srivatsa, IBM (INARC), T. Abdelzaher, UIUC (INARC))

Task Overview
In this task our goal is to systematically understand the issues pertaining to the relationship
between the network and the quality of information (QoI) ultimately delivered to the end user.

QoI is a composite, multi-dimensional, metric that captures the trade-offs of several factors to
characterize the information ultimately delivered to the application. It allows us to model the
network as an information source. QoI applies to mission-critical information that is desired by
the end user. The quality of this content will be specified in terms of a few user-defined, high-level requirements. We seek to determine the extent to which the network can fulfill these high-level requirements.

This task differs from the effort underway in the ITA program in the following major ways: (i)
the ITA program examines QoI in sensor networks; here we take a much broader view of
information; (ii) here we model many network properties, such as security, when modeling QoI;
(iii) here we consider underlying information and social networks, and their relationship with
information and decision making, when defining QoI.

This task has INARC participants.

Task Motivation
QoI can be understood as a utility defined in terms of the information obtained from the network.
Information is derived from data. The type and quality of the information derived from the data
is application specific. It can be impacted by the network due to the data delivery characteristics
of the network (e.g., loss, delay, and jitter) and the security services offered in the network.

To the best of our knowledge, there are no efforts to date towards computing a metric that is
similar to QoI outside of preliminary research conducted in the ITA program. There have been
several studies on providing Quality of Service (QoS) support but we stress that the two are not
the same.

The benefit of using QoI as the metric of interest is that it allows us to model and characterize
networks in terms of the information they can transfer, not simply the data. Often information used
to drive a decision may come from multiple sources, in multiple formats, and with varying
properties. By focusing on information, we are able to fully leverage social and information
networks to the benefit of the communications network.

Key Research Questions


We seek to answer the question: What is the form of QoI in terms of a multi-dimensional
function with respect to network parameters?

Initial Hypotheses
We expect that the QoI achievable by a network can be described by a (possibly stochastic)
manifold in a multi-dimensional space. We expect to find that under certain assumptions and
constraints, an optimal QoI may be found.

Technical Approach
Our approach is to define a multi-dimensional function representing QoI that can be used across
the CTA, and then investigate approaches by which such a function can be optimized.

Since this is a non-classical approach, we first attempt to clearly define QoI, to explore the
impact of network structure, dynamics, and other factors on the QoI delivered to the user, and to
explore the mathematical foundations behind QoI metrics. This exploration will reveal
architectural trade-offs and mechanism choices, and help us, in subsequent years, to define an
architecture, mechanisms and protocols for QoI. The task is organized into several sub-tasks:

1. Definition of the QoI Space - We define QoI as a function with respect to network
parameters. We start with tractable subsets of network properties which will be extended
and jointly developed with the INARC and SCNARC.
2. Evaluating architectural choices for QoI – Given the definition of QoI, we evaluate
architectural tradeoffs in accurately estimating QoI in different classes of networks. This
step will give us a fundamental understanding of the space of designs for QoI
mechanisms.
3. Optimizing QoI metrics – Finally, we will explore the theory of optimizing complex
non-convex QoI metrics in a stochastic sense. This sub-task will help define operational
bounds on the achievable QoI in a network under various conditions.

Prior Work
There has been significant work within the DoD community on QoI and network-centric warfare
[8][9][10]. In much of this work, QoI is central to determining the extent to which information is “shareable.” This
work defines several different types of attributes by which QoI may be judged. These include
attributes that are objective, related to fitness-for-use, etc. Many prior works propose various
metrics which are composed into QoI, such as accuracy, currency, clarity, along with many
others [11][12][13][14][15][20]. In particular, [13] classifies these metrics into “situation
independent” and “situation dependent.”

In the ITA program, a great deal of effort has been spent on determining the difference between
quality and value of information, and how to codify QoI [16]. They propose that the value of
information depends on how useful the information is to
taking an action, while QoI is related to fitness of information. The research emphasizes how
metadata representing QoI metrics is important to convey to information recipients how data
may have been processed, for example, fused, within the network.

In [21] the authors propose “quality views.” Quality annotations, a type of meta-data, are
provided with information to give an indication of its quality. Different types of annotations are
called quality evidence. Operators, called quality assertions, may be applied to the annotations
to determine a rank or rating for the information. These quality assertions are domain specific.

Sub-Task 1: Definition of the QoI Space and determining the QoI of a network

QoI as experienced by the user is impacted by several factors. A simple way to conceive QoI is:

QoI = f(I, D, S)

where I, D, and S characterize the source information, the network data delivery performance,
and the level of associated security, respectively. QoI is application specific; in other words,
different applications may have different QoI requirements. Formulating a generic utility to
applications, the context (the topology of the network, the terrain), the user requirements based
on the tactical mission at hand (under attack versus peace missions), the heterogeneity of the
nodes that compose the network (in terms of memory, CPU etc.), and human input.

QoI may be viewed as a type of utility function that captures how useful information is to an
application. The QoI achieved will vary over time depending on other applications using the
network, the state of the network, etc. The type of utility function will vary depending on the
application. Some applications will find information useful in increments; others may require
some aspect of information in its entirety for it to be useful. Given the short term and long term
uncertainties associated with many of the above factors, the QoI metric can be expected to be
stochastic in nature. In other words, one might expect a QoI metric to be expressed as a
probability that certain criteria are fulfilled or a function of this probability (such as a moment).

To provide an example, the QoI for a node that requires authenticated images from a specific
location within a certain delay constraint might be expressed as the joint probability

P{ {the achieved throughput} > τ, {the messages received are authenticated}, {the latency} < δ,
{there are camera sensors in the desired location}, {the location information is correct} }

When computing this probability, note that the factors will have to be jointly considered. As an
example, if messages for a certain application are authenticated using digital signatures, this will
affect the achievable throughput and delay for that application. The location of the camera(s) will
have an impact on the delay. Depending on the efficacy of the localization schemes, there may
be errors in the location and this may bias the probability. The mobility, zoom and steering
capabilities of camera sensors will dictate whether or not there are such sensors in the location of
interest. The throughput and delay functions will depend on the routes that are possible, the
terrain, interference etc. The objective of the node under discussion may be to maximize this
probability. Clearly this is a complex task given the number of constraints; thus computing the
above probability is extremely difficult. The problem becomes even more complex when we
factor in human behaviors and more stringent application requirements, and when we consider
that multiple missions will be competing for resources and affect the QoI of each other.
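To make the flavor of such a computation concrete, the following minimal Python sketch estimates a
joint probability of the kind shown above by Monte Carlo simulation. All distributions, thresholds,
and the assumed coupling between authentication overhead, throughput, and latency are purely
illustrative assumptions, not outputs of this task.

```python
import random

def estimate_joint_qoi(n_trials=100_000, tau=1.0, delta=0.5):
    """Monte Carlo estimate of a joint QoI probability (illustrative model only).

    Assumed model: authentication succeeds with high probability but adds
    overhead that reduces throughput and adds latency; camera presence and
    correct localization are independent Bernoulli events.
    """
    hits = 0
    for _ in range(n_trials):
        authenticated = random.random() < 0.95          # message authentication check
        auth_overhead = 0.1 if authenticated else 0.0   # signing/verification cost
        throughput = random.expovariate(1.0 / 1.5) * (1.0 - auth_overhead)
        latency = random.uniform(0.1, 0.6) + auth_overhead
        camera_present = random.random() < 0.9          # sensor in the desired location
        location_ok = random.random() < 0.85            # localization is correct
        if (throughput > tau and authenticated and latency < delta
                and camera_present and location_ok):
            hits += 1
    return hits / n_trials

if __name__ == "__main__":
    print("Estimated joint QoI probability:", estimate_joint_qoi())
```

Even this toy model shows why the factors must be considered jointly: turning authentication off
raises throughput and lowers latency, but immediately fails the authentication requirement.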

If we now consider a network where each node has a different set of such QoI requirements, the
problem of meeting the requirements will be a challenge. In such a setting, one might envision a
network-wide QoI index, which is the joint probability that the QoI requirements of all the users
in the network are satisfied. Since the requirements of the different users consume resources and
any one mission interferes with other missions, the QoI achieved by one mission is constrained
by the very existence of other missions. In other words, there is a correlation between the
performance and security that can be provided to the multiplicity of simultaneously ongoing
missions in a network. Our ambitious goal is to find generic expressions for computing this
network-wide QoI.

In particular, we will determine the data delivery capabilities of the network under various
conditions; we will consider more realistic PHY layer representations, heterogeneity and the
impact of security on data delivery. The behavior of the network under attack and the
ramifications of enforcing security policies will be characterized. The type of information source
is dictated by the application and available sources. We will work on this collaboratively with
INARC.

We point out that our goal here is fourfold. First, we wish to fully understand the interactions
between the information sources and network on the achievable QoI. We will develop a
formulation of QoI to first include a subset of the important metrics of interest. We expect this
to be an iterative process with the INARC and IRC. Second, we propose to determine the highest
level of network-wide QoI that can be provided in any given tactical deployment setting. We
will then determine via a combination of experimentation and analysis if current protocols will
be able to provide this highest possible QoI. If not, we will undertake a careful assessment to
identify which of the functionalities (either in the protocol stack or in the security suite) is
causing the degradation. New protocol structures and algorithms will be explored as appropriate
towards enabling the highest QoI.

In our work we have classified QoI metrics into intrinsic and contextual. Intrinsic metrics assess
the quality of information independent of the situation. Attributes include concreteness, freshness
(the age of information), and provenance, among others. Contextual metrics include fitness-for-
use attributes, like accuracy, timeliness, and credibility (or believability). The research on
provenance and credibility and their impact on trust is part of the Trust CCRI.

The following efforts will be undertaken during the first year:

1. We will determine the QoI of a network in terms of a subset of data delivery and security
requirements. We propose to use an initial definition of QoI wherein the information
quality only pertains to a subset of the intrinsic and contextual attributes of QoI that we
expect to study over the life of the program. For the intrinsic metrics we will consider
data freshness and provenance; for contextual metrics we will consider accuracy and
timeliness. We will not address the modeling of provenance as it relates to trust, because
that is being addressed in the Trust CCRI; here we focus on the interactions of
provenance with the other metrics on overall QoI. The objective is to jointly satisfy a
user requirement in terms of these metrics. In other words, the QoI is the probability that
each of these requirements is satisfied to a desired degree.

Clearly, there is a challenge in terms of representing QoI in a compact form. We will
work with INARC on their efforts on QoI metadata. There is a trade-off in representing it
as a stochastic metric (as an example, a probability that certain conditions are satisfied as
described above) as opposed to simply a range requirement (as an example, the delay should
be no higher than some bound D). While the first
representation is more comprehensive in capturing temporal and spatial variations, the
second representation could offer better tractability of our analyses. We will consider
both and examine the trade-offs.

In our first year, we will first map the QoI to a vector [freshness, provenance, accuracy,
timeliness]. We will determine whether each of these metrics individually exceeds a threshold;
that is, the freshness should be higher than a specified threshold, the provenance measure
should exceed its threshold, and the accuracy should not fall below its threshold (see the
sketch after this list). We will consider both the representations outlined above (the stochastic
and the range-based representations). To begin with, we will consider a homogeneous network
wherein the requirements are the same for all the nodes in the network. Note that freshness and
timeliness, and several variants, have been previously considered for their impact on data
quality, and we will build upon this work [22]. We will also work with task C2.2 to determine
the impact of in-network storage on freshness and timeliness.

2. QoI is stochastic and dynamic in nature. It can vary rapidly in both temporal and spatial
domains. It is thus crucial that end users be able to project and predict QoI evolution so
that optimal decisions can be made on how to proceed with each specific mission. The
major challenge here is that the stochastic model of QoI evolution may not be known a
priori. The presence of multiple simultaneous missions distributed across the network
further complicates the problem: actions taken regarding one mission may affect the QoI
experienced by another mission (for example, due to shared communication resources).

We propose to develop real-time algorithms for learning and predicting QoI evolution to
support multiple distributed data flows associated with different end users/missions.
Compared with offline learning, real-time learning avoids overhead by learning from
information-bearing data instead of training data; it offers improved performance over
time with the accumulation of observations; it adapts to the dynamics of QoI evolution
and allows online prediction.

Our technical approach rests on our preliminary work [3] where we have developed a
stochastic optimization framework for decentralized multi-armed bandit with multiple
distributed users. This framework gives a new formulation of the classic bandit problem
that considers only a single user [4]. It captures the tradeoff across exploration (exploring
new routes to learn their QoI evolution for future use), exploitation (exploiting routes
with a good history of past QoI evolution), and competition (avoiding congestion caused
by competing users).

The design objective is to minimize the system regret (or the so-called cost of learning)
defined as the performance loss compared to the ideal scenario where the QoI model of
every route is perfectly known to all users and collisions among users are eliminated
through perfect centralized scheduling. In the first year of this project, our objective is
twofold: (i) establish the fundamental limit on the system regret of real-time distributed
learning algorithms; (ii) develop distributed learning algorithms to achieve the
fundamental limit. We will start with a temporally independent stochastic model for QoI
evolution. We will then address how temporal correlation in QoI evolution affects the
minimum system regret and the design of distributed learning algorithms.

As evident in the discussion in the above paragraph, there is a cost associated with
providing the QoI. Thus, we also seek to estimate and capture the cost of providing the
QoI. In order to determine whether or not the above attributes are satisfied, nodes will
need to exchange information. When more information is exchanged, the accuracy of the
QoI estimate can be higher, at the expense of increased overhead.

3. The action that a node takes based on its input will also affect the QoI achieved in the
network as a whole. In particular, when provided with different choices, a node may
choose a specific action. To provide a simple example, a node might choose video to
obtain high accuracy but with lower confidentiality (since encrypting video streams could
be expensive). This affects the other flows in the network differently as compared to a
case where the node chooses encrypted text instead. The traffic patterns are clearly
different and in addition the routes chosen could differ. This in turn affects the QoI
achieved by the other nodes. Our goal in the first year is to examine the impact of such
choices on the overall network QoI that can be achieved. We once again will consider
preliminary representations that only consider a subset of performance objectives and
will gradually move to more complex representations in later years.

4. QoI is a multi-objective optimization goal, as is evident from the above discussion. One
way to attack it is to define the function f (as above, perhaps) and maximize it directly.
Another way is to give a “budget” to all but one of the parameters and then maximize the
value of the non-budgeted parameter. Then, we can optimize f by binary search over the
values of the other parameters. We propose to explore how such methods approximate
the ultimate goal of optimizing the function f. Again, in the first year, we will confine
ourselves to the function defined above.
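As a companion to effort 1 above (see the forward reference there), the following minimal Python
sketch contrasts the two candidate representations of the first-year QoI vector [freshness,
provenance, accuracy, timeliness]: a range-based check against fixed thresholds, and a stochastic
estimate of the probability that all thresholds are met jointly. The threshold values and sample
observations are illustrative assumptions only.

```python
from dataclasses import dataclass
from typing import Iterable

@dataclass
class QoISample:
    freshness: float   # e.g., inverse of information age, normalized to [0, 1]
    provenance: float  # degree to which provenance is established, in [0, 1]
    accuracy: float    # application-specific accuracy score, in [0, 1]
    timeliness: float  # fraction of the delay budget still unused, in [0, 1]

# Illustrative thresholds; real values are user and mission specific.
THRESHOLDS = QoISample(freshness=0.5, provenance=0.7, accuracy=0.8, timeliness=0.6)

def meets_range_requirement(sample: QoISample, thr: QoISample = THRESHOLDS) -> bool:
    """Range-based representation: every component must clear its threshold."""
    return (sample.freshness >= thr.freshness
            and sample.provenance >= thr.provenance
            and sample.accuracy >= thr.accuracy
            and sample.timeliness >= thr.timeliness)

def stochastic_qoi(samples: Iterable[QoISample], thr: QoISample = THRESHOLDS) -> float:
    """Stochastic representation: estimated probability that all requirements hold jointly."""
    samples = list(samples)
    if not samples:
        return 0.0
    return sum(meets_range_requirement(s, thr) for s in samples) / len(samples)

if __name__ == "__main__":
    observed = [QoISample(0.9, 0.8, 0.85, 0.7), QoISample(0.4, 0.9, 0.9, 0.8)]
    print("Range check on first sample:", meets_range_requirement(observed[0]))
    print("Estimated joint-satisfaction probability:", stochastic_qoi(observed))
```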

Sub-Task 2: QoI in network models: Towards evaluating architectural choices

Once we have determined a definition for QoI, the next step is to understand how, given the
information requested by a user, the network determines the QoI to be delivered to the user. We
classify data into two types. User data is data that is transformed into information based on
which operational decisions are made. Network data is data that gives insight into how the
network is performing (e.g., control information or performance statistics). The user data
received by a sink from each source must be transformed into information. The quality of this
information is partially a function of the delivery capabilities of the network and the security
services provided. The functions for determining the quality of this user information will be
developed jointly with the INARC. In addition, to compute the expected QoI, network and user
data must be collected and analyzed in a distributed way. This computation must be performed
on incomplete data and its collection will add overhead to the network.

The challenges in assimilating these types of data have not previously been considered from a
holistic perspective. In fact, the collection and assimilation of network and user data to determine
a QoI capability of a system has not been considered before. Although direct observations of a
node have a higher credibility with respect to the specific observation, they may not provide high
confidence with regards to the holistic network behavior. In particular, a locally observed
phenomenon may be triggered by an event that originates elsewhere, possibly far away from the
node under discussion. If a node is able to make informed decisions based on the locally
observed data it will do so; if not, it either determines a strategy to disseminate this data (using
lower level network protocols) to other nodes in its vicinity or seeks human input. In order to
correctly characterize the QoI, it is important to (a) account for correlations between different
observable network events, and (b) account for lags in time associated with the collected data.
The network data that is collected may be from a plurality of sources including sensors, lower
level protocol stack modules (such as reports of packet drops), and human input.

In the first year, we will study the impact of network structure and dynamics on QoI
assessments. Specifically, we are interested in the following questions:
1. For a given piece of information, how does network structure affect the QoI that can be
delivered to the user?
2. For the same information, how does QoI degrade, if at all, with network evolution and
dynamics?
3. How does the choice of specific mechanisms affect the QoI delivered to the user:
o The use of stochastic network performance models to estimate QoI: Such models
are cheap, but may be inaccurate under certain circumstances.
o The role of active and passive measurements of network performance in estimating
QoI. Specifically, how does the overhead of measurements impact the QoI, and is
there a point of diminishing returns where additional measurements do not
significantly improve QoI?

Sub-task 3: Optimizing QoI Metrics

Techniques for optimizing time averages of network attributes such as throughput and power
expenditure, and for maximizing concave functions of these time averages, are developed for
stochastic networks with ergodic events in our prior work [1][7] using Lyapunov Optimization
theory. This is an extension to the prior Lyapunov Stability theory of [2]. Alternative fluid
model analyses are developed in [5][6] and early primal-dual techniques are considered for
simple one-hop wireless problems with infinitely backlogged sources in [3][4].

While these techniques are powerful, they do not allow optimization of the more complex
Quality of Information (QoI) metrics described in our proposal. This is due to several reasons:
First, the metrics may have combinatorial and non-convex properties, and may therefore be
fundamentally intractable. Second, the proposed metrics should include models of distortion
which are not yet fully understood. Third, the metrics may be associated with output from multi-
stage network tasks that cannot yet be treated by stochastic network optimization theory.

We propose to extend the theory to address these issues. Preliminary work will focus on metrics
of distortion. These models will be enhanced as we learn more about distortion throughout the
course of this project. To handle the possible non-convex nature of the QoI functions, we will not
require our solutions to yield global optima. Rather, we seek stochastic algorithms that yield
some notion of “local optimum,” as defined in a stochastic sense. This work will be informed by
existing stochastic network optimization theory as well as existing static optimization theory that
enables convergence to local optima for non-convex problems. We shall also explore utility
metrics based on the output of multi-stage network operations. Multi-stage operations include
breaking computational tasks associated with network data into distributed sub-tasks that can be
performed by different nodes (such as data processing or compression), and network coding or
data fusing in networks with more sophisticated information theoretic abilities. This study will
consider learning algorithms, approximate dynamic programming, and Markov decision theory,
incorporating these into the stochastic network models. First year goals in this area are:
1. Develop a general algorithm that finds local optima for (simple) classes of QoI
functions on stochastic networks, with some form of analytical convergence proof.
2. Initiate study on multi-stage problems.

The theory we develop in year 1 will be extended and plugged into more extensive models and
networks in future years, as we learn more about collecting and interpreting network data.
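As one concrete, deliberately simplified starting point for this sub-task, the sketch below applies
a drift-plus-penalty style per-slot rule, in the spirit of the Lyapunov optimization framework
cited above, to a per-slot QoI reward that is not required to be concave. A virtual queue enforces
a time-average resource budget, and each slot greedily trades reward against drift. The reward
shape, action set, and budget are assumptions made purely for illustration; they are not the QoI
metrics this task will ultimately develop.

```python
import math
import random

ACTIONS = [0.0, 0.5, 1.0, 1.5, 2.0]   # candidate transmit/processing efforts per slot
BUDGET = 0.8                           # allowed time-average resource use
V = 10.0                               # reward-versus-constraint trade-off weight

def qoi_reward(effort, channel):
    """Illustrative, non-concave per-slot QoI reward (assumed functional form)."""
    return math.tanh(3.0 * effort * channel) - 0.2 * math.sin(4.0 * effort)

def drift_plus_penalty(num_slots=10_000, seed=0):
    """Per-slot rule: pick the action maximizing V*reward - Z*resource_use."""
    rng = random.Random(seed)
    z = 0.0                      # virtual queue tracking the resource budget
    total_reward = 0.0
    total_use = 0.0
    for _ in range(num_slots):
        channel = rng.uniform(0.2, 1.0)          # observed network state this slot
        effort = max(ACTIONS, key=lambda a: V * qoi_reward(a, channel) - z * a)
        z = max(z + effort - BUDGET, 0.0)        # virtual queue update
        total_reward += qoi_reward(effort, channel)
        total_use += effort
    return total_reward / num_slots, total_use / num_slots

if __name__ == "__main__":
    avg_reward, avg_use = drift_plus_penalty()
    print(f"average QoI reward: {avg_reward:.3f}, average resource use: {avg_use:.3f}")
```

Because the per-slot maximization is over a finite action set, non-convexity of the reward does not
prevent running the rule; the open analytical questions are what kind of local optimality and
convergence guarantees survive, which is precisely the first-year goal stated above.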

Validation Approach
In the first year we will have two thrusts for validation. For the definition of QoI we will work
closely with INARC to ensure that the measures of QoI are consistent with the ability to extract
knowledge and make decisions given different network structures. INARC researchers are part
of this team. For optimizing QoI metrics, we will prove, if possible, that optimal values exist
and determine what they are. If optimal values do not exist we will evaluate the local maxima
and attempt to set bounds.

Summary of Military Relevance


The goal of networks in this environment is to enable rapid decision making to increase mission
tempo. A large part of the ability to do this is based on the quality of information received.

Research Products
In the first year we will have:
- A multidimensional model of QoI
- An initial result on the impact of network structure on QoI
- A general algorithm for finding the local optima for a class of stochastic QoI
functions

References
[1] S. Marti, T. Giuli, K. Lai, and M. Baker, “Mitigating Routing Misbehavior in Mobile Ad
Hoc Networks,” Aug. 2000.

[2] M. Grossglauser and D. Tse, “Mobility increases the capacity of ad-hoc wireless
networks,” in Proceedings of IEEE INFOCOM, 2001.

[3] K. Liu and Q. Zhao, “Decentralized Multi-Armed Bandit with Multiple Distributed
Players,” submitted to IEEE 2010 ICASSP; available at http://arxiv.org/abs/0910.2065v1

[4] T. Lai and H. Robbins, “Asymptotically efficient adaptive allocation rules,” Advances in
Applied Mathematics, vol. 6, no. 1, pp. 4-22, 1985.

[5] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource Allocation and Cross-Layer
Control in Wireless Networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp.
1-149, 2006.

[6] L. Tassiulas and A. Ephremides, “Stability Properties of Constrained Queueing Systems
and Scheduling Policies for Maximum Throughput in Multihop Radio Networks,” IEEE
Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1949, Dec. 1992.

[7] R. Agrawal and V. Subramanian, “Optimality of Certain Channel Aware Scheduling
Policies,” Proc. 40th Annual Allerton Conf. on Communication, Control, and Computing,
Monticello, IL, Oct. 2002.

[8] Department of Defense, “Network Centric Warfare Report to Congress,” July 2001.

[9] J. Garstka and D. Alberts, “Network Centric Operations Conceptual Framework
Version 2.0,” Vienna, VA: Evidence Based Research, Inc., 2003.

[10] Department of Defense, “Joint Publication 3-13: Information Operations,” 13 February 2006.

[11] Department of Defense, “Joint Publication 6-0: Joint Communications System,” 20 March 2006.

[12] Headquarters, Department of the Army, “Field Manual 6-0: Mission Command:
Command and Control of Army Forces,” August 2003.

[13] D. Alberts and R. Hayes, “Understanding Command and Control,” CCRP Publication
Series, 2006.

[14] P. Driscoll, M. Tortorella, and E. Pohl, “Information Product Quality in Network Centric
Operations,” Operations Research Center of Excellence Technical Report DSE-TR-0516,
May 2005.

[15] M. Tortorella and P. Driscoll, “Reliability of Information-Fueled Services in Network-
Centric Operations,” Military Academy Dept. of Systems Engineering, West Point, NY,
Technical Report A312464, June 2005.

[16] C. Bisdikian, D. Verma, L. Kaplan, M. Srivastava, and D. Thornley, “Defining Quality of
Information and Metadata for Sensor-originated Information,” 4th USMA Network
Science Workshop, West Point, NY, October 28-30, 2009.

[17] C. Bisdikian, L. Kaplan, M. Srivastava, D. Thornley, D. Verma, and R. Young, “Building
Principles for a Quality of Information Specification for Sensor Information,” 12th
International Conference on Information Fusion, Seattle, Washington, July 6-9, 2009.

[18] Y. Hung, A. Dennis, and L. Robert, “Trust in Virtual Teams: Towards an Integrative Model
of Trust Formation,” 37th International Conference on System Sciences, 2004.

[19] B. Hilligoss and S. Rieh, “Developing a Unifying Framework of Credibility Assessment:
Construct, Heuristics, and Interaction in Context,” Information Processing and
Management, vol. 44, pp. 1467-1484, 2008.

[20] B. Stvilia, L. Gasser, M. B. Twidale, and L. C. Smith, “A Framework for Information Quality
Assessment,” JASIST, 58(12):1720-1733, 2007.

[21] P. Missier, S. Embury, M. Greenwood, A. Preece, and B. Jin, “Quality Views: Capturing
and Exploiting the User Perspective on Data Quality,” Proc. of ACM VLDB, 2006.

[22] M. Bouzeghoub and V. Peralta, “A Framework for Analysis of Data Freshness,” Proc.
of ACM IQIS, 2004.

9.5.8 Task C1.3: Characterizing Connectivity and Information Capacity for Dynamic Networks
(Q. Zhao, UC Davis (CNARC); N. Young, UC Riverside (CNARC); A. Yener, Penn State
(CNARC); P. Brass, CUNY (CNARC); A. Swami, ARL)

Task Overview
In an integrated tactical communication network, heterogeneous components with different
priorities and time-varying QoI requirements coexist and inter-operate. Such a dynamic
composition of the network leads to highly dynamic interference patterns across network
components. As a consequence, temporal and spatial dynamics of interference play an important
but not well understood role in link stability and network connectivity. Fading dynamics and
node mobility, as well as security requirements, further impact the very existence and quality of
links, and consequently, the connectivity and capacity of the integrated heterogeneous network.

Task Motivation
Understanding the impact of different types of dynamics in a network on OICC is important to
gaining a full understanding of how OICC can be controlled and which protocol mechanisms
may be most suitable for achieving optimal OICC. Prior efforts on modeling the impact of
dynamics on networks only consider traditional notions of capacity.

Key Research Questions


The overall goal of this project is to answer the following questions:

- How do the network topology and connectivity change due to fading (which could
vary depending on the spatial locations of nodes and in time), node mobility, and
interference?
- How do the spatial and temporal dynamics of interference, fading, and mobility affect
the achievable operational information capacity of the network?

Initial Hypotheses
We expect to find that dynamics of different types have a much larger impact on OICC than on
traditional measures of capacity. Our reasoning is that dynamics will impact both the ability of
the network to deliver data and the QoI, both of which determine OICC. Our primary objective
under this task is to characterize the impact of the dynamics of fading, node mobility, and
interference on network connectivity and capacity.

Technical Approach
Towards answering the research questions, there are multiple issues that need to be jointly
considered. The coherence time of the network will determine the rate at which the links change
and this is likely to have an impact on the size of the packets to be used and the transmission
rate. A small coherence time might aid the use of high transmission rates but require the
transmission of small packets. Depending on the signal quality variations, the transmission rate
will have to be carefully chosen.

We also wish to point out that if the wireless channel changes frequently, tracking would require
frequent exchange of control overhead, which would impact capacity. If one were to avoid
tracking, it may result in the use of sub-optimal system parameters which could in turn affect
capacity.

Implicit in the above discussion is the fact that network dynamics affect the interference patterns
within the network as well. The rate of transmissions would affect the interference projections in
the network. Depending on the dynamics of the network, determining the set of links that are to
be simultaneously activated is not an easy problem to solve.

In this task, we propose to undertake the following sub-tasks:

Characterizing the evolution of the network topology due to dynamics


Here we seek to capture the impact of network dynamics on topological changes, i.e.,
intermittence. In particular, we seek to find models which capture how frequently link qualities
change, and how often and to what extent the rates achievable on the links vary. Our goal is to
come up with a set of simple models in the first year that capture changes due to realistic
dynamics. In subsequent years, we will confirm the realism of these models, primarily via
experimentation, and refine the models as necessary. This work will maintain a close
relationship to E2 (task 2 in EDIN).
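One family of simple first-year models that could serve here is a per-link two-state Markov chain
(Gilbert-Elliott style), in which a link alternates between a good and a bad quality state and the
achievable rate depends on the state. The sketch below, with assumed transition probabilities and
per-state rates, generates such a link-quality trace; the parameter values are illustrative
assumptions to be replaced by experimentally validated ones in later years.

```python
import random

# Assumed per-slot transition probabilities and per-state achievable rates (illustrative).
P_GOOD_TO_BAD = 0.05
P_BAD_TO_GOOD = 0.20
RATE = {"good": 2.0, "bad": 0.25}   # e.g., Mb/s achievable in each state

def link_quality_trace(num_slots, seed=0):
    """Two-state Markov (Gilbert-Elliott style) trace of achievable link rates."""
    rng = random.Random(seed)
    state = "good"
    trace = []
    for _ in range(num_slots):
        trace.append(RATE[state])
        if state == "good" and rng.random() < P_GOOD_TO_BAD:
            state = "bad"
        elif state == "bad" and rng.random() < P_BAD_TO_GOOD:
            state = "good"
    return trace

if __name__ == "__main__":
    trace = link_quality_trace(20)
    print("sample rate trace:", trace)
    print("mean rate:", sum(trace) / len(trace))
```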

Characterizing the impact of interference dynamics on network connectivity and delay


We propose to analytically characterize the impact of interference dynamics on the connectivity
and multihop delay profile of an integrated heterogeneous network. Our emphasis will be on the
fundamental properties and the basic structure and behavior of the network topology,
connectivity, and delay. In our preliminary work [1][2] developed in collaboration with Dr.
Ananthram Swami at ARL, we have characterized the impact of interference dynamics and
interference constraints of a high priority network component on the topology and connectivity
of a low priority network component. In the preliminary work, we mainly consider the spatial
dynamics of interference and focus on the instantaneous connectivity of the network. In the first
year of this project, we aim to broaden the scope of the preliminary work to characterize the
intermittent connectivity of the network and the associated delay profile. In particular, we aim to
analyze the impact of link availability (determined by interference dynamics) on the scaling
behavior of multihop delay with respect to the source-destination distance. In addition to the
spatial dynamics of interference, the impact of the temporal dynamics of interference on the
intermittent connectivity and delay profile of the network will be examined.

Characterizing connectivity when link failures arise from interference and compromised
security properties
Recent work has used percolation theory to study information spread, mobility and resilience in
networks. Essentially, the uncertainty in links or mobility is modeled as (random) link failures,
resulting in a bond percolation model [3][4]. The mobility models considered are constrained i.i.d.
mobility (nodes move to a new location chosen i.i.d. within a circle of radius a) and discrete-time
Brownian motion. While these models are more refined than the unconstrained i.i.d. model in
[5], they still do not capture the mobility of a realistic tactical MANET.

Here, we propose to consider realistic models for link failures. In contrast with the prior art cited
above, we posit that the link failures will not be i.i.d., since it is likely that links connected to the
same node fail at the same time (or links in the same area, perhaps due to interference). Failure
due to a common interference source can be modeled by considering correlated failure of links.
Failure of all links connected to a given node can be modeled as a node (site) failure (instead of
bond failure), leading to a mixed bond and site failure model.
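To illustrate the proposed mixed bond-and-site view, the sketch below (using the networkx library
as one convenient choice) builds a random geometric graph, removes a fraction of nodes (site
failures, e.g., a jammed or compromised node losing all of its links) and a fraction of the
remaining links (bond failures), and reports the relative size of the largest connected component.
The parameter values are illustrative assumptions, and the independent failure draws are only a
stand-in for the correlated failure models this sub-task will actually develop.

```python
import random
import networkx as nx

def largest_component_fraction(n=200, radius=0.12,
                               p_site_fail=0.1, p_bond_fail=0.2, seed=1):
    """Fraction of nodes in the largest component after site and bond failures."""
    rng = random.Random(seed)
    g = nx.random_geometric_graph(n, radius, seed=seed)

    # Site failures: a failed node loses all of its links at once.
    failed_nodes = [v for v in g.nodes if rng.random() < p_site_fail]
    g.remove_nodes_from(failed_nodes)

    # Bond failures: individual links fail (e.g., interference, key compromise).
    failed_edges = [e for e in g.edges if rng.random() < p_bond_fail]
    g.remove_edges_from(failed_edges)

    if g.number_of_nodes() == 0:
        return 0.0
    largest = max(nx.connected_components(g), key=len)
    return len(largest) / n

if __name__ == "__main__":
    print("largest-component fraction:", largest_component_fraction())
```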

This framework will help us model link failures due to security compromises as well. For
example, if shared keys in two nodes with an active link are compromised, confidentiality on
that link is lost; if this is a required security property, then the link must not be used, i.e., the link
is effectively disconnected. The model of link failure we develop is general enough to capture
this type of security compromise and others, and can be extended in the future to capture the
dynamics of cascading failures. Thus this model will be useful in determining the impact of loss
of security properties, studied in Task 1.4, and can be re-used by that task.

Capturing the impact of dynamics on transmission rate and the overall capacity
In our first year, we will also study the impact of dynamics of link quality on the transmission
rate and as a consequence the overall network capacity. In our preliminary work, we have
considered the impact of rate on a hybrid network wherein infrastructure stations are augmented
by multi-hop relays [6][7]. Extending this work to a fully ad hoc setting is a challenge that we
seek to address. Furthermore, the model will be refined to reflect the uncertainty in the
achievable capacity due to network dynamics.

The work in [6][7] also considered throughput to reflect the capacity metric. However, in our
work, we seek to find the capacity in terms of the QoI. We will consider dynamic changes in
flow patterns, both temporally and spatially, and capture these when computing the capacity. As
an example, there may be flow disruptions due to intermittence. Variations in the interference
patterns could also reduce the achievable flow rates.

We will begin with low-complexity models (as an example, a fixed rate at which all the
links of the network vary, and saturation traffic) in our first year. Progressively, we will increase the
realism and complexity to refine our models. We expect to characterize homogeneous flows
in the first year and extend the analysis to cases with heterogeneous traffic and nodes in the second year.

Collaborating with SCNARC and INARC: The mobility of nodes in the network will depend on
social interactions among the soldiers. Furthermore, the types of traffic generated will also have
an impact on the information flows and the resistance/vulnerability to network dynamics. We
expect to interact with the INARC to obtain good representations of the traffic. The dynamics of
the network topology itself will also be varied based on models that we expect to obtain from
INARC and SCNARC.

Validation Approach
We will use the models developed here to produce numerical results capturing the impact of
different types of dynamics (interference, mobility) on OICC. We will compare these results
with traditional capacity measures to determine the relative importance of different types of
dynamics and the order of their impact.

Summary of Military Relevance


Dynamics are a key characteristic of military networks. Understanding the impact of different
types of dynamics in a network on OICC is important to gaining a full understanding of how
OICC can be controlled and which protocol mechanisms may be most suitable for achieving
optimal OICC.

Research Products
In the first year we will
1. Produce a characterization of the impact of interference dynamics on the instantaneous
and intermittent connectivity of an integrated heterogeneous network.
2. Produce a characterization of the impact of interference dynamics on the delay profile of
integrated heterogeneous networks.

References
[1] W. Ren, Q. Zhao, and A. Swami, "Connectivity of Heterogeneous Wireless Networks,"
submitted to IEEE Transactions on Information Theory, available at
http://arxiv.org/abs/0903.1684
[2] W. Ren, Q. Zhao, and A. Swami, "Power Control in Cognitive Radio Networks: How to
Cross a Multi-Lane Highway," IEEE Journal on Selected Areas in Communications
(JSAC): Special Issue on Stochastic Geometry and Random Graphs for Wireless
Networks, vol. 27, No. 7, pp. 1283-1296, September, 2009.
[3] Z. Kong and E. M. Yeh, “Connectivity, Percolation, and Information Dissemination in
Large-Scale Wireless Networks with Dynamic Links,” submitted to IEEE Transactions
on Information Theory, 2009.

[4] Z. Kong and E. M. Yeh, “On the latency for information dissemination in mobile wireless
networks,” Proc. of the 9th ACM International Symposium on Mobile Ad Hoc Networking
and Computing (ACM MobiHoc '08), Hong Kong, China, May 26-30, 2008, pp. 139-148.
[5] M. Grossglauser and D. N. Tse, “Mobility increases the capacity of ad hoc wireless
networks,” IEEE/ACM Trans. Netw., vol. 10, no. 4, pp. 477-486, Aug. 2002.
[6] L. K. Law, S. Krishnamurthy and M. Faloutsos, “Capacity of Hybrid Cellular-Ad hoc
Data Networks,” IEEE Infocom 2008, Phoenix, AZ.
[7] L. K. Law, K. Pelechrinis, S. Krishnamurthy and M. Faloutsos, “Downlink Capacity of
Hybrid Cellular Ad hoc Networks,” IEEE/ACM Transactions on Networking, April 2010
(to appear).

9.5.9 Task C1.4: Modeling the Impact of Data Provenance and Confidentiality
Properties on QoI (K. Levitt and P. Mohapatra (UC Davis); A. Smith, S. Zhu,
and A. Yener (Penn State); S. Krishnamurthy (UC Riverside))

Task Overview
We expect that QoI is impacted by the security characteristics of the information source(s), the
network that transports it, and the hosts that process it. Security has a significant impact on the
amount and quality of information conveyed over a tactical network. Our aim is not only to
characterize the impact of security on the quality of information but also to mitigate that impact.
Our approach is to model the impact of provenance and confidentiality on QoI.

Intuitively, one can argue that securing information results in an increase in overhead either in
terms of additional information to be shared between the network entities, higher delays incurred
in ensuring security, or administrative overhead to address attacks or false alerts. One can
further argue that this in turn reduces what can be carried over the network. Certain
choices made in order to enable secure transmissions, e.g., establishing/distributing keys, consume
resources that would otherwise have been used for data transmission.

We argue that the relationship between the fundamental performance metric in a tactical
network, the operational information content capacity, and information security is far more
complex than simply quantifying security overhead. This is precisely due to the fact that the
metric is based on the QoI. In many applications, various security properties may be required to
ensure a high (or acceptable) QoI; in other words, without these security properties in place, one
cannot achieve high operational capacity.

The network that we consider is inherently heterogeneous in terms of node processing
capabilities, information content and sources, and application requirements.

Task Motivation
Information capacity and security are fundamentally inter-related. The trade-offs made between
security and data delivery in terms of QoI are not well understood, particularly when unknown
attacks or attack methods are considered.

Key Research Questions


In this task we seek to answer the meta-level question: What is the impact of the presence, lack,
or partial presence of security properties on QoI?

In particular, we seek to answer the following fundamental questions:


- How does the operational information content capacity change when a set of system
expectations are impacted due to attacks that impact security?
- How can such attacks impact data delivery and, in turn, QoI?
- How does the presence of an attack impact QoI with and without the realization of
selected security mitigations/solutions?
- How do inherent network properties such as heterogeneity and mobility affect the
security properties and, hence, the QoI?
Initial Hypotheses
We expect that QoI is impacted by security characteristics in ways that will require large degrees
of flexibility in providing information to users. More specifically, the interaction between
security properties and the delivery of information will require the use of different modalities of
information and combinations of sources to meet QoI targets.

Technical Approach
We recognize that there exists an abundance of research efforts on information security.
However, the fundamental issue of quantifying the capacity in terms of secure information
transfer in a heterogeneous wireless network, with high mobility and dynamics, is not well
understood. By bridging fundamentals of information delivery and information security, we seek
to overcome this long-standing barrier and contribute significant progress towards the
long-sought-after metrics for security.

We take the following approach to answering the fundamental questions above. First, we define
a set of basic security properties that are relevant to tactical networks. Working with the INARC
and CNARC, we will investigate what effects the presence of these properties has,
individually and jointly, on the actual QoI needed by applications and users. We will then
model classes of high-level paradigms towards providing these security properties and determine
the achievable QoI with and without the presence of an attacker. By necessity, we will represent
attackers only through abstract models (not actual attack scripts) in order to gain a full understanding
of the impact attacks may have on QoI.

During the first year, our focus will be on the modeling aspects of two of the most important
security properties: provenance and confidentiality. These two properties are most relevant to
tactical applications. The models will be developed to analyze and evaluate the impact of these
properties on QoI. In the subsequent years, we will also address the modeling of various security
mechanisms and their impact on QoI. Note that the two properties are not completely
independent. The use of encryption could provide confidentiality but also provide some level of
provenance on the action of the node (the fact that it used its key to encrypt information).
Furthermore, confidentiality itself (such as the use of private keys) supports the provision of
provenance; no other node can tamper with the encryption without being detected and thus, the
encrypting node can be held accountable for sending the message. Finally, an attacker learning a
key could impact data delivery.

Thus, the following subtasks are proposed for this part of the study:

A. Modeling provenance and its impact on QoI: In this subtask during the first year we focus
primarily on modeling provenance and its impact on QoI. This will be a cross-cutting
model with input from INARC and CNARC. We will model several notions of provenance
and consider aspects such as authentication and non-repudiation. This work will have direct
implications on the Trust CCRI.

B. Modeling confidentiality and its impact on QoI: In this task we will focus on developing
models that capture the fundamentals of confidentiality properties at varying degrees,
and their relationship with QoI.

C. Modeling the Mechanisms of Security Protocols to enhance QoI: In this task (which is
deferred until the 2nd year and beyond), we will focus on mechanisms to provide
provenance and data confidentiality. For provenance we focus primarily on cryptographic
techniques used for authentication but also on methods to overcome compromised nodes on
multipath routes. For confidentiality we consider different encryption mechanisms and
their impact on node processing and data delivery, for example the impact of key
compromise; we seek to characterize the impact of key distribution and certifiability.

Modeling provenance and its impact on QoI:

Data provenance refers to one's certainty about the origin of and operations on data from its
source through its transfer to a destination. Provenance subsumes the properties of authenticity
and non-repudiation. Authenticity allows peer nodes to have proof of who they are
communicating with, or to validate the source of data. Non-repudiation provides proof of
identity or origin to a third party such that a communicating party cannot later deny transmitting
a message or performing an operation.

We propose to undertake the following efforts in this subtask:

Developing a cross-cutting model of provenance: Working with the SCNARC and the
INARC, we will develop models that relate provenance to the achievable QoI. These models
will be abstract in the following sense: we do not consider the mechanisms for providing
provenance, but only the degree to which the property is satisfied. These models will allow us
(i) to capture the relative importance of each security property with respect to QoI, and (ii) to
determine the sensitivity of QoI to the degree that the properties are satisfied. The key challenge
that must be addressed here is the determination of what kinds of information or operations have
a high requirement of provenance versus others. We will also link with TRUST project T1
which considers the amount of meta-data that must be maintained to track provenance.

We define four regimes for investigating the impact of each property on QoI: (i) We assume that
all nodes in the network satisfy the desired property(ies) perfectly. (ii) We relax this assumption
so that only a certain percentage of nodes satisfy the desired property(ies). (iii) We relax our
assumptions further to allow a partial or probabilistic satisfaction of the desired property(ies) by
each node. (iv) We model the case in which the satisfaction of properties in nodes changes over
time. This schema is meant to be a flexible starting point for our investigation. The exact
modality of the modeling will be adapted based on intermediate research results.

Collectively, these regimes will enable us to determine quantitatively how security – including
the overhead it incurs – impacts QoI. The first regime allows us to determine the impact of
perfect security on QoI. The second and third regimes allow us to determine QoI sensitivity to
heterogeneous nodes (not all nodes achieve the same level of security), a portion of the network
being compromised (not all nodes satisfy a property), and the strength of the security
mechanisms in an abstract way (not all nodes satisfy the property perfectly). The fourth regime
allows us to consider network dynamics.
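As a toy illustration of how the four regimes could be exercised, the sketch below computes, for a
path with a given number of relays, the probability that every node on the path satisfies a
provenance property: trivially in regime (i), via an assumed fraction of compliant nodes in regime
(ii), and via an assumed per-node satisfaction probability in regime (iii); regime (iv) would wrap
such a computation in a per-slot loop. All numbers are illustrative assumptions, and the mapping
from per-node satisfaction to QoI will be developed with the INARC and SCNARC.

```python
def path_provenance_probability(hops, regime,
                                compliant_fraction=0.8, per_node_prob=0.9):
    """Probability that all nodes on a path of `hops` relays satisfy provenance.

    Regimes (mirroring the text): (i) 'perfect' - every node satisfies the
    property; (ii) 'fraction' - each node is independently compliant with
    probability `compliant_fraction`; (iii) 'probabilistic' - each node
    satisfies the property with probability `per_node_prob`.
    """
    if regime == "perfect":
        return 1.0
    if regime == "fraction":
        return compliant_fraction ** hops
    if regime == "probabilistic":
        return per_node_prob ** hops
    raise ValueError(f"unknown regime: {regime}")

if __name__ == "__main__":
    for regime in ("perfect", "fraction", "probabilistic"):
        print(regime, path_provenance_probability(hops=5, regime=regime))
```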

Important areas that we will address with the INARC include parameterizing the impact of a
security property with respect to information, i.e., which properties impact different types of
information the most. We will work with the SCNARC on defining how the security properties
applied to different pieces of information impact overall trust, and how they impact decision
processes. This will shed light on the importance of different types of information and allow us
to adjust security properties on a per-piece of information basis, thus incurring the overhead due
to security only as required. We will also work with the SCNARC and INARC to determine
how the presence of differing security properties on multiple pieces of information, perhaps each
with different importance, will impact QoI and decision making. It is clear that the work on
security modeling serves a dual purpose: it plays a major role in determining QoI, and it has a
high impact on the Trust CCRI.

Characterizing the impact of provenance on QoI: For provenance the four regimes apply
directly. The first step is to ensure that the properties of authenticity and non-repudiation apply to
the source node. If the source node cannot be authenticated, there is no provenance associated
with the generated data. Likewise, without a non-repudiation property, a third party cannot
verify the source of the information. The non-repudiation property should apply to the transit
nodes as well, i.e., the nodes that forward the data from the source to its destination. Because
provenance is related to the “chain of evidence” when passing data, if a transit node operates on
the data, exactly what operation was performed must be certifiable. The interaction between
provenance and the achievable performance in the network is complex and not well understood
so far. Furthermore, providing security and trustworthiness for provenance records is
challenging.

While providing provenance one should ensure that:

– An adversary (or a group of adversaries) cannot modify the information chain
(source to destination through trusted members) by adding fake entries or
removing entries from valid users.
– People who forward the information cannot repudiate their activities on the
information.
– Decision makers can verify the authenticity of an information chain without being
required to learn further details, such as the content of the information.
– Integrity: Decision makers should be able to detect the following:
• Forgery of individual provenance records.
• Alteration of the sequence of records in the chain.
• How nodes were compromised, so as to determine if other nodes are
threatened.

Most of the existing research on provenance relies on recording the entire history of
information, annotations, information-flow recording, etc. However, in a heterogeneous dynamic
network, the provision of provenance and its impact on information quality and content is not
well understood. Moreover, history-based approaches could impose significant resource costs
in terms of bandwidth and storage. The risk of record explosion and the complexities of
history-based approaches further limit their applicability.

Is it possible to identify a compact form to deliver provenance information with a policy that the
full information is stored for possible audit or for query by a decision maker?

Provenance can typically be provided in the following generic ways. Strong provenance
is provided by requiring the operator node to include some form of information that uniquely
attests that it performed the operation. Digital signatures or the use of private keys to encrypt the
information are mechanisms that can support strong provenance. Strong provenance provides a
means for ensuring non-repudiation, i.e., a node that includes this unique information is
responsible for the operations on the message and its contents up to the point at which the node
signs the message in some form.

Finally, a weak sense of provenance can be obtained by witnesses that observe the action. The
degree of provenance associated with an operation observed by a witness is directly dependent
on the trustworthiness of the witness. Thus, this is directly tied to the Trust CCRI. All of these
high level strategies have direct implications on the QoI achievable in the network, which
includes how QoI is impacted by the overhead of security mechanisms. As detailed below, we
will characterize these approaches mathematically in this task.
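Purely to make the notion of a chained operator-provenance record concrete, the sketch below uses
keyed hashes (HMAC-SHA256 from the Python standard library) as a stand-in for digital signatures:
each node that handles the data appends a record covering the data and the chain so far, and a
verifier holding the shared keys can check the chain end to end. This is an illustrative assumption
about mechanism, not the construction this task will propose; in particular, symmetric HMACs do
not by themselves provide the third-party non-repudiation that true digital signatures do.

```python
import hashlib
import hmac

def append_record(chain, node_id, key, data):
    """Node `node_id` attests to the data and to everything already in the chain."""
    previous = chain[-1][1] if chain else b""
    tag = hmac.new(key, data + previous + node_id.encode(), hashlib.sha256).digest()
    return chain + [(node_id, tag)]

def verify_chain(chain, keys, data):
    """A verifier holding every node's key recomputes and checks each record in order."""
    previous = b""
    for node_id, tag in chain:
        expected = hmac.new(keys[node_id], data + previous + node_id.encode(),
                            hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expected):
            return False
        previous = tag
    return True

if __name__ == "__main__":
    keys = {"source": b"k-source", "relay1": b"k-relay1"}
    data = b"sensor report"
    chain = append_record([], "source", keys["source"], data)
    chain = append_record(chain, "relay1", keys["relay1"], data)
    print("chain verifies:", verify_chain(chain, keys, data))
    print("tampered data :", verify_chain(chain, keys, b"forged report"))
```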

Operator provenance: A node can be required to include information that uniquely ties to the
operation that it performs. As described earlier, digital signatures or simply using private keys to
encrypt the message are mechanisms that can help satisfy this requirement. The questions that
arise are (a) How effective is a signature in uniquely tying the operation to the node performing
the operation? Stated otherwise, how likely is it that an adversarial node can override this
security property to destroy the provenance? and, (b) How does the provision of operator
provenance affect the data delivery performance and the QoI? To illustrate these issues,
consider that digital signatures serve as the mechanisms for supporting operator provenance. A
stronger signature would ensure higher provenance but will impact data delivery in terms of
throughput and delay. A weaker signature can be more easily reproduced by an adversary, but
will have a lower impact on data delivery performance.

Different applications will have different requirements in terms of the provenance and
performance. Further, in a heterogeneous network, different nodes will have different
capabilities in terms of providing operator provenance. From the perspective of mechanisms,
nodes have different processing capabilities and, thus, some will not be able to sign or verify signatures.
Our approach is to consider a heterogeneous network consisting of varying degrees of operator
provenance. We will model the operator provenance to have a specific impact on the data
delivery. Together these will have an impact on the QoI metric which will also be captured. In
fact, our approach is expected to lead to a region of QoIs which will allow us to capture the
trade-offs between information quality (performance) and security (provenance).
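The following toy model, with entirely assumed functional forms, illustrates the kind of QoI region
discussed above: as signature strength increases, a notional provenance score rises with
diminishing returns while per-packet overhead degrades delivery, and a simple composite QoI score
exhibits an interior optimum rather than favoring either extreme.

```python
def qoi_tradeoff(strength):
    """Toy trade-off: provenance improves with signature strength, delivery degrades.

    `strength` is in [0, 1]; every functional form here is an illustrative assumption.
    """
    provenance = strength ** 0.5                 # diminishing returns in strength
    delivery = max(0.0, 1.0 - 0.6 * strength)    # overhead reduces delivered data
    qoi = provenance * delivery                  # one simple composite score
    return provenance, delivery, qoi

if __name__ == "__main__":
    for s in (0.0, 0.25, 0.5, 0.75, 1.0):
        p, d, q = qoi_tradeoff(s)
        print(f"strength={s:.2f}  provenance={p:.2f}  delivery={d:.2f}  QoI={q:.2f}")
```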

Authentication: The use of operator provenance is unlikely to be of use in cases where a node
may be compromised, its signing keys have been stolen, or it is of unknown identity. Here, a
higher level known entity (such as a decision maker – or a digital counterpart) will have to
authenticate a node. The impact of authentication is very much dependent on the location of the
higher level authenticating entity. If this node is further away (a decision maker overseeing a
large group of soldiers) or the channel qualities are poor, obtaining the authentication may be
time-consuming and in some extreme cases, may be impossible. There is then, an inherent trade-
off between information quality, provenance and performance. If obtaining the authentication
takes time, the data may become somewhat stale. The question then would be whether there is a
sufficient level of QoI for the application. If this is not the case, the question would be whether a
weaker provenance is sufficient – but accepting the potential QoI degradation? We will carefully
model these artifacts considering different topologies and application requirements.
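As a toy numerical illustration of the staleness trade-off just described, the sketch below assumes information value decays exponentially while a node waits for a remote authority to authenticate a source; the half-life and the per-hop round-trip delays are placeholders, not measured values.

    import math

    # Toy staleness model: residual freshness after waiting for authentication.
    # Half-life and per-hop delays are illustrative assumptions.
    def freshness(delay_s, half_life_s=30.0):
        return math.exp(-math.log(2.0) * delay_s / half_life_s)

    for hops_to_authority, rtt_per_hop_s in [(1, 0.5), (4, 2.0), (8, 5.0)]:
        auth_delay = hops_to_authority * rtt_per_hop_s
        print(f"authentication delay {auth_delay:5.1f} s -> "
              f"residual freshness {freshness(auth_delay):.2f}")

If the residual freshness falls below what the application needs, the model would then ask whether a weaker (cheaper) form of provenance leaves a higher overall QoI.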

Witnesses: A node could simply depend on other observing nodes (witnesses) to provide provenance for its actions. This would depend on the number of observers and the trust that the node seeking provenance has in these observers. Clearly, this would depend on the density of the network, mobility, and the likelihood of colluding adversaries, which could be mitigated by artificial diversity among the nodes. These factors will result in an associated uncertainty with regard to the provenance that can be provided, but they reduce the overhead incurred and thus could result in better data delivery capabilities. We will consider different parametric choices for the aforementioned factors and will estimate the achievable QoI and the operational capacity.

In addition, the provenance provided by a witness directly depends on the trustworthiness of the
witness. Thus, this part of our work will tie in closely with the trust CCRI.
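A minimal sketch of one way witness trust could be aggregated into a provenance confidence, assuming independent witnesses each summarized by a scalar trust value; the independence assumption, the collusion discount, and all numbers are ours for illustration.

    # Illustrative aggregation: probability that at least one witness honestly
    # attests the observed operation, discounted by an assumed collusion rate.
    def witness_provenance(trust_values, collusion_prob=0.0):
        p_all_fail = 1.0
        for t in trust_values:
            p_all_fail *= (1.0 - t)        # witness fails to attest w.p. 1 - t
        return (1.0 - p_all_fail) * (1.0 - collusion_prob)

    # Denser neighborhoods (more witnesses) raise confidence, but every extra
    # report adds overhead that the QoI model must charge against performance.
    print(witness_provenance([0.6, 0.7]))            # two moderately trusted observers
    print(witness_provenance([0.6, 0.7, 0.5], 0.1))  # three observers, 10% collusion risk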

A combination of possibilities: Finally, we envision considering a network in which all of the above possibilities are available to different extents, and computing the achievable QoI in terms of the information quality (performance) and security (provenance) that can be provided.

Modeling secured provenance is extremely complex in the case of multihop transmission of information among heterogeneous networks. It becomes even more complex if we assume the network is highly mobile, which is the normal mode of operation in tactical environments.

We will attempt to model this complicated relationship between provenance, information integrity, confidentiality, and trust, and provide a secured provenance model that ensures a high QoI.

1. We will first identify in detail the challenges in trustworthy provenance and define a preliminary adversarial model.
2. We will analyze the potential authentication and confidentiality issues related to securing provenance information from the adversary.
3. We will start with a single-hop system, in which a set of users transmit their observations to a central commanding unit. In this case we will attempt to model secured data provenance using a query processing or a storage history method. We will then evaluate the relationships between trust and the provenance of the data, and derive a fundamental mathematical relationship to capture the amount of source privacy, the quality of the information, and the level of desired provenance.
4. The single-hop model will then be expanded to multihop transmissions in heterogeneous networks.
5. In subsequent years, with inputs from EDIN, we will expand the model of secured data provenance in multihop networks by integrating node dynamics.

Modeling confidentiality and its impact on QoI:

Data confidentiality is the ability to prevent data from being learned by unauthorized nodes. In a tactical network, it is of paramount importance to ensure that information is available to those who are authorized to receive/decode it, and is protected from those who are not. The latter may be external (malicious) entities, compromised nodes, or simply nodes that have a lower security clearance as compared to the authorized nodes.

Impact of data confidentiality on QoI: Confidentiality also affects the achievable QoI. As mentioned, confidentiality is required to protect transmitted information from eavesdropping on the network and from an attacker who has compromised a node. The keys themselves are subject to compromise and must themselves be kept confidential.

Providing confidentiality with the use of encryption will increase security but will require
processing and thus will affect the quality of information that is delivered and the performance.
As an example, ensuring the confidentiality of large volumes of video data could be difficult and
in some cases even infeasible. However, the same information could be delivered as text data
(but with lower fidelity) and could be provided with a much higher degree of associated
confidentiality. These trade-offs will be carefully assessed.

Varying strengths of confidentiality could lead to varying impacts on QoI. Typically, confidentiality is achieved using encryption. Encryption can be performed either with secret keys or with a public key system, which requires a Public Key Infrastructure (PKI). In our work, we will consider these two broad classes of approaches for ensuring confidentiality. Each generic class comes with its own limitations and characteristics. With secret keys, the processing overhead incurred for encryption and verification is lower. However, the secret keys have to be distributed, and key distribution is itself a challenge. Providing nodes with large subsets of the available keys would improve performance (shorter routes); however, it would be more susceptible to eavesdropping and thus would lead to lower degrees of confidentiality. Thus, there is an inherent trade-off between data delivery capabilities (performance) and security in this context. We will model the provision of confidentiality using the broad class of secret key encryption (without considering specific mechanisms themselves). For this class, we will derive stochastic representations of connectivity and of the associated secrecy that can be attached to a piece of information. The connectivity will dictate the routes that are used and thus, indirectly, the data delivery and the QoI.
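As one concrete instance of such a stochastic representation (chosen purely for illustration; the task itself is mechanism-agnostic), the sketch below uses an Eschenauer-Gligor-style random key predistribution model: larger key rings connect more node pairs directly, improving routes and data delivery, but expose more keys when nodes are captured. The pool size, ring sizes, and capture counts are placeholders.

    from math import comb

    POOL = 1000  # assumed size of the global key pool

    def p_share(ring):
        """Probability that two nodes share at least one predistributed key."""
        return 1.0 - comb(POOL - ring, ring) / comb(POOL, ring)

    def p_link_compromised(ring, captured=1):
        """Probability the key securing a given link is also held by at least
        one of `captured` captured nodes (keys drawn independently)."""
        return 1.0 - (1.0 - ring / POOL) ** captured

    for ring in (20, 50, 100):
        print(f"ring {ring:3d}: P(direct secure link) = {p_share(ring):.2f}, "
              f"P(link exposed | 5 captures) = {p_link_compromised(ring, 5):.2f}")

The first quantity drives the secure connectivity graph (and hence routes, delivery, and QoI); the second drives the confidentiality that survives node compromise, which is exactly the trade-off the stochastic model is meant to expose.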

The second class of approaches that satisfy the confidentiality property are public key based encryption techniques. These require the use of public and private keys and can provide stronger security (it is more difficult to break the encryption). However, public keys have to be certified by authorities (similar to authentication with provenance). Certification, like authentication, comes at a price: overhead. One can choose to bypass certification to improve performance; however, this comes at the risk of man-in-the-middle attacks by adversaries or compromised nodes and thus would weaken the security component of the delivered QoI. We propose to capture the overall properties of public key based confidentiality in a mathematical model and to compare and contrast the achievable QoI of public key based approaches with that provided by secret key based approaches. Again, the location dependence (where the certifiers are located), the time taken to break the encryption versus the time duration for which confidentiality needs to be preserved, and the uncertainty in topology and link qualities will be carefully captured in our models. Our overall goal will be to compare the QoI achieved with the two generic classes of approaches and to find an OICC region, represented by the maximum of the OICCs derivable with the two specific classes.

For data confidentiality, the four regimes apply directly to the ability of each node to prevent the leakage of information either when transmitting or when holding the data. For example, achieving this property imperfectly might correspond to leaking information with a certain probability, leaking some part of the secret information or, more generally, leaking some correlated secret whose mutual information with the secret information is bounded. Note that, because we are characterizing QoI, we do not assume that all "bits of information" are equal. Therefore, in these models we will account for the variable protection of different pieces of information, including the information associated with the encryption keys themselves.
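To make the notion of "bounded mutual information with the secret" operational, the following sketch computes I(S; L) for a one-bit secret observed through a binary symmetric leakage channel; the channel model and crossover probabilities are assumptions used only to illustrate how a leakage budget in bits could enter the QoI models.

    import math

    def h2(p):
        """Binary entropy in bits."""
        if p in (0.0, 1.0):
            return 0.0
        return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

    def leakage_bits(eps):
        """I(S; L) for a uniform secret bit leaked through a BSC(eps)."""
        return 1.0 - h2(eps)

    for eps in (0.5, 0.25, 0.1, 0.0):
        print(f"crossover {eps:4.2f}: leakage = {leakage_bits(eps):.3f} bits")

A crossover of 0.5 corresponds to no leakage at all, while a crossover of 0 corresponds to the secret bit being fully revealed; intermediate values give the bounded-leakage regime referred to above.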

In the first year of our effort, we envision achieving the following:


• We will jointly consider confidentiality and performance while computing the QoI. The goal here will be to define a QoI metric in terms of a target data delivery performance and a desired level of confidentiality. We envision working with INARC to characterize the information source streams to determine the impact of confidentiality on performance.
• We will consider the broad class of secret key based confidentiality and compute the OICC in this context.
• We will begin characterizing the OICC for the public/private key confidentiality class.

Modeling the Mechanisms of Security Protocols to enhance QoI

Although we are deferring this effort to the second year, here we provide a brief outline that relates the subsequent applications of the models developed in subtasks A and B.

Given the specific properties and functional characteristics of different security mechanisms,
they may affect QoI differently even though they are designed to address the same type of attack.
Our objective is to capture the nuances that are particular to security mechanisms and determine
the impact of specific functions on QoI. The understanding that such models will provide will
allow us to (a) further refine and tighten the fundamental operational capacity bounds that we
compute, and (b) identify and address inherent protocol limitations as discussed later.

Modeling the impact of authentication on provenance:


The major data provenance mechanisms are related to authentication and signatures. As
discussed above, provenance includes non-repudiation which involves a third party being able to
verify a transaction. For this reason, most mechanisms that provide non-repudiation either use
public keys, or use a third party to assist in the transaction, for example, assigning keys for a
session. These keys, either public or private, are used to generate digital signatures that cannot
be forged and can be checked by a third party.

Digital signatures are based on cryptographic functions, which, as discussed above, have varying overheads and strengths. In highly constrained devices, authentication may be possible using other methods (e.g., quorums or visual evidence). These methods will enable the provision of provenance to varying degrees. Our objective is to account for these uncertainties in quantifying the level of provenance that can be provided and, in turn, its impact on QoI. The trade-offs between using the traditional crypto-based class of methods and this more recent class of provenance methods will be quantified in a fundamental way. In particular, we will quantify the QoI possible with these two different classes of methods, in terms of both security and performance.

Numerous mechanisms have been developed to support authentication in wireless networks. Less work has been devoted to evaluating these schemes to the degree required to determine their impact on QoI. Our work will attempt to characterize properties of authentication mechanisms as they bear on QoI. In addition, we will consider new mechanisms that offer the possibility of improving authentication with less overhead than existing schemes.

Modeling data confidentiality mechanisms and their impact on QoI:


There are two main classes of mechanisms providing data confidentiality: encryption, based on computational intractability, and "information-theoretic" methods, based on an eavesdropper's inherent physical uncertainty about the channel being observed.

Encryption results in a processing/communication overhead that has to be accounted for. In order
to provide a certain degree of data confidentiality, a trade-off in terms of the achieved throughput
and latency must be made. These trade-offs are a function of the encryption algorithm, the length
of the keys used, and the key distribution method. As mentioned, our interest in data
confidentiality is in part driven by the loss of keys and its impact on provenance and, ultimately,
on QoI.

Among others, one model of compromise that we will investigate is the following. The substantial literature on perfectly secure message transmission (e.g., [2][3][4]) discusses secrecy in the presence of a subset of nodes leaking information completely; recent work on leakage-resilient cryptography [5] provides computational security guarantees in the presence of partially leaked secret keys. However, neither of these lines of research provides a comprehensive network model, nor are their models expressive enough to capture heterogeneous information. The perspective of QoI is expected to shed light on a unified analysis of security mechanisms and protocols, and to drive the design of novel protocols that address the needs of these unified systems.

Validation Approach
We will evaluate this work by comparing the impact of security properties on the achieved QoI value with the relative usefulness of that information in making decisions. We will work with task C1.2 and members of the INARC team to rationalize these values.

Summary of Military Relevance


Secure communication in the presence of adversaries is a requirement of virtually all military networks. Quantifying the benefit-cost tradeoffs that security properties impose on the ability to make decisions is an important step toward maximizing the benefits of military networks.

Research Products
In the first year we will:
- Produce a fundamental mathematical relationship to capture the amount of source privacy, the quality of the information, and the level of desired provenance.
- Define a QoI metric in terms of a target data delivery performance and a desired level of confidentiality.

References

[1] F. Baccelli, O. Dousse, M. Haenggi, J.G. Andrews, and M. Franceschetti, “Stochastic geometry and random graphs for the analysis and design of wireless networks,” IEEE Journal on Selected Areas in Communications, vol. 27, no. 7, pp. 1029-1046, Sept. 2009.

[2] D. Dolev, C. Dwork, O. Waarts, and M. Yung, “Perfectly secure message transmission,” J. ACM, vol. 40, no. 1, pp. 17–47, 1993.

[3] K. Srinathan, A. Narayanan, and C. P. Rangan, “Optimal perfectly secure message transmission,” in CRYPTO (M. K. Franklin, ed.), vol. 3152 of Lecture Notes in Computer Science, pp. 545–561, Springer, 2004.

[4] M. Fitzi, M. K. Franklin, J. A. Garay, and S. H. Vardhan, “Towards optimal and efficient perfectly secure message transmission,” in TCC (S. P. Vadhan, ed.), vol. 4392 of Lecture Notes in Computer Science, pp. 311–322, Springer, 2007.

[5] S. Dziembowski and K. Pietrzak, “Leakage-resilient cryptography,” in FOCS, pp. 293–302, IEEE Computer Society, 2008.

9.5.10 Linkages with Other Projects

This project has close ties to the other active project within CNARC as well as both CCRIs and
projects within the other centers.

Within the CNARC this feeds into project C.2. By understanding the theoretical bounds we can
determine how network paradigms can improve QoI.

We will rely heavily on tasks in INARC for determining the characteristics of information
linkages and flows (I1.1). The linkages will be used specifically in C1.2. We are working
jointly with INARC on QoI and the results of task C1.2 will feed into INARC I2.1.

This project both feeds EDIN and requires output from EDIN. It provides EDIN with the evolution of communication capabilities given the dynamics of interference and mobility, which will in turn impact the evolution of the communications, information, and social networks. This project will use the mobility models from EDIN as well as the models that relate the interactions of social, information, and communication networks in terms of changes in structure.

This project has a direct relationship with TRUST, specifically T1. Both QoI and security
properties impact trust. Several members of the QoI and security teams (C1.2 and C1.4) are part
of T1.

9.5.11 Collaborations and Staff Rotations

9.5.12 Relation to DoD and Industry Research


We will leverage the DARPA WNAN program in terms of validating our ideas. The WNAN
program will give us a realistic military network platform on which to evaluate our results.

The ITA program addresses Quality of Information and Value of Information specifically in
sensor networks. The ITA program addresses topics such as formal ontologies for representing
QoI, and algorithms specific to sensor networks, such as calibration, data filtering, and sensor
sampling algorithms. In the project presented here, we take a broad view of information to
include documents, stored data, and processed intelligence information. We are concerned with
a multi-dimensional definition of QoI that takes information and social networks into account.
We comprehensively consider the impact of security properties and the impact of the network on
information that will be subject to decision making algorithms.

Research Milestones

Due | Task | Description
Q2 | Task 2 | Define first version of multi-dimensional QoI function
Q2 | Task 4 | Define relationship between authentication and confidentiality related to securing provenance information from the adversary
Q3 | Task 1 | Develop the framework of model hierarchies and inter-relationships both within and outside of CNARC
Q3 | Task 2 | Determine relationship between network evolution and QoI
Q3 | Task 3 | Characterize the impact of interference dynamics on the instantaneous and intermittent connectivity of an integrated heterogeneous network
Q3 | Task 4 | Derive a fundamental mathematical relationship to capture the amount of privacy of source, quality of the information, and the level of desired provenance
Q4 | Task 1 | Create a first instantiation of the models to derive expressions for at least the information carrying capacity of a given network
Q4 | Task 2 | Develop a general algorithm that finds local optima for low complexity classes of QoI functions on stochastic networks, with some form of analytical convergence proof
Q4 | Task 3 | Characterize the impact of interference dynamics on the delay profile of integrated heterogeneous networks
Q4 | Task 4 | Characterize the OICC with the public/private key confidentiality class

Budget By Organization

Organization | Government Funding ($) | Cost Share ($)
BBN (CNARC) | 66,240 | 0
CUNY (CNARC) | 75,000 | 0
PSU (CNARC) | 274,766 | 114,000
UCD (CNARC) | 100,000 | 0
UCR (CNARC) | 193,000 | 0
USC (CNARC) | 274,014 | 0
TOTAL | 983,020 | 114,000

9.6 Project C2: Characterizing the Increase of QoI Due to
Networking Paradigms

Project Lead: T. F. La Porta, Penn State


Email: tlp@cse.psu.edu, Phone: 814-865-6725

Primary Research Staff:
A. Bar-Noy, CUNY (CNARC)
G. Cao, Penn State (CNARC)
J.J. Garcia-Luna-Aceves, UCSC (CNARC)
B. Krishnamachari, USC (CNARC)
T.F. La Porta, Penn State (CNARC)
M. Neely, USC (CNARC)
K. Psounis, USC (CNARC)
H. Sadjadpour, UCSC (CNARC)
A. Yener, Penn State (CNARC)
Q. Zhao, UC Davis (CNARC)

Collaborators:
T. Abdelzaher, UIUC (INARC)
M. Faloutsos, UCR (IRC)
A. Iyengar, IBM (INARC)
B. Sadler, ARL
S. Krishnamurthy, UC Riverside

9.6.1 Project Overview


In this project we examine key network paradigms to determine how they improve QoI and thus
increase OICC. We apply known and emerging communication and networking techniques to
redefine these limits in terms of QoI. Possible approaches include collaborative communications
(to leverage intelligence and cooperation), in-network storage (to leverage underlying social and
information networks), and scheduling mechanisms (to leverage use of resources). The actual
network behavior is impacted by mobility and the limitations of implementation and operating
environments. The former is addressed by incorporating mobility models; the latter is captured
via our thorough experimentation and iteration which will begin in year two of this project.

In this project, we consider: (a) the inherent signaling overhead used to create and maintain the
network; (b) the heterogeneous nature of information flows in tactical networks; (c) the
heterogeneous nature of the nodes that form part of a tactical network; (d) the exploitation of all
communication, processing and storage resources of a network, not just its wireless bandwidth;
and (e) the exploitation of all forms of inter-nodal cooperation, not just methods to avoid
multiple access interference. This will require collaboration with the SCNARC and INARC to
make the models more “information-aware.” Addressing these aspects of the fundamental limits of tactical networks will provide much-needed insight not only into what is possible to attain, but also into how to attain it. These items will be addressed progressively over the lifetime of the project. In the first year we focus on items (b), (c), and (d). We start to address item (a) in Task C2.2 (considering the overheads of different techniques in interference management) and item (e) by studying interference management. More detailed studies of different types of cooperation and their overheads will be undertaken in later years.

9.6.2 Project Motivation


Using the models of project C.1 we will have an understanding of which network properties impact QoI and, by incorporation, OICC. In this project our goal is to understand which networking paradigms can control these network properties or, where they are not controllable, mitigate their effects on OICC. This will allow us to better control these networks and achieve the optimal QoI.

Challenges of Network-Centric Operations


The goal of a network in a military environment is to enable rapid decision making to increase mission tempo. This requires a sufficient amount of information, at the required quality, to be delivered to decision makers. Challenges include understanding what information is important (in collaboration with the SCNARC) and with what quality (e.g., delay, security properties) information must be delivered for knowledge extraction to be effective (in collaboration with INARC). Here we undertake an effort to understand which networking paradigms may overcome these challenges, in the face of network dynamics, to have a large impact on OICC.

Example Military Scenarios


This project improves OICC by exploiting new network paradigms. We apply these paradigms in light of the needs of tactical networks and QoI. The project ultimately accounts for interactions with information and social networks. We consider all cases of the military network environment, including troop movement, information gathering, and mission planning. We also consider all military communications environments – multi-hop wireless, dynamic conditions, high mobility, heterogeneous nodes, and heterogeneous traffic.
Impact on Network Science

The results from this project will allow the design of networks and algorithms that approach the
theoretical limits of operational information content capacity. With knowledge of the impact of
network paradigms, protocol structures may then be analyzed and improved so that the optimal
OICC may be achieved.

9.6.3 Key Research Questions


We seek to answer:
- What are the networking paradigms that will most improve (subject to constraints)
OICC?
- What are the gains achieved by the paradigms and why?

9.6.4 Initial Hypothesis
We expect that leveraging the intelligence of nodes will have a large impact on OICC if the
algorithms effectively make use of underlying information and social networks and consider QoI
instead of traditional notions of QoS. The consideration of underlying information and social
networks will allow smarter choices in resource allocation, thus greatly increasing OICC above
those algorithms that consider only communication network characteristics.

9.6.5 Technical Approach


This project is organized into three tasks. The first explores the impact of collaboration in terms of interference management, and the second in terms of in-network storage and information sharing among nodes. The third task addresses universal scheduling.

In particular, the tasks are:

C2.1 Characterizing performance of collaborative networking for concurrency in dynamic networks with realistic traffic – We explore using concurrency to perform interference management, and compare this approach to distributed MIMO techniques, which in theory are the best physical-layer concurrency techniques but are difficult to implement in practice.

C2.2 Characterizing the benefits of in-network storage and cooperative caching – We characterize the limits of the gains in network performance when leveraging the storage of nodes in the network. We consider mobility and the varying importance of information. These models will be developed in cooperation with the INARC and SCNARC.

C2.3 Characterizing the impact of scheduling on QoI – In this task, we optimize cost-QoI
tradeoffs in wireless networks, and determine universal scheduling policies.

9.6.6 Task C2.1: Characterizing Performance of Collaborative Networking for


Concurrency in Dynamic networks with Realistic Traffic (J.J. Garcia-Luna-
Aceves and H. Sadjadpour, UCSC (CNARC); Q. Zhao, UC Davis (CNARC),
S. Krishnamurthy, UC Riverside; M. Faloutsos, UC Riverside (IRC); B.
Sadler, ARL)
Task Overview
In this task, we propose a clean-slate approach to collaborative networking aimed at dynamic networks. First, we note that virtual MIMO systems are simply not practical in tactical networks. Second, we observe that, contrary to the current view, fading can improve the network capacity by allowing more concurrent dialogues to take place among neighboring nodes in the same portion of the wireless spectrum. Intuitively, the capacity of a network can be increased if source-destination pairs communicate over portions of the spectrum for which the received signals are well above a signal-to-noise ratio (SNR) threshold at the intended receivers, but constitute faint noise at other nearby nodes. Hence, we propose to study opportunistic interference management as an alternative approach to taking advantage of physical-layer concurrency in dynamic networks. The approach consists of taking advantage of the time-varying nature of wireless channels to find parallel channels between sources and destinations that can be activated in the same spectrum and at the same time without these signals interfering with each other. We claim that the time-varying wireless channels can potentially create parallel channels that are the equivalent of beamforming, without any need for computing the channel state information for the channel matrix.
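To convey the basic intuition (and only the intuition; this is not the scheme we will develop), the sketch below draws independent Rayleigh fading power gains for one slot and greedily activates source-destination pairs whose instantaneous SINR, given the concurrent transmissions already admitted, stays above a threshold. The network size, noise level, fading model, and threshold are illustrative assumptions.

    import random

    random.seed(1)
    N_LINKS, NOISE, SINR_MIN = 8, 1e-3, 4.0

    def draw_gains(n):
        """gain[tx][rx]: power gain from transmitter tx to receiver rx
        (Rayleigh fading -> exponentially distributed power gains)."""
        return [[random.expovariate(1.0) for _ in range(n)] for _ in range(n)]

    def greedy_concurrent_set(gain):
        """Admit links in order of direct-link gain while every admitted
        link still meets the SINR threshold under mutual interference."""
        active = []
        for cand in sorted(range(N_LINKS), key=lambda i: -gain[i][i]):
            trial = active + [cand]
            if all(gain[rx][rx] / (NOISE + sum(gain[tx][rx] for tx in trial if tx != rx))
                   >= SINR_MIN for rx in trial):
                active = trial
        return active

    gains = draw_gains(N_LINKS)
    print("links activated this slot:", greedy_concurrent_set(gains))

In a slot with a different fading realization, a different set of links clears the threshold, which is precisely the opportunism the task intends to exploit without channel-matrix feedback.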

Task Motivation
To date, the vast majority of studies on achieving physical-layer concurrency focus on taking
advantage of distributed MIMO cooperation [4]. However, all prior studies focusing on virtual
MIMO solutions ignore the actual overhead incurred for distributed cooperative MIMO to work,
and the synchronization of transmitter nodes and distributed space-time encoding and decoding
make this technique very impractical.

The alternative view of MIMO systems has focused on combating channel fading between
source-destination pairs. In MIMO broadcast channels, the optimum capacity is attained by the
well-known dirty paper coding [5] technique. However, dirty paper coding requires significant
feedback along with non-causal transmitted data information. Recent approaches in achieving the
capacity of dirty paper coding are based on the use of random beam forming [6]. However, these
approaches are not practical and more importantly, they are not designed for distributed systems
such as ad hoc networks.

Key Research Questions


In this task we seek to answer:
- Can collaborative networking techniques attain substantial performance
improvements in dynamic networks by increasing concurrency at the physical layer
using practical implementations?
- How much is QoI improved with this added concurrency?

Initial Hypotheses
We expect that concurrency will greatly improve the QoI and OICC in a network because of the sensitivity of OICC to network capacity and QoI. Concurrency will allow more information to be transferred and will reduce congestion, thus improving the quality of the delivered information. Coupled with the expected reduction in overhead of the techniques described below when compared to cooperative MIMO, we expect this gain in OICC to be large (e.g., 50%).

Technical Approach
Most current studies on capacity analysis and cooperation are related to homogeneous networks.
For example, most capacity analyses in the literature assume a single traffic class in the network.
Keshavarz-Haddad et al. [1] introduced the concept of transmission arena. Based on that
definition, they presented a method to compute the upper bound of the capacity for different
traffic patterns and different topologies of the network. However, they did not provide closed-
form scaling laws for the network capacity. Toumpis [2] investigated the throughput capacity when there are n^s sources and n^s destinations in a network of n nodes, where 0 < s < 1. Liu et al. [3] extended this result by relaxing the constraint on the number of sources and destinations. While these results address asymmetric traffic, they apply to the case of a single type of traffic pattern in the network. However, practical military networks consist of many types of traffic patterns. We will investigate the ramifications of such assumptions on the throughput capacity
when different types of cooperation are used. We will attempt to find cooperation techniques that
result in much higher throughput capacity in dynamic networks. In addition, prior results on
cooperative techniques [4] do not consider the relative importance of different types of traffic or
different pieces of information within a single flow of traffic. Understanding the relative
importance of data within the traffic flows is required to enable QoI-aware cooperative
networking techniques.

We have shown that the per-source-destination unicast throughput of a tactical wireless network
can attain the optimal value and scale with the number of nodes by embracing concurrency of
transmissions at the physical layer using multi-packet transmission and reception (MPT and
MPR). We have also shown that MPR and MPT increase the order capacity of wireless networks
for multicasting and broadcasting applications [9][10][11]. On the other hand, we have also
shown that network coding does not provide any order capacity gains for multicasting or
broadcasting in wireless networks [12][13][14]. Hence, our results show that increasing the order
throughput capacity of wireless ad hoc networks requires concurrency at the physical layer.

We have derived preliminary results that indicate the potential for order capacity increases by
taking advantage of the fading of signals over wireless links to manage interference [7][8]. These
results show that interference management can: (a) require the smallest possible feedback
reported in literature to date; (b) be implemented with current available hardware using simple
encoding and decoding of signals; (c) be extended to tactical networks, because there is no need
for transmitters to synchronize during transmission of the signals; and (d) constitute a viable
alternative to distributed cooperative MIMO approaches.

We will carry out the following activities during the first year:
1. Derive appropriate metrics for QoI-aware networks with heterogeneous traffic. We will
further investigate the appropriate type of cooperation required for networks with
heterogeneous traffic.
2. Study the conditions within which opportunistic interference management can be
implemented with limited complexity, the channel feedback needed from receivers to
senders, and the attainable capacity gains in combination with the use of multi-radio
nodes under limited feedback conditions.
3. Compare the performance of interference management against that of distributed
cooperative MIMO schemes, with emphasis on dynamic networks and taking into
account the signaling overhead incurred in the network.
4. Begin to study the design of low-complexity approaches for the implementation of
protocols that take advantage of interference management for channel access and
scheduling, and the interplay between interference management and cooperative
networking mechanisms that can be implemented above the physical layer, such as
adaptive scheduling, multipath routing, multi-copy forwarding, queue management,
transmission control, and coding.

Our research after the first year will build on the results of the four activities listed above to mount a study of the modeling and design of a clean-slate approach to collaborative networking, in which the design of the protocol architecture of a tactical network is guided by the integration of communication and storage networks with information and social networks.

Validation Approach
We will perform analysis and simulation of the mechanisms we consider and compare them to
results of traditional cooperative MIMO techniques. We will measure the reduction in overhead
and the increase in OICC as defined in C1.1.

Summary of Military Relevance


We consider all cases of military network environment, including troop movement, information
gathering and mission planning. We also consider all military communications environments –
multi-hop wireless, dynamic conditions, high mobility, heterogeneous nodes and heterogeneous
traffic. Knowledge of the networking paradigms that increase OICC is important for improving the
information sharing capabilities of the military.

Research Products
In the first year we will:
- Report on the conditions within which interference management is practical, the
channel feedback needed from receivers to senders, and the attainable capacity gains
in combination with the use of multi-radio nodes under limited feedback conditions.

- Report on the performance of interference management against that of distributed


cooperative MIMO schemes, with emphasis on dynamic networks.

References

[1] A. Keshavarz-Haddad and R. Riedi, “Bounds for the capacity of wireless multihop networks
imposed by topology and demand”, ACM MobiHoc, pp. 256-265, September 2007.

[2] S. Toumpis, “Asymptotic capacity bounds for wireless networks with non-uniform traffic,”
IEEE Transaction on Wireless Communications, vol. 7, No. 5, pp. 1-12, May 2008.

[3] B. Liu, D. Towsley, and A. Swami, “Data gathering capacity of large scale multihop
wireless networks,” ACM MobiHoc 2008.

[4] A. Ozgur, O. Leveque, and D. Tse, “Hierarchical Cooperation achieves Optimal Capacity
Scaling in Ad Hoc Networks,” IEEE Transactions on Information Theory, Vol. 53, No. 10, pp.
2549-2572, 2007.

[5] M. Costa, “Writing on Dirty paper,” IEEE Transaction on Information theory, May 1983.

[6] M. Sharif, B. Hassibi, “On the capacity of MIMO broadcast channels with partial side
information,” IEEE Transaction on Information theory, vol. 51, pp. 506-522, February 2005.

[7] Z. Wang, M. Ji, H.R. Sadjadpour, and JJ. Garcia-Luna-Aceves, "Cooperation-Multiuser


Diversity Tradeoff in Wireless Cellular Networks ," IEEE Globecom 2009 conference.

[8] Z. Wang, M. Ji, H.R. Sadjadpour, and JJ. Garcia-Luna-Aceves, "Interference Management:
A New Paradigm in Wireless Cellular Networks ," IEEE Milcom 2009 conference.

[9] J.J. Garcia-Luna-Aceves, H. Sadjadpour, and Z. Wang, “Challenges: Towards Truly
Scalable Ad Hoc Networks,” Proc. ACM MobiCom 2007, Montreal, QC, Canada, September
9--14, 2007.

[10] Z. Wang, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “Capacity-Delay Tradeoff for


Information Dissemination Modalities in Wireless Networks,” ISIT 08: 2008 IEEE
International Symposium on Information Theory, Toronto, Ontario, Canada, July 6--11,
2008.

[11] Z. Wang, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, ``The Capacity and Energy
Efficiency of Wireless Ad Hoc Networks with Multipacket Reception,'' Proc. ACM MobiHoc
2008, Hong Kong SAR, China, May 26--30, 2008.

[12] Z. Wang, S. Karande, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “On the Capacity
Improvement of Multicast Traffic with Network Coding,” Proc. IEEE MILCOM 2008, San
Diego, California, November 17--19, 2008. (IEEE Fred W. Ellersick 2008 MILCOM Award
for Best Unclassified Paper).

[13] S. Karande, Z. Wang, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “Network Coding Does
Not Change The Multicast Throughput Order of Wireless Ad Hoc Networks,” Proc. IEEE
ICC 2009, Dresden, Germany, June 14-18, 2009

[14] S. Karande, Z. Wang, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “Multicast Throughput


Order of Network Coding in Wireless Ad-hoc Networks,” Proc. IEEE SECON 2009, June
22--26, 2009, Rome, Italy.

9.6.7 Task C2.2: Characterizing the Benefits of In-Network Storage and


Cooperative Caching (G. Cao and T.F. La Porta, Penn State (CNARC); B.
Krishnamachari, USC (CNARC); A. Iyengar, IBM (INARC); T. Abdelzaher,
UIUC (INARC))

Task Overview
We will investigate the capacity of a tactical network when nodes cache information for which
there has been interest around them. We expect that the capacity improvements will be a direct
function of the popularity of information, the distribution of nodes with common interest, and the
temporal aspect of the QoI of the information objects that may or may not be replicated
throughout the network. An interesting aspect of this work that will be addressed starting in year
two is which nodes may store information based on permissions (i.e., security considerations).
Our preliminary insight on the possibility of separation theorems for heterogeneous traffic
indicates that we should be able to characterize the capacity of a tactical network in the presence
of in-network storage by breaking the problem into the dissemination of individual objects, or by
focusing on individual “social networks.” One important distinguishing characteristic of our
models is that they will directly account for social networks and links between information as
guidance for what information to store in which places, and to make searches for information
more efficient. This will require mapping social and information networks onto the
communication networks in concert with the SCNARC and INARC. We expect to take input from and have an impact on INARC tasks I1.1 and I2.1. We will take input from SCNARC task S2.2. In fact, we have two researchers from INARC on this task.

Task Motivation
Several researchers [1][3][4][5][6][7][8][9], as well as our own work [2], have shown that the “store-carry-forward” approach to information dissemination increases the order throughput of wireless networks when nodes move. We have also demonstrated how opportunistically sharing information among nodes locally, called cooperative caching [10][11], can reduce both the latency of retrieving information and the load on the network. In [12] we have shown that the scalability of a store-and-query network with limited resources such as energy and storage depends critically on application-specific event and query traffic. In [13][14], we have explored how different content replication strategies impact the latency of information retrieval in DTN-type vehicular networks. Furthermore, our results on the (n,m,k)-casting formulation clearly show that the order capacity of a network increases as we allow information to flow from the nearest node holding it, rather than from the original source of the information. This intuition is further supported by recent demonstrations in the DARPA DTN program, where the use of in-network storage has been shown to substantially increase the capacity of a network and, in some cases, to enable networking at all.

Given these encouraging preliminary results and the suitability of in-network storage to military
networks, we believe this approach has a significant possibility of providing order gains for
OICC.

Key Research Questions


In this task we seek to answer: How does intelligent and cooperative leveraging of node storage
and mobility improve QoI, and why?

Initial Hypotheses
We expect that combining knowledge of the underlying social network (who will use common information) with mobility characteristics will greatly improve the performance of in-network storage mechanisms over algorithms that ignore this knowledge. Coupled with knowledge of information requirements (e.g., latency, freshness), we expect to see large (e.g., greater than 50%) gains in OICC compared to traditional in-network storage algorithms.

Technical Approach
This task differs from others in that here we assume that information has some persistent value
(that perhaps degrades over time) and that information is shared by more than one member of a
network. As such, we can leverage the storage of nodes and their mobility in delivering
information, not just the links through which nodes communicate.

This task will consider two important factors that impact the benefits of in-network storage and
hence the strategies that will provide the best QoI: mobility and the existence of social and
information networks. Mobility impacts two aspects of in-network storage – where to store
items to protect against network partitions, and on which nodes to store information so that it
may be carried to a different part of the network using node mobility. The existence of social
and information networks impacts which information should be stored where, and what priorities
should be given to information, e.g., if memory is finite, which items should be stored at the
expense of others.

The effects of mobility on caching


Most existing caching decisions assume that the network is relatively stable. When nodes
frequently move, the existing caching decisions may not work well. We notice that in tactical
networks, mobile nodes exhibit some correlated mobility patterns. For example, in a battlefield,
soldiers in the same platoon are geographically close and have similar mobility patterns. Such a platoon mobility model will certainly affect caching decisions.

A platoon itself may be partitioned as squads within the platoon move away. As a result,
network partitions may occur, or the cost of communicating between squads may become high.
To deal with this problem, nodes should be able to detect and predict platoon partitions or node splits. By monitoring or predicting neighbor mobility patterns, a node may be able to predict
its split from other platoon members. Then, it can prefetch and cache some data in advance, to
ensure that the data is still available when the partition occurs.
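A minimal sketch of the prefetch-before-partition idea, assuming each node simply tracks in which of the last few beacon intervals it heard a given peer; the window length, split threshold, item names, and the notion of "items served by a peer" are all assumptions for exposition.

    WINDOW = 10            # beacon intervals remembered per peer (assumed)
    SPLIT_THRESHOLD = 0.3  # contact ratio below which a split is predicted

    def contact_ratio(history):
        recent = history[-WINDOW:]
        return sum(recent) / len(recent) if recent else 0.0

    def items_to_prefetch(peer_contact_history, items_served_by_peer, local_cache):
        """Pull now whatever the peer serves that we lack, if a split looks likely."""
        if contact_ratio(peer_contact_history) < SPLIT_THRESHOLD:
            return [i for i in items_served_by_peer if i not in local_cache]
        return []

    history = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]   # peer heard in only 2 of 10 intervals
    print(items_to_prefetch(history, ["map_tile_7", "order_12"], {"order_12"}))

More refined predictors (relative velocity, group membership from the EDIN mobility models) would replace the contact ratio, but the caching decision retains this shape: predict the split, then prefetch what would otherwise become unreachable.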

Other mobility patterns will also have an impact on the effectiveness of caching. For instance, in
some cases certain nodes may act as communication hubs among soldiers as they move around different platoons. It may then be more effective to cache popular data on these hub nodes, so that it can be easily accessed by the nodes they encounter.

We will collaborate with the EDIN CCRI on mobility, specifically project E4. La Porta is a
member of the E4 team.

We plan to explore how best to aggregate, store, and (upon encounter with other mobile nodes)
disseminate such content in order to allow propagation of high confidence event detections while
maintaining storage and communication efficiency.

This year we will:


1. Study the impact of caching in environments in which group formation and splitting
occur. We will work with the mobility modeling task of the EDIN CCRI and reuse
their mobility models.
2. From the output of item 1, we will determine optimal replication strategies given a
likelihood of network partitions.

QoI-Aware cache management


In cooperative caching, cache management includes two problems: cache replacement and cache
admission control. When the cache is full, cache replacement algorithms are used to find a
suitable subset of data items for eviction from the cache. Cache admission control decides
whether a data item should be brought into the cache. Traditionally, the decision on whether to
admit an item into the cache has depended on the probability of cache hits. For example, if a data item that will not be accessed in the near future is brought into the cache at the cost of
replacing another data item that will be accessed soon, the performance will be degraded. This
strategy does not, however, consider the type of information or the impact of caching on the QoI
that will be delivered to the requestor. Prior work in the area also does not explicitly consider
the joint impact of the presence of underlying social and information networks.

The most important issue in social-aware caching is to determine which nodes may cache data for each other, due either to their mobility patterns or to their interests. Some nodes may tend to share information with each other or to request common information. These characteristics may result from underlying social networks. Currently, the "betweenness" centrality metric is widely used in social network analysis. In the DTN scenario, the number of contacted neighbors of a mobile node can be used to estimate its betweenness centrality. We can also extend this metric to include a measure of common interests. We will study how to quantify the centrality of each node in DTNs and map the node centrality to the popularity of the data that the node should cache.
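As a stand-in for the DTN centrality-and-interest metric this task will actually develop, the sketch below blends a degree-style contact centrality with interest overlap to rank candidate cache sites for an item; the node data, tags, and the 50/50 blend are invented for illustration.

    def cache_score(contacts, interests, item_tags, alpha=0.5):
        """contacts: peers recently encountered by the node.
        interests: topic tags requested by the node and its contacts.
        item_tags: topic tags of the candidate data item."""
        centrality = len(contacts)                       # contact-count proxy
        overlap = len(interests & item_tags) / max(1, len(item_tags))
        return alpha * centrality + (1 - alpha) * overlap * centrality

    nodes = {
        "hub":   ({"a", "b", "c", "d", "e"}, {"route", "imagery"}),
        "scout": ({"a"},                     {"imagery"}),
    }
    item_tags = {"imagery"}
    for name, (contacts, interests) in nodes.items():
        print(name, round(cache_score(contacts, interests, item_tags), 2))

The hub node scores far higher, reflecting the intuition that popular data should gravitate toward well-connected, interested nodes; the research question is how to estimate and combine these quantities properly in a DTN.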

Likewise, certain pieces of information may be more frequently accessed or more important than
others. Moreover, certain pieces of information may be linked, i.e., often accessed together.
Current data dissemination schemes are generally data-centric, ignoring user interests and the
linkage of information. We will explore how to optimize the distribution of content from a given
repository given estimates of their popularity in a mobile network, taking into account the
underlying mobility and information linkages. T. Abdelzaher and A. Iyengar of INARC will
collaborate with us on this task. We plan to investigate solutions to this question using random-
walk and probabilistic gossip-based algorithms. We will also study user-centric data
dissemination in DTNs, which aims at forwarding data only to the interested nodes using the
minimum number of relays.

While prior work has examined various aspects of replication and cache consistency, none has looked at QoI-aware cache management. For certain types of information, "freshness" may be more important than low-latency retrieval. Consider, for example, that traffic conditions on a road may be much more valuable if they are generated within a few minutes of being used, but are not sensitive to a few seconds of delay in retrieving them. In these cases it may be important to refresh caches more frequently so that information stays fresh. In other cases latency may be more important; in these cases it may be important to cache information in more places, and refreshing may not be as urgent.

Traditionally, to address QoI parameters such as latency, hop counts are used as a driving metric. For example, if the requesting node is within a threshold number of hops (a system parameter) of another node that has cached the data, it will not cache the data; otherwise, it will. Hence, the same data item is cached at least the threshold number of hops apart. There is a tradeoff between access latency and data accessibility. With a small threshold, the number of replicas of each data item is high, and the access delay for these data will be low. However, with a fixed amount of cache space, the number of distinct data items cached by the nodes becomes low, and if a network partition occurs, many nodes may not be able to access some data items. On the other hand, a large threshold can increase data accessibility, but the number of replicas of each data item will be small and the access delay for these data may be somewhat longer. Depending on the application, the threshold may take different values, and we will examine the impact of the threshold on overall cache performance.
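The following sketch captures the threshold-based admission rule just described; the threshold values, the hop-distance input, and the example item are placeholders.

    # Illustrative hop-threshold cache admission: cache an item only if the
    # nearest known replica is more than `hop_threshold` hops away, so that
    # replicas end up spaced at least that far apart.
    def admit(hops_to_nearest_replica, hop_threshold):
        return hops_to_nearest_replica > hop_threshold

    # Small threshold -> many replicas, low access delay, fewer distinct items;
    # large threshold -> fewer replicas, more distinct items, higher delay.
    for threshold in (1, 3, 6):
        print(f"threshold {threshold}: admit item at 4 hops from nearest replica -> "
              f"{admit(4, threshold)}")

QoI-aware variants would replace the fixed threshold with a value derived from the item's latency and freshness requirements, which is the direction this task will pursue.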

Given the existence of underlying social and information networks, and QoI requirements for
types of information, we will examine more sophisticated caching algorithms. Algorithms will
consider information links, social relationships and QoI requirements when determining
locations to store information and which information should have priority in memory-limited
systems.

In this year we will:

1. Work with SCNARC models of social relationships in information sharing to determine the impact of social-aware caching.
2. Work with the INARC descriptions of information importance and relationships to determine the impact of information-aware caching.
3. Study the essential difference between multicast and unicast in DTNs, and formulate relay selection for multicast as a unified knapsack problem by exploiting node centrality and social community structures (a sketch of such a formulation follows this list).
4. Given mixes of information and node mobility, determine bounds on caching performance given QoI requirements.
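The sketch below shows the shape of the knapsack formulation referred to in item 3, under assumptions we introduce purely for illustration: each candidate relay carries an estimated delivery value (which would come from its centrality and community membership) and a resource cost, and relays are chosen to maximize total value within a budget.

    # Illustrative 0/1 knapsack relay selection for DTN multicast.
    # Candidate names, costs, values, and the budget are hypothetical.
    def select_relays(candidates, budget):
        """candidates: list of (name, cost, value); returns (best value, relays)."""
        best = {0: (0.0, [])}                 # budget spent -> (value, chosen relays)
        for name, cost, value in candidates:
            for spent, (val, chosen) in list(best.items()):
                new_spent = spent + cost
                if new_spent <= budget and val + value > best.get(new_spent, (-1.0,))[0]:
                    best[new_spent] = (val + value, chosen + [name])
        return max(best.values())

    candidates = [("hub", 3, 4.0), ("scout", 1, 1.5), ("medic", 2, 2.5), ("uav", 2, 3.0)]
    print(select_relays(candidates, budget=5))   # -> (7.0, ['hub', 'uav'])

The real formulation will differ (delivery estimates are not independent, and costs are multi-dimensional), but it makes concrete how centrality and community structure enter as the value terms of an optimization rather than as routing heuristics.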

Validation Approach
We will analyze and simulate the performance of the resulting algorithms that make use of
realistic mobility models and underlying social and information networks. We will compare the
results to algorithms that only use communication network parameters or simple measures of
priority to drive cache management.

Summary of Military Relevance


Nodes in a military environment may cooperate with each other and often have significant
memory and storage capabilities. Military networks also have underlying social and information
networks and distinct mobility patterns. We believe that in-network storage is ideally suited to
this environment for these reasons.

Research Products
In this year we will:

- Produce algorithms for in-network storage that increase OICC by leveraging underlying
social and information networks
- Report on bounds on caching performance given QoI requirements.

References
[1] M. Grossglauser and D. Tse, “Mobility increases the capacity of ad-hoc wireless networks,”
Proc. of IEEE INFOCOM 2001, April 22-26 2001.

[2] R. de Moraes, H. Sadjadpour, and J. Garcia-Luna-Aceves, “Mobility-capacity-delay trade-


off in wireless ad hoc networks,” Ad Hoc Networks, vol. 4, no. 4, 2006.

[3] W. Zhao, M. Ammar, and E. Zegura, “A message ferrying approach for data delivery in
sparse mobile ad hoc networks,” in MobiHoc ’04: Proceedings of the 5th ACMinternational
symposium on Mobile ad hoc networking and computing, pp. 187–198, 2004.

[4] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, “Impact of human mobility
on opportunistic forwarding algorithms,” IEEE Transactions on Mobile Computing, vol. 6,
no. 6, pp. 606–620, 2007.

[5] E. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant
manets,” in Mobi-Hoc ’07: Proceedings of the 8th ACM international symposium on Mobile
ad hoc networking and computing, pp. 32–40, 2007.

[6] P. Hui, J. Crowcroft, and E. Yoneki, “Bubble rap: Social based forwarding in delay tolerant
networks,” in MobiHoc’08: Proceedings of the 9th ACM international symposium on Mobile
ad hoc networking and computing, 2008.

[7] J. Ghosh, S. J. Philip, and C. Qiao, “Sociological orbit aware location approximation and
routing (solar) in manet,” Ad Hoc Netw., vol. 5, no. 2, pp. 189–209, 2007.

[8] J. Burgess, B. Gallagher, D. Jensen, and B. Levine, “Maxprop: Routing for vehicle-based
disruption-tolerant networks,” in INFOCOM’06: Proceedings of the 25th IEEE International
Conference on Computer Communications, 2006.

[9] L. Yin and G. Cao, “Supporting cooperative caching in ad hoc networks,” in IEEE
Transactions on Mobile Computing, vol. 5, pp. 77–90, 2006.

[10] G. Cao, “A scalable low-latency cache invalidation strategy for mobile environments,” in
IEEE Transactions on Knowledge and Data Engineering, vol. 15, 2003.

[11] W. Gao, Q. Li, B. Zhao, and G. Cao, “Multicasting in delay tolerant networks: A social
network perspective,” in ACM Mobihoc, 2009.

[12] J. Ahn, B. Krishnamachari: Scaling laws for data-centric storage and querying in wireless
sensor networks. IEEE/ACM Trans. Netw. 17(4): 1242-1255 (2009)

[13] S. Ghandeharizadeh, S. Kapadia, B. Krishnamachari: Comparison of replication strategies


for content availability in C2P2 networks. Mobile Data Management 2005: 107-115

[14] S. Ghandeharizadeh, S. Kapadia, B. Krishnamachari: An evaluation of availability latency


in carrier-based vehicular ad-hoc networks. MobiDE 2006: 75-82

9.6.8 Task C2.3: Characterizing the Impact of Scheduling on QoI (M. Neely and B.
Krishnamachari, USC (CNARC), A. Yener and T. F. La Porta, Penn State
(CNARC), A. Bar-Noy, CUNY (CNARC))

Task Overview
In this task we explore low-complexity scheduling techniques to improve the QoI and OICC in a network. We first investigate the existence of universal scheduling algorithms that provide guarantees in the types of networks we are interested in, namely highly dynamic ones. We then characterize scheduling algorithms in terms of their tradeoffs, specifically delay vs. utility.

Our approach is to extend stochastic network optimization to find optimal scheduling disciplines.

Task Motivation
Optimization theory plays a crucial role in designing networks. Traditionally, well-known optimization techniques are invoked in the hope of understanding certain traits or obtaining insights into network design. We must recognize that for contemporary communication networks,
including tactical MANETs, traditional optimization approaches are not sufficient to address the
underlying dynamics which are more often than not unpredictable. In other words, they do not
treat dynamic situations where the quantities to be optimized are themselves changing due both
to online control decisions (such as resource allocation and routing) and to unpredictable events
(such as traffic bursts, fading channels, and mobility). Traditionally, Markov decision theory
and Dynamic Programming theory have been used to address time varying systems with known
probability models. However, such approaches have well known complexity explosion problems
when networks are large, and also typically require the underlying system to act according to a
given (and perhaps incorrect) probability rule.

Key Research Questions


In this task we seek to answer the question: Can a universal scheduling algorithm be developed
considering QoI?

Initial Hypotheses
We expect that under certain non-restrictive assumptions, we will be able to find optimal
scheduling disciplines which will lead us to a universal scheduling algorithm for QoI-aware
networks.

Technical Approach
It is important to develop low complexity and adaptive techniques for optimization of time
varying functions subject to time varying constraints. Such a theory should be informed by
existing convex programming, Markov decision, and exact (and approximate [18]) dynamic
programming theory, but should allow practical solutions to complex dynamic network
problems, including the problems that appear in the different tasks of this project. The emerging
field of stochastic network optimization provides auspicious results in this direction. However,
there are several gaps in the state-of-the art stochastic network optimization theory. First, the
existing theory still relies on structured probabilistic assumptions that may not be valid in actual
networks. Second, there is still much unknown in the area of fundamental delay tradeoffs when
optimizing stochastic networks in terms of utility or energy efficiency, and existing algorithms
are known to often suffer from large delay. Third, the existing theory is typically applied to
MANETs with link-based and packet-based structure, and must be augmented to be compatible
with our desired Quality of Information (QoI) metrics and to incorporate modern concepts of
information processing.

This task seeks to fill these gaps in the following ways:

1. We will develop new theories for handling uncertainty in networks with general
traffic, channel, and mobility dynamics. Of particular interest is the development of
“universal scheduling algorithms” that provide performance guarantees for time
varying networks with time varying constraints, without requiring a probability
model for the time variation.

2. We will investigate new scheduling approaches to reduce network delay, together
with fundamental energy/delay and utility/delay tradeoffs for multi-hop networks.

3. In subsequent years we will show how to incorporate complex objectives and
sophisticated data fusion and information processing techniques into the optimization
paradigm. This will build bridges between the different tasks of this project and the
particular optimization methods that can be used for them.

Universal Scheduling for Networks with Uncertain Dynamics:

Networks experience unexpected events. Links can fail, nodes can move, and traffic bursts can
bring congestion with unpredictable timescales and spatial locations. It is clear that perfect
knowledge of future events could dramatically improve network performance. For example,
knowledge of an upcoming failure at a primary link of a path could be used to preemptively re-
route data. Knowledge that, in the near future, a certain node is going to move into range of a
source and then immediately move into range of its intended destination can be used for
opportunistic relaying. Knowledge of an upcoming traffic flood can be used to mitigate its
detrimental impact. There are of course many more examples of complex sequences of arrival,
channel, and mobility events that, if known in advance, could be exploited to yield improved
performance. However, because realistic networks do not have knowledge of the future, it is not
clear whether these events can be exploited in practice. Further, even if full future information were
known, it is not clear how to optimize over the many combinatorial sequences of actions needed
to exploit this knowledge.

Existing theories of opportunistic scheduling and stochastic network optimization provide partial
solutions to this problem. Techniques of max-weight scheduling, backpressure, and Lyapunov
optimization can treat networks with random traffic, channels, and mobility, often without
knowing the underlying probability distributions associated with these random events. We have
contributed significantly to this area in both theory and practice (see, for example, [1] for
stochastic network optimization theory, [5] for optimal energy-delay tradeoff theory, [7] for
incorporation of new information processing capabilities, and [17] for low-delay implementation
of backpressure routing). However, the strongest known performance bounds (such as for
throughput and delay) hold only for simple cases in which random events are memoryless in nature,
such as when arrivals are Poisson and/or channel variations are i.i.d. over timeslots. These
assumptions take advantage of the rapid "law of large numbers" averaging properties of
memoryless events.
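
For concreteness, the following minimal sketch shows the kind of one-slot max-weight/backpressure
decision the above techniques make; the topology, queue values, rates, and interference pairs are
invented purely for illustration and are not part of any task deliverable:

    # Minimal one-slot max-weight / backpressure decision (in the spirit of [1][2]).
    # All network data below are assumed toy values.
    import itertools

    links = [("a", "b"), ("b", "c"), ("b", "d"), ("c", "d")]
    queue = {"a": 8, "b": 5, "c": 2, "d": 0}            # backlog for one commodity (dest "d")
    rate = {l: 1.0 for l in links}                      # packets/slot if a link is activated
    conflicts = {frozenset({("a", "b"), ("b", "c")}),   # link pairs that interfere
                 frozenset({("b", "c"), ("b", "d")})}

    def weight(link):
        u, v = link                                     # differential backlog times link rate
        return max(queue[u] - queue[v], 0) * rate[link]

    def feasible(subset):
        return all(frozenset(p) not in conflicts
                   for p in itertools.combinations(subset, 2))

    # Max-weight: activate the interference-free link set with the largest total weight
    # (brute force is fine at toy scale; real networks need distributed approximations).
    candidates = (s for n in range(len(links) + 1)
                  for s in itertools.combinations(links, n) if feasible(s))
    best = max(candidates, key=lambda s: sum(weight(l) for l in s))
    print("activate:", best)   # -> (('a', 'b'), ('b', 'd'), ('c', 'd'))

The differential-backlog weights are what the Lyapunov drift arguments bound; the memoryless
assumptions discussed above enter only in the analysis, not in the decision rule itself.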

It is possible to extend these claims to allow more general ergodic assumptions on the underlying
stochastic processes, although the performance bounds degrade in proportion to the “mixing
times” of the ergodic processes. Further, it is possible to develop analytical claims concerning
non-ergodic systems, such as when traffic yields “instantaneous rates” that can vary arbitrarily
inside a network capacity region [1], or when “instantaneous capacity regions” can vary
arbitrarily but are assumed to always contain the traffic rate vector [3]. However, the prior non-
ergodic analyses still assume an underlying probability model, and make assumptions about
traffic rates and network capacity with respect to this model.

What is missing is a universal scheduling theory that adapts to any network, without any
probabilistic assumptions. Such a theory should show how to compute the optimal performance
of a network in the ideal case when the full future is known, should incorporate general network
constraints, and should quantify the performance gap due to our lack of knowledge of the future.
The universal theory should also provide decision making strategies that track the ideal optimum
as much as possible, within the fundamental performance gap bounds. It is not obvious whether such a
theory exists, or whether any type of performance guarantee can be made without a probability
model. We believe that it is possible, and we propose to develop such a theory during the course
of this project.
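
To fix notation for what follows (a sketch of the kind of definition we intend, not a finished
result), the T-slot lookahead benchmark we have in mind can be written per frame as:

    \[
      c_r^{*} \;=\; \min_{\alpha(rT),\ldots,\alpha((r+1)T-1)} \;
        \frac{1}{T} \sum_{t=rT}^{(r+1)T-1} c\bigl(\alpha(t), \omega(t)\bigr)
      \quad \text{subject to the queueing and resource constraints over frame } r,
    \]

    where \(\omega(t)\) is the actual (arbitrary, non-probabilistic) sequence of traffic,
    channel, and mobility events and the minimization is given full knowledge of that
    sequence within the frame. A causal universal policy with running average cost
    \(\bar{c}(R)\) over \(R\) frames is then judged by the per-sample-path gap
    \(\bar{c}(R) - \tfrac{1}{R}\sum_{r=0}^{R-1} c_r^{*}\), which we aim to bound by a
    quantity that can be driven down at a quantified cost in queue backlog, with no
    probability model for \(\omega(t)\).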

Our optimism is informed by the fact that such universal algorithms exist in other areas. For
example, the universal Lempel-Ziv data compression algorithm operates on arbitrary files.
Universal stock portfolio allocation algorithms hold for arbitrary price sample paths
[12][13][14]. There are also network algorithms that have universal properties for limited types
of networks, including competitive-ratio approaches for wireline networks and for simple classes of
wireless networks without channel variation or mobility, as well as competitive-ratio and adversarial
queueing theory approaches for scheduling in switching systems [15] and over time-varying wireless
links [16]. We have also made important advances using competitive ratio analysis applied to
on-line streaming and computer switching policies [8][9], and to graph coloring problems
(related to interference problems in wireless networks) [10][11]. However, there is a significant
gap in our understanding of universal scheduling in MANETs.

Our first year goal for this task is to demonstrate the existence of universal scheduling algorithms
for wireless networks with general time varying dynamics in the traffic, channel, and mobility
processes. We shall use the “competitive ratio metric” and/or the related “T-slot lookahead”
metric to show how practical algorithms can be designed (without knowledge of the future) that
closely track the performance of ideal algorithms that have knowledge about the future. This will
be demonstrated by meeting the following milestones:
1. For simple networks, we will quantify the achievable performance of idealized
algorithms that have certain levels of knowledge about the future, using metrics related to
competitive ratios and/or T-slot lookahead.

2. For simple networks, we will develop practical algorithms that handle arbitrary traffic,
channels, and mobility, with quantifiable performance gap bounds measured with respect
to the metrics of item 1.

Performance and Delay Tradeoffs in Multi-Hop MANETs:

The max-weight, virtual queue, and backpressure scheduling techniques of stochastic network
optimization are well known to optimize throughput metrics and to minimize average power
and/or meet specific average power constraints (see [1] and references therein). However, these
scheduling techniques can lead to large network delay, particularly for multi-hop networks.
Further, the fundamental delay tradeoffs when different performance metrics are optimized (such
as energy or throughput-utility) are known only for single-hop networks and limited classes of
multi-hop networks [4][5][19][20].

Two recent advances made by different members of our team suggest a possible dramatic
breakthrough in the area of network delay for multi-hop networks for different scheduling
disciplines. First, our work in [20] demonstrated a modified backpressure rule that achieves a
near-optimal delay tradeoff, dramatically improving the prior linear delay bound to a logarithmic
delay bound. However, the algorithm given in [20] requires prior knowledge of Lagrange
multiplier information that is time consuming to obtain in practice. It was not clear if this
modified backpressure approach could be made practical until the second recent advance: The
work in [17] demonstrated a successful implementation of diversity-based backpressure (related
to the Diversity Backpressure Routing Algorithm (DIVBAR) in [20]). The implementation
showed that backpressure routing beats existing tree-based or shortest path routing for diversity
scheduling. Further, a simple change to using Last-In-First-Out (LIFO) scheduling was shown to
yield a dramatic delay improvement (up to 98% in some cases). While it is not yet clear why
such a remarkable delay improvement is achieved, we believe that this change to LIFO is a
simple and practical way of implementing the modified backpressure rule of [20], without
knowing the Lagrange multipliers. This suggests that the dramatic 98% improvement seen in
experiments and the dramatic linear-to-logarithmic improvement in modified backpressure are
one-and-the-same. We propose to study this further, and this study may lead to important
discoveries about network delay and performance-delay tradeoffs.
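
The following toy simulation (entirely our own construction, meant only to convey the intuition
behind the LIFO observation, not to reproduce the experiments of [17]) shows why a last-in-first-out
service order can sharply reduce the delay of delivered packets when a standing backlog must be
maintained to sustain a backpressure gradient:

    # Toy single queue that must keep a standing "cushion" of packets (emulating the
    # backlog a backpressure gradient requires). Under FIFO every packet waits behind
    # the cushion; under LIFO the cushion consists of a few early packets and later
    # arrivals are served almost immediately. All parameters are assumed toy values.
    import random
    from collections import deque
    random.seed(1)

    def avg_delivered_delay(discipline, slots=100_000, p_arrival=0.5,
                            p_service=0.7, cushion=20):
        q, delays = deque(), []
        for t in range(slots):
            if random.random() < p_arrival:
                q.append(t)                                  # remember arrival slot
            if len(q) > cushion and random.random() < p_service:
                arrival = q.popleft() if discipline == "FIFO" else q.pop()
                delays.append(t - arrival)
        return sum(delays) / len(delays)

    for d in ("FIFO", "LIFO"):
        print(f"{d}: average delay of delivered packets = {avg_delivered_delay(d):.1f} slots")

In this caricature the LIFO average is far smaller because the handful of packets pinned at the
bottom of the queue (which are never delivered) do the work of maintaining the gradient; whether
this is the right explanation for the 98% figure is exactly what the proposed analysis will examine.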

In addition to simple changes to LIFO scheduling, a third significant advance has shown that
incorporating information theoretic data manipulation can improve delay tradeoffs and can
significantly reduce complexity [6][7]. This holds not only for throughput optimization
problems but also for problems of energy minimization. Energy-delay issues have previously been
explored without information theoretic data processing: a fundamental square-root energy-delay
tradeoff law is developed in [4] for a single wireless link, and this is extended in [5] to a
multi-user downlink. Our work in [6][7] examines energy-delay problems in simple multi-hop
network models, developing algorithms that operate with different levels of source cooperation
and of queue state information available at the individual sources. Our preliminary results
show that, with limited feedback of only one bit of queue information, these algorithms approach
the optimal cost while providing low average packet delay. The results illustrate new energy-delay
tradeoffs based on different levels of cooperation and queue information availability. As future
work in this direction, we aim to extend these results to more general multi-hop dynamic networks
and to obtain decentralized algorithms, using only local queue information, that provide
near-optimal cost-QoI tradeoffs. We plan to explore distributed and scalable network designs that
exploit energy savings along with throughput, stability, and other QoI constraints for general
networks. Priority will be given to developing decentralized algorithms with limited or no feedback.
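
As a concrete, heavily simplified illustration of the kind of limited-feedback tradeoff at issue
(our own toy construction, not the algorithms of [4]-[7]), consider a single link that picks a low-
or high-power mode from one bit of queue state:

    # Toy single link choosing a low- or high-power transmission mode from one bit of
    # queue state (backlog above/below a threshold). Raising the threshold saves energy
    # at the price of a larger time-average backlog, hence larger delay by Little's law.
    # All parameter values are assumed for illustration.
    import random
    random.seed(1)

    def run(threshold, slots=200_000, p_arrival=0.45,
            low=(0.5, 1.0), high=(0.9, 3.0)):        # (service prob, energy cost per busy slot)
        q, energy, backlog_sum = 0, 0.0, 0
        for _ in range(slots):
            if random.random() < p_arrival:          # Bernoulli arrivals
                q += 1
            p_serve, cost = high if q > threshold else low   # one-bit decision
            if q > 0:
                energy += cost                       # power spent while busy
                if random.random() < p_serve:
                    q -= 1
            backlog_sum += q
        return energy / slots, backlog_sum / slots

    for th in (0, 2, 8, 32):
        power, backlog = run(th)
        print(f"threshold={th:3d}  avg power={power:.2f}  avg backlog={backlog:.2f}")

Sweeping the threshold traces out an energy/backlog (and therefore energy/delay) curve; the
research question is how close such coarse-feedback policies can come to the optimal tradeoff in
general multi-hop settings.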

These problems of network delay are known to be notoriously difficult, and new breakthroughs
can have significant impact. Our first year goals in this area are summarized below:
1. We will provide a mathematical foundation for delay analysis of the LIFO based
backpressure rule, for simple classes of networks. This will improve our
understanding of network delay, explain the 98% improvement observed in practice,
and may suggest practical algorithms for dramatic delay improvement in more
complex networks.
2. We will develop new energy/delay tradeoffs for multi-hop networks, possibly
leveraging information theoretic results. We will also begin to explore how such
tradeoffs can be used in more general cost-QoI tradeoffs.

Incorporating Complex Network Operations and Building Bridges:

Our universal scheduling algorithms and delay analysis methods should be general enough to
apply to dynamic networks that must perform complex tasks with general QoI metrics. The
particular problems and QoI metrics of interest are described in their respective task areas. We
will be intentional about building bridges between the different tasks of this project and the
optimization theory that might be used for them. This includes extending the optimization
theory, asking if the extended theory can be used in the desired problem, and iterating on this
process to provide a meaningful and useful collection of results. This iteration process is most
applicable to years 2 and beyond. However, in year 1 we seek to present preliminary
representative examples of networks with extended functionality and QoI metrics that can be
incorporated into the optimization paradigm.

Validation Approach
For universal scheduling we will prove that an optimal scheduling discipline exists, or determine
bounds. We will also determine the complexity of the algorithm. We will analyze and simulate
specific algorithms and compare their performance to the bounds on optimality that we have
derived.

Summary of Military Relevance


Universal optimal scheduling for QoI-aware networks will increase the amount of useful
information that is delivered to decision makers on time. This will assist in increasing mission
tempo.

Research Products
This year we will:
- Report on the achievable performance of idealized algorithms that have certain
levels of knowledge about the future, using metrics related to competitive ratios
and/or T-slot lookahead.

- Develop practical algorithms that handle arbitrary traffic, channels, and mobility,
with quantifiable performance gap bounds measured with respect to the metrics of
item 1.
- Provide a mathematical foundation for delay analysis of the LIFO based
backpressure rule, for simple classes of networks.

References
[1] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource Allocation and Cross-Layer Control
in Wireless Networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-149,
2006.

[2] L. Tassiulas and A. Ephremides, “Stability Properties of Constrained Queueing Systems and
Scheduling Policies for Maximum Throughput in Multihop Radio Networks,” IEEE
Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1949, Dec. 1992

[3] M. J. Neely and R. Urgaonkar, “Cross layer adaptive control for wireless mesh networks,”
Ad Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, August 2007.

[4] R. Berry and R. G. Gallager. Communication over Fading Channels with Delay Constraints.
IEEE Transactions on Information Theory, 48(5):1135–1149, May 2002.

[5] M. J. Neely. Optimal Energy and Delay Tradeoffs for Multi-User Wireless Downlinks.
IEEE Transactions on Information Theory, 53(9), September 2007.

[6] E. N. Ciftcioglu, A. Yener and R. Berry, Stability of Bi-Directional Cooperative Relay
Networks, in Proceedings of the IEEE Information Theory Workshop, ITW'08, Porto,
Portugal, May 2008.

[7] E. N. Ciftcioglu, Y. E. Sagduyu, R. Berry, and A. Yener, Cost Sharing with Network Coding
in Two-Way Relay Networks, In Proc. of the 47th Annual Allerton Conference on
Communication, Control, and Computing, Allerton'09, Monticello, IL, September 2009.

[8] A. Bar-Noy and R. Ladner, ``Competitive On-Line Stream Merging Algorithms for Media-
on-Demand,'' Journal of Algorithms (JALG), 48(1):59--90, August 2003.

[9] A. Bar-Noy, A. Freund, S. Landa, and J. Naor, ``Competitive On-Line Switching Policies,''
Algorithmica, 36(3):225--247, May 2003.

[10] A. Bar-Noy, P. Cheilaris, and S. Smorodinsky, ``Conflict-Free Coloring for Intervals: from
Offline to Online,'' ACM Transactions on Algorithms (TALG), 4(4):44:1--44:18, 2008.

[11] A. Bar-Noy, P. Cheilaris, S. Olonetsky, and S. Smorodinsky, ``Online Conflict-Free
Colorings for Hypergraphs,'' accepted for publication (Aug 16, 2009) in Combinatorics,
Probability and Computing (CPC).

[12] T. M. Cover, "Universal Portfolios," Mathematical Finance, vol. 1, no. 1, pp. 1-29, Jan.
1991.

[13] N. Merhav and M. Feder, "Universal Schemes for Sequential Decision from Individual Data
Sequences," IEEE Transactions on Information Theory, vol. 39, no. 4, pp. 1280-1292, July
1993.

[14] M. J. Neely, "Stock Market Trading via Stochastic Network Optimization," ArXiv Technical
Report, arXiv:0909.3891v1, Sept. 2009.

[15] M. Andrews, "Maximizing Profit in Overloaded Networks," Proc. IEEE INFOCOM, March
2005.

[16] X. Meng, T. Nandagopal, S. H. Y. Wong, H. Yang, and S. Lu, “Scheduling Delay-Constrained
Data in Wireless Data Networks,” Proc. WCNC, 2007.

[17] Moeller, Sridharan, Krishnamachari, and Gnawali, “Backpressure Routing Made Practical,”
submitted to HotNets 2009.

[18] W. B. Powell, Approximate Dynamic Programming: Solving the Curses of Dimensionality,
John Wiley & Sons, 2007.

[19] M. J. Neely, "Super-Fast Delay Tradeoffs for Utility Optimal Fair Scheduling in Wireless
Networks," IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on
Nonlinear Optimization of Communication Systems, vol. 24, no. 8, pp. 1489-1501, Aug.
2006.

[20] L. Huang and M. J. Neely, "Delay Reduction via Lagrange Multipliers in Stochastic
Network Optimization," Proc. of 7th Intl. Symposium on Modeling and Optimization in
Mobile, Ad Hoc, and Wireless Networks (WiOpt), June 2009.



9.6.9 Linkages with Other Projects

This project has close ties to the other active project within CNARC as well as both CCRIs and
projects within the other centers.

Within the CNARC, this project takes output from Project C1: by understanding the models of OICC
and the factors that impact them, we will focus on those phenomena within C2.

We will work with the INARC on leveraging in-network storage; this collaboration will be
specifically on Task C2.2.

This project also requires output from EDIN. It will use the mobility models from EDIN, as well
as the models that relate the interactions of social, information, and communication networks in
terms of structural change.

9.6.10 Collaborations and Staff Rotations

9.6.11 Relation to DoD and Industry Research

Research Milestones

Due   Task     Description
Q2    Task 1   Derive metrics for QoI-aware networks with heterogeneous traffic.
Q2    Task 2   We will determine optimal replication strategies given a likelihood of network partitions.
Q2    Task 3   We will quantify the achievable performance of idealized algorithms that have certain levels of knowledge about the future, using metrics related to competitive ratios and/or T-slot lookahead.
Q3    Task 1   Determine conditions within which interference management is practical, the channel feedback needed from receivers to senders, and the attainable capacity gains in combination with the use of multi-radio nodes under limited feedback conditions.
Q3    Task 2   We will develop practical algorithms that handle arbitrary traffic, channels, and mobility, with quantifiable performance gap bounds measured with respect to the metrics of the Q2 Task 3 milestone.
Q4    Task 1   Compare the performance of interference management against that of distributed cooperative MIMO schemes, with emphasis on dynamic networks.
Q4    Task 2   Given mixes of information and node mobility, determine bounds on caching performance given QoI requirements.
Q4    Task 3   We will provide a mathematical foundation for delay analysis of the LIFO based backpressure rule, for simple classes of networks. This will improve our understanding of network delay, explain the 98% improvement observed in practice, and may suggest practical algorithms for dramatic delay improvement in more complex networks.

Budget By Organization

Organization     Government Funding ($)    Cost Share ($)
CUNY (CNARC)     98,000
PSU (CNARC)      256,930                   114,000
UCSC (CNARC)     110,000                   48,187
USC (CNARC)      147,000
TOTAL            611,930                   162,187



9.7 Project C3: Achieving QoI Optimal Networking (Deferred)

Project Lead: TBD;


Email: Phone:

Primary Research Staff Collaborators

9.7.1 Project Overview


In projects C1 and C2 we focus on modeling fundamental properties of networks and verifying
these models via experimentation. We alluded to several reasons why networks, in practice,
might not achieve the QoI predicted by the models developed: (i) inaccurate assumptions; (ii)
characteristics such as heterogeneity, uncertainty and mobility; (iii) interactions between
different phenomena that are not recognized a priori; (iv) processing limitations and (v) protocol
limitations. The first three items can be corrected by iterating between models and
experimentation. The last two are more problematic – they dictate what we can actually achieve
in current networks.

In this project we propose, based on the results of our modeling and experimentation, to analyze
the aspects of protocol structure that limit achieving the theoretical QoI.

Our approach is to undertake an iterative process whereby experimental observations drive the
modeling and protocol analysis. By iterating on this process when can determine how close to
the optimal QoI a network can achieve, how to do it, and gain more insight into the fundamental
models characterizing the limits of the network.

While we cannot know for sure the results of our modeling and experimentation at this time, our
intuition leads us to suspect that certain characteristics of protocols are likely to lead to
limitations. Foremost, because of mobility and uncertainty present in mobile tactical networks,
signaling overhead tends to become a dominant factor in performance degradation. Virtually all
protocols require the exchange of some information to operate; typically the more “optimal” the
algorithm the protocol attempts to implement, the more information, and hence signaling, it
requires.
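
A back-of-the-envelope calculation (all numbers invented for illustration) shows how quickly such
signaling can consume a tactical network's capacity when a proactive, flooding-based protocol is
pushed toward "optimal" global knowledge:

    # Rough control-overhead estimate for a proactive link-state style protocol in
    # which every node floods a topology update and each of the other nodes
    # rebroadcasts it once: roughly N*(N-1) control transmissions per update interval.
    # All parameter values are assumed for illustration.
    N = 60              # nodes in the tactical MANET
    updates_per_s = 1.0 # topology updates per node per second under high mobility
    pkt_bits = 1200     # bits per link-state packet
    channel_bps = 1e6   # shared channel capacity in bits per second

    control_bps = N * (N - 1) * updates_per_s * pkt_bits
    print(f"control load ~ {control_bps/1e6:.2f} Mb/s "
          f"= {100 * control_bps / channel_bps:.0f}% of a {channel_bps/1e6:.0f} Mb/s channel")

Under these assumed numbers the signaling alone exceeds the channel capacity, which is the sense in
which more "optimal" algorithms can become self-defeating in MANETs.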



9.7.2 Project Motivation
There is a vast amount of work on protocol design for wireless networks, and efforts in various
contexts have already been reported in published books (examples include [1], [2], and [3]).
However, much of the previous work on communication protocols for MANETs has its genesis
in protocols designed for wireline networks. These designs fail to change the fundamental structure
of the protocols to account for the new environment, and they are often limited by their overhead.
Challenges of Network-Centric Operations
Tactical wireless networks differ from wireline networks and today’s wireless meshes at home
and the office in many ways. A tactical wireless network is characterized by: (i) varying mobility
patterns of nodes, (ii) the heterogeneity of nodes and links, (iii) the inherent vulnerabilities and
characteristics of radio links, (iv) an untethered nature, (v) the scarcity of bandwidth, (vi) the
ability of nodes to cooperate with one another in the transmission and reception of information
packets, (vii) the ability of nodes to store and process information while they move, and (viii) the
importance of supporting multicast traffic. In addition, the transition of the U.S. military to
network-centric warfare imposes multiple constraints on the timeliness, secrecy, and reliability
of information dissemination throughout the field that are very different from those requirements
that exist in commercial wireless meshes.
Example Military Scenarios
This project improves OICC by exploiting new network paradigms. We apply these paradigms in
light of the needs of tactical networks and QoI. The project ultimately accounts for interactions
with information and social networks.

We consider all cases of the military network environment, including troop movement, information
gathering, and mission planning. We also consider all military communications environments –
multi-hop wireless, dynamic conditions, high mobility, heterogeneous nodes and heterogeneous
traffic.

Impact on Network Science


If successful, we will uncover new protocol mechanisms and approaches that directly relate to
OICC.

9.7.3 Key Project Research Questions


We seek to answer the following questions:
- What are the structural limitations that protocols impose on achieving QoI?
- How can these limitations be removed?

9.7.4 Initial Hypotheses


We expect to find that protocols are not poorly designed, i.e., they are not inefficient. Instead we
expect to find that in some cases they do not seek to control the proper network properties to
increase OICC.

9.7.5 Technical Approach


We will defer the start of this research until year 2 of the program.

References



[1] P. Mohapatra and S. Krishnamurthy, Ad Hoc Networks: Technologies and Protocols.
Springer, New York, 2004.

[2] A. Boukerche, Algorithms and Protocols for Mobile Ad Hoc Networks. Wiley Series on
Parallel and Distributed Computing, 2008.

[3] F. Anjum and P. Mouchtaris, Security for Wireless Ad Hoc Networks. John Wiley, 2007.



10 NS CTA Events Schedule (First 15 Months)

Oct 19: ARL briefing to IRC
Oct 19: Weekly TMG telecons begin
Oct 30: TMG and GM initial face-to-face
Nov 6: IRC visit to ARL (IRC leads visit to ALC)
Nov 16-18: NS CTA Technical Kickoff (includes first ECC meeting)
Dec 4: First IPP draft distributed (CTA review and comments on IPP, solidify collaboration)
Dec 17: CMC votes to approve submission of IPP to ARL
Dec 18: Submit CMC-approved IPP version 1.0 to ARL
Dec 24: IPP approved by ARL
Jan 2010: IPP research begins (authorization to proceed to researchers that will be funded)
Jan 15: Quarterly Report inputs due
Jan: ECC meeting at UDel (coordination of education, summer institutes, etc.)
Feb: Experimental plan review (combined face-to-face and telecon discussion at ALC)
Feb: IPP mod-1 submitted
Mar: NS Facility initial operation (first Consortium researchers arrive at NS Facility; note: some researchers may start in temporary Facility space as early as January)
Apr 15: Quarterly Report inputs due
Apr 19-21: RDECOM Technical Planning Workshop at ALC (leads and selected researchers attend; selected posters or demonstrations in collaboration with ARL researchers)
May: 2011 APP guidance distributed (get templates, themes, CCRIs, collaboration, and possible Army exercises and IRC experiment plans to APP authors; make available for researchers not currently engaged in NS CTA research)
Jun: Draft APP research task whitepapers due (iterate with ARL and Consortium leads to refine and harmonize)
Jun: IRC Facility ribbon cutting (formal opening and open house; date to be determined for availability of Mr. Miller and other invited guests)
Jun 15: Quarterly Report inputs due
Jun 16-18: USMA NS workshop (leadership, ARL, and selected researchers participate)
Jun-Jul: Summer NS Institute (initial kickoff of the series: focused interdisciplinary research program at the NS CTA Facility)
Jul: APP first draft (distribute to leadership)
Jul: Comments back to APP research task authors (iterate with ARL and Consortium leads for developing final versions of research tasks)
Jul: IRC Technology Transition Workshop (initial workshop will focus on determining readiness of particular cross-genre projects, tool concepts… for 6.2 transition and general technology transition beyond the NS CTA)
Aug: APP finalized, CMC vote to submit to ARL
Aug: Topic focus week at NS Facility
Aug/Sep: Ft. Dix C4ISR-OTM (possible researcher participation)
Sep: APP approval by ARL/RMB (announce approved 2011 research tasks to NS CTA)
Oct 15: Quarterly Report inputs due
Oct: Annual ARL program review (leaders present final first-year results to ARL/RMB)
Oct: Funding authorizations distributed (initial authorization to proceed followed by incremental funding letters)
Nov: RMB meeting (presentation of initial research results and discussion of Army and DoD needs)
Nov/Dec: Army Science Conference (even years; IRC coordinates selected researchers participating in Army ASC displays and demonstrations; presentations, posters, and demonstrations to showcase the NS CTA's research to a wide Army audience and invite engagement)
Nov/Dec: NS CTA Conference Planning (leads, ARL, RMB planning for 2011 NS CTA participation in the CTA Conference)



11 First Year Budget by Project and Organization

Table of Contents

11 First Year Budget by Project and Organization ..................................................... 11-1


11.1 Subtotals per Center ........................................................................................ 11-1
11.2 Subtotals per Project ....................................................................................... 11-6
11.3 Subtotals per Institution ................................................................................ 11-10
11.4 IRC Funding by Category (6.1/6.2) .............................................................. 11-15
11.5 Cost Share ..................................................................................................... 11-17

The following sections show different views of the budget. Section 11.1 shows the
Consortium’s budget sorted by Center; Section 11.2, by project; Section 11.3, by
institution. These first three sections are shown at the project level. Section 11.4 shows
the IRC’s budget, at task level, sorted by type of funding (6.1/6.2). Either 6.1 or 6.2
dollars fund each task in that section. (Note that some projects have both 6.1 and 6.2
tasks). Section 11.5 shows cost-share data. Within each view, the respective subtotals
are included.

11.1 Subtotals per Center

Category Center Project Institution Amount


6.1 CNARC C1 BBN $66,240
6.1 CNARC C1 CUNY $75,000
6.1 CNARC C1 PSU $274,766
6.1 CNARC C1 UCD $100,000
6.1 CNARC C1 UCR $193,000
6.1 CNARC C1 USC $274,014
6.1 CNARC C2 CUNY $98,000
6.1 CNARC C2 PSU $256,930
6.1 CNARC C2 UCSC $110,000
6.1 CNARC C2 USC $147,000
6.1 CNARC E1 CUNY $22,000
6.1 CNARC E2 BBN $44,000
6.1 CNARC E2 CUNY $24,614
6.1 CNARC E2 UCD $50,000
6.1 CNARC E2 UCSC $84,614
6.1 CNARC E3 PSU $30,000
6.1 CNARC E3 UCD $100,000

6.1 CNARC E3 UCSC $25,000
6.1 CNARC E4 CUNY $15,000
6.1 CNARC E4 PSU $55,589
6.1 CNARC E4 UCD $30,000
6.1 CNARC E4 USC $73,000
6.1 CNARC T1 NCState $110,240
6.1 CNARC T1 UCD $129,014
6.1 CNARC T1 USC $45,000
6.1 CNARC T2 Stanford $110,240
6.1 CNARC T2 UCD $110,000
6.1 CNARC T2 USC $10,000
6.1 CNARC T3 PSU $35,000
6.1 CNARC T3 UCD $30,000
6.1 CNARC T3 UCR $27,480
CNARC Total $2,755,741
6.1 INARC E1 UCSB $45,323
6.1 INARC E2 CMU $66,852
6.1 INARC E2 IBM $35,070
6.1 INARC E2 UCSB $129,389
6.1 INARC E3 CMU $20,891
6.1 INARC E3 NWU $69,601
6.1 INARC E3 UCSB $48,814
6.1 INARC E3 UIUC $91,961
6.1 INARC E3 UMich $76,253
6.1 INARC I1 CUNY $101,193
6.1 INARC I1 IBM $118,761
6.1 INARC I1 UCSB $111,873
6.1 INARC I1 UIUC $208,536
6.1 INARC I2 IBM $89,756
6.1 INARC I2 PARC $35,646
6.1 INARC I2 UCSB $274,239
6.1 INARC I2 UIUC $56,664
6.1 INARC I3 CMU $32,590
6.1 INARC I3 CUNY $36,664
6.1 INARC I3 IBM $110,227
6.1 INARC I3 UCSB $89,624
6.1 INARC I3 UIUC $269,912
6.1 INARC T1 CUNY $80,849
6.1 INARC T1 IBM $192,205
6.1 INARC T1 PARC $94,341
6.1 INARC T1 UCSB $37,604
6.1 INARC T1 UIUC $188,936

6.1 INARC T3 IBM $41,967
INARC Total $2,755,741
6.1 IRC E1 BBN $147,286
6.1 IRC E1 RPI $132,854
6.1 IRC E2 BBN $177,147
6.1 IRC E2 UCR $62,256
6.1 IRC E2 UMass $25,144
6.1 IRC E3 BBN $154,436
6.1 IRC E4 BBN $47,313
6.1 IRC EDUC UDEL $48,276
6.1 IRC R1 BBN $478,743
6.1 IRC R1 Harvard $79,915
6.1 IRC R1 NEU $77,693
6.1 IRC R1 UCR $124,511
6.1 IRC R1 UMass $50,288
6.1 IRC R1 UMich $93,559
6.1 IRC R1 UMinn $45,611
6.1 IRC R2 BBN $189,264
6.1 IRC R2 CMU $38,983
6.1 IRC R2 NWU $60,940
6.1 IRC R2 RPI $66,427
6.1 IRC R2 UDEL $51,494
6.1 IRC R3 Artistech $48,722
6.2 IRC R3 Artistech $133,773
6.1 IRC R3 BBN $112,995
6.2 IRC R3 BBN $363,013
6.1 IRC R3 NWU $32,814
6.2 IRC R3 UDEL $119,080
6.1 IRC R3 UIUC $75,920
6.1 IRC R3 UMinn $30,407
6.1 IRC R3 Williams $28,330
6.2 IRC R3 Williams $12,142
6.1 IRC R4 BBN $17,543
6.2 IRC R4 BBN $4,386
6.1 IRC R4 RPI $17,714
6.2 IRC R4 RPI $4,428
6.1 IRC R4 UMass $6,705
6.2 IRC R4 UMass $1,676
6.1 IRC R5 BBN $412,062
6.2 IRC R5 BBN $361,502
6.1 IRC T1 BBN $203,130
6.1 IRC T2 BBN $103,667

6.1 IRC T2 UCR $62,255
6.1 IRC T3 BBN $203,128
6.1 IRC T3 UCR $62,256
6.1 IRC T3 UDEL $102,989
IRC Total $4,672,777
6.1 SCNARC E1 RPI $15,250
6.1 SCNARC E3 CUNY $26,046
6.1 SCNARC E3 IBM $47,115
6.1 SCNARC E3 IU $99,000
6.1 SCNARC E3 MIT $28,017
6.1 SCNARC E3 ND $49,500
6.1 SCNARC E3 NEU $53,235
6.1 SCNARC E3 RPI $67,528
6.1 SCNARC E4 CUNY $45,580
6.1 SCNARC E4 MIT $28,017
6.1 SCNARC E4 NEU $114,714
6.1 SCNARC E4 RPI $18,385
6.1 SCNARC S1 CUNY $32,557
6.1 SCNARC S1 IBM $329,804
6.1 SCNARC S1 IU $11,000
6.1 SCNARC S1 MIT $14,009
6.1 SCNARC S1 MIT $40,000
6.1 SCNARC S1 ND $11,000
6.1 SCNARC S1 NEU $96,969
6.1 SCNARC S1 NWU $14,000
6.1 SCNARC S1 NYU $40,000
6.1 SCNARC S1 RPI $45,921
6.1 SCNARC S2 CUNY $26,046
6.1 SCNARC S2 IBM $47,115
6.1 SCNARC S2 ND $11,000
6.1 SCNARC S2 NEU $66,230
6.1 SCNARC S2 RPI $503,704
6.1 SCNARC S3 CUNY $65,115
6.1 SCNARC S3 RPI $221,138
6.1 SCNARC T1 CUNY $65,115
6.1 SCNARC T1 IBM $23,557
6.1 SCNARC T1 NWU $56,000
6.1 SCNARC T1 RPI $138,020
6.1 SCNARC T1 UMD $71,000
6.1 SCNARC T2 ND $38,500
6.1 SCNARC T2 RPI $171,997
6.1 SCNARC T3 IBM $23,557

SCNARC Total $2,755,741
Grand Total $12,940,000



11.2 Subtotals per Project

Category Center Project Institution Amount


6.1 CNARC C1 BBN $66,240
6.1 CNARC C1 CUNY $75,000
6.1 CNARC C1 PSU $274,766
6.1 CNARC C1 UCD $100,000
6.1 CNARC C1 UCR $193,000
6.1 CNARC C1 USC $274,014
C1 Total $983,020
6.1 CNARC C2 CUNY $98,000
6.1 CNARC C2 PSU $256,930
6.1 CNARC C2 UCSC $110,000
6.1 CNARC C2 USC $147,000
C2 Total $611,930
6.1 IRC E1 BBN $147,286
6.1 CNARC E1 CUNY $22,000
6.1 IRC E1 RPI $132,854
6.1 SCNARC E1 RPI $15,250
6.1 INARC E1 UCSB $45,323
E1 Total $362,713
6.1 CNARC E2 BBN $44,000
6.1 IRC E2 BBN $177,147
6.1 INARC E2 CMU $66,852
6.1 CNARC E2 CUNY $24,614
6.1 INARC E2 IBM $35,070
6.1 CNARC E2 UCD $50,000
6.1 IRC E2 UCR $62,256
6.1 INARC E2 UCSB $129,389
6.1 CNARC E2 UCSC $84,614
6.1 IRC E2 UMass $25,144
E2 Total $699,086
6.1 IRC E3 BBN $154,436
6.1 INARC E3 CMU $20,891
6.1 SCNARC E3 CUNY $26,046
6.1 SCNARC E3 IBM $47,115
6.1 SCNARC E3 IU $99,000
6.1 SCNARC E3 MIT $28,017
6.1 SCNARC E3 ND $49,500
6.1 SCNARC E3 NEU $53,235
6.1 INARC E3 NWU $69,601

6.1 CNARC E3 PSU $30,000
6.1 SCNARC E3 RPI $67,528
6.1 CNARC E3 UCD $100,000
6.1 INARC E3 UCSB $48,814
6.1 CNARC E3 UCSC $25,000
6.1 INARC E3 UIUC $91,961
6.1 INARC E3 UMich $76,253
E3 Total $987,397
6.1 IRC E4 BBN $47,313
6.1 CNARC E4 CUNY $15,000
6.1 SCNARC E4 CUNY $45,580
6.1 SCNARC E4 MIT $28,017
6.1 SCNARC E4 NEU $114,714
6.1 CNARC E4 PSU $55,589
6.1 SCNARC E4 RPI $18,385
6.1 CNARC E4 UCD $30,000
6.1 CNARC E4 USC $73,000
E4 Total $427,598
6.1 IRC EDUC UDEL $48,276
EDUC Total $48,276
6.1 INARC I1 CUNY $101,193
6.1 INARC I1 IBM $118,761
6.1 INARC I1 UCSB $111,873
6.1 INARC I1 UIUC $208,536
I1 Total $540,363
6.1 INARC I2 IBM $89,756
6.1 INARC I2 PARC $35,646
6.1 INARC I2 UCSB $274,239
6.1 INARC I2 UIUC $56,664
I2 Total $456,305
6.1 INARC I3 CMU $32,590
6.1 INARC I3 CUNY $36,664
6.1 INARC I3 IBM $110,227
6.1 INARC I3 UCSB $89,624
6.1 INARC I3 UIUC $269,912
I3 Total $539,017
6.1 IRC R1 BBN $478,743
6.1 IRC R1 Harvard $79,915
6.1 IRC R1 NEU $77,693
6.1 IRC R1 UCR $124,511
6.1 IRC R1 UMass $50,288
6.1 IRC R1 UMich $93,559

6.1 IRC R1 UMinn $45,611
R1 Total $950,320
6.1 IRC R2 BBN $189,264
6.1 IRC R2 CMU $38,983
6.1 IRC R2 NWU $60,940
6.1 IRC R2 RPI $66,427
6.1 IRC R2 UDEL $51,494
R2 Total $407,108
6.1 IRC R3 Artistech $48,722
6.1 IRC R3 BBN $112,995
6.1 IRC R3 NWU $32,814
6.1 IRC R3 UIUC $75,920
6.1 IRC R3 UMinn $30,407
6.1 IRC R3 Williams $28,330
6.2 IRC R3 Artistech $133,773
6.2 IRC R3 BBN $363,013
6.2 IRC R3 UDEL $119,080
6.2 IRC R3 Williams $12,142
R3 Total $957,196
6.1 IRC R4 BBN $17,543
6.1 IRC R4 RPI $17,714
6.1 IRC R4 UMass $6,705
6.2 IRC R4 BBN $4,386
6.2 IRC R4 RPI $4,428
6.2 IRC R4 UMass $1,676
R4 Total $52,452
6.1 IRC R5 BBN $412,062
6.2 IRC R5 BBN $361,502
R5 Total $773,564
6.1 SCNARC S1 CUNY $32,557
6.1 SCNARC S1 IBM $329,804
6.1 SCNARC S1 IU $11,000
6.1 SCNARC S1 MIT $14,009
6.1 SCNARC S1 MIT $40,000
6.1 SCNARC S1 ND $11,000
6.1 SCNARC S1 NEU $96,969
6.1 SCNARC S1 NWU $14,000
6.1 SCNARC S1 NYU $40,000
6.1 SCNARC S1 RPI $45,921
S1 Total $635,260
6.1 SCNARC S2 CUNY $26,046
6.1 SCNARC S2 IBM $47,115

6.1 SCNARC S2 ND $11,000
6.1 SCNARC S2 NEU $66,230
6.1 SCNARC S2 RPI $503,704
S2 Total $654,095
6.1 SCNARC S3 CUNY $65,115
6.1 SCNARC S3 RPI $221,138
S3 Total $286,253
6.1 IRC T1 BBN $203,130
6.1 INARC T1 CUNY $80,849
6.1 SCNARC T1 CUNY $65,115
6.1 INARC T1 IBM $192,205
6.1 SCNARC T1 IBM $23,557
6.1 CNARC T1 NCState $110,240
6.1 SCNARC T1 NWU $56,000
6.1 INARC T1 PARC $94,341
6.1 SCNARC T1 RPI $138,020
6.1 CNARC T1 UCD $129,014
6.1 INARC T1 UCSB $37,604
6.1 INARC T1 UIUC $188,936
6.1 SCNARC T1 UMD $71,000
6.1 CNARC T1 USC $45,000
T1 Total $1,435,011
6.1 IRC T2 BBN $103,667
6.1 SCNARC T2 ND $38,500
6.1 SCNARC T2 RPI $171,997
6.1 CNARC T2 Stanford $110,240
6.1 CNARC T2 UCD $110,000
6.1 IRC T2 UCR $62,255
6.1 CNARC T2 USC $10,000
T2 Total $606,659
6.1 IRC T3 BBN $203,128
6.1 INARC T3 IBM $41,967
6.1 SCNARC T3 IBM $23,557
6.1 CNARC T3 PSU $35,000
6.1 CNARC T3 UCD $30,000
6.1 CNARC T3 UCR $27,480
6.1 IRC T3 UCR $62,256
6.1 IRC T3 UDEL $102,989
T3 Total $526,377
Grand Total $12,940,000



11.3 Subtotals per Institution

Category Center Project Institution Amount


6.1 IRC R3 Artistech $48,722
6.2 IRC R3 Artistech $133,773
Artistech Total $182,495
6.1 CNARC C1 BBN $66,240
6.1 CNARC E2 BBN $44,000
6.1 IRC E1 BBN $147,286
6.1 IRC E2 BBN $177,147
6.1 IRC E3 BBN $154,436
6.1 IRC E4 BBN $47,313
6.1 IRC R1 BBN $478,743
6.1 IRC R2 BBN $189,264
6.1 IRC R3 BBN $112,995
6.2 IRC R3 BBN $363,013
6.1 IRC R4 BBN $17,543
6.2 IRC R4 BBN $4,386
6.1 IRC R5 BBN $412,062
6.2 IRC R5 BBN $361,502
6.1 IRC T1 BBN $203,130
6.1 IRC T2 BBN $103,667
6.1 IRC T3 BBN $203,128
BBN Total $3,085,855
6.1 INARC E2 CMU $66,852
6.1 INARC E3 CMU $20,891
6.1 INARC I3 CMU $32,590
6.1 IRC R2 CMU $38,983
CMU Total $159,316
6.1 CNARC C1 CUNY $75,000
6.1 CNARC C2 CUNY $98,000
6.1 CNARC E1 CUNY $22,000
6.1 CNARC E2 CUNY $24,614
6.1 CNARC E4 CUNY $15,000
6.1 INARC I1 CUNY $101,193
6.1 INARC I3 CUNY $36,664
6.1 INARC T1 CUNY $80,849
6.1 SCNARC E3 CUNY $26,046
6.1 SCNARC E4 CUNY $45,580
6.1 SCNARC S1 CUNY $32,557
6.1 SCNARC S2 CUNY $26,046

6.1 SCNARC S3 CUNY $65,115
6.1 SCNARC T1 CUNY $65,115
CUNY Total $713,779
6.1 IRC R1 Harvard $79,915
Harvard Total $79,915
6.1 INARC E2 IBM $35,070
6.1 INARC I1 IBM $118,761
6.1 INARC I2 IBM $89,756
6.1 INARC I3 IBM $110,227
6.1 INARC T1 IBM $192,205
6.1 INARC T3 IBM $41,967
6.1 SCNARC E3 IBM $47,115
6.1 SCNARC S1 IBM $329,804
6.1 SCNARC S2 IBM $47,115
6.1 SCNARC T1 IBM $23,557
6.1 SCNARC T3 IBM $23,557
IBM Total $1,059,134
6.1 SCNARC E3 IU $99,000
6.1 SCNARC S1 IU $11,000
IU Total $110,000
6.1 SCNARC E3 MIT $28,017
6.1 SCNARC E4 MIT $28,017
6.1 SCNARC S1 MIT $14,009
6.1 SCNARC S1 MIT $40,000
MIT Total $110,043
6.1 CNARC T1 NCState $110,240
NCState Total $110,240
6.1 SCNARC E3 ND $49,500
6.1 SCNARC S1 ND $11,000
6.1 SCNARC S2 ND $11,000
6.1 SCNARC T2 ND $38,500
ND Total $110,000
6.1 IRC R1 NEU $77,693
6.1 SCNARC E3 NEU $53,235
6.1 SCNARC E4 NEU $114,714
6.1 SCNARC S1 NEU $96,969
6.1 SCNARC S2 NEU $66,230
NEU Total $408,841
6.1 INARC E3 NWU $69,601
6.1 IRC R2 NWU $60,940

NS-CTA IPP v1.4 11-11 March 17, 2010


Category Center Project Institution Amount
6.1 IRC R3 NWU $32,814
6.1 SCNARC S1 NWU $14,000
6.1 SCNARC T1 NWU $56,000
NWU Total $233,355
6.1 SCNARC S1 NYU $40,000
NYU Total $40,000
6.1 INARC I2 PARC $35,646
6.1 INARC T1 PARC $94,341
PARC Total $129,987
6.1 CNARC C1 PSU $274,766
6.1 CNARC C2 PSU $256,930
6.1 CNARC E3 PSU $30,000
6.1 CNARC E4 PSU $55,589
6.1 CNARC T3 PSU $35,000
PSU Total $652,285
6.1 IRC E1 RPI $132,854
6.1 IRC R2 RPI $66,427
6.1 IRC R4 RPI $17,714
6.2 IRC R4 RPI $4,428
6.1 SCNARC E1 RPI $15,250
6.1 SCNARC E3 RPI $67,528
6.1 SCNARC E4 RPI $18,385
6.1 SCNARC S1 RPI $45,921
6.1 SCNARC S2 RPI $503,704
6.1 SCNARC S3 RPI $221,138
6.1 SCNARC T1 RPI $138,020
6.1 SCNARC T2 RPI $171,997
RPI Total $1,403,366
6.1 CNARC T2 Stanford $110,240
Stanford Total $110,240
6.1 CNARC C1 UCD $100,000
6.1 CNARC E2 UCD $50,000
6.1 CNARC E3 UCD $100,000
6.1 CNARC E4 UCD $30,000
6.1 CNARC T1 UCD $129,014
6.1 CNARC T2 UCD $110,000
6.1 CNARC T3 UCD $30,000
UCD Total $549,014
6.1 CNARC C1 UCR $193,000
6.1 CNARC T3 UCR $27,480
6.1 IRC E2 UCR $62,256

NS-CTA IPP v1.4 11-12 March 17, 2010


Category Center Project Institution Amount
6.1 IRC R1 UCR $124,511
6.1 IRC T2 UCR $62,255
6.1 IRC T3 UCR $62,256
UCR Total $531,758
6.1 INARC E1 UCSB $45,323
6.1 INARC E2 UCSB $129,389
6.1 INARC E3 UCSB $48,814
6.1 INARC I1 UCSB $111,873
6.1 INARC I2 UCSB $274,239
6.1 INARC I3 UCSB $89,624
6.1 INARC T1 UCSB $37,604
UCSB Total $736,866
6.1 CNARC C2 UCSC $110,000
6.1 CNARC E2 UCSC $84,614
6.1 CNARC E3 UCSC $25,000
UCSC Total $219,614
6.1 IRC EDUC UDEL $48,276
6.1 IRC R2 UDEL $51,494
6.2 IRC R3 UDEL $119,080
6.1 IRC T3 UDEL $102,989
UDEL Total $321,839
6.1 INARC E3 UIUC $91,961
6.1 INARC I1 UIUC $208,536
6.1 INARC I2 UIUC $56,664
6.1 INARC I3 UIUC $269,912
6.1 INARC T1 UIUC $188,936
6.1 IRC R3 UIUC $75,920
UIUC Total $891,929
6.1 IRC E2 UMass $25,144
6.1 IRC R1 UMass $50,288
6.1 IRC R4 UMass $6,705
6.2 IRC R4 UMass $1,676
UMass Total $83,813
6.1 SCNARC T1 UMD $71,000
UMD Total $71,000
6.1 INARC E3 UMich $76,253
6.1 IRC R1 UMich $93,559
UMich Total $169,812
6.1 IRC R1 UMinn $45,611
6.1 IRC R3 UMinn $30,407
UMinn Total $76,018
6.1 CNARC C1 USC $274,014

6.1 CNARC C2 USC $147,000
6.1 CNARC E4 USC $73,000
6.1 CNARC T1 USC $45,000
6.1 CNARC T2 USC $10,000
USC Total $549,014
6.1 IRC R3 Williams $28,330
6.2 IRC R3 Williams $12,142
Williams Total $40,472
Grand Total $12,940,000



11.4 IRC Funding by Category (6.1/6.2)

Note that this table is shown at the task level; any given task is funded by either 6.1 or 6.2 dollars.

Category Center Project Task Institution Amount


6.1 IRC E1 E1.1 BBN $92,286
6.1 IRC E1 E1.1 RPI $132,854
6.1 IRC E1 E1.2 BBN $55,000
6.1 IRC E2 E2.1 BBN $59,049
6.1 IRC E2 E2.2 BBN $59,049
6.1 IRC E2 E2.3 BBN $59,049
6.1 IRC E2 E2.3 UCR $62,256
6.1 IRC E2 E2.3 UMass $25,144
6.1 IRC E3 E3.1 BBN $154,436
6.1 IRC E4 E4.2 BBN $47,313
6.1 IRC EDUC EDUC.1 UDEL $48,276
6.1 IRC R1 R1.1 BBN $380,418
6.1 IRC R1 R1.1 UCR $124,511
6.1 IRC R1 R1.1 UMass $50,288
6.1 IRC R1 R1.1 UMinn $45,611
6.1 IRC R1 R1.2 BBN $46,780
6.1 IRC R1 R1.2 Harvard $79,915
6.1 IRC R1 R1.2 UMich $46,779
6.1 IRC R1 R1.3 BBN $98,325
6.1 IRC R1 R1.3 NEU $77,693
6.1 IRC R2 R2.1 BBN $189,264
6.1 IRC R2 R2.1 NWU $60,940
6.1 IRC R2 R2.1 RPI $66,427
6.1 IRC R2 R2.1 UDEL $51,494
6.1 IRC R2 R2.2 CMU $38,983
6.1 IRC R3 R3.2 Artistech $48,722
6.1 IRC R3 R3.2 BBN $112,995
6.1 IRC R3 R3.2 NWU $32,814
6.1 IRC R3 R3.2 UIUC $75,920
6.1 IRC R3 R3.2 UMinn $30,407
6.1 IRC R3 R3.2 Williams $28,330
6.1 IRC R4 R4.1 BBN $17,543
6.1 IRC R4 R4.1 RPI $17,714
6.1 IRC R4 R4.1 UMass $6,705
6.1 IRC R5 R5.1 BBN $412,062
6.1 IRC T1 T1.1 BBN $203,130
6.1 IRC T2 T2.2 BBN $103,667
6.1 IRC T2 T2.2 UCR $62,255
6.1 IRC T3 T3.1 BBN $203,128
6.1 IRC T3 T3.1 UCR $62,256
6.1 IRC T3 T3.1 UDEL $102,989
6.1 Total $3,672,777
6.2 IRC R3 R3.1 Artistech $65,498

6.2 IRC R3 R3.1 BBN $189,345
6.2 IRC R3 R3.3 Artistech $68,275
6.2 IRC R3 R3.3 BBN $173,668
6.2 IRC R3 R3.3 Williams $12,142
6.2 IRC R3 R3.4 UDEL $119,080
6.2 IRC R4 R4.2 BBN $4,386
6.2 IRC R4 R4.2 RPI $4,428
6.2 IRC R4 R4.2 UMass $1,676
6.2 IRC R5 R5.2 BBN $361,502
6.2 Total $1,000,000
Grand Total $4,672,777



11.5 Cost Share
Category Center Project Institution Amount
6.1 CNARC C1 PSU 114,000
6.1 CNARC C2 PSU 114,000
6.1 CNARC E4 PSU 20,000
PSU Total 248,000
6.1 SCNARC E1 RPI 2,784
6.1 SCNARC E3 RPI 12,327
6.1 SCNARC E4 RPI 3,356
6.1 SCNARC S1 RPI 8,383
6.1 SCNARC S2 RPI 91,954
6.1 SCNARC S3 RPI 40,370
6.1 SCNARC T1 RPI 25,196
6.1 SCNARC T2 RPI 31,399
RPI Total 215,769
6.1 CNARC T2 UCD 14,500
UCD Total 14,500
6.1 CNARC C2 UCSC 48,187
6.1 CNARC E2 UCSC 48,000
UCSC Total 96,187
6.1 IRC EDUC UDEL 20,260
UDEL Total 20,260
Grand Total 594,716



12 Five Year Roadmap
The central goal of the NS CTA program is to create:

“A common mathematical language to describe the behavior of communications, information, and
social/cognitive networks, as well as their interactions and interpenetrations” to “enable joint
design of these systems to optimize mission-derived metrics.”

At the end of five years, the NS CTA must substantially succeed at achieving this goal. This
chapter presents a high-level roadmap of how the NS CTA will pursue this goal over the next
five years.

12.1 Outline of Each Year

This chapter is structured as a series of sketches for each of the next five years. Each year begins
with a goal for the year followed by a brief discussion of how we expect to achieve that goal.
The discussion necessarily becomes more speculative in later years but should always convey the
essential research work of the year.

The sketch highlights a key task or project in each of the two major cross-cutting research
initiatives: Evolving Dynamic Integrated (Composite) Networks (EDIN) and Trust in Distributed
Decision Making (“Trust” for short).

The sketch then highlights some key tasks and projects in the year. These tasks and projects are
divided into three broad groupings:
1. Enabling efforts are tasks or projects whose results in the current year will be important
inputs or underpinnings to efforts in the following years. Substantial progress on these
efforts is critical to success in later years.
2. Enriching efforts are tasks or projects that seek to take research results and give them
greater depth. Examples include applying research results to a new domain (e.g., a
different type of network) or combining and expanding research results to create a richer
ability to model or predict. Enriching efforts expand the power of the NS CTA’s
research results.
3. Expeditions are tasks or projects that seek to get ahead of the core research. An example
expedition might seek to examine merged information and cognitive networks by
emulating two merged networks that we are not yet able to accurately model, and then using the
results of that emulation to inform research on models. Expeditions are vital to spur
progress by providing insight into to-be-solved problems.



12.2 Year 1

The goal for Year 1 is to begin a robust research program and to create infrastructure that will
be used to begin validating research results as soon as they start becoming available (no later
than Year 2).

EDIN: Task E1.1 (Harmonized Vocabulary and Ontology) is a critical task for Year 1: it will
make initial progress on ontologies so that all the projects in EDIN can use the same terms
to mean the same concepts, and can map research insights derived in one arena more readily
into other research areas. Project E2 will investigate a broad range of mathematical models to
characterize dynamic composite networks.

Trust: A critical task in this year is Task T1.1 (Unified models and metrics of Trust). This task
will create a firm foundation both for the Trust CCRI per se and for many other NS CTA
areas of focus; for example, by providing crucial trust research input into the many issues
surrounding QoI (Quality of Information).

Enabling Efforts: Broadly speaking, the CCRI Task E1.1 highlighted above will enable more
than just EDIN, but will provide the intellectual framework for collaboration and synergy
across the CTA. Another key dimension of enabling efforts is Project R3 (Experimentation
with Composite Networks), which is focused on creating the infrastructure for validation
and for shared experimentation, as well as creating the fundamental scientific understanding
required in order to effectively specify, measure, and validate NS CTA 6.1 and 6.2 research.

Enriching Efforts: Among many examples, Task S2.3 (Community formation and dissolution in
social networks) provides a strong example of how NS CTA research goes beyond
conventional detection and analysis of social networks to address the counterbalancing
question of the dissolution of communities in social networks – a process that inherently
involves understanding the information and communications networks sustaining these
communities. Similarly, the three tasks in Project C2 (Characterizing the Increase of QoI
due to Networking Paradigms) each investigate specific elements of a suite of network
paradigms (concurrency, in-network storage, and scheduling) that together span the major
options for network impact on QoI.

Expeditions: Two notable examples of “expeditions” in Year 1 are highlighted here. Task I1.3
(Modeling Uncertainty in Heterogeneous Information Network Sources) studies the
uncertainty that results from the fusion of different kinds of networked data: this work will
provide new insight into the downside of combining classes of information, whose
uncertainties ultimately impact what can be achieved by, for example, network metrics. If
the approach is validated by this research, it will suggest further extensions to a greater
variety of information across all genres of networks. In the same exploratory spirit, the
research in Task R2.2 (Impact of Information Loss and Error on Social/Cognitive Networks)
provides an early investigation of one of the key drivers of cross-genre network interactions:
measuring, reasoning about, and forecasting the impact of information loss in
communication networks and information error in information networks on the structure and
performance of socio-cognitive networks.



12.3 Year 2

The goal for Year 2 is to begin pairwise evaluation of merged network models (e.g. social
networks with communications networks, cognitive networks with information networks) both
from a theoretical perspective and in experiments in the experimental infrastructure.

EDIN: In addition to continuing various research threads, we anticipate utilizing the initial
research in mathematical models (E2.1, E2.2, E2.3), and both human and metric-driven
mobility modeling (E4.1 and E4.2 respectively) for developing a better understanding of the
dynamic behaviors of composite networks in project E3 (both short-term and longer-term
evolution).

Trust: We currently anticipate that research in trust propagation, including the research begun in
tasks T1.3 (Cognitive Models of Trust in Human-Information, Human-Human, Human-
Agent Interactions), T2.1 (Interaction of trust with the network under models of trust as a
risk management mechanism), and T3.1 (Distributed Oracles for Trust), will enable fruitful
research attacks on the propagation of trust properties across the boundaries between two
genres of networks (such as loss of trust at the information level affecting trust at the social
network level).

Enabling Efforts: Tasks R3.1 (Shared Environment for Experimentation in Composite


Networks), R3.2 (Basic Research to Enable Experimentation in Composite Networks), and
R3.3 (Applied Experimentation in Composite Networks) will have advanced by the end of
Year 1 sufficiently to expand from preliminary experimentation on selected near-term
research results to becoming fully active environments for research experimentation and
validation.

Enriching Efforts: Project C3 (Achieving QoI Optimal Networking) will extend the exploitation
of QoI to the analysis of optimal protocols for achieving QoI in composite networks, where
both the underlying communications networks and driving social networks are subject to
rapid evolution.

Expeditions: Task R1.2 (Advanced Mathematical Models) has the potential to open up new
conceptual approaches to controlling co-evolving composite networks using economically-
principled techniques, such as perturbing the decision problems of network actors to change
behaviors in useful ways, and bringing economic control paradigms to bear with passive
methods that do not insist on active elicitation of the preferences of participants. If this
economically-principled exploration proves fruitful in Year 1, we envision its Year-1
approaches informing and ultimately merging with EDIN. In this case, in Year 2, this task
may extend its exploration to investigate market-design approaches to intermediate resource
allocation across different network genres and across competing as well as cooperating
networks.



12.4 Year 3

The twin goals for Year 3 are (1) to begin evaluation of complete merged network models (e.g.
social networks, communications networks, cognitive networks and information networks) both
from a theoretical perspective and in experiments in the experimental infrastructure and (2) to
complete the initial trust model such that it is ready for extensive validation in year 4.

EDIN: TBD

Trust: TBD

Enabling Efforts: TBD

Enriching Efforts: TBD

Expeditions: TBD

12.5 Year 4

The twin goals for Year 4 are to consolidate the results of year 3 and to demonstrate the new
trust model.

If Year 3's work goes according to plan, we should have a body of verification results that
highlight the strengths and limitations of our combined models. Furthermore, we will
have completed work on ontologies and metrics and, as a community, will be consistently using the
same terminology. That makes Year 4 the right time to collectively re-examine our work in the
first part of the year and drive it to new heights in the second part of the year.

Similarly, after three years of hard work, the trust model should be mature enough for extensive
testing and demonstration.

EDIN: TBD

Trust: TBD

Enabling Efforts: TBD

Enriching Efforts: By Year 4, we envision that the extensive research on Quality of Information
(QoI) will enable QoI to be adapted to a wide range of network genres and research efforts,
providing a unifying theme to many previously disparate research issues (such as design
techniques for optimized composite networks).

Expeditions: TBD



12.6 Year 5

The twin goals for Year 5 are to demonstrate our ability to use complete merged network models
to optimize networks for a simple military mission with simple mission metrics and to
demonstrate the use of network models over a broader domain of problems.

EDIN: TBD

Trust: TBD

Enabling Efforts: TBD

Enriching Efforts: TBD

Expeditions: TBD
