Network Science
Collaborative Technology Alliance
A detailed Table of Contents is provided at the start of each of Sections 2 through 9 and 11.
The Alliance will conduct interdisciplinary research in network science and transition the results
of this research to benefit network-centric military operations. Network science is the study of
the properties, models, and theories that apply to all varieties of networks, and the use of this
understanding in the analysis, prediction, design, and control of all varieties of networks. The NS
CTA research program exploits intellectual synergies across network science by uniting parallel
fundamental (6.1) and applied (6.2) research across the disciplines of social/cognitive,
information, and communications network research. It will drive the synergistic combination of
these technical areas for network-centric warfare and network-enabling capabilities in support of
all missions required of today's military forces, including humanitarian support, peacekeeping,
and full combat operations in any kind of terrain, but especially in complex and urban terrain. It
will also support and stimulate dual-use applications of this research and technology to benefit
commercial use. As a critical element of this program, the Alliance is creating a sustainable
world-class network science research facility, with critical mass in the NS CTA Facility in
Cambridge, MA, as well as shared distributed experimental resources throughout the Alliance.
The NS CTA also serves the Army’s NCW needs through an Education Component, which acts
to increase the pool of network science expertise in the Army and the nation, while bringing
greater awareness of Army needs into the academic and industrial research community.
This document presents the Initial Program Plan (IPP) for the Alliance, which describes the
projects and technical activities to be undertaken for the first year of its existence, Sep 28, 2009
through Sep 27, 2010. The current document incorporates all changes for the first modification
of the IPP (mod-1). The research is structured into six areas: two Cross-Cutting Research Issues
(CCRIs) that entail close intellectual integration across all four Centers, and the non-CCRI
research conducted at each Center. An essential aspect of all research (both CCRI and non-
CCRI) conducted within and between the Centers is that it addresses critical NCW technical
challenges in the context of composite networks of networks: not simply multi-technology
environments, but environments where all genres of network (social/cognitive, information, and
communications) inherently interact. The two CCRIs are “Trust in Distributed Decision Making”
and “Evolving Dynamic Integrated (Composite) Networks” (EDIN).
Table of Contents
2. Alliance Overview ................................................................................................................ 2-1
2.1 The nature of the Alliance............................................................................................... 2-1
2.2 Oversight of the NS CTA Program ................................................................................. 2-2
2.3 Interdisciplinary Research Center (IRC) ........................................................................ 2-3
2.3.1 Principal and General Member institutions ............................................................. 2-3
2.3.2 Responsibilities ........................................................................................................ 2-3
2.3.3 Key personnel and their roles .................................................................................. 2-4
2.4 Social/Cognitive Networks Academic Research Center (SCNARC) ............................. 2-5
2.4.1 Principal and General Member institutions ............................................................. 2-5
2.4.2 Research focus ......................................................................................................... 2-5
2.4.3 Key personnel and their roles .................................................................................. 2-5
2.5 Information Networks Academic Research Center (INARC) ........................................ 2-6
2.5.1 Principal and General Member institutions ............................................................. 2-6
2.5.2 Research focus ......................................................................................................... 2-6
2.5.3 Key personnel and their roles .................................................................................. 2-6
2.6 Communications Networks Academic Research Center (CNARC) ............................... 2-7
2.6.1 Principal and General Member institutions ............................................................. 2-7
2.6.2 Research focus ......................................................................................................... 2-7
2.6.3 Key personnel and their roles .................................................................................. 2-7
2.7 Alliance summary ........................................................................................................... 2-8
The essence of the Alliance is threefold: close intellectual and programmatic collaboration,
interdisciplinary network science research, and parallel mutually-reinforcing basic (6.1) and
applied (6.2) research activities. The driving vision of this unified network science approach is
to capture the fundamental underlying commonalities across social/cognitive, information, and
communications networks, and to exploit this understanding in support of network-centric
operations. The result of our research is both to advance understanding and control of composite
systems embracing all these intensely interacting elements, and also to cross-fertilize insights among the three genres of network research.
In support of this ambitious program of fundamental research, the Consortium has an NS CTA
Facility in Cambridge, MA, with a significant stable core of advanced research staff from across
the Alliance, working together with rotational and shorter-term researchers. The Facility
provides the Alliance with a central point for close interaction, as well as support for distributed
multi-user experimentation and distributed interactions across the Alliance.
A key aspect of the program is its Education Component: the Centers, working with ARL and other
government researchers, are explicitly charged with expanding the scope and relevance of the
entire network science community. The Education Component will enhance the access of Army
researchers to the latest academic and industrial research, expand the pool of network science
experts by promoting network science educational programs, and enhance the awareness of
Army technical challenges throughout the network science community.
In addition to the Cooperative Agreement shared by ARL and all the Principal and General
Members, the IRC also holds a separate contract vehicle with ARL for the IRC’s Technology
Transition Component, which will support transition into specific DoD and commercial
programs. The IRC is charged with aggressively promoting transition of research products and
technologies from 6.1 and 6.2 research into direct benefit to Army programs and further
government and commercial use. This emphasis on practical impact, in turn, feeds back new
challenges into the guidance of existing and future Alliance research.
The NS CTA is guided throughout its research and transition activities by a highly collaborative
structure for oversight; key elements include:
Collaborative Alliance Manager (CAM). Overall technical management and fiscal
responsibility for the NS CTA resides in Dr. Alexander Kott of ARL, the designated NS CTA
CAM. Dr. Kott is assisted by six Government Area Technical Leads (GATLs): Dr. Ananthram
Swami (IRC), Dr. Lance Kaplan (INARC), Dr. Jeff Hansberger (SCNARC), Mr. Greg
Cirincione (CNARC), Mr. David Dent (EDIN CCRI), and Dr. Jerry Powell (Trust CCRI).
Deputy CAM. The CAM is assisted by Dr. Robert Cole of CERDEC, the designated Deputy
CAM for the NS CTA. Dr. Cole’s focus is on experimentation and technology transition.
Program Director. The NS CTA Program Director, Dr. Will Leland of BBN, is the
Consortium’s technical representative charged with the Consortium’s overall responsibility for
the network science basic research, coordination of research results, and management of the
cooperative agreement. Dr. Leland is assisted by Dr. David Sincoskie of the University of
Delaware, the Associate Program Director.
The following four subsections summarize the composition and focus of each of the Centers.
2.3.2 Responsibilities
The IRC is focused on performing interdisciplinary research (both basic and applied) that spans
the interests in the Alliance and leads to cross-cutting insights into network science and
innovative technologies. The IRC transitions basic research from across the Consortium (its own
and the ARCs’) into applied research, and, through the Technology Transition Component,
promotes the rapid transition of technology to meet the specific needs of a network-centric
Army. The IRC leads the Education Component as a cooperative program across all four
Centers. The IRC also operates the NS CTA Facility, which supports and coordinates distributed
collaborative research and experimentation.
To perform this work the CNARC will work closely with the other centers to fully leverage the
existence of underlying social and information networks. These networks will impact the
configuration of network flows and the mobility of nodes. The relative importance of
information and the need for multiple pieces of information to make decisions will impact how
information is transferred across a network and which security properties are applied.
The CNARC team has experts in network modeling, security, optimization, systems,
protocol analysis, and experimental evaluation. Key personnel include:
Thomas La Porta (PSU, Center Director) – expertise in wireless and mobile networks, security
and protocol analysis;
J.J. Garcia-Luna-Aceves (UC Santa Cruz) – expertise in formal analysis of network structure,
scaling laws, wireless networks;
Ramesh Govindan (USC) – expertise in wireless networks, networked applications and quality of
information;
The ARL Network Science CTA performs foundational, cross-cutting research on network
science to achieve: a fundamental understanding of the interplay and common underlying
science among social/cognitive, information, and communications networks; determination of
how processes and parameters in one of these networks affect and are affected by those in other
networks; and prediction and control of the individual and composite behavior of these complex
interacting networks.
The Alliance provides a unique, and uniquely powerful, program of fundamental research that
addresses the critical technical and scientific needs of the Army in network-centric operations.
The most critical and Army-relevant phenomena emerge in the intertwining of social,
information, and communications networks, not in any one of them individually. These phenomena have had dramatic
impact on both conventional and irregular warfare, as well as humanitarian and peace-keeping
missions. As the first and only major Army research program to study the mutual
interdependencies and relations of these three dissimilar (and most influential) genres of networks, it is positioned
to achieve significant advancement in scientific understanding and practical impact. Its intensely
collaborative organization as four Centers working closely with ARL and other government
researchers serves to balance (a) in-depth expertise in each network genre; (b) relations,
dependencies, and mutual influences of three networking genres; and (c) sustained focus on the
Army’s technical challenges in network-centric warfare and operations.
The result of the NS CTA’s research, and its parallel technology transition, will be optimized
human performance in network-enabled warfare and greatly enhanced speed and precision for
complex military operations.
Table of Contents
3. Research Overview ............................................................................................................... 3-1
3.1 Military need .................................................................................................................... 3-1
3.2 Defining the IPP Research Plan ....................................................................................... 3-4
3.2.1 From military needs to research tasks ........................................................................ 3-4
3.2.2 The fundamental network science research questions ................................................ 3-5
3.3 Roadmap to the rest of Section 3...................................................................................... 3-5
3.4 Trust in Distributed Decision-Making CCRI ................................................................... 3-6
3.5 Evolving Dynamic Integrated (Composite) Networks (EDIN) CCRI ............................. 3-9
3.6 IRC non-CCRI research ................................................................................................. 3-12
3.7 INARC non-CCRI research............................................................................................ 3-15
3.8 SCNARC non-CCRI research ........................................................................................ 3-18
3.8.1 Research Challenges: ................................................................................................ 3-18
3.8.2 Research Approach ................................................................................................... 3-19
3.9 CNARC non-CCRI research .......................................................................................... 3-21
The Army faces a world in which every mission is profoundly influenced by the interactions
of complex, rapidly evolving systems of composite social/cognitive, information, and
communications networks. Warfighter performance and mission effectiveness are critically
affected by our ability to understand, design, predict, and ultimately control the structure and
behavior of this vast composite system. To bring these general observations down to concrete
cases, we illustrate here the network science-related technical challenges faced by the modern
Army with two examples, and present some of the key research questions that directly arise from
these examples.
Consider Figure 3.1. All decision-makers, at every level of command, follow the Battle
Command framework depicted. The information needs of the squad in the street (bottom center
of Figure 3.1) differ from those of the battalion that sent them there (upper center).
Staff interactions impact decision making at the battalion command post (upper right), as do their
interactions with the deployed company and the formal and informal command structures found
there (lower right). Communication exchanges throughout the battlespace are dependent on the
type of communication resources available (left).
This challenging environment directly defines key network science research questions that will
be addressed in the NS CTA: for example, what is the most effective (social) organizational
structure given the realities of battlefield information and communications networking? What is
the most effective information network structure given the realities of social structures and
communications networks? What is the impact of errors and losses in communications and information networks on the effectiveness of social and organizational structures?
Counter-insurgency (COIN). Nowhere is the need for network science research greater than in
the support of Stability Operations, where the threat uses asymmetric warfare tactics, discovering
and exploiting weaknesses in our networks and network interactions for their own gains.
Consider Figure 3.2. The insurgent fights a political battle, most of it outside the traditional
focus of tactical intelligence collection and analysis methods and systems.
US forces have adopted the “Cultural Intelligence Preparation of the Battlefield” (lower left
corner of Figure 3.2) which explains what we want to know but not how we are going to learn it.
Consider the upper right hand corner of the figure. There are three different explanations of Iraqi
tribal organization: not only do we have difficulty establishing who the leaders are within the
structure, we also have challenges identifying the structure itself and its interactions with social
and government networks. In short, we are trying to identify all the networks in play, the
individuals (or elements) present in those networks, their relative importance vis-à-vis their
primary, secondary, and tertiary networks, and at what points these networks intersect. In
parallel, we are analyzing our own networks to ensure their interactions are organized to have
maximum effect on the operation we are supporting. We seek to strengthen our own networks
while identifying the exploitable points in our foes' networks.
In this environment, network science, through its disciplined approach to ontology formation,
creation of operationally useful metrics, and development of mathematical tools and modeling
techniques culminating in simulations of composite networks, has the potential to provide the
means to understand the operational environment, identify the key individuals/elements of that
environment, and wargame our own Courses of Action within those networks.
Here again, the technical challenges of the mission environment directly drive key research
questions that the NS CTA program will address. How, for example, can analysis of composite
social/information/communications networks reveal hidden communities? How can we
manipulate controllable parts of the composite system of networks to drive their evolution in
desirable ways (even though the composite includes networks for which we have no direct
control or even observation)? What is an objective measure of Quality of Information (QoI), so
our information and communications networks support social/cognitive systems by prioritizing
QoI, not merely QoS (Quality of Service)? How can effective modeling, simulation, and
hypothesis validation be performed for such composite systems of networks?
[Figure 3.2 graphic: panels contrasting national versus tribal hierarchies in Iraq (town mayors, neighborhood councils or mukhtars, sheikhs, and the bayt/household, raht, and hammulah/extended-family structures) alongside a hypothetical insurgent communications-network activity cycle in Fallujah, in which insurgents identify the target with the greatest political impact.]
Figure 3.2 – Network science is the key to unraveling the organizational structure and
weaknesses of an insurgency operating in a framework of political, cultural, and personal
networks where their communication networks or impact to local economies (such as
commodity-related activities) may be the only observable indicators of their presence, strength,
organization, and intentions.
The remaining subsections of Section 3 provide an overview of each of the major research
programs in the NS CTA Initial Program Plan: the Trust in Distributed Decision-Making Cross-
Cutting Research Issue (CCRI), the Evolving Dynamic Integrated (Composite) Networks (EDIN)
CCRI, and the non-CCRI research programs of each Center. An essential aspect of all research
(both CCRI and non-CCRI) conducted within and between the Centers is that it addresses
critical NCW technical challenges in the context of composite networks of networks: not simply
multi-technology environments, but environments where all genres of network (social/cognitive,
information, and communications) inherently interact.
Much further detail is provided in the corresponding Sections of this IPP (Section 4 through 9,
below). In particular, the key long-term research questions are provided for each project, and the
focused initial research questions are provided for each task. The overall structure for technical
descriptions is roughly recursive: at each level (project and task), we provide explicit key research questions.
Trust in distributed decision making is one of the two cross-cutting research initiatives in the NS
CTA being studied collaboratively in all the centers from different and complementary
perspectives. Trust is a relationship involving two entities, a trustor and a trustee, in a specific
context under conditions of uncertainty and vulnerability. It is widely agreed that trust is
subjective and involves expectations of future outcomes. In a trust relationship, the trustor
encounters uncertainty from a lack of information or an inability to verify the integrity,
competence, predictability, and other characteristics of the trustee, and is vulnerable to suffering a
loss if expectations of future outcomes turn out to be incorrect. Trust allows the trustor to take
actions under uncertainty and from a position of vulnerability by depending on the trustee.
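As a purely illustrative sketch of these ingredients (the entity names and the Beta-reputation update rule below are our own assumptions, not part of the IPP), a trust relationship can be recorded as a trustor, trustee, and context, with the expectation of future outcomes estimated from observed interactions and the uncertainty shrinking as evidence accumulates:

```python
from dataclasses import dataclass

@dataclass
class TrustRelationship:
    """Illustrative sketch: trust as a trustor-trustee relation in a
    context. Expectation of a good outcome is estimated from past
    interactions (a simple Beta-reputation update); uncertainty shrinks
    as evidence accumulates. Names here are hypothetical."""
    trustor: str
    trustee: str
    context: str
    good: int = 0   # observed positive outcomes
    bad: int = 0    # observed negative outcomes

    def observe(self, outcome_positive: bool) -> None:
        if outcome_positive:
            self.good += 1
        else:
            self.bad += 1

    def expectation(self) -> float:
        # Expected probability of a favorable future outcome.
        return (self.good + 1) / (self.good + self.bad + 2)

    def uncertainty(self) -> float:
        # Shrinks toward 0 as evidence accumulates.
        return 2 / (self.good + self.bad + 2)

r = TrustRelationship("squad_leader", "uav_feed", "target_id")
for ok in (True, True, False, True):
    r.observe(ok)
print(round(r.expectation(), 2), round(r.uncertainty(), 2))  # → 0.67 0.33
```

The sketch captures only the subjective-expectation aspect of the definition; context dependence here is a bare label, whereas the research program treats it as a first-class modeling problem.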
To address the challenges raised by understanding and establishing trust in composite networks,
the Trust CCRI is organized into three main projects that incorporate highly collaborative
research in all aspects of the problem. An overview of the projects in the Trust CCRI is given in
Figure 1, which outlines the centers participating in the different tasks in FY10. The three main
projects in the Trust CCRI are the following:
T1. Trust models and metrics: computation, inference and aggregation
T2. Understanding the interactions between network characteristics and Trust
T3. Fundamental paradigms for enhancing Trust
These projects center on the problems of identifying how to model and compute trust
in composite networks (Project T1), understanding how the composite network may impact trust
and how trust may impact the network (Project T2), and paradigms for enhancing trust and
propagating trust-related information in the network (Project T3). In addition to the highly
collaborative nature of each task in these projects, there are close linkages between the three
projects, as described in Section 4. Furthermore, all the tasks outlined in the Trust CCRI have
close linkages to other tasks in the CTA that concentrate on various characteristics of different
network types and the dynamics of networks in the EDIN CCRI. In the following, we describe
the research that will be conducted in FY10 in each project.
T1: Trust models and metrics: computation, inference and aggregation : The overall goal of
this project is to investigate the fundamental question of how the trust relationship between two
entities in the composite network can be computed and aggregated. There is a substantial body of
In FY10, this project will concentrate on providing unified models and ontologies for computing
trust by incorporating factors and models from different networks. We will conduct an in-depth
investigation of critical trust factors that are most relevant to NCO. In year 1, we will concentrate
on two key factors that play a crucial role in determining quality of information and therefore
trust in decision making: provenance (the origin of the information, the path it traveled, and the
changes made to it) and credibility (how believable the information is). We will develop
cognitive models of credibility and technology-enabled human interactions to better understand
the perception of trust.
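To make these two factors concrete, the toy computation below (our own illustration; the multiplicative discounting is an assumption, not the project's model) scores an item of information by discounting the source's credibility by the trust placed in each hop of its provenance chain:

```python
def provenance_trust(source_credibility, hop_trusts):
    """Illustrative only: combine credibility (how believable the source
    is) with provenance (who handled the item en route). Each value is
    in [0, 1]; any low-trust hop degrades the final score."""
    score = source_credibility
    for hop in hop_trusts:
        score *= hop
    return score

# An item from a fairly credible source, relayed through two hops:
print(round(provenance_trust(0.9, [1.0, 0.8]), 2))  # → 0.72
```

The multiplicative form encodes one intuition only (every low-trust handler can degrade an item); the project's unified models would also account for what changes each hop made and how credible the content itself appears.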
T2: Understanding the interactions between network characteristics and Trust: The main
research challenge in this project will be the identification of the various interactions between the
network characteristics and trust. Some of the main research topics to be studied in this project
are how trust enables the various types of interactions in the network, how the different behavior
dynamics such as conversation or mobility patterns that are related to trust can be detected, how
these different behaviors are correlated to each other and to which degree a network can sustain
these behaviors. In FY10, we will investigate economic models that deal with risk in the context
of trust, networks characteristics related to patterns of reciprocity, mobility, heterogeneity, and
the variations in channel conditions. The theory developed in Project T2 will build on the
fundamental work in Project T1 and will be used to develop the necessary paradigms for
enhancing trust, studied in Project T3. This project will be led by Sibel Adali from SCNARC.
T3: Fundamental paradigms for enhancing Trust: This project looks at questions relating to
trust meta-data – how do we propagate the trust models about entities in the network, how do we
support dynamics of trust, and how can we modify the network to increase the trustworthiness of
entities. In FY10, we will look at how to scalably disseminate the trust meta-data, how to reliably
establish trust for newcomers, and how to revoke trust as the environment changes. In later years,
we will examine how to tie the properties of the network to the emergent trust of the entities, and
possibly optimize the structure of the networks to improve the overall trust behavior. The effort
in this project requires a comprehensive unified trust model developed in Project T1, and will
provide a framework for validating ideas in both Projects T1 and T2. Lessons learned in T3 will
feed the development of ideas in T1 and T2. This project will be led by Karen Haigh from IRC.
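One deliberately simplified reading of these questions (our own construction; the graph, discount factor, and entity names are assumptions, not the CCRI's design) treats trust meta-data as weighted edges in a trust graph: trust in a newcomer is inferred along the best discounted path from an established entity, and revocation is modeled by deleting an edge:

```python
from collections import defaultdict

def best_path_trust(edges, src, dst, discount=0.9, depth=4):
    """Illustrative sketch: trust in dst from src as the best discounted
    product of direct trust values along any path of bounded length.
    Revoking trust is modeled by removing the corresponding edge."""
    graph = defaultdict(dict)
    for a, b, t in edges:
        graph[a][b] = t
    best = 0.0
    stack = [(src, 1.0, {src}, 0)]   # node, accumulated trust, visited, depth
    while stack:
        node, acc, seen, d = stack.pop()
        if node == dst:
            best = max(best, acc)
            continue
        if d == depth:
            continue
        for nxt, t in graph[node].items():
            if nxt not in seen:
                stack.append((nxt, acc * t * discount, seen | {nxt}, d + 1))
    return best

edges = [("a", "b", 0.9), ("b", "c", 0.8), ("a", "c", 0.5)]
print(round(best_path_trust(edges, "a", "c"), 4))   # → 0.5832 (via b)

# Revoking b's trust in c leaves only the weaker direct edge:
revoked = [e for e in edges if (e[0], e[1]) != ("b", "c")]
print(round(best_path_trust(revoked, "a", "c"), 4))  # → 0.45
```

The per-hop discount is one simple answer to "how do we propagate trust models"; scalable dissemination and tying emergent trust to network structure, as described above, are precisely where this naive path-product approach breaks down.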
A basic tenet of Blue Forces Network-Centric Operations (NCO) is that the mission
effectiveness of networked forces can be improved by information sharing and collaboration,
shared situational awareness, and actionable intelligence. The effectiveness of such networks is
dependent on our ability to accurately anticipate the evolution of the structure and dynamics of
social-cognitive (SCN), information (IN) and communication (CN) networks that are constantly
influencing each other. Understanding the structure of these component networks and the
dynamics therein, and of the dynamic interactions between these networks is crucial to the
design of robust composite networks, which is one of the primary goals of the NS CTA program.
Understanding the evolution and dynamics of a network entails understanding both the structural
properties of dynamic networks and the dynamics of processes (or behaviors) of
interest embedded in the network. Typically, the dynamics of network structure impact certain
processes (e.g., how information propagates through the network); but at the same time the
dynamics of processes (or behaviors) may result in alteration of network structures. Therefore,
gaining a fundamental understanding of such relationships under several kinds of network
dynamics is of paramount importance for obtaining significant insights into the behavior of
complex military tactical networks as well as adversarial networks.
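The structure-process coupling described above can be seen even in a minimal simulation (entirely our own toy construction, not a model from the program): a simple susceptible-infected spread of an information item, whose reach changes when the network structure is rewired:

```python
def si_spread(adj, seeds, steps):
    """Illustrative susceptible-infected sketch: at each step, every
    informed node passes the item to all of its neighbors. Returns the
    set of informed nodes after the given number of steps."""
    informed = set(seeds)
    for _ in range(steps):
        informed |= {n for node in informed for n in adj.get(node, ())}
    return informed

# A five-node line network: 0 - 1 - 2 - 3 - 4
line = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(sorted(si_spread(line, {0}, 2)))  # → [0, 1, 2]

# Rewiring the structure (a shortcut between 0 and 4) changes the process:
line[0].append(4)
line[4].append(0)
print(sorted(si_spread(line, {0}, 2)))  # → [0, 1, 2, 3, 4]
```

The reverse direction of the coupling (a process reshaping the structure, e.g., heavy information flow prompting new links) is what makes the co-evolution problem described above genuinely hard.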
To address these challenges, the NS CTA has formulated this CCRI, named EDIN (Evolving
Dynamic Integrated (Composite) Networks), whose focus is to characterize, analyze, and design
composite networks that consist of social/cognitive, information, and communication network
components. The key research questions that will be addressed by this research include:
How should dynamic composite networks be represented? What are their fundamental
attributes? How do we represent dynamic network structures succinctly but with
sufficient richness? How do we incorporate probabilistic or fuzzy information?
How to model time-varying dependencies and interaction between devices, information
objects, and humans? How to model conditional triggers/interactions between pairs of
networks? How to model interaction with adversary networks, which may be only
partially known or even unobservable?
What are the factors and rules of network formation and the subsequent co-evolution?
How do we predict the integrated effects of both emergent properties from simple rules
and complex engineered behavior from complex rules?
What are the objectives of an integrated network? What are good cross-cutting shared
metrics for measuring the network’s effectiveness in accomplishing the goal? Can the
overall network function (metrics) be systematically composed from constituent network
functions (metrics)?
To answer these research questions, we have broken down the research efforts under EDIN into
four major projects, summarized below. Each project consists of two to three tasks, named in
Figure 2, which also shows the high-level linkages between these research projects.
Project E1: Ontology and Shared Metrics for Composite Military Networks (Leads:
Prithwish Basu, BBN (IRC); Jim Hendler, RPI (SCNARC, IRC)) This project is focused on
developing a shared vocabulary and ontology across social, information, and communication
networks. Specifically, it will identify the entities in a composite network and their attributes;
the relationships between them and how those relationships affect network formation; and the
metrics that need to be defined across composite networks irrespective of their representation
structure (including metrics relevant to tactical missions that use all three networks).
Project E2: Mathematical Modeling of Composite Networks (Leads: J. J. Garcia-Luna-
Aceves, UC Santa Cruz (CNARC); Prithwish Basu, BBN (IRC)) This project will focus on
the development of mathematical representations, models, and tools to capture the salient aspects
of dynamic composite networks and their evolution. Since our knowledge about composite
network structures is in its infancy, we will explore a number of orthogonal mathematical
modeling approaches rather than focusing on one or two. We will investigate composite multi-
layered graph theory, tensor analysis tools, temporal graphlets, dynamic random graphs,
constrained optimization, etc. Since each technique is likely to have advantages and
disadvantages with respect to the specific scenario, we have to be careful to not declare winners
or losers too early in the research.
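As a minimal illustration of one candidate representation named above, a composite multi-layered graph can be sketched as nodes tagged with a layer and edges that are either intra-layer or inter-layer couplings (a toy construction under our own conventions and with hypothetical node names, not the project's formalism):

```python
class CompositeNetwork:
    """Toy multi-layer graph: each node lives in one layer ('social',
    'information', or 'communications'); edges within a layer are
    intra-layer, edges between layers are inter-layer couplings."""
    def __init__(self):
        self.layer_of = {}   # node -> layer name
        self.edges = []      # (u, v) pairs

    def add_node(self, node, layer):
        self.layer_of[node] = layer

    def add_edge(self, u, v):
        self.edges.append((u, v))

    def inter_layer_edges(self):
        return [(u, v) for u, v in self.edges
                if self.layer_of[u] != self.layer_of[v]]

net = CompositeNetwork()
net.add_node("analyst", "social")
net.add_node("sergeant", "social")
net.add_node("report", "information")
net.add_node("radio", "communications")
net.add_edge("analyst", "sergeant")   # intra-layer (social)
net.add_edge("analyst", "report")     # inter-layer coupling
net.add_edge("report", "radio")       # inter-layer coupling
print(len(net.inter_layer_edges()))   # → 2
```

Even this bare structure makes the inter-layer couplings explicit as first-class objects, which is the feature the tensor, graphlet, and random-graph approaches listed above each capture in their own formalism.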
Project E3: Dynamics and Evolution of Composite Networks (Leads: Ambuj Singh, UC
Santa Barbara (INARC); Boleslaw Szymanski, RPI (SCNARC)) This project is focused on
the investigation of the temporal evolution of various structural properties of integrated
networks. We will study the short-term effects of network stimuli and dynamics (e.g., arrival of
new information flows, deletion of nodes) on the properties of a given composite network and
also investigate how composite networks co-evolve over longer time scales due to exogenous and
endogenous influences. This project will also focus on the development of cross-cutting
approaches for the prediction of dynamic evolution of networked communities (groups of
soldiers, clusters of similar documents etc.) using multi-theoretic multi-level modeling
techniques.
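The short-term stimulus studies described above can be illustrated with a toy structural experiment (our own construction, not the project's methodology): delete a node and measure the change in a property such as the size of the largest connected component:

```python
from collections import deque

def largest_component(nodes, edges):
    """Size of the largest connected component (BFS over an edge list)."""
    adj = {n: set() for n in nodes}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    seen, best = set(), 0
    for start in nodes:
        if start in seen:
            continue
        queue, comp = deque([start]), {start}
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in comp:
                    comp.add(nxt)
                    queue.append(nxt)
        seen |= comp
        best = max(best, len(comp))
    return best

# Two clusters bridged by node 'c'; deleting 'c' fragments the network.
nodes = {"a", "b", "c", "d", "e"}
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
print(largest_component(nodes, edges))           # → 5
survivors = nodes - {"c"}
pruned = [(u, v) for u, v in edges if "c" not in (u, v)]
print(largest_component(survivors, pruned))      # → 2
```

In a composite network the same stimulus would be measured against cross-layer properties as well, since removing one communications node can fragment the information and social layers that depend on it.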
Project E4 - Modeling Mobility and its Impact on Composite Networks (Lead: Thomas La
Porta, Penn State University): The goal of this project is to develop a suite of mobility models
that capture metrics of specific interest to the evolution of different types of networks, and
ultimately the evolution of composite networks. We also plan for models that will make use of
R1: Methods for Understanding Composite Networks The goal of this project is to work with
a variety of alternative approaches to understanding networks composed from communication,
social/cognitive and information networks and, as these efforts uncover insights, to provide those
insights to the EDIN project. A core question for this project is: Are integrated multi-genre
networks better understood and controlled using techniques outside classic structurally-focused
approaches?
Compositional Methods: One of the primary tasks for the IRC is to provide the understanding of
how global network properties or behaviors can be composed from local properties of
information, social-cognitive, and communications networks. The non-homogeneity of the
Efficient Knowledge Extraction from Integrated Networks. This task will develop theory,
methods, and tools for efficiently extracting knowledge from large networks of various types,
such as communication, social/cognitive, and information networks, and especially, integrated
compositions of the above. The problems addressed here are fundamental in our ability to move
from the plane of measured data and observed behavior to “knowledge”. A key goal here is to
measure and understand the fundamental relationships that define the networks and their
characteristics. Such knowledge can then be utilized not only for later analysis but also for
modeling, simulation, and experimental design in R3. In addition, we want to understand the
interplay between network structure and function, specifically, when function pertains to
information retrieval and information flow.
Then, we will explore how loss or disruption of communications and information networks
affects the function of social networks, and how to prevent the effects of this loss on the
networks we must maintain. We will focus on issues such as characteristics of
information loss and error scenarios, intentional and inadvertent information loss and error, error
propagation through network topologies, estimation of information loss and error levels given
data, socio-cognitive network metrics that are robust in the face of information loss and error,
etc.
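A minimal sketch of this robustness question, whether a metric survives partial observation, is given below; the hub-and-ring graph, the 20% loss rate, and degree centrality as the metric are all illustrative assumptions:

```python
# Illustrative sketch: how stable is a simple socio-cognitive metric (degree
# centrality) when a fraction of the interaction data is lost? The graph and
# loss rate are invented for the example.
import random

def degree(edges):
    d = {}
    for u, v in edges:
        d[u] = d.get(u, 0) + 1
        d[v] = d.get(v, 0) + 1
    return d

random.seed(0)
star = [(0, i) for i in range(1, 10)]            # node 0 is a hub (degree 9)
ring = [(i, i % 9 + 1) for i in range(1, 10)]    # the others form a ring
edges = star + ring
observed = random.sample(edges, int(0.8 * len(edges)))  # 20% of edges lost
full, partial = degree(edges), degree(observed)
# Does the hub still rank first under partial observation?
print(max(full, key=full.get), max(partial, key=partial.get))  # prints: 0 0
```

In this toy case the hub's top rank is provably stable under any 20% edge loss; the research question is to characterize when, and for which metrics, such stability holds in general.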
R5: Technical and Programmatic Leadership This project consolidates the consortium’s
technical and programmatic management.
The IRC will also form an Educational Coordination Committee (ECC), with membership drawn
as appropriate from the academic members of the NS-CTA. Given that Network Science is a
nascent discipline, the purpose of the ECC will be to promote the establishment of Network Science
educational programs of benefit to the NS-CTA client base. Initially, the ECC will formulate
and produce short courses, archived by the IRC, on NS topics. The ECC will also develop
workshops for participants in the NS-CTA. The ECC will compile and disseminate information
on existing degree programs of interest to Network Science. The ECC will discuss the potential
for establishing educational programs at multiple institutions, and ultimately publish one or more
model curricula for degree programs, particularly graduate degrees, in Network Science.
In order for the shared facilities and research experimentation resources to be ready and actually
usable as 6.1 research needs emerge and as 6.1 research results become ready to transition into
6.2 research, we must start in year one to build the tools, the systems, the expertise, and the
shared culture to allow effective creation and continuing extension of shared facilities and
resources. This responsibility is led by the IRC. Basic scientific questions must be answered by
year-1 pure research in order to enable effective research experimentation in the network science
domain of intensely interacting interdisciplinary networks of networks; Project R3 concurrently
develops the needed applied research insights and results; and each project informs and advances
the other. We will use these projects not only to enable shared use of the Facility resources but to
enable shared use of an increasing pool of distributed resources throughout the course of the NS
CTA program. As Alliance network science research advances, we will use the insights and
capabilities created by the projects to identify highly valuable resources and facilities throughout
the Alliance that can technically and pragmatically be united with the growing distributed facility
base to allow more widespread use by Alliance researchers. It is the emerging experience of all
Alliance researchers that must drive the selection of shared research resources and the
investment of effort required to make them widely accessible and easily usable to the Alliance.
The Information Network Academic Research Center (INARC) is aimed at (1) investigating the
general principles, methodologies, algorithms, and implementations of information networks,
and the ways in which information networks work together with communication networks, social and
cognitive networks, and (2) developing the information network technologies required to
improve the capabilities of the US Army and provide users with reliable and actionable
intelligence across the full spectrum of Network Centric Operations.
An information network is a logical network of data, information, and knowledge objects that are
acquired and extracted from disparate sources, such as geographical maps, satellite images,
sensors, text, audio, and video, through devices ranging from hand-held GPS to high-
performance computers. Moreover, for the systematic development of information network
technologies, it is essential to deal with large-scale information networks whose nodes and/or
edges are of heterogeneous types, which link multi-typed objects, and which are highly
distributed, dynamic, volatile, and contain a great deal of uncertain information.
Among the five projects, the first two, E and T, are two cross-center research initiative (CCRI)
projects. For these two projects, we will work closely with the three other research centers,
CNARC, SCNARC, and IRC, to contribute to the development of cross-center technologies.
These two CCRI projects are therefore detailed in their corresponding CCRI project
descriptions. This INARC IPP proposal is dedicated to the remaining three INARC-centered
projects. Moreover, INARC will, together with the other centers, contribute actively to a
comprehensive CTA-wide education plan.
Here we provide a general overview of the three dedicated INARC research projects, and outline
for every project the major research problems, the tasks to be solved, the organization, and the
plan for collaboration with other centers.
I1: Distributed and Real Time Data Integration and Information Fusion This project aims
to answer two fundamental questions: (i) “how to integrate and fuse heterogeneous data (sensor,
visual, and textual) that may be delivered over resource-constrained communication networks to
infer and organize implicit relationships to enable comprehensive exploitation of information,
and how to improve fusion by incorporating human feedback in the process?” and (ii) “how to
model uncertainty resulting from resource-constrained communication, data integration, and
information fusion to enable assessment of the quality and value of information?” The focus of
this project is on large scale information extraction and fusion in the context of a large linked
information network. The goals of the project include both the derivation of such logical
linkages and their use during the fusion process. This project will be co-led by Charu
Aggarwal (IBM) and Tarek Abdelzaher (UIUC).
The project consists of three tasks. (1) Signal Data Integration (led by Tarek Abdelzaher),
which is to answer the key research question: “How to integrate and fuse sensor feeds (e.g.,
scalar, streaming data sources) into a semantic information network so as to a) select the most
appropriate sensors and sensor modalities for maximizing the global information value of the
network given current resource constraints; and b) create effective summarizations of sensor data
that account for underlying uncertainty in the data, such as that imparted by tactical networks and
sources of questionable provenance?” (2) Human and Visual Data Integration (led by Thomas
Huang), which is to answer the key question: “How to integrate and fuse human generated and
visual data (e.g., unstructured, multi-dimensional data sources) so that a) network analysis is
enriched by virtual linkages embedded in different kinds of data leading to information gain; b)
semi-automated analysis of activities in social networks (e.g., occupancy levels in a building,
distribution of human movement, etc.) can be performed through powerful data representation
models. How can human and visual data be used to uncover the structures of social networks?”
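As a hedged, minimal illustration of fusion that accounts for per-sensor uncertainty (not the project's actual method), the classic inverse-variance weighted estimator combines readings so that noisier sensors count less; the readings and variances below are invented:

```python
# Illustrative sketch: inverse-variance weighted fusion of independent sensor
# readings of the same quantity. Values and variances are invented examples.
def fuse(readings):
    """readings: list of (value, variance) pairs from independent sensors."""
    weights = [1.0 / var for _, var in readings]
    value = sum(w * v for (v, _), w in zip(readings, weights)) / sum(weights)
    variance = 1.0 / sum(weights)
    return value, variance

# Three sensors observing the same quantity with different confidence:
fused_value, fused_var = fuse([(10.0, 1.0), (12.0, 4.0), (11.0, 2.0)])
print(round(fused_value, 3), round(fused_var, 3))  # prints: 10.571 0.571
```

Note the fused variance (0.571) is lower than that of the best single sensor (1.0), which is the sense in which fusion adds information value.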
This project is concerned with the following research problems: (1) “how to organize and
manage distributed and volatile information networks?” and (2) “how to analyze and visualize
information networks to provide end users with information tailored to their context?” A
sophisticated information network system should present a human-centric, simple and intuitive
interface that automatically scales according to the context, information needs, and cognitive
state of users, and maintain its integrity under uncertainties, physical constraints (communication
capacity, power limitation, device computation capability, etc.), and the evolving data
underneath. This project will be led by Xifeng Yan (UCSB).
The project consists of three tasks: (1) information network organization and management
(led by Xifeng Yan and Ambuj Singh), which is to answer the key questions: (i) “how to
organize provenance metadata to enable robust, multi-resolution computation of QoI, and (ii)
how to design and implement graph query languages to enable flexible network information
access?” (2) information network online analytical processing (led by Xifeng Yan), which is
to answer the key questions: “(i) how to perform multi-dimensional analytics that allows a non-
expert to explore information networks in real time, and (ii) what are new fundamental
operators that support network-wise graph modeling and analytics?” and (3) information
network visualization (led by Tobias Höllerer), which is to answer the key question: “what is
essential in situation-aware information network visualization?”
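Purely as an illustration of the flexible access a graph query language would provide, the toy matcher below finds labeled two-hop paths in a small heterogeneous information network; the schema, edge labels, and data are invented for the example:

```python
# Illustrative sketch of a tiny graph query: find all (a, b, c) such that
# a -label1-> b -label2-> c in a labeled information network. The schema and
# data are invented examples, not a proposed query language.
edges = [("doc1", "mentions", "unitA"), ("unitA", "located_in", "cityX"),
         ("doc2", "mentions", "unitB"), ("unitB", "located_in", "cityX")]

def match_path(edges, label1, label2):
    """Return all (a, b, c) with a -label1-> b -label2-> c."""
    by_src = {}
    for s, l, t in edges:
        by_src.setdefault((s, l), []).append(t)
    out = []
    for s, l, t in edges:
        if l == label1:
            for c in by_src.get((t, label2), []):
                out.append((s, t, c))
    return out

# Which documents mention units located in which cities?
print(match_path(edges, "mentions", "located_in"))
# prints: [('doc1', 'unitA', 'cityX'), ('doc2', 'unitB', 'cityX')]
```

A real graph query language would express such patterns declaratively and optimize their evaluation; this sketch only shows the kind of multi-typed traversal involved.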
This project considers a key research problem: “how to develop efficient and effective knowledge
discovery mechanisms in distributed and volatile information networks?” Knowledge discovery
in information networks involves the development of scalable and effective algorithms to
uncover patterns, correlations, clusters, outliers, rankings, evolutions, and abnormal relationships
or sub-networks in information networks. Knowledge discovery in information networks is a new
research frontier, and the Army applications pose many new challenges and call for in-depth
research for effective knowledge discovery in distributed and volatile information networks. The
project is led by Jiawei Han (UIUC).
The project consists of three tasks: (1) new principles for scalable mining of dynamic,
heterogeneous information networks (led by Jiawei Han), which is to answer the key question:
“what are the new principles and methods for mining distributed, incomplete, dynamic, and
heterogeneous information networks to satisfy the end-user needs?” (2) real-time methods for
spatiotemporal context analysis (co-led by Spiros Papadimitriou and Jiawei Han), which is to
answer the question: “how to mine spatiotemporal-related patterns in information networks in
Since information networks are intertwined with communication networks and social and
cognitive networks in many aspects, concrete collaboration plans with all three centers have
been laid out in every project’s IPP, as well as the plan for the exploration of military
applications.
Currently, and in the foreseeable future, the U.S. Army is, or will likely be, operating in coalition
with allied armies and deeply entangled within the foreign societies in which its missions are
conducted. Social networks of the allies and the involved societies invariably include groups that
are hostile or directly opposed to U.S. Army goals and missions. Such groups are embedded in
larger social networks, often attempting to remain hidden to conduct covert, subversive
operations. The challenging research issues of studying such adversary social networks include
their discovery, the construction of tools for monitoring their activities, composition, and
hierarchy, as well as understanding their dynamics and evolution. We will address these
challenges based on statistical methods for analyzing large social networks.
Ultimately, the benefits and impact of networks are limited by the human capability to understand
and act upon the information that networks can provide. Hence, human cognition is
an essential component of understanding how networks impact people. Important challenges in
studying human cognition in relation to network-centric interactions include finding how limits on
cognitive resources or cognitive processing influence social (or asocial) behavior, and what
demands social behaviors make on cognitive resources and processing that may limit basic
information processing mechanisms such as encoding, memory, and attention. We would also like
to understand what social behaviors (e.g., trust) are most important to or most influenced by Net-
Centric mediated human communications. In terms of performance, the central challenge is to
understand how the design of Net-Centric technologies or the form and format of Net-Centric
information interact with the human cognitive processes responsible for heuristics and biases in
human performance. Finally, there is a challenge to create predictive computational cognitive
models.
Central to the operation and efficiency of social networks is the level of trust among interacting
agents. The challenges of studying trust in networked systems are the focus of the CCRI research
in this area to which the SCNARC researchers will fundamentally contribute, as ultimately trust
is a social issue with technology and information playing an important but supportive role. The
corresponding challenges are discussed in a separate Trust section of the IPP and so are not
repeated here.
The first task, Infrastructure Challenges for Large-Scale Network Discovery and Processing, will
study the infrastructure challenges in gathering and handling large-scale heterogeneous streams
for social network research, in the context of information and communication networks. We
will conduct system-level research on how to incorporate real-time network processing
requirements into the existing SmallBlue socio-info network infrastructure.
Given that informal social network data reside intrinsically in different data sources, informal
networks can usually only be inferred from sampling larger networks. Since partially observed
data is the norm, we will derive mathematical theories to investigate the robustness of graph
sampling and its implications under various conditions. We will investigate what types of
sampling strategies are required to obtain a good estimation on the entire network. We will also
investigate analytic methods to conduct network analysis on only partially observed data.
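The sampling question can be made concrete with a small illustrative experiment: estimate a whole-network statistic, here mean degree, from a uniform sample of nodes in a synthetic random graph; the graph model, sizes, and sample rate are assumptions:

```python
# Illustrative sketch: estimating a whole-network statistic (mean degree)
# from a partially observed node sample. The synthetic graph is an assumption.
import random

random.seed(42)
n = 1000
# Synthetic graph: each node initiates 3 random ties (an Erdos-Renyi flavor),
# so the expected mean degree is roughly 6.
adj = {i: set() for i in range(n)}
for i in range(n):
    for j in random.sample(range(n), 3):
        if j != i:
            adj[i].add(j); adj[j].add(i)

true_mean = sum(len(v) for v in adj.values()) / n
sample = random.sample(range(n), 100)           # observe 10% of the nodes
est_mean = sum(len(adj[i]) for i in sample) / len(sample)
print(round(true_mean, 2), round(est_mean, 2))
```

Uniform node sampling estimates mean degree well here, but the same design can fail badly for other statistics (e.g., clustering or path lengths), which is why the robustness theory above is needed.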
Task 2 of Project S1, Impact Analysis of Informal Organization Networks, will analyze
the impacts of informal social networks in an organization. We want to learn what effects
informal networks have on the performance of people, and how. We shall model, measure, and
quantify the impact of dynamic informal organizational networks on the productivity and
performance of individuals and teams; develop and apply methods to identify causal
relationships between dynamic networks and productivity, performance, and value; model and
measure peer influence in social networks by statistically distinguishing influence effects from
homophily and confounding factors; and examine how individuals change their use of social
networks under duress.
S2: Adversary Social Networks: Detection and Evolution Project S2 focuses on adversary
networks. The broad research questions which we address in this project include (i) identification
of communities in a dynamic social network, especially hidden and informal groups, based on
measurable interactions between the members, but without looking at details of the interactions,
(ii) uncovering relations between communities in terms of membership, trust and opposition or
support, (iii) observing evolution and the stable cores of communities, especially anomalous and
adversary communities or groups, and their relationships, (iv) understanding how information
flows within such communities and between communities.
To address the key research questions of this project, we defined two tasks. In the first task, S2.1
Detection of Hidden Communities and their Structures, we use interaction data over time to build
a picture of community structure and community evolution, including information pathways and
inter-community relationships. This is an appropriate first step in understanding the core of
social networks. In the second task, S2.2 Information Flow via Trusted Links and Communities,
we build agent-based models to study how information pathways are affected by the varying
degrees of trust between individuals and communities in heterogeneous networks which contain
adversary (non-trusted) as well as non-adversary (trusted) networks.
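As an illustrative first step toward S2.1 (not the project's actual algorithm), communities can be separated from interaction structure alone by dropping weak ties, edges whose endpoints share no mutual contact, and taking the connected components of what remains; the toy network is an assumption:

```python
# Illustrative sketch: community detection from interaction structure alone,
# by dropping edges whose endpoints share no common neighbor. The toy graph
# (two tight groups joined by one weak path) is an invented example.
def common_neighbors(adj, u, v):
    return set(adj[u]) & set(adj[v])

def communities(adj):
    # Keep only "embedded" ties, then return connected components.
    strong = {n: [m for m in adj[n] if common_neighbors(adj, n, m)] for n in adj}
    seen, comps = set(), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n); comp.add(n)
            stack.extend(strong[n])
        comps.append(comp)
    return comps

adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
       4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
print(sorted(communities(adj), key=min))  # prints: [{0, 1, 2}, {3}, {4, 5, 6}]
```

The bridge node 3 is left unassigned by this crude heuristic, hinting at why hidden or boundary members are the hard part of the problem.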
S3: The Cognitive Social Science of Net-Centric Interactions This project will bring the
computational modeling techniques of cognitive science together with the tools and techniques
of cognitive neuroscience to ask how the design of the technology (human-technology
interaction), the form and format of information (human-information interaction), or features of
communication (human-human interaction) shape the success of net-centric interactions. It
includes three tasks.
The first task S3.1: the Cognitive Social Science of Human-Human Interactions investigates
cognitive mechanisms that influence human interactions. Initially, we will combine this effort
with the CCRI on trust to investigate how cognitive mechanisms are affected by trust and how
human evaluations of trust influence our subsequent cognitive processing of information. A
year-1 focus of Project S3 will be to examine the effect of trust on cognitive processing and
variations in human trust across human-human versus human-agent interactions.
Specifically, we hypothesize that differences in trust are signaled by differences in cognitive
brain mechanisms and that these differences can be detected by event-related brain potential
(ERP) measures and related to established cognitive science constructs, which in turn can be
incorporated as improvements in the ACT-R cognitive architecture. A key element in the study
will be the analysis of cognitive brain data collected from humans as they receive information
from the interactive cognitive agent or other humans.
To this end we define a new currency by which we evaluate a network: its operational
information content capacity (OICC). We believe that this approach will truly capture the value
of a network and allow a science to be developed that fundamentally characterizes the volume of
useful information that a network can transfer to a set of users. Our goal is to understand and
control network behavior so that the operational information content capacity of a network may
be increased by an order of magnitude over what is possible today.
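OICC is defined above only informally. Purely as a stand-in to make the notion computable, the sketch below scores each delivered item by an application-assigned operational value discounted by delivery delay and sums over deliveries; the discount form and all numbers are assumptions, not the program's definition:

```python
# Illustrative stand-in for OICC: per-item operational value discounted by
# delivery delay, summed over deliveries. All numbers and the exponential
# discount are invented assumptions, not the program's definition.
import math

def oicc(deliveries, tau=5.0):
    """deliveries: list of (value, delay_seconds) for items the network carried."""
    return sum(value * math.exp(-delay / tau) for value, delay in deliveries)

raw = [(10.0, 0.5), (10.0, 3.0), (1.0, 0.1)]
print(round(oicc(raw), 3))  # prints: 15.517
```

Under such a currency, reducing delays raises OICC without moving a single extra bit, which is the sense in which OICC differs from raw capacity.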
The key challenges to achieving this involve creating comprehensive models for data delivery
capabilities that embody OICC. A key inhibitor to reaching this goal is the inability to model
complex ad hoc networks with a sufficient level of fidelity and at sufficient scale to: (1) develop
the fundamental knowledge to enable the a priori prediction of the behaviors of diverse and
dynamic communications networks; (2) understand how communication networks impact or are
impacted by information and social networks; and (3) understand trade-offs and impact of
various protocols and techniques under a wide variety of dynamic adverse conditions. These
models must accurately capture characteristics (such as heterogeneity, uncertainty, mobility),
interactions (such as between communications, information, and social networks), constraints
(such as processing, bandwidth, protocol limitations), and dynamics of these networks.
To develop these comprehensive models, one must consider the information needs of tactical
applications that share a network, and must cast the behavior of the network in light of these
needs. As the network delivers data to applications, the data is transformed into information.
Different applications require different types of sources and different network behavior in terms
of data delivery characteristics and security to distill useful information from the data received.
The data delivery characteristics and security properties of the network may vary depending on
the location of the source of data. The goal of the network is to deliver the data required from
Developing comprehensive models for data delivery capabilities that support QoI.
We will first develop comprehensive models defining OICC and QoI, along with the properties
and constraints of tactical networks that impact them. Ultimately, models developed to
characterize the network behavior must account for the interactions of the physical
layer(s) present in the network with the higher layer protocols to accurately describe the
composite behavior seen by the applications in terms of their ability to gather and share
information. The focus of the network models is on the data delivery characteristics of
the network and the security properties. We recognize that achieving security properties
impacts network delivery capabilities and we will formally capture this interaction in
terms of QoI across applications. We will also leverage the existence of underlying
social networks and information networks to determine the fundamental limits of the
communications network. Social and information networks may allow a network more
degrees of freedom in terms of what type of information must be transferred, from where
information may be retrieved, and what security properties must be present for different
types of information and may also allow the communications network to better
accommodate mobility and reduce uncertainty.
We define a new objective for network science: the modeling and optimization of a
network to deliver useful information to applications.
We develop comprehensive models for the data delivery capabilities of networks that
differ from ongoing work in that they explicitly target the impact on QoI. These models
We incorporate security as a first class citizen in network modeling and account for the
interactions between security and data delivery in terms of QoI.
We fully leverage the existence of social and information networks so that the
communication network can optimize the delivery of information (or sets of equivalent
information) so that the operational information content capacity of the network is
optimized.
C1: Modeling Data Delivery in Dynamic, Heterogeneous, Mobile Networks – This project
focuses on the behavior of OICC and QoI and the factors that impact them and will seek to
answer the research questions: What are the fundamental limits of Operational Information
Content Capacity in dynamic, heterogeneous mobile networks jointly considering data delivery
and security properties? What are the relative impacts of properties and constraints of tactical
MANETs on OICC, and why?
C2: Characterizing the Increase of QoI due to Networking Paradigms – This project focuses
on the impact of select networking paradigms in increasing QoI and OICC and will seek to
answer the research questions: What are the networking paradigms that will most improve
OICC? What are the gains achieved by these paradigms and why?
C3: Achieving QoI Optimal Networking – This project focuses on the structure of protocols
that may limit the QoI achieved in practical networks and will seek to answer the research
questions: What are the structural limitations that protocols impose on achieving QoI? How can
these limitations be removed?
In the first year the largest effort will be on C1 with a significant effort on C2. Project C3 will be
deferred until the second year of the program when we have an understanding of the factors that
impact OICC and QoI and the networking paradigms that hold the most promise.
Table of Contents
4 Cross-Cutting Research Issue: Trust in Distributed Decision Making ...................... 4-1
4.1 Overview .............................................................................................................. 4-3
4.2 Motivation ............................................................................................................ 4-4
4.2.1 Challenges of Network-Centric Operations ............................................................. 4-4
4.2.2 Example Military Scenarios ..................................................................................... 4-5
4.2.3 Impact on Network Science...................................................................................... 4-5
4.3 Background on Trust ............................................................................................ 4-6
4.4 Key Research Questions ...................................................................................... 4-8
4.5 Technical Approach ............................................................................................. 4-9
4.6 Project T1: Trust models and metrics: computation, inference and aggregation4-11
4.6.1 Project Overview .................................................................................................... 4-12
4.6.2 Project Motivation .................................................................................................. 4-13
4.6.3 Key Project Research Questions ............................................................................ 4-13
4.6.4 Initial Hypotheses ................................................................................................... 4-13
4.6.5 Technical Approach ................................................................................................ 4-14
4.6.6 Task T1.1: Unified Models and Metrics of Trust (J. Opper, BBN (IRC); S. Adali, RPI
(SCNARC); D. Agrawal, IBM (INARC); J. Golbeck, UMD (SCNARC); K. Haigh, BBN
(IRC); K. Levitt, UCDavis (CNARC); S. Parsons, CUNY (SCNARC); P. Pirolli, PARC
(INARC); D. Roth, UIUC (INARC); M. Singh, NCSU (CNARC); Collaborators: S. Aral,
MIT (SCNARC); J. Cho, ARL; P. Mohapatra, UCD (CNARC); C. Partridge, BBN (IRC); M.
Srivatsa, IBM (INARC); Red team reviewers).................................................................... 4-15
4.6.7 Task T1.2: Computing Trust Factors (M. Srivatsa, IBM (INARC); R. Govindan,
USC (CNARC); T. Abdelzaher, UIUC (INARC); S. Adali, RPI (SCNARC); D. Agrawal,
IBM (INARC); T. Brown, CUNY (INARC); J. Han, UIUC (INARC); H. Ji, CUNY
4.1 Overview
Trust is a relationship involving two entities, a trustor and a trustee, in a specific context under
the conditions of uncertainty and vulnerability. It is universally agreed that trust is a subjective
matter involving expectations of future outcomes. In a trust relationship, the trustor encounters
uncertainty from a lack of information or an inability to verify the integrity, competence,
predictability, and other characteristics of the trustee, and is vulnerable to suffering a loss if
expectations of future outcomes turn out to be incorrect. Trust allows the trustor to take actions
under uncertainty and from a position of vulnerability by depending on the trustee. Trust is not a
singular or simple concept (Taylor, 1989): it is broader than the concepts of cooperation and
confidence (Mayer, Davis & Schoorman, 1995) but narrower than the concepts of love and liking,
both of which are very broad and difficult-to-define terms (Brehm, 1992). Trustworthiness is a
closely related concept, based on the expected outcomes on which the trust relation depends:
the trustor expects the trustee to act in a certain way, and a trustee who does so is considered
trustworthy.
The goal of the Trust CCRI is to enhance distributed decision making capabilities of the Army in
the context of Network-Centric Operations (NCO), in particular for IW and COIN, by
understanding the role trust plays in composite networks that consist of large systems with
complex interactions among communication, information, and social/cognitive networks.
Towards that goal, we will structure the research program in three projects: The first project will
focus on models and metrics for trust, methods for computing factors related to trust, methods
for aggregating multiple factors, models, and metrics into a uniform trust framework. The second
project will focus on how the trust relationships mediate the different interactions between
networks: how the network characteristics influence factors related to trust and how trust
modulates various network characteristics. The third project will examine paradigms for
establishment of trust and propagation of trust related information in the composite network.
Before going into the details of these projects, we will briefly describe why trust-based decision
making is crucial to NCO and which technical barriers need to be overcome to effectively use
trust in distributed decision making. This will be followed by a background review of prior work
in trust and detailed project descriptions.
There are various ways to view trust. A rational view of trust suggests that the decision to
place trust in a specific trustee depends on a specific context: the context defines the potential
gains and losses. It defines the specific scenario under which trust is placed, to whom, and for
It is clear that to address the Network Science challenges posed by types of scenarios described
earlier, there is a need to develop methods that address trust in each individual type of network as
well as across different types of networks. Collaborations have to be established early on to seek
common theories and methods that address the concerns of individual networks but provide the
necessary interface to be integrated into a common picture of trust. We intend to study
trust from the following perspectives:
The Trust CCRI will attempt to facilitate constructive synergy and interactions among the ARCs
and the IRC with respect to the formalization, establishment, composition, propagation, and
rescinding of trust, and with respect to trust-based distributed decision making. We will also
exploit interactions with the other CCRIs and other topics among the ARCs and within the IRC,
especially in helping to integrate the distributed monitoring and assessment mechanisms
developed within the CNARC, INARC, SCNARC to enable trust-based distributed decision
making. The overall research activities in this CCRI will be associated with three primary
projects with distinct but mutually supportive aims. The three projects are:
Project T1: Trust models and metrics: computation, inference and aggregation (Leads:
Dakshi Agrawal, IBM (INARC) and P. Mohapatra, UC Davis (CNARC))
Task T1.1 Unified Models and Metrics of Trust (Lead: J. Opper, BBN (IRC))
In this task, we will develop a framework to enable representation of trust in a system-wide
manner, integrating trust measures across multiple abstraction layers from communication,
information, social, and cognitive networks. The work in the first year will focus on creating a
common trust taxonomy and on identifying trust factors that should be incorporated in the
aggregate model and how these trust factors influence each other and how they should be
represented individually.
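A minimal sketch of what such an aggregate might look like, with the caveat that the factors, the weights, and the weighted-mean form are all assumptions to be replaced by the framework this task develops:

```python
# Illustrative sketch: combining per-layer trust factors into one score via a
# weighted mean. Factors, weights, and the formula are invented assumptions,
# not the task's actual framework.
def aggregate_trust(factors, weights):
    """factors/weights: dicts keyed by network layer; factor values in [0, 1]."""
    total = sum(weights.values())
    return sum(factors[k] * weights[k] for k in factors) / total

factors = {"communication": 0.9,   # e.g., channel integrity
           "information":   0.6,   # e.g., source credibility
           "social":        0.8}   # e.g., prior interpersonal trust
weights = {"communication": 1.0, "information": 2.0, "social": 1.0}
print(round(aggregate_trust(factors, weights), 3))  # prints: 0.725
```

Even this toy form shows why a common taxonomy matters: the factors must be expressed on comparable scales before any aggregation is meaningful.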
Task T1.2 Computing Trust Factors (Leads: M. Srivatsa, IBM (INARC); R. Govindan,
USC (CNARC))
In this task, we will conduct an in-depth investigation of two critical aspects of information:
the credibility of information and its sources, and provenance as a critical factor that
determines that credibility. The focus in this CCRI will be on latent network
characteristics that can be exploited to better assess the credibility of information and how
provenance metadata can be modeled, organized, stored, and transmitted so that credibility of
information can be accurately assessed even when provenance metadata cannot be presented to
the user in its totality.
Task T1.3 Cognitive Trust Models (Lead: P. Pirolli, PARC/INARC)
This task concentrates on the development of a theory of the human cognitive machinery
involved in judgments of credibility (of the information itself and of other humans as sources of
information). The behavioral sciences have for the most part treated such judgments as "black
boxes". Our aim is to start the development of a computational/quantitative theory of the
perceptual and cognitive processing inside of that black box, with a focus on how people process
the cues involved in such judgments.
4.6.6 Task T1.1: Unified Models and Metrics of Trust (J. Opper, BBN (IRC); S.
Adali, RPI (SCNARC); D. Agrawal, IBM (INARC); J. Golbeck, UMD
(SCNARC); K. Haigh, BBN (IRC); K. Levitt, UCDavis (CNARC); S. Parsons,
CUNY (SCNARC); P. Pirolli, PARC (INARC); D. Roth, UIUC (INARC); M.
Singh, NCSU (CNARC); Collaborators: S. Aral, MIT (SCNARC); J. Cho,
ARL; P. Mohapatra, UCD (CNARC); C. Partridge, BBN (IRC); M. Srivatsa,
IBM (INARC); Red team reviewers1)
Task Overview
The goal of this task is to develop a framework that enables representation of trust in a holistic,
system-wide way, integrating trust measures across multiple abstraction layers from
communication, information, social, and cognitive networks. The benefit of a unified framework
is the ability to provide a unified trust picture that can be used by both automated reasoners and
humans for dynamic mission planning and distributed decision making.
To establish a cross-cutting research thrust on trust, researchers require an understanding of each
discipline's vocabulary, key research questions, and motivating problems. Researchers need to
understand how factors change as they cross network-type boundaries, and how factors in one
network type impact the factors of another network type.
Task Motivation
Today's forward-deployed net-centric operations involve a large number of concurrent mission
participants and resources linked by complex interaction networks. As each mission participant
continuously performs trust assessments to guide personal decision making, these individual
assessments must be combined into a unified trust picture to be useful and applicable to
distributed collaborative decision making. This involves combining and normalizing trust
metrics computed at different layers, measured along different dimensions, and represented in
formats that vary across the armed forces and coalition partners.
A good way to illustrate the complexity of trust models is to consider an end-to-end flow of
information as it is gathered from a battlefield, processed, and presented to the
end-user for a decision involving significant risk. The scenario is that of a battlefield under
surveillance by sensors (time-series, image, and video sources mounted on
different platforms) and by text reports from human sources. The decision maker is trying to
assess the situation on the battlefield and decide whether the hostile forces have emerged with
unanticipated capabilities, making it necessary to counteract by sending reinforcements.
First, in this scenario, it is important to assess the security properties and capabilities of the
sources, as well as the technical sophistication of the adversary forces. For human
sources, it is important to understand how much they are trusted in a specific decision-making
context, and why. This information can originate from many places, including other users in a
social network. Depending on the situation and full context, this results in a credibility
assessment of the sources. Various attributes of sources (identity, credibility, etc.) become part of
1 We have verbal agreements with many additional ARL-CTA researchers who have agreed to provide
red-team-style reviews of the documents we generate. We will seek to have as many ARL-CTA reviewers as possible.
To help understand these touch-points, we will use both visual and quantitative analysis of trust
patterns in networks. Exploratory visual analysis is a useful tool for hypothesis generation,
identifying potential correlations or non-correlations. The quantitative analysis will validate
those hypotheses against existing trust networks and identify how important the factors are. In
particular, we will examine patterns of trust relationships to understand how trust between people
develops in social networks and in information systems, and to what degree trust correlates with
cognitive factors such as profile characteristics and personality. We will look at different levels
of the network (overall, ego networks, and network patterns) to determine how trust interactions
are distributed.
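As a simple illustration of the quantitative side of this analysis, the sketch below contrasts a network-wide trust average with per-ego averages. The participants and trust values are purely hypothetical; real analyses would operate on measured trust networks.

```python
# Sketch: comparing trust at the whole-network and ego-network levels.
# The graph and trust weights are hypothetical illustrative data.
from statistics import mean

# trust[(a, b)] = a's trust in b, on a 0..1 scale (hypothetical values)
trust = {
    ("alice", "bob"): 0.9, ("alice", "carol"): 0.4,
    ("bob", "carol"): 0.8, ("carol", "alice"): 0.6,
    ("dave", "alice"): 0.2, ("dave", "bob"): 0.3,
}

def overall_mean_trust(trust):
    """Average trust over every edge in the network."""
    return mean(trust.values())

def ego_mean_trust(trust, ego):
    """Average trust the ego places in its direct neighbors."""
    out = [w for (a, _), w in trust.items() if a == ego]
    return mean(out) if out else None

print(round(overall_mean_trust(trust), 3))   # network-wide average
print(ego_mean_trust(trust, "dave"))         # one ego's outgoing trust
```

Comparing the two levels is what exposes patterns such as a single distrustful ego in an otherwise high-trust network.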
Finally, we will also identify how factors of one network type influence different factors in a
different network type. For example, the objective factors of accessibility and verifiability of a
communication link affect the timeliness and authority of nodes (both contextual factors) in an
information network.
Our development approach is to involve individuals with experience in all four of the network
types and all four of the centers, as shown in the following table. In addition to the names listed
below, we have verbal agreements with many other ARL-CTA researchers who have agreed to
provide red-team style reviews to the documents we generate.
                 IRC             SCNARC              CNARC            INARC
Communications   Haigh, Opper,                       Singh,           (Srivatsa)
                 (Partridge)                         (Mohapatra)
Social           Haigh           Golbeck, Parsons,   Singh            Agrawal
                                 Adali
Information      Haigh,          Parsons, Adali                       Roth, (Srivatsa)
                 (Partridge)
As part of this activity, we intend to form collaborative reading groups and seminars so that
relevant literature can be identified, presented, and explained to the audience. The cross-
disciplinary group will be able to clarify unclear terminology, extend terminology where it
reaches its limits, identify gaps in the discussion, and, perhaps most importantly,
identify potential early collaboration opportunities.
As described below, we will perform a variety of experiments and analyses to validate our
hypotheses about which factors influence each other.
Developing an Aggregate Trust Model. The goal of this activity is to develop a framework that
enables representation of trust in a holistic, system-wide way, integrating trust measures across
the multiple abstraction layers present in communication, information, and social networks. Trust
is a multidimensional, highly subjective measure relevant to a wide range of properties
(e.g., trust with respect to data correctness, data integrity, timeliness, freshness, resistance to
unauthorized alteration, denial of service, and unauthorized propagation, as well as system
integrity, and trustworthiness with respect to security, reliability, survivability, people, etc.). The
benefit of a unified framework is the ability to provide a unified trust picture that can be used by
both automated reasoners and humans for dynamic mission planning and distributed decision
making.
What is needed is a suite of mechanisms that provide necessary flexibility (since end-user
missions have different requirements and therefore different context-sensitive models of trust in
information) and scalability (since communication and information processing resources are
precious in military scenarios). A unified trust picture increases the accuracy of assessments and
therefore allows entities to isolate untrustworthy entities in a more expedient, reliable manner.
Being able to efficiently compute the quality of information as dictated by the end-user mission
is essential for computing trust in information-based decision making.
The IRC will lead the effort to create a single, unified trust metric that takes into account factors
(and aggregate trust models) from each of the network types.
We believe we can construct a unified trust model, including trust assessments of nodes linked
through network, information, and social networks at different layers. The trust model will be
scalable, composable, and distributable. Key questions to address include:
We will also monitor the following Key Questions that will be addressed in other tasks:
The taxonomy of factor types above will outline which factors from each network type affect
which factors in another network type.
We will design a composable trust metric based on the ideas for the optimization function
described in (Haigh 2007). The function is a weighted sum of factors, designed to be easily
distributable around the network. Its key components will be the important factors
identified by the taxonomy, normalized, with initial weights based on importance values
identified from literature searches. One advantage of this approach is that the weights and
component factors can easily accommodate considerations such as cost, risk, and benefit. The key
advantage of this metric is that it can incorporate trust models (and lessons learned) from the
three ARCs and from prior art (see list below), while balancing the concerns of the
different network types in a composable manner.
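A minimal sketch of such a weighted-sum metric follows. The factor names, weights, and the two-stage composition are illustrative assumptions, not the taxonomy itself or the exact function of (Haigh 2007).

```python
# Sketch of a composable weighted-sum trust metric in the spirit of the
# optimization function described in (Haigh 2007). Factor names and
# weights below are hypothetical placeholders.

def aggregate_trust(factors, weights):
    """Weighted sum of normalized trust factors.

    factors -- dict of factor name -> value in [0, 1]
    weights -- dict of factor name -> non-negative importance weight
    Weights are renormalized so the result stays in [0, 1], which keeps
    the metric composable: scores from different network types can be
    combined by the same function.
    """
    total = sum(weights[f] for f in factors)
    return sum(weights[f] * factors[f] for f in factors) / total

# Hypothetical per-network-type assessments of one source
comm   = aggregate_trust({"link_integrity": 0.9, "availability": 0.7},
                         {"link_integrity": 2.0, "availability": 1.0})
social = aggregate_trust({"reputation": 0.6, "tie_strength": 0.8},
                         {"reputation": 3.0, "tie_strength": 1.0})

# Compose the per-type scores with their own weights
unified = aggregate_trust({"comm": comm, "social": social},
                          {"comm": 1.0, "social": 2.0})
print(round(unified, 3))
```

Because the same function aggregates factors within a network type and scores across types, the metric can be evaluated piecewise at different nodes and combined later, which is what makes it distributable.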
For example, Task T1.2 investigates provenance tracking algorithms by propagating provenance
metadata in the form of annotations across database queries/views; in particular, Task T1.2
addresses similar challenges such as expressiveness and scalability, but within the limited
context of one intrinsic QoI attribute, namely, provenance. Task T1.1 will explore the applicability
(and possible extensions) of such annotation propagation algorithms (along the lines of
expressiveness, scalability, and distributability) for a broader range of trust factors.
Wang and M. Singh have proposed a Bayesian approach that captures trustworthiness in terms of
the certainty with which an agent performs at a specific level. Trust will have a component of
moral hazard wherein agents may alter their behaviors to maximize their utility, and agents seek
to establish incentives under which others will be well behaved. Hazard and M. Singh have
recently argued that, under some weak assumptions, the trustworthiness of a rational agent is
isomorphic to its intertemporal discount factor.
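The flavor of such evidence-based trust models can be sketched with a standard Beta-distribution estimator. This is a common textbook simplification, not Wang and Singh's exact certainty formulation.

```python
# A minimal Beta-distribution sketch of evidence-based trust: counts of
# positive and negative interactions define Beta(a, b); the mean serves
# as the trust estimate, and a simple evidence-based term plays the role
# of certainty. The certainty formula here is an illustrative assumption.

def beta_trust(positive, negative):
    a, b = positive + 1, negative + 1   # uniform Beta(1, 1) prior
    mean = a / (a + b)                  # point estimate of trust
    n = positive + negative
    certainty = n / (n + 2)             # grows toward 1 with more evidence
    return mean, certainty

print(beta_trust(8, 2))   # many good interactions: high trust, fair certainty
print(beta_trust(0, 0))   # no evidence: neutral trust, zero certainty
```

The key property this captures is that trust and certainty are distinct: two agents can have the same estimated trustworthiness backed by very different amounts of evidence.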
Argumentation systems, studied by Parsons, give us another way to develop a framework.
Argumentation is a symbolic reasoning mechanism that constructs reasons (arguments) for and
against propositions of interest. In a social context, for example, where we are interested in the
reliability of some information, a reason to believe it might be the fact that this source has been
reliable over a long period. A reason against believing it might be that it is outside the realm of
expertise of the source. Argumentation provides methods for taking this structure of interlocking
reasons and determining which propositions should be accepted. This symbolic model can be
combined with numerical measures like those developed by Singh. Further, we can use the
argument structures as plausible explanations for human users, and extend the argumentation
framework not only to handle information about trust, but also to aid decision making with
information of questionable trustworthiness, taking into account the decision attitudes (such
as risk-averseness) of the human users. The steps in the reasoning process that argumentation
captures do not have to be homogeneous. In the context of a unified model of trust, this makes
it possible to have arguments that represent reasons based on different aspects of trust
(trustworthiness of an acquaintance, quality of information, and so on), with the mechanism used
to combine these arguments summarizing how the different kinds of trust should be used in
combination.
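To make the acceptance mechanism concrete, here is a small sketch of grounded acceptance over an abstract argument graph. The arguments and attack relation are hypothetical; real argumentation systems attach internal structure to each argument rather than treating them as atoms.

```python
# Sketch of grounded acceptance over an abstract argumentation graph, in
# the spirit of the symbolic framework described above.

def grounded_extension(args, attacks):
    """Iteratively accept arguments whose attackers are all defeated."""
    accepted, defeated = set(), set()
    changed = True
    while changed:
        changed = False
        for a in args:
            if a in accepted or a in defeated:
                continue
            attackers = {x for (x, y) in attacks if y == a}
            if attackers <= defeated:             # every attacker defeated
                accepted.add(a)
                defeated |= {y for (x, y) in attacks if x == a}
                changed = True
    return accepted

# A: "the source has been reliable over a long period" (reason to believe)
# B: "the report is outside the source's expertise" (attacks A)
# C: "a second, independent source corroborates the report" (attacks B)
args = {"A", "B", "C"}
attacks = {("B", "A"), ("C", "B")}
print(sorted(grounded_extension(args, attacks)))   # C defeats B, so A stands
```

The accepted set is exactly the kind of structure that can double as an explanation for a human user: each accepted argument carries the chain of reasons that left it undefeated.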
Create the taxonomy: identify which factors should be incorporated into the aggregate model,
describe how they influence each other, and clearly define how each is represented
(mathematically, logically, symbolically, or otherwise);
Identify the different aggregate trust models that have been developed in the prior art, and
examine their strengths and weaknesses; and
Develop clear, structured use cases (with accompanying validation datasets) against
which we can develop a framework.
To the extent that prior work has been performed by individuals participating in these tasks, we
intend to compare and contrast these approaches directly, including evaluating them against
datasets that are readily available.
Task Overview
The goal of this task is to conduct an in-depth investigation of the critical trust factors that are
most relevant to NCO but for which the state of the art contains research gaps that hinder
effective and efficient assessment, and therefore assessment of the overall trust in decision
making. In year 1, we have identified two such key trust factors: provenance and credibility.
Both play a crucial role in determining the quality of information, and therefore trust in decision
making. The results of this research will feed the overall modeling efforts in T1.1 and T1.3
and the work on enhancing trust in composite networks in T3.1.
Subtask T1.2a: Information Credibility: Definitions, Assessments, and Fundamental
Tradeoffs (Dan Roth, UIUC (INARC); Heng Ji, CUNY (INARC); Jiawei Han, UIUC
(INARC); Ramesh Govindan, USC (CNARC); Prasant Mohapatra, UCDavis (CNARC);
Sibel Adali, RPI (SCNARC); Malik Magdon-Ismail, RPI (SCNARC); Brian Uzzi, NW
(SCNARC))
Task Overview
In the context of NCO, a prerequisite for trust-based decision making is the ability to assess the
credibility of information and information sources. Different fields of study have examined
credibility (Rieh and Danielson, 2007) and have used subtly differing definitions. For our
purposes, we use credibility in the sense of believability: credible people are believable people,
and a piece of information is said to be credible if it is believable without specific evidence or
proof. Credible information is information that has face validity, appearing trustworthy on
circumstantial evidence rather than specific testing (Tseng and Fogg, 1999). Some languages use
the same word for "credible" and "trustworthy", and for our purposes the two are synonyms
(Tseng and Fogg, 1999).
Credibility is not binary, since information can have different degrees of believability.
Credibility is different from other attributes of information – for example, credibility is different
from accuracy: credible information might be inaccurate. Similarly, credibility is different from
information freshness or timeliness. The final value of a piece of information will depend on its
credibility, its accuracy, and its freshness, in addition to several other factors. Note that
credibility judgments and perceptions may also be wrong, based on heuristic factors that are not
related to the accuracy or other rational, testable attributes of the information. One aspect of
computing credibility is to understand how credible the information will be judged or perceived
when the judgment or perception is based partly on heuristic measures and partly on intrinsic
quality attributes of the information that are rational and objective. For example, on a personal
Research Question
2 To avoid misunderstanding by the readers of this IPP, we use terminology consistent with the rest of the
IPP; the work by Yin et al. uses different terminology (e.g., veracity of information sources) to describe similar
concepts.
Thus, from the CNARC perspective, the research conducted on this task will be to examine
models and methods for assessing node credibility, and to determine methods for computing
information credibility in-network, and proactively enhancing the credibility of information. In
the short-term we will work on the following problems:
Develop a taxonomy of credibility and of the objects of credibility assessment (sources,
nodes, etc.), and exemplify various credibility measures that may be used in tactical settings.
Explore expert-assessment methods for node credibility assessment.
Understand the credibility frontier of a network (the amount of credible information that
could be delivered by a network) and how that depends upon the type of credibility
functions, as well as the type of network.
Discuss with INARC the push-vs-pull tradeoffs for the information used to assess
credibility.
Find classifiers for patterns of networked activity that can classify relationships as
credible or not.
Use natural experiments in real data to reveal trust and credibility.
Information Networks
Within the information network, by constructing a rich information network and analyzing the
interconnected information within it, it is possible to cross-check and mutually consolidate
information from multiple sources so that ambiguities and conflicts can be resolved
automatically and efficiently in a dynamic situation. The diversity of sources produces rich data
about the history (more generally, the spatiotemporal properties) of the information objects,
which can facilitate information analysis. The philosophy is to make use of interconnected
entities and the relationships among them in an information network to cross-check and mutually
consolidate the information, and thus assess various quality attributes of information and
information sources in a specific context. This consolidation process, which works by
cross-checking interconnected entities and their relationships, is colloquially referred to as the
self-boosting of information networks.
As mentioned earlier, the credibility of a sensor or information source will be highly linked with
the context in a heterogeneous information network. For example, some sensors will be excellent
at detecting low speed, heavy objects while others will be tuned for cold, low-resolution
environments, and yet others for highly noisy environments.
For example, a battle-field may be under observation by sensors of different modalities as well as
by personnel belonging to different organizations (and therefore, with different allegiance,
equipment, training, etc.). Each information source has a different vantage point of the battle-
field in which a highly dynamic and volatile situation is unfolding. In this scenario, it may be
impractical to only rely on sensors or people with highest capabilities, pre-determined credibility,
etc. as these information sources may not be located in optimal places to observe a phenomenon.
Furthermore, even the sensors or people with highest capabilities or pre-determined credibility
may provide conflicting information about the same entity. To overcome this impasse, the idea is
to cross-check and mutually consolidate the observations of many heterogeneous sources, as
outlined above.
Social Networks
Another notion of credibility is based on the status and prominence of individuals in the social
network (Podolny 1993). We postulate that prominence is a self-dependent mechanism: the
prominence of individuals in the society is a function of the prominence of the communities they
belong to, and the prominence of communities is a function of the prominence of the individuals
participating in them. The more prominent an individual, the more credible the information they
provide is likely to be considered. To validate this hypothesis, we will develop quantitative
methods for inferring prominence based on community and meta-community structure. The
theory should be general enough to apply to a wide variety of social
network domains. The theory should also come with efficient algorithms so that we can measure
and validate the prominence measures on large real data. To that end, we will develop external
measures of prominence so that we can validate the quantitative predictions as implied by the
theory and as measured by the algorithms.
We will first develop methods for identifying communities and how communities interact with
each other based on observed interactions between individuals. This will rely heavily on the
work done in task S.2 (Adversary networks: detection and evolution, which builds a basic
understanding of how to detect communities). We will need algorithms that scale up to million
node networks, and so we will develop two types of algorithms: i) rapid metric based algorithms
for clustering nodes in a graph into overlapping clusters based first on a low dimensional
embedding of the graph and then on rapid spectral methods for estimation of mixtures; and ii)
iterative local optimization of clusters based on density metrics which find communities with
more internal connectivity than external.
Given communities discovered by these algorithms, we will develop an iterative algorithm that
bootstraps an initial crude estimate of prominence based on interactions into a refined measure
of prominence that takes community structure into account. Because our algorithm is similar in
construction to TruthFinder, it provides a natural meta-algorithm that combines information and
social networks. We will test whether it is possible to improve information credibility
computation by considering the degree of corroboration and the prominence of the nodes.
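The flavor of the iteration can be sketched as follows. The sources, claims, and the noisy-or combination rule are illustrative simplifications, not TruthFinder's published formulas.

```python
# A TruthFinder-style iteration sketch: source trustworthiness and claim
# confidence are computed from each other until they stabilize. Corroborated
# claims (here, c1 with two sources) gain confidence, which in turn raises
# the trust of their sources. All data are hypothetical.

claims_by_source = {"s1": ["c1", "c2"], "s2": ["c1"], "s3": ["c3"]}
sources_by_claim = {}
for s, cs in claims_by_source.items():
    for c in cs:
        sources_by_claim.setdefault(c, []).append(s)

trust = {s: 0.5 for s in claims_by_source}       # crude initial estimate
for _ in range(5):
    # Claim confidence: corroboration boosts confidence (noisy-or rule).
    conf = {}
    for c, ss in sources_by_claim.items():
        disbelief = 1.0
        for s in ss:
            disbelief *= 1.0 - trust[s]
        conf[c] = 1.0 - disbelief
    # Source trust: mean confidence of the claims the source asserts.
    trust = {s: sum(conf[c] for c in cs) / len(cs)
             for s, cs in claims_by_source.items()}

print({c: round(v, 2) for c, v in conf.items()})
```

After a few iterations the corroborated claim c1 (and, through its source, c2) outranks the uncorroborated c3, which is the mutual-reinforcement effect the meta-algorithm exploits.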
Subtask T1.2b Models and Maintenance of Provenance Metadata for Enhancing Trust in
Information (Dakshi Agrawal, IBM (INARC); Tasos Kementsietsidis, IBM (INARC);
Mudhakar Srivatsa, IBM (INARC); Tarek Abdelzaher, UIUC (INARC); Ted Brown,
CUNY (INARC); Ramesh Govindan, USC (CNARC); Prasant Mohapatra, UCD
(CNARC); Munindar Singh, NCSU (CNARC); Ching-Yung Lin, IBM (SCNARC); Zhen
Wen, IBM (SCNARC))
Task Overview
The goal of this task is to model provenance metadata and develop methods for maintaining
provenance metadata within the information and communication networks so that trust in
information can be estimated accurately and enhanced appropriately. It is well known that the
quality of information is a major contributor to the trust that decision makers place in
information. Each piece of information has certain important intrinsic quality attributes, and
within the context of NS-CTA, provenance has been identified as one such key intrinsic QoI
attribute. During the first year, we will focus our attention on modeling provenance and
providing scalable provenance tracking across the information and communication networks and
analyze impact of these techniques on credibility assessments.
Task Motivation
Primary sources of information in military domains (e.g., sensor readings, UAV data, satellite
images, reports from spies, coalition partners and local militia, etc.) are inherently heterogeneous
in terms of their capabilities as well as their affiliations and motives. We envision that in the
future: (i) raw data and processed information will be augmented with QoI attributes (more
specifically, with provenance metadata); (ii) end-users will query for information that is most
valuable to their mission goals and objectives (for example, a user may only want information
whose provenance can be traced to information sources from some selected organizations); and
(iii) end-users will make trust-based decisions in which credibility of the information, and
therefore, the provenance of information plays a major role.
In dynamic networks, provenance modeling and tracking becomes one of the main technical
challenges in assessing credibility, for the following reasons: (a) provenance metadata grows as
the information is processed through the information pipeline; (b) provenance metadata can be
sensitive, and its sensitivity may be time-dependent; and (c) given the constraints of
communication networks, provenance information may sometimes be inaccurate or incomplete.
Hence, we need
a suite of provenance models and network mechanisms that are flexible so that they can address
end-user missions with different requirements and cope with varying availability of
communication, information, and cognitive resources. Moreover, since provenance information
is necessary to establish the credibility of information, it is important to understand how the
incompleteness of provenance can impair credibility assessments and how the impact of
incomplete provenance can be minimized.
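A minimal sketch of set-valued provenance annotations illustrates points (i) and (ii) of the vision above. The item names and organizations are hypothetical, and real provenance models are far richer (graphs of transformations, timestamps, signatures).

```python
# Sketch of provenance propagation as set-valued annotations: each derived
# item carries the union of its inputs' provenance, and a query can filter
# on it (e.g., "only information traceable to selected organizations").

def derive(name, *inputs):
    """Combine items; the result's provenance is the union of the inputs'."""
    prov = set().union(*(i["prov"] for i in inputs))
    return {"name": name, "prov": prov}

uav    = {"name": "uav_image",    "prov": {"org_A"}}
report = {"name": "field_report", "prov": {"org_B"}}
fused  = derive("fused_assessment", uav, report)

def traceable_to(item, allowed):
    """True iff every provenance entry comes from an allowed organization."""
    return item["prov"] <= allowed

print(sorted(fused["prov"]))                      # union of both sources
print(traceable_to(fused, {"org_A", "org_B"}))    # True
print(traceable_to(fused, {"org_A"}))             # False: org_B not allowed
```

Even this toy model shows why provenance grows along the pipeline (the union only accumulates) and why it matters for credibility: the filter fails the moment any input's origin falls outside the allowed set.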
Key Research Questions
The overarching research problem can be summarized as follows: How does provenance affect
credibility? What are appropriate provenance models? How can trust in distributed decision
Prior Work
The prior work in this area includes the computational and mathematical cognitive models
developed in information foraging theory (Fu & Pirolli, 2007, Pirolli, 2007, Pirolli & Card,
1999), models of human interaction with information visualizations (Budiu, Pirolli, Fleetwood
& Heiser, 2006; Pirolli, Card & Van Der Wege, 2003), and models of human category formation
while interacting with exploratory information browsers (Pirolli, 2004). All of these models deal
with people making or learning utility judgments based on the relevance of information.
Information foraging theory (Pirolli, 2007) models how human semantic memory is involved in
assessing the relevance of snippets of information, such as link text on a Web page, and how that
is used to assess the utility of various information gathering actions. This approach has been
extended somewhat to modeling interaction with highly interactive information visualizations
Technical Method
Some of the observed cues about content (e.g., as presented in a user interface) are inferred by
people to be manifestations of latent topics that the information is about (the semantic gist), but
for our research we will now additionally assume that other cues, more specifically, information
provenance visualizations and people profiles, are inferred to be manifestations of latent
categories about consistency, expertise, etc. that will be involved in credibility judgments.
We will develop theory and models of human cognition with a focus on credibility assessments
in individual human information-seeking, information-monitoring, and information-propagation
decisions. Our model development will involve the rational analysis methodology that has
gained currency in cognitive science: (1) precise specification of the computational goal, (2)
formal analysis of the structure of the task/information environment, (3) specification of an
optimization approach assuming minimal cognitive costs, (4) tests against data, (5) iteration of
steps 1-4, and (6) specification as mechanisms in a cognitive computational architecture (e.g.,
ACT-R).
In recent years in the cognitive sciences, the optimality analysis of steps 1-4 has derived from
Bayesian approaches, which is the approach we intend to take, building upon our work in
information foraging theory and theories of human category induction. Many recent theories of
human category formation bear similarity to Latent Dirichlet Allocation (LDA) and topic
modeling in machine learning.
The LDA generative model is a three-level hierarchical Bayesian model. An LDA topic model
assumes an inferred latent structure that represents the gist of a set of discrete information
items (e.g., documents or messages) as a probability distribution g over some T topics. Each
topic is, itself, a probability distribution over discrete cues (e.g., words), and a cue can be
associated with multiple topics. The generative statistical process first selects the gist
distribution g for an information item; for the ith cue token occurring in the item, the topic z_i
is selected conditionally on the gist distribution, and the cue w_i is selected conditionally on
the topic z_i. Thus, in essence, P(z|g) reflects the prevalence of topics within an information
item, and P(w|z) reflects the prevalence of cues within a topic. The probabilities of an LDA
model can be estimated using Gibbs sampling.
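The generative process just described can be sketched directly. The topics and vocabulary below are toy placeholders, and the inference step (Gibbs sampling) is not shown.

```python
# A minimal sketch of the LDA generative process: a gist (topic
# distribution) g is drawn per item, then each cue token is drawn by
# first sampling a topic z_i from g and then a cue w_i from that topic's
# cue distribution.
import random

random.seed(0)

# Each topic is a probability distribution over cues (words); toy data.
topics = {
    0: {"convoy": 0.5, "route": 0.3, "bridge": 0.2},
    1: {"market": 0.4, "crowd": 0.4, "vendor": 0.2},
}

def sample_dirichlet(alpha, k):
    """Draw a k-dimensional Dirichlet sample via normalized gammas."""
    draws = [random.gammavariate(alpha, 1.0) for _ in range(k)]
    s = sum(draws)
    return [d / s for d in draws]

def generate_item(n_tokens, alpha=1.0):
    g = sample_dirichlet(alpha, len(topics))            # gist over topics
    tokens = []
    for _ in range(n_tokens):
        z = random.choices(list(topics), weights=g)[0]  # topic z_i ~ g
        cues = topics[z]
        w = random.choices(list(cues),                  # cue w_i ~ topic z_i
                           weights=list(cues.values()))[0]
        tokens.append(w)
    return g, tokens

gist, item = generate_item(8)
print(item)   # eight cue tokens drawn through the two-stage process
```

Inverting this process, i.e., recovering g and the topic assignments from observed tokens, is exactly what the Gibbs sampling estimation mentioned above does.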
Validation Method
The modeling efforts will be driven by experimental studies that manipulate human-information
interaction techniques in ways that are theoretically predicted to affect individual human
cognitive computations involved in judging and processing incoming information as well as
predicted to affect communication of information, social action, and task performance. These
empirical studies will build upon PARC's data mining and user interface experience with
systems such as micro-blogging, wikis, social tagging, RSS feeds, and email. The experiments
will involve meaningful scenarios with decision-making and prediction tasks that require
utilization of information recommended by others and the stated opinions of others. While these
experiments do not simulate a military mission, we believe that the setup provides a rich
environment for the development of cognitive models of trust that will be applicable to many
military missions.
Research Products
This research will produce a report on initial specification (e.g., in ACT-R) of a credibility
judgment model and empirical predictions of the effects of variations in information provenance
and the personal reputation of the information sources on credibility assessments. We will
develop an ecologically valid task scenario and an experimental test harness involving PARC's
experimental interfaces to fused information from streaming content, aiming to test the model
predictions, and obtain preliminary results from executing the experiment developed in this
task, including tests of the model against the data collected in these experiments.
Subtask T1.3.b: Cognitive science basis for human trust during interactions with human
versus nonhuman agents (W. Gray, RPI (SCNARC); M. Schoelles, RPI (SCNARC))
Figure 3: Example of ERP analysis of P300 response during play of a complex video game. Figure taken from
ONR Workshop presentation of Gray and Anderson (2009).
Research Products
Q2: Transformation of Argus (Schoelles & Gray, 2001) into Argus-Army; investigation of the
ability to synchronize EEG collected from two humans at either a single location or multiple
locations (taking advantage of the speeds offered by Internet2 technology).
Q3: Pilot testing of the experimental setup, including Argus-Army and synchronization of EEG
data.
Q4: Collection of data from teams of pure-human and mixed human-agent players, and
preliminary analyses of the data, with priority going to analyses of the ERP data.
Collaborations
SCNARC Task 3 focuses on the cognitive social science of Net-Centric interactions. As such it
seeks to investigate topics inherent to the social sciences from a perspective informed by modern
theories, models, and neuroscience methods of cognitive science. Focusing on the social science
construct of human trust is part of the larger agenda of Task 3 and may have the effect of helping
to advance the agenda of the Trust CCRI. In addition, many or most of the developments of this
initial effort will be reused in SCNARC Task 3 in further studies of trust and to examine other
social science constructs that are relevant to Net-Centric interactions.
References
J.R. Anderson, C.S. Carter, J. M. Fincham, Y. Qin, S. M. Ravizza & M. Rosenberg-Lee (2008).
Using fMRI to Test Models of Complex Cognition. Cognitive Science, 32(8), 1323-1348.
D. Blei, A. Ng & M. Jordan (2003). Latent Dirichlet allocation. Journal of Machine Learning
Research, 3, 993-1022.
R. Budiu, P. Pirolli, M. Fleetwood & J. Heiser (2006). Navigation in Degree of Interest Trees. In
Proceedings of AVI 2006, Venezia, Italy.
R. Carrión, J.P. Keenan & N. Sebanz (2010). A truth that's told with bad intent: An ERP study of
deception. Cognition, 114(1), 105-110.
W.-T. Fu & W.D. Gray (2006). Suboptimal tradeoffs in information seeking. Cognitive
Psychology, 52(3), 195-242.
W. Fu & P. Pirolli (2007). SNIF-ACT: A model of user navigation on the World Wide Web.
Human Computer Interaction, 22(4), 355-412.
P. Pirolli (2007). Information foraging: A theory of adaptive interaction with information. New
York: Oxford University Press.
P. Pirolli & S. K. Card (1999). Information foraging. Psychological Review, 106, 643-675.
P. Pirolli, S. K. Card & M. M. Van Der Wege (2003). The effects of information scent on visual
search in the Hyperbolic Tree Browser. ACM Transactions on Computer-Human Interaction,
10(1), 20-53.
J. D. Rudoy & K.A. Paller (2009). Who can you trust? Behavioral and neural differences
between perceptual and memory-based influences. Frontiers in Human Neuroscience, 3.
M. J. Schoelles & W. D. Gray (2001). Argus: A suite of tools for research in complex cognition.
Behavior Research Methods, Instruments, & Computers, 33(2), 130–140.
M. Wibral, G. Turi, D.E.J. Linden, J. Kaiser & C. Bledowski (2008). Decomposition of working memory-
related scalp ERPs: Crossvalidation of fMRI-constrained source analysis and ICA. International Journal
of Psychophysiology, 67(3), 200-211.
Recently, IARPA has started a new initiative on TRUST, which aims to develop protocols to
measure trust between individuals based on neurological, psychological, behavioral, and
physiological inputs. The main aim of that work is to measure trust in face-to-face encounters of
dyads or small groups of people, which complements the NS CTA's emphasis on networks,
technology-driven interactions, and large groups. Both the CTA and IARPA efforts are firmly
grounded in social trust. There is a great deal of synergy between this approach and Task T1.3,
and we are already initiating discussions with the researchers interested in this work to seek
possible collaborations and knowledge transfer with the projects that will emerge under this new
program.
Budget By Organization
D. Hachen, ND (SCNARC)
O. Lizardo, ND (SCNARC)
Z. Toroczkai, ND (SCNARC)
4.7.6 Task T2.1: Interaction of trust with the network under models of trust as a
risk management mechanism (C. Lim, RPI (SCNARC); A. Goel, Stanford
(CNARC); R. Govindan, USC (CNARC); W. Wallace, RPI (SCNARC);
Collaborators: S. Adali, RPI (SCNARC); G. Korniss, RPI (SCNARC); D.
Parkes, Harvard (IRC); S. Parsons, CUNY (SCNARC); M. Srivatsa, IBM
(INARC); M. Wellman, UMich (IRC))
Task Overview
This task concentrates on developing the science needed to study economic models that represent
and quantify trust in a network. Many military operations, such as counterinsurgency (COIN)
and irregular warfare (IW), require soldiers to interact with network nodes (communication,
information, or social) at various levels of trust and with possibly different values and objectives.
In these networks, nodes are decision makers that initiate interactions to achieve a possible gain.
In some cases the interactions require cooperation between two nodes (as when two people
interact with each other), and in other cases the interactions take the form of flows, where one
node simply decides to pass information to another node or block it (as when a decision maker
receives information and processes it). In these settings the gain can be thought of as a change in the
Validation Approach
Each of the following aspects, all vital to the research program proposed here, will have to be
theoretically deduced as theorems and/or verified empirically: (a) incomplete markets, (b)
non-monetary price mechanisms, (c) self-financing non-monetary budgetary constraints, and (d)
the existence of an equalizing discount factor and uniform risk-neutral or martingale measures.
Only the minimum set of a priori assumptions will be made concerning these basic notions. In
this respect the work by other components of SCNARC, INARC and CNARC will be useful.
Particularly, the empirical work by (Hui, Magdon-Ismail, Goldberg and Wallace, 2009) on
natural disaster trust mechanism, the cognitive perspective on Trust provided by Task T1.3, and
the general equilibrium microeconomic models posited by members of the IRC, will positively
contribute to the validation efforts of our theoretical models. As part of the validation of first-
order models in this task, which combines the effects of stochastic dynamic optimization and
changes in the underlying social network, we will carefully check that various realistic scenarios
of local short-time changes in the social network are indeed suitably modeled by stochastic
simulation of sensing ranges. As we extend this simple model to other, more complex social-
cognitive networks, the notion that asset price provides an objective metric for the current
usefulness/correctness of the data collected by a given agent is part of a powerful coarse-graining
or aggregation methodology that has served the social-political-economic sciences well (Dixit),
and it is expected to provide reliable metrics for information collection tasks in the current
social-cognitive contexts. In this direction we will use the Naming Game algorithm (Lu, Korniss,
Szymanski, 2008) as an efficient diagnostic and validation tool to discover underlying persistent
community-structure in a given dynamic social network. To gain further insight and construct
network-influenced trust metrics, we will also investigate fundamental models for reinforcement
learning (Arthur, 1994; Challet and Zhang, 1997) on dynamic co-evolving networks. In
particular, we will investigate the emergence of leadership structure influenced by an evolving
trust landscape (Anghel et al., 2004; Kirley, 2006).
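As a concrete illustration of how the Naming Game can reveal persistent community structure, the sketch below implements the basic speaker-hearer rules on a fixed interaction graph. This is a minimal sketch under our own assumptions (the function name, invention rule, and update details are ours); the variant used in (Lu, Korniss, Szymanski, 2008) is more elaborate.

```python
import random

def naming_game(edges, n_agents, steps, seed=0):
    """Minimal Naming Game on a fixed graph.

    Each step, a random edge supplies a speaker and a hearer. The speaker
    utters a word from its inventory (inventing one if empty). On success
    (the hearer knows the word) both collapse to that word; on failure the
    hearer adds it. Tightly knit groups tend to converge on a shared word,
    so surviving distinct words hint at persistent communities.
    """
    rng = random.Random(seed)
    vocab = [set() for _ in range(n_agents)]  # each agent's word inventory
    next_word = 0
    edge_list = list(edges)
    for _ in range(steps):
        speaker, hearer = rng.choice(edge_list)
        if rng.random() < 0.5:                # edges are undirected; pick roles
            speaker, hearer = hearer, speaker
        if not vocab[speaker]:                # invent a new word if inventory empty
            vocab[speaker].add(next_word)
            next_word += 1
        word = rng.choice(sorted(vocab[speaker]))
        if word in vocab[hearer]:             # success: both collapse to the word
            vocab[speaker] = {word}
            vocab[hearer] = {word}
        else:                                 # failure: hearer learns the word
            vocab[hearer].add(word)
    return vocab
```

Running this on two dense clusters joined by a single bridge edge typically shows each cluster settling quickly on its own name, with the bridge nodes holding both words for a long time, which is the diagnostic signal for community structure.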
For area (2), we will validate our findings by (a) making sure that the simulation and formal
studies support each other, and (b) equally importantly, consulting experts in the area of proxy
and circumvention, and possibly State Department officials, to ensure that we are on the right
track.
Research Products
Further issues concerning the dissemination of information after its efficient collection and
trustworthiness have been established will be related to the work of the IN and CN-ARCs where
the different time and spatial scales involved in the underlying dynamic socially-stratified
References
Task Overview
Trust is defined as a relationship between a trustor and a trustee. It enables interactions between
nodes: the more trust there is between two nodes, the more likely they are to interact with each
other. In addition, one can argue that even in cases where trust did not exist at first, a series of
interactions between nodes may cause trust to emerge over time. The key idea then is that
various network behaviors and patterns can be indicative of the presence (or absence) of trust, or
situations in which trust may be forming (or eroding). These two processes are interdependent:
trust impacts network behavior, and network behaviors can impact trust. This signals not a cycle
but an ongoing process in which interactions within social networks are intertwined with the
evolution of trust relations between social actors.
Social networks have a tendency to exhibit triangular closure and transitivity, such that if A
interacts with B and B interacts with C, then it is very likely that A will interact with C (Davis
1963; Granovetter 1973). This clustering can occur through a variety of mechanisms, such as in
functional groups (i.e., a task-directed project) (Feld 1982; Uzzi and Spiro 2004), because of
social balance processes (Davis 1963; Hummon and Doreian 2003), or even affinity/similarity
(McPherson, Smith-Lovin and Cook 2001). The end result is clusters or islands within social
networks, in which social interactions, including information flows, are more likely to occur within
communities and much less likely to occur between communities unless there are "bridges"
connecting communities (Granovetter 1973; Watts and Strogatz 1998). Communities, and more
generally the topology of social networks, are created through the formation and dissolution of
social ties (Burt 2000; Wellman et al 1997; Feld, Suitor and Hoegh 2007). To some extent social
tie formation and dissolution is endogenous, i.e., a function of node traits and social processes
within a social network/group such as homophily, proximity, task directives, and norms of
reciprocity (Toivonen et al 2009). But tie formation and dissolution can also be impacted by (1)
information networks (and, most importantly, the information flows) through influence processes
(A persuades B to be more like A) and (2) the tools and configurations of communication
systems designed to facilitate information transfer and processing. Just as important, and maybe
more important, is how information networks (and their flows) and communication networks
impact social network interactions and evolution (Doreian 2002). This is clearest in the case of
limited bandwidth resulting in actors who want/need to communicate but are unable to do so
(Aral and Alstyne 2007). Communication networks have to evolve with social networks and the
information transfer needs they generate; otherwise a communication network's design and
structure is likely to inhibit interaction rather than facilitate it. What is needed, then, are smart
communication networks, i.e., networks that are aware of the interactions occurring within them
and can modify themselves (e.g., switch communication between bands) based on those
interactions.
Key Research Questions
What are the important network topological and flow (interaction) characteristics in social,
information and communication networks that are indicative of the existence or lack of trust?
Which network features and dynamic characteristics of social, information and communication
networks affect the emergence or dissolution of trust?
How do we quantify the impact of network features and dynamics on trust (using the metrics
from T1) and provide guidelines for altering network features in order to enhance trust (feeding
to T3)?
Initial Hypotheses
Social Networks: At the dyadic level, four properties are crucial to understanding social tie
evolution and therefore the extent to which trust exists between two people: reciprocity, tie
strength, homophily (node similarity) and embeddedness (the extent to which connected nodes
share common neighbors).
Communication Networks: It is possible to derive statistical features that characterize the
network elements (nodes, links, paths) that are impacted by network dynamics due to security,
mobility, and social networking. These statistical features can be used to improve the network
architecture and protocols to compensate for performance degradations due to variations in
network dynamics.
Prior Work
Prior work in this area considers the dynamics of reciprocity in social networks in different
contexts (Garlaschelli and Loffredo, 2004; Gouldner, 1960; Hallinan, 1978; Hallinan, 1980;
Hammer, 1985; Mandel, 2000; Skvoretz and Agneessens, 2007; Zamore-Lopez et al. 2008). Our
work extends this work to large-scale communication networks (Hachen et al. 2009a, 2009b)
and complex networks in general (Barrat et al., 2004). The impact of mobility on social
networks has been studied by Eagle, Pentland and Lazer (2008, 2009). We intend to extend this
work to issues of reciprocity (Hachen et al., 2009a, 2009b) and social influence (Hachen and
Davern, 2006).
In a preliminary effort, we have characterized the probability distribution function (PDF) of the
received signal strength (RSS) and its variations in wireless links with respect to node mobility
(Govindan et al., 2010). A closed-form expression has been derived to characterize the PDF and
relate it to a set of mobility patterns. In a recent feature in Nature (Gammon, 2010), Professor
Wu introduced the idea of integrating social networking concepts (from the application layer)
into the foundations of network design. Instead of provisioning global connectivity (the current
mantra of the Internet), connectivity could be established based on social interactions and
relationships. This approach emulates normal human communication behavior, and various
security threats can be eliminated by this fundamental design philosophy. In addition, this
approach would also affect the network dynamics. For example, a node would no longer have a
unique identity; it would have a contextual social identity, and information would be routed
based on that contextual social identity.
Technical Approach
Social Networks
One of the main problems we will study is the evolution of reciprocity in dyadic relationships
within social networks and its relationship to trust. We expect that most ties begin in a more
non-reciprocal (unbalanced) state, and that over time either the tie dies, as a lack of reciprocation
by one node (A) leads the other node (B) to decrease initiating interactions with that node, OR
the tie evolves into a more reciprocal one as node A initiates more interactions with node B. We
will examine whether these are the only two evolutionary paths, or whether a stable
non-reciprocal relationship can also emerge and, if so, under what conditions.
Essential to modeling the evolution of social networks is directional weighted network data
indicative of the extent to which a social actor initiates interaction with another social actor. We
define these weights as m_ij, the count of the number of interactions (e.g., communication
events) initiated by i and directed towards j. Reciprocity is then defined as the level of balance in
a dyad, i.e., m_ij ≈ m_ji. The strength of a tie (edge) can be defined as the geometric mean of the
weights, i.e., sqrt(m_ij * m_ji). An important type of similarity is degree assortativity, the
similarity between the degrees d_i and d_j, where d_i is the out-degree of node i, e.g., the number
of people with whom node i communicates. Embeddedness of a tie can be defined as the number
of neighbors that i and j have in common. There are both weighted and non-weighted versions of
this common-neighbor measure.
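These dyad-level quantities are straightforward to compute from directed interaction counts. The sketch below assumes counts stored in a dict keyed by ordered node pairs; since the text defines reciprocity only as the degree of balance between m_ij and m_ji, the normalized balance formula here is our own illustrative choice, as are the function and variable names.

```python
from math import sqrt

def dyad_measures(m, i, j, neighbors):
    """Dyad-level measures from directed interaction counts m[(i, j)].

    Returns (reciprocity, tie strength, embeddedness). Reciprocity is a
    normalized balance in [0, 1] (1 = perfectly balanced), strength is the
    geometric mean of the two directed weights, and embeddedness is the
    number of shared neighbors.
    """
    mij = m.get((i, j), 0)
    mji = m.get((j, i), 0)
    total = mij + mji
    reciprocity = 1 - abs(mij - mji) / total if total else 0.0
    strength = sqrt(mij * mji)                       # geometric mean of weights
    embeddedness = len(neighbors[i] & neighbors[j])  # shared neighbors
    return reciprocity, strength, embeddedness
```

A fully one-sided dyad (all messages in one direction) scores 0.0 on both reciprocity and strength, matching the intuition that unreciprocated ties are weak.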
The first stage of the research focuses on reciprocity and its correlates. Preliminary analyses
show that degree-dissimilar ties are almost always reciprocal, indicating that non-reciprocal
degree-dissimilar dyads tend not to persist. However, among dyads in which d_i ≈ d_j we find
both reciprocal and non-reciprocal ties, indicating that non-reciprocal ties can persist when the
nodes are similar.
The second stage involves developing algorithms and simulations to predict how newly formed
ties (which tend to be non-reciprocal) become reciprocal over time. We will examine the role
that homophily (including degree assortativity, but also similarity in terms of age, gender, and
residential location), and embeddedness play.
The third stage focuses on tie persistence, or the longevity of a tie. The continuous-time hazard
rate of a tie dissolving in the next moment of time, h(t) = lim_{Δt→0} [R(t) - R(t + Δt)] / [Δt R(t)],
where R(t) is the probability that a tie survives to time t, can be modeled as a function of time
itself (new ties have high failure rates, and failure rates decline with a tie's age).
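A discrete-time analogue of this hazard rate can be estimated directly from observed tie lifetimes: among ties that have survived to age t, the fraction that dissolve during interval t. The life-table-style estimator below is a generic sketch of our own, not the modeling approach of the task itself.

```python
def empirical_hazard(lifetimes, max_age):
    """Discrete-time hazard estimate from tie lifetimes.

    `lifetimes` gives the age (in intervals) at which each tie dissolved.
    For each age t, the hazard is the fraction of ties still alive at t
    that dissolve during interval t.
    """
    hazard = []
    for t in range(max_age):
        at_risk = sum(1 for L in lifetimes if L >= t)  # ties alive at age t
        failed = sum(1 for L in lifetimes if L == t)   # ties ending at age t
        hazard.append(failed / at_risk if at_risk else 0.0)
    return hazard
```

Plotting this estimate against age would show the expected pattern directly: high hazard for young ties, declining hazard as ties age.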
The fourth stage involves modeling change over time in the information flows and levels of
interaction between communicating nodes (the m_ij's) using change and growth models. We will
investigate the extent to which growth/decline in flows is a function of the level of reciprocity
already evident in the dyad, as well as the similarity between nodes. Also, recalling
Granovetter's strength-of-weak-ties thesis that non-redundant (new) information is more likely to
arrive at a node (A) from a neighbor that is not connected to any of node A's other neighbors, we
intend to examine whether and how changes in flows are affected by embeddedness.
The final stage will involve disaggregating data so that instead of looking at counts of directional
interactions and their changes over discrete periods of time we will focus on the events
themselves and their sequencing. We will examine different types of dyadic communication
patterns or motifs, such as conversations that entail back and forth messages and sequences
indicative of the propagation of information (e.g., A calls B, and then B calls C). While
conversations do not necessarily signal trust relationships, they may be indicative of a
relationship that will evolve into trust. Purposeful forwarding of information from one node to
another can be interpreted as a trust vote on the content and its sender. Discovering through
textual analysis that a conversation on a specific topic is taking place may be too costly for
graphs involving millions of nodes. As a result, our aim will be to first develop methods to
discover and analyze these behaviors based solely on statistical analyses of the timing and
sequencing of these actions. We will extract relationships such as A trusts B (a directed link), or
there is a mutual trust relationship between A and B (an undirected link), from this graph.
Considering conversations first, we postulate that if two nodes converse, then they are more
likely to trust each other, and that a prolonged conversation reinforces this conclusion. We first
propose to partition the message exchanges into conversations based on the times between two
exchanges. The strength of the conversational trust indicator Tc between A and B is then given by:
where H(Ci) is a measure of the balance in the conversation Ci. One possible measure of balance
is based on entropy, H(Ci) = -p log2(p) - (1 - p) log2(1 - p), where p is the fraction of the
messages in the conversation sent by A.
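The two ingredients described above, gap-based conversation partitioning and an entropy-based balance measure, can be sketched as follows. The gap threshold, function names, and the assumption of a time-ordered message list are our own illustrative choices, and the exact combination of the H(Ci) values into Tc is not reproduced here.

```python
from math import log2

def conversations(times, gap):
    """Split a non-empty, time-ordered message sequence into conversations:
    a new conversation starts whenever the inter-message gap exceeds `gap`."""
    convs, current = [], [times[0]]
    for t in times[1:]:
        if t - current[-1] > gap:
            convs.append(current)
            current = [t]
        else:
            current.append(t)
    convs.append(current)
    return convs

def balance_entropy(p):
    """Binary entropy of the fraction p of messages sent by A:
    1.0 for a perfectly balanced conversation, 0.0 for a one-sided one."""
    if p in (0.0, 1.0):
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)
```

A back-and-forth exchange (p near 0.5) scores near 1.0, while a stream of unanswered messages scores 0.0, so one-sided "conversations" contribute no balance evidence.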
Communication Networks
The primary causes of dynamics from a communication network viewpoint are variations in
channel quality, failures of nodes and links, and node mobility. In addition, security attacks or
perceived threats create adversarial dynamics. Changes and variances in application
requirements and in social contexts also cause network dynamism, and social relationships create
both spatial and temporal dynamism in the network. These causes of network dynamics affect
the paths and links through which information is transferred through the network. Changes in
paths translate into a change in the set of nodes and links, each with various degrees of
trustworthiness associated with it. They also affect trust factors such as provenance, credibility,
security, and confidentiality, among other aspects.
We propose to first derive a set of statistical features such as various distribution functions that
can characterize the network elements (nodes, links, paths) that are impacted by the network
dynamics. For example, in our preliminary effort, we have derived closed form expressions for
the probability distribution function of the variation in link qualities due to various node mobility
patterns in wireless networks. We propose to derive similar statistical characterizations for
link-quality variations due to interference, and for topological variations due to node/link
failures. Deriving the correlation between social dynamics and network characteristics in the
statistical domain will be an interesting task that will require strong collaboration between
SCNARC and CNARC researchers.
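As a simple illustration of what such a statistical characterization looks like in practice, a histogram-based density estimate of link-quality variation (e.g., deltas between successive RSS measurements) might be computed as below. This is a generic sketch of our own, not the closed-form derivations cited above.

```python
def empirical_pdf(samples, n_bins, lo, hi):
    """Histogram-based estimate of the PDF of a link-quality statistic.

    Returns per-bin densities over [lo, hi); the densities integrate to 1
    over the range. Values at or above `hi` land in the last bin.
    """
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        k = min(int((x - lo) / width), n_bins - 1)
        counts[k] += 1
    total = len(samples)
    return [c / (total * width) for c in counts]  # density, not raw counts
```

Comparing such empirical densities against the derived closed-form PDFs is one direct way to validate the statistical characterizations proposed here.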
The correlation between these statistical characterizations and their impact on various trust
factors will be the second focus of our study under this task. We will analyze how the statistical
characterizations of link-quality variations affect changes in topology, and thereby the
provenance and credibility of information flowing through the network.
Combining these two lines of study will provide insight into the correlation between network
characteristics and trust. We will use metrics developed in Project T1 to quantify the
correlations, and the inferences derived from these correlations will help in the trust propagation
designs envisioned in Task T3.1.
Validation Approach
The reciprocity evolution and persistence hypotheses will be validated using cellular telephone
network data on the calling and texting patterns of over 7 million subscribers of one cellular
telephone company over an extended period of time. With these data we can identify who
We will test the reciprocity and trust relationship by studying communications on Twitter. We
will extract conversation and behavior relationships from the Twitter data set and test whether
they represent similar or different types of relationships. We will then test whether our statistical
measures capture the relationships induced from actual propagation behavior, which will be used
as a proxy for trust. As trust is a founding element of communities, we will examine the
communities formed by trusting relationships and test the degree to which communities formed
by different types of relationships are similar to or different from each other.
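One simple way to test whether communities formed by different relationship types coincide is to match each community from one partition to its most similar counterpart in the other, for example by Jaccard similarity. The sketch below is our own illustrative choice of comparison, not a method specified by the task.

```python
def jaccard(a, b):
    """Jaccard similarity of two node sets (0.0 for two empty sets)."""
    return len(a & b) / len(a | b) if a | b else 0.0

def best_matches(comms_a, comms_b):
    """For each community in partition A, the Jaccard score of its
    closest community in partition B (1.0 = identical community)."""
    return [max(jaccard(c, d) for d in comms_b) for c in comms_a]
```

Uniformly high scores would indicate that the two relationship types induce essentially the same community structure; low scores for some communities would flag groups held together by one kind of relationship but not the other.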
From the communication perspective, we are in the process of collecting a large set of
experimental data from our existing QuRiNet (Wu and Mohapatra, 2010) to validate the
proposed technical approaches. QuRiNet is a unique test-bed in the sense that it can be
configured for various sensitivity analyses. Large scale experiments can be conducted using
QuRiNet to validate the statistical behaviors derived in the proposed study. The network
topology can be varied and the path of information flow can be changed to generate input data
for the validation model. In parallel, a large amount of experimental data can be analyzed to help
design better models to capture the impact of network dynamics on the reliability of the network.
Research Products
The major product that will come out of this research is a series of papers designed to advance
the theoretical understanding of how reciprocity develops in dyadic relationships. Theories about
how reciprocity emerges and the role of reciprocity in social networks are underdeveloped
because until recently researchers did not have longitudinal weighted edge data with which they
could assess the degree of reciprocity in a dyad and changes in that degree over time. Analysis
and models of the determinants of reciprocity will help us understand tie persistence and decay,
given that we expect more reciprocal ties to persist. Advancing theories about reciprocity and
balance within dyads through this research will then provide the foundation for exploring how
reciprocity (balanced flows) is related to other flow aspects, in particular the redundancy of
information and the amount and quality of the information that is being conveyed.
References
S. Aral and M. Van Alstyne (2007). "Network Structure & Information Advantage." Proceedings
of the Academy of Management Conference, Philadelphia, PA.
A. Barrat, M. Barthelemy, R. Pastor-Satorras & A. Vespignani (2004). "The architecture of
complex weighted networks." Proceedings of the National Academy of Sciences, 101(11),
3747-3752.
The goal of this project is to determine how to propagate the trust meta-data so that changes in
evidence can be easily understood and accounted for, and to assess how changes in the network
can impact trustworthiness of participating nodes. For instance, sudden differences in
3
We can use this "ground truth" information to validate social and cognitive trust models developed in other CTA
activities, notably T1.1 (Golbeck), T1.3 (Pirolli et al), and T2.2 (Faloutsos).
Budget By Organization
Table of Contents
5 CCRI EDIN: Evolving Dynamic Integrated (Composite) Networks .................................... 5-1
5.1 Overview ......................................................................................................................... 5-3
5.2 Motivation ....................................................................................................................... 5-4
5.2.1 Challenges of Network-Centric Operations ............................................................. 5-4
5.2.2 Example Military Scenarios ..................................................................................... 5-5
5.2.3 Impact on Network Science ..................................................................................... 5-6
5.3 Key Research Questions ................................................................................................. 5-8
5.4 Technical Approach ........................................................................................................ 5-9
5.5 Project E1: Ontology and Shared Metrics for Composite Military Networks .............. 5-11
5.5.1 Project Overview ................................................................................................... 5-11
5.5.2 Project Motivation ................................................................................................. 5-11
5.5.3 Key Project Research Questions ............................................................................ 5-13
5.5.4 Initial Hypotheses .................................................................................................. 5-13
5.5.5 Technical Approach ............................................................................................... 5-13
5.5.6 Task E1.1: Harmonized Vocabulary and Ontology for Composite Network
Modeling (J. Hendler, RPI (IRC, SCNARC); C. Partridge, BBN (IRC); A. Singh, UC Santa
Barbara (INARC); A. Bar-Noy, CUNY (CNARC); D. Dent, Cpt. S. Shaffer, and A. Swami,
ARL) 5-14
5.1 Overview
The EDIN CCRI will focus on the modeling and mathematical representation of dynamic,
composite networks composed of social, information, and communication networks, and on how
stimuli, both internal (within a specific type of network) and external (from a different network
type), impact the dynamic co-evolution of the composite network structure. EDIN will also
investigate a particular type of network dynamics, namely mobility, which is particularly
important in the military networks context.
A basic tenet of Blue Forces Network-Centric Operations (NCO) is that the mission
effectiveness of networked forces can be improved by information sharing and collaboration,
shared situational awareness, and actionable intelligence. The effectiveness of such networks is
dependent on our ability to accurately anticipate the evolution of the structure and dynamics of
social-cognitive (SCN), information (IN) and communication (CN) networks that are constantly
influencing each other. While partly controlled by design and by TTPs (Tactics, Techniques and
Procedures), these networks also exhibit complex emergent restructuring and dynamics that
depend on mission context and the adversarial environment.
The objective of a tactical network may be viewed as delivering the right information at the right
time to the right user (persons, applications, and systems) so as to enable timely and accurate
decision-making (e.g., to enable the soldier to effectively "shoot, move, and communicate"), and
therefore, mission success. The tactical network is composed of multiple interacting networks:
communications networks, information networks, and command-and-control or social networks.
Understanding the structure of the component networks and the dynamics therein, and of the
dynamic interactions between these networks is crucial to the design of robust interdisciplinary
(or composite1) networks which is one of the primary goals of the NS CTA program.
Understanding the evolution and dynamics of a network entails understanding both the structural
properties of dynamic networks and understanding the dynamics of processes (or behaviors) of
interest embedded in the network. Typically, the dynamics of network structure impacts certain
processes (e.g., how information propagates through the network); but at the same time the
dynamics of processes (or behaviors) may result in alteration of network structures. Therefore,
gaining a fundamental understanding of such relationships under several kinds of network
dynamics is of paramount importance for obtaining significant insights into the behavior and
evolution of complex military tactical networks as well as adversarial networks.
1
We use the terms interdisciplinary and composite interchangeably in the context of EDIN.
The driving goal of EDIN research is to understand structure and dynamics of co-evolving
composite networks leading to predictive models that enable control of composite networks.
The complexity of the evolving dynamics of composite networks and its importance to Network-
Centric Operations and the Army mission require a composite CCRI research plan that exploits
and unifies a wide range of approaches.
In the subsections below, we describe in detail four complementary projects that involve tight
collaboration between CNARC, INARC, IRC, and SCNARC to study different aspects of
dynamic evolving networks while addressing the common theme.
Project E1: Ontology and Shared Metrics for Dynamic Composite Networks
This project is focused on developing a shared vocabulary and ontology across social,
information, and communication networks. Specifically, it will identify the entities in a
composite network and their attributes; the relationships between them and how those
relationships affect network formation; and the metrics that need to be defined across composite
networks irrespective of their representation structure (this will include metrics relevant to
tactical missions that use all three networks).
5.5.6 Task E1.1: Harmonized Vocabulary and Ontology for Composite Network
Modeling (J. Hendler, RPI (IRC, SCNARC); C. Partridge, BBN (IRC); A. Singh,
UC Santa Barbara (INARC); A. Bar-Noy, CUNY (CNARC); D. Dent, Cpt. S.
Shaffer, and A. Swami, ARL)
Task Overview
The primary goal of this task is to model a composite network. This involves modeling it at the
communication, information, and social networking levels. We define a composite military
network (CMN) as consisting of one or more interconnected networks composed from each of
the three disparate types of networks that interact with and typically influence each other (and
military operations); these are communications networks (CN), information networks (IN) and
social cognitive networks (SCN). In this task, we will identify the fundamental entities that
constitute CMNs.
Task Motivation
We believe that a dictionary or handbook of network science will be needed if the composite
military networks we describe in this section are to be modeled and if those models are to inform
the eventual military need to optimize our network infrastructures, to protect our networks
against attack, and to enable us to find minimum effort means to upset enemy networks,
especially those used by non-nation-state actors.
Key Research Questions
How do we model the structure of a composite network so as to delineate the specific
dependency relationships between structural entities at the SCN, IN, and CN levels?
How do we represent the dynamics of these composite networks as they change over
time?
Initial Hypotheses
OWL will provide a sufficient level of expressivity for expressing composite network
structure. Extensions of OWL will be needed, and designed, to better represent the
dynamic changes in networks over time.
Prior Work
Since dynamic composite networks are a new topic of research, existing modeling approaches
cannot be applied directly to develop a model in a straightforward manner. However, modeling
nodes, edges, and flow information in various classes of networks has received much attention
in the past. Entity-relationship modeling is a database modeling method used to produce a
semantic data model of a system [Chen76]; more recent variants of this exist. Ontology
languages such as the standards SKOS, RDF(S) [Rdfs04], and OWL 2 RL [Owl09] exist for
semantic modeling of complex relationships between concepts. These will be leveraged in this
task. However, these approaches are not directly amenable to a succinct representation of
spatio-temporal dynamics or to probabilistic representation of various network attributes. Hence,
new
Node entities in each network type:
SCN: entities with intelligent decision-making capabilities (humans, robots)
IN: information objects (concepts, documents, content metadata)
CN: devices that can source, sink, store, and forward data packets
2
In principle these relations can be n-ary. Mathematically, n-ary networks can be reduced to binary ones by the
introduction of redundant or specially-labeled arcs. In practice, however, visualization and analysis tools may need
to include representations of the n-ary relations. Note that a relation (edge) can have continuous-valued attributes.
Having modeled relationships between nodes inside each type of network, we will focus on
modeling the relationships between nodes, edges, and flows of SCN and IN, IN and CN, and CN
and SCN.
Examples of common types of relationships include:
Mapping relationships between nodes: a (human) node in SCN is mapped to a (wireless
ad hoc) node in CN if the former uses the latter as his communications device; the human
node is mapped to an information node in IN if the former is a consumer of that
information, etc.
Producer/consumer dependency relationships: a node in IN with its producer and
consumer nodes in SCN can be mapped to a flow in CN
Observer relationships: a human operator could be observing/monitoring/controlling a
CN link.
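The mapping relationships listed above can be kept in a simple typed-triple store. The sketch below is illustrative only; all entity and relation names are hypothetical:

```python
# A sketch (entity names hypothetical) of storing the cross-genre mapping
# relationships listed above as typed triples over SCN, IN, and CN entities.
mappings = [
    ("scn:operator1", "uses_device", "cn:radio3"),    # mapping: human -> device
    ("scn:operator1", "consumes",    "in:report42"),  # human consumes info
    ("scn:analyst2",  "produces",    "in:report42"),  # human produces info
    ("in:report42",   "carried_by",  "cn:flow7"),     # producer/consumer flow
    ("scn:operator1", "observes",    "cn:link5"),     # observer relationship
]

def related(entity, relation):
    """Entities linked to `entity` by `relation`, in either direction."""
    out = [t for s, r, t in mappings if s == entity and r == relation]
    out += [s for s, r, t in mappings if t == entity and r == relation]
    return out

print(related("in:report42", "consumes"))   # → ['scn:operator1']
```

A production representation would use an ontology language as discussed above; the triple structure here mirrors the RDF subject-predicate-object pattern.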
5.5.7 Task E1.2: Shared metrics for Composite Military Network Analysis (W.
Leland, P. Basu, BBN (IRC); A. Bar-Noy, CUNY (CNARC); J. Hendler, RPI (IRC,
SCNARC); A. Singh, UC Santa Barbara (INARC))
Task Overview
This task is responsible for the identification of cross-cutting metrics for composite networks in
general and composite military networks in particular.
Task Motivation
Clearly, each of the separate Centers will have to develop core network metrics by which they
can meaningfully measure the properties of the dynamics of the specific kinds of networks that
fall in their specialized areas. While a few of these metrics may be common to the study of all
networks, most are specific to the particular type of network, be it CN, IN, or SCN.

Footnote 3: For example, the Protégé tool was developed for the needs of the bioinformatics
community; TopBraid is based on SWOOP, which was developed under funding from the intelligence
community but is primarily aimed at industrial enterprise needs. In each case, specific
functionalities were added for these "primary customer" groups.
Footnote 4: With this structure of subtasks being pushed down from higher- to lower-echelon
commanders in the military, time is often of the essence. It would therefore be prudent to build
into the network a capability that prevents, or at least identifies, duplicated information
gathering. In other words, if a commander were attempting to gather specific information that,
without his/her knowledge, was already available on the network, the network would find it and
return the data to the commander. This would prevent duplicated effort and give the commander
more time to complete other tasks.
Research Milestones
Budget By Organization
5.6.6 Task E2.1: Unifying Graph Representations of Composite Networks (P. Basu,
I. Castineyra, BBN (IRC); C. Aggarwal, IBM (INARC); A. Bar-Noy, CUNY
(CNARC); A. Singh, X. Yan, UCSB (INARC); R. D’Souza, UC Davis (CNARC); A.
Swami, ARL)
Task Overview
This task aims at developing graph-theoretic representations of composite networks and studying
their basic properties. It also aims at studying summary representations of massive networks,
which enable fast extraction of relevant information.
Task Motivation
Composite networks consist of social, information, and communication networks, each consisting
of constituent entities (nodes, edges, flows, formation rules) that have been summarized
earlier. Since graphs are the most natural representation for each of the network genres, it is
natural to ask how individual graph representations for each network can be "composed" to
represent a composite network. Metrics can then be defined on such "composite graphs", and this
graph representation can be used effectively for various purposes, such as the following:
Distributed search for a piece of information
Determination of critical bottlenecks in the joint CN+IN+SCN space
Prediction of evolution of network attributes
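The second purpose above can be sketched concretely: merge per-genre edge sets into one composite adjacency and find nodes whose removal disconnects a producer from a consumer. The toy entities and topology below are hypothetical:

```python
# A toy sketch (entity names hypothetical) of composing per-genre graphs into
# a single composite graph, then computing one composite metric: nodes whose
# removal disconnects an SCN producer from an SCN consumer.
from collections import deque

layers = {
    "SCN": [("analyst", "aide")],                    # social ties
    "CN":  [("radio1", "radio2")],                   # comms links
}
cross = [("analyst", "report"),                      # SCN -> IN (produces)
         ("report", "radio1"),                       # IN  -> CN (carried by)
         ("radio2", "operator")]                     # CN  -> SCN (delivers to)

comp = {}                                            # composite adjacency
for edges in list(layers.values()) + [cross]:
    for u, v in edges:
        comp.setdefault(u, set()).add(v)
        comp.setdefault(v, set()).add(u)

def reachable(src, dst, removed=frozenset()):
    """BFS path existence in the composite graph, skipping removed nodes."""
    seen, q = {src}, deque([src])
    while q:
        u = q.popleft()
        if u == dst:
            return True
        for w in comp.get(u, ()):
            if w not in seen and w not in removed:
                seen.add(w)
                q.append(w)
    return False

# Critical bottlenecks in the joint space: cut vertices for this flow
bottlenecks = [v for v in comp if v not in ("analyst", "operator")
               and not reachable("analyst", "operator", removed={v})]
print(sorted(bottlenecks))   # → ['radio1', 'radio2', 'report']
```

Note that the bottlenecks span two genres (an IN node and two CN nodes), which is exactly the cross-genre visibility a composite graph provides.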
Research Milestones
Organization     Government Funding ($)   Cost Share ($)
BBN (CNARC)                 44,000
BBN (IRC)                  177,147
CMU (INARC)                 66,852
CUNY (CNARC)                24,614
IBM (INARC)                 35,070
UCD (CNARC)                 50,000
UCR (IRC)                   62,256
UCSB (INARC)               129,389
UCSC (CNARC)                84,614               48,000
UMass (IRC)                 25,144
TOTAL                      699,086               48,000
5.7.6 Task E3.1 Analysis of Causal Structure of Network Interactions (L. Adamic,
Michigan (INARC); P. Basu and W. Leland, BBN (IRC); C. Faloutsos, CMU
(INARC); J.J. Garcia-Luna-Aceves, UCSC (CNARC); A. Singh, UCSB (INARC);
Q. Zhao, UC Davis (CNARC); Post-doc researcher (PSU); A. Swami, ARL)
Validation approach
In the first year, we will concentrate on a small number of network stimuli (namely, deletion of
information node, failure of a communication device, change of a relationship from ally to
adversary) and analytically study their impact on key shared network metrics such as number of
affected information flows over a period of time. The key observation here is that the above
events will not only change the structural properties of the network but also impose constraints
on the existing information flows in the network and their communication pathways; for example,
routing information through a non-ally, or fusing information at a communication device owned
by a non-ally, is no longer a viable option. This may hurt the performance of some flows, since it
may reduce the number of potential paths, but on the other hand, it may improve performance of
some other flows that were competing for resources but were not impacted by the network
stimulus. We will begin our research by studying such behaviors analytically for simple static
topologies of IN, CN, and SCN and simple causal mappings between them. Eventually we will
undertake more realistic models of dependencies when such models become available from other
research in the Consortium. This task will maintain close collaboration with task E2.
As mentioned in E2, mappings between networks may be highly dynamic in a spatio-temporal
sense. We will analytically study such classes of dynamics and their impact in this task. In
particular we will analyze the rate of change of composite network metrics as a function of the
rate of change of the aforementioned inter-network mappings.
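The first-year analysis above can be illustrated in miniature: flip one node's relationship from ally to adversary, so it can no longer relay or fuse traffic, and count the affected flows. The topology and flow endpoints below are hypothetical:

```python
from collections import deque

# Toy sketch: an adversary node may still be a flow endpoint's neighbor but
# can no longer relay traffic; count the information flows this affects.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c", "e"}, "e": {"d"}}
flows = [("a", "c"), ("a", "e"), ("c", "e")]      # (source, destination) pairs

def path_exists(src, dst, banned_relay):
    seen, q = {src}, deque([src])
    while q:
        u = q.popleft()
        for w in adj[u]:
            if w == dst:
                return True                       # reaching the endpoint is fine
            if w not in seen and w != banned_relay:
                seen.add(w)                       # adversary cannot relay
                q.append(w)
    return False

adversary = "d"
affected = sum(1 for s, t in flows if not path_exists(s, t, adversary))
print(affected)   # → 2: both the a-e and c-e flows must transit d
```

As the text notes, in richer topologies the unaffected flows may even improve, since they no longer compete with the disrupted ones for resources.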
Empirically, we will study many co-evolving networks in the virtual world Second Life,
including chat, social, asset transfer, and monetary transactions. Activities in virtual worlds
resemble many real-world activities, such as coordination and information distribution. New
information may include the arrival of a new asset or landmark designating a new assembly
point. A time series analysis will establish correlation between information arrival and chat
network structure.
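One simple form of the planned time-series analysis is a lagged correlation between information arrival and chat activity. The sketch below uses synthetic data, and the two-step lag is an assumption built into the toy example, not an empirical claim:

```python
import math
import random

# Sketch: lagged Pearson correlation between an information-arrival series
# and a chat-activity series (synthetic data; purely illustrative).
random.seed(3)
T = 200
arrivals = [random.randint(0, 5) for _ in range(T)]   # e.g., new assets per step
chat = [0.0, 0.0] + [2.0 * arrivals[t - 2] + random.random()
                     for t in range(2, T)]            # chat echoes arrivals later

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Correlate chat at time t with arrivals at time t - lag, for several lags
best = max(range(5), key=lambda lag: pearson(arrivals[: T - lag], chat[lag:]))
print(best)   # → 2: chat activity tracks information arrivals with a 2-step lag
```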
Summary of Military Relevance
What are the appropriate models and representations for military communication, information,
and social networks? How are the individual networks linked together? How do each individual
network and the composite network evolve? What are the causal paths in the composite network?
What are the fundamental scientific underpinnings of this evolution? These are the key scientific
questions of network science that will be addressed in this project. These answers will
eventually lead to sound design and analysis techniques that can be used to analyze existing
military networks and to develop new networks so that military operations can be carried out
successfully.
Research Products
We will study relationships between information flow and network structure, and how best to
represent the relationships across multiple time-scales. These studies will lead to
recommendations for analyzing integrated networks (including military networks), and research
manuscripts.
Task Overview
This task focuses on how nodes and edges simultaneously and interdependently co-evolve. It
involves the development of both computational models and empirical examination of these co-
evolutionary processes.
Task Motivation
Co-evolution is endemic in network phenomena. If someone becomes ill, it changes their
interaction patterns, which then affects the spreading process. If political preferences affect
interaction patterns, and interaction patterns affect political preferences, then a wide variety of
emergent patterns are possible. These in turn lead to changes in the information network (as the
interconnections between logical objects evolve) and in the communication network (as the
communication patterns evolve). Similarly, changes in the communication network or the
information network can drive changes in the other two networks. Despite the importance of co-
evolution, it has until recent years been a neglected topic in network science, because of the
methodological challenges of studying changes in the edges and nodes of a single network and
changes across networks. Our aspiration is to push forward the analytic and conceptual
machinery for the study of co-evolution of integrated networks. This is a fundamental component
of any formal model of network formation, as well as of understanding the network response
under exogenous or endogenous stress conditions. This task complements and extends some of
the objectives of task E3.1, which studies how information delivery/alteration might be reflected
in changes of the network structure. Here we focus mostly on social and behavioral processes
and the continuous structure-dynamics feedback loop in network evolution. The two tasks
together encompass the elements needed for a complete description of dynamical networks.
Initial Hypothesis
Networks and the dynamics of agents are changing continuously and any communication,
mobility or contagion process in large scale networks is the outcome of the co-evolution
mechanisms of all the network components and dynamical processes. These co-evolution
mechanisms cannot be neglected in the description of the adaptation, resilience and response of
networks to critical situations. We begin with the assumptions (1) that the behavior of elements
of any networked system is influenced by the interaction pattern encoded in the network
structure, and (2) that the network structure changes according to the behavior of the elements of the
network. This co-evolution mechanism, in the form of a continuous feedback loop, is a key
element of the dynamical evolution of networks across different systems. Infrastructures and
nodes with a diffusion coefficient dij that depends on the node degree, node attributes and/or the
mobility matrix. Within each node particles may react according to various schemes that
represent possible interactions among particles/individuals. This framework has the advantage of
How do social, demographic, and economic factors constrain network topology and co-
evolution?
How do the embedding space and administrative boundaries affect network topology and
co-evolution?
How do groups, social, and communication networks co-evolve? In order to tackle this
question we will test several hypotheses, such as the following: Are social ties that lie
within groups more likely to be active, as measured in chat activity, and also more
likely to become embedded in additional groups? Are individuals who are "physically"
proximate in the online world more likely to chat with one another? Are individuals who
are more likely to chat with one another more likely to develop a game partnership
relation (engaging in joint gaming activities)? Are individuals who engage in joint
activities more likely to trade with one another? Are individuals who trade with one
another more likely to co-locate in the virtual world?
How can the co-evolution dynamics be plugged into the particle-network framework by
general nonlinear coupling mechanisms?
Is it possible to define general types of co-evolution mechanisms that can be mapped into
specific particle-network equation classes?
Is it possible to extend the particle-network framework to non-Markovian processes by
ad-hoc approximations and quasi-stationary effective coupling terms?
Technical Approach
We plan to study the co-evolution of integrated networks using two interrelated mathematical
formalisms: multi-scale networks and particle dynamics.
Characterization of multi-scale networks. The nodes of techno-social multi-scale networks
usually have fractal distributions over multiple scales in space. These geographical distributions
and the corresponding demographic (population) distributions strongly constrain and define the
evolution and the structural and transport properties of these networks. We will analyze the
correlations of topological and dynamical properties with the actual demographic, geographical
and economic factors underneath the structure of these networks, including: (a) spatial
distribution of nodes and edges and their correlation with population density, and (b) correlation
of traffic flows and network evolution on edges with geographical and population attributes.
Particle-network framework. The reaction-diffusion framework has the advantage of allowing
suitable approximations that explicitly include the discrete nature of particle packets and the
underlying topology of the network. In particular, one of the main issues to be considered in
multi-scale networks is the ubiquitous presence of very heterogeneous topologies dominated by
statistical distributions with heavy tails. In the reaction-diffusion framework it is possible to
introduce explicitly in the description of the system classes of statistically equivalent nodes of
degree k, via the degree-block variable

N_k = (1/V_k) Σ_{i | k_i = k} N_i,

where V_k is the number of nodes with degree k and the sum runs over all nodes i having degree
k_i equal to k. The degree-block variable N_k therefore represents the average number of particles
in all nodes with degree k. While this representation corresponds to a homogeneous
approximation, it allows us to work explicitly with arbitrary network topologies and to
progressively introduce higher-order structural properties such as clustering and multi-point
correlations. In addition, it allows for an explicit analytic solution while taking into account a
wide range of dynamical routing strategies as well as specific processes modeling the injection or
absorption of particles/individuals/information in the network. This is due to the particular nature
of the general framework in which the dynamics of particles is represented by a mean-field
dynamical equation expressing the variation in time of the sub-populations Nk(t) in each degree
block as:
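A standard form of this equation, reconstructed from the surrounding definitions (the precise rates are an assumption), is:

```latex
\frac{\partial N_k(t)}{\partial t} = -\, d_k \, N_k(t) + k \sum_{k'} P(k' \mid k)\, d_{k'k}\, N_{k'}(t)
```

where $d_{k'k}$ is the diffusion rate along an edge from a node of degree $k'$ to a neighbor of degree $k$, $d_k = \sum_{k'} P(k' \mid k)\, d_{kk'}$ is the total rate at which particles leave a node of degree $k$, and $P(k' \mid k)$ is the conditional degree distribution; reaction terms are added to the right-hand side according to the chosen interaction scheme.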
This basic reaction-diffusion framework can be used to study the propagation of information
particles, as well as the mobility of individuals, whose dynamics depend on or are modulated by
the structural properties (e.g., node degrees) of the underlying network. The above basic reaction-
diffusion framework can be further extended by including multiple particle types, change of
states or awareness, birth-death process (or some other processes) at nodes, for example, to
model the injection/absorption of information/individuals. Such an extended framework is
particularly useful for modeling networks under critical conditions or stress. In this task the
reaction-diffusion formulation on networks will allow us to explore the correlation feedback and
co-evolution of the network structure and the equilibrium stationary distribution of particles and
their flows. An ambitious objective is the formulation of particle-networks classes of equations
that correspond to dynamical types of co-evolution networks as categorized in the data analysis.
We also plan to study networks combining engineered and emergent properties, to
discriminate their effects on the co-evolution process. Another main objective during the first year
is the extension of the particle-network framework through approximate and quasi-stationary
approximations to non-Markovian dynamics that in many cases characterize social mobility and
interaction processes.
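To make the degree-block dynamics concrete: for pure diffusion at unit rate on an uncorrelated network, the mean-field equation for the sub-populations Nk(t) reduces to dN_k/dt = -N_k + (k/&lt;k&gt;) Σ_k' P(k') N_k', whose stationary solution is N_k proportional to k. A minimal numerical sketch (the degree distribution and initial condition are hypothetical):

```python
# Euler integration of the degree-block mean-field equation for pure diffusion
# on an uncorrelated network: dN_k/dt = -N_k + (k/<k>) * sum_k' P(k') N_k'.
P = {1: 0.5, 2: 0.3, 3: 0.2}                 # hypothetical degree distribution
k_mean = sum(k * p for k, p in P.items())    # <k> = 1.7
N = {1: 5.0, 2: 1.0, 3: 0.5}                 # initial particles per node, by degree

dt = 0.01
for _ in range(20000):
    mean_N = sum(P[k] * N[k] for k in P)     # <N>, conserved by the dynamics
    for k in N:
        N[k] += dt * (-N[k] + (k / k_mean) * mean_N)

# The stationary solution is N_k = k <N> / <k>, i.e. N_k proportional to k
print(round(N[2] / N[1], 3), round(N[3] / N[1], 3))   # → 2.0 3.0
```

This is the homogeneous baseline; the extensions discussed above (multiple particle types, birth-death processes, co-evolving topology) add further terms to the same equation.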
Validation
Validation will occur through the accumulation of results through multiple technical approaches,
and, most importantly, through examination of multiple data sets. In particular, the critical
validation questions will be: (1) which results are robust across a wide variety of data sets;
(2) where there are powerful results that are heterogeneous across data sets, what contextual
details create this heterogeneity; and (3) what general mathematical classes can be found in the
formal description of co-evolution processes that find evidence in the real data analysis. Data analyzed
will range from the dynamics of person-to-person interaction obtained with experiments that
assess face-to-face social interactions with active RFID to large-scale multimodal mobility
networks in more than 35 countries and worldwide long-range transportation systems such as the
airline transportation network. This will allow us to leverage more than 40 large-scale datasets.
Task Overview
The primary goals of this task include: analysis of groups and their formation using real-world
data to identify the key socio-psychological and environmental factors (internal and external);
methods of community detection; development of agent-based computational models for in-silico
testing and prediction of networked group behavior; and computational algorithms for network
analysis on several levels of resolution (multi-scale analysis).
Task Motivation
In network science today, efficient algorithms for large-scale dynamic network analysis,
including detection of communities, remain an ongoing, central, and far-from-finished research
area. Once such algorithms are developed, social/behavioral theories can be modeled
in-silico and the various hypotheses put forward by social scientists validated. Armed with
knowledge of the most important factors and mechanisms for adversarial group formation,
evolution, and dissolution, a multi-scale agent-based environment can be created.
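One family of scalable community-detection heuristics of the kind this task targets is label propagation. A minimal sketch on a toy two-clique graph follows; the deterministic tie-breaking rule is an illustrative choice, not a prescription:

```python
# Label propagation sketch: each node repeatedly adopts the most frequent
# label among its neighbors (ties broken by the largest label, an arbitrary
# but deterministic choice). Toy graph: two 4-cliques joined by one edge.
adj = {
    0: [1, 2, 3], 1: [0, 2, 3], 2: [0, 1, 3], 3: [0, 1, 2, 4],   # clique A
    4: [3, 5, 6, 7], 5: [4, 6, 7], 6: [4, 5, 7], 7: [4, 5, 6],   # clique B
}
labels = {v: v for v in adj}            # every node starts in its own community
for _ in range(10):                     # a few sweeps suffice on this toy graph
    for v in sorted(adj):
        counts = {}
        for w in adj[v]:
            counts[labels[w]] = counts.get(labels[w], 0) + 1
        best = max(counts.values())
        labels[v] = max(l for l, c in counts.items() if c == best)

communities = {}
for v, l in labels.items():
    communities.setdefault(l, []).append(v)
print(sorted(communities.values()))    # → [[0, 1, 2, 3], [4, 5, 6, 7]]
```

Each sweep is linear in the number of edges, which is what makes this class of heuristics attractive at the scales discussed here.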
The modern military increasingly needs to rely on bottom-up network processes, as compared to
top-down hierarchical processes. This is particularly true in interactions with social networks
existing in foreign societies in which many Army missions are conducted. How can we use the
massive streams of data to detect adversarial networks? How can we quickly extract the most
meaningful information for the soldier and decision maker that is useful in all aspects of their
operations, from supporting humanitarian operations to force protection and full combat
operations? The models and methods investigated in this task will be evaluated for potential
military information network applications. Although we may not be able to obtain real military
mission datasets, we plan to use available data from civilian applications to investigate
possible cases and scenarios, and to generate simulated data for proof of concept and to promote
the potential applications of the developed theory and technology in military settings. The
multi-scale agent-based environment that we will be developing can be used as a computational
framework to predict network behavior based on partial data and input from the military
intelligence services, either from past, current or future operational areas of the US Army.
Key Research Questions
The research question here is to understand the mechanisms of group formation and evolution.
Why, given a wide set of potential group members in the network, might an individual be
more likely to join one group of individuals rather than another? Does group formation occur on
the basis of attributes of the individuals (such as skill, level, role, resources) as well as the extant
links (such as communication, financial transactions, exchange of materials or services) among
individuals within the network? How do we analyze the network for such behavior rules? Or,
alternatively, how do we verify that a given set of group formation rules fits a given dynamic
network?
Initial Hypotheses
We hypothesize that social/behavioral theories about network dynamics can be modeled in-
silico on large-scale real data. Currently, there is a lack of efficient and scalable algorithms for
modeling community detection and network dynamics. To address this issue, we will investigate
Each task will have monthly teleconferences to coordinate the planned research work. Each task
itself will naturally decompose into subtasks and these will have closer collaborations in the form
of joint papers and joint advising on PhD theses. There will be organized visits by the faculty and
staff of the collaborating institutions. Wherever possible, meetings will be coordinated
opportunistically with travel to conferences and meetings. The UCSB research scientist located
at the NS-CTA facility will spend a part of his/her time working on this project.
Research Milestones
Budget By Organization
Organization     Government Funding ($)   Cost Share ($)
BBN (IRC)                  154,436
CMU (INARC)                 20,891
CUNY (SCNARC)               26,046
IBM (SCNARC)                47,115
IU (SCNARC)                 99,000
MIT (SCNARC)                28,017
ND (SCNARC)                 49,500
NEU (SCNARC)                53,235
NWU (INARC)                 69,601
PSU (CNARC)                 30,000
RPI (SCNARC)                67,528               12,327
UCD (CNARC)                100,000
UCSB (INARC)                48,814
UCSC (CNARC)                25,000
UIUC (INARC)                91,961
UMich (INARC)               76,253
TOTAL                      987,397               12,327
Task Overview
The long-term goal of this task is to gain a basic quantitative understanding of the fundamental
forces that shape the topology as well as the spatial and geographical properties of social
networks. This requires extensive tool development and database preparation. During the first
year we plan to start this exploratory analysis and lay the groundwork for our and the other team
members' work during the coming years.
Task Motivation
The motivation for this task is to understand the interplay between the physical space (location)
and social network.
Key Research Questions
In this task we answer the following research question: Can mobility models be generated from
large sets of trace data and military planning scenarios that can be used for understanding the
formation of communities of interest and communication networks? The answer to this question
will enable analysis of the impact of mobility on network structure.
The team currently has in its possession the best curated and best understood mobile phone
dataset in existence. It is therefore the ideal resource with which to develop methods to analyze
the network and mobility information flows that result from mobile phones. While we would
anticipate substantial adaptation would be required in moving from these data to data from, for
example, areas characterized by violent conflict, these are, at this time, the ideal data with which
to start developing models and tools to understand normal and atypical patterns in networks and
mobility. These data are particularly useful because we have identified events, such as
bombings, earthquakes, power outages, etc., that would serve as models for detecting anomalous
movement and communication events associated with societal disruptions. Over the longer term
we seek to cultivate a broader range of data sets, including, potentially, areas characterized by
strife, where we could link known conflict events to patterns observed in the data.
Objective 2: In aiming to understand the interplay between physical space and social networks,
we first need to understand the mobility patterns. We therefore plan to start a modeling effort to
5.8.7 Task E4.2 - Deriving Metric-Driven Mobility Models (K. Psounis, USC
(CNARC); P. Mohapatra, UC Davis (CNARC); T. La Porta, PSU (CNARC); P.
Basu, BBN (IRC); T. Brown, CUNY (CNARC); A. Swami, ARL)
Task Overview
In this task we identify the mobility phenomena that are important to the networks being
impacted, along with the performance metrics of interest. We generate models to produce traces
that conform to the statistics of these metrics. The type of mobility being modeled will also
impact communities of interest that form based on locality.
Task Motivation
The results of this task will be used by Task E2 to determine network evolution, and by several
tasks in CNARC that deal with the dynamics of communications networks. They will also be
used by Task S1.2.
Key Research Questions
In this task we seek to answer the question: Can models be developed that generate traces that
are statistically close to real mobility and scale across vastly different scenarios?
Initial Hypothesis
We expect that we can generate models, based on statistics, that scale from mobility that impacts
connectivity to mobility that impacts proximity within hundreds of yards. As a longer-term goal
we expect to generate models that can capture the impact on channel conditions on a time scale
of a few seconds.
Technical Approach
When considering mobility models for determining the impact on communication networks, it is
crucial to understand the type of phenomena that must be captured to keep models tractable.
This in turn requires an understanding of the type of network being modeled. When modeling
mobility for cellular networks, for example, the metrics of interest are typically how often
handoffs occur and how many users may be in the coverage area of a single radio. The first
metric is used to help engineer the capacity of signaling links and network processors so that
they can handle the control load generated by mobility. Simple fluid flow models that
approximate the amount of movement during congested time periods often suffice for this
purpose. The second metric is used to determine how to allocate resources, for example
frequencies to cells. Models that incorporate expected human movements given the layout of a
network are often used for this purpose (e.g., movement on roadways, movement between cities
and suburbs). This second type of model is addressed in E4.1.
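The handoff-rate metric above can be derived from a generated trace. The sketch below uses a one-dimensional random-waypoint model with entirely hypothetical parameters (cell width, speeds, area), purely to illustrate extracting a metric from a trace:

```python
import random

# Sketch: 1-D random-waypoint movement across 100 m cells, counting cell-
# boundary crossings (handoffs) -- the kind of statistic used to dimension
# signaling capacity. All parameters are illustrative assumptions.
random.seed(1)
CELL = 100.0                              # cell width in meters
pos, handoffs, t = 0.0, 0, 0.0
while t < 3600.0:                         # simulate one hour
    target = random.uniform(0.0, 1000.0)  # next waypoint in a 1 km strip
    speed = random.uniform(1.0, 2.0)      # pedestrian speed, m/s
    while abs(target - pos) > speed:      # move in 1 s steps toward the waypoint
        new = pos + speed * (1.0 if target > pos else -1.0)
        if int(new // CELL) != int(pos // CELL):
            handoffs += 1                 # crossed a cell boundary
        pos = new
        t += 1.0
    pos, t = target, t + 1.0              # arrive, then pick a new waypoint

print(handoffs)                           # handoff count for the simulated hour
```

A fluid-flow approximation, as mentioned above, would estimate the same quantity analytically from average speed and cell-boundary density rather than from a trace.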
When modeling mobility in tactical networks, a different set of metrics is important. For
communication networks, slight movements, such as a change in the angle of a node, may
impact its ability to maintain connectivity with neighboring nodes. From a networking
perspective, mobility in
MANETs typically impacts routing protocol overhead and the latency with which data is
Budget By Organization

Organization     Government Funding ($)   Cost Share ($)
BBN (IRC)                   47,313
CUNY (CNARC)                15,000
CUNY (SCNARC)               45,580
MIT (SCNARC)                28,017
NEU (SCNARC)               114,714
PSU (CNARC)                 55,589               20,000
RPI (SCNARC)                18,385                3,356
UCD (CNARC)                 30,000
USC (CNARC)                 73,000
TOTAL                      427,598               23,356
Table of Contents
6 Non-CCRI Research: Interdisciplinary Research Center (IRC) ............................................ 6-1
6.1 Overview ......................................................................................................................... 6-3
6.2 Motivation ....................................................................................................................... 6-3
6.2.1 Challenges of Network-Centric Operations ............................................................. 6-3
6.2.2 Example Military Scenarios ..................................................................................... 6-4
6.2.3 Impact on Network Science ..................................................................................... 6-4
6.3 Key Research Questions ................................................................................................. 6-4
6.4 Technical Approach ........................................................................................................ 6-4
6.5 Project R1: Methods for Understanding Composite Networks ...................................... 6-7
6.5.1 Project Overview ..................................................................................................... 6-7
6.5.2 Project Motivation ................................................................................................... 6-7
6.5.3 Key Project Research Questions .............................................................................. 6-8
6.5.4 Initial Hypotheses .................................................................................................... 6-9
6.5.5 Technical Approach ................................................................................................. 6-9
6.5.6 Task R1.1: Extracting Network Knowledge: Graph Sampling and Clustering in
Integrated Networks (M. Faloutsos, UC Riverside (IRC); D. Towsley, UMass (IRC); P.
Basu, BBN (IRC); J. Srivastava, UMinn (IRC)) ............................................................... 6-10
6.5.7 Task R1.2: Advanced Mathematical Models: Economic / Market-based Approach to
Modeling Integrated Networks (D. Parkes, Harvard (IRC); M. Wellman, UMich (IRC); V.
Kawadia, BBN (IRC)) ....................................................................................................... 6-18
6.5.8 Task R1.3: Category Theory Based Approach to Modeling Composite Networks (M.
Kokar, NEU (IRC); V. Kawadia, BBN (IRC); Collaborators: P. Basu, BBN (IRC); J.
Hendler, RPI (IRC); C. Cotton, D. Sincoskie, UDel (IRC)) .............................................. 6-25
6.1 Overview
6.2 Motivation
Are integrated multi-genre networks better understood and controlled using techniques outside
classic structurally focused approaches?
How can we describe and characterize the propagation of information across social cognitive
networks in a way akin to the characterization of capacity, noise, and distortion in
communication networks: e.g., define semantic capacity, impact of loss and error of information
on decision processes?
What is the best methodology for validating theories, models and characterizations of multi-
genre networks?
6.5.6 Task R1.1: Extracting Network Knowledge: Graph Sampling and Clustering
in Integrated Networks (M. Faloutsos, UC Riverside (IRC);
D. Towsley, UMass (IRC); P. Basu, BBN (IRC); J. Srivastava, UMinn (IRC))
NOTE: This is a 6.1 basic research task.
Task Overview
In this task, we will explore two complementary approaches for extracting knowledge about the
structure of a network: (a) graph sampling, as a means to overcome measurement limitations,
and (b) clustering, as a means to extract patterns and information from the structure of the
network. Note that these two problems are synergistic: they both provide different insights into
the structure of the observed network, as we discuss below.
In the first year, we will focus on the question: how can we identify patterns and "typical" node
behavior? This is not a trivial task, as it is an open-ended question. In subsequent years, we
will develop methods to detect outliers and anomalous behavior: nodes that interact with the rest
of the network in surprising ways. All tasks here are closely related: outliers are nodes or groups
of nodes that deviate from the observed structural rules and patterns. The work will leverage
the work of PIs Faloutsos [He08] [Ilio07] and Towsley [Jaiswal04] [Bu02] in this area.
Task Motivation
The problem addressed here is fundamental to our ability to move from the plane of measured
data and observed behavior to "knowledge". Here are some concrete examples: (a) we can
identify communications nodes or sensors that have passed into the hands of the enemy and are
used to impede communication functions; (b) we can identify information exchanges, such as
email, that point to abnormal behavior, such as someone attempting to contact or access people
and information that they should not; and (c) we can discover and maintain key military
relationships and communication channels, which are essential for the success of a mission. The
capabilities developed in this project will directly benefit our ability to develop trusted networks
that are reliable in their operation and protected from malicious users.
We present the key research questions and the approach for each of the two components of this
task separately.
Subtask R1.1.1: Graph Sampling
The first approach to explore the structure of a network is graph sampling.
Technical Approach
First, it is easy to show that uniform node sampling provides an unbiased estimate of the degree
distribution. Second, in the context of online social networks, many studies based on snowball
sampling have produced significantly erroneous estimates of the degree distribution (see the
Figure above).
When both node sampling and RW sampling are applicable, which is better? Here we take mean
square error (MSE) as the metric to use in comparing different methods. Our initial results show
that node sampling provides better estimates for small degrees and that RW-sampling provides
better estimates for high degree nodes. This suggests that RW-based sampling is more
appropriate for characterizing the degree distribution of networks with highly variable degree
distributions and, in particular, those where the degree distributions are characterized by a power
law. The figure below shows degree distribution estimates under both node sampling and RW
sampling along with the mean square errors as estimated through simulation. We observe that
random node sampling provides better estimates for small degrees and RW (here we used the
variant called RDS) provides better estimates for large degrees.
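The re-weighting idea behind RW/RDS estimation can be sketched in a few lines (a toy illustration under our own simplifying assumptions, not the estimators used in the cited studies): a simple random walk visits a node with stationary probability proportional to its degree, so weighting each visit by the inverse degree recovers the degree distribution, while uniform node sampling needs no correction.

```python
import random
from collections import Counter

def degree_dist_node_sampling(adj, n, rng):
    # Uniform node sampling: sampled degrees are unbiased draws from the
    # degree distribution, so empirical frequencies need no correction.
    nodes = list(adj)
    counts = Counter(len(adj[rng.choice(nodes)]) for _ in range(n))
    return {d: c / n for d, c in counts.items()}

def degree_dist_rw_sampling(adj, n, rng):
    # Simple random walk: a node is visited with stationary probability
    # proportional to its degree, so each visit is re-weighted by
    # 1/degree (an RDS-style correction) before normalizing.
    v = rng.choice(list(adj))
    weights, total = Counter(), 0.0
    for _ in range(n):
        v = rng.choice(adj[v])
        d = len(adj[v])
        weights[d] += 1.0 / d
        total += 1.0 / d
    return {d: w / total for d, w in weights.items()}

# Toy graph: a hub (node 0, degree 10) attached to a ring of 10 nodes
# (each of degree 3); the true degree distribution is {3: 10/11, 10: 1/11}.
ring = 10
adj = {0: list(range(1, ring + 1))}
for i in range(1, ring + 1):
    adj[i] = [0, i % ring + 1, (i - 2) % ring + 1]

rng = random.Random(42)
est_node = degree_dist_node_sampling(adj, 5000, rng)
est_rw = degree_dist_rw_sampling(adj, 5000, rng)
```

Without the 1/degree correction, the walk would report the hub's degree far too often; with it, both estimators converge to the true distribution.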
(b) inter-cluster connectivity, which captures the connectivity between clusters, and (c) intra-cluster
connectivity, which captures the connectivity inside clusters.
Using these metrics, we propose to explore different clustering algorithms and adapt them to the
needs of our networks. Note that different clustering algorithms perform differently
depending on the properties of the network and on how the goal of clustering is defined. A
critical issue, for example, is the degree distribution and the existence of degree correlations
(assortativity). The PIs have extensive experience with clustering algorithms in specific contexts,
namely clustering and modeling the Internet topology and identifying groups of communicating
nodes by analyzing network traffic, as can be seen in the Figures above.
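The intra- and inter-cluster connectivity metrics can be made concrete with a short sketch (illustrative code and names of ours, assuming an undirected adjacency list and a given node-to-cluster assignment):

```python
def cluster_connectivity(adj, labels):
    """Fraction of edges inside clusters (intra) vs. across clusters (inter),
    for an undirected adjacency dict and a node -> cluster-id labeling."""
    intra = inter = 0
    for u in adj:
        for v in adj[u]:
            if u < v:  # count each undirected edge once
                if labels[u] == labels[v]:
                    intra += 1
                else:
                    inter += 1
    total = intra + inter
    return intra / total, inter / total

# Two triangles {0,1,2} and {3,4,5} joined by the single bridge edge 2-3:
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
       3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
labels = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
intra, inter = cluster_connectivity(adj, labels)  # 6/7 and 1/7 of the edges
```

A good clustering concentrates edges inside clusters (high intra, low inter), which is exactly what the two metrics above measure.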
Specifically, for the first year, we propose to see how much information we can extract from the
topology of a network, focusing on: (a) the topology of communication networks, (b) the
network-wide interactions between computer devices, and (c) social networks, focusing on online
social networks such as eBay and YouTube and the Enron data set. For eBay and YouTube, we have
[Beye08] Y. Beyene, M. Faloutsos, P. Chau, and C. Faloutsos, "The eBay Graph: How Do Online
Auction Users Interact?" IEEE Global Internet, 2008.
[Bu02] T. Bu and D. F. Towsley, "On Distinguishing between Internet Power Law Topology
Generators," IEEE INFOCOM, 2002.
[He08] Y. He, M. Faloutsos, S. V. Krishnamurthy, and M. Chrobak, "Policy-Aware Topologies for
Efficient Inter-Domain Routing Evaluations," IEEE INFOCOM Mini-Conference, 2008.
[Heckathorn97] D. D. Heckathorn, "Respondent-Driven Sampling: A New Approach to the Study of
Hidden Populations," Social Problems, 1997.
[Ilio07] M. Iliofotou, P. Pappu, M. Faloutsos, M. Mitzenmacher, S. Singh, and G. Varghese,
"Network Monitoring Using Traffic Dispersion Graphs (TDGs)," ACM/USENIX Internet
Measurement Conference (IMC), 2007.
Task Overview
We will pursue the task of economic modeling of composite networks at two levels. We will
seek to effect useful coordination across and within networks by inferring the utility of actors
within the networks, identifying simple parameterizations of the decision environment that
facilitate automated control to improve behavior, and developing incentive-compatible
mechanisms to elicit additional information as necessary from participants. We will also use
market modeling to capture specific network resource allocation scenarios, and analyze the
resulting models as economic systems.
Task Motivation
Mathematical economics provides a well-developed mathematical theory to model the behavior
of large systems of (approximately) rational agents as they act to deploy limited resources for
value creation. A fundamental principle is that rational (i.e., utility maximizing) agents respond
to incentives, and so identifying the operable incentives is always a first step of analysis. The
economic perspective extends naturally to decision making over networks that restrict
information of participants, provide structure to interdependencies between participants, and
constrain available actions.
Key Research Questions
A prerequisite to developing economic network models will be to understand the ways in which
different network planes (information, communication, social) come together in competition for
resources, and in service of each other‘s functions. Key research questions include:
What resources are supplied and demanded by each network?
What does this suggest about where to position a small number of market processes that
will provide coordinated allocation for those resources and services?
Initial Hypotheses
An economic and market-based approach to modeling complex interconnected systems bears on
two distinct but equally important dimensions of network analysis: understanding and control.
These two dimensions of analysis apply both within individual networks and across networks.
Subtask R1.2.2: Market Modeling (D. Parkes, Harvard (IRC); M. Wellman, UMich (IRC);
V. Kawadia, BBN (IRC))
NOTE: This activity will be deferred to year 2.
In a direct approach to economic modeling, we can map specific network resource allocation
scenarios—or generic resource allocation problems on networks—to literal market systems,
where agents interact with neighbors through the explicit exchange of goods and services at
negotiated prices.
A substantial body of previous work has developed computational market-based methods for resource
allocation on networks and other decentralized environments. One of the most active areas has
been in markets for computational resource allocation, starting over 40 years ago, and revisited
periodically thereafter in a range of computational settings [Clearwater, 1995; Kurose et al.,
1985; Mullen & Wellman, 1995; Nisan et al., 1998; Stonebraker et al., 1996; Waldspurger et al.,
1992; Wellman et al., 2001]. Attempts to ground computational markets in microeconomic
general equilibrium theory led to a general methodology for market-oriented programming
[Wellman, 1993; Ygge & Akkermans, 1999]. Recent work in grid and utility computing has
naturally brought a resurgence of interest in the market-based control idea [Buyya &
Bubendorfer, 2009; Buyya et al., 1999; Lai et al., 2005; Lubin et al., 2009; Wolski et al., 2001].
The successes and failures of these various systems—particularly those operated on networks—
are highly instructive for development of computational market infrastructure for all kinds of
resources.
We start from the general-equilibrium perspective, but extend the classical microeconomic
model to operate dynamically over time, and through explicit computational market mechanisms.
Our basic price-determination mechanism is the call market—a two-sided auction that matches
trades periodically according to well-defined bidding and clearing rules [Wurman et al., 1998].
Network structure defines the possible trade pathways, and market functions are implemented by
message passing on these networks. Depending on the scenario, market operations (bidding,
clearing) may be conducted synchronously or asynchronously, with informational latencies
governed by network capacities and market rules.
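The periodic clearing step can be sketched in a few lines (an illustrative rule and names of our own; the actual bidding and clearing rules follow [Wurman et al., 1998]): bids and asks accumulate between clears, and each clear matches the highest bids against the lowest asks at a uniform price.

```python
from dataclasses import dataclass, field

@dataclass
class Order:
    agent: str
    price: float  # limit price
    is_buy: bool

@dataclass
class CallMarket:
    """Periodic two-sided auction: orders accumulate, then clear in a batch."""
    book: list = field(default_factory=list)

    def submit(self, order):
        self.book.append(order)

    def clear(self):
        # Match the highest bids against the lowest asks while a trade is
        # still mutually acceptable; all trades execute at one uniform
        # price (here, the midpoint of the marginal matched bid and ask).
        bids = sorted((o for o in self.book if o.is_buy), key=lambda o: -o.price)
        asks = sorted((o for o in self.book if not o.is_buy), key=lambda o: o.price)
        n = 0
        while n < min(len(bids), len(asks)) and bids[n].price >= asks[n].price:
            n += 1
        self.book.clear()
        if n == 0:
            return None, []
        price = (bids[n - 1].price + asks[n - 1].price) / 2
        return price, [(bids[i].agent, asks[i].agent) for i in range(n)]

m = CallMarket()
for agent, price, is_buy in [("a1", 10.0, True), ("a2", 6.0, True),
                             ("a3", 7.0, False), ("a4", 9.0, False)]:
    m.submit(Order(agent, price, is_buy))
price, trades = m.clear()  # one trade: a1 buys from a3 at the midpoint 8.5
```

In the networked setting, `submit` and `clear` become messages passed along the trade pathways, and the clearing period and message latency become the experimental parameters studied below.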
For example, the graph below depicts a simple supply chain network, where oblong nodes
represent agents, and circular nodes represent resources (goods or services) and their
corresponding markets. Agents a1 and a2 are suppliers, who can provide raw materials to the
producers a3, a4, and a5, who can provide finished tasks to the end consumers (e.g.,
The computational market on a network will directly model canonical resource allocation
scenarios, such as supply chain formation and management (shown above), and allocation of
generic goods (where the network captures communication links).
Validation Approach
We will evaluate the effect of operational market parameters (e.g., periodicity of call markets,
communication latency), network structure, and other environmental conditions (e.g., demand
volatility) on allocation outcomes. The key measure of outcome quality is efficiency: how well
the market allocates resources to their most valued uses over time.
Task Overview
This task focuses on using key ideas from category theory to model composite networks. This
promises to give us an alternative mathematical modeling toolset to augment the structure-focused
tools that will be developed in Projects E1 and E2.
Task Motivation
One of the primary tasks for the IRC is to provide an understanding of how global network
properties or behaviors can be composed from the properties of information, social-cognitive, and
communications networks. Consequently, model development must also follow this
compositional pattern: the model development process must capture how the composition
influences the overall structure, and how human effectiveness with respect to the mission can be
assessed and influenced both by the properties of particular networks and by the interactions
among them within the composed network-of-networks model. For this reason, the model
must be crafted systematically, with a focus on providing a high-fidelity representation
of the behavior of the real system.
The problem of deriving models of composed networks is exacerbated by the fact that in this
project we are not dealing with a homogeneous collection of simple objects, but rather with a
collection of networks, which are very complex objects to model and analyze. On top of this
complexity is an additional layer of complication, perhaps the most difficult one, resulting from
the fact that three different types of network are involved. This non-homogeneity, both within
one type of network and among the three types of network, calls for modeling
(mathematical) tools that are very abstract, and thus capable of capturing the commonalities
and differences among various components while still providing means for deriving
models of such complicated structures. This is the main reason why we propose to use
category theory – a mathematical framework that has proven very effective at capturing
representations of many seemingly disparate mathematical structures.
Key Research Questions
What kind of mathematical formalism is needed to model complex heterogeneous
networks in a compositional way?
Prior Work
In this research thrust, we propose to model heterogeneous networks based on the principles of
category theory (cf. [Goldblatt, 1984; Pierce, 1991]). This approach to modeling relies on the
colimit operator of category theory. Intuitively, the colimit operator is an extension of the
shared-union operator of set theory. The shared-union operator is applicable only to sets; the
colimit operator, on the other hand, is applicable to any objects in a category. For instance, if the
objects are processes (or algorithms) on a particular network – social, information, or
communications – then the colimit of two processes allows for a systematic and consistent
weaving of the two processes into one, in such a way that pieces of each process
(sub-processes) contribute their own parts to the overall process and their
This task will address the question: what categories are appropriate for representing
complex heterogeneous networks and networks of such networks? This task is the realization of
Thrust A above.
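To make the "shared union" intuition concrete in the simplest possible setting (our own notation, for illustration only): in the category of sets, the colimit of a span of two networks A and B that share a common part C is the pushout

```latex
A \sqcup_C B \;=\; \bigl(A \sqcup B\bigr)\big/\!\sim,
\qquad f(c) \sim g(c) \ \text{for all } c \in C,
```

where f : C → A and g : C → B pick out the shared part inside each network; when f and g are inclusions, this is exactly A and B glued along C. The categories sought in this task generalize this gluing from sets to structured network objects and the processes running on them.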
The first, seemingly natural, attempt would be to use the category of algebras to model these kinds
of networks. Such a category can be understood as the category of specifications of software
components, i.e., collections of sets and operations on those sets, where the operations are always
functions. However, it has already been recognized by researchers in computer science (cf.
[Anlauff et al., 2004]) that such categories are not adequate for modeling behavioral components.
For instance, while such categories are very useful for modeling functional programs (cf.
[Pavlovic & Smith, 2003]), they cannot model programs with state and global variables. Since the
networks considered in this project are not assumed to be functional (i.e., we do not assume that
a network node will always respond to some informational inputs in the same manner,
independently of the recent history of such inputs), we need to look further,
beyond the category of algebras. At this point we anticipate that even the Espec category,
partially formalized in [Anlauff et al., 2004], will not suffice for this particular problem. We will
use the Espec category as a starting point and will propose a new category that faithfully
represents heterogeneous networks of networks, including the evolution of networks due both to
the passage of time and to external events.
In this way we will constructively pursue the plan of research suggested by Bonick. In other
words, the result of this task will be a higher category, along with a justification for its selection.
One of the important issues in this task will be deciding how to assess the
appropriateness of a proposed category for the modeling task at hand. For this purpose we will
use the results of Task 2, in which we will develop examples of representations of networks of
networks. These examples will serve as test cases for the theory we develop in this task. The
feature assessed by this testing will be the fidelity of the representation of a system (in this case,
simulated systems) by the proposed modeling formalisms. Toward this goal, the
developed test cases must specify what the expected results of the tests should be.
Subtask R1.3.2: Meta-models for modeling heterogeneous networks and networks of
networks
This task will address the question of what kind of meta-modeling language is needed to
faithfully represent heterogeneous networks and networks of such networks. This task is the
realization of Thrust B above.
Research Milestones
Budget By Organization
Organization    Government Funding ($)    Cost Share ($)
Task Overview
The fundamental mathematics used in understanding the transmission of information in
communication networks is Shannon's information theory. This seminal work provided a
mathematical formalization of the notion of information (a bit stream) being transmitted between
entities, and allows for the specific definition of notions such as information loss, coding and
decoding errors, and information entropy. In short, it allowed us to probe the fundamental limits
on compressing, reliably storing, and transmitting data. The impact of Shannon's work is huge,
and applications of the theory include lossless data compression (e.g., ZIP files), lossy data
compression (e.g., MP3s), and channel coding (e.g., for DSL lines). Further, the field is at the
intersection of mathematics, statistics, computer science, physics, neurobiology, and electrical
engineering. Its impact has been crucial to the success of the Voyager missions to deep space, the
invention of the compact disc, the feasibility of mobile phones, the development of the Internet,
the study of linguistics and of human perception, and the understanding of black holes, among
numerous other fields. Important sub-fields of information theory are source coding, channel
coding, algorithmic complexity theory, algorithmic information theory, and measures of information.
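As a concrete anchor for these notions (a textbook illustration, not code from this program): the Shannon entropy H = -Σ p log₂ p measures, in bits, the average information content of a source, and bounds how far its output can be losslessly compressed.

```python
import math

def entropy(probs):
    """Shannon entropy in bits: H = -sum(p * log2(p)) over nonzero p."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy([0.5, 0.5]))   # fair coin: 1.0 bit per outcome
print(entropy([1.0]))        # certain outcome: 0.0 bits (no information)
print(entropy([0.25] * 4))   # uniform over 4 symbols: 2.0 bits
```

Note that the definition refers only to symbol probabilities; nothing in it depends on what the symbols mean, which is precisely the gap discussed next.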
However, Shannon's theory, and its many descendants, is deficient in an important way when
considering modern information networks. While it talks about "information," it really focuses
on data – that is, the transmission of bits and the ability of a device to reconstitute those bits at the
receiving end. It does not take into account the critical aspect of the transmission: whether the
receiver "understands" that information. Thus, under Shannon's definitions, information
transmission is essentially a matter of encoding and decoding, not of the human communication
of that information.
Task Motivation
Modeling the modern information network thus requires a significant extension of Shannon's work.
Social networks are made of nodes that represent people (or organizations) and links that
represent their relationships. The information that flows through those networks cannot be viewed
just as bits – they are bits that encode specific meanings, and those meanings are comprehensible
only in the presence of specific semantics. Information networks carry this encoded meaning, but
it is only useful when interpreted by humans, and that notion of interpretation (or understanding)
is not covered in Shannon's work. Thus, there is a crucial link between social and information
networks that we must better explore, understand, and explicate to be able to develop a network
science that can explain these newer forms of networks.
Key Research Questions
Can the concepts of information theory be extended to model the modern composite military
networks that the NS CTA must address?
Initial Hypotheses
We hypothesize that modeling the information flow in composite networks will require
extending the definitions from Shannon‘s Information Theory to understand how terms that have
successfully been used in modeling communications networks can be applied to these new
network types. We will start by extending definitions from information theory as applied to
Figure: Information Fluidity may allow us to more rigorously define quantifiable effects of information
transmission errors based on changes in social or semantic contexts.
Validation Approach
As a primarily mathematical approach, the model will mainly be validated through application to
scenarios (e.g., use cases) and intuitions coming from more applied collaborating tasks (e.g.,
information loss in R2.2). Early results (year 1) will primarily focus on definitions that attempt
to expand those of Shannon's theory to the new network types; later work will include proofs of
key theorems and the development of scientific experiments to test key hypotheses of the model.
One key aspect of validation will be showing that the new definitions, which extend Shannon's
approach, will
Technical Approach
The field of Information Retrieval (IR) offers many tools to approach these research questions.
We define IR as the study of computational methods for searching, organizing, and analyzing
very large, semi-structured, heterogeneous databases of text, images, video, and other media that
are easily understood at the level of human cognition but for which mathematical models can at
best only capture the semantics at rudimentary levels. We therefore view IR as a problem in
Artificial Intelligence. The basic function of an IR system is to take a query entered by a human
user and return a set of documents ranked in decreasing order of probability of relevance to the
user.
One of the key problems in IR is measuring the utility of results ranked in response to a query.
Utility is generally measured in terms of the relevance of the results; two of the most common
measures are precision (the proportion of retrieved material that is relevant) and recall (the
proportion of relevant material retrieved). Since relevance is a semantic (or pragmatic) property,
it can only be well-judged by human assessors. These assessors incur a cost in time and money,
and the cost is potentially very high: in particular, if they must have expert knowledge to be able
to assess relevance, or if the databases are very large, costs can quickly grow to the point that no
evaluation can be done without a significant investment.
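The two measures are simple to state precisely (a minimal sketch with illustrative document identifiers):

```python
def precision_recall(retrieved, relevant):
    """Precision: fraction of retrieved items that are relevant.
       Recall: fraction of relevant items that were retrieved."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# 10 documents retrieved, 4 of which are among the 8 truly relevant ones:
p, r = precision_recall(range(10), [0, 2, 4, 6, 10, 11, 12, 13])
```

The hard part, as the paragraph above notes, is not computing these numbers but obtaining the relevance judgments they depend on.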
This is an important problem in IR, and we believe it will be even more important in network
science: we expect that the types of information flowing through the network will be so varied,
and the algorithms used to process queries so diverse across nodes, that it will cost significantly
more to evaluate the utility of a network than it does a standard IR system. Fortunately, the
problem has been studied, and there are tools that we can build on. Much of the previous work
focuses on selecting individual results (i.e., individual documents, images, videos, etc.) to be
judged by human assessors, with the goal of accurately measuring utility with only a
small amount of assessor input. Some of this work is based on statistical sampling approaches
that attempt to find a set of documents that provide a low-variance, zero-bias estimate of a
measure like recall ([Aslam06]; [Carterette08b]). One approach is an algorithm using a
priority queue that is updated after each assessment ([Cormack98]). Alternatively, some
approaches assume an initial set of assessments from which additional ones can be inferred
([Jensen07]).
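The simplest version of the sampling idea (a uniform-sampling sketch of our own, far cruder than the low-variance designs of [Aslam06] or [Carterette08b]) estimates precision at rank k by judging only a random subset of the top-k results:

```python
import random

def sampled_precision_at_k(ranked, judge, k, n_judged, rng):
    """Unbiased estimate of precision@k from a uniform sample of the top-k.

    ranked   -- list of document ids in ranked order
    judge    -- callable returning 1 if a document is relevant, else 0
                (the costly human assessment being economized)
    n_judged -- number of assessments we can afford (n_judged <= k)
    """
    sample = rng.sample(ranked[:k], n_judged)
    return sum(judge(d) for d in sample) / n_judged

# Synthetic check: even-numbered documents are relevant, so true
# precision@100 over documents 0..99 is exactly 0.5.
rng = random.Random(7)
est = sampled_precision_at_k(list(range(100)), lambda d: d % 2 == 0, 100, 100, rng)
```

The cited work improves on this baseline by choosing which documents to judge adaptively, so that far fewer assessments achieve the same estimation variance.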
Our approach builds on algorithmic work published in a series of papers by Carterette et al.
([Carterette06], [Carterette07], [Carterette08a], [Carterette08b]). The focus in that work is on
comparative evaluation: given two algorithms for ranking possibly-relevant material, determine
which is better (and report the confidence in that decision). Specifically, given some utility
Task Overview
As discussed in the military scenario described earlier, measuring, reasoning about, and
forecasting the impact of information loss in communication networks and information error in
information networks on the structure and performance of socio-cognitive networks is a key
challenge that systems will face when integrating socio-cognitive models with information and
communication models. A related challenge, necessary for transitioning the theoretical results of
task 1.1, is representing and visualizing the impact of information loss and error on socio-
cognitive networks in a manner that will help warfighters identify socio-cognitive networks of
interest (e.g., adversarial networks, boundaries of trust within coalitions, etc.).
Task Motivation
We believe that advances in network science will occur more quickly if we simultaneously
address the IRC research challenges above. Our goal is to develop theory, models, tools, metrics
and visualizations for reasoning about the impact of information loss and error on socio-
cognitive networks. Insights gained from our research will inform research concerned with how
information loss and error impact, for example, Blue and Red force capability, their socio-
cognitive networks, and the adequacy of warfighters' knowledge of adversaries' socio-cognitive
networks. In the long term, our research will lay the foundation for a capability to perform
dynamic error-impact analysis and visualization on-line as network data streams in from
evolving battlefields.
Key Research Questions
Our key research question is: how can missing or incorrect network data be characterized using
the metrics (extant and forthcoming) that practitioners might use to describe and reason about
networks? Our secondary research question is: how does information error affect behavior in
socio-cognitive networks?
To answer these questions we will address a number of core issues, including (progressing from
6.1 to 6.2):
What are the characteristics of information loss and error scenarios?
How do we differentiate intentional vs. inadvertent information loss and error?
What is the effect of network topology on error propagation?
How can we provide a high level estimation of information loss and error levels given
specific networking data?
What is the robustness of socio-cognitive network metrics in the face of information loss
and error?
What procedures do we have for assessing confidence in network metrics?
What is the impact of information loss and error on socio-cognitive networks and how do
these affect the resultant human behaviors at the tactical, operational and strategic levels?
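As a first cut at the robustness question, one can perturb a network and ask how stable a metric's conclusions are. The sketch below (our illustrative setup, in the spirit of the imperfect-data study of [Borgatti06]) deletes a fraction of edges and scores the overlap of the top-k nodes by degree centrality before and after:

```python
import random

def degree_centrality(adj):
    # Degree centrality: a node's degree normalized by the maximum possible.
    n = len(adj)
    return {v: len(nbrs) / (n - 1) for v, nbrs in adj.items()}

def drop_edges(adj, frac, rng):
    """Simulate information loss: delete a uniformly random fraction of edges."""
    edges = [(u, v) for u in adj for v in adj[u] if u < v]
    keep = rng.sample(edges, round(len(edges) * (1 - frac)))
    new = {u: [] for u in adj}
    for u, v in keep:
        new[u].append(v)
        new[v].append(u)
    return new

def top_k_overlap(c1, c2, k):
    """Robustness score in [0, 1]: agreement of the top-k node rankings."""
    top = lambda c: set(sorted(c, key=c.get, reverse=True)[:k])
    return len(top(c1) & top(c2)) / k

# Complete graph on 6 nodes; deleting 30% of its 15 edges perturbs degrees.
adj = {u: [v for v in range(6) if v != u] for u in range(6)}
rng = random.Random(3)
noisy = drop_edges(adj, 0.3, rng)
score = top_k_overlap(degree_centrality(adj), degree_centrality(noisy), 3)
```

Repeating this over many random deletions, metrics, and loss levels yields exactly the kind of confidence estimates the questions above call for.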
Technical Approach
This task focuses on modeling adversary networks.
Research Milestones
Organization    Government Funding ($)    Cost Share ($)
References
[Aslam06] J. A. Aslam, V. Pavlu, and E. Yilmaz, "A Statistical Method for System Evaluation
Using Incomplete Judgments," Proceedings of the 29th Annual International ACM SIGIR
Conference on Research and Development in Information Retrieval, pp. 541-548, 2006.
[Barnes54] J. A. Barnes, "Class and Committees in a Norwegian Island Parish," Human
Relations, 7:39-58, 1954.
[Batini86] C. Batini, M. Lenzerini, and S. B. Navathe, "A Comparative Analysis of Methodologies
for Database Schema Integration," ACM Computing Surveys, 1986.
[Belov09] N. Belov, M. K. Martin, J. Patti, J. Reminga, A. Pawlowski, and K. M. Carley, "Dynamic
Networks: Rapid Assessment of Changing Scenarios," Proceedings of the 2nd International
Workshop on Social Computing, Behavior Modeling, and Prediction, Phoenix, AZ, March 2009.
[Borgatti06] S. Borgatti, K. M. Carley, and D. Krackhardt, "Robustness of Centrality Measures
under Conditions of Imperfect Data," Social Networks, 28(2), pp. 124-136, 2006.
Task Overview
This task will extend through the course of the project. The main goals are to:
Analyze the opportunities and requirements for experimentation in composite networks,
based on the initial consortium research directions and military needs.
Develop a shared environment for experimentation in composite networks, selecting and
integrating resources (such as datasets, models, simulations, ontologies, methodologies,
software tools) created and used throughout the consortium.
Lead collaborative experimentation in composite networks through outreach, education,
coordination, and planning across the consortium.
Creation of the shared experimentation platform will start in the first year, and additional
components and capabilities will be added as the composite experimentation needs of the
consortium grow. A key aspect of the initial year effort will be to identify a small number of
experimentation components which are mature enough to deploy as a shared resource and to
begin the integration work necessary to deliver capabilities for cross-genre network
experimentation. We anticipate that one likely component is a game-based interface suitable for
presenting a scenario and capturing human behaviors from teams, in order to provide realistic
action input and mission performance measurements for experiments on underlying
communication and information networks.
A central objective of this multi-year task is to establish the fundamental interoperability of a
distributed experiment environment, making possible both Cross-Cutting Research Initiative
(CCRI) experiments and experimentation with combined network research across the ARCs.
As the program progresses, we intend to stay ahead of research and experimental needs.
Initial Hypotheses
The overall CTA must quickly realize a geographically distributed experiment environment
built for flexibility and extensibility. We are starting with pieces, some as yet
unidentified, and a few well-defined experimental needs. Thus, the key initial research
hypotheses in this project are:
Multiple consortium groups have experimentation resources which should be of general
use for experimentation in composite networks.
A basic pack of security and distributed computing software can rapidly enable sharing of
data and models in a substantial but non-real-time way across the ARCs.
A shared platform for experimentation is feasible to develop, deploy, and evolve in a
dynamic CTA research environment.
After examining the initial hypotheses, this task will then go on to determine how:
Consortium researchers will perform collaborative experimentation with composite
networks, using the shared platform.
One can realistically assess the potential effectiveness of network science research
products by placing human subjects in simulated environments and measuring their
responses and decisions in reaction to properties of and events in their social, cognitive,
information and communications networks.
Inherent in these explorations is the underlying theme that any NS CTA experiment is dependent
on many other CTA projects and tasks. We will be putting research products like ontologies,
metrics, models, data sets, and visualization approaches to use in the distributed experiment
environment.
Task Overview
In order to realistically reflect performance in militarily relevant scenarios, ranging from operation
planning to analysis of intelligence to execution of humanitarian relief missions, we must be able
to incorporate human behaviors. The experimentation platform developed under R3.1 will
eventually include methods to simulate simple human actions using automated computer
agents. However, initial development of the composite experimentation platform will emphasize
integration of a virtual environment as an interface to capture human behaviors. Task R3.2
focuses on developing and validating a methodology for practical experimentation in composite
networks, particularly experimentation using human participants acting in virtual environments.
This task differs from R3.3 because this task focuses on experimentally understanding some key
principles required to enable general experimentation in composite networks, while the emphasis
in task R3.3 is on performing experimentation to validate and assess specific composite network
theories.
Task Motivation
Experimentation in full-scale, real-world composite networks is the most realistic, yet it is usually
infeasible due to expense and lack of experimental control. However, composite network simulations
that rely only on scripted or automated behaviors to model human decisions may not produce
realistic results. We must understand which types of network characteristics and behaviors are
accurately modeled in simulation, and which require more real-world input of actual human
behaviors. New controlled, virtual experimentation environments can incorporate a degree of
more realistic human behavior, but their limitations in this domain are not fully understood. In
particular, it is important to understand which types of behaviors and phenomena (e.g., formation
of trust, sharing of information, use of communications, emergence of leadership) unfold
differently in virtual environments than in live environments. Before we can rely on results from
virtual world experimentation, we must assess how closely virtual world behaviors map to the
real world.
Similarly, much research in network science depends on civilian datasets and civilian experiment
participants, due to the limited availability of military data and participants. It is important to
understand whether and how networked interactions among military populations may differ from
their civilian analogs.
1. BBN has done game-based experimentation or training work at more than 20 Army and military training sites for
previous efforts, but we have not yet arranged collaboration for these experiments.
6.7.8 Task R3.3: Applied Experimentation in Composite Networks (A. Leung, BBN
(IRC); J. Hancock, ArtisTech (IRC); D. Williams, USC (IRC))
NOTE: This is a 6.2 applied research task.
Task Overview
In this task, we will perform applied experimentation to validate composite network theories and
assess the potential military operational applications of these theories. Additionally, we will
develop guidelines and processes for experimental design and shared experimentation platform
use, paving the way for other consortium researchers to do more composite network
experimentation.
Task Overview
This task looks into the future of combining network models to understand the kinds of hazards
or dissonances that occur when networks are combined. It looks at existing examples of
combined networks, such as BitTorrent (an information network) on top of the Internet (a
communications network), and the various attempts to layer communications networks with
social networks, to understand the implications of combining network models and the types of
Research Milestones
Organization    Government Funding ($)    Cost Share ($)
ArtisTech (IRC)    133,773

Organization    Government Funding ($)    Cost Share ($)
BBN (IRC)    363,013
UDEL (IRC)    119,080
Williams (IRC)    12,142
TOTAL    628,008

Organization    Government Funding ($)    Cost Share ($)
TOTAL    329,188
6.8.3 Task R4.1 6.1 Liaison (M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D.
Towsley, UMass (IRC))
NOTE: This is a 6.1 basic research task.
This task centers on encouraging collaboration and communication among 6.1 projects across the
NS-CTA. In particular, the liaisons will regularly talk with researchers at their respective ARCs
about technical activities and then confer among themselves, with the project leader, and with
other IRC staff to look for potential unexplored or unrealized synergies.
Mike Dean is responsible for liaison with INARC, Jim Hendler for liaison with SCNARC, and
Don Towsley for liaison with CNARC. Note that both Hendler and Towsley receive funding from
both the IRC and the ARCs with which they will conduct liaison work, which makes this task
very straightforward.
6.8.4 Task R4.2 6.2 Liaison (M. Dean, BBN (IRC); J. Hendler, RPI (IRC); D.
Towsley, UMass (IRC))
NOTE: This is a 6.2 applied research task.
This task centers on identifying promising research at the ARCs that should be rapidly moved to
6.2 experimentation, validation, and possible technology transfer. Researchers often become so
engrossed in their research problems that they do not realize when the work has reached the point
at which it should transition; this task addresses that risk.
The liaisons for the ARCs will be the same as in Task R4.1.
Organization        Government Funding ($)   Cost Share ($)
BBN (IRC)           17,543
RPI (IRC)           17,714
UMass (IRC)         6,705
TOTAL               41,962
Organization        Government Funding ($)   Cost Share ($)
BBN (IRC)           4,386
RPI (IRC)           4,428
UMass (IRC)         1,676
TOTAL               10,490
6.9.3 Task R5.1 6.1 Technical and Programmatic Leadership (W. Leland, BBN
(IRC); I. Castineyra, BBN (IRC))
NOTE: This is a 6.1 basic research task.
6.9.4 Task R5.2 6.2 Technical and Programmatic Leadership (I. Castineyra, BBN
(IRC); W. Leland, BBN (IRC))
NOTE: This is a 6.2 applied research task.
This task addresses technical and programmatic leadership of 6.2-funded work. It also includes
management of the NS CTA facility in Cambridge, MA.
Organization        Government Funding ($)   Cost Share ($)
BBN (IRC)           412,062
TOTAL               412,062
Organization        Government Funding ($)   Cost Share ($)
BBN (IRC)           361,502
TOTAL               361,502
6.10.2 Task EDUC.1 Education and Transition Planning (D. Sincoskie and C.
Cotton, UDEL (IRC))
NOTE: This is a 6.1 basic research task.
Organization        Government Funding ($)   Cost Share ($)
UDEL (IRC)          48,276                   20,260
TOTAL               48,276                   20,260
Table of Contents
7. Non-CCRI Research: Information Networks Academic Research Center (INARC)
7.1 Project I1: Distributed and Real-time Data Fusion and Information Extraction
7.1.1 Project Overview
7.1.2 Project Motivation
7.1.3 Key Research Questions
7.1.4 Initial Hypothesis
7.1.5 Technical Approach
7.1.6 Task I1.1 Quality-of-Information-Aware Signal Data Fusion (T. Abdelzaher, UIUC (INARC); C. Aggarwal, IBM (INARC); A. Bar-Noy, CUNY (INARC); S. Papadimitriou, IBM (INARC); R. Govindan, USC (CNARC))
7.1.7 Task I1.2 Human and Visual Data Fusion (T. Huang, UIUC (INARC); B. S. Manjunath, UCSB (INARC); H. Ji, CUNY (INARC); T. Höllerer, UCSB (INARC); C. Lin, IBM (SCNARC); A. Pentland, MIT (SCNARC); Z. Wen, IBM (SCNARC))
7.1.8 Task I1.3 Modeling Uncertainty for Quality-of-Information Awareness in Heterogeneous Information Network Sources (H. Ji, CUNY (INARC); C. Aggarwal, IBM (INARC); D. Roth, UIUC (INARC); A. Singh, UCSB (INARC))
7.1.9 Linkages Table to Other Projects/Centers
7.1.10 Collaborations and Staff Rotations
7.1.11 Relevance to US Military Visions/Network Science
7.1.12 Relation to DoD and Industry Research
7.2 Project I2: Scalable, Human-Centric Information Network Systems
7.2.1 Project Summary/Research Issues Addressed
The Information Network Academic Research Center (INARC) aims to (1) investigate the
general principles, methodologies, algorithms, and implementations of information networks,
and the ways in which information networks work together with communications networks and
social and cognitive networks, and (2) develop the information network technologies required to
improve the capabilities of the US Army and provide users with reliable and actionable
intelligence across the full spectrum of Network Centric Operations.
An information network is a logical network of data, information, and knowledge objects
acquired and extracted from disparate sources, such as geographical maps, satellite images,
sensors, text, audio, and video, through devices ranging from hand-held GPS units to high-
performance computers. Systematic development of information network technologies must
therefore deal with large-scale information networks whose nodes and edges are of
heterogeneous types, which link multi-typed objects, and which are highly distributed, dynamic,
volatile, and laden with uncertain information.
Among the five projects, the first two, E and T, are cross-center research initiative (CCRI)
projects. For these two projects, we will work closely with the three other research centers,
CNARC, SCNARC, and the IRC, to contribute to the systematic investigation and development
of cross-center network science technologies; these two CCRI projects are therefore detailed in
their corresponding CCRI project descriptions. This INARC IPP proposal is dedicated to the
remaining three INARC-centered projects. In addition, INARC will contribute actively, together
with the other centers, to a comprehensive CTA-wide education plan.
Here we provide a general overview of the three dedicated INARC research projects, outlining
for each project the major research problems, the tasks to be undertaken, the organization, and
the plan for collaboration with other centers.
Project I1: Distributed and Real-Time Data Integration and Information Fusion (Leads: C.
Aggarwal, IBM (INARC) and T. Abdelzaher, UIUC (INARC))
This project aims to answer two fundamental questions to ensure quality of information (QoI) in
data and information fusion: (1) "How can heterogeneous data (sensor, visual, and textual) that
may be delivered over resource-constrained communications networks be integrated and fused
to infer and organize implicit relationships, enabling comprehensive exploitation of information,
and how can fusion be improved by incorporating human feedback into the process?" and (2)
"How can the uncertainty arising from resource-constrained communication, data integration,
and information fusion be modeled to enable assessment of the quality and value of
information?" The focus of this project is large-scale information extraction and fusion in the
context of a large linked information network. The goals of the project include both the
derivation of such logical linkages and their use during the fusion process.
Project I2: Scalable, Human-Centric Information Network Systems
This project is concerned with two research problems: (1) "How can distributed and volatile
information networks be organized and managed?" and (2) "How can information networks be
analyzed and visualized to provide end users with information tailored to their context?" A
sophisticated information network system should present a human-centric, simple, and intuitive
interface that automatically scales according to the context, information needs, and cognitive
state of its users, and should maintain its integrity under uncertainty, physical constraints
(communication capacity, power limitations, device computation capability, etc.), and the
evolving data underneath.
Project I3: Knowledge Discovery in Information Networks (Lead: J. Han, UIUC (INARC))
This project considers a key research problem: "How can efficient and effective knowledge
discovery mechanisms be developed for distributed and volatile information networks?"
Knowledge discovery in information networks involves the development of scalable and
effective algorithms to uncover patterns, correlations, clusters, outliers, rankings, evolutions, and
abnormal relationships or sub-networks in information networks. It is a new research frontier,
and Army applications pose many new challenges, calling for in-depth research on effective
knowledge discovery in distributed and volatile information networks.
Task I3.1: Methods for scalable mining of dynamic, heterogeneous information networks
Task I3.2: Real-time methods for mining spatiotemporal information-related cyber-physical networks
Task I3.3: Text and unstructured data mining for information network analysis
In general, information networks are intertwined with communications networks and social and
cognitive networks in many respects; therefore, concrete collaboration plans with all three
centers, as well as plans for exploring military applications, are laid out in each project's IPP.
Task Motivation
Information networks rest on two fundamental abstractions: the abstraction of information
objects and the abstraction of links that describe their inter-relations. Examples of nodes include
information entities residing in images, text, and sensors, concepts such as threats and
vulnerabilities, events such as attacks, and locations. Links have multiple types that specify
semantic relations between objects. For example, links might indicate acquisitions,
communications, or command chains. There are two main challenges in assimilating signal data
into the information network. First, the same physical object, concept, or event is often sensed or
reported by multiple sensors from different perspectives, with different degrees of reliability, and
possibly different veracity. The resulting data must be collected, consolidated, and linked at
different levels of abstraction. In turn, such linkage can aid the identification and maintenance of
individual information objects, for example by helping prioritize further data acquisition needs.
Second, sensors may generate large amounts of streaming data that must be summarized both for
known specific applications and for generic, as-yet-unknown applications that may be required
in the future. Feedback from the information network into the data fusion system
could improve situation awareness and resource efficiency of fusion by maintaining and sharing
an up-to-date representation of key measured or inferred features of the environment, guide the
degree of summarization, determine and reduce uncertainty, and help compute the most efficient
sensing resource allocation.
Hypothesis
This task will test the hypothesis that the resource requirements of data fusion and the value of
information resulting from data fusion can be significantly improved by exploiting feedback from
the information network regarding object uncertainty, data linkages, and models of sensed
phenomena. This improvement is with respect to techniques that do not use feedback from the
information network.
Prior Work
Proposed Work
In a joint vision with the CNARC, we aim to explore the concept of networks as information
sources, as opposed to networks as mere connectivity providers between end-points. In addition,
we close the loop between the physical sensing system and the information network in order to
improve information quality, bound the degree of uncertainty and reduce resource needs. An
information distillation network must integrate a large number of human and data sources of
different degrees of reliability, noise, uncertainty, resource cost, and veracity into higher-level
pieces of actionable information with a quantifiable level of quality and low uncertainty. Current
data fusion techniques are "open loop" in that the logical information flow is one-directional,
from sensors to fusion engines. We hypothesize that closing the loop by utilizing knowledge
extracted from the information network can significantly improve these techniques. Two aspects
matter in this regard: the quality of information (e.g., how reliable, accurate, or uncertain it is)
and the value of information (e.g., how much the user cares about this information in view of
other known information). For example, a determination that something is a threat with high
confidence makes the information of higher quality than the same determination with low
confidence. Likewise, new information from ground sensors about tanks in the vicinity is of
more value if current knowledge about the tanks is poor and the tanks are very close. To achieve
the vision of information distillation networks, we propose to investigate and develop three
fundamental enabling components, discussed below.
The first component and outcome of this task is a suite of distributed algorithms, leveraging our
prior work on battlefield awareness and tracking [Abdel2004, Luo2006], to transform sensory
information feeds into a set of uniquely identifiable and addressable logical (information
network) objects with well-defined operationally-relevant semantics and a quantified degree of
uncertainty. Information objects thus formed could range from physical models of distributed
phenomena measured by the network to the representation of physical objects in the
environment. The aforementioned algorithms will employ feedback from fusion results back to
the data collection system to optimize the quality of collected information. The feedback
recognizes that data is only as valuable as its contribution to the quality of information objects.
Towards that end, we shall develop protocols for data collection and fusion that maximize the
quality of object representation, taking into account both the structure of the information network
and the user-defined value of information for each object. Explicit consideration will be given to
resource constraints: an optimization problem will be solved whose objective is to select the
most appropriate sensors and sensor modalities to maximize the global information value of the
network, given current resource constraints and the degree of uncertainty of the different
sources. Distributed and localized algorithms that take these communication constraints into
account will be explored.
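The sensor-selection optimization just described can be sketched as a simple budgeted greedy heuristic. This is a minimal illustration, not the task's actual algorithm; the sensor names, information values, costs, and budget below are all hypothetical.

```python
# Hypothetical sketch: greedily choose sensors by information value per unit
# resource cost, subject to an overall resource budget.

def greedy_sensor_selection(sensors, budget):
    """sensors: dict name -> (info_value, cost). Returns (chosen names, spent cost)."""
    chosen, spent = [], 0.0
    remaining = dict(sensors)
    while remaining:
        # best remaining value-per-cost ratio
        best = max(remaining, key=lambda s: remaining[s][0] / remaining[s][1])
        value, cost = remaining.pop(best)
        if spent + cost <= budget:   # skip sensors that exceed the budget
            chosen.append(best)
            spent += cost
    return chosen, spent

sensors = {
    "acoustic-1":  (0.9, 2.0),   # (information value, resource cost) -- illustrative
    "ir-cam-3":    (1.5, 5.0),
    "vibration-7": (0.4, 0.5),
}
chosen, spent = greedy_sensor_selection(sensors, budget=3.0)
```

A production version would replace the static values with QoI estimates fed back from the information network and re-solve as uncertainty changes.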
The second main component and outcome of this task is an analytic framework for reasoning
about uncertainty propagation in the data fusion system, together with mechanisms that bound
such propagation. In particular, as sensory data are combined in the data fusion system,
inferences can be made about the degree to which individual data sources contribute to the
quality of information at the destination and to the error in information object representation.
These inferences can be used to reconfigure data collection such that error is reduced and quality
is improved. An analytic framework will be developed to offer a principled approach for such
reconfiguration.
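As a toy illustration of uncertainty propagation through fusion (not the analytic framework itself), inverse-variance weighting of independent Gaussian estimates shows how each source's uncertainty contributes to, and is reduced by, the fused result; the means and variances below are made up.

```python
# Illustrative sketch: fuse independent Gaussian estimates (mean, variance)
# by inverse-variance weighting; the fused variance is smaller than any input.

def fuse(estimates):
    """estimates: list of (mean, variance) from independent sources."""
    precision = sum(1.0 / v for _, v in estimates)       # total precision
    mean = sum(m / v for m, v in estimates) / precision  # precision-weighted mean
    return mean, 1.0 / precision

fused_mean, fused_var = fuse([(10.0, 4.0), (12.0, 1.0)])
```

Inspecting each source's precision share (1/v divided by the total) indicates how much that source contributes to the fused quality, which is the kind of inference the reconfiguration step relies on.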
The third component considers signal data sources, such as infrared or vibration monitors, that
generate significant amounts of data arriving in the form of streams. Since the volume of
historical data may be too large to hold explicitly and may be received from a large number of
diverse sensors, we must develop companion techniques that fuse the information from multiple
networked sensors and construct compressed summaries for a wide variety of information
extraction applications. The logical information network linkages among the networked sensors
may be used during the summarization process. These summarization techniques must be
general enough to be reusable in a variety of scenarios. We will explore how
summaries such as sketches, histograms, wavelets and sampling [Agg2007] can be leveraged in
military applications. Effective techniques will be designed to archive and retrieve such
summaries.
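Of the summary types named above, sampling is the simplest to illustrate: classic reservoir sampling maintains a fixed-size uniform sample of an unbounded sensor stream in one pass. This sketch uses an illustrative integer stream in place of real sensor readings.

```python
import random

# Reservoir sampling: after processing i items, each item is in the k-slot
# reservoir with probability k/i, using O(k) memory regardless of stream length.

def reservoir_sample(stream, k, rng=random.Random(0)):
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randint(0, i)        # replace a slot with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

summary = reservoir_sample(range(10_000), k=8)
```

Sketches, histograms, and wavelets trade the uniform-sample guarantee for tighter error bounds on specific queries (frequencies, range sums), which is why a suite of summaries, rather than one, is proposed.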
Validation
Validation will proceed experimentally by comparing performance of data fusion algorithms in
terms of resources needed and value achieved in the case where fusion does not utilize object and
phenomena models gathered from the information network and in the case where it leverages
such information. The PIs are in possession of a number of simple sensors including light, sound,
magnetic, infrared (camera), motion, and acceleration sensors. In addition, a number of publicly
available sensor data sets are available, such as the UC Berkeley sensor data set. Experiments
will proceed in three steps. Initial comparison of the algorithms (in year 1) will be conducted in
simulation using common network simulators, such as NS2 or equivalent, that can account for
communication models and constraints. In later years, we will use models
provided by CNARC. Sensors will be abstracted by noisy detectors or noisy scalar sources. This
evaluation will lead to a general understanding of the strengths and limitations of the data fusion
algorithms considered as a function of abstract sensor models. Next, validation will be conducted
using data from the physical sensors mentioned above. These results will shed light on
performance characteristics of fusion, such as the quality of collected information and the degree
of resulting uncertainty, given specific sensors and targets. Finally, physical experiments will be
performed on (existing) laboratory testbeds comprised of multiple distributed sensors with
wireless interfaces. For the summarization portion, the PIs will use publicly available sensor
data sets. The PIs are further open to suggestions regarding sensing modalities to use in their
experiments.
Products
The research from this effort will result in the following: (1) methods for creating semantically
linked objects from sensor streams; (2) analytic and performance results on information quality
and uncertainty that quantify the end-to-end behavior of fusion that takes information network
linkages into account; (3) methods for stream summarization; (4) experimental results validating
the hypothesis on the usefulness of exploiting the information network in fusion; and (5) research
reports or published papers describing the aforementioned items.
7.1.7 Task I1.2 Human and Visual Data Fusion (T. Huang, UIUC (INARC); B. S.
Manjunath, UCSB (INARC); H. Ji, CUNY (INARC); T. Höllerer, UCSB
(INARC); C. Lin, IBM (SCNARC); A. Pentland, MIT (SCNARC); Z. Wen,
IBM (SCNARC))
Task Overview
This task studies the fusion of multimodal data (with a special focus on visual and human data)
in the context of information network ontology and linkage structure. The core idea is to utilize
the virtual linkages in the information network during the fusion process. Some examples of
such fusion techniques are based on the following kinds of virtual linkages: (1) Co-reference in
multiple documents (2) Text linked by images and video (sensor) feeds (3) Links between
different sensors implicit in a distributed network such as a camera sensor network (4) Network
links between different kinds of entities; for example a document may link to an image in an
information network environment, and therefore provide hints for effective fusion and inferences
which may be derived from such a linkage.
Hypothesis
Inference methods which utilize network ontology and links during the fusion process of
multimodal data are significantly superior to conventional inference methods.
Prior Work
The problem of fusion in the context of multimedia data has been widely studied; see the survey
reference [Wu2004] for a detailed study of such techniques. However, these methods construct
fusion tools in isolation, and do not take into account the rich information which is available in
the virtual linkages of an information network. This task studies the problem of multimedia
fusion in the context of the virtual linkages of an information network; a scenario in which the
rich linkage information can be leveraged for fusion and knowledge discovery.
Proposed Work
Information networks often involve multiple types of data, and how best to combine these
features is a challenging problem. The fusion problem becomes more important when we
introduce virtual network linkages, as discussed above. In [Cao2008] we proposed a
probabilistic approach to modeling subject relationships using heterogeneous data sources, and
showed that the model with heterogeneous data significantly outperforms models built from
uniform data. However, the approach in [Cao2008] is limited in that it requires the nodes in the
network to take only two states (0 or 1). We are working on a more complex model that relaxes
this restriction.
We will also design methods for fusing human data (in the form of text, lectures, oral histories,
speech, images, and audio sensor feeds) from multiple networked sources with the use of
linkage information. As possible scenarios, we will build information networks using (a) web-
scale multimedia data, especially news and blogs on the web, and (b) data from a camera sensor
network augmented by human observers' recordings. At the core of human data fusion lie
techniques to identify "facts" (entities, relations, and events) of a particular type within different
kinds of media, such as documents, images, sensor information, or video, which are subsequently
converted into structured representations (e.g., databases). Most current information extraction
(IE) systems focus on processing one source at a time; this is not well suited to large
information networks containing many disconnected, unranked, redundant (and some erroneous)
facts.
A related challenge is that the information network associated with images or videos often
contains unlabeled data. Some unlabeled samples contain distinguishing characteristics, such as
familiar faces and popular activities, that can be recognized by general visual understanding
systems; the other samples are difficult to interpret. We will leverage a label propagation
approach based on the association network, which is built according to the similarity between
visual subjects. To handle the missing labels, our approach first annotates the visual subjects
with distinguishing characteristics and then propagates the labels to the other "hard" samples
across the association network. We believe this approach has potential for general network
interpretation tasks and will keep working in this direction.
When we combine information from images and their associated text (e.g., metadata, captions,
surrounding text, transcriptions), one challenge lies in the uncertainty of the text representation.
The descriptions are usually generated by humans and are thus prone to error. Images, especially
web images, are typically labeled by different users with different languages and cultural
backgrounds, so it is unrealistic to expect descriptions to be consistent. Without rich and
accurate descriptions, information network images cannot be searched or processed correctly.
We therefore propose the following:
(1) We will design methods for integrating such heterogeneous data into a canonical document
representation graph that accommodates data coming from heterogeneous formats and media.
By mining the connections (or virtual links) between these metadata and their models, it is
possible to obtain a unifying, coherent representation of the structured network. This
representation is designed so that it accommodates, for every supported document format,
enough information to allow an inference algorithm to run. Furthermore, a typical information
distillation system primarily uses prior knowledge, which is not updated during the extraction
process. We will take a broader view by exploiting posterior knowledge derived from related
documents and other data available on the entire information network. The underlying
philosophy for such integrated information networks is to leverage redundant information across
sources.
We will also leverage an existing approach based on Markov Logical Networks [Dom2008] to
capture the global inference rules. Such a statistical relational learning approach will provide a
more unified framework to combine the power of both uncertainty (global confidence metrics)
and complex relational structure (interactions among different events, arguments and roles).
Exploiting this approach will also provide greater flexibility to encode probabilistic graphical
relations besides first-order logic, and thus allow us to fuse dynamic background knowledge as
required to effectively interpret a multimedia document in a more holistic way in the context of
the information network. Besides written text, ever-increasing amounts of human-generated data
are available as speech recordings, including news broadcasts, meetings, debates, lectures,
hearings, oral histories, and webcasts. This medium involves many difficulties related to its
variability in quality, environment, speaker, and language. In this project, we will attempt a
novel approach of linking diverse facts extracted from the information network as feedback to
enhance the fusion process. Our approach is not restricted to groups of documents but extends to
cross-media fusion.
For the camera sensor network scenario, we will use the extensive network data available at
UCSB. This unique infrastructure has a large number of video cameras, both static and mobile,
distributed across the campus and surrounding areas. For this project, we will augment the
camera network data with human observers who provide verbal information that may cover
areas not covered by the camera network. On the analysis side, one of the primary challenges is
to discover the relationships between the non-visual data and the visual information:
specifically, how the visual data can be used to provide a summarization similar to the narrated
descriptions and, if available, how verbal annotations can help the visual recognition process.
Another challenge is to sense anticipated and unanticipated abnormal behavior occurring in
the camera network based on the historical information available through other network sources,
including the past camera data itself. For example, these could include prior network information
about possible motion patterns at different times of the day/week at specified locations in the
network.
Unfortunately, automatically inferred information is never perfect: many factors determine the
quality of information in visual data, starting with the image acquisition process and
accumulating uncertainty all the way to the decision engines. Modeling and quantifying these
degradations is especially important in the networked environment, where information nodes
need to know how trustworthy the information coming from other nodes is. In the camera
network setting, we will model the accuracy of our algorithms with respect to time-dependent
factors such as outside lighting, crowd conditions, and traffic patterns. This approach to
modeling information quality over time is well aligned with our goal of temporal integration of
the sources. In addition, modeling the quality of the data through several variables will improve
the robustness of the proposed query and data synopsis interfaces for handling uncertainty in
Task I1.3.
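The label-propagation idea described earlier in this task can be sketched minimally as follows. This is a hypothetical illustration, not the project's actual algorithm; the node names, similarity weights, and seed label are invented.

```python
# Sketch: labels on "distinguished" (seed) samples spread to unlabeled "hard"
# samples over a similarity-weighted association network.

def propagate_labels(adj, seeds, iters=10):
    """adj: node -> {neighbor: similarity}; seeds: node -> fixed label."""
    current = dict(seeds)
    for _ in range(iters):
        updated = dict(current)
        for node, nbrs in adj.items():
            if node in seeds:                # seed labels never change
                continue
            votes = {}
            for nbr, w in nbrs.items():      # similarity-weighted neighbor vote
                lab = current.get(nbr)
                if lab is not None:
                    votes[lab] = votes.get(lab, 0.0) + w
            if votes:
                updated[node] = max(votes, key=votes.get)
        current = updated                    # synchronous update
    return current

adj = {"a": {"b": 1.0}, "b": {"a": 1.0, "c": 0.5}, "c": {"b": 0.5}}
labels = propagate_labels(adj, seeds={"a": "vehicle"})
```

In the visual setting, the edge weights would come from similarity between visual subjects, and the seeds from samples with distinguishing characteristics the recognition system can already label.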
(2) Modeling and reducing the uncertainty in information networks is critical in practical
problems. In essence, the data uncertainty in these information networks usually comes from
noise caused by incorrect labeling of linkages due to human error or subjectivity
[Weinberger2008]. In particular, in collaborative annotation tasks, many people label the
linkages between cross-media data to create training examples. They annotate based on
individual subjectivity, which may vary widely among people because of their different
educational and cultural backgrounds, knowledge, and life experiences. Reducing the
uncertainty of these annotations requires quality control procedures. Redundant information in
annotations from different users can be used to improve annotation quality: in collaborative
annotation, each linkage is often annotated by several users simultaneously, and by mining the
redundancy among these simultaneous annotations, the uncertainty can be reduced. A direct
way is to vote among the annotations. For example, game-based human annotation systems
have been developed for collaboratively annotating cross-media linkages [Ho2009]; these use
Games With A Purpose (GWAP) [Ahn2006] to attract users to actively find consensual
annotations in the game so that annotation quality can be guaranteed. Such systems have
successfully generated large amounts of high-quality linkage annotations through the
information network.
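The voting step above reduces to a simple majority rule; a sketch, with the vote fraction serving as a crude confidence score. The annotation labels below are illustrative.

```python
from collections import Counter

# Majority voting over simultaneous annotations of the same linkage:
# redundancy among annotators is mined to reduce labeling uncertainty.

def consensus(annotations):
    counts = Counter(annotations)
    label, top = counts.most_common(1)[0]   # most frequent annotation
    return label, top / len(annotations)    # (consensus label, vote fraction)

label, confidence = consensus(["vehicle", "vehicle", "building", "vehicle"])
```

Systems like the GWAP-style games cited above effectively implement this with weighting for annotator agreement history rather than a flat vote.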
Validation
Since the linkage based approach uses two scenarios corresponding to web-scale linked data and
the camera sensor network data, we will design validation methods for both scenarios. For
multiple data source fusion in the first scenario, we will build a collection from the World Wide
Web by harvesting large amounts of media relevant to an army-related domain (e.g., vehicles).
Specifically, the data will consist of text, images, and images with captions as well as other text
that links to the source documents. We will compare the performance of the algorithms to
unbiased human annotators. We will also compare our techniques to conventional methods
which do not use network-based linkage structure in the fusion process. This method will yield
qualitative and quantitative assessment for the capabilities of the proposed methods. For the
combined representation of visual and text data we will work closely with project I2 to identify
the opportunities of developing fusion and dissemination methods that are optimally beneficial to
the remainder of the information network. For the distributed camera network case, we will
focus on specific activity scenarios in which the activities cannot be recognized from any single
camera but can be detected collectively across the network, and on optimal ways of fusing
human observer data with sensor network data. The data will include both indoor and outdoor
activities; in both cases, analysis and reasoning need to happen across the network, presenting
challenges in fusing information between network nodes that may have non-overlapping visual
fields. We will develop protocols for comparing the automated analysis with human interactive
analysis. The PIs will also work with ARL to identify militarily relevant data sets in order to test
the effectiveness of these techniques.
Products
The research from this effort will result in the following: (1) methods for data fusion of
heterogeneous data sources; (2) data collected from the camera sensor effort; (3) experimental
results validating the above techniques; (4) research reports or published papers describing the
aforementioned items.
7.1.8 Task I1.3 Modeling Uncertainty for Quality-of-Information Awareness in
Heterogeneous Information Network Sources (H. Ji, CUNY (INARC); C. Aggarwal,
IBM (INARC); D. Roth, UIUC (INARC); A. Singh, UCSB (INARC))
Task Overview
The task studies the uncertainty which results from the fusion of different kinds of networked
data. The uncertainty is analyzed in the context of network linkage issues such as co-reference
between objects.
Task Motivation
The ability to handle heterogeneous data sources, many of which are unstructured—text or
images—in a networked environment provides unparalleled challenges and opportunities for
improved decision making. Data can be noisy, incorrect, or misleading. Unstructured data,
mostly text, is difficult to interpret. In a large, diverse, and interconnected system, it is difficult
to assure accuracy or even coherence among the data sources. The fusion process may itself
cause data uncertainties, which may be challenging both from a modeling and usage perspective.
Our underlying premise is that uncertainties in these scenarios must be explicitly accounted for
to achieve truly significant advances in network sciences. The need to handle such uncertainties
in the context of the structural framework of a massive information network, and support
querying, search, retrieval, mining methods over such new structures is a highly critical
requirement. Our vision is to build on these techniques to develop a theory of integrating
information from various sources into an information network, and further to model the use of all
available information from the information network. The goal of our proposed research is to
study both how to learn good models from different information network sources with different
kinds of associated uncertainty, and how to make use of these, along with their level of
uncertainty in supporting coherent decisions, taking into account characteristics of the data as
well as of its source. We propose a model centered on extended joint inference over learned,
discriminative or generative models over the entire information network, where the level of
uncertainty is represented explicitly, and inference is done within a constrained optimization
framework. This allows us to generate multiple models for information network data sources,
take into account their level and type of uncertainty, and combine and propagate coherent
decisions that respect domain- and task-specific constraints. Such an approach is well suited to
the information network domain.
Hypothesis
Prior Work
The field of information theory provides fundamental measures to quantify and explicitly
account for uncertainties in data and provides the corresponding sound means to formulate
performance criteria for optimal algorithm development, as well as overall performance bounds
[Cover1991]. The database community has also studied the concept of uncertainty and its
representation in the form of probabilistic databases [Agg2009]. Multiple systems have
addressed the problem in traditional databases in some context [Corm2009,Jag2008, Jag2004,
Dalvi2007, Suc2004, Jam2008, Benj2006, Sen2007].
Proposed Work
We will conduct research on the design of statistical and machine learning approaches for
determining the correctness of information extraction output. We will adapt the node centrality
problem in graph theory to our global confidence estimation research. Our basic underlying
hypothesis is that the salience of an entity should be calculated by taking into consideration both
its confidence and the confidence of other entities connected to it, which is inspired by PageRank
and LexRank. This implies that the confidence of an entity should be calculated by examining
the entire information network surrounding that particular node. In this way we intend to explore
more than each individual co-reference or relation link, and also analyze the entities that cast the
vote. For example, a vote by linked entities which are highly voted on by other entities is more
valuable than a vote from unlinked entities. This is the essence of an approach built on the
PageRank paradigm. These methods will also be integrated into the social/cognitive network
paradigm in which users may provide feedback during search and monitoring.
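The vote-weighted confidence idea above can be sketched as a short propagation loop. This is a minimal illustration over hypothetical data; the function name, the damping constant, and the toy entity graph are our own choices for illustration, not part of the proposed system.

```python
# Sketch of confidence propagation over an entity linkage graph: each
# entity's salience mixes its local extraction confidence with the
# (vote-weighted) salience of the entities linking to it, in the spirit
# of PageRank/LexRank. All data below is hypothetical.

def propagate_confidence(local_conf, edges, damping=0.85, iters=50):
    """local_conf: {entity: extractor confidence in [0, 1]}.
    edges: {entity: [entities it links to / votes for]}."""
    score = dict(local_conf)
    for _ in range(iters):
        nxt = {}
        for v in score:
            # votes arriving from entities that link to v, each vote
            # diluted by the voter's out-degree
            incoming = sum(score[u] / len(edges[u])
                           for u in edges if v in edges[u])
            # blend the entity's own confidence with its neighbours' votes
            nxt[v] = (1 - damping) * local_conf[v] + damping * incoming
        score = nxt
    return score

entities = {"A": 0.9, "B": 0.4, "C": 0.7}
links = {"A": ["B"], "B": ["C"], "C": ["A", "B"]}
print(propagate_confidence(entities, links))
```

Because every vote is distributed across the voter's out-links, the total confidence mass is conserved; only its allocation across entities changes with the linkage structure.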
In addition, text and image processing methods are typically organized as a pipeline architecture
of processing stages (e.g. from pattern recognition, to information fusion, and to summarization).
Each of these stages has been studied separately and quite intensively over the past decade.
There has clearly been a great deal of progress on some of these components. However, the
output of each stage is chosen locally and passed to the next step, and there is no feedback from
later stages to earlier ones. Although this makes the systems comparatively easy to assemble, it
comes at a high price: errors accumulate as information progresses through the pipeline, and an
error once made cannot be corrected. There is little work on using logic to model the
interpretation of facts. Classical logical inference, however, is unable to deal with the
combinations of disparate, conflicting, uncertain evidence that shape such events in discourse.
We intend to move away from approaches that make chains of independent local decisions, and
instead toward methods that make multiple decisions jointly using global information. We
propose to address this by combining logical inference with probabilistic methods. We will focus
on extraction of facts with the following property: they are neither yes nor no, but they convey
information that can be used to infer such a fact with some degree of confidence, though often
not with enough confidence to count as resolving. We will develop techniques for improving the
extraction performance of multi-media data by explicitly modeling the errors in the extraction
process.
We shall further extend the aforementioned techniques to the integration of a large number of
inaccurate and possibly inconsistent physical sensing sources. Physical signals obey physical
laws of nature that are given by known models (of unknown parameters). When multiple sensors
observe overlapping phenomena, these physical models present constraints on possible relations
between correct data values. An iterative algorithm can then be executed where the collected
sensor values of different degrees of confidence help quantify the parameters of a physical
model, which then in turn helps quantify the confidence in individual data sources. By estimating
the veracity of the individual sources in view of both known physical data models and estimated
linkages between data flows, the approach can further be used to reconfigure the data collection
system as described in Task I1.1 to improve quality of information.
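The alternating estimation scheme described here can be illustrated with a small sketch: sensors report one physical quantity, a confidence-weighted mean plays the role of fitting the physical model, and source confidences are then re-derived from residuals. All names and data are hypothetical, and a real deployment would fit the actual physical model rather than a constant.

```python
# Minimal sketch of iterative source-veracity estimation: fuse readings
# by a confidence-weighted mean, then re-score each source by its
# agreement with the fused estimate, and repeat. Hypothetical data.

def iterative_veracity(readings, iters=20):
    """readings: {source: observed value}; returns (estimate, confidences)."""
    conf = {s: 1.0 for s in readings}              # start with uniform trust
    estimate = sum(readings.values()) / len(readings)
    for _ in range(iters):
        # 1) fit the "physical model" (here: one constant) by weighted mean
        total = sum(conf.values())
        estimate = sum(conf[s] * v for s, v in readings.items()) / total
        # 2) re-score each source by closeness to the fused estimate
        conf = {s: 1.0 / (1e-6 + abs(v - estimate))
                for s, v in readings.items()}
    return estimate, conf

sensors = {"s1": 10.1, "s2": 9.9, "s3": 25.0}      # s3 is an outlier
est, trust = iterative_veracity(sensors)
print(est, trust)
```

On this toy input the estimate converges toward the cluster of agreeing sensors, and the outlying source ends up with a much lower confidence, which is the signal that could drive the reconfiguration described in Task I1.1.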
As the volumes of uncertain data increase, users are forced to become more reliant on data
exploration techniques to identify the interesting portions of their data. Unfortunately, query
processing using the intuitive all-possible-worlds semantics in these systems has proven to be a
computationally difficult task and is shown to be #P-complete in the general case [Suc2004].
Due to this limitation, accurately approximating query results (and their associated probabilities)
is an important problem. We aim to design methods for building a compact data synopsis, which
is capable of providing approximate answers to simple count queries over probabilistic databases.
This will provide a mechanism which allows users to quickly explore large uncertain datasets by
circumventing the standard query processing engine. In practice, it is often the case that there
are multiple tables which need to be compressed and only a global space budget is provided. A
simple approach would be to distribute available space evenly across all of the tables. However,
this can result in wasted resources as it may be possible to represent some data accurately with
less space than others. Additionally, the goal in this scenario should be to minimize error in a
global manner, over the set of all representations, not just locally for each dataset.
In the proposed work, we address this problem and provide an optimal solution which is
independent of the method used to summarize each individual data source. Specifically, given a
global space budget, B, and a set of data sources, S, we compute an optimal space allocation in
which the global L2 error is minimized. We will consider two algorithms to compute an optimal
space allocation. The first is a dynamic programming approach which works without restriction.
The second is a local update method which is faster, but requires that the error function of the
approximation method is strictly decreasing with respect to the amount of space allocated to a
signal.
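A minimal sketch of the first (dynamic programming) algorithm follows, assuming each source exposes a table of approximation errors per candidate space level; the error tables and function names are illustrative only, not part of the proposed system.

```python
# Sketch of optimal global space allocation: given a budget B and, for
# each source, the error achieved at each candidate space level, choose
# per-source allocations minimizing the total error. Hypothetical data.

def allocate(errors, budget):
    """errors: list of lists; errors[i][b] = error of source i using b units.
    Returns (min total error, per-source allocation)."""
    # best[used] = (min total error over the sources seen so far, allocations)
    best = {0: (0.0, [])}
    for err in errors:
        nxt = {}
        for used, (tot, alloc) in best.items():
            for b, e in enumerate(err):        # give this source b units
                if used + b > budget:
                    break
                cand = (tot + e, alloc + [b])
                if used + b not in nxt or cand[0] < nxt[used + b][0]:
                    nxt[used + b] = cand
        best = nxt
    return min(best.values(), key=lambda t: t[0])

# three sources; each error table decreases as allocated space grows
tables = [[9.0, 4.0, 1.0, 0.5],
          [6.0, 5.5, 5.0, 4.9],
          [8.0, 2.0, 1.9, 1.8]]
print(allocate(tables, budget=4))
```

Note how the optimum is uneven: sources whose error drops steeply with a little extra space receive it first, which is exactly the waste the naive even split would incur.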
For tuples with large domains, the typical techniques for representing distributions (i.e.,
histograms) become costly to compute and may induce large errors.
In the first year, we will develop techniques for summarizing the uncertainty in a single
component of a collection of objects. Beyond the initial year, we will extend these techniques to
summarizing the entirety of the collection, possibly with uncertainty in multiple components. We
will also develop methods that can compute these summaries in an online manner. Thus, we
will be able to consider any dynamic collection of uncertain objects and be able to summarize
them in a compact representation.
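One simple way to realize such a single-component summary, sketched here under our own assumptions (a fixed-width histogram over a known value range; all names and numbers hypothetical), is to accumulate each arriving uncertain object's probability mass into bins, from which expected counts can be answered without touching the full data:

```python
# Sketch of an online synopsis for one uncertain component: each object
# contributes (value, probability) pairs; a fixed-width histogram is
# updated incrementally and answers approximate count queries.

class UncertainSummary:
    def __init__(self, lo, hi, bins):
        self.lo, self.hi, self.bins = lo, hi, bins
        self.mass = [0.0] * bins               # accumulated probability mass

    def add(self, distribution):
        """distribution: list of (value, prob) for one uncertain object."""
        width = (self.hi - self.lo) / self.bins
        for v, p in distribution:
            idx = min(int((v - self.lo) / width), self.bins - 1)
            self.mass[idx] += p

    def approx_count(self, lo, hi):
        """Expected number of objects whose value falls in [lo, hi)."""
        width = (self.hi - self.lo) / self.bins
        return sum(m for i, m in enumerate(self.mass)
                   if lo <= self.lo + i * width < hi)

s = UncertainSummary(0.0, 10.0, 10)
s.add([(1.2, 0.7), (8.4, 0.3)])    # object 1: two possible values
s.add([(1.9, 1.0)])                # object 2: a certain value
print(s.approx_count(1.0, 2.0))
```

The synopsis is updated per object in constant time per possible value, so it supports the online, dynamic setting described above; the coarse bin-edge query rule is a simplification a full design would refine.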
Validation
We will validate our techniques on a variety of real data sets which can be used in order to
simulate information networks and other forms of uncertain data which are associated with
information networks. We will compare the techniques proposed in this task to more
conventional methods and validate whether our approaches turn out to be superior. We will test
our methods on the following data sets.
The research from this effort will result in the following: (1) Models for uncertainty estimation
from data fusion (2) Methods for processing and improving the uncertainty of the data obtained
from different sources (3) Experimental results validating the above (4) Research reports or
published papers which describe the aforementioned items.
The research from this effort will allow us to summarize a collection of uncertain objects in a
compact representation. These compact representations can be used to develop models of
uncertain data for querying and mining (Project I2), and for assessing the QoI of data. The
framework for characterizing and reducing uncertainty will be essential for understanding QoI
and trust. When there are conflicting data sources, the algorithms which have been developed in
this task allow us to estimate a confidence measure for the extracted information. In the Trust
CCRI, we will show how we will use the models developed from our information extraction
algorithms to develop a more comprehensive theory of distributed trust over the entire pipeline
of information processing.
References
Budget By Organization
7.2.4 Task I2.1 Information Network Organization and Management (X. Yan, A.
Singh, UCSB (INARC); C. Aggarwal, IBM (INARC); G. Cao, PSU
(CNARC); J. Hendler, RPI (IRC))
Task Overview
An information network is a conceptual representation of not only the data which is explicitly
stored on the military network of distributed repositories, but also the implicit knowledge present
in various human-intelligence sources and public domains. A given query, for which results are
desired in real-time, may need to draw on any of these explicit and implicit information sources
for the most appropriate resolution. This task will study the science and principles behind
different data models and management mechanisms for heterogeneous network information
access and query answering.
Task Motivation
Military domain data and information is often widely spread across a disparate collection of data
sources. It is important to recover implicit links between nodes not only in one information
network, but also from multiple heterogeneous information networks. In addition, physical
constraints and human factors will significantly influence information networks. In network
centric military operations, for example, battlefield soldiers need a simple, intuitive query
interface for fast information access, while headquarters agents might rely on complicated
analytical tools and powerful machines to discover knowledge hidden deeply in an information
network. These differences pose great challenges for studying information network models,
languages and systems that allow situation-aware network information access.
Prior Work
The Resource Description Framework (RDF) [Manola2004], e.g., used in Linked Data
[Bizer2008], has an abstract syntax that could represent network data. However, there is a lack
of formal semantics that can capture two issues ubiquitous in most military data
sources: uncertainty and trust. In this project, we will study how to augment or design new
data models with statistical measures suitable for information networks. There have
been many studies on data cleaning, information integration, and trustability analysis, e.g.,
[Dasu2003, Raman2001, Bhattacharya2004, Culotta2007, Han2004, Udrea2007] to fuse data
together. However, they do not consider managing data from a network science point of view.
There is a lack of languages and systems to organize and manage network information.
Existing data models, query languages, and database systems do not offer adequate support for
the modeling, management, and querying of graph data. There are a number of reasons for
developing native graph-based data management systems. Consider expressiveness of queries:
we need query languages that manipulate graphs in their full generality. This means the ability
to define constraints (graph-structural and value) on nodes and edges not in an iterative one-
node-at-a-time manner but simultaneously on the entire object of interest. This also means the
ability to return a graph (or a set of graphs) as the result and not just a set of nodes. Another
need for native graph databases is prompted by efficiency considerations. There are heuristics
and indexing techniques that can be applied only if we operate in the domain of graphs.
A number of query languages have been proposed for graphs [Consens90, Guting94, Leser05,
Sheng99]. GraphQL is different from these languages in that graphs are taken as the basic units
in a fundamental way. Some of the recent interest in graph query languages has been spurred by
the Semantic Web and the accompanying SPARQL query language. This language works
primarily through a pattern which is a constraint on a single node. All possible matchings of the
pattern are returned from the graph database. A general graph query language should be more
powerful by providing primitives for expressing constraints on the entire result graph
simultaneously. Graph grammars have been used previously for modeling visual languages and
graph transformations in various domains. Our work is different in that our emphasis has been
on a query language and database implementations. Furthermore, GraphQL can be efficiently
implemented using graph specific optimizations.
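The contrast drawn here, constraining an entire pattern at once and returning whole subgraphs rather than node sets, can be illustrated with a brute-force toy matcher. This is not GraphQL or SPARQL; the graph, labels, and function names are hypothetical.

```python
# Toy illustration of a graph query that applies node (label) and
# structural (edge) constraints to a whole pattern simultaneously and
# returns every matching subgraph, not just individual nodes.

from itertools import permutations

def match_pattern(graph, labels, pattern_labels, pattern_edges):
    """graph: {node: set(neighbours)}; labels: {node: label}.
    pattern_labels: label required at each pattern position.
    pattern_edges: (i, j) index pairs that must be connected."""
    results = []
    k = len(pattern_labels)
    for cand in permutations(graph, k):
        # node constraints: labels must line up with the pattern
        if any(labels[cand[i]] != pattern_labels[i] for i in range(k)):
            continue
        # structural constraints: every pattern edge must be present
        if all(cand[j] in graph[cand[i]] for i, j in pattern_edges):
            results.append(cand)
    return results

g = {"p1": {"c1"}, "p2": {"c1"}, "c1": {"p1", "p2"}}
lab = {"p1": "paper", "p2": "paper", "c1": "conf"}
# pattern: a paper (position 0) linked to a conference (position 1)
print(match_pattern(g, lab, ["paper", "conf"], [(0, 1)]))
```

The brute-force search is exponential; the efficiency argument above is precisely that graph-specific indexing and pruning heuristics are needed to make such whole-pattern queries practical.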
Proposed Approaches
We will study the science behind different data models for appropriate information network
representation. In order to resolve queries in a comprehensive, accurate, and timely manner, we
propose to investigate innovative techniques for information fusion to link entities from
Information networks typically have information in different nodes which are logically linked
with each other. These logical links often directly lead to further dissemination. For example,
the links within blogs, social networks or information networks naturally lead to further
dissemination. A natural question is which nodes provide the best representatives at
Validation Approach
The design of the proposed information network management system will be validated on a set
of public and synthetic datasets. We will perform comparative studies to evaluate the
effectiveness and efficiency of data access, with and without the information network system.
The validation should demonstrate significant improvement over the existing relational database
techniques on accessing information networks. We will first test our system using the following
datasets.
1. DBLP Information Network: The DBLP graph is downloaded from www.informatik.uni-
trier.de/~ley/db/. There are 684,911 distinct authors and more than 1.3 million publications.
The publications include venues, years, areas, etc., thus formulating a large special-topic
information network.
2. WebGraph. We downloaded a 9GB uk-2007-05 web graph data from http://webgraph.dsi.
unimi.it/ [Boldi2004]. This web graph is a collection of UK web sites. It contains 30,500,000
nodes and 956,894,306 edges. The edges are directional in this case. This dataset could
be used to test the scalability of the proposed information network system.
3. Biological Networks. Functional gene networks integrate multiple interaction sources into a
single network. One common repository for interaction data sources is the BioGRID
database. Different interaction types from high-throughput experiments can be assigned a
confidence value and combined together.
4. Temporal datasets: Dynamic graph datasets will be crucial to some of the research. Such
datasets can be obtained from blog networks.
These publicly available network datasets might contain much more information for testing
than would be available in a military-relevant environment. Working with ARL researchers, we
will adopt two approaches to alleviate this issue. First, we could generate synthetic networks
from these datasets by injecting noise or reducing data points so that they behave similarly to
military data. Second, we will collaborate with military researchers to obtain more extensive
datasets for validation.
Products
(i) technical reports on research issues in information network organization and management,
(ii) scalable algorithms and frameworks generated from the proposed studies, and (iii) research
papers and submissions to international conferences and journals.
7.2.5 Task I2.2 Information Network Online Analytical Processing (X. Yan, UCSB
(INARC); J. Han, UIUC (INARC); C. Lin, IBM (SCNARC))
Task Overview
Given an information network with nodes and edges associated with multiple attributes, a
multidimensional network model can be built to help military operators to perform on-line
analysis over the network. In this case, networks can be generalized or specialized dynamically
for any portions of the data. This model could provide multiple, versatile views of information
networks. In this task, we will study the mechanisms of online analytical processing (OLAP) in
complex information networks.
Task Motivation
To reduce information overload and provide real-time responses to military users, the
exploration of information networks will be done only through the slices of data that are
relevant to a user's current mission and situation. For example, a battlefield information
network may be formed by nodes representing commanders, soldiers, tanks, supporting units,
and enemy units. A commander in the field may like to roll-up the network to see how different
battalions are spatially related and are changing in the entire battlefield, or drill-down to check a
particular spot or soldier to see if reinforcement is needed when the enemy is approaching.
Unfortunately, the lack of a general analytical model makes such sensible navigation and human
comprehension virtually impossible in an environment with complex information networks.
Initial Hypothesis
It is feasible to provide an information network analytical framework that allows a non-expert to
perform hierarchical, multi-dimensional exploration of information networks in real time.
Prior Work
OnLine Analytical Processing (OLAP) concepts have been popular in industry for multi-
dimensional analysis of relational databases and data warehouses [Gray97, Chaudhuri97].
OLAP relies on pre-computed summaries for multi-dimensional data in order to provide fast
responses to flexible drill-down/roll-up styled queries for online data analysis. Unfortunately,
the OLAP framework is not available in the context of networks.
Proposed Approaches
In this task, we will study the mechanisms of OLAP in complex networks, and develop a
novel information network OLAP framework. Conceptually, OLAP on informational
dimensions is similar to overlaying multiple information networks without changing the
granularity of the network, e.g., "merging" the coauthor networks of multiple years and/or
multiple conferences into one. On the other hand, OLAP on topological dimensions is similar
to the zooming out/in of information networks, which merges a set of nodes into one (thus hides
its internal structure) or splits one node into many (thus discloses its internal structure). In this
sense, we distinguish two types of OLAP on information networks: (i) informational OLAP
(i.e., I-OLAP), that drills along informational dimensions; and (ii) topological OLAP (i.e., T-
OLAP) that drills along topological dimensions. In the first year, we will develop a multi-
dimensional OLAP framework. It will be accomplished by finding suitable techniques to
summarize the graphs by determining how to identify the salient nodes via ranking and how to
calculate localized graph properties in an accurate manner.
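A topological roll-up of the kind described, merging a set of nodes into one and hiding internal structure, can be sketched as follows. The data and names are hypothetical; a real T-OLAP operator would also aggregate node attributes and support the inverse drill-down.

```python
# Sketch of a topological roll-up (T-OLAP): nodes sharing a grouping
# attribute are merged into one aggregate node, hiding intra-group
# structure; edge weights between groups are summed.

from collections import defaultdict

def roll_up(edges, group):
    """edges: {(u, v): weight}; group: {node: group id}.
    Returns aggregated edges between distinct groups."""
    agg = defaultdict(float)
    for (u, v), w in edges.items():
        gu, gv = group[u], group[v]
        if gu != gv:                     # hide intra-group structure
            agg[(gu, gv)] += w
    return dict(agg)

e = {("a1", "b1"): 1.0, ("a2", "b1"): 2.0, ("a1", "a2"): 5.0}
g = {"a1": "battalionA", "a2": "battalionA", "b1": "battalionB"}
print(roll_up(e, g))   # {('battalionA', 'battalionB'): 3.0}
```

In the battlefield example above, `group` would map soldiers and vehicles to their battalions, so a commander's roll-up view keeps only inter-battalion relationships.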
First, heterogeneous information networks involving different types of objects are becoming
ubiquitous in military operations. How to define similarity among objects using different
relations in heterogeneous information networks and how to efficiently return top-k most similar
objects given a query accordingly are challenging research problems. In the first year, we will
examine link-based similarity definition in homogeneous networks such as Personalized
PageRank and SimRank, and then develop entity similarity operators in heterogeneous networks
where objects belong to different types, with different semantic meanings and different sizes.
We will formalize an intuitive similarity definition based on path schemas given by users, i.e., a
user can specify the relation and their orders to decide the similarity among objects in a
network. Multiple path schemas can also be selected to calculate the combined similarity
results. Since path schemas can be arbitrarily given, on the one hand, we cannot fully
materialize all the possible similarity results given different path schemas and their
combinations; on the other hand, online calculation for queries involves matrix multiplications,
Second, to gain a deep understanding of the structures and functions of a complex information
network, it is fundamental to investigate various properties of the network and its constituent
components, i.e., nodes, edges, sub-networks, and their associated features and attributes.
Different kinds of network analysis have been proposed and conducted over the past decades,
offering useful insights into a great number of network data, e.g., small world phenomena
[Watts1998, Barabasi2003] and power-law degree distribution [Faloutsos1999]. While these
global properties provide important observations of the real-world networks as a whole, there is
a growing need of introducing new graph operators to search local structures that characterize
objects and their neighbors in an information network. For example, "evaluate the reliability of
an information point via its related information sources", "find alert associations in intrusion
networks, which could correspond to multi-step intrusions", and "discover semantic relationships
between nodes and generate hierarchies for OLAP." The discovered knowledge for hierarchies
or semantic relationships can be used for OLAP. In this project, we will first study the
emerging needs for graph aggregation and association in information networks and then develop
scalable implementation for these two new graph OLAP operators. We are going to formulate
the problem of association mining as a probabilistic ranking problem, and propose a time-
constrained probabilistic factor graph (TPFG) model to model the dynamic information
network. Furthermore, we will design an efficient algorithm to optimize the joint probability
via a process of message propagation [Frey1998] on the network.
The proposed information network OLAP is also important for QoI. Node ranking and graph
information aggregation provide mechanisms to validate information and detect anomalies
in information networks.
Validation
The data sets used in Task I2.1 will also be utilized in this task. We plan to include an
information network crawled from Last.fm for information network OLAP and graph
association study. We will demonstrate that Network OLAP will provide mechanisms for
searching knowledge and patterns hidden in large complex information networks. We will
study the scalability of our approach by analyzing the space-time complexity of various new
graph OLAP operators. In the subsequent years, we will integrate Information Network OLAP
with visualization techniques developed in I2.3 and conduct case studies.
Products
Task Motivation
Different users of the information network need different views of the available information. An
urban war fighter tracking down a sniper may need fast access to a floor plan representation of a
particular building as well as answers to several simple yes/no questions, whereas a commander
planning a new mission may need a comprehensive overview of intelligence regarding a tactical
situation, including an analysis of sources, reliability, time stamps, alternative resources, etc.
For many tasks, we anticipate the necessity of visualizing (parts of) the information network as
graphs consisting of nodes and edges. Hence, we will be researching flexible comprehensible
representations to depict large graphs interactively in different user contexts. We also anticipate
users being equipped with a wide variety of interaction platforms, ranging from ultra-mobile
devices to surround-view immersive situation rooms. We will work towards a framework for
automatically tailoring information to different user contexts. Our goal is superior situational
awareness for information network users.
Prior Work
There has been extensive research on Situational Awareness [Endsley2000, Endsley2003,
Gawron2008] and how to achieve it on different presentation and interaction platforms
[Rosenblum97, McCarley2002, Bell2002]. We will work towards achieving Situational
Awareness for heterogeneous information networks, considering scalability in data size, platform
variability, and user context. Interactive graph visualization and mining [Heer2005, Wong2006,
Cui2008, Gretarsson2009] can play an important role in the representation of large
heterogeneous data sets, and we will use this representation scheme for a central component of
our scalable visualization and interaction agenda. Visual analytics approaches to sensemaking in
large-scale data repositories [Viegas2007, Stasko2007] have occasionally explored the role of
different types of provenance [Gotz2008] with good results, and we feel that our agenda will
result in considerable new contributions to the state of the art in this area as well.
Automatic generation of visualizations and multimedia briefings has been explored in the
intelligent user interface research community [Maybury1998, Dalal1996, Green2004,
Rousseau2006]. We focus our work on the specific tailoring of information network data to
different users with different presentation and interaction platforms in different user contexts.
Proposed Approaches
Our first-year efforts on optimally visualizing general information network content will focus on
a crucial component of network information visualization, interactive graph visualization, as well
as the design of an integrated framework to tailor information content to user context and
availability of presentation/interaction platforms. In particular, we will work on the following
two tasks:
1. Design scalable graph representations of information networks;
2. Develop and evaluate a framework for adapting information network content to different
user contexts and presentation and interaction platforms (ultra-mobile to immersive
surround-view);
1. Scalable Graph Representations of Information Networks
In the first year, we will develop novel graph-based representations for information networks.
The interactive visualizations and interfaces that we plan to develop and evaluate will be based
on large heterogeneous data collections as well as a careful analysis of meaningful situational
awareness tasks in the INARC domain. We will perform a detailed task analysis and collect
Extending our previous work on scalable visualization and constrained interaction for large
graphs [Gretarsson2009], we will focus on real-time interactive visualization of graphs
representing entire information networks and selective clusters of interest. Our motivation
comes from the insight that real-time interaction and dynamic probing of large data sets can lead
to a much clearer mental representation and better understanding of the available data than
partial views and text-based analysis and queries alone. We will be working towards powerful
graphical tools enabling our users to form their own comprehensive views of the information
universe and allowing them to navigate and explore it as comfortably as possible. We strongly
believe that interactive exploration of very large networks is not just feasible but also essential
for forming an increased understanding of the available resources. The first step is to display the
data universe as a whole and make it feasible to interact with it in real time. The second step is to
leverage interaction to let each individual user predictably downscale the network to form
various level-of-detail representation of the space. This structure will become their mental model
of the data universe. While the representation will be dynamic and continuously adapt to newly
arriving data, it will stay comprehensible and predictable because it is formed under the user's
direct control.
2. Adapting Information Networks to Different User Contexts and Presentation Platforms
While the previous subproject deals with the specific case of graph representations, the
following two subprojects are concerned with general multimedia information presentation.
Apart from the scalability to large amounts of data, we want our information network
visualizations to be flexible in terms of the type and cognitive state of the recipient (scalability
to user context) and the presentation and interaction platforms they use (scalability to available
infrastructure).
To this end we will design a representation and presentation framework that takes into account
user models (from our domain and user analysis) and cognitive models (using extensions to
PARC's Information Foraging Theory), as well as detailed information about the capabilities and
constraints of various candidate presentation and interaction platforms, ranging from ultra-
mobile handheld/wearable devices to the Allosphere, our three-story immersive situation room
at UCSB. Our framework will allow us to tailor task-specific information from the
representations inside the information network to the interaction platforms and general context
of the recipient. We will start our agenda with simple ultra-mobile interfaces, integrating
iPhones with the Allosphere infrastructure. In conjunction with Project I1, we will develop
interactive situation room interfaces to visualize an information network originating from a
large indoor/outdoor camera network deployed at UCSB.
To study the effectiveness and efficiency of the graph visualizations, UCSB and PARC will
collaborate on an empirical study of trade-offs among task conditions, user interface constraints
Validation
We will validate the usefulness of our novel scalable visualizations and interfaces through the
usability evaluation agenda proposed above. Specifically, in the first year, we will design
controlled formative user studies for scalable graph visualizations and provenance data. UCSB
and PARC will collaborate to determine meaningful variables and trade-offs to study for the
identified tasks, as well as to set up a systematic infrastructure and procedure for replicable
controlled collaborative user evaluation.
We will make use of the data sets used in tasks I2.1 and I2.2, and work with Project I1 to obtain
meaningful graph representations of heterogeneous information networks. Furthermore, UCSB
and PARC will draw upon their capabilities for mining publicly available media systems such as
tagging systems, microblogging, RSS feeds, etc. These systems are frequently used in everyday
life to support social, communication, and information network functionality. All these datasets
together will provide representative samples of real network structures and dynamics. If
necessary, these data could be enhanced with synthetic data (e.g., geolocation) to have high
similarity to anticipated Army scenarios.
A flexible representation and interactive visualization framework that can automatically adapt to
different interaction platforms will be critical for military operations. For example, an urban
warfighter clearing houses will need a different information view than a battalion commander,
who assesses and dispatches incoming intelligence. Global command centers, on the other hand,
need to maintain a holistic view of the situation, including interactive views of all available
assets, mission-critical assets, mission status, threats, and what-if scenarios. The proposed graph
visualization framework is designed to serve all of these contexts.
Products
(i) Novel scalable graph visualizations and interfaces to information provenance. (ii) User study
design on information network visualization. (iii) Research reports and published scientific
papers on effective situation-aware visualization of heterogeneous information networks.
Related tasks: Task I2.1 connects to EDIN Tasks E3.1, E3.2, and E3.3, which study the
time-varying aspects of networks; the research there will provide the kinds of queries that
will be useful for our studies.
Budget By Organization
Task I3.1: Methods for scalable mining of dynamic, heterogeneous information networks
(Lead: Jiawei Han). This project answers the key research question: “What are the new
principles and methods for mining distributed, incomplete, dynamic, and heterogeneous
information networks to satisfy end-user needs?”
Task I3.3: Text and Unstructured Data Mining for Information Network Analysis (Lead:
Dan Roth). This project answers the key research question: “How can we develop effective
mechanisms for mining knowledge from text and unstructured data in information networks in
noisy and volatile environments?”
We will systematically investigate these research issues, develop robust and effective methods
and algorithms for solving these problems, and test our solutions in military and/or similar
applications. Moreover, since information networks are closely linked with communication
networks and with social and cognitive networks in many aspects, we will pay close attention to
collaborating with the other research centers in this project. We will also work with the IRC on
potential technology transfer of the algorithms and methods developed here.
Task Motivation
It is critically important in military and other applications to construct models (e.g., classifiers)
from limited training data and to discover evolutionary regularities and anomalies in massive,
inter-related datasets. Such massive, inter-related data form heterogeneous information networks,
in which objects may mutually reinforce one another; information network analysis can therefore
enhance the overall quality of data, information, and knowledge, and help us make intelligent,
well-informed decisions.
Initial Hypothesis
We assume that networks in the real world are heterogeneous, interacting, and evolving, with
new data streaming into the system dynamically. Many military applications require effective,
scalable, comprehensive, real-time analysis of such information networks. We further assume
that such networks consist of multi-typed, interconnected objects, such as soldiers, commanders,
armed vehicles, geospatial objects (such as bridges, rivers, highways, and villages), text messages
and documents, and other artifacts, each associated with multiple properties (called attributes).
Such networks pose many new challenges to analysis systems that handle only homogeneous
objects, such as people-to-people networks. We hypothesize that our proposed approaches will
be able to discover mission-critical knowledge systematically from such dynamic, heterogeneous
information networks.
Prior work
Most existing network modeling and analysis methods consider homogeneous, static networks
(Girvan02). However, networks in the real world are heterogeneous, interacting, distributed, and
dynamically evolving, which poses great challenges in terms of effectiveness and scalability.
Technical approaches
Mining distributed, incomplete, dynamic, and heterogeneous information networks is a multi-
year task. The first-year work contains three subtasks: (i) developing methods for effective
classification of heterogeneous information networks, (ii) developing methods for pattern
discovery in evolving heterogeneous information networks, and (iii) developing methods
for detecting outliers and exceptions in dynamic, heterogeneous information networks. The first
subtask lays the foundation and works out effective algorithms for model construction in
heterogeneous information networks. The second investigates pattern discovery in dynamic,
heterogeneous information networks; we expect the results of this study to also contribute to
the study of the evolution and dynamics of general networks, i.e., the EDIN project. The third
detects outliers in dynamic information networks where data can stream into the network
in real time. The following are the details of these subtasks.
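As a toy illustration of subtask (iii), the sketch below flags anomalous node activity in a stream using a running z-score computed with Welford's online mean/variance update. The detector class, threshold, and node names are invented for illustration; this is not the project's algorithm.

```python
# Streaming outlier sketch for subtask (iii) (illustrative, not the
# project's algorithm): flag a node whose current activity deviates
# sharply from its running history, via Welford's online mean/variance.
from math import sqrt

class StreamingOutlierDetector:
    def __init__(self, threshold=3.0):
        self.threshold = threshold          # z-score cutoff
        self.stats = {}                     # node -> (count, mean, M2)

    def update(self, node, value):
        """Feed one observation; return True if it looks like an outlier."""
        n, mean, m2 = self.stats.get(node, (0, 0.0, 0.0))
        outlier = False
        if n >= 2:
            std = sqrt(m2 / (n - 1))
            if std > 0 and abs(value - mean) / std > self.threshold:
                outlier = True
        n += 1                              # Welford's update
        delta = value - mean
        mean += delta / n
        m2 += delta * (value - mean)
        self.stats[node] = (n, mean, m2)
        return outlier

det = StreamingOutlierDetector()
flags = [det.update("relay-7", v) for v in (10, 11, 9, 10, 12, 10, 11, 9, 10, 11)]
spike = det.update("relay-7", 60)           # sudden burst of interactions
```

Because the model is maintained incrementally per node, the same sketch applies to data streaming into the network in real time.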
Validation Approach
We will use the following data sets to design, develop, and validate our proposed methods, not
only for this task but also for the remaining tasks in this project.
DBLP bibliographic networks: This is a typical heterogeneous information network in which
authors, titles (bags of keywords), conferences, and research papers are linked together. With
year information, one can study its evolution and mine interesting patterns. Such a multi-typed
network is also typical of military networks, whose interconnected, multi-typed entities may
represent soldiers, commanders, equipment, times, locations, documents, etc.
NASA aviation safety databases: We have been using the NASA aviation incident report
database (ASRS) for studying text mining and information network analysis, and will
continue using it for this study. Note that incident reports of similar nature could be common
in military applications.
News data sets: We plan to use news datasets, including Google News and other typical news
datasets, to construct information networks, observe the evolution of popular events, and study
how to perform mining on such datasets. Note that news, radio broadcasts, blogs, and similar
media are common sources in military settings.
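To make the multi-typed structure of these datasets concrete, here is a minimal Python sketch of a DBLP-style heterogeneous network. The `HeteroNet` class and the tiny author/paper/venue sample are invented for illustration; real systems would use a graph database or a library-backed graph.

```python
# Toy DBLP-style heterogeneous network: multi-typed nodes (author,
# paper, venue, term) with typed-neighbour queries. The class name and
# sample data are invented for illustration.
from collections import defaultdict

class HeteroNet:
    def __init__(self):
        self.node_type = {}                 # node id -> type
        self.adj = defaultdict(set)         # node id -> neighbours

    def add_node(self, node, ntype):
        self.node_type[node] = ntype

    def add_edge(self, u, v):
        self.adj[u].add(v)
        self.adj[v].add(u)

    def neighbors_of_type(self, node, ntype):
        return {v for v in self.adj[node] if self.node_type[v] == ntype}

net = HeteroNet()
net.add_node("author:A", "author")
net.add_node("author:B", "author")
net.add_node("paper:P1", "paper")
net.add_node("venue:KDD", "venue")
net.add_node("term:mining", "term")
for v in ("author:A", "author:B", "venue:KDD", "term:mining"):
    net.add_edge("paper:P1", v)

# The meta-path author -> paper -> author yields co-authors of A.
coauthors = set()
for p in net.neighbors_of_type("author:A", "paper"):
    coauthors |= net.neighbors_of_type(p, "author") - {"author:A"}
```

The same typed-neighbour query pattern carries over to military entities (soldier, unit, location, document) linked in one network.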
Research Products
1. A set of algorithms and methods generated from this study, and a series of reports on their
effectiveness and efficiency as tested on the datasets described above, and
2. A set of research papers to be published in international conferences and journals.
Initial hypotheses
We hypothesize that our proposed approach as outlined below will be able to discover patterns
and knowledge in cyber-physical networks effectively in distributed, mobile environments.
Prior work
Sensor networks and the related data analysis have been studied extensively in previous research,
with many important issues addressed, including formal modeling, debugging, data-link
streaming, privacy-preserving data aggregation, etc. Our aim here is to integrate sensor network
analysis with information network analysis to create a new generation of cyber-physical network
systems. Social network analysis, including Web community mining, has attracted much
attention in recent years (Chakrabarti03). Abundant literature has been dedicated to the area of
social network analysis, ranging from the network property, such as power law distribution
(Newman03) and small world phenomenon (Watt03), to the more complex network structural
analysis (Flake02; Girvan02), evolution analysis (Leskovec05; Backstrom06), and statistical
learning and prediction (Aleman-Meza06). The static behavior of large graphs and networks has
been studied extensively with the derivation of power laws of in- and out- degrees, communities,
and small world phenomena (Faloutsos99; Chakrabarti03). Our proposed work is to establish a
general analytical framework, with which users can easily manipulate and explore massive
cyber-physical networks to uncover interesting patterns, measures, and sub-networks.
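One property mentioned above, the power-law degree distribution, can be checked concretely. The sketch below applies the standard continuous maximum-likelihood estimator, alpha = 1 + n / sum(ln(k / k_min)), to a synthetic degree sequence; the sequence is invented and the estimator is an illustration, not part of our proposed framework.

```python
# Power-law exponent check (sketch): the standard continuous MLE
# alpha = 1 + n / sum(ln(k / k_min)) applied to a synthetic heavy-tailed
# degree sequence.
from math import log

def powerlaw_alpha(degrees, kmin=1):
    tail = [k for k in degrees if k >= kmin]
    return 1.0 + len(tail) / sum(log(k / kmin) for k in tail)

# Synthetic degree sequence with a heavy tail
degrees = [1] * 50 + [2] * 20 + [4] * 10 + [8] * 5 + [16] * 2 + [64]
alpha = powerlaw_alpha(degrees)             # roughly 2.8 for this sequence
```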
Technical approaches
Validation Approach
We will use battlefield-scenario data that simulate military applications in our study. In the
meantime, the other datasets outlined in Task I3.1, as well as the MoveBank datasets (at
movebank.org) that contain spatiotemporal information related to object movements, will also be
used for testing in this study.
Products
7.3.6 Task I3.3 Text and Unstructured Data Mining for Information Network
Analysis (D. Roth, UIUC (INARC); J. Han, UIUC (INARC); H. Ji, CUNY
(INARC); X. Yan, UCSB (INARC); M. Faloutsos, UCR (IRC); B.Szymanski,
RPI (SCNARC))
Task Overview
This task is to investigate the integration of text mining and information network
analysis to enhance knowledge discovery in text-rich information networks.
Task Motivation
Information networks often contain abundant text and unstructured data, in the form
of electronic documents, reports, e-mails, blogs, conversations, news, web-pages,
and other narratives. We envision that such text-based datasets also play an essential
role in military applications. Besides integrating traditional text information retrieval
methods with information network analysis, it is essential to study multidimensional
text analysis in information networks.
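As a concrete sketch of multidimensional text analysis, the toy text cube below aggregates term counts over (time, location, topic) dimensions and supports roll-up over any subset of them. The documents, dimension names, and values are invented for illustration; a real text cube would add measures beyond raw counts.

```python
# Toy text cube: aggregate term counts over (time, loc, topic) cells so
# any subset of dimensions can be rolled up. Documents are invented.
from collections import Counter
from itertools import combinations

docs = [
    {"time": "2009-Q1", "loc": "east", "topic": "logistics",
     "text": "convoy route supply convoy"},
    {"time": "2009-Q1", "loc": "west", "topic": "logistics",
     "text": "supply depot route"},
    {"time": "2009-Q2", "loc": "east", "topic": "security",
     "text": "patrol checkpoint patrol"},
]

def build_cube(docs, dims=("time", "loc", "topic")):
    """Map each dimension-value combination (a cuboid cell) to term counts."""
    cube = {}
    for d in docs:
        terms = Counter(d["text"].split())
        for r in range(len(dims) + 1):      # every subset of dims is a cuboid
            for subset in combinations(dims, r):
                cell = tuple((k, d[k]) for k in subset)
                cube.setdefault(cell, Counter()).update(terms)
    return cube

cube = build_cube(docs)
q1_terms = cube[(("time", "2009-Q1"),)]     # roll-up over loc and topic
```

The empty cell `()` is the fully aggregated cuboid, and fixing more dimensions drills down toward individual document groups.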
Initial hypothesis
We hypothesize that the proposed methods outlined below that integrate text mining
with information network analysis are effective in knowledge discovery in
information networks containing valuable text data.
Prior work
There have been extensive studies on information retrieval with text data and text mining
(Chakrabarti03; Manning08). However, the multidimensional analysis of text data by
construction of text cube and topic cubes has just been recently proposed in our studies
(Lin08; Zhang09). There have been even fewer studies on the integration of information
network analysis with text data. Recently, our group has proposed a topic modeling
approach that integrates information network analysis with text data analysis
(Sun09c).
Proposed approaches
Three research subtasks are proposed as follows.
2. Dynamic language modeling: For queries and mining in natural language speech or
text, we plan to analyze, parse and expand these queries in order to retrieve more
accurate and reliable information. We will consider query-time dynamic language model
adaptation for speech recognition and segmentation. We will leverage the information in
user queries for topic analysis and Language Model (LM) adaptation. We intend to
investigate extensively the underlying characteristics and different kinds of topic
modeling approaches, including the conventional MAP-based adaptation, latent Dirichlet
allocation (LDA) and clustering. Their performance will be analyzed and verified in the
information network setting. In this way we will achieve the fusion of global topical,
local contextual information and the feedback from natural language queries. In this first
year, we will implement a dynamic query expansion method based on entity and event
co-reference and finish experiments on using the query expansion methods in template
based data fusion.
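A minimal sketch of entity-based query expansion is shown below. The hand-built co-reference table stands in for the output of a real co-reference system, and all mentions, entity ids, and documents are invented; the point is only the expand-then-retrieve pattern.

```python
# Entity-based query expansion sketch: a hand-built co-reference table
# stands in for a real coref system; all entries are invented.
COREF = {                                   # mention -> canonical entity id
    "uav": "E1", "drone": "E1", "predator": "E1",
}
ALIASES = {}
for mention, ent in COREF.items():
    ALIASES.setdefault(ent, set()).add(mention)

def expand_query(query):
    """Add every known alias of each recognised entity mention."""
    terms = set(query.lower().split())
    for t in list(terms):
        if t in COREF:
            terms |= ALIASES[COREF[t]]
    return terms

def retrieve(query, documents):
    terms = expand_query(query)
    return [d for d in documents if terms & set(d.lower().split())]

docs = ["drone spotted near river", "convoy crossed the bridge"]
hits = retrieve("uav sighting", docs)       # matches via the alias "drone"
```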
Validation Approach
The data sets used in Task I3.1 will be used in this task. In particular, news datasets, such as
Google News, could be the most useful example database for studying topic modeling, topic
cubes, and text mining. Moreover, we plan to use the following additional datasets:
NIST Automatic Content Extraction Program's 2002-2009 Training Corpora: These include
thousands of documents with facts annotated (entities, relations and events) in different
genres (broadcast conversation, broadcast news, newswire, news groups and weblogs).
DARPA Global Autonomous Language Exploitation Program's question answering Year 1
and Year 2 training corpora: These corpora include a set of 17 different question templates
and their gold-standard answer snippets.
We plan to use these data sets to construct multi-dimensional text cubes and develop information
network analysis and text mining methods for such datasets.
Products
1. A set of methods generated, and reports on the effectiveness and efficiency of mining text
data in information networks, and
2. A set of research papers to be published in international conferences and journals.
References
C. C. Aggarwal, J. Han, J. Wang and P. S. Yu (2003), “A framework for clustering evolving data
streams”, in Proc. 2003 Int. Conf. Very Large Data Bases (VLDB'03).
C. C. Aggarwal, J. Han, J. Wang and P. S. Yu (2004), “On demand classification of data
streams”, in Proc. 2004 ACM SIGKDD Int. Conf. Knowledge Discovery in Databases
(KDD'04).
B. Aleman-Meza, M. Nagarajan, C. Ramakrishnan, L. Ding, P. Kolari, A. P. Sheth, I. B. Arpinar,
A. Joshi, and T. Finin (2006), “Semantic analytics on social networks: experiences in addressing
the problem of conflict of interest detection”, in Proc. 15th Int. Conf. World Wide Web
(WWW'06).
L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan (2006), “Group formation in large
social networks: Membership, growth, and evolution”, in Proc. 2006 ACM SIGKDD Int. Conf.
Knowledge Discovery in Databases (KDD'06).
D. Bortner and J. Han (2010), “Progressive Clustering of Networks Using Structure-Connected
Order of Traversal”, in Proc. 2010 Int. Conf. on Data Engineering (ICDE'10).
S. Chakrabarti (2003), Mining the Web: Discovering Knowledge from Hypertext Data. Morgan
Kaufmann, 2003.
H. Cheng, X. Yan, J. Han, and P. S. Yu (2008), “Direct discriminative pattern mining for
effective classification”, in Proc. 2008 Int. Conf. Data Engineering (ICDE'08).
M. Faloutsos, P. Faloutsos, and C. Faloutsos (1999), “On power-law relationships of the internet
topology”, in Proc. ACM SIGCOMM'99 Conf. Applications, Technologies, Architectures, and
Protocols for Computer Communication, pp. 251-262.
G. Flake, S. Lawrence, C. L. Giles, and F. Coetzee (2002), “Self-organization and identification
of web communities”, IEEE Computer, 35:66-71, 2002.
M. Girvan and M. E. J. Newman (2002), “Community structure in social and biological
networks”, in Proc. Nat. Acad. Sci. USA, pp. 7821-7826, 2002.
J. Han, Y. Chen, G. Dong, J. Pei, B. W. Wah, J. Wang, and Y. D. Cai (2005), “Stream Cube: An
architecture for multi-dimensional analysis of data streams”, Distributed and Parallel Databases,
18:173-197, 2005.
M.-S. Kim and J. Han (2009a), “A Particle-and-Density Based Evolutionary Clustering Method
for Dynamic Networks”, in Proc. 2009 Int. Conf. on Very Large Data Bases (VLDB'09).
M.-S. Kim and J. Han (2009b), “CHRONICLE: A Two-Stage Density-based Clustering Algorithm
for Dynamic Networks”, in Proc. 12th Int. Conf. on Discovery Science (DS'09).
Knowledge discovery in information networks has close ties to many projects in INARC and the
three other centers, especially SCNARC, EDIN, and Trust-CCRI. The mining work will need
guidance from, and receive requests from, other projects, and will in turn provide patterns,
hierarchies, and new knowledge to other projects and centers. Therefore, we provide the
following table and propose to work closely with other projects and centers.
Budget By Organization
Table of Contents
8 Non-CCRI Research: Social/Cognitive Networks Academic Research Center (SCNARC) . 8-1
8.1 Overview ......................................................................................................................... 8-3
8.2 Motivation ....................................................................................................................... 8-3
8.2.1 Challenges of Network-Centric Operations ............................................................. 8-4
8.2.2 Example Military Scenarios ..................................................................................... 8-4
8.2.3 Impact on Network Science ..................................................................................... 8-5
8.3 Key Research Questions ................................................................................................. 8-5
8.4 Technical Approach ........................................................................................................ 8-6
8.5 Project S1: Networks in Organization ........................................................................... 8-9
8.5.1 Project Overview ..................................................................................................... 8-9
8.5.2 Project Motivation ................................................................................................. 8-10
8.5.3 Key Project Research Questions ............................................................................ 8-13
8.5.4 Initial Hypotheses .................................................................................................. 8-14
8.5.5 Technical Approach ............................................................................................... 8-15
8.5.6 Task S1.1: Infrastructure Challenges of Large-Scale Network Discovery and
Processing (R. Konuru, IBM (SCNARC); S. Papadimitriou, IBM (SCNARC); Z. Wen, IBM
(SCNARC); C.-Y. Lin, IBM, (SCNARC); T. Brown, CUNY (SCNARC); A. Pentland, MIT
(SCNARC); A.-L. Barabasi, NEU (SCNARC); D. Lazer, NEU, (SCNARC); J. Hendler,
RPI (SCNARC); W. Wallace, RPI (SCNARC); N. Chawla, ND (SCNARC); A. Vespignani,
Indiana (SCNARC))........................................................................................................... 8-16
8.5.7 Task S1.2: Impact Analysis of Informal Organizational Networks (C.-Y. Lin, NYU
(SCNARC); S. Aral, NYU (SCNARC); E. Brynjolfsson, MIT (SCNARC)).................... 8-19
The long-term objective of the Center is to advance the scientific understanding of how social
networks form, operate, and evolve, and how they affect the functioning of large, complex
organizations such as the Army. We will also address the issue of adversary networks hidden in
large social networks. For successful operations within a foreign society, such adversary
networks must be detected, monitored, and when necessary, dissolved. Finally, it is clear that
human cognition on one hand directs and on the other is impacted by the network-centric
interactions. Furthering understanding of this complex and important feedback loop is of primary
importance for social network science. The Center will undertake research to gain a fundamental
understanding of the underlying theory, as well as create scientific foundations for modeling,
simulation, measurements, analysis, prediction, and control of social/cognitive networks and
humans interacting with such networks as well as understanding, modeling, and engineering the
impact that these networks have on the U.S. Army.
8.2 Motivation
In large organizations such as the U.S. Army, social networks involve interplay between a formal
network imposed by the hierarchy of the organization and informal ones based on operational
requirements, historical interactions, and personal ties of the personnel. Traditionally it has been
difficult to extract data that can document the interplay between these informal and formal
networks. We address this challenge by planning to utilize the data collection capabilities
developed by the IBM team of the Center. The data cover employee interactions,
communications, activity, and performance across the IBM Corp.
understanding of the fundamental aspects of human communication within an organization and
the impact of social and cognitive networks on issues ranging from team performance to the
emergence of groups and communities.
Currently, and in the foreseeable future, the U.S. Army is or will likely be operating in coalition with
allied armies and deeply entangled within the foreign societies in which those missions are
conducted. Social networks of the allies and the involved societies invariably include groups that
are hostile or directly opposed to U.S. Army goals and missions. Such groups are embedded in
larger social networks, often attempting to remain hidden to conduct covert, subversive
operations. The challenging research issues of studying such adversary social networks include
their discovery, the construction of tools for monitoring their activities, composition, and
hierarchy, as well as understanding their dynamics and evolution. We will address these
challenges using statistical methods for analyzing large social networks.
Ultimately, network benefits and impact are limited by humans' capability to understand and act
upon the information that networks are capable of providing. Hence, human cognition is an
important component of understanding how networks impact people. Important challenges in
studying human cognition in relation to network-centric interactions include finding how limits
on cognitive resources or cognitive processing influence social (or asocial) behavior, and what
demands social behaviors make on cognitive resources and processing that may limit basic
information processing mechanisms such as encoding, memory, and attention.
Finally, social networks are an important conduit and reservoir of ideologies and opinions.
Hence, understanding the dynamics of community creation, existence, and dissolution around
ideas or opinions is an important aspect of social network science. We plan to investigate
different models of representing idea diversity and its impact on community formation and
disintegration. Engineering such processes on real social networks may be important for
U.S. Army missions in foreign societies.
Those are the issues that we will address in our research on cognitive aspects of social networks.
Fundamental research challenges of dynamic and evolving integrated networks are the subject of
the CCRI EDIN to which researchers of SCNARC bring an important view of social, cognitive,
and network science perspective. Since these challenges and associated research are described
elsewhere, no further discussion of these issues is given here.
Central to the operation and efficiency of social networks is the level of trust among interacting
agents. The challenges of studying trust in networked systems are the focus of the CCRI research
in this area to which the SCNARC researchers will fundamentally contribute, as ultimately trust
is a social issue with technology and information playing an important but supportive role. The
corresponding challenges are discussed in a separate Trust section of the IPP, so they are not
discussed here.
The modern military increasingly needs to rely on bottom up network processes, as compared to
top down hierarchic processes. This paradigm shift creates unique challenges and invokes
several important questions. How does the pattern of interactions within a military unit affect
performance of tasks? How do formal and informal social networks interact with each other?
How does technology (communication networks, information networks) foster or weaken
connections between people? What kinds of ties external to the Army are necessary for success?
How can we use the massive streams of data to detect adversarial networks? How can a social
and cognitive network quickly extract the most meaningful information for the soldier and
decision maker, useful in all aspects of their operations, from supporting humanitarian
operations to force protection and full combat operations? How can informal social networks be
influenced or engineered, or even dissolved? How do human cognitive processes impact the
performance of the soldier and, more generally, of humans in a network-centric environment?
The research within Social Cognitive Network Academic Research Center is organized in the
following four projects.
This project focuses on analyzing and understanding fundamental principles guiding the
formation and evolution of large organizational networks and their impact on performance of the
teams embedded in them. This project consists of three tasks, each targeting a major aspect of
social network research: the capture of networks, the impact of networks, and the understanding
of networks.
Task S1.1, Infrastructure Challenges for Large-Scale Network Discovery and Processing,
will study the infrastructure challenges in gathering and handling large-scale heterogeneous
streams for social network research, in the context of information networks and communication
networks. We will conduct system-level research on how to incorporate real-time network
processing requirements into the existing SmallBlue socio-info network infrastructure.
Given that informal social network data reside intrinsically in different data sources, informal
networks can usually only be inferred by sampling larger networks. Since partially observed
data are the norm, we will derive mathematical theories to investigate the robustness of graph
sampling and its implications under various conditions. We will investigate what types of
sampling strategies are required to obtain a good estimate of the entire network. We will also
investigate analytic methods for conducting network analysis on only partially observed data.
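The sampling question above can be made concrete with a toy experiment: estimate a global property (here, mean degree) of a synthetic random graph from a random-node sample and compare it with the true value. The graph, sample size, and seed are invented; this illustrates the question, not the estimators we will derive.

```python
# Toy sampling experiment (illustrative only): estimate mean degree of a
# synthetic sparse random graph from a 10% random-node sample.
import random

random.seed(7)                             # reproducible synthetic graph
n = 2000
adj = {i: set() for i in range(n)}
for _ in range(6000):                      # ~6000 random edges
    u, v = random.randrange(n), random.randrange(n)
    if u != v:
        adj[u].add(v)
        adj[v].add(u)

def mean_degree(nodes):
    return sum(len(adj[i]) for i in nodes) / len(nodes)

true_mean = mean_degree(range(n))          # property of the full network
sample = random.sample(range(n), 200)      # partially observed network
estimate = mean_degree(sample)             # random-node sampling is unbiased here
```

Random-node sampling is unbiased for mean degree, but other strategies (e.g., following links from sampled nodes) over-represent high-degree nodes, which is exactly the kind of bias our theoretical work must characterize.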
Task S1.2, Impact Analysis of Informal Organizational Networks, will analyze the impacts of
informal social networks in an organization. We want to learn which informal networks affect
people's performance, and how. We shall model, measure, and quantify the impact of dynamic
informal organizational networks on the productivity and performance of individuals and teams;
develop and apply methods to identify causal relationships between dynamic networks and
productivity, performance, and value; model and measure peer influence in social networks by
statistically distinguishing influence effects from homophily and confounding factors; and
examine how individuals change their use of social networks under duress.
Task S1.3, Multi-Channels of People, will investigate the multi-channel networks between
people. With the unique large-scale info-social network system of IBM's SmallBlue, we are able
to capture multiple facets of people's relationships via such means as email, instant messaging,
teleconference, etc. Based on this data, we will explore whether channel capacity and channel
choice shape people's relationships.
The overall goal of this project is to study adversary networks through the communication and
information signature that such networks create during internal interactions. The broad research
questions which we address in this project include (i) identification of communities in a dynamic
social network, especially hidden and informal groups, based on measurable interactions
between the members, but without looking at details of the interactions, (ii) uncovering relations
between communities in terms of membership, trust and opposition or support, (iii) observing
evolution and the stable cores of communities, especially anomalous and adversary communities
or groups, and their relationships, (iv) understanding how information flows within such
communities and between communities, (v) identifying communities in social networks, which
manifestly emerge as a result of communication and information flowing across the links, (vi)
developing efficient strategies to dissolve communities in social networks, corresponding to
adversarial communities with hostile, extremist and/or militant ideologies. To address the key
research questions of this project, we defined three tasks.
In the first task of this project, S2.1 Detection of Hidden Communities and their Structures, we
use interaction data over time to build a picture of community structure and community
evolution, including information pathways and inter-community relationships. This is an
appropriate first step in understanding the core of social networks.
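As a minimal stand-in for the community-detection methods of this task, the sketch below runs deterministic label propagation on a synthetic two-clique graph joined by a weak bridge: each node repeatedly adopts the most frequent label among its neighbours, breaking ties by the larger label. The graph and tie-breaking rule are invented for illustration.

```python
# Deterministic label-propagation sketch (a stand-in, not the task's
# method): each node adopts the most frequent label among its
# neighbours, breaking ties by the larger label.
from collections import Counter

adj = {v: set() for v in range(8)}
def connect(a, b):
    adj[a].add(b)
    adj[b].add(a)

for lo, hi in ((0, 4), (4, 8)):            # two 4-node cliques
    for a in range(lo, hi):
        for b in range(a + 1, hi):
            connect(a, b)
connect(3, 4)                              # weak bridge between the cliques

labels = {v: v for v in adj}               # start with unique labels
for _ in range(5):                         # a few sweeps reach a fixed point
    for v in sorted(adj):
        counts = Counter(labels[u] for u in adj[v])
        best = max(counts.values())
        labels[v] = max(l for l, c in counts.items() if c == best)

groups = {}
for v, l in labels.items():
    groups.setdefault(l, set()).add(v)
communities = {frozenset(s) for s in groups.values()}
```

The bridge edge is outvoted by the clique-internal edges, so the two dense groups keep distinct labels, which is the intuition behind detecting hidden communities from interaction structure alone.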
In the second task, S2.2 Information Flow via Trusted Links we build agent-based models to
study how information pathways are affected by the varying degrees of trust between individuals
and communities in heterogeneous networks which contain adversary (non-trusted) as well as
non-adversary (trusted) networks.
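A minimal version of such an agent-based model is sketched below: a message diffuses only over links whose trust weight clears a threshold, so it spreads through the trusted subnetwork and stalls at the low-trust boundary. The topology and trust weights are invented for illustration.

```python
# Trust-gated diffusion sketch: a message crosses only links whose trust
# weight meets a threshold. Topology and weights are invented.
trust = {                                  # (u, v) -> trust in [0, 1]
    ("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.85,
    ("c", "x"): 0.2, ("x", "y"): 0.9,      # x, y sit behind a low-trust link
}
edges = {}
for (u, v), w in trust.items():
    edges.setdefault(u, []).append((v, w))
    edges.setdefault(v, []).append((u, w))

def spread(source, threshold=0.5):
    """Breadth-first diffusion over links with trust >= threshold."""
    informed, frontier = {source}, [source]
    while frontier:
        nxt = []
        for u in frontier:
            for v, w in edges.get(u, []):
                if w >= threshold and v not in informed:
                    informed.add(v)
                    nxt.append(v)
        frontier = nxt
    return informed

reached = spread("a")                      # stops at the low-trust c-x link
```

Lowering the threshold lets the message leak across the adversary-facing boundary, so the threshold parameter gives a simple dial for studying how trust shapes information pathways.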
In the third task, S2.3, Community Formation and Dissolution in Social Networks, we will
develop individual-based models for opinion formation in order to detect communities in social
networks. Further, we will develop efficient strategies and trade-offs for attacking and
disintegrating adversarial communities with hostile, extremist, and/or militant ideologies. Our
long-term objective is to develop generically applicable frameworks and computational methods
for extracting individual- to community-level behavioral patterns from the underlying social
graphs.
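A toy individual-based opinion model illustrates the idea: under synchronous majority rule on a ring, a small dissenting cluster is absorbed by the surrounding majority, a minimal analogue of community dissolution. The setup is synthetic and is not the task's actual model.

```python
# Synchronous majority-rule opinion dynamics on a ring (a minimal
# sketch, not the task's model): each agent adopts the majority opinion
# of its k nearest neighbours on each side, keeping its own on a tie.
def step(opinions, k=2):
    n = len(opinions)
    out = []
    for i in range(n):
        s = sum(opinions[(i + d) % n] for d in range(-k, k + 1) if d != 0)
        out.append(1 if s > 0 else -1 if s < 0 else opinions[i])
    return out

opinions = [1, 1, 1, -1, -1, 1, 1, 1]      # small dissenting cluster
for _ in range(5):
    opinions = step(opinions)              # the minority opinion dissolves
```

Varying the neighbourhood size k and the initial cluster size shows when a minority opinion can persist as a stable community instead of dissolving, which is the trade-off the task will study at scale.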
This project will bring the computational modeling techniques of cognitive science together with
the tools and techniques of cognitive neuroscience to ask how the design of the technology
(human-technology interaction), the form and format of information (human-information
interaction), or features of communication (human-human interaction) shape the success of net-
centric interactions. It includes three tasks.
The first task, S3.1: The Cognitive Social Science of Human-Human Interactions, investigates
cognitive mechanisms that influence human interactions. Our initial work will temporarily
combine this effort with the CCRI on trust to investigate how cognitive mechanisms are affected
by trust and how human evaluations of trust influence our subsequent cognitive processing of
information. A year 1 focus of SCNARC 3 will be to examine the effect of trust on cognitive
processing and variations in human trust over human-human versus human-agent interactions.
Specifically, we hypothesize that differences in trust are signaled by differences in cognitive
brain mechanisms and that these differences can be detected by event-related brain potential
(ERP) measures and related to established cognitive science constructs, which in turn can be
incorporated as improvements in the ACT-R cognitive architecture. A key element in the study
will be the analysis of cognitive brain data collected from humans as they receive information
from the interactive cognitive agent or other humans.
The second task, S3.2: Human-Technology and Human-Information Interactions, will in the first
year develop a Net-Centric Simulated Task Environment to collect data on Cognitive Social
Science constructs of interest across multiple locations. The focus will be on technology-mediated
human-human interactions along with an emphasis on interaction design and information form,
format, and order. These data will be collected and analyzed across at least four sites: RPI,
CUNY, ARL APG, and PARC.
Together, these three projects consider the whole spectrum of human aspects of network science,
from cognition to social interactions, and the whole spectrum of social networks, from formal
and open, to informal, to highly organized but hidden. Together, they promise to create a
scientific foundation for our understanding of dynamics in network science for large-scale,
diverse social and cognitive networks.
Based on our existing expertise in building large-scale distributed socio-info network systems,
we shall also conduct network-centric systems research to understand the architecture needed
for real-time decision making.
In principle, social networks in the Army are informal organizational networks that relate to the formal hierarchical networks. How do social networks form and evolve in an organization? How do the formal organizational hierarchy and the informal social networks affect the performance of Army personnel? How do different divisions work together, and in what ways can social networks enhance interdivision collaboration? Can trust be propagated from one division to another?
To understand social networks in the Army, it is important to gather and analyze the real
relationship data of Army personnel. However, due to privacy and security concerns, it would be
very difficult to conduct this type of data collection, especially for researchers outside the Army.
Therefore, social network research will have to heavily rely on external data – either public or
gathered in other organizations. After theories are developed and verified in external sources,
then experiments can be done inside Army organizations to validate or modify the theories.
Research on people networks in organizations provides insights and predictive models on how people in large, hierarchical organizations work with each other. These models shall serve as foundations for modeling Army social networks. Research on the infrastructure requirements of an organization the size of the Army will help us understand the infrastructure needed for social network analysis, as well as cross-network analysis, in the Army. We target three theories: (1) Network Sampling Theory, which explains how to best observe social networks; (2) Networked Social Capital Theory, which quantifies the value of social networks; and (3) Networked Social Capacity Theory, which models how people utilize networks given their constrained capacity.
Prior Work
We have been making qualitative and quantitative studies of organizational networks by building
methods, tools, and theories to examine the many digital traces left within organizations of
collaboration and communication. Our objective is to use behavioral data to understand the
relational structures in an organization to find what behaviors correlate with friendship, advice
giving, and information sharing and what patterns are associated with success at both the
individual level and the collective level, focusing particularly on the dimension of network
dynamics.
Figure S1-1 System overview of the original SmallBlue social network analysis and expertise mining engine.
Tens of thousands of distributed social sensors were deployed in 76 countries to capture communications
between people as well as the term frequencies of the communication contents.
IBM Corporation has 400,000 worldwide employees with complex hierarchical structures. There can be as many as 10 to 15 layers of hierarchy from the CEO to general employees.
Since 2006, our team at IBM has been inventing and deploying, in more than 70 countries, an organizational social network analysis system, SmallBlue [Lin08], to quantitatively infer the social networks of 400,000 employees within IBM. SmallBlue deploys social sensors to gather, crawl, and mine a significant number of data sources, including the content and properties of individual email and instant message communications, calendars, the organizational hierarchy, project and role assignments, employee performance measurements, personal and project revenue, and so on. These sensors have been placed on more than 10,000 volunteers' machines. Millions of continuously arriving dynamic data items are processed on the server in order to discover the valuable business intelligence of who knows what, who knows whom, and who knows what about whom within an organization. It also unlocks the value of
social networks through analyzing social network data in conjunction with the individual and
project financial gain [Wu09]. The aim of SmallBlue is to automatically locate knowledgeable
colleagues, communities, and knowledge networks in organizations. It also helps users manage
their personal networks, reach out to their extended network to find and access expertise, reveal
personalized relevant information such as documents or webpages that are shared or found useful
by their extended network, and visualize millions of keyword-based social networks of subject
experts in organizations. SmallBlue provides Google-like expertise search capabilities.
The SmallBlue framework is a general platform for analysis, indexing and querying of social
networks, derived from continuous streams of information such as email, instant messaging,
blogs, and wikis. The input consists of arbitrary feeds, which are aggregated at the appropriate
granularities, to derive and infer both social links (i.e., relationships among people) and expertise
(i.e., relationships between people and content). The initial version of SmallBlue shown in
Figure S1-1 focused on social networks. A later version of SmallBlue shown in Figure S1.2 also
crawls webpages and databases and receives data feeds to infer information networks and the
cross-network relationships between people and information. This integrated system includes
social, info, and socio-info networks, and has been applied inside IBM for network-centric
personalized search and recommendation.
By Nov. 2009, the SmallBlue system had captured 20,000,000 emails and instant messages (with communication-party information and term-frequency statistics of content), 1,000,000 items of learning-asset access data (including which learning courses and materials employees accessed), 10,000,000 items of knowledge-asset access data (including who accessed which knowledge assets, e.g., technical documents, presentations, market analysis, business intelligence, etc.), 1,000,000 items of Web 2.0 social software usage (social bookmarks, blogs, file sharing, etc.), 200,000 employees' external financial billing databases, and 400,000 employees' organizational profiles (including hierarchy, location, demographic data, job roles, job categories, self-reported skills, resumes, etc.). The system keeps acquiring live data inside the company to provide timely, real-time services for social network analysis, expertise mining, social-network-enabled information recommendation, and search. It is unique in its scale, live environment, and trustworthy information, as well as in capturing the multiple facets of people. It can help network scientists gain a better understanding of human behavior, and thus build models that can be experimented with and verified.
Project S1 will focus on generating the following outcomes, which will have long-term impact on Network Science:
(1) Perform systems research suggesting what kinds of infrastructure design will best fit the goals of network science for real-time, large-scale distributed decision making and for network data capture and analysis.
(2) Derive Network Sampling Theories to help future network systems understand the
resources needed to deploy sensors, either physical or virtual, for data gathering.
(3) Derive Networked Social Capital Theories to quantify the value of social networks. The
theories will include both micro-scale (e.g., person, or a small team) and macro-scale
(overview of the network structure of a division or organization).
(4) Derive Networked Social Capacity Theories to understand how people utilize networks
using multiple channels of interaction given their constrained capacity (e.g., time).
How do people networks form, operate, and have impact in large organizations?
In practice, it is difficult to capture all information about every individual in the same degree of detail. For instance, data streams from physical sensors on soldiers in the combat field may not always be available. Sometimes information is missing, making it impossible for the system to understand the current social and cognitive network status of individuals. In another scenario, privacy laws enacted in many countries prohibit communication data from being processed by a service provider beyond the scope of providing the communication service itself. Gathering a complete set of non-anonymized communication information from a service provider therefore usually raises legal and privacy concerns. Instead, we collect data from volunteers and use that data to infer the network on a much larger scale. Given that informal social network data reside intrinsically in different data sources, informal networks can usually only be inferred by sampling larger networks. Since partially observed data is the norm, we will derive mathematical theories to investigate the robustness of graph sampling and its implications under various conditions.
Second, we will analyze the impacts of informal social networks in an organization. We want to
learn which aspects of informal networks affect the performance of people, and how. We shall model, measure, and quantify the impact of dynamic informal organizational networks on the productivity and performance of individuals and teams; develop and apply methods to identify causal relationships between dynamic networks and productivity, performance, and value; model and measure peer influence in social networks by statistically distinguishing influence effects from homophily and confounding factors; and examine how individuals change their use of social networks under duress. With an understanding of the impact of social networks, we can thus derive the real 'value' or 'quality' of social networks. We refer to this as Network Capital Theories. The ability to quantify the quality of a social network shall be a critical contribution to research in CNARC, which operates communication networks based on quality measurements of other networks.
Third, we will investigate the multi-channel networks between people. With the unique large-scale info-social network system of IBM's SmallBlue, we are able to capture multiple facets of people's relationships, including email communications, instant messaging communications, teleconference or face-to-face meetings, file sharing, social bookmark sharing, blog interaction, wiki collaborative document composition, knowledge sharing, etc. SmallBlue also collects the content of these multi-channel interactions. Thus, it provides a unique opportunity to observe how people networks form and interchange between different channels. We shall also explore whether channel capacity and coding theories from communications and information theory can be extended to the human domain to model the variation and distribution of relationships across different channels.
Task Overview
We plan to conduct basic research on how to design infrastructure that will support efficient storage, updates, and queries against social network data. Data items that contain the association between entities along with a timestamp are continuously collected from multiple feeds. Each of those items may have content and several different metadata fields that need to be stored. A scalable and efficient infrastructure to store and retrieve these dynamic data sets is necessary. Specifically, the infrastructure should be powerful, flexible, and simple to use for all these sources. Appending new data items should be fast, updating existing items should be possible, and accessing the entire corpus for aggregate analyses should be efficient and scalable.
the data. Finally, the infrastructure should be interoperable with systems and models of other
types of networks, e.g., the information network systems from INARC and the communication
network systems from CNARC.
Task Motivation
We will conduct system-level research to consider how to incorporate this type of real-time network processing requirement into the existing SmallBlue socio-info network infrastructure. One of the technologies we will be looking at is the novel stream processing system that we developed for the Department of Defense, IBM InfoSphere Streams, which enables aggressive intelligence extraction of information and knowledge from heterogeneous continuous data streams of 10 to 100 Gbits/sec. We shall conduct research on incorporating an InfoSphere-like system for real-time network decision making.
Since partially observed data is a norm, we will derive mathematical theories to investigate the
robustness of graph sampling and its implications under various conditions. We call this a
Network Sampling Theory. We will investigate what types of sampling strategies are required to
obtain a good estimation on the entire network. We will also investigate analytic methods to
conduct network analysis on only partially observed data.
Key Research Questions
Design a set of benchmarks and use cases to evaluate infrastructure needs for social network data
storage and processing. In particular, to what extent can web crawling, social sensor mining, full-
text storage, indexing, processing, and search infrastructure and methods be leveraged? How can
these be integrated with large-scale data processing software (e.g., Apache Hadoop DFS and
MapReduce, HBase, etc)?
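To make the integration question concrete, the sketch below (a hypothetical illustration, not part of the SmallBlue codebase) computes node degrees from a communication edge list using the map/reduce pattern that Hadoop MapReduce implements; the same two-phase structure would apply when run distributed:

```python
from collections import defaultdict

# Hypothetical edge list derived from communication logs: (sender, receiver).
edges = [("alice", "bob"), ("alice", "carol"), ("bob", "carol"), ("dave", "alice")]

# Map phase: emit (node, 1) for each endpoint of each edge.
def map_phase(edge_list):
    for src, dst in edge_list:
        yield (src, 1)
        yield (dst, 1)

# Shuffle + reduce phase: sum the counts per node, as a Hadoop reducer would.
def reduce_phase(pairs):
    degrees = defaultdict(int)
    for node, count in pairs:
        degrees[node] += count
    return dict(degrees)

degrees = reduce_phase(map_phase(edges))
print(degrees["alice"])  # 3: edges to bob and carol, plus one from dave
```

The benchmark question then becomes how well such map/reduce pipelines perform when the "edge list" is billions of continuously arriving items rather than an in-memory list.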
As social networks grow rapidly over time, any conclusion drawn from partial network observations could be biased and deceiving. It is often impossible to obtain the entire network because of resource constraints and continuous network growth. Because of time constraints, in many applications it is also impossible to compute network measures over the entire network, so sampling is unavoidable.
Initial Hypotheses
In order to sample a large-scale social network, we will develop several graph sampling methods: (1) randomly sample nodes from the original graph; (2) sample nodes globally and, for each sampled node, also sample its neighbors; and (3) choose one node and sample its neighbors within k hops. We will investigate which sampling strategy is most effective for different network analytical tasks. In addition, we will study whether the network topology plays a significant role in selecting the right sampling technique for finding high-quality patterns.
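The three strategies can be sketched as follows (a minimal in-memory illustration on an adjacency-list graph; function names and data are ours, not an established library's):

```python
import random

def random_node_sample(graph, k, rng):
    """Strategy (1): uniformly sample k nodes from the graph."""
    return set(rng.sample(sorted(graph), k))

def snowball_sample(graph, k, rng):
    """Strategy (2): sample k seed nodes globally, then also include their neighbors."""
    seeds = random_node_sample(graph, k, rng)
    sampled = set(seeds)
    for node in seeds:
        sampled.update(graph[node])
    return sampled

def k_hop_sample(graph, start, hops):
    """Strategy (3): choose one node and include all neighbors within k hops (BFS)."""
    frontier, sampled = {start}, {start}
    for _ in range(hops):
        frontier = {n for node in frontier for n in graph[node]} - sampled
        sampled |= frontier
    return sampled

# Toy undirected graph as adjacency lists.
graph = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a"], "d": ["b", "e"], "e": ["d"]}
print(sorted(k_hop_sample(graph, "a", 2)))  # ['a', 'b', 'c', 'd']
```

Comparing the degree distribution or clustering of each sample against the full graph is one way to measure which strategy best preserves the properties a given analytical task depends on.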
Prior Work
The core abstraction for these kinds of data is the sparse matrix or, equivalently, the adjacency list. There are a number of approaches to storing, indexing, and querying such data. Traditional information
retrieval systems such as Lucene [Lucene09] store document-term matrices. However, these are
hard to scale to large clusters of commodity hardware. In response to challenges posed by
requirements for analyzing the web graph, a number of approaches have emerged, including the
Google Filesystem [Ghemawat03], MapReduce [Dean04], BigTable [Chang06], Hadoop, Pig,
Hive [Hadoop09], and so on.
Technical Approach
None of the existing solutions is suitable for our setting out of the box, for the following reasons. First, both indegree and outdegree distributions may be heavily skewed: in a web graph, the indegree may be heavily skewed (e.g., the CNN front page versus personal homepages) but the outdegree distribution usually is not, whereas in graphs arising from social interactions both can be skewed, so processing "super-nodes" poses significant performance penalties. Second, performing traversals naively on the graph is slow: if the graph is not somehow clustered so that graph neighbors are stored together, traversal is slow. Finally, we need a clean way to deal with different information sources (e.g., emails, profile data, webpages, bookmarks, etc.). We currently use different stores for each (e.g., pure relational databases for profile data, Lucene for emails, a mix of both for bookmark data, Solr for data with text content and typed faceted information, and so on), and joining different data sources is often cumbersome and time-consuming.
These are problems that need to be addressed, whether by adapting or modifying existing approaches, by writing sufficiently general "glue" components, or by building completely new components (to be determined), as necessary. At this stage, SmallBlue employs fairly conventional, separate stores for each data source.
Q3. Begin designing and experimenting with a prototype infrastructure to store and query graphs, which also allows: (i) efficient retrieval based on graph metrics (e.g., weighted neighbor range queries, shortest path, etc.); (ii) efficient aggregation based on node relationships; (iii) support for heterogeneous types of nodes and edges; and (iv) support for both original, observed data (e.g., emails, webpages, etc.) as well as derived information (e.g., document topics, user profiles and expertise, etc.).
Q4. We initially plan to start from homogeneous graphs (single type of nodes and edges) and
then expand into dynamic large-scale heterogeneous graphs.
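The Q3 query types can be illustrated with a toy in-memory sketch (hypothetical data; a production store would need distributed indexing far beyond dictionaries): a weighted neighbor range query and a shortest-path retrieval over a small heterogeneous graph whose edges carry a type and a weight.

```python
from heapq import heappush, heappop

# Hypothetical heterogeneous graph: node -> list of (neighbor, edge_type, weight).
# Nodes mix people and documents, as Q3's heterogeneous requirement envisions.
graph = {
    "alice": [("bob", "email", 1.0), ("doc1", "authored", 0.5)],
    "bob": [("alice", "email", 1.0), ("carol", "im", 2.0)],
    "carol": [("bob", "im", 2.0)],
    "doc1": [],
}

def neighbor_range_query(graph, node, max_weight):
    """Return neighbors reachable by a single edge of weight <= max_weight."""
    return [n for n, _, w in graph[node] if w <= max_weight]

def shortest_path_cost(graph, src, dst):
    """Dijkstra over edge weights, ignoring edge types."""
    dist, heap = {src: 0.0}, [(0.0, src)]
    while heap:
        d, node = heappop(heap)
        if node == dst:
            return d
        if d > dist.get(node, float("inf")):
            continue
        for nbr, _, w in graph[node]:
            nd = d + w
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heappush(heap, (nd, nbr))
    return float("inf")

print(shortest_path_cost(graph, "alice", "carol"))  # 3.0, via bob
```

In the envisioned store, both queries would have to run against indexed, disk- or cluster-resident adjacency data rather than a Python dictionary.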
In the next step at Year 2, we shall investigate how to incorporate streaming updates. Different
data sources need to be updated at different frequencies and/or will produce data at different
volumes. For some data sources, the rate of updates may be high enough to warrant a different
approach for storage and indexing.
Validation Approach
We plan to evaluate the demands that streaming updates pose; the open, Internet version of SmallBlue, which will be developed as part of another project, should provide the necessary high-volume data testbed for evaluating alternatives.
Summary of Military Relevance
This study is important for allowing the Army to understand the system-level challenges in analyzing large-scale networks in network-centric systems. In particular, it helps to clarify how such systems should be designed to consider the various networks (social, info, and communications) together.
Research Products
An understanding of system needs: how the network-centric system should be designed, together with concrete design recommendations.
Task Overview
The purpose of this task is to model, measure and quantify the impact of dynamic informal
organizational networks on the productivity and performance of individuals and teams; to
develop and apply methods to identify causal relationships between dynamic networks and
productivity, performance and value; to model and measure peer influence in social networks by
statistically distinguishing influence effects from homophily and confounding factors; and to
examine how individuals change their use of social networks under duress.
Task Motivation
Productivity and Social Networks: It is well accepted that social dynamics and chemistry can dramatically affect team performance. One way to measure the productivity of individuals or groups is by revenues or other quantifiable success measurements.
Causality of Network Impacts: Determining the direction of causality is central to
understanding how to help employees improve their productivity. Currently, our analyses focus on correlations. The results would be much stronger if we could establish a causal mechanism for how knowing more executives helps improve a worker's productivity, and truly rule out the alternative hypothesis that high performers are simply more likely to attract managerial interaction. Using our existing social network infrastructure, we can probe this mechanism through randomized experiments and through extensive interviews in the field.
Peer Influence in Social Networks: We propose to study the degree to which networks 'spread' productivity by making those who are connected to productive peers more productive, or alternatively the degree to which productive workers attract network contacts.
Utilization of Social Networks under Duress: We shall explore how people activate their social networks when they are under duress. We hypothesize that how people use their network ties has a profound impact on how well they internalize stress, as well as on their chances of obtaining new opportunities.
Initial Hypotheses
Anecdotal evidence from a recent New York Times article suggests that some people tend to immerse themselves in a web of close relationships; deepening their relationships with friends and loved ones helps them cope with the psychological stress of being unemployed. A recent Wall Street Journal article also profiled an MIT-educated Wall Street banker who was laid off.
Prior Work
To the best of our knowledge, we published the first large-scale quantitative study on the
connection between social dynamics and productivity [Wu09]. We derived the social network
data of 7500 employees from their electronic communications at a large information technology
firm over 2 years. In total, about 400,000 people were in the aggregated social network. We
focused our study on the 2500 consultants in our sample and collected detailed data on the 2,592
projects these consultants participated in from June 2007 to July 2008. The sheer volume of the
data allowed us to more precisely estimate how population level topology in a network
contributed to information worker productivity, after controlling for human capital, work
characteristics, and demographics.
Social networks have long been theorized to help people realize better outcomes in the labor market (e.g., [Granovetter73], [Castilla05]). Network contacts can offer job seekers information about where to find jobs, as well as possibly influence the hiring process in the seeker's favor. However, in certain situations or network configurations, job seekers are limited in activating parts of their network, such as weak and long-range ties [Granovetter73]. For example, Smith [Smith05] explores how urban Black job seekers are forced to leverage their strong ties even when weaker, longer-range ties exist. She attributes this phenomenon to a lack of trust: weakly connected individuals fear that referring the job seeker may harm their own reputation at the workplace. Her work shows that the networks people activate differ from their potential networks. Social psychologists have long emphasized the importance of studying cognitive networks as opposed to objective networks [Krackhardt87]. In this task we study how individuals cognitively activate their social networks under duress. This is an important and largely unexplored area in social network research.
Technical Approach
Productivity and Social Network: Specifically, we uncovered four key results. First, we found
that the structural diversity of social networks is positively correlated with performance,
corroborating previous work [Aral07]. Second, network size was positively correlated with
higher productivity. However, when we separated network size into in-degree and out-degree,
we found that while in-degree was positively correlated with higher work performance, out-
degree was not correlated with performance in the project network in which each node was a
project not a person. Third, for both the employee and the project network, knowing many
executives was positively associated with work performance. However, having many managers
on a project was negatively correlated with project revenues. Fourth, we found that betweenness centrality was negatively correlated with individual productivity while it was positively correlated with project revenues.
Peer Influence of Social Networks: We propose to study the degree to which networks 'spread' productivity by making those who are connected to productive peers more productive, or alternatively the degree to which productive workers attract network contacts. We intend to develop and employ several different methods in order to separate selection, homophily, and influence from exogenous, contemporaneously correlated confounding shocks to individuals' productivity. We intend to do this using the following methods:
1. Peer effect models of social influence which extend previously developed theory from the
class of spatial autoregressive models. When groups vary in size or structure, deviations from
group means can under certain assumptions be identified using various subsets and supersets of
the graph as instrumental variables.
2. Actor-oriented models which model the dynamic co-evolution of networks and behavior as
continuous time Markov models. These are based on panel network data estimated with Markov
Chain Monte-Carlo methods to reduce the dimensionality of the state space.
3. Matched sample estimators which simulate experimental settings by matching potential
treatment nodes with treatment comparison nodes that are likely to be similar on observed
dimensions. These methods will help us address the role of influence in social networks.
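Method 3 can be illustrated with a toy matched-sample comparison (hypothetical data; a real analysis would match on propensity scores over many observed covariates rather than a single one):

```python
# Toy matched-sample estimation: each worker has an observed covariate
# (tenure), a treatment flag (has a productive peer), and an outcome
# (productivity). Each treated worker is matched to the untreated worker
# with the closest covariate value, and outcome differences are averaged.

workers = [
    {"tenure": 2, "treated": True,  "productivity": 10.0},
    {"tenure": 5, "treated": True,  "productivity": 14.0},
    {"tenure": 2, "treated": False, "productivity": 8.0},
    {"tenure": 6, "treated": False, "productivity": 11.0},
]

def matched_effect(workers):
    treated = [w for w in workers if w["treated"]]
    control = [w for w in workers if not w["treated"]]
    diffs = []
    for t in treated:
        # Nearest-neighbor match on the observed covariate.
        match = min(control, key=lambda c: abs(c["tenure"] - t["tenure"]))
        diffs.append(t["productivity"] - match["productivity"])
    return sum(diffs) / len(diffs)

print(matched_effect(workers))  # 2.5
```

The point of matching is that the comparison approximates an experiment on observed dimensions; it cannot, by itself, rule out confounding on unobserved ones, which is why methods 1 and 2 are used alongside it.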
Utilization of Social Networks under Duress: The recent financial downturn provides us a unique opportunity to study how people activate their social ties under distress. In January 2009, IBM, the largest information technology company in the world, underwent a restructuring process and eliminated 10% of its workforce worldwide, resulting in more than 40,000 people losing their jobs. However, one of the unemployment benefits at IBM is that laid-off employees do not have to leave IBM immediately: for the next two months, they can still use all the resources of an active employee to look for a position within the firm. We expect a large part of their daily communication during this period to be devoted to looking for a new position. Since SmallBlue captures these communications, we can observe this network activation directly.
Next, we plan to study the dynamics of network activation over time. Perhaps people initially embrace their strong ties and immerse themselves in a densely connected web for moral support, coping with the psychological stress induced by the job loss. Then, perhaps, people activate their weak and long-range ties to find future opportunities. The duration of each stage may vary, and it is important to see whether any personal or network characteristics moderate the duration of each stage.
The ideal outcome is that we see a distinctive pattern in how people activate their social networks under distress, different from the pattern of those who were not laid off. From our initial analysis, we see a communication peak, as measured by total email exchange, for those who were laid off compared with those who were not. Next, we plan to delve deeper into these communication exchanges to see whom people talked to and what their relationships were. We hope to find a pattern in the activation strategies people use when under duress.
Validation Approach
We plan to evaluate the models, especially the causality studies, based on the SmallBlue system.
Research Products
Novel study results that deepen our understanding of human networks and of how networks affect performance. There is no precedent for a large-scale study of this kind, and the results shall be insightful for the progress of network science.
Task Overview
This task will investigate the multi-channel networks between people. With the unique large-scale info-social network system of IBM's SmallBlue, we are able to capture multiple facets of people's relationships, including email communications, instant messaging communications, teleconference or face-to-face meetings, file sharing, social bookmark sharing, blog interaction, wiki collaborative document composition, knowledge sharing, etc. SmallBlue also includes the content of these multi-channel interactions. Thus, it provides a unique opportunity to observe how people networks form and interchange between different channels. We shall also develop theories to model the effectiveness of the channels through which people exchange information and build relationships, and thus help people appropriately allocate their limited capacity over the channels. In particular, we will explore whether these theories can be built by extending channel capacity and coding theories from communications and information theory to the human domain.
One way to understand the multi-faceted relationships of people is to ask how, and whether, people with similar interests voluntarily gather together, interact more, share more information, and spontaneously form communities. In other words, if "birds of a feather flock together," to what extent is it true? How diverse are the social networks people form?
Modeling user interests is important for search and recommender systems to provide
personalized information to meet individual user needs [Teevan05]. Towards this goal, existing
works have studied a user's explicit interests specified in his profile, or implicit interests
indicated by his prior interactions with various types of information, such as content the user has
created or read, including web pages, documents, and email. Recently, the proliferation of online social networks has sparked interest in leveraging social networks to infer user interests [White09], based on the existence of social influence and correlation among neighbors in social networks
[Singla08]. For applications that can directly observe a user's behavior (e.g., logs of search
engines he uses), inferring interests from his friends in social networks provides one extra useful
enhancement. For many other applications, however, it is difficult to observe sufficient behavior
of a large number of users. In such scenarios, inferring their interests from their friends can be
the only viable solution. For example, for a new user in a social application, the application may
only have information about his friends who are already using it. To motivate the new user to
actively participate, the application may want to provide personalized recommendations of
relevant content. To this end, the application has to infer his interests from friends.
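A minimal sketch of this neighbor-based inference (our own illustration, not the cited systems) estimates a new user's interest profile as the mean of their friends' profiles:

```python
# Hypothetical interest vectors over topics for existing users.
interests = {
    "bob":   {"security": 0.8, "networking": 0.2},
    "carol": {"security": 0.4, "networking": 0.6},
}

def infer_interests(friends, interests):
    """Estimate a new user's interests as the mean of their friends' profiles."""
    topics = {t for f in friends for t in interests[f]}
    return {t: sum(interests[f].get(t, 0.0) for f in friends) / len(friends)
            for t in topics}

# A new user whose only known information is two friends already in the system.
estimate = infer_interests(["bob", "carol"], interests)
# estimate["security"] == 0.6, estimate["networking"] == 0.4
```

Real systems would weight friends by tie strength and channel rather than averaging uniformly, which is exactly the kind of weighting the multi-channel models in this task aim to supply.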
However, there exists huge variation in the types and amount of information available from social interactions. According to existing studies of enterprise social networks [Brzozowski09], only a small percentage of online users (e.g., <10%) actively contribute social content using one or more social software tools (e.g., blogs, social bookmarking, and file sharing), while a large proportion of users (e.g., >90%) seldom do so. Moreover, certain user-contributed data may not be accessible (e.g., private files) or cannot be associated with a particular user (e.g., anonymous data). This creates both a demand and a challenge for accurate user interest modeling, especially for inactive users: accurate interest modeling can provide personalized search and recommendation results, and thus may help to increase the usage of such applications.
Task Motivation
People interact with others in many different ways and have different types of relationships. The research in this task takes the human as the basic unit of analysis, seeking to understand how a person handles and allocates different relationships with other people under resource constraints such as time or information.
Taking the human as the basic unit of study, we are also interested in investigating how a person's interaction with information relates to other people in their social networks. This type of scenario can help the Army understand how people's information-sharing behavior relates to their longer-term social interaction behavior, and, from another angle, how the people in one's social network may affect one's decisions or reveal one's interests.
Initial Hypotheses
People's capacity is limited. How do people allocate their resources to maintaining relationships? We assume it is possible to derive relationship channels through Dynamic Probabilistic Complex Network (DPCN) modeling. Through the DPCN models, we shall then be able to consider them in conjunction with channel capacity results from information theory to model human capacity constraints, especially with regard to people-to-people networks and people-to-information networks.
Prior Work
We proposed a Dynamic Probabilistic Complex Network (DPCN) model to predict how people
diffuse different types of information through a network [Lin07]. DPCN models the states of
nodes and edges as Markov models, whose states change ("become infected") as information
spreads.
Technical Approach
Figure S1-3 shows a description. DPCN models the time and probabilistic factors in information
propagation, and estimates the time required to spread information throughout a region of the
network, based on prior behavioral modeling of individuals on topic spreading. The state
updates are

P(t + Δt) = f_M(Q(t), P(t)),  and  Q(t + Δt) = g_M(P(t + Δt), Q(t), P(t)),

where the edge- and node-state distributions are

p_{i,j}(t) = [ Pr(y_{i,j}(t) = S), Pr(y_{i,j}(t) = D), Pr(y_{i,j}(t) = A), Pr(y_{i,j}(t) = R) ]
           = [ α_{i,j}, β_{i,j}, γ_{i,j}, δ_{i,j} ],
q_i(t) = [ Pr(x_i(t) = S), Pr(x_i(t) = A), Pr(x_i(t) = I) ] = [ α_i, β_i, γ_i ],

with α_{i,j} + β_{i,j} + γ_{i,j} + δ_{i,j} = 1 and α_i + β_i + γ_i = 1. Here x_i(t) is the
status value of vertex i at time t, and y_{i,j}(t) is the status value of edge i→j at time t.
The network topology follows the characteristics of a complex network, e.g., (1) the node
degrees follow a power law,

Pr( Σ_i u(p_{i,j}) = l ) ∝ S · l^(−d),

where u(p_{i,j}) = 1 if, for all t, Pr(y_{i,j}(t) = null) = 0, and u(p_{i,j}) = 0 otherwise, and
d is typically in the range 2 ~ 2.5; and (2) the small-world phenomenon holds, e.g., the
clustering coefficient C satisfies

C = Pr( u(p_{j,k}) = 1 | u(p_{i,j}) = 1, u(p_{i,k}) = 1 ) ≥ C_TH
In Year 1, we shall investigate the large data sets we have collected using this DPCN model and
then derive theories and algorithms to describe the channels. In the following years, we shall
further investigate how people allocate their capacities across different types of channels and
consider the capacity issues from a macro, network-level viewpoint.
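To make the update scheme concrete, the state propagation above can be sketched in code. The specific transition rules below (activation driven by "delivering" incoming edges, 50/50 resolution of a delivering edge, and the beta/mu parameters) are illustrative assumptions, not the actual f_M and g_M of [Lin07]:

```python
# Node states: S (susceptible), A (active), I (inactive); edge states:
# S (silent), D (delivering), A (accepted), R (rejected). The transition
# rules here are illustrative stand-ins for f_M and g_M.

def step(q, p, in_nbrs, beta=0.5, mu=0.1):
    """One synchronous DPCN-style update of node distributions q[i] and
    edge distributions p[(i, j)]; in_nbrs[i] lists nodes j with edge j->i."""
    new_q, new_p = {}, {}
    for i, dist in q.items():
        p_none = 1.0                     # P(no incoming edge is delivering)
        for j in in_nbrs.get(i, ()):
            p_none *= 1.0 - p[(j, i)]["D"]
        activate = beta * (1.0 - p_none)
        new_q[i] = {
            "S": dist["S"] * (1.0 - activate),
            "A": dist["A"] * (1.0 - mu) + dist["S"] * activate,
            "I": dist["I"] + dist["A"] * mu,
        }
    for (i, j), e in p.items():
        start = beta * q[i]["A"]         # an active source starts delivering
        new_p[(i, j)] = {
            "S": e["S"] * (1.0 - start),
            "D": e["S"] * start,
            "A": e["A"] + e["D"] * 0.5,  # a delivering edge resolves 50/50
            "R": e["R"] + e["D"] * 0.5,
        }
    return new_q, new_p
```

Each update preserves the constraints above (node and edge distributions each sum to 1), and iterating `step` yields an estimate of how long information takes to reach a region of the network.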
Validation Approach
We shall use the data collected in the SmallBlue system to conduct research and validate
models.
Research Products
Studies and models will be derived, tested, and published.
The SmallBlue system, a mature, product-level social network and information network analysis
platform, can be an important experimental vehicle for the entire CTA. To foster collaboration
with all other CTA projects, the SmallBlue platform shall gradually evolve to include the
Since late 2008, the key researchers in Project S1 have been working closely with other team
members in the SCNARC to create the Center proposal. There has been strong mutual trust and
productive discussion between the teams, so strong collaborations are expected in the future.
In particular, RPI and IBM have a long collaboration history: many key IBM researchers,
including the current head of the worldwide IBM Research organization, are RPI alumni. The
collaboration between IBM and New York State universities (e.g., CUNY and NYU) is
especially supported by the New York State government. IBM welcomes visiting students and
researchers of the NS Consortium to come to IBM to conduct research on anonymized data
inside the firewall.
In addition, we are building new collaborations with SCNARC researchers from Northeastern,
Northwestern, Indiana, MIT, and Notre Dame. We are also initiating new collaborations with
IRC and CNARC researchers. For instance, we shall collaborate with IRC researchers Kathleen
Carley on social network analysis systems, Jaideep Srivastava on social/information network
search and recommendation, and David Parkes on the economic impact of networks, and with
CNARC researcher Prasant Mohapatra on communication network issues arising in
social/information networks. We look forward to building further collaborations in the years to
come.
Project S1 will provide a researcher (with a Ph.D. degree) to the NS center every other year.
Research Milestones
Budget By Organization

Organization       Government Funding ($)   Cost Share ($)
CUNY (SCNARC)            32,557
IBM (SCNARC)            329,804
IU (SCNARC)              11,000
MIT (SCNARC)             14,009
MIT (SCNARC)             40,000
ND (SCNARC)              11,000
NEU (SCNARC)             96,969
NWU (SCNARC)             14,000
NYU (SCNARC)             40,000
RPI (SCNARC)             45,921                   8,383
TOTAL                   635,260                   8,383
References
[Aral07] S. Aral, E. Brynjolfsson, and M. Van Alstyne. Productivity effects of information diffusion
in networks. Proceedings of the 28th Annual International Conference on Information Systems,
Montreal, Canada, 2007.
[Borgatti03] S. Borgatti and P. Foster. The network paradigm in organizational research: A review
and typology. Journal of Management 29(6):991-1013, 2003.
[Burt92] R. Burt. Structural Holes: The Social Structure of Competition. Harvard University
Press, 1992.
[Castilla05] E. Castilla. Social networks and employee performance in a call center. American
Journal of Sociology 110(5):1243-1283, 2005.
For example, IW is a complex, ambiguous, and inherently social phenomenon: "insurgency and
counterinsurgency operations focusing on the control or influence of populations, not on the
control of an adversary's forces or territory" (IW Joint Operating Concept, 2007). Likewise, "an
operation that kills five insurgents is counterproductive if collateral damage leads to the
recruitment of fifty more insurgents" (COIN). To support operations in such an environment, in
this project we will develop individual-based models to investigate social influencing and
associated strategies in weighted social networks. Our methods and models for community
detection, community stability, and social influencing will be applicable to any data sets,
spanning vast scales, including those collected by the military.
In conclusion, being able to identify the hidden networks and communities (some adversarial
and some supportive) in a social network is fundamental to building a theory of network
science. Sound methods for this project would thus lay one of the foundations for much research
in the field.
Research Question 1. How can we identify communities in a dynamic social network, especially
hidden and informal groups, based on measurable interactions between the members, but
without looking at details of the interactions? For example, if the interactions are
communications, can we understand the underlying social structure of the network by only
looking at the communication dynamics of the network, and not the communication
contents?
Research Question 2. How do the communities relate to each other in terms of membership,
trust and opposition or support?
Research Question 3. How do we discover the evolution and the stable cores of communities,
and identify anomalous communities or groups, some of which may be adversary?
Research Question 4. How do the relationships between communities evolve? For example, do
intersections tend to grow before a merger? Do the relationships between large communities
evolve in a similar way to the relationships between small communities?
Research Question 5. How does information flow within such communities and between
communities?
Research Question 6. How do we identify communities in social networks which manifestly
emerge as the result of communication and information flowing across the links (S2.3.1)? In
a military setting, these communities correspond to adversarial communities.
Research Question 7. How does the frequency of communication across the links (edges) affect
the emerging community structure (S2.3.1)?
Research Question 8. How do we influence/dissolve communities in social graphs (S2.3.2)? In a
military context, the answers shall yield ways to disintegrate adversarial communities with
hostile, extremist and/or militant ideologies.
We would like to address these research questions in a general and scalable way, so that the
answers apply to diverse networks ranging into the tens of millions of nodes. Thus, we
envision validating all conclusions on real, stochastic networks ranging from thousands to
millions of nodes. Traditionally, social science research involving detailed investigation of
community structure has either not taken overlap of communities into account or has applied
only to very small networks.
The technical path to achieving these research goals will require innovations in a number of
interlinked areas, which naturally lead to the initial hypotheses we will investigate.
To address the key research questions of this project, we define two tasks. In the first task, S2.1
Detection of Hidden Communities and their Structures, we use interaction data over time to build
a picture of community structure and community evolution, including information pathways and
inter-community relationships. This is an appropriate first step because it aims at understanding
the core of social networks. In the second task, S2.2 Information Flow via Trusted Links, we
build agent-based models to study how information pathways are affected by varying degrees of
trust between individuals and communities in heterogeneous networks that contain adversarial
(non-trusted) as well as non-adversarial (trusted) subnetworks.
8.6.6 Task S2.1: Detection of Hidden Communities and their Structures (M.
Magdon-Ismail, RPI (SCNARC); M. Goldberg, RPI (SCNARC); B.
Szymanski, RPI (SCNARC); W. Wallace, RPI (SCNARC); D. Lazer, NEU
(SCNARC); Z. Wen, IBM (SCNARC))
Task Overview
The long-term goal addressed by this task is to understand all the communities and their
evolution in large functioning social networks - the hidden and informal ones in addition to the
self-advertising ones. This is the first step toward understanding the community structure in a
dynamic social network.
Task Motivation
Challenges: As discussed above, social networks are typically not directly observable; however,
the random, statistical interactions among their members are observed, and based on these
interactions we would like to understand:
Research Question 1. How can we identify communities and internal community hierarchy,
including information pathways, stable and unstable points?
Research Question 2. How can we identify leaders and influential members in communities?
Research Question 3. How can we build an initial understanding of the relationships between
communities, for example opposition versus cooperation?
Research Question 4. How can we track the evolution of these communities and their
relationships over time, including their stable cores?
Initial Hypotheses
It is possible to discover community structure using statistical, graph-theoretical analysis of the
interaction data. Deeper analysis can reveal the internal structure of the communities, and the
overlap and interactions between members of different communities can reveal relationships
between communities. Identifying communities over successive time steps, and matching them
across those steps, can reveal the evolution of communities.
Prior Work
There is considerable prior work on identifying community structure, predominantly in static
networks and focusing on non-overlapping communities [Clauset04, Newman04a, Newman04b].
Neither the static-network assumption nor the restriction to non-overlapping communities is
appropriate for dynamic social networks. This project will build on initial work [Baumes05,
Baumes07a, Baumes07b, Baumes08, Goldberg08a, Goldberg08b] on discovering overlapping
community structure and on how it can be used to model network evolution. Our work focuses
on discovering overlapping communities from vast numbers of random interactions.
Technical Approach
The basis of our approach is to use measurable behaviors of individuals to identify local
interactions between pairs of nodes. Though our methods will be general, the initial basis will be
dyadic communications and information exchange. The first step will be to give a precise and
formal definition of a community, which satisfies a set of minimal requirements:
1) It should be a local definition.
For example, the very popular methods in [Clauset04, Newman04a, Newman04b] fail to satisfy
even the most basic property, 1) above. Given such a definition, we will develop algorithms to
find groups whose interactions display persistent, locally intense communication flow between
members of the community; in building these notions, it is imperative that social communities
be allowed to overlap. Accordingly, we will develop a theory for identifying overlapping
communities in social networks whose members are connected by patterns of persistent, locally
intense communication. More specifically, we define a notion of density E(G) for a group G,
and propose that
1. E(G) is a local measure which captures the notion of intensity.
2. G should be locally optimal with respect to E.
Given such a definition, the task will be to find all such G which persist; to identify overlaps
(relationships) which are significant; to identify the evolution and stable cores; and to identify
the internal hierarchy.
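As a concrete sketch of this scheme (not the project's committed algorithm), a locally optimal group under an illustrative density E(G) = internal edges / |G| can be found by greedy hill climbing from a seed edge; because each seed climbs independently, different seeds can converge to overlapping communities:

```python
from itertools import combinations

def density(G, adj):
    """Illustrative density E(G) = internal edges / |G| (rewards cohesion)."""
    g = list(G)
    if len(g) < 2:
        return 0.0
    internal = sum(1 for u, v in combinations(g, 2) if v in adj[u])
    return internal / len(g)

def locally_optimal_community(seed, adj):
    """Greedy hill climbing from a seed edge: grow or shrink the group
    while E(G) improves, yielding a locally optimal community."""
    G = set(seed)
    improved = True
    while improved:
        improved = False
        frontier = {n for u in G for n in adj[u]} - G
        for cand in frontier:                      # try adding neighbors
            if density(G | {cand}, adj) > density(G, adj):
                G.add(cand)
                improved = True
        for cand in list(G - set(seed)):           # try dropping members
            if density(G - {cand}, adj) > density(G, adj):
                G.discard(cand)
                improved = True
    return frozenset(G)

# toy network: two triangles sharing node 2
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (2, 4)]
adj = {}
for u, v in edges:
    adj.setdefault(u, set()).add(v)
    adj.setdefault(v, set()).add(u)
communities = {locally_optimal_community(e, adj) for e in edges}
```

On the toy network, both triangles {0,1,2} and {2,3,4} emerge as locally optimal groups, overlapping at node 2, which is exactly the behavior a partition-based method cannot produce.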
Our approach is statistical in that we do not appeal to the semantic contents of the interactions.
In particular, this will allow our methods to be applied to massive networks. Additionally, we
have to develop measures of statistical significance and reliability of the discovered
communities, and community membership. Such measures can themselves be useful for Army
decision making.
Based on our ability to understand community structure, the next task we address is to
understand community evolution; in particular, we will develop methods to understand when a
community has changed its composition or structure, in turn identifying its stable cores which
persist over the long time scale; when it has grown substantially; when it has died; and, when it
has split into two or more sub-communities. The fundamental building block here will be an
understanding of how to match communities to identify similarities and dissimilarities. One
dimension of the relationship between communities is adversarial versus cooperative, and how
that changes with time. Our approach to understanding the finer structure within the
communities, such as role, hierarchy, and adversarial versus cooperative stance, will be based
on three main approaches: (i) statistical communication content analysis (recursive data mining)
for role identification; (ii) statistical communication pattern analysis for topic detection; and
(iii) a social network representation based on bi-colored edges representing opposition versus
support, for improved community detection. Topic and sentiment analysis are already well
studied and will not be the focus of our research; we will adapt existing research to fit our goals
(see, for example, [Pang05, Taboda04, Bethard04]).
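The matching step that underlies evolution tracking can be sketched as follows; the Jaccard similarity and the threshold tau are illustrative choices for deciding birth, continuation, split, and death events, not the project's committed method:

```python
def jaccard(a, b):
    """Set similarity in [0, 1]."""
    return len(a & b) / len(a | b) if a | b else 0.0

def match_communities(prev, curr, tau=0.3):
    """Match each community at time t to its best predecessor at t-1.
    A community with no match above tau is 'born'; a predecessor matched
    by several successors has 'split'; an unmatched predecessor 'died'."""
    events = []
    hits = {}
    for c in curr:
        best = max(prev, key=lambda p: jaccard(p, c), default=None)
        if best is None or jaccard(best, c) < tau:
            events.append(("born", c))
        else:
            events.append(("continues", best, c))
            hits.setdefault(best, []).append(c)
    for p, succ in hits.items():
        if len(succ) > 1:
            events.append(("split", p, succ))
    for p in prev:
        if p not in hits:
            events.append(("died", p))
    return events
```

Applied to successive snapshots, the event stream gives the raw material for identifying stable cores (long chains of "continues" events) and anomalous evolution (bursts of splits or deaths).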
Validation Approach
We will test all our methods on real data from multiple networks at multiple scales. Currently,
the data available to us are email networks from the Enron email corpus and the IBM SmallBlue
Though the networks we will study and validate on are not of particular interest to the military,
they are typical social communication networks, in some cases very random and rapidly
evolving. Because they are diverse networks, our methods will have to be general enough to
apply to all of them, and thus would apply to networks of importance to the military. Further,
some of these networks, like the Twitter network, are very harsh environments for statistical
algorithms (vast, very dynamic, and very random), so the ability to discover community
structure in such an environment would indicate the robustness of our algorithms.
Research Products
By the end of the first year, we anticipate that we will develop preliminary methods for hidden
community detection, based on statistical communication patterns, establishing overlap as a
defining property. We anticipate one or more publications and/or reports submitted related to this
topic. Further, we will use these methods to begin to develop an understanding of typical
evolution of communities; in particular, the study of their stable cores, and understanding of
what constitutes anomalous evolution.
Task Motivation
Challenges: Social networks are conduits for information. Any model of information flow must
take into account the community structure, the relationships between communities, and the
dynamics of the community structure. Dynamics of community structure result in dynamics of
interactions, and interactions are the medium over which information flows. Thus, we need to
build low-level agent-based models that capture interaction dynamics during information flow.
These models will involve trust between agents, so we need a model of information flow that
accounts for both trust and community structure.
Military Scenarios: How far will an ideology spread in a social network? How valuable is a
piece of information, given the path it took in getting to you (through trusted and non-trusted
communities)? These are issues of interest to Military decision making.
Impact on Network Science: We need new models of agent behavior that consider community
structure, so that we may model how agents disseminate information through a social network. In
addition, by observing information flow through the networks, we can calibrate these models, as
well as infer more details about the underlying social network and trust structure – information
tends to flow along trusted paths, through trusted communities. Such a model would allow us to
study the interplay between trust, community structure and information flow.
Ultimately, we would also like to address the following longer term objectives:
As can be seen, there will be significant linkage to the Trust CCRI, which is to be expected
because trust underlies the formation of communities and so should be relevant both in
detecting community structure and in understanding information flow. Importantly, given the
communication structure and the model for information flow, with trust as an input, we could
"reverse engineer" the trust values consistent with the observed information dynamics -
behaviorally measured trust.
Initial Hypotheses
Agents can be modeled as simple automata for processing information and interacting during
the course of information flow through the social network. This model should take as inputs the
community and trust structure of the network, together with parameters governing the
information flow and agent dynamics. Within this model, we conjecture that different
community and trust structures lead to drastically different information-dynamics footprints. We
will build a scalable, realistic model for analyzing such information flow through communities
for large, dynamic networks.
Prior Work
The most relevant prior work is our own set of initial models for social network dynamics and
agent-based processing of information in networks operating under normal and high-stress
conditions [Hui08a, Hui08b, Magdon-Ismail05, Magdon-Ismail06]. The main innovation we
propose to develop here is that, unlike the SIR-type models studied in mathematical
epidemiology for infection spread, information flow is an active process, requiring an agent-
based model. We will build agent-based axioms for information flow, which derive from the
underlying community and trust structure of the social network.
Technical Approach
Our model will have two main building blocks. First, we will develop the small-scale agent-
based dynamics that ensue when an agent receives an important piece of information to react
upon; the desired reaction may be an action such as "retreat," or it may be "forward this
message." In either case, the agent must first decide whether to believe the information and, if
so, act; if not, whether to seek additional information or simply ignore it. Our model will
incorporate this complexity of agent-based micro-modeling to ensure a realistic model of
cognitive agents, grounded in the social and cognitive sciences. In particular, we formulate three
axioms:
Axiom 1. The value of information changes depending on the path it takes through trusted and
non-trusted communities.
Axiom 2. Agents combine information from different sources depending on the nature of the
information, and the nature of action they are asked to perform based on the information.
These three axioms are parameterized so that they may accommodate a variety of environments.
Some setting of the parameters should apply to any particular social network.
The other input to the model will be the community structure. An inhomogeneous community
structure will lead to inhomogeneous trust relationships. We would then be able to investigate
how community structure influences the information dynamics with trust as a major player.
Our task will then be to understand such agent-based models from both the theoretical and
simulation viewpoints: in particular, (i) what are the important source points and cut points;
(ii) how to immunize the network against the diffusion of bad ideologies while enhancing the
spread of important information; and (iii) through simulation, we will study the impact of
communities and heterogeneous trust relations at massive scale - million-node networks.
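A minimal sketch of such an agent-based cascade, implementing Axiom 1 with an assumed multiplicative trust-decay rule and a per-agent acting threshold (both illustrative assumptions, as are the toy trust values), is:

```python
import collections

def spread(adj_trust, source, value0=1.0, threshold=0.3):
    """Breadth-first cascade in which (per Axiom 1) the perceived value of
    a message decays with the trust of each link it crosses; an agent acts
    on and forwards the message only if the value it receives exceeds its
    threshold. adj_trust[u] is a dict {v: trust of link u->v in [0, 1]}.
    Returns the best value each reached agent received."""
    received = {source: value0}
    queue = collections.deque([source])
    while queue:
        u = queue.popleft()
        for v, trust in adj_trust.get(u, {}).items():
            val = received[u] * trust  # value attenuated by link trust
            if val >= threshold and val > received.get(v, 0.0):
                received[v] = val
                queue.append(v)
    return received

# toy network: a trusted path a->b->c->d and a low-trust shortcut a->d
adj = {
    "a": {"b": 0.9, "d": 0.2},
    "b": {"c": 0.9},
    "c": {"d": 0.9},
}
result = spread(adj, "a")
```

In the toy run, the low-trust shortcut a→d is blocked by the threshold, yet d is still reached via the trusted path with value 0.9^3: information tends to flow along trusted paths, as the model intends.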
Validation Approach
We have already collected data on the evacuation of San Diego during the wild fires of 2007.
Our data set contains information on the communication network, the social network (Hispanic
and non-Hispanic communities) as well as data on the evacuation dynamics resulting from a set
of reverse 911 phone calls. This is an ideal data set to study many aspects of the information
cascade: How do heterogeneous trust relations between multiple communities affect the
information cascade which ultimately results in a set of evacuated nodes? How does the
underlying communication network facilitate the information dissemination? How important are
social communities to the success of the information dissemination? We will model the San
Diego network on the scale of millions of nodes and simulate a variety of different models,
testing them against the observed evacuation dynamics. In addition, we will generate simulated
social networks obeying macroscopic properties (such as scaling laws) as discovered in our
EDIN tasks on network dynamics; we will use these networks to study in detail how different
community structures, trust structures, and network structures affect information flow; in
particular, we will examine how different information-seeding mechanisms perform in different
environments.
Research Products
By the end of the first year, we will develop preliminary large scale models of information flow
through social networks via trusted links and communities. Our models will take into account
8.6.8 Task S2.3: Community Formation and Dissolution in Social Networks (G.
Korniss, RPI (SCNARC); B. Szymanski, RPI (SCNARC); C. Lim, RPI
(SCNARC); M. Magdon-Ismail, RPI (SCNARC); A.-L. Barabasi, NEU
(SCNARC); T. Brown, CUNY (SCNARC); Z. Toroczkai, ND (SCNARC))
Task Overview
The fundamental research question that we are trying to address in this project is: What are the
efficient strategies and trade-offs for attacking and disintegrating adversarial communities with
hostile, extremist and/or militant ideologies? Our long-term objective is to develop generically
applicable frameworks and computational methods for extracting individual- to community-level
behavioral patterns from the underlying social networks. In particular, by implementing
stochastic agent-based models for opinion formation on empirical networks, we will develop
models with predictive power, applicable to networks of various scales. As the availability of
information affects social interactions, and likewise, the integrity of the communication
infrastructure impacts social interactions, our project will rely on models and methods developed
by the INARC, CNARC, and IRC.
Task Motivation
Our individual-based models for opinion formation will provide an array of strategies for
"what if" scenarios that will enable us to answer such questions as where and how to defend
network communities with neutral or tolerant "ideologies" against militant infiltration, and,
conversely, where and how to attack and disintegrate adversarial communities exhibiting
hostile, extremist, and/or militant ideologies.
For example, IW is a complex, ambiguous, and inherently social phenomenon: "insurgency and
counterinsurgency operations focusing on the control or influence of populations, not on the
control of an adversary's forces or territory" (IW Joint Operating Concept, 2007). Likewise, "an
operation that kills five insurgents is counterproductive if collateral damage leads to the
recruitment of fifty more insurgents" (COIN). To support operations in such an environment, in
this project we will develop individual-based models to investigate social influencing and
associated strategies in weighted social networks. Our methods and models for community
detection, community stability, and social influencing will be applicable to any data sets,
spanning vast scales, including those collected by the military.
Prior Work
Most traditional methods to find community structure utilize various forms of hierarchical
clustering, spectral bisection [Scott00, Newman06, Newman05, Wu04], clique optimization
[Palla06], or iterative edge removal [Newman04]. In contrast, in this project we will utilize an
array of individual-based models for opinion dynamics, where during the evolution of the
Initial Hypotheses
1. We believe that individual-based models for opinion dynamics can be effectively used to
detect and identify communities in social graphs (S2.3.1).
2. Based on the emerging community structure, influencing a small number of selected
individuals with critical positioning in the social network will be sufficient and will provide an
efficient way to ideologically dissolve or disintegrate adversarial or hostile communities (S2.3.2).
Technical Approach
We will develop individual-based models to investigate social influence and associated processes
in large-scale social networks. Tracing how individuals' opinions change over time will enable
us to identify communities in the underlying social network. Clusters of nodes sharing the same
opinion for some time reflect the inherent community structure through this network-dynamics
probe. Furthermore, by tracing communication patterns and investigating and understanding
information flow in the social network, we can identify nodes of high importance in the
corresponding social graph.
We will also perform a systematic comparative analysis of social engineering and influencing:
we will employ different strategies, such as indoctrinating an optimally chosen small set of
agents vs. removing agents (nodes) from the network, and analyze the social costs, time scales,
and trade-offs associated with reaching the desired state (such as breaking up hostile
communities). Our methods and models for community detection, community stability, and
social influencing will be applicable to any data sets, spanning vast scales, including those
collected by the military. As the availability of information affects social interactions, and
likewise the integrity of the communication infrastructure impacts social interactions, our
project will rely on models and methods of information and communication networks. In
particular, we will use methods developed to allow inference of appropriate link weights
(strength of the effective social interactions between individuals, with possible temporal
Subtask S2.3.1: Identifying communities in social graphs by employing models for opinion
formation
We will employ individual-based models for opinion formation, such as the Naming Game.
Different words carried by individuals represent different opinions, or ideological standings.
More specifically, in models for social dynamics, our hypothesis states that communities
manifest themselves in the context in which distinct stylized opinions (e.g., religions, cultures,
and languages) have evolved and emerged over time. Thus, if at the late stages of the social
dynamics on the networks several communities persist and co-exist (different opinions survive),
they will be authentic signatures of the community structure of the underlying graphs. We will
also implement a weighted-link version of the model, applicable to weighted social networks.
The research and analysis on weighted social networks will be an important part of this task, as
the availability of information affects social interactions, and likewise, the integrity of the
communication infrastructure impacts social interactions. Therefore we will also collaborate with
M. Faloutsos (UCR) and J. Han (UIUC) on methods allowing inference of appropriate link
weights (strength of the effective social interactions between individuals, with possible temporal
variation and uncertainties) from the information and communication layers, which in turn, in
our models will represent social influence. Also, clusters and communities are often blurred as
concepts, and IRC researchers led by M. Faloutsos will investigate their interplay starting from
scratch, i.e., from appropriately projected edge weights.
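A minimal Naming Game sketch illustrates this community-detection probe; the step budget, the uniform word choice, and the toy two-clique network below are illustrative assumptions:

```python
import random

def naming_game(adj, steps=20000, seed=0):
    """Minimal Naming Game (the opinion model named above, sketched).
    Each step: a random speaker utters a random word from its vocabulary
    to a random neighbor; on success both collapse to that word, on
    failure the hearer adds it. At late (pre-consensus) stages, clusters
    of nodes sharing a word trace the community structure."""
    rng = random.Random(seed)
    nodes = list(adj)
    vocab = {n: {n} for n in nodes}    # every node starts with its own word
    for _ in range(steps):
        s = rng.choice(nodes)
        h = rng.choice(sorted(adj[s]))
        word = rng.choice(sorted(vocab[s]))
        if word in vocab[h]:
            vocab[s] = {word}          # success: both converge on the word
            vocab[h] = {word}
        else:
            vocab[h].add(word)         # failure: hearer learns the word
    return vocab

# two four-node cliques joined by the single bridge edge 3-4
adj = {0: {1, 2, 3}, 1: {0, 2, 3}, 2: {0, 1, 3}, 3: {0, 1, 2, 4},
       4: {3, 5, 6, 7}, 5: {4, 6, 7}, 6: {4, 5, 7}, 7: {4, 5, 6}}
vocab = naming_game(adj)
```

A weighted-link version would simply bias the speaker's choice of hearer by link weight, which is how the inferred weights from the information and communication layers would enter the model.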
Validation Approach
We will validate our methods by implementing individual-based models (Naming Game) on
empirical social networks of various scales. These weighted empirical networks include high-
school friendship networks (on the order of 10^3 nodes) and a large-scale mobile
communication graph (on the order of 10^6 nodes).
Research Products
By the end of the first year, we anticipate that we will develop individual-based models and
theories for social dynamics applicable for community detection in various social networks. We
anticipate a paper on this effort in preparation by the end of this period.
Validation Approach
We will validate the effectiveness of our social engineering methods by implementing
individual-based models (Naming Game) on empirical social networks of various scales. These
weighted empirical networks include high-school friendship networks (on the order of 10^3
nodes) and a large-scale mobile communication graph (on the order of 10^6 nodes). We will use
the emerging communities exhibited by these empirical social graphs as test beds to develop
efficient strategies for ideologically dissolving or disintegrating communities.
Research Products
By the end of the first year, we anticipate developing models, theories, and methods for the
efficient ideological disintegration and dissolution of communities in social graphs. We
anticipate a report on this effort by the end of this period, which will serve as the basis for a
future paper.
Modeling the evolution dynamics of integrated networks (EDIN) will require first an
understanding of the community structure and its evolution; this means that we will need
methods for detecting the hidden groups, their structure, their information pathways, and their
evolution. These are primary research issues of interest to this project, hence our work will feed
into the EDIN CCRI; in addition, an understanding of how communities form and evolve will
help with the detection of those communities, so the work of the EDIN CCRI will certainly feed
into this project.
Also, information flow between two parties is a display of some measure of trust. Trust between
communities and between individuals will be fundamental to the information dynamics of the
social network. Thus, the ability to measure trust and refine it based on observed community
structure and information flow will be a major input-output relationship between this project and
the Trust CCRI of the CTA, at all levels from social trust to information and communication
trust. Further, the small-scale cognitive dynamics of the interacting agents play a big role in
information flow; thus, efficient agent models should be rooted in sound cognitive science
theory.
Summary of linkages:
Project S2 will provide two researchers (with Ph.D. degrees) to the NS center every third year
of the program.
Research Milestones
Organization       Government Funding ($)   Cost Share ($)
CUNY (SCNARC)            26,046
IBM (SCNARC)             47,115
ND (SCNARC)              11,000
NEU (SCNARC)             66,230
RPI (SCNARC)            503,704                  91,954
TOTAL                   654,095                  91,954
References
[Baumes05] J. Baumes, M. Goldberg, and M. Magdon-Ismail. "Efficient identification of
overlapping communities." IEEE International Conference on Intelligence and Security
Informatics, Springer, pp. 27-36, 2005.
[Baumes07a] J. Baumes, M. Goldberg, M. Hayvanovich, S. Kelley, M. Magdon-Ismail, K.
Mertsalov, and W. Wallace. "SIGHTS: A software system for finding coalitions and leaders in a
social network." IEEE International Conference on Intelligence and Security Informatics, 2007.
[Baumes07b] J. Baumes, M. Goldberg, M. Magdon-Ismail, and W. Wallace. "Identifying hidden
groups in communication networks." Handbooks in Information Systems -- National Security,
no. 2, pp. 209-242, 2007.
[Baumes08] J. Baumes, H.-C. Chen, M. Francisco, M. Goldberg, M. Magdon-Ismail, and W.
Wallace. "Visage: A virtual laboratory for simulation and analysis of social group evolution."
ACM Transactions on Autonomous and Adaptive Systems, 2008.
[Bethard04] S. Bethard, H. Yu, A. Thornton, V. Hatzivassiloglou, and D. Jurafsky. "Automatic
extraction of opinion propositions and their holders." The AAAI Spring Symposium, 2004.
[Blatt96] M. Blatt, S. Wiseman, and E. Domany. "Superparamagnetic clustering of data."
Physical Review Letters 76, 3251-3254, 1996.
[Cai05] D. Cai, Z. Shao, X. He, X. Yan, and J. Han. "Community mining from multi-relational
networks." Proc. 2005 European Conf. on Principles and Practice of Knowledge Discovery in
Databases (PKDD'05), pp. 445-452, 2005.
[Clauset04] A. Clauset, M. E. J. Newman, and C. Moore. "Finding community structure in very
large networks." Physical Review E 70, 066111, 2004.
[Goldberg06] A. B. Goldberg and X. Zhu. "Seeing stars when there aren't many stars: Graph-
based semi-supervised learning for sentiment categorization." HLT-NAACL 2006 Workshop on
Textgraphs: Graph-based Algorithms for Natural Language Processing, 2006.
Third, the intersection of the cognitive with the social defines a vast space. It would be easy for a
basic researcher to wander aimlessly in this space and end up wasting years pursuing issues that
shed a trivial light on the influence of human cognition in shaping social interactions. A better
strategy is to focus on Pasteur's Quadrant (Stokes97) by conducting a program of fundamental
scientific studies that focus on solving a problem that someone (society, the U. S. Army) actually
cares about solving. Hence, net-centric interaction grounds our cognitive social science proposal
by focusing our efforts on an applied technology of growing national importance.
There is much empiricism, that is, trial and error, in attempts to optimize the design of human-information or human-technology interaction. For example, Google reports a study (http://googleblog.blogspot.com/2009/02/eye-tracking-studies-more-than-meets.html) describing how it arrived at a webpage design that allowed users to shave seconds off their searches and thereby facilitated successful search. As Stokes discusses in the context of Pasteur's 19th-century research, “one of the most valuable properties of applied research is 'reducing the degree of empiricism in a practical art'” (Stokes97, p. 8). Any one design can be tweaked by trial and error (as in the Google example) so as to prevent users from making premature selections or premature rejections; i.e., to avoid the heuristics and biases to which human cognition falls prey. However, a more robust approach is to understand the underlying cognitive processes and to be able to guide the design of technology or information to avoid
suboptimal performance (Fu04). For example, the human information acquisition system is
Hypotheses. This research question does not lend itself to a fixed set of hypotheses so much as to a challenge to the current state of the art in computational cognitive modeling. Most
applications of computational cognitive modeling are limited to small, laboratory phenomena.
Few attempts take on the complexity of even the computer-mediated world [Gray03]. Recent
years have seen an increase in the attempts to apply models developed for basic research to
cognitive engineering applications [Gray08]. The sweet spot to which question 3 is aimed is the
creation of fully embodied models (i.e., those with the same perceptual-motor and cognitive
constraints as humans) that can predict performance success and failures due to characteristics of
the technology (human-technology interaction), information form and format (human-
information interaction), and interpersonal exchanges (human-human interactions). Over the past
decade, progress has been made on the first two components (for an overview see, Gray07). The
challenge will be in extending the computational cognitive modeling approach (a) to Net-Centric
operations and (b) to encompass the interaction of cognitive with social processes. This last
challenge lies at the heart of the emerging sub-discipline of cognitive social science (Sun07;
Turner01).
8.7.4 Summary
The Cognitive Social Science of Net-Centric Interactions will bring the computational modeling
techniques of cognitive science together with the tools and techniques of cognitive neuroscience
to ask how the design of the technology (human-technology interaction), the form and format of
information (human-information interaction), or features of communication (human-human
interaction) shape the success of net-centric interactions.
8.7.6 Task S3.1: The Cognitive Social Science of Human-Human Interactions (W.
Gray, RPI (SCNARC); M. Schoelles, RPI (SCNARC); J. Mangels, CUNY
(SCNARC))
Overview: How does the social psychology construct of trust vary in human-human versus
human-agent interactions? Specifically, what cognitive mechanisms are affected by trust
and how do human evaluations of trust influence our subsequent cognitive processing of
information?
A year 1 focus of SCNARC S3 will be to examine the effect of trust on cognitive processing and
variations in human trust over human-human versus human-agent interactions. Specifically, we
hypothesize that differences in trust are signaled by differences in cognitive brain mechanisms
and that these differences can be detected by event-related brain potential (ERP) measures and
related to established cognitive science constructs, which in turn can be incorporated as changes
to the ACT-R (Anderson07) cognitive architecture.
Initial Hypotheses
Very recent work from one laboratory (Rudoy09) has suggested that degree of trustworthiness is signaled by three event-related brain potential responses, “an early frontal correlate of consensus trustworthiness, a later correlate of
Prior Work
The cognitive science study of human trust is a wide-open area with little use of EEG data and
no incorporation of EEG findings into computational cognitive models. The work of Rudoy and
Paller (Rudoy09) is the only example that we know in which EEG correlates of trust have been
sought. This pioneering effort has not been replicated and the paradigm used was extremely
artificial and limited in scope. Of key interest to our work is their manipulation of time pressure
from which they concluded that as time pressure to perform increased, people's assessment of trust relied less on their past experience and more on non-predictive, perceptual factors.
From related work (not on trust), we know that the posterior medial frontal cortex (pMFC
including anterior cingulate and pre-SMA, see Figure S3-1) plays a role in the detection of errors
in one's own performance as well as in others' performance (Bekkering09). Such error detection is signaled in EEG data by the error-related negativity (ERN) response. Hence, to the extent that the expectation that other people will make errors is an index of one type of trustworthiness, the ERN event-related potential (ERP) will be explored as an index of trust-reliability in our initial study.
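To make the measurement concrete, the sketch below averages time-locked epochs into an ERP and computes an error-minus-correct difference wave, the quantity in which an ERN-like negativity would appear. All numbers and names are hypothetical; this is no substitute for a real EEG pipeline (filtering, re-referencing, artifact rejection).

```python
def erp_average(epochs):
    """Average time-locked EEG epochs (trials x samples) into an
    event-related potential (ERP). Toy illustration only: a real
    pipeline would also filter, re-reference, and reject artifacts."""
    n_trials = len(epochs)
    return [sum(trial[t] for trial in epochs) / n_trials
            for t in range(len(epochs[0]))]

# hypothetical microvolt values; error trials carry an ERN-like negative dip
correct_trials = [[0.0, 0.1, 0.0, -0.1], [0.0, 0.1, 0.2, -0.1]]
error_trials   = [[0.0, -0.9, -1.1, 0.0], [0.0, -1.1, -0.9, 0.0]]

# error-minus-correct difference wave: the ERN appears as a negativity
difference_wave = [e - c for e, c in zip(erp_average(error_trials),
                                         erp_average(correct_trials))]
print(difference_wave)
```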
Technical Approach
Paradigms: As cautious scientists we plan two empirical studies that will start us on the
exploration of the cognitive social science of trust. The first will attempt to replicate as well as
vary some of the conditions used by Rudoy and Paller (Rudoy09). A tactical advantage of this
replication is that it will allow our work to get started in Q2 while we are building the more
complex Argus-Army simulation discussed in Task S3.2.
Figure S3-2: Graduate student demonstrating stylish EEG cap with 32 electrodes.

For purposes of our initial study, the UAV operator's role will be played by a human team member or by an interactive cognitive agent. In both types of teams, team members will communicate with each other via menu selections and typing simple commands (i.e., no voice or complex linguistic data).
Data Collection. For both paradigms, one or more human members of each team will be
instrumented with 32 electrode EEG caps (see Figure S3-2). All system states, all human mouse
Hyperscanning. Babiloni and colleagues [Babiloni06] have recently introduced the notion of
hyperscanning: the simultaneous collection of EEG data from two or more human subjects in real time as they engage in a group task. The technical challenges of this technique are considerable, but its promise for the study of cognitive social science is high, and it will be pursued as part of our research effort.
Validation Approach
As outlined above, our task 1 enables three types of validation. First, our partial replication of
Rudoy and Paller‘s study will provide an important replication of their finding that three ERP
components signal trustworthiness. Second, our variations on their study will be the first step in
generalizing their results to different paradigms. Third, our use of their measures in Argus-Army
will be a strong test of the validity of these factors to measure trust.
Research Products
The main product of this effort will be to establish a reliable and valid neuroscience measure of
trust that can be used by us and others in more applied work with future Net-Centric systems. A
second product will be the establishment of time parameters for the emergence of a “trust” appraisal that can be used in computational cognitive models to predict trust in complex technology-mediated human-human interactions. Of course, as academic researchers we see much
potential in this work for research reports and papers that contribute to the basic research
foundation of cognitive social science and to the 6.1 goals of this project.
Overview
We will repurpose an existing team simulation environment for research in cognitive social
science constructs of Net-Centric interactions.
Motivation
Although we expect to work with Net-Centric software being developed by other groups, there is
a need for a research vehicle that can be rapidly reconfigured to emphasize or isolate important
social science constructs.
Initial Hypotheses
Argus-Army is a vehicle for research and not a research topic per se. To the extent that it is appropriate to say that a simulated task environment entails a hypothesis, that hypothesis is that a complex but manageable research environment is the best tool for advancing basic research in an applied domain [Gray02].
Technical Approach
Initial activities for Argus-Army include the following:
The transformation of Argus into Argus-Army, the integration of Argus-Army with software that
can collect and synchronize the time stamps of EEG data at remote locations (to millisecond
accuracy), and the construction of cognitively plausible interactive cognitive agents will not be
trivial. Efforts on this line can start immediately but are expected to continue to be refined over
the next several years. The good news is that intermediate products should be useful in the 1st year (e.g., Argus-Army usable at one location to collect baseline human data from 3 human
participants). In designing Argus-Army we will work closely with Dr. Daniel Cassenti (ARI
APG) and colleagues to ensure that Army-relevant scenarios are built in.
We need to devote significant first-year effort to developing software and software standards to:
A. Standards for Human Data Collection: Facilitate the collection and sharing of human
performance, process, and neuroscience data from human interactions with Net-Centric
software.
B. Design Specifications for Developers of Net-Centric Software: Support the direct
interaction of software interactive cognitive agents with the same Net-Centric software
used by human users.
C. Hyperscanning: The simultaneous collection of EEG data from two or more humans at single or distributed locations. Hyperscanning for EEG is a new technique primarily developed and used by an Italian group [Babiloni06]. In-house development is required to ensure millisecond-level synchronization of all 32 EEG channels from two systems plus the task software.
D. Simultaneous collection and synchronization of data collected in remote locations:
Enable the simultaneous collection and synchronization of EEG and other performance
data from multiple people at different physical locations (e.g., Mangels' laboratory at CUNY and the Gray-Schoelles laboratory at RPI).
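A minimal sketch of the synchronization problem in items C and D, under the assumption of an NTP-style request/reply exchange between the two recording sites. All timestamps, function names, and the merging scheme are illustrative, not the planned implementation.

```python
def estimate_offset_ms(t_send, t_remote_recv, t_remote_send, t_back):
    """NTP-style clock-offset estimate (remote minus local, in ms)
    from one request/reply exchange between two recording sites."""
    return ((t_remote_recv - t_send) + (t_remote_send - t_back)) / 2.0

def merge_event_logs(local_events, remote_events, offset_ms):
    """Map remote timestamps onto the local clock and merge both
    streams into one time-ordered log."""
    merged = [(t, "local", tag) for t, tag in local_events]
    merged += [(t - offset_ms, "remote", tag) for t, tag in remote_events]
    return sorted(merged)

# hypothetical exchange: remote clock runs 40 ms ahead of the local clock
offset = estimate_offset_ms(1000.0, 1050.0, 1055.0, 1025.0)   # -> 40.0
log = merge_event_logs([(1005.0, "stim"), (1200.0, "resp")],
                       [(1150.0, "stim")], offset)
print(offset, log)
```

In practice the offset would be re-estimated continuously and the network delay asymmetry bounded, which is exactly why millisecond-level synchronization requires in-house development.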
Validation Approach
Three types of validation are sought for Argus-Army. The first type would occur if the research
that uses Argus-Army advances our understanding of the cognitive factors underlying social
constructs in complex Net-Centric operations. A second type of validation would come from the adoption of Argus-Army by other groups associated with the ARC, such as the Army Research Laboratory. The third type is the validation of Argus-Army as a complex software system for accurately and reliably collecting human data at the temporal resolution of milliseconds. This validation includes the completeness of the log files, the accuracy of the timestamps, and the ability of interactive cognitive agents to interact with Net-Centric software.
Research Products
Argus-Army is its own research product. It is intended to be a flexible vehicle that fully
integrates the collection of behavioral and neuroscience data with performance data. Likewise, it
is intended to support the development of complex computational cognitive models that can
interact directly with the Argus-Army software as a 3rd human player might. Such a capability
will allow us to design interactive cognitive agents that, for example, vary in their trustworthiness so as to facilitate the understanding and investigation of the hypotheses discussed in Task S3.1.
Research Milestones
Budget By Organization
Organization       Government Funding ($)   Cost Share ($)
CUNY (SCNARC)      65,115
RPI (SCNARC)       221,138                  40,370
TOTAL              286,253                  40,370
References
[Anderson07] Anderson, J. R. (2007). How can the human mind occur in the physical universe?
New York: Oxford University Press.
[Babiloni06] Babiloni, F., Cincotti, F., Mattia, D., Mattiocco, M., De Vico Fallani, F., Tocci, A.,
et al. (2006). Hypermethods for EEG hyperscanning. Paper presented at the Engineering in
Medicine and Biology Society, 2006. EMBS '06. 28th Annual International Conference of the
IEEE.
[Bekkering09] Bekkering, H., Bruijn, E. R. A. d., Cuijpers, R. H., Newman-Norlund, R., Schie,
H. T. v., & Meulenbroek, R. (2009). Joint Action: Neurocognitive mechanisms supporting
human interaction. Topics in Cognitive Science, 1(2), 340-352.
[Carrión10] Carrión, R., Keenan, J. P., & Sebanz, N. (2010). A truth that's told with bad intent:
An ERP study of deception. Cognition, 114(1), 105-110.
[Fu04] Fu, W.-T., & Gray, W. D. (2004). Resolving the paradox of the active user: Stable
suboptimal performance in interactive tasks. Cognitive Science, 28(6), 901-935.
[Gray08] Gray, W. D. (2008). Cognitive architectures: Choreographing the dance of mental
operations with the task environments. Human Factors, 50(3), 497-505.
[Gray07] Gray, W. D. (Ed.). (2007). Integrated models of cognitive systems. New York: Oxford
University Press.
[Gray04] Gray, W. D., & Fu, W.-T. (2004). Soft constraints in interactive behavior: The case of
ignoring perfect knowledge in-the-world for imperfect knowledge in-the-head. Cognitive
Science, 28(3), 359-382.
[Gray06] Gray, W. D., Sims, C. R., Fu, W.-T., & Schoelles, M. J. (2006). The soft constraints
hypothesis: A rational analysis approach to resource allocation for interactive behavior.
Psychological Review, 113(3), 461-482.
Table of Contents
9 Non-CCRI Research: Communication Networks Academic Research Center (CNARC)
9.1 Overview
9.2 Motivation
9.2.1 Challenges of Network-Centric Operations
9.2.2 Example Military Scenarios
9.2.3 Impact on Network Science
9.3 Key Research Questions
9.4 Technical Approach
9.5 Project C1: Modeling Data Delivery in Dynamic, Heterogeneous, Mobile Networks
9.5.1 Project Overview
9.5.2 Project Motivation
9.5.3 Key Research Questions
9.5.4 Initial Hypothesis
9.5.5 Technical Approach
9.5.6 Task C1.1: Modeling Operational Information Content Capacity (OICC) and Factors that Impact OICC (G. Kramer and K. Psounis, USC (CNARC), R. Ramanathan, BBN (CNARC), A. Yener, Penn State (CNARC))
9.5.7 Task C1.2: Characterizing and Controlling QoI (R. Govindan and M. Neely, USC (CNARC); S. Krishnamurthy, UC Riverside (CNARC); Q. Zhao, UC Davis (CNARC); A. Bar-Noy, CUNY (CNARC); T.F. La Porta, Penn State (CNARC); M. Srivatsa, IBM (INARC); T. Abdelzaher, UIUC (INARC))
9.5.8 Task C1.3: Characterizing Connectivity and Information Capacity for Dynamic Networks (Q. Zhao, UC Davis (CNARC), N. Young, UC Riverside (CNARC), A. Yener, Penn State (CNARC), P. Brass, CUNY (CNARC), A. Swami, ARL)
9.1 Overview
Our goal is to understand and characterize the capabilities of complex communications networks,
such as those used for network-centric warfare and operations, so that their behavior may be
accurately predicted and they may be configured for optimal information sharing and gathering.
The objective of such a network is to deliver information of the highest quality, on the basis of which correct decisions can be made and comprehensive situational awareness achieved. This will provide increased mission tempo and overall supremacy in managing resources engaged in a
mission. Network science must embody the vision of a network as an information source.
Therefore, in the CNARC we aim to characterize and optimize network behavior in a way that
maximizes the useful information delivered to its users. To this end we define a new currency by
which we evaluate a network: its operational information content capacity (OICC).
We work with the INARC to understand the characteristics of information and the underlying information networks. These will be leveraged to determine the relative importance of information and how information must be treated as it is transported across a network. We will
work with the SCNARC to understand underlying social networks and how information is used
in making decisions.
9.2 Motivation
Our ultimate target is to understand and control network behavior so that the operational
information content capacity of a network may be increased by an order of magnitude over what
is possible today.
This project establishes the limits of capacity of communication networks in terms of their
operational information content capacity. We explicitly consider tactical network characteristics
of size, dynamics, mobility and heterogeneity. The project ultimately accounts for interactions
with information and social networks. These underlying networks must be leveraged to a) fully
understand what information is important and how it is being used, and b) allocate
communication network resources intelligently to optimize the OICC.
We will achieve our high level objective by first developing comprehensive models capturing the
behavior of OICC and QoI and the properties of tactical networks that impact them (e.g.,
dynamics). From these models we expect to learn which factors have the largest impact on OICC
and will be able to model network paradigms that mitigate and control these factors so that we
can improve the achievable OICC. With this understanding we will analyze protocol structures
to determine if they prevent networks from reaching their optimal QoI, and if so, explore new
protocol structures that alleviate these bottlenecks.
C1: Modeling Data Delivery in Dynamic, Heterogeneous, Mobile Networks – This project
focuses on the behavior of OICC and QoI and the factors that impact them. In the first year, the
largest effort will be on C1. This is so that we may develop a fundamental definition of OICC
and QoI that is accepted across the CTA program; in fact this work will be done collaboratively
with the other centers. Within C1 we will also define the first models of OICC and QoI in terms
of network parameters, properties and constraints. INARC is participating in this project, and ARL is collaborating.
C3: Achieving QoI Optimal Networking – This project focuses on the structure of protocols that
may limit the QoI achieved in practical networks. Project C3 will be deferred until the second
year of the program. We will select protocols to analyze once we have an understanding of the
factors that impact OICC and QoI and the networking paradigms that hold the most promise.
Traditionally, capacity refers to the number of bits per unit resource that is reliably
communicated between source(s) and destination(s). It is a limit that, in theory, is achievable and
one that is provably the upper bound beyond which reliable communication is not possible. Since
Shannon [1] characterized the channel capacity of a single transmitter-to-single receiver link
sixty years ago, there has been an intense effort in realizing or at least coming close to this limit
in real systems. Information theory, in the past few decades, has evolved to include many new communication paradigms within Shannon's mathematical framework: for example,
multiple transmitters or receivers for which capacity is a region that consists of the collection of
all rate tuples at which reliable communication is possible and beyond which it is not, or a model
where a helper node relays information between the transmitter and the receiver. Though the
models considered appear deceivingly simple – a transmitter and two receivers [2] [3]; two interfering transmitters with two corresponding receivers [4]; a three-node network including
a relay [5] -- finding the exact channel capacity in the sense of Shannon has so far eluded the
information theory community.
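For the single point-to-point link that Shannon did solve, capacity has a simple closed form, C = B log2(1 + SNR) for an AWGN channel. The sketch below evaluates it for an illustrative channel; by contrast, no such formula exists for the network-level OICC sought here.

```python
import math

def awgn_capacity_bps(bandwidth_hz, snr_linear):
    """Shannon capacity of a single point-to-point AWGN link:
    C = B * log2(1 + SNR), in bits per second."""
    return bandwidth_hz * math.log2(1.0 + snr_linear)

# e.g., a 1 MHz channel at 20 dB SNR (linear SNR = 100)
print(awgn_capacity_bps(1e6, 100.0))   # roughly 6.66e6 bit/s
```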
At the other extreme, another line of work, starting with the seminal work by Gupta and Kumar [6], considered networks consisting of an asymptotically large number of nodes, and provided an
initial modeling tool for the discussion of the fundamental limits on the capacity of large wireless
networks. They concluded that the capacity of wireless networks decreases as the number of
nodes increases, and since the publication of their results almost 10 years ago, many researchers,
including ourselves, have addressed ways to alleviate the impact of multiple access interference
(MAI) on the throughput capacity of wireless networks. However, many limitations must be
revisited in order to fully understand the true limits of a tactical network consisting of social
networks, information networks and an underlying communication and storage network in a
dynamic environment consisting of heterogeneous nodes.
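The Gupta-Kumar result can be stated as a per-node throughput that shrinks as Θ(W/√(n log n)) in random networks. The toy computation below shows this trend; the constant and the link rate are illustrative only, and only the scaling is meaningful.

```python
import math

def per_node_throughput(n, link_rate=1.0):
    """Gupta-Kumar scaling for random wireless networks: per-node
    throughput shrinks as Theta(W / sqrt(n log n)). The constant is
    illustrative; only the trend is meaningful."""
    return link_rate / math.sqrt(n * math.log(n))

for n in (10, 100, 1000, 10000):
    print(n, per_node_throughput(n))
```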
We posit that the fundamental limits of tactical networks are inherently different than the limits
of broadcast links, and that we should consider all the inherent factors that constitute a tactical
network, and which will be reflected in QoI. We believe that an essential goal of the proposed
center is to develop analytical tools that further our understanding of realistic tactical networks.
We certainly acknowledge that the proposed effort constitutes uncharted territory for
information, communication, and network theorists, as well as network protocol experts. We
submit that recognizing the above four key aspects of fundamental limits of a tactical network
and the adoption of the proper central theme to glue our efforts together are critical. As a result, the definition of a new currency of information is key to obtaining results for a new theory of networks that are as fundamental as Shannon capacity is for a single link. We call this fundamental metric the operational information content capacity of the network.
1. Modeling Operational Information Content Capacity and Factors that Impact OICC – In this
task, we will develop models to determine the limits of OICC in the face of realistic
constraints. We will consider scaling to the size of large tactical networks and define
network parameters that impact network performance. We will work with INARC to
determine the importance of bits of information. We will also interact with Tasks C1.2-4
towards modeling the OICC for dynamic heterogeneous networks and its impact on QoI and
vice versa.
2. Characterizing and Controlling QoI - In this task, our goal is to systematically understand the
issues pertaining to the relationship between the network and the quality of information (QoI)
ultimately delivered to the end user. QoI can be impacted by the data delivery characteristics of the network (e.g., loss, delay, and jitter) and by the security services offered in the network. This is collaborative work with INARC.
3. Characterizing Connectivity and Information Capacity for Dynamic Networks – In this task
we explicitly model heterogeneity and dynamics and their effect on QoI. We consider both
temporal and spatial dynamics. We will work with the EDIN CCRI task on mobility
modeling to capture mobility dynamics. We will also work with Task 4 towards
understanding the impact of security properties on connectivity and capacity.
4. Modeling the impact of data provenance and confidentiality properties on QoI - QoI is
impacted by security characteristics of the information source(s), the network that transports
it, and the hosts that process it. Our aim is to characterize the impact of security on the
quality of information but also to mitigate its impact.
[1] C. E. Shannon, “A mathematical theory of communication,” Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[2] T. M. Cover, “Broadcast channels,” IEEE Transactions on Information Theory, vol. 18, pp. 2–14, January 1972.
[3] R. G. Gallager, “Capacity and coding for degraded broadcast channels,” Problemy Peredachi
Informatsii, vol. 10, no. 3, pp. 3–14, 1974.
[4] T. Han and K. Kobayashi, “A new achievable rate region for the interference channel,” IEEE
Transactions on Information Theory, vol. 27, pp. 49–60, January 1981.
[5] T. M. Cover and A. A. E. Gamal, “Capacity theorems for the relay channel,” IEEE
Transactions on Information Theory, vol. 25, pp. 572–584, September 1979.
[6] P. Gupta and P. R. Kumar, “The capacity of wireless networks,” IEEE Transactions on
Information Theory, vol. 46, pp. 388–404,March 2000.
9.5.6 Task C1.1: Modeling Operational Information Content Capacity (OICC) and
Factors that Impact OICC (G. Kramer and K. Psounis, USC (CNARC), R.
Ramanathan, BBN (CNARC), A. Yener, Penn State (CNARC))
Task Overview
The operational information content capacity is a new paradigm where we seek to understand
performance limits of communication networks that take into account (i) realistic limitations on
node and network capability and complexity; and (ii) the quality of information (QoI) that is
communicated between the various entities. We will seek the limits of a communication network
under realistic constraints with performance metrics that reflect the impact of information
content and QoI. As alluded to in the overall summary of the project, this is a performance metric that has not been considered or defined to date.
Task Motivation
Only by understanding the limits of a communication network in terms of its OICC can we start
to make progress on controlling and optimizing the performance of a network.
Initial Hypothesis
We expect that we will find that certain constraints, such as the number of nodes in a network,
the varying capabilities of the nodes, and the communication environment (and the resulting
topology) will have varying impact on QoI, and will use these results to focus the work in
subsequent tasks. We expect that these properties will jointly affect the QoI and OICC in a way
Given the grand challenge this task presents, we plan to tackle it from a number of angles as
outlined below.
Realistic Scaling
In parallel with the above realistic modeling effort, we will also pursue the understanding of
performance and scalability of finite-sized networks. Over the years, a number of theoretical
results have appeared in the academic literature regarding fundamental limits. However, the
implication of these asymptotic results on finite, medium-sized networks is not clear. For
example, the scope of military wireless networks is likely to be around the brigade level (a few thousand nodes).
1. A network definition N that captures the size, mobility, density, degree, diameter,
connectivity and other properties. The idea is to capture in simple terms the topological
properties that affect the capacity and protocols underlying the network. For instance, a
line network (military convoy), a clique (parking lot) and a random deployment are all
quite different from each other in the constraints that they present. We shall draw upon
our work in density-adaptation [4] and topology characteristics [5] in defining this.
2. A node parameter definition P that captures the data rate, number of transceivers,
directionality [6], energy constraints, processing constraints, storage constraints and other
such properties. The idea is to capture in simple terms the limitations on information-carrying capability that directly affect performance.
3. A protocol definition R that captures the overhead induced by MAC, network and
transport layer protocols, including security protocols. These may be specific to
networks, but a first-order model can have choices like TDMA/CSMA-CA, link-state/on-
demand, TCP/UDP respectively. We shall draw upon our prior work in link state routing
[7], mobility-assisted on demand routing [8][9], and TDMA versus CSMA-CA MAC
[10] in informing this vector.
4. An information profile load definition L that captures the quality, nature, scope and other
properties of the information that is transported across the network for user consumption.
Given a tuple of vectors (N, P, R, L), each of which has a number of to-be-defined variables, we shall derive expressions that characterize the performance. The expressions can be used to
estimate any one value given other values, or a k-dimensional envelope for a set of k parameters
given an instantiation of the other.
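As a strictly hypothetical illustration of how an (N, P, R, L) tuple might be carried through such expressions, the sketch below wires placeholder fields into a made-up first-order estimate. None of the field names, constants, or the formula are derived results; they only show the shape of the parameterization.

```python
import math
from dataclasses import dataclass

@dataclass
class NetworkDef:            # N: topological properties
    nodes: int
    density: float           # e.g., mean degree / (nodes - 1)

@dataclass
class NodeParams:            # P: per-node capability
    data_rate_bps: float

@dataclass
class ProtocolDef:           # R: protocol overhead
    overhead_frac: float     # fraction of capacity spent on control traffic

@dataclass
class LoadDef:               # L: information profile
    mean_msg_bits: float

def first_order_oicc(n: NetworkDef, p: NodeParams,
                     r: ProtocolDef, l: LoadDef) -> float:
    """Made-up first-order estimate (messages/s): usable rate after
    protocol overhead, degraded by a Gupta-Kumar-style 1/sqrt(n)
    spatial-reuse factor. Not a derived result; density is unused
    in this toy formula."""
    usable = p.data_rate_bps * (1.0 - r.overhead_frac)
    reuse = 1.0 / math.sqrt(max(n.nodes, 2))
    return n.nodes * usable * reuse / l.mean_msg_bits

est = first_order_oicc(NetworkDef(nodes=100, density=0.1),
                       NodeParams(data_rate_bps=1e6),
                       ProtocolDef(overhead_frac=0.2),
                       LoadDef(mean_msg_bits=1e4))
print(est)
```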
We note that the parameter vectors may be inter-related within and outside of the
communications network – for example, the protocol overhead (R) depends upon the topology
(N), which in turn may depend upon the underlying social and information pathways. Similarly,
the load definition (L) depends upon the social network (military hierarchy) and the information
network underlying it. In investigating these dependencies, we expect to collaborate heavily with
the INARC, SCNARC and IRC.
We shall inform and refine our model with real military networks being developed at BBN
Technologies (a part of Raytheon Corp.), as they become available (subject to sensitivity
constraints). In particular, we plan on using the DARPA WNaN network as a basis for “sounding
out” the accuracy of our model, and progressively refining it.
Figure 1: Example framework for an analytical model of operational performance based on an evolving
interconnection of specific models. Orthogonal refinements can be made to each model to progressively
increase accuracy. Note: this figure is for illustration purposes only; not all components or relationships are shown.
Other centers, such as SCNARC and INARC, will be tapped, for instance, in the characterization
of load. Similarly, computing multicast branching factors and tree shapes (which determine
loading), or routing overhead computation (which depends on network structure), could leverage
sophisticated graph theoretic results from the EDIN part of the project. Indeed, our effort can be
a source of specific problems and tasks for information theorists, graph theorists, experimenters
etc. – problems that stem from real-world needs.
Validation Approach
We will validate our framework by testing the sensitivity of the OICC against several of the
parameters described above. We will compare these results against the traditional capacity
measures of networks to quantify the difference in sensitivity.
Research Products
In the first year we will
1. Develop a preliminary OICC framework including impact of realistic constraints
identified to be crucial.
2. Develop the framework of model hierarchies and inter-relationships both within and
outside of CNARC.
References
[2] Y. Tian and A. Yener, “The Gaussian Interference Relay Channel with a Potent Relay,”
IEEE Global Telecommunications Conference, Globecom'09, Honolulu, Hawaii,
December 2009.
[2] P. Gupta and P.R. Kumar, “The Capacity of Wireless Networks,” IEEE Transactions
on Information Theory, 2000.
[4] R. Ramanathan, "Making Ad Hoc Networks Density Adaptive", Proc. IEEE MILCOM
2001, Tysons Corner, Virginia, October 2001.
[5] E.L. Lloyd, R. Liu, M.V. Marathe, R. Ramanathan, S.S. Ravi, "Algorithmic Aspects of
Topology Control Problems for Ad Hoc Networks," Proc. ACM MOBIHOC 2002,
Lausanne, Switzerland, June 2002
[10] A. Jindal and K. Psounis, “Characterizing the Achievable Rate Region of Wireless
Multi-hop Networks with 802.11 Scheduling,” IEEE/ACM Transactions on Networking,
Vol. 17, Iss. 4, pp. 1118-1131, August 2009.
Task Overview
In this task our goal is to systematically understand the issues pertaining to the relationship
between the network and the quality of information (QoI) ultimately delivered to the end user.
QoI is a composite, multi-dimensional metric that captures the trade-offs among several factors to
characterize the information ultimately delivered to the application. It allows us to model the
network as an information source. QoI applies to mission-critical information that is desired by
the end-user. The quality of this content will be specified in terms of a few user-defined
high-level requirements. We seek to determine the extent to which the network can fulfill these
high-level requirements.
This task differs from the effort underway in the ITA program in the following major ways: (i)
the ITA program examines QoI in sensor networks, whereas here we take a much broader view of
information; (ii) here we model many network properties, such as security, when modeling QoI;
(iii) here we consider underlying information and social networks, and their relationship with
information and decision making, when defining QoI.
Task Motivation
QoI can be understood as a utility defined in terms of the information obtained from the network.
Information is derived from data. The type and quality of the information derived from the data
is application specific. It can be impacted by the network due to the data delivery characteristics
of the network (e.g., loss, delay, and jitter) and the security services offered in the network.
To the best of our knowledge, there are no efforts to date towards computing a metric that is
similar to QoI outside of preliminary research conducted in the ITA program. There have been
several studies on providing Quality of Service (QoS) support but we stress that the two are not
the same.
The benefit of using QoI as the metric of interest is that it allows us to model and characterize
networks in terms of the information they can transfer, not simply the data. Often information used
to drive a decision may come from multiple sources, in multiple formats, and with varying
properties. By focusing on information, we are able to fully leverage social and information
networks to the benefit of the communications network.
Technical Approach
Our approach is to define a multi-dimensional function representing QoI that can be used across
the CTA, and then investigate approaches by which such a function can be optimized.
Since this is a non-classical approach, we first attempt to clearly define QoI, to explore the
impact of network structure, dynamics, and other factors on the QoI delivered to the user, and to
explore the mathematical foundations behind QoI metrics. This exploration will reveal
architectural trade-offs and mechanism choices, and help us, in subsequent years, to define an
architecture, mechanisms and protocols for QoI. The task is organized into several sub-tasks:
1. Definition of the QoI Space - We define QoI as a function with respect to network
parameters. We start with tractable subsets of network properties which will be extended
and jointly developed with the INARC and SCNARC.
2. Evaluating architectural choices for QoI – Given the definition of QoI, we evaluate
architectural tradeoffs in accurately estimating QoI in different classes of networks. This
step will give us a fundamental understanding of the space of designs for QoI
mechanisms.
3. Optimizing QoI metrics – Finally, we will explore the theory of optimizing complex
non-convex QoI metrics in a stochastic sense. This sub-task will help define operational
bounds on the achievable QoI in a network under various conditions.
Prior Work
There has been significant work within the DoD community on QoI and network-centric warfare
[8][9][10]. In much of this work, QoI is central to determining the extent to which information is
“shareable.” These efforts define several different types of attributes by which QoI may be judged,
including attributes that are objective, related to fitness-for-use, and so on. Many prior works propose various
metrics which are composed into QoI, such as accuracy, currency, clarity, along with many
others [11][12][13][14][15][20]. In particular, [13] classifies these metrics into “situation
independent” and “situation dependent.”
In the ITA program, a great deal of effort has been spent on determining the difference between
quality and value of information, and how to codify QoI [16]. They propose that the value of
information depends on how useful the information is to
taking an action, while QoI is related to fitness of information. The research emphasizes how
metadata representing QoI metrics is important to convey to information recipients how data
may have been processed, for example, fused, within the network.
In [21] the authors propose “quality views.” Quality annotations, a type of meta-data, are
provided with information to give an indication of its quality. Different types of annotations are
called quality evidence. Operators, called quality assertions, may be applied to the annotations
to determine a rank or rating for the information. These quality assertions are domain specific.
QoI as experienced by the user is impacted by several factors. A simple way to conceive QoI is:
QoI = f(I, D, S)
where I, D, and S characterize the source information, the network data delivery performance
and the level of associated security, respectively. The QoI is application specific; in other words,
for a given application the QoI requirements may differ. Formulating a generic utility to
characterize the QoI will be a challenge given that one expects this utility to depend on the
applications, the context (the topology of the network, the terrain), the user requirements based
on the tactical mission at hand (under attack versus peace missions), the heterogeneity of the
nodes that compose the network (in terms of memory, CPU etc.), and human input.
QoI may be viewed as a type of utility function that captures how useful information is to an
application. The QoI achieved will vary over time depending on other applications using the
network, the state of the network, etc. The type of utility function will vary depending on the
application. Some applications will find information useful in increments; others may require
some aspect of information in its entirety for it to be useful. Given the short term and long term
uncertainties associated with many of the above factors, the QoI metric can be expected to be
stochastic in nature. In other words, one might expect a QoI metric to be expressed as a
probability that certain criteria are fulfilled or a function of this probability (such as a moment).
To provide an example, the QoI for a node that requires authenticated images from a specific
location within a certain delay constraint might be expressed as the joint probability
P{achieved throughput > τ, messages received are authenticated, latency < δ,
camera sensors exist in the desired location, the location information is correct}
When computing this probability, note that the factors will have to be jointly considered. As an
example, if messages for a certain application are authenticated using digital signatures, this will
affect the achievable throughput and delay for that application. The location of the camera(s) will
have an impact on the delay. Depending on the efficacy of the localization schemes, there may
be errors in the location and this may bias the probability. The mobility, zoom and steering
capabilities of camera sensors will dictate whether or not there are such sensors in the location of
interest. The throughput and delay functions will depend on the routes that are possible, the
terrain, interference etc. The objective of the node under discussion may be to maximize this
probability. Clearly this is a complex task given the number of constraints; thus computing the
above probability is extremely difficult. The problem becomes even more complex when we
factor in human behaviors and more stringent application requirements, and when we consider
that multiple missions will be competing for resources and affect the QoI of each other.
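As a sketch of how such a joint probability might be estimated, the following Monte Carlo fragment couples authentication with throughput and delay, as the text describes. Every distribution, constant, and coupling here is an invented placeholder for illustration only.

```python
import random

def qoi_joint_probability(trials: int = 100_000, seed: int = 1) -> float:
    """Monte Carlo estimate of the joint probability discussed in the text,
    under toy distributions for each factor (illustration only)."""
    rng = random.Random(seed)
    tau, delta = 2.0, 0.5          # throughput (Mbps) and latency (s) requirements
    hits = 0
    for _ in range(trials):
        authenticated = rng.random() < 0.95
        # authentication (digital signatures) costs throughput and adds latency,
        # so the factors must be considered jointly, not independently
        throughput = rng.uniform(1.0, 4.0) - (0.3 if authenticated else 0.0)
        latency = rng.uniform(0.1, 0.8) + (0.05 if authenticated else 0.0)
        camera_present = rng.random() < 0.9       # sensor coverage of the location
        location_correct = rng.random() < 0.85    # localization-scheme accuracy
        if (authenticated and throughput > tau and latency < delta
                and camera_present and location_correct):
            hits += 1
    return hits / trials

print(qoi_joint_probability())
```

Even this toy version makes the difficulty visible: each added requirement multiplies down the achievable probability, and the coupling terms prevent factorwise computation.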
If we now consider a network where each node has a different set of such QoI requirements, the
problem of meeting the requirements will be a challenge. In such a setting, one might envision a
network-wide QoI index, which is the joint probability that the QoI requirements of all the users
in the network are satisfied. Since the requirements of the different users consume shared
resources, they interact with and constrain one another, making such an index challenging to compute.
In particular, we will determine the data delivery capabilities of the network under various
conditions; we will consider more realistic PHY layer representations, heterogeneity and the
impact of security on data delivery. The behavior of the network under attack and the
ramifications of enforcing security policies will be characterized. The type of information source
is dictated by the application and available sources. We will work on this collaboratively with
INARC.
We point out that our goal here is fourfold. First, we wish to fully understand the interactions
between the information sources and network on the achievable QoI. We will develop a
formulation of QoI to first include a subset of the important metrics of interest. We expect this
to be an iterative process with the INARC and IRC. Second, we propose to determine the highest
level of network-wide QoI that can be provided in any given tactical deployment setting. We
will then determine via a combination of experimentation and analysis if current protocols will
be able to provide this highest possible QoI. If not, we will undertake a careful assessment to
identify which of the functionalities (either in the protocol stack or in the security suite) is
causing the degradation. New protocol structures and algorithms will be explored as appropriate
towards enabling the highest QoI.
In our work we have classified QoI metrics into intrinsic and contextual. Intrinsic metrics assess
the quality of information independent of the situation. Attributes include concreteness, freshness
(the age of information), and provenance, among others. Contextual metrics include fitness-for-
use attributes, like accuracy, timeliness, and credibility (or believability). The research on
provenance and credibility and their impact on trust are part of the Trust CCRI.
1. We will determine the QoI of a network in terms of a subset of data delivery and security
requirements. We propose to use an initial definition of QoI wherein, the information
quality only pertains to a subset of the intrinsic and contextual attributes of QoI that we
expect to study over the life of the program. For the intrinsic metrics we will consider
data freshness and provenance; for contextual metrics we will consider accuracy and
timeliness. We will not address the modeling of provenance as it relates to trust, because
that is being addressed in the Trust CCRI; here we focus on the interactions of
provenance with the other metrics on overall QoI. The objective is to jointly satisfy a
user requirement in terms of these metrics. In other words, the QoI is the probability that
each of these requirements is satisfied to a desired degree.
In our first year, we will first map the QoI to a vector [freshness, provenance, accuracy,
timeliness]. We will determine if each of these metrics individually exceeds a threshold.
Otherwise said, the freshness should be higher than, say, a threshold τ1, the provenance measure
should be higher than τ2, and the accuracy not lower than τ3. We will consider both the
representations outlined above (the stochastic and the range based representations). To
begin with we will consider a homogeneous network wherein the requirements are the
same for all the nodes in the network. Note that freshness and timeliness, and several
variants have been previously considered for impact on data quality, and we will build
upon this work [22]. We will also work with task C2.2 to determine the impact of in-
network storage on freshness and timeliness.
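The range-based representation described above can be sketched as a simple threshold check over the metric vector. The metric names follow the vector [freshness, provenance, accuracy, timeliness]; the threshold and sample values are hypothetical.

```python
def qoi_satisfied(measured: dict, thresholds: dict) -> bool:
    """Range-based representation: QoI is met only if every metric in the
    vector [freshness, provenance, accuracy, timeliness] clears its threshold.
    Metric names and all numeric values are illustrative placeholders."""
    return all(measured[m] >= thresholds[m] for m in thresholds)

thresholds = {"freshness": 0.8, "provenance": 0.7, "accuracy": 0.9, "timeliness": 0.6}
sample = {"freshness": 0.85, "provenance": 0.75, "accuracy": 0.95, "timeliness": 0.7}
print(qoi_satisfied(sample, thresholds))  # True: all four requirements met
```

The stochastic representation would replace this deterministic check with the probability that all four comparisons hold jointly, as in the earlier example.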
2. QoI is stochastic and dynamic in nature. It can vary rapidly in both temporal and spatial
domains. It is thus crucial that end users be able to project and predict QoI evolution so
that optimal decisions can be made on how to proceed with each specific mission. The
major challenge here is that the stochastic model of QoI evolution may not be known a
priori. The presence of multiple simultaneous missions distributed across the network
further complicates the problem: actions taken regarding one mission may affect the QoI
experienced by another mission (for example, due to shared communication resources).
We propose to develop real-time algorithms for learning and predicting QoI evolution to
support multiple distributed data flows associated with different end users/missions.
Compared with offline learning, real-time learning avoids overhead by learning from
information-bearing data instead of training data; it offers improved performance over
time with the accumulation of observations; it adapts to the dynamics of QoI evolution
and allows online prediction.
Our technical approach rests on our preliminary work [3] where we have developed a
stochastic optimization framework for decentralized multi-armed bandit with multiple
distributed users. This framework gives a new formulation of the classic bandit problem
that considers only a single user [4]. It captures the tradeoff across exploration (exploring
new routes to learn their QoI evolution for future use), exploitation (exploiting routes
with a good history of past QoI evolution), and competition (avoiding congestion caused
by competing users).
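For context, the classic single-user bandit formulation [4] that our decentralized framework [3] generalizes can be sketched with a UCB-style index policy, with arms standing in for routes and rewards for observed QoI. All reward parameters here are illustrative.

```python
import math, random

def ucb1(means, horizon=5000, seed=0):
    """Classic single-user UCB1 on Bernoulli arms (routes). The index trades
    off exploitation (empirical mean) against exploration (confidence bonus).
    Returns cumulative regret vs. always playing the best route; arm success
    probabilities are toy values."""
    rng = random.Random(seed)
    n_arms = len(means)
    counts = [0] * n_arms
    sums = [0.0] * n_arms
    best = max(means)
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1                      # initialization: play each arm once
        else:
            arm = max(range(n_arms),
                      key=lambda a: sums[a] / counts[a]
                      + math.sqrt(2 * math.log(t) / counts[a]))
        reward = 1.0 if rng.random() < means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        regret += best - means[arm]
    return regret

r = ucb1([0.3, 0.5, 0.7])
print(round(r, 1))  # regret grows only logarithmically with the horizon
```

The decentralized multi-user setting adds the third dimension absent above: competition, i.e., avoiding collisions with other users learning over the same routes without centralized scheduling.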
The design objective is to minimize the system regret (or the so-called cost of learning)
defined as the performance loss compared to the ideal scenario where the QoI model of
every route is perfectly known to all users and collisions among users are eliminated
through perfect centralized scheduling. In the first year of this project, our objective is
twofold: (i) establish the fundamental limit on the system regret of real-time distributed
learning algorithms; and (ii) develop distributed learning algorithms that achieve this limit.
As evident in the discussion in the above paragraph, there is a cost associated with
providing the QoI. Thus, we also seek to estimate and capture the cost of providing the
QoI. In order to determine whether or not the above attributes are satisfied, nodes will
need to exchange information. When more information is exchanged, the accuracy of the
QoI estimate can be higher, at the expense of increased overhead.
3. The action that a node takes based on the input will also affect the QoI achieved in the
network on the whole. In particular, when provided with different choices, a node may
choose a specific action. To provide a simple example, a node might choose video to
obtain high accuracy but with lower confidentiality (since encrypting video streams could
be expensive). This affects the other flows in the network differently as compared to a
case where the node chooses encrypted text instead. The traffic patterns are clearly
different and in addition the routes chosen could differ. This in turn affects the QoI
achieved by the other nodes. Our goal in the first year is to examine the impact of such
choices on the overall network QoI that can be achieved. We once again will consider
preliminary representations that only consider a subset of performance objectives and
will gradually move to more complex representations in later years.
4. QoI is a multi objective optimization goal as evident from the above discussion. A
different way to attack it is to define the function f (as above perhaps) and maximize it.
Another way is to give a “budget” to all but one of the parameters and then maximize the
value of the non-budgeted parameter. Then, we can optimize f by binary search over the
values of the other parameters. We propose to explore how such methods approximate
the ultimate goal of optimizing the function f. Again, in the first year, we will confine
ourselves to the function defined above.
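The budgeted approach in item 4 might be sketched as follows: an inner step maximizes the non-budgeted parameter (here, accuracy) under a delay budget, and an outer search over the budget optimizes f. We use a ternary-search variant of the search idea, appropriate for maximization under an assumed unimodality that would need to be verified; the operating points and scalarization are invented.

```python
def max_accuracy_given_delay_budget(delay_budget, candidates):
    """Inner step: among feasible (accuracy, delay) operating points,
    maximize accuracy subject to the delay budget."""
    feasible = [a for a, d in candidates if d <= delay_budget]
    return max(feasible) if feasible else None

def optimize_f(candidates, f, lo=0.0, hi=1.0, iters=30):
    """Outer step: search over the budgeted parameter (delay) for the budget
    that maximizes the scalarized objective f(accuracy, delay_budget).
    Assumes f is unimodal in the budget (an assumption to be verified)."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        a1 = max_accuracy_given_delay_budget(m1, candidates) or 0.0
        a2 = max_accuracy_given_delay_budget(m2, candidates) or 0.0
        if f(a1, m1) < f(a2, m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

# Toy Pareto frontier: higher accuracy costs more delay.
points = [(0.5, 0.1), (0.7, 0.3), (0.85, 0.6), (0.95, 0.9)]
f = lambda acc, delay: acc - 0.5 * delay      # invented scalarization
budget = optimize_f(points, f)
print(round(budget, 2))
```

The research question is precisely how well such budget-and-search decompositions approximate the direct optimization of f, particularly when f is non-convex.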
Once we have determined a definition for QoI, the next step is to understand how, given the
information requested by a user, the network determines the QoI to be delivered to the user. We
classify data into two types. User data is data that is transformed into information based on
which operational decisions are made. Network data is data that gives insight into how the
network is performing (e.g., control information or performance statistics). The user data
received by a sink from each source must be transformed into information. The quality of this
information is partially a function of the delivery capabilities of the network and the security
services provided. The functions for determining the quality of this user information will be
developed jointly with the INARC. In addition, to compute the expected QoI, network and user
data must be collected and analyzed in a distributed way. This computation must be performed
on incomplete data and its collection will add overhead to the network.
In the first year, we will study the impact of network structure, and dynamics on QoI
assessments. Specifically, we are interested in the following questions:
1. For a given piece of information, how does network structure affect the QoI that can be
delivered to the user?
2. For the same information, how does QoI degrade, if at all, with network evolution and
dynamics?
3. How does the choice of specific mechanisms affect the QoI delivered to the user:
o The use of stochastic network performance models to estimate QoI: Such models
are cheap, but may be inaccurate under certain circumstances.
o The role of active and passive measurements of network performance in estimating
QoI. Specifically, how does the overhead of measurements impact the QoI, and is
there a point of diminishing returns where additional measurements do not
significantly improve QoI?
Techniques for maximizing time averages of network attributes such as throughput and power
expenditure, and for maximizing concave functions of these time averages, are developed for
stochastic networks with ergodic events in our prior work [1][7] using Lyapunov Optimization
theory. This is an extension of the prior Lyapunov Stability theory of [2]. Alternative fluid
model analyses are developed in [5][6] and early primal-dual techniques are considered for
simple one-hop wireless problems with infinitely backlogged sources in [3][4].
While these techniques are powerful, they do not allow optimization of the more complex
Quality of Information (QoI) metrics described in our proposal. This is due to several reasons:
First, the metrics may have combinatorial and non-convex properties, and may therefore be
fundamentally intractable. Second, the proposed metrics should include models of distortion
which are not yet fully understood. Third, the metrics may be associated with output from multi-
stage network tasks that cannot yet be treated by stochastic network optimization theory.
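For reference, the drift-plus-penalty recipe at the core of the Lyapunov Optimization approach of our prior work [1][7] can be sketched for a single link: each slot, the control minimizes V times the penalty (power) minus the queue backlog times the service rate, trading average power against stability. All rates and constants are toy values.

```python
import random

def drift_plus_penalty(horizon=20000, V=10.0, seed=0):
    """Minimal drift-plus-penalty sketch for one wireless link: each slot,
    choose transmit power to minimize V*power - Q*service(power), where Q is
    the queue backlog. Larger V weights average power (the penalty) more
    heavily relative to backlog (the drift). Parameters are invented."""
    rng = random.Random(seed)
    Q = 0.0
    total_power = 0.0
    powers = [0.0, 1.0, 2.0]                   # candidate power levels
    service = {0.0: 0.0, 1.0: 1.0, 2.0: 1.5}   # concave rate-power curve (pkts/slot)
    for _ in range(horizon):
        arrival = 1.0 if rng.random() < 0.6 else 0.0   # ~0.6 pkt/slot offered load
        p = min(powers, key=lambda x: V * x - Q * service[x])
        Q = max(Q + arrival - service[p], 0.0)
        total_power += p
    return Q / horizon, total_power / horizon   # (normalized backlog, avg power)

backlog_rate, avg_power = drift_plus_penalty()
print(round(avg_power, 2))
```

This machinery optimizes time averages of attributes like throughput and power; the point of the present task is that composite QoI metrics fall outside it, motivating the proposed extensions.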
We propose to extend the theory to address these issues. Preliminary work will focus on metrics
of distortion. These models will be enhanced as we learn more about distortion throughout the program.
The theory we develop in year 1 will be extended and plugged into more extensive models and
networks in future years, as we learn more about collecting and interpreting network data.
Validation Approach
In the first year we will have two thrusts for validation. For the definition of QoI we will work
closely with INARC to ensure that the measures of QoI are consistent with the ability to extract
knowledge and make decisions given different network structures. INARC researchers are part
of this team. For optimizing QoI metrics, we will prove, if possible, that optimal values exist
and determine what they are. If optimal values do not exist we will evaluate the local maxima
and attempt to set bounds.
Research Products
In the first year we will have:
- A multidimensional model of QoI
- An initial result on the impact of network structure on QoI
- A general algorithm for finding the local optima for a class of stochastic QoI
functions
References
[1] S. Marti, T. Giuli, K. Lai, and M. Baker, “Mitigating Routing Misbehavior in Mobile Ad
Hoc Networks,” Proc. ACM MobiCom, August 2000.
[2] M. Grossglauser and D. Tse, “Mobility increases the capacity of ad-hoc wireless
networks,” in Proceedings of IEEE INFOCOM, 2001.
[3] K. Liu and Q. Zhao, “Decentralized Multi-Armed Bandit with Multiple Distributed
Players,” submitted to IEEE ICASSP 2010; available at http://arxiv.org/abs/0910.2065v1
[8] Department of Defense. “Network Centric Warfare Report to Congress”, July 2001.
[9] J. Garstka and D. Alberts, “Network Centric Operations Conceptual Framework
Version 2.0,” Vienna, VA: Evidence Based Research, Inc., 2003.
[12] Headquarters, Department of the Army. “Field Manual 6-0: Mission Command:
Command and Control of Army Forces”, August 2003.
[13] D. Alberts and R. Hayes, “Understanding Command and Control,” CCRP
Publication Series, 2006.
[14] P. Driscoll, M. Tortorella and E. Pohl, “Information Product Quality in Network Centric
Operations”, Operations Research Center of Excellence Technical Report DSE-TR-0516,
May 2005.
[18] Y. Hung, A. Dennis, L. Robert, “Trust in Virtual Teams: Towards an Integrative Model
of Trust Formation,” 37th International Conference on System Sciences, 2004.
[20] B. Stvilia, L. Gasser, M.B. Twidale, L.C. Smith, “A Framework for Information Quality
Assessment,” JASIST, 58(12):1720-1733, 2007.
[22] M. Bouzeghoub and V. Peralta, “A Framework for Analysis of Data Freshness,” Proc.
of ACM IQIS, 2004.
Task Overview
In an integrated tactical communication network, heterogeneous components with different
priorities and time-varying QoI requirements coexist and inter-operate. Such a dynamic
composition of the network leads to highly dynamic interference patterns across network
components. As a consequence, temporal and spatial dynamics of interference play an important
but not well understood role in link stability and network connectivity. Fading dynamics and
node mobility, as well as security requirements, further impact the very existence and quality of
links, and consequently, the connectivity and capacity of the integrated heterogeneous network.
Task Motivation
Understanding the impact of different types of dynamics in a network on OICC is important to
gaining a full understanding of how OICC can be controlled and which protocol mechanisms
may be most suitable for achieving optimal OICC. Prior efforts on modeling the impact of
dynamics on networks only consider traditional notions of capacity.
We also wish to point out that if the wireless channel changes frequently, tracking it would
require frequent exchanges of control messages, the overhead of which would reduce capacity. Avoiding
such tracking, on the other hand, may result in the use of sub-optimal system parameters, which
could in turn affect capacity.
Implicit in the above discussion is the fact that network dynamics also affect the interference
patterns within the network. The rate of transmissions affects the interference projected across
the network. Depending on the dynamics of the network, determining the set of links that can
be simultaneously activated is not an easy problem to solve.
Characterizing connectivity when link failures arise from interference and compromised
security properties
Recent work has used percolation theory to study information spread, mobility and resilience in
networks. Essentially, the uncertainty in links or mobility is modeled as (random) link failures,
resulting in a bond percolation model [3][4]. The mobility models considered are constrained i.i.d.
mobility (nodes move to a new location chosen i.i.d. within a circle of radius a) and discrete-time
Brownian motion. While these models are more refined than the unconstrained i.i.d. model in
[5], they still do not capture the mobility of a realistic tactical MANET.
Here, we propose to consider realistic models for link failures. In contrast with the prior art cited
above, we posit that the link failures will not be i.i.d., since it is likely that links connected to the
same node fail at the same time (or links in the same area, perhaps due to interference). Failure
due to a common interference source can be modeled by considering correlated failure of links.
Failure of all links connected to a given node can be modeled as a node (site) failure (instead of
bond failure), leading to a mixed bond and site failure model.
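A minimal simulation of the proposed mixed bond-and-site failure model might look as follows: failed sites model failure of all links at a node (e.g., node compromise), and independent bond failures model interference losses; the fraction of nodes in the largest surviving connected component serves as a crude connectivity proxy. Grid size and failure probabilities are arbitrary, and this i.i.d.-bond version omits the correlated failures we intend to model.

```python
import random

def largest_component_fraction(n=30, site_fail=0.1, bond_fail=0.2, seed=0):
    """Mixed site/bond failure on an n x n grid: a failed site removes all
    its links; each surviving link then fails independently. Returns the
    fraction of grid nodes in the largest connected component.
    All probabilities are illustrative."""
    rng = random.Random(seed)
    alive = {(i, j) for i in range(n) for j in range(n)
             if rng.random() >= site_fail}
    adj = {v: [] for v in alive}
    for (i, j) in alive:
        for (di, dj) in ((0, 1), (1, 0)):    # right and down neighbors
            u = (i + di, j + dj)
            if u in alive and rng.random() >= bond_fail:
                adj[(i, j)].append(u)
                adj[u].append((i, j))
    seen, best = set(), 0
    for v in alive:                          # depth-first component sweep
        if v not in seen:
            stack, size = [v], 0
            seen.add(v)
            while stack:
                w = stack.pop()
                size += 1
                for u in adj[w]:
                    if u not in seen:
                        seen.add(u)
                        stack.append(u)
            best = max(best, size)
    return best / (n * n)

print(round(largest_component_fraction(), 2))
```

Replacing the independent bond failures above with spatially correlated ones (a common interference source) is precisely the refinement over the prior art that this task proposes.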
This framework will help us model link failures due to security compromises as well. For
example, if shared keys in two nodes with an active link are compromised, confidentiality on
that link is lost; if this is a required security property, then the link must not be used, i.e., the link
is effectively disconnected. The model of link failure we develop is general enough to capture
this type of security compromise and others, and can be extended in the future to capture the
dynamics of cascading failures. Thus this model will be useful in determining the impact of loss
of security properties, studied in Task 1.4, and can be re-used by that task.
Capturing the impact of dynamics on transmission rate and the overall capacity
In our first year, we will also study the impact of dynamics of link quality on the transmission
rate and as a consequence the overall network capacity. In our preliminary work, we have
considered the impact of rate on a hybrid network wherein infrastructure stations are augmented
by multi-hop relays [6][7]. Extending this work to a fully ad hoc setting is a challenge that we
seek to address. Furthermore, the model will be refined to reflect the uncertainty in the
achievable capacity due to network dynamics.
The work in [6][7] also used throughput as the capacity metric; in our work, however, we seek to
express the capacity in terms of the QoI. We will consider dynamic changes in flow patterns, both
temporally and spatially, and capture these when computing the capacity. We will begin with
low-complexity models (for example, a fixed rate at which all the links of the network vary, and
saturation traffic) in our first year. Progressively, we will increase the realism and complexity to
refine our models. We expect to characterize homogeneous flows in the first year and extend this
to cases with heterogeneous traffic and nodes in the second year.
Collaborating with SCNARC and INARC: The mobility of nodes in the network will depend on
social interactions among the soldiers. Furthermore, the types of traffic generated will also have
an impact on the information flows and the resistance/vulnerability to network dynamics. We
expect to interact with the INARC to obtain good representations of the traffic. The dynamics of
the network itself (topology) will also be varied based on models that we expect to obtain from
INARC and SCNARC.
Validation Approach
We will use the models developed here to produce numerical results capturing the impact of
different types of dynamics (interference, mobility) on OICC. We will compare these results
with traditional capacity measures to determine the relative importance of different types of
dynamics and the order of their impact.
Research Products
In the first year we will
1. Produce a characterization of the impact of interference dynamics on the instantaneous
and intermittent connectivity of an integrated heterogeneous network.
2. Produce a characterization of the impact of interference dynamics on the delay profile of
integrated heterogeneous networks.
References
[1] W. Ren, Q. Zhao, and A. Swami, "Connectivity of Heterogeneous Wireless Networks,"
submitted to IEEE Transactions on Information Theory, available at
http://arxiv.org/abs/0903.1684
[2] W. Ren, Q. Zhao, and A. Swami, "Power Control in Cognitive Radio Networks: How to
Cross a Multi-Lane Highway," IEEE Journal on Selected Areas in Communications
(JSAC): Special Issue on Stochastic Geometry and Random Graphs for Wireless
Networks, vol. 27, No. 7, pp. 1283-1296, September, 2009.
[3] Z. Kong and E. M. Yeh, “Connectivity, Percolation, and Information Dissemination in
Large-Scale Wireless Networks with Dynamic Links,” submitted to IEEE Transactions
on Information Theory, 2009.
9.5.9 Task C1.4: Modeling the Impact of Data Provenance and Confidentiality
Properties on QoI (K. Levitt and P. Mohapatra (UC Davis); A. Smith, S. Zhu,
and A. Yener (Penn State); S. Krishnamurthy (UC Riverside))
Task Overview
We expect that QoI is impacted by the security characteristics of the information source(s), the
network that transports it, and the hosts that process it. Security has a significant impact on the
amount and quality of information conveyed over a tactical network. Our aim is not only to characterize
the impact of security on the quality of information but also to mitigate that impact. Our approach
is to model the impact of provenance and confidentiality on QoI.
Intuitively, one can argue that securing information results in an increase in overhead either in
terms of additional information to be shared between the network entities, higher delays incurred
in ensuring security, or administrative overhead to address attacks or false alerts. One can
further argue that this in turn leads to a discount in what can be carried over the network. Certain
choices made in order to enable secure transmissions, e.g., establishing/distributing keys, consume
resources that would otherwise have been used for data transmission.
We argue that the relationship between the fundamental performance metric in a tactical
network, the operational information content capacity, and information security is far more
complex than simply quantifying security overhead. This is precisely due to the fact that the
metric is based on the QoI. In many applications, various security properties may be required to
ensure a high (or acceptable) QoI; in other words, without these security properties in place, one
cannot achieve high operational capacity.
Technical Approach
We recognize that there exists an abundance of research efforts on information security.
However, the fundamental issue of quantifying the capacity in terms of secure information
transfer in a heterogeneous wireless network, with high mobility and dynamics, is not well
understood. By bridging fundamentals of information delivery and information security, we seek
to overcome this long-standing barrier and contribute to significant progress towards the
long-sought metrics for security.
We take the following approach to answering the fundamental questions above. First, we define
a set of basic security properties that are relevant to tactical networks. Working with the INARC
and CNARC, we will investigate what effects the presence of these properties has,
individually and jointly, on the actual QoI needed by applications and users. We will then
model classes of high-level paradigms towards providing these security properties and determine
the achievable QoI with and without the presence of an attacker. By necessity, we will represent
attackers only through abstract models (not actual attack scripts) in order to gain a full
understanding of the impact attacks may have on QoI.
During the first year, our focus will be on the modeling aspects of two of the most important
security properties: provenance and confidentiality. These two properties are most relevant to
tactical applications. The models will be developed to analyze and evaluate the impact of these
properties on QoI. In the subsequent years, we will also address the modeling of various security
mechanisms and their impact on QoI. Note that the two properties are not completely
independent. Thus, the following subtasks are proposed for this part of the study:
A. Modeling provenance and its impact on QoI: In this subtask during the first year we focus
primarily on modeling provenance and its impact on QoI. This will be a cross-cutting
model with input from INARC and CNARC. We will model several notions of provenance
and consider aspects such as authentication and non-repudiation. This work will have direct
implications on the Trust CCRI.
B. Modeling confidentiality and its impact on QoI: In this task we will focus on developing
models that capture the fundamentals of the confidentiality property at varying degrees,
and its relationship with QoI.
C. Modeling the Mechanisms of Security Protocols to enhance QoI: In this task (which is
deferred until the 2nd year and beyond), we will focus on mechanisms to provide
provenance and data confidentiality. For provenance we focus primarily on cryptographic
techniques used for authentication but also on methods to overcome compromised nodes on
multipath routes. For confidentiality we consider different encryption mechanisms and
their impact on node processing and data delivery, for example the impact of key
compromise; we seek to characterize the impact of key distribution and certifiability.
Data provenance refers to one's certainty about the origin of and operations on data from its
source through its transfer to a destination. Provenance subsumes the properties of authenticity
and non-repudiation. Authenticity allows peer nodes to have proof of who they are
communicating with, or to validate the source of data. Non-repudiation provides proof of
identity or origin to a third party such that a communicating party cannot later deny transmitting
a message or performing an operation.
Developing a cross-cutting model of provenance: Working with the SCNARC and the
INARC, we will develop models that relate provenance to the achievable QoI. These models
will be abstract in the following sense: we do not consider the mechanisms for providing
provenance, but only the degree to which the property is satisfied. These models will allow us
(i) to capture the relative importance of each security property with respect to QoI, and (ii) to
determine the sensitivity of QoI to the degree to which the properties are satisfied.
We define four regimes for investigating the impact of each property on QoI: (i) We assume that
all nodes in the network satisfy the desired property(ies) perfectly. (ii) We relax this assumption
so that only a certain percentage of nodes satisfy the desired property(ies). (iii) We relax our
assumptions further to allow a partial or probabilistic satisfaction of the desired property(ies) by
each node. (iv) We model the case in which the satisfaction of properties in nodes changes over
time. This schema is meant to be a flexible starting point for our investigation. The exact
modality of the modeling will be adapted based on intermediate research results.
Collectively, these regimes will enable us to determine quantitatively how security – including
the overhead it incurs – impacts QoI. The first regime allows us to determine the impact of
perfect security on QoI. The second and third regimes allow us to determine QoI sensitivity to
heterogeneous nodes (not all nodes achieve the same level of security), a portion of the network
being compromised (not all nodes satisfy a property), and the strength of the security
mechanisms in an abstract way (not all nodes satisfy the property perfectly). The fourth regime
allows us to consider network dynamics.
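Regimes (ii) and (iii) can be illustrated with a small numerical sketch. The code below is purely illustrative and embodies our own simplifying assumptions: the QoI retained along a relay path is taken to be the product of per-node satisfaction degrees, and the function names, path lengths, and distributions are placeholders, not the models to be developed.

```python
import random

def path_qoi(degrees):
    """Illustrative assumption: the QoI retained by a path is the product of
    the degrees (in [0, 1]) to which each node on the path satisfies the
    security property. Regime (i) is all-ones; regimes (ii)-(iv) relax it."""
    q = 1.0
    for d in degrees:
        q *= d
    return q

def regime_ii(path_len, frac_satisfying, rng):
    # Regime (ii): each node satisfies the property perfectly with
    # probability frac_satisfying, otherwise not at all.
    return path_qoi([1.0 if rng.random() < frac_satisfying else 0.0
                     for _ in range(path_len)])

def regime_iii(path_len, mean_degree, rng):
    # Regime (iii): each node satisfies the property partially, here drawn
    # uniformly around mean_degree (an arbitrary illustrative choice).
    return path_qoi([min(1.0, max(0.0, rng.uniform(mean_degree - 0.1,
                                                   mean_degree + 0.1)))
                     for _ in range(path_len)])

def expected_qoi(regime, trials=5000, seed=7, **kw):
    """Monte Carlo estimate of expected path QoI under a given regime."""
    rng = random.Random(seed)
    return sum(regime(rng=rng, **kw) for _ in range(trials)) / trials
```

Even this toy model exhibits the sensitivity we wish to quantify: with a three-hop path and 90% of nodes satisfying the property, expected QoI drops to roughly 0.9 cubed, showing how quickly imperfect satisfaction compounds along a path.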
Important areas that we will address with the INARC include parameterizing the impact of a
security property with respect to information, i.e., which properties impact different types of
information the most. We will work with the SCNARC on defining how the security properties
applied to different pieces of information impact overall trust, and how they impact decision
processes. This will shed light on the importance of different types of information and allow us
to adjust security properties on a per-piece of information basis, thus incurring the overhead due
to security only as required. We will also work with the SCNARC and INARC to determine
how the presence of differing security properties on multiple pieces of information, perhaps each
with a different importance, will impact QoI and decision making. It is clear that the work on
security modeling serves a dual purpose: it plays a major role in determining QoI, and it has a
high impact on the Trust CCRI.
Characterizing the impact of provenance on QoI: For provenance the four regimes apply
directly. The first step is to ensure that the properties of authenticity and non-repudiation apply to
the source node. If the source node cannot be authenticated, there is no provenance associated
with the generated data. Likewise, without a non-repudiation property, a third party cannot
verify the source of the information. The non-repudiation property should apply to the transit
nodes as well, i.e., the nodes that forward the data from the source to its destination. Because
provenance is related to the “chain of evidence” when passing data, if a transit node operates on
the data, exactly what operation was performed must be certifiable. The interaction between
provenance and the achievable performance in the network is complex and not well understood
so far. Furthermore, providing security and trustworthiness for provenance records is
challenging.
Most existing research on provenance relies on recording the entire history of the information:
annotations, information-flow recording, etc. However, in a heterogeneous dynamic network, the
provision of provenance and its impact on information quality and content is not well understood.
Moreover, history-based approaches can impose various resource constraints in terms of
bandwidth and storage. The risk of history explosion and the complexities of history-based
approaches further limit their applicability.
Is it possible to identify a compact form to deliver provenance information with a policy that the
full information is stored for possible audit or for query by a decision maker?
Provenance can typically be provided in the following generic ways. Strong provenance
is provided by requiring the operator node to include some form of information that uniquely
attests that it performed the operation. Digital signatures or the use of private keys to encrypt the
information are mechanisms that can support strong provenance. Strong provenance provides a
means for ensuring non-repudiation, i.e., a node that includes this unique information is
responsible for the operations on the message and its contents up to the point at which the node
“signs the message in some form.”
Finally, a weak sense of provenance can be obtained by witnesses that observe the action. The
degree of provenance associated with an operation observed by a witness is directly dependent
on the trustworthiness of the witness. Thus, this is directly tied to the Trust CCRI. All of these
high level strategies have direct implications on the QoI achievable in the network, which
includes how QoI is impacted by the overhead of security mechanisms. As detailed below, we
will characterize these approaches mathematically in this task.
Operator provenance: A node can be required to include information that uniquely ties to the
operation that it performs. As described earlier, digital signatures or simply using private keys to
encrypt the message are mechanisms that can help satisfy this requirement. The questions that
arise are (a) How effective is a signature in uniquely tying the operation to the node performing
the operation? Stated otherwise, how likely is it that an adversarial node can override this
security property to destroy the provenance? and, (b) How does the provision of operator
provenance affect the data delivery performance and the QoI? To illustrate these issues,
consider the case in which digital signatures serve as the mechanism for supporting operator
provenance. Different applications will have different requirements in terms of provenance and
performance. Further, in a heterogeneous network, different nodes will have different
capabilities in terms of providing operator provenance: some nodes will have limited processing
capabilities and, thus, will not be able to either sign or verify signatures.
Our approach is to consider a heterogeneous network consisting of varying degrees of operator
provenance. We will model the operator provenance to have a specific impact on the data
delivery. Together these will have an impact on the QoI metric which will also be captured. In
fact, our approach is expected to lead to a region of QoIs which will allow us to capture the
trade-offs between information quality (performance) and security (provenance).
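The "chain of evidence" notion of operator provenance can be sketched concretely. The fragment below is a minimal stand-in, not a proposed mechanism: it uses HMAC tags with per-node shared keys (a deliberate simplification of the asymmetric digital signatures discussed above), and the node names and key table are hypothetical.

```python
import hmac
import hashlib

# Hypothetical per-node keys. In practice these would be asymmetric
# signature keys, not secrets shared with the verifier.
NODE_KEYS = {"sensor-1": b"k1", "relay-7": b"k2", "relay-9": b"k3"}

def append_provenance(record, node_id, operation, payload):
    """Each operator node appends (node, operation, tag); the tag binds
    the node's key to the payload and to the chain built so far."""
    prior = record[-1][2] if record else b""
    msg = prior + node_id.encode() + operation.encode() + payload
    tag = hmac.new(NODE_KEYS[node_id], msg, hashlib.sha256).digest()
    record.append((node_id, operation, tag))
    return record

def verify_provenance(record, payload):
    """Recompute every tag; tampering with any intermediate entry breaks
    all subsequent tags, which is the 'chain of evidence' property."""
    prior = b""
    for node_id, operation, tag in record:
        msg = prior + node_id.encode() + operation.encode() + payload
        expect = hmac.new(NODE_KEYS[node_id], msg, hashlib.sha256).digest()
        if not hmac.compare_digest(tag, expect):
            return False
        prior = tag
    return True
```

The per-hop tag computation and verification in this sketch is exactly the overhead whose impact on data delivery, and hence on QoI, the task aims to model.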
Authentication: Operator provenance is unlikely to be of use in cases where a node has been
compromised, its signing keys have been stolen, or its identity is unknown. Here, a
higher level known entity (such as a decision maker – or a digital counterpart) will have to
authenticate a node. The impact of authentication is very much dependent on the location of the
higher level authenticating entity. If this node is further away (a decision maker overseeing a
large group of soldiers) or the channel qualities are poor, obtaining the authentication may be
time-consuming and, in some extreme cases, may be impossible. There is, then, an inherent
trade-off among information quality, provenance, and performance. If obtaining the
authentication takes time, the data may become somewhat stale. The question then is whether a
sufficient level of QoI remains for the application. If not, the question becomes whether a
weaker form of provenance suffices, accepting the potential QoI degradation. We will carefully
model these artifacts considering different topologies and application requirements.
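The staleness trade-off can be made concrete with a toy model. Both the exponential decay of information value with delay and the product form of the score below are modeling assumptions made purely for illustration; the names, delays, and confidence values are placeholders, not results.

```python
import math

def delivered_qoi(auth_delay_s, auth_confidence, freshness_tau_s=30.0):
    """Toy trade-off: information value decays as exp(-delay/tau), while
    the security component of QoI is the confidence of the provenance
    actually obtained. The decay law and the product form are assumptions."""
    staleness_factor = math.exp(-auth_delay_s / freshness_tau_s)
    return auth_confidence * staleness_factor

def best_option(options, freshness_tau_s=30.0):
    """Choose between, e.g., slow strong authentication and fast weak
    provenance: whichever yields the higher delivered QoI.
    options: list of (name, delay_seconds, confidence)."""
    return max(options, key=lambda o: delivered_qoi(o[1], o[2],
                                                    freshness_tau_s))
```

Under this sketch, highly perishable data (small tau) favors fast-but-weak provenance, while long-lived data favors the slower, stronger authentication, which is precisely the topology- and application-dependent behavior we intend to model.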
Witnesses: A node could simply depend on other observing nodes (witnesses) to provide
provenance on a node’s actions. This would depend on the number of observers and on the trust
that the node seeking provenance has in these observers. Clearly, this would depend on the
density of the network, mobility and the likelihood of having colluding adversaries, which could
be mitigated by artificial diversity among the nodes. These factors will result in an associated
uncertainty with regards to the provenance that can be provided but will reduce the overhead
incurred and thus could result in better data delivery capabilities. We will also consider this with
different parametric choices with regards to the aforementioned factors and will estimate the
possible QoI and the operational capacity.
In addition, the provenance provided by a witness directly depends on the trustworthiness of the
witness. Thus, this part of our work will tie in closely with the Trust CCRI.
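The dependence of witness-based provenance on witness density, trust, and collusion can be sketched as follows. The independence of witnesses and the specific combining rule are assumptions made for illustration only.

```python
def witness_confidence(trusts, collusion_prob=0.0):
    """Probability that at least one honest, non-colluding witness observed
    the action, assuming witnesses report independently.
    trusts: per-witness probabilities of truthful reporting.
    collusion_prob: probability a given witness is colluding (illustrative)."""
    p_no_honest_witness = 1.0
    for t in trusts:
        # A witness is useful if it is not colluding and reports truthfully.
        p_no_honest_witness *= 1.0 - (1.0 - collusion_prob) * t
    return 1.0 - p_no_honest_witness
```

The sketch captures the qualitative behavior described above: confidence grows with the number and trustworthiness of observers and shrinks with the likelihood of collusion, and this confidence would enter the QoI estimate in place of the certainty given by cryptographic provenance.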
A combination of possibilities: Finally, we envision considering a network in which all of the
above possibilities are available to different extents, and computing the achievable QoI in terms
of information quality (performance) and security (provenance) that can be provided.
1. We will first identify in detail the challenges in trustworthy provenance and define a
preliminary adversarial model.
2. We will analyze the potential authentication and confidentiality issues related to securing
provenance information from the adversary.
3. We will start with a single hop system, where a set of users transmit their observation to
the central commanding unit. In this case we will attempt to model the secured data
provenance using a query processing or a storage history method. Then, we shall evaluate
the relationship between the trust in the data and its provenance. We shall derive a
fundamental mathematical relationship to capture the privacy of the source, the quality of
the information, and the desired level of provenance.
4. The single hop model will then be expanded to multihop transmissions in heterogeneous
networks.
5. In the subsequent years, with inputs from EDIN, we will expand the model of secured
data provenance in multihop networks by integrating node dynamics.
Data confidentiality is the ability to prevent data from being learned by unauthorized nodes. In a
tactical network, it is of paramount importance to ensure that information is available to those
who are authorized to receive/decode it, and is protected from those who are not. The latter may
be external (malicious) entities, compromised nodes, or simply nodes that have a lower security
clearance as compared to the authorized nodes.
Impact of data confidentiality on QoI: Confidentiality also affects the achievable QoI. As
mentioned, confidentiality is required to protect transmitted information from eavesdropping on
the network and from an attacker who has compromised a node. The keys themselves are
subject to compromise and must also be kept confidential.
Providing confidentiality with the use of encryption will increase security but will require
processing and thus will affect the quality of information that is delivered and the performance.
As an example, ensuring the confidentiality of large volumes of video data could be difficult and
in some cases even infeasible. However, the same information could be delivered as text data
(but with lower fidelity) and could be provided with a much higher degree of associated
confidentiality. These trade-offs will be carefully assessed.
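The video-versus-text example above can be sketched as a modality-selection problem: choose the representation that maximizes the product of fidelity and achievable confidentiality under a node's processing budget. The all-or-nothing confidentiality model, the scoring rule, and all numbers are illustrative assumptions, not the assessment itself.

```python
def achievable_confidentiality(encrypt_cost, budget):
    """Toy assumption: full confidentiality if the encryption cost fits the
    node's processing budget, none otherwise (could be made gradual)."""
    return 1.0 if encrypt_cost <= budget else 0.0

def choose_modality(modalities, budget):
    """modalities: list of (name, fidelity, encrypt_cost). Returns the
    modality with the best fidelity x confidentiality product."""
    def score(m):
        _, fidelity, cost = m
        return fidelity * achievable_confidentiality(cost, budget)
    return max(modalities, key=score)
```

Even this crude model reproduces the stated trade-off: a node with ample processing delivers encrypted video, while a constrained node is better served by lower-fidelity text that it can actually protect.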
The second class of approaches that satisfy the confidentiality property is public key-based
encryption. These techniques, which require the use of public and private keys, can provide
stronger security (the encryption is more difficult to break). However, public keys have to be
certified by authorities (similar to authentication with provenance). Certification, like
authentication, comes
with a price: overhead. One can choose to bypass certification to improve performance;
however, this comes at the risk of man-in-the-middle attacks by adversaries and
would thus weaken the security component of the delivered QoI. We propose to capture the overall
properties of public key based confidentiality in a mathematical model and compare and contrast
the achievable QoI of public key based approaches with that provided with secret key based
approaches. Again, the location dependence (where are the certifiers located), the time taken to
break the encryption versus the time duration for which the confidentiality needs to be preserved,
and the uncertainty in terms of topology and link qualities will be carefully captured in our
models. Our overall goal will be to compare the QoI achieved with the two generic classes of
approaches and find an OICC region represented by the maximum of the OICCs
that are derivable with the two specific classes.
For data confidentiality, the four regimes apply directly to the ability of each node to prevent the
leakage of information either when transmitting or holding the data. For example, achieving this
property imperfectly might correspond to leaking information with a certain probability, leaking
some part of the secret information or, more generally, leaking some correlated secret whose
mutual information with the secret information is bounded. Note, because we are characterizing
QoI, we do not assume that all “bits of information” are equal. Therefore, in these models we
will account for the variable protection of different pieces of information, including the
information associated with encryption keys themselves.
Although we are deferring this effort to the 2nd year, here we provide a brief outline relating it
to the subsequent applications of the models developed in subtasks A and B.
Given the specific properties and functional characteristics of different security mechanisms,
they may affect QoI differently even though they are designed to address the same type of attack.
Our objective is to capture the nuances that are particular to security mechanisms and determine
the impact of specific functions on QoI. The understanding that such models will provide will
allow us to (a) further refine and tighten the fundamental operational capacity bounds that we
compute, and (b) identify and address inherent protocol limitations as discussed later.
Digital signatures are based on cryptographic functions, which as discussed above have varying
overheads and strength. In highly constrained devices, authentication may be possible using
other methods (with quorums, visual evidence etc.). These methods will enable the provision of
provenance to varying degrees. Our objective is to account for these uncertainties in quantifying
the level of provenance that can be provided and in turn, its impact on QoI. The trade-offs
between using the traditional crypto-based class of methods and this more recent class of
provenance methods will be quantified in a fundamental way. In particular, we will quantify the
QoI possible with these two different methods, in terms of both security and performance.
Among others, one model of compromise that we will investigate is the following. The
substantial literature on perfectly secure message transmission (e.g., [1][2][3]) discusses secrecy
in the presence of a subset of nodes leaking information completely; recent work on leakage-
resilient encryption [4][5] provides computational security guarantees in the presence of
partially-leaked secret keys; however, neither of these lines of research provides a
comprehensive network model, nor are their models expressive enough to capture heterogeneous
information. The perspective of QoI is expected to shed light on a unified analysis of security
mechanisms and protocols, and drive the design of novel protocols that address the needs of
these unified systems.
Validation Approach
We will evaluate this work by comparing the impact of security properties on the value of QoI
achieved with the relative usefulness of this information in making decisions. We will work with
task C1.2 and the members of the INARC team to rationalize these values.
Research Products
In the first year we will:
- Produce a fundamental mathematical relationship to capture the amount of privacy of
source, quality of the information, and the level of desired provenance.
- Define a QoI metric in terms of a target data delivery performance and desired level of
confidentiality.
References
[2] D. Dolev, C. Dwork, O. Waarts, and M. Yung, “Perfectly secure message transmission,”
J. ACM, vol. 40, no. 1, pp. 17–47, 1993.
[4] M. Fitzi, M. K. Franklin, J. A. Garay, and S. H. Vardhan, “Towards optimal and efficient
perfectly secure message transmission,” in TCC (S. P. Vadhan, ed.), vol. 4392 of Lecture
Notes in Computer Science, pp. 311–322, Springer, 2007.
This project has close ties to the other active project within CNARC as well as both CCRIs and
projects within the other centers.
Within the CNARC this feeds into project C.2. By understanding the theoretical bounds we can
determine how network paradigms can improve QoI.
We will rely heavily on tasks in INARC for determining the characteristics of information
linkages and flows (I1.1). The linkages will be used specifically in C1.2. We are working
jointly with INARC on QoI and the results of task C1.2 will feed into INARC I2.1.
This project both feeds EDIN and requires output from EDIN. It provides EDIN with the
evolution of communication capabilities given the dynamics of interference and mobility. This
will in turn impact the evolution of the communications, information, and social networks. This project
will use the mobility models from EDIN as well as the models that relate the interaction of
social, information and communication networks in terms of change in structure.
This project has a direct relationship with TRUST, specifically T1. Both QoI and security
properties impact trust. Several members of the QoI and security teams (C1.2 and C1.4) are part
of T1.
The ITA program addresses Quality of Information and Value of Information specifically in
sensor networks. The ITA program addresses topics such as formal ontologies for representing
QoI, and algorithms specific to sensor networks, such as calibration, data filtering, and sensor
sampling algorithms. In the project presented here, we take a broader view of information.
Research Milestones
Due  Task    Description
Q2   Task 2  Define first version of multi-dimensional QoI function
Q2   Task 4  Define relationship between authentication and confidentiality related to securing provenance information from the adversary
Q3   Task 1  Develop the framework of model hierarchies and inter-relationships both within and outside of CNARC
Q3   Task 2  Determine relationship between network evolution and QoI
Q3   Task 3  Characterize the impact of interference dynamics on the instantaneous and intermittent connectivity of an integrated heterogeneous network
Q3   Task 4  Derive a fundamental mathematical relationship to capture the amount of privacy of source, quality of the information, and the level of desired provenance
Q4   Task 1  Create a first instantiation of the models to derive expressions for at least the information carrying capacity of a given network
Q4   Task 2  Develop a general algorithm that finds local optima for low-complexity classes of QoI functions on stochastic networks, with some form of analytical convergence proof
Q4   Task 3  Characterize the impact of interference dynamics on the delay profile of integrated heterogeneous networks
Q4   Task 4  Characterize the OICC with the public/private key confidentiality class
In this project, we consider: (a) the inherent signaling overhead used to create and maintain the
network; (b) the heterogeneous nature of information flows in tactical networks; (c) the
heterogeneous nature of the nodes that form part of a tactical network; (d) the exploitation of all
communication, processing and storage resources of a network, not just its wireless bandwidth;
and (e) the exploitation of all forms of inter-nodal cooperation, not just methods to avoid
multiple access interference. This will require collaboration with the SCNARC and INARC to
make the models more “information-aware”. Addressing these aspects is central to
characterizing the fundamental limits of operational information content capacity.
The results from this project will allow the design of networks and algorithms that approach the
theoretical limits of operational information content capacity. With knowledge of the impact of
network paradigms, protocol structures may then be analyzed and improved so that the optimal
OICC may be achieved.
C2.3 Characterizing the impact of scheduling on QoI – In this task, we optimize cost-QoI
tradeoffs in wireless networks, and determine universal scheduling policies.
Task Motivation
To date, the vast majority of studies on achieving physical-layer concurrency focus on taking
advantage of distributed MIMO cooperation [4]. However, all prior studies focusing on virtual
MIMO solutions ignore the actual overhead incurred for distributed cooperative MIMO to work,
and the synchronization of transmitter nodes and distributed space-time encoding and decoding
make this technique very impractical.
The alternative view of MIMO systems has focused on combating channel fading between
source-destination pairs. In MIMO broadcast channels, the optimum capacity is attained by the
well-known dirty paper coding [5] technique. However, dirty paper coding requires significant
feedback along with non-causal transmitted data information. Recent approaches in achieving the
capacity of dirty paper coding are based on the use of random beam forming [6]. However, these
approaches are not practical and more importantly, they are not designed for distributed systems
such as ad hoc networks.
Initial Hypotheses
We expect that concurrency will greatly improve the QoI and OICC in a network because of the
increased sensitivity of OICC to network capacity and QoI. Concurrency will allow more
information to be transferred and will reduce congestion, thus improving the quality of the delivered
information. Coupled with the expected reduction in overhead in the techniques used below
when compared to cooperative MIMO, we expect this gain in OICC to be large (e.g., 50%).
Technical Approach
Most current studies on capacity analysis and cooperation are related to homogeneous networks.
For example, most capacity analyses in the literature assume a single traffic class in the network.
Keshavarz-Haddad et al. [1] introduced the concept of transmission arena. Based on that
definition, they presented a method to compute the upper bound of the capacity for different
traffic patterns and different topologies of the network. However, they did not provide closed-
form scaling laws for the network capacity. Toumpis [2] investigated the throughput capacity
when there are n^s sources and n^s destinations in the network, where 0 < s < 1. Liu et al. [3]
extended this result by relaxing the constraint on the number of sources and destinations. While
these results address asymmetric traffic, the results apply to the case of a single type of traffic
pattern in the network. However, practical military networks consist of many types of traffic
patterns. We will investigate the ramifications of such assumptions on the throughput capacity.
We have shown that the per-source-destination unicast throughput of a tactical wireless network
can attain the optimal value and scale with the number of nodes by embracing concurrency of
transmissions at the physical layer using multi-packet transmission and reception (MPT and
MPR). We have also shown that MPR and MPT increase the order capacity of wireless networks
for multicasting and broadcasting applications [9][10][11]. On the other hand, we have also
shown that network coding does not provide any order capacity gains for multicasting or
broadcasting in wireless networks [12][13][14]. Hence, our results show that increasing the order
throughput capacity of wireless ad hoc networks requires concurrency at the physical layer.
We have derived preliminary results that indicate the potential for order capacity increases by
taking advantage of the fading of signals over wireless links to manage interference [7][8]. These
results show that interference management can: (a) require the smallest possible feedback
reported in literature to date; (b) be implemented with current available hardware using simple
encoding and decoding of signals; (c) be extended to tactical networks, because there is no need
for transmitters to synchronize during transmission of the signals; and (d) constitute a viable
alternative to distributed cooperative MIMO approaches.
We will carry out the following activities during the first year:
1. Derive appropriate metrics for QoI-aware networks with heterogeneous traffic. We will
further investigate the appropriate type of cooperation required for networks with
heterogeneous traffic.
2. Study the conditions within which opportunistic interference management can be
implemented with limited complexity, the channel feedback needed from receivers to
senders, and the attainable capacity gains in combination with the use of multi-radio
nodes under limited feedback conditions.
3. Compare the performance of interference management against that of distributed
cooperative MIMO schemes, with emphasis on dynamic networks and taking into
account the signaling overhead incurred in the network.
4. Begin to study the design of low-complexity approaches for the implementation of
protocols that take advantage of interference management for channel access and
scheduling, and the interplay between interference management and cooperative
networking mechanisms that can be implemented above the physical layer, such as
adaptive scheduling, multipath routing, multi-copy forwarding, queue management,
transmission control, and coding.
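Activity 4's interplay between interference management and channel access can be sketched with a toy greedy scheduler: admit links in decreasing order of reported channel gain as long as mutual interference with already-admitted links stays below a threshold. This is a placeholder policy for illustration only; the gains, interference matrix, and threshold are hypothetical, and the opportunistic policies to be studied are more sophisticated.

```python
import random

def opportunistic_schedule(gains, interference, threshold):
    """gains: {link: channel gain known via limited feedback}.
    interference[a][b]: interference link a imposes on link b.
    Greedy: admit links in decreasing-gain order whenever the mutual
    interference with every already-admitted link is below threshold."""
    scheduled = []
    for link in sorted(gains, key=gains.get, reverse=True):
        ok = all(interference[link][s] < threshold and
                 interference[s][link] < threshold for s in scheduled)
        if ok:
            scheduled.append(link)
    return scheduled

def random_instance(n_links, seed=3):
    """Generate a random illustrative instance."""
    rng = random.Random(seed)
    gains = {i: rng.random() for i in range(n_links)}
    interference = {i: {j: rng.random() for j in range(n_links)}
                    for i in range(n_links)}
    return gains, interference
```

The threshold parameter makes the concurrency trade-off explicit: a permissive threshold admits many concurrent transmissions (more physical-layer concurrency, more interference), while a strict one degenerates to one transmission at a time.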
Our research after the first year will build on the results of the four activities listed above to
mount a study of the modeling and design of a clean-slate approach to collaborative networking,
in which the design of the protocol architecture of a tactical network is guided by the integration
of communication and storage networks with information and social networks.
Research Products
In the first year we will:
- Report on the conditions within which interference management is practical, the
channel feedback needed from receivers to senders, and the attainable capacity gains
in combination with the use of multi-radio nodes under limited feedback conditions.
References
[1] A. Keshavarz-Haddad and R. Riedi, “Bounds for the capacity of wireless multihop networks
imposed by topology and demand”, ACM MobiHoc, pp. 256-265, September 2007.
[2] S. Toumpis, “Asymptotic capacity bounds for wireless networks with non-uniform traffic,”
IEEE Transactions on Wireless Communications, vol. 7, No. 5, pp. 1-12, May 2008.
[3] B. Liu, D. Towsley, and A. Swami, “Data gathering capacity of large scale multihop
wireless networks,” ACM MobiHoc 2008.
[4] A. Ozgur, O. Leveque, and D. Tse, “Hierarchical Cooperation Achieves Optimal Capacity
Scaling in Ad Hoc Networks,” IEEE Transactions on Information Theory, Vol. 53, No. 10, pp.
2549-2572, 2007.
[5] M. Costa, “Writing on Dirty Paper,” IEEE Transactions on Information Theory, May 1983.
[6] M. Sharif and B. Hassibi, “On the capacity of MIMO broadcast channels with partial side
information,” IEEE Transactions on Information Theory, vol. 51, pp. 506-522, February 2005.
[8] Z. Wang, M. Ji, H. R. Sadjadpour, and J.J. Garcia-Luna-Aceves, “Interference Management:
A New Paradigm in Wireless Cellular Networks,” Proc. IEEE MILCOM 2009.
[11] Z. Wang, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “The Capacity and Energy
Efficiency of Wireless Ad Hoc Networks with Multipacket Reception,” Proc. ACM MobiHoc
2008, Hong Kong SAR, China, May 26-30, 2008.
[12] Z. Wang, S. Karande, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “On the Capacity
Improvement of Multicast Traffic with Network Coding,” Proc. IEEE MILCOM 2008, San
Diego, California, November 17--19, 2008. (IEEE Fred W. Ellersick 2008 MILCOM Award
for Best Unclassified Paper).
[13] S. Karande, Z. Wang, H. Sadjadpour, and J.J. Garcia-Luna-Aceves, “Network Coding Does
Not Change The Multicast Throughput Order of Wireless Ad Hoc Networks,” Proc. IEEE
ICC 2009, Dresden, Germany, June 14-18, 2009.
Task Overview
We will investigate the capacity of a tactical network when nodes cache information for which
there has been interest around them. We expect that the capacity improvements will be a direct
function of the popularity of information, the distribution of nodes with common interest, and the
temporal aspect of the QoI of the information objects that may or may not be replicated
throughout the network. An interesting aspect of this work that will be addressed starting in year
two is which nodes may store information based on permissions (i.e., security considerations).
Our preliminary insight on the possibility of separation theorems for heterogeneous traffic
indicates that we should be able to characterize the capacity of a tactical network in the presence
of in-network storage by breaking the problem into the dissemination of individual objects, or by
focusing on individual “social networks.” One important distinguishing characteristic of our
models is that they will directly account for social networks and links between information as
well.
Task Motivation
Several researchers [1][3][4][5][6][7][8][9], including ourselves [2], have shown that the
“store-carry-forward” approach to information dissemination increases the order throughput of a
wireless network when nodes move. We have also demonstrated how opportunistically sharing
information among nodes locally, called cooperative caching [10][11], can reduce both the
latency of retrieving information and the load on the network. In [12] we have shown that the
scalability of a store-and-query network with limited resources such as energy and storage
depends critically on application-specific event and query traffic. In [13][14], we have explored
how different content replication strategies impact the latency of information retrieval in
DTN-type vehicular networks. Furthermore, our results on the (n,m,k)-casting formulation clearly show
that the order capacity of a network increases as we allow information to flow from the nearest
node, rather than the original source of the information. This intuition is further supported by
recent demonstrations in the DARPA DTN program, where the use of in-network storage has
been shown to substantially increase the capacity of a network and, in some cases, to enable any
networking at all.
Given these encouraging preliminary results and the suitability of in-network storage to military
networks, we believe this approach has significant potential to provide order gains in OICC.
Initial Hypotheses
We expect that combining knowledge of the underlying social network (who will use common
information) with mobility characteristics will greatly improve the performance of in-network
storage mechanisms over algorithms that ignore such knowledge. Coupled with knowledge of information
requirements (e.g., latency, freshness) we can expect to see large (e.g., greater than 50%) gains in
OICC compared to traditional in-network storage algorithms.
Technical Approach
This task differs from others in that here we assume that information has some persistent value
(that perhaps degrades over time) and that information is shared by more than one member of a
network. As such, we can leverage the storage of nodes and their mobility in delivering
information, not just the links through which nodes communicate.
This task will consider two important factors that impact the benefits of in-network storage and
hence the strategies that will provide the best QoI: mobility and the existence of social and
information networks. Mobility impacts two aspects of in-network storage, the first being where to store data.
A platoon itself may be partitioned as squads within the platoon move away. As a result,
network partitions may occur, or the cost of communicating between squads may become high.
To deal with this problem, nodes should be able to detect and predict platoon partitions or node
split. By monitoring or predicting the neighbor mobility pattern, a node may be able to predict
its split from other platoon members. Then, it can prefetch and cache some data in advance, to
ensure that the data is still available when the partition occurs.
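The detect-and-prefetch idea above can be sketched as follows. This is a minimal illustration under stated assumptions: `likely_to_split` and `prefetch_plan` are hypothetical names, per-neighbor link-quality histories are assumed to be available, and a strictly decreasing quality trend stands in for a real mobility or partition predictor.

```python
def likely_to_split(link_quality, window=3):
    """Flag a neighbor as likely to split away if its last `window`
    link-quality samples are strictly decreasing (a crude stand-in
    for a real mobility/partition predictor)."""
    recent = link_quality[-window:]
    return len(recent) == window and all(
        recent[i] > recent[i + 1] for i in range(window - 1))

def prefetch_plan(neighbors, needed_items, cached):
    """For each neighbor predicted to split, list the needed items that
    the neighbor holds and we have not yet cached, so they can be
    prefetched before the partition occurs."""
    plan = {}
    for nb, info in neighbors.items():
        if likely_to_split(info["quality"]):
            wanted = [i for i in needed_items
                      if i in info["holds"] and i not in cached]
            if wanted:
                plan[nb] = wanted
    return plan
```

For example, a node whose link quality to a squad member has dropped over three consecutive samples would prefetch any still-needed items that only that member holds.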
Other mobility patterns will also have an impact on the effectiveness of caching. For instance, in
some cases some nodes may act as communication hubs among soldiers as they move around
different platoons. Then, it may be more effective to cache the popular data on these hub nodes,
so that it can be easily accessed by the nodes they encounter.
We will collaborate with the EDIN CCRI on mobility, specifically project E4. La Porta is a
member of the E4 team.
We plan to explore how best to aggregate, store, and (upon encounter with other mobile nodes)
disseminate such content in order to allow propagation of high confidence event detections while
maintaining storage and communication efficiency.
The most important issue in social-aware caching is to determine nodes that may cache data for
each other, due either to their mobility patterns or their interests. Some nodes may tend to share
information with each other or request common information. These characteristics may result
from underlying social networks. Currently, the “betweenness” centrality metric is widely used
in social network analysis. In the DTN scenario, the number of contacted neighbors of a mobile
node can be used to estimate its betweenness centrality. We can also extend this metric to
include a measure of common interests. We will study how to quantify the centrality of each
node in DTNs and map the node centrality to the popularity of the data that the node should
cache.
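As a concrete illustration of the contact-based centrality estimate described above, the sketch below counts distinct contacted neighbors as a proxy for betweenness centrality, and extends the metric with a common-interest weighting. The Jaccard similarity of interest sets and all function names are our illustrative assumptions, not choices fixed by the text.

```python
from collections import defaultdict

def contact_centrality(contact_log):
    """Estimate a node's betweenness centrality in a DTN by the number
    of distinct neighbors it has contacted (a common proxy)."""
    neighbors = defaultdict(set)
    for a, b in contact_log:
        neighbors[a].add(b)
        neighbors[b].add(a)
    return {n: len(s) for n, s in neighbors.items()}

def interest_weighted_centrality(contact_log, interests):
    """Extend the contact-count proxy with a measure of common interests:
    each contact is weighted by the Jaccard similarity of the two nodes'
    interest sets (an illustrative assumption)."""
    score = defaultdict(float)
    for a, b in contact_log:
        common = interests[a] & interests[b]
        union = interests[a] | interests[b]
        w = len(common) / len(union) if union else 0.0
        score[a] += w
        score[b] += w
    return dict(score)
```

A node that scores high on both measures (many contacts, with overlapping interests) would then be assigned the more popular data to cache.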
Likewise, certain pieces of information may be more frequently accessed or more important than
others. Moreover, certain pieces of information may be linked, i.e., often accessed together.
Current data dissemination schemes are generally data-centric, ignoring user interests or the
linkage of information. We will explore how to optimize the distribution of content from a given
repository given estimates of its popularity in a mobile network, taking into account the
underlying mobility and information linkages. T. Abdelzaher and A. Iyengar of INARC will
collaborate with us on this task. We plan to investigate solutions to this question using random-
walk and probabilistic gossip-based algorithms. We will also study user-centric data
dissemination in DTNs, which aims at forwarding data only to the interested nodes using the
minimum number of relays.
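The probabilistic, user-centric gossip direction mentioned above could look like the following minimal epidemic-style sketch, in which an item is forwarded only to nodes interested in its topic and the forwarding probability p bounds relay overhead. The names, the topic-filter rule, and the parameters are illustrative assumptions.

```python
import random

def interest_gossip(adj, interests, item, seed_node, p=0.5, rounds=10, rng=None):
    """Probabilistic gossip that forwards an item only toward nodes
    interested in its topic (user-centric dissemination). Returns the
    set of nodes holding the item after `rounds` rounds."""
    rng = rng or random.Random(0)
    topic = item["topic"]
    holders = {seed_node}
    for _ in range(rounds):
        new = set()
        for node in holders:
            for nb in adj[node]:
                # Forward with probability p, and only to interested
                # nodes, to bound the number of relays used.
                if nb not in holders and topic in interests[nb] and rng.random() < p:
                    new.add(nb)
        holders |= new
    return holders
```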
While prior work has examined various aspects of replication and cache consistency, none have
looked at QoI-aware cache management. For certain types of information, “freshness” may be
more important than low-latency retrieval. Consider, for example, that reports of traffic
conditions on a road may be much more valuable if they are generated within a few minutes of
being used, but are not sensitive to a few seconds of delay in retrieving them. In these cases it
may be important to refresh caches more frequently so that information stays fresh. In other
cases latency may be more
important; in these cases it may be important to cache information in more places and refreshing
may not be as urgent.
Traditionally, to address QoI parameters such as latency, hop counts are used as a driving metric.
For example, if the requesting node is within K hops (where K is a system parameter) of another
node that has cached the data, it will not cache the data; otherwise, it will. Hence, the same data
item is cached at least K hops apart. There is a tradeoff between access latency and data
accessibility. With a small K, the number of replicas for each data item is high, and the access
delay for these data will be low. However, with a fixed amount of cache space, the number of
distinct data items cached by the nodes becomes low. If there is a network partition, many nodes
may not be able to access these data items. On the other hand, a large K can increase the data
accessibility, but the number of replicas for each data item will be small and the access delay for
these data may be somewhat longer. Based on the application, K may take different values, and
we will examine the impact of K on the overall cache performance.
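The hop-count rule above can be sketched as follows, assuming the system parameter is a hop threshold (here called K) and that hop distances can be measured on a snapshot of the topology; the function names are hypothetical.

```python
from collections import deque

def hops_to_nearest_copy(adj, src, holders):
    """BFS hop count from src to the nearest node already caching the
    item; returns None if no copy is reachable (e.g., a partition)."""
    if src in holders:
        return 0
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nxt in adj[node]:
            if nxt in seen:
                continue
            if nxt in holders:
                return d + 1
            seen.add(nxt)
            frontier.append((nxt, d + 1))
    return None

def should_cache(adj, requester, holders, K):
    """Cache the item at the requester only if no existing replica is
    within K hops, so replicas end up at least K hops apart."""
    d = hops_to_nearest_copy(adj, requester, holders)
    return d is None or d > K
```

On a five-node line a-b-c-d-e with the item held only at a and K = 2, node c (2 hops away) would not cache, while node d (3 hops away) would.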
1. We will work with SCNARC models of social relationships in sharing information to
determine the impact of social-aware caching.
2. We will work with the INARC descriptions of information importance and relationships
to determine the impact of information-aware caching.
3. We will study the essential difference between multicast and unicast in DTNs, and
formulate relay selections for multicast as a unified knapsack problem by exploiting node
centrality and social community structures.
4. Given mixes of information and node mobility, we will determine bounds on caching
performance under QoI requirements.
Validation Approach
We will analyze and simulate the performance of the resulting algorithms using realistic
mobility models and underlying social and information networks. We will compare the
results to algorithms that only use communication network parameters or simple measures of
priority to drive cache management.
Research Products
In this year we will:
- Produce algorithms for in-network storage that increase OICC by leveraging underlying
social and information networks
- Report on bounds on caching performance given QoI requirements.
References
[1] M. Grossglauser and D. Tse, “Mobility increases the capacity of ad-hoc wireless networks,”
Proc. of IEEE INFOCOM 2001, April 22-26 2001.
[4] A. Chaintreau, P. Hui, J. Crowcroft, C. Diot, R. Gass, and J. Scott, “Impact of human mobility
on opportunistic forwarding algorithms,” IEEE Transactions on Mobile Computing, vol. 6,
no. 6, pp. 606–620, 2007.
[5] E. Daly and M. Haahr, “Social network analysis for routing in disconnected delay-tolerant
MANETs,” in MobiHoc ’07: Proceedings of the 8th ACM international symposium on Mobile
ad hoc networking and computing, pp. 32–40, 2007.
[6] P. Hui, J. Crowcroft, and E. Yoneki, “Bubble rap: Social based forwarding in delay tolerant
networks,” in MobiHoc’08: Proceedings of the 9th ACM international symposium on Mobile
ad hoc networking and computing, 2008.
[7] J. Ghosh, S. J. Philip, and C. Qiao, “Sociological orbit aware location approximation and
routing (SOLAR) in MANET,” Ad Hoc Netw., vol. 5, no. 2, pp. 189–209, 2007.
[8] J. Burgess, B. Gallagher, D. Jensen, and B. Levine, “Maxprop: Routing for vehicle-based
disruption-tolerant networks,” in INFOCOM’06: Proceedings of the 25th IEEE International
Conference on Computer Communications, 2006.
[9] L. Yin and G. Cao, “Supporting cooperative caching in ad hoc networks,” in IEEE
Transactions on Mobile Computing, vol. 5, pp. 77–90, 2006.
[10] G. Cao, “A scalable low-latency cache invalidation strategy for mobile environments,” in
IEEE Transactions on Knowledge and Data Engineering, vol. 15, 2003.
[11] W. Gao, Q. Li, B. Zhao, and G. Cao, “Multicasting in delay tolerant networks: A social
network perspective,” in ACM Mobihoc, 2009.
[12] J. Ahn and B. Krishnamachari, “Scaling laws for data-centric storage and querying in
wireless sensor networks,” IEEE/ACM Transactions on Networking, vol. 17, no. 4, pp. 1242-1255, 2009.
Task Overview
In this task we explore low complexity techniques related to scheduling to improve the QoI and
OICC in a network. We first seek to establish the existence of universal scheduling algorithms
that provide performance guarantees in the highly dynamic networks we are interested in. We
then characterize scheduling algorithms in terms of their tradeoffs, specifically delay vs. utility.
Task Motivation
Optimization theory plays a crucial role in designing networks. Traditionally, well known
optimization techniques are invoked in hopes of understanding certain traits or obtaining insights
to network design. We must recognize that for contemporary communication networks,
including tactical MANETs, traditional optimization approaches are not sufficient to address the
underlying dynamics which are more often than not unpredictable. In other words, they do not
treat dynamic situations where the quantities to be optimized are themselves changing due both
to online control decisions (such as resource allocation and routing) and to unpredictable events
(such as traffic bursts, fading channels, and mobility). Traditionally, Markov decision theory
and Dynamic Programming theory have been used to address time varying systems with known
probability models. However, such approaches have well known complexity explosion problems
when networks are large, and also typically require the underlying system to act according to a
given (and perhaps incorrect) probability rule.
Initial Hypotheses
We expect that under certain non-restrictive assumptions, we will be able to find optimal
scheduling disciplines which will lead us to a universal scheduling algorithm for QoI-aware
networks.
Technical Approach
It is important to develop low complexity and adaptive techniques for optimization of time
varying functions subject to time varying constraints. Such a theory should be informed by
existing convex programming, Markov decision, and exact (and approximate [18]) dynamic
programming theory, but should allow practical solutions to complex dynamic network
problems, including the problems that appear in the different tasks of this project. The emerging
field of stochastic network optimization provides auspicious results in this direction. However,
there are several gaps in state-of-the-art stochastic network optimization theory. First, the
existing theory still relies on structured probabilistic assumptions that may not be valid in actual
networks. Second, there is still much unknown in the area of fundamental delay tradeoffs when
different performance metrics are optimized. Our first-year activities address these gaps as follows:
1. We will develop new theories for handling uncertainty in networks with general
traffic, channel, and mobility dynamics. Of particular interest is the development of
“universal scheduling algorithms” that provide performance guarantees for time
varying networks with time varying constraints, without requiring a probability
model for the time variation.
Networks experience unexpected events. Links can fail, nodes can move, and traffic bursts can
bring congestion with unpredictable timescales and spatial locations. It is clear that perfect
knowledge of future events could dramatically improve network performance. For example,
knowledge of an upcoming failure at a primary link of a path could be used to preemptively re-
route data. Knowledge that, in the near future, a certain node is going to move into range of a
source and then immediately move into range of its intended destination can be used for
opportunistic relaying. Knowledge of an upcoming traffic flood can be used to mitigate its
detrimental impact. There are of course many more examples of complex sequences of arrival,
channel, and mobility events that, if known in advance, could be exploited to yield improved
performance. However, because realistic networks do not have knowledge of the future, it is not
clear if these events can be practically used. Further, even if full future information were known,
it is not clear how to optimize over the many combinatorial sequences of actions to exploit this
knowledge.
Existing theories of opportunistic scheduling and stochastic network optimization provide partial
solutions to this problem. Techniques of max-weight scheduling, backpressure, and Lyapunov
optimization can treat networks with random traffic, channels, and mobility, often without
knowing the underlying probability distributions associated with these random events. We have
contributed significantly to this area in both theory and practice (see, for example, [1] for
stochastic network optimization theory, [5] for optimal energy-delay tradeoff theory, [7] for
incorporation of new information processing capabilities, and [17] for low-delay implementation
of backpressure routing). However, the strongest known performance bounds assume structured
probabilistic models, such as i.i.d. processes for the traffic, channel, and mobility dynamics.
It is possible to extend these claims to allow more general ergodic assumptions on the underlying
stochastic processes, although the performance bounds degrade in proportion to the “mixing
times” of the ergodic processes. Further, it is possible to develop analytical claims concerning
non-ergodic systems, such as when traffic yields “instantaneous rates” that can vary arbitrarily
inside a network capacity region [1], or when “instantaneous capacity regions” can vary
arbitrarily but are assumed to always contain the traffic rate vector [3]. However, the prior non-
ergodic analyses still assume an underlying probability model, and make assumptions about
traffic rates and network capacity with respect to this model.
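For concreteness, the per-link decision of the max-weight/backpressure family [1][2] discussed above can be sketched as a single-slot rule: each link selects the commodity with the largest differential backlog, weighted by backlog difference times link rate. This is a minimal sketch with hypothetical names; a full max-weight scheduler would additionally select a non-interfering set of links maximizing the total weight.

```python
def backpressure_choice(queues, links):
    """One slot of the classic backpressure rule. For each link (a, b)
    with rate r, find the commodity c maximizing the differential
    backlog w = Q_a[c] - Q_b[c] and record (c, w * r) when w > 0.
    queues: {node: {commodity: backlog}}, links: {(a, b): rate}."""
    decisions = {}
    for (a, b), rate in links.items():
        best_c, best_w = None, 0
        for c, q in queues[a].items():
            w = q - queues[b].get(c, 0)
            if w > best_w:
                best_c, best_w = c, w
        if best_c is not None:
            decisions[(a, b)] = (best_c, best_w * rate)
    return decisions
```

Links whose best differential backlog is non-positive stay idle, which is what makes the rule throughput-optimal without knowledge of arrival statistics.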
What is missing is a universal scheduling theory that adapts to any network, without any
probabilistic assumptions. Such a theory should show how to compute the optimal performance
of a network in the ideal case when the full future is known, should incorporate general network
constraints, and should quantify the performance gap due to our lack of knowledge of the future.
The universal theory should also provide decision making strategies that track the ideal optimum
as much as possible, within the fundamental performance gap bounds. It is not obvious if such a
theory exists, and if any type of performance guarantees can be made without a probability
model. We believe that it is possible, and we propose to develop such a theory during the course
of this project.
Our optimism is informed by the fact that such universal algorithms exist in other areas. For
example, the universal Lempel-Ziv data compression algorithm operates on arbitrary files.
Universal stock portfolio allocation algorithms hold for arbitrary price sample paths
[12][13][14]. There are also network algorithms that have universal properties for limited types
of networks. These include competitive ratio approaches for wireline networks and for simple
classes of wireless networks without channel variation or mobility, as well as competitive ratio
and adversarial queueing theory approaches for scheduling in switching systems [15] and over time varying wireless
links [16]. We have also made important advances using competitive ratio analysis applied to
on-line streaming and computer switching policies [8][9], and to graph coloring problems
(related to interference problems in wireless networks) [10][11]. However, there is a significant
gap in our understanding of universal scheduling in MANETs.
Our first year goal for this task is to demonstrate the existence of universal scheduling algorithms
for wireless networks with general time varying dynamics in the traffic, channel, and mobility
processes. We shall use the “competitive ratio metric” and/or the related “T-slot lookahead”
metric to show how practical algorithms can be designed (without knowledge of the future) that
closely track the performance of ideal algorithms that have knowledge about the future. This will
be demonstrated by meeting the following milestones:
1. For simple networks, we will quantify the achievable performance of idealized
algorithms that have certain levels of knowledge about the future, using metrics related to
competitive ratios and/or T-slot lookahead.
The max-weight, virtual queue, and backpressure scheduling techniques of stochastic network
optimization are well known to optimize throughput metrics and to minimize average power
and/or meet specific average power constraints (see [1] and references therein). However, these
scheduling techniques can lead to large network delay, particularly for multi-hop networks.
Further, the fundamental delay tradeoffs when different performance metrics are optimized (such
as energy or throughput-utility) are known only for single-hop networks and limited classes of
multi-hop networks [4][5][19][20].
Two recent advances made by different members of our team suggest a possible dramatic
breakthrough in the area of network delay for multi-hop networks for different scheduling
disciplines. First, our work in [20] demonstrated a modified backpressure rule that achieves a
near-optimal delay tradeoff, dramatically improving the prior linear delay bound to a logarithmic
delay bound. However, the algorithm given in [20] requires prior knowledge of Lagrange
multiplier information that is time consuming to obtain in practice. It was not clear if this
modified backpressure approach could be made practical until the second recent advance: The
work in [17] demonstrated a successful implementation of diversity-based backpressure (related
to the Diversity Backpressure Routing Algorithm (DIVBAR) in [20]). The implementation
showed that backpressure routing beats existing tree-based or shortest path routing for diversity
scheduling. Further, a simple change to using Last-In-First-Out (LIFO) scheduling was shown to
yield a dramatic delay improvement (up to 98% in some cases). While it is not yet clear why
such a remarkable delay improvement is achieved, we believe that this change to LIFO is a
simple and practical way of implementing the modified backpressure rule of [20], without
knowing the Lagrange multipliers. This suggests that the dramatic 98% improvement seen in
experiments and the dramatic linear-to-logarithmic improvement in modified backpressure are
one-and-the-same. We propose to study this further, and this study may lead to important
discoveries about network delay and performance delay tradeoffs.
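The FIFO-versus-LIFO effect can be illustrated with a toy discrete-time queue; this is a sketch with assumed parameters, not the experiment of [17]. Because both disciplines are work-conserving, the queue-length process, and hence the mean delay implied by Little's law, is the same; LIFO instead concentrates the waiting on a few unlucky packets, so typical (median) delay collapses.

```python
import random
from collections import deque
from statistics import median

def sim_queue(discipline, p=0.9, slots=20000, seed=1):
    """Discrete-time single-server queue: one Bernoulli(p) arrival and
    at most one departure per slot. Returns per-packet delays (slots)."""
    rng = random.Random(seed)
    buf, delays = deque(), []
    for t in range(slots):
        if rng.random() < p:
            buf.append(t)          # record each packet's arrival slot
        if buf:
            born = buf.popleft() if discipline == "fifo" else buf.pop()
            delays.append(t - born)
    return delays

fifo = sim_queue("fifo")
lifo = sim_queue("lifo")
# Under LIFO most packets leave almost immediately; the same backlog
# (and hence the same mean delay) is borne by a few long-waiting packets.
```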
In addition to simple changes to LIFO scheduling, a third significant advance has shown that
incorporating information theoretic data manipulation can improve delay tradeoffs and can
significantly reduce complexity [6][7]. This holds not only for throughput optimization
problems, but also for problems of energy minimization. Energy-delay issues have been previously
explored without information theoretic data processing. A fundamental square-root energy-delay
tradeoff law is developed in [4] for a single wireless link, and this is extended in [5] for a multi-user
downlink. Our work in [6][7] looks at energy-delay problems in simple multi-hop network
models. This work develops algorithms that operate with different levels of source cooperation
and availability of queue state information at the individual sources. Our preliminary results
showed that limited feedback of only one bit of queue information suffices to approach the
optimal cost while providing low average packet delay. The results illustrate new energy-delay
tradeoffs based on different levels of cooperation and queue information availability. As future
work in this direction, we aim to extend these results to more general multi-hop dynamic networks.
These problems of network delay are known to be notoriously difficult, and new breakthroughs
can have significant impact. Our first year goals in this area are summarized below:
1. We will provide a mathematical foundation for delay analysis of the LIFO based
backpressure rule, for simple classes of networks. This will improve our
understanding of network delay, explain the 98% improvement observed in practice,
and may suggest practical algorithms for dramatic delay improvement in more
complex networks.
2. We will develop new energy/delay tradeoffs for multi-hop networks, possibly
leveraging information theoretic results. We will also begin to explore how such
tradeoffs can be used in more general cost-QoI tradeoffs.
Our universal scheduling algorithms and delay analysis methods should be general enough to
apply to dynamic networks that must perform complex tasks with general QoI metrics. The
particular problems and QoI metrics of interest are described in their respective task areas. We
will be intentional about building bridges between the different tasks of this project and the
optimization theory that might be used for them. This includes extending the optimization
theory, asking if the extended theory can be used in the desired problem, and iterating on this
process to provide a meaningful and useful collection of results. This iteration process is most
applicable to years 2 and beyond. However, in year 1 we seek to present preliminary
representative examples of networks with extended functionality and QoI metrics that can be
incorporated into the optimization paradigm.
Validation Approach
For universal scheduling we will prove that an optimal scheduling discipline exists, or determine
bounds. We will also determine the complexity of the algorithm. We will analyze and simulate
specific algorithms and compare their performance to the bounds on optimality that we have
derived.
Research Products
This year we will:
- Report on the achievable performance of idealized algorithms that have certain
levels of knowledge about the future, using metrics related to competitive ratios
and/or T-slot lookahead.
References
[1] L. Georgiadis, M. J. Neely, and L. Tassiulas, “Resource Allocation and Cross-Layer Control
in Wireless Networks,” Foundations and Trends in Networking, vol. 1, no. 1, pp. 1-149,
2006.
[2] L. Tassiulas and A. Ephremides, “Stability Properties of Constrained Queueing Systems and
Scheduling Policies for Maximum Throughput in Multihop Radio Networks,” IEEE
Transactions on Automatic Control, vol. 37, no. 12, pp. 1936-1949, Dec. 1992.
[3] M. J. Neely and R. Urgaonkar, “Cross layer adaptive control for wireless mesh networks,”
Ad Hoc Networks (Elsevier), vol. 5, no. 6, pp. 719-743, August 2007.
[4] R. Berry and R. G. Gallager. Communication over Fading Channels with Delay Constraints.
IEEE Transactions on Information Theory, 48(5):1135–1149, May 2002.
[5] M. J. Neely. Optimal Energy and Delay Tradeoffs for Multi-User Wireless Downlinks.
IEEE Transactions on Information Theory, 53(9), September 2007.
[7] E. N. Ciftcioglu, Y. E. Sagduyu, R. Berry, and A. Yener, Cost Sharing with Network Coding
in Two-Way Relay Networks, In Proc. of the 47th Annual Allerton Conference on
Communication, Control, and Computing, Allerton'09, Monticello, IL, September 2009.
[8] A. Bar-Noy and R. Ladner, “Competitive On-Line Stream Merging Algorithms for Media-on-Demand,”
Journal of Algorithms (JALG), 48(1):59-90, August 2003.
[9] A. Bar-Noy, A. Freund, S. Landa, and J. Naor, “Competitive On-Line Switching Policies,”
Algorithmica, 36(3):225-247, May 2003.
[10] A. Bar-Noy, P. Cheilaris, and S. Smorodinsky, “Conflict-Free Coloring for Intervals: from
Offline to Online,” ACM Transactions on Algorithms (TALG), 4(4):44:1-44:18, 2008.
[12] T. M. Cover, "Universal Portfolios," Mathematical Finance, vol. 1, no. 1, pp. 1-29, Jan.
1991.
[14] M. J. Neely, "Stock Market Trading via Stochastic Network Optimization," ArXiv Technical
Report, arXiv:0909.3891v1, Sept. 2009.
[15] M. Andrews, "Maximizing Profit in Overloaded Networks," Proc. IEEE INFOCOM, March
2005.
[19] M. J. Neely, "Super-Fast Delay Tradeoffs for Utility Optimal Fair Scheduling in Wireless
Networks," IEEE Journal on Selected Areas in Communications (JSAC), Special Issue on
Nonlinear Optimization of Communication Systems, vol. 24, no. 8, pp. 1489-1501, Aug.
2006.
[20] L. Huang and M. J. Neely, "Delay Reduction via Lagrange Multipliers in Stochastic
Network Optimization," Proc. of 7th Intl. Symposium on Modeling and Optimization in
Mobile, Ad Hoc, and Wireless Networks (WiOpt), June 2009.
This project has close ties to the other active project within CNARC as well as both CCRIs and
projects within the other centers.
Within CNARC, this project takes output from C.1. By understanding the models of OICC and
the factors that impact them, we will focus on these phenomena within C.2.
We will work with INARC on leveraging in-network storage. The collaboration will be
specifically on C.2.2.
This project requires output from EDIN. This project will use the mobility models from EDIN
as well as the models that relate the interaction of social, information and communication
networks in terms of change in structure.
Research Milestones
Budget By Organization
Organization           Government Funding ($)    Cost Share ($)
CUNY (CNARC)                           98,000                --
PSU (CNARC)                           256,930           114,000
UCSC (CNARC)                          110,000            48,187
USC (CNARC)                           147,000                --
TOTAL                                 611,930           162,187
In this project we propose, based on the results of our modeling and experimentation, to analyze
the aspects of protocol structure that limit achieving the theoretical QoI.
Our approach is to undertake an iterative process whereby experimental observations drive the
modeling and protocol analysis. By iterating on this process we can determine how close to
the optimal QoI a network can achieve, how to do it, and gain more insight into the fundamental
models characterizing the limits of the network.
While we cannot know for sure the results of our modeling and experimentation at this time, our
intuition leads us to suspect that certain characteristics of protocols are likely to lead to
limitations. Foremost, because of mobility and uncertainty present in mobile tactical networks,
signaling overhead tends to become a dominant factor in performance degradation. Virtually all
protocols require the exchange of some information to operate; typically the more “optimal” the
algorithm the protocol attempts to implement, the more information, and hence signaling, it
requires.
We consider all military network use cases, including troop movement, information gathering,
and mission planning. We also consider all military communications environments:
multi-hop wireless, dynamic conditions, high mobility, heterogeneous nodes and heterogeneous
traffic.
References
[2] A. Boukerche, Algorithms and Protocols for Mobile Ad Hoc Networks. Wiley Series on
Parallel and Distributed Computing, 2008.
[3] F. Anjum and P. Mouchtaris, “Security for wireless ad hoc networks.” John Wiley, 2007.
Table of Contents
The following sections show different views of the budget. Section 11.1 shows the
Consortium’s budget sorted by Center; Section 11.2, by project; Section 11.3, by
institution. These first three sections are shown at the project level. Section 11.4 shows
the IRC’s budget, at task level, sorted by type of funding (6.1/6.2). Either 6.1 or 6.2
dollars fund each task in that section. (Note that some projects have both 6.1 and 6.2
tasks). Section 11.5 shows cost-share data. Within each view, the respective subtotals
are included.
Note that this table is shown at the task level; any given task is either 6.1 or 6.2.
At the end of five years, the NS CTA must substantially succeed at achieving this goal. This
chapter presents a high-level roadmap of how the NS CTA will pursue this goal over the next
five years.
This chapter is structured as a series of sketches for each of the next five years. Each year begins
with a goal for the year followed by a brief discussion of how we expect to achieve that goal.
The discussion necessarily becomes more speculative in later years but should always convey the
essential research work of the year.
Each year's sketch highlights a key task or project in each of the two major cross-cutting
research initiatives: Evolving Dynamic Integrated (Composite) Networks (EDIN) and Trust in
Distributed Decision Making (“Trust” for short). It then highlights other key tasks and projects
for the year, divided into three broad groupings:
1. Enabling efforts are tasks or projects whose results in the current year will be important
inputs or underpinnings to efforts in the following years. Substantial progress on these
efforts is critical to success in later years.
2. Enriching efforts are tasks or projects that seek to give research results greater depth.
Examples include applying research results to a new domain (e.g., a different type of
network) or combining and expanding research results to create a richer ability to model
or predict. Enriching efforts expand the power of the NS CTA’s research results.
3. Expeditions are tasks or projects that seek to get ahead of the core research. An example
expedition might examine merged information and cognitive networks by emulating two
merged networks that we cannot yet accurately model, then using the results of that
emulation to inform research on models. Expeditions are vital to spur progress by
providing insight into problems yet to be solved.
12.2 Year 1
The goal for Year 1 is to begin a robust research program and to create infrastructure that will
be used to begin validating research results as soon as they become available (no later than
Year 2).
EDIN: Task E1.1 (Harmonized Vocabulary and Ontology) is a critical task for Year 1: it will
make initial progress on ontologies so that all the projects in EDIN can use the same terms
to mean the same concepts, and can map research insights derived in one arena more readily
into other research areas. Project E2 will investigate a broad range of mathematical models to
characterize dynamic composite networks.
Trust: A critical task in this year is Task T1.1 (Unified models and metrics of Trust). This task
will create a firm foundation both for the Trust CCRI per se and for many other NS CTA
areas of focus; for example, by providing crucial trust research input into the many issues
surrounding QoI (Quality of Information).
Enabling Efforts: Broadly speaking, the CCRI Task E1.1 highlighted above will enable more
than just EDIN: it will provide the intellectual framework for collaboration and synergy
across the CTA. Another key dimension of enabling efforts is Project R3 (Experimentation
with Composite Networks), which is focused on creating the infrastructure for validation
and for shared experimentation, as well as creating the fundamental scientific understanding
required in order to effectively specify, measure, and validate NS CTA 6.1 and 6.2 research.
Enriching Efforts: Among many examples, Task S2.3 (Community formation and dissolution in
social networks) provides a strong example of how NS CTA research goes beyond
conventional detection and analysis of social networks to address the counterbalancing
question of the dissolution of communities in social networks – a process that inherently
involves understanding the information and communications networks sustaining these
communities. Similarly, the three tasks in Project C2 (Characterizing the Increase of QoI
due to Networking Paradigms) each investigate specific elements of a suite of network
paradigms (concurrency, in-network storage, and scheduling) that together span the major
options for network impact on QoI.
Expeditions: Two notable examples of “expeditions” in Year 1 are highlighted here. Task I1.3
(Modeling Uncertainty in Heterogeneous Information Network Sources) studies the
uncertainty that results from the fusion of different kinds of networked data: this work will
provide new insight into the downside of combining classes of information, whose
uncertainties ultimately impact what can be achieved by, for example, network metrics. If
the approach is validated by this research, it will suggest further extensions to a greater
variety of information across all genres of networks. In the same exploratory spirit, the
research in Task R2.2 (Impact of Information Loss and Error on Social/Cognitive Networks)
provides an early investigation of one of the key drivers of cross-genre network interactions:
measuring, reasoning about, and forecasting the impact of information loss in
communication networks and information error in information networks on the structure and
performance of socio-cognitive networks.
12.3 Year 2
The goal for Year 2 is to begin pairwise evaluation of merged network models (e.g. social
networks with communications networks, cognitive networks with information networks) both
from a theoretical perspective and in experiments in the experimental infrastructure.
EDIN: In addition to continuing various research threads, we anticipate utilizing the initial
research in mathematical models (E2.1, E2.2, E2.3), and both human and metric-driven
mobility modeling (E4.1 and E4.2 respectively) for developing a better understanding of the
dynamic behaviors of composite networks in project E3 (both short-term and longer-term
evolution).
Trust: We currently anticipate that research in trust propagation, including the research begun in
tasks T1.3 (Cognitive Models of Trust in Human-Information, Human-Human, Human-
Agent Interactions), T2.1 (Interaction of trust with the network under models of trust as a
risk management mechanism), and T3.1 (Distributed Oracles for Trust), will enable fruitful
research attacks on the propagation of trust properties across the boundaries between two
genres of networks (such as loss of trust at the information level affecting trust at the social
network level).
Enriching Efforts: Project C3 (Achieving QoI Optimal Networking) will extend the exploitation
of QoI to the analysis of optimal protocols for achieving QoI in composite networks, where
both the underlying communications networks and driving social networks are subject to
rapid evolution.
Expeditions: Task R1.2 (Advanced Mathematical Models) has the potential to open up new
conceptual approaches to controlling co-evolving composite networks using economically-
principled techniques, such as perturbing the decision problems of network actors to change
behaviors in useful ways, and bringing economic control paradigms to bear with passive
methods that do not insist on active elicitation of the preferences of participants. If this
economically-principled exploration proves fruitful in Year 1, we envision its Year 1
approaches informing and ultimately merging with EDIN. In this case, in Year 2, this task
may extend its exploration to investigate market-design approaches to intermediate resource
allocation across different network genres and across competing as well as cooperating
networks.
12.4 Year 3
The twin goals for Year 3 are (1) to begin evaluation of complete merged network models (e.g.
social networks, communications networks, cognitive networks and information networks) both
from a theoretical perspective and in experiments in the experimental infrastructure and (2) to
complete the initial trust model such that it is ready for extensive validation in Year 4.
EDIN: TBD
Trust: TBD
Expeditions: TBD
12.5 Year 4
The twin goals for Year 4 are to consolidate the results of Year 3 and to demonstrate the new
trust model.
If Year 3’s work goes according to plan, we should have a body of verification results that
highlight the strengths and limitations of our combined models. Furthermore, we will have
completed work on ontologies and metrics and, as a community, will be consistently using the
same terminology. The first part of the year is therefore a perfect time to collectively
re-examine our work, and the second part of the year to drive it to new heights.
Similarly, after three years of hard work, the trust model should be mature enough for extensive
testing and demonstration.
EDIN: TBD
Trust: TBD
Enriching Efforts: By Year 4, we envision that the extensive research on Quality of Information
(QoI) will enable QoI to be adapted to a wide range of network genres and research efforts,
providing a unifying theme to many previously disparate research issues (such as design
techniques for optimized composite networks).
Expeditions: TBD
12.6 Year 5
The twin goals for Year 5 are to demonstrate our ability to use complete merged network models
to optimize networks for a simple military mission with simple mission metrics and to
demonstrate the use of network models over a broader domain of problems.
EDIN: TBD
Trust: TBD
Expeditions: TBD