Visualizing Online Social Networks

Unit II
MODELING AND VISUALIZATION
IFETCE\M.E CSE\III SEM\NE7012-SNA\UNIT 2-PPT

Topics Covered
Visualizing Online Social Networks

A Taxonomy of Visualizations
Graph Representation
Node-Edge Diagrams
Visualizing Social Networks with Matrix-Based Representations
Node-Link Diagrams
Hybrid Representations
Modelling and aggregating social network data
Random Walks and their Applications
Use of Hadoop and Map Reduce
Ontological representation of social individuals and relationships
GRAPH REPRESENTATION

Graph Theory: Definitions
Node degree: number of edges incident to the node.

Graph density:
The density of an undirected graph with N nodes and E edges is $D = \frac{2E}{N(N-1)}$.
The density of a directed graph is $D = \frac{E}{N(N-1)}$.
Path length: number of edges in the sequence that a walk follows.
Component size: number of nodes in a connected component of a graph.
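These definitions can be checked on a toy graph. Below is a minimal Python sketch. It assumes an undirected graph whose nodes are numbered 0..N-1 and whose edges are given as a list of pairs. The function names and the example graph are illustrative and do not come from the slides.

```python
from collections import deque

def degrees(n, edges):
    """Degree of each node of an undirected graph with nodes 0..n-1."""
    deg = [0] * n
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def density(n, edges, directed=False):
    """2E / (N(N-1)) for undirected graphs, E / (N(N-1)) for directed ones."""
    possible = n * (n - 1)
    return (len(edges) if directed else 2 * len(edges)) / possible

def component_sizes(n, edges):
    """Number of nodes in each connected component, found by breadth-first search."""
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    sizes = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        queue, size = deque([start]), 0
        while queue:
            node = queue.popleft()
            size += 1
            for nb in adj[node]:
                if not seen[nb]:
                    seen[nb] = True
                    queue.append(nb)
        sizes.append(size)
    return sizes

# Hypothetical toy network: a triangle 0-1-2 plus a separate pair 3-4.
edges = [(0, 1), (1, 2), (0, 2), (3, 4)]
print(degrees(5, edges))          # [2, 2, 2, 1, 1]
print(density(5, edges))          # 2*4 / (5*4) = 0.4
print(component_sizes(5, edges))  # [3, 2]
```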
Centrality
One of the key applications in social networks is to identify the most important or central nodes in the network.
Centrality gives a rough indication of the social power of a node, based on how well it connects the network.
Three popular individual centrality measures:
Degree centrality
Betweenness centrality
Closeness centrality
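The three measures are named here without formulas. The sketch below uses one common set of conventions, which are assumptions since other normalizations are also in use:

- degree centrality is the degree divided by N-1;
- closeness centrality is N-1 divided by the sum of shortest-path distances;
- betweenness centrality uses Brandes' algorithm and is left unnormalized.

The code assumes an unweighted, undirected, connected graph given as a dict of neighbour lists.

```python
from collections import deque

def bfs_distances(adj, source):
    """Shortest-path (hop) distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def degree_centrality(adj):
    n = len(adj)
    return {u: len(adj[u]) / (n - 1) for u in adj}

def closeness_centrality(adj):
    # Assumes a connected graph, so every distance sum is positive.
    n = len(adj)
    return {u: (n - 1) / sum(bfs_distances(adj, u).values()) for u in adj}

def betweenness_centrality(adj):
    """Brandes' algorithm: for each source, count shortest paths (sigma) with
    BFS, then accumulate pair dependencies (delta) in reverse BFS order."""
    bc = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, pred = [], {u: [] for u in adj}
        sigma = dict.fromkeys(adj, 0)
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    pred[v].append(u)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for u in pred[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair was counted once from each endpoint.
    return {u: b / 2 for u, b in bc.items()}

# Hypothetical star network: actor 0 is linked to actors 1, 2 and 3.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
print(degree_centrality(star))       # hub 1.0, leaves 1/3
print(closeness_centrality(star))    # hub 1.0, leaves 0.6
print(betweenness_centrality(star))  # hub 3.0, leaves 0.0
```

In the star, the hub scores highest on all three measures. Every path between two leaves passes through it, which is what betweenness captures.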
Clustering
Many social networks contain subsets of nodes that are highly connected within the subset and have relatively few connections to nodes outside it.
The nodes in such subsets are likely to share some attributes
and form their own communities.
Clustering coefficient: measures the degree to which nodes in a graph tend to cluster together. The local clustering coefficient of a node is the fraction of pairs of its neighbours that are themselves connected.
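The local clustering coefficient can be computed directly from this definition. Below is a minimal sketch, assuming an undirected graph stored as a dict of neighbour sets. Reporting 0 for nodes of degree below 2 is a convention chosen here, because the coefficient is undefined for them.

```python
from itertools import combinations

def local_clustering(adj, node):
    """Fraction of pairs of neighbours of `node` that are themselves linked."""
    nbrs = adj[node]
    k = len(nbrs)
    if k < 2:
        return 0.0  # undefined for degree < 2; reported as 0 by convention
    links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
    return links / (k * (k - 1) / 2)

def average_clustering(adj):
    return sum(local_clustering(adj, u) for u in adj) / len(adj)

# Hypothetical graph: triangle 0-1-2 with a pendant node 3 attached to 0.
adj = {0: {1, 2, 3}, 1: {0, 2}, 2: {0, 1}, 3: {0}}
print(local_clustering(adj, 0))  # 1 linked pair out of 3
print(local_clustering(adj, 1))  # 1.0
print(average_clustering(adj))   # (1/3 + 1 + 1 + 0) / 4
```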
NODE-EDGE DIAGRAMS
Node-Edge Diagrams
A node-edge diagram is an intuitive way to visualize social networks.
With node-edge visualization, many network analysis tasks, such as component size calculation, centrality analysis, and pattern sketching, can be presented in a more straightforward manner.
Many node-edge layouts have been proposed to place the nodes so that users can clearly recognize the structure of the social network.
Random Layout
A random layout places the nodes at random geometric locations in the graph.
A random layout algorithm can draw the social network graph efficiently, in linear time, O(N).
It is therefore usable for visualizing very large network graphs.
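The linear cost comes from a single pass over the nodes, with no work per edge. Below is a minimal sketch, assuming positions in a width × height box. The function name and parameters are illustrative.

```python
import random

def random_layout(nodes, width=1.0, height=1.0, seed=None):
    """Give every node an independent uniform random position: O(N) time."""
    rng = random.Random(seed)
    return {node: (rng.uniform(0, width), rng.uniform(0, height)) for node in nodes}

pos = random_layout(range(100_000), seed=42)  # one pass over the nodes
print(len(pos))
```

Edges are ignored entirely, which is why the method is fast. It is also why it reveals little of the network's structure.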
Force-Directed Layout
A force-directed layout, also known as a spring layout, simulates the graph as a virtual physical system.
In a force-directed layout, the edges act as springs and the nodes act as repelling objects.
Hence, there is an attractive force (like a spring) along each edge and a repulsive force (like magnetic repulsion) between every pair of nodes in the graph.
The running cost of a force-directed layout is much higher than that of a random layout: a naive implementation computes the repulsion between every pair of nodes, which is O(N²) work per iteration.
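The spring model can be made concrete with a simplified version of the Fruchterman-Reingold scheme, one well-known force-directed method. It is a sketch, not necessarily the layout used by the tools discussed here. The ideal edge length k, the starting temperature and the cooling factor are illustrative choices.

```python
import math
import random

def force_directed_layout(nodes, edges, iterations=200, seed=0):
    """Simplified Fruchterman-Reingold spring layout, starting from random positions."""
    rng = random.Random(seed)
    pos = {v: [rng.random(), rng.random()] for v in nodes}  # random start
    k = math.sqrt(1.0 / len(pos))   # ideal edge length
    temp = 0.1                       # max displacement per step; cools down
    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in pos}
        # Repulsion between every pair of nodes: the O(N^2) part.
        for u in pos:
            for v in pos:
                if u == v:
                    continue
                dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
                d = max(math.hypot(dx, dy), 1e-9)
                f = k * k / d
                disp[u][0] += dx / d * f
                disp[u][1] += dy / d * f
        # Spring attraction along each edge pulls its endpoints together.
        for u, v in edges:
            dx, dy = pos[u][0] - pos[v][0], pos[u][1] - pos[v][1]
            d = max(math.hypot(dx, dy), 1e-9)
            f = d * d / k
            disp[u][0] -= dx / d * f
            disp[u][1] -= dy / d * f
            disp[v][0] += dx / d * f
            disp[v][1] += dy / d * f
        # Move each node along its net force, capped by the temperature.
        for v in pos:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temp)
            pos[v][0] += dx / d * step
            pos[v][1] += dy / d * step
        temp *= 0.98  # cooling schedule
    return {v: tuple(p) for v, p in pos.items()}

layout = force_directed_layout(range(4), [(0, 1), (1, 2), (2, 3)])
print(layout)
```

The nested repulsion loop is the quadratic cost mentioned above. Implementations aimed at large graphs commonly approximate it, for instance with Barnes-Hut-style spatial subdivision.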
[Figure: Node-link graph layouts for social networks: a random geometric layout and a force-based graph layout]
Tree Layout
A basic tree layout chooses a node as the root of the tree; the nodes connected to the root become children of the root node.
Nodes two levels away from the root become its grandchildren, and so on.
A tree layout can display more structure than general graph layouts because it takes more contextual information (the hierarchy below the root) into account.
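The root, children and grandchildren levels fall out of a breadth-first search from the chosen root. Below is a minimal sketch. It uses BFS depth as the vertical level and spreads nodes evenly within each level. The actor names are hypothetical, and edges that do not belong to the BFS tree (here cat-dan) are simply not used for placement.

```python
from collections import deque

def tree_layout(adj, root):
    """Level-based tree layout: BFS depth gives y, order within a level gives x."""
    depth = {root: 0}
    levels = [[root]]
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in depth:            # first visit: v becomes a child of u
                depth[v] = depth[u] + 1
                if depth[v] == len(levels):
                    levels.append([])
                levels[depth[v]].append(v)
                queue.append(v)
    pos = {}
    for d, level in enumerate(levels):
        for i, v in enumerate(level):
            pos[v] = ((i + 1) / (len(level) + 1), -d)  # root on top
    return pos

adj = {"ann": ["bob", "cat"], "bob": ["ann", "dan"],
       "cat": ["ann", "dan"], "dan": ["bob", "cat"]}
pos = tree_layout(adj, "ann")
print(pos)  # ann at the top, bob and cat one level down, dan two levels down
```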
[Figure: Three kinds of tree layouts for social network visualization: a hyperbolic tree view, a radial tree layout, and an H3 view]
VISUALIZING SOCIAL NETWORKS WITH MATRIX-BASED REPRESENTATIONS
Matrix Representation
A social network graph can be transformed into a simple Boolean matrix whose rows and columns represent the vertices of the graph.
The Boolean values in the matrix can be replaced with valued attributes associated with the edges to provide more informative network visualizations.
The matrix-based representation of graphs offers an alternative to traditional node-edge diagrams.
With a matrix-based representation, clusters and associations among the nodes remain easier to discover as the number of nodes increases.
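The Boolean matrix and its valued variant can be produced from an edge list in a few lines. In the sketch below, the node names and the "contact count" weights are hypothetical. The `weighted` flag switches each cell from 1 to the edge's attribute.

```python
def adjacency_matrix(nodes, edges, weighted=False, directed=False):
    """Rows and columns are the nodes; cell (i, j) records the edge i -> j."""
    index = {v: i for i, v in enumerate(nodes)}
    n = len(nodes)
    m = [[0] * n for _ in range(n)]
    for edge in edges:
        u, v = edge[0], edge[1]
        value = edge[2] if weighted else 1   # Boolean 1 or an edge attribute
        m[index[u]][index[v]] = value
        if not directed:
            m[index[v]][index[u]] = value
    return m

def show(nodes, m):
    """Print the matrix with row and column labels."""
    print("    " + " ".join(f"{v:>3}" for v in nodes))
    for v, row in zip(nodes, m):
        print(f"{v:>3} " + " ".join(f"{x:>3}" for x in row))

nodes = ["A", "B", "C", "D"]
edges = [("A", "B", 5), ("B", "C", 2), ("A", "C", 1), ("C", "D", 3)]  # hypothetical contact counts
show(nodes, adjacency_matrix(nodes, edges))                 # Boolean view
show(nodes, adjacency_matrix(nodes, edges, weighted=True))  # valued view
```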
[Figure: Matrix visualization]
Enhanced matrix-based representation
MatrixExplorer was developed to visualize social networks with a dual representation: it provides users with two synchronized representations of the same network, a matrix and a node-edge diagram.
When a social network is composed of highly interlaced edges, the matrix-based view can help users quickly recognize the associations between nodes.
A matrix-based visualization can compensate for the shortcomings of a node-edge diagram and so improve social network visualization.
[Figure: Matrix-based representation of MatrixExplorer: the matrix-based view and the reordered matrix]
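Reordering the rows and columns of the matrix tends to move related nodes next to each other, so clusters appear as dense blocks near the diagonal. The reordering methods used by MatrixExplorer are not given here. As a simple stand-in, this sketch uses a Cuthill-McKee-style breadth-first ordering. The two-triangle network is hypothetical.

```python
from collections import deque

def bfs_order(adj):
    """Cuthill-McKee-style ordering: BFS from a lowest-degree node, visiting
    neighbours by increasing degree (ties broken by name for determinism)."""
    def key(v):
        return (len(adj[v]), v)
    order, seen = [], set()
    for start in sorted(adj, key=key):
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in sorted(adj[u], key=key):
                if v not in seen:
                    seen.add(v)
                    queue.append(v)
    return order

def matrix(adj, order):
    """Boolean adjacency matrix with rows and columns in the given order."""
    return [[1 if v in adj[u] else 0 for v in order] for u in order]

# Hypothetical network: triangles {a, c, e} and {b, d, f} joined by the edge e-f.
adj = {"a": {"c", "e"}, "b": {"d", "f"}, "c": {"a", "e"},
       "d": {"b", "f"}, "e": {"a", "c", "f"}, "f": {"b", "d", "e"}}
print("alphabetical:")
for row in matrix(adj, sorted(adj)):
    print(row)
order = bfs_order(adj)
print("reordered:", order)
for row in matrix(adj, order):
    print(row)
```

In alphabetical order the two triangles are interleaved. After reordering, each triangle forms a 3×3 block on the diagonal.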
[Figure: Web site example]
Advantages of matrices
Matrices are a good representation for initiating an exploration.
They do not suffer from node overlapping.
They do not suffer from links crossing each other; therefore they are a viable alternative for dense networks.
Because matrices show all possible pairs of vertices, they can highlight the lack of connections and also the directedness of the connections.
NODE-LINK DIAGRAMS

Node-Link Diagrams
The principle of node-link diagrams is to represent the actors of the network graphically by nodes and their connections by links.
Node-link diagrams are the most commonly used representation of graphs and networks.
Advantages of node-link diagrams
These representations are familiar to a wide audience; they are a powerful communication tool.
For small or sparse networks, node-link diagrams are more effective than matrices.
For a compact representation, node-link diagrams are a better choice.
When the analysis requires performing path-related tasks, node-link diagrams are more appropriate.
Scaling to Larger Networks
Scaling to large networks with several thousand or even millions of nodes remains a challenge.
Solutions
Reducing the quantity of information by filtering or aggregating data
Representing a subset of the network and exploring it incrementally
Providing more visual space to represent the graph
Using an alternative representation
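The first solution, filtering or aggregating data, can be illustrated with two simple operations on an edge list. Below is a sketch of one possible filter (a minimum-degree threshold) and one possible aggregation (collapsing nodes into predefined groups). Both are illustrative choices, and the group labels are hypothetical.

```python
from collections import Counter

def filter_by_degree(edges, min_degree):
    """Filtering: keep edges whose endpoints both have degree >= min_degree.
    This is a single pass; repeating it on the result until nothing changes
    leaves the edges of the k-core with k = min_degree."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return [(u, v) for u, v in edges if deg[u] >= min_degree and deg[v] >= min_degree]

def aggregate(edges, group):
    """Aggregation: collapse nodes into groups; a meta-edge's weight is the
    number of original edges between two groups (internal edges are dropped)."""
    meta = Counter()
    for u, v in edges:
        gu, gv = group[u], group[v]
        if gu != gv:
            meta[tuple(sorted((gu, gv)))] += 1
    return dict(meta)

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5)]   # hypothetical network
group = {1: "X", 2: "X", 3: "Y", 4: "Y", 5: "Y"}   # hypothetical communities
print(filter_by_degree(edges, 2))  # drops (4, 5): node 5 has degree 1
print(aggregate(edges, group))     # {('X', 'Y'): 2}
```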
Matrix vs. Node-Link
Matrix advantages: usable without reordering; no node overlapping; no edge crossing; readable for dense graphs; fast navigation; fast manipulation; usable interactively; more readable for some tasks
Matrix disadvantages: less familiar; uses more space; weak for path-following tasks
Node-link advantages: familiar; compact; more readable for path following; more effective for small graphs; more effective for sparse graphs
Node-link disadvantages: useless without layout; node overlapping; edge crossing; not readable for dense graphs; manipulation requires layout computation
HYBRID REPRESENTATIONS

Hybrid Representations
Hybrid representations aim to minimize the display space required and to limit the cognitive cost of switching between representations.
Two hybrid representations:
Augmenting matrices
Merging matrix and node-link diagram
The goal of these hybrids is to augment one representation to overcome its drawbacks and enrich it with the advantages of the other one.
[Figure: Example hybrid representations: augmenting matrices; merging matrix and node-link diagram]
Visualizing Online Social Networks
Visualization of online social networks can be categorized into three types by their social relationships:
user-centric visualization
content-centric visualization
hybrid visualization
1. User-centric visualization
Presents various characteristics of actors and helps explore different subjects and relationships of interest.
For example, it can help discover individuals and communities that meet the following expectations:
actors or groups with similar/complementary features
key actors or those with high social impact
actors with popular interpersonal relationships or active social interactions
User-centric visualizations are widely used to help people access their own social networks and discover the social networks they are interested in.
2. Content-centric visualization
Various kinds of content can be presented to help people analyze social networks.
For example, at least the following three sorts of social network content can be displayed with content-centric visualization:
1. the distribution of user opinions, including key opinions and controversial comments
2. user opinions with high impact on their social communities, especially the period during which that impact is effective
3. the relations among different content groups
3. Hybrid visualization
Hybrid visualization presents social networks from several kinds of attributes at once, e.g., both people and content.
In particular, online social activities such as email and dating services usually include both kinds of elements.
VISUALIZING ONLINE SOCIAL NETWORKS
Visualizations of online social networks
These visualizations are developed according to the social attributes of each kind of network, to present its network structure.
Web communities
Email groups
Digital libraries
Web 2.0 services
1. Web communities
Different social network services were created on the Web to help people maintain their social relationships.
The SixDegrees.com website was an early representative.
In 2003, Club Nexus was established.
Vizster appeared in 2005, with customized techniques to visualize social relationships and community structures.
A project called FOAF (Friend-of-a-Friend) is based on Semantic Web social metadata.
More recently, Microsoft Research Asia proposed a novel object-level search service called EntityCube.
2. Email groups
To analyze the social structures of daily email activity, visualization techniques are employed to explore different patterns.
Examples
Soylent visualization
Social Network Fragments (SNF) and PostHistory
Themail
3. Digital libraries
In digital libraries, social networks can be analyzed mainly from two aspects: authors and writings.
Co-authorship networks
Characteristics such as the clustering coefficient and the average path length can be analyzed.
Co-citation relations
Co-citation social networks can be formed from the continuously accumulating publications.
With proper visualization of co-citation networks, documents with high impact or similar citation patterns can be identified immediately.
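A co-citation network can be built from reference lists. Two documents are co-cited each time a third paper cites both of them, and the count becomes the weight of the edge between them. The sketch below computes these weights. The paper and document identifiers are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cocitation_counts(references):
    """references maps each paper to the list of documents it cites.
    Returns the number of papers citing each (unordered) pair of documents."""
    counts = Counter()
    for cited in references.values():
        for a, b in combinations(sorted(set(cited)), 2):
            counts[(a, b)] += 1
    return counts

# Hypothetical citation data.
references = {
    "P1": ["D1", "D2", "D3"],
    "P2": ["D1", "D2"],
    "P3": ["D2", "D3"],
}
counts = cocitation_counts(references)
print(counts.most_common())
```

Pairs with high counts are the edges that a co-citation visualization would draw most strongly.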
4. Web 2.0 services
Many Web 2.0 applications, such as Twitter and Facebook, are popularly used to connect people's social networks.
For example, Twitter provides users with convenient functionality to share up-to-date status updates with their followers.
Nexus is a visualization application for Facebook communities that illustrates their large network graphs.
MODELING AND AGGREGATING SOCIAL NETWORK DATA
Network Data Representation
Ontological Representation of Social Individuals
Ontological Representation of Social Relationships
Aggregating and Reasoning with Social Network Data
Network Data Representation
Graphs
Matrices
Number the nodes and use the numbers to represent the edges (e.g., "1 2" denotes an edge between nodes 1 and 2)
GraphML (XML for graphs)
These representations do not support the aggregation of network data.
Key challenges: identification and disambiguation
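GraphML writes the same structure as XML. The sketch below uses Python's standard `xml.etree.ElementTree` and the GraphML namespace. The graph id "G" and the node ids are placeholders. The output records only structure: nothing in it says whether two nodes from different files denote the same person, which is the identification problem noted above.

```python
import xml.etree.ElementTree as ET

GRAPHML_NS = "http://graphml.graphdrawing.org/xmlns"

def to_graphml(nodes, edges):
    """Serialize an undirected graph as a minimal GraphML document."""
    ET.register_namespace("", GRAPHML_NS)  # write GraphML as the default namespace
    root = ET.Element(f"{{{GRAPHML_NS}}}graphml")
    graph = ET.SubElement(root, f"{{{GRAPHML_NS}}}graph",
                          id="G", edgedefault="undirected")
    for n in nodes:
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}node", id=str(n))
    for i, (u, v) in enumerate(edges):
        ET.SubElement(graph, f"{{{GRAPHML_NS}}}edge",
                      id=f"e{i}", source=str(u), target=str(v))
    return ET.tostring(root, encoding="unicode")

xml_text = to_graphml([1, 2, 3], [(1, 2), (2, 3)])
print(xml_text)
```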
Ontological Representation of Social Individuals
FOAF is an example of an ontological representation of individuals.
It eliminates drawbacks of early social networking sites such as Friendster and Orkut.
The early social networks had centralized control and were difficult to manage.
FOAF is distributed and has a rich ontology to characterize individuals.
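A FOAF description is a set of RDF statements that anyone can publish on their own site, which is what makes the approach distributed. The sketch below writes such a description as N-Triples using the real FOAF terms `foaf:Person`, `foaf:name`, `foaf:mbox` and `foaf:knows`. The people and their example.org URIs are hypothetical. It builds the strings by hand rather than assuming an RDF library is installed.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

def uri(u):
    return f"<{u}>"

def literal(text):
    """N-Triples string literal with backslashes and quotes escaped."""
    return '"' + text.replace("\\", "\\\\").replace('"', '\\"') + '"'

# Hypothetical people; the example.org URIs are placeholders.
alice = "http://example.org/people/alice"
bob = "http://example.org/people/bob"

triples = [
    (uri(alice), uri(RDF_TYPE), uri(FOAF + "Person")),
    (uri(alice), uri(FOAF + "name"), literal("Alice")),
    (uri(alice), uri(FOAF + "mbox"), uri("mailto:alice@example.org")),
    (uri(alice), uri(FOAF + "knows"), uri(bob)),
    (uri(bob), uri(RDF_TYPE), uri(FOAF + "Person")),
    (uri(bob), uri(FOAF + "name"), literal("Bob")),
]

ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```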
Social Relationships
Social networks such as FOAF need to be extended to support relationships.
Support the integration of social information
Integrate/aggregate multiple social networks
Properties of relationships
Sign: positive or negative relationships
Strength (e.g., frequency of contact)
Provenance (different ways of viewing relationships)
Relationship history
Relationship roles
Conceptual models for social data: semantic networks, RDF
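The relationship properties above suggest treating a relationship as a first-class record rather than a bare link. The sketch below is hypothetical: the `Relationship` class and the `merge` rule are illustrative, not a FOAF extension or a published model. `merge` aggregates records from several networks by keeping the maximum strength and combining provenance and roles. It does not resolve sign conflicts.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Relationship:
    """Hypothetical record of a relationship between two persons (by URI)."""
    source: str
    target: str
    sign: int = 1                  # +1 positive, -1 negative relationship
    strength: float = 0.0          # e.g., contacts per month
    provenance: str = ""           # which network or source asserted it
    roles: List[str] = field(default_factory=list)    # e.g., ["colleague"]
    history: List[str] = field(default_factory=list)  # dated events

def merge(relationships):
    """Aggregate relationships from several networks, keyed by the unordered
    pair of persons. Sign conflicts are not resolved in this sketch."""
    merged = {}
    for r in relationships:
        key = frozenset((r.source, r.target))
        m = merged.get(key)
        if m is None:
            merged[key] = Relationship(r.source, r.target, r.sign, r.strength,
                                       r.provenance, list(r.roles), list(r.history))
        else:
            m.strength = max(m.strength, r.strength)
            m.provenance = ", ".join(p for p in (m.provenance, r.provenance) if p)
            m.roles = sorted(set(m.roles) | set(r.roles))
            m.history.extend(r.history)
    return list(merged.values())

alice, bob = "http://example.org/people/alice", "http://example.org/people/bob"
from_email = Relationship(alice, bob, strength=12.0, provenance="email logs", roles=["colleague"])
from_foaf = Relationship(bob, alice, provenance="FOAF knows", roles=["friend"])
merged = merge([from_email, from_foaf])
print(merged)
```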
Aggregating and Reasoning with Social Network Data
Representing identity
URI (Uniform Resource Identifier)
Disambiguation (deciding whether A and B are the same individual; telling apart two different people who are both called John Smith)
OWL has the owl:sameAs property
Equality
The property sameAs is reflexive, symmetric and transitive
Description Logic vs. rule-based reasoners
Rule-based reasoners use forward chaining and backward chaining
Description Logic reasoners are used for classification and for checking ontology consistency
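Because owl:sameAs is reflexive, symmetric and transitive, it is an equivalence relation. Its consequences can therefore be computed by grouping URIs into equivalence classes. The sketch below does this with a union-find structure. It covers only this one inference, not a full OWL reasoner, and all URIs are hypothetical.

```python
class SameAsIndex:
    """Groups URIs linked by owl:sameAs statements into equivalence classes.
    Every URI starts in its own class (reflexive), the order of a statement
    does not matter (symmetric), and chains of statements merge classes
    (transitive)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def same_as(self, a, b):
        """Record the statement  a owl:sameAs b."""
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[ra] = rb

    def equivalent(self, a, b):
        return self.find(a) == self.find(b)

a = "http://example.org/staff/jsmith"
b = "http://example.net/people/john-smith"
c = "http://example.com/authors/j-smith"
other = "http://example.com/authors/john-smith-2"  # a different John Smith

idx = SameAsIndex()
idx.same_as(a, b)
idx.same_as(b, c)
print(idx.equivalent(c, a))      # inferred through symmetry and transitivity
print(idx.equivalent(a, other))  # no sameAs statement links them
```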
RANDOM WALKS AND THEIR APPLICATIONS

Definitions

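n×n adjacency matrix A
$A(i,j)$ = weight on the edge from i to j
If the graph is undirected, $A(i,j) = A(j,i)$, i.e., A is symmetric
n×n transition matrix P
P is row stochastic (each row sums to 1)
$P(i,j)$ = probability of stepping to node j from node i $= \frac{A(i,j)}{\sum_k A(i,k)}$
n×n Laplacian matrix L
$L(i,j) = \delta_{ij} \sum_k A(i,k) - A(i,j)$, i.e., $L = D - A$, where D is the diagonal matrix of (weighted) degrees
Symmetric positive semi-definite for undirected graphs
Singular: every row of L sums to 0, so $L\mathbf{1} = 0$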
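Both matrices follow directly from A. The sketch below uses exact fractions so that the row sums of P (1) and of L (0) can be checked exactly. It assumes integer edge weights and no isolated nodes. The 3-node path graph is a hypothetical example.

```python
from fractions import Fraction

def transition_matrix(A):
    """P(i, j) = A(i, j) / sum_k A(i, k); assumes integer weights, no isolated nodes."""
    P = []
    for row in A:
        d = sum(row)
        P.append([Fraction(a, d) for a in row])
    return P

def laplacian(A):
    """L = D - A, with the weighted degrees sum_k A(i, k) on the diagonal of D."""
    n = len(A)
    return [[(sum(A[i]) if i == j else 0) - A[i][j] for j in range(n)]
            for i in range(n)]

# Hypothetical undirected path graph 0 - 1 - 2.
A = [[0, 1, 0],
     [1, 0, 1],
     [0, 1, 0]]
P = transition_matrix(A)
L = laplacian(A)
print(P[1])  # [Fraction(1, 2), Fraction(0, 1), Fraction(1, 2)]
print(L)     # [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
```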
[Figure: Example graph with its adjacency matrix A and transition matrix P (entries 1 and 1/2)]
Random walk on graphs
On an undirected graph G:

Starting from vertex v0
Repeat for a number of steps:
Go to a random neighbor.
Simple but powerful.
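The procedure takes only a few lines to write. The sketch below simulates a simple random walk. The seed makes runs reproducible, and the path graph is a hypothetical example.

```python
import random

def random_walk(adj, start, steps, seed=None):
    """Start at `start`; at every step move to a uniformly random neighbour."""
    rng = random.Random(seed)
    walk = [start]
    for _ in range(steps):
        walk.append(rng.choice(adj[walk[-1]]))
    return walk

adj = {0: [1], 1: [0, 2], 2: [1]}  # hypothetical path graph 0 - 1 - 2
walk = random_walk(adj, 0, 10, seed=1)
print(walk)
```

On this path graph every other step lands on node 1, so node 1 receives half of the visits. This matches the degree-proportional stationary distribution d(v)/2m = 2/4 of a random walk on an undirected graph.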
[Figure: What is a random walk: a walk on a small example graph at steps t=0, t=1, t=2 and t=3, with transition probabilities 1 and 1/2 on the edges]
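Instead of following one walker, the probability of being at each node after t steps can be propagated with the transition matrix P: $p_{t+1}(j) = \sum_i p_t(i) P(i,j)$. The sketch below applies this update to a hypothetical 3-node path graph for t = 1, 2, 3, using exact fractions.

```python
from fractions import Fraction

def step_distribution(p, P):
    """One random-walk step: p_{t+1}(j) = sum_i p_t(i) * P(i, j)."""
    n = len(P)
    return [sum(p[i] * P[i][j] for i in range(n)) for j in range(n)]

half = Fraction(1, 2)
P = [[0, 1, 0],           # hypothetical path graph 0 - 1 - 2
     [half, 0, half],
     [0, 1, 0]]
p = [Fraction(1), Fraction(0), Fraction(0)]  # t = 0: the walk starts at node 0
for t in range(1, 4):
    p = step_distribution(p, P)
    print(t, [str(x) for x in p])
```

The distribution alternates between [0, 1, 0] and [1/2, 0, 1/2] instead of settling. This happens because the path graph is bipartite, and it is the kind of behaviour that the mixing-time parameter is concerned with.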
Road map: Random walk
Parameters/properties: hitting time; mixing time
Algorithms: k-SAT; st-connectivity; PageRank; approximate counting; error reduction
Important parameters of random walks
Access time or hitting time $H_{ij}$: the expected number of steps before node j is visited, starting from node i
Commute time ($i \to j \to i$): $H_{ij} + H_{ji}$
Cover time: starting from a node (or a distribution), the expected number of steps to reach every node
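Hitting times satisfy a linear system. For the target node, $H_{jj} = 0$. For every other node, $H_{ij} = 1 + \sum_k P(i,k) H_{kj}$, because one step is taken and the walk then continues from a neighbour. The sketch below solves this system exactly by Gauss-Jordan elimination over fractions, which is suitable only for small graphs. The commute time then follows from the definition above. The path graph is a hypothetical example.

```python
from fractions import Fraction

def hitting_times_to(adj, target):
    """Exact hitting times H(i, target) of a simple random walk, from
    H(target) = 0 and H(i) = 1 + (1 / deg(i)) * sum of H(k) over neighbours k."""
    nodes = [v for v in adj if v != target]
    idx = {v: r for r, v in enumerate(nodes)}
    n = len(nodes)
    # Augmented matrix [M | b] for  H(i) - sum_k H(k) / deg(i) = 1.
    M = [[Fraction(0)] * n + [Fraction(1)] for _ in range(n)]
    for v in nodes:
        r = idx[v]
        M[r][r] += 1
        for k in adj[v]:
            if k != target:
                M[r][idx[k]] -= Fraction(1, len(adj[v]))
    # Gauss-Jordan elimination.
    for col in range(n):
        pivot = next(r for r in range(col, n) if M[r][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [x / pv for x in M[col]]
        for r in range(n):
            if r != col and M[r][col] != 0:
                f = M[r][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    H = {v: M[idx[v]][n] for v in nodes}
    H[target] = Fraction(0)
    return H

adj = {0: [1], 1: [0, 2], 2: [1]}   # hypothetical path graph 0 - 1 - 2
to_2 = hitting_times_to(adj, 2)
to_0 = hitting_times_to(adj, 0)
print(to_2[0], to_0[2], to_2[0] + to_0[2])  # 4 4 8: H_02, H_20, commute time
```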
Applications of Random Walks on Graphs
Ranking web pages
HITS on citation networks
Clustering using random walks
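Ranking web pages with a random walk is the idea behind PageRank, which also appears in the road map above. Below is a minimal power-iteration sketch. It assumes the usual damping factor of 0.85 and spreads the rank of pages with no out-links evenly over all pages. The four-page link graph is hypothetical.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Power iteration for PageRank; `links` maps each page to its out-links.
    A random surfer follows a random out-link with probability `damping`,
    otherwise jumps to a uniformly random page. Pages without out-links
    spread their rank evenly over all pages."""
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            out = links[v]
            for w in out:
                new[w] += damping * rank[v] / len(out)
        rank = new
    return rank

# Hypothetical four-page web graph.
links = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(links)
print(sorted(ranks.items(), key=lambda kv: -kv[1]))
```

Page C, which is linked from all three other pages, ends up with the highest rank. Page D has no in-links and keeps only the teleport share (1 - 0.85)/4.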
USE OF HADOOP AND MAPREDUCE
What is Hadoop?
Open-source data storage and processing framework
Massively scalable, automatically parallelizable
Based on work from Google
GFS + MapReduce + BigTable
Current distributions based on open-source and vendor work
Apache Hadoop
Cloudera CDH4 with Impala
Hortonworks
MapR
AWS
Windows Azure HDInsight
Why Use Hadoop?
Cheaper: scales to petabytes or more
Faster: parallel data processing
Better: suited for particular types of big data problems
What types of business problems suit Hadoop?
Risk modeling
Customer churn analysis
Recommendation engine
Ad targeting
Point-of-sale transactional analysis
Threat analysis
Trade surveillance
Search quality
Data sandbox
Companies Using Hadoop

Facebook
Yahoo
Amazon
eBay
American Airlines
The New York Times
Federal Reserve Board
IBM
Orbitz
Hadoop is a set of Apache frameworks and more
Data storage (HDFS): runs on commodity hardware (usually Linux); horizontally scalable
Processing (MapReduce): parallelized (scalable) processing; fault tolerant
Other tools/frameworks: data access (HBase, Hive, Pig, Mahout); tools (Hue, Sqoop); monitoring (Greenplum, Cloudera)
[Figure: Hadoop stack, top to bottom: Monitoring & Alerting; Tools & Libraries; Data Access; MapReduce API; Hadoop Core - HDFS]
What are the core parts of a Hadoop distribution?
HDFS storage: redundant (3 copies); large blocks for large files (64 or 128 MB per block); can scale to clusters of 1000s of nodes
MapReduce API: batch (job) processing; distributed and localized to clusters (Map); auto-parallelizable for huge amounts of data; fault-tolerant (auto retries); adds high availability and more
Other libraries: Pig, Hive, HBase, others
Hadoop Cluster HDFS (Physical) Storage
One Name Node
Contains a web site to view cluster information
Hadoop v2 uses multiple Name Nodes for high availability (HA)
Many Data Nodes
3 copies of each data block by default
Work with data in HDFS using commands modeled on common Linux shell commands (e.g., hdfs dfs -ls)
Block size is 64 or 128 MB
[Figure: HDFS cluster with a Name Node, a Secondary Name Node, and Data Nodes 1, 2 and 3]
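The block size and replication figures translate into simple storage arithmetic. The sketch below uses this slide's 128 MB blocks and 3 copies as defaults, although both values are configurable per cluster. HDFS does not pad the last block of a file to full size, so raw storage is the file size times the replication factor.

```python
import math

def hdfs_footprint(file_size_mb, block_size_mb=128, replication=3):
    """Blocks needed for one file, block replicas stored, and raw storage used."""
    blocks = math.ceil(file_size_mb / block_size_mb)
    replicas = blocks * replication
    # The last block only occupies the bytes it actually holds.
    raw_storage_mb = file_size_mb * replication
    return blocks, replicas, raw_storage_mb

print(hdfs_footprint(1000))  # (8, 24, 3000)
```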
Common Hadoop Distributions
Open source
Apache
Commercial
Cloudera
Hortonworks
MapR
AWS MapReduce
Microsoft HDInsight (Beta)
What is MapReduce?
Restricted parallel programming model meant for large clusters
User implements Map() and Reduce() functions
Parallel computing framework
Libraries take care of EVERYTHING else
Parallelization
Fault tolerance
Data distribution
Load balancing
Useful model for many practical tasks
Common Data Jobs for MapReduce
Text mining
Index building
Graphs
Patterns
Filtering
Prediction
Risk analysis
Ways to MapReduce
Libraries: HBase, Hive, Pig, Sqoop, Oozie, Mahout, others
Languages: Java*, HiveQL (HQL), Pig Latin, Python, C#, JavaScript, R, more
*Java is most common, but other languages can be used
Map and Reduce
The idea of map and reduce is more than 40 years old.
It is present in virtually all functional programming languages; see, e.g., APL, Lisp and ML.
Alternate name for Map: Apply-All
Higher-order functions take function definitions as arguments, or return a function as output.
Map and Reduce are higher-order functions.
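Python's built-in `map` and `functools.reduce` are direct instances of these higher-order functions. The sketch below shows both. `compose` is an illustrative helper, not a library function, that demonstrates the other half of the definition: a function that returns a function.

```python
from functools import reduce

# map: apply a function to every element of a list.
squares = list(map(lambda x: x * x, [1, 2, 3, 4]))
# reduce: fold the list into a single value with a binary function.
total = reduce(lambda acc, x: acc + x, squares, 0)

def compose(f, g):
    """A higher-order function that returns a new function: x -> f(g(x))."""
    return lambda x: f(g(x))

inc_then_double = compose(lambda x: 2 * x, lambda x: x + 1)
print(squares, total, inc_then_double(3))  # [1, 4, 9, 16] 30 8
```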
Map and Reduce Functions
Functions borrowed from functional programming languages (e.g., Lisp)
Map(): processes a key/value pair to generate intermediate key/value pairs
Reduce(): merges all intermediate values associated with the same key
Example: Counting Words
Map()
Input: <filename, file text>
Parses the file and emits <word, count> pairs, e.g., <hello, 1>
Reduce()
Sums all values for the same key and emits <word, TotalCount>, e.g., <hello, (3 5 2 7)> => <hello, 17>
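The word-count job can be traced end to end in a single process. The sketch below simulates the map, shuffle (group by key) and reduce phases in plain Python. It is not the Hadoop Java API; `map_fn`, `reduce_fn` and `run_wordcount` are illustrative names, and the input files are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_fn(filename, text):
    """Map(): emit <word, 1> for every word in the file."""
    for word in text.lower().split():
        yield word, 1

def reduce_fn(word, counts):
    """Reduce(): sum all values for the same key."""
    return word, sum(counts)

def run_wordcount(files):
    # Map phase: each <filename, file text> input is processed independently.
    intermediate = chain.from_iterable(map_fn(name, text) for name, text in files.items())
    # Shuffle phase: group intermediate values by key (the framework does this).
    groups = defaultdict(list)
    for word, count in intermediate:
        groups[word].append(count)
    # Reduce phase: one call per distinct key.
    return dict(reduce_fn(word, counts) for word, counts in sorted(groups.items()))

files = {"a.txt": "hello world hello", "b.txt": "hello MapReduce"}
print(run_wordcount(files))  # {'hello': 3, 'mapreduce': 1, 'world': 1}
```

In a real cluster the map calls run in parallel on the nodes holding the input blocks, and the framework performs the shuffle across the network.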
[Figure: MapReduce example, WordCount]
MapReduce (Google) vs. Hadoop (Yahoo/Apache)
Implementation language: C++ vs. Java
Distributed file system: GFS vs. HDFS
Database: Bigtable vs. HBase
Distributed lock manager: Chubby vs. ZooKeeper
Limitations of MapReduce
Batch processing, not interactive
Designed for a specific problem domain
MapReduce programming paradigm not commonly understood (functional)
Lack of trained support professionals
API / security model are moving targets
Comparing: Traditional RDBMS vs. Hadoop / MapReduce
Data size: gigabytes (terabytes) vs. petabytes (exabytes)
Access: interactive and batch vs. batch, NOT interactive
Updates: read/write many times vs. write once, read many times
Structure: static schema vs. dynamic schema
Integrity: high (ACID) vs. low
Scaling: nonlinear vs. linear
Query response time: can be near immediate vs. has latency (due to batch processing)
In a Nutshell
Visual analysis of social networks has become exciting but also challenging.
A number of techniques help node-link diagrams scale to larger networks.
Matrix-based representations can scale to larger networks and provide insightful overviews.
