Sunbelt 2013 Presentation
1. Layout Algorithm for Clustered
Graphs to Analyze Community
Interactions in Social Networks
Juan David Cruz
Cécile Bothorel
François Poulet
SUNBELT Conference 2013
May 23rd, 2013, Hamburg, Germany
2. Institut Mines-Télécom
Introduction – I
Real-world social networks store both
social and structural information about
the actors
For example, this social network from
Facebook contains the actors’ personal
information and the links between
them…
This social network is described by two
types of information, which are
integrated into the communities that
can be identified…
3. Institut Mines-Télécom
Introduction – II
How can these structural and profile similarities be represented on the same
plane while presenting the community configuration?
Combining both types of information helps
to identify groups of similar and well
connected nodes.
- Find groups of friends who are similar
in terms of their hobbies
- Find groups of friends who are similar
in terms of their academic competences
These new partitions can be analyzed
using visual analytics approaches… but
how can a visual approach exploit all
this information?
4. Institut Mines-Télécom
Layout of communities – Objective
How can these structural and profile similarities be represented on the same
plane while presenting the community configuration?
- Graph layout has several challenges, from
computational complexity to readability
- These challenges also apply to visualizing
and analyzing communities
We want to:
- Reduce node clutter while showing
the relationships between profile and
structure in the communities
- Observe interactions between
communities and find important nodes
5. Institut Mines-Télécom
Literature review – Clustered graph
layout
Figure labels: Graph G; Partition of the graph G
Force-based models
- Multilevel
- LinLog
- Multilevel for weighted graphs
- Kamada-Kawai based
Hierarchical models
- Hierarchic quotient graph
- Radial hierarchy representation
- Hierarchical visual clustering
Other models
- Rectangles and straight lines
- Topological features
- Overlapping clustered graphs
In general, these models are oriented toward differentiating the groups by separating
them from each other, that is, establishing their limits.
6. Institut Mines-Télécom
Literature review – Clustered graph
layout – Examples
Figure panels: Multilevel (Eades & Feng); Orthogonal (di Battista et al.);
Weighted multilevel (Bourqui et al.); Overlapping (Santamaría & Therón)
7. Institut Mines-Télécom
Clustered graphs layout algorithms -
Summary
Information used by each method:
Method                                                          Structural  Profiles  Communities
Multilevel / Force directed [Eades & Feng 1999]                 No          No        Yes
Rectangles and straight lines [Di Giacomo 2007]                 No          No        Yes
LinLog / Force model [Noack 2003]                               Yes         No        No
Hierarchical / Quotient graph [Brockenauer 2001]                No          No        Yes
Multilevel / Force directed w. graphs [Bourqui et al. 2007]     Yes         No        Yes
Overlapping clustered graphs [Santamaría et al. 2008]           No          No        Yes
Radial representation [Mun & Ha 2005]                           Yes         No        No
Hierarchical / visual clustering [Batagelj et al. 2011]         Yes         No        No
Kamada-Kawai based [Shi et al. 2009]                            No          No        Yes
Topological features based [Archambault et al. 2007]            Yes         No        Yes
Multivariate layout algorithm (my algorithm)                    Yes         Yes       Yes
8. Institut Mines-Télécom
Visualization of communities
The algorithm allows correlating structural and profile
information
Each group has diverse categories from the profile
information
The algorithm focuses on individuals connecting
communities (boundary connectors)
Describe boundary connectors with their profile and
their neighborhood
9. Institut Mines-Télécom
Visualization of communities –
Multi-Dimensional Scaling
Maps similarities into a 2- or
3-dimensional space
MDS uses a (dis)similarity matrix
as input
The output is a set of coordinates
whose distances resemble the
(dis)similarities
(Dis)similarities:
• Geographic distances
• Jaccard distance (vectors, sets)
• Geodesic distances (graphs)
Figure: a dissimilarity matrix (input) mapped to 2D coordinates (output)
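The mapping from a dissimilarity matrix to 2D coordinates can be illustrated with a minimal sketch. It uses classical (Torgerson) MDS with NumPy, which is an assumption: the slides do not say which MDS variant the presented algorithm uses. The function name `classical_mds` and the toy square data are hypothetical and serve only as illustration.

```python
import numpy as np

def classical_mds(dissim, dims=2):
    """Classical (Torgerson) MDS: find coordinates whose Euclidean
    distances approximate the given dissimilarities.
    Assumption: this variant is chosen for illustration only."""
    d = np.asarray(dissim, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities: B = -1/2 * J * D^2 * J
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # B is symmetric, so eigh applies; keep the `dims` largest eigenvalues
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:dims]
    # Clip tiny negative eigenvalues caused by rounding or non-Euclidean input
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale

# Toy input (hypothetical): the four corners of a unit square
points = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
dissim = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(dissim)
# Pairwise distances of the recovered layout, for comparison with the input
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

The recovered coordinates may be rotated or mirrored relative to the original points, since MDS only preserves the distances between them.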
10. Institut Mines-Télécom
Visualization of communities – Types of
nodes
Border nodes
These are the nodes connecting
communities: they have neighbors in other
clusters and define the interaction zone.
Inner nodes
These are the nodes with edges from/to
nodes in the same cluster only. They are placed
outside the interaction zone.
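The border/inner distinction can be sketched in a few lines, assuming an undirected edge list and a node-to-community mapping. The function name `split_border_inner` and the toy graph are hypothetical and not taken from the presentation.

```python
def split_border_inner(edges, community):
    """Split nodes into border nodes (at least one neighbor in another
    community) and inner nodes (all neighbors in the same community).
    `edges` is an iterable of (u, v) pairs; `community` maps node -> label."""
    border = set()
    for u, v in edges:
        # An edge crossing two communities makes both endpoints border nodes
        if community[u] != community[v]:
            border.update((u, v))
    # Every remaining node only has edges inside its own community
    inner = set(community) - border
    return border, inner

# Toy clustered graph (hypothetical): a path a-b-c-d-e-f with two communities
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f")]
community = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}

border, inner = split_border_inner(edges, community)
```

In this toy graph only the edge between c and d crosses communities, so c and d are the border nodes and the rest are inner nodes.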
11. Institut Mines-Télécom
Visualization of communities – The
algorithm
1. For each node set a dissimilarity matrix is calculated using the profile
and the structural information
2. The coordinates reflect the proximity of the nodes in terms of the two
variables – Output of the MDS algorithm
3. The final coordinates transformation defines the interaction zone
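Step 1 can be illustrated for the structural part alone. The sketch below builds a dissimilarity matrix from the Jaccard distance between node neighborhoods. Using Jaccard distance on open neighborhoods is an assumption: the slides list Jaccard distance only as one possible dissimilarity and do not give the exact measure. The function name `jaccard_dissimilarity` and the toy adjacency data are hypothetical.

```python
import numpy as np

def jaccard_dissimilarity(nodes, neighbors):
    """Pairwise Jaccard distance between neighbor sets.
    `nodes` fixes the row/column order; `neighbors` maps node -> set of nodes.
    Assumption: open neighborhoods (a node is not its own neighbor)."""
    n = len(nodes)
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            a, b = neighbors[nodes[i]], neighbors[nodes[j]]
            union = a | b
            # Two nodes without any neighbors are treated as identical
            dist = 1.0 - len(a & b) / len(union) if union else 0.0
            d[i, j] = d[j, i] = dist
    return d

# Toy node set (hypothetical): a triangle a-b-c with a pendant node d on c
nodes = ["a", "b", "c", "d"]
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

dissim = jaccard_dissimilarity(nodes, neighbors)
```

A matrix like `dissim` is the kind of input an MDS routine expects in step 2; a profile-based dissimilarity would have to be combined with it to cover both variables, and the slides do not specify how.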
12. Institut Mines-Télécom
Visualization of communities –
Experiments – Setup
The goal of the experiments is to test the algorithm's ability to
identify important nodes with respect to the connections between communities
and the connections inside them
The graphs used in the experiments have a low edge density and are therefore
expected to have a community structure
The community structure makes these graphs suitable for our algorithm
Clustered graphs used for the layout algorithm testing
14. Institut Mines-Télécom
Visualization of communities –
Experiments – Facebook
Ambassadors help their communities to get into the interaction zone (where the
communities interact). The influence of the structural similarity is reflected in the
proximity of the nodes
Figure panels: layout using Fruchterman & Reingold; layout using our algorithm (annotated with the interaction zone and the inner nodes)
15. Institut Mines-Télécom
Visualization of communities –
Experiments – DBLP
In this graph, several well connected nodes remain as inner nodes. These nodes can
be seen as gurus in their communities, however they are not connected with other
communities (treating other topics)
Figure panels: layout using Fruchterman & Reingold; layout using our algorithm
16. Institut Mines-Télécom
Visualization of communities –
Experiments – Protein interaction
With our algorithm it is possible to observe the sizes of the inner nodes and to identify
the nodes that are important for the interactions. However, this representation has
to be analyzed by an expert to gain insight into the configuration
Figure panels: layout using Fruchterman & Reingold; layout using our algorithm
17. Institut Mines-Télécom
Visualization of communities – Complexity
Complexity of the algorithm
The overall complexity of the algorithm is:
The algorithm was implemented using two
parallelization approaches: threaded CBLAS
routines and GP-GPU CUBLAS routines
where available…
Results of the experiments
In general, the complexity is quadratic in
the number of border nodes
Graph                % border nodes  Time (s)
Protein interaction  38%             1021
DBLP network         40%             346
Twitter network      24%             89
Facebook network     25%             36
19. Institut Mines-Télécom
Conclusion and perspectives
Our proposed visualization model focuses on the integration of
the variables existing on a social network
Dividing the nodes into two categories allows identifying
important nodes regarding the communication between
communities
This division reduces the complexity (on average) of the layout
algorithm
The nodes are placed in such a way that the distance between them
represents their structural similarity
The model was implemented using PT-CBLAS and CUBLAS to
improve some operations of the algorithm (parallelization)
Qualitative studies have to be performed to test the functionality
of the model on real research cases
20. Institut Mines-Télécom
Conclusion and perspectives – Future work
The visual model can be extended to include the notion of point of
view, showing the impact of selecting different elements from the
profile information
Use this visualization method in real-world applications such as the
identification of influential actors in marketing campaigns
Real social networks represent a set of actors (persons, organizations…) connected through different types of relationships (friendship, family, message sending…). It is possible to identify two main dimensions: a structural dimension representing the connections between the actors, and a compositional dimension representing the individual aspect of the network. In this example from Facebook, the structure is given by the friendship ties from the FB structure. On the other hand, the composition is given by the profile that is part of Facebook. It may include a picture (or pictures), name, country, hobbies.
We propose in this thesis an initial approach to integrate these variables, and then a visual analysis tool that exploits this integration of variables. For example, this network can be divided into groups of friends (well connected through friendship ties) that have similar hobbies or sport preferences (similar profile information). It is also possible to find groups of friends with similar academic competences in the same network, which means that the same social network can be observed from different points of view.
Our proposed algorithm has been designed to exploit the three variables composing a social network. First, it uses the affiliation variable to determine the groups; then, using the structural and the composition variables, it determines the similarity between the nodes in order to place them according to this similarity. The results presented in this work include only the similarity deduced from the structural variable (neighborhood similarity) because of analysis constraints.
Boundary connectors can be seen as ambassadors/representatives of their communities, sending or receiving information. In general, they help the communities to communicate (interact) with the outside world.
MDS 101. Distances (structure, composition, ...). Complexity of O(dn^2), where d is the number of dimensions; in this case, d = 2.
The clustered graph includes the information from the structural and the composition variables. The first step is to divide the node set into border and inner nodes. Border and inner nodes are treated using MDS to find the coordinates of each one. The nodes will be placed according to their structural similarity. With a location for each node, the final coordinates are transformed to place border nodes at the center and inner nodes in front of their pairs within the interaction zone.
The graphs have already been clustered. The idea is to show how the algorithm captures the structural and the composition variables and represents them using a dissimilarity measure. However, in this work we only worked with the structural similarity.
The idea of this slide is to show the ambassadors present on the graph and how the layout helps identifying those important nodes.
This slide presents another graph with 834 communities. The idea here is to present the gurus who remain inside their own communities as inner nodes. These nodes are well connected with members of their community. The requirement of experts…
The evaluation has been performed in a quantitative way, measuring the execution times for each graph. The qualitative evaluation hasn’t been performed because it requires an expert to pose a research question to be answered using this tool. There are two implementations because not all available computers have CUDA-ready GPUs (notably my laptop). There are other details regarding the memory management that I haven’t discussed anywhere, because I guess a lot of people do the same thing and I found it irrelevant.
The GPU has memory limits; another problem is the data transfer limit between main memory and the GPU’s memory (even using ZeroCopy). I implemented a memory management scheme that requires only O(max(border_nodes,max(inner_nodes))) space, but it is still not very interesting, as I think it may be implemented somewhere else and is not really an innovation. It could, however, be used with ZeroCopy schemes on the GPU (I haven’t tested it).
Less text on the slides. Explain the variables earlier. Remove the first outline, keep the gray one. Add an example to show what is done in this thesis. Make it VERY clear that the visualization method makes it possible to exploit the points of view (perspectives).