Network Sampling, Community Detection
(Lecture 18)
Roberval Gomes Mariano José Macedo
Advisee Co-advisor
The social network of friendships
within a 34-person karate club
When people are influenced by the behaviors
of their neighbors in the network (a kind of
informational or social contagion)
The spread of an epidemic disease
(such as tuberculosis)
FACEBOOK
AIDS Blog Network
Eric D. Kolaczyk
• Statistical Analysis of Network Data
• Lecture 1: Network Mapping & Characterization
• Dept of Mathematics and Statistics, Boston
University
• kolaczyk@bu.edu
Centrality measures
Closeness
Betweenness
Used to compute
the importance of the
vertices within the graph
Eigenvector
Eigenvector centrality
(e.g., Google's PageRank)
measures the importance of a
vertex in the network by treating
each edge as a vote.
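As a rough illustration of two of these measures, the sketch below computes closeness and eigenvector centrality with the standard library only. The five-node graph is a hypothetical toy example, not one from the slides, and the power iteration on A + I is just one common way to approximate the leading eigenvector.

```python
from collections import deque

# Hypothetical toy graph (not from the slides): a-b-c, with c also linked to d and e.
GRAPH = {
    "a": ["b"],
    "b": ["a", "c"],
    "c": ["b", "d", "e"],
    "d": ["c"],
    "e": ["c"],
}

def closeness(graph):
    """Closeness of v = (n - 1) / sum of shortest-path distances from v.

    Assumes the graph is connected.
    """
    scores = {}
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from source
            u = queue.popleft()
            for w in graph[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        scores[source] = (len(graph) - 1) / sum(dist.values())
    return scores

def eigenvector(graph, iterations=100):
    """Power iteration on A + I; the identity shift avoids oscillation on bipartite graphs."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        nxt = {v: x[v] + sum(x[w] for w in graph[v]) for v in graph}
        norm = sum(val * val for val in nxt.values()) ** 0.5
        x = {v: val / norm for v, val in nxt.items()}
    return x

if __name__ == "__main__":
    print("closeness:", closeness(GRAPH))
    print("eigenvector:", eigenvector(GRAPH))
```

In this toy graph node c ranks first under both measures, since it is both close to every other node and connected to the best-connected neighbors.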
Watts-Strogatz Model
How small is the world?
• How many contacts separate one person from another,
anywhere in the world?
• 1990 – John Guare and the play “Six Degrees of
Separation”.
• Karinthy believed that the world was “shrinking”.
• 1960s – The Small World Experiment by Stanley
Milgram and Jeffrey Travers.
Exponential Growth Model
• You: 1
• Your friends: 100
• Your friends' friends: 10,000
Exponential Growth Model
• In just 5 steps, 100^5 people would be reached, which
is more than the world population.
• This model ignores the probability that people have
friends in common.
• It is not suitable for representing social relationships in the
real world.
• Real world – many triangles.
• How is it possible, then, that social networks have
many triangles at the local scale and still
form a small world?
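The arithmetic behind the exponential growth model can be checked in a few lines. This is only a sketch: the branching factor of 100 comes from the slide, while the world population figure is a rough assumption used for the comparison.

```python
FRIENDS_PER_PERSON = 100          # branching factor used on the slide
WORLD_POPULATION = 8_000_000_000  # rough assumption, for comparison only

def reached(steps, branching=FRIENDS_PER_PERSON):
    """People reached at exactly `steps` hops if nobody shares any friends."""
    return branching ** steps

if __name__ == "__main__":
    for steps in range(6):
        print(steps, reached(steps))
    print("exceeds world population:", reached(5) > WORLD_POPULATION)
```

The count overshoots precisely because the model assumes no overlap between friend circles, which is the assumption that triangles in real networks violate.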
Watts-Strogatz Model
• Created by Duncan J. Watts and Steven
Strogatz in 1998.
• Introduces the concept of the clustering
coefficient.
• Can simulate both random networks
and real-world social networks.
Clustering coefficient
Graph construction
• The Watts-Strogatz model uses the following parameters and measures:
• N → number of nodes;
• K → degree of each node;
• P (0 <= P <= 1) → rewiring probability that shapes the graph; the larger
P is, the greater the disorder;
• C → clustering coefficient;
• L → average distance between two nodes of the network.
Clustering coefficient
Study of the graph
• p = 0 (ordered): L ~ n / k; C ~ 3/4
• p = 1 (random): L ~ ln n / ln k; C ~ k / n → 0
• What happens for intermediate values of P?
• Small P – considerably decreases L while keeping C practically constant.
• Small P characterizes small-world networks.
• Necessary conditions: some source of order; a little
randomness.
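The effect of P on L and C can be explored with a small simulation. The sketch below is an illustrative standard-library implementation, not the authors' code: the function names, the seed, and the example sizes are assumptions, and the rewiring step follows the usual description of the model (each lattice edge is rewired with probability p to a uniformly chosen new endpoint).

```python
import random
from collections import deque

def ring_lattice(n, k):
    """Ring in which each node is linked to its k nearest neighbours (k/2 per side)."""
    adj = {v: set() for v in range(n)}
    for v in range(n):
        for step in range(1, k // 2 + 1):
            w = (v + step) % n
            adj[v].add(w)
            adj[w].add(v)
    return adj

def watts_strogatz(n, k, p, seed=0):
    """Rewire each lattice edge with probability p, keeping the edge count fixed."""
    rng = random.Random(seed)
    adj = ring_lattice(n, k)
    for step in range(1, k // 2 + 1):
        for v in range(n):
            w = (v + step) % n
            if w in adj[v] and rng.random() < p:
                candidates = [u for u in range(n) if u != v and u not in adj[v]]
                if candidates:
                    new = rng.choice(candidates)
                    adj[v].discard(w)
                    adj[w].discard(v)
                    adj[v].add(new)
                    adj[new].add(v)
    return adj

def average_clustering(adj):
    """C: mean fraction of a node's neighbour pairs that are themselves linked."""
    total = 0.0
    for nbrs in adj.values():
        nb = list(nbrs)
        d = len(nb)
        if d < 2:
            continue
        links = sum(1 for i in range(d) for j in range(i + 1, d) if nb[j] in adj[nb[i]])
        total += 2.0 * links / (d * (d - 1))
    return total / len(adj)

def average_path_length(adj):
    """L: mean shortest-path distance over all reachable ordered pairs (BFS)."""
    total, pairs = 0, 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs

if __name__ == "__main__":
    for p in (0.0, 0.05, 1.0):
        g = watts_strogatz(200, 8, p)
        print(p, round(average_clustering(g), 3), round(average_path_length(g), 3))
```

Running it for a few values of p is a quick way to see the regime the slide describes: with a small p, L typically drops sharply while C stays close to its lattice value.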
Why is the real world small?
• “I know the Governor!” “And I know Xuxa; now
ask whether he knows me!”
• Order → highly clustered networks of local
friends.
• Randomness → some contact that reaches outside the local network.
Bacon Number
http://oracleofbacon.org/
Facebook
• Experiment with 5.8
million users.
• Average distance: 5.73.
• Maximum distance: 12.
Twitter
• Study of the relationship in which one
user follows another.
• 5.2 billion relationships.
• Average distance: 4.67.
Class Size Paradox
• Simple example: each student takes one
course; suppose there is one course with 100
students, fifty courses with 2 students.
– Dean calculates: (100+50*2)/51 = 3.92
– Students calculate: (100*100+100*2)/200 = 51
• Class Size Paradox in Networks
– The average number of friends of a person's friends is
greater than the average number of friends of a
person.
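Both averages, and the network version of the paradox, can be reproduced with a short script. The class sizes are those of the slide; the star graph used for the friendship paradox is a hypothetical example chosen because the effect is easy to see there.

```python
# Class sizes from the slide: one course with 100 students, fifty with 2.
class_sizes = [100] + [2] * 50

# Dean's view: average over courses.
dean_average = sum(class_sizes) / len(class_sizes)

# Students' view: average over students, so a class of size s is counted s times.
student_average = sum(s * s for s in class_sizes) / sum(class_sizes)

# Hypothetical star graph: node 0 is friends with nodes 1-4.
adj = {0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0], 4: [0]}

# Average number of friends of a person.
mean_degree = sum(len(nbrs) for nbrs in adj.values()) / len(adj)

# Average, over people, of the mean number of friends their friends have.
mean_friend_degree = sum(
    sum(len(adj[w]) for w in nbrs) / len(nbrs) for nbrs in adj.values()
) / len(adj)

if __name__ == "__main__":
    print(round(dean_average, 2), student_average)
    print(mean_degree, mean_friend_degree)
```

In both cases the larger average comes from sampling by membership: big classes and high-degree friends are counted once per member or per friendship.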
Communities
Community Detection Algorithms
Metric called Modularity.
Community: It is formed by individuals
such that those within a group interact
with each other more frequently than with
those outside the group
a.k.a. group, cluster, cohesive subgroup,
module in different contexts
Community detection: discovering groups
in a network where individuals’ group
memberships are not explicitly given
Complete Mutuality: Cliques
• Clique: a maximal complete subgraph, i.e., one in which all nodes
are adjacent to each other
• NP-hard to find the maximum clique in a network
• Straightforward implementation to find cliques is very
expensive in time complexity
Nodes 5, 6, 7 and 8 form a clique
Finding the Maximum Clique
• In a clique of size k, each node maintains degree >= k-1
– Nodes with degree < k-1 will not be included in the maximum clique
• Recursively apply the following pruning procedure
– Sample a sub-network from the given network, and find a clique in the
sub-network, say, by a greedy approach
– Suppose the clique above is size k, in order to find out a larger clique,
all nodes with degree <= k-1 should be removed.
• Repeat until the network is small enough
• Many nodes will be pruned as social media networks follow a
power law distribution for node degrees
Maximum Clique Example
• Suppose we sample a sub-network with nodes {1-9} and find a
clique {1, 2, 3} of size 3
• In order to find a clique of size > 3, remove all nodes with degree <= 3-1 = 2
– Remove nodes 2 and 9
– Remove nodes 1 and 3
– Remove node 4
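The pruning rounds in this example can be traced with a short sketch. The slide's figure is not part of the text, so the edge list below is an assumption: a 9-node graph reconstructed to be consistent with the cliques and removal steps listed on the slides. The function name is likewise illustrative, and the sampling and greedy clique search that precede the pruning are not shown.

```python
# ASSUMPTION: edge list reconstructed to match the slides' examples;
# the original figure is not available in the text.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
    (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9),
]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def prune_for_larger_clique(adj, k):
    """Repeatedly drop nodes with degree <= k - 1; they cannot be in a clique larger than k."""
    adj = {v: set(nbrs) for v, nbrs in adj.items()}  # work on a copy
    rounds = []
    while True:
        low = sorted(v for v, nbrs in adj.items() if len(nbrs) <= k - 1)
        if not low:
            break
        rounds.append(low)
        for v in low:
            for w in adj.pop(v):
                if w in adj:
                    adj[w].discard(v)
    return rounds, set(adj)

if __name__ == "__main__":
    rounds, remaining = prune_for_larger_clique(build_adjacency(EDGES), k=3)
    print("removed per round:", rounds)
    print("remaining nodes:", remaining)
```

Under the assumed edge list, the rounds match the three removal steps above, and the surviving nodes are the candidates for a clique larger than 3.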
Clique Percolation Method (CPM)
• Clique is a very strict definition, unstable
• Normally use cliques as a core or a seed to find larger
communities
• CPM is such a method to find overlapping communities
– Input
• A parameter k, and a network
– Procedure
• Find out all cliques of size k in a given network
• Construct a clique graph. Two cliques are adjacent if they share k-1
nodes
• Each connected component in the clique graph forms a
community
CPM Example
Cliques of size 3:
{1, 2, 3}, {1, 3, 4}, {4, 5, 6},
{5, 6, 7}, {5, 6, 8}, {5, 7, 8},
{6, 7, 8}
Communities:
{1, 2, 3, 4}
{4, 5, 6, 7, 8}
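The second and third steps of CPM can be sketched directly from the cliques listed in this example. The code below is an illustrative implementation that takes the size-3 cliques as given (enumerating them is not shown) and uses a small union-find to extract the connected components of the clique graph; the function name is an assumption.

```python
from itertools import combinations

# Size-3 cliques as listed in the example.
CLIQUES = [
    {1, 2, 3}, {1, 3, 4}, {4, 5, 6},
    {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8},
]

def clique_percolation(cliques, k):
    """Merge k-cliques that share k - 1 nodes; each merged group is one community."""
    cliques = [frozenset(c) for c in cliques]
    parent = list(range(len(cliques)))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Two cliques are adjacent in the clique graph if they share k - 1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) >= k - 1:
            parent[find(i)] = find(j)

    # Each connected component of the clique graph yields one community.
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return sorted(groups.values(), key=sorted)

if __name__ == "__main__":
    print(clique_percolation(CLIQUES, k=3))
```

Node 4 ends up in both communities, which is how the method produces overlapping groups.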
Reachability : k-clique, k-club
• Any node in a group should be reachable in k hops
• k-clique: a maximal subgraph in which the largest geodesic
distance between any two nodes <= k
• k-club: a substructure of diameter <= k
• A k-clique might have diameter larger than k in the subgraph
– E.g. {1, 2, 3, 4, 5}
• Commonly used in traditional SNA
• Often involves combinatorial optimization
Cliques: {1, 2, 3}
2-cliques: {1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}
2-clubs: {1,2,3,4}, {1, 2, 3, 5}, {2, 3, 4, 5, 6}
Group-Centric Community Detection:
Density-Based Groups
• The group-centric criterion requires the whole group to satisfy
a certain condition
– E.g., the group density >= a given threshold
• A subgraph $G_s(V_s, E_s)$ is a $\gamma$-dense quasi-clique if
$\frac{2|E_s|}{|V_s|(|V_s|-1)} \ge \gamma$,
where the denominator is the maximum possible sum of degrees.
• A similar strategy to that of cliques can be used
– Sample a subgraph, and find a maximal quasi-clique
(say, of size $|V_s|$)
– Remove nodes with degree less than the average degree
Network-Centric Community Detection
• Network-centric criterion needs to consider the
connections within a network globally
• Goal: partition nodes of a network into disjoint sets
• Approaches:
– (1) Clustering based on vertex similarity
– (2) Latent space models (multi-dimensional scaling)
– (3) Block model approximation
– (4) Spectral clustering
– (5) Modularity maximization
Clustering based on Vertex Similarity
• Apply k-means or similarity-based clustering to nodes
• Vertex similarity is defined in terms of the similarity of their
neighborhood
• Structural equivalence: two nodes are structurally equivalent
iff they are connected to the same set of actors
• Structural equivalence is too strict for practical use.
Nodes 1 and 3 are
structurally equivalent;
So are nodes 5 and 6.
(1) Clustering based on vertex similarity
Vertex Similarity
• Jaccard Similarity
• Cosine similarity
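Both measures compare the neighborhoods of two nodes. The sketch below uses the usual neighborhood-based definitions, Jaccard = |N_u ∩ N_v| / |N_u ∪ N_v| and cosine = |N_u ∩ N_v| / sqrt(|N_u| · |N_v|); the slide's own formulas are not in the text, so these definitions and the 9-node edge list (reconstructed to be consistent with the other examples in the deck) are assumptions.

```python
from math import sqrt

# ASSUMPTION: edge list reconstructed to match the slides' examples;
# the original figure is not available in the text.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
    (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9),
]

def build_adjacency(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    return len(adj[u] & adj[v]) / len(adj[u] | adj[v])

def cosine(adj, u, v):
    """Shared neighbours divided by the geometric mean of the two degrees."""
    return len(adj[u] & adj[v]) / sqrt(len(adj[u]) * len(adj[v]))

ADJ = build_adjacency(EDGES)

if __name__ == "__main__":
    print("Jaccard(4, 6) =", jaccard(ADJ, 4, 6))
    print("Cosine(4, 6)  =", cosine(ADJ, 4, 6))
```

Once every pair has a similarity score, a standard clustering routine such as k-means can be applied to the resulting similarity matrix.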
Cut
• Most interactions are within group whereas interactions
between groups are few
• Community detection → minimum cut problem
• Cut: A partition of vertices of a graph into two disjoint sets
• Minimum cut problem: find a graph partition such that the
number of edges between the two sets is minimized
(4) Spectral clustering
Ratio Cut & Normalized Cut
• Minimum cut often returns an imbalanced partition, with one
set being a singleton, e.g. node 9
• Change the objective function to consider community size
C_i: a community
|C_i|: number of nodes in C_i
vol(C_i): sum of degrees in C_i
Ratio Cut & Normalized Cut Example
For partition in red:
For partition in green:
Both ratio cut and normalized cut prefer a balanced partition
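To see why both objectives favor balance, the sketch below evaluates them on two partitions. It assumes the common definitions, Ratio Cut = (1/k) Σ cut(C_i, rest) / |C_i| and Normalized Cut = (1/k) Σ cut(C_i, rest) / vol(C_i), and a 9-node edge list reconstructed to be consistent with the other examples in the deck; the figure and the slide's own numbers are not in the text, so treat the values as illustrative.

```python
# ASSUMPTION: edge list reconstructed to match the slides' examples;
# the original figure is not available in the text.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
    (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9),
]

def cut_size(edges, part):
    """Number of edges with exactly one endpoint inside `part`."""
    return sum(1 for u, v in edges if (u in part) != (v in part))

def volume(edges, part):
    """Sum of the degrees of the nodes in `part`."""
    return sum((u in part) + (v in part) for u, v in edges)

def ratio_cut(edges, parts):
    return sum(cut_size(edges, c) / len(c) for c in parts) / len(parts)

def normalized_cut(edges, parts):
    return sum(cut_size(edges, c) / volume(edges, c) for c in parts) / len(parts)

# A singleton split (the kind minimum cut tends to return) and a balanced split.
SINGLETON = [{9}, {1, 2, 3, 4, 5, 6, 7, 8}]
BALANCED = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]

if __name__ == "__main__":
    for name, parts in (("singleton", SINGLETON), ("balanced", BALANCED)):
        print(name, ratio_cut(EDGES, parts), normalized_cut(EDGES, parts))
```

The singleton split cuts only one edge, yet it scores worse on both objectives because each cut is divided by the size or volume of the community it isolates.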
Girvan-Newman Algorithm
Bickel-Chen on Community Detection
Inconsistency result for Newman-Girvan.
http://www.stat.berkeley.edu/~bickel/Bickel%20Chen%2021068.full.pdf
Agglomerative Hierarchical Clustering
• Initialize each node as a community
• Merge communities successively into larger
communities following a certain criterion
– E.g., based on modularity increase
Dendrogram according to Agglomerative Clustering based on Modularity
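The merge criterion can be made concrete by computing modularity for a candidate partition. The sketch below assumes the standard definition Q = Σ_c [ e_c / m − (vol_c / 2m)^2 ], where e_c is the number of edges inside community c, vol_c the sum of its degrees, and m the total number of edges; the edge list is an assumed 9-node graph reconstructed to be consistent with the other examples in the deck, and the full greedy merging loop is not shown.

```python
# ASSUMPTION: edge list reconstructed to match the slides' examples;
# the original figure is not available in the text.
EDGES = [
    (1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
    (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9),
]

def modularity(edges, communities):
    """Q = sum over communities of (internal edge share - expected share)."""
    m = len(edges)
    q = 0.0
    for community in communities:
        inside = sum(1 for u, v in edges if u in community and v in community)
        vol = sum((u in community) + (v in community) for u, v in edges)
        q += inside / m - (vol / (2 * m)) ** 2
    return q

BALANCED = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]
SINGLETON = [{9}, {1, 2, 3, 4, 5, 6, 7, 8}]

if __name__ == "__main__":
    print("balanced: ", modularity(EDGES, BALANCED))
    print("singleton:", modularity(EDGES, SINGLETON))
```

An agglomerative procedure would evaluate the change in Q for each candidate merge and apply the merge with the largest increase, recording the sequence of merges as a dendrogram.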
Sampling
• Types of Sampling
– Simple Random Sampling
– Stratified Sampling
– Cluster Sampling
• Sample Size
• Incorporating Sample Design
• Design vs. Model-Based Sampling
– Statisticians tend to favor the design-based point of
view because it makes no assumptions about the
mechanism that generates the data in the survey, as
we explain below
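The three sampling designs listed above can be contrasted with a minimal sketch. Everything here is hypothetical: the population of 100 labeled units, the proportional allocation used for the stratified design, and the function names are illustrative assumptions rather than part of the lecture.

```python
import random
from collections import defaultdict

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: 100 units, 70 in group "A" and 30 in group "B".
population = [(i, "A" if i < 70 else "B") for i in range(100)]

def simple_random_sample(units, n):
    """Every subset of size n is equally likely."""
    return rng.sample(units, n)

def stratified_sample(units, fraction):
    """Sample the same fraction independently within each stratum (group label)."""
    strata = defaultdict(list)
    for unit in units:
        strata[unit[1]].append(unit)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, round(fraction * len(members))))
    return sample

def cluster_sample(units, cluster_size, n_clusters):
    """Split units into consecutive clusters, then keep whole clusters chosen at random."""
    clusters = [units[i:i + cluster_size] for i in range(0, len(units), cluster_size)]
    chosen = rng.sample(clusters, n_clusters)
    return [unit for cluster in chosen for unit in cluster]

if __name__ == "__main__":
    print(len(simple_random_sample(population, 10)))
    print(sorted(label for _, label in stratified_sample(population, 0.1)))
    print(len(cluster_sample(population, 10, 2)))
```

Only the stratified design guarantees that the sample reproduces the 70/30 split exactly; the other two match it only on average.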
Respondent Driven Sampling (RDS)
• Sampling scheme for hard-to-reach populations, based
on link-tracing across a social network with coupon
incentives
• Increasingly widely used all over the world;
hundreds of studies done or ongoing, e.g., CDC
National HIV Behavioral Surveillance (NHBS) studies of
injection drug users
• RDS as sampling vs. RDS estimation
To Model or Not To Model;
Design-based vs. model-based
• Model the underlying network?
• What about unknown nodes?
• The recruitment process?
• Coupon refusal?
• The outcome variables (such as HIV status)?
References
• Modelo de Mundo Pequeno - Arthur Freitas Ramos & Hugo Neiva de Melo (000 - mundo-pequeno.pptx)
• Collective dynamics of ‘small-world’ networks - Duncan J. Watts & Steven H. Strogatz (001 - w_s_NATURE_0.pdf; 008 - 393440a0.pdf)
• Statistical Analysis of Network Data, Lecture 1: Network Mapping & Characterization - Eric D. Kolaczyk (003 - kolaczyk_yes13_1.pdf)
• Tutorial: Statistical Analysis of Network Data - Eric D. Kolaczyk (004 - Kolaczyk-CN.pdf)
• The Watts–Strogatz network model developed by including degree distribution: theory and computer simulation - Y. W. Chen, L. F. Zhang and J. P. Huang (005 - JPA-1.pdf)
• Where Online Friends Meet: Social Communities in Location-based Networks - Chloë Brown, Vincenzo Nicosia, Salvatore Scellato, Anastasios Noulas, Cecilia Mascolo (006 - icwsm12_brown.pdf)
• Mining of Massive Datasets - Anand Rajaraman, Jure Leskovec (001 – book.pdf)
• Machine Learning for Hackers - Drew Conway and John Myles White (002 - Machine_Learning_for_Hackers.pdf)
• pandas: powerful Python data analysis toolkit, Release 0.13.1 - Wes McKinney & PyData Development Team (003 – pandas.pdf)
• Provenance Data in Social Media - Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, Huan Liu. Synthesis Lectures on Data Mining and Knowledge Discovery #7, Morgan & Claypool Publishers (004 - Provenance-Data-in-Social-Media(2).pdf)
• Python for Data Analysis - Wes McKinney (005 - Python_for_Data_Analysis.pdf)
• Living Standards Analytics - Dominique Haughton, Jonathan Haughton (006 - bok%3A978-1-4614-0385-2.pdf)
• A nonparametric view of network models and Newman–Girvan and other modularities - Peter J. Bickel and Aiyou Chen (007 - Bickel%20Chen%2021068.full 2.pdf)
• Communities in Networks - Mason A. Porter, Jukka-Pekka Onnela, and Peter J. Mucha (009 - 0902.3788v2 a.pdf)
• Managing and Mining Graph Data - Haixun Wang, Charu C. Aggarwal. Beijing, China. Springer. (007 - bok%3A978-1-4419-6045-0.pdf)
• Social Network Data Analytics - Charu C. Aggarwal. Springer. (008 - bok%3A978-1-4419-8462-3.pdf)
• RDS Analysis Tools 7.2 RDSAT 7.1 User Manual. Springer. (009 - RDSAT_7.1-Manual_2012-11-25.pdf)
• Statistical Analysis of Network Data: Methods and Models - Eric D. Kolaczyk. Springer. (010 - bok_978-0-387-88146-1)
• Networks, Crowds, and Markets: Reasoning about a Highly Connected World - David Easley & Jon Kleinberg (011 - networks-book.pdf)

More Related Content

What's hot

Spatial patterns in evolutionary games on scale-free networks and multiplexes
Spatial patterns in evolutionary games on scale-free networks and multiplexesSpatial patterns in evolutionary games on scale-free networks and multiplexes
Spatial patterns in evolutionary games on scale-free networks and multiplexes
Kolja Kleineberg
 
The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics
The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics
The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics
Kolja Kleineberg
 
iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)
Moniroth Suon
 

What's hot (20)

Community detection
Community detectionCommunity detection
Community detection
 
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
Graph Community Detection Algorithm for Distributed Memory Parallel Computing...
 
Community detection algorithms
Community detection algorithmsCommunity detection algorithms
Community detection algorithms
 
Detecting Community Structures in Social Networks by Graph Sparsification
Detecting Community Structures in Social Networks by Graph SparsificationDetecting Community Structures in Social Networks by Graph Sparsification
Detecting Community Structures in Social Networks by Graph Sparsification
 
Entropy based algorithm for community detection in augmented networks
Entropy based algorithm for community detection in augmented networksEntropy based algorithm for community detection in augmented networks
Entropy based algorithm for community detection in augmented networks
 
Spatial patterns in evolutionary games on scale-free networks and multiplexes
Spatial patterns in evolutionary games on scale-free networks and multiplexesSpatial patterns in evolutionary games on scale-free networks and multiplexes
Spatial patterns in evolutionary games on scale-free networks and multiplexes
 
Collective navigation of complex networks: Participatory greedy routing
Collective navigation of complex networks: Participatory greedy routingCollective navigation of complex networks: Participatory greedy routing
Collective navigation of complex networks: Participatory greedy routing
 
The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics
The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics
The Hidden Geometry of Multiplex Networks @ Next Generation Network Analytics
 
Sunbelt 2013 Presentation
Sunbelt 2013 PresentationSunbelt 2013 Presentation
Sunbelt 2013 Presentation
 
FaceNet: A Unified Embedding for Face Recognition and Clustering
FaceNet: A Unified Embedding for Face Recognition and ClusteringFaceNet: A Unified Embedding for Face Recognition and Clustering
FaceNet: A Unified Embedding for Face Recognition and Clustering
 
Community Extracting Using Intersection Graph and Content Analysis in Complex...
Community Extracting Using Intersection Graph and Content Analysis in Complex...Community Extracting Using Intersection Graph and Content Analysis in Complex...
Community Extracting Using Intersection Graph and Content Analysis in Complex...
 
MobiCom CHANTS
MobiCom CHANTSMobiCom CHANTS
MobiCom CHANTS
 
Convolutional neural networks
Convolutional neural networks Convolutional neural networks
Convolutional neural networks
 
Convolutional Neural Network (CNN) - image recognition
Convolutional Neural Network (CNN)  - image recognitionConvolutional Neural Network (CNN)  - image recognition
Convolutional Neural Network (CNN) - image recognition
 
Ijcnc050213
Ijcnc050213Ijcnc050213
Ijcnc050213
 
Ijcatr04041019
Ijcatr04041019Ijcatr04041019
Ijcatr04041019
 
Face recognition using LDA
Face recognition using LDAFace recognition using LDA
Face recognition using LDA
 
"Visual analysis of controversy in user-generated encyclopedias": A critical ...
"Visual analysis of controversy in user-generated encyclopedias": A critical ..."Visual analysis of controversy in user-generated encyclopedias": A critical ...
"Visual analysis of controversy in user-generated encyclopedias": A critical ...
 
Least Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social NetworksLeast Cost Influence in Multiplex Social Networks
Least Cost Influence in Multiplex Social Networks
 
iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)iCAMPResearchPaper_ObjectRecognition (2)
iCAMPResearchPaper_ObjectRecognition (2)
 

Viewers also liked

Baocaoseminar2012
Baocaoseminar2012Baocaoseminar2012
Baocaoseminar2012
tuongnm
 

Viewers also liked (15)

Kernighan lin
Kernighan linKernighan lin
Kernighan lin
 
The Network, the Community and the Self-Creativity
The Network, the Community and the Self-CreativityThe Network, the Community and the Self-Creativity
The Network, the Community and the Self-Creativity
 
This isn't what I thought it was: community in the network age
This isn't what I thought it was: community in the network ageThis isn't what I thought it was: community in the network age
This isn't what I thought it was: community in the network age
 
Baocaoseminar2012
Baocaoseminar2012Baocaoseminar2012
Baocaoseminar2012
 
Maximum clique detection algorithm
Maximum clique detection algorithmMaximum clique detection algorithm
Maximum clique detection algorithm
 
Network centrality measures and their effectiveness
Network centrality measures and their effectivenessNetwork centrality measures and their effectiveness
Network centrality measures and their effectiveness
 
3 Centrality
3 Centrality3 Centrality
3 Centrality
 
Group and Community Detection in Social Networks
Group and Community Detection in Social NetworksGroup and Community Detection in Social Networks
Group and Community Detection in Social Networks
 
Clique
Clique Clique
Clique
 
Social Media Mining - Chapter 6 (Community Analysis)
Social Media Mining - Chapter 6 (Community Analysis)Social Media Mining - Chapter 6 (Community Analysis)
Social Media Mining - Chapter 6 (Community Analysis)
 
Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors Community Detection in Networks Using Page Rank Vectors
Community Detection in Networks Using Page Rank Vectors
 
Network ppt
Network pptNetwork ppt
Network ppt
 
Social Network Analysis
Social Network AnalysisSocial Network Analysis
Social Network Analysis
 
استاذ
استاذاستاذ
استاذ
 
Networking ppt
Networking ppt Networking ppt
Networking ppt
 

Similar to Network sampling, community detection

log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
ABINASHPADHY6
 
Clique-based Network Clustering
Clique-based Network ClusteringClique-based Network Clustering
Clique-based Network Clustering
Guang Ouyang
 
4.1 network analysis basic
4.1 network analysis basic4.1 network analysis basic
4.1 network analysis basic
jilung hsieh
 
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
NANDHINIS900805
 

Similar to Network sampling, community detection (20)

community Detection.pptx
community Detection.pptxcommunity Detection.pptx
community Detection.pptx
 
DS9 - Clustering.pptx
DS9 - Clustering.pptxDS9 - Clustering.pptx
DS9 - Clustering.pptx
 
Algorithm in Social network of graph and social network analysis
Algorithm in Social network of graph and social network analysisAlgorithm in Social network of graph and social network analysis
Algorithm in Social network of graph and social network analysis
 
Hierarchical Clustering
Hierarchical ClusteringHierarchical Clustering
Hierarchical Clustering
 
2016 Cytoscape 3.3 Tutorial
2016 Cytoscape 3.3 Tutorial2016 Cytoscape 3.3 Tutorial
2016 Cytoscape 3.3 Tutorial
 
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
log6kntt4i4dgwfwbpxw-signature-75c4ed0a4b22d2fef90396cdcdae85b38911f9dce0924a...
 
Clique-based Network Clustering
Clique-based Network ClusteringClique-based Network Clustering
Clique-based Network Clustering
 
4.1 network analysis basic
4.1 network analysis basic4.1 network analysis basic
4.1 network analysis basic
 
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
3b318431-df9f-4a2c-9909-61ecb6af8444.pptx
 
algoritma klastering.pdf
algoritma klastering.pdfalgoritma klastering.pdf
algoritma klastering.pdf
 
network mining and representation learning
network mining and representation learningnetwork mining and representation learning
network mining and representation learning
 
Advanced database and data mining & clustering concepts
Advanced database and data mining & clustering conceptsAdvanced database and data mining & clustering concepts
Advanced database and data mining & clustering concepts
 
Unsupervised learning (clustering)
Unsupervised learning (clustering)Unsupervised learning (clustering)
Unsupervised learning (clustering)
 
clustering_hierarchical ckustering notes.pdf
clustering_hierarchical ckustering notes.pdfclustering_hierarchical ckustering notes.pdf
clustering_hierarchical ckustering notes.pdf
 
Graph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear AlgebraGraph Analysis Beyond Linear Algebra
Graph Analysis Beyond Linear Algebra
 
26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt26-Clustering MTech-2017.ppt
26-Clustering MTech-2017.ppt
 
UNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptxUNIT_V_Cluster Analysis.pptx
UNIT_V_Cluster Analysis.pptx
 
machine learning - Clustering in R
machine learning - Clustering in Rmachine learning - Clustering in R
machine learning - Clustering in R
 
Training machine learning k means 2017
Training machine learning k means 2017Training machine learning k means 2017
Training machine learning k means 2017
 
CS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit VCS6010 Social Network Analysis Unit V
CS6010 Social Network Analysis Unit V
 

Recently uploaded

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Peter Udo Diehl
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
Safe Software
 

Recently uploaded (20)

Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo DiehlFuture Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
Future Visions: Predictions to Guide and Time Tech Innovation, Peter Udo Diehl
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Quantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIsQuantum Computing: Current Landscape and the Future Role of APIs
Quantum Computing: Current Landscape and the Future Role of APIs
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
Essentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with ParametersEssentials of Automations: Optimizing FME Workflows with Parameters
Essentials of Automations: Optimizing FME Workflows with Parameters
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Demystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John StaveleyDemystifying gRPC in .Net by John Staveley
Demystifying gRPC in .Net by John Staveley
 
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»НАДІЯ ФЕДЮШКО БАЦ  «Професійне зростання QA спеціаліста»
НАДІЯ ФЕДЮШКО БАЦ «Професійне зростання QA спеціаліста»
 
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
De-mystifying Zero to One: Design Informed Techniques for Greenfield Innovati...
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
Speed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in MinutesSpeed Wins: From Kafka to APIs in Minutes
Speed Wins: From Kafka to APIs in Minutes
 
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 

Network sampling, community detection

  • 1. Network Sampling, Community Detection (Aula 18) Roberval Gomes Mariano José Macedo Orientando Co-Orientador
  • 2.
  • 3.
  • 4.
  • 5.
  • 6.
  • 7.
  • 8.
  • 9.
  • 10.
  • 11. The social network of friendships within a 34-person karate club When people are inuenced by the behaviors their neighbors in the network (a kind of informational or social contagion.) The spread of an epidemic disease (such as the tuberculosis)
  • 14. Eric D. Kolaczyk • Statistical Analysis of Network Data • Lecture 1 { Network Mapping & Characterization} • Dept of Mathematics and Statistics, Boston University • kolaczyk@bu.edu
  • 15. Centrality measures Closeness (Proximidade) Betweenness Usadas para calcular a importância dos Vértices dentro do grafo eigenvector A centralidade de autovalor (e.g. PageRank do Google) mede a importância de um vértice na rede considerando cada aresta como um voto.
  • 17. O quão pequeno é o mundo? • Quantos contatos separam uma pessoa de outra, em qualquer lugar do mundo? • 1990 – John Guare e a peça “Six Degrees of Separation”. • Karinthy acreditava que o mundo estava “encolhendo”. • 1960 – The Small World Experiment por Stanley Milgram e Jeffrey Travers.
  • 18. Modelo de Crescimento Exponencial • Você : 1 • Seus amigos : 100 • Os amigos dos seus amigos : 10000
  • 19. Modelo de Crescimento Exponencial • Em apenas 5 passos, 1005 pessoas seriam alcançadas, o que é maior que a população mundial. • Esse modelo despreza a probabilidade de cada pessoa ter amigos em comum. • Não serve para representar os relacionamentos sociais no mundo real. • Mundo real – muitos triângulos. • Como é possível então, que as redes sociais possuem muitos triângulos em escopo local e mesmo assim constituírem um mundo pequeno?
  • 20. Modelo de Watts-Strogatz • Criado por Duncan J. Watts e Steven Strogatz, em 1998. • Introduz o conceito de coeficiente de clusterização. • Consegue simular tanto redes aleatórias como as redes sociais do mundo real.
  • 22. Construção do grafo • O modelo de Watts-Strogatz usa os seguintes parâmetros: • N  número de nós; • K  grau de cada nó; • (0 <= P <= 1) -> coeficiente que modela o grafo, quanto maior P, maior a desordem; • C  coeficiente de clusterização; • L  distância média entre dois nós da rede.
  • 24. Estudo do grafo p = 0 (ordenado) p = 1 (aleatório) • L (n / k) L (ln n /ln k) • C 3 / 4 C (k / n) 0 • O que acontece com valores de P intermediários? • P pequeno - diminui consideravelmente L, mantém C praticamente constante. • P pequeno caracteriza redes de mundo pequeno. • Condições necessárias: alguma fonte de ordenação; um pouco de aleatoriedade.
  • 25. Por que o mundo real é pequeno? • Eu conheço o Governador! - E eu a Xuxa, agora pergunta se ele conhece a minha pessoa! • Ordenação  redes de amigos locais altamente clusterizadas. • Aleatoriedade  algum contato que foge da rede local. Número de Bacon http://oracleofbacon.org/ Facebook • Experiência com 5,8 milhões de usuários. • Distância média: 5,73. • Distância máxima: 12. Twitter • Estudo da relação de um usuário seguir outro. • 5,2 bilhões de relações. • Distância média: 4,67.
  • 26. Class Size Paradox • Simple example: each student takes one course; suppose there is one course with 100 students and fifty courses with 2 students each. – Dean calculates: (100 + 50*2)/51 = 3.92 – Students calculate: (100*100 + 100*2)/200 = 51 • Class Size Paradox in Networks – The average number of friends of a person’s friends is greater than the average number of friends of a person.
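The two averages on this slide, and their network counterpart, can be reproduced in a few lines. This is a sketch: the class sizes come from the slide, while the four-node star graph and the helper names are illustrative assumptions.

```python
from fractions import Fraction

class_sizes = [100] + [2] * 50  # one course of 100, fifty courses of 2

# Dean's view: average over courses.
dean_average = Fraction(sum(class_sizes), len(class_sizes))

# Students' view: average over students, each reporting their own class size.
student_average = Fraction(sum(s * s for s in class_sizes), sum(class_sizes))

print(float(dean_average), float(student_average))


def mean_degree(adj):
    """Average number of friends per person."""
    return Fraction(sum(len(nbrs) for nbrs in adj.values()), len(adj))


def mean_friend_degree(adj):
    """Average, over people, of the mean number of friends their friends have."""
    per_person = [
        Fraction(sum(len(adj[v]) for v in nbrs), len(nbrs))
        for nbrs in adj.values() if nbrs
    ]
    return sum(per_person) / len(per_person)


# Hypothetical star graph: person 0 is friends with persons 1, 2 and 3.
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
print(mean_degree(star), mean_friend_degree(star))
```

In both cases the large class or the high-degree person is counted once per member or per friend, which pulls the second average up.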
  • 28. Community Detection Algorithms • Metric: modularity. • Community: a set of individuals who interact with each other more frequently than with those outside the group; a.k.a. group, cluster, cohesive subgroup, or module in different contexts. • Community detection: discovering groups in a network where individuals’ group memberships are not explicitly given.
  • 29. Complete Mutuality: Cliques • Clique: a maximal complete subgraph, i.e. one in which all nodes are adjacent to each other • Finding the maximum clique in a network is NP-hard • A straightforward implementation to find cliques is very expensive in time complexity • Example: nodes 5, 6, 7 and 8 form a clique
  • 30. Finding the Maximum Clique • In a clique of size k, each node has degree >= k-1 – Nodes with degree < k-1 cannot belong to a clique of size k • Recursively apply the following pruning procedure – Sample a sub-network from the given network, and find a clique in the sub-network, say, by a greedy approach – Suppose the clique found has size k; to find a larger clique, remove all nodes with degree <= k-1 • Repeat until the network is small enough • Many nodes will be pruned, as node degrees in social media networks follow a power law distribution
  • 31. Maximum Clique Example • Suppose we sample a sub-network with nodes {1-9} and find a clique {1, 2, 3} of size 3 • To find a clique larger than 3, remove all nodes with degree <= 3-1 = 2 – Remove nodes 2 and 9 – Remove nodes 1 and 3 – Remove node 4
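The pruning rounds in this example can be traced in code. The figure with the 9-node network is not part of the transcript, so the edge list below is an assumption, reconstructed to be consistent with the cliques and removal order listed on the slides; the function names are illustrative.

```python
# Assumed edge list for the 9-node example network (the figure is not in the transcript).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def prune_for_larger_clique(adj, k):
    """Repeatedly drop nodes of degree <= k-1; they cannot be in a clique larger than k."""
    adj = {u: set(nbrs) for u, nbrs in adj.items()}  # work on a copy
    rounds = []
    while True:
        low = sorted(u for u, nbrs in adj.items() if len(nbrs) <= k - 1)
        if not low:
            return rounds, adj
        rounds.append(low)
        for u in low:
            for v in adj.pop(u):
                if v in adj:
                    adj[v].discard(u)


rounds, remaining = prune_for_larger_clique(build_adjacency(EDGES), k=3)
print(rounds)             # nodes removed in each round
print(sorted(remaining))  # nodes left to search for a larger clique
```

Under the assumed edge list, the nodes that survive pruning are mutually adjacent, so the larger clique can be read off directly; in general a clique search on the pruned network is still needed.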
  • 32. Clique Percolation Method (CPM) • The clique is a very strict definition, and an unstable one • Cliques are normally used as a core or seed to find larger communities • CPM is such a method for finding overlapping communities – Input • A parameter k, and a network – Procedure • Find all cliques of size k in the given network • Construct a clique graph; two cliques are adjacent if they share k-1 nodes • Each connected component in the clique graph forms a community
  • 33. CPM Example • Cliques of size 3: {1, 2, 3}, {1, 3, 4}, {4, 5, 6}, {5, 6, 7}, {5, 6, 8}, {5, 7, 8}, {6, 7, 8} • Communities: {1, 2, 3, 4}, {4, 5, 6, 7, 8}
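The CPM procedure can be sketched in plain Python for this example. The edge list is an assumption (the figure is not in the transcript), chosen so that its size-3 cliques are exactly the seven listed on the slide; the brute-force clique enumeration is only suitable for toy networks, and all function names are illustrative.

```python
from itertools import combinations

# Assumed edge list for the 9-node example network (the figure is not in the transcript).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def k_cliques(adj, k):
    """All complete subgraphs on exactly k nodes (brute force)."""
    return [frozenset(c) for c in combinations(sorted(adj), k)
            if all(v in adj[u] for u, v in combinations(c, 2))]


def clique_percolation(adj, k):
    """Communities = connected components of the clique graph (cliques sharing k-1 nodes)."""
    cliques = k_cliques(adj, k)
    seen, communities = set(), []
    for start in range(len(cliques)):
        if start in seen:
            continue
        seen.add(start)
        stack, members = [start], set()
        while stack:
            a = stack.pop()
            members |= cliques[a]
            for b in range(len(cliques)):
                if b not in seen and len(cliques[a] & cliques[b]) == k - 1:
                    seen.add(b)
                    stack.append(b)
        communities.append(sorted(members))
    return communities


adj = build_adjacency(EDGES)
print([sorted(c) for c in k_cliques(adj, 3)])
print(clique_percolation(adj, 3))
```

Node 4 ends up in both communities, which is what makes the result overlapping rather than a partition.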
  • 34. Reachability: k-clique, k-club • Any node in a group should be reachable in k hops • k-clique: a maximal subgraph in which the largest geodesic distance between any two nodes is <= k • k-club: a substructure of diameter <= k • A k-clique might have diameter larger than k within the subgraph – E.g. {1, 2, 3, 4, 5} • Commonly used in traditional SNA • Often involves combinatorial optimization • Example: Cliques: {1, 2, 3}; 2-cliques: {1, 2, 3, 4, 5}, {2, 3, 4, 5, 6}; 2-clubs: {1, 2, 3, 4}, {1, 2, 3, 5}, {2, 3, 4, 5, 6}
  • 35. Group-Centric Community Detection: Density-Based Groups • The group-centric criterion requires the whole group to satisfy a certain condition – E.g., the group density >= a given threshold • A subgraph Gs(Vs, Es) is a γ-dense quasi-clique if 2|Es| / (|Vs|(|Vs|-1)) >= γ, where the denominator is the maximum possible sum of degrees • A strategy similar to that for cliques can be used – Sample a subgraph, and find a maximal γ-dense quasi-clique (say, of size |Vs|) – Remove nodes with degree less than the average degree
  • 36. Network-Centric Community Detection • The network-centric criterion considers the connections within a network globally • Goal: partition the nodes of a network into disjoint sets • Approaches: – (1) Clustering based on vertex similarity – (2) Latent space models (multi-dimensional scaling) – (3) Block model approximation – (4) Spectral clustering – (5) Modularity maximization
  • 37. Clustering based on Vertex Similarity • Apply k-means or similarity-based clustering to nodes • Vertex similarity is defined in terms of the similarity of the vertices' neighborhoods • Structural equivalence: two nodes are structurally equivalent iff they are connected to the same set of actors • Structural equivalence is too strict for practical use • Example: nodes 1 and 3 are structurally equivalent; so are nodes 5 and 6
  • 38. Vertex Similarity • Jaccard similarity • Cosine similarity
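The formulas for these two measures are not in the transcript. A common neighborhood-based form, assumed here, is Jaccard(u, v) = |N(u) ∩ N(v)| / |N(u) ∪ N(v)| and Cosine(u, v) = |N(u) ∩ N(v)| / sqrt(|N(u)| |N(v)|), where N(u) is the set of neighbors of u. The sketch below applies them to an assumed edge list for the 9-node example network (the figure is not in the transcript).

```python
from math import sqrt

# Assumed edge list for the 9-node example network (the figure is not in the transcript).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def jaccard(adj, u, v):
    """Shared neighbours divided by the size of the combined neighbourhood."""
    union = adj[u] | adj[v]
    return len(adj[u] & adj[v]) / len(union) if union else 0.0


def cosine(adj, u, v):
    """Shared neighbours divided by the geometric mean of the two degrees."""
    if not adj[u] or not adj[v]:
        return 0.0
    return len(adj[u] & adj[v]) / sqrt(len(adj[u]) * len(adj[v]))


adj = build_adjacency(EDGES)
print(jaccard(adj, 4, 6), cosine(adj, 4, 6))  # nodes 4 and 6 share only node 5
print(jaccard(adj, 5, 6), cosine(adj, 5, 6))  # nodes 5 and 6 share three neighbours
```

Because two adjacent nodes appear in each other's neighborhoods, even structurally equivalent neighbors such as 5 and 6 score below 1 under this definition.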
  • 39. Cut • Most interactions are within groups, whereas interactions between groups are few • Community detection → minimum cut problem • Cut: a partition of the vertices of a graph into two disjoint sets • Minimum cut problem: find a graph partition such that the number of edges between the two sets is minimized
  • 40. Ratio Cut & Normalized Cut • Minimum cut often returns an imbalanced partition, with one set being a singleton, e.g. node 9 • Change the objective function to consider community size • Ci: a community • |Ci|: number of nodes in Ci • vol(Ci): sum of degrees in Ci
  • 41. Ratio Cut & Normalized Cut Example For partition in red: For partition in green: Both ratio cut and normalized cut prefer a balanced partition
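The objective formulas and the colored partitions are not in the transcript, so the following is a sketch under stated assumptions: ratio cut is taken as the average over communities of cut(Ci) / |Ci|, normalized cut as the average of cut(Ci) / vol(Ci), the network is the assumed 9-node edge list, and the two partitions compared (node 9 alone versus {1, 2, 3, 4} against the rest) are illustrative choices.

```python
from fractions import Fraction

# Assumed edge list for the 9-node example network (the figure is not in the transcript).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def cut_size(adj, community):
    """Number of edges with exactly one endpoint inside the community."""
    return sum(1 for u in community for v in adj[u] if v not in community)


def ratio_cut(adj, partition):
    """Average of cut(Ci) / |Ci| over the communities."""
    return sum(Fraction(cut_size(adj, c), len(c)) for c in partition) / len(partition)


def normalized_cut(adj, partition):
    """Average of cut(Ci) / vol(Ci), where vol is the sum of degrees in Ci."""
    return sum(Fraction(cut_size(adj, c), sum(len(adj[u]) for u in c))
               for c in partition) / len(partition)


adj = build_adjacency(EDGES)
singleton = [{9}, {1, 2, 3, 4, 5, 6, 7, 8}]   # the minimum cut: one edge crosses
balanced = [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]    # two edges cross, but sizes are even

for name, part in (("singleton", singleton), ("balanced", balanced)):
    print(name, ratio_cut(adj, part), normalized_cut(adj, part))
```

Both objectives come out smaller for the balanced partition even though it cuts more edges, which matches the slide's conclusion.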
  • 43. Bickel-Chen on Community Detection Inconsistency result for Newman-Girvan. http://www.stat.berkeley.edu/~bickel/Bickel%20Chen%2021068.full.pdf
  • 44. Agglomerative Hierarchical Clustering • Initialize each node as a community • Merge communities successively into larger communities following a certain criterion – E.g., based on modularity increase • Figure: dendrogram according to agglomerative clustering based on modularity
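The merge-by-modularity idea can be sketched as a simple greedy loop. The slides do not give the modularity formula; the version assumed here is the usual Q = sum over communities of (edges inside / m) - (vol / 2m)^2, with m the number of edges. The edge list is the assumed 9-node example network, the exhaustive pair search is a naive illustration rather than an efficient algorithm, and the function names are hypothetical.

```python
from fractions import Fraction
from itertools import combinations

# Assumed edge list for the 9-node example network (the figure is not in the transcript).
EDGES = [(1, 2), (1, 3), (1, 4), (2, 3), (3, 4), (4, 5), (4, 6),
         (5, 6), (5, 7), (5, 8), (6, 7), (6, 8), (7, 8), (7, 9)]


def build_adjacency(edges):
    """Undirected adjacency sets from a list of edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def modularity(adj, communities):
    """Q = sum over communities of (internal edges / m) - (volume / 2m)^2."""
    two_m = sum(len(nbrs) for nbrs in adj.values())
    q = Fraction(0)
    for c in communities:
        internal_twice = sum(1 for u in c for v in adj[u] if v in c)  # each edge counted twice
        volume = sum(len(adj[u]) for u in c)
        q += Fraction(internal_twice, two_m) - Fraction(volume, two_m) ** 2
    return q


def greedy_agglomerative(adj):
    """Start from singletons; repeatedly apply the merge with the largest modularity gain."""
    communities = [frozenset([u]) for u in sorted(adj)]
    best_q = modularity(adj, communities)
    while len(communities) > 1:
        candidates = []
        for a, b in combinations(communities, 2):
            merged = [c for c in communities if c not in (a, b)] + [a | b]
            candidates.append((modularity(adj, merged), merged))
        q, merged = max(candidates, key=lambda item: item[0])
        if q <= best_q:  # no merge improves modularity: stop
            break
        best_q, communities = q, merged
    return communities, best_q


adj = build_adjacency(EDGES)
print(modularity(adj, [{1, 2, 3, 4}, {5, 6, 7, 8, 9}]))
found, q = greedy_agglomerative(adj)
print(sorted(sorted(c) for c in found), q)
```

Recording the order of the merges, rather than only the final partition, is what would produce the dendrogram.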
  • 45. Sampling • Types of Sampling – Simple Random Sampling – Stratified Sampling – Cluster Sampling • Sample Size • Incorporating Sample Design • Design vs. Model-Based Sampling – Statisticians tend to favor the design-based point of view because it makes no assumptions about the mechanism that generates the data in the survey, as we explain below
  • 46. Respondent Driven Sampling (RDS) • Sampling scheme for hard-to-reach populations, based on link-tracing across a social network with coupon incentives • Increasingly widely used around the world; hundreds of studies done or ongoing, e.g., CDC National HIV Behavioral Surveillance (NHBS) studies of injection drug users • RDS as sampling vs. RDS estimation
  • 48. To Model or Not To Model; Design-based vs. model-based • Model the underlying network? • What about unknown nodes? • The recruitment process? • Coupon refusal? • The outcome variables (such as HIV status)?
  • 49. References • Modelo de Mundo Pequeno. Arthur Freitas Ramos & Hugo Neiva de Melo. (000 - mundo-pequeno.pptx) • Collective dynamics of ‘small-world’ networks. Duncan J. Watts & Steven H. Strogatz. (001 - w_s_NATURE_0.pdf) • Statistical Analysis of Network Data, Lecture 1: Network Mapping & Characterization. Eric D. Kolaczyk. (003 - kolaczyk_yes13_1.pdf) • Tutorial: Statistical Analysis of Network Data. Eric D. Kolaczyk. (004 - Kolaczyk-CN.pdf) • The Watts–Strogatz network model developed by including degree distribution: theory and computer simulation. Y W Chen, L F Zhang and J P Huang. (005 - JPA-1.pdf) • Where Online Friends Meet: Social Communities in Location-based Networks. Chloë Brown, Vincenzo Nicosia, Salvatore Scellato, Anastasios Noulas, Cecilia Mascolo. (006 - icwsm12_brown.pdf) • Mining of Massive Datasets. Anand Rajaraman, Jure Leskovec. (001 - book.pdf) • Machine Learning for Hackers. Drew Conway and John Myles White. (002 - Machine_Learning_for_Hackers.pdf) • pandas: powerful Python data analysis toolkit, Release 0.13.1. Wes McKinney & PyData Development Team. (003 - pandas.pdf) • Provenance Data in Social Media. Geoffrey Barbier, Zhuo Feng, Pritam Gundecha, Huan Liu. Synthesis Lectures on Data Mining and Knowledge Discovery #7. Morgan & Claypool Publishers. (004 - Provenance-Data-in-Social-Media(2).pdf) • Python for Data Analysis. Wes McKinney. (005 - Python_for_Data_Analysis.pdf) • Living Standards Analytics. Dominique Haughton, Jonathan Haughton. (006 - bok%3A978-1-4614-0385-2.pdf) • A nonparametric view of network models and Newman–Girvan and other modularities. Peter J. Bickel and Aiyou Chen. (007 - Bickel%20Chen%2021068.full 2.pdf) • Collective dynamics of ‘small-world’ networks. Duncan J. Watts & Steven H. Strogatz. (008 - 393440a0.pdf) • Communities in Networks. Mason A. Porter, Jukka-Pekka Onnela, and Peter J. Mucha. (009 - 0902.3788v2 a.pdf) • Managing and Mining Graph Data. Haixun Wang, Charu C. Aggarwal. Beijing, China. Springer. (007 - bok%3A978-1-4419-6045-0.pdf) • Social Network Data Analytics. Charu C. Aggarwal. Springer. (008 - bok%3A978-1-4419-8462-3.pdf) • RDS Analysis Tools 7.2 RDSAT 7.1 User Manual. Springer. (009 - RDSAT_7.1-Manual_2012-11-25.pdf) • Statistical Analysis of Network Data: Methods and Models. Eric D. Kolaczyk. Springer. (010 - bok_978-0-387-88146-1) • Networks, Crowds, and Markets: Reasoning about a Highly Connected World. David Easley & Jon Kleinberg. (011 - networks-book.pdf)

Editor's Notes

  1. A typical goal is to partition a network into disjoint sets, but that can be extended to soft clustering as well.
  2. \widetilde{P} is introduced for notational convenience. It can be constructed from the proximity matrix P.