Fast RankClus Algorithm via Dynamic Rank Score Tracking
on Bi-type Information Networks
iiWAS 2019
Kotaro Yamazaki✝, Shohei Matsugu✝
Hiroaki Shiokawa✝✝, Hiroyuki Kitagawa✝✝
✝: Graduate School of SIE, University of Tsukuba
✝✝: Center for Computational Sciences, University of Tsukuba
Background
2
Background:
Ubiquitous Information Networks
• Information network:
Each node represents an entity and each link (edge) represents a relationship
between entities
• Homogeneous vs. heterogeneous networks
- Homogeneous network
Objects of a single type
E.g., co-author networks, Web pages, friendship networks
Most existing studies focus on homogeneous networks
- Heterogeneous network
Objects of several types
E.g., conference-author networks, medical networks
Most real systems can be modeled as heterogeneous networks
3
Background:
RankClus
[Sun et al., EDBT’09]
A ranking-based clustering algorithm for heterogeneous networks
The Methodology
• Use rank scores as the features of clusters
• Assign each node to the cluster in which it has the highest rank score
• Repeat, improving the quality of clustering and ranking mutually
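The mutual ranking-and-clustering loop above can be sketched in Python on a toy conference-author network. This is a hedged illustration only: the degree-based ranking and best-match re-assignment below are simplified stand-ins for RankClus's conditional ranking and mixture-model clustering, and all names are invented for the example.

```python
import random

def rankclus_toy(links, Y, K, n_iter=10, seed=0):
    # links: target node x -> list of attribute nodes y it connects to
    # Y: all attribute nodes; K: number of clusters
    random.seed(seed)
    clusters = {x: random.randrange(K) for x in links}     # Step 0: initial partition
    for _ in range(n_iter):
        # Step 1 (simplified ranking): attribute-node scores per cluster subgraph
        scores = [{y: 0.0 for y in Y} for _ in range(K)]
        for x, ys in links.items():
            for y in ys:
                scores[clusters[x]][y] += 1.0
        for k in range(K):
            total = sum(scores[k].values()) or 1.0
            scores[k] = {y: s / total for y, s in scores[k].items()}
        # Step 2 (simplified clustering): re-assign each target node
        for x, ys in links.items():
            clusters[x] = max(range(K),
                              key=lambda k, ys=ys: sum(scores[k][y] for y in ys))
    return clusters
```

On a toy network with two clearly separable conference groups, nodes with identical link sets always end up in the same cluster.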
4
Heterogeneous Network
RankClus
Framework
with rank scores
Background:
Bottleneck of RankClus
The ranking process consumes most of the computational cost
5
Why?
• Generates as many subgraphs as there are clusters
• Iteratively updates rank scores for all nodes
Initialization
Ranking
Clustering
Repeat
Background:
Related Work
• Pruning-RankClus [Yamazaki et al., iiWAS’18]
A fast RankClus algorithm that prunes nodes
- Approach
Identify nodes that do not significantly affect the clustering result
and prune them
- Bottleneck
Difficult to prune while maintaining accuracy
Requires setting many user-specified parameters
6
Background:
Our Goal
Reduce the computational cost of RankClus
7
Local update
Approach
Focus on the dynamic graph property of RankClus and
compute only evolving nodes and their neighbors efficiently
Background:
Our Contributions
1. Efficient
Our proposed method outperforms RankClus and the state-of-the-art
algorithm (Pruning-RankClus)
2. Highly accurate
Although our proposed algorithm does not compute over the entire graph,
its clustering results are more accurate than those of the
state-of-the-art algorithm
3. Easy to deploy
Our proposed method requires fewer user-specified parameters than the
state-of-the-art algorithm
8
Preliminary
9
Preliminary:
Data Model: Bi-type Information Network
• A graph consisting of two types of nodes, X and Y
E.g.) Conference-author network
- Links can exist between
 Conferences (X) and authors (Y)
 Authors (Y) and authors (Y)
10
Target type
Target of clustering
Attribute type
Support information for clustering
X Y
Preliminary:
Algorithm Framework - Overview
11
Clustering
Initialization
Ranking
Step 0:
Step 1:
Step 2:
Repeat
with rank scores
Target type
Input
• Bi-type information network: 𝔾
• Number of clusters: K (Ex: K = 2)
• Ranking function: f
Output
Clusters of target-type nodes with rank scores
Clustering
Initialization
Ranking
Step 0:
Step 1:
Step 2:
1. Partition target-type nodes into K
clusters
2. Construct subgraphs based on
their clusters
Repeat
Preliminary:
Step 0: Partition and Construct Subgraphs
12
Preliminary:
Step 1: Ranking for Each Subgraph
13
Compute rank scores for each type
by a ranking function
Clustering
Initialization
Ranking
Step 0:
Step 1:
Step 2:
Repeat
Ranking for all nodes
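Step 1's ranking function can be sketched as plain power iteration for Personalized PageRank, the ranking function named in the problem setting later. This is a minimal sketch under assumptions: an adjacency-list graph, the convention r = αPr + (1 − α)b with column-stochastic P = AD⁻¹, and illustrative defaults; it is not the paper's implementation.

```python
def personalized_pagerank(adj, b, alpha=0.85, n_iter=200):
    # Power iteration for r = alpha * P r + (1 - alpha) * b, where
    # P = A D^-1 is the column-stochastic transition matrix of the
    # subgraph and b is the preference (restart) vector.
    r = {v: b.get(v, 0.0) for v in adj}
    for _ in range(n_iter):
        nxt = {v: (1 - alpha) * b.get(v, 0.0) for v in adj}
        for u, nbrs in adj.items():
            if nbrs:
                share = alpha * r[u] / len(nbrs)
                for v in nbrs:
                    nxt[v] += share   # spread u's score over its out-links
        r = nxt
    return r
```

With no dangling nodes and b summing to 1, the scores stay a probability distribution across iterations.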
Preliminary:
Step 2: Clustering for Each Target Node
14
• Considers the rank scores of attribute-type
nodes as cluster features
• Estimates the posterior probability that
each target node belongs to a cluster
Clustering
Initialization
Ranking
Step 0:
Step 1:
Step 2:
Repeat
Re-assign new cluster
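The posterior-based re-assignment can be sketched as below, reading each cluster's attribute rank scores as a multinomial component. This is an illustrative simplification of the paper's mixture-model step; the function and argument names are invented for the example.

```python
import math

def assign_clusters(links, comp_scores, priors):
    # comp_scores[k][y]: rank score of attribute node y in cluster k,
    # read as a multinomial probability; priors[k]: cluster prior.
    # Each target node x goes to the cluster with the highest
    # log-posterior given the attribute nodes it links to.
    out = {}
    for x, ys in links.items():
        def log_post(k, ys=ys):
            return math.log(priors[k]) + sum(math.log(comp_scores[k][y])
                                             for y in ys)
        out[x] = max(range(len(priors)), key=log_post)
    return out
```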
15
Preliminary:
Key Observations
In each iteration, for subgraph 𝔾𝑖:
1. New nodes and edges are inserted into 𝔾𝑖
2. Several nodes and edges are removed from 𝔾𝑖
Each subgraph can be regarded as a dynamic graph
𝒕
𝒕+1
𝒕+2
Proposed Method
16
Proposed Method:
Problem Setting
• Given: 𝔾₀ at initial time t = 0
17
Problem at time t = 0:
1. Initialization
2. Ranking (ranking function: Personalized PageRank)
3. Clustering
Given at time 𝑡:
Nodes and edges are inserted into or removed from the subgraphs
Problem at time 𝑡:
1. Compute approximate rank scores
2. Clustering
Proposed Method:
Overview
Adopt A Dynamic PPR computation
based on the Gauss-Southwell method [Ohsaka KDD’15]
18
Main Idea
1. Each subgraph can be regarded as a dynamic graph
2. Previous rank score is a GOOD initial rank score.
3. We need to improve the approximate rank score locally
[14] Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi.
Efficient PageRank Tracking in Evolving Networks. KDD ’15, 875–884.
Proposed Method:
Gauss-Southwell Method [Southwell ’40, ’46]
• 𝑣-th iterate of the rank score 𝑟𝑃𝑃𝑅(𝑣) on 𝔾𝑖
• Corresponding residual 𝑑(𝑣): 𝑑(𝑣) = (1 − 𝛼)𝑏 − (𝐼 − 𝛼𝑃)𝑟𝑃𝑃𝑅(𝑣)
Goal: residual 𝑑(𝑣) → 0
19
Each step picks a node 𝑖 whose residual 𝑑𝑖 satisfies 𝑒𝑟𝑟𝑜𝑟 ≥ 𝜖,
propagates 𝛼𝑑𝑖/deg(𝑖) to its neighbors, and updates 𝑟 and 𝑑;
stop once every residual satisfies 𝑒𝑟𝑟𝑜𝑟 < 𝜖.
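A minimal Gauss-Southwell solver for r = αPr + (1 − α)b, matching the residual definition above, can be sketched as follows. The adjacency-list graph format, the column-stochastic P = AD⁻¹ convention, and the defaults are assumptions for the sketch, not the paper's exact implementation.

```python
def gauss_southwell_ppr(adj, b, alpha=0.85, eps=1e-9):
    # Solve (I - alpha*P) r = (1 - alpha) b by residual pushes:
    # repeatedly pick the node with the largest residual, fold its
    # residual into its rank score, and propagate alpha * d / deg
    # to its neighbors, until every residual is below eps.
    r = {v: 0.0 for v in adj}
    d = {v: (1 - alpha) * b.get(v, 0.0) for v in adj}
    while True:
        u = max(d, key=lambda v: abs(d[v]))
        if abs(d[u]) < eps:
            return r
        du, d[u] = d[u], 0.0      # zero out u's residual...
        r[u] += du                # ...and absorb it into the score
        for v in adj[u]:
            d[v] += alpha * du / len(adj[u])
```

Because each push removes |dᵤ| from the residual mass and re-injects only α|dᵤ|, the total residual shrinks geometrically and the loop terminates.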
Proposed Method:
Dynamic Rank Score Tracking
20
• Iteration number: 𝑡
At time 𝑡, the cluster is updated (nodes are added or removed), so warm-start
the Gauss-Southwell algorithm from the previous iteration:
𝑟𝑃𝑃𝑅^(𝑡)(𝑣 = 0) = 𝑟𝑃𝑃𝑅^(𝑡−1)
𝑑^(𝑡)(𝑣 = 0) = 𝑑^(𝑡−1) + 𝛼(𝑃^(𝑡) − 𝑃^(𝑡−1)) 𝑟𝑃𝑃𝑅^(𝑡−1)
Compute the approximate rank scores by the Gauss-Southwell algorithm
Proposed Method:
Dynamic Rank Score Tracking
21
𝒗 = 𝟎: compute the initial solution from the previous iteration's rank
scores and residuals, then push residuals only where 𝑒𝑟𝑟𝑜𝑟 ≥ 𝜖.
Proposed Method:
Dynamic Rank Score Tracking
22
𝒗 = 𝟏, 𝟐, …: pick a node whose residual satisfies 𝑒𝑟𝑟𝑜𝑟 ≥ 𝜖, propagate
its residual to its neighbors, and update 𝑟 and 𝑑; terminate once every
residual satisfies 𝑒𝑟𝑟𝑜𝑟 < 𝜖.
23
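The warm start and push loop can be sketched together as below. For simplicity the sketch recomputes the warm start's full residual d = (1 − α)b − (I − αP⁽ᵗ⁾)r⁽ᵗ⁻¹⁾ directly, which equals d⁽ᵗ⁻¹⁾ + α(P⁽ᵗ⁾ − P⁽ᵗ⁻¹⁾)r⁽ᵗ⁻¹⁾ when d⁽ᵗ⁻¹⁾ ≈ 0; the graph format and names are assumptions for the example.

```python
def track_ppr(adj_new, r_old, b, alpha=0.85, eps=1e-9):
    # Warm-start from the previous solution, compute the residual for
    # the NEW graph's transition matrix P = A D^-1, then run
    # Gauss-Southwell pushes until every residual is below eps.
    r = {v: r_old.get(v, 0.0) for v in adj_new}   # carry over old scores
    d = {v: (1 - alpha) * b.get(v, 0.0) - r[v] for v in adj_new}
    for u, nbrs in adj_new.items():
        for v in nbrs:
            d[v] += alpha * r[u] / len(nbrs)
    while True:
        u = max(d, key=lambda v: abs(d[v]))
        if abs(d[u]) < eps:
            return r
        du, d[u] = d[u], 0.0
        r[u] += du
        for v in adj_new[u]:
            d[v] += alpha * du / len(adj_new[u])
```

Because only the evolved part of the graph produces large residuals, most pushes stay local to inserted or removed nodes and their neighbors.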
Evaluation Experiments
25
Evaluation Experiments:
Setup
• Datasets
• Algorithms
- Proposed method: reduces cost by updating locally
- RankClus: the original algorithm
- Pruning: reduces cost by pruning
[Yamazaki et al., iiWAS’18] [Yamazaki et al., JDI’19]
• Evaluation Criterion
- NMI (Normalized Mutual Information)
NMI takes a value between 0 (no mutual information) and 1
(perfect correlation)
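NMI can be computed as below. This sketch uses arithmetic-mean normalization of the two entropies, one common convention (e.g. scikit-learn's default); it is not necessarily the exact variant used in the paper's experiments.

```python
import math
from collections import Counter

def nmi(labels_a, labels_b):
    # Mutual information between two label assignments, normalized by
    # the arithmetic mean of their entropies, so the result lies in
    # [0, 1]: 0 = no shared information, 1 = identical partitions.
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))
    mi = sum(c / n * math.log(c * n / (ca[a] * cb[b]))
             for (a, b), c in joint.items())
    ent = lambda cnt: -sum(c / n * math.log(c / n) for c in cnt.values())
    denom = (ent(ca) + ent(cb)) / 2
    return mi / denom if denom else 1.0
```

Note that NMI is invariant to relabeling: swapping the cluster ids of a perfect clustering still yields 1.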
26
Dataset     |X|    |Y|      # of edges
DBLP        20     5,693    95,516
Yahoo-msg   57     100,001  6,359,436
Evaluation Experiments:
Running Time Analysis
27
Running times of each algorithm
(𝐾 = 4, 𝜖 = 10⁻⁹) and (𝐾 = 10, 𝜖 = 10⁻⁹)
Evaluation Experiments:
Parameter Analysis (1/2)
28
Running times by varying 𝜖
(DBLP and Yahoo-msg)
Evaluation Experiments:
Parameter Analysis (2/2)
29
Running times by varying the number of clusters 𝐾
Yahoo-msg
Evaluation Experiments:
Clustering Accuracy Analysis
NMI scores of the proposed method by varying 𝜖
30
Compared with the original RankClus results
Conclusion
31
Conclusion
• Main approach
- Focus on the dynamic graph property of RankClus and
compute only evolving nodes and their neighbors
• Evaluation results
- Confirmed that our proposed method obtains clusters almost
twice as fast as the competitive method while maintaining
clustering accuracy on two real-world datasets
32
We proposed an efficient RankClus algorithm
for large-scale bi-type information networks
More Related Content

What's hot

Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based ClusteringSSA KPI
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to RankBhaskar Mitra
 
Nonlinear dimension reduction
Nonlinear dimension reductionNonlinear dimension reduction
Nonlinear dimension reductionYan Xu
 
K means clustering
K means clusteringK means clustering
K means clusteringkeshav goyal
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewVahid Mirjalili
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXBenjamin Bengfort
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithmDarshak Mehta
 
K-Means Algorithm Implementation In python
K-Means Algorithm Implementation In pythonK-Means Algorithm Implementation In python
K-Means Algorithm Implementation In pythonAfzal Ahmad
 
Clustering introduction
Clustering introductionClustering introduction
Clustering introductionYan Xu
 
Clustering on database systems rkm
Clustering on database systems rkmClustering on database systems rkm
Clustering on database systems rkmVahid Mirjalili
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...Daiki Tanaka
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16MLconf
 
K means Clustering
K means ClusteringK means Clustering
K means ClusteringEdureka!
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford MapR Technologies
 
K-Means clustring @jax
K-Means clustring @jaxK-Means clustring @jax
K-Means clustring @jaxAjay Iet
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for SearchBhaskar Mitra
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methodsKrish_ver2
 

What's hot (20)

Graph Based Clustering
Graph Based ClusteringGraph Based Clustering
Graph Based Clustering
 
Neural Learning to Rank
Neural Learning to RankNeural Learning to Rank
Neural Learning to Rank
 
Nonlinear dimension reduction
Nonlinear dimension reductionNonlinear dimension reduction
Nonlinear dimension reduction
 
K means clustering
K means clusteringK means clustering
K means clustering
 
Large Scale Data Clustering: an overview
Large Scale Data Clustering: an overviewLarge Scale Data Clustering: an overview
Large Scale Data Clustering: an overview
 
Graph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkXGraph Analyses with Python and NetworkX
Graph Analyses with Python and NetworkX
 
K means clustering algorithm
K means clustering algorithmK means clustering algorithm
K means clustering algorithm
 
K-Means Algorithm Implementation In python
K-Means Algorithm Implementation In pythonK-Means Algorithm Implementation In python
K-Means Algorithm Implementation In python
 
Clustering introduction
Clustering introductionClustering introduction
Clustering introduction
 
Neural Network Part-2
Neural Network Part-2Neural Network Part-2
Neural Network Part-2
 
Clustering on database systems rkm
Clustering on database systems rkmClustering on database systems rkm
Clustering on database systems rkm
 
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
[Paper reading] L-SHAPLEY AND C-SHAPLEY: EFFICIENT MODEL INTERPRETATION FOR S...
 
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
Sergei Vassilvitskii, Research Scientist, Google at MLconf NYC - 4/15/16
 
K means Clustering
K means ClusteringK means Clustering
K means Clustering
 
Lect4
Lect4Lect4
Lect4
 
Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford Fast Single-pass K-means Clusterting at Oxford
Fast Single-pass K-means Clusterting at Oxford
 
K-Means clustring @jax
K-Means clustring @jaxK-Means clustring @jax
K-Means clustring @jax
 
Kmeans
KmeansKmeans
Kmeans
 
Deep Learning for Search
Deep Learning for SearchDeep Learning for Search
Deep Learning for Search
 
3.2 partitioning methods
3.2 partitioning methods3.2 partitioning methods
3.2 partitioning methods
 

Similar to Iiwas19 yamazaki slide

background.pptx
background.pptxbackground.pptx
background.pptxKabileshCm
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Experfy
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
Dueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learningDueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learningTaehoon Kim
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화NAVER Engineering
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint febimu409
 
Thesis Presentation on Energy Efficiency Improvement in Data Centers
Thesis Presentation on Energy Efficiency Improvement in Data CentersThesis Presentation on Energy Efficiency Improvement in Data Centers
Thesis Presentation on Energy Efficiency Improvement in Data CentersMonica Vitali
 
Winning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and TensorsWinning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and TensorsKonstantinos Pelechrinis
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learningmilad abbasi
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep LearningMehrnaz Faraz
 
Landmark Retrieval & Recognition
Landmark Retrieval & RecognitionLandmark Retrieval & Recognition
Landmark Retrieval & Recognitionkenluck2001
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...Emanuel Lacić
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...KamleshKumar394
 
` Traffic Classification based on Machine Learning
` Traffic Classification based on Machine Learning ` Traffic Classification based on Machine Learning
` Traffic Classification based on Machine Learning butest
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...MLconf
 

Similar to Iiwas19 yamazaki slide (20)

background.pptx
background.pptxbackground.pptx
background.pptx
 
Unsupervised Learning: Clustering
Unsupervised Learning: Clustering Unsupervised Learning: Clustering
Unsupervised Learning: Clustering
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Machine learning meetup
Machine learning meetupMachine learning meetup
Machine learning meetup
 
230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx230727_HB_JointJournalClub.pptx
230727_HB_JointJournalClub.pptx
 
Dueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learningDueling network architectures for deep reinforcement learning
Dueling network architectures for deep reinforcement learning
 
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
대용량 데이터 분석을 위한 병렬 Clustering 알고리즘 최적화
 
Data mining with weka
Data mining with wekaData mining with weka
Data mining with weka
 
powerpoint feb
powerpoint febpowerpoint feb
powerpoint feb
 
Vi sem
Vi semVi sem
Vi sem
 
Thesis Presentation on Energy Efficiency Improvement in Data Centers
Thesis Presentation on Energy Efficiency Improvement in Data CentersThesis Presentation on Energy Efficiency Improvement in Data Centers
Thesis Presentation on Energy Efficiency Improvement in Data Centers
 
Winning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and TensorsWinning in Basketball with Data, Networks and Tensors
Winning in Basketball with Data, Networks and Tensors
 
An Introduction to Deep Learning
An Introduction to Deep LearningAn Introduction to Deep Learning
An Introduction to Deep Learning
 
Introduction to Deep Learning
Introduction to Deep LearningIntroduction to Deep Learning
Introduction to Deep Learning
 
Neural Networks made easy
Neural Networks made easyNeural Networks made easy
Neural Networks made easy
 
Landmark Retrieval & Recognition
Landmark Retrieval & RecognitionLandmark Retrieval & Recognition
Landmark Retrieval & Recognition
 
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
[AFEL] Neighborhood Troubles: On the Value of User Pre-Filtering To Speed Up ...
 
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
A Comprehensive Study of Clustering Algorithms for Big Data Mining with MapRe...
 
` Traffic Classification based on Machine Learning
` Traffic Classification based on Machine Learning ` Traffic Classification based on Machine Learning
` Traffic Classification based on Machine Learning
 
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
Hanjun Dai, PhD Student, School of Computational Science and Engineering, Geo...
 

Recently uploaded

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...Product School
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2DianaGray10
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka DoktorováCzechDreamin
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomCzechDreamin
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaCzechDreamin
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationZilliz
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxAbida Shariff
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonDianaGray10
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Product School
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...Product School
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...UiPathCommunity
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekCzechDreamin
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...CzechDreamin
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Alison B. Lowndes
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...Product School
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfCheryl Hung
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupCatarinaPereira64715
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...Sri Ambati
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Product School
 

Recently uploaded (20)

From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 
UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2UiPath Test Automation using UiPath Test Suite series, part 2
UiPath Test Automation using UiPath Test Suite series, part 2
 
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová10 Differences between Sales Cloud and CPQ, Blanka Doktorová
10 Differences between Sales Cloud and CPQ, Blanka Doktorová
 
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone KomSalesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
Salesforce Adoption – Metrics, Methods, and Motivation, Antone Kom
 
Powerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara LaskowskaPowerful Start- the Key to Project Success, Barbara Laskowska
Powerful Start- the Key to Project Success, Barbara Laskowska
 
Introduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG EvaluationIntroduction to Open Source RAG and RAG Evaluation
Introduction to Open Source RAG and RAG Evaluation
 
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptxIOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
IOS-PENTESTING-BEGINNERS-PRACTICAL-GUIDE-.pptx
 
Assuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyesAssuring Contact Center Experiences for Your Customers With ThousandEyes
Assuring Contact Center Experiences for Your Customers With ThousandEyes
 
Connector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a buttonConnector Corner: Automate dynamic content and events by pushing a button
Connector Corner: Automate dynamic content and events by pushing a button
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
Dev Dives: Train smarter, not harder – active learning and UiPath LLMs for do...
 
AI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří KarpíšekAI revolution and Salesforce, Jiří Karpíšek
AI revolution and Salesforce, Jiří Karpíšek
 
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
Behind the Scenes From the Manager's Chair: Decoding the Secrets of Successfu...
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
From Siloed Products to Connected Ecosystem: Building a Sustainable and Scala...
 
Key Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdfKey Trends Shaping the Future of Infrastructure.pdf
Key Trends Shaping the Future of Infrastructure.pdf
 
ODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User GroupODC, Data Fabric and Architecture User Group
ODC, Data Fabric and Architecture User Group
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 

Iiwas19 yamazaki slide

  • 1. Fast RankClus Algorithm via Dynamic Rank Score Tracking on Bi-type Information Networks iiWAS 2019 Kotaro Yamazaki✝ , Shohei Matsugu✝ Hiroaki Shiokawa✝✝, Hiroyuki Kitagawa✝✝ ✝: Graduate School of SIE, University of Tsukuba ✝✝: Center for Computational Sciences, University of Tsukuba
  • 3. Background: Ubiquitous Information Networks • Information network: Each node represents an entity and each link(edge) a relationship between entities • Homogeneous vs. Heterogeneous networks - Homogeneous Network Single type Object E.g., Co-author network, Web pages, Friendship network Most Current studies are on homogeneous network - Heterogeneous Network Objects belong to several types E.g., Conference-author network, Medical networks Most real system can be modeled as heterogeneous network 3
  • 4. Background: RankClus [Sun et al.,EDBT’09] Ranking-based Clustering algorithm for Heterogeneous Network The Methodology • Ranking as the features of clusters • Clustering so that each node has the highest rank score. • Repeat and improve the quality of clustering and ranking mutually 4 Heterogeneous Network RankClus Framework with rank scores
  • 5. Background: Bottleneck of RankClus Consumes much computational cost in ranking process 5 Why? • Generate subgraphs as many as the number of clusters • Iteratively updates rank scores for all nodesClustering Initialization Ranking Repeat
  • 6. •Pruning-RankClus[Yamazaki et, al. iiWAS’18] The fast RankClus algorithm by pruning nodes - Approach Specify nodes that not significantly affect clustering result and prune them - Bottleneck Difficult to prune while maintaining accuracy Needs to set many user-specific parameters 6 Background: Related Work
  • 7. Background: Our Goal Reduce the computational cost of RankClus 7 Local update Approach Focus on the dynamic graph property of RankClus and compute only evolving nodes and their neighbors efficiently
  • 8. Background: Our Contributions 1. Efficient Our proposed method outperforms RankClus and the state- of-the-art(Pruning-RankClus) algorithm 2. Highly Accurate Although our proposed algorithm does not compute the entire graph, its clustering results are more accurate than those of the state-of-the-art algorithm 3. Easy to deploy Our proposed method requires fewer user- specified parameters than the state-of-the-art algorithm 8
  • 10. Preliminary: Data Model: Bi-type Information Network • A graph consisting of two kinds of nodes, X and Y. E.g., a conference-author network - Links can exist between a conference (X) and an author (Y), and between an author (Y) and an author (Y) • Target type (X): the target of clustering • Attribute type (Y): support information for clustering
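As a concrete illustration of this data model, the following minimal sketch stores a bi-type information network as two adjacency maps, one for X-Y links and one for Y-Y links. The class and method names are illustrative, not from the paper:

```python
from collections import defaultdict

class BiTypeNetwork:
    """A bi-type information network: target-type nodes X, attribute-type
    nodes Y, with X-Y links and Y-Y links stored as adjacency sets."""

    def __init__(self):
        self.xy = defaultdict(set)  # target node -> attribute neighbors
        self.yy = defaultdict(set)  # attribute node -> attribute neighbors

    def add_xy(self, x, y):
        self.xy[x].add(y)

    def add_yy(self, y1, y2):
        self.yy[y1].add(y2)
        self.yy[y2].add(y1)

# A tiny conference-author example: conferences are X, authors are Y.
g = BiTypeNetwork()
g.add_xy("EDBT", "author_a")
g.add_xy("KDD", "author_b")
g.add_yy("author_a", "author_b")  # co-authorship (author-author) link
```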
  • 11. Preliminary: Algorithm Framework - Overview Input: • Bi-type information network 𝔾 • Cluster number K (e.g., K = 2) • Ranking function f Output: clusters of target-type nodes with rank scores. Step 0: Initialization → Step 1: Ranking → Step 2: Clustering → Repeat with the new rank scores
  • 12. Preliminary: Step 0: Partition and Construct Subgraphs 1. Partition the target-type nodes into K clusters 2. Construct subgraphs based on their clusters
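Step 0 can be sketched as below. This is a hedged illustration, assuming a random initial partition (a common choice; the paper may initialize differently), and the function names are made up for this sketch:

```python
import random

def initialize_clusters(targets, K, seed=0):
    """Step 0, part 1: randomly partition target-type nodes into K clusters."""
    rng = random.Random(seed)
    clusters = [[] for _ in range(K)]
    for x in targets:
        clusters[rng.randrange(K)].append(x)
    return clusters

def build_subgraph(cluster, xy_edges):
    """Step 0, part 2: keep only X-Y edges whose target endpoint
    belongs to the given cluster."""
    members = set(cluster)
    return [(x, y) for (x, y) in xy_edges if x in members]
```

One subgraph is built per cluster, which is exactly why the ranking cost grows with K in the original algorithm.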
  • 13. Preliminary: Step 1: Ranking for Each Subgraph Compute rank scores for all nodes of each type with a ranking function
  • 14. Preliminary: Step 2: Clustering for Each Target Node • Considers the rank scores of the attribute-type nodes as cluster features • Estimates the posterior probability that each target node belongs to a cluster, and re-assigns the node to a new cluster
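A deliberately simplified sketch of Step 2 follows. It is not the paper's exact mixture-model/EM procedure: here each target node is scored against every cluster by summing the rank scores of its attribute-type neighbors, the scores are normalized into a posterior-like distribution, and the node is re-assigned to the best cluster. All names are illustrative:

```python
def assign_clusters(target_nbrs, cluster_ranks):
    """target_nbrs: target node -> list of attribute-type neighbors.
    cluster_ranks: one dict per cluster mapping attribute node -> rank score.
    Returns: target node -> index of the cluster with the highest posterior."""
    assignment = {}
    for x, ys in target_nbrs.items():
        # Cluster features: accumulated attribute-type rank scores.
        scores = [sum(rk.get(y, 0.0) for y in ys) for rk in cluster_ranks]
        total = sum(scores) or 1.0
        posterior = [s / total for s in scores]  # normalize to a distribution
        assignment[x] = max(range(len(posterior)), key=posterior.__getitem__)
    return assignment
```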
  • 15. Preliminary: Key Observations About subgraph 𝔾ᵢ in each iteration: 1. New nodes and edges are inserted into 𝔾ᵢ 2. Several nodes and edges are removed from 𝔾ᵢ → Each subgraph can be regarded as a dynamic graph evolving over times t, t+1, t+2, …
  • 17. Proposed Method: Problem Setting • Given at initial time t = 0: subgraphs 𝔾(0) Problem at time t = 0: 1. Initialization 2. Ranking (ranking function: Personalized PageRank) 3. Clustering • Given at each time t: nodes and edges are inserted into / removed from each subgraph Problem at time t: 1. Compute approximate rank scores 2. Clustering
  • 18. Proposed Method: Overview Adopt a dynamic PPR computation based on the Gauss-Southwell method [Ohsaka et al., KDD’15] Main Idea 1. Each subgraph can be regarded as a dynamic graph 2. The previous rank score is a GOOD initial rank score 3. The approximate rank score only needs to be improved locally [Ohsaka et al., KDD’15] Naoto Ohsaka, Takanori Maehara, and Ken-ichi Kawarabayashi. 2015. Efficient PageRank Tracking in Evolving Networks (KDD ’15). 875–884.
  • 19. Proposed Method: Gauss-Southwell Method [Southwell ’40, ’46] • Let r_PPR(v) be the v-th rank score vector of 𝔾ᵢ • The corresponding residual is d(v) = (1 − α)b − (I − αP) r_PPR(v) • Goal: drive the residual d(v) → 0 • While the largest residual is ≥ ε, pick the node i with the largest residual, propagate α·d(i)/deg(i) to its out-neighbors, and update r and d; stop once every residual is < ε
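The push iteration above can be sketched as follows. This is a hedged illustration, not the paper's code: it assumes the common push convention in which selecting node i adds d(i) to its score and sends α·d(i)/deg(i) to each of its out-neighbors, and the function name is made up:

```python
def gauss_southwell_ppr(neighbors, b, alpha=0.85, eps=1e-9, max_pushes=100000):
    """Solve (I - alpha*P) r = (1 - alpha) b by Gauss-Southwell pushes.
    neighbors: node -> list of out-neighbors; b: personalization vector."""
    r = {v: 0.0 for v in neighbors}                       # approximate scores
    d = {v: (1.0 - alpha) * b.get(v, 0.0) for v in neighbors}  # residuals
    for _ in range(max_pushes):
        i = max(d, key=d.get)      # node with the largest residual
        if d[i] < eps:
            break                  # converged: all residuals below eps
        r[i] += d[i]               # absorb the residual into the score
        deg = len(neighbors[i])
        if deg:
            share = alpha * d[i] / deg   # propagate alpha * d(i) / deg(i)
            for j in neighbors[i]:
                d[j] += share
        d[i] = 0.0
    return r, d
```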
  • 20. Proposed Method: Dynamic Rank Score Tracking • Iteration number: t • r_PPR^(t)(v = 0) = r_PPR^(t−1) • d^(t)(v = 0) = d^(t−1) + α (P^(t) − P^(t−1)) r_PPR^(t−1) • Compute the approximate rank score by the Gauss-Southwell algorithm • At time t: the cluster is updated (a node is added)
  • 21. Proposed Method: Dynamic Rank Score Tracking • Iteration number: t • r_PPR^(t)(v = 0) = r_PPR^(t−1) • d^(t)(v = 0) = d^(t−1) + α (P^(t) − P^(t−1)) r_PPR^(t−1) • Compute the approximate rank score by the Gauss-Southwell algorithm • v = 0: compute the initial solution
  • 22. Proposed Method: Dynamic Rank Score Tracking • Iteration number: t • r_PPR^(t)(v = 0) = r_PPR^(t−1) • d^(t)(v = 0) = d^(t−1) + α (P^(t) − P^(t−1)) r_PPR^(t−1) • Compute the approximate rank score by the Gauss-Southwell algorithm • v = 1: pick the node with the largest residual, propagate it, and update r and d
  • 23. Proposed Method: Dynamic Rank Score Tracking • Iteration number: t • r_PPR^(t)(v = 0) = r_PPR^(t−1) • d^(t)(v = 0) = d^(t−1) + α (P^(t) − P^(t−1)) r_PPR^(t−1) • Compute the approximate rank score by the Gauss-Southwell algorithm • v = 2: iterate until the largest residual is less than ε
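The warm-start step d^(t)(0) = d^(t−1) + α (P^(t) − P^(t−1)) r^(t−1) can be sketched as below. This is a hedged illustration under the push convention of the previous sketch (P's column for node i spreads 1/deg(i) over its out-neighbors), so only columns whose adjacency lists changed need correcting; the function name is made up. Note that after such a correction residuals can be negative, so a production solver would pick the largest residual in absolute value:

```python
def warm_start_residual(r_prev, d_prev, old_nbrs, new_nbrs, alpha=0.85):
    """Correct the residual vector for adjacency changes between time t-1
    and time t, reusing r_prev as the warm-start score vector."""
    d = dict(d_prev)
    # Only nodes whose out-neighbor lists changed contribute to (P_t - P_{t-1}).
    changed = {v for v in set(old_nbrs) | set(new_nbrs)
               if old_nbrs.get(v, []) != new_nbrs.get(v, [])}
    for i in changed:
        ri = r_prev.get(i, 0.0)
        old, new = old_nbrs.get(i, []), new_nbrs.get(i, [])
        for j in old:   # subtract node i's old column of alpha * P
            d[j] = d.get(j, 0.0) - alpha * ri / len(old)
        for j in new:   # add node i's new column of alpha * P
            d[j] = d.get(j, 0.0) + alpha * ri / len(new)
    return d
```

Gauss-Southwell pushes then resume from (r_prev, d), touching only the neighborhood of the change instead of the whole subgraph.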
  • 25. Evaluation Experiments: Setup • Datasets
      Dataset name | |X| | |Y|     | # of edges
      DBLP         | 20  | 5,693   | 95,516
      Yahoo-msg    | 57  | 100,001 | 6,359,436
  • Algorithms - Proposed method: reduces the cost by updating locally - RankClus: the original algorithm - Pruning: reduces the cost by pruning [Yamazaki et al., iiWAS’18] [Yamazaki et al., JDI’19] • Evaluation criterion - NMI (Normalized Mutual Information), which takes a value between 0 (no mutual information) and 1 (perfect correlation)
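For reference, NMI can be computed from two label lists as in the minimal sketch below (in practice a library routine such as scikit-learn's `normalized_mutual_info_score` would be used; this hand-rolled version uses the geometric-mean normalization):

```python
from collections import Counter
from math import log

def nmi(labels_a, labels_b):
    """Normalized mutual information between two clusterings,
    given as equal-length label lists."""
    n = len(labels_a)
    ca, cb = Counter(labels_a), Counter(labels_b)
    cab = Counter(zip(labels_a, labels_b))
    # Mutual information between the two label assignments.
    mi = sum(c / n * log(n * c / (ca[a] * cb[b]))
             for (a, b), c in cab.items())
    # Entropies of each clustering, for normalization.
    ha = -sum(c / n * log(c / n) for c in ca.values())
    hb = -sum(c / n * log(c / n) for c in cb.values())
    denom = (ha * hb) ** 0.5
    return mi / denom if denom else 1.0
```

Identical clusterings (up to relabeling) give 1, and independent ones give 0, matching the criterion described on the slide.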
  • 26. Evaluation Experiments: Running Time Analysis Running times of each algorithm (K = 4 and K = 10, ε = 10^−9)
  • 27. Evaluation Experiments: Parameter Analysis (1/2) Running times obtained by varying ε (DBLP and Yahoo-msg)
  • 28. Evaluation Experiments: Parameter Analysis (2/2) Running times obtained by varying the number of clusters K (Yahoo-msg)
  • 29. Evaluation Experiments: Clustering Accuracy Analysis NMI scores of the proposed method obtained by varying ε, compared with the original RankClus results
  • 31. Conclusion We proposed an efficient RankClus algorithm for large-scale bi-type information networks • Main approach - Focus on the dynamic-graph property of RankClus and compute only evolving nodes and their neighbors • Evaluation results - Our proposed method obtains clusters almost twice as fast as the competitive method while keeping the clustering accuracy on two real-world datasets

Editor's Notes

  1. Hello, everyone. I’m Kotaro Yamazaki from the University of Tsukuba, in Japan. I’d like to talk about our paper “Fast RankClus Algorithm via Dynamic Rank Score Tracking on Bi-Type Information Networks”.
  2. First, I introduce the background.
  3. Information networks, which consist of nodes and edges, are ubiquitous. There are two types: homogeneous networks and heterogeneous networks. A homogeneous network consists of objects and links of a single type, such as a collaboration network or the Web pages. Most current studies target homogeneous networks. In contrast, a heterogeneous network contains different types of objects, such as a conference-author network or a medical network. Most real systems contain multi-typed interacting components, and we can model them as heterogeneous information networks. It is therefore important to analyze heterogeneous networks.
  4. RankClus is a novel graph-mining framework for heterogeneous networks. It achieves clustering based on ranking results, so that each node has the highest rank score. Compared with general graph clustering algorithms, it focuses on the importance of nodes within each cluster.
  5. However, in terms of efficiency, the bottleneck of RankClus is the computational cost of the ranking procedure. The reason is that RankClus needs to generate as many subgraphs as the number of clusters, and it must iteratively perform the ranking procedure for all nodes in all subgraphs.
  6. Pruning-RankClus is an efficient RankClus algorithm. Its approach is to specify nodes that do not significantly affect the clustering result and prune them. However, Pruning-RankClus has bottlenecks. First, it is difficult to prune while maintaining accuracy. Second, it requires many user-specified parameters.
  7. To overcome the performance limitation of RankClus, we present an efficient algorithm for speeding up RankClus on large-scale heterogeneous networks. We focus on the dynamic-graph property of RankClus and efficiently compute only evolving nodes and their neighbors.
  8. Our contributions are as follows. Efficient: our proposed method outperforms RankClus and the state-of-the-art algorithm. Highly accurate: although our proposed algorithm does not compute the entire graph, its clustering results are more accurate than those of the state-of-the-art algorithm. Easy to deploy: our proposed method requires fewer user-specified parameters than the state-of-the-art algorithm.
  9. Next, I’d like to talk about the preliminaries.
  10. First, I briefly introduce the data model of RankClus. RankClus takes a bi-type information network, which is one kind of heterogeneous network. A bi-type information network has two kinds of nodes, X and Y. X nodes are defined as target-type nodes, and Y nodes are defined as attribute-type nodes. RankClus performs clustering only on the target-type nodes; the attribute-type nodes are used as support information for the clustering.
  11. This is an overview of the RankClus algorithm framework. RankClus is an iterative method. As input, it takes a bi-type information network G, a cluster number K, and a ranking function f. As output, it produces clusters together with rank scores for each type of node in each subgraph. Next, I will explain the details of each step.
  12. In the initialization step, RankClus partitions the target-type nodes into K clusters. Then, K subgraphs are constructed based on those clusters.
  13. In the ranking step, RankClus computes rank scores for each node type with a ranking function. Note that obtaining the rank scores is computationally expensive because they must be computed for all nodes in each subgraph.
  14. The next step is the clustering step. First, RankClus considers the rank scores of the attribute-type nodes as cluster features and estimates the posterior probability that each target node belongs to a cluster. Finally, each target-type node is re-assigned to the nearest cluster.
  15. Our key observation about RankClus is that each subgraph can be regarded as a dynamic graph. When we focus on one subgraph, it behaves like a dynamic graph across iterations.
  16. In the next part, I’d like to talk about our proposed method.
  17. Let’s begin with our problem setting. At the beginning, we are given initial subgraphs G(0), as many as the cluster number K. At time 0, we perform ranking and clustering exactly as the original RankClus algorithm does, using Personalized PageRank. For each subsequent time t, because the clusters are updated, nodes and edges are inserted into or removed from each subgraph. At time t, we exploit this dynamic-graph property to reduce the computational complexity and compute the approximate rank scores.
  18. This is the overview of our proposed method to reduce the computational cost of RankClus. In this approach, we adopt a dynamic PPR computation based on the Gauss-Southwell method. The main idea is very simple. First, each subgraph can be regarded as a dynamic graph. Second, the previous rank score is considered a good initial rank score for obtaining the current rank score. Third, we only need to improve the approximate solution locally, because if the change in the graph is small, we need to update only a few nodes, not all of them.
  19. I first introduce the Gauss-Southwell method, an iterative method for solving Personalized PageRank. The method maintains two vectors: the approximate rank scores r and the residuals d corresponding to those scores. The goal is to drive d to nearly 0. At each iteration, the method picks the node with the largest residual. If that residual is greater than epsilon, it locally updates the approximate rank score r and the residuals d. The figure shows an example of this updating process: if the method picks node i, the residual is propagated from i to its out-neighbors. The process is iterated until the largest residual is less than epsilon.
  20. So now we describe our proposed method with an example. We take one of the subgraphs, to which one node is added at time t.
  21. First, based on our idea, we use the previous rank scores as the initial rank scores at time t. The red nodes have residuals larger than epsilon.
  22. Then we apply the Gauss-Southwell method. It picks the node with the largest residual; that residual is propagated to the out-neighbors, and r and d are updated at the same time.
  23. The computation is iterated until the largest residual is less than epsilon. That is the flow of our proposed method.
  25. In the next part, I’d like to talk about the evaluation experiments.
  26. We evaluate our proposed method on two real datasets, DBLP and Yahoo-msg. The performance of our proposed method is compared with the original RankClus algorithm and the state-of-the-art method that reduces the cost by pruning. To evaluate the accuracy of the clustering results, we employed NMI, an information-theoretic measurement that assesses clustering accuracy by comparing clustering results. NMI takes a value between 0 and 1; it returns 1 if the two clusterings are completely the same.
  27. This is the result for the running times of each algorithm. As we can see from the result, our proposed algorithm outperforms the other algorithms. Specifically, our proposed algorithm is up to twice as fast as the other algorithms, which suggests that our dynamic rank score tracking method can cut the computational cost.
  28. Next, we assessed the effect of the user-specified parameter ε in our proposed method by comparing running times while varying ε. These figures show the running times on DBLP and Yahoo-msg, respectively, with ε varied from 10^−9 to 10^−2. As we can see from these figures, our proposed method gradually reduces the running time as ε increases, because our dynamic rank score tracking method only needs to update residuals and scores until the largest residual drops below ε.
  29. We assessed the impact of the number of clusters K on the running time. This figure shows the runtimes when K was varied on the Yahoo-msg dataset. As we can see from this result, the speed-up ratio of our proposed algorithm increases as K increases. That is because each subgraph does not change drastically when K is large. In particular, if a subgraph has no updates, our proposed algorithm can skip the ranking process for that subgraph, while the other algorithms still need to perform it. Thus, we can further improve the efficiency for larger K settings.
  30. Finally, we assessed the accuracy of the clustering results produced by the proposed algorithm. In this evaluation, we measured the NMI scores between the clusters extracted by our proposed method and those of RankClus, varying ε from 10^−9 to 10^−2. The results show that our proposed method achieves high NMI values for all ε settings, even though it drastically reduces the running time compared to RankClus.
  31. Finally, let me conclude my talk.
  32. In this study, we proposed an efficient RankClus algorithm for large-scale bi-type information networks. The main approach is to reduce the cost by focusing on the dynamic-graph property and computing only evolving nodes and their neighbors. The evaluation experiments showed that our proposed method can obtain clusters almost twice as fast as the competitive method while keeping the clustering accuracy.