1 / 45 KYOTO UNIVERSITY
Metric Recovery from Unweighted k-NN Graphs
Ryoma Sato
2 / 45 KYOTO UNIVERSITY
I introduce my favorite topic and its applications

Metric recovery from unweighted k-NN graphs is my
recent favorite technique.
I like this technique because
The scope of applications is broad, and
The results are simple but non-trivial.

I first introduce this problem.

I then introduce my recent projects that used this technique.
- Towards Principled User-side Recommender Systems (CIKM 2022)
- Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure (ICML 2023)
3 / 45 KYOTO UNIVERSITY
Metric Recovery from Unweighted k-NN Graphs
Morteza Alamgir, Ulrike von Luxburg. Shortest path distance in random k-nearest neighbor graphs. ICML 2012.
Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola. Metric recovery from directed unweighted graphs. AISTATS 2015.
4 / 45 KYOTO UNIVERSITY
k-NN graph is generated from a point cloud

We generate a k-NN graph from a point cloud.

Then, we discard the coordinates of nodes.
(Figure: generate edges from the point cloud, then discard the coordinates. The nodes are drawn with coordinates only for visualization; those coordinates are random.)
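For concreteness, here is a minimal sketch of this setup in Python (NumPy and scikit-learn assumed; the sample size, dimension, and k are arbitrary choices for illustration):

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 2))             # latent coordinates of the point cloud
A = kneighbors_graph(X, n_neighbors=10)   # sparse 01-adjacency of the 10-NN graph
A = np.asarray(A.todense())               # only this matrix is kept
del X                                     # discard the coordinates
```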
5 / 45 KYOTO UNIVERSITY
Metric recovery asks to estimate the coordinates

The original coordinates are hidden now.

Metric recovery from unweighted k-NN graphs is a problem
of estimating the coordinates from the k-NN graph.
6 / 45 KYOTO UNIVERSITY
Only the existence of edges is observable

Unweighted means that the edge lengths are not available either.

This is equivalent to the setting where only the 01-adjacency
matrix of the k-NN graph is available.
7 / 45 KYOTO UNIVERSITY
Given 01-adjacency, estimate the coordinates

Problem (Metric Recovery from Unweighted k-NN Graphs)
In: The 01-adjacency matrix of a k-NN graph
Out: The latent coordinates of the nodes

Very simple.
8 / 45 KYOTO UNIVERSITY
Why Is This Problem Challenging?
9 / 45 KYOTO UNIVERSITY
Standard node embedding methods fail

The type of this problem is node embedding.
I.e., In: graph, Out: node embeddings.

However, the following example shows that standard embedding
techniques fail.
10 / 45 KYOTO UNIVERSITY
Distance is opposite in the graph and latent space

The shortest-path distance between nodes A and B is 21.
The shortest-path distance between nodes A and C is 18.

Standard node embedding methods would embed node C
closer to A than node B to A, which is not consistent with
the ground truth latent coordinates.
(10-NN graph. The coordinates are supposed to be hidden, but are shown here for illustration.)
11 / 45 KYOTO UNIVERSITY
Critical assumption does not hold

The critical assumption of various embedding methods is that
nodes that are close in the input graph should be embedded close to each other.

This assumption does NOT hold in our situation.
(Same 10-NN graph figure as before.)
12 / 45 KYOTO UNIVERSITY
Solution
13 / 45 KYOTO UNIVERSITY
Edge lengths are important

Why does the previous example fail?

If the edge lengths were taken into consideration,
the shortest-path distance would be a consistent estimator of
the latent distance.

Step 1: Estimate the latent edge lengths.
(Same 10-NN graph figure as before.)
14 / 45 KYOTO UNIVERSITY
Densities are important

Observation: Edges are longer in sparse regions
and shorter in dense regions.

Step 2: Estimate the densities.

But how? We do not know the coordinates of the points...
(Same 10-NN graph figure as before.)
15 / 45 KYOTO UNIVERSITY
Density can be estimated from PageRank

Solution: A PageRank-like estimator solves it.
The stationary distribution of random walks (plus a simple
transformation) is a consistent estimator of the density.

The higher the stationary probability, the denser the region.

This can be computed solely from the unweighted graph.
(10-NN graph; the stationary distribution of simple random walks ≈ PageRank.)
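As a rough illustration of this step (a sketch under simplifying assumptions, not the exact estimator from the cited papers), the stationary distribution can be computed by power iteration on the row-normalized adjacency matrix:

```python
import numpy as np

def stationary_distribution(A, n_iter=1000):
    """Power iteration for the stationary distribution of a simple random walk
    on the (directed) graph with 01-adjacency matrix A. Assumes the walk is
    irreducible and aperiodic so that the iteration converges."""
    P = A / A.sum(axis=1, keepdims=True)        # row-stochastic transition matrix
    pi = np.full(A.shape[0], 1.0 / A.shape[0])  # start from the uniform distribution
    for _ in range(n_iter):
        pi = pi @ P                             # one step of the walk, in distribution
    return pi  # higher mass ≈ denser region (up to a simple transformation)
```

Intuitively, points in dense regions are chosen as neighbors by many other points, so the walk visits them more often.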
16 / 45 KYOTO UNIVERSITY
Given 01-adjacency, estimate the coordinates

Problem definition (again)
In: The 01-adjacency matrix of a k-NN graph
Out: The latent coordinates of the nodes

Very simple.
17 / 45 KYOTO UNIVERSITY
Procedure to estimate the coordinates
1. Compute the stationary distribution of random walks.
2. Estimate the density around each node.
3. Estimate the edge lengths using the estimated densities.
4. Compute the shortest path distances using the estimated
edge lengths and compute the distance matrix.
5. Estimate the coordinates from the distance matrix
by, e.g., multidimensional scaling.

This is a consistent estimator, up to a rigid transform [Hashimoto+ AISTATS 2015].
Tatsunori Hashimoto, Yi Sun, Tommi Jaakkola. Metric recovery from directed unweighted graphs. AISTATS 2015.
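Here is a minimal end-to-end sketch of the five-step procedure above in Python (NumPy, NetworkX, and scikit-learn). The density proxy and the edge-length formula are simplifying assumptions chosen for illustration, not the exact estimators of [Hashimoto+ AISTATS 2015].

```python
import numpy as np
import networkx as nx
from sklearn.manifold import MDS

def recover_coordinates(A, k, d):
    """Rough sketch of metric recovery from the 01-adjacency matrix A
    of a k-NN graph over n points in d dimensions."""
    n = A.shape[0]
    G_dir = nx.from_numpy_array(A, create_using=nx.DiGraph)
    # 1. Stationary distribution of random walks (here via PageRank with
    #    almost no teleportation).
    pr = nx.pagerank(G_dir, alpha=0.999)
    pi = np.array([pr[i] for i in range(n)])
    # 2. Density estimate around each node. The slide says the stationary
    #    distribution plus a simple transformation is consistent; here we use
    #    pi itself as a monotone proxy (simplifying assumption).
    density = pi
    # 3. Edge-length estimate: in a k-NN graph the distance to the k-th
    #    neighbour scales like (k / (n * density))^(1/d).
    G = nx.from_numpy_array(A)  # undirected view for shortest paths
    for u, v in G.edges():
        rho = 0.5 * (density[u] + density[v])
        G[u][v]["length"] = (k / (n * rho)) ** (1.0 / d)
    # 4. All-pairs shortest-path distances with the estimated edge lengths.
    sp = dict(nx.all_pairs_dijkstra_path_length(G, weight="length"))
    dist = np.array([[sp[i][j] for j in range(n)] for i in range(n)])
    # 5. Multidimensional scaling: coordinates up to a rigid transform.
    return MDS(n_components=d, dissimilarity="precomputed").fit_transform(dist)
```

Combined with the k-NN construction shown earlier, recover_coordinates(A, k=10, d=2) should reproduce the point cloud up to a rigid transform and estimation error.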
18 / 45 KYOTO UNIVERSITY
We can recover the coordinates consistently
The latent coordinates can be consistently estimated
solely from the unweighted k-NN graph.
Take Home Message
19 / 45 KYOTO UNIVERSITY
Towards Principled User-side Recommender Systems (CIKM 2022)
Ryoma Sato. Towards Principled User-side Recommender Systems. CIKM 2022.
20 / 45 KYOTO UNIVERSITY
Let’s consider item-to-item recommendations

We consider item-to-item recommendations.

Ex: “Products related to this item” panel in Amazon.com.
21 / 45 KYOTO UNIVERSITY
User-side recsys realizes users’ desiderata

Problem: We are unsatisfied with the official recommender
system.

It provides monotonous recommendations.
We need serendipity.

It provides recommendations biased towards specific
companies or countries.

User-side recommender systems [Sato 2022] enable users
to build their own recommender systems that satisfy their
desiderata even when the official one does not support them.
Ryoma Sato. Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without
Log Data? SDM 2022.
22 / 45 KYOTO UNIVERSITY
We need powerful and principled user-side Recsys

[Sato 2022]’s user-side recommender system is realized in an
ad-hoc manner, and the performance is not so high.

We need a systematic and more powerful way to build user-side
recommender systems, hopefully one that is as strong as the official one.
Ryoma Sato. Private Recommender Systems: How Can Users Build Their Own Fair Recommender Systems without
Log Data? SDM 2022.
23 / 45 KYOTO UNIVERSITY
Official (traditional) recommender systems
(Diagram: the ingredients (log data, catalog, auxiliary data) feed the Recsys algorithm, which trains the Recsys model (Step 1: training); the model takes a source item and outputs recommendations (Step 2: inference).)
24 / 45 KYOTO UNIVERSITY
Users cannot see the data, algorithm, and model
(Same pipeline diagram.) The ingredients, the algorithm, and the Recsys model are not observable to users (industrial secrets).
25 / 45 KYOTO UNIVERSITY
How can we build our Recsys without them?
(Same pipeline diagram.) But these hidden parts are crucial information for building a new Recsys...
26 / 45 KYOTO UNIVERSITY
We assume the model is embedding-based
(Same pipeline diagram.)
(Slight) Assumption: the model embeds items and recommends near items.
This is a common strategy in Recsys.
We do not assume the way it embeds; it can be matrix factorization, neural networks, etc.
27 / 45 KYOTO UNIVERSITY
We can observe the k-NN graph of the embeddings
(Same pipeline diagram.)
Observation: the observable outputs have sufficient information to construct the unweighted k-NN graph.
I.e., users can build the k-NN graph by accessing each item page and observing what the neighboring items are.
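A minimal sketch of this observation. Here get_recommendations is a hypothetical accessor that stands for opening an item's page and reading its related-items panel; its name and signature are assumptions for illustration, not a real API.

```python
import numpy as np

def build_knn_adjacency(items, get_recommendations):
    """Build the 01-adjacency matrix of the official Recsys's k-NN graph
    from information that is visible to every user."""
    index = {item: i for i, item in enumerate(items)}    # items: the public catalog
    A = np.zeros((len(items), len(items)))
    for item in items:
        for neighbor in get_recommendations(item):       # related items shown on the page
            A[index[item], index[neighbor]] = 1.0
    return A  # feed this to the metric recovery procedure from the first part
```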
28 / 45 KYOTO UNIVERSITY
We can estimate the embeddings!
(Same pipeline diagram.)
Solution: estimate the item embeddings of the official Recsys.
They are considered to be secret, but we can estimate them from the unweighted k-NN graph!
They contain much information!
29 / 45 KYOTO UNIVERSITY
We realize our desiderata with the embeddings

We can do many things with the estimated embeddings.

We can compute recommendations by ourselves,
with our own post-processing.

If you want more serendipity,
recommend 1st, 2nd, 4th, 8th, ... and 32nd nearest items
or add noise to the embeddings.

If you want to decrease the bias to specific companies,
add negative biases to the score of these items so as to
suppress these companies.
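As one concrete example of such post-processing, here is a sketch of the serendipity rule above, assuming Z_hat is the matrix of estimated item embeddings (one row per item); the function is illustrative, not from the paper.

```python
import numpy as np

def serendipitous_recommendations(Z_hat, source, ranks=(1, 2, 4, 8, 16, 32)):
    """Recommend the 1st, 2nd, 4th, ..., 32nd nearest items to `source`
    (instead of the plain top-k) using the estimated embeddings Z_hat."""
    dist = np.linalg.norm(Z_hat - Z_hat[source], axis=1)
    order = np.argsort(dist)           # order[0] is the source item itself
    return [order[r] for r in ranks]   # order[r] is the r-th nearest item
```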
30 / 45 KYOTO UNIVERSITY
Experiments validated the theory

In the experiments,
I conducted simulations and showed that the hidden item embeddings can be estimated accurately.
I also built a fair Recsys for Twitter, which runs in the real world, on the user’s side.
Even though the official Recsys is not fair w.r.t. gender, mine is, and it is more efficient than the existing one.
31 / 45 KYOTO UNIVERSITY
Users can recover the item embeddings
Users can “reverse engineer” the official item
embeddings solely from the observable information.
Take Home Message
32 / 45 KYOTO UNIVERSITY
Graph Neural Networks can Recover the Hidden Features
Solely from the Graph Structure (ICML 2023)
Ryoma Sato. Graph Neural Networks can Recover the Hidden Features Solely from the Graph Structure. ICML 2023.
33 / 45 KYOTO UNIVERSITY
We call for a theory of GNNs

Graph Neural Networks (GNNs) take a graph with node
features as input and output node embeddings.

GNNs are a popular choice for various graph-related tasks.

GNNs are so popular that understanding them theoretically is
an important topic in its own right.
e.g., What is the hypothesis space of GNNs?
(GNNs do not have universal approximation power.)
Why do GNNs work well in so many tasks?
34 / 45 KYOTO UNIVERSITY
GNNs apply filters to node features

GNNs apply filters to the input node features and extract
useful features.

The input node features have long been considered
to be the key to success.
If the features have no useful signals, GNNs will not work.
35 / 45 KYOTO UNIVERSITY
Good node features are not always available

However, informative node features are not always available.

E.g., social network user information may be hidden for
privacy reasons.
36 / 45 KYOTO UNIVERSITY
Uninformative features degrade the performance

If we have no features at hand, we usually input
uninformative node features such as the degree features.

No matter how such features are filtered, only uninformative
embeddings are obtained.
“garbage in, garbage out.”
This is common sense.
37 / 45 KYOTO UNIVERSITY
Can GNNs work with uninformative node features?

Research question I want to answer in this project:
Do GNNs really not work when the input node features
are uninformative?

In practice, GNNs sometimes work just with degree features.
The reason is a mystery, which I want to elucidate.
38 / 45 KYOTO UNIVERSITY
We assume latent node features behind the graph

(Slight) Assumption:
The graph structure is formed by connecting nodes whose latent node features z*_v are close to each other.
 The latent node features z*_v are not observable.
e.g., a "true user preference vector": latent features that contain users’ preferences, workplace, residence, etc.
Those who have similar preferences and residence have connections.
We can only observe the way they are connected, not the coordinates.
39 / 45 KYOTO UNIVERSITY
GNNs can recover the latent features

Main results:
GNNs can recover the latent node features z*_v even when the input node features are uninformative.
 z*_v contains the preferences of users, which is useful for downstream tasks.
40 / 45 KYOTO UNIVERSITY
GNNs create useful node features themselves

GNNs can create completely new and useful node
features by absorbing information from the graph structure,
even when the input node features are uninformative.

A new perspective that overturns the existing view of filtering
input node features.
41 / 45 KYOTO UNIVERSITY
GNNs can recover the coordinates with some tricks

How to prove it?
→ Metric recovery from k-NN graphs as you may expect.

But be careful when you apply it.
What GNNs can do (the hypothesis space of GNNs) is limited.

The metric recovery algorithm is compatible with GNNs.
Stationary distribution → GNNs can do random walks.
Shortest path → GNNs can simulate Bellman-Ford.
MDS → This part is a bit tricky. We send the distance matrix to
some nodes and solve it locally.

GNNs can recover the metric with slight additional errors.
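To illustrate the Bellman-Ford point (a sketch of the idea, not the construction in the paper): message passing in which each node takes the minimum over its neighbors’ current estimates plus the estimated edge lengths computes shortest-path distances from a source node. The dense lengths matrix with np.inf for non-edges is an assumption of this sketch.

```python
import numpy as np

def bellman_ford_message_passing(lengths, source, n_layers):
    """Min-aggregation message passing (expressible by a GNN with suitable
    aggregators) that simulates Bellman-Ford. lengths[u, v] is the estimated
    length of edge (u, v), or np.inf if the edge is absent."""
    n = lengths.shape[0]
    h = np.full(n, np.inf)      # node feature = current distance estimate
    h[source] = 0.0
    for _ in range(n_layers):   # one message-passing layer per Bellman-Ford round
        # each node aggregates (neighbor's estimate + edge length) and keeps the minimum
        h = np.minimum(h, np.min(lengths + h[None, :], axis=1))
    return h  # after n - 1 layers, h holds shortest-path distances from `source`
```

Running this from every source node yields the distance matrix used in the MDS step.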
42 / 45 KYOTO UNIVERSITY
Recovered features are empirically useful

In the experiments,
We empirically confirmed
this phenomenon.
The recovered features are useful for various downstream tasks,
even when the input features x_syn are uninformative.
43 / 45 KYOTO UNIVERSITY
GNNs can create useful features by themselves
GNNs can create useful node features by absorbing
information from the underlying graph.
Take Home Message
44 / 45 KYOTO UNIVERSITY
Conclusion
45 / 45 KYOTO UNIVERSITY
I introduced my favorite topic and its applications

Metric recovery from unweighted k-NN graphs is my
recent favorite technique.
I like this technique because
The scope of applications is broad, and
The results are simple but non-trivial.
The latent coordinates can be consistently estimated
solely from the unweighted k-NN graph.
Take Home Message

More Related Content

Similar to Metric Recovery from Unweighted k-NN Graphs

Partial Object Detection in Inclined Weather Conditions (IRJET Journal)
Dynamic approach to k means clustering algorithm-2 (IAEME Publication)
DESIGN OF LOW POWER MULTIPLIER (IRJET Journal)
Machine Learning, K-means Algorithm Implementation with R (IRJET Journal)
IRJET- Generating 3D Models Using 3D Generative Adversarial Network (IRJET Journal)
Using Set Cover to Optimize a Large-Scale Low Latency Distributed Graph (Rui Wang)
IRJET- Identification of Scene Images using Convolutional Neural Networks - A... (IRJET Journal)
IRJET - Handwritten Bangla Digit Recognition using Capsule Network (IRJET Journal)
IRJET - Image Classification using CNN (IRJET Journal)
Al04605265270 (IJERA Editor)
Design a 3D CAD Model of a Stealth Aircraft and Generate Mesh to Optimize Mes... (IRJET Journal)
Dance With AI – An interactive dance learning platform (IRJET Journal)
IRJET- Jeevn-Net: Brain Tumor Segmentation using Cascaded U-Net & Overall... (IRJET Journal)
Qwertyui (Jamie Boyd)
IRJET- Advanced Control Strategies for Mold Level Process (IRJET Journal)
Graph Community Detection Algorithm for Distributed Memory Parallel Computing... (Alexander Pozdneev)
DEEP LEARNING BASED BRAIN STROKE DETECTION (IRJET Journal)
IRJET - Finger Vein Extraction and Authentication System for ATM (IRJET Journal)
SHORTEST PATH FINDING VISUALIZER (IRJET Journal)
ANALYSIS OF LUNG NODULE DETECTION AND STAGE CLASSIFICATION USING FASTER RCNN ... (IRJET Journal)


More from joisino

キャッシュオブリビアスアルゴリズム (Cache-Oblivious Algorithms)
Towards Principled User-side Recommender Systems
CLEAR: A Fully User-side Image Search System
Private Recommender Systems: How Can Users Build Their Own Fair Recommender S...
An Introduction to Spectral Graph Theory
Word Tour: One-dimensional Word Embeddings via the Traveling Salesman Problem...
最適輸送入門 (An Introduction to Optimal Transport)
ユーザーサイド情報検索システム (User-side Information Retrieval Systems)
最適輸送の解き方 (How to Solve Optimal Transport)
Random Features Strengthen Graph Neural Networks
Fast Unbalanced Optimal Transport on a Tree
グラフニューラルネットワークとグラフ組合せ問題 (Graph Neural Networks and Combinatorial Problems on Graphs)
死にたくない (I Don't Want to Die)
