SlideShare a Scribd company logo
1 of 23
Download to read offline
MANIFOLDS IN SEMI-SUPERVISED LEARNING
Monojit Basu
Director, TechYugadi IT Solutions & Consulting, Bangalore
EXTENDED
2
Outline
● Semi-supervized Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
3
Outline
● Semi-supervized Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
4
Semi-supervized Learning: Overview
● Training Samples consist of data with and without class label
● Images with and without captions
● Text with and without tags, ..
● Model is built with both labeled and unlabeled data
Prob(y|x) Prob(x)
● Smoothness Property: If two data points are close, their labels
should be similar
Label Data
Based on labeled samples Based on both labeled
and unlabeled samples
5
Graph-based Algorithms For SSL
● There are many many ways of exploiting smoothness property
● A simplistic baseline approach is self-training (not graph-based)
● Graph-based Algorithms are particularly effective
● Label Propagation
● Random-Walk
● Min-Cut
● Density-based Distances
● Local and Global Consistency
● Using Graph Kernels, ..
6
Label Propagation
● Generates a weighted graph where edges between similar
neighbours have higher weights (Zhu and Ghahramani, 2002)
● Defines a transition matrix:
● Tij = probability of node i ‘jumping’ into node j, that is, taking up j’s label
● Repeatedly multiplies the current label matrix with the transition
matrix (which itself gets updated)
● Until labels on all nodes stabilize (convergence)
● In effect labels propagate from labeled to unlabeled nodes
1
1
1
00
0 unlabeled
7
Outline
● Semi-supervized Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
8
Manifold Structures
● Data (nodes) are distributed over low and high density regions
● Two nodes that are geometrically close may not be similar
● Or equivalently, the geometry / distance measure should be redefined
● Euclidean distances and weights based on them may not work
● Such data is said to lie on a manifold
● Although not necessary, manifold structures are often
observed with high-dimensional data
● More complex scenario: data may not lie on a single manifold
● This is called multi-manifold structure
9
Single Manifold Structures
SWISS ROLL TWO MOONS
10
Multi-manifold Structures
$
Dollar Symbol
Surface Sphere
11
Outline
● Semi-supervized Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
12
Manifold Regularization
● This is the technical term for semi-supervized classification of
data distributed on a (single) manifold (Belkin et al., 2006)
● Key is to establish connectivity between similar nodes by
staying along a high-density region
● Mathematically it involves
● Computing a matrix L derived from the ordinary weight matrix W
● Taking the top n eigenvalues of L
● Computing an indicator function using the dot product of a data point
with the eigenvalues
● It is based on a theory known as Kernel Hilbert Spaces
13
Maniford Regularization (Schematic)
DATA
W
L=D-W
Eigen(L)
dotxData Point >0
+ve
-ve
CLASS LABELS
14
Multi-manifold Regularization
● This is the technical term for semi-supervized classification of
data distributed on a multi-manifold (Goldberg et al., 2009)
● Single manifold algorithm still starts with Euclidean distances,
but reformulates steps based on the derived matrix L
● Multi-manifold algorithm straight away changes distance
metrics
● It is based on Hellinger distances H, and
● A Mahalnabis k-nearest neighbor graph computed from H
● Complete algorithm is much longer, involving spectral
clustering and self-training on each cluster
15
Multi-manifold Regularization (Schematic)
DATA
Σs
Sample Cov. Mat.
H
kNN graph
Spectral Clustering
Self-trained Clusters
16
Multi-view Semi-supervised Learning
● Multi-view learning involves two or more independent
projections for each data point
● Classic Example: web-page classification using
● Bag of words
● Links to other web-pages
● Instead of representing data as (X, y) where y is class label, it
may be represented as (X1, X2, y), where Xi are views
● Somewhat related to multimodal learning (like video and
audio)
17
Multi-view Manifold Regularization
● Can manifold regularization be extended to multi-view data
● Yes, algorithms exist, based on strong mathematical
foundations, like Sindhwani and Rosenberg, 2008
● There is actually a generic pattern for multi-view semi-
supervized learning, called co-training
● Sindhwani et al., extends co-training with an algorithm called
co-regularization
● It reduces the problem to a convex optimization to minimize a
loss function
● The total loss function depends on individual class predictors
for each view, and a couple of regularization hyperparameters
18
Outline
● Semi-supervized Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
19
Python Implementation
● An implementation of some of these algorithms in Python 3.x is
published on github:
https://github.com/techyugadi/manifold_ssl
● These algorithms offer an interface similar to scikit-learn
● There are some programs to generate synthetic data and also
use the MNIST handwritten digits data
● Note: scikit-learn as of now supports only label propagation
algorithm for semi-supervized learning
● R package has more algorithms but not maifold regularization
● This is early-access release, more algorithms to be published !
20
Outline
● Semi-supervized Learning and Graph-based Algorithms
● Data Distribution on Manifold and Multi-manifold
● Classification Algorithms with Manifold Regularization
● Implementation Hints
● Closing Remarks
21
Summary
● Manifold regularization is an improvement over the standard
label propagation algorithm for semi-supervised learning
● It may lead to better results when data is distributed over a
manifold or multi-manifold
● This class of algorithms cover a wide range of scenarios,
including multi-view datasets
● These algorithms can be implemented in Python using
common numpy and linear algebra packages (see github)
22
References
● Zhu and Ghahramani, 2002: Learning from Labeled and
Unlabeled Data with Label Propagation
● Belkin, Niyogi and Sindhwani, 2006: Manifold Regularization:
A Geometric Framework for Learning from Labeled and
Unlabeled Examples
● Sindhwani and Rosenberg, 2008: An RKHS for Multi-View
Learning and Manifold Co-Regularization
● Goldberg, Zhu, Singh, Xu and Nowak, 2009: Multi-Manifold
Semi-Supervised Learning
23
THANK YOU
monojit@techyugadi.com

More Related Content

Similar to NODES 2020 extended - Manifolds in semi-supervised learning

Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine LearningSudarsun Santhiappan
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkPetr Zapletal
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series ForecastingBillTubbs
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksSeunghyun Hwang
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroDaniel Marcous
 
Object Oriented, Design patterns and data modelling worshop
Object Oriented, Design patterns and data modelling worshopObject Oriented, Design patterns and data modelling worshop
Object Oriented, Design patterns and data modelling worshopMohammad Shawahneh
 
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]Pramati Technologies
 
Single Responsibility Principle
Single Responsibility PrincipleSingle Responsibility Principle
Single Responsibility PrincipleBADR
 
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell Rebo
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell ReboNYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell Rebo
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell ReboMaryam Farooq
 
Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper reviewMazen Aly
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial IndustrySubrat Panda, PhD
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Anant Corporation
 
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...thanhdowork
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImageryRAHUL BHOJWANI
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Sparkdatamantra
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowAnant Corporation
 
End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflowsAdam Gibson
 
OpenHPI - Parallel Programming Concepts - Week 6
OpenHPI - Parallel Programming Concepts - Week 6OpenHPI - Parallel Programming Concepts - Week 6
OpenHPI - Parallel Programming Concepts - Week 6Peter Tröger
 
Deep Semi-supervised Learning methods
Deep Semi-supervised Learning methodsDeep Semi-supervised Learning methods
Deep Semi-supervised Learning methodsPrincy Joy
 

Similar to NODES 2020 extended - Manifolds in semi-supervised learning (20)

Challenges in Large Scale Machine Learning
Challenges in Large Scale  Machine LearningChallenges in Large Scale  Machine Learning
Challenges in Large Scale Machine Learning
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on Spark
 
Web Traffic Time Series Forecasting
Web Traffic  Time Series ForecastingWeb Traffic  Time Series Forecasting
Web Traffic Time Series Forecasting
 
ResNeSt: Split-Attention Networks
ResNeSt: Split-Attention NetworksResNeSt: Split-Attention Networks
ResNeSt: Split-Attention Networks
 
Production-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to heroProduction-Ready BIG ML Workflows - from zero to hero
Production-Ready BIG ML Workflows - from zero to hero
 
Object Oriented, Design patterns and data modelling worshop
Object Oriented, Design patterns and data modelling worshopObject Oriented, Design patterns and data modelling worshop
Object Oriented, Design patterns and data modelling worshop
 
C3 w3
C3 w3C3 w3
C3 w3
 
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
 
Single Responsibility Principle
Single Responsibility PrincipleSingle Responsibility Principle
Single Responsibility Principle
 
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell Rebo
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell ReboNYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell Rebo
NYAI #25: Evolution Strategies: An Alternative Approach to AI w/ Maxwell Rebo
 
Ad Click Prediction - Paper review
Ad Click Prediction - Paper reviewAd Click Prediction - Paper review
Ad Click Prediction - Paper review
 
Machine Learning in the Financial Industry
Machine Learning in the Financial IndustryMachine Learning in the Financial Industry
Machine Learning in the Financial Industry
 
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
Data Engineer's Lunch 89: Machine Learning Orchestration with AirflowMachine ...
 
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
240115_Thanh_LabSeminar[Don't walk, skip! online learning of multi-scale netw...
 
Semantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite ImagerySemantic Segmentation on Satellite Imagery
Semantic Segmentation on Satellite Imagery
 
Introduction to Machine Learning with Spark
Introduction to Machine Learning with SparkIntroduction to Machine Learning with Spark
Introduction to Machine Learning with Spark
 
Machine Learning Orchestration with Airflow
Machine Learning Orchestration with AirflowMachine Learning Orchestration with Airflow
Machine Learning Orchestration with Airflow
 
End to end MLworkflows
End to end MLworkflowsEnd to end MLworkflows
End to end MLworkflows
 
OpenHPI - Parallel Programming Concepts - Week 6
OpenHPI - Parallel Programming Concepts - Week 6OpenHPI - Parallel Programming Concepts - Week 6
OpenHPI - Parallel Programming Concepts - Week 6
 
Deep Semi-supervised Learning methods
Deep Semi-supervised Learning methodsDeep Semi-supervised Learning methods
Deep Semi-supervised Learning methods
 

More from Neo4j

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...Neo4j
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansNeo4j
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityNeo4j
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Neo4j
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfNeo4j
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...Neo4j
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosNeo4j
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Neo4j
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jNeo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Neo4j
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeNeo4j
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsNeo4j
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j
 

More from Neo4j (20)

Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...Workshop - Best of Both Worlds_ Combine  KG and Vector search for  enhanced R...
Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced R...
 
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
Neo4j - How KGs are shaping the future of Generative AI at AWS Summit London ...
 
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and BioinformaticiansQIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
QIAGEN: Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians
 
EY_Graph Database Powered Sustainability
EY_Graph Database Powered SustainabilityEY_Graph Database Powered Sustainability
EY_Graph Database Powered Sustainability
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024Build your next Gen AI Breakthrough - April 2024
Build your next Gen AI Breakthrough - April 2024
 
Connecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdfConnecting the Dots for Information Discovery.pdf
Connecting the Dots for Information Discovery.pdf
 
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
ISDEFE - GraphSummit Madrid - ARETA: Aviation Real-Time Emissions Token Accre...
 
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafosBBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
BBVA - GraphSummit Madrid - Caso de éxito en BBVA: Optimizando con grafos
 
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
Graph Everywhere - Josep Taruella - Por qué Graph Data Science en tus modelos...
 
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4jGraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
GraphSummit Madrid - Product Vision and Roadmap - Luis Salvador Neo4j
 
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdfNeo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
Neo4j_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdfRabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
Rabobank_Exploring the Impact of Graph Technology on Financial Services.pdf
 
Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!Webinar - IA generativa e grafi Neo4j: RAG time!
Webinar - IA generativa e grafi Neo4j: RAG time!
 
IA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG timeIA Generativa y Grafos de Neo4j: RAG time
IA Generativa y Grafos de Neo4j: RAG time
 
Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)Neo4j: Data Engineering for RAG (retrieval augmented generation)
Neo4j: Data Engineering for RAG (retrieval augmented generation)
 
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdfNeo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
Neo4j Graph Summit 2024 Workshop - EMEA - Breda_and_Munchen.pdf
 
Enabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge GraphsEnabling GenAI Breakthroughs with Knowledge Graphs
Enabling GenAI Breakthroughs with Knowledge Graphs
 
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdfNeo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
Neo4j_Anurag Tandon_Product Vision and Roadmap.Benelux.pptx.pdf
 
Neo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with GraphNeo4j Jesus Barrasa The Art of the Possible with Graph
Neo4j Jesus Barrasa The Art of the Possible with Graph
 

Recently uploaded

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...MyIntelliSource, Inc.
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfkalichargn70th171
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceanilsa9823
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...ICS
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...panagenda
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...harshavardhanraghave
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...MyIntelliSource, Inc.
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxComplianceQuest1
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxbodapatigopi8531
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerThousandEyes
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Modelsaagamshah0812
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providermohitmore19
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️Delhi Call girls
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionSolGuruz
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comFatema Valibhai
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdfWave PLM
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsAlberto González Trastoy
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsArshad QA
 

Recently uploaded (20)

Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
Steps To Getting Up And Running Quickly With MyTimeClock Employee Scheduling ...
 
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdfThe Ultimate Test Automation Guide_ Best Practices and Tips.pdf
The Ultimate Test Automation Guide_ Best Practices and Tips.pdf
 
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female serviceCALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
CALL ON ➥8923113531 🔝Call Girls Badshah Nagar Lucknow best Female service
 
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
The Real-World Challenges of Medical Device Cybersecurity- Mitigating Vulnera...
 
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
W01_panagenda_Navigating-the-Future-with-The-Hitchhikers-Guide-to-Notes-and-D...
 
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
Reassessing the Bedrock of Clinical Function Models: An Examination of Large ...
 
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
Try MyIntelliAccount Cloud Accounting Software As A Service Solution Risk Fre...
 
A Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docxA Secure and Reliable Document Management System is Essential.docx
A Secure and Reliable Document Management System is Essential.docx
 
Hand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptxHand gesture recognition PROJECT PPT.pptx
Hand gesture recognition PROJECT PPT.pptx
 
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected WorkerHow To Troubleshoot Collaboration Apps for the Modern Connected Worker
How To Troubleshoot Collaboration Apps for the Modern Connected Worker
 
Unlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language ModelsUnlocking the Future of AI Agents with Large Language Models
Unlocking the Future of AI Agents with Large Language Models
 
TECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service providerTECUNIQUE: Success Stories: IT Service provider
TECUNIQUE: Success Stories: IT Service provider
 
Microsoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdfMicrosoft AI Transformation Partner Playbook.pdf
Microsoft AI Transformation Partner Playbook.pdf
 
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
call girls in Vaishali (Ghaziabad) 🔝 >༒8448380779 🔝 genuine Escort Service 🔝✔️✔️
 
Diamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with PrecisionDiamond Application Development Crafting Solutions with Precision
Diamond Application Development Crafting Solutions with Precision
 
HR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.comHR Software Buyers Guide in 2024 - HRSoftware.com
HR Software Buyers Guide in 2024 - HRSoftware.com
 
5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf5 Signs You Need a Fashion PLM Software.pdf
5 Signs You Need a Fashion PLM Software.pdf
 
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time ApplicationsUnveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
Unveiling the Tech Salsa of LAMs with Janus in Real-Time Applications
 
Software Quality Assurance Interview Questions
Software Quality Assurance Interview QuestionsSoftware Quality Assurance Interview Questions
Software Quality Assurance Interview Questions
 
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS LiveVip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
Vip Call Girls Noida ➡️ Delhi ➡️ 9999965857 No Advance 24HRS Live
 

NODES 2020 extended - Manifolds in semi-supervised learning

  • 1. MANIFOLDS IN SEMI-SUPERVISED LEARNING Monojit Basu Director, TechYugadi IT Solutions & Consulting, Bangalore EXTENDED
  • 2. 2 Outline ● Semi-supervized Learning and Graph-based Algorithms ● Data Distribution on Manifold and Multi-manifold ● Classification Algorithms with Manifold Regularization ● Implementation Hints ● Closing Remarks
  • 3. 3 Outline ● Semi-supervized Learning and Graph-based Algorithms ● Data Distribution on Manifold and Multi-manifold ● Classification Algorithms with Manifold Regularization ● Implementation Hints ● Closing Remarks
  • 4. 4 Semi-supervized Learning: Overview ● Training Samples consist of data with and without class label ● Images with and without captions ● Text with and without tags, .. ● Model is built with both labeled and unlabeled data Prob(y|x) Prob(x) ● Smoothness Property: If two data points are close, their labels should be similar Label Data Based on labeled samples Based on both labeled and unlabeled samples
  • 5. 5 Graph-based Algorithms For SSL ● There are many many ways of exploiting smoothness property ● A simplistic baseline approach is self-training (not graph-based) ● Graph-based Algorithms are particularly effective ● Label Propagation ● Random-Walk ● Min-Cut ● Density-based Distances ● Local and Global Consistency ● Using Graph Kernels, ..
  • 6. 6 Label Propagation ● Generates a weighted graph where edges between similar neighbours have higher weights (Zhu and Ghahramani, 2002) ● Defines a transition matrix: ● Tij = probability of node i ‘jumping’ into node j, that is, taking up j’s label ● Repeatedly multiplies the current label matrix with the transition matrix (which itself gets updated) ● Until labels on all nodes stabilize (convergence) ● In effect labels propagate from labeled to unlabeled nodes 1 1 1 00 0 unlabeled
  • 7. 7 Outline ● Semi-supervized Learning and Graph-based Algorithms ● Data Distribution on Manifold and Multi-manifold ● Classification Algorithms with Manifold Regularization ● Implementation Hints ● Closing Remarks
  • 8. 8 Manifold Structures ● Data (nodes) are distributed over low and high density regions ● Two nodes that are geometrically close may not be similar ● Or equivalently, the geometry / distance measure should be redefined ● Euclidean distances and weights based on them may not work ● Such data is said to lie on a manifold ● Although not necessary, manifold structures are often observed with high-dimensional data ● More complex scenario: data may not lie on a single manifold ● This is called multi-manifold structure
  • 11. 11 Outline ● Semi-supervized Learning and Graph-based Algorithms ● Data Distribution on Manifold and Multi-manifold ● Classification Algorithms with Manifold Regularization ● Implementation Hints ● Closing Remarks
  • 12. 12 Manifold Regularization ● This is the technical term for semi-supervized classification of data distributed on a (single) manifold (Belkin et al., 2006) ● Key is to establish connectivity between similar nodes by staying along a high-density region ● Mathematically it involves ● Computing a matrix L derived from the ordinary weight matrix W ● Taking the top n eigenvalues of L ● Computing an indicator function using the dot product of a data point with the eigenvalues ● It is based on a theory known as Kernel Hilbert Spaces
  • 14. 14 Multi-manifold Regularization ● This is the technical term for semi-supervized classification of data distributed on a multi-manifold (Goldberg et al., 2009) ● Single manifold algorithm still starts with Euclidean distances, but reformulates steps based on the derived matrix L ● Multi-manifold algorithm straight away changes distance metrics ● It is based on Hellinger distances H, and ● A Mahalnabis k-nearest neighbor graph computed from H ● Complete algorithm is much longer, involving spectral clustering and self-training on each cluster
  • 15. 15 Multi-manifold Regularization (Schematic) DATA Σs Sample Cov. Mat. H kNN graph Spectral Clustering Self-trained Clusters
  • 16. 16 Multi-view Semi-supervised Learning ● Multi-view learning involves two or more independent projections for each data point ● Classic Example: web-page classification using ● Bag of words ● Links to other web-pages ● Instead of representing data as (X, y) where y is class label, it may be represented as (X1, X2, y), where Xi are views ● Somewhat related to multimodal learning (like video and audio)
  • 17. 17 Multi-view Manifold Regularization ● Can manifold regularization be extended to multi-view data ● Yes, algorithms exist, based on strong mathematical foundations, like Sindhwani and Rosenberg, 2008 ● There is actually a generic pattern for multi-view semi- supervized learning, called co-training ● Sindhwani et al., extends co-training with an algorithm called co-regularization ● It reduces the problem to a convex optimization to minimize a loss function ● The total loss function depends on individual class predictors for each view, and a couple of regularization hyperparameters
  • 18. 18 Outline ● Semi-supervized Learning and Graph-based Algorithms ● Data Distribution on Manifold and Multi-manifold ● Classification Algorithms with Manifold Regularization ● Implementation Hints ● Closing Remarks
  • 19. 19 Python Implementation ● An implementation of some of these algorithms in Python 3.x is published on github: https://github.com/techyugadi/manifold_ssl ● These algorithms offer an interface similar to scikit-learn ● There are some programs to generate synthetic data and also use the MNIST handwritten digits data ● Note: scikit-learn as of now supports only label propagation algorithm for semi-supervized learning ● R package has more algorithms but not maifold regularization ● This is early-access release, more algorithms to be published !
  • 20. 20 Outline ● Semi-supervized Learning and Graph-based Algorithms ● Data Distribution on Manifold and Multi-manifold ● Classification Algorithms with Manifold Regularization ● Implementation Hints ● Closing Remarks
  • 21. 21 Summary ● Manifold regularization is an improvement over the standard label propagation algorithm for semi-supervised learning ● It may lead to better results when data is distributed over a manifold or multi-manifold ● This class of algorithms cover a wide range of scenarios, including multi-view datasets ● These algorithms can be implemented in Python using common numpy and linear algebra packages (see github)
  • 22. 22 References ● Zhu and Ghahramani, 2002: Learning from Labeled and Unlabeled Data with Label Propagation ● Belkin, Niyogi and Sindhwani, 2006: Manifold Regularization: A Geometric Framework for Learning from Labeled and Unlabeled Examples ● Sindhwani and Rosenberg, 2008: An RKHS for Multi-View Learning and Manifold Co-Regularization ● Goldberg, Zhu, Singh, Xu and Nowak, 2009: Multi-Manifold Semi-Supervised Learning