The document discusses using a kernel self-organizing map (SOM) to cluster the vertices of a large graph to understand its structure. Specifically, it aims to cluster vertices in a graph representing relationships between 615 peasants in medieval France based on agricultural contracts. It motivates the use of graph clustering and discusses defining distances between vertices using methods like the Laplacian matrix. The kernel SOM is proposed to cluster the non-vectorial graph vertices.
Graph mining with kernel self-organizing map
1. Motivations
Dissimilarities and distances between vertices
Kernel SOM
Application and comments
Graph mining with kernel self-organizing map
Nathalie Villa-Vialaneix
http://www.nathalievilla.org
Joint work with Fabrice Rossi, INRIA, Rocquencourt, France
Institut de Mathématiques de Toulouse - IUT de Carcassonne, Université de Perpignan, France
SanTouVal, February 1st, 2008
Nathalie Villa - nathalie.villa@math.univ-toulouse.fr SanTouVal - Feb. 2008
2. Table of contents
1 Motivations
2 Dissimilarities and distances between vertices
3 Kernel SOM
4 Application and comments
5. Exploring a big historic database
Data
1000 agrarian contracts,
from four seigneuries (about 10 villages) in the South West of France,
established between 1250 and 1350 (before the Hundred Years' War).
Historian's questions:
family or geographical social links?
central people having a main social role?
...
⇒ Data mining is required.
9. A graph clustering problem
From the database, we build a weighted graph:
with $n = 615$ vertices $x_1, \ldots, x_n$, the peasants found in the contracts;
with weights $(w_{i,j})_{i,j=1,\ldots,n}$, where $w_{i,j}$ is the number of contracts in which $x_i$ and $x_j$ are both mentioned.
Number of vertices: 615
Number of edges: 4193
Total of weights: 40 329
Diameter: 10
Density: 2.2%
Goal: cluster the vertices into homogeneous social groups to understand the structure of the peasant community.
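The density figure can be checked from the vertex and edge counts. The short sketch below assumes that density means the number of edges divided by the number of possible edges in an undirected graph without self-loops; the slides do not state the definition, and the function name `graph_density` is only illustrative.

```python
def graph_density(n_vertices: int, n_edges: int) -> float:
    """Density of an undirected simple graph: edges / possible edges."""
    possible_edges = n_vertices * (n_vertices - 1) // 2
    return n_edges / possible_edges


# Counts taken from the slide: 615 vertices, 4193 edges.
density = graph_density(615, 4193)
print(f"{density:.1%}")  # prints 2.2%
```

Under this assumed definition the result agrees with the 2.2% quoted on the slide.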
12. Other fields modeled by large graphs
Computer science: World Wide Web, P2P networks...
Social networks
Biology: protein interactions, neuronal networks...
Business, management: transportation networks, industry partnerships...
Question: understanding the structure of these large graphs
Clustering: building relevant homogeneous groups;
Graph drawing: giving a global representation of the graph.
Here: Self-Organizing Map for non-vectorial data.
16. Usual dissimilarities between vertices
The Dice (Jaccard) index, for non-weighted graphs:
$$D(x_i, x_j) = \frac{|\Gamma(x_i) \cap \Gamma(x_j)|}{|\Gamma(x_i)| + |\Gamma(x_j)|}$$
where $\Gamma(x)$ is the set of neighbors of $x$;
Dissimilarities based on the shortest paths;
Dissimilarities or distances based on the Laplacian matrix: spectral clustering.
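As an illustration of the neighborhood-based index, here is a minimal sketch that evaluates the formula exactly as written on the slide (shared neighbors divided by the sum of the two neighborhood sizes). The toy graph and the name `dice_index` are assumptions made for the example. Note that the usual textbook Dice coefficient carries an extra factor 2 in the numerator, which is not applied here.

```python
def dice_index(adjacency: dict, a, b) -> float:
    """|N(a) & N(b)| / (|N(a)| + |N(b)|), following the slide's formula."""
    neigh_a, neigh_b = adjacency[a], adjacency[b]
    total = len(neigh_a) + len(neigh_b)
    if total == 0:  # two isolated vertices share nothing
        return 0.0
    return len(neigh_a & neigh_b) / total


# Hypothetical unweighted graph given as neighbor sets.
adjacency = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b"},
}

print(dice_index(adjacency, "a", "c"))  # one shared neighbor out of 2 + 2
print(dice_index(adjacency, "a", "d"))  # one shared neighbor out of 2 + 1
```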
17. Laplacian
Definitions
For a graph with vertices $V = \{x_1, \ldots, x_n\}$ having positive weights $(w_{i,j})_{i,j=1,\ldots,n}$ such that, for all $i, j = 1, \ldots, n$, $w_{i,j} = w_{j,i}$ and $d_i = \sum_{j=1}^{n} w_{i,j}$,
Laplacian: $L = (L_{i,j})_{i,j=1,\ldots,n}$ where
$$L_{i,j} = \begin{cases} -w_{i,j} & \text{if } i \neq j \\ d_i & \text{if } i = j \end{cases}$$
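The definition translates directly into code. The sketch below builds $L$ from a symmetric weight matrix stored as nested lists; it assumes there are no self-loops ($w_{i,i} = 0$), and the 3-vertex weight matrix is a made-up example.

```python
def laplacian(weights):
    """Return L with L[i][j] = -w_ij for i != j and L[i][i] = d_i.

    Assumes a symmetric weight matrix with a zero diagonal (no self-loops).
    """
    n = len(weights)
    return [
        [sum(weights[i]) if i == j else -weights[i][j] for j in range(n)]
        for i in range(n)
    ]


# Hypothetical weighted graph on 3 vertices.
weights = [
    [0, 2, 1],
    [2, 0, 0],
    [1, 0, 0],
]
L = laplacian(weights)
for row in L:
    print(row)
```

Each row of $L$ sums to zero, which is why the constant vector always belongs to the kernel of $L$.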
18. Laplacian: property I [von Luxburg, 2007]
Connected subgraphs
$\operatorname{Ker} L = \operatorname{Span}\{\mathbb{1}_{A_1}, \ldots, \mathbb{1}_{A_k}\}$ where $A_i$ indicates the positions of the vertices of the $i$th connected component of the graph.
Example (figure: a graph on 5 vertices whose connected components are {1, 4, 5} and {2, 3}):
$$\operatorname{Ker} L = \operatorname{Span}\left\{ (1, 0, 0, 1, 1)^T ;\ (0, 1, 1, 0, 0)^T \right\}$$
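This property can be checked numerically on a small graph with the same two components as the example. The edges inside each component are an assumption (the figure is not available in the transcript), and vertices are numbered from 0 in the code, so {1, 4, 5} and {2, 3} become {0, 3, 4} and {1, 2}.

```python
from collections import deque


def laplacian(weights):
    """L = D - W for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [sum(weights[i]) if i == j else -weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def connected_components(weights):
    """Breadth-first search over positive weights; returns sorted vertex lists."""
    n = len(weights)
    seen, components = set(), []
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        queue, component = deque([start]), []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in range(n):
                if weights[v][w] > 0 and w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components


# Hypothetical edges: 0-3 and 3-4 in one component, 1-2 in the other.
n = 5
weights = [[0] * n for _ in range(n)]
for a, b in [(0, 3), (3, 4), (1, 2)]:
    weights[a][b] = weights[b][a] = 1

L = laplacian(weights)
components = connected_components(weights)
for component in components:
    indicator = [1 if v in component else 0 for v in range(n)]
    image = [sum(L[i][j] * indicator[j] for j in range(n)) for i in range(n)]
    print(component, indicator, image)  # image is the zero vector
```

Each indicator vector is mapped to zero by $L$, so it lies in $\operatorname{Ker} L$.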
19. Laplacian: property II [Boulet et al., 2008]
Perfect community: a complete subgraph (clique) whose vertices share the same neighbors outside the clique.
Laplacian and perfect communities
For a non-weighted graph,
the graph has a perfect community with $m$ vertices
⇔
$L$ has $m$ eigenvectors such that each eigenvector has the same $n - m$ coordinates that vanish.
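The definition of a perfect community can be tested directly on neighbor sets. The following is a minimal sketch of that check only (it does not implement the eigenvector characterization); the function name and the toy graph are assumptions.

```python
def is_perfect_community(adjacency: dict, group) -> bool:
    """True if `group` is a clique whose members share the same outside neighbors."""
    group = set(group)
    outside = None
    for v in group:
        # Clique condition: v is linked to every other member of the group.
        if not (group - {v}) <= adjacency[v]:
            return False
        # Shared-neighbors condition: same neighbor set outside the group.
        external = adjacency[v] - group
        if outside is None:
            outside = external
        elif external != outside:
            return False
    return True


# Hypothetical graph: triangle {0, 1, 2}, all linked to vertex 3; 3 also linked to 4.
adjacency = {
    0: {1, 2, 3},
    1: {0, 2, 3},
    2: {0, 1, 3},
    3: {0, 1, 2, 4},
    4: {3},
}

print(is_perfect_community(adjacency, {0, 1, 2}))  # clique with common neighbor 3
print(is_perfect_community(adjacency, {2, 3}))     # outside neighbors differ
```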
21. Laplacian: property II [Boulet et al., 2008]
Application: (figure)
But: only 1/3 of the graph can be drawn this way.
23. Laplacian: property III [von Luxburg, 2007]
Min Cut problem: suppose that we have a connected graph. Finding a partition of the vertices of the graph, $A_1, \ldots, A_k$, such that
$$\frac{1}{2} \sum_{i=1}^{k} \sum_{j \in A_i,\, j' \notin A_i} w_{j,j'}$$
is minimum is equivalent to solving
$$H = \arg\min_{h \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(h^T L h\right) \quad \text{subject to} \quad h^T h = I \ \text{ and } \ h_i = \frac{1}{\sqrt{|A_i|}} \mathbb{1}_{A_i}$$
⇒ NP-complete problem.
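The link between the cut and the Laplacian rests on the identity $\mathbb{1}_A^T L \mathbb{1}_A = \sum_{j \in A,\, j' \notin A} w_{j,j'}$. The sketch below checks it on a made-up 4-vertex weighted graph using unnormalized indicator vectors; scaling a column by $1/\sqrt{|A_i|}$, as in the constraint, divides the corresponding term by $|A_i|$. All names and weights are assumptions for illustration.

```python
def laplacian(weights):
    """L = D - W for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [sum(weights[i]) if i == j else -weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def quad_form(matrix, v):
    """Compute v^T M v."""
    n = len(v)
    return sum(v[i] * matrix[i][j] * v[j] for i in range(n) for j in range(n))


def cut_weight(weights, part):
    """Total weight of the edges leaving `part`."""
    n = len(weights)
    return sum(weights[j][jp] for j in part for jp in range(n) if jp not in part)


# Hypothetical weighted graph and partition.
weights = [
    [0, 3, 1, 0],
    [3, 0, 0, 1],
    [1, 0, 0, 2],
    [0, 1, 2, 0],
]
partition = [{0, 1}, {2, 3}]
L = laplacian(weights)

objective = 0.5 * sum(cut_weight(weights, part) for part in partition)
via_laplacian = 0.5 * sum(
    quad_form(L, [1 if v in part else 0 for v in range(len(weights))])
    for part in partition
)
print(objective, via_laplacian)  # both equal 2.0
```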
25. Laplacian: property III [von Luxburg, 2007]
Relaxation: the Min Cut problem, i.e. finding a partition $A_1, \ldots, A_k$ of the vertices of a connected graph that minimizes
$$\frac{1}{2} \sum_{i=1}^{k} \sum_{j \in A_i,\, j' \notin A_i} w_{j,j'},$$
can be approached by dropping the constraint on the form of the columns of $h$:
$$H = \arg\min_{h \in \mathbb{R}^{n \times k}} \operatorname{Tr}\left(h^T L h\right) \quad \text{subject to} \quad h^T h = I$$
Spectral clustering: compute $H$, whose columns are the eigenvectors of $L$ associated with its $k$ smallest eigenvalues, and cluster the rows of $H$.
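A simplified illustration of spectral clustering for $k = 2$ follows. Instead of clustering the rows of $H$, it splits the vertices by the sign of the eigenvector attached to the second smallest eigenvalue of $L$ (a common shortcut for two groups, not the general procedure). To stay dependency-free, the eigenvector is obtained by power iteration on $cI - L$ rather than with a library eigensolver. The graph (two triangles joined by one edge) and all names are assumptions.

```python
import math


def laplacian(weights):
    """L = D - W for a symmetric weight matrix with zero diagonal."""
    n = len(weights)
    return [
        [sum(weights[i]) if i == j else -weights[i][j] for j in range(n)]
        for i in range(n)
    ]


def second_eigenvector(L, n_iter=200):
    """Power iteration on (shift*I - L), kept orthogonal to the constant vector."""
    n = len(L)
    shift = 2 * max(L[i][i] for i in range(n))  # bound on the largest eigenvalue of L
    v = [i - (n - 1) / 2 for i in range(n)]     # starting vector with zero mean
    for _ in range(n_iter):
        w = [shift * v[i] - sum(L[i][j] * v[j] for j in range(n)) for i in range(n)]
        mean = sum(w) / n
        w = [x - mean for x in w]               # remove the constant component
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


# Hypothetical graph: triangles {0, 1, 2} and {3, 4, 5} joined by the edge 2-3.
n = 6
weights = [[0] * n for _ in range(n)]
for a, b in [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]:
    weights[a][b] = weights[b][a] = 1

vector = second_eigenvector(laplacian(weights))
labels = [0 if x < 0 else 1 for x in vector]
print(labels)  # the two triangles receive different labels
```

For more than two groups, the rows of the $n \times k$ eigenvector matrix would be clustered, for instance with k-means.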
26. A regularized version of L
Regularization: the diffusion matrix. For $\beta > 0$,
$$K^\beta = e^{-\beta L} = \sum_{k=0}^{+\infty} \frac{(-\beta L)^k}{k!}.$$
⇒ $k^\beta : V \times V \to \mathbb{R}$, $(x_i, x_j) \mapsto K^\beta_{i,j}$ is the diffusion kernel (or heat kernel).
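For a small graph, the series can be summed directly. The sketch below truncates it after a fixed number of terms, which is adequate only when $\beta L$ has small entries; a dedicated matrix-exponential routine would be preferable for a graph with hundreds of vertices. The path graph, the value of $\beta$, and the function names are assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def diffusion_kernel(L, beta, n_terms=40):
    """Truncated series sum_{k < n_terms} (-beta L)^k / k!."""
    n = len(L)
    A = [[-beta * L[i][j] for j in range(n)] for i in range(n)]
    term = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    K = [row[:] for row in term]  # k = 0 term: the identity
    for k in range(1, n_terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        K = [[K[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return K


# Laplacian of the hypothetical path graph 0 - 1 - 2.
L = [
    [1, -1, 0],
    [-1, 2, -1],
    [0, -1, 1],
]
K = diffusion_kernel(L, beta=0.5)
for row in K:
    print([round(x, 4) for x in row])
```

On this example the kernel value between the adjacent vertices 0 and 1 is larger than between the end vertices 0 and 2, and every row sums to 1 because $L$ maps the constant vector to zero.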
28. Diffusion process on the graph
If $Z_0 = (1\ 1\ 1\ \ldots\ 1\ 1)^T$ is the "energy" of each vertex at time 0 and if a small fraction $\varepsilon$ of this energy is propagated along the edges of the graph at each time step, then after $t$ steps, the energy of the vertices of the graph is:
$$Z_t = (I - \varepsilon L)^t Z_0$$
Limits: refine the time step to $\Delta t$ by $t \to t/\Delta t$ and $\varepsilon \to \varepsilon \Delta t$; then $\Delta t \to 0$ (continuous process) gives
$$\lim_{\Delta t \to 0} Z_t = e^{-\varepsilon t L} Z_0 = K^{\varepsilon t} Z_0$$
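The passage from discrete steps to the continuous process can be observed numerically. With the sign convention $K^\beta = e^{-\beta L}$, the sketch below compares $(I - \tfrac{\beta}{T} L)^T$ for a large number of steps $T$ with the truncated exponential series. The path graph, $\beta = 0.5$, $T = 2000$, and the helper names are assumptions.

```python
def mat_mul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [
        [sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
        for i in range(n)
    ]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def expm_series(A, n_terms=40):
    """Truncated series sum_{k < n_terms} A^k / k! (small matrices only)."""
    n = len(A)
    term, total = identity(n), identity(n)
    for k in range(1, n_terms):
        term = [[x / k for x in row] for row in mat_mul(term, A)]
        total = [[total[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return total


def discrete_diffusion(L, beta, n_steps):
    """(I - (beta / n_steps) L) applied n_steps times."""
    n = len(L)
    eye = identity(n)
    step = [[eye[i][j] - (beta / n_steps) * L[i][j] for j in range(n)] for i in range(n)]
    result = identity(n)
    for _ in range(n_steps):
        result = mat_mul(result, step)
    return result


# Laplacian of the hypothetical path graph 0 - 1 - 2.
L = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]
beta = 0.5
K = expm_series([[-beta * x for x in row] for row in L])
approx = discrete_diffusion(L, beta, n_steps=2000)
max_gap = max(abs(K[i][j] - approx[i][j]) for i in range(3) for j in range(3))
print(max_gap)  # small, and it shrinks as n_steps grows
```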
31. Properties
1 Diffusion on the graph: $k^\beta(x_i, x_j)$ is the quantity of energy accumulated in $x_j$ after a given time if energy 1 is injected in $x_i$ at time 0 and if diffusion is done continuously along the edges. $\beta$ is the intensity of diffusion;
2 Regularization operator: for $u \in \mathbb{R}^n \sim V$, $u^T (K^\beta)^{-1} u$ is higher for vectors $u$ that vary a lot over "close" vertices of the graph. $\beta$ is the intensity of regularization (for small $\beta$, direct neighbors are more important);
3 Reproducing kernel property: $k^\beta$ is symmetric and positive definite ⇒ there exist a Hilbert space $(\mathcal{H}, \langle \cdot, \cdot \rangle)$ and $\phi : V \to \mathcal{H}$ such that $k^\beta(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$.
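The reproducing kernel property is what makes distances between vertices computable without knowing $\phi$: $\|\phi(x_i) - \phi(x_j)\|^2 = k(x_i, x_i) - 2 k(x_i, x_j) + k(x_j, x_j)$. The sketch below checks this identity on a toy case where the feature vectors are known explicitly; the features and names are assumptions, and a plain dot-product kernel stands in for the diffusion kernel.

```python
def gram_matrix(features):
    """Kernel matrix of dot products between explicit feature vectors."""
    return [[sum(a * b for a, b in zip(f, g)) for g in features] for f in features]


def kernel_sq_distance(K, i, j):
    """Squared feature-space distance computed from kernel values only."""
    return K[i][i] - 2 * K[i][j] + K[j][j]


# Hypothetical explicit features standing in for phi(x_0), phi(x_1), phi(x_2).
features = [(1.0, 0.0), (0.0, 2.0), (1.0, 2.0)]
K = gram_matrix(features)

for i, j in [(0, 1), (0, 2), (1, 2)]:
    explicit = sum((a - b) ** 2 for a, b in zip(features[i], features[j]))
    print((i, j), kernel_sq_distance(K, i, j), explicit)  # the two values agree
```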
33. Kohonen map
Mapping the data onto a 2-dimensional map
Each neuron of the map, $i = 1, \ldots, M$, is associated with a prototype $p_i \in \mathcal{H}$;
Neurons are related to each other by a neighborhood relationship ("distance": $d$).
Classifying the vertices on the map
Each $x_i$ is assigned to a neuron (cluster or class) of the map, $f(x_i)$.
35. Motivations
Dissimilarities and distances between vertices
Kernel SOM
Application and comments
Preserving the initial topology
Energy
The goal is to minimize the energy of the map:
E =
M
i=1
h(d(f(x), i)) x − pi
2
H dP(x)
where h is a decreasing function (ex: h(t) = αe−t/2σ2
).
The energy is approximated by its empirical version:
$$E^n = \sum_{j=1}^{n} \sum_{i=1}^{M} h(d(f(x_j), i)) \, \| x_j - p_i \|_{\mathcal{H}}^2$$
and the minimization is approximated by the SOM algorithm.
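The empirical energy can be evaluated directly once the prototypes and the assignment are known. The sketch below is written for ordinary vectors (it assumes $\mathcal{H} = \mathbb{R}^p$ so that norms can be computed explicitly) and uses the example neighborhood function $h(t) = \alpha e^{-t/(2\sigma^2)}$; the function names and the toy data are hypothetical.

```python
import numpy as np

def neighborhood(t, alpha=1.0, sigma=1.0):
    """Decreasing function h(t) = alpha * exp(-t / (2 sigma^2))."""
    return alpha * np.exp(-t / (2.0 * sigma ** 2))

def empirical_energy(X, prototypes, assignment, map_dist, alpha=1.0, sigma=1.0):
    """E^n = sum_j sum_i h(d(f(x_j), i)) * ||x_j - p_i||^2."""
    # sq_dist[j, i] = ||x_j - p_i||^2
    sq_dist = ((X[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=-1)
    # weights[j, i] = h(d(f(x_j), i))
    weights = neighborhood(map_dist[assignment, :], alpha, sigma)
    return float((weights * sq_dist).sum())

# Toy example: two points, two neurons at map distance 1 from each other.
X = np.array([[0.0, 0.0], [2.0, 0.0]])
prototypes = np.array([[0.0, 0.0], [2.0, 0.0]])
assignment = np.array([0, 1])
map_dist = np.array([[0.0, 1.0], [1.0, 0.0]])
E = empirical_energy(X, prototypes, assignment, map_dist)
print(E)
```

Even when every point coincides with its own prototype, the energy is not zero: each point is also pulled, with weight $h(1)$, toward the neighboring prototype, which is how the map preserves topology.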
Batch kernel SOM [Villa and Rossi, 2007]
Initialize randomly $\gamma^0_{ji} \in \mathbb{R}$ ($i = 1, \dots, n$; $j = 1, \dots, M$) and
$p^0_j = \sum_{i=1}^{n} \gamma^0_{ji} \phi(x_i)$.
Then, for $l = 1, \dots, n$ repeat
Assignment step
for all $x_i$,
$$f^l(x_i) = \arg\min_{j=1,\dots,M} \Big\| \phi(x_i) - \sum_{i'=1}^{n} \gamma^{l-1}_{ji'} \phi(x_{i'}) \Big\|_{\mathcal{H}}$$
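Representation step
$$\gamma^l_j = \arg\min_{\gamma \in \mathbb{R}^n} \sum_{i=1}^{n} h(d(f^l(x_i), j)) \, \Big\| \phi(x_i) - \sum_{l'=1}^{n} \gamma_{l'} \phi(x_{l'}) \Big\|_{\mathcal{H}}^2$$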
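Neither step requires the feature map $\phi$ explicitly. The derivation sketch below uses only the reproducing property $k_\beta(x_i, x_j) = \langle \phi(x_i), \phi(x_j) \rangle$; the shorthand $h_{ij}$ is introduced here for readability and is not notation from the slides.

```latex
% Assignment step: expand the squared norm for p_j = \sum_u \gamma_{ju} \phi(x_u).
\Big\| \phi(x_i) - \sum_{u=1}^{n} \gamma_{ju} \phi(x_u) \Big\|_{\mathcal{H}}^2
  = k_\beta(x_i, x_i)
  - 2 \sum_{u=1}^{n} \gamma_{ju} \, k_\beta(x_u, x_i)
  + \sum_{u,u'=1}^{n} \gamma_{ju} \gamma_{ju'} \, k_\beta(x_u, x_{u'})
% k_\beta(x_i, x_i) does not depend on j, so it can be dropped from the arg min.

% Representation step: with h_{ij} = h(d(f^l(x_i), j)) > 0, minimize over p \in \mathcal{H}
\sum_{i=1}^{n} h_{ij} \, \| \phi(x_i) - p \|_{\mathcal{H}}^2 .
% Setting the gradient with respect to p to zero gives a weighted mean:
p = \frac{\sum_{i=1}^{n} h_{ij} \, \phi(x_i)}{\sum_{i'=1}^{n} h_{i'j}}
\quad\Longrightarrow\quad
\gamma_{ji} = \frac{h_{ij}}{\sum_{i'=1}^{n} h_{i'j}} .
```

If the $\phi(x_i)$ are linearly dependent, the coefficients $\gamma_{ji}$ are not unique, but this choice always yields the minimizing prototype.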
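Neither step requires computing $\phi$ explicitly:
Assignment step
for all $x_i$,
$$f^l(x_i) = \arg\min_{j=1,\dots,M} \Big[ \sum_{u,u'=1}^{n} \gamma^{l-1}_{ju} \gamma^{l-1}_{ju'} \, k_\beta(x_u, x_{u'}) - 2 \sum_{u=1}^{n} \gamma^{l-1}_{ju} \, k_\beta(x_u, x_i) \Big]$$
Representation step
$$\gamma^l_{ji} = \frac{h(d(f^l(x_i), j))}{\sum_{i'=1}^{n} h(d(f^l(x_{i'}), j))}$$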
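The two kernel-only steps translate into a few lines of NumPy. The following is a minimal sketch, not the authors' implementation: the function name `batch_kernel_som`, the fixed number of iterations, the normalization of the random initial coefficients, and the neighborhood $h(t) = \alpha e^{-t/(2\sigma^2)}$ with a fixed $\sigma$ are all assumptions. The toy data use a linear kernel on 1-D points as a stand-in for $k_\beta$.

```python
import numpy as np

def batch_kernel_som(K, map_dist, n_iter=10, alpha=1.0, sigma=1.0, seed=0):
    """Cluster n items from their kernel matrix K (n x n) on a map of M neurons.

    map_dist is the M x M matrix of distances d between neurons.
    Returns (assignment, gamma), with gamma of shape (M, n).
    """
    rng = np.random.default_rng(seed)
    n, M = K.shape[0], map_dist.shape[0]
    # Random initial coefficients; rows normalized to sum to 1 (an assumption).
    gamma = rng.random((M, n))
    gamma /= gamma.sum(axis=1, keepdims=True)
    H = alpha * np.exp(-map_dist / (2.0 * sigma ** 2))  # H[j', j] = h(d(j', j))
    for _ in range(n_iter):
        # Assignment step: squared distance to each prototype, up to the
        # constant k(x_i, x_i), which does not depend on the neuron j.
        quad = np.einsum("ju,uv,jv->j", gamma, K, gamma)  # gamma_j^T K gamma_j
        cross = gamma @ K                                  # cross[j, i]
        assignment = np.argmin(quad[:, None] - 2.0 * cross, axis=0)
        # Representation step: gamma[j, i] = h(d(f(x_i), j)) / sum_i' h(d(f(x_i'), j)).
        weights = H[assignment, :].T                       # weights[j, i]
        gamma = weights / weights.sum(axis=1, keepdims=True)
    return assignment, gamma

# Toy data (hypothetical): two well-separated groups, linear kernel, 1 x 2 map.
points = np.array([0.0, 0.1, 0.2, 10.0, 10.1, 10.2])
K_toy = np.outer(points, points)
map_dist = np.array([[0.0, 1.0], [1.0, 0.0]])
assignment, gamma = batch_kernel_som(K_toy, map_dist, sigma=0.3)
print(assignment)
```

For a graph, `K_toy` would be replaced by the kernel matrix of the vertices and `map_dist` by the distances of the chosen map. In practice the neighborhood width is often decreased over the iterations; it is kept constant here for brevity.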
Results on a 7 × 7 rectangular map
Expected developments
1 Hierarchical clustering;
2 A classification based on a density criterion (joint work
with S. Gadat);
3 Adapting the algorithm to very large graphs (thousands of
vertices).
References
Boulet, R., Jouve, B., Rossi, F., and Villa, N. (2008).
Batch kernel SOM and related Laplacian methods for social network analysis.
Neurocomputing. To appear.
Villa, N. and Rossi, F. (2007).
A comparison between dissimilarity SOM and kernel SOM for clustering the vertices of a graph.
In Proceedings of the 6th Workshop on Self-Organizing Maps (WSOM 07), Bielefeld, Germany.
von Luxburg, U. (2007).
A tutorial on spectral clustering.
Technical Report TR-149, Max Planck Institut für biologische Kybernetik.
Available at http://www.kyb.mpg.de/publications/attachments/luxburg06_TR_v2_4139%5B1%5D.pdf.