Machine Learning: A Probabilistic Perspective
Chapter 27: Latent variable models for discrete data
Keywords: topic model, LDA, graph structure
Kyoto University, Okuno lab.
M1 Ikemiya Yukara
Introduction
• Discrete data and continuous data
• Latent variable model
• Text analysis
[Figure: examples of discrete data (a binary vector 0 0 1 0 1 1 1 0 1 …, the sentence "this is a pen"); the word "apple" with an unknown topic (foods? computer? sports?); the words "pig", "dog", "cat" generated by a latent "animal" topic.]
Bag of words: word order is ignored.
Topic model: topics generate words.
Distribution and Function
Softmax function: $\mathcal{S}(\cdot)$
Dirichlet distribution: $\mathrm{Dir}(\cdot)$
Poisson distribution: $\mathrm{Poi}(\cdot)$
Categorical distribution: $\mathrm{Cat}(\cdot)$
Multinomial distribution: $\mathrm{Mu}(\cdot)$
Normal distribution: $\mathcal{N}(\cdot)$
Softmax function
• Another name
– Normalized exponential function
$$ \mathcal{S}(\mathbf{a})_i = \frac{\exp(a_i)}{\sum_{j=1}^{N} \exp(a_j)}, \qquad i = 1, \dots, N, \qquad \mathbf{a} = (a_1, \dots, a_N) $$
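A minimal pure-Python sketch of the softmax $\mathcal{S}(\cdot)$ defined above. Subtracting $\max(\mathbf{a})$ before exponentiating is a standard numerical-stability trick (an implementation choice, not part of the slide); it leaves the result unchanged because $\mathcal{S}(\mathbf{a} + c) = \mathcal{S}(\mathbf{a})$ for any constant $c$.

```python
import math


def softmax(a):
    """Normalized exponential: S(a)_i = exp(a_i) / sum_j exp(a_j)."""
    m = max(a)  # shift by the max so exp() cannot overflow; S is shift-invariant
    exps = [math.exp(x - m) for x in a]
    total = sum(exps)
    return [e / total for e in exps]


print(softmax([0.0, 0.0, 0.0, 0.0]))  # [0.25, 0.25, 0.25, 0.25]
print(softmax([1000.0, 1000.0]))      # [0.5, 0.5]; a naive exp(1000) would overflow
```

Equal scores give a uniform distribution; larger scores get exponentially more mass.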
Topic model
(not just for text analysis)
Definition
• $y_{il} \in \{1, \dots, V\}$: the $l$'th word in document $i$, for $l = 1:L_i$
• $L_i$: the length of document $i$
• $N$: the number of documents ($i = 1:N$)
• $n_{iv} \in \{0, 1, \dots, L_i\}$: the number of times word $v$ occurs in document $i$, so that $\sum_{v} n_{iv} = L_i$
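A small sketch of how a tokenized document $y_{i,1:L_i}$ becomes its count vector $\mathbf{n}_i$ under the bag-of-words view. The vocabulary and document below are hypothetical examples, and words outside the vocabulary are assumed not to occur (they would raise a `KeyError`).

```python
from collections import Counter


def count_vector(doc, vocab):
    """Map a tokenized document y_{i,1:L_i} to its count vector n_i.

    vocab: list of V word types; n_i[v] is the number of times vocab[v] occurs.
    Word order is discarded (bag of words).
    """
    index = {w: v for v, w in enumerate(vocab)}
    counts = Counter(index[w] for w in doc)
    return [counts.get(v, 0) for v in range(len(vocab))]


vocab = ["pig", "dog", "cat", "apple"]  # hypothetical vocabulary, V = 4
doc = ["dog", "cat", "dog", "pig"]      # hypothetical document, L_i = 4
n = count_vector(doc, vocab)
print(n)                   # [1, 2, 1, 0]
print(sum(n) == len(doc))  # True: sum_v n_iv = L_i
```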
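Mixture models
• Simplest model
$$ p(y_{i,1:L_i} \mid q_i = k) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{b}_k) $$
Here $q_i$ is the latent topic of document $i$, $y_{il}$ is word $l$ of document $i$, and $\mathbf{b}_k$ is topic $k$'s word distribution (e.g. an "animal" topic that favors "pig", "dog", "cat").
• Model of count vectors
$$ p(\mathbf{n}_i \mid L_i, q_i = k) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{b}_k), \qquad L_i = \sum_{v} n_{iv} $$
where $\mathbf{n}_i$ is the count vector of words in document $i$. If $L_i$ is unknown:
$$ p(\mathbf{n}_i \mid q_i = k) = \prod_{v=1}^{V} \mathrm{Poi}(n_{iv} \mid \lambda_{vk}) $$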
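Under the simplest mixture model, the topic of a whole document can be inferred by Bayes' rule from the categorical likelihood above. A minimal sketch, assuming mixing weights $p(q_i = k)$ that the slide does not show and strictly positive entries in each $\mathbf{b}_k$; the computation is done in log space to avoid underflow on long documents.

```python
import math


def topic_posterior(doc, B, weights):
    """Posterior p(q_i = k | y_{i,1:L_i}) for a mixture of categoricals.

    doc: list of word ids y_il in {0, ..., V-1}
    B: list of K word distributions b_k (each a list of V positive numbers)
    weights: prior mixing weights p(q_i = k) (assumed; not on the slide)
    """
    log_post = []
    for k, b_k in enumerate(B):
        # log p(q_i = k) + sum_l log Cat(y_il | b_k)
        log_post.append(math.log(weights[k]) + sum(math.log(b_k[y]) for y in doc))
    m = max(log_post)  # log-sum-exp normalization
    unnorm = [math.exp(lp - m) for lp in log_post]
    z = sum(unnorm)
    return [u / z for u in unnorm]


# Hypothetical vocabulary [pig, dog, cat, apple]; topic 0 ~ "animal", topic 1 ~ "food".
B = [[0.3, 0.3, 0.3, 0.1], [0.1, 0.1, 0.1, 0.7]]
post = topic_posterior([0, 1, 2], B, [0.5, 0.5])  # the document "pig dog cat"
print([round(p, 3) for p in post])  # [0.964, 0.036]
```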
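Exponential family PCA (ePCA)
• Probabilistic PCA (PPCA) <Chap. 12>: a model of continuous data
$$ p(\mathbf{y}_i \mid \boldsymbol{\theta}) = \int \underbrace{\mathcal{N}(\mathbf{y}_i \mid \mathbf{W}\mathbf{z}_i, \sigma^2 \mathbf{I})}_{\text{likelihood}} \; \underbrace{\mathcal{N}(\mathbf{z}_i \mid \boldsymbol{\mu}_0, \boldsymbol{\Sigma}_0)}_{\text{prior of latent variables}} \, d\mathbf{z}_i $$
Change the likelihood from continuous data to discrete or count data:
• Categorical PCA
$$ p(y_{i,1:L_i} \mid \mathbf{z}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathcal{S}(\mathbf{W}\mathbf{z}_i)), \qquad \mathbf{W} \in \mathbb{R}^{V \times K} \text{ (weight matrix)}, \qquad \mathbf{z}_i \in \mathbb{R}^{K} $$
• Model of count vectors
$$ p(\mathbf{n}_i \mid L_i, \mathbf{z}_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathcal{S}(\mathbf{W}\mathbf{z}_i)) $$
Purpose: a more flexible model.
Idea: change the latent variables from discrete ($q_i$) to continuous ($\mathbf{z}_i$).
What is it doing? Dimension reduction from $V$ (words) to $K$ (topics).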
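A minimal sketch of the categorical PCA likelihood: the natural parameters $\mathbf{W}\mathbf{z}_i$ pass through the softmax to give a probability vector over the $V$ words, and each word is scored under it. The matrices below are hypothetical; with $\mathbf{W} = \mathbf{0}$ every word has probability $1/V$, so the log-likelihood of $L$ words is $-L \log V$.

```python
import math


def softmax(a):
    m = max(a)  # numerical-stability shift
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]


def cat_pca_loglik(doc, W, z):
    """log p(y_{i,1:L_i} | z_i) = sum_l log Cat(y_il | S(W z_i)).

    W: V x K weight matrix (list of V rows of length K); z: latent vector in R^K.
    """
    eta = [sum(w_vk * z_k for w_vk, z_k in zip(row, z)) for row in W]  # W z_i
    p = softmax(eta)  # probability vector over the V words
    return sum(math.log(p[y]) for y in doc)


W_zero = [[0.0, 0.0] for _ in range(4)]  # hypothetical V = 4, K = 2
print(round(cat_pca_loglik([0, 1, 2], W_zero, [1.0, -1.0]), 4))  # -4.1589, i.e. -3 log 4
```

Moving $\mathbf{z}_i$ toward the direction that a word's row of $\mathbf{W}$ points in raises that word's probability, which is what makes the continuous latent space more flexible than a single discrete topic.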
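mPCA and LDA
In ePCA, $\mathbf{W}\mathbf{z}_i$ represents the natural parameters of the exponential family.
• Natural parameter $\mathbf{W}\mathbf{z}_i$: a vector of log odds
• Dual parameter $\mathbf{B}\boldsymbol{\pi}_i$: a probability vector (e.g. for the multinomial)
• Multinomial PCA (mPCA)
$$ \boldsymbol{\pi}_i \sim \mathrm{Dir}(\alpha \mathbf{1}_K), \qquad p(\mathbf{n}_i \mid L_i, \boldsymbol{\pi}_i) = \mathrm{Mu}(\mathbf{n}_i \mid L_i, \mathbf{B}\boldsymbol{\pi}_i) $$
• LDA
$$ p(y_{i,1:L_i} \mid \boldsymbol{\pi}_i) = \prod_{l=1}^{L_i} \mathrm{Cat}(y_{il} \mid \mathbf{B}\boldsymbol{\pi}_i) $$
[Figure: $\mathbf{B}$ is a $V \times K$ matrix (words $1, \dots, V$ by topics $1, \dots, K$); $\boldsymbol{\pi}_i$ is a distribution over topics $1, \dots, K$; $\mathbf{B}\boldsymbol{\pi}_i$ is a distribution over words $1, \dots, V$.]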
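Why $\mathbf{B}\boldsymbol{\pi}_i$ is a valid probability vector: if every column $\mathbf{b}_k$ of $\mathbf{B}$ sums to 1 and $\boldsymbol{\pi}_i$ lies on the simplex, then $\mathbf{B}\boldsymbol{\pi}_i = \sum_k \pi_{ik} \mathbf{b}_k$ is a convex combination of distributions. A small sketch with a hypothetical $V = 3$, $K = 2$ matrix chosen so the arithmetic is exact in floating point:

```python
def mix_topics(B, pi):
    """Return the word distribution B pi_i = sum_k pi_ik b_k.

    B: V x K matrix (list of V rows) whose column k is topic k's word distribution b_k
    pi: topic proportions pi_i of document i (nonnegative, sums to 1)
    """
    V, K = len(B), len(pi)
    return [sum(B[v][k] * pi[k] for k in range(K)) for v in range(V)]


B = [[0.5, 0.0],   # hypothetical: each column sums to 1
     [0.5, 0.25],
     [0.0, 0.75]]
print(mix_topics(B, [0.5, 0.5]))  # [0.25, 0.375, 0.375]
```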
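Latent Dirichlet allocation (LDA)
• Main purpose: unsupervised discovery of topics.
• Advantage: LDA can handle ambiguity (polysemy), e.g. the word "play" in
– To play ball
– To play the cornet
– Shakespeare's play
• Dirichlet distribution: its samples lie on the probability simplex,
$$ \sum_i \theta_i = 1, \qquad 0 \le \theta_i \le 1 $$
[Figure: Dirichlet densities on the simplex over three components 1, 2, 3.]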
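One standard way to draw $\boldsymbol{\theta} \sim \mathrm{Dir}(\boldsymbol{\alpha})$ is to draw independent $\mathrm{Gamma}(\alpha_k, 1)$ variables and normalize them. The sketch below uses only Python's `random` module; the concentration values are hypothetical.

```python
import random


def sample_dirichlet(alpha, rng=random):
    """Draw theta ~ Dir(alpha) by normalizing independent Gamma(alpha_k, 1) draws."""
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(g)
    return [x / total for x in g]


rng = random.Random(0)
theta = sample_dirichlet([1.0, 1.0, 1.0], rng)  # uniform over the 3-component simplex
print(round(sum(theta), 12))  # 1.0
```

Small concentrations (e.g. all $\alpha_k = 0.1$) tend to give sparse vectors with most mass on a few components, which is why LDA documents usually mix only a few topics.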
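Latent Dirichlet allocation (LDA)
• Full model
$$
\begin{aligned}
\boldsymbol{\pi}_i \mid \alpha &\sim \mathrm{Dir}(\alpha \mathbf{1}_K) \\
q_{il} \mid \boldsymbol{\pi}_i &\sim \mathrm{Cat}(\boldsymbol{\pi}_i) \\
\mathbf{b}_k \mid \gamma &\sim \mathrm{Dir}(\gamma \mathbf{1}_V) \\
y_{il} \mid q_{il} = k, \mathbf{B} &\sim \mathrm{Cat}(\mathbf{b}_k)
\end{aligned}
$$
where $\alpha \mathbf{1}_K \equiv (\alpha, \dots, \alpha)$ and $\gamma \mathbf{1}_V \equiv (\gamma, \dots, \gamma)$.
• $y_{il}$: the $l$'th word in document $i$
• $q_{il}$: the topic of the $l$'th word in document $i$
• $\boldsymbol{\pi}_i$: the topic distribution of document $i$
• $\mathbf{b}_k$: the word distribution of topic $k$
[Figure: plate diagram $\alpha \to \boldsymbol{\pi}_i \to q_{il} \to y_{il} \leftarrow \mathbf{B} \leftarrow \gamma$, with plates over the $L_i$ words, the $N$ documents, and the $K$ topics.]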
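The full model above can be read as a generative procedure. A minimal forward sampler, assuming symmetric hyperparameters $\alpha$ and $\gamma$ and caller-chosen document lengths (all values in the usage line are hypothetical); it samples data only and does no inference.

```python
import random


def sample_dirichlet(alpha, rng):
    g = [rng.gammavariate(a, 1.0) for a in alpha]  # Gamma draws, then normalize
    s = sum(g)
    return [x / s for x in g]


def sample_lda_corpus(N, K, V, doc_lengths, alpha, gamma, seed=0):
    """Forward-sample a corpus from LDA.

    Returns (B, pis, qs, ys): topic-word distributions b_k, per-document topic
    proportions pi_i, topic assignments q_il and words y_il (ids 0..V-1).
    """
    rng = random.Random(seed)
    B = [sample_dirichlet([gamma] * V, rng) for _ in range(K)]  # b_k | gamma ~ Dir(gamma 1_V)
    pis, qs, ys = [], [], []
    for i in range(N):
        pi = sample_dirichlet([alpha] * K, rng)                  # pi_i | alpha ~ Dir(alpha 1_K)
        q = rng.choices(range(K), weights=pi, k=doc_lengths[i])  # q_il | pi_i ~ Cat(pi_i)
        y = [rng.choices(range(V), weights=B[k])[0] for k in q]  # y_il | q_il = k ~ Cat(b_k)
        pis.append(pi)
        qs.append(q)
        ys.append(y)
    return B, pis, qs, ys


B, pis, qs, ys = sample_lda_corpus(N=3, K=2, V=5, doc_lengths=[4, 6, 5], alpha=0.5, gamma=0.5)
```

Each word gets its own topic $q_{il}$, so one document can mix several topics; this is the difference from the mixture model, where one $q_i$ is shared by every word of document $i$.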
Latent Dirichlet allocation (LDA)
[Figure: two topic word distributions $\mathbf{b}_1$ and $\mathbf{b}_2$ drawn on the simplex over words 1, 2, 3.]
• Unsupervised discovery of topics
• Dimension reduction: 3 (words) → 2 (topics)
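Evaluation of LDA
• Perplexity
– Evaluation as a language model
$$ \mathrm{perplexity}(p_{\mathrm{emp}}, q) = \exp\left( -\frac{1}{N} \sum_{i=1}^{N} \frac{1}{L_i} \sum_{l=1}^{L_i} \log q(y_{il}) \right) $$
where $p_{\mathrm{emp}}$ is the empirical distribution of the test documents and $q$ is the language model. Lower perplexity means the model predicts the test words better.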
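A direct transcription of the perplexity formula, assuming the language model is supplied as a function returning $\log q(y)$ for a word id. For simplicity the sketch uses unigram models that ignore context (an assumption; an LDA-based $q$ would first have to infer each test document's topic proportions). As a sanity check, a uniform model over $V$ words has perplexity exactly $V$.

```python
import math


def perplexity(test_docs, log_q):
    """exp(-(1/N) sum_i (1/L_i) sum_l log q(y_il)) over N test documents."""
    n_docs = len(test_docs)
    total = 0.0
    for doc in test_docs:
        total += sum(log_q(y) for y in doc) / len(doc)  # (1/L_i) sum_l log q(y_il)
    return math.exp(-total / n_docs)


def uniform_log_q(y, V=4):
    """Hypothetical baseline: q(y) = 1/V for every word id y."""
    return -math.log(V)


print(perplexity([[0, 1], [2, 3, 3]], uniform_log_q))  # 4.0 up to rounding (= V)
```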
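Extensions of LDA
• Correlated topic model
$$ \mathbf{z}_i \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \qquad \boldsymbol{\pi}_i \mid \mathbf{z}_i = \mathcal{S}(\mathbf{z}_i) $$
The covariance $\boldsymbol{\Sigma}$ captures correlation between topics (e.g. "business" and "finance" can be correlated, unlike "animal"); the softmax performs the normalization.
• Dynamic topic model
$$ \mathbf{b}_{t,k} \mid \mathbf{b}_{t-1,k} \sim \mathcal{N}(\mathbf{b}_{t-1,k}, \sigma^2 \mathbf{I}_V) $$
A topic's word distribution drifts over time: e.g. the topic "neuroscience" is characterized by "nerve" in the 1900s and by "calcium receptor" in the 2000s.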
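A minimal sketch of both extensions' generative steps in pure Python. For the correlated topic model, $\mathbf{z}_i \sim \mathcal{N}(\boldsymbol{\mu}, \boldsymbol{\Sigma})$ is drawn through a Cholesky factor of $\boldsymbol{\Sigma}$ and then normalized with the softmax. For the dynamic topic model, one Gaussian random-walk step is applied to $\mathbf{b}_{t-1,k}$ as on the slide; the result is an unconstrained real vector, so turning it into word probabilities (e.g. through $\mathcal{S}(\cdot)$) is an assumption left out of this sketch. All numeric values are hypothetical.

```python
import math
import random


def cholesky(S):
    """Lower-triangular L with L L^T = S (S symmetric positive definite)."""
    n = len(S)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(S[i][i] - s)
            else:
                L[i][j] = (S[i][j] - s) / L[j][j]
    return L


def softmax(a):
    m = max(a)
    e = [math.exp(x - m) for x in a]
    s = sum(e)
    return [x / s for x in e]


def sample_ctm_proportions(mu, Sigma, rng):
    """Correlated topic model: z_i ~ N(mu, Sigma), pi_i = S(z_i)."""
    L = cholesky(Sigma)
    eps = [rng.gauss(0.0, 1.0) for _ in mu]  # standard normal noise
    z = [m + sum(L[r][c] * eps[c] for c in range(len(mu))) for r, m in enumerate(mu)]
    return softmax(z)


def dtm_step(b_prev, sigma, rng):
    """Dynamic topic model: b_{t,k} | b_{t-1,k} ~ N(b_{t-1,k}, sigma^2 I_V)."""
    return [b + rng.gauss(0.0, sigma) for b in b_prev]


rng = random.Random(0)
Sigma = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]  # topics 0 and 1 correlated
pi = sample_ctm_proportions([0.0, 0.0, 0.0], Sigma, rng)
b_next = dtm_step([0.0] * 5, 0.1, rng)
```

Plain LDA's Dirichlet cannot express such correlations, which is the motivation for the logistic-normal prior in the correlated topic model.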
Extensions of LDA
• LDA-HMM
– An HMM alone generates syntactically correct sentences, but not semantically plausible ones.
– HMM: generates function (syntactic) words
– LDA: generates content (semantic) words
– $\mathbf{A}^{\mathrm{HMM}}$: the state-to-state HMM transition matrix
– $\mathbf{B}^{\mathrm{HMM}}$: the state-to-word HMM emission matrix
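A simplified generative sketch of the division of labor in LDA-HMM. It assumes (for illustration) that HMM state 0 is a special "semantic" state that hands word generation to LDA (topic $k \sim \mathrm{Cat}(\boldsymbol{\pi}_i)$, then word $\sim \mathrm{Cat}(\mathbf{b}_k)$), while every other state $s$ emits a function word from row $s$ of $\mathbf{B}^{\mathrm{HMM}}$. Starting the chain in state 0 is an arbitrary choice here, and the function name is hypothetical.

```python
import random


def sample_lda_hmm_doc(L, pi, B_lda, A_hmm, B_hmm, rng):
    """Generate one document of length L from a simplified LDA-HMM.

    pi: topic proportions of this document; B_lda[k]: word distribution of topic k.
    A_hmm[s][s2]: state-to-state transition probabilities.
    B_hmm[s]: state-to-word emission probabilities (row 0 is unused,
    because state 0 delegates to LDA).
    """
    V = len(B_lda[0])
    states, words = [], []
    s = 0  # assumed initial state
    for _ in range(L):
        if s == 0:  # semantic state: content word from an LDA topic
            k = rng.choices(range(len(pi)), weights=pi)[0]
            w = rng.choices(range(V), weights=B_lda[k])[0]
        else:       # syntactic state: function word from the HMM emission row
            w = rng.choices(range(V), weights=B_hmm[s])[0]
        states.append(s)
        words.append(w)
        s = rng.choices(range(len(A_hmm)), weights=A_hmm[s])[0]
    return states, words


# Hypothetical vocabulary [the, of, dog, cat]; state 1 emits "the"/"of".
rng = random.Random(3)
states, words = sample_lda_hmm_doc(
    6, [1.0], [[0.0, 0.0, 0.5, 0.5]], [[0.0, 1.0], [1.0, 0.0]],
    [[0.25] * 4, [0.5, 0.5, 0.0, 0.0]], rng)
```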
Graph structure
LVMs for graph-structured data
1. Discover some "interesting structure" in the graph, such as clusters and communities.
2. Predict which links might occur in the future (e.g. who will make friends with whom).
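LVMs for graph-structured data
• Stochastic block model
$$ q_i \sim \mathrm{Cat}(\boldsymbol{\pi}), \qquad \eta_{a,b} \sim \mathrm{Beta}(\alpha, \beta) $$
$$ p(R_{ij} = r \mid q_i = a, q_j = b, \boldsymbol{\eta}) = \mathrm{Ber}(r \mid \eta_{a,b}) $$
where $\mathbf{R}$ is the adjacency matrix over the nodes and $\eta_{a,b}$ is the probability of connecting group $a$ to group $b$.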
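A forward sampler for the stochastic block model above, as a minimal sketch. It samples a directed graph and, purely for simplicity, also samples self-loops $R_{ii}$. The optional `eta` argument (a hypothetical convenience) lets the caller fix the block probabilities instead of drawing them from $\mathrm{Beta}(\alpha, \beta)$.

```python
import random


def sample_sbm(n_nodes, pi, alpha=1.0, beta=1.0, eta=None, seed=0):
    """Sample (q, eta, R) from the stochastic block model.

    q_i ~ Cat(pi); eta_ab ~ Beta(alpha, beta) unless eta is given;
    R_ij | q_i = a, q_j = b ~ Ber(eta_ab).
    """
    rng = random.Random(seed)
    K = len(pi)
    if eta is None:
        eta = [[rng.betavariate(alpha, beta) for _ in range(K)] for _ in range(K)]
    q = rng.choices(range(K), weights=pi, k=n_nodes)  # group of each node
    R = [[1 if rng.random() < eta[q[i]][q[j]] else 0 for j in range(n_nodes)]
         for i in range(n_nodes)]
    return q, eta, R


# Two perfectly separated communities: links only within a group.
q, eta, R = sample_sbm(6, [0.5, 0.5], eta=[[1.0, 0.0], [0.0, 1.0]])
```

Off-diagonal $\eta_{a,b}$ values close to 1 would instead produce bipartite-like structure; the model captures any pattern of group-to-group connectivity, not only communities.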
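LVMs for graph-structured data
• Mixed membership stochastic block model
– Lifts the restriction that each node belongs to only one cluster: instead of $q_i \in \{1, \dots, K\}$, each node has a membership vector $\boldsymbol{\pi}_i$ on the $K$-dimensional probability simplex.
$$ \boldsymbol{\pi}_i \sim \mathrm{Dir}(\boldsymbol{\alpha}), \qquad q_{i \to j} \sim \mathrm{Cat}(\boldsymbol{\pi}_i), \qquad q_{i \leftarrow j} \sim \mathrm{Cat}(\boldsymbol{\pi}_j) $$
$$ p(R_{ij} = r \mid q_{i \to j} = a, q_{i \leftarrow j} = b, \boldsymbol{\eta}) = \mathrm{Ber}(r \mid \eta_{a,b}) $$
[Figure: a who-likes-whom graph with groups labeled by hand, alongside the inferred membership vectors $\boldsymbol{\pi}_i$.]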
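A minimal forward sampler for the mixed membership model, assuming a symmetric Dirichlet parameter `alpha` and a fixed $K \times K$ block matrix `eta` (both hypothetical inputs). The key difference from the plain block model is that a fresh pair of roles $q_{i \to j}$, $q_{i \leftarrow j}$ is drawn for every ordered pair of nodes, so a node can act as a member of different groups in different links.

```python
import random


def sample_dirichlet(alpha, rng):
    g = [rng.gammavariate(a, 1.0) for a in alpha]
    s = sum(g)
    return [x / s for x in g]


def sample_mmsb(n_nodes, alpha, eta, seed=0):
    """Sample membership vectors pi_i and a directed adjacency matrix R."""
    rng = random.Random(seed)
    K = len(eta)
    pis = [sample_dirichlet([alpha] * K, rng) for _ in range(n_nodes)]  # pi_i ~ Dir(alpha)
    R = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            a = rng.choices(range(K), weights=pis[i])[0]  # q_{i->j} ~ Cat(pi_i)
            b = rng.choices(range(K), weights=pis[j])[0]  # q_{i<-j} ~ Cat(pi_j)
            R[i][j] = 1 if rng.random() < eta[a][b] else 0  # R_ij ~ Ber(eta_ab)
    return pis, R


pis, R = sample_mmsb(5, 0.5, [[0.9, 0.05], [0.05, 0.9]])
```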
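Relational data
LVMs for relational data
A relation over entity types $T_1, \dots, T_K$ is $R \subseteq T_1 \times T_2 \times \dots \times T_K$.
Example: proteins ($T_1$) and chemicals ($T_2$), with a relation $R: T_1 \times T_1 \times T_2 \to \{0, 1\}$ (a 3d binary matrix), where $R(i, j, k) = 1$ means protein $i$ interacts with protein $j$ when chemical $k$ is present.
Extend the stochastic block model: each entity $i$ of each type $t$ gets a cluster $q_i^t \in \{1, \dots, K_t\}$, and
$$ p(R(i, j, k) = 1 \mid q_i^1 = a, q_j^1 = b, q_k^2 = c, \boldsymbol{\eta}) = \eta_{a,b,c} $$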
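A sketch of the log-likelihood of a 3d binary relation under the extended block model, given fixed cluster assignments. The tiny two-protein, one-chemical example is hypothetical; it shows that assignments matching the data's structure score higher than a single undifferentiated cluster.

```python
import math


def relational_loglik(R, q1, q2, eta):
    """Log-likelihood of a binary relation R: T1 x T1 x T2 -> {0, 1}.

    R[i][j][k] in {0, 1}; q1[i] is the cluster of entity i of type T1,
    q2[k] the cluster of entity k of type T2;
    eta[a][b][c] = p(R(i, j, k) = 1 | q1_i = a, q1_j = b, q2_k = c).
    """
    ll = 0.0
    for i, R_i in enumerate(R):
        for j, R_ij in enumerate(R_i):
            for k, r in enumerate(R_ij):
                p = eta[q1[i]][q1[j]][q2[k]]
                ll += math.log(p if r == 1 else 1.0 - p)  # Bernoulli log-probability
    return ll


R = [[[1], [0]], [[0], [1]]]  # protein i interacts with itself only
one_cluster = relational_loglik(R, [0, 0], [0], [[[0.5]]])
two_clusters = relational_loglik(R, [0, 1], [0], [[[0.9], [0.1]], [[0.1], [0.9]]])
print(two_clusters > one_cluster)  # True
```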
Infinite relational model (IRM)
• Idea
– Use a Dirichlet process, so that the number of clusters $K_t$ for each type $t$ is infinite rather than fixed in advance.
• Inference
– Variational Bayes
– Collapsed Gibbs sampling
We just sketch some interesting applications.
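A Dirichlet process induces a Chinese restaurant process (CRP) prior over partitions, which is how the IRM avoids fixing $K_t$: each type's entities can always open a new cluster. A minimal sketch of sampling one such partition, assuming a concentration parameter `alpha` (a hypothetical name here); it shows only the prior, not the variational Bayes or collapsed Gibbs inference listed above.

```python
import random


def sample_crp(n_entities, alpha, seed=0):
    """Sample a partition from the Chinese restaurant process.

    Entity i joins existing cluster c with probability n_c / (i + alpha)
    and starts a new cluster with probability alpha / (i + alpha).
    Returns (assignments, cluster sizes).
    """
    rng = random.Random(seed)
    assignments, sizes = [], []
    for _ in range(n_entities):
        weights = sizes + [alpha]  # existing clusters, then "new cluster"
        c = rng.choices(range(len(weights)), weights=weights)[0]
        if c == len(sizes):
            sizes.append(1)  # open a new cluster
        else:
            sizes[c] += 1
        assignments.append(c)
    return assignments, sizes


assignments, sizes = sample_crp(20, alpha=1.0)
```

In the IRM, one such partition is drawn independently for each entity type, and larger `alpha` favors more clusters.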
Applications of IRM
• Learning ontologies
– Organization of knowledge: what is a "disease"? What does it do?
Semantic network:
$T_1$: 135 concepts (e.g. "disease", "diagnostic procedure", "animal")
$T_2$: 49 predicates (e.g. "affects", "prevents")
$R: T_1 \times T_1 \times T_2 \to \{0, 1\}$
Result: the system found 14 concept clusters and 21 predicate clusters (e.g. "biological functions affect organisms").
Summary
• Topic model
– Latent Dirichlet allocation (LDA)
• Graph structure
– Stochastic block model
• Relational data
– Infinite relational model