SUPERVISED PREDICTION
OF GRAPH SUMMARIES
Daniil Mirylenka
University of Trento, Italy
1
Outline
• Motivating example (from my Ph.D. research)
• Supervised learning
•  Binary classification
•  Perceptron
•  Ranking, Multiclass
• Structured output prediction
•  General approach, Structured Perceptron
•  “Easy” cases
• Prediction as search
•  Searn, DAGGER
•  Back to the motivating example
2
Motivating example
Representing academic search results
3
[figure: search results → graph summary]
Motivating example
Suppose we can do this:
4
[figure: a large graph]
Motivating example
Then we only need to do this:
5
[figure: a small graph]
Motivating example
What is a good graph summary?
Let’s learn from examples!
6
Supervised learning
7
What is supervised learning?
bunch of examples: (x_1, y_1), (x_2, y_2), …, (x_n, y_n)
→ [Supervised Learning] →
function: f : x → y
8
Statistical learning theory
Where do our examples come from?
(x_1, y_1), (x_2, y_2), …, (x_n, y_n) are samples drawn i.i.d. from the distribution of examples P(x, y)
9
Statistical learning theory
What functions do we consider? (hypothesis space)
f ∈ H
H_1: linear?   H_2: cubic?   H_3: piecewise-linear?
10
Statistical learning theory
How bad is it to predict f(x) instead of the true y? (loss function)
L(f(x), y)
Example: zero-one loss
L(y, ŷ) = 0 when y = ŷ, 1 otherwise
11
Statistical learning theory
Goal: argmin_{f ∈ H} ∫_{X×Y} L(f(x), y) p(x, y) dx dy   (expected loss on new examples)
Requirement: argmin_{f ∈ H} Σ_{i=1}^{n} L(f(x_i), y_i)   (total loss on the training data)
12
Linear models
Inference (prediction): f_w(x) = g(⟨w, ϕ(x)⟩)
   where ϕ(x) are the features of x, and ⟨·,·⟩ is the scalar product (linear combination, weighted sum)
Learning: w = argmin_w Σ_{i=1}^{n} L(f_w(x_i), y_i)
   optimization with respect to w (e.g. gradient descent)
13
Binary classification
y ∈ {−1, 1}
Prediction: f_w(x) = sign(⟨w, x⟩)   (above or below the line, i.e. the hyperplane?)
   ⟨w, x⟩ > 0,   ⟨w, x⟩ < 0,   ⟨w, x⟩ = 0
Note that a prediction is correct exactly when y_i ⟨w, x_i⟩ > 0.
14
Perceptron
Learning algorithm (optimizes one example at a time)
Repeat:
   for every x_i:
      if y_i ⟨w, x_i⟩ ≤ 0   (if made a mistake)
         w ← w + y_i x_i   (update the weights)
The update makes w more like x_i if y_i > 0, and more like −x_i if y_i < 0.
(A minimal sketch in code follows below.)
15
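A minimal sketch of this update loop in Python/NumPy; the names X, y and epochs are illustrative, not from the slides.

import numpy as np

def perceptron(X, y, epochs=10):
    # X: (n, d) array of feature vectors, y: array of labels in {-1, +1}
    w = np.zeros(X.shape[1])
    for _ in range(epochs):                  # "Repeat"
        for x_i, y_i in zip(X, y):           # for every x_i
            if y_i * np.dot(w, x_i) <= 0:    # made a mistake
                w += y_i * x_i               # update the weights
    return w

# prediction, as on the previous slide: f_w(x) = sign(<w, x>)
predict = lambda w, x: np.sign(np.dot(w, x))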
Perceptron
Update rule: w ← w + y_i x_i
[figures, slides 16-17: geometric view of the update, showing w_old, x_i, and the resulting w_new]
Max-margin classification
Idea: ensure some distance from the hyperplane
Require: y_i ⟨w, x_i⟩ ≥ 1
18
Preference learning
Suppose we want to predict rankings: x → y = (v_1, v_2, …, v_k), where (x, v_i) ≻ (x, v_j) ⇔ i < j
Score each item by the joint features of x and v: ⟨w, ϕ(x, v_1)⟩, ⟨w, ϕ(x, v_2)⟩, …, ⟨w, ϕ(x, v_k)⟩
Margin requirement: ⟨w, ϕ(x, v) − ϕ(x, v′)⟩ ≥ 1 whenever v is preferred to v′
Also works for:
•  selecting just the best one
•  multiclass classification
(a small ranking sketch follows below)
19
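A small sketch of ranking with such a linear model; phi is a hypothetical joint feature function and items is an illustrative name, the slides do not prescribe this interface.

import numpy as np

def rank(w, x, items, phi):
    # sort items by the score <w, phi(x, v)>, highest first
    return sorted(items, key=lambda v: -np.dot(w, phi(x, v)))

def best(w, x, items, phi):
    # "selecting just the best one" is the top of the ranking
    return rank(w, x, items, phi)[0]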
Structured prediction
20
Structured prediction
Examples:
•  x = “Time flies like an arrow.”
•  part-of-speech tagging: y = (noun verb preposition determiner noun)
•  or a parse tree:
   y = (S (NP (NNP Time))
          (VP (VBZ flies)
              (PP (IN like)
                  (NP (DT an)
                      (NN arrow)))))
21
Structured prediction
How can we approach this problem?
•  before we had: f(x) = g(⟨w, ϕ(x)⟩)
•  now f(x) must be a complex object
•  f(x) = argmax_y ⟨w, ψ(x, y)⟩
   where ψ(x, y) are joint features of x and y (kind of like we had with ranking)
22
Structured Perceptron
Almost the same as the ordinary perceptron (see the sketch below)
•  For every x_i:
•  predict: ŷ_i = argmax_y ⟨w, ψ(x_i, y)⟩
•  if ŷ_i ≠ y_i   (if made a mistake)
   w ← w + ψ(x_i, y_i) − ψ(x_i, ŷ_i)   (update the weights)
23
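A minimal sketch of the structured perceptron, assuming the caller supplies psi(x, y) (a joint feature vector) and argmax_y(w, x) (a solver for the prediction problem); both are hypothetical callbacks, not part of the slides.

import numpy as np

def structured_perceptron(examples, psi, argmax_y, dim, epochs=5):
    # examples: list of (x, y) pairs with structured outputs y
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y in examples:
            y_hat = argmax_y(w, x)               # predict
            if y_hat != y:                       # made a mistake
                w += psi(x, y) - psi(x, y_hat)   # update the weights
    return w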
Argmax problem
Prediction: ŷ = argmax_y ⟨w, ψ(x, y)⟩   (often infeasible)
Examples:
•  a sequence of length T, with d options for each label: d^T outputs
•  a subgraph of size T from a graph G: |G| choose T outputs
•  a 10-word sentence, 5 parts of speech: ~10 million outputs
•  a 10-node subgraph of a 300-node graph: 1,398,320,233,231,701,770 outputs (around 10^18)
24
Argmax problem
Prediction: ŷ = argmax_y ⟨w, ψ(x, y)⟩   (often infeasible)
Learning:
•  even more difficult
•  includes prediction as a subroutine
25
Argmax problem: easy cases
Independent prediction
•  suppose y decomposes into (v_1, v_2, …, v_T)
•  and ψ(x, y) decomposes into ψ(x, y) = Σ_{i=1}^{T} ψ_i(x, v_i)
•  then predictions can be made independently:
   argmax_y ⟨w, ψ(x, y)⟩ = ( argmax_{v_1} ⟨w, ψ_1(x, v_1)⟩, …, argmax_{v_T} ⟨w, ψ_T(x, v_T)⟩ )
26
Argmax problem: easy cases
Sequence labeling
•  suppose y decomposes into (v_1, v_2, …, v_T)
•  and ψ(x, y) decomposes into ψ(x, y) = Σ_{i=1}^{T−1} ψ_i(x, v_i, v_{i+1})
•  dynamic programming: O(Td^2)   (a Viterbi-style sketch follows below)
•  with ternary features: O(Td^3), etc.
•  in general, tractable in graphs with bounded treewidth
27
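A sketch of the O(Td^2) dynamic program for this pairwise decomposition; score(i, a, b) stands for <w, psi_i(x, v_i=a, v_{i+1}=b)> and is a hypothetical callback, T is the sequence length and d the number of labels.

import numpy as np

def viterbi(T, d, score):
    best = np.zeros((T, d))             # best[t, b]: best score of a prefix ending in label b
    back = np.zeros((T, d), dtype=int)  # back-pointers to recover the argmax
    for t in range(1, T):
        for b in range(d):
            cand = [best[t - 1, a] + score(t - 1, a, b) for a in range(d)]
            back[t, b] = int(np.argmax(cand))
            best[t, b] = max(cand)
    labels = [int(np.argmax(best[T - 1]))]
    for t in range(T - 1, 0, -1):       # trace the back-pointers
        labels.append(back[t, labels[-1]])
    return labels[::-1]                 # (v_1, ..., v_T) as label indices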
Approximate argmax
General idea:
•  search in the space of outputs
Natural generalization:
•  the space of partial outputs
•  composing the solution sequentially
How do we decide which moves to take? Let’s learn to make good moves!
(The most interesting/crazy idea of this talk; and we no longer need the original argmax problem.)
28
Learning to search
29
Learning to search
Sequential prediction of structured outputs
•  decompose the output: y = (v_1, v_2, …, v_T)
•  learn the policy π : (v_1, v_2, …, v_t) → v_{t+1}   (state → action)
•  apply the policy sequentially: s_0 → s_1 → … → s_T = y   (each transition taken by π)
•  the policy s_t → v_{t+1} can be trained on examples (s_t, v_{t+1})
•  e.g. via preference learning
30
Learning to search
The caveat of sequential prediction
Example: steering a car
•  states s_i: coordinates of the car
•  actions v_{i+1}: steering (‘left’, ‘right’)
[figure: a sequence of ‘left’/‘right’ actions drifting off the road. Oops!]
Problem:
•  errors accumulate
•  training data is not i.i.d.!
Solution:
•  train on the states produced by our policy!
•  a chicken-and-egg problem (solution: iterate)
31
Searn and DAGGER
Searn = “search” + “learn” [1]
•  start from the optimal policy; gradually move away from it
•  generate new states with the current policy π_i
•  generate actions based on regret
•  train on the new state-action pairs, obtaining π′_{i+1}   (the policy learnt at the i-th iteration)
•  interpolate the current policy: π_{i+1} ← β π_i + (1 − β) π′_{i+1}
32
[1] Hal Daumé III, John Langford, Daniel Marcu. Search-based Structured Prediction. Machine Learning Journal, 2006.
Searn and DAGGER
DAGGER = “dataset” + “aggregation” [2]
•  start from the ‘ground truth’ dataset and enrich it with new state-action pairs:
•  train a policy on the current dataset
•  use the policy to generate new states
•  generate the ‘expert’s actions’ for the new states
•  add the new state-action pairs to the dataset
As in Searn, we’ll eventually be training on the states produced by our own policy.
(A minimal sketch of the loop follows below.)
33
[2] Stephane Ross, Geoffrey Gordon, Drew Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research, 2011.
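A minimal sketch of this loop, under the following assumptions (none of these names come from the slides): train(dataset) fits a policy on (state, action) pairs, rollout(policy, x) returns the states the policy visits on input x, and expert_action(state, x) implements the expert.

def dagger(inputs, ground_truth_pairs, train, rollout, expert_action, iterations=5):
    dataset = list(ground_truth_pairs)          # start from the 'ground truth' dataset
    policy = None
    for _ in range(iterations):
        policy = train(dataset)                 # train a policy on the current dataset
        for x in inputs:
            for state in rollout(policy, x):    # use the policy to generate new states
                dataset.append((state, expert_action(state, x)))  # aggregate expert's actions
    return policy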
DAGGER for building the graph summaries
Input: topic graph G(V, E), search results S, relation R ⊆ V × S
Output: topic summary G_T(V_T, E_T) of size T
A few tricks:
•  predict only the vertices V_T
•  require that the summaries be nested: ∅ = V_0 ⊂ V_1 ⊂ … ⊂ V_T
•  which means V_{i+1} = V_i ∪ {v_{i+1}}
•  hence, the task is to predict the sequence (v_1, v_2, …, v_T)
34
DAGGER for building the graph summaries
•  Provide the ‘ground truth’ topic sequences; a single ground-truth example is
   ((V, S, R), (v_1, v_2, …, v_T)),
   where V are the topics (vertices), S the documents (search results), R the topic-document relations, and (v_1, …, v_T) the topic sequence
•  Create the dataset D_0 = ∪ {(s_i, v_{i+1})}, i = 0 … T−1
•  train the policy π_i on D_i
•  apply π_i to the initial states s_0 (the empty summary) to generate state sequences (s_1, s_2, …, s_T) (intermediate summaries)
•  produce the ‘expert action’ v* for every generated state
•  produce D_{i+1} = D_i ∪ {(s, v*)}
35
DAGGER: producing the ‘expert action’
•  The expert’s action brings us closer to the ‘ground-truth’ trajectory
•  Suppose the ‘ground-truth’ trajectory is (s_1, s_2, …, s_T)
•  and the generated trajectory is (ŝ_1, ŝ_2, …, ŝ_T)
•  The expert’s action is v*_{i+1} = argmin_v Δ(ŝ_i ∪ {v}, s_{i+1}),
   where Δ measures the dissimilarity between the states (see the sketch below)
36
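A sketch of this argmin, with illustrative names: candidates are the topics not yet in the summary, and delta is the state dissimilarity, instantiated here with Jaccard distance over topic sets for the usage example.

def expert_action(generated_state, next_true_state, candidates, delta):
    # pick the topic whose addition brings us closest to the ground-truth state
    return min(candidates, key=lambda v: delta(generated_state | {v}, next_true_state))

def jaccard_distance(a, b):
    return 1.0 - len(a & b) / len(a | b) if (a or b) else 0.0

expert_action({1, 2}, {1, 2, 3}, candidates={3, 4, 5}, delta=jaccard_distance)  # returns 3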
DAGGER: topic sequence dissimilarity
Δ((v_1, v_2, …, v_t), (v′_1, v′_2, …, v′_t))
•  Set-based dissimilarity, e.g. Jaccard distance
   •  how to account for similarity between topics?
   •  encourages redundancy
•  Sequence-matching-based dissimilarity
   •  greedy approximation
37
DAGGER: topic graph features
ψ((V, S, R), (v_1, v_2, …, v_t))
•  Coverage and diversity
   •  [transitive] document coverage
   •  [transitive] topic frequency, average and min
   •  topic overlap, average and max
   •  parent-child overlap, average and max
   •  …
38
Recap
•  We’ve learnt:
•  … how to do binary classification
•  and implement it in 4 lines of code
•  … about more complex problems
(ranking, and structured prediction)
•  general approach, structured Perceptron
•  argmax problem
•  … that learning and search are two sides of the same coin
•  … how to predict complex structures by building them sequentially
•  Searn and DAGGER
39
Questions?
40
dmirylenka @ disi.unitn.it
Extra slides
41
Support Vector Machine
Idea: large margin between positive and negative examples
Constrained formulation (solved by constrained convex optimization):
   { y_i ⟨w, x_i⟩ ≥ C for all i,  C → max }  ⇔  { y_i ⟨w, x_i⟩ ≥ 1 for all i,  ‖w‖ → min }
Loss-function view: hinge loss
   L(y, f(x)) = [1 − y · f(x)]_+
(a subgradient-descent sketch follows below)
42
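One common way to solve this is subgradient descent on the hinge loss with an L2 penalty; the sketch below assumes that formulation (X, y, lam, lr are illustrative names), whereas the slide only states the constrained problem.

import numpy as np

def svm_sgd(X, y, lam=0.01, lr=0.1, epochs=20):
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x_i, y_i in zip(X, y):
            if y_i * np.dot(w, x_i) < 1:        # inside the margin: hinge loss is active
                w -= lr * (lam * w - y_i * x_i)
            else:                               # outside the margin: only the L2 penalty
                w -= lr * lam * w
    return w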
Structured SVM
Correct outputs score higher by a margin:
   ‖w‖ → min  subject to  ⟨w, ψ(x_i, y_i)⟩ − ⟨w, ψ(x_i, y)⟩ ≥ 1  for all y ≠ y_i
Taking the (dis)similarity between the outputs into account (the margin depends on the dissimilarity):
   ⟨w, ψ(x_i, y_i) − ψ(x_i, y)⟩ ≥ Δ(y_i, y)  for all y ≠ y_i
43