Learning to Grow Structured Visual Summaries for Document Collections
Daniil Mirylenka, Andrea Passerini
University of Trento, Italy

Slide 1: Learning to Grow Structured Visual Summaries for Document Collections
Daniil Mirylenka, Andrea Passerini
University of Trento, Italy
Machine learning seminar, Waikato University, 2013

Slide 2: Problem: informative representation of documents
Application: academic search
Input: document collection; output: topic map

Slide 3: Our approach: building and summarizing the topic graph

Slide 4: Building the topic graph: Overview
1. Map documents to Wikipedia articles
2. Retrieve the parent categories
3. Link categories to each other
4. Merge similar topics
5. Break cycles in the graph

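The five steps above form a small pipeline. The sketch below covers steps 1-3 in Python; `wikify` and `parent_categories` are hypothetical callables standing in for a Wikification service and a Wikipedia category lookup, not the authors' implementation (steps 4 and 5 are sketched after slides 8 and 9).

```python
from typing import Callable, Dict, List, Set, Tuple

def build_topic_graph(
    documents: List[str],
    wikify: Callable[[str], Set[str]],             # text -> Wikipedia article titles
    parent_categories: Callable[[str], Set[str]],  # article or category -> parent categories
) -> Tuple[Set[str], Set[Tuple[str, str]], Dict[str, Set[str]]]:
    """Steps 1-3 of the overview: documents -> articles -> categories -> category links."""
    # Step 1: map each document to the Wikipedia articles it mentions.
    doc_articles: Dict[str, Set[str]] = {d: wikify(d) for d in documents}

    # Step 2: the parent categories of those articles become the topics.
    topics: Set[str] = set()
    for articles in doc_articles.values():
        for article in articles:
            topics |= parent_categories(article)

    # Step 3: link topics via Wikipedia's category hierarchy, keeping only
    # edges whose both endpoints are topics of this collection.
    edges: Set[Tuple[str, str]] = set()
    for topic in topics:
        for parent in parent_categories(topic) & topics:
            edges.add((parent, topic))  # edge: parent category -> subcategory

    return topics, edges, doc_articles
```
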
Slide 5: Building the topic graph: Mapping the documents to Wikipedia articles
"..we propose a method of summarizing collections of documents with concise topic hierarchies, and show how it can be applied to visualization and browsing of academic search results."
⇓
"..we propose a method of summarizing collections of documents with concise [[Topic (linguistics) |topic]] [[Hierarchy |hierarchies]], and show how it can be applied to [[Visualization (computer graphics) |visualization]] and [[Web browser |browsing]] of [[List of academic databases and search engines |academic search]] results."

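The annotations above use the standard wiki-link syntax [[Article title |surface text]]. Purely for illustration, the linked article titles can be pulled out of that markup with a small regular expression (the input string is the annotated sentence from this slide):

```python
import re

annotated = (
    "..we propose a method of summarizing collections of documents with concise "
    "[[Topic (linguistics) |topic]] [[Hierarchy |hierarchies]], and show how it can be "
    "applied to [[Visualization (computer graphics) |visualization]] and "
    "[[Web browser |browsing]] of [[List of academic databases and search engines "
    "|academic search]] results."
)

# A wiki link has the form [[Article title |surface text]]; keep the article titles.
articles = [m.group(1).strip() for m in re.finditer(r"\[\[([^\]|]+)\|", annotated)]
print(articles)
# ['Topic (linguistics)', 'Hierarchy', 'Visualization (computer graphics)',
#  'Web browser', 'List of academic databases and search engines']
```
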
Slide 6: Building the topic graph: Retrieving the parent categories

Slide 7: Building the topic graph: Linking the categories

Slide 8: Building the topic graph: Merging similar topics

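The slides do not state the similarity criterion used for merging. One plausible (assumed) choice is to merge topics whose covered document sets overlap strongly; the sketch below merges topics whose Jaccard similarity exceeds an arbitrary threshold. Both the criterion and the threshold are illustrative assumptions, not the authors' method.

```python
from itertools import combinations
from typing import Dict, Set

def merge_similar_topics(
    topic_docs: Dict[str, Set[str]],  # topic -> set of documents it covers
    threshold: float = 0.8,           # assumed threshold, not taken from the slides
) -> Dict[str, str]:
    """Map each topic to a representative topic after merging near-duplicates."""
    representative = {t: t for t in topic_docs}

    def find(t: str) -> str:
        # Follow merge pointers until we reach the current representative.
        while representative[t] != t:
            t = representative[t]
        return t

    for a, b in combinations(topic_docs, 2):
        ra, rb = find(a), find(b)
        if ra == rb:
            continue
        union = topic_docs[a] | topic_docs[b]
        jaccard = len(topic_docs[a] & topic_docs[b]) / len(union) if union else 0.0
        if jaccard >= threshold:
            representative[rb] = ra  # merge b's cluster into a's

    return {t: find(t) for t in topic_docs}
```
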
Slide 9: Building the topic graph: Breaking the cycles

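The cycle-breaking strategy is likewise left unspecified in the slides. A minimal sketch with networkx, which simply deletes one edge from each remaining cycle until the category graph is a DAG (an illustrative strategy, not necessarily the authors'):

```python
import networkx as nx

def break_cycles(edges):
    """Delete one edge from each remaining cycle until the category graph is a DAG."""
    graph = nx.DiGraph()
    graph.add_edges_from(edges)  # edges as (parent_category, subcategory) pairs
    while not nx.is_directed_acyclic_graph(graph):
        cycle = nx.find_cycle(graph)   # a list of (u, v) edges forming one cycle
        graph.remove_edge(*cycle[-1])  # drop the edge that closes this cycle
    return graph
```
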
Slide 10: Building the topic graph: Example of an actual topic graph built from 100 abstracts

Slide 11: Summarizing the topic graph: Reflection
What is a summary?
- a set of nodes (topics).

Slide 12: Summarizing the topic graph: Reflection
What is a summary?
- a set of nodes (topics).
What is a good summary?
- ???

Slide 13: Summarizing the topic graph: Reflection
What is a summary?
- a set of nodes (topics).
What is a good summary?
- ??? (subjective)
Let's learn from examples!

Slide 14: Summarizing the topic graph: The first attempt
Structured prediction:
  $\hat{G}_T = \arg\max_{G_T} F(G, G_T)$

Slide 15: Summarizing the topic graph: The first attempt
Structured prediction:
  $\hat{G}_T = \arg\max_{G_T} F(G, G_T)$
Problem: requires evaluating $F$ on $\binom{|G|}{T}$ subgraphs
- Example: a 300-node topic graph, a 10-node summary

Slide 16: Summarizing the topic graph: The first attempt
Structured prediction:
  $\hat{G}_T = \arg\max_{G_T} F(G, G_T)$
Problem: requires evaluating $F$ on $\binom{|G|}{T}$ subgraphs
- Example: a 300-node topic graph, a 10-node summary
  1 398 320 233 241 701 770 possible subgraphs
  (at 1 million graphs per second, that is 44 311 years)

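The figures on this slide follow from a single binomial coefficient: a 10-node summary is a 10-element subset of the 300 nodes, so there are $\binom{300}{10}$ candidates. A quick check in Python (the 365.25-day year is an assumption that reproduces the slide's 44 311):

```python
from math import comb

nodes, summary_size = 300, 10
n_candidates = comb(nodes, summary_size)   # 10-node subsets of a 300-node graph
print(n_candidates)                        # 1398320233241701770

seconds = n_candidates / 1_000_000         # at 1 million graphs per second
years = seconds / (365.25 * 24 * 3600)     # Julian year of 365.25 days
print(round(years))                        # 44311
```
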
Slide 17: Summarizing the topic graph: Key idea
Restriction: summaries should be nested
  $\emptyset = G_0 \subset G_1 \subset \cdots \subset G_T$

Slide 18: Summarizing the topic graph: Key idea
Restriction: summaries should be nested
  $\emptyset = G_0 \subset G_1 \subset \cdots \subset G_T$
Now we can build summaries sequentially:
  $G_t = G_{t-1} \cup \{v_t\}$

Slide 19: Summarizing the topic graph: Key idea
Restriction: summaries should be nested
  $\emptyset = G_0 \subset G_1 \subset \cdots \subset G_T$
Now we can build summaries sequentially:
  $G_t = G_{t-1} \cup \{v_t\}$
Still a supervised learning problem
- training data: summary sequences $(G, G_1, G_2, \ldots, G_T)$
- or topic sequences $(G, v_1, v_2, \ldots, v_T)$

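Under the nestedness restriction, inference reduces to adding one topic at a time. A minimal sketch of that growth loop, assuming some scoring function over candidate topics (in the talk, the learned policy π plays this role):

```python
from typing import Callable, List, Set

def grow_summary(
    graph_nodes: Set[str],
    score: Callable[[Set[str], str], float],  # scores adding topic v to the summary G_{t-1}
    size: int,
) -> List[Set[str]]:
    """Grow a nested chain of summaries G_0 ⊂ G_1 ⊂ ... ⊂ G_T, one topic per step."""
    summary: Set[str] = set()
    chain = [set(summary)]
    for _ in range(size):
        candidates = graph_nodes - summary
        if not candidates:
            break
        best = max(candidates, key=lambda v: score(summary, v))  # choose v_t
        summary = summary | {best}                               # G_t = G_{t-1} ∪ {v_t}
        chain.append(set(summary))
    return chain
```
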
Slide 20: Learning to grow summaries as imitation learning
Imitation learning (racing analogy)
- destination: the finish line
- a sequence of states
- the driver's actions (steering, etc.)
- goal: copy the behaviour
[figure: supervised training from expert trajectories yields a learned policy; borrowed from the presentation of Stephane Ross]

Slide 21: Learning to grow summaries as imitation learning
Imitation learning (racing analogy)
- destination: the finish line
- a sequence of states
- the driver's actions (steering, etc.)
- goal: copy the behaviour
[figure borrowed from the presentation of Stephane Ross]
Our problem
- destination: the summary $G_T$
- states: intermediate summaries $G_0, G_1, \ldots, G_{T-1}$
- actions: topics $v_1, v_2, \ldots, v_T$ added to the summaries
- goal: copy the behaviour

Slide 22: Learning to grow summaries: How can we do that?
Straightforward approach
- Choose a classifier $\pi : (G, G_{t-1}) \mapsto v_t$
- Train it on the 'ground truth' examples $((G, G_{t-1}), v_t)$
- Apply it sequentially to new graphs:
  $\emptyset = \hat{G}_0 \xrightarrow{\pi(G,\cdot)} \hat{G}_1 \xrightarrow{\pi(G,\cdot)} \cdots \xrightarrow{\pi(G,\cdot)} \hat{G}_T$

Slide 23: Learning to grow summaries: How can we do that?
Straightforward approach
- Choose a classifier $\pi : (G, G_{t-1}) \mapsto v_t$
- Train it on the 'ground truth' examples $((G, G_{t-1}), v_t)$
- Apply it sequentially to new graphs:
  $\emptyset = \hat{G}_0 \xrightarrow{\pi(G,\cdot)} \hat{G}_1 \xrightarrow{\pi(G,\cdot)} \cdots \xrightarrow{\pi(G,\cdot)} \hat{G}_T$
Will it work?

Slide 24: Learning to grow summaries: How can we do that?
Straightforward approach
- Choose a classifier $\pi : (G, G_{t-1}) \mapsto v_t$
- Train it on the 'ground truth' examples $((G, G_{t-1}), v_t)$
- Apply it sequentially to new graphs:
  $\emptyset = \hat{G}_0 \xrightarrow{\pi(G,\cdot)} \hat{G}_1 \xrightarrow{\pi(G,\cdot)} \cdots \xrightarrow{\pi(G,\cdot)} \hat{G}_T$
Will it work?
No: trained only on expert states, the classifier is unable to recover from its own mistakes. One wrong topic leads it into states it has never seen during training.

Slide 25: Learning to grow summaries: DAgger (dataset aggregation)
S. Ross, G. J. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research - Proceedings Track, 15:627-635, 2011.
Idea: train on the states we are going to encounter (the states our own policy generates)

Slide 26: Learning to grow summaries: DAgger (dataset aggregation)
S. Ross, G. J. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research - Proceedings Track, 15:627-635, 2011.
Idea: train on the states we are going to encounter (the states our own policy generates)
How can we do that? We haven't trained the classifier yet!

Slide 27: Learning to grow summaries: DAgger (dataset aggregation)
S. Ross, G. J. Gordon, and D. Bagnell. A reduction of imitation learning and structured prediction to no-regret online learning. Journal of Machine Learning Research - Proceedings Track, 15:627-635, 2011.
Idea: train on the states we are going to encounter (the states our own policy generates)
How can we do that? We haven't trained the classifier yet!
We will do it iteratively (for i = 0, 1, ...):
- train the classifier $\pi_i$ on the dataset $D_i$
- generate trajectories using $\pi_i$
- add the new states to the dataset $D_{i+1}$

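A schematic version of the iterative loop from slides 25-27, with the summary-growing specifics abstracted into `expert`, `train`, and `rollout` callables; all names and signatures here are hypothetical, not the authors' code:

```python
from typing import Callable, List, Sequence, Tuple

State = Tuple[object, frozenset]  # (topic graph G, partial summary G_{t-1})
Action = str                      # the topic v_t to add next

def dagger(
    graphs: Sequence[object],
    expert: Callable[[State], Action],                                          # expert's action in a state
    train: Callable[[List[Tuple[State, Action]]], Callable[[State], Action]],   # dataset -> policy
    rollout: Callable[[object, Callable[[State], Action]], List[State]],        # run a policy, return visited states
    iterations: int = 10,
) -> Callable[[State], Action]:
    """Dataset aggregation in the spirit of Ross et al. (2011)."""
    # Seed with the 'ground truth' dataset: states on the expert's own trajectories.
    dataset: List[Tuple[State, Action]] = []
    for g in graphs:
        for state in rollout(g, expert):
            dataset.append((state, expert(state)))
    policy = train(dataset)

    for _ in range(iterations):
        # Visit the states the *current* policy actually reaches ...
        for g in graphs:
            for state in rollout(g, policy):
                dataset.append((state, expert(state)))  # ... and label them with the expert's action.
        policy = train(dataset)  # retrain on the aggregated dataset
    return policy
```
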
Slide 28: Learning to grow summaries: Collecting the actions
DAgger (dataset aggregation)
- iterating, we collect states
- but we also need actions

Slide 29: Learning to grow summaries: Collecting the actions
DAgger (dataset aggregation)
- iterating, we collect states
- but we also need actions
"Let the expert steer"
Q: What action is optimal?
A: One that brings us closest to the optimal trajectory.
[figure: DAgger collects new trajectories with steering from the expert; borrowed from the presentation of Stephane Ross]

Slide 30: Learning to grow summaries: Recap of the algorithm
The algorithm
- 'ground truth' dataset: (state, action) pairs
- train π on the 'ground truth' dataset
- apply π to the initial states, generating trajectories
- generate the expert's actions
- add the new state-action pairs to the dataset
- repeat
[figure borrowed from the presentation of Stephane Ross]

Slide 31: Learning to grow summaries: Training the classifier
Classifier: $\pi : (G, G_{t-1}) \mapsto v_t$
Scoring function: $F(G, G_{t-1}, v_t) = \langle w, \Psi(G, G_{t-1}, v_t) \rangle$
Prediction: $v_t = \arg\max_v F(G, G_{t-1}, v)$
Learning: SVMstruct
- ensures that optimal topics score best

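Prediction with the linear scoring function is a dot product followed by an argmax. A minimal sketch, where `features(v)` stands in for the joint feature map Ψ(G, G_{t-1}, v) and `w` is the weight vector an SVMstruct-style learner would estimate:

```python
import numpy as np

def predict_next_topic(w, candidates, features):
    """Pick v_t = argmax_v <w, Psi(G, G_{t-1}, v)> over the candidate topics."""
    scores = {v: float(np.dot(w, features(v))) for v in candidates}
    return max(scores, key=scores.get)
```
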
Slide 32: Learning to grow summaries: Providing the expert's actions
Expert's action: brings us closest to the optimal trajectory
Technically: by minimizing the loss function
  $v_t = \arg\min_v \Delta_G\bigl(G_{t-1} \cup \{v\},\, G_t^{opt}\bigr)$
Loss functions
- treating graphs as topic sets ⇒ redundancy
- key: consider similarity between the topics

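The expert's action is itself an argmin over candidate topics. A sketch, with the graph loss left as an abstract callable because the slides only say it should account for similarity between topics:

```python
def expert_action(candidates, current_summary, optimal_summary, loss):
    """v_t = argmin_v loss(G_{t-1} ∪ {v}, G_t^opt): the addition that stays
    closest to the optimal trajectory."""
    return min(candidates, key=lambda v: loss(current_summary | {v}, optimal_summary))
```
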
Slide 33: Learning to grow summaries: Graph features
Some of the features:
- document coverage
- transitive document coverage
- average and maximum overlap between topics
- average and maximum parent-child overlap
- the height of the graph
- the number of connected components
- ...

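For illustration only, here is how two of the listed features could be computed, assuming each topic comes with the set of documents it covers; transitive coverage and the structural features (height, connected components) would need the graph itself and are omitted:

```python
def summary_features(summary_topics, topic_docs, all_docs):
    """Simplified document coverage and average/maximum overlap between topics."""
    covered = set().union(*(topic_docs[t] for t in summary_topics)) if summary_topics else set()
    coverage = len(covered) / len(all_docs) if all_docs else 0.0

    topics = sorted(summary_topics)
    overlaps = []
    for i, a in enumerate(topics):
        for b in topics[i + 1:]:
            union = topic_docs[a] | topic_docs[b]
            overlaps.append(len(topic_docs[a] & topic_docs[b]) / len(union) if union else 0.0)

    return {
        "document_coverage": coverage,
        "avg_topic_overlap": sum(overlaps) / len(overlaps) if overlaps else 0.0,
        "max_topic_overlap": max(overlaps, default=0.0),
    }
```
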
Slide 34: Initial experiments: Evaluation
Microsoft Academic Search
- 10 manually annotated queries
- leave-one-out cross-validation
- greedy coverage baseline
- spectral clustering-based method, based on U. Scaiella, P. Ferragina, A. Marino, and M. Ciaramita. Topical clustering of search results. WSDM 2012.
Notes
- small number of data points
- unique task ⇒ no established datasets
- no appropriate competitor approaches
[plot: match@n versus the number n of predicted topics (n = 1..8), comparing GreedyCov, LSG, and our method after the 1st iteration, iterations 2-9, and the 10th iteration]

Slide 35: Thank You!
Questions?
Daniil Mirylenka
dmirylenka@disi.unitn.it
