Natural Language
Summarization of Text and
Videos using Topic Models
Pradipto Das
PhD Dissertation Defense
CSE Department, SUNY at Buffalo
Rohini K. Srihari Sargur N. Srihari Aidong Zhang
Professor and Committee Chair Distinguished Professor Professor and Chair
CSE Dept., SUNY Buffalo CSE Dept., SUNY Buffalo CSE Dept., SUNY Buffalo
Download this presentation from http://bit.ly/pdasthesispptx or http://bit.ly/pdasthesispptxpdf
Primary committee members
Using Tag-Topic Models and
Rhetorical Structure Trees
to Generate Bulleted List
Summaries [journal
submission]
The Road Ahead (modulo presenter)
Discovering Voter Preferences using Mixtures
of Topic Models [AND Wkshp 2009]
Simultaneous Joint and
Conditional Modeling of
documents Tagged from Two
Perspectives [CIKM 2011]
A Thousand Frames in just a Few
Words: Lingual Descriptions of Videos
through Latent Topic Models and
Sparse Object Stitching [CVPR 2013]
Translating Related Words
to Videos and Back through
Latent Topics [WSDM 2013]
Introduction
to LDA
Learning to
Summarize using
Coherence [NIPS
Wkshp 2009]
• Stay hungry
• Stay foolish
The answers are coming within the
next 60-75 minutes... so...
Steve Jobs: Stanford Commencement
Speech, 2005
there is great food,
green tea and coffee
at the back!
But if you stay hungry I will happily
grab the leftovers!
Contributions of this thesis
We can explore our data, extrapolate from our data and
use context to guide decisions about new information
Can we find topics from a corpus without human
intervention? Can we use these topics to annotate
documents and use annotations to organize, summarize
and search text? Well, yes, LDA does that for us! That is so 2003!
• Well, can LDA model documents tagged from at least two different viewpoints or perspectives? No!
• Can we do that after reading this thesis? Yes we can!
• Can we generate bulleted lists from multiple documents after reading this thesis? Yes we can!
• Can we go further and translate videos into text and vice versa after reading this thesis? Yes we can!
Bottom line:
http://www.cs.princeton.edu/~blei/kdd-tutorial.pdf
David Blei's talk at KDD 2012
David Blei's talk at ICML 2012
• Unsupervised topic exploration using LDA
– Full text of first 50 patents from uspto.gov using search
keywords of “rocket” & full text of 50 scientific papers from
American journal of Aerospace Engineering
– Vocabulary size: 10102 words; Total word count: 219568
Theme 1 Theme 2 Theme 3 Theme 4 Theme 5
insulation fuel launch rocket system
composition matter mission assembly fuel
fiber A-B space nozzle engine
system engineer system surface combustion
sensor tower vehicle portion propulsion
fire magnetic earth ring pump
water electron orbit motor oxidize
Callouts: topic from patent documents; topic from journal papers; topic from patent documents; topic from journal papers; topic from journal papers
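The themes above come from fitting LDA to the combined patent and journal corpus. As a concrete sketch, LDA's generative story can be written in a few lines of numpy; all sizes here are toy stand-ins, not the actual corpus statistics:

```python
import numpy as np

rng = np.random.default_rng(0)
K, V, n_docs, doc_len = 5, 50, 20, 40   # topics, vocab size, docs, words per doc
alpha, eta = 0.1, 0.01                  # sparse Dirichlet hyperparameters

# Each topic is a distribution over the vocabulary
beta = rng.dirichlet([eta] * V, size=K)             # shape (K, V)

corpus = []
for d in range(n_docs):
    theta = rng.dirichlet([alpha] * K)              # doc-topic proportions
    z = rng.choice(K, size=doc_len, p=theta)        # a topic for each position
    w = np.array([rng.choice(V, p=beta[zi]) for zi in z])  # a word per position
    corpus.append(w)
```

Fitting LDA inverts this story: given only the words w, it recovers beta (the five theme columns above) and per-document theta values.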
Explore and extrapolate from context
Power of LDA: Language independence
[Figure: two tables pairing topics learned in a non-English source language (the source-language characters were lost in extraction) with their English translations.
Topics over words: {Tsunami, earthquake, Chile, Pichilemu, gone, warning, news, city}; {flight, Air, France, Brazil, A, 447, disappear, ocean, France}; {China, Olympic, Beijing, Gore, function, stadium, games}.
Topics over controlled vocabulary (sense-linked entries shown as xx->xx): {Tsunami, earthquake, earthquake:xx->xx, city, local, UTC, Mayor, Tsunami:xx->xx}; {Brazil, A, disappeared, search, flight, aircraft:xx->xx, ocean, ship:xx->xx, air:xx->xx, air, space}; {China, Olympic, China:xx->xx, Olympic:xx->xx, Gore:xx->xx, Gore, gold, Beijing:xx->xx, National}.]
How does LDA look at documents?
A boring view
of Wikipedia
What about other perspectives?
Words
forming
other Wiki
articles
Article
specific
content
words
Words forming
section titles
An exciting
view of
Wikipedia
Insulation,
composition, fiber
system, sensor,
fire, water
Fuel, matter, A-B
Engineer, tower
magnetic, electron
Rocket, assembly,
Nozzle, surface,
Portion, ring,
motor
Launch, mission,
Space, system,
Vehicle, earth
orbit
We are identifying the
landscape from within the
landscape – similar to
finding the map of a maze
from within the maze!
Fuel, matter, A-B
Engineer, tower
magnetic, electron
Explore and extrapolate from context
Mostly from
premier topic
model research
groups
Year I
joined
UB
Today!
Success of LDA: a Generative Model
August
Success of LDA
• Fitting themes to an UNSEEN patent document on insulating a
rocket motor using basalt fibers, nanoclay compositions etc.
Theme 1 Theme 2 Theme 3 Theme 4 Theme 5
insulation fuel launch rocket system
composition matter mission assembly fuel
fiber A-B space nozzle engine
system engineer system surface combustion
sensor tower vehicle portion propulsion
fire magnetic earth ring pump
water electron orbit motor oxidize
“What is claimed is:
1. An insulation composition comprising: a polymer comprising at least one
of a nitrile butadiene rubber and polybenzimidazole fibers; basalt fibers
having a diameter that is at least 5 .mu.m
2. (lots more) …”
Callouts: topic from patent documents; topic from journal papers; topic from patent documents; topic from journal papers; topic from journal papers
K-Means
Hierarchical
Clustering
LDA: VB
LDA: Gibbs
Dynamic
LDA
MMLDA
Corr-LDA
Hierarchical LDA
Markov
LDA
Syntactic
LDA
Suffix
Tree LDA
TagLDA
Corr-METag2LDA
Corr-MMGLDA
Model Complexities (modulo presenter)
GMM
Model Complexities (modulo presenter)
K-Means
GMM
Hierarchical
Clustering
LDA: VB
Dynamic
LDA
MMLDA
Corr-LDA
Hierarchical LDA
Markov
LDA
Syntactic
LDA
Suffix
Tree LDA
TagLDA
Corr-METag2LDA
Corr-MMGLDA
Hair Loss
LDA: Gibbs
Why do we want to explore?
Master Yoda, how do I find wisdom
from so many things happening
around us?
Go to the center of the data and
find your wisdom you will
parkour perform traceur area flip footage jump park
urban run outdoor outdoors kid group pedestrian
playground
lobster burger dress celery Christmas wrap roll mix
tarragon steam season scratch stick live water lemon
garlic
floor parkour wall jump handrail locker contestant
school run interview block slide indoor perform build
tab duck
make dog sandwich man outdoors guy bench black
sit park white disgustingly toe cough feed rub
contest parody
Can you find your wisdom?
Corr-MMGLDA
Corr-MMGLDA
parkour perform traceur area flip footage jump park
urban run outdoor outdoors kid group pedestrian
lobster burger dress celery Christmas wrap roll mix
tarragon steam season scratch stick live water lemon
floor parkour wall jump handrail locker contestant
school run interview block slide indoor perform build
tab duck
make dog sandwich man outdoors guy bench black
sit park white disgustingly toe cough feed rub
contest parody
tutorial: man explains how to make lobster rolls from scratch
One guy is making sandwich outdoors
montage of guys free running up
a tree and through the woods
interview with parkour contestants
Kid does parkour around the park
Footage of group of performing parkour outdoors
A family holds a strange burger assembly
and wrapping contest at Christmas
Actual ground-truth synopses overlaid
Man performs parkour in various locations
Are these what you were thinking?
1 2 3 4 5 6 7 8 9 10 11 12 13 14
• No ground truth label assignments are known
The Classical Partitioning Problem
1 2 3 4 5 6 7 8 9 10 11 12 13 14
• Then, select the one with the lowest loss; for example the one
shown – blue = +1, red = -1
• But we don’t really have a good way to measure loss here!
Distance from or closeness
to a central point
The Classical Partitioning Problem
1 2 3 4 5 6 7 8 9 10 11 12 13 14
• Then, select the one with the lowest loss; for example the one
shown – blue = +1, red = -1
• But we don’t really have a good way to measure loss here!
Distance from or closeness
to a central point
Let's sample one more point
The Ground Truth – Two “Topics”
The seven
virtues
The seven
vices
Assume, now, that we have some vocabulary V of English words
X is a set of positions and each element of X is labeled with an
element from V
If X is a multi-set of words (set of positions), then it has an inherent
structure in it, e.g.:
• We no longer see:
• We are used to: and #pow is in #doing
Additional Partitioning: Documents
The seven
virtues
The seven
vices
Success behind LDA
• Allocate as few topics to a document as possible
• Allocate as few words to each topic as possible
I am Nik Wallenda
Balancing Act
This checker board
pattern has a
significance – in general
NP-Hard to figure out
the correct pattern from
limited samples even for
2 topics
• The topic ALLOCATION is controlled by the parameter of a DIRICHLET distribution governing a LATENT proportion of topics over each document
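The sparsity effect of that Dirichlet parameter can be seen directly: a small concentration pushes each document's topic proportions onto a few topics, a large one spreads them out. A toy numpy check (sizes are arbitrary, not tied to any dataset here):

```python
import numpy as np

rng = np.random.default_rng(0)
K, n = 10, 2000

# Draw document-topic proportions under a sparse and a dense prior
sparse = rng.dirichlet([0.1] * K, size=n)   # small alpha: few topics per doc
dense = rng.dirichlet([10.0] * K, size=n)   # large alpha: mass spread out

# Average mass on each document's single biggest topic
print(sparse.max(axis=1).mean())  # close to 1: one or two topics dominate
print(dense.max(axis=1).mean())   # near uniform 1/K territory
```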
Current Timeline Consequent Timeline
Event Categories: Accidents/Natural Disasters; Attacks (Criminal/Terrorist); Health &
Safety; Endangered Resources; Investigations (Criminal/Legal/Other)
Previously, long long time ago
Centers of an utterance – Entities serving to link that
utterance to other utterances in the current discourse
segment
Sparse Coherence Flows
[Barbara J. Grosz, Scott Weinstein, and Aravind K. Joshi. Centering: A framework for modeling the local coherence of discourse. In Computational Linguistics, volume 21, pages 203–225, 1995]
a. Bob opened a new dealership last week. [Cf=Bob,
dealership; Cp=Bob; Cb=undefined]
b. John took a look at the Fords in his lot. [Cf=John, Fords;
Cp=John; Cb=Bob] {Retain}
c. He ended up buying one.
i. [Cf=John; Cp=John; Cb=John] {Smooth-Shift} OR
ii. [Cf=Bob; Cp=Bob; Cb=Bob] {Continue}
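The transition labels in the example follow Centering Theory's standard classification over Cb and Cp; a minimal sketch of those rules (Continue/Retain/Smooth-Shift/Rough-Shift, with an undefined Cb of the previous utterance treated as matching):

```python
def transition(cb_prev, cb_cur, cp_cur):
    """Classify the Centering Theory transition into utterance U_{n+1}.

    cb_prev: backward-looking center Cb of U_n (None if undefined)
    cb_cur, cp_cur: Cb and Cp of U_{n+1}
    """
    same_cb = cb_prev is None or cb_cur == cb_prev
    if same_cb:
        return "Continue" if cb_cur == cp_cur else "Retain"
    return "Smooth-Shift" if cb_cur == cp_cur else "Rough-Shift"

# The dealership example above:
print(transition(None, "Bob", "John"))   # (b)    -> Retain
print(transition("Bob", "John", "John")) # (c-i)  -> Smooth-Shift
print(transition("Bob", "Bob", "Bob"))   # (c-ii) -> Continue
```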
Previously, long long time ago
Center approximation = the (word, [grammatical/semantic] role) pair (GSR), e.g. (Bob, Subject), (John, Subject), (dealership, Noun)
Algorithmically
By inspection
For n+1 = 3 and case ii
Global (document/section level) focus
Problems with Centering Theory
a. The house appeared to have been burgled. [Cf=house ]
b. The door was ajar. [ Cb=house; Cf=door, house; Cp=door]
c. The furniture was in disarray. [ Cb=house; Cf=furniture,
house; Cp=furniture] {?}
Previously, long long time ago
For n+1 = 3
• Utterances like these are the majority in most free-text documents [redundancy reduction]
• In general, co-reference resolution is very HARD
An example summary sentence from folder D0906B-A of TAC2009 A timeline:
• “A fourth day of thrashing thunderstorms began to take a heavier toll on southern
California on Sunday with at least three deaths blamed on the rain, as flooding and
mudslides forced road closures and emergency crews carried out harrowing rescue
operations.”
The next two contextual sentences in the document of the previous sentence are:
• “In Elysian Park, just north of downtown, a 42-year-old homeless man was killed
and another injured when a mudslide swept away their makeshift encampment.”
• “Another man was killed on Pacific Coast Highway in Malibu when his sport utility
vehicle skidded into a mud patch and plunged into the Pacific Ocean.”
If the query is, “Describe the effects and responses to the heavy rainfall and mudslides
in Southern California,” observe the focus of attention on mudslides as subject in
the first two sentences in the table below:
Sentence-GSR grid for a sample summary document slice
Summarization using Coherence
• Incorporating coherence this way does not necessarily lead to the final summary being coherent
• Coherence is best obtained in a post-processing step using the Traveling Salesman Problem
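A post-processing step of that kind can be sketched as a tiny TSP over pairwise sentence-similarity scores. This is a brute-force toy (a real summarizer would use a heuristic solver, and the similarity matrix here is made up):

```python
import itertools

def order_sentences(sim):
    """Order sentences to maximize total adjacent similarity by
    brute-force search over a small similarity matrix `sim`."""
    n = len(sim)
    best, best_score = None, float("-inf")
    for perm in itertools.permutations(range(n)):
        score = sum(sim[a][b] for a, b in zip(perm, perm[1:]))
        if score > best_score:
            best, best_score = perm, score
    return list(best)

# Toy scores: sentence 0 flows well into 2, and 2 into 1
sim = [[0, 0.1, 0.9],
       [0.1, 0, 0.2],
       [0.3, 0.8, 0]]
print(order_sentences(sim))  # [0, 2, 1]
```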
measure project lady
tape indoor sew
marker pleat
highwaist zigzag
scissor card mark
teach cut fold stitch
pin woman skirt
machine fabric inside
scissors make leather
kilt man beltloop
sew woman fabric
make machine show
baby traditional loom
blouse outdoors
blanket quick
rectangle hood knit
indoor stitch scissors
pin cut iron studio
montage measure kid
penguin dad stuff
thread
One lady is doing sewing project indoors
Woman demonstrating different stitches using a
serger/sewing machine
dad sewing up stuffed penguin for kids Woman makes a bordered hem skirt
A pair of hands do a sewing project using a sewing machine
Ground-truth synopses overlaid
But what we really want is this
Ground-truth synopses overlaid
clock mechanism
repair computer tube
wash machine lapse
click desk mouse time
front wd40 pliers
reattach knob make
level video water
control person clip
part wire inside
indoor whirlpool man
gear machine guy
repair sew fan test
make replace grease
vintage motor box
indoor man tutorial
fuse bypass brush
wrench repairman
lubricate workshop
bottom remove screw
unscrew screwdriver
video wire
How to repair the water level control mechanism on a
Whirlpool washing machine
a man is repairing a whirlpool washer
how to remove blockage from
a washing machine pump
Woman demonstrates replacing a door hinge
on a dishwasher
A guy shows how to make
repairs on a microwave
How to fix a broken agitator on a Whirlpool
washing machine
A guy working on a vintage box
fan
And this
And this
And this
Roadmap
Introduction
to LDA
Discovering Voter Preferences Using
Mixtures of Topic Models [AND’09 Oral]
Learning to Summarize
Using Coherence [NIPS
09 Poster]
Core NLP
including summarization,
information extraction,
unsupervised grammar
induction, dependency parsing,
rhetorical parsing, sentiment
and polarity analysis…
Non-parametric Bayes
Applied Statistics
Exit 2
Exit 1
Uncharted territory –
proceed at your own risk
Why
When
Who
Where
TagLDA: More Observed Constraints
Domain knowledge
Topic
distribution
over words
Annotation/
Tag
distribution
over words
Is there a model which
can take additional clues
and attempt to correct
the misclassifications?
Why
When
Who
Where
Domain knowledge
Incorporating Prior Knowledge
Topic
distribution
over words
but
conditioned
over tags
Number of
parameters
= (K+T)V
TagLDA
switches to
this view for
partial
normalization
of some
weights
- x5 and x10 are annotated with the orange label, and x5 co-occurs with x9 in both documents d1 and d2
- It is thus likely that x5, x9 and x10 belong to the same class, since both d1 and d2 should contain as few topics as possible
Why
When
Who
Where
Domain knowledge
Incorporating Prior Knowledge
LDA
TagLDA
Incorporating Prior Knowledge
With Additional Perspectives
Why
When
Who
Where
Domain knowledge
LDA
TagLDA
LDA
Words
indicative of
important
Wiki concepts
Actual human
generated
Wiki category
tags – words
that
summarize/
categorize the
document
Wikipedia
Ubiquitous Bi-Perspective Document Structure
Words
indicative
of
questions
Actual tags
for the
forum post
– even
frequencies
are
available!
Words
indicative
of answers
StackOverflow
Ubiquitous Bi-Perspective Document Structure
Words
indicative
of
document
title
Actual
tags given
by users
Words
indicative
of image
description
Yahoo! Flickr
Ubiquitous Bi-Perspective Document Structure
News Article
What if the documents
are plain text files?
Understanding the Two Perspectives
It is believed US investigators have asked
for, but have been so far refused access to,
evidence accumulated by German
prosecutors probing allegations that former
GM director, Mr. Lopez, stole industrial
secrets from the US group and took them
with him when he joined VW last year.
This investigation was launched by US
President Bill Clinton and is in principle a far
more simple or at least more single-minded
pursuit than that of Ms. Holland.
Dorothea Holland, until four months ago
was the only prosecuting lawyer on the
German case.
News Article
Imagine browsing over many reports on an event
Understanding the Two Perspectives
It is believed US investigators have asked for,
but have been so far refused access to, evidence
accumulated by German prosecutors
probing allegations that former GM director, Mr.
Lopez, stole industrial secrets from the US group
and took them with him when he joined VW last year.
This investigation was launched by US
President Bill Clinton and is in principle a far more simple
or at least more single-minded pursuit than that of Ms.
Holland.
Dorothea Holland, until four months ago
was the only prosecuting lawyer on the
German case.
News Article
The “document level”
perspective
What words can we remember after a first browse?
German, US,
investigations,
GM, Dorothea
Holland, Lopez,
prosecute
Understanding the Two Perspectives
Important Verbs
and Dependents
Named Entities
What helped us remember?
ORGANIZATION
It is believed US investigators have asked
for, but have been so far refused access to,
evidence accumulated by German
prosecutors probing allegations that former
GM director, Mr. Lopez, stole industrial
secrets from the US group and took them
with him when he joined VW last year.
This investigation was launched by US
President Bill Clinton and is in principle a far
more simple or at least more single-minded
pursuit than that of Ms. Holland.
Dorothea Holland, until four months ago
was the only prosecuting lawyer on the
German case.
News Article
LOCATION
MISC
PERSON
WHAT
HAPPENED?
The “word level”
perspective
The “document level”
perspective
German, US,
investigations,
GM, Dorothea
Holland, Lopez,
prosecute
Understanding the Two Perspectives
Summarization power of the perspectives
It is believed US investigators have asked
for, but have been so far refused access to,
evidence accumulated by German
prosecutors probing allegations that former
GM director, Mr. Lopez, stole industrial
secrets from the US group and took them
with him when he joined VW last year.
This investigation was launched by US
President Bill Clinton and is in principle a far
more simple or at least more single-minded
pursuit than that of Ms. Holland
Dorothea Holland, until four months ago
was the only prosecuting lawyer on the
German case.
German, US,
investigations,
GM, Dorothea
Holland, Lopez,
prosecute
Sentence Boundaries
What if we turn the document off?
Begin | Middle | End
A young man climbs an artificial rock wall indoors
Adjective modifier
(What kind of wall?)
Direct Object
Direct
Subject
Adverb modifier
(climbing where?)
Major Topic: Rock climbing
Sub-topics: artificial rock wall, indoor rock climbing gym
And as if that wasn’t enough!
Categories: Weather hazards to aircraft | Accidents involving fog | Snow or ice weather
phenomena | Fog | Psychrometrics Labeled by human editors
Beginning | Middle | End
A Wikipedia Article on “fog”
• Take the first category label – "weather hazards to aircraft"
• "aircraft" doesn't occur in the document body!
• "hazard" only appears in a section title read as "Visibility hazards"
• "Weather" appears only 6 out of 15 times in the main body
• However, the images suggest that fog is related to concepts like fog over the Golden Gate bridge, fog in streets, poor visibility and quality of air
Wiki categories: Abstract or specific?
Labeled by a Tag2LDA model from title and image captions
Categories: Weather hazards to aircraft | Accidents
involving fog | Snow or ice weather phenomena | Fog |
Psychrometrics Labeled by human editors
Categories: fog, San Francisco, visible, high,
temperature, streets, Bay, lake, California, bridge, air
• How do we model such a document
collection?
METag2LDA Corr-METag2LDAMMLDA CorrMMLDATagLDA
Combines
TagLDA
and
MMLDA
Combines
TagLDA and
Corr-
MMLDA
MM = Multinomial + Multinomial; ME = Multinomial + Exponential
Made Possible with Tag2LDA Models
E-Harmony!
Topic ALLOCATION is controlled by the parameter of a
DIRICHLET distribution governing a LATENT proportion of
topics over each document
I am Nik Wallenda
Bi-Perspective Topic Model – METag2LDA
And this
balancing act
got a whole
lot tougher
Exponential State Space
Bayes
Ball
Constructing Variational Dual
Mean Field Distributions
Mean Field Distributions
Mean Field Distributions
Hmmm… a
smudge…
wipe.. wipe..
wipe..
2 plates, 2
arrows, 4
circles… no
smudges…
even and
nice!
Mixture Model: Real valued data
y
x
Mixture Model: Real valued data
Mean Parameters
Mean Field Optimization
Empirical mean
p belongs to
exponential
family by MaxEnt
Forward Mapping
Backward
Mapping
Mean Field Optimization Sufficient
statistics
Mean Field Optimization
Very similar to finding the basic feasible solution
(BFS) in linear programming
• Start with pivot at the origin (only slack variables
as solution)
• Cycle the pivot through the extreme points i.e.
replace slacks in BFS until solution is found
Mean Field Optimization
However, mean field optimization space is
inherently non-convex over the set of tractable
distributions due to the delta functions which match
the extreme points of the convex hull of sufficient
statistics of the original discrete distributions
ELBO: Evidence Lower BOund
Mean Field Inference
Mean Field Inference
Mean Field Inference
ELBO
Topics conditioned on different section identifiers
(WL tag categories)
Topic Marginals
Topics
over
image
captions
Correspondence
of DL tag words
with content
words
Topic Labeling
Faceted Bi-Perspective Document Organization
All of the inference machinery *is needed*
to generate exploratory outputs like this!
• METag2LDA: A topic generating all DL tags in a document
does not necessarily mean that the same topic generates
all words in the document
• Corr-METag2LDA: A topic generating *all* DL tags in a
document does mean that the same topic generates all
words in the document - a considerable strongpoint
Topic concentration parameter
Document specific topic proportions
Document content words
Document Level (DL) tags
Word Level (WL) tags
Indicator variables
Topic Parameters
Tag Parameters
Corr-METag2LDA
METag2LDA
The Family of Tag2LDA Models
Experiments
• Wikipedia articles with images and captions manually collected along {food, animal, countries, sport, war, transportation, nature, weapon, universe and ethnic groups} concepts
• Annotations/tags used:
  – DL tags – image caption words and the article titles
  – WL annotations – positions of sections binned into 5 bins
• Objective: to generate category labels for test documents
• Evaluation
  – ELBO: to see performance among various TagLDA models
  – WordNet-based similarity evaluation between actual category labels and proxies for them from caption words
Held-out ELBO
Selected Wikipedia Articles
• WL annotations – section positions in the document
• DL tags – image caption words and article titles
• TagLDA perplexity is comparable to MM(METag2)LDA
• The (image caption words + article titles) and the content words are independently discriminative enough
• Corr-MM(METag2)LDA performs best since almost all image caption words and the article title for a Wikipedia document are about a specific topic
[Chart: held-out ELBO in millions (y-axis 0–0.8) at K = 20, 50, 100, 200 for MMLDA, TagLDA, corrLDA, METag2LDA, corrMETag2LDA]
[Chart: held-out ELBO in millions (y-axis 0–2) at K = 40, 60, 80, 100 for MMLDA, METag2LDA, corrLDA, corrMETag2LDA, TagLDA]
Held-out ELBO
DUC05 Newswire Dataset (Recent Experiments with TagLDA Included)
• WL annotations – Named Entities
• DL tags – abstract coherence tuples like (subject, object), e.g. "Mary (Subject) taught the class. Everybody liked Mary (Object)." [Ignoring coref resolution]
• Abstract markers like ("subj", "obj") acting as the DL perspective are not document-discriminative or even topical markers
• Rather, they indicate a semantic perspective of coherence which is intricately linked to words
• Ignoring the DL perspective completely leads to a better fit by TagLDA due to variations in word distributions only
[Chart: held-out ELBO in millions (y-axis 1.35–1.65) at K = 40, 60, 80, 100 for MMLDA, METag2LDA, corrLDA, corrMETag2LDA]
Are Categories more abstract or specific?
Inverse Hop distance in WordNet ontology
• Top 5 words from the caption vocabulary are chosen
• Max Weighted Average = 5, Max Best = 1
• METag2LDA almost always wins by narrow margins
• METag2LDA reweights the vocabulary of caption words and article titles that are about a topic and hence may miss specializations relevant to the document within the top (5) ones
• In the WordNet ontology, specializations lead to more hop distance
• Ontology-based scoring helps explain connections of caption words to ground truths, e.g. skateboard – skate, glide, snowboard
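The inverse-hop-distance scoring can be illustrated on a toy is-a graph. The ontology fragment below is a hypothetical WordNet-like stand-in, not the real database:

```python
from collections import deque

# Toy is-a edges (child -> parents), a made-up WordNet-like fragment
edges = {
    "skateboard": ["board"], "snowboard": ["board"],
    "board": ["artifact"], "skate": ["move"], "glide": ["move"],
    "artifact": [], "move": [],
}

def hops(a, b):
    """Shortest undirected hop distance between two concepts via BFS."""
    adj = {k: set(v) for k, v in edges.items()}
    for k, vs in edges.items():          # make the graph undirected
        for v in vs:
            adj.setdefault(v, set()).add(k)
    seen, q = {a}, deque([(a, 0)])
    while q:
        node, d = q.popleft()
        if node == b:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, d + 1))
    return None

# Inverse hop distance: closer (less specialized apart) concepts score higher
print(1 / hops("skateboard", "snowboard"))  # 0.5 (2 hops via "board")
```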
[Chart: WordNet ontology distances (y-axis 0–2) at K = 20, 50, 100, 200 for METag2LDA and corrMETag2LDA, average and best distances]
• Applications
– Document classification using reduced dimensions
– Find faceted topics automatically through word level tags
– Learn correspondences between perspectives
– Label topics through document level multimedia
– Create recommendations based on perspectives
– Video analysis: word prediction given video features
– Tying “multilingual comparable corpora” through topics
– Multi-document summarization using coherence
– E-Textbook aided discussion forum mining:
• Explore topics through the lens of students and teachers
• Label topics from posts through concepts in the e-textbook
Model Usefulness and Applications
Roadmap
Introduction
to LDA
Discovering Voter Preferences Using
Mixtures of Topic Models [AND’09 Oral]
Learning to Summarize
Using Coherence [NIPS
09 Poster]
Core NLP including
summarization, information
extraction, unsupervised
grammar
induction, dependency
parsing, rhetorical
parsing, sentiment and
polarity analysis…
Non-parametric Bayes
Computer Vision and Applications
– Core Technologies
Applied Statistics
Supervised
Learning, Structured
Prediction
Simultaneous Joint and
Conditional Modeling of
Documents Tagged from Two
Perspectives [CIKM 2011 Oral]
Mostly from
premier topic
model research
groups
Year I
joined
UB
Today!
Success of LDA: Image Annotation
August
Previously
Words
forming
other Wiki
articles
Article specific content words
Caption corresponding to the
embedded multimedia
[P. Das, R. K. Srihari and Y. Fu. "Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives," CIKM, Glasgow, Scotland, 2011]
Afterwards
Words
forming
other Wiki
articles
Article specific content words
Caption corresponding to the
embedded multimedia
[P. Das, R. K. Srihari and J. J. Corso. "Translating Related Words to Videos and Back through Latent Topics," WSDM, Rome, Italy, 2013]
• Expensive frame-wise manual annotation efforts by drawing bounding boxes
• Difficulties: camera shakes, camera motion, zooming
• Careful consideration of which objects/concepts to annotate
• Focus on object/concept detection – noisy for videos in-the-wild
• Does not answer which objects/concepts are important for summary generation
Man with
microphone
Climbing
person
Annotations for training object/concept models
Trained Models
Information Extraction from Videos
Learning latent translation
spaces a.k.a topics
A young man is
climbing an artificial
rock wall indoors
Human Synopsis
• Mixed membership of latent topics
• Some topics capture observations that co-occur commonly
• Other topics allow for discrimination
• Different topics can be responsible for different modalities
No annotations needed – only need clip-level summary
Translating across modalities
MMGLDA model
Translating across modalities
Using learnt translation
spaces for prediction
?
Text Translation

p(w_v | w_d^O, w_d^H) ∝ ∏_{o=1..O} ∑_{i=1..K} p(w_{d,o}^O | i) p(w_v | i) × ∏_{h=1..H} ∑_{i=1..K} p(w_{d,h}^H | i) p(w_v | i)

• Topics are marginalized out to permute the vocabulary for predictions
• The lower the correlation among topics, the better the permutation
• Sensitive to priors for real-valued data
MMGLDA model
Translating across modalities
Use learnt translation
spaces for prediction
?
Text Translation

p(w_v | w_d^O, w_d^H) ∝ ∏_{o=1..O} ∑_{i=1..K} p(w_{d,o}^O | i) p(w_v | i) × ∏_{h=1..H} ∑_{i=1..K} p(w_{d,h}^H | i) p(w_v | i)

• Topics are marginalized out to permute the vocabulary for predictions
• The lower the correlation among topics, the better the permutation
• Sensitive to priors for real-valued data
Callouts: responsibility of topic i over real-valued observations; responsibility of topic i over discrete video features; probability of learnt topic i explaining words in the text vocabulary
MMGLDA model
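Marginalizing topics out to score the text vocabulary amounts to a responsibility-weighted mixture of topic-word rows. A toy numpy sketch, where beta and resp are random stand-ins for the learnt topic-word matrix and a clip's topic responsibilities:

```python
import numpy as np

rng = np.random.default_rng(1)
K, V = 4, 6   # toy number of topics and text-vocabulary size

# Hypothetical learnt quantities from a fitted MMGLDA-style model:
beta = rng.dirichlet(np.ones(V), size=K)   # beta[i, v] = p(word v | topic i)
resp = rng.dirichlet(np.ones(K))           # topic responsibilities for a clip

# Marginalize topics out: p(w_v | clip) = sum_i resp[i] * beta[i, v]
scores = resp @ beta
ranked = np.argsort(-scores)               # permuted (ranked) vocabulary
print(ranked[:3])                          # top predicted keyword ids
```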
• We first formulated the MMGLDA model just
two rooms left of where I am standing now!
An aside
1. There is a guy climbing on a rock-climbing wall.
Multiple Human Summaries: (Max 10 words i.e. imposing a length constraint)
2. A man is bouldering at an indoor rock climbing gym.
3. Someone doing indoor rock climbing.
4. A person is practicing indoor rock climbing.
5. A man is doing artificial rock climbing.
To understand whether we speak all that we see?
1. There is a guy climbing on a rock-climbing wall.
Multiple Human Summaries: (Max 10 words for imposing a length constraint)
Hand holding
climbing
surface
How many
rocks?
The sketch in
the board
Wrist-watch
What’s there
in the back?
Color of the
floor/wall
Dress of the
climber
Not so
important!
2. A man is bouldering at an indoor rock climbing gym.
Empty slots
3. Someone doing indoor rock climbing.
4. A person is practicing indoor rock climbing.
5. A man is doing artificial rock climbing.
Summaries point toward information needs!
Center of Attentions: Central Objects and Actions
Skateboarding
Feeding
animals
Landing fishes
Wedding
ceremony
Woodworking
project
Multimedia
Topic Model
– permute
event specific
vocabularies
Bag of keywords
multi-document
summaries
Sub-events e.g. skateboarding, snowboarding, surfing
Multiple sets of
documents (sets of
frames in videos)
Natural language
multi-document
summaries
Multiple sentences (group of
segments in frames)
Once again: A Summarization Perspective
Evaluation: Held out ELBOs
• In a purely multinomial MMLDA model, failures of independent events contribute highly negative terms to the log likelihoods
• NOT a measure of keyword summary generation power
• Test ELBOs on events 1-5 in the Dev-T set
• Prediction ELBOs on events 1-5 in the Dev-T set
Skateboarding
Feeding
animals
Landing fishes
Wedding
ceremony
Woodworking
project
Multimedia
Topic Model
– permute
event specific
vocabularies
Bag of words
multi-document
summaries
Sub-events e.g. skateboarding, snowboarding, surfing
Multiple sets of
documents (sets of
frames in videos)
Natural language
multi-document
summaries
Multiple sentences (group of
segments in frames)
• A c-SVM classifier from the libSVM package is used with default settings for multiclass (15 classes) classification
• 55% test accuracy easily achievable (completely off-the-shelf)
Evaluate using ROUGE-1
HEXTAC 2009: 100-word human references vs. 100-word manually extracted summaries
Average Recall: 0.37916 (95%-confidence interval 0.37187 - 0.38661)
Average Precision: 0.39142 (95%-confidence interval 0.38342 - 0.39923)
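The ROUGE-1 numbers above are unigram-overlap recall and precision; a simplified sketch of that computation (the official toolkit adds stemming, stopword handling and multiple references):

```python
from collections import Counter

def rouge_1(reference, candidate):
    """Unigram-overlap recall and precision between a reference
    summary and a candidate summary (simplified ROUGE-1)."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())   # clipped unigram matches
    return overlap / sum(ref.values()), overlap / sum(cand.values())

recall, precision = rouge_1(
    "a man is bouldering at an indoor rock climbing gym",
    "man climbing an indoor rock wall")
print(round(recall, 2), round(precision, 2))  # 0.5 0.83
```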
Event Classification and Summarization
Skateboarding
Feeding
animals
Landing fishes
Wedding
ceremony
Woodworking
project
Multimedia
Topic Model
– permute
event specific
vocabularies
Bag of words
multi-document
summaries
Sub-events e.g. skateboarding, snowboarding, surfing
Multiple sets of
documents (sets of
frames in videos)
Natural language
multi-document
summaries
Multiple sentences (group of
segments in frames)
• A c-SVM classifier from the libSVM package is used with default settings for multiclass (15 classes) classification
• 55% test accuracy easily achievable (completely off-the-shelf)
Event Classification and Summarization
Evaluate using ROUGE-1
HEXTAC 2009: 100-word human references vs. 100-word manually extracted summaries
Average Recall: 0.37916 (95%-confidence interval 0.37187 - 0.38661)
Average Precision: 0.39142 (95%-confidence interval 0.38342 - 0.39923)
• If we can achieve 10% of this for 10-word summaries, we are doing pretty good!
• Caveat – the text multi-document summarization task is much more complex
• MMLDA can show poor ELBO – a bit misleading
• Performs quite well on predicting summary-worthy keywords
• Sum-normalizing the real-valued data to lie in [0,1]^P distorts reality for Corr-MMGLDA w.r.t. quantitative evaluation
• Summary worthiness of predicted keywords is not good but topics are good
• MMGLDA produces better topics and higher ELBO
• Summary worthiness of keywords almost the same as MMLDA for lower n
Evaluation: ROUGE-1 Performance
• Simply predicting more and more keywords
(or creating sentences out of them) does not
improve the relevancy of the generated
summaries
• Instead, selecting sentences from the training
set in an intuitive way almost doubles the
relevancy of the lingual descriptions
Improving ROUGE-1/2 performance
YouCook, iAnalyze

ROUGE scores for the "YouCook" dataset [Corso et al.]:

                        Precision   Precision   Recall    Recall
                        2-gram      1-gram      2-gram    1-gram
Das et al. WSDM 2013    0.006       15.47       0.006     19.02
Das et al. CVPR 2013    5.14        25.76       6.49      32.87
Roadmap
• Introduction to LDA
• Discovering Voter Preferences Using Mixtures of Topic Models [AND'09 Oral]
• Learning to Summarize Using Coherence [NIPS'09 Poster]
• Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives [CIKM 2011 Oral]
• Translating Related Words to Videos and Back through Latent Topics [WSDM 2013 Oral]
• A Thousand Frames in just a Few Words: Lingual Descriptions of Videos through Latent Topic Models and Sparse Object Stitching [CVPR 2013 Spotlight]
• Using Tag-Topic Models and Rhetorical Structure Trees to Generate Bulleted List Summaries [to be submitted to TOIS]

Supporting areas: Non-parametric Bayes; Computer Vision and Applications – Core Technologies; Applied Statistics; Supervised Learning, Structured Prediction; Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis; Linear, Quadratic and Conic Programming Variants.
Just one last thing…
• We want to analyze documents not only for
topic discovery but also for turning these
Just one last thing…
• into this
 A previous study on sleep deprivation that less sleep resulted in
impaired glucose metabolism.
 Women who slept less than or equal to 5 hours a night were twice as
likely to suffer from hypertension than women. [*]
 Children ages 3 to 5 years get 11-13 hours of sleep per night.
 Chronic sleep deprivation can do more it can also stress your heart.
 Sleeping less than eight hours at night, frequent nightmares and
difficulty initiating sleep were significantly associated with drinking.
 A single night of sleep deprivation can limit the consolidation of
memory the next day.
 Women’s health is much more at risk. [*]
[*] means that the sentences belong to the same document
Just one last thing…
• using these
[Diagram: document sets or "docsets" (Accidents and Natural Disasters, Attacks, Health and Safety, Endangered Resources, Investigations and Trials) supply documents and sentences. A global tag-topic model is trained using the documents, alongside per-docset local models; sentences from the docsets are then fitted to the learnt model, and each candidate summary sentence for a docset is weighted using the local and global models.]
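The weighting step in the pipeline above can be sketched as a convex combination of a sentence's log-likelihood under the global tag-topic model and under the docset-local model; the mixing weight `lam`, the unigram likelihood, and the word probabilities below are assumptions for illustration, not the thesis's actual formulation.

```python
import math

def sentence_score(sent_tokens, global_dist, local_dist, lam=0.5):
    # Hypothetical weighting: mix log-likelihoods under the global and
    # local models; unseen words get a small floor probability.
    def loglik(tokens, dist):
        return sum(math.log(dist.get(w, 1e-9)) for w in tokens)
    return lam * loglik(sent_tokens, global_dist) + (1 - lam) * loglik(sent_tokens, local_dist)

# Made-up marginal word probabilities for a "Health and Safety" docset.
global_p = {"sleep": 0.05, "deprivation": 0.02, "study": 0.01}
local_p = {"sleep": 0.08, "hypertension": 0.03, "women": 0.02}

on_topic = sentence_score(["sleep", "deprivation"], global_p, local_p)
off_topic = sentence_score(["unrelated", "words"], global_p, local_p)
# on-topic candidate sentences score higher and are preferred for the summary
```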
Just one last thing…
• and these – rhetorical structure trees with relations such as Attribution, Cause and Elaboration:

Tree 1 – Root [2, 3]; relations: Attribution, Joint:
  Span 1 (Satellite, leaf): "The National Sleep Foundation reported in 2006"
  Span 2 (Nucleus, leaf): "that only 20 percent of adolescents get the recommended nine hours of sleep ;"
  Span 3 (Nucleus, leaf): "distractions such as computers or video games in kids ' bedrooms may lessen sleep quality."

Tree 2 – Root [1, 3]; relations: Explanation, Joint:
  Span 1 (Nucleus, leaf): "Sleep-deprived teens crash just about anywhere"
  Satellite [2, 3]:
    Span 2 (Nucleus, leaf): "because they 're nocturnal"
    Span 3 (Nucleus, leaf): "and need more than eight hours of sleep per day ."

Tree 3 – Root [1, 3]; relations: Contrast, Attribution:
  Span 1 (Nucleus, leaf): "Generations have praised the wisdom of getting up early in the morning,"
  Nucleus [2, 3]:
    Span 2 (Satellite, leaf): "but a Japanese study says"
    Span 3 (Nucleus, leaf): "early-risers are actually at a higher risk of developing heart problems."

Tree 4 – Root [1, 4]:
  Span 1 (Satellite, leaf): "Fortunately for sleepy women , a Penn State College of Medicine study found,"
  Nucleus [2, 4]:
    Span 2 (Nucleus, leaf): "that they 're much better than men at enduring sleep deprivation,"
    Satellite:
      Span 3 (Nucleus, leaf): "possibly because of '' profound demands of infant and child care"
      Span 4 (Satellite, leaf): "placed on them for most of mankind 's history."
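A tree like the second example above (the sleep-deprived teens sentence) can be encoded with a small recursive structure; the `RSTNode` class is an illustrative sketch following the slide's span and relation annotations, not the thesis's implementation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

@dataclass
class RSTNode:
    role: str                        # "Nucleus", "Satellite" or "Root"
    span: Tuple[int, int]            # range of elementary discourse units covered
    relation: Optional[str] = None   # relation linking this node to its sibling
    text: Optional[str] = None       # leaf text, if a leaf
    children: Tuple["RSTNode", ...] = ()

# Root [1, 3] with relations Explanation and Joint, as annotated on the slide.
tree = RSTNode("Root", (1, 3), children=(
    RSTNode("Nucleus", (1, 1), "Explanation",
            "Sleep-deprived teens crash just about anywhere"),
    RSTNode("Satellite", (2, 3), "Explanation", children=(
        RSTNode("Nucleus", (2, 2), "Joint", "because they 're nocturnal"),
        RSTNode("Nucleus", (3, 3), "Joint",
                "and need more than eight hours of sleep per day ."),
    )),
))

def leaves(node):
    # In-order leaf texts; concatenating them reconstructs the sentence.
    return [node.text] if not node.children else [t for c in node.children for t in leaves(c)]
```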
Just one last thing…
• With scores like these
Just one last thing…
• and these
• We want to analyze documents
not only for topic discovery but
also for turning these
• into this
• using these
• and these
• with scores like these
• and these
The final song: Recap
The ending…
Interviewer: Do you agree with President Obama’s approach towards Libya?
Presidential Candidate: [Libya??] I just wanted to make sure we're talking about the same thing before I say, 'Yes, I agreed' or 'No I didn't agree.' I do not agree with the way he handled it for the following reason -- nope, that's a different one. I got all this stuff twirling around in my head…
• So that we can always have the right information at our fingertips
Summary
• Topic models can now talk to structured prediction models
• Efficient text summarization/translation of domain-specific videos is now possible
• With multi-document summarization systems which exploit meaning in text, we are getting closer to our ultimate dream:
– Construct an artificial assistant who can summarize a task using contextual exploratory analysis tools as well as deep NLP, and make decisions for us!
Future Directions
• Core Algorithms
– Non-parametric Tag2LDA family models
– Address sparsity in tags and scaling of real-valued variables
in mixed domain topic models
– Efficient inference with more structure among hidden
variables
• Applications
– Type in text and get an object detector [borrowed from VPML]
– Intention analysis of videographers in social networks and
the evolution of intentions over time
– Large scale visualization using rhetorics and topic analysis
– Large scale multi-media multi-document summarization
Thank You All for Listening
Questions?
  • 1. Natural Language Summarization of Text and Videos using Topic Models Pradipto Das PhD Dissertation Defense CSE Department, SUNY at Buffalo Rohini K. Srihari Sargur N. Srihari Aidong Zhang Professor and Committee Chair Distinguished Professor Professor and Chair CSE Dept., SUNY Buffalo CSE Dept., SUNY Buffalo CSE Dept., SUNY Buffalo Download this presentation from http://bit.ly/pdasthesispptx or http://bit.ly/pdasthesispptxpdf Primary committee members
  • 2. Using Tag-Topic Models and Rhetorical Structure Trees to Generate Bulleted List Summaries[journal submission] The Road Ahead (modulo presenter) Discovering Voter Preferences using Mixtures of Topic Models [AND Wkshp 2009] Simultaneous Joint and Conditional Modeling of documents Tagged from Two Perspectives [CIKM 2011] A Thousand Frames in just a Few Words: Lingual Descriptions of Videos through Latent Topic Models and Sparse Object Stitching [CVPR 2013] Translating Related Words to Videos and Back through Latent Topics [WSDM 2013] Introduction to LDA Learning to Summarize using Coherence [NIPS Wkshp 2009]
  • 3. • Stay hungry • Stay foolish The answers are coming within the next 60-75 minutes.. so.. Steve Jobs: Stanford Commencement Speech, 2005 there is great food, green tea and coffee at the back! But if you stay hungry I will happily grab the leftovers!
  • 4. Contributions of this thesis We can explore our data, extrapolate from our data and use context to guide decisions about new information Can we find topics from a corpus without human intervention? Can we use these topics to annotate documents and use annotations to organize, summarize and search text? Well, yes, LDA does that for us! That is so 2003!  Well, can LDA model documents tagged from at least two different viewpoints or perspectives? No!  Can we do that after reading this thesis? Yes we can!  Can we generate bulleted lists from multiple documents after reading this thesis? Yes we can!  Can we go further and translate videos into text and vice versa after reading this thesis? Yes we can! Bottomline:
  • 6. • Unsupervised topic exploration using LDA – Full text of first 50 patents from uspto.gov using search keywords of “rocket” & full text of 50 scientific papers from American journal of Aerospace Engineering – Vocabulary size: 10102 words; Total word count: 219568 Theme 1 Theme 2 Theme 3 Theme 4 Theme 5 insulation fuel launch rocket system composition matter mission assembly fuel fiber A-B space nozzle engine system engineer system surface combustion sensor tower vehicle portion propulsion fire magnetic earth ring pump water electron orbit motor oxidizeTopic from patent documents Topic from journal papers Topic from patent documents Topic from journal papers Topic from journal papers Explore and extrapolate from context
  • 7. Power of LDA: Language independence Topic Translation Topic Translation Topic Translation , , , , , , , Tsunami, earthquake, Chile, Pichilemu, gone, warning , news, city , , , , , , , , , flight, Air, France, Brazil, A, 447, disappear, ocean France , , , , , , , China, Olympic, Beijing, Gore, function, stadium, games Topic Translation Topic Translation Topic Translation , , xx->xx, , , , , :xx->xx Tsunami, earthquake, earthquake:x x->xx, city, local, UTC, Mayor, Tsunami:xx- >xx xx- >xx, xx->xx, xx->xx, Brazil, A, disappeared, search, flight, aircraft:xx- >xx, ocean, ship:xx->xx, air:xx->xx, air, space xx->xx, xx- >xx, xx- >xx, xx- >xx, China, Olympic, China:xx->xx, Olympic:xx- >xx, Gore:xx- >xx, Gore, gold, Beijing:xx- >xx, National TopicsoverwordsTopicsovercontrolledvocabulary
  • 8. How does LDA look at documents? A boring view of Wikipedia
  • 9. What about other perspectives? Words forming other Wiki articles Article specific content words Words forming section titles An exciting view of Wikipedia
  • 10. Insulation, composition, fiber system, sensor, fire, water Fuel, matter, A-B Engineer, tower magnetic, electron Rocket, assembly, Nozzle, surface, Portion, ring, motor Launch, mission, Space, system, Vehicle, earth orbit We are identifying the landscape from within the landscape – similar to finding the map of a maze from within the maze! Fuel, matter, A-B Engineer, tower magnetic, electron Explore and extrapolate from context
  • 11. Mostly from premier topic model research groups Year I joined UB Today! Success of LDA: a Generative Model August
  • 12. Success of LDA • Fitting themes to an UNSEEN patent document on insulating a rocket motor using basalt fibers, nanoclay compositions etc. Theme 1 Theme 2 Theme 3 Theme 4 Theme 5 insulation fuel launch rocket system composition matter mission assembly fuel fiber A-B space nozzle engine system engineer system surface combustion sensor tower vehicle portion propulsion fire magnetic earth ring pump water electron orbit motor oxidize “What is claimed is: 1. An insulation composition comprising: a polymer comprising at least one of a nitrile butadiene rubber and polybenzimidazole fibers; basalt fibers having a diameter that is at least 5 .mu.m 2. (lots more) …” Topic from patent documents Topic from journal papers Topic from patent documents Topic from journal papers Topic from journal papers
  • 13. K-Means Hierarchical Clustering LDA: VB LDA: Gibbs Dynamic LDA MMLDA Corr-LDA Hierarchi cal LDA Markov LDA Syntactic LDA Suffix Tree LDA TagLDA Corr- METag2LDA Corr- MMG LDA Model Complexities (modulo presenter) GMM
  • 14. Model Complexities (modulo presenter) K-Means GMM Hierarchical Clustering LDA: VB Dynamic LDA MMLDA Corr-LDA Hierarchi cal LDA Markov LDA Syntactic LDA Suffix Tree LDA TagLDA Corr- METag2LDA Corr- MMG LDA Hair Loss LDA: Gibbs
  • 15. Why do we want to explore? Master Yoda, how do I find wisdom from so many things happening around us? Go to the center of the data and find your wisdom you will
  • 16. parkour perform traceur area flip footage jump park urban run outdoor outdoors kid group pedestrian playground lobster burger dress celery Christmas wrap roll mix tarragon steam season scratch stick live water lemon garlic floor parkour wall jump handrail locker contestant school run interview block slide indoor perform build tab duck make dog sandwich man outdoors guy bench black sit park white disgustingly toe cough feed rub contest parody Can you find your wisdom? Corr- MMGLDA
  • 17. Corr-MMGLDA parkour perform traceur area flip footage jump park urban run outdoor outdoors kid group pedestrian lobster burger dress celery Christmas wrap roll mix tarragon steam season scratch stick live water lemon floor parkour wall jump handrail locker contestant school run interview block slide indoor perform build tab duck make dog sandwich man outdoors guy bench black sit park white disgustingly toe cough feed rub contest parody tutorial: man explains how to make lobster rolls from scratch One guy is making sandwich outdoors montage of guys free running up a tree and through the woods interview with parkour contestants Kid does parkour around the park Footage of group of performing parkour outdoors A family holds a strange burger assembly and wrapping contest at Christmas Actual ground-truth synopses overlaid Man performs parkour in various locations Are these what you were thinking?
  • 18. [Figure: data points 1–14] • No ground truth label assignments are known The Classical Partitioning Problem
  • 19. [Figure: data points 1–14] • Then, select the one with the lowest loss; for example the one shown – blue = +1, red = -1 • But we don’t really have a good way to measure loss here! Distance from or closeness to a central point The Classical Partitioning Problem
  • 20. [Figure: data points 1–14] • Then, select the one with the lowest loss; for example the one shown – blue = +1, red = -1 • But we don’t really have a good way to measure loss here! Distance from or closeness to a central point Let’s sample one more point
  • 21. The Ground Truth – Two “Topics” The seven virtues The seven vices Assume, now, that we have some vocabulary V of English words. X is a set of positions, and each element of X is labeled with an element from V
  • 22. If X is a multi-set of words (set of positions), then it has an inherent structure in it, for example: • We no longer see: • We are used to: Additional Partitioning: Documents The seven virtues The seven vices
  • 23. Success behind LDA  Allocate as few topics as possible to a document  Allocate as few words as possible to each topic Balancing Act: I am Nik Wallenda This checker board pattern has a significance – in general it is NP-Hard to figure out the correct pattern from limited samples even for 2 topics  The topic ALLOCATION is controlled by the parameter of a DIRICHLET distribution governing a LATENT proportion of topics over each document
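The "as few topics as possible" behavior can be seen directly by sampling from a Dirichlet: a concentration parameter below 1 yields peaky (sparse) topic proportions, while a large one spreads mass almost uniformly. A quick sketch, assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(0)
K = 10

# Concentration below 1: each draw puts most of its mass on a few topics.
sparse = rng.dirichlet([0.1] * K, size=1000)
# Large concentration: each draw is close to uniform over the K topics.
dense = rng.dirichlet([10.0] * K, size=1000)

# Peakiness measured by the average largest component of a draw.
peak_sparse = sparse.max(axis=1).mean()
peak_dense = dense.max(axis=1).mean()
print(peak_sparse, peak_dense)  # sparse draws are far peakier
```

This is exactly the lever the slide refers to: the Dirichlet parameter controls how concentrated the latent topic proportion of each document is.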
  • 24. Current Timeline Consequent Timeline Event Categories: Accidents/Natural Disasters; Attacks (Criminal/Terrorist); Health & Safety; Endangered Resources; Investigations (Criminal/Legal/Other) Previously, long long time ago
  • 25. Centers of an utterance – Entities serving to link that utterance to other utterances in the current discourse segment Sparse Coherence Flows [Barbara J. Grosz, Scott Weinstein, and Aravind K. Joshi. Centering: A framework for modeling the local coherence of discourse. In Computational Linguistics, volume 21, pages 203–225, 1995] a. Bob opened a new dealership last week. [Cf=Bob, dealership; Cp=Bob; Cb=undefined] b. John took a look at the Fords in his lot. [Cf=John, Fords; Cp=John; Cb=Bob] {Retain} c. He ended up buying one. i. [Cf=John; Cp=John; Cb=John] {Smooth-Shift} OR ii. [Cf=Bob; Cp=Bob; Cb=Bob] {Continue} Previously, long long time ago Center approximation = the (word, [Grammatical/Semantic] role) pair (GSR), e.g. (Bob, Subject), (John, Subject), (dealership, Noun) Algorithmically By inspection For n+1 = 3 and case ii
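The Continue/Retain/Shift labels on the slide follow the standard centering transition table. A minimal sketch (not thesis code) that classifies a transition from the previous backward-looking center and the current backward-looking and preferred centers:

```python
def centering_transition(cb_prev, cb_cur, cp_cur):
    """Classify a centering transition per Grosz, Weinstein and Joshi (1995).
    cb_prev: backward-looking center of the previous utterance (None if undefined),
    cb_cur / cp_cur: backward-looking and preferred centers of the current one."""
    # An undefined previous Cb is conventionally treated as "same center".
    same_cb = cb_prev is None or cb_cur == cb_prev
    if same_cb:
        return "Continue" if cb_cur == cp_cur else "Retain"
    return "Smooth-Shift" if cb_cur == cp_cur else "Rough-Shift"
```

Run on the slide's example, utterance (b) comes out as Retain, (c.i) as Smooth-Shift and (c.ii) as Continue, matching the labels shown.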
  • 26. Global (document/section level) focus Problems with Centering Theory a. The house appeared to have been burgled. [Cf=house ] b. The door was ajar. [ Cb=house; Cf=door, house; Cp=door] c. The furniture was in disarray. [ Cb=house; Cf=furniture, house; Cp=furniture] {?} Previously, long long time ago For n+1 = 3  Utterances like these are the majority in most free text documents [redundancy reduction]  In general, co-reference resolution is very HARD
  • 27. An example summary sentence from folder D0906B-A of TAC2009 A timeline: • “A fourth day of thrashing thunderstorms began to take a heavier toll on southern California on Sunday with at least three deaths blamed on the rain, as flooding and mudslides forced road closures and emergency crews carried out harrowing rescue operations.” The next two contextual sentences in the document of the previous sentence are: • “In Elysian Park, just north of downtown, a 42-year-old homeless man was killed and another injured when a mudslide swept away their makeshift encampment.” • “Another man was killed on Pacific Coast Highway in Malibu when his sport utility vehicle skidded into a mud patch and plunged into the Pacific Ocean.” If the query is, “Describe the effects and responses to the heavy rainfall and mudslides in Southern California,” observe the focus of attention on mudslides as subject in the first two sentences in the table below: Sentence-GSR grid for a sample summary document slice Summarization using Coherence  Incorporating coherence this way does not necessarily lead to the final summary being coherent  Coherence is best obtained in a post processing step using the Traveling Salesman Problem
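The Traveling Salesman post-processing step mentioned above can be approximated greedily: treat similarity between adjacent sentences as negative travel cost and repeatedly hop to the most similar unvisited sentence. A toy sketch, with a hypothetical word-overlap similarity standing in for the coherence scores used in the thesis:

```python
def order_by_coherence(sentences):
    """Greedy TSP-style ordering: start from the first sentence and
    repeatedly pick the unvisited sentence with the highest word
    overlap with the current one (overlap = negated travel cost)."""
    def overlap(a, b):
        wa, wb = set(a.lower().split()), set(b.lower().split())
        return len(wa & wb)
    remaining = list(range(1, len(sentences)))
    tour = [0]
    while remaining:
        cur = sentences[tour[-1]]
        nxt = max(remaining, key=lambda j: overlap(cur, sentences[j]))
        tour.append(nxt)
        remaining.remove(nxt)
    return [sentences[i] for i in tour]
```

A proper TSP solver would optimize the whole tour rather than commit greedily, but the greedy tour already keeps topically linked sentences adjacent.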
  • 28. measure project lady tape indoor sew marker pleat highwaist zigzag scissor card mark teach cut fold stitch pin woman skirt machine fabric inside scissors make leather kilt man beltloop sew woman fabric make machine show baby traditional loom blouse outdoors blanket quick rectangle hood knit indoor stitch scissors pin cut iron studio montage measure kid penguin dad stuff thread One lady is doing sewing project indoors Woman demonstrating different stitches using a serger/sewing machine dad sewing up stuffed penguin for kids Woman makes a bordered hem skirt A pair of hands do a sewing project using a sewing machine ground-truth synopses overlaid But what we really want is this
  • 29. ground-truth synopses overlaid clock mechanism repair computer tube wash machine lapse click desk mouse time front wd40 pliers reattach knob make level video water control person clip part wire inside indoor whirlpool man gear machine guy repair sew fan test make replace grease vintage motor box indoor man tutorial fuse bypass brush wrench repairman lubricate workshop bottom remove screw unscrew screwdriver video wire How to repair the water level control mechanism on a Whirlpool washing machine a man is repairing a whirlpool washer how to remove blockage from a washing machine pump Woman demonstrates replacing a door hinge on a dishwasher A guy shows how to make repairs on a microwave How to fix a broken agitator on a Whirlpool washing machine A guy working on a vintage box fan And this
  • 32. Roadmap Introduction to LDA Discovering Voter Preferences Using Mixtures of Topic Models [AND’09 Oral] Learning to Summarize Using Coherence [NIPS 09 Poster] Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis… Non-parametric Bayes Applied Statistics Exit 2 Exit 1 Uncharted territory – proceed at your own risk
  • 33. Why When Who Where TagLDA: More Observed Constraints Domain knowledge Topic distribution over words Annotation/ Tag distribution over words Is there a model which can take additional clues and attempt to correct the misclassifications?
  • 34. Why When Who Where Domain knowledge Incorporating Prior Knowledge Topic distribution over words but conditioned over tags Number of parameters = (K+T)V TagLDA switches to this view for partial normalization of some weights - x5 and x10 are annotated with the orange label and x5 co-occurs with x9 both in documents d1 and d2 - It is thus likely that x5, x9 and x10 belong to the same class since both d1 and d2 should contain as few topics as possible
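The (K+T)V parameter count on the slide corresponds to a log-linear word distribution in which topic and tag weights combine additively before normalization. A minimal sketch of that conditional, assuming numpy; `beta` and `tau` are hypothetical weight matrices standing in for trained TagLDA parameters:

```python
import numpy as np

def tagged_word_dist(beta, tau, k, t):
    """TagLDA-style conditional: p(w | topic k, tag t) ∝ exp(beta[k, w] + tau[t, w]).
    beta: K x V topic weights, tau: T x V tag weights,
    so the model carries (K + T) * V parameters in total."""
    logits = beta[k] + tau[t]
    logits = logits - logits.max()   # stabilize the softmax
    p = np.exp(logits)
    return p / p.sum()
```

Raising a tag's weight for a word raises that word's probability under every topic paired with that tag, which is exactly how the word-level perspective "corrects" topic-only misclassifications.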
  • 36. Incorporating Prior Knowledge With Additional Perspectives Why When Who Where Domain knowledge LDA TagLDA LDA
  • 37. Words indicative of important Wiki concepts Actual human generated Wiki category tags – words that summarize/ categorize the document Wikipedia Ubiquitous Bi-Perspective Document Structure
  • 38. Words indicative of questions Actual tags for the forum post – even frequencies are available! Words indicative of answers StackOverflow Ubiquitous Bi-Perspective Document Structure
  • 39. Words indicative of document title Actual tags given by users Words indicative of image description Yahoo! Flickr Ubiquitous Bi-Perspective Document Structure
  • 40. News Article What if the documents are plain text files? Understanding the Two Perspectives
  • 41. It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. News Article Imagine browsing over many reports on an event Understanding the Two Perspectives
  • 42. It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. News Article The “document level” perspective What words can we remember after a first browse? German, US, investigations, GM, Dorothea Holland, Lopez, prosecute Understanding the Two Perspectives
  • 43. Important Verbs and Dependents Named Entities What helped us remember? ORGANIZATION It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. News Article LOCATION MISC PERSON WHAT HAPPENED? The “word level” perspective The “document level” perspective German, US, investigations, GM, Dorothea Holland, Lopez, prosecute Understanding the Two Perspectives
  • 44. Summarization power of the perspectives It is believed US investigators have asked for, but have been so far refused access to, evidence accumulated by German prosecutors probing allegations that former GM director, Mr. Lopez, stole industrial secrets from the US group and took them with him when he joined VW last year. This investigation was launched by US President Bill Clinton and is in principle a far more simple or at least more single-minded pursuit than that of Ms. Holland. Dorothea Holland, until four months ago was the only prosecuting lawyer on the German case. German, US, investigations, GM, Dorothea Holland, Lopez, prosecute Sentence Boundaries What if we turn the document off? Begin Middle End
  • 45. A young man climbs an artificial rock wall indoors Adjective modifier (What kind of wall?) Direct Object Direct Subject Adverb modifier (climbing where?) Major Topic: Rock climbing Sub-topics: artificial rock wall, indoor rock climbing gym And as if that wasn’t enough!
  • 46. Categories: Weather hazards to aircraft | Accidents involving fog | Snow or ice weather phenomena | Fog | Psychrometrics Labeled by human editors Beginning Middle End A Wikipedia Article on “fog”
  • 47.  Take the first category label – “weather hazards to aircraft”  “aircraft” doesn’t occur in the document body!  “hazard” only appears in a section title read as “Visibility hazards”  “Weather” appears only 6 out of 15 times in the main body  However, the images suggest that fog is related to concepts like fog over the Golden Gate bridge, fog in streets, poor visibility and quality of air Wiki categories: Abstract or specific? Labeled by a Tag2LDA model from title and image captions Categories: Weather hazards to aircraft | Accidents involving fog | Snow or ice weather phenomena | Fog | Psychrometrics Labeled by human editors Categories: fog, San Francisco, visible, high, temperature, streets, Bay, lake, California, bridge, air
  • 48. • How do we model such a document collection?
  • 49. METag2LDA Corr-METag2LDA MMLDA CorrMMLDA TagLDA Combines TagLDA and MMLDA Combines TagLDA and Corr-MMLDA MM = Multinomial + Multinomial; ME = Multinomial + Exponential Made Possible with Tag2LDA Models E-Harmony!
  • 50. Topic ALLOCATION is controlled by the parameter of a DIRICHLET distribution governing a LATENT proportion of topics over each document I am Nik Wallenda Bi-Perspective Topic Model – METag2LDA And this balancing act got a whole lot tougher
  • 55. Mean Field Distributions Hmmm… a smudge… wipe.. wipe.. wipe.. 2 plates, 2 arrows, 4 circles… no smudges… even and nice!
  • 56. Mixture Model: Real valued data
  • 57. y x Mixture Model: Real valued data
  • 59. Mean Field Optimization Empirical mean p belongs to exponential family by MaxEnt
  • 60. Forward Mapping Backward Mapping Mean Field Optimization Sufficient statistics
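A hedged sketch of what the forward and backward mappings above are, in standard exponential-family notation (sufficient statistics φ, log-partition function A), for the maximum-entropy family the slide refers to:

```latex
% MaxEnt form of the distribution with sufficient statistics \phi
p_\theta(x) \;=\; \exp\{\langle \theta, \phi(x)\rangle - A(\theta)\}

% Forward mapping: natural parameters \theta to mean parameters \mu
\mu \;=\; \mathbb{E}_{p_\theta}[\phi(x)] \;=\; \nabla A(\theta)

% Backward mapping: mean parameters back to natural parameters,
% via the conjugate dual A^*(\mu) = \sup_\theta \{\langle\theta,\mu\rangle - A(\theta)\}
\theta(\mu) \;=\; \nabla A^*(\mu)
```

Mean field coordinate ascent repeatedly applies these two mappings restricted to a tractable subfamily, which is why the analogy with pivoting among extreme points on the next slide is apt.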
  • 61. Mean Field Optimization Very similar to finding the basic feasible solution (BFS) in linear programming • Start with pivot at the origin (only slack variables as solution) • Cycle the pivot through the extreme points i.e. replace slacks in BFS until solution is found
  • 62. Mean Field Optimization However, mean field optimization space is inherently non-convex over the set of tractable distributions due to the delta functions which match the extreme points of the convex hull of sufficient statistics of the original discrete distributions
  • 67. Topics conditioned on different section identifiers (WL tag categories) Topic Marginals Topics over image captions Correspondence of DL tag words with content words Topic Labeling Faceted Bi-Perspective Document Organization All of the inference machinery *is needed* to generate exploratory outputs like this!
  • 68. • METag2LDA: A topic generating all DL tags in a document does not necessarily mean that the same topic generates all words in the document • Corr-METag2LDA: A topic generating *all* DL tags in a document does mean that the same topic generates all words in the document - a considerable strong point Topic concentration parameter Document specific topic proportions Document content words Document Level (DL) tags Word Level (WL) tags Indicator variables Topic Parameters Tag Parameters CorrMETag2LDA METag2LDA The Family of Tag2LDA Models
  • 69. Experiments  Wikipedia articles with images and captions manually collected along {food, animal, countries, sport, war, transportation, nature, weapon, universe and ethnic groups} concepts  Annotations/tags used:  DL Tags – image caption words and the article titles  WL Annotations – Positions of sections binned into 5 bins  Objective: to generate category labels for test documents  Evaluation – ELBO: to see performance among various TagLDA models – WordNet based similarity evaluation between actual category labels and proxies for them from caption words
  • 70. Held-out ELBO Selected Wikipedia Articles  WL annotations – Section positions in the document  DL tags – image caption words and article titles  TagLDA perplexity is comparable to MM(METag2)LDA  The (image caption words + article titles) and the content words are independently discriminative enough  Corr-MM(METag2)LDA performs best since almost all image caption words and the article title for a Wikipedia document are about a specific topic [Chart: held-out ELBO in millions at K = 20, 50, 100, 200 for MMLDA, TagLDA, corrLDA, METag2LDA and corrMETag2LDA]
  • 71. [Charts: held-out ELBO in millions vs. number of topics (40–100) on the DUC05 newswire dataset for MMLDA, METag2LDA, corrLDA, corrMETag2LDA and TagLDA] Held-out ELBO DUC05 Newswire Dataset (Recent Experiments with TagLDA Included)  WL annotations – Named Entities  DL tags – abstract coherence tuples like (subject, object) e.g. “Mary(Subject) taught the class. Everybody liked Mary(Object).” [Ignoring coref resolution]  Abstract markers like (“subj” “obj”) acting as the DL perspective are not document discriminative or even topical markers  Rather they indicate a semantic perspective of coherence which is intricately linked to words  Ignoring the DL perspective completely leads to a better fit by TagLDA due to variations in word distributions only
  • 72. Are Categories more abstract or specific? Inverse Hop distance in WordNet ontology  Top 5 words from the caption vocabulary are chosen  Max Weighted Average = 5, Max Best = 1  METag2LDA almost always wins by narrow margins  METag2LDA reweights the vocabulary of caption words and article titles that are about a topic and hence may miss specializations relevant to the document within the top (5) ones  In the WordNet ontology, specializations lead to more hop distance  Ontology based scoring helps explain connections of caption words to ground truths e.g. Skateboard skate glide snowboard [Chart: average and best inverse hop distances for METag2LDA and corrMETag2LDA at K = 20, 50, 100, 200]
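The inverse hop distance used here can be sketched as breadth-first search over an is-a graph: identical concepts score 1, and the score decays with the number of hypernym hops between them. The toy graph below is hypothetical and stands in for WordNet:

```python
from collections import deque

def inverse_hop_similarity(graph, a, b):
    """Path similarity as 1 / (1 + shortest hop count) over an
    undirected is-a graph given as {node: [neighbors]}."""
    if a == b:
        return 1.0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, d = frontier.popleft()
        for nb in graph.get(node, ()):
            if nb == b:
                return 1.0 / (1 + d + 1)   # d+1 hops from a to b
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, d + 1))
    return 0.0   # no path: unrelated concepts
```

Specializations sit deeper in the ontology and thus farther from generic category words, which is why they drag the average score down, as the slide notes.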
  • 73. • Applications – Document classification using reduced dimensions – Find faceted topics automatically through word level tags – Learn correspondences between perspectives – Label topics through document level multimedia – Create recommendations based on perspectives – Video analysis: word prediction given video features – Tying “multilingual comparable corpora” through topics – Multi-document summarization using coherence – E-Textbook aided discussion forum mining: • Explore topics through the lens of students and teachers • Label topics from posts through concepts in the e-textbook Model Usefulness and Applications
  • 74. Roadmap Introduction to LDA Discovering Voter Preferences Using Mixtures of Topic Models [AND’09 Oral] Learning to Summarize Using Coherence [NIPS 09 Poster] Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis… Non-parametric Bayes Computer Vision and Applications – Core Technologies Applied Statistics Supervised Learning, Structured Prediction Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives [CIKM 2011 Oral]
  • 75. Success of LDA: Image Annotation [timeline chart, August of the year I joined UB through today; citing papers mostly from premier topic model research groups]
  • 76. Previously Words forming other Wiki articles Article specific content words Caption corresponding to the embedded multimedia [P. Das, R. K. Srihari and Y. Fu. “Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives,” CIKM, Glasgow, Scotland, 2011]
  • 77. Afterwards Words forming other Wiki articles Article specific content words Caption corresponding to the embedded multimedia [P. Das, R. K. Srihari and J. J. Corso. “Translating Related Words to Videos and Back through Latent Topics,” WSDM, Rome, Italy, 2013]
  • 78.  Expensive frame-wise manual annotation efforts by drawing bounding boxes  Difficulties: camera shakes, camera motion, zooming  Careful consideration to which objects/concepts to annotate?  Focus on object/concept detection – noisy for videos in-the-wild  Does not answer which objects/concepts are important for summary generation? Man with microphone Climbing person Annotations for training object/concept models Trained Models Information Extraction from Videos
  • 79. Learning latent translation spaces a.k.a. topics A young man is climbing an artificial rock wall indoors Human Synopsis  Mixed membership of latent topics  Some topics capture observations that co-occur commonly  Other topics allow for discrimination  Different topics can be responsible for different modalities No annotations needed – only need clip level summary Translating across modalities MMGLDA model
  • 80. Translating across modalities Using learnt translation spaces for prediction ? Text Translation [Prediction equation: p(w_v | w_d^O, w_d^H) is obtained by marginalizing out the K topics, combining each topic’s responsibility for the discrete and the real-valued video features of clip d with p(w_v | topic)]  Topics are marginalized out to permute vocabulary for predictions  The lower the correlation among topics, the better the permutation  Sensitive to priors for real valued data MMGLDA model
  • 81. Translating across modalities Use learnt translation spaces for prediction ? Text Translation [Prediction equation: p(w_v | w_d^O, w_d^H) is obtained by marginalizing out the K topics, combining each topic’s responsibility for the discrete and the real-valued video features of clip d with p(w_v | topic)]  Topics are marginalized out to permute vocabulary for predictions  The lower the correlation among topics, the better the permutation  Sensitive to priors for real valued data [Equation annotations: responsibility of topic i over real valued observations; responsibility of topic i over discrete video features; probability of learnt topic i explaining words in the text vocabulary] MMGLDA model
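One plausible reading of the prediction rule is sketched below (an illustration, not the exact thesis equation: here the per-topic responsibilities from the two video modalities are combined by a product before the topics are marginalized out):

```python
import numpy as np

def predict_words(word_given_topic, resp_discrete, resp_real):
    """Cross-modal word prediction sketch:
    p(w | video) ∝ Σ_i resp_O(i) * resp_H(i) * p(w | topic i),
    where resp_O / resp_H are length-K topic responsibilities from the
    discrete and real-valued observations and word_given_topic is K x V."""
    weights = resp_discrete * resp_real      # per-topic evidence
    scores = weights @ word_given_topic      # marginalize the K topics
    return scores / scores.sum()             # renormalize over vocabulary
```

The resulting vector permutes the text vocabulary: words favored by the topics the video evidence points at rise to the top, which is exactly the translation behavior described on the slide.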
  • 82. • We first formulated the MMGLDA model just two rooms left of where I am standing now! An aside
  • 83. 1. There is a guy climbing on a rock-climbing wall. Multiple Human Summaries: (Max 10 words i.e. imposing a length constraint) 2. A man is bouldering at an indoor rock climbing gym. 3. Someone doing indoor rock climbing. 4. A person is practicing indoor rock climbing. 5. A man is doing artificial rock climbing. To understand whether we speak all that we see?
  • 84. 1. There is a guy climbing on a rock-climbing wall. Multiple Human Summaries: (Max 10 words for imposing a length constraint) Hand holding climbing surface How many rocks? The sketch on the board Wrist-watch What’s there in the back? Color of the floor/wall Dress of the climber Not so important! 2. A man is bouldering at an indoor rock climbing gym. Empty slots 3. Someone doing indoor rock climbing. 4. A person is practicing indoor rock climbing. 5. A man is doing artificial rock climbing. Summaries point toward information needs! Centers of Attention: Central Objects and Actions
  • 85. Skateboarding Feeding animals Landing fishes Wedding ceremony Woodworking project Multimedia Topic Model – permute event specific vocabularies Bag of keywords multi-document summaries Sub-events e.g. skateboarding, snowboarding, surfing Multiple sets of documents (sets of frames in videos) Natural language multi-document summaries Multiple sentences (group of segments in frames) Once again: A Summarization Perspective
  • 86. Evaluation: Held out ELBOs  In a purely multinomial MMLDA model, failures of independent events contribute highly negative terms to the log likelihoods  NOT a measure of keyword summary generation power  Test ELBOs on events 1-5 in the Dev-T set  Prediction ELBOs on events 1-5 in the Dev-T set
  • 87. Skateboarding Feeding animals Landing fishes Wedding ceremony Woodworking project Multimedia Topic Model – permute event specific vocabularies Bag of words multi-document summaries Sub-events e.g. skateboarding, snowboarding, surfing Multiple sets of documents (sets of frames in videos) Natural language multi-document summaries Multiple sentences (group of segments in frames)  A c-SVM classifier from the libSVM package is used with default settings for multiclass (15 classes) classification  55% test accuracy easily achievable (completely off-the-shelf) Evaluate using ROUGE-1 HEXTAC 2009: 100-word human references vs. 100-word manually extracted summaries Average Recall: 0.37916 (95%-confidence interval 0.37187 - 0.38661) Average Precision: 0.39142 (95%-confidence interval 0.38342 - 0.39923) Event Classification and Summarization
  • 88. Skateboarding Feeding animals Landing fishes Wedding ceremony Woodworking project Multimedia Topic Model – permute event specific vocabularies Bag of words multi-document summaries Sub-events e.g. skateboarding, snowboarding, surfing Multiple sets of documents (sets of frames in videos) Natural language multi-document summaries Multiple sentences (group of segments in frames)  A c-SVM classifier from the libSVM package is used with default settings for multiclass (15 classes) classification  55% test accuracy easily achievable (completely off-the-shelf) Event Classification and Summarization Evaluate using ROUGE-1 HEXTAC 2009: 100-word human references vs. 100-word manually extracted summaries Average Recall: 0.37916 (95%-confidence interval 0.37187 - 0.38661) Average Precision: 0.39142 (95%-confidence interval 0.38342 - 0.39923)  If we can achieve 10% of this for 10 word summaries, we are doing pretty good!  Caveat – Text multi-document summarization task is much more complex
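At its core, ROUGE-1 reduces to clipped unigram overlap. A minimal sketch of the recall/precision computation (the official ROUGE toolkit additionally stems, handles multiple references and applies jackknifing, none of which is shown here):

```python
from collections import Counter

def rouge_1(reference, candidate):
    """Unigram ROUGE: recall = overlap / |reference tokens|,
    precision = overlap / |candidate tokens|, with clipped counts
    so a repeated candidate word cannot be credited twice."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum(min(c, ref[w]) for w, c in cand.items())
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    return recall, precision
```

For very short (10-word) summaries, a single matched content word moves these scores substantially, which is why the HEXTAC human-extraction numbers above serve as a useful upper-bound reference point.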
  • 89.  MMLDA can show poor ELBO – a bit misleading  Performs quite well on predicting summary worthy keywords  Sum-normalizing the real valued data to lie in [0,1]^P distorts reality for Corr-MGLDA w.r.t. quantitative evaluation  Summary worthiness of predicted keywords is not good but topics are good  MMGLDA produces better topics and higher ELBO  Summary worthiness of keywords almost same as MMLDA for lower n Evaluation: ROUGE-1 Performance
  • 90. • Simply predicting more and more keywords (or creating sentences out of them) does not improve the relevancy of the generated summaries • Instead, selecting sentences from the training set in an intuitive way almost doubles the relevancy of the lingual descriptions Improving ROUGE-1/2 performance
  • 91. YouCook, iAnalyze ROUGE scores for the “YouCook” dataset [Corso et al.]: Das et al. WSDM 2013: Recall 1-gram 19.02, Recall 2-gram 0.006, Precision 1-gram 15.47, Precision 2-gram 0.006. Das et al. CVPR 2013: Recall 1-gram 32.87, Recall 2-gram 6.49, Precision 1-gram 25.76, Precision 2-gram 5.14.
  • 92. Roadmap Introduction to LDA Discovering Voter Preferences Using Mixtures of Topic Models [AND’09 Oral] Learning to Summarize Using Coherence [NIPS 09 Poster] Non-parametric Bayes Computer Vision and Applications – Core Technologies Translating Related Words to Videos and Back through Latent Topics [WSDM 2013 Oral] Applied Statistics Supervised Learning, Structured Prediction Simultaneous Joint and Conditional Modeling of Documents Tagged from Two Perspectives [CIKM 2011 Oral] Core NLP including summarization, information extraction, unsupervised grammar induction, dependency parsing, rhetorical parsing, sentiment and polarity analysis… Using Tag-Topic Models and Rhetorical Structure Trees to Generate Bulleted List Summaries [to be submitted to TOIS] Linear, Quadratic and Conic Programming Variants A Thousand Frames in just a Few Words: Lingual Descriptions of Videos through Latent Topic Models and Sparse Object Stitching [CVPR 2013 Spotlight]
  • 93. Just one last thing… • We want to analyze documents not only for topic discovery but also for turning these
  • 94. Just one last thing… • into this  A previous study on sleep deprivation that less sleep resulted in impaired glucose metabolism.  Women who slept less than or equal to 5 hours a night were twice as likely to suffer from hypertension than women. [*]  Children ages 3 to 5 years get 11-13 hours of sleep per night.  Chronic sleep deprivation can do more it can also stress your heart.  Sleeping less than eight hours at night, frequent nightmares and difficulty initiating sleep were significantly associated with drinking.  A single night of sleep deprivation can limit the consolidation of memory the next day.  Women’s health is much more at risk. [*] [*] means that the sentences belong to the same document
  • 95. Just one last thing… • using these Accidents and Natural Disasters Attacks Health and Safety Endangered Resources Investigations and Trials Document sets or “Docsets” Global Tag-Topic Model Local Models Documents and sentences Local Models Local Models Local Models Training using documents Fitting sentences from Docsets to the learnt model Candidate summary sentence for a Docset Weighting a summary sentence from local and global models Candidate summary sentence for a Docset
  • 96. • and these Attribution Cause Elaboration Just one last thing… distractions such as computers or video games in kids ' bedrooms may lessen sleep quality. that only 20 percent of adolescents get the recommended nine hours of sleep ; The National Sleep Foundation reported in 2006 Satellite (Leaf: Span 1) Nucleus (Leaf: Span 2) Nucleus (Leaf: Span 3) Nucleus [2] Root [2, 3] Attribution Joint and need more than eight hours of sleep per day . because they 're nocturnal Sleep-deprived teens crash just about anywhere Nucleus (Leaf: Span 1) Nucleus (Leaf: Span 2) Nucleus (Leaf: Span 3) Satellite [2,3] Root [1, 3] Explanation Joint early-risers are actually at a higher risk of developing heart problems. but a Japanese study says Generations have praised the wisdom of getting up early in the morning, Nucleus (Leaf: Span 1) Satellite (Leaf: Span 2) Nucleus (Leaf: Span 3) Nucleus [2,3] Root [1, 3] Contrast Attribution Fortunately for sleepy women , a Penn State College of Medicine study found, Satellite (Leaf: Span 1) Nucleus [2,4] Root [1, 4] that they 're much better than men at enduring sleep deprivation, Nucleus (Leaf: Span 2) possibly because of '' profound demands of infant and child care Nucleus (Leaf: Span 3) placed on them for most of mankind 's history. Satellite (Leaf: Span 4) Satellite [2,3]
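The nucleus/satellite structure on the slide can be represented with a small tree type; extracting the text reachable through nucleus children only is a common compression heuristic (a sketch, not the thesis's exact selection rule):

```python
class RSTNode:
    """Minimal rhetorical-structure tree node: leaves hold text spans,
    internal nodes hold a relation over nucleus/satellite children."""
    def __init__(self, relation=None, text=None, children=(), nuclearity="Nucleus"):
        self.relation, self.text = relation, text
        self.children, self.nuclearity = list(children), nuclearity

def extract_nuclei(node):
    """Collect leaf spans reachable through Nucleus children only --
    a simple proxy for the summary-worthy core of a sentence."""
    if node.text is not None:
        return [node.text]
    out = []
    for child in node.children:
        if child.nuclearity == "Nucleus":
            out.extend(extract_nuclei(child))
    return out
```

On the Contrast example above, this drops the Satellite span "but a Japanese study says" and keeps the two Nucleus spans, which is the kind of within-sentence pruning the bulleted-list summarizer exploits.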
  • 97. • With scores like these Just one last thing…
  • 98. Just one last thing… • and these
  • 99. • We want to analyze documents not only for topic discovery but also for turning these • into this • using these • and these • with scores like these • and these The final song: Recap
  • 100. The ending… Interviewer: Do you agree with President Obama’s approach towards Libya? Presidential Candidate: [Libya??] I just wanted to make sure we're talking about the same thing before I say, 'Yes, I agreed' or 'No I didn't agree.' I do not agree with the way he handled it for the following reason -- nope, that's a different one. I got all this stuff twirling around in my head • So that we can always have the right information at our fingertips
  • 101. Summary • Topic models can now talk to structured prediction models • Efficient text summarization/translation of domain specific videos is now possible • With multi-document summarization systems which exploit meaning in text, we are getting closer to our ultimate dream: – Construct an artificial assistant who can summarize a task using contextual exploratory analysis tools as well as deep NLP, and make decisions for us!
  • 102. Future Directions • Core Algorithms – Non-parametric Tag2LDA family models – Address sparsity in tags and scaling of real-valued variables in mixed domain topic models – Efficient inference with more structure among hidden variables • Applications – Type in text and get an object detector [borrowed from VPML] – Intention analysis of videographers in social networks and the evolution of intentions over time – Large scale visualization using rhetorics and topic analysis – Large scale multi-media multi-document summarization
  • 103. Thank You All for Listening Questions?

Editor's Notes

  1. We are shaping a problem space. Each node is a problem and each peak represents a possible solution to that problem. Each problem has associated with it several smaller problems which need to be solved along the way, giving rise to the mountainous terrain. We actually do not see this landscape beforehand and shape it as we move forward. A PhD candidate has to go from one peak to another to get a view of the entire landscape, from where the candidate can put the landscape created by other luminaries in the field in perspective. So this is my long journey and I did not want to get stuck in one peak only and explore low-lying hills (similar to writing one paper and then merely extending it). To create a landscape, we need tools to make the roads and clear away obstacles. But once done, it allows other researchers and practitioners to make use of the road infrastructure to build communities and businesses if the peaks are interesting enough to attract visitors and, of course, go from one place to another with ease. So, let’s get started… the stories of my journey will need some time to be told… and…
  2. The answers are coming in the next one and a half hours
  3. A very recent talk by David Blei, who is considered to be the father of topic modeling research, also listed the importance of the problem we tackled as one of the open problems
  4. What do topic models do? For sure, they can identify signature words from a corpus of documents in a data-driven way. You can also figure out which of these topics belong to which classes of documents if you have that information-----------------------------And people really wanted this for a long time!
  5. these models are language agnostic (Multilingual capability)--------------------------------------------Imagine automatically producing larger font on some important words in an HTML document – easily done not just by the words alone but also justifying it through their coherence properties
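The idea above of automatically rendering important words in an HTML document in larger font can be sketched as a tiny tag-cloud generator. This is a minimal illustration: the function name, the linear scaling rule, and the example weights are our own choices, not part of any model in the thesis.

```python
def tag_cloud_html(word_weights, min_px=12, max_px=32):
    """Scale each word's font size linearly with its (e.g. topic-derived) weight."""
    lo, hi = min(word_weights.values()), max(word_weights.values())
    spans = []
    # Emit words in decreasing order of importance.
    for word, wt in sorted(word_weights.items(), key=lambda kv: -kv[1]):
        px = min_px if hi == lo else min_px + (max_px - min_px) * (wt - lo) / (hi - lo)
        spans.append(f'<span style="font-size:{px:.0f}px">{word}</span>')
    return " ".join(spans)

# Hypothetical per-word topic weights for one document.
print(tag_cloud_html({"summarization": 0.9, "the": 0.1, "topic": 0.5}))
```

The same scaling could be driven by coherence scores instead of raw weights, as the note suggests.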
  6. From just counts to richness
  7. Each node is labeled with a word, and each hill brings related nodes together, with the closeness related to the length of the edges between them. Imagine all such points lying on a flat piece of paper on a uniform 2D grid of equal-length edges. Our job is to re-arrange the nodes, connect them according to their closeness and create the triangulations so that we can discover the landscape shown here. And we have to do this without the model ever having any idea of the 3D landscape. This brings us to an important question…---------------------------------------
  8. + Success of LDA+ Almost 660 citations/year!+ Really widely extended and applied in different contexts------------------------------
  9. But the success of LDA has really been in its generalization performance in fitting unseen documents to the trained topic space. Much better generalization performance than PLSA or LSA. LDA can find a basis for distributions over topics, unlike SVD, which assumes 1 topic per document or computes a span over the topic vectors. Models improved and they became more and more complex…-------------------------------------
  10. +Comparison of model complexities+Y-axis = HLA X-axis = model complexity
  11. HL = Hair Loss axisAll of these models address the common problem of looking at central tendencies of data
  12. Why do we want to explore? We want to explore because we seek wisdom from everything that is happening around us. But where to start? Well, as Yoda points out, we can start at the centers of the data-------------------------------------Your: Each one of us has our own model of wisdom that gets shaped through our personal exploration of the world around us. Each one of us assumes that there is some hypothesis which gives rise to the data around us. Centers of data: the big data problem---lots of data around us, but which ones are meaningful? We need statistics from the data that meaningfully encode multiple views, i.e. modalities. Sufficient statistics (i.e. the function of a sample that encodes all information about the sample) usually represent the centers of the data
  13. + Let’s start at the central tendencies…+ We want to go beyond words to full clips to visualize topics!-------------------------We have devices which continuously capture data and we seek wisdom from such large amounts of data:Wisdom is really about looking at set of representative examples (centers)Wisdom encodes variance in information compactly and completely and this improves decision making
  14. What do the centers look like? These are actual outputs from one of our models: the ground-truth synopses overlaid on the multimedia topics obtained from training data
  15. Assume each data point has an associated binary labelBut we have no training data which is representative of the classes----------------------------------------------
  16. With labels, we can optimize a loss function (similar to interpolation and extrapolation). But we do not have labels, so we need to make assumptions about some function of the data only which summarizes all observations and how the observations vary from that summary, i.e. find the location and scale estimates as best as possible. Let's choose the algorithm to be K-means, which yields a simple hypothesis set: assign x to cluster \arg\min_{i \in \{0, 1\}} d(x, \mu_i)--------------------------------
  17. Let's sample one more point from the ground-truth blue class and see whether K-means made the correct decisions based on limited samples but without any ground-truth knowledge.--------------------------------
  18. Clearly there are two misclassifications
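The K-means assignment rule discussed in the notes above can be sketched in a few lines. The cluster centers and the test points below are illustrative stand-ins, not the actual 2D data from the slides; the sketch just shows how a fresh sample can be assigned to the wrong cluster when no labels are available.

```python
import math

def assign(x, centers):
    """K-means assignment: pick the index of the center nearest to x."""
    return min(range(len(centers)), key=lambda i: math.dist(x, centers[i]))

# Two hypothetical centers mu_0 and mu_1 learned without any labels.
centers = [(0.0, 0.0), (4.0, 0.0)]

# A fresh sample drawn from the "blue" class (true label 0); since it falls
# closer to mu_1 than to mu_0, K-means misclassifies it.
x_new = (2.5, 0.0)
print(assign(x_new, centers))  # -> 1: a misclassification
```

With ground-truth labels we could instead optimize a loss directly; without them, this nearest-center rule is the best the hypothesis set allows.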
  19. We can have additional observed constraints on X: e.g. they are structured into books as a collection of sections which can focus on an idea in a coherent fashion. These structures give rise to co-occurrence, which has been exploited before in IR for thesaurus construction. The better the structure, the better the read – look at the Egyptian man – that man's hair grew white just by scrolling through the scrolls. Rodin's thinker replies to Dr. Corso's twitter comment on our CVPR paper with "#pow is in #doing"-----------------------------------------There is an inherent partitioning of the linear position space of all words. This partitioning is the result of some sort of authorship (LDA with many authors = author topic model)
  20. The success behind LDA is really about a balancing act. Not easy to balance perfectly: x_9 and x_10 can be misclassified since LDA may want to allocate as few topics to d_2 as possible and chooses the red topic. Well, at least now we know why NikWalLenDA can walk the tightrope so easily--------------------------------
  21. Summarization problem (see TAC competitions from NIST)
  22. + Earlier research on discourse analysis was mainly used for co-reference resolution+ It has some really intriguing ideas!--------------------------For a sequence of utterances to be a discourse, it must exhibit coherence. If we denote U_n and U_{n+1} to be adjacent utterances, the backward-looking center of U_n, denoted as C_b(U_n), represents the entity currently being focused on in the discourse after U_n is interpreted. The forward-looking centers of U_n, denoted as C_f(U_n), form an ordered list containing the entities mentioned in U_n, all of which can serve as C_b(U_{n+1}). In general, however, C_b(U_{n+1}) is the most highly ranked element of C_f(U_n) mentioned in U_{n+1}. The C_b of the first utterance in a discourse is undefined. Brennan et al. use the following ordering: Subject > Existential predicate nominal > Object > Indirect object or oblique > Demarcated adverbial PP
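The centering rule described above (C_b(U_{n+1}) is the most highly ranked element of C_f(U_n) mentioned in U_{n+1}) can be sketched as a small function. Entity names, the role labels, and the function name below are our own illustrative choices; the numeric ranks simply encode Brennan et al.'s ordering.

```python
# Grammatical-role ranking from Brennan et al. (higher = more salient).
RANK = {"subject": 5, "existential": 4, "object": 3,
        "indirect_object": 2, "adverbial_pp": 1}

def backward_center(cf_prev, mentions_next):
    """C_b(U_{n+1}): the highest-ranked element of C_f(U_n), ordered by
    grammatical role, that is also mentioned in U_{n+1}."""
    candidates = [(e, role) for e, role in cf_prev if e in mentions_next]
    if not candidates:
        return None  # undefined, as for the first utterance of a discourse
    return max(candidates, key=lambda er: RANK[er[1]])[0]

# C_f(U_n): entities of U_n with their grammatical roles (hypothetical).
cf = [("John", "subject"), ("the book", "object")]
print(backward_center(cf, {"the book", "the shelf"}))  # -> "the book"
```

If U_{n+1} mentions both entities, "John" wins because the subject outranks the object.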
  23. + Inducing a coherence flow comes through a lot of good writing practice+ Imputing a paragraph with salient concepts comes first to the minds of most authors and they tend to focus on the topic, which here is {house, door, furniture, burglary}--------------------------------------
  24. + Incorporating coherence this way does not necessarily lead to the final summary being coherent+ Coherence is best handled as a post processing step using the Traveling Salesman Problem [Conroy et al.]+ There are lots of open question on just multi-document summarization itself…But what I really wanted is… ------------------------------------
  25. to “see” what topics mean?+ Interpreting topics can still be tedious+ Most LDA models ignore metadata even if they are useful--------------------------------These are actual outputs:This is a tough event to match words with frames. The event is “Working on a sewing project”
  26. This is again another tough event to match words with frames. The event is “Repairing an appliance”
  27. + Describing a domain specific video with annotated keywords! This can be useful in robotic vision!
  28. + allowing robots and video recording devices to communicate at a human level
  29. Moving on – PART II+ At this point I was not sure where I should be moving; I had only a very vague idea!+ And you actually don't know if there *are* other peaks!+ As Yoda pointed out… "Clouded your future is!"
  30. + So now lets visit the document space again+ We look at another model – TagLDA which can incorporate a certain kind of domain knowledge into LDA+ Document partitioned words have associated annotations -> gives rise to two different distributions over words and each distribution affects the other+ A word is observed under the effects of both these distributions------------------------------------
  31. What does this representation buy us?+ Goal is to assign x_9 and x_10 to their correct cluster with the use of domain knowledge+ x_5 and x_10 are annotated with the orange label and x_5 co-occurs with x_9 both in d_1 and d_2+ It is thus likely that x_5, x_9 and x_10 belong to the same class since both documents d_1 and d_2 should contain as few topics as possible
  32. + Fitting a model amounts to forming an hypothesis which can best explain a set of observations+ TagLDA implicitly expands the hypothesis space of topics to search for the best explanation needed to describe the observations with the help of the annotations from domain knowledge------------------------------------
  33. What if we assume that there is an additional perspective over d_i w.r.t. x'?- Is this an unnatural assumption?
  34. Well, not at all! Word level tags: hyperlinked text in the body. Document level tags: categories
  35. word level tags: question/answerdocument level tags: actual tags for the forum post
  36. word level tags: title, image descriptiondocument level tags: tags given by users
  37. Is the bi-perspective nature of documents ubiquitous?
  38. We don’t have annotations but let’s see how that can be built up!Seems like this is a document on investigation of an industrial espionage
  39. Words to the right are relevant to the topic of the document set – mostly by frequency
  40. Natural language processing based content annotation. Since documents are mostly about some events, certain words strike us – NEs mentioned frequently and across sentences. Dependencies between subjects and objects of the important verbs from the document set
  41. The word and doc level tagged words alone are sufficient to summarize the document as bags of wordsSo are we done with the summarization problem?
  42. And now we want things like these! If you are in doubt, ask any member of Dr. Corso’s VPML LabBut,+ High level descriptions are complex+ Spoken Language is complicated with high degrees of paraphrasingThe translator does consciously what the author did instinctively
  43. How can bi-perspective topic models be exploited?The experiments really started off by looking at the image captions and category labels
  44. This slide is self explanatory
  45. Some people call it a mere combinationBut I say it is e-Harmony!
  46. We now cover a particular METag^2LDA model. \pi: tag (i.e. word annotation) distribution over words. \beta: topic distributions over words. \mu and \sigma are fixed regularizers, i.e. some fixed priors that help in proper scaling of the parameters during optimization-----------------------------
  47. The joint probability distribution belongs to the exponential family, following the Maximum Entropy principle. In the original model, the hidden variables and parameters are coupled, leading to an exponential state space to search for the right posterior over the hidden variables---------------------------------------
  48. Delete all observations and edges which lead to the passages of the Bayes Ball being obstructed---------------------------------
  49. Resulting in decoupling of the variables over which posterior needs to be computed+The more the decoupling, the more tractable the inference--------------------------------------
  50. + We use fixed regularizers here+ Introducing exponential family priors for \pi and \beta would need more complicated inference machinery+ There are several other approximation techniques to compute posteriors and hence marginals; Mean Field is a deterministic local optimization technique, but celebrities have endorsed it--------------------------------------
  51. Even Adrian Monk likes Mean Field factorization!
  52. And now let me introduce our friend – the Mixture of Gaussians for real-valued data+ Keep x_1 and x_2 fixed and try to explain the two samples through different location parameters of the Gaussians via the log likelihood+ The two surfaces are the error surfaces for the mixture model likelihood for x_1 and x_2 individually+ For discrete data, the mean parameters of the generating distribution are not discrete-------------------------------------
  53. Mixture of two Gaussians model+ Keep the two true location parameters fixed and try to explain samples generated at different distances from the two means through the log likelihood+ There is a relation between the parameters of the distribution over the data (usually unknown) and the sufficient statistics as a function of the data only. Which leads us to…------------------------------------------
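The two-Gaussian setup described above can be checked numerically: with the two location parameters held fixed, the mixture log likelihood of a sample drops as the sample moves away from both means. The means, weights, sample points and unit variance below are illustrative, not the actual parameters from the slides.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of N(mu, sigma^2) at x."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def mixture_log_likelihood(x, mus, weights, sigma=1.0):
    """Log likelihood of x under a mixture of Gaussians with shared, fixed sigma."""
    return math.log(sum(w * gaussian_pdf(x, mu, sigma) for w, mu in zip(weights, mus)))

# Two fixed location parameters; a sample near one of the means scores much
# higher than a sample far from both.
mus, weights = [-2.0, 2.0], [0.5, 0.5]
near = mixture_log_likelihood(-2.0, mus, weights)
far = mixture_log_likelihood(6.0, mus, weights)
assert near > far
```

Sweeping `x` over a grid traces out exactly the kind of error surface the slide visualizes for each sample.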
  54. Mean Parameters = Expected sufficient statistics. Field = energy arising out of interactions with neighboring nodes (in mathematics a field is nothing but a space).
The \mu_e (the red dots) are the extreme points of the polytope and are a function of the sufficient statistics \Upsilon(z,x) for fixed x. When we optimize over this space, we select one of these red dots, and corresponding to it there is an optimal mean parameter \mu^{\star}.
Suppose we have the complete data as (Z,X), with Z = hidden variables and X = observations. M(G) is the mean parameter space corresponding to the expected sufficient statistics of the hidden variables in the original graphical model G. For discrete distributions, M(G) is a convex polytope due to the intersection of finitely many linear inequalities, i.e. half spaces. For each fixed x and p_{\theta} there is a \mu, and as p is varied holding x fixed, the set M is formed. \mu provides an alternative parameterization of the exponential family distribution, and any mean parameter in interior(M(G)) yields a lower bound on A(\theta). A feasible mean parameter can be the mean parameter of a distribution whose moments are easy to compute, e.g. a factored distribution, and that assumption leads to a non-convex domain over which the optimization is performed. A cartoon constraint is shown in the upper right corner. Z|x ~ Mult(\theta), and \Upsilon(z) is the sufficient statistic for z.
------------------------------------------------
Log partition functions play an important role in the mapping of \mu to \theta and vice versa. M_F(G) is a subset of M(G) having only the extreme points in common, dependent on the factorization F over Z, which makes discovering this backward mapping possible in finite time. The easiest implementation of the mean field principle is to assume no direct dependencies between the distributions of the hidden variables
  55. Classic estimator-finding problem: maximize the log likelihood, whose objective includes the empirical mean and the log partition function. Classic theorem: maximize over \mu given a set of observations x to get as close to \theta as possible. \mu depends on the sufficient statistics associated with the variables whose likelihood we need to maximize. A(\theta) is the log partition function expressed in terms of the dual A*(\mu). The dual, A*(\mu), is maximized at the negative entropy of the distribution over mean parameters when the latter belong to interior(M(G)). The relation between the derivatives of the log partition functions of the primal and the dual is shown in the lower left corner
  56. Mean field approximation to the joint p(z_1, z_2, z_3): a product of independent Bernoulli distributions, p(z_i). In this case, the mean field distributions are exactly in the same exponential family as the true distributions. Write down each q in exponential form with the log partition function (as a function of the canonical parameters in this case). Solve for A*(\mu) using the maximum over the dual formulation, yielding \theta(\mu) and A*(\mu). Solving for A(\theta) using A*(\mu) yields \mu(\theta) = exp(\theta)/(1+exp(\theta))----------------------------------------
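The Bernoulli example above can be verified numerically. This is a minimal sketch, assuming the standard exponential-family form with log partition function A(\theta) = log(1 + exp(\theta)); the function names are our own. It checks that the forward mapping \mu(\theta), the backward mapping \theta(\mu), and the dual A*(\mu) (the negative entropy) fit together as the slide claims.

```python
import math

def mu_of_theta(theta):
    """Forward mapping for a Bernoulli in exponential-family form:
    mu(theta) = exp(theta) / (1 + exp(theta))."""
    return math.exp(theta) / (1.0 + math.exp(theta))

def theta_of_mu(mu):
    """Backward mapping theta(mu) = log(mu / (1 - mu))."""
    return math.log(mu / (1.0 - mu))

def dual(mu):
    """A*(mu): the negative entropy of a Bernoulli with mean parameter mu."""
    return mu * math.log(mu) + (1.0 - mu) * math.log(1.0 - mu)

theta = 0.7
mu = mu_of_theta(theta)
assert abs(theta_of_mu(mu) - theta) < 1e-9      # the two mappings are inverses
# Conjugate duality: A(theta) = theta * mu - A*(mu) at the optimal mu.
A = math.log(1.0 + math.exp(theta))
assert abs(theta * mu - dual(mu) - A) < 1e-9
```

The last assertion is exactly the variational identity the mean field derivation relies on, instantiated for a single binary variable.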
  57. Goal is to find \mu from sufficient statistics of ZIn practical problems, there are exponentially many extreme points for all realizations of sufficient statisticsThis shows a cartoon illustration of the solution for mean parameters using linear programming
  58. Unfortunately, the set of \mu's under the factorization assumption is a strict subset of M(G). This subset itself would also be convex if it did not have to match its extreme points to those of the enclosing set. The region over which the optimization for \mu needs to happen under the tractable-distribution assumption is thus non-convex- This means that we won't get a globally optimal solution
  59. Let us now look at the relation of this formulation to the mean field formulation of METag^2LDA. \Theta^T \mu = \Theta^T \int \sum \Upsilon(\theta, y, z) q(\theta, y, z), e.g. the term \sum_{k=1}^K \phi_{m,k} I_{k}[z_m] \log \beta_{k}. Also, -A^*(\mu) = +H(q) = -\sum q \log q
  60. The big red box is the ELBOTop: E-step inner loop (update variational distributions for every document)Bottom: M-step parameter updates based on mean parameters of the associated document dependent sufficient statisticsFor \beta and \pi, we only do MAP estimation here corresponding to fixed priors which act as regularizers
  61. All of this inference machinery *is needed* to generate exploratory outputs like this!
  62. Non-Correspondence topic models vs. Correspondence topic models
  63. + Within the family of (Corr)MM(E)(Tag2)LDAs modeling joint observations, Corr-METag2LDA performs best+ We need to be careful about what kind of document level tags are we considering? Do those tags really collaborate in refining the topical perspective?-----------------------------------------
  64. Cons:Collocations need to be addressedChains don’t involve causality e.g. (fogs & accidents, [hop length = 12])
  65. So what’s next?
  66. I never looked seriously at this paper “Modeling annotated data” until very late (around 2010)
  67. From this to
  68. This (actually the other way around)
  69. Upper row – training (camera motion and shakes are a real problem for maintaining the bounding boxes)Lower row – trained models
  70. + Role of alpha – alpha provides a topic for every observation. Alpha is a K-vector+ Here each component of alpha is different which helps assign different proportions of observations differently (e.g. one topic can be focusing solely on “stop-words”, another one on “commonly occurring words” and other ones on the different topics etc.)+ This helps identifying a set of “basis” of topic distributions while SVD computes a span over topic distributions
  71. Translation formula (marginalization over topics)- If there are two topics, i.e. K=2, then (e.g. for the 2nd term) 0.5*0.5 + 0.5*0.5 = 0.5 < 0*0.0001 + 0.9*0.9 = 0.81- The values of the inferred \phi's are very important for the real-valued data – well-separated Gaussians are better, but that does not always happen- This raises an issue where the real-valued data may need to be preprocessed to increase the chances of separation
  72. The sum over K is the marginalization over z in p(w,z)
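The marginalization in the two notes above can be written as a one-liner: the score of a word is p(w) = \sum_k \phi_k \beta_{k,w}. The \phi and \beta values below simply reproduce the note's two-topic arithmetic; they are illustrative numbers, not inferred model parameters.

```python
def word_score(phi, beta_w):
    """Marginalize over topics: p(w) = sum_k phi_k * beta_{k,w}."""
    return sum(p * b for p, b in zip(phi, beta_w))

# K = 2 topics. A flat topic posterior gives a lower score than a peaked
# posterior aligned with a topic that strongly emits the word:
flat = word_score([0.5, 0.5], [0.5, 0.5])       # 0.5*0.5 + 0.5*0.5 = 0.50
peaked = word_score([0.0, 0.9], [0.0001, 0.9])  # 0*0.0001 + 0.9*0.9 = 0.81
assert flat < peaked
```

This is why the separation of the inferred \phi's matters: peaked topic posteriors translate videos to words far more decisively than flat ones.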
  73. Again all of these are needed to translate videos into text and vice versa
  74. This is the core problem of video summarization
  75. Psycholinguistic studies are needed to confirm this, but that's not a concern at this point. In our dataset we have only one ground-truth summary---the base case for ROUGE evaluation
  76. There are no individual summaries for shots within the clip – only one high-level summary. Are there problems with shot-wise nearest-neighbor matching precisely for this reason? The dataset that we use for the video summarization task is released as part of NIST's 2011 TRECVID Multimedia Event Detection (MED) evaluation set. The dataset consists of a collection of Internet multimedia content posted to various Internet video hosting sites. The training set is organized into 15 event categories, some of which are: 1) Attempting a board trick 2) Feeding an animal 3) Landing a fish 4) Wedding ceremony 5) Working on a woodworking project etc. We use the videos and their textual metadata in all 15 events as training data. There are 2062 clips with summaries in the training set, with an almost equal distribution amongst the events. The test set which we use is called the Transparent Development (Dev-T) collection. The Dev-T collection includes positive instances of the first 5 training events and near-positive instances for the last 10 events---a total of 630 videos labeled with event category information (and associated human synopses which are to be compared against for summarization performance). Each summary is a short and very high-level description of the entire video and ranges from 2 to 40 words, but is on average 10 words (with stopwords). We remove standard English stopwords and retain only the word morphologies (not required) from the synopses as our training vocabularies. The proportion of videos belonging to events 6 through 15 in the Dev-T set is much lower than the proportion for the other events, since those clips are considered to be "related" instances which cover only part of the event category specifications. The performances of our topic models are evaluated on those kinds of clips as well. The numbers of videos in events 6 through 15 in the Dev-T set are {4,9,5,7,8,3,3,3,10,8}, while there are around 120 videos per event for the first 5 events. 
All other videos in the Dev-T set neither have any event category label nor are identified as positive, negative or related videos and we do not consider these videos in our experiments.
  77. Test ELBOs on events 1-5 in the Dev-T set – Measuring held-out log likelihoods on both videos and associated human summariesPrediction ELBOs on events 1-5 in the Dev-T set – Measuring held-out log likelihoods on just videos in absence of the textLower inverse covariance contributes high positive values to log likelihood + Gaussian entropy can be high too due to overlapping tails
  78. The HEXTAC scores can change from dataset to dataset but max around 40-45% for 100 word summaries
  79. If we can achieve 10% of this for 10 word summaries, we are doing pretty good!Caveat – The text multi-document summarization task is much more complex than this simpler task (w.r.t. summarization)
  80. Purely multinomial topic models showing lower ELBOs can perform quite well in BoW summarization. MMLDA assigns likelihoods based on the success and failure of independent events, and failures contribute highly negative terms to the log likelihoods, but this does not indicate the model's summarization performance, where low-probability terms are pruned out. Gaussian components allow different but related topics to model GIST features almost equally (strong overlap in the tails of the bell-shaped curves – the Gaussians) and show poor permutation of predicted words due to the violation of the soft probabilistic constraint of correspondence (this also leads to higher entropy). The scaling of variables in these kinds of mixed-domain topic models needs to be looked at more closely
  81. To improve relevancy of the lingual descriptions generated for the domain specific test videos, we present… for the first time ever…
  82. iAnalyze for your videos…
  83. A computer science graduate should never have to cope with information twirling around his head! We need high quality tools to address this problem.
  84. I took the late Amar Gopal Bose's advice in preparing these slides: I took some time out to prepare them, leaving everything else behind. As Dr. Bose would say, "creativity never comes under emotional stress or tension. The real creativity comes when the mind finally relaxes and it is quiet and then you can focus." Watch here [http://www.ndtv.com/video/player/news/remembering-amar-bose/282935?pfrom=home-topstories]. And yes, most of these slides were prepared with a Bose iOE2 headphone over my ears.