SlideShare a Scribd company logo
MediaMeter: A Global Monitor for
Online News Coverage
Tadashi Nomoto
National Institute of Japanese Literature
Finding novel topics in news streams
So far not much success in the literature
(extraction, machine learning)
What we are aiming at
0 2000 4000 6000 8000
0100200300400500600
Rank of Topic Descriptor
Frequency
Frequencies of (manually assigned) topic descriptors that appeared
in the New York Times from June to December, 2013.
Problem
Statistics
 1  2  5  10
42.3% 62.0% 78.9% 87.4%
Frequency > 10 : 12.6%
SVM cannot handle a huge taxonomy (Liu, 2005)
The number of unique topics in NYT over 6 months
exceeds 8,000.
Approach
Memory Based Topic Label
Generation
WikiLabel
1. Look up Wikipedia to find pages most relevant to a news story
2. Generate label candidates from page titles
3. Pick those that are deemed most fit to represent the content
How it works: Overview
WikiLabel: Concept Generation
with Wikipedia
WikiLabel: Concept Generation
with Wikipedia
WikiLabel: Concept Generation
with Wikipedia
l⇤
~✓
= arg max
l:p[l]2U
Prox(p[l], ~✓|N ),
Prox(p[l], ~✓|N ) = Sr(p[l], ~✓|N ) + (1 )Lo(l, ~✓)
content similarity relevance of label
Concept Dictionarynews story
Mechanics
Sr(r, q) =
✓
1 +
NX
t
(q(t) r(t))2
◆ 1
Lo(l, v) =
P|l|
i I(l[i], v)
| l |
1
I(w, v) =
(
1 if w 2 v
0 otherwise.
Use sentence compression to generalize
What if Wikipedia does not know the event ….
Example
2009 detention of American hikers by Iran
detention
detention by Iran
detention of hikers
detention of hikers by Iran
detention of American hikers by Iran
2009 detention
2009 detention by Iran
2009 detention of hikers
2009 detention of hikers by Iran
2009 detention of American hikers by Iran
Making it shorter makes it more general
detention
2009
of
hikers
American
by
Iran
detention
2009
of
hikers
American
by
Iran
C1
C2
C3
Dependency pruning
detention
2009
of
hikers
American
by
Iran
Use every NP in the title as a resource
American hikers
dentetion
hikers
Iran
2009 dentetion of American hikers by Iran
2009 dentetion of hikers by Iran
2009 dentetion of hikers
2009 dentetion by Iran
dentetion of hikers
2009 dentetion
you start here
What you get with extension
Original approach
country media outlets #outlets #stories
us/uk the new york times, yahoo, cnn,
msnbc, fox, washington post, abc,
bbc, reuters
9 2,230 (239,844)
south-korea joongang ilbo (English edition),
chosun ilbo (English edition)
2 2,271(19,008)
japan asahi, jcast, jiji.com, mainichi,
nhk, nikkei, sankei, tbs, tokyo, tv-
asahi, yomiuri
11 2,815 (259,364)
Testing it out in the field
0.001!
0.001!
0.001!
0.001!
0.002!
0.003!
0.006!
0.008!
0.011!
0.011!
0.014!
0.015!
0.016!
0.017!
0.024!
0.035!
0.069!
0.09!
0.144!
0! 0.1! 0.2!
North-Korean abductions of Japanese citizens!
North-Koreans!
province North-Korea!
rocket North-Korea!
North-Korean famine!
North-Korean floods!
North-Korean nuclear test!
North-Korea weapons of mass destruction!
North-Korean abductions!
North-Korean test!
North-Korean missile test!
North-Korea program!
North-Korea United-States relations!
North-Korea weapons!
North-Korea South-Korea relations!
North-Korea Russia relations!
North-Korean defectors!
North-Korea nuclear program!
North-Korea relations!
Topic Popularity (South Korea)!
0.001!
0.002!
0.002!
0.002!
0.003!
0.004!
0.005!
0.008!
0.009!
0.012!
0.018!
0.018!
0.021!
0.025!
0.028!
0.032!
0.039!
0.143!
0.18!
0! 0.1! 0.2!
People's Republic North-Korea relations!
Japan North-Korea relations!
North-Korea women's team!
North-Korean abductions of Japanese citizens!
North-Korean floods!
North-Korea weapons of mass destruction!
North-Korean abductions!
North-Korea program!
North-Korean famine!
North-Korean nuclear test!
North-Korea South-Korea relations!
North-Korean defectors!
North-Korea weapons!
North-Korean missile test!
North-Korean test!
North-Korea United-States relations!
North-Korea Russia relations!
North-Korea relations!
North-Korea nuclear program!
Topic Popularity (US)!
0.002!
0.003!
0.003!
0.003!
0.005!
0.005!
0.008!
0.01!
0.013!
0.014!
0.015!
0.022!
0.023!
0.024!
0.033!
0.049!
0.071!
0.073!
0.585!
0! 0.2! 0.4! 0.6! 0.8!
North-Korean Intelligence Agencies!
Korean Language!
Mount Kumgang>>Tourist Region!
North Korean abductions of Japanese
citizens>>Victims!
North-South Summit!
Prisons in North-Korea!
First Secretary of the Workers' Party of Korea!
North-Korea sponsored schools in Japan!
North-Korean abductions!
Human rights in North-Korea!
North-Korean nuclear test!
North-Korea United-States relations!
North-Korean missile test!
Yeonpyeongdo>>bombardment!
Korean War!
North-Korean abductions of Japanese citizens!
North-Korean defectors!
Workers' Party of Korea!
North-Korean nuclear issue!
Topic Popularity (Japan)!
North-Korean Agenda
0.21!
0.082!
0.079!
0.317!
0.216!
0.025!
0.011!
0.039!
0.06!
0.071!
0.31!
0.427!
0! 0.2! 0.4! 0.6!
North-South relations!
Culture in North-Korea!
Kim Jong-il's visit to China!
Ryongchon disaster!
North-Korean nuclear issues!
Abductions of Japanese!
News Coverage Ratio!
Japan!
South-Korea!
rating explanation
5 Title is one of major topics in Article. Article gives a particular
attention to Title.
4 Part of Article deals with Title. Article makes a clear reference
to Title.
3 Part of Title has some relevance to a dominant theme of Article.
Example: Title ‘European Tax System’ is partially relevant to
an article discussing US Tax System.
2 Article makes a reference to part of Title.
1 Title has no relevance to Article, in whatever way.
language rating #instances
english 4.63 97
japanese 4.41 92
Human Evaluation
s1 s2 rouge-w
The United States of
America
The United States of America 1
The United States The United States of America 0.529
States The United States of America 0.077
Evaluation Metric: ROUGE-W
S(C|k, l) =
1
k
X
c2C|k
rouge-w(c, l)
trank rm0 rm1 rm1/x
NYT 0.000 0.056 0.056 0.069
TDT 0.030 0.042 0.048 0.051
FOX?
0.231 0.264 0.264 0.298
Results
New York Times (2013) : 19,952
TDT (1994) : 15,863
FOX (2015) :11,014
Wikipedia (2012)
Text Rank vs. WikiLabel
• Talked about topic detection using WikiLabel
• Leveraging Wikipedia
• Generalizing concept with sentence compression
• Use of sentence compression led to a huge improvement, producing
performance twice as good as that of TextRank
• Online topic learning seems promising
Summary
0 2000 4000 6000 8000
0100200300400500600
Rank of Topic Descriptor
Frequency
Frequencies of (manually assigned) topic descriptors that appeared
in the New York Times from June to December, 2013.
Solution to Problem
(Online) LearningWikiLabel
27

More Related Content

Similar to Tadashi Nomoto - 2015 - MediaMeter: A Global Monitor for Online News Coverage

Nagar
NagarNagar
092809 Gov Brick City 50m
092809 Gov Brick City 50m092809 Gov Brick City 50m
092809 Gov Brick City 50m
Monta Vista High School
 
Uncovering problems at the turn of century
Uncovering problems at the turn of centuryUncovering problems at the turn of century
Uncovering problems at the turn of century
pjkelly
 
Ancillary tasks
Ancillary tasksAncillary tasks
Ancillary tasks
lshepherd27
 
10 writing for media
10 writing for media10 writing for media
10 writing for media
deepak kulhari
 
Dontrepeat
DontrepeatDontrepeat
Dontrepeat
Knight Center
 
Essay On Various Types Of Pollution
Essay On Various Types Of PollutionEssay On Various Types Of Pollution
Essay On Various Types Of Pollution
Kelsey Bjorklund
 
Newspaper lesson
Newspaper lessonNewspaper lesson
Newspaper lesson
Newspaper lessonNewspaper lesson
Running head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docx
Running head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docxRunning head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docx
Running head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docx
jeffsrosalyn
 
The Atlantic Slave Trade (Student Slides)
The Atlantic Slave Trade (Student Slides)The Atlantic Slave Trade (Student Slides)
The Atlantic Slave Trade (Student Slides)
stormcrow141
 
Example of a Full-Sentence OutlineTitle WARMING OUR WORLD.docx
Example of a Full-Sentence OutlineTitle WARMING OUR WORLD.docxExample of a Full-Sentence OutlineTitle WARMING OUR WORLD.docx
Example of a Full-Sentence OutlineTitle WARMING OUR WORLD.docx
elbanglis
 
Entry 1 Sense of PlaceYour first entry will provide you with an o.docx
Entry 1 Sense of PlaceYour first entry will provide you with an o.docxEntry 1 Sense of PlaceYour first entry will provide you with an o.docx
Entry 1 Sense of PlaceYour first entry will provide you with an o.docx
elbanglis
 
PART II JOURNAL ACTIVITYDirections Select a current event th.docx
PART II JOURNAL ACTIVITYDirections Select a current event th.docxPART II JOURNAL ACTIVITYDirections Select a current event th.docx
PART II JOURNAL ACTIVITYDirections Select a current event th.docx
odiliagilby
 
Intro.deck.0-3
Intro.deck.0-3Intro.deck.0-3
Intro.deck.0-3
Will Shown
 
Studentquest
StudentquestStudentquest
Studentquest
iriagarci1
 
first loyalty is to chose a recent media issue and.docx
first loyalty is to chose a recent media issue and.docxfirst loyalty is to chose a recent media issue and.docx
first loyalty is to chose a recent media issue and.docx
write4
 
Royalkapila
RoyalkapilaRoyalkapila
Royalkapila
Knight Center
 
Sandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the Past
Sandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the PastSandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the Past
Sandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the Past
Reynolds Journalism Institute (RJI)
 
Research project overview
Research project overviewResearch project overview
Research project overview
jubie1978
 

Similar to Tadashi Nomoto - 2015 - MediaMeter: A Global Monitor for Online News Coverage (20)

Nagar
NagarNagar
Nagar
 
092809 Gov Brick City 50m
092809 Gov Brick City 50m092809 Gov Brick City 50m
092809 Gov Brick City 50m
 
Uncovering problems at the turn of century
Uncovering problems at the turn of centuryUncovering problems at the turn of century
Uncovering problems at the turn of century
 
Ancillary tasks
Ancillary tasksAncillary tasks
Ancillary tasks
 
10 writing for media
10 writing for media10 writing for media
10 writing for media
 
Dontrepeat
DontrepeatDontrepeat
Dontrepeat
 
Essay On Various Types Of Pollution
Essay On Various Types Of PollutionEssay On Various Types Of Pollution
Essay On Various Types Of Pollution
 
Newspaper lesson
Newspaper lessonNewspaper lesson
Newspaper lesson
 
Newspaper lesson
Newspaper lessonNewspaper lesson
Newspaper lesson
 
Running head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docx
Running head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docxRunning head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docx
Running head [BRIEF VERSION OF THE TITLE, ALL CAPS] 5.docx
 
The Atlantic Slave Trade (Student Slides)
The Atlantic Slave Trade (Student Slides)The Atlantic Slave Trade (Student Slides)
The Atlantic Slave Trade (Student Slides)
 
Example of a Full-Sentence OutlineTitle WARMING OUR WORLD.docx
Example of a Full-Sentence OutlineTitle WARMING OUR WORLD.docxExample of a Full-Sentence OutlineTitle WARMING OUR WORLD.docx
Example of a Full-Sentence OutlineTitle WARMING OUR WORLD.docx
 
Entry 1 Sense of PlaceYour first entry will provide you with an o.docx
Entry 1 Sense of PlaceYour first entry will provide you with an o.docxEntry 1 Sense of PlaceYour first entry will provide you with an o.docx
Entry 1 Sense of PlaceYour first entry will provide you with an o.docx
 
PART II JOURNAL ACTIVITYDirections Select a current event th.docx
PART II JOURNAL ACTIVITYDirections Select a current event th.docxPART II JOURNAL ACTIVITYDirections Select a current event th.docx
PART II JOURNAL ACTIVITYDirections Select a current event th.docx
 
Intro.deck.0-3
Intro.deck.0-3Intro.deck.0-3
Intro.deck.0-3
 
Studentquest
StudentquestStudentquest
Studentquest
 
first loyalty is to chose a recent media issue and.docx
first loyalty is to chose a recent media issue and.docxfirst loyalty is to chose a recent media issue and.docx
first loyalty is to chose a recent media issue and.docx
 
Royalkapila
RoyalkapilaRoyalkapila
Royalkapila
 
Sandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the Past
Sandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the PastSandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the Past
Sandhaus, Van Valkenburg, Cotler; NYT Technical Team: The Future of the Past
 
Research project overview
Research project overviewResearch project overview
Research project overview
 

More from Association for Computational Linguistics

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Association for Computational Linguistics
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Association for Computational Linguistics
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Association for Computational Linguistics
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Association for Computational Linguistics
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Association for Computational Linguistics
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Association for Computational Linguistics
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Association for Computational Linguistics
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Association for Computational Linguistics
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Association for Computational Linguistics
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
Association for Computational Linguistics
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Association for Computational Linguistics
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Association for Computational Linguistics
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Association for Computational Linguistics
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Association for Computational Linguistics
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
Association for Computational Linguistics
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Association for Computational Linguistics
 

More from Association for Computational Linguistics (20)

Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal TextMuis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
Muis - 2016 - Weak Semi-Markov CRFs for NP Chunking in Informal Text
 
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
Castro - 2018 - A High Coverage Method for Automatic False Friends Detection ...
 
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour AnalysisCastro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
Castro - 2018 - A Crowd-Annotated Spanish Corpus for Humour Analysis
 
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
Muthu Kumar Chandrasekaran - 2018 - Countering Position Bias in Instructor In...
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text SimplificationElior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
Elior Sulem - 2018 - Semantic Structural Evaluation for Text Simplification
 
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future DirectionsDaniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
Daniel Gildea - 2018 - The ACL Anthology: Current State and Future Directions
 
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
Wenqiang Lei - 2018 - Sequicity: Simplifying Task-oriented Dialogue Systems w...
 
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
Matthew Marge - 2017 - Exploring Variation of Natural Human Commands to a Rob...
 
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
Venkatesh Duppada - 2017 - SeerNet at EmoInt-2017: Tweet Emotion Intensity Es...
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
John Richardson - 2015 - KyotoEBMT System Description for the 2nd Workshop on...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
Zhongyuan Zhu - 2015 - Evaluating Neural Machine Translation in English-Japan...
 
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
Hyoung-Gyu Lee - 2015 - NAVER Machine Translation System for WAT 2015
 
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 WorkshopSatoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
Satoshi Sonoh - 2015 - Toshiba MT System Description for the WAT2015 Workshop
 
Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015Chenchen Ding - 2015 - NICT at WAT 2015
Chenchen Ding - 2015 - NICT at WAT 2015
 
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
Graham Neubig - 2015 - Neural Reranking Improves Subjective Quality of Machin...
 

Recently uploaded

REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
giancarloi8888
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
khuleseema60
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
Celine George
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
EduSkills OECD
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
haiqairshad
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
سمير بسيوني
 
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.
IsmaelVazquez38
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
Steve Thomason
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
National Information Standards Organization (NISO)
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
danielkiash986
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
deepaannamalai16
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
Celine George
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
imrankhan141184
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
David Douglas School District
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
National Information Standards Organization (NISO)
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
TechSoup
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
melliereed
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
zuzanka
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
Mohammad Al-Dhahabi
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
Jyoti Chand
 

Recently uploaded (20)

REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdfREASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
REASIGNACION 2024 UGEL CHUPACA 2024 UGEL CHUPACA.pdf
 
MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025MDP on air pollution of class 8 year 2024-2025
MDP on air pollution of class 8 year 2024-2025
 
How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17How to Predict Vendor Bill Product in Odoo 17
How to Predict Vendor Bill Product in Odoo 17
 
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptxBeyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
Beyond Degrees - Empowering the Workforce in the Context of Skills-First.pptx
 
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skillsspot a liar (Haiqa 146).pptx Technical writhing and presentation skills
spot a liar (Haiqa 146).pptx Technical writhing and presentation skills
 
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdfمصحف القراءات العشر   أعد أحرف الخلاف سمير بسيوني.pdf
مصحف القراءات العشر أعد أحرف الخلاف سمير بسيوني.pdf
 
Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.Bossa N’ Roll Records by Ismael Vazquez.
Bossa N’ Roll Records by Ismael Vazquez.
 
A Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two HeartsA Visual Guide to 1 Samuel | A Tale of Two Hearts
A Visual Guide to 1 Samuel | A Tale of Two Hearts
 
Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"Benner "Expanding Pathways to Publishing Careers"
Benner "Expanding Pathways to Publishing Careers"
 
Pharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brubPharmaceutics Pharmaceuticals best of brub
Pharmaceutics Pharmaceuticals best of brub
 
Standardized tool for Intelligence test.
Standardized tool for Intelligence test.Standardized tool for Intelligence test.
Standardized tool for Intelligence test.
 
How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17How Barcodes Can Be Leveraged Within Odoo 17
How Barcodes Can Be Leveraged Within Odoo 17
 
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
Traditional Musical Instruments of Arunachal Pradesh and Uttar Pradesh - RAYH...
 
Juneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School DistrictJuneteenth Freedom Day 2024 David Douglas School District
Juneteenth Freedom Day 2024 David Douglas School District
 
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
Jemison, MacLaughlin, and Majumder "Broadening Pathways for Editors and Authors"
 
Leveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit InnovationLeveraging Generative AI to Drive Nonprofit Innovation
Leveraging Generative AI to Drive Nonprofit Innovation
 
Nutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour TrainingNutrition Inc FY 2024, 4 - Hour Training
Nutrition Inc FY 2024, 4 - Hour Training
 
SWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptxSWOT analysis in the project Keeping the Memory @live.pptx
SWOT analysis in the project Keeping the Memory @live.pptx
 
skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)skeleton System.pdf (skeleton system wow)
skeleton System.pdf (skeleton system wow)
 
Wound healing PPT
Wound healing PPTWound healing PPT
Wound healing PPT
 

Tadashi Nomoto - 2015 - MediaMeter: A Global Monitor for Online News Coverage

  • 1. MediaMeter: A Global Monitor for Online News Coverage Tadashi Nomoto National Institute of Japanese Literature
  • 2. Finding novel topics in news streams So far not much success in the literature (extraction, machine learning) What we are aiming at
  • 3. 0 2000 4000 6000 8000 0100200300400500600 Rank of Topic Descriptor Frequency Frequencies of (manually assigned) topic descriptors that appeared in the New York Times from June to December, 2013. Problem
  • 4. Statistics  1  2  5  10 42.3% 62.0% 78.9% 87.4% Frequency > 10 : 12.6%
  • 5. SVM cannot handle a huge taxonomy (Liu, 2005) The number of unique topics in NYT over 6 months exceeds 8,000.
  • 6. Approach Memory Based Topic Label Generation WikiLabel
  • 7. 1. Look up Wikipedia to find pages most relevant to a news story 2. Generate label candidates from page titles 3. Pick those that are deemed most fit to represent the content How it works: Overview
  • 11. l⇤ ~✓ = arg max l:p[l]2U Prox(p[l], ~✓|N ), Prox(p[l], ~✓|N ) = Sr(p[l], ~✓|N ) + (1 )Lo(l, ~✓) content similarity relevance of label Concept Dictionarynews story Mechanics
  • 12. Sr(r, q) = ✓ 1 + NX t (q(t) r(t))2 ◆ 1 Lo(l, v) = P|l| i I(l[i], v) | l | 1 I(w, v) = ( 1 if w 2 v 0 otherwise.
  • 13. Use sentence compression to generalize What if Wikipedia does not know the event ….
  • 14. Example 2009 detention of American hikers by Iran detention detention by Iran detention of hikers detention of hikers by Iran detention of American hikers by Iran 2009 detention 2009 detention by Iran 2009 detention of hikers 2009 detention of hikers by Iran 2009 detention of American hikers by Iran Making it shorter makes it more general
  • 17. American hikers dentetion hikers Iran 2009 dentetion of American hikers by Iran 2009 dentetion of hikers by Iran 2009 dentetion of hikers 2009 dentetion by Iran dentetion of hikers 2009 dentetion you start here What you get with extension Original approach
  • 18. country media outlets #outlets #stories us/uk the new york times, yahoo, cnn, msnbc, fox, washington post, abc, bbc, reuters 9 2,230 (239,844) south-korea joongang ilbo (English edition), chosun ilbo (English edition) 2 2,271(19,008) japan asahi, jcast, jiji.com, mainichi, nhk, nikkei, sankei, tbs, tokyo, tv- asahi, yomiuri 11 2,815 (259,364) Testing it out in the field
  • 19. 0.001! 0.001! 0.001! 0.001! 0.002! 0.003! 0.006! 0.008! 0.011! 0.011! 0.014! 0.015! 0.016! 0.017! 0.024! 0.035! 0.069! 0.09! 0.144! 0! 0.1! 0.2! North-Korean abductions of Japanese citizens! North-Koreans! province North-Korea! rocket North-Korea! North-Korean famine! North-Korean floods! North-Korean nuclear test! North-Korea weapons of mass destruction! North-Korean abductions! North-Korean test! North-Korean missile test! North-Korea program! North-Korea United-States relations! North-Korea weapons! North-Korea South-Korea relations! North-Korea Russia relations! North-Korean defectors! North-Korea nuclear program! North-Korea relations! Topic Popularity (South Korea)! 0.001! 0.002! 0.002! 0.002! 0.003! 0.004! 0.005! 0.008! 0.009! 0.012! 0.018! 0.018! 0.021! 0.025! 0.028! 0.032! 0.039! 0.143! 0.18! 0! 0.1! 0.2! People's Republic North-Korea relations! Japan North-Korea relations! North-Korea women's team! North-Korean abductions of Japanese citizens! North-Korean floods! North-Korea weapons of mass destruction! North-Korean abductions! North-Korea program! North-Korean famine! North-Korean nuclear test! North-Korea South-Korea relations! North-Korean defectors! North-Korea weapons! North-Korean missile test! North-Korean test! North-Korea United-States relations! North-Korea Russia relations! North-Korea relations! North-Korea nuclear program! Topic Popularity (US)! 0.002! 0.003! 0.003! 0.003! 0.005! 0.005! 0.008! 0.01! 0.013! 0.014! 0.015! 0.022! 0.023! 0.024! 0.033! 0.049! 0.071! 0.073! 0.585! 0! 0.2! 0.4! 0.6! 0.8! North-Korean Intelligence Agencies! Korean Language! Mount Kumgang>>Tourist Region! North Korean abductions of Japanese citizens>>Victims! North-South Summit! Prisons in North-Korea! First Secretary of the Workers' Party of Korea! North-Korea sponsored schools in Japan! North-Korean abductions! Human rights in North-Korea! North-Korean nuclear test! North-Korea United-States relations! North-Korean missile test! Yeonpyeongdo>>bombardment! Korean War! North-Korean abductions of Japanese citizens! North-Korean defectors! Workers' Party of Korea! North-Korean nuclear issue! Topic Popularity (Japan)! North-Korean Agenda 0.21! 0.082! 0.079! 0.317! 0.216! 0.025! 0.011! 0.039! 0.06! 0.071! 0.31! 0.427! 0! 0.2! 0.4! 0.6! North-South relations! Culture in North-Korea! Kim Jong-il's visit to China! Ryongchon disaster! North-Korean nuclear issues! Abductions of Japanese! News Coverage Ratio! Japan! South-Korea!
  • 20. rating explanation 5 Title is one of major topics in Article. Article gives a particular attention to Title. 4 Part of Article deals with Title. Article makes a clear reference to Title. 3 Part of Title has some relevance to a dominant theme of Article. Example: Title ‘European Tax System’ is partially relevant to an article discussing US Tax System. 2 Article makes a reference to part of Title. 1 Title has no relevance to Article, in whatever way. language rating #instances english 4.63 97 japanese 4.41 92 Human Evaluation
  • 21.
  • 22.
  • 23.
  • 24. s1 s2 rouge-w The United States of America The United States of America 1 The United States The United States of America 0.529 States The United States of America 0.077 Evaluation Metric: ROUGE-W S(C|k, l) = 1 k X c2C|k rouge-w(c, l)
  • 25. trank rm0 rm1 rm1/x NYT 0.000 0.056 0.056 0.069 TDT 0.030 0.042 0.048 0.051 FOX? 0.231 0.264 0.264 0.298 Results New York Times (2013) : 19,952 TDT (1994) : 15,863 FOX (2015) :11,014 Wikipedia (2012) Text Rank vs. WikiLabel
  • 26. • Talked about topic detection using WikiLabel • Leveraging Wikipedia • Generalizing concept with sentence compression • Use of sentence compression led to a huge improvement, producing performance twice as good as that of TextRank • Online topic learning seems promising Summary
  • 27. 0 2000 4000 6000 8000 0100200300400500600 Rank of Topic Descriptor Frequency Frequencies of (manually assigned) topic descriptors that appeared in the New York Times from June to December, 2013. Solution to Problem (Online) LearningWikiLabel 27