
News Semantic Snapshot

TV newscasts report on the latest event-related facts occurring in the world. Relying exclusively on them is, however, insufficient to fully grasp the context of the story being reported. In this paper, we propose an approach that retrieves and analyzes related documents from the Web to automatically generate semantic annotations that provide viewers and experts with comprehensive information about the news. Using different Semantic Web and information retrieval techniques, we generate what we call the Semantic Snapshot of a Newscast (NSS).
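The retrieval step described above is done, per the slides, with Google Custom Search Engine (CSE). A minimal sketch of how a single query could be built for the Custom Search JSON API, assuming you already have an API key and a configured search engine ID (`cx`). The function name `build_cse_request` and the mapping of the slides' temporal windows (1W, 2W) onto the API's `dateRestrict` parameter are assumptions for illustration, not the authors' code.

```python
from urllib.parse import urlencode

# Endpoint of the Google Custom Search JSON API.
CSE_ENDPOINT = "https://www.googleapis.com/customsearch/v1"


def build_cse_request(query, api_key, engine_id, window_weeks=None):
    """Return the GET URL for one Custom Search query (hypothetical helper).

    window_weeks: optional temporal window in weeks (e.g. 1 for 1W, 2 for 2W),
    mapped onto the API's `dateRestrict` parameter ("w<N>" = last N weeks).
    This mapping is an assumption, not taken from the paper.
    """
    params = {"key": api_key, "cx": engine_id, "q": query}
    if window_weeks is not None:
        params["dateRestrict"] = f"w{window_weeks}"
    # urlencode percent-encodes values and turns spaces into "+".
    return f"{CSE_ENDPOINT}?{urlencode(params)}"


url = build_cse_request("Snowden asks Russia for asylum", "MY_KEY", "MY_ENGINE", window_weeks=2)
```

Which sites are searched (the whole web, or a fixed list of newspapers such as the slides' L1/L2 sets) is configured on the search engine itself, identified by `cx`. Note that `dateRestrict` counts back from the moment the query is sent, so a window anchored at an older broadcast date would need a different mechanism.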

Published in: Engineering

  1. GENERATING NEWSCASTS SEMANTIC SNAPSHOTS USING ENTITY EXPANSION
     José Luis Redondo Garcia (@peputo / redondo@eurecom.fr), Giuseppe Rizzo (@giusepperizzo / giuseppe.rizzo@eurecom.fr), Lilia Pérez Romero (L.Perez@cwi.nl), Michiel Hildebrand (@McHildebrand / Michiel.Hildebrand@cwi.nl), Raphaël Troncy (@rtroncy / raphael.troncy@eurecom.fr)
  2. NEWS CONSUMPTION
     News item ("Snowden asks Russia for asylum") → Named Entity Expansion → News Semantic Snapshot (NSS)
     15th International Conference on Web Engineering (ICWE), June 25, 2015
  3. NEWS ENTITY EXPANSION
     A web-based, unsupervised, sequential pipeline that produces the NSS: Document Collection (20 variations) → Document Annotation (1) → Entity Filtering (4) → Ranking (4)
  4. EVALUATION: NEWS ENTITIES GOLD STANDARD
     Involving experts in the news domain and users
     Dimensions: (1) video subtitles, (2) image in the video, (3) text in the video image, (4) suggestions of an expert, (5) related articles
     Play with the data and help us extend it at: https://github.com/jluisred/NewsConceptExpansion/wiki/Golden-Standard-Creation
  5. DOCUMENT COLLECTION (20 variations), using Google Custom Search Engine (CSE) [1]
     - Web sites to be crawled: Google; L1, a set of 10 international English-speaking newspapers; L2, a set of 3 international newspapers used in the gold standard (GS)
     - Temporal window: 1W, 2W
     - Annotation filtering
     [1] https://cse.google.com/cse/all
  6. DOCUMENT ANNOTATION
     NER extractors in NERD (*)
     (*) Benchmarking the Extraction and Disambiguation of Named Entities on the Semantic Web, Rizzo et al. (2014)
  7. ENTITY FILTERING (4 variations)
     Filtering dimensions:
     - F1: NERD type: Person, Organization, Location
     - F2: Confidence score > threshold
     - F3: Capitalization (example: "Obama" is capitalized, while "country", "president", and "asylum" are not)
  8. RANKING STRATEGIES (1)
     Increase representativeness → leverage entity frequency
     Ranking functions: Freq, Gaussian
  9. RANKING STRATEGIES (2) (4 variations)
     Rules: [ Sel(e), ]
     - POPULARITY: based on Google Trends; w = 2 months; threshold μ + 2σ (2.5%)
     - EXPERT RULES, for example:
       - [ Location, = 0.48 ]
       - [ Person, = 0.74 ]
       - [ Organization, = 0.95 ]
       - [ < 2, = 0.0 ]
  10. EVALUATION: MEASURES (N = 10)
      - Mean Precision/Recall at N: the most popular measures; easy to interpret
      - Mean Average Precision at N (MAP): considers the ranking; rewards relevant documents in the top positions
      - Mean Normalized Discounted Cumulative Gain at N (MNDCG): supports different levels of document relevance; the lower a highly relevant document is ranked, the less useful it is for the user
  11. RESULTS (1)
      Baselines:
      - BS1: former Entity Expansion implementation (*): Google, no temporal window, no Schema.org, no filter
      - BS2: TF-IDF-based function
      (*) Describing and Contextualizing Events in TV News Show, Redondo et al. (2014)
  12. RESULTS (2)
      20 × 4 × 4 = 320 runs
      Best run: collection Google + 2W + Schema.org, filtering F3, ranking Freq + POP + EXP
  13. CONCLUSIONS & FUTURE WORK
      - News Entity Expansion → generates the News Semantic Snapshot
      - Best score: 0.666 MNDCG at 10, better than BS1 and BS2
        • Collection: CSE (Google + 2W + Schema.org)
        • Filtering: F3
        • Ranking: Freq + POP + EXP
      What’s next:
      - Extend the ground truth
      - Supervised approach
      - Better exploit semantic connections between entities in the KB
      - Is MNDCG@10 an ideal indicator for assessing NSS quality?
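Slide 7 lists three entity filtering dimensions (F1 NERD type, F2 confidence threshold, F3 capitalization) and slide 8 ranks the surviving entities by frequency. The sketch below shows one way these two steps could fit together, assuming each collected document has already been annotated into a list of entity mentions. The `Annotation` shape, the 0.5 confidence threshold, and all function names are hypothetical, and the Gaussian variant from slide 8 is not reproduced.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    """One entity mention found in a collected document (hypothetical shape)."""
    label: str
    nerd_type: str      # e.g. "Person", "Organization", "Location", "Thing"
    confidence: float   # extractor confidence in [0, 1]


KEPT_TYPES = {"Person", "Organization", "Location"}


def f1_type(a):
    # F1: keep only the three NERD types listed on slide 7.
    return a.nerd_type in KEPT_TYPES


def f2_confidence(a, threshold=0.5):
    # F2: keep mentions whose confidence exceeds a threshold (0.5 is a placeholder).
    return a.confidence > threshold


def f3_capitalized(a):
    # F3: keep labels starting with an uppercase letter ("Obama" yes, "asylum" no).
    return a.label[:1].isupper()


def rank_by_frequency(documents, keep=f3_capitalized, top_n=10):
    """Count surviving labels across all documents; return the top_n most frequent.

    Ties keep the order in which labels were first encountered.
    """
    counts = Counter(a.label for doc in documents for a in doc if keep(a))
    return counts.most_common(top_n)


docs = [
    [Annotation("Snowden", "Person", 0.9), Annotation("Russia", "Location", 0.8),
     Annotation("asylum", "Thing", 0.4)],
    [Annotation("Snowden", "Person", 0.7), Annotation("president", "Thing", 0.3),
     Annotation("Obama", "Person", 0.95)],
    [Annotation("Russia", "Location", 0.6), Annotation("Snowden", "Person", 0.85)],
]
ranking = rank_by_frequency(docs)  # F3 filter by default
```

Swapping the `keep` argument selects a different filter, which mirrors how the slides treat filtering as an independent dimension of each run.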

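Slide 9's popularity dimension is based on Google Trends, with a window w = 2 months and a μ + 2σ threshold. A minimal sketch of such a peak test, assuming the entity's interest values are already available as plain numbers (fetching them from Google Trends is out of scope) and assuming the value on the news date is compared against the mean and population standard deviation of the preceding values in the window. `is_popular` and `popular_entities` are hypothetical names.

```python
from statistics import mean, pstdev


def is_popular(history, current, k=2.0):
    """Flag a popularity peak: `current` exceeds mean + k * std of `history`.

    history: interest values for the window before the news date
             (w = 2 months on slide 9); must contain at least one value.
    current: interest value on the news date.
    """
    mu = mean(history)
    sigma = pstdev(history)  # population standard deviation
    return current > mu + k * sigma


def popular_entities(trends, k=2.0):
    """trends maps entity -> (history, current); return entities with a peak."""
    return {e for e, (history, current) in trends.items() if is_popular(history, current, k)}


history = [10, 12, 11, 9, 10, 12, 11, 10]
peaks = popular_entities({"Snowden": (history, 40), "Russia": (history, 12)})
```

For normally distributed values, about 2.3% of observations lie above μ + 2σ, so the slide's "(2.5%)" is a rounded figure; μ + 1.96σ would give exactly 2.5%.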
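Slide 10's measures are computed per newscast and then averaged over all newscasts in the gold standard to obtain the "mean" versions, with N = 10. A sketch using common textbook definitions; the exact variants used in the paper (for example, the AP normalizer or the gain values fed to NDCG) are assumptions here, and the function names are hypothetical.

```python
import math


def precision_recall_at_n(ranked, relevant, n=10):
    """P@N and R@N for one newscast; `relevant` is the set of gold entities."""
    if not relevant:
        return 0.0, 0.0
    hits = sum(1 for e in ranked[:n] if e in relevant)
    return hits / n, hits / len(relevant)


def average_precision_at_n(ranked, relevant, n=10):
    """AP@N: sum of P@k at each rank k holding a relevant entity, normalized."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for k, e in enumerate(ranked[:n], start=1):
        if e in relevant:
            hits += 1
            total += hits / k
    return total / min(len(relevant), n)


def ndcg_at_n(ranked, grades, n=10):
    """NDCG@N with graded relevance: `grades` maps entity -> gain (0 if absent)."""
    def dcg(gains):
        # Position i (0-based) is discounted by log2(i + 2), i.e. log2(rank + 1).
        return sum(g / math.log2(i + 2) for i, g in enumerate(gains))

    actual = dcg([grades.get(e, 0) for e in ranked[:n]])
    ideal = dcg(sorted(grades.values(), reverse=True)[:n])
    return actual / ideal if ideal > 0 else 0.0


ranked = ["Snowden", "Obama", "Russia", "asylum"]
relevant = {"Snowden", "Russia", "Ecuador"}
grades = {"Snowden": 2, "Russia": 1, "Ecuador": 1}
```

The mean versions are then the arithmetic mean of these per-newscast values, which is what the 0.666 MNDCG at 10 on slide 13 summarizes.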