Modeling and Summarizing News Events Using Semantic Triples
ESWC 2018, Heraklion, Crete, Greece
Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt
5 June 2018
Eve the News Reader
… the deal is unfair for US industries … his desire to overturn anything that has Obama’s stamp of approval on it …
and 1000++ more
… Hurricane Nate slammed Louisiana … The hurricane killed 2 people … The pope sent his prayer …
and 1000++ more
… Three people sustained injuries … politicians are corrupt … police staged mass raids …
and 1000++ more
THIS IS TIME CONSUMING
Wall-e promises…
Don’t worry Eve! I’ll build a summary for you!
Extractive vs Abstractive
The Chemistry of Abstractive Summarization
“… Hurricane Nate slammed Louisiana … Nate struck The State of Louisiana … The hurricane in Louisiana killed 2 people …”
“Hurricane” “Nate” “slammed” “Louisiana”
“Nate” “struck” “the” “State” “of” “Louisiana”
“The” “hurricane” “killed” “2” “people”
Information Extraction
Fact fusion or merging
Extractive summary:
“Hurricane Nate slammed Louisiana.”
“The hurricane killed 2 people.”
Abstractive summary, with conjoined facts:
“Hurricane Nate slammed Louisiana, killing 2 people.”
Open Information Extraction, Entity Linking, Verb Linking
Abstractive News Summarization Approaches
Phrase-selection based (PSB)
• Tentatively pairs subject and verb phrases from different sentences
• Checks for compatibility
Pattern-graph Fusion (PGF)
• Looks for similar tokens in different sentences
• Fuses the tokens, thus forming a graph
PSB Example
“Hurricane” “Nate” “slammed” “Louisiana”
“It” “killed” “2” “people”
“Nate” “struck” “the” “State” “of” “Louisiana”
(Figure: links between phrases of the three sentences, each labelled “compatible” or “similar”.)
“Hurricane Nate slammed Louisiana, killed 2 people”
PSB Limitation
Dilemma: losing information or being redundant
PSB Limitation: Subclause/Adverbial Fact
“Hurricane” “Nate” “slammed” “Louisiana” “after leaving Central America”
“Nate” “slammed Louisiana with 85 mph winds”
similar?
Incomplete case: “Hurricane Nate slammed Louisiana after leaving Central America.”
Redundant case: “Hurricane Nate slammed Louisiana after leaving Central America, slammed Louisiana with 85 mph winds.”
Pattern Graph Fusion (PGF)
Extraction (OLLIE):
<people; killed by; hurricane>
<people; died in; Louisiana>
Typing (Stanford, SEMAFOR):
<PERSON; killed by; PROTAGONIST>
<PERSON; died in; LOCATION>
Graph fusion:
PERSON --killed by--> PROTAG
PERSON --died in--> LOC
*Li et al. Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization (CIKM’15)
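The three PGF stages on this slide (extraction, typing, graph fusion) can be sketched in a few lines. This is only an illustration under stated assumptions: the `TYPE_OF` lexicon is a hypothetical stand-in for the Stanford/SEMAFOR typing step, the triples are copied from the slide rather than produced by OLLIE, and the function names are invented for this sketch.

```python
from collections import defaultdict

# Hypothetical type lexicon standing in for the Stanford/SEMAFOR typing step.
TYPE_OF = {"people": "PERSON", "hurricane": "PROTAGONIST", "Louisiana": "LOCATION"}


def type_triple(triple):
    """Replace subject and object by their type when one is known."""
    subj, rel, obj = triple
    return (TYPE_OF.get(subj, subj), rel, TYPE_OF.get(obj, obj))


def fuse(triples):
    """Merge typed triples that share a subject node into one graph."""
    graph = defaultdict(set)
    for subj, rel, obj in map(type_triple, triples):
        graph[subj].add((rel, obj))
    return dict(graph)


# Triples as shown on the slide (taken as given, not extracted here).
extracted = [
    ("people", "killed by", "hurricane"),
    ("people", "died in", "Louisiana"),
]
pattern_graph = fuse(extracted)
```

Both triples end up as outgoing edges of a single PERSON node, which is the fused pattern the slide draws.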
Pattern Graph Fusion: Extraction
“… Hurricane Nate slammed Louisiana with 85 mph winds …”
“… On Saturday, Nate struck Louisiana …”
“… In Louisiana, the hurricane killed 2 people …”
“Hurricane” “Nate” “slammed” “Louisiana” “with” “85” “mph” “winds”
“On” “Saturday,” “Nate” “struck” “Louisiana”
“The” “hurricane” “in” “Louisiana” “killed” “2” “people”
Extraction tool: OLLIE
Stages: Extraction, Typing, Fusion
Pattern Graph Fusion: Typing
Typing tools: Stanford NLP & SEMAFOR
Before typing:
“Hurricane” “Nate” “slammed” “Louisiana” “with” “85” “mph” “winds”
“On” “Saturday,” “Nate” “struck” “Louisiana”
“The” “hurricane” “in” “Louisiana” “killed” “2” “people”
After typing:
“Hurricane” PERSON “slammed” LOCATION “with” “85” “mph” “winds”
“On” DATE “,” PERSON “struck” LOCATION
“The” KILLER “in” “Louisiana” “killed” “2” VICTIM
Fusion of Sentences along Types
(Figure: the three typed sentences fused into one graph along shared types.)
Hurricane PERSON slammed LOCATION with 85 mph winds
On DATE , PERSON struck LOCATION
The KILLER in Louisiana killed 2 VICTIM
Highlighted path: On DATE , PERSON slammed LOCATION with 85 mph winds
“On Saturday, Nate slammed Louisiana with 85 mph winds”
Pattern Graph Fusion: Limitations #1
Correctness: Typing might lead to incorrect result
(Figure: the fused graph of typed sentences, extended with the fragment “Pope sent his RITE for”.)
Highlighted path: On DATE , PERSON sent his RITE for LOCATION with 85 mph winds
“On Saturday, Nate sent his prayer for Louisiana with 85 mph winds”
Pattern Graph Fusion: Limitations #2
• Coverage: Merging misses semantically similar verbs
• “withdraw” vs “pull out”
• Grammaticality: Merging leads to ungrammatical sentences
• “The US pulled out from the Paris Agreement caused disappointment among environmentalist.”
Our Summarization
We improve
• Correctness and coverage by Entity Linking
• Coverage by Verb Linking
• Grammaticality by Grammatical Fixing
Our Summarization: Pipeline
(Figure: pipeline with boxes Dataset, Triple Extraction, Typing, Entity Linking (EL), Verb Linking (VL), Fusion, Ranking and Generation, and Summary, plus a Clustering label.)
Pipeline: Entity Linking
• Named entity recognition: DBpedia Spotlight
• Name normalization: DBpedia
• Coreference resolution: DBpedia, Stanford Coref
Hurricane Nate slammed Louisiana with 85 mph winds
The hurricane in The Creole State killed 2 people
It struck Central America with 90 mph winds
Link labels in the figure: dbp:type hurricane, dbo:WikiPageRedirectsOf
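The effect of entity linking on the three example sentences can be illustrated with a toy alias table. This is a sketch under assumptions, not the pipeline itself: `CANONICAL` is a hand-written, hypothetical table standing in for what DBpedia Spotlight, DBpedia lookups and coreference resolution would supply, and mapping “It” globally only works for this one example, since real coreference depends on context.

```python
# Hypothetical alias table; in the slides this information comes from
# DBpedia (redirects, types) and a coreference resolver.
CANONICAL = {
    "Hurricane Nate": "Hurricane Nate",
    "The hurricane": "Hurricane Nate",
    "It": "Hurricane Nate",  # valid only for this toy example
    "Louisiana": "Louisiana",
    "The Creole State": "Louisiana",
}


def link_entities(tokens):
    """Replace each known mention by its canonical entity name."""
    return [CANONICAL.get(token, token) for token in tokens]


# The slide's sentences, pre-chunked by hand so that mentions are single tokens.
sentences = [
    ["Hurricane Nate", "slammed", "Louisiana", "with", "85", "mph", "winds"],
    ["The hurricane", "in", "The Creole State", "killed", "2", "people"],
    ["It", "struck", "Central America", "with", "90", "mph", "winds"],
]
linked = [link_entities(sentence) for sentence in sentences]
```

After linking, all three sentences start from the same node, and the first two share the Louisiana node, which is what lets the later fusion step merge them.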
Pipeline: Verb Linking
• WordNet Similarity > 0.9
It struck Central America on Monday
Hurricane Nate slammed Louisiana with 85 mph winds
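The thresholding rule can be sketched as a greedy grouping of verbs. Only the 0.9 threshold comes from the slide. The scores in `TOY_SIMILARITY` are made up for illustration and are not real WordNet values, and the slide does not say which WordNet similarity measure or which clustering procedure is used, so the greedy loop here is an assumption.

```python
THRESHOLD = 0.9  # from the slide: WordNet similarity > 0.9

# Made-up similarity scores for illustration only (not real WordNet values).
TOY_SIMILARITY = {
    frozenset(("slam", "strike")): 0.95,
    frozenset(("strike", "kill")): 0.4,
    frozenset(("slam", "kill")): 0.3,
}


def similarity(a, b):
    """Look up a symmetric toy score; identical verbs score 1.0."""
    if a == b:
        return 1.0
    return TOY_SIMILARITY.get(frozenset((a, b)), 0.0)


def link_verbs(verbs, sim=similarity, threshold=THRESHOLD):
    """Greedily put each verb into the first cluster holding a similar verb."""
    clusters = []
    for verb in verbs:
        for cluster in clusters:
            if any(sim(verb, member) > threshold for member in cluster):
                cluster.append(verb)
                break
        else:
            clusters.append([verb])
    return clusters


clusters = link_verbs(["slam", "strike", "kill"])
```

With these toy scores, “slam” and “strike” land in one cluster and “kill” stays on its own. A real similarity function can be passed in through the `sim` parameter.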
Pipeline: Fusion and Ranking #1
• What should be merged?
• Entities, Verbs
• Types of non-entity and non-verb tokens, if they exist
• Non-stopword tokens
• The fused graph when merging entities and verbs:
(Figure: one graph in which “Hurricane Nate slammed” branches to “Louisiana with 85 mph winds” and “Central America with 90 mph winds”, with “in” and “killed 2 people” attached around “Louisiana”.)
Pipeline: Fusion and Ranking #2
• Path collection
• Path ranking criteria
• Path coverage
• Node degree
• “Hurricane Nate in Louisiana with 85 mph winds”
• Path coverage = 2, Avg node degree = 15/7 = 2.14
• “Hurricane Nate slammed Louisiana killed 2 people”
• Path coverage = 2, Avg node degree = 15/6 = 2.5
(Figure: the fused graph with each node annotated with its degree.)
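The average-node-degree figures for the two example paths can be reproduced with a short calculation. The `DEGREE` map below is an assumption: it is one assignment of the figure's degree labels to nodes that is consistent with the slide's totals of 15 for both paths, not a verified reading of the original graph. Ranking the higher average first is also an assumption, since the slide lists the criteria without stating the direction.

```python
# Assumed node degrees, chosen to match the slide's totals (15 for each path).
DEGREE = {
    "Hurricane Nate": 2, "slammed": 4, "Louisiana": 4, "in": 2,
    "with": 2, "85": 2, "mph": 2, "winds": 1,
    "killed": 2, "2": 2, "people": 1,
}


def avg_node_degree(path, degree=DEGREE):
    """Mean degree of the nodes along a path through the fused graph."""
    return sum(degree[node] for node in path) / len(path)


path_a = ["Hurricane Nate", "in", "Louisiana", "with", "85", "mph", "winds"]
path_b = ["Hurricane Nate", "slammed", "Louisiana", "killed", "2", "people"]

# Both paths have coverage 2 on the slide, so node degree breaks the tie.
ranked = sorted([path_a, path_b], key=avg_node_degree, reverse=True)
```

Path A averages 15/7 and path B averages 15/6, so under the higher-is-better assumption the path containing both verbs is ranked first.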
Pipeline: Summary Generation
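• Entity: the canonical form from DBpedia
• Verb: the form most frequently used in the cluster
(Figure: the mentions “It” and “The hurricane” map to “Hurricane Nate”, “struck” to “slammed”, and “The Creole State” to “Louisiana”, giving the path “Hurricane Nate slammed Louisiana killed 2 people”.)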
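Choosing the verb's surface form amounts to a frequency count over the mentions in a verb cluster. A minimal sketch, assuming the cluster is available as a plain list of mentions; the list below is hypothetical and is not taken from the DUC data.

```python
from collections import Counter


def pick_verb(cluster_mentions):
    """Return the surface form that occurs most often in the cluster."""
    return Counter(cluster_mentions).most_common(1)[0][0]


# Hypothetical mentions of one verb cluster across several articles.
mentions = ["slammed", "struck", "slammed"]
chosen = pick_verb(mentions)
```

Here “slammed” is chosen because it occurs twice. How the original system breaks ties is not stated on the slide; `Counter.most_common` keeps the first-seen form when counts are equal.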
Pipeline: Summary Generation
Grammatical fixing for dangling verb
• Dangling verb identification (e.g., subject presence check)
• Verb active/passive type check
→ Fix verb (e.g., transform to participle)
(Figure: in “Hurricane Nate slammed Louisiana killed 2 people”, “slammed” has a subj and a dobj, “killed” has no clear subj, and “killed” is replaced by “killing”.)
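The dangling-verb fix can be sketched as a rule over clauses: a non-initial clause without a subject has its verb turned into a participle. This is a simplified illustration under assumptions. The clause dictionaries and the `PARTICIPLE` lookup are hypothetical, only active verbs are handled (the slide's active/passive check is not modelled), and a real system would rely on a parser and a morphological generator.

```python
# Tiny hypothetical lookup; a real system would generate participles
# with a morphological tool instead of a hand-written table.
PARTICIPLE = {"killed": "killing", "caused": "causing"}


def realize(clauses):
    """Join clauses, turning subject-less (dangling) verbs into participles."""
    parts = []
    for index, clause in enumerate(clauses):
        if clause["subj"] is None and index > 0:
            # Dangling verb: no subject of its own, so attach it as a participle.
            verb = PARTICIPLE.get(clause["verb"], clause["verb"])
            parts.append(f"{verb} {clause['obj']}")
        else:
            parts.append(f"{clause['subj']} {clause['verb']} {clause['obj']}")
    return ", ".join(parts) + "."


clauses = [
    {"subj": "Hurricane Nate", "verb": "slammed", "obj": "Louisiana"},
    {"subj": None, "verb": "killed", "obj": "2 people"},
]
sentence = realize(clauses)
```

For the slide's example this yields “Hurricane Nate slammed Louisiana, killing 2 people.”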
Evaluation
• DUC 2004 and DUC 2007 Dataset
• 95 news topics in total
• For each topic:
• 10 news articles
• 4 human summaries as the gold standard
• (for DUC’07) semantic gold standard
• Settings: different merging strategies to test
• entities, verbs, typings, non-stopwords
• Metrics:
• ROUGE: n-gram overlaps with the gold standard
• PYRAMID: semantic comparison with the semantic gold standard
• Human assessment to measure coherence (grammaticality) and correctness
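The ROUGE metric listed above, n-gram overlap with the gold standard, can be illustrated with a minimal recall computation. This is a sketch of ROUGE-N recall against a single reference with whitespace tokenization. It is not the official ROUGE toolkit, which also handles multiple references, stemming and other options, and the reference sentence is invented for the example.

```python
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Share of the reference n-grams that also occur in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each match by the number of times the n-gram occurs in the candidate.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


candidate = "Hurricane Nate slammed Louisiana killing 2 people"
reference = "Hurricane Nate slammed Louisiana and killed 2 people"  # invented gold summary

rouge_1 = rouge_n_recall(candidate, reference, n=1)
rouge_2 = rouge_n_recall(candidate, reference, n=2)
```

Six of the eight reference unigrams and four of the seven reference bigrams are matched, so `rouge_1` is 0.75 and `rouge_2` is 4/7.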
Results: Rouge
(Figure: Rouge-1 and Rouge-2 recall, shown in two panels, for the settings below.)
EL + T + NS
VL + T + NS
Ours (EL + VL + T + NS)
EL + VL + NS
EL + VL
EL + VL + T
Baseline (T + NS)
Baseline = Li et al. Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization (CIKM’15)
EL = entity linking, VL = verb linking, T = typing, NS = non-stopwords
Results: Pyramid
Manual: Humans judge quality on a 5-grade Likert scale
Pyramid: Semantic similarity with gold standard, checked using low-dimensional latent word vectors*
(Figure: manual scores for EL + VL + T and Baseline (T + NS); Pyramid recall for EL + T + NS, VL + T + NS, Ours (EL + VL + T + NS), EL + VL + NS, EL + VL, EL + VL + T and Baseline (T + NS).)
*Guo & Diab, ACL’12
Conclusions
• PGF is the best fit for abstractive summarization
• We enriched PGF with semantic annotations from knowledge bases (KBs):
• entity and verb linking
• path ranking leveraging node degrees
+ added grammatical fixes
• Experiments show our enrichments outperform the baseline
Future work
• More fine-grained representation of facts
• Semantic ranking
• Fluency
(Figure: “Hurricane Nate slammed Louisiana” with attached facts “killing …”, “causing …”, “On Saturday”, “After leaving…”, “With winds of …”.)
From Information Extraction to Abstractive Summarization
Thank you!
Wall-e summarizes: I found that, between extractive and abstractive summarization, the latter is what we want. Then I found that pattern graph fusion is the best fit for abstractive summarization because of the dilemma faced by phrase-selection-based approaches. I improved the PGF approach by introducing semantic enrichments, namely entity linking, verb linking, and path ranking leveraging node degrees, on top of adding some grammatical fixes. My experiments show that these enrichments outperform the baseline.

Modeling and Summarizing News Events Using Semantic Triples

  • 1. 0 Modeling and Summarizing News Events Using Semantic Triples ESWC’2018,Heraklion|Crete,Greece Radityo Eko Prasojo, Mouna Kacimi, Werner Nutt 5 June 2018
  • 2. Eve the News Reader 1 … the deal is unfair for US Industries … his desire to overturn anything that has Obama’s stamp of approval on it … and 1000++ more … Hurricane Nate slammed Louisiana … The hurricane killed 2 people … The pope sent his prayer … and 1000++ more … Three people sustained injuries … politicians are corrupt … police staged mass raids … and 1000++ more THIS IS TIME CONSUMING
  • 3. Wall-e promises… 2 Don’t worry Eve! I’ll build a summary for you! !
  • 5. The Chemistry of Abstractive Summarization “… Hurricane Nate slammed Louisiana … Nate struck The State of Louisiana … The hurricane in Louisiana killed 2 people …” “Hurricane” “Nate” “slammed” “Louisiana” “Nate” “struck” “the” “State” “of” “Louisiana” “The” “hurricane” “killed” “2” “people” Information Extraction Fact fusion or merging Extractive summary: “Hurricane Nate slammed Louisiana.” “The hurricane killed 2 people.” Abstractive summary, with conjoined facts: “Hurricane Nate slammed Louisiana, killing 2 people.” Open Information Extraction Entity Linking Verb Linking 4
  • 6. Abstractive News Summarization Approaches 5 Phrase-selection based (PSB) • Tentatively pairs subject and verb phrases from different sentences • Checks for compatibility Pattern-graph Fusion (PGF) • Looks for similar tokens in different sentences • Fuses the tokens, thus forming a graph
  • 7. PSB Example “Hurricane” “Nate” “slammed” “Louisiana” “It” “killed” “2” “people” compatible compatible similar “Hurricane Nate slammed Louisiana, killed 2 people” “Nate” “struck” “the” “State” “of” “Louisiana” compatible similar similar 6
  • 8. PSB Limitation 7 Dilemma: losing information or being redundant
  • 9. PSB Limitation: Subclause/Adverbial Fact “Hurricane” “Nate” “slammed” “Louisiana” “after leaving Central America” “Nate” “slammed Louisiana with 85mph winds” similar? Incomplete case: “Hurricane Nate slammed Louisiana after leaving Central America.” Redundant case: “Hurricane Nate slammed Louisiana after leaving Central America, slammed Louisiana with 85mph winds.” 8
  • 10. Pattern Graph Fusion (PGF) <people; killed by; hurricane> <people; died in; Louisiana> Extraction OLLIE 9 Stanford SEMAFOR <PERSON; killed by; PROTAGONIST> <PERSON; died in; LOCATION> Typing PERSON killed by died in PROTAG LOC Graph fusion *Li et al. Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization (CIKM’15)
  • 11. Pattern Graph Fusion: Extraction “… Hurricane Nate slammed Louisiana with 85 mph winds …” “… On Saturday, Nate struck Louisiana …” “… In Louisiana, the hurricane killed 2 people …” “Hurricane” “Nate” “slammed” “Louisiana” “with” “85” “mph” “winds” “On” “Saturday,” “Nate” “struck” “Louisiana” “The” “hurricane” “in” “Louisiana” “killed” “2” “people” OLLIE 10 Extraction Typing Fusion
  • 12. TypingExtraction Pattern Graph Fusion: Typing Stanford NLP & SEMAFOR “Hurricane” PERSON “slammed” LOCATION “with” “85” “mph” “winds” “On” DATE “,” PERSON “struck” LOCATION “The” KILLER “in” “Louisiana” “killed” “2” VICTIM “Hurricane” “Nate” “slammed” “Louisiana” “with” “85” “mph” “winds” “On” “Saturday,” “Nate” “struck” “Louisiana” “The” “hurricane” “in” “Louisiana” “killed” “2” “people” 11 Extraction Typing Fusion
  • 13. Fusion of Sentences along Types Hurricane PERSON slammed LOCATION with 85 mph winds On DATE , struck The KILLER killed 2 VICTIM “On Saturday, Nate slammed Louisiana with 85 mph winds” in 12 Extraction Typing Fusion PERSON slammed LOCATION with 85 mph winds On DATE ,
  • 14. Pattern Graph Fusion: Limitations #1 Hurricane PERSON slammed LOCATION with 85 mph winds On DATE , struck The KILLER killed 2 VICTIM “On Saturday, Nate sent his prayer for Louisiana with 85 mph winds” in Correctness: Typing might lead to incorrect result Pope sent his RITE for 13 PERSON LOCATION with 85 mph winds On DATE , sent his RITE for
  • 15. Pattern Graph Fusion: Limitations #2 • Coverage: Merging misses semantically similar verbs • “withdraw” vs “pull out” • Grammaticality: Merging leads to ungrammatical sentences • “The US pulled out from the Paris Agreement caused disappointment among environmentalist.” 14
  • 16. Our Summarization We improve • Correctness and coverage by Entity Linking • Coverage by Verb Linking • Grammaticality by Grammatical Fixing 15
  • 17. Clustering Fusion, Ranking and Generation Our Summarization: Pipeline Dataset Triple Extraction Entity Linking (EL) Verb Linking (VL) Fusion, Ranking and Generation Summary Typing 16
  • 18. Pipeline: Entity Linking 17 • Named entity recognition: DBpedia Spotlight • Name normalization: DBpedia • Coreference resolution: DBpedia Stanford Coref Hurricane Nate slammed Louisiana with 85 winds The hurricane in The Creole State killed 2 people It struck Central America with 90 dbp:type hurricane dbo:WikiPageRedirectsOf winds mph mph
  • 19. Pipeline: Verb Linking 18 • WordNet Similarity > 0.9 It struck Central America on Monday Hurricane Nate slammed Louisiana with 85 windsmph
  • 20. Pipeline: Fusion and Ranking 19 #1 • What should be merged? • Entities, Verbs • Types of non-entity and non-verb tokens, if exist • Non-stopword tokens • The fused graph when merging entities and verbs: killed 2 people Hurricane Nate slammed Louisiana with 85 windsmph Central America with 90 windsmph in 2
  • 21. • Path collection • Path ranking criteria • Path coverage • Node degree • “Hurricane Nate in Louisiana with 85 mph winds” • Path coverage = 2, Avg node degree = 15/7 = 2.14 • “Hurricane Nate slammed Louisiana killed 2 people” • Path coverage = 2, Avg node degree = 15/6 = 2.5 Pipeline: Fusion and Ranking 20 #2 killed 2 people 2 4 4 2 2 1 2 2 1 2 2 2 1 Hurricane Nate slammed Louisiana with 85 windsmph 2 Central America with 90 windsmph 2 in 2 2
  • 22. Pipeline: Summary Generation • Entity: the canonical form from DBpedia • Verb: the form most frequently used in the cluster [Figure: path “Hurricane Nate slammed Louisiana killed 2 people”, with the variants “It”, “The hurricane”, “struck” and “The Creole State” shown alongside]
  • 23. Pipeline: Summary Generation • Grammatical fixing for dangling verb • Dangling verb identification (e.g., subject presence check) • Verb active/passive type check → Fix verb (e.g., transform to participle) [Figure: “Hurricane Nate slammed Louisiana killed 2 people” with dependency labels subj, subj? and dobj; “killed” is replaced by “killing”]
  • 24. Evaluation • DUC 2004 and DUC 2007 datasets • 95 news topics in total • For each topic: • 10 news articles • 4 human summaries as the gold standard • (for DUC’07) semantic gold standard • Settings: different merging strategies to test • entities, verbs, typings, non-stopwords • Metrics: • ROUGE: n-gram overlaps with the gold standard • PYRAMID: semantic comparison with the semantic gold standard • Human assessment to measure coherence (grammaticality) and correctness
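The ROUGE metric named on this slide measures n-gram overlap with the gold standard, and the results slides report recall. A minimal sketch of ROUGE-N recall against a single reference is shown below; it assumes lowercased whitespace tokenization and no stemming, whereas the official toolkit supports multiple references and further options. The helper names and the two example sentences are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Multiset of the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Fraction of reference n-grams that also occur in the candidate."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    # Clip each n-gram's credit by how often the candidate contains it.
    overlap = sum(min(count, cand[gram]) for gram, count in ref.items())
    return overlap / total


candidate = "Hurricane Nate slammed Louisiana killing 2 people"
reference = "Hurricane Nate slammed Louisiana and killed 2 people"
r1 = rouge_n_recall(candidate, reference, n=1)  # 6 of 8 unigrams matched
r2 = rouge_n_recall(candidate, reference, n=2)  # 4 of 7 bigrams matched
```

Note that the score only rewards surface overlap: “killing” gets no credit for “killed”, which is one reason the evaluation also uses the semantic PYRAMID comparison and human assessment.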
  • 25. Results: Rouge [Figure: Rouge-1 and Rouge-2 recall per setting; legend entries: EL + T + NS, VL + T + NS, Our (VL), EL + VL + T + NS, EL + VL + NS, EL + VL, EL + VL + T, Baseline (T + NS)] Baseline = Li et al., Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization (CIKM’15). EL = entity linking, VL = verb linking, T = typing, NS = non-stopwords
  • 26. Results: Pyramid • Pyramid: Semantic similarity with gold standard, checked using low-dimensional latent word vector* • Manual: Humans judge quality on a 5-grade Likert scale [Figure: Pyramid recall per setting; legend entries: EL + T + NS, VL + T + NS, Our (VL), EL + VL + T + NS, EL + VL + NS, EL + VL, EL + VL + T, Baseline (T + NS); manual assessment shown for EL+VL+T and Baseline (T + NS)] Baseline = Li et al., Weakly Supervised Natural Language Processing Framework for Abstractive Multi-Document Summarization (CIKM’15). EL = entity linking, VL = verb linking, T = typing, NS = non-stopwords. *Guo & Diab, ACL’12
  • 27. Conclusions • PGF is the best fit for abstractive summarization • We enriched PGF with semantic annotations from KBs: • entity and verb linking • path ranking leveraging node degrees + added grammatical fixes • Experiments show our enrichments outperform the baseline
  • 28. Future work • More fine-grained representation of facts • Semantic ranking • Fluency [Figure: “Hurricane Nate slammed Louisiana” extended with “killing …”, “causing …”, “On Saturday”, “After leaving…” and “With winds of …”]
  • 29. From Information Extraction to Abstractive Summarization Thank you! Wall-e summarizes: I found that between extractive and abstractive summarization, the latter is what we want. Then, I found that pattern graph fusion is the best fit for abstractive summarization, given the dilemma of phrase selection-based approaches. I improved the PGF approach by introducing semantic enrichments, namely entity linking, verb linking, and path ranking leveraging node degrees, on top of adding some grammatical fixes. My experiment shows that our enrichments outperform the baseline.

Editor's Notes

  1. Summary: PSB is limited because it sticks to coarse-grained sentences, not fine-grained facts.
  2. Address: appropriate granularity for entity and verb grouping/classification. Next, show pipeline. Animation: first show the state-of-the-art approach in white, without EL and VL. Then add EL and VL in blue, and adjust Clustering and Fusion etc. by coloring them green. Maybe: show classification in the original version, then say that we replaced it with clustering.
  3. Change order of the animation (still valid!)
  4. Node degree should go with animation. Raise questions: What should be merged? Only entities, verbs, types (strict merging)? Or also non-stopwords? 1st question: What does the graph look like? 2nd question: Which paths should be selected? They should contain important nodes -> node degree; identify all (maximal?) paths; maximize average node degree in path.
  5. Try another sentence from the Hurricane scenario
  6. Change B+ into Our Approach (…). Fill in Rouge-1, etc.
  7. Say this is from PYRAMID evaluation, rhs is manual. Show one column only.
  8. Too much!
  9. Salience: Add semantics to the ranking. Exploiting semantics for addressing importance.