SEED: A Framework for Extracting
Social Events from Press News
University Ca’ Foscari – Venice
WWW2013 Rio de Janeiro - May 13th, 2013
Salvatore Orlando
orlando@unive.it
Francesco Pizzolon
pizzolon.francesco@gmail.com
Gabriele Tolomei
gabriele.tolomei@unive.it
Overview
• Introduction to the problem
• Background
• SEED
• Experiments
• Results
• Conclusions and future work
Intro Background SEED Experiments Results Conclusions
Places
Entertainment
Events
[Diagram: news agencies → portal’s editorial division → events DB → yourportal.com, with steps 1 to 5 marked]
1. A news agency composes the press news
2. The press news is sent to the portal’s editorial division by mail
3. A journalist reads and analyzes the long, verbose press news
4. New entertainment events are added to the events DB
5. The journalist publishes the event on the portal’s site
GOAL: automate step 3, helping journalists identify the right events
Events creation process
Information Extraction
Starting from unstructured text we have to extract structured information
Named Entity Recognition (NER)
Find entities of the classes:
• Date
• Location
• Place
• Artist
Relation Extraction (RE)
Find 3-ary tuples in the form:
• (Date, Location, Artist)
• (Date, Place, Artist)
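The two targets above can be written down as plain data types. This is only an illustrative sketch: the names `EntityClass`, `Entity` and `EventTuple` are assumptions made for this example and do not come from the SEED framework; the short tag values mirror the `date`, `loc`, `place` and `art` labels used in the annotated sample later in the deck.

```python
from dataclasses import dataclass
from enum import Enum
from typing import NamedTuple


class EntityClass(Enum):
    """The four entity classes the NER phase looks for."""
    DATE = "date"
    LOCATION = "loc"
    PLACE = "place"
    ARTIST = "art"


@dataclass(frozen=True)
class Entity:
    """One mention found in the text (hypothetical container)."""
    text: str
    cls: EntityClass


class EventTuple(NamedTuple):
    """A 3-ary relation: (Date, Location, Artist) or (Date, Place, Artist)."""
    date: str
    where: str   # holds either a Location or a Place
    artist: str


example = EventTuple("Martedì 24 Luglio", "Roma", "Anna Calvi")
```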
Il 2011 e' stato il suo anno. L'omonimo album di debutto l’ha resa celebre in ogni dove coronandola
"la nuova musa made in UK".
Un grande successo di pubblico e critica ottenuto grazie alla vincente combinazione di bravura, classe
e passione che Anna Calvi riesce ad esprimere con la sua musica e attraverso i live show. Anna Calvi e'
una grande artista, una fuoriclasse.
Gia' indaffarata per i prossimi show estivi che la vedranno ospite di numerosi ed importanti festival,
Anna Calvi fara' tappa in Italia con prevendite attive da Lunedì 14 Maggio sui circuiti vivaticket.it,
ticketone.it.
Martedì 24 Luglio
Roma – Parco di San Sebastiano
Roma Vintage
Via di Porta San Sebastiano 2 (P.le Numa Pompilio), 00187 Roma
Biglietto: 15,00 euro + d.p.
L'album di debutto si sviluppa sulla straordinaria chitarra di Anna e sulla sua potente e ammaliante
voce; e' un album indimenticabile e appassionante. Influenzata dalle vocalita' di artisti diversi come
Nina Simone, Maria Callas e Scott Walker, dalle chitarre di Django Rheinhard e Robert Johnson, dal
classico romanticismo di Ravel e Debussy, Anna Calvi anche se ispirata da musicisti di un lontano
passato, ha un sound totalmente attuale ma soprattutto originale. Complici lo sguardo ipnotico e una
bellezza sensuale, Anna Calvi ha conquistato le copertine ed intere pagine delle migliori riviste e
magazine francesi, tedeschi ed Italiani.
Benvenuti nel magico mondo di Anna Calvi – un luogo dove bellezza e oscurita' complottano e si
scontrano tra loro, dove indomite emozioni conquistano e consumano.
A sample press news
Named Entity Recognition (NER)
Knowledge-based
• Requires: a dictionary for every entity class
• PROs: fast performance; high precision score; no labeled corpus needed
• CONs: dictionaries need updates; creating new dictionaries requires effort
Rule-based
• Requires: a set of rules; policies to apply the rules
• PROs: no labeled corpus needed
• CONs: hand-creating rules is tedious
Statistical
• Requires: a large corpus with labeled examples; a model for text decomposition; algorithms to train and deploy the model
• PROs: domain insensitive
• CONs: large corpora for new domains are unavailable
Relation Extraction (RE)
Supervised
• Requires: a set of features to train a classifier; a labeled corpus
• PROs: can be used with any relation
• CONs: difficult to extend; requires preprocessing the input; extension to higher-order relations is difficult
Semi-supervised (DIPRE, Snowball)
• Requires: a given relation; a seed set
• PROs: high precision; no need for labeled data
• CONs: rely on an NER tagger
• DIPRE: hard pattern matching
• Snowball: soft pattern matching
TextRunner
• Requires: a self-supervised learner; a single-pass extractor; a redundancy-based assessor
• PROs: no relationship given
• CONs: relies on a dependency parser to self-annotate training data
Named Entity Recognition Approach
GOALS
find entities of the classes Date, Location, Place and Artist in unstructured text
ISSUES
closed domain, no labeled corpus, press news are in Italian
SOLUTIONS
• Date: predefined forms → rule-based methods
• Location: present in Wikipedia → knowledge-based approach
• Place: present in company’s database → knowledge-based approach
• Artist: present in Wikipedia → knowledge-based approach
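The mix of rule-based and knowledge-based tagging listed above could look roughly like the sketch below. It is not the SEED implementation: the dictionaries are tiny hand-written stand-ins for the Wikipedia and company-database lookups, the single date pattern (weekday, day number, month name) is an assumed example of a "predefined form", and `tag_entities` is a hypothetical helper name.

```python
import re

# Hypothetical, tiny dictionaries. Per the slides, the real ones would come from
# Wikipedia (Location, Artist) and from the company's database (Place).
DICTIONARIES = {
    "loc": {"Roma"},
    "place": {"Roma Vintage"},
    "art": {"Anna Calvi", "Nina Simone", "Ravel"},
}

DAYS = "Lunedì|Martedì|Mercoledì|Giovedì|Venerdì|Sabato|Domenica"
MONTHS = ("Gennaio|Febbraio|Marzo|Aprile|Maggio|Giugno|Luglio|Agosto|"
          "Settembre|Ottobre|Novembre|Dicembre")
# One assumed Italian date form: weekday, day number, month name.
DATE_RE = re.compile(rf"\b(?:{DAYS})\s*\d{{1,2}}\s+(?:{MONTHS})\b", re.IGNORECASE)


def tag_entities(text):
    """Return non-overlapping (start, end, tag, surface) spans, in text order."""
    # Rule-based part: dates matched by a regular expression.
    spans = [(m.start(), m.end(), "date", m.group()) for m in DATE_RE.finditer(text)]
    # Knowledge-based part: exact dictionary lookups on word boundaries.
    for tag, names in DICTIONARIES.items():
        for name in names:
            for m in re.finditer(rf"\b{re.escape(name)}\b", text):
                spans.append((m.start(), m.end(), tag, m.group()))
    # Prefer longer matches so "Roma Vintage" wins over the "Roma" inside it.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    result, last_end = [], -1
    for span in spans:
        if span[0] >= last_end:
            result.append(span)
            last_end = span[1]
    return result
```

Resolving overlaps by keeping the longest match is a design choice of this sketch; the slides do not say how SEED handles a Location name nested inside a Place name.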
Il 2011 e' stato il suo anno. L'omonimo album di debutto l’ha resa celebre in ogni dove coronandola "la
nuova musa made in UK".
Un grande successo di pubblico e critica ottenuto grazie alla vincente combinazione di bravura, classe e
passione che [art Anna Calvi] riesce ad esprimere con la sua musica e attraverso i live show. [art Anna
Calvi] e' una grande artista, una fuoriclasse.
Gia' indaffarata per i prossimi show estivi che la vedranno ospite di numerosi ed importanti festival, [art
Anna Calvi] fara' tappa in Italia con prevendite attive da [date Lunedì 14 Maggio] sui circuiti vivaticket.it,
ticketone.it.
[date Martedì 24 Luglio]
[loc Roma] – Parco di San Sebastiano
[place Roma Vintage]
Via di Porta San Sebastiano 2 (P.le Numa Pompilio), 00187 [loc Roma]
Biglietto: 15,00 euro + d.p.
L'album di debutto si sviluppa sulla straordinaria chitarra di Anna e sulla sua potente e ammaliante voce;
e' un album indimenticabile e appassionante. Influenzata dalle vocalita' di artisti diversi come [art Nina
Simone], [art Maria Callas] e [art Scott Walker], dalle chitarre di [art Django Rheinhard] e [art Robert
Johnson], dal classico romanticismo di [art Ravel] e [art Debussy], [art Anna Calvi] anche se ispirata da
musicisti di un lontano passato, ha un sound totalmente attuale ma soprattutto originale. Complici lo
sguardo ipnotico e una bellezza sensuale, [art Anna Calvi] ha conquistato le copertine ed intere pagine
delle migliori riviste e magazine francesi, tedeschi ed Italiani.
Benvenuti nel magico mondo di [art Anna Calvi] – un luogo dove bellezza e oscurità complottano e si
scontrano tra loro, dove indomite emozioni conquistano e consumano.
The sample press news after NER phase
Relation Extraction Approach
GOALS
find two predefined relations between the extracted entities:
• (Date, Location, Artist)
• (Date, Place, Artist)
ISSUES
events within press news span more than a single sentence, but state-of-the-art methods work at the sentence level
HINT
Documents about entertainment events are often abundant on the Social Web
Blogs Social networks
SOLUTION
Use external, fresh social knowledge to infer the right entertainment events,
in particular to disambiguate during the Relation Extraction task
Which fresh social knowledge?
• Encyclopedic one? Too static: events are inserted after they happen!
• Social networks? Data is not structured for our purpose
• … and what about SEs (search engines)? Well, they return documents related and relevant to a given query… Let’s try!
Scoring tuples against the SE result list
Example tuple: (Martedì 24 luglio, Roma, Anna Calvi)
Scoring principles
• product of frequency counts
• more importance to title matches than to snippet matches
• more importance to top results
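One way to read the three scoring principles above is sketched below. The slides do not give the actual formula, so everything concrete here is an assumption: the function name `score_tuple`, the title and snippet weights (2.0 and 1.0), the representation of a result as a `(title, snippet)` pair, and the `1 / rank` discount used as an example of favouring top results.

```python
def uniform_weight(rank):
    """Same importance for every result, whatever its rank."""
    return 1.0


def reciprocal_weight(rank):
    """Assumed example of favouring top results: weight 1, 1/2, 1/3, ..."""
    return 1.0 / rank


def score_tuple(candidate, results, title_weight=2.0, snippet_weight=1.0,
                rank_weight=uniform_weight):
    """Score a candidate tuple against a ranked list of (title, snippet) pairs.

    For each entity of the tuple, sum its weighted occurrences over all
    results, then multiply the per-entity totals together. A tuple with an
    entity that never appears in the result list therefore scores 0.
    """
    score = 1.0
    for entity in candidate:
        needle = entity.lower()
        total = 0.0
        for rank, (title, snippet) in enumerate(results, start=1):
            hits = (title_weight * title.lower().count(needle)
                    + snippet_weight * snippet.lower().count(needle))
            total += rank_weight(rank) * hits
        score *= total
    return score


# Made-up result list, only to exercise the function.
toy_results = [
    ("Anna Calvi a Roma", "Martedì 24 luglio Anna Calvi in concerto a Roma"),
    ("Concerti estate", "Anna Calvi, 24 luglio"),
]
```

Taking the product rather than the sum means that all three entities must co-occur in the result list for a tuple to receive a non-zero score, which is what lets a wrong date such as the ticket pre-sale day drop out.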
NER step
• Date: Lunedì 14 Maggio, Martedì 24 Luglio
• Location: Roma
• Place: Roma Vintage
• Artist: Nina Simone, Maria Callas, Anna Calvi, Scott Walker, Django Rheinhard, Debussy, Ravel
RE step
Candidate Extraction
(Lunedì 14 maggio, Roma, Anna Calvi),
(Lunedì 14 maggio, Roma Vintage, Anna Calvi),
…
(Lunedì 14 maggio, Roma Vintage, Ravel),
(Martedì 24 luglio, Roma, Anna Calvi),
(Martedì 24 luglio, Roma Vintage, Anna Calvi),
…
(Martedì 24 luglio, Roma Vintage, Ravel)
Candidate Ranking
(Martedì 24 luglio, Roma, Anna Calvi),
(Martedì 24 luglio, Roma Vintage, Anna Calvi)
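The candidate extraction and ranking steps above amount to a Cartesian product over the NER output followed by a sort. The sketch below assumes exactly that; `candidate_tuples` and `rank_candidates` are hypothetical helper names, and the scoring function is passed in by the caller rather than taken from SEED.

```python
from itertools import product


def candidate_tuples(dates, locations, places, artists):
    """Every (date, location-or-place, artist) combination of the NER output."""
    where = list(locations) + list(places)
    return list(product(dates, where, artists))


def rank_candidates(candidates, score, top_k=2):
    """Keep the top_k candidates with the highest score (score is caller-supplied)."""
    return sorted(candidates, key=score, reverse=True)[:top_k]


candidates = candidate_tuples(
    dates=["Lunedì 14 Maggio", "Martedì 24 Luglio"],
    locations=["Roma"],
    places=["Roma Vintage"],
    artists=["Nina Simone", "Maria Callas", "Anna Calvi", "Scott Walker",
             "Django Rheinhard", "Debussy", "Ravel"],
)
```

The number of candidates grows multiplicatively with the number of entities per class, which is why a ranking step driven by external evidence is needed to single out the real events.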
DATASET
One hundred press news, provided by the company, manually labeled by a
member of the editorial office
Evaluation of a Class in NER phase
Precision: # correctly labeled entities / # labeled entities
Recall: # correctly labeled entities / # true (manually) labeled entities
F-measure: harmonic mean between Precision and Recall
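The three measures above can be computed from two sets of labels, as in this minimal sketch. The function name and the choice to return 0.0 when a denominator is empty are assumptions of the example, not something the slides specify.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall and F-measure of predicted labels against gold labels."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)          # correctly labeled items
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # Harmonic mean of precision and recall.
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```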
Evaluation of the RE phase
Precision: # correctly labeled relations / # labeled relations
Recall: # correctly labeled relations / # true (manually) labeled relations
F-measure: harmonic mean between Precision and Recall
Baselines
Baseline1: if an artist, a place and a date are named in the same sentence, then a tuple containing them is returned.
Baseline2: if an artist, a place and a date are named more often than the others, the corresponding tuple is returned.
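Baseline2 can be read as a frequency vote per entity class. The sketch below follows that reading; the input format (a dict from tag to the list of mentions) and the name `baseline2` are assumptions, and the slides do not say how ties between equally frequent mentions are broken, so this sketch simply takes whatever `Counter.most_common` returns.

```python
from collections import Counter


def baseline2(mentions):
    """Return the (date, place, artist) tuple built from the most frequent mentions.

    mentions maps a tag ("date", "place", "art") to the list of mentions of
    that class found in one press news item. Returns None if any class has
    no mention at all.
    """
    def most_frequent(tag):
        counts = Counter(mentions.get(tag, []))
        return counts.most_common(1)[0][0] if counts else None

    picked = (most_frequent("date"), most_frequent("place"), most_frequent("art"))
    return picked if all(p is not None for p in picked) else None
```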
SEED
Linear SEED: same importance given to all SERP elements
Non-Linear SEED: more importance given to top-K SERP elements
Named Entity Recognition Evaluation
Total F-measure around 81%
Relation Extraction Evaluation
LINEAR (same importance given to all results): F-measure around 70.2%
NON-LINEAR (more importance given to top results): F-measure around 70.5%
What we did so far
• Introduced a novel RE technique that extracts our predefined relations by exploiting the Social Web, for a real-world application
• Developed a framework called SEED implementing our strategy
• Evaluated SEED together with two baselines
Future work
• Improving the NER phase
• Evaluating RE when an optimal NER is used, and vice versa
• Exploiting other sources of social knowledge
Thanks! Now Q&A
