Seed

Slides from my talk at http://wole2013.eurecom.fr/

Transcript

  • 1. SEED: A Framework for Extracting Social Events from Press News
      University Ca’ Foscari, Venice
      WWW2013, Rio de Janeiro, May 13th, 2013
      Salvatore Orlando (orlando@unive.it), Francesco Pizzolon (pizzolon.francesco@gmail.com), Gabriele Tolomei (gabriele.tolomei@unive.it)
  • 2. Overview
      • Introduction to the problem
      • Background
      • SEED
      • Experiments
      • Results
      • Conclusions and future work
  • 3. Places, entertainment, events
      Intro Background SEED Experiments Results Conclusions
  • 4. Events creation process
      Actors: news agencies, the portal's editorial division, the events DB, and the portal (yourportal.com).
      1. A news agency composes the press news.
      2. The press news is sent to the portal's editorial division by mail.
      3. A journalist reads and analyzes the long, verbose press news.
      4. New entertainment events are added to the events DB.
      5. The journalist publishes the event on the portal's site.
      GOAL: automate step 3, helping journalists identify the right events.
  • 5. Information Extraction
      Starting from unstructured text, we have to extract structured information.
      • Named Entity Recognition (NER): find entities of the classes Date, Location, Place and Artist.
      • Relation Extraction (RE): find 3-ary tuples of the form (Date, Location, Artist) or (Date, Place, Artist).
  • 6. A sample press news item (translated from the Italian original; dates, places and addresses kept as written)
      2011 was her year. Her self-titled debut album made her famous everywhere, crowning her "the new muse made in UK". A great success with audiences and critics, achieved thanks to the winning combination of skill, class and passion that Anna Calvi expresses through her music and her live shows. Anna Calvi is a great artist, a true champion. Already busy with her upcoming summer shows, which will see her as a guest at many major festivals, Anna Calvi will stop in Italy, with presales open from Lunedì 14 Maggio (Monday, May 14) on the vivaticket.it and ticketone.it circuits.
      Martedì 24 Luglio (Tuesday, July 24)
      Roma – Parco di San Sebastiano
      Roma Vintage
      Via di Porta San Sebastiano 2 (P.le Numa Pompilio), 00187 Roma
      Ticket: 15.00 euro + presale fee
      The debut album is built on Anna's extraordinary guitar and her powerful, bewitching voice; it is an unforgettable, passionate album. Influenced by the voices of artists as different as Nina Simone, Maria Callas and Scott Walker, by the guitars of Django Rheinhard and Robert Johnson, and by the classical romanticism of Ravel and Debussy, Anna Calvi, though inspired by musicians of a distant past, has a sound that is thoroughly contemporary and, above all, original. Helped by a hypnotic gaze and a sensual beauty, Anna Calvi has won the covers and entire pages of the best French, German and Italian magazines. Welcome to the magical world of Anna Calvi: a place where beauty and darkness conspire and clash, where untamed emotions conquer and consume.
  • 7. Named Entity Recognition (NER): approaches
      • Knowledge-based
        Requires: a dictionary for every entity class
        PROs: fast performance; high precision; no labeled corpus needed
        CONs: dictionaries need updates; creating new dictionaries requires effort
      • Rule-based
        Requires: a set of rules; policies for applying the rules
        PROs: no labeled corpus needed
        CONs: hand-crafting rules is tedious
      • Statistical
        Requires: a large corpus with labeled examples; a model for text decomposition; algorithms to train and deploy the model
        PROs: domain-insensitive
        CONs: large labeled corpora for new domains are unavailable
  • 8. Relation Extraction (RE): approaches
      • Supervised
        Requires: a set of features to train a classifier; a labeled corpus
        PROs: can be used with any relation
        CONs: difficult to extend; requires preprocessing of the input; extension to higher-order relations is difficult
      • Semi-supervised (DIPRE, Snowball)
        Requires: a given relation; a seed set; an NER tagger (Snowball)
        Pattern matching: hard (DIPRE), soft (Snowball)
        PROs: high precision; no need for labeled data
      • TextRunner
        Components: self-supervised learner, single-pass extractor, redundancy-based assessor
        Requires: a dependency parser to self-annotate the training data
        PROs: no relationship needs to be given
  • 10. Named Entity Recognition Approach
      GOALS: find entities of the classes Date, Location, Place and Artist in unstructured text.
      ISSUES: closed domain; no labeled corpus; the press news are in Italian.
      SOLUTIONS:
      • Date: written in predefined forms, so rule-based methods are used
      • Location: present in Wikipedia, so a knowledge-based approach is used
      • Place: present in the company's database, so a knowledge-based approach is used
      • Artist: present in Wikipedia, so a knowledge-based approach is used
  • 12. The sample press news after the NER phase (translated; tagged entity strings kept in the original Italian)
      2011 was her year. Her self-titled debut album made her famous everywhere, crowning her "the new muse made in UK". A great success with audiences and critics, achieved thanks to the winning combination of skill, class and passion that [art Anna Calvi] expresses through her music and her live shows. [art Anna Calvi] is a great artist, a true champion. Already busy with her upcoming summer shows, which will see her as a guest at many major festivals, [art Anna Calvi] will stop in Italy, with presales open from [date Lunedì 14 Maggio] on the vivaticket.it and ticketone.it circuits.
      [date Martedì 24 Luglio]
      [loc Roma] – Parco di San Sebastiano
      [place Roma Vintage]
      Via di Porta San Sebastiano 2 (P.le Numa Pompilio), 00187 [loc Roma]
      Ticket: 15.00 euro + presale fee
      The debut album is built on Anna's extraordinary guitar and her powerful, bewitching voice; it is an unforgettable, passionate album. Influenced by the voices of artists as different as [art Nina Simone], [art Maria Callas] and [art Scott Walker], by the guitars of [art Django Rheinhard] and [art Robert Johnson], and by the classical romanticism of [art Ravel] and [art Debussy], [art Anna Calvi], though inspired by musicians of a distant past, has a sound that is thoroughly contemporary and, above all, original. Helped by a hypnotic gaze and a sensual beauty, [art Anna Calvi] has won the covers and entire pages of the best French, German and Italian magazines. Welcome to the magical world of [art Anna Calvi]: a place where beauty and darkness conspire and clash, where untamed emotions conquer and consume.
  • 13. Relation Extraction Approach
      GOALS: find two predefined relations between the extracted entities:
      • (Date, Location, Artist)
      • (Date, Place, Artist)
      ISSUES: events in press news span more than a single sentence, but state-of-the-art methods work at the sentence level.
      HINT: documents about entertainment events are often abundant on the Social Web.
  • 14. Blogs, social networks
      SOLUTION: use external fresh social knowledge to infer the right entertainment events, in particular to disambiguate candidates in the Relation Extraction task.
  • 16. Which fresh social knowledge?
      • Encyclopedic knowledge? Too static: events are inserted only after they happen!
      • Social networks? The data is not structured for our purpose.
      • … and what about search engines (SEs)? Well, given a query, they return related and relevant documents. Let's try!
  • 17. Scoring tuples against the SE result list
      Example tuple: (Martedì 24 luglio, Roma, Anna Calvi)
      Scoring principles:
      • product of frequency counts
      • more importance to title matches than to snippet matches
      • more importance to top results
  • 18. From the NER step to the RE step
      NER step output:
      • Date: Lunedì 14 Maggio, Martedì 24 Luglio
      • Location: Roma
      • Place: Roma Vintage
      • Artist: Nina Simone, Maria Callas, Anna Calvi, Scott Walker, Django Rheinhard, Debussy, Ravel
      RE step, candidate extraction:
      (Lunedì 14 maggio, Roma, Anna Calvi), (Lunedì 14 maggio, Roma Vintage, Anna Calvi), …, (Lunedì 14 maggio, Roma Vintage, Ravel), (Martedì 24 luglio, Roma, Anna Calvi), (Martedì 24 luglio, Roma Vintage, Anna Calvi), …, (Martedì 24 luglio, Roma Vintage, Ravel)
      RE step, candidate ranking:
      (Martedì 24 luglio, Roma, Anna Calvi), (Martedì 24 luglio, Roma Vintage, Anna Calvi)
  • 19. Dataset and NER evaluation
      DATASET: one hundred press news items provided by the company, manually labeled by a member of the editorial office.
      Evaluation of a class in the NER phase:
      • Precision = # correctly labeled entities / # labeled entities
      • Recall = # correctly labeled entities / # true (manually) labeled entities
      • F-measure = harmonic mean of Precision and Recall
  • 20. Evaluation of the RE phase
      • Precision = # correctly labeled relations / # labeled relations
      • Recall = # correctly labeled relations / # true (manually) labeled relations
      • F-measure = harmonic mean of Precision and Recall
      Baselines:
      • Baseline1: if an artist, a place and a date are named in the same sentence, a tuple containing them is returned.
      • Baseline2: the tuple made of the artist, the place and the date named more often than the others is returned.
      SEED variants:
      • Linear SEED: the same importance is given to all SERP elements.
      • Non-Linear SEED: more importance is given to the top-K SERP elements.
  • 21. Named Entity Recognition evaluation
      Total F-measure around 81%
  • 22. Relation Extraction evaluation
      • LINEAR (same importance given to all results): F-measure around 70.2%
      • NON-LINEAR (more importance given to top results): F-measure around 70.5%
  • 23. Conclusions and future work
      What we did so far:
      • Introduced a novel RE technique that exploits the Social Web to extract our predefined relations in a real-world application
      • Developed a framework called SEED that implements this strategy
      • Evaluated SEED against two baselines
      Future work:
      • Improve the NER phase
      • Evaluate RE when an optimal NER is used, and vice versa
      • Exploit other sources of social knowledge
  • 24. Thanks! Now Q&A
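Slide 10 splits NER into a rule-based part (dates) and a knowledge-based part (dictionary lookup for Location, Place and Artist). Below is a minimal sketch of that split. It assumes dates appear as "<weekday> <day> <month>" in Italian, and it uses tiny hypothetical gazetteers in place of the Wikipedia and company-database dictionaries. It is not the authors' implementation.

```python
import re

# Hypothetical, tiny gazetteers. The slides say Location and Artist names come
# from Wikipedia and Place names from the company's database; these sets are
# illustrative stand-ins, not the real dictionaries.
GAZETTEERS = {
    "loc": {"Roma"},
    "place": {"Roma Vintage"},
    "art": {"Anna Calvi", "Nina Simone", "Ravel"},
}

# Assumed date form: "<weekday> <day> <month>" in Italian, e.g. "Martedì 24 Luglio".
DAYS = "Lunedì|Martedì|Mercoledì|Giovedì|Venerdì|Sabato|Domenica"
MONTHS = ("Gennaio|Febbraio|Marzo|Aprile|Maggio|Giugno|Luglio|Agosto|"
          "Settembre|Ottobre|Novembre|Dicembre")
DATE_RE = re.compile(rf"\b(?:{DAYS})\s+\d{{1,2}}\s+(?:{MONTHS})\b", re.IGNORECASE)


def tag_entities(text):
    """Return non-overlapping (start, end, cls, surface) spans, left to right."""
    # Rule-based step: dates matched by a regular expression.
    spans = [(m.start(), m.end(), "date", m.group()) for m in DATE_RE.finditer(text)]
    # Knowledge-based step: exact dictionary lookup for every other class.
    for cls, names in GAZETTEERS.items():
        for name in names:
            for m in re.finditer(rf"\b{re.escape(name)}\b", text):
                spans.append((m.start(), m.end(), cls, name))
    # Resolve overlaps: at each start position prefer the longest span,
    # then drop any span that begins inside an already kept one.
    spans.sort(key=lambda s: (s[0], -(s[1] - s[0])))
    kept, last_end = [], -1
    for span in spans:
        if span[0] >= last_end:
            kept.append(span)
            last_end = span[1]
    return kept


if __name__ == "__main__":
    sample = ("Martedì 24 Luglio Roma – Parco di San Sebastiano "
              "Roma Vintage con Anna Calvi")
    for _, _, cls, surface in tag_entities(sample):
        print(cls, surface)
```

The longest-match rule is one simple way to make "Roma Vintage" win over the "Roma" nested inside it, which matches how slide 12 tags that line. The slides do not say how SEED resolves such overlaps.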
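Slide 17 lists three scoring principles: a product of frequency counts, title matches weighted above snippet matches, and top results weighted above lower ones. Slide 20 separates a Linear variant (all SERP elements equal) from a Non-Linear one (top-K elements weighted more). The following is one possible reading of those principles. The weights (`TITLE_WEIGHT`, `SNIPPET_WEIGHT`, `TOP_K_WEIGHT`) and the step-shaped top-K boost are hypothetical, because the slides do not give the actual formula.

```python
from dataclasses import dataclass


@dataclass
class SearchResult:
    """One SERP entry; only the title and snippet are used here."""
    title: str
    snippet: str


# Hypothetical weights: the slides only say title matches matter more than
# snippet matches and top results more than lower ones, not by how much.
TITLE_WEIGHT = 2.0
SNIPPET_WEIGHT = 1.0
TOP_K_WEIGHT = 2.0


def rank_weight(rank, top_k=None):
    """Linear variant (top_k=None): every result counts the same.
    Non-linear variant: results ranked within top_k count TOP_K_WEIGHT times."""
    if top_k is None or rank >= top_k:
        return 1.0
    return TOP_K_WEIGHT


def score_tuple(candidate, results, top_k=None):
    """Score a (date, location-or-place, artist) tuple against a result list
    as the product of the weighted frequency counts of its elements."""
    score = 1.0
    for element in candidate:
        needle = element.lower()
        weighted_count = 0.0
        for rank, result in enumerate(results):
            hits = (TITLE_WEIGHT * result.title.lower().count(needle)
                    + SNIPPET_WEIGHT * result.snippet.lower().count(needle))
            weighted_count += rank_weight(rank, top_k) * hits
        score *= weighted_count  # an element never seen drives the score to 0
    return score
```

The product makes a tuple score zero as soon as one of its elements never appears in the results, which is what lets the search engine discard a wrong date or artist. Counting is plain case-insensitive substring matching, a simplification: "Roma" also counts inside "Roma Vintage".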
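Slide 18 shows that candidate extraction pairs every date with every location (or place) and every artist, and that ranking then keeps the best tuples. A minimal sketch of those two steps follows. The scoring function is passed in as a parameter. `toy_hits` and `toy_score` are hypothetical stand-ins for the search-engine score, not values from the talk.

```python
from itertools import product
from math import prod


def extract_candidates(entities):
    """Build every (Date, Location, Artist) and (Date, Place, Artist) tuple
    from the entity sets found by the NER step (a Cartesian product)."""
    dla = list(product(entities["date"], entities["loc"], entities["art"]))
    dpa = list(product(entities["date"], entities["place"], entities["art"]))
    return dla, dpa


def rank_candidates(candidates, score, top_n=1):
    """Return the top_n candidates, best first, according to score()."""
    return sorted(candidates, key=score, reverse=True)[:top_n]


# NER output for the sample press news, as listed on slide 18.
entities = {
    "date": ["Lunedì 14 Maggio", "Martedì 24 Luglio"],
    "loc": ["Roma"],
    "place": ["Roma Vintage"],
    "art": ["Nina Simone", "Maria Callas", "Anna Calvi", "Scott Walker",
            "Django Rheinhard", "Debussy", "Ravel"],
}

# Hypothetical per-entity hit counts standing in for a search-engine score.
toy_hits = {"Martedì 24 Luglio": 5, "Roma": 4, "Roma Vintage": 3, "Anna Calvi": 9}


def toy_score(candidate):
    """Product of the hypothetical hit counts; unseen elements count 0."""
    return prod(toy_hits.get(element, 0) for element in candidate)


dla, dpa = extract_candidates(entities)
print(len(dla), len(dpa))
print(rank_candidates(dla, toy_score)[0])
print(rank_candidates(dpa, toy_score)[0])
```

With 2 dates, 1 location, 1 place and 7 artists, each relation yields 2 × 1 × 7 = 14 candidates. This combinatorial growth is why a ranking step is needed at all.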
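The Precision, Recall and F-measure definitions on slides 19 and 20 apply equally to labeled entities and to labeled relations. Below is a small sketch. It assumes a prediction counts as correct only when it exactly equals a manually labeled item, since the slides do not say whether partial matches count.

```python
def precision_recall_f(predicted, gold):
    """Precision, recall and F-measure (harmonic mean of the two) of the
    predicted items against the manually labeled gold items.
    Assumption: items are compared by exact equality."""
    predicted, gold = set(predicted), set(gold)
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


predicted = {("Martedì 24 luglio", "Roma", "Anna Calvi"),
             ("Lunedì 14 maggio", "Roma", "Anna Calvi")}
gold = {("Martedì 24 luglio", "Roma", "Anna Calvi")}
print(precision_recall_f(predicted, gold))
```

In this toy case one of two returned tuples is correct and the single true tuple is found, so precision is 0.5, recall is 1.0 and the F-measure is 2/3.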
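The two baselines on slide 20 can be sketched over per-sentence NER output: Baseline1 relies on same-sentence co-occurrence, and Baseline2 takes the most frequently named date, place and artist. The input format (one dict of class to mentions per sentence) and the toy sentences are hypothetical. They are loosely modeled on the sample press news, where the artist, the date and the venue sit in different sentences.

```python
from collections import Counter
from itertools import product

CLASSES = ("date", "place", "art")


def baseline1(sentences):
    """Return every (date, place, artist) triple named within one sentence."""
    tuples = set()
    for s in sentences:
        tuples.update(product(s.get("date", []), s.get("place", []), s.get("art", [])))
    return tuples


def baseline2(sentences):
    """Return the triple of the most frequently named date, place and artist,
    or None if some class is never named."""
    counts = {cls: Counter() for cls in CLASSES}
    for s in sentences:
        for cls in CLASSES:
            counts[cls].update(s.get(cls, []))
    if any(not c for c in counts.values()):
        return None
    return tuple(counts[cls].most_common(1)[0][0] for cls in CLASSES)


# Hypothetical per-sentence NER output.
sentences = [
    {"art": ["Anna Calvi"], "date": ["Lunedì 14 Maggio"]},
    {"date": ["Martedì 24 Luglio"], "place": ["Roma Vintage"]},
    {"art": ["Nina Simone", "Anna Calvi"]},
    {"date": ["Martedì 24 Luglio"]},
]
print(baseline1(sentences))
print(baseline2(sentences))
```

Baseline1 returns nothing here because no single sentence names all three entities. This is the cross-sentence problem slide 13 raises.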