News Search Using Discourse Analytics
Enhancing access to information within digital heritage archives, e.g. the New York Times corpus, by identifying discourse phenomena and by searching and filtering events according to multiple facets.

Usage Rights: CC Attribution-NonCommercial-ShareAlike License

News Search Using Discourse Analytics Presentation Transcript

  • 1. News Search Using Discourse Analytics Claudiu Mihăilă The National Centre for Text Mining The University of Manchester
  • 2. Data Growing Exponential growth of data Information overload
  • 3. Data Pouring Exponential growth of data Information overload Data deluge
  • 4. Data Processing Exponential growth of data Information overload Data deluge Can we process a deluge of data in a useful manner?
  • 5. Searching: Give a query as input. Obtain a set of relevant articles. Keyword v. semantics: synonyms; hyponyms; spelling variants; inflections; relations between query terms.
  • 6. Searching Keywords Crimes in the town of Sandwich
  • 7. Searching Keywords Crimes in the town of Sandwich – Crime Sandwich by Click Bang Productions on SoundCloud – Sandwich Crime - Topix – Crime on rye: Four accused of stealing $10 sandwich from car – Crime Scene Sandwich Bags – Crime rate in Sandwich, Illinois (IL): murders, rapes, robberies – Ham Sandwich Nation: Due Process When Everything is a Crime
  • 8. Searching Semantics Crimes in the town of Sandwich
  • 9. Searching Semantics Crimes in the town of Sandwich – Kent Police issue warning after fake £20 notes reported in Sandwich – Trio jailed for total of 30 years after crime spree in Sandwich – Murder at Sandwich - Kent
  • 10. Semantic search engine: Features. Specification of semantic types of search terms: town:Sandwich. Normalisation of semantic entities: Sandwich, Kent = Sandwich, UK. Relations between search terms to describe events: location:Sandwich. Restrictions on discourse context of retrieved events.
  • 11. Structured events The event
  • 12. Discourse interpretation The story Karl Munro may have killed Sunita in Weatherfield in 2013. According to Karl Munro, Craig Tinker set Sunita on fire in Weatherfield in 2013. Karl Munro said he will kill Sunita. Karl Munro didn’t fail to kill Sunita in Weatherfield in 2013. Stella Price condemned all of Karl’s wrongdoings.
  • 13. ACE corpus, 2005 version: 599 news-domain documents (news articles; transcripts of broadcast news; transcripts of broadcast conversation; conversational telephone speech; weblogs; discussion fora). Discourse-related attributes: polarity; tense; specificity; modality; source type; subjectivity.
  • 14. Discourse context of events Scheme
  • 15. New York Times corpus: Digital archive. 20 years' worth of news articles (1.8M). Includes annotations of metadata, named entities, and normalisation. Facilitates diachronic studies: language evolution; social change; development of events.
  • 16. ISHER Semantically enabled searching Web-based User-friendly interface Intuitive query-building mechanism Refining/filtering according to facets
  • 17. ISHER Automatic Event Recognition - EventMine Miwa, Thompson, Ananiadou. (2012). Boosting automatic event extraction from the literature using domain adaptation and coreference resolution. Bioinformatics, 28(13), 1759-1765
  • 18. ISHER Web-based interface – “Coronation Street”
  • 19. ISHER: Semantic clustering. Lingo (3rd party); NaCTeM clustering.
  • 20. ISHER Semantic clustering Cluster summarisation
  • 21. ISHER Metadata in the NYT corpus
  • 22. ISHER Entities
  • 23. ISHER Events
  • 24. ISHER Events Prime Minister Tony Blair’s election last month
  • 25. Final remarks: Other domains. Same technique can be adapted to other domains. Previously developed: EUPMC (medical journal articles); ASCOT (clinical trials).
  • 26. Final remarks. Summary: Enhanced access to information within digital heritage archives (NYT); identified discourse phenomena to search for and filter events; created ISHER, a semantic search engine to access the NYT corpus. Future work: Apply to new domains and institutional repositories; customise towards social unrest; other languages in danger of digital extinction (Meta-Net); diachronic studies.
  • 27. Thank you!
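
The typed search terms on slide 10 and the discourse attributes on slides 12 and 13 can be sketched in a few lines of Python. This is only a minimal illustration, not ISHER's implementation: the `Event` fields, the `facet:value; facet:value` query syntax, the normalisation table, and the attribute labels assigned to the four example sentences from slide 12 are all assumptions made for this sketch.

```python
from dataclasses import dataclass, fields
from typing import Dict, List


@dataclass(frozen=True)
class Event:
    """A structured event plus its discourse context (hypothetical schema)."""
    trigger: str    # what happened, e.g. "kill"
    agent: str      # who did it
    location: str   # where; empty string if the text does not say
    tense: str      # "past" or "future"
    polarity: str   # "positive" (it happened) or "negative"
    modality: str   # "asserted" or "speculated"
    source: str     # "author", or the person the claim is attributed to


# Hypothetical normalisation table: surface form (lower-cased) -> canonical name.
CANONICAL_PLACES = {"sandwich, kent": "Sandwich, UK"}

FACETS = {f.name for f in fields(Event)}


def normalise(place: str) -> str:
    """Map a place name to its canonical form, if one is known."""
    cleaned = place.strip()
    return CANONICAL_PLACES.get(cleaned.lower(), cleaned)


def parse_query(query: str) -> Dict[str, str]:
    """Parse 'facet:value; facet:value' into a dict of facet restrictions."""
    restrictions = {}
    for term in query.split(";"):
        key, sep, value = term.strip().partition(":")
        if not sep or not value.strip():
            raise ValueError(f"expected facet:value, got {term!r}")
        if key not in FACETS:
            raise ValueError(f"unknown facet {key!r}")
        restrictions[key] = value.strip()
    return restrictions


def matches(event: Event, facet: str, value: str) -> bool:
    """Case-insensitive comparison; locations are normalised first."""
    actual = getattr(event, facet)
    if facet == "location":
        return normalise(actual).lower() == normalise(value).lower()
    return actual.lower() == value.lower()


def search(events: List[Event], query: str) -> List[Event]:
    """Return the events that satisfy every restriction in the query."""
    restrictions = parse_query(query)
    return [e for e in events
            if all(matches(e, k, v) for k, v in restrictions.items())]


# The four event sentences from slide 12, labelled by hand for illustration.
EVENTS = [
    # "Karl Munro may have killed Sunita in Weatherfield in 2013."
    Event("kill", "Karl Munro", "Weatherfield", "past",
          "positive", "speculated", "author"),
    # "According to Karl Munro, Craig Tinker set Sunita on fire ..."
    Event("set on fire", "Craig Tinker", "Weatherfield", "past",
          "positive", "asserted", "Karl Munro"),
    # "Karl Munro said he will kill Sunita."
    Event("kill", "Karl Munro", "", "future",
          "positive", "asserted", "Karl Munro"),
    # "Karl Munro didn't fail to kill Sunita in Weatherfield in 2013."
    Event("kill", "Karl Munro", "Weatherfield", "past",
          "positive", "asserted", "author"),
]
```

Under these assumptions, `search(EVENTS, "trigger:kill; modality:asserted; source:author")` keeps only the double-negation sentence, while adding or dropping a facet widens or narrows the result in the way the slides describe for discourse-context restrictions.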