Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection

Published in: Technology

  1. MJ no more: Using Wikipedia Concurrent Edit Spikes With Social Network Plausibility Checks For Breaking News Detection
     Thomas Steiner (tomac@google.com, @tomayac)
     Seth van Hooland (svhoolan@ulb.ac.be, @sethvanhooland)
     Ed Summers (edsu@loc.gov, @edsu)
  2. More and more, news doesn't break on the newswire
  3. First Story Detection on Realtime Social Networks
     Typically based on Twitter because of its Streaming API [Twitter2012].
     Approaches try to detect spikes in time, locality, or text (often restricted to one domain, e.g., earthquake prediction).
     A typical representative of this kind of approach is [Petrović2010].
     High recall
     Low precision
     [Twitter2012] https://dev.twitter.com/docs/streaming-apis/streams/public
     [Petrović2010] Saša Petrović, Miles Osborne, and Victor Lavrenko. 2010. Streaming first story detection with application to Twitter. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT '10). Association for Computational Linguistics, Stroudsburg, PA, USA, 181–189.
  4. Curation based on Wikipedia
     Wikipedia page view logs are publicly available [Wikipedia2012] and are updated on an hourly basis.
     Osborne et al. have shown that there is a relation between Wikipedia page views and news events [Osborne2012].
     Their work improves the approach of [Petrović2010] by using Wikipedia logs.
     Key findings:
     Wikipedia lags about 2h behind the news.
     Newly created pages add noise.
     [Wikipedia2012] http://dumps.wikimedia.org/other/pagecounts-raw/
     [Osborne2012] M. Osborne, S. Petrovic, R. McCreadie, C. Macdonald, I. Ounis. 2012. Bieber no more: First Story Detection using Twitter and Wikipedia. In SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012), Portland, Oregon, USA.
  5. Key idea: invert the process
     Use the Wikipedia live IRC stream of recent changes [WikipediaIRC2012], then do a sanity check on social networks.
     [WikipediaIRC2012] http://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
  6. Introducing Wikipedia Live Monitor
     Hooks into the Wikipedia recent changes IRC channels for all Wikipedia locales.
     Channel names follow the pattern #language.project, e.g., #de.wikipedia
     When an article gets edited, retrieve all language versions and treat them as a cluster.
     E.g., en:Albert_Einstein is in the same cluster as de:Albert_Einstein.
  7. Breaking News Conditions
     1) ≥5 Occurrences
        An article cluster must have at least n edits (here n = 5) before it is considered a breaking news candidate.
     2) ≤60 Seconds Between Edits
        An article cluster may have at most n seconds (here n = 60) between edits in order to be regarded as a breaking news candidate.
     3) ≥2 Concurrent Editors
        An article cluster must be edited by at least n concurrent editors (here n = 2) before it is considered a breaking news candidate.
     4) ≤240 Seconds Since Last Edit
        An article cluster is thrown out of the monitoring loop if its last edit is more than n seconds ago (here n = 240).
  8. Evaluation: Does it work at all?
     Koninginnedag (http://twitpic.com/cn1vgf/full)
  9. Evaluation: Does it work at all?
     Champions League Semi Final BVB vs. RMD with Lewandowski (http://twitpic.com/clo0s0)
  10. Evaluation: Does it work at all?
      Boston Bombings (https://twitter.com/jason_koebler/statuses/323892465545388033, http://www.usnews.com/news/articles/2013/04/15/is-wikipedia-better-for-breaking-news-than-twitter)
  11. Evaluation: How well does it work?
      Lag time for global events: <5 min
      Resignation of Pope Benedict XVI (http://en.wikipedia.org/wiki/Resignation_of_Pope_Benedict_XVI)
      First three edit times (UTC) after the news broke on Feb 11, 2013:
      ● English Wikipedia article: 10:58, 10:59, 11:02
      ● French Wikipedia article: 11:00, 11:00, 11:01
      This implies that by looking at only two language versions of the Pope article (the actual number of monitored versions is 42), the system would have reported the news at 11:01.
      The Twitter account of Reuters announced the news at 10:59.
      Vatican Radio's announcement was made at 10:57:47.
  12. Future Work
      Work with realtime page view logs in addition to page edit logs (API format currently being defined by Wikimedia).
      News categorization and classification, e.g., removing the category Living-Persons from a person's article implies (sad) news.
      Improve the false-positive rate and strengthen the connection between social networks and actual article edits.
      Automatic notification system for breaking news candidates.
      Pre-announcement: follow @WikiLiveMon
  13. Demo and thank you
      Play with the system at http://wikipedia-irc.herokuapp.com/
      Read the paper at http://arxiv.org/abs/1303.4702
      Ask questions here or via tomac@google.com & @tomayac
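
The channel naming pattern and the clustering of language versions described on slide 6 can be illustrated with a minimal Python sketch. The function names (`parse_channel`, `article_key`, `cluster_id`) and the `language_links` dictionary are hypothetical and are not taken from the Wikipedia Live Monitor code. The interlanguage links are hard-coded sample data here, whereas the slides say the real system retrieves all language versions when an article is edited.

```python
def parse_channel(channel: str) -> tuple[str, str]:
    """Split an IRC channel name of the form '#language.project'."""
    language, project = channel.lstrip("#").split(".", 1)
    return language, project


def article_key(channel: str, title: str) -> str:
    """Build a 'language:Title' key such as 'de:Albert_Einstein'."""
    language, _project = parse_channel(channel)
    return f"{language}:{title}"


def cluster_id(article: str, language_links: dict[str, list[str]]) -> str:
    """Pick one stable identifier for all language versions of an article.

    Assumption: the alphabetically smallest key among the article and its
    known language versions serves as the cluster identifier.
    """
    versions = [article, *language_links.get(article, [])]
    return min(versions)


# Hypothetical sample data standing in for a real interlanguage lookup.
language_links = {
    "en:Albert_Einstein": ["de:Albert_Einstein"],
    "de:Albert_Einstein": ["en:Albert_Einstein"],
}

english = article_key("#en.wikipedia", "Albert_Einstein")
german = article_key("#de.wikipedia", "Albert_Einstein")
same_cluster = cluster_id(english, language_links) == cluster_id(german, language_links)
```

With this sample data, edits to the English and the German article end up in the same cluster, so their edit counts can be evaluated together.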
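
The four breaking news conditions on slide 7 can be sketched as a small Python class. This is a simplified illustration, not the authors' implementation: the class and method names are hypothetical, edits are assumed to arrive in chronological order, "concurrent editors" is approximated by the number of distinct editor names seen, and the 60-second rule is assumed to apply to the gap between the two most recent edits.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Thresholds as given on the slide.
MIN_OCCURRENCES = 5
MAX_SECONDS_BETWEEN_EDITS = 60
MIN_CONCURRENT_EDITORS = 2
MAX_SECONDS_SINCE_LAST_EDIT = 240


@dataclass
class ArticleCluster:
    """Edit history of one article cluster (hypothetical structure)."""

    edit_times: list[float] = field(default_factory=list)
    editors: set[str] = field(default_factory=set)

    def record_edit(self, timestamp: float, editor: str) -> None:
        # Assumption: timestamps are seconds and arrive in increasing order.
        self.edit_times.append(timestamp)
        self.editors.add(editor)

    def is_stale(self, now: float) -> bool:
        """Condition 4: drop the cluster if the last edit is too old."""
        if not self.edit_times:
            return False
        return now - self.edit_times[-1] > MAX_SECONDS_SINCE_LAST_EDIT

    def is_breaking_news_candidate(self) -> bool:
        """Conditions 1 to 3, checked against the recorded edits."""
        if len(self.edit_times) < MIN_OCCURRENCES:
            return False
        if len(self.editors) < MIN_CONCURRENT_EDITORS:
            return False
        latest_gap = self.edit_times[-1] - self.edit_times[-2]
        return latest_gap <= MAX_SECONDS_BETWEEN_EDITS
```

A monitoring loop would call `record_edit` for every incoming edit of a cluster, test `is_breaking_news_candidate` afterwards, and periodically discard clusters for which `is_stale` returns true.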
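
The 11:01 reporting time on slide 11 can be checked with a few lines of Python. The sketch only applies the "at least 5 edits" condition to the merged English and French edit times. It assumes the minute-resolution timestamps from the slide fall on the full minute, and it ignores the editor and edit-gap conditions because the slide does not give that data. Variable names are illustrative.

```python
from datetime import datetime

MIN_OCCURRENCES = 5  # threshold from the breaking news conditions


def at(hhmm: str) -> datetime:
    """Parse an 'HH:MM' UTC time on Feb 11, 2013."""
    return datetime.strptime(f"2013-02-11 {hhmm}", "%Y-%m-%d %H:%M")


english_edits = [at("10:58"), at("10:59"), at("11:02")]
french_edits = [at("11:00"), at("11:00"), at("11:01")]

# Both language versions belong to one cluster, so merge their edits.
cluster_edits = sorted(english_edits + french_edits)

# The cluster becomes a candidate with its fifth edit.
candidate_time = cluster_edits[MIN_OCCURRENCES - 1]

# Lag relative to Vatican Radio's announcement at 10:57:47.
vatican_radio = datetime(2013, 2, 11, 10, 57, 47)
lag_seconds = (candidate_time - vatican_radio).total_seconds()

print(candidate_time.strftime("%H:%M"), lag_seconds)
```

Under these assumptions the fifth edit in the merged cluster falls at 11:01, a little over three minutes after Vatican Radio's announcement, which is in line with the stated lag of under 5 minutes.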
