First Story Detection on Realtime Social Networks
● Typically based on Twitter because of its Streaming API [Twitter2012].
● Tries to detect spikes in time, locality, and text (often restricted to a single domain, e.g., earthquake prediction).
● A typical representative of this kind of approach is [Petrović2010].
● High recall
● Low precision
[Twitter2012] https://dev.twitter.com/docs/streaming-apis/streams/public
[Petrović2010] Saša Petrović, Miles Osborne, and Victor Lavrenko. 2010. Streaming first story detection with application to Twitter. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics (HLT 10). Association for Computational Linguistics, Stroudsburg, PA, USA, 181–189.
Curation based on Wikipedia
● Wikipedia page view logs are publicly available [Wikipedia2012] and are updated on an hourly basis.
● Osborne et al. have shown that there is a relation between Wikipedia page views and news events [Osborne2012].
● Their work improves the approach of [Petrović2010] by using Wikipedia logs.
● Key findings:
  ● Wikipedia lags about 2 h behind the news.
  ● Newly created pages add noise.
[Wikipedia2012] http://dumps.wikimedia.org/other/pagecounts-raw/
[Osborne2012] M. Osborne, S. Petrović, R. McCreadie, C. Macdonald, I. Ounis. 2012. Bieber no more: First Story Detection using Twitter and Wikipedia. In SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012), Portland, Oregon, USA.
Key idea: invert the process
● Use the Wikipedia live IRC stream of recent changes [WikipediaIRC2012], then do a sanity check on social networks.
[WikipediaIRC2012] http://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
Introducing Wikipedia Live Monitor
● Hooks into the Wikipedia recent changes IRC channels for all Wikipedia locales.
● Channel names follow the pattern #language.project, e.g., #de.wikipedia.
● When an article gets edited, retrieve all language versions and treat them as a cluster.
● E.g., en:Albert_Einstein is in the same cluster as de:Albert_Einstein.
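The channel naming pattern and the clustering of language versions can be illustrated with a minimal Python sketch. The helper names `channel_name` and `cluster_for`, and the hard-coded `LANGUAGE_LINKS` table, are hypothetical and only serve the illustration; a real system would look up language versions at runtime, and this is not the code of Wikipedia Live Monitor itself.

```python
from typing import Dict, FrozenSet, List


def channel_name(language: str, project: str = "wikipedia") -> str:
    """Build an IRC channel name following the #language.project pattern."""
    return f"#{language}.{project}"


def cluster_for(article: str, language_links: Dict[str, List[str]]) -> FrozenSet[str]:
    """Return the cluster of a 'lang:Title' article: itself plus all its language versions."""
    return frozenset([article, *language_links.get(article, [])])


# Hypothetical language links, hard-coded for the example only.
LANGUAGE_LINKS = {
    "en:Albert_Einstein": ["de:Albert_Einstein", "fr:Albert_Einstein"],
    "de:Albert_Einstein": ["en:Albert_Einstein", "fr:Albert_Einstein"],
}

channels = [channel_name(lang) for lang in ("en", "de", "fr")]

# Edits to the English and the German article end up in the same cluster.
same_cluster = cluster_for("en:Albert_Einstein", LANGUAGE_LINKS) == cluster_for(
    "de:Albert_Einstein", LANGUAGE_LINKS
)
```

Using a frozenset as the cluster identity means that an edit to any language version maps to the same key, regardless of which version was edited first.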
Breaking News Conditions
1) ≥5 Occurrences
An article cluster must have at least n edits (here n = 5) before it is considered a breaking news candidate.
2) ≤60 Seconds Between Edits
An article cluster may have at most n seconds (here n = 60) between edits in order to be regarded as a breaking news candidate.
3) ≥2 Concurrent Editors
An article cluster must be edited by at least n concurrent editors (here n = 2) before it is considered a breaking news candidate.
4) ≤240 Seconds Since Last Edit
An article cluster is thrown out of the monitoring loop if its last edit was more than n seconds ago (here n = 240).
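The four conditions can be sketched as a single check over the edits of one article cluster. This is a minimal sketch under stated assumptions, not the system's actual implementation: the `Edit` class and the function names are hypothetical, the gap condition is applied to all consecutive edits still held for the cluster, and "concurrent editors" is approximated by the number of distinct editors among those edits.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the four conditions on the slide.
MIN_OCCURRENCES = 5
MAX_SECONDS_BETWEEN_EDITS = 60
MIN_CONCURRENT_EDITORS = 2
MAX_SECONDS_SINCE_LAST_EDIT = 240


@dataclass
class Edit:
    timestamp: float  # seconds on any monotonically increasing clock
    editor: str


def is_stale(edits: List[Edit], now: float) -> bool:
    """Condition 4: the cluster leaves the monitoring loop if its last edit is too old."""
    if not edits:
        return True
    return now - max(e.timestamp for e in edits) > MAX_SECONDS_SINCE_LAST_EDIT


def is_breaking_news_candidate(edits: List[Edit], now: float) -> bool:
    if is_stale(edits, now):
        return False
    # Condition 1: enough edits.
    if len(edits) < MIN_OCCURRENCES:
        return False
    # Condition 2 (assumed reading): no gap between consecutive edits exceeds the limit.
    times = sorted(e.timestamp for e in edits)
    if any(later - earlier > MAX_SECONDS_BETWEEN_EDITS
           for earlier, later in zip(times, times[1:])):
        return False
    # Condition 3 (assumed reading): enough distinct editors.
    return len({e.editor for e in edits}) >= MIN_CONCURRENT_EDITORS


# Five edits, 30 s apart, by two editors, checked 10 s after the last edit.
example = [Edit(t, "alice" if i % 2 else "bob") for i, t in enumerate((0, 30, 60, 90, 120))]
candidate = is_breaking_news_candidate(example, now=130)
```

Keeping the thresholds as named constants makes it easy to experiment with stricter or looser settings when tuning the false-positive rate.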
Evaluation: Does it work at all?
Koninginnedag (http://twitpic.com/cn1vgf/full)
Evaluation: Does it work at all?
Champions League semi-final BVB vs. RMD with Lewandowski (http://twitpic.com/clo0s0)
Evaluation: Does it work at all?
Boston Bombings (https://twitter.com/jason_koebler/statuses/323892465545388033, http://www.usnews.com/news/articles/2013/04/15/is-wikipedia-better-for-breaking-news-than-twitter)
Evaluation: How well does it work?
Lag time for global events: <5 min
Resignation of Pope Benedict XVI (http://en.wikipedia.org/wiki/Resignation_of_Pope_Benedict_XVI)
First three edit times (UTC) after the news broke on Feb 11, 2013:
● English Wikipedia article: 10:58, 10:59, 11:02
● French Wikipedia article: 11:00, 11:00, 11:01
This implies that by looking at only two language versions of the Pope article (the actual number of monitored versions is 42), the system would have reported the news at 11:01, when the fifth edit in the cluster arrived.
The Twitter account of Reuters announced the news at 10:59.
Vatican Radio's announcement was made at 10:57:47.
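The 11:01 report time and the resulting lag can be re-derived from the numbers on this slide. The sketch below merges the English and French edit times, takes the fifth edit (the point at which the ≥5 occurrences condition is met), and subtracts the announcement times. The helper `utc` is hypothetical, and because the edit times are only given to the minute, the computed lags are approximate.

```python
from datetime import datetime


def utc(clock: str) -> datetime:
    """Parse an HH:MM or HH:MM:SS clock time on Feb 11, 2013 (UTC)."""
    fmt = "%H:%M:%S" if clock.count(":") == 2 else "%H:%M"
    return datetime.strptime("2013-02-11 " + clock, "%Y-%m-%d " + fmt)


english = [utc(t) for t in ("10:58", "10:59", "11:02")]
french = [utc(t) for t in ("11:00", "11:00", "11:01")]

# Both articles belong to one cluster, so their edits are merged chronologically.
merged = sorted(english + french)

# The fifth edit is the earliest moment at which five occurrences have been seen.
fifth_edit = merged[4]

lag_vatican_radio = fifth_edit - utc("10:57:47")
lag_reuters = fifth_edit - utc("10:59")

print(fifth_edit.strftime("%H:%M"))       # 11:01
print(lag_vatican_radio.total_seconds())  # 193.0, i.e., about 3 min 13 s
print(lag_reuters.total_seconds())        # 120.0, i.e., 2 min
```

Both lags stay below the 5 minutes stated on the slide, even when allowing for up to one additional minute of uncertainty from the minute-resolution edit times.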
Future Work
● Work with realtime page view logs in addition to page edit logs (API format currently being defined by Wikimedia).
● News categorization and classification, e.g., the removal of Category Living-Persons from a person's article implies (sad) news.
● Improve the false-positive rate and strengthen the connection between social networks and actual article edits.
● Auto notification system upon breaking news candidates.
● Pre-announcement: follow @WikiLiveMon
Demo and thank you
● Play with the system at http://wikipedia-irc.herokuapp.com/
● Read the paper at http://arxiv.org/abs/1303.4702
● Ask questions here or email@example.com & @tomayac