MJ no more:
Using Wikipedia Concurrent Edit Spikes
With Social Network Plausibility Checks
For Breaking News Detection
Thomas Steiner (tomac@google.com, @tomayac)
Seth van Hooland (svhoolan@ulb.ac.be, @sethvanhooland)
Ed Summers (edsu@loc.gov, @edsu)
More and more, news no longer breaks on the newswire
First Story Detection on Realtime Social Networks
Typically based on Twitter because of its Streaming API [Twitter2012].
Such approaches try to detect spikes in time, locality, and text (oftentimes restricted to one domain, e.g., earthquake prediction).
A typical representative of this kind of approach is [Petrović2010].
High recall
Low precision
[Twitter2012] https://dev.twitter.com/docs/streaming-apis/streams/public
[Petrović2010] Saša Petrović, Miles Osborne, and Victor Lavrenko. 2010. Streaming first story detection with
application to Twitter. In Human Language Technologies: The 2010 Annual Conference of the North American
Chapter of the Association for Computational Linguistics (HLT '10). Association for Computational Linguistics,
Stroudsburg, PA, USA, 181–189.
Curation based on Wikipedia
Wikipedia page view logs are publicly available [Wikipedia2012] and updated on an hourly basis.
Osborne et al. have shown that there is a relation between Wikipedia page views and news events [Osborne2012].
Their work improves the approach of [Petrović2010] by using Wikipedia logs.
Key findings:
Wikipedia lags about 2 hours behind the news.
Newly created pages add noise.
[Wikipedia2012] http://dumps.wikimedia.org/other/pagecounts-raw/
[Osborne2012] M. Osborne, S. Petrović, R. McCreadie, C. Macdonald, I. Ounis. 2012. Bieber no more: First Story Detection using Twitter and Wikipedia. In SIGIR 2012 Workshop on Time-aware Information Access (#TAIA2012), Portland, Oregon, USA.
Key idea: invert the process
Use Wikipedia's live IRC stream of recent changes [WikipediaIRC2012] first, then do a sanity check on social networks.
[WikipediaIRC2012] http://meta.wikimedia.org/wiki/IRC/Channels#Raw_feeds
Introducing Wikipedia Live Monitor
Hooks into the Wikipedia recent changes IRC channels for all Wikipedia
locales.
Channel names follow the pattern #language.project, e.g., #de.wikipedia.
When an article gets edited, retrieve all language versions and treat them as a cluster.
E.g., en:Albert_Einstein is in the same cluster as de:Albert_Einstein.
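The channel naming pattern and the clustering idea above can be illustrated with a minimal Python sketch. It is not the Wikipedia Live Monitor code: the names parse_channel and cluster_key are hypothetical, the regular expression is only an assumed reading of the #language.project pattern, and the language-link mapping is assumed to come from some separate lookup that is not shown here.

```python
import re
from typing import NamedTuple, Optional


class Channel(NamedTuple):
    language: str
    project: str


# Assumed reading of the "#language.project" pattern from the slide.
CHANNEL_RE = re.compile(r"^#(?P<language>[a-z-]+)\.(?P<project>[a-z]+)$")


def parse_channel(name: str) -> Optional[Channel]:
    """Split an IRC channel name such as '#de.wikipedia' into its parts."""
    match = CHANNEL_RE.match(name)
    if match is None:
        return None
    return Channel(match["language"], match["project"])


def cluster_key(article: str, language_links: dict) -> frozenset:
    """Build an order-independent identifier for an article cluster.

    `article` is a prefixed title such as 'en:Albert_Einstein'.
    `language_links` maps language codes to titles of the other language
    versions (hypothetical input, obtained elsewhere).
    """
    members = {article}
    members.update(f"{lang}:{title}" for lang, title in language_links.items())
    return frozenset(members)


print(parse_channel("#de.wikipedia"))
print(cluster_key("en:Albert_Einstein", {"de": "Albert_Einstein"}))
```

Using a frozenset means an edit to any language version resolves to the same cluster, regardless of which version was edited first.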
Breaking News Conditions
1) ≥5 Occurrences
An article cluster must have at least n edits (here, n = 5) before it is considered a breaking news candidate.
2) ≤60 Seconds Between Edits
An article cluster may have at most n seconds (here, n = 60) between edits in order to be regarded as a breaking news candidate.
3) ≥2 Concurrent Editors
An article cluster must be edited by at least n concurrent editors (here, n = 2) before it is considered a breaking news candidate.
4) ≤240 Seconds Since Last Edit
An article cluster is thrown out of the monitoring loop if its last edit was more than n seconds ago (here, n = 240).
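The four conditions can be sketched as a small Python check. This is an illustrative reading of the slide, not the system's actual implementation: the Cluster class and function names are hypothetical, "concurrent editors" is approximated as distinct editor names seen in the cluster, and the 60-second rule is applied to the gaps between the most recent five edits, which is only one possible interpretation.

```python
from dataclasses import dataclass, field

# Threshold values taken from the slide.
MIN_OCCURRENCES = 5
MAX_SECONDS_BETWEEN_EDITS = 60
MIN_CONCURRENT_EDITORS = 2
MAX_SECONDS_SINCE_LAST_EDIT = 240


@dataclass
class Cluster:
    """Edits observed for one article cluster (hypothetical structure)."""
    edit_times: list = field(default_factory=list)  # seconds, ascending
    editors: set = field(default_factory=set)

    def add_edit(self, timestamp: float, editor: str) -> None:
        self.edit_times.append(timestamp)
        self.editors.add(editor)


def is_stale(cluster: Cluster, now: float) -> bool:
    """Condition 4: the last edit is more than 240 seconds ago."""
    return bool(cluster.edit_times) and (
        now - cluster.edit_times[-1] > MAX_SECONDS_SINCE_LAST_EDIT
    )


def is_breaking_news_candidate(cluster: Cluster, now: float) -> bool:
    if len(cluster.edit_times) < MIN_OCCURRENCES:      # condition 1
        return False
    if len(cluster.editors) < MIN_CONCURRENT_EDITORS:  # condition 3 (assumed: distinct editors)
        return False
    if is_stale(cluster, now):                         # condition 4
        return False
    # Condition 2 (assumed: gaps among the most recent five edits).
    recent = cluster.edit_times[-MIN_OCCURRENCES:]
    return all(
        later - earlier <= MAX_SECONDS_BETWEEN_EDITS
        for earlier, later in zip(recent, recent[1:])
    )


cluster = Cluster()
for second, editor in [(0, "alice"), (30, "bob"), (60, "alice"), (90, "bob"), (120, "alice")]:
    cluster.add_edit(second, editor)
print(is_breaking_news_candidate(cluster, now=130))
```

A monitoring loop would call is_stale periodically to drop clusters that have gone quiet, and is_breaking_news_candidate after each new edit.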
Evaluation: Does it work at all?
Koninginnedag (http://twitpic.com/cn1vgf/full)
Champions League Semi Final BVB vs. RMD with Lewandowski (http://twitpic.com/clo0s0)
Boston Bombings (https://twitter.com/jason_koebler/statuses/323892465545388033, http://www.usnews.com/news/articles/2013/04/15/is-wikipedia-better-for-breaking-news-than-twitter)
Evaluation: How well does it work?
Lag time for global events: <5 min
Resignation of Pope Benedict XVI (http://en.wikipedia.org/wiki/Resignation_of_Pope_Benedict_XVI)
First three edit times (UTC) after the news broke on Feb 11, 2013:
● English Wikipedia article: 10:58, 10:59, 11:02
● French Wikipedia article: 11:00, 11:00, 11:01
This implies that by looking at only two language versions of the Pope article (the actual number of monitored versions is 42), the system would have reported the news at 11:01.
The Twitter account of Reuters announced the news at 10:59.
Vatican Radio's announcement was made at 10:57:47.
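The 11:01 figure follows from the "at least 5 edits" condition: merging the English and French edit times, the fifth edit in the cluster falls at 11:01. A short Python check of that arithmetic is given here as a sketch. It assumes the edit times have minute resolution (seconds set to :00) and that the editor and edit-gap conditions were also met, which the slide does not state explicitly.

```python
from datetime import datetime, timedelta

DAY = "2013-02-11 "
MIN_OCCURRENCES = 5  # threshold from the breaking news conditions


def utc(clock: str) -> datetime:
    """Parse an HH:MM:SS clock time on Feb 11, 2013."""
    return datetime.strptime(DAY + clock, "%Y-%m-%d %H:%M:%S")


# Edit times from the slide, assumed to be exact to the minute.
english = ["10:58:00", "10:59:00", "11:02:00"]
french = ["11:00:00", "11:00:00", "11:01:00"]

merged = sorted(utc(clock) for clock in english + french)

# The cluster reaches five edits with the fifth entry of the merged list.
report_time = merged[MIN_OCCURRENCES - 1]
lag = report_time - utc("10:57:47")  # relative to Vatican Radio

print(report_time.strftime("%H:%M"))  # 11:01
print(lag)                            # 0:03:13
```

Under these assumptions the lag relative to Vatican Radio's announcement is 3 minutes 13 seconds, consistent with the stated bound of under 5 minutes.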
Future Work
Work with realtime page view logs in addition to page edit logs (API format currently being defined by Wikimedia).
News categorization and classification, e.g., the category Living-Persons being removed from a person's article implies (sad) news.
Reduce the false-positive rate and strengthen the connection between social networks and actual article edits.
Auto-notification system for breaking news candidates.
Pre-announcement: follow @WikiLiveMon
Demo and thank you
Play with the system at http://wikipedia-irc.herokuapp.com/
Read the paper at http://arxiv.org/abs/1303.4702
Ask questions here or via tomac@google.com & @tomayac
