PROJECT ‘NIGHTCRAWLER’
Real-time event mining
SOCIAL MEDIA IS RESHAPING NEWS
• In recent years, social networks have become a very popular tool for expressing opinions and broadcasting news.
• Twitter is often the most up-to-date source of breaking news.
• Roughly two-thirds (64%) of U.S. adults use social networks to learn about the latest events.
• Hence, there is considerable interest in developing systems that can extract useful information from a stream of social data.
• We address the problem of detecting new events in a stream of Twitter posts: real-time first story detection.
NOTION OF NEWS
We assume that “news” is defined by specific contextual and temporal patterns in Twitter activity.

The principal component of “news” is any original content that is likely to interest a significant number of people. We propose a model that can detect those patterns at an early stage, ideally before they become actual “news”.

Those patterns often have distinctive features:

• similar content across a group of tweets
• a specific temporal distribution
• a specific spatial distribution
IMPLEMENTATION SUMMARY
• Extract from the database all tweets that arrived during the last 3 days.
• Using this batch, we create a vocabulary and a TF-IDF matrix.
• Upon arrival, a new tweet is represented as a vector in a high-dimensional (d ~ 500k) space.
• We search for the k nearest neighbours (k-NN) of this vector, subject to a certain threshold.
• The nearest neighbours (NNs) found become candidates for a cluster.
• LSH (locality-sensitive hashing) is applied with respect to the NNs' spatial/temporal distribution.
• MaxCover is used to obtain the best clustering.
• Clusters are analysed and the results are presented to the user.
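The steps above can be illustrated with a minimal sketch. It is not the project's actual code: the tokenizer, the smoothed IDF formula, the cosine-similarity threshold of 0.3, the sample tweets, and every function name are assumptions made for illustration. The LSH step over spatial/temporal features is omitted, and the maximum-coverage step is shown only as the standard greedy approximation, since the slides do not say how MaxCover is applied to the candidates.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def fit_idf(batch):
    """Build a vocabulary with smoothed IDF weights from a batch of tweets."""
    n_docs = len(batch)
    doc_freq = Counter()
    for text in batch:
        doc_freq.update(set(tokenize(text)))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1.0
            for term, df in doc_freq.items()}


def vectorize(text, idf):
    """Represent a tweet as an L2-normalised sparse TF-IDF vector (a dict)."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two L2-normalised sparse vectors."""
    if len(a) > len(b):
        a, b = b, a
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def nearest_neighbours(query_vec, batch_vecs, k=3, min_similarity=0.3):
    """Return up to k (index, similarity) pairs at or above the threshold."""
    scored = [(i, cosine(query_vec, v)) for i, v in enumerate(batch_vecs)]
    kept = [(i, s) for i, s in scored if s >= min_similarity]
    return sorted(kept, key=lambda pair: -pair[1])[:k]


def greedy_max_cover(candidate_sets, budget):
    """Greedy maximum coverage: repeatedly pick the set adding most new items."""
    covered, chosen = set(), []
    for _ in range(budget):
        best = max(range(len(candidate_sets)),
                   key=lambda i: len(candidate_sets[i] - covered),
                   default=None)
        if best is None or not candidate_sets[best] - covered:
            break
        chosen.append(best)
        covered |= candidate_sets[best]
    return chosen, covered


# Hypothetical batch standing in for "the last 3 days" of tweets.
batch = [
    "earthquake hits downtown buildings shaking",
    "strong earthquake downtown right now",
    "great coffee at the new cafe",
    "traffic jam on the highway again",
]
idf = fit_idf(batch)
batch_vecs = [vectorize(t, idf) for t in batch]

# A newly arrived tweet is vectorised with the existing vocabulary.
new_tweet = "earthquake downtown everything is shaking"
neighbours = nearest_neighbours(vectorize(new_tweet, idf), batch_vecs)
neighbour_ids = [i for i, _ in neighbours]
```

Here `neighbour_ids` holds the indices of the batch tweets that would become cluster candidates for the new tweet. With a real vocabulary of roughly 500k terms, a sparse-matrix library and an approximate nearest-neighbour index would likely be needed in place of the linear scan shown here.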
[Figure: diagram with the labels Query, Location, Keywords, Tweets, Instagram Photos, Swarm check-ins]
STORM IS COMING!
ACKNOWLEDGEMENTS
• Denis Antyukhov (mining, LSH, MaxCover)
• Fedor Chervinskii (preprocessing, parsing)
• Nikita Pestrov (TF-IDF, vocabulary)
