A Framework for Collecting, Extracting and Managing Event Identity Information from Twitter
Debanjan Mahata, John R. Talburt
dxmahata@ualr.edu, jrtalburt@ualr.edu
Department of Information Science
University of Arkansas at Little Rock
Vivek Kumar Singh
vivek@cs.sau.ac.in
Department of Computer Science
South Asian University, New Delhi, India
Social Media
• A daily average of 58 million tweets is posted on Twitter. Source: http://goo.gl/Oz5sIZ
• An average of 60 million photos is shared on Instagram daily. Source: http://instagram.com/press
• Facebook stores 300 petabytes of data related to its users from all over the world. Source: http://goo.gl/XxEfeX
• 72% of all Internet users are now active on social media. Source: http://goo.gl/qAuIoe
• 46% of adult Internet users post original photos or videos online that they themselves have created. Source: http://goo.gl/iQ06Ix
Real-life Events
EIIM in MDM
Zhou, Yinle, and John Talburt. "Entity identity information management (EIIM)." International Conference on Information Quality (ICIQ-11), Adelaide, Australia. 2011.
Problem Definition
Challenges
• Volume and Velocity
• Veracity
• Example tweet: “New post: Sochi Was For Suckers - Laugh Studios/ http://t.co/cWQJCBp3Ow #lol #funny #rofl #funnypic #fail #wtf”
• Informal Text
• Variety
• Searching the Long Tail
• Sampling Bias
• Sparse Link Structure Between Content in Social Media
• Lack of Evaluation Datasets
EIIM Life Cycle in Twitter
Mahata, Debanjan, and John Talburt. "A Framework for Collecting and Managing Entity Identity Information from Social Media." 19th International Conference on Information Quality, Xi'an, China.
Identity Integrity
Assigns a unique identifier to each real-life event being tracked by the framework and maintains the same identifier for newly collected event references.
Identity Integrity Requires
• Each real-world event in the domain has one and only
one representation in the information system.
• Distinct real-world events have distinct representations
in the information system.
Allocates an individual EIIS to each real-life event being tracked by the framework.
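The two identity integrity requirements above can be illustrated with a minimal sketch. It assumes that references to the same event can be matched by a normalized event name, which is a simplification made for this example and not the framework's resolution method. The class name `EventIdentityRegistry`, its methods, and the `EVT-0001` identifier format are all hypothetical.

```python
import itertools


class EventIdentityRegistry:
    """Hypothetical registry: one identifier per tracked real-life event."""

    def __init__(self):
        self._ids = {}          # normalized event key -> identifier
        self._references = {}   # identifier -> collected event references
        self._counter = itertools.count(1)

    @staticmethod
    def _normalize(event_name):
        # Assumption: events are matched by a case- and
        # whitespace-insensitive name. A real system would need
        # proper reference resolution here.
        return " ".join(event_name.lower().split())

    def identifier_for(self, event_name):
        """Return the existing identifier, or allocate a new one."""
        key = self._normalize(event_name)
        if key not in self._ids:
            new_id = f"EVT-{next(self._counter):04d}"
            self._ids[key] = new_id
            self._references[new_id] = []
        return self._ids[key]

    def add_reference(self, event_name, tweet_text):
        """Attach a newly collected reference to its event's identifier."""
        event_id = self.identifier_for(event_name)
        self._references[event_id].append(tweet_text)
        return event_id

    def references(self, event_id):
        return list(self._references[event_id])
```

In this sketch, two spellings of the same event name map to one identifier, while distinct events receive distinct identifiers, which mirrors the "one and only one representation" and "distinct representations" conditions.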
Event Reference Preparation
• Parts-of-Speech Tagging
• Special Character Detection
• Data Cleansing
• Duplicate Detection
• Stop Word Detection and Elimination
• Slang Word Extraction
• Feeling Word Extraction
• Tokenization
• Stemming
• Tweet Meta-Data
• Expanded URLs
• User Information
• Verification
• Favorite Count
• Retweet Count
• User Mentions
• Entity Extraction
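A few of the preparation steps listed above (tokenization, stop word elimination, slang word extraction, duplicate detection, and URL, hashtag and mention extraction) can be sketched with the Python standard library. This is only an illustration under stated assumptions: the word lists are tiny placeholders, the function names `prepare_reference` and `drop_duplicates` are hypothetical, and parts-of-speech tagging, stemming and entity extraction are left out because they need an NLP toolkit.

```python
import re

# Illustrative placeholder lists, not the lexicons used by the framework.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "for", "to", "and", "was"}
SLANG_WORDS = {"lol", "rofl", "wtf"}

URL_RE = re.compile(r"https?://\S+")
TOKEN_RE = re.compile(r"[#@]?\w+")


def prepare_reference(text):
    """Split one tweet into URLs, hashtags, mentions, slang and content tokens."""
    urls = URL_RE.findall(text)
    body = URL_RE.sub(" ", text)              # drop URLs before tokenizing
    raw_tokens = TOKEN_RE.findall(body.lower())
    hashtags = [t for t in raw_tokens if t.startswith("#")]
    mentions = [t for t in raw_tokens if t.startswith("@")]
    words = [t.lstrip("#@") for t in raw_tokens]
    return {
        "urls": urls,
        "hashtags": hashtags,
        "mentions": mentions,
        "slang": [w for w in words if w in SLANG_WORDS],
        "tokens": [w for w in words if w not in STOP_WORDS],
    }


def drop_duplicates(texts):
    """Keep the first tweet of each group that matches after normalization."""
    seen = set()
    unique = []
    for text in texts:
        key = " ".join(URL_RE.sub(" ", text).lower().split())
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


if __name__ == "__main__":
    tweet = ("New post: Sochi Was For Suckers - Laugh Studios/ "
             "http://t.co/cWQJCBp3Ow #lol #funny #rofl #funnypic #fail #wtf")
    print(prepare_reference(tweet))
```

Duplicate detection here is exact matching after lowercasing and URL removal. Near-duplicate detection would need a similarity measure, which this sketch does not attempt.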
Event Related Content Analysis
Event Identity Information Processing
EventIdentityInfoGraph
Process using EventIdentityInfoRank
NDCG Curves for Millions March NYC
NDCG Values for Millions March NYC
Precision Values for Millions March NYC
Potential Applications
• Event Monitoring and Analysis
• Event Information Retrieval
• Opinion and Review Mining
• Recommender Systems
• Event Management and Marketing
• Social Media Data Integration
• Many More
Future Directions
• Summarizing Event Content
• Identification of Insightful Opinionated Content
• Event Topic Modeling
• Event-specific Recommendations
• Distributed Processing of TwitterEventInfoGraph
• Ontology for Event Content in Social Media
• Many More
Additional Slides
Tweet Features
No. of Unigram Tokens, No. of Stop Words, No. of Slang
Words, No. of Feeling Words, No. of Hashtags, Has URL,
Is Verified, No. of User Mentions, Length of Post, No. of
Unique Characters, No. of Special Characters, Favorite
Count, Retweet Count, Formality, No. of Nouns, No. of
Adjectives, No. of Verbs, No. of Adverbs.
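A subset of the features listed above can be computed directly from a tweet record, as in the following sketch. It assumes the tweet is available as a flat dictionary with hypothetical keys `text`, `verified`, `favorite_count` and `retweet_count`. The stop word list is a placeholder, and the slang, feeling word, formality and parts-of-speech features are omitted because they need lexicons or a tagger.

```python
import re

# Illustrative placeholder list, not the one used by the authors.
STOP_WORDS = {"a", "an", "the", "is", "in", "of", "for", "to", "and", "was"}

URL_RE = re.compile(r"https?://\S+")


def tweet_features(tweet):
    """Compute simple surface features for one tweet dictionary."""
    text = tweet["text"]
    urls = URL_RE.findall(text)
    body = URL_RE.sub(" ", text)               # exclude URLs from token counts
    tokens = re.findall(r"\w+", body.lower())
    return {
        "n_unigram_tokens": len(tokens),
        "n_stop_words": sum(t in STOP_WORDS for t in tokens),
        "n_hashtags": len(re.findall(r"#\w+", body)),
        "has_url": int(bool(urls)),
        "is_verified": int(bool(tweet.get("verified", False))),
        "n_user_mentions": len(re.findall(r"@\w+", body)),
        "length_of_post": len(text),
        "n_unique_characters": len(set(text)),
        "n_special_characters": sum(
            not c.isalnum() and not c.isspace() for c in text
        ),
        "favorite_count": tweet.get("favorite_count", 0),
        "retweet_count": tweet.get("retweet_count", 0),
    }


if __name__ == "__main__":
    example = {
        "text": "RT @LarimerSheriff: #HighParkFire update http://t.co/hBy5shen",
        "retweet_count": 3,
    }
    print(tweet_features(example))
```

The resulting dictionary can be turned into a numeric vector in a fixed key order before it is passed to a classifier such as logistic regression.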
Logistic Regression Model Performance

                      Precision   Recall   F-1 Score
Non-informative (0)   0.70        0.49     0.57
Informative (1)       0.78        0.90     0.84
Avg/Total             0.76        0.77     0.75

Accuracy = 76.64%
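As a reading aid for the performance figures above, the F-1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The sketch below recomputes it from the reported per-class precision and recall. Because those inputs are already rounded to two decimals, the recomputed values may differ from the reported ones in the last digit. The function name `f1_score` is chosen for this example only.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-1) as given on the slide
reported = {
    "Non-informative (0)": (0.70, 0.49, 0.57),
    "Informative (1)": (0.78, 0.90, 0.84),
}

if __name__ == "__main__":
    for label, (p, r, f1) in reported.items():
        print(f"{label}: recomputed F-1 = {f1_score(p, r):.3f}, reported = {f1:.2f}")
```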
Olteanu, Alexandra, et al. "CrisisLex: A lexicon for collecting and filtering microblogged communications in crises." In Proceedings of the 8th International AAAI Conference on Weblogs and Social Media (ICWSM '14). No. EPFL-CONF-203561. 2014.
Event Information Quality
28000 annotated tweets
26 Events
Related and Informative – “#Media Large wildfire in N. Colorado prompts evacuation : Crews are battling a fast-moving wildfire http://t.co/ju1BGTKH #Politics #News”
Related but not Informative – “RT @LarimerSheriff: #HighParkFire update http://t.co/hBy5shen”
Not Related – “#Intern #US #TATTOO #Wisconsin #Ohio #NC #PA #Florida #Colorado #Iowa #Nevada #Virginia #NV #mlb Travel Destinations; http://t.co/TIHBJKF2”
Event Related Content Analysis
EventIdentityInfoRank
NDCG Values for Millions March NYC
NDCG Curves for Millions March NYC
Precision Values for Millions March NYC
NDCG Values for Sydney Siege Crisis
NDCG Curves for Sydney Siege Crisis
Precision Values for Sydney Siege Crisis
Baselines
• SeenRank (http://seen.co/about)
• TextRank (Mihalcea, Rada, and Paul Tarau. "TextRank: Bringing order into texts." Association for Computational Linguistics, 2004.)
• LexRank (Erkan, Günes, and Dragomir R. Radev. "LexRank: graph-based lexical centrality as salience in text summarization." Journal of Artificial Intelligence Research (2004): 457-479.)
• RTRank
• Centroid (Becker, Hila, Mor Naaman, and Luis Gravano. "Selecting Quality Twitter Content for Events." ICWSM 11 (2011).)
• Logistic Regression
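Evaluation Metrics

\[ \mathrm{DCG}_p = \sum_{i=1}^{p} \frac{2^{rel_i} - 1}{\log(i + 1)} \]

\[ \mathrm{nDCG}_p = \frac{\mathrm{DCG}_p}{\mathrm{IDCG}_p} \]

\[ \text{Precision at } n = \frac{\text{Number of relevant references at } n}{n} \]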
Baeza-Yates, Ricardo, and Berthier Ribeiro-Neto. Modern information retrieval. Vol. 463. New York: ACM press, 1999.
Järvelin, Kalervo, and Jaana Kekäläinen. "Cumulated gain-based evaluation of IR techniques." ACM Transactions on Information
Systems (TOIS) 20.4 (2002): 422-446.
