17th International Conference on Web Engineering
Rome, June 2017
J. Carlton
Recruiting from the network: discovering Twitter
users who can help combat Zika epidemics
Paolo Missier1, Callum McClean1, Jonathan Carlton1, Diego Cedrim2,
Leonardo Silva2, Alessandro Garcia2, Alexandre Plastino3, and
Alexander Romanovsky1
1 School of Computing Science, Newcastle University, UK
2 PUC-Rio, Rio de Janeiro, Brazil
3 Universidade Federal Fluminense, Niterói, Brazil
17th International Conference on Web Engineering
Rome, June 2017
Motivation, Goals, and Hypothesis
• Motivation:
• Mosquito-borne epidemics occur frequently, and prevention programs have not
been effective.
• Long-term goal:
• Leverage real-time, social media streams to increase effectiveness and speed
of health campaigns.
• Hypothesis:
• Social sensors and their signals can be detected from real-time social media
streams.
• Through continuous analysis of the Twitter stream, it is possible to identify
these social sensors automatically.
Challenges/Problems
• Discovery of relevant tweets requires a long harvest period.
• On average, relevant tweets make up 10% of a typical harvest.
• Sparsity of users and tweets
• Most users contribute a single tweet
• Lack of meaningful connections and content
• Needle in haystack problem
Technical Approach
[Figure: pipeline diagram with the Offline phase on the left and the Online phase on the right]
Offline Phase
Online
Model
User Ranking Metrics
• TwitterRank [1]
• Topological approach that takes into account the topical similarity between
users and network structure to determine a score.
• Topic Focus
• $TF(u) = \frac{R_K(u)}{T_K(u)}$
• $TF(u)$ is the topic focus for a user $u$, $T_K(u)$ is the number of tweets contributed
by $u$, and $R_K(u)$ is the number of relevant tweets in $T_K(u)$.
• Overall Focus
• $OF(u) = \frac{R_K(u)}{T(u)}$
• $OF(u)$ is the overall focus for a user $u$, and $T(u)$ is the total number of tweets posted
by $u$ during the harvest period.
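As a minimal sketch of how the two focus metrics could be computed, the snippet below counts harvested and relevant tweets per user and divides as in the definitions above. The function name `focus_metrics`, the input layout, and the sample users are assumptions made for illustration, not part of the original pipeline.

```python
from collections import Counter


def focus_metrics(harvested, total_posted):
    """Return {user: (topic_focus, overall_focus)}.

    harvested: list of (user, is_relevant) pairs, one per harvested tweet.
    total_posted: dict mapping user -> T(u), all tweets posted in the period.
    Both input shapes are hypothetical.
    """
    t_k = Counter(user for user, _ in harvested)                  # T_K(u)
    r_k = Counter(user for user, relevant in harvested if relevant)  # R_K(u)
    metrics = {}
    for user, harvested_count in t_k.items():
        topic_focus = r_k[user] / harvested_count
        total = total_posted.get(user, 0)
        # Guard against a missing or zero total to avoid division by zero.
        overall_focus = r_k[user] / total if total else 0.0
        metrics[user] = (topic_focus, overall_focus)
    return metrics


# Made-up example data: alice has 3 harvested tweets, 2 of them relevant.
harvested = [("alice", True), ("alice", True), ("alice", False), ("bob", True)]
total_posted = {"alice": 10, "bob": 50}
metrics = focus_metrics(harvested, total_posted)
```

In this toy example bob reaches the maximum Topic Focus with a single relevant tweet, yet his Overall Focus is low because that tweet is a small share of everything he posted, which is why the two metrics are reported side by side.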
Experiments
• Classifier trained on 10,000 labelled examples
• Collection of 278,351 tweets from September to December 2016, of which
15,124 are classified as relevant.
Results: TwitterRank
• Limited success, but a notable spread between top and bottom
ranks.
• Requires meaningful connections to exist; there is also a lack of ground
truth.
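The dependence on meaningful connections can be illustrated with a generic weighted random walk. This is a PageRank-style sketch, not the published TwitterRank formula and not the adapted single-topic version used in this work; the function name, the `followees` map, the per-user `weights`, and the damping value of 0.85 are all assumptions made for illustration.

```python
def random_walk_rank(followees, weights, damping=0.85, iterations=50):
    """Weighted random walk over a follower graph (illustrative only).

    followees: dict user -> list of users that this user follows.
    weights: dict user -> positive topical weight (e.g. relevant tweet count).
    """
    users = list(weights)
    total_weight = sum(weights.values())
    teleport = {u: weights[u] / total_weight for u in users}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {u: (1 - damping) * teleport[u] for u in users}
        for u in users:
            targets = [v for v in followees.get(u, []) if v in weights]
            if targets:
                # Pass u's score to the users it follows, by topical weight.
                target_weight = sum(weights[v] for v in targets)
                for v in targets:
                    new_rank[v] += damping * rank[u] * weights[v] / target_weight
            else:
                # No followed candidates: spread the score via the teleport vector.
                for v in users:
                    new_rank[v] += damping * rank[u] * teleport[v]
        rank = new_rank
    return rank


weights = {"a": 3, "b": 1, "c": 1}          # made-up topical weights
rank_sparse = random_walk_rank({}, weights)  # no candidate follows another
rank_connected = random_walk_rank({"b": ["a"], "c": ["a"]}, weights)
```

When no candidate user follows another, every score collapses to the teleport weight, so the walk adds nothing beyond the content-based input. Only in the connected case does the graph change the ranking.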
Results: Topic Focus
• Suggested correlation between a high TwitterRank and high Topic
Focus
• More than 10 users have a Topic Focus = 100
Results: Overall Focus
• Users here also have a high Topic Focus.
• FlorzinhaSimoes and pelotelefone also rank highly in other lists.
Fragment of candidate users graph
Conclusions
• Lessons learnt
• Ineffectiveness of TwitterRank
• Non-topological metrics aid in producing results
• Research in progress
• Continually experimenting with larger data sets
• Engagement with candidate users
References
[1] Weng, Jianshu, Ee-Peng Lim, Jing Jiang, and Qi He. "TwitterRank: finding topic-sensitive influential twitterers." In Proceedings of the Third ACM International Conference on Web Search and Data Mining, pp. 261-270. ACM, 2010.
[2] Missier, Paolo, Callum McClean, Jonathan Carlton, Diego Cedrim, Leonardo Silva, Alessandro Garcia, Alexandre Plastino, and Alexander Romanovsky. "Recruiting from the network: discovering Twitter users who can help combat Zika epidemics." arXiv preprint arXiv:1703.03928 (2017).
[3] Missier, Paolo, Alexander Romanovsky, Tudor Miu, Atinder Pal, Michael Daniilakis, Alessandro Garcia, Diego Cedrim, and Leonardo da Silva Sousa. "Tracking Dengue Epidemics using Twitter Content Classification and Topic Modelling." In International Conference on Web Engineering, pp. 80-92. Springer International Publishing, 2016.
[4] Chawla, Nitesh V., Kevin W. Bowyer, Lawrence O. Hall, and W. Philip Kegelmeyer. "SMOTE: synthetic minority over-sampling technique." Journal of Artificial Intelligence Research 16 (2002): 321-357.

Editor's Notes

  1. Motivation: Mosquito-borne epidemics are becoming more frequent in tropical and subtropical regions of the world, and prevention programs have not been particularly effective. For example, the Brazilian health system requires health agents to report each Zika case, but the reports take several days to process and publish. Goals: Social media is potentially a faster vehicle for information than traditional channels such as in-person data collection or surveys. Hypothesis: We investigate what kinds of signals from social sensors (people who stand out for the quality and relevance of their contributions) can be detected in real-time social media streams. Ranking these social sensors will provide target users for the health authorities.
  2. Long harvest period: Finding a sufficient number of relevant tweets per user requires a long harvest period. A 3-month harvest in 2016 yielded 13,000 relevant tweets (roughly 10% of the entire harvest), with most users who have tweets classified as relevant contributing only a single relevant tweet. Sparsity of users: Our adapted single-topic version of TwitterRank requires both meaningful connections and content to exist to be effective and efficient. Applying it to a sparse data set yields few or no interesting results.
  3. The approach combines content-based automated classification of tweets, which isolates relevant signals from noisy chatter, with ranking of the users who author such content. The left side of the diagram is the offline phase and the right side is the online phase.
  4. A keyword harvester listens to the live Twitter stream. Keywords are drawn from sample tweets inspected by domain experts. Pre-processing removes social media-type communication. POS tagging and lemmatisation are also performed. Re-sampling: To boost relevant samples, we added an extra 600 annotated relevant examples and then applied over-sampling using the SMOTE algorithm. This boosted the share of relevant samples from 12.1% to 24.3%. Model: Using a Random Forest classifier with SMOTE over-sampling, no attribute selection, and 1,2,3-grams, we achieved an accuracy of 84.1%.
  5. Tweets are harvested and pre-processed in the same way as in the offline phase. The classifier assigns a Relevant, News, or Noise label depending on the tweet content. News and Noise tweets are no longer considered; Relevant tweets move forward. The non-topological metrics are computed immediately, and their results are ready to be pushed to the portal. Before TwitterRank is computed, the followers of users who contributed Relevant tweets are fetched from Twitter. This can become a bottleneck, especially if a large number of followers need to be collected.
  6. TwitterRank: We adapted the original version to consider only a single topic rather than multiple topics. We also replaced some metrics to better suit our needs; this is detailed in the long version of the paper I'm presenting. Topic Focus: The proportion of a specific user's tweets in the harvest that are Relevant. Overall Focus: The proportion of Relevant tweets in the harvest versus the total number of tweets posted by the user during the harvest period. These are interesting because they provide comparative metrics for the TwitterRank approach without taking the network structure into account.
  7. The classifier was trained using 10,000 manually annotated examples. The table shows various combinations of n-grams with Multinomial Naïve Bayes versus Random Forest. The best overall accuracy across all configurations is 84.1%, from Random Forest with 100 trees, 1,2,3-grams, no attribute selection, and SMOTE-based boosting. From 278,000 tweets we could only harvest 15,000 Relevant tweets, giving us a small amount to work with. The image is an example of a tweet classified as Relevant. The table shows the sparsity of contributions by users: a long single-tweet tail gives us a lot of candidate users (13,228).
  8. TwitterRank assigns very small values to users, and the original paper does not provide any reference figures. There is a notable spread between the top and bottom ranks (150%). The results are questionable, as the approach only yields interesting values when a user has at least some of their followers as candidate users. We find that our candidate users have very few connections among each other.
  9. Note that SeizeTheHeaven appears in the top 10 for both TwitterRank and Topic Focus. All top-10 TwitterRank users appear among the top-30 Topic Focus users, suggesting that a high TwitterRank may correlate with a high Topic Focus. There are more than 10 users with a Topic Focus of 100.
  10. Overall Focus users also have a high Topic Focus, and they rank within the top 20. Two users here rank highly in other lists: FlorzinhaSimoes and pelotelefone.
  11. Green nodes = top 10 TwitterRank; blue nodes = top 10 TwitterRank and Overall Focus; red nodes = top 10 TwitterRank and Topic Focus.
  12. Ineffectiveness: Given the sparsity of contributors and their limited connections in the social graph, it's not surprising that TwitterRank, which relies on these, is not particularly effective. Non-topological: In comparison, these metrics appear to be effective. Going forward: We are experimenting with larger data sets that are continually being harvested from Twitter. High season has just passed, so a larger corpus of relevant tweets is hoped for. Rather than analysing users after the fact, we're discussing directly engaging with the users that are identified when relevant content is posted.
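
Note 4 mentions over-sampling the Relevant class with SMOTE [4]. The core idea, creating synthetic minority examples by interpolating between a minority sample and one of its nearest minority neighbours, can be sketched as below. This is a simplified illustration in plain Python, not the implementation used in the experiments; the function name, the value of `k`, and the toy feature vectors are assumptions.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Generate n_new synthetic points from a list of minority feature vectors.

    Simplified: the base sample is drawn at random for each new point, and
    neighbours are found by brute-force squared Euclidean distance.
    Requires at least two minority samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy minority class: the four corners of the unit square.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
synthetic = smote_sketch(minority, n_new=4)
```

Because each synthetic point lies on a segment between two real minority samples, the new examples stay inside the region already occupied by the minority class rather than duplicating existing tweets exactly.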