SlideShare a Scribd company logo
1 of 18
08/30/18 Stefan Helmstetter, Heiko Paulheim 1
Weakly Supervised Learning for Fake News
Detection on Twitter
Stefan Helmstetter, Heiko Paulheim
08/30/18 Stefan Helmstetter, Heiko Paulheim 2
Motivation
• Social media...
– ...are an increasingly important source of information
– ...can be manipulated easily
08/30/18 Stefan Helmstetter, Heiko Paulheim 3
Motivation
• Fake news detection: a straight forward machine learning problem
– Simplest case: two classes
– Researched for several decades
– Used, e.g., for spam filtering
08/30/18 Stefan Helmstetter, Heiko Paulheim 4
Motivation
• Challenge
– The more training data, the better
– Mass labeling data is difficult (e.g., requires investigations)
●
cf. spam filtering: labeling can be done “on the fly” by laymen
08/30/18 Stefan Helmstetter, Heiko Paulheim 5
Approach
• We cannot easily tell a fake news tweet from a real one
• But we have information on fake and trustworthy sources
08/30/18 Stefan Helmstetter, Heiko Paulheim 6
Approach
• Naive mass labeling:
– every tweet from a fake source is a fake tweet
– every tweet from a trustworthy source is a true tweet
• Our collection:
– 65 fake news sources
– 47 trustworthy news sources
– 401k tweets
●
111k fake news
●
291k real news
08/30/18 Stefan Helmstetter, Heiko Paulheim 7
Approach
• Skew towards 2017
– time of crawling, limitations of Twitter API
– more real than fake news (intentionally!)
08/30/18 Stefan Helmstetter, Heiko Paulheim 8
Approach
• Naive mass labeling:
– every tweet from a fake source is a fake tweet
– every tweet from a trustworthy source is a true tweet
08/30/18 Stefan Helmstetter, Heiko Paulheim 9
Approach
• Mind the classification task
– if we train a classifier, we learn to identify
tweets from untrustworthy sources
– not necessarily the same as fake news tweets
• Assumption
– the training dataset is large
– non-fake news are also covered by trustworthy sources
– trustworthy copies outnumber fake news ones
●
incidental skew in the dataset
08/30/18 Stefan Helmstetter, Heiko Paulheim 10
Approach
• Leaving that caveat aside, we use
– 53 user-level features
e.g., no. of followers, tweet frequency
– 69 tweet-level features
e.g., length, no. of hashtags, no. of URLs
– text features
as BoW (60k features) or doc2vec model (300 features)
– topic features
10-200 topics created using LDA
– eight features using sentiment and polarity analysis
• Classifiers
– Naive Bayes, Decision Trees, SVM, Neural Net (1 hidden layer),
Random Forest, xgboost
– Voting and weighted voting of the above
08/30/18 Stefan Helmstetter, Heiko Paulheim 11
Approach
• Optimal selection of features per classifier
08/30/18 Stefan Helmstetter, Heiko Paulheim 12
Evaluation
• Setting 1
– Cross validation on the training set
– Remember: actual target is trustworthiness of source
• Setting 2
– Validation against a gold standard
– Target here: trustworthiness of tweet
• Two variants each
– with and without user level features
– idea: judging tweets from known and unknown sources
08/30/18 Stefan Helmstetter, Heiko Paulheim 13
Evaluation
• Setting 1
– Cross validation on the training set
– Remember: actual target is
trustworthiness of source
• Results
– up to .78 without user level tweets
– up to .94 with user level tweets
– xgboost and voting work best
08/30/18 Stefan Helmstetter, Heiko Paulheim 14
Evaluation
• Setting 2
– Validation against a gold standard
– Target here: trustworthiness of tweet
• Results
– up to .77 without user level tweets
– up to .89 with user level tweets
– neural net works best
• Observation:
– results are not much worse than for setting 1
– i.e.: source labels seem to be a suitable proxy for tweet labels
08/30/18 Stefan Helmstetter, Heiko Paulheim 15
Evaluation
• Feature weighting by xgboost:
– most important features are user level features
08/30/18 Stefan Helmstetter, Heiko Paulheim 16
Evaluation
• Without user level features
– surface level features are strong
– content/topics are not too important
08/30/18 Stefan Helmstetter, Heiko Paulheim 17
Conclusion
• Fake news detection is a straight forward classification task
– but training data is scarce
• Inexact mass-labeling can be done
– by using source instead of tweet labels
– collection of large-scale training is easy
– automatic re-collection is possible
(e.g., for new topics, changed twitter behavior)
• Results for tweet labeling
– not much worse than for source labeling
08/30/18 Stefan Helmstetter, Heiko Paulheim 18
Weakly Supervised Learning for Fake News
Detection on Twitter
Stefan Helmstetter, Heiko Paulheim

More Related Content

What's hot

Twitter sentiment analysis ppt
Twitter sentiment analysis pptTwitter sentiment analysis ppt
Twitter sentiment analysis pptSonuCreation
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGijcsit
 
Sentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using pythonSentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using pythonHetu Bhavsar
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project reportBharat Khanna
 
Seminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningSeminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningParvathi Sanil Nair
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataHari Prasad
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on TwitterSmritiAgarwal26
 
PPT2: Introduction of Machine Learning & Deep Learning and its types
PPT2: Introduction of Machine Learning & Deep Learning and its typesPPT2: Introduction of Machine Learning & Deep Learning and its types
PPT2: Introduction of Machine Learning & Deep Learning and its typesakira-ai
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarRavi Kumar
 
Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on TwitterNitish J Prabhu
 
Deep Learning Fundamentals
Deep Learning FundamentalsDeep Learning Fundamentals
Deep Learning FundamentalsThomas Delteil
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGene Leybzon
 
Fake News Detection Using Machine Learning
Fake News Detection Using Machine LearningFake News Detection Using Machine Learning
Fake News Detection Using Machine LearningIRJET Journal
 
Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...
Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...
Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...Nohoax Kanont
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGISynaptonIncorporated
 

What's hot (20)

Twitter sentiment analysis ppt
Twitter sentiment analysis pptTwitter sentiment analysis ppt
Twitter sentiment analysis ppt
 
Social Media Sentiment Analysis
Social Media Sentiment AnalysisSocial Media Sentiment Analysis
Social Media Sentiment Analysis
 
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNINGDETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
DETECTION OF FAKE ACCOUNTS IN INSTAGRAM USING MACHINE LEARNING
 
Sentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using pythonSentiment analysis of Twitter data using python
Sentiment analysis of Twitter data using python
 
Twitter sentiment analysis project report
Twitter sentiment analysis project reportTwitter sentiment analysis project report
Twitter sentiment analysis project report
 
FAKE Review Detection
FAKE Review DetectionFAKE Review Detection
FAKE Review Detection
 
Seminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learningSeminar on detecting fake accounts in social media using machine learning
Seminar on detecting fake accounts in social media using machine learning
 
Sentiment Analysis using Twitter Data
Sentiment Analysis using Twitter DataSentiment Analysis using Twitter Data
Sentiment Analysis using Twitter Data
 
Detecting fake news .pptx
Detecting fake news .pptxDetecting fake news .pptx
Detecting fake news .pptx
 
Sentiment Analysis on Twitter
Sentiment Analysis on TwitterSentiment Analysis on Twitter
Sentiment Analysis on Twitter
 
Introduction to Deep Learning
Introduction to Deep Learning Introduction to Deep Learning
Introduction to Deep Learning
 
PPT2: Introduction of Machine Learning & Deep Learning and its types
PPT2: Introduction of Machine Learning & Deep Learning and its typesPPT2: Introduction of Machine Learning & Deep Learning and its types
PPT2: Introduction of Machine Learning & Deep Learning and its types
 
New sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumarNew sentiment analysis of tweets using python by Ravi kumar
New sentiment analysis of tweets using python by Ravi kumar
 
Sentiment Analaysis on Twitter
Sentiment Analaysis on TwitterSentiment Analaysis on Twitter
Sentiment Analaysis on Twitter
 
Deep Learning Fundamentals
Deep Learning FundamentalsDeep Learning Fundamentals
Deep Learning Fundamentals
 
Spam identification fake profile
Spam identification fake profileSpam identification fake profile
Spam identification fake profile
 
Generative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First SessionGenerative AI Use-cases for Enterprise - First Session
Generative AI Use-cases for Enterprise - First Session
 
Fake News Detection Using Machine Learning
Fake News Detection Using Machine LearningFake News Detection Using Machine Learning
Fake News Detection Using Machine Learning
 
Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...
Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...
Unlocking the Power of Generative AI Models and Systems such as GPT-4 and Cha...
 
Transformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGITransformers, LLMs, and the Possibility of AGI
Transformers, LLMs, and the Possibility of AGI
 

Similar to Weakly Supervised Learning for Fake News Detection on Twitter

Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningHeiko Paulheim
 
Open Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media AnalysisOpen Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media Analysisikanow
 
Open analytics social media framework
Open analytics   social media frameworkOpen analytics   social media framework
Open analytics social media frameworkOpen Analytics
 
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...Jason Hong
 
Using Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challengesUsing Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challengesDr Wasim Ahmed
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysisikanow
 
Analysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World EventsAnalysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World Eventskcortis
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine LearningHeiko Paulheim
 
Group Chapter Presentation
Group Chapter PresentationGroup Chapter Presentation
Group Chapter Presentationfeinal
 
Hallam healthresearch tweetchats
Hallam healthresearch tweetchatsHallam healthresearch tweetchats
Hallam healthresearch tweetchatsHeidi Probst
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisOpen Analytics
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Timo Wandhoefer
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSumit Raj
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDivyaPatel729457
 
Social Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - HandoutSocial Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - HandoutMatthew J. Kushin, Ph.D.
 
Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar Symeon Papadopoulos
 

Similar to Weakly Supervised Learning for Fake News Detection on Twitter (16)

Exploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data MiningExploiting Linked Open Data as Background Knowledge in Data Mining
Exploiting Linked Open Data as Background Knowledge in Data Mining
 
Open Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media AnalysisOpen Analytics: Building Effective Frameworks for Social Media Analysis
Open Analytics: Building Effective Frameworks for Social Media Analysis
 
Open analytics social media framework
Open analytics   social media frameworkOpen analytics   social media framework
Open analytics social media framework
 
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
Usable Privacy and Security: A Grand Challenge for HCI, Human Computer Inter...
 
Using Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challengesUsing Twitter as a data source: An overview of ethical challenges
Using Twitter as a data source: An overview of ethical challenges
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysis
 
Analysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World EventsAnalysis of Cyberbullying Tweets in Trending World Events
Analysis of Cyberbullying Tweets in Trending World Events
 
Fast Approximate A-box Consistency Checking using Machine Learning
Fast Approximate  A-box Consistency Checking using Machine LearningFast Approximate  A-box Consistency Checking using Machine Learning
Fast Approximate A-box Consistency Checking using Machine Learning
 
Group Chapter Presentation
Group Chapter PresentationGroup Chapter Presentation
Group Chapter Presentation
 
Hallam healthresearch tweetchats
Hallam healthresearch tweetchatsHallam healthresearch tweetchats
Hallam healthresearch tweetchats
 
Building Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media AnalysisBuilding Effective Frameworks for Social Media Analysis
Building Effective Frameworks for Social Media Analysis
 
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
Online Forums vs. Social Networks: Two Case Studies to support eGovernment wi...
 
Sentiment Analysis of Twitter Data
Sentiment Analysis of Twitter DataSentiment Analysis of Twitter Data
Sentiment Analysis of Twitter Data
 
DP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptxDP1_160430723010_Divya.pptx
DP1_160430723010_Divya.pptx
 
Social Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - HandoutSocial Network Analysis Basics for Social Media Profs - Handout
Social Network Analysis Basics for Social Media Profs - Handout
 
Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar Social Media Crawling & Mining Seminar
Social Media Crawling & Mining Seminar
 

More from Heiko Paulheim

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...Heiko Paulheim
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vecHeiko Paulheim
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsHeiko Paulheim
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsHeiko Paulheim
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Heiko Paulheim
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph BlockHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Heiko Paulheim
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge GraphsHeiko Paulheim
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphHeiko Paulheim
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Heiko Paulheim
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Heiko Paulheim
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsHeiko Paulheim
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingHeiko Paulheim
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the WebHeiko Paulheim
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyHeiko Paulheim
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopHeiko Paulheim
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionHeiko Paulheim
 

More from Heiko Paulheim (20)

Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...Knowledge Graph Generation  from Wikipedia in the Age of ChatGPT:  Knowledge ...
Knowledge Graph Generation from Wikipedia in the Age of ChatGPT: Knowledge ...
 
What_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdfWhat_do_Knowledge_Graph_Embeddings_Learn.pdf
What_do_Knowledge_Graph_Embeddings_Learn.pdf
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
New Adventures in RDF2vec
New Adventures in RDF2vecNew Adventures in RDF2vec
New Adventures in RDF2vec
 
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI SystemsKnowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems
 
From Wikis to Knowledge Graphs
From Wikis to Knowledge GraphsFrom Wikis to Knowledge Graphs
From Wikis to Knowledge Graphs
 
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
Using Knowledge Graphs in Data Science - From Symbolic to Latent Representati...
 
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids  on the Knowledge Graph BlockBeyond DBpedia and YAGO – The New Kids  on the Knowledge Graph Block
Beyond DBpedia and YAGO – The New Kids on the Knowledge Graph Block
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist’s Perspec...
 
Machine Learning & Embeddings for Large Knowledge Graphs
Machine Learning & Embeddings  for Large Knowledge GraphsMachine Learning & Embeddings  for Large Knowledge Graphs
Machine Learning & Embeddings for Large Knowledge Graphs
 
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge GraphFrom Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
From Wikipedia to Thousands of Wikis – The DBkWik Knowledge Graph
 
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
Big Data, Smart Algorithms, and Market Power - A Computer Scientist's Perspec...
 
Make Embeddings Semantic Again!
Make Embeddings Semantic Again!Make Embeddings Semantic Again!
Make Embeddings Semantic Again!
 
How much is a Triple?
How much is a Triple?How much is a Triple?
How much is a Triple?
 
Machine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge GraphsMachine Learning with and for Semantic Web Knowledge Graphs
Machine Learning with and for Semantic Web Knowledge Graphs
 
Towards Knowledge Graph Profiling
Towards Knowledge Graph ProfilingTowards Knowledge Graph Profiling
Towards Knowledge Graph Profiling
 
Knowledge Graphs on the Web
Knowledge Graphs on the WebKnowledge Graphs on the Web
Knowledge Graphs on the Web
 
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and OntologyData-driven Joint Debugging of the DBpedia Mappings and Ontology
Data-driven Joint Debugging of the DBpedia Mappings and Ontology
 
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on TopServing DBpedia with DOLCE - More Than Just Adding a Cherry on Top
Serving DBpedia with DOLCE - More Than Just Adding a Cherry on Top
 
Combining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly DetectionCombining Ontology Matchers via Anomaly Detection
Combining Ontology Matchers via Anomaly Detection
 

Recently uploaded

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档208367051
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Cantervoginip
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样vhwb25kk
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一F sss
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsappssapnasaifi408
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort servicejennyeacort
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一F La
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfBoston Institute of Analytics
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfLars Albertsson
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)jennyeacort
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一F La
 

Recently uploaded (20)

Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
原版1:1定制南十字星大学毕业证(SCU毕业证)#文凭成绩单#真实留信学历认证永久存档
 
ASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel CanterASML's Taxonomy Adventure by Daniel Canter
ASML's Taxonomy Adventure by Daniel Canter
 
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
1:1定制(UQ毕业证)昆士兰大学毕业证成绩单修改留信学历认证原版一模一样
 
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
办理学位证中佛罗里达大学毕业证,UCF成绩单原版一比一
 
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /WhatsappsBeautiful Sapna Vip  Call Girls Hauz Khas 9711199012 Call /Whatsapps
Beautiful Sapna Vip Call Girls Hauz Khas 9711199012 Call /Whatsapps
 
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
9711147426✨Call In girls Gurgaon Sector 31. SCO 25 escort service
 
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
办理(Vancouver毕业证书)加拿大温哥华岛大学毕业证成绩单原版一比一
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
E-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptxE-Commerce Order PredictionShraddha Kamble.pptx
E-Commerce Order PredictionShraddha Kamble.pptx
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdfPredicting Salary Using Data Science: A Comprehensive Analysis.pdf
Predicting Salary Using Data Science: A Comprehensive Analysis.pdf
 
Industrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdfIndustrialised data - the key to AI success.pdf
Industrialised data - the key to AI success.pdf
 
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
Call Us ➥97111√47426🤳Call Girls in Aerocity (Delhi NCR)
 
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
办理(UWIC毕业证书)英国卡迪夫城市大学毕业证成绩单原版一比一
 

Weakly Supervised Learning for Fake News Detection on Twitter

  • 1. 08/30/18 Stefan Helmstetter, Heiko Paulheim 1 Weakly Supervised Learning for Fake News Detection on Twitter Stefan Helmstetter, Heiko Paulheim
  • 2. 08/30/18 Stefan Helmstetter, Heiko Paulheim 2 Motivation • Social media... – ...are an increasingly important source of information – ...can be manipulated easily
  • 3. 08/30/18 Stefan Helmstetter, Heiko Paulheim 3 Motivation • Fake news detection: a straight forward machine learning problem – Simplest case: two classes – Researched for several decades – Used, e.g., for spam filtering
  • 4. 08/30/18 Stefan Helmstetter, Heiko Paulheim 4 Motivation • Challenge – The more training data, the better – Mass labeling data is difficult (e.g., requires investigations) ● cf. spam filtering: labeling can be done “on the fly” by laymen
  • 5. 08/30/18 Stefan Helmstetter, Heiko Paulheim 5 Approach • We cannot easily tell a fake news tweet from a real one • But we have information on fake and trustworthy sources
  • 6. 08/30/18 Stefan Helmstetter, Heiko Paulheim 6 Approach • Naive mass labeling: – every tweet from a fake source is a fake tweet – every tweet from a trustworthy source is a true tweet • Our collection: – 65 fake news sources – 47 trustworthy news sources – 401k tweets ● 111k fake news ● 291k real news
  • 7. 08/30/18 Stefan Helmstetter, Heiko Paulheim 7 Approach • Skew towards 2017 – time of crawling, limitations of Twitter API – more real than fake news (intentionally!)
  • 8. 08/30/18 Stefan Helmstetter, Heiko Paulheim 8 Approach • Naive mass labeling: – every tweet from a fake source is a fake tweet – every tweet from a trustworthy source is a true tweet
  • 9. 08/30/18 Stefan Helmstetter, Heiko Paulheim 9 Approach • Mind the classification task – if we train a classifier, we learn to identify tweets from untrustworthy sources – not necessarily the same as fake news tweets • Assumption – the training dataset is large – non-fake news are also covered by trustworthy sources – trustworthy copies outnumber fake news ones ● incidental skew in the dataset
  • 10. 08/30/18 Stefan Helmstetter, Heiko Paulheim 10 Approach • Leaving that caveat aside, we use – 53 user-level features e.g., no. of followers, tweet frequency – 69 tweet-level features e.g., length, no. of hashtags, no. of URLs – text features as BoW (60k features) or doc2vec model (300 features) – topic features 10-200 topics created using LDA – eight features using sentiment and polarity analysis • Classifiers – Naive Bayes, Decision Trees, SVM, Neural Net (1 hidden layer), Random Forest, xgboost – Voting and weighted voting of the above
  • 11. 08/30/18 Stefan Helmstetter, Heiko Paulheim 11 Approach • Optimal selection of features per classifier
  • 12. 08/30/18 Stefan Helmstetter, Heiko Paulheim 12 Evaluation • Setting 1 – Cross validation on the training set – Remember: actual target is trustworthiness of source • Setting 2 – Validation against a gold standard – Target here: trustworthiness of tweet • Two variants each – with and without user level features – idea: judging tweets from known and unknown sources
  • 13. 08/30/18 Stefan Helmstetter, Heiko Paulheim 13 Evaluation • Setting 1 – Cross validation on the training set – Remember: actual target is trustworthiness of source • Results – up to .78 without user level tweets – up to .94 with user level tweets – xgboost and voting work best
  • 14. 08/30/18 Stefan Helmstetter, Heiko Paulheim 14 Evaluation • Setting 2 – Validation against a gold standard – Target here: trustworthiness of tweet • Results – up to .77 without user level tweets – up to .89 with user level tweets – neural net works best • Observation: – results are not much worse than for setting 1 – i.e.: source labels seem to be a suitable proxy for tweet labels
  • 15. 08/30/18 Stefan Helmstetter, Heiko Paulheim 15 Evaluation • Feature weighting by xgboost: – most important features are user level features
  • 16. 08/30/18 Stefan Helmstetter, Heiko Paulheim 16 Evaluation • Without user level features – surface level features are strong – content/topics are not too important
  • 17. 08/30/18 Stefan Helmstetter, Heiko Paulheim 17 Conclusion • Fake news detection is a straight forward classification task – but training data is scarce • Inexact mass-labeling can be done – by using source instead of tweet labels – collection of large-scale training is easy – automatic re-collection is possible (e.g., for new topics, changed twitter behavior) • Results for tweet labeling – not much worse than for source labeling
  • 18. 08/30/18 Stefan Helmstetter, Heiko Paulheim 18 Weakly Supervised Learning for Fake News Detection on Twitter Stefan Helmstetter, Heiko Paulheim