With the spread of social media platforms and the proliferation of misleading news, misinformation detection on microblogging platforms has become a real challenge. During the COVID-19 pandemic, a great deal of fake news and many rumors were broadcast and shared daily on social media. To filter out this fake news, considerable work has been done on misinformation detection using machine learning and sentiment analysis for the English language. However, research on misinformation detection in Arabic-language social media remains limited. This paper introduces a misinformation verification system for Arabic COVID-19-related news using an Arabic rumors dataset from Twitter. We explored the dataset and prepared it using multiple phases of preprocessing before applying different machine learning classification algorithms combined with a semantic analysis method. The model was applied to 3.6k annotated tweets, achieving a best overall accuracy of 93% in detecting misinformation. We further built another dataset of COVID-19-related claims in Arabic to examine how our model performs on this new set of claims. Results show that the combination of machine learning techniques and linguistic analysis achieves the best scores, reaching 92% accuracy in detecting the veracity of sentences in the new dataset.
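The abstract does not enumerate its preprocessing phases. As a rough illustration only, a minimal Arabic tweet-cleaning step (URL and mention removal, diacritic stripping, alef normalization; these particular steps are common assumptions, not the paper's documented pipeline) might look like:

```python
import re

# Hypothetical minimal cleaning for Arabic tweets; the exact phases used in
# the paper are not specified in this abstract.
DIACRITICS = re.compile(r'[\u064B-\u0652]')          # tashkeel (short-vowel) marks
ALEF_VARIANTS = re.compile(r'[\u0622\u0623\u0625]')  # alef forms: normalize to bare alef

def preprocess_tweet(text: str) -> str:
    text = re.sub(r'https?://\S+', ' ', text)   # strip URLs
    text = re.sub(r'[@#]\w+', ' ', text)        # strip mentions and hashtags
    text = DIACRITICS.sub('', text)             # remove diacritics
    text = ALEF_VARIANTS.sub('\u0627', text)    # normalize alef variants
    return ' '.join(text.split())               # collapse whitespace
```

The cleaned tokens would then be vectorized and fed to the classification algorithms the abstract mentions.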
Fake news detection for Arabic headlines-articles news data using deep learning (IJECEIAES)
Fake news has become increasingly prevalent in recent years. The evolution of social websites has spurred the expansion of fake news, causing it to mix with truthful information. English fake news detection has had the largest share of studies, unlike Arabic fake news detection, which is still very limited. The fake news phenomenon has changed people's and society's perspectives through revolts in several Arab countries. False news distorts reality, ignites chaos, and sways public judgment. This paper provides an Arabic fake news detection approach using different deep learning models, including long short-term memory and convolutional neural networks, based on article-headline pairs to determine whether a news headline is related or unrelated to the parallel news article. A dataset about the war in Syria and Middle East political issues is utilized, comprising 422 claims and 3,042 articles. The models yield promising results.
PANDEMIC INFORMATION DISSEMINATION WEB APPLICATION: A MANUAL DESIGN FOR EVERYONE (ijcsitcejournal)
The aim of this research is to generate a web application from a previously unpublished methodology, with a series of instructions indicating the coding in a flow diagram. The primary purpose of this methodology is to aid non-profits in disseminating information regarding the COVID-19 pandemic, so that users can share vital and up-to-date information. This is a functional design, and a series of screenshots demonstrating its behaviour is presented. This design arose from the necessity to create a web application for an information dissemination platform; it also addresses an audience that does not have programming knowledge. The document follows the scientific method in its writing. The authors acknowledge that a similar design exists in the bibliography; therefore, the differences between the designs are described herein. It is important to point out that this proposal can be taken as an alternative for the design of any web application.
How does fake news spread? Understanding pathways of disinformation spread thro... (Araz Taeihagh)
What are the pathways for spreading disinformation on social media platforms? This article addresses this question by collecting, categorising, and situating an extensive body of research on how application programming interfaces (APIs) provided by social media platforms facilitate the spread of disinformation. We first examine the landscape of official social media APIs, then perform quantitative research on the open-source code repositories GitHub and GitLab to understand the usage patterns of these APIs. By inspecting the code repositories, we classify developers' usage of the APIs as official or unofficial, and develop a four-stage framework characterising pathways for spreading disinformation on social media platforms. We further highlight how the stages in the framework were activated during the 2016 US Presidential Elections, before providing policy recommendations on issues relating to access to APIs, algorithmic content, and advertisements, and suggesting rapid responses to coordinated campaigns, the development of collaborative and participatory approaches, and government stewardship in the regulation of social media platforms.
CATEGORIZING 2019-N-COV TWITTER HASHTAG DATA BY CLUSTERING (ijaia)
Unsupervised machine learning techniques such as clustering are gaining wide use with the recent growth of social communication platforms like Twitter and Facebook. Clustering enables the discovery of patterns in these unstructured datasets. We collected tweets matching hashtags linked to COVID-19 from a Kaggle dataset and compared the performance of nine clustering algorithms on it. We evaluated the generalizability of these algorithms using a supervised learning model. Finally, using a selected unsupervised learning algorithm, we categorized the clusters. The top five categories are Safety, Crime, Products, Countries and Health. This can prove helpful for organizations working with large amounts of Twitter data that need to quickly find key points in the data before going into further classification.
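As a toy illustration of the clustering step (a bare-bones Lloyd's-algorithm k-means over small numeric vectors, not any of the nine library algorithms the abstract compares), the idea can be sketched as:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Minimal k-means sketch: assign points to nearest center, recompute means."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)              # pick k distinct starting centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k),
                          key=lambda c: sum((a - b) ** 2
                                            for a, b in zip(p, centers[c])))
            clusters[nearest].append(p)
        # Recompute each center as the mean of its cluster (keep old center if empty).
        centers = [tuple(sum(xs) / len(c) for xs in zip(*c)) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters
```

In practice tweets would first be vectorized (e.g. as term-frequency vectors) before clustering; the categorization of clusters in the abstract is a separate, manual or supervised step.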
Quantifying COVID-19 content in the online health opinion war using machine l... (Venkat Projects)
A huge amount of potentially dangerous COVID-19 misinformation is appearing online. Here we use machine learning to quantify COVID-19 content among online opponents of establishment health guidance, in particular vaccinations ("anti-vax"). We find that the anti-vax community is developing a less focused debate around COVID-19 than its counterpart, the pro-vaccination (“pro-vax”) community. However, the anti-vax community exhibits a broader range of “flavors” of COVID-19 topics, and hence can appeal to a broader cross-section of individuals seeking COVID-19 guidance online, e.g. individuals wary of a mandatory fast-tracked COVID-19 vaccine or those seeking alternative remedies. Hence the anti-vax community looks better positioned to attract fresh support going forward than the pro-vax community. This is concerning since a widespread lack of adoption of a COVID-19 vaccine will mean the world falls short of providing herd immunity, leaving countries open to future COVID-19 resurgences. We provide a mechanistic model that interprets these results and could help in assessing the likely efficacy of intervention strategies. Our approach is scalable and hence tackles the urgent problem facing social media platforms of having to analyze huge volumes of online health misinformation and disinformation.
An Overview on the Use of Data Mining and Linguistics Techniques for Building... (ijcsit)
The usage of Online Social Networks (OSNs), such as Facebook and Twitter, to exchange and disseminate news and information in real time is becoming more and more popular. Twitter in particular allows the instant dissemination of short messages in the form of microblogs to followers. This survey reviews the literature to explore and examine how OSNs, such as the microblogging tool Twitter, can help in the detection of spreading epidemics. The paper highlights significant challenges in the field of Natural Language Processing (NLP) when using microblog-based early disease detection systems. For instance, microblogging data is an unstructured collection of short messages (140 characters in Twitter), with noise and non-standard use of the English language. Hence, research is currently exploring the field of linguistics to determine the semantics of the text, and uses data mining techniques to extract useful information for disease spread detection. Furthermore, the survey discusses applications and existing early disease detection systems based on OSNs, and outlines directions for future research on improving such systems based on a combination of linguistics methods, data mining techniques and recommendation systems.
Clustering analysis on news from health OSINT data regarding CORONAVIRUS-COVI... (ALexandruDaia1)
Our primary goal was to detect clusters via gensim libraries in news data consisting of information regarding health and threats. We identified clusters for the periods corresponding to: i) January 2006 until the end of 2019, as December 2019 is considered the first month in which information about CORONAVIRUS COVID-19 was made public; ii) between the 1st of January 2019 and the 31st of December 2019; and iii) between the 31st of December 2019 and the 14th of April 2020. We conducted experiments using natural language on open source intelligence data offered generously by brica.de, a provider specialized in Business Risk Intelligence & Cyberthreat Awareness.
During the last decade, social media has been regarded as a rich, dominant source of information and news. Its unsupervised nature leads to the emergence and spread of fake news. Fake news detection has gained great importance, posing many challenges to the research community. One of the main challenges is detection accuracy, which is highly affected by the chosen and extracted features and the classification algorithm used. In this paper, we propose a context-based solution that relies on account features and a random forest classifier to detect fake news. It achieves a precision of 99.8%. The system's accuracy has been compared to other commonly used classifiers such as a decision tree classifier, Gaussian Naïve Bayes, and a neural network, which give precisions of 98.4%, 92.6%, and 62.7% respectively. The experiments' accuracy results show the possibility of distinguishing fake news and giving credibility scores for social media news with relatively high performance.
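The abstract does not list the account features it relies on. A hypothetical feature-extraction step feeding such a context-based classifier (the field names below are illustrative assumptions, not the paper's actual feature set) could look like:

```python
def account_features(acct: dict) -> list:
    # Hypothetical account-level features for a context-based fake-news
    # classifier; field names are assumptions, not the paper's feature set.
    followers = acct.get('followers', 0)
    following = acct.get('following', 0)
    return [
        1.0 if acct.get('verified') else 0.0,     # verified badge
        followers / max(following, 1),            # follower/following ratio
        float(acct.get('account_age_days', 0)),   # account age
        float(acct.get('tweets_per_day', 0.0)),   # posting intensity
    ]
```

Vectors like these would then be passed to a random forest (or any of the compared classifiers) for training and prediction.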
Experimenting with Big Data and AI to Support Peace and Security (UN Global Pulse)
UN Global Pulse is working with partners to explore how data from social media and radio shows can inform peace and security efforts in Africa. The methodology, case studies, and tools developed as part of these efforts are detailed in this report.
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB... (caijjournal)
Quick access to information on social media networks, as well as its exponential rise, has made it difficult to distinguish between fake and real information. Fast dissemination by way of sharing has enhanced its falsification exponentially. It is also important for the credibility of social media networks to avoid the spread of fake information. It is therefore an emerging research challenge to automatically check for misstatement of information through its source, content, or publisher, and to prevent unauthenticated sources from spreading rumours. This paper demonstrates an artificial intelligence based approach for the identification of false statements made by social network entities. Two variants of deep neural networks are applied to evaluate datasets and analyse them for the presence of fake news. The implementation produced classification accuracy of up to 99% when the dataset is tested for binary (true or false) labeling over multiple epochs.
Many applications on smartphones can use the various sensors embedded in mobiles to obtain users' private information. This can result in a variety of privacy issues that may lessen the level of mobile app usage. To understand this issue better, the researcher identified the root causes of privacy concerns. The study proposes a model that identifies the root causes of privacy concerns and perceived benefits based on our interpretation of information boundary theory. The proposed model also addresses usage behavior and behavioral intention toward using mobile apps by applying the Theory of Planned Behavior. The results show that "Cultural values" alone explains 70% of "Perceived privacy concerns", followed by "Self-defense", which explains around 23%, and then "Context of the situation" with 5%. The findings also show that "Perceived effectiveness of privacy policy" and "Perceived effectiveness of industry self-regulation" are both factors with the ability to reduce individuals' "Perceived privacy concerns", by 9% and 8% respectively.
Social Media and Forced Displacement: Big Data Analytics and Machine Learning... (UN Global Pulse)
UN Global Pulse and UNHCR Innovation Service, an interdepartmental initiative of the Office of the United Nations High Commissioner for Refugees (UNHCR) used data from Twitter to monitor protection issues and the safe access to asylum of migrants and refugees in Europe. The experimental project investigated interactions among refugees, between refugees and host communities, and between refugees and service providers along the way into Europe. This paper summarises the initial findings and lessons learned, and describes the results of ten mini-studies that were developed as part of the project. It outlines the process, questions and methodology used to develop the studies, and presents preliminary observations on how aspects of the Europe Refugee Emergency are related on social media.
A Pattern Language of Social Media in Public Security (Sebastian Denef)
This report summarizes practices of social media use in public security. Our goal is to create an inventory of best practices, lessons-learned, and roles and responsibilities, to analyse specifically how social media is being used by police and other public security planners, within and outside Europe. By providing an overall description, we aim to spark discussions and provide a common language for social media use in the field of public security planning.
Using data from an academic literature review, a review of blogs, books, existing best-practice descriptions, and expert knowledge, this report compares social media practices. Inspired by Christopher Alexander's work on 'pattern languages' for urban spaces and buildings, we analysed the data and looked for patterns. To further refine our findings, we presented the practice patterns to social media and security experts and interviewed them about their perspective and current practices.
As a result, we identified 74 practice patterns that describe and structure the use of social media for public security. The patterns are structured in three groups, describing how (1) law enforcement agencies (LEAs), such as the police, (2) citizens and (3) criminals are using social media and impact public security. With 50 patterns, the focus of our work is on group (1), the LEAs.
Monitoring Indonesian online news for COVID-19 event detection using deep le... (IJECEIAES)
Even though coronavirus disease 2019 (COVID-19) vaccination has been carried out, preparedness for a possible next outbreak wave is still needed given new mutations and virus variants. A near real-time surveillance system is required to enable stakeholders, especially the public, to act in a timely manner. Due to its hierarchical structure, epidemic reporting is usually slow, particularly when passing jurisdictional borders. This condition can lead to time gaps in public awareness of new and emerging infectious disease events. Online news is a potential source for COVID-19 monitoring because it reports almost every infectious disease incident globally. However, the news does not report only COVID-19 events, but also various information related to COVID-19 topics such as economic impact, health tips, and others. We developed a framework for online news monitoring and applied sentence classification to news titles using deep learning to distinguish COVID-19 event news from non-event news. The classification results showed that the fine-tuned bidirectional encoder representations from transformers (BERT) model trained on Bahasa Indonesia achieved the highest performance (accuracy: 95.16%, precision: 94.71%, recall: 94.32%, F1-score: 94.51%). Interestingly, our framework was able to identify news reporting the new COVID strain from the United Kingdom (UK) as event news 13 days before Indonesian officials closed the border to foreigners.
Sentiment analysis of comments in social media (IJECEIAES)
Social media platforms are witnessing significant growth in both size and purpose. One specific aspect of social media platforms is sentiment analysis, by which insights into the emotions and feelings of a person can be inferred from their posted text. Research related to sentiment analysis is acquiring substantial interest, as it is a promising field that can improve user experience and provide countless personalized services. Twitter is one of the most popular social media platforms; it has users from different regions with a variety of cultures and languages, and can thus provide valuable information from a diverse and large amount of data to improve decision making. In this paper, the sentiment orientation of textual features and emoji-based components is studied, targeting "Tweets" and comments posted in Arabic on Twitter during the 2018 World Cup event. This study also measures the significance of analyzing texts including or excluding emojis. The data is obtained from thousands of extracted tweets, to find the results of sentiment analysis for texts and emojis separately. Results show that emojis support the sentiment orientation of the texts, and that texts and emojis cannot separately provide reliable information, as they complement each other to give the intended meaning.
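Analyzing texts and emojis separately, as the study above does, presupposes splitting each comment into its textual and emoji components. A rough sketch of that split using Unicode general categories (an illustrative assumption, not the paper's documented method) could be:

```python
import unicodedata

def split_text_emoji(comment: str):
    # Separate a comment into plain text and emoji symbols so each part can
    # be scored independently; category 'So' (Symbol, other) covers most
    # emoji code points, though not every emoji sequence.
    emojis = [ch for ch in comment if unicodedata.category(ch) == 'So']
    text = ''.join(ch for ch in comment if unicodedata.category(ch) != 'So')
    return ' '.join(text.split()), emojis
```

Each half can then be fed to its own sentiment scorer, and the two scores compared or combined.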
Fake News Detection on Social Media using Machine Learning (classic tpr)
For some years, mostly since the rise of social media, fake news has become a societal problem, on some occasions spreading farther and faster than true information. Hence it is very important to detect fake news and reduce its involvement in social platforms. This project applies natural language processing (NLP) techniques to detect 'fake news', that is, misleading news stories that come from non-reputable sources. NLP refers to the branch of computer science, and more specifically the branch of artificial intelligence, concerned with giving computers the ability to understand text and spoken words in much the same way human beings can. NLP combines computational linguistics (rule-based modelling of human language) with statistical, machine learning and deep learning models. Together these technologies enable computers to process human language in the form of text or voice data and to 'understand' its full meaning, complete with the speaker's or writer's intent and sentiment. Building a model based on a count vectorizer (using word tallies) or a TF-IDF (term frequency-inverse document frequency) matrix can only get you so far, because such models do not consider important qualities like word ordering and context: two articles that are similar in their word counts may be completely different in their meaning. The data science community has responded by taking action against the problem.
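The count-vectorizer vs. TF-IDF contrast above can be made concrete with a minimal TF-IDF computation (a stdlib sketch for illustration, not the project's actual pipeline):

```python
import math
from collections import Counter

def tfidf(docs):
    # Term frequency times inverse document frequency: a word appearing in
    # every document gets weight 0, while distinctive words are up-weighted.
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    df = Counter(w for toks in tokenized for w in set(toks))  # document frequency
    weights = []
    for toks in tokenized:
        tf = Counter(toks)
        weights.append({w: (c / len(toks)) * math.log(n / df[w])
                        for w, c in tf.items()})
    return weights
```

Note the word-ordering limitation the paragraph describes: "hospital denies cure" and "cure denies hospital" receive identical TF-IDF vectors under this scheme, which is why order-aware models are needed.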
An Assessment of Sentiment Analysis of Covid 19 Tweets (ijtsrd)
Various rumors and assumptions have circulated about the COVID-19 immunization, making it a heated subject of discussion in India. This prompted a reaction from the country's populace, who posted tweets and retweets on Twitter expressing favorable, negative, and neutral evaluations. These tweets are a jumble of unstructured data. The goal of this study is to have the statistics justify the sentiment implied by them, taking advantage of Twitter's massive data pool and extracting the insights and implications that can be drawn from it. Comprehensive research on people's feelings may help us arrive at a fair understanding of the population at large's point of view toward preventing disease by vaccination. The dataset taken into consideration consists of vaccination-related tweets collected from 2020 to 2021, comprising a data mining of 1,605,152 tweets. Ms. Tanzeela Qureshi | Dr. Mohit Singh Tomar | Dr. Ritu Shrivastava, "An Assessment of Sentiment Analysis of Covid-19 Tweets", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7, Issue-5, October 2023. URL: https://www.ijtsrd.com/papers/ijtsrd59976.pdf Paper URL: https://www.ijtsrd.com/other-scientific-research-area/other/59976/an-assessment-of-sentiment-analysis-of-covid19-tweets/ms-tanzeela-qureshi
Overview of the fundamental roles in hydropower generation and the components involved in wider electrical engineering.
This paper presents the design and construction of hydroelectric dams, from the hydrologist's survey of the valley before construction through all the disciplines involved: fluid dynamics, structural engineering, generation and mains-frequency regulation, and the transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co-editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Similar to COMBINING MACHINE LEARNING AND SEMANTIC ANALYSIS FOR EFFICIENT MISINFORMATION DETECTION OF ARABIC COVID-19 TWEETS
During the last decade, the social media has been regarded as a rich dominant source of information and news. Its unsupervised nature leads to the emergence and spread of fake news. Fake news detection has gained a great importance posing many challenges to the research community. One of the main challenges is the detection accuracy which is highly affected by the chosen and extracted features and the used classification algorithm. In this paper, we propose a context-based solution that relies on account features and random forest classifier to detect fake news. It achieves the precision of 99.8%. The system accuracy has been compared to other commonly used classifiers such as decision tree classifier, Gaussian Naïve Bayes and neural network which give precision of 98.4%, 92.6%, and 62.7% respectively. The experiments’ accuracy results show the possibility of distinguishing fake news and giving credibility scores for social media news with a relatively high performance.
Experimenting with Big Data and AI to Support Peace and SecurityUN Global Pulse
UN Global Pulse is working with partners to explore how data from social media and radio shows can inform peace and security efforts in Africa. The methodology, case studies, and tools developed as part of these efforts are detailed in this report.
A RELIABLE ARTIFICIAL INTELLIGENCE MODEL FOR FALSE NEWS DETECTION MADE BY PUB...caijjournal
The quick access to information on social media networks as well as its exponential rise also made it
difficult to distinguish among fake information or real information. The fast dissemination by way of
sharing has enhanced its falsification exponentially. It is also important for the credibility of social media
networks to avoid the spread of fake information. So it is emerging research challenge to automatically
check for misstatement of information through its source, content, or publisher and prevent the
unauthenticated sources from spreading rumours. This paper demonstrates an artificial intelligence based
approach for the identification of the false statements made by social network entities. Two variants of
Deep neural networks are being applied to evalues datasets and analyse for fake news presence. The
implementation setup produced maximum extent 99% classification accuracy, when dataset is tested for
binary (true or false) labeling with multiple epochs.
Many applications on smart Phones can use various sensors embedded in the mobiles to provide users’ private information. This can result in a variety of privacy issues that may lessening level of mobile apps usage. To understand this issue better the researcher identified the root causes of privacy concerns. The study proposed a model identifies the root causes of privacy concerns and perceived benefits based on our interpretation for information boundary theory. The proposed model also addresses the usage behavior and behavioral intention toward using mobile apps by using the Theory of Planned Behavior. The result shows that “Cultural values” alone explains 70% of “Perceived privacy concerns” followed by “Self-defense” which explains around 23% of “Perceived privacy concerns”, and then “Context of the situation” with 5%. Whereas, the findings show that “Perceived effectiveness of privacy policy” and “Perceived effectiveness of industry self-regulation” both are factors which have the ability to reduce individuals “Perceived privacy concerns” by 9% and 8% respectively.
Social Media and Forced Displacement: Big Data Analytics and Machine Learning...UN Global Pulse
UN Global Pulse and UNHCR Innovation Service, an interdepartmental initiative of the Office of the United Nations High Commissioner for Refugees (UNHCR) used data from Twitter to monitor protection issues and the safe access to asylum of migrants and refugees in Europe. The experimental project investigated interactions among refugees, between refugees and host communities, and between refugees and service providers along the way into Europe. This paper summarises the initial findings and lessons learned, and describes the results of ten mini-studies that were developed as part of the project. It outlines the process, questions and methodology used to develop the studies, and presents preliminary observations on how aspects of the Europe Refugee Emergency are related on social media.
A Pattern Language of Social Media in Public SecuritySebastian Denef
This report summarizes practices of social media use in public security. Our goal is to create an inventory of best practices, lessons-learned, and roles and responsibilities, to analyse specifically how social media is being used by police and other public security planners, within and outside Europe. By providing an overall description, we aim to spark discussions and provide a common language for social media use in the field of public security planning.
Using data from academic literature review, the review of blogs, books, existing best practice descriptions and expert knowledge this report compares social media practices. Inspired by Christopher Alexander’s work on ‘pattern languages’ for urban spaces and buildings, we analysed the data and looked for patterns. To further refine our findings, we presented the practice patterns to social media and security experts and interviewed them about their perspective and current practices.
As a result, we identified 74 practice patterns that describe and structure the use of social media for public security. The patterns are structured in three groups, describing how (1) law enforcement agencies (LEAs), such as the police, (2) citizens and (3) criminals are using social media and impact public security. With 50 patterns, the focus of our work is on group (1), the LEAs.
Monitoring Indonesian online news for COVID-19 event detection using deep le...IJECEIAES
Even though coronavirus disease 2019 (COVID-19) vaccination has been done, preparedness for the possibility of the next outbreak wave is still needed with new mutations and virus variants. A near real-time surveillance system is required to provide the stakeholders, especially the public, to act in a timely response. Due to the hierarchical structure, epidemic reporting is usually slow particularly when passing jurisdictional borders. This condition could lead to time gaps for public awareness of new and emerging events of infectious diseases. Online news is a potential source for COVID-19 monitoring because it reports almost every infectious disease incident globally. However, the news does not report only about COVID-19 events, but also various information related to COVID-19 topics such as the economic impact, health tips, and others. We developed a framework for online news monitoring and applied sentence classification for news titles using deep learning to distinguish between COVID-19 events and non-event news. The classification results showed that the fine-tuned bidirectional encoder representations from transformers (BERT) trained with Bahasa Indonesia achieved the highest performance (accuracy: 95.16%, precision: 94.71%, recall: 94.32%, F1-score: 94.51%). Interestingly, our framework was able to identify news that reports the new COVID strain from the United Kingdom (UK) as an event news, 13 days before the Indonesian officials closed the border for foreigners.
Sentiment analysis of comments in social media IJECEIAES
Social media platforms are witnessing a significant growth in both size and purpose. One specific aspect of social media platforms is sentiment analysis, by which insights into the emotions and feelings of a person can be inferred from their posted text. Research related to sentiment analysis is acquiring substantial interest as it is a promising filed that can improve user experience and provide countless personalized services. Twitter is one of the most popular social media platforms, it has users from different regions with a variety of cultures and languages. It can thus provide valuable information for a diverse and large amount of data to be used to improve decision making. In this paper, the sentiment orientation of the textual features and emoji-based components is studied targeting “Tweets” and comments posted in Arabic on Twitter, during the 2018 world cup event. This study also measures the significance of analyzing texts including or excluding emojis. The data is obtained from thousands of extracted tweets, to find the results of sentiment analysis for texts and emojis separately. Results show that emojis support the sentiment orientation of the texts and those texts or emojis cannot separately provide reliable information as they complement each other to give the intended meaning.
Fake News Detection on Social Media using Machine Learningclassic tpr
For some years, mostly since the rise of social media, fake news has become a society
problem, in some occasions spreading more and faster than the true information. Hence it is
very important to detect and reduce the involvement of fake news in social platforms. This
project comes up with the applications of NLP (Natural Language Processing) techniques for
detecting the 'fake news', that is, misleading news stories that come from the non-reputable
sources. Natural language processing (NLP) refers to the branch of computer science- and
more specifically, the branch of artificial intelligence or AI-concerned with giving computers
the ability to understand text and spoken words in much the same way human beings can.
NLP combines computational linguistics-rule based modelling of human language- with
statistical, machine learning and deep learning models. Together these technologies enable
computers to process human language in the form of text or voice data and to 'understand' its
full meaning, complete with the speaker or writer's intent and sentiment. Only by building a
model based on a count vectorizer (using word tallies) or a (Term Frequency Inverse
Document Frequency) tfidf matrix, can only get you so far. But these models do not consider
the important qualities like word ordering and context. It is very possible that two articles that
are similar in their word count will be completely different in their meaning. The data science
community has responded by taking actions against the problem.
An Assessment of Sentiment Analysis of Covid 19 Tweetsijtsrd
Various rumors and assumptions have circulated about the COVID 19 immunization, making it a heated subject of discussion in India. This prompted a reaction from the countrys populace, who During the course of favorable, negative, and neutral evaluations, tweets and retweets on twitter. The number of these tweets are a jumble of unstructured data. The goal of this study is to have the statistics justify feeling implied by it. The purpose of this study is to take advantage of twitters massive data pool and extract insights that have the implications that can be drawn from it. Comprehensive research on the peoples feelings may help us arrive at a fair familiarity with the population at larges point of view toward preventing disease by vaccination. Dataset taken into consideration for vaccination related tweets are collected for study. From 2020 to 2021, including a data mining of 16,05,152 tweets related to vaccination. Ms. Tanzeela Qureshi | Dr. Mohit Singh Tomar | Dr. Ritu Shrivastava "An Assessment of Sentiment Analysis of Covid-19 Tweets" Published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-7 | Issue-5 , October 2023, URL: https://www.ijtsrd.com/papers/ijtsrd59976.pdf Paper Url: https://www.ijtsrd.com/other-scientific-research-area/other/59976/an-assessment-of-sentiment-analysis-of-covid19-tweets/ms-tanzeela-qureshi
Similar to COMBINING MACHINE LEARNING AND SEMANTIC ANALYSIS FOR EFFICIENT MISINFORMATION DETECTION OF ARABIC COVID-19 TWEETS (20)
NO1 Uk best vashikaran specialist in delhi vashikaran baba near me online vas...Amil Baba Dawood bangali
Contact with Dawood Bhai Just call on +92322-6382012 and we'll help you. We'll solve all your problems within 12 to 24 hours and with 101% guarantee and with astrology systematic. If you want to take any personal or professional advice then also you can call us on +92322-6382012 , ONLINE LOVE PROBLEM & Other all types of Daily Life Problem's.Then CALL or WHATSAPP us on +92322-6382012 and Get all these problems solutions here by Amil Baba DAWOOD BANGALI
#vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore#blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #blackmagicforlove #blackmagicformarriage #aamilbaba #kalajadu #kalailam #taweez #wazifaexpert #jadumantar #vashikaranspecialist #astrologer #palmistry #amliyaat #taweez #manpasandshadi #horoscope #spiritual #lovelife #lovespell #marriagespell#aamilbabainpakistan #amilbabainkarachi #powerfullblackmagicspell #kalajadumantarspecialist #realamilbaba #AmilbabainPakistan #astrologerincanada #astrologerindubai #lovespellsmaster #kalajaduspecialist #lovespellsthatwork #aamilbabainlahore #Amilbabainuk #amilbabainspain #amilbabaindubai #Amilbabainnorway #amilbabainkrachi #amilbabainlahore #amilbabaingujranwalan #amilbabainislamabad
Overview of the fundamental roles in Hydropower generation and the components involved in wider Electrical Engineering.
This paper presents the design and construction of hydroelectric dams from the hydrologist’s survey of the valley before construction, all aspects and involved disciplines, fluid dynamics, structural engineering, generation and mains frequency regulation to the very transmission of power through the network in the United Kingdom.
Author: Robbie Edward Sayers
Collaborators and co editors: Charlie Sims and Connor Healey.
(C) 2024 Robbie E. Sayers
Industrial Training at Shahjalal Fertilizer Company Limited (SFCL)MdTanvirMahtab2
This presentation is about the working procedure of Shahjalal Fertilizer Company Limited (SFCL). A Govt. owned Company of Bangladesh Chemical Industries Corporation under Ministry of Industries.
Immunizing Image Classifiers Against Localized Adversary Attacksgerogepatton
This paper addresses the vulnerability of deep learning models, particularly convolutional neural networks
(CNN)s, to adversarial attacks and presents a proactive training technique designed to counter them. We
introduce a novel volumization algorithm, which transforms 2D images into 3D volumetric representations.
When combined with 3D convolution and deep curriculum learning optimization (CLO), itsignificantly improves
the immunity of models against localized universal attacks by up to 40%. We evaluate our proposed approach
using contemporary CNN architectures and the modified Canadian Institute for Advanced Research (CIFAR-10
and CIFAR-100) and ImageNet Large Scale Visual Recognition Challenge (ILSVRC12) datasets, showcasing
accuracy improvements over previous techniques. The results indicate that the combination of the volumetric
input and curriculum learning holds significant promise for mitigating adversarial attacks without necessitating
adversary training.
Cosmetic shop management system project report.pdfKamal Acharya
Buying new cosmetic products is difficult. It can even be scary for those who have sensitive skin and are prone to skin trouble. The information needed to alleviate this problem is on the back of each product, but it's thought to interpret those ingredient lists unless you have a background in chemistry.
Instead of buying and hoping for the best, we can use data science to help us predict which products may be good fits for us. It includes various function programs to do the above mentioned tasks.
Data file handling has been effectively used in the program.
The automated cosmetic shop management system should deal with the automation of general workflow and administration process of the shop. The main processes of the system focus on customer's request where the system is able to search the most appropriate products and deliver it to the customers. It should help the employees to quickly identify the list of cosmetic product that have reached the minimum quantity and also keep a track of expired date for each cosmetic product. It should help the employees to find the rack number in which the product is placed.It is also Faster and more efficient way.
Forklift Classes Overview by Intella PartsIntella Parts
Discover the different forklift classes and their specific applications. Learn how to choose the right forklift for your needs to ensure safety, efficiency, and compliance in your operations.
For more technical information, visit our website https://intellaparts.com
Water scarcity is the lack of fresh water resources to meet the standard water demand. There are two type of water scarcity. One is physical. The other is economic water scarcity.
Hybrid optimization of pumped hydro system and solar- Engr. Abdul-Azeez.pdffxintegritypublishin
Advancements in technology unveil a myriad of electrical and electronic breakthroughs geared towards efficiently harnessing limited resources to meet human energy demands. The optimization of hybrid solar PV panels and pumped hydro energy supply systems plays a pivotal role in utilizing natural resources effectively. This initiative not only benefits humanity but also fosters environmental sustainability. The study investigated the design optimization of these hybrid systems, focusing on understanding solar radiation patterns, identifying geographical influences on solar radiation, formulating a mathematical model for system optimization, and determining the optimal configuration of PV panels and pumped hydro storage. Through a comparative analysis approach and eight weeks of data collection, the study addressed key research questions related to solar radiation patterns and optimal system design. The findings highlighted regions with heightened solar radiation levels, showcasing substantial potential for power generation and emphasizing the system's efficiency. Optimizing system design significantly boosted power generation, promoted renewable energy utilization, and enhanced energy storage capacity. The study underscored the benefits of optimizing hybrid solar PV panels and pumped hydro energy supply systems for sustainable energy usage. Optimizing the design of solar PV panels and pumped hydro energy supply systems as examined across diverse climatic conditions in a developing country, not only enhances power generation but also improves the integration of renewable energy sources and boosts energy storage capacities, particularly beneficial for less economically prosperous regions. Additionally, the study provides valuable insights for advancing energy research in economically viable areas. 
Recommendations included conducting site-specific assessments, utilizing advanced modeling tools, implementing regular maintenance protocols, and enhancing communication among system components.
TECHNICAL TRAINING MANUAL GENERAL FAMILIARIZATION COURSEDuvanRamosGarzon1
AIRCRAFT GENERAL
The Single Aisle is the most advanced family aircraft in service today, with fly-by-wire flight controls.
The A318, A319, A320 and A321 are twin-engine subsonic medium range aircraft.
The family offers a choice of engines
Halogenation process of chemical process industries
COMBINING MACHINE LEARNING AND SEMANTIC ANALYSIS FOR EFFICIENT MISINFORMATION DETECTION OF ARABIC COVID-19 TWEETS
International Journal of Computer Science & Information Technology (IJCSIT) Vol 14, No 3, June 2022
DOI: 10.5121/ijcsit.2022.14303
COMBINING MACHINE LEARNING AND SEMANTIC ANALYSIS FOR EFFICIENT MISINFORMATION DETECTION OF ARABIC COVID-19 TWEETS
Abdulrahim Alhaizaey and Jawad Berri
Department of Information Systems, King Saud University, Riyadh, Saudi Arabia
ABSTRACT
With the spread of social media platforms and the proliferation of misleading news, misinformation detection within microblogging platforms has become a real challenge. During the Covid-19 pandemic, a great deal of fake news and many rumors were broadcast and shared daily on social media. To filter out this fake news, much work has been done on misinformation detection using machine learning and sentiment analysis in the English language. However, misinformation detection research for Arabic-language social media is limited. This paper introduces a misinformation verification system for Arabic COVID-19-related news using an Arabic rumors dataset collected from Twitter. We explored the dataset and prepared it using multiple phases of preprocessing before applying different machine learning classification algorithms combined with a semantic analysis method. The model was applied to 3.6k annotated tweets, achieving a best overall accuracy of 93% in detecting misinformation. We further built another dataset of Covid-19-related claims in Arabic to examine how our model performs on this new set of claims. Results show that the combination of machine learning techniques and linguistic analysis achieves the best scores, reaching a best accuracy of 92% in detecting the veracity of sentences in the new dataset.
KEYWORDS
Misinformation, machine learning, Arabic NLP, contextual exploration, rumor detection.
1. INTRODUCTION
In recent years, the use of social media platforms has increased considerably, imposing itself as one of the most important sources for spreading news and broadcasting opinions. Social media platforms have become a rich source of information and have provided many researchers with an opportunity to analyze the vast amount of data that users of these platforms post on a daily basis. Within the Arab world, a recent study indicated that social media platforms had become the dominant source of news for Arab youth [1]. Twitter (https://twitter.com/home) is one of the most important of these platforms, as it is the second most used platform by Arabic speakers, after Facebook (https://www.facebook.com/) [2]. The Arabic language is the sixth largest language present on Twitter [3], making it an important source for data mining research and analysis.

With the emergence of the Covid-19 pandemic, Twitter, like many other social media platforms, has witnessed increased interaction by people expressing opinions and holding discussions related to the Coronavirus. Regardless of the different measures enforced by governments to contain the spread of this virus, the spread of misleading news about the Coronavirus represents a unique
phenomenon, requiring many decision-makers to hold daily conferences to answer and refute widespread fake news and rumors. Moreover, with millions of tweets and replies posted daily, it becomes nearly impossible for reliable fact-checking services to keep up with checking claims in such a massive amount of data. We argue that data mining and machine learning tools can take some of the burden off fact-checking platforms, which fundamentally rely on experts to check and refute rumors and misleading news. A broad range of literature has studied claim verification using machine learning tools on English-language online content. Many recent studies address the problem of Covid-19 rumor detection in English online content, such as [4] and [5]. However, very few studies have addressed this problem in Arabic online content. For example, [6] suggested a model for rumor detection in Arabic tweets using semi-supervised and unsupervised expectation-maximization. Another study [7] introduced an Arabic corpus of fake news targeting YouTube. However, neither of these works addressed misinformation detection related to the Coronavirus pandemic in the Arabic language. In general, studies on the verification of Arabic-language claims and rumors, particularly those related to pandemics, are rarely explored. To the best of our knowledge, this is one of the very few studies that address this topic.
This paper provides a model that uses machine learning tools combined with a semantic analysis method to detect misleading news related to the Coronavirus pandemic. We used the ArCOV19-Rumors dataset [8] in our study, an Arabic COVID-19-related dataset retrieved from Twitter. ArCOV19-Rumors is the only available Arabic dataset that supports claim verification studies. In addition, the authors who released this dataset intensively annotated the veracity of the tweets. This intensive annotation makes it a good resource to support research on misinformation detection, one of the major problems encountered during the pandemic. Furthermore, the dataset is balanced in terms of the distribution of true and false tweets, making it a sound resource for building and training a verification system. The dataset was analyzed using sentiment analysis approaches to provide instant detection of fake news. While this model could arguably be a more affordable alternative to relying on expert editors or journalists, it could also help immediately notify those interested in fact-checking, such as professional journalists and editors running fact-checking services. The dataset contains a total of 1831 Arabic tweets with a valid claim and 1660 Arabic tweets with a false claim related to the Coronavirus during the period from 27th January until the end of April 2020. Table 1 shows a statistical summary of the labeled tweets in the ArCOV19-Rumors dataset. Based on this labeled dataset, we developed a claim verification system taking advantage of the capabilities of the Python programming language with its suite of libraries for natural language processing and machine learning. Our model consists of multiple components: data collection, data preprocessing and cleansing, and machine learning classification. In the next section, we provide a background of related works. Then, in Section 3, we present the methodology. Section 4 presents the results and discussion. Section 5 shows how the accuracy of the classifiers can be enhanced on our testing set of claims after applying some semantic-based rules. The conclusion of the work is then provided.
Table 1. Statistics of Claims in ArCOV19-Rumors [8].

Label          Number of tweets   Percentage
False Tweets   1,753              48.9%
True Tweets    1,831              51.1%
Total          3,584              100%
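As a quick sanity check, the class balance in Table 1 can be reproduced directly from the raw counts reported there; the short snippet below uses only those published numbers:

```python
# Reproduce the class-balance percentages in Table 1 from the raw counts.
false_tweets = 1753
true_tweets = 1831
total = false_tweets + true_tweets

print(total)                                  # total labeled tweets: 3584
print(round(100 * false_tweets / total, 1))   # share of False Tweets: 48.9
print(round(100 * true_tweets / total, 1))    # share of True Tweets: 51.1
```

The near 49/51 split is what makes the dataset suitable for training a classifier without extra rebalancing.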
2. LITERATURE REVIEW
In the internet age, the rise of fake news highlights the urgent need for organizational solutions to such a challenging problem. Fake news and the manipulation of people for different reasons have a long history and used to be addressed by journalistic norms of objectivity; however, social networks have created a context in which fake news can attract a mass audience [9]. With the emergence of Covid-19, the spread of fake news and claims on social media platforms has been immense. With the increased shift to social media as the new news source, traditional sources that had enjoyed good levels of public trust and reliability have lost their popularity. Hence, a new way of verifying news on social media is needed. Fact-checking websites can be seen as an attempt to alleviate this complex problem in the internet world. Very few Arabic platforms provide fact-checking services to evaluate factual claims of news stories in Arabic. For example, Misbar (https://misbar.com/), AFP (https://factuel.afp.com/ar/list), and Fatabyyano (https://fatabyyano.net) are digital services that have been launched for the purpose of news verification. These fact-checking services rely heavily on their networks of expert editors and journalists to run their fact-checks. However, since the use of these fact-checking services has not yet been measured, their effectiveness in the Arab world still needs investigation.
There is a broad body of literature suggesting that humans are not good at prediction and decision-making. For that reason, machine learning has been widely used in many real-life applications to improve and enhance human decision-making. For example, in August 2016, the Allegheny County Department of Human Services implemented a machine learning-based predictive risk modeling tool called AFST [10]. This tool was explicitly designed to improve child welfare call screening decisions. The tool was the result of a two-year process of exploring how existing administrative data could be used more efficiently to improve decision-making at the time of a child welfare referral. Facebook also uses machine learning algorithms to automatically identify and remove sensitive content such as violent and sexual material. In addition, realizing that it is impossible for human fact-checkers to review stories on an individual basis with around two billion registered Facebook users, Facebook recently started to use similar technologies to recognize false news and take action at a bigger scale [11]. Likewise, we believe that using state-of-the-art machine learning and artificial intelligence techniques could be an effective tool in combating misinformation in the near future.
Instantaneous insights into fake news and rumors on social media can be obtained using sentiment
analysis (SA) and machine learning (ML) techniques. SA is a natural language processing
methodology used to analyze human opinions, emotions, sentiments, and attitudes. The noticeable
increase in the use of SA techniques for opinion mining has coincided with the increased use of
social media, benefiting from the huge amounts of data available on social media platforms. SA
can be implemented using machine learning models. So, could we take advantage of such techniques
and methods to detect misinformation on social media? This is what we attempt in this research:
we apply SA using ML to the Ar-Rumor dataset [8] to see to what extent these techniques can help
detect Arabic fake news. The contribution of this work is threefold. First, it shows the
feasibility of automating the detection of misinformation on microblogging platforms such as
Twitter; given the uncontrollably vast amount of content and users, automated systems for this
purpose are a necessity. Second, misinformation detection in Arabic is rarely explored, and this
paper touches upon it. Third, the combination of machine learning techniques and semantic
analysis turns out to produce better results in detecting misinformation. The following section
presents our methodology for tackling this problem.
³ https://misbar.com/
⁴ https://factuel.afp.com/ar/list
⁵ https://fatabyyano.net
International Journal of Computer Science & Information Technology (IJCSIT) Vol 14, No 3, June 2022
3. METHODOLOGY
Once the dataset was retrieved, we built our claim verification system, which consists of
multiple components. The first is data preprocessing and normalization using text preprocessing
techniques, followed by an additional layer of data filtering and feature extraction. The next
component is classification using different machine learning classifiers. Our model is depicted
in Figure 1, and its components are described in the following paragraphs.
3.1. Retrieving Dataset
We retrieved the dataset using Hydrator [12], a free tool for hydrating Twitter ID datasets. We
collected 9.4k labeled relevant tweets, of which 3.6k are labeled either true or false: 1,753
tweets labeled false and 1,831 labeled true. This balanced distribution of true versus false
tweets makes the dataset a good resource for training verification systems.
3.2. Data Preprocessing and Normalization
After collecting the Twitter dataset, the data were prepared for analysis. Due to the noisy
nature of microblog data such as tweets, text preprocessing and cleaning is an essential step to
eliminate noise and unnecessary data that carry no value for the sentiment of the
text. Data preprocessing plays a significant role in improving the accuracy of classification
algorithms and speeding up the training process to enhance the models' learning efficiency [13].
For this purpose, we used a natural language toolkit (NLTK) [14] of Python, which has a suite of
text processing libraries.
Preprocessing and cleansing include the following steps: removing repeated letters, removing
stop words, which are provided in [14], removing words that have less than two letters, removing
numbers and additional spaces, removing any non-Arabic letters, removing punctuation marks
and emojis, removing Arabic diacritics, and normalizing the data by converting multiple forms of
a letter into one uniform letter since some Arabic letters could be represented in different forms.
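The cleaning and normalization steps above can be sketched in Python with regular expressions and NLTK. This is a minimal sketch, not the authors' exact pipeline: the regex patterns, the letter-folding rules, and the tiny fallback stop-word list are illustrative assumptions.

```python
import re

DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")  # Arabic diacritic marks
NON_ARABIC = re.compile(r"[^\u0621-\u064A\s]")     # drops numbers, punctuation, emoji
REPEATS = re.compile(r"(.)\1{2,}")                 # 3+ repeated characters -> one

def normalize_letters(text: str) -> str:
    """Fold letter variants into one uniform form (alef, yaa, taa marbuta)."""
    text = re.sub("[\u0622\u0623\u0625]", "\u0627", text)  # alef variants -> bare alef
    text = text.replace("\u0649", "\u064A")                # alef maqsura -> yaa
    text = text.replace("\u0629", "\u0647")                # taa marbuta -> haa
    return text

# NLTK's Arabic stop-word list is assumed available; fall back to a tiny
# illustrative subset if the corpus is not downloaded.
try:
    from nltk.corpus import stopwords
    _raw_stopwords = stopwords.words("arabic")
except Exception:
    _raw_stopwords = ["في", "من", "على", "عن"]
ARABIC_STOPWORDS = {normalize_letters(w) for w in _raw_stopwords}

def preprocess(tweet: str) -> str:
    """Apply the cleaning steps described above to one tweet."""
    tweet = DIACRITICS.sub("", tweet)
    tweet = NON_ARABIC.sub(" ", tweet)
    tweet = REPEATS.sub(r"\1", tweet)
    tweet = normalize_letters(tweet)
    tokens = [t for t in tweet.split()
              if len(t) >= 2 and t not in ARABIC_STOPWORDS]
    return " ".join(tokens)
```

Normalizing the stop-word list with the same letter-folding rules keeps the filter consistent after normalization.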
3.3. Additional Filtering
Besides eliminating noisy data, and since the dataset might contain duplicate tweets and what
could be considered spam, we performed an additional filtering step. Tweets with URLs, phone
numbers, or more than four hashtags were classified as spam and deleted. We then removed
hashtags, URLs, and mentions, and dropped duplicated tweets so that only one representation of
each tweet remains. Finally, since preprocessing and cleaning left some tweets with only a few
words, we removed any tweet with fewer than two words, as we believe such tweets do not
contribute to the classification.
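A sketch of this filtering stage follows. The spam rules (URL, phone number, more than four hashtags) come from the description above; the specific regular expressions, in particular the rough phone-number pattern, are assumptions.

```python
import re

URL_RE = re.compile(r"https?://\S+")
PHONE_RE = re.compile(r"\+?\d[\d\s\-]{7,}\d")   # rough phone-number pattern (assumption)
HASHTAG_RE = re.compile(r"#\S+")
MENTION_RE = re.compile(r"@\S+")

def is_spam(tweet):
    """Spam per the rules above: has a URL, a phone number, or >4 hashtags."""
    return bool(URL_RE.search(tweet)
                or PHONE_RE.search(tweet)
                or len(HASHTAG_RE.findall(tweet)) > 4)

def filter_tweets(tweets):
    """Drop spam, strip hashtags/URLs/mentions, dedupe, drop <2-word tweets."""
    seen, kept = set(), []
    for t in tweets:
        if is_spam(t):
            continue                                 # delete spam outright
        for pat in (URL_RE, HASHTAG_RE, MENTION_RE):
            t = pat.sub("", t)
        t = " ".join(t.split())                      # tidy leftover whitespace
        if t in seen or len(t.split()) < 2:          # duplicates and short texts
            continue
        seen.add(t)
        kept.append(t)
    return kept
```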
3.4. Feature Extraction
Predictive modeling requires text data to be prepared before it is used for prediction. The
textual data have to be parsed and tokenized so that words are converted into numbers that can
serve as input to the machine, a process referred to as vectorization. For this purpose, the
scikit-learn library [15] provides convenient tools for feature extraction from text data. With
this tool, we used a popular technique called "Term
Frequency and Inverse Document Frequency" (TF-IDF) to represent the data as features in which
words are given frequency scores, which helps determine the words that are most informative for
classification. Arabic stop words imported from NLTK [14] were also used to drop from the data
specific words that do not contribute to classification.
3.5. Machine Learning Classification
Many supervised ML classifiers have been widely used in the literature for text analysis. We
have used four different common ML algorithms to analyze and compare the results, namely:
Logistic Regression (LR) [16], Random Forest (RF) [17], Multinomial Naive Bayes (MNB) [18],
and Support Vector Machine (SVM) [19]. Although many other classifiers are available for text
classification, we adopted these four as a proof of concept because they have been widely applied
for text and sentiment analysis, making them appropriate for our study context. After
preprocessing, we supplied the dataset to the different algorithms to build a classifier that
learns from the labeled training tweets. To build our classifier, we used the scikit-learn
library, an open-source machine learning package in Python that provides various classification
algorithms. We randomly held out 20% of the dataset as the test set and used the remaining 80% to
train our machine learning models. A trained model can then predict whether a claim from the test
set is true or false. In our analysis, the dataset was classified into two classes, true and
false.
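The 80/20 split and the four classifiers can be sketched as follows. This is an assumed setup, not the authors' exact configuration: hyperparameters are scikit-learn defaults, and `LinearSVC` stands in for SVM since the paper does not state the kernel.

```python
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

def evaluate(texts, labels):
    """80/20 split, then test-set accuracy for the four classifiers used here."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        texts, labels, test_size=0.2, random_state=42)
    models = {
        "LR": LogisticRegression(max_iter=1000),
        "RF": RandomForestClassifier(),
        "MNB": MultinomialNB(),
        "SVM": LinearSVC(),     # linear kernel assumed; paper does not specify
    }
    scores = {}
    for name, clf in models.items():
        pipe = make_pipeline(TfidfVectorizer(), clf)  # vectorize + classify
        pipe.fit(X_tr, y_tr)
        scores[name] = pipe.score(X_te, y_te)         # accuracy on held-out 20%
    return scores
```

Fitting the vectorizer inside the pipeline ensures TF-IDF statistics are learned from the training split only.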
Figure 1. Verification Model Architecture
4. EXPERIMENT AND RESULTS ANALYSIS
After building our verification prediction model, we carried out experiments to test and
evaluate its accuracy. To evaluate the model, we used the accuracy metric (1), a common
performance measure in machine learning classification and pattern recognition. A true positive
(TP) is an outcome where the model correctly predicts a true claim; a true negative (TN) is one
where it correctly predicts a false claim. Similarly, a false positive (FP) is a result where the
model incorrectly predicts the true class, and a false negative (FN) is one where it incorrectly
predicts the false class. We further built another dataset of Covid-19 related claims in Arabic
to examine how our model performs on this new set of claims. Figure 2 shows the
analysis of the results of our model for the two datasets and the machine learning models chosen
in the experiments.
Accuracy = (TP + TN) / (TP + TN + FP + FN)    (1)
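The outcome counts and the accuracy metric of Eq. (1) can be computed directly from labels and predictions (1 = true claim, 0 = false claim):

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN as defined above (1 = true claim, 0 = false)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    return tp, tn, fp, fn

def accuracy(tp, tn, fp, fn):
    """Accuracy = (TP + TN) / (TP + TN + FP + FN), Eq. (1)."""
    return (tp + tn) / (tp + tn + fp + fn)
```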
Our first observation is that the SVM model achieved the highest accuracy, 93%, compared with the
remaining machine learning approaches: the accuracy of LR, MNB, and RF is 91%, 91%, and 89%,
respectively. When testing the model on the new dataset that we built to measure its performance,
we see that the current system still performs well, although our accuracy analysis shows somewhat
lower figures: the accuracy in detecting veracity on our dataset is 79%, 75%, 85%, and 87% for
the LR, RF, MNB, and SVM approaches, respectively. SVM and MNB significantly outperform LR and RF
on this dataset, which we attribute to its relatively small size; training the system properly
with more data could increase the accuracy of the system as a whole.
Figure 2. Results Analysis of the two datasets
5. SEMANTIC TWEETS ANALYSIS
In order to improve the accuracy of the results obtained in the previous step with machine
learning algorithms, we used a linguistic analysis method that analyzes the output of the
learning algorithms semantically. Machine learning algorithms are efficient at handling huge
amounts of data and can learn and improve; however, they may misclassify tweets when the very
fine semantic analysis needed to pin down the meaning of a sentence is required. Semantic
analysis, on the other hand, scales poorly to big datasets but is efficient at discovering the
fine semantic nuances in the meaning of tweets that can falsify the results. A combination of
both methods is therefore crucial to improving the accuracy of misinformation detection on the
dataset used in the experiment. Semantic analysis should explore the tweet's linguistic context
to detect its correct semantic value. We used a linguistic method, namely the Contextual
Exploration (CE) method [22], which analyzes the context and makes the right decision when
labeling sentences semantically. CE is used here to compute the semantic value representing the
meaning of a sentence. CE does not require huge linguistic resources or fine semantic
descriptions of language units; it requires only the grammatical and lexical know-how a decision
maker uses when assigning a semantic value to a sentence [23]. A close look at the dataset shows
that many linguistic markers can cause tweets' semantic value to be classified falsely. We
focused in this research on four semantic labels, namely: interrogation, negation, claim, and
reported statement. Table 2 provides
illustrative examples of tweets from the dataset. Interrogative sentences are commonly used in
news headlines to grab the audience's attention; on Twitter, they are usually posted with an
attached URL to the news webpage. Interrogation is expressed in Arabic using the interrogation
mark "؟", as in the tweet example S1. This sentence questions the veracity of the Hanta virus
that appeared in China without stating any truth value. CE detects such sentences by the
interrogation mark, labels them as interrogation sentences accordingly, and reclassifies them as
Neutral. Negation in standard Arabic can be expressed in different ways depending on the verb
tense, using the five negation particles: لا، لم، ما، لن، ليس. For instance, sentence S2 in
Table 2 is labeled as a negation sentence since it includes the word "لا". Such sentences are
reclassified to the opposite of the class assigned by the classifier.
Table 2. Examples of tweet sentences.

S1: بعد كورونا ظهور فيروس هانتا في الصين... ما الحقيقة؟
    After Corona, the emergence of the Hanta virus in China ... What is the truth?

S2: "S Gene not detected" لا يعني أنك بمنأى عن كوفيد-19!
    "S Gene not detected" doesn't mean you are immune to COVID-19!

S3: فيديو يدعي أن: لقاح كورونا موجود حتى قبل بدأ جائحة كورونا (...)
    Video claims that: Corona vaccine existed even before the Corona pandemic began (...)

S4: صحيفة ديلي ميل البريطانية: لن يعود الدوري الإنجليزي مرة أخرى هذا الموسم (...)
    The British Daily Mail: The English Premier League will not resume this season (...)
A claim is a statement conveyed to assert something, either by the speaker or reported as the
claim of another party, as in sentence S3, where the claim is reported from a video (third
person) using the verb "يدعي" (claims) and the colon (:) explicitly used in the sentence. In
general, when the speaker claims something (using the pronoun I or we), the claim is most likely
definite and hence the sentence is labeled as positive. On the other hand, when the claim is
stated by another party (not the speaker), the sentence is classified as neutral. A reported
statement is frequently used by news reporters who convey news statements from other news
agencies or information sources, as in sentence S4. In such a case the sentence is labeled as
neutral, since the veracity of the reported sentence is attributed to the claimer
without any responsibility on the speaker. It is worth mentioning that some sentences can carry
more than one semantic label. For example, sentence S4 in Table 2 is labeled both as a negation
and as a conveyed claim. For such sentences, another semantic analysis stage is necessary to
assign the dominant semantic label and then reclassify the sentence. Sentences falling into these
four linguistic categories in the test dataset are handled and reclassified during the
post-semantic analysis, and the accuracy on the dataset is then recalculated.
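The relabeling rules for the four semantic labels can be sketched as a post-classification step. This is a simplified sketch, not the authors' CE implementation: the marker lists are illustrative, and a fixed rule precedence stands in for the paper's dominant-label stage for multi-label sentences like S4.

```python
NEGATION_PARTICLES = {"لا", "لم", "ما", "لن", "ليس"}   # the five negation particles

def ce_relabel(tweet, ml_label):
    """Relabel an ML prediction ('true'/'false') using surface markers.

    Precedence (an assumption standing in for the dominant-label stage):
    interrogation, then reported/third-party claim, then negation.
    """
    tokens = tweet.split()
    if "؟" in tweet:                                   # interrogation -> neutral
        return "neutral"
    if "يدعي" in tokens or ":" in tweet:               # third-party claim or
        return "neutral"                               # reported statement -> neutral
    if any(p in tokens for p in NEGATION_PARTICLES):   # negation -> flip class
        return "false" if ml_label == "true" else "true"
    return ml_label                                    # no marker: keep ML label
```

Tweets S1, S2, and S4 from Table 2 exercise the interrogation, negation, and reported-statement rules, respectively.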
Figure 3. Results with Post Semantic Analysis
Figure 3 shows the results after the post-semantic analysis, which has clearly improved accuracy
for all algorithms on the test dataset. The proposed approach, which combines machine learning
algorithms with post-semantic analysis, has better accuracy than the machine learning approach
alone. Although the linguistic approach currently uses only four semantic labels in the
classification process, it could be further enhanced by including additional semantic labels to
capture the fine semantic meanings carried by natural language sentences.
6. DISCUSSION
Social media networks have introduced a new era for the public to share ideas, opinions, and
debates. These shared data constitute a significant source for detecting opinion trends on
different topics. In parallel, various methods are becoming effective at analyzing attitudes in
complex social media data. For example, artificial intelligence, sentiment analysis, data
mining, and machine learning can help by providing different means of information extraction
and knowledge acquisition regarding different topics. With people's tendency to exchange and
broadcast their opinions on social media platforms, and with the emergence of the COVID-19
pandemic, many legislative and health authorities have found themselves in the face of
pandemic-related rumors. These rumors affect and slow down efforts to combat the pandemic, since
refuting them consumes much time and effort. In this scenario, researchers and
decision-makers can benefit from this data in analyzing and studying several social, behavioral,
or organizational phenomena. Unlike the traditional knowledge management practices that rely
on human experts as a primary source of knowledge acquisition, harvesting and analyzing social
media data with the help of machine learning could represent the ultimate source from which the
knowledge is extracted. Harvesting and mining the COVID-19 Twitter data could provide
innovative insights and ideas on how to better manage the implications and apprehensions
associated with pandemic-related rumors. Such studies could provide some roadmap for potential
future alternative sources of knowledge and suggest future machine learning and system
involvement. In addition, media practitioners can benefit from such affordable technical solutions
to automate the process of refuting rumors and misinformation.
Figure 4. False tweets word cloud
Figure 5. True tweets word cloud
In terms of result accuracy, the experiment shows that combining machine learning techniques with
semantic analysis can increase accuracy in detecting rumors in a dataset and avoid markers that
may cause tweets to be falsely classified. Text analytics can be further enhanced using data
visualization techniques such as word clouds. Though a simple concept, word clouds are a good
visualization technique for communicating an overall picture of text contents [20]. A study [21]
concluded that word clouds are an effective tool for text analysis when enhanced with further
information and powerful interaction techniques. Figure 4 and Figure 5 display word clouds
generated from the dataset [8] based on its class label. Yet, this work has some limitations.
First, it focuses on Twitter content only, due to the availability of the annotated dataset;
however, Twitter is not the only medium through which misinformation can spread. Second, the
dataset is relatively small, which constrains the split into training and testing sets. Thus,
training the system properly with a sufficiently large dataset could increase the system's
accuracy as a whole.
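Word clouds like those in Figures 4 and 5 are driven by per-class term frequencies. A sketch of extracting those frequencies follows; the two-tweet sample is illustrative, and the rendering step (left as a comment) assumes the third-party `wordcloud` package plus an Arabic-capable font.

```python
from collections import Counter

def class_frequencies(tweets):
    """Term frequencies for the tweets of one class label, feeding a word cloud."""
    return Counter(word for tweet in tweets for word in tweet.split())

false_tweets = ["لقاح كورونا خطير", "فيروس كورونا مؤامرة"]   # illustrative sample
freqs = class_frequencies(false_tweets)

# Rendering (assumed, not run here):
#   WordCloud(font_path="arabic_font.ttf").generate_from_frequencies(dict(freqs))
# Note that Arabic text usually needs reshaping (e.g. arabic_reshaper together
# with python-bidi) to display correctly in rendered images.
print(freqs.most_common(3))
```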
7. CONCLUSION
Analyzing Twitter news data using natural language processing, sentiment analysis, machine
learning, and artificial intelligence techniques could serve as a critical tool to help protect
people from misinformation. With millions of posts published every day on social media
platforms, relying on human fact-checkers to review stories individually is almost impossible.
Instead, there is an urgent need for a way to quickly detect posts that may contain false claims
and automatically forward them to fact-checkers, so they can focus their time and expertise on
fact-checking new content. A great deal of work on rumor detection has addressed English text;
in Arabic, however, this field is still at an early stage, where the scarcity of resources and
the nature of the language pose huge challenges to research. This study presents an Arabic
misinformation detection model that mines related Arabic texts from Twitter to detect fake news.
For this study, we preprocessed a dataset of 3.6k tweets related to Covid-19 claims. This
dataset was then fed into different machine learning classifiers to measure their accuracy in
detecting misinformation. The results showed that the best overall accuracy of the model in
detecting misinformation was 93%. Another dataset of Covid-19 related claims in Arabic was built
to examine the performance of our model; there, the model's overall accuracy in detecting
veracity was 87% at best. The results provided by the classifiers
were analyzed further by a semantic analysis method to detect neutral sentences that can falsify
the classification of tweets. The results show that this post-semantic analysis phase improved
the best accuracy obtained in the first phase to 92%.
REFERENCES
[1] D. Radcliffe and H. Abuhmaid, “Social media in the Middle East: 2019 in review,” SSRN Electron. J.,
2020.
[2] ExtraDigital Ltd, “Prominent Arabic Social Media,” Extradigital.co.uk.
[Online]. Available: https://www.extradigital.co.uk/articles/arabic/social-media.html. [Accessed: 02-
May-2021].
[3] “Twitter: most-used languages 2013,” Statista.com. [Online]. Available:
https://www.statista.com/statistics/267129/most-used-languages-on-twitter/. [Accessed: 02-May-
2021].
[4] F. Alam et al., “Fighting the COVID-19 infodemic: Modeling the perspective of journalists, fact-
checkers, social media platforms, policy makers, and the society,” arXiv [cs.CL], 2020.
[5] P. Patwa et al., “Fighting an Infodemic: COVID-19 Fake News Dataset,” arXiv [cs.CL], 2020.
[6] S. M. Alzanin and A. M. Azmi, “Rumor detection in Arabic tweets using semi-supervised and
unsupervised expectation–maximization,” Knowl. Based Syst., vol. 185, no. 104945, p. 104945, 2019.
[7] M. Alkhair, K. Meftouh, K. Smaïli, and N. Othman, “An Arabic corpus of fake news: Collection,
analysis and classification,” in Communications in Computer and Information Science, Cham:
Springer International Publishing, 2019, pp. 292–302.
[8] F. Haouari, M. Hasanain, R. Suwaileh, and T. Elsayed, “ArCOV19-Rumors: Arabic COVID-19 Twitter
dataset for misinformation detection,” arXiv [cs.CL], 2020.
[9] M. J. Lazer et al., “The science of fake news,” Science, vol. 359, no. 6380, pp. 1094–1096, 2018.
[10] R. Vaithianathan, N. Jiang, T. Maloney, P. Nand, and E. Putnam-Hornstein, Developing predictive
risk models to support child maltreatment hotline screening decisions: Allegheny County
methodology and implementation. Auckland: Centre for Social Data Analytics, 2017.
[11] “Working to stop misinformation and false news,” Facebook.com. [Online]. Available:
https://www.facebook.com/formedia/blog/working-to-stop-misinformation-and-false-news.
[Accessed: 02-May-2021].
[12] Documenting the Now, Hydrator [Computer Software]. Retrieved from
https://github.com/docnow/hydrator. Accessed March, 2021.
[13] S. Larabi Marie-Sainte, N. Alalyani, S. Alotaibi, S. Ghouzali, and I. Abunadi, “Arabic natural
language processing and machine learning-based systems,” IEEE Access, vol. 7, pp. 7011–7020,
2019.
[14] S. Bird, E. Klein, and E. Loper, Natural language processing with python: Analyzing text with the
natural language toolkit. O’Reilly Media, 2009.
[15] F. Pedregosa et al., “Scikit-learn: Machine Learning in Python,” arXiv [cs.LG], 2012.
[16] I. Webb et al., “Logistic Regression,” in Encyclopedia of Machine Learning, Boston, MA: Springer
US, 2011, pp. 631–631.
[17] M. D. Buhmann et al., “Random Forests,” in Encyclopedia of Machine Learning, Boston, MA:
Springer US, 2011, pp. 828–828.
[18] G. I. Webb, E. Keogh, R. Miikkulainen, and M. Sebag, “Naïve Bayes,” in
Encyclopedia of Machine Learning, Boston, MA: Springer US, 2011, pp. 713–714.
[19] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[20] B. Y.-L. Kuo, T. Hentrich, B. M. Good, and M. D. Wilkinson, “Tag clouds for summarizing web
search results,” in Proceedings of the 16th international conference on World Wide Web - WWW ’07,
2007.
[21] F. Heimerl, S. Lohmann, S. Lange, and T. Ertl, “Word cloud explorer: Text analytics based on word
clouds,” in 47th Hawaii International Conference on System Sciences, 2014.
[22] J. Berri, M. Al-Khamis, Information Exploration Using Mobile Agents, WSEAS Transactions on
Computers, vol. 3, no. 3, 706-712, 2004.
[23] J. Berri, R. Benlamri, Y Atif, H. Khallouki, Web Hypermedia Resources Reuse and Integration for
On-Demand M-Learning, International Journal of Computer Science and Network Security, vol. 21,
no. 1, pp. 125-136, 2021.