AMUSED: An Annotation Framework of Multi-modal Social Media Data
Gautam Kishore Shahi¹ and Tim A. Majchrzak²
¹ University of Duisburg-Essen, Germany
² University of Agder, Norway
gautam.shahi@uni-due.de, timam@uia.no
Abstract. Social media is nowadays both an important news source and a channel for spreading misinformation. Systematically studying social media phenomena, however, has been challenging due to the lack of labelled data. This paper presents AMUSED, a semi-automated annotation framework for gathering multi-lingual, multi-modal annotated data from social networking sites. The framework is designed to reduce the workload of collecting and annotating social media data by cohesively combining machine and human in the data collection process. From a given list of news articles, AMUSED detects links to social media posts, downloads the data from the respective social networking sites, and assigns a label to it. The framework can fetch annotated data from multiple platforms such as Twitter, YouTube, and Reddit. As a use case, we have implemented AMUSED for collecting COVID-19 misinformation data from different social media sites, based on 8,077 fact-checked articles, and labelled it in four categories of misinformation.
Keywords: Data Annotation · Social media · Misinformation · News articles · Fact-checking
1 Introduction
With the growing number of users on different social media sites, social media have become part of our lives. They play an essential role in making communication easier and more accessible. People and organisations use social media to share and browse information; especially during the current pandemic, social media sites get massive attention from users [33,23]. Braun and Gillespie [5] conducted a study to analyse the public discourse on social media sites and news organisations. Social media sites help news and user-generated content reach more users. Several statistical and computational studies have been conducted using social media data [5]. However, data gathering and annotation are challenging and financially costly [19].
Social media data analytics research poses challenges in data collection, data sampling, data annotation, data quality, and bias in data [17]. Data annotation is the process of assigning a category to the data. Researchers annotate social media data for research on hate speech, misinformation, online mental health, etc. For supervised machine learning, labelled data sets are required to learn the input patterns [26]. To build a supervised or semi-supervised model on social media data, researchers face two challenges: timely data collection and data annotation [30]. Timely data collection is essential because some platforms restrict data access, and posts may be deleted by the platform or by the user [32]. Data annotation is another problem: it is conducted either in-house (within a lab or organisation) or with a crowdsourcing tool such as Amazon Mechanical Turk (AMT) [4]. Both approaches require a fair amount of effort, including writing the annotation guidelines. There is also a risk of wrongly labelled data leading to bias [10].
arXiv:2010.00502v2 [cs.SI] 10 Aug 2021
We propose AMUSED, a semi-automatic framework for data annotation from social media platforms that addresses both timely data collection and annotation. AMUSED gathers labelled data from different social media platforms in multiple formats (text, image, video). It can collect annotated data on social issues such as misinformation, hate speech or other critical social scenarios. AMUSED also addresses bias in the data caused by wrong labels assigned by annotators. Our contribution is a semi-automatic approach for collecting labelled data from different social media sites in multiple languages and data formats. The framework can be applied in many application domains for which data are typically hard to gather, for instance, misinformation or mob lynching.
This paper is structured as follows. In Section 2, we discuss the background of our work. We then present the method of AMUSED in Section 3. In Section 4, we give details on the implementation of AMUSED based on a case study, and in Section 5 we present the results. We discuss our observations in Section 6 and draw a conclusion in Section 7.
2 Background
The following section describes the background on data annotation, the types of data on social media, and the problems of current annotation techniques.
2.1 Data Annotation
Much research has been published that uses social media data. Typically, a single work is limited to a few social media platforms or languages, and results are often based on a limited amount of data. There are multiple reasons for these limitations; one of the key reasons is the availability of annotated data for research [36,2]. Chapman et al. [8] highlight the problem of getting labelled data for NLP-related problems. Data quality and the role of annotators in the performance of machine learning models have also been studied; with poor data, it is hard to build a generalisable classifier [15].
Researchers depend on in-house or crowd-based data annotation. Recently, Alam et al. [3] used a crowd-based annotation technique and asked people to volunteer for data annotation, but did not succeed in obtaining a large amount of labelled data. Current annotation techniques depend on the background expertise of the annotators. Finding past data on an incident like mob lynching is challenging because of data restrictions by social media platforms. It requires looking at a massive number of posts and news articles, leading to much manual work. In addition, billions of social media posts are sampled down to a few thousand posts for data annotation, either by random sampling or by keyword sampling, which leads to sampling bias.
With in-house data annotation, it is challenging to hire annotators with background expertise in a domain. Another issue is the development of a codebook with proper explanations [13]. The entire process is financially costly and time-consuming [12]. The problem with crowd-based annotation tools like AMT is that the low cost may result in wrongly labelled data. Many annotators may cheat, fail to perform the job properly, use bots, or answer randomly [14,25].
Since the emergence of social media as a news resource [7], people have used this resource in very different ways. They may share news, state a personal opinion, or commit a social crime in the form of hate speech or cyberbullying [22]. The COVID-19 pandemic has arguably led to a surge in the spread of misinformation [28]. Nowadays, journalists cover common issues like misinformation, mob lynching, and hate speech, and they also link the relevant social media posts in their news articles [11]. To solve the problem of data collection and annotation, the social media posts linked from news articles can be used. The social media posts are then labelled based on the news article's contents. To get a reliable label, the credibility of the news sources must be considered [21]. For example, a professional news website registered with the International Fact-Checking Network [20] should generally be credible.
2.2 Data on Social Media Platforms
Social media sites allow users to create and view posts in multiple formats. Every day, billions of posts containing images, text, and videos are shared on social media sites such as Facebook, Twitter, YouTube and Instagram [1]. Data are available in different formats, and each social media platform applies restrictions on data crawling. For instance, Facebook only allows crawling data related to public posts and groups.
Giglietto et al. discuss the need for multi-modal data in the study of social phenomena [16]. Almost every social media platform allows users to create or respond to posts in text, but each platform restricts the length of the text differently. The content and writing style change with the character limit of each platform. Images are also common across social media platforms, and platforms restrict the size of images. Some platforms focus primarily on video, whereas others are multi-modal. Furthermore, videos are restricted in duration. These restrictions influence how each platform is used.
2.3 Problems of Current Annotation Techniques
There are several problems with the current annotation approaches. First, social media platforms restrict users when fetching data, and content can disappear: for example, a user may delete their tweets or YouTube videos. Without timely crawling, access to the data is lost.
Fig. 1. AMUSED: An Annotation Framework for Multi-modal Social Media data
Second, if the volume of data is high, filtering based on criteria such as keyword, date, or location is needed. This filtering degrades data quality by excluding much data. For example, if we sample hate speech data using hateful keywords, we might miss many hate speech tweets that do not contain any hateful words.
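The keyword-sampling problem can be illustrated with a toy Python sketch. The posts, their ground-truth labels and the keyword list are invented for illustration only. The sketch shows that a keyword filter both admits false positives and silently drops hateful posts that use none of the listed keywords.

```python
from dataclasses import dataclass


@dataclass
class Post:
    text: str
    is_hateful: bool  # ground truth, known here only because the data is invented


def keyword_sample(posts, keywords):
    """Keep posts whose lower-cased text contains at least one keyword."""
    lowered = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p.text.lower() for k in lowered)]


def missed_positives(posts, sample):
    """Hateful posts that the keyword filter excluded from the sample."""
    kept = {id(p) for p in sample}
    return [p for p in posts if p.is_hateful and id(p) not in kept]


# Hypothetical toy corpus: two hateful posts contain no listed keyword.
posts = [
    Post("they should all be driven out of the country", True),
    Post("go back to where you came from", True),
    Post("I hate Mondays", False),
    Post("people like them are scum", True),
    Post("lovely weather today", False),
]
keywords = ["scum", "hate"]

sample = keyword_sample(posts, keywords)
missed = missed_positives(posts, sample)
total = sum(p.is_hateful for p in posts)
print(f"sampled {len(sample)} posts, missed {len(missed)} of {total} hateful posts")
# prints: sampled 2 posts, missed 2 of 3 hateful posts
```

Half of the sample is a false positive ("I hate Mondays"), and two of the three hateful posts never reach the annotators.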
Third, finding good annotators is difficult. Annotation quality depends on the background expertise of the annotator. For crowdsourcing, maintaining annotation quality is complicated. Moreover, achieving good agreement between multiple annotators is tedious. Fourth, developing annotation guidelines is tricky. Writing a good codebook requires domain knowledge and consultation with experts.
Fifth, data annotation is costly and time-consuming [31]. Sixth, social media content is available in multiple languages, but much research is limited to English. Data annotation in other languages, especially under-resourced languages, is difficult due to the lack of experienced annotators.
3 Method
AMUSED’s elements are summarised in Figure 1. It follows nine steps.
Step 1: Domain Identification The first step is the identification of the
domain in which we want to gather the data. A domain could focus on a par-
ticular public discourse. For example, a domain could be fake news in the US
election, or hate speech in trending hashtags on Twitter. Domain selection helps
to find the relevant data sources.
Element: Definition. Example
News ID: Unique identifying ID of each news article, formed from an acronym for the news source and a number. Example: PY9
Newssource URL: Unique identifier pointing to the news article. Example: https://factcheck.afp.com/video-actually-shows-anti-government-protest-belarus
News Title: The title of the news article. Example: A video shows a rally against coronavirus restrictions in the British capital of London.
Published date: Date when the article was published online. Example: 01 September 2020
News Class: The class, such as false, true or misleading, that the fact-check article assigns; we store it in the class column. Example: False
Published-By: The name of the news website. Example: AFP, TheQuint
Country: Country where the news article is published. Example: Australia
Language: Language of the news article. Example: English
Table 1. Description of attributes and their examples
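One way to hold a crawled article in code is a record type that mirrors Table 1. The sketch below is a minimal Python version. The snake_case field names and the `make_news_id` helper are hypothetical, and the example values are taken from the individual examples in Table 1. They are combined into one record only for illustration and do not describe a single real article.

```python
from dataclasses import asdict, dataclass
from datetime import date


@dataclass(frozen=True)
class NewsArticle:
    """One crawled fact-check article, mirroring the attributes of Table 1."""
    news_id: str          # source acronym + running number, e.g. "PY9"
    newssource_url: str   # unique URL of the article
    news_title: str
    published_date: date
    news_class: str       # label given by the fact-checker, e.g. "False"
    published_by: str     # name of the news website, e.g. "AFP"
    country: str
    language: str


def make_news_id(source_acronym: str, number: int) -> str:
    """Hypothetical helper: build an ID in the style of Table 1."""
    return f"{source_acronym.upper()}{number}"


# Illustrative record assembled from the example column of Table 1.
article = NewsArticle(
    news_id=make_news_id("py", 9),
    newssource_url="https://factcheck.afp.com/video-actually-shows-anti-government-protest-belarus",
    news_title="A video shows a rally against coronavirus restrictions in the British capital of London.",
    published_date=date(2020, 9, 1),
    news_class="False",
    published_by="AFP",
    country="Australia",
    language="English",
)
print(asdict(article)["news_id"])  # prints: PY9
```

A frozen dataclass keeps the crawled metadata immutable, so later pipeline steps can add information only by building new records, not by overwriting the source data.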
Step 2: Data Source Data sources comprise news websites that mention a
particular topic. For example, many news websites have a separate section that
discusses the election or other ongoing issues.
Step 3: Web scraping AMUSED then crawls all news articles from the news websites using a Python-based crawler. We fetch details such as the published date, author, location, and news content (see Table 1).
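The paper does not describe its crawler in detail. As a minimal sketch of this step, the following code uses only Python's standard library (`urllib.request` and `html.parser`) to download a page and read its title and a few `<meta>` fields. The meta keys in `WANTED`, such as the Open Graph-style `article:published_time`, are assumptions: every fact-checking site marks up these fields differently, so a real crawler needs per-site rules.

```python
import urllib.request
from html.parser import HTMLParser


class ArticleMetaParser(HTMLParser):
    """Collect the <title> text and selected <meta> values of an article page."""

    # Assumed meta keys (common conventions); real sites differ.
    WANTED = {"article:published_time", "author", "og:locale"}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            key = attrs.get("property") or attrs.get("name")
            if key in self.WANTED and attrs.get("content") is not None:
                self.meta[key] = attrs["content"]

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def fetch_html(url: str, timeout: float = 30.0) -> str:
    """Download a page (needs network access; not exercised in the example)."""
    request = urllib.request.Request(url, headers={"User-Agent": "amused-sketch"})
    with urllib.request.urlopen(request, timeout=timeout) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def scrape_article(html: str) -> dict:
    """Return the title plus any wanted meta fields found in the HTML."""
    parser = ArticleMetaParser()
    parser.feed(html)
    parser.close()
    return {"title": parser.title.strip(), **parser.meta}


html = ('<html><head><title> A video shows a rally </title>'
        '<meta property="article:published_time" content="2020-09-01">'
        '<meta name="author" content="AFP"></head><body></body></html>')
print(scrape_article(html))
```

In a full pipeline, `scrape_article(fetch_html(url))` would run once per article URL from the chosen data source.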
Step 4: Language Identification After getting the details from the news articles, we identify their language, using ISO 639-1 codes to name languages. Based on the language, we can further filter articles and apply a language-specific model for finding insights.
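The filtering step can be sketched as grouping articles by a two-letter ISO 639-1 code. In this Python sketch the detector is passed in as a function, so any library can be plugged in. The case study uses langdetect, whose `detect` function returns codes such as `"en"` but also region-qualified codes such as `"zh-cn"`. The sketch therefore keeps only the part before the hyphen. The `"text"` field name is an assumption.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List


def normalise_code(code: str) -> str:
    """Reduce a detector code such as 'zh-cn' to its two-letter ISO 639-1 part."""
    return code.split("-")[0].lower()


def group_by_language(
    articles: Iterable[dict],
    detect: Callable[[str], str],
) -> Dict[str, List[dict]]:
    """Bucket articles by the ISO 639-1 code detected from their text."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for article in articles:
        code = normalise_code(detect(article["text"]))
        buckets[code].append(article)
    return dict(buckets)


# With langdetect installed, one could call:
#   from langdetect import detect
#   by_lang = group_by_language(articles, detect)
```

Injecting the detector keeps the grouping logic independent of any one library and makes it easy to test with a stub.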
Step 5: Social Media Link From the crawled data, we fetch the anchor tags <a> in the news content. We then filter the hyperlinks to identify social media platforms and extract the unique identifiers of the posts.
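A minimal standard-library sketch of this step in Python follows. It collects the `href` of every `<a>` tag and keeps links whose host belongs to a known platform. The host-to-platform table is an assumption made for illustration. The paper names the platforms it found but not the exact domains its filter matched.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple
from urllib.parse import urlparse

# Assumed host names per platform; extend as needed.
PLATFORM_HOSTS = {
    "twitter.com": "twitter",
    "facebook.com": "facebook",
    "instagram.com": "instagram",
    "youtube.com": "youtube",
    "youtu.be": "youtube",
    "reddit.com": "reddit",
    "tiktok.com": "tiktok",
}


class AnchorCollector(HTMLParser):
    """Collect the href values of all <a> tags in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def platform_of(url: str) -> Optional[str]:
    """Map a URL to a platform name, accepting subdomains like www. or m."""
    host = (urlparse(url).hostname or "").lower()
    for known, platform in PLATFORM_HOSTS.items():
        if host == known or host.endswith("." + known):
            return platform
    return None


def social_media_links(html: str) -> List[Tuple[str, str]]:
    """Return (platform, url) pairs for all social media links in the HTML."""
    collector = AnchorCollector()
    collector.feed(html)
    collector.close()
    links = []
    for href in collector.hrefs:
        platform = platform_of(href)
        if platform is not None:
            links.append((platform, href))
    return links
```

Matching on the parsed host name, not on a substring of the whole URL, avoids false hits such as a news URL that merely mentions "twitter.com" in its path.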
Step 6: Social Media Data Crawling We now fetch the data from the respective social media platforms. For this purpose, we built a crawler for each social media platform, which consumes the unique identifiers obtained in the previous step. For Twitter, we used a Python crawler based on Tweepy, which crawls all details about a tweet. We collect the text, time, likes, retweets, and user details such as name, location, and follower count. Similarly, we built crawlers for the other platforms. Due to data restrictions on Facebook and Instagram, we use CrowdTangle [34] to fetch data from these two platforms, but it only provides numerical data such as likes and followers.
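The "one crawler per platform" design can be sketched as a registry that dispatches post identifiers to platform-specific functions. In the Python sketch below, `register`, `crawl_twitter` and `crawl_all` are hypothetical names. The Twitter crawler is a placeholder that makes no network calls. A real one could wrap Tweepy, as the authors did, but no Tweepy API call is shown here.

```python
from typing import Callable, Dict, Iterable, List, Tuple

Crawler = Callable[[List[str]], List[dict]]
CRAWLERS: Dict[str, Crawler] = {}


def register(platform: str):
    """Decorator that registers a crawler function for one platform."""
    def wrap(fn: Crawler) -> Crawler:
        CRAWLERS[platform] = fn
        return fn
    return wrap


@register("twitter")
def crawl_twitter(ids: List[str]) -> List[dict]:
    # Placeholder: a real implementation would query the Twitter API and return
    # text, time, likes, retweets and user details for each tweet ID.
    return [{"platform": "twitter", "id": post_id} for post_id in ids]


def crawl_all(links: Iterable[Tuple[str, str]]) -> List[dict]:
    """Group (platform, post_id) pairs and hand each group to its crawler."""
    by_platform: Dict[str, List[str]] = {}
    for platform, post_id in links:
        by_platform.setdefault(platform, []).append(post_id)
    results: List[dict] = []
    for platform, ids in by_platform.items():
        crawler = CRAWLERS.get(platform)
        if crawler is None:
            continue  # no crawler (yet) for this platform; its IDs are skipped
        results.extend(crawler(ids))
    return results
```

Batching IDs per platform fits APIs that accept many identifiers per request. Adding a platform then only means registering one more function.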
Step 7: Data Labelling We assign labels to the social media data based on the labels that journalists assigned to the news articles. News articles often categorise a social media post, for example as hate speech or propaganda. We assign each social media post the class that the journalist gives it in the news article. For example, suppose a news article a containing social media post s has been published by journalist j, and j has described s as misinformation. In that case, we label s as misinformation. This eases the annotation workload, because the posts have already been checked by a journalist.
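The labelling rule (post s linked in article a inherits a's class) is a simple propagation. Here is a minimal Python sketch. The field names `news_id` and `news_class` and the `post_links` mapping are assumptions, with the mapping taken to be the output of the link-extraction step.

```python
from typing import Dict, List


def propagate_labels(articles: List[dict], post_links: Dict[str, List[str]]) -> List[dict]:
    """Give each linked post the class the journalist assigned to its article.

    articles:   records with at least "news_id" and "news_class" (assumed names)
    post_links: news_id -> identifiers of the posts linked in that article
    """
    labelled = []
    for article in articles:
        for post_id in post_links.get(article["news_id"], []):
            labelled.append({
                "post_id": post_id,
                "label": article["news_class"],
                "news_id": article["news_id"],  # kept for verification and enrichment
            })
    return labelled


articles = [
    {"news_id": "PY9", "news_class": "False"},
    {"news_id": "SN1", "news_class": "Misleading"},
]
post_links = {"PY9": ["s1", "s2"], "SN1": ["s2"]}
rows = propagate_labels(articles, post_links)
```

A post linked from two articles yields two rows, possibly with different labels (here `s2`). This is the multiple-label case that the paper lists as a current limitation.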
Step 8: Human Verification To check correctness, a human verifies the label assigned to the social media post. If the label is wrongly assigned, the post is removed from the corpus. This step ensures that the collected corpus contains relevant posts with correctly assigned labels. A human can verify the labels of randomly selected news articles.
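Random-sample verification can be sketched as follows. Draw up to k posts, ask a reviewer whether each label is correct, and drop the rejected posts. The case study later uses k = 100 per platform. In this Python sketch, `is_correct` stands in for the human judgement. In practice it could show the news article URL and the assigned label and read the reviewer's answer. The function name and record layout are hypothetical.

```python
import random
from typing import Callable, List, Set, Tuple


def verify_sample(
    posts: List[dict],
    k: int,
    is_correct: Callable[[dict], bool],
    seed: int = 0,
) -> Tuple[List[dict], Set[str]]:
    """Check the labels of up to k randomly drawn posts and drop rejected ones.

    Returns the remaining posts and the set of rejected post IDs.
    """
    rng = random.Random(seed)  # fixed seed makes the drawn sample reproducible
    sample = rng.sample(posts, min(k, len(posts)))
    rejected = {p["post_id"] for p in sample if not is_correct(p)}
    kept = [p for p in posts if p["post_id"] not in rejected]
    return kept, rejected
```

Only sampled posts can be rejected. The sample therefore estimates label quality for the whole corpus rather than cleaning every post.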
Step 9: Data Enrichment We finally merge the social media data with the
details from the news articles. It helps to accumulate extra information, which
might allow for further analysis.
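Enrichment amounts to a left join of posts with articles on a shared key. This Python sketch assumes the key is `news_id` and prefixes article fields with `article_` to avoid name clashes. Both choices are assumptions. With pandas, the equivalent would be `posts_df.merge(articles_df, on="news_id", how="left")`.

```python
from typing import Dict, List


def enrich(posts: List[dict], articles: List[dict]) -> List[dict]:
    """Attach article details to each post via the shared news_id (a left join)."""
    by_id: Dict[str, dict] = {a["news_id"]: a for a in articles}
    enriched = []
    for post in posts:
        article = by_id.get(post["news_id"], {})  # posts without a match keep only their own fields
        extra = {f"article_{k}": v for k, v in article.items() if k != "news_id"}
        enriched.append({**extra, **post})
    return enriched


posts = [{"post_id": "s1", "news_id": "PY9", "label": "False"}]
articles = [{"news_id": "PY9", "news_title": "A video shows a rally", "published_by": "AFP"}]
rows = enrich(posts, articles)
```

A left join keeps every labelled post even if its article record is missing, so no annotated data is lost during enrichment.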
4 Implementation: A Case Study on Misinformation
While our framework allows for general application, its merits are best understood by applying it to a specific domain. AMUSED can be helpful for several domains, but news companies are particularly active in the domain of misinformation, especially during a crisis. Misinformation is often, if imprecisely, described as information that is shared unintentionally or by mistake, without knowing whether the content is true [27].
There is an increasing amount of misinformation in the media, social media, and other web sources; it has become a topic of much research attention [38]. Nowadays, more than 100 fact-checking websites are working to tackle the problem of misinformation [9].
People have spread vast amounts of misinformation during the COVID-19 pandemic and in relation to elections and disasters [18]. Due to the lack of labelled data, it is challenging to analyse this misinformation properly. As a case study, we apply AMUSED to annotate data on COVID-19 misinformation, following the steps described in the previous section.
Step 1: Domain Identification Out of several possible application domains, we consider the spread of misinformation in the context of COVID-19. Misinformation likely worsens the negative effects of the pandemic [28]. The Director-General of the World Health Organization (WHO) considers that we are fighting not only a pandemic but also an infodemic [35,37]. One of the fundamental problems is the lack of a sufficient corpus related to the pandemic [27].
Step 2: Data Sources As data sources, we analysed 25 fact-checking websites and decided to use Poynter and Snopes. We chose Poynter because it has a central data hub that collects data from more than 98 fact-checking websites, while Snopes is not integrated with Poynter but has more than 300 fact-checked articles on COVID-19.
Step 3: Web Scraping In this step, we fetched all the news articles from
Poynter and Snopes.
Step 4: Language Detection We collected data in multiple languages such as English, German, and Hindi. To identify the language of each news article, we used langdetect, a Python library for language detection, applied to the textual content of the news article.
Step 5: Social Media Link In the next step, while crawling the HTML, we extract the URLs from the parsed DOM (Document Object Model) tree. We analysed the URL patterns of different social media platforms and applied keyword-based filtering to all hyperlinks in the DOM. For instance, Twitter URLs of individual tweets follow the pattern twitter.com/user_name/status/tweet_id. We therefore searched the collected hyperlinks for the keywords "twitter.com" and "status", which ensures that we collect only hyperlinks referring to tweets. This process is shown in Figure 2.
Similarly, we followed this approach for other social media platforms like Facebook and Instagram. We then used regular expressions to extract the unique ID of each social media post for the next step.
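The ID-extraction step can be sketched with one regular expression per platform. The paper does not publish its regexes. The Python patterns below are assumptions based on public URL formats: `twitter.com/<user>/status/<id>`, `instagram.com/p/<shortcode>/`, and YouTube `watch?v=<11-character id>` or `youtu.be/<id>`. Facebook URLs come in too many shapes for a single short pattern and are left out.

```python
import re
from typing import Optional

# Assumed URL patterns; not the regexes used by AMUSED.
PATTERNS = {
    "twitter": re.compile(r"twitter\.com/[^/]+/status(?:es)?/(\d+)"),
    "instagram": re.compile(r"instagram\.com/(?:p|reel|tv)/([A-Za-z0-9_-]+)"),
    "youtube": re.compile(r"(?:youtube\.com/watch\?(?:.*&)?v=|youtu\.be/)([A-Za-z0-9_-]{11})"),
}


def extract_post_id(platform: str, url: str) -> Optional[str]:
    """Return the post identifier embedded in a URL, or None if it does not match."""
    pattern = PATTERNS.get(platform)
    if pattern is None:
        return None
    match = pattern.search(url)
    return match.group(1) if match else None


print(extract_post_id("twitter", "https://twitter.com/some_user/status/1234567890?s=20"))
# prints: 1234567890
```

Using `search` instead of `match` lets the same pattern handle `https://`, `www.` and `mobile.` prefixes and trailing query strings.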
Fig. 2. An illustration of data collection from a social media platform (Twitter) from a news article [27]
Step 6: Social Media Data Crawling We now have the unique identifier
of each social media post. We built a Python-based program for crawling the
data from the respective social media platform. The summary is given in Table 2.
Step 7: Data Labelling For data labelling, we used the labels assigned in the news articles: we mapped each social media post to its respective news article and assigned the article's label to the post. For example, a tweet extracted from a news article is mapped to the class of the news article. This process is shown in Figure 3.
Platform    Posts   Unique  Text   Image  Text+Image  Video
Facebook    5,799   3,200   1,167  567    1,006       460
Instagram   385     197     -      106    41          52
Pinterest   5       3       -      3      0           0
Reddit      67      33      16     10     7           0
TikTok      43      18      -      -      -           18
Twitter     3,142   1,758   1,300  116    143         199
Wikipedia   393     176     106    34     20          16
YouTube     2,087   (916)   -      -      -           916
Table 2. Summary of data collected
Fig. 3. An illustration of annotating social media posts using the label mentioned in the news article.
Step 8: Human Verification We manually checked each social media post to assess the correctness of the process. We provided the annotators with all necessary information about the class mapping and asked them to verify it. For example, in Figure 3, a human opens the news article using the newssource URL and verifies the label assigned to the tweet. For COVID-19 misinformation, we checked the annotation by randomly choosing 100 social media posts from each social media platform and comparing the label assigned to each post with the label mentioned in the news article. We measured inter-coder reliability using Cohen's kappa and obtained values of 0.72–0.86, which indicates good agreement. We further normalised the data labels into false, partially false, true and others using the definitions in [27].
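For reference, Cohen's kappa compares the observed agreement p_o between two annotators with the agreement p_e expected by chance from each annotator's label frequencies: κ = (p_o − p_e) / (1 − p_e). The following Python sketch is a minimal implementation of this standard formula. The ten labels in the example are invented and are not data from this study. scikit-learn's `sklearn.metrics.cohen_kappa_score` computes the same statistic.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(a: Sequence[str], b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators who labelled the same items."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(a)
    p_o = sum(x == y for x, y in zip(a, b)) / n  # observed agreement
    ca, cb = Counter(a), Counter(b)
    # chance agreement: probability that both pick the same class independently
    p_e = sum(ca[c] * cb[c] for c in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        return 1.0  # both annotators used one identical label throughout
    return (p_o - p_e) / (1 - p_e)


# Invented example: 9 of 10 labels agree (p_o = 0.9) and p_e = (6*5 + 4*5) / 100 = 0.5.
a = ["false"] * 6 + ["true"] * 4
b = ["false"] * 5 + ["true"] * 5
print(round(cohens_kappa(a, b), 3))  # prints: 0.8
```

Correcting for chance matters here because the label distribution is highly skewed towards "false" (Table 3). With that skew, raw percentage agreement would look high even for careless annotators.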
Step 9: Data Enrichment We then enriched the data with extra information about each social media post. The first step is merging the social media post with the respective news article, which adds information such as the textual content, news source, and author. The collected data is analysed in detail in Section 5.
5 Results
For the use case of COVID-19 misinformation, we identified Poynter and Snopes as the data sources and collected data from different social media platforms. We found that around 51% of the news articles linked their content to social media websites. Overall, we collected 8,077 fact-checked news articles from 105 countries in 40 languages. We cleaned the hyperlinks collected using the AMUSED framework and removed duplicate social media posts using their unique identifiers. Finally, we will release the data openly.
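Deduplication by unique identifier can be sketched as keeping the first occurrence of each (platform, post ID) pair. The key is scoped by platform because IDs from different platforms may coincide. This key choice and the field names are assumptions. The difference between the Posts and Unique columns in Table 2 presumably reflects a step of this kind.

```python
from typing import Iterable, List, Set, Tuple


def deduplicate(posts: Iterable[dict]) -> List[dict]:
    """Keep the first occurrence of each (platform, post_id) pair, preserving order."""
    seen: Set[Tuple[str, str]] = set()
    unique: List[dict] = []
    for post in posts:
        key = (post["platform"], post["post_id"])
        if key not in seen:
            seen.add(key)
            unique.append(post)
    return unique


posts = [
    {"platform": "twitter", "post_id": "1", "news_id": "PY9"},
    {"platform": "twitter", "post_id": "1", "news_id": "SN4"},  # same tweet, second article
    {"platform": "facebook", "post_id": "1", "news_id": "PY9"},
    {"platform": "twitter", "post_id": "2", "news_id": "PY9"},
]
unique = deduplicate(posts)
```

Keeping the first occurrence discards the second article link. A corpus that must preserve all linking articles would instead group rows by key.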
SM Platform False Partially False Other True
Facebook 2,776 325 94 6
Instagram 166 28 2 1
Reddit 21 9 2 1
Twitter 1,318 234 50 13
Wikipedia 154 18 3 1
YouTube 739 164 13 0
Table 3. Summary of COVID-19 misinformation posts collected.
We plotted the data only for social media platforms with more than 25 unique posts (Table 3), because platforms with very few posts distort the plot distribution. We therefore dropped Pinterest (3), WhatsApp (23), TikTok (25), and Reddit (43) from the plot. The plot shows that most of the social media posts are from Facebook and Twitter, followed by YouTube, Wikipedia and Instagram. Table 3 also presents the class distribution of these posts. The volume of misinformation also seems to follow the COVID-19 situation in many countries, as the number of social media posts decreased after June 2020. A possible reason is either that the spread of misinformation decreased or that fact-checking websites were no longer focusing on this issue as much as during the early stage.
6 Discussion
Our study highlighted the process of fetching labelled social media posts from fact-checked news articles. Fact-checking websites usually link social media posts from multiple platforms. We tried to gather data from various social media platforms, but found that most links pointed to Facebook, Twitter, and YouTube. There are few unique posts from Reddit (21), TikTok (9), etc., which shows that fact-checkers mainly focus on analysing content from Facebook, Twitter, and YouTube.
Surprisingly, there are only three unique posts from Pinterest, and no data are available from Gab, ShareChat, and Snapchat, even though Gab is well known for harmful content and people use ShareChat in their regional languages. Many people use Wikipedia as a reliable source of information, yet there are 393 links to Wikipedia. Overall, fact-checking websites are thus limited to a few trending social media platforms like Twitter or Facebook, while platforms such as Gab and TikTok are notorious for malinformation and misinformation [6]. WhatsApp is an instant messaging app used among friends or groups of people, so we only found a few hyperlinks to public WhatsApp groups. To increase the visibility of fact-checked articles, journalists can also use the schema.org vocabulary with the Microdata, RDFa, or JSON-LD formats to add details about misinformation to their news articles [29].
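As an illustration of such markup, the following Python sketch builds a minimal schema.org `ClaimReview` object serialised as JSON-LD. It uses only standard `ClaimReview` properties (`claimReviewed`, `reviewRating`, `author`, `datePublished`, `url`). Following common fact-check markup practice, the textual verdict goes into `Rating.alternateName`. The helper name and example values are illustrative and loosely based on Table 1. Neither comes from the paper.

```python
import json


def claim_review_jsonld(url: str, claim: str, rating: str,
                        publisher: str, date_published: str) -> str:
    """Build a minimal schema.org ClaimReview object as a JSON-LD string."""
    doc = {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": url,
        "claimReviewed": claim,
        "datePublished": date_published,
        "author": {"@type": "Organization", "name": publisher},
        "reviewRating": {"@type": "Rating", "alternateName": rating},
    }
    return json.dumps(doc, indent=2)


markup = claim_review_jsonld(
    url="https://factcheck.afp.com/video-actually-shows-anti-government-protest-belarus",
    claim="A video shows a rally against coronavirus restrictions in the British capital of London.",
    rating="False",
    publisher="AFP",
    date_published="2020-09-01",
)
# The string would be embedded in the article inside <script type="application/ld+json">.
```

Machine-readable verdicts of this kind would also make Step 7 more robust, since a crawler could read the label directly instead of inferring it from the article text.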
AMUSED requires some effort, but it is still beneficial compared to annotating randomly sampled data, where thousands of social media posts need to be annotated and the chance of finding misinformation is significantly lower.
Another aspect is the diversity of social media posts across platforms. News articles often mention Facebook, Twitter, and YouTube, but only seldom Instagram, Pinterest and TikTok, and Gab was not mentioned at all. The reasons for this need to be explored. It would be interesting to study the propagation of misinformation on platforms like TikTok and Gab in relation to the news coverage they get. Such a cross-platform study would be particularly insightful for contemporary topics such as COVID-19 misinformation, and could also be linked to classification models [26,24].
We have also analysed the multi-modality of the data on the social media platforms; the number of social media posts per format is shown in Table 2. We further classified the misinformation into four categories, as discussed in Step 8. There is more misinformation in text form than in video or image form. Table 3 therefore presents the textual misinformation in the four categories. Apart from text, misinformation is also shared as images, videos, or embedded formats such as image-text.
While applying the AMUSED framework to COVID-19 misinformation, we found that misinformation spreads across multiple source platforms but mainly circulates on Facebook, Twitter, and YouTube. Our findings suggest concentrating mitigation efforts on these platforms.
7 Conclusion and Future Work
In this paper, we presented a semi-automatic framework for social media data annotation. The framework can be applied to several domains like misinformation, mob lynching, and online abuse. As part of the framework, we also used Python-based crawlers for different social media websites. After data labelling, the labels are cross-checked by a human, which ensures a two-step verification of data annotation for the social media posts. We also enrich each social media post by mapping it to its news article, which provides additional information for further analysis. We have implemented the proposed framework for collecting misinformation posts related to COVID-19. One limitation of the framework is that, presently, we do not address multiple (possibly contradicting) labels assigned by different fact-checkers to the same claim.
As future work, the framework can be extended for getting the annotated
data on other topics like hate speech, mob lynching etc. The framework will
be helpful in gathering annotated data for other domains from multiple social
media sites for further analysis.
AMUSED will decrease the labour cost and time of the data annotation process. Our framework will also increase data annotation quality because we crawl the data from news articles written by expert journalists.
References
1. Aggarwal, C.C.: An introduction to social network data analytics. In: Social network data analytics, pp. 1–15. Springer (2011)
2. Ahmed, S., Pasquier, M., Qadah, G.: Key issues in conducting sentiment analysis on arabic social media text. In: 2013 9th International Conference on Innovations in Information Technology (IIT). pp. 72–77. IEEE (2013)
3. Alam, F., Dalvi, F., Shaar, S., Durrani, N., Mubarak, H., Nikolov, A., Martino,
G.D.S., Abdelali, A., Sajjad, H., Darwish, K., et al.: Fighting the covid-19 info-
demic in social media: A holistic perspective and a call to arms. arXiv preprint
arXiv:2007.07996 (2020)
4. Aroyo, L., Welty, C.: Truth is a lie: Crowd truth and the seven myths of human
annotation. AI Magazine 36(1), 15–24 (2015)
5. Braun, J., Gillespie, T.: Hosting the public discourse, hosting the public: When
online news and social media converge. Journalism Practice 5(4), 383–398 (2011)
6. Brennen, J.S., Simon, F., Howard, P.N., Nielsen, R.K.: Types, sources, and claims of covid-19 misinformation. Reuters Institute 7, 3–1 (2020)
7. Caumont, A.: 12 trends shaping digital news. Pew Research Center 16 (2013)
8. Chapman, W.W., Nadkarni, P.M., Hirschman, L., D’avolio, L.W., Savova, G.K.,
Uzuner, O.: Overcoming barriers to nlp for clinical text: the role of shared tasks
and the need for additional creative solutions (2011)
9. Cherubini, F., Graves, L.: The rise of fact-checking sites in europe. Reuters Institute for the Study of Journalism, University of Oxford. http://reutersinstitute.politics.ox.ac.uk/our-research/rise-fact-checking-sites-europe (2016)
10. Cook, P., Stevenson, S.: Automatically identifying changes in the semantic orien-
tation of words. In: LREC (2010)
11. Cui, X., Liu, Y.: How does online news curate linked sources? a content analysis
of three online news media. Journalism 18(7), 852–870 (2017)
12. Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of
human actions in video. In: 2009 IEEE 12th International Conference on Computer
Vision. pp. 1491–1498. IEEE (2009)
13. Forbush, T.B., Shen, S., South, B.R., DuVall, S.L.: What a catch! traits that define good annotators. Studies in health technology and informatics 192, 1213–1213 (2013)
14. Fort, K., Adda, G., Cohen, K.B.: Amazon mechanical turk: Gold mine or coal
mine? Computational Linguistics 37(2), 413–420 (2011)
15. Geiger, R.S., Yu, K., Yang, Y., Dai, M., Qiu, J., Tang, R., Huang, J.: Garbage
in, garbage out? do machine learning application papers in social computing re-
port where human-labeled training data comes from? In: Proceedings of the 2020
Conference on Fairness, Accountability, and Transparency. pp. 325–336 (2020)
16. Giglietto, F., Rossi, L., Bennato, D.: The open laboratory: Limits and possibili-
ties of using facebook, twitter, and youtube as a research data source. Journal of
technology in human services 30(3-4), 145–159 (2012)
17. Grant-Muller, S.M., Gal-Tzur, A., Minkov, E., Nocera, S., Kuflik, T., Shoor, I.:
Enhancing transport data collection through social media sources: methods, chal-
lenges and opportunities for textual data. IET Intelligent Transport Systems 9(4),
407–417 (2014)
18. Gupta, A., Lamba, H., Kumaraguru, P., Joshi, A.: Faking sandy: characterizing
and identifying fake images on twitter during hurricane sandy. In: Proceedings of
the 22nd international conference on World Wide Web. pp. 729–736 (2013)
19. Haertel, R.A.: Practical cost-conscious active learning for data annotation in
annotator-initiated environments. Brigham Young University-Provo (2013)
20. Poynter Institute: The International Fact-Checking Network (2020), https://www.poynter.org/ifcn/
21. Kohring, M., Matthes, J.: Trust in news media: Development and validation of a
multidimensional scale. Communication research 34(2), 231–252 (2007)
22. Mandl, T., Modha, S., Shahi, G.K., Jaiswal, A.K., Nandini, D., Patel, D., Ma-
jumder, P., Schäfer, J.: Overview of the HASOC track at FIRE 2020: Hate speech
and offensive content identification in indo-european languages. In: Mehta, P.,
Mandl, T., Majumder, P., Mitra, M. (eds.) Working Notes of FIRE 2020. CEUR
Workshop Proceedings, vol. 2826, pp. 87–111. CEUR-WS.org (2020)
23. McGahan, C., Katsion, J.: Secondary communication crisis: Social media news
information. Liberty University Research Week (2021)
24. Nandini, D., Capecci, E., Koefoed, L., Laña, I., Shahi, G.K., Kasabov, N.: Mod-
elling and analysis of temporal gene expression data using spiking neural net-
works. In: International Conference on Neural Information Processing. pp. 571–
581. Springer (2018)
25. Sabou, M., Bontcheva, K., Derczynski, L., Scharl, A.: Corpus annotation through
crowdsourcing: Towards best practice guidelines. In: LREC. pp. 859–866 (2014)
26. Shahi, G.K., Bilbao, I., Capecci, E., Nandini, D., Choukri, M., Kasabov, N.: Anal-
ysis, classification and marker discovery of gene expression data with evolving
spiking neural networks. In: International Conference on Neural Information Pro-
cessing. pp. 517–527. Springer (2018)
27. Shahi, G.K., Dirkson, A., Majchrzak, T.A.: An exploratory study of covid-19 mis-
information on twitter. Online social networks and media p. 100104 (2021)
28. Shahi, G.K., Nandini, D.: FakeCovid – a multilingual cross-domain fact check news
dataset for covid-19. In: Workshop Proceedings of the 14th International AAAI
Conference on Web and Social Media (2020), http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf
29. Shahi, G.K., Nandini, D., Kumari, S.: Inducing schema.org markup from natural language context. Kalpa Publications in Computing 10, 38–42 (2019)
30. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media:
A data mining perspective. ACM SIGKDD explorations newsletter 19(1), 22–36
(2017)
31. Sorokin, A., Forsyth, D.: Utility data annotation with amazon mechanical turk. In:
2008 IEEE computer society conference on computer vision and pattern recognition
workshops. pp. 1–8. IEEE (2008)
32. Stieglitz, S., Mirbabaie, M., Ross, B., Neuberger, C.: Social media analytics–
challenges in topic discovery, data collection, and data preparation. International
journal of information management 39, 156–168 (2018)
33. Talwar, S., Dhir, A., Kaur, P., Zafar, N., Alrasheedy, M.: Why do people share fake news? Associations between the dark side of social media use and fake news sharing behavior. Journal of Retailing and Consumer Services 51, 72–82 (2019)
34. CrowdTangle Team: CrowdTangle. Facebook, Menlo Park, California, United States (2020)
35. The Guardian: The WHO v coronavirus: why it can’t handle the pandemic (2020), https://www.theguardian.com/news/2020/apr/10/world-health-organization-who-v-coronavirus-why-it-cant-handle-pandemic
36. Thorson, K., Driscoll, K., Ekdale, B., Edgerly, S., Thompson, L.G., Schrock, A., Swartz, L., Vraga, E.K., Wells, C.: YouTube, Twitter and the Occupy movement: Connecting content and circulation practices. Information, Communication & Society 16(3), 421–451 (2013)
37. Zarocostas, J.: World Report How to fight an infodemic. The Lancet 395, 676 (2020). https://doi.org/10.1016/S0140-6736(20)30461-X
38. Zhou, X., Zafarani, R.: Fake news: A survey of research, detection methods, and opportunities. CoRR abs/1812.00315 (2018), http://arxiv.org/abs/1812.00315


AMUSED An Annotation Framework Of Multi-Modal Social Media Data

  • 1. AMUSED: An Annotation Framework of Multi-modal Social Media Data Gautam Kishore Shahi1 and Tim A. Majchrzak2 1 University of Duisburg-Essen, Germany 2 University of Agder, Norway gautam.shahi@uni-due.de, timam@uia.no Abstract. Social media nowadays is both an important news source and used for spreading misinformation. Systematically studying social media phenomena, however, has been challenging due to the lack of la- belled data. This paper presents the semi-automated annotation frame- work AMUSED for gathering multi-lingual multi-modal annotated data from social networking sites. The framework is designed to mitigate the workload in collecting and annotating social media data by cohesively combining machine and human in the data collection process. From a given list of news articles, AMUSED detects links to social media posts and then downloads the data from the respective social networking sites and assigns a label to it. The framework can fetch the annotated data from multiple platforms like Twitter, YouTube, and Reddit. As a use case, we have implemented AMUSED for collecting COVID-19 misin- formation data from different social media sites from 8 077 fact-checked articles into four different categories of Misinformation. Keywords: Data Annotation ¡ Social media ¡ Misinformation¡ News articles ¡ Fact-checking 1 Introduction With the growth of users on different social media sites, social media have be- come part of our lives. They play an essential role in making communication easier and accessible. People and organisations use social media to share and browse information, especially during the current pandemic; social media sites get massive attention from users [33,23]. Braun and Tarleton [5] conducted a study to analyse the public discourse on social media sites and news organisa- tion. Social media sites allow getting more attention from the users for sharing news or user-generated content. 
Several statistical and computational studies have been conducted using social media data [5]. However, gathering and annotating such data is challenging and costly [19]. Social media data analytics research faces challenges in data collection, data sampling, data annotation, data quality, and data bias [17]. Data annotation is the process of assigning a category to each data item. Researchers annotate social media data for research on hate speech, misinformation, online mental health, and similar topics. For supervised machine learning, labelled data sets are required to learn the input patterns [26]. To build a supervised or semi-supervised model on social media data, researchers face two challenges: timely data collection and data annotation [30]. Timely data collection is essential because some platforms restrict data access, and posts may be deleted by the platform or by the user [32]. Data annotation is conducted either in-house (within a lab or organisation) or with a crowdsourcing tool such as Amazon Mechanical Turk (AMT) [4]. Both approaches require considerable effort to write annotation guidelines. There is also a risk of wrongly labelled data, which leads to bias [10].

We propose a semi-automatic framework for collecting and annotating data from social media platforms that addresses both timely data collection and annotation. AMUSED gathers labelled data from different social media platforms in multiple formats (text, image, video). It can obtain annotated data on social issues such as misinformation, hate speech, or other critical social scenarios. AMUSED mitigates bias in the data that arises from wrong labels assigned by annotators. Our contribution is a semi-automatic approach for collecting labelled data from different social media sites in multiple languages and data formats. The framework can be applied in many application domains for which data are typically hard to gather, for instance misinformation or mob lynching.

This paper is structured as follows. In Section 2 we discuss the background of our work. We then present the method of AMUSED in Section 3. In Section 4 we describe the implementation of AMUSED based on a case study, and in Section 5 we present the results. We discuss our observations in Section 6 and draw a conclusion in Section 7.

arXiv:2010.00502v2 [cs.SI] 10 Aug 2021
2 Background
This section describes the background on data annotation, the types of data on social media, and the problems of current annotation techniques.

2.1 Data Annotation
Much published research uses social media data. Typically, however, a single work is limited to a few social media platforms or languages, and results are often based on a limited amount of data. There are multiple reasons for these limitations. One of the key reasons is the limited availability of annotated data for research [36,2]. Chapman et al. [8] highlight the problem of obtaining labelled data for NLP-related problems. Data quality and the role of the annotator also affect the performance of machine learning models: with poor data, it is hard to build a generalisable classifier [15].

Researchers depend on in-house or crowd-based data annotation. Recently, Alam et al. [3] used a crowd-based annotation technique and asked people to volunteer for data annotation, but they had no significant success in obtaining a large amount of labelled data. Current annotation techniques depend on the background expertise of the annotators. Finding past data on an incident such as a mob lynching is challenging because of data restrictions imposed by social media platforms. It requires looking at a massive number of posts and news articles, which entails much manual work. In addition, billions of social media posts are sampled down to a few thousand for annotation, either randomly or by keyword, which leads to sampling bias. With in-house data annotation, it is challenging to hire annotators with background expertise in a domain. Another issue is the development of a codebook with proper explanations [13]. The entire process is costly and time-consuming [12]. Crowd-based annotation tools such as AMT have a further problem: the low cost may result in wrongly labelled data. Many annotators may cheat, perform the job carelessly, use bots, or answer randomly [14,25].

Since the emergence of social media as a news resource [7], people have used this resource in very different ways. They may share news, state a personal opinion, or commit a social crime in the form of hate speech or cyberbullying [22]. The COVID-19 pandemic has arguably led to a surge in the spread of misinformation [28]. Nowadays, journalists cover common issues such as misinformation, mob lynching, and hate speech, and they also link the relevant social media posts in their news articles [11]. These linked posts can be used to solve the problem of data collection and annotation. Each post is then labelled based on the content of the news article that links to it. To get a reliable label, the credibility of the news source must be considered [21]. For example, a professional news website registered with the International Fact-Checking Network [20] should generally be credible.

2.2 Data on Social Media Platforms
Social media sites allow users to create and view posts in multiple formats.
Every day, billions of posts containing images, text, and videos are shared on social media sites such as Facebook, Twitter, YouTube, and Instagram [1]. Data are available in different formats, and each social media platform restricts data crawling. For instance, Facebook allows crawling only of data related to public posts and groups. Giglietto et al. discuss the need for multi-modal data in the study of social phenomena [16]. Almost every social media platform allows users to create or respond to posts in text, but each platform limits the length of the text differently. The content and writing style change with the character limit of each platform. Images are also common across social media platforms, which restrict the size of images. Some platforms focus primarily on video, whereas others are multi-modal. For video, there are also restrictions on duration. These restrictions influence how each platform is used.

2.3 Problems of Current Annotation Techniques
There are several problems with current annotation approaches. First, social media platforms restrict data access, and content disappears: for example, users delete their tweets or YouTube videos. Without timely crawling, access to the data is lost. Second, if the volume of data is high, filtering based on criteria such as keyword, date, or location is needed. This filtering degrades data quality by excluding much data. For example, if we sample data for hate speech using hateful keywords, we miss the many hate speech tweets that do not contain any hateful words.

Third, finding good annotators is difficult. Annotation quality depends on the background expertise of the annotator. With crowdsourcing, maintaining annotation quality is complicated, and maintaining good agreement between multiple annotators is tedious. Fourth, developing annotation guidelines is tricky: writing a good codebook requires domain knowledge and consultation with experts. Fifth, data annotation is costly and time-consuming [31]. Sixth, social media content is available in multiple languages, but much research is limited to English. Data annotation in other languages, especially under-resourced languages, is difficult due to the lack of experienced annotators.

3 Method
AMUSED's elements are summarised in Figure 1. It follows nine steps.

Fig. 1. AMUSED: An Annotation Framework for Multi-modal Social Media data

Step 1: Domain Identification. The first step is to identify the domain in which we want to gather data. A domain could focus on a particular public discourse, for example fake news in the US election or hate speech in trending hashtags on Twitter. Domain selection helps to find the relevant data sources.
Table 1. Description of attributes and their examples
News ID: Unique identifier of each news article, formed from an acronym for the news source and a number. Example: PY9
Newssource URL: Unique URL pointing to the news article. Example: https://factcheck.afp.com/video-actually-shows-anti-government-protest-belarus
News Title: The title of the news article. Example: A video shows a rally against coronavirus restrictions in the British capital of London.
Published date: Date on which the article was published online. Example: 01 September 2020
News Class: The class (such as false, true, or misleading) that the fact-check article assigns; stored in the class column. Example: False
Published-By: The name of the news website. Example: AFP, TheQuint
Country: Country where the news article is published. Example: Australia
Language: Language of the news article. Example: English

Step 2: Data Source. Data sources comprise news websites that cover a particular topic. For example, many news websites have a separate section that discusses an election or other ongoing issues.

Step 3: Web Scraping. AMUSED then crawls all news articles from the news websites using a Python-based crawler. We fetch details such as the published date, author, location, and news content (see Table 1).

Step 4: Language Identification. After getting the details of each news article, we identify its language, using ISO 639-1 codes to name languages. Based on the language, we can further filter articles and apply language-specific models to find insights.

Step 5: Social Media Link. From the crawled data, we extract the anchor tags <a> in the news content. We then filter the hyperlinks to identify social media platforms and extract the unique identifiers of the posts.

Step 6: Social Media Data Crawling. We now fetch the data from the respective social media platforms.
For this purpose, we built a crawler for each social media platform, which consumes the unique identifiers obtained in the previous step. For Twitter, we used a Python crawler based on Tweepy, which crawls all details about a tweet. We collect the text, time, likes, retweets, and user details such as name, location, and follower count. We built similar crawlers for other platforms. Due to data restrictions on Facebook and Instagram, we use CrowdTangle [34] to fetch data from these two platforms, but it only provides numerical data such as likes and followers.

Step 7: Data Labelling. We assign labels to the social media data based on the labels that journalists assigned to the news articles. News articles often categorise a social media post, for example as hate speech or propaganda. We assign to each social media post the class that the journalist gives it in the news article. For example, suppose journalist j has published a news article a containing social media post s, and j has described s as misinformation. In that case, we label s as misinformation. This reduces the annotation workload, because the posts have already been checked by a journalist.

Step 8: Human Verification. To check correctness, a human verifies the label assigned to each social media post. If the label is wrongly assigned, the post is removed from the corpus. This step ensures that the collected posts are relevant and correctly labelled. A human can verify the labels of randomly selected news articles.

Step 9: Data Enrichment. Finally, we merge the social media data with the details of the news articles. This adds extra information, which may allow further analysis.

4 Implementation: A Case Study on Misinformation
While our framework allows for general application, its merits are best understood by applying it to a specific domain. AMUSED can be helpful in several domains, but news organisations are especially active in the domain of misinformation, particularly during a crisis. Misinformation is often, if imprecisely, described as information that is shared unintentionally or by mistake, without knowing whether the content is true [27]. The amount of misinformation in the media, on social media, and in other web sources is increasing, and the topic has attracted much research attention [38]. Nowadays, more than 100 fact-checking websites are working to tackle the problem of misinformation [9].
People have spread vast amounts of misinformation during the COVID-19 pandemic and in relation to elections and disasters [18]. Due to the lack of labelled data, it is challenging to analyse misinformation properly. As a case study, we apply AMUSED to annotate data on COVID-19 misinformation, following the steps described in the previous section.

Step 1: Domain Identification. Out of several possible application domains, we consider the spread of misinformation in the context of COVID-19. Misinformation likely worsens the negative effects of the pandemic [28]. The Director-General of the World Health Organization (WHO) considers that we are fighting not only a pandemic but also an infodemic [35,37]. One of the fundamental problems is the lack of a sufficient corpus related to the pandemic [27].

Step 2: Data Sources. We analysed 25 fact-checking websites and chose Poynter and Snopes as data sources. We chose Poynter because it has a central data hub that collects data from more than 98 fact-checking websites. We chose Snopes because it has more than 300 fact-checked articles on COVID-19 and is not integrated with Poynter.
Step 3: Web Scraping. In this step, we fetched all the news articles from Poynter and Snopes.

Step 4: Language Detection. We collected data in multiple languages, such as English, German, and Hindi. To identify the language of each news article, we used langdetect, a Python library for language detection. We applied it to the textual content of the article.

Step 5: Social Media Link. While crawling the HTML, we extract the URLs from the parsed DOM (Document Object Model) tree. We analysed the URL patterns of different social media platforms and applied keyword-based filtering to all hyperlinks in the DOM. For instance, Twitter uses the pattern twitter.com/<user name>/status/<tweet id> for each tweet, so we searched the collected hyperlinks for the keywords "twitter.com" and "status". This ensures that the collected hyperlinks refer to tweets. The process is shown in Figure 2. We followed a similar approach for other social media platforms such as Facebook and Instagram. In the next step, we used regular expressions to extract the unique ID of each social media post.

Fig. 2. An illustration of data collection from a social media platform (Twitter) via a news article [27]

Step 6: Social Media Data Crawling. We now have the unique identifier of each social media post. We built a Python-based program to crawl the data from the respective social media platforms. The summary is given in Table 2.

Step 7: Data Labelling. For data labelling, we used the label assigned in the news article. We mapped each social media post to its news article and assigned the article's label to the post. For example, a tweet extracted from a news article is mapped to the class of that article. This process is shown in Figure 3.
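Steps 5 and 7 can be sketched together for Twitter as follows. This is a minimal illustration under stated assumptions, not the authors' crawler:

- It uses Python's standard-library html.parser instead of whatever HTML parser AMUSED uses.
- It assumes each article's HTML and fact-check class are already available from Steps 3 and 7.
- The names LinkCollector, tweet_ids and label_tweets, the regular expression, and the news ID SN1 are hypothetical.

```python
import re
from html.parser import HTMLParser

# Hypothetical regex mirroring the pattern in Step 5:
# twitter.com/<user name>/status/<tweet id>
TWEET_URL = re.compile(r"twitter\.com/[^/?#]+/status/(\d+)")


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag in an article's DOM."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)


def tweet_ids(article_html):
    """Step 5: return the unique tweet IDs linked from an article, in page order."""
    collector = LinkCollector()
    collector.feed(article_html)
    ids = []
    for href in collector.hrefs:
        # Cheap keyword filter first ("twitter.com" and "status"), then the regex.
        if "twitter.com" in href and "status" in href:
            match = TWEET_URL.search(href)
            if match and match.group(1) not in ids:
                ids.append(match.group(1))
    return ids


def label_tweets(articles):
    """Step 7: give each tweet the class of the article that links to it.

    `articles` maps a news ID to (fact-check class, article HTML).
    Returns (labels, conflicts): labels maps tweet ID -> (class, news ID);
    conflicts holds tweets linked from articles with different classes,
    which a human would resolve during verification (Step 8).
    """
    labels, conflicts = {}, set()
    for news_id, (news_class, html) in articles.items():
        for tweet_id in tweet_ids(html):
            if tweet_id in labels and labels[tweet_id][0] != news_class:
                conflicts.add(tweet_id)
            labels.setdefault(tweet_id, (news_class, news_id))  # first label wins
    return labels, conflicts


# Hypothetical usage: two articles (PY9 from Table 1, SN1 invented) link the same tweet.
articles = {
    "PY9": ("False", '<a href="https://twitter.com/user/status/111">post</a>'),
    "SN1": ("Misleading", '<a href="https://twitter.com/user/status/111">post</a>'),
}
labels, conflicts = label_tweets(articles)
print(labels)     # tweet 111 keeps the first label it received
print(conflicts)  # and is flagged for human verification
```

Keying the labels by tweet ID also removes duplicate links. The sketch flags tweets whose articles disagree instead of silently picking one label. That is a design choice made here for illustration. As Section 7 notes, the framework does not yet handle contradicting labels from different fact-checkers.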
Table 2. Summary of data collected

Platform    Posts  Unique  Text   Image  Text+Image  Video
Facebook    5,799  3,200   1,167  567    1,006       460
Instagram   385    197     -      106    41          52
Pinterest   5      3       -      3      0           0
Reddit      67     33      16     10     7           0
TikTok      43     18      -      -      -           18
Twitter     3,142  1,758   1,300  116    143         199
Wikipedia   393    176     106    34     20          16
YouTube     2,087  (916)   -      -      -           916

Fig. 3. An illustration of annotating social media posts using the label mentioned in the news article.

Step 8: Human Verification. We manually checked each social media post to assess the correctness of the process. We provided the annotators with all necessary information about the class mapping and asked them to verify it. For example, in Figure 3, a human opens the news article using the Newssource URL and verifies the label assigned to the tweet. For COVID-19 misinformation, we randomly chose 100 social media posts from each social media platform. For each post, we compared the label assigned to it with the label mentioned in the news article. We measured inter-coder reliability using Cohen's kappa and obtained values between 0.72 and 0.86, indicating good agreement. We further normalised the data labels into false, partially false, true, and other, using the definitions in [27].

Step 9: Data Enrichment. We then enriched the data with extra information about each social media post. The first step is merging the social media post with the respective news article. This adds information such as the textual content, news source, and author. The collected data are analysed in detail in Section 5.

5 Results
For the COVID-19 misinformation use case, we identified Poynter and Snopes as data sources and collected data from different social media platforms. We found that around 51% of news articles linked their content to social media websites. Overall, we collected 8,077 fact-checked news articles from 105 countries in 40 languages.
We cleaned the hyperlinks collected with the AMUSED framework and removed duplicate social media posts using their unique identifiers. We will release the data as open source.
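Step 8 reports inter-coder reliability as Cohen's kappa. The statistic is kappa = (p_o - p_e) / (1 - p_e):

- p_o is the observed agreement between the two coders.
- p_e is the agreement expected by chance, computed from how often each coder used each label.

The standard-library sketch below is a reference for that formula, not the authors' evaluation script. The annotator labels in the usage lines are made up for illustration.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders who labelled the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement
    and p_e is the agreement expected by chance, computed from how often
    each coder used each label.
    """
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("expected two non-empty label lists of equal length")
    n = len(coder_a)
    p_o = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both coders used one and the same label for every item; kappa is
        # undefined here, so we report perfect agreement by convention.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical labels from two annotators for four posts.
annotator_1 = ["false", "false", "true", "true"]
annotator_2 = ["false", "false", "true", "false"]
print(round(cohens_kappa(annotator_1, annotator_2), 2))  # 0.5
```

Here the coders agree on 3 of 4 posts (p_o = 0.75), but their label frequencies alone would produce p_e = 0.5 by chance, so kappa is 0.5. The same statistic is available as `cohen_kappa_score` in scikit-learn's `sklearn.metrics`.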
Table 3. Summary of COVID-19 misinformation posts collected

SM Platform  False  Partially False  Other  True
Facebook     2,776  325              94     6
Instagram    166    28               2      1
Reddit       21     9                2      1
Twitter      1,318  234              50     13
Wikipedia    154    18               3      1
YouTube      739    164              13     0

We plotted only the social media platforms with more than 25 unique posts, because platforms with very few posts distort the plot distribution. We therefore dropped Pinterest (3), WhatsApp (23), TikTok (25), and Reddit (43) from the plot. The plot shows that most of the social media posts come from Facebook and Twitter, followed by YouTube, Wikipedia, and Instagram. Table 3 presents the class distribution of these posts. The volume of misinformation also appears to follow the COVID-19 situation in many countries, since the number of social media posts decreased after June 2020. A possible reason is either that the spread of misinformation declined or that fact-checking websites focused less on the issue than during the early stage of the pandemic.

6 Discussion
Our study highlights the process of fetching labelled social media posts from fact-checked news articles. Fact-checking websites usually link social media posts from multiple social media platforms. We tried to gather data from various platforms, but most of the links pointed to Facebook, Twitter, and YouTube. There are few unique posts from Reddit (21), TikTok (9), and other platforms. This shows that fact-checkers mainly focused on analysing content from Facebook, Twitter, and YouTube. Surprisingly, there are only three unique posts from Pinterest, and no data are available from Gab, ShareChat, or Snapchat. This is notable because Gab is well known for harmful content, and ShareChat is widely used by people posting in their regional languages. Many people use Wikipedia as a reliable source of information, and we found 393 links to Wikipedia.
Overall, fact-checking websites concentrate on a few popular social media platforms such as Twitter and Facebook. Platforms such as Gab and TikTok, which are known for spreading misinformation [6], receive far less attention. WhatsApp is an instant messaging app used among friends or groups of people, so we only found a few hyperlinks to public WhatsApp groups. To increase the visibility of fact-checked articles, journalists can also use the schema.org vocabulary with the Microdata, RDFa, or JSON-LD formats to add details about the misinformation to their news articles [29]. AMUSED requires some effort, but it is still more efficient than annotating randomly sampled data. With random sampling, thousands of social media posts must be annotated, and the chance of encountering misinformation is significantly lower.
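The schema.org route mentioned above can be illustrated with a minimal ClaimReview object serialised as JSON-LD. The field values reuse the Table 1 example article. The helper name `claim_review_jsonld` is hypothetical. The property set is deliberately minimal; it is not a complete or validated ClaimReview record.

```python
import json


def claim_review_jsonld(url, claim, rating, publisher, date_published):
    """Build a minimal schema.org ClaimReview and return it as JSON-LD text.

    The result would be embedded in the article page inside a
    <script type="application/ld+json"> element.
    """
    review = {
        "@context": "https://schema.org",
        "@type": "ClaimReview",
        "url": url,
        "claimReviewed": claim,
        "datePublished": date_published,
        "author": {"@type": "Organization", "name": publisher},
        # The textual verdict, e.g. the News Class attribute in Table 1.
        "reviewRating": {"@type": "Rating", "alternateName": rating},
    }
    return json.dumps(review, indent=2)


markup = claim_review_jsonld(
    url="https://factcheck.afp.com/video-actually-shows-anti-government-protest-belarus",
    claim="A video shows a rally against coronavirus restrictions in the British capital of London.",
    rating="False",
    publisher="AFP",
    date_published="2020-09-01",
)
print(markup)
```

A crawler could, in principle, read `reviewRating.alternateName` from such markup instead of extracting the verdict from the article text. That is an option suggested by this sketch; the paper does not report doing it.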
Another aspect is the diversity of social media posts across platforms. News articles often mention Facebook, Twitter, and YouTube. They only seldom mention Instagram, Pinterest, and TikTok, and they do not mention Gab at all. The reasons for this need to be explored. It would be interesting to study the propagation of misinformation on platforms such as TikTok and Gab in relation to the news coverage they receive. Such a cross-platform study would be particularly insightful for contemporary topics such as misinformation on COVID-19. It could also be linked to classification models [26,24].

We have also analysed the multi-modality of the data on the social media platforms; the number of social media posts per format is shown in Table 2. We further classified the misinformation into four categories, as discussed in Step 8. More misinformation is shared as text than as video or images. Table 3 therefore presents the textual misinformation in the four categories. Apart from text, misinformation is also shared as images, videos, or combined formats such as image with text.

While applying AMUSED to COVID-19 misinformation, we found that misinformation spreads across multiple platforms but circulates mainly on Facebook, Twitter, and YouTube. Our finding suggests concentrating mitigation efforts on these platforms.

7 Conclusion and Future Work
In this paper, we presented a semi-automatic framework for social media data annotation. The framework can be applied to several domains, such as misinformation, mob lynching, and online abuse. As part of the framework, we also used a Python-based crawler for different social media websites. After data labelling, the labels are cross-checked by a human, which provides a two-step verification of the annotation of social media posts.
We also enrich each social media post by mapping it to its news article, which provides additional information for further analysis.

We implemented the proposed framework to collect misinformation posts related to COVID-19. One limitation of the framework is that we do not currently address multiple (possibly contradicting) labels assigned by different fact-checkers to the same claim. As future work, the framework can be extended to gather annotated data on other topics, such as hate speech and mob lynching, from multiple social media sites. AMUSED will reduce the labour cost and time of the data annotation process. Our framework will also improve data annotation quality, because the data are crawled from news articles written by expert journalists.

References
1. Aggarwal, C.C.: An introduction to social network data analytics. In: Social Network Data Analytics, pp. 1–15. Springer (2011)
2. Ahmed, S., Pasquier, M., Qadah, G.: Key issues in conducting sentiment analysis on Arabic social media text. In: 2013 9th International Conference on Innovations in Information Technology (IIT), pp. 72–77. IEEE (2013)
3. Alam, F., Dalvi, F., Shaar, S., Durrani, N., Mubarak, H., Nikolov, A., Martino, G.D.S., Abdelali, A., Sajjad, H., Darwish, K., et al.: Fighting the COVID-19 infodemic in social media: A holistic perspective and a call to arms. arXiv preprint arXiv:2007.07996 (2020)
4. Aroyo, L., Welty, C.: Truth is a lie: Crowd truth and the seven myths of human annotation. AI Magazine 36(1), 15–24 (2015)
5. Braun, J., Gillespie, T.: Hosting the public discourse, hosting the public: When online news and social media converge. Journalism Practice 5(4), 383–398 (2011)
6. Brennen, J.S., Simon, F., Howard, P.N., Nielsen, R.K.: Types, sources, and claims of COVID-19 misinformation. Reuters Institute 7, 3–1 (2020)
7. Caumont, A.: 12 trends shaping digital news. Pew Research Center 16 (2013)
8. Chapman, W.W., Nadkarni, P.M., Hirschman, L., D'Avolio, L.W., Savova, G.K., Uzuner, O.: Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions (2011)
9. Cherubini, F., Graves, L.: The rise of fact-checking sites in Europe. Reuters Institute for the Study of Journalism, University of Oxford. http://reutersinstitute.politics.ox.ac.uk/our-research/rise-fact-checking-sites-europe (2016)
10. Cook, P., Stevenson, S.: Automatically identifying changes in the semantic orientation of words. In: LREC (2010)
11. Cui, X., Liu, Y.: How does online news curate linked sources? A content analysis of three online news media. Journalism 18(7), 852–870 (2017)
12. Duchenne, O., Laptev, I., Sivic, J., Bach, F., Ponce, J.: Automatic annotation of human actions in video. In: 2009 IEEE 12th International Conference on Computer Vision, pp. 1491–1498. IEEE (2009)
13. Forbush, T.B., Shen, S., South, B.R., DuValla, S.L.: What a catch! Traits that define good annotators. Studies in Health Technology and Informatics 192, 1213–1213 (2013)
14. Fort, K., Adda, G., Cohen, K.B.: Amazon Mechanical Turk: Gold mine or coal mine? Computational Linguistics 37(2), 413–420 (2011)
15. Geiger, R.S., Yu, K., Yang, Y., Dai, M., Qiu, J., Tang, R., Huang, J.: Garbage in, garbage out? Do machine learning application papers in social computing report where human-labeled training data comes from? In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 325–336 (2020)
16. Giglietto, F., Rossi, L., Bennato, D.: The open laboratory: Limits and possibilities of using Facebook, Twitter, and YouTube as a research data source. Journal of Technology in Human Services 30(3-4), 145–159 (2012)
17. Grant-Muller, S.M., Gal-Tzur, A., Minkov, E., Nocera, S., Kuflik, T., Shoor, I.: Enhancing transport data collection through social media sources: methods, challenges and opportunities for textual data. IET Intelligent Transport Systems 9(4), 407–417 (2014)
18. Gupta, A., Lamba, H., Kumaraguru, P., Joshi, A.: Faking Sandy: characterizing and identifying fake images on Twitter during Hurricane Sandy. In: Proceedings of the 22nd International Conference on World Wide Web, pp. 729–736 (2013)
19. Haertel, R.A.: Practical cost-conscious active learning for data annotation in annotator-initiated environments. Brigham Young University-Provo (2013)
20. Poynter Institute: The International Fact-Checking Network (2020), https://www.poynter.org/ifcn/
21. Kohring, M., Matthes, J.: Trust in news media: Development and validation of a multidimensional scale. Communication Research 34(2), 231–252 (2007)
22. Mandl, T., Modha, S., Shahi, G.K., Jaiswal, A.K., Nandini, D., Patel, D., Majumder, P., Schäfer, J.: Overview of the HASOC track at FIRE 2020: Hate speech and offensive content identification in Indo-European languages. In: Mehta, P., Mandl, T., Majumder, P., Mitra, M. (eds.) Working Notes of FIRE 2020. CEUR Workshop Proceedings, vol. 2826, pp. 87–111. CEUR-WS.org (2020)
23. McGahan, C., Katsion, J.: Secondary communication crisis: Social media news information. Liberty University Research Week (2021)
24. Nandini, D., Capecci, E., Koefoed, L., Laña, I., Shahi, G.K., Kasabov, N.: Modelling and analysis of temporal gene expression data using spiking neural networks. In: International Conference on Neural Information Processing, pp. 571–581. Springer (2018)
25. Sabou, M., Bontcheva, K., Derczynski, L., Scharl, A.: Corpus annotation through crowdsourcing: Towards best practice guidelines. In: LREC, pp. 859–866 (2014)
26. Shahi, G.K., Bilbao, I., Capecci, E., Nandini, D., Choukri, M., Kasabov, N.: Analysis, classification and marker discovery of gene expression data with evolving spiking neural networks. In: International Conference on Neural Information Processing, pp. 517–527. Springer (2018)
27. Shahi, G.K., Dirkson, A., Majchrzak, T.A.: An exploratory study of COVID-19 misinformation on Twitter. Online Social Networks and Media, p. 100104 (2021)
28. Shahi, G.K., Nandini, D.: FakeCovid – a multilingual cross-domain fact check news dataset for COVID-19. In: Workshop Proceedings of the 14th International AAAI Conference on Web and Social Media (2020), http://workshop-proceedings.icwsm.org/pdf/2020_14.pdf
29. Shahi, G.K., Nandini, D., Kumari, S.: Inducing schema.org markup from natural language context. Kalpa Publications in Computing 10, 38–42 (2019)
30. Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: A data mining perspective. ACM SIGKDD Explorations Newsletter 19(1), 22–36 (2017)
31. Sorokin, A., Forsyth, D.: Utility data annotation with Amazon Mechanical Turk. In: 2008 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops, pp. 1–8. IEEE (2008)
32. Stieglitz, S., Mirbabaie, M., Ross, B., Neuberger, C.: Social media analytics – challenges in topic discovery, data collection, and data preparation. International Journal of Information Management 39, 156–168 (2018)
33. Talwar, S., Dhir, A., Kaur, P., Zafar, N., Alrasheedy, M.: Why do people share fake news? Associations between the dark side of social media use and fake news sharing behavior. Journal of Retailing and Consumer Services 51, 72–82 (2019)
34. CrowdTangle Team: CrowdTangle. Facebook, Menlo Park, California, United States (2020)
35. The Guardian: The WHO v coronavirus: why it can't handle the pandemic (2020), https://www.theguardian.com/news/2020/apr/10/world-health-organization-who-v-coronavirus-why-it-cant-handle-pandemic
36. Thorson, K., Driscoll, K., Ekdale, B., Edgerly, S., Thompson, L.G., Schrock, A., Swartz, L., Vraga, E.K., Wells, C.: YouTube, Twitter and the Occupy movement: Connecting content and circulation practices. Information, Communication & Society 16(3), 421–451 (2013)
37. Zarocostas, J.: World Report: How to fight an infodemic. The Lancet 395, 676 (2020). https://doi.org/10.1016/S0140-6736(20)30461-X
38. Zhou, X., Zafarani, R.: Fake news: A survey of research, detection methods, and opportunities. CoRR abs/1812.00315 (2018), http://arxiv.org/abs/1812.00315