SlideShare a Scribd company logo
1 of 12
Download to read offline
Application of linguistic cues in the analysis of
language of hate groups
Bartlomiej Balcerzak1
, Wojciech Jaworski1,2
, and Adam Wierzbicki1
1
Polish-Japanese Institute of Information Technology
Koszykowa 86
02-008 Warsaw, Poland
2
Institute of Informatics, University of Warsaw
Banacha 2
02-097 Warsaw, Poland
Abstract. Hate speech and fringe ideologies are social phenomena that
seem to thrive on-line. Various members of the political and religious
fringe are able, via the Internet, to propagate the idea with less effort
compared to more traditional media. In this article we attempt to use
linguistic cues such as parts of speech occurrence in order to distinguish
the language of fringe groups from strictly informative sources. The aim
of this research is to provide a preliminary model for identifying decep-
tive and persuasive materials on-line. Examples of such would include
aggressive marketing and hate speech. For the sake of this paper we aim
to focus on the political aspect. Our research has shown that informa-
tion about sentence length and the occurrence of adjectives and adverbs
can provide information for the identification of differences between the
language of fringe political groups and mainstream news media. The
most efficient method involved the classification of fringe sentences (as
detected by Part of Speech occurrence) within an article, and taking into
account their frequency within the article.
Key words: hate speech, natural language processing, propa-
ganda, machine learning
1 Introduction
In cyberspace millions of people send out and review terabites of data. At the
same time, countless political, religious and ideological agendas can be propa-
gated with nearly no limitations. This in turn leads to an increased risk of the
successful spread of various types of ideologies. Many tools can be introduced,
in order to combat this process and promote a critical approach to informa-
tion available on-line. With the ever growing plethora of websites and forums,
an automated method of recognizing such material seems a viable solution. In
this paper we propose an approach to this problem, which can lead to the de-
velopment of such tools. By applying methods of natural language processing,
we aim to construct a classiļ¬er for identifying the language of political propa-
ganda. For the sake of this research we decided to focus on written text that
constitute the language of entities from the political extremes. Applying meth-
ods that concentrate on the structure of text rather than word count may allow
for a more generalized method of automated text processing. In this paper we
want to use a selection of verbal cues as a method of distinguishing the language
of political propaganda. These characteristics include basic information about
sentence length, and itsā€™ structure, and will be extracted with the use of Part
of Speech taggers already available for the English language. Unlike the bag of
words approach, which focuses on word concurrence within a set of documents,
our approach aims to focus on a deeper level of analysis, one involving cues that
describe the style of the text. By using this information, we plan to construct
robust classiļ¬ers able to identify the language of fringe elements in the political
spectrum.
In order to perform our research we collected two corpora representing the
extremes of the political spectrum. One, are texts gathered from a set websites
connected with American groups promoting radical nationalistic ideologies (i.e.
national socialism, white and black supremacy, racism, anti-immigrant combat-
ants). The other corpus consists of text from websites that promote communism.
We decided to split the fringe material in two corpora, in order to take account
of potential diļ¬€erences between extreme ideologies. This collection will be com-
pared with the language of mainstream political news, both from national and
local news agencies and newspapers.
If this model of linguistic cues that are based on part of speech occurrence
provides viable results with such diļ¬€ering materials it would indicate that it can
also be used in more subtle scenarios. Analysis of the given material has been
conducted both on the article and sentence level. This involved the application of
machine learning algorithms in order to identify sentences and articles belonging
to political propaganda. Our main assumption is that since fringe ideologies are
bent on changing the world in a radical way, materials that endorse such a
sentiment would be mostly emotional and ļ¬lled with statements that evaluate
the current situation as well as the desired ideal world, the end game of fully
implementing the tenets of the ideology. Therefore, when speaking in linguistic
term, texts belonging to a fringe ideology corpus would have higher amounts of
adjectives and adverbs than strictly informative sources. Hence, the following
hypotheses regarding the role each of the selected linguistic attributes have been
proposed for fringe ideology detection:
Hypothesis 1 Text belonging to fringe ideologies contain more adjec-
tives and adverbs than strictly informative texts.
which can also be rephrased as:
Hypothesis 2 Sentences belonging to fringe ideologies contain more
adjectives and adverbs than strictly informative texts.
We also propose an additional hypothesis regarding the average length of a sen-
tence in both fringe and informative sources.
Hypothesis 3 Texts that endorse fringe ideologies contain longer sen-
tences than strictly informative sources.
In our current research we focus both on the sentence and article level. We
aim to verify these hypotheses by using machine learning techniques. Extracted
features will serve as attributes used by the algorithm for training. This means
that standard procedures of machine learning evaluation will be applied.
We also propose a fourth hypothesis:
Hypothesis 4 When predicting whether a text belongs to a fringe ide-
ology or informative sources, information about the frequency of fringe
sentences within it provides the highest measures of performance (clas-
sification accuracy)
In order to test these hypotheses, two main tasks have been conducted. First
involved using machine learning techniques to identify whether the sentences or
articles from our corpus belong to the fringe or informative class. Afterwards the
output from the models applied to this tasks will be used on the set of articles
from our corpora. The second task would be predicting, based on the frequency
of propaganda sentences in an article, as given by the classiļ¬er, whether said
article belongs to the category of fringe or information. After the performance
of algorithms used in each task is evaluated, attribute extraction methods are
used in order to determine which attributes are crucial for determining whether
the source is fringe or information.
2 Related work
Various ļ¬elds of natural language processing can be connected with the problem
of identifying the language of political fringe. One of these ļ¬elds would be text
genre detection, where NLP tools are used in order to identify the genre of a
given text. Such tools involve a bag-of-words approach as well as Part-of-speech
n-grams as used in [1]. Other applications of genre classiļ¬cation include the use
of both linguistic cues and html structure of the web page[2]. In our work, how-
ever we focus not only on detecting text genres, but also on identifying a very
speciļ¬c type of narration, which can be classiļ¬ed as manipulative ore deceptive.
This leads to a second ļ¬eld of study that can be applied to the study of ma-
terials from the political fringe. Analysis of such sources on the Internet had
been present in the ļ¬eld of computer science for some time. The research done
so far can be divided into three main areas of investigation. First of them can
be described as studies into behavioral patterns of propagandists on-line. This
approach focuses mostly on the agent distributing the content, rather than the
content itself, it is very strongly inspired by research into spam detection [3].
Such research was mostly focused on micro-blogging platforms such as Twitter
[?]lumezano) or open source projects such as Wikipedia [4]. What is notable is
the fact that in these papers the main source of traits that could identify pro-
paganda are mostly meta information, referring to frequency of commenting on
a material, and presenting the same content repeatedly. The other main type of
research is the ļ¬eld of deception detection which deals with the problem of distin-
guishing whether the author of a text intended to fool the recipient. Researchers
working in this ļ¬eld wanted to identify linguistic cues related with deceptive
behavior. Most of these cues such as the ones used by [9] in experimental setups
or [6] for fraudulent ļ¬nancial claims emphasized basic shallow characteristics
such as sentence or noun phrase length that showed to be indicative of deceptive
materials. A more sophisticated method was proposed by [8]. In their paper they
introduced a concept of stylometry (analyzing larger syntactical structures) for
deception detection and opinion spam. Their work focuses on the structure of
syntax, the relation between larger elements of the sentence (ie. Noun phrases).
A slightly diļ¬€erent approach was tested by [5] in their essay experiment. With
the aid of LIWC tool[12] they aimed to identify words that appear most often
in deceptive materials. In our work we want to extend and test the ļ¬ndings of
deception detection for the task of identifying textual materials forming the bulk
of fringe ideologies. It is also worth noting that most of the research in this ļ¬eld
was done for the English language. The only instance of deception detection
in another language that we managed to ļ¬nd was study [7] that was done for
Italian.
3 Hypothesis verification
3.1 Dataset
In order to test our hypothesis we gathered a corpus consisting of both radical
ideological material and balanced informative text. All of the corpora represent
modern American English. We prepared a collection of texts taken from three
distinguished sources:
ā€“ Nazi corpus: This contains articles from websites belonging to groups in
the US that promote national socialism and ideas of racial and ethnic superi-
ority. When collecting the websites, we used the list of hate groups provided
by the Southern Poverty Law Center[18]. Noticeable sources include Amer-
ican National Socialist Party, National Socialist movement, Aryan Nations
etc. In all, 100 web pages were extracted. An example of a text from this
corpus is shown below:
To a National Socialist, things like PRIDE, HONOR, LOYALTY, COURAGE,
DISCIPLINE, and MORALITY - actually MEAN something. Like our fore-
fathers, we too are willing to SACRIFICE to build a better world for our
children whom we love deeply, and like them as well - we are willing to
DO ANYTHING NECESSARY to ACHIEVE THAT GOAL. We are your
brothers and your sisters, we are your fathers and your mothers, your friends
and your co-workers - WE ARE WHITE AMERICA, just like YOU! Your
enemies in control attempt with their constant ā€anti-naziā€ propaganda - to
persuade you that ā€theyā€ are your ā€real friendsā€ - and that WE, your own
kin, are your lifelong enemies. Yet, ask yourself THIS - ā€WHOā€ have you
to ā€THANKā€ for ALL the PROBLEMS FACING YOU? The creatures who
HAVE been in total CONTROL - or - those of us resisting the evil? The
TRUTH is right there in front of you - DONā€™T be afraid to understand it
and to ACT upon it! You hold the FUTURE - BRIGHT or DARK - in
YOUR hands, and TIME IS RUNNING OUT!
ā€“ Communist Corpus: This corpus contains pages from websites of Ameri-
can groups and parties that describe themselves as communist. These include
among others: Progresive Labour Party and The Communist party of the
United States of America. As with the Nazi corpus, 100 web pages were ex-
tracted. Example is presented below:
Todayā€™s action, organized by Good Jobs Nation, comes a year after it filed
a complaint with the Department of Labor that accused food franchises at
federal buildings of violating minimum-wage and overtime laws. They want
Obama to sign an executive order requiring federal agencies to contract only
with companies that engage in collective bargaining. The union leaders pulling
the strings behind Good Jobs Nation are the same people who got us into this
mess in the first place. Most contract jobs used to be full-time union jobs,
and the unions did nothing to stop the bosses from eliminating them. Now
the unions are trying to rebuild their ranks among low-wage workers who
replaced their former members. We need to abolish wage slavery with com-
munist revolution. And the struggle between reform and revolution must be
waged within struggles like this one.
ā€“ News Corpus: This corpus will serve as a reference point. It contains polit-
ical news and opinion sections from various American news sources (CNN,
FOX news, CNBC, and local media outlets), in order to construct a balanced
corpus for analysis also 100 web pages are extracted for the task of training
a machine learning algorithm. A typical article from this set looks like one
shown below:
President Barack Obamaā€™s policy toward Syria ā€“ three years of red lines and
calls for regime change ā€“ culminated Monday in a barrage of air strikes on
terror targets there, marking a turning point for the conflict and thrusting
the President further into it.The U.S. said Saudi Arabia, the United Arab
Emirates, Qatar, Bahrain and Jordan had joined in the attack on ISIS tar-
gets near Raqqa in Syria. The U.S. also launched air strikes against another
terrorist organization, the Khorasan Group.
3.2 Dataset preprocessing
After the working corpus has been prepared we divided each sentence in it into
tokens and conducted Part-of-Speech tagging (with the use of NLTK default
POS tagger). Sentence and word length have been calculated as well, hence
creating a database containing the following attributes of all sentences:
ā€“ Sentence Class: a variable determining, whether the sentence belongs to
the informative or fringe corpus
ā€“ Traits describing textual quantity: number of tokens in a sentence and
average number of characters in words constituting a sentence.
ā€“ Frequency of adjectives and adverbs
We also included occurrence of other parts of speech in order to test if they
have any impact on distinguishing fringe language from informative sources.
After the database was prepared, machine learning algorithms have been
implemented. In order to evaluate the performance of said algorithms in both of
the tasks, the following measures will be used:
ā€“ Accuracy: the percentage of all true positives and true negatives produced
by the machine learning algorithm.
ā€“ AUC (Area under ROC Curve): the measure of True Positive to False Posi-
tive ratio for various thresholds provided by the machine learning algorithm.
3.3 Machine learning: sentence prediction
The task given to the algorithms was to identify whether a sentence or article
belongs to the fringe or news corpus. 5-fold cross validation procedure has been
implemented in order to validate the algorithmsā€™ performance.
3.4 Article prediction
In this section, we analyze the performance of the article prediction based on
the frequency of Parts of Speech. Since our corpora are balanced, the baseline
measuring performance is 50%, both for accuracy and AUC. Three algorithms
turned put to be the most eļ¬€ective, these being Naive Bayes, K-NN (K=7, cosine
distance) and Neural Net.
Table 1. AUC and Accuracy of used algorithms
Algorithms applied Acc. Nazi AUC Nazi Acc. Comm AUC Comm
Naive Bayes 71% 83% 75% 85%
K-NN 60% 66% 59% 61%
Neural Net 70% 75% 77% 87%
As it is shown in table 1, the best scores were produced by the Naive Bayes
and Neural Net Algorithms. The accuracy and AUC values for these were within
the range of 70 to 80 per cent, which indicates a moderately strong performance.
It also can pointed out that higher scores were achieved for the communist
corpus. This may be in part due to the fact, that communist articles came
from mainly two large sources. Therefore their style maybe more cohesive. The
national-socialist corpus on the other hand was more diverse, allowing for a more
broader spectrum of styles. Still, both types of fringe ideologies achieved similar
results.
Applied methods of attribute extraction shown that the most important
attributes in this task were the frequency of adjectives, adverbs, and proper
nouns. As shown in table 2 containing the standardized diļ¬€erences between the
mean values of these attributes, the frequency of adjectives was higher in both
fringe corpora in relation to the informative sources. In case of adverbs the eļ¬€ect
was visible only for Nazi materials. What is also interesting, articles belonging
to fringe ideologies contained less proper nouns than informative sources. This
maybe because ideological materials tend to be more vague, and connected with
more general processes and entities. However such hypothesis needs to be veriļ¬ed
separately.
Table 2. Standardized differences between mean values for adjective, adverb and
proper noun frequency
Nazi/News Communist/News
Adjectives% 0,24 0,39
Adverbs% 0,2 -0,04
Proper Nouns% -0,38 -0,33
3.5 Sentence prediction
Table 3. AUC of used algorithms
Acc. nazi AUC nazi Acc. communist AUC communist
Naive Bayes 61% 65% 61% 65%
k-NN 62% 69% 64% 64%
Neural Net 65% 72% 63% 69%
As shown in 3 the results tend to be between the threshold of 60 and 70%. No
larger diļ¬€erences have been observed when analyzing the Nazi and communist
corpora. It is also worth noting that classiļ¬cation conducted on the sentence
level provided weaker scores than the one done for the articles. This may be
due to the fact that sentences are shorter, therefor provide less data that the
machine learning algorithms can use.
Similarly to the article prediction task, this is also complemented by attribute
extraction scores from chi-square method. According to chi-squared method the
most important attributes include: sentence length, adjectives, adverbs, and
proper nouns frequency. The relative importance of the attributes used is shown
in table 4, containing the standardized diļ¬€erences between the their mean val-
ues. Only the attributes with the highest diļ¬€erences are presented in this table.
Positive values indicate that the mean value of a given attribute was higher
for the fringe corpus. Negatives indicate that the value was higher for the news
corpus.
Table 4. Standardized differences between mean values for adjective, adverb, noun
and proper noun frequency
Nazi/News Communist/News
Sentence length 0,12 - 0,14
Adjectives% 0,3 0,3
Adverbs% 0,2 0,04
Nouns% -0,18 0,36
Proper Nouns% -0,32 -0,4
The table show a pattern similar to that observed when classifying articles.
Fringe ideology sentences, both communist and Nazi, contained a higher amount
of adjectives and adverbs. They also had less proper nouns than informative
sources. What sets the sentence classiļ¬cation aside from the article one is the
relatively high diļ¬€erence in the frequency of nouns between communist material
and informative sources, as well as the fact that both corpora varied in regards
to average sentence length. The Nazi one tend to contain longer sentences, while
communist collection has shorter ones. In both cases the diļ¬€erences are not as
pronounced as in other attributes.
3.6 Sentence based article prediction
For this task, we decided to use the labels applied to sentences in the previ-
ous subsection in order to predict whether the entire articles belong to fringe
or informative class. Frequency of fringe sentences, as provided by the Neural
Net classiļ¬er has been calculated for each article from the corpora used in our
research. Afterwards, we calculated the optimal threshold for classiļ¬cation. For
Nazi materials the threshold value was 57% of fringe sentences per article, and
for communists it was 63%. The performance scores for these corpora are shown
in table 5.
Table 5. Performance
Accuracy AUC
Nazi 81% 80%
Communist 83% 81%
When compared with the scores of machine learning from the method used
in this subsection provides a higher accuracy (80% and 83% compared to 70%
and 77% respectively), with a relatively similar level of AUC. What is also worth
noting, the performance of this method is high even though the scores for sen-
tence prediction rarely exceeded the 70% threshold for either accuracy and AUC.
This observation indicates that even tough prediction on a sentence level can be
faulty, but, when aggregated, they provide a strong signal that allows for a more
successful identiļ¬cation of fringe sources. What is more this method proves to be
slightly more eļ¬ƒcient than using Part-of-speech information on the article level.
Further work on the possibilities and limits of this approach will be pursued.
4 Conclusions and observations
Our research has led as to the following conclusions and observations:
ā€“ Article classiļ¬cation based on Part-of-Speech tagging has provided robust
scores, indicating that the chosen attributes can be used for identifying fringe
ideological sources. With a baseline of 50% for both accuracy and AUC, the
machine learning algorithms achieved scores exceeding 70% for accuracy, and
80% for AUC. Neural Net and K Nearest Neighbors algorithms produced the
best performance.
ā€“ The same task conducted on the level of sentences provided weaker scores,
mostly within the 60% - 70% range both for accuracy and AUC. However,
when the information about the frequency of fringe sentences was applied
to articles, the performance was increased to 80% for accuracy and AUC.
Consequently these score provide a more robust classiļ¬cation than the one
based on Part-of-Speech frequency on the article level. This may indicate
that sentence level classiļ¬cation is burdened with high noise which may be
countered by taking account of fringe sentences frequency in the article.
ā€“ In view of collected data hypothesis 3 cannot be considered as true. The
sentence length proved not to be an important attribute for determining
whether the sentence belongs to a fringe or informative source. The texts
from the Nazi corpus, on average, contained longer sentences. Conversely,
communist sources where constructed of shorter ones. However neither pro-
duced diļ¬€erences large enough to aļ¬€ect the machine learning algorithms
performance.
ā€“ Analysis on both the article and sentence level shows that adjectives and ad-
verbs frequency plays an important role in identifying fringe sources. More-
over in collected data allows us to consider hypotheses 1 and 2 as veriļ¬ed.
ā€“ Analysis has also shown that fringe materials contain less named entities
than informative sources. This may suggest that fringe text tend to be more
vague than those that are solely informative.
In summary, our research lends credence to the notion that such basic linguistic
cues as occurrence of Parts-of-Speech can be used for determining whether a
text belongs to the language of information or fringe ideology.
5 Future work
The results we came up with show, that there is room for further research. Using
basic stylistic cues for identifying speciļ¬c narrations may be extended to include
language of political, religious or scientiļ¬c discourse. In our future work we plan
to gather text corpora related to such types of language (marketing, religion,
science etc.) and test them with the use of shallow linguistic characteristics. We
also plan to include in our model such elements of propaganda as repetition,
and vagueness [19][15]. We plan to conduct more detailed analysis of the two
observations we made when researching the role of linguistic features: the dif-
ferences in named entities frequency in fringe and informative sources, and the
frequency of positive class sentences as a method of article prediction. Devising
a computational model may lead to the development of tools dedicated to au-
tomatic detection of speciļ¬c forms of languages. So far we worked with modern
documents written in English which are focused mostly on one ideology. This is
why in our future work we aim to extend our focus the historical texts written
in diļ¬€erent languages (Polish, German etc.). Therefore we will be able to show
whether or not the language of propaganda follows a rigid unchanging pattern,
or is it strongly culture-based. It is also important in order to identify linguistic
cues that can be applied to fringe ideologies only. To this end we plan to apply
the developed model for other fringe ideologies and pseudoscience.
Results we obtained with shallow methods of POS tagging show great promise.
The next natural step would be to use deeper NLP methods, such as syntax pars-
ing, or transcribing texts into logical formulas. In summary, our current work
serves as preliminary research into a ļ¬eld of computational analysis of discourse.
Therefore we can obtain an eļ¬€ective model, which can be extrapolated to var-
ious topical domains. This will prove useful as a way of constructing a more
general theory of computational models of recognizing language of fringe ide-
ologies (as well as other forms of written language such as language of religion,
science, marketing or various social classes). This will be especially important
for classiļ¬cation method based on the fringe sentence frequency in the article.
Providing a more general method of text corpora analysis, therefore allowing for
the design of more intricate automated system that could detect propaganda or
other forms of manipulative content. Focusing on structural aspects would make
it also more resistant to deceptive strategies on the part of content producers.
6 Acknowlegments
This work was ļ¬nancially supported by the European Community from the
European Social Fund within the INTERKADRA project.
It was also supported by the grant Reconcile: Robust Online Credibility Eval-
uation of Web Content from Switzerland through the Swiss Contribution to the
enlarged European Union.
References
1. Sharoff, Serge. ā€Classifying Web corpora into domain and genre using automatic
feature identification.ā€ Proceedings of the 3rd Web as Corpus Workshop. 2007.
2. Santini, Marina, Richard Power, and Roger Evans. ā€Implementing a characteriza-
tion of genre for automatic genre identification of web pages.ā€ Proceedings of the
COLING/ACL on Main conference poster sessions. Association for Computational
Linguistics, 2006.
3. Metaxas, Panagiotis Takis. ā€Web spam, social propaganda and the evolution of
search engine rankings.ā€ Web Information Systems and Technologies. Springer
Berlin Heidelberg, 2010. 170-182.
4. Chandy, Rishi. ā€Wikiganda: Identifying propaganda through text analysis.ā€ Cal-
tech Undergraduate Research Journal. Winter 2009 (2008).
5. Mihalcea, Rada, and Carlo Strapparava. ā€The lie detector: Explorations in the
automatic recognition of deceptive language.ā€ Proceedings of the ACL-IJCNLP
2009 Conference Short Papers. Association for Computational Linguistics, 2009.
6. Humpherys, Sean L., et al. ā€Identification of fraudulent financial statements using
linguistic credibility analysis.ā€ Decision Support Systems 50.3 (2011): 585-594.
7. Fornaciari, Tommaso, and Massimo Poesio. ā€Lexical vs. surface features in de-
ceptive language analysis.ā€ Proceedings of the ICAIL 2011 Workshop Applying
Human Language Technology to the Law, AHLTL. 2011.
8. Feng, Song, Ritwik Banerjee, and Yejin Choi. ā€Syntactic stylometry for deception
detection.ā€ Proceedings of the 50th Annual Meeting of the Association for Compu-
tational Linguistics: Short Papers-Volume 2. Association for Computational Lin-
guistics, 2012.
9. Ott, Myle, et al. ā€Finding deceptive opinion spam by any stretch of the imag-
ination.ā€ Proceedings of the 49th Annual Meeting of the Association for Com-
putational Linguistics: Human Language Technologies-Volume 1. Association for
Computational Linguistics, 2011.
10. Lee, A. M. and Lee(eds.), E. B. (1939). The Fine Art of Propaganda. The Institute
for Propaganda Analysis. Harcourt, Brace and Co.
11. Bird, Steven, Ewan Klein, and Edward Loper. Natural language processing with
Python. Oā€™Reilly Media, Inc., 2009.
12. Pennebaker, James W., Martha E. Francis, and Roger J. Booth. ā€Linguistic inquiry
and word count: LIWC 2001.ā€ Mahway: Lawrence Erlbaum Associates 71 (2001):
2001.
13. Progressive Labour Party website. http://www.plp.org/
14. Communist Party of the United States of America website www.cpusa.org
15. G?fu, Daniela, and Dan Cristea. ā€Towards an Automated Semiotic Analysis of the
Romanian Political Discourse.ā€ Computer Science 21.1 (2013): 61.
16. Paik, Woojin, et al. ā€Applying natural language processing (nlp) based metadata
extraction to automatically acquire user preferences.ā€ Proceedings of the 1st inter-
national conference on Knowledge capture. ACM, 2001.
17. GıĢ‚fu, Daniela, and Ioan Constantin Dima. ā€An operational approach of communi-
cational propaganda.ā€ International Letters of Social and Humanistic Sciences 23
(2014): 29-38.
18. http://www.splcenter.org/
19. Lacoue-Labarthe, Philippe, and Jean-Luc Nancy. Le mythe nazi. Editions de
lā€™Aube, 1998.

More Related Content

Similar to Application Of Linguistic Cues In The Analysis Of Language Of Hate Groups

AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYijnlc
Ā 
8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...
8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...
8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...Faiz Ullah
Ā 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsLisa Graves
Ā 
The Relevance of content analysis to the media
The Relevance of content analysis to the mediaThe Relevance of content analysis to the media
The Relevance of content analysis to the mediaAlex Taremwa
Ā 
TOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORATOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORAcsandit
Ā 
Your Annotated Bibliography must haveĀ 8 sources. Please go back to t.docx
Your Annotated Bibliography must haveĀ 8 sources. Please go back to t.docxYour Annotated Bibliography must haveĀ 8 sources. Please go back to t.docx
Your Annotated Bibliography must haveĀ 8 sources. Please go back to t.docxbudbarber38650
Ā 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
Ā 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsIJCERT JOURNAL
Ā 
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet IJECEIAES
Ā 
Ny3424442448
Ny3424442448Ny3424442448
Ny3424442448IJERA Editor
Ā 
234640669.pdf
234640669.pdf234640669.pdf
234640669.pdfThanoonQasem
Ā 
Research methodolgy and legal writing: Content Analysis
Research methodolgy and legal writing: Content AnalysisResearch methodolgy and legal writing: Content Analysis
Research methodolgy and legal writing: Content AnalysisNikhil kumar Tyagi
Ā 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language ProcessingMariana Soffer
Ā 
A Simple Approach To Classify Fictional And Non-Fictional Genres
A Simple Approach To Classify Fictional And Non-Fictional GenresA Simple Approach To Classify Fictional And Non-Fictional Genres
A Simple Approach To Classify Fictional And Non-Fictional GenresAndrea Porter
Ā 
Qualitative Methods in International Relations - Chapters 5, 8, 10
Qualitative Methods in International Relations - Chapters 5, 8, 10Qualitative Methods in International Relations - Chapters 5, 8, 10
Qualitative Methods in International Relations - Chapters 5, 8, 10Bahria University, Islamabad
Ā 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourseijitcs
Ā 
Automatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A CorpusAutomatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A CorpusRichard Hogue
Ā 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewINFOGAIN PUBLICATION
Ā 
The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide...
 The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide... The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide...
The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide...English Literature and Language Review ELLR
Ā 

Similar to Application Of Linguistic Cues In The Analysis Of Language Of Hate Groups (20)

AMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITYAMBIGUITY-AWARE DOCUMENT SIMILARITY
AMBIGUITY-AWARE DOCUMENT SIMILARITY
Ā 
8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...
8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...
8.+A+Multidimensional+Comparative+Analysis+of+Promotional+and+Informational+F...
Ā 
A Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And ApplicationsA Review Of Text Mining Techniques And Applications
A Review Of Text Mining Techniques And Applications
Ā 
Introduction to Nvivo
Introduction to NvivoIntroduction to Nvivo
Introduction to Nvivo
Ā 
The Relevance of content analysis to the media
The Relevance of content analysis to the mediaThe Relevance of content analysis to the media
The Relevance of content analysis to the media
Ā 
TOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORATOPIC BASED ANALYSIS OF TEXT CORPORA
TOPIC BASED ANALYSIS OF TEXT CORPORA
Ā 
Your Annotated Bibliography must haveĀ 8 sources. Please go back to t.docx
Your Annotated Bibliography must haveĀ 8 sources. Please go back to t.docxYour Annotated Bibliography must haveĀ 8 sources. Please go back to t.docx
Your Annotated Bibliography must haveĀ 8 sources. Please go back to t.docx
Ā 
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGESOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGE
Ā 
Mining Opinion Features in Customer Reviews
Mining Opinion Features in Customer ReviewsMining Opinion Features in Customer Reviews
Mining Opinion Features in Customer Reviews
Ā 
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
A Framework for Arabic Concept-Level Sentiment Analysis using SenticNet
Ā 
Ny3424442448
Ny3424442448Ny3424442448
Ny3424442448
Ā 
234640669.pdf
234640669.pdf234640669.pdf
234640669.pdf
Ā 
Research methodolgy and legal writing: Content Analysis
Research methodolgy and legal writing: Content AnalysisResearch methodolgy and legal writing: Content Analysis
Research methodolgy and legal writing: Content Analysis
Ā 
Natural Language Processing
Natural Language ProcessingNatural Language Processing
Natural Language Processing
Ā 
A Simple Approach To Classify Fictional And Non-Fictional Genres
A Simple Approach To Classify Fictional And Non-Fictional GenresA Simple Approach To Classify Fictional And Non-Fictional Genres
A Simple Approach To Classify Fictional And Non-Fictional Genres
Ā 
Qualitative Methods in International Relations - Chapters 5, 8, 10
Qualitative Methods in International Relations - Chapters 5, 8, 10Qualitative Methods in International Relations - Chapters 5, 8, 10
Qualitative Methods in International Relations - Chapters 5, 8, 10
Ā 
Information extraction using discourse
Information extraction using discourseInformation extraction using discourse
Information extraction using discourse
Ā 
Automatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A CorpusAutomatize Document Topic And Subtopic Detection With Support Of A Corpus
Automatize Document Topic And Subtopic Detection With Support Of A Corpus
Ā 
Text Mining at Feature Level: A Review
Text Mining at Feature Level: A ReviewText Mining at Feature Level: A Review
Text Mining at Feature Level: A Review
Ā 
The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide...
 The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide... The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide...
The Influence of Content Schema on L2 Learnersā€™ Reading Comprehension: Evide...
Ā 

More from Leonard Goudy

Full Page Printable Lined Paper - Printable World Ho
Full Page Printable Lined Paper - Printable World HoFull Page Printable Lined Paper - Printable World Ho
Full Page Printable Lined Paper - Printable World HoLeonard Goudy
Ā 
Concept Paper Examples Philippines Educational S
Concept Paper Examples Philippines Educational SConcept Paper Examples Philippines Educational S
Concept Paper Examples Philippines Educational SLeonard Goudy
Ā 
How To Improve An Essay In 7 Steps Smartessayrewrit
How To Improve An Essay In 7 Steps SmartessayrewritHow To Improve An Essay In 7 Steps Smartessayrewrit
How To Improve An Essay In 7 Steps SmartessayrewritLeonard Goudy
Ā 
INTERESTING THESIS TOPICS FOR HIGH SCHO
INTERESTING THESIS TOPICS FOR HIGH SCHOINTERESTING THESIS TOPICS FOR HIGH SCHO
INTERESTING THESIS TOPICS FOR HIGH SCHOLeonard Goudy
Ā 
Persuasive Essay Site That Writes Essays For You Free
Persuasive Essay Site That Writes Essays For You FreePersuasive Essay Site That Writes Essays For You Free
Persuasive Essay Site That Writes Essays For You FreeLeonard Goudy
Ā 
Money Cant Buy Happiness But Happi
Money Cant Buy Happiness But HappiMoney Cant Buy Happiness But Happi
Money Cant Buy Happiness But HappiLeonard Goudy
Ā 
Example Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssExample Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssLeonard Goudy
Ā 
Persuasive Essays Examples And Samples Es
Persuasive Essays Examples And Samples EsPersuasive Essays Examples And Samples Es
Persuasive Essays Examples And Samples EsLeonard Goudy
Ā 
Thesis Statement Thesis Essay Sa
Thesis Statement Thesis Essay SaThesis Statement Thesis Essay Sa
Thesis Statement Thesis Essay SaLeonard Goudy
Ā 
A Multimedia Visualization Tool For Solving Mechanics Dynamics Problem
A Multimedia Visualization Tool For Solving Mechanics Dynamics ProblemA Multimedia Visualization Tool For Solving Mechanics Dynamics Problem
A Multimedia Visualization Tool For Solving Mechanics Dynamics ProblemLeonard Goudy
Ā 
A3 Methodology Going Beyond Process Improvement
A3 Methodology  Going Beyond Process ImprovementA3 Methodology  Going Beyond Process Improvement
A3 Methodology Going Beyond Process ImprovementLeonard Goudy
Ā 
Asexuality In Disability Narratives
Asexuality In Disability NarrativesAsexuality In Disability Narratives
Asexuality In Disability NarrativesLeonard Goudy
Ā 
A Short Essay Of Three Research Methods In Qualitative
A Short Essay Of Three Research Methods In QualitativeA Short Essay Of Three Research Methods In Qualitative
A Short Essay Of Three Research Methods In QualitativeLeonard Goudy
Ā 
An Interactive Educational Environment For Preschool Children
An Interactive Educational Environment For Preschool ChildrenAn Interactive Educational Environment For Preschool Children
An Interactive Educational Environment For Preschool ChildrenLeonard Goudy
Ā 
An Apology For Hermann Hesse S Siddhartha
An Apology For Hermann Hesse S SiddharthaAn Apology For Hermann Hesse S Siddhartha
An Apology For Hermann Hesse S SiddharthaLeonard Goudy
Ā 
A Survey Of Unstructured Outdoor Play Habits Among Irish Children A Parents ...
A Survey Of Unstructured Outdoor Play Habits Among Irish Children  A Parents ...A Survey Of Unstructured Outdoor Play Habits Among Irish Children  A Parents ...
A Survey Of Unstructured Outdoor Play Habits Among Irish Children A Parents ...Leonard Goudy
Ā 
An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...
An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...
An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...Leonard Goudy
Ā 
A Major Project Report On Quot VEHICLE TRACKING SYSTEM USING GPS AND GSM Q...
A Major Project Report On  Quot  VEHICLE TRACKING SYSTEM USING GPS AND GSM  Q...A Major Project Report On  Quot  VEHICLE TRACKING SYSTEM USING GPS AND GSM  Q...
A Major Project Report On Quot VEHICLE TRACKING SYSTEM USING GPS AND GSM Q...Leonard Goudy
Ā 

More from Leonard Goudy (20)

Full Page Printable Lined Paper - Printable World Ho
Full Page Printable Lined Paper - Printable World HoFull Page Printable Lined Paper - Printable World Ho
Full Page Printable Lined Paper - Printable World Ho
Ā 
Concept Paper Examples Philippines Educational S
Concept Paper Examples Philippines Educational SConcept Paper Examples Philippines Educational S
Concept Paper Examples Philippines Educational S
Ā 
How To Improve An Essay In 7 Steps Smartessayrewrit
How To Improve An Essay In 7 Steps SmartessayrewritHow To Improve An Essay In 7 Steps Smartessayrewrit
How To Improve An Essay In 7 Steps Smartessayrewrit
Ā 
INTERESTING THESIS TOPICS FOR HIGH SCHO
INTERESTING THESIS TOPICS FOR HIGH SCHOINTERESTING THESIS TOPICS FOR HIGH SCHO
INTERESTING THESIS TOPICS FOR HIGH SCHO
Ā 
German Essays
German EssaysGerman Essays
German Essays
Ā 
Persuasive Essay Site That Writes Essays For You Free
Persuasive Essay Site That Writes Essays For You FreePersuasive Essay Site That Writes Essays For You Free
Persuasive Essay Site That Writes Essays For You Free
Ā 
Money Cant Buy Happiness But Happi
Money Cant Buy Happiness But HappiMoney Cant Buy Happiness But Happi
Money Cant Buy Happiness But Happi
Ā 
Example Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free EssExample Of Methodology In Research Paper - Free Ess
Example Of Methodology In Research Paper - Free Ess
Ā 
Persuasive Essays Examples And Samples Es
Persuasive Essays Examples And Samples EsPersuasive Essays Examples And Samples Es
Persuasive Essays Examples And Samples Es
Ā 
Thesis Statement Thesis Essay Sa
Thesis Statement Thesis Essay SaThesis Statement Thesis Essay Sa
Thesis Statement Thesis Essay Sa
Ā 
A Multimedia Visualization Tool For Solving Mechanics Dynamics Problem
A Multimedia Visualization Tool For Solving Mechanics Dynamics ProblemA Multimedia Visualization Tool For Solving Mechanics Dynamics Problem
A Multimedia Visualization Tool For Solving Mechanics Dynamics Problem
Ā 
A3 Methodology Going Beyond Process Improvement
A3 Methodology  Going Beyond Process ImprovementA3 Methodology  Going Beyond Process Improvement
A3 Methodology Going Beyond Process Improvement
Ā 
Asexuality In Disability Narratives
Asexuality In Disability NarrativesAsexuality In Disability Narratives
Asexuality In Disability Narratives
Ā 
A Short Essay Of Three Research Methods In Qualitative
A Short Essay Of Three Research Methods In QualitativeA Short Essay Of Three Research Methods In Qualitative
A Short Essay Of Three Research Methods In Qualitative
Ā 
An Interactive Educational Environment For Preschool Children
An Interactive Educational Environment For Preschool ChildrenAn Interactive Educational Environment For Preschool Children
An Interactive Educational Environment For Preschool Children
Ā 
An Apology For Hermann Hesse S Siddhartha
An Apology For Hermann Hesse S SiddharthaAn Apology For Hermann Hesse S Siddhartha
An Apology For Hermann Hesse S Siddhartha
Ā 
Applied Math
Applied MathApplied Math
Applied Math
Ā 
A Survey Of Unstructured Outdoor Play Habits Among Irish Children A Parents ...
A Survey Of Unstructured Outdoor Play Habits Among Irish Children  A Parents ...A Survey Of Unstructured Outdoor Play Habits Among Irish Children  A Parents ...
A Survey Of Unstructured Outdoor Play Habits Among Irish Children A Parents ...
Ā 
An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...
An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...
An Exploration Of Corporate Social Responsibility (CSR) As A Lever For Employ...
Ā 
A Major Project Report On Quot VEHICLE TRACKING SYSTEM USING GPS AND GSM Q...
A Major Project Report On  Quot  VEHICLE TRACKING SYSTEM USING GPS AND GSM  Q...A Major Project Report On  Quot  VEHICLE TRACKING SYSTEM USING GPS AND GSM  Q...
A Major Project Report On Quot VEHICLE TRACKING SYSTEM USING GPS AND GSM Q...
Ā 

Recently uploaded

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxpboyjonauth
Ā 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementmkooblal
Ā 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersSabitha Banu
Ā 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for BeginnersSabitha Banu
Ā 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon AUnboundStockton
Ā 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceSamikshaHamane
Ā 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxAnupkumar Sharma
Ā 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxRaymartEstabillo3
Ā 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPCeline George
Ā 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxChelloAnnAsuncion2
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentInMediaRes1
Ā 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Educationpboyjonauth
Ā 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxiammrhaywood
Ā 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptxSherlyMaeNeri
Ā 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatYousafMalik24
Ā 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPCeline George
Ā 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxLigayaBacuel1
Ā 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfphamnguyenenglishnb
Ā 

Recently uploaded (20)

Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
Ā 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
Ā 
Hierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of managementHierarchy of management that covers different levels of management
Hierarchy of management that covers different levels of management
Ā 
DATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginnersDATA STRUCTURE AND ALGORITHM for beginners
DATA STRUCTURE AND ALGORITHM for beginners
Ā 
Full Stack Web Development Course for Beginners
Full Stack Web Development Course  for BeginnersFull Stack Web Development Course  for Beginners
Full Stack Web Development Course for Beginners
Ā 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
Ā 
Roles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in PharmacovigilanceRoles & Responsibilities in Pharmacovigilance
Roles & Responsibilities in Pharmacovigilance
Ā 
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptxMULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
MULTIDISCIPLINRY NATURE OF THE ENVIRONMENTAL STUDIES.pptx
Ā 
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptxEPANDING THE CONTENT OF AN OUTLINE using notes.pptx
EPANDING THE CONTENT OF AN OUTLINE using notes.pptx
Ā 
How to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERPHow to do quick user assign in kanban in Odoo 17 ERP
How to do quick user assign in kanban in Odoo 17 ERP
Ā 
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptxGrade 9 Q4-MELC1-Active and Passive Voice.pptx
Grade 9 Q4-MELC1-Active and Passive Voice.pptx
Ā 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
Ā 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
Ā 
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptxECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
ECONOMIC CONTEXT - PAPER 1 Q3: NEWSPAPERS.pptx
Ā 
Judging the Relevance and worth of ideas part 2.pptx
Judging the Relevance  and worth of ideas part 2.pptxJudging the Relevance  and worth of ideas part 2.pptx
Judging the Relevance and worth of ideas part 2.pptx
Ā 
Earth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice greatEarth Day Presentation wow hello nice great
Earth Day Presentation wow hello nice great
Ā 
What is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERPWhat is Model Inheritance in Odoo 17 ERP
What is Model Inheritance in Odoo 17 ERP
Ā 
OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...OS-operating systems- ch04 (Threads) ...
OS-operating systems- ch04 (Threads) ...
Ā 
Planning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptxPlanning a health career 4th Quarter.pptx
Planning a health career 4th Quarter.pptx
Ā 
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdfAMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
AMERICAN LANGUAGE HUB_Level2_Student'sBook_Answerkey.pdf
Ā 

Application Of Linguistic Cues In The Analysis Of Language Of Hate Groups

  • 1. Application of linguistic cues in the analysis of language of hate groups Bartlomiej Balcerzak1 , Wojciech Jaworski1,2 , and Adam Wierzbicki1 1 Polish-Japanese Institute of Information Technology Koszykowa 86 02-008 Warsaw, Poland 2 Institute of Informatics, University of Warsaw Banacha 2 02-097 Warsaw, Poland Abstract. Hate speech and fringe ideologies are social phenomena that seem to thrive on-line. Various members of the political and religious fringe are able, via the Internet, to propagate the idea with less effort compared to more traditional media. In this article we attempt to use linguistic cues such as parts of speech occurrence in order to distinguish the language of fringe groups from strictly informative sources. The aim of this research is to provide a preliminary model for identifying decep- tive and persuasive materials on-line. Examples of such would include aggressive marketing and hate speech. For the sake of this paper we aim to focus on the political aspect. Our research has shown that informa- tion about sentence length and the occurrence of adjectives and adverbs can provide information for the identification of differences between the language of fringe political groups and mainstream news media. The most efficient method involved the classification of fringe sentences (as detected by Part of Speech occurrence) within an article, and taking into account their frequency within the article. Key words: hate speech, natural language processing, propa- ganda, machine learning 1 Introduction In cyberspace millions of people send out and review terabites of data. At the same time, countless political, religious and ideological agendas can be propa- gated with nearly no limitations. This in turn leads to an increased risk of the successful spread of various types of ideologies. Many tools can be introduced, in order to combat this process and promote a critical approach to informa- tion available on-line. With the ever growing plethora of websites and forums, an automated method of recognizing such material seems a viable solution. In this paper we propose an approach to this problem, which can lead to the de- velopment of such tools. By applying methods of natural language processing, we aim to construct a classiļ¬er for identifying the language of political propa- ganda. For the sake of this research we decided to focus on written text that
  • 2. constitute the language of entities from the political extremes. Applying meth- ods that concentrate on the structure of text rather than word count may allow for a more generalized method of automated text processing. In this paper we want to use a selection of verbal cues as a method of distinguishing the language of political propaganda. These characteristics include basic information about sentence length, and itsā€™ structure, and will be extracted with the use of Part of Speech taggers already available for the English language. Unlike the bag of words approach, which focuses on word concurrence within a set of documents, our approach aims to focus on a deeper level of analysis, one involving cues that describe the style of the text. By using this information, we plan to construct robust classiļ¬ers able to identify the language of fringe elements in the political spectrum. In order to perform our research we collected two corpora representing the extremes of the political spectrum. One, are texts gathered from a set websites connected with American groups promoting radical nationalistic ideologies (i.e. national socialism, white and black supremacy, racism, anti-immigrant combat- ants). The other corpus consists of text from websites that promote communism. We decided to split the fringe material in two corpora, in order to take account of potential diļ¬€erences between extreme ideologies. This collection will be com- pared with the language of mainstream political news, both from national and local news agencies and newspapers. If this model of linguistic cues that are based on part of speech occurrence provides viable results with such diļ¬€ering materials it would indicate that it can also be used in more subtle scenarios. Analysis of the given material has been conducted both on the article and sentence level. This involved the application of machine learning algorithms in order to identify sentences and articles belonging to political propaganda. Our main assumption is that since fringe ideologies are bent on changing the world in a radical way, materials that endorse such a sentiment would be mostly emotional and ļ¬lled with statements that evaluate the current situation as well as the desired ideal world, the end game of fully implementing the tenets of the ideology. Therefore, when speaking in linguistic term, texts belonging to a fringe ideology corpus would have higher amounts of adjectives and adverbs than strictly informative sources. Hence, the following hypotheses regarding the role each of the selected linguistic attributes have been proposed for fringe ideology detection: Hypothesis 1 Text belonging to fringe ideologies contain more adjec- tives and adverbs than strictly informative texts. which can also be rephrased as: Hypothesis 2 Sentences belonging to fringe ideologies contain more adjectives and adverbs than strictly informative texts. We also propose an additional hypothesis regarding the average length of a sen- tence in both fringe and informative sources.
  • 3. Hypothesis 3 Texts that endorse fringe ideologies contain longer sen- tences than strictly informative sources. In our current research we focus both on the sentence and article level. We aim to verify these hypotheses by using machine learning techniques. Extracted features will serve as attributes used by the algorithm for training. This means that standard procedures of machine learning evaluation will be applied. We also propose a fourth hypothesis: Hypothesis 4 When predicting whether a text belongs to a fringe ide- ology or informative sources, information about the frequency of fringe sentences within it provides the highest measures of performance (clas- sification accuracy) In order to test these hypotheses, two main tasks have been conducted. First involved using machine learning techniques to identify whether the sentences or articles from our corpus belong to the fringe or informative class. Afterwards the output from the models applied to this tasks will be used on the set of articles from our corpora. The second task would be predicting, based on the frequency of propaganda sentences in an article, as given by the classiļ¬er, whether said article belongs to the category of fringe or information. After the performance of algorithms used in each task is evaluated, attribute extraction methods are used in order to determine which attributes are crucial for determining whether the source is fringe or information. 2 Related work Various ļ¬elds of natural language processing can be connected with the problem of identifying the language of political fringe. One of these ļ¬elds would be text genre detection, where NLP tools are used in order to identify the genre of a given text. Such tools involve a bag-of-words approach as well as Part-of-speech n-grams as used in [1]. Other applications of genre classiļ¬cation include the use of both linguistic cues and html structure of the web page[2]. In our work, how- ever we focus not only on detecting text genres, but also on identifying a very speciļ¬c type of narration, which can be classiļ¬ed as manipulative ore deceptive. This leads to a second ļ¬eld of study that can be applied to the study of ma- terials from the political fringe. Analysis of such sources on the Internet had been present in the ļ¬eld of computer science for some time. The research done so far can be divided into three main areas of investigation. First of them can be described as studies into behavioral patterns of propagandists on-line. This approach focuses mostly on the agent distributing the content, rather than the content itself, it is very strongly inspired by research into spam detection [3]. Such research was mostly focused on micro-blogging platforms such as Twitter [?]lumezano) or open source projects such as Wikipedia [4]. What is notable is the fact that in these papers the main source of traits that could identify pro- paganda are mostly meta information, referring to frequency of commenting on
  • 4. a material, and presenting the same content repeatedly. The other main type of research is the ļ¬eld of deception detection which deals with the problem of distin- guishing whether the author of a text intended to fool the recipient. Researchers working in this ļ¬eld wanted to identify linguistic cues related with deceptive behavior. Most of these cues such as the ones used by [9] in experimental setups or [6] for fraudulent ļ¬nancial claims emphasized basic shallow characteristics such as sentence or noun phrase length that showed to be indicative of deceptive materials. A more sophisticated method was proposed by [8]. In their paper they introduced a concept of stylometry (analyzing larger syntactical structures) for deception detection and opinion spam. Their work focuses on the structure of syntax, the relation between larger elements of the sentence (ie. Noun phrases). A slightly diļ¬€erent approach was tested by [5] in their essay experiment. With the aid of LIWC tool[12] they aimed to identify words that appear most often in deceptive materials. In our work we want to extend and test the ļ¬ndings of deception detection for the task of identifying textual materials forming the bulk of fringe ideologies. It is also worth noting that most of the research in this ļ¬eld was done for the English language. The only instance of deception detection in another language that we managed to ļ¬nd was study [7] that was done for Italian. 3 Hypothesis verification 3.1 Dataset In order to test our hypothesis we gathered a corpus consisting of both radical ideological material and balanced informative text. All of the corpora represent modern American English. We prepared a collection of texts taken from three distinguished sources: ā€“ Nazi corpus: This contains articles from websites belonging to groups in the US that promote national socialism and ideas of racial and ethnic superi- ority. When collecting the websites, we used the list of hate groups provided by the Southern Poverty Law Center[18]. Noticeable sources include Amer- ican National Socialist Party, National Socialist movement, Aryan Nations etc. In all, 100 web pages were extracted. An example of a text from this corpus is shown below: To a National Socialist, things like PRIDE, HONOR, LOYALTY, COURAGE, DISCIPLINE, and MORALITY - actually MEAN something. Like our fore- fathers, we too are willing to SACRIFICE to build a better world for our children whom we love deeply, and like them as well - we are willing to DO ANYTHING NECESSARY to ACHIEVE THAT GOAL. We are your brothers and your sisters, we are your fathers and your mothers, your friends and your co-workers - WE ARE WHITE AMERICA, just like YOU! Your enemies in control attempt with their constant ā€anti-naziā€ propaganda - to persuade you that ā€theyā€ are your ā€real friendsā€ - and that WE, your own
  • 5. kin, are your lifelong enemies. Yet, ask yourself THIS - ā€WHOā€ have you to ā€THANKā€ for ALL the PROBLEMS FACING YOU? The creatures who HAVE been in total CONTROL - or - those of us resisting the evil? The TRUTH is right there in front of you - DONā€™T be afraid to understand it and to ACT upon it! You hold the FUTURE - BRIGHT or DARK - in YOUR hands, and TIME IS RUNNING OUT! ā€“ Communist Corpus: This corpus contains pages from websites of Ameri- can groups and parties that describe themselves as communist. These include among others: Progresive Labour Party and The Communist party of the United States of America. As with the Nazi corpus, 100 web pages were ex- tracted. Example is presented below: Todayā€™s action, organized by Good Jobs Nation, comes a year after it filed a complaint with the Department of Labor that accused food franchises at federal buildings of violating minimum-wage and overtime laws. They want Obama to sign an executive order requiring federal agencies to contract only with companies that engage in collective bargaining. The union leaders pulling the strings behind Good Jobs Nation are the same people who got us into this mess in the first place. Most contract jobs used to be full-time union jobs, and the unions did nothing to stop the bosses from eliminating them. Now the unions are trying to rebuild their ranks among low-wage workers who replaced their former members. We need to abolish wage slavery with com- munist revolution. And the struggle between reform and revolution must be waged within struggles like this one. ā€“ News Corpus: This corpus will serve as a reference point. It contains polit- ical news and opinion sections from various American news sources (CNN, FOX news, CNBC, and local media outlets), in order to construct a balanced corpus for analysis also 100 web pages are extracted for the task of training a machine learning algorithm. A typical article from this set looks like one shown below: President Barack Obamaā€™s policy toward Syria ā€“ three years of red lines and calls for regime change ā€“ culminated Monday in a barrage of air strikes on terror targets there, marking a turning point for the conflict and thrusting the President further into it.The U.S. said Saudi Arabia, the United Arab Emirates, Qatar, Bahrain and Jordan had joined in the attack on ISIS tar- gets near Raqqa in Syria. The U.S. also launched air strikes against another terrorist organization, the Khorasan Group.
  • 6. 3.2 Dataset preprocessing After the working corpus has been prepared we divided each sentence in it into tokens and conducted Part-of-Speech tagging (with the use of NLTK default POS tagger). Sentence and word length have been calculated as well, hence creating a database containing the following attributes of all sentences: ā€“ Sentence Class: a variable determining, whether the sentence belongs to the informative or fringe corpus ā€“ Traits describing textual quantity: number of tokens in a sentence and average number of characters in words constituting a sentence. ā€“ Frequency of adjectives and adverbs We also included occurrence of other parts of speech in order to test if they have any impact on distinguishing fringe language from informative sources. After the database was prepared, machine learning algorithms have been implemented. In order to evaluate the performance of said algorithms in both of the tasks, the following measures will be used: ā€“ Accuracy: the percentage of all true positives and true negatives produced by the machine learning algorithm. ā€“ AUC (Area under ROC Curve): the measure of True Positive to False Posi- tive ratio for various thresholds provided by the machine learning algorithm. 3.3 Machine learning: sentence prediction The task given to the algorithms was to identify whether a sentence or article belongs to the fringe or news corpus. 5-fold cross validation procedure has been implemented in order to validate the algorithmsā€™ performance. 3.4 Article prediction In this section, we analyze the performance of the article prediction based on the frequency of Parts of Speech. Since our corpora are balanced, the baseline measuring performance is 50%, both for accuracy and AUC. Three algorithms turned put to be the most eļ¬€ective, these being Naive Bayes, K-NN (K=7, cosine distance) and Neural Net. Table 1. AUC and Accuracy of used algorithms Algorithms applied Acc. Nazi AUC Nazi Acc. Comm AUC Comm Naive Bayes 71% 83% 75% 85% K-NN 60% 66% 59% 61% Neural Net 70% 75% 77% 87%
  • 7. As it is shown in table 1, the best scores were produced by the Naive Bayes and Neural Net Algorithms. The accuracy and AUC values for these were within the range of 70 to 80 per cent, which indicates a moderately strong performance. It also can pointed out that higher scores were achieved for the communist corpus. This may be in part due to the fact, that communist articles came from mainly two large sources. Therefore their style maybe more cohesive. The national-socialist corpus on the other hand was more diverse, allowing for a more broader spectrum of styles. Still, both types of fringe ideologies achieved similar results. Applied methods of attribute extraction shown that the most important attributes in this task were the frequency of adjectives, adverbs, and proper nouns. As shown in table 2 containing the standardized diļ¬€erences between the mean values of these attributes, the frequency of adjectives was higher in both fringe corpora in relation to the informative sources. In case of adverbs the eļ¬€ect was visible only for Nazi materials. What is also interesting, articles belonging to fringe ideologies contained less proper nouns than informative sources. This maybe because ideological materials tend to be more vague, and connected with more general processes and entities. However such hypothesis needs to be veriļ¬ed separately. Table 2. Standardized differences between mean values for adjective, adverb and proper noun frequency Nazi/News Communist/News Adjectives% 0,24 0,39 Adverbs% 0,2 -0,04 Proper Nouns% -0,38 -0,33 3.5 Sentence prediction Table 3. AUC of used algorithms Acc. nazi AUC nazi Acc. communist AUC communist Naive Bayes 61% 65% 61% 65% k-NN 62% 69% 64% 64% Neural Net 65% 72% 63% 69% As shown in 3 the results tend to be between the threshold of 60 and 70%. No larger diļ¬€erences have been observed when analyzing the Nazi and communist
  • 8. corpora. It is also worth noting that classiļ¬cation conducted on the sentence level provided weaker scores than the one done for the articles. This may be due to the fact that sentences are shorter, therefor provide less data that the machine learning algorithms can use. Similarly to the article prediction task, this is also complemented by attribute extraction scores from chi-square method. According to chi-squared method the most important attributes include: sentence length, adjectives, adverbs, and proper nouns frequency. The relative importance of the attributes used is shown in table 4, containing the standardized diļ¬€erences between the their mean val- ues. Only the attributes with the highest diļ¬€erences are presented in this table. Positive values indicate that the mean value of a given attribute was higher for the fringe corpus. Negatives indicate that the value was higher for the news corpus. Table 4. Standardized differences between mean values for adjective, adverb, noun and proper noun frequency Nazi/News Communist/News Sentence length 0,12 - 0,14 Adjectives% 0,3 0,3 Adverbs% 0,2 0,04 Nouns% -0,18 0,36 Proper Nouns% -0,32 -0,4 The table show a pattern similar to that observed when classifying articles. Fringe ideology sentences, both communist and Nazi, contained a higher amount of adjectives and adverbs. They also had less proper nouns than informative sources. What sets the sentence classiļ¬cation aside from the article one is the relatively high diļ¬€erence in the frequency of nouns between communist material and informative sources, as well as the fact that both corpora varied in regards to average sentence length. The Nazi one tend to contain longer sentences, while communist collection has shorter ones. In both cases the diļ¬€erences are not as pronounced as in other attributes. 3.6 Sentence based article prediction For this task, we decided to use the labels applied to sentences in the previ- ous subsection in order to predict whether the entire articles belong to fringe or informative class. Frequency of fringe sentences, as provided by the Neural Net classiļ¬er has been calculated for each article from the corpora used in our research. Afterwards, we calculated the optimal threshold for classiļ¬cation. For Nazi materials the threshold value was 57% of fringe sentences per article, and for communists it was 63%. The performance scores for these corpora are shown in table 5.
  • 9. Table 5. Performance Accuracy AUC Nazi 81% 80% Communist 83% 81% When compared with the scores of machine learning from the method used in this subsection provides a higher accuracy (80% and 83% compared to 70% and 77% respectively), with a relatively similar level of AUC. What is also worth noting, the performance of this method is high even though the scores for sen- tence prediction rarely exceeded the 70% threshold for either accuracy and AUC. This observation indicates that even tough prediction on a sentence level can be faulty, but, when aggregated, they provide a strong signal that allows for a more successful identiļ¬cation of fringe sources. What is more this method proves to be slightly more eļ¬ƒcient than using Part-of-speech information on the article level. Further work on the possibilities and limits of this approach will be pursued. 4 Conclusions and observations Our research has led as to the following conclusions and observations: ā€“ Article classiļ¬cation based on Part-of-Speech tagging has provided robust scores, indicating that the chosen attributes can be used for identifying fringe ideological sources. With a baseline of 50% for both accuracy and AUC, the machine learning algorithms achieved scores exceeding 70% for accuracy, and 80% for AUC. Neural Net and K Nearest Neighbors algorithms produced the best performance. ā€“ The same task conducted on the level of sentences provided weaker scores, mostly within the 60% - 70% range both for accuracy and AUC. However, when the information about the frequency of fringe sentences was applied to articles, the performance was increased to 80% for accuracy and AUC. Consequently these score provide a more robust classiļ¬cation than the one based on Part-of-Speech frequency on the article level. This may indicate that sentence level classiļ¬cation is burdened with high noise which may be countered by taking account of fringe sentences frequency in the article. ā€“ In view of collected data hypothesis 3 cannot be considered as true. The sentence length proved not to be an important attribute for determining whether the sentence belongs to a fringe or informative source. The texts from the Nazi corpus, on average, contained longer sentences. Conversely, communist sources where constructed of shorter ones. However neither pro- duced diļ¬€erences large enough to aļ¬€ect the machine learning algorithms performance. ā€“ Analysis on both the article and sentence level shows that adjectives and ad- verbs frequency plays an important role in identifying fringe sources. More- over in collected data allows us to consider hypotheses 1 and 2 as veriļ¬ed.
  • 10. ā€“ Analysis has also shown that fringe materials contain less named entities than informative sources. This may suggest that fringe text tend to be more vague than those that are solely informative. In summary, our research lends credence to the notion that such basic linguistic cues as occurrence of Parts-of-Speech can be used for determining whether a text belongs to the language of information or fringe ideology. 5 Future work The results we came up with show, that there is room for further research. Using basic stylistic cues for identifying speciļ¬c narrations may be extended to include language of political, religious or scientiļ¬c discourse. In our future work we plan to gather text corpora related to such types of language (marketing, religion, science etc.) and test them with the use of shallow linguistic characteristics. We also plan to include in our model such elements of propaganda as repetition, and vagueness [19][15]. We plan to conduct more detailed analysis of the two observations we made when researching the role of linguistic features: the dif- ferences in named entities frequency in fringe and informative sources, and the frequency of positive class sentences as a method of article prediction. Devising a computational model may lead to the development of tools dedicated to au- tomatic detection of speciļ¬c forms of languages. So far we worked with modern documents written in English which are focused mostly on one ideology. This is why in our future work we aim to extend our focus the historical texts written in diļ¬€erent languages (Polish, German etc.). Therefore we will be able to show whether or not the language of propaganda follows a rigid unchanging pattern, or is it strongly culture-based. It is also important in order to identify linguistic cues that can be applied to fringe ideologies only. To this end we plan to apply the developed model for other fringe ideologies and pseudoscience. Results we obtained with shallow methods of POS tagging show great promise. The next natural step would be to use deeper NLP methods, such as syntax pars- ing, or transcribing texts into logical formulas. In summary, our current work serves as preliminary research into a ļ¬eld of computational analysis of discourse. Therefore we can obtain an eļ¬€ective model, which can be extrapolated to var- ious topical domains. This will prove useful as a way of constructing a more general theory of computational models of recognizing language of fringe ide- ologies (as well as other forms of written language such as language of religion, science, marketing or various social classes). This will be especially important for classiļ¬cation method based on the fringe sentence frequency in the article. Providing a more general method of text corpora analysis, therefore allowing for the design of more intricate automated system that could detect propaganda or other forms of manipulative content. Focusing on structural aspects would make it also more resistant to deceptive strategies on the part of content producers.
  • 11. 6 Acknowlegments This work was ļ¬nancially supported by the European Community from the European Social Fund within the INTERKADRA project. It was also supported by the grant Reconcile: Robust Online Credibility Eval- uation of Web Content from Switzerland through the Swiss Contribution to the enlarged European Union. References 1. Sharoff, Serge. ā€Classifying Web corpora into domain and genre using automatic feature identification.ā€ Proceedings of the 3rd Web as Corpus Workshop. 2007. 2. Santini, Marina, Richard Power, and Roger Evans. ā€Implementing a characteriza- tion of genre for automatic genre identification of web pages.ā€ Proceedings of the COLING/ACL on Main conference poster sessions. Association for Computational Linguistics, 2006. 3. Metaxas, Panagiotis Takis. ā€Web spam, social propaganda and the evolution of search engine rankings.ā€ Web Information Systems and Technologies. Springer Berlin Heidelberg, 2010. 170-182. 4. Chandy, Rishi. ā€Wikiganda: Identifying propaganda through text analysis.ā€ Cal- tech Undergraduate Research Journal. Winter 2009 (2008). 5. Mihalcea, Rada, and Carlo Strapparava. ā€The lie detector: Explorations in the automatic recognition of deceptive language.ā€ Proceedings of the ACL-IJCNLP 2009 Conference Short Papers. Association for Computational Linguistics, 2009. 6. Humpherys, Sean L., et al. ā€Identification of fraudulent financial statements using linguistic credibility analysis.ā€ Decision Support Systems 50.3 (2011): 585-594. 7. Fornaciari, Tommaso, and Massimo Poesio. ā€Lexical vs. surface features in de- ceptive language analysis.ā€ Proceedings of the ICAIL 2011 Workshop Applying Human Language Technology to the Law, AHLTL. 2011. 8. Feng, Song, Ritwik Banerjee, and Yejin Choi. ā€Syntactic stylometry for deception detection.ā€ Proceedings of the 50th Annual Meeting of the Association for Compu- tational Linguistics: Short Papers-Volume 2. Association for Computational Lin- guistics, 2012. 9. Ott, Myle, et al. ā€Finding deceptive opinion spam by any stretch of the imag- ination.ā€ Proceedings of the 49th Annual Meeting of the Association for Com- putational Linguistics: Human Language Technologies-Volume 1. Association for Computational Linguistics, 2011. 10. Lee, A. M. and Lee(eds.), E. B. (1939). The Fine Art of Propaganda. The Institute for Propaganda Analysis. Harcourt, Brace and Co. 11. Bird, Steven, Ewan Klein, and Edward Loper. Natural language processing with Python. Oā€™Reilly Media, Inc., 2009. 12. Pennebaker, James W., Martha E. Francis, and Roger J. Booth. ā€Linguistic inquiry and word count: LIWC 2001.ā€ Mahway: Lawrence Erlbaum Associates 71 (2001): 2001. 13. Progressive Labour Party website. http://www.plp.org/ 14. Communist Party of the United States of America website www.cpusa.org 15. G?fu, Daniela, and Dan Cristea. ā€Towards an Automated Semiotic Analysis of the Romanian Political Discourse.ā€ Computer Science 21.1 (2013): 61.
  • 12. 16. Paik, Woojin, et al. ā€Applying natural language processing (nlp) based metadata extraction to automatically acquire user preferences.ā€ Proceedings of the 1st inter- national conference on Knowledge capture. ACM, 2001. 17. GıĢ‚fu, Daniela, and Ioan Constantin Dima. ā€An operational approach of communi- cational propaganda.ā€ International Letters of Social and Humanistic Sciences 23 (2014): 29-38. 18. http://www.splcenter.org/ 19. Lacoue-Labarthe, Philippe, and Jean-Luc Nancy. Le mythe nazi. Editions de lā€™Aube, 1998.