Slide 1


  1. Sentiment and Affect Analysis of Dark Web Forums: Measuring Radicalization on the Internet
     Hsinchun Chen, Fellow, IEEE
  2. Introduction
     Web forums offer participants a medium to express their opinions and emotions freely in discussion.
     Extremist and terrorist groups also use web forums for community, and for the expression and dissemination of their ideologies and propaganda.
     Such forums are often referred to as being part of the Dark Web.
  3. Introduction
     Information contained within Dark Web forums represents a significant source of knowledge for security and intelligence organizations.
     The opinions and emotions expressed within these forums provide valuable insights into:
       the nature and position of the online community
       the characteristics of individual participants
     Manual analysis of the vast quantities of messages to measure the opinions and emotions expressed is often infeasible.
  4. Introduction
     This paper presents an automated approach to sentiment and affect analysis of two Dark Web forums related to the Iraqi insurgency and Al-Qaeda.
     The automated approach utilizes a rich set of textual features and machine learning techniques.
  5. Related Work
     Sentiment and affect analysis are related tasks in text mining that focus on directional text, that is, text containing opinions, emotions, and biases.
     [5] M. A. Hearst, “Direction-based text interpretation as an information access refinement,” in Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Lawrence Erlbaum Associates, 1992.
     [6] J. Wiebe, “Tracking point of view in narrative,” Computational Linguistics, vol. 20 (2), pg. 233-287, 1994.
  6. Related Work
     Sentiment analysis attempts to identify, analyze, and measure opinions expressed in text.
     Affect analysis focuses on the emotional content of the communication.
     R. Agrawal, S. Rajagopalan, R. Srikant, and Y. Xu, “Mining newsgroups using networks arising from social behavior,” Proc. of the 12th Int’l WWW Conf., 2003.
     P. Subasic and A. Huettner, “Affect analysis of text using fuzzy semantic typing,” IEEE Trans. Fuzzy Systems, vol. 9 (4), pg. 483-496.
  7. Related Work
     There are some important distinctions between the two:
       Affect analysis evaluates the intensity of a number of potential emotions, including happiness, sadness, anger, and fear.
       Sentiment analysis considers the polarity of opinions along a positive-neutral-negative continuum.
       The words and phrases associated with sentiments are mutually exclusive, whereas segments of text can convey multiple affects.
  8. Related Work
     Researchers have utilized various machine learning approaches to perform automated sentiment and affect analysis.
     B. Pang, L. Lee, and S. Vaithyanathan, “Thumbs up? sentiment classification using machine learning techniques,” Proc. Empirical Methods in Natural Language Processing, pg. 79-86, 2002.
     R. W. Picard, E. Vyzas, and J. Healey, “Toward machine emotional intelligence: analysis of affective physiological state,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 23 (10), pg. 1179-1191, 2001.
  9. Related Work
     The SVM learning approach has been shown to be particularly effective in determining whether a text segment contains an expression of a particular affect class.
     SVM, however, handles only discrete labels.
     Y. H. Cho and K. J. Lee, “Automatic affect recognition using natural language processing techniques and manually built affect lexicon,” IEICE Trans. Information Systems, vol. E89 (12), pg. 2964-2971, 2006.
  10. Related Work
      SVR is an alternate approach that is capable of predicting continuous sentiment and affect intensities while benefitting from the robustness of SVM.
      A. Webb, Statistical Pattern Recognition. John Wiley & Sons, 2002.
  11. Research Questions
      In a recent book, Ryan highlights the critical role that Web forums play in militant Islamic radicalization on the Internet.
      Marc Sageman, an internationally renowned terrorism study consultant, also emphasizes the importance of the Internet, especially forums.
      This paper presents our web mining research on sentiment and affect analysis of two large-scale, internal Jihadist forums.
  12. Research Questions
      This study seeks to answer the following research questions:
        How effective are automated methods of sentiment and affect analysis in measuring the polarities of opinions and intensities of emotions in Dark Web forums?
        What insights into the Dark Web forums are gained by performing sentiment and affect analysis?
  13. Data
      Two Dark Web forums were selected for sentiment and affect analysis:
        Al-Firdaws (www.alfirdaws.org/vb)
        Montada (www.montada.com)
      Al-Firdaws
        A more radical forum.
        Considerable content is dedicated to support of the Iraqi insurgency and Al-Qaeda.
      Montada
        A general discussion forum with content pertaining to a variety of social and religious issues.
        Domain experts consider Montada to be more moderate than Al-Firdaws, with less radical content.
  14. Data
      Spidering programs were used to collect the content from the two web forums.
      A summary of the collection statistics is presented in Table I.
      Data set is larger.
      An older forum
      Al-Firdaws is too radical
  15. Data
      Both Al-Firdaws and Montada are major forums for their respective purposes and communities, with relatively high membership levels and numerous authors.
  16. Data
      In both cases postings are more evenly distributed across web forum threads.
      Although the Montada forum has a larger average number of posts per thread than Al-Firdaws, the median number of posts per thread is nearly equal.
  17. Data
      500 sentences were selected from each web forum and scored for the intensities of the sentiments and affects expressed.
      The affects studied were those of most interest to security and intelligence organizations: violence, anger, hate, and racism.
      These affects were measured on a continuous scale ranging from 0 to 1.
      Sentiment was measured on a continuous scale from -1 to 1.
  18. Data
  19. Methods
  20. Methods
      Annotation step
      Features: character, word, root, and collocation n-grams
        Character and word n-grams are commonly used in text mining applications.
        To derive root-level n-grams, Arabic words were converted to their roots using a clustering algorithm.
        Collocation n-grams included the Hapax and Dis collocations.
        Features with fewer than four occurrences in the test bed were excluded.
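The n-gram extraction and frequency cutoff described on the slide above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the function names, whitespace tokenization, and toy English sentences are assumptions, and root and collocation n-grams (which need Arabic-specific processing) are left out. Only the "at least four occurrences" threshold comes from the slide.

```python
from collections import Counter

def char_ngrams(text, n):
    """Return all contiguous character n-grams of a string."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

def word_ngrams(text, n):
    """Return all contiguous word n-grams, splitting on whitespace (an assumption)."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]

def build_vocabulary(sentences, extractor, n, min_count=4):
    """Keep n-grams seen at least min_count times across all sentences.

    min_count=4 mirrors the slide: features with fewer than four
    occurrences in the test bed are excluded.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(extractor(sentence, n))
    return {gram for gram, count in counts.items() if count >= min_count}

# Toy data (hypothetical, for illustration only).
sentences = [
    "the forum post",
    "the forum thread",
    "the forum member",
    "the forum rules",
    "a new post",
]
vocab = build_vocabulary(sentences, word_ngrams, 2)
print(vocab)
```

In this toy corpus only the bigram "the forum" reaches four occurrences, so it is the only feature retained.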
  21. Methods
  22. Methods
      The machine learning approach for identifying the presence and intensities of sentiments and affects in Dark Web forum sentences utilized an SVR ensemble.
      SVR was utilized to leverage the robustness of SVM while accommodating the continuous intensities of sentiments and affects.
      Ensemble classifiers aggregate multiple independent classifiers built using different techniques or feature subsets, improving performance over a single classifier.
  23. Methods
      For the analysis of the Al-Firdaws and Montada web forums, a separate classifier was developed for each of the five sentiment and affect classes.
  24. Methods
      Feature selection
        Information gain (IG) heuristic
        Intensities had to be discretized before IG could be applied and the relevant features selected.
        To compensate for the discretization, multiple iterations were performed, varying the number of class bins for intensity between 2 and 10.
        The IG heuristic was used recursively to select relevant features in these iterations using recursive feature elimination (RFE).
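As a rough illustration of the discretize-then-score step on the slide above, the sketch below bins continuous intensities and computes the information gain of a single binary feature for each bin count from 2 to 10. Equal-width binning, the binary present/absent feature encoding, and all function names are assumptions made for the example; the slides do not say how bins were formed or how the per-bin results were combined, and the recursive elimination loop is not shown.

```python
import math
from collections import Counter

def discretize(intensity, bins, low=0.0, high=1.0):
    """Map a continuous intensity to an equal-width bin index in [0, bins - 1].

    Use low=-1.0 for sentiment, which the slides place on a -1 to 1 scale.
    """
    position = (intensity - low) / (high - low)
    return min(int(position * bins), bins - 1)

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(present, labels):
    """Information gain of a binary feature with respect to the class labels."""
    total = len(labels)
    gain = entropy(labels)
    for value in (True, False):
        subset = [y for x, y in zip(present, labels) if x == value]
        if subset:
            gain -= (len(subset) / total) * entropy(subset)
    return gain

# Toy data (hypothetical): the feature appears only in high-intensity sentences.
intensities = [0.9, 0.8, 0.1, 0.2]
feature_present = [True, True, False, False]

# Repeat the scoring for every bin count from 2 to 10, as on the slide.
gains = {
    k: information_gain(feature_present, [discretize(v, k) for v in intensities])
    for k in range(2, 11)
}
print(gains[2])
```

With two bins the toy feature separates the classes perfectly, so its gain equals the full class entropy of one bit.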
  25. Methods
  26. Methods
      The feature selection phase selected, for each of the 5 classifiers in the ensemble, a subset of the features identified in the test bed.
      Originally 7556 features.
      Only 22% were selected.
  27. Methods
      Evaluation was performed using 10-fold cross validation.
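For readers unfamiliar with the evaluation protocol named above, here is a minimal sketch of how 10-fold cross validation partitions a sample. The function name, the shuffling seed, and the choice of 500 items (the per-forum sample size given earlier) are assumptions; the slides do not state whether folds were built per forum or over the pooled sentences.

```python
import random

def k_fold_indices(n_items, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Every item lands in exactly one test fold; the remaining folds
    form the training set for that round.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

# 500 sentences, split into 10 rounds of 450 training and 50 test items.
splits = list(k_fold_indices(500))
for train, test in splits:
    # A model would be fit on `train` and scored on `test` here.
    pass
```

Averaging the per-fold scores gives a single estimate that uses every labeled sentence for testing exactly once.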
  28. Results
      A sample of messages and their sentiment and affect intensities, as determined through automated analysis, is presented in Table VII.
  29. Results
      Results confirm the assessment of the forums by domain experts.
      The Al-Firdaws forum contained higher intensities of violence and hate affects, with a more negative sentiment polarity.
  30. Results
      The percentage of postings containing intense levels of the four affects is greater in the Al-Firdaws forum than in the Montada forum, as shown in Figs. 8 and 9.
  31. Results
      The violence and hate affects were used by a relatively large percentage of Al-Firdaws authors.
  32. Results
      A time series analysis was performed to understand how forum affect intensities progressed over time.
