SlideShare a Scribd company logo
1 of 25
Word Embeddings to Enhance Twitter
Gang Member Profile Identification
Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis)
Wright State University, Dayton, OH, USA
Amit Sheth
amit@knoesis.org
Derek Doran
derek@knoesis.org
Presented at IJCAI Workshop on Semantic Machine Learning (SML 2016)
New York City, NY, USA, July 10, 2016
Lakshika Balasuriya
lakshika@knoesis.org
Sanjaya Wijeratne
sanjaya@knoesis.org
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification 2SML @ IJCAI 2016
Tweets Source – http://www.wired.com/2013/09/gangs-of-social-media/all/
Image Source – http://www.7bucktees.com/shop/chi-raq-chiraq-version-2-t-shirt/
*Example tweets shown above are public data reported in the cited news paper article. They are not part of our research dataset.
3SML @ IJCAI 2016
What does gang related research tell us?
“Gangs use social media mainly to post videos depicting
their illegal behaviors, watch videos, threaten rival gangs
and their members, display firearms and money from
drug sales” [Patton 2015, Morselli 2013]
Studies have shown,
 82% of gang members had used the Internet and 71%
of them had used social media [Decker 2011]
 45% of gang members have participated in online
offensive activities and 8% of them have recruited new
members online [Pyrooz 2013]
Image Source – http://www.sciencenutshell.com/wp-content/uploads/2014/12/o-GANG-VIOLENCE-facebook.jpg
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
4SML @ IJCAI 2016
There are other spectators too…
Source – http://www.lexisnexis.com/risk/downloads/whitepaper/2014-social-media-use-in-law-enforcement.pdf
Image Source – http://www.officialpsds.com/images/stocks/crime-scene-stock1016.jpg
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
5
But there’s a problem…
Image Source – http://www.officialpsds.com/images/stocks/crime-scene-stock1016.jpg
Wijeratne, Sanjaya et al. "Analyzing the Social Media Footprint of Street Gangs" in IEEE ISI 2015, doi: 10.1109/ISI.2015.7165945
Source – http://www.lexisnexis.com/risk/downloads/whitepaper/2014-social-media-use-in-law-enforcement.pdf
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile IdentificationSML @ IJCAI 2016
6SML @ IJCAI 2016
Enhance Twitter Gang Member Profile
Identification with Word Embeddings
Image Source – http://www.sciencenutshell.com/wp-content/uploads/2014/12/o-GANG-VIOLENCE-facebook.jpg
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
7SML @ IJCAI 2016
Gang Member Dataset
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Number of Words
from each Feature
Gang Member
Class
Non-gang Member
Class
Total
Tweets 3,825,092 45,213,027 49,038,119
Profile Descriptions 3,348 21,182 24,530
Emoji 732,712 3,685,669 4,418,381
YouTube Videos 554,857 10,459,235 11,041,092
Profile Images 10,162 73,252 83,414
Total 5,126,176 59,452,365 64,578,536
Description Gang Member
Class
Non-gang
Member Class
Total
Number of Profiles 400 2,865 3,265
Number of Tweets 821,412 7,238,758 8,060,170
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
8SML @ IJCAI 2016
Feature Type – Tweets
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Words from Gang Member’s Tweets Words from Non-gang Member’s Tweets
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
9SML @ IJCAI 2016
Feature Type – Profile Descriptions
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
*Example Twitter profiles shown above are public data reported in news articles. They are not part of our research dataset.
Word usage in profile descriptions: Gang Vs. Non-gang members
Percentofprofilesthatcontainthewordinprofiledescription
10SML @ IJCAI 2016
Feature Types – Emoji
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
Emoji usage distribution: Gang Vs. Non-gang members
PercentofprofilesthatcontainstheemojiinTweets
11SML @ IJCAI 2016
Feature Types – YouTube Links
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
51.25% of gang members post at least one
YouTube video
 76.58% of them are related to gangster hip-hop
 A gang member shares 8 YouTube links on average
 Top 5 words from gang members – shit, like,
nigga, fuck, lil
Top 5 words from non-gang members – like,
love, peopl, song, get
12SML @ IJCAI 2016
Feature Types – Profile Images
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
Image tags distribution: Gang Vs. Non-gang members
Percentofprofilesthatcontainstheimagetag
13SML @ IJCAI 2016
Classification Approach
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Convert all non textual features to text
 Emoji for Python API for emoji1
 Clarifai API for profile and cover images2
Remove stop word and stem the remaining
Train a Skip-gram model using textual features
 Negative sample rate = 10
 Context window size = 5
 Ignore words with min count = 5
 Number of dimensions = 300
1https://pypi.python.org/pypi/emoji | 2http://www.clarifai.com/
14SML @ IJCAI 2016
Classification Approach Cont.
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Classifier training with word embeddings
15SML @ IJCAI 2016
Representing Training Examples
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
For each Twitter profile p, with n unique
words, and the vector of the ith word in p
denoted by wip, we compute the feature
vectors for Vp in the following ways.
 Sum of word embeddings – Sum of word vectors for
all words in Twitter profile p
16SML @ IJCAI 2016
Representing Training Examples Cont.
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
 Mean of word embeddings – Mean of word vectors
for all words in Twitter profile p
 Sum of word embeddings weighted by term
frequency where term frequency of ith word in p is
denoted by cip
17SML @ IJCAI 2016
Representing Training Examples Cont.
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
 Sum of word embeddings weighted by tf-idf where
tf-idf of ith word in p is denoted by tip
 Sum of word embeddings weighted by term
frequency where term frequency of ith word in p is
denoted by cip.
18SML @ IJCAI 2016
Classification Results
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Classification results based on 10-fold cross validation
19SML @ IJCAI 2016
Results Interpretation
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Averaging the vector sum weighted by term
frequency performs the best
Out of the five vector based operations on
word embeddings;
 Four of them gave classifiers that beat baseline
model(1)
 Two of them performed as nearly equal or beat
baseline model(2)
20SML @ IJCAI 2016
Results Interpretation Cont.
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
Word vector (top 10) for BDK showed
 Five were rival gangs of Black Disciples
 Two were syntactic variations of BDK
 Three were syntactic variations of GDK
Word vector (top 10) for GDK showed
 Six were Gangster Disciple gangs where others
have expressed their hatred towards them
Word vector (top 10) for CPDK showed
 Other keywords showing hatred towards cops
such as fuck12, fuckthelaw, fuk12 and gang names
21SML @ IJCAI 2016
Challenges and Future work
 Build image recognition systems that can
accurately identify images commonly posted by
gang members
 Guns, Drugs, Money, Gang Signs
 Integrate slang dictionaries and use them to pre-
train word embeddings
 Hipwiki.com
 Utilize social networks of gang members to
identify new gang members
Image Source – http://i.ytimg.com/vi/dqyYvIqjuFI/maxresdefault.jpg
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
22SML @ IJCAI 2016
Connect with me
sanjaya@knoesis.org
@sanjrockz
http://bit.do/sanjaya
Image Source – http://www.pcb.its.dot.gov/standardstraining/mod08/ppt/m08ppt23.jpg
Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
23SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
References
[1] D. U. Patton, “Gang violence, crime, and substance use
on twitter: A snapshot of gang communications in
Detroit,” Society for Social Work and Research 19th
Annual Conference: The Social and Behavioral
Importance of Increased Longevity, Jan 2015
[2] C. Morselli and D. Decary-Hetu, “Crime facilitation
purposes of social networking sites: A review and
analysis of the cyber banging phenomenon,” Small
Wars & Insurgencies, vol. 24, no. 1, pp. 152–170, 2013
[3] S. Decker and D. Pyrooz, “Leaving the gang: Logging off
and moving on. Council on foreign relations,” 2011
24SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
References Cont.
[4] D. C. Pyrooz, S. H. Decker, and R. K. Moule Jr, “Criminal
and routine activities in online settings: Gangs,
offenders, and the internet,” Justice Quarterly, no.
ahead-of-print, pp. 1–29, 2013
[5] S. Wijeratne, D. Doran, A. Sheth, and J. L. Dustin,
“Analyzing the social media footprint of street gangs”,
In Proc. of IEEE ISI, 2015, pages 91–96, May 2015.
[6] L. Balasuriya, S. Wijeratne, D. Doran, and A. Sheth,
“Finding street gang members on twitter”, In Technical
Report, Wright State University, 2016.
25SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification

More Related Content

Viewers also liked

Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...
Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...
Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...
Artificial Intelligence Institute at UofSC
 
Semantic, Cognitive and Perceptual Computing -Cognitive theory of dreaming
Semantic, Cognitive and Perceptual Computing -Cognitive theory of dreamingSemantic, Cognitive and Perceptual Computing -Cognitive theory of dreaming
Semantic, Cognitive and Perceptual Computing -Cognitive theory of dreaming
Artificial Intelligence Institute at UofSC
 
Semantic, Cognitive and Perceptual Computing -Human mental representation
Semantic, Cognitive and Perceptual Computing -Human mental representationSemantic, Cognitive and Perceptual Computing -Human mental representation
Semantic, Cognitive and Perceptual Computing -Human mental representation
Artificial Intelligence Institute at UofSC
 
Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning
Artificial Intelligence Institute at UofSC
 
Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...
Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...
Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...
Artificial Intelligence Institute at UofSC
 
Personalized and Adaptive Semantic Information Filtering for Social Media - P...
Personalized and Adaptive Semantic Information Filtering for Social Media - P...Personalized and Adaptive Semantic Information Filtering for Social Media - P...
Personalized and Adaptive Semantic Information Filtering for Social Media - P...
Artificial Intelligence Institute at UofSC
 

Viewers also liked (20)

Semantic, Cognitive and Perceptual Computing -Keynote artificial intelligence...
Semantic, Cognitive and Perceptual Computing -Keynote artificial intelligence...Semantic, Cognitive and Perceptual Computing -Keynote artificial intelligence...
Semantic, Cognitive and Perceptual Computing -Keynote artificial intelligence...
 
Semantic perception tkp(1)
Semantic perception tkp(1)Semantic perception tkp(1)
Semantic perception tkp(1)
 
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
Listening to the pulse of our cities fusing Social Media Streams and Call Dat...
 
Exploring Synthetic Cannabinoid Effects Using Web Forum Data
Exploring Synthetic Cannabinoid Effects Using Web Forum Data Exploring Synthetic Cannabinoid Effects Using Web Forum Data
Exploring Synthetic Cannabinoid Effects Using Web Forum Data
 
Semantic, Cognitive and Perceptual Computing -Cognitive computing in autonomo...
Semantic, Cognitive and Perceptual Computing -Cognitive computing in autonomo...Semantic, Cognitive and Perceptual Computing -Cognitive computing in autonomo...
Semantic, Cognitive and Perceptual Computing -Cognitive computing in autonomo...
 
Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...
Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...
Vahid Taslimitehrani PhD Dissertation Defense: Contrast Pattern Aided Regress...
 
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
Stream Reasoning: mastering the velocity and variety dimensions of Big Data...
 
Semantic, Cognitive and Perceptual Computing -Cognitive theory of dreaming
Semantic, Cognitive and Perceptual Computing -Cognitive theory of dreamingSemantic, Cognitive and Perceptual Computing -Cognitive theory of dreaming
Semantic, Cognitive and Perceptual Computing -Cognitive theory of dreaming
 
Integrating Sensor and Social Data for Understanding City Events
Integrating Sensor and Social Data for Understanding City EventsIntegrating Sensor and Social Data for Understanding City Events
Integrating Sensor and Social Data for Understanding City Events
 
Semantic, Cognitive and Perceptual Computing -Moonwalking with einstein
Semantic, Cognitive and Perceptual Computing -Moonwalking with einsteinSemantic, Cognitive and Perceptual Computing -Moonwalking with einstein
Semantic, Cognitive and Perceptual Computing -Moonwalking with einstein
 
Trending: Social media analysis to monitor cannabis and synthetic cannabino...
Trending: Social media analysis to monitor cannabis and synthetic cannabino...Trending: Social media analysis to monitor cannabis and synthetic cannabino...
Trending: Social media analysis to monitor cannabis and synthetic cannabino...
 
Semantic, Cognitive and Perceptual Computing -Human mental representation
Semantic, Cognitive and Perceptual Computing -Human mental representationSemantic, Cognitive and Perceptual Computing -Human mental representation
Semantic, Cognitive and Perceptual Computing -Human mental representation
 
Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning Semantic, Cognitive and Perceptual Computing -Deep learning
Semantic, Cognitive and Perceptual Computing -Deep learning
 
Finding Street Gang Members on Twitter
Finding Street Gang Members on TwitterFinding Street Gang Members on Twitter
Finding Street Gang Members on Twitter
 
Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...
Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...
Semantic, Cognitive and Perceptual Computing -Using semantics and statistics ...
 
Implicit Entity Linking in Tweets
Implicit Entity Linking in TweetsImplicit Entity Linking in Tweets
Implicit Entity Linking in Tweets
 
Feedbackdriven radiologyreportretrieval ichi2015-v2
Feedbackdriven radiologyreportretrieval ichi2015-v2Feedbackdriven radiologyreportretrieval ichi2015-v2
Feedbackdriven radiologyreportretrieval ichi2015-v2
 
Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual ObservationsUnderstanding City Traffic Dynamics Utilizing Sensor and Textual Observations
Understanding City Traffic Dynamics Utilizing Sensor and Textual Observations
 
Context Aware Harassment Detection in Social Media [Overview]
Context Aware Harassment Detection in Social Media [Overview]Context Aware Harassment Detection in Social Media [Overview]
Context Aware Harassment Detection in Social Media [Overview]
 
Personalized and Adaptive Semantic Information Filtering for Social Media - P...
Personalized and Adaptive Semantic Information Filtering for Social Media - P...Personalized and Adaptive Semantic Information Filtering for Social Media - P...
Personalized and Adaptive Semantic Information Filtering for Social Media - P...
 

Similar to Word Embeddings to Enhance Twitter Gang Member Profile Identification

Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique
IJERA Editor
 
Educational Technology YWC
Educational Technology YWCEducational Technology YWC
Educational Technology YWC
Heidi Dusek
 
Comparative Study of Cyberbullying Detection using Different Machine Learning...
Comparative Study of Cyberbullying Detection using Different Machine Learning...Comparative Study of Cyberbullying Detection using Different Machine Learning...
Comparative Study of Cyberbullying Detection using Different Machine Learning...
ijtsrd
 

Similar to Word Embeddings to Enhance Twitter Gang Member Profile Identification (20)

Analyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street GangsAnalyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street Gangs
 
Analyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street GangsAnalyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street Gangs
 
Analyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street GangsAnalyzing the Social Media Footprint of Street Gangs
Analyzing the Social Media Footprint of Street Gangs
 
IRJET - Social Media Intelligence Tools
IRJET -  	  Social Media Intelligence ToolsIRJET -  	  Social Media Intelligence Tools
IRJET - Social Media Intelligence Tools
 
Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique Sentiment Analysis of Twitter tweets using supervised classification technique
Sentiment Analysis of Twitter tweets using supervised classification technique
 
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social MediaCredibility, Identity Resolution, Privacy, and Policing in Online Social Media
Credibility, Identity Resolution, Privacy, and Policing in Online Social Media
 
SWMRA EF- 2011
SWMRA EF- 2011SWMRA EF- 2011
SWMRA EF- 2011
 
Analyzing Emoji in Text
Analyzing Emoji in TextAnalyzing Emoji in Text
Analyzing Emoji in Text
 
Detection and Analysis of Twitter Trending Topics via Link-Anomaly Detection
Detection and Analysis of Twitter Trending Topics via Link-Anomaly DetectionDetection and Analysis of Twitter Trending Topics via Link-Anomaly Detection
Detection and Analysis of Twitter Trending Topics via Link-Anomaly Detection
 
Educational Technology YWC
Educational Technology YWCEducational Technology YWC
Educational Technology YWC
 
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNINGSENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
SENTIMENT ANALYSIS OF SOCIAL MEDIA DATA USING DEEP LEARNING
 
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITYFRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
FRAMEWORK FOR ANALYZING TWITTER TO DETECT COMMUNITY SUSPICIOUS CRIME ACTIVITY
 
Comparative Study of Cyberbullying Detection using Different Machine Learning...
Comparative Study of Cyberbullying Detection using Different Machine Learning...Comparative Study of Cyberbullying Detection using Different Machine Learning...
Comparative Study of Cyberbullying Detection using Different Machine Learning...
 
IRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment AnalysisIRJET - Election Result Prediction using Sentiment Analysis
IRJET - Election Result Prediction using Sentiment Analysis
 
Social Media @Home and @Work: Understanding Who Is Using and Why
Social Media @Home and @Work:Understanding Who Is Using and WhySocial Media @Home and @Work:Understanding Who Is Using and Why
Social Media @Home and @Work: Understanding Who Is Using and Why
 
New Castle County - Social Media Primer
New Castle County - Social Media PrimerNew Castle County - Social Media Primer
New Castle County - Social Media Primer
 
What The heck Is Social Media?
What The heck Is Social Media?What The heck Is Social Media?
What The heck Is Social Media?
 
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
Twitter Based Sentiment Analysis of Each Presidential Candidate Using Long Sh...
 
Leveraging Social Media with Peter and KiKi
Leveraging Social Media with Peter and KiKiLeveraging Social Media with Peter and KiKi
Leveraging Social Media with Peter and KiKi
 
What's a government department doing on Twitter?
What's a government department doing on Twitter?What's a government department doing on Twitter?
What's a government department doing on Twitter?
 

Recently uploaded

Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
AnaAcapella
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
ZurliaSoop
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 

Recently uploaded (20)

Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.ICT role in 21st century education and it's challenges.
ICT role in 21st century education and it's challenges.
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
Spellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please PractiseSpellings Wk 3 English CAPS CARES Please Practise
Spellings Wk 3 English CAPS CARES Please Practise
 
ICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptxICT Role in 21st Century Education & its Challenges.pptx
ICT Role in 21st Century Education & its Challenges.pptx
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Magic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptxMagic bus Group work1and 2 (Team 3).pptx
Magic bus Group work1and 2 (Team 3).pptx
 
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptxBasic Civil Engineering first year Notes- Chapter 4 Building.pptx
Basic Civil Engineering first year Notes- Chapter 4 Building.pptx
 
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
Jual Obat Aborsi Hongkong ( Asli No.1 ) 085657271886 Obat Penggugur Kandungan...
 
Sociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning ExhibitSociology 101 Demonstration of Learning Exhibit
Sociology 101 Demonstration of Learning Exhibit
 
Understanding Accommodations and Modifications
Understanding  Accommodations and ModificationsUnderstanding  Accommodations and Modifications
Understanding Accommodations and Modifications
 
Spatium Project Simulation student brief
Spatium Project Simulation student briefSpatium Project Simulation student brief
Spatium Project Simulation student brief
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Holdier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdfHoldier Curriculum Vitae (April 2024).pdf
Holdier Curriculum Vitae (April 2024).pdf
 
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...Kodo Millet  PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
Kodo Millet PPT made by Ghanshyam bairwa college of Agriculture kumher bhara...
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17How to Give a Domain for a Field in Odoo 17
How to Give a Domain for a Field in Odoo 17
 

Word Embeddings to Enhance Twitter Gang Member Profile Identification

  • 1. Word Embeddings to Enhance Twitter Gang Member Profile Identification Ohio Center of Excellence in Knowledge-enabled Computing (Kno.e.sis) Wright State University, Dayton, OH, USA Amit Sheth amit@knoesis.org Derek Doran derek@knoesis.org Presented at IJCAI Workshop on Semantic Machine Learning (SML 2016) New York City, NY, USA, July 10, 2016 Lakshika Balasuriya lakshika@knoesis.org Sanjaya Wijeratne sanjaya@knoesis.org
  • 2. Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification 2SML @ IJCAI 2016 Tweets Source – http://www.wired.com/2013/09/gangs-of-social-media/all/ Image Source – http://www.7bucktees.com/shop/chi-raq-chiraq-version-2-t-shirt/ *Example tweets shown above are public data reported in the cited news paper article. They are not part of our research dataset.
  • 3. 3SML @ IJCAI 2016 What does gang related research tell us? “Gangs use social media mainly to post videos depicting their illegal behaviors, watch videos, threaten rival gangs and their members, display firearms and money from drug sales” [Patton 2015, Morselli 2013] Studies have shown,  82% of gang members had used the Internet and 71% of them had used social media [Decker 2011]  45% of gang members have participated in online offensive activities and 8% of them have recruited new members online [Pyrooz 2013] Image Source – http://www.sciencenutshell.com/wp-content/uploads/2014/12/o-GANG-VIOLENCE-facebook.jpg Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
  • 4. 4SML @ IJCAI 2016 There are other spectators too… Source – http://www.lexisnexis.com/risk/downloads/whitepaper/2014-social-media-use-in-law-enforcement.pdf Image Source – http://www.officialpsds.com/images/stocks/crime-scene-stock1016.jpg Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
  • 5. 5 But there’s a problem… Image Source – http://www.officialpsds.com/images/stocks/crime-scene-stock1016.jpg Wijeratne, Sanjaya et al. "Analyzing the Social Media Footprint of Street Gangs" in IEEE ISI 2015, doi: 10.1109/ISI.2015.7165945 Source – http://www.lexisnexis.com/risk/downloads/whitepaper/2014-social-media-use-in-law-enforcement.pdf Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile IdentificationSML @ IJCAI 2016
  • 6. 6SML @ IJCAI 2016 Enhance Twitter Gang Member Profile Identification with Word Embeddings Image Source – http://www.sciencenutshell.com/wp-content/uploads/2014/12/o-GANG-VIOLENCE-facebook.jpg Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
  • 7. 7SML @ IJCAI 2016 Gang Member Dataset Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Number of Words from each Feature Gang Member Class Non-gang Member Class Total Tweets 3,825,092 45,213,027 49,038,119 Profile Descriptions 3,348 21,182 24,530 Emoji 732,712 3,685,669 4,418,381 YouTube Videos 554,857 10,459,235 11,041,092 Profile Images 10,162 73,252 83,414 Total 5,126,176 59,452,365 64,578,536 Description Gang Member Class Non-gang Member Class Total Number of Profiles 400 2,865 3,265 Number of Tweets 821,412 7,238,758 8,060,170 Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
  • 8. 8SML @ IJCAI 2016 Feature Type – Tweets Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Words from Gang Member’s Tweets Words from Non-gang Member’s Tweets Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016
  • 9. 9SML @ IJCAI 2016 Feature Type – Profile Descriptions Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016 *Example Twitter profiles shown above are public data reported in news articles. They are not part of our research dataset. Word usage in profile descriptions: Gang Vs. Non-gang members Percentofprofilesthatcontainthewordinprofiledescription
  • 10. 10SML @ IJCAI 2016 Feature Types – Emoji Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016 Emoji usage distribution: Gang Vs. Non-gang members PercentofprofilesthatcontainstheemojiinTweets
  • 11. 11SML @ IJCAI 2016 Feature Types – YouTube Links Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016 51.25% of gang members post at least one YouTube video  76.58% of them are related to gangster hip-hop  A gang member shares 8 YouTube links on average  Top 5 words from gang members – shit, like, nigga, fuck, lil Top 5 words from non-gang members – like, love, peopl, song, get
  • 12. 12SML @ IJCAI 2016 Feature Types – Profile Images Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Balasuriya, Lakshika et al. “Finding Street Gang Members on Twitter”, Technical Report, Wright State University, 2016 Image tags distribution: Gang Vs. Non-gang members Percentofprofilesthatcontainstheimagetag
  • 13. 13SML @ IJCAI 2016 Classification Approach Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Convert all non textual features to text  Emoji for Python API for emoji1  Clarifai API for profile and cover images2 Remove stop word and stem the remaining Train a Skip-gram model using textual features  Negative sample rate = 10  Context window size = 5  Ignore words with min count = 5  Number of dimensions = 300 1https://pypi.python.org/pypi/emoji | 2http://www.clarifai.com/
  • 14. 14SML @ IJCAI 2016 Classification Approach Cont. Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Classifier training with word embeddings
  • 15. 15SML @ IJCAI 2016 Representing Training Examples Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification For each Twitter profile p, with n unique words, and the vector of the ith word in p denoted by wip, we compute the feature vectors for Vp in the following ways.  Sum of word embeddings – Sum of word vectors for all words in Twitter profile p
  • 16. 16SML @ IJCAI 2016 Representing Training Examples Cont. Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification  Mean of word embeddings – Mean of word vectors for all words in Twitter profile p  Sum of word embeddings weighted by term frequency where term frequency of ith word in p is denoted by cip
  • 17. 17SML @ IJCAI 2016 Representing Training Examples Cont. Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification  Sum of word embeddings weighted by tf-idf where tf-idf of ith word in p is denoted by tip  Sum of word embeddings weighted by term frequency where term frequency of ith word in p is denoted by cip.
  • 18. 18SML @ IJCAI 2016 Classification Results Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Classification results based on 10-fold cross validation
  • 19. 19SML @ IJCAI 2016 Results Interpretation Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Averaging the vector sum weighted by term frequency performs the best Out of the five vector based operations on word embeddings;  Four of them gave classifiers that beat baseline model(1)  Two of them performed as nearly equal or beat baseline model(2)
  • 20. 20SML @ IJCAI 2016 Results Interpretation Cont. Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification Word vector (top 10) for BDK showed  Five were rival gangs of Black Disciples  Two were syntactic variations of BDK  Three were syntactic variations of GDK Word vector (top 10) for GDK showed  Six were Gangster Disciple gangs where others have expressed their hatred towards them Word vector (top 10) for CPDK showed  Other keywords showing hatred towards cops such as fuck12, fuckthelaw, fuk12 and gang names
  • 21. 21SML @ IJCAI 2016 Challenges and Future work  Build image recognition systems that can accurately identify images commonly posted by gang members  Guns, Drugs, Money, Gang Signs  Integrate slang dictionaries and use them to pre- train word embeddings  Hipwiki.com  Utilize social networks of gang members to identify new gang members Image Source – http://i.ytimg.com/vi/dqyYvIqjuFI/maxresdefault.jpg Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
  • 22. 22SML @ IJCAI 2016 Connect with me sanjaya@knoesis.org @sanjrockz http://bit.do/sanjaya Image Source – http://www.pcb.its.dot.gov/standardstraining/mod08/ppt/m08ppt23.jpg Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
  • 23. 23SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
  • 24. References [1] D. U. Patton, “Gang violence, crime, and substance use on twitter: A snapshot of gang communications in Detroit,” Society for Social Work and Research 19th Annual Conference: The Social and Behavioral Importance of Increased Longevity, Jan 2015 [2] C. Morselli and D. Decary-Hetu, “Crime facilitation purposes of social networking sites: A review and analysis of the cyber banging phenomenon,” Small Wars & Insurgencies, vol. 24, no. 1, pp. 152–170, 2013 [3] S. Decker and D. Pyrooz, “Leaving the gang: Logging off and moving on. Council on foreign relations,” 2011 24SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification
  • 25. References Cont. [4] D. C. Pyrooz, S. H. Decker, and R. K. Moule Jr, “Criminal and routine activities in online settings: Gangs, offenders, and the internet,” Justice Quarterly, no. ahead-of-print, pp. 1–29, 2013 [5] S. Wijeratne, D. Doran, A. Sheth, and J. L. Dustin, “Analyzing the social media footprint of street gangs”, In Proc. of IEEE ISI, 2015, pages 91–96, May 2015. [6] L. Balasuriya, S. Wijeratne, D. Doran, and A. Sheth, “Finding street gang members on twitter”, In Technical Report, Wright State University, 2016. 25SML @ IJCAI 2016 Wijeratne, Sanjaya et al. Word Embeddings to Enhance Twitter Gang Member Profile Identification

Editor's Notes

  1. Tweets Source - http://www.wired.com/2013/09/gangs-of-social-media/all/ Image Source - http://www.7bucktees.com/shop/chi-raq-chiraq-version-2-t-shirt/ *Example tweets shown above are public data reported in the cited news paper article. They are not part of our research dataset
  2. Image Source - http://www.sciencenutshell.com/wp-content/uploads/2014/12/o-GANG-VIOLENCE-facebook.jpg
  3. Image Source - http://www.officialpsds.com/images/stocks/crime-scene-stock1016.jpg
  4. Image Source - http://www.officialpsds.com/images/stocks/crime-scene-stock1016.jpg
  5. Image Source - http://www.sciencenutshell.com/wp-content/uploads/2014/12/o-GANG-VIOLENCE-facebook.jpg
  6. *Example Twitter profiles shown above are public data reported in news articles. They are not part of our research dataset.
  7. Image Source - http://i.ytimg.com/vi/dqyYvIqjuFI/maxresdefault.jpg
  8. Image Source - http://www.pcb.its.dot.gov/standardstraining/mod08/ppt/m08ppt23.jpg